About
IT Pharmacist with over 12 years of experience in health information technology. Expertise in data analysis, data modelling and data exchange. Interoperability expert with several implementations and different standards. Focused on improving healthcare using data-based decisions. PharmD, MSc, PhD
Data Management & Science.
Creating Data solutions for healthcare.
- City: Porto, Portugal
- Freelance: Available
- Degree: PhD
- Email: joaofcalmeida [at] outlook [dot] com
Healthcare needs to be safer and more efficient. With a combined background in Pharmaceutical Sciences and Health Informatics, my ambition is to use technology to improve patient safety, increase treatment efficacy and improve care efficiency.
Facts
During my professional and academic activities, I take pride in some numbers that I have achieved
Projects Implemented in different health institutions
Countries all over the world
Different positions across digital health product development
Different classes taught, from health information system development to interoperability
Skills
Some of my main skills are
Resume
Some of my personal, academic, professional and associative history.
Summary
João Almeida
IT Pharmacist with over 10 years of experience in health information technology. Expertise in data analysis, data modelling and data exchange. Interoperability expert with several implementations and different standards. Focused on improving healthcare using data-based decisions.
- Porto, Portugal
- joaofcalmeida [at] outlook.com
Education
PhD in Health Data Science
2019 - 2024
Faculty of Medicine of University of Porto, Portugal
Specialization in federated learning, data modelling, generative networks and real-world evidence
Master in Medical Informatics
2017 - 2019
Faculty of Medicine of University of Porto, Portugal
Focus on pharmacovigilance, medical system evaluation and statistics
Specialization in Health Informatics
2016 - 2017
Faculty of Medicine of University of Porto, Portugal
Focus on data modelling and interoperability
MSc in Pharmaceutical Sciences
2008 - 2013
Faculty of Pharmacy of University of Porto, Portugal
Degree to practice pharmacy in community and hospital settings
Associative
HL7 Portugal Affiliate
2018 - today
Portuguese affiliate of the HL7 foundation. Helped to teach and advise the Portuguese interoperability community on best practices regarding HL7 implementation.
E-mais
2017 - today
One of the Portuguese representatives in the EFMI (European Federation for Medical Informatics). Created initiatives to promote digital health adoption in Portugal.
IHE Pharmacy
2017 - today
Focused on creating profiles to help the community implement correct and useful practices regarding information exchange in the pharmacy setting.
Professional Experience
Independent Consultant
2021 - Present
Worldwide
- Product Owner for the Gravitate-Health project: developing, managing and defining features for the product
- Develop, implement, validate and monitor AI-based systems for Healthcare
- Co-authorship of IHE Pharmacy FHIR Profiles on prescription, dispense and drug adverse events
- Creating FHIR servers for medicine regulation, IDMP compatible.
- Creating and implementing FHIR profiles across domains.
- Leading Tracks on connectathons and projects
Interoperability and machine learning
2018 - 2021
HealthySystems, Porto
- Creating technical and clinical alarming mechanisms
- Developing data pipelines for several machine learning algorithms
- Creating, deploying and monitoring ML models in several hospitals
- Creating integration mechanisms with over 15 different suppliers in 7 hospitals
Invited Assistant
2019 - Present
Faculty of Medicine of University of Porto, Porto
- Lecturing classes about interoperability, data standards, health information systems and health data science
- Creating infrastructure for practical classes support
Researcher
2016 - Present
NanoSTIMA, Porto
- Researching better tools to support causality assessment in pharmacovigilance centres, with the goal of providing faster, better and more efficient drug information to the population.
- Currently developing evaluation methods for biomedical systems and natural language processing of biomedical papers.
Integration Analyst
2016 - 2018
Alert Life Sciences Computing, Porto
- Aimed to provide the best medication information for the software in 13 countries. Integrated different information sources into the product, along with communicating with clients, governmental institutions and information providers.
- Integration of several data sources into the application
- Creating ETL processes for incremental updates of information, focusing on data quality and patient safety
- Implementation of automatic mechanisms for client support
Data Analyst
2014 - 2016
Alert Life Sciences Computing, Porto
- Data and business analysis for hospitals across 13 countries. Closely involved in feature development, supporting both the functional analysis and development teams.
- Quality control of content produced for 1+ years
- Functional analysis for several features for 3+ years
- Implementation consultant in over 5 projects
Projects
An overview of a selection of the projects that I have developed over the years.
Title: awesome-academic-resources
Description: Links and resources for academics
Topics: academic, awesome-list, cv, and resources
Last Updated: 08 Aug 2023
Description: A curated list of awesome resources for creating synthetic data
Topics: awesome-list, cv, deep-learning, gan, generative-adversarial-network, and machine-learning
Last Updated: 06 Nov 2024
Description: List of repositories related to medical NLP
Topics: awesome-list, cv, machine-learning, named-entity-recognition, and nlp
Last Updated: 07 Jul 2024
Title: distributed-data-benchmark
Description: Benchmarking across distributed systems
Topics: academic, cv, distributed-systems, and machine-learning
Last Updated: 21 Jul 2023
Description: Tools for consuming drug information databases with several formats and export them normalized into CSV for further purposes
Topics: bioinformatics, cv, data-science, database, and machine-learning
Last Updated: 15 Aug 2022
Description: messing around with drug discovery and machine-learning
Topics: cv, drug-discovery, machine-learning, and python
Last Updated: 25 Aug 2022
Description: Data Science Curriculum
Topics: curriculum, cv, data-science, deep-learning, and machine-learning
Last Updated: 15 Aug 2022
Title: Evaluating-distributed-learning-algorithms-on-real-world-healthcare-data
Description: code for the paper Evaluating distributed-learning algorithms on real world healthcare data
Topics: academic, cv, data-science, distributed-systems, and machine-learning
Last Updated: 18 Jul 2023
Title: fhir-server-search-testing
Description: Testing Search Methods of FHIR Servers
Topics: cv, fhir, fhir-server, interoperability, and testing
Last Updated: 15 Aug 2022
Description: PhD Thesis on Health Data Science
Topics: academic and cv
Last Updated: 21 Oct 2024
Title: Health-information-systems-data-analysis
Description:
Topics: academic, cv, and data-visualization
Last Updated: 15 Aug 2022
Description: medicationIG
Topics: cv, fhir, fhir-ig, hl7-fhir, implementationguide, interoperability, and medication
Last Updated: 15 Aug 2022
Description: Drug Catalog on graph
Topics: cv, data-visualization, interoperability, medication, and neo4j
Last Updated: 15 Aug 2022
Description:
Topics: cv, fhir, implementationguide, and interoperability
Last Updated: 12 Jun 2023
Description:
Topics: cv, fhir, hl7-fhir, implementationguide, interoperability, and prescription
Last Updated: 15 Aug 2022
Title: portuguese-medication-idmp-fhir
Description:
Topics: cv, data, graph, idmp, interoperability, and medication
Last Updated: 11 Oct 2022
Description: Python for chemoinformatics
Topics: cheminformatics, chemistry, chemoinformatics, cv, deep-learning, drug-design, drug-discovery, jupyter, machine-learning, python, rdkit, and scikit-learn
Last Updated: 07 Nov 2024
Publications
An overview of a selection of academic publications.
-
Synthetic data has been increasingly used in the last few years. While its applications are various, measuring its utility and privacy is seldom an easy task. Since there are different methods of evaluating these issues, which are dependent on data types, use cases and purpose, a generic method for evaluating utility and privacy does not exist at the moment. So, we introduced a compilation of the most recent methods for evaluating privacy and utility into a single executable in order to create a report of the similarities and potential privacy breaches between two datasets, whether synthetic or not. We catalogued 24 different methods, from qualitative to quantitative, column-wise or table-wise evaluations. We hope this resource can help scientists and industries get a better grasp of the synthetic data they have and produce, and provide a better basis for creating a new, broader method for evaluating dataset similarities.
-
In healthcare facilities, processes are not always carried out under the expected methods. The variation in practice leads to lesser quality treatments and greater costs. Within a single visit, patients are now likely to interact with multiple departments, healthcare providers, and Health Information Systems (HIS). Because of these events, information is frequently dispersed, not normalized, and incoherent, creating several barriers to overview processes and audit their quality. Process mining can be a useful tool for getting over some of these obstacles. We designed a procedure to automatically apply process mining techniques using the HL7 standard, which is used for exchanging information between HIS as a source of event logs. Our work provides a way for pooling HL7 messages from a unified repository of a healthcare institution and provides a pipeline to apply process mining methods to create insights relative to the healthcare processes that are implemented. We show a few diagrams to demonstrate the tool’s potential as a process formalization and analysis tool. We concluded that using HL7 messages as a proxy for processes that involve several HIS is a way to easily provide process mining capabilities to an organization.
-
Data is a major asset in today’s healthcare scenery. Hospitals are one of the primary producers of healthcare-related data and the value this data can provide is enormous. However, to use this to improve healthcare practice and push science forward, it is necessary to safeguard the patient’s privacy and the ethical use of the data. The ethical and legal requirements are vast and complex. Synthetic data appears as a tool to overcome these hurdles and provide fast and reliable access to data without compromising utility nor privacy. Even though Generative Adversarial Networks (GANs) are receiving a lot of attention lately, the application of most common models and architectures are not suited to tabular data – the most prevalent healthcare-related data. This study surveys the current GAN implementations tailored to this scenario. The analysis was focused mainly on the models employed, datasets used, and metrics reported regarding the quality of the generated data in terms of utility, privacy and how they compare among themselves. We aim to help institutions and investigators get a grasp of the tools to facilitate access to healthcare data, as well as recommendations for testing data synthesizers with privacy concerns.
-
BACKGROUND: The domain of Biomedical and Health Informatics (BMHI) lies in the intersection of multiple disciplines, making it difficult to define and, consequently, characterise the workforce, training needs and requirements in this domain. Nevertheless, to the best of our knowledge, there isn’t any aggregated information about the higher education programmes in BMHI currently being delivered in Portugal, and which knowledge, skills, and competencies these programmes aim to develop. AIM: Our aim is to map BMHI teaching in Portugal. More specifically, our objective is to identify and characterise the: a.) programmes delivering relevant BMHI teaching; b.) geographical distribution and chronological evolution of such programmes; and c.) credit distribution and weight. METHODS: We conducted a descriptive, cross-sectional study to systematically identify all programmes currently delivering any core BMHI modules in Portugal. Our population included all graduate-level programmes being delivered in the 2021/2022 academic year in any Portuguese higher education institution. RESULTS: We identified 23 programmes delivering relevant teaching in BMHI in Portugal. Of these, eight (35%) were classified as dedicated educational programmes in BMHI, mostly delivered in polytechnic institutes at a master’s level (5; 63%) and located preferentially in the northern part of the country (7). Currently, there are four programmes with potential for accreditation but still requiring some workload increase in certain areas in order to be eligible.
-
Background: The escalating prevalence of cesarean delivery globally poses significant health impacts on mothers and newborns. Despite this trend, the underlying reasons for increased cesarean delivery rates, which have risen to 36.3% in Portugal as of 2020, remain unclear. This study delves into these issues within the Portuguese health care context, where national efforts are underway to reduce cesarean delivery occurrences. Objective: This paper aims to introduce a machine learning, algorithm-based support system designed to assist clinical teams in identifying potentially unnecessary cesarean deliveries. Key objectives include developing clinical decision support systems for cesarean deliveries using interoperability standards, identifying predictive factors influencing delivery type, assessing the economic impact of implementing this tool, and comparing system outputs with clinicians’ decisions. Methods: This study used retrospective data collected from 9 public Portuguese hospitals, encompassing maternal and fetal data and delivery methods from 2019 to 2020. We used various machine learning algorithms for model development, with light gradient-boosting machine (LightGBM) selected for deployment due to its efficiency. The model’s performance was compared with clinician assessments through questionnaires. Additionally, an economic simulation was conducted to evaluate the financial impact on Portuguese public hospitals. Results: The deployed model, based on LightGBM, achieved an area under the receiver operating characteristic curve of 88%. In the trial deployment phase at a single hospital, 3.8% (123/3231) of cases triggered alarms for potentially unnecessary cesarean deliveries. Financial simulation results indicated potential benefits for 30% (15/48) of Portuguese public hospitals with the implementation of our tool. 
However, this study acknowledges biases in the model, such as combining different vaginal delivery types and focusing on potentially unwarranted cesarean deliveries. Conclusions: This study presents a promising system capable of identifying potentially incorrect cesarean delivery decisions, with potentially positive implications for medical practice and health care economics. However, it also highlights the challenges and considerations necessary for real-world application, including further evaluation of clinical decision-making impacts and understanding the diverse reasons behind delivery type choices. This study underscores the need for careful implementation and further robust analysis to realize the full potential and real-world applicability of such clinical support systems.
-
Background The prevalence of chronic diseases has shifted the burden of disease from incidental acute inpatient admissions to long-term coordinated care across healthcare institutions and the patient’s home. Digital healthcare ecosystems emerge to target increasing healthcare costs and invest in standard Application Programming Interfaces (API), such as HL7 Fast Healthcare Interoperability Resources (HL7 FHIR) for trusted data flows. Objectives This scoping review assessed the role and impact of HL7 FHIR and associated Implementation Guides (IGs) in digital healthcare ecosystems focusing on chronic disease management. Methods To study trends and developments relevant to HL7 FHIR, a scoping review of the scientific and gray English literature from 2017 to 2023 was used. Results The selection of 93 of 524 scientific papers reviewed in English indicates that the popularity of HL7 FHIR as a robust technical interface standard for the health sector has been steadily rising since its inception in 2010, reaching a peak in 2021. Digital Health applications use HL7 FHIR in cancer (45 %), cardiovascular disease (CVD) (more than 15 %), and diabetes (almost 15 %). The scoping review revealed that references to HL7 FHIR IGs are limited to ∼ 20 % of articles reviewed. HL7 FHIR R4 was most frequently referenced when the HL7 FHIR version was mentioned. In HL7 FHIR IGs registries and the internet, we found 35 HL7 FHIR IGs addressing chronic disease management, i.e., cancer (40 %), chronic disease management (25 %), and diabetes (20 %). HL7 FHIR IGs frequently complement the information in the article. Conclusions HL7 FHIR matures with each revision of the standard as HL7 FHIR IGs are developed with validated data sets, common shared HL7 FHIR resources, and supporting tools. 
Referencing HL7 FHIR IGs cataloged in official registries and in scientific publications is recommended to advance data quality and facilitate mutual learning in growing digital healthcare ecosystems that nurture interoperability in digital health innovation.
-
Abstract This study focused on comparing distributed learning models with centralized and local models, assessing their efficacy in predicting specific delivery and patient-related outcomes in obstetrics using real-world data. The predictions focus on key moments in the obstetric care process, including discharge and various stages of hospitalization. Our analysis, which used 6 different machine learning methods (Decision Trees, Bayesian methods, Stochastic Gradient Descent, K-nearest neighbors, AdaBoost, and Multi-layer Perceptron) and 19 different variables with various distributions and types, revealed that distributed models were at least equal, and often superior, to centralized versions and local versions. We also describe thoroughly the preprocessing stage in order to help others implement this method in real-world scenarios. The preprocessing steps included cleaning and harmonizing missing values, handling missing data and encoding categorical variables with multisite logic. Even though the type of machine learning model and the distribution of the outcome variable can impact the result, we reached results of 66% being superior to the centralized and local counterpart and 77% being better than the centralized with AdaBoost. Our experiments also shed light on the preprocessing steps required to implement distributed models in a real-world scenario. Our results advocate for distributed learning as a promising tool for applying machine learning in clinical settings, particularly when privacy and data security are paramount, thus offering a robust solution for privacy-concerned clinical applications.
-
The increasing prevalence of electronic health records (EHRs) in healthcare systems globally has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data is vital for an accurate representation of patient populations and to avoid erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, necessitating innovative tools and methodologies for effective data quality assessment and improvement. This article addresses the critical need for data quality evaluation in obstetrics by developing a novel tool. The tool utilizes Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) standards in conjunction with Bayesian Networks and expert rules, offering a novel approach to assessing data quality in real-world obstetrics data. A harmonized framework focusing on completeness, plausibility, and conformance underpins our methodology. We employed Bayesian networks for advanced probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The development and validation of the tool were based on obstetrics data from 9 Portuguese hospitals, spanning the years 2019-2020. The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. Bayesian networks used in the tool showed high performance for various features with area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool’s infrastructure and interoperable format as a FHIR Application Programming Interface (API) enables a possible deployment of a real-time data quality assessment in obstetrics settings.
Our initial assessments show promise: even when compared with physicians’ assessment of real records, the tool can reach an AUROC of 88%, depending on the threshold defined. Our results also show that obstetrics clinical records are difficult to assess in terms of quality and assessments like ours could benefit from more categorical approaches of ranking between bad and good quality. This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore tailored data quality evaluations for different healthcare contexts, as well as further validation of the tool capabilities, enhancing the tool’s utility across diverse medical domains. With the widespread use of healthcare information systems, a vast amount of health data are generated, stored in electronic health records (EHRs). These data have the potential to advance medical knowledge and improve patient care, but only if they are of high quality. Data quality varies depending on its use, such as daily patient care, research, or management purposes. Poor data quality in EHRs can lead to incorrect healthcare decisions. Errors can occur at various stages, from data entry to processing and interpretation. Different approaches are needed to assess data quality based on its intended use. This article focuses on developing a tool to improve data quality in obstetrics using 3 main categories: completeness, plausibility, and conformance. Tested with data from 9 Portuguese hospitals, the tool uses methods like Bayesian networks and rule-based systems. Initial real-world testing showed promising results. However, assessing data quality remains complex and context dependent. Future research will refine the tool and expand its application.
This work is a significant step towards ensuring high-quality EHR data for clinical and research purposes.
-
Introduction/Background Hormone Receptor-positive (HR+) and Human Epidermal Growth Factor Receptor 2-negative (HER2-) breast cancer is the most common subtype, predominantly treated with endocrine therapy. The efficacy of CDK4/6 inhibitors combined with endocrine therapy in this context remains to be fully evaluated. Materials (or Patients) and Methods This study compared the effectiveness of CDK4/6 inhibitors (palbociclib and ribociclib) in combination with an aromatase inhibitor or fulvestrant against endocrine therapy alone in patients with HR+/HER2- advanced breast cancer. The main focus was on progression-free survival (PFS) and overall survival (OS). The study involved a population treated exclusively with endocrine therapy for bone involvement, examining median OS and PFS, and adjusting for variables like stage, visceral metastasis, age, and treatment line. Results The study found no significant OS difference between treatments with palbociclib, ribociclib, and endocrine therapy alone. However, ribociclib combined with letrozole significantly improved PFS over letrozole alone. Propensity score weighting indicated a potential 50 % reduction in death risk with ribociclib compared to palbociclib, though this was not confirmed by cox regression. Conclusion CDK4/6 inhibitors, particularly ribociclib in combination with letrozole, show promise in improving outcomes for HR+/HER2- breast cancer patients. While palbociclib may not be superior to traditional endocrine therapy, the results underscore the need for further research. These findings could influence future treatment protocols, emphasizing the importance of personalized therapy in this patient group.
Services
I now offer a wide range of services to companies and health institutions to help healthcare data reach its full potential.
Interoperability Implementation
Focus on specification, development and implementation of interoperability solutions.
Data Science Projects
From business understanding and data preparation, through model development and evaluation, to model deployment and monitoring.
Product Management
Digital health product management, roadmap definition, DevOps implementation, specifications and deployment.