João

About

IT Pharmacist with over 10 years of experience in health information technology. Expertise in data analysis, data modelling and data exchange. Interoperability expert with several implementations across different standards. Focused on improving healthcare through data-based decisions.

Data Management & Science.

Creating Data solutions for healthcare.

  • City: Porto, Portugal
  • Freelance: Available
  • Degree: PhD
  • Email: joaofcalmeida [at] outlook [dot] com

Healthcare needs to be safer and more efficient. Drawing on a combined background in Pharmaceutical Sciences and Health Informatics, I aim to use technology to improve patient safety, increase treatment efficacy and improve care efficiency.

Facts

I take pride in some of the numbers I have achieved during my professional and academic activities:

Projects Implemented in different health institutions

Countries all over the world

Different positions across digital health product development

Different classes taught, from health information system development to interoperability

Skills

Some of my main skills are

Machine Learning 85%
Data Management 90%
Interoperability 100%
Python 80%
DevOps 85%
HL7 FHIR 95%

Resume

Some of my personal, academic, professional and associative history.

Summary

João Almeida

IT Pharmacist with over 10 years of experience in health information technology. Expertise in data analysis, data modelling and data exchange. Interoperability expert with several implementations across different standards. Focused on improving healthcare through data-based decisions.

  • Porto, Portugal
  • joaofcalmeida [at] outlook.com

Education

PhD in Health Data Science

2019 - Now

Faculty of Medicine of University of Porto, Portugal

Specialization in federated learning, data modelling, generative networks and real-world evidence

Master in Health Informatics

2017 - 2019

Faculty of Medicine of University of Porto, Portugal

Focus on pharmacovigilance, medical system evaluation and statistics

Specialization in Health Informatics

2016 - 2016

Faculty of Medicine of University of Porto, Portugal

Focus on data modelling and interoperability

MSc in Pharmaceutical Sciences

2008 - 2013

Faculty of Pharmacy of University of Porto, Portugal

Degree to practice pharmacy in community and hospital settings

Associative

HL7 Portugal Affiliate

2018 - today

Portuguese affiliate of the HL7 foundation. Helped to teach and advise the Portuguese interoperability community on best practices regarding HL7 implementation.

E-mais

2017 - today

One of the Portuguese representatives in the EFMI (European Federation for Medical Informatics). Created initiatives to promote digital health adoption in Portugal.

IHE Pharmacy

2017 - today

Focused on creating profiles to help the community implement correct and useful practices regarding information exchange in the pharmacy setting.

Professional Experience

Independent Consultant

2021 - Present

Worldwide

  • Product Owner for the Gravitate-Health project: developing, managing and defining features for the product
  • Developing, implementing, validating and monitoring AI-based systems for healthcare
  • Co-authoring IHE Pharmacy FHIR profiles on prescription, dispense and adverse drug events
  • Creating IDMP-compatible FHIR servers for medicine regulation
  • Creating and implementing FHIR profiles across domains
  • Leading tracks at connectathons and in projects

Interoperability and machine learning

2018 - 2021

HealthySystems, Porto

  • Creating technical and clinical alarming mechanisms
  • Developing data pipelines for several machine learning algorithms
  • Creating, deploying and monitoring ML models in several hospitals
  • Creating integration mechanisms with over 15 different suppliers in 7 hospitals

Invited Assistant

2019 - Present

Faculty of Medicine of University of Porto, Porto

  • Lecturing on interoperability, data standards, health information systems and health data science
  • Creating infrastructure to support practical classes

Researcher

2016 - Present

NanoSTIMA, Porto

  • Researching better tools to support causality assessment in pharmacovigilance centres, with the aim of providing drug information to the population faster and more efficiently.
  • Currently developing evaluation methods for biomedical systems and natural language processing of biomedical papers.

Integration Analyst

2016 - 2018

Alert Life Sciences Computing, Porto

  • Aimed to provide the best medication information for the software in 13 countries. Integrated different information sources into the product, along with communicating with clients, governmental institutions and information providers.
  • Integration of several data sources into the application
  • Creating ETL processes for incremental updates of information, focusing on data quality and patient safety
  • Implementation of automatic mechanisms for client support

Data Analyst

2014 - 2016

Alert Life Sciences Computing, Porto

  • Data and business analysis for hospitals across 13 countries. Heavily involved in feature development, supporting both the functional analysis and development teams.
  • Quality control of content produced for 1+ years
  • Functional analysis for several features for 3+ years
  • Implementation consultant in over 5 projects

Projects

An overview of a selection of the projects that I have developed over the years.

Title: awesome-academic-resources

Description: Links and resources for academics

Topics: academic, awesome-list, cv, and resources

Last Updated: 08 Aug 2023

Title: awesome-data-synthesis

Description: A curated list of awesome resources for creating synthetic data

Topics: awesome-list, cv, deep-learning, gan, generative-adversarial-network, and machine-learning

Last Updated: 22 Jul 2024

Title: awesome-medical-nlp

Description: List of repositories related to medical NLP

Topics: awesome-list, cv, machine-learning, named-entity-recognition, and nlp

Last Updated: 07 Jul 2024

Title: distributed-data-benchmark

Description: Making benchmark through distributed systems

Topics: academic, cv, distributed-systems, and machine-learning

Last Updated: 21 Jul 2023

Title: DrugDatabasesTools

Description: Tools for consuming drug information databases with several formats and export them normalized into CSV for further purposes

Topics: bioinformatics, cv, data-science, database, and machine-learning

Last Updated: 15 Aug 2022

Title: DrugDiscoveryML

Description: messing around with drug discovery and machine-learning

Topics: cv, drug-discovery, machine-learning, and python

Last Updated: 25 Aug 2022

Title: ds-curriculum

Description: Data Science Curriculum

Topics: curriculum, cv, data-science, deep-learning, and machine-learning

Last Updated: 15 Aug 2022

Title: Evaluating-distributed-learning-algorithms-on-real-world-healthcare-data

Description: code for the paper Evaluating distributed-learning algorithms on real world healthcare data

Topics: academic, cv, data-science, distributed-systems, and machine-learning

Last Updated: 18 Jul 2023

Title: fhir-server-search-testing

Description: Testing Search Methods of FHIR Servers

Topics: cv, fhir, fhir-server, interoperability, and testing

Last Updated: 15 Aug 2022

Title: heads-thesis

Description: PhD Thesis on Health Data Science

Topics: academic and cv

Last Updated: 11 Sep 2024

Title: Health-information-systems-data-analysis

Description:

Topics: academic, cv, and data-visualization

Last Updated: 15 Aug 2022

Title: medicationIG

Description: medicationIG

Topics: cv, fhir, fhir-ig, hl7-fhir, implementationguide, interoperability, and medication

Last Updated: 15 Aug 2022

Title: my-graph-drugs

Description: Drug Catalog on graph

Topics: cv, data-visualization, interoperability, medication, and neo4j

Last Updated: 15 Aug 2022

Title: obs-cdss-fhir

Description:

Topics: cv, fhir, implementationguide, and interoperability

Last Updated: 12 Jun 2023

Title: pem-h

Description:

Topics: cv, fhir, hl7-fhir, implementationguide, interoperability, and prescription

Last Updated: 15 Aug 2022

Title: portuguese-medication-idmp-fhir

Description:

Topics: cv, data, graph, idmp, interoperability, and medication

Last Updated: 11 Oct 2022

Title: py4chemoinformatics

Description: Python for chemoinformatics

Topics: cheminformatics, chemistry, chemoinformatics, cv, deep-learning, drug-design, drug-discovery, jupyter, machine-learning, python, rdkit, and scikit-learn

Last Updated: 01 Sep 2024

Publications

An overview of a selection of academic publications.

  1. Coutinho-Almeida, João, et al. “Dataset Comparison Tool: Utility and Privacy.” Challenges of Trustable AI and Added-Value on Health, IOS Press, 2022, pp. 23–27, doi:10.3233/SHTI220389.

    Synthetic data has been more and more used in the last few years. While its applications are various, measuring its utility and privacy is seldom an easy task. Since there are different methods of evaluating these issues, which are dependent on data types, use cases and purpose, a generic method for evaluating utility and privacy does not exist at the moment. So, we introduced a compilation of the most recent methods for evaluating privacy and utility into a single executable in order to create a report of the similarities and potential privacy breaches between two datasets, whether it is related to synthetic or not. We catalogued 24 different methods, from qualitative to quantitative, column-wise or table-wise evaluations. We hope this resource can help scientists and industries get a better grasp of the synthetic data they have and produce more easily and a better basis to create a new, more broad method for evaluating dataset similarities.

  2. Coutinho-Almeida, João, and Ricardo João Cruz-Correia. “Developing a Process Mining Tool Based on HL7.” Procedia Computer Science, vol. 196, 2022, pp. 501–08, doi:10.1016/j.procs.2021.12.042.

    In healthcare facilities, processes are not always carried out under the expected methods. The variation in practice leads to lesser quality treatments and greater costs. Within a single visit, patients are now likely to interact with multiple departments, healthcare providers, and Health Information Systems (HIS). Because of these events, information is frequently dispersed, not normalized, and incoherent, creating several barriers to overview processes and audit their quality. Process mining can be a useful tool for getting over some of these obstacles. We designed a procedure to automatically apply process mining techniques using the HL7 standard, which is used for exchanging information between HIS as a source of event logs. Our work provides a way for pooling HL7 messages from a unified repository of a healthcare institution and provides a pipeline to apply process mining methods to create insights relative to the healthcare processes that are implemented. We show a few diagrams to demonstrate the tool’s potential as a process formalization and analysis tool. We concluded that using HL7 messages as a proxy for processes that involve several HIS is a way to easily provide process mining capabilities to an organization.

  3. Coutinho-Almeida, João, et al. “GANs for Tabular Healthcare Data Generation: A Review on Utility and Privacy.” Discovery Science, edited by Carlos Soares and Luis Torgo, Springer International Publishing, 2021, pp. 282–91, doi:10/gm3tf5.

    Data is a major asset in today’s healthcare scenery. Hospitals are one of the primary producers of healthcare-related data and the value this data can provide is enormous. However, to use this to improve healthcare practice and push science forward, it is necessary to safeguard the patient’s privacy and the ethical use of the data. The ethical and legal requirements are vast and complex. Synthetic data appears as a tool to overcome these hurdles and provide fast and reliable access to data without compromising utility nor privacy. Even though Generative Adversarial Networks (GANs) are receiving a lot of attention lately, the application of most common models and architectures are not suited to tabular data – the most prevalent healthcare-related data. This study surveys the current GAN implementations tailored to this scenario. The analysis was focused mainly on the models employed, datasets used, and metrics reported regarding the quality of the generated data in terms of utility, privacy and how they compare among themselves. We aim to help institutions and investigators get a grasp of the tools to facilitate access to healthcare data, as well as recommendations for testing data synthesizers with privacy concerns.

  4. Costa, Paulo Dias, et al. “Biomedical and Health Informatics Teaching in Portugal: Current Status.” Heliyon, vol. 9, no. 3, March 2023, p. e14163, doi:10.1016/j.heliyon.2023.e14163.

    BACKGROUND: The domain of Biomedical and Health Informatics (BMHI) lies in the intersection of multiple disciplines, making it difficult to define and, consequently, characterise the workforce, training needs and requirements in this domain. Nevertheless, to the best of our knowledge, there isn’t any aggregated information about the higher education programmes in BMHI currently being delivered in Portugal, and which knowledge, skills, and competencies these programmes aim to develop. AIM: Our aim is to map BMHI teaching in Portugal. More specifically, our objective is to identify and characterise the: a.) programmes delivering relevant BMHI teaching; b.) geographical distribution and chronological evolution of such programmes; and c.) credit distribution and weight. METHODS: We conducted a descriptive, cross-sectional study to systematically identify all programmes currently delivering any core BMHI modules in Portugal. Our population included all graduate-level programmes being delivered in the 2021/2022 academic year in any Portuguese higher education institution. RESULTS: We identified 23 programmes delivering relevant teaching in BMHI in Portugal. Of these, eight (35%) were classified as dedicated educational programmes in BMHI, mostly delivered in polytechnic institutes at a master’s level (5; 63%) and located preferentially in the northern part of the country (7). Currently, there are four programmes with potential for accreditation but still requiring some workload increase in certain areas in order to be eligible.

  5. Coutinho-Almeida, João, et al. “Fast Healthcare Interoperability Resources–Based Support System for Predicting Delivery Type: Model Development and Evaluation Study.” JMIR Formative Research, vol. 8, April 2024, p. e54109, doi:10.2196/54109.

    Background: The escalating prevalence of cesarean delivery globally poses significant health impacts on mothers and newborns. Despite this trend, the underlying reasons for increased cesarean delivery rates, which have risen to 36.3% in Portugal as of 2020, remain unclear. This study delves into these issues within the Portuguese health care context, where national efforts are underway to reduce cesarean delivery occurrences. Objective: This paper aims to introduce a machine learning, algorithm-based support system designed to assist clinical teams in identifying potentially unnecessary cesarean deliveries. Key objectives include developing clinical decision support systems for cesarean deliveries using interoperability standards, identifying predictive factors influencing delivery type, assessing the economic impact of implementing this tool, and comparing system outputs with clinicians’ decisions. Methods: This study used retrospective data collected from 9 public Portuguese hospitals, encompassing maternal and fetal data and delivery methods from 2019 to 2020. We used various machine learning algorithms for model development, with light gradient-boosting machine (LightGBM) selected for deployment due to its efficiency. The model’s performance was compared with clinician assessments through questionnaires. Additionally, an economic simulation was conducted to evaluate the financial impact on Portuguese public hospitals. Results: The deployed model, based on LightGBM, achieved an area under the receiver operating characteristic curve of 88%. In the trial deployment phase at a single hospital, 3.8% (123/3231) of cases triggered alarms for potentially unnecessary cesarean deliveries. Financial simulation results indicated potential benefits for 30% (15/48) of Portuguese public hospitals with the implementation of our tool. 
However, this study acknowledges biases in the model, such as combining different vaginal delivery types and focusing on potentially unwarranted cesarean deliveries. Conclusions: This study presents a promising system capable of identifying potentially incorrect cesarean delivery decisions, with potentially positive implications for medical practice and health care economics. However, it also highlights the challenges and considerations necessary for real-world application, including further evaluation of clinical decision-making impacts and understanding the diverse reasons behind delivery type choices. This study underscores the need for careful implementation and further robust analysis to realize the full potential and real-world applicability of such clinical support systems.

  6. Gazzarata, Roberta, et al. “HL7 Fast Healthcare Interoperability Resources (HL7 FHIR) in Digital Healthcare Ecosystems for Chronic Disease Management: Scoping Review.” International Journal of Medical Informatics, vol. 189, September 2024, p. 105507, doi:10.1016/j.ijmedinf.2024.105507.

    Background: The prevalence of chronic diseases has shifted the burden of disease from incidental acute inpatient admissions to long-term coordinated care across healthcare institutions and the patient’s home. Digital healthcare ecosystems emerge to target increasing healthcare costs and invest in standard Application Programming Interfaces (API), such as HL7 Fast Healthcare Interoperability Resources (HL7 FHIR) for trusted data flows. Objectives: This scoping review assessed the role and impact of HL7 FHIR and associated Implementation Guides (IGs) in digital healthcare ecosystems focusing on chronic disease management. Methods: To study trends and developments relevant to HL7 FHIR, a scoping review of the scientific and gray English literature from 2017 to 2023 was used. Results: The selection of 93 of 524 scientific papers reviewed in English indicates that the popularity of HL7 FHIR as a robust technical interface standard for the health sector has been steadily rising since its inception in 2010, reaching a peak in 2021. Digital Health applications use HL7 FHIR in cancer (45%), cardiovascular disease (CVD) (more than 15%), and diabetes (almost 15%). The scoping review revealed that references to HL7 FHIR IGs are limited to ∼20% of articles reviewed. HL7 FHIR R4 was most frequently referenced when the HL7 FHIR version was mentioned. In HL7 FHIR IGs registries and the internet, we found 35 HL7 FHIR IGs addressing chronic disease management, i.e., cancer (40%), chronic disease management (25%), and diabetes (20%). HL7 FHIR IGs frequently complement the information in the article. Conclusions: HL7 FHIR matures with each revision of the standard as HL7 FHIR IGs are developed with validated data sets, common shared HL7 FHIR resources, and supporting tools.
Referencing HL7 FHIR IGs cataloged in official registries and in scientific publications is recommended to advance data quality and facilitate mutual learning in growing digital healthcare ecosystems that nurture interoperability in digital health innovation.

  7. Coutinho-Almeida, João, et al. “Evaluating Distributed-Learning on Real-World Obstetrics Data: Comparing Distributed, Centralized and Local Models.” Scientific Reports, vol. 14, no. 1, May 2024, p. 11128, doi:10.1038/s41598-024-61371-1.

    This study focused on comparing distributed learning models with centralized and local models, assessing their efficacy in predicting specific delivery and patient-related outcomes in obstetrics using real-world data. The predictions focus on key moments in the obstetric care process, including discharge and various stages of hospitalization. Our analysis, using 6 different machine learning methods (Decision Trees, Bayesian methods, Stochastic Gradient Descent, K-nearest neighbors, AdaBoost, and Multi-layer Perceptron) and 19 different variables with various distributions and types, revealed that distributed models were at least equal, and often superior, to centralized versions and local versions. We also describe thoroughly the preprocessing stage in order to help others implement this method in real-world scenarios. The preprocessing steps included cleaning and harmonizing missing values, handling missing data and encoding categorical variables with multisite logic. Even though the type of machine learning model and the distribution of the outcome variable can impact the result, we reached results of 66% being superior to the centralized and local counterpart and 77% being better than the centralized with AdaBoost. Our experiments also shed light on the preprocessing steps required to implement distributed models in a real-world scenario. Our results advocate for distributed learning as a promising tool for applying machine learning in clinical settings, particularly when privacy and data security are paramount, thus offering a robust solution for privacy-concerned clinical applications.

  8. ---. “Development and Initial Validation of a Data Quality Evaluation Tool in Obstetrics Real-World Data through HL7-FHIR Interoperable Bayesian Networks and Expert Rules.” JAMIA Open, vol. 7, no. 3, October 2024, p. ooae062, doi:10.1093/jamiaopen/ooae062.

    The increasing prevalence of electronic health records (EHRs) in healthcare systems globally has underscored the importance of data quality for clinical decision-making and research, particularly in obstetrics. High-quality data is vital for an accurate representation of patient populations and to avoid erroneous healthcare decisions. However, existing studies have highlighted significant challenges in EHR data quality, necessitating innovative tools and methodologies for effective data quality assessment and improvement. This article addresses the critical need for data quality evaluation in obstetrics by developing a novel tool. The tool utilizes Health Level 7 (HL7) Fast Healthcare Interoperable Resources (FHIR) standards in conjunction with Bayesian Networks and expert rules, offering a novel approach to assessing data quality in real-world obstetrics data. A harmonized framework focusing on completeness, plausibility, and conformance underpins our methodology. We employed Bayesian networks for advanced probabilistic modeling, integrated outlier detection methods, and a rule-based system grounded in domain-specific knowledge. The development and validation of the tool were based on obstetrics data from 9 Portuguese hospitals, spanning the years 2019-2020. The developed tool demonstrated strong potential for identifying data quality issues in obstetrics EHRs. Bayesian networks used in the tool showed high performance for various features with area under the receiver operating characteristic curve (AUROC) between 75% and 97%. The tool’s infrastructure and interoperable format as a FHIR Application Programming Interface (API) enables a possible deployment of a real-time data quality assessment in obstetrics settings.
Our initial assessments show promise: even when compared with physicians’ assessment of real records, the tool can reach AUROC of 88%, depending on the threshold defined. Our results also show that obstetrics clinical records are difficult to assess in terms of quality and assessments like ours could benefit from more categorical approaches of ranking between bad and good quality. This study contributes significantly to the field of EHR data quality assessment, with a specific focus on obstetrics. The combination of HL7-FHIR interoperability, machine learning techniques, and expert knowledge presents a robust, adaptable solution to the challenges of healthcare data quality. Future research should explore tailored data quality evaluations for different healthcare contexts, as well as further validation of the tool capabilities, enhancing the tool’s utility across diverse medical domains. With the widespread use of healthcare information systems, a vast amount of health data are generated, stored in electronic health records (EHRs). These data have the potential to advance medical knowledge and improve patient care, but only if it is of high quality. Data quality varies depending on its use, such as daily patient care, research, or management purposes. Poor data quality in EHRs can lead to incorrect healthcare decisions. Errors can occur at various stages, from data entry to processing and interpretation. Different approaches are needed to assess data quality based on its intended use. This article focuses on developing a tool to improve data quality in obstetrics using 3 main categories: completeness, plausibility, and conformance. Tested with data from 9 Portuguese hospitals, the tool uses methods like Bayesian networks and rule-based systems. Initial real-world testing showed promising results. However, assessing data quality remains complex and context dependent. Future research will refine the tool and expand its application.
This work is a significant step towards ensuring high-quality EHR data for clinical and research purposes.

  9. ---. “CDK4/6 Inhibitors and Endocrine Therapy in the Treatment of Metastatic Breast Cancer: A Real-World and Propensity Score-Adjusted Comparison.” Cancer Treatment and Research Communications, vol. 40, January 2024, p. 100818, doi:10.1016/j.ctarc.2024.100818.

    Introduction/Background: Hormone Receptor-positive (HR+) and Human Epidermal Growth Factor Receptor 2-negative (HER2-) breast cancer is the most common subtype, predominantly treated with endocrine therapy. The efficacy of CDK4/6 inhibitors combined with endocrine therapy in this context remains to be fully evaluated. Materials (or Patients) and Methods: This study compared the effectiveness of CDK4/6 inhibitors (palbociclib and ribociclib) in combination with an aromatase inhibitor or fulvestrant against endocrine therapy alone in patients with HR+/HER2- advanced breast cancer. The main focus was on progression-free survival (PFS) and overall survival (OS). The study involved a population treated exclusively with endocrine therapy for bone involvement, examining median OS and PFS, and adjusting for variables like stage, visceral metastasis, age, and treatment line. Results: The study found no significant OS difference between treatments with palbociclib, ribociclib, and endocrine therapy alone. However, ribociclib combined with letrozole significantly improved PFS over letrozole alone. Propensity score weighting indicated a potential 50% reduction in death risk with ribociclib compared to palbociclib, though this was not confirmed by Cox regression. Conclusion: CDK4/6 inhibitors, particularly ribociclib in combination with letrozole, show promise in improving outcomes for HR+/HER2- breast cancer patients. While palbociclib may not be superior to traditional endocrine therapy, the results underscore the need for further research. These findings could influence future treatment protocols, emphasizing the importance of personalized therapy in this patient group.

Services

I offer a wide range of services to help healthcare data reach its full potential for industry and health institutions.

Data

Data Governance, Data Cleaning, Data Pipelines and Data Management

Interoperability Implementation

Focus on specification, development and implementation of interoperability solutions.

Healthcare Standards Definition

Defining data standards for projects

Data Science Projects

From business understanding and data preparation, through model development and evaluation, to model deployment and monitoring.

Training

Teaching about interoperability, data science and health data standards

Product Management

Digital health product management, roadmap definition, DevOps implementation, specifications and deployment.