Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

Kedro for multimodal machine learning in healthcare 

Healthcare data is heterogeneous: reports, tabular data, and images can all describe the same patient. Combining these modalities in a single model is challenging for several reasons. The data types differ in structure, format, and scale, which makes them hard to integrate. Some modalities may be missing or incomplete, which makes it hard to train a model effectively. Each modality may also need its own pre-processing and feature extraction techniques, which further complicates integration. Finally, large-scale annotated datasets that cover multiple modalities are scarce. Despite these challenges, advances in deep learning, multi-task learning and transfer learning are making it possible to develop models that combine multiple modalities and achieve reliable performance.

Kedro for multimodal machine learning

Kedro is an open-source Python framework that helps data scientists and engineers organize their code, increase productivity and collaboration, and make it easier to deploy their models to production. It integrates with popular libraries such as Pandas, TensorFlow and PySpark, and follows best practices from software engineering, such as modularity and code reusability. Kedro supplies a standardized structure for organizing code, handling data and configuration, and running experiments. It also includes built-in support for version control, logging, and testing, making it easier to implement reproducible and maintainable pipelines. Additionally, Kedro pipelines can be deployed on cloud platforms like AWS, GCP or Azure. This makes it a powerful tool for creating robust and scalable data science and data engineering pipelines.

I have built a few Kedro packages that make multi-modal machine learning easier in healthcare. The packages supply prebuilt pipelines for preprocessing image, tabular and text data, and for building fusion models that can be trained on multi-modal data and deployed easily. The text preprocessing package currently supports BERT and CNN-text models. There is also a template that you can copy to build your own pipelines on top of the preprocessing pipelines that I have built. Any number and combination of data types are supported. Additionally, like any other Kedro pipeline, these can be deployed on Kubeflow and Vertex AI. Do comment below if you find these tools useful in your research.
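To make the idea of a fusion model concrete, here is a minimal sketch of feature-level fusion by concatenation, written in plain Python. It is not the code from the packages above: the modality names, the feature sizes and the `fuse_features` helper are all assumptions made for illustration. A missing modality is zero-filled and flagged, which is one simple way to handle the incomplete data discussed earlier.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical feature sizes per modality; real sizes depend on the encoders used.
FEATURE_SIZES = {"text": 3, "tabular": 2, "image": 4}


def fuse_features(features: Dict[str, Optional[Sequence[float]]]) -> List[float]:
    """Concatenate per-modality feature vectors into one fused vector.

    A missing modality (None or absent) is replaced by zeros, and one
    presence flag per modality is appended so that a downstream model can
    tell "missing" apart from "all zeros".
    """
    fused: List[float] = []
    flags: List[float] = []
    for name, size in FEATURE_SIZES.items():
        vector = features.get(name)
        if vector is None:
            fused.extend([0.0] * size)
            flags.append(0.0)
        else:
            if len(vector) != size:
                raise ValueError(f"{name}: expected {size} features, got {len(vector)}")
            fused.extend(float(v) for v in vector)
            flags.append(1.0)
    return fused + flags


# Illustrative patient with text and tabular features but no image.
patient = {"text": [0.1, 0.2, 0.3], "tabular": [54.0, 1.0], "image": None}
fused = fuse_features(patient)
print(len(fused), fused[-3:])
```

The fused vector can then be passed to any classifier. Other fusion strategies, such as combining the predictions of per-modality models, follow the same pattern of bringing modalities to a common representation first.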

Link to the repository below.

kedro-multimodal by dermatologist

Template for multi-modal machine learning in healthcare using Kedro. Combine reports, tabular data and images using various fusion methods.

Six things data scientists in healthcare should know

Healthcare, like most other fields, is eager to get on the data science bandwagon. Data scientists can make a huge difference in the way big data is utilized for clinical decision-making. However, there are paradigmatic differences in the way data scientists from quantitative fields view the world, compared to their clinical counterparts. This is especially true in the emerging fields of machine learning and artificial intelligence. This may lead to considerable inefficiencies. As a person trained in both fields, here is my take on this.

Data scientists
Credit: Dasaptaerwin, CC0, via Wikimedia Commons

Data scientists should focus on the problem and not the solutions

Data scientists are excited about the latest GPT or BERT. Data scientists tend to refine the model a bit more using 10 more GPUs! In the process, they tend to solve problems that do not exist. From my experience practicing medicine in extremely resource-poor areas, simple solutions are valued more than BERT running on Kubernetes! This is true in the developed world as well, and many teams may have fundamental data needs that need to be tackled first.

Explanation comes before prediction

Emerging machine learning methods prioritize prediction accuracy, compromising on explainability in the process. Clinicians, in most cases, can neither use nor trust a model that arrives at a conclusion without showing how it got there. Hence, in the clinical domain, a simple logistic regression model may be more acceptable than a deep neural network. Parsimony is the key, and a bit of feature selection to ensure parsimony will always be appreciated.

You need to know the clinical terminologies

A basic understanding of clinical terminologies and terminology systems such as SNOMED and ICD is vital. It helps in understanding the clinical community better. Any healthcare analytics effort has to consider variations in terminology and adopt a standard system for consistency. Any tool that data scientists build for the clinical community should support terminology systems.

Biostatistics is more pervasive than you think

Most healthcare professionals are trained in biostatistics. Hence, their thinking leans towards population, sampling, randomization, blinding and showing a ‘statistically significant’ difference. Moving towards machine learning needs a paradigmatic shift. It may be useful to have a discussion on this at the outset.

Classes are of unequal importance

In healthcare, finding one class (e.g. cancer) is more important than finding the other (e.g. no cancer). One class may need active intervention to save lives. Hence, sensitivity and specificity are more important than accuracy!
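A small worked example shows why accuracy alone can mislead. The sketch below uses made-up labels for ten patients, one of whom has cancer, and a hypothetical model that predicts "no cancer" for everyone.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 is the disease class."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Made-up data: 1 patient with cancer, 9 without; the model always says "no cancer".
y_true = [1] + [0] * 9
y_pred = [0] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
sensitivity, specificity = sensitivity_specificity(y_true, y_pred)
print(accuracy, sensitivity, specificity)  # 0.9 0.0 1.0
```

The model is 90% accurate and still misses the only patient who needed intervention, which is exactly what the sensitivity of 0 reveals.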

Life is precious!

In healthcare, there is no room for error. Some decisions may have disastrous consequences, while a few others may save lives. As a data scientist in the healthcare domain, you should be cognizant of the fact that healthcare data is different from banking or airline data.

Open-source for healthcare

This post is meant to be an instruction guide for healthcare professionals who would like to join my projects on GitHub.

eHealth Programmer Girl

What is a contribution?

Contribution is not always coding. You can clean up stuff, add documentation, instructions for others to follow etc. Issues and feature requests should be posted under the ‘issues’ tab and general discussions under the ‘Discussions’ tab if one is available.

How do I contribute?

How do I develop?

  • The .devcontainer folder will have the configuration for the docker container for development.
  • Version bump action (if present) will automatically bump version based on the following terms in a commit message: major/minor/patch. Avoid these words in the commit message unless you want to trigger the action.
  • Most repositories have GH actions to generate and deploy documentation and changelog.
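To see how an ordinary commit message can trigger a version bump by accident, here is a minimal sketch of keyword detection. The matching rules (whole words, case-insensitive, major before minor before patch) and the `detect_bump` function are assumptions for illustration; the action actually configured in a given repository may match differently.

```python
import re
from typing import Optional

# Checked in this order, so "major" wins if several terms appear (an assumption).
BUMP_TERMS = ("major", "minor", "patch")


def detect_bump(commit_message: str) -> Optional[str]:
    """Return the first bump term found as a whole word, or None if there is none."""
    lowered = commit_message.lower()
    for term in BUMP_TERMS:
        if re.search(rf"\b{term}\b", lowered):
            return term
    return None


print(detect_bump("Fix minor typo in README"))  # minor
print(detect_bump("Update documentation"))      # None
```

Under these assumed rules, rewording the first message to "Fix small typo in README" avoids the unintended bump.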

What do I do next?

  • My repositories (so far) are small enough for beginners to get the big picture and make meaningful contributions.
  • Don’t be discouraged if you make mistakes. That is how we all learn.

There’s no better time than now to choose a repo to contribute to!

Interoperability for Doctors and Healthcare professionals Part I

It is important for health information systems to talk to each other. Unfortunately, they speak different languages. This article is not for eHealth experts to understand the nuances of interoperability (HIE), but for health care professionals to have an idea about what is out there and what can be expected in the future.

When we consider HIE, we have to think about what is being exchanged (package), how the information is organized (format) and how it is being transported (protocol). Though it is not essential to know them, a few terms that you might recognize are HL7 and XML for format, and HTTP and TCP/IP for protocol. (Have you heard of MLLP? Google it!) The donor has the information in a certain format and protocol, while the recipient expects it in a particular format and protocol.

At the core of all HIE platforms such as MirthConnect or OrionHealth’s Rhapsody, is an engine that does this conversion. Format and protocol of donor to format and protocol of recipient. Simple eh?
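As a rough illustration of the format half of that conversion, the sketch below turns a pipe-delimited HL7 v2 lab result into JSON. The sample message, the output layout and the function names are made up for this example, and the parsing is deliberately naive (no escape sequences, repetitions or components beyond the first `^`). It is not how MirthConnect or Rhapsody work internally.

```python
import json
from typing import Dict, List

# Made-up HL7 v2 message; segments are separated by carriage returns.
SAMPLE_MESSAGE = "\r".join([
    r"MSH|^~\&|LAB|HOSP|EMR|CLINIC|201412110830||ORU^R01|MSG0001|P|2.3",
    "PID|1||12345||DOE^JOHN",
    "OBX|1|NM|GLU^Glucose||5.4|mmol/L",
])


def parse_hl7_v2(message: str) -> Dict[str, List[List[str]]]:
    """Group segments by their three-letter name, each split into fields on '|'."""
    segments: Dict[str, List[List[str]]] = {}
    for line in message.strip().split("\r"):
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def lab_result_to_json(message: str) -> str:
    """Convert the donor's HL7 v2 message into a (hypothetical) JSON layout."""
    segments = parse_hl7_v2(message)
    pid = segments["PID"][0]
    family, _, given = pid[5].partition("^")       # PID-5: patient name
    results = []
    for obx in segments.get("OBX", []):
        code, _, label = obx[3].partition("^")     # OBX-3: observation identifier
        results.append({"code": code, "name": label,
                        "value": obx[5],           # OBX-5: observation value
                        "unit": obx[6]})           # OBX-6: units
    return json.dumps({"patient_id": pid[3],       # PID-3: patient identifier
                       "family_name": family, "given_name": given,
                       "results": results}, indent=2)


converted = lab_result_to_json(SAMPLE_MESSAGE)
print(converted)
```

The protocol half (for example, receiving over MLLP and forwarding over HTTP) is a separate step that a real engine handles alongside this conversion.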

Is pragmatic interoperability the best solution?- M. Martineau @eHealthMusings explores a pragmatic Rhapsody approach http://t.co/XMkMXAEdt7

— Orion Health Canada (@OrionHealthCA) December 11, 2014

Now, most of these platforms have a user interface or IDE for making this connection. You can also introduce certain filters at this stage. Enterprise systems like Rhapsody present an attractive visual interface, whereas open-source solutions may not be very user-friendly.

What else can the engine do? It usually keeps a log of all package deliveries and whether each delivery was successful. If a delivery fails, it can try again and alert the maintenance team through a console. The console can even be mobile, as in Rhapsody. Though the engine can store the package itself for a limited time, storing the package is not really its job.
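The logging, retry and alert behaviour described above can be sketched in a few lines. The `DeliveryLog` class, the `deliver` function and the limit of three attempts are hypothetical; real engines expose this through configuration rather than code like this.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DeliveryLog:
    entries: List[str] = field(default_factory=list)  # one line per attempt
    alerts: List[str] = field(default_factory=list)   # raised for the maintenance team


def deliver(package_id: str, send: Callable[[], bool],
            log: DeliveryLog, max_attempts: int = 3) -> bool:
    """Try to send a package, logging every attempt and alerting if all attempts fail."""
    for attempt in range(1, max_attempts + 1):
        try:
            ok = send()
        except ConnectionError:
            ok = False
        status = "delivered" if ok else "failed"
        log.entries.append(f"{package_id} attempt {attempt}: {status}")
        if ok:
            return True
    log.alerts.append(f"{package_id} undelivered after {max_attempts} attempts")
    return False


# Simulated recipient that is unreachable twice and then accepts the package.
outcomes = iter([False, False, True])
log = DeliveryLog()
delivered = deliver("lab-report-001", lambda: next(outcomes), log)
print(delivered, log.entries, log.alerts)
```

Note that the function keeps no copy of the package once it gives up, in line with the point that long-term storage is not the engine's job.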

The donors could be:

  • A single department in a hospital sending lab reports.
  • All departments in a hospital sending all sorts of information.
  • Several hospitals in a region.

The recipient could be:

  • Another department in the same hospital expecting a lab test report.
  • A family physician who wants real time access to the lab reports for his patients admitted in the hospital.
  • A researcher who wants to know blood sugar status of all the diabetes patients. (population health)

We need a separate layer between the engine and the recipient to support all these use cases. Let us call this layer mediator.

The mediator can pull data in real time from donors or store it in a local database. The first is the federated model and the second is the centralized model. Federated is slow but up-to-date, while centralized is fast but may not be current. A mixed model combines both and is preferred. The so-called clinical viewers are federated mediators with a web interface.
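The three models can be contrasted in a short sketch. The `Mediator` class below is hypothetical, and its mixed mode (try the donor first, fall back to the local copy) is only one possible way of combining the two approaches.

```python
from typing import Callable, Dict, Optional


class Mediator:
    def __init__(self, fetch_from_donor: Callable[[str], Optional[str]]):
        self._fetch = fetch_from_donor
        self._store: Dict[str, str] = {}  # stands in for the local database

    def get_federated(self, patient_id: str) -> Optional[str]:
        """Always ask the donor: up to date, but slow and dependent on the donor."""
        return self._fetch(patient_id)

    def get_centralized(self, patient_id: str) -> Optional[str]:
        """Only read the local copy: fast, but possibly stale."""
        return self._store.get(patient_id)

    def get_mixed(self, patient_id: str) -> Optional[str]:
        """Prefer live data and refresh the local copy; fall back to it on failure."""
        try:
            fresh = self._fetch(patient_id)
        except ConnectionError:
            fresh = None
        if fresh is not None:
            self._store[patient_id] = fresh
            return fresh
        return self._store.get(patient_id)


# Simulated donor that can be switched offline.
donor_records = {"p1": "glucose 5.4 mmol/L"}
donor_state = {"online": True}


def fetch(patient_id: str) -> Optional[str]:
    if not donor_state["online"]:
        raise ConnectionError("donor unreachable")
    return donor_records.get(patient_id)


mediator = Mediator(fetch)
first = mediator.get_mixed("p1")     # live value, also stored locally
donor_records["p1"] = "glucose 6.1 mmol/L"
donor_state["online"] = False
second = mediator.get_mixed("p1")    # donor is down, so the older local copy is returned
print(first, "|", second)
```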

Emerging paradigms like NoSQL and RDF may be ideal for data representation in mixed model. I have discussed RDF before. Will discuss NoSQL soon!

Psoriasis support : eHealth gaming tools for patient engagement

Psoriasis manum (Photo credit: Wikipedia)

Here is the IFPA survey to compare 17 different strategies and activities that can be used to advance psoriasis education, advocacy and awareness. Preliminary results of the survey will be presented on World Psoriasis Day and the final results will be announced at the 4th World Psoriasis & Psoriatic Arthritis Conference in Stockholm July 8-11, 2015.

I have listed below some of my random ideas on eHealth tools for patient engagement in psoriasis:

An agent-based model (ABM) offers visual simulations of complex systems that can be displayed in a web browser. The psoriasis disease process can be modelled using psoriasis patients as ‘turtles’, with the known probabilities of auto-remission, exacerbation, response to conventional treatments and response to newer drugs added to the model. Patients and caregivers could interact with the model to understand how treatment decisions affect quality of life. An ABM could be an innovative and useful web-based patient education tool that portrays the reality of psoriasis without giving any false promises. The patient and those in the patient’s circle of care would understand the odds of improving quality of life.
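A minimal sketch of such a model in Python follows. Each patient is an agent that moves between three disease states once a month. The states and every probability in the two tables are invented purely for illustration and are not clinical estimates; a real model would use published rates and a visual front end.

```python
import random
from collections import Counter
from typing import Dict

# Hypothetical monthly transition probabilities (NOT clinical data).
UNTREATED: Dict[str, Dict[str, float]] = {
    "remission": {"remission": 0.80, "mild": 0.20},
    "mild": {"remission": 0.05, "mild": 0.70, "severe": 0.25},
    "severe": {"mild": 0.10, "severe": 0.90},
}
TREATED: Dict[str, Dict[str, float]] = {
    "remission": {"remission": 0.90, "mild": 0.10},
    "mild": {"remission": 0.25, "mild": 0.65, "severe": 0.10},
    "severe": {"mild": 0.40, "severe": 0.60},
}


def step(state: str, treated: bool, rng: random.Random) -> str:
    """Move one patient (agent) to next month's state."""
    table = TREATED if treated else UNTREATED
    options = table[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def simulate(n_patients: int, months: int, treated: bool, seed: int = 0) -> Counter:
    """Run all agents forward and count how many end up in each state."""
    rng = random.Random(seed)
    patients = ["mild"] * n_patients
    for _ in range(months):
        patients = [step(state, treated, rng) for state in patients]
    return Counter(patients)


untreated_counts = simulate(n_patients=200, months=12, treated=False)
treated_counts = simulate(n_patients=200, months=12, treated=True)
print("untreated:", dict(untreated_counts))
print("treated:  ", dict(treated_counts))
```

Letting a patient toggle the `treated` flag and watch the counts change is the kind of interaction the paragraph above has in mind.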

Psoriasis: The naked truth (Photo credit: SomosMedicina)

An Android or iPhone app to calculate and log the patient’s PASI score would be a less obtrusive disease monitoring tool. The app may be designed to send the log to the patient’s caregiver. I have not checked the Apple App Store or Google Play; such apps probably already exist.
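The calculation at the heart of such an app is short. The sketch below follows the PASI formula as it is commonly described: for each of four body regions, erythema, induration and desquamation are graded 0 to 4, the affected area is graded 0 to 6, and the regions are weighted 0.1 (head), 0.2 (upper limbs), 0.3 (trunk) and 0.4 (lower limbs). The function name, the dictionary layout and the sample grades are assumptions for illustration.

```python
from typing import Dict

REGION_WEIGHTS = {"head": 0.1, "upper_limbs": 0.2, "trunk": 0.3, "lower_limbs": 0.4}
SEVERITY_SIGNS = ("erythema", "induration", "desquamation")


def pasi_score(regions: Dict[str, Dict[str, int]]) -> float:
    """Compute PASI (0 to 72) from per-region severity grades and area grades."""
    total = 0.0
    for name, weight in REGION_WEIGHTS.items():
        region = regions[name]
        for sign in SEVERITY_SIGNS:
            if not 0 <= region[sign] <= 4:
                raise ValueError(f"{name} {sign} must be between 0 and 4")
        if not 0 <= region["area"] <= 6:
            raise ValueError(f"{name} area must be between 0 and 6")
        severity = sum(region[sign] for sign in SEVERITY_SIGNS)
        total += weight * severity * region["area"]
    return round(total, 1)


# Made-up assessment for one visit.
visit = {
    "head": {"erythema": 2, "induration": 1, "desquamation": 1, "area": 2},
    "upper_limbs": {"erythema": 2, "induration": 2, "desquamation": 1, "area": 3},
    "trunk": {"erythema": 3, "induration": 2, "desquamation": 2, "area": 2},
    "lower_limbs": {"erythema": 2, "induration": 2, "desquamation": 2, "area": 4},
}
score = pasi_score(visit)
print(score)  # 17.6
```

Logging then amounts to storing each score with its date, so that the trend can be shared with the caregiver.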

A ‘push’ strategy such as email alerts is unlikely to work for psoriatics. An innovative strategy game where the body is modelled as a kingdom and the immunological perturbations as a T-cell mutiny could be a useful engagement tool. Vascular and systemic changes could also be part of the game. The game would be web-based and would continue for a long time, with the patient required to log in periodically to make strategic alterations (treatment choices). Every time the patient logs in to the game, medication reminders would be displayed. The game would mimic reality, with changes reflecting new clinical studies. New clinical studies that have an impact on the ‘game plan’ would be available under the ‘game resources’ for everyone to read. Reading and understanding these resources would improve performance in the game.

SUSie: SUS based questionnaire for assessing usability and physician attitude toward health information exchange

evaluation of eyetracking after an usability test (Photo credit: Wikipedia)

Health information exchange (HIE) allows healthcare providers and patients to access and securely share medical information electronically. Several organizations are now emerging to provide both form and function for HIE efforts, at both independent and governmental/regional levels. However, the biggest challenge is change management, as healthcare providers are exposed to one more ICT tool that they need to master to provide quality care.

There are no formal tools to study individual and organizational attitude towards HIE or to measure their usability. Physician attitude towards the impact of HIE on reducing healthcare costs, improving quality of patient care, saving time and their concern about data privacy and security are important in HIE adoption. Usability is also of vital importance in the meaningful use of HIE tools.

SUSie (SUS for HIE) is an attempt at creating a useful tool for measuring the above factors. It is modelled on the System Usability Scale (SUS), one of the most widely used questionnaires for measuring perceptions of usability. Five additional questions were added to assess factors that are specific to HIE. Each item is scored on a five-point scale ranging from Strongly disagree (1) to Strongly agree (5). The ratio of positively and negatively worded questions is maintained, and the final multiplication factor was changed to 1.67 to represent the final score on a scale of 100. I hope that this will make the interpretation similar to SUS and benefit from the prior experience available for SUS. The questions and details of scoring are explained below.

If you use SUSie, please cite this webpage and the articles below:

  • Brooke, J. (1996). “SUS: a “quick and dirty” usability scale”. In P. W. Jordan, B. Thomas, B. A. Weerdmeester, & A. L. McClelland. Usability Evaluation in Industry. London: Taylor and Francis.
  • Wright, Adam et al. “Physician attitudes toward health information exchange: results of a statewide survey.” Journal of the American Medical Informatics Association 17.1 (2010): 66-70.
  • Eapen BR (2014). “SUSie: SUS based questionnaire for assessing usability and provider attitude toward health information exchange.” Applied Bimatics – An Informatics & eHealth Blog.[Internet] Accessible from: http://bioblog.gulfdoctor.net/2014/06/susie-hie-usability-physician-attitude.html

Questions:

  1. I think that I would like to use this Health Information Exchange System frequently. 
  2. I found this Health Information Exchange System unnecessarily complex. 
  3. I think this Health Information Exchange System will reduce healthcare costs. 
  4. I think that I would need the support of a technical person to be able to use this Health Information Exchange System. 
  5. I thought this Health Information Exchange System was easy to use. 
  6. I thought there was too much inconsistency in this Health Information Exchange System. 
  7. I think this Health Information Exchange System will improve Quality of Patient Care. 
  8. I found this Health Information Exchange System very cumbersome to use. 
  9. I found the various functions in this Health Information Exchange System were well integrated. 
  10. I am concerned about the privacy and security of healthcare information on this Health Information Exchange System. 
  11. I found this Health Information Exchange System can save me time. 
  12. I needed to learn a lot of things before I could get going with this Health Information Exchange System. 
  13. I would imagine that most people would learn to use this Health Information Exchange System very quickly. 
  14. I found that I had to significantly change my workflow to use this Health Information Exchange System.
  15. I felt very confident using this Health Information Exchange System.

SUSie uses the following response format: a five-point scale ranging from Strongly disagree (1) to Strongly agree (5).

Scoring SUSie

  • For odd-numbered items: subtract one from the user response.
  • For even-numbered items: subtract the user response from 5.
  • This scales all values from 0 to 4 (with four being the most positive response).
  • Add up the converted responses for each user and multiply that total by 1.67.
  • This converts the range of possible values from 0 to 60 into a range of approximately 0 to 100.
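The scoring steps above can be written as a short function. This is a sketch with a hypothetical function name, not an official implementation. It keeps the multiplier of 1.67 stated above, so a perfect questionnaire scores 100.2; using 100/60 instead would cap the score at exactly 100.

```python
from typing import Sequence

N_ITEMS = 15
SCALE_FACTOR = 1.67  # multiplier given in the scoring steps


def susie_score(responses: Sequence[int]) -> float:
    """Score one respondent's answers to items 1..15, each on a 1 to 5 scale."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError(f"item {item_number}: response must be between 1 and 5")
        if item_number % 2 == 1:
            total += response - 1   # odd-numbered (positively worded) items
        else:
            total += 5 - response   # even-numbered (negatively worded) items
    return total * SCALE_FACTOR


neutral = susie_score([3] * N_ITEMS)
best = susie_score([5 if n % 2 == 1 else 1 for n in range(1, N_ITEMS + 1)])
print(round(neutral, 1), round(best, 1))  # 50.1 100.2
```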

Interpreting SUSie (Based on SUS)

  • The value is not a percentage.
  • Average value is approximately 68.
  • This is not a validated questionnaire.
  • A percentile graph for SUS and other relevant information is available here: http://www.measuringusability.com/sus.php (courtesy: Jeff Sauro)
  • SUSie may be used to compare groups or for comparing pre and post intervention.

I have created a wiki page for updates. Please add any use-cases you can think of to the Wiki page.