Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

NLP for Clinical Notes – Tools and Techniques

Clinicians add clinical notes to the EMR on each visit. The clinical notes are unstructured in most cases and can benefit from NLP (natural language processing) tools and techniques. Some are created by dictation software or by medical scribes. Family physicians and family practice-centric EMRs like OSCAR EMR rely on unstructured clinical notes.

natural language processing
NLP for Clinical Notes

Clinical notes, because of the unstructured nature is difficult to analyze for statistical insights. Besides, the notes may require further processing for billing and for generating problem charts. The analysis is becoming increasingly important for quality assessments as well.

NLP can be useful in automated analysis of clinical notes. Here I have listed some of the open-source tools (some maintained by me) for such automated analysis of clinical notes.

Apache cTakes for NLP

Apache cTakes (clinical Text Analysis and Knowledge Extraction System) is one of the first open-source NLP systems that extract clinical information from electronic health record unstructured text. Though it is relatively slow, it is still widely used. I have packaged it as a Quarkus application, that is fast. Quarkus (Supersonic Subatomic Java) is designed primarily for docker containers and the quarkus based containers are easy to be deployed and scaled using platforms such as Kubernetes.

SpaCy and related tools for NLP

SpaCy is an open-source python library for NLP. It features NER, POS tagging, dependency parsing, word vectors and is widely used. But spacy is not designed for clinical workflows and may not be directly usable. Scispacy is SpaCy pipeline and models for scientific/biomedical documents trained on biomedical data. MedaCy is a healthcare-specific NLP framework built over spaCy to support the fast prototyping, training, and application of medical NLP models. One of the advantages of Medacy is that it is fast and lightweight.

UMLS

Unified Medical Language System (UMLS), is a set of files and software that brings together biomedical vocabularies for health information systems. UMLS provides a set of RESTful APIs for licensed users. I have created a JavaScript wrapper for the UMLS APIs that are easy to be called from JavaScript programs. It is available from the npm package repository. See the update on UmlsBERT below.

MedCAT

Medical  Concept Annotation Tool (MedCAT) is a relatively new tool for extraction and linking of terms from vocabularies such as UMLS and SNOMED for free text in EMRs. The paper describing MedCAT is here. MedCAT models can be further refined by training on a domain-specific corpus of text. MedCAT is fast and very useful.

Word Embeddings for NLP

A word embedding is a weighted model for text where words that have the same meaning have a similar weight. It is one of the most popular methods of deep learning for NLP problems. Word2Vec is a method to construct embeddings and the word2vec model based on the entire Wikipedia corpus is available for use. This paper describes the creation of a clinical concept embedding based on a large corpus of clinical documents. I have created a gensim wrapper for this model that can be used for concept similarity search in python.

BERT and related

Bidirectional Encoder Representations from Transformers (BERT) is a technique for NLP pre-training developed by Google. Here is the highly cited official paper. BERT has replaced embeddings as the most successful NLP technique in most domains including healthcare. Some of the refined BERT models used in healthcare are BioBERT and ClinicalBERT.

It is vital to deploy these models in a scalable and maintainable manner to be available for use within EMR systems. We are working on such a framework called ‘Serverless on FHIR’. Give me a shout if you want to know more.

Update (2022): Tools for building multi-modal models.

UPDATE: May 30, 2021: The library (ckblib) is now available under MPL 2.0 license (see below). Feel free to use it in your research.

Dark Mode

ckblib (this link opens in a new window) by dermatologist (this link opens in a new window)

Tools to create a clinical knowledge graph from biomedical literature. Includes wrappers for NCBI Eutils, cTakes annotator and Neo4J

Update (Dec 2020):

Researchers from the University of Waterloo have introduced the novel concept of UmlsBERT. Current clinical embedding such as BioBERT described above are generic models, trained further on clinical corpora applying the concept of transfer learning. Most biomedical ontologies such as UMLS define the hierarchies of concepts defined in them. UmlsBERT makes use of these hierarchical group information at the pre-training stage for augmenting the clinical concept embeddings. Table 3 in the paper compares the results with other embeddings, and it is quite impressive. The GitHub repo is here
Way to go George Michalopoulos and team!

Update (Mar 2021):

Create a chatbot to talk to an FHIR endpoint using conversational AI!

Update (May 2022):

ICDBigBird: A Contextual Embedding Model for ICD Code Classification: https://arxiv.org/pdf/2204.10408.pdf

How to deploy an h2o ai model using OpenFaaS on Digitalocean in 2 minutes

H2O is an open-source, distributed and scalable machine learning platform written in JAVA. H2O supports many of the statistical & machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning and more.  OpenFaaS® (Functions as a Service) is a framework for building Serverless functions easily with Docker. Read my previous post to learn more about OpenFaaS and DO. 

H2O AI model deployment

H2O has a module aptly named sparkling water that allows users to combine the machine learning algorithms of H2O with the capabilities of Spark. Integrating these two open-source environments provides a seamless experience for users who want to make a query using Spark SQL, feed the results into H2O to build a model and make predictions, and then use the results again in Spark. For any given problem, better interoperability between tools provides a better experience.

H2O Driverless AI is a commercial package for automatic machine learning that automates some of the most difficult data science and machine learning workflows such as feature engineering, model validation, model tuning, model selection, and model deployment. H2O also has a popular open-source module called AutoML that automates the process of training a large selection of candidate models. H2O’s AutoML can be used for automating the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time-limit. AutoML makes hyperparameter tuning accessible to everyone.

H2O allows you to convert the models to either a Plain Old Java Object (POJO) or a Model Object or an Optimized (MOJO) that can be easily embeddable in any Java environment. The only compilation and runtime dependency for a generated model is the h2o-genmodel.jar file produced as the build output of these packages. You can read more about deploying h2o models here.

I have created an OpenFaaS template for deploying the exported MOJO file using a base java container and the dependencies defined in the gradle build file. Using the OpenFaaS CLI (How to Install) pull my template as below:

mkdir watersplash
cd watersplash

faas-cli template pull https://github.com/dermatologist/java-ext --prefix your-docker-uname

faas-cli new --lang java-h2o watersplash

Copy the exported MOJO zip file to the root folder along with build.gradle and settings.gradle. Make appropriate changes to handle.java as per the needs of the model, as explained here. Add http://digitaloceanIP:8080 to watersplash.yml

 provider:
  	name: openfaas
  	gateway: http://digitaloceanIP:8080

and finally:

 faas-cli up -f watersplash.yml

That’s it! Congratulations! Your model is up and running! Access it at http://digitaloceanIP:8080/function/watersplash

If you get stuck at any stage, give me a shout below. 

DHIS2 and longitudinal health records: Connecting systems to get the best of both worlds!

DHIS2 is a health information system that revolutionized the way healthcare data is managed. It is open source and is a byproduct of a multinational action research project initiated from Oslo and first implemented in India. 1Currently, DHIS2 is the world’s largest health management information system (HMIS) platform, in use by 67 low and middle-income countries. 2.28 billion (30% of the world’s population) people live in countries where DHIS2 is used.

DHIS2 is a public health information system (PHIS) where the unit of management is a group or a geographical region and not individuals. It is unfortunate that this distinction between a typical EMR (a longitudinal health record) and a public health information system to manage population health is not clear to many policymakers.

The growing popularity of machine learning and artificial intelligence applications make the PH agencies rethink their data management strategies. A longitudinal health record is essential for most ML and AI applications for creating complex predictive models. PH agencies are gradually realizing the importance of data warehouses in managing the changing healthcare data management applications and workflows. Hence, the next generation of public health information systems should be able to efficiently handle longitudinal as well as group/cross-sectional data.

The easiest strategy to adopt may be to make existing PHIS systems talk to each other by leveraging the recent advances in health information exchange. HL7 may not be ideal for this purpose as it relies on a patient-centric model. FHIR may be more capable to deal with this, but the underlying REST interface may not support real-time data exchange.

RabbitMQ and Apache Kafka are industry standard open-source messaging frameworks that can be leveraged for real-time communication between disparate systems such as DHIS2 and OSCAR EMR / OpenMRS. DHIS2 supports both out of the box, and I have modified the DHIS2 docker container optimized for message exchange. A sample Java client is also available from my fork. The repo is here.

If you have ideas/want to work on creating DHIS2 connectors for EMRs like OSCAR EMR or OpenMRS, please comment below. OpenMRS has an existing module that can pull certain reports from DHIS2.

References

1.
Braa, Monteiro, Sahay. Networks of Action: Sustainable Health Information Systems across Developing Countries. M. 2004;28(3):337. doi:10.2307/25148643

10 points to consider before adopting open-source software in eHealth

Open-source software (hereafter OSS) is a phenomenon that has revolutionized the software industry. OSS is supported by voluntary programmers, who regularly dedicate their time and energy for the common good of all. The question that immediately comes to mind is how is it sustainable? Will they continue to contribute their social hours forever? Read the programmers perspective here. But does it make sense for healthcare organizations to accept their charity always? And, how do these organizations that adopt OSS improve the sustainability of these projects? These are some of the factors to consider:

artificial intelligence

Do you have enough funding?

OSS supporters are humanists with an emancipatory worldview. OSS is fundamentally not designed for an organization that can sustain a paid product. Firstly, there is the ethical problem of exploiting the OSS community. But more importantly, healthcare organizations with enough funding tend to spend more on the long-term maintenance and customization of OSS. Hence, OSS is generally designed to be an option when you have no other option.

Does the project have a regional focus?

OSS projects generally aim to solve global problems. So be careful when you hear Canadian OSS or Danish OSS. Regional OSS is mostly just cheaper local products masquerading as OSS for funding or for other reasons. They are unlikely to have the support of the global OSS community and is prone to burnout.

Is the OSS really OSS?

Any OSS worth its salt will be on GitHub. If you cannot find the project on GitHub, you should definitely ask why.

Is it really popular?

Some OSS that masquerade as OSS claim that they have a worldwide network of developers. The GitHub stars and forks would be a reasonable indicator of the popularity. Consider an OSS for your organization only if it has a thousand stars on the GitHub sky.

Are you looking for a specific workflow support?

Is your workflow generic enough to be supported by a global network of volunteers? Well, OHIP billing workflow may not be the right process to seek OSSM support.

Do you need customization?

If you need a lot of customizations to support your workflow, then OSS may not be the ideal solutions. OSS is ideally suited for situations where you can use it out of the box.

Do you have the time?

Remember that OSS is supported by voluntary programmers. So if you need a feature, you make a request and wait. If your organization is used to demanding, then OSS is not for you. OSS project is not owned by anyone, so their priorities may be different from yours.

Do you have internal expertise?

It is far easier to use an OSS if you have someone supporting the project in your organization. OSS community tends to respect one of their own more than an organization.

Supporting Open-Source Software?

It is crucial for organizations that depend on an OSS for your day to day operations to support the project. If the project becomes unsustainable, it affects the organization too. You can support the project in many ways such as donations, coding support and infrastructural support.

Do you know what OSS means and stands for?

Does the higher management know what OSS means and stands for? It is common in healthcare organizations to adopt OSS focusing on the free aspect.

“Free software” is a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech,” not as in “free beer”.

Personally, I think the first point is the most important. OSS is designed and intended for use in areas where a paid option is not viable. In other scenarios in healthcare, you are likely to spend more for an open-source product than you spend for a regular product.

Finally, a quick mention of some noteworthy OSS in healthcare. OpenMRS is an open-source EMR started with the mission to improve healthcare delivery in resource-constrained environments. DHIS2 is web-based open-source public health information system with awesome visualization features including GIS, charts and pivot tables.

Dockerized OSCAR EMR for developers

I have created a simple docker-compose script to set up Oscar for developers. The script checks out the master branch from OSCAR repository, compile with maven, create Docker containers and deploy them.

AngularJS and Electronic Health Records

I am not an Angular expert (as yet) though I used it for one of my successful applications called LesionMapper™. For the uninitiated, AngularJS is (yet another) javascript framework that is different in many ways. It extends HTML and implements the exciting concept of two-way data binding for dynamic web apps. If you are still skeptical about whether angular is big, please take a note of their logo that has a small (in)significant subscript: ‘by Google’. My intention here is not to dissect AngularJS and compare its many features or to disambiguate the diverse terminologies such as ‘directives’ and ‘expressions’ that makes it seem more daunting than it actually is.

Photocredit AngularJS @ github and jfcherry @ flikr (Images altered)

For health professionals, the bottom line is that AngularJS makes browsers powerful and can perform some of the tasks that are traditionally relegated to the server. So how is it going to improve our EMRs? During my student days, I have seen a popular regional EMR with a dismal user-interface. I have also seen a health analytics platform with more than 100 dropdowns on a single page. I have seen doctors returning to paper after failed EMR experiments. I have seen regional clinical viewers reeling under usability concerns. Can angularJS make any difference?

when I see a company mentioning they use angular, I read it as: no tech insight, no vision, hates life. #AngularJS
— Fredrik Carlsson (@fizk) November 15, 2014

A tool cannot change everything. Angular as a tool is not going to be a panacea. But the concept may change the way we think and organize our electronic health record systems. The traditional way of seeing EMRs as data-centric models was rejected by us, health professionals. Blame it on technology averse senior doctors or blame it on the inefficient healthcare system not learning from banks and airline industry, the fact is, eHealth failed to deliver! Will a new version with an improved interface change this? Unlikely!

EHealth has to accommodate our workflow, not the other way round. Because EMR is the tool, not us, physicians!

I know this is not my idea, and we have discussed this here before. How can AngularJS change this?

AngularJS could effectively separate presentation from storage. Our data scientists could work on the data layer (let me call it enterprise EHR) and concentrate on interoperability and population health. Our interface experts could work on the presentation layer customizing it for each department and each doctor, accommodating their workflow. (Let me call it Bring Your Own EMR #BYOE). We need a glue to bring both sides together. My vote is  for JSON/REST in the short term and RDF in the long term.

I am a huge fan of reverse innovation, and I am excited to see initiatives such as http://www.bahmni.org/ building AngularJS based customization layer on top of OpenMRS.org. True open source projects such as OpenMRS foster innovation (reverse or not).. It should open the eyes of others calling themselves open-source without being truly open! (Well.. that is another story..)

Related articles

GIT for Doctors & healthcare professionals – I

Read the full series on GIT for doctors here

Type 36 JSDF Arm Suit
Type 36 JSDF Arm Suit (Photo credit: Mechanekton)

GIT for us doctors is an acronym for GastroIntestinal Tract. But the GIT I am going to talk about here has nothing to do with GastroIntestinal tract. Do you have any gut feeling about what it is going to be?

Well, GIT according to wikipedia is a distributed revision control and source code management (SCM) system with an emphasis on speed. What has that got to do with healthcare professionals and doctors? As healthcare is becoming tech savvy, healthcare professionals and some doctors have started to recognize and understand, not just completed software products, but their source code as well.

The rising prominence of open-source movement in healthcare will greatly benefit from this, as doctors start contributing actively to healthcare application design and code. When you contribute code to an open-source project, there should be a mechanism to download what others have done, maintain concurrency as others keep adding things, keep track of what you add along with others who contribute, and finally impress the master with your contribution. This process is much more complex than the conventional version control or keeping track of older version by keeping copies of different stages. So that is where GIT steps in.

My aim is not to teach you the nitty-gritties of GIT, so that you will become a master of versioning systems. GIT can be quite challenging for doctors to understand and use. (Can you believe it: It is generally used as a command line tool). My aim is to make you comfortable enough to understand and use GIT in a user-friendly way so that when open source initiatives like openmrs.org talks about using GIT, you will know what they mean. BTW if you use EMRs but have not heard about openMRS.org, do take a look. Maybe you (like me) will also be enticed by their motto “ Write Code. Save Lives.”

Deutsch: Logo von GitHub
Deutsch: Logo von GitHub (Photo credit: Wikipedia)

I will start with actual steps in the next post. In the meantime, join the facebook of GIT called GitHub. https://github.com. If you feel like following someone checkout this guy 😉 https://github.com/dermatologist . Don’t download and install anything as yet. Just register for a new account.

Next, download and install this free ‘windows’ to GIT http://www.sourcetreeapp.com/ from a software company called Atlassian. This (surprisingly not so popular yet!) software will make your first experience with GIT painless, I promise.

Will be back again with more. BTW did I tell you that GIT may be useful for collaborative writing too!? Unfortunately GIT was not around when I wrote this article in a dermatology journal in 2007. http://www.ncbi.nlm.nih.gov/pubmed/18032878

Read the full series on GIT for doctors here