Bell Eapen

Physician | HealthIT Developer | Digital Health Consultant

Open-source for healthcare

This post is meant to be an instruction guide for healthcare professionals who would like to join my projects on GitHub.


What is a contribution?

Contributing is not always about coding. You can also clean up the repository, add documentation, or write instructions for others to follow. Post issues and feature requests under the ‘Issues’ tab, and general discussions under the ‘Discussions’ tab if one is available.

How do I contribute?

How do I develop?

  • The .devcontainer folder holds the configuration for the Docker container used for development.
  • The version bump action (if present) automatically bumps the version based on the following terms in a commit message: major/minor/patch. Avoid these words in commit messages unless you want to trigger the action.
  • Most repositories have GitHub Actions to generate and deploy the documentation and changelog.
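
To see why a stray keyword matters, here is a minimal sketch of how such a version bump could work. It is not the code of the action itself: it assumes a case-insensitive whole-word match and a MAJOR.MINOR.PATCH version string, and the names detect_bump and bump_version are hypothetical.

```python
import re

# Keywords named in the post; checking them in this order is an assumption.
BUMP_KEYWORDS = ("major", "minor", "patch")


def detect_bump(commit_message):
    """Return the first bump keyword found in the message, or None."""
    for keyword in BUMP_KEYWORDS:
        if re.search(rf"\b{keyword}\b", commit_message, flags=re.IGNORECASE):
            return keyword
    return None


def bump_version(version, keyword):
    """Apply a semantic-version bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if keyword == "major":
        return f"{major + 1}.0.0"
    if keyword == "minor":
        return f"{major}.{minor + 1}.0"
    if keyword == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version  # no keyword: leave the version unchanged


# An innocent message that would still trigger a bump under this rule.
message = "Fix a minor typo in the README"
print(bump_version("1.2.3", detect_bump(message)))
```

Under these assumptions, a commit that only fixes "a minor typo" would still release a new minor version, which is why the keywords are best avoided in ordinary commit messages.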

What do I do next?

  • My repositories (so far) are small enough for beginners to get the big picture and make meaningful contributions.
  • Don’t be discouraged if you make mistakes. That is how we all learn.

There’s no better time than now to choose a repo to contribute to!

Clinical knowledge representation for reuse

The need for computerized clinical decision support has become increasingly obvious during the COVID-19 pandemic. The initial emphasis has been on ‘replacing’ the clinician, which is impossible or impractical for a variety of reasons. Pragmatically, clinical decision support systems could provide clinical knowledge support for clinicians to make time-sensitive decisions with whatever information they have at the point of patient care.

Siobhán Grayson, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Providing clinical decision support requires some formal way of representing clinical knowledge and complex algorithms for sophisticated inference. In knowledge management terms, the information needs to be transformed into actionable knowledge. Knowledge needs to be represented and stored in a way conducive to easy inference (knowledge reuse) [1]. I have been exploring this domain for a considerable period of time, from ontologies to RDF datasets. With the advent of popular graph databases (especially Neo4j), a graph seems to be a good knowledge representation method for clinical purposes.

To cut a long story short, I have been working on building a suite of Java libraries to support knowledge extraction, annotation and transformation to a graph schema for inference. I have not open-sourced it yet as I have not decided on what license to use. However, I am posting some preliminary information here to assess interest. Please give me a shout if you share an interest or see some potential applications for this. As always, I am open to collaboration.

The Java package consists of three modules. The ‘library’ module wraps the NCBI’s E-Utils API to harvest published article abstracts, if that is your knowledge source. Though data extraction from clinical notes in EMRs is a recent trend, it is challenging because of unstructured data and lack of interoperability. The ‘qtakes’ module provides a programmable interface to my quick-ctakes, the Quarkus-based Apache cTAKES, a fast clinical text annotation engine. Finally, the ‘graph’ module provides the Neo4j models, repositories and services for abstracting the extracted knowledge as a knowledge graph.

The clinical knowledge graph (ckb) consists of entities such as Disease, Treatment and Anatomy, with appropriate relationships and Cypher queries defined between them. The module exposes services that Java applications can consume. It will be available as a Maven artifact once I complete it.
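
To give a feel for what storing such a graph might involve, here is a minimal Python sketch that builds a parameterized Cypher statement linking two entities. It is not taken from the library, which is written in Java. The entity labels come from the description above, while the TREATS relationship, the name property and the merge_relationship function are hypothetical.

```python
import re

# Entity labels named in the post; relationship types are hypothetical.
ALLOWED_LABELS = {"Disease", "Treatment", "Anatomy"}


def merge_relationship(src_label, src_name, rel_type, dst_label, dst_name):
    """Build a parameterized Cypher statement linking two named entities.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are validated here before being placed in the query text.
    """
    if src_label not in ALLOWED_LABELS or dst_label not in ALLOWED_LABELS:
        raise ValueError("unknown entity label")
    if not re.fullmatch(r"[A-Z_]+", rel_type):
        raise ValueError("relationship type must be UPPER_SNAKE_CASE")
    query = (
        f"MERGE (a:{src_label} {{name: $src}}) "
        f"MERGE (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"src": src_name, "dst": dst_name}


query, params = merge_relationship(
    "Treatment", "metformin", "TREATS", "Disease", "type 2 diabetes"
)
print(query)
print(params)
```

Using MERGE rather than CREATE keeps the graph free of duplicate nodes when the same concept is extracted from many abstracts. The entity names travel as query parameters instead of being pasted into the query text.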

UPDATE: May 30, 2021: The library (ckblib) is now available under MPL 2.0 license (see below). Feel free to use it in your research.

  1. Toward a Theory of Knowledge Reuse: Types of Knowledge Reuse Situations and Factors in Reuse Success. Journal of Management Information Systems. Published online May 31, 2001:57-93. doi:10.1080/07421222.2001.11045671
Cite this article as: Eapen BR. (April 28, 2021). Nuchange.ca - Clinical knowledge representation for reuse. Retrieved May 24, 2022, from https://nuchange.ca/2021/04/clinical-knowledge-representation-for-reuse.html.

FHIR and public health data warehouses

First posted on CanEHealth.com

The provincial government is building a connected health care system centred around patients, families and caregivers through the newly established Ontario Health Teams (OHTs). As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off-the-shelf solutions are pricey, while open-source solutions such as DHIS2 are not popular in Canada. Some of the public health units have existing systems, and it would be too resource-intensive to switch to another system. The interoperability challenge needs an innovative solution, beyond finding a single provincial EMR.


We have written about the theoretical aspects, especially the need to envision public health information systems (PHIS) separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams.

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution for a scalable data warehouse for health information system integration. Public health databases are vital to the community for efficient planning, surveillance and effective interventions, and public health data needs to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage, with an integrated Elasticsearch stack; Kibana provides the visualization engine. PHIS-DW can support complex algorithms for disease surveillance, such as machine learning methods, hidden Markov models, and Bayesian and multivariate analytics. PHIS-DW is a work in progress, and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications.
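
As a rough illustration of what storing FHIR resources in Elasticsearch can look like, the sketch below serializes resources into the newline-delimited body expected by the Elasticsearch _bulk endpoint. It is not taken from the PHIS-DW repository. The index name fhir-observation and the function to_bulk_ndjson are assumptions made for this example.

```python
import json


def to_bulk_ndjson(resources, index_name="fhir-observation"):
    """Serialize FHIR resources into an Elasticsearch _bulk request body.

    Each resource becomes two lines: an action line and the document itself.
    Every line, including the last, is terminated by a newline.
    """
    lines = []
    for resource in resources:
        action = {"index": {"_index": index_name}}
        if "id" in resource:
            # Reuse the FHIR logical id as the document id (an assumption).
            action["index"]["_id"] = resource["id"]
        lines.append(json.dumps(action))
        lines.append(json.dumps(resource))
    return "".join(line + "\n" for line in lines)


observation = {
    "resourceType": "Observation",
    "id": "obs-1",
    "status": "final",
    "code": {"text": "Body temperature"},
    "valueQuantity": {"value": 38.4, "unit": "Cel"},
}
print(to_bulk_ndjson([observation]))
```

Because the FHIR resource is stored as-is, Kibana can aggregate on fields such as code.text or valueQuantity.value without a separate mapping step between the FHIR model and the warehouse schema.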

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, the FHIR bundle makes it possible to send observations to FHIR servers without the associated patient resource, thereby ensuring reasonable privacy. This is especially useful in the surveillance of pandemics such as COVID-19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 are under consideration. If any of the OHTs find our approach interesting, give us a shout.
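
The idea of sending observations without the patient can be sketched as follows. This is a minimal example and not the PHIS-DW implementation: the function observation_bundle is hypothetical, and it only drops the subject reference. A real de-identification step would also need to review other fields (performer, encounter, free-text notes).

```python
import json


def observation_bundle(observations):
    """Wrap Observation resources in a FHIR transaction Bundle."""
    entries = []
    for obs in observations:
        # Drop the subject reference so no patient identity leaves the source.
        anonymous = {key: value for key, value in obs.items() if key != "subject"}
        entries.append({
            "resource": anonymous,
            "request": {"method": "POST", "url": "Observation"},
        })
    return {"resourceType": "Bundle", "type": "transaction", "entry": entries}


sample = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "8310-5",
            "display": "Body temperature",
        }]
    },
    "subject": {"reference": "Patient/123"},  # made-up patient reference
    "valueQuantity": {"value": 38.4, "unit": "Cel"},
}

print(json.dumps(observation_bundle([sample]), indent=2))
```

The subject element is optional on an Observation, so the stripped resource can still be posted. The original dictionary is left untouched, which lets the sending system keep its identified copy.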

BTW, have you seen Drishti, our framework for FHIR based behavioural intervention? 

NLP for Clinical Notes – Tools and Techniques

Clinicians add clinical notes to the EMR on each visit. Some notes are created by dictation software or by medical scribes. The clinical notes are unstructured in most cases and can benefit from NLP (natural language processing) tools and techniques. Family physicians and family practice-centric EMRs like OSCAR EMR rely on unstructured clinical notes.


Because of their unstructured nature, clinical notes are difficult to analyze for statistical insights. Besides, the notes may require further processing for billing and for generating problem charts. Such analysis is becoming increasingly important for quality assessments as well.

NLP can be useful in automated analysis of clinical notes. Here I have listed some of the open-source tools (some maintained by me) for such automated analysis of clinical notes.

Apache cTAKES for NLP

Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) is one of the first open-source NLP systems that extract clinical information from the unstructured text of electronic health records. Though it is relatively slow, it is still widely used. I have packaged it as a Quarkus application, which is fast. Quarkus (Supersonic Subatomic Java) is designed primarily for Docker containers, and Quarkus-based containers are easy to deploy and scale using platforms such as Kubernetes.

spaCy and related tools for NLP

spaCy is a widely used open-source Python library for NLP. It features NER, POS tagging, dependency parsing and word vectors. But spaCy is not designed for clinical workflows and may not be directly usable. scispaCy provides spaCy pipelines and models for scientific/biomedical documents, trained on biomedical data. medaCy is a healthcare-specific NLP framework built over spaCy to support the fast prototyping, training, and application of medical NLP models. One of the advantages of medaCy is that it is fast and lightweight.

UMLS

The Unified Medical Language System (UMLS) is a set of files and software that brings together biomedical vocabularies for health information systems. UMLS provides a set of RESTful APIs for licensed users. I have created a JavaScript wrapper for the UMLS APIs that is easy to call from JavaScript programs. It is available from the npm package repository. See the update on UmlsBERT below.

MedCAT

Medical Concept Annotation Tool (MedCAT) is a relatively new tool for extracting terms from free text in EMRs and linking them to vocabularies such as UMLS and SNOMED. The paper describing MedCAT is here. MedCAT models can be further refined by training on a domain-specific corpus of text. MedCAT is fast and very useful.

Word Embeddings for NLP

A word embedding is a learned vector representation of text in which words with similar meanings have similar vectors. It is one of the most popular deep learning methods for NLP problems. Word2Vec is a method for constructing embeddings, and a word2vec model based on the entire Wikipedia corpus is available for use. This paper describes the creation of a clinical concept embedding based on a large corpus of clinical documents. I have created a gensim wrapper for this model that can be used for concept similarity search in Python.
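
Similarity search over embeddings usually comes down to comparing vectors by cosine similarity. The sketch below shows the idea in plain Python. The three-dimensional vectors are invented for illustration (real embeddings have hundreds of dimensions), and most_similar here is a hypothetical helper, not the gensim wrapper mentioned above.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors, invented for illustration only.
embeddings = {
    "fever": [0.9, 0.1, 0.0],
    "pyrexia": [0.8, 0.2, 0.1],
    "fracture": [0.0, 0.2, 0.9],
}


def most_similar(term, vectors):
    """Rank the other terms by cosine similarity to the given term."""
    query = vectors[term]
    scores = [
        (other, cosine_similarity(query, vector))
        for other, vector in vectors.items()
        if other != term
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


for concept, score in most_similar("fever", embeddings):
    print(f"{concept}: {score:.3f}")
```

With these toy vectors, "pyrexia" ranks far above "fracture" as a neighbour of "fever", which is the behaviour a clinical concept embedding aims to reproduce for synonyms and related concepts.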

BERT and related

Bidirectional Encoder Representations from Transformers (BERT) is a technique for NLP pre-training developed by Google. Here is the highly cited official paper. BERT has replaced static word embeddings as the most successful NLP technique in most domains, including healthcare. Some of the refined BERT models used in healthcare are BioBERT and ClinicalBERT.

It is vital to deploy these models in a scalable and maintainable manner to be available for use within EMR systems. We are working on such a framework called ‘Serverless on FHIR’. Give me a shout if you want to know more.

UPDATE: May 30, 2021: The library (ckblib) is now available under MPL 2.0 license (see below). Feel free to use it in your research.

ckblib

Update (Dec 2020):

Researchers from the University of Waterloo have introduced the novel concept of UmlsBERT. Current clinical embeddings such as BioBERT, described above, are generic models trained further on clinical corpora by applying the concept of transfer learning. Most biomedical ontologies such as UMLS define hierarchies for the concepts they contain. UmlsBERT makes use of this hierarchical group information at the pre-training stage to augment the clinical concept embeddings. Table 3 in the paper compares the results with other embeddings, and it is quite impressive. The GitHub repo is here.
Way to go George Michalopoulos and team!

Update (Mar 2021):

Create a chatbot to talk to a FHIR endpoint using conversational AI!

Update (May 2022):

ICDBigBird: A Contextual Embedding Model for ICD Code Classification: https://arxiv.org/pdf/2204.10408.pdf

How to deploy an h2o ai model using OpenFaaS on Digitalocean in 2 minutes
