Bell Eapen

Physician | HealthIT Developer | Digital Health Consultant

Clinical knowledge representation for reuse

The need for computerized clinical decision support is becoming increasingly obvious with the COVID-19 pandemic. The initial emphasis has been on ‘replacing’ the clinician which for a variety of reasons is impossible or impractical. Pragmatically, clinical decision support systems could provide clinical knowledge support for clinicians to make time-sensitive decisions with whatever information they have at the point of patient care.

Siobhán Grayson, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Providing clinical decision support requires some formal way of representing clinical knowledge and complex algorithms for sophisticated inference. In knowledge management terms, the information requires to be transformed into actionable knowledge. Knowledge needs to be represented and stored in a way conducive to easy inference (knowledge reuse)​1​. I have been exploring this domain for a considerable period of time, from ontologies to RDF datasets. With the advent of popular graph databases (especially Neo4J ), this seems to be a good knowledge representation method for clinical purposes.

To cut a long story short, I have been working on building a suite of JAVA libraries to support knowledge extraction, annotation and transformation to a graph schema for inference. I have not open-source it yet as I have not decided on what license to use. However, I am posting some preliminary information here to assess interest. Please give me a shout, if you share an interest or see some potential applications for this. As always, I am open to collaboration.

The JAVA package consists of three modules. The ‘library’ module wraps the NCBI’s E-Utils API to harvest published article abstracts if that is your knowledge source. Though data extraction from the clinical notes in EMR’s is a recent trend, it is challenging because of unstructured data and lack of interoperability. The ‘qtakes’ module provides a programmable interface to my quick-ctakes or the quarkus based apache ctakes, a fast clinical text annotation engine. Finally, the graph module provides the Neo4J models, repositories and services for abstracting as a knowledge graph.

The clinical knowledge graph (ckb) consists of entities such as Disease, Treatment and Anatomy and appropriate relationships and cypher queries are defined. The module exposes services that can be consumed by JAVA applications. It will be available as a maven artifact once I complete it.

UPDATE: May 30, 2021: The library (ckblib) is now available under MPL 2.0 license (see below). Feel free to use it in your research.

Tools to create a clinical knowledge graph from biomedical literature. Includes wrappers for NCBI Eutils, cTakes annotator and Neo4J
https://github.com/dermatologist/ckblib
2 forks.
4 stars.
0 open issues.

Recent commits:

  1. 1.
    Toward a Theory of Knowledge Reuse: Types of Knowledge Reuse Situations and Factors in Reuse Success. Journal of Management Information Systems. Published online May 31, 2001:57-93. doi:10.1080/07421222.2001.11045671
Cite this article as: Eapen BR. (April 28, 2021). Nuchange.ca - Clinical knowledge representation for reuse. Retrieved July 25, 2021, from https://nuchange.ca/2021/04/clinical-knowledge-representation-for-reuse.html.

COVID vaccination tracking with blockchain

COVID vaccine rollout has the potential to bring relief to billions of people around the world. But as encouraging as these programs may be, it is extremely important to note that a vaccine cannot be as effective if it is not effectively distributed and trusted by the public.

SPQR10, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

IBM Blockchain has a vaccine distribution network for manufacturers to proactively monitor for adverse events and improve recall management. Moderna is planning to explore vaccine traceability with the IBM blockchain.

The International Air Transport Association (IATA) is planning to launch a system of digital ‘passports’ as proof that passengers have been vaccinated against COVID-19. Blockchain technology could offer a better data-storage system for such vaccination records. A decentralized blockchain ledger would be anonymous, immutable and transparent and the entries can be publicly audited.

A vaccine blockchain system could support vaccine traceability and smart contract functions and can be used to address the problems of vaccine expiration and vaccine record fraud. Additionally, the use of machine learning models can provide valuable recommendations to immunization practitioners and recipients, allowing them to choose better immunization methods and vaccines as recommended by this study. A blockchain-based system developed by Singapore-based Zuellig Pharma can help governments and healthcare providers manage vaccine distribution and administration. UK hospitals are using blockchain to track the temperature of coronavirus vaccines.

In my opinion, a blockchain application in healthcare should satisfy the following characteristics:

  1. Both patient and provider should have an interest in the decentralized storage of the concerned piece of information. One party may be neutral, but there should not be a collision of interests.
  2. One or more third parties should have an interest in this information and may have a reason not to trust the patient or provider.
  3. The information should be a dynamic time-bound list that requires periodic updating.
  4. The privacy concern related to the concerned information should be minimal.
  5. The information should not be easy to measure or procure from other sources.

Vaccination satisfies the above criteria and as such blockchain may be a good solution for this problem. Before the COVID-19 pandemic, I had played a bit with solidity and made a web application with three different views:

  1. Provider view: From this view, a provider can extend an offer to save the information on the blockchain to a patient.
  2. Patient view: From this view, a patient can accept an offer extended by a provider.
  3. Lookup view: To look up information on any patient.

Vac-chain is a prototype of on-chain storage of vaccination information on Ethereum blockchain using smart contracts in solidity using the truffle Drizzle box (React/Redux).

Cite this article as: Eapen BR. (March 24, 2021). Nuchange.ca - COVID vaccination tracking with blockchain. Retrieved July 25, 2021, from https://nuchange.ca/2021/03/covid-vaccination-tracking-with-blockchain.html.

Clinical Query Language – Part 1

Clinical Query Language (CQL) is a high-level query language to represent and generate unambiguous quality measures or clinical decision rules. I am not a CQL expert. These are my notes from a system development perspective. I am trying to make sense of this emerging concept and add my notes here in the hope that others may find this useful.

U.S. Navy photo by Chief Warrant Officer 4 Seth Rossman. / Public domain (wikimedia)

Clinical Query Language is designed to be intuitive for clinicians authoring the queries for quality measures and clinical decision support. The decision support rules are mostly alert type rules at the individual and population level that is calculated from a database (not usually diagnostic decision support). You can use any data model with CQL.

Here is an example segment of CQL:

define “InDemographic”:
AgeInYearsAt(start of MeasurementPeriod) >= 16 and AgeInYearsAt(start of MeasurementPeriod) < 24
and “Patient”.”gender” in “Female Administrative Sex”

As Clinical Query Language follows strict semantics, you can autogenerate lexers, parsers and visitors using ANTLR. In simple terms, CQL’s semantics can be represented as a ‘grammar’ that ANTLR can read and generate code to process any CQL in a variety of programming languages, including Java, Javascript, Python, C# and Go. The CQL grammar files are here: https://cql.hl7.org/08-a-cqlsyntax.html. Incidentally, CQL grammar inherits from fhirpath.

If you wish to generate code from these files, there are two things to note:

  • You need to rename CQL.g4 to cql.g4 as the library names are case sensitive and should correspond to the filename.
  • Put fhirpath.g4 in the same folder as cql.g4, and cql refers to fhirpath grammar.

Clinical Query Language aims to provide a high-level domain-independent language for clinicians that can be translated into low-level database logic. As CQL does not prescribe a data model, an intermediary format linking CQL to the data management logic is required. That is called Expression Logical Model (ELM) that we will discuss in part 2.

OHDSI OMOP CDM ETL Tools in Python, .Net and Go

TL;DR Here are few OHDSI OMOP CDM tools that may save you time if you are developing ETL tools!

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

eHealth Programmer Girl
OHDSI OMOP CDM Libraries

The COVID-19 pandemic brought to light many of the vulnerabilities in our data collection and analytics workflows. Lack of uniform data models limits the analytical capabilities of public health organizations and many of them have to re-invent the wheel even for basic analysis. As many other sectors embrace big data and machine learning, many healthcare analysts are still stuck with the basic data wrenching with Excel.

The OHDSI OMOP CDM (Common data model) for observational data is a popular initiative for bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Though OHDSI OMOP CDM is primarily for patient-centred observational analysis, mostly for clinical research, it can be used with minor tweaks for public health and epidemiologic data as well. We have written about some of the technical details here.

The OHDSI OMOP CDM is relatively simple and intuitive for clinical teams than emerging standards such as FHIR. Though the relational database approach and some of the software tools associated with OHDSI OMOP CDM are archaic, the data model is clinically motivated. There is an ecosystem of software tools for many of the analytics tools that can be used out of the box. The Observational Medical Outcomes Partnership (OMOP) CDM, now in its version 6.0, has simple but powerful vocabulary management. OHDSI OMOP CDM is a good choice for healthcare organizations moving towards health data warehousing and OLAP.

One weakness of OHDSI is the lack of tools for efficient ETL from existing EHR and HIS. Converting existing EHR data to the CDM is still a complex task that requires technical expertise. During the additional “home time” during the COVID pandemic, I have created three software libraries for ETL tool developers. These libraries in Python, .NET and Golang encapsulated the V6.0 CDM and helps in writing and reading data from a variety of databases with the V6.0 tables. The libraries also support creating the CDM tables for new databases and loading the vocabulary files.

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

These libraries might save you some time if you are building scripts for ETL to CDM. They are all open-source and free to use in your tools. Do give me a shout if you find these libraries useful and please star the repositories on GitHub.

FHIR and public health data warehouses

First posted on CanEHealth.com

The provincial government is building a connected health care system centred around patients, families and caregivers through the newly established OHTs. As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off the shelf solutions are pricey, while open-source solutions such as DHIS2 is not popular in Canada. Some of the public health units have existing systems, and it will be too resource-intensive to switch to another system. The interoperability challenge needs an innovative solution, beyond finding the single, provincial EMR.

artificial intelligence

We have written about the theoretical aspects, especially the need to envision public health information systems separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams. 

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution for a scalable data warehouse for health information system integration. Public health databases are vital for the community for efficient planning, surveillance and effective interventions. Public health data needs to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage with the integrated Elasticsearch stack. Kibana provides the visualization engine. PHIS-DW can support complex algorithms for disease surveillance such as machine learning methods, hidden Markov models, and Bayesian to multivariate analytics. PHIS-DW is work in progress and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications. 

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, the FHIR bundle makes it possible to send observations to FHIR servers without the associated patient resource, thereby ensuring reasonable privacy. This is especially useful in the surveillance of pandemics such as COVID19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 is under consideration. If any of the OHTs find our approach interesting, give us a shout. 

BTW, have you seen Drishti, our framework for FHIR based behavioural intervention? 

Random forest model for predicting the total length of hospital stay (TLOS)

TL;DR here is the Random Forest classifier code:

And an (obvious) upfront disclaimer: This is a learning project. This is not for actual use.

DAD is a database consisting of patient demographics, comorbidities, interventions and the length of stay for the de-identified 10% sample of hospital admissions. DAD (2014-15) has an enhanced dataset with variables that were created at Western to act as flags for ICD-10 and CCI groupings, to make using the file easier.

Here is an experiment with the DAD enhanced dataset to create a Random forest model for predicting the total length of hospital stay (TLOS) in less than 100 lines of code. Random forests are an ensemble classifier, that operates by building multiple decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. This is a learning project for Apache Spark and Spark ML using pyspark. The accuracy of the model taking all derived categorical variables is low.

I have access to Apache Spark @ CC. If you are installing Spark in your computer you may have to change the following:

Some of the commonly tweaked parameters can be changed here:

Uncomment the following line to include only variables that you need.

Here is the repo. How can this model be improved? Maybe a PCA before the RF? or Am I missing something important?

 

Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

eHealth User Interface Design for Seniors

Many of the open-source PHR systems have poor user interfaces (UI) by any standards while older adults have specific interface requirements. Credible research on the special interface requirements of elderly is still hard to find.

The Context-Aware Knowledge Retrieval – Infobutton

Infobutton

Image Credit: By Roberth Edberg (Own work (Original text: self-made)) [Public domain], via Wikimedia Commons. Image altered and text added.

Clinicians using patient information systems such as EMRs and HIE clinical viewers need a cross-referencing mechanism to clinical information aggregators such as DynaMed and Medlineplus. Busy practitioners don’t have time to wade through several pages to find what they need. So there has to be a mechanism to convey as much information as possible about the patient at hand to these knowledge providers so that the doctor is presented with information most relevant to the patient being managed. Infobutton is the standard (HL7 Version 3) for this information exchange, and Infobutton manager is the broker that facilitates this data transfer. With the increasing popularity of Personal Health Records (PHR) systems, the infobutton standard is relevant to PHRs as well.

Infobutton implementation can be either URL (REST) based or SOAP based. The information provided by the client system include patient characteristics such as age and sex, provider, care setting, clinical task and diagnosis. The OpenInfobutton project is an initiative led by the United States Veterans Health Administration (VHA) and the University of Utah that provides the infobutton manager that brokers the interaction between client information systems and online knowledge providers.

Understanding of Infobutton standard is important to Clinical information system developers, Clinical knowledge resource publishers and health care organizations such as regional clinical viewers. Several studies have demonstrated an increase in productivity for physicians with Infobutton adoption. Since infobutton standard is based on well-known protocols such as SOAP and REST, it is relatively easy to implement.