Bell Eapen

Physician | HealthIT Developer | Digital Health Consultant

Clinical Query Language – Part 1

Clinical Query Language (CQL) is a high-level query language to represent and generate unambiguous quality measures or clinical decision rules. I am not a CQL expert. These are my notes from a system development perspective (not a clinical author perspective). I am trying to make sense of this emerging concept and add my notes here in the hope that others may find this useful.

U.S. Navy photo by Chief Warrant Officer 4 Seth Rossman. / Public domain (wikimedia)

Clinical Query Language is designed to be intuitive for clinicians authoring the queries for quality measures and clinical decision support. The decision support rules are mostly alert type rules at the individual and population level that is calculated from a database (not usually diagnostic decision support). You can use any data model with CQL, and you can use any data model you prefer.

Here is an example segment of CQL:

define “InDemographic”:
AgeInYearsAt(start of MeasurementPeriod) >= 16 and AgeInYearsAt(start of MeasurementPeriod) < 24
and “Patient”.”gender” in “Female Administrative Sex”

As Clinical Query Language follows strict semantics, you can autogenerate lexers, parsers and visitors using ANTLR. In simple terms, CQL’s semantics can be represented as a ‘grammar’ that ANTLR can read and generate code to process any CQL in a variety of programming languages, including Java, Javascript, Python, C# and Go. The CQL grammar files are here: Incidentally, CQL grammar inherits from fhirpath.

If you wish to generate code from these files, there are two things to note:

  • You need to rename CQL.g4 to cql.g4 as the library names are case sensitive and should correspond to the filename.
  • Put fhirpath.g4 in the same folder as cql.g4, and cql refers to fhirpath grammar.

Clinical Query Language aims to provide a high-level domain-independent language for clinicians that can be translated into low-level database logic. As CQL does not prescribe a data model, an intermediary format linking CQL to the data management logic is required. That is called Expression Logical Model (ELM) that we will discuss in part 2.

OHDSI OMOP CDM ETL Tools in Python, .Net and Go

TL;DR Here are few OHDSI OMOP CDM tools that may save you time if you are developing ETL tools!

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

eHealth Programmer Girl

The COVID-19 pandemic brought to light many of the vulnerabilities in our data collection and analytics workflows. Lack of uniform data models limits the analytical capabilities of public health organizations and many of them have to re-invent the wheel even for basic analysis. As many other sectors embrace big data and machine learning, many healthcare analysts are still stuck with the basic data wrenching with Excel.

The OHDSI OMOP CDM (Common data model) for observational data is a popular initiative for bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Though OHDSI OMOP CDM is primarily for patient-centred observational analysis, mostly for clinical research, it can be used with minor tweaks for public health and epidemiologic data as well. We have written about some of the technical details here.

The OHDSI OMOP CDM is relatively simple and intuitive for clinical teams than emerging standards such as FHIR. Though the relational database approach and some of the software tools associated with OHDSI OMOP CDM are archaic, the data model is clinically motivated. There is an ecosystem of software tools for many of the analytics tools that can be used out of the box. The Observational Medical Outcomes Partnership (OMOP) CDM, now in its version 6.0, has simple but powerful vocabulary management. OHDSI OMOP CDM is a good choice for healthcare organizations moving towards health data warehousing and OLAP.

One weakness of OHDSI is the lack of tools for efficient ETL from existing EHR and HIS. Converting existing EHR data to the CDM is still a complex task that requires technical expertise. During the additional “home time” during the COVID pandemic, I have created three software libraries for ETL tool developers. These libraries in Python, .NET and Golang encapsulated the V6.0 CDM and helps in writing and reading data from a variety of databases with the V6.0 tables. The libraries also support creating the CDM tables for new databases and loading the vocabulary files.

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

These libraries might save you some time if you are building scripts for ETL to CDM. They are all open-source and free to use in your tools. Do give me a shout if you find these libraries useful and please star the repositories on GitHub.

FHIR and public health data warehouses

First posted on

The provincial government is building a connected health care system centred around patients, families and caregivers through the newly established OHTs. As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off the shelf solutions are pricey, while open-source solutions such as DHIS2 is not popular in Canada. Some of the public health units have existing systems, and it will be too resource-intensive to switch to another system. The interoperability challenge needs an innovative solution, beyond finding the single, provincial EMR.

artificial intelligence

We have written about the theoretical aspects, especially the need to envision public health information systems separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams. 

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution for a scalable data warehouse for health information system integration. Public health databases are vital for the community for efficient planning, surveillance and effective interventions. Public health data needs to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage with the integrated Elasticsearch stack. Kibana provides the visualization engine. PHIS-DW can support complex algorithms for disease surveillance such as machine learning methods, hidden Markov models, and Bayesian to multivariate analytics. PHIS-DW is work in progress and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications. 

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, the FHIR bundle makes it possible to send observations to FHIR servers without the associated patient resource, thereby ensuring reasonable privacy. This is especially useful in the surveillance of pandemics such as COVID19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 is under consideration. If any of the OHTs find our approach interesting, give us a shout. 

BTW, have you seen Drishti, our framework for FHIR based behavioural intervention? 

Random forest model for predicting the total length of hospital stay (TLOS)

TL;DR here is the Random Forest classifier code:

And an (obvious) upfront disclaimer: This is a learning project. This is not for actual use.

DAD is a database consisting of patient demographics, comorbidities, interventions and the length of stay for the de-identified 10% sample of hospital admissions. DAD (2014-15) has an enhanced dataset with variables that were created at Western to act as flags for ICD-10 and CCI groupings, to make using the file easier.

Here is an experiment with the DAD enhanced dataset to create a Random forest model for predicting the total length of hospital stay (TLOS) in less than 100 lines of code. Random forests are an ensemble classifier, that operates by building multiple decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. This is a learning project for Apache Spark and Spark ML using pyspark. The accuracy of the model taking all derived categorical variables is low.

I have access to Apache Spark @ CC. If you are installing Spark in your computer you may have to change the following:

Some of the commonly tweaked parameters can be changed here:

Uncomment the following line to include only variables that you need.

Here is the repo. How can this model be improved? Maybe a PCA before the RF? or Am I missing something important?


Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

eHealth User Interface Design for Seniors

Many of the open-source PHR systems have poor user interfaces (UI) by any standards while older adults have specific interface requirements. Credible research on the special interface requirements of elderly is still hard to find.

The Context-Aware Knowledge Retrieval – Infobutton


Image Credit: By Roberth Edberg (Own work (Original text: self-made)) [Public domain], via Wikimedia Commons. Image altered and text added.

Clinicians using patient information systems such as EMRs and HIE clinical viewers need a cross-referencing mechanism to clinical information aggregators such as DynaMed and Medlineplus. Busy practitioners don’t have time to wade through several pages to find what they need. So there has to be a mechanism to convey as much information as possible about the patient at hand to these knowledge providers so that the doctor is presented with information most relevant to the patient being managed. Infobutton is the standard (HL7 Version 3) for this information exchange, and Infobutton manager is the broker that facilitates this data transfer. With the increasing popularity of Personal Health Records (PHR) systems, the infobutton standard is relevant to PHRs as well.

Infobutton implementation can be either URL (REST) based or SOAP based. The information provided by the client system include patient characteristics such as age and sex, provider, care setting, clinical task and diagnosis. The OpenInfobutton project is an initiative led by the United States Veterans Health Administration (VHA) and the University of Utah that provides the infobutton manager that brokers the interaction between client information systems and online knowledge providers.

Understanding of Infobutton standard is important to Clinical information system developers, Clinical knowledge resource publishers and health care organizations such as regional clinical viewers. Several studies have demonstrated an increase in productivity for physicians with Infobutton adoption. Since infobutton standard is based on well-known protocols such as SOAP and REST, it is relatively easy to implement.

2015 National Change Management Survey (Canada)

ລາວ: ການຈັດການຕ້ອງເຮັດໃຫ້ດີ

(Photo credit: Wikipedia)

In 2010 Infoway conducted a change management survey, which was the catalyst to establishing the Pan Canadian Change Management Network and development of the National Change Management Framework.

Fast forward 5 years…Canada Health Infoway’s (Infoway) Clinical Adoption Team is conducting a survey to gain insight on how change management is currently being conducted across the country. Change management is an essential driver of adoption and benefits realized from the use of digital health. Infoway’s multifaceted approach to change management is guided by an evidence-informed change management framework that was developed with input from experts across Canada in 2011.

The purpose of this survey is to assess how change management is currently being conducted across the country and the usefulness of the Change Management framework and associated toolkit. The survey takes between 10-15 minutes to complete. Upon completion of the survey, you can enter the participant’s draw, where you could win one of five $200 Visa cards. The survey will be available until February 12, 2015. Your input will help provide direction on how to shape the ongoing development of change management resources for the adoption of digital health solutions across the country in 2015 and beyond.

Link to 2015 National Change Management Survey:

If Ebola Spreads to Canada

While reading the news about the public health agency of Canada taking all possible steps to prevent the spread of Ebola to Canada, with a glass of Ontario wine in my hands, I for a brief moment thought, what if ………

Picture credit DFID @ Flikr (Image altered and text added) – If Ebola spreads to Canada

So let me set the context right. I am not an infectious disease expert, though my post on cutaneous signs of Ebola virus infection got more attention that it deserved. I am not an epidemiologist either to comment authoritatively on what healthmap is doing. To me it is the social media version of what John Snow did two centuries back to identify the epicentre of the cholera outbreak and established epidemiology as a speciality.

So if Ebola spreads to Canada, How do we identify the epicentre and take preventive measures? Turn to healthmaps and see where it originated and take measures to contain? Healthmaps will get that information from Google news and similar services. We have half a dozen major Health Information Exchange (HIE) initiatives in the country and would probably have accurate records of where each case presented with the characteristic symptoms. But we would look up to healthmaps and google since we cannot use HIE data for research!

i wonder how long it wil take for #ebola to hit #canada? which city first? and wil it get #outofcontrol? #crazy :headshaking:
— 411inToronto (@411inToronto) October 10, 2014

I am not a health policy expert neither am I an HIE architecture expert. But to me, if we have to realize the benefits of the ever increasing number of HIE initiatives, we have to find a way to use the wealth of the information there for population health. If we get it right, privacy is not even a concern.

HIE, built to abolish silos, paradoxically created larger silos, because of fragmented systems. The utopian population health requires a glue to bring these silos together. We got it wrong the first time, with data-centric HIS that offered little clinical workflow support and were (inadvertently) rejected by doctors. (We always have the doctors to blame as the universal slow technology adopters. BTW India’s mission to Mars discovered that all doctors in the planet originated from Mars!). We are sure to get it wrong again if we don’t change the data-centric HIE models.

HIE should be versatile, structureless and scalable enough to support disparate clinical use cases. The only option that comes to my mind is RDF.

If you are still unsure, read all that I have written about RDF. Convinced? Go ahead and head on over to the Yosemite Manifesto. BTW it has got nothing to do with the new OS X!