Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

Clinical knowledge representation for reuse

The need for computerized clinical decision support has become increasingly obvious with the COVID-19 pandemic. The initial emphasis was on ‘replacing’ the clinician, which is impossible or impractical for a variety of reasons. Pragmatically, clinical decision support systems could provide knowledge support that helps clinicians make time-sensitive decisions with whatever information they have at the point of patient care.

Siobhán Grayson, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Providing clinical decision support requires a formal way of representing clinical knowledge and complex algorithms for sophisticated inference. In knowledge management terms, information needs to be transformed into actionable knowledge. Knowledge needs to be represented and stored in a way conducive to easy inference (knowledge reuse) [1]. I have been exploring this domain for a considerable period of time, from ontologies to RDF datasets. With the advent of popular graph databases (especially Neo4J), the graph seems to be a good knowledge representation method for clinical purposes.

To cut a long story short, I have been working on building a suite of Java libraries to support knowledge extraction, annotation and transformation to a graph schema for inference. I have not open-sourced it yet as I have not decided on what license to use. However, I am posting some preliminary information here to assess interest. Please give me a shout if you share an interest or see some potential applications for this. As always, I am open to collaboration.

The Java package consists of three modules. The ‘library’ module wraps NCBI’s E-Utils API to harvest published article abstracts, if that is your knowledge source. Though extracting data from clinical notes in EMRs is a recent trend, it is challenging because of unstructured data and lack of interoperability. The ‘qtakes’ module provides a programmable interface to my quick-ctakes (the Quarkus-based Apache cTAKES), a fast clinical text annotation engine. Finally, the ‘graph’ module provides the Neo4J models, repositories and services for abstracting the annotations as a knowledge graph.
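To make the harvesting step concrete, here is a minimal sketch of the kind of request a wrapper around E-Utils would build. It is written in Python for brevity and is not the code in the ‘library’ module; the function names `esearch_url` and `efetch_abstract_url` are my own for illustration. The endpoints and parameters (`db`, `term`, `retmax`, `retmode`, `rettype`) are those of the public E-utilities service.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that returns PubMed IDs matching `term` as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_abstract_url(pmids: list[str]) -> str:
    """Build an EFetch URL that returns the abstracts of `pmids` as plain text."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

A caller would fetch the first URL, read the list of IDs from the JSON response, and pass those IDs to the second function to retrieve the abstracts for annotation.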

The clinical knowledge graph (ckb) consists of entities such as Disease, Treatment and Anatomy, with appropriate relationships and Cypher queries defined for them. The module exposes services that can be consumed by Java applications. It will be available as a Maven artifact once I complete it.
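As a rough illustration of how such entities and relationships might be written to Neo4J, the sketch below builds a parameterized Cypher `MERGE` statement. It is in Python rather than Java, and it is not the ckblib implementation: the relationship types `TREATS` and `AFFECTS`, the `name` property and the function `merge_relation` are assumptions made for the example. Only the three entity labels come from the description above.

```python
# Entity labels named in the post.
ALLOWED_LABELS = {"Disease", "Treatment", "Anatomy"}
# Hypothetical relationship types, chosen only for this example.
ALLOWED_RELATIONSHIPS = {"TREATS", "AFFECTS"}


def merge_relation(src_label: str, src_name: str, rel: str,
                   dst_label: str, dst_name: str) -> tuple[str, dict[str, str]]:
    """Return a Cypher MERGE statement and its parameters.

    Cypher cannot take labels or relationship types as parameters, so they
    are checked against allow-lists before being placed in the query text.
    Entity names are passed as query parameters ($src, $dst).
    """
    if src_label not in ALLOWED_LABELS or dst_label not in ALLOWED_LABELS:
        raise ValueError("unknown entity label")
    if rel not in ALLOWED_RELATIONSHIPS:
        raise ValueError("unknown relationship type")
    query = (
        f"MERGE (a:{src_label} {{name: $src}}) "
        f"MERGE (b:{dst_label} {{name: $dst}}) "
        f"MERGE (a)-[:{rel}]->(b)"
    )
    return query, {"src": src_name, "dst": dst_name}
```

Using `MERGE` rather than `CREATE` means that annotating the same disease or treatment in many abstracts yields one node per concept instead of duplicates.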

UPDATE: May 30, 2021: The library (ckblib) is now available under the MPL 2.0 license (see below). Feel free to use it in your research.

Tools to create a clinical knowledge graph from biomedical literature. Includes wrappers for NCBI Eutils, cTakes annotator and Neo4J
https://github.com/dermatologist/ckblib

1. Toward a Theory of Knowledge Reuse: Types of Knowledge Reuse Situations and Factors in Reuse Success. Journal of Management Information Systems. Published online May 31, 2001:57-93. doi:10.1080/07421222.2001.11045671
Cite this article as: Eapen BR. (April 28, 2021). Clinical knowledge representation for reuse. Retrieved December 26, 2024, from https://nuchange.ca/2021/04/clinical-knowledge-representation-for-reuse.html.

If Ebola Spreads to Canada

While reading the news about the Public Health Agency of Canada taking all possible steps to prevent the spread of Ebola to Canada, with a glass of Ontario wine in my hands, I thought for a brief moment: what if ...

Picture credit DFID @ Flickr (Image altered and text added) – If Ebola spreads to Canada

So let me set the context right. I am not an infectious disease expert, though my post on cutaneous signs of Ebola virus infection got more attention than it deserved. I am not an epidemiologist either, so I cannot comment authoritatively on what HealthMap is doing. To me it is the social media version of what John Snow did more than a century and a half ago, when he identified the epicentre of a cholera outbreak and established epidemiology as a speciality.

So if Ebola spreads to Canada, how do we identify the epicentre and take preventive measures? Turn to HealthMap to see where it originated and take measures to contain it? HealthMap will get that information from Google News and similar services. We have half a dozen major Health Information Exchange (HIE) initiatives in the country, and they would probably have accurate records of where each case presented with the characteristic symptoms. But we would look to HealthMap and Google, since we cannot use HIE data for research!

i wonder how long it wil take for #ebola to hit #canada? which city first? and wil it get #outofcontrol? #crazy :headshaking:
— 411inToronto (@411inToronto) October 10, 2014

I am not a health policy expert, nor am I an HIE architecture expert. But to me, if we are to realize the benefits of the ever-increasing number of HIE initiatives, we have to find a way to use the wealth of information there for population health. If we get it right, privacy is not even a concern.

HIE, built to abolish silos, paradoxically created larger silos because of fragmented systems. Population health, the utopian goal, requires a glue to bring these silos together. We got it wrong the first time, with data-centric HIS that offered little clinical workflow support and were (inadvertently) rejected by doctors. (We always have the doctors to blame as the universal slow adopters of technology. BTW India’s mission to Mars discovered that all doctors on the planet originated from Mars!) We are sure to get it wrong again if we don’t change the data-centric HIE models.

HIE should be versatile, structureless and scalable enough to support disparate clinical use cases. The only option that comes to my mind is RDF.

If you are still unsure, read all that I have written about RDF. Convinced? Go ahead and head on over to the Yosemite Manifesto. BTW it has got nothing to do with the new OS X!

Resource Description Framework (RDF) and Population Informatics

A PICTURE OF A RDF (Photo credit: Wikipedia)

I have been an RDF fan since even before I used it for dermbase. I promptly signed the Yosemite Manifesto and blogged about it last year. After gaining more experience in the regional health information exchange initiative(s), I still feel that RDF is important, but in a different way.

Most federated regional clinical viewers query host databases, convert the results into an intermediary format (mostly XML or HL7), apply filters and then provide a consolidated view in the browser and on mobile as HTML embellished with jQuery. Though this does not look like a very scalable technology, it works remarkably well in a regional context. Federated clinical viewers also attempt to create data warehouses on top of the clinical viewer. Such data warehouses have enormous potential in population informatics, and RDF could be an ideal framework for this purpose.

RDF is a proven technology that is schema agnostic. However, in this context the biggest advantage of RDF is its data-atomic nature, which enables each data element to be queried, changed, or deleted independently of any other data element. RDF blank nodes can be used to effectively anonymize the data. From a data analytics perspective, representation in the RDF format makes data amenable to “reasoning” to discover new knowledge.
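The two properties above can be illustrated with a small sketch that treats RDF statements as plain (subject, predicate, object) tuples in Python. It uses no RDF library, and the `http://example.org/...` URIs and the `anonymize` function are hypothetical names made up for the example. Blank nodes are written with the usual `_:` label prefix.

```python
from itertools import count

Triple = tuple[str, str, str]


def anonymize(triples: list[Triple], identifying_prefix: str) -> list[Triple]:
    """Replace every subject or object that starts with `identifying_prefix`
    with a blank node label, reusing the same label for the same resource
    so that statements about one patient stay linked to each other."""
    counter = count()
    mapping: dict[str, str] = {}

    def swap(term: str) -> str:
        if term.startswith(identifying_prefix):
            if term not in mapping:
                mapping[term] = f"_:b{next(counter)}"
            return mapping[term]
        return term

    return [(swap(s), p, swap(o)) for s, p, o in triples]


def remove_statement(triples: list[Triple], target: Triple) -> list[Triple]:
    """Drop one statement; every other statement is left untouched."""
    return [t for t in triples if t != target]
```

Because each fact is its own triple, removing a patient's age does not disturb the diagnosis recorded for the same patient. Swapping the patient URI for a blank node removes only the identifier itself, so any directly identifying property values would still have to be left out before the data is pushed to a shared repository.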

Genomic data analytics has revolutionized pre-clinical research. Growing popularity of Health Information Technology (HIT) and Health Information Exchange (HIE) has not yet resulted in a similar impact on population health. There are some fundamental differences between genomic and clinical data.

The fundamental characteristics of genomic data are:

1. The data format is simple though it can be annotated in different ways.
2. Raw data is collected first without consideration of relevance. Hypothesis formulation and analysis come later.
3. The data is mostly anonymous.
4. The format and analysis protocol remain the same.

The clinical data has different characteristics:

1. The data is often complex and hence it is difficult to have a uniform format.
2. Data is collected to prove or disprove a hypothesis/diagnosis. Hence only relevant data is collected.
3. The data is often tagged to an individual.
4. The analysis protocol and data collection depend on the hypothesis/diagnosis.

An RDF framework would allow population data to be abstracted from the everyday HIE data used for clinical practice, with both operating within the same ecosystem. The framework would also give clinical data the analytics-friendly qualities of genomic data. The clinical viewer can push data into the RDF repository without a separate warehousing process, thereby reducing overhead and increasing relevance. New-generation wearable devices and monitors can push anonymized raw data directly into the RDF repository. The privacy and security concerns of this architecture will be minimal.

There is another hitherto unexplored advantage for such a clinical RDF repository. Temporal data related to climate changes and other events such as natural calamities can also be pushed into the “structureless” RDF repository making it possible to assess the population health impact of such events.

Yosemite Manifesto on RDF as a Universal Healthcare Exchange Language

Layered Semantic Web Technology Stack (Photo credit: jalbertbowdenii)

Bring Your Own EMR (#BYOE, pronounced ‘bio’), as explained in my last post, relies on a reliable interoperability platform. I have always believed that RDF is the key to successful interoperability. RDF has been employed successfully in several other fields and has many stable tools such as Jena. I was searching the web for information on how to present the advantages of an RDF-based interoperability platform for healthcare data. Then I found this website.

The Yosemite Manifesto pretty much summarizes whatever I had in mind and a lot more! Its authors are also trying to raise awareness of the possible advantages of adopting the RDF platform by asking researchers to sign a form. I have already signed. Have you?

I am starting a wiki page for #BYOE too.

Importing ONTODerm Ontology

Map of Colombia with Departments (Photo credit: Wikipedia)

Two engineering students from Colombia are using ONTODerm for a noble cause. They are planning to take dermatology to the poor and the underprivileged. They started a teledermatology project, but discontinued it because of lack of support. Now they are working on an ontology-based diagnostic application using semantic web technologies.

One of the problems they encountered is worth addressing here. They could not import ONTODerm into Sesame. The URL provided on the ONTODerm home page is Protégé-specific. However, you can export the latest ONTODerm version in the native format from the project page on knoodl. Just request a free membership of the ONTODerm community.

Please keep me posted if you are using ONTODerm or DermKnowledgeBASE for any such projects and I wish both of them all the very best.