Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

🔍 Why FHIRy Matters

In the evolving landscape of health information systems, interoperability is no longer a luxury—it’s a necessity. The Fast Healthcare Interoperability Resources (FHIR) standard, developed by HL7, has emerged as a cornerstone for structuring and exchanging electronic health data. But while FHIR excels at standardization and data sharing, it stumbles when faced with the demands of modern analytics and AI workflows. Enter the fhiry package—a Python toolkit that bridges this gap with elegance and efficiency.

FHIRy: FHIR to pandas dataframe for data analytics, AI and ML!

🏥 The Promise of FHIR in Health Information Systems

FHIR was designed to solve a fundamental problem: how to enable seamless, standardized communication between disparate healthcare systems. It provides:

  • Modular Resources: Patient, Observation, Condition, Medication, and more—each defined with a consistent schema.
  • RESTful APIs: Making it easy to query and retrieve data using standard HTTP methods.
  • Extensibility: Supporting custom extensions while maintaining core interoperability.
  • Global Adoption: Used by major EHR vendors, government agencies, and research institutions.

In short, FHIR is the lingua franca of health data exchange. But when it comes to analytics and AI, its strengths become limitations.

⚠️ Why FHIR Is Not Conducive to AI and Analytics

Despite its utility, FHIR data presents several challenges for data scientists and machine learning practitioners:

1. Nested and Complex Structure

FHIR resources are deeply nested JSON objects. For example, a Patient resource might contain arrays of telecom entries, addresses, and extensions. This structure is great for flexibility but terrible for tabular analysis.
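To make the nesting problem concrete, here is a minimal sketch in plain Python of what flattening a resource involves. The `flatten` helper and the sample Patient are hypothetical and written only for illustration. This is not how fhiry is implemented.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a single dict with dotted keys."""
    flat = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        flat[prefix.rstrip(".")] = obj  # leaf value: drop the trailing dot
    return flat


# Hypothetical, hand-written Patient resource (not real patient data).
patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "telecom": [
        {"system": "phone", "value": "555-0100"},
        {"system": "email", "value": "jane@example.org"},
    ],
}

row = flatten(patient)
for column, value in row.items():
    print(f"{column} = {value}")
```

Each dotted key, such as `telecom.1.value`, becomes one candidate column. Array entries multiply the columns, which is why a tabular view of FHIR quickly gets wide.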

2. Inconsistent Representations

Even within the same resource type, fields may vary based on context or implementation. This inconsistency complicates feature engineering and model training.

3. Lack of Native Support for ML Pipelines

FHIR was not designed with TensorFlow, PyTorch, or scikit-learn in mind. Converting FHIR data into a format suitable for these tools requires significant preprocessing.

4. Limited Query Capabilities

FHIR servers support basic search parameters, but lack the expressive power of SQL or natural language queries. This limits exploratory data analysis and hypothesis generation.

5. Scalability Issues

Bulk data exports in NDJSON format are helpful, but parsing and flattening them into usable datasets is non-trivial—especially at scale.
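As a rough illustration of the parsing step, the sketch below reads NDJSON (one JSON resource per line) with the standard library and groups resources by type. The `read_ndjson` helper and the sample lines are assumptions made for this example, and an in-memory stream stands in for a real export file.

```python
import io
import json
from collections import defaultdict


def read_ndjson(stream):
    """Yield one parsed resource per non-empty line of an NDJSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical export content standing in for a bulk data file.
sample = io.StringIO(
    '{"resourceType": "Patient", "id": "p1", "gender": "female"}\n'
    '{"resourceType": "Observation", "id": "o1", "status": "final"}\n'
    '\n'
    '{"resourceType": "Patient", "id": "p2", "gender": "male"}\n'
)

# Group resources by type so each group can become its own table.
by_type = defaultdict(list)
for resource in read_ndjson(sample):
    by_type[resource["resourceType"]].append(resource)

for resource_type, resources in by_type.items():
    print(resource_type, len(resources))
```

Reading line by line avoids loading the whole file as one JSON document. The harder part at scale is flattening each group into consistent columns.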

🚀 Enter fhiry: FHIR to Pandas for AI and ML

The fhiry package is a game-changer for anyone working at the intersection of healthcare and data science. It transforms FHIR’s complexity into analytical clarity.

🔧 What Is fhiry?

fhiry is a Python package that converts FHIR bundles and NDJSON files into flat, analysis-ready pandas DataFrames. It supports:

  • FHIR Server Search: Pull data directly from FHIR servers using the Search API.
  • Bulk NDJSON Import: Parse and flatten NDJSON files from SMART Bulk Data exports.
  • Google BigQuery Integration: Query FHIR tables hosted on BigQuery.
  • Natural Language Queries: Use LLMs to query FHIR data conversationally.
  • Custom Column Filtering and Renaming: Tailor the DataFrame to your needs.

📦 Key Features

This tool offers a range of features designed to efficiently manage and analyze FHIR data. It includes a flattening capability that converts nested FHIR JSON into flat DataFrames, simplifying data manipulation. The tool supports NDJSON, allowing for the efficient parsing of bulk exports. With the FHIR Search API, users can fetch resources using parameterized queries, enhancing data retrieval flexibility. Additionally, BigQuery access is enabled, providing SQL-like querying capabilities for FHIR datasets. LLM integration is supported through llama-index, which facilitates natural language queries. Finally, the tool offers configurable columns, allowing users to remove or rename fields through a JSON configuration.

The fhiry package includes a FlattenFhir class that transforms complex FHIR bundles or resources into flattened textual representations, making them suitable for LLM ingestion and reasoning.

🧰 Customization

You can pass a config JSON to remove or rename columns.
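The idea of a remove-and-rename configuration can be sketched in plain Python as follows. The `REMOVE` and `RENAME` key names, the `apply_config` helper and the column names are assumptions for illustration only. Check the fhiry documentation for the exact configuration format it accepts.

```python
import json


def apply_config(row, config_json):
    """Drop and rename columns of one flattened row according to a JSON config."""
    config = json.loads(config_json)
    remove = set(config.get("REMOVE", []))  # assumed key: columns to drop
    rename = config.get("RENAME", {})       # assumed key: old name -> new name
    return {
        rename.get(column, column): value
        for column, value in row.items()
        if column not in remove
    }


# Hypothetical flattened row and config.
row = {
    "resource.id": "example",
    "resource.text.div": "<div>narrative</div>",
    "resource.gender": "female",
}
config_json = '{"REMOVE": ["resource.text.div"], "RENAME": {"resource.id": "id"}}'

cleaned = apply_config(row, config_json)
print(cleaned)
```

Keeping the configuration as JSON means the same column choices can be shared across scripts without touching code.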

🌐 Why fhiry Matters

By flattening FHIR data and integrating with ML tools, fhiry unlocks new possibilities:

  • Accelerated Research: Quickly prototype models using real-world health data.
  • Improved Accessibility: Lower the barrier for data scientists unfamiliar with FHIR.
  • Enhanced Interoperability: Combine FHIR with other datasets in unified pipelines.
  • Scalable Analytics: Leverage BigQuery and LLMs for large-scale insights.

⭐ Final Thoughts

FHIR is indispensable for health data exchange, but its analytical limitations have long frustrated researchers and developers. fhiry elegantly solves this problem, transforming FHIR from a data silo into a launchpad for AI innovation.

Whether you’re building predictive models, exploring patient cohorts, or experimenting with LLMs in healthcare, fhiry is the missing link between interoperability and intelligence.

Explore the project on GitHub and give it a ⭐️ if it helps you unlock the full potential of FHIR.

Embeddings in healthcare: TypingDNA and Skinmesh

Neural networks (NN) are everywhere, from image analytics to NLP to clinical decision support systems. Embeddings are a popular class of techniques that emerged out of NN with diverse applications. An embedding is a low-dimensional translation of a high-dimensional space. For clinicians, an embedding is nothing but a simple representation of complex data.

Siobhán Grayson, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Embeddings are typically used in text analytics and NLP. However, the same concept can be extended to other domains. Take our typing pattern on a keyboard, for example. It is potentially a complex pattern that involves speed, the sequence of keys and the lag between keys. TypingDNA has found an innovative way of converting this complex pattern into an embedding. The obvious use case is to use that embedding to identify the user, as a supplementary authentication mechanism. TypingDNA provides a JavaScript library that you can embed on any webpage to capture the typing embedding from any textbox. TypingDNA also provides a backend API for comparing embeddings for similarity.
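To give a feel for what comparing embeddings for similarity can mean, here is a minimal sketch using cosine similarity, a common generic measure. It is not a description of how TypingDNA matches patterns, and the vectors below are made-up numbers, not real typing data.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical embeddings of two typing sessions.
baseline = [0.12, 0.80, 0.35, 0.41]
today = [0.10, 0.78, 0.40, 0.39]

print(round(cosine_similarity(baseline, today), 3))
```

A value close to 1 means the two vectors point in nearly the same direction. Tracking such a score over time is one simple way to notice drift from a baseline.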

Typing is a complex activity that requires interaction between the nervous system and the musculoskeletal system. I presume that any change in neurocognitive state, or any muscle weakness, could affect the typing pattern. Monitoring changes in the typing embeddings may therefore have applications in the monitoring or diagnosis of such disorders. To cut a long story short, I recently participated in a Devpost hackathon organized by TypingDNA. I submitted a simple solution that records the typing pattern and graphs the changes, and proposed it as a monitoring application for dermatomyositis. Details of the submission are available here and the simple prototype is here.

Devpost-TypingDNA hackathon

On a different note, I recently discovered that TensorFlow.js has a built-in facemesh model that can detect facial landmarks from webcam streams, all within the browser! I have created a React component called skinmesh that can harvest facial landmark coordinates (a facial embedding) from a webcam or any facial image. I feel this has potential applications in cosmetic dermatology. I have blogged about this here. Give me a shout if anybody wants to collaborate on creating a suitable backend.

Rendering FHIR Questionnaire for data capture

Standardized data collection forms are vital for health information systems. This is particularly true in public health, where there is a host of data collection forms shared by various organizations. InterRAI is a typical example. Standardization is important for collaborative data analytics at various levels, a need that became painfully apparent during the recent COVID-19 pandemic.

Google Nevit Dilmen Slawek Borewicz Commons-emblem-question blue / CC BY-SA (https://creativecommons.org/licenses/by-sa/3.0)

Though standardization of content is widely addressed, the standardization of form appearance and rendering is less so. Many healthcare teams find their information systems incapable of dealing with this requirement. They often have to resort to expensive resources to create these forms using various technologies such as JavaScript. It is not easy to share such electronic forms (E-Forms) with other organizations because of the different requirements of the host systems.

The HL7 FHIR Questionnaire is emerging as a standard that is capable of dealing with both the content and presentation of E-forms. We proposed the FHIRForm framework to use this standard for form management and introduced an open-source stack for form management. One of the components of the framework is an npm module (JavaScript) that converts the FHIR Questionnaire resource to a JSON schema that popular form rendering libraries can use. The component, called FHIRFormJS, can also convert the form submission into a QuestionnaireResponse resource that can be submitted to any FHIR-compliant server. Below is a sample ReactJS application that uses FHIRFormJS to render a FHIR Questionnaire.
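The conversion idea can be sketched in a few lines. The example below is written in Python rather than JavaScript and is not FHIRFormJS code. The `TYPE_MAP` table, the `questionnaire_to_schema` function and the sample Questionnaire are all hypothetical. The sketch handles only top-level items and a handful of item types.

```python
# Assumed mapping from Questionnaire item types to JSON Schema types (small subset).
TYPE_MAP = {
    "string": "string",
    "text": "string",
    "integer": "integer",
    "decimal": "number",
    "boolean": "boolean",
}


def questionnaire_to_schema(questionnaire):
    """Build a simple JSON Schema object from top-level Questionnaire items."""
    properties = {}
    required = []
    for item in questionnaire.get("item", []):
        link_id = item["linkId"]
        properties[link_id] = {
            "type": TYPE_MAP.get(item.get("type"), "string"),  # default to string
            "title": item.get("text", link_id),
        }
        if item.get("required"):
            required.append(link_id)
    return {
        "type": "object",
        "title": questionnaire.get("title", ""),
        "properties": properties,
        "required": required,
    }


# Hypothetical Questionnaire with two questions.
questionnaire = {
    "resourceType": "Questionnaire",
    "title": "Intake form",
    "item": [
        {"linkId": "age", "text": "Age in years", "type": "integer", "required": True},
        {"linkId": "smoker", "text": "Do you smoke?", "type": "boolean"},
    ],
}

schema = questionnaire_to_schema(questionnaire)
print(schema["properties"]["age"])
```

Using `linkId` as the property name keeps a direct link between each answer in the submitted form and the question it belongs to, which is what a QuestionnaireResponse needs.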

FHIRFormJS was a work in progress for a long time and was not really stable enough to use. I am excited to introduce the new version of FHIRFormJS, which is better and easier to use, made possible by the @ahryman40k/ts-fhir-types library. If you have been using older systems such as LHC-Forms, give FHIRFormJS a try and let me know how we can improve it. Pull requests are most welcome and I have added this repository as a Hacktoberfest participant. There is also a sample React application that uses FHIRFormJS (repository link above). If you are using Vue, I have a separate library that targets one of the popular form rendering engines for Vue.

Let me know if you find this interesting and use the GitHub issues for feature requests.

This blog has moved

This blog is now located at http://nuchange.ca/.

For feed subscribers, please update your feed subscriptions to
http://www.gulfdoctor.net/bioblog/bioblog.xml.

Counting conditional occurrences using Prolog

I wanted to implement the following rule in Prolog for my IISA project.

If the percentage of coil region is greater than 20%, the homologue detection algorithms may become unreliable. The coil module of IISA returns a Prolog database in the following format.

coil(105,a,0.000).
coil(109,g,0.001).
coil(110,l,0.004).
coil(111,l,0.014).
coil(112,v,0.055).
coil(113,g,0.055).
coil(114,s,0.371).
coil(115,e,0.416).
coil(116,k,0.860).
coil(117,v,0.955).
coil(118,t,0.998).
coil(119,m,0.999).
coil(120,q,0.999).
coil(121,n,0.999).

i.e. coil(position, amino acid, probability of being part of a coil).

I wanted to count the number of facts with the probability exceeding a cutoff value, say 0.7.

Though it is a simple problem, I could not find any solution for it even after googling for a few hours. Finally I found this code on the net.

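% Code to count the number of proofs for a goal. Found on the net.
% Uses flag/3, which is specific to SWI-Prolog.
count_proof(Goal, N) :-
    flag(counter, Outer, 0),            % save the old counter value, reset to 0
    (   call(Goal),
        flag(counter, Inner, Inner+1),  % add 1 for each solution of Goal
        fail                            % backtrack to look for the next solution
    ;   flag(counter, Count, Outer),    % no more solutions: read count, restore old value
        N = Count
    ).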
Though it worked well (I have still not figured out how!), the code is SWI-Prolog specific and did not work in the JLog Prolog applet I use for the IISA web interface.

Finally I learned about findall and bagof.
But it still took some time to figure out how to use findall for my purpose, because the description I had read was findall(Var, Database, Array), followed by applying length to Array. Using coil(_,_,A) as the database would count only the number of amino acids, and I didn’t know how to implement the condition A>0.7.
Finally I found the actual definition somewhere, findall(Var, Goal, Array), which solved my problem, though I wasted a whole day on this simple problem.

Here is the final code.

%The region is coil if the value is greater than threshold
coil_region(Cutoff, Val):-
                     coil(_,_,Val),
                     Val>Cutoff.
                    
coil_percent(Region, Total, Percent):-
                     setting(coil,X),
                     findall(Val,coil_region(X,Val),R),
                     length(R,Region),
                     findall(C,coil(_,_,C),A),
                     length(A,Total),
                     Total>0,        %Avoid division by zero
                     Percent is (Region/Total) * 100,!.

coil_percent(0,0,0).

IISA Home Page.

Zostavax

Merck recently gained approval for a vaccine called Zostavax for preventing herpes zoster reactivation in elderly people. This live virus vaccine is unusual in that it is intended to prevent re-emergence, not initial infection. Could the same concept be used for recurrent herpes simplex infection, which is much more disabling than herpes zoster in younger people?

Vigyaan CD Part II [NX SERVER]

Running the NX server and accessing it from Windows machines.

To start the NX server, click on KNOPPIX -> Services -> NX server.

The script is supposed to add a new user called nxuser. However, this functionality is not working properly in Vigyaan 1.0, so you have to use the root account or the default knoppix account. By default, neither of these accounts has a password, so you have to set one for both.

Click on the terminal icon on the status bar.
su
passwd
<change the password>
Close the terminal window.
System -> kduser
Select knoppix.
Uncheck ‘Disable account’.
Click Set password and set a new password.

Now, on your Windows machine, download and install the NX client from NoMachine.

Find the local IP of the Vigyaan box by typing ifconfig in a terminal window. Now you can access the Vigyaan box from your Windows machine.

Coiled Coils

Coiled coils consist of two to five amphipathic alpha-helices that twist around one another to form a supercoil, which can be left-handed or right-handed. Left-handed ones show a seven-residue periodicity and right-handed ones an 11-residue periodicity. Their stability is achieved by a knobs-into-holes packing of apolar side chains into a hydrophobic core.


By modulation of their polar interactions, many different properties, such as extreme thermostability, can be achieved. Coiled coils are involved in signal transduction and molecular recognition. They provide mechanical stability to cells and are involved in movement processes. Charged residues are frequently found at coiled coil interfaces.

The building block of intermediate filament (IF) architecture is an elongated coiled coil region (which in turn contains monomeric 1A and dimeric 1B, 2A and 2B sub-segments connected by short linkers) flanked by non-helical end domains. The dimeric 2B contains a discontinuity in the heptad repeat pattern, called a stutter, that creates an undecad repeat which is important for its structural integrity. The highly conserved region within the C-terminus of 2B has an inter-helical and an intra-helical salt bridge.

There are several software products to predict the coiled coil region of proteins based on the above structural peculiarities.

References:
Burkhard P, et al. Coiled coils: a highly versatile protein folding motif. Trends in Cell Biology 2001;11(2):82-88.


VigyaanCD

Setting up your own bioinformatics workstation using VigyaanCD

VigyaanCD at http://www.vigyaancd.org/ has a nice collection of bioinformatics software and is worth downloading. It boots directly from the CD and needs very little Linux expertise, as the X Window desktop is almost like Windows. However, I wanted to use it alongside Windows and did not want to reboot every time I switched operating systems. This is the story of how I set up VigyaanCD on a second-hand PIII and use it from my Windows XP laptop.

I bought a second-hand IBM PIII with a 450 MHz processor, a 10 GB HDD and 128 MB RAM. I formatted the hard disk and added the following partitions:
/boot 2 GB (ext2), /knophome 6 GB (ext2), /knopswap (vfat), /dos (vfat)
My video card supports only 800×600.

I copied the CD image to the boot partition with the knoppix tohd=/dev/hda1 command. I created a persistent home directory in the /knophome partition and a swap file in /knopswap.

I start with the following command.
knoppix screen=800x600 fromhd home=scan noprompt noeject

In my next post, I will write about how to set up the NX server and run the NX client from my laptop to access it.

PLoS Clinical Trials


Clinical trials, particularly randomized trials, are critical in delivering reliable evidence about the efficacy of an intervention. Clinical trial data can also provide important information about the potential adverse effects of treatment. Currently, not all trials on human participants are reported in the peer-reviewed literature. PLoS Clinical Trials aims to fill this gap. Because it is an open-access journal, all articles published in it will be immediately and freely available online. Join them in supporting these goals, and get your paper read by the widest possible audience: submit your trial results today.
[Download Poster.]