Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

Pragmatic Research That Builds and Travels

I have noticed a steady shift from abstract theorizing toward pragmatic research, resulting in tangible, reusable artifacts across many areas. These artifacts are not just code; they are models, methods, algorithms, datasets, and tools that solve real operational problems. In areas where generative AI is already changing workflows, the value of such pragmatic research is becoming unmistakable.


Image credit: Justmee3001, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Why building matters now

The catalyst is twofold. First, the technical maturity of generative AI and related toolchains has lowered the cost of moving from idea to prototype. Second, health systems and organizations are asking for systems that integrate with workflows and regulatory constraints rather than for more conceptual frameworks. In practice, this means moving upstream in the research lifecycle: designing artifacts with deployability, explainability, and governance in mind, and creating reproducible stacks that others can use.

Open-source availability plays a special role. When models, algorithms, and tools are shared openly, they invite scrutiny, rapid iteration, and safer deployment, especially in high-stakes domains like healthcare, where transparency aids validation and trust. Open artifacts accelerate safe, community-driven improvements and reduce single-vendor lock-in, improving the odds that a research output will see real-world use.

How evaluation and impact change

Traditional academic success metrics emphasize conceptual novelty and citation counts. For pragmatic research, those metrics are necessary but insufficient. The new signals of value include artifact availability, adoption, downloads, forks, integration reports, and even social engagement that indicates uptake and practitioner interest. Empirical evaluation will increasingly combine:

  • Classical metrics from peer review and controlled experiments.
  • Community signals (downloads, GitHub stars/forks, package installs).
  • Operational outcomes (reduced task time, fewer errors, improved throughput).
  • Policy and governance readiness (documentation, auditing hooks, monitoring plans).

As researchers build usable systems, journals and conferences will need to evolve their review criteria to assess reproducibility and real-world applications, not just the strength of theoretical claims.

Sharing, incentives, and scholarly credit

Open-source distribution is central to the pragmatic approach because it enables external validation and iterative refinement. But scholarship must also evolve to reward the labor of engineering, documentation, and maintenance. Practical contributions, such as well-documented software and model releases, replicable deployment recipes, and usable toolkits, should become first-class scholarly outputs. Peer communities should value artifacts that show measurable use in the wild, not just theoretical elegance.

Risks and guardrails

A pragmatic focus raises important risks: rushed or poorly validated tools entering clinical environments, fragile artifacts that break in new settings, and overreliance on usage metrics that can be gamed. Academic conferences and funders must insist on transparency: open validation datasets (where privacy allows), clear documentation of model limitations, and post-deployment evaluation plans.

What this means for MIS and health informatics researchers

For MIS researchers, the pragmatic paradigm reframes scholarship as product plus evidence. Studies should connect organizational processes, human factors, and deployed systems, measuring how an artifact changes decisions, coordination, or resource allocation. For health informatics scholars, the emphasis on safety, explainability, and auditability becomes non-negotiable; artifacts must be designed with clinical oversight, privacy-preserving techniques, and regulatory constraints in mind.

Practically, scholars will benefit from adopting engineering best practices: continuous integration for models, packaged reproducible environments, clear APIs, and user-centered design. Collaboration across disciplinary boundaries, with clinical partners, product engineers, ethicists, and implementation scientists, will be essential to translate artifacts into impact.

Research that travels

The pragmatic paradigm restores a simple promise: research should travel beyond the page. When MIS and health informatics scholars build artifacts designed for real settings and share them openly, scholarship becomes a living conversation, one of iterative improvements, operational learning, and measurable benefits. Publication will no longer be the last step in the journey; it will be a milestone on the route to adoption, where downloads, forks, deployment stories, and measurable outcomes tell the fuller story of impact. In an era powered by generative AI, the most consequential research will be the kind that people can pick up, run, and improve: research that travels beyond the lab and the paper into real-world settings.

DocumentReference hook in CQL execution

The GitHub repository below is a fork of the CQL Execution Framework, a TypeScript/JavaScript library for executing Clinical Quality Language (CQL) artifacts expressed as JSON ELM. The fork adds an experimental feature that supports LLM-based assertion checking on DocumentReference resources. The framework can execute CQL logic against different data models, such as QDM and FHIR, but it does not itself implement any data model or terminology service. The library implements many features from CQL 1.4 and 1.5 but has some limitations, such as incomplete support for certain datatypes and functions.

Locally hosted LLMs

TL;DR: From my personal experiments (on an 8-year-old i5 laptop with 16 GB RAM), locally hosted LLMs are extremely useful for many tasks that do not require much knowledge captured in the model.

Image credit: I, Luc Viatour, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

The era of relying solely on large language models (LLMs) for all-encompassing knowledge is evolving. As technology advances, the focus shifts towards more specialized and integrated systems. These systems combine the strengths of LLMs with real-time data access, domain-specific expertise, and interactive capabilities. This evolution aims to provide more accurate, context-aware, and up-to-date information, saving us time and addressing the limitations of static model knowledge.

I have started to realize that LLMs are more useful as language assistants that can summarize documents, write discharge summaries, and find relevant information in a patient’s medical record. The last of these still has several unsolved limitations, and reliable diagnostic (or other) decision-making is still in the (distant?) future. In short, LLMs are becoming increasingly useful in healthcare as time-saving tools, but they are unlikely to replace us doctors as decision-makers soon. That raises an important question: do locally hosted LLMs (or even smaller models) have a role to play? I believe they do!

Locally hosted large language models (LLMs) offer several key benefits. First, they provide enhanced data privacy and security, as all data remains on your local infrastructure, reducing the risk of breaches and unauthorized access. Second, they allow for greater customization and control over the hardware, software, and data used, enabling more tailored solutions. Additionally, locally hosted LLMs can operate offline, making them valuable in areas with unreliable internet access. Finally, they can reduce latency and potentially lower costs if you already have the necessary hardware. These advantages make locally hosted LLMs an attractive option for many users.  

Modern, user-friendly platforms like Ollama are significantly lowering the technical expertise needed to self-host large language models (LLMs). The availability of a range of open-source models on Hugging Face lowers the barrier even further.

I have been doing some personal experiments with Ollama (on Docker), Microsoft’s phi3:mini (language model), and all-minilm (embedding model), and I must say I am pleasantly surprised by the results! All of this runs on an 8-year-old i5 laptop with 16 GB RAM. I am using this setup as part of a project for democratizing Gen AI in healthcare, especially for resource-deprived areas (more about it here), and it does a decent job of vectorizing health records and answering questions based on RAG. I also made a helpful RAG-based personal writing assistant. I am curious to know if anybody else in my network is doing similar experiments with locally hosted LLMs on personal hardware.

To or not to LangChain

LangChain is a free and open-source orchestration framework for building applications that rely on large language models (LLMs). Although it is widely used, it is sometimes criticized as complex, insecure, unscalable, and hard to maintain. Because the framework is new, some of these critiques might be valid, but they might also be a strategy by the dominant LLM actors to regain power from the rebels.

The well-known machine learning frameworks PyTorch and TensorFlow come from the major players who also own some of the largest and most powerful LLMs on the market. By offering these frameworks for free, they can attract more developers and researchers to their LLMs and platforms and gain more data and insights from them. They can also shape the standards and norms of the LLM ecosystem and influence the direction of future research and innovation.

It may not be the case that the major actors are actively trying to discredit LangChain, but some trends are worth noting. A common misconception is that the shortcomings of LLMs are due to LangChain; you often hear about LangChain hallucinating! Another frequent strategy is to muddy the discussion by introducing terms that conflict with the more widely used LangChain vocabulary. SDKs from major actors (deliberately) attempt to substitute their own syntax for LangChain’s.

You might be bewildered after a conference run by the main players, and that could be part of their plan to make you dependent on their products. My approach is to use a mind map to keep track of the LLM landscape and refer to that when suggesting LLM solutions. It also helps to have a list of open-source implementations of common patterns.  

Mind map of LLM techniques, methods and tools

I have also noticed that the big players are gradually giving up and embracing the LangChain paradigms. Despite LangChain’s limitations, I feel it is here to stay! What do you think?

Named Entity Recognition using LLMs: a cTAKES alternative?

TL;DR: The targeted distillation method described here may be useful for creating an LLM-based cTAKES alternative for Named Entity Recognition. However, the recipe code is not available yet.

Image credit: Wikimedia

Named Entity Recognition is essential in clinical documents because it enhances patient safety, supports efficient healthcare workflows, aids in research and analytics, and ensures compliance with regulations. It enables healthcare organizations to harness the valuable information contained in clinical documents for improved patient care and outcomes. 

Although Large Language Models (LLMs) can perform Named Entity Recognition (NER), the capability can be improved by fine-tuning, where you provide the model with input text that contains named entities and their associated labels. The model learns to recognize these entities and classify them into predefined categories. However, as described before, fine-tuning LLMs is challenging due to the need for substantial, high-quality labelled data, the risk of overfitting on limited datasets, complex hyperparameter tuning, the computational resources required, domain adaptation difficulties, ethical considerations, the interpretability of results, and the need to define appropriate evaluation metrics.

Targeted distillation of Large Language Models (LLMs) is a process where a smaller model is trained to mimic the behaviour of a larger, pre-trained LLM but only for specific tasks or domains. It distills the essential knowledge of the LLM, making it more efficient and suitable for particular applications, reducing computational demands.  

This paper describes targeted distillation with mission-focused instruction tuning to train student models that can excel in a broad application class. The authors present a general recipe for such targeted distillation from LLMs and demonstrate it on open-domain NER. Their recipe may be useful for creating efficient distilled models that can perform NER on clinical documents, a potential alternative to cTAKES. Although the authors have open-sourced their generic UniversalNER model, they haven’t released the distillation recipe code yet.
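To make "mission-focused instruction tuning" more concrete, the sketch below turns a passage and the entities a teacher LLM extracted from it into one multi-turn training conversation, asking about one entity type per turn and answering with a JSON list of mentions. It is modelled loosely on the conversation-style template described in the UniversalNER paper, but the exact wording, the `from`/`value` field names, and the medical entity types are assumptions rather than the authors' released format.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Turn is one message in a chat-style training example (field names assumed).
type Turn struct {
	From  string `json:"from"`
	Value string `json:"value"`
}

// buildConversation turns a passage plus teacher-extracted entities
// (entity type -> mentions) into one multi-turn instruction example.
// types fixes the order in which entity types are asked about.
func buildConversation(passage string, types []string, ents map[string][]string) ([]Turn, error) {
	conv := []Turn{
		{"human", "Text: " + passage},
		{"gpt", "I've read this text."},
	}
	for _, t := range types {
		mentions := ents[t]
		if mentions == nil {
			mentions = []string{} // encode "no mentions" as [] rather than null
		}
		answer, err := json.Marshal(mentions)
		if err != nil {
			return nil, err
		}
		conv = append(conv,
			Turn{"human", "What describes " + t + " in the text?"},
			Turn{"gpt", string(answer)})
	}
	return conv, nil
}

func main() {
	ents := map[string][]string{"medication": {"metformin"}, "condition": {"type 2 diabetes"}}
	conv, err := buildConversation("Started metformin for type 2 diabetes.",
		[]string{"medication", "condition", "allergy"}, ents)
	if err != nil {
		panic(err)
	}
	out, _ := json.MarshalIndent(conv, "", "  ")
	fmt.Println(string(out))
}
```

Asking about an entity type that has no mentions (here, `allergy`) yields an empty list, which gives the student explicit negative examples.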

REF: Zhou, W., Zhang, S., Gu, Y., Chen, M., & Poon, H. (2023). UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition. arXiv preprint arXiv:2308.03279.

Distilling LLMs to small task-specific models

Deploying large language models (LLMs) can be difficult because they require a lot of memory and computing power to run efficiently. Companies want to create smaller task-specific LLMs that are cheap and easy to deploy. Such small models may even be more interpretable, an important consideration in healthcare.

Distilling LLMs

Distilling LLMs refers to the process of training a smaller, more efficient model to mimic the behaviour of a larger, more complex LLM. This is done by training the smaller model on the same task as the larger model but using the predictions of the larger model as “soft targets” or guidance during training. The goal of distillation is to transfer the knowledge and capabilities of the larger model to the smaller model, without requiring the same level of computational resources.

Distilling step-by-step is an efficient distillation method proposed by Google that requires less training data. The intuition is that training the student on rationales generated through chain-of-thought prompting alongside the labels, framed as multi-task learning, improves distillation performance. We can use ground-truth labels or use a teacher LLM to generate both the labels and the rationales. Ground-truth labels are the correct labels for the data, typically obtained from human annotators. The rationale for each label can be generated by prompting the teacher LLM to produce a short explanation for why that label is correct.
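The multi-task framing can be sketched as data preparation plus a combined loss. Each record is expanded into two seq2seq examples that share the same input but carry different task prefixes: one asks for the label, the other for the rationale, and the two losses are summed as L = L_label + λ·L_rationale, the form given in the paper. The prefix strings, the `Record`/`Example` types, and the sample record are illustrative; the original repository's exact prefixes may differ, and the actual model training is omitted.

```go
package main

import "fmt"

// Record is one training item: input text, label, and teacher rationale.
type Record struct {
	Input, Label, Rationale string
}

// Example is a seq2seq source/target pair for the student model.
type Example struct {
	Source, Target string
}

// Task prefixes tell the student which output to produce (illustrative strings).
const (
	labelPrefix     = "[label] "
	rationalePrefix = "[rationale] "
)

// toMultiTask expands each record into a label task and a rationale task.
func toMultiTask(recs []Record) (labelTask, rationaleTask []Example) {
	for _, r := range recs {
		labelTask = append(labelTask, Example{labelPrefix + r.Input, r.Label})
		rationaleTask = append(rationaleTask, Example{rationalePrefix + r.Input, r.Rationale})
	}
	return labelTask, rationaleTask
}

// totalLoss combines the per-task losses: L = L_label + lambda * L_rationale.
func totalLoss(labelLoss, rationaleLoss, lambda float64) float64 {
	return labelLoss + lambda*rationaleLoss
}

func main() {
	recs := []Record{{
		Input:     "Patient reports chest pain on exertion",
		Label:     "cardiac",
		Rationale: "Chest pain brought on by exertion suggests a cardiac cause.",
	}}
	lt, rt := toMultiTask(recs)
	fmt.Println(lt[0].Source, "->", lt[0].Target)
	fmt.Println(rt[0].Source, "->", rt[0].Target)
	fmt.Println(totalLoss(0.8, 1.2, 1.0))
}
```

Only the label task is needed at inference time, so the rationales improve training without adding any cost when the distilled model is deployed.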

The paper on the method is here, and the repository is here. I have converted the code from the original repository into a tool that can distill any seq2seq model into a smaller model based on a generic schema. See the repository below. The original paper uses Google’s T5 v1.1 models, which belong to the T5 (Text-to-Text Transfer Transformer) family and are based on the Transformer architecture. You can find more open-source base models for distillation on Hugging Face. The next plan is to use this method to create a model that can predict the FHIR filter for this repository.

Distilling LLMs step by step!

I will update this post regularly with my findings and notes on distilling models. Also, please check out my post on NLP tools in healthcare.

Using OpenFaaS containers in Kubeflow 

OpenFaaS

OpenFaaS is an open-source framework for building serverless functions with containers. Serverless functions are pieces of code that are executed in response to a specific event, such as an HTTP request or a message being added to a queue. These functions are typically short-lived and run only when needed, which makes them a cost-effective and scalable way to build cloud-native applications. OpenFaaS makes it easy to build, deploy, and manage serverless functions, and the OpenFaaS CLI minimizes the need to write boilerplate code. You can write code in any supported language and deploy it to any cloud provider. OpenFaaS provides a set of base container templates that wrap the ‘function’ in a web server exposing its HTTP service on port 8080 (incidentally, the default port for Google Cloud Run). OpenFaaS containers can be deployed directly on Google Cloud Run, and with the faas CLI on any cloud provider.


Kubeflow

Kubeflow is a toolkit for building and deploying machine learning models on Kubernetes. It is designed to make it easy to build, deploy, and manage end-to-end machine learning pipelines, from data preparation and model training to serving predictions and implementing custom logic, and it can be used with any machine learning framework or library. Google’s Vertex AI platform can run Kubeflow pipelines. Kubeflow pipeline components are self-contained pieces of code, each performing one step in the machine learning workflow. They are packaged as Docker container images and pushed to a container registry that the Kubernetes cluster can access. In effect, a Kubeflow component is a containerized command-line application that receives its input and output locations as command-line arguments.

OpenFaaS containers expose HTTP services, while Kubeflow containers provide CLI services. That introduces the possibility of tweaking OpenFaaS containers to support CLI invocation, making the same containers usable as Kubeflow components. Below I explain how a minor tweak in the OpenFaaS templates can enable this. 

Let me take the OpenFaaS golang template as an example. The same principle applies to other language templates as well. In the golang-middleware’s main.go, the following lines set the main route and start the server. This exposes the function as a service when the container is deployed on Cloud Run.

 
	// Register the function as the HTTP handler and serve until shutdown.
	http.HandleFunc("/", function.Handle)
	listenUntilShutdown(s, healthInterval, writeTimeout)

I have added the following lines [see on GitHub] that expose the same function on the command line for Kubeflow.  

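	if len(os.Args) < 3 {
		// No file arguments: serve over HTTP as before (OpenFaaS, Cloud Run).
		listenUntilShutdown(s, healthInterval, writeTimeout)
	} else {
		// Kubeflow passes the input and output file paths as arguments.
		dat, err := os.ReadFile(os.Args[1])
		if err != nil {
			log.Fatalf("reading input: %v", err)
		}
		if err := os.WriteFile(os.Args[2], function.HandleFile(dat), 0644); err != nil {
			log.Fatalf("writing output: %v", err)
		}
	}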

If the input and output file names are supplied on the command line, as they are in Kubeflow, the handler reads from and writes to those files. The Kubeflow component definition is shown below:

inputs:
- {name: Input 1}
outputs:
- {name: Output 1}
implementation:
  container:
    image: <image:version>
    command: [
        'sh',
        '-c',
        'mkdir -p "$(dirname "$1")" && /home/app/handler "$0" "$1"',
    ]
    args: [{inputPath: Input 1}, {outputPath: Output 1}]

With this simple tweak, the same container can run as a serverless function on any cloud provider and as a Kubeflow component. You can pull the modified template from the repo below.

Open-source for healthcare

This post is meant to be an instruction guide for healthcare professionals who would like to join my projects on GitHub.


What is a contribution?

A contribution is not always code. You can clean things up, add documentation, write instructions for others to follow, and so on. Post issues and feature requests under the ‘Issues’ tab, and general discussions under the ‘Discussions’ tab if one is available.

How do I contribute?

How do I develop?

  • The .devcontainer folder contains the configuration for the Docker container used for development.
  • The version bump action (if present) automatically bumps the version when a commit message contains one of the terms major, minor, or patch. Avoid these words in commit messages unless you want to trigger the action.
  • Most repositories have GitHub Actions that generate and deploy the documentation and changelog.
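The exact rules depend on which version bump action a repository uses. As a rough model of the behaviour described above, this Go sketch picks the highest-ranking keyword in a commit message (major over minor over patch) and bumps a MAJOR.MINOR.PATCH version accordingly. Matching whole words case-insensitively is an assumption of this sketch; a real action may match differently, so the advice to keep these words out of ordinary commit messages still applies.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bump returns the next semantic version for a commit message. "major"
// outranks "minor", which outranks "patch"; with none of them the version
// is returned unchanged. Keywords are matched as whole words, ignoring case.
func bump(version, message string) (string, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) != 3 {
		return "", fmt.Errorf("not a MAJOR.MINOR.PATCH version: %q", version)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return "", err
		}
		n[i] = v
	}
	// Split the message into lowercase letter-only words.
	words := map[string]bool{}
	for _, w := range strings.FieldsFunc(strings.ToLower(message), func(r rune) bool {
		return !('a' <= r && r <= 'z')
	}) {
		words[w] = true
	}
	switch {
	case words["major"]:
		n = [3]int{n[0] + 1, 0, 0}
	case words["minor"]:
		n = [3]int{n[0], n[1] + 1, 0}
	case words["patch"]:
		n[2]++
	}
	return fmt.Sprintf("%d.%d.%d", n[0], n[1], n[2]), nil
}

func main() {
	for _, msg := range []string{"docs: fix typo", "patch: handle empty input", "Minor: add FHIR filter", "major rewrite"} {
		v, err := bump("1.4.2", msg)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-28q -> %s\n", msg, v)
	}
}
```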

What do I do next?

  • My repositories (so far) are small enough for beginners to get the big picture and make meaningful contributions.
  • Don’t be discouraged if you make mistakes. That is how we all learn.

There’s no better time than now to choose a repo to contribute!