Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

Architecting LLM solutions for healthcare – Part II

Healthcare data and applications are complex and require careful design choices. In a previous post, I have outlined some examples and tools for architecting LLM solutions.

One challenge of developing LLM applications for healthcare is the complexity and diversity of the architectures involved. LLMs can be used for different purposes, such as information retrieval, text generation, or knowledge extraction. Each of these tasks may require different components, such as vector stores, databases, chains, or tools. Moreover, the components may interact with each other in various ways, such as through APIs, data pipelines, or feedback loops.

To communicate the architecture of an LLM application to software developers, it is helpful to have a set of standardized symbols that can represent the different components and their interactions. Such standardized notations help in developing the conceptual (business) architecture to a logical architecture. Symbols can provide a visual and intuitive way of describing the structure and logic of an LLM application, without requiring too much technical detail or jargon. Symbols can also facilitate comparison and evaluation of different architectures and identify potential gaps or errors.

In this post, I propose a set of symbols for LLM components that are likely to be used commonly in healthcare applications. This may apply to other domains as well. Our goal is to provide a useful communication tool for software developers who want to design, implement, or understand LLM applications for healthcare.

Architecting LLM solutions: LLM notations and symbols

Most of the symbols above, for architecting LLM solutions, are self-explanatory and aligns with abstractions in LangChain. An LLM chain can have sequential or parallel flow, depending on the design and purpose of the application. We use the following conventions to depict the interactions between LLM components:

Other Symbols

  • An arrow indicates the direction of data flow between components. For example, A -> B means that component A sends data to component B.
  • A dashed arrow indicates an optional or conditional data flow between components. For example, A -/> B means that component A may or may not send data to component B, depending on some condition.
  • A loop arrow indicates a feedback or reinforcement data flow between components. For example, A B means that component A and component B exchange data with each other, either to update their parameters or to improve their performance.
  • A branch arrow indicates a parallel or alternative data flow between components. For example, A -|> B means that component A splits its data into two streams, one of which goes to component B and the other goes elsewhere.
  • A merge arrow indicates a joint or combined data flow between components. For example, A B means that component A and component B join their data into one stream, which goes to another component.
  • A label above an arrow indicates the type or format of the data that flows between components. For example, A ->[text] B means that component A sends textual data to component B.

I would love to hear your suggestions, feedback and input on how to improve this process!

Navigating the Complexities of Gen AI in Medicine: 5 Development Blunders to Avoid

Below, I have listed five critical missteps that you should steer clear of to ensure the successful integration of Gen AI in Medicine. This post is primarily for healthcare professionals managing a software team developing a Gen AI application.

Image credit: Nicolas Rougier, GPL via Wikimedia Commons

#1 Focus on requirements

Gen AI is an evolving technological landscape. ChatGPT’s user interface makes it look simplistic. Even a simple interface to any LLM is useful for mundane clinical chores (provided PHI is handled appropriately). However, developing an application that can automate tasks or assist clinical decision-making requires much engineering. It’s crucial to define clear and detailed requirements for your Gen AI solution. Without a comprehensive understanding of the needs and constraints, your project can easily become misaligned with clinical goals. Ensure that your AI application is not only technically sound but also meets the specific demands of healthcare settings. This precision will guide your software development process, avoiding costly detours or features that do not add value to healthcare providers or patients.

#2 Avoid solutioning

When working with your software team, be wary of dictating specific semi-technical solutions too early in the process. Most applications require techniques beyond prompting in a text window. It’s essential to allow your engineering team to explore and assess various options that can best meet the outlined requirements. By fostering an environment where creative and innovative problem-solving flourishes, you enable the team to find the most effective and sustainable technological path. This approach can also lead to discoveries of new capabilities of Gen AI that could benefit your project.

#3 Prioritize features

It’s essential to prioritize the features that will bring the most value to the end-user. Engage with stakeholders, including clinicians and patients, to understand what functionalities are most critical for their workflow and care delivery. This collaborative approach ensures the practicality of the AI application and aligns it with user needs. Avoid overloading your app with unnecessary features that complicate the user experience and detract from the core value proposition. Instead, aim for a lean product with high-impact features.

Gen AI app development is a time-consuming and technically challenging process. It is important to keep this in mind while prioritizing. Time and resource management are key in this regard. Allocate sufficient time for your team to refine their work, ensuring that each feature is developed with quality and precision. This disciplined approach to scheduling also helps in avoiding burnout among your team members, which is common in high-pressure development environments. Remember, a feature-packed application that lacks reliability or user-friendliness is less likely to be embraced by the healthcare community. Focus on delivering a polished, useful tool.

#4 You may never get it right, the first time when it comes to Gen AI in Medicine

Accept that perfection is unattainable on the initial try. In the world of software, especially with Gen AI, iterative testing and refinement are key. Encourage your team to build a Minimum Viable Product (MVP) and then improve it through user feedback and continuous development cycles. This iterative process is crucial to adapt to the ever-changing needs of healthcare professionals and to integrate the latest advancements in AI. Also, don’t underestimate the value of user testing; real-world feedback is invaluable.

#5 Avoid technology pivots and information overloads

Avoiding abrupt technological shifts late in the development cycle is critical. Such pivots can be costly and disruptive, derailing the project timeline. Stay committed to the chosen technology stack unless significant, unforeseeable impediments arise. Additionally, guard against overwhelming your team with excessive information. While staying informed is crucial, too much data can paralyze decision-making. Strive for a balance that empowers your team with the knowledge they need to be effective without causing analysis paralysis.

In my next post, I will explain the symbols and notations that I employ in my Gen AI in Medicine development process. BTW, What is your next Gen AI in Medicine project?

Medprompt: How to architect LLM solutions for healthcare.

Leveraging the power of advanced machine learning, particularly large language models (LLMs), has increasingly become a transformative element in healthcare and medicine. The applications of LLMs in healthcare are multifaceted, showing immense potential to improve patient outcomes, streamline administrative tasks, and foster medical research and innovation.

David S. Soriano, CC BY-SA 4.0 <https://creativecommons.org/licenses/by-sa/4.0>, via Wikimedia Commons
Image Credit: David S. Soriano, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Architecting LLM solutions in the healthcare domain is challenging because of the intricacies associated with healthcare data and the complex nature of healthcare applications. In this post, I will give some recommendations based on the widely popular LangChain library, giving some examples.

The first step is to define the overarching problem you are trying to solve. It can be broad as in getting the right information about a patient to the doctor. Next, subdivide the problem into subproblems that can be tackled separately. For example in the above case, we need to find the patient’s health record, convert it into an easily searchable form (embedding), find areas of interest in the record and generate a summary or an answer to the specific question. Next, find solutions for each problem that may or may not require an LLM. Finally, design the orchestrator that can stitch everything together.

LangChain has some useful abstractions that will help in the last two steps. If the solution does not involve an LLM and mostly involves data retrieval and transformations, use the tool abstraction. If you need one or more LLM calls to achieve it, use the chain abstraction. Agents are the orchestrators that can stitch everything together. It is important to carefully craft the prompts for the chains and agents. Rigorous testing is vital. This includes technical performance and validation of the model’s recommendations by healthcare professionals to ensure they are accurate and clinically relevant.

MEDPrompt coming soon!

Named Entity Recognition using LLMs: a cTakes alternative?

TLDR: The targeted distillation method described may be useful for creating an LLM-based cTakes alternative for Named Entity Recognition. However, the recipe is not available yet. 

Image credit: Wikimedia

Named Entity Recognition is essential in clinical documents because it enhances patient safety, supports efficient healthcare workflows, aids in research and analytics, and ensures compliance with regulations. It enables healthcare organizations to harness the valuable information contained in clinical documents for improved patient care and outcomes. 

Though Large Language Models (LLMs) can perform Named Entity Recognition (NER), the capability can be improved by fine-tuning, where you provide the model with input text that contains named entities and their associated labels. The model learns to recognize these entities and classify them into predefined categories. However, as described before fine-tuning Large Language Models (LLMs) is challenging due to the need for substantial, high-quality labelled data, the risk of overfitting on limited datasets, complex hyperparameter tuning, the requirement for computational resources, domain adaptation difficulties, ethical considerations, the interpretability of results, and the necessity of defining appropriate evaluation metrics. 

Targeted distillation of Large Language Models (LLMs) is a process where a smaller model is trained to mimic the behaviour of a larger, pre-trained LLM but only for specific tasks or domains. It distills the essential knowledge of the LLM, making it more efficient and suitable for particular applications, reducing computational demands.  

This paper described targeted distillation with mission-focused instruction tuning to train student models that can excel in a broad application class. The authors present a general recipe for such targeted distillation from LLMs and demonstrate that for open-domain NER. Their recipe may be useful for creating efficient distilled models that can perform NER on clinical documents, a potential alternative to cTakes. Though the authors have open-sourced their generic UniversalNER model, they haven’t released the distillation recipe code yet. 

REF: Zhou, W., Zhang, S., Gu, Y., Chen, M., & Poon, H. (2023). UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition. ArXiv. /abs/2308.03279 

Distilling LLMs to small task-specific models

Deploying large language models (LLMs) can be difficult because they require a lot of memory and computing power to run efficiently. Companies want to create smaller task-specific LLMs that are cheap and easy to deploy. Such small models may even be more interpretable, an important consideration in healthcare.

Distilling LLMs

Distilling LLMs refers to the process of training a smaller, more efficient model to mimic the behaviour of a larger, more complex LLM. This is done by training the smaller model on the same task as the larger model but using the predictions of the larger model as “soft targets” or guidance during training. The goal of distillation is to transfer the knowledge and capabilities of the larger model to the smaller model, without requiring the same level of computational resources.

Distilling step-by-step is an efficient distillation method proposed by Google that requires less amount of training data. The intuition is that the use of rationale generated by a chain of thought prompting along with labels during training, thereby framing it as multi-task learning, improves distillation performance. We can use ground truth labels or use a teacher LLM to generate the labels and rationale. Ground truth labels are the correct labels for the data, and they are typically obtained from human annotators. The rationale for each label can be generated by using the model to generate a short explanation for why the model predicted that label.

The paper on the method is here and the repository is here. I have converted the code from the original repository into a tool that can be used to distill any seq2seq model into a smaller model based on a generic schema. See the repository below. The original paper uses Google’s T5-v1 model, which is a large-scale language model that was developed by Google. It is part of the T5 (Text-to-Text Transfer Transformer) family of models and is based on the Transformer architecture. You can find more open-source base models for distilling on huggingface. The next plan is to use this method to create a model that can predict the FHIR filter for this repository.

Distilling LLMs step by step!

I will update this post regularly with my findings and notes on distilling models. Also, please check out my post on NLP tools in healthcare.

Kedro for multimodal machine learning in healthcare 

Healthcare data is heterogenous with several types of data like reports, tabular data, and images. Combining multiple modalities of data into a single model can be challenging due to several reasons. One challenge is that the diverse types of data may have different structures, formats, and scales which can make it difficult to integrate them into a single model. Additionally, some modalities of data may be missing or incomplete, which can make it difficult to train a model effectively. Another challenge is that different modalities of data may require different types of pre-processing and feature extraction techniques, which can further complicate the integration process. Furthermore, the lack of large-scale, annotated datasets that have multiple modalities of data can also be a challenge. Despite these challenges, advances in deep learning, multi-task learning and transfer learning are making it possible to develop models that can effectively combine multiple modalities of data and achieve reliable performance. 

Pipelines Kedro for multimodal machine learning

Kedro for multimodal machine learning

Kedro is an open-source Python framework that helps data scientists and engineers organize their code, increase productivity and collaboration, and make it easier to deploy their models to production. It is built on top of popular libraries such as Pandas, TensorFlow and PySpark, and follows best practices from software engineering, such as modularity and code reusability. Kedro supplies a standardized structure for organizing code, handling data and configuration, and running experiments. It also includes built-in support for version control, logging, and testing, making it easy to implement reproducible and maintainable pipelines. Additionally, Kedro allows to easily deploy the pipeline on cloud platforms like AWS, GCP or Azure. This makes it a powerful tool for creating robust and scalable data science and data engineering pipelines. 

I have built a few kedro packages that can make multi-modal machine learning easy in healthcare. The packages supply prebuilt pipelines for preprocessing images, tabular and text data and build fusion models that can be trained on multi-modal data for easy deployment. The text preprocessing package currently supports BERT and CNN-text models. There is also a template that you can copy to build your own pipelines making use of the preprocessing pipelines that I have built. Any number and combination of data types are supported. Additionally, like any other kedro pipeline, these can be deployed on kubeflow and VertexAI. Do comment below if you find these tools useful in your research. 

Link to the repository below.

Dark Mode

kedro-multimodal (this link opens in a new window) by dermatologist (this link opens in a new window)

Template for multi-modal machine learning in healthcare using Kedro. Combine reports, tabular data and image using various fusion methods.

Using OpenFaaS containers in Kubeflow 

OpenFaas

OpenFaaS is an open-source framework for building serverless functions with containers. Serverless functions are pieces of code that are executed in response to a specific event, such as an HTTP request or a message being added to a queue. These functions are typically short-lived and only run when needed, which makes them a cost-effective and scalable way to build cloud-native applications. OpenFaaS makes it easy to build, deploy, and manage serverless functions. OpenFaaS CLI minimizes the need to write boilerplate code. You can write code in any supported language and deploy it to any cloud provider. It provides a set of base containers that encapsulates the ‘function’ with a webserver that exposes its HTTP service on port 8080 (incidentally the default port for Google Cloud Run). OpenFaaS containers can be directly deployed on Google Cloud Run and with the faas CLI on any cloud provider. 

OpenFaaS ® – Serverless Functions Made Simple

Kubeflow

Kubeflow is a toolkit for building and deploying machine learning models on Kubernetes. Kubeflow is designed to make it easy to build, deploy, and manage end-to-end machine learning pipelines, from data preparation and model training to serving predictions and implementing custom logic. It can be used with any machine learning framework or library. Google’s Vertex AI platform can run Kubeflow pipelines. Kubeflow pipeline components are self-contained code that can perform a step in the machine learning workflow. They are packaged as a docker container and pushed to a container registry that the Kubernetes cluster can access. A Kubeflow component is a containerized command line application that takes input and output as command line arguments.  

OpenFaaS containers expose HTTP services, while Kubeflow containers provide CLI services. That introduces the possibility of tweaking OpenFaaS containers to support CLI invocation, making the same containers usable as Kubeflow components. Below I explain how a minor tweak in the OpenFaaS templates can enable this. 

Let me take the OpenFaaS golang template as an example. The same principle applies to other language templates as well. In the golang-middleware’s main.go, the following lines set the main route and start the server. This exposes the function as a service when the container is deployed on Cloud Run.

 
	http.HandleFunc("/", function.Handle),  

	listenUntilShutdown(s, healthInterval, writeTimeout) 

I have added the following lines [see on GitHub] that expose the same function on the command line for Kubeflow.  

	if len(os.Args) < 2 {,  

		listenUntilShutdown(s, healthInterval, writeTimeout) 

	} else { 

		dat, _ := os.ReadFile(os.Args[1]) 

		_dat := function.HandleFile(dat) 

		_ = os.WriteFile(os.Args[2], _dat, 0777) 

	} 

If the input and output file names are supplied on the command line as in kubeflow, it reads from and writes to those files. The kubeflow component definition is as below: 

implementation:
  container:
    image: <image:version>
    command: [
        'sh',
        '-c',
        'mkdir --parents $(dirname "$1") && /home/app/handler "$0" "$1"',
    ]
    args: [{inputPath: Input 1}, {outputPath: Output 1}]

With this simple tweak, we can use the same container to host the function on any cloud provider as serverless functions and Kubeflow components.  You can pull the modified template from the repo below.

Six things data scientists in healthcare should know

Healthcare, like most other fields, is eager to get on the data science bandwagon. Data scientists can make a huge difference in the way big data is utilized for clinical decision-making. However, there are paradigmatic differences in the way data scientists from quantitative fields view the world, compared to their clinical counterparts. This is especially true in the emerging fields of machine learning and artificial intelligence. This may lead to considerable inefficiencies. As a person trained in both fields, here is my take on this.

Data scientists
Credit: Dasaptaerwin, CC0, via Wikimedia Commons

Data scientists should focus on the problem and not the solutions

Data scientists are excited about the latest GPT or BERT. Data scientists tend to refine the model a bit more using 10 more GPUs! In the process, they tend to solve problems that do not exist. From my experience practicing medicine in extremely resource-poor areas, simple solutions are valued more than BERT running on Kubernetes! This is true in the developed world as well, and many teams may have fundamental data needs that need to be tackled first.

Explanation comes before prediction

Emerging machine learning methods prioritize prediction accuracy compromising on explainability in the process. Clinicians, in most cases, cannot use nor trust a model that arrives at a conclusion without showing how it reached there. Hence, in the clinical domain, a simple logistic regression model may be more acceptable than a deep learning neural network. Parsimony is the key and a bit of feature selection to ensure parsimony will be appreciated always.

You need to know the clinical terminologies

A basic understanding of the clinical terminologies and terminology systems such as SNOMED and ICD is vital. It helps in understanding the clinical community better. Any healthcare analytics to consider variations in terminologies and adopt a standard system for consistency. Any tool that data scientists build for the clinical community should have support for terminology systems.

Biostatistics is more pervasive than you think

Most healthcare professionals are trained in biostatistics. Hence, the thinking leans towards population, sampling, randomization, blindings and showing a ‘statistically significant’ difference. Moving towards machine learning needs a paradigmatic shift. It may be useful to have a discussion on this at the outset.

Classes are of unequal importance

In healthcare, finding one class (e.g. cancer) is more important than the other class (e.g. no cancer). One class may need active intervention to save lives. Hence, sensitivity and specificity are of vital importance than accuracy!

Life is precious!

In healthcare, there is no room for error. Some decisions may have disastrous consequences while few others may save lives. As a data scientist in the healthcare domain, you should be cognizant of the fact that healthcare data is different from banking/airline data.

How to deploy an h2o ai model using OpenFaaS on Digitalocean in 2 minutes

H2O is an open-source, distributed and scalable machine learning platform written in JAVA. H2O supports many of the statistical & machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning and more.  OpenFaaS® (Functions as a Service) is a framework for building Serverless functions easily with Docker. Read my previous post to learn more about OpenFaaS and DO. 

H2O AI model deployment

H2O has a module aptly named sparkling water that allows users to combine the machine learning algorithms of H2O with the capabilities of Spark. Integrating these two open-source environments provides a seamless experience for users who want to make a query using Spark SQL, feed the results into H2O to build a model and make predictions, and then use the results again in Spark. For any given problem, better interoperability between tools provides a better experience.

H2O Driverless AI is a commercial package for automatic machine learning that automates some of the most difficult data science and machine learning workflows such as feature engineering, model validation, model tuning, model selection, and model deployment. H2O also has a popular open-source module called AutoML that automates the process of training a large selection of candidate models. H2O’s AutoML can be used for automating the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time-limit. AutoML makes hyperparameter tuning accessible to everyone.

H2O allows you to convert the models to either a Plain Old Java Object (POJO) or a Model Object or an Optimized (MOJO) that can be easily embeddable in any Java environment. The only compilation and runtime dependency for a generated model is the h2o-genmodel.jar file produced as the build output of these packages. You can read more about deploying h2o models here.

I have created an OpenFaaS template for deploying the exported MOJO file using a base java container and the dependencies defined in the gradle build file. Using the OpenFaaS CLI (How to Install) pull my template as below:

mkdir watersplash
cd watersplash

faas-cli template pull https://github.com/dermatologist/java-ext --prefix your-docker-uname

faas-cli new --lang java-h2o watersplash

Copy the exported MOJO zip file to the root folder along with build.gradle and settings.gradle. Make appropriate changes to handle.java as per the needs of the model, as explained here. Add http://digitaloceanIP:8080 to watersplash.yml

 provider:
  	name: openfaas
  	gateway: http://digitaloceanIP:8080

and finally:

 faas-cli up -f watersplash.yml

That’s it! Congratulations! Your model is up and running! Access it at http://digitaloceanIP:8080/function/watersplash

If you get stuck at any stage, give me a shout below. 

Deploy a fastai image classifier using OpenFaaS for serverless on DigitalOcean in 5 easy steps!

Fastai is a python library that simplifies training neural nets using modern best practices. See the fastai website and view the free online course to get started. Fastai lets you develop and improve various NN models with little effort. Some of the deployment strategies are mentioned in their course, but most are not production-ready.

OpenFaaS® (Functions as a Service) is a framework for building Serverless functions easily with Docker that can be deployed on multiple infrastructures including Docker swarm and Kubernetes with little boilerplate code. Serverless is a cloud-computing model in which the cloud provider runs the server, and dynamically manages the allocation of machine resources and can scale to zero if a service is not being used. It is interesting to note that OpenFaaS has the same requirements as the new Google Cloud Run and is interoperable. Read more about OpenFaaS (and install the CLI) from their website.

DigitalOcean: I host all my websites on DigitalOcean (DO) which offers good (in my opinion) cloud services at a low cost. They have data centres in Canada and India. DO supports Kubernetes and Docker Swarm, but they offer a One-Click install of OpenFaaS for as little as $5 per month (You can remove the droplet after the experiment if you like, and you will only be charged for the time you use it.) If you are new to DO, please sign up and setup OpenFaaS as shown here:

In fastai class, Jeremy creates a dog breed classifier.

As STEP 1, export the model to .pkl as below

learn.export()

This creates the export.pkl file that we will be using later. To deploy we need a base container to run the prediction workflow. I have created one with Python3 along with fastai core and vision dependencies (to keep the size small). It is available here: https://hub.docker.com/r/beapen/fastai-vision But you don’t have to directly use this container. My OpenFaaS template will make this easy for you.

STEP 2: Using the OpenFaaS CLI (How to Install) pull my template as below:

mkdir dog-classifier
cd dog-classifier
faas-cli template pull https://github.com/dermatologist/python3-ml --prefix your-docker-uname
faas-cli new --lang fastai-vision dog-classifier

STEP 3: Copy export.pkl to the model folder

STEP 4: Add http://digitaloceanIP:8080 to dog-classifier.yml

provider:
  name: openfaas
  gateway: http://digitaloceanIP:8080

and finally in STEP 5:

faas-cli up -f dog-classifier.yml

That’s it! Your predictor is up and running! Access it at http://digitaloceanIP:8080/function/dog-classifier

The template has a builtin image uploader interface! If you get stuck at any stage, give me a shout below. More to follow on using OpenFaaS for deploying machine learning workflows!