Large Language Models Archives

V. LLM-in-the-Loop CQL execution with unstructured data and FHIR terminology support

Here is how you can run the CQL execution with unstructured data and FHIR terminology support using our LLM-in-the-Loop stack as presented at AMIA #CIC25.

Published by Bell Eapen on June 4, 2025 | Permalink

Loading MIMIC dataset onto a FHIR server in two easy steps

The integration of generative AI into healthcare has the potential to revolutionize the industry, from drug discovery to personalized medicine. However, the success of these applications hinges on the availability of high-quality, curated datasets such as MIMIC. These datasets are crucial for training and testing AI models to ensure they can perform tasks accurately and reliably.

Free Clip Art, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

The Medical Information Mart for Intensive Care (MIMIC) dataset is a comprehensive, freely accessible database developed by the Laboratory for Computational Physiology at MIT. It includes deidentified health data from over 40,000 critical care patients admitted to the Beth Israel Deaconess Medical Center between 2001 and 2012. The dataset encompasses a wide range of information, such as demographics, vital signs, laboratory test results, medications, and caregiver notes. MIMIC is notable for its detailed and granular data, which supports diverse research applications in epidemiology, clinical decision-making, and the development of electronic health tools. The open nature of the dataset allows for reproducibility and broad use in the scientific community, making it a valuable resource for advancing healthcare research.

MIMIC-IV has been converted into the Fast Healthcare Interoperability Resources (FHIR) format and exported as newline-delimited JSON (ndjson). FHIR provides a structured way to represent healthcare data, ensuring consistency and reducing the complexity of data integration. However, importing the ndjson export of FHIR resources into a FHIR server can be challenging. Having the MIMIC-IV dataset loaded onto a FHIR server could be incredibly valuable. It would provide a consistent and reproducible environment for testing and developing Generative AI applications. Researchers and developers could leverage this setup to create and refine AI models, ensuring they work effectively with standardized healthcare data. This could ultimately lead to more robust and reliable AI applications in the healthcare sector. Here I show you how to do it in two easy steps using docker and the MIMIC-IV demo dataset.

STEP 1: Start the FHIR server

Use docker-compose to spin up the latest HAPI FHIR server that supports bulk data import using the docker-compose.yml file as below.

version: "3.7" 

services: 
  fhir: 
    image: hapiproject/hapi:latest 
    ports: 
      - 8080:8080 
    restart: "unless-stopped" 
    environment: 
      - hapi.fhir.bulkdata.enabled=true 
      - hapi.fhir.bulk_export_enabled=true 
      - hapi.fhir.bulk_import_enabled=true 
      - hapi.fhir.cors.enabled=true 
      - hapi.fhir.cors.allow_origin=* 
      - hapi.fhir.enforce_referential_integrity_on_write=false 
      - hapi.fhir.enforce_referential_integrity_on_delete=false 
      - "spring.datasource.url=jdbc:postgresql://postgres-db:5432/postgres" 
      - "spring.datasource.username=postgres" 
      - "spring.datasource.password=postgres" 
      - "spring.datasource.driverClassName=org.postgresql.Driver" 
      - "spring.jpa.properties.hibernate.dialect=ca.uhn.fhir.jpa.model.dialect.HapiFhirPostgres94Dialect" 

  
  postgres-db: 
    image: postgis/postgis:16-3.4 
    restart: "unless-stopped" 
    environment: 
      - POSTGRES_USER=postgres 
      - POSTGRES_PASSWORD=postgres 
      - POSTGRES_DB=postgres 
    ports: 
      - 5432:5432 
    volumes: 
      - postgres-db:/var/lib/postgresql/data 

volumes: 
  postgres-db: ~

Please note that the referential integrity on write is set to false.

docker compose up to start the server at the following base URL: http://localhost:8080/fhir

STEP 2: Send a POST request to the $import endpoint.

The full MIMIC-IV dataset is available here for credentialed users. The demo dataset used in the request below is available here. You don’t have to download the dataset. The request below contains the URL to the demo data sources. Anyone can access the files, as long as they conform to the terms of the license specified in this page. All you need is an internet connection for the docker environment. The FHIR $import operation allows for bulk data import into a FHIR server. When using resource type Parameters, you can specify the types of FHIR resources to be imported. This is done by including a Parameters resource in the request body, which details the resource types and their respective data files. I use the VSCODE REST Client extension to make the request and the format below aligns with its requirements. However, you can make the POST request in any way you prefer.

### 

  
POST http://localhost:8080/fhir/$import HTTP/1.1 
Prefer: respond-async 
Content-Type: application/fhir+json 

  
{ 

  "resourceType": "Parameters", 

  "parameter": [ { 

    "name": "inputFormat", 

    "valueCode": "application/fhir+ndjson" 

  }, { 

    "name": "inputSource", 

    "valueUri": "http://example.com/fhir/" 

  }, { 

    "name": "storageDetail", 

    "part": [ { 

      "name": "type", 

      "valueCode": "https" 

    }, { 

      "name": "credentialHttpBasic", 

      "valueString": "admin:password" 

    }, { 

      "name": "maxBatchResourceCount", 

      "valueString": "500" 

    } ] 

  }, { 

    "name": "input", 

    "part": [ { 

      "name": "type", 

      "valueCode": "Observation" 

    }, { 

      "name": "url", 

      "valueUri": "https://physionet.org/files/mimic-iv-fhir-demo/2.0/mimic-fhir/ObservationLabevents.ndjson" 

    } ] 

  }, { 

    "name": "input", 

    "part": [ { 

      "name": "type", 

      "valueCode": "Medication" 

    }, { 

      "name": "url", 

      "valueUri": "https://physionet.org/files/mimic-iv-fhir-demo/2.0/mimic-fhir/Medication.ndjson" 

    } ] 

  }, { 

    "name": "input", 

    "part": [ { 

      "name": "type", 

      "valueCode": "Procedure" 

    }, { 

      "name": "url", 

      "valueUri": "https://physionet.org/files/mimic-iv-fhir-demo/2.0/mimic-fhir/Procedure.ndjson" 

    } ] 

  }, { 

    "name": "input", 

    "part": [ { 

      "name": "type", 

      "valueCode": "Condition" 

    }, { 

      "name": "url", 

      "valueUri": "https://physionet.org/files/mimic-iv-fhir-demo/2.0/mimic-fhir/Condition.ndjson" 

    } ] 

  }, { 

    "name": "input", 

    "part": [ { 

      "name": "type", 

      "valueCode": "Patient" 

    }, { 

      "name": "url", 

      "valueUri": "https://physionet.org/files/mimic-iv-fhir-demo/2.0/mimic-fhir/Patient.ndjson" 

    } ] 

  } ] 

}

That’s it! It takes a few minutes for the bulk import to complete, depending on your system resources.

Feel free to reach out if you’re interested in collaborating on developing a gold QA dataset for testing clinician-facing GenAI applications. My research is centered on creating and validating clinician-facing chatbots.

Cite this article as: Eapen BR. (November 20, 2024). Loading MIMIC dataset onto a FHIR server in two easy steps. Retrieved July 22, 2025, from https://nuchange.ca/2024/11/loading-mimic-dataset-onto-a-fhir-server-in-two-easy-steps.html.

Published by Bell Eapen on November 20, 2024 | Permalink

Locally hosted LLMs

TL; DR: From my personal experiments (on an 8-year-old, i5 laptop with 16 GB RAM), locally hosted LLMs are extremely useful for many tasks that do not require much model-captured knowledge.

Image credit: I, Luc Viatour, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

The era of relying solely on large language models (LLMs) for all-encompassing knowledge is evolving. As technology advances, the focus shifts towards more specialized and integrated systems. These systems combine the strengths of LLMs with real-time data access, domain-specific expertise, and interactive capabilities. This evolution aims to provide more accurate, context-aware, and up-to-date information, saving us time and addressing the limitations of static model knowledge.

I have started to realize that LLMs are more useful as language assistants who can summarize documents, write discharge summaries, and find relevant information from a patient’s medical record. The last one still has several unsolved limitations, and reliable diagnostic (or other) decision-making is still in the (distant?) future. In short, LLMs are becoming increasingly useful in healthcare as time-saving tools, but they are unlikely to replace us doctors as decision-makers soon. That raises an important question; Do locally hosted LLMs (or even the smaller models) have a role to play? I believe they do!

Locally hosted large language models (LLMs) offer several key benefits. First, they provide enhanced data privacy and security, as all data remains on your local infrastructure, reducing the risk of breaches and unauthorized access. Second, they allow for greater customization and control over the hardware, software, and data used, enabling more tailored solutions. Additionally, locally hosted LLMs can operate offline, making them valuable in areas with unreliable internet access. Finally, they can reduce latency and potentially lower costs if you already have the necessary hardware. These advantages make locally hosted LLMs an attractive option for many users.

The accessibility and ease of use offered by modern, user-friendly platforms like OLLAMA are significantly lowering the barriers for individuals seeking technical expertise in self-hosting large language models (LLMs). The availability of a range of open-source models on Hugging Face lowers the barrier even further.

I have been doing some personal experiments with Ollama (on docker), Microsoft’s phi3: mini (language model) and all-minilm (embedding model), and I must say I am pleasantly surprised by the results! I have been using an 8-year-old, i5 laptop with 16 GB RAM. I have been using it as part of a project for democratizing Gen AI in healthcare, especially for resource-deprived areas (more about it here), and it does a decent job of vectorizing health records and answering questions based on RAG. I also made a helpful personal writing assistant that is RAG-based. I am curious to know if anybody else in my network is doing similar experiments with locally hosted LLMs on personal hardware.

Published by Bell Eapen on July 14, 2024 | Permalink

Come, join us to make generative AI in healthcare more accessible!

ChatGPT captured the imagination of the healthcare world though it led to the rather misguided belief that all it needs is a chatbot application that can make API calls. A more realistic and practical way to leverage generative AI in healthcare is to focus on specific problems that can benefit from its ability to synthesize and augment data, generate hypotheses and explanations, and enhance communication and education.

Generative AI Image credit: Bovee and Thill, CC BY 2.0 https://creativecommons.org/licenses/by/2.0, via Wikimedia Commons

One of the main challenges of applying generative AI in healthcare is that it requires a high level of technical expertise and resources to develop and deploy solutions. This creates a barrier for many healthcare organizations, especially smaller ones, that do not have the capacity or the budget to build or purchase customized applications. As a result, generative AI applications are often limited to large health systems that can invest in innovation and experimentation. Needless to say, this has widened the already big digital healthcare disparity.

One of my goals is to use some of the experience that I have gained as part of an early adopter team to increase the use and availability of Gen AI in regions where it can save lives. I think it is essential to incorporate this mission in the design thinking itself if we want to create applications that we can scale everywhere. What I envision is a platform that can host and support a variety of generative AI applications that can be easily accessed and integrated by healthcare organizations and professionals. The platform would provide the necessary infrastructure, tools, and services to enable developers and users to create, customize, and deploy generative AI solutions for various healthcare problems. The platform would also foster a community of practice and collaboration among different stakeholders, such as researchers, clinicians, educators, and patients, who can share their insights, feedback, and best practices.

I have done some initial work, guided by my experience in OpenMRS and I have been greatly inspired by Bhamini. The focus is on modular design both at the UI and API layers. OpenMRS O3 and LangServe templates show promise in modular design. I hope to release the first iteration on GitHub in late August 2024.

Do reach out in the comments below, if you wish to join this endeavour, and together we can shape the future of healthcare with generative AI.

Read Part II

Building a Modular Framework for Generative AI in Healthcare

Published by Bell Eapen on June 15, 2024 | Permalink

Why is RAG not suitable for all Generative AI applications in healthcare?

Retrieval-augmented generation (RAG) is a method of generating natural language that leverages external knowledge sources, such as large-scale text corpora. RAG first retrieves a set of relevant documents for a given input query or context and then uses these documents as additional input for a neural language model that generates the output text. RAG aims to improve the factual accuracy, diversity, and informativeness of the generated text by incorporating knowledge from the retrieved documents.

Image credit: Nomen4Omen with relabelling by Felix QW, CC BY-SA 3.0 DE https://creativecommons.org/licenses/by-sa/3.0/de/deed.en, via Wikimedia Commons

However, it may not be suitable for all healthcare applications because of the following reasons:

– RAG relies on the quality and relevance of the retrieved documents, which may not always be available or accurate for specific healthcare domains or tasks. For example, if the task is to generate a personalized treatment plan for a patient based on their medical history and symptoms, RAG may not be able to retrieve any relevant documents from a general-domain corpus, or it may retrieve outdated or inaccurate information that could harm the patient’s health.

– RAG may not be able to capture the complex and nuanced context of healthcare scenarios, such as the patient’s preferences, values, goals, emotions, or social factors. These aspects may not be explicitly stated in the retrieved documents, or they may require additional knowledge and reasoning to infer. For example, if the task is to generate empathetic and supportive messages for a patient who is diagnosed with a terminal illness, RAG may not be able to consider the patient’s psychological state, coping strategies, or family situation, and may generate generic or inappropriate responses that could worsen the patient’s distress.

– RAG cannot be used to summarize a patient’s medical history as it may not be able to extract the most relevant and important information from the retrieved documents, which may contain a lot of noise, redundancy, or inconsistency. For example, if the task is to generate a concise summary of a patient’s chronic conditions, medications, allergies, and surgeries, RAG may not be able to filter out irrelevant or outdated information, such as the patient’s demographics, vital signs, test results, or minor complaints, or it may include conflicting or duplicate information from different sources. This could lead to a confusing or inaccurate summary that could misinform the patient or the healthcare provider.

Therefore, RAG is not suitable for all Generative AI applications in healthcare, and it may require careful design, evaluation, and adaptation to ensure its safety, reliability, and effectiveness in specific healthcare contexts and tasks.

Cite this article as: Eapen BR. (May 11, 2024). Why is RAG not suitable for all Generative AI applications in healthcare?. Retrieved July 22, 2025, from https://nuchange.ca/2024/05/why-is-rag-not-suitable-for-all-generative-ai-applications-in-healthcare.html.

Published by Bell Eapen on May 11, 2024 | Permalink

Grounding vs RAG in Healthcare Applications

Both Grounding and RAG (Retrieval-Augmented Generation) play significant roles in enhancing LLMs capabilities and effectiveness and reducing hallucinations. In this post, I delve into the subtle differences between RAG and grounding, exploring their use in generative AI applications in healthcare.

What is RAG?

RAG, short for Retrieval-Augmented Generation, represents a paradigm shift in the field of generative AI. It combines the power of retrieval-based models with the fluency and creativity of generative models, resulting in a versatile approach to natural language processing tasks.

RAG has two components; the first focuses on retrieving relevant information and the other on generating textual outputs based on the retrieved context.
By incorporating a retrieval mechanism into the generation process, RAG can enhance the model’s ability to access external knowledge sources and incorporate them seamlessly into its responses.
RAG models excel in tasks that require a balance of factual accuracy and linguistic fluency, such as question-answering, summarization, decision support and dialogue generation.

Understanding Grounding

On the other hand, grounding in the context of AI refers to the process of connecting language to its real-world referents or grounding sources in perception, action, or interaction.

It helps models establish connections between words, phrases, and concepts in the text and their corresponding real-world entities or experiences.
Through grounding, AI systems can learn to associate abstract concepts with concrete objects, actions, or situations, enabling more effective communication and interaction with humans.
It serves as a bridge between language and perception, enabling AI models to interpret and generate language in a contextually appropriate manner.

Contrasting RAG and Grounding

While RAG and grounding both contribute to enhancing AI models’ performance and capabilities, they operate at different levels and serve distinct purposes in the generative AI landscape.

RAG focuses on improving the generation process by incorporating a retrieval mechanism for accessing external knowledge sources and enhancing the model’s output fluency.
Grounding, on the other hand, emphasizes connecting language to real-world referents, ensuring that AI systems can understand and generate language in a contextually meaningful way.
In general, grounding uses a simple and faster model with a lower temperature setting, while RAG uses more “knowledgeable” models at higher temperatures.
Grounding can also be achieved by finetuning a model on the grounding sources.

Implications for Healthcare applications

In the domain of healthcare, grounding is especially useful when the primary intent is to retrieve information for the clinician at the point of patient care. Typically, generation is based on patient information or policy information and the emphasis is on generating content that does not deviate much from the grounding sources. The variation from the source can be quantitatively measured and monitored easily.

In contrast, RAG is useful in situations where LLMs are actively involved in interpreting the information provided to them in the prompt and making inferences, decisions or recommendations based on the knowledge originally captured in the model itself. The expectation is not to base the output on the input, but to use the provided information for intelligent and useful interpretations. It is difficult to quantitatively assess and monitor RAG and some form of qualitative assessment is often needed.

In conclusion, RAG and grounding represent essential components in the advancement of generative AI and LLMs. By understanding the nuances of these concepts and their implications for healthcare, researchers and practitioners can harness their potential to create more intelligent and contextually aware applications.

Published by Bell Eapen on May 4, 2024 | Permalink

To or not to LangChain

LangChain is a free and accessible coordination framework for building applications that rely on large language models (LLMs). Although it is widely used, it sometimes receives critiques such as being complex, insecure, unscalable, and hard to maintain. As a novel framework, some of these critiques might be valid, but they might also be a strategy by the dominant LLM actors to regain power from the rebels.

The well-known machine learning frameworks PyTorch and Tensorflow are from the major players who also own some of the largest and most powerful LLMs in the market. By offering these frameworks for free, they can attract more developers and researchers to use their LLMs and platforms and gain more data and insights from them. They can also shape the standards and norms of the LLM ecosystem and influence the direction of future research and innovation.

It may not be the case that the major actors are actively trying to discredit LangChain, but some trends are worth noting. A common misconception is that LLM’s shortcomings are due to LangChain. You would often hear about LangChain hallucinating! Another frequent strategy is to confuse the discussion by bringing conflicting terms to the more widely used LangChain vocabulary. SDKs from major actors (deliberately) attempt to substitute their own syntaxes for LangChain’s.

You might be bewildered after a conference run by the main players, and that could be part of their plan to make you dependent on their products. My approach is to use a mind map to keep track of the LLM landscape and refer to that when suggesting LLM solutions. It also helps to have a list of open-source implementations of common patterns.

Mind map of LLM techniques, methods and tools — LLM Mind map

I have also noticed that the big players are gradually giving up and embarrassing the LangChain paradigms. I feel that despite LangChain’s limitations, it is here to stay! What do you think?

Published by Bell Eapen on April 27, 2024 | Permalink

Navigating the Complexities of Gen AI in Medicine: 5 Development Blunders to Avoid

Below, I have listed five critical missteps that you should steer clear of to ensure the successful integration of Gen AI in Medicine. This post is primarily for healthcare professionals managing a software team developing a Gen AI application.

Image credit: Nicolas Rougier, GPL via Wikimedia Commons

#1 Focus on requirements

Gen AI is an evolving technological landscape. ChatGPT’s user interface makes it look simplistic. Even a simple interface to any LLM is useful for mundane clinical chores (provided PHI is handled appropriately). However, developing an application that can automate tasks or assist clinical decision-making requires much engineering. It’s crucial to define clear and detailed requirements for your Gen AI solution. Without a comprehensive understanding of the needs and constraints, your project can easily become misaligned with clinical goals. Ensure that your AI application is not only technically sound but also meets the specific demands of healthcare settings. This precision will guide your software development process, avoiding costly detours or features that do not add value to healthcare providers or patients.

#2 Avoid solutioning

When working with your software team, be wary of dictating specific semi-technical solutions too early in the process. Most applications require techniques beyond prompting in a text window. It’s essential to allow your engineering team to explore and assess various options that can best meet the outlined requirements. By fostering an environment where creative and innovative problem-solving flourishes, you enable the team to find the most effective and sustainable technological path. This approach can also lead to discoveries of new capabilities of Gen AI that could benefit your project.

#3 Prioritize features

It’s essential to prioritize the features that will bring the most value to the end-user. Engage with stakeholders, including clinicians and patients, to understand what functionalities are most critical for their workflow and care delivery. This collaborative approach ensures the practicality of the AI application and aligns it with user needs. Avoid overloading your app with unnecessary features that complicate the user experience and detract from the core value proposition. Instead, aim for a lean product with high-impact features.

Gen AI app development is a time-consuming and technically challenging process. It is important to keep this in mind while prioritizing. Time and resource management are key in this regard. Allocate sufficient time for your team to refine their work, ensuring that each feature is developed with quality and precision. This disciplined approach to scheduling also helps in avoiding burnout among your team members, which is common in high-pressure development environments. Remember, a feature-packed application that lacks reliability or user-friendliness is less likely to be embraced by the healthcare community. Focus on delivering a polished, useful tool.

#4 You may never get it right, the first time when it comes to Gen AI in Medicine

Accept that perfection is unattainable on the initial try. In the world of software, especially with Gen AI, iterative testing and refinement are key. Encourage your team to build a Minimum Viable Product (MVP) and then improve it through user feedback and continuous development cycles. This iterative process is crucial to adapt to the ever-changing needs of healthcare professionals and to integrate the latest advancements in AI. Also, don’t underestimate the value of user testing; real-world feedback is invaluable.

#5 Avoid technology pivots and information overloads

Avoiding abrupt technological shifts late in the development cycle is critical. Such pivots can be costly and disruptive, derailing the project timeline. Stay committed to the chosen technology stack unless significant, unforeseeable impediments arise. Additionally, guard against overwhelming your team with excessive information. While staying informed is crucial, too much data can paralyze decision-making. Strive for a balance that empowers your team with the knowledge they need to be effective without causing analysis paralysis.

In my next post, I will explain the symbols and notations that I employ in my Gen AI in Medicine development process. BTW, What is your next Gen AI in Medicine project?

Published by Bell Eapen on March 9, 2024 | Permalink