Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

Named Entity Recognition using LLMs: a cTakes alternative?

TLDR: The targeted distillation method described may be useful for creating an LLM-based cTakes alternative for Named Entity Recognition. However, the recipe is not available yet. 

Named Entity Recognition using LLMs: a cTakes alternative?

Image credit: Wikimedia

Named Entity Recognition is essential in clinical documents because it enhances patient safety, supports efficient healthcare workflows, aids in research and analytics, and ensures compliance with regulations. It enables healthcare organizations to harness the valuable information contained in clinical documents for improved patient care and outcomes. 

Though Large Language Models (LLMs) can perform Named Entity Recognition (NER), the capability can be improved by fine-tuning, where you provide the model with input text that contains named entities and their associated labels. The model learns to recognize these entities and classify them into predefined categories. However, as described before fine-tuning Large Language Models (LLMs) is challenging due to the need for substantial, high-quality labelled data, the risk of overfitting on limited datasets, complex hyperparameter tuning, the requirement for computational resources, domain adaptation difficulties, ethical considerations, the interpretability of results, and the necessity of defining appropriate evaluation metrics. 

Targeted distillation of Large Language Models (LLMs) is a process where a smaller model is trained to mimic the behaviour of a larger, pre-trained LLM but only for specific tasks or domains. It distills the essential knowledge of the LLM, making it more efficient and suitable for particular applications, reducing computational demands.  

This paper described targeted distillation with mission-focused instruction tuning to train student models that can excel in a broad application class. The authors present a general recipe for such targeted distillation from LLMs and demonstrate that for open-domain NER. Their recipe may be useful for creating efficient distilled models that can perform NER on clinical documents, a potential alternative to cTakes. Though the authors have open-sourced their generic UniversalNER model, they haven’t released the distillation recipe code yet. 

REF: Zhou, W., Zhang, S., Gu, Y., Chen, M., & Poon, H. (2023). UniversalNER: Targeted Distillation from Large Language Models for Open Named Entity Recognition. ArXiv. /abs/2308.03279 

Kedro for multimodal machine learning in healthcare 

Healthcare data is heterogenous with several types of data like reports, tabular data, and images. Combining multiple modalities of data into a single model can be challenging due to several reasons. One challenge is that the diverse types of data may have different structures, formats, and scales which can make it difficult to integrate them into a single model. Additionally, some modalities of data may be missing or incomplete, which can make it difficult to train a model effectively. Another challenge is that different modalities of data may require different types of pre-processing and feature extraction techniques, which can further complicate the integration process. Furthermore, the lack of large-scale, annotated datasets that have multiple modalities of data can also be a challenge. Despite these challenges, advances in deep learning, multi-task learning and transfer learning are making it possible to develop models that can effectively combine multiple modalities of data and achieve reliable performance. 

Pipelines Kedro for multimodal machine learning

Kedro for multimodal machine learning

Kedro is an open-source Python framework that helps data scientists and engineers organize their code, increase productivity and collaboration, and make it easier to deploy their models to production. It is built on top of popular libraries such as Pandas, TensorFlow and PySpark, and follows best practices from software engineering, such as modularity and code reusability. Kedro supplies a standardized structure for organizing code, handling data and configuration, and running experiments. It also includes built-in support for version control, logging, and testing, making it easy to implement reproducible and maintainable pipelines. Additionally, Kedro allows to easily deploy the pipeline on cloud platforms like AWS, GCP or Azure. This makes it a powerful tool for creating robust and scalable data science and data engineering pipelines. 

I have built a few kedro packages that can make multi-modal machine learning easy in healthcare. The packages supply prebuilt pipelines for preprocessing images, tabular and text data and build fusion models that can be trained on multi-modal data for easy deployment. The text preprocessing package currently supports BERT and CNN-text models. There is also a template that you can copy to build your own pipelines making use of the preprocessing pipelines that I have built. Any number and combination of data types are supported. Additionally, like any other kedro pipeline, these can be deployed on kubeflow and VertexAI. Do comment below if you find these tools useful in your research. 

Link to the repository below.

Dark Mode

kedro-multimodal (this link opens in a new window) by dermatologist (this link opens in a new window)

Template for multi-modal machine learning in healthcare using Kedro. Combine reports, tabular data and image using various fusion methods.

Open-source for healthcare

This post is meant to be an instruction guide for healthcare professionals who would like to join my projects on GitHub.

eHealth Programmer Girl

What is a contribution?

Contribution is not always coding. You can clean up stuff, add documentation, instructions for others to follow etc. Issues and feature requests should be posted under the ‘issues’ tab and general discussions under the ‘Discussions’ tab if one is available.

How do I contribute.

How do I develop

  • The .devcontainer folder will have the configuration for the docker container for development.
  • Version bump action (if present) will automatically bump version based on the following terms in a commit message: major/minor/patch. Avoid these words in the commit message unless you want to trigger the action.
  • Most repositories have GH actions to generate and deploy documentation and changelog.

What do I do next

  • My repositories (so far) are small enough for beginners to get the big picture and make meaningful contributions.
  • Don’t be discouraged if you make mistakes. That is how we all learn.

There’s no better time than now to choose a repo to contribute!

10 points to consider before adopting open-source software in eHealth

Open-source software (hereafter OSS) is a phenomenon that has revolutionized the software industry. OSS is supported by voluntary programmers, who regularly dedicate their time and energy for the common good of all. The question that immediately comes to mind is how is it sustainable? Will they continue to contribute their social hours forever? Read the programmers perspective here. But does it make sense for healthcare organizations to accept their charity always? And, how do these organizations that adopt OSS improve the sustainability of these projects? These are some of the factors to consider:

artificial intelligence

Do you have enough funding?

OSS supporters are humanists with an emancipatory worldview. OSS is fundamentally not designed for an organization that can sustain a paid product. Firstly, there is the ethical problem of exploiting the OSS community. But more importantly, healthcare organizations with enough funding tend to spend more on the long-term maintenance and customization of OSS. Hence, OSS is generally designed to be an option when you have no other option.

Does the project have a regional focus?

OSS projects generally aim to solve global problems. So be careful when you hear Canadian OSS or Danish OSS. Regional OSS is mostly just cheaper local products masquerading as OSS for funding or for other reasons. They are unlikely to have the support of the global OSS community and is prone to burnout.

Is the OSS really OSS?

Any OSS worth its salt will be on GitHub. If you cannot find the project on GitHub, you should definitely ask why.

Is it really popular?

Some OSS that masquerade as OSS claim that they have a worldwide network of developers. The GitHub stars and forks would be a reasonable indicator of the popularity. Consider an OSS for your organization only if it has a thousand stars on the GitHub sky.

Are you looking for a specific workflow support?

Is your workflow generic enough to be supported by a global network of volunteers? Well, OHIP billing workflow may not be the right process to seek OSSM support.

Do you need customization?

If you need a lot of customizations to support your workflow, then OSS may not be the ideal solutions. OSS is ideally suited for situations where you can use it out of the box.

Do you have the time?

Remember that OSS is supported by voluntary programmers. So if you need a feature, you make a request and wait. If your organization is used to demanding, then OSS is not for you. OSS project is not owned by anyone, so their priorities may be different from yours.

Do you have internal expertise?

It is far easier to use an OSS if you have someone supporting the project in your organization. OSS community tends to respect one of their own more than an organization.

Supporting Open-Source Software?

It is crucial for organizations that depend on an OSS for your day to day operations to support the project. If the project becomes unsustainable, it affects the organization too. You can support the project in many ways such as donations, coding support and infrastructural support.

Do you know what OSS means and stands for?

Does the higher management know what OSS means and stands for? It is common in healthcare organizations to adopt OSS focusing on the free aspect.

“Free software” is a matter of liberty, not price. To understand the concept, you should think of “free” as in “free speech,” not as in “free beer”.

Personally, I think the first point is the most important. OSS is designed and intended for use in areas where a paid option is not viable. In other scenarios in healthcare, you are likely to spend more for an open-source product than you spend for a regular product.

Finally, a quick mention of some noteworthy OSS in healthcare. OpenMRS is an open-source EMR started with the mission to improve healthcare delivery in resource-constrained environments. DHIS2 is web-based open-source public health information system with awesome visualization features including GIS, charts and pivot tables.