Health informatics Archives

Design Science Research in Healthcare: Bridging the Gap Between Ideas and Impact

In the world of healthcare research, the dominant paradigm has long been empirical and observational—studies that measure, compare, and validate phenomena to uncover truths. But what if the goal isn’t just to understand the world, but to change it? That’s where Design Science Research (DSR) comes in—a paradigm that’s less about observing and more about building, solving, and transforming. As a health informatics researcher working at the intersection of consultancy and academia, I’ve found DSR to be the most powerful lens for translating ideas into practice.

Image Credit: kevineriley, CC BY-SA 3.0 https://creativecommons.org/licenses/by-sa/3.0, via Wikimedia Commons

What Is Design Science Research?

Design Science Research, as defined by Hevner et al. (2004), focuses on the creation and evaluation of artifacts—tools, frameworks, models, or systems—that solve identified problems. Unlike traditional behavioral science, which seeks to explain phenomena, DSR aims to design solutions and assess their utility. In healthcare, this means building decision support tools, workflow optimizers, or data integration platforms that directly improve clinical or operational outcomes.

Hevner’s seminal work laid out seven guidelines for DSR, including the need for a clear problem definition, artifact relevance, rigorous evaluation, and contribution to knowledge. These principles have since guided a wave of innovation in health informatics, where the complexity of real-world systems demands more than just theoretical insight—it demands actionable design.

Why Traditional Research Falls Short in Healthcare Innovation

Traditional healthcare research often struggles with the “translational gap”—the chasm between discovery and implementation. A novel algorithm might predict sepsis with high accuracy, but without integration into clinical workflows, it remains a paper exercise. Similarly, a new policy framework might promise equity, but without tools to operationalize it, it’s just words.

This is where DSR shines. It doesn’t stop at the idea; it builds the bridge. It asks: What artifact can embody this idea? How will it be used? What constraints must it navigate? And most importantly: Does it work in practice?

Living at the Intersection: Consultancy Meets Research

My work often begins with a specific organizational challenge—say, a hospital struggling with fragmented dermatology referrals. This is the consultancy mode: solving a particular problem for a particular client. But as a researcher, I’m also asking: Is this problem part of a broader class? Can the solution be generalized?

This dual lens allows me to design artifacts that are both context-sensitive and theoretically grounded. For example, a referral triage tool built for one clinic might evolve into a modular framework for dermatology decision support across multiple institutions. That’s the essence of DSR: solving classes of problems through iterative design, evaluation, and abstraction.

The Role of Translational Designers in Health Research Teams

Most health research teams are rich in domain expertise—clinicians, epidemiologists, policy analysts. But they often lack what I call translational designers: people who can take a promising idea and make it work in the messy, constraint-laden world of healthcare delivery.

These designers are fluent in both theory and practice. They understand stakeholder needs, data limitations, regulatory constraints, and user experience. They build prototypes, test them in real settings, and refine them based on feedback. Without them, even the best ideas risk dying in the valley of death between research and implementation.

Making DSR Accessible and Impactful

One challenge with DSR is that it can feel abstract or overly technical. But at its heart, it’s a human-centered approach. It starts with people—their needs, frustrations, and goals—and builds solutions that fit their world. It’s not about perfect algorithms; it’s about useful artifacts.

To make DSR more accessible, I often use metaphors. I tell students: “Traditional research is like mapping the terrain. DSR is like building the bridge.” Or: “Empirical studies ask ‘what is?’ DSR asks ‘what could be?’” These reframings help shift the mindset from passive observation to active creation.

Final Thoughts: Designing the Future of Healthcare

As healthcare becomes more complex, data-rich, and digitally mediated, we need more than observational studies. We need designers—people who can build, test, and refine solutions that work in practice. Design Science Research offers a rigorous, impactful way to do just that.

Whether you’re a clinician with an idea, a researcher with a model, or a technologist with a prototype, DSR provides the scaffolding to turn insight into innovation. And if you’re looking for inspiration, check out DHTI—it’s a living example of how design can drive transformation.

DHTI: a reference architecture for Gen AI in healthcare and a skill platform for vibe coding!
https://github.com/dermatologist/dhti

Published by Bell Eapen on November 11, 2025 | Permalink

II. VSAC-on-FHIR

My enhancement now extends support beyond VSAC, enabling the use of FHIR-compliant terminology servers for private or custom-defined Value Sets. This added feature allows healthcare organizations to leverage their own FHIR-based terminology repositories, improving flexibility for institutions that need localized or proprietary clinical vocabularies while maintaining compliance with existing standards. Users can specify a FHIR Base URL to direct queries toward non-VSAC terminology servers, ensuring broader accessibility to domain-specific terminologies.
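
As a rough illustration of what pointing queries at a non-VSAC server involves, the sketch below builds a request URL for the standard FHIR ValueSet/$expand operation against a configurable base URL and reads codes from an expansion. The base URL, Value Set URL, and sample codes are hypothetical placeholders, and this is not the code of the enhancement described above.

```python
from urllib.parse import urlencode

# Hypothetical base URL of a private FHIR terminology server (an assumption).
FHIR_BASE_URL = "https://terminology.example.org/fhir"


def expand_url(base_url: str, value_set_url: str, count: int = 100) -> str:
    """Build the URL for the standard FHIR ValueSet/$expand operation."""
    query = urlencode({"url": value_set_url, "count": count})
    return f"{base_url.rstrip('/')}/ValueSet/$expand?{query}"


def codes_from_expansion(value_set: dict) -> list:
    """Extract (system, code) pairs from an expanded ValueSet resource."""
    contains = value_set.get("expansion", {}).get("contains", [])
    return [(item["system"], item["code"]) for item in contains]


# A hand-written, hypothetical expansion, shaped like a server response.
sample_expansion = {
    "resourceType": "ValueSet",
    "expansion": {
        "contains": [
            {
                "system": "http://example.org/fhir/CodeSystem/local-derm",
                "code": "DERM-001",
                "display": "Local dermatology code",
            }
        ]
    },
}

url = expand_url(FHIR_BASE_URL, "http://example.org/fhir/ValueSet/local-derm")
codes = codes_from_expansion(sample_expansion)
```

Keeping the base URL as a parameter is what lets the same client code target VSAC or a local terminology server.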

Published by Bell Eapen on May 30, 2025 | Permalink

I. CQL to ELM translator API with SpringBoot

Clinical Quality Language (CQL) is a flexible, domain-independent query language designed to support clinical decision-making by enabling intuitive and standardized queries without requiring extensive technical knowledge. It works with any data model, integrates with widely used programming languages, and relies on the Expression Logical Model (ELM) as an intermediary format to ensure consistency with existing healthcare data standards. The open-source CQL-to-ELM Translator, built in Java, facilitates seamless execution of CQL queries by converting them into ELM representations, supporting various customization options and integrating with FHIR, QDM, and QUICK models to enhance clinical data interoperability.

Published by Bell Eapen on May 29, 2025 | Permalink

Translational Research in Digital Health and Gen AI

Translational research is the process of turning scientific discoveries into practical applications that can benefit society. It involves bridging the gap between different stages of research, from basic to applied, and between different stakeholders, such as researchers, clinicians, policy makers, and industry. Translational research aims to accelerate the transfer of knowledge and technology from the laboratory to the bedside, from the bench to the market, and from the ivory tower to the community.

Image credit: DoD Architecture Framework Working Group, Public domain, via Wikimedia Commons.

One of the key features of translational research is pragmatism, which means focusing on real-world problems and solutions, rather than abstract theories and models. Pragmatism also implies being flexible and adaptable to the changing needs and contexts of the target users and environments. Translational researchers are not satisfied with publishing papers in academic journals; they want to see their work make a difference in people’s lives and health outcomes.

Translational Research Methods & Techniques

To achieve this goal, translational researchers need to adopt a variety of methods and techniques that can help them design, develop, evaluate, and implement digital health solutions in an effective and efficient way. These methods and techniques include:

User-centered design, which involves understanding the needs, preferences, and behaviors of the potential users and stakeholders of a digital health solution and involving them in the co-creation and evaluation of the solution.
Rapid prototyping, which involves creating low-fidelity or high-fidelity prototypes of a digital health solution and testing them with the users and stakeholders in an iterative way, to obtain feedback and improve the solution.
Pilot testing, which involves conducting a small-scale trial of a digital health solution in a real-world setting, to assess its feasibility, acceptability, usability, and preliminary effectiveness.
Randomized controlled trials, which involve comparing the effects of a digital health solution with a control condition (such as usual care or another intervention) in a large and representative sample of participants, to determine its efficacy, safety, and cost-effectiveness.
Implementation science, which involves studying the factors and strategies that influence the adoption, integration, and sustainability of a digital health solution in a real-world setting and developing and evaluating interventions to enhance these processes.
Health economics, which involves analyzing the costs and benefits of a digital health solution from different perspectives, such as the users, the providers, the payers, and the society.

Translational researchers need to be aware of the latest advances and trends in digital health and related fields. One of the emerging paradigms in digital health is Gen AI. Gen AI (generative AI) refers to artificial intelligence systems, such as large language models, that generate text, images, code, and other content, and can assist with tasks such as reasoning, learning, planning, decision making, and creativity. Gen AI has the potential to revolutionize digital health by enabling personalized, predictive, preventive, and participatory medicine, as well as enhancing the quality and efficiency of health care delivery and management.

Translational researchers play a crucial role in shaping the future of digital health and Gen AI. They act as translators, mediators, facilitators, and innovators between different disciplines, sectors, and domains. They also work as consultants for companies, organizations, and startups that want to develop, test, and implement digital health and Gen AI solutions. Translational researchers provide expert advice and guidance on the best practices and methods for designing, developing, evaluating, and implementing digital health and Gen AI solutions, as well as identifying and addressing the potential challenges and risks involved. Translational researchers also help to disseminate and communicate the results and impacts of digital health and Gen AI solutions to various audiences, such as academics, practitioners, policy makers, industry, and the public.

In summary, translational research is a vital and exciting field that aims to turn research papers into working artifacts, and to bridge the gap between digital health and Gen AI research and practice. Translational researchers adopt pragmatism as their guiding principle and use a variety of methods and techniques to design, develop, evaluate, and implement digital health and Gen AI solutions in real-world settings. Translational research is a practical endeavor that can make a positive difference in people’s lives and health outcomes.

Do you have a Gen AI research project that you need help with?

Published by Bell Eapen on March 23, 2024 | Permalink

Six things data scientists in healthcare should know

Healthcare, like most other fields, is eager to get on the data science bandwagon. Data scientists can make a huge difference in the way big data is utilized for clinical decision-making. However, there are paradigmatic differences in the way data scientists from quantitative fields view the world, compared to their clinical counterparts. This is especially true in the emerging fields of machine learning and artificial intelligence. This may lead to considerable inefficiencies. As a person trained in both fields, here is my take on this.

Data scientists should focus on the problem and not the solutions

Data scientists are excited about the latest GPT or BERT. Data scientists tend to refine the model a bit more using 10 more GPUs! In the process, they tend to solve problems that do not exist. From my experience practicing medicine in extremely resource-poor areas, simple solutions are valued more than BERT running on Kubernetes! This is true in the developed world as well, and many teams may have fundamental data needs that need to be tackled first.

Explanation comes before prediction

Emerging machine learning methods prioritize prediction accuracy, compromising on explainability in the process. Clinicians, in most cases, can neither use nor trust a model that arrives at a conclusion without showing how it got there. Hence, in the clinical domain, a simple logistic regression model may be more acceptable than a deep neural network. Parsimony is key, and a bit of feature selection to ensure parsimony will always be appreciated.

You need to know the clinical terminologies

A basic understanding of clinical terminologies and terminology systems such as SNOMED and ICD is vital. It helps in understanding the clinical community better. Any healthcare analytics needs to consider variations in terminologies and adopt a standard system for consistency. Any tool that data scientists build for the clinical community should support terminology systems.
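
To make the point about terminology variation concrete, here is a minimal sketch that normalizes free-text diagnosis labels to ICD-10 categories before analysis. The synonym table is a tiny hand-made illustration (an assumption), not a real terminology service; a production tool would query a maintained terminology server instead.

```python
# Illustrative synonym table mapping local wording to ICD-10 categories.
# I21 = acute myocardial infarction, I10 = essential hypertension,
# E11 = type 2 diabetes mellitus.
LOCAL_TO_ICD10 = {
    "heart attack": "I21",
    "myocardial infarction": "I21",
    "high blood pressure": "I10",
    "hypertension": "I10",
    "type 2 diabetes": "E11",
    "t2dm": "E11",
}


def to_icd10(term: str):
    """Return the ICD-10 category for a local term, or None if unmapped."""
    return LOCAL_TO_ICD10.get(term.strip().lower())


def count_by_code(terms):
    """Count diagnoses per ICD-10 category, tracking unmapped terms separately."""
    counts = {}
    for term in terms:
        code = to_icd10(term) or "UNMAPPED"
        counts[code] = counts.get(code, 0) + 1
    return counts


summary = count_by_code(["Heart attack", "myocardial infarction", "T2DM", "rash"])
```

Without the normalization step, "Heart attack" and "myocardial infarction" would be counted as two different conditions.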

Biostatistics is more pervasive than you think

Most healthcare professionals are trained in biostatistics. Hence, their thinking leans towards populations, sampling, randomization, blinding and showing a ‘statistically significant’ difference. Moving towards machine learning needs a paradigm shift. It may be useful to have a discussion on this at the outset.

Classes are of unequal importance

In healthcare, finding one class (e.g. cancer) is often more important than finding the other (e.g. no cancer). One class may need active intervention to save lives. Hence, sensitivity and specificity matter more than overall accuracy!
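
A small worked example shows why accuracy alone can mislead. The numbers are invented for illustration: 1000 patients, 10 of whom have cancer, and a useless model that predicts "no cancer" for everyone.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(y_true, y_pred):
    """Compute accuracy, sensitivity (recall) and specificity."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
    }


# Invented data: 10 cancer cases among 1000 patients, model always says "no cancer".
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000
result = metrics(y_true, y_pred)
```

Here the model is 99% accurate yet has a sensitivity of zero: it misses every single cancer case.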

Life is precious!

In healthcare, there is no room for error. Some decisions may have disastrous consequences, while a few others may save lives. As a data scientist in the healthcare domain, you should be cognizant of the fact that healthcare data is different from banking/airline data.

Published by Bell Eapen on November 3, 2021 | Permalink

Clinical knowledge representation for reuse

The need for computerized clinical decision support is becoming increasingly obvious with the COVID-19 pandemic. The initial emphasis has been on ‘replacing’ the clinician, which for a variety of reasons is impossible or impractical. Pragmatically, clinical decision support systems could provide clinical knowledge support for clinicians to make time-sensitive decisions with whatever information they have at the point of patient care.

Siobhán Grayson, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

Providing clinical decision support requires some formal way of representing clinical knowledge and complex algorithms for sophisticated inference. In knowledge management terms, the information needs to be transformed into actionable knowledge. Knowledge needs to be represented and stored in a way conducive to easy inference (knowledge reuse)¹. I have been exploring this domain for a considerable period of time, from ontologies to RDF datasets. With the advent of popular graph databases (especially Neo4J), graphs seem to be a good knowledge representation method for clinical purposes.

To cut a long story short, I have been working on building a suite of Java libraries to support knowledge extraction, annotation and transformation to a graph schema for inference. I have not open-sourced it yet as I have not decided on what license to use. However, I am posting some preliminary information here to assess interest. Please give me a shout if you share an interest or see some potential applications for this. As always, I am open to collaboration.

The Java package consists of three modules. The ‘library’ module wraps NCBI’s E-Utils API to harvest published article abstracts, if that is your knowledge source. Though data extraction from clinical notes in EMRs is a recent trend, it is challenging because of unstructured data and lack of interoperability. The ‘qtakes’ module provides a programmable interface to my quick-ctakes (the Quarkus-based Apache cTAKES), a fast clinical text annotation engine. Finally, the ‘graph’ module provides the Neo4J models, repositories and services for abstracting the extracted knowledge as a knowledge graph.
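
For readers unfamiliar with E-Utils, the harvesting step boils down to two HTTP calls: a search that returns PubMed IDs and a fetch that returns abstracts. The sketch below only builds those URLs in Python; it is not the API of the Java ‘library’ module, and the search term and IDs are placeholders.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20) -> str:
    """URL for an ESearch call that returns PubMed IDs matching a term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids) -> str:
    """URL for an EFetch call that returns plain-text abstracts for given IDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


search = esearch_url("psoriasis treatment", retmax=5)
# Placeholder IDs; in practice these come from the ESearch response.
fetch = efetch_url(["12345678", "23456789"])
```

The URLs can then be requested with any HTTP client, keeping within NCBI's usage guidelines.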

The clinical knowledge graph (ckb) consists of entities such as Disease, Treatment and Anatomy, with appropriate relationships and Cypher queries defined for them. The module exposes services that can be consumed by Java applications. It will be available as a Maven artifact once I complete it.
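
As a rough idea of what such a graph schema looks like in practice, the sketch below generates parameterized Cypher statements for the three entity types named above. The relationship names (TREATED_WITH, AFFECTS) and the `name` property are hypothetical choices for illustration, not the schema of the actual library.

```python
# Entity labels from the text; relationship types are illustrative assumptions.
ALLOWED_LABELS = {"Disease", "Treatment", "Anatomy"}
ALLOWED_RELATIONS = {"TREATED_WITH", "AFFECTS"}


def merge_triple(subject_label, subject_name, relation, object_label, object_name):
    """Build a Cypher MERGE statement and its parameters for one triple.

    Labels and relationship types cannot be passed as Cypher parameters,
    so they are checked against an allowlist before being interpolated.
    """
    if subject_label not in ALLOWED_LABELS or object_label not in ALLOWED_LABELS:
        raise ValueError("unknown node label")
    if relation not in ALLOWED_RELATIONS:
        raise ValueError("unknown relationship type")
    query = (
        f"MERGE (s:{subject_label} {{name: $subject}}) "
        f"MERGE (o:{object_label} {{name: $object}}) "
        f"MERGE (s)-[:{relation}]->(o)"
    )
    return query, {"subject": subject_name, "object": object_name}


# A lookup query in the same hypothetical schema.
TREATMENTS_FOR_DISEASE = (
    "MATCH (d:Disease {name: $disease})-[:TREATED_WITH]->(t:Treatment) "
    "RETURN t.name AS treatment"
)

query, params = merge_triple(
    "Disease", "psoriasis", "TREATED_WITH", "Treatment", "methotrexate"
)
```

The resulting query and parameters could be passed to any Neo4J client; using MERGE rather than CREATE keeps repeated extraction runs from duplicating nodes.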

UPDATE: May 30, 2021: The library (ckblib) is now available under MPL 2.0 license (see below). Feel free to use it in your research.

Tools to create a clinical knowledge graph from biomedical literature. Includes wrappers for NCBI Eutils, cTakes annotator and Neo4J
https://github.com/dermatologist/ckblib

1. Toward a Theory of Knowledge Reuse: Types of Knowledge Reuse Situations and Factors in Reuse Success. Journal of Management Information Systems. Published online May 31, 2001:57-93. doi:10.1080/07421222.2001.11045671

Cite this article as: Eapen BR. (April 28, 2021). Clinical knowledge representation for reuse. Retrieved March 2, 2026, from https://nuchange.ca/2021/04/clinical-knowledge-representation-for-reuse.html.

Published by Bell Eapen on April 28, 2021 | Permalink

COVID vaccination tracking with blockchain

COVID vaccine rollout has the potential to bring relief to billions of people around the world. But as encouraging as these programs may be, it is extremely important to note that a vaccine cannot be as effective if it is not effectively distributed and trusted by the public.

SPQR10, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

IBM Blockchain has a vaccine distribution network for manufacturers to proactively monitor for adverse events and improve recall management. Moderna is planning to explore vaccine traceability with the IBM blockchain.

The International Air Transport Association (IATA) is planning to launch a system of digital ‘passports’ as proof that passengers have been vaccinated against COVID-19. Blockchain technology could offer a better data-storage system for such vaccination records. A decentralized blockchain ledger would be anonymous, immutable and transparent and the entries can be publicly audited.
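
The "immutable" property comes from chaining entries by hash, so that changing any earlier record invalidates everything after it. The sketch below shows only that idea in a single process, with invented records and pseudonymous IDs; it leaves out the decentralization, consensus, and smart contracts that a real blockchain adds.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(ledger: list, record: dict) -> None:
    """Append a record, linking it to the previous entry's hash."""
    prev_hash = ledger[-1]["hash"] if ledger else GENESIS_HASH
    ledger.append(
        {"record": record, "prev": prev_hash, "hash": entry_hash(record, prev_hash)}
    )


def verify(ledger: list) -> bool:
    """Recompute every hash; any edited record breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in ledger:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(entry["record"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


ledger = []
append(ledger, {"holder": "pseudonym-001", "vaccine": "dose-1", "date": "2021-03-01"})
append(ledger, {"holder": "pseudonym-001", "vaccine": "dose-2", "date": "2021-03-29"})
chain_is_valid = verify(ledger)
```

Anyone holding a copy of the ledger can rerun `verify`, which is the sense in which entries can be publicly audited.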

A vaccine blockchain system could support vaccine traceability and smart contract functions and can be used to address the problems of vaccine expiration and vaccine record fraud. Additionally, the use of machine learning models can provide valuable recommendations to immunization practitioners and recipients, allowing them to choose better immunization methods and vaccines as recommended by this study. A blockchain-based system developed by Singapore-based Zuellig Pharma can help governments and healthcare providers manage vaccine distribution and administration. UK hospitals are using blockchain to track the temperature of coronavirus vaccines.

In my opinion, a blockchain application in healthcare should satisfy the following characteristics:

Both patient and provider should have an interest in the decentralized storage of the concerned piece of information. One party may be neutral, but there should not be a conflict of interests.
One or more third parties should have an interest in this information and may have a reason not to trust the patient or provider.
The information should be a dynamic time-bound list that requires periodic updating.
The privacy concern related to the concerned information should be minimal.
The information should not be easy to measure or procure from other sources.

Vaccination satisfies the above criteria, and as such blockchain may be a good solution for this problem. Before the COVID-19 pandemic, I had played a bit with Solidity and made a web application with three different views:

Provider view: From this view, a provider can extend an offer to save the information on the blockchain to a patient.
Patient view: From this view, a patient can accept an offer extended by a provider.
Lookup view: To look up information on any patient.

Vac-chain is a prototype for on-chain storage of vaccination information on the Ethereum blockchain, using smart contracts written in Solidity and the Truffle Drizzle box (React/Redux).
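
The three views map onto a simple offer, accept, and lookup flow. The sketch below models that flow in plain Python to show the state transitions; the class and method names are hypothetical, and this is not the Solidity contract from the repository.

```python
from dataclasses import dataclass, field


@dataclass
class VaccinationRegistry:
    """In-memory stand-in for the contract state (illustrative only)."""

    pending: dict = field(default_factory=dict)   # patient -> offers awaiting consent
    accepted: dict = field(default_factory=dict)  # patient -> accepted records

    def offer(self, provider: str, patient: str, vaccine: str, date: str) -> None:
        """Provider view: offer to store a vaccination record for a patient."""
        record = {"provider": provider, "vaccine": vaccine, "date": date}
        self.pending.setdefault(patient, []).append(record)

    def accept(self, patient: str, index: int = 0) -> dict:
        """Patient view: accept a pending offer, making it a visible record."""
        offers = self.pending.get(patient, [])
        if not 0 <= index < len(offers):
            raise LookupError("no such pending offer")
        record = offers.pop(index)
        self.accepted.setdefault(patient, []).append(record)
        return record

    def lookup(self, patient: str) -> list:
        """Lookup view: return only records the patient has accepted."""
        return list(self.accepted.get(patient, []))


registry = VaccinationRegistry()
registry.offer("clinic-A", "patient-1", "vaccine-X", "2021-03-01")
before_accept = registry.lookup("patient-1")
registry.accept("patient-1")
after_accept = registry.lookup("patient-1")
```

The key design point is that nothing becomes visible to third parties until the patient accepts, so both parties take part in the write.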

Ethereum Blockchain Smart Contracts and React App for vaccination data
https://github.com/E-Health/vac-chain

Cite this article as: Eapen BR. (March 24, 2021). COVID vaccination tracking with blockchain. Retrieved March 2, 2026, from https://nuchange.ca/2021/03/covid-vaccination-tracking-with-blockchain.html.

Published by Bell Eapen on March 24, 2021 | Permalink

OHDSI OMOP CDM ETL Tools in Python, .Net and Go

TL;DR Here are a few OHDSI OMOP CDM tools that may save you time if you are developing ETL tools!

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

The COVID-19 pandemic brought to light many of the vulnerabilities in our data collection and analytics workflows. The lack of uniform data models limits the analytical capabilities of public health organizations, and many of them have to re-invent the wheel even for basic analysis. As many other sectors embrace big data and machine learning, many healthcare analysts are still stuck with basic data wrangling in Excel.

The OHDSI OMOP CDM (Common data model) for observational data is a popular initiative for bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Though OHDSI OMOP CDM is primarily for patient-centred observational analysis, mostly for clinical research, it can be used with minor tweaks for public health and epidemiologic data as well. We have written about some of the technical details here.

The OHDSI OMOP CDM is simpler and more intuitive for clinical teams than emerging standards such as FHIR. Though the relational database approach and some of the software tools associated with OHDSI OMOP CDM are a bit old-fashioned, the data model is clinically motivated. There is an ecosystem of analytics tools that can be used out of the box. The Observational Medical Outcomes Partnership (OMOP) CDM, now in version 6.0, has simple but powerful vocabulary management. OHDSI OMOP CDM is a good choice for healthcare organizations moving towards health data warehousing and OLAP.

One weakness of OHDSI is the lack of tools for efficient ETL from existing EHR and HIS. Converting existing EHR data to the CDM is still a complex task that requires technical expertise. During the additional “home time” of the COVID pandemic, I created three software libraries for ETL tool developers. These libraries in Python, .NET and Golang encapsulate the V6.0 CDM and help in writing and reading data from a variety of databases with the V6.0 tables. The libraries also support creating the CDM tables for new databases and loading the vocabulary files.
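
To give a feel for what an ETL step to the CDM involves, here is a minimal sketch that maps rows from a hypothetical source EHR extract to a subset of the PERSON table columns and loads them into SQLite. It uses only the standard library rather than any of the three libraries, and the source field names (`patient_id`, `sex`, `dob`) are assumptions.

```python
import sqlite3

# OMOP standard gender concepts: 8507 = male, 8532 = female.
# Concept 0 is used when no matching concept is found.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

CREATE_PERSON = """
CREATE TABLE IF NOT EXISTS person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER NOT NULL,
    year_of_birth INTEGER NOT NULL,
    race_concept_id INTEGER NOT NULL,
    ethnicity_concept_id INTEGER NOT NULL
)
"""

INSERT_PERSON = """
INSERT INTO person VALUES (
    :person_id, :gender_concept_id, :year_of_birth,
    :race_concept_id, :ethnicity_concept_id
)
"""


def to_person(row: dict) -> dict:
    """Transform one source row (hypothetical field names) into a PERSON record."""
    return {
        "person_id": int(row["patient_id"]),
        "gender_concept_id": GENDER_CONCEPTS.get(row["sex"].upper(), 0),
        "year_of_birth": int(row["dob"][:4]),  # expects ISO dates (YYYY-MM-DD)
        "race_concept_id": 0,       # not available in this source extract
        "ethnicity_concept_id": 0,  # not available in this source extract
    }


def load(conn: sqlite3.Connection, rows) -> None:
    """Create the table if needed and insert the transformed rows."""
    conn.execute(CREATE_PERSON)
    conn.executemany(INSERT_PERSON, [to_person(r) for r in rows])
    conn.commit()
```

A call such as `load(sqlite3.connect(":memory:"), [{"patient_id": "101", "sex": "f", "dob": "1980-05-17"}])` would produce one PERSON row; the real effort in most ETL projects lies in the vocabulary mapping that this sketch reduces to a two-entry dictionary.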

These libraries might save you some time if you are building scripts for ETL to CDM. They are all open-source and free to use in your tools. Do give me a shout if you find these libraries useful and please star the repositories on GitHub.

Published by Bell Eapen on June 11, 2020 | Permalink

FHIR and public health data warehouses

First posted on CanEHealth.com

The provincial government is building a connected health care system centred around patients, families and caregivers through the newly established OHTs. As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off-the-shelf solutions are pricey, while open-source solutions such as DHIS2 are not popular in Canada. Some of the public health units have existing systems, and it will be too resource-intensive to switch to another system. The interoperability challenge needs an innovative solution, beyond finding the single, provincial EMR.

We have written about the theoretical aspects, especially the need to envision public health information systems separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams.

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution for a scalable data warehouse for health information system integration. Public health databases are vital for the community for efficient planning, surveillance and effective interventions. Public health data needs to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage with the integrated Elasticsearch stack. Kibana provides the visualization engine. PHIS-DW can support complex algorithms for disease surveillance, from machine learning methods and hidden Markov models to Bayesian and multivariate analytics. PHIS-DW is a work in progress and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications.

Public Health Data Warehouse Framework on FHIR
https://github.com/E-Health/fhir-server-phis-dw

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, the FHIR bundle makes it possible to send observations to FHIR servers without the associated patient resource, thereby ensuring reasonable privacy. This is especially useful in the surveillance of pandemics such as COVID-19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 are under consideration. If any of the OHTs find our approach interesting, give us a shout.
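
A minimal sketch of such a bundle is shown below: a FHIR transaction Bundle carrying Observation resources that have no `subject` reference. The observation values are invented, and whether a particular server or profile accepts subject-less observations is an assumption to check against that server.

```python
import json


def observation(loinc_code: str, display: str, value: float, unit: str, when: str) -> dict:
    """Build an Observation with no 'subject', so it is not linked to a Patient."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {"system": "http://loinc.org", "code": loinc_code, "display": display}
            ]
        },
        "effectiveDateTime": when,
        "valueQuantity": {
            "value": value,
            "unit": unit,
            "system": "http://unitsofmeasure.org",
            "code": unit,
        },
    }


def transaction_bundle(observations) -> dict:
    """Wrap observations in a transaction Bundle that creates each one via POST."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {"resource": obs, "request": {"method": "POST", "url": "Observation"}}
            for obs in observations
        ],
    }


# LOINC 8310-5 is body temperature; the reading itself is made up.
bundle = transaction_bundle(
    [observation("8310-5", "Body temperature", 38.6, "Cel", "2020-04-28T09:30:00Z")]
)
payload = json.dumps(bundle, indent=2)
```

The JSON payload would be posted to the server's base URL; aggregate surveillance only needs the code, value, and time, which is why the patient link can be dropped.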

BTW, have you seen Drishti, our framework for FHIR based behavioural intervention?

Published by Bell Eapen on April 28, 2020 | Permalink

Random forest model for predicting the total length of hospital stay (TLOS)

TL;DR here is the Random Forest classifier code:

And an (obvious) upfront disclaimer: This is a learning project. This is not for actual use.

The Discharge Abstract Database (DAD) consists of patient demographics, comorbidities, interventions and the length of stay for a de-identified 10% sample of hospital admissions. DAD (2014-15) has an enhanced dataset with variables that were created at Western to act as flags for ICD-10 and CCI groupings, to make using the file easier.

Here is an experiment with the DAD enhanced dataset to create a Random forest model for predicting the total length of hospital stay (TLOS) in less than 100 lines of code. Random forests are an ensemble method that operates by building multiple decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. This is a learning project for Apache Spark and Spark ML using pyspark. The accuracy of the model taking all derived categorical variables is low.
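
The aggregation step described above can be shown in a few lines. The per-tree outputs below are invented, and the TLOS category names are hypothetical; real implementations differ in detail (some average class probabilities rather than counting votes).

```python
from statistics import mean, mode


def forest_classify(tree_votes):
    """Classification: the forest outputs the most common class among the trees."""
    return mode(tree_votes)


def forest_regress(tree_predictions):
    """Regression: the forest outputs the mean of the trees' predictions."""
    return mean(tree_predictions)


# Invented outputs from three trees for a single admission.
votes = ["short", "long", "short"]   # hypothetical TLOS categories
days = [3.0, 5.0, 4.0]               # hypothetical length of stay in days

predicted_class = forest_classify(votes)
predicted_days = forest_regress(days)
```

With only three trees, as in the parameters used in this experiment, a single tree can swing the vote, which is one reason to try more trees when accuracy is low.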

I have access to Apache Spark @ CC. If you are installing Spark on your computer, you may have to change the following:

SparkContext.setSystemProperty('spark.executor.memory', '48g')
SparkContext.setSystemProperty('spark.driver.memory', '6g')

Some of the commonly tweaked parameters can be changed here:

RF_NUM_TREES = 3
RF_MAX_DEPTH = 4
RF_MAX_BINS = 12
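
For orientation, here is one way these three constants could be wired into a classifier, assuming the DataFrame-based `pyspark.ml` API; the code in the repo may use a different Spark API. The label column `TLOS_CAT` comes from this post, while the `features` column name is simply the Spark ML default.

```python
RF_NUM_TREES = 3
RF_MAX_DEPTH = 4
RF_MAX_BINS = 12


def rf_params() -> dict:
    """Collect the tunable settings as keyword arguments for Spark ML."""
    return {
        "labelCol": "TLOS_CAT",
        "featuresCol": "features",
        "numTrees": RF_NUM_TREES,
        "maxDepth": RF_MAX_DEPTH,
        "maxBins": RF_MAX_BINS,
    }


def build_classifier():
    """Create the classifier; pyspark is imported lazily so it is only needed here."""
    from pyspark.ml.classification import RandomForestClassifier

    return RandomForestClassifier(**rf_params())
```

Note that Spark requires `maxBins` to be at least as large as the number of categories in any categorical feature, so it may need raising for high-cardinality variables.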

Uncomment the following line to include only variables that you need.

# df.select([c for c in df.columns if c in ['TLOS_CAT', 'COLNAME', 'COLNAME']]).show()

Here is the repo. How can this model be improved? Maybe a PCA before the RF? Or am I missing something important?

Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

Published by Bell Eapen on August 29, 2018 | Permalink