Health Analytics Archives

CRISP-T: Bridging Text, Numbers, and AI for Smarter Qualitative Research

CRISP-T is a tool for researchers navigating the complexities of qualitative analysis on mixed data types. In fields like healthcare, education, and social sciences, qualitative data (interviews, open-ended surveys, field notes) often hold the richest insights. Yet integrating these data with structured numeric data has traditionally been cumbersome. CRISP-T addresses this gap by offering a unified framework for computational triangulation, allowing researchers to explore relationships between textual themes and quantitative outcomes with precision and flexibility.

CRISP-T: AI assisted qualitative research!

Its relevance is especially pronounced in the age of AI-assisted research. With built-in support for large language models and agentic AI (through the Model Context Protocol, MCP), CRISP-T empowers users to go beyond manual coding and thematic analysis. For instance, researchers can use topic modelling to identify recurring themes in patient feedback, then correlate these with retention metrics or clinical outcomes using decision trees or regression analysis. This kind of integrated insight is invaluable for evidence-based decision-making and theory development.

Moreover, CRISP-T’s modular design and open-source ethos make it highly adaptable. Whether you’re a solo researcher, part of an interdisciplinary team, or integrating AI agents into your workflow, CRISP-T provides the scaffolding to build shareable, reproducible analyses. Its MCP server interface further extends its utility, enabling integration with platforms like Claude Desktop or VSCode and interactive exploration of data. In short, CRISP-T isn’t just a toolkit; it’s a bridge between qualitative depth and quantitative rigour, built for the future of data-centric theory building.

🧠 Why CRISP-T Matters

Qualitative research often involves messy, complex data—interview transcripts, open-ended survey responses, field notes, and more. But what if you could combine these with structured numeric data like demographics or survey scores, and analyze both in tandem?

CRISP-T enables this fusion, offering a computational triangulation approach that:

Integrates text and numbers into unified corpus objects
Applies NLP techniques like topic modelling and sentiment analysis
Leverages ML algorithms such as decision trees and clustering
Supports semantic search and metadata export for deeper insights

This toolkit is especially valuable in domains like healthcare, education, and social sciences, where research on mixed data types is common.

⚙️ What’s Inside the Toolkit?

CRISP-T is built in Python and offers four powerful command-line interfaces (CLIs):

CLI Tool	Purpose
crisp	Main CLI for triangulation and analysis (topics, sentiment, regression)
crispt	Corpus manipulation (add/remove/query documents, relationships)
crispviz	Visualization (word clouds, topic charts, heatmaps)
crisp-mcp	MCP server for AI agent integration (Claude, VSCode, etc.)

These tools allow researchers to ingest data from .txt, .pdf, or .csv files, define relationships between text and numbers, and validate findings through machine learning models.

📊 Real-World Use Case: Market Research

Imagine a company collecting:

Customer feedback (text)
Retention rates and sales data (numbers)

Using CRISP-T, analysts can:

Extract recurring themes from feedback
Link them to performance metrics
Validate relationships using decision trees or regression
Visualize findings with topic charts and word clouds

This kind of mixed-methods insight is invaluable for strategic decision-making.
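The triangulation step itself can be illustrated in a few lines of plain Python. This is a minimal sketch, not CRISP-T's API: the feedback records are hypothetical, and simple keyword matching stands in for topic modelling. It compares mean retention for feedback that does and does not mention each theme, the kind of text-to-number link a decision tree or regression would then test more rigorously.

```python
from statistics import mean

# Hypothetical mixed-methods records: free-text feedback plus a numeric outcome.
RECORDS = [
    {"feedback": "Support was slow and billing was confusing", "retained": 0},
    {"feedback": "Great support team, quick answers", "retained": 1},
    {"feedback": "Billing errors every month", "retained": 0},
    {"feedback": "Love the product, support is helpful", "retained": 1},
]

# Keyword lists standing in for topics a topic model might discover (assumption).
THEMES = {
    "support": {"support", "answers", "helpful"},
    "billing": {"billing", "invoice", "charges"},
}


def tokens(text: str) -> set[str]:
    """Lower-case words with surrounding punctuation stripped."""
    return {word.strip(".,!?").lower() for word in text.split()}


def retention_by_theme(records, themes):
    """Mean retention when a theme is mentioned versus when it is not."""
    result = {}
    for theme, keywords in themes.items():
        with_theme = [r["retained"] for r in records if tokens(r["feedback"]) & keywords]
        without = [r["retained"] for r in records if not (tokens(r["feedback"]) & keywords)]
        result[theme] = (
            mean(with_theme) if with_theme else None,
            mean(without) if without else None,
        )
    return result


for theme, (present, absent) in retention_by_theme(RECORDS, THEMES).items():
    print(f"{theme}: retention {present} when mentioned, {absent} otherwise")
```

With real data, the keyword sets would be replaced by topics from a model, and the simple comparison by a validated statistical or machine learning test.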

🌐 Learn More and Get Involved

If you’re a researcher, educator, or data scientist looking to bridge qualitative depth with quantitative rigour, CRISP-T is your toolkit. Give it a ⭐️ on GitHub, try the demo, and join the movement toward smarter, AI-powered sense-making.

CRISP-T: AI assisted Qualitative Research!
https://github.com/dermatologist/crisp-t

Published by Bell Eapen on October 14, 2025

Vibe Coding FHIR to OMOP

TL;DR: A clinician‑researcher can download vocabularies, point at a folder of FHIR Bulk Export files, and be querying in OMOP CDM in an afternoon. This PyOMOP feature, generated by vibe coding with the prompts described below, helps you do just that!

🎵 What is Vibe Coding?

Vibe coding is an AI‑assisted development approach popularized by Andrej Karpathy. Instead of hand‑coding every function, you describe your intent in natural language to a large language model (LLM), and the AI generates boilerplate, scaffolding, or even full implementations.

Core elements:

Natural language prompts: You focus on what you want, not how to implement it.
AI‑driven code generation: The LLM translates your description into actual code.
Iteration through conversation: You refine the output by prompting adjustments, much like pair programming.
Conceptual focus: This frees mental bandwidth for architecture, data flows, and domain logic, rather than syntax minutiae.

Done well, vibe coding is not “letting the AI code everything blindly.” It’s a collaborative, high‑level design process where you still review, edit, and integrate the code yourself.

But let us discuss the problem first!

🗂 What is OMOP CDM?

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is the de‑facto standard for representing health data in a way that’s consistent, query‑friendly, and research‑ready. Stewarded by OHDSI, OMOP CDM structures clinical data into a fixed set of standardized tables—patients, conditions, drug exposures, measurements, procedures—each with precisely defined fields and standard vocabulary concept IDs.

The payoff:

You can write a single analysis and run it across multiple sites—without rewriting SQL for each local schema.
Data from disparate EHR systems, claims databases, and registries can be harmonized and compared.
The model supports rich vocabularies like SNOMED, RxNorm, and LOINC through standardized concept IDs.

If raw EHR data is a jumble of puzzle pieces, OMOP CDM is the finished picture on the box—complete with a shared key so everyone knows exactly where each piece fits.

🔗 Why is it hard to map FHIR to OMOP?

At first glance, FHIR (Fast Healthcare Interoperability Resources) and OMOP both sound like they’re aiming for the same thing: standardization. But FHIR focuses on data exchange—it’s an API‑friendly format designed for real‑time transactions, mobile apps, and message passing between systems. OMOP, in contrast, is optimized for longitudinal analytics and cohort‑level studies.

Mapping challenges include:

Structural mismatch
FHIR resources are often nested, verbose JSON objects with variable optional fields. OMOP tables are flat, relational, and denormalized for fast query performance. Flattening nested hierarchies without losing nuance is non‑trivial.
Semantic alignment
FHIR can store local codes or references; OMOP demands standardized vocabulary concepts. This means you can’t just copy values—you must map each source code to a standard concept ID, often using crosswalk tables from OHDSI’s Athena vocabulary service.
Event granularity
In FHIR, the same clinical fact might appear in multiple resources with different contexts. In OMOP, each fact needs to live in exactly one table with precise timestamps and provenance.
Volume and variety
FHIR Bulk Export can generate millions of NDJSON lines across dozens of resource types. Efficiently parsing, reconciling, and loading that into OMOP while preserving relationships is an engineering workout.
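The structural and semantic problems show up together in a single resource. Below is a minimal sketch, assuming a trimmed FHIR R4 Condition and a hand-built lookup dictionary that stands in for Athena's vocabulary tables (the concept ID is for illustration; confirm real values in Athena). It flattens the nested JSON into a few columns of OMOP's condition_occurrence table and falls back to concept 0 when no mapping exists. Resolving the subject reference to a person_id is left to the caller.

```python
# Hypothetical stand-in for Athena's CONCEPT / CONCEPT_RELATIONSHIP lookup:
# (code system URI, source code) -> standard OMOP concept_id.
SOURCE_TO_STANDARD = {
    ("http://snomed.info/sct", "44054006"): 201826,  # type 2 diabetes mellitus (verify in Athena)
}


def condition_to_omop(resource: dict, person_id: int) -> dict:
    """Flatten a FHIR Condition into a few condition_occurrence columns."""
    codings = resource.get("code", {}).get("coding", [])
    concept_id, source_value = 0, None  # 0 = OMOP's "no matching concept"
    for coding in codings:
        key = (coding.get("system"), coding.get("code"))
        if key in SOURCE_TO_STANDARD:
            concept_id, source_value = SOURCE_TO_STANDARD[key], coding.get("code")
            break
    if source_value is None and codings:
        source_value = codings[0].get("code")  # keep the unmapped source code for audit
    onset = resource.get("onsetDateTime", "")
    return {
        "person_id": person_id,
        "condition_concept_id": concept_id,
        "condition_start_date": onset[:10] or None,  # date part of the FHIR dateTime
        "condition_source_value": source_value,
        # Required columns such as condition_type_concept_id are omitted for brevity.
    }


condition = {
    "resourceType": "Condition",
    "subject": {"reference": "Patient/123"},
    "code": {"coding": [{"system": "http://snomed.info/sct", "code": "44054006"}]},
    "onsetDateTime": "2021-03-04T10:00:00Z",
}
print(condition_to_omop(condition, person_id=1))
```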

🐍 Meet PyOMOP

PyOMOP is a lightweight, Python‑based toolkit for working with OMOP CDM v5.4 or v6 databases.

Key capabilities include:

Creating and initializing OMOP databases (SQLite for quick tests, Postgres/MySQL for production).
Loading vocabularies from Athena CSVs.
Running standard OHDSI QueryLibrary queries or your own SQL.
Converting results to DataFrames for downstream analysis or machine learning.
FHIR → OMOP import utilities for bulk‑loading FHIR data into CDM tables.

PyOMOP’s design goal is to get researchers and developers productive quickly without the steep ramp‑up of setting up full OHDSI stacks from scratch.
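Loading Athena vocabularies is mostly bulk CSV work. The sketch below uses only Python's sqlite3 and csv modules rather than PyOMOP's own loader. It assumes the Athena files are tab-delimited (as Athena downloads usually are, despite the .csv extension) and, for brevity, creates only five columns of the CONCEPT table.

```python
import csv
import io
import sqlite3


def load_concepts(conn: sqlite3.Connection, handle) -> int:
    """Create a trimmed CONCEPT table and load rows from a tab-delimited file object."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS concept (
               concept_id INTEGER PRIMARY KEY,
               concept_name TEXT,
               vocabulary_id TEXT,
               concept_code TEXT,
               standard_concept TEXT)"""
    )
    # QUOTE_NONE keeps stray quote characters in concept names from breaking parsing.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [
        (int(r["concept_id"]), r["concept_name"], r["vocabulary_id"],
         r["concept_code"], r["standard_concept"] or None)
        for r in reader
    ]
    conn.executemany("INSERT INTO concept VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Two sample rows laid out like Athena's CONCEPT.csv (all ten columns, tab-separated).
SAMPLE = (
    "concept_id\tconcept_name\tdomain_id\tvocabulary_id\tconcept_class_id\t"
    "standard_concept\tconcept_code\tvalid_start_date\tvalid_end_date\tinvalid_reason\n"
    "8507\tMALE\tGender\tGender\tGender\tS\tM\t19700101\t20991231\t\n"
    "8532\tFEMALE\tGender\tGender\tGender\tS\tF\t19700101\t20991231\t\n"
)

conn = sqlite3.connect(":memory:")
print(load_concepts(conn, io.StringIO(SAMPLE)))
print(conn.execute(
    "SELECT concept_name FROM concept WHERE vocabulary_id = 'Gender' AND concept_code = 'F'"
).fetchone())
```

For a real download, open the file with `open(path, newline="", encoding="utf-8")` and pass the handle instead of the StringIO sample.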

🛠 Building FHIR→OMOP in PyOMOP with Vibe Coding

When I set out to add robust FHIR Bulk Export to OMOP CDM loading in PyOMOP, I knew the mapping complexity would be the biggest hurdle. I also knew that much of the tedious parsing, table loading, and vocabulary reconciliation could be scaffolded quickly using an AI coding partner.

Leveraging vibe coding for scaffolding

Using natural‑language prompts, I asked the AI to:

Generate Python functions to parse NDJSON files into DataFrames.
Build class methods that map FHIR resource fields to OMOP column names.
Handle async bulk inserts into Postgres using SQLAlchemy.
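The first prompt asks for exactly the kind of scaffolding an LLM produces well. As a rough illustration (not PyOMOP's actual code), a standard-library version that streams FHIR Bulk Export NDJSON files and groups resources by resourceType might look like this. The `*.ndjson` file pattern is an assumption about how the export folder is laid out.

```python
import json
import logging
from collections import defaultdict
from pathlib import Path

logger = logging.getLogger(__name__)


def read_ndjson(path: Path):
    """Yield one parsed FHIR resource per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Log and skip malformed lines rather than aborting a large import.
                logger.warning("%s:%d: skipped malformed line (%s)", path.name, line_no, exc.msg)


def load_bulk_export(folder: Path) -> dict[str, list[dict]]:
    """Group every resource in folder/*.ndjson by its resourceType."""
    grouped: dict[str, list[dict]] = defaultdict(list)
    for path in sorted(folder.glob("*.ndjson")):
        for resource in read_ndjson(path):
            grouped[resource.get("resourceType", "Unknown")].append(resource)
    return dict(grouped)
```

Called as `load_bulk_export(Path("~/Downloads/fhir").expanduser())`, it returns lists of resources per type; with pandas installed, `pandas.json_normalize(resources["Patient"])` would then flatten one type into a DataFrame.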

I treated the AI’s output like a first draft. I kept the domain‑specific logic in my own hands. I fed failure examples back into the AI prompt, asking for exception handling and logging improvements. This conversational loop closed the gap quickly. Once the mapping functions were stable, I embedded them behind a simple command:

pyomop --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/

Now, a single command could create an OMOP DB, load vocabularies, import FHIR data, and reconcile mappings.

With the FHIR→OMOP feature now baked into PyOMOP, a clinician‑researcher can download vocabularies, point at a folder of FHIR Bulk Export files, and be querying in OMOP CDM in an afternoon—no endless SQL migrations required.

Python package for managing OHDSI clinical data models. Includes support for LLM based plain text queries, MCP server and FHIR import.
https://github.com/dermatologist/pyomop

Published by Bell Eapen on August 22, 2025

Kedro for multimodal machine learning in healthcare

Healthcare data is heterogeneous, with several types of data such as reports, tabular data, and images. Combining multiple modalities of data into a single model can be challenging for several reasons. One challenge is that the diverse types of data may have different structures, formats, and scales, which can make it difficult to integrate them into a single model. Additionally, some modalities of data may be missing or incomplete, which can make it difficult to train a model effectively. Another challenge is that different modalities of data may require different types of pre-processing and feature extraction techniques, which can further complicate the integration process. Furthermore, the lack of large-scale, annotated datasets that have multiple modalities of data can also be a challenge. Despite these challenges, advances in deep learning, multi-task learning and transfer learning are making it possible to develop models that can effectively combine multiple modalities of data and achieve reliable performance.

Kedro for multimodal machine learning

Kedro is an open-source Python framework that helps data scientists and engineers organize their code, increase productivity and collaboration, and deploy their models to production more easily. It works with popular libraries such as pandas, TensorFlow and PySpark, and follows software engineering best practices such as modularity and code reusability. Kedro supplies a standardized structure for organizing code, handling data and configuration, and running experiments. It also includes built-in support for versioning, logging, and testing, making it easy to implement reproducible and maintainable pipelines. Additionally, Kedro pipelines can easily be deployed on cloud platforms like AWS, GCP or Azure. This makes it a powerful tool for creating robust and scalable data science and data engineering pipelines.

I have built a few Kedro packages that make multi-modal machine learning in healthcare easy. The packages supply prebuilt pipelines for preprocessing image, tabular and text data, and for building fusion models that can be trained on multi-modal data and deployed easily. The text preprocessing package currently supports BERT and CNN-text models. There is also a template that you can copy to build your own pipelines using the preprocessing pipelines that I have built. Any number and combination of data types is supported. Additionally, like any other Kedro pipeline, these can be deployed on Kubeflow and Vertex AI. Do comment below if you find these tools useful in your research.
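The two most common ways of combining modalities can be shown without any framework. This is a generic illustration, not the kedro-multimodal API, and the feature sizes and weights are hypothetical. Early fusion concatenates per-modality feature vectors (zero-filling a missing modality, one of several possible strategies); late fusion averages per-modality predicted probabilities, weighting only the modalities that are present.

```python
from typing import Optional

# Hypothetical feature sizes per modality (e.g. embeddings from separate encoders).
FEATURE_SIZES = {"text": 3, "tabular": 2, "image": 4}


def early_fusion(features: dict[str, Optional[list[float]]]) -> list[float]:
    """Concatenate modality vectors in a fixed order, zero-filling missing ones."""
    fused: list[float] = []
    for name, size in FEATURE_SIZES.items():
        vector = features.get(name)
        if vector is None:
            vector = [0.0] * size  # simple imputation; masking is another option
        if len(vector) != size:
            raise ValueError(f"{name}: expected {size} values, got {len(vector)}")
        fused.extend(vector)
    return fused


def late_fusion(probabilities: dict[str, Optional[float]],
                weights: dict[str, float]) -> float:
    """Weighted average of per-modality probabilities over available modalities."""
    available = {m: p for m, p in probabilities.items() if p is not None}
    if not available:
        raise ValueError("no modality produced a prediction")
    total_weight = sum(weights[m] for m in available)
    return sum(weights[m] * p for m, p in available.items()) / total_weight


print(early_fusion({"text": [0.1, 0.2, 0.3], "tabular": [1.0, 0.0], "image": None}))
print(late_fusion({"text": 0.8, "tabular": 0.6, "image": None},
                  {"text": 1.0, "tabular": 1.0, "image": 2.0}))
```

Early fusion lets one model learn cross-modal interactions; late fusion degrades more gracefully when a modality is missing, which is one of the challenges listed above.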

Template for multi-modal machine learning in healthcare using Kedro. Combine reports, tabular data and images using various fusion methods.
https://github.com/dermatologist/kedro-multimodal

Published by Bell Eapen on January 25, 2023

Six things data scientists in healthcare should know

Healthcare, like most other fields, is eager to get on the data science bandwagon. Data scientists can make a huge difference in the way big data is utilized for clinical decision-making. However, there are paradigmatic differences in the way data scientists from quantitative fields view the world, compared to their clinical counterparts. This is especially true in the emerging fields of machine learning and artificial intelligence. This may lead to considerable inefficiencies. As a person trained in both fields, here is my take on this.

Data scientists should focus on the problem and not the solutions

Data scientists get excited about the latest GPT or BERT model and tend to refine it a bit more using ten more GPUs! In the process, they may solve problems that do not exist. From my experience practicing medicine in extremely resource-poor areas, simple solutions are valued more than BERT running on Kubernetes! This is true in the developed world as well, where many teams have fundamental data needs that must be tackled first.

Explanation comes before prediction

Emerging machine learning methods prioritize prediction accuracy, often at the expense of explainability. In most cases, clinicians can neither use nor trust a model that arrives at a conclusion without showing how it got there. Hence, in the clinical domain, a simple logistic regression model may be more acceptable than a deep neural network. Parsimony is key, and a bit of feature selection to ensure parsimony will always be appreciated.
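Part of what makes logistic regression acceptable to clinicians is that each prediction decomposes into per-feature contributions on the log-odds scale, and each coefficient converts to an odds ratio. The coefficients and feature names below are made up for illustration, not estimates from any real model.

```python
import math

# Hypothetical fitted coefficients (change in log-odds per unit of each feature).
INTERCEPT = -4.0
COEFFICIENTS = {"age_decades": 0.5, "smoker": 0.9, "hba1c_percent": 0.3}


def explain(patient: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the predicted probability and each feature's log-odds contribution."""
    contributions = {name: beta * patient[name] for name, beta in COEFFICIENTS.items()}
    log_odds = INTERCEPT + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))
    return probability, contributions


odds_ratios = {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}
probability, contributions = explain({"age_decades": 6, "smoker": 1, "hba1c_percent": 7.0})
print(f"risk = {probability:.2f}")
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f} log-odds (odds ratio per unit {odds_ratios[name]:.2f})")
```

Each printed line is something a clinician can check against domain knowledge, which a deep network's raw output does not offer.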

You need to know the clinical terminologies

A basic understanding of clinical terminologies and terminology systems such as SNOMED and ICD is vital. It helps in understanding the clinical community better. Any healthcare analytics effort should consider variations in terminologies and adopt a standard system for consistency. Any tool that data scientists build for the clinical community should support these terminology systems.

Biostatistics is more pervasive than you think

Most healthcare professionals are trained in biostatistics. Hence, their thinking leans towards populations, sampling, randomization, blinding and showing a ‘statistically significant’ difference. Moving towards machine learning requires a paradigm shift, and it may be useful to discuss this at the outset.

Classes are of unequal importance

In healthcare, finding one class (e.g. cancer) is often more important than finding the other (e.g. no cancer), because that class may need active intervention to save lives. Hence, sensitivity and specificity matter more than accuracy!
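A quick calculation shows why accuracy misleads when one class is rare. Assume a hypothetical screening set of 1,000 patients, 10 of whom have cancer. A useless model that labels everyone cancer-free still scores 99% accuracy, while a second hypothetical model with lower accuracy catches most of the cancers:

```python
def screening_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Accuracy, sensitivity (recall on positives) and specificity from a confusion matrix."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }


# Model that predicts "no cancer" for everyone: 10 true cases, 990 true non-cases.
print(screening_metrics(tp=0, fn=10, tn=990, fp=0))
# → {'accuracy': 0.99, 'sensitivity': 0.0, 'specificity': 1.0}

# A model that finds 8 of the 10 cancers at the cost of 90 false alarms.
print(screening_metrics(tp=8, fn=2, tn=900, fp=90))
```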

Life is precious!

In healthcare, there is no room for error. Some decisions may have disastrous consequences, while others may save lives. As a data scientist in the healthcare domain, you should be cognizant that healthcare data is different from banking or airline data.

Published by Bell Eapen on November 3, 2021

OHDSI OMOP to FHIR mapper

TL;DR Below is an open-source command-line tool for converting an OHDSI OMOP cohort (defined in ATLAS) to a FHIR bundle and vice versa.

Wikimedia commons: Copyright held by BAPS Swaminarayan Sanstha (web: www.baps.org, email: info@baps.org); Unknown photographer / CC BY-SA (https://creativecommons.org/licenses/by-sa/4.0)

OHDSI OMOP CDM is one of the most popular clinical data models for health data warehouses. Its simple but clinically motivated data structure is intuitively appealing to clinicians, which has led to its wide adoption. In this respect, it has overtaken HL7 V3, which is more robust but has a steeper learning curve, especially for clinicians. OHDSI OMOP CDM is widely used in the pharmaceutical industry for drug monitoring.

FHIR is emerging as the de facto standard for health system interoperability, owing largely to its simplicity and its use of existing, popular standards such as REST. As NoSQL databases become more and more popular in healthcare, FHIR can also be a good persistence schema. It aligns well with search technologies such as Elasticsearch.

As both standards are popular, conversion from one to the other may often be required. Researchers at Georgia Tech have an open-source tool, GT-FHIR2, for exposing an existing OHDSI OMOP CDM database as a FHIR endpoint. However, conversion between existing systems may not be easy with a full-stack solution.

I have a simpler solution that I believe will be useful in the following scenarios:

To export a cohort to a FHIR based analytics tool.
To load new resources to OMOP CDM databases for incremental ETL.

Omopfhirmap is a command-line tool for mapping an OHDSI cohort, defined in ATLAS, to a FHIR bundle that can optionally be submitted to a FHIR server for processing. Conversely, it can process a FHIR bundle and add resources to an existing CDM database, ignoring duplicates. Unlike GT-FHIR2 from the OMOP on FHIR project at Georgia Tech, omopfhirmap does not expose the OMOP database as FHIR endpoints.
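To illustrate the OMOP-to-FHIR direction, here is a minimal Python sketch; omopfhirmap itself is written in Java with HAPI FHIR, so this is not its code. It turns rows shaped like the OMOP person table into FHIR R4 Patient resources and wraps them in a transaction Bundle. Only gender and birth date are mapped, the gender lookup assumes the standard OMOP concepts 8507 (male) and 8532 (female), and the identifier system URI is hypothetical.

```python
import json

# Standard OMOP gender concepts; anything else becomes FHIR "unknown".
GENDER = {8507: "male", 8532: "female"}


def person_to_patient(person: dict) -> dict:
    """Map a person-table row to a minimal FHIR R4 Patient resource."""
    patient = {
        "resourceType": "Patient",
        # Hypothetical identifier system used to keep a link back to the OMOP row.
        "identifier": [{"system": "urn:omop:person_id", "value": str(person["person_id"])}],
        "gender": GENDER.get(person.get("gender_concept_id"), "unknown"),
    }
    year, month, day = (person.get(k) for k in ("year_of_birth", "month_of_birth", "day_of_birth"))
    # FHIR dates allow partial precision: YYYY, YYYY-MM or YYYY-MM-DD.
    if year and month and day:
        patient["birthDate"] = f"{year:04d}-{month:02d}-{day:02d}"
    elif year and month:
        patient["birthDate"] = f"{year:04d}-{month:02d}"
    elif year:
        patient["birthDate"] = f"{year:04d}"
    return patient


def to_transaction_bundle(persons: list[dict]) -> dict:
    """Wrap Patients in a transaction Bundle that POSTs each one to the server."""
    return {
        "resourceType": "Bundle",
        "type": "transaction",
        "entry": [
            {"resource": person_to_patient(p), "request": {"method": "POST", "url": "Patient"}}
            for p in persons
        ],
    }


rows = [{"person_id": 1, "gender_concept_id": 8532,
         "year_of_birth": 1980, "month_of_birth": 5, "day_of_birth": 17}]
print(json.dumps(to_transaction_bundle(rows), indent=2))
```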

I have used Spring Boot and JPA for easy wiring of services and abstraction of the database, and HAPI FHIR, as it is the obvious choice for any Java-based FHIR application. It is still a work in progress and any help will be appreciated (refer to CONTRIBUTING.md).

OMOP <-> FHIR mapper
https://github.com/dermatologist/omopfhirmap

Published by Bell Eapen on July 22, 2020

OHDSI OMOP CDM ETL Tools in Python, .Net and Go

TL;DR Here are a few OHDSI OMOP CDM tools that may save you time if you are developing ETL tools!

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

The COVID-19 pandemic brought to light many of the vulnerabilities in our data collection and analytics workflows. Lack of uniform data models limits the analytical capabilities of public health organizations, and many of them have to re-invent the wheel even for basic analysis. As many other sectors embrace big data and machine learning, many healthcare analysts are still stuck with basic data wrangling in Excel.

The OHDSI OMOP CDM (Common data model) for observational data is a popular initiative for bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Though OHDSI OMOP CDM is primarily for patient-centred observational analysis, mostly for clinical research, it can be used with minor tweaks for public health and epidemiologic data as well. We have written about some of the technical details here.

The OHDSI OMOP CDM is simpler and more intuitive for clinical teams than emerging standards such as FHIR. Though the relational database approach and some of the software tools associated with OHDSI OMOP CDM are a bit old-fashioned, the data model is clinically motivated. There is an ecosystem of analytics tools that can be used out of the box. The Observational Medical Outcomes Partnership (OMOP) CDM, now at version 6.0, has simple but powerful vocabulary management. OHDSI OMOP CDM is a good choice for healthcare organizations moving towards health data warehousing and OLAP.

One weakness of OHDSI is the lack of tools for efficient ETL from existing EHR and HIS. Converting existing EHR data to the CDM is still a complex task that requires technical expertise. During the extra “home time” of the COVID pandemic, I created three software libraries for ETL tool developers. These libraries in Python, .NET and Golang encapsulate the V6.0 CDM and help in writing and reading data from a variety of databases with the V6.0 tables. The libraries also support creating the CDM tables for new databases and loading the vocabulary files.

These libraries might save you some time if you are building scripts for ETL to CDM. They are all open-source and free to use in your tools. Do give me a shout if you find these libraries useful and please star the repositories on GitHub.

Published by Bell Eapen on June 11, 2020

FHIR and public health data warehouses

First posted on CanEHealth.com

The provincial government is building a connected health care system centred around patients, families and caregivers through the newly established Ontario Health Teams (OHTs). As disparate healthcare and public health teams move towards a unified structure, there is a growing need to reconsider our information system strategy. Most off-the-shelf solutions are pricey, while open-source solutions such as DHIS2 are not popular in Canada. Some of the public health units have existing systems, and switching to another system would be too resource-intensive. The interoperability challenge needs an innovative solution beyond finding a single provincial EMR.

We have written about the theoretical aspects, especially the need to envision public health information systems separate from an EMR. In this working paper, we propose a maturity model for PHIS and offer some pragmatic recommendations for dealing with the common challenges faced by public health teams.

Below is a demo project on GitHub from the data-intel lab that showcases a potential solution for a scalable data warehouse for health information system integration. Public health databases are vital for the community for efficient planning, surveillance and effective interventions. Public health data needs to be integrated at various levels for effective policymaking. PHIS-DW adopts FHIR as the data model for storage, with an integrated Elasticsearch stack. Kibana provides the visualization engine. PHIS-DW can support complex algorithms for disease surveillance, ranging from machine learning methods and hidden Markov models to Bayesian and multivariate analytics. PHIS-DW is a work in progress and code contributions are welcome. We intend to use Bunsen to integrate PHIS-DW with Apache Spark for big data applications.

Public Health Data Warehouse Framework on FHIR
https://github.com/E-Health/fhir-server-phis-dw

FHIR has some advantages as a data persistence schema for public health. Apart from its popularity, the FHIR bundle makes it possible to send observations to FHIR servers without the associated patient resource, thereby ensuring reasonable privacy. This is especially useful in the surveillance of pandemics such as COVID-19. Some useful yet complicated integrations with OSCAR EMR and DHIS2 are under consideration. If any of the OHTs find our approach interesting, give us a shout.
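Subject-free submission works because Observation.subject is optional (0..1) in FHIR R4. The following is a minimal sketch of such a batch, not PHIS-DW code. The LOINC code (94500-6, a SARS-CoV-2 RNA test) and the helper names are illustrative, and a real surveillance feed would agree on codes and extensions with the receiving server.

```python
import json

LOINC = "http://loinc.org"
SARS_COV_2_RNA = "94500-6"  # illustrative; verify against the current LOINC release


def surveillance_observation(result: str, when: str) -> dict:
    """A FHIR R4 Observation with no subject reference (Observation.subject is 0..1)."""
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {"coding": [{"system": LOINC, "code": SARS_COV_2_RNA}]},
        "effectiveDateTime": when,
        "valueCodeableConcept": {"text": result},
    }


def batch_bundle(observations: list[dict]) -> dict:
    """Batch Bundle that POSTs each Observation independently."""
    return {
        "resourceType": "Bundle",
        "type": "batch",
        "entry": [
            {"resource": obs, "request": {"method": "POST", "url": "Observation"}}
            for obs in observations
        ],
    }


bundle = batch_bundle([surveillance_observation("Detected", "2020-04-20")])
print(json.dumps(bundle, indent=2))
```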

BTW, have you seen Drishti, our framework for FHIR based behavioural intervention?

Published by Bell Eapen on April 28, 2020

How to deploy an h2o ai model using OpenFaaS on Digitalocean in 2 minutes

H2O is an open-source, distributed and scalable machine learning platform written in Java. H2O supports many statistical and machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning and more. OpenFaaS® (Functions as a Service) is a framework for building serverless functions easily with Docker. Read my previous post to learn more about OpenFaaS and DigitalOcean (DO).

H2O has a module aptly named Sparkling Water that allows users to combine the machine learning algorithms of H2O with the capabilities of Spark. Integrating these two open-source environments provides a seamless experience for users who want to make a query using Spark SQL, feed the results into H2O to build a model and make predictions, and then use the results again in Spark. For any given problem, better interoperability between tools provides a better experience.

H2O Driverless AI is a commercial package for automatic machine learning that automates some of the most difficult data science and machine learning workflows such as feature engineering, model validation, model tuning, model selection, and model deployment. H2O also has a popular open-source module called AutoML that automates the process of training a large selection of candidate models. H2O’s AutoML can be used for automating the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time-limit. AutoML makes hyperparameter tuning accessible to everyone.

H2O allows you to convert models to either a Plain Old Java Object (POJO) or a Model Object, Optimized (MOJO), which can easily be embedded in any Java environment. The only compilation and runtime dependency for a generated model is the h2o-genmodel.jar file produced as the build output of these packages. You can read more about deploying H2O models here.

I have created an OpenFaaS template for deploying the exported MOJO file using a base java container and the dependencies defined in the gradle build file. Using the OpenFaaS CLI (How to Install) pull my template as below:

mkdir watersplash
cd watersplash

faas-cli template pull https://github.com/dermatologist/java-ext

faas-cli new --lang java-h2o watersplash --prefix your-docker-uname

Copy the exported MOJO zip file to the root folder along with build.gradle and settings.gradle. Make appropriate changes to handle.java as per the needs of the model, as explained here. Then set the gateway in watersplash.yml to http://digitaloceanIP:8080:

provider:
  name: openfaas
  gateway: http://digitaloceanIP:8080

and finally:

 faas-cli up -f watersplash.yml

That’s it! Congratulations! Your model is up and running! Access it at http://digitaloceanIP:8080/function/watersplash
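Once deployed, the function is an ordinary HTTP endpoint. Here is a hedged client sketch using only Python's standard library. The JSON payload shape is hypothetical and must match whatever handle.java expects, and digitaloceanIP is the same placeholder used above.

```python
import json
import urllib.request

GATEWAY = "http://digitaloceanIP:8080"  # placeholder: your Droplet's address


def build_request(row: dict, function: str = "watersplash") -> urllib.request.Request:
    """POST one record as JSON to an OpenFaaS function route on the gateway."""
    return urllib.request.Request(
        f"{GATEWAY}/function/{function}",
        data=json.dumps(row).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def score(row: dict) -> str:
    """Send the request and return the raw response body from the model."""
    with urllib.request.urlopen(build_request(row), timeout=10) as response:
        return response.read().decode("utf-8")


# Example (hypothetical feature names; requires the deployed function):
# print(score({"age": 54, "bmi": 31.2}))
```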

If you get stuck at any stage, give me a shout below.

Published by Bell Eapen on November 13, 2019

Machine Learning on Diabetic Retinopathy Images

Artificial intelligence (AI) and Machine Learning (ML) are having a profound impact on the way medicine is being practiced. AI/ML algorithms and techniques fit imaging applications easily and can help with automation. Radiology is the specialty that has benefitted the most from the AI/ML revolution. Melanoma detection in Dermatology is another obvious winner.

Many of the machine learning algorithms are reasonably well known. The real challenge is to get the infrastructure to crunch massive amounts of data, getting the ideal dataset for a problem, optimizing the model for performance and deploying the model for use. If you are relatively new to ML, Kaggle is a useful resource for you to start.

I will briefly introduce Kaggle for those who have not used it before. Kaggle is a platform for posting datasets that you have collected. It also provides ‘kernels’, or computational resources (typically Jupyter Notebooks), for collaborative analysis. The datasets can be made private or public under a variety of license options. Organizations post competitions and reward the teams that solve them. Solutions are typically submitted as predictions on a test dataset or shared as kernel code.

I recently noticed a Kaggle competition that the eHealth community may find interesting. Aravind Eye Hospital in India has posted a dataset of fundoscopic images of diabetic retinopathy with varying degrees of severity. The dataset consists of thousands of images collected by Aravind's technicians in rural India. The challenge is to develop a model that can predict the severity of diabetic retinopathy from a fundoscopic image. Further, the successful solutions will be shared with other ophthalmologists through the 4th Asia Pacific Tele-Ophthalmology Society (APTOS) Symposium.
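The competition scores predicted severity grades (0 to 4) with quadratic weighted kappa, which penalizes a prediction more the further it lands from the true grade. A standard-library sketch of the metric, useful for checking a model locally before submitting:

```python
from collections import Counter


def quadratic_weighted_kappa(y_true: list[int], y_pred: list[int], n_classes: int = 5) -> float:
    """Cohen's kappa with quadratic weights for ordinal grades 0..n_classes-1."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))
    true_hist, pred_hist = Counter(y_true), Counter(y_pred)
    numerator = denominator = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n  # chance agreement, same total as observed
            numerator += weight * observed[(i, j)]
            denominator += weight * expected
    # Degenerate case: both lists hold the same single grade, i.e. perfect agreement.
    return 1.0 - numerator / denominator if denominator else 1.0


print(quadratic_weighted_kappa([0, 2, 4, 1, 3], [0, 2, 3, 1, 3]))
```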

The competition page is available here: https://www.kaggle.com/c/aptos2019-blindness-detection
Let me know if anybody wants to team up!

Published by Bell Eapen on July 2, 2019

Hephestus: Health data warehousing tool for public health and clinical research

Health data warehousing is becoming an important requirement for deriving knowledge from the vast amount of health data that healthcare organizations collect. A data warehouse is vital for collaborative and predictive analytics. The first step in designing a data warehouse is to decide on a suitable data model. This is followed by the extract-transform-load (ETL) process that converts source data to the new data model amenable for analytics.

The OHDSI OMOP Common Data Model is one such data model that allows for the systematic analysis of disparate observational databases and EMRs. The data from diverse systems needs to be extracted, transformed and loaded into a CDM database. Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools that are already available.

Each data source requires customized ETL tools for this conversion from the source data to CDM. The OHDSI ecosystem has made some tools available to help with the ETL process, such as White Rabbit and Rabbit in a Hat. However, the health data warehousing process is still challenging because source databases vary in structure and implementation.

Hephestus is an open-source Python tool for this ETL process, organized into modules to allow code reuse between ETL tools for various open-source EMR systems and data sources. Hephestus uses SQLAlchemy for database connections and for automapping tables to classes, and bonobo for managing ETL. The ultimate aim is to develop a tool that can translate the report from the OHDSI tools into an ETL script with minimal intervention. This is a good Python starter project for eHealth geeks.
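The extract-transform-load steps can be sketched with nothing but sqlite3. This is not Hephestus code (Hephestus uses SQLAlchemy automapping and bonobo); it assumes a hypothetical source EMR table named demographics with local sex codes and moves it into a trimmed OMOP person table. Reusing the source patient number as person_id is a simplification; real ETL often assigns new identifiers.

```python
import sqlite3

# Local EMR sex codes -> standard OMOP gender concepts (0 = no matching concept).
SEX_TO_CONCEPT = {"M": 8507, "F": 8532}


def extract(src: sqlite3.Connection):
    """Extract: stream rows from the hypothetical source table."""
    yield from src.execute("SELECT patient_no, sex, dob FROM demographics")


def transform(row):
    """Transform: map one source row to a person tuple."""
    patient_no, sex, dob = row  # dob stored as 'YYYY-MM-DD' text in the source
    year, month, day = (int(part) for part in dob.split("-"))
    return (patient_no, SEX_TO_CONCEPT.get(sex, 0), year, month, day, str(patient_no), sex)


def load(cdm: sqlite3.Connection, rows) -> None:
    """Load: bulk insert into a trimmed person table."""
    cdm.execute("""CREATE TABLE IF NOT EXISTS person (
        person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER,
        year_of_birth INTEGER, month_of_birth INTEGER, day_of_birth INTEGER,
        person_source_value TEXT, gender_source_value TEXT)""")
    cdm.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    cdm.commit()


src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE demographics (patient_no INTEGER, sex TEXT, dob TEXT)")
src.executemany("INSERT INTO demographics VALUES (?, ?, ?)",
                [(101, "F", "1975-02-11"), (102, "M", "1962-09-30"), (103, "U", "1990-01-01")])
cdm = sqlite3.connect(":memory:")
load(cdm, map(transform, extract(src)))
print(cdm.execute("SELECT person_id, gender_concept_id, year_of_birth FROM person").fetchall())
```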

Anyone anywhere in the world can build their own environment that can store patient-level observational health data, convert their data to OHDSI’s open community data standards (including the OMOP Common Data Model), run open-source analytics using the OHDSI toolkit, and collaborate in OHDSI research studies that advance our shared mission toward reliable evidence generation. Join the journey!

Disclaimer: Hephestus is just my experiment and is not a part of the official OHDSI toolset.

https://github.com/dermatologist/hephaestus

Published by Bell Eapen on November 3, 2018