OMOP CDM Archives

Vibe Coding FHIR to OMOP

TL;DR: A clinician‑researcher can download vocabularies, point at a folder of FHIR Bulk Export files, and be querying in OMOP CDM in an afternoon. This function, generated by vibe coding using these prompts, would help you do just that!

Pyomop: Python package for managing OHDSI clinical data models. Includes support for LLM based plain text queries!

🎵 What is Vibe Coding?

Vibe coding is an AI‑assisted development approach popularized by Andrej Karpathy. Instead of hand‑coding every function, you describe your intent in natural language to a large language model (LLM), and the AI generates boilerplate, scaffolding, or even full implementations.

Core elements:

Natural language prompts: You focus on what you want, not how to implement it.
AI‑driven code generation: The LLM translates your description into actual code.
Iteration through conversation: You refine the output by prompting adjustments, much like pair programming.
Conceptual focus: This frees mental bandwidth for architecture, data flows, and domain logic, rather than syntax minutiae.

Done well, vibe coding is not “letting the AI code everything blindly.” It’s a collaborative, high‑level design process where you still review, edit, and integrate the code yourself.

But let us discuss the problem first!

🗂 What is OMOP CDM?

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is the de‑facto standard for representing health data in a way that’s consistent, query‑friendly, and research‑ready. Stewarded by OHDSI, OMOP CDM structures clinical data into a fixed set of standardized tables—patients, conditions, drug exposures, measurements, procedures—each with precisely defined fields and standard vocabulary concept IDs.

The payoff:

You can write a single analysis and run it across multiple sites—without rewriting SQL for each local schema.
Data from disparate EHR systems, claims databases, and registries can be harmonized and compared.
The model supports rich vocabularies like SNOMED, RxNorm, and LOINC through standardized concept IDs.
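As a rough illustration of that portability, here is a minimal sketch using an in-memory SQLite database. It assumes only a small subset of the real person and condition_occurrence columns and a few made-up rows. The helper name count_persons_with_condition and the sample data are hypothetical, and the concept IDs should be checked against your own Athena download.

```python
import sqlite3

# Minimal subset of two OMOP CDM tables; the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,
    gender_concept_id INTEGER,
    year_of_birth INTEGER
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id INTEGER,
    condition_concept_id INTEGER,
    condition_start_date TEXT
);
""")

# Made-up rows for illustration only (8507 = male, 8532 = female).
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8507, 1960), (2, 8532, 1975), (3, 8532, 1988)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, 201826, "2020-01-15"),  # type 2 diabetes mellitus
        (11, 2, 201826, "2021-06-01"),
        (12, 2, 201826, "2022-03-09"),  # same person, second record
        (13, 3, 320128, "2019-11-20"),  # essential hypertension
    ],
)

# The query depends only on CDM table and column names, not on any local schema.
COHORT_SQL = """
SELECT COUNT(DISTINCT co.person_id)
FROM condition_occurrence AS co
JOIN person AS p ON p.person_id = co.person_id
WHERE co.condition_concept_id = ?
"""


def count_persons_with_condition(connection, concept_id):
    """Count distinct persons with at least one record of the given concept."""
    return connection.execute(COHORT_SQL, (concept_id,)).fetchone()[0]


diabetes_count = count_persons_with_condition(conn, 201826)
print(diabetes_count)
```

Because the SQL references only standard table and column names, the same function could in principle be pointed at any site's CDM connection.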

If raw EHR data is a jumble of puzzle pieces, OMOP CDM is the finished picture on the box—complete with a shared key so everyone knows exactly where each piece fits.

🔗 Why is it hard to map FHIR to OMOP?

At first glance, FHIR (Fast Healthcare Interoperability Resources) and OMOP both sound like they’re aiming for the same thing: standardization. But FHIR focuses on data exchange—it’s an API‑friendly format designed for real‑time transactions, mobile apps, and message passing between systems. OMOP, in contrast, is optimized for longitudinal analytics and cohort‑level studies.

Mapping challenges include:

Structural mismatch: FHIR resources are often nested, verbose JSON objects with variable optional fields. OMOP tables are flat, relational, and denormalized for fast query performance. Flattening nested hierarchies without losing nuance is non‑trivial.
Semantic alignment: FHIR can store local codes or references; OMOP demands standardized vocabulary concepts. This means you can’t just copy values. You must map each source code to a standard concept ID, often using crosswalk tables from OHDSI’s Athena vocabulary service.
Event granularity: In FHIR, the same clinical fact might appear in multiple resources with different contexts. In OMOP, each fact needs to live in exactly one table with precise timestamps and provenance.
Volume and variety: FHIR Bulk Export can generate millions of NDJSON lines across dozens of resource types. Efficiently parsing, reconciling, and loading that into OMOP while preserving relationships is an engineering workout.
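The structural and semantic challenges can be sketched for a single resource type. The example below is not PyOMOP's implementation. It is a minimal illustration, assuming a hypothetical in-memory crosswalk (SOURCE_TO_CONCEPT) in place of the Athena vocabulary tables and a hypothetical helper condition_to_row that flattens a FHIR Condition into a few condition_occurrence columns.

```python
from datetime import date

# Hypothetical crosswalk: (FHIR coding system, code) -> OMOP standard concept_id.
# In a real pipeline this lookup would come from the Athena vocabulary tables.
SOURCE_TO_CONCEPT = {
    ("http://snomed.info/sct", "44054006"): 201826,  # type 2 diabetes mellitus
}


def condition_to_row(resource, crosswalk):
    """Flatten a nested FHIR Condition into a flat, OMOP-style dict."""
    concept_id = 0  # OMOP convention: 0 means no matching standard concept
    source_value = None
    for coding in resource.get("code", {}).get("coding", []):
        if source_value is None:
            source_value = coding.get("code")  # keep the first code as a fallback
        key = (coding.get("system"), coding.get("code"))
        if key in crosswalk:
            concept_id = crosswalk[key]
            source_value = coding.get("code")
            break

    # "Patient/123" -> "123"; a missing reference becomes None.
    reference = resource.get("subject", {}).get("reference", "")
    patient_ref = reference.split("/")[-1] or None

    # FHIR allows partial dates such as "2020"; this sketch leaves those unparsed.
    onset = resource.get("onsetDateTime")
    start_date = date.fromisoformat(onset[:10]) if onset and len(onset) >= 10 else None

    return {
        "person_source_value": patient_ref,
        "condition_concept_id": concept_id,
        "condition_start_date": start_date,
        "condition_source_value": source_value,
    }


example_condition = {
    "resourceType": "Condition",
    "subject": {"reference": "Patient/123"},
    "code": {"coding": [{"system": "http://snomed.info/sct", "code": "44054006"}]},
    "onsetDateTime": "2020-01-15T09:30:00Z",
}
row = condition_to_row(example_condition, SOURCE_TO_CONCEPT)
print(row)
```

Unmapped codes fall back to concept ID 0 while the original code is kept as the source value, so no information is silently dropped during flattening.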

🐍 Meet PyOMOP

PyOMOP is a lightweight, Python‑based toolkit for working with OMOP CDM v5.4 or v6 databases.

Key capabilities include:

Creating and initializing OMOP databases (SQLite for quick tests, Postgres/MySQL for production).
Loading vocabularies from Athena CSVs.
Running standard OHDSI QueryLibrary queries or your own SQL.
Converting results to DataFrames for downstream analysis or machine learning.
FHIR → OMOP import utilities for bulk‑loading FHIR data into CDM tables.

PyOMOP’s design goal is to get researchers and developers productive quickly without the steep ramp‑up of setting up full OHDSI stacks from scratch.

🛠 Building FHIR→OMOP in PyOMOP with Vibe Coding

When I set out to add robust FHIR Bulk Export to OMOP CDM loading in PyOMOP, I knew the mapping complexity would be the biggest hurdle. I also knew that much of the tedious parsing, table loading, and vocabulary reconciliation could be scaffolded quickly using an AI coding partner.

Leveraging vibe coding for scaffolding

Using natural‑language prompts, I asked the AI to:

Generate Python functions to parse NDJSON files into DataFrames.
Build class methods that map FHIR resource fields to OMOP column names.
Handle async bulk inserts into Postgres using SQLAlchemy.
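To give a flavour of the first two tasks, here is a minimal sketch that is not the code PyOMOP ships. It parses NDJSON lines with the standard library and groups resources by a target OMOP table. The routing table RESOURCE_TO_TABLE and the function group_by_target_table are assumptions for illustration; each grouped list could then be handed to pandas.DataFrame for column mapping.

```python
import json
from collections import defaultdict

# Assumed, simplified routing of FHIR resource types to OMOP tables.
# Real mappings are more nuanced (e.g. some Observations belong in "observation").
RESOURCE_TO_TABLE = {
    "Patient": "person",
    "Encounter": "visit_occurrence",
    "Condition": "condition_occurrence",
    "MedicationRequest": "drug_exposure",
    "Procedure": "procedure_occurrence",
    "Observation": "measurement",
}


def group_by_target_table(lines):
    """Parse NDJSON lines and group resources by their target OMOP table.

    Returns (grouped, skipped), where skipped holds the 1-based line numbers
    that were malformed or had no known target table.
    """
    grouped = defaultdict(list)
    skipped = []
    for line_number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # blank lines are ignored, not counted as errors
        try:
            resource = json.loads(text)
        except json.JSONDecodeError:
            skipped.append(line_number)
            continue
        if not isinstance(resource, dict):
            skipped.append(line_number)
            continue
        table = RESOURCE_TO_TABLE.get(resource.get("resourceType"))
        if table is None:
            skipped.append(line_number)
            continue
        grouped[table].append(resource)
    return dict(grouped), skipped


sample_lines = [
    '{"resourceType": "Patient", "id": "p1"}',
    '{"resourceType": "Condition", "id": "c1", "subject": {"reference": "Patient/p1"}}',
    "",
    "not json",
    '{"resourceType": "Device", "id": "d1"}',
]
grouped, skipped = group_by_target_table(sample_lines)
print({table: len(rows) for table, rows in grouped.items()}, skipped)
```

Since the function accepts any iterable of strings, an open file handle can be passed directly, which keeps memory use flat for large exports.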

I treated the AI’s output like a first draft. I kept the domain‑specific logic in my own hands. I fed failure examples back into the AI prompt, asking for exception handling and logging improvements. This conversational loop closed the gap quickly. Once the mapping functions were stable, I embedded them behind a simple command:

pyomop --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/

Now, a single command could create an OMOP DB, load vocabularies, import FHIR data, and reconcile mappings.

With the FHIR→OMOP feature now baked into PyOMOP, a clinician‑researcher can download vocabularies, point at a folder of FHIR Bulk Export files, and be querying in OMOP CDM in an afternoon—no endless SQL migrations required.

Python package for managing OHDSI clinical data models. Includes support for LLM based plain text queries, MCP server and FHIR import.
https://github.com/dermatologist/pyomop

Published by Bell Eapen on August 22, 2025 | Permalink

OHDSI OMOP CDM ETL Tools in Python, .NET and Go

TL;DR: Here are a few OHDSI OMOP CDM tools that may save you time if you are developing ETL tools!

Python: pyomop | pypi
.NET: omopcdmlib | NuGet
Golang: gocdm

The COVID-19 pandemic brought to light many of the vulnerabilities in our data collection and analytics workflows. The lack of uniform data models limits the analytical capabilities of public health organizations, and many of them have to reinvent the wheel even for basic analysis. While many other sectors embrace big data and machine learning, many healthcare analysts are still stuck with basic data wrangling in Excel.

The OHDSI OMOP CDM (Common Data Model) for observational data is a popular initiative for bringing data into a common format that allows for collaborative research, large-scale analytics, and sharing of sophisticated tools and methodologies. Though OHDSI OMOP CDM is primarily for patient-centred observational analysis, mostly for clinical research, it can be used with minor tweaks for public health and epidemiologic data as well. We have written about some of the technical details here.

The OHDSI OMOP CDM is simpler and more intuitive for clinical teams than emerging standards such as FHIR. Though the relational database approach and some of the software tools associated with OHDSI OMOP CDM are a bit old-fashioned, the data model is clinically motivated. There is an ecosystem of analytics tools that can be used out of the box. The Observational Medical Outcomes Partnership (OMOP) CDM, now in version 6.0, has simple but powerful vocabulary management. OHDSI OMOP CDM is a good choice for healthcare organizations moving towards health data warehousing and OLAP.

One weakness of OHDSI is the lack of tools for efficient ETL from existing EHR and HIS systems. Converting existing EHR data to the CDM is still a complex task that requires technical expertise. With the additional “home time” during the COVID pandemic, I created three software libraries for ETL tool developers. These libraries in Python, .NET and Golang encapsulate the V6.0 CDM and help in reading and writing data in the V6.0 tables across a variety of databases. The libraries also support creating the CDM tables for new databases and loading the vocabulary files.


These libraries might save you some time if you are building scripts for ETL to CDM. They are all open-source and free to use in your tools. Do give me a shout if you find these libraries useful and please star the repositories on GitHub.

Published by Bell Eapen on June 11, 2020 | Permalink