Vibe Coding FHIR to OMOP
TL;DR: A clinician-researcher can download vocabularies, point PyOMOP at a folder of FHIR Bulk Export files, and be querying OMOP CDM data in an afternoon. The PyOMOP function that does this was built through vibe coding; the prompts behind it are described below.
What is Vibe Coding?
Vibe coding is an AI-assisted development approach popularized by Andrej Karpathy. Instead of hand-coding every function, you describe your intent in natural language to a large language model (LLM), and the AI generates boilerplate, scaffolding, or even full implementations.
Core elements:
- Natural language prompts: You focus on what you want, not how to implement it.
- AI-driven code generation: The LLM translates your description into actual code.
- Iteration through conversation: You refine the output by prompting adjustments, much like pair programming.
- Conceptual focus: This frees mental bandwidth for architecture, data flows, and domain logic, rather than syntax minutiae.
Done well, vibe coding is not "letting the AI code everything blindly." It's a collaborative, high-level design process where you still review, edit, and integrate the code yourself.
But let us discuss the problem first!
What is OMOP CDM?
The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is the de facto standard for representing observational health data in a way that's consistent, query-friendly, and research-ready. Stewarded by OHDSI, OMOP CDM structures clinical data into a fixed set of standardized tables (patients, conditions, drug exposures, measurements, procedures), each with precisely defined fields and standard vocabulary concept IDs.
The payoff:
- You can write a single analysis and run it across multiple sites without rewriting SQL for each local schema.
- Data from disparate EHR systems, claims databases, and registries can be harmonized and compared.
- The model supports rich vocabularies like SNOMED, RxNorm, and LOINC through standardized concept IDs.
If raw EHR data is a jumble of puzzle pieces, OMOP CDM is the finished picture on the box, complete with a shared key so everyone knows exactly where each piece fits.
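To make "standardized tables plus concept IDs" concrete, here is a minimal sketch using Python's built-in sqlite3 module. It creates a hypothetical, heavily trimmed subset of three CDM tables (the real v5.4 tables have many more columns, several of them required) and runs one cohort-style count that depends only on a standard concept ID. The concept IDs used (8507 and 8532 for gender, 201826 for type 2 diabetes) are included for illustration; check them against your own Athena download.

```python
from __future__ import annotations

import sqlite3

# Hypothetical, heavily trimmed subset of three OMOP CDM tables.
DDL = """
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    vocabulary_id TEXT NOT NULL,
    concept_code  TEXT NOT NULL
);
CREATE TABLE person (
    person_id         INTEGER PRIMARY KEY,
    gender_concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    year_of_birth     INTEGER NOT NULL
);
CREATE TABLE condition_occurrence (
    condition_occurrence_id INTEGER PRIMARY KEY,
    person_id               INTEGER NOT NULL REFERENCES person(person_id),
    condition_concept_id    INTEGER NOT NULL REFERENCES concept(concept_id),
    condition_start_date    TEXT NOT NULL
);
"""


def build_demo_cdm() -> sqlite3.Connection:
    """Create an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany(
        "INSERT INTO concept VALUES (?, ?, ?, ?)",
        [
            (8507, "MALE", "Gender", "M"),
            (8532, "FEMALE", "Gender", "F"),
            (201826, "Type 2 diabetes mellitus", "SNOMED", "44054006"),
        ],
    )
    conn.executemany(
        "INSERT INTO person VALUES (?, ?, ?)",
        [(1, 8532, 1960), (2, 8507, 1975), (3, 8532, 1990)],
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
        [(1, 1, 201826, "2020-03-01"), (2, 2, 201826, "2021-07-15")],
    )
    return conn


def count_patients_with(conn: sqlite3.Connection, concept_id: int) -> tuple[str, int] | None:
    """Count distinct persons with a condition, keyed only by a standard concept ID.

    Because the filter uses a standard concept_id instead of a local code,
    the same SQL can run unchanged at any site that populated these tables.
    Returns None if no rows match.
    """
    return conn.execute(
        """
        SELECT c.concept_name, COUNT(DISTINCT co.person_id)
        FROM condition_occurrence AS co
        JOIN concept AS c ON c.concept_id = co.condition_concept_id
        WHERE co.condition_concept_id = ?
        GROUP BY c.concept_name
        """,
        (concept_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = build_demo_cdm()
    print(count_patients_with(conn, 201826))  # ('Type 2 diabetes mellitus', 2)
```

The point is the WHERE clause: a site whose EHR stored diabetes under a local code would still match, as long as its ETL mapped that code to the same standard concept.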
Why is it hard to map FHIR to OMOP?
At first glance, FHIR (Fast Healthcare Interoperability Resources) and OMOP both sound like they're aiming for the same thing: standardization. But FHIR focuses on data exchange: it's an API-friendly format designed for real-time transactions, mobile apps, and message passing between systems. OMOP, in contrast, is optimized for longitudinal analytics and cohort-level studies.
Mapping challenges include:
- Structural mismatch: FHIR resources are often nested, verbose JSON objects with many optional fields. OMOP tables are flat and relational, designed for analytic queries. Flattening nested hierarchies without losing nuance is non-trivial.
- Semantic alignment: FHIR can carry local codes or references; OMOP demands standard vocabulary concepts. You can't just copy values: each source code must be mapped to a standard concept ID, often using the crosswalk tables from OHDSI's Athena vocabulary service.
- Event granularity: In FHIR, the same clinical fact might appear in multiple resources with different contexts. In OMOP, each fact needs to live in exactly one table with precise timestamps and provenance.
- Volume and variety: FHIR Bulk Export can generate millions of NDJSON lines across dozens of resource types. Efficiently parsing, reconciling, and loading that into OMOP while preserving relationships is an engineering workout.
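A minimal sketch of the first two challenges, assuming a single FHIR Condition resource already parsed from JSON: it flattens the nested subject and code.coding fields into one row shaped like OMOP's condition_occurrence table and maps the source code to a standard concept ID. The SOURCE_TO_STANDARD dictionary is a hypothetical stand-in for a lookup against the Athena vocabulary tables, and its single SNOMED entry is illustrative. Unmapped codes get concept ID 0, the OMOP convention for "no matching concept", while the original code is kept in condition_source_value for provenance.

```python
from __future__ import annotations

from typing import Any

# FHIR code-system URIs mapped to OMOP vocabulary_id values.
FHIR_SYSTEM_TO_VOCAB = {
    "http://snomed.info/sct": "SNOMED",
    "http://loinc.org": "LOINC",
    "http://www.nlm.nih.gov/research/umls/rxnorm": "RxNorm",
}

# Hypothetical stand-in for an Athena-based source-to-standard lookup.
SOURCE_TO_STANDARD: dict[tuple[str, str], int] = {
    ("SNOMED", "44054006"): 201826,  # Type 2 diabetes mellitus (illustrative)
}


def flatten_condition(resource: dict[str, Any], person_ids: dict[str, int]) -> dict[str, Any]:
    """Flatten one FHIR Condition into a condition_occurrence-shaped dict."""
    # The nested reference "Patient/<id>" becomes an integer foreign key.
    patient_id = resource["subject"]["reference"].split("/")[-1]

    codings = resource.get("code", {}).get("coding", [])
    # Keep the first source code for provenance even if nothing maps.
    source_value = codings[0].get("code") if codings else None
    concept_id = 0  # OMOP convention: 0 = no matching standard concept

    for coding in codings:
        vocab = FHIR_SYSTEM_TO_VOCAB.get(coding.get("system", ""))
        mapped = SOURCE_TO_STANDARD.get((vocab, coding.get("code", ""))) if vocab else None
        if mapped is not None:
            concept_id, source_value = mapped, coding["code"]
            break

    onset = resource.get("onsetDateTime") or resource.get("recordedDate")
    return {
        "person_id": person_ids[patient_id],
        "condition_concept_id": concept_id,
        "condition_start_date": onset[:10] if onset else None,  # keep YYYY-MM-DD
        "condition_source_value": source_value,
    }
```

A real loader would also need to handle onsetPeriod, condition type concepts, and codings whose standard concept belongs to a different OMOP domain (and therefore a different table), which is the event-granularity problem in miniature.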
Meet PyOMOP
PyOMOP is a lightweight, Python-based toolkit for working with OMOP CDM v5.4 or v6 databases.
Key capabilities include:
- Creating and initializing OMOP databases (SQLite for quick tests, Postgres/MySQL for production).
- Loading vocabularies from Athena CSVs.
- Running standard OHDSI QueryLibrary queries or your own SQL.
- Converting results to DataFrames for downstream analysis or machine learning.
- FHIR-to-OMOP import utilities for bulk-loading FHIR data into CDM tables.
PyOMOP's design goal is to get researchers and developers productive quickly without the steep ramp-up of setting up full OHDSI stacks from scratch.
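PyOMOP handles vocabulary loading for you; the sketch below is not its implementation, only an illustration of what loading vocabularies from Athena CSVs involves, using the standard library. It assumes the Athena download folder contains CONCEPT.csv, which (like the other Athena files) is tab-delimited despite its extension, and streams it into a SQLite concept table in batches so a large vocabulary never sits fully in memory. The function name load_concepts and the default batch size are illustrative choices.

```python
from __future__ import annotations

import csv
import sqlite3
from pathlib import Path

# Column order of Athena's CONCEPT.csv (the OMOP CDM concept table).
CONCEPT_COLUMNS = [
    "concept_id", "concept_name", "domain_id", "vocabulary_id",
    "concept_class_id", "standard_concept", "concept_code",
    "valid_start_date", "valid_end_date", "invalid_reason",
]


def load_concepts(conn: sqlite3.Connection, vocab_dir: str | Path, batch_size: int = 10_000) -> int:
    """Stream CONCEPT.csv into a concept table and return the number of rows loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS concept (concept_id INTEGER PRIMARY KEY, "
        + ", ".join(f"{c} TEXT" for c in CONCEPT_COLUMNS[1:])
        + ")"
    )
    insert_sql = f"INSERT INTO concept VALUES ({', '.join('?' * len(CONCEPT_COLUMNS))})"

    total = 0
    batch: list[tuple] = []
    with open(Path(vocab_dir) / "CONCEPT.csv", newline="", encoding="utf-8") as fh:
        # Athena files are tab-separated; concept names may contain stray
        # quote characters, so quote handling is switched off.
        reader = csv.DictReader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Empty strings (e.g. a blank invalid_reason) become NULL.
            batch.append((int(row["concept_id"]), *(row[c] or None for c in CONCEPT_COLUMNS[1:])))
            if len(batch) >= batch_size:
                conn.executemany(insert_sql, batch)
                total += len(batch)
                batch.clear()
    if batch:
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

Dates stay as YYYYMMDD text here to keep the sketch short; a production loader would parse them and would load CONCEPT_RELATIONSHIP the same way to get the "Maps to" crosswalk.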
Building FHIR-to-OMOP in PyOMOP with Vibe Coding
When I set out to add robust loading of FHIR Bulk Export data into OMOP CDM to PyOMOP, I knew the mapping complexity would be the biggest hurdle. I also knew that much of the tedious parsing, table loading, and vocabulary reconciliation could be scaffolded quickly with an AI coding partner.
Leveraging vibe coding for scaffolding
Using naturalâlanguage prompts, I asked the AI to:
- Generate Python functions to parse NDJSON files into DataFrames.
- Build class methods that map FHIR resource fields to OMOP column names.
- Handle async bulk inserts into Postgres using SQLAlchemy.
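The first two prompts describe the kind of scaffolding an LLM produces readily. The sketch below is a hypothetical, stdlib-only version, not PyOMOP's code: it streams every *.ndjson file in a Bulk Export folder one line at a time, logs and skips malformed lines instead of aborting, and maps FHIR Patient fields to a few OMOP person columns. It returns plain lists of dicts rather than DataFrames to stay dependency-free; the names PatientMapper and load_bulk_export and the gender concept IDs (8507 male, 8532 female) are illustrative assumptions.

```python
from __future__ import annotations

import json
import logging
from collections import defaultdict
from pathlib import Path
from typing import Any, Iterator

log = logging.getLogger(__name__)

GENDER_CONCEPTS = {"male": 8507, "female": 8532}  # anything else maps to 0


def iter_ndjson(folder: str | Path) -> Iterator[dict[str, Any]]:
    """Yield FHIR resources one line at a time from every *.ndjson file."""
    for path in sorted(Path(folder).glob("*.ndjson")):
        with path.open(encoding="utf-8") as fh:
            for line_no, line in enumerate(fh, start=1):
                if not line.strip():
                    continue
                try:
                    yield json.loads(line)
                except json.JSONDecodeError as exc:
                    # Log and skip, so one bad line does not abort a huge load.
                    log.warning("skipping %s:%d (%s)", path.name, line_no, exc)


class PatientMapper:
    """Hypothetical mapper from FHIR Patient fields to OMOP person columns."""

    def __init__(self) -> None:
        self.person_ids: dict[str, int] = {}  # FHIR id -> integer person_id

    def to_person(self, patient: dict[str, Any]) -> dict[str, Any]:
        fhir_id = patient["id"]
        person_id = self.person_ids.setdefault(fhir_id, len(self.person_ids) + 1)
        birth_date = patient.get("birthDate", "")  # "YYYY", "YYYY-MM" or "YYYY-MM-DD"
        return {
            "person_id": person_id,
            "gender_concept_id": GENDER_CONCEPTS.get(patient.get("gender", ""), 0),
            "year_of_birth": int(birth_date[:4]) if birth_date else None,
            "person_source_value": fhir_id,
            "gender_source_value": patient.get("gender"),
        }


def load_bulk_export(folder: str | Path) -> dict[str, list[dict[str, Any]]]:
    """Route each resource to a target OMOP table (only Patient is shown here)."""
    mapper = PatientMapper()
    tables: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for resource in iter_ndjson(folder):
        if resource.get("resourceType") == "Patient":
            tables["person"].append(mapper.to_person(resource))
    return dict(tables)
```

Keeping the FHIR-id-to-person_id dictionary on the mapper matters: later resources (Condition, Observation) reference patients by FHIR id, and that same dictionary is what turns those references into OMOP foreign keys.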
I treated the AI's output as a first draft and kept the domain-specific logic in my own hands. When something failed, I fed the failing examples back into the prompt and asked for better exception handling and logging. This conversational loop closed the gap quickly. Once the mapping functions were stable, I embedded them behind a simple command:
pyomop --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/
Now, a single command could create an OMOP DB, load vocabularies, import FHIR data, and reconcile mappings.
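The post doesn't spell out what "reconcile mappings" involves. One plausible piece, sketched here under that assumption, is a coverage check after loading: count the records whose standard concept ID is 0 (OMOP's "no matching concept" value) and list the most frequent unmapped source codes so they can be reviewed or added to a custom mapping. The query covers only condition_occurrence, and the function name and return shape are hypothetical.

```python
from __future__ import annotations

import sqlite3


def condition_mapping_report(
    conn: sqlite3.Connection, top_n: int = 10
) -> tuple[float, list[tuple[str, int]]]:
    """Return (share of rows mapped to a standard concept, top unmapped source values)."""
    total, mapped = conn.execute(
        "SELECT COUNT(*), SUM(condition_concept_id <> 0) FROM condition_occurrence"
    ).fetchone()
    # SUM is NULL on an empty table; treat "nothing loaded" as fully covered.
    coverage = (mapped or 0) / total if total else 1.0
    unmapped = conn.execute(
        """
        SELECT condition_source_value, COUNT(*) AS n
        FROM condition_occurrence
        WHERE condition_concept_id = 0
        GROUP BY condition_source_value
        ORDER BY n DESC, condition_source_value
        LIMIT ?
        """,
        (top_n,),
    ).fetchall()
    return coverage, unmapped
```

The same check applies to each clinical table (drug_exposure, measurement, and so on) through its own *_concept_id and *_source_value columns.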
With the FHIR-to-OMOP feature now baked into PyOMOP, a clinician-researcher can download vocabularies, point at a folder of FHIR Bulk Export files, and be querying in OMOP CDM in an afternoon, with no endless SQL migrations required.