TL;DR: A clinician‑researcher can download vocabularies, point at a folder of FHIR Bulk Export files, and be querying in OMOP CDM in an afternoon. The FHIR→OMOP import in PyOMOP, scaffolded with vibe coding, helps you do just that!

PyOMOP: Python package for managing OHDSI clinical data models. Includes support for LLM‑based plain‑text queries!

🎵 What is Vibe Coding?

Vibe coding is an AI‑assisted development approach popularized by Andrej Karpathy. Instead of hand‑coding every function, you describe your intent in natural language to a large language model (LLM), and the AI generates boilerplate, scaffolding, or even full implementations.

Core elements:

  • Natural language prompts: You focus on what you want, not how to implement it.
  • AI‑driven code generation: The LLM translates your description into actual code.
  • Iteration through conversation: You refine the output by prompting adjustments, much like pair programming.
  • Conceptual focus: This frees mental bandwidth for architecture, data flows, and domain logic, rather than syntax minutiae.

Done well, vibe coding is not “letting the AI code everything blindly.” It’s a collaborative, high‑level design process where you still review, edit, and integrate the code yourself.

But let us discuss the problem first!

🗂 What is OMOP CDM?

The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is the de facto standard for representing observational health data in a way that’s consistent, query‑friendly, and research‑ready. Stewarded by OHDSI, OMOP CDM structures clinical data into a fixed set of standardized tables (persons, conditions, drug exposures, measurements, procedures), each with precisely defined fields and standard vocabulary concept IDs.

The payoff:

  • You can write a single analysis and run it across multiple sites—without rewriting SQL for each local schema.
  • Data from disparate EHR systems, claims databases, and registries can be harmonized and compared.
  • The model supports rich vocabularies like SNOMED, RxNorm, and LOINC through standardized concept IDs.

If raw EHR data is a jumble of puzzle pieces, OMOP CDM is the finished picture on the box—complete with a shared key so everyone knows exactly where each piece fits.
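To make the "write once, run anywhere" point concrete, here is a minimal sketch using an in‑memory SQLite database. It builds cut‑down versions of two OMOP tables (person and condition_occurrence, with only a few of their real columns) and runs one cohort count. The rows are made up, and the concept IDs are used purely as illustrative values; the point is that the same SQL would run unchanged on any site that follows the CDM.

```python
import sqlite3

# Toy in-memory database with two OMOP CDM tables (only a few columns each).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        gender_concept_id INTEGER,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        condition_occurrence_id INTEGER PRIMARY KEY,
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    );
    """
)

# Made-up rows; concept IDs are illustrative values only.
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, 8507, 1960), (2, 8532, 1975), (3, 8532, 1982)],
)
conn.executemany(
    "INSERT INTO condition_occurrence VALUES (?, ?, ?, ?)",
    [
        (10, 1, 201826, "2020-01-15"),
        (11, 2, 201826, "2021-06-02"),
        (12, 2, 201826, "2022-03-09"),  # second record for the same person
        (13, 3, 320128, "2019-11-30"),
    ],
)

# One analysis, written against the shared schema rather than a local one.
COHORT_SQL = """
SELECT p.gender_concept_id, COUNT(DISTINCT p.person_id) AS n_persons
FROM person p
JOIN condition_occurrence co ON co.person_id = p.person_id
WHERE co.condition_concept_id = ?
GROUP BY p.gender_concept_id
ORDER BY p.gender_concept_id
"""
counts = conn.execute(COHORT_SQL, (201826,)).fetchall()
print(counts)
```

Because the query counts distinct persons, the repeated record for person 2 does not inflate the result.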

🔗 Why is it hard to map FHIR to OMOP?

At first glance, FHIR (Fast Healthcare Interoperability Resources) and OMOP both sound like they’re aiming for the same thing: standardization. But FHIR focuses on data exchange—it’s an API‑friendly format designed for real‑time transactions, mobile apps, and message passing between systems. OMOP, in contrast, is optimized for longitudinal analytics and cohort‑level studies.

Mapping challenges include:

  • Structural mismatch
    FHIR resources are often nested, verbose JSON objects with many optional fields. OMOP tables are flat and relational, built for fast analytical queries. Flattening nested hierarchies without losing nuance is non‑trivial.
  • Semantic alignment
    FHIR can store local codes or references; OMOP demands standardized vocabulary concepts. This means you can’t just copy values—you must map each source code to a standard concept ID, often using crosswalk tables from OHDSI’s Athena vocabulary service.
  • Event granularity
    In FHIR, the same clinical fact might appear in multiple resources with different contexts. In OMOP, each fact needs to live in exactly one table with precise timestamps and provenance.
  • Volume and variety
    FHIR Bulk Export can generate millions of NDJSON lines across dozens of resource types. Efficiently parsing, reconciling, and loading that into OMOP while preserving relationships is an engineering workout.
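The first two challenges are easier to see in code. The sketch below flattens one nested FHIR Condition resource into a flat row shaped like OMOP's condition_occurrence table and looks up a standard concept ID for its code. It is an illustration, not PyOMOP's implementation: the CROSSWALK dictionary, the condition_to_omop function, and the patient_ref key are hypothetical, and a real crosswalk would be derived from the Athena vocabulary files rather than hard‑coded.

```python
# Hypothetical crosswalk: (FHIR coding system, code) -> standard concept_id.
# A real one would be built from Athena vocabulary files; value is illustrative.
CROSSWALK = {
    ("http://snomed.info/sct", "44054006"): 201826,
}


def condition_to_omop(resource, crosswalk):
    """Flatten a FHIR Condition dict into a condition_occurrence-like row."""
    codings = resource.get("code", {}).get("coding", [])
    concept_id = 0  # OMOP convention: 0 means "no matching standard concept"
    source_value = codings[0].get("code") if codings else None
    for coding in codings:
        key = (coding.get("system"), coding.get("code"))
        if key in crosswalk:
            concept_id = crosswalk[key]
            source_value = coding.get("code")
            break

    # "Patient/123" -> "123"; resolving this to a person_id is a later step.
    reference = resource.get("subject", {}).get("reference", "")
    onset = resource.get("onsetDateTime")
    return {
        "patient_ref": reference.split("/")[-1] or None,  # hypothetical helper key
        "condition_concept_id": concept_id,
        "condition_source_value": source_value,
        # Assumes a full YYYY-MM-DD prefix; partial FHIR dates need more care.
        "condition_start_date": onset[:10] if onset else None,
    }


sample = {
    "resourceType": "Condition",
    "subject": {"reference": "Patient/123"},
    "code": {"coding": [{"system": "http://snomed.info/sct", "code": "44054006"}]},
    "onsetDateTime": "2020-01-15T08:30:00Z",
}
row = condition_to_omop(sample, CROSSWALK)
print(row)
```

Codes with no entry in the crosswalk keep their source value and get concept ID 0, so unmapped records stay visible instead of being silently dropped.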

🐍 Meet PyOMOP

PyOMOP is a lightweight, Python‑based toolkit for working with OMOP CDM v5.4 or v6 databases.

Key capabilities include:

  • Creating and initializing OMOP databases (SQLite for quick tests, Postgres/MySQL for production).
  • Loading vocabularies from Athena CSVs.
  • Running standard OHDSI QueryLibrary queries or your own SQL.
  • Converting results to DataFrames for downstream analysis or machine learning.
  • FHIR → OMOP import utilities for bulk‑loading FHIR data into CDM tables.

PyOMOP’s design goal is to get researchers and developers productive quickly without the steep ramp‑up of setting up full OHDSI stacks from scratch.

🛠 Building FHIR→OMOP in PyOMOP with Vibe Coding

When I set out to add robust loading of FHIR Bulk Export data into OMOP CDM to PyOMOP, I knew the mapping complexity would be the biggest hurdle. I also knew that much of the tedious parsing, table loading, and vocabulary reconciliation could be scaffolded quickly using an AI coding partner.

Leveraging vibe coding for scaffolding

Using natural‑language prompts, I asked the AI to:

  • Generate Python functions to parse NDJSON files into DataFrames.
  • Build class methods that map FHIR resource fields to OMOP column names.
  • Handle async bulk inserts into Postgres using SQLAlchemy.
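As a rough idea of what the first of those prompts produces, here is a minimal sketch of an NDJSON reader. It is not the code that ships in PyOMOP: the function name read_ndjson_dir is hypothetical, and it uses only the standard library, grouping resources by resourceType into plain lists of dicts. Each list could then be handed to pandas.DataFrame (or pandas.json_normalize) to get the DataFrames mentioned above.

```python
import json
from collections import defaultdict
from pathlib import Path


def read_ndjson_dir(folder):
    """Group resources from every *.ndjson file in `folder` by resourceType."""
    grouped = defaultdict(list)
    for path in sorted(Path(folder).glob("*.ndjson")):
        with path.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                try:
                    resource = json.loads(line)
                except json.JSONDecodeError as exc:
                    # Report and skip bad lines instead of aborting the load.
                    print(f"Skipping {path.name}:{line_number}: {exc.msg}")
                    continue
                if not isinstance(resource, dict):
                    continue  # a FHIR resource must be a JSON object
                grouped[resource.get("resourceType", "Unknown")].append(resource)
    return dict(grouped)
```

Calling read_ndjson_dir on an export folder returns a dictionary such as {"Patient": [...], "Condition": [...]}, which keeps the per‑resource mapping code separate from file handling.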

I treated the AI’s output like a first draft. I kept the domain‑specific logic in my own hands. I fed failure examples back into the AI prompt, asking for exception handling and logging improvements. This conversational loop closed the gap quickly. Once the mapping functions were stable, I embedded them behind a simple command:

pyomop --create --vocab ~/Downloads/omop-vocab/ --input ~/Downloads/fhir/

Now a single command can create an OMOP database, load vocabularies, import FHIR data, and reconcile mappings.

With the FHIR→OMOP feature now baked into PyOMOP, a clinician‑researcher can download vocabularies, point at a folder of FHIR Bulk Export files, and be querying in OMOP CDM in an afternoon—no endless SQL migrations required.

Bell Eapen