Data mining Archives

Hephestus: Health data warehousing tool for public health and clinical research

Health data warehousing is becoming an important requirement for deriving knowledge from the vast amount of health data that healthcare organizations collect. A data warehouse is vital for collaborative and predictive analytics. The first step in designing a data warehouse is to decide on a suitable data model. This is followed by the extract-transform-load (ETL) process, which converts the source data to a new data model that is amenable to analytics.

The OHDSI OMOP Common Data Model (CDM) is one such data model. It allows for the systematic analysis of disparate observational databases and EMRs. The data from diverse systems needs to be extracted, transformed, and loaded into a CDM database. Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools that are already available.

Each data source requires customized ETL tools to convert the source data to the CDM. The OHDSI ecosystem provides some tools to help with the ETL process, such as White Rabbit and Rabbit In a Hat. However, the health data warehousing process is still challenging because source databases vary in structure and implementation.

Hephestus is an open-source Python tool for this ETL process, organized into modules to allow code reuse between ETL tools for various open-source EMR systems and data sources. Hephestus uses SQLAlchemy for database connections and for automapping tables to classes, and Bonobo for managing ETL. The ultimate aim is to develop a tool that can translate the report from the OHDSI tools into an ETL script with minimal intervention. This is a good Python starter project for eHealth geeks.
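To make the extract-transform-load idea concrete, here is a minimal sketch of the three steps. It is not Hephestus code: it uses the standard-library sqlite3 module instead of SQLAlchemy and Bonobo so that it runs on its own, the source table `patients` and its columns are hypothetical, and the target is only a simplified subset of the CDM `person` table.

```python
import sqlite3

# Assumption: the source EMR stores sex as "M"/"F". 8507 and 8532 are the
# OMOP standard concept IDs for male and female; 0 means "no matching concept".
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def extract(source):
    """Yield raw rows from the hypothetical source table."""
    yield from source.execute("SELECT id, sex, dob FROM patients")


def transform(row):
    """Map one source row to a simplified subset of the CDM person table."""
    patient_id, sex, dob = row
    return (
        patient_id,
        GENDER_CONCEPTS.get((sex or "").upper(), 0),
        int(dob[:4]),  # year_of_birth taken from an ISO date string
        sex,           # original value kept as gender_source_value
    )


def load(target, rows):
    """Insert the transformed rows into the target person table."""
    target.executemany(
        "INSERT INTO person (person_id, gender_concept_id, year_of_birth,"
        " gender_source_value) VALUES (?, ?, ?, ?)",
        rows,
    )
    target.commit()


def run_etl(source, target):
    load(target, (transform(row) for row in extract(source)))


# Made-up source data for illustration only.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE patients (id INTEGER, sex TEXT, dob TEXT)")
source.executemany(
    "INSERT INTO patients VALUES (?, ?, ?)",
    [(1, "F", "1980-05-17"), (2, "m", "1975-11-02"), (3, None, "1990-01-30")],
)

target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE person (person_id INTEGER PRIMARY KEY,"
    " gender_concept_id INTEGER, year_of_birth INTEGER,"
    " gender_source_value TEXT)"
)

run_etl(source, target)
people = target.execute(
    "SELECT person_id, gender_concept_id, year_of_birth"
    " FROM person ORDER BY person_id"
).fetchall()
print(people)
```

Keeping extract, transform, and load as separate functions is what makes reuse possible: only `extract` and the mapping table need to change for a different source system.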

Anyone anywhere in the world can build their own environment that can store patient-level observational health data, convert their data to OHDSI’s open community data standards (including the OMOP Common Data Model), run open-source analytics using the OHDSI toolkit, and collaborate in OHDSI research studies that advance our shared mission toward reliable evidence generation. Join the journey!

Disclaimer: Hephestus is just my experiment and is not a part of the official OHDSI toolset.

Source code on GitHub: dermatologist/hephaestus

Published by Bell Eapen on November 3, 2018 | Permalink

Combining Clinical Trials


BMC Medical Research Methodology | Abstract | Bayesian methods in clinical trials: a Bayesian analysis of ECOG trials E1684 and E1690:

Happy new year to all!

I have always wondered how to effectively combine data from a previous similar clinical trial into a new trial. If this is not attempted, the wealth of information already collected is wasted. Besides, if the trials give conflicting results, the entire effort of conducting both trials is lost and you end up with only confusion. The authors here have conceived a method to effectively combine data from similar trials conducted at different times using the Bayesian method. In short, the older trial is used to generate the prior probability distribution for the analysis of the new results. The methodology has been used in melanoma studies. (I am happy that it is from my domain.) I have also experimented with Bayesian methodology before.
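The idea of using the older trial as the prior can be illustrated with a toy example. This is only a sketch under strong assumptions, not the authors' model: it assumes a binary outcome and a conjugate beta-binomial model, the counts are made up, and the `weight` parameter (a hypothetical knob for discounting the older trial) is my own addition.

```python
def beta_update(a, b, successes, failures, weight=1.0):
    """Conjugate update of a Beta(a, b) distribution with binomial data.

    weight < 1 discounts the data, e.g. when the older trial is only
    partly comparable to the new one.
    """
    return a + weight * successes, b + weight * failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Made-up counts for illustration only (responders, non-responders).
old_trial = (30, 70)
new_trial = (45, 55)

# Step 1: start from a flat Beta(1, 1) and absorb the older trial.
# The result is the prior for the analysis of the new trial.
prior_a, prior_b = beta_update(1, 1, *old_trial)

# Step 2: update that prior with the new trial's results.
post_a, post_b = beta_update(prior_a, prior_b, *new_trial)

print("prior mean:", round(beta_mean(prior_a, prior_b), 3))
print("posterior mean:", round(beta_mean(post_a, post_b), 3))
```

With `weight=1.0` the older trial counts as much as the new one, which is equivalent to pooling the data. A smaller weight lets the new trial dominate when the two trials are not fully comparable.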

I give 4 peels to this idea. (Pardon me for using a grading system envisaged for a different cause!)


Negative N to Unknown U

The identification of disease-specific genes is pivotal in clinical informatics. This paper describes an improved machine learning algorithm in which the negative set N is treated more appropriately as an unknown (unlabeled) set U.
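The difference between "negative" and "unknown" can be shown with a small sketch. This is not the paper's algorithm. It is a simple heuristic of my own choosing for illustration: only the unlabeled examples farthest from the centroid of the known positives are treated as reliable negatives, and the rest stay unknown. The feature vectors and the `negative_fraction` parameter are made up.

```python
from math import dist


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(component) / n for component in zip(*points))


def split_unlabeled(positives, unlabeled, negative_fraction=0.5):
    """Split unlabeled examples into reliable negatives and still-unknown.

    Examples farthest from the positive centroid are the least likely to be
    hidden positives, so only those are used as negatives.
    """
    center = centroid(positives)
    ranked = sorted(unlabeled, key=lambda p: dist(p, center), reverse=True)
    cutoff = int(len(ranked) * negative_fraction)
    return ranked[:cutoff], ranked[cutoff:]


# Made-up two-dimensional feature vectors for illustration only.
positives = [(0.9, 0.8), (1.0, 1.0), (0.8, 0.9)]
unlabeled = [(0.85, 0.95), (0.1, 0.2), (0.0, 0.1), (0.7, 0.75)]

reliable_negatives, still_unknown = split_unlabeled(positives, unlabeled)
print("reliable negatives:", reliable_negatives)
print("still unknown:", still_unknown)
```

A classifier trained on the positives and the reliable negatives avoids being told that the unlabeled example at (0.85, 0.95), which looks very much like a positive, is a negative.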


Peng Yang, Xiao-Li Li, Jian-Ping Mei, Chee-Keong Kwoh, and See-Kiong Ng. Positive-Unlabeled Learning for Disease Gene Identification
Bioinformatics first published online August 24, 2012 doi:10.1093/bioinformatics/bts504

SVMs are an important tool in the bioinformatician's armamentarium. Weka is a collection of machine learning algorithms for data mining tasks.

Bell Eapen MD, PhD.
