Bell Eapen

Physician | HealthIT Developer | Digital Health Consultant

Random forest model for predicting the total length of hospital stay (TLOS)

TL;DR here is the Random Forest classifier code:

And an (obvious) upfront disclaimer: This is a learning project. This is not for actual use.

The Discharge Abstract Database (DAD) consists of patient demographics, comorbidities, interventions and length of stay for a de-identified 10% sample of hospital admissions. DAD (2014-15) has an enhanced dataset with variables created at Western that act as flags for ICD-10 and CCI groupings, which makes the file easier to use.

Here is an experiment with the DAD enhanced dataset to create a Random Forest model for predicting the total length of hospital stay (TLOS) in fewer than 100 lines of code. A random forest is an ensemble method that builds multiple decision trees at training time and outputs either the class that is the mode of the individual trees' classes (classification) or the mean of their predictions (regression). This is a learning project for Apache Spark and Spark ML using pyspark. The accuracy of the model is low when all derived categorical variables are included.
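To make the aggregation step concrete, here is a minimal sketch in plain Python (not the Spark ML code from the repo). The tree outputs, the length-of-stay buckets and the function names are all made up for illustration; a real forest would get these values from trained decision trees.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_votes):
    """Return the class predicted by the most trees (the mode of the votes)."""
    return Counter(tree_votes).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Return the mean of the individual trees' numeric predictions."""
    return mean(tree_predictions)


# Hypothetical outputs of five trees for one admission (illustrative only).
votes = ["1-3 days", "4-7 days", "1-3 days", "8+ days", "1-3 days"]
days = [2.0, 5.0, 3.0, 9.0, 3.5]

print(aggregate_classification(votes))  # majority class among the trees
print(aggregate_regression(days))       # average predicted stay in days
```

Treating TLOS as bucketed classes gives a classification problem, while predicting the number of days directly gives a regression problem; the forest aggregates differently in each case.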

I have access to Apache Spark @ CC. If you are installing Spark on your own computer, you may have to change the following:

Some of the commonly tweaked parameters can be changed here:

Uncomment the following line to include only variables that you need.

Here is the repo. How can this model be improved? Maybe a PCA before the RF? Or am I missing something important?


Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.

eHealth User Interface Design for Seniors

Many of the open-source PHR systems have poor user interfaces (UI) by any standard, while older adults have specific interface requirements. Credible research on the special interface requirements of the elderly is still hard to find.

The Context-Aware Knowledge Retrieval – Infobutton


Image Credit: By Roberth Edberg (Own work (Original text: self-made)) [Public domain], via Wikimedia Commons. Image altered and text added.

Clinicians using patient information systems such as EMRs and HIE clinical viewers need a cross-referencing mechanism to clinical information aggregators such as DynaMed and MedlinePlus. Busy practitioners don’t have time to wade through several pages to find what they need. So there has to be a mechanism to convey as much information as possible about the patient at hand to these knowledge providers, so that the doctor is presented with the information most relevant to the patient being managed. Infobutton is the standard (HL7 Version 3) for this information exchange, and the Infobutton manager is the broker that facilitates this data transfer. With the increasing popularity of Personal Health Record (PHR) systems, the Infobutton standard is relevant to PHRs as well.

An Infobutton implementation can be either URL (REST) based or SOAP based. The information provided by the client system includes patient characteristics such as age and sex, provider, care setting, clinical task and diagnosis. The OpenInfobutton project is an initiative led by the United States Veterans Health Administration (VHA) and the University of Utah that provides the Infobutton manager that brokers the interaction between client information systems and online knowledge providers.
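As a rough illustration of the URL-based flavour, the sketch below assembles a request from the patient context. The endpoint is a placeholder, and the helper name and sample values are made up. The parameter names follow my reading of the HL7 URL-based implementation guide and should be checked against the specification before use. Provider and care setting are left out to keep the example short.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute your own Infobutton manager
# or knowledge resource URL.
BASE_URL = "https://infobutton.example.org/request"


def build_infobutton_url(base_url, diagnosis_code, code_system_oid,
                         gender, age_years, task_code):
    """Build a URL-based Infobutton request from the clinical context."""
    params = {
        # Main search criterion: the diagnosis code and its code system OID.
        "mainSearchCriteria.v.c": diagnosis_code,
        "mainSearchCriteria.v.cs": code_system_oid,
        # Patient characteristics: administrative gender and age.
        "patientPerson.administrativeGenderCode.c": gender,
        "age.v.v": str(age_years),
        "age.v.u": "a",  # UCUM unit for years
        # Clinical task the user is performing in the client system.
        "taskContext.c.c": task_code,
    }
    return base_url + "?" + urlencode(params)


# Illustrative values only: an ICD-10-CM diabetes code, a 67-year-old
# female patient, and a problem list review task.
url = build_infobutton_url(
    BASE_URL, "E11.9", "2.16.840.1.113883.6.90", "F", 67, "PROBLISTREV"
)
print(url)
```

The client only has to serialize context it already holds into query parameters, which is why the URL-based variant is the simpler of the two to adopt.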

Understanding the Infobutton standard is important for clinical information system developers, clinical knowledge resource publishers and health care organizations such as those running regional clinical viewers. Several studies have demonstrated an increase in productivity for physicians with Infobutton adoption. Since the Infobutton standard is based on well-known protocols such as SOAP and REST, it is relatively easy to implement.

2015 National Change Management Survey (Canada)

Lao: Management must be done well

(Photo credit: Wikipedia)

In 2010, Infoway conducted a change management survey, which was the catalyst for establishing the Pan Canadian Change Management Network and developing the National Change Management Framework.

Fast forward 5 years… Canada Health Infoway’s (Infoway) Clinical Adoption Team is conducting a survey to gain insight into how change management is currently being conducted across the country. Change management is an essential driver of adoption and of the benefits realized from the use of digital health. Infoway’s multifaceted approach to change management is guided by an evidence-informed change management framework that was developed with input from experts across Canada in 2011.

The purpose of this survey is to assess how change management is currently being conducted across the country and how useful the Change Management framework and associated toolkit are. The survey takes between 10 and 15 minutes to complete. Upon completion of the survey, you can enter the participants’ draw, where you could win one of five $200 Visa cards. The survey will be available until February 12, 2015. Your input will help provide direction on how to shape the ongoing development of change management resources for the adoption of digital health solutions across the country in 2015 and beyond.

Link to 2015 National Change Management Survey:

If Ebola Spreads to Canada

While reading the news about the Public Health Agency of Canada taking all possible steps to prevent the spread of Ebola to Canada, with a glass of Ontario wine in my hand, I thought for a brief moment: what if…

Picture credit: DFID @ Flickr (Image altered and text added) – If Ebola spreads to Canada

So let me set the context right. I am not an infectious disease expert, though my post on cutaneous signs of Ebola virus infection got more attention than it deserved. I am not an epidemiologist either, so I cannot comment authoritatively on what HealthMap is doing. To me, it is the social media version of what John Snow did more than a century and a half ago, when he identified the epicentre of a cholera outbreak and established epidemiology as a speciality.

So if Ebola spreads to Canada, how do we identify the epicentre and take preventive measures? Turn to HealthMap, see where it originated and take measures to contain it? HealthMap will get that information from Google News and similar services. We have half a dozen major Health Information Exchange (HIE) initiatives in the country and would probably have accurate records of where each case presented with the characteristic symptoms. But we would look to HealthMap and Google, since we cannot use HIE data for research!

i wonder how long it wil take for #ebola to hit #canada? which city first? and wil it get #outofcontrol? #crazy :headshaking:
— 411inToronto (@411inToronto) October 10, 2014

I am not a health policy expert, nor am I an HIE architecture expert. But to me, if we are to realize the benefits of the ever-increasing number of HIE initiatives, we have to find a way to use the wealth of information there for population health. If we get it right, privacy is not even a concern.

HIE, built to abolish silos, paradoxically created larger silos because of fragmented systems. The utopian vision of population health requires a glue to bring these silos together. We got it wrong the first time, with data-centric HIS that offered little clinical workflow support and were (inadvertently) rejected by doctors. (We always have the doctors to blame as the universal slow technology adopters. BTW India’s mission to Mars discovered that all doctors on the planet originated from Mars!) We are sure to get it wrong again if we don’t change the data-centric HIE models.

HIE should be versatile, structureless and scalable enough to support disparate clinical use cases. The only option that comes to my mind is RDF.

If you are still unsure, read all that I have written about RDF. Convinced? Go ahead and head on over to the Yosemite Manifesto. BTW it has got nothing to do with the new OS X!