Bell Eapen MD, PhD.

Bringing Digital health & Gen AI research to life!

CRISP-T: Bridging Text, Numbers, and AI for Smarter Qualitative Research

CRISP-T is a tool for researchers navigating the complexities of qualitative analysis on mixed data types. In fields like healthcare, education, and social sciences, qualitative data—interviews, open-ended surveys, field notes—often hold the richest insights. Yet, integrating this with structured numeric data has traditionally been cumbersome. CRISP-T addresses this gap by offering a unified framework that enables computational triangulation, allowing researchers to explore relationships between textual themes and quantitative outcomes with precision and flexibility.

CRISP-T: AI assisted qualitative research!

Its relevance is especially pronounced in the age of AI-assisted research. With built-in support for large language models and agentic AI (through MCP), CRISP-T empowers users to go beyond manual coding and thematic analysis. For instance, researchers can use topic modelling to identify recurring themes in patient feedback, then correlate these with retention metrics or clinical outcomes using decision trees or regression analysis. This kind of integrated insight is invaluable for evidence-based decision-making and theory development.

Moreover, CRISP-T’s modular design and open-source ethos make it highly adaptable. Whether you’re a solo researcher, part of an interdisciplinary team, or integrating AI agents into your workflow, CRISP-T provides the scaffolding to build sharable, reproducible analyses. Its MCP server interface further extends its utility, enabling seamless integration with platforms like Claude Desktop or VSCode, and facilitating interactive exploration of data. In short, CRISP-T isn’t just a toolkit—it’s a bridge between qualitative depth and quantitative rigour, built for the future of data-centric theory building.

🧠 Why CRISP-T Matters

Qualitative research often involves messy, complex data—interview transcripts, open-ended survey responses, field notes, and more. But what if you could combine these with structured numeric data like demographics or survey scores, and analyze both in tandem?

CRISP-T enables this fusion, offering a computational triangulation approach that:

  • Integrates text and numbers into unified corpus objects
  • Applies NLP techniques like topic modelling and sentiment analysis
  • Leverages ML algorithms such as decision trees and clustering
  • Supports semantic search and metadata export for deeper insights

This toolkit is especially valuable in domains like healthcare, education, and social sciences, where research on mixed data types is common.

⚙️ What’s Inside the Toolkit?

CRISP-T is built in Python and offers four powerful command-line interfaces (CLIs):

CLI Tool     Purpose
crisp        Main CLI for triangulation and analysis (topics, sentiment, regression)
crispt       Corpus manipulation (add/remove/query documents, relationships)
crispviz     Visualization (word clouds, topic charts, heatmaps)
crisp-mcp    MCP server for AI agent integration (Claude, VSCode, etc.)

These tools allow researchers to ingest data from .txt, .pdf, or .csv files, define relationships between text and numbers, and validate findings through machine learning models.

📊 Real-World Use Case: Market Research

Imagine a company collecting:

  • Customer feedback (text)
  • Retention rates and sales data (numbers)

Using CRISP-T, analysts can:

  1. Extract recurring themes from feedback
  2. Link them to performance metrics
  3. Validate relationships using decision trees or regression
  4. Visualize findings with topic charts and word clouds

This kind of mixed-methods insight is invaluable for strategic decision-making.
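
The four steps above can be illustrated with a minimal, standard-library sketch. It does not use CRISP-T's own API. The records, the keyword lists standing in for topic-model output, and every function name are hypothetical, and a simple Pearson correlation takes the place of the decision trees or regression a real analysis would use.

```python
from math import sqrt

# Hypothetical toy data: free-text feedback paired with a numeric outcome.
records = [
    {"text": "Delivery was slow and support never replied", "retained": 0},
    {"text": "Great price and fast delivery", "retained": 1},
    {"text": "Support was helpful and friendly", "retained": 1},
    {"text": "Slow delivery again, very disappointed", "retained": 0},
    {"text": "Good price but slow support", "retained": 1},
    {"text": "Fast delivery, helpful support, fair price", "retained": 1},
]

# Hand-written keyword sets standing in for themes a topic model might surface.
THEMES = {
    "slowness": {"slow", "never", "disappointed"},
    "helpfulness": {"helpful", "friendly", "fast"},
}


def tokenize(text):
    """Lower-case the text and strip basic punctuation from each word."""
    return [word.strip(".,!?").lower() for word in text.split()]


def theme_flags(text):
    """Return 1/0 per theme depending on whether any theme keyword occurs."""
    tokens = set(tokenize(text))
    return {name: int(bool(tokens & words)) for name, words in THEMES.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def theme_outcome_correlations(rows, outcome="retained"):
    """Link each theme's presence to the numeric outcome column."""
    ys = [row[outcome] for row in rows]
    return {
        name: pearson([theme_flags(row["text"])[name] for row in rows], ys)
        for name in THEMES
    }


correlations = theme_outcome_correlations(records)
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

With only six invented records the numbers mean nothing in themselves. The point is the shape of the workflow: text is reduced to theme indicators, and those indicators are then tested against a numeric column.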

🌐 Learn More and Get Involved

If you’re a researcher, educator, or data scientist looking to bridge qualitative depth with quantitative rigour, CRISP-T is your toolkit. Give it a ⭐️ on GitHub, try the demo, and join the movement toward smarter, AI-powered sense-making.

CRISP-T: Sense-making from Text and Numbers for Qualitative Research!
https://github.com/dermatologist/crisp-t

R&D and Innovation in IT: To Combine or Not to Combine

R&D and innovation are two related but distinct concepts. My aim is not to delve into the subtle semantic differences between the two but to explore, as an information systems researcher, some organizational factors that may impact individual innovators. My focus is exclusively on information technology and information systems innovation within a corporate setting. 

R&D and innovation
Image credit: Petrovskyz and Jahobr, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0, via Wikimedia Commons

In my view, Research & Development (R&D) is a systematic process of exploring existing methods, techniques, and processes within an organization to improve upon them or discover new applications. This involves thorough investigation, experimentation, analysis, and refinement aimed at creating solutions that can enhance productivity, efficiency, quality, and competitiveness in the marketplace. The methods, techniques, and processes are either available internally or can be obtained free of charge or purchased. The focus is on finding the organizational fit for a known solution.

The R&D process typically comprises several stages:  

1. Identification of the pain points. (Problem space) 

2. Identification of potential solutions. (Solution space) 

3. Gap analysis and research objectives. 

4. Experimental design and execution. 

5. Interpretation. 

6. Reporting back to stakeholders and decision-makers. 

R&D is potentially scalable by increasing the team size. Individuals work as a team to solve problems. It is easy to track and monitor progress. Documentation is the key to externalizing the gained knowledge to the organizational memory for the use and reuse of knowledge, ensuring transparency and continuous learning within the organization. Moreover, utilizing collaboration tools and project management software will streamline communication between team members, facilitating effective knowledge sharing and reducing potential bottlenecks. As the R&D department grows, it is essential to maintain an agile mindset.  

As the focus is on finding the organizational fit for a known IT artifact, the recommendations have little relevance outside the organization and, as such, are not publishable. This is not to discount any attempt to tease out generalizable knowledge from R&D initiatives and publish it as papers. The notion of “failed R&D” applies only if you treat the finding that an IT artifact is unsuitable for the organization as a failure. I do not believe you should!

In contrast, innovation is the pursuit of the unknown. The idea, method or process is either not obvious (it may be obvious in hindsight), or it is outright novel and disruptive. It involves pushing boundaries, challenging pre-existing norms and beliefs, and exploring uncharted territories to create something new. Innovation often requires creativity, critical thinking, and a willingness to take calculated risks. It is driven by curiosity, passion for learning, and an unyielding desire to find better solutions to complex problems. 

Innovation is risky with a high failure rate. Innovation teams are typically small, and members often work in isolation. Most innovation teams maintain some secrecy as the potential worth of some of the artifacts generated is not immediately apparent. Successful innovation projects offer substantial rewards such as competitive advantage, market differentiation, and the opportunity for disruptive breakthroughs that can revolutionize industries or create entirely new ones. Innovation artifacts are often valuable outside the organization and publishable as new knowledge sources. However, it is uncommon to publish or even document the interim artifacts. Most innovation artifacts are “ideas” in the innovator’s mind. 

Many organizations (knowingly or unknowingly) club R&D and innovation teams together and try to blur the boundaries. This may be due to many reasons: 

1. Innovation has a high failure rate. Combining both teams can hedge against the risk of failure. 

2. Combining both teams may encourage knowledge sharing. 

3. True innovation teams are expensive to maintain, and turnover rates are high. 

4. Many innovators are averse to structure, organizational norms and culture.

5. Innovation teams may be unaware of organizational facilities, needs and requirements.

Though all these reasons are valid, combining R&D and innovation teams reduces the chances of disruptive innovation. The decision whether or not to combine R&D and innovation teams depends on the organization’s aspirations.

Kickstart NLP with UMLS

The UMLS, or Unified Medical Language System, is a set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.

Natural Language Processing (NLP) on the vast amount of data captured by electronic medical records (EMR) is gaining popularity. The recent advances in machine learning (ML) algorithms and the democratization of high-performance computing (HPC) have reduced the technical challenges in NLP. However, the real challenge is not the technology or the infrastructure, but the lack of interoperability — in this case, the inconsistent use of terminology systems.

UMLS for NLP

NLP tasks start with recognizing medical terms in the corpus of text and converting them into a standard terminology space such as SNOMED or ICD. This requires a terminology mapping service that can perform the mapping easily and consistently. The Unified Medical Language System (UMLS) terminology server is the most popular service for integrating and distributing key terminology, classification and coding standards. The consistent use of UMLS resources leads to effective and interoperable biomedical information systems and services, including EMRs.
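
To make the recognize-then-map step concrete, here is a minimal sketch that uses a tiny in-memory lexicon as a stand-in for a terminology service. The lexicon, the function name and the sample note are hypothetical. The SNOMED CT and ICD-10 codes are included purely for illustration and should be checked against current releases before any real use.

```python
import re

# Hypothetical mini-lexicon standing in for a terminology mapping service.
# Codes are illustrative; verify them against the official terminology releases.
MI = {"preferred": "Myocardial infarction", "snomed": "22298006", "icd10": "I21"}
HTN = {"preferred": "Hypertensive disorder", "snomed": "38341003", "icd10": "I10"}

LEXICON = {
    "heart attack": MI,
    "myocardial infarction": MI,
    "high blood pressure": HTN,
    "hypertension": HTN,
}


def normalize_terms(text, lexicon=LEXICON):
    """Find lexicon terms in free text and return (surface form, concept) pairs."""
    lowered = text.lower()
    hits = []
    for term, concept in lexicon.items():
        # Word boundaries avoid matching inside longer words.
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            surface = text[match.start():match.end()]
            hits.append((match.start(), surface, concept))
    hits.sort(key=lambda hit: hit[0])  # report in order of appearance
    return [(surface, concept) for _, surface, concept in hits]


note = "Patient has a history of high blood pressure and had a heart attack in 2019."
mapped = normalize_terms(note)
for surface, concept in mapped:
    print(f"{surface!r} -> {concept['preferred']} "
          f"(SNOMED CT {concept['snomed']}, ICD-10 {concept['icd10']})")
```

A real pipeline would replace the dictionary lookup with calls to a terminology server, which also handles synonyms, abbreviations and versioning that a hand-built lexicon cannot.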

To make things easier, UMLS provides both REST-based and SOAP-based services that can be integrated into software applications. Using these resources efficiently calls for a high-level library that encapsulates the services and makes the REST calls easy for the user. Umlsjs is one such high-level JavaScript library for the UMLS REST web services. It is free, open-source and available on npm, making it easy to integrate into any JavaScript application, whether in the browser or in Node.js.
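
For a sense of what such a wrapper hides from the user, the sketch below builds a search request URL by hand. It is not the umlsjs API. The endpoint and the `string`, `sabs` and `apiKey` query parameters are assumptions based on NLM's published UTS REST documentation and may have changed, and the helper name is hypothetical. No network call is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed UTS REST search endpoint; confirm against the current NLM documentation.
UMLS_SEARCH_URL = "https://uts-ws.nlm.nih.gov/rest/search/current"


def build_search_url(term, api_key, sabs=None):
    """Build (but do not send) a term-search request URL.

    term    -- free-text term to look up
    api_key -- the caller's UTS API key (placeholder in this sketch)
    sabs    -- optional list of source vocabulary abbreviations to restrict to
    """
    params = {"string": term, "apiKey": api_key}
    if sabs:
        params["sabs"] = ",".join(sabs)
    return f"{UMLS_SEARCH_URL}?{urlencode(params)}"


url = build_search_url("heart attack", api_key="YOUR_API_KEY", sabs=["SNOMEDCT_US"])
print(url)
```

A high-level library adds what this sketch leaves out: authentication handling, paging through results, and turning the JSON response into objects that are convenient to work with.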

The umlsjs package is available on GitHub and npm. It is still a work in progress, and any coding or documentation contributions are welcome. Please read the CONTRIBUTING.md file in the repository for instructions. If you use it and find any issues, please report them on GitHub.