Deploying large language models (LLMs) can be difficult because they require a lot of memory and computing power to run efficiently. Companies want to create smaller task-specific LLMs that are cheap and easy to deploy. Such small models may even be more interpretable, an important consideration in healthcare.

Distilling LLMs to small task-specific models

Distilling LLMs refers to the process of training a smaller, more efficient model to mimic the behaviour of a larger, more complex LLM. This is done by training the smaller model on the same task as the larger model but using the predictions of the larger model as “soft targets” or guidance during training. The goal of distillation is to transfer the knowledge and capabilities of the larger model to the smaller model, without requiring the same level of computational resources.

Distilling step-by-step is an efficient distillation method proposed by Google that requires less amount of training data. The intuition is that the use of rationale generated by a chain of thought prompting along with labels during training, thereby framing it as multi-task learning, improves distillation performance. We can use ground truth labels or use a teacher LLM to generate the labels and rationale. Ground truth labels are the correct labels for the data, and they are typically obtained from human annotators. The rationale for each label can be generated by using the model to generate a short explanation for why the model predicted that label.

The paper on the method is here and the repository is here. I have converted the code from the original repository into a tool that can be used to distill any seq2seq model into a smaller model based on a generic schema. See the repository below. The original paper uses Google’s T5-v1 model, which is a large-scale language model that was developed by Google. It is part of the T5 (Text-to-Text Transfer Transformer) family of models and is based on the Transformer architecture. You can find more open-source base models for distilling on huggingface. The next plan is to use this method to create a model that can predict the FHIR filter for this repository.

Distilling LLMs step by step!

I will update this post regularly with my findings and notes on distilling models. Also, please check out my post on NLP tools in healthcare.

Bell Eapen
Follow Me