This repo has been deprecated. Please refer to this one
TL;DR: this repo contains code that showcases the process of:
- Ingesting data related to Insurance questions and answers (Insurance QA Dataset) into Delta Lake
- Basic cleaning and preprocessing
- Creating custom PyTorch Lightning `DataModule` and `LightningModule` to wrap, respectively, our dataset and our backbone model (`distilbert_en_uncased`)
- Training with multiple GPUs while logging desired metrics into MLflow and registering model assets into Databricks Model Registry
- Running inference on both single and multiple nodes
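
As a rough illustration of the "basic cleaning and preprocessing" step, here is a minimal sketch in plain Python. It is not the code from this repo, which works against Delta Lake tables. The cleaning rules (lowercasing, collapsing whitespace, dropping empty and duplicate pairs), the `question` and `answer` field names, and the `clean_text` and `preprocess` helpers are all assumptions made for the example.

```python
import re
from typing import Dict, Iterable, List, Optional

_WHITESPACE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Collapse runs of whitespace, strip the ends, and lowercase (assumed rules)."""
    return _WHITESPACE.sub(" ", text).strip().lower()


def preprocess(rows: Iterable[Dict[str, Optional[str]]]) -> List[Dict[str, str]]:
    """Clean question/answer pairs, dropping empty and duplicate rows.

    The 'question' and 'answer' keys are hypothetical field names.
    """
    seen = set()
    cleaned = []
    for row in rows:
        question = clean_text(row.get("question") or "")
        answer = clean_text(row.get("answer") or "")
        if not question or not answer:
            continue  # skip rows missing either side of the pair
        key = (question, answer)
        if key in seen:
            continue  # skip pairs that are identical after normalization
        seen.add(key)
        cleaned.append({"question": question, "answer": answer})
    return cleaned


if __name__ == "__main__":
    sample = [
        {"question": "  What IS  term life insurance?\n", "answer": "A policy."},
        {"question": "what is term life insurance?", "answer": "a policy."},
        {"question": "", "answer": "Orphan answer"},
    ]
    for pair in preprocess(sample):
        print(pair)
```

On a cluster the same rules would more naturally be expressed as DataFrame transformations. The plain-Python version only shows the logic on a handful of rows.
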

References:
- Minwei Feng, Bing Xiang, Michael R. Glass, Lidan Wang, Bowen Zhou. Applying Deep Learning to Answer Selection: A Study and An Open Task
- Fine-tune Transformers Models with PyTorch Lightning
- PyTorch Lightning MLflow Logger
- dbx by Databricks Labs