Insurance Q&A Intent Classification with Databricks & Hugging Face

This repo has been deprecated. Please refer to this one

TL;DR: this repo contains code that showcases the process of:

Ingesting insurance question-and-answer data (the Insurance QA Dataset) into Delta Lake
Basic cleaning and preprocessing
Creating a custom PyTorch Lightning DataModule and LightningModule to wrap our dataset and our backbone model (distilbert_en_uncased), respectively
Training with multiple GPUs while logging metrics to MLflow and registering model assets in the Databricks Model Registry
Running inference on both single and multiple nodes
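
To make the "basic cleaning and preprocessing" step more concrete, here is a minimal sketch of what such a step might look like for intent classification. It is not the code from this repo: the function names (`clean_question`, `build_label_index`, `preprocess`), the cleaning rules, and the sample intent labels are all assumptions chosen for illustration, and it uses only the Python standard library rather than Spark or Delta Lake.

```python
import re


def clean_question(text: str) -> str:
    """Lowercase, drop unexpected punctuation, and collapse whitespace."""
    text = text.lower()
    # Keep letters, digits, whitespace, question marks, and apostrophes.
    text = re.sub(r"[^a-z0-9\s?']", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def build_label_index(intents):
    """Map each distinct intent name to an integer id, in sorted order."""
    return {name: idx for idx, name in enumerate(sorted(set(intents)))}


def preprocess(rows):
    """Clean (question, intent) pairs, dropping empty and duplicate questions.

    Returns a list of (cleaned_question, label_id) pairs and the label map.
    """
    cleaned = []
    seen = set()
    for question, intent in rows:
        q = clean_question(question)
        if not q or q in seen:
            continue
        seen.add(q)
        cleaned.append((q, intent))
    label_index = build_label_index(intent for _, intent in cleaned)
    return [(q, label_index[intent]) for q, intent in cleaned], label_index


# Hypothetical sample rows; the real dataset's columns and labels may differ.
sample_rows = [
    ("  Does my Auto policy cover rental cars? ", "auto-insurance"),
    ("does my auto policy cover rental cars?", "auto-insurance"),
    ("Can I borrow against my LIFE insurance?!", "life-insurance"),
    ("   ", "life-insurance"),
]

examples, label_index = preprocess(sample_rows)
for question, label_id in examples:
    print(label_id, question)
```

The integer label ids produced here are the kind of target a classification head on top of a backbone model would be trained against; on Databricks, the same logic would more likely be expressed as DataFrame transformations over a Delta table.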
