NLP-with-Disaster-Tweets

Real or not? Predict which tweets are about real disasters and which ones are not.

Getting started

Introduction

This repository contains my solutions for Kaggle's NLP disaster tweets classification competition. You will find the several approaches I've come up with, as well as an exploratory data analysis notebook.

Embeddings used:

GloVe
BERT

Problem description

Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones enables people to announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

But, it’s not always clear whether a person’s words are actually announcing a disaster. Take this example:

The author explicitly uses the word “ABLAZE” but means it metaphorically. This is clear to a human right away, especially with the visual aid. But it’s less clear to a machine.

The goal is to build a machine learning model that predicts which tweets are about real disasters and which ones aren’t. The dataset is composed of 10,000 hand-classified tweets.

Solution overview

Data cleaning

Ekphrasis, a text-processing library aimed at social media text, offers a quick solution. Coupled with some additional regex work, it made it possible to get a satisfactory dataset.
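
To give an idea of what the additional regex work can look like, here is a minimal sketch using only Python's standard `re` module. It is not the code used in the notebooks: the function name `clean_tweet`, the patterns, and the `<url>` / `<user>` placeholder tokens are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for illustration; the notebooks may use different rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")
NON_TEXT_RE = re.compile(r"[^a-z0-9\s<>']")
SPACES_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Normalize a raw tweet into lowercase, whitespace-separated tokens."""
    text = text.lower()
    # Replace URLs first, since a URL may itself contain '@' or '#'.
    text = URL_RE.sub(" <url> ", text)
    text = MENTION_RE.sub(" <user> ", text)
    # Keep the hashtag word but drop the '#' sign.
    text = HASHTAG_RE.sub(r" \1 ", text)
    # Drop remaining punctuation and symbols, keeping the placeholder brackets.
    text = NON_TEXT_RE.sub(" ", text)
    return SPACES_RE.sub(" ", text).strip()


example = clean_tweet("Forest FIRE near @LaRonge!! http://t.co/abc #wildfire")
print(example)
```

Replacing URLs and mentions with placeholder tokens rather than deleting them keeps the information that a link or a user was present, which a model may find useful.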

Models

Among these solutions, BERT gives very good scores. It provides an embedding for each tweet and separates the tweets that deal with real disasters from the ones that do not. The images below show this separation. The point cloud is obtained by applying PCA to the input of the last network layer.
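
The PCA projection described above can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy's SVD rather than whatever PCA implementation the notebooks rely on, the helper name `pca_project` is hypothetical, and the `features` array is synthetic data standing in for the activations that feed the last layer.

```python
import numpy as np


def pca_project(features: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Project the rows of `features` onto their top principal components."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Synthetic stand-in for the inputs of the last layer (NOT real model output):
# two groups of 50 vectors with 16 dimensions each.
rng = np.random.default_rng(0)
disaster = rng.normal(loc=1.0, scale=0.3, size=(50, 16))
other = rng.normal(loc=-1.0, scale=0.3, size=(50, 16))
features = np.vstack([disaster, other])

points = pca_project(features)  # one 2D point per tweet
print(points.shape)
```

Each row of `points` can then be drawn on a scatter plot and colored by its label to obtain a point cloud like the one described above.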
