DaveSyiemlieh/Air-Tweet-Classifier


Twitter US Airline Sentiment Classifier (Natural Language Processing #2)

You can find the dataset here

Objective

To classify tweets made by travelers in February 2015 as Neutral, Positive or Negative.

Random Forest

I used a random forest classifier because the problem involves a relatively large dataset, and random forests also cope well with a large number of features.

Using max_features = 1600 and n_estimators = 550, I got an accuracy of 75.4% (2208 of 2928 test tweets classified correctly).
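The accuracy figure is the share of test tweets whose predicted label matches the true one. The snippet below is a quick check of that arithmetic, plus a sketch of how the same ratio falls out of a confusion matrix. The 3x3 matrix and the helper name `accuracy_from_confusion` are made up for illustration and are not the project's actual results or code.

```python
def accuracy_from_confusion(cm):
    """Correct predictions sit on the diagonal; divide by the grand total."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Figures reported in this README for the random forest.
reported = 2208 / 2928
print(f"{reported:.1%}")  # 75.4%

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
toy_cm = [
    [5, 1, 0],
    [2, 3, 1],
    [0, 1, 7],
]
print(accuracy_from_confusion(toy_cm))  # 0.75
```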

However, is there a way to get better results?

Artificial Neural Networks

I decided to use an ANN with 2 hidden layers (adding a third did not have a significant effect in this case).

Overall, the ANN classified slightly better than the random forest, with an accuracy of roughly 76-77%.

Conclusion

Overall, the results are reasonable for this dataset, which contains many ambiguous or abbreviated tweets that are difficult for a machine to interpret.

Walking through the Code

Random Forest

The steps taken were as follows:

  • Get the Dataset
  • Pre-process the text
  • Create the Bag of Words Model
  • Label Encode and OneHot Encode the Dependent Variable
  • Split the data into Test and Training sets
  • Train the Random Forest Classifier
  • Get the predicted values for the test set
  • Compare the predicted and actual test values, using a confusion matrix to calculate the accuracy of the model.
  • Accuracy = (number of correct predictions on the test set) / (total number of test samples)
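The steps above can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the repository's actual script: the toy tweets, the `clean` helper, and the 25% test split are assumptions. It also assumes that max_features = 1600 refers to the bag-of-words vocabulary cap (`CountVectorizer`) rather than the forest's per-split setting, since the README does not say which. The labels are only label-encoded here, because scikit-learn's random forest accepts integer class labels directly.

```python
import re

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy tweets standing in for the real dataset.
tweets = [
    "@airline thanks for the smooth flight!", "@airline great crew today",
    "@airline love the new seats", "@airline best service ever, thank you",
    "@airline my flight was cancelled again", "@airline lost my bag, terrible",
    "@airline worst delay ever http://t.co/x", "@airline rude staff and late",
    "@airline what time does check in open?", "@airline is flight 12 on schedule",
    "@airline do you fly to Denver", "@airline can I change my seat online",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4


def clean(text):
    """Lowercase, drop @mentions and URLs, and keep letters only."""
    text = re.sub(r"@\w+|https?://\S+", " ", text.lower())
    return re.sub(r"[^a-z]+", " ", text).strip()


corpus = [clean(t) for t in tweets]

# Bag of words: one column per word, capped at the 1600 most frequent words.
vectorizer = CountVectorizer(max_features=1600)
X = vectorizer.fit_transform(corpus).toarray()

# Label-encode the dependent variable (negative/neutral/positive -> 0/1/2).
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Assumed split fraction; stratify keeps every class in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=550, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=np.arange(len(encoder.classes_)))
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"accuracy = {accuracy:.3f}")
```

On data this small the printed accuracy is meaningless; the point is only the order of the steps and the way accuracy is read off the diagonal of the confusion matrix.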

ANN

The steps taken were as follows:

  • Get the Dataset
  • Pre-process the text
  • Create the Bag of Words Model
  • Label Encode and OneHot Encode the Dependent Variable
  • Split the data into Test and Training sets
  • Add layers to the ANN
  • Compile the ANN and fit it to the training set
  • Get the predicted values for the test set
  • Compare the predicted and actual test values, using a confusion matrix to calculate the accuracy of the model.
  • Accuracy = (number of correct predictions on the test set) / (total number of test samples)
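The wording of the steps ("add layers", "compile") suggests a Keras-style model, but the README does not name the library. To keep the example lightweight, the sketch below swaps in scikit-learn's `MLPClassifier`, which builds and trains the network in a single `fit` call and handles the one-hot encoding of the labels internally. The toy tweets, the hidden layer widths of 16, and the split fraction are all assumptions for illustration. Only the choice of two hidden layers comes from the text.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy tweets standing in for the real dataset.
tweets = [
    "thanks for the smooth flight", "great crew today",
    "love the new seats", "best service ever thank you",
    "my flight was cancelled again", "lost my bag terrible",
    "worst delay ever", "rude staff and late",
    "what time does check in open", "is flight twelve on schedule",
    "do you fly to denver", "can i change my seat online",
]
labels = ["positive"] * 4 + ["negative"] * 4 + ["neutral"] * 4

# Bag of words features and integer-encoded labels.
X = CountVectorizer(max_features=1600).fit_transform(tweets).toarray()
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Two hidden layers, as described above; the widths are assumed values.
ann = MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=500, random_state=0)
ann.fit(X_train, y_train)
y_pred = ann.predict(X_test)

# Accuracy is the diagonal of the confusion matrix over the total count.
cm = confusion_matrix(y_test, y_pred, labels=np.arange(len(encoder.classes_)))
accuracy = np.trace(cm) / cm.sum()
print(cm)
print(f"accuracy = {accuracy:.3f}")
```

Adding a third hidden layer would be a one-element change to `hidden_layer_sizes`, which makes it easy to repeat the comparison mentioned earlier.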
