Candidate : Emiliano Moreno
Location : Santiago, CL
Mail Box : [email protected] (Academic) / [email protected] (Personal)
In the notebook 'solution.ipynb' we explore the SCL dataset, which contains data on flights departing from the main airport in Santiago, Chile. Our main goal is to predict whether a particular flight will be delayed, given a set of conditions called covariates or features.
First, an exploratory analysis is done to better understand the data we are dealing with.
Then, we proceed with data cleaning and feature engineering to get a well-crafted data matrix for modeling.
With our data ready for modeling, we train three types of classifiers. Last but not least, we evaluate model performance.
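As a rough illustration of the evaluation step, here is a minimal sketch of how binary delay predictions could be scored. It is not the code from solution.ipynb: the function names `confusion_counts` and `binary_metrics`, the 1 = delayed / 0 = on time encoding, and the toy labels are all assumptions made for this example, and it uses only the standard library so it does not presume any particular modeling package.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (assumed: 1 = delayed, 0 = on time)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth not in (0, 1) or pred not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the 'delayed' class."""
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    accuracy = (c["tp"] + c["tn"]) / total if total else 0.0
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for illustration only (not taken from the SCL dataset).
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Delays are often the minority class in flight data, so looking at precision and recall alongside accuracy tends to give a more honest picture when comparing classifiers.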
In addition to the files requested, two more .csv files are delivered through the repo:
- test_set.csv: containing the final data used for testing.
- train_set.csv: containing the final data used for modeling.
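For a quick sanity check of the two delivered files, something like the sketch below could be used. It is only an assumption of how one might inspect them: the helper names `load_rows` and `summarize` are hypothetical, UTF-8 encoding and a header row are assumed, and no column names are presumed because the README does not list them. The notebook itself may well load the data differently.

```python
import csv
from pathlib import Path


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by its header row (UTF-8 assumed)."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def summarize(path):
    """Return the file name, number of data rows, and column names of a CSV file."""
    rows = load_rows(path)
    columns = list(rows[0].keys()) if rows else []
    return {"file": Path(path).name, "rows": len(rows), "columns": columns}


if __name__ == "__main__":
    # File names come from the list above; they are skipped if not present locally.
    for name in ("train_set.csv", "test_set.csv"):
        if Path(name).exists():
            print(summarize(name))
```

Comparing the column lists of both summaries is a cheap way to confirm that the train and test sets share the same feature layout.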
For the sake of clarity, I've included several sections in solution.ipynb that were not explicitly requested.
The main points that were actually requested are marked with bold square brackets containing the 'challenge' number. They look very much like the following example:
- [Challenge #] An example section