Fomo is an ongoing "machine learning bootcamp", based on the idea that ML is gaining traction and, honestly, we are collectively not super aware of the strengths and weaknesses of how it can be applied in a biological/evolutionary context. In this light, we have established a collective monthly ML hackathon. We will meet once a month, sit down together and hash through different ML paradigms, think about how these can be applied to the kinds of data we are generating and the questions we have, collectively work through tutorials, and maybe even do some preliminary exploratory analysis. For example, one session we might evaluate boosted regression: What are the core ideas of what it is doing? How do you achieve this in R and Python? How can it be applied to genetic/morphological/abundance/phylogenetic data?
- The truly excellent and useful Python Data Science Handbook.
- An excellent 12-part ML course: Introduction to Machine Learning for Coders, from the makers of fastai. "Unlike many educational materials in the field, our approach is “code first” rather than “math first”."
- Practical Machine Learning Tutorial with Python Introduction: An extensive ML online course based on sklearn & TensorFlow. Very, very long.
- Machine Learning Online course (Stanford), with exercises on github
TL;DR: Gradient boosting is often "better" (higher predictive accuracy when well tuned), but random forest is easier to tune and may be faster to train. Additionally, gradient boosting may have trouble (it tends to overfit) when the training data is noisy.
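One way to get a feel for this trade-off is to run both models side by side. The following is a minimal sketch, assuming scikit-learn is installed; the synthetic dataset, the `compare_rf_gbm` helper, and every hyperparameter value are illustrative choices made here, not a benchmark. The `flip_y` argument randomly reassigns a fraction of the labels, which stands in for noisy training data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score


def compare_rf_gbm(label_noise, seed=0):
    """Return mean 5-fold CV accuracy of RF and GBM on synthetic data.

    `label_noise` is the fraction of labels randomly reassigned
    (an assumption used here to mimic noisy training data).
    """
    X, y = make_classification(
        n_samples=500,
        n_features=20,
        n_informative=5,
        flip_y=label_noise,
        random_state=seed,
    )
    models = {
        # Random forest: few knobs; mostly just the number of trees.
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
        # Gradient boosting: learning rate, tree depth and number of
        # trees interact, so it usually needs more tuning.
        "gradient_boosting": GradientBoostingClassifier(
            n_estimators=100, learning_rate=0.1, max_depth=3, random_state=seed
        ),
    }
    return {
        name: cross_val_score(model, X, y, cv=5).mean()
        for name, model in models.items()
    }


# Compare clean labels against 30% randomly reassigned labels.
results = {noise: compare_rf_gbm(noise) for noise in (0.0, 0.3)}
for noise, scores in results.items():
    for name, accuracy in scores.items():
        print(f"label noise={noise:.1f}  {name}: {accuracy:.3f}")
```

A single synthetic dataset will not confirm or refute the TL;DR. To make this useful, swap in your own genetic/morphological/abundance data for `X` and `y` and vary the gradient boosting settings to see how sensitive the result is.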
- Nice overview of ensemble methods
- A more in-depth review of the strengths and weaknesses of GBM vs RF
- A detailed and careful explanation of Gradient Boosting
- XGBoost has nice documentation, plenty of examples and tutorials, and both R and Python interfaces. It seems to be a very strong ML package, but it could be more difficult to use than other options.
- [Another explanation of GBM (including a nice visual representation)](https://www.analyticsvidhya.com/blog/2016/02/complete-guide-parameter-tuning-gradient-boosting-gbm-python/). Also goes into extensive depth explaining all the parameters.
- Zou et al. 2018. A primer on deep learning in genomics.
- fastai: Python wrapper around PyTorch focused on making construction and training of neural networks fast and easy. Good documentation and examples (focused on vision, text classification, and tabular datasets).
- keras: a high-level Python deep learning library. Tons of excellent examples in the repo.
- Time Series Prediction Using LSTM Deep Neural Networks - This person actually implements the LSTM neural network from scratch! Cool for learning about the nuts and bolts.