This code performs two tasks:
- Given performance data for a set of NBA basketball players, select the best Markov chain (aka Markov-switching) time series model for each player from five candidates: a pure AR model (one state), AR with two states, AR with three states, AR with two states and exogenous variables, and AR with three states and exogenous variables (see code/main_model_selection.py). The code writes the results to the file "markov_summary_gh.txt".
- Given the output of the above, estimate the selected model for each player, and write historical data, a forecast of performance, and the estimated current state to "forecasts_only_gh.csv" (see main_estimate_forecast.py).
The code uses the statsmodels package for estimation and adds a number of utility functions to extract parameters, compute model selection criteria, and produce forecasts (see code/utilities.py).
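As a rough illustration of choosing among candidate models with a selection criterion, the sketch below ranks fitted candidates by the Bayesian information criterion (BIC). It is only a sketch under stated assumptions: the choice of BIC, the function names `bic` and `select_best_model`, and the log-likelihood values are hypothetical and are not taken from code/utilities.py.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion; lower values indicate a better trade-off."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_best_model(fits, n_obs):
    """Return (best_name, scores) for the candidate with the lowest BIC.

    `fits` maps a model name to a (log_likelihood, n_params) pair.
    """
    scores = {name: bic(llf, k, n_obs) for name, (llf, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Placeholder numbers for illustration only (not real estimation output).
candidates = {
    "ar_1_state": (-120.0, 3),
    "ar_2_states": (-105.0, 7),
    "ar_3_states": (-103.5, 13),
}
best_name, bic_scores = select_best_model(candidates, n_obs=82)
print(best_name)
```

The penalty term grows with the number of parameters, so a three-state model has to improve the likelihood substantially before it is preferred over a two-state one.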
This code implements a simple version of the Markov switching model.
A good reference on Markov switching models is James Hamilton, "Regime Switching Models", Palgrave Dictionary of Economics (2005).
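For orientation, a generic textbook form of a Markov-switching AR(1) model with exogenous variables can be written as follows. This is an assumption for illustration and may differ from the exact specification estimated by this code (for example, in which parameters switch between states).

```latex
\begin{aligned}
y_t &= c_{s_t} + \phi_{s_t}\, y_{t-1} + \beta_{s_t}^{\top} x_t + \varepsilon_t,
  \qquad \varepsilon_t \sim \mathcal{N}\!\left(0, \sigma_{s_t}^{2}\right), \\
\Pr\left(s_t = j \mid s_{t-1} = i\right) &= p_{ij},
  \qquad i, j \in \{1, \dots, K\}, \qquad \sum_{j=1}^{K} p_{ij} = 1 .
\end{aligned}
```

Here $y_t$ is the modeled series, $x_t$ the exogenous variables, and $s_t$ the unobserved state. With $K = 1$ this reduces to a pure AR model, and dropping the $\beta_{s_t}^{\top} x_t$ term gives the candidates without exogenous variables.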
Start with main_model_selection.py. Set the following parameters as desired (lines 7-12):
min_datalength = 30 # minimum number of obs for estimation.
variable = "FG_PCT" # series for modeling.
exog_variables = ["REST_DAYS", "HOME/AWAY"] # exogenous variables.
subtract_ma = False # whether to subtract a moving average from the series being modeled.
ma_order = 2 # order of MA variable.
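
To make the roles of min_datalength, subtract_ma, and ma_order concrete, here is a minimal sketch of how such settings could be applied to one player's series. The helper names and the choice of a trailing moving average over the previous ma_order observations are assumptions for illustration; the actual preprocessing in this repository may differ.

```python
def subtract_trailing_ma(series, order):
    """Subtract from each value the mean of the `order` values preceding it.

    The first `order` observations have no full window and are dropped.
    """
    out = []
    for t in range(order, len(series)):
        window = series[t - order:t]
        out.append(series[t] - sum(window) / order)
    return out


def prepare_series(series, min_datalength=30, subtract_ma=False, ma_order=2):
    """Return the series to model, or None if the player has too few observations."""
    if len(series) < min_datalength:
        return None  # skip players with insufficient data
    if subtract_ma:
        return subtract_trailing_ma(series, ma_order)
    return list(series)


demeaned = subtract_trailing_ma([1.0, 3.0, 5.0, 7.0], order=2)
print(demeaned)
```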
Note that for this demo version, I have limited the number of players to 5:
players = list(players)[:5]
Comment out or remove this line to estimate the model for all players. Note, however, that this can take a long time for large datasets (one run with over 500 players and 1.5 million records took more than 24 hours on a laptop).
Next, run main_estimate_forecast.py. Make sure the parameters above are set to the same values in that file (lines 10-16). The output file "forecasts_only_gh.csv" will contain the results for each player.
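
To show how a forecast can be combined with the estimated current state, the sketch below computes a one-step-ahead forecast for a switching-intercept AR(1) model as a probability-weighted average over states. The function name, the row convention for the transition matrix, and all numbers are hypothetical; this is not the forecasting routine in code/utilities.py.

```python
def one_step_forecast(y_last, state_probs, transition, intercepts, ar_coefs):
    """One-step forecast for a switching-intercept AR(1) model.

    state_probs[i]   : estimated probability of being in state i now
    transition[i][j] : probability of moving from state i to state j (rows sum to 1)
    """
    n = len(state_probs)
    # Probability of each state next period.
    next_probs = [
        sum(state_probs[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Weight each state's AR(1) prediction by its probability.
    forecast = sum(
        next_probs[j] * (intercepts[j] + ar_coefs[j] * y_last) for j in range(n)
    )
    return forecast, next_probs


# Illustrative values only.
probs_now = [0.8, 0.2]
forecast, next_probs = one_step_forecast(
    y_last=0.45,
    state_probs=probs_now,
    transition=[[0.9, 0.1], [0.3, 0.7]],
    intercepts=[0.2, 0.3],
    ar_coefs=[0.5, 0.4],
)
# The most probable state can serve as the "estimated current state".
current_state = max(range(len(probs_now)), key=lambda i: probs_now[i])
print(round(forecast, 4), current_state)
```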