Predicting the Result of an NBA Shot: Machine Learning Analysis Project

Albert Tan
May 7, 2021


In my previous project, I used data visualization to analyze how the percentage of shots made varies with the number of seconds left on the shot clock and the distance from the basket. In this project, I explored different machine learning models in sklearn to see whether I could predict if a shot was made from the data recorded about it.

I carried over the same dataset from the previous project, which contains every shot taken in the 2014–2015 NBA season. The raw data had 21 columns and 128,000 rows. I removed the columns that were not relevant to the prediction, which left Shot Clock, Dribbles, Touch Time, Shot Distance, Closest Defender Distance, and whether or not the field goal was made. I then dropped rows with missing values and normalized the data with pandas.
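The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names, the synthetic stand-in data, and the choice of min-max scaling as the normalization are all assumptions, since the post does not spell them out.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the shot log. Column names are assumed for illustration.
rng = np.random.default_rng(0)
n = 1000
shots = pd.DataFrame({
    "SHOT_CLOCK": rng.uniform(0, 24, n),
    "DRIBBLES": rng.integers(0, 10, n),
    "TOUCH_TIME": rng.uniform(0, 10, n),
    "SHOT_DIST": rng.uniform(0, 30, n),
    "CLOSE_DEF_DIST": rng.uniform(0, 15, n),
    "FGM": rng.integers(0, 2, n),        # 1 = field goal made, 0 = missed
    "GAME_ID": rng.integers(1, 100, n),  # example of a column to discard
})
# Blank out 50 shot-clock values to mimic missing data.
shots.loc[rng.choice(n, 50, replace=False), "SHOT_CLOCK"] = np.nan

features = ["SHOT_CLOCK", "DRIBBLES", "TOUCH_TIME", "SHOT_DIST", "CLOSE_DEF_DIST"]
target = "FGM"

# Keep only the columns of interest, then drop rows with missing values.
clean = shots[features + [target]].dropna().reset_index(drop=True)

# Min-max scale each feature to [0, 1] using plain pandas arithmetic
# (an assumed choice; the post only says the data was normalized).
X = clean[features].astype(float)
X_scaled = (X - X.min()) / (X.max() - X.min())
y = clean[target]

print(X_scaled.describe().loc[["min", "max"]])
```

Scaling matters more for models such as logistic regression than for tree-based models, which are insensitive to monotonic rescaling of a feature.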

After setting the target variable (field goal made) and splitting the data into training and test sets (test_size = 0.3), I trained seven different models on the data and calculated each model's accuracy.
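As a rough sketch of this setup, the snippet below splits a dataset 70/30 and compares a most-frequent-class baseline with logistic regression. The synthetic data and the random_state values are assumptions made so the example runs on its own; only test_size = 0.3 comes from the text.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 5 scaled features, with the label loosely tied to the first one.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 5))
y = (X[:, 0] + rng.normal(0, 0.3, 2000) < 0.5).astype(int)

# Hold out 30% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Baseline: always predict the most frequent class seen in training.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
log = LogisticRegression().fit(X_train, y_train)

dummy_acc = accuracy_score(y_test, dummy.predict(X_test))
log_acc = accuracy_score(y_test, log.predict(X_test))
print(f"Dummy: {dummy_acc:.3f}  Log: {log_acc:.3f}")
```

The baseline is useful because any model that cannot beat it has not learned anything beyond the class balance.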

  1. Dummy Classifier (Dummy): this model predicts the outcome that was most frequently seen in the training data (field goal missed). Accuracy = 0.544
  2. Logistic Regression (Log). Accuracy = 0.608
  3. Decision Tree Classifier (DT): max_depth = 10, min_samples_leaf = 3. Accuracy = 0.610
     [Figure: Confusion Matrix for DT]
  4. Decision Tree Classifier with AdaBoost (ADA): n_estimators = 20, learning_rate = 0.001. Accuracy = 0.611
  5. Bagging Classifier (Bag): n_estimators = 20. Accuracy = 0.580
  6. Random Forest Classifier (RF): n_estimators = 50. Accuracy = 0.586
  7. Soft Voting Classifier (Voting): consisting of DT, RF, Log, and Bag, with weights of 3, 1, 20, 1, respectively. Accuracy = 0.612
  8. Optimal Decision Tree Classifier with AdaBoost (Optimal): max_depth = 6, min_samples_leaf = 17, n_estimators = 10, and learning_rate = 0.19. Accuracy = 0.619
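The tree-based and ensemble models in the list can be set up in sklearn along these lines. This is a sketch under assumptions: the data is synthetic, the random_state values are added for reproducibility, and the base tree inside the boosted model is assumed to reuse the DT settings, which the list does not state. The hyperparameters and voting weights themselves are the ones listed above.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              RandomForestClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumed), split 70/30.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 5))
y = (X[:, 0] + rng.normal(0, 0.3, 2000) < 0.5).astype(int)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

def make_dt():
    # Decision tree with the DT settings from the list.
    return DecisionTreeClassifier(max_depth=10, min_samples_leaf=3, random_state=0)

models = {
    "DT": make_dt(),
    # The base tree is passed positionally so the call works across sklearn versions.
    "ADA": AdaBoostClassifier(make_dt(), n_estimators=20,
                              learning_rate=0.001, random_state=0),
    "Bag": BaggingClassifier(n_estimators=20, random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    # Soft voting averages predicted probabilities, weighted 3, 1, 20, 1.
    "Voting": VotingClassifier(
        estimators=[
            ("dt", make_dt()),
            ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
            ("log", LogisticRegression()),
            ("bag", BaggingClassifier(n_estimators=20, random_state=0)),
        ],
        voting="soft",
        weights=[3, 1, 20, 1],
    ),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")

# Confusion matrix for the plain decision tree:
# rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, models["DT"].predict(X_test))
print(cm)
```

With a weight of 20 on logistic regression, the voting model's probabilities are dominated by Log, so its accuracy would be expected to stay close to that model's.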

From these models I chose the Decision Tree Classifier with AdaBoost for further analysis, since its accuracy was among the highest. Using for loops, I tuned its hyperparameters: max_depth, min_samples_leaf, n_estimators, and learning_rate. The resulting model is listed above as Optimal.
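A for-loop search over these four hyperparameters might look like the sketch below. The candidate values, the synthetic data, and the use of a separate validation split for picking the winner are assumptions; the post does not give the ranges that were searched or which data the candidates were scored on.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumed).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(2000, 5))
y = (X[:, 0] + rng.normal(0, 0.3, 2000) < 0.5).astype(int)

# 70/30 train/test split, then carve a validation set out of the training data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=0
)

best_acc, best_params = -1.0, None

# Small illustrative grid; a real search would cover wider ranges.
for max_depth in [2, 6]:
    for min_samples_leaf in [3, 17]:
        for n_estimators in [10, 20]:
            for learning_rate in [0.01, 0.19]:
                tree = DecisionTreeClassifier(
                    max_depth=max_depth,
                    min_samples_leaf=min_samples_leaf,
                    random_state=0,
                )
                model = AdaBoostClassifier(
                    tree,
                    n_estimators=n_estimators,
                    learning_rate=learning_rate,
                    random_state=0,
                )
                model.fit(X_fit, y_fit)
                acc = accuracy_score(y_val, model.predict(X_val))
                # Keep the combination with the best validation accuracy.
                if acc > best_acc:
                    best_acc = acc
                    best_params = {
                        "max_depth": max_depth,
                        "min_samples_leaf": min_samples_leaf,
                        "n_estimators": n_estimators,
                        "learning_rate": learning_rate,
                    }

print(best_params, round(best_acc, 3))
```

Selecting on a validation split keeps the test set untouched, so the final test accuracy is not inflated by the search itself. sklearn's GridSearchCV does the same job with cross-validation if nested loops become unwieldy.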

Conclusion

The Optimal Decision Tree Classifier with AdaBoost improved the accuracy of classifying NBA shots by about 7.5 percentage points over the Dummy Classifier baseline (from 0.544 to 0.619). However, a model with 61.9% accuracy is not good enough to be used in further research. To improve the accuracy, we would need to introduce more predictor variables or change the model entirely, perhaps to a neural network.
