Predicting the Result of an NBA Shot: Machine Learning Analysis Project

May 7, 2021

In my previous project, I used data visualization to analyze how the percentage of shots made varies with the number of seconds left on the shot clock and the distance from the basket. In this project, I explored different machine learning models in sklearn to see if I could predict whether a shot was made from data about that shot.

I carried over the same dataset from the previous project, which consisted of every single shot taken in the 2014–2015 NBA season. The data initially consisted of 21 columns and about 128,000 rows. I removed the columns that were not relevant to the prediction. The columns I ended up with were Shot Clock, Dribbles, Touch Time, Shot Distance, Closest Defender Distance, and whether or not the field goal was made. I then dropped rows with missing values and normalized the data via pandas.
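The preprocessing described above can be sketched as follows. This is a minimal illustration, not the original project code: the column names (`SHOT_CLOCK`, `DRIBBLES`, `TOUCH_TIME`, `SHOT_DIST`, `CLOSE_DEF_DIST`, `FGM`, `GAME_ID`) are assumed, the tiny DataFrame is made up, and min-max scaling is only one plausible reading of "normalized", since the post does not say which normalization was used.

```python
import numpy as np
import pandas as pd

# Assumed column names; the real CSV headers may differ.
FEATURES = ["SHOT_CLOCK", "DRIBBLES", "TOUCH_TIME", "SHOT_DIST", "CLOSE_DEF_DIST"]
TARGET = "FGM"  # assumed encoding: 1 = made, 0 = missed


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the six columns of interest, drop incomplete rows, min-max scale the features."""
    out = df[FEATURES + [TARGET]].dropna().copy()
    lo, hi = out[FEATURES].min(), out[FEATURES].max()
    # Guard against a constant column, which would otherwise divide by zero.
    span = (hi - lo).replace(0, 1)
    out[FEATURES] = (out[FEATURES] - lo) / span
    return out


# Tiny made-up stand-in for the shot log, including one row with a missing value.
raw = pd.DataFrame({
    "SHOT_CLOCK": [10.8, 3.4, np.nan, 15.0],
    "DRIBBLES": [2, 0, 3, 7],
    "TOUCH_TIME": [1.9, 0.8, 2.7, 6.1],
    "SHOT_DIST": [7.7, 28.2, 10.1, 3.4],
    "CLOSE_DEF_DIST": [1.3, 6.1, 0.9, 2.5],
    "FGM": [1, 0, 0, 1],
    "GAME_ID": [1, 1, 2, 2],  # example of an irrelevant column that gets dropped
})
clean = prepare(raw)
print(clean)
```

Scaling only the feature columns leaves the made/missed label as 0 or 1, which is what the classifiers expect as a target.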

After choosing the target variable (field goal made) and splitting the data into training and test sets (test_size = 0.3), I trained seven different models on the data and calculated each one's accuracy:
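As a rough sketch of this step, the snippet below splits a data set 70/30 and scores a most-frequent-class baseline next to a logistic regression. The data is randomly generated purely for illustration (it is not the shot log), and `random_state=0` is an arbitrary choice made here for reproducibility.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: five scaled features and a binary made/missed label.
rng = np.random.default_rng(0)
n = 2000
X = rng.random((n, 5))
# Made-up rule: the larger feature 3 ("distance") is, the less likely a make.
p_make = 0.65 - 0.4 * X[:, 3]
y = (rng.random(n) < p_make).astype(int)

# Hold out 30% of the rows for testing, as in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The dummy model always predicts the most frequent class seen in training.
dummy = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
log = LogisticRegression().fit(X_train, y_train)

dummy_acc = accuracy_score(y_test, dummy.predict(X_test))
log_acc = accuracy_score(y_test, log.predict(X_test))
print(f"Dummy: {dummy_acc:.3f}  Log: {log_acc:.3f}")
```

The dummy score is the number every other model has to beat: any accuracy at or below it means the features added nothing.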

Dummy Classifier (Dummy): this model predicts the outcome that was most frequently seen in the training data (field goal missed). Accuracy = 0.544
Logistic Regression (Log). Accuracy = 0.608
Decision Tree Classifier (DT): max_depth = 10, min_samples_leaf = 3. Accuracy = 0.610
Decision Tree Classifier with AdaBoost (ADA): n_estimators = 20, learning rate = 0.001. Accuracy = 0.611
Bagging Classifier (Bag): n_estimators = 20. Accuracy = 0.580
Random Forest Classifier (RF): n_estimators = 50. Accuracy = 0.586
Soft Voting Classifier (Voting): consisting of DT, RF, Log, and Bag, with weights of 3, 1, 20, 1, respectively. Accuracy = 0.612
Optimal Decision Tree Classifier with AdaBoost (Optimal): max_depth = 6, min_samples_leaf = 17, n_estimators = 10, and learning rate = 0.19. Accuracy = 0.619
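For readers who want to reproduce a lineup like the one above, here is one way the models might be set up in sklearn with the listed hyperparameters. It is a sketch under assumptions: the data is synthetic, the `make_tree` helper is hypothetical, the fixed `random_state` values are added here, and the post does not state which base tree the boosted model used, so the DT settings are reused.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the five shot features and the made/missed label.
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def make_tree():
    """Hypothetical helper: a tree with the DT settings from the list."""
    return DecisionTreeClassifier(max_depth=10, min_samples_leaf=3, random_state=0)


models = {
    "DT": make_tree(),
    # Assumption: the boosted model reuses the DT settings for its base tree.
    # The base tree is passed positionally because the keyword name differs
    # between sklearn versions (`base_estimator` vs. `estimator`).
    "ADA": AdaBoostClassifier(
        make_tree(), n_estimators=20, learning_rate=0.001, random_state=0
    ),
    "Bag": BaggingClassifier(n_estimators=20, random_state=0),
    "RF": RandomForestClassifier(n_estimators=50, random_state=0),
    "Log": LogisticRegression(),
}
# Soft voting averages predicted probabilities using the given weights.
# VotingClassifier fits its own clones, so reusing the objects above is safe.
models["Voting"] = VotingClassifier(
    estimators=[(name, models[name]) for name in ("DT", "RF", "Log", "Bag")],
    voting="soft",
    weights=[3, 1, 20, 1],
)

scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

With a weight of 20 out of 25, the logistic regression dominates the soft vote, which is consistent with the Voting accuracy landing close to the Log accuracy in the list.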

From these models I chose the Decision Tree Classifier with AdaBoost for further analysis, since its accuracy was among the highest. Using for loops, I tuned its hyperparameters max_depth, min_samples_leaf, n_estimators, and learning rate. The best combination found is the Optimal model listed above.
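A loop-based search over these four hyperparameters could look like the sketch below. The candidate values, the synthetic data, and the extra validation split are all assumptions made for illustration. The post does not say which split was used for scoring during tuning; a separate validation split is used here because picking hyperparameters on the test set tends to make the final test accuracy look better than it is.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the shot data.
X, y = make_classification(
    n_samples=1000, n_features=5, n_informative=3, n_redundant=0, random_state=0
)
# Keep 30% as the final test set, then carve a validation set out of the rest.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0
)

# Small illustrative grids; a real search would cover wider ranges.
depths = [2, 6]
min_leaves = [3, 17]
n_estimators_grid = [10, 20]
learning_rates = [0.19, 1.0]

best_score, best_params = -1.0, None
# product() is equivalent to four nested for loops over the grids.
for depth, min_leaf, n_est, lr in product(
    depths, min_leaves, n_estimators_grid, learning_rates
):
    tree = DecisionTreeClassifier(
        max_depth=depth, min_samples_leaf=min_leaf, random_state=0
    )
    model = AdaBoostClassifier(
        tree, n_estimators=n_est, learning_rate=lr, random_state=0
    )
    score = model.fit(X_train, y_train).score(X_val, y_val)
    if score > best_score:
        best_score = score
        best_params = {
            "max_depth": depth,
            "min_samples_leaf": min_leaf,
            "n_estimators": n_est,
            "learning_rate": lr,
        }

print(best_params, round(best_score, 3))
```

Once the best combination is chosen, the model would be refit and scored once on `X_test` to get an accuracy that the search itself has not influenced.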

Conclusion

The Optimal Decision Tree Classifier with AdaBoost improved on the Dummy Classifier's accuracy by about 7.5 percentage points (from 54.4% to 61.9%). However, a model with 61.9% accuracy is not good enough to be used in further research. To improve the accuracy, we would need to introduce more predictor variables or switch to a different kind of model entirely, perhaps a neural network.

Written by Albert Tan
