## A slightly less than advanced statistical evaluation of the NBA draft prospects / NBA Performance Predictions

Hello Everyone!

I've not posted anything to the site in a while but have something that is topical that I wanted to share. As many of you are aware, the Sacramento Kings solicited advanced statisticians to submit their draft analysis and recommendations in the form of a contest which ended on 05/19. I am currently in graduate school and one of my courses is Data Mining for Business Intelligence. My group decided that this contest was the perfect premise for our group project. Even though we weren't going to finish by the contest deadline, we went for it anyhow. I wanted to share an overview of the results. Rather than cut and paste the whole report, I tried to make the process and results a bit more concise. I'll try to answer any questions that may come up.

On the off chance that P'doro reads this, maybe it helps. That's if we don't trade the pick :)

For those of you who only want a general overview of what happened here: we took a bunch of NBA players (current and historical) and applied our own player efficiency score to them. We then took a list of approximately 300 NBA players with varied scores and matched them to their college statistics. We evaluated which stats translated over and to what degree. Based on these observations we built a prediction model into which we could input current draft prospects' stats and see a prediction of what their NBA Player Efficiency score would look like. Our efficiency scoring model caps out at between 22 and 23 points. Only Michael Jordan and Charles Barkley scored above 22. Players who score below 11 points probably shouldn't have been drafted at all. I made a quick reference list below for everyone to see some historical players and where they fell in the model. Don't be upset if your favorites landed somewhere that you disagree with; it's not science, it's just a statistical inquiry. The model isn't perfect.

For those of you who also want the nitty gritty:

The following are excerpts from our deliverable:

Questions:

1. Based on college performance statistics, can we predict how National Collegiate Athletic Association (NCAA) players will perform in the NBA (Overall Performance, factoring in both Offense and Defense) for the purpose of ranking draft eligible collegiate players in the NBA?
2. If successful, who are the top five NCAA prospects entering the NBA draft at the position of Guard (G), Forward (F), and Center (C)?

Methodology:

Several methodologies were used to evaluate the questions outlined. The first step was to identify and obtain relevant data. Data for the NBA was collected from http://stats.nba.com, while data for the NCAA was collected from http://web1.ncaa.org/stats/ and http://sports-reference.com/cbb/players/. Data preparation included gathering historical NBA data for 1257 players, both retired and currently active (statistics from 1984 to 2009). In addition, a more complete data set was created for 296 NBA players that included their NCAA statistics. For these players, NBA statistics were paired with their NCAA statistics. Finally, the same related statistics for this year’s top 60 NBA draft prospects were obtained. Ultimately this prospect list ended up with 46 players. The limitations are that we could not include international prospects who did not register statistics within the NCAA, or players who decided not to attend college while waiting to meet the NBA’s draft eligibility age requirement of being at least 19 years old.

The second step was to develop a valid output measurement of NBA performance that balanced offensive and defensive statistics. John Hollinger of ESPN has developed such a metric, called Player Efficiency Rating (PER), that measures a player’s per-minute performance and can be used to rank players. However, PER requires a deep and complex look at individual and team statistics. Our proposal was to create a simpler but still valid metric titled "Player Effectiveness Score" (PESNBA). After developing and validating the "Player Effectiveness Score" method, the score was applied to the NBA player historical data for alignment to historical player rankings.

The next step was to perform a Cluster Analysis of the NBA player stats. Two analyses were conducted to categorize NBA player ‘clusters’ (Hierarchical Clustering and k-means clustering). When the PESNBA score was applied to the cluster averages and compared with the type of NBA players (marquee players, above average players, and support players) that fell into each cluster, we became confident that our PESNBA output metric was applying an appropriate score to the players within our data set.

In order to create a practical model that would allow a player's NCAA statistics to predict their PESNBA, both Neural Network and Multiple Linear Regression methods were applied to the NBA/NCAA player statistics. The resulting model had an acceptable error and was therefore valid in making predictions of draft candidate performance given their NCAA statistics. The final step was to rank 2014 draft candidates. Using our PESNBA scoring model, we predicted the top 5 players (reference Section VII) at the three positions (Guard, Center, and Forward) for 2014 draft candidates. In addition, we applied our PESNBA scoring to the consensus top 60 NBA draft prospects that had available NCAA data to compare to existing mock draft results (based on NBA scouting reports).

Data Collection and Preparation

In order to obtain the required data, career statistics were collected for both NBA and NCAA players. These statistics included: Height, Games Played, Points Scored, Rebounds (Offensive, Defensive, and Total), Field Goal Percentage, Free Throw Percentage, Blocks, Steals, and Assists. The data collected was for NBA players ranging from years 1984 to 2013. Data on NCAA players was only available in large enough quantities for years 2001 to 2014. Data from both the NBA and NCAA sources was processed through a Microsoft Access database in order to clean up and organize records and to enable matching of NCAA players with NBA players where available. Through manual search and data entry we attempted to match an additional 100 NBA players to their respective NCAA statistics to increase the sample size of our data set and to improve the accuracy of our models. Upon the final compilation and preparation of the data, the "per game" statistics were calculated. A limited number of calculated PER scores were obtained from the data collection effort. Using these PER scores, the PESNBA was modeled by weighting five of the collected offensive and defensive statistics according to their contribution and overall importance to winning games. These five statistics with their associated weights are as follows: Points per Game (PPG) - 8%, Rebounds Per Game (RPG) - 12%, Field Goal Percentage (FG %) - 46%, Free Throw Percentage (FT %) - 24%, and Steals Per Game (SPG) - 10%. PESNBA scores ranged from 5 to 22, with a median of 14.
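As a rough illustration of how a weighted score like this can be computed, here is a minimal Python sketch. The five weights are the ones listed above. Everything else is an assumption: the excerpt does not say how each statistic was scaled before weighting, so the function expects values that have already been rescaled to a common range, and the example player is made up. It is not the actual PESNBA formula.

```python
# Weights as listed in the text above; they sum to 100%.
PES_WEIGHTS = {
    "ppg": 0.08,     # Points per Game
    "rpg": 0.12,     # Rebounds per Game
    "fg_pct": 0.46,  # Field Goal Percentage
    "ft_pct": 0.24,  # Free Throw Percentage
    "spg": 0.10,     # Steals per Game
}


def weighted_score(scaled_stats, weights=PES_WEIGHTS):
    """Return the weighted sum of already-scaled statistics.

    ASSUMPTION: every stat was rescaled to a common range beforehand.
    The source does not describe that scaling step.
    """
    missing = set(weights) - set(scaled_stats)
    if missing:
        raise KeyError(f"missing stats: {sorted(missing)}")
    return sum(weights[name] * scaled_stats[name] for name in weights)


# Hypothetical player whose five stats were all rescaled to a 0-25 range.
example = {"ppg": 20.0, "rpg": 10.0, "fg_pct": 15.0, "ft_pct": 18.0, "spg": 5.0}
print(round(weighted_score(example), 2))
```

Because Field Goal Percentage carries 46% of the weight, an efficient shooter moves the score far more than a high-volume scorer does.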

Cluster Analysis:

Using Ward’s method we determined that four clusters, ranging from "Good Offensive Shooters" to "Support Player," would be an appropriate number of clusters for categorization of player types. We then performed a k-means analysis. Through this analysis, we determined that there were four clusters of players, categorized as follows: Elite Players, Offense Leaders, Defense Leaders, and Support Players.
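For readers unfamiliar with k-means, the following is a minimal Python sketch of the standard algorithm (Lloyd's iteration), written with the standard library only. It is not the tool or configuration used for the report, and the six data points are made-up stat pairs. The toy run uses two clusters for readability, where the report settled on four.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centroids


# Made-up stat pairs: three low-output players and three high-output players.
toy_players = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1),
    (10.0, 10.0), (10.2, 9.8), (9.9, 10.1),
]
labels, centroids = kmeans(toy_players, k=2)
print(labels)
```

With real data the statistics would normally be standardized first, so that a large-valued stat such as points does not dominate the distance calculation.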

To validate our PESNBA statistic, PESNBA was calculated for all 1257 players in our database. One should expect that clusters with high statistical averages should have high PESNBA scores while clusters with low statistical averages should have low PESNBA scores. Our cluster analysis confirmed this expectation, so PESNBA should be a "good" metric that balances both offensive and defensive statistics and enables a relative ranking of draft prospects.

I. Graphical and Exploratory Analysis

Scatter plots were used to explore the relationships between an individual player’s categorical data for each NCAA statistic, and the related NBA statistic. The key discovery here is that, due to the overall increase in talent level in the NBA, a player’s categorical statistics tend to drop at the next level (e.g. NCAA RPG > NBA RPG for the same player). This difference between the two data sets makes it challenging to predict whether an NCAA prospect can reach the Elite Level (i.e. PESNBA ≥ 19.0) in the NBA.

We also used scatter plots to explore the correlation between NCAA categorical statistics and their predictive weight on the player’s PESNBA score. The relationships are summarized in Table 1.

Table 1 - NCAA Stats correlation to PESNBA

| Strength of Relationship | Direction of Relationship | Stat |
|---|---|---|
| Strong | Positive | NCAA Field Goal % |
| Moderate | Positive | NCAA Points Per Game; NCAA Rebounds Per Game |
| Weak | Positive | NCAA Steals Per Game; NCAA Blocks Per Game |
| No Correlation | - | NCAA Free Throw % |
| Weak | Negative | NCAA Assists Per Game |
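The strength and direction of a relationship like those in Table 1 is usually measured with the Pearson correlation coefficient. The report judged the relationships from scatter plots and does not say which statistic, if any, was computed, so treat the sketch below as one common way to quantify them. The sample numbers are invented.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Invented numbers: NCAA field goal % against an NBA score for five players.
ncaa_fg_pct = [42.0, 45.5, 48.0, 51.0, 55.5]
nba_score = [12.1, 13.0, 14.8, 15.2, 17.0]
print(round(pearson_r(ncaa_fg_pct, nba_score), 3))
```

A value near +1 corresponds to a strong positive relationship, a value near 0 to no correlation, and a negative value to an inverse relationship.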

I. Model Building

Two different modeling techniques were used to determine the model that would best predict NCAA player effectiveness if they were drafted to the NBA. Both a Multiple Linear Regression model and a Neural Network model were developed and compared to see which one provided the lowest RMS error on the validation data. The Multiple Regression model provided an explicit equation (*SECRET SAUCE REMOVED*), while the Neural Network provided the appropriate weights and node values to construct a "black box" model to predict the PESNBA score.
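The actual regression equation was removed from this post, so the sketch below only shows the general mechanics of fitting a multiple linear regression by ordinary least squares, using the standard library. The predictors, the training rows, and the resulting coefficients are all invented for illustration and say nothing about the real model.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, coef_1, ..., coef_p].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]  # intercept column
    p = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(p)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, row):
    """Apply fitted coefficients to one row of predictor values."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], row))


# Invented training data generated from y = 2 + 3*x1 - 1*x2.
train_rows = [(1, 2), (2, 1), (3, 5), (4, 3), (0, 1)]
train_targets = [3, 7, 6, 11, 1]
coefs = fit_linear_regression(train_rows, train_targets)
print([round(c, 3) for c in coefs])
```

The appeal of this kind of model, compared with a neural network, is that the fitted coefficients can be read directly as the contribution of each predictor.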

II. Error Analysis and Model Selection

Examining the residuals of the Multiple Regression Model, we can see they meet the necessary assumptions for regression testing: they are normally distributed, have a mean of zero, are independent over time (not autocorrelated), and have equal variance across the range of values.

The RMS Error was calculated by using each model to predict 2013 player data (for which we had established PESNBA scores), with the Regression Model having an error of 2.04 and the Neural Network model having an error of 2.36. Based on both the greater simplicity of the model (not a black box) and the lower RMS error, the Multiple Linear Regression model was selected for ranking the 2014 NBA draft class using the predicted PESNBA.
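For reference, RMS error is the square root of the mean squared difference between actual and predicted values. The short Python sketch below shows the calculation and then picks the lower of the two errors reported above. The `rmse` helper and the dictionary are illustrative names, not part of the original analysis.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# RMS errors reported in the text for the validation data.
reported_rmse = {
    "multiple linear regression": 2.04,
    "neural network": 2.36,
}
best_model = min(reported_rmse, key=reported_rmse.get)
print(best_model)
```

An RMS error of about 2 means a typical prediction misses the actual PESNBA score by roughly two points, which is worth keeping in mind when comparing prospects whose predicted scores are close together.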

III. Predictions

Based on the predicted PESNBA score resulting from the use of our Multiple Linear Regression model, the team was able to develop a "TOP 5 PROSPECTS" recommendation by sorting the players by position and predicted PESNBA score. To provide a comparison with industry predictions, we applied the same model to the consensus ranked top sixty NBA Draft prospects. This table provides the top 60 NBA draft picks in sequence of their draft predictions. This sequence is based on performance metrics as well as additional qualitative elements which cannot be computed through our model. Column 3 of this table depicts the computed PESNBA rating based on the developed model.
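The sorting step described here is simple enough to sketch in a few lines of Python. The function name and data layout are assumptions made for the example. The sample rows reuse a handful of the predicted scores reported in this post.

```python
from collections import defaultdict


def top_n_by_position(prospects, n=5):
    """Group (name, position, score) rows by position and keep the n best."""
    by_position = defaultdict(list)
    for name, position, score in prospects:
        by_position[position].append((name, score))
    return {
        position: sorted(players, key=lambda item: item[1], reverse=True)[:n]
        for position, players in by_position.items()
    }


# A few predicted scores taken from the results in this post.
sample_prospects = [
    ("Mitch McGary", "C", 16.1),
    ("Joel Embiid", "C", 16.5),
    ("Tarik Black", "F", 15.8),
    ("Jordan Morgan", "F", 16.5),
    ("Aaron Craft", "G", 15.0),
    ("Kyle Anderson", "G", 16.0),
]
for position, players in top_n_by_position(sample_prospects, n=2).items():
    print(position, players)
```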

IV. Analysis of Weaknesses and Assumptions

The optimal use of our prediction results would be to predict draft value for overall individual NBA performance, and allow for selection of highest value players regardless of the context of their NBA Draft destination. Therefore, we recognize the following shortcomings of our model and the resulting predictions that it has provided.

1. Based on the quantity of players drafted each year (60) there was limited related historical data available. There were a significant number of NCAA players drafted with no NBA statistics available as well as older and retired NBA players whose college statistics were largely unavailable. These shortfalls mean that although our data sets had enough records to be statistically significant they do not encompass all actual data for every historical NBA player who was drafted out of the NCAA.

2. The metrics in our data are "per game" and are not adjusted for amount of playing time. For example, some scoring models use "per minute" or "per 36 minutes" to smooth the statistics by adjusting for amount of playing time. This means that some talented players at NCAA schools with talent-laden rosters who did not play significant minutes may not score favorably in the model. In addition, NCAA schools with talent-barren rosters may have their top player score more favorably than their actual ability to translate over to the NBA would warrant.

3. An NBA General Manager will be looking for individual skill sets that favorably blend with their existing franchise roster. Because our rankings are not catered to one franchise, our PESNBA scoring model does not necessarily provide a draft recommendation for an individual franchise but rather an overall general ranking.

4. The research excluded international players entirely and is very limited for domestic players who decide to skip college and wait until their draft eligible age requirement is met. This means that an international player or domestic player who did not attend college cannot be ranked in our recommendation even if their talent level might warrant them to be included.

5. The data is unforgiving for players who have characteristics that our variables cannot quantify, such as leadership ability, physical prowess, athletic prowess, playing with an injury, difficulty of opponent/schedule, winning collegiate program, level of NCAA tournament success, and competitive fire/instinct. Essentially, these are things that are typically qualitative, viewed by the eye test, and judged against knowledge and past experience.
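To make the second weakness concrete, here is a small Python sketch of the "per 36 minutes" adjustment that our model does not apply. The two players and their numbers are hypothetical.

```python
def per_36(stat_per_game, minutes_per_game):
    """Scale a per-game stat to what it would be over 36 minutes of play."""
    if minutes_per_game <= 0:
        raise ValueError("minutes_per_game must be positive")
    return stat_per_game / minutes_per_game * 36.0


# Hypothetical players: a reserve on a deep roster and a full-time starter.
reserve_ppg, reserve_mpg = 8.0, 16.0
starter_ppg, starter_mpg = 15.0, 34.0

print(per_36(reserve_ppg, reserve_mpg))            # reserve, scaled to 36 minutes
print(round(per_36(starter_ppg, starter_mpg), 1))  # starter, scaled to 36 minutes
```

On a per-game basis the starter looks nearly twice as productive, but once playing time is equalized the reserve comes out ahead. That is exactly the kind of player a per-game model can undervalue.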

**********************************************************************************************************************

So, if you made it this far you probably want to see the result.

Here are the top five Centers, Forwards, and Guards based on our prediction model as well as the consensus top 30 prospects scored by our model. Below the results are the historical NBA player reference information so you can give yourself a frame of reference. I look at these scores as a reference of "NBA Readiness". Obviously many of these young men are still growing and are still developing their games. With that said, some of the top prospects stand out amongst the rest based on our model.

CENTER

1. Joel Embiid 16.5
2. Mitch McGary 16.1
3. Daniel Miller 15.8
4. Mason Cox 14.5
5. Jordan Bachynski 14.0

FORWARD

1. Jordan Morgan 16.5
2. Tarik Black 15.8
3. Richard Solomon 15.6
4. Casey Prather 15.5
5. Talib Zanna 15.3

GUARD

1. Kyle Anderson 16.0
3. Aaron Craft 15
4. Marcus Smart 15
5. DeAndre Kane 14.8

TOP 30

1. Andrew Wiggins 12.7
2. Jabari Parker 13.8
3. Joel Embiid 16.5
4. Dante Exum No Score
5. Marcus Smart 14.7
6. Julius Randle 14.1
7. Noah Vonleh 14.5
8. Doug McDermott 14.0
9. Dario Saric No Score
10. Aaron Gordon 15
11. Jusuf Nurkic No Score
12. James Young 11
13. Nik Stauskas 11.8
14. Gary Harris 12.4
15. Rodney Hood 12.4
16. Tyler Ennis 13.3
17. Zach LaVine 11.4
18. TJ Warren 15.1
19. Clint Capela 14.0
21. Kristaps Porzingis No Score
22. KJ McDaniels 12.9
23. Kyle Anderson 16
24. PJ Hairston 10.1
25. Jerami Grant 12.3
27. Elfrid Payton 14.0
28. Shabazz Napier 12.8
29. Glenn Robinson III 13.7
30. Mitch McGary 15.6

Reference Players above 22 PES

Michael Jordan; Charles Barkley

Reference Players 21-22 PES

Karl Malone; David Robinson

Reference Players 20-21 PES

LeBron James; Shaquille O’Neal; Tim Duncan; Dirk Nowitzki; Patrick Ewing; Chris Paul

Reference Players 19-20 PES

Kevin Durant; Yao Ming; Dwyane Wade; Chris Webber; Kobe Bryant; Paul Pierce; Steph Curry; Carmelo Anthony; Al Horford; Scottie Pippen

Reference Players 18-19 PES

John Stockton; Andre Iguodala; Glenn Robinson; Charles Oakley; Marcus Camby; Vince Carter; Al Jefferson; Alonzo Mourning; Vlade Divac; Jason Kidd; Kevin Love; Ray Allen; Dikembe Mutombo

Reference Players 17-18 PES

Christian Laettner; LaMarcus Aldridge; Mitch Richmond; Monta Ellis; Kevin Johnson; Peja Stojakovic; Jeff Hornacek; Luol Deng; Gary Payton; Ron Artest; Rudy Gay; Tom Gugliotta; Danny Granger; Tyson Chandler; Detlef Schrempf; Manu Ginobili; Rasheed Wallace; Wayman Tisdale; Chris Kaman; Reggie Miller; Rajon Rondo

Reference Players 16-17 PES

Lionel Simmons; Thaddeus Young; Doug Christie; Michael Beasley; Jermaine O’Neal; Russell Westbrook; Tim Hardaway; Jamal Mashburn; Derrick Rose; Kevin Martin; Cedric Ceballos; Joe Johnson; Deron Williams; Wally Szczerbiak; Mike Bibby; Danny Fortson; Tony Parker; Chauncey Billups

Reference Players 15-16 PES

Isaiah Rider; Jerry Stackhouse; Dan Majerle; Allan Houston; Kirk Hinrich; Tayshaun Prince; Shane Battier; Raef LaFrentz; Chris Wilcox; Brandon Jennings; Jamal Crawford; Marcus Thornton; Toni Kukoc; Damon Stoudamire; Brent Barry; John Salmons; Hedo Turkoglu; Serge Ibaka; Robert Horry

Reference Players 14-15 PES

Luc Longley; Kyle Korver; J.J. Hickson; Spencer Hawes; Nate Robinson; Luke Ridnour; Trevor Ariza; Nicolas Batum; James Harden; Jason Williams; Bobby Jackson; Beno Udrih; Matt Barnes; Marcin Gortat

Reference Players 13-14 PES

Danny Ferry; Austin Croshere; Chase Budinger; Joe Kleine; Brian Shaw; Steve Kerr; Greg Ostertag; Earl Watson; Tyler Hansbrough; Luke Walton; Jared Jeffries; Samaki Walker; Steve Blake

Reference Players 12-13 PES

Jason Hart; J.J. Redick; Sebastian Telfair; Kris Humphries; Jerome James; Jordan Farmar; Sergio Rodriguez; Chris Dudley

Reference Players 11-12 PES

Jeff Teague; Acie Law; Loren Woods; DeSagana Diop; Von Wafer; Adam Morrison; Wang Zhizhi

(This is a FanPost from a member of the Sactown Royalty community. The views expressed come from the member, and not Sactown Royalty staff.)
