Fun with math: Similarity Scores and the Kentucky Wildcats


One of the fun things to discuss about sports is how players and teams relate to their historical counterparts.  Who does a certain player remind you of?  What team plays like this one?  Most of the time these discussions are limited by our own experience, memory and observation, which shrinks the available pool of comparisons.  Many years ago the famed baseball writer Bill James introduced the idea of Similarity Scores - a method of quantifying how similar the careers of two players were.  This provided a way of identifying potentially similar baseball players without the prerequisite of having seen them play - an advantage given the long history of the sport.  The idea has since been applied to basketball (mostly the NBA) and is used quite frequently in making player projections.
 
I've been fiddling around with doing something similar for college basketball teams.  I was interested in what kind of results I would get, and I wanted to know whether applying the method to the current season might lend any insight into the future prospects of Kentucky.  While I wouldn't take this too seriously, some of the results I got were quite intriguing, and if nothing else I think they might provoke some good discussions.

The use of similarity scores as a means of comparing teams - while not unheard of - is not common, but I did find a discussion on the APBRmetrics board where someone did pretty much the same thing I did only for NBA teams.  For the sake of completeness, here is an article from a few years ago discussing some of the philosophy behind the use of Similarity Scores.


The method I used is fairly straightforward: I used the 4 Factors (effective field goal percentage, turnover percentage, offensive rebounding percentage, and free throw rate) for both offense and defense as a means to compare two teams.  This gives me 8 categories to compare and provides a solid fundamental description of every team.  I computed the z-score for each category and used the Euclidean distance to measure how "close" two teams are.  If you don't know what a z-score or Euclidean distance is then don't worry - just know the lower the score, the better the match.  After that it's a simple process to sort the scores to find the closest matches.
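For anyone who does want to see the arithmetic, here is a minimal Python sketch of the steps described above. It is not the actual spreadsheet: the function names are made up for illustration, and for brevity it standardizes a single pool of teams at once, whereas the comparison in this post standardizes each season separately before measuring distances.

```python
import math
from statistics import mean, pstdev


def zscores(values):
    """Convert one category's raw values to z-scores: (value - mean) / std dev."""
    mu = mean(values)
    sigma = pstdev(values)  # a sketch: assumes the category is not constant (sigma > 0)
    return [(v - mu) / sigma for v in values]


def standardize(teams):
    """teams maps a team label to its list of raw category values (8 in the post).

    Returns the same mapping with every category replaced by its z-score.
    """
    names = list(teams)
    columns = list(zip(*(teams[name] for name in names)))  # one tuple per category
    z_columns = [zscores(col) for col in columns]
    return {name: [col[i] for col in z_columns] for i, name in enumerate(names)}


def distance(a, b):
    """Euclidean distance between two z-score vectors; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest(target, z_teams, k=10):
    """Return the k teams nearest to `target` as (label, score), best match first."""
    scores = [
        (name, distance(z_teams[target], z))
        for name, z in z_teams.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1])[:k]
```

Calling `closest("2004 Kentucky", standardize(all_teams))` on a full data set (with `all_teams` being a hypothetical dictionary built from the csv files) would produce the kind of top 10 lists shown later in this post.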

Fortunately for me, Ken Pomeroy has data on the 4 Factors available in a handy csv file for each season starting with 2003-2004.  This gave me data on 2,011 team-seasons to use for comparisons and made it relatively easy to set up a spreadsheet with all the pertinent information.

Here I must admit to a bit of mathematical fudging.  While Pomeroy provides the statistical means for each category for each season, he does not list the standard deviations.  Getting exact figures would require me to go team-by-team to get the exact number of rebounds, possessions, field goal attempts, and free throws for each.  That's a rather overwhelming task for 2,011 team-seasons, so rather than use exact values for the standard deviation and the mean, I estimated them for each season using the values in the csv file.  I don't think this makes too much difference, however, as the difference between the estimated mean and the true mean tends to be less than one half of one percent (0.5%), except for defensive free throw rate where the difference is closer to 1%.  I suspect the estimated and true standard deviations are similarly close.
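To illustrate why an estimated mean can differ slightly from the true one: averaging each team's percentage treats every team equally, while a league-wide figure built from raw counts weights teams by how many chances they had. The sketch below uses made-up rebound counts for three hypothetical teams (not real data), and it assumes the "true" mean is the pooled, count-based figure.

```python
from statistics import mean, pstdev

# Made-up season totals for three hypothetical teams:
# (offensive rebounds, offensive rebound chances)
teams = {
    "Team A": (300, 1000),
    "Team B": (420, 1200),
    "Team C": (250, 800),
}

# Team-level percentages, which is the kind of value a per-team csv row holds.
rates = [orb / chances for orb, chances in teams.values()]

# Estimate: unweighted mean and standard deviation of the team percentages.
estimated_mean = mean(rates)
estimated_sd = pstdev(rates)

# Pooled figure: total rebounds over total chances, so teams with more
# chances count for more.
total_orb = sum(orb for orb, _ in teams.values())
total_chances = sum(chances for _, chances in teams.values())
pooled_mean = total_orb / total_chances

# Relative gap between the two means, as a fraction of the pooled mean.
relative_gap = abs(estimated_mean - pooled_mean) / pooled_mean
```

With these toy numbers the two means are close but not identical, which is the same kind of small gap described above.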

On to some results!

I started with finding some comps for UK's two best teams of the last 6 years: 2004 and 2005.

Year  Team                    Score  Record  PostSeason
2004  Kentucky                0.00   27-5    2nd rd
2005  Oklahoma                1.12   25-8    2nd rd
2007  North Dakota St.        1.24   20-8
2004  Stanford                1.26   30-2    2nd rd
2006  UCLA                    1.26   32-7    Champ game
2007  Kansas                  1.30   33-5    Elite 8
2009  Louisville              1.34   31-6    Elite 8
2004  Central Florida         1.37   25-6    1st rd
2004  Florida St.             1.40   19-14   NIT
2006  Winthrop                1.40   23-8    1st rd
2008  Illinois St.            1.44   25-10   NIT

Year  Team                    Score  Record  PostSeason
2005  Kentucky                0.00   28-6    Elite 8
2008  Duke                    0.92   28-6    2nd rd
2006  Arkansas                1.15   22-10   1st rd
2005  Wisconsin Milwaukee     1.36   26-6    Sweet 16
2009  East Tennessee St.      1.49   23-11   1st rd
2009  Duke                    1.51   30-7    Sweet 16
2006  Pennsylvania            1.57   20-9    1st rd
2008  Akron                   1.58   24-11   NIT
2007  Purdue                  1.60   22-12   2nd rd
2004  East Tennessee St.      1.63   27-6    1st rd
2005  George Washington       1.64   22-8    1st rd


The 2004 team has some interesting comps.  Stanford was also a #1 seed in 2004 and was also upset in the 2nd round of the NCAA Tournament, in its case by Alabama.  There are also a couple of teams that made the Elite 8 and a UCLA Final 4 squad.  The top 10 is rounded out by some smaller schools that either made the NIT or were early exits in the Big Dance.  The 2005 UK team has a slightly different makeup in the top 10: only 1 NIT team, and every other school made the tournament but none made it past the Sweet 16.

Next up is last year's UK squad that went to the NIT:

Year  Team                    Score  Record  PostSeason
2009  Kentucky                0.00   22-14   NIT
2004  Virginia Commonwealth   1.37   23-8    1st rd
2008  Texas Arlington         1.39   21-12   1st rd
2006  UCLA                    1.51   32-7    Champ game
2007  Duke                    1.51   22-11   1st rd
2009  Kansas                  1.56   27-8    Sweet 16
2009  Wake Forest             1.60   24-7    1st rd
2005  Texas A&M Corpus Chris  1.61   20-8
2007  Central Florida         1.62   22-9
2007  Arkansas                1.68   21-13   1st rd
2004  St. Mary's              1.70   19-12


I think the results here are actually rather informative.  There are a few NCAA participants and some small schools that missed the tournament, and I think this lines up with what we saw from last year's squad: they definitely had the talent to make the NCAAs and were well on their way after starting 4 - 0 in SEC play but - as we know all too well - just fell apart down the stretch.

By the way, you'll notice that the 2006 UCLA team pops up again.  They are actually an interesting case.  Nearly every comparison I've done so far has had about a dozen teams with a comparison score of 2 or less - except for 2006 UCLA.  That team has around 85 teams with a comparison score of 2 or less.  The 2005 UCLA team has 180 such scores and the 2007 UCLA team has around 40.  I'm not sure what that means exactly, but I have yet to find another team with such a large number of low scores.
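Counting how many teams fall within a score of 2 is a small variation on the same distance calculation. The sketch below is one way it could be done in Python; `count_close` is a hypothetical helper and the z-score vectors are made-up two-category examples, not real team data.

```python
import math


def count_close(target, z_teams, threshold=2.0):
    """Count teams whose Euclidean distance to `target` is at or below `threshold`."""
    target_z = z_teams[target]
    return sum(
        1
        for name, z in z_teams.items()
        if name != target and math.dist(target_z, z) <= threshold
    )


# Made-up z-score vectors (two categories instead of eight, for readability).
example = {
    "Team X": [0.0, 0.0],
    "Team Y": [1.0, 1.0],
    "Team Z": [3.0, 0.0],
}

close_to_x = count_close("Team X", example)
```

Running the same count for every team-season would show whether any other team comes close to the UCLA totals.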

Here are some more non-UK results: The 2008 Memphis team and last year's national champs.

Year  Team                    Score  Record  PostSeason
2008  Memphis                 0.00   38-2    Champ game
2009  Memphis                 1.44   33-4    Sweet 16
2007  North Carolina          1.72   31-7    Elite 8
2008  Wisconsin               1.78   31-5    Sweet 16
2004  Cincinnati              1.79   25-7    2nd rd
2004  Nevada                  1.81   25-9    Sweet 16
2007  Notre Dame              1.82   24-8    1st rd
2008  St. Mary's              1.85   25-7    1st rd
2008  Kansas                  1.89   37-3    NCAA Champ
2007  Wisconsin               1.91   30-6    2nd rd
2008  UCLA                    1.92   35-4    Final 4

Year  Team                    Score  Record  PostSeason
2009  North Carolina          0.00   34-4    NCAA Champ
2007  North Carolina          1.27   31-7    Elite 8
2007  Notre Dame              1.45   24-8    1st rd
2008  Pittsburgh              1.55   27-10   2nd rd
2007  Texas                   1.62   25-10   2nd rd
2007  Ohio St.                1.66   35-4    Champ game
2008  North Carolina          1.78   36-3    Final 4
2008  UCLA                    1.80   35-4    Final 4
2005  Charlotte               1.88   21-8    1st rd
2008  St. Mary's              1.92   25-7    1st rd
2009  St. Mary's              1.96   28-7    NIT


These were elite teams and that's reflected in the comparisons.  All the Top 10 comps for both schools made the NCAA tournament with the exception of UNC's #10.  For those who don't recall, that 2009 St. Mary's team starred Patty Mills and would have easily made the tournament had Mills not been injured against Gonzaga halfway through the season.  As it was, they were one of the teams right on the bubble on Selection Sunday that year.  Otherwise you see a lot of teams that had considerable success in the tournament.  For Memphis, 6 of 10 comps got past the first weekend of the tournament, and 4 of 10 did the same for UNC.

You'll also note that the #1 comp for each team is the same school from either the previous or the following season.  That makes a lot of sense when you think about the makeup and stability of each squad from year to year, and I think it lends some credibility to this method.  As an aside, I found similar results for other teams where successive seasons from the same school appeared in the top 10 comps.  These tended to be schools with several important returning players and no coaching changes.

Okay, so now the moment you've all been waiting for.  Here is the current edition of the Wildcats.  I actually have two sets to show you: one was done on Monday, before the Hartford game, and the other was done Wednesday.  Since this is an in-season comparison, I wanted to see how the list changes as more games are played.

Monday (before the Hartford game)

Year  Team                    Score  Record  PostSeason
2010  Kentucky                0.00   13-0    ?
2006  North Carolina          0.99   23-8    2nd rd
2005  Pittsburgh              1.17   20-9    1st rd
2004  Mississippi St.         1.40   26-4    2nd rd
2008  New Mexico St.          1.55   21-14
2008  Syracuse                1.61   21-14   NIT
2008  North Carolina          1.82   36-3    Final 4
2007  Providence              1.84   18-13   NIT *
2005  Mississippi St.         1.85   23-11   2nd rd *
2007  North Dakota St.        1.91   20-8
2006  Louisiana St.           1.92   27-9    Final 4

Wednesday

Year  Team                    Score  Record  PostSeason
2010  Kentucky                0.00   14-0    ?
2006  North Carolina          1.11   23-8    2nd rd
2004  Mississippi St.         1.38   26-4    2nd rd
2005  Pittsburgh              1.48   20-9    1st rd
2008  North Carolina          1.53   36-3    Final 4
2008  New Mexico St.          1.73   21-14
2008  Syracuse                1.81   21-14   NIT
2007  North Carolina          1.81   31-7    Elite 8 *
2009  Pittsburgh              1.85   31-5    Elite 8 *
2007  North Dakota St.        1.89   20-8
2006  Louisiana St.           1.98   27-9    Final 4

* comp that appears in only one of the two lists


For the most part the lists are identical.  8 of the 10 comps are the same and in mostly the same order.  I've highlighted the differences and you can see that the 7th and 8th comps have changed to more impressive teams after the win over Hartford.  These comps suggest that right now UK is good enough to make the Sweet 16/Elite 8.  Of particular interest to me is the presence of three recent UNC teams.  Those teams had a lot of young talent, much like our Wildcats this year, and those teams did pretty well collectively.  Recall that the 2007 UNC team starred sophomore Tyler Hansbrough and freshmen Ty Lawson & company.  The game they lost to Georgetown in the Elite 8 was one in which they dominated the first 30+ minutes and had a huge lead, only to fall apart at the end.  Despite some struggles early, I have a lot more confidence in the Cats' ability to close out teams late.  The only team that looks really out of place is North Dakota St., but let me tell you that the 2007 North Dakota St. team has a lot of good, BCS team comps in their top 10.
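The overlap between the two lists can also be checked mechanically as the season goes on. Here is a small Python sketch using the Monday and Wednesday comps from this post; the variable names are just for illustration.

```python
# Top 10 comps for 2010 Kentucky as (season, team), in order of closeness.
monday = [
    (2006, "North Carolina"), (2005, "Pittsburgh"), (2004, "Mississippi St."),
    (2008, "New Mexico St."), (2008, "Syracuse"), (2008, "North Carolina"),
    (2007, "Providence"), (2005, "Mississippi St."), (2007, "North Dakota St."),
    (2006, "Louisiana St."),
]
wednesday = [
    (2006, "North Carolina"), (2004, "Mississippi St."), (2005, "Pittsburgh"),
    (2008, "North Carolina"), (2008, "New Mexico St."), (2008, "Syracuse"),
    (2007, "North Carolina"), (2009, "Pittsburgh"), (2007, "North Dakota St."),
    (2006, "Louisiana St."),
]

# Comps present in both lists, regardless of position.
shared = set(monday) & set(wednesday)

# Comps that dropped out of, or newly entered, the top 10 (original order kept).
dropped = [team for team in monday if team not in shared]
added = [team for team in wednesday if team not in shared]

print(f"{len(shared)} of {len(monday)} comps are shared")
print("dropped:", dropped)
print("added:", added)
```

The same comparison could be rerun every couple of weeks to see how quickly the list settles down.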

I'm going to continue tracking UK this way for the rest of the season.  I won't do it after every game, but maybe every couple of weeks while I also play around with the lists and look for any interesting patterns.  If you would like a copy of the spreadsheet I'm using you can email me and I'll send you a copy.