One of the fun things to discuss about sports is the way players and teams relate to historical counterparts. Who does a certain player remind you of? What team plays like this one? Most of the time these discussions are limited by our own experience, memory and observation, which shrinks the available pool of comparisons. Many years ago the famed baseball writer Bill James introduced Similarity Scores, a method of quantifying how similar the careers of two players were. This offered a way to identify potentially similar baseball players without having seen them play, an advantage given the long history of the sport. The idea has since been applied to basketball (mostly the NBA) and is used quite frequently in player projections.
I've been fiddling around with doing something similar for college basketball teams. I was curious what kind of results I would get, and I wanted to know whether applying the method to the current season might lend any insight into Kentucky's future prospects. While I wouldn't take this too seriously, some of the results were quite intriguing, and if nothing else I think they might provoke some good discussions.
Using similarity scores to compare teams is not unheard of, but it is not common. I did find a discussion on the APBRmetrics board where someone did pretty much the same thing I did, only for NBA teams. For the sake of completeness, here is an article from a few years ago discussing some of the philosophy behind the use of Similarity Scores.
The method I used is fairly straightforward: I used the Four Factors (effective field goal percentage, turnover rate, offensive rebounding rate and free throw rate) for both offense and defense to compare two teams. This gives me 8 categories to compare and provides a solid fundamental description of every team. I computed the z-score for each category and used the Euclidean distance to measure how "close" two teams are. If you don't know what a z-score or Euclidean distance is, don't worry; just know that the lower the score, the better the match. After that it's a simple matter of sorting the scores to find the closest matches.
Fortunately for me, Ken Pomeroy has data on the 4 factors available in a handy csv file for each season starting with 2003-2004. This gave me data on 2,011 team-seasons to use for comparisons and made it relatively easy to set up a spreadsheet with all the pertinent information.
Here I must admit to a bit of mathematical fudging. While Pomeroy provides the statistical means for each category for each season, he does not list the standard deviations. Getting exact figures would require going team-by-team to collect the exact number of rebounds, possessions, field goal attempts and free throws for each. That's a rather overwhelming task for 2,011 teams, so rather than use exact values for the mean and standard deviation, I estimated both for each season from the team values in the csv file. I don't think this makes much difference, however: the gap between the estimated mean and the true mean tends to be less than one half of one percent (0.5%), except for defensive free throw rate, where it is closer to 1%. I suspect the estimated and true standard deviations are similarly close.
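The comparison step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the spreadsheet I actually used: the category names are hypothetical labels, each team is assumed to be a dict of raw factor values, and the season means and standard deviations are assumed to be known already. Note that the target team is part of the pool, so it comes back as its own closest match with a score of 0, just like Kentucky at the top of each list.

```python
import math

# Hypothetical labels for the eight categories: the Four Factors
# (eFG%, turnover rate, offensive rebound rate, free throw rate)
# on offense and on defense. These are not Pomeroy's csv column names.
CATEGORIES = [
    "off_efg", "off_to", "off_or", "off_ftr",
    "def_efg", "def_to", "def_or", "def_ftr",
]


def z_scores(team, means, sds):
    """Standardize a team's raw factors against the given season mean and sd."""
    return [(team[c] - means[c]) / sds[c] for c in CATEGORIES]


def similarity(z_a, z_b):
    """Euclidean distance between two z-score vectors (0 means identical)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(z_a, z_b)))


def closest_matches(name, pool, n=10):
    """Return the n closest (score, team) pairs to `name`, best match first.

    `pool` maps a team-season label to its z-score vector and includes
    `name` itself, which therefore comes back first with a score of 0.
    """
    target = pool[name]
    scores = sorted((similarity(target, z), other) for other, z in pool.items())
    return scores[:n]
```

Standardizing first puts all eight categories on a common scale, so a category whose raw numbers naturally spread out more from team to team doesn't dominate the distance.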
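Here is one way the per-season estimate could look, again as a hedged sketch rather than my actual spreadsheet formulas. It assumes each csv row has been read into a dict with a `season` key plus one key per category (a hypothetical layout), and it uses the population standard deviation (`statistics.pstdev`); whether the sample version would be more appropriate is a judgment call that makes little difference with several hundred teams per season. The `relative_gap` helper shows what "one half of one percent" means as a fraction.

```python
import statistics
from collections import defaultdict


def season_estimates(rows, categories):
    """Estimate each season's mean and sd per category from the team rows.

    `rows` is a list of dicts, each with a "season" key plus one key per
    category (a hypothetical layout for the parsed csv). Returns
    {season: (means, sds)}, where means and sds map category -> float.
    """
    by_season = defaultdict(list)
    for row in rows:
        by_season[row["season"]].append(row)

    estimates = {}
    for season, teams in by_season.items():
        means = {c: statistics.mean([t[c] for t in teams]) for c in categories}
        # Population sd over every team in the season (an assumption;
        # statistics.stdev would give the sample version instead).
        sds = {c: statistics.pstdev([t[c] for t in teams]) for c in categories}
        estimates[season] = (means, sds)
    return estimates


def relative_gap(estimated, reported):
    """Fractional difference between an estimate and a reported value.

    A result of 0.005 is one half of one percent (0.5%).
    """
    return abs(estimated - reported) / abs(reported)
```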
On to some results!
I started by finding some comps for UK's two best teams of the last 6 years: 2004 and 2005.
| Year | Team | Similarity score | Record | Postseason |
|---|---|---|---|---|
| 2004 | Kentucky | 0.00 | 27-5 | 2nd rd |
| 2005 | Oklahoma | 1.12 | 25-8 | 2nd rd |
| 2007 | North Dakota St. | 1.24 | 20-8 | |
| 2004 | Stanford | 1.26 | 30-2 | 2nd rd |
| 2006 | UCLA | 1.26 | 32-7 | Champ game |
| 2007 | Kansas | 1.30 | 33-5 | Elite 8 |
| 2009 | Louisville | 1.34 | 31-6 | Elite 8 |
| 2004 | Central Florida | 1.37 | 25-6 | 1st rd |
| 2004 | Florida St. | 1.40 | 19-14 | NIT |
| 2006 | Winthrop | 1.40 | 23-8 | 1st rd |
| 2008 | Illinois St. | 1.44 | 25-10 | NIT |

| Year | Team | Similarity score | Record | Postseason |
|---|---|---|---|---|
| 2005 | Kentucky | 0.00 | 28-6 | Elite 8 |
| 2008 | Duke | 0.92 | 28-6 | 2nd rd |
| 2006 | Arkansas | 1.15 | 22-10 | 1st rd |
| 2005 | Wisconsin Milwaukee | 1.36 | 26-6 | Sweet 16 |
| 2009 | East Tennessee St. | 1.49 | 23-11 | 1st rd |
| 2009 | Duke | 1.51 | 30-7 | Sweet 16 |
| 2006 | Pennsylvania | 1.57 | 20-9 | 1st rd |
| 2008 | Akron | 1.58 | 24-11 | NIT |
| 2007 | Purdue | 1.60 | 22-12 | 2nd rd |
| 2004 | East Tennessee St. | 1.63 | 27-6 | 1st rd |
| 2005 | George Washington | 1.64 | 22-8 | 1st rd |
The 2004 team has some interesting comps. Stanford was also a #1 seed in 2004 and was also upset in the 2nd round of the NCAA Tournament, by Alabama. There are also a couple of teams that made the Elite 8 and a UCLA Final 4 squad that reached the championship game. The rest of the top 10 is mostly smaller schools that either made the NIT or exited early from the Big Dance. The 2005 UK team has a somewhat different top 10: only 1 NIT team, and every other school made the tournament, but none got past the Sweet 16.
Next up is last year's UK squad that went to the NIT:
| Year | Team | Similarity score | Record | Postseason |
|---|---|---|---|---|
| 2009 | Kentucky | 0.00 | 22-14 | NIT |
| 2004 | Virginia Commonwealth | 1.37 | 23-8 | 1st rd |
| 2008 | Texas Arlington | 1.39 | 21-12 | 1st rd |
| 2006 | UCLA | 1.51 | 32-7 | Champ game |
| 2007 | Duke | 1.51 | 22-11 | 1st rd |
| 2009 | Kansas | 1.56 | 27-8 | Sweet 16 |
| 2009 | Wake Forest | 1.60 | 24-7 | 1st rd |
| 2005 | Texas A&M Corpus Christi | 1.61 | 20-8 | |
| 2007 | Central Florida | 1.62 | 22-9 | |
| 2007 | Arkansas | 1.68 | 21-13 | 1st rd |
| 2004 | St. Mary's | 1.70 | 19-12 | |
I think the results here are actually rather informative. Seven of the ten comps made the NCAA tournament, though five of those lost in the first round, and the other three are smaller schools that missed the tournament. I think this lines up with what we saw from last year's squad: they definitely had the talent to make the NCAAs and were well on their way after starting 4-0 in SEC play, but, as we know all too well, they fell apart down the stretch.
By the way, you'll notice that the 2006 UCLA team pops up again. They are an interesting case. Nearly every comparison I've done so far has had about a dozen teams with a similarity score of 2 or less, except for UCLA. The 2006 team has around 85 teams with a score of 2 or less, the 2005 team has 180 and the 2007 team has around 40. I'm not sure exactly what that means, but I have yet to find another school with so many low scores.
Here are some more non-UK results: The 2008 Memphis team and last year's national champs.
| Year | Team | Similarity score | Record | Postseason |
|---|---|---|---|---|
| 2008 | Memphis | 0.00 | 38-2 | Champ game |
| 2009 | Memphis | 1.44 | 33-4 | Sweet 16 |
| 2007 | North Carolina | 1.72 | 31-7 | Elite 8 |
| 2008 | Wisconsin | 1.78 | 31-5 | Sweet 16 |
| 2004 | Cincinnati | 1.79 | 25-7 | 2nd rd |
| 2004 | Nevada | 1.81 | 25-9 | Sweet 16 |
| 2007 | Notre Dame | 1.82 | 24-8 | 1st rd |
| 2008 | St. Mary's | 1.85 | 25-7 | 1st rd |
| 2008 | Kansas | 1.89 | 37-3 | NCAA Champ |
| 2007 | Wisconsin | 1.91 | 30-6 | 2nd rd |
| 2008 | UCLA | 1.92 | 35-4 | Final 4 |

| Year | Team | Similarity score | Record | Postseason |
|---|---|---|---|---|
| 2009 | North Carolina | 0.00 | 34-4 | NCAA Champ |
| 2007 | North Carolina | 1.27 | 31-7 | Elite 8 |
| 2007 | Notre Dame | 1.45 | 24-8 | 1st rd |
| 2008 | Pittsburgh | 1.55 | 27-10 | 2nd rd |
| 2007 | Texas | 1.62 | 25-10 | 2nd rd |
| 2007 | Ohio St. | 1.66 | 35-4 | Champ game |
| 2008 | North Carolina | 1.78 | 36-3 | Final 4 |
| 2008 | UCLA | 1.80 | 35-4 | Final 4 |
| 2005 | Charlotte | 1.88 | 21-8 | 1st rd |
| 2008 | St. Mary's | 1.92 | 25-7 | 1st rd |
| 2009 | St. Mary's | 1.96 | 28-7 | NIT |
These were elite teams, and that's reflected in the comparisons. All the top 10 comps for both schools made the NCAA tournament, with the exception of UNC's #10. For those who don't recall, that 2009 St. Mary's team starred Patty Mills and would easily have made the tournament had Mills not been injured against Gonzaga halfway through the season. As it was, they were one of the teams right on the bubble on Selection Sunday that year. Otherwise you see a lot of teams that had considerable success in the tournament: for Memphis, 6 of 10 comps got past the first weekend, and 4 of 10 did the same for UNC.
You'll also note that the #1 comp for both teams is the same school from either the previous or the following season. That makes a lot of sense when you think about the makeup and stability of each squad from year to year, and I think it lends some credibility to this method. As an aside, I found similar results for other teams, with successive seasons from the same school appearing in the top 10 comps. These tended to be schools with several important returning players and no coaching changes.
Okay, now the moment you've all been waiting for: the current edition of the Wildcats. I actually have two sets to show you. One was done on Monday, before the Hartford game; the other was done Wednesday. Since this is an in-season comparison, I wanted to see how the list changes as more games are played.
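The first-weekend tally is easy to double-check. This sketch simply types in the postseason column for each school's ten comps (skipping the team itself) from the tables and counts anything from the Sweet 16 on up; the set of round names is just the labels used in these tables.

```python
# Postseason labels from the tables that mean a team survived the first
# weekend (i.e. reached the Sweet 16 or went further).
PAST_FIRST_WEEKEND = {"Sweet 16", "Elite 8", "Final 4", "Champ game", "NCAA Champ"}

# Postseason results of each team's ten comps, in order, copied from the tables.
memphis_2008_comps = [
    "Sweet 16", "Elite 8", "Sweet 16", "2nd rd", "Sweet 16",
    "1st rd", "1st rd", "NCAA Champ", "2nd rd", "Final 4",
]
unc_2009_comps = [
    "Elite 8", "1st rd", "2nd rd", "2nd rd", "Champ game",
    "Final 4", "Final 4", "1st rd", "1st rd", "NIT",
]


def past_first_weekend(results):
    """Count results that reached at least the Sweet 16."""
    return sum(result in PAST_FIRST_WEEKEND for result in results)


print(past_first_weekend(memphis_2008_comps))  # 6
print(past_first_weekend(unc_2009_comps))      # 4
```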
Monday, before the Hartford game (13-0):

| Year | Team | Similarity score | Record | Postseason |
|---|---|---|---|---|
| 2010 | Kentucky | 0.00 | 13-0 | ? |
| 2006 | North Carolina | 0.99 | 23-8 | 2nd rd |
| 2005 | Pittsburgh | 1.17 | 20-9 | 1st rd |
| 2004 | Mississippi St. | 1.40 | 26-4 | 2nd rd |
| 2008 | New Mexico St. | 1.55 | 21-14 | |
| 2008 | Syracuse | 1.61 | 21-14 | NIT |
| 2008 | North Carolina | 1.82 | 36-3 | Final 4 |
| 2007 | Providence | 1.84 | 18-13 | NIT |
| 2005 | Mississippi St. | 1.85 | 23-11 | 2nd rd |
| 2007 | North Dakota St. | 1.91 | 20-8 | |
| 2006 | Louisiana St. | 1.92 | 27-9 | Final 4 |

Wednesday, after the Hartford game (14-0):

| Year | Team | Similarity score | Record | Postseason |
|---|---|---|---|---|
| 2010 | Kentucky | 0.00 | 14-0 | ? |
| 2006 | North Carolina | 1.11 | 23-8 | 2nd rd |
| 2004 | Mississippi St. | 1.38 | 26-4 | 2nd rd |
| 2005 | Pittsburgh | 1.48 | 20-9 | 1st rd |
| 2008 | North Carolina | 1.53 | 36-3 | Final 4 |
| 2008 | New Mexico St. | 1.73 | 21-14 | |
| 2008 | Syracuse | 1.81 | 21-14 | NIT |
| 2007 | North Carolina | 1.81 | 31-7 | Elite 8 |
| 2009 | Pittsburgh | 1.85 | 31-5 | Elite 8 |
| 2007 | North Dakota St. | 1.89 | 20-8 | |
| 2006 | Louisiana St. | 1.98 | 27-9 | Final 4 |
The two lists are nearly identical: 8 of the 10 comps are the same, in mostly the same order. The differences are the 7th and 8th comps, which changed to more impressive teams after the win over Hartford: 2007 Providence (NIT) and 2005 Mississippi St. (2nd rd) were replaced by 2007 North Carolina and 2009 Pittsburgh, both Elite 8 teams. These comps suggest that right now UK is good enough to make the Sweet 16 or Elite 8. Of particular interest to me is the presence of three recent UNC teams in the Wednesday list. Those teams had a lot of young talent, much like our Wildcats this year, and they did pretty well collectively. Recall that the 2007 UNC team starred sophomore Tyler Hansbrough, freshman Ty Lawson and company. They lost to Georgetown in the Elite 8 in a game they dominated for the first 30+ minutes, building a huge lead, only to fall apart at the end. Despite some early struggles, I have a lot more confidence in the Cats' ability to close out teams late. The only team that looks really out of place is North Dakota St., but let me tell you, that 2007 North Dakota St. team has a lot of good BCS-team comps in its own top 10.
I'm going to continue tracking UK this way for the rest of the season. I won't do it after every game, but maybe every couple of weeks, while I also play around with the lists and look for interesting patterns. If you would like a copy of the spreadsheet I'm using, email me and I'll send you one.
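Since I plan to keep re-running these lists, it helps to compare successive snapshots automatically. Here is a minimal sketch, with the Monday and Wednesday top 10s typed in as plain labels from the tables; the `compare_snapshots` helper is purely illustrative.

```python
monday = [
    "2006 North Carolina", "2005 Pittsburgh", "2004 Mississippi St.",
    "2008 New Mexico St.", "2008 Syracuse", "2008 North Carolina",
    "2007 Providence", "2005 Mississippi St.", "2007 North Dakota St.",
    "2006 Louisiana St.",
]
wednesday = [
    "2006 North Carolina", "2004 Mississippi St.", "2005 Pittsburgh",
    "2008 North Carolina", "2008 New Mexico St.", "2008 Syracuse",
    "2007 North Carolina", "2009 Pittsburgh", "2007 North Dakota St.",
    "2006 Louisiana St.",
]


def compare_snapshots(before, after):
    """Return (shared count, teams that dropped out, teams that entered)."""
    shared = set(before) & set(after)
    dropped = [team for team in before if team not in shared]
    entered = [team for team in after if team not in shared]
    return len(shared), dropped, entered


shared, dropped, entered = compare_snapshots(monday, wednesday)
print(shared)   # 8
print(dropped)  # ['2007 Providence', '2005 Mississippi St.']
print(entered)  # ['2007 North Carolina', '2009 Pittsburgh']
```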