Kentucky vs. Louisville statistical breakdown

A look at what could ultimately decide who wins this edition of Cats vs. Cards.

Kentucky v Louisville Photo by Andy Lyons/Getty Images

Now that we are eleven games into the season and prepping for the annual holiday face-off with the Louisville Cardinals, I’m wondering about the key measures we should be looking at to assess our chances of pulling out a win.

Maybe it’s not a bad time to take a deeper look, with the help of some statistical learning tools, to see if we can find any statistical trends that have been associated with Kentucky wins and losses this season.

To that end, let’s examine several standard and advanced team statistics for each game and compare them to the Cats’ margin of victory (MOV). The variables we consider are specified in Table 1.

Table 1. Per game statistics used as independent variables in the lasso regression model to predict MOV. Note statistics with asterisks and MOV are scaled to represent estimated values for 100 possessions.

We include both UK and opponent statistics (except pace) in the analysis, for a total of 57 “independent” predictor variables. The stats with asterisks, as well as MOV for each game, are scaled to their equivalent values for a 100-possession game; because games have different numbers of possessions, this lets us make fair “apples-to-apples” comparisons among the games.
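As a minimal sketch of that scaling, assuming a simple proportional adjustment (the function name and the game numbers below are made up for illustration and are not taken from the data set):

```python
def per_100_possessions(value, possessions):
    """Scale a raw per-game value to its 100-possession equivalent."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return 100.0 * value / possessions


# Hypothetical game: a 9-point win in a game with 72 possessions.
scaled_mov = per_100_possessions(9, 72)
print(scaled_mov)  # 12.5
```

A 9-point win in a slow game and a 9-point win in a fast game end up with different scaled margins, which is the point of the adjustment.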

The engine for our analysis is a technique called “lasso regression,” which was introduced by statistician Robert Tibshirani in 1996 and has since been refined by him and his colleagues at Stanford University. The great thing about the lasso is that it shrinks the coefficients of less useful variables all the way to zero, so it automatically selects a subset of the 57 independent variables that can be used to predict the MOV. With this approach, we get a good estimate of the “vital few” game statistics that have been most highly associated with the outcome of the Cats’ games this season.
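For readers curious how the lasso picks variables, here is a small sketch using coordinate descent with soft-thresholding, one common way to fit it. This is not the code behind the article’s model: the function names, the penalty value `alpha`, and the toy data are all assumptions chosen so the answer is easy to check by hand.

```python
def soft_threshold(rho, alpha):
    """Shrink rho toward zero by alpha; anything within alpha of zero becomes 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_sweeps=100):
    """Minimize (1/(2n)) * sum((y - b0 - X @ b)^2) + alpha * sum(|b|)."""
    n, p = len(X), len(X[0])
    # Center y and every column so the intercept can be recovered at the end.
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [v - y_mean for v in y]  # residual for beta = 0
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            z = sum(x * x for x in col) / n
            if z == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that ignores beta[j].
            rho = sum(x * (r + x * beta[j]) for x, r in zip(col, residual)) / n
            new_beta = soft_threshold(rho, alpha) / z
            delta = new_beta - beta[j]
            residual = [r - x * delta for x, r in zip(col, residual)]
            beta[j] = new_beta
    intercept = y_mean - sum(b * m for b, m in zip(beta, col_means))
    return intercept, beta


# Toy data (made up): three uncorrelated predictors, eight "games".
# The response depends strongly on the first, weakly on the second,
# and not at all on the third.
X = [[1, 1, 1], [-1, 1, 1], [1, -1, 1], [-1, -1, 1],
     [1, 1, -1], [-1, 1, -1], [1, -1, -1], [-1, -1, -1]]
y = [3.0 + 5.0 * a + 0.2 * b for a, b, c in X]

intercept, beta = lasso_coordinate_descent(X, y, alpha=0.5)
print([round(b, 2) for b in beta])  # [4.5, 0.0, 0.0]
```

The weak and irrelevant predictors are dropped entirely, and the strong one is kept with its coefficient shrunk from 5 to 4.5. That combination of selection and shrinkage is what lets the lasso cut a long list of candidate stats down to a handful.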

Enough of the regression lesson; let’s get on to the analysis. Figure 1 summarizes the results and reveals only four “vital few” predictor variables. These predictors are UK’s field goal percentage (UK FG%) and three of the opponents’ statistics: made field goals (Opp FGM), defensive rebounds (Opp DRB), and “true shooting percentage” (Opp TS%, an advanced statistic which is a scaled ratio of total points to the effective number of total shots taken, including free throws).
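For reference, true shooting percentage is commonly computed as points divided by twice the sum of field goal attempts and 0.44 times free throw attempts. The sketch below assumes that conventional formula (the stats source used for this analysis may differ slightly), and the box-score numbers are hypothetical.

```python
def true_shooting_pct(points, fga, fta):
    """True shooting percentage using the conventional 0.44 free-throw weight."""
    shooting_possessions = fga + 0.44 * fta
    if shooting_possessions <= 0:
        raise ValueError("need at least one shot attempt")
    return 100.0 * points / (2.0 * shooting_possessions)


# Hypothetical box score: 70 points on 55 field goal attempts and 20 free throws.
ts = true_shooting_pct(70, 55, 20)
print(round(ts, 1))  # 54.9
```

Because the denominator counts free throw trips as well as field goal attempts, TS% rewards teams that get to the line and make threes, which plain FG% does not.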

Figure 1. Estimated regression coefficients for the lasso regression model to predict MOV.

Before getting into the interpretation of the model, let’s check to see how well it predicts MOV for games we’ve already played. Figure 2 shows a scatterplot of the actual MOV versus predicted MOV. Here, we see a pretty good fit with the actual data, especially considering we only have eleven games of data.

The mean square difference between actual and predicted is about 5 points for a 100 possession game, and the only game where the model does not have an accurate win/lose prediction is against Evansville, where the actual margin was -3 versus a predicted margin of +4.
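Both checks in the previous paragraph are easy to reproduce. The sketch below assumes the quoted figure is a root mean square difference (so that it is in points), which is an assumption on my part. Only the Evansville pair (-3 actual, +4 predicted) comes from the article; the other margins are invented for illustration.

```python
import math


def rms_difference(actual, predicted):
    """Root mean square difference between actual and predicted margins."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def outcome_misses(actual, predicted):
    """Indices of games where the predicted winner differs from the actual one."""
    return [i for i, (a, p) in enumerate(zip(actual, predicted))
            if (a > 0) != (p > 0)]


# Index 1 is the Evansville game from the article; the rest are made up.
actual_mov = [12.0, -3.0, 20.0, -5.0]
predicted_mov = [9.0, 4.0, 24.0, -6.0]

print(round(rms_difference(actual_mov, predicted_mov), 2))  # 4.33
print(outcome_misses(actual_mov, predicted_mov))            # [1]
```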

Figure 2. Actual MOV versus MOV predicted by the lasso regression model. Note MOV have been scaled to represent effective MOV for 100 possession games.

So how do we interpret this result? The coefficient estimates show a large positive value for UK FG% and a large negative value for Opp TS%, meaning UK’s field goal percentage and opponents’ true shooting percentage are the primary positive and negative predictors, respectively, of UK’s margin of victory.

We also see smaller predictive value from Opp FGM and Opp DRB; one could argue that field goals made is actually an element of true shooting percentage, while opponents’ defensive rebounds are inversely related to UK’s field goal percentage. Bottom line: according to this analysis, UK’s field goal percentage and the opponent’s true shooting percentage are the primary indicators of the final scoring margin.

So what does this imply about the upcoming UK/Louisville game? Currently, U of L is ranked third in the country in defensive field goal percentage, holding its opponents to a paltry 35.0% from the field. UK’s average FG% in losses this year has been 40.2%, so if this analysis holds, the Cats are going to have to exceed the average Cards opponent’s shooting percentage significantly, perhaps by double digits.

As far as the other key indicator goes, in UK’s three losses the Cats have allowed their opponents an average TS% of 60.2%, right at Louisville’s average of 59.4%. So if the Cats can’t slow down the Cards, this would appear to suggest a UK loss. On the other hand, UK is holding opponents to a TS% of 48.0% for the season, and in U of L’s loss to Texas Tech, the Cards’ TS% was 44.5%, not much different from UK’s norm.

Overall, if you believe the model, things are not shaping up well for a Kentucky victory on Saturday. Defensively, UK has shown they can contain at least one good opponent from the field, but that Michigan State game seems like eons ago.

The key may well be Kentucky’s offensive production, which could benefit from some frontcourt help in the form of high-percentage shots down low. Nick Richards, EJ Montgomery, and Nate Sestina have all shown flashes this season, but with Louisville’s tough interior defense it might be a tall order to expect a significant improvement in this game.

The best news is the lasso regression model identifies a few factors that are associated with margin of victory over a young season in which UK has not played many high major teams; as the season progresses, the model should change and its predictive value should improve. It’s my way of saying, at this early stage of the season, the model may be wrong.

And this is one situation where I really hope I’m wrong.