Riddler Classic 2022-12-09

2022 FIFA World Cup – Can You Win The Riddler Football Playoff?

Author: Ryan McShane

Published: Dec. 9, 2022

The question from fivethirtyeight.com:

Speaking of “football,” the Riddler Football Playoff (RFP) consists of four teams. Each team is assigned a random real number between 0 and 1, representing the “quality” of the team. If team A has quality a and team B has quality b, then the probability that team A will defeat team B in a game is a/(a+b). And the probability that team B will defeat team A is b/(a+b). There are no ties!

In the semifinal games of the playoff, the team with the highest quality (the “1 seed”) plays the team with the lowest quality (the “4 seed”), while the other two teams play each other as well. The two teams that win their respective semifinal games then play each other in the final.

On average, what is the quality of the RFP champion?

Definitions

The strength ratings, before they are drawn and ordered, are distributed $\text{Uniform}(0, 1)$.

Let team $A$ have rating $a$, team $B$ have rating $b$, team $C$ have rating $c$, team $D$ have rating $d$, and $a > b > c > d$, so that team $A$ is ranked first and team $D$ is ranked last. Thus, the first round will be $A$ vs $D$ and $B$ vs $C$. We assume that each game is an independent event, so that the outcome of one game does not affect any other game.

Let $X \succ Y$ indicate that team $X$ beat team $Y$.

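We know that if $X_1, \ldots, X_n \overset{iid}{\sim} \text{Uniform}(0, 1)$ and if $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$ are the order statistics, then each $X_{(k)}$ has a $\text{Beta}(k, n + 1 - k)$ distribution. That is,

$$a \sim \text{Beta}(4, 1), \quad b \sim \text{Beta}(3, 2), \quad c \sim \text{Beta}(2, 3), \quad d \sim \text{Beta}(1, 4),$$

where the general form of the Beta distribution is

$$f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1,$$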

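and the gamma function evaluates to $\Gamma(n) = (n - 1)!$ when $n$ is a positive integer. We’ll need these distributions as priors for our later probability calculations. We also note that

$$E[a] = \frac{4}{5}, \quad E[b] = \frac{3}{5}, \quad E[c] = \frac{2}{5}, \quad E[d] = \frac{1}{5}.$$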
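As a worked instance of the general form: $a$, $b$, $c$, and $d$ are the largest through smallest of four $\text{Uniform}(0, 1)$ draws, so their distributions are $\text{Beta}(4, 1)$, $\text{Beta}(3, 2)$, $\text{Beta}(2, 3)$, and $\text{Beta}(1, 4)$. Substituting these parameters and using $\Gamma(5) = 4! = 24$ gives the four densities explicitly (the labels $f_a, \ldots, f_d$ are introduced here only for illustration):

```latex
\begin{aligned}
f_a(x) &= \frac{\Gamma(5)}{\Gamma(4)\,\Gamma(1)}\, x^{3} = 4x^{3}, \\
f_b(x) &= \frac{\Gamma(5)}{\Gamma(3)\,\Gamma(2)}\, x^{2}(1 - x) = 12x^{2}(1 - x), \\
f_c(x) &= \frac{\Gamma(5)}{\Gamma(2)\,\Gamma(3)}\, x(1 - x)^{2} = 12x(1 - x)^{2}, \\
f_d(x) &= \frac{\Gamma(5)}{\Gamma(1)\,\Gamma(4)}\, (1 - x)^{3} = 4(1 - x)^{3}, \qquad 0 < x < 1.
\end{aligned}
```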

Simulation Approach

So, we can simulate $M = 10^7$ times using the sampling scheme described in the definitions: draw four $\text{Uniform}(0, 1)$ values, sort high to low, and assign them to $a$, $b$, $c$, and $d$.

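M = 10000000   # number of simulated tournaments
# M = 100      # alternative for quick runs
set.seed(538)
# One row per tournament, four Uniform(0, 1) ratings per row
group_ratings = matrix(runif(4*M), nrow = M)
# Sort each row from best to worst
for(i in 1:nrow(group_ratings)) {
  group_ratings[i, ] = sort(group_ratings[i, ], decreasing = TRUE)
}
# Columns a, b, c, d hold the 1 seed through the 4 seed
colnames(group_ratings) = letters[1:4]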

A quick sanity check: we examine the first three simulations (to verify that the randomly drawn values of $a$, $b$, $c$, and $d$ are between 0 and 1 and ordered from best to worst).


a	b	c	d
0.5946701	0.5004485	0.1986629	0.0034940
0.7773826	0.6322653	0.5646906	0.0867222
0.9377972	0.4526582	0.3499206	0.1221021

Perhaps more importantly, the expectations of these values match what they should be:


a	b	c	d
0.8	0.6	0.4	0.2
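A minimal sketch of how the two checks above might be reproduced. It assumes a much smaller number of draws so that it runs in a moment, and it sorts with apply() rather than the loop used earlier; the names M_small and ratings_small are illustrative and not part of the original code, and the printed rows will differ from the ones shown.

```r
# Sketch only: M_small and ratings_small are illustrative names,
# not taken from the original code
set.seed(538)
M_small = 100000
ratings_small = matrix(runif(4 * M_small), nrow = M_small)
# Sort each row from best to worst; apply() returns one column per row,
# so transpose back to one row per tournament
ratings_small = t(apply(ratings_small, 1, sort, decreasing = TRUE))
colnames(ratings_small) = letters[1:4]

# First three simulated tournaments
head(ratings_small, 3)

# Every row should be non-increasing from a to d
all(ratings_small[, 1:3] >= ratings_small[, 2:4])

# Column means should be close to 0.8, 0.6, 0.4, 0.2
round(colMeans(ratings_small), 3)
```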

And we can see the empirical densities match the Beta distributions (in yellow) we’ve described above.
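The visual comparison can also be approximated numerically. The sketch below is not the author's plotting code: it assumes a smaller simulation and a bin width of 0.05 (both arbitrary choices), and reports the largest gap between each histogram's height and the corresponding Beta density evaluated at the bin midpoints.

```r
# Sketch only: M_small, ratings_small, and the bin width are illustrative choices
set.seed(538)
M_small = 200000
ratings_small = t(apply(matrix(runif(4 * M_small), nrow = M_small), 1, sort,
                        decreasing = TRUE))
colnames(ratings_small) = letters[1:4]

# Beta(shape1, shape2) parameters for a, b, c, d (best to worst of four draws)
shape1 = c(a = 4, b = 3, c = 2, d = 1)
shape2 = c(a = 1, b = 2, c = 3, d = 4)

breaks = seq(0, 1, by = 0.05)
max_gap = sapply(letters[1:4], function(team) {
  h = hist(ratings_small[, team], breaks = breaks, plot = FALSE)
  # Largest gap between histogram height and Beta density at the bin midpoints
  max(abs(h$density - dbeta(h$mids, shape1[team], shape2[team])))
})
round(max_gap, 3)
```

Small gaps for all four teams are what one would expect if the empirical densities follow the stated Beta distributions; the gaps shrink further with more draws and narrower bins.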

Solution

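We see that, conditional on the ratings, the semifinal probabilities are

$$P(A \succ D \mid a, d) = \frac{a}{a + d}, \quad P(D \succ A \mid a, d) = \frac{d}{a + d}, \quad P(B \succ C \mid b, c) = \frac{b}{b + c}, \quad P(C \succ B \mid b, c) = \frac{c}{b + c}.$$

Then, since $A$ will always play $D$ and $B$ will always play $C$, the two semifinal denominators are always $(a + d)$ and $(b + c)$, and we know that every conditional probability will be of the form

$$\frac{1}{(a + d)(b + c)} \cdot \frac{w^2\, r}{w + r},$$

where $w$ is the tournament winner’s rating and $r$ is the runner-up’s strength. The conditional probability that team $W$ will win is then

$$P(W \text{ wins} \mid a, b, c, d) = \frac{w^2}{(a + d)(b + c)} \left( \frac{r}{w + r} + \frac{s}{w + s} \right),$$

where $s$ is the rating of the other possible runner-up. E.g.

$$P(A \text{ wins} \mid a, b, c, d) = \frac{a^2}{(a + d)(b + c)} \left( \frac{b}{a + b} + \frac{c}{a + c} \right).$$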

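library(dplyr)  # case_when(), mutate(), and %>% are used below

## Function reflecting above derivation
prob_calculator = function(winner, a, b, c, d){
  # The two possible final opponents: b and c if the winner is a or d,
  # otherwise a and d
  s1 = ifelse(winner %in% c("a", "d"), "b", "a")
  s2 = ifelse(winner %in% c("a", "d"), "c", "d")
  # Swap the team labels for the corresponding ratings
  s1 = case_when(s1 == "a" ~ a, s1 == "b" ~ b, s1 == "c" ~ c, s1 == "d" ~ d)
  s2 = case_when(s2 == "a" ~ a, s2 == "b" ~ b, s2 == "c" ~ c, s2 == "d" ~ d)
  winner = case_when(winner == "a" ~ a, winner == "b" ~ b, winner == "c" ~ c,
                     winner == "d" ~ d)
  # Semifinal denominators, common to every path through the bracket
  pre = 1/((a + d)*(b + c))
  # Remaining factors, summed over the two possible final opponents
  post = winner^2*(s1/(winner + s1) + s2/(winner + s2))
  return(pre*post)
}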
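As a cross-check on the closed form used in prob_calculator, the bracket can also be enumerated path by path. The sketch below is base R only; the helpers beat(), champion_by_enumeration(), and closed_form() are hypothetical names introduced for illustration, and the ratings are those of the first simulated row shown earlier.

```r
# Sketch only: beat(), champion_by_enumeration(), and closed_form() are
# illustrative helpers, not part of the original code
beat = function(x, y) x / (x + y)  # probability that rating x beats rating y

champion_by_enumeration = function(a, b, c, d) {
  r = c(a = a, b = b, c = c, d = d)
  p = c(a = 0, b = 0, c = 0, d = 0)
  # Semifinals are a vs d and b vs c; loop over every pair of semifinal winners
  for (s1 in c("a", "d")) {
    for (s2 in c("b", "c")) {
      l1 = setdiff(c("a", "d"), s1)  # loser of the first semifinal
      l2 = setdiff(c("b", "c"), s2)  # loser of the second semifinal
      p_semis = beat(r[[s1]], r[[l1]]) * beat(r[[s2]], r[[l2]])
      # Final between the two semifinal winners
      p[s1] = p[s1] + p_semis * beat(r[[s1]], r[[s2]])
      p[s2] = p[s2] + p_semis * beat(r[[s2]], r[[s1]])
    }
  }
  p
}

# Closed form: w is the champion's rating, s1 and s2 the two possible final opponents
closed_form = function(w, s1, s2, a, b, c, d) {
  w^2 / ((a + d) * (b + c)) * (s1 / (w + s1) + s2 / (w + s2))
}

# Ratings from the first simulated row shown earlier
a = 0.5946701; b = 0.5004485; c = 0.1986629; d = 0.0034940
by_paths = champion_by_enumeration(a, b, c, d)
by_formula = c(a = closed_form(a, b, c, a, b, c, d),
               b = closed_form(b, a, d, a, b, c, d),
               c = closed_form(c, a, d, a, b, c, d),
               d = closed_form(d, b, c, a, b, c, d))
all.equal(by_paths, by_formula)
round(by_paths, 4)
```

If the derivation holds, the two vectors agree and the four probabilities sum to 1.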

Now we use this prob_calculator function on our simulated $a$, $b$, $c$, and $d$ values from earlier.

exp_results = group_ratings %>%
  as.data.frame() %>%
  mutate(
    probA = prob_calculator(winner = "a", a = a, b = b, c = c, d = d),
    probB = prob_calculator(winner = "b", a = a, b = b, c = c, d = d), 
    probC = prob_calculator(winner = "c", a = a, b = b, c = c, d = d), 
    probD = prob_calculator(winner = "d", a = a, b = b, c = c, d = d)
    ) %>% 
  mutate(winner_rating = a*probA + b*probB + c*probC + d*probD, 
         check = probA + probB + probC + probD)

Results

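For the purposes of the table, let $p_A = P(A \text{ wins} \mid a, b, c, d)$ be the probA column (and likewise $p_B$, $p_C$, and $p_D$), and define $E[w \mid a, b, c, d]$ as the expected winning team’s rating when given $a$, $b$, $c$, and $d$. That is,

$$E[w \mid a, b, c, d] = a\,p_A + b\,p_B + c\,p_C + d\,p_D.$$

Finally, the check column is $p_A + p_B + p_C + p_D$. As another check, we verify that this is $1$ for every simulation. Now, we examine the first three simulations: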


a	b	c	d	probA	probB	probC	probD	winner_rating	check
0.5947	0.5004	0.1987	0.0035	0.5982	0.3294	0.0724	0.0001	0.5349	1
0.7774	0.6323	0.5647	0.0867	0.5079	0.2598	0.2196	0.0127	0.6842	1
0.9378	0.4527	0.3499	0.1221	0.6175	0.2136	0.1421	0.0268	0.7288	1
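As a worked instance of the weighted sum, take the first row: multiplying each rating by its win probability (using the rounded table entries) and adding gives

```latex
\begin{aligned}
E[w \mid a, b, c, d]
  &\approx 0.5947(0.5982) + 0.5004(0.3294) + 0.1987(0.0724) + 0.0035(0.0001) \\
  &\approx 0.35575 + 0.16483 + 0.01439 + 0.00000 \\
  &\approx 0.5350,
\end{aligned}
```

which agrees with the tabulated 0.5349 up to rounding of the inputs.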

## Indeed, all probability sums are 1
all.equal(exp_results$check, rep(1, M))

[1] TRUE

Now, we examine the distribution of the expected winner rating, $E[w \mid a, b, c, d]$:

And finally, find the expected values of our simulations (remember that there were $M = 10^7$ of them).


a	b	c	d	probA	probB	probC	probD	winner_rating
0.7999	0.6	0.4	0.2	0.5028	0.2867	0.1505	0.06	0.6735

Thus, the expected quality of the RFP champion is $E[w] \approx 0.6735$.
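A different way to sanity-check the figure of roughly 0.6735 is to play out the games directly instead of averaging exact win probabilities. The following is a rough sketch under stated assumptions, not the method used above: it draws a smaller number of tournaments, and play() and the names ending in _sim are hypothetical, introduced only for illustration.

```r
# Sketch only: play() and the *_sim names are illustrative, not from the original code
set.seed(538)
M_sim = 200000
ratings_sim = t(apply(matrix(runif(4 * M_sim), nrow = M_sim), 1, sort,
                      decreasing = TRUE))
a = ratings_sim[, 1]; b = ratings_sim[, 2]
c = ratings_sim[, 3]; d = ratings_sim[, 4]

# Play one game per tournament: x wins with probability x/(x + y);
# the returned vector holds the rating of each game's winner
play = function(x, y) ifelse(runif(length(x)) < x / (x + y), x, y)

semi_1 = play(a, d)              # 1 seed vs 4 seed
semi_2 = play(b, c)              # 2 seed vs 3 seed
champion = play(semi_1, semi_2)  # final between the semifinal winners

mean(champion)
```

Because each game adds its own randomness, this estimate is noisier than the weighted-probability average for the same number of draws, but it should land close to the same value.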

Update 12/18/2022

I got a shout-out on the Riddler blog for my second plot! Here it is.