Sarina-Joi’s elimination was not shocking (Finals model methodology)

I’m seeing a lot of surprise in the Idol blogosphere about Sarina-Joi’s elimination, and while I kind of understand it, I disagree. If you read the Top 12 forecast, you saw that I had Sarina-Joi second most likely to be eliminated, within a hair’s breadth (1 percentage point) of Daniel Seavey. After the jump, I’ll explain how I got that.

Take all performances in American Idol final rounds from Season 5 to Season 13 (last year), excluding nights when the Idols sang multiple songs, and tabulate each performance’s Votefair and WhatNotToSing numbers along with several other variables. We can then fit a regression to the data to see what determines whether someone was Safe (as opposed to eliminated or in the Bottom 3). I have placed the data here if you want to download it yourself. After extracting the zip, open the file in R:

FinalsNoMult <- read.csv('FinalsNoMult.csv')
FinalsNoMult$Black <- FinalsNoMult$RaceCode == 201
FinalsNoMult$Safe <- FinalsNoMult$Result=='Safe'

The above also codes a field called “Black” to track whether there is a racial bias, and a field called “Safe” for simplicity of fitting (it turns the “Result” field, which can have several different values such as Safe, Eliminated, Bottom Group, Wild Card into one field that’s either True or False).

572 records have complete fields for the fit we’re going to do. To do a logistic regression on the variables, we say

> fit <- glm(Safe~Black+Age.at.start+Order+VFPercent+FinalsByVote
+Bottom.Prev+PrevAvg+WNTS.Rating+Sex,
data=FinalsNoMult,family="binomial")

This tells R to estimate the regression parameters assuming that being Safe depends on race, age, singing order, Votefair percentage, FinalsByVote (which is true except for Wild Cards), whether or not the contestant had already been in the bottom three, their previous average, their WhatNotToSing rating, and their gender (sex). Using glm (generalized linear model) with family="binomial" specifies a logistic regression, since the binomial family defaults to a logit link.
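To make the link function concrete, here is a minimal sketch that inspects the binomial family object directly. It uses no Idol data; the value of eta is an arbitrary illustrative number, not something from the model.

```r
# binomial() is the family object that family = "binomial" refers to
fam <- binomial()
print(fam$link)                    # name of the default link function

# eta is an arbitrary linear-predictor value chosen for illustration only
eta <- 0.5
p <- fam$linkinv(eta)              # inverse link: linear predictor -> probability

# The inverse logit is the logistic function, available in base R as plogis()
print(all.equal(p, plogis(eta)))
# The link itself maps a probability back to log-odds, like qlogis()
print(all.equal(fam$linkfun(p), qlogis(p)))
```

In other words, the fit models the log-odds of being Safe as a linear combination of the variables, and the logistic function turns that sum back into a probability.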

We now ask R how the fit went:

> summary(fit)
Call:
glm(formula = Safe ~ Black + Age.at.start + Order + VFPercent +
    FinalsByVote + Bottom.Prev + PrevAvg + WNTS.Rating + Sex,
    family = "binomial", data = FinalsNoMult)

Deviance Residuals:
    Min       1Q   Median       3Q      Max 
-3.5314  -0.7480   0.2860   0.7289   2.0517 

Coefficients:
               Estimate Std. Error z value Pr(>|z|)   
(Intercept)  -3.6311705  0.8858132  -4.099 4.14e-05 ***
BlackTRUE     0.2712437  0.2790509   0.972  0.33104   
Age.at.start  0.0036947  0.0317647   0.116  0.90740   
Order         0.2555671  0.0428352   5.966 2.43e-09 ***
VFPercent     0.1060621  0.0226117   4.691 2.72e-06 ***
FinalsByVote  0.4674427  0.3362479   1.390  0.16448   
Bottom.Prev  -0.6505188  0.2329869  -2.792  0.00524 **
PrevAvg      -0.0003424  0.0122648  -0.028  0.97773   
WNTS.Rating   0.0337298  0.0077596   4.347 1.38e-05 ***
SexM          0.7197623  0.2388726   3.013  0.00259 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 709.34  on 571  degrees of freedom
Residual deviance: 501.46  on 562  degrees of freedom
  (224 observations deleted due to missingness)
AIC: 521.46

Number of Fisher Scoring iterations: 6

According to this, being safe in the finals

  • does not significantly depend on race (at least as far as being black or white)
  • does not significantly depend on age
  • does depend on the order you sang in
  • does depend on Votefair popularity
  • does not significantly depend on being a wild card pick
  • does depend on whether the person was in the bottom 3 previously
  • does not depend on how well the person has done in WNTS on previous weeks
  • does depend on WhatNotToSing rating
  • does depend on gender

Let’s ask R to do the fit again with only the variables it found significant:

> fit <- glm(Safe~Order+VFPercent+Bottom.Prev+WNTS.Rating+Sex,
data=FinalsNoMult,family="binomial")
> summary(fit)

Call:
glm(formula = Safe ~ Order + VFPercent + Bottom.Prev + WNTS.Rating +
    Sex, family = "binomial", data = FinalsNoMult)

Deviance Residuals:
    Min       1Q   Median       3Q      Max 
-3.4859  -0.7426   0.2880   0.7315   2.0501 

Coefficients:
             Estimate Std. Error z value Pr(>|z|)   
(Intercept) -3.031485   0.408278  -7.425 1.13e-13 ***
Order        0.248616   0.042187   5.893 3.79e-09 ***
VFPercent    0.102274   0.019279   5.305 1.13e-07 ***
Bottom.Prev -0.680739   0.225052  -3.025  0.00249 **
WNTS.Rating  0.034566   0.006164   5.607 2.05e-08 ***
SexM         0.655808   0.222489   2.948  0.00320 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 709.34  on 571  degrees of freedom
Residual deviance: 504.15  on 566  degrees of freedom
  (224 observations deleted due to missingness)
AIC: 516.15

Number of Fisher Scoring iterations: 6
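As a rough check that dropping the four non-significant variables costs little, the two residual deviances can be compared with a likelihood-ratio test. This is a sketch added for illustration, not part of the original analysis; the numbers are hand-copied from the two summaries above, and it relies on both fits using the same 572 records.

```r
# Residual deviance and degrees of freedom from the two summary() outputs
dev_full    <- 501.46; df_full    <- 562   # nine-variable fit
dev_reduced <- 504.15; df_reduced <- 566   # five-variable fit

# Likelihood-ratio statistic: the increase in deviance from dropping variables
lr_stat <- dev_reduced - dev_full
lr_df   <- df_reduced - df_full

# Under the null (dropped variables have no effect) this is chi-squared
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p = p_value))

# AIC also favors the smaller model (lower is better)
print(c(AIC_full = 521.46, AIC_reduced = 516.15))
```

A large p-value here means the smaller model fits about as well as the larger one. If the two fits were stored under separate names (say fit_full and fit_reduced, hypothetical names since the code above reuses fit), anova(fit_reduced, fit_full, test = "Chisq") would run the same comparison.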

Now, look at the parameter estimates in the first column. These are on the log-odds (logit) scale: each one is the change in the log of your odds of being safe per unit of the variable, so exponentiating an estimate gives the factor by which the odds are multiplied. Your log-odds of being safe are

  • Increased by 0.25 for each step closer to the pimp spot (last singer of the night), which multiplies the odds by about 1.28
  • Increased by 0.10 for each percentage point you do better on Votefair (odds × 1.11)
  • Decreased by 0.68 if you were already in the Bottom 3 previously (odds × 0.51)
  • Increased by 0.03 for each point you get on WhatNotToSing for that performance (odds × 1.035)
  • Increased by 0.66 by being male (odds × 1.93)

For the purposes of American Idol, having been in the Bottom 3 before is only a slightly worse indicator than being female.
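The estimates reported by summary(fit) are on the log-odds scale, so exponentiating them gives the multiplicative effect on the odds of being safe. A minimal sketch, with the values hand-copied from the reduced fit's output rather than read from a fitted object:

```r
# Coefficient estimates copied from the reduced fit's summary output
coefs <- c(Order       =  0.248616,
           VFPercent   =  0.102274,
           Bottom.Prev = -0.680739,
           WNTS.Rating =  0.034566,
           SexM        =  0.655808)

# exp() turns a change in log-odds into a multiplier on the odds
odds_mult <- exp(coefs)
print(round(odds_mult, 3))

# Effects compound multiplicatively: singing five slots later multiplies
# the odds by exp(5 * coefficient), not by 5 * exp(coefficient)
five_slots_later <- exp(5 * coefs[["Order"]])
print(five_slots_later)
```

With the fitted model in hand, exp(coef(fit)) gives the same multipliers directly. A multiplier below 1 (Bottom.Prev) lowers the odds; the multiplier for a prior Bottom 3 appearance is close to the reciprocal of the multiplier for being male, which is why the two effects are about the same size.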

We can see how well this model assigns people into the Bottom 3 (at risk of being eliminated) and determine how bad any ranking errors are by running the following:

# Predicted probability of being Safe for every performance
FinalsNoMult$Prob <- predict(fit, newdata = FinalsNoMult, type = "response")
episodelist <- unique(FinalsNoMult$Episode_ID)
falsepos <- data.frame()
falseneg <- data.frame()
for (ep in episodelist) {
  thisep <- FinalsNoMult[FinalsNoMult$Episode_ID == ep, ]
  # Skip episodes where any contestant is missing a prediction
  if (!any(is.na(thisep$Prob))) {
    # Rank contestants from most to least likely to be safe
    thisep <- thisep[order(-thisep$Prob), ]
    nsafe <- sum(thisep$Safe)
    topranked <- thisep[seq_len(nsafe), ]
    bottomranked <- thisep[-seq_len(nsafe), ]
    # Cutoff: midpoint between the last predicted-safe and first predicted-unsafe
    cutoff <- (thisep$Prob[nsafe] + thisep$Prob[nsafe + 1]) / 2
    # False positives: ranked as safe but actually not safe
    wrong <- topranked[!topranked$Safe, ]
    wrong$Diff <- wrong$Prob - cutoff
    falsepos <- rbind(falsepos, wrong)
    # False negatives: ranked as not safe but actually safe
    wrong <- bottomranked[bottomranked$Safe, ]
    wrong$Diff <- wrong$Prob - cutoff
    falseneg <- rbind(falseneg, wrong)
  }
}
allwrong <- rbind(falseneg, falsepos)
hist(allwrong$Diff, 30)

This goes through all episodes, ranks the contestants by predicted probability of being safe, sets the cutoff halfway between the lowest-ranked contestant predicted safe and the highest-ranked contestant predicted not safe, and measures how far from the cutoff each wrong call was. The plot of ranking errors is shown:

[Figure: FinalsNoMultMistakes, histogram of ranking errors]

There are 56 mis-assignments (people not sorted correctly into the Bottom 3), each wrong call on the positive side having a corresponding one on the negative side. Positive numbers mean that the person was thought to be safe but was actually in the Bottom 3, and negative numbers mean that the person was predicted to be in the Bottom 3 but was actually safe. The mean of this mistake distribution is 0.0126 (close to 0), with a standard deviation of 0.168.

Here are the raw numbers for the Top 12:

Name     WNTS Rating  VFPercent  Order  Probability  Diff from cutoff
Clark             79     29.305      9  0.996277835       0.423942752
Quentin           69      13.87     11  0.970169961       0.397834879
JAX               75      11.37      7  0.921588801       0.349253719
Joey              59       8.87     10  0.916921764       0.344586681
Tyanna            75      10.01      5  0.861500359       0.289165277
Rayvon            74      6.615      2  0.795105812       0.222770729
Nick              48        5.4      6  0.790421272        0.21808619
Adanna            43      2.665     12  0.736928926       0.164593843
Qaasim            27       1.55      8  0.669308174       0.096973092
Maddie            43       4.42      4  0.475361991      -0.096973092
Sarina            33      4.795      1  0.240162801      -0.332172282
Daniel             9       1.13      3  0.230912592      -0.341422491

Sarina had a relatively low WNTS Rating and a relatively low Votefair percentage, she sang first, and she was female. That assigned her a 0.240 probability of being safe. The cutoff, halfway between the 3rd and 4th lowest ranked contestants (Maddie and Qaasim), was about 0.5723. That means Sarina was 33 percentage points below the cutoff. Looking at the histogram of ranking errors above, there are very few instances of someone being safe at such a distance below the cutoff. Thus, Sarina-Joi looked quite likely to be in the Bottom 3. So did Daniel Seavey, and if I had to wager, I would bet that he was (although this was not revealed to us). Daniel and Sarina-Joi had nearly identical safe probabilities (0.23 versus 0.24).
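Sarina's number can be reproduced by hand from the reduced fit. The sketch below plugs her Top 12 inputs into the published coefficients; Bottom.Prev = 0 is an assumption on my part (the Top 12 being the first week in which anyone could land in the Bottom 3), though it does reproduce the probability in the table.

```r
# Coefficients hand-copied from the reduced fit's summary output
b <- c(Intercept = -3.031485, Order = 0.248616, VFPercent = 0.102274,
       Bottom.Prev = -0.680739, WNTS.Rating = 0.034566, SexM = 0.655808)

# Sarina-Joi's Top 12 inputs from the table; Bottom.Prev = 0 is assumed
x <- c(Intercept = 1, Order = 1, VFPercent = 4.795,
       Bottom.Prev = 0, WNTS.Rating = 33, SexM = 0)

# Linear predictor (log-odds), then the logistic function for a probability
eta <- sum(b * x)
p_sarina <- plogis(eta)

# Cutoff: midpoint between Qaasim (9th ranked) and Maddie (10th ranked)
cutoff <- (0.669308174 + 0.475361991) / 2
diff_sarina <- p_sarina - cutoff

print(round(c(prob = p_sarina, cutoff = cutoff, diff = diff_sarina), 4))
```

The small difference from the table's 0.240162801 comes only from the coefficients being rounded in the printed summary.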

  • ABox

    Sarina-Joi should not have been up for elimination in the 1st place and should have been saved by the judges in the 2nd place. Anyway, I’m now ready to make my projection as to who’ll be this year’s winner … drum roll please… If Idol’s history is any guide, and it most definitely is, it’ll be this season’s favorite WGWG who’s also from the south, Clark Beckham.

  • Xfactor Fan

    @ABox Since there is a voting limit now, you can’t use Idol history to predict the winner. I mean, there are only 20 votes now, and with a 50-vote limit last season a non-WGWG won, so imagine that significantly decreasing a WGWG’s chance. Also, when was Clark a WGWG? I only started watching at the semis.