I’m seeing a lot of surprise in the Idol blogosphere about Sarina-Joi’s elimination, and while I kind of understand it, I disagree. If you read the Top 12 forecast, you saw that I had Sarina-Joi second most likely to be eliminated, within a hair’s breadth (1 percentage point) of Daniel Seavey. After the jump, I’ll explain how I got that.
If we take all performances in American Idol final rounds from Season 5 to Season 13 (last year) that didn’t have the Idols singing multiple songs on a night, and tabulate the numbers that each performance registered on Votefair and WhatNotToSing along with several other variables, we can fit a regression to the data to see what determines whether someone was Safe (as opposed to eliminated or in the Bottom 3). I have placed the data here if you want to download it yourself. After extracting the zip, open the file in R:
FinalsNoMult <- read.csv('FinalsNoMult.csv')
FinalsNoMult$Black <- FinalsNoMult$RaceCode == 201
FinalsNoMult$Safe <- FinalsNoMult$Result == 'Safe'
The above also codes a field called “Black” to track whether there is a racial bias, and a field called “Safe” for simplicity of fitting (it turns the “Result” field, which can take several different values such as Safe, Eliminated, Bottom Group, and Wild Card, into one field that’s either True or False).
572 records have complete fields for the fit we’re going to do. To do a logistic regression on the variables, we say
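By default, glm drops any row that has a missing value in a model variable, so it can help to count the complete rows before fitting. The snippet below is a minimal sketch on an invented toy data frame (the values and the `toy` and `model_vars` names are assumptions for illustration, not the real FinalsNoMult data).

```r
# Toy stand-in for FinalsNoMult; the values are invented for illustration only.
toy <- data.frame(
  Safe        = c(TRUE, FALSE, TRUE, TRUE),
  Order       = c(1, 2, 3, 4),
  VFPercent   = c(10.2, NA, 5.1, 7.7),
  WNTS.Rating = c(55, 40, NA, 71)
)

# Columns that would enter the model (hypothetical subset)
model_vars <- c("Safe", "Order", "VFPercent", "WNTS.Rating")

# complete.cases() is TRUE only for rows with no NA in the selected columns
n_complete <- sum(complete.cases(toy[, model_vars]))
n_complete
```

On the real data, the same call with the full list of model variables should give the number of records that actually enter the fit.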
> fit <- glm(Safe~Black+Age.at.start+Order+VFPercent+FinalsByVote+Bottom.Prev+PrevAvg+WNTS.Rating+Sex, data=FinalsNoMult, family="binomial")
This tells R to estimate the regression parameters assuming that being Safe depends on race, age, singing order, Votefair percentage, FinalsByVote (which is true except for Wild Cards), whether or not the contestant had already been in the bottom three, their previous average, their WhatNotToSing rating, and their gender (sex). Calling glm (generalized linear model) with family="binomial" specifies a logistic regression, because the binomial family defaults to a logit link.
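To see what the logit link means in practice, here is a small sketch that inspects R's binomial family object. The probability 0.75 is an arbitrary example value, not something taken from the data.

```r
# The binomial family object carries its default link and the two transforms
fam <- binomial()
fam$link                    # name of the default link function

p <- 0.75                   # arbitrary example probability
eta <- fam$linkfun(p)       # logit: log(p / (1 - p)), the scale the model is linear on
p_back <- fam$linkinv(eta)  # inverse logit maps the linear predictor back to a probability
c(eta = eta, p_back = p_back)
```

The regression coefficients live on the `eta` scale, which is why they have to be transformed before being read as odds or probabilities.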
We now ask R how the fit went:
> summary(fit)

Call:
glm(formula = Safe ~ Black + Age.at.start + Order + VFPercent +
    FinalsByVote + Bottom.Prev + PrevAvg + WNTS.Rating + Sex,
    family = "binomial", data = FinalsNoMult)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.5314  -0.7480   0.2860   0.7289   2.0517

Coefficients:
               Estimate Std. Error z value Pr(>|z|)
(Intercept)  -3.6311705  0.8858132  -4.099 4.14e-05 ***
BlackTRUE     0.2712437  0.2790509   0.972  0.33104
Age.at.start  0.0036947  0.0317647   0.116  0.90740
Order         0.2555671  0.0428352   5.966 2.43e-09 ***
VFPercent     0.1060621  0.0226117   4.691 2.72e-06 ***
FinalsByVote  0.4674427  0.3362479   1.390  0.16448
Bottom.Prev  -0.6505188  0.2329869  -2.792  0.00524 **
PrevAvg      -0.0003424  0.0122648  -0.028  0.97773
WNTS.Rating   0.0337298  0.0077596   4.347 1.38e-05 ***
SexM          0.7197623  0.2388726   3.013  0.00259 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 709.34  on 571  degrees of freedom
Residual deviance: 501.46  on 562  degrees of freedom
  (224 observations deleted due to missingness)
AIC: 521.46

Number of Fisher Scoring iterations: 6
According to this, being safe in the finals
- does not significantly depend on race (at least as far as being black or white)
- does not significantly depend on age
- does depend on the order you sang in
- does depend on Votefair popularity
- does not significantly depend on being a wild card pick
- does depend on whether the person was in the bottom 3 previously
- does not significantly depend on how well the person has done on WNTS in previous weeks
- does depend on WhatNotToSing rating
- does depend on gender
Let’s ask R to do the fit again with only the variables it found significant:
> fit <- glm(Safe~Order+VFPercent+Bottom.Prev+WNTS.Rating+Sex, data=FinalsNoMult,family="binomial")
> summary(fit)

Call:
glm(formula = Safe ~ Order + VFPercent + Bottom.Prev + WNTS.Rating +
    Sex, family = "binomial", data = FinalsNoMult)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.4859  -0.7426   0.2880   0.7315   2.0501

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.031485   0.408278  -7.425 1.13e-13 ***
Order        0.248616   0.042187   5.893 3.79e-09 ***
VFPercent    0.102274   0.019279   5.305 1.13e-07 ***
Bottom.Prev -0.680739   0.225052  -3.025  0.00249 **
WNTS.Rating  0.034566   0.006164   5.607 2.05e-08 ***
SexM         0.655808   0.222489   2.948  0.00320 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 709.34  on 571  degrees of freedom
Residual deviance: 504.15  on 566  degrees of freedom
  (224 observations deleted due to missingness)
AIC: 516.15

Number of Fisher Scoring iterations: 6
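Now, look at the parameter estimates in the first column. Those tell us how the log-odds of being safe change with each variable; exponentiating an estimate gives the factor by which the odds themselves are multiplied. Your log-odds of being safe are
- Increased by 0.25 for each step closer to the pimp spot (last singer of the night), which multiplies the odds by about 1.28
- Increased by 0.10 for each percentage point you do better on Votefair (odds multiplied by about 1.11)
- Decreased by 0.68 if you were already in the Bottom 3 previously (odds multiplied by about 0.51)
- Increased by 0.03 for each point you get on WhatNotToSing for that performance (odds multiplied by about 1.04)
- Increased by 0.66 by being male (odds multiplied by about 1.93)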
For the purposes of American Idol, having been in the bottom 3 before is only a slightly worse indicator than being female.
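Because the estimates from a logistic regression are on the log-odds scale, exponentiating them gives odds ratios, which are often easier to read. The sketch below copies the coefficients from the reduced-model summary by hand; with the fitted object available, `exp(coef(fit))` should give the same thing. The vector names `coefs` and `odds_ratios` are just illustrative.

```r
# Coefficients copied from the reduced-model summary above (log-odds scale)
coefs <- c(Order = 0.248616, VFPercent = 0.102274, Bottom.Prev = -0.680739,
           WNTS.Rating = 0.034566, SexM = 0.655808)

# Exponentiating turns each log-odds change into a multiplicative odds ratio
odds_ratios <- exp(coefs)
round(odds_ratios, 2)
```

An odds ratio above 1 means the variable helps a contestant's chances of being safe, and a ratio below 1 means it hurts them.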
We can see how well this model assigns people into the Bottom 3 (at risk of being eliminated) and determine how bad any ranking errors are by running the following:
# Predicted probability of being Safe for every performance
FinalsNoMult$Prob <- predict(fit, newdata = FinalsNoMult, type = "response")
episodelist <- unique(FinalsNoMult$Episode_ID)
falsepos <- data.frame()
falseneg <- data.frame()
for (ep in episodelist) {
  thisep <- FinalsNoMult[FinalsNoMult$Episode_ID == ep, ]
  # Skip episodes where any contestant is missing a prediction
  if (!any(is.na(thisep$Prob))) {
    # Rank contestants from most to least likely to be safe
    thisep <- thisep[order(-thisep$Prob), ]
    nsafe <- sum(thisep$Safe)
    # Cutoff: midpoint between the last predicted-safe and first predicted-unsafe probabilities
    cutoff <- (thisep$Prob[nsafe] + thisep$Prob[nsafe + 1]) / 2
    topranked <- thisep[seq_len(nrow(thisep)) <= nsafe, ]
    bottomranked <- thisep[seq_len(nrow(thisep)) > nsafe, ]
    # False positives: ranked as safe but actually not safe
    wrong <- topranked[!topranked$Safe, ]
    wrong$Diff <- wrong$Prob - cutoff
    falsepos <- rbind(falsepos, wrong)
    # False negatives: ranked as not safe but actually safe
    wrong <- bottomranked[bottomranked$Safe, ]
    wrong$Diff <- wrong$Prob - cutoff
    falseneg <- rbind(falseneg, wrong)
  }
}
allwrong <- rbind(falseneg, falsepos)
hist(allwrong$Diff, 30)
This goes through all episodes, finds the cutoff for each one (the midpoint between the predicted probabilities just above and just below the line where the Bottom 3 would begin), and measures how far from the cutoff each wrong call was. The plot of ranking errors is shown:
There are 56 mis-assignments (people not sorted correctly into the Bottom 3). They come in pairs: within an episode, every contestant wrongly ranked as safe pushes one safe contestant below the cutoff, so positive and negative errors occur in equal numbers. Positive numbers mean that the person was thought to be safe but was actually in the bottom 3, and negative numbers mean that the person was predicted in the bottom 3 but was actually safe. The mean of this mistake distribution is 0.0126 (close to 0), with a standard deviation of 0.168.
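To make the cutoff and the error measure concrete, here is a sketch of the same calculation on a single invented episode. The five contestants, their probabilities, and their outcomes are made up for illustration and do not come from the data set.

```r
# Invented single-episode data: predicted Safe probabilities and actual outcomes
ep <- data.frame(
  Name = c("A", "B", "C", "D", "E"),
  Prob = c(0.90, 0.80, 0.60, 0.40, 0.20),
  Safe = c(TRUE, TRUE, FALSE, TRUE, FALSE)
)

# Rank from most to least likely to be safe
ep <- ep[order(-ep$Prob), ]
n_safe <- sum(ep$Safe)

# Cutoff sits halfway between the last predicted-safe and first predicted-unsafe contestant
cutoff <- (ep$Prob[n_safe] + ep$Prob[n_safe + 1]) / 2

# The top n_safe ranks are the model's "safe" calls; keep the ones that disagree with reality
pred_safe <- seq_len(nrow(ep)) <= n_safe
wrong <- ep[pred_safe != ep$Safe, ]
wrong$Diff <- wrong$Prob - cutoff
wrong
```

In this toy episode, C is wrongly ranked safe and D is wrongly ranked unsafe, so the two errors appear as one positive and one negative distance from the cutoff.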
Here are the raw numbers for the Top 12:
Name | WNTS Rating | VFPercent | Order | Probability | Diff from cutoff |
---|---|---|---|---|---|
Clark | 79 | 29.305 | 9 | 0.996277835 | 0.423942752 |
Quentin | 69 | 13.87 | 11 | 0.970169961 | 0.397834879 |
JAX | 75 | 11.37 | 7 | 0.921588801 | 0.349253719 |
Joey | 59 | 8.87 | 10 | 0.916921764 | 0.344586681 |
Tyanna | 75 | 10.01 | 5 | 0.861500359 | 0.289165277 |
Rayvon | 74 | 6.615 | 2 | 0.795105812 | 0.222770729 |
Nick | 48 | 5.4 | 6 | 0.790421272 | 0.21808619 |
Adanna | 43 | 2.665 | 12 | 0.736928926 | 0.164593843 |
Qaasim | 27 | 1.55 | 8 | 0.669308174 | 0.096973092 |
Maddie | 43 | 4.42 | 4 | 0.475361991 | -0.096973092 |
Sarina | 33 | 4.795 | 1 | 0.240162801 | -0.332172282 |
Daniel | 9 | 1.13 | 3 | 0.230912592 | -0.341422491 |
Sarina had a relatively low WNTS Rating and a relatively low Votefair percentage, she sang first, and she was female. That gave her a 0.240 probability of being safe. The cutoff, halfway between the 3rd and 4th worst ranked, fell between Qaasim and Maddie at about 0.5723. That means Sarina was 33 percentage points below the cutoff. Looking at the histogram of ranking errors, there are very few instances of someone being safe at such a distance from the cutoff. Thus, Sarina-Joi looked quite likely to be in the Bottom 3. So did Daniel Seavey, and if I had to wager, I would bet that he was (although this was not revealed to us). Daniel and Sarina-Joi had nearly identical safe probabilities (0.23 versus 0.24).
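The table's probabilities can be checked by hand from the reduced-model coefficients. The sketch below assumes Bottom.Prev is 0 for everyone in the Top 12 (nobody had a previous finals result yet); that assumption is mine, but it reproduces the values in the table. The helper name `safe_prob` is hypothetical.

```r
# Reduced-model coefficients copied from the summary output
b <- c(Intercept = -3.031485, Order = 0.248616, VFPercent = 0.102274,
       Bottom.Prev = -0.680739, WNTS.Rating = 0.034566, SexM = 0.655808)

# Hypothetical helper: linear predictor pushed through the inverse logit.
# bottom_prev defaults to 0, an assumption for the first finals week.
safe_prob <- function(order, vf, wnts, male, bottom_prev = 0) {
  eta <- b["Intercept"] + b["Order"] * order + b["VFPercent"] * vf +
    b["Bottom.Prev"] * bottom_prev + b["WNTS.Rating"] * wnts + b["SexM"] * male
  unname(plogis(eta))
}

sarina <- safe_prob(order = 1, vf = 4.795, wnts = 33, male = 0)
daniel <- safe_prob(order = 3, vf = 1.13, wnts = 9, male = 1)

# Cutoff: midpoint between Qaasim's and Maddie's probabilities from the table
cutoff <- (0.669308174 + 0.475361991) / 2

round(c(sarina = sarina, daniel = daniel, cutoff = cutoff,
        sarina_diff = sarina - cutoff), 4)
```

Being male adds 0.66 to Daniel's log-odds, which is roughly what offsets his much lower WNTS rating and leaves the two contestants about one percentage point apart.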