I’ve been playing around with the numbers, and I’m able to get a decent projection model from relatively few assumptions. None of the model’s coefficients is statistically significant yet. Nevertheless, the predictions are interesting.

The model assumes that the primary indicators of elimination are gender (Sex), performance slot (Order), and whether the contestant placed in the bottom 3 on Dialidol (DIBT) and on WNTS (WNBT). You can download the full data set for the Top 11 episodes of Seasons 4 through 9 (male is coded as 1 and female as 0; bottom 3 is coded as 1 for yes and 0 for no). Now we can use R to get a logistic regression fit:

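```r
# Top 11 episodes, Seasons 4-9
Final11 <- read.csv("Final11b.csv")
# Pass the data frame via data= instead of attach() so column names stay scoped
mylogit <- glm(Result ~ Sex + Order + DIBT + WNBT,
               family = binomial(link = "logit"), data = Final11)
summary(mylogit)
```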

A fit summary is obtained:

```
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.8368     1.3327  -1.378    0.168
Sex          -1.4383     1.2036  -1.195    0.232
Order        -0.1338     0.1677  -0.798    0.425
DIBT          0.8017     1.1133   0.720    0.471
WNBT          1.0850     1.0773   1.007    0.314
```
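Written out, the fitted model reads as follows. This simply plugs the Estimate column above into the standard logistic form, where $p$ is the predicted probability of elimination and Sex, Order, DIBT, and WNBT are the data set’s columns:

$$
\log\frac{p}{1-p} = -1.8368 - 1.4383\,\mathrm{Sex} - 0.1338\,\mathrm{Order} + 0.8017\,\mathrm{DIBT} + 1.0850\,\mathrm{WNBT},
\qquad
p = \frac{1}{1 + e^{-\left(-1.8368 - 1.4383\,\mathrm{Sex} - 0.1338\,\mathrm{Order} + 0.8017\,\mathrm{DIBT} + 1.0850\,\mathrm{WNBT}\right)}}
$$

Each coefficient is therefore a change in log-odds, not in probability or in odds directly.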

The model still treats Sex as the deciding factor. Its coefficient of -1.44 means that being a man lowers the log-odds of elimination by 1.44, which multiplies the odds by roughly exp(-1.44) ≈ 0.24. Being in the Dialidol bottom 3, conversely, raises the log-odds by 0.80, multiplying the odds by roughly exp(0.80) ≈ 2.2. Note that significance is low across the board: Sex is the most significant term in the fit, yet even its p-value is 0.23.
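To turn the log-odds coefficients into odds ratios, exponentiate them. This minimal sketch hard-codes the estimates from the summary output so it runs without the CSV; with the fitted model in memory, `exp(coef(mylogit))` gives the same numbers.

```r
# Coefficient estimates copied from the summary() output (log-odds scale)
est <- c(Intercept = -1.8368, Sex = -1.4383, Order = -0.1338,
         DIBT = 0.8017, WNBT = 1.0850)

# exp() converts an additive change in log-odds into a multiplicative
# change in the odds of elimination; the intercept is dropped
odds_ratio <- exp(est[-1])
round(odds_ratio, 2)
```

This gives roughly 0.24 for Sex (a man’s odds of elimination are about a quarter of a woman’s, all else equal), 2.23 for DIBT, and 2.96 for WNBT, while each later slot multiplies the odds by about 0.87.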

Now if we plug in last week’s data for Season 10, we can get a projected elimination probability for each contestant:

```r
# Season 10 Top 11 inputs, one element per contestant
Sex   <- c(1,0,1,1,0,1,0,0,0,1,1)
Order <- c(3,8,5,7,2,11,6,10,4,1,9)
DIBT  <- c(0,0,0,0,0,0,0,0,1,1,1)
WNBT  <- c(0,0,1,0,1,0,0,0,0,1,0)
newdata1 <- data.frame(Sex, Order, DIBT, WNBT)
# type = "response" returns probabilities rather than log-odds
newdata1$rankP <- predict(mylogit, newdata = newdata1, type = "response")
newdata1
```

which gives the output

```
   Sex Order DIBT WNBT       rankP
1    1     3    0    0 0.024684403
2    0     8    0    0 0.051785813
3    1     5    0    1 0.054204166
4    1     7    0    0 0.014601516
5    0     2    0    1 0.265128481
6    1    11    0    0 0.008600886
7    0     6    0    0 0.066620560
8    0    10    0    0 0.040112434
9    0     4    1    0 0.172149933
10   1     1    1    1 0.179127343
11   1     9    1    0 0.024652399
```
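To see where `rankP` comes from, the same probabilities can be reproduced by hand: multiply each contestant’s inputs by the coefficients to get the log-odds, then apply the inverse logit. This is a sketch using the rounded estimates from the summary table, so the results agree with `predict()` to about three decimal places rather than exactly.

```r
# Coefficients from the fit summary: intercept, Sex, Order, DIBT, WNBT
b <- c(-1.8368, -1.4383, -0.1338, 0.8017, 1.0850)

# Design matrix: a column of 1s for the intercept, then the Season 10 inputs
X <- cbind(1,
           Sex   = c(1,0,1,1,0,1,0,0,0,1,1),
           Order = c(3,8,5,7,2,11,6,10,4,1,9),
           DIBT  = c(0,0,0,0,0,0,0,0,1,1,1),
           WNBT  = c(0,0,1,0,1,0,0,0,0,1,0))

eta <- drop(X %*% b)  # linear predictor (log-odds of elimination)
p   <- plogis(eta)    # inverse logit: 1 / (1 + exp(-eta))
round(p, 3)
```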

The rightmost column is the probability of elimination. The contestants with the highest projected probability of elimination sang in slot 2, slot 1, and slot 4; i.e., the projected bottom 3 based on this logistic model was:

| Name | Elimination probability |
|------|-------------------------|
| Thia Megia | 0.265 |
| Casey Abrams | 0.179 |
| Lauren Alaina | 0.172 |
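Picking the projected bottom 3 is just a sort on the probability column. This minimal sketch copies `Order` and rounded `rankP` values from the output above into a small data frame so it runs on its own; in the original session the equivalent would be `head(newdata1[order(newdata1$rankP, decreasing = TRUE), ], 3)`.

```r
# Projected elimination probabilities by performance slot
# (rounded copies of the rankP column, not a re-fit)
proj <- data.frame(
  Order = c(3, 8, 5, 7, 2, 11, 6, 10, 4, 1, 9),
  rankP = c(0.0247, 0.0518, 0.0542, 0.0146, 0.2651, 0.0086,
            0.0666, 0.0401, 0.1721, 0.1791, 0.0247)
)

# Sort by descending probability and keep the top three rows
bottom3 <- head(proj[order(proj$rankP, decreasing = TRUE), ], 3)
bottom3
```

The slots come out as 2, 1, and 4, matching Thia, Casey, and Lauren in the table.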

This really isn’t a terrible prediction. It caught two of the three contestants who actually landed in the bottom 3, including the one who was actually eliminated (or rather saved, in this case). The logistic probabilities are plotted below:

Lauren appears to be over-performing somewhat. I don’t know whether this is because of the huge amount of exposure she got from being in the promos or something else, but she is definitely the one to watch. Under this model, Pia is not nearly as safe as people believe, and I’m getting nervous about that call. Thia was safe this week, but I imagine she’ll be gone before long. If the forecast puts her in the bottom 3 again tomorrow, I would feel comfortable betting that she’s out.

I intend to apply the same model tomorrow, and hopefully the results will feel right. If not, well, it’s back to the drawing board.