A logistic model of the Top 11

I’ve been playing around with the numbers, and I can get a decent projection model with relatively few assumptions. None of the model’s coefficients is statistically significant yet. Nevertheless, the predictions are interesting.

The model assumes that the primary indicators of elimination are Gender, Slot, and whether or not the contestant scored in the bottom 3 on Dialidol and WNTS (the variables Sex, Order, DIBT, and WNBT below). You can download the full data set for Top 11 episodes (S4-S9; male is coded as 1, female as 0, and bottom 3 is 1 for yes, 0 for no). Now we can use R to get a logistic fit:

> Final11 <- read.csv("Final11b.csv")
> # Pass the data frame explicitly instead of attach()-ing it
> mylogit <- glm(Result ~ Sex + Order + DIBT + WNBT, data = Final11, family = binomial(link = "logit"))
> summary(mylogit)

A fit summary is obtained:

            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.8368     1.3327  -1.378    0.168
Sex          -1.4383     1.2036  -1.195    0.232
Order        -0.1338     0.1677  -0.798    0.425
DIBT          0.8017     1.1133   0.720    0.471
WNBT          1.0850     1.0773   1.007    0.314

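The model thinks that Sex is still the strongest factor. The coefficients are on the log-odds scale: being a man lowers the log-odds of elimination by 1.44, which multiplies the odds by exp(-1.44) ≈ 0.24. Being in the Dialidol bottom 3, conversely, raises the log-odds by 0.80, which roughly doubles the odds (exp(0.80) ≈ 2.2). Note that the significance is quite low: no coefficient reaches the conventional p < 0.05 threshold. Sex still has the smallest p-value of the four predictors (0.232).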
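
To put the coefficients on a more intuitive scale, here is a minimal sketch that exponentiates them into odds ratios and rebuilds the Wald statistics. The numbers are typed in by hand from the summary table rather than pulled from the fitted object, so they only match to rounding, and the names `est`, `se`, and `wald` are just illustrative.

```r
# Estimates and standard errors copied by hand from the summary table
# (rounded to four decimals, so results match only approximately).
est <- c("(Intercept)" = -1.8368, Sex = -1.4383, Order = -0.1338,
         DIBT = 0.8017, WNBT = 1.0850)
se  <- c("(Intercept)" = 1.3327, Sex = 1.2036, Order = 0.1677,
         DIBT = 1.1133, WNBT = 1.0773)

# Coefficients are log-odds; exp() turns them into odds ratios.
odds_ratio <- exp(est)

# Wald z statistic and two-sided p-value, as reported by summary().
z_value <- est / se
p_value <- 2 * pnorm(-abs(z_value))

# Approximate 95% Wald confidence interval on the log-odds scale.
ci_low  <- est - qnorm(0.975) * se
ci_high <- est + qnorm(0.975) * se

wald <- data.frame(odds_ratio, z_value, p_value, ci_low, ci_high)
print(round(wald, 3))
```

With these inputs every interval straddles zero, which is another way of saying that none of the effects can be distinguished from no effect yet.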

Now if we put in the data from last week, we can get projected elimination probabilities for Season 10:

> Sex <- c(1,0,1,1,0,1,0,0,0,1,1)
> Order <- c(3,8,5,7,2,11,6,10,4,1,9)
> DIBT <- c(0,0,0,0,0,0,0,0,1,1,1)
> WNBT <- c(0,0,1,0,1,0,0,0,0,1,0)
> newdata1 <- data.frame(Sex,Order,DIBT,WNBT)
> newdata1$rankP <- predict(mylogit,newdata=newdata1,type="response")
> newdata1

which gives the output:

   Sex Order DIBT WNBT       rankP
1    1     3    0    0 0.024684403
2    0     8    0    0 0.051785813
3    1     5    0    1 0.054204166
4    1     7    0    0 0.014601516
5    0     2    0    1 0.265128481
6    1    11    0    0 0.008600886
7    0     6    0    0 0.066620560
8    0    10    0    0 0.040112434
9    0     4    1    0 0.172149933
10   1     1    1    1 0.179127343
11   1     9    1    0 0.024652399

The rightmost column is the probability of elimination. The contestants with the highest projected probability of elimination are those in slot 2, slot 1, and slot 4; that is, the projected bottom 3 based on this logistic model was:

Name            Elimination probability
Thia Megia      0.265
Casey Abrams    0.179
Lauren Alaina   0.172
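
As a sanity check, the probabilities from predict() can be reproduced by hand. The sketch below rebuilds the linear predictor from the rounded coefficients in the summary table and passes it through the inverse logit (`plogis`), so the results should agree with the rankP column only up to rounding. The names `beta`, `top11`, `eta`, and `p_hat` are illustrative, not from the original session.

```r
# Rounded coefficients copied by hand from the summary table.
beta <- c(intercept = -1.8368, Sex = -1.4383, Order = -0.1338,
          DIBT = 0.8017, WNBT = 1.0850)

# Season 10 Top 11 inputs, same values as newdata1.
top11 <- data.frame(
  Sex   = c(1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1),
  Order = c(3, 8, 5, 7, 2, 11, 6, 10, 4, 1, 9),
  DIBT  = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1),
  WNBT  = c(0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0)
)

# Linear predictor (log-odds of elimination) for each contestant.
eta <- with(top11, beta[["intercept"]] + beta[["Sex"]] * Sex +
              beta[["Order"]] * Order + beta[["DIBT"]] * DIBT +
              beta[["WNBT"]] * WNBT)

# Inverse logit: p = 1 / (1 + exp(-eta)).
top11$p_hat <- plogis(eta)

# Projected bottom 3: the three largest elimination probabilities.
bottom3 <- head(top11[order(top11$p_hat, decreasing = TRUE), ], 3)
print(bottom3)
```

The top three rows come out as slots 2, 1, and 4, matching the table above.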

This really isn’t a terrible prediction. It has 2 of the 3 contestants who actually landed in the bottom 3, including the one who was actually eliminated (or saved, in this case). The predicted probabilities are plotted below:

Logistic probability projection based on last week's performances

Lauren appears to be over-performing somewhat. I don’t know if this is because of the huge amount of exposure that she got by being in the promos, or what it is, but she is definitely the one to watch. Pia, under this model, is not nearly as safe as people would believe. I’m getting nervous about that call. Thia was safe this week, but I imagine she’ll be gone before too long. If the forecast puts her in the bottom 3 again tomorrow, I would feel comfortable betting that she’s out.

I intend to apply the same model tomorrow, and hopefully the results will feel right. If not, well, it’s back to the drawing board.
