I was recently made aware of the following video interview with Richard Foley, who presented results of a “sentiment analysis” approach to predicting American Idol outcomes:
Put simply, sentiment analysis attempts to determine what the public thinks about a certain topic by scouring data sources like Twitter, and then trying to classify the data according to whether it was positive or negative. As far as I can tell, the results were not published outside of the conference.
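To make the idea concrete, here is a toy sketch of the simplest kind of sentiment classifier: score a tweet by counting positive and negative words from a small lexicon. The word lists and tweets below are made up for illustration; real systems are considerably more sophisticated, but as we'll see, not always by enough.

```python
# A minimal lexicon-based sentiment classifier (illustrative only;
# the word lists are hypothetical and far smaller than a real lexicon).
POSITIVE = {"love", "great", "amazing", "win", "best"}
NEGATIVE = {"hate", "awful", "terrible", "worst", "lose"}

def classify(tweet: str) -> str:
    # Lowercase, split on whitespace, and strip trailing punctuation.
    words = [w.strip(".,!?*") for w in tweet.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love Heejun, he is amazing"))  # positive
print(classify("I NEED A HUG HEEJUN"))           # neutral
```

Note that the second tweet, clearly enthusiastic, scores as neutral because no lexicon word appears in it; this is exactly the kind of misclassification discussed below.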
I’m highly skeptical of this method for several reasons.
First, check out the results of doing a sentiment analysis on the term “heejun”, nearly all instances of which are relevant to Idol. You can see some results here on Twitter Sentiment or here on Social Mention. There are some obvious problems here, in that the actual sentiment classification is quite rough. Tweets like “@HHanAI11 I NEED A HUG HEEJUN! *holds out arms* ?” are classified as negative, and copious positive tweets are wrongly classified as neutral (e.g. “I would seriously love to see Heejun win American Idol.”, “I dont know if I can wait until Tuesday for American Idol!!I want to hear Phil,Heejun,and others sing!!!!”, etc.). The former website scores the sentiment as 105/8 (pos/neg) at the time of this writing. Nearly all of the contestants have a similar ratio, indicating that people are more likely to express positive sentiment than negative.
How would we count this versus a term like “Baylie”? First of all, not all of the tweets containing that term refer to Baylie Brown, the contestant from this year. The problem is even more acute for someone like Aaron Sanders (aka Aaron Marcellus), for whom even search terms like “aaron idol” do not necessarily mean this contestant (many of them pertain to Aaron Kelly). Secondly, the total number of tweets is much lower, so it’s unclear how to judge the disposition of Baylie relative to Heejun. If Baylie has 10/1 (pos/neg), should we say she has 90% popularity? In that case, she and Heejun are in a dead heat. Does anybody really think that’s true?
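The sample-size problem can be made precise. A standard way to quantify the uncertainty in a proportion is a Wilson score interval: with only 11 tweets, Baylie’s “90% positive” comes with a huge error bar, while Heejun’s 105/8 is comparatively tight. A quick sketch (using the numbers above purely as an illustration):

```python
import math

def wilson_interval(pos: int, total: int, z: float = 1.96) -> tuple:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if total == 0:
        return (0.0, 1.0)
    p = pos / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return (center - half, center + half)

# Heejun: 105 positive out of 113; "Baylie": 10 positive out of 11.
print(wilson_interval(105, 113))  # roughly (0.87, 0.96) -- fairly tight
print(wilson_interval(10, 11))    # roughly (0.62, 0.98) -- very wide
```

The point: the raw ratios look nearly identical, but the interval for the smaller sample is almost four times wider, so treating the two “90%” figures as equivalent is unjustified.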
On the first point, I could go on. How would you even search for Adam Brock? “Adam” mostly refers to Adam Lambert, even when you filter for very recent tweets. You could narrow it down to Brock’s actual Twitter account, but that would likely skew the results even further positive, since most people won’t tweet a negative comment at the person directly. Narrowing it to his full name restricts the data set so much that the noise problem is exacerbated.
Finally, and most importantly, the main measure of any independent variable’s predictive power is historical. How good was that variable at predicting previous results? Well, we simply don’t have data like that for a sentiment analysis. Twitter itself was only established in 2006, and only became popular in about 2008 or 2009. That is, there is no way to test whether Twitter sentiment was any good at indicating past results. It’s possible (likely, even) that the people who vote don’t use Twitter very much.
The point of all this is that sentiment analysis, though it may be useful in some contexts, is probably not very useful for predicting American Idol. The negative sentiment is buried in the large noise of sycophantic prattle, the search terms are incredibly difficult to form because of the large number of variations of someone’s name, the sentiment analysis itself isn’t very good as of now, and there is no historical record to work from in evaluating the data and proposing a model. Yes, if we had a super-intelligent robot who went through each tweet individually, scored it properly, and tabulated the results in a historical way, a model could be built that relied on the past few years’ Idol competitions. When this eventuality occurs, I will revisit this notion. For now, I think this just isn’t ready for prime time.