Monday, April 28, 2008

Answering Questions About the Superdelegate Predictions

Since I began generating the Superdelegate predictions, I've gotten several questions from readers about how I generate them or why various things are missing from the model. For some of these questions, you might find answers in Carl Bialik's recent blog post about this site. I've also taken a stab at three questions below...

1) What about John Edwards (or other missing superdelegates)?

We originally left out special superdelegates, like John Edwards (CORRECTION: John Edwards is NOT a superdelegate), who hold their positions because they are former presidents, former speakers of the House, etc. The issue with these delegates is that, unlike active politicians, they don't have an obvious constituency, which makes their behavior far less predictable.

If you were to assume that John Edwards' constituency is the people of North Carolina, then the model would likely predict that he would support Obama (as it does for other male politicians from North Carolina). However, he is a bit of a free agent since he doesn't really have to answer to any particular group, and that makes it far more difficult to model his behavior (or that of other superdelegates like him).

CORRECTION: Thanks to Matt from the Democratic Convention Watch site for pointing out that John Edwards is NOT a superdelegate. Being a VP candidate on a losing ticket does not qualify you for a vote. Since we took our superdelegate list straight from the Democratic Convention Watch website, Edwards has never been in our dataset, so this has not affected our predictions.

2) Why is race not also included in the model? I would expect that it would have similar predictive power to sex.

When my research assistants originally collected the data for these models, it was very hard to find information like race for some of the superdelegates who are not governors or members of Congress. Many of these superdelegates were not well known, and Google searches revealed little about their race or ethnicity. This remains true for at least some of the superdelegates, so we had to exclude this variable from the model. If there is a central source of information that would allow us to fill in the holes on the race of superdelegates, let me know and we will add it to the model.

For what it's worth, I re-ran the model with the superdelegates that I did know the race of and found that the race of the superdelegate was NOT a statistically significant predictor of who a superdelegate chose to support.

3) Does your methodology take into account the fact that a large number of pledged superdelegates made their commitments (primarily) to Clinton prior to anyone believing that there would be a viable alternative candidate?

We use a two-stage model that first accounts for the factors affecting whether a superdelegate has endorsed at all and then estimates which candidate that superdelegate endorsed (if he or she has endorsed). To some extent, this should capture the fact that potential Clinton endorsers were more likely to have already endorsed, while more potential Obama supporters may be waiting to make sure he is actually going to win the nomination. Nevertheless, in recent work I have begun considering a different approach: modeling when superdelegates made their decisions (for example, before or after Super Tuesday) and including that as a factor in the model. I may do that in the next iteration of the predictions.
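
For readers curious what a two-stage setup can look like in practice, here is a minimal sketch in Python. It is not the model behind the predictions on this site. It assumes two separate logistic regressions (one for "has endorsed", one for "endorsed Obama" among endorsers), whereas a full selection model would also link the two stages statistically. The data are randomly generated, and the two predictors (`female`, `obama_margin`) and every coefficient used to simulate them are made up for illustration.

```python
import math
import random

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, x):
    # w[0] is the intercept; the remaining weights pair with the features in x.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

def fit_logit(X, y, lr=0.5, epochs=500):
    # Plain batch gradient descent on the logistic log-loss.
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

# Synthetic superdelegates (entirely invented, not real data).
# Features: female (0/1) and obama_margin (hypothetical Obama vote share
# in the delegate's state minus 0.5).
random.seed(0)
rows = []
for _ in range(200):
    female = random.randint(0, 1)
    obama_margin = random.uniform(-0.2, 0.2)
    p_endorse = sigmoid(-0.2 + 0.8 * female - 3.0 * obama_margin)
    endorsed = 1 if random.random() < p_endorse else 0
    choice = None  # candidate choice is unobserved for non-endorsers
    if endorsed:
        p_obama = sigmoid(-1.0 * female + 8.0 * obama_margin)
        choice = 1 if random.random() < p_obama else 0
    rows.append(((female, obama_margin), endorsed, choice))

# Stage 1: who has endorsed at all? Fit on every superdelegate.
stage1_w = fit_logit([x for x, _, _ in rows], [e for _, e, _ in rows])

# Stage 2: among endorsers only, who endorsed Obama (1) vs. Clinton (0)?
endorsers = [(x, c) for x, e, c in rows if e == 1]
stage2_w = fit_logit([x for x, _ in endorsers], [c for _, c in endorsers])

# Predict P(Obama) for each superdelegate who has not yet endorsed.
undecided = [x for x, e, _ in rows if e == 0]
predictions = [predict_proba(stage2_w, x) for x in undecided]

print("undecided superdelegates:", len(undecided))
print("predicted to lean Obama:", sum(1 for p in predictions if p > 0.5))
```

Keeping the stages separate makes the logic easy to follow, but it also means the second stage is fit only on people who chose to endorse early, which is exactly the concern raised in the question. Adding the timing of each endorsement as a predictor, as described above, would be one way to address that.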

Thanks for the questions, and let me know if you have more.


Matt said...

John Edwards is not a superdelegate.

Brian Schaffner said...

Oops, you are right. I'll fix the post.

Corinne said...

Are you treating all delegates as a training set? Your prospective testing set is 4 out of 8 correctly classified.