I've had some readers ask for more information on the methods I used to generate the predictions of which candidate the remaining unpledged Democratic superdelegates are more likely to support. This post may be a bit technical for some, but I'll do my best to explain the methodology and the logic behind it.
The idea here is that we have a lot of information about which superdelegates have endorsed and which candidate each of them endorsed. I argue that, based on that information, we should be able to get an idea of how the unpledged superdelegates might cast their votes. This is particularly true if superdelegates choose their endorsements based, at least partly, on what their constituents would want.
To model this, I first throw out superdelegates from IL, AR, and NY, because they all seem to be backing their favorite son/daughter. I then use a logit model to predict whether a superdelegate who has already endorsed chose to support Clinton or Obama. The factors I include in this model are the superdelegate's gender and whether the superdelegate is a DNC member (the idea being that DNC members may behave differently from elected officials). I also include the 2004 vote for Bush (%) in that superdelegate's constituency (whether it be the state or the congressional district), the percentage of the superdelegate's state that is unionized, the percentage of the state that is college educated, the percentage of the state's population living in urban areas, and the state's per capita income. These are all factors that the exit polls have shown to have an effect on whether one votes for Obama or Clinton, so incorporating them into the model allows me to capture whether the superdelegate comes from a state more predisposed to one candidate over the other.
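For readers who want to see the mechanics, here is a minimal sketch of a logit fit, written in plain Python so it runs without any statistics package. It is not the code behind the estimates in this post. The data are randomly generated, only three of the predictors are included, the coding of the outcome (1 = Clinton, 0 = Obama) is an assumption, and every name (`fit_logit`, `true_beta`, and so on) is made up for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(beta, row):
    # beta[0] is the intercept; the rest line up with the features in row.
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], row)))

def log_likelihood(beta, X, y):
    ll = 0.0
    for row, yi in zip(X, y):
        p = predict_proba(beta, row)
        ll += math.log(p) if yi == 1 else math.log(1.0 - p)
    return ll

def fit_logit(X, y, lr=0.5, n_iter=2000):
    """Fit a logit by batch gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            xi = [1.0] + list(row)
            resid = yi - predict_proba(beta, row)
            for j in range(k + 1):
                grad[j] += resid * xi[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Synthetic stand-in data (NOT the real superdelegate data).
# Hypothetical features: female (0/1), DNC member (0/1),
# and a standardized state unionization rate.
random.seed(0)
true_beta = [0.2, 1.0, -0.3, 0.5]  # invented values used only to simulate
X, y = [], []
for _ in range(300):
    row = [random.randint(0, 1), random.randint(0, 1), random.gauss(0, 1)]
    p = predict_proba(true_beta, row)
    X.append(row)
    y.append(1 if random.random() < p else 0)  # 1 = Clinton, 0 = Obama (assumed coding)

beta_hat = fit_logit(X, y)
fitted_probs = [predict_proba(beta_hat, row) for row in X]
print("estimated coefficients:", [round(b, 2) for b in beta_hat])
```

In practice one would hand this step to a statistics package rather than hand-roll the optimizer; the loop is spelled out here only to show what the model is doing.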
Based on these factors, the model is able to correctly predict 73% of the superdelegates who have already pledged their support to one candidate or the other. The strongest factors influencing whether a superdelegate backed Clinton were whether the superdelegate was female and whether the superdelegate's state was highly unionized and more urban. Obama fared better among male superdelegates and those from states with larger college-educated populations. Interestingly, the race and ethnicity of the superdelegate did not appear to have a significant influence on whom a superdelegate supported.
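The "percent correctly predicted" figure is simple to compute once you have a predicted probability for each superdelegate. The sketch below assumes the usual 0.5 cutoff, which may not be exactly how the 73% above was computed, and uses eight invented probabilities and outcomes. The `modal_baseline` helper is an addition of mine for context: it shows how often you would be right by always guessing the more common outcome.

```python
def percent_correct(probs, outcomes, threshold=0.5):
    # A prediction counts as correct when the probability falls on the
    # same side of the threshold as the observed 0/1 outcome.
    hits = sum(1 for p, y in zip(probs, outcomes) if (p >= threshold) == (y == 1))
    return 100.0 * hits / len(outcomes)

def modal_baseline(outcomes):
    # Percent correct from always guessing the most common outcome.
    ones = sum(outcomes)
    return 100.0 * max(ones, len(outcomes) - ones) / len(outcomes)

# Invented example values: 1 = Clinton, 0 = Obama (assumed coding).
probs = [0.81, 0.35, 0.62, 0.48, 0.10, 0.55, 0.70, 0.20]
outcomes = [1, 0, 0, 1, 0, 1, 1, 0]

pcp = percent_correct(probs, outcomes)
baseline = modal_baseline(outcomes)
print(pcp, baseline)  # 75.0 50.0
```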
Now, I could just use the logit model to predict the unpledged superdelegates, but if I do that I'm not accounting for the fact that there is likely something about superdelegates who have already pledged that makes them different from those who have chosen not to. To account for this, and this is where it gets really technical, I employ a Heckman probit selection model. Essentially, this model first estimates whether a superdelegate has chosen to support a candidate at all up to this point and, if they have, then estimates which candidate they chose to support (using the information described above). I find that the two factors influencing whether a superdelegate has endorsed at all at this point are whether the superdelegate's state has had its primary/caucus yet and whether the superdelegate is a DNC member (rather than an elected official). Superdelegates were less likely to have endorsed anyone if their state had not yet voted and if they were a DNC member.
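For the technically inclined, the two-stage logic can be written out as follows. This is the textbook form of a probit model with sample selection, given here as a sketch in my own notation rather than a transcript of the estimation: z_i holds the selection factors (whether the state has voted, DNC membership), x_i holds the candidate-choice factors described above, and the two error terms are assumed to be standard bivariate normal with correlation rho.

```latex
\begin{aligned}
s_i &= \mathbf{1}\left[\, z_i'\gamma + u_i > 0 \,\right]
    && \text{(superdelegate has endorsed)} \\
y_i &= \mathbf{1}\left[\, x_i'\beta + \varepsilon_i > 0 \,\right]
    && \text{(endorsed Clinton; observed only if } s_i = 1\text{)} \\
\ln L &= \sum_{s_i = 1,\ y_i = 1} \ln \Phi_2\!\left(x_i'\beta,\ z_i'\gamma;\ \rho\right)
       + \sum_{s_i = 1,\ y_i = 0} \ln \Phi_2\!\left(-x_i'\beta,\ z_i'\gamma;\ -\rho\right)
       + \sum_{s_i = 0} \ln \Phi\!\left(-z_i'\gamma\right)
\end{aligned}
```

Here Phi is the standard normal CDF and Phi_2 is the bivariate normal CDF. If rho is zero, the two equations separate and the second stage reduces to an ordinary probit on the superdelegates who have endorsed; a nonzero rho is what captures the idea that those who have already pledged differ systematically from those who have not.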
Using the Heckman probit selection model, I then generate the predicted probability that a superdelegate who has not yet endorsed will endorse Clinton or Obama when they cast their vote. As I noted in the earlier post, I find that more of the unpledged superdelegates would be favorably disposed to Obama than to Clinton.
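Once the candidate-choice coefficients are in hand, turning them into a prediction for an unpledged superdelegate is a one-line calculation: plug the superdelegate's characteristics into the equation and pass the result through the normal CDF. The sketch below shows that step only. The coefficients, the two predictors, and the delegate labels are all invented, and it assumes the outcome is coded 1 = Clinton.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_clinton(beta, x):
    # beta[0] is the intercept; the rest line up with the values in x.
    index = beta[0] + sum(b * v for b, v in zip(beta[1:], x))
    return norm_cdf(index)

# Hypothetical coefficients: intercept, female (0/1),
# standardized share of the state that is college educated.
beta = [-0.10, 0.45, -0.20]

# Hypothetical unpledged superdelegates and their characteristics.
unpledged = {
    "Delegate A": [1, 0.5],
    "Delegate B": [0, 1.2],
}

leanings = {}
for name, x in unpledged.items():
    p = prob_clinton(beta, x)
    leanings[name] = "Clinton" if p >= 0.5 else "Obama"
    print(f"{name}: P(Clinton) = {p:.2f}, leans {leanings[name]}")
```

Tallying the leanings across all unpledged superdelegates gives the kind of summary count reported in the earlier post, though the probabilities themselves carry more information than the binary call.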
Of course, there are several reasons to be cautious about these estimates. First, the model only correctly predicts 73% of the superdelegates who have already pledged. This means the model is wrong roughly one out of every four times even for the superdelegates it was built on, and it may well do worse for those who have not yet pledged. Second, these models rely heavily on what has already happened, yet the dynamics of the race are changing significantly each day. What led someone to endorse a particular candidate two months ago may not be the same thing influencing where they stand now. Third, as I've noted elsewhere, I suspect that a significant number of superdelegates are going to vote for whichever candidate is ahead in pledged delegates rather than for the candidate they might support otherwise. Obviously this model is not necessary if that happens.
Nonetheless, this is an interesting exercise and it will be fun to see how the results turn out in the coming weeks. I will try to update the model at least once a week for as long as there is a reason to do it. As always, thanks to Alicia Prevost and Caitlin Zook for their help collecting the data.
If you have more questions about the methodology, leave a comment and I'll be happy to answer them.