Civics: Using supervised machine learning to quantify political rhetoric
Using the 11,000 posts coded by Mechanical Turk workers, we split the data into five equally sized portions and trained the models five separate times, each time withholding a different 20% of the data so we could check how well the models performed on posts they had not seen during training. Not only did this process help determine whether the models worked, but it also helped us choose prediction thresholds that provided the best balance between precision and recall. The full results of the cross-validation process are shown here. In addition to cross-validation, we also compared our models’ predictions with the labels for the 1,100 posts that Center researchers coded themselves, to make sure that our models agreed not only with the Mechanical Turk workers but also with our own judgments.
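For readers who want to see the mechanics, here is a minimal sketch of five-fold cross-validation followed by threshold selection. It is an illustration under stated assumptions, not the code we ran: the function names are hypothetical, `fit` stands in for whatever training routine is used, and "balance between precision and recall" is read here as maximizing the F1 score, which is only one way to define that balance.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle the row indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def out_of_fold_scores(features, labels, fit, k=5, seed=0):
    """Train k times, each time scoring only the fold that was held out.

    `fit` is a hypothetical callable: it takes training features and
    labels and returns a function that maps one feature to a score.
    """
    scores = [None] * len(labels)
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        score = fit([features[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            scores[i] = score(features[i])
    return scores


def precision_recall(labels, scores, threshold):
    """Precision and recall when scores >= threshold count as positive."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if s >= threshold and y == 1)
    fp = sum(1 for y, s in pairs if s >= threshold and y == 0)
    fn = sum(1 for y, s in pairs if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(labels, scores, candidates):
    """Pick the candidate threshold with the highest F1 (an assumption)."""
    def f1(threshold):
        p, r = precision_recall(labels, scores, threshold)
        return 2 * p * r / (p + r) if p + r else 0.0
    return max(candidates, key=f1)
```

Because every post is scored by a model that never saw it during training, the out-of-fold scores can be pooled and passed to `best_threshold` to choose a single cutoff per topic.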
Once we had our predictions, we decided to step back from the individual posts and take a look at the overall rates at which congressional Facebook posts took sides or went local. To do so, we computed the overall weighted proportions of posts that contained each of our topics, based on the machine learning models as well as the Mechanical Turk coders and in-house researchers, and compared the three different overall estimates. Happily, they all seemed to match closely.
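The comparison described above boils down to computing a weighted proportion for each source of labels and checking how far apart the results are. The sketch below shows one way to do that; the function names are hypothetical, and the weights are simply whatever per-post weights apply, which this post does not spell out.

```python
def weighted_proportion(flags, weights):
    """Share of total weight that falls on posts flagged for a topic."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w for f, w in zip(flags, weights) if f) / total


def largest_gap(estimates):
    """Largest difference between any two sources' estimates.

    `estimates` maps a source name (for example the model, the
    Mechanical Turk coders or in-house researchers) to its proportion.
    """
    values = list(estimates.values())
    return max(values) - min(values)
```

A small `largest_gap` for a topic indicates that the three sets of labels tell roughly the same story at the aggregate level, even if they disagree on individual posts.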
Finally, we used these data to look at patterns in which members of Congress took sides, who went local and how the Facebook audience reacted to different kinds of posts. But our research doesn’t have to be the last word on this. We think other researchers and scholars interested in Congress might have their own ideas about how to use these data, so we’re publishing a dataset that contains our estimates of the rate at which individual members of Congress expressed opposition, expressed support or discussed local topics on Facebook. The dataset covers the full 114th Congress and the first year of the 115th Congress and is available here. (You’ll just need to create an account first.)
Please cite the dataset as: van Kessel, Patrick, Adam G. Hughes, and Solomon Messing. 2018. “Taking Sides on Facebook: How Congressional Outreach Changed Under President Trump.” Dataset: Pew Research Center.