Daniel Nuccio:

A group of professors is using taxpayer dollars doled out by the federal government to develop a new misinformation fact-checking tool called “Course Correct.”

National Science Foundation funding, awarded through a pair of grants in 2021 and 2022, has amounted to more than $5.7 million for the development of the tool, which, according to the grant abstracts, is intended to help reporters, public health organizations, election administration officials, and others address so-called misinformation on topics such as U.S. elections and COVID-19 vaccine hesitancy.

The $5.7 million in grant money is on top of nearly $200,000 awarded in 2020 through an NSF grant funded by the Coronavirus Aid, Relief, and Economic Security Act for a project, focused in part on mental health, from which Course Correct is said to have grown.

According to the abstract of the 2021 grant, Course Correct’s developers, a group of five professors from various institutions nationwide, are using machine learning and natural language processing techniques to identify social media posts pertaining to electoral skepticism and vaccine hesitancy, identify people likely to be exposed to misinformation in the future, and flag at-risk online communities for intervention.

From the Phase II proposal:

Overview: Democracy and public health in the United States are in crisis. These twin crises are exemplified by two major public problems: 1) vaccine hesitancy related to the COVID-19 pandemic, hindering vaccination and spilling over to other domains (e.g., flu vaccines) and 2) skepticism regarding American election integrity. These crises have resulted in 200,000 excess deaths after COVID-19 vaccines became available due to low uptake rates, especially among Black, Hispanic and Native American people, and concerted attempts to overturn the 2020 United States presidential election, culminating in an attack on the US Capitol. Networks of misinformation production and diffusion on social media platforms are ground zero for the creation, sharing, and uptake of content that spurs election skepticism and vaccine hesitancy. Journalists reported to us in Phase I that while they are trying to tame the misinformation tide, they are overwhelmed by what to check, how to effectively correct misinformation and target misinformation networks, and how to evaluate their interventions. To address these twin crises, we propose Course Correct, our innovative, four-step method to detect, test, verify, and field test a system to counter real-world misinformation flows. We propose to (1) extend our computational work to detect misinformation, using multimodal signal detection of linguistic and visual features surrounding vaccine hesitancy and electoral skepticism, coupled with network analytic methods to pinpoint key misinformation diffusers and consumers; (2) further develop A/B-tested correction strategies against misinformation, such as observational correction, using ad promotion infrastructure and randomized message delivery to optimize efficacy for countering misinformation; (3) disseminate and evaluate the effectiveness of evidence-based corrections using various scalable intervention techniques available through social media platforms by conducting a series of randomized control trials within affected networks, focusing on diffusers, not producers, of misinformation and whether our intervention system can reduce misinformation uptake and sharing; and (4) scale Course Correct into local, national, and international newsrooms, guided by our interviews and ongoing collaborations with journalists, as well as tech developers and software engineers.

Intellectual Merit: The Intellectual Merit of our project springs from the insight that the problems of both vaccine hesitancy and electoral skepticism emerge from a common set of sources: a) declines in the trust that many citizens have in political processes, public institutions, and the news media; b) accumulation of misperceptions where the acceptance of one piece of misinformation often reliably predicts the endorsement of other misinformation; c) an active online group of merchants of doubt, often driven by ideological extremism and empowered by social media recommendation algorithms, and d) growing cadres of micro-influencers within online communities who, often unintentionally, play an outsized role in fueling the spread of misinformation. Despite the rapid development, testing, approval, and delivery of safe, reliable, and effective COVID-19 vaccines, 34.5 percent of Americans are not vaccinated. Despite a clear and transparent result, several recounts, audits, and lawsuits concerning the 2020 presidential election, 40 percent of Americans do not believe the result. Good science and good electoral administration alone are not enough to foster trust in health and political institutions, outcomes, and behaviors. Converging approaches across communication, social platforms, computer science, politics, and journalism are necessary to show which networks and actors spread falsehoods, and which strategies work best for reducing sharing and endorsement behaviors on social media that amplify misinformation.

Broader Impacts: The Broader Impacts of the project include delivering: 1) Course Correct: an interactive system that enables reporters to detect high-priority misinformation topics and the underlying networks where they flow, perform rapid-response randomized testing of fact-checks, and monitor their real-time performance, 2) the underlying code, survey instruments, and databases of labeled and curated messages to share publicly, 3) evidence-based corrective messages of immediate utility to public health and electoral professionals, 4) training of research personnel and journalists in interdisciplinary topics of global and practical significance, and 5) papers and presentations that will share our findings and conclusions with the academic and broader community.

and

Network Detection of Misinformation and its Spread: To address Aim 1, we will continue our work from Phase I, using multimodal signal detection to develop a curated dataset and machine learning classifiers to discern social media posts related to COVID-19 vaccine hesitancy and adoption of misinformation about election integrity, along with the spreaders and consumers of misinformation.

We have begun creating a corpus of millions of pieces of public content on our two topics: posts, images, videos, and URLs shared on popular social media and information platforms, including, but not limited to, Twitter, Reddit, and YouTube. In Phase II, we will consult with our Advisory Board member, Dr. Kate Blackburn at TikTok, to explore adding TikTok data collection. The data collection, which has already started, will span from January 2019 to January 2023. We focus our data collection on content about (1) election administration in the 2020 U.S. election generally, with a secondary focus on the 2022 midterm elections, and (2) COVID-19, specifically vaccine hesitancy. In Phase I, using the respective platforms’ Application Programming Interfaces (APIs), we collected data about COVID-19 vaccine misinformation, including false claims that vaccines cause infertility and that COVID-19 was caused by 5G.
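For a sense of what keyword-based collection through one platform’s API might look like, here is a minimal sketch using the tweepy library against the Twitter v2 API. The bearer token, query terms, and date range are placeholders rather than the project’s actual parameters, and the full-archive search endpoint used here requires elevated (academic or enterprise) API access.

```python
import tweepy

# Placeholder credential; a real bearer token comes from the developer portal.
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

# Illustrative seed query, not the project's actual keyword list.
QUERY = '("covid vaccine" OR "election fraud") lang:en -is:retweet'

corpus = []
for page in tweepy.Paginator(
    client.search_all_tweets,          # full-archive search (elevated access)
    query=QUERY,
    start_time="2019-01-01T00:00:00Z",
    end_time="2023-01-31T00:00:00Z",
    tweet_fields=["created_at"],
    max_results=500,
):
    # Keep id, timestamp, and text for each matching post.
    for tweet in page.data or []:
        corpus.append(
            {"id": tweet.id, "created_at": tweet.created_at, "text": tweet.text}
        )
```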

The data collection will continue to be done via a snowball sampling technique in which we begin the collection with relevant seed keywords (identified with expert consultation) and then expand it with their co-occurring terms. With this corpus, we will continue to develop machine learning techniques to accurately detect electoral administration and COVID-19 related content that is directly related to attitudes about the veracity of the elections we target and the effectiveness and safety of the COVID-19 vaccines. Since posts that contain a certain keyword (e.g., ‘COVID-19’) may not all be related to the topic (many users add popular keywords so their posts get more views), we will develop a two-tier filtering process to identify the relevant posts that support or deny a specific claim. We will take inspiration from our previous work, which adopts a similar strategy to create a clean and relevant data corpus. We will build supervised machine learning classifiers for this task. The first tier of the classifier will weed out irrelevant posts, while the second tier will categorize posts as pro versus anti according to the topic.

Word embedding and multi-modal models: To enable this, our team members will label a set of 2,000-3,000 posts on each topic and mark their relevance to the topic and their stance (pro or anti). We will use the relevance labels to train a supervised classifier (e.g., an SVM, a Random Forest classifier, or a neural network), which uses text features as inputs and generates a relevance class as output. The text features will contain syntactic, semantic, lexical, and psycholinguistic categories. We will also use word embedding models (such as BERT and its variants, which will be fine-tuned on the supervised data) to extract tweet features, a direction that our Advisory Board member, Dr. Koustuv Saha, has extensively used in his research. Models trained with an ensemble of all these features will be used for both tiers (relevance in the first tier and pro or anti in the second tier). We will evaluate the performance of the trained machine learning classifiers with precision, recall, area under the ROC curve, false positive rate, and false negative rate with respect to the hand-labeled dataset. The classifier that performs the best will be used to classify the entire corpus.

As a proof of concept, in Phase I we followed this pipeline to conduct classification for one topic of COVID-19 misinformation, specifically the ‘vaccines cause infertility’ claim. The classifier achieved an F1 score of 0.9848, which shows the effectiveness of the proposed pipeline. This pipeline, however, was focused on text-based misinformation detection only.
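As a rough illustration of the two-tier filtering described above, the following scikit-learn sketch trains one classifier for relevance and a second for stance. The TF-IDF features and linear SVM are simplified stand-ins for the proposal’s richer feature set (syntactic, semantic, lexical, psycholinguistic, and fine-tuned BERT embeddings); the function names and the 0/1 label encoding are hypothetical.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report

def make_text_classifier() -> Pipeline:
    # TF-IDF unigrams/bigrams + linear SVM as a simple baseline text model.
    return Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=2)),
        ("svm", LinearSVC()),
    ])

def train_two_tier(posts, relevance_labels, relevant_posts, stance_labels):
    """Tier 1 weeds out irrelevant posts; tier 2 labels relevant ones pro/anti."""
    tier1 = make_text_classifier().fit(posts, relevance_labels)        # 1 = relevant
    tier2 = make_text_classifier().fit(relevant_posts, stance_labels)  # 1 = pro
    return tier1, tier2

def classify_corpus(tier1, tier2, corpus):
    # Apply tier 1 first, then run tier 2 only on the posts that survive it.
    relevant = [p for p, keep in zip(corpus, tier1.predict(corpus)) if keep == 1]
    if not relevant:
        return []
    return list(zip(relevant, tier2.predict(relevant)))

def evaluate(model, held_out_posts, held_out_labels):
    # Per-class precision/recall against the hand-labeled evaluation set.
    print(classification_report(held_out_labels, model.predict(held_out_posts)))
```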

In Phase II, we will extend the misinformation detection framework to a multimodal setting, i.e., integrating images/videos along with the text. When both features are available (as is the case with many social media posts), the image can often disambiguate the text (for example, making it clear whether a post is about basketball or about guns). In outline, we will develop deep multimodal fusion-based methods that leverage knowledge extraction from visual and linguistic features, as images can often complement text. Specifically, our method will encode the text using a BERT-based feature vector and …
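The passage is cut off, but a late-fusion architecture of the kind it begins to describe might look roughly like the PyTorch sketch below: a BERT text encoder and a ResNet image encoder whose feature vectors are concatenated and fed to a classification head. The choice of ResNet-18, the fusion-by-concatenation strategy, and the class count are assumptions for illustration, not the project’s actual architecture.

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchvision.models import resnet18, ResNet18_Weights

class LateFusionClassifier(nn.Module):
    """Concatenate BERT text features with ResNet image features, then classify."""

    def __init__(self, n_classes: int = 2):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        vision = resnet18(weights=ResNet18_Weights.DEFAULT)
        vision.fc = nn.Identity()  # drop the ImageNet head, keep 512-d features
        self.image_encoder = vision
        self.head = nn.Linear(768 + 512, n_classes)  # fused text + image features

    def forward(self, input_ids, attention_mask, images):
        text_feats = self.text_encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).pooler_output                            # (batch, 768)
        image_feats = self.image_encoder(images)   # (batch, 512)
        fused = torch.cat([text_feats, image_feats], dim=-1)
        return self.head(fused)
```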

Based on common forms and types of misinformation we detect, we will collaborate with our end-users at Snopes, the Wisconsin Center for Investigative Journalism, and the Capital Times in Year 1, adding International Fact-Checking Network signatories in Year 2, to co-design misinformation mitigation messages for use in evidence-based correction strategies. Interventions will be tested for effectiveness while also meeting organizational needs and journalistic norms, an aspect important for the purposes of cultural validity. Then, we will take advantage of existing sponsored content mechanisms available on platforms such as Twitter and deliver the co-designed interventions through randomized n-arm A/B testing to social media users on these platforms. Based on the pilot test we conducted in Phase I, we are confident that rapid-cycle A/B testing can help demonstrate the feasibility and efficacy of various corrective interventions, some content specific and some “evergreen” (i.e., non-content specific), and better understand which messages best reduce the endorsement and sharing of misinformation.
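For a sense of the mechanics, here is a minimal sketch of n-arm randomized assignment and outcome comparison. The arm names, the hash-based assignment, and the chi-square test are illustrative choices; actual delivery would run through each platform’s sponsored-content tools as described above.

```python
import hashlib
from scipy.stats import chi2_contingency

# Hypothetical arms: a no-message control plus two correction styles.
ARMS = ["control", "observational_correction", "evergreen_correction"]

def assign_arm(user_id: str) -> str:
    # Deterministic, roughly uniform assignment so a user always sees one arm.
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return ARMS[int(digest, 16) % len(ARMS)]

def compare_arms(shared: dict, not_shared: dict):
    # shared / not_shared map arm name -> count of users who did / did not
    # share misinformation after exposure; test whether rates differ by arm.
    table = [
        [shared[arm] for arm in ARMS],
        [not_shared[arm] for arm in ARMS],
    ]
    chi2, p, _, _ = chi2_contingency(table)
    return chi2, p
```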

We will implement Course Correct into local, national, and international newsrooms, guided by dozens of interviews and ongoing collaborations with journalists, as well as tech developers and entrepreneurs. Rather than focusing on platform restriction and fact-checking partisan political elites, Course Correct will help journalists, and ultimately public health and election administration officials, to see what misinformation is circulating on social platforms and to quickly test correction strategies within the online communities most in need of seeing those corrections so that they are exposed to the verifiable truth. We will begin scaling up on a case study basis with our local (Capital Times), state (WCIJ), and national (Snopes) partners in Year 1. Phase II supports the hiring of a new journalist for …