Text as Data (TADA 🪄) Conference 2022

October 6-7, 2022

Conference: October 6-7, 2022 at Cornell Tech, Roosevelt Island, New York City

All events will take place in the Verizon Executive Education Center (VEEC). From the Tram and F train stops, walk south along the river until you get to the Cornell Tech campus. The VEEC is on your left after the Graduate Hotel.

Online registration is available for $25 (to cover AV support staff) until Wednesday, Oct 5.

Registration form

For questions, write to info@tada2022.org.

The New Directions in Analyzing Text as Data (TADA) meeting is a leading forum for research on the study of politics, society, and culture through computational analysis of documents. Recent advances in NLP have the potential to revolutionize how we study human society. But using these tools effectively, reliably, and equitably requires continuous dialog among experts in computational methods, social science, and the humanities.

TADA 2022 invites applications for research presentations on new work related to text-as-data methods and applications. TADA is an interdisciplinary conference, drawing scholars from across the social sciences, computer and information science, and related fields. Our programs from past meetings (TADA 2018, TADA 2019, and TADA 2021) show the wide range of work presented at our conference.


  Thursday Oct 6
8:00 Breakfast
9:00 Opening remarks
9:15 Contributed talks 1
10:30 Break
11:00 Keynote Speaker, Julia Silge
12:00 Lunch (provided)
1:00 Contributed talks 2
2:15 Poster session A
3:15 Break
3:30 Contributed talks 3
  Dinner on your own
  Friday Oct 7
8:00 Breakfast
8:55 Remarks
9:00 Cassandra Project @ TADA Roundtable
10:00 Poster session B
11:00 Contributed talks 4
  Lunch (provided)

Keynote Speaker: Julia Silge

Julia Silge is a data scientist and software engineer at RStudio PBC where she works on open source modeling tools. She is an author, an international keynote speaker, and a real-world practitioner focusing on text mining, data analysis, and machine learning. Julia loves making beautiful charts and communicating about technical topics with diverse audiences.

Key Dates

This year’s conference will be held at Cornell Tech on Roosevelt Island and is sponsored by the Cornell Center for Social Science, the Cornell Center for Data Science for Enterprise and Society, and the National Science Foundation. Events will take place at the Verizon Executive Education Center on the Cornell Tech campus.

Accommodations

The Graduate Hotel Roosevelt Island is immediately adjacent to the conference location. Roosevelt Island is also accessible on the F train from Manhattan and Queens. Locations around Bryant Park are particularly convenient. Note that subway service may be limited after 9pm.

Proposals are due July 18 and consist of a brief, 300-word abstract in text format rather than a full paper. TADA 2022 is a non-archival conference; there are no formal proceedings, and papers presented at the conference will not be distributed publicly by the conference. Presenters are expected to provide a paper to their discussant two weeks before the conference. We welcome any work, so long as it hasn’t been previously presented at a TADA conference. We also welcome volunteers to serve as discussants.

In addition to oral presentations and posters, TADA 2022 will have a doctoral consortium. PhD students will be matched with experienced mentors from complementary fields, who will offer critiques of specific work and guidance on how to do effective interdisciplinary work.

Diversity leads to stronger science. We actively seek, welcome, and encourage people with diverse backgrounds, experiences, and identities to apply and attend. While many participants have attended TADA for years, we also eagerly welcome new researchers!


Talks should be 12-15 minutes, leaving time for discussant remarks and audience questions.

Title Author
Where Did It Come From? Deep Learning for Event Extraction in Art Provenance Fabio Mariani
Immigration and Social Distance: Evidence from Newspapers during the Age of Mass Migration Elliott Ash, Gloria Gennaro, Dominik Hangartner, Alessandra Stampi-Bombelli
Do Journalists Overstate Science? Findings from Computational Modeling of Scientific (Un)certainty Jiaxin Pei and David Jurgens
  Discussant: Stephen Downie
Title Author
How does rising inflation affect EV charging cost and consumer sentiment? Sarthak Chaturvedi, Omar Isaac Asensio; Georgia Institute of Technology
Conceptualization of ESG in corporate discourse: a computational text analytic approach Ilya Akdemir
A Graph-Augmented Generative Entity-to-Entity Stance Detection Framework Xinliang Frederick Zhang, Nick Beauchamp, Lu Wang
  Discussant: Ken Benoit
Title Author
Challenges in Opinion Manipulation Detection: An Examination of Wartime Russian Media Chan Young Park, Julia Mendelsohn, Anjalie Field, Yulia Tsvetkov
Strengthening Propaganda and the Limits of Media Commercialization in China: Evidence from Millions of Newspaper Articles Margaret Roberts, Brandon Stewart, Hannah Waight, and Yin Yuan
Was It Political? Interpretations of the 1967 Detroit Rebellion by Detroit Residents Fifty Years Later Tina Law
  Discussant: Sarah Dreier
Title Author
Towards measuring populism from text Ines Rehbein, Christopher Klamm, Simone Ponzetto
Sounding the Bullhorn: Surfacing and Analyzing Dogwhistles with Language Models Julia Mendelsohn, Maarten Sap, Ronan Le Bras
News Media Consolidation and Ideological Positioning Pierre Bodéré, Nicolas Longuet Marx, Marguerite Obolensky
  Discussant: Laure Thompson

Poster sessions

Posters may be in any shape up to A0 size.

Poster session A (Thursday)

Title Authors
Computational Text Analysis of Binding Language in Administrative Guidance Amit Haim
A Versatile Data Annotation System Yikai Liu, Mingye Chen, Naihao Deng, Yulong Chen
“Tell China’s Story Well” on YouTube: How do pro-Beijing influencers (re)shape China’s global narratives Ryan Wang
OCR Correction of Historical Texts with Pre-Trained Language Models Chris Buckley, Melissa M. Lee, Brandon M. Stewart
Dictionary Enrichment with Word Embedding: Tracking Online Incivility in Hong Kong Hai Liang, Yee Man Margaret Ng, & Nathan L.T. Tsang
Quantifying the Causal Effect of Gender on Interruptions in Supreme Court Oral Arguments Katherine Keith, Ankita Gupta, Erica Cai, Brendan O’Connor, Douglas Rice
Medical Misinformation during a Pandemic: Text as Data during the Russian Influenza (1889-1890) E. Thomas Ewing
Scaling latent political positions from textual data using word embeddings Patrick Schwabl
How to stop ignoring automated classification errors: Differential measurement error and inter-coder reliability in measurement error models Nathan TeBlunthuis, Valerie Hase, Chung-hong Chan
How Questions Can Propagate Online Mis- and Dis-information Kaitlyn Zhou and Dan Jurafsky
Construction and Analysis of a Map-Based Data Corpus for Tracking Linguistic Variation and Demographic Characteristic Identification Theodore Daniel Manning, Eugenia Lukin, Ross Klein, James Cooper Roberts, Eliana Mugar, Michael Fang, Harleigh Niyu, Alejandro Napolitano-Jawerbaum, Patrick Juola
The impact of social media reaction design on political discourse: A quasi-experimental analysis of 155 million comments on Reddit Orestis Papakyriakopoulos, Severin Engelmann, Amy Winecoff
Removing the Heavy Burden of Corruption: Media, Movements, and Politics in the Grand Corruption Reform in South Korea, 2016-2017 Hyunsik Chun, Ion Bogdan Vasi, Chanhum Yoon
Measures and Interventions for improving workplace feedback Michael Yeomans & Ariella Kristal
Synthetic text for supervised text analysis Andrew Halterman
Causal Attributions in Textual Data Paulina García Corral
Seeing Like a Topic Model Bolun Zhang, Yimang Zhou, Dai Li
“Get Out and Vote,” Or “You Can’t Complain”: Non-Voters on Twitter During the 2016 and 2020 U.S. Elections Chelsea Butkowski, Sam Wilson, Eric Wiemer
Dictionary-Assisted Supervised Contrastive Learning Patrick Y. Wu, Richard Bonneau, Joshua A. Tucker, Jonathan Nagler
The Rise of and Demand for Identitarian Media Coverage Daniel Hopkins, Yphtach Lelkes, Samuel Wolken
Cambridge Law Corpus: A corpus for research on legal AI Andreas Östling; Holli Sargeant; Ludwig Bull; Alex Terenin; Leif Jonsson; Måns Magnusson; Felix Steffek
Filtering Technologies and the Fairness of Natural Language Systems Eddie Yang, Chad Atalla, Su Lin Blodgett, Kate Cook, Kristen Laird, Emily Lawton, Michael Madaio, Samir Passi, Forough Poursabzi, Vyoma Raman, Bella Rideau, Emily Sheng, Dan Vann, Andy Zhao, Solon Barocas, Hanna Wallach

Poster session B (Friday)

Title Authors
Multilingual Word Embeddings for Social Scientists: Estimation, Inference and Validation Resources for 157 Languages Pedro L. Rodriguez, Arthur Spirling, Brandon M. Stewart, Elisa M. Wirsching
Bridging Topic Modeling and Framing Theory Arya D. McCarthy and Giovanna Maria Dora Dore
Do Politicians Collaborate? Measuring Coordination in Political Discourse Katherine Atwell, Michael Datz, Max Goplerud, Tessa Provins, Malihe Alikhani
COVID-19 Public Opinion on Turkish Twittersphere Burak Ozturan, Yunus Emre Tapan
Aligning Large Natural Language Documents Tanzir Pial, Steven Skiena
Exploring conflicting values in the founding of ARPANET and the Internet Meera Desai
LegisBERT: A Language Model for the Analysis of Legislative Text Mitchell Bosley
What now? - a Twitter textual analysis of the abortion debates with the shifting policies in the US Jialin Shan, Tuan-he Lee, Hanfei Li
Narrative Detection Across Political Domains Maria Antoniak, Elliott Ash
Spoken Identity: Titular Language Usage and the War in Ukraine Erin Walk
Signaled or Suppressed? How Gender Informs Women’s Undergraduate Applications in Biology and Engineering Sonia Giebel, AJ Alvero, Ben Gebre-Medhin, anthony lising antonio
Affective Idiosyncratic Responses to Music Sky CH-Wang, Evan Li, Oliver Li, Smaranda Muresan, Zhou Yu
Inferring Age from Linguistic and Verbal Cues in Celebrity Interviews Yunting Yin, Steven Skiena
A Step-by-Step Protocol for Curation of Topic Models by Subject Matter Experts Philip Resnik, Pranav Goel, Alexander Hoyle, Rupak Sarkar, Josh Hagedorn, Maeve Gearing, and Carol Bruce
Who gets a say in this? Speaking security on social media Natalia Umansky
Learning from Machines: Differentiating US Presidential Campaigns with Attribution and Annotation Musashi Jacobs-Harukawa
Finding the story: Leveraging expert knowledge in computational sensemaking of multi-platform text data Hope Schroeder, Tobin South
What’s a Parent to do? Measuring the Cultural Logics of Parenting with Biterm Topic Models Orestes P. Hastings, Luca Maria Pesando
Decoding matrimonial advertisements: Individual preferences entrenched in socio cultural biases Pranathi Iyer
Gendered Information in Resumes and Hiring Bias: A Predictive Modeling Approach Prasanna Parasurama, Joao Sedoc, Anindya Ghose

Roundtable Discussion

The Cassandra Project at Johns Hopkins has organized, at TADA, its third roundtable in the "Learning How to Play with the Machines" series on computational social science.

Title Authors
Misinformation and dataset biases Panelists: Kathy McKeown, David Mimno, Sarah Shugars, Arthur Spirling
  Chairs: Giovanna Maria Dora Dore, Eva Klaus, Arya D. McCarthy