Text as Data (TADA 🪄) Conference 2022

October 6-7, 2022

Conference: October 6-7, 2022 at Cornell Tech, Roosevelt Island, New York City

All events will take place in the Verizon Executive Education Center (VEEC). From the Tram and F train stops, walk south along the river until you reach the Cornell Tech campus. The VEEC is on your left after the Graduate Hotel.

Online registration costs $25 (to cover AV support staff) and is open until Wednesday, Oct 5.

Registration form

For questions write to info@tada2022.org

The New Directions in Analyzing Text as Data (TADA) meeting is a leading forum for research on the study of politics, society, and culture through computational analysis of documents. Recent advances in NLP have the potential to revolutionize how we study human society. But using these tools effectively, reliably, and equitably requires continuous dialog between experts across computational methods, social science, and the humanities.

TADA 2022 invites applications for research presentations on new work related to text-as-data methods and applications. TADA is an interdisciplinary conference, drawing scholars from across the social sciences, computer and information science, and related fields. Our programs from past meetings (TADA 2018, TADA 2019, and TADA 2021) show the wide range of work presented at our conference.

Schedule

	Thursday Oct 6
8:00	Breakfast
9:00	Opening remarks
9:15	Contributed talks 1
10:30	Break
11:00	Keynote Speaker, Julia Silge
12:00	Lunch (provided)
1:00	Contributed talks 2
2:15	Poster session A
3:15	Break
3:30	Contributed talks 3
	Dinner on your own

	Friday Oct 7
8:00	Breakfast
8:55	Remarks
9:00	Cassandra Project @ TADA Roundtable
10:00	Poster session B
11:00	Contributed talks 4
	Lunch (provided)

Keynote Speaker: Julia Silge

Julia Silge is a data scientist and software engineer at RStudio PBC where she works on open source modeling tools. She is an author, an international keynote speaker, and a real-world practitioner focusing on text mining, data analysis, and machine learning. Julia loves making beautiful charts and communicating about technical topics with diverse audiences.

Key Dates

Monday July 18, abstract submission deadline
Monday Aug 15, notification of selection
Friday Sep 2, registration opens for participation
Thursday Sep 22, full papers due to discussants
Thursday Oct 6 – Friday Oct 7, conference

This year’s conference will be held at Cornell Tech on Roosevelt Island and is sponsored by the Cornell Center for Social Science, the Cornell Center for Data Science for Enterprise and Society, and the National Science Foundation. Events will take place at the Verizon Executive Education Center on the Cornell Tech campus.

Accommodations

The Graduate Hotel Roosevelt Island is immediately adjacent to the conference location. Roosevelt Island is also accessible on the F train from Manhattan and Queens. Locations around Bryant Park are particularly convenient. Note that subway service may be limited after 9pm.

Proposals are due July 18, and consist of a brief, 300-word abstract in text format rather than a full paper. TADA 2022 is a non-archival conference; there are no formal proceedings, and papers presented at the conference will not be distributed publicly by the conference. Presenters are expected to provide a paper to their discussant two weeks before the conference. We welcome any work, so long as it hasn’t been previously presented at a TADA conference. We also welcome individuals to volunteer to serve as discussants.

In addition to oral presentations and posters, TADA 2022 will have a doctoral consortium. PhD students will be matched with experienced mentors from complementary fields, who will offer critiques of specific work and provide guidance on how to do effective interdisciplinary work.

Diversity leads to stronger science. We actively seek, welcome, and encourage people with diverse backgrounds, experiences, and identities to apply and attend. While many participants have attended TADA for years, we also eagerly welcome new researchers!

Talks

Talks should be 12-15 minutes, leaving time for discussant remarks and audience questions.

Title	Author
Where Did It Come From? Deep Learning for Event Extraction in Art Provenance	Fabio Mariani
Immigration and Social Distance: Evidence from Newspapers during the Age of Mass Migration	Elliott Ash, Gloria Gennaro, Dominik Hangartner, Alessandra Stampi-Bombelli
Do Journalists Overstate Science? Findings from Computational Modeling of Scientific (Un)certainty	Jiaxin Pei and David Jurgens
	Discussant: Stephen Downie

Title	Author
How does rising inflation affect EV charging cost and consumer sentiment?	Sarthak Chaturvedi, Omar Isaac Asensio; Georgia Institute of Technology
Conceptualization of ESG in corporate discourse: a computational text analytic approach	Ilya Akdemir
A Graph-Augmented Generative Entity-to-Entity Stance Detection Framework	Xinliang Frederick Zhang, Nick Beauchamp, Lu Wang
	Discussant: Ken Benoit

Title	Author
Challenges in Opinion Manipulation Detection: An Examination of Wartime Russian Media	Chan Young Park, Julia Mendelsohn, Anjalie Field, Yulia Tsvetkov
Strengthening Propaganda and the Limits of Media Commercialization in China: Evidence from Millions of Newspaper Articles	Margaret Roberts, Brandon Stewart, Hannah Waight, and Yin Yuan
Was It Political? Interpretations of the 1967 Detroit Rebellion by Detroit Residents Fifty Years Later	Tina Law
	Discussant: Sarah Dreier

Title	Author
Towards measuring populism from text	Ines Rehbein, Christopher Klamm, Simone Ponzetto
Sounding the Bullhorn: Surfacing and Analyzing Dogwhistles with Language Models	Julia Mendelsohn, Maarten Sap, Ronan Le Bras
News Media Consolidation and Ideological Positioning	Pierre Bodéré, Nicolas Longuet Marx, Marguerite Obolensky
	Discussant: Laure Thompson

Poster sessions

Posters may be in any shape up to A0 size.

Poster session A (Thursday)

Title	Authors
Computational Text Analysis of Binding Language in Administrative Guidance	Amit Haim
A Versatile Data Annotation System	Yikai Liu, Mingye Chen, Naihao Deng, Yulong Chen
“Tell China’s Story Well” on YouTube: How do pro-Beijing influencers (re)shape China’s global narratives	Ryan Wang
OCR Correction of Historical Texts with Pre-Trained Language Models	Chris Buckley, Melissa M. Lee, Brandon M. Stewart
Dictionary Enrichment with Word Embedding: Tracking Online Incivility in Hong Kong	Hai Liang, Yee Man Margaret Ng, & Nathan L.T. Tsang
Quantifying the Causal Effect of Gender on Interruptions in Supreme Court Oral Arguments	Katherine Keith, Ankita Gupta, Erica Cai, Brendan O’Connor, Douglas Rice
Medical Misinformation during a Pandemic: Text as Data during the Russian Influenza (1889-1890)	E. Thomas Ewing
Scaling latent political positions from textual data using word embeddings	Patrick Schwabl
How to stop ignoring automated classification errors: Differential measurement error and inter-coder reliability in measurement error models	Nathan TeBlunthuis, Valerie Hase, Chung-hong Chan
How Questions Can Propagate Online Mis- and Dis-information	Kaitlyn Zhou and Dan Jurafsky
Construction and Analysis of a Map-Based Data Corpus for Tracking Linguistic Variation and Demographic Characteristic Identification	Theodore Daniel Manning, Eugenia Lukin, Ross Klein, James Cooper Roberts, Eliana Mugar, Michael Fang, Harleigh Niyu, Alejandro Napolitano-Jawerbaum, Patrick Juola
The impact of social media reaction design on political discourse: A quasi-experimental analysis of 155 million comments on Reddit	Orestis Papakyriakopoulos, Severin Engelmann, Amy Winecoff
Removing the Heavy Burden of Corruption: Media, Movements, and Politics in the Grand Corruption Reform in South Korea, 2016-2017	Hyunsik Chun, Ion Bogdan Vasi, Chanhum Yoon
Measures and Interventions for improving workplace feedback	Michael Yeomans & Ariella Kristal
Synthetic text for supervised text analysis	Andrew Halterman
Causal Attributions in Textual Data	Paulina García Corral
Seeing Like a Topic Model	Bolun Zhang, Yimang Zhou, Dai Li
“Get Out and Vote,” Or “You Can’t Complain”: Non-Voters on Twitter During the 2016 and 2020 U.S. Elections	Chelsea Butkowski, Sam Wilson, Eric Wiemer
Dictionary-Assisted Supervised Contrastive Learning	Patrick Y. Wu, Richard Bonneau, Joshua A. Tucker, Jonathan Nagler
The Rise of and Demand for Identitarian Media Coverage	Daniel Hopkins, Yphtach Lelkes, Samuel Wolken
Cambridge Law Corpus: A corpus for research on legal AI	Andreas Östling; Holli Sargeant; Ludwig Bull; Alex Terenin; Leif Jonsson; Måns Magnusson; Felix Steffek
Filtering Technologies and the Fairness of Natural Language Systems	Eddie Yang, Chad Atalla, Su Lin Blodgett, Kate Cook, Kristen Laird, Emily Lawton, Michael Madaio, Samir Passi, Forough Poursabzi, Vyoma Raman, Bella Rideau, Emily Sheng, Dan Vann, Andy Zhao, Solon Barocas, Hanna Wallach

Poster session B (Friday)

Title	Authors
Multilingual Word Embeddings for Social Scientists: Estimation, Inference and Validation Resources for 157 Languages	Pedro L. Rodriguez, Arthur Spirling, Brandon M. Stewart, Elisa M. Wirsching
Bridging Topic Modeling and Framing Theory	Arya D. McCarthy and Giovanna Maria Dora Dore
Do Politicians Collaborate? Measuring Coordination in Political Discourse	Katherine Atwell, Michael Datz, Max Goplerud, Tessa Provins, Malihe Alikhani
COVID-19 Public Opinion on Turkish Twittersphere	Burak Ozturan, Yunus Emre Tapan
Aligning Large Natural Language Documents	Tanzir Pial, Steven Skiena
Exploring conflicting values in the founding of ARPANET and the Internet	Meera Desai
LegisBERT: A Language Model for the Analysis of Legislative Text	Mitchell Bosley
What now? - a Twitter textual analysis of the abortion debates with the shifting policies in the US	Jialin Shan, Tuan-he Lee, Hanfei Li
Narrative Detection Across Political Domains	Maria Antoniak, Elliott Ash
Spoken Identity: Titular Language Usage and the War in Ukraine	Erin Walk
Signaled or Suppressed? How Gender Informs Women’s Undergraduate Applications in Biology and Engineering	Sonia Giebel, AJ Alvero, Ben Gebre-Medhin, anthony lising antonio
Affective Idiosyncratic Responses to Music	Sky CH-Wang, Evan Li, Oliver Li, Smaranda Muresan, Zhou Yu
Inferring Age from Linguistic and Verbal Cues in Celebrity Interviews	Yunting Yin, Steven Skiena
A Step-by-Step Protocol for Curation of Topic Models by Subject Matter Experts	Philip Resnik, Pranav Goel, Alexander Hoyle, Rupak Sarkar, Josh Hagedorn, Maeve Gearing, and Carol Bruce
Who gets a say in this? Speaking security on social media	Natalia Umansky
Learning from Machines: Differentiating US Presidential Campaigns with Attribution and Annotation	Musashi Jacobs-Harukawa
Finding the story: Leveraging expert knowledge in computational sensemaking of multi-platform text data	Hope Schroeder, Tobin South
What’s a Parent to do? Measuring the Cultural Logics of Parenting with Biterm Topic Models	Orestes P. Hastings, Luca Maria Pesando
Decoding matrimonial advertisements: Individual preferences entrenched in socio cultural biases	Pranathi Iyer
Gendered Information in Resumes and Hiring Bias: A Predictive Modeling Approach	Prasanna Parasurama, Joao Sedoc, Anindya Ghose

Roundtable Discussion

The Cassandra Project at Johns Hopkins has organized its third roundtable in the “Learning How to Play with the Machines” series on computational social science at TADA.

Title	Authors
Misinformation and dataset biases	Panelists: Kathy McKeown, David Mimno, Sarah Shugars, Arthur Spirling
	Chairs: Giovanna Maria Dora Dore, Eva Klaus, Arya D. McCarthy