I am an assistant professor of Linguistics and Data Science at Boston University. My research focuses on interpretability for natural language processing (NLP), with the aim of developing an interdisciplinary science of deep neural language models. My interests include, but are not limited to:
- probing, neural representations, feature attribution
- linguistic evaluation and psycholinguistic modeling
- theory of computation, analysis of neural architectures
- generative linguistics, syntax, phonology, mathematical linguistics
- bias, fairness, digital humanities and social science
- theoretical foundations of NLP and computational linguistics
I was previously a Faculty Fellow (a postdoctoral position without a supervisor) in Data Science at New York University, where I worked with Tal Linzen, Sunoo Park, Saadia Gabriel, Byung-Doh Oh, João Sedoc, and many amazing students. I completed my PhD in Linguistics and Computer Science at Yale University, advised by Dana Angluin and Bob Frank.
Representative Publications
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
TACL, 2022 (official version)
Sophie Hao, Dana Angluin, and Robert Frank
Verb Conjugation in Transformers is Determined by Linear Encodings of Subject Number
EMNLP Findings, 2023 (official version)
Sophie Hao and Tal Linzen
What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length
NAACL, 2025 (official version, arXiv)
Lindia Tjuatja, Graham Neubig, Tal Linzen, and Sophie Hao
ModelCitizens: Representing Community Voices in Online Safety
EMNLP, To Appear (arXiv)
Ashima Suvarna, Christina Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, and Saadia Gabriel
Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
Italian Journal of Linguistics, To Appear (arXiv)
Sophie Hao
Recent Invited Talks
Towards a Science and Mathematics of Language Models
Mathematics of Language Conference, 2025 (slides)
Word Embeddings: Examining Culture Through a Data-Driven Lens
Vanderbilt University Department of English, 2024
Transformers and Circuit Complexity
Flatiron Institute Center for Computational Mathematics, 2023
Understanding RNNs and Transformers using Formal Languages
ETH Zürich Department of Computer Science, 2022
University of Notre Dame Department of Computer Science, 2022
Education
PhD in Linguistics and Computer Science
Yale University, 2022
Advisors: Dana Angluin and Bob Frank
Committee: Yoav Goldberg, John Lafferty, and Jason Shaw
BA in Mathematics and Linguistics
University of Chicago, 2015
Advisor: Greg Kobele
Professional Experience
Assistant Professor
Boston University, 2025–
Assistant Professor/Faculty Fellow
New York University, 2022–2025
With Tal Linzen, Sunoo Park, and others
Natural Language Machine Learning Intern
Apple, Summer 2021
With Hadas Kotek, David Q. Sun, and others
Visiting Researcher
National Institute of Informatics, Summer 2017
With Makoto Kanazawa, Ryo Yoshinaka, and others
Software Developer
Epic, 2015–2016
Sales and Trading Intern
Wells Fargo Securities, Summer 2014
Market Research Intern
Networked Insights, Summer 2012
Profile photo credit: Hannah Parsley