I am an assistant professor of Linguistics and Data Science at Boston University. My research focuses on interpretability for natural language processing (NLP), with the aim of developing an interdisciplinary science of deep neural language models. My interests include, but are not limited to:
- probing, neural representations, feature attribution
- linguistic evaluation and psycholinguistic modeling
- theory of computation, analysis of neural architectures
- generative linguistics, syntax, phonology, mathematical linguistics
- bias, fairness, digital humanities and social science
- theoretical foundations of NLP and computational linguistics
I was previously a Faculty Fellow (a postdoctoral position without a supervisor) in Data Science at New York University, where I worked with Tal Linzen, Sunoo Park, Saadia Gabriel, Byung-Doh Oh, João Sedoc, and many amazing students. I completed my PhD in Linguistics and Computer Science at Yale University, advised by Dana Angluin and Bob Frank.
Representative Publications
Formal Language Recognition by Hard Attention Transformers: Perspectives from Circuit Complexity
TACL, 2022 (official version)
Sophie Hao, Dana Angluin, and Robert Frank
Verb Conjugation in Transformers is Determined by Linear Encodings of Subject Number
EMNLP Findings, 2023 (official version)
Sophie Hao and Tal Linzen
What Goes Into a LM Acceptability Judgment? Rethinking the Impact of Frequency and Length
NAACL, 2025 (official version, arXiv)
Lindia Tjuatja, Graham Neubig, Tal Linzen, and Sophie Hao
ModelCitizens: Representing Community Voices in Online Safety
EMNLP, To Appear (arXiv)
Ashima Suvarna, Christina Chance, Karolina Naranjo, Hamid Palangi, Sophie Hao, Thomas Hartvigsen, and Saadia Gabriel
Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
Italian Journal of Linguistics, To Appear (arXiv)
Sophie Hao
Recent Invited Talks
Towards a Science and Mathematics of Language Models
Mathematics of Language Conference, 2025 (slides)
Word Embeddings: Examining Culture Through a Data-Driven Lens
Vanderbilt University Department of English, 2024
Transformers and Circuit Complexity
Flatiron Institute Center for Computational Mathematics, 2023
Understanding RNNs and Transformers using Formal Languages
ETH Zürich Department of Computer Science, 2022
University of Notre Dame Department of Computer Science, 2022
Education
PhD in Linguistics and Computer Science
Yale University, 2022
Advisors: Dana Angluin and Bob Frank
Committee: Yoav Goldberg, John Lafferty, and Jason Shaw
BA in Mathematics and Linguistics
University of Chicago, 2015
Advisor: Greg Kobele
Professional Experience
Assistant Professor
Boston University, 2025–
Assistant Professor/Faculty Fellow
New York University, 2022–2025
With Tal Linzen, Sunoo Park, and others
Natural Language Machine Learning Intern
Apple, Summer 2021
With Hadas Kotek, David Q. Sun, and others
Visiting Researcher
National Institute of Informatics, Summer 2017
With Makoto Kanazawa, Ryo Yoshinaka, and others
Software Developer
Epic, 2015–2016
Sales and Trading Intern
Wells Fargo Securities, Summer 2014
Market Research Intern
Networked Insights, Summer 2012
Profile photo credit: Hannah Parsley