Analyzing the Inner Workings of Transformers in Compositional Generalization. North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models. Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Optimal ablation for interpretability. Neural Information Processing Systems (NeurIPS), 2024
Embedded Named Entity Recognition using Probing Classifiers. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks. North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Attribution Patching Outperforms Automated Circuit Discovery. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
SPADE: Sparsity-Guided Debugging for Deep Neural Networks. International Conference on Machine Learning (ICML), 2023
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability. Neural Information Processing Systems (NeurIPS), 2023
Break It Down: Evidence for Structural Compositionality in Neural Networks. Neural Information Processing Systems (NeurIPS), 2023
CREPE: Can Vision-Language Foundation Models Reason Compositionally? Computer Vision and Pattern Recognition (CVPR), 2022
The Architectural Bottleneck Principle. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
SocioProbe: What, When, and Where Language Models Learn about Sociodemographics. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task. International Conference on Learning Representations (ICLR), 2022
The Open-World Lottery Ticket Hypothesis for OOD Intent Classification. International Conference on Language Resources and Evaluation (LREC), 2022
Probing via Prompting. North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Visualizing the Relationship Between Encoded Linguistic Information and Task Performance. Findings (Findings), 2022