Perturbed examples reveal invariances shared by language models

7 November 2023

Papers citing "Perturbed examples reveal invariances shared by language models"

4 / 4 papers shown

Title
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small; Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt; 205 486 0; 01 Nov 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models; Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou; LM&Ro, LRM, AI4CE, ReLM; 315 8,261 0; 28 Jan 2022
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers; Yi Tay, Mostafa Dehghani, J. Rao, W. Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler; 181 89 0; 22 Sep 2021
Similarity Analysis of Contextual Word Representation Models; John M. Wu, Yonatan Belinkov, Hassan Sajjad, Nadir Durrani, Fahim Dalvi, James R. Glass; 40 65 0; 03 May 2020