
Locating and Editing Factual Associations in GPT

Neural Information Processing Systems (NeurIPS), 2022

10 February 2022


Papers citing "Locating and Editing Factual Associations in GPT"

50 / 1,361 papers shown

Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

International Conference on Language Resources and Evaluation (LREC), 2023

...

320

21 May 2023

Decouple knowledge from parameters for plug-and-play language modeling

Xin Cheng

Yankai Lin

Preslav Nakov

Dongyan Zhao

Rui Yan

KELM

227

19 May 2023

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

Neural Information Processing Systems (NeurIPS), 2023

434

108

15 May 2023

Semantic Composition in Visually Grounded Language Models

Rohan Pandey

CoGe

201

15 May 2023

FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Shangbin Feng

Vidhisha Balachandran

Yuyang Bai

Yulia Tsvetkov

KELM HILM

308

14 May 2023

RECKONING: Reasoning through Dynamic Knowledge Encoding

Neural Information Processing Systems (NeurIPS), 2023

349

10 May 2023

Coherent Wave Dynamics and Language Generation of a Generative Pre-trained Transformer

Tao Hong

08 May 2023

Chain-of-Skills: A Configurable Model for Open-domain Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Kaixin Ma

Hao Cheng

Yu Zhang

Xiaodong Liu

Eric Nyberg

Jianfeng Gao

LRM

189

04 May 2023

ReMask: A Robust Information-Masking Approach for Domain Counterfactual Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

151

04 May 2023

Can LMs Learn New Entities from Descriptions? Challenges in Propagating Injected Knowledge

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

421

02 May 2023

Key-Locked Rank One Editing for Text-to-Image Personalization

International Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), 2023

425

217

02 May 2023

Finding Neurons in a Haystack: Case Studies with Sparse Probing

520

286

02 May 2023

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

Neural Information Processing Systems (NeurIPS), 2023

1.0K

179

30 Apr 2023

Towards Automated Circuit Discovery for Mechanistic Interpretability

Neural Information Processing Systems (NeurIPS), 2023

Arthur Conmy

Augustine N. Mavor-Parker

Aengus Lynch

Stefan Heimersheim

Adrià Garriga-Alonso

531

442

28 Apr 2023

Dissecting Recall of Factual Associations in Auto-Regressive Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

724

418

28 Apr 2023

Label-Free Concept Bottleneck Models

International Conference on Learning Representations (ICLR), 2023

332

239

12 Apr 2023

Localizing Model Behavior with Path Patching

Nicholas W. Goldowsky-Dill

Chris MacLeod

L. Sato

Aryaman Arora

489

122

12 Apr 2023

Inspecting and Editing Knowledge Representations in Language Models

300

123

03 Apr 2023

Ablating Concepts in Text-to-Image Diffusion Models

IEEE International Conference on Computer Vision (ICCV), 2023

Jun-Yan Zhu

478

282

23 Mar 2023

Language Model Behavior: A Comprehensive Survey

International Conference on Computational Logic (ICCL), 2023

Tyler A. Chang

Benjamin Bergen

VLM LRM LM&MA

372

139

20 Mar 2023

Context-faithful Prompting for Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

240

20 Mar 2023

Editing Implicit Assumptions in Text-to-Image Diffusion Models

IEEE International Conference on Computer Vision (ICCV), 2023

363

115

14 Mar 2023

Erasing Concepts from Diffusion Models

IEEE International Conference on Computer Vision (ICCV), 2023

494

431

13 Mar 2023

Making a Computational Attorney

SDM (SDM), 2023

165

07 Mar 2023

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

CLEaR (CLEaR), 2023

499

139

05 Mar 2023

Competence-Based Analysis of Language Models

357

01 Mar 2023

Edit at your own risk: evaluating the robustness of edited models to distribution shifts

246

28 Feb 2023

Inseq: An Interpretability Toolkit for Sequence Generation Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Nils Feldhus

317

27 Feb 2023

Analyzing And Editing Inner Mechanisms Of Backdoored Language Models

Conference on Fairness, Accountability and Transparency (FAccT), 2023

Max Lamparth

Anka Reuel

KELM

213

24 Feb 2023

Task-Specific Skill Localization in Fine-tuned Language Models

International Conference on Machine Learning (ICML), 2023

322

13 Feb 2023

What Matters In The Structured Pruning of Generative Language Models?

178

07 Feb 2023

Effective Data Augmentation With Diffusion Models

International Conference on Learning Representations (ICLR), 2023

472

336

07 Feb 2023

Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps

International Conference on Learning Representations (ICLR), 2023

463

01 Feb 2023

Do Multi-Document Summarization Models Synthesize?

Transactions of the Association for Computational Linguistics (TACL), 2023

Jay DeYoung

Stephanie C. Martinez

Iain J. Marshall

Byron C. Wallace

287

31 Jan 2023

Truth Machines: Synthesizing Veracity in AI Language Models

AI & Society (AI & Society), 2023

28 Jan 2023

Tracr: Compiled Transformers as a Laboratory for Interpretability

Neural Information Processing Systems (NeurIPS), 2023

494

12 Jan 2023

Can Large Language Models Change User Preference Adversarially?

Varshini Subhash

AAML

180

05 Jan 2023

A Survey on Knowledge-Enhanced Pre-trained Language Models

238

27 Dec 2022

DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Yang Liu

220

20 Dec 2022

DSI++: Updating Transformer Memory with New Documents

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

223

19 Dec 2022

Talking About Large Language Models

Communications of the ACM (CACM), 2022

Murray Shanahan

AI4CE

350

373

07 Dec 2022

Language Models as Agent Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Jacob Andreas

LLMAG

269

169

03 Dec 2022

Convexifying Transformers: Improving optimization and understanding of transformer networks

226

20 Nov 2022

Aging with GRACE: Lifelong Model Editing with Discrete Key-Value Adaptors

Neural Information Processing Systems (NeurIPS), 2022

639

234

20 Nov 2022

Diagnostics for Deep Neural Networks with Automated Copy/Paste Attacks

Stephen Casper

K. Hariharan

Dylan Hadfield-Menell

AAML

401

18 Nov 2022

DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

230

101

10 Nov 2022

On the Domain Adaptation and Generalization of Pretrained Language Models: A Survey

Xu Guo

Han Yu

LM&MA VLM

307

06 Nov 2022

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small

International Conference on Learning Representations (ICLR), 2022

616

775

01 Nov 2022

Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models

Conference on Computational Natural Language Learning (CoNLL), 2022

233

25 Oct 2022

A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2022

384

21 Oct 2022