The Linear Representation Hypothesis and the Geometry of Large Language Models. International Conference on Machine Learning (ICML), 2024.
The Effect of Scaling, Retrieval Augmentation and Form on the Factual Consistency of Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Defining a New NLP Playground. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
A Survey on Knowledge Editing of Neural Networks. IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023.
Debiasing Algorithm through Model Adaptation. International Conference on Learning Representations (ICLR), 2024.
Codebook Features: Sparse and Discrete Interpretability for Neural Networks. International Conference on Machine Learning (ICML), 2024.
How do Language Models Bind Entities in Context? International Conference on Learning Representations (ICLR), 2024.
Give Me the Facts! A Survey on Factual Knowledge Probing in Pre-trained Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Knowledge Editing for Large Language Models: A Survey. ACM Computing Surveys (ACM Comput. Surv.), 2023.
In-Context Learning Creates Task Vectors. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Characterizing Mechanisms for Factual Recall in Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Unnatural language processing: How do language models handle machine-generated prompts? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023.
KITAB: Evaluating LLMs on Constraint Satisfaction for Information Retrieval. International Conference on Learning Representations (ICLR), 2024.
Function Vectors in Large Language Models. International Conference on Learning Representations (ICLR), 2024.
SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation. International Conference on Learning Representations (ICLR), 2024.
Emptying the Ocean with a Spoon: Should We Edit Models? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective. International Conference on Learning Representations (ICLR), 2024.
How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations. International Conference on Learning Representations (ICLR), 2024.
Attribution Patching Outperforms Automated Circuit Discovery. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023.
How Do Large Language Models Capture the Ever-changing World Knowledge? A Review of Recent Advances. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.
An Adversarial Example for Direct Logit Attribution: Memory Management in gelu-4l. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023.
A Meta-Learning Perspective on Transformers for Causal Language Modeling. Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
SPADE: Sparsity-Guided Debugging for Deep Neural Networks. International Conference on Machine Learning (ICML), 2024.
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023.