Locating and Editing Factual Associations in GPT

Neural Information Processing Systems (NeurIPS), 2022

10 February 2022

Papers citing "Locating and Editing Factual Associations in GPT"

50 / 1,361 papers shown

The Hydra Effect: Emergent Self-repair in Language Model Computations

229

28 Jul 2023

FeedbackLogs: Recording and Incorporating Stakeholder Feedback into Machine Learning Pipelines

Conference on Equity and Access in Algorithms, Mechanisms, and Optimization (EAAMO), 2023

189

28 Jul 2023

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

...

Dorsa Sadigh

Dylan Hadfield-Menell

ALM OffRL

367

731

27 Jul 2023

Evaluating the Ripple Effects of Knowledge Editing in Language Models

Transactions of the Association for Computational Linguistics (TACL), 2023

419

231

24 Jul 2023

Embroid: Unsupervised Prediction Smoothing Can Improve Few-Shot Classification

Neural Information Processing Systems (NeurIPS), 2023

186

20 Jul 2023

Deceptive Alignment Monitoring

220

20 Jul 2023

Can Neural Network Memorization Be Localized?

International Conference on Machine Learning (ICML), 2023

J. Zico Kolter

182

18 Jul 2023

Overthinking the Truth: Understanding how Language Models Process False Demonstrations

International Conference on Learning Representations (ICLR), 2023

Danny Halawi

Jean-Stanislas Denain

Jacob Steinhardt

315

18 Jul 2023

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

322

141

18 Jul 2023

Discovering Variable Binding Circuitry with Desiderata

192

07 Jul 2023

An Overview of Catastrophic AI Risks

601

253

21 Jun 2023

Schema-learning and rebinding as mechanisms of in-context learning and emergence

Neural Information Processing Systems (NeurIPS), 2023

Siva K. Swaminathan

Antoine Dedieu

Rajkumar Vasudeva Raju

Murray Shanahan

Miguel Lazaro-Gredilla

Dileep George

223

16 Jun 2023

Propagating Knowledge Updates to LMs Through Distillation

Neural Information Processing Systems (NeurIPS), 2023

268

15 Jun 2023

Operationalising Representation in Natural Language Processing

British Journal for the Philosophy of Science (BJPS), 2023

J. Harding

351

14 Jun 2023

Measuring and Modifying Factual Knowledge in Large Language Models

International Conference on Machine Learning and Applications (ICMLA), 2023

Pouya Pezeshkpour

KELM

198

09 Jun 2023

Mixture-of-Domain-Adapters: Decoupling and Injecting Domain Knowledge to Pre-trained Language Models Memories

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Tong Zhang

224

08 Jun 2023

Causal interventions expose implicit situation models for commonsense language understanding

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

323

06 Jun 2023

Inference-Time Intervention: Eliciting Truthful Answers from a Language Model

Neural Information Processing Systems (NeurIPS), 2023

758

839

06 Jun 2023

Encoding Time-Series Explanations through Self-Supervised Model Behavior Consistency

Neural Information Processing Systems (NeurIPS), 2023

Theodoros Tsiligkaridis

Marinka Zitnik

AI4TS

312

03 Jun 2023

Learning Transformer Programs

Neural Information Processing Systems (NeurIPS), 2023

Dan Friedman

Alexander Wettig

Danqi Chen

301

01 Jun 2023

Birth of a Transformer: A Memory Viewpoint

Neural Information Processing Systems (NeurIPS), 2023

395

142

01 Jun 2023

ReFACT: Updating Text-to-Image Models by Editing the Text Encoder

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

348

01 Jun 2023

Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey

ACM Computing Surveys (ACM Comput. Surv.), 2023

...

Quanquan Gu

417

216

30 May 2023

Gaussian Process Probes (GPP) for Uncertainty-Aware Probing

Neural Information Processing Systems (NeurIPS), 2023

261

29 May 2023

Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

J. Hoelscher-Obermaier

261

27 May 2023

Theoretical and Practical Perspectives on what Influence Functions Do

Neural Information Processing Systems (NeurIPS), 2023

177

26 May 2023

Backpack Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

John Hewitt

John Thickstun

Christopher D. Manning

Abigail Z. Jacobs

KELM

239

26 May 2023

ProSpect: Prompt Spectrum for Attribute-Aware Personalization of Diffusion Models

ACM Transactions on Graphics (TOG), 2023

431

120

25 May 2023

Language Models Implement Simple Word2Vec-style Vector Arithmetic

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

340

25 May 2023

Self-contradictory Hallucinations of Large Language Models: Evaluation, Detection and Mitigation

International Conference on Learning Representations (ICLR), 2023

Niels Mündler

Jingxuan He

Slobodan Jenko

Martin Vechev

HILM

314

159

25 May 2023

Editable Graph Neural Network for Node Classifications

227

24 May 2023

Referral Augmentation for Zero-Shot Information Retrieval

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

249

24 May 2023

Meta-Learning Online Adaptation of Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Nathan J. Hu

E. Mitchell

Christopher D. Manning

Chelsea Finn

KELM

297

24 May 2023

A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

286

24 May 2023

Editing Common Sense in Transformers

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Anshita Gupta

Debanjan Mondal

Akshay Krishna Sheshadri

Xiang Lorraine Li

230

24 May 2023

Mitigating Temporal Misalignment by Discarding Outdated Facts

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Michael J.Q. Zhang

Eunsol Choi

KELM HILM

287

24 May 2023

MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Zexuan Zhong

Zhengxuan Wu

Christopher D. Manning

Christopher Potts

Danqi Chen

KELM

396

276

24 May 2023

Can Transformers Learn to Solve Problems Recursively?

176

24 May 2023

All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations

Qipeng Guo

149

23 May 2023

Deduction under Perturbed Evidence: Probing Student Simulation Capabilities of Large Language Models

Shashank Sonkar

Richard G. Baraniuk

126

23 May 2023

WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

340

103

23 May 2023

Polyglot or Not? Measuring Multilingual Encyclopedic Knowledge in Foundation Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

283

23 May 2023

The Knowledge Alignment Problem: Bridging Human and External Knowledge for Large Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

245

23 May 2023

VISIT: Visualizing and Interpreting the Semantic Information Flow of Transformers

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Shahar Katz

Yonatan Belinkov

214

22 May 2023

Can LLMs facilitate interpretation of pre-trained language models?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Basel Mousi

Nadir Durrani

Fahim Dalvi

303

22 May 2023

Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts

International Conference on Learning Representations (ICLR), 2023

713

253

22 May 2023

LM vs LM: Detecting Factual Errors via Cross Examination

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

331

186

22 May 2023

Editing Large Language Models: Problems, Methods, and Opportunities

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Peng Wang

Shumin Deng

Huajun Chen

Ningyu Zhang

KELM

349

400

22 May 2023

RWKV: Reinventing RNNs for the Transformer Era

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

...

Rui-Jie Zhu

594

862

22 May 2023

Can We Edit Factual Knowledge by In-Context Learning?

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Ce Zheng

Lei Li

Qingxiu Dong

Zhiyong Wu

253

289

22 May 2023