Locating and Editing Factual Associations in GPT

Neural Information Processing Systems (NeurIPS), 2022

10 February 2022

Papers citing "Locating and Editing Factual Associations in GPT"

50 / 1,363 papers shown

Identifying Sub-networks in Neural Networks via Functionally Similar Representations

Tian Gao

Amit Dhurandhar

Karthikeyan N. Ramamurthy

Dennis L. Wei

385

21 Oct 2024

Catastrophic Failure of LLM Unlearning via Quantization

International Conference on Learning Representations (ICLR), 2024

Zhiwei Zhang

Fali Wang

Xiaomin Li

Zongyu Wu

Xianfeng Tang

Hui Liu

Qi He

Wenpeng Yin

Suhang Wang

334

21 Oct 2024

Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Hongru Wang

320

21 Oct 2024

Towards Faithful Natural Language Explanations: A Study Using Activation Patching in Large Language Models

Wei Jie Yeo

Ranjan Satapathy

Erik Cambria

324

18 Oct 2024

Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

622

18 Oct 2024

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

Michael I. Jordan

309

17 Oct 2024

Looking Inward: Language Models Can Learn About Themselves by Introspection

256

17 Oct 2024

Seeing Through VisualBERT: A Causal Adventure on Memetic Landscapes

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Dibyanayan Bandyopadhyay

Mohammed Hasanuzzaman

Asif Ekbal

AAML

266

17 Oct 2024

Breaking Chains: Unraveling the Links in Multi-Hop Knowledge Unlearning

170

17 Oct 2024

On the Role of Attention Heads in Large Language Model Safety

International Conference on Learning Representations (ICLR), 2024

Kun Wang

Yang Liu

Cunchun Li

Yongbin Li

507

17 Oct 2024

The Geometry of Numerical Reasoning: Language Models Compare Numeric Properties in Linear Subspaces

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Ahmed Oumar El-Shangiti

Tatsuya Hiraoka

Hilal AlQuabeh

Benjamin Heinzerling

Kentaro Inui

433

17 Oct 2024

Bridging the Language Gaps in Large Language Models with Inference-Time Cross-Lingual Intervention

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

237

16 Oct 2024

Neuron-based Personality Trait Induction in Large Language Models

252

16 Oct 2024

SoK: Prompt Hacking of Large Language Models

BigData Congress [Services Society] (BSS), 2024

172

16 Oct 2024

Deep Model Merging: The Sister of Neural Network Interpretability -- A Survey

Kyle Chard

205

16 Oct 2024

Cross-Modal Safety Mechanism Transfer in Large Vision-Language Models

International Conference on Learning Representations (ICLR), 2024

303

16 Oct 2024

Interpreting token compositionality in LLMs: A robustness analysis

Nura Aljaafari

Danilo S. Carvalho

André Freitas

440

16 Oct 2024

Reconstruction of Differentially Private Text Sanitization via Large Language Models

431

16 Oct 2024

AERO: Entropy-Guided Framework for Private LLM Inference

N. Jha

Brandon Reagen

493

16 Oct 2024

The Persian Rug: solving toy models of superposition using large-scale symmetries

Aditya Cowsik

Kfir Dolev

Alex Infanger

217

15 Oct 2024

O-Edit: Orthogonal Subspace Editing for Language Model Sequential Editing

Yuchen Cai

Ding Cao

KELM

228

15 Oct 2024

A Theoretical Survey on Foundation Models

Shi Fu

Yuzhu Chen

Yingjie Wang

Dacheng Tao

304

15 Oct 2024

ReDeEP: Detecting Hallucination in Retrieval-Augmented Generation via Mechanistic Interpretability

International Conference on Learning Representations (ICLR), 2024

Yang Song

313

15 Oct 2024

Advancing the Understanding of Fixed Point Iterations in Deep Neural Networks: A Detailed Analytical Study

240

15 Oct 2024

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

International Conference on Learning Representations (ICLR), 2024

Litu Rout

Yujia Chen

Nataniel Ruiz

Constantine Caramanis

Sanjay Shakkottai

Wen-Sheng Chu

DiffM

215

14 Oct 2024

Locking Down the Finetuned LLMs Safety

Minjun Zhu

Linyi Yang

Yifan Wei

Ningyu Zhang

Yue Zhang

279

14 Oct 2024

Parenting: Optimizing Knowledge Selection of Retrieval-Augmented Language Models with Parameter Decoupling and Tailored Tuning

276

14 Oct 2024

Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts

International Conference on Learning Representations (ICLR), 2024

Xidong Wang

315

14 Oct 2024

Safety-Aware Fine-Tuning of Large Language Models

Hyeong Kyu Choi

Xuefeng Du

Yixuan Li

278

13 Oct 2024

ChroKnowledge: Unveiling Chronological Knowledge of Language Models in Multiple Domains

International Conference on Learning Representations (ICLR), 2024

503

13 Oct 2024

Inference and Verbalization Functions During In-Context Learning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

304

12 Oct 2024

Keys to Robust Edits: from Theoretical Insights to Practical Advances

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

290

12 Oct 2024

CollabEdit: Towards Non-destructive Collaborative Knowledge Editing

International Conference on Learning Representations (ICLR), 2024

Jianwei Yin

540

12 Oct 2024

Understanding the Interplay between Parametric and Contextual Knowledge for Large Language Models

Sitao Cheng

Liangming Pan

Xunjian Yin

Xinyi Wang

William Yang Wang

KELM

242

10 Oct 2024

Mitigating Gender Bias in Code Large Language Models via Model Editing

Haochuan Wang

Zhiying Tu

Dianbo Sui

200

10 Oct 2024

Unlearning-based Neural Interpretations

International Conference on Learning Representations (ICLR), 2024

599

10 Oct 2024

The Geometry of Concepts: Sparse Autoencoder Feature Structure

337

10 Oct 2024

Uncovering Overfitting in Large Language Model Editing

International Conference on Learning Representations (ICLR), 2024

299

10 Oct 2024

Mitigating the Language Mismatch and Repetition Issues in LLM-based Machine Translation via Model Editing

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Weichuan Wang

Zhaoyi Li

Defu Lian

Chen Ma

Linqi Song

Ying Wei

225

09 Oct 2024

Towards Universality: Studying Mechanistic Similarity Across Language Model Architectures

International Conference on Learning Representations (ICLR), 2024

Junxuan Wang

Xipeng Qiu

252

09 Oct 2024

Dissecting Fine-Tuning Unlearning in Large Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Yihuai Hong

Lijie Hu

Di Wang

241

09 Oct 2024

On the Similarity of Circuits across Languages: a Case Study on the Subject-verb Agreement Task

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Javier Ferrando

Marta R. Costa-jussá

167

09 Oct 2024

Towards Interpreting Visual Information Processing in Vision-Language Models

International Conference on Learning Representations (ICLR), 2024

545

09 Oct 2024

Jet Expansions of Residual Computation

Yao Lu

194

08 Oct 2024

Probing Language Models on Their Knowledge Source

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024

Zineddine Tighidet

Andrea Mogini

Jiali Mei

Benjamin Piwowarski

Patrick Gallinari

KELM

210

08 Oct 2024

From Tokens to Words: On the Inner Lexicon of LLMs

International Conference on Learning Representations (ICLR), 2024

Guy Kaplan

Matanel Oren

Yuval Reif

Roy Schwartz

460

08 Oct 2024

Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing

Lijie Hu

Di Wang

KELM

432

08 Oct 2024

Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Anil Ramakrishna

Richard Zemel

Kai-Wei Chang

Rahul Gupta

Charith Peris

136

07 Oct 2024

Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

M. Farahani

Richard Johansson

RALM

223

07 Oct 2024

Mechanistic?

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackBoxNLP), 2024

Naomi Saphra

Sarah Wiegreffe

AI4CE

263

07 Oct 2024