ResearchTrend.AI
Locating and Editing Factual Associations in GPT


10 February 2022
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
    KELM

Papers citing "Locating and Editing Factual Associations in GPT"

50 / 924 papers shown
A Study into Investigating Temporal Robustness of LLMs
Jonas Wallat
Abdelrahman Abdallah
Adam Jatowt
Avishek Anand
42
0
0
21 Mar 2025
LLM Braces: Straightening Out LLM Predictions with Relevant Sub-Updates
Ying Shen
Lifu Huang
47
1
0
20 Mar 2025
CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
Yunzhi Yao
Jizhan Fang
Jia-Chen Gu
N. Zhang
Shumin Deng
H. Chen
Nanyun Peng
KELM
54
1
0
20 Mar 2025
Parameters vs. Context: Fine-Grained Control of Knowledge Reliance in Language Models
Baolong Bi
Shenghua Liu
Y. Wang
Yilong Xu
Junfeng Fang
Lingrui Mei
Xueqi Cheng
KELM
61
4
0
20 Mar 2025
Exploring Model Editing for LLM-based Aspect-Based Sentiment Classification
Shichen Li
Zhongqing Wang
Zheyu Zhao
Yue Zhang
Peifeng Li
KELM
54
0
0
19 Mar 2025
Efficient but Vulnerable: Benchmarking and Defending LLM Batch Prompting Attack
Murong Yue
Ziyu Yao
SILM
AAML
56
0
0
18 Mar 2025
SuperBPE: Space Travel for Language Models
Alisa Liu
J. Hayase
Valentin Hofmann
Sewoong Oh
Noah A. Smith
Yejin Choi
43
3
0
17 Mar 2025
TinySQL: A Progressive Text-to-SQL Dataset for Mechanistic Interpretability Research
Philip Quirke
Clement Neo
Abir Harrasse
Dhruv Nathawani
Amir Abdullah
39
0
0
17 Mar 2025
Using the Tools of Cognitive Science to Understand Large Language Models at Different Levels of Analysis
Alexander Ku
Declan Campbell
Xuechunzi Bai
Jiayi Geng
Ryan Liu
...
Ilia Sucholutsky
Veniamin Veselovsky
Liyi Zhang
Jian-Qiao Zhu
Thomas L. Griffiths
ELM
88
2
0
17 Mar 2025
Taming Knowledge Conflicts in Language Models
Gaotang Li
Yuzhong Chen
Hanghang Tong
KELM
44
0
0
14 Mar 2025
Safe Vision-Language Models via Unsafe Weights Manipulation
Moreno D'Incà
E. Peruzzo
Xingqian Xu
Humphrey Shi
N. Sebe
Massimiliano Mancini
MU
55
0
0
14 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
45
0
0
14 Mar 2025
Resolving UnderEdit & OverEdit with Iterative & Neighbor-Assisted Model Editing
Bhiman Kumar Baghel
Scott M. Jordan
Zheyuan Ryan Shi
Xiang Lorraine Li
KELM
50
0
0
14 Mar 2025
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Jiuding Sun
Jing Huang
Sidharth Baskaran
Karel D'Oosterlinck
Christopher Potts
Michael Sklar
Atticus Geiger
AI4CE
68
0
0
13 Mar 2025
TruthPrInt: Mitigating LVLM Object Hallucination Via Latent Truthful-Guided Pre-Intervention
Jinhao Duan
Fei Kong
Hao-Ran Cheng
James Diffenderfer
B. Kailkhura
Lichao Sun
Xiaofeng Zhu
Xiaoshuang Shi
Kaidi Xu
134
0
0
13 Mar 2025
C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion
Lijie Hu
Junchi Liao
Weimin Lyu
Shaopeng Fu
Tianhao Huang
Shu Yang
Guimin Hu
Di Wang
AAML
65
0
0
12 Mar 2025
ACE: Concept Editing in Diffusion Models without Performance Degradation
Ruipeng Wang
Junfeng Fang
Jiaqi Li
Hao Chen
Jie Shi
K. Wang
X. Wang
DiffM
53
2
0
11 Mar 2025
BiasEdit: Debiasing Stereotyped Language Models via Model Editing
Xin Xu
Wei Xu
N. Zhang
Julian McAuley
KELM
39
0
0
11 Mar 2025
Implicit Reasoning in Transformers is Reasoning through Shortcuts
Tianhe Lin
Jian Xie
Siyu Yuan
Deqing Yang
ReLM
LRM
66
2
0
10 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
63
0
0
08 Mar 2025
Exploiting Edited Large Language Models as General Scientific Optimizers
Qitan Lv
T. Liu
H. Wang
36
0
0
08 Mar 2025
Knowledge Updating? No More Model Editing! Just Selective Contextual Reasoning
Guoxiu He
Xin Song
Aixin Sun
KELM
68
3
0
07 Mar 2025
From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning
Eric Zhao
Pranjal Awasthi
Nika Haghtalab
47
0
0
07 Mar 2025
Revealing Hidden Mechanisms of Cross-Country Content Moderation with Natural Language Processing
Neemesh Yadav
Jiarui Liu
Francesco Ortu
Roya Ensafi
Zhijing Jin
Rada Mihalcea
36
0
0
07 Mar 2025
The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems
Richard Ren
Arunim Agarwal
Mantas Mazeika
Cristina Menghini
Robert Vacareanu
...
Matias Geralnik
Adam Khoja
Dean Lee
Summer Yue
Dan Hendrycks
HILM
ALM
88
0
0
05 Mar 2025
Effectively Steer LLM To Follow Preference via Building Confident Directions
Bingqing Song
Boran Han
Shuai Zhang
Hao Wang
Haoyang Fang
Bonan Min
Yuyang Wang
Mingyi Hong
LLMSV
54
0
0
04 Mar 2025
MindBridge: Scalable and Cross-Model Knowledge Editing via Memory-Augmented Modality
Shuaike Li
Kai Zhang
Q. Liu
Enhong Chen
KELM
78
1
0
04 Mar 2025
(How) Do Language Models Track State?
Belinda Z. Li
Zifan Carl Guo
Jacob Andreas
LRM
44
0
0
04 Mar 2025
Superscopes: Amplifying Internal Feature Representations for Language Model Interpretation
Jonathan Jacobi
Gal Niv
LRM
ReLM
60
0
0
03 Mar 2025
SePer: Measure Retrieval Utility Through The Lens Of Semantic Perplexity Reduction
Lu Dai
Yijie Xu
Jinhui Ye
Hao Liu
Hui Xiong
3DV
RALM
80
2
0
03 Mar 2025
Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia
Chenxi Wang
Tianle Gu
Zhongyu Wei
Lang Gao
Zirui Song
Xiuying Chen
OffRL
56
2
0
03 Mar 2025
SAKE: Steering Activations for Knowledge Editing
Marco Scialanga
Thibault Laugel
Vincent Grari
Marcin Detyniecki
KELM
LLMSV
72
1
0
03 Mar 2025
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu
R. Li
Yunzhe Qi
Hui Liu
X. Tang
...
Qingyu Yin
Monica Cheng
Jun Huan
Haoyu Wang
Jing Gao
KELM
46
2
0
01 Mar 2025
Capability Localization: Capabilities Can be Localized rather than Individual Knowledge
Xiusheng Huang
Jiaxiang Liu
Yequan Wang
Jun Zhao
Kang Liu
54
0
0
28 Feb 2025
Everything, Everywhere, All at Once: Is Mechanistic Interpretability Identifiable?
Maxime Méloux
Silviu Maniu
François Portet
Maxime Peyrard
34
0
0
28 Feb 2025
MMKE-Bench: A Multimodal Editing Benchmark for Diverse Visual Knowledge
Yuntao Du
Kailin Jiang
Zhi Gao
Chenrui Shi
Zilong Zheng
Siyuan Qi
Qing Li
KELM
65
2
0
27 Feb 2025
GeoEdit: Geometric Knowledge Editing for Large Language Models
Yujie Feng
Liming Zhan
Zexin Lu
Yongxin Xu
Xu Chu
Yasha Wang
Jiannong Cao
Philip S. Yu
Xiao-Ming Wu
KELM
53
0
0
27 Feb 2025
Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models
Huazheng Wang
Yongcheng Jing
Haifeng Sun
Yingjie Wang
J. Wang
Jianxin Liao
Dacheng Tao
KELM
MU
42
0
0
27 Feb 2025
Emergent Symbolic Mechanisms Support Abstract Reasoning in Large Language Models
Yukang Yang
Declan Campbell
Kaixuan Huang
Mengdi Wang
Jonathan D. Cohen
Taylor W. Webb
LRM
65
2
0
27 Feb 2025
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan
Robin Jia
KELM
MU
46
0
0
27 Feb 2025
PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong
Kamilė Stankevičiūtė
Chao-gang Wan
Anmol Kabra
Raphael Thesmar
Johann Lee
Julius Klenke
Carla P. Gomes
Kilian Q. Weinberger
RALM
LRM
60
0
0
27 Feb 2025
Neuroplasticity and Corruption in Model Mechanisms: A Case Study Of Indirect Object Identification
Vishnu Kabir Chhabra
Ding Zhu
Mohammad Mahdi Khalili
37
2
0
27 Feb 2025
Norm Growth and Stability Challenges in Localized Sequential Knowledge Editing
Akshat Gupta
Christine Fang
Atahan Ozdemir
Maochuan Lu
Ahmed Alaa
Thomas Hartvigsen
Gopala Anumanchipalli
KELM
33
0
0
26 Feb 2025
A Causal Lens for Evaluating Faithfulness Metrics
Kerem Zaman
Shashank Srivastava
66
0
0
26 Feb 2025
Steered Generation via Gradient Descent on Sparse Features
Sumanta Bhattacharyya
Pedram Rooshenas
LLMSV
43
0
0
25 Feb 2025
Can LLMs Explain Themselves Counterfactually?
Zahra Dehghanighobadi
Asja Fischer
Muhammad Bilal Zafar
LRM
38
0
0
25 Feb 2025
Constraining Sequential Model Editing with Editing Anchor Compression
Hao-Xiang Xu
Jun-Yu Ma
Zhen-Hua Ling
Ningyu Zhang
Jia-Chen Gu
KELM
47
1
0
25 Feb 2025
Jacobian Sparse Autoencoders: Sparsify Computations, Not Just Activations
Lucy Farnik
Tim Lawson
Conor Houghton
Laurence Aitchison
56
0
0
25 Feb 2025
Investigating the Adaptive Robustness with Knowledge Conflicts in LLM-based Multi-Agent Systems
Tianjie Ju
B. Wang
Hao Fei
M. Lee
W. Hsu
...
Qianren Wang
Pengzhou Cheng
Zongru Wu
Zhuosheng Zhang
Gongshen Liu
AAML
36
0
0
24 Feb 2025
Do Multilingual LLMs Think In English?
Lisa Schut
Y. Gal
Sebastian Farquhar
42
3
0
24 Feb 2025