Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.00740
Cited By
Inspecting and Editing Knowledge Representations in Language Models
3 April 2023
Evan Hernandez
Belinda Z. Li
Jacob Andreas
KELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Inspecting and Editing Knowledge Representations in Language Models"
21 / 21 papers shown
Title
HyperDAS: Towards Automating Mechanistic Interpretability with Hypernetworks
Jiuding Sun
Jing Huang
Sidharth Baskaran
Karel DÓosterlinck
Christopher Potts
Michael Sklar
Atticus Geiger
AI4CE
68
0
0
13 Mar 2025
Task-driven Layerwise Additive Activation Intervention
Hieu Trung Nguyen
Bao Nguyen
Binh Nguyen
V. Nguyen
KELM
45
0
0
10 Feb 2025
Towards Unifying Interpretability and Control: Evaluation via Intervention
Usha Bhalla
Suraj Srinivas
Asma Ghandeharioun
Himabindu Lakkaraju
40
5
0
07 Nov 2024
The LLM Language Network: A Neuroscientific Approach for Identifying Causally Task-Relevant Units
Badr AlKhamissi
Greta Tuckute
Antoine Bosselut
Martin Schrimpf
MILM
39
9
0
04 Nov 2024
Semantics-Adaptive Activation Intervention for LLMs via Dynamic Steering Vectors
Weixuan Wang
J. Yang
Wei Peng
LLMSV
26
2
0
16 Oct 2024
Improving Instruction-Following in Language Models through Activation Steering
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
62
14
0
15 Oct 2024
Robust LLM safeguarding via refusal feature adversarial training
L. Yu
Virginie Do
Karen Hambardzumyan
Nicola Cancedda
AAML
62
10
0
30 Sep 2024
Extracting Paragraphs from LLM Token Activations
Nicholas Pochinkov
Angelo Benoit
Lovkush Agarwal
Zainab Ali Majid
Lucile Ter-Minassian
30
1
0
10 Sep 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
66
4
0
08 Jul 2024
Beyond Individual Facts: Investigating Categorical Knowledge Locality of Taxonomy and Meronomy Concepts in GPT Models
Christopher Burger
Yifan Hu
Thai Le
KELM
36
0
0
22 Jun 2024
"Flex Tape Can't Fix That": Bias and Misinformation in Edited Language Models
Karina Halevy
Anna Sotnikova
Badr AlKhamissi
Syrielle Montariol
Antoine Bosselut
KELM
34
3
0
29 Feb 2024
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Yuyang Bai
Shangbin Feng
Vidhisha Balachandran
Zhaoxuan Tan
Shiqi Lou
Tianxing He
Yulia Tsvetkov
ELM
40
2
0
15 Oct 2023
PMET: Precise Model Editing in a Transformer
Xiaopeng Li
Shasha Li
Shezheng Song
Jing Yang
Jun Ma
Jie Yu
KELM
26
115
0
17 Aug 2023
Evaluating the Ripple Effects of Knowledge Editing in Language Models
Roi Cohen
Eden Biran
Ori Yoran
Amir Globerson
Mor Geva
KELM
40
155
0
24 Jul 2023
Fast Model Editing at Scale
E. Mitchell
Charles Lin
Antoine Bosselut
Chelsea Finn
Christopher D. Manning
KELM
219
341
0
21 Oct 2021
Tailor: Generating and Perturbing Text with Semantic Controls
Alexis Ross
Tongshuang Wu
Hao Peng
Matthew E. Peters
Matt Gardner
136
77
0
15 Jul 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
280
3,844
0
18 Apr 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
250
1,986
0
31 Dec 2020
Reducing conversational agents' overconfidence through linguistic calibration
Sabrina J. Mielke
Arthur Szlam
Emily Dinan
Y-Lan Boureau
209
153
0
30 Dec 2020
Language Models as Knowledge Bases?
Fabio Petroni
Tim Rocktaschel
Patrick Lewis
A. Bakhtin
Yuxiang Wu
Alexander H. Miller
Sebastian Riedel
KELM
AI4MH
413
2,584
0
03 Sep 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau
Germán Kruszewski
Guillaume Lample
Loïc Barrault
Marco Baroni
199
882
0
03 May 2018
1