Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2203.14680
Cited By
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
28 March 2022
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space"
50 / 269 papers shown
Title
When Only Time Will Tell: Interpreting How Transformers Process Local Ambiguities Through the Lens of Restart-Incrementality
Brielen Madureira
Patrick Kahardipraja
David Schlangen
31
2
0
20 Feb 2024
Backward Lens: Projecting Language Model Gradients into the Vocabulary Space
Shahar Katz
Yonatan Belinkov
Mor Geva
Lior Wolf
47
9
1
20 Feb 2024
Empirical Study on Updating Key-Value Memories in Transformer Feed-forward Layers
Zihan Qiu
Zeyu Huang
Youcheng Huang
Jie Fu
KELM
19
5
0
19 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
41
11
0
19 Feb 2024
Retrieval-Augmented Generation: Is Dense Passage Retrieval Retrieving?
Benjamin Z. Reichman
Larry Heck
8
2
0
16 Feb 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Mengnan Du
35
10
0
16 Feb 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers
Chris Wendler
V. Veselovsky
Giovanni Monea
Robert West
56
95
0
16 Feb 2024
Do LLMs Know about Hallucination? An Empirical Investigation of LLM's Hidden States
Hanyu Duan
Yi Yang
K. Tam
HILM
14
26
0
15 Feb 2024
Spectral Filters, Dark Signals, and Attention Sinks
Nicola Cancedda
56
16
0
14 Feb 2024
Summing Up the Facts: Additive Mechanisms Behind Factual Recall in LLMs
Bilal Chughtai
Alan Cooney
Neel Nanda
HILM
KELM
14
16
0
11 Feb 2024
AttnLRP: Attention-Aware Layer-Wise Relevance Propagation for Transformers
Reduan Achtibat
Sayed Mohammad Vakilzadeh Hatefi
Maximilian Dreyer
Aakriti Jain
Thomas Wiegand
Sebastian Lapuschkin
Wojciech Samek
12
24
0
08 Feb 2024
How do Large Language Models Learn In-Context? Query and Key Matrices of In-Context Heads are Two Towers for Metric Learning
Zeping Yu
Sophia Ananiadou
48
6
0
05 Feb 2024
Neighboring Perturbations of Knowledge Editing on Large Language Models
Jun-Yu Ma
Zhen-Hua Ling
Ningyu Zhang
Jia-Chen Gu
KELM
21
5
0
31 Jan 2024
Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao
Xianjun Yang
Tianyu Pang
Chao Du
Lei Li
Yu-Xiang Wang
William Yang Wang
26
52
0
30 Jan 2024
From Understanding to Utilization: A Survey on Explainability for Large Language Models
Haoyan Luo
Lucia Specia
27
21
0
23 Jan 2024
Universal Neurons in GPT2 Language Models
Wes Gurnee
Theo Horsley
Zifan Carl Guo
Tara Rezaei Kheirkhah
Qinyi Sun
Will Hathaway
Neel Nanda
Dimitris Bertsimas
MILM
92
37
0
22 Jan 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
25
86
0
11 Jan 2024
Model Editing Harms General Abilities of Large Language Models: Regularization to the Rescue
Jia-Chen Gu
Haoyang Xu
Jun-Yu Ma
Pan Lu
Zhen-Hua Ling
Kai-Wei Chang
Nanyun Peng
KELM
25
33
0
09 Jan 2024
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Andrew Lee
Xiaoyan Bai
Itamar Pres
Martin Wattenberg
Jonathan K. Kummerfeld
Rada Mihalcea
64
95
0
03 Jan 2024
A Comprehensive Study of Knowledge Editing for Large Language Models
Ningyu Zhang
Yunzhi Yao
Bo Tian
Peng Wang
Shumin Deng
...
Lei Liang
Zhiqiang Zhang
Xiao-Jun Zhu
Jun Zhou
Huajun Chen
KELM
26
76
0
02 Jan 2024
LLM Factoscope: Uncovering LLMs' Factual Discernment through Inner States Analysis
Jinwen He
Yujia Gong
Kai-xiang Chen
Zijin Lin
Chengán Wei
Yue Zhao
13
3
0
27 Dec 2023
Neuron-Level Knowledge Attribution in Large Language Models
Zeping Yu
Sophia Ananiadou
FAtt
KELM
14
6
0
19 Dec 2023
Knowledge Trees: Gradient Boosting Decision Trees on Knowledge Neurons as Probing Classifier
Sergey A. Saltykov
11
0
0
17 Dec 2023
Weight subcloning: direct initialization of transformers using larger pretrained ones
Mohammad Samragh
Mehrdad Farajtabar
Sachin Mehta
Raviteja Vemulapalli
Fartash Faghri
Devang Naik
Oncel Tuzel
Mohammad Rastegari
16
25
0
14 Dec 2023
ToViLaG: Your Visual-Language Generative Model is Also An Evildoer
Xinpeng Wang
Xiaoyuan Yi
Han Jiang
Shanlin Zhou
Zhihua Wei
Xing Xie
25
12
0
13 Dec 2023
Exploring the Reversal Curse and Other Deductive Logical Reasoning in BERT and GPT-Based Large Language Models
Da Wu
Jing Yang
Kai Wang
LRM
10
5
0
06 Dec 2023
Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings
Andrea W Wen-Yi
David Mimno
17
14
0
29 Nov 2023
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
Haoyi Wu
Kewei Tu
57
3
0
26 Nov 2023
Do Localization Methods Actually Localize Memorized Data in LLMs? A Tale of Two Benchmarks
Ting-Yun Chang
Jesse Thomason
Robin Jia
15
14
0
15 Nov 2023
Assessing Knowledge Editing in Language Models via Relation Perspective
Yifan Wei
Xiaoyan Yu
Huanhuan Ma
Fangyu Lei
Yixuan Weng
Ran Song
Kang Liu
KELM
13
15
0
15 Nov 2023
Forgetting before Learning: Utilizing Parametric Arithmetic for Knowledge Updating in Large Language Models
Shiwen Ni
Dingwei Chen
Chengming Li
Xiping Hu
Ruifeng Xu
Min Yang
KELM
MoMe
11
7
0
14 Nov 2023
In-context Learning and Gradient Descent Revisited
Gilad Deutch
Nadav Magar
Tomer Bar Natan
Guy Dar
20
7
0
13 Nov 2023
The Linear Representation Hypothesis and the Geometry of Large Language Models
Kiho Park
Yo Joong Choe
Victor Veitch
LLMSV
MILM
10
134
0
07 Nov 2023
Training Dynamics of Contextual N-Grams in Language Models
Lucia Quirke
Lovis Heindrich
Wes Gurnee
Neel Nanda
10
4
0
01 Nov 2023
Analyzing Vision Transformers for Image Classification in Class Embedding Space
Martina G. Vilas
Timothy Schaumlöffel
Gemma Roig
ViT
14
23
0
29 Oct 2023
Knowledge Editing for Large Language Models: A Survey
Song Wang
Yaochen Zhu
Haochen Liu
Zaiyi Zheng
Chen Chen
Jundong Li
KELM
66
127
0
24 Oct 2023
Unnatural language processing: How do language models handle machine-generated prompts?
Corentin Kervadec
Francesca Franzon
Marco Baroni
12
5
0
24 Oct 2023
Unveiling Multilinguality in Transformer Models: Exploring Language Specificity in Feed-Forward Networks
Sunit Bhattacharya
Ondrej Bojar
25
8
0
24 Oct 2023
Function Vectors in Large Language Models
Eric Todd
Millicent Li
Arnab Sen Sharma
Aaron Mueller
Byron C. Wallace
David Bau
8
99
0
23 Oct 2023
Identifying and Adapting Transformer-Components Responsible for Gender Bias in an English Language Model
Abhijith Chintam
Rahel Beloch
Willem H. Zuidema
Michael Hanna
Oskar van der Wal
18
16
0
19 Oct 2023
Investigating semantic subspaces of Transformer sentence embeddings through linear structural probing
Dmitry Nikolaev
Sebastian Padó
35
5
0
18 Oct 2023
The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models
Aviv Slobodkin
Omer Goldman
Avi Caciularu
Ido Dagan
Shauli Ravfogel
HILM
LRM
45
23
0
18 Oct 2023
Unlocking Emergent Modularity in Large Language Models
Zihan Qiu
Zeyu Huang
Jie Fu
14
8
0
17 Oct 2023
Untying the Reversal Curse via Bidirectional Language Model Editing
Jun-Yu Ma
Jia-Chen Gu
Zhen-Hua Ling
Quan Liu
Cong Liu
KELM
79
35
0
16 Oct 2023
Self-Detoxifying Language Models via Toxification Reversal
Chak Tou Leong
Yi Cheng
Jiashuo Wang
Jian Wang
Wenjie Li
MU
11
28
0
14 Oct 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity
Cunxiang Wang
Xiaoze Liu
Yuanhao Yue
Xiangru Tang
Tianhang Zhang
...
Linyi Yang
Jindong Wang
Xing Xie
Zheng-Wei Zhang
Yue Zhang
HILM
KELM
51
182
0
11 Oct 2023
Probing Large Language Models from A Human Behavioral Perspective
Xintong Wang
Xiaoyu Li
Xingshan Li
Chris Biemann
46
5
0
08 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
19
10
0
05 Oct 2023
Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
Deniz Bayazit
Negar Foroutan
Zeming Chen
Gail Weiss
Antoine Bosselut
KELM
11
12
0
04 Oct 2023
Quantifying the Plausibility of Context Reliance in Neural Machine Translation
Gabriele Sarti
Grzegorz Chrupala
Malvina Nissim
Arianna Bisazza
29
5
0
02 Oct 2023
Previous
1
2
3
4
5
6
Next