1906.04341
What Does BERT Look At? An Analysis of BERT's Attention
11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
Papers citing
"What Does BERT Look At? An Analysis of BERT's Attention"
50 / 883 papers shown
Title
Probability Consistency in Large Language Models: Theoretical Foundations Meet Empirical Discrepancies
Xiaoliang Luo
Xinyi Xu
Michael Ramscar
Bradley C. Love
25
0
0
13 May 2025
Are We Paying Attention to Her? Investigating Gender Disambiguation and Attention in Machine Translation
Chiara Manna
Afra Alishahi
Frédéric Blain
Eva Vanmassenhove
22
0
0
13 May 2025
LECTOR: Summarizing E-book Reading Content for Personalized Student Support
Erwin Daniel López Zapata
Cheng Tang
Valdemar Švábenský
Fumiya Okubo
Atsushi Shimada
14
0
0
12 May 2025
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu
Kayo Yin
Michael I. Jordan
Jacob Steinhardt
Lijie Chen
51
0
0
08 May 2025
CrashSage: A Large Language Model-Centered Framework for Contextual and Interpretable Traffic Crash Analysis
Hao Zhen
Jidong J. Yang
28
0
0
08 May 2025
Efficient Shapley Value-based Non-Uniform Pruning of Large Language Models
Chuan Sun
Han Yu
Lizhen Cui
Xiaoxiao Li
72
0
0
03 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo
Chenghao Qiu
Maojiang Su
Zhihan Zhou
Zoe Mehta
Guo Ye
Jerry Yao-Chieh Hu
Han Liu
AAML
55
0
0
01 May 2025
Polysemy of Synthetic Neurons Towards a New Type of Explanatory Categorical Vector Spaces
Michael Pichat
William Pogrund
Paloma Pichat
Judicael Poumay
Armanouche Gasparian
Samuel Demarchi
Martin Corbet
Alois Georgeon
Michael Veillet-Guillem
MILM
24
0
0
30 Apr 2025
Exploring How LLMs Capture and Represent Domain-Specific Knowledge
Mirian Hipolito Garcia
Camille Couturier
Daniel Madrigal Diaz
Ankur Mallick
Anastasios Kyrillidis
Robert Sim
Victor Rühle
Saravan Rajmohan
30
0
0
23 Apr 2025
Do Large Language Models know who did what to whom?
Joseph M. Denning
Xiaohan
Bryor Snefjella
Idan A. Blank
50
1
0
23 Apr 2025
Word Embedding Techniques for Classification of Star Ratings
Hesham Abdelmotaleb
Craig McNeile
Malgorzata Wojtys
24
0
0
18 Apr 2025
Weight-of-Thought Reasoning: Exploring Neural Network Weights for Enhanced LLM Reasoning
Saif Punjwani
Larry Heck
LRM
27
0
0
14 Apr 2025
Question Tokens Deserve More Attention: Enhancing Large Language Models without Training through Step-by-Step Reading and Question Attention Recalibration
Feijiang Han
Licheng Guo
Hengtao Cui
Zhiyuan Lyu
LRM
31
0
0
13 Apr 2025
Linguistic Interpretability of Transformer-based Language Models: a systematic review
Miguel López-Otal
Jorge Gracia
Jordi Bernad
Carlos Bobed
Lucía Pitarch-Ballesteros
Emma Anglés-Herrero
VLM
36
0
0
09 Apr 2025
Few Dimensions are Enough: Fine-tuning BERT with Selected Dimensions Revealed Its Redundant Nature
Shion Fukuhata
Yoshinobu Kano
19
0
0
07 Apr 2025
Hallucination Detection using Multi-View Attention Features
Yuya Ogasa
Yuki Arase
26
0
0
06 Apr 2025
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models
Guy Kaplan
Michael Toker
Yuval Reif
Yonatan Belinkov
Roy Schwartz
DiffM
48
0
0
01 Apr 2025
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models
Zhanke Zhou
Zhaocheng Zhu
Xuan Li
Mikhail Galkin
Xiao Feng
Sanmi Koyejo
Jian Tang
Bo Han
LRM
56
0
0
28 Mar 2025
Linguistic Blind Spots of Large Language Models
Jiali Cheng
Hadi Amiri
43
1
0
25 Mar 2025
Construction Identification and Disambiguation Using BERT: A Case Study of NPN
Wesley Scivetti
Nathan Schneider
44
0
0
24 Mar 2025
Intra-neuronal attention within language models Relationships between activation and semantics
Michael Pichat
William Pogrund
Paloma Pichat
Armanouche Gasparian
Samuel Demarchi
Martin Corbet
Alois Georgeon
Michael Veillet-Guillem
MILM
38
0
0
17 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
45
0
0
14 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
54
1
0
14 Mar 2025
AttentionRAG: Attention-Guided Context Pruning in Retrieval-Augmented Generation
Yixiong Fang
Tianran Sun
Yuling Shi
Xiaodong Gu
50
0
0
13 Mar 2025
Mitigating Memorization in LLMs using Activation Steering
Manan Suri
Nishit Anand
Amisha Bhaskar
LLMSV
50
2
0
08 Mar 2025
Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
85
2
0
08 Mar 2025
(How) Do Language Models Track State?
Belinda Z. Li
Zifan Carl Guo
Jacob Andreas
LRM
44
0
0
04 Mar 2025
AxBERT: An Interpretable Chinese Spelling Correction Method Driven by Associative Knowledge Network
Fanyu Wang
Hangyu Zhu
Zhenping Xie
40
0
0
04 Mar 2025
Language Models Grow Less Humanlike beyond Phase Transition
Tatsuya Aoyama
Ethan Wilcox
36
1
0
26 Feb 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
63
0
0
24 Feb 2025
Attention Eclipse: Manipulating Attention to Bypass LLM Safety-Alignment
Pedram Zaree
Md Abdullah Al Mamun
Quazi Mishkatul Alam
Yue Dong
Ihsen Alouani
Nael B. Abu-Ghazaleh
AAML
41
0
0
24 Feb 2025
A Survey of Model Architectures in Information Retrieval
Zhichao Xu
Fengran Mo
Zhiqi Huang
Crystina Zhang
Puxuan Yu
Bei Wang
Jimmy J. Lin
Vivek Srikumar
KELM
3DV
48
2
0
21 Feb 2025
A Close Look at Decomposition-based XAI-Methods for Transformer Language Models
L. Arras
Bruno Puri
Patrick Kahardipraja
Sebastian Lapuschkin
Wojciech Samek
35
0
0
21 Feb 2025
PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models
J. Zhao
Miao Zhang
M. Wang
Yuzhang Shang
Kaihao Zhang
Weili Guan
Yaowei Wang
Min Zhang
MQ
44
0
0
18 Feb 2025
LLMs as a synthesis between symbolic and continuous approaches to language
Gemma Boleda
SyDa
69
0
0
17 Feb 2025
Learning Task Representations from In-Context Learning
Baturay Saglam
Zhuoran Yang
Dionysis Kalogerias
Amin Karbasi
55
0
0
08 Feb 2025
Mechanistic Interpretability of Emotion Inference in Large Language Models
Ala Nekouvaght Tak
Amin Banayeeanzade
Anahita Bolourani
Mina Kian
Robin Jia
Jonathan Gratch
49
0
0
08 Feb 2025
Can Cross Encoders Produce Useful Sentence Embeddings?
Haritha Ananthakrishnan
Julian T Dolby
Harsha Kokel
Horst Samulowitz
Kavitha Srinivas
66
0
0
05 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri
Xinting Huang
Mark Rofin
Michael Hahn
LRM
143
0
0
04 Feb 2025
Understanding Why Adam Outperforms SGD: Gradient Heterogeneity in Transformers
Akiyoshi Tomihari
Issei Sato
ODL
59
0
0
31 Jan 2025
Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
Yuan Feng
Junlin Lv
Yukun Cao
Xike Xie
S. K. Zhou
VLM
53
27
0
28 Jan 2025
Are Transformers Able to Reason by Connecting Separated Knowledge in Training Data?
Yutong Yin
Zhaoran Wang
LRM
ReLM
107
0
0
27 Jan 2025
Ehrenfeucht-Haussler Rank and Chain of Thought
Pablo Barceló
A. Kozachinskiy
Tomasz Steifer
LRM
73
1
0
22 Jan 2025
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Michael Toker
Ido Galil
Hadas Orgad
Rinon Gal
Yoad Tewel
Gal Chechik
Yonatan Belinkov
DiffM
54
2
0
12 Jan 2025
Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
Bohang Sun
Pietro Liò
ViT
AAML
38
1
0
02 Jan 2025
Dynamic Attention-Guided Context Decoding for Mitigating Context Faithfulness Hallucinations in Large Language Models
Yanwen Huang
Yong Zhang
Ning Cheng
Zhitao Li
Shaojun Wang
Jing Xiao
80
0
0
02 Jan 2025
Decoupling Knowledge and Reasoning in Transformers: A Modular Architecture with Generalized Cross-Attention
Zhenyu Guo
Wenguang Chen
37
0
0
01 Jan 2025
Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models
Zhisong Zhang
Yan Wang
Xinting Huang
Tianqing Fang
H. Zhang
Chenlong Deng
Shuaiyi Li
Dong Yu
80
2
0
21 Dec 2024
Attention with Dependency Parsing Augmentation for Fine-Grained Attribution
Qiang Ding
Lvzhou Luo
Yixuan Cao
Ping Luo
74
0
0
16 Dec 2024
Analyzing the Attention Heads for Pronoun Disambiguation in Context-aware Machine Translation Models
Paweł Mąka
Yusuf Can Semerci
Jan Scholtes
Gerasimos Spanakis
74
0
0
15 Dec 2024