What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark, Urvashi Khandelwal, Omer Levy, Christopher D. Manning
arXiv:1906.04341, 11 June 2019
Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"

Showing 50 of 883 citing papers:
Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang, David Yunis, Michael Maire (11 Dec 2023)

Exploring Sparsity in Graph Transformers
Chuang Liu, Yibing Zhan, Xueqi Ma, Liang Ding, Dapeng Tao, Jia Wu, Wenbin Hu, Bo Du (09 Dec 2023)

INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers
Anjan Karmakar, Romain Robbes (08 Dec 2023)

Interpretability Illusions in the Generalization of Simplified Models
Dan Friedman, Andrew Kyle Lampinen, Lucas Dixon, Danqi Chen, Asma Ghandeharioun (06 Dec 2023)

Towards Measuring Representational Similarity of Large Language Models
Max Klabunde, Mehdi Ben Amor, Michael Granitzer, Florian Lemmerich (05 Dec 2023)

Class-Discriminative Attention Maps for Vision Transformers
L. Brocki, Jakub Binda, N. C. Chung (04 Dec 2023)

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski (03 Dec 2023)

Mitigating Over-smoothing in Transformers via Regularized Nonlocal Functionals
Tam Nguyen, Tan-Minh Nguyen, Richard G. Baraniuk (01 Dec 2023)

Hyperpolyglot LLMs: Cross-Lingual Interpretability in Token Embeddings
Andrea W Wen-Yi, David Mimno (29 Nov 2023)

Injecting linguistic knowledge into BERT for Dialogue State Tracking
Xiaohan Feng, Xixin Wu, Helen M. Meng (27 Nov 2023)

Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation
Haoyi Wu, Kewei Tu (26 Nov 2023)

Nova: Generative Language Models for Assembly Code with Hierarchical Attention and Contrastive Learning
Nan Jiang, Chengxiao Wang, Kevin Liu, Xiangzhe Xu, Lin Tan, Xiangyu Zhang (22 Nov 2023)

Visual Analytics for Generative Transformer Models
Raymond Li, Ruixin Yang, Wen Xiao, Ahmed AbuRaed, Gabriel Murray, Giuseppe Carenini (21 Nov 2023)

Bias A-head? Analyzing Bias in Transformer-Based Language Model Attention Heads
Yi Yang, Hanyu Duan, Ahmed Abbasi, John P. Lalor, K. Tam (17 Nov 2023)

XplainLLM: A QA Explanation Dataset for Understanding LLM Decision-Making
Zichen Chen, Jianda Chen, Mitali Gaidhani, Ambuj K. Singh, Misha Sra (15 Nov 2023)

Finding and Editing Multi-Modal Neurons in Pre-Trained Transformers
Haowen Pan, Yixin Cao, Xiaozhi Wang, Xun Yang, Meng Wang (13 Nov 2023)

Legal-HNet: Mixing Legal Long-Context Tokens with Hartley Transform
Daniele Giofré, Sneha Ghantasala (09 Nov 2023)

ATHENA: Mathematical Reasoning with Thought Expansion
JB. Kim, Hazel Kim, Joonghyuk Hahn, Yo-Sub Han (02 Nov 2023)

Syntactic Inductive Bias in Transformer Language Models: Especially Helpful for Low-Resource Languages?
Luke Gessler, Nathan Schneider (01 Nov 2023)

Improving Prompt Tuning with Learned Prompting Layers
Wei Zhu, Ming Tan (31 Oct 2023)

Pushdown Layers: Encoding Recursive Structure in Transformer Language Models
Shikhar Murty, Pratyusha Sharma, Jacob Andreas, Christopher D. Manning (29 Oct 2023)

Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention
Changjiang Gao, Shujian Huang, Jixing Li, Jiajun Chen (29 Oct 2023)

Probing LLMs for Joint Encoding of Linguistic Categories
Giulio Starace, Konstantinos Papakostas, Rochelle Choenni, Apostolos Panagiotopoulos, Matteo Rosati, Alina Leidinger, Ekaterina Shutova (28 Oct 2023)

Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman (26 Oct 2023)

WSDMS: Debunk Fake News via Weakly Supervised Detection of Misinforming Sentences with Contextualized Social Wisdom
Ruichao Yang, Wei Gao, Jing Ma, Hongzhan Lin, Zhiwei Yang (25 Oct 2023)

PartialFormer: Modeling Part Instead of Whole for Machine Translation
Tong Zheng, Bei Li, Huiwen Bao, Jiale Wang, Weiqiao Shan, Tong Xiao, Jingbo Zhu (23 Oct 2023)

Continual Named Entity Recognition without Catastrophic Forgetting
Duzhen Zhang, Wei Cong, Jiahua Dong, Yahan Yu, Xiuyi Chen, Yonggang Zhang, Zhen Fang (23 Oct 2023)

Attention-Enhancing Backdoor Attacks Against BERT-based Models
Weimin Lyu, Songzhu Zheng, Lu Pang, Haibin Ling, Chao Chen (23 Oct 2023)

UniMAP: Universal SMILES-Graph Representation Learning
Shikun Feng, Lixin Yang, Wei-Ying Ma, Yanyan Lan (22 Oct 2023)

Implications of Annotation Artifacts in Edge Probing Test Datasets
Sagnik Ray Choudhury, Jushaan Kalra (20 Oct 2023)

Plausibility Processing in Transformer Language Models: Focusing on the Role of Attention Heads in GPT
Soo Hyun Ryu (20 Oct 2023)

The Locality and Symmetry of Positional Encodings
Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek (19 Oct 2023)

Are Structural Concepts Universal in Transformer Language Models? Towards Interpretable Cross-Lingual Generalization
Ningyu Xu, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang (19 Oct 2023)

Disentangling the Linguistic Competence of Privacy-Preserving BERT
Stefan Arnold, Nils Kemmerzell, Annika Schreiner (17 Oct 2023)

Untying the Reversal Curse via Bidirectional Language Model Editing
Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu (16 Oct 2023)

Interpreting and Exploiting Functional Specialization in Multi-Head Attention under Multi-task Learning
Chong Li, Shaonan Wang, Yunhao Zhang, Jiajun Zhang, Chengqing Zong (16 Oct 2023)

Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers
Hosein Mohebbi, Grzegorz Chrupała, Willem H. Zuidema, A. Alishahi (15 Oct 2023)

Advancing Perception in Artificial Intelligence through Principles of Cognitive Science
Palaash Agrawal, Cheston Tan, Heena Rathore (13 Oct 2023)

Linear Latent World Models in Simple Transformers: A Case Study on Othello-GPT
D. Hazineh, Zechen Zhang, Jeffery Chiu (11 Oct 2023)

Evaluating Explanation Methods for Vision-and-Language Navigation
Guanqi Chen, Lei Yang, Guanhua Chen, Jia Pan (10 Oct 2023)

Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models
Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang (10 Oct 2023)

An Attribution Method for Siamese Encoders
Lucas Moller, Dmitry Nikolaev, Sebastian Padó (09 Oct 2023)

Breaking Down Word Semantics from Pre-trained Language Models through Layer-wise Dimension Selection
Nayoung Choi (08 Oct 2023)

Uncovering hidden geometry in Transformers via disentangling position and context
Jiajun Song, Yiqiao Zhong (07 Oct 2023)

Discovering Knowledge-Critical Subnetworks in Pretrained Language Models
Deniz Bayazit, Negar Foroutan, Zeming Chen, Gail Weiss, Antoine Bosselut (04 Oct 2023)

Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao (03 Oct 2023)

Nugget: Neural Agglomerative Embeddings of Text
Guanghui Qin, Benjamin Van Durme (03 Oct 2023)

Defending Against Authorship Identification Attacks
Haining Wang (02 Oct 2023)

Faithful Explanations of Black-box NLP Models Using LLM-generated Counterfactuals
Y. Gat, Nitay Calderon, Amir Feder, Alexander Chapanin, Amit Sharma, Roi Reichart (01 Oct 2023)

Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models
Mert Yuksekgonul, Varun Chandrasekaran, Erik Jones, Suriya Gunasekar, Ranjita Naik, Hamid Palangi, Ece Kamar, Besmira Nushi (26 Sep 2023)