What Does BERT Look At? An Analysis of BERT's Attention


11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
    MILM

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"

50 / 885 papers shown
Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions
Javier Ferrando
Marta R. Costa-jussá
14
13
0
13 Sep 2021
GradTS: A Gradient-Based Automatic Auxiliary Task Selection Method Based on Transformer Networks
Weicheng Ma
Renze Lou
Kai Zhang
Lili Wang
Soroush Vosoughi
23
8
0
13 Sep 2021
Artificial Text Detection via Examining the Topology of Attention Maps
Laida Kushnareva
D. Cherniavskii
Vladislav Mikhailov
Ekaterina Artemova
S. Barannikov
A. Bernstein
Irina Piontkovskaya
D. Piontkovski
Evgeny Burnaev
36
49
0
10 Sep 2021
Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
Potsawee Manakul
Mark J. F. Gales
18
5
0
08 Sep 2021
Eliminating Sentiment Bias for Aspect-Level Sentiment Classification with Unsupervised Opinion Extraction
Bo Wang
Tao Shen
Guodong Long
Dinesh Manocha
Yi-Ju Chang
14
24
0
06 Sep 2021
CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models
Arjun Reddy Akula
Keze Wang
Changsong Liu
Sari Saba-Sadiya
Hongjing Lu
S. Todorovic
J. Chai
Song-Chun Zhu
27
47
0
03 Sep 2021
LightNER: A Lightweight Tuning Paradigm for Low-resource NER via Pluggable Prompting
Xiang Chen
Lei Li
Shumin Deng
Chuanqi Tan
Changliang Xu
Fei Huang
Luo Si
Huajun Chen
Ningyu Zhang
VLM
34
65
0
31 Aug 2021
Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
Linyang Li
Demin Song
Xiaonan Li
Jiehang Zeng
Ruotian Ma
Xipeng Qiu
22
134
0
31 Aug 2021
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience
G. Chrysostomou
Nikolaos Aletras
32
16
0
31 Aug 2021
T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
Raymond Li
Wen Xiao
Lanjun Wang
Hyeju Jang
Giuseppe Carenini
ViT
23
23
0
31 Aug 2021
Legal Search in Case Law and Statute Law
Julien Rossi
Evangelos Kanoulas
AILaw
ELM
114
8
0
23 Aug 2021
VerbCL: A Dataset of Verbatim Quotes for Highlight Extraction in Case Law
Julien Rossi
Svitlana Vakulenko
Evangelos Kanoulas
AILaw
20
2
0
23 Aug 2021
Contributions of Transformer Attention Heads in Multi- and Cross-lingual Tasks
Weicheng Ma
Kai Zhang
Renze Lou
Lili Wang
Soroush Vosoughi
111
15
0
18 Aug 2021
Post-hoc Interpretability for Neural NLP: A Survey
Andreas Madsen
Siva Reddy
A. Chandar
XAI
21
222
0
10 Aug 2021
Differentiable Subset Pruning of Transformer Heads
Jiaoda Li
Ryan Cotterell
Mrinmaya Sachan
37
53
0
10 Aug 2021
Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification
Yiding Jiang
Bidisha Sharma
Maulik C. Madhavi
Haizhou Li
20
25
0
05 Aug 2021
FMMformer: Efficient and Flexible Transformer via Decomposed Near-field and Far-field Attention
T. Nguyen
Vai Suliafu
Stanley J. Osher
Long Chen
Bao Wang
21
35
0
05 Aug 2021
Structural Guidance for Transformer Language Models
Peng Qian
Tahira Naseem
R. Levy
Ramón Fernández Astudillo
39
31
0
30 Jul 2021
Graph-free Multi-hop Reading Comprehension: A Select-to-Guide Strategy
Bohong Wu
Zhuosheng Zhang
Hai Zhao
24
20
0
25 Jul 2021
Multi-Stream Transformers
Mikhail Burtsev
Anna Rumshisky
AI4CE
6
0
0
21 Jul 2021
Human Attention during Goal-directed Reading Comprehension Relies on Task Optimization
Jiajie Zou
Yuran Zhang
Jialu Li
Xing Tian
Nai Ding
AIMat
32
2
0
13 Jul 2021
Hate versus Politics: Detection of Hate against Policy makers in Italian tweets
Armend Duzha
Cristiano Casadei
Michael Tosi
Fabio Celli
17
6
0
12 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
36
259
0
01 Jul 2021
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding
Shengjie Luo
Shanda Li
Tianle Cai
Di He
Dinglan Peng
Shuxin Zheng
Guolin Ke
Liwei Wang
Tie-Yan Liu
27
50
0
23 Jun 2021
It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning
Alexey Tikhonov
Max Ryabinin
LRM
13
57
0
22 Jun 2021
Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation
Srinadh Bhojanapalli
Ayan Chakrabarti
Himanshu Jain
Sanjiv Kumar
Michal Lukasik
Andreas Veit
16
8
0
16 Jun 2021
What Context Features Can Transformer Language Models Use?
J. O'Connor
Jacob Andreas
KELM
23
75
0
15 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
37
815
0
14 Jun 2021
Thinking Like Transformers
Gail Weiss
Yoav Goldberg
Eran Yahav
AI4CE
35
127
0
13 Jun 2021
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation
Yuanxin Liu
Fandong Meng
Zheng Lin
Weiping Wang
Jie Zhou
19
6
0
10 Jun 2021
Neural Supervised Domain Adaptation by Augmenting Pre-trained Models with Random Units
Sara Meftah
N. Semmar
Y. Tamaazousti
H. Essafi
F. Sadat
14
3
0
09 Jun 2021
On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation
Wei Zhang
Ziming Huang
Yada Zhu
Guangnan Ye
Xiaodong Cui
Fan Zhang
23
17
0
09 Jun 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning
Piotr Piękos
Henryk Michalewski
Mateusz Malinowski
22
28
0
07 Jun 2021
Attend and select: A segment selective transformer for microblog hashtag generation
Qianren Mao
Xi Li
Bang Liu
Shu Guo
Peng Hao
Jianxin Li
Lihong Wang
18
3
0
06 Jun 2021
Causal Abstractions of Neural Networks
Atticus Geiger
Hanson Lu
Thomas F. Icard
Christopher Potts
NAI
CML
15
217
0
06 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
22
372
0
04 Jun 2021
The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
Ulme Wennberg
G. Henter
MILM
29
21
0
03 Jun 2021
SMURF: SeMantic and linguistic UndeRstanding Fusion for Caption Evaluation via Typicality Analysis
Joshua Forster Feinglass
Yezhou Yang
16
21
0
02 Jun 2021
On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers
Tianchu Ji
Shraddhan Jain
M. Ferdman
Peter Milder
H. A. Schwartz
Niranjan Balasubramanian
MQ
50
15
0
02 Jun 2021
Implicit Representations of Meaning in Neural Language Models
Belinda Z. Li
Maxwell Nye
Jacob Andreas
NAI
MILM
8
169
0
01 Jun 2021
Using Integrated Gradients and Constituency Parse Trees to explain Linguistic Acceptability learnt by BERT
Anmol Nayak
Hariprasad Timmapathini
27
4
0
01 Jun 2021
Do Multilingual Neural Machine Translation Models Contain Language Pair Specific Attention Heads?
Zae Myung Kim
Laurent Besacier
Vassilina Nikoulina
D. Schwab
MILM
47
7
0
31 May 2021
Cascaded Head-colliding Attention
Lin Zheng
Zhiyong Wu
Lingpeng Kong
13
2
0
31 May 2021
On the Interplay Between Fine-tuning and Composition in Transformers
Lang-Chi Yu
Allyson Ettinger
30
14
0
31 May 2021
UCPhrase: Unsupervised Context-aware Quality Phrase Tagging
Xiaotao Gu
Zihan Wang
Zhenyu Bi
Yu Meng
Liyuan Liu
Jiawei Han
Jingbo Shang
85
36
0
28 May 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren
Junyang Lin
Guangxiang Zhao
Rui Men
An Yang
Jingren Zhou
Xu Sun
Hongxia Yang
26
36
0
28 May 2021
Inspecting the concept knowledge graph encoded by modern language models
Carlos Aspillaga
Marcelo Mendoza
Alvaro Soto
21
13
0
27 May 2021
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT
VLM
19
760
0
26 May 2021
Context-Sensitive Visualization of Deep Learning Natural Language Processing Models
A. Dunn
Diana Inkpen
Razvan Andonie
14
8
0
25 May 2021
VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Prahal Arora
Masoumeh Aminzadeh
Christoph Feichtenhofer
Florian Metze
Luke Zettlemoyer
26
129
0
20 May 2021