What Does BERT Look At? An Analysis of BERT's Attention


11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
    MILM

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"

50 / 883 papers shown
Title
AMPLIFY:Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer
Leixin Yang
Yu Xiang
23
0
0
22 Sep 2023
AttentionMix: Data augmentation method that relies on BERT attention mechanism
Dominik Lewy
Jacek Mańdziuk
8
3
0
20 Sep 2023
Weakly Supervised Reasoning by Neuro-Symbolic Approaches
Xianggen Liu
Zhengdong Lu
Lili Mou
LRM
NAI
22
4
0
19 Sep 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen
Ravid Schwartz-Ziv
Kyunghyun Cho
Matthew L. Leavitt
Naomi Saphra
24
62
0
13 Sep 2023
Generating Natural Language Queries for More Effective Systematic Review Screening Prioritisation
Shuai Wang
Harrisen Scells
Martin Potthast
Bevan Koopman
Guido Zuccon
22
10
0
11 Sep 2023
DeViT: Decomposing Vision Transformers for Collaborative Inference in Edge Devices
Guanyu Xu
Zhiwei Hao
Yong Luo
Han Hu
J. An
Shiwen Mao
ViT
37
14
0
10 Sep 2023
Neurons in Large Language Models: Dead, N-gram, Positional
Elena Voita
Javier Ferrando
Christoforos Nalmpantis
MILM
22
45
0
09 Sep 2023
One Wide Feedforward is All You Need
Telmo Pires
António V. Lopes
Yannick Assogba
Hendra Setiawan
27
12
0
04 Sep 2023
A Visual Interpretation-Based Self-Improved Classification System Using Virtual Adversarial Training
Shuai Jiang
Sayaka Kamei
Chen Li
Shengzhe Hou
Yasuhiko Morimoto
SSL
11
1
0
03 Sep 2023
Explainability for Large Language Models: A Survey
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Mengnan Du
LRM
21
408
0
02 Sep 2023
Evaluating Transformer's Ability to Learn Mildly Context-Sensitive Languages
Shunjie Wang
Shane Steinert-Threlkeld
25
4
0
02 Sep 2023
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
Varshini Subhash
Anna Bialas
Weiwei Pan
Finale Doshi-Velez
AAML
14
10
0
01 Sep 2023
Consensus of state of the art mortality prediction models: From all-cause mortality to sudden death prediction
Yola Jones
F. Deligianni
Jeffrey Stephen Dalton
P. Pellicori
John G. F. Cleland
OOD
16
0
0
30 Aug 2023
Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
Elizaveta Kostenok
D. Cherniavskii
Alexey Zaytsev
41
5
0
22 Aug 2023
Radio2Text: Streaming Speech Recognition Using mmWave Radio Signals
Running Zhao
Jiang-Tao Luca Yu
H. Zhao
Edith C. H. Ngai
24
4
0
16 Aug 2023
Task Conditioned BERT for Joint Intent Detection and Slot-filling
Diogo Tavares
Pedro Azevedo
David Semedo
R. Sousa
João Magalhães
14
4
0
11 Aug 2023
Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning
Hoang Nguyen
Chenwei Zhang
Ye Liu
Philip S. Yu
26
5
0
09 Aug 2023
Trusting Language Models in Education
J. Neto
Li-Ming Deng
Thejaswi Raya
Reza Shahbazi
Nick Liu
Adhitya Venkatesh
Miral Shah
Neeru Khosla
Rodrigo Guido
19
0
0
07 Aug 2023
Prompt Guided Copy Mechanism for Conversational Question Answering
Yong Zhang
Zhitao Li
Jianzong Wang
Yiming Gao
Ning Cheng
Fengying Yu
Jing Xiao
12
0
0
07 Aug 2023
Explaining Relation Classification Models with Semantic Extents
Lars Klöser
André Büsgen
Philipp Kohl
Bodo Kraft
Albert Zündorf
14
0
0
04 Aug 2023
MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets
Siyi Du
Nourhan Bayasi
Ghassan Hamarneh
Rafeef Garbi
ViT
18
18
0
05 Jul 2023
The Inner Sentiments of a Thought
Christian Gagné
Peter Dayan
33
4
0
04 Jul 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
19
25
0
30 Jun 2023
Constraint-aware and Ranking-distilled Token Pruning for Efficient Transformer Inference
Junyan Li
Li Lyna Zhang
Jiahang Xu
Yujing Wang
Shaoguang Yan
...
Ting Cao
Hao-Lun Sun
Weiwei Deng
Qi Zhang
Mao Yang
25
10
0
26 Jun 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
13
88
0
22 Jun 2023
Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks
Mohamad Ballout
U. Krumnack
Gunther Heidemann
Kai-Uwe Kühnberger
28
2
0
21 Jun 2023
Explicit Syntactic Guidance for Neural Text Generation
Yafu Li
Leyang Cui
Jianhao Yan
Yongjing Yin
Wei Bi
Shuming Shi
Yue Zhang
16
9
0
20 Jun 2023
Did the Models Understand Documents? Benchmarking Models for Language Understanding in Document-Level Relation Extraction
Haotian Chen
Bingsheng Chen
Xiangdong Zhou
43
6
0
20 Jun 2023
PEACE: Cross-Platform Hate Speech Detection- A Causality-guided Framework
Paras Sheth
Tharindu Kumarage
Raha Moraffah
Amanat Chadha
Huan Liu
31
7
0
15 Jun 2023
Is Anisotropy Inherent to Transformers?
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
17
3
0
13 Jun 2023
Actively Supervised Clustering for Open Relation Extraction
Jun Zhao
Yongxin Zhang
Qi Zhang
Tao Gui
Zhongyu Wei
Minlong Peng
Mingming Sun
11
5
0
08 Jun 2023
InfoPrompt: Information-Theoretic Soft Prompt Tuning for Natural Language Understanding
Junda Wu
Tong Yu
Rui Wang
Zhao-quan Song
Ruiyi Zhang
Handong Zhao
Chaochao Lu
Shuai Li
Ricardo Henao
VLM
31
22
0
08 Jun 2023
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
17
6
0
06 Jun 2023
CUE: An Uncertainty Interpretation Framework for Text Classifiers Built on Pre-Trained Language Models
Jiazheng Li
Zhaoyue Sun
Bin Liang
Lin Gui
Yulan He
10
2
0
06 Jun 2023
Representational Strengths and Limitations of Transformers
Clayton Sanford
Daniel J. Hsu
Matus Telgarsky
22
81
0
05 Jun 2023
DecompX: Explaining Transformers Decisions by Propagating Token Decomposition
Ali Modarressi
Mohsen Fayyaz
Ehsan Aghazadeh
Yadollah Yaghoobzadeh
Mohammad Taher Pilehvar
25
25
0
05 Jun 2023
Span Identification of Epistemic Stance-Taking in Academic Written English
Masaki Eguchi
K. Kyle
11
5
0
03 Jun 2023
Do Large Language Models Pay Similar Attention Like Human Programmers When Generating Code?
Bonan Kou
Shengmai Chen
Zhijie Wang
Lei Ma
Tianyi Zhang
ALM
11
13
0
02 Jun 2023
Learning Transformer Programs
Dan Friedman
Alexander Wettig
Danqi Chen
28
32
0
01 Jun 2023
ACLM: A Selective-Denoising based Generative Data Augmentation Approach for Low-Resource Complex NER
Sreyan Ghosh
Utkarsh Tyagi
Manan Suri
Sonal Kumar
S. Ramaneswaran
Dinesh Manocha
28
15
0
01 Jun 2023
Masked Autoencoders with Multi-Window Local-Global Attention Are Better Audio Learners
Sarthak Yadav
Sergios Theodoridis
Lars Kai Hansen
Z. Tan
20
7
0
01 Jun 2023
Assessing Word Importance Using Models Trained for Semantic Tasks
Dávid Javorský
Ondrej Bojar
François Yvon
19
2
0
31 May 2023
Emergent Modularity in Pre-trained Transformers
Zhengyan Zhang
Zhiyuan Zeng
Yankai Lin
Chaojun Xiao
Xiaozhi Wang
Xu Han
Zhiyuan Liu
Ruobing Xie
Maosong Sun
Jie Zhou
MoE
37
23
0
28 May 2023
Robust Natural Language Understanding with Residual Attention Debiasing
Fei Wang
James Y. Huang
Tianyi Yan
Wenxuan Zhou
Muhao Chen
29
10
0
28 May 2023
Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning
Zhen-Ru Zhang
Chuanqi Tan
Haiyang Xu
Chengyu Wang
Jun Huang
Songfang Huang
25
29
0
24 May 2023
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives
Xinpeng Wang
Leonie Weissweiler
Hinrich Schütze
Barbara Plank
18
8
0
24 May 2023
SenteCon: Leveraging Lexicons to Learn Human-Interpretable Language Representations
Victoria Lin
Louis-Philippe Morency
MILM
14
1
0
24 May 2023
All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations
Yuxin Ren
Qipeng Guo
Zhijing Jin
Shauli Ravfogel
Mrinmaya Sachan
Bernhard Schölkopf
Ryan Cotterell
19
4
0
23 May 2023
Understanding and Mitigating Spurious Correlations in Text Classification with Neighborhood Analysis
Oscar Chew
Hsuan-Tien Lin
Kai-Wei Chang
Kuan-Hao Huang
32
5
0
23 May 2023
GATology for Linguistics: What Syntactic Dependencies It Knows
Yuqian Dai
S. Sharoff
M. Kamps
24
0
0
22 May 2023