ResearchTrend.AI
What Does BERT Look At? An Analysis of BERT's Attention

11 June 2019
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
    MILM

Papers citing "What Does BERT Look At? An Analysis of BERT's Attention"

50 / 883 papers shown
Title
Teaching Probabilistic Logical Reasoning to Transformers
Aliakbar Nafar
K. Venable
Parisa Kordjamshidi
ReLM
LRM
16
3
0
22 May 2023
LMs: Understanding Code Syntax and Semantics for Code Analysis
Wei Ma
Shangqing Liu
Zhihao Lin
Wenhan Wang
Q. Hu
Ye Liu
Cen Zhang
Liming Nie
Li Li
Yang Liu
29
16
0
20 May 2023
Constructing Word-Context-Coupled Space Aligned with Associative Knowledge Relations for Interpretable Language Modeling
Fanyu Wang
Zhenping Xie
19
0
0
19 May 2023
Trading Syntax Trees for Wordpieces: Target-oriented Opinion Words Extraction with Wordpieces and Aspect Enhancement
Samuel Mensah
Kai Sun
Nikolaos Aletras
19
1
0
18 May 2023
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings
Qian Chen
Wen Wang
Qinglin Zhang
Siqi Zheng
Chong Deng
Hai Yu
Jiaqing Liu
Yukun Ma
Chong Zhang
16
3
0
18 May 2023
Probing the Role of Positional Information in Vision-Language Models
Philipp J. Rösch
Jindřich Libovický
16
8
0
17 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Zhengxuan Wu
Atticus Geiger
Thomas Icard
Christopher Potts
Noah D. Goodman
MILM
36
81
0
15 May 2023
Continual Multimodal Knowledge Graph Construction
Xiang Chen
Jintian Zhang
Xiaohan Wang
Ningyu Zhang
Tongtong Wu
Luo Si
Yongheng Wang
Huajun Chen
KELM
CLL
27
14
0
15 May 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan
Yuanzhi Li
SyDa
LRM
18
237
0
12 May 2023
Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models
Lukáš Mikula
Michal Štefánik
Marek Petrovič
Petr Sojka
33
3
0
11 May 2023
HiFi: High-Information Attention Heads Hold for Parameter-Efficient Model Adaptation
Anchun Gui
Han Xiao
19
4
0
08 May 2023
Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Ta-Chung Chi
Ting-Han Fan
Alexander I. Rudnicky
Peter J. Ramadge
LRM
14
12
0
05 May 2023
AttentionViz: A Global View of Transformer Attention
Catherine Yeh
Yida Chen
Aoyu Wu
Cynthia Chen
Fernanda Viégas
Martin Wattenberg
ViT
33
52
0
04 May 2023
Entity Tracking in Language Models
Najoung Kim
Sebastian Schuster
52
16
0
03 May 2023
Causality-aware Concept Extraction based on Knowledge-guided Prompting
Siyu Yuan
Deqing Yang
Jinxi Liu
Shuyu Tian
Jiaqing Liang
Yanghua Xiao
R. Xie
61
13
0
03 May 2023
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Amanda Bertsch
Uri Alon
Graham Neubig
Matthew R. Gormley
RALM
94
122
0
02 May 2023
Logion: Machine Learning for Greek Philology
Charlie Cowen-Breen
Creston Brooks
J. Haubold
B. Graziosi
13
4
0
01 May 2023
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna
Ollie Liu
Alexandre Variengien
LRM
189
119
0
30 Apr 2023
What does BERT learn about prosody?
Sofoklis Kakouros
Johannah O'Mahony
MILM
12
4
0
25 Apr 2023
Improving Autoregressive NLP Tasks via Modular Linearized Attention
Victor Agostinelli
Lizhong Chen
22
1
0
17 Apr 2023
Computational modeling of semantic change
Nina Tahmasebi
Haim Dubossarsky
26
6
0
13 Apr 2023
Can BERT eat RuCoLA? Topological Data Analysis to Explain
Irina Proskurina
Irina Piontkovskaya
Ekaterina Artemova
69
3
0
04 Apr 2023
Coupling Artificial Neurons in BERT and Biological Neurons in the Human Brain
Xu Liu
Mengyue Zhou
Gaosheng Shi
Yu Du
Lin Zhao
Zihao Wu
David Liu
Tianming Liu
Xintao Hu
28
10
0
27 Mar 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
27
103
0
20 Mar 2023
Attention-likelihood relationship in transformers
Valeria Ruscio
Valentino Maiorca
Fabrizio Silvestri
11
1
0
15 Mar 2023
Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
Kamil Bujel
Andrew Caines
H. Yannakoudakis
Marek Rei
AI4TS
19
1
0
14 Mar 2023
The Life Cycle of Knowledge in Big Language Models: A Survey
Boxi Cao
Hongyu Lin
Xianpei Han
Le Sun
KELM
26
27
0
14 Mar 2023
Input-length-shortening and text generation via attention values
Necset Ozkan Tan
A. Peng
Joshua Bensemann
Qiming Bao
Tim Hartill
M. Gahegan
Michael Witbrock
21
1
0
14 Mar 2023
LUKE-Graph: A Transformer-based Approach with Gated Relational Graph Attention for Cloze-style Reading Comprehension
Shima Foolad
Kourosh Kiani
17
3
0
12 Mar 2023
How Do Transformers Learn Topic Structure: Towards a Mechanistic Understanding
Yuchen Li
Yuanzhi Li
Andrej Risteski
117
61
0
07 Mar 2023
Ultra-High-Resolution Detector Simulation with Intra-Event Aware GAN and Self-Supervised Relational Reasoning
H. Hashemi
Nikolai Hartmann
Sahand Sharifzadeh
James Kahn
T. Kuhr
21
4
0
07 Mar 2023
Spelling convention sensitivity in neural language models
Elizabeth Nielsen
Christo Kirov
Brian Roark
20
1
0
06 Mar 2023
A Survey on Long Text Modeling with Transformers
Zican Dong
Tianyi Tang
Lunyi Li
Wayne Xin Zhao
VLM
19
53
0
28 Feb 2023
Inseq: An Interpretability Toolkit for Sequence Generation Models
Gabriele Sarti
Nils Feldhus
Ludwig Sickert
Oskar van der Wal
Malvina Nissim
Arianna Bisazza
30
64
0
27 Feb 2023
SynGen: A Syntactic Plug-and-play Module for Generative Aspect-based Sentiment Analysis
Chengze Yu
Taiqiang Wu
Jiayi Li
Xingyu Bai
Yujiu Yang
28
10
0
25 Feb 2023
Mask-guided BERT for Few Shot Text Classification
Wenxiong Liao
Zheng Liu
Haixing Dai
Zihao Wu
Yiyang Zhang
...
Dajiang Zhu
Tianming Liu
Sheng R. Li
Xiang Li
Hongmin Cai
VLM
42
39
0
21 Feb 2023
Evaluating the Effectiveness of Pre-trained Language Models in Predicting the Helpfulness of Online Product Reviews
Ali Boluki
Javad Pourmostafa Roshan Sharami
D. Shterionov
17
1
0
19 Feb 2023
Representation Deficiency in Masked Language Modeling
Yu Meng
Jitin Krishnan
Sinong Wang
Qifan Wang
Yuning Mao
Han Fang
Marjan Ghazvininejad
Jiawei Han
Luke Zettlemoyer
79
7
0
04 Feb 2023
Analyzing Feed-Forward Blocks in Transformers through the Lens of Attention Maps
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
28
14
0
01 Feb 2023
Quantifying Context Mixing in Transformers
Hosein Mohebbi
Willem H. Zuidema
Grzegorz Chrupała
A. Alishahi
164
24
0
30 Jan 2023
Can We Use Probing to Better Understand Fine-tuning and Knowledge Distillation of the BERT NLU?
Jakub Hościłowicz
Marcin Sowanski
Piotr Czubowski
Artur Janicki
23
2
0
27 Jan 2023
Interpretability in Activation Space Analysis of Transformers: A Focused Survey
Soniya Vijayakumar
AI4CE
27
3
0
22 Jan 2023
Deep Learning Models to Study Sentence Comprehension in the Human Brain
S. Arana
Jacques Pesnot Lerousseau
P. Hagoort
21
10
0
16 Jan 2023
Topics in Contextualised Attention Embeddings
Mozhgan Talebpour
A. G. S. D. Herrera
Shoaib Jameel
26
2
0
11 Jan 2023
Can Large Language Models Change User Preference Adversarially?
Varshini Subhash
AAML
29
8
0
05 Jan 2023
Don't Be So Sure! Boosting ASR Decoding via Confidence Relaxation
Tomer Wullach
Shlomo E. Chazan
22
1
0
27 Dec 2022
EIT: Enhanced Interactive Transformer
Tong Zheng
Bei Li
Huiwen Bao
Tong Xiao
Jingbo Zhu
24
2
0
20 Dec 2022
Attention as a Guide for Simultaneous Speech Translation
Sara Papi
Matteo Negri
Marco Turchi
24
30
0
15 Dec 2022
Explainability of Text Processing and Retrieval Methods: A Critical Survey
Sourav Saha
Debapriyo Majumdar
Mandar Mitra
8
5
0
14 Dec 2022
Mortality Prediction Models with Clinical Notes Using Sparse Attention at the Word and Sentence Levels
Miguel Rios
A. Abu-Hanna
16
0
0
12 Dec 2022