arXiv 1908.08593
Revealing the Dark Secrets of BERT
21 August 2019
Olga Kovaleva
Alexey Romanov
Anna Rogers
Anna Rumshisky
Papers citing
"Revealing the Dark Secrets of BERT"
50 / 185 papers shown
RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer
Haotian Ni, Yake Wei, Hang Liu, Gong Chen, Chong Peng, Hao Lin, Di Hu
OffRL · 68 · 0 · 0 · 13 Jun 2025

Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models
Yukin Zhang, Qi Dong
102 · 0 · 0 · 23 May 2025

Intra-Layer Recurrence in Transformers for Language Modeling
Anthony Nguyen, Wenjun Lin
63 · 0 · 0 · 03 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo, Chenghao Qiu, Maojiang Su, Zhihan Zhou, Zoe Mehta, Guo Ye, Jerry Yao-Chieh Hu, Han Liu
AAML · 107 · 1 · 0 · 01 May 2025

Do Large Language Models know who did what to whom?
Joseph M. Denning, Xiaohan, Bryor Snefjella, Idan A. Blank
261 · 1 · 0 · 23 Apr 2025

Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning
Julian Minder, Clement Dumas, Caden Juang, Bilal Chugtai, Neel Nanda
172 · 1 · 0 · 03 Apr 2025
Parameter-Efficient Fine-Tuning for Foundation Models
Dan Zhang, Tao Feng, Lilong Xue, Yuandong Wang, Yuxiao Dong, J. Tang
234 · 12 · 0 · 23 Jan 2025

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models
Michael Toker, Ido Galil, Hadas Orgad, Rinon Gal, Yoad Tewel, Gal Chechik, Yonatan Belinkov
DiffM · 100 · 2 · 0 · 12 Jan 2025

Multi-Head Explainer: A General Framework to Improve Explainability in CNNs and Transformers
Bohang Sun, Pietro Liò
ViT · AAML · 152 · 1 · 0 · 02 Jan 2025
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon, Roi Reichart
130 · 16 · 0 · 27 Jul 2024

SRViT: Vision Transformers for Estimating Radar Reflectivity from Satellite Observations at Scale
Jason Stock, Kyle Hilburn, Imme Ebert-Uphoff, Charles Anderson
70 · 2 · 0 · 20 Jun 2024

Latent Concept-based Explanation of NLP Models
Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad
LRM · FAtt · 59 · 3 · 0 · 18 Apr 2024

Deconstructing In-Context Learning: Understanding Prompts via Corruption
Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky
89 · 3 · 0 · 02 Apr 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang, Abhinav Agrawal, Pavlos Fragkogiannis, Shubham Nitin Mulay
86 · 1 · 0 · 27 Mar 2024

Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Farber
108 · 3 · 0 · 18 Feb 2024

Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Erik Arakelyan, Zhaoqi Liu, Isabelle Augenstein
AAML · 145 · 12 · 0 · 25 Jan 2024

Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation
Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton
136 · 4 · 0 · 15 Jan 2024

Decoding Layer Saliency in Language Transformers
Elizabeth M. Hou, Greg Castañón
MILM · 67 · 0 · 0 · 09 Aug 2023
Multi-Task Learning Improves Performance In Deep Argument Mining Models
Amirhossein Farzam, Shashank Shekhar, Isaac Mehlhaff, Marco Morucci
48 · 1 · 0 · 03 Jul 2023

Not wacky vs. definitely wacky: A study of scalar adverbs in pretrained language models
Isabelle Lorge, J. Pierrehumbert
70 · 0 · 0 · 25 May 2023

All Roads Lead to Rome? Exploring the Invariance of Transformers' Representations
Yuxin Ren, Qipeng Guo, Zhijing Jin, Shauli Ravfogel, Mrinmaya Sachan, Bernhard Schölkopf, Ryan Cotterell
77 · 4 · 0 · 23 May 2023

Constructing Word-Context-Coupled Space Aligned with Associative Knowledge Relations for Interpretable Language Modeling
Fanyu Wang, Zhenping Xie
76 · 0 · 0 · 19 May 2023

AttentionViz: A Global View of Transformer Attention
Catherine Yeh, Yida Chen, Aoyu Wu, Cynthia Chen, Fernanda Viégas, Martin Wattenberg
ViT · 79 · 55 · 0 · 04 May 2023
Zero-Shot Learning for Requirements Classification: An Exploratory Study
Waad Alhoshan, Alessio Ferrari, Liping Zhao
VLM · 113 · 41 · 0 · 09 Feb 2023

An Empirical Study on the Transferability of Transformer Modules in Parameter-Efficient Fine-Tuning
Mohammad AkbarTajari, S. Rajaee, Mohammad Taher Pilehvar
50 · 2 · 0 · 01 Feb 2023

Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution
Yan Li, Xin Lu, Haoyi Xiong, Jian Tang, Jian Su, Bo Jin, Dejing Dou
AI4TS · 70 · 27 · 0 · 05 Jan 2023

Attention as a Guide for Simultaneous Speech Translation
Sara Papi, Matteo Negri, Marco Turchi
93 · 31 · 0 · 15 Dec 2022

Explainability of Text Processing and Retrieval Methods: A Critical Survey
Sourav Saha, Debapriyo Majumdar, Mandar Mitra
96 · 5 · 0 · 14 Dec 2022
Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers
Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-yi Lee, Hao Tang
84 · 6 · 0 · 17 Nov 2022

LERT: A Linguistically-motivated Pre-trained Language Model
Yiming Cui, Wanxiang Che, Shijin Wang, Ting Liu
91 · 25 · 0 · 10 Nov 2022

Robust Lottery Tickets for Pre-trained Language Models
Rui Zheng, Rong Bao, Yuhao Zhou, Di Liang, Sirui Wang, Wei Wu, Tao Gui, Qi Zhang, Xuanjing Huang
AAML · 87 · 14 · 0 · 06 Nov 2022

On the Transformation of Latent Space in Fine-Tuned NLP Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Firoj Alam
120 · 19 · 0 · 23 Oct 2022

Transparency Helps Reveal When Language Models Learn Meaning
Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith
59 · 10 · 0 · 14 Oct 2022

Shapley Head Pruning: Identifying and Removing Interference in Multilingual Transformers
William B. Held, Diyi Yang
VLM · 102 · 6 · 0 · 11 Oct 2022
What the DAAM: Interpreting Stable Diffusion Using Cross Attention
Raphael Tang, Linqing Liu, Akshat Pandey, Zhiying Jiang, Gefei Yang, K. Kumar, Pontus Stenetorp, Jimmy J. Lin, Ferhan Ture
175 · 177 · 0 · 10 Oct 2022
Parameter-Efficient Tuning with Special Token Adaptation
Xiaocong Yang, James Y. Huang, Wenxuan Zhou, Muhao Chen
89 · 12 · 0 · 10 Oct 2022
Analysis of Self-Attention Head Diversity for Conformer-based Automatic Speech Recognition
Kartik Audhkhasi, Yinghui Huang, Bhuvana Ramabhadran, Pedro J. Moreno
62 · 3 · 0 · 13 Sep 2022

Pre-Training a Graph Recurrent Network for Language Representation
Yile Wang, Linyi Yang, Zhiyang Teng, M. Zhou, Yue Zhang
GNN · 81 · 1 · 0 · 08 Sep 2022

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
156 · 114 · 0 · 31 Aug 2022

Coarse-to-Fine: Hierarchical Multi-task Learning for Natural Language Understanding
Zhaoye Fei, Yu Tian, Yongkang Wu, Xinyu Zhang, Yutao Zhu, ..., Dejiang Kong, Ruofei Lai, Bo Zhao, Zhicheng Dou, Xipeng Qiu
282 · 1 · 0 · 19 Aug 2022

DataPerf: Benchmarks for Data-Centric AI Development
Mark Mazumder, Colby R. Banbury, Xiaozhe Yao, Bojan Karlavs, W. G. Rojas, ..., Carole-Jean Wu, Cody Coleman, Andrew Y. Ng, Peter Mattson, Vijay Janapa Reddi
VLM · 87 · 105 · 0 · 20 Jul 2022

What does Transformer learn about source code?
Kechi Zhang, Ge Li, Zhi Jin
ViT · 90 · 8 · 0 · 18 Jul 2022
Embedding Recycling for Language Models
Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey
KELM · 60 · 4 · 0 · 11 Jul 2022
Proton: Probing Schema Linking Information from Pre-trained Language Models for Text-to-SQL Parsing
Lihan Wang, Bowen Qin, Binyuan Hui, Bowen Li, Min Yang, Bailin Wang, Binhua Li, Fei Huang, Luo Si, Yongbin Li
135 · 44 · 0 · 28 Jun 2022

Understanding Long Programming Languages with Structure-Aware Sparse Attention
Tingting Liu, Chengyu Wang, Cen Chen, Ming Gao, Aoying Zhou
65 · 3 · 0 · 27 May 2022
Outlier Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell'Orletta
165 · 45 · 0 · 23 May 2022
Life after BERT: What do Other Muppets Understand about Language?
Vladislav Lialin, Kevin Zhao, Namrata Shivagunde, Anna Rumshisky
110 · 6 · 0 · 21 May 2022

Acceptability Judgements via Examining the Topology of Attention Maps
D. Cherniavskii, Eduard Tulchinskii, Vladislav Mikhailov, Irina Proskurina, Laida Kushnareva, Ekaterina Artemova, S. Barannikov, Irina Piontkovskaya, D. Piontkovski, Evgeny Burnaev
826 · 20 · 0 · 19 May 2022

Discovering Latent Concepts Learned in BERT
Fahim Dalvi, A. Khan, Firoj Alam, Nadir Durrani, Jia Xu, Hassan Sajjad
SSL · 50 · 61 · 0 · 15 May 2022

Silence is Sweeter Than Speech: Self-Supervised Model Using Silence to Store Speaker Information
Chiyu Feng, Po-Chun Hsu, Hung-yi Lee
SSL · 86 · 8 · 0 · 08 May 2022