

Revealing the Dark Secrets of BERT

21 August 2019
Olga Kovaleva
Alexey Romanov
Anna Rogers
Anna Rumshisky
arXiv 1908.08593: abs · PDF · HTML

Papers citing "Revealing the Dark Secrets of BERT"

50 / 185 papers shown
AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks
Chin-Lun Fu
Zih-Ching Chen
Yun-Ru Lee
Hung-yi Lee
85
49
0
30 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM, OffRL
80
23
0
22 Apr 2022
Probing for the Usage of Grammatical Number
Karim Lasri
Tiago Pimentel
Alessandro Lenci
Thierry Poibeau
Ryan Cotterell
80
58
0
19 Apr 2022
Entropy-based Stability-Plasticity for Lifelong Learning
Vladimir Araujo
J. Hurtado
Alvaro Soto
Marie-Francine Moens
CLL
71
15
0
18 Apr 2022
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar
Mehdi Rezagholizadeh
Abbas Ghaddar
Khalil Bibi
Philippe Langlais
Pascal Poupart
CLL
77
7
0
15 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
90
127
0
14 Apr 2022
Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
Shanshan Wang
Zhumin Chen
Zhaochun Ren
Huasheng Liang
Qiang Yan
Pengjie Ren
57
9
0
06 Apr 2022
Scaling Language Model Size in Cross-Device Federated Learning
Jae Hun Ro
Theresa Breiner
Lara McConnaughey
Mingqing Chen
A. Suresh
Shankar Kumar
Rajiv Mathews
FedML
61
26
0
31 Mar 2022
A Novel Perspective to Look At Attention: Bi-level Attention-based Explainable Topic Modeling for News Classification
Dairui Liu
Derek Greene
Ruihai Dong
62
12
0
14 Mar 2022
Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando
Gerard I. Gállego
Marta R. Costa-jussà
103
57
0
08 Mar 2022
Controlling the Focus of Pretrained Language Generation Models
Jiabao Ji
Yoon Kim
James R. Glass
Tianxing He
118
5
0
02 Mar 2022
cosFormer: Rethinking Softmax in Attention
Zhen Qin
Weixuan Sun
Huicai Deng
Dongxu Li
Yunshen Wei
Baohong Lv
Junjie Yan
Lingpeng Kong
Yiran Zhong
95
222
0
17 Feb 2022
Revisiting Over-smoothing in BERT from the Perspective of Graph
Han Shi
Jiahui Gao
Hang Xu
Xiaodan Liang
Zhenguo Li
Lingpeng Kong
Stephen M. S. Lee
James T. Kwok
89
76
0
17 Feb 2022
What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code
Yao Wan
Wei Zhao
Hongyu Zhang
Yulei Sui
Guandong Xu
Hairong Jin
103
113
0
14 Feb 2022
GAP-Gen: Guided Automatic Python Code Generation
Junchen Zhao
Yurun Song
Junlin Wang
Ian G. Harris
60
6
0
19 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP, VLM
83
40
0
15 Jan 2022
Automatic Mixed-Precision Quantization Search of BERT
Changsheng Zhao
Ting Hua
Yilin Shen
Qian Lou
Hongxia Jin
MQ
58
22
0
30 Dec 2021
Block-Skim: Efficient Question Answering for Transformer
Yue Guan
Zhengyi Li
Jingwen Leng
Zhouhan Lin
Minyi Guo
Yuhao Zhu
87
32
0
16 Dec 2021
Recent Advances in Natural Language Processing via Large Pre-Trained Language Models: A Survey
Bonan Min
Hayley L Ross
Elior Sulem
Amir Pouran Ben Veyseh
Thien Huu Nguyen
Oscar Sainz
Eneko Agirre
Ilana Heinz
Dan Roth
LM&MAVLMAI4CE
189
1,094
0
01 Nov 2021
Interpreting Deep Learning Models in Natural Language Processing: A Review
Xiaofei Sun
Diyi Yang
Xiaoya Li
Tianwei Zhang
Yuxian Meng
Han Qiu
Guoyin Wang
Eduard H. Hovy
Jiwei Li
97
46
0
20 Oct 2021
When in Doubt, Summon the Titans: Efficient Inference with Large Models
A. S. Rawat
Manzil Zaheer
A. Menon
Amr Ahmed
Sanjiv Kumar
38
7
0
19 Oct 2021
Identifying and Mitigating Spurious Correlations for Improving Robustness in NLP Models
Tianlu Wang
Rohit Sridhar
Diyi Yang
Xuezhi Wang
AAML
215
77
0
14 Oct 2021
MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang
Yankai Lin
Zhiyuan Liu
Peng Li
Maosong Sun
Jie Zhou
MoE
109
129
0
05 Oct 2021
DACT-BERT: Differentiable Adaptive Computation Time for an Efficient BERT Inference
Cristobal Eyzaguirre
Felipe del-Rio
Vladimir Araujo
Alvaro Soto
61
7
0
24 Sep 2021
AES Systems Are Both Overstable And Oversensitive: Explaining Why And Proposing Defenses
Yaman Kumar Singla
Swapnil Parekh
Somesh Singh
Junjie Li
R. Shah
Changyou Chen
AAML
90
14
0
24 Sep 2021
Classification-based Quality Estimation: Small and Efficient Models for Real-world Applications
Shuo Sun
Ahmed El-Kishky
Vishrav Chaudhary
James Cross
Francisco Guzmán
Lucia Specia
64
1
0
17 Sep 2021
Fine-Tuned Transformers Show Clusters of Similar Representations Across Layers
Jason Phang
Haokun Liu
Samuel R. Bowman
86
29
0
17 Sep 2021
Incorporating Residual and Normalization Layers into Analysis of Masked Language Models
Goro Kobayashi
Tatsuki Kuribayashi
Sho Yokoi
Kentaro Inui
240
49
0
15 Sep 2021
T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
Raymond Li
Wen Xiao
Lanjun Wang
Hyeju Jang
Giuseppe Carenini
ViT
90
23
0
31 Aug 2021
Multilingual Multi-Aspect Explainability Analyses on Machine Reading Comprehension Models
Yiming Cui
Weinan Zhang
Wanxiang Che
Ting Liu
Zhigang Chen
Shijin Wang
LRM
47
9
0
26 Aug 2021
AMMUS : A Survey of Transformer-based Pretrained Models in Natural Language Processing
Katikapalli Subramanyam Kalyan
A. Rajasekharan
S. Sangeetha
VLMLM&MA
105
270
0
12 Aug 2021
Multi-Stream Transformers
Andrey Kravchenko
Anna Rumshisky
AI4CE
24
0
0
21 Jul 2021
AutoBERT-Zero: Evolving BERT Backbone from Scratch
Jiahui Gao
Hang Xu
Han Shi
Xiaozhe Ren
Philip L. H. Yu
Xiaodan Liang
Xin Jiang
Zhenguo Li
85
37
0
15 Jul 2021
Learning Syntactic Dense Embedding with Correlation Graph for Automatic Readability Assessment
Xinying Qiu
Yuan Chen
Hanwu Chen
J. Nie
Yuming Shen
D. Lu
66
18
0
09 Jul 2021
Elbert: Fast Albert with Confidence-Window Based Early Exit
Keli Xie
Siyuan Lu
Meiqi Wang
Zhongfeng Wang
54
20
0
01 Jul 2021
A Closer Look at How Fine-tuning Changes BERT
Yichu Zhou
Vivek Srikumar
82
68
0
27 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin, MQ, AI4MH
174
863
0
14 Jun 2021
A Survey of Transformers
Tianyang Lin
Yuxin Wang
Xiangyang Liu
Xipeng Qiu
ViT
197
1,147
0
08 Jun 2021
BERTnesia: Investigating the capture and forgetting of knowledge in BERT
Jonas Wallat
Jaspreet Singh
Avishek Anand
CLL, KELM
141
60
0
05 Jun 2021
ERNIE-Tiny : A Progressive Distillation Framework for Pretrained Transformer Compression
Weiyue Su
Xuyi Chen
Shi Feng
Jiaxiang Liu
Weixin Liu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
81
13
0
04 Jun 2021
On the Distribution, Sparsity, and Inference-time Quantization of Attention Values in Transformers
Tianchu Ji
Shraddhan Jain
M. Ferdman
Peter Milder
H. Andrew Schwartz
Niranjan Balasubramanian
MQ
113
16
0
02 Jun 2021
Implicit Representations of Meaning in Neural Language Models
Belinda Z. Li
Maxwell Nye
Jacob Andreas
NAI, MILM
67
177
0
01 Jun 2021
HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization
Jiaao Chen
Dinghan Shen
Weizhu Chen
Diyi Yang
BDL
74
48
0
31 May 2021
Attention Flows are Shapley Value Explanations
Kawin Ethayarajh
Dan Jurafsky
FAtt, TDI
81
35
0
31 May 2021
Learning Relation Alignment for Calibrated Cross-modal Retrieval
Shuhuai Ren
Junyang Lin
Guangxiang Zhao
Rui Men
An Yang
Jingren Zhou
Xu Sun
Hongxia Yang
82
38
0
28 May 2021
Inspecting the concept knowledge graph encoded by modern language models
Carlos Aspillaga
Marcelo Mendoza
Alvaro Soto
72
13
0
27 May 2021
Laughing Heads: Can Transformers Detect What Makes a Sentence Funny?
Maxime Peyrard
Beatriz Borges
Kristina Gligorić
Robert West
72
13
0
19 May 2021
Effective Attention Sheds Light On Interpretability
Kaiser Sun
Ana Marasović
MILM
61
16
0
18 May 2021
BERT Busters: Outlier Dimensions that Disrupt Transformers
Olga Kovaleva
Saurabh Kulshreshtha
Anna Rogers
Anna Rumshisky
117
92
0
14 May 2021
Rationalization through Concepts
Diego Antognini
Boi Faltings
FAtt
124
22
0
11 May 2021