Revealing the Dark Secrets of BERT
Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019. arXiv:1908.08593, 21 August 2019.
Papers citing "Revealing the Dark Secrets of BERT" (50 of 347 papers shown)
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models. Dominik Wagner, Ilja Baumann, Korbinian Riedhammer, Tobias Bocklet. [MQ] 16 Jun 2024.
Exploring Alignment in Shared Cross-lingual Spaces. Basel Mousi, Nadir Durrani, Fahim Dalvi, Majd Hawasly, Ahmed Abdelali. Annual Meeting of the Association for Computational Linguistics (ACL), 2024. 23 May 2024.
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs. Jaewoo Yang, Hayun Kim, Younghoon Kim. 23 May 2024.
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models. Haojie Duanmu, Zhihang Yuan, Xiuhong Li, Jiangfei Duan, Xingcheng Zhang, Dahua Lin. [MQ] 10 May 2024.
Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models. Shiyu Xia, Wenxuan Zhu, Xu Yang, Xin Geng. 25 Apr 2024.
Detecting Conceptual Abstraction in LLMs. Michaela Regneri, Alhassan Abdelhalim, Soren Laue. 24 Apr 2024.
What do Transformers Know about Government? Jue Hou, Anisia Katinskaia, Lari Kotilainen, Sathianpong Trangcasanchai, Anh Vu, R. Yangarber. 22 Apr 2024.
Latent Concept-based Explanation of NLP Models. Xuemin Yu, Fahim Dalvi, Nadir Durrani, Marzia Nouri, Hassan Sajjad. [LRM, FAtt] 18 Apr 2024.
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models. Jerry Yao-Chieh Hu, Pei-Hsuan Chang, Haozheng Luo, Hong-Yu Chen, Weijian Li, Wei-Po Wang, Han Liu. 04 Apr 2024.
Deconstructing In-Context Learning: Understanding Prompts via Corruption. Namrata Shivagunde, Vladislav Lialin, Sherin Muckatira, Anna Rumshisky. International Conference on Language Resources and Evaluation (LREC), 2024. 02 Apr 2024.
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence. Hsiu-Wei Yang, Abhinav Agrawal, Pavlos Fragkogiannis, Shubham Nitin Mulay. 27 Mar 2024.
A Study on How Attention Scores in the BERT Model are Aware of Lexical Categories in Syntactic and Semantic Tasks on the GLUE Benchmark. Dongjun Jang, Sungjoo Byun, Hyopil Shin. 25 Mar 2024.
Transformers Learn Low Sensitivity Functions: Investigations and Implications. Bhavya Vasudeva, Deqing Fu, Tianyi Zhou, Elliott Kau, Youqi Huang, Willie Neiswanger. International Conference on Learning Representations (ICLR), 2024. 11 Mar 2024.
Word Importance Explains How Prompts Affect Language Model Outputs. Stefan Hackmann, Haniyeh Mahmoudian, Mark Steadman, Michael Schmidt. [AAML] 05 Mar 2024.
Topic Aware Probing: From Sentence Length Prediction to Idiom Identification how reliant are Neural Language Models on Topic? Vasudevan Nedumpozhimana, John D. Kelleher. 04 Mar 2024.
Massive Activations in Large Language Models. Mingjie Sun, Xinlei Chen, J. Zico Kolter, Zhuang Liu. 27 Feb 2024.
Probing Multimodal Large Language Models for Global and Local Semantic Representations. Mingxu Tao, Quzhe Huang, Kun Xu, Liwei Chen, Yansong Feng, Dongyan Zhao. 27 Feb 2024.
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking. Nikhil Prakash, Tamar Rott Shaham, Tal Haklay, Yonatan Belinkov, David Bau. 22 Feb 2024.
Is It a Free Lunch for Removing Outliers during Pretraining? Baohao Liao, Christof Monz. [MQ] 19 Feb 2024.
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers. Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Farber. 18 Feb 2024.
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models. Erik Arakelyan, Zhaoqi Liu, Isabelle Augenstein. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024. [AAML] 25 Jan 2024.
Leveraging Social Media Data to Identify Factors Influencing Public Attitude Towards Accessibility, Socioeconomic Disparity and Public Transportation. Khondhaker Al Momin, A. M. Sadri, Md Sami Hasnine. 22 Jan 2024.
Better Explain Transformers by Illuminating Important Information. Linxin Song, Yan Cui, Ao Luo, Freddy Lecue, Irene Li. [FAtt] 18 Jan 2024.
Anchor function: a type of benchmark functions for studying language models. Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, E. Weinan, Z. Xu. 16 Jan 2024.
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation. Yun-Wei Chu, Dong-Jun Han, Christopher G. Brinton. IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024. 15 Jan 2024.
Towards Probing Contact Center Large Language Models. Varun Nathan, Ayush Kumar, Digvijay Ingle, Jithendra Vepa. 26 Dec 2023.
Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment. Lingling Xu, Haoran Xie, S. J. Qin, Xiaohui Tao, F. Wang. 19 Dec 2023.
Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations. Xiao Zhang, David Yunis, Michael Maire. 11 Dec 2023.
Transformer as Linear Expansion of Learngene. Shiyu Xia, Miaosen Zhang, Xu Yang, Ruiming Chen, Haokun Chen, Xin Geng. AAAI Conference on Artificial Intelligence (AAAI), 2023. 09 Dec 2023.
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars. Kaiyue Wen, Yuchen Li, Bing Liu, Andrej Risteski. Neural Information Processing Systems (NeurIPS), 2023. 03 Dec 2023.
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey. Tianyu Ding, Tianyi Chen, Haidong Zhu, Jiachen Jiang, Yiqi Zhong, Jinxin Zhou, Guangzhi Wang, Zhihui Zhu, Ilya Zharkov, Luming Liang. 01 Dec 2023.
Visual Analytics for Generative Transformer Models. Raymond Li, Ruixin Yang, Wen Xiao, Ahmed AbuRaed, Gabriel Murray, Giuseppe Carenini. 21 Nov 2023.
GradSim: Gradient-Based Language Grouping for Effective Multilingual Training. Mingyang Wang, Heike Adel, Lukas Lange, Jannik Strötgen, Hinrich Schütze. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 23 Oct 2023.
Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number. Sophie Hao, Tal Linzen. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023. 23 Oct 2023.
Disentangling the Linguistic Competence of Privacy-Preserving BERT. Stefan Arnold, Nils Kemmerzell, Annika Schreiner. BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023. 17 Oct 2023.
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers. Hosein Mohebbi, Grzegorz Chrupała, Willem H. Zuidema, Afra Alishahi. 15 Oct 2023.
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs. Suyu Ge, Yunan Zhang, Liyuan Liu, Minjia Zhang, Jiawei Han, Jianfeng Gao. International Conference on Learning Representations (ICLR), 2023. 03 Oct 2023.
Grasping AI: experiential exercises for designers. Dave Murray-Rust, M. Lupetti, Iohanna Nicenboim, W. V. D. Hoog. AI & Society, 2023. 02 Oct 2023.
Neurons in Large Language Models: Dead, N-gram, Positional. Elena Voita, Javier Ferrando, Christoforos Nalmpantis. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. [MILM] 09 Sep 2023.
Explainability for Large Language Models: A Survey. Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, D. Yin, Jundong Li. ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023. [LRM] 02 Sep 2023.
Why do universal adversarial attacks work on large language models?: Geometry might be the answer. Varshini Subhash, Anna Bialas, Weiwei Pan, Finale Doshi-Velez. [AAML] 01 Sep 2023.
Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices. Elizaveta Kostenok, D. Cherniavskii, Alexey Zaytsev. 22 Aug 2023.
Scaling up Discovery of Latent Concepts in Deep NLP Models. Majd Hawasly, Fahim Dalvi, Nadir Durrani. Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023. 20 Aug 2023.
PMET: Precise Model Editing in a Transformer. Xiaopeng Li, Shasha Li, Shezheng Song, Jing Yang, Jun Ma, Jie Yu. AAAI Conference on Artificial Intelligence (AAAI), 2023. [KELM] 17 Aug 2023.
Decoding Layer Saliency in Language Transformers. Elizabeth M. Hou, Greg Castañón. International Conference on Machine Learning (ICML), 2023. [MILM] 09 Aug 2023.
Improving BERT with Hybrid Pooling Network and Drop Mask. Qian Chen, Wen Wang, Qinglin Zhang, Chong Deng, Ma Yukun, Siqi Zheng. 14 Jul 2023.
Multi-Task Learning Improves Performance In Deep Argument Mining Models. Amirhossein Farzam, Shashank Shekhar, Isaac Mehlhaff, Marco Morucci. Workshop on Argument Mining (ArgMining), 2023. 03 Jul 2023.
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing. Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort. Neural Information Processing Systems (NeurIPS), 2023. [MQ] 22 Jun 2023.
Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks. Mohamad Ballout, U. Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger. 21 Jun 2023.
Explicit Syntactic Guidance for Neural Text Generation. Yafu Li, Leyang Cui, Jianhao Yan, Yongjing Yin, Wei Bi, Shuming Shi, Yue Zhang. Annual Meeting of the Association for Computational Linguistics (ACL), 2023. 20 Jun 2023.
Page 2 of 7