Revealing the Dark Secrets of BERT

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
21 August 2019
Olga Kovaleva
Alexey Romanov
Anna Rogers
Anna Rumshisky

Papers citing "Revealing the Dark Secrets of BERT"

50 / 347 papers shown
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
Korbinian Riedhammer
Tobias Bocklet
MQ
168
5
0
16 Jun 2024
Exploring Alignment in Shared Cross-lingual Spaces
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Basel Mousi
Nadir Durrani
Fahim Dalvi
Majd Hawasly
Ahmed Abdelali
209
7
0
23 May 2024
Mitigating Quantization Errors Due to Activation Spikes in GLU-Based LLMs
Jaewoo Yang
Hayun Kim
Younghoon Kim
230
20
0
23 May 2024
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models
Haojie Duanmu
Zhihang Yuan
Xiuhong Li
Jiangfei Duan
Xingcheng Zhang
Dahua Lin
MQ
286
31
0
10 May 2024
Exploring Learngene via Stage-wise Weight Sharing for Initializing Variable-sized Models
Shiyu Xia
Wenxuan Zhu
Xu Yang
Xin Geng
224
5
0
25 Apr 2024
Detecting Conceptual Abstraction in LLMs
Michaela Regneri
Alhassan Abdelhalim
Soren Laue
276
4
0
24 Apr 2024
What do Transformers Know about Government?
Jue Hou
Anisia Katinskaia
Lari Kotilainen
Sathianpong Trangcasanchai
Anh Vu
R. Yangarber
288
2
0
22 Apr 2024
Latent Concept-based Explanation of NLP Models
Xuemin Yu
Fahim Dalvi
Nadir Durrani
Marzia Nouri
Hassan Sajjad
LRMFAtt
181
12
0
18 Apr 2024
Outlier-Efficient Hopfield Layers for Large Transformer-Based Models
Jerry Yao-Chieh Hu
Pei-Hsuan Chang
Haozheng Luo
Hong-Yu Chen
Weijian Li
Wei-Po Wang
Han Liu
233
41
0
04 Apr 2024
Deconstructing In-Context Learning: Understanding Prompts via Corruption
International Conference on Language Resources and Evaluation (LREC), 2024
Namrata Shivagunde
Vladislav Lialin
Sherin Muckatira
Anna Rumshisky
353
8
0
02 Apr 2024
Can AI Models Appreciate Document Aesthetics? An Exploration of Legibility and Layout Quality in Relation to Prediction Confidence
Hsiu-Wei Yang
Abhinav Agrawal
Pavlos Fragkogiannis
Shubham Nitin Mulay
266
3
0
27 Mar 2024
A Study on How Attention Scores in the BERT Model are Aware of Lexical Categories in Syntactic and Semantic Tasks on the GLUE Benchmark
Dongjun Jang
Sungjoo Byun
Hyopil Shin
164
5
0
25 Mar 2024
Transformers Learn Low Sensitivity Functions: Investigations and Implications
International Conference on Learning Representations (ICLR), 2024
Bhavya Vasudeva
Deqing Fu
Tianyi Zhou
Elliott Kau
Youqi Huang
Willie Neiswanger
471
2
0
11 Mar 2024
Word Importance Explains How Prompts Affect Language Model Outputs
Stefan Hackmann
Haniyeh Mahmoudian
Mark Steadman
Michael Schmidt
AAML
480
10
0
05 Mar 2024
Topic Aware Probing: From Sentence Length Prediction to Idiom Identification how reliant are Neural Language Models on Topic?
Vasudevan Nedumpozhimana
John D. Kelleher
204
2
0
04 Mar 2024
Massive Activations in Large Language Models
Mingjie Sun
Xinlei Chen
J. Zico Kolter
Zhuang Liu
272
162
0
27 Feb 2024
Probing Multimodal Large Language Models for Global and Local Semantic Representations
Mingxu Tao
Quzhe Huang
Kun Xu
Liwei Chen
Yansong Feng
Dongyan Zhao
327
11
0
27 Feb 2024
Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking
Nikhil Prakash
Tamar Rott Shaham
Tal Haklay
Yonatan Belinkov
David Bau
327
97
0
22 Feb 2024
Is It a Free Lunch for Removing Outliers during Pretraining?
Baohao Liao
Christof Monz
MQ
137
1
0
19 Feb 2024
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan
Ercong Nie
Bolei Ma
Michael Farber
348
5
0
18 Feb 2024
Semantic Sensitivities and Inconsistent Predictions: Measuring the Fragility of NLI Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2024
Erik Arakelyan
Zhaoqi Liu
Isabelle Augenstein
AAML
324
15
0
25 Jan 2024
Leveraging Social Media Data to Identify Factors Influencing Public Attitude Towards Accessibility, Socioeconomic Disparity and Public Transportation
Khondhaker Al Momin
A. M. Sadri
Md Sami Hasnine
92
2
0
22 Jan 2024
Better Explain Transformers by Illuminating Important Information
Linxin Song
Yan Cui
Ao Luo
Freddy Lecue
Irene Li
FAtt
321
5
0
18 Jan 2024
Anchor function: a type of benchmark functions for studying language models
Zhongwang Zhang
Zhiwei Wang
Junjie Yao
Zhangchen Zhou
Xiaolong Li
E. Weinan
Z. Xu
351
9
0
16 Jan 2024
Only Send What You Need: Learning to Communicate Efficiently in Federated Multilingual Machine Translation
IEEE Transactions on Audio, Speech, and Language Processing (IEEE TASLP), 2024
Yun-Wei Chu
Dong-Jun Han
Christopher G. Brinton
310
6
0
15 Jan 2024
Towards Probing Contact Center Large Language Models
Varun Nathan
Ayush Kumar
Digvijay Ingle
Jithendra Vepa
142
0
0
26 Dec 2023
Parameter-Efficient Fine-Tuning Methods for Pretrained Language Models: A Critical Review and Assessment
Lingling Xu
Haoran Xie
S. J. Qin
Xiaohui Tao
F. Wang
308
282
0
19 Dec 2023
Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang
David Yunis
Michael Maire
195
8
0
11 Dec 2023
Transformer as Linear Expansion of Learngene
AAAI Conference on Artificial Intelligence (AAAI), 2023
Shiyu Xia
Miaosen Zhang
Xu Yang
Ruiming Chen
Haokun Chen
Xin Geng
212
12
0
09 Dec 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Neural Information Processing Systems (NeurIPS), 2023
Kaiyue Wen
Yuchen Li
Bing Liu
Andrej Risteski
289
28
0
03 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
416
35
0
01 Dec 2023
Visual Analytics for Generative Transformer Models
Raymond Li
Ruixin Yang
Wen Xiao
Ahmed AbuRaed
Gabriel Murray
Giuseppe Carenini
234
3
0
21 Nov 2023
GradSim: Gradient-Based Language Grouping for Effective Multilingual Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Mingyang Wang
Heike Adel
Lukas Lange
Jannik Strötgen
Hinrich Schütze
239
4
0
23 Oct 2023
Verb Conjugation in Transformers Is Determined by Linear Encodings of Subject Number
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sophie Hao
Tal Linzen
179
8
0
23 Oct 2023
Disentangling the Linguistic Competence of Privacy-Preserving BERT
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP (BlackboxNLP), 2023
Stefan Arnold
Nils Kemmerzell
Annika Schreiner
253
0
0
17 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers
Hosein Mohebbi
Grzegorz Chrupała
Willem H. Zuidema
Afra Alishahi
219
19
0
15 Oct 2023
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
International Conference on Learning Representations (ICLR), 2023
Suyu Ge
Yunan Zhang
Liyuan Liu
Minjia Zhang
Jiawei Han
Jianfeng Gao
452
380
0
03 Oct 2023
Grasping AI: experiential exercises for designers
AI & Society, 2023
Dave Murray-Rust
M. Lupetti
Iohanna Nicenboim
W. V. D. Hoog
166
16
0
02 Oct 2023
Neurons in Large Language Models: Dead, N-gram, Positional
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Elena Voita
Javier Ferrando
Christoforos Nalmpantis
MILM
400
73
0
09 Sep 2023
Explainability for Large Language Models: A Survey
ACM Transactions on Intelligent Systems and Technology (ACM TIST), 2023
Haiyan Zhao
Hanjie Chen
Fan Yang
Ninghao Liu
Huiqi Deng
Hengyi Cai
Shuaiqiang Wang
D. Yin
Jundong Li
LRM
500
717
0
02 Sep 2023
Why do universal adversarial attacks work on large language models?: Geometry might be the answer
Varshini Subhash
Anna Bialas
Weiwei Pan
Finale Doshi-Velez
AAML
215
16
0
01 Sep 2023
Uncertainty Estimation of Transformers' Predictions via Topological Analysis of the Attention Matrices
Elizaveta Kostenok
D. Cherniavskii
Alexey Zaytsev
264
9
0
22 Aug 2023
Scaling up Discovery of Latent Concepts in Deep NLP Models
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Majd Hawasly
Fahim Dalvi
Nadir Durrani
320
6
0
20 Aug 2023
PMET: Precise Model Editing in a Transformer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Xiaopeng Li
Shasha Li
Shezheng Song
Jing Yang
Jun Ma
Jie Yu
KELM
549
186
0
17 Aug 2023
Decoding Layer Saliency in Language Transformers
International Conference on Machine Learning (ICML), 2023
Elizabeth M. Hou
Greg Castañón
MILM
282
3
0
09 Aug 2023
Improving BERT with Hybrid Pooling Network and Drop Mask
Qian Chen
Wen Wang
Qinglin Zhang
Chong Deng
Ma Yukun
Siqi Zheng
116
1
0
14 Jul 2023
Multi-Task Learning Improves Performance In Deep Argument Mining Models
Workshop on Argument Mining (ArgMining), 2023
Amirhossein Farzam
Shashank Shekhar
Isaac Mehlhaff
Marco Morucci
206
1
0
03 Jul 2023
Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing
Neural Information Processing Systems (NeurIPS), 2023
Yelysei Bondarenko
Markus Nagel
Tijmen Blankevoort
MQ
325
128
0
22 Jun 2023
Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks
Mohamad Ballout
U. Krumnack
Gunther Heidemann
Kai-Uwe Kühnberger
120
2
0
21 Jun 2023
Explicit Syntactic Guidance for Neural Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yafu Li
Leyang Cui
Jianhao Yan
Yongjng Yin
Wei Bi
Shuming Shi
Yue Zhang
217
11
0
20 Jun 2023
Page 2 of 7