Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2105.06990
Cited By
BERT Busters: Outlier Dimensions that Disrupt Transformers
14 May 2021
Olga Kovaleva
Saurabh Kulshreshtha
Anna Rogers
Anna Rumshisky
Re-assign community
ArXiv
PDF
HTML
Papers citing
"BERT Busters: Outlier Dimensions that Disrupt Transformers"
18 / 18 papers shown
Title
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
MQ
46
0
0
13 May 2025
Fast and Low-Cost Genomic Foundation Models via Outlier Removal
Haozheng Luo
Chenghao Qiu
Maojiang Su
Zhihan Zhou
Zoe Mehta
Guo Ye
Jerry Yao-Chieh Hu
Han Liu
AAML
55
0
0
01 May 2025
Outlier dimensions favor frequent tokens in language models
Iuri Macocco
Nora Graichen
Gemma Boleda
Marco Baroni
53
0
0
27 Mar 2025
Probe Pruning: Accelerating LLMs through Dynamic Pruning via Model-Probing
Qi Le
Enmao Diao
Ziyan Wang
Xinran Wang
Jie Ding
Li Yang
Ali Anwar
69
1
0
24 Feb 2025
ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha
Brandon Reagen
OffRL
AI4CE
28
0
0
12 Oct 2024
OATS: Outlier-Aware Pruning Through Sparse and Low Rank Decomposition
Stephen Zhang
V. Papyan
VLM
43
1
0
20 Sep 2024
Outlier Reduction with Gated Attention for Improved Post-training Quantization in Large Sequence-to-sequence Speech Foundation Models
Dominik Wagner
Ilja Baumann
K. Riedhammer
Tobias Bocklet
MQ
30
1
0
16 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
34
3
0
29 May 2024
Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models
Wanyun Cui
Qianle Wang
MQ
34
1
0
03 Apr 2024
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu (Allen) Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
23
78
0
08 Oct 2023
A Simple and Effective Pruning Approach for Large Language Models
Mingjie Sun
Zhuang Liu
Anna Bair
J. Zico Kolter
50
353
0
20 Jun 2023
Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models
Xiuying Wei
Yunchen Zhang
Xiangguo Zhang
Ruihao Gong
Shanghang Zhang
Qi Zhang
F. Yu
Xianglong Liu
MQ
22
145
0
27 Sep 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti
Anna Rogers
Aleksandr Drozd
F. Dell’Orletta
71
42
0
23 May 2022
Life after BERT: What do Other Muppets Understand about Language?
Vladislav Lialin
Kevin Zhao
Namrata Shivagunde
Anna Rumshisky
34
6
0
21 May 2022
Measuring the Mixing of Contextual Information in the Transformer
Javier Ferrando
Gerard I. Gállego
Marta R. Costa-jussá
21
48
0
08 Mar 2022
Neural reality of argument structure constructions
Bai Li
Zining Zhu
Guillaume Thomas
Frank Rudzicz
Yang Xu
28
26
0
24 Feb 2022
The Lottery Ticket Hypothesis for Pre-trained BERT Networks
Tianlong Chen
Jonathan Frankle
Shiyu Chang
Sijia Liu
Yang Zhang
Zhangyang Wang
Michael Carbin
148
345
0
23 Jul 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
1