On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
arXiv:2004.03844, 8 April 2020
Papers citing "On the Effect of Dropping Layers of Pre-trained Transformer Models" (20 of 20 papers shown)
How Redundant Is the Transformer Stack in Speech Representation Models?
Teresa Dorszewski, Albert Kjøller Jacobsen, Lenka Tětková, Lars Kai Hansen
20 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma, Kenton W. Murray, Kevin Duh
10 Jan 2025
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang, Zhichao Wang, Xiaoying Tang
17 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong, Manasa Bharadwaj
30 May 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
26 Mar 2024
Where does In-context Translation Happen in Large Language Models
Suzanna Sia, David Mueller, Kevin Duh
07 Mar 2024
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber
18 Feb 2024
Graph Neural Networks for Antisocial Behavior Detection on Twitter
Martina Toshevska, S. Kalajdziski, Sonja Gievska
28 Dec 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park, Ho-Jin Choi, U. Kang
07 Aug 2023
Deep Model Compression Also Helps Models Capture Ambiguity
Hancheol Park, Jong C. Park
12 Jun 2023
The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification
Anastasiia Grishina, Max Hort, Leon Moonen
08 May 2023
On the Transformation of Latent Space in Fine-Tuned NLP Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Firoj Alam
23 Oct 2022
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei
18 Oct 2022
Differentiable Subset Pruning of Transformer Heads
Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan
10 Aug 2021
Comparing Rewinding and Fine-tuning in Neural Network Pruning
Alex Renda, Jonathan Frankle, Michael Carbin
05 Mar 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
12 Sep 2019
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita, Rico Sennrich, Ivan Titov
03 Sep 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
03 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018