On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
arXiv:2004.03844, 8 April 2020
Papers citing "On the Effect of Dropping Layers of Pre-trained Transformer Models" (20 of 20 papers shown)
How Redundant Is the Transformer Stack in Speech Representation Models?
Teresa Dorszewski, Albert Kjøller Jacobsen, Lenka Tětková, Lars Kai Hansen
20 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma, Kenton W. Murray, Kevin Duh
10 Jan 2025
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang, Zhichao Wang, Xiaoying Tang
17 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong, Manasa Bharadwaj
30 May 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov, Kushal Tirumala, Hassan Shapourian, Paolo Glorioso, Daniel A. Roberts
26 Mar 2024
Where does In-context Translation Happen in Large Language Models
Suzanna Sia, David Mueller, Kevin Duh
07 Mar 2024
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan, Ercong Nie, Bolei Ma, Michael Färber
18 Feb 2024
Graph Neural Networks for Antisocial Behavior Detection on Twitter
Martina Toshevska, S. Kalajdziski, Sonja Gievska
28 Dec 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
Seungcheol Park, Ho-Jin Choi, U. Kang
07 Aug 2023
Deep Model Compression Also Helps Models Capture Ambiguity
Hancheol Park, Jong C. Park
12 Jun 2023
The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification
Anastasiia Grishina, Max Hort, Leon Moonen
08 May 2023
On the Transformation of Latent Space in Fine-Tuned NLP Models
Nadir Durrani, Hassan Sajjad, Fahim Dalvi, Firoj Alam
23 Oct 2022
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Shuo Xie, Jiahao Qiu, Ankita Pasad, Li Du, Qing Qu, Hongyuan Mei
18 Oct 2022
Differentiable Subset Pruning of Transformer Heads
Jiaoda Li, Ryan Cotterell, Mrinmaya Sachan
10 Aug 2021
Comparing Rewinding and Fine-tuning in Neural Network Pruning
Alex Renda, Jonathan Frankle, Michael Carbin
05 Mar 2020
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer
12 Sep 2019
The Bottom-up Evolution of Representations in the Transformer: A Study with Machine Translation and Language Modeling Objectives
Elena Voita, Rico Sennrich, Ivan Titov
03 Sep 2019
What you can cram into a single vector: Probing sentence embeddings for linguistic properties
Alexis Conneau, Germán Kruszewski, Guillaume Lample, Loïc Barrault, Marco Baroni
03 May 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018