arXiv:2004.03844
Cited By
On the Effect of Dropping Layers of Pre-trained Transformer Models
Computer Speech and Language (CSL), 2020
8 April 2020
Hassan Sajjad
Fahim Dalvi
Nadir Durrani
Preslav Nakov
Papers citing "On the Effect of Dropping Layers of Pre-trained Transformer Models"
50 / 56 papers shown
Iterative Layer Pruning for Efficient Translation Inference
Yasmin Moslem
Muhammad Hazim Al Farouq
John D. Kelleher
157
2
0
26 Oct 2025
QLENS: Towards A Quantum Perspective of Language Transformers
Aditya Gupta
Kirandeep Kaur
Manya Chadha
Chirag Shah
AI4CE
167
0
0
13 Oct 2025
Spiralformer: Low Latency Encoder for Streaming Speech Recognition with Circular Layer Skipping and Early Exiting
E. Tsunoo
Hayato Futami
Yosuke Kashiwagi
Siddhant Arora
Shinji Watanabe
135
0
0
01 Oct 2025
Efficient Layer-wise LLM Fine-tuning for Revision Intention Prediction
Zhexiong Liu
Diane Litman
KELM
205
2
0
30 Sep 2025
TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning
Seohyun Lee
Wenzhi Fang
Dong-Jun Han
Seyyedali Hosseinalipour
Christopher G. Brinton
161
1
0
30 Sep 2025
Efficient Large Language Models with Zero-Shot Adjustable Acceleration
Sajjad Kachuee
M. Sharifkhani
237
0
0
01 Sep 2025
On the Evolution of Federated Post-Training Large Language Models: A Model Accessibility View
Tao Guo
Junxiao Wang
Fushuo Huo
Laizhong Cui
Song Guo
Jie Gui
Dacheng Tao
134
0
0
22 Aug 2025
FedSODA: Federated Fine-tuning of LLMs via Similarity Group Pruning and Orchestrated Distillation Alignment
Manning Zhu
Songtao Guo
Pengzhan Zhou
Yansong Ning
Chang Han
Dewen Qiao
193
0
0
18 Aug 2025
Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data
Bingjie Zhang
Hongkang Li
Changlong Shi
Guowei Rong
He Zhao
Dongsheng Wang
Dandan Guo
Meng Wang
MoMe
334
1
0
10 Jun 2025
FLoE: Fisher-Based Layer Selection for Efficient Sparse Adaptation of Low-Rank Experts
Xinyi Wang
Lirong Gao
Haobo Wang
Yiming Zhang
Junbo Zhao
MoE
277
1
0
31 May 2025
LPASS: Linear Probes as Stepping Stones for vulnerability detection using compressed LLMs
Journal of Information Security and Applications (JISA), 2025
Luis Ibanez-Lissen
Lorena Gonzalez-Manzano
José Maria De Fuentes
Nicolas Anciaux
164
3
0
30 May 2025
Efficient Speech Translation through Model Compression and Knowledge Distillation
International Workshop on Spoken Language Translation (IWSLT), 2025
Yasmin Moslem
265
2
0
26 May 2025
RSQ: Learning from Important Tokens Leads to Better Quantized LLMs
Yi-Lin Sung
Prateek Yadav
Jialu Li
Jaehong Yoon
Joey Tianyi Zhou
MQ
373
2
0
03 Mar 2025
How Redundant Is the Transformer Stack in Speech Representation Models?
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Teresa Dorszewski
Albert Kjøller Jacobsen
Lenka Tětková
Lars Kai Hansen
512
3
0
20 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
422
0
0
10 Jan 2025
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
Lanxiang Hu
Tajana Rosing
Hao Zhang
358
2
0
15 Dec 2024
CULL-MT: Compression Using Language and Layer pruning for Machine Translation
Pedram Rostami
M. Dousti
331
3
0
10 Nov 2024
Layer-wise Importance Matters: Less Memory for Better Performance in Parameter-efficient Fine-tuning of Large Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kai Yao
P. Gao
Lichun Li
Yuan Zhao
Xiaofeng Wang
Wei Wang
Jianke Zhu
196
8
0
15 Oct 2024
Persistent Topological Features in Large Language Models
Yuri Gardinazzi
Giada Panerai
Karthik Viswanathan
A. Ansuini
Alberto Cazzaniga
Matteo Biagetti
617
8
0
14 Oct 2024
Resource Allocation and Secure Wireless Communication in the Large Model-based Mobile Edge Computing System
Zefan Wang
Yitong Wang
Jun Zhao
242
2
0
29 Jun 2024
The Remarkable Robustness of LLMs: Stages of Inference?
Vedang Lad
Wes Gurnee
Max Tegmark
647
113
0
27 Jun 2024
Save It All: Enabling Full Parameter Tuning for Federated Large Language Models via Cycle Block Gradient Descent
Lin Wang
Zhichao Wang
Xiaoying Tang
267
2
0
17 Jun 2024
DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion
Yilong Chen
Linhao Zhang
Junyuan Shang
Ying Tai
Tingwen Liu
Shuohuan Wang
Yu Sun
289
11
0
03 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
430
10
0
30 May 2024
FedPFT: Federated Proxy Fine-Tuning of Foundation Models
Zhaopeng Peng
Xiaoliang Fan
Yufan Chen
Zheng Wang
Shirui Pan
Chenglu Wen
Ruisheng Zhang
Cheng-i Wang
306
19
0
17 Apr 2024
The Unreasonable Ineffectiveness of the Deeper Layers
Andrey Gromov
Kushal Tirumala
Hassan Shapourian
Paolo Glorioso
Daniel A. Roberts
657
192
0
26 Mar 2024
Why Lift so Heavy? Slimming Large Language Models by Cutting Off the Layers
Shuzhou Yuan
Ercong Nie
Bolei Ma
Michael Farber
435
5
0
18 Feb 2024
Graph Neural Networks for Antisocial Behavior Detection on Twitter
Martina Toshevska
S. Kalajdziski
Sonja Gievska
149
2
0
28 Dec 2023
CRaSh: Clustering, Removing, and Sharing Enhance Fine-tuning without Full Large Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Kaiyan Zhang
Ning Ding
Biqing Qi
Xuekai Zhu
Xinwei Long
Bowen Zhou
315
5
0
24 Oct 2023
Sub-network Discovery and Soft-masking for Continual Learning of Mixed Tasks
Zixuan Ke
Bing Liu
Wenhan Xiong
Asli Celikyilmaz
Haoran Li
CLL
298
11
0
13 Oct 2023
Can pruning make Large Language Models more efficient?
Sia Gholami
Marwan Omar
350
21
0
06 Oct 2023
ECoFLaP: Efficient Coarse-to-Fine Layer-Wise Pruning for Vision-Language Models
International Conference on Learning Representations (ICLR), 2023
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
VLM
357
22
0
04 Oct 2023
CoMFLP: Correlation Measure based Fast Search on ASR Layer Pruning
Interspeech, 2023
W. Liu
Zhiyuan Peng
Tan Lee
273
2
0
21 Sep 2023
Multilingual Text Representation
Fahim Faisal
261
1
0
02 Sep 2023
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models
International Conference on Learning Representations (ICLR), 2023
Seungcheol Park
Ho-Jin Choi
U. Kang
VLM
349
13
0
07 Aug 2023
Deep Model Compression Also Helps Models Capture Ambiguity
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Hancheol Park
Jong C. Park
368
2
0
12 Jun 2023
PruMUX: Augmenting Data Multiplexing with Model Compression
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yushan Su
Vishvak Murahari
Karthik Narasimhan
Keqin Li
317
3
0
24 May 2023
Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling
Y. Zhu
Xuebing Yang
Yuanyuan Wu
Wensheng Zhang
MedIm
350
4
0
15 May 2023
The EarlyBIRD Catches the Bug: On Exploiting Early Layers of Encoder Models for More Efficient Code Classification
Anastasiia Grishina
Max Hort
Leon Moonen
356
15
0
08 May 2023
Gradient-Free Structured Pruning with Unlabeled Data
International Conference on Machine Learning (ICML), 2023
Azade Nova
H. Dai
Dale Schuurmans
SyDa
371
38
0
07 Mar 2023
Offsite-Tuning: Transfer Learning without Full Model
Guangxuan Xiao
Ji Lin
Song Han
299
100
0
09 Feb 2023
Tracing and Manipulating Intermediate Values in Neural Math Problem Solvers
Yuta Matsumoto
Benjamin Heinzerling
Masashi Yoshikawa
Kentaro Inui
AIFin
269
5
0
17 Jan 2023
On the Transformation of Latent Space in Fine-Tuned NLP Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Nadir Durrani
Hassan Sajjad
Fahim Dalvi
Firoj Alam
292
20
0
23 Oct 2022
Hidden State Variability of Pretrained Language Models Can Guide Computation Reduction for Transfer Learning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Shuo Xie
Jiahao Qiu
Ankita Pasad
Li Du
Qing Qu
Hongyuan Mei
276
16
0
18 Oct 2022
Efficient Methods for Natural Language Processing: A Survey
Transactions of the Association for Computational Linguistics (TACL), 2022
Marcos Vinícius Treviso
Ji-Ung Lee
Tianchu Ji
Betty van Aken
Qingqing Cao
...
Emma Strubell
Niranjan Balasubramanian
Leon Derczynski
Iryna Gurevych
Roy Schwartz
500
151
0
31 Aug 2022
Embedding Recycling for Language Models
Findings, 2022
Jon Saad-Falcon
Amanpreet Singh
Luca Soldaini
Mike D'Arcy
Arman Cohan
Doug Downey
KELM
231
5
0
11 Jul 2022
Discovering Salient Neurons in Deep NLP Models
Journal of Machine Learning Research (JMLR), 2022
Nadir Durrani
Fahim Dalvi
Hassan Sajjad
KELM
MILM
367
20
0
27 Jun 2022
Probing Structured Pruning on Multilingual Pre-trained Models: Settings, Algorithms, and Efficiency
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Yanyang Li
Fuli Luo
Runxin Xu
Songfang Huang
Fei Huang
Liwei Wang
194
3
0
06 Apr 2022
A Fast Post-Training Pruning Framework for Transformers
Neural Information Processing Systems (NeurIPS), 2022
Woosuk Kwon
Sehoon Kim
Michael W. Mahoney
Joseph Hassoun
Kurt Keutzer
A. Gholami
292
213
0
29 Mar 2022
No One Left Behind: Inclusive Federated Learning over Heterogeneous Devices
Knowledge Discovery and Data Mining (KDD), 2022
Ruixuan Liu
Fangzhao Wu
Chuhan Wu
Yanlin Wang
Lingjuan Lyu
Hong Chen
Xing Xie
FedML
258
102
0
16 Feb 2022
Page 1 of 2