When BERT Plays the Lottery, All Tickets Are Winning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

1 May 2020

Papers citing "When BERT Plays the Lottery, All Tickets Are Winning"

50 / 122 papers shown

TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training

Anastasios Kyrillidis

227

06 Nov 2025

GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters

215

22 Oct 2025

SliceFine: The Universal Winning-Slice Hypothesis for Pretrained Networks

Md. Kowsher

Ali O. Polat

Ehsan Mohammady Ardehaly

262

09 Oct 2025

Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation

186

08 Oct 2025

Downsized and Compromised?: Assessing the Faithfulness of Model Compression

Moumita Kamal

Douglas A. Talbert

138

07 Oct 2025

BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers

Patrik Okanovic

Sameer Deshmukh

Grzegorz Kwaśniewski

...

250

03 Jul 2025

Balanced and Elastic End-to-end Training of Dynamic LLMs

Mohamed Wahib

Muhammed Abdullah Soyturk

Didem Unat

MoE

383

20 May 2025

Few Dimensions are Enough: Fine-tuning BERT with Selected Dimensions Revealed Its Redundant Nature

Shion Fukuhata

Yoshinobu Kano

283

07 Apr 2025

As easy as PIE: understanding when pruning causes language models to disagree

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

304

27 Mar 2025

Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success

Sophie Hao

ELM AI4CE

279

25 Mar 2025

Are formal and functional linguistic mechanisms dissociated in language models?

Michael Hanna

Sandro Pezzelle

Yonatan Belinkov

593

14 Mar 2025

Local Contrastive Editing of Gender Stereotypes

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

385

23 Oct 2024

Superficial Safety Alignment Hypothesis

Jianwei Li

Jung-Eun Kim

LLMSV

420

07 Oct 2024

Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining

Jianwei Li

Yijun Dong

Qi Lei

412

26 Jul 2024

Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies

Changye Li

Zhecheng Sheng

Trevor Cohen

Serguei V. S. Pakhomov

208

05 Jun 2024

What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models

Busayo Awobade

Mardiyyah Oduwole

Steven Kolawole

242

06 Apr 2024

LayerNorm: A key component in parameter-efficient fine-tuning

Taha ValizadehAslani

Hualou Liang

314

29 Mar 2024

SEVEN: Pruning Transformer Model by Reserving Sentinels

IEEE International Joint Conference on Neural Networks (IJCNN), 2024

243

19 Mar 2024

Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model

International Conference on Computational Linguistics (COLING), 2024

288

18 Mar 2024

CHAI: Clustered Head Attention for Efficient LLM Inference

International Conference on Machine Learning (ICML), 2024

Saurabh Agarwal

Shivaram Venkataraman

Dimitris Papailiopoulos

Carole-Jean Wu

320

12 Mar 2024

A Survey of Lottery Ticket Hypothesis

414

07 Mar 2024

NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models

419

28 Feb 2024

Model Compression and Efficient Inference for Large Language Models: A Survey

380

15 Feb 2024

Dynamic Layer Tying for Parameter-Efficient Transformers

International Conference on Learning Representations (ICLR), 2024

Tamir David Hay

Lior Wolf

257

23 Jan 2024

Fairness-Aware Structured Pruning in Transformers

338

24 Dec 2023

Gradient-based Parameter Selection for Efficient Fine-Tuning

Computer Vision and Pattern Recognition (CVPR), 2023

Zhi Zhang

Qizhe Zhang

Shanghang Zhang

486

15 Dec 2023

Picking the Underused Heads: A Network Pruning Perspective of Attention Head Selection for Fusing Dialogue Coreference Information

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

Zhengyuan Liu

Nancy F. Chen

280

15 Dec 2023

Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars

Neural Information Processing Systems (NeurIPS), 2023

Kaiyue Wen

Yuchen Li

Bing Liu

Andrej Risteski

327

03 Dec 2023

Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks

Rochelle Choenni

Ekaterina Shutova

Daniel H Garrette

279

14 Nov 2023

Sparse Contrastive Learning of Sentence Embeddings

Ruize An

Chen Zhang

Dawei Song

233

07 Nov 2023

Successfully Applying Lottery Ticket Hypothesis to Diffusion Model

320

28 Oct 2023

Outlier Dimensions Encode Task-Specific Knowledge

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

William Rudman

Catherine Chen

Carsten Eickhoff

390

26 Oct 2023

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models

Dongkuan Xu

407

19 Oct 2023

Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection

Jianwei Li

Weizhi Gao

Qi Lei

Dongkuan Xu

398

19 Oct 2023

NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models

301

16 Oct 2023

Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Huiyin Xue

Nikolaos Aletras

389

11 Oct 2023

Multilingual Text Representation

Fahim Faisal

261

02 Sep 2023

SP³: Enhancing Structured Pruning via PCA Projection

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

327

31 Aug 2023

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models

International Conference on Machine Learning (ICML), 2023

281

18 Jun 2023

The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter

Neural Information Processing Systems (NeurIPS), 2023

353

06 Jun 2023

Exploring the Impact of Model Scaling on Parameter-Efficient Tuning

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Yusheng Su

Chi-Min Chan

Jiali Cheng

Yujia Qin

Yankai Lin

...

Ning Ding

Xingzhi Sun

Guotong Xie

Zhiyuan Liu

Maosong Sun

312

04 Jun 2023

The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles

Knowledge Discovery and Data Mining (KDD), 2023

Md Shamim Hussain

Mohammed J Zaki

D. Subramanian

451

02 Jun 2023

Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers

Zahra Atashgahi

Mykola Pechenizkiy

Raymond N. J. Veldhuis

Decebal Constantin Mocanu

AI4TS AI4CE

343

28 May 2023

Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Zhong Zhang

Bang Liu

Junming Shao

314

27 May 2023

PruMUX: Augmenting Data Multiplexing with Model Compression

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

317

24 May 2023

Rethinking Graph Lottery Tickets: Graph Sparsity Matters

International Conference on Learning Representations (ICLR), 2023

350

03 May 2023

Gradient-Free Structured Pruning with Unlabeled Data

International Conference on Machine Learning (ICML), 2023

370

07 Mar 2023

MUX-PLMs: Data Multiplexing for High-throughput Language Models

Workshop on Representation Learning for NLP (RepL4NLP), 2023

256

24 Feb 2023

Modular Deep Learning

493

111

22 Feb 2023

Task-Specific Skill Localization in Fine-tuned Language Models

International Conference on Machine Learning (ICML), 2023

445

13 Feb 2023