2005.00561
Cited By
When BERT Plays the Lottery, All Tickets Are Winning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
1 May 2020
Sai Prasanna
Anna Rogers
Anna Rumshisky
MILM
Papers citing
"When BERT Plays the Lottery, All Tickets Are Winning"
50 / 122 papers shown
TwIST: Rigging the Lottery in Transformers with Independent Subnetwork Training
Michael Menezes
Barbara Su
Xinze Feng
Yehya Farhat
Hamza Shili
Anastasios Kyrillidis
227
1
0
06 Nov 2025
GaLLoP: Gradient-based Sparse Learning on Low-Magnitude Parameters
Anand Choudhary
Yasser Sulaıman
Lukas Mauch
G. B. Hacene
Fabien Cardinaux
Antoine Bosselut
215
0
0
22 Oct 2025
SliceFine: The Universal Winning-Slice Hypothesis for Pretrained Networks
Md. Kowsher
Ali O. Polat
Ehsan Mohammady Ardehaly
Mehrdad Salehi
Zia Ghiasi
Prasanth Murali
Chen Chen
262
3
0
09 Oct 2025
Where to Begin: Efficient Pretraining via Subnetwork Selection and Distillation
Arjun Krishnakumar
R. Sukthanker
Hannan Javed Mahadik
Gabriela Kadlecová
Vladyslav Moroshan
Timur Carstensen
Frank Hutter
Aaron Klein
186
0
0
08 Oct 2025
Downsized and Compromised?: Assessing the Faithfulness of Model Compression
Moumita Kamal
Douglas A. Talbert
138
0
0
07 Oct 2025
BLaST: High Performance Inference and Pretraining using BLock Sparse Transformers
Patrik Okanovic
Sameer Deshmukh
Grzegorz Kwaśniewski
Yi Zhu
Haruto Fujii
...
Maciej Besta
Kentaro Katayama
Takumi Honda
Yusuke Nagasaka
Torsten Hoefler
250
0
0
03 Jul 2025
Balanced and Elastic End-to-end Training of Dynamic LLMs
Mohamed Wahib
Muhammed Abdullah Soyturk
Didem Unat
MoE
383
1
0
20 May 2025
Few Dimensions are Enough: Fine-tuning BERT with Selected Dimensions Revealed Its Redundant Nature
Shion Fukuhata
Yoshinobu Kano
283
1
0
07 Apr 2025
As easy as PIE: understanding when pruning causes language models to disagree
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Pietro Tropeano
Maria Maistro
Tuukka Ruotsalo
Christina Lioma
304
0
0
27 Mar 2025
Generative Linguistics, Large Language Models, and the Social Nature of Scientific Success
Sophie Hao
ELM
AI4CE
279
0
0
25 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
593
6
0
14 Mar 2025
Local Contrastive Editing of Gender Stereotypes
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Marlene Lutz
Rochelle Choenni
M. Strohmaier
Anne Lauscher
385
2
0
23 Oct 2024
Superficial Safety Alignment Hypothesis
Jianwei Li
Jung-Eun Kim
LLMSV
420
8
0
07 Oct 2024
Greedy Output Approximation: Towards Efficient Structured Pruning for LLMs Without Retraining
Jianwei Li
Yijun Dong
Qi Lei
412
10
0
26 Jul 2024
Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies
Changye Li
Zhecheng Sheng
Trevor Cohen
Serguei V. S. Pakhomov
208
2
0
05 Jun 2024
What Happens When Small Is Made Smaller? Exploring the Impact of Compression on Small Data Pretrained Language Models
Busayo Awobade
Mardiyyah Oduwole
Steven Kolawole
242
1
0
06 Apr 2024
LayerNorm: A key component in parameter-efficient fine-tuning
Taha ValizadehAslani
Hualou Liang
314
5
0
29 Mar 2024
SEVEN: Pruning Transformer Model by Reserving Sentinels
IEEE International Joint Conference on Neural Network (IJCNN), 2024
Jinying Xiao
Ping Li
Jie Nie
Zhe Tang
243
3
0
19 Mar 2024
Let's Focus on Neuron: Neuron-Level Supervised Fine-tuning for Large Language Model
International Conference on Computational Linguistics (COLING), 2024
Haoyun Xu
Runzhe Zhan
Yang Li
Lidia S. Chao
288
7
0
18 Mar 2024
CHAI: Clustered Head Attention for Efficient LLM Inference
International Conference on Machine Learning (ICML), 2024
Saurabh Agarwal
Bilge Acun
Basil Homer
Mostafa Elhoushi
Yejin Lee
Shivaram Venkataraman
Dimitris Papailiopoulos
Carole-Jean Wu
320
16
0
12 Mar 2024
A Survey of Lottery Ticket Hypothesis
Bohan Liu
Zijie Zhang
Peixiong He
Zhensen Wang
Yang Xiao
Ruimeng Ye
Yang Zhou
Wei-Shinn Ku
Bo Hui
UQCV
414
25
0
07 Mar 2024
NeuroPrune: A Neuro-inspired Topological Sparse Training Algorithm for Large Language Models
Amit Dhurandhar
Tejaswini Pedapati
Ronny Luss
Soham Dan
Aurélie C. Lozano
Payel Das
Georgios Kollias
419
3
0
28 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
380
95
0
15 Feb 2024
Dynamic Layer Tying for Parameter-Efficient Transformers
International Conference on Learning Representations (ICLR), 2024
Tamir David Hay
Lior Wolf
257
13
0
23 Jan 2024
Fairness-Aware Structured Pruning in Transformers
A. Zayed
Gonçalo Mordido
Samira Shabanian
Ioana Baldini
Sarath Chandar
338
34
0
24 Dec 2023
Gradient-based Parameter Selection for Efficient Fine-Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Zhi Zhang
Qizhe Zhang
Zijun Gao
Renrui Zhang
Ekaterina Shutova
Shiji Zhou
Shanghang Zhang
486
47
0
15 Dec 2023
Picking the Underused Heads: A Network Pruning Perspective of Attention Head Selection for Fusing Dialogue Coreference Information
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Zhengyuan Liu
Nancy F. Chen
280
1
0
15 Dec 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars
Neural Information Processing Systems (NeurIPS), 2023
Kaiyue Wen
Yuchen Li
Bing Liu
Andrej Risteski
327
28
0
03 Dec 2023
Examining Modularity in Multilingual LMs via Language-Specialized Subnetworks
Rochelle Choenni
Ekaterina Shutova
Daniel H Garrette
279
12
0
14 Nov 2023
Sparse Contrastive Learning of Sentence Embeddings
Ruize An
Chen Zhang
Dawei Song
233
0
0
07 Nov 2023
Successfully Applying Lottery Ticket Hypothesis to Diffusion Model
Chao Jiang
Bo Hui
Bohan Liu
Da Yan
DiffM
320
16
0
28 Oct 2023
Outlier Dimensions Encode Task-Specific Knowledge
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
William Rudman
Catherine Chen
Carsten Eickhoff
390
9
0
26 Oct 2023
Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
Jianwei Li
Qi Lei
Wei Cheng
Dongkuan Xu
KELM
407
10
0
19 Oct 2023
Breaking through Deterministic Barriers: Randomized Pruning Mask Generation and Selection
Jianwei Li
Weizhi Gao
Qi Lei
Dongkuan Xu
398
4
0
19 Oct 2023
NASH: A Simple Unified Framework of Structured Pruning for Accelerating Encoder-Decoder Language Models
Jongwoo Ko
Seungjoon Park
Yujin Kim
Sumyeong Ahn
Du-Seong Chang
Euijai Ahn
SeYoung Yun
301
10
0
16 Oct 2023
Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Huiyin Xue
Nikolaos Aletras
389
1
0
11 Oct 2023
Multilingual Text Representation
Fahim Faisal
261
1
0
02 Sep 2023
SP³: Enhancing Structured Pruning via PCA Projection
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuxuan Hu
Jing Zhang
Zhe Zhao
Chengliang Zhao
Xiaodong Chen
Cuiping Li
Hong Chen
327
4
0
31 Aug 2023
Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
International Conference on Machine Learning (ICML), 2023
A. Jaiswal
Shiwei Liu
Tianlong Chen
Ying Ding
Zinan Lin
VLM
281
24
0
18 Jun 2023
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Neural Information Processing Systems (NeurIPS), 2023
Ajay Jaiswal
Shiwei Liu
Tianlong Chen
Zinan Lin
VLM
353
44
0
06 Jun 2023
Exploring the Impact of Model Scaling on Parameter-Efficient Tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yusheng Su
Chi-Min Chan
Jiali Cheng
Yujia Qin
Yankai Lin
...
Ning Ding
Xingzhi Sun
Guotong Xie
Zhiyuan Liu
Maosong Sun
312
9
0
04 Jun 2023
The Information Pathways Hypothesis: Transformers are Dynamic Self-Ensembles
Knowledge Discovery and Data Mining (KDD), 2023
Md Shamim Hussain
Mohammed J Zaki
D. Subramanian
451
4
0
02 Jun 2023
Adaptive Sparsity Level during Training for Efficient Time Series Forecasting with Transformers
Zahra Atashgahi
Mykola Pechenizkiy
Raymond N. J. Veldhuis
Decebal Constantin Mocanu
AI4TS
AI4CE
343
2
0
28 May 2023
Fine-tuning Happens in Tiny Subspaces: Exploring Intrinsic Task-specific Subspaces of Pre-trained Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Zhong Zhang
Bang Liu
Junming Shao
314
20
0
27 May 2023
PruMUX: Augmenting Data Multiplexing with Model Compression
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yushan Su
Vishvak Murahari
Karthik Narasimhan
Keqin Li
317
3
0
24 May 2023
Rethinking Graph Lottery Tickets: Graph Sparsity Matters
International Conference on Learning Representations (ICLR), 2023
Bo Hui
Jocelyn M Mora
Adrian Dalca
I. Aganj
350
28
0
03 May 2023
Gradient-Free Structured Pruning with Unlabeled Data
International Conference on Machine Learning (ICML), 2023
Azade Nova
H. Dai
Dale Schuurmans
SyDa
370
38
0
07 Mar 2023
MUX-PLMs: Data Multiplexing for High-throughput Language Models
Workshop on Representation Learning for NLP (RepL4NLP), 2023
Vishvak Murahari
Ameet Deshpande
Carlos E. Jimenez
Izhak Shafran
Mingqiu Wang
Yuan Cao
Karthik Narasimhan
MoE
256
5
0
24 Feb 2023
Modular Deep Learning
Jonas Pfeiffer
Sebastian Ruder
Ivan Vulić
Edoardo Ponti
MoMe
OOD
493
111
0
22 Feb 2023
Task-Specific Skill Localization in Fine-tuned Language Models
International Conference on Machine Learning (ICML), 2023
A. Panigrahi
Nikunj Saunshi
Haoyu Zhao
Sanjeev Arora
MoMe
445
98
0
13 Feb 2023