ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Linear Mode Connectivity and the Lottery Ticket Hypothesis
arXiv: 1912.05671

11 December 2019
Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin
MoMe
Papers citing "Linear Mode Connectivity and the Lottery Ticket Hypothesis"

50 / 154 papers shown
One is More: Diverse Perspectives within a Single Network for Efficient DRL
Yiqin Tan, Ling Pan, Longbo Huang
OffRL
38 · 0 · 0 · 21 Oct 2023

Model Merging by Uncertainty-Based Gradient Matching
Nico Daheim, Thomas Möllenhoff, E. Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan
MoMe, FedML
32 · 43 · 0 · 19 Oct 2023
Layer-wise Linear Mode Connectivity
Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi
FedML, FAtt, MoMe
33 · 15 · 0 · 13 Jul 2023

Distilled Pruning: Using Synthetic Data to Win the Lottery
Luke McDermott, Daniel Cummings
SyDa, DD
34 · 1 · 0 · 07 Jul 2023

Quantifying lottery tickets under label noise: accuracy, calibration, and complexity
V. Arora, Daniele Irto, Sebastian Goldt, G. Sanguinetti
36 · 2 · 0 · 21 Jun 2023

Make Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning
Baohao Liao, Shaomu Tan, Christof Monz
KELM
23 · 29 · 0 · 01 Jun 2023
Investigating how ReLU-networks encode symmetries
Georg Bökman, Fredrik Kahl
29 · 6 · 0 · 26 May 2023

Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Moonseok Choi, Hyungi Lee, G. Nam, Juho Lee
32 · 2 · 0 · 24 May 2023

Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez, Alessandro Favero, P. Frossard
MoMe
51 · 110 · 0 · 22 May 2023

NTK-SAP: Improving neural network pruning by aligning training dynamics
Yite Wang, Dawei Li, Ruoyu Sun
34 · 19 · 0 · 06 Apr 2023
On the Variance of Neural Network Training with respect to Test Sets and Distributions
Keller Jordan
OOD
21 · 10 · 0 · 04 Apr 2023

Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation
Tianli Zhang, Mengqi Xue, Jiangtao Zhang, Haofei Zhang, Yu Wang, Lechao Cheng, Jie Song, Mingli Song
28 · 5 · 0 · 26 Mar 2023

Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa, Shreyas Saxena, Abhay Gupta, Sean Lie
31 · 3 · 0 · 21 Mar 2023

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
Sang-Ho Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann
CLL
22 · 35 · 0 · 16 Mar 2023
Understanding plasticity in neural networks
Clare Lyle, Zeyu Zheng, Evgenii Nikishin, Bernardo Avila-Pires, Razvan Pascanu, Will Dabney
AI4CE
35 · 97 · 0 · 02 Mar 2023

Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection
Zhen Cheng, Fei Zhu, Xu-Yao Zhang, Cheng-Lin Liu
MoMe, OODD
40 · 11 · 0 · 02 Mar 2023

DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks
Samyak Jain, Sravanti Addepalli, P. Sahu, Priyam Dey, R. Venkatesh Babu
MoMe, OOD
43 · 20 · 0 · 28 Feb 2023

Modular Deep Learning
Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, E. Ponti
MoMe, OOD
32 · 73 · 0 · 22 Feb 2023
Considering Layerwise Importance in the Lottery Ticket Hypothesis
Benjamin Vandersmissen, José Oramas
23 · 1 · 0 · 22 Feb 2023

Simple Hardware-Efficient Long Convolutions for Sequence Modeling
Daniel Y. Fu, Elliot L. Epstein, Eric N. D. Nguyen, A. Thomas, Michael Zhang, Tri Dao, Atri Rudra, Christopher Ré
16 · 52 · 0 · 13 Feb 2023

Quantum Neuron Selection: Finding High Performing Subnetworks With Quantum Algorithms
Tim Whitaker
30 · 1 · 0 · 12 Feb 2023

Knowledge is a Region in Weight Space for Fine-tuned Language Models
Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen
31 · 49 · 0 · 09 Feb 2023

Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning
Huan Wang, Can Qin, Yue Bai, Yun Fu
32 · 20 · 0 · 12 Jan 2023
Training trajectories, mini-batch losses and the curious role of the learning rate
Mark Sandler, A. Zhmoginov, Max Vladymyrov, Nolan Miller
ODL
25 · 10 · 0 · 05 Jan 2023

Dataless Knowledge Fusion by Merging Weights of Language Models
Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng
FedML, MoMe
24 · 214 · 0 · 19 Dec 2022

Can We Find Strong Lottery Tickets in Generative Models?
Sangyeop Yeo, Yoojin Jang, Jy-yong Sohn, Dongyoon Han, Jaejun Yoo
20 · 6 · 0 · 16 Dec 2022

Editing Models with Task Arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi
KELM, MoMe, MU
57 · 435 · 0 · 08 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen
MoMe
28 · 52 · 0 · 02 Dec 2022

The Effect of Data Dimensionality on Neural Network Prunability
Zachary Ankner, Alex Renda, Gintare Karolina Dziugaite, Jonathan Frankle, Tian Jin
28 · 5 · 0 · 01 Dec 2022

LU decomposition and Toeplitz decomposition of a neural network
Yucong Liu, Simiao Jiao, Lek-Heng Lim
30 · 7 · 0 · 25 Nov 2022

Linear Interpolation In Parameter Space is Good Enough for Fine-Tuned Language Models
Mark Rofin, Nikita Balagansky, Daniil Gavrilov
MoMe, KELM
36 · 5 · 0 · 22 Nov 2022

REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan, Hanie Sedghi, O. Saukh, R. Entezari, Behnam Neyshabur
MoMe
46 · 94 · 0 · 15 Nov 2022
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
MoE
19 · 117 · 0 · 31 Oct 2022

Symmetries, flat minima, and the conserved quantities of gradient flow
Bo-Lu Zhao, I. Ganev, Robin G. Walters, Rose Yu, Nima Dehmamy
47 · 16 · 0 · 31 Oct 2022

Exploring Mode Connectivity for Pre-trained Language Models
Yujia Qin, Cheng Qian, Jing Yi, Weize Chen, Yankai Lin, Xu Han, Zhiyuan Liu, Maosong Sun, Jie Zhou
29 · 20 · 0 · 25 Oct 2022

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos
32 · 24 · 0 · 19 Oct 2022
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
Nikolaos Dimitriadis, P. Frossard, François Fleuret
26 · 25 · 0 · 18 Oct 2022

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?
Mansheej Paul, F. Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite
UQCV
25 · 38 · 0 · 06 Oct 2022

Stochastic optimization on matrices and a graphon McKean-Vlasov limit
Zaïd Harchaoui, Sewoong Oh, Soumik Pal, Raghav Somani, Raghavendra Tripathi
36 · 2 · 0 · 02 Oct 2022

On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
Rohan Anil, S. Gadanho, Danya Huang, Nijith Jacob, Zhuoshu Li, ..., Cristina Pop, Kevin Regan, G. Shamir, Rakesh Shivanna, Qiqi Yan
3DV
26 · 41 · 0 · 12 Sep 2022
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth, J. Hayase, S. Srinivasa
MoMe
255 · 314 · 0 · 11 Sep 2022

Complexity-Driven CNN Compression for Resource-constrained Edge AI
Muhammad Zawish, Steven Davy, L. Abraham
33 · 16 · 0 · 26 Aug 2022

Doge Tickets: Uncovering Domain-general Language Models by Playing Lottery Tickets
Yi Yang, Chen Zhang, Benyou Wang, Dawei Song
LRM
24 · 6 · 0 · 20 Jul 2022

Comprehensive Graph Gradual Pruning for Sparse Training in Graph Neural Networks
Chuang Liu, Xueqi Ma, Yinbing Zhan, Liang Ding, Dapeng Tao, Bo Du, Wenbin Hu, Danilo P. Mandic
39 · 28 · 0 · 18 Jul 2022
On the Robustness and Anomaly Detection of Sparse Neural Networks
Morgane Ayle, Bertrand Charpentier, John Rachwan, Daniel Zügner, Simon Geisler, Stephan Günnemann
AAML
52 · 3 · 0 · 09 Jul 2022

Where to Begin? On the Impact of Pre-Training and Initialization in Federated Learning
John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael G. Rabbat
FedML, AI4CE
23 · 21 · 0 · 30 Jun 2022

Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Zhengqi He, Zeke Xie, Quanzhi Zhu, Zengchang Qin
74 · 27 · 0 · 17 Jun 2022

Zeroth-Order Topological Insights into Iterative Magnitude Pruning
Aishwarya H. Balwani, J. Krzyston
29 · 2 · 0 · 14 Jun 2022

LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung, Jaemin Cho, Joey Tianyi Zhou
VLM
21 · 236 · 0 · 13 Jun 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
VLM
69 · 2,024 · 0 · 27 May 2022