ResearchTrend.AI
Fusing finetuned models for better pretraining

Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz
6 April 2022 · FedML · AI4CE · MoMe
ArXiv · PDF · HTML
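The fusion studied in this paper and in much of the citing work below merges several finetuned models that share one architecture by averaging their parameters elementwise. As a rough illustration only (not the paper's exact recipe; `fuse_checkpoints` and the toy checkpoint dicts are hypothetical):

```python
import numpy as np

def fuse_checkpoints(checkpoints):
    """Elementwise average of parameter dicts from models sharing one architecture."""
    fused = {}
    for name in checkpoints[0]:
        # Stack the same parameter from every checkpoint and average over models.
        fused[name] = np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
    return fused

# Toy "checkpoints": two finetuned models, each a dict of weight arrays.
ckpt_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
ckpt_b = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}
fused = fuse_checkpoints([ckpt_a, ckpt_b])
# fused["w"] is [2.0, 3.0]; fused["b"] is [1.0]
```

The averaged parameters can then be loaded back into the shared architecture and used as an initialization for further pretraining or finetuning.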

Papers citing "Fusing finetuned models for better pretraining"

29 / 79 papers shown
Deep Model Fusion: A Survey
Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen · FedML · MoMe · 27 Sep 2023

Cordyceps@LT-EDI: Patching Language-Specific Homophobia/Transphobia Classifiers with a Multilingual Understanding
Dean Ninalga · 24 Sep 2023

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor, Corentin Dancette, Alexandre Ramé, Matthieu Cord · MoMe · MLLM · 30 Jul 2023

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study
Damith Premasiri, Tharindu Ranasinghe, R. Mitkov · VLM · 18 Jul 2023

Tangent Transformers for Composition, Privacy and Removal
Tian Yu Liu, Aditya Golatkar, Stefano Soatto · 16 Jul 2023

Tangent Model Composition for Ensembling and Continual Fine-tuning
Tianlin Liu, Stefano Soatto · LRM · MoMe · CLL · 16 Jul 2023

Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Max Zimmer, Christoph Spiegel, S. Pokutta · MoMe · 29 Jun 2023

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
A. Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang · VLM · 18 Jun 2023

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, Colin Raffel · VLM · 07 Jun 2023

Soft Merging of Experts with Adaptive Routing
Mohammed Muqeeth, Haokun Liu, Colin Raffel · MoMe · MoE · 06 Jun 2023

TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal · MoMe · 02 Jun 2023

Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez, Alessandro Favero, P. Frossard · MoMe · 22 May 2023

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg · 17 May 2023

ZipIt! Merging Models from Different Tasks without Training
George Stoica, Daniel Bolya, J. Bjorner, Pratik Ramesh, Taylor N. Hearn, Judy Hoffman · VLM · MoMe · 04 May 2023

Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
Daniel Lawson, A. H. Qureshi · MoMe · OffRL · 14 Mar 2023

Towards Zero-Shot Functional Compositionality of Language Models
Hangyeol Yu, Myeongho Jeong, Jamin Shin, Hyeongdon Moon, Juneyoung Park, Seungtaek Choi · 06 Mar 2023

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
Ruisi Cai, Zhenyu (Allen) Zhang, Zhangyang Wang · AAML · OOD · 24 Feb 2023

Knowledge is a Region in Weight Space for Fine-tuned Language Models
Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen · 09 Feb 2023

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz · MoMe · OODD · 20 Dec 2022

Dataless Knowledge Fusion by Merging Weights of Language Models
Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng · FedML · MoMe · 19 Dec 2022

Editing Models with Task Arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi · KELM · MoMe · MU · 08 Dec 2022

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen · MoMe · 02 Dec 2022

Where to start? Analyzing the potential value of intermediate models
Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz · MoMe · 31 Oct 2022

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos · 19 Oct 2022

Patching open-vocabulary models by interpolating weights
Gabriel Ilharco, Mitchell Wortsman, S. Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt · VLM · KELM · 10 Aug 2022

Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, A. Rakotomamonjy, Patrick Gallinari, Matthieu Cord · OOD · 19 May 2022

On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation
Gal Patel, Leshem Choshen, Omri Abend · 06 Oct 2021

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael Ruogu Zhang, Stanislav Fort, R. Zemel, Roger C. Grosse · MoMe · 22 Apr 2021

e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu, Tim Rocktaschel, Thomas Lukasiewicz, Phil Blunsom · LRM · 04 Dec 2018