ResearchTrend.AI
Fusing finetuned models for better pretraining

Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz
6 April 2022 · FedML · AI4CE · MoMe
ArXiv · PDF · HTML
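The fusion studied in this paper and in much of the citing work below merges several finetuned models that share one architecture by averaging their parameters elementwise. As a rough illustration only (not the paper's exact recipe; `fuse_checkpoints` and the toy checkpoint dicts are hypothetical):

```python
import numpy as np

def fuse_checkpoints(checkpoints):
    """Elementwise average of parameter dicts from models sharing one architecture."""
    fused = {}
    for name in checkpoints[0]:
        # Stack the same parameter from every checkpoint and average over models.
        fused[name] = np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
    return fused

# Toy "checkpoints": two finetuned models, each a dict of weight arrays.
ckpt_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.0])}
ckpt_b = {"w": np.array([3.0, 4.0]), "b": np.array([2.0])}
fused = fuse_checkpoints([ckpt_a, ckpt_b])
# fused["w"] is [2.0, 3.0]; fused["b"] is [1.0]
```

The averaged parameters can then be loaded back into the shared architecture and used as an initialization for further pretraining or finetuning.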

Papers citing "Fusing finetuned models for better pretraining"

29 / 79 papers shown
Deep Model Fusion: A Survey
Weishi Li, Yong Peng, Miao Zhang, Liang Ding, Han Hu, Li Shen · FedML · MoMe · 27 Sep 2023

Cordyceps@LT-EDI: Patching Language-Specific Homophobia/Transphobia Classifiers with a Multilingual Understanding
Dean Ninalga · 24 Sep 2023

UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor, Corentin Dancette, Alexandre Ramé, Matthieu Cord · MoMe · MLLM · 30 Jul 2023

Can Model Fusing Help Transformers in Long Document Classification? An Empirical Study
Damith Premasiri, Tharindu Ranasinghe, R. Mitkov · VLM · 18 Jul 2023

Tangent Transformers for Composition, Privacy and Removal
Tian Yu Liu, Aditya Golatkar, Stefano Soatto · 16 Jul 2023

Tangent Model Composition for Ensembling and Continual Fine-tuning
Tianlin Liu, Stefano Soatto · LRM · MoMe · CLL · 16 Jul 2023

Sparse Model Soups: A Recipe for Improved Pruning via Model Averaging
Max Zimmer, Christoph Spiegel, S. Pokutta · MoMe · 29 Jun 2023

Instant Soup: Cheap Pruning Ensembles in A Single Pass Can Draw Lottery Tickets from Large Models
A. Jaiswal, Shiwei Liu, Tianlong Chen, Ying Ding, Zhangyang Wang · VLM · 18 Jun 2023

Git-Theta: A Git Extension for Collaborative Development of Machine Learning Models
Nikhil Kandpal, Brian Lester, Mohammed Muqeeth, Anisha Mascarenhas, Monty Evans, Vishal Baskaran, Tenghao Huang, Haokun Liu, Colin Raffel · VLM · 07 Jun 2023

Soft Merging of Experts with Adaptive Routing
Mohammed Muqeeth, Haokun Liu, Colin Raffel · MoMe · MoE · 06 Jun 2023

TIES-Merging: Resolving Interference When Merging Models
Prateek Yadav, Derek Tam, Leshem Choshen, Colin Raffel, Mohit Bansal · MoMe · 02 Jun 2023

Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models
Guillermo Ortiz-Jiménez, Alessandro Favero, P. Frossard · MoMe · 22 May 2023

Stop Uploading Test Data in Plain Text: Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks
Alon Jacovi, Avi Caciularu, Omer Goldman, Yoav Goldberg · 17 May 2023

ZipIt! Merging Models from Different Tasks without Training
George Stoica, Daniel Bolya, J. Bjorner, Pratik Ramesh, Taylor N. Hearn, Judy Hoffman · VLM · MoMe · 04 May 2023

Merging Decision Transformers: Weight Averaging for Forming Multi-Task Policies
Daniel Lawson, A. H. Qureshi · MoMe · OffRL · 14 Mar 2023

Towards Zero-Shot Functional Compositionality of Language Models
Hangyeol Yu, Myeongho Jeong, Jamin Shin, Hyeongdon Moon, Juneyoung Park, Seungtaek Choi · 06 Mar 2023

Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?
Ruisi Cai, Zhenyu (Allen) Zhang, Zhangyang Wang · AAML · OOD · 24 Feb 2023

Knowledge is a Region in Weight Space for Fine-tuned Language Models
Almog Gueta, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen · 09 Feb 2023

Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Alexandre Ramé, Kartik Ahuja, Jianyu Zhang, Matthieu Cord, Léon Bottou, David Lopez-Paz · MoMe · OODD · 20 Dec 2022

Dataless Knowledge Fusion by Merging Weights of Language Models
Xisen Jin, Xiang Ren, Daniel Preotiuc-Pietro, Pengxiang Cheng · FedML · MoMe · 19 Dec 2022

Editing Models with Task Arithmetic
Gabriel Ilharco, Marco Tulio Ribeiro, Mitchell Wortsman, Suchin Gururangan, Ludwig Schmidt, Hannaneh Hajishirzi, Ali Farhadi · KELM · MoMe · MU · 08 Dec 2022

ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning
Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen · MoMe · 02 Dec 2022

Where to start? Analyzing the potential value of intermediate models
Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz · MoMe · 31 Oct 2022

lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos · 19 Oct 2022

Patching open-vocabulary models by interpolating weights
Gabriel Ilharco, Mitchell Wortsman, S. Gadre, Shuran Song, Hannaneh Hajishirzi, Simon Kornblith, Ali Farhadi, Ludwig Schmidt · VLM · KELM · 10 Aug 2022

Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé, Matthieu Kirchmeyer, Thibaud Rahier, A. Rakotomamonjy, Patrick Gallinari, Matthieu Cord · OOD · 19 May 2022

On Neurons Invariant to Sentence Structural Changes in Neural Machine Translation
Gal Patel, Leshem Choshen, Omri Abend · 06 Oct 2021

Analyzing Monotonic Linear Interpolation in Neural Network Loss Landscapes
James Lucas, Juhan Bae, Michael Ruogu Zhang, Stanislav Fort, R. Zemel, Roger C. Grosse · MoMe · 22 Apr 2021

e-SNLI: Natural Language Inference with Natural Language Explanations
Oana-Maria Camburu, Tim Rocktaschel, Thomas Lukasiewicz, Phil Blunsom · LRM · 04 Dec 2018