ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2403.00986
  4. Cited By
Merging Text Transformer Models from Different Initializations

Merging Text Transformer Models from Different Initializations

1 March 2024
Neha Verma
Maha Elbayad
    MoMe
ArXivPDFHTML

Papers citing "Merging Text Transformer Models from Different Initializations"

10 / 10 papers shown
Title
Wormhole Memory: A Rubik's Cube for Cross-Dialogue Retrieval
Wormhole Memory: A Rubik's Cube for Cross-Dialogue Retrieval
Libo Wang
85
0
0
24 Jan 2025
Merging Feed-Forward Sublayers for Compressed Transformers
Merging Feed-Forward Sublayers for Compressed Transformers
Neha Verma
Kenton W. Murray
Kevin Duh
AI4CE
45
0
0
10 Jan 2025
Training-free Heterogeneous Model Merging
Zhengqi Xu
Han Zheng
Jie Song
Li Sun
Mingli Song
MoMe
68
1
0
03 Jan 2025
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Arcee's MergeKit: A Toolkit for Merging Large Language Models
Charles Goddard
Shamane Siriwardhana
Malikeh Ehghaghi
Luke Meyers
Vladimir Karpukhin
Brian Benedict
Mark McQuade
Jacob Solawetz
MoMe
KELM
82
77
0
20 Mar 2024
Git Re-Basin: Merging Models modulo Permutation Symmetries
Git Re-Basin: Merging Models modulo Permutation Symmetries
Samuel K. Ainsworth
J. Hayase
S. Srinivasa
MoMe
239
313
0
11 Sep 2022
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti
Anna Rogers
Aleksandr Drozd
F. Dell’Orletta
71
42
0
23 May 2022
Diverse Weight Averaging for Out-of-Distribution Generalization
Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé
Matthieu Kirchmeyer
Thibaud Rahier
A. Rakotomamonjy
Patrick Gallinari
Matthieu Cord
OOD
188
128
0
19 May 2022
Optimizing Mode Connectivity via Neuron Alignment
Optimizing Mode Connectivity via Neuron Alignment
N. Joseph Tatro
Pin-Yu Chen
Payel Das
Igor Melnyk
P. Sattigeri
Rongjie Lai
MoMe
223
80
0
05 Sep 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language
  Understanding
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
294
6,943
0
20 Apr 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp
  Minima
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
273
2,886
0
15 Sep 2016
1