Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

International Conference on Learning Representations (ICLR), 2020
7 January 2020
Vipul Gupta
S. Serrano
D. DeCoste
    MoMe

Papers citing "Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well"

50 / 50 papers shown
T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis
Raza Imam
Hu Wang
Dwarikanath Mahapatra
Mohammad Yaqub
MoMe
327
0
0
31 Oct 2025
Probabilistic Token Alignment for Large Language Model Fusion
Runjia Zeng
James Liang
Cheng Han
Zhiwen Cao
Jiahao Liu
...
Yingjie Victor Chen
Lifu Huang
Tong Geng
Qifan Wang
Dongfang Liu
207
3
0
21 Sep 2025
Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing
Zihao Wang
Enneng Yang
L. Yin
Shiwei Liu
Li Shen
FedML, MoMe
198
1
0
01 Sep 2025
UNIFORM: Unifying Knowledge from Large-scale and Diverse Pre-trained Models
Yimu Wang
Weiming Zhuang
Chen Chen
Jiabo Huang
Jingtao Li
Lingjuan Lyu
FedML
232
1
0
27 Aug 2025
Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning
Tolga Dimlioglu
A. Choromańska
FedML
314
1
0
27 Jul 2025
Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data
Bingjie Zhang
Hongkang Li
Changlong Shi
Guowei Rong
He Zhao
Dongsheng Wang
Dandan Guo
Meng Wang
MoMe
333
1
0
10 Jun 2025
Navigating the Accuracy-Size Trade-Off with Flexible Model Merging
Akash Dhasade
Divyansh Jhunjhunwala
Milos Vujasinovic
Gauri Joshi
Anne-Marie Kermarrec
MoMe
344
1
0
29 May 2025
Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking
Yuatyong Chaichana
Thanapat Trachu
Peerat Limkonchotiwat
Konpat Preechakul
Tirasan Khandhawit
Ekapol Chuangsuwanich
MoMe
672
1
0
29 May 2025
Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Junlin Li
Guodong DU
Jing Li
Sim Kuan Goh
Wenya Wang
...
Fangming Liu
Jing Li
Saleh Alharbi
Daojing He
Min Zhang
MoMe, CLL
436
1
0
21 May 2025
Efficient Multi-Task Modeling through Automated Fusion of Trained Models
Jingxuan Zhou
Weidong Bao
Ji Wang
Zhengyi Zhong
Dayu Zhang
MoMe
245
0
0
14 Apr 2025
Rethinking Data: Towards Better Performing Domain-Specific Small Language Models
Boris Nazarov
Darya Frolova
Yackov Lubarsky
Alexei Gaissinski
Pavel Kisilev
ALM
313
1
0
03 Mar 2025
Multi-Level Collaboration in Model Merging
Qi Li
Runpeng Yu
Xinchao Wang
MoMe, FedML
393
0
0
03 Mar 2025
LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qianli Ma
Dongrui Liu
Qian Chen
Linfeng Zhang
Jing Shao
MoMe
1.0K
5
0
24 Feb 2025
LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging
Zehua Liu
Han Wu
Yuxuan Yao
Ruifeng She
Xiongwei Han
Tao Zhong
Mingxuan Yuan
MoMe
420
7
0
15 Feb 2025
When, Where and Why to Average Weights?
International Conference on Machine Learning (ICML), 2025
Niccolò Ajroldi
Antonio Orvieto
Jonas Geiping
MoMe
650
4
0
10 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
777
11
0
01 Feb 2025
AlignGuard: Scalable Safety Alignment for Text-to-Image Generation
Runtao Liu
Chen I Chieh
Jindong Gu
Jipeng Zhang
Renjie Pi
Qifeng Chen
Juil Sock
Ashkan Khakzar
Fabio Pizzati
EGVM
587
7
0
13 Dec 2024
Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits
Daniel Morales-Brotons
Thijs Vogels
Aymeric Dieuleveut
489
95
0
27 Nov 2024
Task Arithmetic Through The Lens Of One-Shot Federated Learning
Zhixu Tao
I. Mason
Sanjeev R. Kulkarni
Xavier Boix
MoMe, FedML
585
10
0
27 Nov 2024
LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks
International Conference on Computational Linguistics (COLING), 2024
Akshara Prabhakar
Yuanzhi Li
Karthik Narasimhan
Sham Kakade
Eran Malach
Samy Jelassi
MoMe
432
37
0
16 Oct 2024
QT-DoG: Quantization-aware Training for Domain Generalization
Saqib Javed
Hieu Le
Mathieu Salzmann
OOD, MQ
396
9
0
08 Oct 2024
Parameter Competition Balancing for Model Merging
Neural Information Processing Systems (NeurIPS), 2024
Guodong DU
Junlin Lee
Jing Li
Runhua Jiang
Yifei Guo
...
Hanting Liu
Sim Kuan Goh
Jing Li
Daojing He
Min Zhang
MoMe
281
60
0
03 Oct 2024
FuseChat: Knowledge Fusion of Chat Models
Fanqi Wan
Longguang Zhong
Ziyi Yang
Ruijun Chen
Xiaojun Quan
ALM, KELM, MoMe
433
56
0
15 Aug 2024
ProFuser: Progressive Fusion of Large Language Models
Tianyuan Shi
Fanqi Wan
Canbin Huang
Xiaojun Quan
Chenliang Li
Ming Yan
J. Zhang
Minhua Huang
Wu Kai
MoMe
413
3
0
09 Aug 2024
DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images
Mohammad Areeb Qazi
Ibrahim Almakky
Anees Ur Rehman Hashmi
Santosh Sanjeev
Mohammad Yaqub
MoMe
272
9
0
22 Apr 2024
DAM: Dynamic Adapter Merging for Continual Video QA Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Feng Cheng
Ziyang Wang
Yi-Lin Sung
Yan-Bo Lin
Mohit Bansal
Gedas Bertasius
CLL, MoMe
458
19
0
13 Mar 2024
Knowledge Fusion of Chat LLMs: A Preliminary Technical Report
Fanqi Wan
Ziyi Yang
Longguang Zhong
Xiaojun Quan
Xinting Huang
Wei Bi
MoMe
616
3
0
25 Feb 2024
Representation Surgery for Multi-Task Model Merging
Enneng Yang
Li Shen
Zhenyi Wang
Guibing Guo
Xiaojun Chen
Xingwei Wang
Dacheng Tao
MoMe
414
95
0
05 Feb 2024
eXplainable Bayesian Multi-Perspective Generative Retrieval
EuiYul Song
Philhoon Oh
Sangryul Kim
Hyunjung Shim
BDL
287
0
0
04 Feb 2024
Knowledge Fusion of Large Language Models
Fanqi Wan
Xinting Huang
Deng Cai
Xiaojun Quan
Wei Bi
Shuming Shi
MoMe
388
109
0
19 Jan 2024
Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization
Alexandra Chronopoulou
Jonas Pfeiffer
Joshua Maynez
Xinyi Wang
Sebastian Ruder
Priyanka Agrawal
MoMe
258
28
0
15 Nov 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
International Conference on Learning Representations (ICLR), 2023
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
353
4
0
22 Oct 2023
AdaMerging: Adaptive Model Merging for Multi-Task Learning
International Conference on Learning Representations (ICLR), 2023
Enneng Yang
Zhenyi Wang
Li Shen
Shiwei Liu
Guibing Guo
Xingwei Wang
Dacheng Tao
MoMe
374
219
0
04 Oct 2023
Deep Model Fusion: A Survey
Weishi Li
Yong Peng
Miao Zhang
Liang Ding
Han Hu
Li Shen
FedML, MoMe
346
106
0
27 Sep 2023
Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)
Guo-qing Jiang
Jinlong Liu
Zixiang Ding
Lin Guo
W. Lin
AI4CE
250
2
0
24 Sep 2023
The Split Matters: Flat Minima Methods for Improving the Performance of GNNs
International Cross-Domain Conference on Machine Learning and Knowledge Extraction (CD-MAKE), 2023
N. Lell
A. Scherp
256
2
0
15 Jun 2023
TIES-Merging: Resolving Interference When Merging Models
Neural Information Processing Systems (NeurIPS), 2023
Prateek Yadav
Derek Tam
Leshem Choshen
Colin Raffel
Joey Tianyi Zhou
MoMe
465
640
0
02 Jun 2023
Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data
IEEE Transactions on Mobile Computing (IEEE TMC), 2023
Tailin Zhou
Zehong Lin
Jinchao Zhang
Danny H. K. Tsang
MoMe, FedML
444
25
0
13 May 2023
Hierarchical Weight Averaging for Deep Neural Networks
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Xiaozhe Gu
Zixun Zhang
Yuncheng Jiang
Yaoyu Zhang
Ruimao Zhang
Shuguang Cui
Zhuguo Li
242
7
0
23 Apr 2023
A Survey of Historical Learning: Learning Models with Learning History
Xiang Li
Ge Wu
Lingfeng Yang
Wenzhe Wang
Renjie Song
Jian Yang
MU, AI4TS
298
3
0
23 Mar 2023
Randomized Adversarial Training via Taylor Expansion
Computer Vision and Pattern Recognition (CVPR), 2023
Gao Jin
Xinping Yi
Dengyu Wu
Ronghui Mu
Xiaowei Huang
AAML
351
58
0
19 Mar 2023
Dataless Knowledge Fusion by Merging Weights of Language Models
International Conference on Learning Representations (ICLR), 2022
Xisen Jin
Xiang Ren
Daniel Preoţiuc-Pietro
Pengxiang Cheng
FedML, MoMe
545
359
0
19 Dec 2022
Diverse Weight Averaging for Out-of-Distribution Generalization
Neural Information Processing Systems (NeurIPS), 2022
Alexandre Ramé
Matthieu Kirchmeyer
Thibaud Rahier
A. Rakotomamonjy
Patrick Gallinari
Matthieu Cord
OOD
699
167
0
19 May 2022
Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent
Wei Zhang
Mingrui Liu
Yu Feng
Xiaodong Cui
Brian Kingsbury
Yuhai Tu
216
3
0
02 Dec 2021
RobustART: Benchmarking Robustness on Architecture Design and Training Techniques
Shiyu Tang
Yazhe Niu
Yan Wang
Aishan Liu
Jinyang Guo
...
Xianglong Liu
Basel Alomair
Alan Yuille
Juil Sock
Dacheng Tao
VLM, AAML
376
124
0
11 Sep 2021
LocalNewton: Reducing Communication Bottleneck for Distributed Learning
Vipul Gupta
Avishek Ghosh
Michal Derezinski
Rajiv Khanna
Kannan Ramchandran
Michael W. Mahoney
221
14
0
16 May 2021
Consensus Control for Decentralized Deep Learning
International Conference on Machine Learning (ICML), 2021
Lingjing Kong
Tao Lin
Anastasia Koloskova
Martin Jaggi
Sebastian U. Stich
307
100
0
09 Feb 2021
Truly Sparse Neural Networks at Scale
Selima Curci
Decebal Constantin Mocanu
Mykola Pechenizkiy
444
24
0
02 Feb 2021
Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism
Knowledge Discovery and Data Mining (KDD), 2020
Vipul Gupta
Dhruv Choudhary
P. T. P. Tang
Xiaohan Wei
Xing Wang
Yuzhen Huang
A. Kejariwal
Kannan Ramchandran
Michael W. Mahoney
366
33
0
18 Oct 2020
The Limit of the Batch Size
Yang You
Yuhui Wang
Huan Zhang
Zhao-jie Zhang
J. Demmel
Cho-Jui Hsieh
338
23
0
15 Jun 2020