Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well

International Conference on Learning Representations (ICLR), 2020

7 January 2020

Papers citing "Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well"

T3: Test-Time Model Merging in VLMs for Zero-Shot Medical Imaging Analysis

Raza Imam

Hu Wang

Dwarikanath Mahapatra

Mohammad Yaqub

MoMe

327

31 Oct 2025

Probabilistic Token Alignment for Large Language Model Fusion

...

207

21 Sep 2025

Model Unmerging: Making Your Models Unmergeable for Secure Model Sharing

198

01 Sep 2025

UNIFORM: Unifying Knowledge from Large-scale and Diverse Pre-trained Models

232

27 Aug 2025

Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

Tolga Dimlioglu

A. Choromańska

FedML

314

27 Jul 2025

Merging Smarter, Generalizing Better: Enhancing Model Merging on OOD Data

333

10 Jun 2025

Navigating the Accuracy-Size Trade-Off with Flexible Model Merging

Akash Dhasade

Divyansh Jhunjhunwala

344

29 May 2025

Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking

Yuatyong Chaichana

Thanapat Trachu

Peerat Limkonchotiwat

Konpat Preechakul

Tirasan Khandhawit

Ekapol Chuangsuwanich

MoMe

672

29 May 2025

Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

436

21 May 2025

Efficient Multi-Task Modeling through Automated Fusion of Trained Models

245

14 Apr 2025

Rethinking Data: Towards Better Performing Domain-Specific Small Language Models

313

03 Mar 2025

Multi-Level Collaboration in Model Merging

393

03 Mar 2025

LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

1.0K

24 Feb 2025

LoRE-Merging: Exploring Low-Rank Estimation For Large Language Model Merging

420

15 Feb 2025

When, Where and Why to Average Weights?

International Conference on Machine Learning (ICML), 2025

650

10 Feb 2025

Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion

777

01 Feb 2025

AlignGuard: Scalable Safety Alignment for Text-to-Image Generation

587

13 Dec 2024

Exponential Moving Average of Weights in Deep Learning: Dynamics and Benefits

Daniel Morales-Brotons

Thijs Vogels

Aymeric Dieuleveut

489

27 Nov 2024

Task Arithmetic Through The Lens Of One-Shot Federated Learning

585

27 Nov 2024

LoRA Soups: Merging LoRAs for Practical Skill Composition Tasks

International Conference on Computational Linguistics (COLING), 2024

432

16 Oct 2024

QT-DoG: Quantization-aware Training for Domain Generalization

396

08 Oct 2024

Parameter Competition Balancing for Model Merging

Neural Information Processing Systems (NeurIPS), 2024

Jing Li

...

Min Zhang

281

03 Oct 2024

FuseChat: Knowledge Fusion of Chat Models

Xiaojun Quan

433

15 Aug 2024

ProFuser: Progressive Fusion of Large Language Models

413

09 Aug 2024

DynaMMo: Dynamic Model Merging for Efficient Class Incremental Learning for Medical Images

Mohammad Areeb Qazi

Ibrahim Almakky

Anees Ur Rehman Hashmi

Santosh Sanjeev

Mohammad Yaqub

MoMe

272

22 Apr 2024

DAM: Dynamic Adapter Merging for Continual Video QA Learning

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024

Feng Cheng

Ziyang Wang

Yi-Lin Sung

Yan-Bo Lin

Mohit Bansal

Gedas Bertasius

CLL MoMe

458

13 Mar 2024

Knowledge Fusion of Chat LLMs: A Preliminary Technical Report

Wei Bi

616

25 Feb 2024

Representation Surgery for Multi-Task Model Merging

Li Shen

414

05 Feb 2024

eXplainable Bayesian Multi-Perspective Generative Retrieval

287

04 Feb 2024

Knowledge Fusion of Large Language Models

Wei Bi

388

109

19 Jan 2024

Language and Task Arithmetic with Parameter-Efficient Layers for Zero-Shot Summarization

Alexandra Chronopoulou

Xinyi Wang

258

15 Nov 2023

A Quadratic Synchronization Rule for Distributed Deep Learning

International Conference on Learning Representations (ICLR), 2023

353

22 Oct 2023

AdaMerging: Adaptive Model Merging for Multi-Task Learning

International Conference on Learning Representations (ICLR), 2023

Li Shen

374

219

04 Oct 2023

Deep Model Fusion: A Survey

Liang Ding

Li Shen

346

106

27 Sep 2023

Accelerating Large Batch Training via Gradient Signal to Noise Ratio (GSNR)

250

24 Sep 2023

The Split Matters: Flat Minima Methods for Improving the Performance of GNNs

International Cross-Domain Conference on Machine Learning and Knowledge Extraction (CD-MAKE), 2023

N. Lell

A. Scherp

256

15 Jun 2023

TIES-Merging: Resolving Interference When Merging Models

Neural Information Processing Systems (NeurIPS), 2023

465

640

02 Jun 2023

Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data

IEEE Transactions on Mobile Computing (IEEE TMC), 2023

444

13 May 2023

Hierarchical Weight Averaging for Deep Neural Networks

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023

Zixun Zhang

242

23 Apr 2023

A Survey of Historical Learning: Learning Models with Learning History

Xiang Li

Lingfeng Yang

Jian Yang

298

23 Mar 2023

Randomized Adversarial Training via Taylor Expansion

Computer Vision and Pattern Recognition (CVPR), 2023

351

19 Mar 2023

Dataless Knowledge Fusion by Merging Weights of Language Models

International Conference on Learning Representations (ICLR), 2022

Xisen Jin

Xiang Ren

Daniel Preoţiuc-Pietro

Pengxiang Cheng

FedML MoMe

545

359

19 Dec 2022

Diverse Weight Averaging for Out-of-Distribution Generalization

Neural Information Processing Systems (NeurIPS), 2022

699

167

19 May 2022

Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent

Wei Zhang

216

02 Dec 2021

RobustART: Benchmarking Robustness on Architecture Design and Training Techniques

...

Xianglong Liu

376

124

11 Sep 2021

LocalNewton: Reducing Communication Bottleneck for Distributed Learning

221

16 May 2021

Consensus Control for Decentralized Deep Learning

International Conference on Machine Learning (ICML), 2021

307

100

09 Feb 2021

Truly Sparse Neural Networks at Scale

Selima Curci

Decebal Constantin Mocanu

Mykola Pechenizkiy

444

02 Feb 2021

Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism

Knowledge Discovery and Data Mining (KDD), 2020

366

18 Oct 2020

The Limit of the Batch Size

Yang You

338

15 Jun 2020