Averaging Weights Leads to Wider Optima and Better Generalization

14 March 2018

Dmitry Vetrov

Papers citing "Averaging Weights Leads to Wider Optima and Better Generalization"

50 / 305 papers shown

Randomized Adversarial Training via Taylor Expansion Gao Jin Xinping Yi Dengyu Wu Ronghui Mu Xiaowei Huang AAML 31 34 0 19 Mar 2023
Rethinking Model Ensemble in Transfer-based Adversarial Attacks Huanran Chen Yichi Zhang Yinpeng Dong Xiao Yang Hang Su Junyi Zhu AAML 26 55 0 16 Mar 2023
CAT: Causal Audio Transformer for Audio Classification Xiaoyu Liu Hanlin Lu Jianbo Yuan Xinyu Li ViT 24 22 0 14 Mar 2023
Rethinking Confidence Calibration for Failure Prediction Fei Zhu Zhen Cheng Xu-Yao Zhang Cheng-Lin Liu UQCV 14 39 0 06 Mar 2023
DSD$^2$: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free? Victor Quétu Enzo Tartaglione 24 7 0 02 Mar 2023
Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection Zhen Cheng Fei Zhu Xu-Yao Zhang Cheng-Lin Liu MoMe OODD 40 11 0 02 Mar 2023
DART: Diversify-Aggregate-Repeat Training Improves Generalization of Neural Networks Samyak Jain Sravanti Addepalli P. Sahu Priyam Dey R. Venkatesh Babu MoMe OOD 35 20 0 28 Feb 2023
A Comprehensive Study on Robustness of Image Classification Models: Benchmarking and Rethinking Chang-Shu Liu Yinpeng Dong Wenzhao Xiang X. Yang Hang Su Junyi Zhu YueFeng Chen Yuan He H. Xue Shibao Zheng OOD VLM AAML 17 72 0 28 Feb 2023
Personalized Privacy-Preserving Framework for Cross-Silo Federated Learning Van Tuan Tran Huy Hieu Pham Kok-Seng Wong FedML 25 7 0 22 Feb 2023
Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks Mohamed Aziz Bhouri M. Joly Robert Yu S. Sarkar P. Perdikaris BDL UQCV AI4CE 11 1 0 14 Feb 2023
Contour-based Interactive Segmentation Danil Galeev Polina Popenova Anna Vorontsova Anton Konushin 22 5 0 13 Feb 2023
Making Substitute Models More Bayesian Can Enhance Transferability of Adversarial Examples Qizhang Li Yiwen Guo W. Zuo Hao Chen AAML 27 35 0 10 Feb 2023
Better Diffusion Models Further Improve Adversarial Training Zekai Wang Tianyu Pang Chao Du Min-Bin Lin Weiwei Liu Shuicheng Yan DiffM 16 207 0 09 Feb 2023
Generalization in Graph Neural Networks: Improved PAC-Bayesian Bounds on Graph Diffusion Haotian Ju Dongyue Li Aneesh Sharma Hongyang R. Zhang 23 40 0 09 Feb 2023
A Survey of Deep Learning: From Activations to Transformers Johannes Schneider Michalis Vlachos ViT MedIm AI4TS AI4CE 46 9 0 01 Feb 2023
Cross-Architectural Positive Pairs improve the effectiveness of Self-Supervised Learning P. Singh Jacopo Cirrone SSL 40 0 0 27 Jan 2023
Exploring the Effect of Multi-step Ascent in Sharpness-Aware Minimization Hoki Kim Jinseong Park Yujin Choi Woojin Lee Jaewook Lee 15 9 0 27 Jan 2023
Model soups to increase inference without increasing compute time Charles Dansereau Milo Sobral Maninder Bhogal Mehdi Zalai 16 2 0 24 Jan 2023
Stability Analysis of Sharpness-Aware Minimization Hoki Kim Jinseong Park Yujin Choi Jaewook Lee 28 12 0 16 Jan 2023
Training trajectories, mini-batch losses and the curious role of the learning rate Mark Sandler A. Zhmoginov Max Vladymyrov Nolan Miller ODL 13 10 0 05 Jan 2023
Do Bayesian Variational Autoencoders Know What They Don't Know? Misha Glazunov Apostolis Zarras UQCV BDL 20 5 0 29 Dec 2022
Training Integer-Only Deep Recurrent Neural Networks V. Nia Eyyub Sari Vanessa Courville M. Asgharian MQ 42 2 0 22 Dec 2022
KL Regularized Normalization Framework for Low Resource Tasks Neeraj Kumar Ankur Narang Brejesh Lall 21 1 0 21 Dec 2022
Dataless Knowledge Fusion by Merging Weights of Language Models Xisen Jin Xiang Ren Daniel Preotiuc-Pietro Pengxiang Cheng FedML MoMe 13 211 0 19 Dec 2022
The Underlying Correlated Dynamics in Neural Training Rotem Turjeman Tom Berkov I. Cohen Guy Gilboa 19 3 0 18 Dec 2022
Bayesian posterior approximation with stochastic ensembles Oleksandr Balabanov Bernhard Mehlig H. Linander BDL UQCV 27 5 0 15 Dec 2022
A New Linear Scaling Rule for Private Adaptive Hyperparameter Optimization Ashwinee Panda Xinyu Tang Saeed Mahloujifar Vikash Sehwag Prateek Mittal 31 11 0 08 Dec 2022
Editing Models with Task Arithmetic Gabriel Ilharco Marco Tulio Ribeiro Mitchell Wortsman Suchin Gururangan Ludwig Schmidt Hannaneh Hajishirzi Ali Farhadi KELM MoMe MU 43 424 0 08 Dec 2022
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning Shachar Don-Yehiya Elad Venezian Colin Raffel Noam Slonim Yoav Katz Leshem Choshen MoMe 26 52 0 02 Dec 2022
BARTSmiles: Generative Masked Language Models for Molecular Representations Gayane Chilingaryan Hovhannes Tamoyan Ani Tevosyan N. Babayan L. Khondkaryan Karen Hambardzumyan Zaven Navoyan Hrant Khachatrian Armen Aghajanyan SSL 27 25 0 29 Nov 2022
Indian Commercial Truck License Plate Detection and Recognition for Weighbridge Automation Siddharth Agrawal Keyur D. Joshi 30 4 0 23 Nov 2022
Improving Robust Generalization by Direct PAC-Bayesian Bound Minimization Zifa Wang Nan Ding Tomer Levinboim Xi Chen Radu Soricut AAML 35 5 0 22 Nov 2022
REPAIR: REnormalizing Permuted Activations for Interpolation Repair Keller Jordan Hanie Sedghi O. Saukh R. Entezari Behnam Neyshabur MoMe 46 94 0 15 Nov 2022
Learning to Annotate Part Segmentation with Gradient Matching Yu Yang Xiaotian Cheng Hakan Bilen Xiangyang Ji GAN 19 7 0 06 Nov 2022
Quantifying Model Uncertainty for Semantic Segmentation using Operators in the RKHS Rishabh Singh José C. Príncipe UQCV 19 3 0 03 Nov 2022
Circling Back to Recurrent Models of Language Gábor Melis 27 0 0 03 Nov 2022
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning Yaqing Wang Sahaj Agarwal Subhabrata Mukherjee Xiaodong Liu Jing Gao Ahmed Hassan Awadallah Jianfeng Gao MoE 13 118 0 31 Oct 2022
Symmetries, flat minima, and the conserved quantities of gradient flow Bo-Lu Zhao I. Ganev Robin G. Walters Rose Yu Nima Dehmamy 42 16 0 31 Oct 2022
Towards Generalized Few-Shot Open-Set Object Detection Binyi Su Hua Zhang Jingzhi Li Zhongjun Zhou 43 9 0 28 Oct 2022
Weight Averaging: A Simple Yet Effective Method to Overcome Catastrophic Forgetting in Automatic Speech Recognition Steven Vander Eeckt Hugo Van hamme CLL MoMe 58 14 0 27 Oct 2022
Sufficient Invariant Learning for Distribution Shift Taero Kim Sungjun Lim Kyungwoo Song OOD 19 2 0 24 Oct 2022
On the optimization and pruning for Bayesian deep learning X. Ke Yanan Fan BDL UQCV 22 1 0 24 Oct 2022
Revisiting Checkpoint Averaging for Neural Machine Translation Yingbo Gao Christian Herold Zijian Yang Hermann Ney MoMe 23 11 0 21 Oct 2022
lo-fi: distributed fine-tuning without communication Mitchell Wortsman Suchin Gururangan Shen Li Ali Farhadi Ludwig Schmidt Michael G. Rabbat Ari S. Morcos 19 24 0 19 Oct 2022
Scaling Adversarial Training to Large Perturbation Bounds Sravanti Addepalli Samyak Jain Gaurang Sriramanan R. Venkatesh Babu AAML 25 22 0 18 Oct 2022
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models Nikolaos Dimitriadis P. Frossard François Fleuret 16 25 0 18 Oct 2022
RoS-KD: A Robust Stochastic Knowledge Distillation Approach for Noisy Medical Imaging A. Jaiswal Kumar Ashutosh Justin F. Rousseau Yifan Peng Zhangyang Wang Ying Ding 13 9 0 15 Oct 2022
Wasserstein Barycenter-based Model Fusion and Linear Mode Connectivity of Neural Networks A. K. Akash Sixu Li Nicolas García Trillos 24 12 0 13 Oct 2022
Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling Haw-Shiuan Chang Ruei-Yao Sun Kathryn Ricci Andrew McCallum 41 14 0 10 Oct 2022
Learning Across Domains and Devices: Style-Driven Source-Free Domain Adaptation in Clustered Federated Learning Donald Shenaj Eros Fani Marco Toldo Debora Caldarola A. Tavera Umberto Michieli Marco Ciccone Pietro Zanuttigh Barbara Caputo FedML 21 39 0 05 Oct 2022