Averaging Weights Leads to Wider Optima and Better Generalization

14 March 2018

Dmitry Vetrov

Papers citing "Averaging Weights Leads to Wider Optima and Better Generalization"

50 / 305 papers shown

Title
Stability Analysis and Generalization Bounds of Adversarial Training Jiancong Xiao Yanbo Fan Ruoyu Sun Jue Wang Zhi-Quan Luo AAML 24 30 0 03 Oct 2022
Adaptive Smoothness-weighted Adversarial Training for Multiple Perturbations with Its Stability Analysis Jiancong Xiao Zeyu Qin Yanbo Fan Baoyuan Wu Jue Wang Zhi-Quan Luo AAML 29 7 0 02 Oct 2022
Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging Jean Kaddour MoMe 3DH 19 39 0 29 Sep 2022
Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization Danni Peng Sinno Jialin Pan 27 2 0 29 Sep 2022
Two-Tailed Averaging: Anytime, Adaptive, Once-in-a-While Optimal Weight Averaging for Better Generalization Gábor Melis MoMe 19 1 0 26 Sep 2022
Random initialisations performing above chance and how to find them Frederik Benzing Simon Schug Robert Meier J. Oswald Yassir Akram Nicolas Zucchet Laurence Aitchison Angelika Steger ODL 11 24 0 15 Sep 2022
Git Re-Basin: Merging Models modulo Permutation Symmetries Samuel K. Ainsworth J. Hayase S. Srinivasa MoMe 243 313 0 11 Sep 2022
Generalisation under gradient descent via deterministic PAC-Bayes Eugenio Clerico Tyler Farghly George Deligiannidis Benjamin Guedj Arnaud Doucet 23 4 0 06 Sep 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey Yanbei Chen Massimiliano Mancini Xiatian Zhu Zeynep Akata 36 113 0 24 Aug 2022
A Unified Analysis of Mixed Sample Data Augmentation: A Loss Function Perspective Chanwoo Park Sangdoo Yun Sanghyuk Chun AAML 16 32 0 21 Aug 2022
Interpretable Uncertainty Quantification in AI for HEP Thomas Y. Chen B. Dey A. Ghosh Michael Kagan Brian D. Nord Nesar Ramachandra 25 7 0 05 Aug 2022
Learning Hyper Label Model for Programmatic Weak Supervision Renzhi Wu Sheng Chen Jieyu Zhang Xu Chu 18 16 0 27 Jul 2022
LGV: Boosting Adversarial Example Transferability from Large Geometric Vicinity Martin Gubri Maxime Cordy Mike Papadakis Yves Le Traon Koushik Sen AAML 22 51 0 26 Jul 2022
Learning from Data with Noisy Labels Using Temporal Self-Ensemble Jun Ho Lee J. Baik Taebaek Hwang J. Choi NoLa 22 1 0 21 Jul 2022
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors Chien-Yao Wang Alexey Bochkovskiy H. Liao ObjD 27 6,222 0 06 Jul 2022
Federated Self-supervised Learning for Video Understanding Yasar Abbas Ur Rehman Yan Gao Jiajun Shen Pedro Porto Buarque de Gusmão Nicholas D. Lane FedML 17 15 0 05 Jul 2022
Effective training-time stacking for ensembling of deep neural networks P. Proskura Alexey Zaytsev 9 6 0 27 Jun 2022
When Does Re-initialization Work? Sheheryar Zaidi Tudor Berariu Hyunjik Kim J. Bornschein Claudia Clopath Yee Whye Teh Razvan Pascanu 30 10 0 20 Jun 2022
Uncertainty-aware Evaluation of Time-Series Classification for Online Handwriting Recognition with Domain Shift Andreas Klass Sven M. Lorenz M. Lauer-Schmaltz David Rügamer Bernd Bischl Christopher Mutschler Felix Ott 29 10 0 17 Jun 2022
A Closer Look at Smoothness in Domain Adversarial Training Harsh Rangwani Sumukh K Aithal Mayank Mishra Arihant Jain R. Venkatesh Babu 25 119 0 16 Jun 2022
Bayesian Learning of Parameterised Quantum Circuits Samuel Duffield Marcello Benedetti Matthias Rosenkranz 12 11 0 15 Jun 2022
Density Regression and Uncertainty Quantification with Bayesian Deep Noise Neural Networks Daiwei Zhang Tianci Liu Jian Kang BDL UQCV 26 2 0 12 Jun 2022
FlexLip: A Controllable Text-to-Lip System Dan Oneaţă Beáta Lőrincz Adriana Stan H. Cucu 14 3 0 07 Jun 2022
Differentiable programming for functional connectomics R. Ciric A. Thomas Oscar Esteban R. Poldrack 15 0 0 31 May 2022
Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models Sabeesh Ethiraj B. Bolla 17 2 0 23 May 2022
NeuralEF: Deconstructing Kernels by Deep Neural Networks Zhijie Deng Jiaxin Shi Jun Zhu 16 18 0 30 Apr 2022
Conformer and Blind Noisy Students for Improved Image Quality Assessment Marcos V. Conde Maxime Burchi Radu Timofte DiffM 38 14 0 27 Apr 2022
A Simple Approach to Adversarial Robustness in Few-shot Image Classification Akshayvarun Subramanya Hamed Pirsiavash VLM 17 6 0 11 Apr 2022
The Two Dimensions of Worst-case Training and the Integrated Effect for Out-of-domain Generalization Zeyi Huang Haohan Wang Dong Huang Yong Jae Lee Eric P. Xing 11 22 0 09 Apr 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022 Jiameng Gao 18 0 0 08 Apr 2022
Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results T. Ridnik Hussam Lawen Emanuel Ben-Baruch Asaf Noy 36 11 0 07 Apr 2022
Event Transformer. A sparse-aware solution for efficient event data processing Alberto Sabater Luis Montesano Ana C. Murillo 21 51 0 07 Apr 2022
FedCos: A Scene-adaptive Federated Optimization Enhancement for Performance Improvement Hao Zhang Tingting Wu Siyao Cheng Jie Liu FedML 30 11 0 07 Apr 2022
Omni-DETR: Omni-Supervised Object Detection with Transformers Pei Wang Zhaowei Cai Hao Yang Gurumurthy Swaminathan Nuno Vasconcelos Bernt Schiele Stefano Soatto 24 40 0 30 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification Juncheng Billy Li Shuhui Qu Po-Yao (Bernie) Huang Florian Metze VLM 22 9 0 25 Mar 2022
Closing the Generalization Gap of Cross-silo Federated Medical Image Segmentation An Xu Wenqi Li Pengfei Guo Dong Yang H. Roth Ali Hatamizadeh Can Zhao Daguang Xu Heng Huang Ziyue Xu FedML 28 51 0 18 Mar 2022
Flexible Amortized Variational Inference in qBOLD MRI Ivor J. A. Simpson Ashley McManamon Balázs Örzsik A. Stone N. Blockley Iris Asllani A. Colasanti M. Cercignani 14 0 0 11 Mar 2022
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization Xiuying Wei Ruihao Gong Yuhang Li Xianglong Liu F. Yu MQ VLM 19 166 0 11 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time Mitchell Wortsman Gabriel Ilharco S. Gadre Rebecca Roelofs Raphael Gontijo-Lopes ... Hongseok Namkoong Ali Farhadi Y. Carmon Simon Kornblith Ludwig Schmidt MoMe 42 909 1 10 Mar 2022
Low-Loss Subspace Compression for Clean Gains against Multi-Agent Backdoor Attacks Siddhartha Datta N. Shadbolt AAML 21 6 0 07 Mar 2022
Scalable Uncertainty Quantification for Deep Operator Networks using Randomized Priors Yibo Yang Georgios Kissas P. Perdikaris BDL UQCV 20 40 0 06 Mar 2022
Towards a Common Speech Analysis Engine Hagai Aronowitz Itai Gat E. Morais Weizhong Zhu R. Hoory 18 3 0 01 Mar 2022
Adversarial robustness of sparse local Lipschitz predictors Ramchandran Muthukumar Jeremias Sulam AAML 32 13 0 26 Feb 2022
Interacting Contour Stochastic Gradient Langevin Dynamics Wei Deng Siqi Liang Botao Hao Guang Lin F. Liang BDL 21 10 0 20 Feb 2022
Sparsity Winning Twice: Better Robust Generalization from More Efficient Training Tianlong Chen Zhenyu (Allen) Zhang Pengjun Wang Santosh Balachandra Haoyu Ma Zehao Wang Zhangyang Wang OOD AAML 77 46 0 20 Feb 2022
PFGE: Parsimonious Fast Geometric Ensembling of DNNs Hao Guo Jiyong Jin B. Liu FedML 11 1 0 14 Feb 2022
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data Yaoqing Yang Ryan Theisen Liam Hodgkinson Joseph E. Gonzalez Kannan Ramchandran Charles H. Martin Michael W. Mahoney 82 17 0 06 Feb 2022
When Do Flat Minima Optimizers Work? Jean Kaddour Linqing Liu Ricardo M. A. Silva Matt J. Kusner ODL 11 58 0 01 Feb 2022
Stochastic Neural Networks with Infinite Width are Deterministic Liu Ziyin Hanlin Zhang Xiangming Meng Yuting Lu Eric P. Xing Masahito Ueda 21 3 0 30 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape Devansh Bisla Jing Wang A. Choromańska 25 34 0 20 Jan 2022