Sharpness-Aware Minimization for Efficiently Improving Generalization

3 October 2020

Papers citing "Sharpness-Aware Minimization for Efficiently Improving Generalization"

50 / 867 papers shown

Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models Clara Na Sanket Vaibhav Mehta Emma Strubell 62 19 0 25 May 2022
TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models A. Engel Zhichao Wang Anand D. Sarwate Sutanay Choudhury Tony Chiang 22 3 0 24 May 2022
Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization Shudong Zhang Haichang Gao Tianwei Zhang Yunyi Zhou Zihui Wu AAML 18 3 0 24 May 2022
Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models Sabeesh Ethiraj B. Bolla 17 2 0 23 May 2022
Vision Transformers in 2022: An Update on Tiny ImageNet Ethan Huynh ViT 31 11 0 21 May 2022
Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors J. C. V. Soares Avijit Shah Topojoy Biswas 35 32 0 20 May 2022
Diverse Weight Averaging for Out-of-Distribution Generalization Alexandre Ramé Matthieu Kirchmeyer Thibaud Rahier A. Rakotomamonjy Patrick Gallinari Matthieu Cord OOD 196 128 0 19 May 2022
Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective Keitaro Sakamoto Issei Sato 28 9 0 15 May 2022
Discovering and Explaining the Representation Bottleneck of Graph Neural Networks from Multi-order Interactions Fang Wu Siyuan Li Lirong Wu Dragomir R. Radev Stan Z. Li 27 2 0 15 May 2022
Goldilocks-curriculum Domain Randomization and Fractal Perlin Noise with Application to Sim2Real Pneumonia Lesion Detection Takahiro Suzuki S. Hanaoka Issei Sato OOD MedIm 26 1 0 29 Apr 2022
Detecting Deepfakes with Self-Blended Images Kaede Shiohara T. Yamasaki 26 291 0 18 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents Aditya A. Ramesh Prafulla Dhariwal Alex Nichol Casey Chu Mark Chen VLM DiffM 72 6,637 0 13 Apr 2022
Few-Shot Forecasting of Time-Series with Heterogeneous Channels L. Brinkmeyer Rafael Rêgo Drumond Johannes Burchert Lars Schmidt-Thieme AI4TS 22 7 0 07 Apr 2022
Exploiting Explainable Metrics for Augmented SGD Mahdi S. Hosseini Mathieu Tuli Konstantinos N. Plataniotis AAML 14 3 0 31 Mar 2022
Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices Andrey V. Savchenko CVBM 15 30 0 25 Mar 2022
ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator Zi-Chao Zhang Zhen-Duo Chen Yongxin Wang Xin Luo Xin-Shun Xu ViT 22 6 0 24 Mar 2022
Improving Generalization in Federated Learning by Seeking Flat Minima Debora Caldarola Barbara Caputo Marco Ciccone FedML 27 110 0 22 Mar 2022
The activity-weight duality in feed forward neural networks: The geometric determinants of generalization Yu Feng Yuhai Tu MLT 75 14 0 21 Mar 2022
Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning Yang Zhao Hao Zhang Xiuyuan Hu 16 9 0 18 Mar 2022
DeepAD: A Robust Deep Learning Model of Alzheimer's Disease Progression for Real-World Clinical Applications Somaye Hashemifar C. Iriondo Evan Casey Mohsen Hejrati for Alzheimer's Disease Neuroimaging Initiative OOD MedIm 20 3 0 17 Mar 2022
A New Quantum CNN Model for Image Classification Xing-Qiang Zhao Tianlong Chen 9 0 0 16 Mar 2022
Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective Gowthami Somepalli Liam H. Fowl Arpit Bansal Ping Yeh-Chiang Yehuda Dar Richard Baraniuk Micah Goldblum Tom Goldstein 13 64 0 15 Mar 2022
Surrogate Gap Minimization Improves Sharpness-Aware Training Juntang Zhuang Boqing Gong Liangzhe Yuan Yin Cui Hartwig Adam Nicha Dvornek S. Tatikonda James Duncan Ting Liu 22 146 0 15 Mar 2022
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization Xiuying Wei Ruihao Gong Yuhang Li Xianglong Liu F. Yu MQ VLM 19 166 0 11 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time Mitchell Wortsman Gabriel Ilharco S. Gadre Rebecca Roelofs Raphael Gontijo-Lopes ... Hongseok Namkoong Ali Farhadi Y. Carmon Simon Kornblith Ludwig Schmidt MoMe 54 914 1 10 Mar 2022
Adaptor: Objective-Centric Adaptation Framework for Language Models Michal Štefánik Vít Novotný Nikola Groverová Petr Sojka 27 10 0 08 Mar 2022
Flat minima generalize for low-rank matrix recovery Lijun Ding D. Drusvyatskiy Maryam Fazel Zaid Harchaoui 26 16 0 07 Mar 2022
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search Peng Ye Baopu Li Yikang Li Tao Chen Jiayuan Fan Wanli Ouyang 13 101 0 03 Mar 2022
Color Space-based HoVer-Net for Nuclei Instance Segmentation and Classification Hussam Azzuni Muhammad Ridzuan Min Xu Mohammad Yaqub 38 6 0 03 Mar 2022
Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds Shengjing Tian Jun Liu Xiuping Liu 3DPC 27 4 0 28 Feb 2022
Adversarial robustness of sparse local Lipschitz predictors Ramchandran Muthukumar Jeremias Sulam AAML 32 13 0 26 Feb 2022
Tackling benign nonconvexity with smoothing and stochastic gradients Harsh Vardhan Sebastian U. Stich 20 8 0 18 Feb 2022
How Do Vision Transformers Work? Namuk Park Songkuk Kim ViT 32 465 0 14 Feb 2022
Parametric t-Stochastic Neighbor Embedding With Quantum Neural Network Yoshiaki Kawase K. Mitarai Keisuke Fujii 26 5 0 09 Feb 2022
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning Yang Zhao Hao Zhang Xiuyuan Hu 30 116 0 08 Feb 2022
Towards an Analytical Definition of Sufficient Data Adam Byerly T. Kalganova 27 4 0 07 Feb 2022
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry Fabrizio Pittorino Antonio Ferraro Gabriele Perugini Christoph Feinauer Carlo Baldassi R. Zecchina 201 24 0 07 Feb 2022
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data Yaoqing Yang Ryan Theisen Liam Hodgkinson Joseph E. Gonzalez Kannan Ramchandran Charles H. Martin Michael W. Mahoney 86 17 0 06 Feb 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models Chen Liang Haoming Jiang Simiao Zuo Pengcheng He Xiaodong Liu Jianfeng Gao Weizhu Chen T. Zhao 17 14 0 06 Feb 2022
Learning strides in convolutional neural networks Rachid Riad O. Teboul David Grangier Neil Zeghidour 30 41 0 03 Feb 2022
Deep Hierarchy in Bandits Joey Hong B. Kveton S. Katariya Manzil Zaheer Mohammad Ghavamzadeh 25 20 0 03 Feb 2022
When Do Flat Minima Optimizers Work? Jean Kaddour Linqing Liu Ricardo M. A. Silva Matt J. Kusner ODL 11 58 0 01 Feb 2022
Fortuitous Forgetting in Connectionist Networks Hattie Zhou Ankit Vani Hugo Larochelle Aaron Courville CLL 11 42 0 01 Feb 2022
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise Minjia Zhang U. Niranjan Yuxiong He 23 1 0 29 Jan 2022
Weight Expansion: A New Perspective on Dropout and Generalization Gao Jin Xinping Yi Pengfei Yang Lijun Zhang S. Schewe Xiaowei Huang 29 5 0 23 Jan 2022
Learning to Minimize the Remainder in Supervised Learning Yan Luo Yongkang Wong Mohan S. Kankanhalli Qi Zhao 44 1 0 23 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape Devansh Bisla Jing Wang A. Choromańska 25 34 0 20 Jan 2022
Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks Yang Zhao Hao Zhang 22 1 0 16 Jan 2022
There is a Singularity in the Loss Landscape M. Lowell 14 0 0 12 Jan 2022
Communication-Efficient Federated Learning with Accelerated Client Gradient Geeho Kim Jinkyu Kim Bohyung Han FedML 32 11 0 10 Jan 2022