Sharpness-Aware Minimization for Efficiently Improving Generalization

3 October 2020
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
    AAML

Papers citing "Sharpness-Aware Minimization for Efficiently Improving Generalization"

50 / 867 papers shown
Title
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models
Clara Na
Sanket Vaibhav Mehta
Emma Strubell
62
19
0
25 May 2022
TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models
A. Engel
Zhichao Wang
Anand D. Sarwate
Sutanay Choudhury
Tony Chiang
22
3
0
24 May 2022
Alleviating Robust Overfitting of Adversarial Training With Consistency Regularization
Shudong Zhang
Haichang Gao
Tianwei Zhang
Yunyi Zhou
Zihui Wu
AAML
18
3
0
24 May 2022
Training Efficient CNNS: Tweaking the Nuts and Bolts of Neural Networks for Lighter, Faster and Robust Models
Sabeesh Ethiraj
B. Bolla
17
2
0
23 May 2022
Vision Transformers in 2022: An Update on Tiny ImageNet
Ethan Huynh
ViT
31
11
0
21 May 2022
Temporally Precise Action Spotting in Soccer Videos Using Dense Detection Anchors
J. C. V. Soares
Avijit Shah
Topojoy Biswas
35
32
0
20 May 2022
Diverse Weight Averaging for Out-of-Distribution Generalization
Alexandre Ramé
Matthieu Kirchmeyer
Thibaud Rahier
A. Rakotomamonjy
Patrick Gallinari
Matthieu Cord
OOD
196
128
0
19 May 2022
Analyzing Lottery Ticket Hypothesis from PAC-Bayesian Theory Perspective
Keitaro Sakamoto
Issei Sato
28
9
0
15 May 2022
Discovering and Explaining the Representation Bottleneck of Graph Neural Networks from Multi-order Interactions
Fang Wu
Siyuan Li
Lirong Wu
Dragomir R. Radev
Stan Z. Li
27
2
0
15 May 2022
Goldilocks-curriculum Domain Randomization and Fractal Perlin Noise with Application to Sim2Real Pneumonia Lesion Detection
Takahiro Suzuki
S. Hanaoka
Issei Sato
OOD
MedIm
26
1
0
29 Apr 2022
Detecting Deepfakes with Self-Blended Images
Kaede Shiohara
T. Yamasaki
26
291
0
18 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
72
6,637
0
13 Apr 2022
Few-Shot Forecasting of Time-Series with Heterogeneous Channels
L. Brinkmeyer
Rafael Rêgo Drumond
Johannes Burchert
Lars Schmidt-Thieme
AI4TS
22
7
0
07 Apr 2022
Exploiting Explainable Metrics for Augmented SGD
Mahdi S. Hosseini
Mathieu Tuli
Konstantinos N. Plataniotis
AAML
14
3
0
31 Mar 2022
Frame-level Prediction of Facial Expressions, Valence, Arousal and Action Units for Mobile Devices
Andrey V. Savchenko
CVBM
15
30
0
25 Mar 2022
ViT-FOD: A Vision Transformer based Fine-grained Object Discriminator
Zi-Chao Zhang
Zhen-Duo Chen
Yongxin Wang
Xin Luo
Xin-Shun Xu
ViT
22
6
0
24 Mar 2022
Improving Generalization in Federated Learning by Seeking Flat Minima
Debora Caldarola
Barbara Caputo
Marco Ciccone
FedML
27
110
0
22 Mar 2022
The activity-weight duality in feed forward neural networks: The geometric determinants of generalization
Yu Feng
Yuhai Tu
MLT
75
14
0
21 Mar 2022
Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning
Yang Zhao
Hao Zhang
Xiuyuan Hu
16
9
0
18 Mar 2022
DeepAD: A Robust Deep Learning Model of Alzheimer's Disease Progression for Real-World Clinical Applications
Somaye Hashemifar
C. Iriondo
Evan Casey
Mohsen Hejrati
for Alzheimer's Disease Neuroimaging Initiative
OOD
MedIm
20
3
0
17 Mar 2022
A New Quantum CNN Model for Image Classification
Xing-Qiang Zhao
Tianlong Chen
9
0
0
16 Mar 2022
Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective
Gowthami Somepalli
Liam H. Fowl
Arpit Bansal
Ping Yeh-Chiang
Yehuda Dar
Richard Baraniuk
Micah Goldblum
Tom Goldstein
13
64
0
15 Mar 2022
Surrogate Gap Minimization Improves Sharpness-Aware Training
Juntang Zhuang
Boqing Gong
Liangzhe Yuan
Yin Cui
Hartwig Adam
Nicha Dvornek
S. Tatikonda
James Duncan
Ting Liu
22
146
0
15 Mar 2022
QDrop: Randomly Dropping Quantization for Extremely Low-bit Post-Training Quantization
Xiuying Wei
Ruihao Gong
Yuhang Li
Xianglong Liu
F. Yu
MQ
VLM
19
166
0
11 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
54
914
1
10 Mar 2022
Adaptor: Objective-Centric Adaptation Framework for Language Models
Michal Štefánik
Vít Novotný
Nikola Groverová
Petr Sojka
27
10
0
08 Mar 2022
Flat minima generalize for low-rank matrix recovery
Lijun Ding
D. Drusvyatskiy
Maryam Fazel
Zaid Harchaoui
26
16
0
07 Mar 2022
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
Peng Ye
Baopu Li
Yikang Li
Tao Chen
Jiayuan Fan
Wanli Ouyang
13
101
0
03 Mar 2022
Color Space-based HoVer-Net for Nuclei Instance Segmentation and Classification
Hussam Azzuni
Muhammad Ridzuan
Min Xu
Mohammad Yaqub
38
6
0
03 Mar 2022
Towards Class-agnostic Tracking Using Feature Decorrelation in Point Clouds
Shengjing Tian
Jun Liu
Xiuping Liu
3DPC
27
4
0
28 Feb 2022
Adversarial robustness of sparse local Lipschitz predictors
Ramchandran Muthukumar
Jeremias Sulam
AAML
32
13
0
26 Feb 2022
Tackling benign nonconvexity with smoothing and stochastic gradients
Harsh Vardhan
Sebastian U. Stich
20
8
0
18 Feb 2022
How Do Vision Transformers Work?
Namuk Park
Songkuk Kim
ViT
32
465
0
14 Feb 2022
Parametric t-Stochastic Neighbor Embedding With Quantum Neural Network
Yoshiaki Kawase
K. Mitarai
Keisuke Fujii
26
5
0
09 Feb 2022
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
Yang Zhao
Hao Zhang
Xiuyuan Hu
30
116
0
08 Feb 2022
Towards an Analytical Definition of Sufficient Data
Adam Byerly
T. Kalganova
27
4
0
07 Feb 2022
Deep Networks on Toroids: Removing Symmetries Reveals the Structure of Flat Regions in the Landscape Geometry
Fabrizio Pittorino
Antonio Ferraro
Gabriele Perugini
Christoph Feinauer
Carlo Baldassi
R. Zecchina
201
24
0
07 Feb 2022
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data
Yaoqing Yang
Ryan Theisen
Liam Hodgkinson
Joseph E. Gonzalez
Kannan Ramchandran
Charles H. Martin
Michael W. Mahoney
86
17
0
06 Feb 2022
No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
Chen Liang
Haoming Jiang
Simiao Zuo
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
T. Zhao
17
14
0
06 Feb 2022
Learning strides in convolutional neural networks
Rachid Riad
O. Teboul
David Grangier
Neil Zeghidour
30
41
0
03 Feb 2022
Deep Hierarchy in Bandits
Joey Hong
B. Kveton
S. Katariya
Manzil Zaheer
Mohammad Ghavamzadeh
25
20
0
03 Feb 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
11
58
0
01 Feb 2022
Fortuitous Forgetting in Connectionist Networks
Hattie Zhou
Ankit Vani
Hugo Larochelle
Aaron Courville
CLL
11
42
0
01 Feb 2022
ScaLA: Accelerating Adaptation of Pre-Trained Transformer-Based Language Models via Efficient Large-Batch Adversarial Noise
Minjia Zhang
U. Niranjan
Yuxiong He
23
1
0
29 Jan 2022
Weight Expansion: A New Perspective on Dropout and Generalization
Gao Jin
Xinping Yi
Pengfei Yang
Lijun Zhang
S. Schewe
Xiaowei Huang
29
5
0
23 Jan 2022
Learning to Minimize the Remainder in Supervised Learning
Yan Luo
Yongkang Wong
Mohan S. Kankanhalli
Qi Zhao
44
1
0
23 Jan 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
25
34
0
20 Jan 2022
Neighborhood Region Smoothing Regularization for Finding Flat Minima In Deep Neural Networks
Yang Zhao
Hao Zhang
22
1
0
16 Jan 2022
There is a Singularity in the Loss Landscape
M. Lowell
14
0
0
12 Jan 2022
Communication-Efficient Federated Learning with Accelerated Client Gradient
Geeho Kim
Jinkyu Kim
Bohyung Han
FedML
32
11
0
10 Jan 2022