Understanding AdamW through Proximal Methods and Scale-Freeness

31 January 2022

Papers citing "Understanding AdamW through Proximal Methods and Scale-Freeness"

26 / 26 papers shown

BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters Baz Roland Kristina Malyseva Anna Pappa Tristan Cazenave 59 0 0 29 Apr 2025
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization Dmitry Kovalev 52 0 0 16 Mar 2025
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm Nanyu Luo Feng Ji DRL 33 0 0 15 Feb 2025
Harnessing Loss Decomposition for Long-Horizon Wave Predictions via Deep Neural Networks Indu Kant Deo R. Jaiman 57 1 0 04 Dec 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection Pengfei Qi Yifei Zhang Wenqiang Li Youwen Hu Kunlong Bai ObjD 30 0 0 10 Sep 2024
FoldGPT: Simple and Effective Large Language Model Compression Scheme Songwei Liu Chao Zeng Lianqiang Li Chenqian Yan Lean Fu Xing Mei Fangmin Chen 40 4 0 01 Jul 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness Yuxing Liu Rui Pan Tong Zhang 19 4 0 21 Jun 2024
Prototypical Reward Network for Data-Efficient RLHF Jinghan Zhang Xiting Wang Yiqiao Jin Changyu Chen Xinhao Zhang Kunpeng Liu ALM 41 18 0 06 Jun 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis Mohammad Amaz Uddin Muhammad Nazrul Islam Leandros A. Maglaras Helge Janicke Iqbal H. Sarker 32 2 0 12 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization Shuo Xie Zhiyuan Li OffRL 27 12 0 05 Apr 2024
EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network Bin Wang Fei Deng Peifan Jiang 25 6 0 20 Mar 2024
TAPTR: Tracking Any Point with Transformers as Detection Hongyang Li Hao Zhang Shilong Liu Zhaoyang Zeng Tianhe Ren Feng Li Lei Zhang 32 19 0 19 Mar 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach Mohammad Amaz Uddin Iqbal H. Sarker 24 14 0 21 Feb 2024
GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting Eloy Reulen S. Mehrkanoon 26 3 0 18 Jan 2024
SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms Farshed Abdukhakimov Chulu Xiang Dmitry Kamzolov Robert Mansel Gower Martin Takáč 32 2 0 28 Dec 2023
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach Suhaima Jamal H. Wimmer 16 18 0 01 Nov 2023
Transformer-based classification of user queries for medical consultancy with respect to expert specialization Dmitry Lyutkin A. Soloviev Dmitry V. Zhukov Denis Pozdnyakov Muhammad Shahid Iqbal Malik D. Ignatov MedIm 20 0 0 26 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale Hao-Jun Michael Shi Tsung-Hsien Lee Shintaro Iwasaki Jose Gallego-Posada Zhijing Li Kaushik Rangadurai Dheevatsa Mudigere Michael Rabbat ODL 11 20 0 12 Sep 2023
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions Reza Fayyazi S. Yang 11 14 0 24 Jun 2023
Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain Vanessa Liao Syed Shariyar Murtaza Yifan Nie Jimmy J. Lin 19 0 0 23 May 2023
MoMo: Momentum Models for Adaptive Learning Rates Fabian Schaipp Ruben Ohana Michael Eickenberg Aaron Defazio Robert Mansel Gower 22 10 0 12 May 2023
A Stochastic Proximal Polyak Step Size Fabian Schaipp Robert Mansel Gower M. Ulbrich 8 12 0 12 Jan 2023
Robustness to Unbounded Smoothness of Generalized SignSGD M. Crawshaw Mingrui Liu Francesco Orabona Wei Zhang Zhenxun Zhuang AAML 20 62 0 23 Aug 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models Xingyu Xie Pan Zhou Huan Li Zhouchen Lin Shuicheng Yan ODL 33 147 0 13 Aug 2022
High-Performance Large-Scale Image Recognition Without Normalization Andrew Brock Soham De Samuel L. Smith Karen Simonyan VLM 223 512 0 11 Feb 2021
A High Probability Analysis of Adaptive SGD with Momentum Xiaoyun Li Francesco Orabona 81 64 0 28 Jul 2020