2202.00089
Understanding AdamW through Proximal Methods and Scale-Freeness
31 January 2022
Zhenxun Zhuang
Mingrui Liu
Ashok Cutkosky
Francesco Orabona
Papers citing
"Understanding AdamW through Proximal Methods and Scale-Freeness"
26 / 26 papers shown
Title
BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters
Baz Roland
Kristina Malyseva
Anna Pappa
Tristan Cazenave
59
0
0
29 Apr 2025
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization
Dmitry Kovalev
52
0
0
16 Mar 2025
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm
Nanyu Luo
Feng Ji
DRL
31
0
0
15 Feb 2025
Harnessing Loss Decomposition for Long-Horizon Wave Predictions via Deep Neural Networks
Indu Kant Deo
R. Jaiman
57
1
0
04 Dec 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection
Pengfei Qi
Yifei Zhang
Wenqiang Li
Youwen Hu
Kunlong Bai
ObjD
25
0
0
10 Sep 2024
FoldGPT: Simple and Effective Large Language Model Compression Scheme
Songwei Liu
Chao Zeng
Lianqiang Li
Chenqian Yan
Lean Fu
Xing Mei
Fangmin Chen
40
4
0
01 Jul 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Rui Pan
Tong Zhang
19
4
0
21 Jun 2024
Prototypical Reward Network for Data-Efficient RLHF
Jinghan Zhang
Xiting Wang
Yiqiao Jin
Changyu Chen
Xinhao Zhang
Kunpeng Liu
ALM
31
18
0
06 Jun 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis
Mohammad Amaz Uddin
Muhammad Nazrul Islam
Leandros A. Maglaras
Helge Janicke
Iqbal H. Sarker
30
2
0
12 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
22
12
0
05 Apr 2024
EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network
Bin Wang
Fei Deng
Peifan Jiang
25
6
0
20 Mar 2024
TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li
Hao Zhang
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Lei Zhang
29
19
0
19 Mar 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach
Mohammad Amaz Uddin
Iqbal H. Sarker
22
14
0
21 Feb 2024
GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting
Eloy Reulen
S. Mehrkanoon
26
3
0
18 Jan 2024
SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms
Farshed Abdukhakimov
Chulu Xiang
Dmitry Kamzolov
Robert Mansel Gower
Martin Takáč
32
2
0
28 Dec 2023
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach
Suhaima Jamal
H. Wimmer
14
18
0
01 Nov 2023
Transformer-based classification of user queries for medical consultancy with respect to expert specialization
Dmitry Lyutkin
A. Soloviev
Dmitry V. Zhukov
Denis Pozdnyakov
Muhammad Shahid Iqbal Malik
D. Ignatov
MedIm
18
0
0
26 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
9
20
0
12 Sep 2023
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions
Reza Fayyazi
S. Yang
11
14
0
24 Jun 2023
Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain
Vanessa Liao
Syed Shariyar Murtaza
Yifan Nie
Jimmy J. Lin
19
0
0
23 May 2023
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert Mansel Gower
22
10
0
12 May 2023
A Stochastic Proximal Polyak Step Size
Fabian Schaipp
Robert Mansel Gower
M. Ulbrich
6
12
0
12 Jan 2023
Robustness to Unbounded Smoothness of Generalized SignSGD
M. Crawshaw
Mingrui Liu
Francesco Orabona
Wei Zhang
Zhenxun Zhuang
AAML
20
62
0
23 Aug 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
ODL
33
147
0
13 Aug 2022
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
220
512
0
11 Feb 2021
A High Probability Analysis of Adaptive SGD with Momentum
Xiaoyun Li
Francesco Orabona
81
64
0
28 Jul 2020