ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Understanding AdamW through Proximal Methods and Scale-Freeness


31 January 2022
Zhenxun Zhuang
Mingrui Liu
Ashok Cutkosky
Francesco Orabona

Papers citing "Understanding AdamW through Proximal Methods and Scale-Freeness"

26 / 26 papers shown
BrAIcht, a theatrical agent that speaks like Bertolt Brecht's characters
Baz Roland
Kristina Malyseva
Anna Pappa
Tristan Cazenave
59
0
0
29 Apr 2025
Understanding Gradient Orthogonalization for Deep Learning via Non-Euclidean Trust-Region Optimization
Dmitry Kovalev
52
0
0
16 Mar 2025
Generative Adversarial Networks for High-Dimensional Item Factor Analysis: A Deep Adversarial Learning Algorithm
Nanyu Luo
Feng Ji
DRL
33
0
0
15 Feb 2025
Harnessing Loss Decomposition for Long-Horizon Wave Predictions via Deep Neural Networks
Indu Kant Deo
R. Jaiman
57
1
0
04 Dec 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection
Pengfei Qi
Yifei Zhang
Wenqiang Li
Youwen Hu
Kunlong Bai
ObjD
30
0
0
10 Sep 2024
FoldGPT: Simple and Effective Large Language Model Compression Scheme
Songwei Liu
Chao Zeng
Lianqiang Li
Chenqian Yan
Lean Fu
Xing Mei
Fangmin Chen
40
4
0
01 Jul 2024
Large Batch Analysis for Adagrad Under Anisotropic Smoothness
Yuxing Liu
Rui Pan
Tong Zhang
19
4
0
21 Jun 2024
Prototypical Reward Network for Data-Efficient RLHF
Jinghan Zhang
Xiting Wang
Yiqiao Jin
Changyu Chen
Xinhao Zhang
Kunpeng Liu
ALM
41
18
0
06 Jun 2024
ExplainableDetector: Exploring Transformer-based Language Modeling Approach for SMS Spam Detection with Explainability Analysis
Mohammad Amaz Uddin
Muhammad Nazrul Islam
Leandros A. Maglaras
Helge Janicke
Iqbal H. Sarker
32
2
0
12 May 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
27
12
0
05 Apr 2024
EEGDiR: Electroencephalogram denoising network for temporal information storage and global modeling through Retentive Network
Bin Wang
Fei Deng
Peifan Jiang
25
6
0
20 Mar 2024
TAPTR: Tracking Any Point with Transformers as Detection
Hongyang Li
Hao Zhang
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Lei Zhang
32
19
0
19 Mar 2024
An Explainable Transformer-based Model for Phishing Email Detection: A Large Language Model Approach
Mohammad Amaz Uddin
Iqbal H. Sarker
24
14
0
21 Feb 2024
GA-SmaAt-GNet: Generative Adversarial Small Attention GNet for Extreme Precipitation Nowcasting
Eloy Reulen
S. Mehrkanoon
26
3
0
18 Jan 2024
SANIA: Polyak-type Optimization Framework Leads to Scale Invariant Stochastic Algorithms
Farshed Abdukhakimov
Chulu Xiang
Dmitry Kamzolov
Robert Mansel Gower
Martin Takáč
32
2
0
28 Dec 2023
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach
Suhaima Jamal
H. Wimmer
16
18
0
01 Nov 2023
Transformer-based classification of user queries for medical consultancy with respect to expert specialization
Dmitry Lyutkin
A. Soloviev
Dmitry V. Zhukov
Denis Pozdnyakov
Muhammad Shahid Iqbal Malik
D. Ignatov
MedIm
20
0
0
26 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
11
20
0
12 Sep 2023
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions
Reza Fayyazi
S. Yang
11
14
0
24 Jun 2023
Regex-augmented Domain Transfer Topic Classification based on a Pre-trained Language Model: An application in Financial Domain
Vanessa Liao
Syed Shariyar Murtaza
Yifan Nie
Jimmy J. Lin
19
0
0
23 May 2023
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert Mansel Gower
22
10
0
12 May 2023
A Stochastic Proximal Polyak Step Size
Fabian Schaipp
Robert Mansel Gower
M. Ulbrich
8
12
0
12 Jan 2023
Robustness to Unbounded Smoothness of Generalized SignSGD
M. Crawshaw
Mingrui Liu
Francesco Orabona
Wei Zhang
Zhenxun Zhuang
AAML
20
62
0
23 Aug 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
ODL
33
147
0
13 Aug 2022
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
A High Probability Analysis of Adaptive SGD with Momentum
Xiaoyun Li
Francesco Orabona
81
64
0
28 Jul 2020