Decoupled Weight Decay Regularization

14 November 2017
I. Loshchilov
Frank Hutter
    OffRL

Papers citing "Decoupled Weight Decay Regularization"

50 / 1,216 papers shown
Open-domain Implicit Format Control for Large Language Model Generation
Yiqun Yao
Wenjia Ma
Xuezhi Fang
Xin Jiang
Xiang Li
Xuying Meng
Peng Han
Jing Li
Aixin Sun
Yequan Wang
284
2
0
08 Aug 2024
Lightweight Video Denoising Using a Classic Bayesian Backbone
IEEE International Conference on Multimedia and Expo (ICME), 2024
Clement Bled
François Pitié
210
4
0
07 Aug 2024
Adaptive Friction in Deep Learning: Enhancing Optimizers with Sigmoid and Tanh Function
Hongye Zheng
Bingxing Wang
Minheng Xiao
Honglin Qin
Zhizhong Wu
Lianghao Tan
ODL
173
27
0
07 Aug 2024
GLDiTalker: Speech-Driven 3D Facial Animation with Graph Latent Diffusion Transformer
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Yihong Lin
Zhaoxin Fan
Lingyu Xiong
Liang Peng
Xiandong Li
Wenxiong Kang
Huang Xu
578
7
0
03 Aug 2024
POA: Pre-training Once for Models of All Sizes
European Conference on Computer Vision (ECCV), 2024
Yingying Zhang
Xin Guo
Jiangwei Lao
Lei Yu
Lixiang Ru
Jian Wang
Guo Ye
Huimei He
Jingdong Chen
Ming Yang
431
2
0
02 Aug 2024
Towards Flexible Evaluation for Generative Visual Question Answering
ACM Multimedia (MM), 2024
Huishan Ji
Q. Si
Zheng Lin
Weiping Wang
229
2
0
01 Aug 2024
Meltemi: The first open Large Language Model for Greek
Leon Voukoutis
Dimitris Roussis
Georgios Paraskevopoulos
Sokratis Sofianopoulos
Prokopis Prokopidis
Vassilis Papavasileiou
Athanasios Katsamanis
Stelios Piperidis
Vassilis Katsouros
VLM
177
17
0
30 Jul 2024
LLAVADI: What Matters For Multimodal Large Language Models Distillation
Shilin Xu
Xiangtai Li
Haobo Yuan
Lu Qi
Yunhai Tong
Ming-Hsuan Yang
216
15
0
28 Jul 2024
Estimating Earthquake Magnitude in Sentinel-1 Imagery via Ranking
Daniele Rege Cambrin
Isaac Corley
Paolo Garza
Peyman Najafirad
243
1
0
25 Jul 2024
Unsqueeze [CLS] Bottleneck to Learn Rich Representations
Qing Su
Shihao Ji
295
0
0
24 Jul 2024
Hopfield Networks for Asset Allocation
Carlo Nicolini
Monisha Gopalan
Jacopo Staiano
Bruno Lepri
174
1
0
24 Jul 2024
LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera
Yukai Ma
Jianbiao Mei
Xuemeng Yang
Licheng Wen
Weihua Xu
Jiangning Zhang
Ding Wang
Yong-Jin Liu
Xingxing Zuo
255
12
0
23 Jul 2024
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
Yiyang Jiang
Wengyu Zhang
Xu-Lu Zhang
Xiaoyong Wei
Chang Wen Chen
Qing Li
345
24
0
21 Jul 2024
Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement
Yulin He
Wei Chen
Tianci Xun
Yusong Tan
3DPC
286
1
0
18 Jul 2024
GroupMamba: Efficient Group-Based Visual State Space Model
Abdelrahman M. Shaker
Syed Talal Wasim
Salman Khan
Juergen Gall
Fahad Shahbaz Khan
Mamba
213
4
0
18 Jul 2024
Missing Modality Prediction for Unpaired Multimodal Learning via Joint Embedding of Unimodal Models
Donggeun Kim
Taesup Kim
265
12
0
17 Jul 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
357
6
0
17 Jul 2024
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Runyi Yu
Yu Xie
Xiangyang Ji
Li-ming Yuan
Jie Chen
DiffM
196
13
0
15 Jul 2024
Restoring Images in Adverse Weather Conditions via Histogram Transformer
Shangquan Sun
Wenqi Ren
Xinwei Gao
Rui Wang
Xiaochun Cao
240
94
0
14 Jul 2024
MapLocNet: Coarse-to-Fine Feature Registration for Visual Re-Localization in Navigation Maps
Hang Wu
Zhenghao Zhang
Siyuan Lin
Xiangru Mu
Qiang Zhao
Ming Yang
Tong Qin
241
18
0
11 Jul 2024
RoboMorph: Evolving Robot Morphology using Large Language Models
Kevin Qiu
Krzysztof Ciebiera
Marek Cygan
Łukasz Kuciński
LM&Ro
335
7
0
11 Jul 2024
Fusion of Short-term and Long-term Attention for Video Mirror Detection
Mingchen Xu
Jing Wu
Yukun Lai
Ze Ji
164
1
0
10 Jul 2024
Vulnerability-Hunter: An Adaptive Feature Perception Attention Network for Smart Contract Vulnerabilities
Yizhou Chen
147
2
0
07 Jul 2024
ESQA: Event Sequences Question Answering
Irina Abdullaeva
Andrei Filatov
Mikhail Orlov
Ivan Karpukhin
Viacheslav Vasilev
Denis Dimitrov
Andrey Kuznetsov
Ivan A Kireev
226
1
0
03 Jul 2024
Learning from Memory: Non-Parametric Memory Augmented Self-Supervised Learning of Visual Features
T. Silva
Hélio Pedrini
Adín Ramírez Rivera
SSL
178
6
0
03 Jul 2024
Predicting Visual Attention in Graphic Design Documents
Souradeep Chakraborty
Zijun Wei
Conor Kelton
Seoyoung Ahn
A. Balasubramanian
G. Zelinsky
Dimitris Samaras
167
16
0
02 Jul 2024
Multi-Modal Video Dialog State Tracking in the Wild
Adnen Abdessaied
Lei Shi
Andreas Bulling
362
4
0
02 Jul 2024
Enhancing Travel Decision-Making: A Contrastive Learning Approach for Personalized Review Rankings in Accommodations
Reda Igebaria
Eran Fainman
Sarai Mizrachi
Moran Beladev
Fengjun Wang
158
3
0
30 Jun 2024
Brevity is the soul of wit: Pruning long files for code generation
Aaditya K. Singh
Yu Yang
Kushal Tirumala
Mostafa Elhoushi
Ari S. Morcos
SyDa
197
5
0
29 Jun 2024
Into the Unknown: Generating Geospatial Descriptions for New Environments
Tzuf Paz-Argaman
John Palowitch
Sayali Kulkarni
Reut Tsarfaty
Jason Baldridge
282
1
0
28 Jun 2024
SignSpeak: Open-Source Time Series Classification for ASL Translation
Aditya Makkar
Divya Makkar
Aarav Patel
Liam Hebert
SLR
143
0
0
27 Jun 2024
Molecular Diffusion Models with Virtual Receptors
Matan Halfon
Eyal Rozenberg
Ehud Rivlin
Daniel Freedman
221
0
0
26 Jun 2024
Continuous Urban Change Detection from Satellite Image Time Series with Temporal Feature Refinement and Multi-Task Integration
Sebastian Hafner
Heng Fang
Hossein Azizpour
Y. Ban
362
9
0
25 Jun 2024
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
Chao Lou
Zixia Jia
Zilong Zheng
Kewei Tu
ODL
233
50
0
24 Jun 2024
KEHRL: Learning Knowledge-Enhanced Language Representations with Hierarchical Reinforcement Learning
Dongyang Li
Taolin Zhang
Longtao Huang
Chengyu Wang
Xiaofeng He
Hui Xue
KELM
OffRL
199
0
0
24 Jun 2024
Confidence Regulation Neurons in Language Models
Alessandro Stolfo
Ben Wu
Wes Gurnee
Yonatan Belinkov
Xingyi Song
Mrinmaya Sachan
Neel Nanda
242
39
0
24 Jun 2024
Linearly-Interpretable Concept Embedding Models for Text Analysis
Francesco De Santis
Philippe Bich
Gabriele Ciravegna
Pietro Barbiero
Danilo Giordano
Tania Cerquitelli
311
1
0
20 Jun 2024
CityNav: A Large-Scale Dataset for Real-World Aerial Navigation
Jungdae Lee
Taiki Miyanishi
Shuhei Kurita
Koya Sakamoto
Daichi Azuma
Yutaka Matsuo
Nakamasa Inoue
325
23
0
20 Jun 2024
Active Diffusion Subsampling
Oisin Nolan
Tristan S. W. Stevens
Wessel L. van Nierop
Ruud J. G. van Sloun
DiffM
MedIm
250
6
0
20 Jun 2024
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts
Guanjie Chen
Xinyu Zhao
Tianlong Chen
Yu Cheng
MoE
270
6
0
17 Jun 2024
P-TA: Using Proximal Policy Optimization to Enhance Tabular Data Augmentation via Large Language Models
Shuo Yang
Chenchen Yuan
Yao Rong
Felix Steinbauer
Gjergji Kasneci
233
1
0
17 Jun 2024
Taking a Deep Breath: Enhancing Language Modeling of Large Language Models with Sentinel Tokens
Weiyao Luo
Suncong Zheng
Heming Xia
Weikang Wang
Yan Lei
Tianyu Liu
Shuang Chen
Zhifang Sui
150
2
0
16 Jun 2024
Diffusion Synthesizer for Efficient Multilingual Speech to Speech Translation
Interspeech, 2024
Nameer Hirschkind
Xiao Yu
Joseph Liu
Eloi DuBois
...
Colin Sinclair
Kyle Spence
Charles Shang
Zoë Abrams
Morgan McGuire
149
1
0
14 Jun 2024
SimGen: Simulator-conditioned Driving Scene Generation
Yunsong Zhou
Michael Simon
Zhenghao Peng
Sicheng Mo
Hongzi Zhu
Minyi Guo
Bolei Zhou
VGen
302
23
0
13 Jun 2024
3M: Multi-modal Multi-task Multi-teacher Learning for Game Event Detection
Thye Shan Ng
Feiqi Cao
S. Han
111
0
0
13 Jun 2024
Deep Transformer Network for Monocular Pose Estimation of Shipborne Unmanned Aerial Vehicle
Maneesha Wickramasuriya
Taeyoung Lee
Murray Snyder
MDE
ViT
123
1
0
13 Jun 2024
SynthForge: Synthesizing High-Quality Face Dataset with Controllable 3D Generative Models
Abhay Rawat
Shubham Dokania
Astitva Srivastava
Shuaib Ahmed
Haiwen Feng
Rahul Tallamraju
220
1
0
12 Jun 2024
Large Language Models Must Be Taught to Know What They Don't Know
Sanyam Kapoor
Nate Gruver
M. Roberts
Katherine M. Collins
Arka Pal
Umang Bhatt
Adrian Weller
Samuel Dooley
Micah Goldblum
Andrew Gordon Wilson
450
52
0
12 Jun 2024
Information Geometry of Evolution of Neural Network Parameters While Training
A. Thiruthummal
Eun-Jin Kim
Sergiy Shelyag
AAML
124
1
0
07 Jun 2024
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Nadav Borenstein
Anej Svete
R. Chan
Josef Valvoda
Franz Nowak
Isabelle Augenstein
Eleanor Chodroff
Robert Bamler
802
19
0
06 Jun 2024