Adafactor: Adaptive Learning Rates with Sublinear Memory Cost


11 April 2018
Noam M. Shazeer
Mitchell Stern
    ODL

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
239
41
0
29 May 2024
4-bit Shampoo for Memory-Efficient Network Training
Sike Wang
Jia Li
Pan Zhou
Hua Huang
MQ
473
12
0
28 May 2024
VeLoRA: Memory Efficient Training using Rank-1 Sub-Token Projections
Roy Miles
Pradyumna Reddy
Ismail Elezi
Jiankang Deng
VLM
264
12
0
28 May 2024
LoQT: Low Rank Adapters for Quantized Training
Sebastian Loeschcke
M. Toftrup
M. Kastoryano
Serge Belongie
Vésteinn Snæbjarnarson
MQ
237
0
0
26 May 2024
AdaFisher: Adaptive Second Order Optimization via Fisher Information
Damien Martins Gomes
Yanlei Zhang
Eugene Belilovsky
Guy Wolf
Mahdi S. Hosseini
ODL
630
5
0
26 May 2024
Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection
Yun Zhu
Jia-Chen Gu
Caitlin Sikora
Ho Ko
Yinxiao Liu
...
Lei Shu
Liangchen Luo
Lei Meng
Bang Liu
Jindong Chen
RALM
251
26
0
25 May 2024
Sparse maximal update parameterization: A holistic approach to sparse training dynamics
Nolan Dey
Shane Bergsma
Joel Hestness
257
8
0
24 May 2024
MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence
Ionut-Vlad Modoranu
M. Safaryan
Grigory Malinovsky
Eldar Kurtic
Thomas Robert
Peter Richtárik
Dan Alistarh
MQ
204
21
0
24 May 2024
Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling
Neural Information Processing Systems (NeurIPS), 2024
Shuaipeng Li
Penghao Zhao
Hailin Zhang
Xingwu Sun
Hao Wu
...
Zheng Fang
Jinbao Xue
Yangyu Tao
Tengjiao Wang
Di Wang
293
25
0
23 May 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiao-Qi Zhai
Ibrahim Alabdulmohsin
VLM
286
13
0
22 May 2024
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
Dongseong Hwang
ODL
654
15
0
21 May 2024
Prompting-based Synthetic Data Generation for Few-Shot Question Answering
International Conference on Language Resources and Evaluation (LREC), 2024
Maximilian Schmidt
Andrea Bartezzaghi
Ngoc Thang Vu
SyDa
225
10
0
15 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
326
0
0
13 May 2024
Stochastic RAG: End-to-End Retrieval-Augmented Generation through Expected Utility Maximization
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2024
Hamed Zamani
Michael Bendersky
355
40
0
05 May 2024
Investigating Wit, Creativity, and Detectability of Large Language Models in Domain-Specific Writing Style Adaptation of Reddit's Showerthoughts
Tolga Buz
Benjamin Frost
Nikola Genchev
Moritz Schneider
Lucie-Aimée Kaffee
Gerard de Melo
DeLMO
268
9
0
02 May 2024
RST-LoRA: A Discourse-Aware Low-Rank Adaptation for Long Document Abstractive Summarization
Dongqi Pu
Vera Demberg
364
11
0
01 May 2024
Empowering Large Language Models for Textual Data Augmentation
Yichuan Li
Kaize Ding
Jianling Wang
Kyumin Lee
269
19
0
26 Apr 2024
Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment
Zhaofeng Wu
Ananth Balashankar
Yoon Kim
Jacob Eisenstein
Ahmad Beirami
250
25
0
18 Apr 2024
Deferred NAM: Low-latency Top-K Context Injection via Deferred Context Encoding for Non-Streaming ASR
Zelin Wu
Gan Song
Christopher Li
Pat Rondon
Zhong Meng
...
D. Caseiro
Golan Pundak
Tsendsuren Munkhdalai
Angad Chandorkar
Rohit Prabhavalkar
316
5
0
15 Apr 2024
Impact of Preference Noise on the Alignment Performance of Generative Language Models
Yang Gao
Dana Alon
Donald Metzler
352
35
0
15 Apr 2024
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
422
17
0
14 Apr 2024
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Zichao Li
Cihang Xie
E. D. Cubuk
CLIP
220
11
0
12 Apr 2024
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention
Tsendsuren Munkhdalai
Manaal Faruqui
Siddharth Gopal
LRM LLMAG CLL
309
170
0
10 Apr 2024
Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution
Annual Conference on Genetic and Evolutionary Computation (GECCO), 2024
Brandon Morgan
Dean Frederick Hougen
ODL
261
0
0
10 Apr 2024
Privacy Preserving Prompt Engineering: A Survey
Kennedy Edemacu
Xintao Wu
386
40
0
09 Apr 2024
Best-of-Venom: Attacking RLHF by Injecting Poisoned Preference Data
Tim Baumgärtner
Yang Gao
Dana Alon
Donald Metzler
AAML
240
33
0
08 Apr 2024
Implicit Bias of AdamW: $\ell_\infty$ Norm Constrained Optimization
Shuo Xie
Zhiyuan Li
OffRL
268
38
0
05 Apr 2024
Training LLMs over Neurally Compressed Text
Brian Lester
Jaehoon Lee
A. Alemi
Jeffrey Pennington
Adam Roberts
Jascha Narain Sohl-Dickstein
Noah Constant
215
11
0
04 Apr 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang
Yi Luan
Hexiang Hu
Kenton Lee
Siyuan Qiao
Wenhu Chen
Yu-Chuan Su
Ming-Wei Chang
VLM LRM
307
78
0
28 Mar 2024
Juru: Legal Brazilian Large Language Model from Reputable Sources
Roseval Malaquias Junior
Ramon Pires
R. Romero
R. Nogueira
AILaw ELM
245
4
0
26 Mar 2024
A Hybrid Approach To Aspect Based Sentiment Analysis Using Transfer Learning
Gaurav Negi
Rajdeep Sarkar
Omnia Zayed
P. Buitelaar
142
11
0
25 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Neural Information Processing Systems (NeurIPS), 2024
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV LRM
416
79
0
23 Mar 2024
Adapprox: Adaptive Approximation in Adam Optimization via Randomized Low-Rank Matrices
Pengxiang Zhao
Ping Li
Yingjie Gu
Yi Zheng
Stephan Ludger Kölker
Zhefeng Wang
Xiaoming Yuan
186
8
0
22 Mar 2024
Partitioned Neural Network Training via Synthetic Intermediate Labels
C. V. Karadag
Nezih Topaloglu
251
2
0
17 Mar 2024
PERL: Parameter Efficient Reinforcement Learning from Human Feedback
Hakim Sidahmed
Samrat Phatale
Alex Hutcheson
Zhuonan Lin
Zhan Chen
...
Jessica Hoffmann
Hassan Mansoor
Wei Li
Abhinav Rastogi
Lucas Dixon
251
4
0
15 Mar 2024
Frozen Feature Augmentation for Few-Shot Image Classification
Andreas Bär
N. Houlsby
Mostafa Dehghani
Manoj Kumar
VLM
285
16
0
15 Mar 2024
Human Alignment of Large Language Models through Online Preference Optimisation
International Conference on Machine Learning (ICML), 2024
Daniele Calandriello
Daniel Guo
Rémi Munos
Mark Rowland
Yunhao Tang
...
Michal Valko
Tianqi Liu
Rishabh Joshi
Zeyu Zheng
Bilal Piot
277
86
0
13 Mar 2024
Low-Resource Court Judgment Summarization for Common Law Systems
Shuaiqi Liu
Jiannong Cao
Yicong Li
Ruosong Yang
Zhiyuan Wen
ELM AILaw
169
17
0
07 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu Zhang
Beidi Chen
Zinan Lin
A. Anandkumar
Yuandong Tian
418
339
0
06 Mar 2024
FENICE: Factuality Evaluation of summarization based on Natural language Inference and Claim Extraction
Alessandro Sciré
Karim Ghonim
Roberto Navigli
HILM
253
22
0
04 Mar 2024
Learning to Deliver: a Foundation Model for the Montreal Capacitated Vehicle Routing Problem
Samuel J. K. Chin
Matthias Winkenbach
Akash Srivastava
190
0
0
28 Feb 2024
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Biao Zhang
Zhongtao Liu
Colin Cherry
Orhan Firat
LRM
289
233
0
27 Feb 2024
Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models
Rohit Prabhavalkar
Zhong Meng
Weiran Wang
Adam Stooke
Xingyu Cai
Yanzhang He
Arun Narayanan
Dongseong Hwang
Tara N. Sainath
Pedro J. Moreno
214
11
0
27 Feb 2024
Second-Order Fine-Tuning without Pain for LLMs: A Hessian Informed Zeroth-Order Optimizer
Yanjun Zhao
Sizhe Dang
Haishan Ye
Guang Dai
Yi Qian
Ivor W. Tsang
690
29
0
23 Feb 2024
Can Language Models Act as Knowledge Bases at Scale?
Qiyuan He
Yizhong Wang
Wenya Wang
KELM LRM
257
18
0
22 Feb 2024
FLAME: Self-Supervised Low-Resource Taxonomy Expansion using Large Language Models
Sahil Mishra
Ujjwal Sudev
Tanmoy Chakraborty
131
4
0
21 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
392
68
0
20 Feb 2024
Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model
Ahmet Üstün
Viraat Aryabumi
Zheng-Xin Yong
Wei-Yin Ko
Daniel D'souza
...
Shayne Longpre
Niklas Muennighoff
Marzieh Fadaee
Julia Kreutzer
Sara Hooker
ALM ELM SyDa LRM
246
328
0
12 Feb 2024
Low-Resource Counterspeech Generation for Indic Languages: The Case of Bengali and Hindi
Findings (Findings), 2024
Mithun Das
Saurabh Kumar Pandey
Shivansh Sethi
Punyajoy Saha
Animesh Mukherjee
157
4
0
11 Feb 2024
Efficient Stagewise Pretraining via Progressive Subnetworks
Abhishek Panigrahi
Nikunj Saunshi
Kaifeng Lyu
Sobhan Miryoosefi
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
184
8
0
08 Feb 2024