ResearchTrend.AI
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018
Noam M. Shazeer
Mitchell Stern
    ODL
arXiv:1804.04235

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown
Efficient Stagewise Pretraining via Progressive Subnetworks
Abhishek Panigrahi
Nikunj Saunshi
Kaifeng Lyu
Sobhan Miryoosefi
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
184
8
0
08 Feb 2024
Generalized Preference Optimization: A Unified Approach to Offline Alignment
Yunhao Tang
Z. Guo
Zeyu Zheng
Daniele Calandriello
Rémi Munos
Mark Rowland
Pierre Harvey Richemond
Michal Valko
Bernardo Avila-Pires
Bilal Piot
268
143
0
08 Feb 2024
InkSight: Offline-to-Online Handwriting Conversion by Teaching Vision-Language Models to Read and Write
B. Mitrevski
Arina Rak
Julian Schnitzler
Chengkun Li
Andrii Maksai
Jesse Berent
C. Musat
DiffM
334
0
0
08 Feb 2024
Direct Language Model Alignment from Online AI Feedback
Shangmin Guo
Biao Zhang
Tianlin Liu
Tianqi Liu
Misha Khalman
...
Thomas Mesnard
Yao-Min Zhao
Bilal Piot
Johan Ferret
Mathieu Blondel
ALM
259
211
0
07 Feb 2024
Flora: Low-Rank Adapters Are Secretly Gradient Compressors
International Conference on Machine Learning (ICML), 2024
Yongchang Hao
Yanshuai Cao
Lili Mou
294
86
0
05 Feb 2024
Fractal Patterns May Illuminate the Success of Next-Token Prediction
Ibrahim Alabdulmohsin
Vinh Q. Tran
Mostafa Dehghani
172
4
0
02 Feb 2024
SPECTRUM: Speaker-Enhanced Pre-Training for Long Dialogue Summarization
Sangwoo Cho
Kaiqiang Song
Chao Zhao
Xiaoyang Wang
Dong Yu
218
1
0
31 Jan 2024
TeenyTinyLlama: open-source tiny language models trained in Brazilian Portuguese
N. Corrêa
Sophia Falk
Shiza Fatimah
Aniket Sen
N. D. Oliveira
268
22
0
30 Jan 2024
Unlearning Traces the Influential Training Data of Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Masaru Isonuma
Ivan Titov
MU
380
14
0
26 Jan 2024
HiFT: A Hierarchical Full Parameter Fine-Tuning Strategy
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Yongkang Liu
Yiqun Zhang
Qian Li
Tong Liu
Shi Feng
Daling Wang
Yifei Zhang
Hinrich Schütze
303
14
0
26 Jan 2024
LongFin: A Multimodal Document Understanding Model for Long Financial Domain Documents
Ahmed Masry
Amir Hajian
145
5
0
26 Jan 2024
TURNA: A Turkish Encoder-Decoder Language Model for Enhanced Understanding and Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Gökçe Uludoğan
Zeynep Yirmibeşoğlu Balal
Furkan Akkurt
Melikşah Türker
Onur Güngör
S. Üsküdarlı
214
20
0
25 Jan 2024
SpacTor-T5: Pre-training T5 Models with Span Corruption and Replaced Token Detection
Ke Ye
Heinrich Jiang
Afshin Rostamizadeh
Ayan Chakrabarti
Giulia DeSalvo
Jean-François Kagy
Lazaros Karydas
Gui Citovsky
Sanjiv Kumar
198
0
0
24 Jan 2024
Lumiere: A Space-Time Diffusion Model for Video Generation
ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia (SIGGRAPH Asia), 2024
Omer Bar-Tal
Hila Chefer
Omer Tov
Charles Herrmann
Roni Paiss
...
T. Michaeli
Oliver Wang
Deqing Sun
Tali Dekel
Inbar Mosseri
VGen
403
383
0
23 Jan 2024
WARM: On the Benefits of Weight Averaged Reward Models
International Conference on Machine Learning (ICML), 2024
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
356
130
0
22 Jan 2024
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution
Xin Yuan
Jinoo Baek
Keyang Xu
Omer Tov
Hongliang Fei
VGen
161
6
0
18 Jan 2024
Large Language Models for Scientific Information Extraction: An Empirical Study for Virology
Mahsa Shamsabadi
Jennifer D'Souza
Sören Auer
309
13
0
18 Jan 2024
On the importance of Data Scale in Pretraining Arabic Language Models
Abbas Ghaddar
Philippe Langlais
Mehdi Rezagholizadeh
Boxing Chen
139
0
0
15 Jan 2024
Scaling Laws for Forgetting When Fine-Tuning Large Language Models
Damjan Kalajdzievski
CLL
253
21
0
11 Jan 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Computer Vision and Pattern Recognition (CVPR), 2024
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
...
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
248
74
0
03 Jan 2024
To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation
Transactions of the Association for Computational Linguistics (TACL), 2024
Jiaming Luo
Colin Cherry
George F. Foster
185
12
0
02 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM MLLM
282
271
0
28 Dec 2023
Parameter Efficient Tuning Allows Scalable Personalization of LLMs for Text Entry: A Case Study on Abbreviation Expansion
Katrin Tomanek
Shanqing Cai
Subhashini Venugopalan
79
1
0
21 Dec 2023
Decoupling SQL Query Hardness Parsing for Text-to-SQL
J. Yi
Guo Chen
269
5
0
11 Dec 2023
Order Matters in the Presence of Dataset Imbalance for Multilingual Learning
Dami Choi
Derrick Xin
Hamid Dadkhahi
Justin Gilmer
Ankush Garg
Orhan Firat
Chih-Kuan Yeh
Andrew M. Dai
Behrooz Ghorbani
280
6
0
11 Dec 2023
Domain Adaptation of a State of the Art Text-to-SQL Model: Lessons Learned and Challenges Found
Irene Manotas
Octavian Popescu
Ngoc Phuoc An Vo
V. Sheinin
OOD
197
2
0
09 Dec 2023
Magicoder: Empowering Code Generation with OSS-Instruct
International Conference on Machine Learning (ICML), 2023
Yuxiang Wei
Zhe Wang
Jiawei Liu
Yifeng Ding
Lingming Zhang
SyDa
308
196
0
04 Dec 2023
A Machine Learning Approach Towards SKILL Code Autocompletion
Enrique Dehaerne
Bappaditya Dey
Wannes Meert
195
0
0
04 Dec 2023
Using Large Language Models to Accelerate Communication for Users with Severe Motor Impairments
Shanqing Cai
Subhashini Venugopalan
Katie Seaver
Xiang Xiao
Katrin Tomanek
...
Daniel E Vance
Blair Casey
Steve M. Gleason
Philip Q. Nelson
Michael P. Brenner
249
10
0
03 Dec 2023
RLHF and IIA: Perverse Incentives
Wanqiao Xu
Shi Dong
Xiuyuan Lu
Grace Lam
Zheng Wen
Benjamin Van Roy
237
4
0
02 Dec 2023
Meta-learning Optimizers for Communication-Efficient Learning
Charles-Étienne Joseph
Benjamin Thérien
A. Moudgil
Boris Knyazev
Eugene Belilovsky
388
2
0
02 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
397
33
0
01 Dec 2023
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA
Damjan Kalajdzievski
ALM
269
170
0
28 Nov 2023
Who is leading in AI? An analysis of industry AI research
Ben Cottier
T. Besiroglu
David Owen
317
9
0
24 Nov 2023
Locally Optimal Descent for Dynamic Stepsize Scheduling
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Gilad Yehudai
Alon Cohen
Amit Daniely
Yoel Drori
Tomer Koren
Mariano Schain
258
0
0
23 Nov 2023
Diffusion Model Alignment Using Direct Preference Optimization
Computer Vision and Pattern Recognition (CVPR), 2023
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
449
516
0
21 Nov 2023
Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition
Dongyuan Li
Yusong Wang
Kotaro Funakoshi
Manabu Okumura
188
39
0
18 Nov 2023
Countering Misinformation via Emotional Response Generation
Daniel Russo
Shane P. Kaszefski-Yaschuk
Jacopo Staiano
Marco Guerini
OffRL
231
15
0
17 Nov 2023
A Computationally Efficient Sparsified Online Newton Method
Fnu Devvrit
Sai Surya Duvvuri
Rohan Anil
Vineet Gupta
Cho-Jui Hsieh
Inderjit Dhillon
193
0
0
16 Nov 2023
Take One Step at a Time to Know Incremental Utility of Demonstration: An Analysis on Reranking for Few-Shot In-Context Learning
Kazuma Hashimoto
K. Raman
Michael Bendersky
371
2
0
16 Nov 2023
Efficient End-to-End Visual Document Understanding with Rationale Distillation
Peng Guo
Alekh Agarwal
Mandar Joshi
Robin Jia
Jesse Thomason
Kristina Toutanova
152
4
0
16 Nov 2023
GistScore: Learning Better Representations for In-Context Example Selection with Gist Bottlenecks
Shivanshu Gupta
Clemens Rosenbaum
Ethan R. Elenberg
LRM
231
9
0
16 Nov 2023
SiRA: Sparse Mixture of Low Rank Adaptation
Yun Zhu
Nevan Wichers
Chu-Cheng Lin
Xinyi Wang
Tianlong Chen
...
Han Lu
Canoee Liu
Liangchen Luo
Jindong Chen
Lei Meng
MoE
235
35
0
15 Nov 2023
Argumentation Element Annotation Modeling using XLNet
Christopher M. Ormerod
Amy Burkhardt
Mackenzie Young
Susan Lottridge
125
7
0
10 Nov 2023
SEMQA: Semi-Extractive Multi-Source Question Answering
Tal Schuster
Á. Lelkes
Haitian Sun
Jai Gupta
Jonathan Berant
W. Cohen
Donald Metzler
249
24
0
08 Nov 2023
Making Harmful Behaviors Unlearnable for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Xin Zhou
Yi Lu
Ruotian Ma
Tao Gui
Xuanjing Huang
MU
168
18
0
02 Nov 2023
Calibrated Seq2seq Models for Efficient and Generalizable Ultra-fine Entity Typing
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yanlin Feng
Adithya Pratapa
David R. Mortensen
283
8
0
01 Nov 2023
De-Diffusion Makes Text a Strong Cross-Modal Interface
Computer Vision and Pattern Recognition (CVPR), 2023
Chen Wei
Chenxi Liu
Siyuan Qiao
Zhishuai Zhang
Alan Yuille
Jiahui Yu
VLM DiffM
273
17
0
01 Nov 2023
HARE: Explainable Hate Speech Detection with Step-by-Step Reasoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yongjin Yang
Joonkee Kim
Yujin Kim
Namgyu Ho
James Thorne
Se-Young Yun
315
47
0
01 Nov 2023
Continuous Training and Fine-tuning for Domain-Specific Language Models in Medical Question Answering
Zhen Guo
Yining Hua
LM&MA CLL ALM AI4MH
175
5
0
01 Nov 2023