Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

11 April 2018
Noam M. Shazeer
Mitchell Stern
    ODL
arXiv: 1804.04235

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown
A Spectral Condition for Feature Learning
Greg Yang
James B. Simon
Jeremy Bernstein
337
62
0
26 Oct 2023
XFEVER: Exploring Fact Verification across Languages
Taiwan Conference on Computational Linguistics and Speech Processing (TCLSP), 2023
Yi-Chen Chang
Canasai Kruengkrai
Junichi Yamagishi
HILM
106
6
0
25 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Neural Information Processing Systems (NeurIPS), 2023
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
276
91
0
23 Oct 2023
Implicit meta-learning may lead language models to trust more reliable sources
International Conference on Machine Learning (ICML), 2023
Dmitrii Krasheninnikov
Egor Krasheninnikov
Bruno Mlodozeniec
Tegan Maharaj
David M. Krueger
516
7
0
23 Oct 2023
Once Upon a Time in Graph: Relative-Time Pretraining for Complex Temporal Reasoning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sen Yang
Xin Li
Li Bing
Wai Lam
AI4CE
197
15
0
23 Oct 2023
Benchmarking and Improving Text-to-SQL Generation under Ambiguity
Adithya Bhaskar
Tushar Tomar
Ashutosh Sathe
Sunita Sarawagi
316
38
0
20 Oct 2023
Auto-Instruct: Automatic Instruction Generation and Ranking for Black-Box Language Models
Zhihan Zhang
Shuohang Wang
Wenhao Yu
Yichong Xu
Dan Iter
Qingkai Zeng
Yang Liu
Chenguang Zhu
Meng Jiang
SyDa
ALM
161
28
0
19 Oct 2023
Non-Intrusive Adaptation: Input-Centric Parameter-efficient Fine-Tuning for Versatile Multimodal Modeling
Yaqing Wang
Jialin Wu
T. Dabral
Jiageng Zhang
Geoff Brown
...
Frederick Liu
Yi Liang
Bo Pang
Michael Bendersky
Radu Soricut
VLM
182
19
0
18 Oct 2023
Grounded and Well-rounded: A Methodological Approach to the Study of Cross-modal and Cross-lingual Grounding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Timothee Mickus
Elaine Zosa
Denis Paperno
167
0
0
18 Oct 2023
DemoSG: Demonstration-enhanced Schema-guided Generation for Low-resource Event Extraction
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Gang Zhao
Xiaocheng Gong
Xinjie Yang
Guanting Dong
Shudong Lu
Si Li
239
14
0
16 Oct 2023
AdaLomo: Low-memory Optimization with Adaptive Learning Rate
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kai Lv
Hang Yan
Qipeng Guo
Haijun Lv
Xipeng Qiu
ODL
312
29
0
16 Oct 2023
DPZero: Private Fine-Tuning of Language Models without Backpropagation
Liang Zhang
Bingcong Li
K. K. Thekumparampil
Sewoong Oh
Niao He
450
22
0
14 Oct 2023
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
International Conference on Learning Representations (ICLR), 2023
Yongchao Zhou
Kaifeng Lyu
A. S. Rawat
A. Menon
Afshin Rostamizadeh
Sanjiv Kumar
Jean-François Kagy
Rishabh Agarwal
266
123
0
12 Oct 2023
MatFormer: Nested Transformer for Elastic Inference
Neural Information Processing Systems (NeurIPS), 2023
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
256
61
0
11 Oct 2023
QFT: Quantized Full-parameter Tuning of LLMs with Affordable Resources
Zhikai Li
Xiaoxuan Liu
Banghua Zhu
Zhen Dong
Qingyi Gu
Kurt Keutzer
MQ
275
11
0
11 Oct 2023
Guiding Language Model Math Reasoning with Planning Tokens
Xinyi Wang
Lucas Caccia
O. Ostapenko
Xingdi Yuan
William Yang Wang
Alessandro Sordoni
LRM
285
55
0
09 Oct 2023
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sangmin Bae
Jongwoo Ko
Hwanjun Song
SeYoung Yun
273
78
0
09 Oct 2023
Parameterizing Context: Unleashing the Power of Parameter-Efficient Fine-Tuning and In-Context Tuning for Continual Table Semantic Parsing
Neural Information Processing Systems (NeurIPS), 2023
Yongrui Chen
Shenyu Zhang
Guilin Qi
Xinnan Guo
CLL
226
10
0
07 Oct 2023
Module-wise Adaptive Distillation for Multimodality Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Chen Liang
Jiahui Yu
Ming-Hsuan Yang
Matthew A. Brown
Huayu Chen
Tuo Zhao
Boqing Gong
Tianyi Zhou
190
13
0
06 Oct 2023
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
International Conference on Learning Representations (ICLR), 2023
Tianhong Li
Sangnie Bhardwaj
Yonglong Tian
Han Zhang
Jarred Barber
Dina Katabi
Guillaume Lajoie
Huiwen Chang
Dilip Krishnan
VLM
272
7
0
05 Oct 2023
Learning to Rewrite Prompts for Personalized Text Generation
The Web Conference (WWW), 2023
Cheng-rong Li
Mingyang Zhang
Qiaozhu Mei
Weize Kong
Michael Bendersky
LLMAG
313
44
0
29 Sep 2023
Transformer-VQ: Linear-Time Transformers via Vector Quantization
International Conference on Learning Representations (ICLR), 2023
Lucas D. Lingle
250
26
0
28 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities
International Conference on Learning Representations (ICLR), 2023
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
319
135
0
25 Sep 2023
Massive End-to-end Models for Short Search Queries
Weiran Wang
Rohit Prabhavalkar
Dongseong Hwang
Qiujia Li
K. Sim
...
Zhong Meng
CJ Zheng
Yanzhang He
Tara N. Sainath
P. M. Mengibar
174
2
0
22 Sep 2023
AMPLIFY: Attention-based Mixup for Performance Improvement and Label Smoothing in Transformer
PeerJ Computer Science (PeerJ Comput. Sci.), 2023
Leixin Yang
Yu Xiang
392
2
0
22 Sep 2023
A Family of Pretrained Transformer Language Models for Russian
International Conference on Language Resources and Evaluation (LREC), 2023
Dmitry Zmitrovich
Alexander Abramov
Andrey Kalmykov
Maria Tikhonova
Ekaterina Taktasheva
...
Vitalii Kadulin
Sergey Markov
Tatiana Shavrina
Vladislav Mikhailov
Alena Fenogenova
318
51
0
19 Sep 2023
Few-Shot Adaptation for Parsing Contextual Utterances with LLMs
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Kevin Lin
Patrick Xia
Hao Fang
195
3
0
18 Sep 2023
Scaling Laws for Sparsely-Connected Foundation Models
International Conference on Learning Representations (ICLR), 2023
Elias Frantar
C. Riquelme
N. Houlsby
Dan Alistarh
Utku Evci
296
46
0
15 Sep 2023
Reward Engineering for Generating Semi-structured Explanation
Findings (Findings), 2023
Jiuzhou Han
Wray Buntine
Ehsan Shareghi
LRM
155
0
0
15 Sep 2023
Self-Consistent Narrative Prompts on Abductive Natural Language Inference
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Chunkit Chan
Xin Liu
Tszho Chan
Cheng Jiayang
Yangqiu Song
Ginny Wong
Simon See
LRM
149
8
0
15 Sep 2023
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Guanlong Zhao
Yongqiang Wang
Jason W. Pelecanos
Yu Zhang
Hank Liao
Yiling Huang
Han Lu
Quan Wang
249
6
0
14 Sep 2023
Benchmarking Procedural Language Understanding for Low-Resource Languages: A Case Study on Turkish
International Joint Conference on Natural Language Processing (IJCNLP), 2023
Arda Uzunoğlu
Gözde Gül Şahin
220
7
0
13 Sep 2023
Statistical Rejection Sampling Improves Preference Optimization
International Conference on Learning Representations (ICLR), 2023
Tianqi Liu
Yao-Min Zhao
Rishabh Joshi
Misha Khalman
Mohammad Saleh
Peter J. Liu
Jialu Liu
319
318
0
13 Sep 2023
A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale
Hao-Jun Michael Shi
Tsung-Hsien Lee
Shintaro Iwasaki
Jose Gallego-Posada
Zhijing Li
Kaushik Rangadurai
Dheevatsa Mudigere
Michael Rabbat
ODL
244
44
0
12 Sep 2023
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning
International Conference on Learning Representations (ICLR), 2023
Ted Zadouri
Ahmet Üstün
Arash Ahmadian
Beyza Ermiş
Acyr Locatelli
Sara Hooker
MoE
253
138
0
11 Sep 2023
Epi-Curriculum: Episodic Curriculum Learning for Low-Resource Domain Adaptation in Neural Machine Translation
IEEE Transactions on Artificial Intelligence (IEEE TAI), 2023
Keyu Chen
Zhuang Di
Mingchen Li
J. M. Chang
310
5
0
06 Sep 2023
Memory Efficient Optimizers with 4-bit States
Neural Information Processing Systems (NeurIPS), 2023
Bingrui Li
Jianfei Chen
Jun Zhu
MQ
337
57
0
04 Sep 2023
RSDiff: Remote Sensing Image Generation from Text Using Diffusion Model
A. Sebaq
Mohamed ElHelw
DiffM
289
45
0
03 Sep 2023
Benchmarking the Generation of Fact Checking Explanations
Transactions of the Association for Computational Linguistics (TACL), 2023
Daniel Russo
Serra Sinem Tekiroğlu
Marco Guerini
159
30
0
29 Aug 2023
MEMORY-VQ: Compression for Tractable Internet-Scale Memory
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yury Zemlyanskiy
Michiel de Jong
Luke Vilnis
Santiago Ontañón
William W. Cohen
Sumit Sanghai
Joshua Ainslie
RALM
MQ
191
2
0
28 Aug 2023
Training and Meta-Evaluating Machine Translation Evaluation Metrics at the Paragraph Level
Conference on Machine Translation (WMT), 2023
Daniel Deutsch
Juraj Juraska
M. Finkelstein
Markus Freitag
310
13
0
25 Aug 2023
Towards an On-device Agent for Text Rewriting
Yun Zhu
Yinxiao Liu
Felix Stahlberg
Shankar Kumar
Yu-hui Chen
Liangchen Luo
Lei Shu
Renjie Liu
Jindong Chen
Lei Meng
LLMAG
195
9
0
22 Aug 2023
Exploring Parameter-Efficient Fine-Tuning Techniques for Code Generation with Large Language Models
ACM Transactions on Software Engineering and Methodology (TOSEM), 2023
Martin Weyssow
Xin Zhou
Kisub Kim
David Lo
H. Sahraoui
248
66
0
21 Aug 2023
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Interspeech (Interspeech), 2023
Hakan Erdogan
Scott Wisdom
Xuankai Chang
Zalan Borsos
Marco Tagliasacchi
Neil Zeghidour
J. Hershey
192
16
0
21 Aug 2023
A Methodology for Generative Spelling Correction via Natural Spelling Errors Emulation across Multiple Domains and Languages
Findings (Findings), 2023
Nikita Martynov
Mark Baushenko
Anastasia Kozlova
Katerina Kolomeytseva
Aleksandr Abramov
Alena Fenogenova
201
8
0
18 Aug 2023
Teach LLMs to Personalize -- An Approach inspired by Writing Education
Cheng Li
Mingyang Zhang
Qiaozhu Mei
Yaqing Wang
Spurthi Amba Hombaiah
Yi Liang
Michael Bendersky
AI4Ed
236
55
0
15 Aug 2023
Robustness Over Time: Understanding Adversarial Examples' Effectiveness on Longitudinal Versions of Large Language Models
Yugeng Liu
Tianshuo Cong
Subrat Kishore Dutta
Michael Backes
Yun Shen
Yang Zhang
AAML
249
11
0
15 Aug 2023
You Only Prompt Once: On the Capabilities of Prompt Learning on Large Language Models to Tackle Toxic Content
IEEE Symposium on Security and Privacy (IEEE S&P), 2023
Xinlei He
Savvas Zannettou
Yun Shen
Yang Zhang
CLL
171
65
0
10 Aug 2023
KITLM: Domain-Specific Knowledge InTegration into Language Models for Question Answering
ICON (ICON), 2023
Ankush Agarwal
Sakharam Gawade
A. Azad
P. Bhattacharyya
KELM
114
8
0
07 Aug 2023
PromptSum: Parameter-Efficient Controllable Abstractive Summarization
Mathieu Ravaut
Hailin Chen
Ruochen Zhao
Chengwei Qin
Shafiq Joty
Nancy Chen
175
3
0
06 Aug 2023