Adafactor: Adaptive Learning Rates with Sublinear Memory Cost


11 April 2018
Noam M. Shazeer
Mitchell Stern
    ODL

Papers citing "Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"

50 / 799 papers shown
The Marginal Value of Momentum for Small Learning Rate SGD
International Conference on Learning Representations (ICLR), 2023
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
234
10
0
27 Jul 2023
f-Divergence Minimization for Sequence-Level Knowledge Distillation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuqiao Wen
Zichao Li
Wenyu Du
Lili Mou
280
83
0
27 Jul 2023
Towards Generalist Biomedical AI
Tao Tu
Shekoofeh Azizi
Danny Driess
M. Schaekermann
Mohamed Amin
...
Yossi Matias
K. Singhal
Peter R. Florence
Alan Karthikesalingam
Vivek Natarajan
LM&MA, MedIm, AI4MH
279
410
0
26 Jul 2023
No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
Neural Information Processing Systems (NeurIPS), 2023
Jean Kaddour
Oscar Key
Piotr Nawrot
Pasquale Minervini
Matt J. Kusner
429
58
0
12 Jul 2023
RoPDA: Robust Prompt-based Data Augmentation for Low-Resource Named Entity Recognition
AAAI Conference on Artificial Intelligence (AAAI), 2023
Sihan Song
Jian Zhao
Jian Zhao
195
6
0
11 Jul 2023
Event Extraction as Question Generation and Answering
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Di Lu
Shihao Ran
Joel R. Tetreault
A. Jaimes
205
51
0
10 Jul 2023
Scaling In-Context Demonstrations with Structured Attention
Tianle Cai
Kaixuan Huang
Jason D. Lee
Mengdi Wang
LRM
166
9
0
05 Jul 2023
CAME: Confidence-guided Adaptive Memory Efficient Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yang Luo
Xiaozhe Ren
Zangwei Zheng
Zhuo Jiang
Xin Jiang
Yang You
ODL
345
35
0
05 Jul 2023
Could Small Language Models Serve as Recommenders? Towards Data-centric Cold-start Recommendations
The Web Conference (WWW), 2023
Xuansheng Wu
Huachi Zhou
Yucheng Shi
Wenlin Yao
Xiao Shi Huang
Ninghao Liu
LRM
293
29
0
29 Jun 2023
YouTube-ASL: A Large-Scale, Open-Domain American Sign Language-English Parallel Corpus
Neural Information Processing Systems (NeurIPS), 2023
David C. Uthus
Garrett Tanzer
Manfred Georg
SLR
277
71
0
27 Jun 2023
Is Pre-training Truly Better Than Meta-Learning?
Alycia Lee
P. Yu
Saumya Goyal
Yu-Xiong Wang
Oluwasanmi Koyejo
276
8
0
24 Jun 2023
On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes
International Conference on Learning Representations (ICLR), 2023
Rishabh Agarwal
Nino Vieillard
Yongchao Zhou
Piotr Stańczyk
Sabela Ramos
Matthieu Geist
Olivier Bachem
319
183
0
23 Jun 2023
A Reference-less Quality Metric for Automatic Speech Recognition via Contrastive-Learning of a Multi-Language Model with Self-Supervision
K. Yuksel
Thiago Castro Ferreira
Ahmet Gunduz
Mohamed Al-Badrashiny
Golara Javadi
129
7
0
21 Jun 2023
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning
Interspeech (Interspeech), 2023
K. Yuksel
Thiago Castro Ferreira
Golara Javadi
Mohamed El-Badrashiny
Ahmet Gunduz
152
5
0
21 Jun 2023
GLIMMER: generalized late-interaction memory reranker
Michiel de Jong
Yury Zemlyanskiy
Nicholas FitzGerald
Sumit Sanghai
William W. Cohen
Joshua Ainslie
RALM
232
9
0
17 Jun 2023
Conformal Language Modeling
International Conference on Learning Representations (ICLR), 2023
Victor Quach
Adam Fisch
Tal Schuster
Adam Yala
J. Sohn
Tommi Jaakkola
Regina Barzilay
574
97
0
16 Jun 2023
Scaling Open-Vocabulary Object Detection
Neural Information Processing Systems (NeurIPS), 2023
Matthias Minderer
A. Gritsenko
N. Houlsby
VLM, ObjD
423
315
0
16 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
202
0
0
15 Jun 2023
Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation
Zihui Gu
Ju Fan
Nan Tang
Songyue Zhang
Yuxin Zhang
Zui Chen
Lei Cao
Guoliang Li
Sam Madden
Xiaoyong Du
217
32
0
15 Jun 2023
AutoML in the Age of Large Language Models: Current Challenges, Future Opportunities and Risks
Alexander Tornede
Difan Deng
Theresa Eimer
Joseph Giovanelli
Aditya Mohan
...
Sarah Segel
Daphne Theodorakopoulos
Tanja Tornede
Henning Wachsmuth
Marius Lindauer
325
36
0
13 Jun 2023
AraMUS: Pushing the Limits of Data and Model Scale for Arabic Natural Language Processing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Asaad Alghamdi
Xinyu Duan
Wei Jiang
Zhenhai Wang
Yimeng Wu
...
Yifei Zheng
Mehdi Rezagholizadeh
Baoxing Huai
Peilun Cheng
Abbas Ghaddar
VLM
140
10
0
11 Jun 2023
PoET: A generative model of protein families as sequences-of-sequences
Neural Information Processing Systems (NeurIPS), 2023
Timothy F. Truong
Tristan Bepler
SLR
211
69
0
09 Jun 2023
Leaping through tree space: continuous phylogenetic inference for rooted and unrooted trees
Genome Biology and Evolution (GBE), 2023
Matthew J. Penn
Neil Scheidwasser
Joseph Penn
C. Donnelly
D. Duchêne
Samir Bhatt
302
6
0
09 Jun 2023
Unbalanced Optimal Transport for Unbalanced Word Alignment
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuki Arase
Han Bao
Sho Yokoi
OT
140
6
0
07 Jun 2023
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Chujie Zheng
Pei Ke
Zheng Zhang
Shiyu Huang
BDL
240
45
0
06 Jun 2023
LLM-Blender: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dongfu Jiang
Xiang Ren
Bill Yuchen Lin
ELM
445
487
0
05 Jun 2023
SamToNe: Improving Contrastive Loss for Dual Encoder Retrieval Models with Same Tower Negatives
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Fedor Moiseev
Gustavo Hernández Ábrego
Peter Dornbach
I. Zitouni
Enrique Alfonseca
Zhe Dong
192
9
0
05 Jun 2023
Harnessing large-language models to generate private synthetic text
Alexey Kurakin
Natalia Ponomareva
Umar Syed
Liam MacDermed
Seth Neel
SILM, SyDa
290
54
0
02 Jun 2023
THiFLY Research at SemEval-2023 Task 7: A Multi-granularity System for CTR-based Textual Entailment and Evidence Retrieval
International Workshop on Semantic Evaluation (SemEval), 2023
Yuxuan Zhou
Ziyun Jin
Meiwei Li
Chenyi Guo
Xien Liu
Xinxin You
Ji Wu
137
12
0
02 Jun 2023
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces
Neural Information Processing Systems (NeurIPS), 2023
Peter Shaw
Mandar Joshi
James Cohan
Jonathan Berant
Panupong Pasupat
Hexiang Hu
Urvashi Khandelwal
Kenton Lee
Kristina Toutanova
LLMAG, LM&Ro
269
75
0
31 May 2023
Toward Understanding Why Adam Converges Faster Than SGD for Transformers
Yan Pan
Yuanzhi Li
293
52
0
31 May 2023
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Paul Roit
Johan Ferret
Lior Shani
Roee Aharoni
Geoffrey Cideron
...
Olivier Bachem
G. Elidan
Avinatan Hassidim
Olivier Pietquin
Idan Szpektor
HILM
289
100
0
31 May 2023
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
European Conference on Artificial Intelligence (ECAI), 2023
Yijia Zhang
Yibo Han
Shijie Cao
Guohao Dai
Youshan Miao
Ting Cao
Fan Yang
Ningyi Xu
118
5
0
31 May 2023
Correcting Semantic Parses with Natural Language through Dynamic Schema Encoding
Parker Glenn
Parag Dakle
Preethi Raghavan
197
3
0
31 May 2023
Comparing and combining some popular NER approaches on Biomedical tasks
Workshop on Biomedical Natural Language Processing (BioNLP), 2023
Harsh Verma
S. Bergler
Narjes Tahaei
196
7
0
30 May 2023
Brainformers: Trading Simplicity for Efficiency
International Conference on Machine Learning (ICML), 2023
Yan-Quan Zhou
Nan Du
Yanping Huang
Daiyi Peng
Chang Lan
...
Zhifeng Chen
Quoc V. Le
Claire Cui
J.H.J. Laundon
J. Dean
MoE
247
36
0
29 May 2023
Federated Learning for Semantic Parsing: Task Formulation, Evaluation Setup, New Algorithms
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Tianshu Zhang
Wei-Han Lee
Yan Koyfman
Yu-Chuan Su
Huan Sun
FedML
117
8
0
26 May 2023
Diable: Efficient Dialogue State Tracking as Operations on Tables
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pietro Lesci
Yoshinari Fujinuma
Momchil Hardalov
Chao Shang
Yassine Benajiba
Lluís Marquez
LMTD
304
8
0
26 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Neural Information Processing Systems (NeurIPS), 2023
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
212
18
0
26 May 2023
Learning to Imagine: Visually-Augmented Natural Language Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Tianyi Tang
Yushuo Chen
Yifan Du
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
DiffM
427
10
0
26 May 2023
Domain Aligned Prefix Averaging for Domain Generalization in Abstractive Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Pranav Ajit Nair
Sukomal Pal
Pradeepika Verma
MoMe
239
2
0
26 May 2023
Incorporating Distributions of Discourse Structure for Long Document Abstractive Summarization
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Dongqi Pu
Yifa Wang
Vera Demberg
224
28
0
26 May 2023
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Neural Information Processing Systems (NeurIPS), 2023
Yuandong Tian
Yiping Wang
Beidi Chen
S. Du
MLT
493
100
0
25 May 2023
SING: A Plug-and-Play DNN Learning Technique
Adrien Courtois
Damien Scieur
Jean-Michel Morel
Pablo Arias
Thomas Eboli
162
0
0
25 May 2023
RewriteLM: An Instruction-Tuned Large Language Model for Text Rewriting
AAAI Conference on Artificial Intelligence (AAAI), 2023
Lei Shu
Liangchen Luo
Jayakumar Hoskere
Yun Zhu
Canoee Liu
Simon Tong
Jindong Chen
Lei Meng
KELM, LRM
273
75
0
25 May 2023
Lexinvariant Language Models
Neural Information Processing Systems (NeurIPS), 2023
Qian Huang
E. Zelikman
Sarah Chen
Yuhuai Wu
Gregory Valiant
Abigail Z. Jacobs
176
1
0
24 May 2023
The Role of Output Vocabulary in T2T LMs for SPARQL Semantic Parsing
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Debayan Banerjee
Pranav Ajit Nair
Ricardo Usbeck
Chris Biemann
200
3
0
24 May 2023
A Mechanistic Interpretation of Arithmetic Reasoning in Language Models using Causal Mediation Analysis
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Alessandro Stolfo
Yonatan Belinkov
Mrinmaya Sachan
MILM, KELM, LRM
280
67
0
24 May 2023
Active Learning for Natural Language Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yotam Perlitz
Ariel Gera
Michal Shmueli-Scheuer
D. Sheinwald
Noam Slonim
L. Ein-Dor
334
4
0
24 May 2023
Text encoders bottleneck compositionality in contrastive vision-language models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Amita Kamath
Jack Hessel
Kai-Wei Chang
CoGe, CLIP, VLM
273
30
0
24 May 2023