1804.04235
Cited By
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
11 April 2018
Noam M. Shazeer
Mitchell Stern
ODL
Papers citing
"Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"
50 / 799 papers shown
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
Conference on Machine Translation (WMT), 2024
Juraj Juraska
Daniel Deutsch
Mara Finkelstein
Markus Freitag
255
84
0
04 Oct 2024
What Matters for Model Merging at Scale?
Prateek Yadav
Tu Vu
Jonathan Lai
Alexandra Chronopoulou
Manaal Faruqui
Joey Tianyi Zhou
Tsendsuren Munkhdalai
MoMe
273
43
0
04 Oct 2024
MELODI: Exploring Memory Compression for Long Contexts
International Conference on Learning Representations (ICLR), 2024
Yinpeng Chen
DeLesley Hutchins
Aren Jansen
Andrey Zhmoginov
David Racz
Jesper Andersen
194
3
0
04 Oct 2024
CorPipe at CRAC 2024: Predicting Zero Mentions from Raw Text
Milan Straka
LRM
206
6
0
03 Oct 2024
On the Inductive Bias of Stacking Towards Improving Reasoning
Neural Information Processing Systems (NeurIPS), 2024
Nikunj Saunshi
Stefani Karp
Shankar Krishnan
Sobhan Miryoosefi
Sashank J. Reddi
Sanjiv Kumar
LRM
AI4CE
294
13
0
27 Sep 2024
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
Yael Segal-Feldman
Aviv Shamsian
Aviv Navon
Gill Hetz
Joseph Keshet
181
6
0
24 Sep 2024
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
530
93
0
17 Sep 2024
Propulsion: Steering LLM with Tiny Fine-Tuning
International Conference on Computational Linguistics (COLING), 2024
Md. Kowsher
Nusrat Jahan Prottasha
Prakash Bhat
294
11
0
17 Sep 2024
Exploring Foundation Models for Synthetic Medical Imaging: A Study on Chest X-Rays and Fine-Tuning Techniques
Davide Clode da Silva
Marina Musse Bernardes
Nathalia Giacomini Ceretta
Gabriel Vaz de Souza
Gabriel Fonseca Silva
Rafael Heitor Bordini
S. Musse
MedIm
LM&MA
153
0
0
06 Sep 2024
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak
Conference on Machine Translation (WMT), 2024
Mukhammadsaid Mamasaidov
Abror Shopulatov
VLM
113
7
0
06 Sep 2024
The AdEMAMix Optimizer: Better, Faster, Older
International Conference on Learning Representations (ICLR), 2024
Matteo Pagliardini
Pierre Ablin
David Grangier
ODL
335
23
0
05 Sep 2024
NeuralOOD: Improving Out-of-Distribution Generalization Performance with Brain-machine Fusion Learning Framework
Shuangchen Zhao
Changde Du
Hui Li
Huiguang He
161
4
0
27 Aug 2024
Diffusion Models Are Real-Time Game Engines
International Conference on Learning Representations (ICLR), 2024
Dani Valevski
Yaniv Leviathan
Moab Arar
Shlomi Fruchter
DiffM
VGen
AI4CE
541
158
0
27 Aug 2024
FLEURS-ASL: Including American Sign Language in Massively Multilingual Multitask Evaluation
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Garrett Tanzer
SLR
VLM
226
8
0
24 Aug 2024
Memory-Efficient LLM Training with Online Subspace Descent
Neural Information Processing Systems (NeurIPS), 2024
Kaizhao Liang
Bo Liu
Lizhang Chen
Qiang Liu
237
27
0
23 Aug 2024
Data-Centric Approach to Constrained Machine Learning: A Case Study on Conway's Game of Life
A. Bibin
Anton Dereventsov
140
2
0
23 Aug 2024
Crafting Tomorrow's Headlines: Neural News Generation and Detection in English, Turkish, Hungarian, and Persian
Cem Uyuk
Danica Rovó
Shaghayegh Kolli
Rabia Varol
Georg Groh
Daryna Dementieva
216
2
0
20 Aug 2024
Instruction Finetuning for Leaderboard Generation from Empirical AI Research
Salomon Kabongo
Jennifer D'Souza
ALM
196
0
0
19 Aug 2024
Towards Realistic Synthetic User-Generated Content: A Scaffolding Approach to Generating Online Discussions
K. Balog
John Palowitch
Barbara Ikica
Filip Radlinski
Hamidreza Alvari
Mehdi Manshadi
SyDa
228
3
0
15 Aug 2024
Training Language Models on the Knowledge Graph: Insights on Hallucinations and Their Detectability
Jiri Hron
Laura J. Culp
Gamaleldin F. Elsayed
Rosanne Liu
Ben Adlam
...
T. Warkentin
Lechao Xiao
Kelvin Xu
Jasper Snoek
Simon Kornblith
167
3
0
14 Aug 2024
MoDE: Effective Multi-task Parameter Efficient Fine-Tuning with a Mixture of Dyadic Experts
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Lin Ning
Harsh Lara
Meiqi Guo
Abhinav Rastogi
MoMe
MoE
224
4
0
02 Aug 2024
What comes after transformers? -- A selective survey connecting ideas in deep learning
Johannes Schneider
AI4CE
423
4
0
01 Aug 2024
Maverick: Efficient and Accurate Coreference Resolution Defying Recent Trends
Giuliano Martinelli
Edoardo Barba
Roberto Navigli
225
23
0
31 Jul 2024
Exploring the Effectiveness and Consistency of Task Selection in Intermediate-Task Transfer Learning
Pin-Jie Lin
Miaoran Zhang
Marius Mosbach
Dietrich Klakow
219
1
0
23 Jul 2024
Promises and Pitfalls of Generative Masked Language Modeling: Theoretical Framework and Practical Guidelines
Yuchen Li
Alexandre Kirchmeyer
Aashay Mehta
Yilong Qin
Boris Dadachev
Kishore Papineni
Sanjiv Kumar
Andrej Risteski
314
4
0
22 Jul 2024
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
Cheng Luo
Jiawei Zhao
Zhuoming Chen
Beidi Chen
A. Anandkumar
279
5
0
22 Jul 2024
MASIVE: Open-Ended Affective State Identification in English and Spanish
Nicholas Deas
Elsbeth Turcan
Iván Pérez Mejía
Kathleen McKeown
CVBM
175
1
0
16 Jul 2024
Scaling Sign Language Translation
Biao Zhang
Garrett Tanzer
Orhan Firat
LRM
VLM
SLR
238
6
0
16 Jul 2024
Self-training Language Models for Arithmetic Reasoning
Marek Kadlčík
Michal Štefánik
KELM
ReLM
OffRL
LRM
166
1
0
11 Jul 2024
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients
Zhenyu Zhang
Ajay Jaiswal
L. Yin
Shiwei Liu
Jiawei Zhao
Yuandong Tian
Zhangyang Wang
VLM
204
33
0
11 Jul 2024
Fine-Tuning Large Language Models with User-Level Differential Privacy
Zachary Charles
Arun Ganesh
Ryan McKenna
H. B. McMahan
Nicole Mitchell
Krishna Pillutla
Keith Rush
297
34
0
10 Jul 2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
454
36
0
10 Jul 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
214
4
0
08 Jul 2024
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation
Sungkyun Chang
Emmanouil Benetos
Holger Kirchhoff
Simon Dixon
308
9
0
05 Jul 2024
LoCo: Low-Bit Communication Adaptor for Large-scale Model Training
Xingyu Xie
Zhijie Lin
Kim-Chuan Toh
Pan Zhou
234
5
0
05 Jul 2024
Neurocache: Efficient Vector Retrieval for Long-range Language Modeling
Ali Safaya
Deniz Yuret
212
0
0
02 Jul 2024
LoPT: Low-Rank Prompt Tuning for Parameter Efficient Language Models
Shouchang Guo
Sonam Damani
Keng-hao Chang
VLM
147
3
0
27 Jun 2024
Fast Optimizer Benchmark
Simon Blauth
Tobias Bürger
Zacharias Häringer
Jörg Franke
Katharina Eggensperger
148
0
0
26 Jun 2024
Grass: Compute Efficient Low-Memory LLM Training with Structured Sparse Gradients
Aashiq Muhamed
Oscar Li
David Woodruff
Mona Diab
Virginia Smith
261
20
0
25 Jun 2024
Adam-mini: Use Fewer Learning Rates To Gain More
Yushun Zhang
Congliang Chen
Ziniu Li
Tian Ding
Chenwei Wu
Yinyu Ye
Zhi-Quan Luo
457
88
0
24 Jun 2024
H-Fac: Memory-Efficient Optimization with Factorized Hamiltonian Descent
International Conference on Artificial Intelligence and Statistics (AISTATS), 2024
Son Nguyen
Lizhang Chen
Bo Liu
Qiang Liu
312
8
0
14 Jun 2024
Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization
Yi Gu
Zhendong Wang
Yueqin Yin
Yujia Xie
Mingyuan Zhou
243
30
0
10 Jun 2024
PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
Interspeech (Interspeech), 2024
Shuchen Shi
Ruibo Fu
Zhengqi Wen
Jianhua Tao
Tao Wang
...
Yukun Liu
Yongwei Li
Zhiyong Wang
Xiaopeng Wang
172
2
0
07 Jun 2024
Exploring the Latest LLMs for Leaderboard Extraction
Salomon Kabongo
Jennifer D'Souza
Sören Auer
195
4
0
06 Jun 2024
USM RNN-T model weights binarization
Oleg Rybakov
Dmitriy Serdyuk
Chengjian Zheng
MQ
325
2
0
05 Jun 2024
Item-Language Model for Conversational Recommendation
Li Yang
Anushya Subbiah
Hardik Patel
Judith Yue Li
Yanwei Song
Reza Mirghaderi
Vikram Aggarwal
Qifan Wang
KELM
228
12
0
05 Jun 2024
LADI v2: Multi-label Dataset and Classifiers for Low-Altitude Disaster Imagery
Samuel Scheele
Katherine Picchione
Jeffrey Liu
143
1
0
04 Jun 2024
Landscape-Aware Growing: The Power of a Little LAG
Stefani Karp
Nikunj Saunshi
Sobhan Miryoosefi
Sashank J. Reddi
Sanjiv Kumar
268
1
0
04 Jun 2024
SLTrain: a sparse plus low-rank approach for parameter and memory efficient pretraining
Andi Han
Jiaxiang Li
Wei Huang
Mingyi Hong
Akiko Takeda
Pratik Jawanpuria
Bamdev Mishra
296
31
0
04 Jun 2024
Selectively Answering Visual Questions
Julian Martin Eisenschlos
Hernán Maina
Guido Ivetta
Luciana Benotti
249
1
0
03 Jun 2024