1804.04235
Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
11 April 2018
Noam M. Shazeer
Mitchell Stern
ODL
Papers citing
"Adafactor: Adaptive Learning Rates with Sublinear Memory Cost"
50 / 799 papers shown
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
325
18
0
01 Nov 2021
ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5
David Samuel
Milan Straka
143
17
0
28 Oct 2021
Applications and Techniques for Fast Machine Learning in Science
Frontiers in Big Data (Front. Big Data), 2021
A. Deiana
Nhan Tran
Joshua C. Agar
Michaela Blott
G. D. Guglielmo
...
Ashish Sharma
S. Summers
Pietro Vischia
J. Vlimant
Olivia Weng
214
81
0
25 Oct 2021
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
477
117
0
16 Oct 2021
The Power of Prompt Tuning for Low-Resource Semantic Parsing
Nathan Schucher
Siva Reddy
H. D. Vries
VLM
236
36
0
16 Oct 2021
Improving Compositional Generalization with Self-Training for Data-to-Text Generation
Sanket Vaibhav Mehta
J. Rao
Yi Tay
Mihir Kale
Ankur P. Parikh
Emma Strubell
AI4CE
248
34
0
16 Oct 2021
Control Prefixes for Parameter-Efficient Text Generation
Jordan Clive
Kris Cao
Marek Rei
267
34
0
15 Oct 2021
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
Tu Vu
Brian Lester
Noah Constant
Rami Al-Rfou
Daniel Cer
VLM
LRM
473
315
0
15 Oct 2021
LFPT5: A Unified Framework for Lifelong Few-shot Language Learning Based on Prompt Tuning of T5
Chengwei Qin
Shafiq Joty
CLL
402
122
0
14 Oct 2021
Vector-quantized Image Modeling with Improved VQGAN
International Conference on Learning Representations (ICLR), 2021
Jiahui Yu
Xin Li
Jing Yu Koh
Han Zhang
Ruoming Pang
James Qin
Alexander Ku
Yuanzhong Xu
Jason Baldridge
Yonghui Wu
ViT
VLM
DRL
490
675
0
09 Oct 2021
8-bit Optimizers via Block-wise Quantization
Tim Dettmers
M. Lewis
Sam Shleifer
Luke Zettlemoyer
MQ
398
390
0
06 Oct 2021
Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition
Tsendsuren Munkhdalai
K. Sim
Angad Chandorkar
Fan Gao
Mason Chua
Trevor Strohman
F. Beaufays
224
39
0
05 Oct 2021
BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition
IEEE Journal on Selected Topics in Signal Processing (JSTSP), 2021
Yu Zhang
Daniel S. Park
Wei Han
James Qin
Anmol Gulati
...
Zhifeng Chen
Quoc V. Le
Chung-Cheng Chiu
Ruoming Pang
Yonghui Wu
SSL
220
196
0
27 Sep 2021
Beyond Distillation: Task-level Mixture-of-Experts for Efficient Inference
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Sneha Kudugunta
Yanping Huang
Ankur Bapna
M. Krikun
Dmitry Lepikhin
Minh-Thang Luong
Orhan Firat
MoE
1.2K
127
0
24 Sep 2021
Well Googled is Half Done: Multimodal Forecasting of New Fashion Product Sales with Image-based Google Trends
Geri Skenderi
Christian Joppi
Matteo Denitto
Marco Cristani
AI4TS
308
33
0
20 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mańke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
401
184
0
17 Sep 2021
Scaling Laws for Neural Machine Translation
Behrooz Ghorbani
Orhan Firat
Markus Freitag
Ankur Bapna
M. Krikun
Xavier Garcia
Ciprian Chelba
Colin Cherry
212
125
0
16 Sep 2021
ePiC: Employing Proverbs in Context as a Benchmark for Abstract Language Understanding
Sayan Ghosh
Shashank Srivastava
292
16
0
14 Sep 2021
STraTA: Self-Training with Task Augmentation for Better Few-shot Learning
Tu Vu
Minh-Thang Luong
Quoc V. Le
Grady Simon
Mohit Iyyer
410
62
0
13 Sep 2021
Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information
Majid Jahani
S. Rusakov
Zheng Shi
Peter Richtárik
Michael W. Mahoney
Martin Takáč
ODL
190
29
0
11 Sep 2021
PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Torsten Scholak
Nathan Schucher
Dzmitry Bahdanau
465
471
0
10 Sep 2021
ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Alireza Salemi
Emad Kebriaei
Ghazal Neisi Minaei
A. Shakery
CVBM
134
6
0
09 Sep 2021
Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Leonardo F. R. Ribeiro
Jonas Pfeiffer
Yue Zhang
Iryna Gurevych
214
11
0
08 Sep 2021
FH-SWF SG at GermEval 2021: Using Transformer-Based Language Models to Identify Toxic, Engaging, & Fact-Claiming Comments
Tobias Bornheim
Stephan Bialonski
123
12
0
07 Sep 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
1.7K
4,618
0
03 Sep 2021
Do Prompt-Based Models Really Understand the Meaning of their Prompts?
Albert Webson
Ellie Pavlick
LRM
429
426
0
02 Sep 2021
Effective Sequence-to-Sequence Dialogue State Tracking
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Jeffrey Zhao
Mahdis Mahdieh
Ye Zhang
Yuan Cao
Yonghui Wu
230
43
0
31 Aug 2021
Injecting Text in Self-Supervised Speech Pretraining
Automatic Speech Recognition & Understanding (ASRU), 2021
Zhehuai Chen
Yu Zhang
Andrew Rosenberg
Bhuvana Ramabhadran
Gary Wang
Pedro J. Moreno
SSL
165
38
0
27 Aug 2021
Alleviating Exposure Bias via Contrastive Learning for Abstractive Text Summarization
Shichao Sun
Wenjie Li
126
29
0
26 Aug 2021
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
Jianmo Ni
Gustavo Hernández Ábrego
Noah Constant
Ji Ma
Keith B. Hall
Daniel Cer
Yinfei Yang
528
708
0
19 Aug 2021
How Optimal is Greedy Decoding for Extractive Question Answering?
Conference on Automated Knowledge Base Construction (AKBC), 2021
Or Castel
Ori Ram
Avia Efrat
Omer Levy
202
5
0
12 Aug 2021
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training
Automatic Speech Recognition & Understanding (ASRU), 2021
Yu-An Chung
Yu Zhang
Wei Han
Chung-Cheng Chiu
James Qin
Ruoming Pang
Yonghui Wu
SSL
VLM
250
500
0
07 Aug 2021
Large-Scale Differentially Private BERT
Rohan Anil
Badih Ghazi
Vineet Gupta
Ravi Kumar
Pasin Manurangsi
245
148
0
03 Aug 2021
Towards Universality in Multilingual Text Rewriting
Xavier Garcia
Noah Constant
Mandy Guo
Orhan Firat
LRM
184
11
0
30 Jul 2021
Sequence-to-Sequence Piano Transcription with Transformers
International Society for Music Information Retrieval Conference (ISMIR), 2021
Curtis Hawthorne
Ian Simon
Rigel Swavely
Ethan Manilow
Jesse Engel
334
98
0
19 Jul 2021
Deduplicating Training Data Makes Language Models Better
Katherine Lee
Daphne Ippolito
A. Nystrom
Chiyuan Zhang
Douglas Eck
Chris Callison-Burch
Nicholas Carlini
SyDa
717
770
0
14 Jul 2021
XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages
Findings (Findings), 2021
Tahmid Hasan
Abhik Bhattacharjee
Md. Saiful Islam
Kazi Samin Mubasshir
Yuan-Fang Li
Yong-Bin Kang
M. Rahman
Rifat Shahriyar
314
449
0
25 Jun 2021
Black Box Variational Bayesian Model Averaging
Vojtech Kejzlar
Shrijita Bhattacharya
Mookyong Son
T. Maiti
BDL
222
3
0
23 Jun 2021
LocoProp: Enhancing BackProp via Local Loss Optimization
International Conference on Artificial Intelligence and Statistics (AISTATS), 2021
Ehsan Amid
Rohan Anil
Manfred K. Warmuth
ODL
172
21
0
11 Jun 2021
Scaling Vision Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
474
1,309
0
08 Jun 2021
Enriching Transformers with Structured Tensor-Product Representations for Abstractive Summarization
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Yichen Jiang
Asli Celikyilmaz
P. Smolensky
Paul Soulos
Sudha Rao
Hamid Palangi
Roland Fernandez
Caitlin Smith
Joey Tianyi Zhou
Jianfeng Gao
148
21
0
02 Jun 2021
A Multi-Level Attention Model for Evidence-Based Fact Checking
Findings (Findings), 2021
Canasai Kruengkrai
Junichi Yamagishi
Xin Wang
GNN
156
29
0
02 Jun 2021
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Rowan Zellers
Ari Holtzman
Matthew E. Peters
Roozbeh Mottaghi
Aniruddha Kembhavi
Ali Farhadi
Yejin Choi
314
77
0
01 Jun 2021
M6-T: Exploring Sparse Expert Models and Beyond
An Yang
Junyang Lin
Rui Men
Chang Zhou
Le Jiang
...
Dingyang Zhang
Jialin Li
Lin Qu
Jingren Zhou
Hongxia Yang
MoE
367
24
0
31 May 2021
Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models
Workshop on Innovative Use of NLP for Building Educational Applications (UNBEA), 2021
Felix Stahlberg
Shankar Kumar
SyDa
220
103
0
27 May 2021
A cost-benefit analysis of cross-lingual transfer methods
G. Rosa
L. Bonifacio
Leandro Rodrigues de Souza
R. Lotufo
Rodrigo Nogueira
217
14
0
14 May 2021
GSPMD: General and Scalable Parallelization for ML Computation Graphs
Yuanzhong Xu
HyoukJoong Lee
Dehao Chen
Blake A. Hechtman
Yanping Huang
...
Noam M. Shazeer
Shibo Wang
Tao Wang
Yonghui Wu
Zhifeng Chen
MoE
218
161
0
10 May 2021
Are Pre-trained Convolutions Better than Pre-trained Transformers?
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Yi Tay
Mostafa Dehghani
J. Gupta
Dara Bahri
V. Aribandi
Zhen Qin
Donald Metzler
AI4CE
177
51
0
07 May 2021
Learning to Perturb Word Embeddings for Out-of-distribution QA
Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Seanie Lee
Minki Kang
Juho Lee
Sung Ju Hwang
OOD
380
19
0
06 May 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Automatic Speech Recognition & Understanding (ASRU), 2021
Yue Liu
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Wenjie Huang
Min Ma
Junwen Bai
CLL
380
83
0
30 Apr 2021