Exploring the Limits of Large Scale Pre-training
5 October 2021
Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi
ArXiv: 2110.02095
Papers citing "Exploring the Limits of Large Scale Pre-training" (41 of 91 shown)
Transferability Estimation Based On Principal Gradient Expectation
Huiyan Qi, Lechao Cheng, Jingjing Chen, Yue Yu, Xue Song, Zunlei Feng, Yueping Jiang
29 Nov 2022

Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su, Xizhou Zhu, Chenxin Tao, Lewei Lu, Bin Li, Gao Huang, Yu Qiao, Xiaogang Wang, Jie Zhou, Jifeng Dai
17 Nov 2022

Cross-Reality Re-Rendering: Manipulating between Digital and Physical Realities
Siddhartha Datta
15 Nov 2022

Where to start? Analyzing the potential value of intermediate models
Leshem Choshen, Elad Venezian, Shachar Don-Yehiya, Noam Slonim, Yoav Katz
31 Oct 2022 [MoMe]

Changes from Classical Statistics to Modern Statistics and Data Science
Kai Zhang, Shan-Yu Liu, M. Xiong
30 Oct 2022

Broken Neural Scaling Laws
Ethan Caballero, Kshitij Gupta, Irina Rish, David M. Krueger
26 Oct 2022

Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos, A. Ho, J. Sevilla, T. Besiroglu, Lennart Heim, Marius Hobbhahn
26 Oct 2022 [ALM]

A Kernel-Based View of Language Model Fine-Tuning
Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora
11 Oct 2022 [VLM]

Adversarial Lagrangian Integrated Contrastive Embedding for Limited Size Datasets
Amin Jalali, Minho Lee
06 Oct 2022

Under the Cover Infant Pose Estimation using Multimodal Data
Daniel G. Kyrollos, A. Fuller, K. Greenwood, J. Harrold, J.R. Green
03 Oct 2022 [3DH]

Transfer Learning with Pretrained Remote Sensing Transformers
A. Fuller, K. Millard, J.R. Green
28 Sep 2022

Revisiting Neural Scaling Laws in Language and Vision
Ibrahim M. Alabdulmohsin, Behnam Neyshabur, Xiaohua Zhai
13 Sep 2022

Intersection of Parallels as an Early Stopping Criterion
Ali Vardasbi, Maarten de Rijke, Mostafa Dehghani
19 Aug 2022 [MoMe]

Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay, Mostafa Dehghani, Samira Abnar, Hyung Won Chung, W. Fedus, J. Rao, Sharan Narang, Vinh Q. Tran, Dani Yogatama, Donald Metzler
21 Jul 2022 [AI4CE]

UFO: Unified Feature Optimization
Teng Xi, Yifan Sun, Deli Yu, Bi Li, Nan Peng, ..., Haocheng Feng, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
21 Jul 2022

How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
Rafid Mahmood, James Lucas, David Acuna, Daiqing Li, Jonah Philion, Jose M. Alvarez, Zhiding Yu, Sanja Fidler, M. Law
04 Jul 2022

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson, M. Krass, Lucia Zheng, Neel Guha, Christopher D. Manning, Dan Jurafsky, Daniel E. Ho
01 Jul 2022 [AILaw, ELM]

LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks
Tuan Dinh, Yuchen Zeng, Ruisu Zhang, Ziqian Lin, Michael Gira, Shashank Rajput, Jy-yong Sohn, Dimitris Papailiopoulos, Kangwook Lee
14 Jun 2022 [LMTD]

Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing
Linlu Qiu, Peter Shaw, Panupong Pasupat, Tianze Shi, Jonathan Herzig, Emily Pitler, Fei Sha, Kristina Toutanova
24 May 2022 [AI4CE, LRM]

Deep transfer learning for image classification: a survey
J. Plested, Tom Gedeon
20 May 2022 [OOD]

Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey
Kento Nozawa, Issei Sato
18 Apr 2022 [AI4TS]

GreaseVision: Rewriting the Rules of the Interface
Siddhartha Datta, Konrad Kollnig, N. Shadbolt
07 Apr 2022

Fusing finetuned models for better pretraining
Leshem Choshen, Elad Venezian, Noam Slonim, Yoav Katz
06 Apr 2022 [FedML, AI4CE, MoMe]

Last Layer Re-Training is Sufficient for Robustness to Spurious Correlations
Polina Kirichenko, Pavel Izmailov, A. Wilson
06 Apr 2022 [OOD]

How stable are Transferability Metrics evaluations?
A. Agostinelli, Michal Pándy, J. Uijlings, Thomas Mensink, V. Ferrari
04 Apr 2022

Understanding Contrastive Learning Requires Incorporating Inductive Biases
Nikunj Saunshi, Jordan T. Ash, Surbhi Goel, Dipendra Kumar Misra, Cyril Zhang, Sanjeev Arora, Sham Kakade, A. Krishnamurthy
28 Feb 2022 [SSL]

Transformer Memory as a Differentiable Search Index
Yi Tay, Vinh Q. Tran, Mostafa Dehghani, Jianmo Ni, Dara Bahri, ..., Zhe Zhao, Jai Gupta, Tal Schuster, William W. Cohen, Donald Metzler
14 Feb 2022

Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments
Maor Ivgi, Y. Carmon, Jonathan Berant
13 Feb 2022

Revisiting Weakly Supervised Pre-Training of Visual Perception Models
Mannat Singh, Laura Gustafson, Aaron B. Adcock, Vinicius de Freitas Reis, B. Gedik, Raj Prateek Kosaraju, D. Mahajan, Ross B. Girshick, Piotr Dollár, L. van der Maaten
20 Jan 2022 [VLM]

Transferability in Deep Learning: A Survey
Junguang Jiang, Yang Shu, Jianmin Wang, Mingsheng Long
15 Jan 2022 [OOD]

Weakly-guided Self-supervised Pretraining for Temporal Activity Detection
Kumara Kahatapitiya, Zhou Ren, Haoxiang Li, Zhenyu Wu, Michael S. Ryoo, G. Hua
26 Nov 2021 [ViT]

PolyViT: Co-training Vision Transformers on Images, Videos and Audio
Valerii Likhosherstov, Anurag Arnab, K. Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani
25 Nov 2021 [ViT]

No One Representation to Rule Them All: Overlapping Features of Training Methods
Raphael Gontijo-Lopes, Yann N. Dauphin, E. D. Cubuk
20 Oct 2021

SCENIC: A JAX Library for Computer Vision Research and Beyond
Mostafa Dehghani, A. Gritsenko, Anurag Arnab, Matthias Minderer, Yi Tay
18 Oct 2021

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay, Mostafa Dehghani, J. Rao, W. Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler
22 Sep 2021

WebQA: Multihop and Multimodal QA
Yingshan Chang, M. Narang, Hisami Suzuki, Guihong Cao, Jianfeng Gao, Yonatan Bisk
01 Sep 2021 [LRM]

MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
04 May 2021

Carbon Emissions and Large Neural Network Training
David A. Patterson, Joseph E. Gonzalez, Quoc V. Le, Chen Liang, Lluís-Miquel Munguía, D. Rothchild, David R. So, Maud Texier, J. Dean
21 Apr 2021 [AI4CE]

Why Do Better Loss Functions Lead to Less Transferable Features?
Simon Kornblith, Ting-Li Chen, Honglak Lee, Mohammad Norouzi
30 Oct 2020 [FaML]

Meta Pseudo Labels
Hieu H. Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le
23 Mar 2020 [VLM]

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020