2110.02095
Cited By
Exploring the Limits of Large Scale Pre-training
5 October 2021
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
Papers citing
"Exploring the Limits of Large Scale Pre-training"
50 / 91 papers shown
Title
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
34
0
0
02 May 2025
Scaling Laws for Downstream Task Performance in Machine Translation
Berivan Isik
Natalia Ponomareva
Hussein Hazimeh
Dimitris Paparas
Sergei Vassilvitskii
Sanmi Koyejo
105
23
0
24 Feb 2025
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
41
0
0
17 Feb 2025
Adaptive Blind All-in-One Image Restoration
David Serrano-Lozano
Luis Herranz
Shaolin Su
Javier Vázquez-Corral
VLM
87
0
0
27 Nov 2024
Loss-to-Loss Prediction: Scaling Laws for All Datasets
David Brandfonbrener
Nikhil Anand
Nikhil Vyas
Eran Malach
Sham Kakade
77
2
0
19 Nov 2024
A Hitchhiker's Guide to Scaling Law Estimation
Leshem Choshen
Yang Zhang
Jacob Andreas
41
6
0
15 Oct 2024
Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading
Fang Gao
XueTao Li
Jiabao Wang
Shengheng Ma
Jun Yu
15
0
0
08 Oct 2024
Target-Aware Language Modeling via Granular Data Sampling
Ernie Chang
Pin-Jie Lin
Yang Li
Changsheng Zhao
Daeil Kim
Rastislav Rabatin
Zechun Liu
Yangyang Shi
Vikas Chandra
SyDa
33
1
0
23 Sep 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
25
0
0
22 Aug 2024
Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription
Mickaël Zehren
Marco Alunno
Paolo Bientinesi
32
0
0
29 Jul 2024
Retrieval-Augmented Generation for Natural Language Processing: A Survey
Shangyu Wu
Ying Xiong
Yufei Cui
Haolun Wu
Can Chen
...
Lianming Huang
Xue Liu
Tei-Wei Kuo
Nan Guan
C. Xue
3DV
RALM
27
25
0
18 Jul 2024
Understanding the Role of Invariance in Transfer Learning
Till Speicher
Vedant Nanda
Krishna P. Gummadi
SSL
OOD
31
1
0
05 Jul 2024
Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models
Wenzhuo Tang
Haitao Mao
Danial Dervovic
Ivan Brugere
Saumitra Mishra
Yuying Xie
Jiliang Tang
46
3
0
04 Jun 2024
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Jacob Mitchell Springer
Vaishnavh Nagarajan
Aditi Raghunathan
31
2
0
30 May 2024
Ensemble Model With Bert,Roberta and Xlnet For Molecular property prediction
Junling Hu
19
1
0
30 May 2024
Views Can Be Deceiving: Improved SSL Through Feature Space Augmentation
Kimia Hamidieh
Haoran Zhang
Swami Sankaranarayanan
Marzyeh Ghassemi
27
0
0
28 May 2024
A Multi-Perspective Analysis of Memorization in Large Language Models
Bowen Chen
Namgi Han
Yusuke Miyao
38
1
0
19 May 2024
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration
Akshay Dudhane
Omkar Thawakar
Syed Waqas Zamir
Salman Khan
Fahad Shahbaz Khan
Ming-Hsuan Yang
AI4CE
30
6
0
02 Apr 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM
ELM
LRM
91
40
0
13 Mar 2024
Ask Your Distribution Shift if Pre-Training is Right for You
Benjamin Cohen-Wang
Joshua Vendrow
Aleksander Madry
OOD
16
3
0
29 Feb 2024
On Catastrophic Inheritance of Large Foundation Models
Hao Chen
Bhiksha Raj
Xing Xie
Jindong Wang
AI4CE
48
12
0
02 Feb 2024
keqing: knowledge-based question answering is a nature chain-of-thought mentor of LLM
Chaojie Wang
Yishi Xu
Zhong Peng
Chenxi Zhang
Bo Chen
Xinrun Wang
Lei Feng
Bo An
70
18
0
31 Dec 2023
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
220
64
0
31 Dec 2023
Continual Learning Under Language Shift
Evangelia Gogoulou
Timothée Lesort
Magnus Boman
Joakim Nivre
KELM
CLL
14
2
0
02 Nov 2023
Better with Less: A Data-Active Perspective on Pre-Training Graph Neural Networks
Jiarong Xu
Renhong Huang
Xin Jiang
Yuxuan Cao
Carl Yang
Chunping Wang
Yang Yang
AI4CE
15
14
0
02 Nov 2023
CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
A. Fuller
K. Millard
James R. Green
8
60
0
01 Nov 2023
Anchor Points: Benchmarking Models with Much Fewer Examples
Rajan Vivek
Kawin Ethayarajh
Diyi Yang
Douwe Kiela
ALM
6
20
0
14 Sep 2023
Examining the Effect of Pre-training on Time Series Classification
Jiashu Pu
Shiwei Zhao
Ling Cheng
Yongzhu Chang
Runze Wu
Tangjie Lv
Rongsheng Zhang
AI4TS
18
0
0
11 Sep 2023
An Analysis of Initial Training Strategies for Exemplar-Free Class-Incremental Learning
Grégoire Petit
Michael Soumm
Eva Feillet
Adrian Daniel Popescu
Bertrand Delezoide
David Picard
Céline Hudelot
CLL
11
7
0
22 Aug 2023
A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models
Bilel Guetarni
Féryal Windal
H. Benhabiles
Marianne Petit
Romain Dubois
Emmanuelle Leteurtre
Dominique Collard
DiffM
2
2
0
02 Aug 2023
Does Progress On Object Recognition Benchmarks Improve Real-World Generalization?
Megan Richards
Polina Kirichenko
Diane Bouchacourt
Mark Ibrahim
VLM
64
11
0
24 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Lučić
N. Houlsby
ViT
13
41
0
12 Jul 2023
Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How
Sebastian Pineda Arango
Fabio Ferreira
Arlind Kadra
Frank Hutter
Josif Grabocka
19
15
0
06 Jun 2023
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter
Ajay Jaiswal
Shiwei Liu
Tianlong Chen
Zhangyang Wang
VLM
6
33
0
06 Jun 2023
Quantifying the Variability Collapse of Neural Networks
Jing-Xue Xu
Haoxiong Liu
23
4
0
06 Jun 2023
Continual Learning with Pretrained Backbones by Tuning in the Input Space
Simone Marullo
Matteo Tiezzi
Marco Gori
S. Melacci
Tinne Tuytelaars
CLL
14
2
0
05 Jun 2023
No Free Lunch in Self Supervised Representation Learning
Ihab Bendidi
Adrien Bardes
E. Cohen
Alexis Lamiable
Guillaume Bollot
Auguste Genovesio
OOD
41
11
0
23 Apr 2023
Towards Efficient Task-Driven Model Reprogramming with Foundation Models
Shoukai Xu
Jiangchao Yao
Ran Luo
Shuhai Zhang
Zihao Lian
Mingkui Tan
Bo Han
Yaowei Wang
19
5
0
05 Apr 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
102
62
0
23 Mar 2023
A Meta-Learning Approach to Predicting Performance and Data Requirements
Achin Jain
Gurumurthy Swaminathan
Paolo Favaro
Hao-Yu Yang
Avinash Ravichandran
...
Alessandro Achille
O. Dabeer
Bernt Schiele
A. Swaminathan
Stefano Soatto
22
8
0
02 Mar 2023
The Role of Pre-training Data in Transfer Learning
R. Entezari
Mitchell Wortsman
O. Saukh
M. Shariatnia
Hanie Sedghi
Ludwig Schmidt
24
20
0
27 Feb 2023
Data efficiency and extrapolation trends in neural network interatomic potentials
Joshua A Vita
Daniel Schwalbe-Koda
16
16
0
12 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
26
562
0
10 Feb 2023
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
19
18
0
30 Jan 2023
A Closer Look at Few-shot Classification Again
Xu Luo
Hao Wu
Ji Zhang
Lianli Gao
Jing Xu
Jingkuan Song
14
48
0
28 Jan 2023
Does progress on ImageNet transfer to real-world datasets?
Alex Fang
Simon Kornblith
Ludwig Schmidt
VLM
14
33
0
11 Jan 2023
Exploring Efficient Few-shot Adaptation for Vision Transformers
C. Xu
Siqian Yang
Yabiao Wang
Zhanxiong Wang
Yanwei Fu
Xiangyang Xue
14
16
0
06 Jan 2023
Model Ratatouille: Recycling Diverse Models for Out-of-Distribution Generalization
Alexandre Ramé
Kartik Ahuja
Jianyu Zhang
Matthieu Cord
Léon Bottou
David Lopez-Paz
MoMe
OODD
11
80
0
20 Dec 2022
OAMixer: Object-aware Mixing Layer for Vision Transformers
H. Kang
Sangwoo Mo
Jinwoo Shin
VLM
29
4
0
13 Dec 2022
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu
I. Kobyzev
Mehdi Rezagholizadeh
Ahmad Rashid
A. Ghodsi
Philippe Langlais
MoMe
22
8
0
12 Dec 2022