Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2106.04560
Cited By
Scaling Vision Transformers
8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Vision Transformers"
50 / 243 papers shown
Title
Scaling Law for Time Series Forecasting
Jingzhe Shi
Qinwei Ma
Huan Ma
Lei Li
AI4TS
31
8
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
71
41
0
23 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
59
0
0
13 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao-Yu Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
55
15
0
08 May 2024
Decentralized Personalized Federated Learning based on a Conditional Sparse-to-Sparser Scheme
Qianyu Long
Qiyuan Wang
Christos Anagnostopoulos
Daning Bi
FedML
26
0
0
24 Apr 2024
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta
Maxwell Horton
Fartash Faghri
Mohammad Hossein Sekhavat
Mahyar Najibi
Mehrdad Farajtabar
Oncel Tuzel
Mohammad Rastegari
VLM
CLIP
36
6
0
24 Apr 2024
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang
ZiYun Wang
Lingjie Liu
Kostas Daniilidis
37
25
0
26 Mar 2024
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
Mu Hu
Wei Yin
C. Zhang
Zhipeng Cai
Xiaoxiao Long
Kaixuan Wang
Kaixuan Wang
Gang Yu
Chunhua Shen
Shaojie Shen
3DGS
52
115
0
22 Mar 2024
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
Jiawei Zhao
Zhenyu (Allen) Zhang
Beidi Chen
Zhangyang Wang
A. Anandkumar
Yuandong Tian
43
173
0
06 Mar 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
86
1,059
0
05 Mar 2024
Multi-objective Differentiable Neural Architecture Search
R. Sukthanker
Arber Zela
B. Staffler
Samuel Dooley
Josif Grabocka
Frank Hutter
37
1
0
28 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
29
29
0
20 Feb 2024
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse
Patrick Fernandes
Nuno M. Guerreiro
António Loison
Duarte M. Alves
...
François Yvon
André F.T. Martins
Gautier Viaud
C´eline Hudelot
Pierre Colombo
43
32
0
01 Feb 2024
VIALM: A Survey and Benchmark of Visually Impaired Assistance with Large Models
Yi Zhao
Yilin Zhang
Rong Xiang
Jing Li
Hillming Li
31
16
0
29 Jan 2024
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson
B. Matuszewski
21
2
0
11 Jan 2024
Effective pruning of web-scale datasets based on complexity of concept clusters
Amro Abbas
E. Rusak
Kushal Tirumala
Wieland Brendel
Kamalika Chaudhuri
Ari S. Morcos
VLM
CLIP
34
22
0
09 Jan 2024
An Empirical Study of Scaling Law for OCR
Miao Rang
Zhenni Bi
Chuanjian Liu
Yunhe Wang
Kai Han
33
6
0
29 Dec 2023
How Smooth Is Attention?
Valérie Castin
Pierre Ablin
Gabriel Peyré
AAML
40
9
0
22 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
39
174
0
11 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
13
16
0
08 Dec 2023
Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data
M. Olson
Shusen Liu
Jayaraman J. Thiagarajan
B. Kustowski
Weng-Keen Wong
Rushil Anirudh
AI4CE
26
1
0
06 Dec 2023
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Xunguang Wang
Zhenlan Ji
Pingchuan Ma
Zongjie Li
Shuai Wang
MLLM
35
11
0
04 Dec 2023
Improve Supervised Representation Learning with Masked Image Modeling
Kaifeng Chen
Daniel M. Salz
Huiwen Chang
Kihyuk Sohn
Dilip Krishnan
Mojtaba Seyedhosseini
SSL
ViT
37
3
0
01 Dec 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
33
143
0
10 Nov 2023
PolyMaX: General Dense Prediction with Mask Transformer
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
...
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
26
14
0
09 Nov 2023
Gramian Attention Heads are Strong yet Efficient Vision Learners
Jongbin Ryu
Dongyoon Han
J. Lim
25
1
0
25 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
17
33
0
20 Oct 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
19
3
0
15 Oct 2023
Transformer Fusion with Optimal Transport
Moritz Imfeld
Jacopo Graldi
Marco Giordano
Thomas Hofmann
Sotiris Anagnostidis
Sidak Pal Singh
ViT
MoMe
22
16
0
09 Oct 2023
DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning
Thanh Nguyen
T. Pham
Chaoning Zhang
T. Luu
Thang Vu
Chang-Dong Yoo
23
9
0
21 Sep 2023
Dataset Factory: A Toolchain For Generative Computer Vision Datasets
Daniel Kharitonov
Ryan Turner
11
1
0
20 Sep 2023
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Amir Rahimi
Vanessa D’Amario
Moyuru Yamada
Kentaro Takemoto
Tomotake Sasaki
Xavier Boix
28
1
0
15 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
43
8
0
07 Sep 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
29
3
0
22 Aug 2023
Refashioning Emotion Recognition Modelling: The Advent of Generalised Large Models
Zixing Zhang
Liyizhe Peng
Tao Pang
Jing Han
Huan Zhao
Bjorn W. Schuller
32
13
0
21 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
37
3
0
18 Aug 2023
A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models
Bilel Guetarni
Féryal Windal
H. Benhabiles
Marianne Petit
Romain Dubois
Emmanuelle Leteurtre
Dominique Collard
DiffM
15
2
0
02 Aug 2023
From Sparse to Soft Mixtures of Experts
J. Puigcerver
C. Riquelme
Basil Mustafa
N. Houlsby
MoE
121
114
0
02 Aug 2023
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Anthony Brohan
Noah Brown
Justice Carbajal
Yevgen Chebotar
Xi Chen
...
Ted Xiao
Peng-Tao Xu
Sichun Xu
Tianhe Yu
Brianna Zitkovich
LM&Ro
LRM
30
1,091
0
28 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
26
118
0
25 Jul 2023
Towards Unifying Anatomy Segmentation: Automated Generation of a Full-body CT Dataset via Knowledge Aggregation and Anatomical Guidelines
A. Jaus
C. Seibold
Kelsey Hermann
Alexandra Walter
K. Giske
Johannes Haubold
Jens Kleesiek
Rainer Stiefelhagen
28
19
0
25 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
38
8
0
18 Jul 2023
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Hiroki Naganuma
Ryuichiro Hataya
Kotaro Yoshida
Ioannis Mitliagkas
OODD
84
1
0
17 Jul 2023
LongNet: Scaling Transformers to 1,000,000,000 Tokens
Jiayu Ding
Shuming Ma
Li Dong
Xingxing Zhang
Shaohan Huang
Wenhui Wang
Nanning Zheng
Furu Wei
CLL
35
151
0
05 Jul 2023
MaskBEV: Joint Object Detection and Footprint Completion for Bird's-eye View 3D Point Clouds
William Guimont-Martin
Jean-Michel Fortin
François Pomerleau
Philippe Giguère
3DPC
12
1
0
04 Jul 2023
Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation
Xin Yuan
Pedro H. P. Savarese
Michael Maire
8
5
0
22 Jun 2023
RedMotion: Motion Prediction via Redundancy Reduction
Royden Wagner
Ömer Sahin Tas
Marvin Klemp
Carlos Fernandez Lopez
Christoph Stiller
46
6
0
19 Jun 2023
Parameter-efficient is not sufficient: Exploring Parameter, Memory, and Time Efficient Adapter Tuning for Dense Predictions
Dongshuo Yin
Xueting Han
Bin Li
Hao Feng
Jinghua Bai
VPVLM
26
17
0
16 Jun 2023
One-Shot Learning of Visual Path Navigation for Autonomous Vehicles
Zhongying CuiZhu
François Charette
A. Ghafourian
Debo Shi
Matthew Cui
Anjali Krishnamachar
I. S. Bozchalooi
24
1
0
15 Jun 2023
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang
Bhishma Dedhia
N. Jha
ViT
VLM
36
26
0
27 May 2023
Previous
1
2
3
4
5
Next