ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.04560
  4. Cited By
Scaling Vision Transformers

Scaling Vision Transformers

8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
    ViT
ArXivPDFHTML

Papers citing "Scaling Vision Transformers"

43 / 243 papers shown
Title
AtmoDist: Self-supervised Representation Learning for Atmospheric
  Dynamics
AtmoDist: Self-supervised Representation Learning for Atmospheric Dynamics
Sebastian Hoffmann
C. Lessig
AI4Cl
24
8
0
02 Feb 2022
Video Transformers: A Survey
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Pushing the limits of self-supervised ResNets: Can we outperform
  supervised learning without labels on ImageNet?
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?
Nenad Tomašev
Ioana Bica
Brian McWilliams
Lars Buesing
Razvan Pascanu
Charles Blundell
Jovana Mitrović
SSL
74
80
0
13 Jan 2022
Multiview Transformers for Video Recognition
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
24
211
0
12 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and
  Sound
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
26
207
0
07 Jan 2022
Persformer: A Transformer Architecture for Topological Machine Learning
Persformer: A Transformer Architecture for Topological Machine Learning
Raphael Reinauer
Matteo Caorsi
Nicolas Berkouk
16
15
0
30 Dec 2021
A Simple Single-Scale Vision Transformer for Object Localization and
  Instance Segmentation
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
Wuyang Chen
Xianzhi Du
Fan Yang
Lucas Beyer
Xiaohua Zhai
...
Huizhong Chen
Jing Li
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
21
20
0
17 Dec 2021
Co-training Transformer with Videos and Images Improves Action
  Recognition
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
20
54
0
14 Dec 2021
Pixelated Butterfly: Simple and Efficient Sparse training for Neural
  Network Models
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models
Tri Dao
Beidi Chen
Kaizhao Liang
Jiaming Yang
Zhao-quan Song
Atri Rudra
Christopher Ré
25
75
0
30 Nov 2021
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image
  Analysis
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
Yucheng Tang
Dong Yang
Wenqi Li
H. Roth
Bennett Landman
Daguang Xu
V. Nath
Ali Hatamizadeh
ViT
MedIm
24
517
0
29 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLM
VLM
28
246
0
24 Nov 2021
Florence: A New Foundation Model for Computer Vision
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
24
878
0
22 Nov 2021
SimMIM: A Simple Framework for Masked Image Modeling
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Qi Dai
Han Hu
37
1,309
0
18 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
47
1,744
0
18 Nov 2021
Scaling Law for Recommendation Models: Towards General-purpose User
  Representations
Scaling Law for Recommendation Models: Towards General-purpose User Representations
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Max Nihlén Ramström
Jisu Jeong
Jung-Woo Ha
S. Kim
ELM
28
38
0
15 Nov 2021
A Survey of Visual Transformers
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
71
330
0
11 Nov 2021
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs
Christoph Schuhmann
Richard Vencu
Romain Beaumont
R. Kaczmarczyk
Clayton Mullis
Aarush Katta
Theo Coombes
J. Jitsev
Aran Komatsuzaki
VLM
MLLM
CLIP
10
1,371
0
03 Nov 2021
The Efficiency Misnomer
The Efficiency Misnomer
Daoyuan Chen
Liuyi Yao
Dawei Gao
Ashish Vaswani
Yaliang Li
32
98
0
25 Oct 2021
Sinkformers: Transformers with Doubly Stochastic Attention
Sinkformers: Transformers with Doubly Stochastic Attention
Michael E. Sander
Pierre Ablin
Mathieu Blondel
Gabriel Peyré
27
76
0
22 Oct 2021
No One Representation to Rule Them All: Overlapping Features of Training
  Methods
No One Representation to Rule Them All: Overlapping Features of Training Methods
Raphael Gontijo-Lopes
Yann N. Dauphin
E. D. Cubuk
18
60
0
20 Oct 2021
Exploring the Limits of Large Scale Pre-training
Exploring the Limits of Large Scale Pre-training
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
55
114
0
05 Oct 2021
ResNet strikes back: An improved training procedure in timm
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
209
487
0
01 Oct 2021
Digital Signal Processing Using Deep Neural Networks
Digital Signal Processing Using Deep Neural Networks
Brian Shevitski
Y. Watkins
Nicole Man
Michael Girard
AI4CE
18
4
0
21 Sep 2021
Compute and Energy Consumption Trends in Deep Learning Inference
Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov
Fernando Martínez-Plumed
José Hernández Orallo
29
113
0
12 Sep 2021
Robust fine-tuning of zero-shot models
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
23
688
0
04 Sep 2021
Towards Efficient and Data Agnostic Image Classification Training
  Pipeline for Embedded Systems
Towards Efficient and Data Agnostic Image Classification Training Pipeline for Embedded Systems
K. Prokofiev
V. Sovrasov
3DH
19
2
0
16 Aug 2021
Go Wider Instead of Deeper
Go Wider Instead of Deeper
Fuzhao Xue
Ziji Shi
Futao Wei
Yuxuan Lou
Yong Liu
Yang You
ViT
MoE
17
80
0
25 Jul 2021
A Systematic Survey of Text Worlds as Embodied Natural Language
  Environments
A Systematic Survey of Text Worlds as Embodied Natural Language Environments
Peter Alexander Jansen
LM&Ro
15
21
0
08 Jul 2021
Multimodal Few-Shot Learning with Frozen Language Models
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
53
749
0
25 Jun 2021
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo
A. Piergiovanni
Anurag Arnab
Mostafa Dehghani
A. Angelova
ViT
23
127
0
21 Jun 2021
BEiT: BERT Pre-Training of Image Transformers
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
30
2,744
0
15 Jun 2021
Scaling Vision with Sparse Mixture of Experts
Scaling Vision with Sparse Mixture of Experts
C. Riquelme
J. Puigcerver
Basil Mustafa
Maxim Neumann
Rodolphe Jenatton
André Susano Pinto
Daniel Keysers
N. Houlsby
MoE
12
575
0
10 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
49
1,167
0
09 Jun 2021
MLP-Mixer: An all-MLP Architecture for Vision
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
271
2,603
0
04 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
308
5,773
0
29 Apr 2021
Visformer: The Vision-friendly Transformer
Visformer: The Vision-friendly Transformer
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
111
209
0
26 Apr 2021
The Shape of Learning Curves: a Review
The Shape of Learning Curves: a Review
T. Viering
Marco Loog
18
122
0
19 Mar 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction
  without Convolutions
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
274
3,622
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
298
3,693
0
11 Feb 2021
Bottleneck Transformers for Visual Recognition
Bottleneck Transformers for Visual Recognition
A. Srinivas
Tsung-Yi Lin
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
290
979
0
27 Jan 2021
Transformers in Vision: A Survey
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
227
2,428
0
04 Jan 2021
Meta Pseudo Labels
Meta Pseudo Labels
Hieu H. Pham
Zihang Dai
Qizhe Xie
Minh-Thang Luong
Quoc V. Le
VLM
253
656
0
23 Mar 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
228
4,460
0
23 Jan 2020
Previous
12345