2502.15021
Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers
20 February 2025
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
Papers citing
"Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers"
50 / 79 papers shown
Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting
Computer Vision and Pattern Recognition (CVPR), 2025
Kaouther Messaoud
Matthieu Cord
Alexandre Alahi
288
5
0
10 Jan 2025
Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data
Soroush Omranpour
Guillaume Rabusseau
Reihaneh Rabbany
AI4TS
AIFin
227
4
0
13 Dec 2024
FlowTS: Time Series Generation via Rectified Flow
Yang Hu
Xinyu Wang
Lirong Wu
Huatian Zhang
Stan Z. Li
Sheng Wang
Jen-tse Huang
Jiheng Zhang
Ziyun Li
Tianlong Chen
AI4TS
426
0
0
12 Nov 2024
T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data
International Conference on Learning Representations (ICLR), 2024
Hugo Thimonier
José Lucas De Melo Costa
Fabrice Popineau
Arpad Rimmel
Bich-Liên Doan
553
8
0
07 Oct 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
327
21
0
19 Sep 2024
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei
Abhinav Gupta
Pedro Morgado
SSL
261
18
0
22 Jul 2024
FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah
Ganesh Bikshandi
Ying Zhang
Vijay Thakkar
Pradeep Ramani
Tri Dao
568
430
0
11 Jul 2024
Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu
Andres Potapczynski
Marc Finzi
Micah Goldblum
Andrew Gordon Wilson
293
24
0
10 Jun 2024
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu
Fei Zhu
Shijie Ma
Cheng-Lin Liu
277
8
0
28 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
372
7
0
22 May 2024
MobileNetV4 - Universal Models for the Mobile Ecosystem
Danfeng Qin
Chas Leichner
M. Delakis
Marco Fornoni
Shixin Luo
...
Berkin Akin
Vaibhav Aggarwal
Tenghui Zhu
Daniele Moro
Andrew G. Howard
MQ
424
495
0
16 Apr 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
561
181
0
20 Mar 2024
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li
Qiang Nie
Weifu Fu
Yuhuan Lin
Guangpin Tao
Yong-Jin Liu
Chengjie Wang
295
9
0
07 Mar 2024
Learning and Leveraging World Models in Visual Representation Learning
Q. Garrido
Mahmoud Assran
Nicolas Ballas
Adrien Bardes
Laurent Najman
Yann LeCun
SSL
349
61
0
01 Mar 2024
Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller
Krista A. Ehinger
Tom Drummond
443
12
0
19 Feb 2024
Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning
Esther Rolf
Konstantin Klemmer
Caleb Robinson
Hannah Kerner
273
82
0
02 Feb 2024
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Computer Vision and Pattern Recognition (CVPR), 2024
Seokju Yun
Youngmin Ro
ViT
447
125
0
29 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
431
38
0
25 Jan 2024
Vision Transformers Need Registers
International Conference on Learning Representations (ICLR), 2023
Timothée Darcet
Maxime Oquab
Julien Mairal
Piotr Bojanowski
ViT
564
706
0
28 Sep 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
754
1,842
0
24 Aug 2023
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
International Conference on Learning Representations (ICLR), 2023
Tri Dao
LRM
601
2,426
0
17 Jul 2023
Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
Neural Information Processing Systems (NeurIPS), 2023
Mostafa Dehghani
Basil Mustafa
Josip Djolonga
Jonathan Heek
Matthias Minderer
...
Avital Oliver
Piotr Padlewski
A. Gritsenko
Mario Lučić
N. Houlsby
ViT
428
206
0
12 Jul 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
International Conference on Machine Learning (ICML), 2023
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
434
366
0
01 Jun 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Neural Information Processing Systems (NeurIPS), 2023
Ibrahim Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
679
98
0
22 May 2023
Transformer-Based Visual Segmentation: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
563
271
0
19 Apr 2023
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization
IEEE International Conference on Computer Vision (ICCV), 2023
Pavan Kumar Anasosalu Vasu
J. Gabriel
Jeff J. Zhu
Oncel Tuzel
Anurag Ranjan
ViT
406
334
0
24 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
European Conference on Computer Vision (ECCV), 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
881
3,657
0
09 Mar 2023
Scaling Vision Transformers to 22 Billion Parameters
International Conference on Machine Learning (ICML), 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
466
825
0
10 Feb 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Computer Vision and Pattern Recognition (CVPR), 2023
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
SSL
AI4TS
MDE
571
706
0
19 Jan 2023
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Computer Vision and Pattern Recognition (CVPR), 2023
Sanghyun Woo
Shoubhik Debnath
Ronghang Hu
Xinlei Chen
Zhuang Liu
In So Kweon
Saining Xie
SyDa
618
1,521
0
02 Jan 2023
Rethinking Vision Transformers for MobileNet Size and Speed
IEEE International Conference on Computer Vision (ICCV), 2022
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
ViT
433
290
0
15 Dec 2022
FlexiViT: One Model for All Patch Sizes
Computer Vision and Pattern Recognition (CVPR), 2022
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim Alabdulmohsin
Filip Pavetić
VLM
512
152
0
15 Dec 2022
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
International Conference on Learning Representations (ICLR), 2022
Yuqi Nie
Nam H. Nguyen
Phanwadee Sinthong
Jayant Kalagnanam
AIFin
AI4TS
1.2K
3,099
0
27 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
IEEE International Conference on Computer Vision (ICCV), 2022
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
593
167
0
18 Nov 2022
SegViT: Semantic Segmentation with Plain Vision Transformers
Neural Information Processing Systems (NeurIPS), 2022
Bowen Zhang
Zhi Tian
Quan Tang
Xiangxiang Chu
Xiaolin K. Wei
Chunhua Shen
Yifan Liu
ViT
374
212
0
12 Oct 2022
PatchDropout: Economizing Vision Transformers Using Patch Dropout
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Yue Liu
Christos Matsoukas
Fredrik Strand
Hossein Azizpour
Kevin Smith
316
40
0
10 Aug 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
652
947
0
13 Jun 2022
MobileOne: An Improved One millisecond Mobile Backbone
Computer Vision and Pattern Recognition (CVPR), 2022
Pavan Kumar Anasosalu Vasu
J. Gabriel
Jeff J. Zhu
Oncel Tuzel
Anurag Ranjan
404
287
0
08 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
381
416
0
06 Jun 2022
EfficientFormer: Vision Transformers at MobileNet Speed
Neural Information Processing Systems (NeurIPS), 2022
Yanyu Li
Geng Yuan
Yang Wen
Eric Hu
Georgios Evangelidis
Sergey Tulyakov
Yanzhi Wang
Jian Ren
ViT
835
571
0
02 Jun 2022
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Neural Information Processing Systems (NeurIPS), 2022
Tri Dao
Daniel Y. Fu
Stefano Ermon
Atri Rudra
Christopher Ré
VLM
984
3,922
0
27 May 2022
EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers
European Conference on Computer Vision (ECCV), 2022
Junting Pan
Adrian Bulat
Fuwen Tan
Xiatian Zhu
Łukasz Dudziak
Jiaming Song
Georgios Tzimiropoulos
Brais Martínez
ViT
431
259
0
06 May 2022
DeiT III: Revenge of the ViT
European Conference on Computer Vision (ECCV), 2022
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
421
597
0
14 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
IEEE International Conference on Computer Vision (ICCV), 2022
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
324
67
0
06 Apr 2022
Exploring Plain Vision Transformer Backbones for Object Detection
European Conference on Computer Vision (ECCV), 2022
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
800
1,096
0
30 Mar 2022
A ConvNet for the 2020s
Computer Vision and Pattern Recognition (CVPR), 2022
Zhuang Liu
Hanzi Mao
Chao-Yuan Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
807
7,672
0
10 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Computer Vision and Pattern Recognition (CVPR), 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
2.8K
10,973
0
11 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
362
445
0
03 Nov 2021
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
720
2,093
0
05 Oct 2021
Mobile-Former: Bridging MobileNet and Transformer
Computer Vision and Pattern Recognition (CVPR), 2021
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
913
659
0
12 Aug 2021