Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers

20 February 2025
A. Fuller, Yousef Yassin, Daniel G. Kyrollos, Evan Shelhamer, James R. Green
Links: arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (12★)

Papers citing "Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers"

Showing 50 of 79 citing papers (page 1 of 2).

Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting. Computer Vision and Pattern Recognition (CVPR), 2025.
Kaouther Messaoud, Matthieu Cord, Alexandre Alahi
10 Jan 2025

Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data
Soroush Omranpour, Guillaume Rabusseau, Reihaneh Rabbany
Tags: AI4TS, AIFin
13 Dec 2024

FlowTS: Time Series Generation via Rectified Flow
Yang Hu, Xinyu Wang, Lirong Wu, Huatian Zhang, Stan Z. Li, Sheng Wang, Jen-tse Huang, Jiheng Zhang, Ziyun Li, Tianlong Chen
Tags: AI4TS
12 Nov 2024

T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data. International Conference on Learning Representations (ICLR), 2024.
Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
07 Oct 2024

Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh, Samuel Klein, François Charton, Tobias Golling, Lukas Heinrich, Michael Kagan, Ines Ochoa, Margarita Osadchy
19 Sep 2024

Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei, Abhinav Gupta, Pedro Morgado
Tags: SSL
22 Jul 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao
11 Jul 2024

Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson
10 Jun 2024

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu
28 May 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green
22 May 2024

MobileNetV4 - Universal Models for the Mobile Ecosystem
Danfeng Qin, Chas Leichner, M. Delakis, Marco Fornoni, Shixin Luo, ..., Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew G. Howard
Tags: MQ
16 Apr 2024

Rotary Position Embedding for Vision Transformer
Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun
20 Mar 2024

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong-Jin Liu, Chengjie Wang
07 Mar 2024

Learning and Leveraging World Models in Visual Representation Learning
Q. Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun
Tags: SSL
01 Mar 2024

Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller, Krista A. Ehinger, Tom Drummond
19 Feb 2024

Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning
Esther Rolf, Konstantin Klemmer, Caleb Robinson, Hannah Kerner
02 Feb 2024

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design. Computer Vision and Pattern Recognition (CVPR), 2024.
Seokju Yun, Youngmin Ro
Tags: ViT
29 Jan 2024

Rethinking Patch Dependence for Masked Autoencoders
Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg
25 Jan 2024

Vision Transformers Need Registers. International Conference on Learning Representations (ICLR), 2023.
Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski
Tags: ViT
28 Sep 2023

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
Tags: MLLM, VLM, ObjD
24 Aug 2023

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. International Conference on Learning Representations (ICLR), 2023.
Tri Dao
Tags: LRM
17 Jul 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution. Neural Information Processing Systems (NeurIPS), 2023.
Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, ..., Avital Oliver, Piotr Padlewski, A. Gritsenko, Mario Lučić, N. Houlsby
Tags: ViT
12 Jul 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles. International Conference on Machine Learning (ICML), 2023.
Chaitanya K. Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, ..., Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer
Tags: 3DH
01 Jun 2023

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design. Neural Information Processing Systems (NeurIPS), 2023.
Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer
Tags: VLM
22 May 2023

Transformer-Based Visual Segmentation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai-xiang Chen, Ziwei Liu, Chen Change Loy
Tags: ViT, MedIm
19 Apr 2023

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization. IEEE International Conference on Computer Vision (ICCV), 2023.
Pavan Kumar Anasosalu Vasu, J. Gabriel, Jeff J. Zhu, Oncel Tuzel, Anurag Ranjan
Tags: ViT
24 Mar 2023

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. European Conference on Computer Vision (ECCV), 2023.
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, ..., Chun-yue Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang
Tags: ObjD
09 Mar 2023

Scaling Vision Transformers to 22 Billion Parameters. International Conference on Machine Learning (ICML), 2023.
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, ..., Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, N. Houlsby
Tags: MLLM
10 Feb 2023

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. Computer Vision and Pattern Recognition (CVPR), 2023.
Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael G. Rabbat, Yann LeCun, Nicolas Ballas
Tags: SSL, AI4TS, MDE
19 Jan 2023

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. Computer Vision and Pattern Recognition (CVPR), 2023.
Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie
Tags: SyDa
02 Jan 2023

Rethinking Vision Transformers for MobileNet Size and Speed. IEEE International Conference on Computer Vision (ICCV), 2022.
Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren
Tags: ViT
15 Dec 2022

FlexiViT: One Model for All Patch Sizes. Computer Vision and Pattern Recognition (CVPR), 2022.
Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetić
Tags: VLM
15 Dec 2022

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. International Conference on Learning Representations (ICLR), 2022.
Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam
Tags: AIFin, AI4TS
27 Nov 2022

CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow. IEEE International Conference on Computer Vision (ICCV), 2022.
Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, G. Csurka, L. Antsfeld, Boris Chidlovskii, Jérôme Revaud
Tags: ViT
18 Nov 2022

SegViT: Semantic Segmentation with Plain Vision Transformers. Neural Information Processing Systems (NeurIPS), 2022.
Bowen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin K. Wei, Chunhua Shen, Yifan Liu
Tags: ViT
12 Oct 2022

PatchDropout: Economizing Vision Transformers Using Patch Dropout. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022.
Yue Liu, Christos Matsoukas, Fredrik Strand, Hossein Azizpour, Kevin Smith
10 Aug 2022

Multimodal Learning with Transformers: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
Peng Xu, Xiatian Zhu, David Clifton
Tags: ViT
13 Jun 2022

MobileOne: An Improved One millisecond Mobile Backbone. Computer Vision and Pattern Recognition (CVPR), 2022.
Pavan Kumar Anasosalu Vasu, J. Gabriel, Jeff J. Zhu, Oncel Tuzel, Anurag Ranjan
08 Jun 2022

Separable Self-attention for Mobile Vision Transformers
Sachin Mehta, Mohammad Rastegari
Tags: ViT, MQ
06 Jun 2022

EfficientFormer: Vision Transformers at MobileNet Speed. Neural Information Processing Systems (NeurIPS), 2022.
Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren
Tags: ViT
02 Jun 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Neural Information Processing Systems (NeurIPS), 2022.
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
Tags: VLM
27 May 2022

EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers. European Conference on Computer Vision (ECCV), 2022.
Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Łukasz Dudziak, Jiaming Song, Georgios Tzimiropoulos, Brais Martínez
Tags: ViT
06 May 2022

DeiT III: Revenge of the ViT. European Conference on Computer Vision (ECCV), 2022.
Hugo Touvron, Matthieu Cord, Hervé Jégou
Tags: ViT
14 Apr 2022

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection. IEEE International Conference on Computer Vision (ICCV), 2022.
Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang
06 Apr 2022

Exploring Plain Vision Transformer Backbones for Object Detection. European Conference on Computer Vision (ECCV), 2022.
Yanghao Li, Hanzi Mao, Ross B. Girshick, Kaiming He
Tags: ViT
30 Mar 2022

A ConvNet for the 2020s. Computer Vision and Pattern Recognition (CVPR), 2022.
Zhuang Liu, Hanzi Mao, Chaozheng Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
Tags: ViT
10 Jan 2022

Masked Autoencoders Are Scalable Vision Learners. Computer Vision and Pattern Recognition (CVPR), 2021.
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick
Tags: ViT, TPM
11 Nov 2021

An Empirical Study of Training End-to-End Vision-and-Language Transformers. Computer Vision and Pattern Recognition (CVPR), 2021.
Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, ..., Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
Tags: VLM
03 Nov 2021

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta, Mohammad Rastegari
Tags: ViT
05 Oct 2021

Mobile-Former: Bridging MobileNet and Transformer. Computer Vision and Pattern Recognition (CVPR), 2021.
Yinpeng Chen, Xiyang Dai, Dongdong Chen, Xiaoyi Dong, Lu Yuan, Zicheng Liu
Tags: ViT
12 Aug 2021