Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers

20 February 2025
A. Fuller, Yousef Yassin, Daniel G. Kyrollos, Evan Shelhamer, James R. Green
Links: arXiv (abs) · PDF · HTML · HuggingFace (1 upvote) · GitHub (12★)

Papers citing "Thicker and Quicker: A Jumbo Token for Fast Plain Vision Transformers"

Showing 50 of 79 citing papers (page 1 of 2).

Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting. Computer Vision and Pattern Recognition (CVPR), 2025.
Kaouther Messaoud, Matthieu Cord, Alexandre Alahi
10 Jan 2025

Higher Order Transformers: Enhancing Stock Movement Prediction On Multimodal Time-Series Data
Soroush Omranpour, Guillaume Rabusseau, Reihaneh Rabbany
Tags: AI4TS, AIFin
13 Dec 2024

FlowTS: Time Series Generation via Rectified Flow
Yang Hu, Xinyu Wang, Lirong Wu, Huatian Zhang, Stan Z. Li, Sheng Wang, Jen-tse Huang, Jiheng Zhang, Ziyun Li, Tianlong Chen
Tags: AI4TS
12 Nov 2024

T-JEPA: Augmentation-Free Self-Supervised Learning for Tabular Data. International Conference on Learning Representations (ICLR), 2024.
Hugo Thimonier, José Lucas De Melo Costa, Fabrice Popineau, Arpad Rimmel, Bich-Liên Doan
07 Oct 2024

Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh, Samuel Klein, François Charton, Tobias Golling, Lukas Heinrich, Michael Kagan, Ines Ochoa, Margarita Osadchy
19 Sep 2024

Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei, Abhinav Gupta, Pedro Morgado
Tags: SSL
22 Jul 2024

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Jay Shah, Ganesh Bikshandi, Ying Zhang, Vijay Thakkar, Pradeep Ramani, Tri Dao
11 Jul 2024

Compute Better Spent: Replacing Dense Layers with Structured Matrices
Shikai Qiu, Andres Potapczynski, Marc Finzi, Micah Goldblum, Andrew Gordon Wilson
10 Jun 2024

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu, Fei Zhu, Shijie Ma, Cheng-Lin Liu
28 May 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller, Daniel G. Kyrollos, Yousef Yassin, James R. Green
22 May 2024

MobileNetV4 - Universal Models for the Mobile Ecosystem
Danfeng Qin, Chas Leichner, M. Delakis, Marco Fornoni, Shixin Luo, ..., Berkin Akin, Vaibhav Aggarwal, Tenghui Zhu, Daniele Moro, Andrew G. Howard
Tags: MQ
16 Apr 2024

Rotary Position Embedding for Vision Transformer
Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun
20 Mar 2024

LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong-Jin Liu, Chengjie Wang
07 Mar 2024

Learning and Leveraging World Models in Visual Representation Learning
Q. Garrido, Mahmoud Assran, Nicolas Ballas, Adrien Bardes, Laurent Najman, Yann LeCun
Tags: SSL
01 Mar 2024

Perceiving Longer Sequences With Bi-Directional Cross-Attention Transformers
Markus Hiller, Krista A. Ehinger, Tom Drummond
19 Feb 2024

Mission Critical -- Satellite Data is a Distinct Modality in Machine Learning
Esther Rolf, Konstantin Klemmer, Caleb Robinson, Hannah Kerner
02 Feb 2024

SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design. Computer Vision and Pattern Recognition (CVPR), 2024.
Seokju Yun, Youngmin Ro
Tags: ViT
29 Jan 2024

Rethinking Patch Dependence for Masked Autoencoders
Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg
25 Jan 2024

Vision Transformers Need Registers. International Conference on Learning Representations (ICLR), 2023.
Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski
Tags: ViT
28 Sep 2023

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, Jingren Zhou
Tags: MLLM, VLM, ObjD
24 Aug 2023

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. International Conference on Learning Representations (ICLR), 2023.
Tri Dao
Tags: LRM
17 Jul 2023

Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution. Neural Information Processing Systems (NeurIPS), 2023.
Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, ..., Avital Oliver, Piotr Padlewski, A. Gritsenko, Mario Lučić, N. Houlsby
Tags: ViT
12 Jul 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles. International Conference on Machine Learning (ICML), 2023.
Chaitanya K. Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, ..., Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer
Tags: 3DH
01 Jun 2023

Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design. Neural Information Processing Systems (NeurIPS), 2023.
Ibrahim Alabdulmohsin, Xiaohua Zhai, Alexander Kolesnikov, Lucas Beyer
Tags: VLM
22 May 2023

Transformer-Based Visual Segmentation: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023.
Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai-xiang Chen, Ziwei Liu, Chen Change Loy
Tags: ViT, MedIm
19 Apr 2023

FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization. IEEE International Conference on Computer Vision (ICCV), 2023.
Pavan Kumar Anasosalu Vasu, J. Gabriel, Jeff J. Zhu, Oncel Tuzel, Anurag Ranjan
Tags: ViT
24 Mar 2023

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection. European Conference on Computer Vision (ECCV), 2023.
Shilong Liu, Zhaoyang Zeng, Tianhe Ren, Feng Li, Hao Zhang, ..., Chun-yue Li, Jianwei Yang, Hang Su, Jun Zhu, Lei Zhang
Tags: ObjD
09 Mar 2023

Scaling Vision Transformers to 22 Billion Parameters. International Conference on Machine Learning (ICML), 2023.
Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, ..., Mario Lučić, Xiaohua Zhai, Daniel Keysers, Jeremiah Harmsen, N. Houlsby
Tags: MLLM
10 Feb 2023

Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture. Computer Vision and Pattern Recognition (CVPR), 2023.
Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael G. Rabbat, Yann LeCun, Nicolas Ballas
Tags: SSL, AI4TS, MDE
19 Jan 2023

ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders. Computer Vision and Pattern Recognition (CVPR), 2023.
Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, Saining Xie
Tags: SyDa
02 Jan 2023

Rethinking Vision Transformers for MobileNet Size and Speed. IEEE International Conference on Computer Vision (ICCV), 2022.
Yanyu Li, Ju Hu, Yang Wen, Georgios Evangelidis, Kamyar Salahi, Yanzhi Wang, Sergey Tulyakov, Jian Ren
Tags: ViT
15 Dec 2022

FlexiViT: One Model for All Patch Sizes. Computer Vision and Pattern Recognition (CVPR), 2022.
Lucas Beyer, Pavel Izmailov, Alexander Kolesnikov, Mathilde Caron, Simon Kornblith, Xiaohua Zhai, Matthias Minderer, Michael Tschannen, Ibrahim Alabdulmohsin, Filip Pavetić
Tags: VLM
15 Dec 2022

A Time Series is Worth 64 Words: Long-term Forecasting with Transformers. International Conference on Learning Representations (ICLR), 2022.
Yuqi Nie, Nam H. Nguyen, Phanwadee Sinthong, Jayant Kalagnanam
Tags: AIFin, AI4TS
27 Nov 2022

CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow. IEEE International Conference on Computer Vision (ICCV), 2022.
Philippe Weinzaepfel, Thomas Lucas, Vincent Leroy, Yohann Cabon, Vaibhav Arora, Romain Brégier, G. Csurka, L. Antsfeld, Boris Chidlovskii, Jérôme Revaud
Tags: ViT
18 Nov 2022

SegViT: Semantic Segmentation with Plain Vision Transformers. Neural Information Processing Systems (NeurIPS), 2022.
Bowen Zhang, Zhi Tian, Quan Tang, Xiangxiang Chu, Xiaolin K. Wei, Chunhua Shen, Yifan Liu
Tags: ViT
12 Oct 2022

PatchDropout: Economizing Vision Transformers Using Patch Dropout. IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022.
Yue Liu, Christos Matsoukas, Fredrik Strand, Hossein Azizpour, Kevin Smith
10 Aug 2022

Multimodal Learning with Transformers: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022.
Peng Xu, Xiatian Zhu, David Clifton
Tags: ViT
13 Jun 2022

MobileOne: An Improved One millisecond Mobile Backbone. Computer Vision and Pattern Recognition (CVPR), 2022.
Pavan Kumar Anasosalu Vasu, J. Gabriel, Jeff J. Zhu, Oncel Tuzel, Anurag Ranjan
08 Jun 2022

Separable Self-attention for Mobile Vision Transformers
Sachin Mehta, Mohammad Rastegari
Tags: ViT, MQ
06 Jun 2022

EfficientFormer: Vision Transformers at MobileNet Speed. Neural Information Processing Systems (NeurIPS), 2022.
Yanyu Li, Geng Yuan, Yang Wen, Eric Hu, Georgios Evangelidis, Sergey Tulyakov, Yanzhi Wang, Jian Ren
Tags: ViT
02 Jun 2022

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Neural Information Processing Systems (NeurIPS), 2022.
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
Tags: VLM
27 May 2022

EdgeViTs: Competing Light-weight CNNs on Mobile Devices with Vision Transformers. European Conference on Computer Vision (ECCV), 2022.
Junting Pan, Adrian Bulat, Fuwen Tan, Xiatian Zhu, Łukasz Dudziak, Jiaming Song, Georgios Tzimiropoulos, Brais Martínez
Tags: ViT
06 May 2022

DeiT III: Revenge of the ViT. European Conference on Computer Vision (ECCV), 2022.
Hugo Touvron, Matthieu Cord, Hervé Jégou
Tags: ViT
14 Apr 2022

Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection. IEEE International Conference on Computer Vision (ICCV), 2022.
Yuxin Fang, Shusheng Yang, Shijie Wang, Yixiao Ge, Ying Shan, Xinggang Wang
06 Apr 2022

Exploring Plain Vision Transformer Backbones for Object Detection. European Conference on Computer Vision (ECCV), 2022.
Yanghao Li, Hanzi Mao, Ross B. Girshick, Kaiming He
Tags: ViT
30 Mar 2022

A ConvNet for the 2020s. Computer Vision and Pattern Recognition (CVPR), 2022.
Zhuang Liu, Hanzi Mao, Chaozheng Wu, Christoph Feichtenhofer, Trevor Darrell, Saining Xie
Tags: ViT
10 Jan 2022

Masked Autoencoders Are Scalable Vision Learners. Computer Vision and Pattern Recognition (CVPR), 2021.
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick
Tags: ViT, TPM
11 Nov 2021

An Empirical Study of Training End-to-End Vision-and-Language Transformers. Computer Vision and Pattern Recognition (CVPR), 2021.
Zi-Yi Dou, Yichong Xu, Zhe Gan, Jianfeng Wang, Shuohang Wang, ..., Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
Tags: VLM
03 Nov 2021

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta, Mohammad Rastegari
Tags: ViT
05 Oct 2021

Mobile-Former: Bridging MobileNet and Transformer. Computer Vision and Pattern Recognition (CVPR), 2021.
Yinpeng Chen, Xiyang Dai, Dongdong Chen, Xiaoyi Dong, Lu Yuan, Zicheng Liu
Tags: ViT
12 Aug 2021