Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

Neural Information Processing Systems (NeurIPS), 2023

12 July 2023

Ibrahim Alabdulmohsin

Papers citing "Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT

Yangguang Li

...

Hongsheng Li

Peng Gao

05 Jun 2024

Patch-enhanced Mask Encoder Prompt Image Generation

Shusong Xu

Peiye Liu

DiffM

29 May 2024

Wavelet-Based Image Tokenizer for Vision Transformers

Zhenhai Zhu

Radu Soricut

ViT

28 May 2024

MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution

28 May 2024

LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate

22 May 2024

What matters when building vision-language models?

Neural Information Processing Systems (NeurIPS), 2024

03 May 2024

PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot

Shenbagaraj Kannapiran

Sreenithy Chandran

Suren Jayasuriya

Spring Berman

07 Apr 2024

MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection

Ali Behrouz

Michele Santacatterina

Ramin Zabih

29 Mar 2024

ViTAR: Vision Transformer with Any Resolution

Hongxia Yang

27 Mar 2024

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Kai Zhang

...

27 Feb 2024

Representing Online Handwriting for Recognition in Large Vision-Language Models

23 Feb 2024

Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures

Vincent Abbott

08 Feb 2024

MESA: Matching Everything by Segmenting Anything

Yesheng Zhang

Xu Zhao

30 Jan 2024

BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model

Computer Vision and Pattern Recognition (CVPR), 2024

04 Jan 2024

Input Compression with Positional Consistency for Efficient Training and Inference of Transformer Neural Networks

Amrit Nagarajan

Anand Raghunathan

VLM ViT

22 Nov 2023

Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

International Conference on Machine Learning (ICML), 2023

06 Nov 2023

Win-Win: Training High-Resolution Vision Transformers from Two Windows

International Conference on Learning Representations (ICLR), 2023

01 Oct 2023

Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

23 Sep 2023

DropCompute: simple and more robust distributed synchronous training via compute variance reduction

Neural Information Processing Systems (NeurIPS), 2023

18 Jun 2023

Generative AI for Rapid Diffusion MRI with Improved Image Quality, Reliability and Generalizability

10 Mar 2023