Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2403.13298
Cited By
Rotary Position Embedding for Vision Transformer
20 March 2024
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Rotary Position Embedding for Vision Transformer"
14 / 14 papers shown
Title
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
W. Xu
Shibiao Xu
ViT
39
0
0
06 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
98
0
0
17 Apr 2025
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
45
0
0
14 Mar 2025
LEDiT: Your Length-Extrapolatable Diffusion Transformer without Positional Encoding
Shen Zhang
Yaning Tan
Siyuan Liang
Zhaowei Chen
Linze Li
...
Shuheng Li
Zhenyu Zhao
Caihua Chen
Jiajun Liang
Yao Tang
47
0
0
06 Mar 2025
Detection, Retrieval, and Explanation Unified: A Violence Detection System Based on Knowledge Graphs and GAT
Wen-Dong Jiang
Chih-Yung Chang
Diptendu Sinha Roy
36
0
0
07 Jan 2025
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
H. Chen
Z. Wang
X. Li
X. Sun
Fangyi Chen
Jiang Liu
J. Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
103
6
0
14 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
103
2
0
02 Dec 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
55
25
0
07 Jun 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
34
2
0
22 May 2024
FiT: Flexible Vision Transformer for Diffusion Model
Zeyu Lu
Zidong Wang
Di Huang
Chengyue Wu
Xihui Liu
Wanli Ouyang
Lei Bai
146
13
0
19 Feb 2024
What Do Self-Supervised Vision Transformers Learn?
Namuk Park
Wonjae Kim
Byeongho Heo
Taekyung Kim
Sangdoo Yun
SSL
62
76
1
01 May 2023
ResNet strikes back: An improved training procedure in timm
Ross Wightman
Hugo Touvron
Hervé Jégou
AI4TS
196
477
0
01 Oct 2021
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
948
20,214
0
17 Apr 2017
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
243
1,817
0
18 Aug 2016
1