ResearchTrend.AI
Rethinking and Improving Relative Position Encoding for Vision Transformer

29 July 2021
Kan Wu
Houwen Peng
Minghao Chen
Jianlong Fu
Hongyang Chao
    ViT

Papers citing "Rethinking and Improving Relative Position Encoding for Vision Transformer"

50 / 163 papers shown
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Feng Liu
Nicholas Chimitt
Lanqing Guo
Jitesh Jain
Aditya Kane
...
Arun Ross
Humphrey Shi
Zhangyang Wang
A. Jain
Xiaoming Liu
CVBM
22
0
0
07 May 2025
LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers
M. Chowdhury
Md Rifat Ur Rahman
Akil Ahmad Taki
25
0
0
19 Apr 2025
Air Quality Prediction with A Meteorology-Guided Modality-Decoupled Spatio-Temporal Network
Hang Yin
Yan Zhang
Jian Xu
Jian-Long Chang
Y. Li
Cheng-Lin Liu
34
0
0
14 Apr 2025
Learning Object Focused Attention
Vivek Trivedy
A. Almalki
Longin Jan Latecki
31
0
0
10 Apr 2025
HGFormer: Topology-Aware Vision Transformer with HyperGraph Learning
Hao Wang
Shuo Zhang
Biao Leng
ViT
62
0
0
03 Apr 2025
Spectral-Adaptive Modulation Networks for Visual Perception
Guhnoo Yun
J. Yoo
Kijung Kim
Jeongho Lee
Paul Hongsuck Seo
Dong Hwan Kim
34
0
0
31 Mar 2025
Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition
Koki Hirooka
Abu Saleh Musa Miah
Tatsuya Murakami
Yuto Akiba
Yong Seok Hwang
Jungpil Shin
SLR
54
0
0
21 Mar 2025
UniNet: A Unified Multi-granular Traffic Modeling Framework for Network Security
Binghui Wu
D. Divakaran
M. Gurusamy
57
0
0
06 Mar 2025
Partial Convolution Meets Visual Attention
Haiduo Huang
Fuwei Yang
D. Li
Ji Liu
Lu Tian
Jinzhang Peng
Pengju Ren
E. Barsoum
3DH
121
0
0
05 Mar 2025
Constrained Generative Modeling with Manually Bridged Diffusion Models
Saeid Naderiparizi
Xiaoxuan Liang
Berend Zwartsenberg
Frank D. Wood
DiffM
60
0
0
27 Feb 2025
Lightweight yet Efficient: An External Attentive Graph Convolutional Network with Positional Prompts for Sequential Recommendation
Jinyu Zhang
Chao Li
Zhongying Zhao
62
0
0
21 Feb 2025
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Yunshan Zhong
Yuyao Zhou
Yuxin Zhang
Shen Li
Yong Li
Fei Chao
Zhanpeng Zeng
Rongrong Ji
MQ
89
0
0
31 Dec 2024
Harmformer: Harmonic Networks Meet Transformers for Continuous Roto-Translation Equivariance
Tomáš Karella
Adam Harmanec
J. Kotera
Jan Blažek
F. Šroubek
21
1
0
06 Nov 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
32
4
0
18 Oct 2024
Toward Robust Real-World Audio Deepfake Detection: Closing the Explainability Gap
Georgia Channing
Juil Sock
Ronald Clark
Philip H. S. Torr
Christian Schroeder de Witt
30
2
0
09 Oct 2024
Tackling the Abstraction and Reasoning Corpus with Vision Transformers: the Importance of 2D Representation, Positions, and Objects
Wenhao Li
Yudong Xu
Scott Sanner
Elias Boutros Khalil
ViT
29
3
0
08 Oct 2024
3D-LSPTM: An Automatic Framework with 3D-Large-Scale Pretrained Model for Laryngeal Cancer Detection Using Laryngoscopic Videos
Meiyu Qiu
Y. Li
Wenjun Huang
Haoyun Zhang
Weiping Zheng
Wenbin Lei
Xiaomao Fan
18
0
0
02 Sep 2024
RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning
Kunming Su
Qiuxia Wu
Panpan Cai
Xiaogang Zhu
Xuequan Lu
Zhiyong Wang
Kun Hu
3DPC
27
2
0
31 Aug 2024
Hierarchical Network Fusion for Multi-Modal Electron Micrograph Representation Learning with Foundational Large Language Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
30
0
0
24 Aug 2024
Positional Prompt Tuning for Efficient 3D Representation Learning
Shaochen Zhang
Zekun Qi
Runpei Dong
Xiuxiu Bai
Xing Wei
37
4
0
21 Aug 2024
MCPDepth: Omnidirectional Depth Estimation via Stereo Matching from Multi-Cylindrical Panoramas
Feng Qiao
Zhexiao Xiong
Xinge Zhu
Yuexin Ma
Qiumeng He
Nathan Jacobs
MDE
16
1
0
03 Aug 2024
Rethinking Attention Module Design for Point Cloud Analysis
Chengzhi Wu
Kaige Wang
Zeyun Zhong
Hao Fu
Junwei Zheng
Jiaming Zhang
Julius Pfrommer
Jürgen Beyerer
3DPC
44
1
0
27 Jul 2024
Transformer-based Single-Cell Language Model: A Survey
Wei Lan
Guohang He
Mingyang Liu
Qingfeng Chen
Junyue Cao
Wei Peng
MedIm
LRM
20
7
0
18 Jul 2024
Translatotron-V(ison): An End-to-End Model for In-Image Machine Translation
Zhibin Lan
Liqiang Niu
Fandong Meng
Jie Zhou
Min Zhang
Jinsong Su
VLM
23
5
0
03 Jul 2024
PNeRV: A Polynomial Neural Representation for Videos
Sonam Gupta
S. Tomar
Grigorios G. Chrysos
Sukhendu Das
A. N. Rajagopalan
38
0
0
27 Jun 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
34
2
0
22 May 2024
From CNNs to Transformers in Multimodal Human Action Recognition: A Survey
Muhammad Bilal Shaikh
Syed Mohammed Shamsul Islam
Douglas Chai
Naveed Akhtar
30
9
0
22 May 2024
Pseudo Channel: Time Embedding for Motor Imagery Decoding
Zhengqing Miao
Meirong Zhao
16
1
0
21 May 2024
Semantically Consistent Video Inpainting with Conditional Diffusion Models
Dylan Green
William Harvey
Saeid Naderiparizi
Matthew Niedoba
Yunpeng Liu
...
Vasileios Lioutas
Setareh Dabiri
Adam Scibior
Berend Zwartsenberg
Frank D. Wood
DiffM
23
1
0
30 Apr 2024
Utilizing Large Language Models for Information Extraction from Real Estate Transactions
Yu Zhao
Haoxiang Gao
AILaw
40
9
0
28 Apr 2024
NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT
Xinzhe Zheng
Sijie Ji
Yipeng Pan
Kaiwen Zhang
Chenshu Wu
19
1
0
13 Apr 2024
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
28
24
0
12 Apr 2024
HSViT: Horizontally Scalable Vision Transformer
Chenhao Xu
Chang-Tsun Li
Chee Peng Lim
Douglas Creighton
ViT
24
1
0
08 Apr 2024
Equipping Sketch Patches with Context-Aware Positional Encoding for Graphic Sketch Representation
Sicong Zang
Zhijun Fang
34
0
0
26 Mar 2024
KeyPoint Relative Position Encoding for Face Recognition
Minchul Kim
Yiyang Su
Feng Liu
Anil Jain
Xiaoming Liu
CVBM
32
7
0
21 Mar 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
29
33
0
20 Mar 2024
Quantum Mixed-State Self-Attention Network
Fu Chen
Qinglin Zhao
Li Feng
Chuangtao Chen
Yangbin Lin
Jianhong Lin
34
5
0
05 Mar 2024
Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology
Wenhao Tang
Fengtao Zhou
Shengyue Huang
Xiang Zhu
Yi Zhang
Bo Liu
27
20
0
27 Feb 2024
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
Yu-Qi Yang
Yufeng Guo
Yang Liu
3DPC
35
2
0
22 Feb 2024
Locality-Sensitive Hashing-Based Efficient Point Transformer with Applications in High-Energy Physics
Siqi Miao
Zhiyuan Lu
Mia Liu
Javier Duarte
Pan Li
34
4
0
19 Feb 2024
Exploring the Synergies of Hybrid CNNs and ViTs Architectures for Computer Vision: A survey
Haruna Yunusa
Shiyin Qin
Abdulrahman Hamman Adama Chukkol
Abdulganiyu Abdu Yusuf
Isah Bello
A. Lawan
ViT
17
12
0
05 Feb 2024
Towards Visual Syntactical Understanding
Sayeed Shafayet Chowdhury
Soumyadeep Chandra
Kaushik Roy
NAI
14
0
0
30 Jan 2024
MsSVT++: Mixed-scale Sparse Voxel Transformer with Center Voting for 3D Object Detection
Jianan Li
Shaocong Dong
Lihe Ding
Tingfa Xu
3DPC
19
7
0
22 Jan 2024
SymTC: A Symbiotic Transformer-CNN Net for Instance Segmentation of Lumbar Spine MRI
Jiasong Chen
Linchen Qian
Linhai Ma
Timur Urakov
Weiyong Gu
Liang Liang
MedIm
29
4
0
17 Jan 2024
LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning
Hongye Jin
Xiaotian Han
Jingfeng Yang
Zhimeng Jiang
Zirui Liu
Chia-Yuan Chang
Huiyuan Chen
Xia Hu
20
100
0
02 Jan 2024
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
18
0
0
01 Dec 2023
Categorical Traffic Transformer: Interpretable and Diverse Behavior Prediction with Tokenized Latent
Yuxiao Chen
Sander Tonkens
Marco Pavone
25
9
0
30 Nov 2023
Typhoon Intensity Prediction with Vision Transformer
Huanxin Chen
Pengshuai Yin
Huichou Huang
Qingyao Wu
Ruirui Liu
Xiatian Zhu
17
0
0
28 Nov 2023
Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture
Wei-Jang Li
Yang Wei
Tianpeng Liu
Yuenan Hou
Yuxuan Li
Zhen Liu
Yongxiang Liu
Li Liu
19
17
0
26 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
25
4
0
21 Nov 2023