2303.15105
Cited By
Vision Transformer with Quadrangle Attention
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
27 March 2023
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
HuggingFace (2 upvotes)
Github (214★)
Papers citing "Vision Transformer with Quadrangle Attention"
36 / 36 papers shown
UniDGF: A Unified Detection-to-Generation Framework for Hierarchical Object Visual Recognition
Xinyu Nan
Lingtao Mao
Huangyu Dai
Zexin Zheng
Xinyu Sun
...
Ben Chen
Yuqing Ding
Chenyi Lei
Wenwu Ou
Han Li
ObjD
196
0
0
20 Nov 2025
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Sensors, 2025
Vinit Mehta
Charu Sharma
Karthick Thiyagarajan
LM&Ro
324
0
0
14 Nov 2025
CoT-PL: Visual Chain-of-Thought Reasoning Meets Pseudo-Labeling for Open-Vocabulary Object Detection
Hojun Choi
Youngsun Lim
Jaeyo Shin
Hyunjung Shim
ObjD
LRM
VLM
181
1
0
16 Oct 2025
Follow-Your-Emoji-Faster: Towards Efficient, Fine-Controllable, and Expressive Freestyle Portrait Animation
Yue Ma
Zexuan Yan
Hongyu Liu
H. Wang
Heng Pan
...
H. Shum
Zhifeng Li
Wei Liu
Linfeng Zhang
Qifeng Chen
VGen
171
10
0
20 Sep 2025
DSpAST: Disentangled Representations for Spatial Audio Reasoning with Large Language Models
Kevin Wilkinghoff
Zheng-Hua Tan
105
0
0
17 Sep 2025
LSNet: See Large, Focus Small
Computer Vision and Pattern Recognition (CVPR), 2025
Ao Wang
Hui Chen
Zijia Lin
Jiawei Han
Guiguang Ding
215
9
0
29 Mar 2025
PVChat: Personalized Video Chat with One-Shot Learning
Yufei Shi
Weilong Yan
Gang Xu
Yumeng Li
Yongqian Li
Hao Sun
Fei Richard Yu
Ming Li
Si Yong Yeo
330
3
0
21 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
253
15
0
20 Mar 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
A. Nassar
Andres Marafioti
Matteo Omenetti
Maksym Lysak
Nikolaos Livathinos
...
Yusik Kim
A. Said Gurbuz
Michele Dolfi
Miquel Farré
Peter W. J. Staar
242
26
0
14 Mar 2025
Video-Text Dataset Construction from Multi-AI Feedback: Promoting Weak-to-Strong Preference Learning for Video Large Language Models
Hao Yi
Qingyang Li
Yihan Hu
Fuzheng Zhang
Di Zhang
Yong Liu
VGen
294
0
0
25 Nov 2024
Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu
Zhi-Yi Chin
Wei-Chen Chiu
125
0
0
06 Oct 2024
From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities
International Conference on Learning Representations (ICLR), 2024
Wanpeng Zhang
Zilong Xie
Yicheng Feng
Yijiang Li
Xingrun Xing
Sipeng Zheng
Zongqing Lu
MLLM
297
8
0
03 Oct 2024
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
IEEE International Conference on Robotics and Automation (ICRA), 2024
Sombit Dey
Jan-Nico Zaech
Nikolay Nikolov
Luc Van Gool
Danda Pani Paudel
MoMe
VLM
380
15
0
23 Sep 2024
Locality-aware Cross-modal Correspondence Learning for Dense Audio-Visual Events Localization
Ling Xing
Hongyu Qu
Rui Yan
Xiangbo Shu
Jinhui Tang
438
8
0
12 Sep 2024
UNIT: Unifying Image and Text Recognition in One Vision Encoder
Neural Information Processing Systems (NeurIPS), 2024
Yi Zhu
Yanpeng Zhou
Chunwei Wang
Yang Cao
Jianhua Han
Lu Hou
Hang Xu
ViT
VLM
225
9
0
06 Sep 2024
LMLT: Low-to-high Multi-Level Vision Transformer for Image Super-Resolution
Jeongsoo Kim
Jongho Nang
Junsuk Choe
ViT
275
6
0
05 Sep 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
257
8
0
30 Jul 2024
HaloQuest: A Visual Hallucination Dataset for Advancing Multimodal Reasoning
Zhecan Wang
Garrett Bingham
Adams Wei Yu
Quoc V. Le
Thang Luong
Golnaz Ghiasi
MLLM
LRM
326
26
0
22 Jul 2024
PoseBench: Benchmarking the Robustness of Pose Estimation Models under Corruptions
Sihan Ma
Jing Zhang
Qiong Cao
Dacheng Tao
200
7
0
20 Jun 2024
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang
H. Wang
Haiyan Zhao
Zonghao Guo
Zhenyu Zhong
Long Lan
Wenjing Yang
Jing Zhang
373
0
0
17 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Di Lin
Dacheng Tao
Liangpei Zhang
305
82
0
17 Jun 2024
MemeGuard: An LLM and VLM-based Framework for Advancing Content Moderation via Meme Intervention
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Prince Jha
Raghav Jain
Konika Mandal
Vasu Sharma
Sriparna Saha
P. Bhattacharyya
175
17
0
08 Jun 2024
Jailbreak Vision Language Models via Bi-Modal Adversarial Prompt
Zonghao Ying
Aishan Liu
Tianyuan Zhang
Zhengmin Yu
Yaning Tan
Xianglong Liu
Dacheng Tao
AAML
310
71
0
06 Jun 2024
Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation
Yi Ma
Hongyu Liu
Haobo Wang
Heng Pan
Yingqing He
...
Ailing Zeng
Chengfei Cai
H. Shum
Wen Liu
Qifeng Chen
278
112
0
04 Jun 2024
Sharing Key Semantics in Transformer Makes Efficient Image Restoration
Bin Ren
Yawei Li
Christos Sakaridis
Rakesh Ranjan
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
Ming-Hsuan Yang
Andrii Zadaianchuk
262
9
0
30 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
162
9
0
16 May 2024
Enhancing Efficiency in Vision Transformer Networks: Design Techniques and Insights
Moein Heidari
Reza Azad
Sina Ghorbani Kolahi
René Arimond
Leon Niggemeier
...
Afshin Bozorgpour
Ehsan Khodapanah Aghdam
Amirhossein Kazerouni
Ilker Hacihaliloglu
Dorit Merhof
238
13
0
28 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
Guang Dai
198
90
0
20 Mar 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
385
8
0
17 Feb 2024
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
IEEE Transactions on Cybernetics (IEEE Trans. Cybern.), 2023
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
333
8
0
10 Oct 2023
SparseSwin: Swin Transformer with Sparse Transformer Block
Krisna Pinasthika
Blessius Sheldo Putra Laksono
Riyandi Banovbi Putera Irsal
Syifa’ Hukma Shabiyya
N. Yudistira
ViT
206
32
0
11 Sep 2023
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
Zhuofan Xia
Xuran Pan
Shiji Song
Li Erran Li
Gao Huang
ViT
209
40
0
04 Sep 2023
ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts
Future Generation Computer Systems (FGCS), 2023
Bilel Benjdira
Anis Koubaa
Anas M. Ali
LM&Ro
143
9
0
22 Aug 2023
ESSAformer: Efficient Transformer for Hyperspectral Image Super-resolution
IEEE International Conference on Computer Vision (ICCV), 2023
Mingjin Zhang
Chi Zhang
Qiming Zhang
Jie-Ru Guo
Xinbo Gao
Jing Zhang
167
50
0
26 Jul 2023
Deep Image Matting: A Comprehensive Survey
Jizhizi Li
Jing Zhang
Dacheng Tao
VLM
219
17
0
10 Apr 2023
Distract Your Attention: Multi-head Cross Attention Network for Facial Expression Recognition
Zhengyao Wen
Wen-Long Lin
Tao Wang
Ge Xu
CVBM
342
257
0
15 Sep 2021