ResearchTrend.AI

Exploring Plain Vision Transformer Backbones for Object Detection

30 March 2022
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
arXiv: 2203.16527

Papers citing "Exploring Plain Vision Transformer Backbones for Object Detection"

50 / 110 papers shown
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
61
0
0
04 May 2025
CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion
Boyuan Meng
X. Zhang
Peilin Li
Zhe Wu
Yiming Li
Wenkai Zhao
B. Yu
Hui-Liang Shen
ViT
39
0
0
02 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
Fanghua Yu
Jinjin Gu
Jinfan Hu
Zheyuan Li
Chao Dong
DiffM
50
0
0
21 Mar 2025
STEP: Simultaneous Tracking and Estimation of Pose for Animals and Humans
Shashikant Verma
Harish Katti
Soumyaratna Debnath
Yamuna Swamy
S. Raman
100
0
0
17 Mar 2025
Spiking Transformer: Introducing Accurate Addition-Only Spiking Self-Attention for Transformer
Yufei Guo
Xiaode Liu
Y. Chen
Weihang Peng
Yuhan Zhang
Zhe Ma
MQ
43
0
0
28 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
63
8
0
24 Feb 2025
MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval
Haoran Tang
Meng Cao
Jinfa Huang
Ruyang Liu
Peng Jin
Ge Li
Xiaodan Liang
Mamba
92
4
0
24 Feb 2025
Janus: Collaborative Vision Transformer Under Dynamic Network Environment
Linyi Jiang
Silvery Fu
Yifei Zhu
Bo Li
ViT
96
0
0
14 Feb 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
S. Xiang
Chunhong Pan
72
1
0
04 Feb 2025
Kolmogorov-Arnold Network for Remote Sensing Image Semantic Segmentation
Xianping Ma
Ziyao Wang
Yin Hu
Xiaokang Zhang
Man-On Pun
46
0
0
13 Jan 2025
UV-Attack: Physical-World Adversarial Attacks for Person Detection via Dynamic-NeRF-based UV Mapping
Yanjie Li
Wenxuan Zhang
K. Liang
Bin Xiao
AAML
59
1
0
10 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
46
0
03 Jan 2025
IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks
Yaming Zhang
Chenqiang Gao
Fangcen Liu
Junjie Guo
Lan Wang
Xinggan Peng
Deyu Meng
87
0
0
21 Dec 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
W. Liu
X. Wang
3DGS
ViT
108
5
0
17 Dec 2024
Transmission Line Defect Detection Based on UAV Patrol Images and Vision-language Pretraining
Ke Zhang
Zhaoye Zheng
Yurong Guo
Jiacun Wang
Jiyuan Yang
Yangjie Xiao
VLM
77
0
0
18 Nov 2024
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo
Shaobin Zhuang
Kunchang Li
Yu Qiao
Yali Wang
VLM
CLIP
21
0
0
16 Oct 2024
Fractal Calibration for long-tailed object detection
Konstantinos Panagiotis Alexandridis
Ismail Elezi
Jiankang Deng
Anh H. Nguyen
Shan Luo
61
0
0
15 Oct 2024
GlobalMamba: Global Image Serialization for Vision Mamba
Chengkun Wang
Wenzhao Zheng
Jie Zhou
Jiwen Lu
Mamba
31
0
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
64
3
0
14 Oct 2024
FrozenSeg: Harmonizing Frozen Foundation Models for Open-Vocabulary Segmentation
Xi Chen
Haosen Yang
Sheng Jin
Xiatian Zhu
H. Yao
VLM
29
3
0
05 Sep 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
M. Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
41
3
0
30 Aug 2024
How Does Audio Influence Visual Attention in Omnidirectional Videos? Database and Model
Yuxin Zhu
Huiyu Duan
Kaiwei Zhang
Yucheng Zhu
Xilei Zhu
Long Teng
Xiongkuo Min
Guangtao Zhai
67
2
0
10 Aug 2024
Rate-Distortion-Cognition Controllable Versatile Neural Image Compression
Jinming Liu
Ruoyu Feng
Yunpeng Qi
Qiuyu Chen
Zhibo Chen
Wenjun Zeng
Xin Jin
28
2
0
16 Jul 2024
Learning Spatial-Semantic Features for Robust Video Object Segmentation
Xin Li
Deshui Miao
Zhenyu He
Y. Wang
Huchuan Lu
Ming Yang
VOS
49
4
0
10 Jul 2024
Robot Instance Segmentation with Few Annotations for Grasping
Moshe Kimhi
David Vainshtein
Chaim Baskin
Dotan Di Castro
48
2
0
01 Jul 2024
EVF-SAM: Early Vision-Language Fusion for Text-Prompted Segment Anything Model
Yuxuan Zhang
Tianheng Cheng
Lianghui Zhu
Lei Liu
Heng Liu
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
VLM
51
24
0
28 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Bo Du
Dacheng Tao
Liangpei Zhang
59
21
0
17 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
41
2
0
12 Jun 2024
Masked Image Modelling for retinal OCT understanding
Theodoros Pissas
Pablo Márquez-Neila
Sebastian Wolf
M. Zinkernagel
Raphael Sznitman
17
0
0
23 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
34
2
0
22 May 2024
Replication Study and Benchmarking of Real-Time Object Detection Models
Pierre-Luc Asselin
Vincent Coulombe
William Guimont-Martin
William Larrivée-Hardy
30
0
0
11 May 2024
MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training
Jiayang Li
Junjun Jiang
Pengwei Liang
Jiayi Ma
Liqiang Nie
39
1
0
17 Apr 2024
Cross-domain Multi-modal Few-shot Object Detection via Rich Text
Zeyu Shangguan
Daniel Seita
Mohammad Rostami
ObjD
45
1
0
24 Mar 2024
Align and Distill: Unifying and Improving Domain Adaptive Object Detection
Justin Kay
T. Haucke
Suzanne Stathatos
Siqi Deng
Erik Young
Pietro Perona
Sara Beery
Grant Van Horn
51
4
0
18 Mar 2024
Denoising Autoregressive Representation Learning
Yazhe Li
J. Bornschein
Ting Chen
DiffM
21
3
0
08 Mar 2024
GOOD: Towards Domain Generalized Orientated Object Detection
Qi Bi
Beichen Zhou
Jingjun Yi
Wei Ji
Haolan Zhan
Gui-Song Xia
ObjD
OOD
74
2
0
20 Feb 2024
Dual-View Visual Contextualization for Web Navigation
Jihyung Kil
Chan Hee Song
Boyuan Zheng
Xiang Deng
Yu-Chuan Su
Wei-Lun Chao
EgoV
22
12
0
06 Feb 2024
Deconstructing Denoising Diffusion Models for Self-Supervised Learning
Xinlei Chen
Zhuang Liu
Saining Xie
Kaiming He
DiffM
25
52
0
25 Jan 2024
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
26
14
0
25 Jan 2024
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke
Bert De Brabandere
DiffM
22
11
0
18 Jan 2024
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson
B. Matuszewski
18
2
0
11 Jan 2024
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
34
62
0
11 Dec 2023
AM-RADIO: Agglomerative Vision Foundation Model -- Reduce All Domains Into One
Michael Ranzinger
Greg Heinrich
Jan Kautz
Pavlo Molchanov
VLM
26
42
0
10 Dec 2023
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He
Yifan Yang
Xinyang Jiang
Xufang Luo
Haoji Hu
Siyun Zhao
Dongsheng Li
Yuqing Yang
Lili Qiu
27
1
0
24 Nov 2023
Learning Scene Context Without Images
Amirreza Rouhi
David Han
VLM
25
0
0
18 Nov 2023
Processing and Segmentation of Human Teeth from 2D Images using Weakly Supervised Learning
Tomáš Kunzo
Viktor Kocur
Lukáš Gajdošech
Martin Madaras
13
0
0
13 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
31
142
0
10 Nov 2023
Minimalist and High-Performance Semantic Segmentation with Plain Vision Transformers
Yuanduo Hong
Jue Wang
Weichao Sun
Huihui Pan
VLM
ViT
27
7
0
19 Oct 2023
EfficientOCR: An Extensible, Open-Source Package for Efficiently Digitizing World Knowledge
Tom Bryan
Jacob Carlson
Abhishek Arora
Melissa Dell
18
8
0
16 Oct 2023