2104.12763
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
50 / 678 papers shown
detrex: Benchmarking Detection Transformers
Tianhe Ren
Siyi Liu
Feng Li
Hao Zhang
Ailing Zeng
...
Zhaoyang Zeng
Xianbiao Qi
Yuhui Yuan
Jianwei Yang
Lei Zhang
219
20
0
12 Jun 2023
EventCLIP: Adapting CLIP for Event-based Object Recognition
Ziyi Wu
Xudong Liu
Igor Gilitschenski
VLM
288
28
0
10 Jun 2023
Multi-Modal Classifiers for Open-Vocabulary Object Detection
International Conference on Machine Learning (ICML), 2023
Prannay Kaul
Weidi Xie
Andrew Zisserman
ObjD
VLM
MLLM
201
60
0
08 Jun 2023
Matting Anything
Jiacheng Li
Jitesh Jain
Humphrey Shi
VLM
245
37
0
08 Jun 2023
ScaleDet: A Scalable Multi-Dataset Object Detector
Computer Vision and Pattern Recognition (CVPR), 2023
Yanbei Chen
Manchen Wang
Abhay Mittal
Zhenlin Xu
Paolo Favaro
Joseph Tighe
Davide Modolo
ObjD
168
27
0
08 Jun 2023
Fine-Grained Visual Prompting
Neural Information Processing Systems (NeurIPS), 2023
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD
VLM
245
98
0
07 Jun 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
292
50
0
06 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
252
31
0
06 Jun 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
British Machine Vision Conference (BMVC), 2023
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
243
9
0
30 May 2023
Multi-modal Queried Object Detection in the Wild
Neural Information Processing Systems (NeurIPS), 2023
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
364
48
0
30 May 2023
Contextual Object Detection with Multimodal Large Language Models
International Journal of Computer Vision (IJCV), 2023
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
325
140
0
29 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
290
10
0
28 May 2023
Modularized Zero-shot VQA with Pre-trained Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Rui Cao
Jing Jiang
LRM
254
3
0
27 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Guoying Gu
Yu Qiao
Hao Dong
Zhongjiang He
Shiyang Feng
VOS
322
56
0
25 May 2023
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
279
66
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
206
9
0
24 May 2023
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
International Conference on 3D Vision (3DV), 2023
Taiki Miyanishi
Daichi Azuma
Shuhei Kurita
M. Kawanabe
293
11
0
23 May 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Neural Information Processing Systems (NeurIPS), 2023
Viorica Pătrăucean
Lucas Smaira
Ankush Gupta
Adrià Recasens Continente
L. Markeeva
...
Y. Aytar
Simon Osindero
Dima Damen
Andrew Zisserman
João Carreira
VLM
437
264
0
23 May 2023
Type-to-Track: Retrieve Any Object via Prompt-based Tracking
Neural Information Processing Systems (NeurIPS), 2023
Pha Nguyen
Kha Gia Quach
Kris Kitani
Khoa Luu
283
32
0
22 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
International Conference on Learning Representations (ICLR), 2023
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
413
142
0
19 May 2023
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding
Chenchi Zhang
Jun Xiao
Lei Chen
Jian Shao
Long Chen
VLM
LRM
171
3
0
19 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Neural Information Processing Systems (NeurIPS), 2023
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
302
617
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
579
154
0
18 May 2023
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
Ana Claudia Akemi Matsuki de Faria
Felype de Castro Bastos
Jose Victor Nogueira Alves da Silva
Vitor Lopes Fabris
Valeska Uchôa
Décio Gonçalves de Aguiar Neto
C. F. G. Santos
263
27
0
18 May 2023
Annotation-free Audio-Visual Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS
VLM
392
46
0
18 May 2023
Weakly-Supervised Visual-Textual Grounding with Semantic Prior Refinement
British Machine Vision Conference (BMVC), 2023
Davide Rigoni
Luca Parolari
Luciano Serafini
A. Sperduti
Lamberto Ballan
188
1
0
18 May 2023
UniS-MMC: Multimodal Classification via Unimodality-supervised Multimodal Contrastive Learning
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Heqing Zou
Meng Shen
Chen Chen
Yuchen Hu
D. Rajan
Chng Eng Siong
SSL
225
26
0
16 May 2023
CLIP-VG: Self-paced Curriculum Adapting of CLIP for Visual Grounding
IEEE transactions on multimedia (IEEE TMM), 2023
Linhui Xiao
Xiaoshan Yang
Fang Peng
Ming Yan
Yaowei Wang
Changsheng Xu
ObjD
VLM
448
62
0
15 May 2023
COLA: A Benchmark for Compositional Text-to-image Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Arijit Ray
Filip Radenovic
Abhimanyu Dubey
Bryan A. Plummer
Ranjay Krishna
Kate Saenko
CoGe
VLM
426
54
0
05 May 2023
Unified Model Learning for Various Neural Machine Translation
Yunlong Liang
Fandong Meng
Jinan Xu
Jiaan Wang
Jinan Xu
Jie Zhou
211
1
0
04 May 2023
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
N. Gkanatsios
Ayush Jain
Zhou Xian
Yunchu Zhang
C. Atkeson
Katerina Fragkiadaki
LM&Ro
416
43
0
27 Apr 2023
π-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation
International Conference on Machine Learning (ICML), 2023
Chengyue Wu
Teng Wang
Yixiao Ge
Zeyu Lu
Rui-Zhi Zhou
Ying Shan
Ping Luo
MoMe
214
43
0
27 Apr 2023
Zero-shot Unsupervised Transfer Instance Segmentation
Gyungin Shin
Samuel Albanie
Weidi Xie
ISeg
VLM
299
7
0
27 Apr 2023
Multimodal Grounding for Embodied AI via Augmented Reality Headsets for Natural Language Driven Task Planning
Selma Wanna
Fabian Parra
R. Valner
Karl Kruusamäe
Mitch Pryor
LM&Ro
179
3
0
26 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
429
362
0
24 Apr 2023
OmniLabel: A Challenging Benchmark for Language-Based Object Detection
IEEE International Conference on Computer Vision (ICCV), 2023
S. Schulter
G. VijayKumarB.
Yumin Suh
Konstantinos M. Dafnis
Zhixing Zhang
Shiyu Zhao
Dimitris N. Metaxas
ObjD
184
16
0
22 Apr 2023
Domain Generalization for Mammographic Image Analysis with Contrastive Learning
Zheren Li
Zhiming Cui
Lichi Zhang
Sheng Wang
Chenjin Lei
...
Yajia Gu
Zaiyi Liu
Chunling Liu
Dinggang Shen
Jie‐Zhi Cheng
572
3
0
20 Apr 2023
Transformer-Based Visual Segmentation: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
370
247
0
19 Apr 2023
Delving into Shape-aware Zero-shot Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Xinyu Liu
Beiwen Tian
Zhen Wang
Rui Wang
Kehua Sheng
Bo Zhang
Hao Zhao
Guyue Zhou
VLM
282
27
0
17 Apr 2023
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence
Gengchen Mai
Weiming Huang
Jin Sun
Suhang Song
Deepak Mishra
...
Yingjie Hu
Chris Cundy
Ziyuan Li
Rui Zhu
Ni Lao
AI4CE
318
154
0
13 Apr 2023
What does CLIP know about a red circle? Visual prompt engineering for VLMs
IEEE International Conference on Computer Vision (ICCV), 2023
Aleksandar Shtedritski
Christian Rupprecht
Andrea Vedaldi
VLM
MLLM
379
231
0
13 Apr 2023
Verbs in Action: Improving verb understanding in video-language models
IEEE International Conference on Computer Vision (ICCV), 2023
Liliane Momeni
Mathilde Caron
Arsha Nagrani
Andrew Zisserman
Cordelia Schmid
373
87
0
13 Apr 2023
WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
European Conference on Computer Vision (ECCV), 2023
Zhe Lin
Xidong Peng
Peishan Cong
Ge Zheng
Yujin Sun
Yuenan Hou
Xinge Zhu
Sibei Yang
Yuexin Ma
VGen
302
12
0
12 Apr 2023
MoMo: A shared encoder Model for text, image and multi-Modal representations
Rakesh Chada
Zhao-Heng Zheng
P. Natarajan
ViT
112
5
0
11 Apr 2023
Detection Transformer with Stable Matching
IEEE International Conference on Computer Vision (ICCV), 2023
Siyi Liu
Tianhe Ren
Jia-Yu Chen
Zhaoyang Zeng
Hao Zhang
...
Hongyang Li
Jun Huang
Hang Su
Jun Zhu
Lei Zhang
220
56
0
10 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Computer Vision and Pattern Recognition (CVPR), 2023
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
292
102
0
10 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
199
1
0
10 Apr 2023
ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes
IEEE International Conference on Computer Vision (ICCV), 2023
Ran Gong
Jiangyong Huang
Yizhou Zhao
Haoran Geng
Xiaofeng Gao
...
Ziheng Zhou
D. Terzopoulos
Song-Chun Zhu
Baoxiong Jia
Siyuan Huang
LM&Ro
286
70
0
09 Apr 2023
Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning
International Conference on Machine Learning (ICML), 2023
Yu Yang
Besmira Nushi
Hamid Palangi
Baharan Mirzasoleiman
261
58
0
08 Apr 2023
V3Det: Vast Vocabulary Visual Detection Dataset
IEEE International Conference on Computer Vision (ICCV), 2023
Yuan Liu
Pan Zhang
Tao Chu
Yuhang Cao
Yujie Zhou
Tong Wu
Sijin Yu
Conghui He
Dahua Lin
VLM
ObjD
317
76
0
07 Apr 2023