2104.12763
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
50 / 678 papers shown
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
234
29
0
10 May 2022
Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection
Wei Feng
Xingyuan Bu
Chenchen Zhang
Xubin Li
VLM
153
5
0
09 May 2022
Declaration-based Prompt Tuning for Visual Question Answering
International Joint Conference on Artificial Intelligence (IJCAI), 2022
Yuhang Liu
Wei Wei
Daowan Peng
Feida Zhu
MLLM
VLM
118
21
0
05 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
279
18
0
02 May 2022
A Multi-level Alignment Training Scheme for Video-and-Language Grounding
Yubo Zhang
Feiyang Niu
Q. Ping
Govind Thattai
CVBM
219
2
0
22 Apr 2022
Self-paced Multi-grained Cross-modal Interaction Modeling for Referring Expression Comprehension
IEEE Transactions on Image Processing (IEEE TIP), 2022
Peihan Miao
Wei Su
Gaoang Wang
Xuewei Li
Xi Li
ObjD
334
13
0
21 Apr 2022
A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension
IEEE Transactions on Multimedia (IEEE TMM), 2022
Gen Luo
Weihao Ye
Jiamu Sun
Xiaoshuai Sun
Rongrong Ji
ObjD
243
13
0
17 Apr 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Sanjay Subramanian
William Merrill
Trevor Darrell
Matt Gardner
Sameer Singh
Anna Rohrbach
ObjD
284
156
0
12 Apr 2022
X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks
European Conference on Computer Vision (ECCV), 2022
Zhaowei Cai
Gukyeong Kwon
Avinash Ravichandran
Erhan Bas
Zhuowen Tu
Rahul Bhotika
Stefano Soatto
ObjD
MLLM
VLM
145
51
0
12 Apr 2022
Domain-Agnostic Prior for Transfer Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Xinyue Huo
Lingxi Xie
Hengtong Hu
Wen-gang Zhou
Houqiang Li
Qi Tian
220
38
0
06 Apr 2022
"This is my unicorn, Fluffy": Personalizing frozen vision-language representations
European Conference on Computer Vision (ECCV), 2022
Niv Cohen
Rinon Gal
E. Meirom
Gal Chechik
Yuval Atzmon
VLM
MLLM
351
104
0
04 Apr 2022
MultiMAE: Multi-modal Multi-task Masked Autoencoders
European Conference on Computer Vision (ECCV), 2022
Roman Bachmann
David Mizrahi
Andrei Atanov
Amir Zamir
423
345
0
04 Apr 2022
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
International Conference on Learning Representations (ICLR), 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
...
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
590
681
0
01 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
European Conference on Computer Vision (ECCV), 2022
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
210
18
0
31 Mar 2022
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
N. Kim
Dongwon Kim
Cuiling Lan
Wenjun Zeng
Suha Kwak
345
178
0
31 Mar 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Computer Vision and Pattern Recognition (CVPR), 2022
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
341
121
0
30 Mar 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
230
93
0
29 Mar 2022
Open-Vocabulary DETR with Conditional Matching
European Conference on Computer Vision (ECCV), 2022
Yuhang Zang
Wei Li
Kaiyang Zhou
Chen Huang
Chen Change Loy
ObjD
VLM
382
262
0
22 Mar 2022
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
Computer Vision and Pattern Recognition (CVPR), 2022
S. Gadre
Mitchell Wortsman
Gabriel Ilharco
Ludwig Schmidt
Shuran Song
CLIP
LM&Ro
337
235
0
20 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
322
101
0
18 Mar 2022
End-to-End Modeling via Information Tree for One-Shot Natural Language Spatial Video Grounding
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Meng Li
Tianbao Wang
Haoyu Zhang
Shengyu Zhang
Zhou Zhao
...
Wenming Tan
Jin Wang
Peng Wang
Shi Pu
Leilei Gan
292
46
0
15 Mar 2022
Can you even tell left from right? Presenting a new challenge for VQA
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Sairaam Venkatraman
Rishi Rao
S. Balasubramanian
C. Vorugunti
R. R. Sarma
CoGe
174
0
0
15 Mar 2022
Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking
European Conference on Computer Vision (ECCV), 2022
Boyu Chen
Peixia Li
Mengwei He
Leixian Qiao
Qiuhong Shen
Yue Liu
Weihao Gan
Wei Wu
Wanli Ouyang
ViT
VOT
272
269
0
10 Mar 2022
CMX: Cross-Modal Fusion for RGB-X Semantic Segmentation with Transformers
Kailai Li
Huayao Liu
Kailun Yang
Xinxin Hu
Ruiping Liu
Rainer Stiefelhagen
ViT
417
513
0
09 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
International Conference on Learning Representations (ICLR), 2022
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
744
2,208
0
07 Mar 2022
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
AAAI/ACM Conference on AI, Ethics, and Society (AIES), 2022
Yiwei Lyu
Paul Pu Liang
Zihao Deng
Ruslan Salakhutdinov
Louis-Philippe Morency
234
51
0
03 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
207
41
0
03 Mar 2022
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
Computer Vision and Pattern Recognition (CVPR), 2022
Mohamed Afham
Isuru Dissanayake
Dinithi Dissanayake
Amaya Dharmasiri
Kanchana Thilakarathna
Ranga Rodrigo
3DPC
329
318
0
01 Mar 2022
Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Computer Vision and Pattern Recognition (CVPR), 2022
Mingyang Zhou
Licheng Yu
Amanpreet Singh
Mengjiao MJ Wang
Zhou Yu
Ning Zhang
VLM
158
35
0
01 Mar 2022
Measuring CLEVRness: Blackbox testing of Visual Reasoning Models
International Conference on Learning Representations (ICLR), 2022
Spyridon Mouselinos
Henryk Michalewski
Mateusz Malinowski
270
4
0
24 Feb 2022
GroupViT: Semantic Segmentation Emerges from Text Supervision
Computer Vision and Pattern Recognition (CVPR), 2022
Jiarui Xu
Shalini De Mello
Sifei Liu
Wonmin Byeon
Thomas Breuel
Jan Kautz
Xinyu Wang
ViT
VLM
759
631
0
22 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Machine Intelligence Research (MIR), 2022
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
393
287
0
18 Feb 2022
Delving Deeper into Cross-lingual Visual Question Answering
Findings (Findings), 2022
Chen Cecilia Liu
Jonas Pfeiffer
Anna Korhonen
Ivan Vulić
Iryna Gurevych
300
10
0
15 Feb 2022
An experimental study of the vision-bottleneck in VQA
Social Science Research Network (SSRN), 2022
Pierre Marza
Corentin Kervadec
G. Antipov
M. Baccouche
Christian Wolf
250
1
0
14 Feb 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
International Conference on Machine Learning (ICML), 2022
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
517
1,009
0
07 Feb 2022
Transformers in Medical Imaging: A Survey
Fahad Shamshad
Salman Khan
Syed Waqas Zamir
Muhammad Haris Khan
Munawar Hayat
Fahad Shahbaz Khan
Huazhu Fu
ViT
LM&MA
MedIm
322
958
0
24 Jan 2022
Omnivore: A Single Model for Many Visual Modalities
Computer Vision and Pattern Recognition (CVPR), 2022
Rohit Girdhar
Mannat Singh
Nikhil Ravi
Laurens van der Maaten
Armand Joulin
Ishan Misra
597
287
0
20 Jan 2022
Label-dependent and event-guided interpretable disease risk prediction using EHRs
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021
Shuai Niu
Yunya Song
Qing Yin
Wenhan Luo
Xian Yang
106
4
0
18 Jan 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Neurocomputing (Neurocomputing), 2022
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
207
12
0
18 Jan 2022
Multi-Query Video Retrieval
European Conference on Computer Vision (ECCV), 2022
Zeyu Wang
Yu Wu
Karthik Narasimhan
Olga Russakovsky
285
23
0
10 Jan 2022
Language-driven Semantic Segmentation
International Conference on Learning Representations (ICLR), 2022
Boyi Li
Kilian Q. Weinberger
Serge Belongie
V. Koltun
René Ranftl
VLM
329
780
0
10 Jan 2022
Detecting Twenty-thousand Classes using Image-level Supervision
European Conference on Computer Vision (ECCV), 2022
Xingyi Zhou
Rohit Girdhar
Armand Joulin
Phillip Krahenbuhl
Ishan Misra
CLIP
VLM
488
752
0
07 Jan 2022
Language as Queries for Referring Video Object Segmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Jiannan Wu
Yi Jiang
Pei Sun
Zehuan Yuan
Ping Luo
516
220
0
03 Jan 2022
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
European Conference on Computer Vision (ECCV), 2021
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
444
494
0
22 Dec 2021
Image Segmentation Using Text and Image Prompts
Computer Vision and Pattern Recognition (CVPR), 2021
Timo Lüddecke
Alexander S. Ecker
CLIP
VLM
710
647
0
18 Dec 2021
Bottom Up Top Down Detection Transformers for Language Grounding in Images and Point Clouds
Ayush Jain
N. Gkanatsios
Ishita Mediratta
Katerina Fragkiadaki
ObjD
479
147
0
16 Dec 2021
Predicting Physical World Destinations for Commands Given to Self-Driving Cars
Dusan Grujicic
Thierry Deruyttere
Marie-Francine Moens
Matthew Blaschko
OOD
200
8
0
10 Dec 2021
PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
Yining Hong
Li Yi
J. Tenenbaum
Antonio Torralba
Chuang Gan
168
43
0
09 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
458
1,385
0
07 Dec 2021
From Coarse to Fine-grained Concept based Discrimination for Phrase Detection
Maan Qraitem
Bryan A. Plummer
ObjD
195
0
0
06 Dec 2021