2104.12763
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
50 / 678 papers shown
Title
Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce
Computer Vision and Pattern Recognition (CVPR), 2023
Yang Jin
Yongzhi Li
Zehuan Yuan
Yadong Mu
160
20
0
06 Apr 2023
Learning to Name Classes for Vision and Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Sarah Parisot
Yongxin Yang
Jingyu Sun
VLM
191
15
0
04 Apr 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yongxin Zhu
Ziqiang Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
158
9
0
04 Apr 2023
Probabilistic Prompt Learning for Dense Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Hyeongjun Kwon
Taeyong Song
Somi Jeong
Jin-Hwa Kim
Jinhyun Jang
Kwanghoon Sohn
VLM
282
25
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
491
988
0
03 Apr 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Computer Vision and Pattern Recognition (CVPR), 2023
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
331
9
0
29 Mar 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
IEEE International Conference on Computer Vision (ICCV), 2023
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
519
77
0
29 Mar 2023
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Computer Vision and Pattern Recognition (CVPR), 2023
Sounak Mondal
Zhibo Yang
Seoyoung Ahn
Dimitris Samaras
G. Zelinsky
Minh Hoai
302
43
0
27 Mar 2023
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
Computer Vision and Pattern Recognition (CVPR), 2023
WonJun Moon
Sangeek Hyun
S. Park
Dongchan Park
Jae-Pil Heo
ViT
232
185
0
24 Mar 2023
Open-Vocabulary Object Detection using Pseudo Caption Labels
Han-Cheol Cho
Won Young Jhoo
Woohyun Kang
Byungseok Roh
VLM
ObjD
149
20
0
23 Mar 2023
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
K. Pnvr
Bharat Singh
P. Ghosh
Behjat Siddiquie
David Jacobs
DiffM
293
34
0
22 Mar 2023
Detecting the open-world objects with the help of the Brain
Shuailei Ma
Yuefeng Wang
Ying-yu Wei
Peihao Chen
Zhixiang Ye
Jiaqi Fan
Enming Zhang
Thomas H. Li
VLM
ObjD
144
6
0
21 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
ACM Multimedia (ACM MM), 2023
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
378
3
0
18 Mar 2023
Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Kyle Buettner
Adriana Kovashka
197
0
0
17 Mar 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection
IEEE International Conference on Computer Vision (ICCV), 2023
Hao Zhang
Feng Li
Xueyan Zou
Siyi Liu
Chun-yue Li
Jianfeng Gao
Jianwei Yang
Lei Zhang
ObjD
VLM
354
204
0
14 Mar 2023
Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Zhihao Chen
Yangqiaoyu Zhou
A. Tran
Junting Zhao
Liang Wan
...
Lionel T. E. Cheng
C. Thng
Xinxing Xu
Yong-Jin Liu
Huazhu Fu
MedIm
128
35
0
14 Mar 2023
Audio Visual Language Maps for Robot Navigation
International Symposium on Experimental Robotics (ISER), 2023
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
VGen
233
42
0
13 Mar 2023
Universal Instance Perception as Object Discovery and Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
B. Yan
Yi Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
VOS
VLM
LRM
372
233
0
12 Mar 2023
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
238
14
0
11 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhao Yang
Yuan Liu
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Juil Sock
185
31
0
11 Mar 2023
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Luting Wang
Yi Liu
Penghui Du
Zihan Ding
Yue Liao
Qiaosong Qi
Biaolong Chen
Si Liu
ObjD
VLM
230
88
0
10 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
European Conference on Computer Vision (ECCV), 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
770
3,247
0
09 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2023
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
274
2
0
09 Mar 2023
Referring Multi-Object Tracking
Computer Vision and Pattern Recognition (CVPR), 2023
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
226
114
0
06 Mar 2023
Naming Objects for Vision-and-Language Manipulation
Tokuhiro Nishikawa
Kazumi Aoyama
Shunichi Sekiguchi
Takayoshi Takayanagi
Jianing Wu
Yu Ishihara
Tamaki Kojima
Jerry Jun Yokono
139
1
0
06 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Computer Vision and Pattern Recognition (CVPR), 2023
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD
VLM
181
45
0
04 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Conference on Robot Learning (CoRL), 2023
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
522
200
0
02 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Neural Information Processing Systems (NeurIPS), 2023
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
236
76
0
01 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
170
1
0
28 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
493
322
0
27 Feb 2023
Localizing Moments in Long Video Via Multimodal Guidance
IEEE International Conference on Computer Vision (ICCV), 2023
Wayner Barrios
Mattia Soldan
Alberto M. Ceballos-Arroyo
Fabian Caba Heilbron
Guohao Li
228
27
0
26 Feb 2023
Focusing On Targets For Improving Weakly Supervised Visual Grounding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
V. Pham
Nao Mishima
ObjD
196
1
0
22 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
436
270
0
20 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
207
8
0
16 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
281
179
0
14 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
193
1
0
07 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
International Conference on Machine Learning (ICML), 2023
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
243
217
0
01 Feb 2023
MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization
Yinghui Xing
Song Wang
Shizhou Zhang
Guoqiang Liang
Xiuwei Zhang
Yanning Zhang
ViT
376
21
0
01 Feb 2023
Champion Solution for the WSDM2023 Toloka VQA Challenge
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
173
2
0
22 Jan 2023
Linguistic Query-Guided Mask Generation for Referring Image Segmentation
Pattern Recognition (Pattern Recogn.), 2023
Zhichao Wei
Xiaohao Chen
Mingqiang Chen
Siyu Zhu
VLM
299
2
0
16 Jan 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
AAAI Conference on Artificial Intelligence (AAAI), 2023
Haowei Wang
Jiayi Ji
Weihao Ye
Yongjian Wu
Xiaoshuai Sun
178
17
0
09 Jan 2023
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Computer Vision and Pattern Recognition (CVPR), 2023
Da Yin
Feng Gao
Govind Thattai
Michael F. Johnston
Kai-Wei Chang
VLM
171
20
0
05 Jan 2023
PACO: Parts and Attributes of Common Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
192
143
0
04 Jan 2023
Position-Aware Contrastive Alignment for Referring Image Segmentation
Bo Chen
Zhiwei Hu
Zhilong Ji
Jinfeng Bai
W. Zuo
215
9
0
27 Dec 2022
Weakly-Supervised Semantic Segmentation of Ships Using Thermal Imagery
Rushil Joshi
Ethan R. Adams
Matthew R. Ziemann
Christopher A. Metzler
181
1
0
26 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM
MLLM
ObjD
280
326
0
21 Dec 2022
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
IEEE Access (IEEE Access), 2022
Monika Wysoczańska
Tom Monnier
Tomasz Trzciński
David Picard
ReLM
OCL
184
1
0
20 Dec 2022
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Matthieu Futeral
Cordelia Schmid
Ivan Laptev
Benoît Sagot
Rachel Bawden
302
46
0
20 Dec 2022
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning
Hui Li
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Yao-Min Zhao
219
27
0
17 Dec 2022
Policy Adaptation from Foundation Model Feedback
Computer Vision and Pattern Recognition (CVPR), 2022
Yuying Ge
Annabella Macaluso
Erran L. Li
Ping Luo
Xiaolong Wang
LM&Ro
269
19
0
14 Dec 2022