MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
arXiv:2104.12763 (v2, latest)

IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD, VLM

Papers citing "MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"

50 / 678 papers shown
Learning Instance-Level Representation for Large-Scale Multi-Modal Pretraining in E-commerce
Computer Vision and Pattern Recognition (CVPR), 2023
Yang Jin
Yongzhi Li
Zehuan Yuan
Yadong Mu
160
20
0
06 Apr 2023
Learning to Name Classes for Vision and Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Sarah Parisot
Yongxin Yang
Jingyu Sun
VLM
191
15
0
04 Apr 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yongxin Zhu
Ziqiang Liu
Yukang Liang
Xin Li
Hao Liu
Changcun Bao
Linli Xu
158
9
0
04 Apr 2023
Probabilistic Prompt Learning for Dense Prediction
Computer Vision and Pattern Recognition (CVPR), 2023
Hyeongjun Kwon
Taeyong Song
Somi Jeong
Jin-Hwa Kim
Jinhyun Jang
Kwanghoon Sohn
VLM
282
25
0
03 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
491
988
0
03 Apr 2023
What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions
Computer Vision and Pattern Recognition (CVPR), 2023
Brian Chen
Nina Shvetsova
Andrew Rouditchenko
D. Kondermann
Samuel Thomas
Shih-Fu Chang
Rogerio Feris
James R. Glass
Hilde Kuehne
331
9
0
29 Mar 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
IEEE International Conference on Computer Vision (ICCV), 2023
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
519
77
0
29 Mar 2023
Gazeformer: Scalable, Effective and Fast Prediction of Goal-Directed Human Attention
Computer Vision and Pattern Recognition (CVPR), 2023
Sounak Mondal
Zhibo Yang
Seoyoung Ahn
Dimitris Samaras
G. Zelinsky
Minh Hoai
302
43
0
27 Mar 2023
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
Computer Vision and Pattern Recognition (CVPR), 2023
WonJun Moon
Sangeek Hyun
S. Park
Dongchan Park
Jae-Pil Heo
ViT
232
185
0
24 Mar 2023
Open-Vocabulary Object Detection using Pseudo Caption Labels
Han-Cheol Cho
Won Young Jhoo
Woohyun Kang
Byungseok Roh
VLM, ObjD
149
20
0
23 Mar 2023
LD-ZNet: A Latent Diffusion Approach for Text-Based Image Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
K. Pnvr
Bharat Singh
P. Ghosh
Behjat Siddiquie
David Jacobs
DiffM
293
34
0
22 Mar 2023
Detecting the open-world objects with the help of the Brain
Shuailei Ma
Yuefeng Wang
Ying-yu Wei
Peihao Chen
Zhixiang Ye
Jiaqi Fan
Enming Zhang
Thomas H. Li
VLM, ObjD
144
6
0
21 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
ACM Multimedia (ACM MM), 2023
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
378
3
0
18 Mar 2023
Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Kyle Buettner
Adriana Kovashka
197
0
0
17 Mar 2023
A Simple Framework for Open-Vocabulary Segmentation and Detection
IEEE International Conference on Computer Vision (ICCV), 2023
Hao Zhang
Feng Li
Xueyan Zou
Siyi Liu
Chun-yue Li
Jianfeng Gao
Jianwei Yang
Lei Zhang
ObjD, VLM
354
204
0
14 Mar 2023
Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Zhihao Chen
Yangqiaoyu Zhou
A. Tran
Junting Zhao
Liang Wan
...
Lionel T. E. Cheng
C. Thng
Xinxing Xu
Yong-Jin Liu
Huazhu Fu
MedIm
128
35
0
14 Mar 2023
Audio Visual Language Maps for Robot Navigation
International Symposium on Experimental Robotics (ISER), 2023
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
VGen
233
42
0
13 Mar 2023
Universal Instance Perception as Object Discovery and Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
B. Yan
Yi Jiang
Jiannan Wu
D. Wang
Ping Luo
Zehuan Yuan
Huchuan Lu
VOS, VLM, LRM
372
233
0
12 Mar 2023
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
238
14
0
11 Mar 2023
Semantics-Aware Dynamic Localization and Refinement for Referring Image Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhao Yang
Yuan Liu
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Juil Sock
185
31
0
11 Mar 2023
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Computer Vision and Pattern Recognition (CVPR), 2023
Luting Wang
Yi Liu
Penghui Du
Zihan Ding
Yue Liao
Qiaosong Qi
Biaolong Chen
Si Liu
ObjD, VLM
230
88
0
10 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
European Conference on Computer Vision (ECCV), 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
770
3,247
0
09 Mar 2023
Toward Unsupervised Realistic Visual Question Answering
IEEE International Conference on Computer Vision (ICCV), 2023
Yuwei Zhang
Chih-Hui Ho
Nuno Vasconcelos
CoGe
274
2
0
09 Mar 2023
Referring Multi-Object Tracking
Computer Vision and Pattern Recognition (CVPR), 2023
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
226
114
0
06 Mar 2023
Naming Objects for Vision-and-Language Manipulation
Tokuhiro Nishikawa
Kazumi Aoyama
Shunichi Sekiguchi
Takayoshi Takayanagi
Jianing Wu
Yu Ishihara
Tamaki Kojima
Jerry Jun Yokono
139
1
0
06 Mar 2023
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
Computer Vision and Pattern Recognition (CVPR), 2023
Yanxin Long
Youpeng Wen
Jianhua Han
Hang Xu
Pengzhen Ren
Wei Zhang
Sheng Zhao
Xiaodan Liang
ObjD, VLM
181
45
0
04 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Conference on Robot Learning (CoRL), 2023
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
522
200
0
02 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Neural Information Processing Systems (NeurIPS), 2023
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
236
76
0
01 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
170
1
0
28 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS, VLM
493
322
0
27 Feb 2023
Localizing Moments in Long Video Via Multimodal Guidance
IEEE International Conference on Computer Vision (ICCV), 2023
Wayner Barrios
Mattia Soldan
Alberto M. Ceballos-Arroyo
Fabian Caba Heilbron
Guohao Li
228
27
0
26 Feb 2023
Focusing On Targets For Improving Weakly Supervised Visual Grounding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
V. Pham
Nao Mishima
ObjD
196
1
0
22 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE, VLM
436
270
0
20 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
207
8
0
16 Feb 2023
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Jiang Liu
Hui Ding
Zhaowei Cai
Yuting Zhang
R. Satzoda
Vijay Mahadevan
R. Manmatha
ObjD
281
179
0
14 Feb 2023
Revisiting Pre-training in Audio-Visual Learning
Ruoxuan Feng
Wenke Xia
Di Hu
193
1
0
07 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
International Conference on Machine Learning (ICML), 2023
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM, VLM, MoE
243
217
0
01 Feb 2023
MS-DETR: Multispectral Pedestrian Detection Transformer with Loosely Coupled Fusion and Modality-Balanced Optimization
Yinghui Xing
Song Wang
Shizhou Zhang
Guoqiang Liang
Xiuwei Zhang
Yanning Zhang
ViT
376
21
0
01 Feb 2023
Champion Solution for the WSDM2023 Toloka VQA Challenge
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
173
2
0
22 Jan 2023
Linguistic Query-Guided Mask Generation for Referring Image Segmentation
Pattern Recognition (Pattern Recogn.), 2023
Zhichao Wei
Xiaohao Chen
Mingqiang Chen
Siyu Zhu
VLM
299
2
0
16 Jan 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
AAAI Conference on Artificial Intelligence (AAAI), 2023
Haowei Wang
Jiayi Ji
Weihao Ye
Yongjian Wu
Xiaoshuai Sun
178
17
0
09 Jan 2023
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Computer Vision and Pattern Recognition (CVPR), 2023
Da Yin
Feng Gao
Govind Thattai
Michael F. Johnston
Kai-Wei Chang
VLM
171
20
0
05 Jan 2023
PACO: Parts and Attributes of Common Objects
Computer Vision and Pattern Recognition (CVPR), 2023
Vignesh Ramanathan
Anmol Kalia
Vladan Petrovic
Yiqian Wen
Baixue Zheng
...
Abhishek Kadian
Amir Mousavi
Yi-Zhe Song
Abhimanyu Dubey
D. Mahajan
VLM
192
143
0
04 Jan 2023
Position-Aware Contrastive Alignment for Referring Image Segmentation
Bo Chen
Zhiwei Hu
Zhilong Ji
Jinfeng Bai
W. Zuo
215
9
0
27 Dec 2022
Weakly-Supervised Semantic Segmentation of Ships Using Thermal Imagery
Rushil Joshi
Ethan R. Adams
Matthew R. Ziemann
Christopher A. Metzler
181
1
0
26 Dec 2022
Generalized Decoding for Pixel, Image, and Language
Computer Vision and Pattern Recognition (CVPR), 2022
Xueyan Zou
Zi-Yi Dou
Jianwei Yang
Zhe Gan
Linjie Li
...
Lu Yuan
Nanyun Peng
Lijuan Wang
Yong Jae Lee
Jianfeng Gao
VLM, MLLM, ObjD
280
326
0
21 Dec 2022
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
IEEE Access (IEEE Access), 2022
Monika Wysoczańska
Tom Monnier
Tomasz Trzciński
David Picard
ReLM, OCL
184
1
0
20 Dec 2022
Tackling Ambiguity with Images: Improved Multimodal Machine Translation and Contrastive Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Matthieu Futeral
Cordelia Schmid
Ivan Laptev
Benoît Sagot
Rachel Bawden
302
46
0
20 Dec 2022
Fully and Weakly Supervised Referring Expression Segmentation with End-to-End Learning
Hui Li
Mingjie Sun
Jimin Xiao
Eng Gee Lim
Yao-Min Zhao
219
27
0
17 Dec 2022
Policy Adaptation from Foundation Model Feedback
Computer Vision and Pattern Recognition (CVPR), 2022
Yuying Ge
Annabella Macaluso
Erran L. Li
Ping Luo
Xiaolong Wang
LM&Ro
269
19
0
14 Dec 2022