ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.12763
  4. Cited By
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding

26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
    ObjD
    VLM
ArXivPDFHTML

Papers citing "MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"

50 / 607 papers shown
Title
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly
  Supervised 3D Visual Grounding
Distilling Coarse-to-Fine Semantic Matching Knowledge for Weakly Supervised 3D Visual Grounding
Zehan Wang
Haifeng Huang
Yang Zhao
Lin Li
Xize Cheng
Yichen Zhu
Aoxiong Yin
Zhou Zhao
16
17
0
18 Jul 2023
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present,
  and Future
A Survey on Open-Vocabulary Detection and Segmentation: Past, Present, and Future
Chaoyang Zhu
Long Chen
ObjD
VLM
24
32
0
18 Jul 2023
Multimodal Diffusion Segmentation Model for Object Segmentation from
  Manipulation Instructions
Multimodal Diffusion Segmentation Model for Object Segmentation from Manipulation Instructions
Yui Iioka
Y. Yoshida
Yuiga Wada
Shumpei Hatanaka
K. Sugiura
DiffM
26
5
0
17 Jul 2023
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up
  Patch Summarization
BUS:Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization
Chaoya Jiang
Haiyang Xu
Wei Ye
Qinghao Ye
Chenliang Li
Mingshi Yan
Bin Bi
Shikun Zhang
Fei Huang
Songfang Huang
VLM
21
9
0
17 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language
  Pre-training
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
19
25
0
13 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with
  Language Models
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
28
477
0
12 Jul 2023
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic
  Manipulation
GVCCI: Lifelong Learning of Visual Grounding for Language-Guided Robotic Manipulation
Junghyun Kim
Gi-Cheon Kang
Jaein Kim
Suyeon Shin
Byoung-Tak Zhang
LM&Ro
27
6
0
12 Jul 2023
Prototypical Contrastive Transfer Learning for Multimodal Language
  Understanding
Prototypical Contrastive Transfer Learning for Multimodal Language Understanding
Seitaro Otsuki
Shintaro Ishikawa
K. Sugiura
27
1
0
12 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
VLM
MLLM
83
223
0
07 Jul 2023
Vision Language Transformers: A Survey
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
15
5
0
06 Jul 2023
Distilling Large Vision-Language Model with Out-of-Distribution
  Generalizability
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Z. Tu
Haoran Su
VLM
28
23
0
06 Jul 2023
Human Inspired Progressive Alignment and Comparative Learning for
  Grounded Word Acquisition
Human Inspired Progressive Alignment and Comparative Learning for Grounded Word Acquisition
Yuwei Bao
B. Lattimer
J. Chai
CLL
30
1
0
05 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model
  Planners
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
36
219
0
04 Jul 2023
AVSegFormer: Audio-Visual Segmentation with Transformer
AVSegFormer: Audio-Visual Segmentation with Transformer
Sheng Gao
Zhe Chen
Guo Chen
Wenhai Wang
Tong Lu
VOS
24
46
0
03 Jul 2023
CoPL: Contextual Prompt Learning for Vision-Language Understanding
CoPL: Contextual Prompt Learning for Vision-Language Understanding
Koustava Goswami
Srikrishna Karanam
Prateksha Udhayanan
J. JosephK.
Balaji Vasan Srinivasan
VLM
14
8
0
03 Jul 2023
Statler: State-Maintaining Language Models for Embodied Reasoning
Statler: State-Maintaining Language Models for Embodied Reasoning
Takuma Yoneda
Jiading Fang
Peng Li
Huanyu Zhang
Tianchong Jiang
Shengjie Lin
Ben Picker
David Yunis
Hongyuan Mei
Matthew R. Walter
LM&Ro
20
32
0
30 Jun 2023
Look, Remember and Reason: Grounded reasoning in videos with language
  models
Look, Remember and Reason: Grounded reasoning in videos with language models
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
33
7
0
30 Jun 2023
Towards Open Vocabulary Learning: A Survey
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Bernard Ghanem
Dacheng Tao
ObjD
VLM
27
134
0
28 Jun 2023
REFLECT: Summarizing Robot Experiences for Failure Explanation and
  Correction
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
Zeyi Liu
Arpit Bahety
Shuran Song
LRM
16
115
0
27 Jun 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
Kosmos-2: Grounding Multimodal Large Language Models to the World
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
28
688
0
26 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching
  Attention and Input
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
20
4
0
25 Jun 2023
DesCo: Learning Object Recognition with Rich Language Descriptions
DesCo: Learning Object Recognition with Rich Language Descriptions
Liunian Harold Li
Zi-Yi Dou
Nanyun Peng
Kai-Wei Chang
ObjD
VLM
16
20
0
24 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large
  Vision-Language Model for Remote Sensing
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
27
52
0
20 Jun 2023
Visually-Guided Sound Source Separation with Audio-Visual Predictive
  Coding
Visually-Guided Sound Source Separation with Audio-Visual Predictive Coding
Zengjie Song
Zhaoxiang Zhang
19
1
0
19 Jun 2023
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot
  Vision-and-Language Navigation
CorNav: Autonomous Agent with Self-Corrected Planning for Zero-Shot Vision-and-Language Navigation
Xiwen Liang
Liang Ma
Shanshan Guo
Jianhua Han
Hang Xu
Shikui Ma
Xiaodan Liang
LM&Ro
LLMAG
79
4
0
17 Jun 2023
Scaling Open-Vocabulary Object Detection
Scaling Open-Vocabulary Object Detection
Matthias Minderer
A. Gritsenko
N. Houlsby
VLM
ObjD
24
175
0
16 Jun 2023
Recurrent Action Transformer with Memory
Recurrent Action Transformer with Memory
A. Staroverov
A. Bessonov
Dmitry A. Yudin
A. Kovalev
Aleksandr I. Panov
OffRL
30
4
0
15 Jun 2023
Exploring the Application of Large-scale Pre-trained Models on Adverse
  Weather Removal
Exploring the Application of Large-scale Pre-trained Models on Adverse Weather Removal
Zhentao Tan
Yue-bo Wu
Qiankun Liu
Qi Chu
Le Lu
Jieping Ye
Nenghai Yu
40
11
0
15 Jun 2023
World-to-Words: Grounded Open Vocabulary Acquisition through Fast
  Mapping in Vision-Language Models
World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
Ziqiao Ma
Jiayi Pan
J. Chai
ObjD
VLM
21
8
0
14 Jun 2023
detrex: Benchmarking Detection Transformers
detrex: Benchmarking Detection Transformers
Tianhe Ren
Siyi Liu
Feng Li
Hao Zhang
Ailing Zeng
...
Zhaoyang Zeng
Xianbiao Qi
Yuhui Yuan
Jianwei Yang
Lei Zhang
25
13
0
12 Jun 2023
EventCLIP: Adapting CLIP for Event-based Object Recognition
EventCLIP: Adapting CLIP for Event-based Object Recognition
Ziyi Wu
Xudong Liu
Igor Gilitschenski
VLM
22
14
0
10 Jun 2023
Multi-Modal Classifiers for Open-Vocabulary Object Detection
Multi-Modal Classifiers for Open-Vocabulary Object Detection
Prannay Kaul
Weidi Xie
Andrew Zisserman
ObjD
VLM
MLLM
14
47
0
08 Jun 2023
Matting Anything
Matting Anything
Jiacheng Li
Jitesh Jain
Humphrey Shi
VLM
28
16
0
08 Jun 2023
ScaleDet: A Scalable Multi-Dataset Object Detector
ScaleDet: A Scalable Multi-Dataset Object Detector
Yanbei Chen
Manchen Wang
Abhay Mittal
Zhenlin Xu
Paolo Favaro
Joseph Tighe
Davide Modolo
ObjD
6
19
0
08 Jun 2023
Fine-Grained Visual Prompting
Fine-Grained Visual Prompting
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD
VLM
24
60
0
07 Jun 2023
Language Adaptive Weight Generation for Multi-task Visual Grounding
Language Adaptive Weight Generation for Multi-task Visual Grounding
Wei Su
Peihan Miao
Huanzhang Dou
Gaoang Wang
Liang Qiao
Zheyang Li
Xi Li
ObjD
25
32
0
06 Jun 2023
Referring Expression Comprehension Using Language Adaptive Inference
Referring Expression Comprehension Using Language Adaptive Inference
Wei Su
Peihan Miao
Huanzhang Dou
Yongjian Fu
Xi Li
ObjD
6
18
0
06 Jun 2023
DisCLIP: Open-Vocabulary Referring Expression Generation
DisCLIP: Open-Vocabulary Referring Expression Generation
Lior Bracha
E. Shaar
Aviv Shamsian
Ethan Fetaya
Gal Chechik
ObjD
28
7
0
30 May 2023
Multi-modal Queried Object Detection in the Wild
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
26
29
0
30 May 2023
Contextual Object Detection with Multimodal Large Language Models
Contextual Object Detection with Multimodal Large Language Models
Yuhang Zang
Wei Li
Jun Han
Kaiyang Zhou
Chen Change Loy
ObjD
VLM
MLLM
22
77
0
29 May 2023
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Z-GMOT: Zero-shot Generic Multiple Object Tracking
Kim Hoang Tran
Anh Duy Le Dinh
Tien-Phat Nguyen
Thinh Phan
Pha Nguyen
Khoa Luu
Don Adjeroh
Gianfranco Doretto
Ngan Hoang Le
VOT
28
5
0
28 May 2023
Modularized Zero-shot VQA with Pre-trained Models
Modularized Zero-shot VQA with Pre-trained Models
Rui Cao
Jing Jiang
LRM
19
2
0
27 May 2023
Referred by Multi-Modality: A Unified Temporal Transformer for Video
  Object Segmentation
Referred by Multi-Modality: A Unified Temporal Transformer for Video Object Segmentation
Shilin Yan
Renrui Zhang
Ziyu Guo
Wenchao Chen
Wei Zhang
Hongyang Li
Yu Qiao
Hao Dong
Zhongjiang He
Peng Gao
VOS
16
30
0
25 May 2023
Multi-Modal Mutual Attention and Iterative Interaction for Referring
  Image Segmentation
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
19
47
0
24 May 2023
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image
  Regions
GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions
Woojeong Jin
Subhabrata Mukherjee
Yu Cheng
Yelong Shen
Weizhu Chen
Ahmed Hassan Awadallah
Damien Jose
Xiang Ren
ObjD
VLM
25
8
0
24 May 2023
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
Cross3DVG: Cross-Dataset 3D Visual Grounding on Different RGB-D Scans
Taiki Miyanishi
Daich Azuma
Shuhei Kurita
M. Kawanabe
28
2
0
23 May 2023
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Perception Test: A Diagnostic Benchmark for Multimodal Video Models
Viorica Puatruaucean
Lucas Smaira
Ankush Gupta
Adrià Recasens Continente
L. Markeeva
...
Y. Aytar
Simon Osindero
Dima Damen
Andrew Zisserman
João Carreira
VLM
130
139
0
23 May 2023
Type-to-Track: Retrieve Any Object via Prompt-based Tracking
Type-to-Track: Retrieve Any Object via Prompt-based Tracking
Pha Nguyen
Kha Gia Quach
Kris M. Kitani
Khoa Luu
30
18
0
22 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
22
90
0
19 May 2023
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual
  Grounding
TreePrompt: Learning to Compose Tree Prompts for Explainable Visual Grounding
Chenchi Zhang
Jun Xiao
Lei Chen
Jian Shao
Long Chen
VLM
LRM
22
2
0
19 May 2023
Previous
123...678...111213
Next