2104.12763
Cited By
v1
v2 (latest)
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
50 / 671 papers shown
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
1.1K
1
0
01 Dec 2024
Perception Test 2024: Challenge Summary and a Novel Hour-Long VideoQA Benchmark
Joseph Heyward
João Carreira
Dima Damen
Andrew Zisserman
Viorica Patraucean
299
3
0
29 Nov 2024
From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects
Zizhao Li
Zhengkang Xiang
Joseph West
Kourosh Khoshelham
ObjD
VLM
345
3
0
27 Nov 2024
Leverage Task Context for Object Affordance Ranking
Haojie Huang
Hongchen Luo
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
246
0
0
25 Nov 2024
Open Vocabulary Monocular 3D Object Detection
Jin Yao
Hao Gu
Xuweiyi Chen
Jiayun Wang
Zezhou Cheng
ObjD
VLM
418
9
0
25 Nov 2024
Learning to Reason Iteratively and Parallelly for Complex Visual Reasoning Scenarios
Neural Information Processing Systems (NeurIPS), 2024
Shantanu Jaiswal
Debaditya Roy
Basura Fernando
Cheston Tan
ReLM
LRM
315
4
0
20 Nov 2024
TrojanRobot: Physical-world Backdoor Attacks Against VLM-based Robotic Manipulation
Xiaobei Wang
Hewen Pan
Hangtao Zhang
Minghui Li
Shengshan Hu
...
Lulu Xue
Peijin Guo
Yichen Wang
Wei Wan
Aishan Liu
AAML
584
2
0
18 Nov 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Wei Wang
Hao Sun
Qi Xu
Linfeng Li
Yiqing Cai
Botian Jiang
Hang Song
Xingcan Hu
Pengyu Wang
Li Xiao
148
7
0
14 Nov 2024
AD-DINO: Attention-Dynamic DINO for Distance-Aware Embodied Reference Understanding
Hao Guo
Wei Fan
Baichun Wei
Jianfei Zhu
Jin Tian
Chunzhi Yi
Feng Jiang
221
0
0
13 Nov 2024
LidaRefer: Context-aware Outdoor 3D Visual Grounding for Autonomous Driving
Yeong-Seung Baek
Heung-Seon Oh
224
0
0
07 Nov 2024
Finding NeMo: Negative-mined Mosaic Augmentation for Referring Image Segmentation
European Conference on Computer Vision (ECCV), 2024
Seongsu Ha
Chaeyun Kim
Donghwa Kim
Junho Lee
Sangho Lee
Joonseok Lee
214
6
0
03 Nov 2024
Referring Human Pose and Mask Estimation in the Wild
Neural Information Processing Systems (NeurIPS), 2024
Bo Miao
Mingtao Feng
Zijie Wu
Mohammed Bennamoun
Yongsheng Gao
Lin Wang
172
6
0
27 Oct 2024
Zero-shot Object Navigation with Vision-Language Models Reasoning
International Conference on Pattern Recognition (ICPR), 2024
Congcong Wen
Yisiyuan Huang
Niraj Pudasaini
Yanjia Huang
Shuaihang Yuan
Yu Hao
Hui Lin
Yu-Shen Liu
Yi Fang
LM&Ro
200
20
0
24 Oct 2024
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Fan Yang
Ming Tang
Jinqiao Wang
MLLM
251
3
0
21 Oct 2024
Open-vocabulary vs. Closed-set: Best Practice for Few-shot Object Detection Considering Text Describability
Yusuke Hosoya
Masanori Suganuma
Takayuki Okatani
ObjD
236
0
0
20 Oct 2024
Temporal-Enhanced Multimodal Transformer for Referring Multi-Object Tracking and Segmentation
Changcheng Xiao
Qiong Cao
Yujie Zhong
Xiang Zhang
Tao Wang
Canqun Yang
L. Lan
154
3
0
17 Oct 2024
Context-Infused Visual Grounding for Art
Selina Khan
Nanne van Noord
ObjD
166
2
0
16 Oct 2024
DINTR: Tracking via Diffusion-based Interpolation
Neural Information Processing Systems (NeurIPS), 2024
Pha Nguyen
Ngan Le
J. Cothren
Alper Yilmaz
Khoa Luu
DiffM
273
3
0
14 Oct 2024
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Computer Vision and Pattern Recognition (CVPR), 2024
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
241
8
0
14 Oct 2024
DFIMat: Decoupled Flexible Interactive Matting in Multi-Person Scenarios
Asian Conference on Computer Vision (ACCV), 2024
Siyi Jiao
Wenzheng Zeng
Changxin Gao
Nong Sang
128
3
0
13 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Neural Information Processing Systems (NeurIPS), 2024
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
370
20
0
10 Oct 2024
G²TR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models
Riya Arora
N. N.
Aman Tambi
Sandeep S. Zachariah
Souvik Chakraborty
Rohan Paul
LM&Ro
147
0
0
10 Oct 2024
Structured Spatial Reasoning with Open Vocabulary Object Detectors
Negar Nejatishahidin
Madhukar Reddy Vongala
Jana Kosecka
180
3
0
09 Oct 2024
Grounding Partially-Defined Events in Multimodal Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kate Sanders
Reno Kriz
David Etter
Hannah Recknor
Alexander Martin
Cameron Carpenter
Jingyang Lin
Benjamin Van Durme
131
4
0
07 Oct 2024
ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models
Mengxue Qu
Xiaodong Chen
Wu Liu
Alicia Li
Yao Zhao
150
34
0
01 Oct 2024
You Only Speak Once to See
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Wenhao Yang
Jianguo Wei
Wenhuan Lu
Lei Li
VOS
146
4
0
27 Sep 2024
SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Neural Information Processing Systems (NeurIPS), 2024
Ming Dai
Lingfeng Yang
Yihao Xu
Zhenhua Feng
Wankou Yang
ObjD
346
36
0
26 Sep 2024
Harnessing Shared Relations via Multimodal Mixup Contrastive Learning for Multimodal Classification
Raja Kumar
Raghav Singhal
Pranamya Kulkarni
Deval Mehta
Kshitij S. Jadhav
290
2
0
26 Sep 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Junzhuo Liu
Xiaohu Yang
Weiwei Li
Peng Wang
ObjD
323
11
0
23 Sep 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
IEEE International Conference on Robotics and Automation (ICRA), 2024
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
255
2
0
23 Sep 2024
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
446
6
0
20 Sep 2024
LLM-wrapper: Black-Box Semantic-Aware Adaptation of Vision-Language Models for Referring Expression Comprehension
International Conference on Learning Representations (ICLR), 2024
Amaia Cardiel
Éloi Zablocki
Oriane Siméoni
Elias Ramzi
Matthieu Cord
VLM
232
0
0
18 Sep 2024
Robot Manipulation in Salient Vision through Referring Image Segmentation and Geometric Constraints
IEEE International Conference on Robotics and Automation (ICRA), 2024
Chen Jiang
Allie Luo
Martin Jägersand
217
4
0
17 Sep 2024
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Haoxuan Wang
Qu He
Jinlong Peng
Hao Yang
Mingmin Chi
Yabiao Wang
Mamba
221
7
0
13 Sep 2024
VLTP: Vision-Language Guided Token Pruning for Task-Oriented Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Hanning Chen
Yang Ni
Wenjun Huang
Yezi Liu
SungHeon Jeong
Fei Wen
Nathaniel D. Bastian
Hugo Latapie
Mohsen Imani
VLM
176
9
0
13 Sep 2024
An Attribute-Enriched Dataset and Auto-Annotated Pipeline for Open Detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Pengfei Qi
Yifei Zhang
Wenqiang Li
Youwen Hu
Kunlong Bai
ObjD
175
0
0
10 Sep 2024
Context is the Key: Backdoor Attacks for In-Context Learning with Vision Transformers
Gorka Abad
S. Picek
Lorenzo Cavallaro
A. Urbieta
SILM
197
1
0
06 Sep 2024
Make Graph-based Referring Expression Comprehension Great Again through Expression-guided Dynamic Gating and Regression
IEEE transactions on multimedia (IEEE TMM), 2024
Jingcheng Ke
Dele Wang
Jun-Cheng Chen
I-Hong Jhuo
Chia-Wen Lin
Yen-Yu Lin
202
1
0
05 Sep 2024
More Pictures Say More: Visual Intersection Network for Open Set Object Detection
Bingcheng Dong
Yuning Ding
Jinrong Zhang
Sifan Zhang
Shenglan Liu
ObjD
142
0
0
26 Aug 2024
LowCLIP: Adapting the CLIP Model Architecture for Low-Resource Languages in Multimodal Image Retrieval Task
Ali Asgarov
Samir Rustamov
VLM
94
3
0
25 Aug 2024
R2G: Reasoning to Ground in 3D Scenes
Pattern Recognition (Pattern Recogn.), 2024
Yixuan Li
Zan Wang
Wei Liang
237
2
0
24 Aug 2024
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models
Matteo Forlini
Mihail Babcinschi
Giacomo Palmieri
Pedro Neto
125
2
0
21 Aug 2024
On the Potential of Open-Vocabulary Models for Object Detection in Unusual Street Scenes
Sadia Ilyas
Ido Freeman
Matthias Rottmann
ObjD
284
6
0
20 Aug 2024
Towards Flexible Visual Relationship Segmentation
Neural Information Processing Systems (NeurIPS), 2024
Fangrui Zhu
Jianwei Yang
Huaizu Jiang
VOS
256
4
0
15 Aug 2024
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
European Conference on Computer Vision (ECCV), 2024
Wei Chen
Mahdieh Hatamian
Yu Wu
194
16
0
02 Aug 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
European Conference on Computer Vision (ECCV), 2024
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
357
3
0
28 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
168
13
0
23 Jul 2024
HAPFI: History-Aware Planning based on Fused Information
Sujin Jeon
Suyeon Shin
Byoung-Tak Zhang
148
0
0
23 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
170
5
0
22 Jul 2024
Weak-to-Strong Compositional Learning from Generative Models for Language-based Object Detection
Kwanyong Park
Kuniaki Saito
Donghyun Kim
VLM
CoGe
181
4
0
21 Jul 2024