2104.12763
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
50 / 678 papers shown
Proposal-Level Unsupervised Domain Adaptation for Open World Unbiased Detector
Xuanyi Liu
Zhongqi Yue
Xian-Sheng Hua
309
2
0
04 Nov 2023
Recognize Any Regions
Neural Information Processing Systems (NeurIPS), 2023
Haosen Yang
Chuofan Ma
Bin Wen
Yi Jiang
Zehuan Yuan
Xiatian Zhu
ObjD
VLM
361
3
0
02 Nov 2023
Spuriosity Rankings for Free: A Simple Framework for Last Layer Retraining Based on Object Detection
Mohammad Azizmalayeri
Reza Abbasi
Amir Hosein Haji Mohammad Rezaie
Reihaneh Zohrabi
Mahdi Amiri
M. T. Manzuri
M. Rohban
172
0
0
31 Oct 2023
A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis
medRxiv (medRxiv), 2023
Yingshu Li
Yunyi Liu
Zhanyu Wang
Xinyu Liang
Lei Wang
Lingqiao Liu
Leyang Cui
Zhaopeng Tu
Longyue Wang
Luping Zhou
ELM
LM&MA
329
0
0
31 Oct 2023
ViCLEVR: A Visual Reasoning Dataset and Hybrid Multimodal Fusion Model for Visual Question Answering in Vietnamese
Khiem Vinh Tran
Hao Phu Phan
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
155
15
0
27 Oct 2023
3D-Aware Visual Question Answering about Parts, Poses and Occlusions
Neural Information Processing Systems (NeurIPS), 2023
Xingrui Wang
Wufei Ma
Zhuowan Li
Adam Kortylewski
Yaoyao Liu
CoGe
318
21
0
27 Oct 2023
RIO: A Benchmark for Reasoning Intention-Oriented Objects in Open Environments
Neural Information Processing Systems (NeurIPS), 2023
Mengxue Qu
Yu-Huan Wu
Wu Liu
Xiaodan Liang
Jingkuan Song
Yao-Min Zhao
Yunchao Wei
239
19
0
26 Oct 2023
Context Does Matter: End-to-end Panoptic Narrative Grounding with Deformable Attention Refined Matching Network
Industrial Conference on Data Mining (IDM), 2023
Yiming Lin
Xiao-Bo Jin
Qiufeng Wang
Kaizhu Huang
155
5
0
25 Oct 2023
Video Referring Expression Comprehension via Transformer with Content-conditioned Query
Jiang Ji
Meng Cao
Tengtao Song
Long Chen
Yi Wang
Yuexian Zou
267
6
0
25 Oct 2023
What's Left? Concept Grounding with Logic-Enhanced Foundation Models
Neural Information Processing Systems (NeurIPS), 2023
Joy Hsu
Jiayuan Mao
Joshua B. Tenenbaum
Jiajun Wu
VLM
ReLM
LRM
384
41
0
24 Oct 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
263
17
0
24 Oct 2023
OV-VG: A Benchmark for Open-Vocabulary Visual Grounding
Chunlei Wang
Wenquan Feng
Xiangtai Li
Guangliang Cheng
Shuchang Lyu
Binghao Liu
Lijiang Chen
Qi Zhao
ObjD
VLM
269
14
0
22 Oct 2023
LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly
Bowen Fu
Sek Kun Leong
Yan Di
Jiwen Tang
Xiangyang Ji
281
5
0
20 Oct 2023
Multiscale Superpixel Structured Difference Graph Convolutional Network for VL Representation
Siyu Zhang
Ye-Ting Chen
Fang Wang
Yaoru Sun
Jun Yang
Lizhi Bai
SSL
299
1
0
20 Oct 2023
Weakly-Supervised Semantic Segmentation with Image-Level Labels: from Traditional Models to Foundation Models
ACM Computing Surveys (ACM Comput. Surv.), 2023
Zhaozheng Chen
Qianru Sun
VLM
426
24
0
19 Oct 2023
Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection
Neural Information Processing Systems (NeurIPS), 2023
Lingchen Meng
Xiyang Dai
Jianwei Yang
Dongdong Chen
Yinpeng Chen
Yi-Ling Chen
Zuxuan Wu
Lu Yuan
Yu-Gang Jiang
154
12
0
18 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
192
1
0
18 Oct 2023
NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Haowei Wang
Jiayi Ji
Tianyu Guo
Yilong Yang
Weihao Ye
Xiaoshuai Sun
Rongrong Ji
349
8
0
17 Oct 2023
Zero-Shot Robotic Manipulation with Pretrained Image-Editing Diffusion Models
International Conference on Learning Representations (ICLR), 2023
Kevin Black
Mitsuhiko Nakamoto
P. Atreya
Homer Walke
Chelsea Finn
Aviral Kumar
Sergey Levine
DiffM
LM&Ro
388
235
0
16 Oct 2023
Ferret: Refer and Ground Anything Anywhere at Any Granularity
International Conference on Learning Representations (ICLR), 2023
Haoxuan You
Haotian Zhang
Zhe Gan
Xianzhi Du
Bowen Zhang
Zirui Wang
Liangliang Cao
Shih-Fu Chang
Yinfei Yang
ObjD
MLLM
VLM
415
453
0
11 Oct 2023
CoT3DRef: Chain-of-Thoughts Data-Efficient 3D Visual Grounding
International Conference on Learning Representations (ICLR), 2023
Eslam Mohamed Bakr
Mohamed Ayman
Mahmoud Ahmed
Habib Slim
Mohamed Elhoseiny
LRM
381
16
0
10 Oct 2023
InstructDET: Diversifying Referring Object Detection with Generalized Instructions
International Conference on Learning Representations (ICLR), 2023
Ronghao Dang
Jiangyan Feng
Haodong Zhang
Chongjian Ge
Lin Song
...
Chengju Liu
Qi Chen
Feng Zhu
Rui Zhao
Yibing Song
ObjD
434
16
0
08 Oct 2023
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
144
5
0
08 Oct 2023
Expedited Training of Visual Conditioned Language Generation via Redundancy Reduction
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yiren Jian
Tingkai Liu
Yunzhe Tao
Chunhui Zhang
Soroush Vosoughi
HX Yang
VLM
358
20
0
05 Oct 2023
CoDA: Collaborative Novel Box Discovery and Cross-modal Alignment for Open-vocabulary 3D Object Detection
Neural Information Processing Systems (NeurIPS), 2023
Yang Cao
Yihan Zeng
Hang Xu
Dan Xu
3DPC
ObjD
243
53
0
04 Oct 2023
Unsupervised 3D Perception with 2D Vision-Language Distillation for Autonomous Driving
IEEE International Conference on Computer Vision (ICCV), 2023
Mahyar Najibi
Jingwei Ji
Yin Zhou
C. Qi
Xinchen Yan
Scott Ettinger
Drago Anguelov
240
46
0
25 Sep 2023
CATR: Combinatorial-Dependence Audio-Queried Transformer for Audio-Visual Video Segmentation
ACM Multimedia (ACM MM), 2023
Kexin Li
Zongxin Yang
Lei Chen
Yezhou Yang
Jun Xiao
VOS
252
81
0
18 Sep 2023
PRE: Vision-Language Prompt Learning with Reparameterization Encoder
Anh Pham Thi Minh
An Duc Nguyen
Georgios Tzimiropoulos
VPVLM
VLM
236
3
0
14 Sep 2023
Beyond Generation: Harnessing Text to Image Models for Object Detection and Segmentation
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Neel Joshi
Laurent Itti
Vibhav Vineet
DiffM
215
16
0
12 Sep 2023
Multi3DRefer: Grounding Text Description to Multiple 3D Objects
IEEE International Conference on Computer Vision (ICCV), 2023
Yiming Zhang
ZeMing Gong
Angel X. Chang
394
134
0
11 Sep 2023
Language Prompt for Autonomous Driving
AAAI Conference on Artificial Intelligence (AAAI), 2023
Dongming Wu
Wencheng Han
Tiancai Wang
Yingfei Liu
Cheng-zhong Xu
Jianbing Shen
VLM
474
127
0
08 Sep 2023
Box-based Refinement for Weakly Supervised and Unsupervised Localization Tasks
IEEE International Conference on Computer Vision (ICCV), 2023
Eyal Gomel
Tal Shaharabany
Lior Wolf
ObjD
350
6
0
07 Sep 2023
DetermiNet: A Large-Scale Diagnostic Dataset for Complex Visually-Grounded Referencing using Determiners
IEEE International Conference on Computer Vision (ICCV), 2023
Clarence Lee
M Ganesh Kumar
Cheston Tan
198
3
0
07 Sep 2023
A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models
Noriyuki Kojima
Hadar Averbuch-Elor
Yoav Artzi
317
2
0
06 Sep 2023
Dense Object Grounding in 3D Scenes
ACM Multimedia (ACM MM), 2023
Wencan Huang
Daizong Liu
Wei Hu
259
24
0
05 Sep 2023
CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
IEEE International Conference on Computer Vision (ICCV), 2023
Jiajin Tang
Ge Zheng
Jingyi Yu
Sibei Yang
ObjD
215
39
0
03 Sep 2023
Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications
Wenyi Wu
Karim Bouyarmane
Ismail B. Tutar
54
2
0
30 Aug 2023
GREC: Generalized Referring Expression Comprehension
Shuting He
Henghui Ding
Chang Liu
Xudong Jiang
ObjD
257
34
0
30 Aug 2023
Exploring Multi-Modal Contextual Knowledge for Open-Vocabulary Object Detection
IEEE Transactions on Image Processing (IEEE TIP), 2023
Yifan Xu
Mengdan Zhang
Xiaoshan Yang
Changsheng Xu
ObjD
215
9
0
30 Aug 2023
Shatter and Gather: Learning Referring Image Segmentation with Text Supervision
IEEE International Conference on Computer Vision (ICCV), 2023
Dongwon Kim
Nam-Won Kim
Cuiling Lan
Suha Kwak
VLM
278
27
0
29 Aug 2023
UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory
Computer Vision and Pattern Recognition (CVPR), 2023
Haiwen Diao
Bo Wan
Yanzhe Zhang
Xuecong Jia
Huchuan Lu
Long Chen
VLM
241
27
0
28 Aug 2023
Towards Unified Token Learning for Vision-Language Tracking
Yaozong Zheng
Bineng Zhong
Qihua Liang
Guorong Li
Rongrong Ji
Xianxian Li
270
81
0
27 Aug 2023
Beyond One-to-One: Rethinking the Referring Image Segmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Yutao Hu
Qixiong Wang
Wenqi Shao
Enze Xie
Zhenguo Li
Jungong Han
Ping Luo
3DV
244
70
0
26 Aug 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Chi Chen
Ruoyu Qin
Ziyue Wang
Xiaoyue Mi
Peng Li
Maosong Sun
Yang Liu
MLLM
VLM
301
55
0
25 Aug 2023
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yi Yao
Peng Liu
Tiancheng Zhao
Qianqian Zhang
Jiajia Liao
Chunxin Fang
Kyusong Lee
Qing Wang
VLM
ObjD
197
17
0
25 Aug 2023
SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Ziyan Yang
Kushal Kafle
Zhe Lin
Scott D. Cohen
Zhihong Ding
Vicente Ordonez
253
1
0
24 Aug 2023
Grounded Entity-Landmark Adaptive Pre-training for Vision-and-Language Navigation
IEEE International Conference on Computer Vision (ICCV), 2023
Yibo Cui
Liang Xie
Yakun Zhang
Meishan Zhang
Ye Yan
Erwei Yin
LM&Ro
223
27
0
24 Aug 2023
HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks
Zichao Dong
Weikun Zhang
Xufeng Huang
Hang Ji
Xin Zhan
Junbo Chen
VLM
91
6
0
24 Aug 2023
RefEgo: Referring Expression Comprehension Dataset from First-Person Perception of Ego4D
IEEE International Conference on Computer Vision (ICCV), 2023
Shuhei Kurita
Naoki Katsura
Eri Onami
EgoV
257
22
0
23 Aug 2023
Deep Metric Loss for Multimodal Learning
Machine-mediated learning (ML), 2023
Sehwan Moon
Hyun-Yong Lee
179
0
0
21 Aug 2023