2104.12763
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
50 / 678 papers shown
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
European Conference on Computer Vision (ECCV), 2024
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
176
12
0
12 Jan 2024
GroundingGPT: Language Enhanced Multi-modal Grounding Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Zhaowei Li
Qi Xu
Dong Zhang
Hang Song
Yiqing Cai
...
Junting Pan
Zefeng Li
Van Tu Vu
Zhida Huang
Tao Wang
581
94
0
11 Jan 2024
An Open and Comprehensive Pipeline for Unified Object Grounding and Detection
Xiangyu Zhao
Yicheng Chen
Shilin Xu
Xiangtai Li
Xinjiang Wang
Yining Li
Haian Huang
ObjD
AI4CE
303
53
0
04 Jan 2024
Context-Guided Spatio-Temporal Video Grounding
Computer Vision and Pattern Recognition (CVPR), 2024
Xin Gu
Hengrui Fan
Yan Huang
Tiejian Luo
Libo Zhang
244
38
0
03 Jan 2024
Glance and Focus: Memory Prompting for Multi-Event Video Question Answering
Neural Information Processing Systems (NeurIPS), 2024
Ziyi Bai
Ruiping Wang
Xilin Chen
324
12
0
03 Jan 2024
Generating Enhanced Negatives for Training Language-Based Object Detectors
Computer Vision and Pattern Recognition (CVPR), 2023
Shiyu Zhao
Long Zhao
Vijay Kumar B.G
Yumin Suh
Dimitris N. Metaxas
Manmohan Chandraker
S. Schulter
ObjD
VLM
412
13
0
29 Dec 2023
Bridging Modality Gap for Visual Grounding with Effective Cross-modal Distillation
Chinese Conference on Pattern Recognition and Computer Vision (PRCV), 2023
Jiaxi Wang
Wenhui Hu
Xueyang Liu
Beihu Wu
Yuting Qiu
Yingying Cai
251
1
0
29 Dec 2023
Set Prediction Guided by Semantic Concepts for Diverse Video Captioning
Yifan Lu
Ziqi Zhang
Chunfeng Yuan
Peng Li
Yan Wang
Bing Li
Weiming Hu
141
6
0
25 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
269
26
0
25 Dec 2023
Cycle-Consistency Learning for Captioning and Grounding
Ning Wang
Jiajun Deng
Mingbo Jia
ObjD
227
13
0
23 Dec 2023
Learning from Mistakes: Iterative Prompt Relabeling for Text-to-Image Diffusion Model Training
Xinyan Chen
Jiaxin Ge
Tianjun Zhang
Jiaming Liu
Shanghang Zhang
VLM
EGVM
448
2
0
23 Dec 2023
GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection
Haozhan Shen
Tiancheng Zhao
Mingwei Zhu
Yuxiang Cai
VLM
ObjD
404
25
0
22 Dec 2023
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
379
317
0
21 Dec 2023
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Junfei Xiao
Ziqi Zhou
Wenxuan Li
Shiyi Lan
Jieru Mei
Zhiding Yu
Yaoyao Liu
Yuyin Zhou
Cihang Xie
VLM
176
1
0
21 Dec 2023
Perception Test 2023: A Summary of the First Challenge And Outcome
Joseph Heyward
João Carreira
Dima Damen
Andrew Zisserman
Viorica Patraucean
220
0
0
20 Dec 2023
Spectral Prompt Tuning: Unveiling Unseen Classes for Zero-Shot Semantic Segmentation
Wenhao Xu
Rongtao Xu
Changwei Wang
Shibiao Xu
Li Guo
Man Zhang
Xiaopeng Zhang
VLM
227
18
0
20 Dec 2023
Weakly Supervised Open-Vocabulary Object Detection
Jianghang Lin
Chunjiang Ge
Bingquan Wang
Shaohui Lin
Ke Li
Liujuan Cao
WSOD
285
15
0
19 Dec 2023
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLM
MLLM
374
50
0
19 Dec 2023
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Wei Tang
Liang Li
Xuejing Liu
Lu Jin
Jinhui Tang
Zechao Li
236
41
0
19 Dec 2023
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation
Sihan Liu
Yiwei Ma
Xiaoqing Zhang
Haowei Wang
Jiayi Ji
Xiaoshuai Sun
Rongrong Ji
403
85
0
19 Dec 2023
Text-Conditioned Resampler For Long Form Video Understanding
Bruno Korbar
Yongqin Xian
A. Tonioni
Andrew Zisserman
Federico Tombari
292
23
0
19 Dec 2023
Pixel Aligned Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
271
17
0
14 Dec 2023
General Object Foundation Model for Images and Videos at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
Junfeng Wu
Yi Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
324
76
0
14 Dec 2023
Exploration of visual prompt in Grounded pre-trained open-set detection
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Qibo Chen
Weizhong Jin
Shuchang Li
Mengdi Liu
Li Yu
Jian Jiang
Xiaozheng Wang
VLM
100
1
0
14 Dec 2023
Segment Beyond View: Handling Partially Missing Modality for Audio-Visual Semantic Segmentation
AAAI Conference on Artificial Intelligence (AAAI), 2023
Renjie Wu
Hu Wang
Feras Dayoub
Hsiang-Ting Chen
172
10
0
14 Dec 2023
SKDF: A Simple Knowledge Distillation Framework for Distilling Open-Vocabulary Knowledge to Open-world Object Detector
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Shuailei Ma
Yuefeng Wang
Ying-yu Wei
Jiaqi Fan
Enming Zhang
Xinyu Sun
Peihao Chen
ObjD
282
3
0
14 Dec 2023
EZ-CLIP: Efficient Zeroshot Video Action Recognition
Shahzad Ahmad
S. Chanda
Yogesh S Rawat
VLM
254
11
0
13 Dec 2023
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Computer Vision and Pattern Recognition (CVPR), 2023
Shuyang Sun
Runjia Li
Juil Sock
Xiuye Gu
Siyang Li
VLM
CLIP
443
56
0
12 Dec 2023
Genixer: Empowering Multimodal Large Language Models as a Powerful Data Generator
Henry Hengyuan Zhao
Pan Zhou
Mike Zheng Shou
MLLM
SyDa
426
11
0
11 Dec 2023
Visual Grounding of Whole Radiology Reports for 3D CT Images
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2023
Akimichi Ichinose
Taro Hatsutani
Keigo Nakamura
Yoshiro Kitamura
S. Iizuka
E. Simo-Serra
Shoji Kido
Noriyuki Tomiyama
196
12
0
08 Dec 2023
Improved Visual Grounding through Self-Consistent Explanations
Ruozhen He
Paola Cascante-Bonilla
Ziyan Yang
Alexander C. Berg
Vicente Ordonez
ReLM
ObjD
LRM
FAtt
258
24
0
07 Dec 2023
GPT-4 Enhanced Multimodal Grounding for Autonomous Driving: Leveraging Cross-Modal Attention with Large Language Models
Haicheng Liao
Huanming Shen
Zhenning Li
Chengyue Wang
Guofa Li
Yiming Bie
Chengzhong Xu
234
80
0
06 Dec 2023
LLaVA-Grounding: Grounded Visual Chat with Large Multimodal Models
Hao Zhang
Hongyang Li
Feng Li
Tianhe Ren
Xueyan Zou
...
Shijia Huang
Jianfeng Gao
Lei Zhang
Chun-yue Li
Jianwei Yang
335
111
0
05 Dec 2023
Lenna: Language Enhanced Reasoning Detection Assistant
Fei Wei
Xinyu Zhang
Ailing Zhang
Bo Zhang
Xiangxiang Chu
MLLM
LRM
254
30
0
05 Dec 2023
Aligning and Prompting Everything All at Once for Universal Visual Perception
Computer Vision and Pattern Recognition (CVPR), 2023
Chunjiang Ge
Chaoyou Fu
Peixian Chen
Mengdan Zhang
Ke Li
Xing Sun
Yunsheng Wu
Shaohui Lin
Rongrong Ji
VLM
ObjD
279
63
0
04 Dec 2023
Towards Generalizable Referring Image Segmentation via Target Prompt and Visual Coherence
International Conference on Information Photonics (ICIP), 2023
Yajie Liu
Pu Ge
Haoxiang Ma
Shichao Fan
Qingjie Liu
Di Huang
Yunhong Wang
171
1
0
01 Dec 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation
Rongyao Fang
Shilin Yan
Zhaoyang Huang
Jingqiu Zhou
Hao Tian
Jifeng Dai
Jiaming Song
MLLM
204
16
0
30 Nov 2023
Language-conditioned Detection Transformer
Computer Vision and Pattern Recognition (CVPR), 2023
Jang Hyun Cho
Philipp Krahenbuhl
VLM
ObjD
183
4
0
29 Nov 2023
The devil is in the fine-grained details: Evaluating open-vocabulary object detectors for fine-grained understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Lorenzo Bianchi
F. Carrara
Nicola Messina
Claudio Gennaro
Fabrizio Falchi
ObjD
329
24
0
29 Nov 2023
No Representation Rules Them All in Category Discovery
Neural Information Processing Systems (NeurIPS), 2023
S. Vaze
Andrea Vedaldi
Andrew Zisserman
OOD
247
55
0
28 Nov 2023
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Computer Vision and Pattern Recognition (CVPR), 2023
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
390
19
0
28 Nov 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
European Conference on Computer Vision (ECCV), 2023
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
216
29
0
24 Nov 2023
Visual In-Context Prompting
Computer Vision and Pattern Recognition (CVPR), 2023
Feng Li
Qing Jiang
Hao Zhang
Tianhe Ren
Shilong Liu
...
Hongyang Li
Chun-yue Li
Jianwei Yang
Lei Zhang
Jianfeng Gao
VLM
LRM
MLLM
176
51
0
22 Nov 2023
Enhancing Visual Grounding and Generalization: A Multi-Task Cycle Training Approach for Vision-Language Models
Xiaoyu Yang
Lijian Xu
Hao Sun
Jiaming Song
Shaoting Zhang
ObjD
409
10
0
21 Nov 2023
To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning
Junke Wang
Lingchen Meng
Zejia Weng
Bo He
Zuxuan Wu
Yu-Gang Jiang
MLLM
VLM
252
133
0
13 Nov 2023
PerceptionGPT: Effectively Fusing Visual Perception into LLM
Computer Vision and Pattern Recognition (CVPR), 2023
Renjie Pi
Lewei Yao
Jiahui Gao
Jipeng Zhang
Tong Zhang
MLLM
179
55
0
11 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Conference on Robot Learning (CoRL), 2023
Georgios Tziafas
Yucheng Xu
Arushi Goel
Mohammadreza Kasaei
Zhibin Li
Hamidreza Kasaei
235
39
0
09 Nov 2023
DualTalker: A Cross-Modal Dual Learning Approach for Speech-Driven 3D Facial Animation
Guinan Su
Yanwu Yang
Zhifeng Li
VGen
224
3
0
08 Nov 2023
GLaMM: Pixel Grounding Large Multimodal Model
Computer Vision and Pattern Recognition (CVPR), 2023
H. Rasheed
Muhammad Maaz
Sahal Shaji Mullappilly
Abdelrahman M. Shaker
Salman Khan
Hisham Cholakkal
Rao M. Anwer
Eric Xing
Ming-Hsuan Yang
Fahad S. Khan
MLLM
VLM
425
390
0
06 Nov 2023
Augment the Pairs: Semantics-Preserving Image-Caption Pair Augmentation for Grounding-Based Vision and Language Models
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jingru Yi
Burak Uzkent
Oana Ignat
Zili Li
Amanmeet Garg
Xiang Yu
Linda Liu
VLM
251
2
0
05 Nov 2023