2104.12763
Cited By
v1
v2 (latest)
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing "MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
50 / 677 papers shown
Open Vocabulary Semantic Segmentation with Patch Aligned Contrastive Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Jishnu Mukhoti
Tsung-Yu Lin
Omid Poursaeed
Rui Wang
Ashish Shah
Juil Sock
Ser-Nam Lim
VLM
231
116
0
09 Dec 2022
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
223
28
0
08 Dec 2022
Framework-agnostic Semantically-aware Global Reasoning for Segmentation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Mir Rayat Imtiaz Hossain
Leonid Sigal
James J. Little
ViT
135
0
0
06 Dec 2022
Fine-tuned CLIP Models are Efficient Video Learners
Computer Vision and Pattern Recognition (CVPR), 2022
H. Rasheed
Muhammad Uzair Khattak
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
CLIP
VLM
323
221
0
06 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM
MLLM
297
325
0
05 Dec 2022
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
Neural Information Processing Systems (NeurIPS), 2022
Zicheng Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
Wei Ke
195
35
0
04 Dec 2022
Visual Question Answering From Another Perspective: CLEVR Mental Rotation Tests
Pattern Recognition (Pattern Recogn.), 2022
Christopher Beckham
Martin Weiss
Florian Golemo
S. Honari
Derek Nowrouzezahrai
C. Pal
164
9
0
03 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
195
2
0
02 Dec 2022
Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs
Computer Vision and Pattern Recognition (CVPR), 2022
Junbum Cha
Jonghwan Mun
Byungseok Roh
VLM
351
125
0
01 Dec 2022
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning
Computer Vision and Pattern Recognition (CVPR), 2022
Zhuowan Li
Xingrui Wang
Elias Stengel-Eskin
Adam Kortylewski
Wufei Ma
Benjamin Van Durme
Max Planck Institute for Informatics
OOD
LRM
224
100
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Computer Vision and Pattern Recognition (CVPR), 2022
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
115
18
0
29 Nov 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2022
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
214
39
0
28 Nov 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
246
5
0
28 Nov 2022
SLAN: Self-Locator Aided Network for Cross-Modal Understanding
Jiang-Tian Zhai
Tao Gui
Tong Wu
Xinghan Chen
Jiangjiang Liu
Bo Ren
Ming-Ming Cheng
ObjD
VLM
119
1
0
28 Nov 2022
Learning Object-Language Alignments for Open-Vocabulary Object Detection
International Conference on Learning Representations (ICLR), 2022
Chuang Lin
Pei Sun
Yi Jiang
Ping Luo
Zhuang Li
Gholamreza Haffari
Zehuan Yuan
Jianfei Cai
VLM
ObjD
161
115
0
27 Nov 2022
PUnifiedNER: A Prompting-based Unified NER System for Diverse Datasets
AAAI Conference on Artificial Intelligence (AAAI), 2022
Jinghui Lu
Rui Zhao
Brian Mac Namee
Fei Tan
172
25
0
27 Nov 2022
Who are you referring to? Coreference resolution in image narrations
IEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
246
5
0
26 Nov 2022
Language-Assisted 3D Feature Learning for Semantic Scene Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2022
Junbo Zhang
Guo Fan
Guanghan Wang
Zhèngyuān Sū
Kaisheng Ma
L. Yi
3DPC
231
8
0
25 Nov 2022
TPA-Net: Generate A Dataset for Text to Physics-based Animation
Yuxing Qiu
Feng Gao
Minchen Li
Govind Thattai
Yin Yang
Jian Ren
PINN
DiffM
VGen
193
0
0
25 Nov 2022
Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
R. Burgert
Kanchana Ranasinghe
Xiang Li
Michael S. Ryoo
DiffM
VLM
258
41
0
23 Nov 2022
X²-VLM: All-In-One Pre-trained Model For Vision-Language Tasks
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yan Zeng
Xinsong Zhang
Hang Li
Jiawei Wang
Jipeng Zhang
Wangchunshu Zhou
VLM
MLLM
219
24
0
22 Nov 2022
ClipCrop: Conditioned Cropping Driven by Vision-Language Model
Zhihang Zhong
Mingxi Cheng
Zhirong Wu
Yuhui Yuan
Yinqiang Zheng
Ji Li
Han Hu
Stephen Lin
Yoichi Sato
Imari Sato
VLM
CLIP
110
8
0
21 Nov 2022
Unifying Tracking and Image-Video Object Detection
Peirong Liu
Rui Wang
Pengchuan Zhang
Omid Poursaeed
Yipin Zhou
Xuefei Cao
Sreya Dutta Roy
Ashish Shah
Ser-Nam Lim
165
0
0
20 Nov 2022
Leveraging per Image-Token Consistency for Vision-Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2022
Yunhao Gou
Tom Ko
Hansi Yang
James T. Kwok
Yu Zhang
Mingxuan Wang
VLM
181
11
0
20 Nov 2022
Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Neural Information Processing Systems (NeurIPS), 2022
Shizhe Chen
Pierre-Louis Guhur
Makarand Tapaswi
Cordelia Schmid
Ivan Laptev
219
125
0
17 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
154
5
0
15 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
156
26
0
15 Nov 2022
Grounding Scene Graphs on Natural Images via Visio-Lingual Message Passing
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Aditay Tripathi
Anand Mishra
Anirban Chakraborty
156
3
0
03 Nov 2022
VLT: Vision-Language Transformer and Query Generation for Referring Segmentation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Henghui Ding
Chang Liu
Suchen Wang
Xudong Jiang
284
152
0
28 Oct 2022
Towards Unifying Reference Expression Generation and Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
130
9
0
24 Oct 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
162
5
0
23 Oct 2022
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding
Jiaming Chen
Weihua Luo
Ran Song
Xiaolin K. Wei
Lin Ma
Wei Emma Zhang
3DV
273
7
0
22 Oct 2022
TOIST: Task Oriented Instance Segmentation Transformer with Noun-Pronoun Distillation
Neural Information Processing Systems (NeurIPS), 2022
Pengfei Li
Beiwen Tian
Yongliang Shi
Xiaoxue Chen
Hao Zhao
Guyue Zhou
Ya Zhang
229
29
0
19 Oct 2022
LVP-M3: Language-aware Visual Prompt for Multilingual Multimodal Machine Translation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Hongcheng Guo
Jiaheng Liu
Haoyang Huang
Jian Yang
Zhoujun Li
Dongdong Zhang
Zheng Cui
Furu Wei
149
24
0
19 Oct 2022
Perceptual Grouping in Contrastive Vision-Language Models
IEEE International Conference on Computer Vision (ICCV), 2022
Kanchana Ranasinghe
Brandon McKinzie
S. S. Ravi
Yinfei Yang
Alexander Toshev
Jonathon Shlens
VLM
380
71
0
18 Oct 2022
How to Train Vision Transformer on Small-scale Datasets?
British Machine Vision Conference (BMVC), 2022
Hanan Gani
Muzammal Naseer
Mohammad Yaqub
ViT
167
62
0
13 Oct 2022
Visual Classification via Description from Large Language Models
International Conference on Learning Representations (ICLR), 2022
Sachit Menon
Carl Vondrick
VLM
342
364
0
13 Oct 2022
One does not fit all! On the Complementarity of Vision Encoders for Vision and Language Tasks
Workshop on Representation Learning for NLP (RepL4NLP), 2022
Gregor Geigle
Chen Cecilia Liu
Jonas Pfeiffer
Iryna Gurevych
VLM
155
1
0
12 Oct 2022
Visual Language Maps for Robot Navigation
IEEE International Conference on Robotics and Automation (ICRA), 2022
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
616
493
0
11 Oct 2022
Understanding Embodied Reference with Touch-Line Transformer
International Conference on Learning Representations (ICLR), 2022
Yongqian Li
Xiaoxue Chen
Hao Zhao
Jiangtao Gong
Guyue Zhou
Federico Rossano
Yixin Zhu
250
20
0
11 Oct 2022
MAP: Multimodal Uncertainty-Aware Vision-Language Pre-training Model
Computer Vision and Pattern Recognition (CVPR), 2022
Yatai Ji
Junjie Wang
Yuan Gong
Lin Zhang
Yan Zhu
Hongfa Wang
Jiaxing Zhang
Tetsuya Sakai
Yujiu Yang
MLLM
199
56
0
11 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
229
26
0
09 Oct 2022
A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
Computer Vision and Pattern Recognition (CVPR), 2022
Aishwarya Kamath
Peter Anderson
Su Wang
Jing Yu Koh
Alexander Ku
Austin Waters
Yinfei Yang
Jason Baldridge
Zarana Parekh
LM&Ro
375
59
0
06 Oct 2022
Video Referring Expression Comprehension via Transformer with Content-aware Query
Ji Jiang
Meng Cao
Tengtao Song
Yuexian Zou
246
5
0
06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
330
37
0
05 Oct 2022
PLOT: Prompt Learning with Optimal Transport for Vision-Language Models
International Conference on Learning Representations (ICLR), 2022
Guangyi Chen
Weiran Yao
Xiangchen Song
Xinyue Li
Yongming Rao
Kun Zhang
VPVLM
VLM
257
74
0
03 Oct 2022
Introducing Vision Transformer for Alzheimer's Disease classification task with 3D input
Zilun Zhang
Farzad Khalvati
MedIm
ViT
96
13
0
03 Oct 2022
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2022
Yanmin Wu
Xinhua Cheng
Renrui Zhang
Zesen Cheng
Jian Zhang
262
106
0
29 Sep 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
166
43
0
28 Sep 2022
Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks
Neural Information Processing Systems (NeurIPS), 2022
Zhiyang Chen
Yousong Zhu
Zhaowen Li
Fan Yang
Wei Li
...
Honghui Dong
Liwei Wu
Rui Zhao
Jinqiao Wang
Ming Tang
VLM
VOS
207
17
0
28 Sep 2022