2104.12763
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
IEEE International Conference on Computer Vision (ICCV), 2021
26 April 2021
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
Github (1008★)
Papers citing
"MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding"
21 / 671 papers shown
Title
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
130
60
0
19 Nov 2021
Multi-Grained Vision Language Pre-Training: Aligning Texts with Visual Concepts
International Conference on Machine Learning (ICML), 2021
Yan Zeng
Xinsong Zhang
Hang Li
VLM
CLIP
264
347
0
16 Nov 2021
A Survey of Visual Transformers
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2021
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Peng Wang
Jianping Fan
Zhiqiang He
3DGS
ViT
358
447
0
11 Nov 2021
An Empirical Study of Training End-to-End Vision-and-Language Transformers
Computer Vision and Pattern Recognition (CVPR), 2021
Zi-Yi Dou
Yichong Xu
Zhe Gan
Jianfeng Wang
Shuohang Wang
...
Pengchuan Zhang
Lu Yuan
Nanyun Peng
Zicheng Liu
Michael Zeng
VLM
221
425
0
03 Nov 2021
Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Dirk Vath
Pascal Tilli
Ngoc Thang Vu
125
4
0
11 Oct 2021
CLIPort: What and Where Pathways for Robotic Manipulation
Conference on Robot Learning (CoRL), 2021
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
266
791
0
24 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
317
76
0
13 Sep 2021
Negative Sample Matters: A Renaissance of Metric Learning for Temporal Grounding
AAAI Conference on Artificial Intelligence (AAAI), 2021
Zhenzhi Wang
Limin Wang
Tao Wu
Tianhao Li
Gangshan Wu
AI4TS
250
150
0
10 Sep 2021
Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Stella Frank
Emanuele Bugliarello
Desmond Elliott
137
90
0
09 Sep 2021
TxT: Crossmodal End-to-End Learning with Transformers
German Conference on Pattern Recognition (DAGM), 2021
Jan-Martin O. Steitz
Jonas Pfeiffer
Iryna Gurevych
Stefan Roth
LRM
89
2
0
09 Sep 2021
SORNet: Spatial Object-Centric Representations for Sequential Manipulation
Conference on Robot Learning (CoRL), 2021
Wentao Yuan
Chris Paxton
Karthik Desingh
Dieter Fox
3DPC
433
76
0
08 Sep 2021
INVIGORATE: Interactive Visual Grounding and Grasping in Clutter
Hanbo Zhang
Yunfan Lu
Cunjun Yu
David Hsu
Xuguang Lan
Nanning Zheng
LM&Ro
183
70
0
25 Aug 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei
Tamara L. Berg
Joey Tianyi Zhou
ViT
279
86
0
20 Jul 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
Conference on Robot Learning (CoRL), 2021
Junha Roh
Karthik Desingh
Ali Farhadi
Dieter Fox
210
110
0
07 Jul 2021
Augmented 2D-TAN: A Two-stage Approach for Human-centric Spatio-Temporal Video Grounding
Chaolei Tan
Zihang Lin
Jianfang Hu
Xiang Li
Weishi Zheng
172
12
0
20 Jun 2021
How Modular Should Neural Module Networks Be for Systematic Generalization?
Vanessa D’Amario
Tomotake Sasaki
Xavier Boix
137
18
0
15 Jun 2021
Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization
Ludan Ruan
Jieting Chen
Yuqing Song
Shizhe Chen
Qin Jin
80
0
0
11 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Neural Information Processing Systems (NeurIPS), 2021
Muchen Li
Leonid Sigal
ObjD
220
236
0
06 Jun 2021
VL-NMS: Breaking Proposal Bottlenecks in Two-Stage Visual-Language Matching
Chenchi Zhang
Wenbo Ma
Jun Xiao
Hanwang Zhang
Jian Shao
Yueting Zhuang
Long Chen
193
5
0
12 May 2021
Towards General Purpose Vision Systems
Computer Vision and Pattern Recognition (CVPR), 2021
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
222
55
0
01 Apr 2021
Transformers in Vision: A Survey
ACM Computing Surveys (CSUR), 2021
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
ViT
773
3,041
0
04 Jan 2021