Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1801.08186
Cited By
MAttNet: Modular Attention Network for Referring Expression Comprehension
24 January 2018
Licheng Yu
Zhe-nan Lin
Xiaohui Shen
Jimei Yang
Xin Lu
Mohit Bansal
Tamara L. Berg
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MAttNet: Modular Attention Network for Referring Expression Comprehension"
50 / 168 papers shown
Title
Referring Multi-Object Tracking
Dongming Wu
Wencheng Han
Tiancai Wang
Xingping Dong
Xiangyu Zhang
Jianbing Shen
34
71
0
06 Mar 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
160
215
0
03 Mar 2023
Focusing On Targets For Improving Weakly Supervised Visual Grounding
V. Pham
Nao Mishima
ObjD
21
1
0
22 Feb 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
21
2
0
17 Feb 2023
Semantic Image Segmentation: Two Decades of Research
G. Csurka
Riccardo Volpi
Boris Chidlovskii
3DV
29
50
0
13 Feb 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
Haowei Wang
Jiayi Ji
Yiyi Zhou
Yongjian Wu
Xiaoshuai Sun
30
15
0
09 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
24
4
0
05 Jan 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Jianzong Wu
Xiangtai Li
Henghui Ding
Xia Li
Guangliang Cheng
Yu Tong
Chen Change Loy
VLM
85
31
0
02 Jan 2023
ScanEnts3D: Exploiting Phrase-to-3D-Object Correspondences for Improved Visio-Linguistic Models in 3D Scenes
Ahmed Abdelreheem
Kyle Olszewski
Hsin-Ying Lee
Peter Wonka
Panos Achlioptas
3DPC
22
28
0
12 Dec 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
42
24
0
28 Nov 2022
Who are you referring to? Coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
17
2
0
26 Nov 2022
Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding
Eslam Mohamed Bakr
Yasmeen Alsaedy
Mohamed Elhoseiny
3DPC
21
41
0
25 Nov 2022
Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
R. Burgert
Kanchana Ranasinghe
Xiang Li
Michael S. Ryoo
DiffM
VLM
34
37
0
23 Nov 2022
Cross-Modal Adapter for Text-Video Retrieval
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Jie Zhou
S. Song
Gao Huang
45
36
0
17 Nov 2022
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
27
5
0
15 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
RSVG: Exploring Data and Models for Visual Grounding on Remote Sensing Data
Yangfan Zhan
Zhitong Xiong
Yuan. Yuan
71
106
0
23 Oct 2022
MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao
Longteng Guo
Xingjian He
Shuai Shao
Zehuan Yuan
Jing Liu
19
8
0
09 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
40
10
0
04 Oct 2022
MUG: Interactive Multimodal Grounding on User Interfaces
Tao Li
Gang Li
Jingjie Zheng
Purple Wang
Yang Li
LLMAG
33
8
0
29 Sep 2022
Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
Fengyuan Shi
Ruopeng Gao
Weilin Huang
Limin Wang
19
23
0
28 Sep 2022
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
31
11
0
19 Aug 2022
PPMN: Pixel-Phrase Matching Network for One-Stage Panoptic Narrative Grounding
Zihan Ding
Zixiang Ding
Tianrui Hui
Junshi Huang
Xiaoming Wei
Xiaolin K. Wei
Si Liu
12
12
0
11 Aug 2022
RefCrowd: Grounding the Target in Crowd with Referring Expressions
Heqian Qiu
Hongliang Li
Taijin Zhao
Lanxiao Wang
Qingbo Wu
Fanman Meng
ObjD
21
6
0
16 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
24
124
0
15 Jun 2022
Referring Image Matting
Jizhizi Li
Jing Zhang
Dacheng Tao
ObjD
VLM
23
22
0
10 Jun 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
26
38
0
23 May 2022
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
19
21
0
10 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Improving Visual Grounding with Visual-Linguistic Verification and Iterative Reasoning
Li Yang
Yan Xu
Chunfen Yuan
Wei Liu
Bing Li
Weiming Hu
ObjD
34
113
0
30 Apr 2022
Instance-Specific Feature Propagation for Referring Segmentation
Chang Liu
Xudong Jiang
Henghui Ding
ISeg
17
55
0
26 Apr 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Sanjay Subramanian
William Merrill
Trevor Darrell
Matt Gardner
Sameer Singh
Anna Rohrbach
ObjD
21
125
0
12 Apr 2022
Position-aware Location Regression Network for Temporal Video Grounding
Sunoh Kim
Kimin Yun
J. Choi
22
4
0
12 Apr 2022
Multi-View Transformer for 3D Visual Grounding
Shijia Huang
Yilun Chen
Jiaya Jia
Liwei Wang
20
112
0
05 Apr 2022
To Find Waldo You Need Contextual Cues: Debiasing Who's Waldo
Yiran Luo
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
19
4
0
30 Mar 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
28
94
0
30 Mar 2022
Shifting More Attention to Visual Backbone: Query-modulated Refinement Networks for End-to-End Visual Grounding
Jiabo Ye
Junfeng Tian
Ming Yan
Xiaoshan Yang
Xuwu Wang
Ji Zhang
Liang He
Xin Lin
ObjD
11
61
0
29 Mar 2022
Text2Pos: Text-to-Point-Cloud Cross-Modal Localization
Manuel Kolmet
Qunjie Zhou
Aljosa Osep
Laura Leal-Taixe
19
22
0
28 Mar 2022
Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language Structures via Dependency Relationships
Chao Lou
Wenjuan Han
Yuh-Chen Lin
Zilong Zheng
CoGe
23
10
0
27 Mar 2022
Emergence of hierarchical reference systems in multi-agent communication
Xenia Ohmer
M. Duda
Elia Bruni
22
8
0
24 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
37
50
0
16 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Yang Yang
Xibai Lou
Changhyun Choi
21
30
0
15 Mar 2022
Grounding Commands for Autonomous Vehicles via Layer Fusion with Region-specific Dynamic Layer Attention
Hou Pong Chan
M. Guo
Chengguang Xu
22
4
0
14 Mar 2022
Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Liangsheng Lu
Wei Zhai
Hongcheng Luo
Yu Kang
Yang Cao
21
19
0
24 Feb 2022
CAISE: Conversational Agent for Image Search and Editing
Hyounghun Kim
Doo Soon Kim
Seunghyun Yoon
Franck Dernoncourt
Trung Bui
Mohit Bansal
19
6
0
24 Feb 2022
Multi-Modal Knowledge Graph Construction and Application: A Survey
Xiangru Zhu
Zhixu Li
Xiaodan Wang
Xueyao Jiang
Penglei Sun
Xuwu Wang
Yanghua Xiao
N. Yuan
28
154
0
11 Feb 2022
ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
Hengcan Shi
Munawar Hayat
Yicheng Wu
Jianfei Cai
VLM
30
60
0
18 Jan 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
18
10
0
18 Jan 2022
Repurposing Existing Deep Networks for Caption and Aesthetic-Guided Image Cropping
Nora Horanyi
Kedi Xia
K. M. Yi
Abhishake Kumar Bojja
A. Leonardis
H. Chang
28
12
0
07 Jan 2022
Incremental Object Grounding Using Scene Graphs
J. Yi
Yoonwoo Kim
Sonia Chernova
LM&Ro
28
9
0
06 Jan 2022
Previous
1
2
3
4
Next