Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1612.09542
Cited By
A Joint Speaker-Listener-Reinforcer Model for Referring Expressions
30 December 2016
Licheng Yu
Hao Tan
Mohit Bansal
Tamara L. Berg
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Joint Speaker-Listener-Reinforcer Model for Referring Expressions"
46 / 46 papers shown
Title
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
159
2
0
14 Jan 2025
Multi-Modal Mutual Attention and Iterative Interaction for Referring Image Segmentation
Chang Liu
Henghui Ding
Yulun Zhang
Xudong Jiang
24
47
0
24 May 2023
CK-Transformer: Commonsense Knowledge Enhanced Transformers for Referring Expression Comprehension
Zhi Zhang
H. Yannakoudakis
Xiantong Zhen
Ekaterina Shutova
21
2
0
17 Feb 2023
Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network
Haowei Wang
Jiayi Ji
Yiyi Zhou
Yongjian Wu
Xiaoshuai Sun
27
15
0
09 Jan 2023
A Unified Mutual Supervision Framework for Referring Expression Segmentation and Generation
Shijia Huang
Feng Li
Hao Zhang
Siyi Liu
Lei Zhang
Liwei Wang
24
5
0
15 Nov 2022
YORO -- Lightweight End to End Visual Grounding
Chih-Hui Ho
Srikar Appalaraju
Bhavan A. Jasani
R. Manmatha
Nuno Vasconcelos
ObjD
21
21
0
15 Nov 2022
Instance-Specific Feature Propagation for Referring Segmentation
Chang Liu
Xudong Jiang
Henghui Ding
ISeg
17
55
0
26 Apr 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
28
94
0
30 Mar 2022
Local-Global Context Aware Transformer for Language-Guided Video Segmentation
Chen Liang
Wenguan Wang
Tianfei Zhou
Jiaxu Miao
Yawei Luo
Yi Yang
VOS
26
74
0
18 Mar 2022
Phrase-Based Affordance Detection via Cyclic Bilateral Interaction
Liangsheng Lu
Wei Zhai
Hongcheng Luo
Yu Kang
Yang Cao
21
19
0
24 Feb 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
18
10
0
18 Jan 2022
D3Net: A Unified Speaker-Listener Architecture for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Qirui Wu
Matthias Nießner
Angel X. Chang
19
29
0
02 Dec 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
28
15
0
08 Jun 2021
Encoder Fusion Network with Co-Attention Embedding for Referring Image Segmentation
Guang Feng
Zhiwei Hu
Lihe Zhang
Huchuan Lu
EgoV
19
168
0
05 May 2021
Co-Grounding Networks with Semantic Attention for Referring Expression Comprehension in Videos
Sijie Song
Xudong Lin
Jiaying Liu
Zongming Guo
Shih-Fu Chang
ObjD
19
16
0
23 Mar 2021
Refer-it-in-RGBD: A Bottom-up Approach for 3D Visual Grounding in RGBD Images
Haolin Liu
Anran Lin
Xiaoguang Han
Lei Yang
Yizhou Yu
Shuguang Cui
24
39
0
14 Mar 2021
Refer, Reuse, Reduce: Generating Subsequent References in Visual and Conversational Contexts
Ece Takmaz
Mario Giulianelli
Sandro Pezzelle
Arabella J. Sinclair
Raquel Fernández
15
26
0
09 Nov 2020
RefVOS: A Closer Look at Referring Expressions for Video Object Segmentation
Míriam Bellver
Carles Ventura
Carina Silberer
Ioannis V. Kazakos
Jordi Torres
Xavier Giró-i-Nieto
VOS
21
32
0
01 Oct 2020
Referring Expression Comprehension: A Survey of Methods and Datasets
Yanyuan Qiao
Chaorui Deng
Qi Wu
ObjD
47
93
0
19 Jul 2020
Give Me Something to Eat: Referring Expression Comprehension with Commonsense Knowledge
Peng Wang
Dongyang Liu
Hui Li
Qi Wu
ObjD
22
19
0
02 Jun 2020
Dynamic Language Binding in Relational Visual Reasoning
T. Le
Vuong Le
Svetha Venkatesh
T. Tran
NAI
23
19
0
30 Apr 2020
Giving Commands to a Self-driving Car: A Multimodal Reasoner for Visual Grounding
Thierry Deruyttere
Guillem Collell
Marie-Francine Moens
LRM
11
8
0
19 Mar 2020
A Multimodal Target-Source Classifier with Attention Branches to Understand Ambiguous Instructions for Fetching Daily Objects
A. Magassouba
K. Sugiura
Hisashi Kawai
38
9
0
23 Dec 2019
Grounding-Tracking-Integration
Zhengyuan Yang
T. Kumar
Tianlang Chen
Jinsong Su
Jiebo Luo
22
53
0
13 Dec 2019
A Real-time Global Inference Network for One-stage Referring Expression Comprehension
Yiyi Zhou
Rongrong Ji
Gen Luo
Xiaoshuai Sun
Jinsong Su
Xinghao Ding
Chia-Wen Lin
Q. Tian
ObjD
24
60
0
07 Dec 2019
A Fast and Accurate One-Stage Approach to Visual Grounding
Zhengyuan Yang
Boqing Gong
Liwei Wang
Wenbing Huang
Dong Yu
Jiebo Luo
ObjD
12
360
0
18 Aug 2019
Variational Context: Exploiting Visual and Textual Context for Grounding Referring Expressions
Yulei Niu
Hanwang Zhang
Zhiwu Lu
Shih-Fu Chang
ObjD
BDL
30
24
0
08 Jul 2019
Language-Conditioned Graph Networks for Relational Reasoning
Ronghang Hu
Anna Rohrbach
Trevor Darrell
Kate Saenko
23
171
0
10 May 2019
ShapeGlot: Learning Language for Shape Differentiation
Panos Achlioptas
Judy Fan
Robert D. Hawkins
Noah D. Goodman
Leonidas J. Guibas
28
82
0
08 May 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
31
227
0
25 Apr 2019
Learning to Generate Unambiguous Spatial Referring Expressions for Real-World Environments
Fethiye Irmak Dogan
Sinan Kalkan
Iolanda Leite
18
19
0
15 Apr 2019
Improving Referring Expression Grounding with Cross-modal Attention-guided Erasing
Xihui Liu
Zihao W. Wang
Jing Shao
Xiaogang Wang
Hongsheng Li
ObjD
19
180
0
03 Mar 2019
Neighbourhood Watch: Referring Expression Comprehension via Language-guided Graph Attention Networks
Peng Wang
Qi Wu
Jiewei Cao
Chunhua Shen
Lianli Gao
A. Hengel
ObjD
22
252
0
12 Dec 2018
Counterfactual Critic Multi-Agent Training for Scene Graph Generation
Long Chen
Hanwang Zhang
Jun Xiao
Xiangnan He
Shiliang Pu
Shih-Fu Chang
14
159
0
06 Dec 2018
Learning to Explain with Complemental Examples
Atsushi Kanehira
Tatsuya Harada
10
40
0
04 Dec 2018
Adversarial Over-Sensitivity and Over-Stability Strategies for Dialogue Models
Tong Niu
Mohit Bansal
AAML
21
85
0
06 Sep 2018
TVQA: Localized, Compositional Video Question Answering
Muhammad Abdul Wahab
Licheng Yu
Mounir Nasr Allah
Tamara L. Berg
25
616
0
05 Sep 2018
Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding
Zhou Yu
Jun-chen Yu
Chenchao Xiang
Zhou Zhao
Q. Tian
Dacheng Tao
ObjD
18
138
0
09 May 2018
Referring Relationships
Ranjay Krishna
Ines Chami
Michael S. Bernstein
Li Fei-Fei
22
94
0
28 Mar 2018
Discriminability objective for training descriptive captions
Ruotian Luo
Brian L. Price
Scott D. Cohen
Gregory Shakhnarovich
19
202
0
12 Mar 2018
Grounding Referring Expressions in Images by Variational Context
Hanwang Zhang
Yulei Niu
Shih-Fu Chang
BDL
ObjD
13
219
0
05 Dec 2017
Reasoning about Fine-grained Attribute Phrases using Reference Games
Jong-Chyi Su
Chenyun Wu
Huaizu Jiang
Subhransu Maji
32
16
0
29 Aug 2017
Translating Neuralese
Jacob Andreas
Anca Dragan
Dan Klein
21
58
0
23 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
Liwei Wang
Yin Li
Jing-ling Huang
Svetlana Lazebnik
VLM
27
494
0
11 Apr 2017
An End-to-End Approach to Natural Language Object Retrieval via Context-Aware Deep Reinforcement Learning
Fan Wu
Zhongwen Xu
Yi Yang
ObjD
26
11
0
22 Mar 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
Akira Fukui
Dong Huk Park
Daylen Yang
Anna Rohrbach
Trevor Darrell
Marcus Rohrbach
152
1,465
0
06 Jun 2016
1