1809.01816
Visual Coreference Resolution in Visual Dialog using Neural Module Networks
6 September 2018
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach
Papers citing
"Visual Coreference Resolution in Visual Dialog using Neural Module Networks"
50 / 87 papers shown
MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization
Yinhong Liu
Jianfeng He
Hang Su
Ruixue Lian
Yi Nian
Jake W. Vincent
Srikanth Vishnubhotla
Robinson Piramuthu
Saab Mansour
104
0
0
02 Oct 2025
Alignment Helps Make the Most of Multimodal Data
Christian Arnold
Andreas Küpfer
327
2
0
14 May 2024
ReALM: Reference Resolution As Language Modeling
Joel Ruben Antony Moniz
Soundarya Krishnan
Melis Ozyildirim
Prathamesh Saraf
Halim Cagri Ates
Yuan-kang Zhang
Hong-ye Yu
Nidhi Rajshree
263
10
0
29 Mar 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
474
7
0
03 Jan 2024
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
180
3
0
21 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Neural Information Processing Systems (NeurIPS), 2023
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
268
13
0
08 Dec 2023
VD-GR: Boosting Visual Dialog with Cascaded Spatial-Temporal Multi-Modal GRaphs
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
166
7
0
25 Oct 2023
ETHER: Aligning Emergent Communication for Hindsight Experience Replay
Kevin Denamganai
Daniel Hernández
Ozan Vardal
S. Missaoui
James Alfred Walker
223
0
0
28 Jul 2023
MOPA: Modular Object Navigation with PointGoal Agents
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sonia Raychaudhuri
Tommaso Campari
Unnat Jain
Manolis Savva
Angel X. Chang
3DPC
405
13
0
07 Apr 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
170
1
0
28 Feb 2023
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
235
28
0
08 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
203
2
0
02 Dec 2022
Who are you referring to? Coreference resolution in image narrations
IEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
266
5
0
26 Nov 2022
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
ACM Multimedia (ACM MM), 2022
Zihao Wang
Junli Wang
Changjun Jiang
MLLM
179
13
0
23 Nov 2022
McQueen: a Benchmark for Multimodal Conversational Query Rewrite
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yifei Yuan
Chen Shi
Runze Wang
Liyi Chen
Feijun Jiang
Yuan You
W. Lam
113
7
0
23 Oct 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
178
5
0
23 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
310
164
0
07 Sep 2022
Interactive Question Answering Systems: Literature Review
ACM Computing Surveys (ACM CSUR), 2022
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
391
38
0
04 Sep 2022
Neuro-Symbolic Visual Dialog
International Conference on Computational Linguistics (COLING), 2022
Adnen Abdessaied
Mihai Bâce
Andreas Bulling
NAI
185
4
0
22 Aug 2022
Video Dialog as Conversation about Objects Living in Space-Time
European Conference on Computer Vision (ECCV), 2022
H. Pham
T. Le
Vuong Le
Tu Minh Phuong
T. Tran
209
14
0
08 Jul 2022
Adversarial Robustness of Visual Dialog
Lu Yu
Verena Rieser
AAML
186
0
0
06 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
245
2
0
02 Jul 2022
Multimodal Dialogue State Tracking
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hung Le
Nancy F. Chen
Guosheng Lin
148
10
0
16 Jun 2022
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
Pattern Recognition (Pattern Recogn.), 2022
Xintong Yu
Hongming Zhang
Ruixin Hong
Yangqiu Song
Changshui Zhang
180
17
0
29 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Computer Vision and Pattern Recognition (CVPR), 2022
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
278
16
0
25 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
166
32
0
13 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
279
18
0
02 May 2022
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
Computer Vision and Pattern Recognition (CVPR), 2022
Cheng Chen
Yudong Zhu
Zhenshan Tan
Qingrong Cheng
Xin Jiang
Qun Liu
X. Gu
256
43
0
01 May 2022
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Feilong Chen
Xiuyi Chen
Shuang Xu
Bo Xu
VLM
155
19
0
15 Apr 2022
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
Shunyu Zhang
X. Jiang
Zequn Yang
T. Wan
Zengchang Qin
163
14
0
10 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
European Conference on Computer Vision (ECCV), 2022
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
202
18
0
31 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
European Conference on Computer Vision (ECCV), 2022
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
452
34
0
08 Mar 2022
Modeling Coreference Relations in Visual Dialog
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Mingxiao Li
Marie-Francine Moens
119
10
0
06 Mar 2022
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
173
21
0
02 Dec 2021
OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts
Shuhe Wang
Yuxian Meng
Xiaoya Li
Xiaofei Sun
Rongbin Ouyang
Jiwei Li
MLLM
VLM
221
23
0
27 Sep 2021
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation
Feilong Chen
Fandong Meng
Xiuyi Chen
Peng Li
Jie Zhou
180
25
0
17 Sep 2021
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
Feilong Chen
Xiuyi Chen
Fandong Meng
Peng Li
Jie Zhou
264
37
0
17 Sep 2021
Learning to Ground Visual Objects for Visual Dialog
Feilong Chen
Xiuyi Chen
Can Xu
Daxin Jiang
OOD
186
18
0
13 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
138
0
0
10 Sep 2021
Exophoric Pronoun Resolution in Dialogues with Topic Regularization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Xintong Yu
Hongming Zhang
Yangqiu Song
Changshui Zhang
Kun Xu
Dong Yu
143
5
0
10 Sep 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
402
111
0
01 Jul 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
290
21
0
16 Apr 2021
Ensemble of MRR and NDCG models for Visual Dialog
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Idan Schwartz
250
10
0
15 Apr 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
AAAI Conference on Artificial Intelligence (AAAI), 2021
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
202
30
0
24 Mar 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
252
65
0
24 Mar 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
240
74
0
28 Jan 2021
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Leilei Gan
Rui Yan
Jiwei Li
371
31
0
30 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Computer Vision and Pattern Recognition (CVPR), 2020
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
311
71
0
10 Dec 2020
Reasoning Over History: Context Aware Visual Dialog
Muhammad A. Shah
Shikib Mehri
Tejas Srinivasan
157
4
0
02 Nov 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
359
39
0
08 Oct 2020