Visual Coreference Resolution in Visual Dialog using Neural Module Networks

6 September 2018
Satwik Kottur
José M. F. Moura
Devi Parikh
Dhruv Batra
Marcus Rohrbach

Papers citing "Visual Coreference Resolution in Visual Dialog using Neural Module Networks"

50 / 87 papers shown
MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization
Yinhong Liu
Jianfeng He
Hang Su
Ruixue Lian
Yi Nian
Jake W. Vincent
Srikanth Vishnubhotla
Robinson Piramuthu
Saab Mansour
104
0
0
02 Oct 2025
Alignment Helps Make the Most of Multimodal Data
Christian Arnold
Andreas Küpfer
327
2
0
14 May 2024
ReALM: Reference Resolution As Language Modeling
Joel Ruben Antony Moniz
Soundarya Krishnan
Melis Ozyildirim
Prathamesh Saraf
Halim Cagri Ates
Yuan-kang Zhang
Hong-ye Yu
Nidhi Rajshree
263
10
0
29 Mar 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
474
7
0
03 Jan 2024
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
180
3
0
21 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
Neural Information Processing Systems (NeurIPS), 2023
Jinho Park
Jack Hessel
Khyathi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
268
13
0
08 Dec 2023
$\mathbb{VD}$-$\mathbb{GR}$: Boosting $\mathbb{V}$isual $\mathbb{D}$ialog with Cascaded Spatial-Temporal Multi-Modal $\mathbb{GR}$aphs
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
166
7
0
25 Oct 2023
ETHER: Aligning Emergent Communication for Hindsight Experience Replay
Kevin Denamganai
Daniel Hernández
Ozan Vardal
S. Missaoui
James Alfred Walker
223
0
0
28 Jul 2023
MOPA: Modular Object Navigation with PointGoal Agents
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Sonia Raychaudhuri
Tommaso Campari
Unnat Jain
Manolis Savva
Angel X. Chang
3DPC
405
13
0
07 Apr 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
170
1
0
28 Feb 2023
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
235
28
0
08 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
203
2
0
02 Dec 2022
Who are you referring to? Coreference resolution in image narrations
IEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
266
5
0
26 Nov 2022
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
ACM Multimedia (ACM MM), 2022
Zihao Wang
Junli Wang
Changjun Jiang
MLLM
179
13
0
23 Nov 2022
McQueen: a Benchmark for Multimodal Conversational Query Rewrite
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yifei Yuan
Chen Shi
Runze Wang
Liyi Chen
Feijun Jiang
Yuan You
W. Lam
113
7
0
23 Oct 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
178
5
0
23 Oct 2022
Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
ACM Computing Surveys (ACM CSUR), 2022
Paul Pu Liang
Amir Zadeh
Louis-Philippe Morency
310
164
0
07 Sep 2022
Interactive Question Answering Systems: Literature Review
ACM Computing Surveys (ACM CSUR), 2022
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
391
38
0
04 Sep 2022
Neuro-Symbolic Visual Dialog
International Conference on Computational Linguistics (COLING), 2022
Adnen Abdessaied
Mihai Bâce
Andreas Bulling
NAI
185
4
0
22 Aug 2022
Video Dialog as Conversation about Objects Living in Space-Time
European Conference on Computer Vision (ECCV), 2022
H. Pham
T. Le
Vuong Le
Tu Minh Phuong
T. Tran
209
14
0
08 Jul 2022
Adversarial Robustness of Visual Dialog
Lu Yu
Verena Rieser
AAML
186
0
0
06 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
245
2
0
02 Jul 2022
Multimodal Dialogue State Tracking
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hung Le
Nancy F. Chen
Guosheng Lin
148
10
0
16 Jun 2022
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
Pattern Recognition (Pattern Recogn.), 2022
Xintong Yu
Hongming Zhang
Ruixin Hong
Yangqiu Song
Changshui Zhang
180
17
0
29 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Computer Vision and Pattern Recognition (CVPR), 2022
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
278
16
0
25 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
166
32
0
13 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
279
18
0
02 May 2022
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
Computer Vision and Pattern Recognition (CVPR), 2022
Cheng Chen
Yudong Zhu
Zhenshan Tan
Qingrong Cheng
Xin Jiang
Qun Liu
X. Gu
256
43
0
01 May 2022
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Feilong Chen
Xiuyi Chen
Shuang Xu
Bo Xu
VLM
155
19
0
15 Apr 2022
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
Shunyu Zhang
X. Jiang
Zequn Yang
T. Wan
Zengchang Qin
163
14
0
10 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
European Conference on Computer Vision (ECCV), 2022
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
202
18
0
31 Mar 2022
AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant
European Conference on Computer Vision (ECCV), 2022
B. Wong
Joya Chen
You Wu
Stan Weixian Lei
Dongxing Mao
Difei Gao
Mike Zheng Shou
EgoV
452
34
0
08 Mar 2022
Modeling Coreference Relations in Visual Dialog
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Mingxiao Li
Marie-Francine Moens
119
10
0
06 Mar 2022
Controllable Video Captioning with an Exemplar Sentence
Yitian Yuan
Lin Ma
Jingwen Wang
Wenwu Zhu
173
21
0
02 Dec 2021
OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts
Shuhe Wang
Yuxian Meng
Xiaoya Li
Xiaofei Sun
Rongbin Ouyang
Jiwei Li
MLLM
VLM
221
23
0
27 Sep 2021
Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation
Feilong Chen
Fandong Meng
Xiuyi Chen
Peng Li
Jie Zhou
180
25
0
17 Sep 2021
GoG: Relation-aware Graph-over-Graph Network for Visual Dialog
Feilong Chen
Xiuyi Chen
Fandong Meng
Peng Li
Jie Zhou
264
37
0
17 Sep 2021
Learning to Ground Visual Objects for Visual Dialog
Feilong Chen
Xiuyi Chen
Can Xu
Daxin Jiang
OOD
186
18
0
13 Sep 2021
We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics
Simon Dobnik
R. Cooper
Adam Ek
Bill Noble
Staffan Larsson
N. Ilinykh
Vladislav Maraev
Vidya Somashekarappa
138
0
0
10 Sep 2021
Exophoric Pronoun Resolution in Dialogues with Topic Regularization
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021
Xintong Yu
Hongming Zhang
Yangqiu Song
Changshui Zhang
Kun Xu
Dong Yu
143
5
0
10 Sep 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
402
111
0
01 Jul 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
290
21
0
16 Apr 2021
Ensemble of MRR and NDCG models for Visual Dialog
North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Idan Schwartz
250
10
0
15 Apr 2021
Structured Co-reference Graph Attention for Video-grounded Dialogue
AAAI Conference on Artificial Intelligence (AAAI), 2021
Junyeong Kim
Sunjae Yoon
Dahyun Kim
Chang D. Yoo
202
30
0
24 Mar 2021
Relation-aware Instance Refinement for Weakly Supervised Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2021
Yongfei Liu
Bo Wan
Lin Ma
Xuming He
ObjD
252
65
0
24 Mar 2021
VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs
Computer Vision and Pattern Recognition (CVPR), 2021
Xudong Lin
Gedas Bertasius
Jue Wang
Shih-Fu Chang
Devi Parikh
Lorenzo Torresani
VGen
240
74
0
28 Jan 2021
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Leilei Gan
Rui Yan
Jiwei Li
371
31
0
30 Dec 2020
Look Before you Speak: Visually Contextualized Utterances
Computer Vision and Pattern Recognition (CVPR), 2020
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
311
71
0
10 Dec 2020
Reasoning Over History: Context Aware Visual Dialog
Muhammad A. Shah
Shikib Mehri
Tejas Srinivasan
157
4
0
02 Nov 2020
Dense Relational Image Captioning via Multi-task Triple-Stream Networks
Dong-Jin Kim
Tae-Hyun Oh
Jinsoo Choi
In So Kweon
359
39
0
08 Oct 2020