Visual Coreference Resolution in Visual Dialog using Neural Module Networks

6 September 2018

Devi Parikh

Papers citing "Visual Coreference Resolution in Visual Dialog using Neural Module Networks"

50 / 87 papers shown

MDSEval: A Meta-Evaluation Benchmark for Multimodal Dialogue Summarization

Srikanth Vishnubhotla

Robinson Piramuthu

Saab Mansour

104

02 Oct 2025

Alignment Helps Make the Most of Multimodal Data

Christian Arnold

Andreas Küpfer

327

14 May 2024

ReALM: Reference Resolution As Language Modeling

Joel Ruben Antony Moniz

263

29 Mar 2024

Detours for Navigating Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2024

474

03 Jan 2024

InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

180

21 Dec 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models

Neural Information Processing Systems (NeurIPS), 2023

...

Yejin Choi

268

08 Dec 2023

$\mathbb{VD}$-$\mathbb{GR}$: Boosting $\mathbb{V}$isual $\mathbb{D}$ialog with Cascaded Spatial-Temporal Multi-Modal $\mathbb{GR}$aphs

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

166

25 Oct 2023

ETHER: Aligning Emergent Communication for Hindsight Experience Replay

223

28 Jul 2023

MOPA: Modular Object Navigation with PointGoal Agents

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

405

07 Apr 2023

Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023

Holy Lovenia

Samuel Cahyawijaya

Pascale Fung

170

28 Feb 2023

Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation

Conference on Robot Learning (CoRL), 2022

235

08 Dec 2022

Compound Tokens: Channel Fusion for Vision-Language Representation Learning

Maxwell Mbabilla Aladago

A. Piergiovanni

203

02 Dec 2022

Who are you referring to? Coreference resolution in image narrations

IEEE International Conference on Computer Vision (ICCV), 2022

266

26 Nov 2022

Unified Multimodal Model with Unlikelihood Training for Visual Dialog

ACM Multimedia (ACM MM), 2022

179

23 Nov 2022

McQueen: a Benchmark for Multimodal Conversational Query Rewrite

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

113

23 Oct 2022

Extending Phrase Grounding with Pronouns in Visual Dialogues

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022

Min Zhang

178

23 Oct 2022

Foundations and Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions

ACM Computing Surveys (ACM CSUR), 2022

Paul Pu Liang

Amir Zadeh

Louis-Philippe Morency

310

164

07 Sep 2022

Interactive Question Answering Systems: Literature Review

ACM Computing Surveys (ACM CSUR), 2022

Giovanni Maria Biancofiore

391

04 Sep 2022

Neuro-Symbolic Visual Dialog

International Conference on Computational Linguistics (COLING), 2022

185

22 Aug 2022

Video Dialog as Conversation about Objects Living in Space-Time

European Conference on Computer Vision (ECCV), 2022

209

08 Jul 2022

Adversarial Robustness of Visual Dialog

Lu Yu

Verena Rieser

AAML

186

06 Jul 2022

Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review

245

02 Jul 2022

Multimodal Dialogue State Tracking

North American Chapter of the Association for Computational Linguistics (NAACL), 2022

Hung Le

Nancy F. Chen

Guosheng Lin

148

16 Jun 2022

VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution

Pattern Recognition (Pattern Recogn.), 2022

180

29 May 2022

The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training

Computer Vision and Pattern Recognition (CVPR), 2022

278

25 May 2022

Multimodal Conversational AI: A Survey of Datasets and Approaches

Anirudh S. Sundar

Larry Heck

166

13 May 2022

Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering

279

02 May 2022

UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog

Computer Vision and Pattern Recognition (CVPR), 2022

Xin Jiang

Qun Liu

X. Gu

256

01 May 2022

Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022

Bo Xu

155

15 Apr 2022

Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog

163

10 Apr 2022

FindIt: Generalized Localization with Natural Language Queries

European Conference on Computer Vision (ECCV), 2022

202

31 Mar 2022

AssistQ: Affordance-centric Question-driven Task Completion for Egocentric Assistant

European Conference on Computer Vision (ECCV), 2022

452

08 Mar 2022

Modeling Coreference Relations in Visual Dialog

Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022

Mingxiao Li

Marie-Francine Moens

119

06 Mar 2022

Controllable Video Captioning with an Exemplar Sentence

173

02 Dec 2021

OpenViDial 2.0: A Larger-Scale, Open-Domain Dialogue Generation Dataset with Visual Contexts

Jiwei Li

221

27 Sep 2021

Multimodal Incremental Transformer with Visual Grounding for Visual Dialogue Generation

Feilong Chen

Fandong Meng

Xiuyi Chen

Peng Li

Jie Zhou

180

17 Sep 2021

GoG: Relation-aware Graph-over-Graph Network for Visual Dialog

Feilong Chen

Xiuyi Chen

Fandong Meng

Peng Li

Jie Zhou

264

17 Sep 2021

Learning to Ground Visual Objects for Visual Dialog

186

13 Sep 2021

We went to look for meaning and all we got were these lousy representations: aspects of meaning representation for computational semantics

138

10 Sep 2021

Exophoric Pronoun Resolution in Dialogues with Topic Regularization

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021

Kun Xu

Dong Yu

143

10 Sep 2021

Productivity, Portability, Performance: Data-Centric Python

402

111

01 Jul 2021

VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Hung Le

Nancy F. Chen

Guosheng Lin

MLLM

290

16 Apr 2021

Ensemble of MRR and NDCG models for Visual Dialog

North American Chapter of the Association for Computational Linguistics (NAACL), 2021

Idan Schwartz

250

15 Apr 2021

Structured Co-reference Graph Attention for Video-grounded Dialogue

AAAI Conference on Artificial Intelligence (AAAI), 2021

202

24 Mar 2021

Relation-aware Instance Refinement for Weakly Supervised Visual Grounding

Computer Vision and Pattern Recognition (CVPR), 2021

252

24 Mar 2021

VX2TEXT: End-to-End Learning of Video-Based Text Generation From Multimodal Inputs

Computer Vision and Pattern Recognition (CVPR), 2021

Gedas Bertasius

Devi Parikh

240

28 Jan 2021

OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts

Rui Yan

Jiwei Li

371

30 Dec 2020

Look Before you Speak: Visually Contextualized Utterances

Computer Vision and Pattern Recognition (CVPR), 2020

Paul Hongsuck Seo

Arsha Nagrani

Cordelia Schmid

311

10 Dec 2020

Reasoning Over History: Context Aware Visual Dialog

Muhammad A. Shah

Shikib Mehri

Tejas Srinivasan

157

02 Nov 2020

Dense Relational Image Captioning via Multi-task Triple-Stream Networks

Dong-Jin Kim

Tae-Hyun Oh

Jinsoo Choi

In So Kweon

359

08 Oct 2020