Visual Dialog

26 November 2016
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra

Papers citing "Visual Dialog"

50 / 597 papers shown
VLC-BERT: Visual Question Answering with Contextualized Commonsense Knowledge
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Sahithya Ravi
Aditya Chinchure
Leonid Sigal
Renjie Liao
Vered Shwartz
150
44
0
24 Oct 2022
Towards Unifying Reference Expression Generation and Comprehension
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Duo Zheng
Tao Kong
Ya Jing
Jiaan Wang
Xiaojie Wang
ObjD
177
9
0
24 Oct 2022
McQueen: a Benchmark for Multimodal Conversational Query Rewrite
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yifei Yuan
Chen Shi
Runze Wang
Liyi Chen
Feijun Jiang
Yuan You
W. Lam
116
7
0
23 Oct 2022
Extending Phrase Grounding with Pronouns in Visual Dialogues
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Panzhong Lu
Xin Zhang
Meishan Zhang
Min Zhang
ObjD
193
5
0
23 Oct 2022
Learning Point-Language Hierarchical Alignment for 3D Visual Grounding
Jiaming Chen
Weihua Luo
Ran Song
Xiaolin K. Wei
Lin Ma
Wei Emma Zhang
3DV
321
7
0
22 Oct 2022
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
216
24
0
21 Oct 2022
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
European Conference on Computer Vision (ECCV), 2022
Sunjae Yoon
Jiajing Hong
Eunseop Yoon
Dahyun Kim
Junyeong Kim
Hee Suk Yoon
Changdong Yoo
417
26
0
17 Oct 2022
MAPL: Parameter-Efficient Adaptation of Unimodal Pre-Trained Models for Vision-Language Few-Shot Prompting
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2022
Oscar Manas
Pau Rodríguez López
Saba Ahmadi
Aida Nematzadeh
Yash Goyal
Aishwarya Agrawal
VLM, VPVLM
261
58
0
13 Oct 2022
Embodied Referring Expression for Manipulation Question Answering in Interactive Environment
IEEE International Conference on Robotics and Automation (ICRA), 2022
Qie Sima
Sinan Tan
Huaping Liu
LM&Ro
157
8
0
06 Oct 2022
Vision+X: A Survey on Multimodal Learning in the Light of Data
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Ye Zhu
Yuehua Wu
Andrii Zadaianchuk
Yan Yan
361
39
0
05 Oct 2022
Learning to Collocate Visual-Linguistic Neural Modules for Image Captioning
International Journal of Computer Vision (IJCV), 2022
Xu Yang
Hanwang Zhang
Chongyang Gao
Jianfei Cai
MLLM
273
10
0
04 Oct 2022
Towards Explainable 3D Grounded Visual Question Answering: A New Benchmark and Strong Baseline
Lichen Zhao
Daigang Cai
Jing Zhang
Lu Sheng
Dong Xu
Ruizhi Zheng
Yinjie Zhao
Lipeng Wang
Xibo Fan
184
41
0
24 Sep 2022
I2DFormer: Learning Image to Document Attention for Zero-Shot Image Classification
Neural Information Processing Systems (NeurIPS), 2022
Muhammad Ferjad Naeem
Yongqin Xian
Luc Van Gool
F. Tombari
VLM
198
54
0
21 Sep 2022
Selecting Stickers in Open-Domain Dialogue through Multitask Learning
Findings (Findings), 2022
Zhexin Zhang
Yeshuang Zhu
Zhengcong Fei
Jinchao Zhang
Jie Zhou
145
6
0
16 Sep 2022
LAVIS: A Library for Language-Vision Intelligence
Dongxu Li
Junnan Li
Hung Le
Guangsen Wang
Silvio Savarese
Guosheng Lin
VLM
334
63
0
15 Sep 2022
Interactive Question Answering Systems: Literature Review
ACM Computing Surveys (ACM CSUR), 2022
Giovanni Maria Biancofiore
Yashar Deldjoo
Tommaso Di Noia
E. Sciascio
Fedelucio Narducci
403
39
0
04 Sep 2022
Neuro-Symbolic Visual Dialog
International Conference on Computational Linguistics (COLING), 2022
Adnen Abdessaied
Mihai Bâce
Andreas Bulling
NAI
193
4
0
22 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
European Conference on Computer Vision (ECCV), 2022
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
236
21
0
01 Aug 2022
Cross-Modal Causal Relational Reasoning for Event-Level Visual Question Answering
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Yang Liu
Guanbin Li
Liang Lin
LRM
572
148
0
26 Jul 2022
Explicit Image Caption Editing
European Conference on Computer Vision (ECCV), 2022
Zhen Wang
Long Chen
Wenbo Ma
G. Han
Yulei Niu
Jian Shao
Jun Xiao
183
14
0
20 Jul 2022
Deep Sequence Models for Text Classification Tasks
S. S. Abdullahi
Su Yiming
Shamsuddeen Hassan Muhammad
A. Mustapha
Ahmad Muhammad Aminu
Abdulkadir Abdullahi
Musa Bello
Saminu Mohammad Aliyu
128
4
0
18 Jul 2022
Scene Graph for Embodied Exploration in Cluttered Scenario
Yuhong Deng
Qie Sima
Di Guo
Huaping Liu
Yi Wang
Gang Hua
LM&Ro
287
2
0
16 Jul 2022
Modeling Non-Cooperative Dialogue: Theoretical and Empirical Insights
Transactions of the Association for Computational Linguistics (TACL), 2022
Anthony Sicilia
Tristan D. Maidment
Pat Healy
Malihe Alikhani
153
4
0
15 Jul 2022
Video Dialog as Conversation about Objects Living in Space-Time
European Conference on Computer Vision (ECCV), 2022
H. Pham
T. Le
Vuong Le
Tu Minh Phuong
T. Tran
213
14
0
08 Jul 2022
Adversarial Robustness of Visual Dialog
Lu Yu
Verena Rieser
AAML
192
0
0
06 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
245
2
0
02 Jul 2022
Technical Report for CVPR 2022 LOVEU AQTC Challenge
Hyeonyu Kim
Jongeun Kim
Jeonghun Kang
S. Park
Dongchan Park
Taehwan Kim
93
0
0
29 Jun 2022
Winning the CVPR'2022 AQTC Challenge: A Two-stage Function-centric Approach
Shiwei Wu
Weidong He
Tong Xu
Hao Wang
Enhong Chen
EgoV
235
3
0
20 Jun 2022
Multimodal Dialogue State Tracking
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Hung Le
Nancy F. Chen
Guosheng Lin
158
10
0
16 Jun 2022
Multimodal Learning with Transformers: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Peng Xu
Xiatian Zhu
David Clifton
ViT
567
846
0
13 Jun 2022
VD-PCR: Improving Visual Dialog with Pronoun Coreference Resolution
Pattern Recognition (Pattern Recogn.), 2022
Xintong Yu
Hongming Zhang
Ruixin Hong
Yangqiu Song
Changshui Zhang
184
17
0
29 May 2022
Prompt-based Learning for Unpaired Image Captioning
IEEE Transactions on Multimedia (IEEE TMM), 2022
Peipei Zhu
Tianlin Li
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
Chen Chen
VLM
210
45
0
26 May 2022
Multimodal Knowledge Alignment with Reinforcement Learning
Youngjae Yu
Jiwan Chung
Heeseung Yun
Jack Hessel
Jinho Park
...
Prithviraj Ammanabrolu
Rowan Zellers
Ronan Le Bras
Gunhee Kim
Yejin Choi
VLM
291
38
0
25 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Computer Vision and Pattern Recognition (CVPR), 2022
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
293
16
0
25 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
166
33
0
13 May 2022
Learning to Retrieve Videos by Asking Questions
ACM Multimedia (ACM MM), 2022
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
319
19
0
11 May 2022
Chart Question Answering: State of the Art and Future Directions
Enamul Hoque
P. Kavehzadeh
Ahmed Masry
154
53
0
08 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
279
18
0
02 May 2022
UTC: A Unified Transformer with Inter-Task Contrastive Learning for Visual Dialog
Computer Vision and Pattern Recognition (CVPR), 2022
Cheng Chen
Yudong Zhu
Zhenshan Tan
Qingrong Cheng
Xin Jiang
Qun Liu
X. Gu
267
43
0
01 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Neural Information Processing Systems (NeurIPS), 2022
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM, VLM
697
4,901
0
29 Apr 2022
Supplementing Missing Visions via Dialog for Scene Graph Generations
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhenghao Zhao
Ye Zhu
Xiaoguang Zhu
Yuzhang Shang
Yan Yan
200
1
0
23 Apr 2022
Improving Cross-Modal Understanding in Visual Dialog via Contrastive Learning
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Feilong Chen
Xiuyi Chen
Shuang Xu
Bo Xu
VLM
160
19
0
15 Apr 2022
Can Visual Dialogue Models Do Scorekeeping? Exploring How Dialogue Representations Incrementally Encode Shared Knowledge
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Brielen Madureira
David Schlangen
153
4
0
14 Apr 2022
Reasoning with Multi-Structure Commonsense Knowledge in Visual Dialog
Shunyu Zhang
X. Jiang
Zequn Yang
T. Wan
Zengchang Qin
164
14
0
10 Apr 2022
There Are a Thousand Hamlets in a Thousand People's Eyes: Enhancing Knowledge-grounded Dialogue with Personal Memory
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Tingchen Fu
Xueliang Zhao
Chongyang Tao
Jiaxin Wen
Rui Yan
179
26
0
06 Apr 2022
Co-VQA: Answering by Interactive Sub Question Sequence
Findings (Findings), 2022
Ruonan Wang
Yuxi Qian
Fangxiang Feng
Xiaojie Wang
Huixing Jiang
LRM
165
19
0
02 Apr 2022
FindIt: Generalized Localization with Natural Language Queries
European Conference on Computer Vision (ECCV), 2022
Weicheng Kuo
Fred Bertsch
Wei Li
A. Piergiovanni
M. Saffar
A. Angelova
ObjD
212
18
0
31 Mar 2022
Image Retrieval from Contextual Descriptions
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Benno Krojer
Vaibhav Adlakha
Vibhav Vineet
Yash Goyal
Edoardo Ponti
Siva Reddy
256
38
0
29 Mar 2022
Fine-Grained Visual Entailment
European Conference on Computer Vision (ECCV), 2022
Christopher Thomas
Yipeng Zhang
Shih-Fu Chang
298
7
0
29 Mar 2022
How do you Converse with an Analytical Chatbot? Revisiting Gricean Maxims for Designing Analytical Conversational Behavior
International Conference on Human Factors in Computing Systems (CHI), 2022
V. Setlur
Melanie Tory
151
61
0
16 Mar 2022