ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1611.08669
  4. Cited By
Visual Dialog
v1v2v3v4v5 (latest)

Visual Dialog

26 November 2016
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
ArXiv (abs)PDFHTML

Papers citing "Visual Dialog"

50 / 597 papers shown
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
420
152
0
25 Jul 2023
Emu: Generative Pretraining in Multimodality
Emu: Generative Pretraining in MultimodalityInternational Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
358
155
0
11 Jul 2023
SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented
  Dialogue with Symbolic Scene Representation
SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene RepresentationInternational Conference on Computational Semantics (IWCS), 2023
Bhathiya Hemanthage
Christian Dondrup
P. Bartie
Oliver Lemon
MLLM
134
1
0
10 Jul 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text
  Documents
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text DocumentsNeural Information Processing Systems (NeurIPS), 2023
Hugo Laurenccon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
351
317
0
21 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large
  Vision-Language Models
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language ModelsIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Xu
Wenqi Shao
Kaipeng Zhang
Shiyang Feng
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELMMLLM
309
230
0
15 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review
  of Methodological Advances and Future Research Directions
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research DirectionsIEEE Access (IEEE Access), 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
333
38
0
09 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Dealing with Semantic Underspecification in Multimodal NLPAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Sandro Pezzelle
164
11
0
08 Jun 2023
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual
  Instruction Tuning
M3^33IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLMVLM
376
136
0
07 Jun 2023
Chatting Makes Perfect: Chat-based Image Retrieval
Chatting Makes Perfect: Chat-based Image RetrievalNeural Information Processing Systems (NeurIPS), 2023
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
364
25
0
31 May 2023
VILAS: Exploring the Effects of Vision and Language Context in Automatic
  Speech Recognition
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech RecognitionIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziyi Ni
Minglun Han
Feilong Chen
Linghui Meng
Jing Shi
Shuang Xu
Bo Xu
184
3
0
31 May 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic
  Understanding with Scene and Topic Transitions
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic TransitionsAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuxuan Wang
Zilong Zheng
Xueliang Zhao
Jinpeng Li
Yueqian Wang
Dongyan Zhao
VGen
177
14
0
30 May 2023
A Unified Framework for Slot based Response Generation in a Multimodal
  Dialogue System
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System
Mauajama Firdaus
Avinash Madasu
Asif Ekbal
284
9
0
27 May 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation
MPCHAT: Towards Multimodal Persona-Grounded ConversationAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Jaewoo Ahn
Yeda Song
Sangdoo Yun
Gunhee Kim
177
26
0
27 May 2023
Generating Images with Multimodal Language Models
Generating Images with Multimodal Language ModelsNeural Information Processing Systems (NeurIPS), 2023
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
359
326
0
26 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
BIG-C: a Multimodal Multi-Purpose Dataset for BembaAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
171
8
0
26 May 2023
Learning to Imagine: Visually-Augmented Natural Language Generation
Learning to Imagine: Visually-Augmented Natural Language GenerationAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Tianyi Tang
Yushuo Chen
Yifan Du
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
DiffM
421
10
0
26 May 2023
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and
  Compositional Experts
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional ExpertsAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Yunshui Li
Binyuan Hui
Zhichao Yin
Min Yang
Fei Huang
Yongbin Li
MoE
199
23
0
24 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in
  Open-domain Dialogue
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain DialogueConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoqin Tu
Yitong Li
Fei Mi
Zhongliang Yang
173
5
0
23 May 2023
SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation
SuperDialseg: A Large-scale Dataset for Supervised Dialogue SegmentationConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Junfeng Jiang
Chengzhang Dong
Sadao Kurohashi
Akiko Aizawa
114
13
0
15 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with
  Instruction Tuning
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction TuningNeural Information Processing Systems (NeurIPS), 2023
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLMVLM
1.4K
2,884
0
11 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating
  Multi-Modalities as Foreign Languages
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
334
150
0
07 May 2023
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
VCD: Visual Causality Discovery for Cross-Modal Question ReasoningChinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Zehua Wang
Guanbin Li
Jingzhou Luo
Guanbin Li
BDLLRM
288
6
0
17 Apr 2023
Grounding 3D Object Affordance from 2D Interactions in Images
Grounding 3D Object Affordance from 2D Interactions in ImagesIEEE International Conference on Computer Vision (ICCV), 2023
Yuhang Yang
Wei Zhai
Hongcheng Luo
Yang Cao
Jiebo Luo
Zhengjun Zha
273
53
0
18 Mar 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web VideosIEEE International Conference on Computer Vision (ICCV), 2023
Seungju Han
Jack Hessel
Nouha Dziri
Yejin Choi
Youngjae Yu
VGen
194
21
0
17 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
Data Roaming and Quality Assessment for Composed Image RetrievalAAAI Conference on Artificial Intelligence (AAAI), 2023
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
249
47
0
16 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched
  Visual Descriptions
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
236
123
0
12 Mar 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in
  Image Re-creation
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
367
0
0
10 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in
  Situated Dialogue
Which One Are You Referring To? Multimodal Object Identification in Situated DialogueConference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
170
1
0
28 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Large-scale Multi-Modal Pre-trained Models: A Comprehensive SurveyMachine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CEVLM
464
272
0
20 Feb 2023
Interactive Video Corpus Moment Retrieval using Reinforcement Learning
Interactive Video Corpus Moment Retrieval using Reinforcement LearningACM Multimedia (ACM MM), 2022
Zhixin Ma
Chong-Wah Ngo
164
5
0
19 Feb 2023
What A Situated Language-Using Agent Must be Able to Do: A Top-Down
  Analysis
What A Situated Language-Using Agent Must be Able to Do: A Top-Down Analysis
David Schlangen
LLMAGLM&Ro
120
10
0
16 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Grounding Language Models to Images for Multimodal Inputs and OutputsInternational Conference on Machine Learning (ICML), 2023
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
444
150
0
31 Jan 2023
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Style-Aware Contrastive Learning for Multi-Style Image CaptioningFindings (Findings), 2023
Yucheng Zhou
Guodong Long
144
28
0
26 Jan 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real
  World
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real WorldACM Multimedia (ACM MM), 2023
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
202
13
0
14 Jan 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions
  from Incremental Layout Graph
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout GraphAAAI Conference on Artificial Intelligence (AAAI), 2023
Yuxing Long
Binyuan Hui
Fulong Ye
Yanyang Li
Zhuoxin Han
Caixia Yuan
Yongbin Li
Xiaojie Wang
LLMAG
205
9
0
05 Jan 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction
  Tuning
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction TuningAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
362
132
0
21 Dec 2022
Modularity through Attention: Efficient Training and Transfer of
  Language-Conditioned Policies for Robot Manipulation
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot ManipulationConference on Robot Learning (CoRL), 2022
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
241
28
0
08 Dec 2022
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal
  Dialogue Dataset
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue DatasetNorth American Chapter of the Association for Computational Linguistics (NAACL), 2022
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Jonghwan Hyeon
Ho-Jin Choi
296
12
0
08 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation
  Learning
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
203
2
0
02 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph
  Riddles
Improving Commonsense in Vision-Language Models via Knowledge Graph RiddlesComputer Vision and Pattern Recognition (CVPR), 2022
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
135
18
0
29 Nov 2022
Who are you referring to? Coreference resolution in image narrations
Who are you referring to? Coreference resolution in image narrationsIEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
272
5
0
26 Nov 2022
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
Unified Multimodal Model with Unlikelihood Training for Visual DialogACM Multimedia (ACM MM), 2022
Zihao Wang
Junli Wang
Changjun Jiang
MLLM
180
13
0
23 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video
  Captioning
Aligning Source Visual and Target Language Domains for Unpaired Video CaptioningIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
242
30
0
22 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image
  Captioning
Progressive Tree-Structured Prototype Network for End-to-End Image CaptioningACM Multimedia (ACM MM), 2022
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
181
33
0
17 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System
Navigating Connected Memories with a Task-oriented Dialog SystemConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Seungwhan Moon
Satwik Kottur
A. Geramifard
Babak Damavandi
125
3
0
15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling
  Approaches
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling ApproachesConference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
244
9
0
15 Nov 2022
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal
  Open-domain Conversation
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain ConversationAnnual Meeting of the Association for Computational Linguistics (ACL), 2022
Jiazhan Feng
Qingfeng Sun
Can Xu
Lu Wang
Yaming Yang
Chongyang Tao
Dongyan Zhao
Qingwei Lin
251
67
0
10 Nov 2022
Going for GOAL: A Resource for Grounded Football Commentaries
Going for GOAL: A Resource for Grounded Football Commentaries
Alessandro Suglia
José Lopes
E. Bastianelli
Andrea Vanzo
Shubham Agarwal
Malvina Nikandrou
Lu Yu
Ioannis Konstas
Verena Rieser
125
8
0
08 Nov 2022
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity
  Recognition
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
H. Haresamudram
Irfan Essa
122
9
0
08 Nov 2022
End-to-End Multimodal Representation Learning for Video Dialog
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
205
7
0
26 Oct 2022
Previous
12345...101112
Next