1611.08669
Cited By
Visual Dialog
26 November 2016
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
Papers citing
"Visual Dialog"
50 / 597 papers shown
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
420
152
0
25 Jul 2023
Emu: Generative Pretraining in Multimodality
International Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
358
155
0
11 Jul 2023
SimpleMTOD: A Simple Language Model for Multimodal Task-Oriented Dialogue with Symbolic Scene Representation
International Conference on Computational Semantics (IWCS), 2023
Bhathiya Hemanthage
Christian Dondrup
P. Bartie
Oliver Lemon
MLLM
134
1
0
10 Jul 2023
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents
Neural Information Processing Systems (NeurIPS), 2023
Hugo Laurençon
Lucile Saulnier
Léo Tronchon
Stas Bekman
Amanpreet Singh
...
Siddharth Karamcheti
Alexander M. Rush
Douwe Kiela
Matthieu Cord
Victor Sanh
351
317
0
21 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Xu
Wenqi Shao
Kaipeng Zhang
Shiyang Feng
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
309
230
0
15 Jun 2023
Multimodal Explainable Artificial Intelligence: A Comprehensive Review of Methodological Advances and Future Research Directions
IEEE Access, 2023
N. Rodis
Christos Sardianos
Panagiotis I. Radoglou-Grammatikis
Panagiotis G. Sarigiannidis
Iraklis Varlamis
Georgios Th. Papadopoulos
333
38
0
09 Jun 2023
Dealing with Semantic Underspecification in Multimodal NLP
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Sandro Pezzelle
164
11
0
08 Jun 2023
M³IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
376
136
0
07 Jun 2023
Chatting Makes Perfect: Chat-based Image Retrieval
Neural Information Processing Systems (NeurIPS), 2023
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
364
25
0
31 May 2023
VILAS: Exploring the Effects of Vision and Language Context in Automatic Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Ziyi Ni
Minglun Han
Feilong Chen
Linghui Meng
Jing Shi
Shuang Xu
Bo Xu
184
3
0
31 May 2023
VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yuxuan Wang
Zilong Zheng
Xueliang Zhao
Jinpeng Li
Yueqian Wang
Dongyan Zhao
VGen
177
14
0
30 May 2023
A Unified Framework for Slot based Response Generation in a Multimodal Dialogue System
Mauajama Firdaus
Avinash Madasu
Asif Ekbal
284
9
0
27 May 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jaewoo Ahn
Yeda Song
Sangdoo Yun
Gunhee Kim
177
26
0
27 May 2023
Generating Images with Multimodal Language Models
Neural Information Processing Systems (NeurIPS), 2023
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
359
326
0
26 May 2023
BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Claytone Sikasote
Eunice Mukonde
Md Mahfuz Ibn Alam
Antonios Anastasopoulos
171
8
0
26 May 2023
Learning to Imagine: Visually-Augmented Natural Language Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Tianyi Tang
Yushuo Chen
Yifan Du
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
DiffM
421
10
0
26 May 2023
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Yunshui Li
Binyuan Hui
Zhichao Yin
Min Yang
Fei Huang
Yongbin Li
MoE
199
23
0
24 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Haoqin Tu
Yitong Li
Fei Mi
Zhongliang Yang
173
5
0
23 May 2023
SuperDialseg: A Large-scale Dataset for Supervised Dialogue Segmentation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Junfeng Jiang
Chengzhang Dong
Sadao Kurohashi
Akiko Aizawa
114
13
0
15 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Neural Information Processing Systems (NeurIPS), 2023
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
1.4K
2,884
0
11 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
334
150
0
07 May 2023
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Zehua Wang
Guanbin Li
Jingzhou Luo
Guanbin Li
BDL
LRM
288
6
0
17 Apr 2023
Grounding 3D Object Affordance from 2D Interactions in Images
IEEE International Conference on Computer Vision (ICCV), 2023
Yuhang Yang
Wei Zhai
Hongcheng Luo
Yang Cao
Jiebo Luo
Zhengjun Zha
273
53
0
18 Mar 2023
CHAMPAGNE: Learning Real-world Conversation from Large-Scale Web Videos
IEEE International Conference on Computer Vision (ICCV), 2023
Seungju Han
Jack Hessel
Nouha Dziri
Yejin Choi
Youngjae Yu
VGen
194
21
0
17 Mar 2023
Data Roaming and Quality Assessment for Composed Image Retrieval
AAAI Conference on Artificial Intelligence (AAAI), 2023
Matan Levy
Rami Ben-Ari
N. Darshan
Dani Lischinski
249
47
0
16 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
236
123
0
12 Mar 2023
Accountable Textual-Visual Chat Learns to Reject Human Instructions in Image Re-creation
Zhiwei Zhang
Yuliang Liu
MLLM
367
0
0
10 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
170
1
0
28 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
464
272
0
20 Feb 2023
Interactive Video Corpus Moment Retrieval using Reinforcement Learning
ACM Multimedia (ACM MM), 2022
Zhixin Ma
Chong-Wah Ngo
164
5
0
19 Feb 2023
What A Situated Language-Using Agent Must be Able to Do: A Top-Down Analysis
David Schlangen
LLMAG
LM&Ro
120
10
0
16 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
International Conference on Machine Learning (ICML), 2023
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
444
150
0
31 Jan 2023
Style-Aware Contrastive Learning for Multi-Style Image Captioning
Findings (Findings), 2023
Yucheng Zhou
Guodong Long
144
28
0
26 Jan 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
ACM Multimedia (ACM MM), 2023
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
202
13
0
14 Jan 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yuxing Long
Binyuan Hui
Fulong Ye
Yanyang Li
Zhuoxin Han
Caixia Yuan
Yongbin Li
Xiaojie Wang
LLMAG
205
9
0
05 Jan 2023
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhiyang Xu
Ying Shen
Lifu Huang
MLLM
362
132
0
21 Dec 2022
Modularity through Attention: Efficient Training and Transfer of Language-Conditioned Policies for Robot Manipulation
Conference on Robot Learning (CoRL), 2022
Yifan Zhou
Shubham D. Sonawani
Mariano Phielipp
Simon Stepputtis
H. B. Amor
LM&Ro
241
28
0
08 Dec 2022
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset
North American Chapter of the Association for Computational Linguistics (NAACL), 2022
Young-Jun Lee
ByungSoo Ko
Han-Gyu Kim
Jonghwan Hyeon
Ho-Jin Choi
296
12
0
08 Dec 2022
Compound Tokens: Channel Fusion for Vision-Language Representation Learning
Maxwell Mbabilla Aladago
A. Piergiovanni
203
2
0
02 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Computer Vision and Pattern Recognition (CVPR), 2022
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
135
18
0
29 Nov 2022
Who are you referring to? Coreference resolution in image narrations
IEEE International Conference on Computer Vision (ICCV), 2022
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
272
5
0
26 Nov 2022
Unified Multimodal Model with Unlikelihood Training for Visual Dialog
ACM Multimedia (ACM MM), 2022
Zihao Wang
Junli Wang
Changjun Jiang
MLLM
180
13
0
23 Nov 2022
Aligning Source Visual and Target Language Domains for Unpaired Video Captioning
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021
Fenglin Liu
Xian Wu
Chenyu You
Shen Ge
Yuexian Zou
Xu Sun
242
30
0
22 Nov 2022
Progressive Tree-Structured Prototype Network for End-to-End Image Captioning
ACM Multimedia (ACM MM), 2022
Pengpeng Zeng
Jinkuan Zhu
Jingkuan Song
Lianli Gao
VLM
181
33
0
17 Nov 2022
Navigating Connected Memories with a Task-oriented Dialog System
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Seungwhan Moon
Satwik Kottur
A. Geramifard
Babak Damavandi
125
3
0
15 Nov 2022
Pragmatics in Language Grounding: Phenomena, Tasks, and Modeling Approaches
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Daniel Fried
Nicholas Tomlin
Jennifer Hu
Roma Patel
Aida Nematzadeh
244
9
0
15 Nov 2022
MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Jiazhan Feng
Qingfeng Sun
Can Xu
Lu Wang
Yaming Yang
Chongyang Tao
Dongyan Zhao
Qingwei Lin
251
67
0
10 Nov 2022
Going for GOAL: A Resource for Grounded Football Commentaries
Alessandro Suglia
José Lopes
E. Bastianelli
Andrea Vanzo
Shubham Agarwal
Malvina Nikandrou
Lu Yu
Ioannis Konstas
Verena Rieser
125
8
0
08 Nov 2022
Multi-Stage Based Feature Fusion of Multi-Modal Data for Human Activity Recognition
Hyeongju Choi
Apoorva Beedu
H. Haresamudram
Irfan Essa
122
9
0
08 Nov 2022
End-to-End Multimodal Representation Learning for Video Dialog
Huda AlAmri
Anthony Bilic
Michael Hu
Apoorva Beedu
Irfan Essa
205
7
0
26 Oct 2022