Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
1611.08669
Cited By
v1
v2
v3
v4
v5 (latest)
Visual Dialog
26 November 2016
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Visual Dialog"
50 / 597 papers shown
OSCaR: Object State Captioning and State Change Representation
Nguyen Nguyen
Jing Bi
Ali Vosoughi
Yapeng Tian
Pooyan Fazli
Chenliang Xu
552
14
0
27 Feb 2024
CommVQA: Situating Visual Question Answering in Communicative Contexts
N. Naik
Christopher Potts
Elisa Kreiss
CoGe
84
1
0
22 Feb 2024
Towards Robust Instruction Tuning on Multimodal Large Language Models
Wei Han
Hui Chen
Soujanya Poria
MLLM
291
2
0
22 Feb 2024
CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models
Ziyue Wang
Chi Chen
Zihao Wan
Zhaolu Kang
Qidong Yan
...
Xiaoyue Mi
Peng Li
Ning Ma
Maosong Sun
Yang Liu
312
11
0
21 Feb 2024
OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog
Adnen Abdessaied
Manuel von Hochmeister
Andreas Bulling
200
2
0
20 Feb 2024
ConVQG: Contrastive Visual Question Generation with Multimodal Guidance
Li Mi
Syrielle Montariol
J. Castillo-Navarro
Xianjie Dai
Antoine Bosselut
D. Tuia
177
7
0
20 Feb 2024
Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection
Ruibo Chen
Yihan Wu
Lichang Chen
Guodong Liu
Qi He
Tianyi Xiong
Chenxi Liu
Junfeng Guo
Heng-Chiao Huang
VLM
192
36
0
19 Feb 2024
SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction
Jie Xu
Hanbo Zhang
Xinghang Li
Huaping Liu
Xuguang Lan
Tao Kong
LM&Ro
282
5
0
19 Feb 2024
VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Dongsheng Zhu
Xunzhu Tang
Weidong Han
Jinghui Lu
Yukun Zhao
Guoliang Xing
Junfeng Wang
D. Yin
VLM
MLLM
296
16
0
12 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
International Conference on Learning Representations (ICLR), 2024
Simon Ging
M. A. Bravo
Thomas Brox
VLM
401
19
0
11 Feb 2024
MobileVLM V2: Faster and Stronger Baseline for Vision Language Model
Xiangxiang Chu
Limeng Qiao
Xinyu Zhang
Shuang Xu
Fei Wei
...
Xiaofei Sun
Yiming Hu
Xinyang Lin
Bo Zhang
Chunhua Shen
VLM
MLLM
238
148
0
06 Feb 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Jiaming Song
Yu Qiao
Jifeng Dai
AuLLM
239
70
0
18 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
211
12
0
04 Jan 2024
Detours for Navigating Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
482
7
0
03 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
282
271
0
28 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
635
2,182
0
21 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
183
3
0
21 Dec 2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViT
VLM
185
46
0
14 Dec 2023
Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2023
Zhenyu Xu
Hailin Xu
Zhouyang Lu
Yingying Zhao
Rui Zhu
...
Robert P. Dick
Fan Yang
Tun Lu
Ning Gu
L. Shang
176
7
0
30 Nov 2023
A Survey of the Evolution of Language Model-Based Dialogue Systems: Data, Task and Models
Hongru Wang
Lingzhi Wang
Yiming Du
Liang Chen
Jing Zhou
Yufei Wang
Kam-Fai Wong
LRM
451
23
0
28 Nov 2023
LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge
Gongwei Chen
Leyang Shen
Rui Shao
Xiang Deng
Liqiang Nie
VLM
MLLM
299
81
0
20 Nov 2023
Active Prompt Learning in Vision Language Models
Jihwan Bang
Sumyeong Ahn
Jae-Gil Lee
VLM
261
20
0
18 Nov 2023
Towards A Unified Neural Architecture for Visual Recognition and Reasoning
Calvin Luo
Boqing Gong
Ting Chen
Chen Sun
OCL
ObjD
163
1
0
10 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Information Fusion (Inf. Fusion), 2023
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
399
71
0
01 Nov 2023
Impressions: Understanding Visual Semiotics and Aesthetic Impact
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Julia Kruk
Caleb Ziems
Diyi Yang
149
3
0
27 Oct 2023
V
D
\mathbb{VD}
VD
-
G
R
\mathbb{GR}
GR
: Boosting
V
\mathbb{V}
V
isual
D
\mathbb{D}
D
ialog with Cascaded Spatial-Temporal Multi-Modal
G
R
\mathbb{GR}
GR
aphs
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Adnen Abdessaied
Lei Shi
Andreas Bulling
3DH
171
7
0
25 Oct 2023
Large Language Models can Share Images, Too!
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Young-Jun Lee
Dokyong Lee
Joo Won Sung
Jonghwan Hyeon
Ho-Jin Choi
MLLM
386
4
0
23 Oct 2023
Semi-supervised multimodal coreference resolution in image narrations
A. Goel
Basura Fernando
Frank Keller
Hakan Bilen
208
6
0
20 Oct 2023
InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions
Hanbo Zhang
Jie Xu
Yuchen Mo
Tao Kong
192
1
0
18 Oct 2023
EXMODD: An EXplanatory Multimodal Open-Domain Dialogue dataset
Hang Yin
Pinren Lu
Ziang Li
Bin Sun
Kan Li
259
0
0
17 Oct 2023
Off-Policy Evaluation for Human Feedback
Neural Information Processing Systems (NeurIPS), 2023
Qitong Gao
Ge Gao
Juncheng Dong
Vahid Tarokh
Min Chi
Miroslav Pajic
OffRL
331
7
0
11 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
235
2
0
07 Oct 2023
ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks
ACM Multimedia (ACM MM), 2023
Zejun Li
Ye Wang
Mengfei Du
Qingwen Liu
Binhao Wu
...
Zhihao Fan
Jie Fu
Jingjing Chen
Xuanjing Huang
Zhongyu Wei
303
16
0
04 Oct 2023
Application of frozen large-scale models to multimodal task-oriented dialogue
Tatsuki Kawamoto
Takuma Suzuki
Ko Miyama
Takumi Meguro
Tomohiro Takagi
133
2
0
02 Oct 2023
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Tianyu Yu
Jinyi Hu
Yuan Yao
Haoye Zhang
Yue Zhao
...
Jiao Xue
Dahai Li
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
VLM
MLLM
156
23
0
01 Oct 2023
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Avamarie Brueggeman
Andrea Madotto
Mohammad Kachuee
Tushar Nagarajan
Matt Smith
...
Peyman Heidari
Yue Liu
Kavya Srinet
Babak Damavandi
Anuj Kumar
MLLM
292
111
0
27 Sep 2023
Teaching Text-to-Image Models to Communicate in Dialog
Xiaowen Sun
Jiazhan Feng
Yuxuan Wang
Yuxuan Lai
Xingyu Shen
Dongyan Zhao
DiffM
164
1
0
27 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Yuan Liu
MLLM
790
307
0
26 Sep 2023
Resolving References in Visually-Grounded Dialogue via Text Generation
SIGDIAL Conferences (SIGDIAL), 2023
Bram Willemsen
Livia Qian
Gabriel Skantze
172
5
0
23 Sep 2023
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
International Conference on Learning Representations (ICLR), 2023
Haozhe Zhao
Zefan Cai
Shuzheng Si
Xiaojian Ma
Kaikai An
Liang Chen
Zixuan Liu
Sheng Wang
Wenjuan Han
Baobao Chang
MLLM
VLM
452
184
0
14 Sep 2023
VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue
Yunshui Li
Binyuan Hui
Zhaochao Yin
Wanwei He
Run Luo
Yuxing Long
Min Yang
Fei Huang
Yongbin Li
157
1
0
14 Sep 2023
Collecting Visually-Grounded Dialogue with A Game Of Sorts
International Conference on Language Resources and Evaluation (LREC), 2023
Bram Willemsen
Dmytro Kalpakchi
Gabriel Skantze
117
2
0
10 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Shiyang Feng
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
276
152
0
07 Sep 2023
Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models
Yupan Huang
Zaiqiao Meng
Fangyu Liu
Yixuan Su
Nigel Collier
Yutong Lu
MLLM
182
32
0
31 Aug 2023
Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations
European Conference on Computer Vision (ECCV), 2023
Kilichbek Haydarov
Xiaoqian Shen
Avinash Madasu
Mahmoud Salem
Jia Li
Gamaleldin F. Elsayed
Mohamed Elhoseiny
263
7
0
30 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
International Conference on Learning Representations (ICLR), 2023
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
276
76
0
23 Aug 2023
Simple Baselines for Interactive Video Retrieval with Questions and Answers
IEEE International Conference on Computer Vision (ICCV), 2023
Kaiqu Liang
Samuel Albanie
200
8
0
21 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wenbo Hu
Y. Xu
Jian Wang
W. Li
Zhe Chen
Zhuowen Tu
MLLM
VLM
347
188
0
19 Aug 2023
ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation
ACM Multimedia (ACM MM), 2023
Bo Zhang
Jian Wang
Hui Ma
Bo Xu
Hongfei Lin
192
5
0
01 Aug 2023
'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges
SIGDIAL Conferences (SIGDIAL), 2023
Javier Chiyah-Garcia
Alessandro Suglia
Arash Eshghi
Helen F. Hastie
169
6
0
28 Jul 2023
Previous
1
2
3
4
5
6
...
10
11
12
Next