Visual Dialog

26 November 2016

Devi Parikh

Papers citing "Visual Dialog"

50 / 597 papers shown

OSCaR: Object State Captioning and State Change Representation

552

27 Feb 2024

CommVQA: Situating Visual Question Answering in Communicative Contexts

22 Feb 2024

Towards Robust Instruction Tuning on Multimodal Large Language Models

291

22 Feb 2024

CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

...

Peng Li

Maosong Sun

312

21 Feb 2024

OLViT: Multi-Modal State Tracking via Attention-Based Embeddings for Video-Grounded Dialog

Adnen Abdessaied

Manuel von Hochmeister

Andreas Bulling

200

20 Feb 2024

ConVQG: Contrastive Visual Question Generation with Multimodal Guidance

177

20 Feb 2024

Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection

192

19 Feb 2024

SInViG: A Self-Evolving Interactive Visual Agent for Human-Robot Interaction

282

19 Feb 2024

VisLingInstruct: Elevating Zero-Shot Learning in Multi-Modal Language Models with Autonomous Instruction Optimization

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

296

12 Feb 2024

Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy

International Conference on Learning Representations (ICLR), 2024

Simon Ging

M. A. Bravo

Thomas Brox

VLM

401

11 Feb 2024

MobileVLM V2: Faster and Stronger Baseline for Vision Language Model

...

Chunhua Shen

238

148

06 Feb 2024

MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer

...

Yu Qiao

239

18 Jan 2024

Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training

211

04 Jan 2024

Detours for Navigating Instructional Videos

Computer Vision and Pattern Recognition (CVPR), 2024

482

03 Jan 2024

Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action

282

271

28 Dec 2023

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Weijie Su

...

Ping Luo

Yu Qiao

635

2,182

21 Dec 2023

InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models

183

21 Dec 2023

VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation

Sijie Zhao

Ying Shan

185

14 Dec 2023

Can Large Language Models Be Good Companions? An LLM-Based Eyewear System with Conversational Common Ground

Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies (IMWUT), 2023

Rui Zhu

...

Fan Yang

176

30 Nov 2023

A Survey of the Evolution of Language Model-Based Dialogue Systems: Data, Task and Models

451

28 Nov 2023

LION : Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge

299

20 Nov 2023

Active Prompt Learning in Vision Language Models

261

18 Nov 2023

Towards A Unified Neural Architecture for Visual Recognition and Reasoning

163

10 Nov 2023

From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities

Information Fusion (Inf. Fusion), 2023

Md Farhan Ishmam

Md Sakib Hossain Shovon

M. F. Mridha

Nilanjan Dey

399

01 Nov 2023

Impressions: Understanding Visual Semiotics and Aesthetic Impact

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Julia Kruk

Caleb Ziems

Diyi Yang

149

27 Oct 2023

$\mathbb{VD}$-$\mathbb{GR}$: Boosting $\mathbb{V}$isual $\mathbb{D}$ialog with Cascaded Spatial-Temporal Multi-Modal $\mathbb{GR}$aphs

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023

171

25 Oct 2023

Large Language Models can Share Images, Too!

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

386

23 Oct 2023

Semi-supervised multimodal coreference resolution in image narrations

208

20 Oct 2023

InViG: Benchmarking Interactive Visual Grounding with 500K Human-Robot Interactions

192

18 Oct 2023

EXMODD: An EXplanatory Multimodal Open-Domain Dialogue dataset

259

17 Oct 2023

Off-Policy Evaluation for Human Feedback

Neural Information Processing Systems (NeurIPS), 2023

331

11 Oct 2023

Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks

Avinash Madasu

Anahita Bhiwandiwalla

Vasudev Lal

VLM

235

07 Oct 2023

ReForm-Eval: Evaluating Large Vision Language Models via Unified Re-Formulation of Task-Oriented Benchmarks

ACM Multimedia (ACM MM), 2023

...

Xuanjing Huang

303

04 Oct 2023

Application of frozen large-scale models to multimodal task-oriented dialogue

133

02 Oct 2023

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

Tianyu Yu

Jinyi Hu

...

Zhiyuan Liu

Maosong Sun

156

01 Oct 2023

AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023

Andrea Madotto

Tushar Nagarajan

...

Yue Liu

Babak Damavandi

Anuj Kumar

MLLM

292

111

27 Sep 2023

Teaching Text-to-Image Models to Communicate in Dialog

Yuxuan Wang

Dongyan Zhao

164

27 Sep 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition

...

Conghui He

Yu Qiao

790

307

26 Sep 2023

Resolving References in Visually-Grounded Dialogue via Text Generation

SIGDIAL Conferences (SIGDIAL), 2023

Bram Willemsen

Livia Qian

Gabriel Skantze

172

23 Sep 2023

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

International Conference on Learning Representations (ICLR), 2023

Zefan Cai

Xiaojian Ma

452

184

14 Sep 2023

VDialogUE: A Unified Evaluation Benchmark for Visually-grounded Dialogue

Yunshui Li

Run Luo

Min Yang

Fei Huang

Yongbin Li

157

14 Sep 2023

Collecting Visually-Grounded Dialogue with A Game Of Sorts

International Conference on Language Resources and Evaluation (LREC), 2023

Bram Willemsen

Dmytro Kalpakchi

Gabriel Skantze

117

10 Sep 2023

ImageBind-LLM: Multi-modality Instruction Tuning

...

Yu Qiao

276

152

07 Sep 2023

Sparkles: Unlocking Chats Across Multiple Images for Multimodal Instruction-Following Models

182

31 Aug 2023

Affective Visual Dialog: A Large-Scale Benchmark for Emotional Reasoning Based on Visually Grounded Conversations

European Conference on Computer Vision (ECCV), 2023

Gamaleldin F. Elsayed

Mohamed Elhoseiny

263

30 Aug 2023

Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages

International Conference on Learning Representations (ICLR), 2023

Jinyi Hu

...

Yankai Lin

Jiao Xue

Dahai Li

Zhiyuan Liu

Maosong Sun

MLLM VLM

276

23 Aug 2023

Simple Baselines for Interactive Video Retrieval with Questions and Answers

IEEE International Conference on Computer Vision (ICCV), 2023

Kaiqu Liang

Samuel Albanie

200

21 Aug 2023

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

AAAI Conference on Artificial Intelligence (AAAI), 2023

347

188

19 Aug 2023

ZRIGF: An Innovative Multimodal Framework for Zero-Resource Image-Grounded Dialogue Generation

ACM Multimedia (ACM MM), 2023

192

01 Aug 2023

'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges

SIGDIAL Conferences (SIGDIAL), 2023

169

28 Jul 2023