Visual Dialog

26 November 2016
Abhishek Das
Satwik Kottur
Khushi Gupta
Avi Singh
Deshraj Yadav
José M. F. Moura
Devi Parikh
Dhruv Batra

Papers citing "Visual Dialog"

50 / 597 papers shown
Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue
Lin Yu
Xiaofei Han
Yifei Kang
Chiung-Yi Tseng
Danyang Zhang
Ziqian Bi
Zhimo Han
8
0
0
21 Nov 2025
FastVLM: Self-Speculative Decoding for Fast Vision-Language Model Inference
Divya J. Bajpai
M. Hanawal
MLLM VLM
198
0
0
26 Oct 2025
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
Qiyu Wu
Shuyang Cui
Satoshi Hayakawa
Wei-Yao Wang
Hiromi Wakaki
Yuki Mitsufuji
76
0
0
17 Oct 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
124
0
0
16 Oct 2025
The Mechanistic Emergence of Symbol Grounding in Language Models
Shuyu Wu
Ziqiao Ma
Xiaoxi Luo
Yidong Huang
Josue Torres-Fonseca
Freda Shi
Joyce Chai
LRM
136
2
0
15 Oct 2025
J-ORA: A Framework and Multimodal Dataset for Japanese Object Identification, Reference, Action Prediction in Robot Perception
Jesse Atuhurra
Hidetaka Kamigaito
Taro Watanabe
Koichiro Yoshino
60
0
0
13 Oct 2025
Generalized Contrastive Learning for Universal Multimodal Retrieval
Jungsoo Lee
Janghoon Cho
Hyojin Park
Munawar Hayat
Kyuwoong Hwang
Fatih Porikli
Sungha Choi
VLM
144
1
0
30 Sep 2025
Beyond Greedy Exits: Improved Early Exit Decisions for Risk Control and Reliability
Divya J. Bajpai
M. Hanawal
68
0
0
28 Sep 2025
Chain-of-Thought Re-ranking for Image Retrieval Tasks
Shangrong Wu
Yanghong Zhou
Yang Chen
Feng Zhang
P. Y. Mok
LRM
100
0
0
18 Sep 2025
A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts
George Correa de Araujo
H. Maia
Hélio Pedrini
104
0
0
17 Sep 2025
Calibrating MLLM-as-a-judge via Multimodal Bayesian Prompt Ensembles
Eric Slyman
Mehrab Tanjim
Kushal Kafle
Stefan Lee
125
0
0
10 Sep 2025
Omnidirectional Spatial Modeling from Correlated Panoramas
Xinshen Zhang
Tongxi Fu
Xu Zheng
89
1
0
02 Sep 2025
Text-conditioned State Space Model For Domain-generalized Change Detection Visual Question Answering
Elman Ghazaei
Erchan Aptoula
129
0
0
12 Aug 2025
Modality-Aware Feature Matching: A Comprehensive Review of Single- and Cross-Modality Techniques
Weide Liu
Wei Zhou
Jun Liu
Ping Hu
Jun Cheng
Jungong Han
Weisi Lin
3DV
179
3
0
30 Jul 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
201
3
0
28 Jul 2025
U-MARVEL: Unveiling Key Factors for Universal Multimodal Retrieval via Embedding Learning with MLLMs
Xiaojie Li
Chu Li
Shi-Zhe Chen
Xi Chen
OffRL
205
2
0
20 Jul 2025
Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Pu Jian
Donglei Yu
Wen Yang
Shuo Ren
Jiajun Zhang
117
5
0
18 Jul 2025
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu
Enxin Song
Wenhao Chai
Xuexiang Wen
Tian-Chun Ye
Gaoang Wang
288
5
0
03 Jul 2025
GeoGuess: Multimodal Reasoning based on Hierarchy of Visual Information in Street View
Fenghua Cheng
Jinxiang Wang
Sen Wang
Zi Huang
Xue Li
LRM
199
0
0
19 Jun 2025
ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM
Yujun Wang
Aniri
Jinhe Bi
Soeren Pirk
Yunpu Ma
MLLM
292
11
0
17 Jun 2025
On the Effectiveness of Integration Methods for Multimodal Dialogue Response Retrieval
Seongbo Jang
Seonghyeon Lee
Dongha Lee
Hwanjo Yu
146
0
0
13 Jun 2025
Outside Knowledge Conversational Video (OKCV) Dataset -- Dialoguing over Videos
Benjamin Z. Reichman
Constantin Patsch
Jack Truxal
Atishay Jain
Larry Heck
173
0
0
11 Jun 2025
FREE: Fast and Robust Vision Language Models with Early Exits
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Divya J. Bajpai
M. Hanawal
VLM
129
2
0
07 Jun 2025
Enabling Chatbots with Eyes and Ears: An Immersive Multimodal Conversation System for Dynamic Interactions
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Jihyoung Jang
Minwook Bae
Minji Kim
Dilek Z. Hakkani-Tür
Hyounghun Kim
163
1
0
31 May 2025
ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations
Yiming Lei
Zhizheng Yang
Zeming Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
223
0
0
29 May 2025
You Prefer This One, I Prefer Yours: Using Reference Words is Harder Than Vocabulary Words for Humans and Multimodal Language Models
Dota Tianai Dong
Yifan Luo
Po-Ya Angela Wang
Asli Ozyurek
Paula Rubio-Fernandez
150
0
0
29 May 2025
Highlighting What Matters: Promptable Embeddings for Attribute-Focused Image Retrieval
Siting Li
Xiang Gao
Simon Shaolei Du
392
1
0
21 May 2025
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Shun Inadumi
Nobuhiro Ueda
Koichiro Yoshino
ObjD
312
0
0
16 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
1.1K
26
0
05 May 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
357
2
0
10 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM VLM
270
1
0
10 Apr 2025
LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding
Ziyi Wang
Haoran Wu
Yiming Rong
Deyang Jiang
Yixin Zhang
Yue Zhao
Shuang Xu
Bo Xu
VLM
189
3
0
09 Apr 2025
Vision-Speech Models: Teaching Speech Models to Converse about Images
Amélie Royer
Moritz Böhle
Gabriel de Marmiesse
Laurent Mazaré
Neil Zeghidour
Alexandre Défossez
P. Pérez
AuLLM VLM
235
1
0
19 Mar 2025
Mitigating Object Hallucinations in MLLMs via Multi-Frequency Perturbations
Shuo Li
Jiajun Sun
Guodong Zheng
Xiaoran Fan
Yujiong Shen
...
Wenming Tan
Changzhi Sun
Tao Gui
Tao Gui
Qi Zhang
AAML VLM
327
4
0
19 Mar 2025
ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning
The Web Conference (WWW), 2025
Pengfei Luo
Jingbo Zhou
Tong Xu
Yuan Xia
Linli Xu
Tong Xu
LRM
324
6
0
13 Mar 2025
KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Xiaoming Shi
Zeming Liu
Chenkai Zhang
Yiming Lei
Haitao Leng
...
Qingjie Liu
Wanxiang Che
Shaoguo Liu
Size Li
Yanjie Wang
396
1
0
10 Mar 2025
Taking Notes Brings Focus? Towards Multi-Turn Multimodal Dialogue Learning
Jiazheng Liu
Sipeng Zheng
Börje F. Karlsson
Zongqing Lu
186
1
0
10 Mar 2025
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Davide Testa
Giovanni Bonetta
Raffaella Bernardi
Alessandro Bondielli
Alessandro Lenci
Alessio Miaschi
Lucia Passaro
Bernardo Magnini
VGen LRM
334
1
0
24 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Hui Yuan
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Longji Xu
195
1
0
19 Feb 2025
MTPChat: A Multimodal Time-Aware Persona Dataset for Conversational Agents
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Wanqi Yang
Yongqian Li
Meng Fang
Lawrence Yunliang Chen
255
1
0
09 Feb 2025
A Video-grounded Dialogue Dataset and Metric for Event-driven Activities
AAAI Conference on Artificial Intelligence (AAAI), 2025
Wiradee Imrattanatrai
Masaki Asada
Kimihiro Hasegawa
Zhi-Qi Cheng
Ken Fukuda
Teruko Mitamura
VGen
254
0
0
30 Jan 2025
Diffusion Augmented Retrieval: A Training-Free Approach to Interactive Text-to-Image Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Zijun Long
Kangheng Liang
Gerardo Aragon Camarasa
R. McCreadie
Paul Henderson
DiffM
194
0
0
26 Jan 2025
Multimodal Multihop Source Retrieval for Web Question Answering
Navya Yarrabelly
Saloni Mittal
108
0
0
07 Jan 2025
Visual Large Language Models for Generalized and Specialized Applications
Jiayi Zhang
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
422
32
0
06 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
International Conference on Learning Representations (ICLR), 2024
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Lei Ma
MLLM VLM
506
93
0
03 Jan 2025
B-AVIBench: Towards Evaluating the Robustness of Large Vision-Language Model on Black-box Adversarial Visual-Instructions
IEEE Transactions on Information Forensics and Security (IEEE TIFS), 2024
Hao Zhang
Wenqi Shao
Hong Liu
Yongqiang Ma
Ping Luo
Yu Qiao
Kaipeng Zhang
Jianchao Tan
VLM AAML
170
10
0
31 Dec 2024
Towards Visual Grounding: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
859
27
0
28 Dec 2024
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yueqian Wang
Xiaojun Meng
Yijiao Wang
Jianxin Liang
Qun Liu
Dongyan Zhao
230
3
0
23 Dec 2024
Learning to Correction: Explainable Feedback Generation for Visual Commonsense Reasoning Distractor
ACM Multimedia (MM), 2024
Jiali Chen
Xusen Hei
Yuqi Xue
Yuancheng Wei
Jiayuan Xie
Yi Cai
Qing Li
MLLM LRM
283
10
0
08 Dec 2024
LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
Computer Vision and Pattern Recognition (CVPR), 2024
Yikun Liu
Pingan Chen
Jiayin Cai
Xiaolong Jiang
Feng-Long Xie
Jiangchao Yao
Yanfeng Wang
Weidi Xie
RALM
184
0
0
02 Dec 2024