2404.18930
Hallucination of Multimodal Large Language Models: A Survey

29 April 2024

Tianjun Xiao

Zheng Zhang

Mike Zheng Shou

Papers citing "Hallucination of Multimodal Large Language Models: A Survey"

50 / 334 papers shown

Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction

320

5

0

24 Dec 2025

Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection

232

2

0

24 Dec 2025

Rethinking Prompt Design for Inference-time Scaling in Text-to-Visual Generation

Mamshad Nayeem Rizve

192

0

0

03 Dec 2025

Debate with Images: Detecting Deceptive Behaviors in Multimodal Large Language Models

187

0

0

29 Nov 2025

TrafficLens: Multi-Camera Traffic Video Analysis Using LLMs

Md. Adnan Arefeen

Biplob K. Debnath

358

0

0

26 Nov 2025

VeriSciQA: An Auto-Verified Dataset for Scientific Visual Question Answering

143

0

0

25 Nov 2025

Beyond Words and Pixels: A Benchmark for Implicit World Knowledge Reasoning in Generative Models

481

0

0

23 Nov 2025

ARIAL: An Agentic Framework for Document VQA with Precise Answer Localization

Ahmad Mohammadshirazi

Pinaki Prasad Guha Neogi

Dheeraj Kulshrestha

121

0

0

22 Nov 2025

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

174

2

0

20 Nov 2025

Dual-LoRA and Quality-Enhanced Pseudo Replay for Multimodal Continual Food Learning

248

0

0

17 Nov 2025

What Color Is It? A Text-Interference Multimodal Hallucination Benchmark

243

1

0

17 Nov 2025

Suppressing VLM Hallucinations with Spectral Representation Filtering

145

0

0

15 Nov 2025

An Analysis of Architectural Impact on LLM-based Abstract Visual Reasoning: A Systematic Benchmark on RAVEN-FAIR

61

0

0

14 Nov 2025

A Low-Rank Method for Vision Language Model Hallucination Mitigation in Autonomous Driving

93

1

0

09 Nov 2025

Role-SynthCLIP: A Role Play Driven Diverse Synthetic Data Approach

Yuanxiang Huangfu

112

0

0

07 Nov 2025

Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems

Ghanshyam Verma

191

1

0

28 Oct 2025

MAD-Fact: A Multi-Agent Debate Framework for Long-Form Factuality Evaluation in LLMs

313

0

0

27 Oct 2025

From Denoising to Refining: A Corrective Framework for Vision-Language Diffusion Model

171

1

0

22 Oct 2025

PruneHal: Reducing Hallucinations in Multi-modal Large Language Models through Adaptive KV Cache Pruning

120

0

0

22 Oct 2025

Beyond Single Models: Mitigating Multimodal Hallucinations via Adaptive Token Ensemble Decoding

151

0

0

21 Oct 2025

Med-VRAgent: A Framework for Medical Visual Reasoning-Enhanced Agents

180

1

0

21 Oct 2025

Token-Level Inference-Time Alignment for Vision-Language Models

277

0

0

20 Oct 2025

Hallucination Benchmark for Speech Foundation Models

Alkis Koudounas

Moreno La Quatra

Sabato Marco Siniscalchi

238

1

0

18 Oct 2025

Spatial Preference Rewarding for MLLMs Spatial Understanding

134

0

0

16 Oct 2025

Mitigating Hallucination in Multimodal Reasoning via Functional Attention Control

135

0

0

11 Oct 2025

Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing

...

180

0

0

09 Oct 2025

ChainMPQ: Interleaved Text-Image Reasoning Chains for Mitigating Relation Hallucinations

119

0

0

07 Oct 2025

When Thinking Drifts: Evidential Grounding for Robust Video Reasoning

Kristen Grauman

271

4

0

07 Oct 2025

CoDA: Agentic Systems for Collaborative Data Visualization

102

2

0

03 Oct 2025

RefineShot: Rethinking Cinematography Understanding with Foundational Skill Evaluation

Ming-Hsuan Yang

175

0

0

02 Oct 2025

MedMMV: A Controllable Multimodal Multi-Agent Framework for Reliable and Verifiable Clinical Reasoning

168

2

0

29 Sep 2025

DocPruner: A Storage-Efficient Framework for Multi-Vector Visual Document Retrieval via Adaptive Patch-Level Embedding Pruning

189

5

0

28 Sep 2025

Exposing Hallucinations To Suppress Them: VLMs Representation Editing With Generative Anchors

145

1

0

26 Sep 2025

Customizing Visual Emotion Evaluation for MLLMs: An Open-vocabulary, Multifaceted, and Scalable Approach

152

1

0

26 Sep 2025

From Superficial Outputs to Superficial Learning: Risks of Large Language Models in Education

403

3

0

26 Sep 2025

Hallucination as an Upper Bound: A New Perspective on Text-to-Image Evaluation

300

0

0

25 Sep 2025

Are Hallucinations Bad Estimations?

Jerry Yao-Chieh Hu

Jennifer Yuntong Zhang

161

0

0

25 Sep 2025

Revealing Multimodal Causality with Large Language Models

188

0

0

22 Sep 2025

Losing the Plot: How VLM responses degrade on imperfect charts

Vijaykrishnan Narayanan

Mahantesh Halappanavar

102

0

0

22 Sep 2025

WISE: Weak-Supervision-Guided Step-by-Step Explanations for Multimodal LLMs in Image Classification

126

1

0

22 Sep 2025

ChartHal: A Fine-grained Framework Evaluating Hallucination of Large Vision Language Models in Chart Understanding

124

0

0

22 Sep 2025

Beyond Spurious Signals: Debiasing Multimodal Large Language Models via Counterfactual Inference and Adaptive Expert Routing

Hsiu-Yuan Huang

98

1

0

18 Sep 2025

ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models

Nathaniel D. Bastian

146

0

0

18 Sep 2025

EdiVal-Agent: An Object-Centric Framework for Automated, Fine-Grained Evaluation of Multi-Turn Editing

...

143

0

0

16 Sep 2025

HARMONIC: A Content-Centric Cognitive Robotic Architecture

Sanjay Oruganti

Michael K. Roberts

Christian Arndt

Carlos Gonzalez

77

1

0

16 Sep 2025

FineQuest: Adaptive Knowledge-Assisted Sports Video Understanding via Agent-of-Thoughts Reasoning

175

2

0

15 Sep 2025

Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding

...

William Yang Wang

169

1

0

15 Sep 2025

OmniDPO: A Preference Optimization Framework to Address Omni-Modal Hallucination

185

0

0

31 Aug 2025

MM-SeR: Multimodal Self-Refinement for Lightweight Image Captioning

212

0

0

29 Aug 2025

GLSim: Detecting Object Hallucinations in LVLMs via Global-Local Similarity

148

1

0

27 Aug 2025