Hallucination of Multimodal Large Language Models: A Survey

29 April 2024

Tianjun Xiao

Zheng Zhang

Papers citing "Hallucination of Multimodal Large Language Models: A Survey"

50 / 334 papers shown

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

361

09 Feb 2024

The Instinctive Bias: Spurious Images lead to Hallucination in MLLMs

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Tianyang Han

Yong Lin

Tong Zhang

160

06 Feb 2024

Enhancing Multimodal Large Language Models with Vision Detection Models: An Empirical Study

186

31 Jan 2024

Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs

Computer Vision and Pattern Recognition (CVPR), 2024

Shengbang Tong

412

568

11 Jan 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks

Weijie Su

...

Ping Luo

Yu Qiao

641

2,182

21 Dec 2023

VCoder: Versatile Vision Encoders for Multimodal Large Language Models

203

21 Dec 2023

Silkie: Preference Distillation for Large Visual Language Models

Lei Li

Peiyi Wang

391

107

17 Dec 2023

Mitigating Fine-Grained Hallucination by Fine-Tuning Large Vision-Language Models with Caption Rewrites

Conference on Multimedia Modeling (MMM), 2023

220

04 Dec 2023

Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models

Andrés Villa

Juan Carlos León Alcázar

Alvaro Soto

Bernard Ghanem

MLLM VLM

292

03 Dec 2023

RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback

Computer Vision and Pattern Recognition (CVPR), 2023

...

Zhiyuan Liu

Maosong Sun

436

343

01 Dec 2023

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation

Computer Vision and Pattern Recognition (CVPR), 2023

Conghui He

Dahua Lin

472

363

29 Nov 2023

Mitigating Object Hallucinations in Large Vision-Language Models through Visual Contrastive Decoding

Computer Vision and Pattern Recognition (CVPR), 2023

Xin Li

314

448

28 Nov 2023

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization

Conghui He

369

188

28 Nov 2023

Mitigating Hallucination in Visual Language Models with Visual Supervision

Ming Tang

244

27 Nov 2023

Multimodal Large Language Models: A Survey

BigData Congress [Services Society] (BSS), 2023

Jiayang Wu

Wensheng Gan

Zefeng Chen

Shicheng Wan

Philip S. Yu

235

310

22 Nov 2023

HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data

Computer Vision and Pattern Recognition (CVPR), 2023

Liang Pang

269

122

22 Nov 2023

SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models

...

Yu Qiao

319

275

13 Nov 2023

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

...

Ming Yan

Ji Zhang

Jitao Sang

MLLM VLM

252

188

13 Nov 2023

InfMLLM: A Unified Framework for Visual-Language Tasks

Hao Li

144

12 Nov 2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration

Computer Vision and Pattern Recognition (CVPR), 2023

Jiabo Ye

Ji Zhang

Fei Huang

Jingren Zhou

MLLM VLM

467

600

07 Nov 2023

Holistic Analysis of Hallucination in GPT-4V(ision): Bias and Interference Challenges

290

122

06 Nov 2023

Improved Baselines with Visual Instruction Tuning

Computer Vision and Pattern Recognition (CVPR), 2023

611

4,207

05 Oct 2023

Analyzing and Mitigating Object Hallucination in Large Vision-Language Models

International Conference on Learning Representations (ICLR), 2023

Mohit Bansal

356

266

01 Oct 2023

Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants

Tianyu Yu

Jinyi Hu

...

Zhiyuan Liu

Maosong Sun

156

01 Oct 2023

Aligning Large Multimodal Models with Factually Augmented RLHF

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

...

285

593

25 Sep 2023

Language Modeling Is Compression

International Conference on Learning Representations (ICLR), 2023

Grégoire Delétang

Anian Ruoss

Paul-Ambroise Duquenne

...

Marcus Hutter

426

202

19 Sep 2023

Unsupervised Open-Vocabulary Object Localization in Videos

IEEE International Conference on Computer Vision (ICCV), 2023

Tianjun Xiao

...

Bernt Schiele

Thomas Brox

Zheng Zhang

Yanwei Fu

Tong He

290

18 Sep 2023

MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning

International Conference on Learning Representations (ICLR), 2023

Zefan Cai

Xiaojian Ma

453

184

14 Sep 2023

ImageBind-LLM: Multi-modality Instruction Tuning

...

Yu Qiao

282

154

07 Sep 2023

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

International Conference on Learning Representations (ICLR), 2023

286

284

07 Sep 2023

CIEM: Contrastive Instruction Evaluation Method for Better Instruction Tuning

215

05 Sep 2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models

...

Ji Zhang

271

29 Aug 2023

VIGC: Visual Instruction Generation and Correction

AAAI Conference on Artificial Intelligence (AAAI), 2023

Huaping Zhong

...

Conghui He

339

24 Aug 2023

Instruction Tuning for Large Language Models: A Survey

...

Jiwei Li

922

765

21 Aug 2023

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

AAAI Conference on Artificial Intelligence (AAAI), 2023

347

189

19 Aug 2023

Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions

International Conference on Learning Representations (ICLR), 2023

Wei Ji

333

08 Aug 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models

Louis Martin

...

Sharan Narang

Sergey Edunov

8.3K

15,302

18 Jul 2023

MMBench: Is Your Multi-modal Model an All-around Player?

European Conference on Computer Vision (ECCV), 2023

...

Conghui He

Ziwei Liu

Kai-xiang Chen

Dahua Lin

709

1,664

12 Jul 2023

GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest

913

320

07 Jul 2023

What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?

North American Chapter of the Association for Computational Linguistics (NAACL), 2023

321

05 Jul 2023

Shikra: Unleashing Multimodal LLM's Referential Dialogue Magic

456

816

27 Jun 2023

Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning

International Conference on Learning Representations (ICLR), 2023

Fuxiao Liu

433

406

26 Jun 2023

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

...

802

1,224

23 Jun 2023

The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only

Guilherme Penedo

Quentin Malartic

Daniel Hesslow

Ruxandra-Aimée Cojocaru

422

882

01 Jun 2023

PandaGPT: One Model To Instruction-Follow Them All

Tsinghua Interdisciplinary Workshop on Logic, Language and Meaning (TILLM), 2023

260

379

25 May 2023

LIMA: Less Is More for Alignment

Neural Information Processing Systems (NeurIPS), 2023

...

Luke Zettlemoyer

443

1,138

18 May 2023

C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models

Neural Information Processing Systems (NeurIPS), 2023

...

Maosong Sun

425

741

15 May 2023

InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning

Neural Information Processing Systems (NeurIPS), 2023

1.4K

2,908

11 May 2023

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Ping Luo

347

305

08 May 2023

Otter: A Multi-Modal Model with In-Context Instruction Tuning

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Joshua Adrian Cahyono

Jingkang Yang

Yu Qiao

MLLM

520

620

05 May 2023