Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.10355
Cited By
Evaluating Object Hallucination in Large Vision-Language Models
17 May 2023
Yifan Li
Yifan Du
Kun Zhou
Jinpeng Wang
Wayne Xin Zhao
Ji-Rong Wen
MLLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Evaluating Object Hallucination in Large Vision-Language Models"
50 / 577 papers shown
Title
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye
Xianhan Zeng
Fu Li
C. Li
Haoqi Fan
CoGe
80
0
0
10 Mar 2025
Should VLMs be Pre-trained with Image Data?
Sedrick Scott Keh
Jean-Pierre Mercat
S. Gadre
Kushal Arora
Igor Vasiljevic
...
Shuran Song
Russ Tedrake
Thomas Kollar
Ludwig Schmidt
Achal Dave
VLM
49
0
0
10 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
49
0
0
10 Mar 2025
Filter Images First, Generate Instructions Later: Pre-Instruction Data Selection for Visual Instruction Tuning
Bardia Safaei
Faizan Siddiqui
Jiacong Xu
Vishal M. Patel
Shao-Yuan Lo
VLM
80
0
0
10 Mar 2025
EAZY: Eliminating Hallucinations in LVLMs by Zeroing out Hallucinatory Image Tokens
Liwei Che
Tony Qingze Liu
Jing Jia
Weiyi Qin
Ruixiang Tang
Vladimir Pavlovic
MLLM
VLM
100
1
0
10 Mar 2025
PerturboLLaVA: Reducing Multimodal Hallucinations with Perturbative Visual Training
Cong Chen
Mingyu Liu
Chenchen Jing
Y. Zhou
Fengyun Rao
Hao Chen
Bo Zhang
Chunhua Shen
MLLM
AAML
VLM
52
2
0
09 Mar 2025
DiffCLIP: Differential Attention Meets CLIP
Hasan Hammoud
Bernard Ghanem
VLM
42
0
0
09 Mar 2025
Small Vision-Language Models: A Survey on Compact Architectures and Techniques
Nitesh Patnaik
Navdeep Nayak
Himani Bansal Agrawal
Moinak Chinmoy Khamaru
Gourav Bal
Saishree Smaranika Panda
Rishi Raj
Vishal Meena
Kartheek Vadlamani
VLM
53
0
0
09 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Z. Chen
Chunwei Wang
Xiuwei Chen
Hang Xu
J. Han
Xiandan Liang
VLM
69
1
0
09 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
60
0
0
08 Mar 2025
Treble Counterfactual VLMs: A Causal Approach to Hallucination
Li Li
Jiashu Qu
Yuxiao Zhou
Yuehan Qin
Tiankai Yang
Yue Zhao
78
1
0
08 Mar 2025
SHAPE : Self-Improved Visual Preference Alignment by Iteratively Generating Holistic Winner
Kejia Chen
Jiawen Zhang
Jiacong Hu
Jiazhen Yang
Jian Lou
Zunlei Feng
Mingli Song
57
0
0
06 Mar 2025
TPC: Cross-Temporal Prediction Connection for Vision-Language Model Hallucination Reduction
Chao Wang
Weiwei Fu
Yang Zhou
MLLM
VLM
67
0
0
06 Mar 2025
See What You Are Told: Visual Attention Sink in Large Multimodal Models
Seil Kang
Jinyeong Kim
Junhyeok Kim
Seong Jae Hwang
VLM
104
5
0
05 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
73
0
0
04 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
96
2
0
04 Mar 2025
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models
Y. Huang
Peng Ye
Chenyu Huang
Jianjian Cao
Lin Zhang
Baopu Li
Gang Yu
Tao Chen
MoMe
MoE
53
0
0
03 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
18
0
03 Mar 2025
Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
Shiqi Chen
Tongyao Zhu
Ruochen Zhou
Jinghan Zhang
Siyang Gao
Juan Carlos Niebles
Mor Geva
Junxian He
Jiajun Wu
Manling Li
LRM
60
0
0
03 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
51
0
0
02 Mar 2025
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Y. Zhang
MLLM
VLM
47
1
0
01 Mar 2025
HalCECE: A Framework for Explainable Hallucination Detection through Conceptual Counterfactuals in Image Captioning
Maria Lymperaiou
Giorgos Filandrianos
Angeliki Dimitriou
Athanasios Voulodimos
Giorgos Stamou
MLLM
35
0
0
01 Mar 2025
Towards Statistical Factuality Guarantee for Large Vision-Language Models
Z. Li
Chao Yan
Nicholas J. Jackson
Wendi Cui
B. Li
Jiaxin Zhang
Bradley Malin
69
0
0
27 Feb 2025
Stealthy Backdoor Attack in Self-Supervised Learning Vision Encoders for Large Vision Language Models
Zhaoyi Liu
Huan Zhang
AAML
72
0
0
25 Feb 2025
Exploring Causes and Mitigation of Hallucinations in Large Vision Language Models
Yaqi Sun
Kyohei Atarashi
Koh Takeuchi
Hisashi Kashima
MLLM
49
0
0
24 Feb 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
LRM
66
0
0
24 Feb 2025
Testing the limits of fine-tuning to improve reasoning in vision language models
Luca M. Schulze Buschoff
Konstantinos Voudouris
Elif Akata
Matthias Bethge
Joshua B. Tenenbaum
Eric Schulz
LRM
VLM
Presented at
ResearchTrend Connect | VLM
on
14 Mar 2025
122
0
1
24 Feb 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
39
5
0
24 Feb 2025
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
47
1
0
24 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
46
7
0
21 Feb 2025
Chitrarth: Bridging Vision and Language for a Billion People
Shaharukh Khan
Ayush Tarun
Abhinav Ravi
Ali Faraz
Akshat Patidar
Praveen Kumar Pokala
Anagha Bhangare
Raja Kolla
Chandra Khatri
Shubham Agarwal
VLM
110
1
0
21 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
71
8
0
21 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
104
8
0
18 Feb 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
L. Yang
Xinchen Zhang
Ye Tian
Chenming Shang
Minghao Xu
Wentao Zhang
Bin Cui
91
1
0
17 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Z. Yang
Mike Zheng Shou
MoE
63
0
0
10 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li
Yuquan Deng
J. Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
86
5
0
08 Feb 2025
Visual Attention Never Fades: Selective Progressive Attention ReCalibration for Detailed Image Captioning in Multimodal Large Language Models
Mingi Jung
Saehuyng Lee
Eunji Kim
Sungroh Yoon
68
0
0
03 Feb 2025
Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
H. Malik
Fahad Shamshad
Muzammal Naseer
Karthik Nandakumar
F. Khan
Salman Khan
AAML
MLLM
VLM
66
0
0
03 Feb 2025
Calling a Spade a Heart: Gaslighting Multimodal Large Language Models via Negation
Bin Zhu
Hui yan Qi
Yinxuan Gui
Jingjing Chen
Chong-Wah Ngo
Ee-Peng Lim
77
1
0
31 Jan 2025
Boosting Multimodal Large Language Models with Visual Tokens Withdrawal for Rapid Inference
Zhihang Lin
Mingbao Lin
Luxi Lin
Rongrong Ji
51
16
0
28 Jan 2025
A Survey of Large Language Models for Healthcare: from Data, Technology, and Applications to Accountability and Ethics
Kai He
Rui Mao
Qika Lin
Yucheng Ruan
Xiang Lan
Mengling Feng
Erik Cambria
LM&MA
AILaw
85
148
0
28 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
106
4
0
21 Jan 2025
PAINT: Paying Attention to INformed Tokens to Mitigate Hallucination in Large Vision-Language Model
Kazi Hasan Ibn Arif
Sajib Acharjee Dip
Khizar Hussain
Lang Zhang
Chris Thomas
62
0
0
21 Jan 2025
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
45
0
0
19 Jan 2025
Playing Devil's Advocate: Unmasking Toxicity and Vulnerabilities in Large Vision-Language Models
Abdulkadir Erol
Trilok Padhi
Agnik Saha
Ugur Kursuncu
Mehmet Emin Aktas
42
0
0
17 Jan 2025
Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
Miran Heo
Min-Hung Chen
De-An Huang
Sifei Liu
Subhashree Radhakrishnan
Seon Joo Kim
Yu-Chun Wang
Ryo Hachiuma
ObjD
VLM
121
2
0
14 Jan 2025
LEO: Boosting Mixture of Vision Encoders for Multimodal Large Language Models
Mozhgan Nasr Azadani
James Riddell
Sean Sedwards
Krzysztof Czarnecki
MLLM
VLM
41
2
0
13 Jan 2025
Supervision-free Vision-Language Alignment
Giorgio Giannone
Ruoteng Li
Qianli Feng
Evgeny Perevodchikov
Rui Chen
Aleix M. Martinez
VLM
58
0
0
08 Jan 2025
Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos
Haobo Yuan
X. Li
Tao Zhang
Zilong Huang
Shilin Xu
S. Ji
Yunhai Tong
Lu Qi
Jiashi Feng
Ming Yang
VLM
89
11
0
07 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
59
24
0
07 Jan 2025
Previous
1
2
3
4
5
6
...
10
11
12
Next