arXiv: 2406.12742
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning
18 June 2024
Bingchen Zhao, Yongshuo Zong, Letian Zhang, Timothy Hospedales
Tags: VLM
Papers citing "Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning" (16 of 16 papers shown):

Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
J. A. Zhang, Chuanqi Cheng, Y. Liu, W. Liu, Jian Luan, Rui Yan
28 Apr 2025 (22 / 0 / 0)

InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu, Weiyun Wang, Zhe Chen, Z. Liu, Shenglong Ye, ..., D. Lin, Yu Qiao, Jifeng Dai, Wenhai Wang, W. Wang
Tags: MLLM, VLM
14 Apr 2025 (66 / 6 / 1)

Identifying and Mitigating Position Bias of Multi-image Vision-Language Models
Xinyu Tian, Shu Zou, Zhaoyuan Yang, Jing Zhang
18 Mar 2025 (58 / 0 / 0)

Exploring and Evaluating Multimodal Knowledge Reasoning Consistency of Multimodal Large Language Models
Boyu Jia, Junzhe Zhang, Huixuan Zhang, Xiaojun Wan
Tags: LRM
03 Mar 2025 (44 / 1 / 0)

Can Large Language Models Unveil the Mysteries? An Exploration of Their Ability to Unlock Information in Complex Scenarios
Chao Wang, Luning Zhang, Z. Wang, Yang Zhou
Tags: ELM, VLM, LRM
27 Feb 2025 (53 / 1 / 0)

VOILA: Evaluation of MLLMs For Perceptual Understanding and Analogical Reasoning
Nilay Yilmaz, Maitreya Patel, Yiran Luo, Tejas Gokhale, Chitta Baral, Suren Jayasuriya, Yezhou Yang
Tags: LRM
25 Feb 2025 (33 / 0 / 0)

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia, Wenhao Yu, Kaixin Ma, Tianqing Fang, Zhihan Zhang, Siru Ouyang, Hongming Zhang, Meng-Long Jiang, Dong Yu
Tags: VLM
02 Oct 2024 (29 / 5 / 0)

CAST: Cross-modal Alignment Similarity Test for Vision Language Models
Gautier Dagan, Olga Loginova, Anil Batra
Tags: CoGe
17 Sep 2024 (72 / 1 / 0)

MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
Siwei Wu, Kang Zhu, Yu Bai, Yiming Liang, Yizhi Li, ..., Xingwei Qu, Xuxin Cheng, Ge Zhang, Wenhao Huang, Chenghua Lin
Tags: VLM
24 Jul 2024 (22 / 2 / 0)

By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon, Biniyam Aschalew Tolera, Taesik Gong, Kimin Lee, Sung-Ju Lee
15 Jul 2024 (28 / 6 / 0)

Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa, John Lafferty
26 May 2024 (27 / 2 / 0)

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong, Pan Zhang, Yuhang Zang, Yuhang Cao, Bin Wang, ..., Conghui He, Xingcheng Zhang, Yu Qiao, Dahua Lin, Jiaqi Wang
Tags: VLM, MLLM
29 Jan 2024 (76 / 242 / 0)

MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
Tags: MLLM
14 Oct 2023 (154 / 280 / 0)

Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering
Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, A. Kalyan
Tags: ELM, ReLM, LRM
20 Sep 2022 (207 / 1,089 / 0)

Large Language Models are Zero-Shot Reasoners
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
Tags: ReLM, LRM
24 May 2022 (291 / 2,712 / 0)

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Tags: LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022 (315 / 8,261 / 0)