2308.06595
Cited By
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
12 August 2023
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schmidt
VLM
Papers citing
"VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use"
50 / 75 papers shown
Title
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Y. Cao
D. Lin
Jiaqi Wang
OffRL
46
1
0
10 Apr 2025
Taxonomy-Aware Evaluation of Vision-Language Models
Vésteinn Snæbjarnarson
Kevin Du
Niklas Stoehr
Serge J. Belongie
Ryan Cotterell
Nico Lang
Stella Frank
24
0
0
07 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
42
0
0
04 Apr 2025
XLRS-Bench: Could Your Multimodal LLMs Understand Extremely Large Ultra-High-Resolution Remote Sensing Imagery?
Fengxiang Wang
H. Wang
Mingshuo Chen
Di Wang
Yulin Wang
...
L. Lan
Wenjing Yang
J. Zhang
Zhiyuan Liu
Maosong Sun
52
2
0
31 Mar 2025
When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
Tuo Liang
Zhe Hu
Jing Li
Hao Zhang
Yiren Lu
...
Yiran Qiao
Disheng Liu
Jeirui Peng
Jing Ma
Yu Yin
44
0
0
29 Mar 2025
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models
Muzhi Dai
Jiashuo Sun
Zhiyuan Zhao
Shixuan Liu
Rui Li
Junyu Gao
Xuelong Li
VLM
48
0
0
08 Mar 2025
OWLViz: An Open-World Benchmark for Visual Question Answering
T. Nguyen
Dang Nguyen
Hoang Nguyen
Thuan Luong
Long Hoang Dang
Viet Dac Lai
VLM
56
0
0
04 Mar 2025
MMSciBench: Benchmarking Language Models on Multimodal Scientific Problems
Xinwu Ye
Chengfan Li
Siming Chen
Xiangru Tang
Wei Wei
LRM
29
1
0
27 Feb 2025
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
Yi-Fan Zhang
Hang Li
D. Song
Lichao Sun
Tianlong Xu
Qingsong Wen
LLMAG
LRM
74
2
0
20 Feb 2025
MMAR: Towards Lossless Multi-Modal Auto-Regressive Probabilistic Modeling
Jian Yang
Dacheng Yin
Yizhou Zhou
Fengyun Rao
Wei-dong Zhai
Yang Cao
Zheng-jun Zha
DiffM
18
5
0
14 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM
LRM
24
2
0
06 Oct 2024
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
Phillip Mueller
Sebastian Mueller
Lars Mikelsons
9
1
0
25 Sep 2024
A Survey on Multimodal Benchmarks: In the Era of Large AI Models
Lin Li
Guikun Chen
Hanrong Shi
Jun Xiao
Long Chen
34
8
0
21 Sep 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe
VLM
31
2
0
19 Sep 2024
IW-Bench: Evaluating Large Multimodal Models for Converting Image-to-Web
Hongcheng Guo
Wei Zhang
Junhao Chen
Yaonan Gu
Jian Yang
...
Binyuan Hui
Tianyu Liu
Jianxin Ma
Chang Zhou
Zhoujun Li
15
1
0
14 Sep 2024
MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
Yi-Fan Zhang
Huanyu Zhang
Haochen Tian
Chaoyou Fu
Shuangqing Zhang
...
Qingsong Wen
Zhang Zhang
L. Wang
Rong Jin
Tieniu Tan
OffRL
38
35
0
23 Aug 2024
UniBench: Visual Reasoning Requires Rethinking Vision-Language Beyond Scaling
Haider Al-Tahan
Q. Garrido
Randall Balestriero
Diane Bouchacourt
C. Hazirbas
Mark Ibrahim
VLM
27
10
0
09 Aug 2024
Visual Riddles: a Commonsense and World Knowledge Challenge for Large Vision and Language Models
Nitzan Bitton-Guetta
Aviv Slobodkin
Aviya Maimon
Eliya Habba
Royi Rassin
Yonatan Bitton
Idan Szpektor
Amir Globerson
Yuval Elovici
ReLM
VLM
LRM
26
5
0
28 Jul 2024
LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts
Yijia Xiao
Edward Sun
Tianyu Liu
Wei Wang
LRM
22
28
0
06 Jul 2024
Spontaneous Reward Hacking in Iterative Self-Refinement
Jane Pan
He He
Samuel R. Bowman
Shi Feng
12
10
0
05 Jul 2024
HEMM: Holistic Evaluation of Multimodal Foundation Models
Paul Pu Liang
Akshay Goindani
Talha Chafekar
Leena Mathur
Haofei Yu
Ruslan Salakhutdinov
Louis-Philippe Morency
29
10
0
03 Jul 2024
VIVA: A Benchmark for Vision-Grounded Decision-Making with Human Values
Zhe Hu
Yixiao Ren
Jing Li
Yu Yin
VLM
20
0
0
03 Jul 2024
Why do LLaVA Vision-Language Models Reply to Images in English?
Musashi Hinck
Carolin Holtermann
M. L. Olson
Florian Schneider
Sungduk Yu
Anahita Bhiwandiwalla
Anne Lauscher
Shaoyen Tseng
Vasudev Lal
VLM
28
4
0
02 Jul 2024
MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs
Yusu Qian
Hanrong Ye
J. Fauconnier
Peter Grasch
Yinfei Yang
Zhe Gan
102
13
0
01 Jul 2024
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs
Wenqian Ye
Guangtao Zheng
Yunsheng Ma
Xu Cao
Bolin Lai
James M. Rehg
Aidong Zhang
20
10
0
24 Jun 2024
Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Joanna Matthiesen
Kevin A. Smith
J. Tenenbaum
ELM
LRM
34
6
0
22 Jun 2024
WildVision: Evaluating Vision-Language Models in the Wild with Human Preferences
Yujie Lu
Dongfu Jiang
Wenhu Chen
William Yang Wang
Yejin Choi
Bill Yuchen Lin
VLM
33
26
0
16 Jun 2024
AlignMMBench: Evaluating Chinese Multimodal Alignment in Large Vision-Language Models
Yuhang Wu
Wenmeng Yu
Yean Cheng
Yan Wang
Xiaohan Zhang
Jiazheng Xu
Ming Ding
Yuxiao Dong
30
1
0
13 Jun 2024
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases
Rithesh Murthy
Liangwei Yang
Juntao Tan
Tulika Awalgaonkar
Yilun Zhou
...
Zuxin Liu
Ming Zhu
Huan Wang
Caiming Xiong
Silvio Savarese
49
5
0
12 Jun 2024
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He
Weixi Feng
Kaizhi Zheng
Yujie Lu
Wanrong Zhu
...
Zhengyuan Yang
Kevin Lin
William Yang Wang
Lijuan Wang
Xin Eric Wang
VGen
LRM
25
12
0
12 Jun 2024
DiffuSyn Bench: Evaluating Vision-Language Models on Real-World Complexities with Diffusion-Generated Synthetic Benchmarks
Haokun Zhou
Yipeng Hong
VLM
EGVM
13
1
0
06 Jun 2024
VideoPhy: Evaluating Physical Commonsense for Video Generation
Hritik Bansal
Zongyu Lin
Tianyi Xie
Zeshun Zong
Michal Yarom
Yonatan Bitton
Chenfanfu Jiang
Yizhou Sun
Kai-Wei Chang
Aditya Grover
EGVM
VGen
21
36
0
05 Jun 2024
Cracking the Code of Juxtaposition: Can AI Models Understand the Humorous Contradictions
Zhe Hu
Tuo Liang
Jing Li
Yiren Lu
Yunlai Zhou
Yiran Qiao
Jing Ma
Yu Yin
25
4
0
29 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao-Yu Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
29
15
0
08 May 2024
SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension
Bohao Li
Yuying Ge
Yi Chen
Yixiao Ge
Ruimao Zhang
Ying Shan
VLM
22
27
0
25 Apr 2024
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
Samuel R. Bowman
Shi Feng
20
152
0
15 Apr 2024
What Are We Measuring When We Evaluate Large Vision-Language Models? An Analysis of Latent Factors and Biases
A. M. H. Tiong
Junqi Zhao
Boyang Albert Li
Junnan Li
S. Hoi
Caiming Xiong
27
7
0
03 Apr 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal
Ashima Suvarna
Gantavya Bhatt
Nanyun Peng
Kai-Wei Chang
Aditya Grover
ALM
45
9
0
31 Mar 2024
Exploring the Frontier of Vision-Language Models: A Survey of Current Methodologies and Future Directions
Akash Ghosh
Arkadeep Acharya
Sriparna Saha
Vinija Jain
Aman Chadha
VLM
28
9
0
20 Feb 2024
Evaluating Image Review Ability of Vision Language Models
Shigeki Saito
Kazuki Hayashi
Yusuke Ide
Yusuke Sakai
Kazuma Onishi
Toma Suzuki
Seiji Gobara
Hidetaka Kamigaito
Katsuhiko Hayashi
Taro Watanabe
24
0
0
19 Feb 2024
SceMQA: A Scientific College Entrance Level Multimodal Question Answering Benchmark
Zhenwen Liang
Kehan Guo
Gang Liu
Taicheng Guo
Yujun Zhou
Tianyu Yang
Jiajun Jiao
Renjie Pi
Jipeng Zhang
Xiangliang Zhang
ELM
13
5
0
06 Feb 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar
Zhenshi Li
Feng-Xue Gu
Xue-liang Zhang
P. Xiao
59
46
0
04 Feb 2024
Binding Touch to Everything: Learning Unified Multimodal Tactile Representations
Fengyu Yang
Chao Feng
Ziyang Chen
Hyoungseob Park
Daniel Wang
...
Ziyao Zeng
Xien Chen
Rit Gangopadhyay
Andrew Owens
Alex Wong
28
38
0
31 Jan 2024
MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
Wai-Chung Kwan
Xingshan Zeng
Yuxin Jiang
Yufei Wang
Liangyou Li
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
ELM
12
14
0
30 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
17
66
0
10 Jan 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
...
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
28
42
0
03 Jan 2024
Visual Instruction Tuning towards General-Purpose Multimodal Model: A Survey
Jiaxing Huang
Jingyi Zhang
Kai Jiang
Han Qiu
Shijian Lu
22
22
0
27 Dec 2023
GlitchBench: Can large multimodal models detect video game glitches?
Mohammad Reza Taesiri
Tianjun Feng
Anh Nguyen
C. Bezemer
MLLM
VLM
LRM
17
9
0
08 Dec 2023
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?
Xiujun Li
Yujie Lu
Zhe Gan
Jianfeng Gao
William Yang Wang
Yejin Choi
VLM
MLLM
17
1
0
29 Nov 2023
MM-SafetyBench: A Benchmark for Safety Evaluation of Multimodal Large Language Models
Xin Liu
Yichen Zhu
Jindong Gu
Yunshi Lan
Chao Yang
Yu Qiao
19
39
0
29 Nov 2023