Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2311.16502
Cited By
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
50 / 553 papers shown
Title
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding
Xinyu Ma
Ziyang Ding
Zhicong Luo
C. L. P. Chen
Zonghao Guo
Derek F. Wong
Xiaoyi Feng
Maosong Sun
VLM
LRM
66
0
0
17 Mar 2025
Sightation Counts: Leveraging Sighted User Feedback in Building a BLV-aligned Dataset of Diagram Descriptions
Wan Ju Kang
Eunki Kim
Na Min An
Sangryul Kim
Haemin Choi
Ki Hoon Kwak
James Thorne
47
0
0
17 Mar 2025
NuPlanQA: A Large-Scale Dataset and Benchmark for Multi-View Driving Scene Understanding in Multi-Modal Large Language Models
Sung-Yeon Park
Can Cui
Yunsheng Ma
Ahmadreza Moradipari
Rohit Gupta
Kyungtae Han
Ziran Wang
34
0
0
17 Mar 2025
MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling
Yingyue Li
Bencheng Liao
Wenyu Liu
Xinggang Wang
Mamba
58
0
0
17 Mar 2025
Task-Oriented Feature Compression for Multimodal Understanding via Device-Edge Co-Inference
Cheng Yuan
Z. Liu
Jiashu Lv
Jiawei Shao
Yufei Jiang
J. Zhang
Xuelong Li
43
0
0
17 Mar 2025
BREEN: Bridge Data-Efficient Encoder-Free Multimodal Learning with Learnable Queries
Tianle Li
Yongming Rao
Winston Hu
Yu Cheng
MLLM
61
0
0
16 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Y. Wang
Shengqiong Wu
Y. Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
74
7
0
16 Mar 2025
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era
Kanzhi Cheng
Wenpo Song
Jiaxin Fan
Zheng Ma
Qiushi Sun
Fangzhi Xu
Chenyang Yan
Nuo Chen
Jianbing Zhang
Jiajun Chen
MLLM
VLM
45
1
0
16 Mar 2025
VisTW: Benchmarking Vision-Language Models for Traditional Chinese in Taiwan
Zhi Rui Tam
Ya-Ting Pai
Yen-Wei Lee
Yun-Nung Chen
CoGe
99
0
0
13 Mar 2025
SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems
Ziyu Guo
Ray Zhang
Hao Chen
Jialin Gao
Dongzhi Jiang
Jiaze Wang
Pheng-Ann Heng
48
2
0
13 Mar 2025
Towards Understanding Graphical Perception in Large Multimodal Models
Kai Zhang
Jianwei Yang
J. Inala
Chandan Singh
Jianfeng Gao
Yu Su
Chenglong Wang
40
1
0
13 Mar 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
Yiming Jia
J. Li
Xiang Yue
Bo Li
Ping Nie
Kai Zou
Wenhu Chen
LRM
74
2
0
13 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
58
9
0
13 Mar 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
Rui Yang
Lin Song
Yicheng Xiao
Runhui Huang
Yixiao Ge
Ying Shan
Hengshuang Zhao
MLLM
62
0
0
12 Mar 2025
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
Luozheng Qin
Zhiyu Tan
Mengping Yang
Xiaomeng Yang
Hao Li
80
0
0
12 Mar 2025
OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
Mamba
MLLM
74
1
0
11 Mar 2025
Painting with Words: Elevating Detailed Image Captioning with Benchmark and Alignment Learning
Qinghao Ye
Xianhan Zeng
Fu Li
C. Li
Haoqi Fan
CoGe
80
0
0
10 Mar 2025
ALLVB: All-in-One Long Video Understanding Benchmark
Xichen Tan
Yuanjing Luo
Yunfan Ye
Fang Liu
Zhiping Cai
MLLM
VLM
61
0
0
10 Mar 2025
MapQA: Open-domain Geospatial Question Answering on Map Data
Zekun Li
Malcolm Grossman
Eric
Qasemi
Mihir Kulkarni
Muhao Chen
Yao-Yi Chiang
45
0
0
10 Mar 2025
Statistical Study of Sensor Data and Investigation of ML-based Calibration Algorithms for Inexpensive Sensor Modules: Experiments from Cape Point
Travis Barrett
Amit Kumar Mishra
43
0
0
09 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Z. Chen
Chunwei Wang
Xiuwei Chen
Hang Xu
J. Han
Xiandan Liang
VLM
67
1
0
09 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
K. Zhang
ELM
59
0
0
09 Mar 2025
GenieBlue: Integrating both Linguistic and Multimodal Capabilities for Large Language Models on Mobile Devices
Xudong Lu
Yinghao Chen
Renshou Wu
Haohao Gao
Xi Chen
...
Fangyuan Li
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
68
0
0
08 Mar 2025
CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Mohsen Gholami
Mohammad Akbari
Kevin Cannons
Yong Zhang
63
0
0
07 Mar 2025
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Ailin Deng
Tri Cao
Zhirui Chen
Bryan Hooi
VLM
93
2
0
04 Mar 2025
Seeing is Understanding: Unlocking Causal Attention into Modality-Mutual Attention for Multimodal LLMs
Wei-Yao Wang
Zhao Wang
Helen Suzuki
Yoshiyuki Kobayashi
LRM
48
1
0
04 Mar 2025
DivPrune: Diversity-based Visual Token Pruning for Large Multimodal Models
Saeed Ranjbar Alvar
Gursimran Singh
Mohammad Akbari
Yong Zhang
VLM
71
0
0
04 Mar 2025
OWLViz: An Open-World Benchmark for Visual Question Answering
T. Nguyen
Dang Nguyen
Hoang Nguyen
Thuan Luong
Long Hoang Dang
Viet Dac Lai
VLM
61
0
0
04 Mar 2025
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
Abdelrahman Abouelenin
Atabak Ashfaq
Adam Atkinson
Hany Awadalla
Nguyen Bach
...
Ishmam Zabir
Yunan Zhang
Li Zhang
Y. Zhang
Xiren Zhou
MoE
SyDa
68
18
0
03 Mar 2025
AesthetiQ: Enhancing Graphic Layout Design via Aesthetic-Aware Preference Alignment of Multi-modal Large Language Models
Sohan Patnaik
Rishabh Jain
Balaji Krishnamurthy
Mausoom Sarkar
26
0
0
01 Mar 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
P. Wang
Zhongzhi Li
Fei Yin
Dekang Ran
Chenglin Liu
Cheng-Lin Liu
LRM
42
3
0
28 Feb 2025
FC-Attack: Jailbreaking Large Vision-Language Models via Auto-Generated Flowcharts
Ziyi Zhang
Zhen Sun
Z. Zhang
Jihui Guo
Xinlei He
AAML
39
2
0
28 Feb 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
80
5
0
28 Feb 2025
Picking the Cream of the Crop: Visual-Centric Data Selection with Collaborative Agents
Zhenyu Liu
Yunxin Li
Baotian Hu
Wenhan Luo
Yaowei Wang
Min-Ling Zhang
60
0
0
27 Feb 2025
Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther
Xiaozhe Yao
Tolga Kerimoglu
Ana Klimovic
Viktor Gsteiger
Ana Klimovic
MoE
74
0
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
74
3
0
26 Feb 2025
Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference
Zhuo Chen
Xinyu Wang
Yong-feng Jiang
Zhen Zhang
Xinyu Geng
Pengjun Xie
Fei Huang
Kewei Tu
98
0
0
25 Feb 2025
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference
Xiangyu Zhao
Shengyuan Ding
Zicheng Zhang
Haian Huang
Maosong Cao
...
Wenhai Wang
Guangtao Zhai
Haodong Duan
Hua Yang
Kai Chen
86
6
0
25 Feb 2025
Evaluating Multimodal Generative AI with Korean Educational Standards
S. Park
Geewook Kim
AI4Ed
ELM
65
0
0
24 Feb 2025
PosterSum: A Multimodal Benchmark for Scientific Poster Summarization
Rohit Saxena
Pasquale Minervini
Frank Keller
VLM
64
0
0
24 Feb 2025
All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark
Davide Testa
Giovanni Bonetta
Raffaella Bernardi
Alessandro Bondielli
Alessandro Lenci
Alessio Miaschi
Lucia Passaro
Bernardo Magnini
VGen
LRM
50
0
0
24 Feb 2025
MMAD: A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection
Xi Jiang
Jian Li
Hanqiu Deng
Y. Liu
Bin-Bin Gao
Yifeng Zhou
Jialin Li
Chengjie Wang
Feng Zheng
50
0
0
24 Feb 2025
Visual Reasoning Evaluation of Grok, Deepseek Janus, Gemini, Qwen, Mistral, and ChatGPT
Nidhal Jegham
Marwan Abdelatti
Abdeltawab Hendawi
VLM
LRM
47
0
0
23 Feb 2025
Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models
Qianqi Yan
Yue Fan
Hongquan Li
Shan Jiang
Yang Zhao
Xinze Guan
Ching-Chen Kuo
X. Wang
VLM
LRM
58
2
0
22 Feb 2025
Moving Beyond Medical Exam Questions: A Clinician-Annotated Dataset of Real-World Tasks and Ambiguity in Mental Healthcare
Max Lamparth
D. Grabb
Amy Franks
Scott Gershan
Kaitlyn N. Kunstman
...
Monika Drummond Roots
Manu Sharma
Aryan Shrivastava
N. Vasan
Colleen Waickman
36
2
0
22 Feb 2025
LOVA3: Learning to Visual Question Answering, Asking and Assessment
Henry Hengyuan Zhao
Pan Zhou
Difei Gao
Zechen Bai
Mike Zheng Shou
69
8
0
21 Feb 2025
Symmetrical Visual Contrastive Optimization: Aligning Vision-Language Models with Minimal Contrastive Images
Shengguang Wu
Fan-Yun Sun
Kaiyue Wen
Nick Haber
VLM
72
0
0
20 Feb 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
Seunghee Kim
Changhyeon Kim
Taeuk Kim
LRM
69
1
0
20 Feb 2025
Evaluating Large Language Models for Public Health Classification and Extraction Tasks
Joshua Harris
Timothy Laurence
Leo Loman
Fan Grayson
Toby Nonnenmacher
...
Hamish Mohammed
Thomas Finnie
Luke Hounsome
Michael Borowitz
Steven Riley
LM&MA
AI4MH
79
5
0
20 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
38
0
0
20 Feb 2025
Previous
1
2
3
4
5
6
...
10
11
12
Next