ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.16502
  4. Cited By
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning
  Benchmark for Expert AGI

MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

27 November 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
Ge Zhang
Samuel Stevens
Dongfu Jiang
Weiming Ren
Yuxuan Sun
Cong Wei
Botao Yu
Ruibin Yuan
Renliang Sun
Ming Yin
Boyuan Zheng
Zhenzhu Yang
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
    OSLM
    ELM
    VLM
ArXivPDFHTML

Papers citing "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

50 / 553 papers shown
Title
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation
Shota Onohara
Atsuyuki Miyai
Yuki Imajuku
Kazuki Egashira
Jeonghun Baek
Xiang Yue
Graham Neubig
Kiyoharu Aizawa
OSLM
75
1
0
22 Oct 2024
Teach Multimodal LLMs to Comprehend Electrocardiographic Images
Teach Multimodal LLMs to Comprehend Electrocardiographic Images
Ruoqi Liu
Yuelin Bai
Xiang Yue
Ping Zhang
17
3
0
21 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5%
  Parameters and 90% Performance
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
62
22
0
21 Oct 2024
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
LLaVA-KD: A Framework of Distilling Multimodal Large Language Models
Y. Cai
Jiangning Zhang
Haoyang He
Xinwei He
Ao Tong
Zhenye Gan
Chengjie Wang
X. Bai
VLM
19
2
0
21 Oct 2024
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large
  Multimodal Models
Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Fan Yang
Ming Tang
Jinqiao Wang
MLLM
43
1
0
21 Oct 2024
OpenMU: Your Swiss Army Knife for Music Understanding
OpenMU: Your Swiss Army Knife for Music Understanding
Mengjie Zhao
Zhi-Wei Zhong
Zhuoyuan Mao
Shiqi Yang
Wei-Hsiang Liao
Shusuke Takahashi
Hiromi Wakaki
Yuki Mitsufuji
OSLM
40
4
0
21 Oct 2024
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI
  with a Focus on Model Confidence
Dynamic Intelligence Assessment: Benchmarking LLMs on the Road to AGI with a Focus on Model Confidence
Norbert Tihanyi
Tamás Bisztray
Richard A. Dubniczky
Rebeka Tóth
B. Borsos
...
Ryan Marinelli
Lucas C. Cordeiro
Merouane Debbah
Vasileios Mavroeidis
Audun Josang
16
4
0
20 Oct 2024
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Baiqi Li
Zhiqiu Lin
Wenxuan Peng
Jean de Dieu Nyandwi
Daniel Jiang
Zixian Ma
Simran Khanuja
Ranjay Krishna
Graham Neubig
Deva Ramanan
AAML
CoGe
VLM
58
20
0
18 Oct 2024
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
LabSafety Bench: Benchmarking LLMs on Safety Issues in Scientific Labs
Yujun Zhou
Jingdong Yang
Kehan Guo
Pin-Yu Chen
Tian Gao
...
Tian Gao
Werner Geyer
Nuno Moniz
Nitesh V Chawla
Xiangliang Zhang
33
4
0
18 Oct 2024
Efficient Vision-Language Models by Summarizing Visual Tokens into
  Compact Registers
Efficient Vision-Language Models by Summarizing Visual Tokens into Compact Registers
Yuxin Wen
Qingqing Cao
Qichen Fu
Sachin Mehta
Mahyar Najibi
VLM
25
4
0
17 Oct 2024
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Can MLLMs Understand the Deep Implication Behind Chinese Images?
Chenhao Zhang
Xi Feng
Yuelin Bai
Xinrun Du
Jinchang Hou
...
Min Yang
Wenhao Huang
Chenghua Lin
Ge Zhang
Shiwen Ni
ELM
VLM
28
3
0
17 Oct 2024
Unearthing Skill-Level Insights for Understanding Trade-Offs of
  Foundation Models
Unearthing Skill-Level Insights for Understanding Trade-Offs of Foundation Models
Mazda Moayeri
Vidhisha Balachandran
Varun Chandrasekaran
Safoora Yousefi
Thomas Fel
S. Feizi
Besmira Nushi
Neel Joshi
Vibhav Vineet
24
2
0
17 Oct 2024
Harnessing Webpage UIs for Text-Rich Visual Understanding
Harnessing Webpage UIs for Text-Rich Visual Understanding
Junpeng Liu
Tianyue Ou
Yifan Song
Yuxiao Qu
Wai Lam
Chenyan Xiong
Wenhu Chen
Graham Neubig
Xiang Yue
74
5
0
17 Oct 2024
H2OVL-Mississippi Vision Language Models Technical Report
H2OVL-Mississippi Vision Language Models Technical Report
Shaikat Galib
Shanshan Wang
Guanshuo Xu
Pascal Pfeiffer
Ryan Chesler
Mark Landry
Sri Satish Ambati
MLLM
VLM
13
2
0
17 Oct 2024
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
RAP: Retrieval-Augmented Personalization for Multimodal Large Language Models
Haoran Hao
Jiaming Han
Changsheng Li
Yu-Feng Li
Xiangyu Yue
RALM
43
1
0
17 Oct 2024
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
UCFE: A User-Centric Financial Expertise Benchmark for Large Language Models
Yuzhe Yang
Yifei Zhang
Yan Hu
Y. Guo
Ruoli Gan
...
Haining Wang
Qianqian Xie
Jimin Huang
Honghai Yu
Benyou Wang
ELM
AIFin
35
1
0
17 Oct 2024
WorldMedQA-V: a multilingual, multimodal medical examination dataset for
  multimodal language models evaluation
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
João Matos
Shan Chen
Siena Placino
Yingya Li
Juan Carlos Climent Pardo
...
Hugo J. W. L. Aerts
L. A. Celi
A. I. Wong
Danielle S. Bitterman
Jack Gallifant
20
0
0
16 Oct 2024
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of
  Large Multimodal Models Through Coding Tasks
HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks
Fengji Zhang
Linquan Wu
Huiyu Bai
Guancheng Lin
Xiao Li
Xiao Yu
Yue Wang
Bei Chen
Jacky Keung
MLLM
ELM
LRM
32
0
0
16 Oct 2024
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of
  MLLMs
MC-Bench: A Benchmark for Multi-Context Visual Grounding in the Era of MLLMs
Yunqiu Xu
Linchao Zhu
Yi Yang
23
3
0
16 Oct 2024
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Understanding the Role of LLMs in Multimodal Evaluation Benchmarks
Botian Jiang
Lei Li
Xiaonan Li
Zhaowei Li
Xiachong Feng
Lingpeng Kong
Q. Liu
Xipeng Qiu
41
2
0
16 Oct 2024
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving
  Robust Reasoning in Large Language Models via Abstraction
Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction
Kaiqiao Han
Tianqing Fang
Zhaowei Wang
Y. Song
Mark Steedman
LRM
24
0
0
15 Oct 2024
When Does Perceptual Alignment Benefit Vision Representations?
When Does Perceptual Alignment Benefit Vision Representations?
Shobhita Sundaram
Stephanie Fu
Lukas Muttenthaler
Netanel Y. Tamir
Lucy Chai
Simon Kornblith
Trevor Darrell
Phillip Isola
49
12
1
14 Oct 2024
Balancing Continuous Pre-Training and Instruction Fine-Tuning:
  Optimizing Instruction-Following in LLMs
Balancing Continuous Pre-Training and Instruction Fine-Tuning: Optimizing Instruction-Following in LLMs
Ishan Jindal
Chandana Badrinath
Pranjal Bharti
Lakkidi Vinay
Sachin Dev Sharma
CLL
ALM
26
2
0
14 Oct 2024
Can We Predict Performance of Large Models across Vision-Language Tasks?
Can We Predict Performance of Large Models across Vision-Language Tasks?
Qinyu Zhao
Ming Xu
Kartik Gupta
Akshay Asthana
Liang Zheng
Stephen Gould
37
0
0
14 Oct 2024
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
MMIE: Massive Multimodal Interleaved Comprehension Benchmark for Large Vision-Language Models
Peng Xia
Siwei Han
Shi Qiu
Yiyang Zhou
Zhaoyang Wang
...
Chenhang Cui
Mingyu Ding
Linjie Li
Lijuan Wang
Huaxiu Yao
49
10
0
14 Oct 2024
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
3DArticCyclists: Generating Synthetic Articulated 8D Pose-Controllable Cyclist Data for Computer Vision Applications
Eduardo R. Corral-Soto
Yang Liu
Tongtong Cao
Y. Ren
Liu Bingbing
44
0
0
14 Oct 2024
Adapt-$\infty$: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
Adapt-∞\infty∞: Scalable Continual Multimodal Instruction Tuning via Dynamic Data Selection
A. Maharana
Jaehong Yoon
Tianlong Chen
Mohit Bansal
27
1
0
14 Oct 2024
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained
  Vision-Language Models
MMCOMPOSITION: Revisiting the Compositionality of Pre-trained Vision-Language Models
Hang Hua
Yunlong Tang
Ziyun Zeng
Liangliang Cao
Zhengyuan Yang
Hangfeng He
Chenliang Xu
Jiebo Luo
VLM
CoGe
31
9
0
13 Oct 2024
Skipping Computations in Multimodal LLMs
Skipping Computations in Multimodal LLMs
Mustafa Shukor
Matthieu Cord
24
2
0
12 Oct 2024
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language
  Models Alignment
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment
Lei Li
Zhihui Xie
Mukai Li
Shunian Chen
Peiyi Wang
L. Chen
Yazheng Yang
Benyou Wang
Lingpeng Kong
Q. Liu
VLM
ALM
34
16
0
12 Oct 2024
Dynamic Multimodal Evaluation with Flexible Complexity by
  Vision-Language Bootstrapping
Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
Yue Yang
S. Zhang
Wenqi Shao
Kaipeng Zhang
Yi Bin
Yu Wang
Ping Luo
28
3
0
11 Oct 2024
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
H. Xia
Zhengbang Yang
Junbo Zou
Rhys Tracy
Yuqing Wang
...
Xun Shao
Zhuoqing Xie
Yuan-fang Wang
Weining Shen
Hanjie Chen
ReLM
LRM
ELM
31
2
0
11 Oct 2024
Insight Over Sight? Exploring the Vision-Knowledge Conflicts in
  Multimodal LLMs
Insight Over Sight? Exploring the Vision-Knowledge Conflicts in Multimodal LLMs
Xiaoyuan Liu
Wenxuan Wang
Youliang Yuan
Jen-tse Huang
Qiuzhi Liu
Pinjia He
Zhaopeng Tu
40
0
0
10 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
59
25
0
10 Oct 2024
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
MRAG-Bench: Vision-Centric Evaluation for Retrieval-Augmented Multimodal Models
Wenbo Hu
Jia-Chen Gu
Zi-Yi Dou
Mohsen Fayyaz
Pan Lu
Kai-Wei Chang
Nanyun Peng
VLM
58
4
0
10 Oct 2024
VHELM: A Holistic Evaluation of Vision Language Models
VHELM: A Holistic Evaluation of Vision Language Models
Tony Lee
Haoqin Tu
Chi Heem Wong
Wenhao Zheng
Yiyang Zhou
...
Josselin Somerville Roberts
Michihiro Yasunaga
Huaxiu Yao
Cihang Xie
Percy Liang
VLM
37
10
0
09 Oct 2024
Pixtral 12B
Pixtral 12B
Pravesh Agrawal
Szymon Antoniak
Emma Bou Hanna
Baptiste Bout
Devendra Singh Chaplot
...
Joachim Studnia
Sandeep Subramanian
Sagar Vaze
Thomas Wang
Sophia Yang
VLM
MLLM
28
48
0
09 Oct 2024
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
ING-VP: MLLMs cannot Play Easy Vision-based Games Yet
Haoran Zhang
Hangyu Guo
Shuyue Guo
Meng Cao
Wenhao Huang
Jiaheng Liu
Ge Zhang
VLM
MLLM
LRM
35
2
0
09 Oct 2024
On the Modeling Capabilities of Large Language Models for Sequential
  Decision Making
On the Modeling Capabilities of Large Language Models for Sequential Decision Making
Martin Klissarov
Devon Hjelm
Alexander Toshev
Bogdan Mazoure
LM&Ro
ELM
OffRL
LRM
29
2
0
08 Oct 2024
Intuitions of Compromise: Utilitarianism vs. Contractualism
Intuitions of Compromise: Utilitarianism vs. Contractualism
Jared Moore
Yejin Choi
Sydney Levine
21
0
0
07 Oct 2024
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta
Shreyas Verma
Ujjwala Anantheswaran
Kevin Scaria
Mihir Parmar
Swaroop Mishra
Chitta Baral
ReLM
LRM
24
4
0
06 Oct 2024
VISTA: A Visual and Textual Attention Dataset for Interpreting
  Multimodal Models
VISTA: A Visual and Textual Attention Dataset for Interpreting Multimodal Models
Harshit
Tolga Tasdizen
CoGe
VLM
17
1
0
06 Oct 2024
FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering
FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering
Siqiao Xue
Tingting Chen
Fan Zhou
Qingyang Dai
Zhixuan Chu
Hongyuan Mei
34
4
0
06 Oct 2024
Improving LLM Reasoning through Scaling Inference Computation with
  Collaborative Verification
Improving LLM Reasoning through Scaling Inference Computation with Collaborative Verification
Zhenwen Liang
Ye Liu
Tong Niu
Xiangliang Zhang
Yingbo Zhou
Semih Yavuz
LRM
30
17
0
05 Oct 2024
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Gamified crowd-sourcing of high-quality data for visual fine-tuning
Shashank Yadav
Rohan Tomar
Garvit Jain
Chirag Ahooja
Shubham Chaudhary
Charles Elkan
28
0
0
05 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
71
25
0
04 Oct 2024
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal
  Foundation Models
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models
Zhengfeng Lai
Vasileios Saveris
C. L. P. Chen
Hong-You Chen
Haotian Zhang
...
Wenze Hu
Zhe Gan
Peter Grasch
Meng Cao
Yinfei Yang
VLM
30
3
0
03 Oct 2024
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Sara Ghazanfari
Alexandre Araujo
P. Krishnamurthy
Siddharth Garg
Farshad Khorrami
VLM
52
1
0
02 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Zhihan Zhang
Siru Ouyang
Hongming Zhang
Meng-Long Jiang
Dong Yu
VLM
29
5
0
02 Oct 2024
Visual Perception in Text Strings
Visual Perception in Text Strings
Qi Jia
Xiang Yue
Shanshan Huang
Ziheng Qin
Yizhu Liu
Bill Yuchen Lin
Yang You
VLM
38
1
0
02 Oct 2024
Previous
123...567...101112
Next