MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI

24 April 2024
Kaining Ying
Fanqing Meng
Jin Wang
Zhiqiang Li
Han Lin
Yue Yang
Hao Zhang
Wenbo Zhang
Yuqi Lin
Shuo Liu
Jiayi Lei
Quanfeng Lu
Runjian Chen
Peng Xu
Renrui Zhang
Haozhe Zhang
Shiyang Feng
Yali Wang
Yuning Qiao
Ping Luo
Kaipeng Zhang
Wenqi Shao

Papers citing "MMT-Bench: A Comprehensive Multimodal Benchmark for Evaluating Large Vision-Language Models Towards Multitask AGI"

50 / 69 papers shown
Jina-VLM: Small Multilingual Vision Language Model
Andreas Koukounas
Georgios Mastrapas
Florian Hönicke
Sedigheh Eslami
Guillaume Roncari
Scott Martens
Han Xiao
MLLM
359
0
0
03 Dec 2025
Multimodal Reinforcement Learning with Agentic Verifier for AI Agents
Reuben Tan
Baolin Peng
Zhengyuan Yang
Hao Cheng
Oier Mees
...
Xiaodong Liu
Lijuan Wang
Marc Pollefeys
Yong Jae Lee
Jianfeng Gao
OffRL, LRM
192
1
0
03 Dec 2025
AVFakeBench: A Comprehensive Audio-Video Forgery Detection Benchmark for AV-LMMs
Shuhan Xia
Peipei Li
Xuannan Liu
Dongsen Zhang
Xinyu Guo
Zekun Li
AAML
223
0
0
26 Nov 2025
SPHINX: A Synthetic Environment for Visual Perception and Reasoning
Md Tanvirul Alam
Saksham Aggarwal
Justin Yang Chae
Nidhi Rastogi
ReLM, LRM
312
0
0
25 Nov 2025
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Yiming Qin
Bomin Wei
Jiaxin Ge
Konstantinos Kallidromitis
Stephanie Fu
Trevor Darrell
Xudong Wang
LRM, VLM
260
1
0
24 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
313
2
0
06 Nov 2025
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Z. He
MLLM, MoE, VLM
351
2
0
28 Oct 2025
PRISM-Bench: A Benchmark of Puzzle-Based Visual Tasks with CoT Error Detection
Yusu Qian
Cheng Wan
Chao Jia
Yinfei Yang
Qingyu Zhao
Zhe Gan
LRM, ReLM
512
1
0
27 Oct 2025
VERITAS: Leveraging Vision Priors and Expert Fusion to Improve Multimodal Data
Tingqiao Xu
Ziru Zeng
Jiayu Chen
92
0
0
17 Oct 2025
ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution
Long Cui
Weiyun Wang
Jie Shao
Zichen Wen
Gen Luo
Linfeng Zhang
Y. Zhang
Yu Qiao
Wenhai Wang
VLM
186
2
0
14 Oct 2025
Improving Temporal Understanding Logic Consistency in Video-Language Models via Attention Enhancement
Chengzhi Li
Heyan Huang
Ping Jian
Zhen Yang
Yaning Tian
107
0
0
09 Oct 2025
AstroMMBench: A Benchmark for Evaluating Multimodal Large Language Models Capabilities in Astronomy
Jinghang Shi
Xiao Yu Tang
Yang Huang
Yuyang Li
Xiaokong
Yanxia Zhang
Caizhan Yue
194
0
0
29 Sep 2025
OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment
Teng Xiao
Zuchao Li
Lefei Zhang
184
1
0
23 Sep 2025
ORIC: Benchmarking Object Recognition under Contextual Incongruity in Large Vision-Language Models
Zhaoyang Li
Z. Ling
Yuchen Zhou
Litian Gong
Erdem Bıyık
H. Su
212
0
0
19 Sep 2025
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
Daxiang Dong
Mingming Zheng
Dong Xu
Bairong Zhuang
W. Zhang
...
Ruchang Yao
Ziye Yuan
J. Wu
Guangjun Xie
Dou Shen
VLM
99
1
0
19 Sep 2025
A Multi-To-One Interview Paradigm for Efficient MLLM Evaluation
Ye Shen
Junying Wang
Farong Wen
Yijin Guo
Qi Jia
Zicheng Zhang
Guangtao Zhai
140
0
0
18 Sep 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
Tianyu Yu
Zefan Wang
Chongyi Wang
Fuwei Huang
Wenshuo Ma
...
Ning Ding
Xu Han
Xingtai Lv
Zhiyuan Liu
Maosong Sun
MLLM, VLM
198
24
0
16 Sep 2025
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency
Weiyun Wang
Zhangwei Gao
Lixin Gu
Hengjun Pu
Long Cui
...
Bowen Zhou
Kai Chen
Yu Qiao
Wenhai Wang
Gen Luo
MLLM, LRM
305
279
0
25 Aug 2025
HumanPCR: Probing MLLM Capabilities in Diverse Human-Centric Scenes
Keliang Li
Hongze Shen
Hao Shi
Ruibing Hou
Hong Chang
...
Wen Wang
Yiling Wu
Shihong Deng
Shiguang Shan
Xilin Chen
LRM
182
1
0
19 Aug 2025
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Xin Guan
Peng Xia
Zhen Zhang
Xinyu Wang
Qiuchen Wang
...
Kuan Li
Yong Jiang
Pengjun Xie
Fei Huang
Jingren Zhou
340
32
0
07 Aug 2025
OS Agents: A Survey on MLLM-based Agents for General Computing Devices Use
Xueyu Hu
Tao Xiong
Biao Yi
Zishu Wei
Ruixuan Xiao
...
Zhou Zhao
Hongxia Yang
Fan Wu
Shengyu Zhang
Fei Wu
LLMAG, LM&Ro, AI4TS
241
31
0
06 Aug 2025
Evaluating Variance in Visual Question Answering Benchmarks
Nikitha SR
LRM
160
0
0
04 Aug 2025
Towards Omnimodal Expressions and Reasoning in Referring Audio-Visual Segmentation
Kaining Ying
Henghui Ding
Guangquan Jie
Yu Jiang
VOS
327
5
0
30 Jul 2025
MOVE: Motion-Guided Few-Shot Video Object Segmentation
Kaining Ying
Hengrui Hu
Henghui Ding
VOS
244
3
0
29 Jul 2025
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Xiaolong Wang
Zhaolu Kang
Wangyuxuan Zhai
Xinyue Lou
Yunghwei Lai
...
Yawen Wang
Kaiyu Huang
Yile Wang
Peng Li
Wenshu Fan
193
0
0
20 Jun 2025
Aligning MLLM Benchmark With Human Preferences via Structural Equation Modeling
Shengwu Xiong
Tianyu Zou
Cong Wang
Xuelong Li
111
0
0
13 Jun 2025
Burn After Reading: Do Multimodal Large Language Models Truly Capture Order of Events in Image Sequences?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yingjin Song
Yupei Du
Denis Paperno
Albert Gatt
MLLM
298
1
0
12 Jun 2025
CoMemo: LVLMs Need Image Context with Image Memory
Shi-Qi Liu
Weijie Su
Xizhou Zhu
Wenhai Wang
Jifeng Dai
VLM
218
0
0
06 Jun 2025
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos
Kejian Zhu
Zhuoran Jin
Hongbang Yuan
Jiachun Li
Shangqing Tu
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
VLM, LRM
211
8
0
04 Jun 2025
Abstractive Visual Understanding of Multi-modal Structured Knowledge: A New Perspective for MLLM Evaluation
Yichi Zhang
Zhuo Chen
Lingbing Guo
Yajing Xu
M. Zhang
Wen Zhang
H. Chen
209
2
0
02 Jun 2025
Affordance Benchmark for MLLMs
Junying Wang
Wenzhe Li
Yalun Wu
Yingji Liang
Yijin Guo
Chunyi Li
Haodong Duan
Zicheng Zhang
Guangtao Zhai
247
4
0
01 Jun 2025
Improve MLLM Benchmark Efficiency through Interview
Farong Wen
Yijin Guo
Junying Wang
Jiaohao Xiao
Yingjie Zhou
Chunyi Li
Qi Jia
Guangtao Zhai
Zicheng Zhang
MLLM
225
2
0
01 Jun 2025
AutoJudger: An Agent-Driven Framework for Efficient Benchmarking of MLLMs
Xuanwen Ding
Chengjun Pan
Zejun Li
Jiwen Zhang
Siyuan Wang
Zhongyu Wei
262
0
0
27 May 2025
Bias and Generalizability of Foundation Models across Datasets in Breast Mammography
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Elodie Germani
Selin Türk Ilayda
Zeineddine Fatima
Mourad Charbel
Shadi Albarqouni
AI4CE
339
3
0
14 May 2025
SITE: towards Spatial Intelligence Thorough Evaluation
Wenjie Wang
Reuben Tan
Pengyue Zhu
Jianwei Yang
Zhengyuan Yang
Lijuan Wang
Andrey Kolobov
Jianfeng Gao
Boqing Gong
293
6
0
08 May 2025
Towards Explainable Fake Image Detection with Multi-Modal Large Language Models
Yikun Ji
Y. Hong
Jiahui Zhan
H. Chen
Jun Lan
Huijia Zhu
Weiqiang Wang
Guang Dai
Jianfu Zhang
MLLM, LRM
512
4
0
19 Apr 2025
Evaluating Menu OCR and Translation: A Benchmark for Aligning Human and Automated Evaluations in Large Vision-Language Models
Zhanglin Wu
Tengfei Song
Ning Xie
Mengli Zhu
Weidong Zhang
...
Pengfei Li
Chong Li
Junhao Zhu
Hao Yang
Shiliang Sun
471
2
0
16 Apr 2025
Resampling Benchmark for Efficient Comprehensive Evaluation of Large Vision-Language Models
Teppei Suzuki
Keisuke Ozawa
VLM
483
0
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Ziwei Liu
Shenglong Ye
...
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
Wei Wang
MLLM, VLM
619
806
1
14 Apr 2025
MM-IFEngine: Towards Multimodal Instruction Following
Shengyuan Ding
Shenxi Wu
Xiangyu Zhao
Yuhang Zang
Haodong Duan
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Dahua Lin
Jiaqi Wang
OffRL
520
19
0
10 Apr 2025
Forensics-Bench: A Comprehensive Forgery Detection Benchmark Suite for Large Vision Language Models
Computer Vision and Pattern Recognition (CVPR), 2025
Jin Wang
Chenghui Lv
Xian Li
Shichao Dong
Huadong Li
Kelu Yao
Chao Li
Wenqi Shao
Ping Luo
418
10
0
19 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Yujiao Shi
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Zheng Zhang
Yan Huang
Liang Wang
Tieniu Tan
833
12
0
18 Mar 2025
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Computer Vision and Pattern Recognition (CVPR), 2025
Huanyi Zheng
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Yongxu Liu
M. Yang
Chunhua Shen
MLLM, VLM
288
11
0
11 Mar 2025
ProJudge: A Multi-Modal Multi-Discipline Benchmark and Instruction-Tuning Dataset for MLLM-based Process Judges
Jiaxin Ai
Pengfei Zhou
Zhaopan Xu
Ming Li
Fanrui Zhang
...
Jianwen Sun
Yukang Feng
Baojin Huang
Zhongyuan Wang
Jianchao Tan
ELM
948
3
0
09 Mar 2025
VisualSimpleQA: A Benchmark for Decoupled Evaluation of Large Vision-Language Models in Fact-Seeking Question Answering
Yanling Wang
Yihan Zhao
Xiaodong Chen
Shasha Guo
Lixin Liu
Haoyang Li
Yong Xiao
Jing Zhang
Qi Li
Ke Xu
214
3
0
09 Mar 2025
MV-MATH: Evaluating Multimodal Math Reasoning in Multi-Visual Contexts
Computer Vision and Pattern Recognition (CVPR), 2025
Peijie Wang
Zhong-Zhi Li
Fei Yin
Xin Yang
Dekang Ran
Cheng-Lin Liu
LRM
592
29
0
28 Feb 2025
From Correctness to Comprehension: AI Agents for Personalized Error Diagnosis in Education
Yi-Fan Zhang
Hang Li
D. Song
Shunian Chen
Tianlong Xu
Qingsong Wen
LLMAG, LRM
359
4
0
20 Feb 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanwei Li
Yu Qi
...
Shen Yan
Bo Zhang
Chaoyou Fu
Peng Gao
Jiaming Song
MLLM, LRM
453
88
0
13 Feb 2025
Advancing General Multimodal Capability of Vision-language Models with Pyramid-descent Visual Position Encoding
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ziyang Chen
Mingxiao Li
Zhongfu Chen
Nan Du
Xiaolong Li
Yuexian Zou
369
3
0
19 Jan 2025
FCMR: Robust Evaluation of Financial Cross-Modal Multi-Hop Reasoning
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Seunghee Kim
Changhyeon Kim
Taeuk Kim
LRM
464
7
0
17 Dec 2024