MiniCPM-V: A GPT-4V Level MLLM on Your Phone

3 August 2024
Yuan Yao
Tianyu Yu
Ao Zhang
Chongyi Wang
Junbo Cui
Hongji Zhu
Tianchi Cai
Haoyu Li
Weilin Zhao
Zhihui He
Qi-An Chen
Huarong Zhou
Zhensheng Zou
Haoye Zhang
Shengding Hu
Zhi Zheng
Jie Zhou
Jie Cai
Xu Han
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
    VLM
    MLLM

Papers citing "MiniCPM-V: A GPT-4V Level MLLM on Your Phone"

19 / 69 papers shown
Infinity-MM: Scaling Multimodal Performance with Large-Scale and High-Quality Instruction Data
Shuhao Gu
Jialing Zhang
Siyuan Zhou
Kevin Yu
Zhaohu Xing
...
Yufeng Cui
Xinlong Wang
Yaoqi Liu
Fangxiang Feng
Guang Liu
SyDa
VLM
MLLM
24 Oct 2024
PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction
Long Xing
Qidong Huang
Xiaoyi Dong
Jiajie Lu
Pan Zhang
...
Yuhang Cao
Zeang Sheng
Jiaqi Wang
Feng Wu
Dahua Lin
VLM
22 Oct 2024
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps
Xiongtao Zhou
Jie He
Lanyu Chen
Jingyu Li
Haojing Chen
Víctor Gutiérrez-Basulto
Jeff Z. Pan
Ningyu Zhang
LRM
18 Oct 2024
MultiChartQA: Benchmarking Vision-Language Models on Multi-Chart Problems
Zifeng Zhu
Mengzhao Jia
Zizhuo Zhang
Lang Li
Meng Jiang
LRM
18 Oct 2024
MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
Chenxi Wang
Xiang Chen
N. Zhang
Bozhong Tian
Haoming Xu
Shumin Deng
Ningyu Zhang
MLLM
LRM
15 Oct 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
14 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
10 Oct 2024
ActiView: Evaluating Active Perception Ability for Multimodal Large Language Models
Ziyue Wang
Chi Chen
Ziyue Wang
Yurui Dong
Yuanchi Zhang
Yuzhuang Xu
Xiaolong Wang
Ziwei Sun
Yang Liu
LRM
07 Oct 2024
TLDR: Token-Level Detective Reward Model for Large Vision Language Models
Deqing Fu
Tong Xiao
Rui Wang
Wang Zhu
Pengchuan Zhang
Guan Pang
Robin Jia
Lawrence Chen
07 Oct 2024
Self-Correction is More than Refinement: A Learning Framework for Visual and Language Reasoning Tasks
Jiayi He
Hehai Lin
Q. Wang
Yi R. Fung
Chenhui Xu
ReLM
LRM
05 Oct 2024
PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions
Weifeng Lin
Xinyu Wei
Renrui Zhang
Le Zhuo
Shitian Zhao
...
Junlin Xie
Yu Qiao
Peng Gao
Hongsheng Li
MLLM
DiffM
23 Sep 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
19 Sep 2024
One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli
Andrey Barsky
Mohamed Ali Souibgui
Artemis Llabrés
Marco Bertini
Dimosthenis Karatzas
14 Sep 2024
Beyond the Hype: A dispassionate look at vision-language models in medical scenario
Yang Nan
Huichi Zhou
Xiaodan Xing
Guang Yang
16 Aug 2024
Effectively Leveraging CLIP for Generating Situational Summaries of Images and Videos
Dhruv Verma
Debaditya Roy
Basura Fernando
30 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
16 Jul 2024
MFC-Bench: Benchmarking Multimodal Fact-Checking with Large Vision-Language Models
Shengkang Wang
Hongzhan Lin
Ziyang Luo
Zhen Ye
Guang Chen
Jing Ma
17 Jun 2024
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement
Xiyao Wang
Jiuhai Chen
Zhaoyang Wang
Yuhang Zhou
Yiyang Zhou
...
Dinesh Manocha
Tom Goldstein
Parminder Bhatia
Furong Huang
Cao Xiao
24 May 2024
GOAT-Bench: Safety Insights to Large Multimodal Models through Meme-Based Social Abuse
Hongzhan Lin
Ziyang Luo
Bo Wang
Ruichao Yang
Jing Ma
03 Jan 2024