arXiv:2509.23661

LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training

28 September 2025
Xiang An
Yin Xie
Kaicheng Yang
Wenkang Zhang
X. Zhao
Zheng Cheng
Y. Wang
Songcen Xu
Changrui Chen
Chunsheng Wu
Huajie Tan
Chunyuan Li
J. Yang
Jie Yu
Xiyao Wang
Bin Qin
Yumeng Wang
Zizhen Yan
Ziyong Feng
Ziwei Liu
Bo Li
Jiankang Deng
Tags: MLLM, VLM, SyDa
Main text: 13 pages; bibliography: 1 page; appendix: 7 pages; 6 figures; 1 table.
Abstract

We present LLaVA-OneVision-1.5, a novel family of Large Multimodal Models (LMMs) that achieve state-of-the-art performance at significantly reduced computational and financial cost. Unlike existing work, LLaVA-OneVision-1.5 provides an open, efficient, and reproducible framework for building high-quality vision-language models entirely from scratch. The release comprises three primary components: (1) Large-scale curated datasets: an 85M concept-balanced pretraining dataset, LLaVA-OneVision-1.5-Mid-Training, and a meticulously curated 22M instruction dataset, LLaVA-OneVision-1.5-Instruct. (2) Efficient training framework: a complete end-to-end training framework that uses an offline parallel data packing strategy, which makes it possible to train LLaVA-OneVision-1.5 within a $16,000 budget. (3) State-of-the-art performance: experimental results show that LLaVA-OneVision-1.5 is highly competitive across a broad range of downstream tasks. Specifically, LLaVA-OneVision-1.5-8B outperforms Qwen2.5-VL-7B on 18 of 27 benchmarks, and LLaVA-OneVision-1.5-4B surpasses Qwen2.5-VL-3B on all 27 benchmarks. We plan to release LLaVA-OneVision-1.5-RL shortly, with further updates to follow.
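The abstract names an offline parallel data packing strategy but does not spell out its algorithm. The following is a minimal sketch that assumes "packing" means grouping variable-length samples into fixed-length training sequences ahead of time so that less compute is spent on padding. The function `pack_samples`, the `max_len` parameter, and the example token lengths are all hypothetical. The grouping uses first-fit decreasing bin packing, a standard heuristic chosen here for illustration, and is not necessarily the authors' method.

```python
from typing import List


def pack_samples(lengths: List[int], max_len: int) -> List[List[int]]:
    """Group sample indices into packs whose total token count is <= max_len.

    Hypothetical illustration using first-fit decreasing: visit samples
    longest-first and place each one into the first existing pack that still
    has room, opening a new pack when none does.
    """
    # Visit samples from longest to shortest (stable for equal lengths).
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []   # sample indices in each pack
    remaining: List[int] = []     # free token budget left in each pack
    for i in order:
        n = lengths[i]
        if n > max_len:
            raise ValueError(f"sample {i} has {n} tokens, exceeds max_len={max_len}")
        for p, room in enumerate(remaining):
            if n <= room:
                packs[p].append(i)
                remaining[p] -= n
                break
        else:
            # No existing pack has room, so start a new one.
            packs.append([i])
            remaining.append(max_len - n)
    return packs


if __name__ == "__main__":
    # Hypothetical per-sample token counts and sequence length.
    lengths = [700, 300, 500, 200, 900, 100]
    print(pack_samples(lengths, max_len=1024))  # [[4, 5], [0, 1], [2, 3]]
```

In this example, six samples that would otherwise need six padded sequences fit into three packs. Because the packing happens offline, before training starts, one plausible way to parallelize it would be to shard the dataset and pack each shard on a separate worker. That sharding step is left out of this sketch.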
