PaddleOCR-VL-1.5: Towards a Multi-Task 0.9B VLM for Robust In-the-Wild Document Parsing

29 January 2026

Cheng Cui

Ting Sun

Suyin Liang

Tingquan Gao

Zelun Zhang

Jiaxuan Liu

Xueqing Wang

Changda Zhou

Hongen Liu

Manhui Lin

Yue Zhang

Yubo Zhang

Yi Liu

Dianhai Yu

Yanjun Ma

Main: 15 pages; Bibliography: 3 pages; Appendix: 28 pages; 20 figures; 9 tables

Abstract

We introduce PaddleOCR-VL-1.5, an upgraded model achieving a new state-of-the-art (SOTA) accuracy of 94.5% on OmniDocBench v1.5. To rigorously evaluate robustness against real-world physical distortions, including scanning, skew, warping, screen photography, and illumination, we propose the Real5-OmniDocBench benchmark. Experimental results demonstrate that this enhanced model attains SOTA performance on the newly curated benchmark. Furthermore, we extend the model's capabilities by incorporating seal recognition and text spotting tasks, while remaining a 0.9B ultra-compact VLM with high efficiency. Code: this https URL
