Megrez-Omni Technical Report
Boxun Li
Yadong Li
Zhiyuan Li
Congyi Liu
Weilin Liu
Guowei Niu
Zheyue Tan
Haiyang Xu
Zhuyu Yao
Tao Yuan
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Yu Wang
Abstract
In this work, we present the Megrez models, comprising a language model (Megrez-3B-Instruct) and a multimodal model (Megrez-3B-Omni). Both are designed, through a software-hardware co-design approach, to deliver fast inference, a compact footprint, and robust edge-side intelligence. Megrez-3B-Instruct offers high accuracy, fast inference, ease of use, and broad applicability. Building on it, Megrez-3B-Omni is an on-device multimodal LLM that supports image, text, and audio understanding. It achieves state-of-the-art accuracy across all three modalities and demonstrates strong versatility and robustness, setting a new benchmark for multimodal AI models.
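As a concrete illustration of the on-device usage described above, the sketch below loads Megrez-3B-Instruct through the Hugging Face transformers API and runs a single chat turn. The repository id "Infinigence/Megrez-3B-Instruct", the trust_remote_code requirement, and the availability of a chat template are assumptions made for illustration, not details confirmed by this report.

# Minimal sketch: run Megrez-3B-Instruct via Hugging Face transformers.
# Assumptions (not stated in the report): the checkpoint is published as
# "Infinigence/Megrez-3B-Instruct", requires trust_remote_code, and ships
# a chat template usable via apply_chat_template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Infinigence/Megrez-3B-Instruct"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on the available device(s)
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Explain edge-side inference in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))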
BibTeX
@article{li2025_2502.15803,
  title={Megrez-Omni Technical Report},
  author={Boxun Li and Yadong Li and Zhiyuan Li and Congyi Liu and Weilin Liu and Guowei Niu and Zheyue Tan and Haiyang Xu and Zhuyu Yao and Tao Yuan and Dong Zhou and Yueqing Zhuang and Shengen Yan and Guohao Dai and Yu Wang},
  journal={arXiv preprint arXiv:2502.15803},
  year={2025}
}