
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models

Junru Lu
Jiarui Qin
Lingfeng Qiao
Yinghui Li
Xinyi Dai
Bo Ke
Jianfeng He
Ruizhi Qiao
Di Yin
Xing Sun
Yunsheng Wu
Yinsong Liu
Shuangyin Liu
Mingkong Tang
Haodong Lin
Jiayi Kuang
Fanxu Meng
Xiaojuan Tang
Yunjia Xi
Junjie Huang
Haotong Yang
Zhenyi Shen
Yangning Li
Qianwen Zhang
Yifei Yu
Siyu An
Junnan Dong
Qiufeng Wang
Jie Wang
Keyu Chen
Wei Wen
Taian Guo
Zhifeng Shen
Daohai Yu
Jiahao Li
Ke Li
Zongyi Li
Xiaoyu Tan
Main: 30 pages, 27 figures, 14 tables; Bibliography: 7 pages; Appendix: 20 pages
Abstract

We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on distillation, Youtu-LLM (1.96B) is pre-trained from scratch to systematically cultivate reasoning and planning capabilities. The key technical advancements are as follows: (1) Compact Architecture with Long-Context Support: Built on a dense Multi-Latent Attention (MLA) architecture with a novel STEM-oriented vocabulary, Youtu-LLM supports a 128k context window. This design enables robust long-context reasoning and state tracking within a minimal memory footprint, making it well suited to long-horizon agent and reasoning tasks. (2) Principled "Commonsense-STEM-Agent" Curriculum: We curated a massive corpus of approximately 11T tokens and implemented a multi-stage training strategy. By progressively shifting the pre-training data distribution from general commonsense to complex STEM and agentic tasks, we ensure the model acquires deep cognitive abilities rather than superficial alignment. (3) Scalable Agentic Mid-training: For agentic mid-training, we employ diverse data-construction schemes to synthesize rich and varied trajectories across the math, coding, and tool-use domains. This high-quality data enables the model to internalize planning and reflection behaviors. Extensive evaluations show that Youtu-LLM sets a new state of the art among sub-2B LLMs. On general benchmarks it achieves competitive performance against larger models, while on agent-specific tasks it significantly surpasses existing SOTA baselines, demonstrating that lightweight models can possess strong intrinsic agentic capabilities.
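
The staged curriculum can be pictured as a sampling schedule over data domains that shifts as training progresses. The sketch below is purely illustrative and is not the authors' implementation: the stage boundaries, domain names, and mixture weights are hypothetical placeholders; only the roughly 11T-token budget and the commonsense-to-STEM-to-agent progression come from the abstract.

```python
# Illustrative sketch of a staged "Commonsense -> STEM -> Agent" data-mixture
# schedule. All stage sizes and mixture ratios are hypothetical; they only
# sum to the ~11T-token budget mentioned in the abstract.
import random

STAGES = [
    {"name": "commonsense", "tokens_b": 6000,   # hypothetical 6T-token stage
     "mix": {"web_general": 0.70, "stem": 0.20, "agent_traj": 0.10}},
    {"name": "stem",        "tokens_b": 3500,   # hypothetical 3.5T-token stage
     "mix": {"web_general": 0.35, "stem": 0.50, "agent_traj": 0.15}},
    {"name": "agentic",     "tokens_b": 1500,   # hypothetical 1.5T-token stage
     "mix": {"web_general": 0.20, "stem": 0.35, "agent_traj": 0.45}},
]

def sample_domain(tokens_seen_b: float) -> str:
    """Pick the data domain for the next batch, based on which curriculum
    stage the current token count falls into."""
    cumulative = 0.0
    for stage in STAGES:
        cumulative += stage["tokens_b"]
        if tokens_seen_b < cumulative:
            domains, weights = zip(*stage["mix"].items())
            return random.choices(domains, weights=weights, k=1)[0]
    return "agent_traj"  # past the planned budget: stay on agentic data

if __name__ == "__main__":
    # Early in training, general web text dominates; late in training,
    # agentic trajectories (math, coding, tool use) dominate.
    print(sample_domain(1000.0), sample_domain(10500.0))
```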
