DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models

DeepSeek-AI
Aixin Liu
Aoxue Mei
Bangcai Lin
Bing Xue
Bingxuan Wang
Bingzheng Xu
Bochao Wu
Bowei Zhang
Chaofan Lin
Chen Dong
Chengda Lu
Chenggang Zhao
Chengqi Deng
Chenhao Xu
Chong Ruan
Damai Dai
Daya Guo
Dejian Yang
Deli Chen
Erhang Li
Fangqi Zhou
Fangyun Lin
Fucong Dai
Guangbo Hao
Guanting Chen
Guowei Li
H. Zhang
Hanwei Xu
Hao Li
Haofen Liang
Haoran Wei
Haowei Zhang
Haowen Luo
Haozhe Ji
Honghui Ding
Hongxuan Tang
Huanqi Cao
Huazuo Gao
Hui Qu
Hui Zeng
Jialiang Huang
Jiashi Li
Jiaxin Xu
Jiewen Hu
Jingchang Chen
Jingting Xiang
Jingyang Yuan
Jingyuan Cheng
Jinhua Zhu
Jun Ran
Junguang Jiang
Junjie Qiu
Junlong Li
Junxiao Song
Kai Dong
Kaige Gao
Kang Guan
Kexin Huang
Kexing Zhou
Kezhao Huang
Kuai Yu
Lean Wang
Lecong Zhang
Lei Wang
Liang Zhao
Liangsheng Yin
Lihua Guo
Lingxiao Luo
Linwang Ma
Litong Wang
Liyue Zhang
M.S. Di
M.Y Xu
Mingchuan Zhang
Minghua Zhang
Minghui Tang
Mingxu Zhou
Panpan Huang
Peixin Cong
Peiyi Wang
Qiancheng Wang
Qihao Zhu
Qingyang Li
Qinyu Chen
Qiushi Du
Ruiling Xu
Ruiqi Ge
Ruisong Zhang
Ruizhe Pan
Runji Wang
Runqiu Yin
Runxin Xu
Ruomeng Shen
Ruoyu Zhang
S.H. Liu
Shanghao Lu
Shangyan Zhou
Shanhuang Chen
Shaofei Cai
Main: 16 pages, 8 figures, 8 tables; Bibliography: 3 pages; Appendix: 4 pages
Abstract

We introduce DeepSeek-V3.2, a model that harmonizes high computational efficiency with superior reasoning and agent performance. The key technical breakthroughs of DeepSeek-V3.2 are as follows: (1) DeepSeek Sparse Attention (DSA): We introduce DSA, an efficient attention mechanism that substantially reduces computational complexity while preserving model performance in long-context scenarios. (2) Scalable Reinforcement Learning Framework: By implementing a robust reinforcement learning protocol and scaling post-training compute, DeepSeek-V3.2 performs comparably to GPT-5. Notably, our high-compute variant, DeepSeek-V3.2-Speciale, surpasses GPT-5 and exhibits reasoning proficiency on par with Gemini-3.0-Pro, achieving gold-medal performance in both the 2025 International Mathematical Olympiad (IMO) and the International Olympiad in Informatics (IOI). (3) Large-Scale Agentic Task Synthesis Pipeline: To integrate reasoning into tool-use scenarios, we developed a novel synthesis pipeline that systematically generates training data at scale. This methodology facilitates scalable agentic post-training, yielding substantial improvements in generalization and instruction-following robustness within complex, interactive environments.
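The abstract does not specify how DeepSeek Sparse Attention (DSA) selects which positions to attend to. As a rough illustration of the general idea behind sparse attention in long-context settings, the sketch below implements a generic top-k key-selection rule per query in NumPy; the function name `topk_sparse_attention`, the selection rule, and the `top_k` parameter are placeholder assumptions, not DeepSeek's actual DSA design.

```python
# Illustrative sketch only: generic top-k sparse attention in NumPy.
# The real DSA mechanism is not described in this abstract; the token-selection
# rule here is an assumption used purely to show why sparsity cuts cost.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_sparse_attention(q, k, v, top_k=64):
    """For each query, keep only its top_k highest-scoring keys and mask the
    rest, so softmax and value aggregation touch far fewer positions than
    full quadratic attention over a long context."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)              # (n_q, n_k) attention logits
    top_k = min(top_k, scores.shape[-1])
    idx = np.argpartition(scores, -top_k, axis=-1)[:, -top_k:]  # top_k keys per query
    mask = np.full_like(scores, -np.inf)
    np.put_along_axis(mask, idx, 0.0, axis=-1)
    weights = softmax(scores + mask, axis=-1)  # zero weight outside the selected set
    return weights @ v

# Tiny usage example with random data.
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 32))
k = rng.standard_normal((1024, 32))
v = rng.standard_normal((1024, 32))
print(topk_sparse_attention(q, k, v, top_k=64).shape)  # (8, 32)
```

With a fixed `top_k`, the per-query cost of the softmax and value aggregation stops growing with context length once the scores are computed, which is the basic efficiency argument for sparse attention; how DSA achieves and schedules this sparsity is detailed in the paper itself, not here.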
