DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

5 February 2024
Zhihong Shao
Peiyi Wang
Qihao Zhu
Runxin Xu
Jun-Mei Song
Xiao Bi
Haowei Zhang
Mingchuan Zhang
Yiming Li
Yu-Huan Wu
Daya Guo
ReLM, LRM
ArXiv (abs) | PDF | HTML | HuggingFace (125 upvotes) | Github (3224★)

Papers citing "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models"

50 / 2,701 papers shown
Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
Teng Wang
Zhangyi Jiang
Zhenqi He
Shenyang Tong
Wenhan Yang
...
Zifan He
Hailei Gong
Zewen Ye
Shengjie Ma
Jianping Zhang
LRM
711
13
0
10 Apr 2026
Search-R3: Unifying Reasoning and Embedding in Large Language Models
Yuntao Gui
James Cheng
KELM, LRM
264
2
0
10 Apr 2026
PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
Yunxiao Wang
Meng Liu
Wenqi Liu
Kaiyu Jiang
Bin Wen
Fan Yang
Tingting Gao
LRM
171
1
0
10 Apr 2026
Mitigating Visual Context Degradation in Large Multimodal Models: A Training-Free Decoupled Agentic Framework
Hongrui Jia
Chaoya Jiang
Shikun Zhang
Wei Ye
LRM
184
1
0
10 Apr 2026
Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning
Zhenchao Tang
Fang Wang
Haohuai He
Jiale Zhou
Tianxu Lv
...
J. Yao
Jiehui Huang
Dawei Huang
Zhi Song
Jianhua Yao
CLL, AI4MH, LM&MA, AI4CE
543
1
0
30 Mar 2026
Complexity-Aware Deep Symbolic Regression with Robust Risk-Seeking Policy Gradients
Zachary Bastiani
R. Kirby
Jacob Hochhalter
Shandian Zhe
314
3
0
30 Mar 2026
Beyond Log Likelihood: Probability-Based Objectives for Supervised Fine-Tuning across the Model Capability Continuum
Gaotang Li
Ruizhong Qiu
Xiusi Chen
Heng Ji
Hanghang Tong
ELM
190
10
0
30 Mar 2026
Humanline: Online Alignment as Perceptual Loss
Sijia Liu
Niklas Muennighoff
Kawin Ethayarajh
OnRL
131
0
0
30 Mar 2026
Clinical Metadata Guided Limited-Angle CT Image Reconstruction
Yu Shi
S. Fan
Changsheng Fang
Shuo Han
Haodong Li
Li Zhou
Bahareh Morovati
Dayang Wang
Hengyong Yu
MedIm
151
1
0
30 Mar 2026
FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models
Fatemeh Nourzad
Amirhossein Roknilamouki
Eylem Ekici
Ness B. Shroff
365
0
0
27 Mar 2026
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Zhengyu He
MLLM, MoE, AuLLM, VLM, LRM
452
11
0
27 Mar 2026
C$^2$GSPG: Confidence-calibrated Group Sequence Policy Gradient towards Self-aware Reasoning
Haotian Liu
Shuo Wang
Hongteng Xu
LRM
215
0
0
24 Dec 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
International Symposium on Computer Architecture (ISCA), 2025
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yun Wang
Yuxuan Liu
Y. X. Wei
MoE
314
67
0
24 Dec 2025
RemoteReasoner: Towards Unifying Geospatial Reasoning Workflow
Liang Yao
Fan Liu
Hongbo Lu
Chuanyi Zhang
Rui Min
Shengxiang Xu
Shimin Di
Pai Peng
LRM
372
12
0
24 Dec 2025
Reinforcement Learning for Large Model: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
434
2
0
24 Dec 2025
Select2Reason: Efficient Instruction-Tuning Data Selection for Long-CoT Reasoning
Cehao Yang
Xueyuan Lin
Chengjin Xu
Xuhui Jiang
Xiaojun Wu
Honghao Liu
Hui Xiong
Jian Guo
LRM
363
5
0
24 Dec 2025
Environment Scaling for Interactive Agentic Experience Collection: A Survey
Y. Huang
S. Li
Minghao Liu
Wei Liu
Shijue Huang
Zhiyuan Fan
Hou Pong Chan
Yi R. Fung
282
0
0
24 Dec 2025
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning
Zhenpeng Su
Leiyu Pan
Minxuan Lv
Tiehua Mei
Zijia Lin
Yuntao Li
Wenping Hu
Ruiming Tang
Kun Gai
G. Zhou
46
2
0
05 Dec 2025
Dynamic Alignment for Collective Agency: Toward a Scalable Self-Improving Framework for Open-Ended LLM Alignment
Panatchakorn Anantaprayoon
Nataliia Babina
Jad Tarifi
Nima Asgharbeygi
127
1
0
05 Dec 2025
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity
Germán Kruszewski
Pierre Erbacher
Jos Rozen
Marc Dymetman
276
1
0
05 Dec 2025
Value Gradient Guidance for Flow Matching Alignment
Zhen Liu
Tim Z. Xiao
Carles Domingo-Enrich
Weiyang Liu
Dinghuai Zhang
110
3
0
04 Dec 2025
Semantic Soft Bootstrapping: Long Context Reasoning in LLMs without Reinforcement Learning
Purbesh Mitra
S. Ulukus
OffRL, ReLM, LRM
242
7
0
04 Dec 2025
Are Your Agents Upward Deceivers?
Dadi Guo
Qingyu Liu
Dongrui Liu
Qihan Ren
Shuai Shao
...
Z. Chen
Jialing Tao
Yaodong Yang
Jing Shao
Xia Hu
LLMAG
224
3
0
04 Dec 2025
RRPO: Robust Reward Policy Optimization for LLM-based Emotional TTS
Cong Wang
Changfeng Gao
Yang Xiang
Zhihao Du
Keyu An
Han Zhao
Qian Chen
Xiangang Li
Yingming Gao
Ya Li
103
2
0
04 Dec 2025
Visual Reasoning Tracer: Object-Level Grounded Reasoning Benchmark
Haobo Yuan
Yueyi Sun
Yanwei Li
Tao Zhang
XueQing Deng
Henghui Ding
Lu Qi
Anran Wang
X. Li
Ming-Hsuan Yang
ReLM, LRM
427
3
0
04 Dec 2025
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
Joey Hong
Kang Liu
Zhan Ling
Jiecao Chen
Sergey Levine
LLMAG, OffRL
280
4
0
04 Dec 2025
Structured Document Translation via Format Reinforcement Learning
Haiyue Song
Johannes Eschbach-Dymanus
Hour Kaing
Sumire Honda
Hideki Tanaka
Bianka Buschbeck
Masao Utiyama
134
0
0
04 Dec 2025
CARL: Focusing Agentic Reinforcement Learning on Critical Actions
Leyang Shen
Y. Zhang
Chun Kai Ling
Xiaoyan Zhao
Tat-Seng Chua
235
0
0
04 Dec 2025
YingMusic-Singer: Zero-shot Singing Voice Synthesis and Editing with Annotation-free Melody Guidance
Junjie Zheng
Chunbo Hao
Guobin Ma
Xiaoyu Zhang
Gongyu Chen
Chaofan Ding
Zihao Chen
Lei Xie
DiffM
233
4
0
04 Dec 2025
Learning to Orchestrate Agents in Natural Language with the Conductor
Stefan Nielsen
Edoardo Cetin
Peter Schwendeman
Qi Sun
Jinglue Xu
Yujin Tang
LLMAG
186
2
0
04 Dec 2025
YingMusic-SVC: Real-World Robust Zero-Shot Singing Voice Conversion with Flow-GRPO and Singing-Specific Inductive Biases
Gongyu Chen
Xiaoyu Zhang
Zhenqiang Weng
Junjie Zheng
Da Shen
Chaofan Ding
Wei-Qiang Zhang
Zihao Chen
97
3
0
04 Dec 2025
ARM-Thinker: Reinforcing Multimodal Generative Reward Models with Agentic Tool Use and Visual Reasoning
Shengyuan Ding
Xinyu Fang
Ziyu Liu
Yuhang Zang
Yuhang Cao
...
Jianze Liang
Bin Wang
Conghui He
Dahua Lin
Jiaqi Wang
LRM
284
3
0
04 Dec 2025
TempR1: Improving Temporal Understanding of MLLMs via Temporal-Aware Multi-Task Reinforcement Learning
Tao Wu
Li Yang
Gen Zhan
Y. Zhang
Yiting Liao
Junlin Li
Deliang Fu
Li Zhang
Limin Wang
AI4TS, LRM
336
3
0
03 Dec 2025
AdaptVision: Efficient Vision-Language Models via Adaptive Visual Acquisition
Zichuan Lin
Y. Liu
Yang Yang
Lvfang Tao
Deheng Ye
VLM
192
4
0
03 Dec 2025
Multimodal Reinforcement Learning with Adaptive Verifier for AI Agents
Reuben Tan
Baolin Peng
Zhengyuan Yang
Hao Cheng
Oier Mees
...
Xiaodong Liu
Lijuan Wang
Marc Pollefeys
Yong Jae Lee
Jianfeng Gao
LRM
287
2
0
03 Dec 2025
Principled RL for Diffusion LLMs Emerges from a Sequence-Level Perspective
Jingyang Ou
Jiaqi Han
Minkai Xu
Shaoxuan Xu
Jianwen Xie
Stefano Ermon
Yi Wu
Chongxuan Li
DiffM
187
9
0
03 Dec 2025
PosterCopilot: Toward Layout Reasoning and Controllable Editing for Professional Graphic Design
Jiazhe Wei
Ken Li
Tianyu Lao
Haofan Wang
Liang Wang
Caifeng Shan
Chenyang Si
179
2
0
03 Dec 2025
Thinking with Programming Vision: Towards a Unified View for Thinking with Images
Zirun Guo
Minjie Hong
Feng Zhang
Kai Jia
Tao Jin
OffRL, LRM, VLM
333
6
0
03 Dec 2025
Balancing Safety and Helpfulness in Healthcare AI Assistants through Iterative Preference Alignment
Huy Nghiem
Swetasudha Panda
Devashish Khatwani
Huy Nguyen
Krishnaram Kenthapadi
Hal Daumé III
LM&MA
167
0
0
03 Dec 2025
PretrainZero: Reinforcement Active Pretraining
Xingrun Xing
Zhiyuan Fan
Jie Lou
G. Li
Jiajun Zhang
Debing Zhang
OffRL, AIMat, ReLM, LRM, AI4CE
531
2
0
03 Dec 2025
DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
Dingwei Zhu
Zhiheng Xi
Shihan Dou
Yuhui Wang
Sixian Li
...
Caishuang Huang
Yunke Zhang
Demei Yan
Yuran Wang
Tao Gui
OffRL
170
0
0
03 Dec 2025
Better World Models Can Lead to Better Post-Training Performance
Prakhar Gupta
Henry Conklin
Sarah-Jane Leslie
Andrew Lee
OffRL
188
2
0
03 Dec 2025
MarkTune: Improving the Quality-Detectability Trade-off in Open-Weight LLM Watermarking
Yizhou Zhao
Zhiwei Steven Wu
Adam Block
226
0
0
03 Dec 2025
LSRS: Latent Scale Rejection Sampling for Visual Autoregressive Modeling
Hong-Kai Zheng
Piji Li
107
0
0
03 Dec 2025
On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement
Wenlong Deng
Yushu Li
Boying Gong
Yi Ren
Christos Thrampoulidis
Xiaoxiao Li
179
7
0
03 Dec 2025
Bootstrapped Mixed Rewards for RL Post-Training: Injecting Canonical Action Order
Prakhar Gupta
Vaibhav Gupta
55
0
0
03 Dec 2025
ReasonX: MLLM-Guided Intrinsic Image Decomposition
Alara Dirik
Tuanfeng Y. Wang
Duygu Ceylan
Stefanos Zafeiriou
Anna Frühstück
118
2
0
03 Dec 2025
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models
Xinyue Ai
Yutong He
Albert Gu
Ruslan Salakhutdinov
J. Zico Kolter
Nicholas Matthew Boffi
Max Simchowitz
126
3
0
02 Dec 2025
Taming Camera-Controlled Video Generation with Verifiable Geometry Reward
Zhaoqing Wang
Xiaobo Xia
Zhuolin Bie
Jinlin Liu
Dongdong Yu
Jia-Wang Bian
Changhu Wang
EGVM, VGen
202
1
0
02 Dec 2025
SR-GRPO: Stable Rank as an Intrinsic Geometric Reward for Large Language Model Alignment
Yixuan Tang
Yi Yang
ALM
216
0
0
02 Dec 2025