Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL


23 May 2025
Che Liu
Haozhe Wang
J. Pan
Zhongwei Wan
Yong Dai
Fangzhen Lin
Wenjia Bai
Daniel Rueckert
Rossella Arcucci
OffRL, LRM, ELM

Papers citing "Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL"

45 / 45 papers shown
ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training
Chenliang Li
Adel Elmahdy
Alex Boyd
Zhongruo Wang
Alfredo García
Parminder Bhatia
Taha A. Kass-Hout
Cao Xiao
Mingyi Hong
OffRL
143
0
0
25 Nov 2025
MIMIC-SR-ICD11: A Dataset for Narrative-Based Diagnosis
Yuexin Wu
Shiqi Wang
Vasile Rus
109
0
0
07 Nov 2025
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
Pengkai Wang
Qi Zuo
Pengwei Liu
Zhijie Sang
C. Xie
Hongxia Yang
LM&MA
280
0
0
17 Oct 2025
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision
Hongjie Zheng
Zesheng Shi
Ping Yi
102
0
0
12 Oct 2025
AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
Shaohao Rui
Kaitao Chen
Weijie Ma
Xiaosong Wang
MedIm, LRM
92
0
0
29 Sep 2025
Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning
Chi Liu
Derek Li
Yan Shu
Robin Chen
Derek Duan
Teng Fang
Bryan Dai
OffRL, LM&MA, LRM
130
2
0
18 Sep 2025
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Haozhe Wang
Qixin Xu
Che Liu
J. Wu
Fangzhen Lin
Wenhu Chen
LRM
169
18
0
03 Sep 2025
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Baichuan-M2 Team
Chengfeng Dou
Chong Liu
Chenzheng Zhu
Fei Li
...
Zheng Liang
Zhishou Zhang
Hengfu Cui
Zuyi Zhu
X. Wang
LM&MA, ELM, LRM
144
13
0
02 Sep 2025
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Wenxuan Wang
Zizhan Ma
Meidan Ding
S. Zheng
Shengyuan Liu
...
Jiaming Ji
Wenting Chen
Xiang Li
LinLin Shen
Yixuan Yuan
LRM
162
4
0
01 Aug 2025
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Zhongwei Wan
Zhihao Dou
Che Liu
Yu Zhang
Dongfei Cui
...
Yifan Jiang
Yangfan He
Mi Zhang
Shen Yan
LRM
306
28
0
02 Jun 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
Chao Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Lei Ma
OffRL, ReLM, SyDa, LRM, VLM
438
157
0
10 Apr 2025
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Jian Zhao
Runze Liu
Kaiyan Zhang
Zhimu Zhou
Junqi Gao
...
Jiafei Lyu
Zhouyi Qian
Biqing Qi
Xiu Li
Bowen Zhou
OffRL, LRM
374
21
0
01 Apr 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MH, LRM, ELM, LM&MA
276
25
0
10 Mar 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM, LM&MA, VLM
425
100
0
26 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
620
378
0
28 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM, VLM, OffRL, AI4TS, LRM
1.2K
5,274
0
22 Jan 2025
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
289
152
0
25 Dec 2024
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Bo Wang
Shimin Li
Yunhua Zhou
Qipeng Guo
Qi Zhang
Jiaqi Leng
ELM, AI4TS, LRM
277
47
0
18 Dec 2024
Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios
Shaochen Xu
Yimiao Zhou
Ziqiang Liu
Zihao Wu
T. Zhong
...
Quanzheng Li
Andrea Sikora
Xiaoming Zhai
Zhen Xiang
Tianming Liu
LM&MA
247
10
0
16 Nov 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
566
2,655
0
25 Oct 2024
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Jun Wang
Meng Fang
Bo Liu
Muning Wen
Jiachen Zhu
...
Lei Chen
Lionel M. Ni
Linyi Yang
Ying Wen
Weinan Zhang
LRM
202
59
0
12 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Rui Wang
Pengfei Liu
VLM
314
134
0
08 Oct 2024
HybridFlow: A Flexible and Efficient RLHF Framework
European Conference on Computer Systems (EuroSys), 2024
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
569
863
0
28 Sep 2024
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Yunfei Xie
Juncheng Wu
Haoqin Tu
Siwei Yang
Bingchen Zhao
Yongshuo Zong
Qiao Jin
Cihang Xie
Yuyin Zhou
LM&MA, ELM, LRM
298
39
0
23 Sep 2024
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Junying Chen
Chi Gui
Anningzhe Gao
Ke Ji
Xidong Wang
Xiang Wan
Benyou Wang
MedIm, AI4CE, LM&MA
122
42
0
18 Jul 2024
UltraMedical: Building Specialized Generalists in Biomedicine
Kaiyan Zhang
Sihang Zeng
Ermo Hua
Ning Ding
Zhang-Ren Chen
...
Xuekai Zhu
Xingtai Lv
Hu Jinfang
Zhiyuan Liu
Bowen Zhou
LM&MA
221
54
0
06 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM, ELM
565
1,011
0
03 Jun 2024
Capabilities of Gemini Models in Medicine
Khaled Saab
Tao Tu
Wei-Hung Weng
Ryutaro Tanno
David Stutz
...
Christopher Semturs
S. S. Mahdavi
Juraj Gottweis
Alan Karthikesalingam
Vivek Natarajan
ELM, AI4MH, LM&MA
209
282
0
29 Apr 2024
Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches
Clément Christophe
Praveen K Kanithi
Prateek Munjal
Tathagata Raha
Nasir Hayat
...
Charles Chen
Natalia Vassilieva
Boulbaba Ben Amor
Marco AF Pimentel
Shadab Khan
AI4MH, LM&MA
164
64
0
23 Apr 2024
MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Zhongwei Wan
Che Liu
Xin Wang
Chaofan Tao
Hui Shen
Zhenwu Peng
Jie Fu
Rossella Arcucci
Huaxiu Yao
366
16
0
07 Mar 2024
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu
Chaoyi Wu
Xiaoman Zhang
Weixiong Lin
Haicheng Wang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA, ELM
467
141
0
21 Feb 2024
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains
Yanis Labrak
Adrien Bazoge
Emmanuel Morin
P. Gourraud
Mickael Rouvier
Richard Dufour
431
354
0
15 Feb 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH, ELM
433
1,572
0
20 Nov 2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen
Xidong Wang
Anningzhe Gao
Feng Jiang
Shunian Chen
...
Chuyi Kong
Jianquan Li
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
202
105
0
16 Nov 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
823
6,459
0
29 May 2023
HuatuoGPT, towards Taming Language Model to Be a Doctor
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hongbo Zhang
Junying Chen
Feng Jiang
Fei Yu
Zhihong Chen
...
Zhiyi Zhang
Qingying Xiao
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA, AI4MH, ELM
208
286
0
24 May 2023
Continual Pre-training of Language Models
International Conference on Learning Representations (ICLR), 2023
Zixuan Ke
Yijia Shao
Haowei Lin
Tatsuya Konishi
Gyuhak Kim
Yinan Han
CLL, KELM
423
187
0
07 Feb 2023
Scaling Instruction-Finetuned Language Models
Journal of Machine Learning Research (JMLR), 2022
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM, LRM
948
3,763
0
20 Oct 2022
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM, LRM
503
686
0
28 Mar 2022
MedMCQA : A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
ACM Conference on Health, Inference, and Learning (ACM CHIL), 2022
Ankit Pal
Logesh Kumar Umapathi
Malaikannan Sankarasubbu
ELM, LM&MA
401
505
0
27 Mar 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
2.0K
17,090
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro, LRM, AI4CE, ReLM
2.2K
14,159
0
28 Jan 2022
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Applied Sciences (Appl. Sci.), 2020
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaML, ELM, LM&MA
404
1,205
0
28 Sep 2020
PubMedQA: A Dataset for Biomedical Research Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
719
1,253
0
13 Sep 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
1.1K
23,699
0
20 Jul 2017