arXiv:2505.17952
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL
23 May 2025
Che Liu
Haozhe Wang
J. Pan
Zhongwei Wan
Yong Dai
Fangzhen Lin
Wenjia Bai
Daniel Rueckert
Rossella Arcucci
OffRL
LRM
ELM
Papers citing "Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL"
45 / 45 papers shown
Title
ST-PPO: Stabilized Off-Policy Proximal Policy Optimization for Multi-Turn Agents Training
Chenliang Li
Adel Elmahdy
Alex Boyd
Zhongruo Wang
Alfredo García
Parminder Bhatia
Taha A. Kass-Hout
Cao Xiao
Mingyi Hong
OffRL
143
0
0
25 Nov 2025
MIMIC-SR-ICD11: A Dataset for Narrative-Based Diagnosis
Yuexin Wu
Shiqi Wang
Vasile Rus
109
0
0
07 Nov 2025
InfiMed-ORBIT: Aligning LLMs on Open-Ended Complex Tasks via Rubric-Based Incremental Training
Pengkai Wang
Qi Zuo
Pengwei Liu
Zhijie Sang
C. Xie
Hongxia Yang
LM&MA
280
0
0
17 Oct 2025
MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision
Hongjie Zheng
Zesheng Shi
Ping Yi
102
0
0
12 Oct 2025
AdaThink-Med: Medical Adaptive Thinking with Uncertainty-Guided Length Calibration
Shaohao Rui
Kaitao Chen
Weijie Ma
Xiaosong Wang
MedIm
LRM
92
0
0
29 Sep 2025
Fleming-R1: Toward Expert-Level Medical Reasoning via Reinforcement Learning
Chi Liu
Derek Li
Yan Shu
Robin Chen
Derek Duan
Teng Fang
Bryan Dai
OffRL
LM&MA
LRM
130
2
0
18 Sep 2025
Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning
Haozhe Wang
Qixin Xu
Che Liu
J. Wu
Fangzhen Lin
Wenhu Chen
LRM
169
18
0
03 Sep 2025
Baichuan-M2: Scaling Medical Capability with Large Verifier System
Baichuan-M2 Team
Chengfeng Dou
Chong Liu
Chenzheng Zhu
Fei Li
...
Zheng Liang
Zhishou Zhang
Hengfu Cui
Zuyi Zhu
X. Wang
LM&MA
ELM
LRM
144
13
0
02 Sep 2025
Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications
Wenxuan Wang
Zizhan Ma
Meidan Ding
S. Zheng
Shengyuan Liu
...
Jiaming Ji
Wenting Chen
Xiang Li
LinLin Shen
Yixuan Yuan
LRM
162
4
0
01 Aug 2025
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning
Zhongwei Wan
Zhihao Dou
Che Liu
Yu Zhang
Dongfei Cui
...
Yifan Jiang
Yangfan He
Mi Zhang
Shen Yan
LRM
306
28
0
02 Jun 2025
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning
Haozhe Wang
Chao Qu
Zuming Huang
Wei Chu
Fangzhen Lin
Lei Ma
OffRL
ReLM
SyDa
LRM
VLM
438
157
0
10 Apr 2025
GenPRM: Scaling Test-Time Compute of Process Reward Models via Generative Reasoning
Jian Zhao
Runze Liu
Kaiyan Zhang
Zhimu Zhou
Junqi Gao
...
Jiafei Lyu
Zhouyi Qian
Biqing Qi
Xiu Li
Bowen Zhou
OffRL
LRM
374
21
0
01 Apr 2025
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning
Xiangru Tang
Daniel Shao
Jiwoong Sohn
Jiapeng Chen
Jiayi Zhang
...
Yilun Zhao
Chenglin Wu
Wenqi Shi
Arman Cohan
Mark B. Gerstein
AI4MH
LRM
ELM
LM&MA
280
25
0
10 Mar 2025
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2025
Jiazhen Pan
Che Liu
Junde Wu
Fenglin Liu
Jiayuan Zhu
Hongwei Bran Li
Chen Chen
Cheng Ouyang
Daniel Rueckert
LRM
LM&MA
VLM
425
100
0
26 Feb 2025
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Tianzhe Chu
Yuexiang Zhai
Jihan Yang
Shengbang Tong
Saining Xie
Dale Schuurmans
Quoc V. Le
Sergey Levine
Yi-An Ma
OffRL
620
378
0
28 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
1.2K
5,274
0
22 Jan 2025
Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network
Ritik Mehta
Olha Jurecková
Mark Stamp
289
152
0
25 Dec 2024
Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Bo Wang
Shimin Li
Yunhua Zhou
Qipeng Guo
Qi Zhang
Jiaqi Leng
ELM
AI4TS
LRM
277
47
0
18 Dec 2024
Towards Next-Generation Medical Agent: How o1 is Reshaping Decision-Making in Medical Scenarios
Shaochen Xu
Yimiao Zhou
Ziqiang Liu
Zihao Wu
T. Zhong
...
Quanzheng Li
Andrea Sikora
Xiaoming Zhai
Zhen Xiang
Tianming Liu
LM&MA
247
10
0
16 Nov 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
566
2,655
0
25 Oct 2024
OpenR: An Open Source Framework for Advanced Reasoning with Large Language Models
Jun Wang
Meng Fang
Bo Liu
Muning Wen
Jiachen Zhu
...
Lei Chen
Lionel M. Ni
Linyi Yang
Ying Wen
Weinan Zhang
LRM
202
59
0
12 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Rui Wang
Pengfei Liu
VLM
314
134
0
08 Oct 2024
HybridFlow: A Flexible and Efficient RLHF Framework
European Conference on Computer Systems (EuroSys), 2024
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
569
863
0
28 Sep 2024
A Preliminary Study of o1 in Medicine: Are We Closer to an AI Doctor?
Yunfei Xie
Juncheng Wu
Haoqin Tu
Siwei Yang
Bingchen Zhao
Yongshuo Zong
Qiao Jin
Cihang Xie
Yuyin Zhou
LM&MA
ELM
LRM
298
39
0
23 Sep 2024
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis
Junying Chen
Chi Gui
Anningzhe Gao
Ke Ji
Xidong Wang
Xiang Wan
Benyou Wang
MedIm
AI4CE
LM&MA
122
42
0
18 Jul 2024
UltraMedical: Building Specialized Generalists in Biomedicine
Kaiyan Zhang
Sihang Zeng
Ermo Hua
Ning Ding
Zhang-Ren Chen
...
Xuekai Zhu
Xingtai Lv
Hu Jinfang
Zhiyuan Liu
Bowen Zhou
LM&MA
221
54
0
06 Jun 2024
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
Yubo Wang
Xueguang Ma
Ge Zhang
Yuansheng Ni
Abhranil Chandra
...
Kai Wang
Alex Zhuang
Rongqi Fan
Xiang Yue
Wenhu Chen
LRM
ELM
565
1,011
0
03 Jun 2024
Capabilities of Gemini Models in Medicine
Khaled Saab
Tao Tu
Wei-Hung Weng
Ryutaro Tanno
David Stutz
...
Christopher Semturs
S. S. Mahdavi
Juraj Gottweis
Alan Karthikesalingam
Vivek Natarajan
ELM
AI4MH
LM&MA
209
282
0
29 Apr 2024
Med42 -- Evaluating Fine-Tuning Strategies for Medical LLMs: Full-Parameter vs. Parameter-Efficient Approaches
Clément Christophe
Praveen K Kanithi
Prateek Munjal
Tathagata Raha
Nasir Hayat
...
Charles Chen
Natalia Vassilieva
Boulbaba Ben Amor
Marco AF Pimentel
Shadab Khan
AI4MH
LM&MA
168
64
0
23 Apr 2024
MEIT: Multimodal Electrocardiogram Instruction Tuning on Large Language Models for Report Generation
Zhongwei Wan
Che Liu
Xin Wang
Chaofan Tao
Hui Shen
Zhenwu Peng
Jie Fu
Rossella Arcucci
Huaxiu Yao
366
16
0
07 Mar 2024
Towards Building Multilingual Language Model for Medicine
Pengcheng Qiu
Chaoyi Wu
Xiaoman Zhang
Weixiong Lin
Haicheng Wang
Ya Zhang
Yanfeng Wang
Weidi Xie
LM&MA
ELM
467
141
0
21 Feb 2024
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains
Yanis Labrak
Adrien Bazoge
Emmanuel Morin
P. Gourraud
Mickael Rouvier
Richard Dufour
431
354
0
15 Feb 2024
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
433
1,572
0
20 Nov 2023
HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs
Junying Chen
Xidong Wang
Anningzhe Gao
Feng Jiang
Shunian Chen
...
Chuyi Kong
Jianquan Li
Xiang Wan
Haizhou Li
Benyou Wang
LM&MA
202
105
0
16 Nov 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
823
6,459
0
29 May 2023
HuatuoGPT, towards Taming Language Model to Be a Doctor
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Hongbo Zhang
Junying Chen
Feng Jiang
Fei Yu
Zhihong Chen
...
Zhiyi Zhang
Qingying Xiao
Xiang Wan
Benyou Wang
Haizhou Li
LM&MA
AI4MH
ELM
208
286
0
24 May 2023
Continual Pre-training of Language Models
International Conference on Learning Representations (ICLR), 2023
Zixuan Ke
Yijia Shao
Haowei Lin
Tatsuya Konishi
Gyuhak Kim
Yinan Han
CLL
KELM
423
187
0
07 Feb 2023
Scaling Instruction-Finetuned Language Models
Journal of machine learning research (JMLR), 2022
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
948
3,763
0
20 Oct 2022
STaR: Bootstrapping Reasoning With Reasoning
E. Zelikman
Yuhuai Wu
Jesse Mu
Noah D. Goodman
ReLM
LRM
503
686
0
28 Mar 2022
MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering
ACM Conference on Health, Inference, and Learning (ACM CHIL), 2022
Ankit Pal
Logesh Kumar Umapathi
Malaikannan Sankarasubbu
ELM
LM&MA
401
505
0
27 Mar 2022
Training language models to follow instructions with human feedback
Neural Information Processing Systems (NeurIPS), 2022
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
2.0K
17,090
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
2.2K
14,159
0
28 Jan 2022
What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams
Applied Sciences (Appl. Sci.), 2020
Di Jin
Eileen Pan
Nassim Oufattole
W. Weng
Hanyi Fang
Peter Szolovits
FaML
ELM
LM&MA
404
1,205
0
28 Sep 2020
PubMedQA: A Dataset for Biomedical Research Question Answering
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2019
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
719
1,253
0
13 Sep 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
1.1K
23,699
0
20 Jul 2017