Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2501.12599
Cited By
Kimi k1.5: Scaling Reinforcement Learning with LLMs
22 January 2025
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
Cheng Chen
Cheng Li
Chenjun Xiao
C. Du
Chonghua Liao
C. Tang
C. Wang
Dehao Zhang
Enming Yuan
Enzhe Lu
Fengxiang Tang
Flood Sung
Guangda Wei
Guokun Lai
Haiqing Guo
Han Zhu
Hao Ding
Hao Hu
Hao Yang
Hao Zhang
Haotian Yao
Haotian Zhao
Haoyu Lu
H. Li
Haozhen Yu
Hongcheng Gao
Huabin Zheng
Huan Yuan
Jia-Yu Chen
Jianhang Guo
Jianlin Su
J. Wang
J. Zhao
Jin Zhang
J. Liu
Junjie Yan
J. Wu
Lidong Shi
Ling Ye
L. Yu
Mengnan Dong
N. Zhang
Ningchen Ma
Qiwei Pan
Qucheng Gong
S. Liu
Shengling Ma
Shupeng Wei
Sihan Cao
S. Huang
Tao Jiang
W. Gao
Weimin Xiong
Weiran He
W. R. Huang
W. Wu
Wenyang He
Xianghui Wei
Xianqing Jia
Xingzhe Wu
Xinran Xu
Xinxing Zu
Xinyu Zhou
Xuehai Pan
Y. Charles
Yang Li
Y. Hu
Y. Liu
Y. Chen
Yejie Wang
Yibo Liu
Yidao Qin
Y. Liu
Y. Yang
Yiping Bao
Yulun Du
Yuxin Wu
Yuzhi Wang
Zaida Zhou
Z. Wang
Z. Li
Zhen Zhu
Zheng Zhang
Zhexu Wang
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Kimi k1.5: Scaling Reinforcement Learning with LLMs"
50 / 100 papers shown
Title
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1
Yi Chen
Yuying Ge
Rui Wang
Yixiao Ge
Lu Qiu
Ying Shan
Xihui Liu
ReLM
VLM
OffRL
LRM
52
2
0
31 Mar 2025
Efficient Inference for Large Reasoning Models: A Survey
Y. Liu
Jiaying Wu
Yufei He
Hongcheng Gao
Hongyu Chen
Baolong Bi
Jiaheng Zhang
Zhiqi Huang
Bryan Hooi
LLMAG
LRM
58
7
0
29 Mar 2025
Reasoning-SQL: Reinforcement Learning with SQL Tailored Partial Rewards for Reasoning-Enhanced Text-to-SQL
Mohammadreza Pourreza
Shayan Talaei
Ruoxi Sun
Xingchen Wan
Hailong Li
Azalia Mirhoseini
Amin Saberi
Sercan Ö. Arik
ReLM
AI4TS
LRM
37
1
0
29 Mar 2025
Think Before Recommend: Unleashing the Latent Reasoning Power for Sequential Recommendation
Jiakai Tang
Sunhao Dai
Teng Shi
Jun Xu
X. Chen
Wen Chen
Wu Jian
Yuning Jiang
LRM
52
5
0
28 Mar 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
W. Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Y. Zhuang
LM&Ro
LRM
65
2
0
27 Mar 2025
ReaRAG: Knowledge-guided Reasoning Enhances Factuality of Large Reasoning Models with Iterative Retrieval Augmented Generation
Zhicheng Lee
S. Cao
Jinxin Liu
J. Zhang
Weichuan Liu
Xiaoyin Che
Lei Hou
Juanzi Li
ReLM
LRM
77
2
0
27 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
76
11
0
27 Mar 2025
Video-R1: Reinforcing Video Reasoning in MLLMs
Kaituo Feng
Kaixiong Gong
B. Li
Zonghao Guo
Yibing Wang
Tianshuo Peng
Benyou Wang
Xiangyu Yue
SyDa
AI4TS
LRM
46
13
0
27 Mar 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
59
2
0
26 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM
OffRL
LRM
90
6
0
26 Mar 2025
Learning to chain-of-thought with Jensen's evidence lower bound
Yunhao Tang
Sid Wang
Rémi Munos
BDL
OffRL
LRM
45
0
0
25 Mar 2025
Scaling Laws of Synthetic Data for Language Models
Zeyu Qin
Qingxiu Dong
Xingxing Zhang
Li Dong
Xiaolong Huang
...
Hany Awadalla
Yi R. Fung
Weizhu Chen
Minhao Cheng
Furu Wei
SyDa
71
1
0
25 Mar 2025
Process or Result? Manipulated Ending Tokens Can Mislead Reasoning LLMs to Ignore the Correct Reasoning Steps
Yu Cui
Bryan Hooi
Yujun Cai
Yiwei Wang
LRM
24
2
0
25 Mar 2025
RL-finetuning LLMs from on- and off-policy data with a single algorithm
Yunhao Tang
Taco Cohen
David W. Zhang
Michal Valko
Rémi Munos
OffRL
37
1
0
25 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Y. Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLM
VLM
LRM
74
3
0
25 Mar 2025
Video-T1: Test-Time Scaling for Video Generation
F. Liu
Hanyang Wang
Yimo Cai
Kaiyan Zhang
Xiaohang Zhan
Yueqi Duan
DiffM
VGen
76
1
0
24 Mar 2025
SimpleRL-Zoo: Investigating and Taming Zero Reinforcement Learning for Open Base Models in the Wild
Weihao Zeng
Yuzhen Huang
Qian Liu
Wei Liu
Keqing He
Zejun Ma
Junxian He
OffRL
ReLM
LRM
88
28
0
24 Mar 2025
Mind with Eyes: from Language Reasoning to Multimodal Reasoning
Zhiyu Lin
Yifei Gao
Xian Zhao
Yunfan Yang
Jitao Sang
LRM
42
1
0
23 Mar 2025
Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilities
Weixiang Zhao
Xingyu Sui
Jiahe Guo
Yulin Hu
Yang Deng
Yanyan Zhao
Bing Qin
Wanxiang Che
Tat-Seng Chua
Ting Liu
ELM
LRM
52
4
0
23 Mar 2025
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Zhiyuan Liu
Yuting Zhang
Feng Liu
Changwang Zhang
Ying Sun
Jun Wang
LRM
66
2
0
20 Mar 2025
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Yang Sui
Yu-Neng Chuang
Guanchu Wang
Jiamu Zhang
Tianyi Zhang
...
Hongyi Liu
Andrew Wen
Shaochen
Zhong
Hanjie Chen
OffRL
ReLM
LRM
58
21
0
20 Mar 2025
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Yijia Luo
Yulin Song
Xingyao Zhang
Jiaheng Liu
Weixun Wang
Gengru Chen
Wenbo Su
Bo Zheng
LRM
52
4
0
20 Mar 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
86
2
0
18 Mar 2025
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Qiying Yu
Z. Zhang
Ruofei Zhu
Yufeng Yuan
Xiaochen Zuo
...
Ya-Qin Zhang
Lin Yan
Mu Qiao
Yonghui Wu
Mingxuan Wang
OffRL
LRM
59
41
0
18 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
H. Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
47
17
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Y. Wang
Shengqiong Wu
Y. Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
74
7
0
16 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
62
11
0
14 Mar 2025
Large Language Models for Outpatient Referral: Problem Definition, Benchmarking and Challenges
Xiaoxiao Liu
Qingying Xiao
Junying Chen
Xiangyi Feng
Xiangbo Wu
...
Xiang Wan
Jian Chang
Guangjun Yu
Yan Hu
Benyou Wang
LM&MA
LRM
56
0
0
11 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
86
29
0
10 Mar 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Y. Liu
Jin Jiang
M. Zhang
Jian Shao
Yueting Zhuang
LRM
ReLM
53
3
0
09 Mar 2025
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Huatong Song
Jinhao Jiang
Yingqian Min
Jie Chen
Z. Chen
Wayne Xin Zhao
Lei Fang
Ji-Rong Wen
AI4TS
LRM
KELM
93
6
0
07 Mar 2025
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen
J. Zhang
Jieyun Huang
Shuming Shi
Wenjing Zhang
Jiangze Yan
Ning Wang
Kai Wang
Shiguo Lian
LRM
66
12
0
06 Mar 2025
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Z. Chen
Yingqian Min
Beichen Zhang
Jie Chen
Jinhao Jiang
...
Xu Miao
Y. Lu
Lei Fang
Zhongyuan Wang
Ji-Rong Wen
ReLM
OffRL
LRM
73
14
0
06 Mar 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Z. Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
72
5
0
04 Mar 2025
Visual-RFT: Visual Reinforcement Fine-Tuning
Ziyu Liu
Zeyi Sun
Yuhang Zang
Xiaoyi Dong
Y. Cao
Haodong Duan
D. Lin
Jiaqi Wang
ObjD
VLM
LRM
62
40
0
03 Mar 2025
Safety Tax: Safety Alignment Makes Your Large Reasoning Models Less Reasonable
Tiansheng Huang
Sihao Hu
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
Yichang Xu
Ling Liu
48
8
0
01 Mar 2025
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete
Yuheng Ji
Huajie Tan
Jiayu Shi
Xiaoshuai Hao
Yuan Zhang
...
Huaihai Lyu
Xiaolong Zheng
Jiaming Liu
Zhongyuan Wang
Shanghang Zhang
77
5
0
28 Feb 2025
FINEREASON: Evaluating and Improving LLMs' Deliberate Reasoning through Reflective Puzzle Solving
Guizhen Chen
Weiwen Xu
Hao Zhang
Hou Pong Chan
Chaoqun Liu
Lidong Bing
Deli Zhao
Anh Tuan Luu
Yu Rong
ReLM
LRM
43
3
0
27 Feb 2025
SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning
Zexiong Ma
Chao Peng
Pengfei Gao
Xiangxin Meng
Yanzhen Zou
Bing Xie
MoMe
OffRL
LRM
57
4
0
27 Feb 2025
MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
LRM
34
5
0
24 Feb 2025
Theoretical Physics Benchmark (TPBench) -- a Dataset and Study of AI Reasoning Capabilities in Theoretical Physics
Daniel J.H. Chung
Zhiqi Gao
Yurii Kvasiuk
Tianyi Li
Moritz Münchmeyer
Maja Rudolph
Frederic Sala
Sai Chaitanya Tadepalli
AIMat
39
3
0
19 Feb 2025
Atom of Thoughts for Markov LLM Test-Time Scaling
Fengwei Teng
Zhaoyang Yu
Quan Shi
Jiayi Zhang
Chenglin Wu
Yuyu Luo
MU
LRM
49
13
0
17 Feb 2025
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanwei Li
Yu Qi
...
Shen Yan
Bo Zhang
Chaoyou Fu
Peng Gao
Hongsheng Li
MLLM
LRM
77
21
0
13 Feb 2025
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Zongyu Lin
Yao Tang
Xingcheng Yao
Da Yin
Ziniu Hu
Yizhou Sun
Kai-Wei Chang
LRM
47
3
0
04 Feb 2025
Brief analysis of DeepSeek R1 and its implications for Generative AI
Sarah Mercer
Samuel Spillard
Daniel P. Martin
58
10
0
04 Feb 2025
Process Reinforcement through Implicit Rewards
Ganqu Cui
Lifan Yuan
Z. Wang
Hanbin Wang
Wendi Li
...
Yu Cheng
Zhiyuan Liu
Maosong Sun
Bowen Zhou
Ning Ding
OffRL
LRM
68
50
0
03 Feb 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun-Xiong Xia
Tianyi Wu
Zhiwei Xue
Y. Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
111
13
0
30 Jan 2025
Do NOT Think That Much for 2+3=? On the Overthinking of o1-Like LLMs
Xingyu Chen
Jiahao Xu
Tian Liang
Zhiwei He
Jianhui Pang
...
Z. Zhang
Rui Wang
Zhaopeng Tu
Haitao Mi
Dong Yu
LRM
ReLM
51
90
0
30 Dec 2024
Understanding Layer Significance in LLM Alignment
Guangyuan Shi
Zexin Lu
Xiaoyu Dong
Wenlong Zhang
Xuanyu Zhang
Yujie Feng
Xiao-Ming Wu
38
2
0
23 Oct 2024
Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective
Yotam Wolf
Binyamin Rothberg
Dorin Shteyman
Amnon Shashua
13
0
0
26 Sep 2024
Previous
1
2