2411.11694
Cited By
v1
v2
v3
v4 (latest)
Enhancing LLM Reasoning with Reward-guided Tree Search
3 January 2025
Jinhao Jiang
Zhongfu Chen
Yingqian Min
Jie Chen
Xiaoxue Cheng
Jiapeng Wang
Yiru Tang
Haoxiang Sun
Jia Deng
Wayne Xin Zhao
Zhengyang Liang
Dong Yan
Jian Xie
Ziyi Wang
Ji-Rong Wen
LRM
Papers citing
"Enhancing LLM Reasoning with Reward-guided Tree Search"
24 / 24 papers shown
Title
Com²: A Causal-Guided Benchmark for Exploring Complex Commonsense Reasoning in Large Language Models
Kai Xiong
Xiao Ding
Yixin Cao
Yuxiong Yan
Li Du
Yufei Zhang
Jinglong Gao
Jiaqian Liu
Bing Qin
Ting Liu
ReLM
LRM
25
0
0
08 Jun 2025
Evaluation is All You Need: Strategic Overclaiming of LLM Reasoning Capabilities Through Evaluation Design
Lin Sun
Weihong Lin
Jinzhu Wu
Yongfu Zhu
Xiaoqi Jian
...
Change Jia
Linglin Zhang
Sai-er Hu
Yuhan Wu
Xiangzheng Zhang
ELM
LRM
118
0
0
05 Jun 2025
Every Rollout Counts: Optimal Resource Allocation for Efficient Test-Time Scaling
Xinglin Wang
Yiwei Li
Shaoxiong Feng
Peiwen Yuan
Y. Zhang
Jiayi Shi
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
LRM
9
0
0
30 May 2025
Pangu DeepDiver: Adaptive Search Intensity Scaling via Open-Web Reinforcement Learning
Wenxuan Shi
Haochen Tan
Chuqiao Kuang
Xiaoguang Li
Xiaozhe Ren
...
Hanting Chen
Yasheng Wang
Lifeng Shang
Fisher Yu
Yunhe Wang
RALM
18
0
0
30 May 2025
Pedagogy-R1: Pedagogically-Aligned Reasoning Model with Balanced Educational Benchmark
Unggi Lee
Jaeyong Lee
Jiyeong Bae
Yeil Jeong
Junbo Koh
Gyeonggeon Lee
Gunho Lee
Taekyung Ahn
Hyeoncheol Kim
LRM
55
0
0
24 May 2025
Stepwise Reasoning Checkpoint Analysis: A Test Time Scaling Method to Enhance LLMs' Reasoning
Zezhong Wang
Xingshan Zeng
Weiwen Liu
Yijiao Wang
Liangyou Li
Yasheng Wang
Lifeng Shang
Xin Jiang
Qun Liu
Kam-Fai Wong
LRM
67
0
0
23 May 2025
EquivPruner: Boosting Efficiency and Quality in LLM-Based Search via Action Pruning
Jiawei Liu
Qisi Chen
Jianshu Zhang
Quan Liu
Defu Lian
LLMAG
158
0
0
22 May 2025
RoT: Enhancing Table Reasoning with Iterative Row-Wise Traversals
Xuanliang Zhang
Dingzirui Wang
Keyan Xu
Qingfu Zhu
Wanxiang Che
LMTD
ReLM
LRM
100
0
0
21 May 2025
ZeroSearch: Incentivize the Search Capability of LLMs without Searching
Hao Sun
Zile Qiao
Jiayan Guo
Xuanbo Fan
Yingyan Hou
Yong Jiang
Pengjun Xie
Yan Zhang
Fei Huang
Jingren Zhou
OffRL
129
12
0
07 May 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
220
5
0
05 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
Mingxuan Yuan
Jianye Hao
Feng Wu
ReLM
LRM
154
1
0
03 May 2025
Slow Thinking for Sequential Recommendation
Junjie Zhang
Beichen Zhang
Wenqi Sun
Hongyu Lu
Wayne Xin Zhao
Yu Chen
Ji-Rong Wen
OffRL
LRM
105
1
0
13 Apr 2025
A Comprehensive Survey of Reward Models: Taxonomy, Applications, Challenges, and Future
Jialun Zhong
Wei Shen
Yanzeng Li
Songyang Gao
Hua Lu
Yicheng Chen
Yang Zhang
Wei Zhou
Jinjie Gu
Lei Zou
LRM
118
11
0
12 Apr 2025
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding
Sakhinana Sagar Srinivas
Akash Das
Shivam Gupta
Venkataramana Runkana
OffRL
120
1
0
02 Apr 2025
RARE: Retrieval-Augmented Reasoning Modeling
Zhengren Wang
Jiayang Yu
Dongsheng Ma
Zhe Chen
Yu Wang
...
Feiyu Xiong
Yanfeng Wang
Weinan E
Linpeng Tang
Wentao Zhang
RALM
LRM
113
3
0
30 Mar 2025
Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation
Yijia Luo
Yulin Song
Xingyao Zhang
Jiaheng Liu
Weixun Wang
Gengru Chen
Wenbo Su
Bo Zheng
LRM
114
11
0
20 Mar 2025
Mitigating Visual Forgetting via Take-along Visual Conditioning for Multi-modal Long CoT Reasoning
Hai-Long Sun
Zhun Sun
Houwen Peng
Han-Jia Ye
LRM
128
6
0
17 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yansen Wang
Shengqiong Wu
Yize Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
208
31
0
16 Mar 2025
Rewarding Curse: Analyze and Mitigate Reward Modeling Issues for LLM Reasoning
Jiachun Li
Pengfei Cao
Yubo Chen
Jiexin Xu
Huaijun Li
Xiaojian Jiang
Kang Liu
Jun Zhao
LRM
99
0
0
07 Mar 2025
An Empirical Study on Eliciting and Improving R1-like Reasoning Models
Zhongfu Chen
Yingqian Min
Beichen Zhang
Jie Chen
Jinhao Jiang
...
Xu Miao
Yaojie Lu
Lei Fang
Zhongyuan Wang
Ji-Rong Wen
ReLM
OffRL
LRM
156
37
0
06 Mar 2025
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Yunhua Zhou
Xipeng Qiu
LRM
174
20
0
17 Feb 2025
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning
Zhongzhen Huang
Gui Geng
Shengyi Hua
Zhen Huang
Haoyang Zou
Shanghang Zhang
Pengfei Liu
Xiaofan Zhang
LRM
98
15
0
11 Jan 2025
Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking
Xiaoxue Cheng
Junyi Li
Wayne Xin Zhao
Ji-Rong Wen
HILM
LRM
103
0
0
02 Jan 2025
Reasoning Through Execution: Unifying Process and Outcome Rewards for Code Generation
Zhuohao Yu
Weizheng Gu
Yidong Wang
Xingru Jiang
Zhengran Zeng
Jindong Wang
Wei Ye
Shikun Zhang
LRM
169
5
0
19 Dec 2024