Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning


20 June 2024
Chaojie Wang
Yanchen Deng
Zhiyi Lyu
Liang Zeng
Jujie He
Shuicheng Yan
Bo An
    LRM
    ReLM

Papers citing "Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning"

41 / 41 papers shown
Title
Prejudge-Before-Think: Enhancing Large Language Models at Test-Time by Process Prejudge Reasoning
J. T. Wang
Jin Jiang
Yang Liu
M. Zhang
Xunliang Cai
LRM
32
0
0
18 Apr 2025
A Short Survey on Small Reasoning Models: Training, Inference, Applications and Research Directions
Chengyu Wang
Taolin Zhang
Richang Hong
Jun Huang
ReLM
LRM
32
1
0
12 Apr 2025
The Mind in the Machine: A Survey of Incorporating Psychological Theories in LLMs
Zizhou Liu
Ziwei Gong
Lin Ai
Zheng Hui
Run Chen
Colin Wayne Leach
Michelle R. Greene
Julia Hirschberg
LLMAG
50
0
0
28 Mar 2025
Process Reward Modeling with Entropy-Driven Uncertainty
Lang Cao
Renhong Chen
Yingtian Zou
Chao Peng
Wu Ning
...
Y. Wang
Peishuo Su
Mofan Peng
Zijie Chen
Yitong Li
34
0
0
28 Mar 2025
Controlling Large Language Model with Latent Actions
Chengxing Jia
Ziniu Li
Pengyuan Wang
Yi-Chen Li
Zhenyu Hou
Yuxiao Dong
Y. Yu
46
0
0
27 Mar 2025
Enhancing LLM Reasoning with Iterative DPO: A Comprehensive Empirical Investigation
Songjun Tu
Jiahao Lin
Xiangyu Tian
Qichao Zhang
Linjing Li
...
Nan Xu
Wei He
Xiangyuan Lan
D. Jiang
Dongbin Zhao
LRM
42
2
0
17 Mar 2025
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training
Tong Wei
Yijun Yang
Junliang Xing
Yuanchun Shi
Zongqing Lu
Deheng Ye
OffRL
LRM
39
1
0
11 Mar 2025
Better Process Supervision with Bi-directional Rewarding Signals
Wenxiang Chen
Wei He
Zhiheng Xi
Honglin Guo
Boyang Hong
...
Nijun Li
Tao Gui
Yun Li
Qi Zhang
Xuanjing Huang
LRM
42
2
0
06 Mar 2025
Multi2: Multi-Agent Test-Time Scalable Framework for Multi-Document Processing
Juntai Cao
Xiang Zhang
Raymond Li
Chuyuan Li
Shafiq R. Joty
Giuseppe Carenini
54
1
0
27 Feb 2025
Data-Efficient Multi-Agent Spatial Planning with LLMs
Huangyuan Su
Aaron Walsman
Daniel Garces
Sham Kakade
Stephanie Gil
LLMAG
Presented at ResearchTrend Connect | LLMAG on 28 Mar 2025
126
0
0
26 Feb 2025
How Far are LLMs from Real Search? A Comprehensive Study on Efficiency, Completeness, and Inherent Capabilities
M. Lin
Hui Liu
X. Tang
Jingying Zeng
Zhenwei Dai
Chen Luo
Zheng Li
Xiang Zhang
Qi He
Suhang Wang
OffRL
LRM
39
0
0
25 Feb 2025
AgentRM: Enhancing Agent Generalization with Reward Modeling
Yu Xia
Jingru Fan
Weize Chen
Siyu Yan
Xin Cong
Zhong Zhang
Y. Lu
Yankai Lin
Zhiyuan Liu
Maosong Sun
43
1
0
25 Feb 2025
Dynamic Parallel Tree Search for Efficient LLM Reasoning
Yifu Ding
Wentao Jiang
Shunyu Liu
Yongcheng Jing
J. Guo
...
Zengmao Wang
Z. Liu
Bo Du
X. Liu
Dacheng Tao
LRM
44
4
0
22 Feb 2025
SIFT: Grounding LLM Reasoning in Contexts via Stickers
Zihao Zeng
Xuyao Huang
Boxiu Li
Zhijie Deng
LRM
33
2
0
19 Feb 2025
S$^2$R: Teaching LLMs to Self-verify and Self-correct via Reinforcement Learning
Ruotian Ma
Peisong Wang
Cheng Liu
Xingyan Liu
Jiaqi Chen
Bang Zhang
Xin Zhou
Nan Du
Jia Li
LRM
54
2
0
18 Feb 2025
Ontology-Guided Reverse Thinking Makes Large Language Models Stronger on Knowledge Graph Question Answering
Runxuan Liu
Bei Luo
Jiaqi Li
Baoxin Wang
Ming Liu
Dayong Wu
Shijin Wang
Bing Qin
LRM
36
0
0
17 Feb 2025
Bag of Tricks for Inference-time Computation of LLM Reasoning
Fan Liu
Wenshuo Chao
Naiqiang Tan
Hao Liu
OffRL
LRM
69
3
0
11 Feb 2025
Policy Guided Tree Search for Enhanced LLM Reasoning
Yang Li
LRM
45
0
0
04 Feb 2025
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Zongyu Lin
Yao Tang
Xingcheng Yao
Da Yin
Ziniu Hu
Yizhou Sun
Kai-Wei Chang
LRM
45
3
0
04 Feb 2025
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Xinyu Guan
L. Zhang
Yifei Liu
Ning Shang
Youran Sun
Yi Zhu
Fan Yang
Mao Yang
LRM
SyDa
ReLM
52
74
0
08 Jan 2025
Think&Cite: Improving Attributed Text Generation with Self-Guided Tree Search and Progress Reward Modeling
Junyi Li
Hwee Tou Ng
LRM
66
1
0
19 Dec 2024
AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning
Kun Xiang
Zhili Liu
Zihao Jiang
Yunshuang Nie
Runhui Huang
...
Yihan Zeng
J. Han
Lanqing Hong
Hang Xu
Xiaodan Liang
LRM
99
10
0
18 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
33
3
0
17 Nov 2024
Matryoshka: Learning to Drive Black-Box LLMs with LLMs
Changhao Li
Yuchen Zhuang
Rushi Qiang
Haotian Sun
H. Dai
Chao Zhang
Bo Dai
LRM
13
4
0
28 Oct 2024
DAWN-ICL: Strategic Planning of Problem-solving Trajectories for Zero-Shot In-Context Learning
Xinyu Tang
Xiaolei Wang
Wayne Xin Zhao
Ji-Rong Wen
27
3
0
26 Oct 2024
$C^2$: Scalable Auto-Feedback for LLM-based Chart Generation
Woosung Koh
Jang Han Yoon
M. Lee
Youngjin Song
Jaegwan Cho
Jaehyun Kang
Taehyeon Kim
Se-Young Yun
Youngjae Yu
B. Lee
37
0
0
24 Oct 2024
Process Reward Model with Q-Value Rankings
W. Li
Yixuan Li
LRM
39
13
0
15 Oct 2024
VerifierQ: Enhancing LLM Test Time Compute with Q-Learning-based Verifiers
Jianing Qi
Hao Tang
Zhigang Zhu
OffRL
LRM
18
0
0
10 Oct 2024
Animating the Past: Reconstruct Trilobite via Video Generation
Xiaoran Wu
Zien Huang
Chonghan Yu
VGen
35
1
0
10 Oct 2024
O1 Replication Journey: A Strategic Progress Report -- Part 1
Yiwei Qin
Xuefeng Li
Haoyang Zou
Yixiu Liu
Shijie Xia
...
Yixin Ye
Weizhe Yuan
Hector Liu
Y. Li
Pengfei Liu
VLM
35
61
0
08 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
19
42
0
03 Oct 2024
Quantifying Generalization Complexity for Large Language Models
Zhenting Qi
Hongyin Luo
Xuliang Huang
Zhuokai Zhao
Yibo Jiang
Xiangjun Fan
Himabindu Lakkaraju
James Glass
LRM
ELM
21
5
0
02 Oct 2024
Enhancing Language Model Rationality with Bi-Directional Deliberation Reasoning
Yadong Zhang
Shaoguang Mao
Wenshan Wu
Yan Xia
Tao Ge
Man Lan
Furu Wei
35
1
0
08 Jul 2024
LiteSearch: Efficacious Tree Search for LLM
Ante Wang
Linfeng Song
Ye Tian
Baolin Peng
Dian Yu
Haitao Mi
Jinsong Su
Dong Yu
33
14
0
29 Jun 2024
Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs
Xuan Zhang
Chao Du
Tianyu Pang
Qian Liu
Wei Gao
Min-Bin Lin
LRM
AI4CE
31
34
0
13 Jun 2024
A Survey on Large Language Model-Based Game Agents
Sihao Hu
Tiansheng Huang
Gaowen Liu
Ramana Rao Kompella
Gaowen Liu
Selim Furkan Tekin
Yichang Xu
Zachary Yahn
Ling Liu
LLMAG
LM&Ro
AI4CE
LM&MA
51
49
0
02 Apr 2024
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
197
2,953
0
22 Mar 2023
Complexity-Based Prompting for Multi-Step Reasoning
Yao Fu
Hao-Chun Peng
Ashish Sabharwal
Peter Clark
Tushar Khot
ReLM
LRM
152
298
0
03 Oct 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
297
3,163
0
21 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022