arXiv: 2410.09302
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
11 October 2024
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Q. Gu
Lin Yan
Quanquan Gu
OffRL
LRM
Papers citing "Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization"
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDa
OffRL
ReLM
LRM
99
3
0
07 Apr 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG
LRM
48
2
0
19 Mar 2025
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
Junbo Li
Zhangyang Wang
Qiang Liu
OffRL
89
0
0
09 Feb 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
32
0
0
09 Feb 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
50
0
0
31 Dec 2024