ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.09302
  4. Cited By
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization

11 October 2024
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Q. Gu
Lin Yan
Quanquan Gu
Lin Yan
    OffRL
    LRM
ArXivPDFHTML

Papers citing "Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization"

5 / 5 papers shown
Title
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
Anna Goldie
Azalia Mirhoseini
Hao Zhou
Irene Cai
Christopher D. Manning
SyDa
OffRL
ReLM
LRM
99
3
0
07 Apr 2025
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG
LRM
48
2
0
19 Mar 2025
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
Junbo Li
Zhangyang Wang
Qiang Liu
OffRL
89
0
0
09 Feb 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Qingyue Zhao
Kaixuan Ji
Heyang Zhao
Tong Zhang
Q. Gu
OffRL
30
0
0
09 Feb 2025
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik
Alex Doboli
OffRL
ELM
50
0
0
31 Dec 2024
1