Learning to Reason from Feedback at Test-Time

16 February 2025
Yanyang Li
Michael Lyu
Liwei Wang
    LRM
Abstract

Solving complex tasks in a single attempt is challenging for large language models (LLMs). Iterative interaction with the environment and feedback is often required to achieve success, making effective feedback utilization a critical topic. Existing approaches either struggle with length generalization or rely on naive retries without leveraging prior information. In this paper, we introduce FTTT, a novel paradigm that formulates feedback utilization as an optimization problem at test time. Additionally, we propose a learnable test-time optimizer, OpTune, to effectively exploit feedback. Experiments on two LLMs across four reasoning datasets demonstrate that FTTT and OpTune achieve superior scalability and performance.

@article{li2025_2502.15771,
  title={Learning to Reason from Feedback at Test-Time},
  author={Yanyang Li and Michael Lyu and Liwei Wang},
  journal={arXiv preprint arXiv:2502.15771},
  year={2025}
}