ResearchTrend.AI
Efficient Single-Pass Training for Multi-Turn Reasoning

25 April 2025
Ritesh Goru
Shanay Mehta
Prateek Jain
Abstract

Training Large Language Models (LLMs) to generate explicit reasoning before they produce an answer has been shown to improve their performance on tasks such as mathematics and coding. However, fine-tuning LLMs on multi-turn reasoning datasets presents a unique challenge: LLMs generate reasoning tokens that are excluded from the inputs to subsequent turns. This discrepancy prevents us from processing an entire conversation in a single forward pass, an optimization readily available when fine-tuning on a multi-turn non-reasoning dataset. This paper proposes a novel approach that overcomes this limitation through response token duplication and a custom attention mask that enforces appropriate visibility constraints. Our approach significantly reduces training time and enables efficient fine-tuning on multi-turn reasoning datasets.
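To make the duplication-plus-masking idea concrete, the sketch below builds a token-level attention mask for a conversation laid out as segments, with each assistant response duplicated once without its reasoning. The segment layout and the exact visibility rules (only duplicated responses are visible to later turns; reasoning is visible only to its own turn's response) are our illustrative reading of the abstract, not the paper's specification.

```python
import numpy as np

def single_pass_mask(segments):
    """Build a boolean attention mask for single-pass multi-turn
    reasoning fine-tuning via response-token duplication.

    segments: list of (length, kind, turn), where kind is one of
      "user", "reasoning", "response", "response_dup".
    Returns a (T, T) boolean array; entry [i, j] says whether
    query token i may attend to key token j.
    NOTE: the visibility rules below are an assumption made for
    illustration, not the paper's exact construction.
    """
    def visible(a, b):
        # May every token of segment a attend to all of earlier segment b?
        _, ka, ta = segments[a]
        _, kb, tb = segments[b]
        if kb == "user":
            return True                           # prompts are always visible
        if kb == "response_dup":
            return tb < ta                        # dups stand in for past answers
        if kb == "reasoning":
            return tb == ta and ka == "response"  # only own-turn response sees it
        return False                              # original responses stay hidden

    lengths = [l for l, _, _ in segments]
    offsets = np.cumsum([0] + lengths)
    T = offsets[-1]
    mask = np.zeros((T, T), dtype=bool)
    for a in range(len(segments)):
        ia, ja = offsets[a], offsets[a + 1]
        # Causal attention within each segment.
        mask[ia:ja, ia:ja] = np.tril(np.ones((ja - ia, ja - ia), dtype=bool))
        # Full visibility to permitted earlier segments.
        for b in range(a):
            if visible(a, b):
                mask[ia:ja, offsets[b]:offsets[b + 1]] = True
    return mask
```

With this mask, the duplicated response of turn t sees exactly what the next turn will see at inference time (prompts and earlier duplicates, but no reasoning), so every turn's loss can be computed in one forward pass over the concatenated sequence.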

View on arXiv
@article{goru2025_2504.18246,
  title={Efficient Single-Pass Training for Multi-Turn Reasoning},
  author={Ritesh Goru and Shanay Mehta and Prateek Jain},
  journal={arXiv preprint arXiv:2504.18246},
  year={2025}
}