Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning

Main: 9 Pages
6 Figures
Bibliography: 2 Pages
3 Tables
Appendix: 14 Pages

Abstract

Research on applying Reinforcement Learning (RL) to Large Language Models (LLMs) has mostly focused on single-turn problems, such as mathematical reasoning or single-shot code generation. While these problems can be viewed as token-level multi-turn Markov decision processes (MDPs), this view corresponds to a degenerate case of multi-turn interaction in which the environment provides no feedback. In contrast, many real-world domains, such as software engineering (SWE), require rich multi-turn interactions with a stateful environment that responds to each action with a non-trivial observation.
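To make the distinction concrete, the following is a minimal illustrative sketch, not the paper's environment or training setup. `ToyRepoEnv`, `scripted_policy`, and `rollout` are hypothetical names introduced here, and the scripted policy merely stands in for an LLM. The sketch shows a stateful environment that returns an observation after every action, so that later actions can depend on earlier feedback.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepoEnv:
    """Hypothetical stateful environment: a 'repository' with one failing test."""
    fixed: bool = False
    history: list = field(default_factory=list)

    def step(self, action: str) -> tuple[str, float, bool]:
        """Apply an action and return (observation, reward, done)."""
        self.history.append(action)
        if action == "run_tests":
            return ("tests passed" if self.fixed else "1 test failed"), 0.0, False
        if action == "apply_patch":
            self.fixed = True  # the action changes the environment state
            return "patch applied", 0.0, False
        if action == "submit":
            # Reward arrives only at the end of the episode.
            return "episode finished", (1.0 if self.fixed else 0.0), True
        return "unknown command", 0.0, False


def scripted_policy(observations: list[str]) -> str:
    """Stand-in for an LLM policy: chooses the next action from past observations."""
    if not observations:
        return "run_tests"
    last = observations[-1]
    if last == "1 test failed":
        return "apply_patch"
    if last == "patch applied":
        return "run_tests"
    return "submit"


def rollout(env, policy, max_turns: int = 10):
    """Run one multi-turn episode; return the action history and total reward."""
    observations, total = [], 0.0
    for _ in range(max_turns):
        action = policy(observations)
        obs, reward, done = env.step(action)
        observations.append(obs)  # feedback becomes input to the next turn
        total += reward
        if done:
            break
    return env.history, total


# Multi-turn: the policy reacts to observations before submitting.
multi_turn = rollout(ToyRepoEnv(), scripted_policy)
# Degenerate single-turn case: one action, no feedback is ever used.
single_turn = rollout(ToyRepoEnv(), lambda observations: "submit")
```

In this toy setting the multi-turn rollout inspects the test output, patches, re-checks, and then submits, while the single-turn rollout submits blindly. The gap between the two is the kind of environment feedback that the single-turn view leaves out.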