arXiv:2509.08755
AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning
10 September 2025
Zhiheng Xi
J. Huang
Chenyang Liao
Baodai Huang
Honglin Guo
Jiaqi Liu
Rui Zheng
Junjie Ye
Jiazheng Zhang
Wenxiang Chen
Wei He
Yiwen Ding
Guanyu Li
Zehui Chen
Zhengyin Du
Xuesong Yao
Yufei Xu
Jiecao Chen
Tao Gui
Zuxuan Wu
Qi Zhang
Xuanjing Huang
Yu-Gang Jiang
HuggingFace (53 upvotes)
Github (14★)
Papers citing
"AgentGym-RL: Training LLM Agents for Long-Horizon Decision Making through Multi-Turn Reinforcement Learning"
11 / 11 papers shown
DVPO: Distributional Value Modeling-based Policy Optimization for LLM Post-Training
Dingwei Zhu
Zhiheng Xi
Shihan Dou
Yuhui Wang
Sixian Li
...
Caishuang Huang
Yunke Zhang
Demei Yan
Yuran Wang
Tao Gui
OffRL
128
0
0
03 Dec 2025
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration
Hongjin Su
Shizhe Diao
Ximing Lu
Mingjie Liu
Jiacheng Xu
...
Evelina Bakhturina
Tao Yu
Yejin Choi
Jan Kautz
Pavlo Molchanov
257
4
0
26 Nov 2025
Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs
Meng Lu
Ran Xu
Yi Fang
Wenxuan Zhang
Yue Yu
...
Guanghua Xiao
Hanrui Wang
Di Jin
W. Shi
Xuan Wang
LRM
139
1
0
24 Nov 2025
Graph-Enhanced Policy Optimization in LLM Agent Training
Jiazhen Yuan
Wei Zhao
Zhengbiao Bai
87
0
0
30 Oct 2025
DeepAgent: A General Reasoning Agent with Scalable Toolsets
Xiaoxi Li
Wenxiang Jiao
Jiarui Jin
Guanting Dong
Jiajie Jin
...
H. Wang
Yutao Zhu
Ji-Rong Wen
Yuan Lu
Zhicheng Dou
LLMAG
LRM
130
7
0
24 Oct 2025
A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications
Minhua Lin
Zongyu Wu
Zhichao Xu
Hui Liu
Xianfeng Tang
Qi He
Charu C. Aggarwal
Hui Liu
Xiang Zhang
Suhang Wang
AI4TS
LRM
558
2
0
19 Oct 2025
From Refusal to Recovery: A Control-Theoretic Approach to Generative AI Guardrails
Ravi Pandya
Madison Bland
D. Nguyen
Changliu Liu
J. F. Fisac
Andrea V. Bajcsy
138
1
0
15 Oct 2025
Revisiting Long-context Modeling from Context Denoising Perspective
Zecheng Tang
Baibei Ji
Juntao Li
Lijun Wu
Haijia Gui
Min Zhang
171
0
0
07 Oct 2025
Efficient Multi-turn RL for GUI Agents via Decoupled Training and Adaptive Data Curation
P. Li
Zechen Hu
Zirui Shang
J. Wu
Y. Liu
...
Xinxiao Wu
Yunde Jia
Liuyu Xiang
Zhaofeng He
Qing Li
OffRL
143
1
0
28 Sep 2025
DeepTravel: An End-to-End Agentic Reinforcement Learning Framework for Autonomous Travel Planning Agents
Yansong Ning
Rui Liu
Jun Wang
Kai Chen
W. Li
Jun Fang
Kan Zheng
Naiqiang Tan
Hao Liu
127
6
0
26 Sep 2025
Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Hanjiang Hu
Changliu Liu
Na Li
Yebin Wang
OffRL
LRM
129
0
0
24 Sep 2025