2503.15478
Cited By
SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
19 March 2025
Yifei Zhou
Song Jiang
Yuandong Tian
Jason Weston
Sergey Levine
Sainbayar Sukhbaatar
Xian Li
LLMAG
LRM
HuggingFace (13 upvotes)
Papers citing
"SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks"
36 / 36 papers shown
Title
CriticSearch: Fine-Grained Credit Assignment for Search Agents via a Retrospective Critic
Yaocheng Zhang
Haohuan Huang
Zijun Song
Yuanheng Zhu
Qichao Zhang
Zijie Zhao
Dongbin Zhao
OffRL
LRM
132
0
0
15 Nov 2025
Conformal Constrained Policy Optimization for Cost-Effective LLM Agents
Wenwen Si
Sooyong Jang
Insup Lee
Osbert Bastani
LLMAG
143
0
0
14 Nov 2025
Training Proactive and Personalized LLM Agents
Weiwei Sun
Xuhui Zhou
Weihua Du
Xingyao Wang
Sean Welleck
Graham Neubig
Maarten Sap
Yiming Yang
196
1
0
04 Nov 2025
Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation
Zhiwei Zhang
Xiaomin Li
Yudi Lin
Hui Liu
Ramraj Chandradevan
...
Minhua Lin
Fali Wang
Xianfeng Tang
Qi He
Suhang Wang
LLMAG
LRM
235
0
0
04 Nov 2025
The Collaboration Gap
Tim R. Davidson
Adam Fourney
Saleema Amershi
Robert West
Eric Horvitz
Ece Kamar
96
0
0
04 Nov 2025
Tree Training: Accelerating Agentic LLMs Training via Shared Prefix Reuse
Shaojie Wang
Jinghui Wang
Yinghan Cui
Xuxing Chen
Chao Wang
...
Xiaojiang Zhang
J. Peng
Li Wan
Haotian Zhang
Bin Chen
118
0
0
01 Nov 2025
InteractComp: Evaluating Search Agents With Ambiguous Queries
Mingyi Deng
Lijun Huang
Yani Fan
Jiayi Zhang
Fashen Ren
...
Xinyu Wang
Xiangru Tang
Nan Tang
Chenglin Wu
Yuyu Luo
140
1
0
28 Oct 2025
Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning
Qiang Liu
Wuganjing Song
Zhenzhou Lin
Feifan Chen
Qiaolong Cai
Chen Li
Yongduo Sui
OffRL
LRM
128
0
0
24 Oct 2025
RLBoost: Harvesting Preemptible Resources for Cost-Efficient Reinforcement Learning on LLMs
Yongji Wu
Xueshen Liu
Haizhong Zheng
Juncheng Gu
Beidi Chen
Z. Morley Mao
Arvind Krishnamurthy
Eric Liang
OffRL
OnRL
229
1
0
22 Oct 2025
Offline Policy Evaluation of Multi-Turn LLM Health Coaching with Real Users
Melik Ozolcer
Sang Won Bae
OffRL
154
0
0
20 Oct 2025
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety
Jingyu Zhang
Haozhu Wang
Eric Michael Smith
Sid Wang
Amr Sharaf
Mahesh Pasupuleti
Benjamin Van Durme
Daniel Khashabi
Jason Weston
Hongyuan Zhan
104
1
0
09 Oct 2025
Vul-R2: A Reasoning LLM for Automated Vulnerability Repair
Xin-Cheng Wen
Zirui Lin
Yijun Yang
Cuiyun Gao
Deheng Ye
LRM
104
2
0
07 Oct 2025
BIRD-INTERACT: Re-imagining Text-to-SQL Evaluation for Large Language Models via Lens of Dynamic Interactions
Nan Huo
Xiaohan Xu
Jinyang Li
Per Jacobsson
Shipei Lin
...
Hongyu Liu
Chenhao Ma
Fatma Ozcan
Yannis Papakonstantinou
Reynold Cheng
LMTD
VLM
244
3
0
06 Oct 2025
In-Place Feedback: A New Paradigm for Guiding LLMs in Multi-Turn Reasoning
Youngbin Choi
M. Lee
Saemi Moon
Seunghyuk Cho
Chaehyeon Chung
Moonjeong Park
Dongwoo Kim
KELM
LRM
104
0
0
01 Oct 2025
A Practitioner's Guide to Multi-turn Agentic Reinforcement Learning
Ruiyi Wang
Prithviraj Ammanabrolu
125
0
0
01 Oct 2025
Test-Time Policy Adaptation for Enhanced Multi-Turn Interactions with LLMs
Chenxing Wei
Hong Wang
Ying He
Fei Richard Yu
Yao Shu
96
1
0
27 Sep 2025
Non-Collaborative User Simulators for Tool Agents
Jeonghoon Shim
Woojung Song
Cheyon Jin
Seungwon Kook
Yohan Jo
LLMAG
210
1
0
27 Sep 2025
Training Task Reasoning LLM Agents for Multi-turn Task Planning via Single-turn Reinforcement Learning
Hanjiang Hu
Changliu Liu
Na Li
Yebin Wang
OffRL
LRM
111
0
0
24 Sep 2025
Efficient On-Device Agents via Adaptive Context Management
Sanidhya Vijayvargiya
Rahul Lokesh
81
0
0
24 Sep 2025
Learning When to Plan: Efficiently Allocating Test-Time Compute for LLM Agents
Davide Paglieri
Bartłomiej Cupiał
Jonathan Cook
Ulyana Piterbarg
Jens Tuyls
Edward Grefenstette
Jakob Foerster
Jack Parker-Holder
Tim Rocktäschel
LLMAG
234
1
0
03 Sep 2025
StepWiser: Stepwise Generative Judges for Wiser Reasoning
Wei Xiong
Wenting Zhao
Weizhe Yuan
O. Yu. Golovneva
Tong Zhang
Jason Weston
Sainbayar Sukhbaatar
LRM
116
12
0
26 Aug 2025
History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
Jingkai He
Tianjian Li
Erhu Feng
Dong Du
Qian Liu
Tao Liu
Yubin Xia
Haibo Chen
117
14
0
26 Aug 2025
Stabilizing Long-term Multi-turn Reinforcement Learning with Gated Rewards
Zetian Sun
Dongfang Li
Zhuoen Chen
Yuhuai Qin
Baotian Hu
OffRL
92
2
0
14 Aug 2025
SEA: Self-Evolution Agent with Step-wise Reward for Computer Use
Liang Tang
Shuxian Li
Yuhao Cheng
Yukang Huo
Zhepeng Wang
Yiqiang Yan
Kaer Huang
Yanzhe Jing
Tiaonan Duan
214
6
0
06 Aug 2025
Agnostics: Learning to Code in Any Programming Language via Reinforcement with a Universal Learning Environment
Aleksander Boruch-Gruszecki
Yangtian Zi
Zixuan Wu
Tejas Oberoi
Carolyn Jane Anderson
Joydeep Biswas
Arjun Guha
SyDa
OffRL
125
2
0
06 Aug 2025
PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier
Y. Jiang
Yuwen Xiong
Yufeng Yuan
Chao Xin
Wenyuan Xu
Yu Yue
Qianchuan Zhao
Lin Yan
LRM
281
9
0
12 Jun 2025
Response-Level Rewards Are All You Need for Online Reinforcement Learning in LLMs: A Mathematical Perspective
Shenghua He
Tian Xia
Xuan Zhou
Hui Wei
OffRL
210
2
0
03 Jun 2025
Self-Challenging Language Model Agents
Yifei Zhou
Sergey Levine
Jason Weston
Xian Li
Sainbayar Sukhbaatar
ALM
ELM
385
18
0
02 Jun 2025
OWL: Optimized Workforce Learning for General Multi-Agent Assistance in Real-World Task Automation
Mengkang Hu
Yuhang Zhou
Wendong Fan
Yuzhou Nie
Bowei Xia
...
Yifeng Wang
Qianshuo Ye
Bernard Ghanem
Ping Luo
Guohao Li
754
69
0
29 May 2025
Get Experience from Practice: LLM Agents with Record & Replay
Erhu Feng
Wenbo Zhou
Zibin Liu
Le Chen
Yunpeng Dong
...
Yisheng Zhao
Dong Du
Zhichao Hua
Yubin Xia
Haibo Chen
363
6
0
23 May 2025
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning
Zhepei Wei
Wenlin Yao
Yao Liu
Weizhi Zhang
Qin Lu
...
Puyang Xu
Chao Zhang
Bing Yin
Hyokun Yun
Lihong Li
OffRL
CLL
OnRL
LRM
405
54
0
22 May 2025
lmgame-Bench: How Good are LLMs at Playing Games?
Lanxiang Hu
Mingjia Huo
Yu Zhang
Haoyang Yu
Eric P. Xing
Ion Stoica
Tajana Rosing
Haojian Jin
Hao Zhang
441
9
0
21 May 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
582
19
0
21 Apr 2025
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Divyansh Garg
Shaun VanWeelden
Diego Caples
Andis Draguns
Nikil Ravi
...
Youngchul Joo
Jindong Gu
Charles London
Christian Schroeder de Witt
S. Motwani
502
18
0
15 Apr 2025
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
Yen-Ting Lin
Di Jin
Tengyu Xu
Tianhao Wu
Sainbayar Sukhbaatar
...
Yuandong Tian
Arash Rahnama
Sinong Wang
Hao Ma
Han Fang
LRM
188
11
0
18 Jan 2025
Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization
Guanlin Liu
Kaixuan Ji
Ning Dai
Zheng Wu
Chen Dun
Quanquan Gu
Lin Yan
OffRL
LRM
351
18
0
11 Oct 2024