arXiv:2503.18929
Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

24 March 2025
Brian Bartoldson
S. Venkatraman
James Diffenderfer
Moksh Jain
Tal Ben-Nun
Seanie Lee
Minsu Kim
J. Obando-Ceron
Yoshua Bengio
B. Kailkhura
    OffRL
ArXiv (abs) | PDF | HTML | HuggingFace (4 upvotes) | GitHub (22★)

Papers citing "Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training"

6 / 6 papers shown
CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling
Zekai Qu
Yinxu Pan
Ao Sun
Chaojun Xiao
Xu Han
80
0
0
05 Nov 2025
FlowRL: Matching Reward Distributions for LLM Reasoning
Xuekai Zhu
Daixuan Cheng
D. Zhang
Hengli Li
Kaiyan Zhang
...
J. Gao
Xiaodong Liu
Bowen Zhou
Hongyuan Mei
Zhouhan Lin
LRM
235
6
0
18 Sep 2025
AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning
Wei Fu
Jiaxuan Gao
Xujie Shen
Chen Zhu
Zhiyu Mei
...
Jun Mei
Jiashu Wang
Tongkai Yang
Binhang Yuan
Yi Wu
OffRL, SyDa, LRM
504
85
0
30 May 2025
Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models
Yiran Guo
Lijie Xu
Jie Liu
Dan Ye
Delin Qu
OffRL
281
15
0
29 May 2025
Steering Generative Models with Experimental Data for Protein Fitness Optimization
Jason Yang
Wenda Chu
Daniel Khalil
Raul Astudillo
Bruce J. Wittmann
Frances H. Arnold
Yisong Yue
393
5
0
21 May 2025
Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen
Jiarui Lu
Minsu Kim
Dinghuai Zhang
Jian Tang
Alexandre Piché
Nicolas Angelard-Gontier
Yoshua Bengio
Ehsan Kamalloo
ReLM, LRM
611
28
0
20 May 2025