Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training

24 March 2025

Papers citing "Trajectory Balance with Asynchrony: Decoupling Exploration and Learning for Fast, Scalable LLM Post-Training"

CoPRIS: Efficient and Stable Reinforcement Learning via Concurrency-Controlled Partial Rollout with Importance Sampling

05 Nov 2025

FlowRL: Matching Reward Distributions for LLM Reasoning

...

235

18 Sep 2025

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

...

504

30 May 2025

Segment Policy Optimization: Effective Segment-Level Credit Assignment in RL for Large Language Models

281

29 May 2025

Steering Generative Models with Experimental Data for Protein Fitness Optimization

393

21 May 2025

Self-Evolving Curriculum for LLM Reasoning

Nicolas Angelard-Gontier

Yoshua Bengio

Ehsan Kamalloo

ReLM LRM

611

20 May 2025