1502.05477
Cited By
Trust Region Policy Optimization
19 February 2015
John Schulman
Sergey Levine
Philipp Moritz
Michael I. Jordan
Pieter Abbeel
Papers citing
"Trust Region Policy Optimization"
50 / 3,103 papers shown
Improving Value Estimation Critically Enhances Vanilla Policy Gradient
Tao Wang
Ruipeng Zhang
Sicun Gao
OffRL
10
0
0
25 May 2025
Bridging Supervised Learning and Reinforcement Learning in Math Reasoning
Huayu Chen
Kaiwen Zheng
Qinsheng Zhang
Ganqu Cui
Yin Cui
Haotian Ye
Tsung-Yi Lin
Ming-Yu Liu
Jun Zhu
Haoxiang Wang
OffRL
LRM
27
0
0
23 May 2025
PPO-BR: Dual-Signal Entropy-Reward Adaptation for Trust Region Policy Optimization
Ben Rahman
0
0
0
23 May 2025
Learning to Choose or Choosing to Learn: Best-of-N vs. Supervised Fine-Tuning for Bit String Generation
Seamus Somerstep
Vinod Raman
Unique Subedi
Yuekai Sun
5
0
0
22 May 2025
Effective Reinforcement Learning for Reasoning in Language Models
Lianghuan Huang
Shuo Li
Sagnik Anupam
Insup Lee
Osbert Bastani
LRM
5
0
0
22 May 2025
A Temporal Difference Method for Stochastic Continuous Dynamics
Haruki Settai
Naoya Takeishi
Takehisa Yairi
22
0
0
21 May 2025
Aligning Explanations with Human Communication
Jacopo Teneggi
Zhenzhen Wang
Paul H. Yi
Tianmin Shu
Jeremias Sulam
46
0
0
21 May 2025
AAPO: Enhance the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Dejing Dou
LRM
33
0
0
20 May 2025
Flattening Hierarchies with Policy Bootstrapping
John L. Zhou
Jonathan C. Kao
OffRL
46
0
0
20 May 2025
TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion
Khang Nguyen
Khai Nguyen
An T. Le
Jan Peters
Manfred Huber
Ngo Anh Vien
Minh Nhat Vu
22
0
0
19 May 2025
Incentivizing Truthful Language Models via Peer Elicitation Games
Baiting Chen
Tong Zhu
Jiale Han
Lexin Li
Gang Li
Xiaowu Dai
36
0
0
19 May 2025
RL in Name Only? Analyzing the Structural Assumptions in RL post-training for LLMs
Soumya Rani Samineni
Durgesh Kalwar
Karthik Valmeekam
Kaya Stechly
Subbarao Kambhampati
OffRL
26
0
0
19 May 2025
DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
Gang Li
Ming Lin
Tomer Galanti
Zhengzhong Tu
Tianbao Yang
26
0
0
18 May 2025
Observe-R1: Unlocking Reasoning Abilities of MLLMs with Dynamic Progressive Reinforcement Learning
Zirun Guo
Minjie Hong
Tao Jin
OffRL
LRM
53
0
0
18 May 2025
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning
Kalyan Cherukuri
Aarav Lala
Yash Yardi
27
0
0
17 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
27
0
0
16 May 2025
Zero-Shot Visual Generalization in Robot Manipulation
Sumeet Batra
Gaurav Sukhatme
27
0
0
16 May 2025
Policy Gradient with Second Order Momentum
Tianyu Sun
32
0
0
16 May 2025
Scalability of Reinforcement Learning Methods for Dispatching in Semiconductor Frontend Fabs: A Comparison of Open-Source Models with Real Industry Datasets
Patrick Stöckermann
Henning Südfeld
Alessandro Immordino
Thomas Altenmüller
Marc Wegmann
Martin Gebser
Konstantin Schekotihin
Georg Seidel
Chew Wye Chan
Fei Fei Zhang
OffRL
22
0
0
16 May 2025
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
Sagnik Mukherjee
Lifan Yuan
Dilek Hakkani-Tur
Hao Peng
30
0
0
16 May 2025
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
LRM
38
0
0
16 May 2025
Bi-Level Policy Optimization with Nyström Hypergradients
Arjun Prakash
Naicheng He
Denizalp Goktas
Amy Greenwald
24
0
0
16 May 2025
Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
...
Ryan Julian
Jordan Terry
Isaac Woungang
Nariman Farsad
Pablo Samuel Castro
OffRL
23
0
0
16 May 2025
Modular Robot Control with Motor Primitives
Moses C. Nah
Johannes Lachner
Neville Hogan
41
0
0
15 May 2025
Adaptive Diffusion Policy Optimization for Robotic Manipulation
Huiyun Jiang
Zhuang Yang
39
0
0
13 May 2025
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Montaser Mohammedalamen
Michael Bowling
48
0
0
13 May 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang
Xingang Yan
Wei Lin
Tianxiang Zhang
Qianchun Lu
LRM
38
0
0
13 May 2025
LineFlow: A Framework to Learn Active Control of Production Lines
Kai Müller
Martin Wenzel
Tobias Windisch
AI4CE
31
0
0
10 May 2025
Barrier Function Overrides For Non-Convex Fixed Wing Flight Control and Self-Driving Cars
Eric Squires
Phillip Odom
Z. Kira
49
0
0
08 May 2025
A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance
Axel Friedrich Wolter
Tobias Sutter
OffRL
56
0
0
07 May 2025
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho
Seokhun Ju
Seungyub Han
Dohyeong Kim
Kyungjae Lee
Jungwoo Lee
OffRL
52
0
0
06 May 2025
VLM Q-Learning: Aligning Vision-Language Models for Interactive Decision-Making
Jake Grigsby
Yuke Zhu
Michael S Ryoo
Juan Carlos Niebles
OffRL
VLM
54
0
0
06 May 2025
Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
Vincenzo De Paola
Riccardo Zamboni
Mirco Mutti
Marcello Restelli
50
0
0
02 May 2025
Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator
Xuyang Chen
Jingliang Duan
Lin Zhao
69
1
0
02 May 2025
Wasserstein Policy Optimization
David Pfau
Ian Davies
Diana Borsa
Joao G. M. Araujo
Brendan D. Tracey
H. V. Hasselt
47
0
0
01 May 2025
Leveraging Partial SMILES Validation Scheme for Enhanced Drug Design in Reinforcement Learning Frameworks
Xinyu Wang
Jinbo Bi
Minghu Song
CLL
78
0
0
01 May 2025
Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey
Mohamad Abdul Hady
Siyi Hu
Mahardhika Pratama
Jimmy Cao
Ryszard Kowalczyk
29
0
0
29 Apr 2025
HyperController: A Hyperparameter Controller for Fast and Stable Training of Reinforcement Learning Neural Networks
J. Gornet
Yiannis Kantaros
Bruno Sinopoli
266
0
0
27 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
75
0
0
26 Apr 2025
Reinforcement learning framework for the mechanical design of microelectronic components under multiphysics constraints
S. Nair
Timothy F. Walsh
Greg Pickrell
Fabio Semperlotti
40
0
0
23 Apr 2025
Autonomous Control of Redundant Hydraulic Manipulator Using Reinforcement Learning with Action Feedback
Rohit Dhakate
Christian Brommer
C. Böhm
Stephan Weiss
J. Steinbrener
36
5
0
22 Apr 2025
Solving Multi-Agent Safe Optimal Control with Distributed Epigraph Form MARL
Songyuan Zhang
Oswin So
Mitchell Black
Zachary Serlin
Chuchu Fan
40
0
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
44
3
0
21 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
57
0
0
21 Apr 2025
Single-loop Algorithms for Stochastic Non-convex Optimization with Weakly-Convex Constraints
Ming-Hsuan Yang
Gang Li
Quanqi Hu
Qihang Lin
Tianbao Yang
40
0
0
21 Apr 2025
HF4Rec: Human-Like Feedback-Driven Optimization Framework for Explainable Recommendation
Jiakai Tang
Jingsen Zhang
Zihang Tian
Xueyang Feng
Lei Wang
Xu Chen
OffRL
284
0
0
19 Apr 2025
Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning
Jiyuan Shi
Xinzhe Liu
Dewei Wang
Ouyang Lu
Sören Schwertfeger
Fuchun Sun
Chenjia Bai
Xiaochen Li
49
0
0
19 Apr 2025
Hysteresis-Aware Neural Network Modeling and Whole-Body Reinforcement Learning Control of Soft Robots
Zhe Chen
Yan Xia
Jiayuan Liu
Jijia Liu
Wenhao Tang
...
Hongen Liao
Yu-Ping Wang
Chao Yu
Boyu Zhang
Fei Xing
32
1
0
18 Apr 2025
Evolutionary Policy Optimization
Zelal Su "Lain" Mustafaoglu
Keshav Pingali
Risto Miikkulainen
38
0
0
17 Apr 2025
Self-alignment of Large Video Language Models with Refined Regularized Preference Optimization
Pritam Sarkar
Ali Etemad
45
0
0
16 Apr 2025