1707.06347
Cited By
Proximal Policy Optimization Algorithms
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
Papers citing "Proximal Policy Optimization Algorithms"
50 / 7,213 papers shown
Title
Neurophysiologically Realistic Environment for Comparing Adaptive Deep Brain Stimulation Algorithms in Parkinson Disease
Ekaterina Kuzmina
Dmitrii Kriukov
Mikhail Lebedev
Dmitry V. Dylov
OOD
48
0
0
26 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
75
0
0
26 Apr 2025
Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots
Brendon Johnson
Alfredo Weitzenfeld
39
1
0
26 Apr 2025
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng
Feishi Wang
Songlin Wei
Yuchen Li
Bangjun Wang
...
Hao Dong
Siyuan Huang
Yue Wang
Jitendra Malik
Pieter Abbeel
85
4
0
26 Apr 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jing Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Yongbin Li
Jun Zhu
MLLM
292
0
0
25 Apr 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
86
0
0
25 Apr 2025
Learning from Less: SINDy Surrogates in RL
Aniket Dixit
Muhammad Ibrahim Khan
Faizan Ahmed
James Brusey
45
0
0
25 Apr 2025
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Caia Costello
Simon Guo
Anna Goldie
Azalia Mirhoseini
ReLM
SyDa
LRM
118
2
0
25 Apr 2025
Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding
Kun Li
Jiadong Wang
Yangfan He
Xinyuan Song
Ruoyu Wang
...
Keqin Li
Sida Li
Miao Zhang
Tianyu Shi
Xueqian Wang
50
0
0
25 Apr 2025
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
Gwen Yidou Weng
Benjie Wang
Guy Van den Broeck
BDL
275
0
0
25 Apr 2025
RL-Driven Data Generation for Robust Vision-Based Dexterous Grasping
Atsushi Kanehira
Naoki Wake
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
44
0
0
25 Apr 2025
AI Awareness
Xianrui Li
Haoyuan Shi
Rongwu Xu
Wei Xu
63
0
0
25 Apr 2025
Do We Need Transformers to Play FPS Video Games?
Karmanbir Batth
Krish Sethi
Aly Shariff
Leo Shi
Hetul Patel
OffRL
AI4CE
41
0
0
24 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
94
11
0
24 Apr 2025
Evolution Meets Diffusion: Efficient Neural Architecture Generation
Bingye Zhou
Caiyang Yu
DiffM
86
0
0
24 Apr 2025
Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization
Hongshu Guo
Wenjie Qiu
Zeyuan Ma
Xuzhi Zhang
Jun Zhang
Jiawei Liu
68
2
0
24 Apr 2025
CaRL: Learning Scalable Planning Policies with Simple Rewards
Bernhard Jaeger
D. Dauner
Jens Beißwenger
Simon Gerstenecker
Kashyap Chitta
Andreas Geiger
66
1
0
24 Apr 2025
Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning
Mingqi Yuan
Qi Wang
Guozheng Ma
Yue Liu
Xin Jin
Yunbo Wang
Xiaokang Yang
Wenjun Zeng
D. Tao
OffRL
AI4CE
52
0
0
24 Apr 2025
High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures
A. J Miller
Fangzhou Yu
Michael Brauckmann
Farbod Farshidian
OffRL
BDL
39
0
0
24 Apr 2025
Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control
Haoran Wang
Zhiwei Shi
Chengxi Zhu
Yafei Qiao
Cheng Zhang
Fan Yang
Pengjie Ren
Lan Lu
D. Xuan
82
1
0
24 Apr 2025
SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward
Nicolas Jonason
Luca Casini
Bob L. T. Sturm
44
1
0
23 Apr 2025
Reinforcement learning framework for the mechanical design of microelectronic components under multiphysics constraints
S. Nair
Timothy F. Walsh
Greg Pickrell
Fabio Semperlotti
40
0
0
23 Apr 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
85
0
0
23 Apr 2025
Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator
Chenhao Li
Andreas Krause
Marco Hutter
OffRL
38
0
0
23 Apr 2025
Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification
Balaji Rao
William Eiers
Carlo Lipizzi
63
0
0
23 Apr 2025
Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms
Hsin-Jung Yang
Mahsa Khosravi
Benjamin Walt
Girish Krishnan
Soumik Sarkar
27
0
0
23 Apr 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
85
1
0
23 Apr 2025
A Comprehensive Survey of Synthetic Tabular Data Generation
Ruxue Shi
Yili Wang
Mengnan Du
Xu Shen
Xin Wang
54
2
0
23 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
38
0
0
22 Apr 2025
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan
Wei Shen
Shulin Huang
Qiji Zhou
Yue Zhang
74
0
0
22 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
82
0
0
22 Apr 2025
GraphEdge: Dynamic Graph Partition and Task Scheduling for GNNs Computing in Edge Network
Wenjing Xiao
Chenglong Shi
Miaojiang Chen
Zhiquan Liu
Min Chen
Haiwen Song
41
0
0
22 Apr 2025
CaRoSaC: A Reinforcement Learning-Based Kinematic Control of Cable-Driven Parallel Robots by Addressing Cable Sag through Simulation
Rohit Dhakate
Thomas Jantos
Eren Allak
Stephan Weiss
J. Steinbrener
51
0
0
22 Apr 2025
Dynamic Early Exit in Reasoning Models
Chenxu Yang
Qingyi Si
Yongjie Duan
Zheliang Zhu
Chenyu Zhu
Zheng Lin
Li Cao
Weiping Wang
ReLM
LRM
67
10
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
213
5
0
22 Apr 2025
AdaViP: Aligning Multi-modal LLMs via Adaptive Vision-enhanced Preference Optimization
Jinda Lu
Jinghan Li
Yuan Gao
Junkang Wu
Jiancan Wu
Xiang Wang
Xiangnan He
275
1
0
22 Apr 2025
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents
Siyu Zhou
Tianyi Zhou
Yijun Yang
Guodong Long
Deheng Ye
Jing Jiang
Chengqi Zhang
LM&Ro
37
0
0
22 Apr 2025
Autonomous Control of Redundant Hydraulic Manipulator Using Reinforcement Learning with Action Feedback
Rohit Dhakate
Christian Brommer
C. Böhm
Stephan Weiss
J. Steinbrener
36
5
0
22 Apr 2025
Learning Explainable Dense Reward Shapes via Bayesian Optimization
Ryan Koo
Ian Yang
Vipul Raheja
Mingyi Hong
Kwang-Sung Jun
Dongyeop Kang
38
0
0
22 Apr 2025
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Yinmin Zhong
Zili Zhang
Xiaoniu Song
Hanpeng Hu
Chao Jin
...
Changyi Wan
Hongyu Zhou
Yimin Jiang
Yibo Zhu
Daxin Jiang
OffRL
AI4TS
66
0
0
22 Apr 2025
Reasoning Physical Video Generation with Diffusion Timestep Tokens via Reinforcement Learning
Wang Lin
Liyu Jia
Wentao Hu
Kaihang Pan
Zhongqi Yue
Wei Zhao
Jingyuan Chen
Fei Wu
Hanwang Zhang
VGen
54
1
0
22 Apr 2025
Dynamic Legged Ball Manipulation on Rugged Terrains with Hierarchical Reinforcement Learning
Dongjie Zhu
Zhuo Yang
Tianhang Wu
Luzhou Ge
Xiaochen Li
Qi Liu
Xuzhao Li
36
0
0
21 Apr 2025
Stop Summation: Min-Form Credit Assignment Is All Process Reward Model Needs for Reasoning
Jie Cheng
Ruixi Qiao
Lijun Li
Chao Guo
Jianmin Wang
Gang Xiong
Yisheng Lv
Fei-Yue Wang
LRM
240
3
0
21 Apr 2025
Improving Human-AI Coordination through Adversarial Training and Generative Models
Paresh Chaudhary
Yancheng Liang
Daphne Chen
S. Du
Natasha Jaques
73
0
0
21 Apr 2025
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Yatong Bai
Jonah Casebeer
Somayeh Sojoudi
Nicholas J. Bryan
DiffM
VLM
70
1
0
21 Apr 2025
DSPO: Direct Semantic Preference Optimization for Real-World Image Super-Resolution
Miaomiao Cai
Simiao Li
Wei Li
X. Y. Huang
Hanting Chen
Jie Hu
Yunhe Wang
38
0
0
21 Apr 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
44
3
0
21 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
281
0
0
21 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
57
0
0
21 Apr 2025
In-context Ranking Preference Optimization
Junda Wu
Rohan Surana
Zhouhang Xie
Yiran Shen
Yu Xia
Tong Yu
Ryan Rossi
Prithviraj Ammanabrolu
Julian McAuley
40
0
0
21 Apr 2025