Proximal Policy Optimization Algorithms
arXiv: 1707.06347
20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 7,146 papers shown
Whleaper: A 10-DOF Flexible Bipedal Wheeled Robot
Yinglei Zhu
Sixiao He
Zhenghao Qi
Zhuoyuan Yong
Yihua Qin
Jianyu Chen
36
0
0
30 Apr 2025
ShorterBetter: Guiding Reasoning Models to Find Optimal Inference Length for Efficient Reasoning
Jingyang Yi
Jiazheng Wang
Sida Li
ReLM
OODD
LRM
290
3
0
30 Apr 2025
One Net to Rule Them All: Domain Randomization in Quadcopter Racing Across Different Platforms
Robin Ferede
Till Blaha
Erin Lucassen
Christophe De Wagter
Guido de Croon
45
1
0
30 Apr 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
58
2
0
30 Apr 2025
Neuro-Symbolic Generation of Explanations for Robot Policies with Weighted Signal Temporal Logic
Mikihisa Yuasa
R. Sreenivas
Huy T. Tran
52
0
0
30 Apr 2025
Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math
Haoran Xu
Baolin Peng
Hany Awadalla
DongDong Chen
Yen-Chun Chen
...
Yelong Shen
Shuaiqiang Wang
Weijian Xu
Jianfeng Gao
Weizhu Chen
ReLM
LRM
90
3
0
30 Apr 2025
LangWBC: Language-directed Humanoid Whole-Body Control via End-to-end Learning
Yiyang Shao
Xiaoyu Huang
Bike Zhang
Qiayuan Liao
Yuman Gao
Yufeng Chi
Zhongyu Li
Sophia Shao
Koushil Sreenath
LM&Ro
292
0
0
30 Apr 2025
XPG-RL: Reinforcement Learning with Explainable Priority Guidance for Efficiency-Boosted Mechanical Search
Yiting Zhang
Shichen Li
Elena Shrestha
40
1
0
29 Apr 2025
A Domain-Agnostic Scalable AI Safety Ensuring Framework
Beomjun Kim
Kangyeon Kim
Sunwoo Kim
Heejin Ahn
57
0
0
29 Apr 2025
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
Liu Liu
...
Jianfeng Gao
Weizhu Chen
Shuaiqiang Wang
Simon Shaolei Du
Yelong Shen
OffRL
ReLM
LRM
148
11
0
29 Apr 2025
Antidote: A Unified Framework for Mitigating LVLM Hallucinations in Counterfactual Presupposition and Object Perception
Yuanchen Wu
Lu Zhang
Hang Yao
Junlong Du
Ke Yan
Shouhong Ding
Yunsheng Wu
Xuzhao Li
MLLM
85
0
0
29 Apr 2025
Token-Efficient RL for LLM Reasoning
Alan Lee
Harry Tong
OffRL
262
0
0
29 Apr 2025
Multi-Agent Reinforcement Learning for Resources Allocation Optimization: A Survey
Mohamad Abdul Hady
Siyi Hu
Mahardhika Pratama
Jimmy Cao
Ryszard Kowalczyk
29
0
0
29 Apr 2025
Return Capping: Sample-Efficient CVaR Policy Gradient Optimisation
Harry Mead
Clarissa Costen
Bruno Lacerda
Nick Hawes
50
0
0
29 Apr 2025
Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
Adam Younsi
Abdalgader Abubaker
M. Seddik
Hakim Hacid
Salem Lahlou
LRM
59
0
0
28 Apr 2025
ARMOR: Adaptive Meshing with Reinforcement Optimization for Real-time 3D Monitoring in Unexposed Scenes
Yizhe Zhang
Jianping Li
Xin Zhao
Fuxun Liang
Z. Dong
Bisheng Yang
AI4CE
59
0
0
28 Apr 2025
Learning to Plan Before Answering: Self-Teaching LLMs to Learn Abstract Plans for Problem Solving
Junxuan Zhang
Flood Sung
Zhiyong Yang
Yang Gao
Chongjie Zhang
LLMAG
58
0
0
28 Apr 2025
Agentic Reasoning and Tool Integration for LLMs via Reinforcement Learning
Joykirat Singh
Raghav Magazine
Yash Pandya
A. Nambi
LLMAG
KELM
OffRL
LRM
218
4
0
28 Apr 2025
Model-based controller assisted domain randomization in deep reinforcement learning: application to nonlinear powertrain control
Heisei Yonezawa
Ansei Yonezawa
Itsuro Kajiwara
54
0
0
28 Apr 2025
GenCLS++: Pushing the Boundaries of Generative Classification in LLMs Through Comprehensive SFT and RL Studies Across Diverse Datasets
Mingqian He
Fei Zhao
Chonggang Lu
Ziqiang Liu
Yun Wang
Haofu Qian
OffRL
AI4TS
VLM
74
0
0
28 Apr 2025
Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang
Chin-Ting Hsu
Chan-Hung Yu
Saransh Agrawal
Shih-Cheng Huang
Shang-Tse Chen
Kuan-Hao Huang
Shao-Hua Sun
83
0
0
27 Apr 2025
Fast and Robust: Task Sampling with Posterior and Diversity Synergies for Adaptive Decision-Makers in Randomized Environments
Yun Qu
Wenjie Wang
Yixiu Mao
Yiqin Lv
Xiangyang Ji
TTA
93
0
0
27 Apr 2025
HyperController: A Hyperparameter Controller for Fast and Stable Training of Reinforcement Learning Neural Networks
J. Gornet
Yiannis Kantaros
Bruno Sinopoli
260
0
0
27 Apr 2025
Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
Shuang Wang
Haoyang Zhang
Tianxing Wu
Yize Zhang
W. Zhang
Quan Z. Sheng
36
0
0
27 Apr 2025
SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning
Jiaqi Chen
Bang Zhang
Ruotian Ma
Peisong Wang
Xiaodan Liang
Zhaopeng Tu
Xuzhao Li
Kwan-Yee K. Wong
LLMAG
ReLM
LRM
92
1
0
27 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
75
0
0
26 Apr 2025
Neurophysiologically Realistic Environment for Comparing Adaptive Deep Brain Stimulation Algorithms in Parkinson Disease
Ekaterina Kuzmina
Dmitrii Kriukov
Mikhail Lebedev
Dmitry V. Dylov
OOD
43
0
0
26 Apr 2025
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng
Feishi Wang
Songlin Wei
Yuchen Li
Bangjun Wang
...
Hao Dong
Siyuan Huang
Yue Wang
Jitendra Malik
Pieter Abbeel
85
4
0
26 Apr 2025
Hierarchical Reinforcement Learning in Multi-Goal Spatial Navigation with Autonomous Mobile Robots
Brendon Johnson
Alfredo Weitzenfeld
34
1
0
26 Apr 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
86
0
0
25 Apr 2025
AI Awareness
Xianrui Li
Haoyuan Shi
Rongwu Xu
Wei Xu
63
0
0
25 Apr 2025
DREAM: Disentangling Risks to Enhance Safety Alignment in Multimodal Large Language Models
Jing Liu
Hangyu Guo
Ranjie Duan
Xingyuan Bu
Yancheng He
...
Yingshui Tan
Yanan Wu
Jihao Gu
Yongbin Li
Jun Zhu
MLLM
289
0
0
25 Apr 2025
Think, Prune, Train, Improve: Scaling Reasoning without Scaling Models
Caia Costello
Simon Guo
Anna Goldie
Azalia Mirhoseini
ReLM
SyDa
LRM
118
2
0
25 Apr 2025
Learning from Less: SINDy Surrogates in RL
Aniket Dixit
Muhammad Ibrahim Khan
Faizan Ahmed
James Brusey
45
0
0
25 Apr 2025
RL-Driven Data Generation for Robust Vision-Based Dexterous Grasping
Atsushi Kanehira
Naoki Wake
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
42
0
0
25 Apr 2025
Optimizing Multi-Round Enhanced Training in Diffusion Models for Improved Preference Understanding
Kun Li
Jiadong Wang
Yangfan He
Xinyuan Song
Ruoyu Wang
...
Keqin Li
Sida Li
Miao Zhang
Tianyu Shi
Xueqian Wang
50
0
0
25 Apr 2025
TRACE Back from the Future: A Probabilistic Reasoning Approach to Controllable Language Generation
Gwen Yidou Weng
Benjie Wang
Guy Van den Broeck
BDL
272
0
0
25 Apr 2025
High-Performance Reinforcement Learning on Spot: Optimizing Simulation Parameters with Distributional Measures
A. J Miller
Fangzhou Yu
Michael Brauckmann
Farbod Farshidian
OffRL
BDL
39
0
0
24 Apr 2025
Plasticine: Accelerating Research in Plasticity-Motivated Deep Reinforcement Learning
Mingqi Yuan
Qi Wang
Guozheng Ma
Yue Liu
Xin Jin
Yunbo Wang
Xiaokang Yang
Wenjun Zeng
D. Tao
OffRL
AI4CE
52
0
0
24 Apr 2025
Integrating Learning-Based Manipulation and Physics-Based Locomotion for Whole-Body Badminton Robot Control
Haoran Wang
Zhiwei Shi
Chengxi Zhu
Yafei Qiao
Cheng Zhang
Fan Yang
Pengjie Ren
Lan Lu
D. Xuan
82
1
0
24 Apr 2025
Evolution Meets Diffusion: Efficient Neural Architecture Generation
Bingye Zhou
Caiyang Yu
DiffM
86
0
0
24 Apr 2025
RAGEN: Understanding Self-Evolution in LLM Agents via Multi-Turn Reinforcement Learning
Zihan Wang
Kaidi Wang
Q. Wang
Pingyue Zhang
Linjie Li
...
Jiajun Wu
L. Fei-Fei
Lijuan Wang
Yejin Choi
Manling Li
92
11
0
24 Apr 2025
CaRL: Learning Scalable Planning Policies with Simple Rewards
Bernhard Jaeger
D. Dauner
Jens Beißwenger
Simon Gerstenecker
Kashyap Chitta
Andreas Geiger
66
1
0
24 Apr 2025
Advancing CMA-ES with Learning-Based Cooperative Coevolution for Scalable Optimization
Hongshu Guo
Wenjie Qiu
Zeyuan Ma
Xuzhi Zhang
Jun Zhang
Jiawei Liu
68
1
0
24 Apr 2025
Do We Need Transformers to Play FPS Video Games?
Karmanbir Batth
Krish Sethi
Aly Shariff
Leo Shi
Hetul Patel
OffRL
AI4CE
41
0
0
24 Apr 2025
SMART: Tuning a symbolic music generation system with an audio domain aesthetic reward
Nicolas Jonason
Luca Casini
Bob L. T. Sturm
44
1
0
23 Apr 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
85
1
0
23 Apr 2025
Offline Robotic World Model: Learning Robotic Policies without a Physics Simulator
Chenhao Li
Andreas Krause
Marco Hutter
OffRL
38
0
0
23 Apr 2025
Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms
Hsin-Jung Yang
Mahsa Khosravi
Benjamin Walt
Girish Krishnan
Soumik Sarkar
20
0
0
23 Apr 2025
A Comprehensive Survey of Synthetic Tabular Data Generation
Ruxue Shi
Yili Wang
Mengnan Du
Xu Shen
Xin Wang
54
2
0
23 Apr 2025