ResearchTrend.AI
Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 7,062 papers shown
Deep Symbolic Optimization: Reinforcement Learning for Symbolic Mathematics
Conor F. Hayes
Felipe Leno Da Silva
Jiachen Yang
T. Nathan Mundhenk
Chak Shing Lee
...
Ahmet Can Solak
Thomas Desautels
Daniel Faissol
Brenden K. Petersen
Mikel Landajuela
28
0
0
16 May 2025
Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
...
Ryan Julian
Jordan Terry
Isaac Woungang
Nariman Farsad
Pablo Samuel Castro
OffRL
21
0
0
16 May 2025
Diffusion-NPO: Negative Preference Optimization for Better Preference Aligned Generation of Diffusion Models
Fu-Yun Wang
Yunhao Shui
Jingtan Piao
Keqiang Sun
Hongsheng Li
34
0
0
16 May 2025
Bidirectional Distillation: A Mixed-Play Framework for Multi-Agent Generalizable Behaviors
Lang Feng
Jiahao Lin
Dong Xing
Li Zhang
De Ma
Gang Pan
42
0
0
16 May 2025
REMOR: Automated Peer Review Generation with LLM Reasoning and Multi-Objective Reinforcement Learning
Pawin Taechoyotin
Daniel Acuna
LRM
27
0
0
16 May 2025
Explaining Strategic Decisions in Multi-Agent Reinforcement Learning for Aerial Combat Tactics
Ardian Selmonaj
Alessandro Antonucci
Adrian Schneider
Michael Rüegsegger
Matthias Sommer
32
0
0
16 May 2025
Exploration by Random Distribution Distillation
Zhirui Fang
Kai Yang
Jian Tao
Jiafei Lyu
Lusong Li
Li Shen
Xiu Li
19
0
0
16 May 2025
Unifying Segment Anything in Microscopy with Multimodal Large Language Model
Manyu Li
Ruian He
Zixian Zhang
Weimin Tan
Bo Yan
VLM
24
0
0
16 May 2025
Tool-Aided Evolutionary LLM for Generative Policy Toward Efficient Resource Management in Wireless Federated Learning
Chongyang Tan
Ruoqi Wen
Rongpeng Li
Zhifeng Zhao
Ekram Hossain
Honggang Zhang
34
0
0
16 May 2025
Time-R1: Towards Comprehensive Temporal Reasoning in LLMs
Zijia Liu
Peixuan Han
Haofei Yu
Haoru Li
Jiaxuan You
AI4TS
LRM
29
0
0
16 May 2025
Towards Self-Improvement of Diffusion Models via Group Preference Optimization
Renjie Chen
Wenfeng Lin
Yichen Zhang
Jiangchuan Wei
Boyuan Liu
Chao Feng
Jiao Ran
Mingyu Guo
27
0
0
16 May 2025
Spectral Policy Optimization: Coloring your Incorrect Reasoning in GRPO
Peter Chen
Xiaopeng Li
Zhiyu Li
Xi Chen
Tianyi Lin
24
0
0
16 May 2025
Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs
Yaorui Shi
Shihan Li
Chang Wu
Zhiyuan Liu
Sihang Li
Hengxing Cai
An Zhang
Xiang Wang
ReLM
LRM
41
0
0
16 May 2025
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages
Junyao Xing
Jiaqi Zeng
Olivier Delalleau
Hoo-Chang Shin
Felipe Soares
Alexander Bukharin
Ellie Evans
Yi Dong
Oleksii Kuchaiev
34
0
0
16 May 2025
LD-Scene: LLM-Guided Diffusion for Controllable Generation of Adversarial Safety-Critical Driving Scenarios
Mingxing Peng
Yuting Xie
Xusen Guo
Ruoyu Yao
Hai Yang
Jun Ma
19
0
0
16 May 2025
Reinforcement Learning for AMR Charging Decisions: The Impact of Reward and Action Space Design
Janik Bischoff
Alexandru Rinciog
Anne Meyer
OffRL
22
0
0
16 May 2025
Continuous Optimization for Feature Selection with Permutation-Invariant Embedding and Policy-Guided Search
Rui Liu
Rui Xie
Zijun Yao
Yanjie Fu
Dongjie Wang
12
0
0
16 May 2025
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
LRM
24
0
0
16 May 2025
Sample Efficient Reinforcement Learning via Large Vision Language Model Distillation
Donghoon Lee
Tung M. Luu
Younghwan Lee
Chang D. Yoo
OffRL
VLM
34
0
0
16 May 2025
Scalability of Reinforcement Learning Methods for Dispatching in Semiconductor Frontend Fabs: A Comparison of Open-Source Models with Real Industry Datasets
Patrick Stöckermann
Henning Südfeld
Alessandro Immordino
Thomas Altenmüller
Marc Wegmann
Martin Gebser
Konstantin Schekotihin
Georg Seidel
Chew Wye Chan
Fei Fei Zhang
OffRL
22
0
0
16 May 2025
Learning Diverse Natural Behaviors for Enhancing the Agility of Quadrupedal Robots
Huiqiao Fu
Haoyu Dong
Wentao Xu
Zhehao Zhou
Guizhou Deng
Kaiqiang Tang
D. Dong
Chunlin Chen
29
0
0
15 May 2025
ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
Wenhao Shen
Wanqi Yin
Xiaofeng Yang
Cheng Chen
Chaoyue Song
Zhongang Cai
Lei Yang
Hao Wang
Guosheng Lin
49
0
0
15 May 2025
APEX: Action Priors Enable Efficient Exploration for Skill Imitation on Articulated Robots
Shivam Sood
Laukik B Nakhwa
Yuhong Cao
Sun Ge
Guillaume Sartoretti
28
0
0
15 May 2025
Infinigen-Sim: Procedural Generation of Articulated Simulation Assets
Abhishek Joshi
Beining Han
Jack Nugent
Yiming Zuo
Jing Liu
...
Tao Sun
Alexander Raistrick
Gaowen Liu
Yi Shao
Jia Deng
VGen
39
0
0
15 May 2025
Decomposed Inductive Procedure Learning: Learning Academic Tasks with Human-Like Data Efficiency
Daniel Weitekamp
Christopher MacLellan
Erik Harpstead
Kenneth R. Koedinger
38
0
0
15 May 2025
Evaluating Robustness of Deep Reinforcement Learning for Autonomous Surface Vehicle Control in Field Tests
Luis F W Batista
Stéphanie Aravecchia
Seth Hutchinson
Cédric Pradalier
38
0
0
15 May 2025
Reinforcing the Diffusion Chain of Lateral Thought with Diffusion Language Models
Zemin Huang
Zhiyang Chen
Zijun Wang
Tiancheng Li
Guo-Jun Qi
DiffM
LRM
AI4CE
40
0
0
15 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yongqian Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM
ALM
43
0
0
14 May 2025
InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials
Xiao-Qi Han
Peng-Jie Guo
Ze-Feng Gao
Hao Sun
Zhong-Yi Lu
AI4CE
38
0
0
14 May 2025
Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model
George Andriopoulos
Soyuj Jung Basnet
Juan Guevara
Li Guo
Keith Ross
42
0
0
14 May 2025
Scent of Knowledge: Optimizing Search-Enhanced Reasoning with Information Foraging
Hongjin Qian
Zhengyang Liang
RALM
LRM
45
0
0
14 May 2025
CEC-Zero: Chinese Error Correction Solution Based on LLM
Sophie Zhang
Zhiming Lin
36
0
0
14 May 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yun Wang
Yuxuan Liu
Y. X. Wei
MoE
41
0
0
14 May 2025
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko
Saurabhchand Bhati
Edson Araujo
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
VLM
49
0
0
14 May 2025
LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
Yuhang Huang
Jiazhao Zhang
Shilong Zou
Xinwang Liu
Ruizhen Hu
Kai Xu
27
0
0
13 May 2025
Reinforcement Learning meets Masked Video Modeling : Trajectory-Guided Adaptive Token Selection
Ayush K. Rai
Kyle Min
Tarun Krishna
Feiyan Hu
Alan F. Smeaton
Noel E. O'Connor
VGen
36
0
0
13 May 2025
Enhancing Aerial Combat Tactics through Hierarchical Multi-Agent Reinforcement Learning
Ardian Selmonaj
Oleg Szehr
Giacomo Del Rio
Alessandro Antonucci
Adrian Schneider
Michael Rüegsegger
34
0
0
13 May 2025
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale
Yunjie Ji
Xiaoyu Tian
Sitong Zhao
Haotian Wang
Shuaiting Chen
Yiping Peng
Han Zhao
Xiangang Li
ReLM
LRM
VLM
56
1
0
13 May 2025
Deep reinforcement learning-based longitudinal control strategy for automated vehicles at signalised intersections
Pankaj Kumar
Aditya Mishra
Pranamesh Chakraborty
Subrahmanya Swamy Peruru
37
0
0
13 May 2025
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Montaser Mohammedalamen
Michael Bowling
34
0
0
13 May 2025
MA-ROESL: Motion-aware Rapid Reward Optimization for Efficient Robot Skill Learning from Single Videos
Xinyu Wang
Xinming Zhang
Yanjun Chen
Xiaoyu Shen
Wei Zhang
29
0
0
13 May 2025
Detecting Prefix Bias in LLM-based Reward Models
Ashwin Kumar
Yuzi He
Aram H. Markosyan
Bobbie Chern
Imanol Arrieta-Ibarra
17
0
0
13 May 2025
Adaptive Security Policy Management in Cloud Environments Using Reinforcement Learning
Muhammad Saqib
Dipkumar Mehta
Fnu Yashu
Shubham Malhotra
29
0
0
13 May 2025
Monte Carlo Beam Search for Actor-Critic Reinforcement Learning in Continuous Control
Hazim Alzorgan
Abolfazl Razi
39
0
0
13 May 2025
Parameter Estimation using Reinforcement Learning Causal Curiosity: Limits and Challenges
Miguel Arana-Catania
Weisi Guo
CML
35
0
0
13 May 2025
Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles
Matteo Gallici
Ivan Masmitja
Mario Martin
OffRL
31
0
0
13 May 2025
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning
Xinyue Wang
Zhen Zhang
OffRL
CML
37
0
0
13 May 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang
Xingang Yan
Wei Lin
Tianxiang Zhang
Qianchun Lu
LRM
35
0
0
13 May 2025
InfoPO: On Mutual Information Maximization for Large Language Model Alignment
Teng Xiao
Zhen Ge
Sujay Sanghavi
Tian Wang
Julian Katz-Samuels
Marc Versage
Qingjun Cui
Trishul Chilimbi
31
0
0
13 May 2025
Reinforcement Learning-based Fault-Tolerant Control for Quadrotor with Online Transformer Adaptation
Dohyun Kim
Jayden Dongwoo Lee
Hyochoong Bang
Jungho Bae
41
0
0
13 May 2025