Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,419 papers shown
Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy
Bram Grooten
Patrick MacAlpine
K. Subramanian
Peter Stone
Peter R. Wurman
OODD OffRL
268
0
0
12 Nov 2025
Generalized-Scale Object Counting with Gradual Query Aggregation
Jer Pelhan
A. Lukežič
Matej Kristan
ObjD
247
0
0
11 Nov 2025
LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration
Applied Soft Computing (ASC), 2017
Ruiyu Qiu
Rui Wang
Guanghui Yang
Xiang Li
Zhijiang Shao
129
0
0
11 Nov 2025
Dynamic Sparsity: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks
Muthukumar Pandaram
Jakob J. Hollenstein
David Drexel
Samuele Tosatto
A. Rodríguez-Sánchez
J. Piater
CML
200
0
0
11 Nov 2025
On Geometric Structures for Policy Parameterization in Continuous Control
Zhihao Lin
248
0
0
11 Nov 2025
BIPPO: Budget-Aware Independent PPO for Energy-Efficient Federated Learning Services
Anna Lackinger
Andrea Morichetta
P. Frangoudis
Schahram Dustdar
179
0
0
11 Nov 2025
Deep (Predictive) Discounted Counterfactual Regret Minimization
Hang Xu
Kai Li
Haobo Fu
Qiang Fu
Junliang Xing
Jian Cheng
100
0
0
11 Nov 2025
Learning Omnidirectional Locomotion for a Salamander-Like Quadruped Robot
Zhiang Liu
Yang Liu
Yongchun Fang
X. Guo
210
0
0
11 Nov 2025
Judging by the Rules: Compliance-Aligned Framework for Modern Slavery Statement Monitoring
Wenhao Xu
Akshatha Arodi
Jian-Yun Nie
Arsène Fansi Tchango
AILaw
295
0
0
11 Nov 2025
Understanding Electro-communication and Electro-sensing in Weakly Electric Fish using Multi-Agent Deep Reinforcement Learning
Satpreet H. Singh
Sonja Johnson-Yu
Zhouyang Lu
Aaron Walsman
Federico Pedraja
Denis Turcu
Pratyusha Sharma
Naomi Saphra
Nathaniel B. Sawtell
Kanaka Rajan
79
0
0
11 Nov 2025
GAMA: A Neural Neighborhood Search Method with Graph-aware Multi-modal Attention for Vehicle Routing Problem
International Symposium on Mixed and Augmented Reality (ISMAR), 2025
Xiangling Chen
Yi Mei
Mengjie Zhang
101
0
0
11 Nov 2025
SafeMIL: Learning Offline Safe Imitation Policy from Non-Preferred Trajectories
Returaj Burnwal
Nirav P. Bhatt
Balaraman Ravindran
OffRL
369
0
0
11 Nov 2025
Numerical Sensitivity and Robustness: Exploring the Flaws of Mathematical Reasoning in Large Language Models
Zhishen Sun
Guang Dai
Ivor Tsang
Haishan Ye
AAML LRM
150
0
0
11 Nov 2025
SONIC: Supersizing Motion Tracking for Natural Humanoid Whole-Body Control
Zhengyi Luo
Ye Yuan
Tingwu Wang
Chenran Li
Sirui Chen
...
Jan Kautz
Yan Chang
Umar Iqbal
Linxi Fan
Yuke Zhu
129
6
0
11 Nov 2025
A Negotiation-Based Multi-Agent Reinforcement Learning Approach for Dynamic Scheduling of Reconfigurable Manufacturing Systems
NASA Formal Methods (NFM), 2025
Manonmani Sekar
Nasim Nezamoddini
59
0
0
11 Nov 2025
Adversarial Bias: Data Poisoning Attacks on Fairness
Eunice Chan
Hanghang Tong
AAML
68
0
0
11 Nov 2025
Analyzing Political Text at Scale with Online Tensor LDA
Sara Kangaslahti
Danny Ebanks
Jean Kossaifi
Anqi Liu
R. Alvarez
A. Anandkumar
105
0
0
11 Nov 2025
PrefPoE: Advantage-Guided Preference Fusion for Learning Where to Explore
Zhihao Lin
Lin Wu
Zhen Tian
Jianglin Lan
125
0
0
11 Nov 2025
Enhancing Binary Encoded Crime Linkage Analysis Using Siamese Network
Yicheng Zhan
Fahim Ahmed
Amy Burrell
Matthew J. Tonkin
Sarah Galambos
Jessica Woodhams
Dalal Alrajeh
176
0
0
10 Nov 2025
Shocks Under Control: Taming Transonic Compressible Flow over an RAE2822 Airfoil with Deep Reinforcement Learning
Trishit Mondal
Ricardo Vinuesa
Ameya D. Jagtap
AI4CE
105
0
0
10 Nov 2025
FinRpt: Dataset, Evaluation System and LLM-based Multi-agent Framework for Equity Research Report Generation
Song Jin
Shuqi Li
Shukun Zhang
Rui Yan
AIFin
554
1
0
10 Nov 2025
Textual Self-attention Network: Test-Time Preference Optimization through Textual Gradient-based Attention
Shibing Mo
Haoyang Ruan
Kai Wu
Jing Liu
223
0
0
10 Nov 2025
Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
A. Sorokin
N. Buzun
Alexander Anokhin
Oleg Inozemcev
Egor Vedernikov
Petr Anokhin
Mikhail Burtsev
Trushkov Alexey
Yin Wenshuai
Evgeny Burnaev
RALM
177
0
0
10 Nov 2025
Secure Low-altitude Maritime Communications via Intelligent Jamming
Science China Information Sciences (Sci. China Inf. Sci.), 2025
Jiawei Huang
Aimin Wang
Geng Sun
Jiahui Li
Jiacheng Wang
Weijie Yuan
Dusit Niyato
Xianbin Wang
110
0
0
10 Nov 2025
Robot Learning from a Physical World Model
Jiageng Mao
Sicheng He
Hao-Ning Wu
Yang You
Shuyang Sun
...
Huizhong Chen
Leonidas Guibas
Vitor Campagnolo Guizilini
Zhengyu Ma
Yue Wang
VGen PINN
424
0
0
10 Nov 2025
Improving Deepfake Detection with Reinforcement Learning-Based Adaptive Data Augmentation
Yuxuan Zhou
Tao Yu
Wen Huang
Yuheng Zhang
Tao Dai
Shu-Tao Xia
146
0
0
10 Nov 2025
Superhuman AI for Stratego Using Self-Play Reinforcement Learning and Test-Time Search
Samuel Sokota
Eugene Vinitsky
Hengyuan Hu
J. Zico Kolter
Gabriele Farina
72
0
0
10 Nov 2025
Enabling Off-Policy Imitation Learning with Deep Actor Critic Stabilization
Sayambhu Sen
Shalabh Bhatnagar
103
0
0
10 Nov 2025
SPA: Achieving Consensus in LLM Alignment via Self-Priority Optimization
Yue Huang
Xiangqi Wang
Xiangliang Zhang
131
0
0
09 Nov 2025
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models
Chen He
Xun Jiang
Lei Wang
Hao-ran Yang
Chong Peng
Peng Yan
Fumin Shen
Xing Xu
LRM
236
0
0
09 Nov 2025
Cross-Platform Learnable Fuzzy Gain-Scheduled Proportional-Integral-Derivative Controller Tuning via Physics-Constrained Meta-Learning and Reinforcement Learning Adaptation
JiaHao Wu
ShengWen Yu
AI4CE
313
0
0
09 Nov 2025
FLEX: Continuous Agent Evolution via Forward Learning from Experience
Zhicheng Cai
Xinyuan Guo
Yu Pei
Jiangtao Feng
Jiangjie Chen
Ya Zhang
Wei-Ying Ma
Mingxuan Wang
Hao Zhou
Hao Zhou
CLL LLMAG LRM
279
4
0
09 Nov 2025
Deep Reinforcement Learning for Dynamic Origin-Destination Matrix Estimation in Microscopic Traffic Simulations Considering Credit Assignment
Donggyu Min
Seongjin Choi
Dong-Kyu Kim
58
0
0
09 Nov 2025
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B
Sen Xu
Yi Zhou
Wei Wang
Jixin Min
Z. Yin
Yingwei Dai
Shixi Liu
Lianyu Pang
Yirong Chen
J. Zhang
MoE LRM VLM
168
1
0
09 Nov 2025
Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning
Yingnan Zhao
Xinmiao Wang
Dewei Wang
Xinzhe Liu
Dan. Lu
Qilong Han
P. Liu
Chenjia Bai
149
0
0
09 Nov 2025
ScRPO: From Errors to Insights
Lianrui Li
Dakuan Lu
Jiawei Shao
Chi Zhang
LRM
155
0
0
08 Nov 2025
Policy Gradient-Based EMT-in-the-Loop Learning to Mitigate Sub-Synchronous Control Interactions
Sayak Mukherjee
Ramij-Raja Hossain
Kaustav Chatterjee
Sameer Nekkalapu
Marcelo Elizondo
109
0
0
08 Nov 2025
Revisiting Entropy in Reinforcement Learning for Large Reasoning Models
Renren Jin
Pengzhi Gao
Yuqi Ren
Zhuowen Han
Tongxuan Zhang
Wuwei Huang
Wei Liu
Jian Luan
Deyi Xiong
LRM
125
1
0
08 Nov 2025
Approximating Shapley Explanations in Reinforcement Learning
Daniel Beechey
Özgür Simsek
FAtt OffRL
351
0
0
08 Nov 2025
TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework
Chao Zhang
Y Samuel Wang
Derong Xu
Haoxin Zhang
Yuanjie Lyu
...
Tong Xu
Xiangyu Zhao
Yan Gao
Yao Hu
Enhong Chen
3DV
428
1
0
07 Nov 2025
Multi-Agent Craftax: Benchmarking Open-Ended Multi-Agent Reinforcement Learning at the Hyperscale
Bassel Al Omari
Michael T. Matthews
Alexander Rutherford
Jakob N. Foerster
117
1
0
07 Nov 2025
Reasoning Is All You Need for Urban Planning AI
Sijie Yang
Jiatong Li
Filip Biljecki
32
0
0
07 Nov 2025
You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models
Shuvendu Roy
Hossein Hajimirsadeghi
Mengyao Zhai
Golnoosh Samei
OffRL ReLM LRM
348
0
0
07 Nov 2025
Minority-Aware Satisfaction Estimation in Dialogue Systems via Preference-Adaptive Reinforcement Learning
Yahui Fu
Zi Haur Pang
Tatsuya Kawahara
124
0
0
07 Nov 2025
Distributionally Robust Self Paced Curriculum Reinforcement Learning
Anirudh Satheesh
Keenan Powell
Vaneet Aggarwal
OOD OffRL
496
0
0
07 Nov 2025
SSPO: Subsentence-level Policy Optimization
Kun Yang
Zikang Chen
Yanmeng Wang
Zhigen Li
115
0
0
06 Nov 2025
DMA: Online RAG Alignment with Human Feedback
Yu Bai
Yukai Miao
Dawei Wang
Li Chen
Fei Long
...
Yanyu Ren
Tianfeng Liu
Hongtao Xie
Ce Yang
Xuhui Cai
158
0
0
06 Nov 2025
PUL-SLAM: Path-Uncertainty Co-Optimization with Lightweight Stagnation Detection for Efficient Robotic Exploration
Yizhen Yin
Dapeng Feng
Hongbo Chen
Yuhua Qi
130
0
0
06 Nov 2025
RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods
Raghav Sharma
Manan Mehta
Sai Tiger Raina
310
0
0
06 Nov 2025
DARN: Dynamic Adaptive Regularization Networks for Efficient and Robust Foundation Model Adaptation
Dhenenjay Yadav
Rohan Sawai
AI4CE
192
0
0
06 Nov 2025