Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,421 papers shown
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ximing Lu
Faeze Brahman
Peter West
Jaehun Jang
Khyathi Chandu
...
Bill Yuchen Lin
Skyler Hallinan
Xiang Ren
Sean Welleck
Yejin Choi
326
33
0
24 May 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
International Conference on Learning Representations (ICLR), 2023
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
349
14
0
24 May 2023
Barkour: Benchmarking Animal-level Agility with Quadruped Robots
Ken Caluwaerts
Atil Iscen
J. Kew
Wenhao Yu
Tingnan Zhang
...
J. Seto
Carolina Parada
Vikas Sindhwani
Vincent Vanhoucke
Jie Tan
225
75
0
24 May 2023
Inverse Reinforcement Learning with the Average Reward Criterion
Neural Information Processing Systems (NeurIPS), 2023
Feiyang Wu
Jingyang Ke
Anqi Wu
280
14
0
24 May 2023
Adaptive Policy Learning to Additional Tasks
Wenjian Hao
Zehui Lu
Zihao Liang
Tianyu Zhou
Shaoshuai Mou
220
0
0
24 May 2023
MARC: A multi-agent robots control framework for enhancing reinforcement learning in construction tasks
Kangkang Duan
C. W. Suen
Zhengbo Zou
116
2
0
23 May 2023
Learning from demonstrations: An intuitive VR environment for imitation learning of construction robots
Kangkang Duan
Zhengbo Zou
132
2
0
23 May 2023
RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning
Alexander Scarlatos
Andrew Lan
OffRL, LRM
260
28
0
23 May 2023
Language Model Self-improvement by Reinforcement Learning Contemplation
International Conference on Learning Representations (ICLR), 2023
Jing-Cheng Pang
Pengyuan Wang
Kaiyuan Li
Xiong-Hui Chen
Jiacheng Xu
Zongzhang Zhang
Yang Yu
LRM, KELM
229
74
0
23 May 2023
Query Rewriting for Retrieval-Augmented Large Language Models
Xinbei Ma
Yeyun Gong
Pengcheng He
Hai Zhao
Nan Duan
KELM, LRM
236
192
0
23 May 2023
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
365
747
0
23 May 2023
Constrained Proximal Policy Optimization
Chengbin Xuan
Feng Zhang
Faliang Yin
H. Lam
113
1
0
23 May 2023
ChemGymRL: An Interactive Framework for Reinforcement Learning for Digital Chemistry
Chris Beeler
Sriram Ganapathi Subramanian
Kyle Sprague
Nouha Chatti
C. Bellinger
...
Amanuel Dawit
Zihan Yang
Xinkai Li
Mark Crowley
Isaac Tamblyn
OffRL
202
7
0
23 May 2023
Solving Stabilize-Avoid Optimal Control via Epigraph Form and Deep Reinforcement Learning
Oswin So
Chuchu Fan
162
30
0
23 May 2023
RLBoost: Boosting Supervised Models using Deep Reinforcement Learning
Neurocomputing, 2023
Eloy Anguiano Batanero
Ángela Fernández Pascual
Á. Jiménez
OffRL
94
3
0
23 May 2023
Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML
ACM Transactions on Evolutionary Learning and Optimization (TELO), 2023
M. Deutel
G. Kontes
Christopher Mutschler
Jürgen Teich
519
4
0
23 May 2023
Constrained Reinforcement Learning for Dynamic Material Handling
IEEE International Joint Conference on Neural Network (IJCNN), 2023
Chengpeng Hu
Ziming Wang
Jialin Liu
J. Wen
Bifei Mao
Xinghu Yao
169
1
0
23 May 2023
XRoute Environment: A Novel Reinforcement Learning Environment for Routing
Zhanwen Zhou
H. Zhuo
Xiaowu Zhang
Qiyuan Deng
109
0
0
23 May 2023
Proximal Policy Gradient Arborescence for Quality Diversity Reinforcement Learning
International Conference on Learning Representations (ICLR), 2023
Sumeet Batra
Bryon Tjanaka
Matthew C. Fontaine
Aleksei Petrenko
Stefanos Nikolaidis
Gaurav Sukhatme
OffRL
263
24
0
23 May 2023
Optimizing Long-term Value for Auction-Based Recommender Systems via On-Policy Reinforcement Learning
ACM Conference on Recommender Systems (RecSys), 2023
Ruiyang Xu
Jalaj Bhandari
D. Korenkevych
Fan Liu
Yuchen He
Alex Nikulkov
Zheqing Zhu
OffRL
305
8
0
23 May 2023
Aligning Large Language Models through Synthetic Feedback
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sungdong Kim
Sanghwan Bae
Jamin Shin
Soyoung Kang
Donghyun Kwak
Kang Min Yoo
Minjoon Seo
ALM, SyDa
273
84
0
23 May 2023
Robust Model-Based Optimization for Challenging Fitness Landscapes
International Conference on Learning Representations (ICLR), 2023
Saba Ghaffari
Ehsan Saleh
Alex Schwing
Yu-Xiong Wang
Martin D. Burke
Saurabh Sinha
234
3
0
23 May 2023
Developmental Curiosity and Social Interaction in Virtual Agents
Annual Meeting of the Cognitive Science Society (CogSci), 2023
Christopher Doyle
Sarah Shader
M. Lau
Megumi Sano
Daniel L. K. Yamins
Nick Haber
LRM
135
2
0
22 May 2023
Training Diffusion Models with Reinforcement Learning
International Conference on Learning Representations (ICLR), 2023
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
578
640
0
22 May 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Neural Information Processing Systems (NeurIPS), 2023
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Abigail Z. Jacobs
Tatsunori B. Hashimoto
ALM
488
764
0
22 May 2023
Making Language Models Better Tool Learners with Execution Feedback
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Shuofei Qiao
Honghao Gui
Chengfei Lv
Qianghuai Jia
Huajun Chen
Ningyu Zhang
LLMAG
418
69
0
22 May 2023
Road Planning for Slums via Deep Reinforcement Learning
Knowledge Discovery and Data Mining (KDD), 2023
Y. Zheng
Hongyuan Su
Jingtao Ding
Depeng Jin
Yong Li
300
20
0
22 May 2023
Yes, this Way! Learning to Ground Referring Expressions into Actions with Intra-episodic Feedback from Supportive Teachers
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
P. Sadler
Sherzod Hakimov
David Schlangen
295
3
0
22 May 2023
Testing of Deep Reinforcement Learning Agents with Surrogate Models
ACM Transactions on Software Engineering and Methodology (TOSEM), 2023
Matteo Biagiola
Paolo Tonella
248
31
0
22 May 2023
Multi-task Hierarchical Adversarial Inverse Reinforcement Learning
International Conference on Machine Learning (ICML), 2023
Jiayu Chen
Dipesh Tamboli
Tian-Shing Lan
Vaneet Aggarwal
210
17
0
22 May 2023
Strategy Extraction in Single-Agent Games
Archana Vadakattu
Michelle L. Blom
A. Pearce
172
1
0
22 May 2023
A Reinforcement Learning Approach for Robust Supervisory Control of UAVs Under Disturbances
Ibrahim Ahmed
Marcos Quiñones-Grueiro
Gautam Biswas
95
0
0
21 May 2023
BertRLFuzzer: A BERT and Reinforcement Learning Based Fuzzer
AAAI Conference on Artificial Intelligence (AAAI), 2023
Piyush Jha
Joseph Scott
Jaya Sriram Ganeshna
M. Singh
Vijay Ganesh
280
9
0
21 May 2023
Synthesizing Diverse Human Motions in 3D Indoor Scenes
IEEE International Conference on Computer Vision (ICCV), 2023
Kaifeng Zhao
Yan Zhang
Shaofei Wang
Thabo Beeler
Siyu Tang
349
103
0
21 May 2023
DexPBT: Scaling up Dexterous Manipulation for Hand-Arm Systems with Population Based Training
Aleksei Petrenko
Arthur Allshire
Gavriel State
Ankur Handa
Viktor Makoviychuk
188
31
0
20 May 2023
Vision-based DRL Autonomous Driving Agent with Sim2Real Transfer
Dian-Tao Li
Ostap Okhrin
283
5
0
19 May 2023
Learning Diverse Risk Preferences in Population-based Self-play
AAAI Conference on Artificial Intelligence (AAAI), 2023
Y. Jiang
Qihan Liu
Xiaoteng Ma
Chenghao Li
Yiqin Yang
Jun Yang
Bin Liang
Qianchuan Zhao
387
8
0
19 May 2023
Counterfactual Fairness Filter for Fair-Delay Multi-Robot Navigation
Adaptive Agents and Multi-Agent Systems (AAMAS), 2023
Hikaru Asano
Ryo Yonetani
Mai Nishimura
Tadashi Kozuno
196
1
0
19 May 2023
Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models
Wanqiao Xu
Shi Dong
Dilip Arumugam
Benjamin Van Roy
165
8
0
19 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Artificial Intelligence Review (AIR), 2023
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
351
146
0
19 May 2023
Bayesian Reparameterization of Reward-Conditioned Reinforcement Learning with Energy-based Models
International Conference on Machine Learning (ICML), 2023
Wenhao Ding
Tong Che
Ding Zhao
Marco Pavone
BDL, OffRL
143
2
0
18 May 2023
Constrained Environment Optimization for Prioritized Multi-Agent Navigation
Zhan Gao
Amanda Prorok
185
11
0
18 May 2023
Parallel development of social preferences in fish and machines
Annual Meeting of the Cognitive Science Society (CogSci), 2023
Joshua McGraw
Donsuk Lee
Justin N. Wood
88
3
0
18 May 2023
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Neural Information Processing Systems (NeurIPS), 2023
Yujie Lu
Xianjun Yang
Xiujun Li
Xinze Wang
William Yang Wang
EGVM
424
99
0
18 May 2023
From Data-Fitting to Discovery: Interpreting the Neural Dynamics of Motor Control through Reinforcement Learning
Eugene R. Rush
Kaushik Jayaram
J. Humbert
151
2
0
18 May 2023
Optimistic Natural Policy Gradient: a Simple Efficient Policy Optimization Framework for Online RL
Neural Information Processing Systems (NeurIPS), 2023
Qinghua Liu
Gellert Weisz
András Gyorgy
Chi Jin
Csaba Szepesvári
OffRL
211
15
0
18 May 2023
Deep Metric Tensor Regularized Policy Gradient
Gang Chen
Victoria Huang
215
0
0
18 May 2023
Sharing Lifelong Reinforcement Learning Knowledge via Modulating Masks
Saptarshi Nath
Christos Peridis
Eseoghene Ben-Iwhiwhu
Hengrong Du
Shirin Dora
Cong Liu
Soheil Kolouri
Andrea Soltoggio
CLL
180
11
0
18 May 2023
Reinforcement Learning for Legged Robots: Motion Imitation from Model-Based Optimal Control
A. Miller
Shamel Fahmi
Matthew Chignoli
Sangbae Kim
161
8
0
18 May 2023
Client Selection for Federated Policy Optimization with Environment Heterogeneity
Zhijie Xie
S. H. Song
453
6
0
18 May 2023