Proximal Policy Optimization Algorithms

20 July 2017
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
    OffRL

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,421 papers shown
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Neural Information Processing Systems (NeurIPS), 2023
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
360
202
0
07 Jun 2023
Dual policy as self-model for planning
Journal of Korean institute of intelligent systems (JKIIS), 2023
J. Yoo
Fernanda De La Torre
G. R. Yang
167
1
0
07 Jun 2023
Balancing of competitive two-player Game Levels with Reinforcement Learning
Florian Rupp
Manuel Eberhardinger
Kai Eckert
156
9
0
07 Jun 2023
Fairness-Sensitive Policy-Gradient Reinforcement Learning for Reducing Bias in Robotic Assistance
Jie Zhu
Mengsha Hu
Xueyao Liang
Amy Zhang
Ruoming Jin
Rui Liu
166
1
0
07 Jun 2023
Adaptive Frequency Green Light Optimal Speed Advisory based on Hybrid Actor-Critic Reinforcement Learning
Mingle Xu
Dongyu Zuo
66
2
0
07 Jun 2023
Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction
International Conference on Learning Representations (ICLR), 2023
G. Bono
L. Antsfeld
Assem Sadek
G. Monaci
Christian Wolf
SSL
315
8
0
06 Jun 2023
Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach
Bin-Bin Hu
Chenyang Zhao
Pushi Zhang
Zihao Zhou
Yuanhang Yang
Zenglin Xu
Yinan Han
LM&Ro, LLMAG
605
32
0
06 Jun 2023
State Regularized Policy Optimization on Data with Dynamics Shift
Neural Information Processing Systems (NeurIPS), 2023
Zhenghai Xue
Qingpeng Cai
Shuchang Liu
Dong Zheng
Peng Jiang
Kun Gai
Bo An
OffRL
369
25
0
06 Jun 2023
RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control
Jonas Eschmann
Dario Albani
Giuseppe Loianno
OffRL
364
7
0
06 Jun 2023
A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023
Federico Ceola
Elisa Maiettini
Lorenzo Rosasco
Lorenzo Natale
219
6
0
06 Jun 2023
Learning Embeddings for Sequential Tasks Using Population of Agents
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Mridul Mahajan
Georgios Tzannetos
Goran Radanović
Adish Singla
FedML
262
1
0
05 Jun 2023
Explore to Generalize in Zero-Shot RL
Neural Information Processing Systems (NeurIPS), 2023
E. Zisselman
Itai Lavie
Daniel Soudry
Aviv Tamar
323
20
0
05 Jun 2023
Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination
Journal of Artificial Intelligence Research (JAIR), 2023
Yang Li
Shao Zhang
Jichen Sun
Wenhao Zhang
Yali Du
Ying Wen
Xinbing Wang
Wei Pan
295
24
0
05 Jun 2023
Action-Evolution Petri Nets: a Framework for Modeling and Solving Dynamic Task Assignment Problems
International Conference on Business Process Management (BPM), 2023
R. Bianco
R. Dijkman
Wim P. M. Nuijten
W. Jaarsveld
127
7
0
05 Jun 2023
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic
International Conference on Machine Learning (ICML), 2023
Tianying Ji
Yuping Luo
Gang Hua
Xianyuan Zhan
Jianwei Zhang
Huazhe Xu
OffRL, OnRL
408
21
0
05 Jun 2023
Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation
International Conference on Machine Learning (ICML), 2023
Wanpeng Zhang
Yilin Li
Boyu Yang
Zongqing Lu
CML
281
3
0
05 Jun 2023
For SALE: State-Action Representation Learning for Deep Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Scott Fujimoto
Wei-Di Chang
Edward James Smith
S. Gu
Doina Precup
David Meger
OffRL
357
85
0
04 Jun 2023
Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL
Miguel Suau
M. Spaan
F. Oliehoek
CML
315
9
0
04 Jun 2023
ContraBAR: Contrastive Bayes-Adaptive Deep RL
International Conference on Machine Learning (ICML), 2023
Era Choshen
Aviv Tamar
BDL, OffRL
183
10
0
04 Jun 2023
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Banghua Zhu
Hiteshi Sharma
Felipe Vieira Frujeri
Shi Dong
Chenguang Zhu
Michael I. Jordan
Jiantao Jiao
OSLM
280
48
0
04 Jun 2023
Cycle Consistency Driven Object Discovery
International Conference on Learning Representations (ICLR), 2023
Aniket Didolkar
Anirudh Goyal
Yoshua Bengio
OCL
343
10
0
03 Jun 2023
MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Haolin Song
Ming Feng
Wen-gang Zhou
Houqiang Li
OffRL
162
11
0
03 Jun 2023
Synaptic motor adaptation: A three-factor learning rule for adaptive robotic control in spiking neural networks
International Conference on Systems (ICONS), 2023
Samuel Schmidgall
Joe Hays
245
6
0
02 Jun 2023
Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces
International Conference on Machine Learning (ICML), 2023
Brahma S. Pavse
M. Zurek
Yudong Chen
Qiaomin Xie
Josiah P. Hanna
OffRL
360
2
0
02 Jun 2023
Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space
International Conference on Machine Learning (ICML), 2023
Anas Barakat
Ilyas Fatkhullin
Niao He
219
14
0
02 Jun 2023
PAGAR: Taming Reward Misalignment in Inverse Reinforcement Learning-Based Imitation Learning with Protagonist Antagonist Guided Adversarial Reward
Weichao Zhou
Wenchao Li
275
0
0
02 Jun 2023
OMNI: Open-endedness via Models of human Notions of Interestingness
Jenny Zhang
Joel Lehman
Kenneth O. Stanley
Jeff Clune
LRM
439
52
0
02 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Neural Information Processing Systems (NeurIPS), 2023
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
465
417
0
02 Jun 2023
EmoUS: Simulating User Emotions in Task-Oriented Dialogues
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Hsien-chin Lin
Shutong Feng
Christian Geishauser
Nurul Lubis
Carel van Niekerk
Michael Heck
Benjamin Ruppik
Renato Vukovic
Milica Gašić
122
15
0
02 Jun 2023
ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages
International Conference on Machine Learning (ICML), 2023
Andrew Jesson
Chris Xiaoxuan Lu
Gunshi Gupta
Angelos Filos
Jakob N. Foerster
Y. Gal
OffRL
361
9
0
02 Jun 2023
Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task
International Symposium on Industrial Electronics (ISIE), 2023
Reuf Kozlica
S. Wegenkittl
Simon Hirlaender
OffRL
119
13
0
02 Jun 2023
Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction
Neural Information Processing Systems (NeurIPS), 2023
Quentin Delfosse
Hikaru Shindo
Devendra Singh Dhami
Kristian Kersting
330
53
0
02 Jun 2023
ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Michael Heck
Nurul Lubis
Benjamin Ruppik
Renato Vukovic
Shutong Feng
Christian Geishauser
Hsien-chin Lin
Carel van Niekerk
Milica Gašić
215
54
0
02 Jun 2023
Hyperparameters in Reinforcement Learning and How To Tune Them
International Conference on Machine Learning (ICML), 2023
Theresa Eimer
Marius Lindauer
Roberta Raileanu
OffRL
425
71
0
02 Jun 2023
Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization
International Conference on Machine Learning (ICML), 2023
Hyeon-Seob Kim
Minsu Kim
SungSoo Ahn
Jinkyoo Park
OffRL
443
9
0
02 Jun 2023
Heterogeneous Knowledge for Augmented Modular Reinforcement Learning
Lorenz Wolf
Mirco Musolesi
OffRL
233
0
0
01 Jun 2023
Investigating Navigation Strategies in the Morris Water Maze through Deep Reinforcement Learning
Neural Networks (Neural Netw.), 2023
A. Liu
Alla Borisyuk
277
12
0
01 Jun 2023
Extracting Reward Functions from Diffusion Models
Neural Information Processing Systems (NeurIPS), 2023
Felipe Nuti
Tim Franzmeyer
João F. Henriques
198
19
0
01 Jun 2023
Chaos persists in large-scale multi-agent learning despite adaptive learning rates
Emmanouil-Vasileios Vlatakis-Gkaragkounis
Lampros Flokas
Georgios Piliouras
245
1
0
01 Jun 2023
Normalization Enhances Generalization in Visual Reinforcement Learning
Adaptive Agents and Multi-Agent Systems (AAMAS), 2023
Lu Li
Jiafei Lyu
Guozheng Ma
Zilin Wang
Zhen Yang
Xiu Li
Zhiheng Li
OOD
202
12
0
01 Jun 2023
TorchRL: A data-driven decision-making library for PyTorch
International Conference on Learning Representations (ICLR), 2023
Albert Bou
Matteo Bettini
Sebastian Dittert
Vikash Kumar
Shagun Sodhani
Xiaomeng Yang
Gianni De Fabritiis
Vincent Moens
OffRL, AI4CE
309
65
0
01 Jun 2023
Interactive Character Control with Auto-Regressive Motion Diffusion Models
ACM Transactions on Graphics (TOG), 2023
Yi Shi
Jingbo Wang
Xuekun Jiang
Bingkun Lin
Bo Dai
Xue Bin Peng
DiffM, AI4CE
314
41
0
01 Jun 2023
CapText: Large Language Model-based Caption Generation From Image Context and Description
Shinjini Ghosh
Sagnik Anupam
VLM
321
4
0
01 Jun 2023
From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces
Neural Information Processing Systems (NeurIPS), 2023
Peter Shaw
Mandar Joshi
James Cohan
Jonathan Berant
Panupong Pasupat
Hexiang Hu
Urvashi Khandelwal
Kenton Lee
Kristina Toutanova
LLMAG, LM&Ro
263
75
0
31 May 2023
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Paul Roit
Johan Ferret
Lior Shani
Roee Aharoni
Geoffrey Cideron
...
Olivier Bachem
G. Elidan
Avinatan Hassidim
Olivier Pietquin
Idan Szpektor
HILM
289
100
0
31 May 2023
Adaptive Coordination in Social Embodied Rearrangement
International Conference on Machine Learning (ICML), 2023
Andrew Szot
Unnat Jain
Dhruv Batra
Z. Kira
Ruta Desai
Akshara Rai
221
18
0
31 May 2023
Efficient Diffusion Policies for Offline Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Bingyi Kang
Xiao Ma
Chao Du
Tianyu Pang
Shuicheng Yan
OffRL
354
117
0
31 May 2023
Latent Exploration for Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
A. Chiappa
Alessandro Marin Vargas
Ann Zixiang Huang
Alexander Mathis
321
27
0
31 May 2023
Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency
Mayank Agarwal
Ramón Fernández Astudillo
Tahira Naseem
Subhajit Chaudhury
Pavan Kapanipathi
Salim Roukos
Alexander G. Gray
OffRL
184
0
0
31 May 2023
Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability
R. T. Lange
Henning Sprekeler
178
2
0
31 May 2023