Proximal Policy Optimization Algorithms

20 July 2017

Papers citing "Proximal Policy Optimization Algorithms"

50 / 11,421 papers shown

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

Neural Information Processing Systems (NeurIPS), 2023

360

202

07 Jun 2023

Dual policy as self-model for planning

Journal of Korean institute of intelligent systems (JKIIS), 2023

J. Yoo

Fernanda De La Torre

G. R. Yang

167

07 Jun 2023

Balancing of competitive two-player Game Levels with Reinforcement Learning

Florian Rupp

Manuel Eberhardinger

Kai Eckert

156

07 Jun 2023

Fairness-Sensitive Policy-Gradient Reinforcement Learning for Reducing Bias in Robotic Assistance

166

07 Jun 2023

Adaptive Frequency Green Light Optimal Speed Advisory based on Hybrid Actor-Critic Reinforcement Learning

Mingle Xu

Dongyu Zuo

07 Jun 2023

Learning with a Mole: Transferable latent spatial representations for navigation without reconstruction

International Conference on Learning Representations (ICLR), 2023

315

06 Jun 2023

Enabling Intelligent Interactions between an Agent and an LLM: A Reinforcement Learning Approach

605

06 Jun 2023

State Regularized Policy Optimization on Data with Dynamics Shift

Neural Information Processing Systems (NeurIPS), 2023

369

06 Jun 2023

RLtools: A Fast, Portable Deep Reinforcement Learning Library for Continuous Control

364

06 Jun 2023

A Grasp Pose is All You Need: Learning Multi-fingered Grasping with Deep Reinforcement Learning from Vision and Touch

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2023

Federico Ceola

Elisa Maiettini

Lorenzo Rosasco

Lorenzo Natale

219

06 Jun 2023

Learning Embeddings for Sequential Tasks Using Population of Agents

International Joint Conference on Artificial Intelligence (IJCAI), 2023

262

05 Jun 2023

Explore to Generalize in Zero-Shot RL

Neural Information Processing Systems (NeurIPS), 2023

323

05 Jun 2023

Tackling Cooperative Incompatibility for Zero-Shot Human-AI Coordination

Journal of Artificial Intelligence Research (JAIR), 2023

Shao Zhang

Wenhao Zhang

Xinbing Wang

Wei Pan

295

05 Jun 2023

Action-Evolution Petri Nets: a Framework for Modeling and Solving Dynamic Task Assignment Problems

International Conference on Business Process Management (BPM), 2023

127

05 Jun 2023

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

International Conference on Machine Learning (ICML), 2023

408

05 Jun 2023

Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation

International Conference on Machine Learning (ICML), 2023

281

05 Jun 2023

For SALE: State-Action Representation Learning for Deep Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2023

357

04 Jun 2023

Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

315

04 Jun 2023

ContraBAR: Contrastive Bayes-Adaptive Deep RL

International Conference on Machine Learning (ICML), 2023

Era Choshen

Aviv Tamar

BDL OffRL

183

04 Jun 2023

Fine-Tuning Language Models with Advantage-Induced Policy Alignment

Banghua Zhu

Hiteshi Sharma

Felipe Vieira Frujeri

280

04 Jun 2023

Cycle Consistency Driven Object Discovery

International Conference on Learning Representations (ICLR), 2023

343

03 Jun 2023

MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning

International Joint Conference on Artificial Intelligence (IJCAI), 2023

162

03 Jun 2023

Synaptic motor adaptation: A three-factor learning rule for adaptive robotic control in spiking neural networks

International Conference on Systems (ICONS), 2023

Samuel Schmidgall

Joe Hays

245

02 Jun 2023

Learning to Stabilize Online Reinforcement Learning in Unbounded State Spaces

International Conference on Machine Learning (ICML), 2023

360

02 Jun 2023

Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space

International Conference on Machine Learning (ICML), 2023

Anas Barakat

Ilyas Fatkhullin

Niao He

219

02 Jun 2023

PAGAR: Taming Reward Misalignment in Inverse Reinforcement Learning-Based Imitation Learning with Protagonist Antagonist Guided Adversarial Reward

Weichao Zhou

Wenchao Li

275

02 Jun 2023

OMNI: Open-endedness via Models of human Notions of Interestingness

Jeff Clune

439

02 Jun 2023

Fine-Grained Human Feedback Gives Better Rewards for Language Model Training

Neural Information Processing Systems (NeurIPS), 2023

Weijia Shi

Prithviraj Ammanabrolu

Noah A. Smith

Mari Ostendorf

Hannaneh Hajishirzi

ALM

465

417

02 Jun 2023

EmoUS: Simulating User Emotions in Task-Oriented Dialogues

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023

Benjamin Ruppik

122

02 Jun 2023

ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

International Conference on Machine Learning (ICML), 2023

361

02 Jun 2023

Deep Q-Learning versus Proximal Policy Optimization: Performance Comparison in a Material Sorting Task

International Symposium on Industrial Electronics (ISIE), 2023

119

02 Jun 2023

Interpretable and Explainable Logical Policies via Neurally Guided Symbolic Abstraction

Neural Information Processing Systems (NeurIPS), 2023

330

02 Jun 2023

ChatGPT for Zero-shot Dialogue State Tracking: A Solution or an Opportunity?

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

Benjamin Ruppik

215

02 Jun 2023

Hyperparameters in Reinforcement Learning and How To Tune Them

International Conference on Machine Learning (ICML), 2023

425

02 Jun 2023

Symmetric Replay Training: Enhancing Sample Efficiency in Deep Reinforcement Learning for Combinatorial Optimization

International Conference on Machine Learning (ICML), 2023

Jinkyoo Park

443

02 Jun 2023

Heterogeneous Knowledge for Augmented Modular Reinforcement Learning

Lorenz Wolf

Mirco Musolesi

OffRL

233

01 Jun 2023

Investigating Navigation Strategies in the Morris Water Maze through Deep Reinforcement Learning

Neural Networks (Neural Netw.), 2023

A. Liu

Alla Borisyuk

277

01 Jun 2023

Extracting Reward Functions from Diffusion Models

Neural Information Processing Systems (NeurIPS), 2023

Felipe Nuti

Tim Franzmeyer

João F. Henriques

198

01 Jun 2023

Chaos persists in large-scale multi-agent learning despite adaptive learning rates

Emmanouil-Vasileios Vlatakis-Gkaragkounis

Lampros Flokas

Georgios Piliouras

245

01 Jun 2023

Normalization Enhances Generalization in Visual Reinforcement Learning

Adaptive Agents and Multi-Agent Systems (AAMAS), 2023

202

01 Jun 2023

TorchRL: A data-driven decision-making library for PyTorch

International Conference on Learning Representations (ICLR), 2023

Vikash Kumar

309

01 Jun 2023

Interactive Character Control with Auto-Regressive Motion Diffusion Models

ACM Transactions on Graphics (TOG), 2023

314

01 Jun 2023

CapText: Large Language Model-based Caption Generation From Image Context and Description

Shinjini Ghosh

Sagnik Anupam

VLM

321

01 Jun 2023

From Pixels to UI Actions: Learning to Follow Instructions via Graphical User Interfaces

Neural Information Processing Systems (NeurIPS), 2023

263

31 May 2023

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2023

...

Olivier Bachem

Olivier Pietquin

289

100

31 May 2023

Adaptive Coordination in Social Embodied Rearrangement

International Conference on Machine Learning (ICML), 2023

Ruta Desai

221

31 May 2023

Efficient Diffusion Policies for Offline Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2023

354

117

31 May 2023

Latent Exploration for Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2023

A. Chiappa

Alessandro Marin Vargas

Ann Zixiang Huang

Alexander Mathis

321

31 May 2023

Scalable Learning of Latent Language Structure With Logical Offline Cycle Consistency

Mayank Agarwal

Ramón Fernández Astudillo

184

31 May 2023

Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability

R. T. Lange

Henning Sprekeler

178

31 May 2023