Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018

Pieter Abbeel

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown

A Fairness-Aware Strategy for B5G Physical-layer Security Leveraging Reconfigurable Intelligent Surfaces

Joaquin Garcia-Alfaro

01 Jun 2025

Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning

Jianglin Ding

Jingcheng Tang

Gangshan Jing

152

01 Jun 2025

Optimistic critics can empower small actors

Olya Mastikhina

Dhruv Sreenivas

Pablo Samuel Castro

578

01 Jun 2025

Local Manifold Approximation and Projection for Manifold-Aware Diffusion Planning

Kyowoon Lee

Jaesik Choi

DiffM

322

01 Jun 2025

Prompt-Tuned LLM-Augmented DRL for Dynamic O-RAN Network Slicing

Fatemeh Lotfi

Hossein Rajoli

Fatemeh Afghah

235

31 May 2025

Comparing Traditional and Reinforcement-Learning Methods for Energy Storage Control

31 May 2025

Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn

203

31 May 2025

BASIL: Best-Action Symbolic Interpretable Learning for Evolving Compact RL Policies

Kourosh Shahnazari

Seyed Moein Ayyoubzadeh

Mohammadali Keshtparvar

OffRL

240

31 May 2025

MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models

Journal of Chemical Information and Modeling (JCIM), 2025

Srivathsan Badrinarayanan

162

30 May 2025

Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer

227

30 May 2025

Enhanced DACER Algorithm with High Diffusion Efficiency

...

366

29 May 2025

Human sensory-musculoskeletal modeling and control of whole-body movements

109

29 May 2025

Discriminative Policy Optimization for Token-Level Reward Models

191

29 May 2025

CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing

International Conference on Information Photonics (ICIP), 2025

393

29 May 2025

Normalizing Flows are Capable Models for RL

Raj Ghugare

Benjamin Eysenbach

OffRL AI4CE

371

29 May 2025

Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners

258

29 May 2025

Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data

354

29 May 2025

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models

...

266

242

28 May 2025

ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning

611

28 May 2025

Contraction Actor-Critic: Contraction Metric-Guided Reinforcement Learning for Robust Path Tracking

Minjae Cho

Hiroyasu Tsukamoto

Huy Trong Tran

158

28 May 2025

Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective

366

27 May 2025

DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning

315

26 May 2025

Decision Flow Policy Optimization

335

26 May 2025

Situationally-Aware Dynamics Learning

Alejandro Murillo-Gonzalez

Lantao Liu

340

26 May 2025

The challenge of hidden gifts in multi-agent reinforcement learning

Dane Malenfant

Blake A. Richards

381

26 May 2025

Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models

International Joint Conference on Artificial Intelligence (IJCAI), 2025

...

485

26 May 2025

Deep Actor-Critics with Tight Risk Certificates

377

26 May 2025

Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL

206

26 May 2025

Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network

405

26 May 2025

Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning

Zhuochen Liu

Rahul Jain

Quan Nguyen

173

25 May 2025

Structured Reinforcement Learning for Combinatorial Decision-Making

486

25 May 2025

Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning

249

24 May 2025

CiRL: Open-Source Environments for Reinforcement Learning in Circular Economy and Net Zero

351

24 May 2025

KL-regularization Itself is Differentially Private in Bandits and RLHF

234

23 May 2025

H2-COMPACT: Human-Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies

Geeta Chandra Raju Bethala

302

23 May 2025

Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning

Ghada Sokar

Pablo Samuel Castro

345

23 May 2025

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning

281

23 May 2025

How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning

378

22 May 2025

VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving

244

22 May 2025

MPO: Multilingual Safety Alignment via Reward Gap Optimization

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

...

318

22 May 2025

FlashBack: Consistency Model-Accelerated Shared Autonomy

483

22 May 2025

Sequential Monte Carlo for Policy Optimization in Continuous POMDPs

Hany Abdulsamad

Sahel Iqbal

Simo Särkkä

356

22 May 2025

Meta-reinforcement learning with minimum attention

Pilhwa Lee

Shashank Gupta

OffRL

306

22 May 2025

Reward-Aware Proto-Representations in Reinforcement Learning

Hon Tik Tse

Siddarth Chandrasekar

Marlos C. Machado

157

22 May 2025

A Temporal Difference Method for Stochastic Continuous Dynamics

Haruki Settai

Naoya Takeishi

Takehisa Yairi

533

21 May 2025

The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning

481

21 May 2025

Learning-based Autonomous Oversteer Control and Collision Avoidance

Seokjun Lee

Seung-Hyun Kong

160

21 May 2025

Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning

369

21 May 2025

AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Momentum

346

20 May 2025

Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning

233

20 May 2025