Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018

Pieter Abbeel

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown

A Primer on SO(3) Action Representations in Deep Reinforcement Learning

Martin Schuck

Sherif Samy

Angela P. Schoellig

101

13 Oct 2025

PAC-Bayesian Reinforcement Learning Trains Generalizable Policies

167

12 Oct 2025

Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models

113

11 Oct 2025

Dejavu: Towards Experience Feedback Learning for Embodied Intelligence

Shalayiding Sirejiding

160

11 Oct 2025

Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework

103

11 Oct 2025

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

11 Oct 2025

Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning

178

10 Oct 2025

Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood

10 Oct 2025

Robust Driving Control for Autonomous Vehicles: An Intelligent General-sum Constrained Adversarial Reinforcement Learning Approach

137

10 Oct 2025

Energy-Guided Diffusion Sampling for Long-Term User Behavior Prediction in Reinforcement Learning-based Recommendation

105

09 Oct 2025

Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints

220

09 Oct 2025

Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning

199

09 Oct 2025

Continual Learning for Adaptive AI Systems

Md Hasibul Amin

Tamzid Tanvi Alam

CLL

252

09 Oct 2025

Control Synthesis of Cyber-Physical Systems for Real-Time Specifications through Causation-Guided Reinforcement Learning

09 Oct 2025

Maximum In-Support Return Modeling for Dynamic Recommendation with Language Model Prior

09 Oct 2025

Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem

114

09 Oct 2025

Adaptive Motion Planning via Contact-Based Intent Inference for Human-Robot Collaboration

Jiurun Song

X. Liang

Minghui Zheng

09 Oct 2025

RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training

...

08 Oct 2025

Deterministic algorithms for inhomogeneous Bernoulli trials: Shapley value of network devices

Jesse D Wei

Guo Wei

FAtt

226

08 Oct 2025

Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

IEEE Access, 2025

262

08 Oct 2025

Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions

Frank Wu

Mengye Ren

156

08 Oct 2025

Incoherence in goal-conditioned autoregressive models

Jacek Karwowski

Raymond Douglas

109

08 Oct 2025

Phase Diagram of Dropout for Two-Layer Neural Networks in the Mean-Field Regime

Lénaic Chizat

Pierre Marion

Yerkin Yesbay

105

08 Oct 2025

Multi-Task Reinforcement Learning with Language-Encoded Gated Policy Networks

Rushiv Arora

MoE

103

07 Oct 2025

Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor Policies

137

07 Oct 2025

BuilderBench -- A benchmark for generalist agents

138

07 Oct 2025

Controllable Audio-Visual Viewpoint Generation from 360° Spatial Information

151

07 Oct 2025

Automaton Constrained Q-Learning

Anastasios Manganaris

Vittorio Giammarino

A. H. Qureshi

195

06 Oct 2025

General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks

112

06 Oct 2025

Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning

182

06 Oct 2025

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

Theodore Jerome Tinker

Kenji Doya

Jun Tani

LM&Ro

LRM

229

06 Oct 2025

DREAMer-VXS: A Latent World Model for Sample-Efficient AGV Exploration in Stochastic, Unobserved Environments

Agniprabha Chakraborty

06 Oct 2025

LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0

06 Oct 2025

Flexible Locomotion Learning with Diffusion Model Predictive Control

158

05 Oct 2025

A KL-regularization framework for learning to plan with adaptive priors

123

05 Oct 2025

Unsupervised Transformer Pre-Training for Images: Self-Distillation, Mean Teachers, and Random Crops

Mattia Scardecchia

ViT

169

04 Oct 2025

Comparative Analysis of Parameterized Action Actor-Critic Reinforcement Learning Algorithms for Web Search Match Plan Generation

Ubayd Bapoo

Clement N Nyirenda

141

03 Oct 2025

D2 Actor Critic: Diffusion Actor Meets Distributional Critic

264

03 Oct 2025

A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models

02 Oct 2025

Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning

159

02 Oct 2025

Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method

01 Oct 2025

Fixing That Free Lunch: When, Where, and Why Synthetic Data Fails in Model-Based Policy Optimization

Brett Barkley

David Fridovich-Keil

OffRL

167

01 Oct 2025

Differentiable Skill Optimisation for Powder Manipulation in Laboratory Automation

102

01 Oct 2025

Constant in an Ever-Changing World

01 Oct 2025

Diversity-Incentivized Exploration for Versatile Reasoning

146

30 Sep 2025

Memory-Driven Self-Improvement for Decision Making with Large Language Models

128

30 Sep 2025

Noise-Guided Transport for Imitation Learning

Lionel Blondé

Joao A. Candido Ramos

Alexandros Kalousis

204

30 Sep 2025

Accelerating Transformers in Online RL

143

30 Sep 2025

Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models

30 Sep 2025

Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption

314

29 Sep 2025