Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown
A Primer on SO(3) Action Representations in Deep Reinforcement Learning
Martin Schuck
Sherif Samy
Angela P. Schoellig
101
0
0
13 Oct 2025
PAC-Bayesian Reinforcement Learning Trains Generalizable Policies
Abdelkrim Zitouni
Mehdi Hennequin
Juba Agoun
Ryan Horache
Nadia Kabachi
Omar Rivasplata
OffRL BDL
167
0
0
12 Oct 2025
Reinforcement Fine-Tuning of Flow-Matching Policies for Vision-Language-Action Models
Mingyang Lyu
Yinqian Sun
Erliang Lin
Huangrui Li
Ruolin Chen
Feifei Zhao
Yi Zeng
113
0
0
11 Oct 2025
Dejavu: Towards Experience Feedback Learning for Embodied Intelligence
Shaokai Wu
Yanbiao Ji
Qiuchang Li
Zhiyi Zhang
Shalayiding Sirejiding
Wenyuan Xie
Guodong Zhang
Bayram Bayramli
Yue Ding
Hongtao Lu
160
0
0
11 Oct 2025
Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework
Kohio Deflesselle
Mélodie Daniel
Aly Magassouba
Miguel Aranda
Olivier Ly
103
0
0
11 Oct 2025
Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective
Zhezheng Hao
Hong Wang
Haoyang Liu
Jian Luo
Jiarui Yu
Hande Dong
Qiang Lin
Can Wang
Jiawei Chen
AAML
85
7
0
11 Oct 2025
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Shangzhe Li
Dongruo Zhou
Weitong Zhang
OffRL
178
0
0
10 Oct 2025
Token-Level Policy Optimization: Linking Group-Level Rewards to Token-Level Aggregation via Markov Likelihood
Xingyu Lin
Yilin Wen
E. Wang
Du Su
Wenbin Liu
Chenfu Bao
Zhonghou Lv
88
1
0
10 Oct 2025
Robust Driving Control for Autonomous Vehicles: An Intelligent General-sum Constrained Adversarial Reinforcement Learning Approach
Junchao Fan
Qi Wei
Ruichen Zhang
Dusit Niyato
Yang Lu
Jianhua Wang
Xiaolin Chang
B. Ai
AAML
137
1
0
10 Oct 2025
Energy-Guided Diffusion Sampling for Long-Term User Behavior Prediction in Reinforcement Learning-based Recommendation
Xiaocong Chen
Siyu Wang
Lina Yao
OffRL
105
0
0
09 Oct 2025
Entropy Regularizing Activation: Boosting Continuous Control, Large Language Models, and Image Classification with Activation as Entropy Constraints
Zilin Kang
Chonghua Liao
Tingqiang Xu
Huazhe Xu
220
1
0
09 Oct 2025
Gaze on the Prize: Shaping Visual Attention with Return-Guided Contrastive Learning
Andrew Lee
Ian Chuang
D. Gao
Kai Fukazawa
Iman Soltani
199
0
0
09 Oct 2025
Continual Learning for Adaptive AI Systems
Md Hasibul Amin
Tamzid Tanvi Alam
CLL
252
1
0
09 Oct 2025
Control Synthesis of Cyber-Physical Systems for Real-Time Specifications through Causation-Guided Reinforcement Learning
Xiaochen Tang
Zhenya Zhang
Miaomiao Zhang
Jie An
86
0
0
09 Oct 2025
Maximum In-Support Return Modeling for Dynamic Recommendation with Language Model Prior
Xiaocong Chen
Siyu Wang
Lina Yao
OffRL AI4TS
91
0
0
09 Oct 2025
Zero-Shot Policy Transfer in Reinforcement Learning using Buckingham's Pi Theorem
Francisco Pascoa
Ian Lalonde
Alexandre Girard
OffRL
114
1
0
09 Oct 2025
Adaptive Motion Planning via Contact-Based Intent Inference for Human-Robot Collaboration
Jiurun Song
X. Liang
Minghui Zheng
98
1
0
09 Oct 2025
RLinf-VLA: A Unified and Efficient Framework for VLA+RL Training
Hongzhi Zang
Mingjie Wei
Si Xu
Y. Wu
Zhen Guo
...
Wenhao Tang
Quanlu Zhang
W. Zhang
Chao Yu
Yu Wang
VLM
89
6
0
08 Oct 2025
Deterministic algorithms for inhomogeneous Bernoulli trials: Shapley value of network devices
Jesse D Wei
Guo Wei
FAtt
226
0
0
08 Oct 2025
Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications
IEEE Access, 2025
Kento Kawaharazuka
Jihoon Oh
Jun Yamada
Ingmar Posner
Yuke Zhu
LM&Ro
262
24
0
08 Oct 2025
Local Reinforcement Learning with Action-Conditioned Root Mean Squared Q-Functions
Frank Wu
Mengye Ren
156
0
0
08 Oct 2025
Incoherence in goal-conditioned autoregressive models
Jacek Karwowski
Raymond Douglas
109
0
0
08 Oct 2025
Phase Diagram of Dropout for Two-Layer Neural Networks in the Mean-Field Regime
Lénaic Chizat
Pierre Marion
Yerkin Yesbay
105
0
0
08 Oct 2025
Multi-Task Reinforcement Learning with Language-Encoded Gated Policy Networks
Rushiv Arora
MoE
103
0
0
07 Oct 2025
Oracle-Guided Masked Contrastive Reinforcement Learning for Visuomotor Policies
Yuhang Zhang
Jiaping Xiao
Chao Yan
Mir Feroskhan
137
0
0
07 Oct 2025
BuilderBench -- A benchmark for generalist agents
Raj Ghugare
Catherine Ji
Kathryn Wantlin
Jin Schofield
Benjamin Eysenbach
138
1
0
07 Oct 2025
Controllable Audio-Visual Viewpoint Generation from 360° Spatial Information
Christian Marinoni
R. F. Gramaccioni
Eleonora Grassucci
Danilo Comminiello
VGen
151
0
0
07 Oct 2025
Automaton Constrained Q-Learning
Anastasios Manganaris
Vittorio Giammarino
A. H. Qureshi
195
1
0
06 Oct 2025
General and Efficient Visual Goal-Conditioned Reinforcement Learning using Object-Agnostic Masks
Fahim Shahriar
Cheryl Wang
Alireza Azimi
Gautham Vasan
Hany Hamed Elanwar
A. Rupam Mahmood
Colin Bellinger
112
0
0
06 Oct 2025
Let it Calm: Exploratory Annealed Decoding for Verifiable Reinforcement Learning
Chenghao Yang
Lin Gui
Chenxiao Yang
Victor Veitch
Lizhu Zhang
Zhuokai Zhao
OffRL
182
0
0
06 Oct 2025
Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration
Theodore Jerome Tinker
Kenji Doya
Jun Tani
LM&Ro LRM
229
0
0
06 Oct 2025
DREAMer-VXS: A Latent World Model for Sample-Efficient AGV Exploration in Stochastic, Unobserved Environments
Agniprabha Chakraborty
55
0
0
06 Oct 2025
LMM-Incentive: Large Multimodal Model-based Incentive Design for User-Generated Content in Web 3.0
Jinbo Wen
Jiawen Kang
Linfeng Zhang
Xiaoying Tang
Jianhang Tang
Yang Zhang
Zhaohui Yang
Dusit Niyato
99
0
0
06 Oct 2025
Flexible Locomotion Learning with Diffusion Model Predictive Control
Runhan Huang
Haldun Balim
Heng Yang
Yilun Du
158
1
0
05 Oct 2025
A KL-regularization framework for learning to plan with adaptive priors
Álvaro Serra-Gómez
Daniel Jarne Ornia
Dhruva Tirumala
Thomas Moerland
OffRL
123
1
0
05 Oct 2025
Unsupervised Transformer Pre-Training for Images: Self-Distillation, Mean Teachers, and Random Crops
Mattia Scardecchia
ViT
169
0
0
04 Oct 2025
Comparative Analysis of Parameterized Action Actor-Critic Reinforcement Learning Algorithms for Web Search Match Plan Generation
Ubayd Bapoo
Clement N Nyirenda
141
0
0
03 Oct 2025
D2 Actor Critic: Diffusion Actor Meets Distributional Critic
Lunjun Zhang
Shuo Han
Hanrui Lyu
Bradly C. Stadie
OffRL
264
1
0
03 Oct 2025
A Recipe for Efficient Sim-to-Real Transfer in Manipulation with Online Imitation-Pretrained World Models
Yilin Wang
Shangzhe Li
Haoyi Niu
Zhiao Huang
Weitong Zhang
H. Su
OffRL
91
1
0
02 Oct 2025
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
Ahmed Hendawy
Henrik Metternich
Théo Vincent
Mahdi Kallel
Jan Peters
Carlo D'Eramo
OffRL
159
0
0
02 Oct 2025
Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method
Andy Wu
Chun-Cheng Lin
Rung-Tzuo Liaw
Yuehua Huang
Chihjung Kuo
Chia Tong Weng
93
0
0
01 Oct 2025
Fixing That Free Lunch: When, Where, and Why Synthetic Data Fails in Model-Based Policy Optimization
Brett Barkley
David Fridovich-Keil
OffRL
167
0
0
01 Oct 2025
Differentiable Skill Optimisation for Powder Manipulation in Laboratory Automation
Minglun Wei
Xintong Yang
Yu-kun Lai
Ze Ji
102
0
0
01 Oct 2025
Constant in an Ever-Changing World
Andy Wu
Chun-Cheng Lin
Yuehua Huang
Rung-Tzuo Liaw
CLL
60
0
0
01 Oct 2025
Diversity-Incentivized Exploration for Versatile Reasoning
Zican Hu
Shilin Zhang
Yafu Li
Jianhao Yan
Xuyang Hu
Leyang Cui
Xiaoye Qu
C. L. Philip Chen
Yu Cheng
Zhi Wang
LRM
146
2
0
30 Sep 2025
Memory-Driven Self-Improvement for Decision Making with Large Language Models
Xue Yan
Chinmay Pani
Mengyue Yang
Yan Song
Haifeng Zhang
Yingzhen Li
Ning Yang
128
0
0
30 Sep 2025
Noise-Guided Transport for Imitation Learning
Lionel Blondé
Joao A. Candido Ramos
Alexandros Kalousis
OT
204
0
0
30 Sep 2025
Accelerating Transformers in Online RL
Daniil Zelezetsky
A. Kovalev
Aleksandr I. Panov
OffRL
143
0
0
30 Sep 2025
Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models
Jaesung R. Park
Junsu Kim
Gyeongman Kim
Jinyoung Jo
Sean Choi
Jaewoong Cho
Ernest K. Ryu
97
1
0
30 Sep 2025
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
Longxiang He
Deheng Ye
Junbo Tan
Xueqian Wang
Li Shen
OnRL
314
0
0
29 Sep 2025