Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown
Data-Efficient Multitask DAgger
Haotian Fu
Ran Gong
Xiaohan Zhang
M. Minniti
Jigarkumar Patel
Karl Schmeckpeper
OffRL
138
0
0
29 Sep 2025
Parallel Heuristic Search as Inference for Actor-Critic Reinforcement Learning Models
Hanlan Yang
Itamar Mishani
Luca Pivetti
Zachary Kingston
Maxim Likhachev
OffRL LRM
68
0
0
29 Sep 2025
Polychromic Objectives for Reinforcement Learning
Jubayer Ibn Hamid
Ifdita Hasan Orney
Ellen Xu
Chelsea Finn
Dorsa Sadigh
OffRL
107
1
0
29 Sep 2025
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
Longxiang He
Deheng Ye
Junbo Tan
Xueqian Wang
Li Shen
OnRL
314
0
0
29 Sep 2025
Unlocking the Potential of Soft Actor-Critic for Imitation Learning
Nayari Marie Lessa
Melya Boukheddimi
Frank Kirchner
97
0
0
29 Sep 2025
STAIR: Addressing Stage Misalignment through Temporal-Aligned Preference Reinforcement Learning
Yao Luan
Ni Mu
Yiqin Yang
Bo Xu
Qing-Shan Jia
99
0
0
28 Sep 2025
An Investigation of Batch Normalization in Off-Policy Actor-Critic Algorithms
Li Wang
Sudun
X. Zhang
Wenjun Wu
Lei Huang
OffRL
153
0
0
28 Sep 2025
DexFlyWheel: A Scalable and Self-improving Data Generation Framework for Dexterous Manipulation
Kefei Zhu
Fengshuo Bai
YuanHao Xiang
Yishuai Cai
Xinglin Chen
...
X. Wang
Hao Dong
Yaodong Yang
Xiaopeng Fan
Yuanpei Chen
97
3
0
28 Sep 2025
Bridging Discrete and Continuous RL: Stable Deterministic Policy Gradient with Martingale Characterization
Ziheng Cheng
Xin Guo
Yufei Zhang
OffRL
104
0
0
28 Sep 2025
Mash, Spread, Slice! Learning to Manipulate Object States via Visual Spatial Progress
Priyanka Mandikal
Jiaheng Hu
Shivin Dass
Sagnik Majumder
Roberto Martín-Martín
Kristen Grauman
151
1
0
28 Sep 2025
DiBS-MTL: Transformation-Invariant Multitask Learning with Direction Oracles
Surya Murthy
Kushagra Gupta
Mustafa O. Karabag
David Fridovich-Keil
Ufuk Topcu
147
0
0
28 Sep 2025
ZeroSiam: An Efficient Siamese for Test-Time Entropy Optimization without Collapse
Guohao Chen
Shuaicheng Niu
Deyu Chen
Jiahao Yang
Zitian Zhang
Zhuliang Yu
Pengcheng Wu
Zhiqi Shen
AAML VLM
132
0
0
27 Sep 2025
Trust Region Reward Optimization and Proximal Inverse Reward Optimization Algorithm
Yang Chen
Menglin Zou
Jiaqi Zhang
Y. Zhang
Junyi Yang
Gaël Gendron
Libo Zhang
Jiamou Liu
Michael Witbrock
190
0
0
27 Sep 2025
Continuous-Time Reinforcement Learning for Asset-Liability Management
Yilie Huang
76
0
0
27 Sep 2025
LAGEA: Language Guided Embodied Agents for Robotic Manipulation
Abdul Monaf Chowdhury
Akm Moshiur Rahman Mazumder
Rabeya Akter
S. Arib
LM&Ro
110
0
0
27 Sep 2025
Quantile Advantage Estimation for Entropy-Safe Reasoning
Junkang Wu
Kexin Huang
Jiancan Wu
An Zhang
Xiang Wang
Xiangnan He
129
2
0
26 Sep 2025
Triple-BERT: Do We Really Need MARL for Order Dispatch on Ride-Sharing Platforms?
Zijian Zhao
S. Li
OffRL
129
0
0
26 Sep 2025
Functional Critics Are Essential in Off-Policy Actor-Critic: Provable Convergence and Efficient Exploration
Qinxun Bai
Yuxuan Han
Wei Xu
Zhengyuan Zhou
OffRL
160
0
0
26 Sep 2025
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
Yulei Qin
Xiaoyu Tan
Zhengbao He
Gang Li
Haojia Lin
...
Yuzheng Cai
Xuan Zhang
Sheng Ye
Ke Li
Xing Sun
398
1
0
26 Sep 2025
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning
Xu Wujiang
Wentian Zhao
Zhenting Wang
Li Yu-Jhe
Jin Can
Jin Mingyu
Mei Kai
Wan Kun
Metaxas Dimitris
101
0
0
26 Sep 2025
Reinforcement Learning for Durable Algorithmic Recourse
Marina Ceccon
Alessandro Fabris
Goran Radanović
Asia J. Biega
Gian Antonio Susto
114
0
0
26 Sep 2025
MTRec: Learning to Align with User Preferences via Mental Reward Models
Mengchen Zhao
Yifan Gao
Yaqing Hou
Xiangyang Li
Pengjie Gu
Zhenhua Dong
Ruiming Tang
Yi Cai
200
0
0
26 Sep 2025
From Parameters to Behavior: Unsupervised Compression of the Policy Space
Davide Tenedini
Riccardo Zamboni
Mirco Mutti
Marcello Restelli
139
1
0
26 Sep 2025
Inverse Reinforcement Learning Using Just Classification and a Few Regressions
Lars van der Laan
Nathan Kallus
Aurélien F. Bibaut
89
2
0
25 Sep 2025
Cross-Modal Instructions for Robot Motion Generation
William Barron
Xiaoxiang Dong
Matthew Johnson-Roberson
Weiming Zhi
108
0
0
25 Sep 2025
Fine-Tuning LLMs to Analyze Multiple Dimensions of Code Review: A Maximum Entropy Regulated Long Chain-of-Thought Approach
Yongda Yu
Guohao Shi
Xianwei Wu
Haochuan He
XueMing Gu
...
Kui Liu
Qiushi Wang
Zhao Tian
Haifeng Shen
Guoping Rong
LRM
137
0
0
25 Sep 2025
MPC-based Deep Reinforcement Learning Method for Space Robotic Control with Fuel Sloshing Mitigation
Mahya Ramezani
M. Alandihallaj
Barış Can Yalçın
Miguel Angel Olivares Mendez
Holger Voos
55
1
0
25 Sep 2025
Actor-Critic without Actor
Donghyeon Ki
Hee-Jun Ahn
Kyungyoon Kim
Byung-Jun Lee
OffRL
161
0
0
25 Sep 2025
Model-Based Reinforcement Learning under Random Observation Delays
Armin Karamzade
Kyungmin Kim
JB Lanier
Davide Corsi
Roy Fox
OffRL
132
0
0
25 Sep 2025
Teaching RL Agents to Act Better: VLM as Action Advisor for Online Reinforcement Learning
Xiefeng Wu
Jing Zhao
Shu Zhang
Mingyu Hu
OffRL
99
1
0
25 Sep 2025
CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning
Zhenpeng Su
Leiyu Pan
Minxuan Lv
Yuntao Li
Wenping Hu
Fuzheng Zhang
Kun Gai
Guorui Zhou
204
0
0
25 Sep 2025
Robot Trajectron V2: A Probabilistic Shared Control Framework for Navigation
Pinhao Song
Yurui Du
Ophelie Saussus
Sofie De Schrijver
Irene Caprara
Peter Janssen
Renaud Detry
110
0
0
24 Sep 2025
Failure Modes of Maximum Entropy RLHF
Ömer Veysel Çağatan
Barış Akgün
115
0
0
24 Sep 2025
Complexity-Driven Policy Optimization
Luca Serfilippi
Giorgio Franceschelli
Antonio Corradi
Mirco Musolesi
77
0
0
24 Sep 2025
Selective Progress-Aware Querying for Human-in-the-Loop Reinforcement Learning
Anujith Muraleedharan
Anamika J H
65
0
0
24 Sep 2025
Embodied AI: From LLMs to World Models
Tongtong Feng
Xin Wang
Yu Jiang
Wenwu Zhu
LM&Ro
339
11
0
24 Sep 2025
Frictional Q-Learning
Hyunwoo Kim
Hyo Kyung Lee
OffRL
155
0
0
24 Sep 2025
Memory-Augmented Potential Field Theory: A Framework for Adaptive Control in Non-Convex Domains
Dongzhe Zheng
Wenjie Mei
109
0
0
24 Sep 2025
DexSkin: High-Coverage Conformable Robotic Skin for Learning Contact-Rich Manipulation
Suzannah Wistreich
Baiyu Shi
Stephen Tian
Samuel Clarke
Michael Nath
Chengyi Xu
Zhenan Bao
Jiajun Wu
180
1
0
23 Sep 2025
Efficient Reinforcement Learning by Reducing Forgetting with Elephant Activation Functions
Qingfeng Lan
Gautham Vasan
A. R. Mahmood
CLL
135
0
0
23 Sep 2025
Residual Off-Policy RL for Finetuning Behavior Cloning Policies
Lars Ankile
Zhenyu Jiang
Rocky Duan
Guanya Shi
Pieter Abbeel
Anusha Nagabandi
OffRL
222
4
0
23 Sep 2025
Reduced-Order Model-Guided Reinforcement Learning for Demonstration-Free Humanoid Locomotion
Shuai Liu
Meng Cheng Lau
100
0
0
23 Sep 2025
SOE: Sample-Efficient Robot Policy Self-Improvement via On-Manifold Exploration
Yang Jin
Jun Lv
Han Xue
Wendi Chen
Chuan Wen
Cewu Lu
177
0
0
23 Sep 2025
Real-Time Reinforcement Learning for Dynamic Tasks with a Parallel Soft Robot
James Avtges
Jake Ketchum
Millicent Schlafly
Helena Young
Taekyoung Kim
Allison Pinosky
R. Truby
Todd Murphey
116
1
0
23 Sep 2025
RL-augmented Adaptive Model Predictive Control for Bipedal Locomotion over Challenging Terrain
Junnosuke Kamohara
Feiyang Wu
Chinmayee Wamorkar
Seth Hutchinson
Ye Zhao
145
2
0
22 Sep 2025
Fast Trajectory Planner with a Reinforcement Learning-based Controller for Robotic Manipulators
Engineering Applications of Artificial Intelligence (EAAI), 2025
Yongliang Wang
Hamidreza Kasaei
113
0
0
22 Sep 2025
Preference Distillation via Value based Reinforcement Learning
M. Kwon
Junwon Ko
Kangil Kim
Junmo Kim
153
0
0
21 Sep 2025
End2Race: Efficient End-to-End Imitation Learning for Real-Time F1Tenth Racing
Zhijie Qiao
Haowei Li
Zhong Cao
Henry X. Liu
105
0
0
21 Sep 2025
Bayesian Ego-graph Inference for Networked Multi-Agent Reinforcement Learning
Wei Duan
Jie Lu
Junyu Xuan
BDL
212
0
0
20 Sep 2025
Mental Accounts for Actions: EWA-Inspired Attention in Decision Transformers
Zahra Aref
Narayan B. Mandayam
OffRL
112
0
0
19 Sep 2025