Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
arXiv: 1801.01290 (v2, latest)

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown
Diffusion Fine-Tuning via Reparameterized Policy Gradient of the Soft Q-Function
Hyeongyu Kang
Jaewoo Lee
Woocheol Shin
Kiyoung Om
Jinkyoo Park
101
0
0
04 Dec 2025
Model Whisper: Steering Vectors Unlock Large Language Models' Potential in Test-time
Xinyue Kang
Diwei Shi
Li Chen
LLMSV, LRM
222
0
0
04 Dec 2025
Natural Language Actor-Critic: Scalable Off-Policy Learning in Language Space
Joey Hong
Kang Liu
Zhan Ling
Jiecao Chen
Sergey Levine
LLMAG, OffRL
160
0
0
04 Dec 2025
Guided Flow Policy: Learning from High-Value Actions in Offline Reinforcement Learning
Franki Nguimatsia Tiofack
Théotime Le Hellard
Fabian Schramm
Nicolas Perrin-Gilbert
Justin Carpentier
242
0
0
03 Dec 2025
Variable-Impedance Muscle Coordination under Slow-Rate Control Frequencies and Limited Observation Conditions Evaluated through Legged Locomotion
Hidaka Asai
Tomoyuki Noda
Jun Morimoto
112
0
0
03 Dec 2025
World Models for Autonomous Navigation of Terrestrial Robots from LIDAR Observations
Raul Steinmetz
Fabio Demo Rosa
V. A. Kich
J. A. Bottega
Ricardo B. Grando
D. T. Gamarra
3DV
380
0
0
03 Dec 2025
Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+(λ,λ))-GA
Tai Nguyen
Phong Le
André Biedenkapp
Carola Doerr
Nguyen Dang
62
0
0
03 Dec 2025
GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding
Johannes Gaber
Meshal Alharbi
Daniele Gammelli
G. Zardini
43
0
0
02 Dec 2025
Vehicle Dynamics Embedded World Models for Autonomous Driving
Huiqian Li
Wei Pan
Haodong Zhang
Jin Huang
Zhihua Zhong
148
0
0
02 Dec 2025
Dual-Robust Cross-Domain Offline Reinforcement Learning Against Dynamics Shifts
Zhongjian Qiao
Rui Yang
Jiafei Lyu
Xiu Li
Zhongxiang Dai
Zhuoran Yang
Siyang Gao
Shuang Qiu
OffRL
155
0
0
02 Dec 2025
Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering
Zhongjian Qiao
Rui Yang
Jiafei Lyu
Chenjia Bai
Xiu Li
Zhuoran Yang
Siyang Gao
Shuang Qiu
OffRL
144
0
0
02 Dec 2025
GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies
Chubin Zhang
Zhenglin Wan
Feng Chen
Xingrui Yu
Ivor W. Tsang
Bo An
83
0
0
02 Dec 2025
Differentiable Weightless Controllers: Learning Logic Circuits for Continuous Control
Fabian Kresse
Christoph H. Lampert
203
0
0
01 Dec 2025
How do trout regulate patterns of muscle contraction to optimize propulsive efficiency during steady swimming
Tao Li
Chunze Zhang
Weiwei Yao
Junzhao He
Ji Hou
Qin Zhou
Lu Zhang
45
0
0
01 Dec 2025
On the Tension Between Optimality and Adversarial Robustness in Policy Optimization
Haoran Li
Jiayu Lv
Congying Han
Zicheng Zhang
Anqi Li
Y. Liu
Tiande Guo
Nan Jiang
AAML
139
0
0
01 Dec 2025
A Diffusion Model Framework for Maximum Entropy Reinforcement Learning
Sebastian Sanokowski
Kaustubh Patil
Alois Knoll
DiffM
123
0
0
01 Dec 2025
Discovering Self-Protective Falling Policy for Humanoid Robot via Deep Reinforcement Learning
Diyuan Shi
Shangke Lyu
Donglin Wang
127
0
0
01 Dec 2025
MS-PPO: Morphological-Symmetry-Equivariant Policy for Legged Robot Locomotion
Sizhe Wei
Xulin Chen
Fengze Xie
Garrett E. Katz
Zhenyu Gan
Lu Gan
53
0
0
30 Nov 2025
Shielded Controller Units for RL with Operational Constraints Applied to Remote Microgrids
Hadi Nekoei
Alexandre Blondin Massé
Rachid Hassani
Sarath Chandar
Vincent Mai
61
0
0
30 Nov 2025
Partially Equivariant Reinforcement Learning in Symmetry-Breaking Environments
Junwoo Chang
Minwoo Park
Joohwan Seo
R. Horowitz
Jongmin Lee
Jongeun Choi
61
1
0
30 Nov 2025
An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
Jianhai Su
Jinzhu Luo
Qi Zhang
OffRL, OnRL
253
0
0
29 Nov 2025
MARVO: Marine-Adaptive Radiance-aware Visual Odometry
Sacchin Sundar
Atman Kikani
Aaliya Alam
Sumukh Shrote
A. Nayeemulla Khan
A. Shahina
MDE
377
0
0
28 Nov 2025
Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions
Roland Stolz
Michael Eichelbeck
Matthias Althoff
21
0
0
27 Nov 2025
Independent policy gradient-based reinforcement learning for economic and reliable energy management of multi-microgrid systems
Junkai Hu
Li Xia
375
0
0
26 Nov 2025
Reinforcing Action Policies by Prophesying
Jiahui Zhang
Ze Huang
Chun Gu
Zipei Ma
Li Zhang
233
1
0
25 Nov 2025
Multi-Agent Cross-Entropy Method with Monotonic Nonlinear Critic Decomposition
Yan Wang
Ke Deng
Yongli Ren
159
0
0
24 Nov 2025
Accelerating Reinforcement Learning via Error-Related Human Brain Signals
Suzie Kim
Hye-Bin Shin
Hyo-Jeong Jang
OffRL
207
0
0
24 Nov 2025
FastForward Pruning: Efficient LLM Pruning via Single-Step Reinforcement Learning
Xin Yuan
S. Li
Jiateng Wei
Chengrui Zhu
Yanming Wu
Qingpeng Li
Jiajun Lv
Xiaoke Lan
Jun Chen
Yong-Jin Liu
OffRL
373
0
0
24 Nov 2025
Active Inference is a Subtype of Variational Inference
Wouter W. L. Nuijten
Mykola Lukashchuk
153
0
0
24 Nov 2025
First-order Sobolev Reinforcement Learning
Fabian Schramm
Nicolas Perrin-Gilbert
Justin Carpentier
58
0
0
24 Nov 2025
MOMA-AC: A preference-driven actor-critic framework for continuous multi-objective multi-agent reinforcement learning
Neurocomputing, 2025
Adam Callaghan
Karl Mason
Patrick Mannion
AI4CE
98
0
0
22 Nov 2025
Reward Engineering for Spatial Epidemic Simulations: A Reinforcement Learning Platform for Individual Behavioral Learning
Radman Rakhshandehroo
Daniel Coombs
106
0
0
22 Nov 2025
Physical Reinforcement Learning
Sam Dillavou
Shruti Mishra
OffRL
157
0
0
21 Nov 2025
Optimizing Operation Recipes with Reinforcement Learning for Safe and Interpretable Control of Chemical Processes
D. Brandner
Sergio Lucia
143
0
0
20 Nov 2025
MagBotSim: Physics-Based Simulation and Reinforcement Learning Environments for Magnetic Robotics
Lara Bergmann
Cedric Grothues
Klaus Neumann
109
0
0
20 Nov 2025
Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization
Haohui Chen
Zhiyong Chen
Aoxiang Liu
Wentuo Fang
132
0
0
20 Nov 2025
Limitations of Scalarisation in MORL: A Comparative Study in Discrete Environments
Muhammad Saóod Shah
Asad Jeewa
138
0
0
20 Nov 2025
Stabilizing Policy Gradient Methods via Reward Profiling
Shihab Ahmed
El Houcine Bergou
A. Dutta
Yue Wang
204
0
0
20 Nov 2025
EntroPIC: Towards Stable Long-Term Training of LLMs via Entropy Stabilization with Proportional-Integral Control
Kai Yang
Xin Xu
Yangkun Chen
Weijie Liu
Jiafei Lyu
Zichuan Lin
Deheng Ye
Saiyong Yang
237
1
0
19 Nov 2025
Task Specific Sharpness Aware O-RAN Resource Management using Multi Agent Reinforcement Learning
IEEE Transactions on Machine Learning in Communications and Networking (IEEE TMLCN), 2025
Fatemeh Lotfi
Hossein Rajoli
Fatemeh Afghah
101
0
0
19 Nov 2025
IPR-1: Interactive Physical Reasoner
Mingyu Zhang
Lifeng Zhuo
Tianxi Tan
Guocan Xie
Xian Nie
...
Renjie Zhao
Zizhu He
Z. Wang
Jiting Cai
Yong-Lu Li
PINN, LRM, AI4CE
402
0
0
19 Nov 2025
Learning Human-Like RL Agents Through Trajectory Optimization With Action Quantization
Jian-Ting Guo
Yu-Cheng Chen
Ping-Chun Hsieh
Kuo-Hao Ho
Po-Wei Huang
Ti-Rong Wu
I-Chen Wu
88
0
0
19 Nov 2025
Transformer-Guided Deep Reinforcement Learning for Optimal Takeoff Trajectory Design of an eVTOL Drone
Nathan M. Roberts II
Xiaosong Du
128
0
0
18 Nov 2025
$π^{*}_{0.6}$: a VLA That Learns From Experience
Physical Intelligence
Ali Amin
Raichelle Aniceto
Ashwin Balakrishna
Kevin Black
...
Blake Williams
Sukwon Yoo
Lili Yu
Ury Zhilinsky
Zhiyuan Zhou
OffRL, VLM
897
16
0
18 Nov 2025
Reinforcement Learning from Implicit Neural Feedback for Human-Aligned Robot Control
Suzie Kim
OffRL
69
0
0
18 Nov 2025
Soft Conflict-Resolution Decision Transformer for Offline Multi-Task Reinforcement Learning
Shudong Wang
Xinfei Wang
Chenhao Zhang
Shanchen Pang
Haiyuan Gui
Wenhao Ji
Xiaojian Liao
OffRL
125
0
0
17 Nov 2025
Scalable Multi-Objective and Meta Reinforcement Learning via Gradient Estimation
Zhenshuo Zhang
Minxuan Duan
Youran Ye
Hongyang R. Zhang
OffRL
415
0
0
16 Nov 2025
Quantile Q-Learning: Revisiting Offline Extreme Q-Learning with Quantile Regression
Xinming Gao
Shangzhe Li
Yujin Cai
Wenwu Yu
OffRL
109
0
0
15 Nov 2025
Reinforcement Learning for Charging Optimization of Inhomogeneous Dicke Quantum Batteries
Xiaobin Song
Siyuan Bai
Da-Wei Wang
Hanxiao Tao
Xizhe Wang
Rebing Wu
Benben Jiang
40
0
0
15 Nov 2025
Intelligent Collaborative Optimization for Rubber Tyre Film Production Based on Multi-path Differentiated Clipping Proximal Policy Optimization
Yinghao Ruan
Wei Pang
Shuaihao Liu
Huili Yang
Leyi Han
Xinghui Dong
188
0
0
15 Nov 2025