Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown
A Fairness-Aware Strategy for B5G Physical-layer Security Leveraging Reconfigurable Intelligent Surfaces
Alex Pierron
Michel Barbeau
L. D. Cicco
José Rubio-Hernán
Joaquin Garcia-Alfaro
94
0
0
01 Jun 2025
Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning
Jianglin Ding
Jingcheng Tang
Gangshan Jing
152
0
0
01 Jun 2025
Optimistic critics can empower small actors
Olya Mastikhina
Dhruv Sreenivas
Pablo Samuel Castro
578
3
0
01 Jun 2025
Local Manifold Approximation and Projection for Manifold-Aware Diffusion Planning
Kyowoon Lee
Jaesik Choi
DiffM
322
3
0
01 Jun 2025
Prompt-Tuned LLM-Augmented DRL for Dynamic O-RAN Network Slicing
Fatemeh Lotfi
Hossein Rajoli
Fatemeh Afghah
235
7
0
31 May 2025
Comparing Traditional and Reinforcement-Learning Methods for Energy Storage Control
Elinor Ginzburg
Itay Segev
Yoash Levron
Sarah Keren
OffRL
49
1
0
31 May 2025
Mitigating Plasticity Loss in Continual Reinforcement Learning by Reducing Churn
Hongyao Tang
J. Obando-Ceron
Pablo Samuel Castro
Aaron Courville
Glen Berseth
203
3
0
31 May 2025
BASIL: Best-Action Symbolic Interpretable Learning for Evolving Compact RL Policies
Kourosh Shahnazari
Seyed Moein Ayyoubzadeh
Mohammadali Keshtparvar
OffRL
240
0
0
31 May 2025
MOFGPT: Generative Design of Metal-Organic Frameworks using Language Models
Journal of Chemical Information and Modeling (JCIM), 2025
Srivathsan Badrinarayanan
Rishikesh Magar
Akshay Antony
Radheesh Sharma Meda
Amir Barati Farimani
AI4CE
162
10
0
30 May 2025
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer
Yilun Kong
Guozheng Ma
Qi Zhao
Haoyu Wang
Li Shen
Xueqian Wang
Dacheng Tao
MoE, OffRL
227
4
0
30 May 2025
Enhanced DACER Algorithm with High Diffusion Efficiency
Yinuo Wang
Mining Tan
Wenjun Zou
Haotian Lin
Xujie Song
...
Tianze Zhu
Shiqi Liu
Jingliang Duan
Shengbo Eben Li
DiffM
366
6
0
29 May 2025
Human sensory-musculoskeletal modeling and control of whole-body movements
Chenhui Zuo
Guohao Lin
Chen Zhang
Shanning Zhuang
Yanan Sui
109
0
0
29 May 2025
Discriminative Policy Optimization for Token-Level Reward Models
Hongzhan Chen
Tao Yang
Shiping Gao
Ruijun Chen
Xiaojun Quan
Hongtao Tian
Ting Yao
191
3
0
29 May 2025
CURVE: CLIP-Utilized Reinforcement Learning for Visual Image Enhancement via Simple Image Processing
International Conference on Information Photonics (ICIP), 2025
Yuka Ogino
Takahiro Toizumi
Atsushi Ito
CLIP
393
0
0
29 May 2025
Normalizing Flows are Capable Models for RL
Raj Ghugare
Benjamin Eysenbach
OffRL, AI4CE
371
6
0
29 May 2025
Bigger, Regularized, Categorical: High-Capacity Value Functions are Efficient Multi-Task Learners
Michal Nauman
Marek Cygan
Carmelo Sferrazza
Aviral Kumar
Pieter Abbeel
OffRL
258
7
0
29 May 2025
Composite Flow Matching for Reinforcement Learning with Shifted-Dynamics Data
Lingkai Kong
Haichuan Wang
Tonghan Wang
Guojun Xiong
Milind Tambe
OffRL
354
7
0
29 May 2025
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Ganqu Cui
Yuchen Zhang
Jiacheng Chen
Lifan Yuan
Zhi Wang
...
Lei Bai
Wanli Ouyang
Yu Cheng
Bowen Zhou
Ning Ding
LRM
266
242
0
28 May 2025
ReinFlow: Fine-tuning Flow Matching Policy with Online Reinforcement Learning
Tonghe Zhang
Chao Yu
Sichang Su
Yu Wang
611
17
0
28 May 2025
Contraction Actor-Critic: Contraction Metric-Guided Reinforcement Learning for Robust Path Tracking
Minjae Cho
Hiroyasu Tsukamoto
Huy Trong Tran
158
0
0
28 May 2025
Revisiting Multi-Agent World Modeling from a Diffusion-Inspired Perspective
Yang Zhang
Xinran Li
Jianing Ye
Delin Qu
Chongjie Zhang
Xiu Li
Chenjia Bai
366
5
0
27 May 2025
DISCOVER: Automated Curricula for Sparse-Reward Reinforcement Learning
Leander Diaz-Bone
Marco Bagatella
Jonas Hübotter
Andreas Krause
OffRL
315
5
0
26 May 2025
Decision Flow Policy Optimization
Jifeng Hu
Sili Huang
Siyuan Guo
Zhaogeng Liu
Li Shen
Lichao Sun
Hechang Chen
Yi-Ju Chang
Dacheng Tao
335
0
0
26 May 2025
Situationally-Aware Dynamics Learning
Alejandro Murillo-Gonzalez
Lantao Liu
340
0
0
26 May 2025
The challenge of hidden gifts in multi-agent reinforcement learning
Dane Malenfant
Blake A. Richards
381
0
0
26 May 2025
Token-level Accept or Reject: A Micro Alignment Approach for Large Language Models
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Y. Zhang
Yu Yu
Bo Tang
Yu Zhu
Chuxiong Sun
...
Jie Hu
Zipeng Xie
Zhiyu Li
Feiyu Xiong
Edward Chung
485
0
0
26 May 2025
Deep Actor-Critics with Tight Risk Certificates
Bahareh Tasdighi
Manuel Haussmann
Yi-Shan Wu
A. Masegosa
M. Kandemir
UQCV
377
0
0
26 May 2025
Learning to Trust Bellman Updates: Selective State-Adaptive Regularization for Offline RL
Qin-Wen Luo
Ming-Kun Xie
Ye-Wen Wang
Sheng-Jun Huang
OffRL
206
1
0
26 May 2025
Surrogate-Assisted Evolutionary Reinforcement Learning Based on Autoencoder and Hyperbolic Neural Network
Bingdong Li
Mei Jiang
Hong Qian
Shengcai Liu
W. Hong
Peng Yang
405
1
0
26 May 2025
Reduce Computational Cost In Deep Reinforcement Learning Via Randomized Policy Learning
Zhuochen Liu
Rahul Jain
Quan Nguyen
173
0
0
25 May 2025
Structured Reinforcement Learning for Combinatorial Decision-Making
Heiko Hoppe
Léo Baty
Louis Bouvier
Axel Parmentier
Maximilian Schiffer
OffRL
486
5
0
25 May 2025
Guided by Guardrails: Control Barrier Functions as Safety Instructors for Robotic Learning
Maeva Guerrier
Karthik Soma
Hassan Fouad
Giovanni Beltrame
249
1
0
24 May 2025
CiRL: Open-Source Environments for Reinforcement Learning in Circular Economy and Net Zero
Federico Zocco
Andrea Corti
Monica Malvezzi
AI4CE
351
1
0
24 May 2025
KL-regularization Itself is Differentially Private in Bandits and RLHF
Yizhou Zhang
Kishan Panaganti
Laixi Shi
Juba Ziani
Adam Wierman
234
1
0
23 May 2025
H2-COMPACT: Human-Humanoid Co-Manipulation via Adaptive Contact Trajectory Policies
Geeta Chandra Raju Bethala
Niraj Pudasaini
Abdullah Mohamed Ali
Shuaihang Yuan
Congcong Wen
Anthony Tzes
Yi Fang
302
3
0
23 May 2025
Mind the GAP! The Challenges of Scale in Pixel-based Deep Reinforcement Learning
Ghada Sokar
Pablo Samuel Castro
345
1
0
23 May 2025
Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning
Till Freihaut
Luca Viano
Volkan Cevher
Matthieu Geist
Giorgia Ramponi
281
2
0
23 May 2025
How Ensembles of Distilled Policies Improve Generalisation in Reinforcement Learning
Max Weltevrede
Moritz A. Zanger
M. Spaan
Wendelin Bohmer
OffRL, FedML
378
0
0
22 May 2025
VL-SAFE: Vision-Language Guided Safety-Aware Reinforcement Learning with World Models for Autonomous Driving
Yansong Qu
Zilin Huang
Zihao Sheng
Jiancong Chen
Sikai Chen
Samuel Labi
OffRL
244
3
0
22 May 2025
MPO: Multilingual Safety Alignment via Reward Gap Optimization
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Weixiang Zhao
Yulin Hu
Yang Deng
Tongtong Wu
Wenxuan Zhang
...
An Zhang
Yanyan Zhao
Bing Qin
Tat-Seng Chua
Ting Liu
318
7
0
22 May 2025
FlashBack: Consistency Model-Accelerated Shared Autonomy
Luzhe Sun
Jingtian Ji
Xiangshan Tan
Matthew R. Walter
483
1
0
22 May 2025
Sequential Monte Carlo for Policy Optimization in Continuous POMDPs
Hany Abdulsamad
Sahel Iqbal
Simo Särkkä
356
1
0
22 May 2025
Meta-reinforcement learning with minimum attention
Pilhwa Lee
Shashank Gupta
OffRL
306
0
0
22 May 2025
Reward-Aware Proto-Representations in Reinforcement Learning
Hon Tik Tse
Siddarth Chandrasekar
Marlos C. Machado
157
1
0
22 May 2025
A Temporal Difference Method for Stochastic Continuous Dynamics
Haruki Settai
Naoya Takeishi
Takehisa Yairi
533
0
0
21 May 2025
The Unreasonable Effectiveness of Entropy Minimization in LLM Reasoning
Shivam Agarwal
Zimin Zhang
Lifan Yuan
Jiawei Han
Yuan Yao
481
88
0
21 May 2025
Learning-based Autonomous Oversteer Control and Collision Avoidance
Seokjun Lee
Seung-Hyun Kong
160
0
0
21 May 2025
Trajectory Bellman Residual Minimization: A Simple Value-Based Method for LLM Reasoning
Yurun Yuan
Fan Chen
Zeyu Jia
Alexander Rakhlin
Tengyang Xie
OffRL
369
1
0
21 May 2025
AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Momentum
Jian Xiong
Jingbo Zhou
Jingyong Ye
Qiang Huang
Dejing Dou
LRM
346
1
0
20 May 2025
Time Reversal Symmetry for Efficient Robotic Manipulations in Deep Reinforcement Learning
Yunpeng Jiang
Jianshu Hu
Paul Weng
Yutong Ban
233
0
0
20 May 2025