Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown
Reinforcement Learning for Charging Optimization of Inhomogeneous Dicke Quantum Batteries
Xiaobin Song
Siyuan Bai
Da-Wei Wang
Hanxiao Tao
Xizhe Wang
Rebing Wu
Benben Jiang
40
0
0
15 Nov 2025
From Exploration to Exploitation: A Two-Stage Entropy RLVR Approach for Noise-Tolerant MLLM Training
Donglai Xu
Hongzheng Yang
Yuzhi Zhao
Pingping Zhang
Jinpeng Chen
...
Xiaolei Li
Senkang Hu
Ziyi Guan
Jason Chun Lok Li
L. Po
142
0
0
11 Nov 2025
PrefPoE: Advantage-Guided Preference Fusion for Learning Where to Explore
Zhihao Lin
Lin Wu
Zhen Tian
Jianglin Lan
125
0
0
11 Nov 2025
Dynamic Sparsity: Challenging Common Sparsity Assumptions for Learning World Models in Robotic Reinforcement Learning Benchmarks
Muthukumar Pandaram
Jakob J. Hollenstein
David Drexel
Samuele Tosatto
A. Rodríguez-Sánchez
J. Piater
CML
200
0
0
11 Nov 2025
SafeMIL: Learning Offline Safe Imitation Policy from Non-Preferred Trajectories
Returaj Burnwal
Nirav P. Bhatt
Balaraman Ravindran
OffRL
370
0
0
11 Nov 2025
LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration
Applied Soft Computing (ASC), 2017
Ruiyu Qiu
Rui Wang
Guanghui Yang
Xiang Li
Zhijiang Shao
132
0
0
11 Nov 2025
Multistep Quasimetric Learning for Scalable Goal-conditioned Reinforcement Learning
Bill Chunyuan Zheng
Vivek Myers
Benjamin Eysenbach
Sergey Levine
OffRL
199
0
0
11 Nov 2025
PADiff: Predictive and Adaptive Diffusion Policies for Ad Hoc Teamwork
Hohei Chan
Xinzhi Zhang
Antao Xiang
Weinan Zhang
Mengchen Zhao
104
0
0
10 Nov 2025
Secure Low-altitude Maritime Communications via Intelligent Jamming
Science China Information Sciences (Sci. China Inf. Sci.), 2025
Jiawei Huang
Aimin Wang
Geng Sun
Jiahui Li
Jiacheng Wang
Weijie Yuan
Dusit Niyato
Xianbin Wang
110
0
0
10 Nov 2025
Controllable Flow Matching for Online Reinforcement Learning
Bin Wang
Boxiang Tao
Haifeng Jing
Hongbo Dou
Zijian Wang
144
1
0
10 Nov 2025
Q-RAG: Long Context Multi-step Retrieval via Value-based Embedder Training
A. Sorokin
N. Buzun
Alexander Anokhin
Oleg Inozemcev
Egor Vedernikov
Petr Anokhin
Mikhail Burtsev
Trushkov Alexey
Yin Wenshuai
Evgeny Burnaev
RALM
180
0
0
10 Nov 2025
Rapidly Learning Soft Robot Control via Implicit Time-Stepping
Andrew Choi
Dezhong Tong
73
0
0
10 Nov 2025
What Makes Reasoning Invalid: Echo Reflection Mitigation for Large Language Models
Chen He
Xun Jiang
Lei Wang
Hao-ran Yang
Chong Peng
Peng Yan
Fumin Shen
Xing Xu
LRM
237
0
0
09 Nov 2025
Guardian-regularized Safe Offline Reinforcement Learning for Smart Weaning of Mechanical Circulatory Devices
Aysin Tumay
S. Sun
Sonia Fereidooni
Aaron Dumas
Elise Jortberg
Rose Yu
OffRL
156
0
0
08 Nov 2025
Gentle Manipulation Policy Learning via Demonstrations from VLM Planned Atomic Skills
Jiayu Zhou
Qiwei Wu
Jian Li
Z. Chen
Xiaogang Xiong
Renjing Xu
OffRL, LM&Ro
487
0
0
08 Nov 2025
Towards Personalized Quantum Federated Learning for Anomaly Detection
IEEE Transactions on Network Science and Engineering (IEEE TNS&E), 2025
Ratun Rahman
Sina Shaham
Dinh C. Nguyen
164
1
0
08 Nov 2025
SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning
T. Huang
Armin Lederer
Dai-Jie Wu
X. Dai
Sihua Zhang
Stefan Sosnowski
Shao-Hua Sun
Sandra Hirche
189
1
0
07 Nov 2025
Multi-agent Coordination via Flow Matching
Dongsu Lee
Daehee Lee
Amy Zhang
130
0
0
07 Nov 2025
On Flow Matching KL Divergence
Maojiang Su
Jerry Yao-Chieh Hu
Sophia Pi
Han Liu
330
0
0
07 Nov 2025
Blind Inverse Game Theory: Jointly Decoding Rewards and Rationality in Entropy-Regularized Competitive Games
Hamza Virk
Sandro Amaglobeli
Zuhayr Syed
100
0
0
07 Nov 2025
Distributionally Robust Self Paced Curriculum Reinforcement Learning
Anirudh Satheesh
Keenan Powell
Vaneet Aggarwal
OOD, OffRL
497
0
0
07 Nov 2025
ReGen: Generative Robot Simulation via Inverse Design
International Conference on Learning Representations (ICLR), 2025
Phat Nguyen
Tsun-Hsuan Wang
Zhang-Wei Hong
Erfan Aasi
Andrew Silva
Guy Rosman
S. Karaman
Daniela Rus
AI4CE
172
3
0
06 Nov 2025
Can Context Bridge the Reality Gap? Sim-to-Real Transfer of Context-Aware Policies
M. Iannotta
Yuxuan Yang
J. A. Stork
Erik Schaffernicht
Todor Stoyanov
OffRL
142
1
0
06 Nov 2025
Environment Agnostic Goal-Conditioning, A Study of Reward-Free Autonomous Learning
Hampus Åström
Elin Anna Topp
Jacek Malec
OffRL
134
0
0
06 Nov 2025
Periodic Skill Discovery
Jonghae Park
Daesol Cho
Jusuk Lee
D. Shim
Inkyu Jang
H. J. Kim
334
0
0
05 Nov 2025
Optimizing Multi-Lane Intersection Performance in Mixed Autonomy Environments
Manonmani Sekar
Nasim Nezamoddini
189
0
0
04 Nov 2025
Natural-gas storage modelling by deep reinforcement learning
Tiziano Balaconi
Aldo Glielmo
Marco Taboga
78
0
0
04 Nov 2025
Automated Reward Design for Gran Turismo
Michel Ma
Takuma Seno
K. Subramanian
Peter R. Wurman
Peter Stone
Craig Sherstan
209
1
0
03 Nov 2025
Learning Intractable Multimodal Policies with Reparameterization and Diversity Regularization
Ziqi Wang
Jiashun Liu
L. Pan
234
0
0
03 Nov 2025
Clustering-Based Weight Orthogonalization for Stabilizing Deep Reinforcement Learning
IEEE International Joint Conference on Neural Networks (IJCNN), 2025
Guoqing Ma
Y. Zhang
Yuming Dai
Guangfu Hao
Yang Chen
S. Yu
OffRL
130
0
0
02 Nov 2025
SLAP: Shortcut Learning for Abstract Planning
Yaoyao Liu
Bowen Li
Benjamin Eysenbach
Tom Silver
OffRL
129
1
0
02 Nov 2025
Bootstrap Off-policy with World Model
Guojian Zhan
Likun Wang
Xiangteng Zhang
Jiaxin Gao
Masayoshi Tomizuka
Shengbo Eben Li
OffRL, OnRL
418
0
0
01 Nov 2025
Learning Soft Robotic Dynamics with Active Exploration
Hehui Zheng
Bhavya Sukhija
Chenhao Li
Klemens Iten
Andreas Krause
Robert K. Katzschmann
141
0
0
31 Oct 2025
Asynchronous Risk-Aware Multi-Agent Packet Routing for Ultra-Dense LEO Satellite Networks
Ke He
T. Vu
Le He
Lisheng Fan
Symeon Chatzinotas
Björn E. Ottersten
100
0
0
31 Oct 2025
Morphology-Aware Graph Reinforcement Learning for Tensegrity Robot Locomotion
Chi Zhang
Mingrui Li
W. Tong
X. Y. Huang
AI4CE
104
0
0
30 Oct 2025
Towards Reinforcement Learning Based Log Loading Automation
Ilya Kurinov
Miroslav Ivanov
Grzegorz Orzechowski
A. Mikkola
77
0
0
30 Oct 2025
SpikeATac: A Multimodal Tactile Finger with Taxelized Dynamic Sensing for Dexterous Manipulation
Eric T. Chang
Peter Ballentine
Zhanpeng He
Do-Gon Kim
Kai Jiang
...
Joaquin Palacios
William Wang
Pedro Piacenza
Ioannis Kymissis
M. Ciocarlie
212
0
0
30 Oct 2025
Real-DRL: Teach and Learn in Reality
Y. Mao
Yihao Cai
L. Sha
OffRL
135
0
0
30 Oct 2025
Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning
Federica Tonti
Ricardo Vinuesa
81
0
0
29 Oct 2025
Sim-to-Real Gentle Manipulation of Deformable and Fragile Objects with Stress-Guided Reinforcement Learning
Kei Ikemura
Yifei Dong
David Blanco-Mulero
Alberta Longhini
Li Chen
Florian T. Pokorny
OffRL
134
0
0
29 Oct 2025
Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning
Sagalpreet Singh
Rishi Saket
A. Raghuveer
115
0
0
29 Oct 2025
Off-policy Reinforcement Learning with Model-based Exploration Augmentation
Likun Wang
Xiangteng Zhang
Yinuo Wang
Guojian Zhan
Wenxuan Wang
Haoyu Gao
Jingliang Duan
Shengbo Eben Li
OffRL
173
0
0
29 Oct 2025
Sample-efficient and Scalable Exploration in Continuous-Time RL
Klemens Iten
Lenart Treven
Bhavya Sukhija
Florian Dorfler
Andreas Krause
OffRL
140
1
0
28 Oct 2025
Survey and Tutorial of Reinforcement Learning Methods in Process Systems Engineering
Maximilian Bloor
M. Mowbray
Ehecatl Antonio del Rio Chanona
Calvin Tsay
OffRL
132
0
0
28 Oct 2025
Multi-Agent Conditional Diffusion Model with Mean Field Communication as Wireless Resource Allocation Planner
Kechen Meng
Sinuo Zhang
Rongpeng Li
Xiangming Meng
Chan Wang
Chan Wang
Zhifeng Zhao
Zhifeng Zhao
DiffM
160
0
0
27 Oct 2025
Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach
Alessandro Sestini
Joakim Bergdahl
Jean-Philippe Barrette-LaPierre
Florian Fuchs
Brady Chen
Michael Jones
Linus Gisslén
170
0
0
27 Oct 2025
TARC: Time-Adaptive Robotic Control
Arnav Sukhija
Lenart Treven
Jin Cheng
Florian Dorfler
Stelian Coros
Andreas Krause
113
0
0
27 Oct 2025
FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning
Shan Zhong
Shutong Ding
He Diao
Xiangyu Wang
Kah Chan Teh
Bei Peng
OffRL
127
0
0
26 Oct 2025
Mind Your Entropy: From Maximum Entropy to Trajectory Entropy-Constrained RL
Guojian Zhan
Likun Wang
Pengcheng Wang
Feihong Zhang
Jingliang Duan
Masayoshi Tomizuka
Shengbo Eben Li
78
0
0
25 Oct 2025
STAR-RIS-assisted Collaborative Beamforming for Low-altitude Wireless Networks
Xinyue Liang
Hui Kang
Junwei Che
Jiahui Li
Geng Sun
Qingqing Wu
Jiacheng Wang
Dusit Niyato
67
0
0
25 Oct 2025