Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

4 January 2018
Tuomas Haarnoja
Aurick Zhou
Pieter Abbeel
Sergey Levine

Papers citing "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor"

50 / 4,552 papers shown
Multi-parameter Control for the $(1+(λ,λ))$-GA on OneMax via Deep Reinforcement Learning
Foundations of Genetic Algorithms (FOGA), 2025
Tai Nguyen
Phong Le
Carola Doerr
Nguyen Dang
388
0
0
19 May 2025
TD-GRPC: Temporal Difference Learning with Group Relative Policy Constraint for Humanoid Locomotion
Khang Nguyen
Khai Nguyen
An T. Le
Jan Peters
Manfred Huber
Ngo Anh Vien
Minh Nhat Vu
260
2
0
19 May 2025
Zero-Shot Adaptation of Behavioral Foundation Models to Unseen Dynamics
Maksim Bobrin
Ilya Zisman
Alexander Nikulin
Vladislav Kurenkov
Dmitry V. Dylov
OffRL
234
3
0
19 May 2025
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning
Jiayu Chen
Aravind Venugopal
Shiyu Huang
Jeff Schneider
OffRL
352
0
0
19 May 2025
Multi-CALF: A Policy Combination Approach with Statistical Guarantees
Georgiy Malaniya
Anton Bolychev
Grigory Yaremenko
Anastasia Krasnaya
Pavel Osinenko
235
0
0
18 May 2025
Of Mice and Machines: A Comparison of Learning Between Real World Mice and RL Agents
Shuo Han
German Espinosa
Junda Huang
D. Dombeck
Malcolm A. MacIver
Bradly C. Stadie
456
2
0
18 May 2025
Bench-NPIN: Benchmarking Non-prehensile Interactive Navigation
Ninghan Zhong
Steven Caro
Avraiem Iskandar
Megnath Ramesh
Stephen L. Smith
233
1
0
17 May 2025
Q-Policy: Quantum-Enhanced Policy Evaluation for Scalable Reinforcement Learning
Kalyan Cherukuri
Aarav Lala
Yash Yardi
204
1
0
17 May 2025
SAINT: Attention-Based Policies for Discrete Combinatorial Action Spaces
Matthew Landers
Taylor W. Killian
Thomas Hartvigsen
Afsaneh Doryab
225
0
0
17 May 2025
Exploration by Random Distribution Distillation
Zhirui Fang
Kai Yang
Jian Tao
Jiafei Lyu
Lusong Li
Li Shen
Xiu Li
331
1
0
16 May 2025
Deep Symbolic Optimization: Reinforcement Learning for Symbolic Mathematics
Conor F. Hayes
Felipe Leno Da Silva
Jiachen Yang
T. Nathan Mundhenk
Chak Shing Lee
...
Ahmet Can Solak
Thomas Desautels
Daniel Faissol
Brenden K. Petersen
Mikel Landajuela
291
1
0
16 May 2025
Tool-Aided Evolutionary LLM for Generative Policy Toward Efficient Resource Management in Wireless Federated Learning
Chongyang Tan
Ruoqi Wen
Rongpeng Li
Zhifeng Zhao
Ekram Hossain
Honggang Zhang
369
1
0
16 May 2025
Meta-World+: An Improved, Standardized, RL Benchmark
Reginald McLean
Evangelos Chatzaroulas
Luc McCutcheon
Frank Röder
Tianhe Yu
...
Ryan Julian
Jordan Terry
Isaac Woungang
Nariman Farsad
Pablo Samuel Castro
OffRL
269
13
0
16 May 2025
ReaCritic: Large Reasoning Transformer-based DRL Critic-model Scaling For Heterogeneous Networks
Feiran You
Hongyang Du
OffRL
LRM
239
6
0
16 May 2025
ReWiND: Language-Guided Rewards Teach Robot Policies without New Demonstrations
Jiahui Zhang
Yusen Luo
Abrar Anwar
Sumedh Anand Sontakke
Joseph J Lim
Jesse Thomason
Erdem Biyik
Jesse Zhang
OffRL
LM&Ro
426
19
0
16 May 2025
Zero-Shot Visual Generalization in Robot Manipulation
Sumeet Batra
Gaurav Sukhatme
231
3
0
16 May 2025
Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition
Bo Yue
Shuqi Guo
Kaiyu Hu
Chujiao Wang
Benyou Wang
Kui Jia
Guiliang Liu
LRM
304
1
0
16 May 2025
Bi-Level Policy Optimization with Nyström Hypergradients
Arjun Prakash
Naicheng He
Denizalp Goktas
Amy Greenwald
244
0
0
16 May 2025
Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
LRM
162
8
0
16 May 2025
Certifying Stability of Reinforcement Learning Policies using Generalized Lyapunov Functions
Kehan Long
Jorge Cortés
Nikolay Atanasov
447
2
0
16 May 2025
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
Haoxiang You
Yilang Liu
Ian Abraham
399
0
0
15 May 2025
Knowledge capture, adaptation and composition (KCAC): A framework for cross-task curriculum learning in robotic manipulation
Xinrui Wang
Yan Jin
342
0
0
15 May 2025
Modular Robot Control with Motor Primitives
Moses C. Nah
Johannes Lachner
Neville Hogan
325
2
0
15 May 2025
Approximated Behavioral Metric-based State Projection for Federated Reinforcement Learning
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Zengxia Guo
Bohui An
Zhongqi Lu
FedML
257
0
0
15 May 2025
Preserving Plasticity in Continual Learning with Adaptive Linearity Injection
Seyed Roozbeh Razavi Rohani
Khashayar Khajavi
Wesley Chung
Mo Chen
Sharan Vaswani
CLL
AI4CE
213
1
0
14 May 2025
General Dynamic Goal Recognition using Goal-Conditioned and Meta Reinforcement Learning
Osher Elhadad
Reuth Mirsky
AI4CE
183
2
0
14 May 2025
Learning Like Humans: Advancing LLM Reasoning Capabilities via Adaptive Difficulty Curriculum Learning and Expert-Guided Self-Reformulation
Enci Zhang
Xingang Yan
Wei Lin
Tianxiang Zhang
Qianchun Lu
LRM
346
5
0
13 May 2025
Continuous World Coverage Path Planning for Fixed-Wing UAVs using Deep Reinforcement Learning
Mirco Theile
Andres R. Zapata Rodriguez
Marco Caccamo
Alberto L. Sangiovanni-Vincentelli
257
1
0
13 May 2025
Modeling Unseen Environments with Language-guided Composable Causal Components in Reinforcement Learning
International Conference on Learning Representations (ICLR), 2025
Xinyue Wang
Zhen Zhang
OffRL
CML
252
0
0
13 May 2025
Adaptive Diffusion Policy Optimization for Robotic Manipulation
Huiyun Jiang
Zhuang Yang
334
0
0
13 May 2025
LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
Yuhang Huang
Jiazhao Zhang
Shilong Zou
Xinwang Liu
Ruizhen Hu
Kai Xu
539
7
0
13 May 2025
Generalization in Monitored Markov Decision Processes (Mon-MDPs)
Montaser Mohammedalamen
Michael Bowling
299
0
0
13 May 2025
Imagine, Verify, Execute: Memory-guided Agentic Exploration with Vision-Language Models
Seungjae Lee
Daniel Ekpo
Haowen Liu
Furong Huang
Abhinav Shrivastava
Jia-Bin Huang
LM&Ro
712
0
0
12 May 2025
Drive Fast, Learn Faster: On-Board RL for High Performance Autonomous Racing
Benedict Hildisch
Edoardo Ghignone
Nicolas Baumann
Cheng Hu
Andrea Carron
Michele Magno
242
0
0
12 May 2025
Combining Bayesian Inference and Reinforcement Learning for Agent Decision Making: A Review
Chengmin Zhou
Ville Kyrki
Pasi Fränti
Laura Ruotsalainen
BDL
AI4CE
431
0
0
12 May 2025
Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains
Ibne Farabi Shihab
Sanjeda Akter
Anuj Sharma
BDL
227
1
0
12 May 2025
A Reinforcement Learning Framework for Application-Specific TCP Congestion-Control
Jinming Xing
Muhammad Shahzad
225
2
0
11 May 2025
TREND: Tri-teaching for Robust Preference-based Reinforcement Learning with Demonstrations
IEEE International Conference on Robotics and Automation (ICRA), 2025
Shuaiyi Huang
Mara Levy
Anubhav Gupta
Daniel Ekpo
Ruijie Zheng
Abhinav Shrivastava
271
5
0
09 May 2025
Apple: Toward General Active Perception via Reinforcement Learning
Tim Schneider
Cristiana de Farias
Roberto Calandra
Lawrence Yunliang Chen
Jan Peters
1.0K
2
0
09 May 2025
DAPPER: Discriminability-Aware Policy-to-Policy Preference-Based Reinforcement Learning for Query-Efficient Robot Skill Acquisition
Yuki Kadokawa
Jonas Frey
Takahiro Miki
Takamitsu Matsubara
Marco Hutter
201
0
0
09 May 2025
ReactDance: Hierarchical Representation for High-Fidelity and Coherent Long-Form Reactive Dance Generation
Jingzhong Lin
Xinru Li
Yuanyuan Qi
Hao Wu
Wenxiang Liu
...
Xuejiao Wang
Xiangfeng Xu
Bangyan Li
Changbo Wang
Gaoqi He
238
0
0
08 May 2025
A critical assessment of reinforcement learning methods for microswimmer navigation in complex flows
Selim Mecanna
Aurore Loisy
Christophe Eloy
253
1
0
08 May 2025
Taming OOD Actions for Offline Reinforcement Learning: An Advantage-Based Approach
Xuyang Chen
Keyu Yan
Wenhan Cao
Tianyuan Chen
OffRL
489
2
0
08 May 2025
A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance
Axel Friedrich Wolter
Tobias Sutter
OffRL
244
0
0
07 May 2025
Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
Abdulaziz Almuzairee
Rohan Patil
Dwait Bhatt
Henrik I. Christensen
374
1
0
07 May 2025
Optimization of Infectious Disease Intervention Measures Based on Reinforcement Learning - Empirical analysis based on UK COVID-19 epidemic data
Baida Zhang
Yakai Chen
Huichun Li
Zhenghu Zu
467
0
0
07 May 2025
Joint Resource Management for Energy-efficient UAV-assisted SWIPT-MEC: A Deep Reinforcement Learning Approach
IEEE Internet of Things Journal (IEEE IoT J.), 2025
Yue Chen
Hui Kang
Jiahui Li
Geng Sun
Boxiong Wang
Jiacheng Wang
Cong Liang
Shuang Liang
Dusit Niyato
506
7
0
06 May 2025
Policy-labeled Preference Learning: Is Preference Enough for RLHF?
Taehyun Cho
Seokhun Ju
Seungyub Han
Dohyeong Kim
Kyungjae Lee
Jungwoo Lee
OffRL
435
0
0
06 May 2025
Decentralized Distributed Proximal Policy Optimization (DD-PPO) for High Performance Computing Scheduling on Multi-User Systems
Matthew Sgambati
Aleksandar Vakanski
Matthew Anderson
156
1
0
06 May 2025
Zero-shot Sim2Real Transfer for Magnet-Based Tactile Sensor on Insertion Tasks
Beining Han
Abhishek Joshi
Gaowen Liu
379
0
0
05 May 2025