ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1702.08165
  4. Cited By
Reinforcement Learning with Deep Energy-Based Policies

Reinforcement Learning with Deep Energy-Based Policies

27 February 2017
Tuomas Haarnoja
Haoran Tang
Pieter Abbeel
Sergey Levine
ArXivPDFHTML

Papers citing "Reinforcement Learning with Deep Energy-Based Policies"

50 / 242 papers shown
Title
Preference Optimization for Combinatorial Optimization Problems
Preference Optimization for Combinatorial Optimization Problems
Mingjun Pan
Guanquan Lin
You-Wei Luo
Bin Zhu
Zhien Dai
Lijun Sun
Chun Yuan
23
0
0
13 May 2025
RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation
RLMiniStyler: Light-weight RL Style Agent for Arbitrary Sequential Neural Style Generation
Jing Hu
Chengming Feng
Shu Hu
Ming-Ching Chang
Xin Li
Xi Wu
Xin Wang
36
0
0
07 May 2025
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Towards Efficient Online Tuning of VLM Agents via Counterfactual Soft Reinforcement Learning
Lang Feng
Weihao Tan
Zhiyi Lyu
Longtao Zheng
Haiyang Xu
M. Yan
Fei Huang
Jingyi Wang
29
0
0
01 May 2025
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization
Emiliano Penaloza
Tianyue H. Zhan
Laurent Charlin
Mateo Espinosa Zarlenga
51
0
0
25 Apr 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
31
0
0
21 Apr 2025
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Taiwei Shi
Yiyang Wu
Linxin Song
Tianyi Zhou
Jieyu Zhao
LRM
78
1
0
07 Apr 2025
Maximum Entropy Reinforcement Learning with Diffusion Policy
Maximum Entropy Reinforcement Learning with Diffusion Policy
Xiaoyi Dong
Jian Cheng
Xiaotian Zhang
46
0
0
17 Feb 2025
Mirror Descent Actor Critic via Bounded Advantage Learning
Mirror Descent Actor Critic via Bounded Advantage Learning
Ryo Iwaki
93
0
0
06 Feb 2025
Regularized Langevin Dynamics for Combinatorial Optimization
Regularized Langevin Dynamics for Combinatorial Optimization
Shengyu Feng
Yiming Yang
73
0
0
01 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao
Sen Zhang
Liang Ding
Yuqi Zhang
L. Zhang
Dacheng Tao
81
3
0
31 Jan 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
54
4
0
29 Jan 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
76
0
0
28 Jan 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
41
16
0
28 Jan 2025
On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration
On Generalization and Distributional Update for Mimicking Observations with Adequate Exploration
Yirui Zhou
Xiaowei Liu
Xiaofeng Zhang
Yangchun Zhang
37
0
0
22 Jan 2025
Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors
Inverse Reinforcement Learning with Switching Rewards and History Dependency for Characterizing Animal Behaviors
Jingyang Ke
Feiyang Wu
Jiyi Wang
Jeffrey Markowitz
Anqi Wu
82
0
0
22 Jan 2025
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing
Vernon Luk
Jean Oh
84
0
0
16 Dec 2024
Robust Contact-rich Manipulation through Implicit Motor Adaptation
Robust Contact-rich Manipulation through Implicit Motor Adaptation
Teng Xue
Amirreza Razmjoo
Suhan Shetty
Sylvain Calinon
102
1
0
16 Dec 2024
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets
Zhen Liu
Tim Z. Xiao
Weiyang Liu
Yoshua Bengio
Dinghuai Zhang
123
2
0
10 Dec 2024
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Sharp Analysis for KL-Regularized Contextual Bandits and RLHF
Heyang Zhao
Chenlu Ye
Quanquan Gu
Tong Zhang
OffRL
57
3
0
07 Nov 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
67
5
0
22 Oct 2024
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Optimizing Backward Policies in GFlowNets via Trajectory Likelihood Maximization
Timofei Gritsaev
Nikita Morozov
S. Samsonov
D. Tiapkin
18
0
0
20 Oct 2024
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Web Agents with World Models: Learning and Leveraging Environment Dynamics in Web Navigation
Hyungjoo Chae
Namyoung Kim
Kai Tzu-iunn Ong
Minju Gwak
Gwanwoo Song
Jihoon Kim
S. Kim
Dongha Lee
Jinyoung Yeo
LLMAG
33
14
0
17 Oct 2024
Reward-free World Models for Online Imitation Learning
Reward-free World Models for Online Imitation Learning
Shangzhe Li
Zhiao Huang
H. Su
OffRL
65
1
0
17 Oct 2024
SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning
SHIRE: Enhancing Sample Efficiency using Human Intuition in REinforcement Learning
Amogh Joshi
Adarsh Kosta
Kaushik Roy
OffRL
42
2
0
16 Sep 2024
Hard Prompts Made Interpretable: Sparse Entropy Regularization for
  Prompt Tuning with RL
Hard Prompts Made Interpretable: Sparse Entropy Regularization for Prompt Tuning with RL
Yunseon Choi
Sangmin Bae
Seonghyun Ban
Minchan Jeong
Chuheng Zhang
Lei Song
Li Zhao
Jiang Bian
Kee-Eung Kim
VLM
AAML
36
3
0
20 Jul 2024
The Impact of Quantization and Pruning on Deep Reinforcement Learning
  Models
The Impact of Quantization and Pruning on Deep Reinforcement Learning Models
Heng Lu
Mehdi Alemi
Reza Rawassizadeh
34
1
0
05 Jul 2024
Simplifying Deep Temporal Difference Learning
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
62
14
0
05 Jul 2024
Residual-MPPI: Online Policy Customization for Continuous Control
Residual-MPPI: Online Policy Customization for Continuous Control
Pengcheng Wang
Chenran Li
Catherine Weaver
Kenta Kawamoto
M. Tomizuka
Chen Tang
Wei Zhan
OffRL
34
3
0
01 Jul 2024
Decoupling regularization from the action space
Decoupling regularization from the action space
Sobhan Mohammadpour
Emma Frejinger
Pierre-Luc Bacon
31
0
0
10 Jun 2024
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL
Qi Lv
Xiang Deng
Gongwei Chen
Michael Yu Wang
Liqiang Nie
75
7
0
08 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
43
2
0
30 May 2024
Offline Regularised Reinforcement Learning for Large Language Models
  Alignment
Offline Regularised Reinforcement Learning for Large Language Models Alignment
Pierre Harvey Richemond
Yunhao Tang
Daniel Guo
Daniele Calandriello
M. G. Azar
...
Gil Shamir
Rishabh Joshi
Tianqi Liu
Rémi Munos
Bilal Piot
OffRL
46
22
0
29 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
63
12
0
28 May 2024
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Exclusively Penalized Q-learning for Offline Reinforcement Learning
Junghyuk Yeom
Yonghyeon Jo
Jungmo Kim
Sanghyeon Lee
Seungyul Han
OffRL
40
2
0
23 May 2024
S$^2$AC: Energy-Based Reinforcement Learning with Stein Soft Actor
  Critic
S2^22AC: Energy-Based Reinforcement Learning with Stein Soft Actor Critic
Safa Messaoud
Billel Mokeddem
Zhenghai Xue
L. Pang
Bo An
Haipeng Chen
Sanjay Chawla
41
3
0
02 May 2024
Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter
  Lesson of Reinforcement Learning
Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning
Michal Nauman
Michal Bortkiewicz
Piotr Milo's
Tomasz Trzciñski
M. Ostaszewski
Marek Cygan
OffRL
27
17
0
01 Mar 2024
Blending Data-Driven Priors in Dynamic Games
Blending Data-Driven Priors in Dynamic Games
Justin Lidard
Haimin Hu
Asher Hancock
Zixu Zhang
Albert Gimó Contreras
...
Deepak Gopinath
Guy Rosman
Naomi Ehrich Leonard
María Santos
J. F. Fisac
OffRL
40
5
0
21 Feb 2024
Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
Inverse Reinforcement Learning by Estimating Expertise of Demonstrators
M. Beliaev
Ramtin Pedarsani
35
2
0
02 Feb 2024
Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator
Extrinsicaly Rewarded Soft Q Imitation Learning with Discriminator
Ryoma Furuyama
Daiki Kuyoshi
Satoshi Yamane
18
0
0
30 Jan 2024
Adaptive trajectory-constrained exploration strategy for deep
  reinforcement learning
Adaptive trajectory-constrained exploration strategy for deep reinforcement learning
Guojian Wang
Faguo Wu
Xiao Zhang
Ning Guo
Zhiming Zheng
28
3
0
27 Dec 2023
GAD-PVI: A General Accelerated Dynamic-Weight Particle-Based Variational
  Inference Framework
GAD-PVI: A General Accelerated Dynamic-Weight Particle-Based Variational Inference Framework
Fangyikang Wang
Huminhao Zhu
Chao Zhang
Han Zhao
Hui Qian
24
5
0
27 Dec 2023
Digital Twin-Enhanced Deep Reinforcement Learning for Resource
  Management in Networks Slicing
Digital Twin-Enhanced Deep Reinforcement Learning for Resource Management in Networks Slicing
Zhengming Zhang
Yongming Huang
Cheng Zhang
Qingbi Zheng
Luxi Yang
Xiaohu You
24
12
0
28 Nov 2023
Rule-Based Lloyd Algorithm for Multi-Robot Motion Planning and Control with Safety and Convergence Guarantees
Rule-Based Lloyd Algorithm for Multi-Robot Motion Planning and Control with Safety and Convergence Guarantees
Manuel Boldrer
Álvaro Serra-Gómez
Lorenzo Lyons
Vít Krátký
Javier Alonso-Mora
Laura Ferranti
81
4
0
30 Oct 2023
Bridging the Gap between Newton-Raphson Method and Regularized Policy
  Iteration
Bridging the Gap between Newton-Raphson Method and Regularized Policy Iteration
Zeyang Li
Chuxiong Hu
Yunan Wang
Guojian Zhan
Jie Li
Shengbo Eben Li
21
0
0
11 Oct 2023
Reinforcement Learning in the Era of LLMs: What is Essential? What is
  needed? An RL Perspective on RLHF, Prompting, and Beyond
Reinforcement Learning in the Era of LLMs: What is Essential? What is needed? An RL Perspective on RLHF, Prompting, and Beyond
Hao Sun
OffRL
34
21
0
09 Oct 2023
Increasing Entropy to Boost Policy Gradient Performance on
  Personalization Tasks
Increasing Entropy to Boost Policy Gradient Performance on Personalization Tasks
Andrew Starnes
Anton Dereventsov
Clayton Webster
24
0
0
09 Oct 2023
Amortizing intractable inference in large language models
Amortizing intractable inference in large language models
Marvin Schmitt
Moksh Jain
Daniel Habermann
Younesse Kaddar
Ullrich Kothe
Stefan T. Radev
Nikolay Malkin
AIFin
BDL
29
46
0
06 Oct 2023
A General Offline Reinforcement Learning Framework for Interactive
  Recommendation
A General Offline Reinforcement Learning Framework for Interactive Recommendation
Teng Xiao
Donglin Wang
OffRL
32
73
0
01 Oct 2023
AdaptNet: Policy Adaptation for Physics-Based Character Control
AdaptNet: Policy Adaptation for Physics-Based Character Control
Pei Xu
Kaixiang Xie
Sheldon Andrews
P. Kry
Michael Neff
Morgan McGuire
Ioannis Karamouzas
Victor Zordan
TTA
37
16
0
30 Sep 2023
Recent Advances in Path Integral Control for Trajectory Optimization: An
  Overview in Theoretical and Algorithmic Perspectives
Recent Advances in Path Integral Control for Trajectory Optimization: An Overview in Theoretical and Algorithmic Perspectives
Muhammad Kazim
JunGee Hong
Min-Gyeom Kim
Kwang-Ki K. Kim
37
16
0
22 Sep 2023
12345
Next