ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1606.02647
  4. Cited By
Safe and Efficient Off-Policy Reinforcement Learning

Safe and Efficient Off-Policy Reinforcement Learning

8 June 2016
Rémi Munos
T. Stepleton
Anna Harutyunyan
Marc G. Bellemare
    OffRL
ArXivPDFHTML

Papers citing "Safe and Efficient Off-Policy Reinforcement Learning"

50 / 155 papers shown
Title
Automatic Reward Shaping from Confounded Offline Data
Automatic Reward Shaping from Confounded Offline Data
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
OffRL
OnRL
39
0
0
16 May 2025
ShiQ: Bringing back Bellman to LLMs
ShiQ: Bringing back Bellman to LLMs
Pierre Clavier
Nathan Grinsztajn
Raphaël Avalos
Yannis Flet-Berliac
Irem Ergun
...
Eugene Tarassov
Olivier Pietquin
Pierre Harvey Richemond
Florian Strub
Matthieu Geist
OffRL
14
0
0
16 May 2025
Trust-Region Twisted Policy Improvement
Trust-Region Twisted Policy Improvement
Joery A. de Vries
Jinke He
Yaniv Oren
M. Spaan
OffRL
LRM
40
0
0
08 Apr 2025
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
4
0
18 Mar 2025
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
DistRL: An Asynchronous Distributed Reinforcement Learning Framework for On-Device Control Agents
Taiyi Wang
Zhihao Wu
Jianheng Liu
Jianye Hao
Jun Wang
Kun Shao
OffRL
46
14
0
24 Feb 2025
Divergence-Augmented Policy Optimization
Qing Wang
Yingru Li
Jiechao Xiong
Tong Zhang
OffRL
52
16
0
28 Jan 2025
GraCo -- A Graph Composer for Integrated Circuits
GraCo -- A Graph Composer for Integrated Circuits
Stefan Uhlich
Andrea Bonetti
Arun Venkitaraman
Ali Momeni
Ryoga Matsuo
Chia-Yu Hsieh
Eisaku Ohbuchi
Lorenzo Servadei
GNN
97
0
0
21 Nov 2024
Compatible Gradient Approximations for Actor-Critic Algorithms
Compatible Gradient Approximations for Actor-Critic Algorithms
Baturay Saglam
Dionysis Kalogerias
39
0
0
02 Sep 2024
Simplifying Deep Temporal Difference Learning
Simplifying Deep Temporal Difference Learning
Matteo Gallici
Mattie Fellows
Benjamin Ellis
B. Pou
Ivan Masmitja
Jakob Foerster
Mario Martin
OffRL
62
17
0
05 Jul 2024
Towards Robust Policy: Enhancing Offline Reinforcement Learning with
  Adversarial Attacks and Defenses
Towards Robust Policy: Enhancing Offline Reinforcement Learning with Adversarial Attacks and Defenses
Thanh Nguyen
Tung M. Luu
Tri Ton
Chang D. Yoo
OffRL
AAML
39
0
0
18 May 2024
Shared learning of powertrain control policies for vehicle fleets
Shared learning of powertrain control policies for vehicle fleets
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
35
0
0
27 Apr 2024
Off-policy Distributional Q($λ$): Distributional RL without
  Importance Sampling
Off-policy Distributional Q(λλλ): Distributional RL without Importance Sampling
Yunhao Tang
Mark Rowland
Rémi Munos
Bernardo Avila-Pires
Will Dabney
OffRL
17
1
0
08 Feb 2024
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Tight Finite Time Bounds of Two-Time-Scale Linear Stochastic Approximation with Markovian Noise
Shaan ul Haque
S. Khodadadian
S. T. Maguluri
46
11
0
31 Dec 2023
Stochastic Optimal Control Matching
Stochastic Optimal Control Matching
Carles Domingo-Enrich
Jiequn Han
Brandon Amos
Joan Bruna
Ricky T. Q. Chen
DiffM
25
7
0
04 Dec 2023
Counterfactual Explanation Policies in RL
Counterfactual Explanation Policies in RL
Shripad Deshmukh
R Srivatsan
Supriti Vijay
Jayakumar Subramanian
Chirag Agarwal
OffRL
39
0
0
25 Jul 2023
Multi-Task Reinforcement Learning in Continuous Control with Successor
  Feature-Based Concurrent Composition
Multi-Task Reinforcement Learning in Continuous Control with Successor Feature-Based Concurrent Composition
Y. Liu
Aamir Ahmad
29
4
0
24 Mar 2023
Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End
  Policy and Optimistic Smooth Fictitious Play
Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End Policy and Optimistic Smooth Fictitious Play
Wei Xi
Yongxin Zhang
Changnan Xiao
Xuefeng Huang
Shihong Deng
Haowei Liang
Jie Chen
Peng Sun
OffRL
50
8
0
07 Mar 2023
Sequential Counterfactual Risk Minimization
Sequential Counterfactual Risk Minimization
Houssam Zenati
Eustache Diemert
Matthieu Martin
Julien Mairal
Pierre Gaillard
OffRL
29
3
0
23 Feb 2023
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
HOPE: Human-Centric Off-Policy Evaluation for E-Learning and Healthcare
Ge Gao
Song Ju
Markel Sanz Ausin
Min Chi
OffRL
34
8
0
18 Feb 2023
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement
  Learning
Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning
Brett Daley
Martha White
Chris Amato
Marlos C. Machado
OffRL
25
3
0
26 Jan 2023
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Off-Policy Evaluation for Action-Dependent Non-Stationary Environments
Yash Chandak
Shiv Shankar
Nathaniel D. Bastian
Bruno Castro da Silva
Emma Brunskil
Philip S. Thomas
OffRL
52
6
0
24 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
Human-Timescale Adaptation in an Open-Ended Task Space
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&Ro
OffRL
AI4CE
LRM
40
110
0
18 Jan 2023
Safe Reinforcement Learning for an Energy-Efficient Driver Assistance
  System
Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System
Habtamu Hailemichael
B. Ayalew
Lindsey Kerbel
Andrej Ivanco
K. Loiselle
22
4
0
03 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from
  Text Edits
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
104
35
0
01 Jan 2023
Driver Assistance Eco-driving and Transmission Control with Deep
  Reinforcement Learning
Driver Assistance Eco-driving and Transmission Control with Deep Reinforcement Learning
Lindsey Kerbel
B. Ayalew
Andrej Ivanco
K. Loiselle
OffRL
24
8
0
15 Dec 2022
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Coordinate Ascent for Off-Policy RL with Global Convergence Guarantees
Hsin-En Su
Yen-Ju Chen
Ping-Chun Hsieh
Xi Liu
OffRL
31
0
0
10 Dec 2022
Knowing the Past to Predict the Future: Reinforcement Virtual Learning
Knowing the Past to Predict the Future: Reinforcement Virtual Learning
Peng Zhang
Yawen Huang
Bingzhang Hu
Shizheng Wang
Haoran Duan
Noura Al Moubayed
Yefeng Zheng
Yang Long
OffRL
27
0
0
02 Nov 2022
Hierarchical reinforcement learning for in-hand robotic manipulation
  using Davenport chained rotations
Hierarchical reinforcement learning for in-hand robotic manipulation using Davenport chained rotations
Francisco Roldan Sanchez
Qiang-qiang Wang
David Córdova Bulens
Kevin McGuinness
Stephen J. Redmond
Noel E. O'Connor
23
1
0
03 Oct 2022
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Konstantina Christakopoulou
Can Xu
Sai Zhang
Sriraj Badam
Trevor Potter
...
Ya Le
Chris Berg
E. B. Dixon
Ed H. Chi
Minmin Chen
OffRL
25
8
0
30 Sep 2022
Reinforcement Learning Algorithms: An Overview and Classification
Reinforcement Learning Algorithms: An Overview and Classification
Fadi AlMahamid
Katarina Grolinger
21
40
0
29 Sep 2022
Opportunities and Challenges from Using Animal Videos in Reinforcement
  Learning for Navigation
Opportunities and Challenges from Using Animal Videos in Reinforcement Learning for Navigation
Vittorio Giammarino
James Queeney
Lucas C. Carstensen
Michael Hasselmo
I. Paschalidis
OffRL
55
4
0
25 Sep 2022
Q-learning Decision Transformer: Leveraging Dynamic Programming for
  Conditional Sequence Modelling in Offline RL
Q-learning Decision Transformer: Leveraging Dynamic Programming for Conditional Sequence Modelling in Offline RL
Taku Yamagata
Ahmed Khalil
Raúl Santos-Rodríguez
OffRL
160
72
0
08 Sep 2022
Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning
  Implementation for High-Freq Stock Trading
Safe-FinRL: A Low Bias and Variance Deep Reinforcement Learning Implementation for High-Freq Stock Trading
Zitao Song
Xuyang Jin
Chenliang Li
OffRL
AIFin
29
1
0
13 Jun 2022
On the Robustness of Safe Reinforcement Learning under Observational
  Perturbations
On the Robustness of Safe Reinforcement Learning under Observational Perturbations
Zuxin Liu
Zijian Guo
Zhepeng Cen
Huan Zhang
Jie Tan
Yue Liu
Ding Zhao
OOD
OffRL
48
36
0
29 May 2022
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still
  Insufficient according to an Off-Policy Measure
The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen
Dongcui Diao
Hechang Chen
Hengshuai Yao
Haiyin Piao
Zhixiao Sun
Zhiwei Yang
Randy Goebel
Bei Jiang
Yi-Ju Chang
OffRL
43
8
0
20 May 2022
Towards biologically plausible Dreaming and Planning in recurrent
  spiking networks
Towards biologically plausible Dreaming and Planning in recurrent spiking networks
C. Capone
P. Paolucci
CLL
31
7
0
20 May 2022
Knowledge Infused Decoding
Knowledge Infused Decoding
Ruibo Liu
Guoqing Zheng
Shashank Gupta
Radhika Gaonkar
Chongyang Gao
Soroush Vosoughi
Milad Shokouhi
Ahmed Hassan Awadallah
KELM
30
14
0
06 Apr 2022
Remember and Forget Experience Replay for Multi-Agent Reinforcement
  Learning
Remember and Forget Experience Replay for Multi-Agent Reinforcement Learning
Pascal Weber
Daniel Wälchli
Mustafa Zeqiri
Petros Koumoutsakos
CLL
OffRL
23
7
0
24 Mar 2022
Importance Sampling Placement in Off-Policy Temporal-Difference Methods
Importance Sampling Placement in Off-Policy Temporal-Difference Methods
Eric Graves
Sina Ghiassian
OffRL
31
2
0
18 Mar 2022
On Credit Assignment in Hierarchical Reinforcement Learning
On Credit Assignment in Hierarchical Reinforcement Learning
Joery A. de Vries
Thomas M. Moerland
Aske Plaat
21
0
0
07 Mar 2022
Follow your Nose: Using General Value Functions for Directed Exploration
  in Reinforcement Learning
Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning
Durgesh Kalwar
Omkar Shelke
Somjit Nath
Hardik Meisheri
H. Khadilkar
30
1
0
02 Mar 2022
Learning Robust Real-Time Cultural Transmission without Human Data
Learning Robust Real-Time Cultural Transmission without Human Data
Cultural General Intelligence Team
Avishkar Bhoopchand
Bethanie Brownfield
Adrian Collister
Agustin Dal Lago
...
Alex Platonov
Evan Senter
Sukhdeep Singh
Alexander Zacherl
Lei M. Zhang
VLM
48
11
0
01 Mar 2022
Sequential Bayesian experimental designs via reinforcement learning
Sequential Bayesian experimental designs via reinforcement learning
Hikaru Asano
OffRL
18
0
0
14 Feb 2022
Chaining Value Functions for Off-Policy Learning
Chaining Value Functions for Off-Policy Learning
Simon Schmitt
John Shawe-Taylor
Hado van Hasselt
OffRL
28
2
0
17 Jan 2022
Improving the Efficiency of Off-Policy Reinforcement Learning by
  Accounting for Past Decisions
Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions
Brett Daley
Chris Amato
OffRL
23
1
0
23 Dec 2021
Model-Value Inconsistency as a Signal for Epistemic Uncertainty
Model-Value Inconsistency as a Signal for Epistemic Uncertainty
Angelos Filos
Eszter Vértes
Zita Marinho
Gregory Farquhar
Diana Borsa
A. Friesen
Feryal M. P. Behbahani
Tom Schaul
André Barreto
Simon Osindero
44
7
0
08 Dec 2021
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence
  Model Tackles All SMAC Tasks
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Tackles All SMAC Tasks
Linghui Meng
Muning Wen
Yaodong Yang
Chenyang Le
Xiyun Li
Weinan Zhang
Ying Wen
Haifeng Zhang
Jun Wang
Bo Xu
OffRL
31
38
0
06 Dec 2021
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Adaptively Calibrated Critic Estimates for Deep Reinforcement Learning
Nicolai Dorka
Tim Welschehold
Joschka Boedecker
Wolfram Burgard
OffRL
35
9
0
24 Nov 2021
SOPE: Spectrum of Off-Policy Estimators
SOPE: Spectrum of Off-Policy Estimators
C. J. Yuan
Yash Chandak
S. Giguere
Philip S. Thomas
S. Niekum
OffRL
55
5
0
06 Nov 2021
Is Bang-Bang Control All You Need? Solving Continuous Control with
  Bernoulli Policies
Is Bang-Bang Control All You Need? Solving Continuous Control with Bernoulli Policies
Tim Seyde
Igor Gilitschenski
Wilko Schwarting
Bartolomeo Stellato
Martin Riedmiller
Markus Wulfmeier
Daniela Rus
28
44
0
03 Nov 2021
1234
Next