ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.02865
  4. Cited By
Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy
  Actor-Critic

Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic

5 June 2023
Tianying Ji
Yuping Luo
Fuchun Sun
Xianyuan Zhan
Jianwei Zhang
Huazhe Xu
    OffRL
    OnRL
ArXivPDFHTML

Papers citing "Seizing Serendipity: Exploiting the Value of Past Success in Off-Policy Actor-Critic"

8 / 8 papers shown
Title
Bigger, Regularized, Optimistic: scaling for compute and
  sample-efficient continuous control
Bigger, Regularized, Optimistic: scaling for compute and sample-efficient continuous control
Michal Nauman
M. Ostaszewski
Krzysztof Jankowski
Piotr Milo's
Marek Cygan
OffRL
29
16
0
25 May 2024
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps
Haoyi Niu
Tianying Ji
Bingqi Liu
Haocheng Zhao
Xiangyu Zhu
Jianying Zheng
Pengfei Huang
Guyue Zhou
Jianming Hu
Xianyuan Zhan
OffRL
OnRL
AI4CE
25
6
0
22 Sep 2023
Simplifying Model-based RL: Learning Representations, Latent-space
  Models, and Policies with One Objective
Simplifying Model-based RL: Learning Representations, Latent-space Models, and Policies with One Objective
Raj Ghugare
Homanga Bharadhwaj
Benjamin Eysenbach
Sergey Levine
Ruslan Salakhutdinov
OffRL
38
25
0
18 Sep 2022
Optimistic Curiosity Exploration and Conservative Exploitation with
  Linear Reward Shaping
Optimistic Curiosity Exploration and Conservative Exploitation with Linear Reward Shaping
Hao Sun
Lei Han
Rui Yang
Xiaoteng Ma
Jian Guo
Bolei Zhou
OffRL
OnRL
27
10
0
15 Sep 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Offline Reinforcement Learning with Implicit Q-Learning
Offline Reinforcement Learning with Implicit Q-Learning
Ilya Kostrikov
Ashvin Nair
Sergey Levine
OffRL
206
832
0
12 Oct 2021
Learning Pessimism for Robust and Efficient Off-Policy Reinforcement
  Learning
Learning Pessimism for Robust and Efficient Off-Policy Reinforcement Learning
Edoardo Cetin
Oya Celiktutan
OffRL
26
16
0
07 Oct 2021
Controlling Overestimation Bias with Truncated Mixture of Continuous
  Distributional Quantile Critics
Controlling Overestimation Bias with Truncated Mixture of Continuous Distributional Quantile Critics
Arsenii Kuznetsov
Pavel Shvechikov
Alexander Grishin
Dmitry Vetrov
131
184
0
08 May 2020
1