Learning Montezuma's Revenge from a Single Demonstration

8 December 2018
Tim Salimans
Richard J. Chen
arXiv: 1812.03381

Papers citing "Learning Montezuma's Revenge from a Single Demonstration"

50 / 92 papers shown
MARGE: Improving Math Reasoning for LLMs with Guided Exploration
Jingyue Gao
Runji Lin
Keming Lu
Bowen Yu
Junyang Lin
Jianyu Chen
LRM
2
0
0
18 May 2025
Causally Aligned Curriculum Learning
Mingxuan Li
Junzhe Zhang
Elias Bareinboim
CML
64
3
0
21 Mar 2025
Unsupervised Skill Discovery for Robotic Manipulation through Automatic Task Generation
Paul Jansonnie
Bingbing Wu
Julien Perez
Jan Peters
SSL
25
0
0
07 Oct 2024
DemoStart: Demonstration-led auto-curriculum applied to sim-to-real with multi-fingered robots
Maria Bauzá
José Enrique Chen
Valentin Dalibard
Nimrod Gileadi
Roland Hafner
...
Martin Riedmiller
Jon Scholz
Konstantinos Bousmalis
Francesco Nori
Nicolas Heess
34
5
0
10 Sep 2024
Whole-Body Control Through Narrow Gaps From Pixels To Action
Tianyue Wu
Yeke Chen
Tianyang Chen
Guangyu Zhao
Fei Gao
54
4
0
02 Sep 2024
Representation Alignment from Human Feedback for Cross-Embodiment Reward Learning from Mixed-Quality Demonstrations
Connor Mattson
Anurag Aribandi
Daniel S. Brown
40
0
0
10 Aug 2024
WayEx: Waypoint Exploration using a Single Demonstration
Mara Levy
Nirat Saini
Abhinav Shrivastava
61
1
0
22 Jul 2024
Hybrid Reinforcement Learning from Offline Observation Alone
Yuda Song
J. Andrew Bagnell
Aarti Singh
OffRL
84
2
0
11 Jun 2024
A Study of Plasticity Loss in On-Policy Deep Reinforcement Learning
Arthur Juliani
Jordan T. Ash
OffRL
OnRL
CLL
47
5
0
29 May 2024
Reverse Forward Curriculum Learning for Extreme Sample and Demonstration Efficiency in Reinforcement Learning
Stone Tao
Arth Shukla
Tse-kai Chan
Hao Su
OffRL
41
4
0
06 May 2024
Proximal Curriculum with Task Correlations for Deep Reinforcement Learning
Georgios Tzannetos
Parameswaran Kamalaruban
Adish Singla
34
4
0
03 May 2024
Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Kianté Brantley
Dipendra Kumar Misra
Jason D. Lee
Wen Sun
OffRL
27
21
0
12 Apr 2024
Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla
Yuqing Du
Sharath Chandra Raparthy
Christoforos Nalmpantis
Jane Dwivedi-Yu
Maksym Zhuravinskyi
Eric Hambro
Sainbayar Sukhbaatar
Roberta Raileanu
ReLM
LRM
34
68
0
07 Mar 2024
Snapshot Reinforcement Learning: Leveraging Prior Trajectories for Efficiency
Yanxiao Zhao
Yangge Qian
Tianyi Wang
Jingyang Shan
Xiaolin Qin
24
0
0
01 Mar 2024
Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning
Zhiheng Xi
Wenxiang Chen
Boyang Hong
Senjie Jin
Rui Zheng
...
Xinbo Zhang
Peng Sun
Tao Gui
Qi Zhang
Xuanjing Huang
LRM
39
21
0
08 Feb 2024
StepCoder: Improve Code Generation with Reinforcement Learning from Compiler Feedback
Shihan Dou
Yan Liu
Haoxiang Jia
Limao Xiong
Enyu Zhou
...
Tao Ji
Rui Zheng
Qi Zhang
Xuanjing Huang
Tao Gui
LLMAG
65
30
0
02 Feb 2024
HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents
Dániel Horváth
Jesús Bujalance Martín
Ferenc Gàbor Erdos
Z. Istenes
Fabien Moutarde
OffRL
28
1
0
14 Dec 2023
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy
Eric Hambro
Robert Kirk
Mikael Henaff
Roberta Raileanu
OffRL
111
21
0
06 Dec 2023
Where2Start: Leveraging initial States for Robust and Sample-Efficient Reinforcement Learning
Pouya Parsa
Raoof Zare Moayedi
Mohammad Bornosi
Mohammad Mahdi Bejani
24
0
0
25 Nov 2023
Finding Increasingly Large Extremal Graphs with AlphaZero and Tabu Search
Abbas Mehrabian
Ankit Anand
Hyunjik Kim
Nicolas Sonnerat
Matej Balog
...
Laurent Orseau
Joonkyung Lee
Anurag Murty Naredla
Doina Precup
Adam Zsolt Wagner
21
7
0
06 Nov 2023
Information Content Exploration
Jacob Chmura
Hasham Burhani
Xiao Qi Shi
19
0
0
10 Oct 2023
Diagnosing and exploiting the computational demands of videos games for deep reinforcement learning
L. Govindarajan
Rex G Liu
Drew Linsley
A. Ashok
Max Reuter
M. Frank
Thomas Serre
OffRL
21
0
0
22 Sep 2023
One ACT Play: Single Demonstration Behavior Cloning with Action Chunking Transformers
Abraham George
A. Farimani
OffRL
25
11
0
18 Sep 2023
Contrastive Initial State Buffer for Reinforcement Learning
Nico Messikommer
Yunlong Song
Davide Scaramuzza
OffRL
44
9
0
18 Sep 2023
Learning to Generate Better Than Your LLM
Jonathan D. Chang
Kianté Brantley
Rajkumar Ramamurthy
Dipendra Kumar Misra
Wen Sun
19
41
0
20 Jun 2023
Proximal Curriculum for Reinforcement Learning Agents
Georgios Tzannetos
Bárbara Gomes Ribeiro
Parameswaran Kamalaruban
Adish Singla
32
5
0
25 Apr 2023
Boosting Reinforcement Learning and Planning with Demonstrations: A Survey
Tongzhou Mu
H. Su
OffRL
35
1
0
23 Mar 2023
Sample Efficient Deep Reinforcement Learning via Local Planning
Dong Yin
S. Thiagarajan
N. Lazić
Nived Rajaraman
Botao Hao
Csaba Szepesvári
25
4
0
29 Jan 2023
Time-Efficient Reward Learning via Visually Assisted Cluster Ranking
David Zhang
Micah Carroll
Andreea Bobu
Anca Dragan
24
4
0
30 Nov 2022
Towards Improving Exploration in Self-Imitation Learning using Intrinsic Motivation
Alain Andres
Esther Villar-Rodriguez
Javier Del Ser
SSL
25
6
0
30 Nov 2022
Leveraging Sequentiality in Reinforcement Learning from a Single Demonstration
Alexandre Chenu
Olivier Serris
Olivier Sigaud
Nicolas Perrin-Gilbert
17
4
0
09 Nov 2022
D-Shape: Demonstration-Shaped Reinforcement Learning via Goal Conditioning
Caroline Wang
Garrett A. Warnell
Peter Stone
40
3
0
26 Oct 2022
Task Phasing: Automated Curriculum Learning from Demonstrations
Vaibhav Bajaj
Guni Sharon
Peter Stone
26
8
0
20 Oct 2022
Generative Personas That Behave and Experience Like Humans
M. Barthet
Ahmed Khalifa
Antonios Liapis
Georgios N. Yannakakis
21
20
0
26 Aug 2022
Reinforcement Learning for Branch-and-Bound Optimisation using Retrospective Trajectories
Christopher W. F. Parsonson
Alexandre Laterre
Thomas D. Barrett
19
19
0
28 May 2022
Learning to Guide Multiple Heterogeneous Actors from a Single Human Demonstration via Automatic Curriculum Learning in StarCraft II
Nicholas R. Waytowich
James Z. Hare
Vinicius G. Goecks
Mark R. Mittrick
John Richardson
Anjon Basak
Derrik E. Asher
30
2
0
11 May 2022
Exploration in Deep Reinforcement Learning: A Survey
Pawel Ladosz
Lilian Weng
Minwoo Kim
H. Oh
OffRL
26
324
0
02 May 2022
Divide & Conquer Imitation Learning
Alexandre Chenu
Nicolas Perrin-Gilbert
Olivier Sigaud
16
5
0
15 Apr 2022
Jump-Start Reinforcement Learning
Ikechukwu Uchendu
Ted Xiao
Yao Lu
Banghua Zhu
Mengyuan Yan
...
Chuyuan Fu
Cong Ma
Jiantao Jiao
Sergey Levine
Karol Hausman
OffRL
OnRL
38
109
0
05 Apr 2022
Wish you were here: Hindsight Goal Selection for long-horizon dexterous manipulation
Todor Davchev
Oleg O. Sushkov
Jean-Baptiste Regli
S. Schaal
Y. Aytar
Markus Wulfmeier
Jonathan Scholz
16
18
0
01 Dec 2021
Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics
Ingmar Schubert
Danny Driess
Ozgur S. Oguz
Marc Toussaint
OffRL
22
1
0
15 Nov 2021
Learning from Ambiguous Demonstrations with Self-Explanation Guided Reinforcement Learning
Yantian Zha
L. Guan
Subbarao Kambhampati
26
5
0
11 Oct 2021
Learning Multi-Objective Curricula for Robotic Policy Learning
Jikun Kang
Miao Liu
Abhinav Gupta
C. Pal
Xue Liu
Jie Fu
42
4
0
06 Oct 2021
Go-Blend behavior and affect
M. Barthet
Antonios Liapis
Georgios N. Yannakakis
19
7
0
24 Sep 2021
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain
Jianye Hao
Tianpei Yang
Hongyao Tang
Chenjia Bai
Jinyi Liu
Zhaopeng Meng
Peng Liu
Zhen Wang
OffRL
36
92
0
14 Sep 2021
Finding Failures in High-Fidelity Simulation using Adaptive Stress Testing and the Backward Algorithm
Mark Koren
Ahmed Nassar
Mykel J. Kochenderfer
16
20
0
27 Jul 2021
Plan-Based Relaxed Reward Shaping for Goal-Directed Tasks
Ingmar Schubert
Ozgur S. Oguz
Marc Toussaint
OffRL
29
5
0
14 Jul 2021
Imitation Learning: Progress, Taxonomies and Challenges
Boyuan Zheng
Sunny Verma
Jianlong Zhou
Ivor Tsang
Fang Chen
25
85
0
23 Jun 2021
Automatic Curricula via Expert Demonstrations
Siyu Dai
Andreas G. Hofmann
B. Williams
15
5
0
16 Jun 2021
Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism
Paria Rashidinejad
Banghua Zhu
Cong Ma
Jiantao Jiao
Stuart J. Russell
OffRL
30
273
0
22 Mar 2021