Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1912.02875
Cited By
Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions
5 December 2019
J. Schmidhuber
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Reinforcement Learning Upside Down: Don't Predict Rewards -- Just Map Them to Actions"
38 / 38 papers shown
Title
Bi-directional Recurrence Improves Transformer in Partially Observable Markov Decision Processes
Ashok Arora
Neetesh Kumar
36
0
0
16 May 2025
Beyond the Known: Decision Making with Counterfactual Reasoning Decision Transformer
Minh Hoang Nguyen
Linh Le Pham Van
Thommen George Karimpanal
Sunil Gupta
Hung Le
OffRL
LRM
37
0
0
14 May 2025
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Poppel
Johannes Brandstetter
Günter Klambauer
Razvan Pascanu
Sepp Hochreiter
75
5
0
21 Feb 2025
Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning
Zijian Guo
Weichao Zhou
Wenchao Li
OffRL
105
2
0
28 Jan 2025
Towards Aligning Language Models with Textual Feedback
Sauc Abadal Lloret
S. Dhuliawala
K. Murugesan
Mrinmaya Sachan
VLM
48
1
0
24 Jul 2024
Improving Reward-Conditioned Policies for Multi-Armed Bandits using Normalized Weight Functions
Kai Xu
Farid Tajaddodianfar
Ben Allison
21
0
0
16 Jun 2024
Return-Aligned Decision Transformer
Tsunehiko Tanaka
Kenshi Abe
Kaito Ariu
Tetsuro Morimura
Edgar Simo-Serra
OffRL
69
1
0
06 Feb 2024
Closing the Gap between TD Learning and Supervised Learning -- A Generalisation Point of View
Raj Ghugare
Matthieu Geist
Glen Berseth
Benjamin Eysenbach
OffRL
35
14
0
20 Jan 2024
An Invitation to Deep Reinforcement Learning
Bernhard Jaeger
Andreas Geiger
OffRL
OOD
80
5
0
13 Dec 2023
A Tractable Inference Perspective of Offline RL
Xuejie Liu
Guy Van den Broeck
Mathias Niepert
Yitao Liang
OffRL
36
1
0
31 Oct 2023
ACT: Empowering Decision Transformer with Dynamic Programming via Advantage Conditioning
Chenxiao Gao
Chenyang Wu
Mingjun Cao
Rui Kong
Zongzhang Zhang
Yang Yu
OffRL
34
13
0
12 Sep 2023
Transformers in Reinforcement Learning: A Survey
Pranav Agarwal
A. Rahman
P. St-Charles
Simon J. D. Prince
Samira Ebrahimi Kahou
OffRL
32
19
0
12 Jul 2023
Passive learning of active causal strategies in agents and language models
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Ishita Dasgupta
A. Nam
Jane X. Wang
29
15
0
25 May 2023
Latent-Conditioned Policy Gradient for Multi-Objective Deep Reinforcement Learning
T. Kanazawa
Chetan Gupta
29
0
0
15 Mar 2023
Language Decision Transformers with Exponential Tilt for Interactive Text Environments
Nicolas Angelard-Gontier
Pau Rodríguez López
I. Laradji
David Vazquez
C. Pal
OffRL
34
1
0
10 Feb 2023
SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning
Benjamin Ellis
Jonathan Cook
S. Moalla
Mikayel Samvelyan
Mingfei Sun
Anuj Mahajan
Jakob N. Foerster
Shimon Whiteson
33
84
0
14 Dec 2022
Is Conditional Generative Modeling all you need for Decision-Making?
Anurag Ajay
Yilun Du
Abhi Gupta
J. Tenenbaum
Tommi Jaakkola
Pulkit Agrawal
DiffM
66
365
0
28 Nov 2022
Hypernetworks for Zero-shot Transfer in Reinforcement Learning
S. Rezaei-Shoshtari
Charlotte Morissette
F. Hogan
Gregory Dudek
David Meger
OffRL
17
14
0
28 Nov 2022
Control Transformer: Robot Navigation in Unknown Environments through PRM-Guided Return-Conditioned Sequence Modeling
Daniel Lawson
A. H. Qureshi
24
8
0
11 Nov 2022
Dichotomy of Control: Separating What You Can Control from What You Cannot
Mengjiao Yang
Dale Schuurmans
Pieter Abbeel
Ofir Nachum
OffRL
30
42
0
24 Oct 2022
Reliable Conditioning of Behavioral Cloning for Offline Reinforcement Learning
Tung Nguyen
Qinqing Zheng
Aditya Grover
OffRL
34
6
0
11 Oct 2022
The Alignment Problem from a Deep Learning Perspective
Richard Ngo
Lawrence Chan
Sören Mindermann
68
183
0
30 Aug 2022
Efficient Planning in a Compact Latent Action Space
Zhengyao Jiang
Tianjun Zhang
Michael Janner
Yueying Li
Tim Rocktaschel
Edward Grefenstette
Yuandong Tian
OffRL
24
37
0
22 Aug 2022
Goal-Conditioned Generators of Deep Policies
Francesco Faccio
Vincent Herrmann
Aditya A. Ramesh
Louis Kirsch
Jürgen Schmidhuber
OffRL
40
8
0
04 Jul 2022
Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning
Yunfei Li
Tian Gao
Jiaqi Yang
Huazhe Xu
Yi Wu
OffRL
31
22
0
24 Jun 2022
Deep Transformer Q-Networks for Partially Observable Reinforcement Learning
Kevin Esslinger
Robert W. Platt
Chris Amato
OffRL
35
35
0
02 Jun 2022
Towards Learning Universal Hyperparameter Optimizers with Transformers
Yutian Chen
Xingyou Song
Chansoo Lee
Zehao Wang
Qiuyi Zhang
...
Greg Kochanski
Arnaud Doucet
MarcÁurelio Ranzato
Sagi Perel
Nando de Freitas
32
63
0
26 May 2022
Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets
M. Strupl
Francesco Faccio
Dylan R. Ashley
Jürgen Schmidhuber
R. Srivastava
15
9
0
13 May 2022
Exploring the Pareto front of multi-objective COVID-19 mitigation policies using reinforcement learning
Mathieu Reymond
Conor F. Hayes
L. Willem
Roxana Rădulescu
S. Abrams
...
Enda Howley
Patrick Mannion
N. Hens
Ann Nowé
Pieter J. K. Libin
16
8
0
11 Apr 2022
Unsupervised Learning of Temporal Abstractions with Slot-based Transformers
Anand Gopalakrishnan
Kazuki Irie
Jürgen Schmidhuber
Sjoerd van Steenkiste
OffRL
26
16
0
25 Mar 2022
Learning Relative Return Policies With Upside-Down Reinforcement Learning
Dylan R. Ashley
Kai Arulkumaran
Jürgen Schmidhuber
R. Srivastava
OffRL
24
1
0
23 Feb 2022
Goal-Conditioned Reinforcement Learning: Problems and Solutions
Minghuan Liu
Menghui Zhu
Weinan Zhang
35
133
0
20 Jan 2022
RvS: What is Essential for Offline RL via Supervised Learning?
Scott Emmons
Benjamin Eysenbach
Ilya Kostrikov
Sergey Levine
OffRL
31
170
0
20 Dec 2021
An Offline Deep Reinforcement Learning for Maintenance Decision-Making
H. Khorasgani
Haiyan Wang
Chetan Gupta
Ahmed K. Farahat
KELM
OffRL
21
5
0
28 Sep 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
33
57
0
11 Jun 2021
Offline Reinforcement Learning as One Big Sequence Modeling Problem
Michael Janner
Qiyang Li
Sergey Levine
OffRL
66
649
0
03 Jun 2021
RUDDER: Return Decomposition for Delayed Rewards
Jose A. Arjona-Medina
Michael Gillhofer
Michael Widrich
Thomas Unterthiner
Johannes Brandstetter
Sepp Hochreiter
30
212
0
20 Jun 2018
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhiwen Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
718
6,748
0
26 Sep 2016
1