Optimizing Agent Behavior over Long Time Scales by Transporting Value

15 October 2018

Arun Ahuja

Papers citing "Optimizing Agent Behavior over Long Time Scales by Transporting Value"

50 / 82 papers shown

Tree Search for LLM Agent Reinforcement Learning

253

25 Sep 2025

Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers

Kazuki Irie

Morris Yau

Samuel J. Gershman

248

31 May 2025

The challenge of hidden gifts in multi-agent reinforcement learning

Dane Malenfant

Blake A. Richards

455

26 May 2025

Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning

571

14 Feb 2025

Evolution and The Knightian Blindspot of Machine Learning

385

22 Jan 2025

Token-level Proximal Policy Optimization for Query Generation

Chenghua Huang

...

936

01 Nov 2024

VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision Making

406

21 Oct 2024

Action abstractions for amortized sampling

International Conference on Learning Representations (ICLR), 2024

Moksh Jain

Nikolay Malkin

Emmanuel Bengio

Rim Assouel

Yoshua Bengio

261

19 Oct 2024

Stable Hadamard Memory: Revitalizing Memory-Augmented Agents for Reinforcement Learning

International Conference on Learning Representations (ICLR), 2024

319

14 Oct 2024

Assessing the Zero-Shot Capabilities of LLMs for Action Evaluation in RL

Eduardo Pignatelli

Johan Ferret

Laura Toni

324

19 Sep 2024

Equivariant Reinforcement Learning under Partial Observability

Conference on Robot Learning (CoRL), 2024

Hai Nguyen

Robert Platt

265

26 Aug 2024

Variable-Agnostic Causal Exploration for Reinforcement Learning

322

17 Jul 2024

Rethinking Transformers in Solving POMDPs

461

27 May 2024

Mastering Memory Tasks with World Models

Mohammad Reza Samsami

Artem Zholus

Janarthanan Rajendran

Sarath Chandar

CLL OffRL

390

07 Mar 2024

Spatially-Aware Transformer for Embodied Agents

Junmo Cho

Jaesik Yoon

Sungjin Ahn

358

23 Feb 2024

Do Transformer World Models Give Better Policy Gradients?

Pierre-Luc Bacon

302

07 Feb 2024

Policy Optimization with Smooth Guidance Learned from State-Only Demonstrations

470

30 Dec 2023

Episodic Return Decomposition by Difference of Implicitly Assigned Sub-Trajectory Reward

222

17 Dec 2023

Neuro-Inspired Fragmentation and Recall to Overcome Catastrophic Forgetting in Curiosity

Jaedong Hwang

Ila Fiete

204

26 Oct 2023

PCGPT: Procedural Content Generation via Transformers

Sajad Mohaghegh

Mohammad Amin Ramezan Dehnavi

Golnoosh Abdollahinejad

Matin Hashemi

ViT

241

03 Oct 2023

Karma: Adaptive Video Streaming via Causal Sequence Modeling

ACM Multimedia (ACM MM), 2023

Bo Xu

Hao Chen

Zhanghui Ma

CML

20 Aug 2023

Hindsight-DICE: Stable Credit Assignment for Deep Reinforcement Learning

341

21 Jul 2023

Transformers in Reinforcement Learning: A Survey

Samira Ebrahimi Kahou

OffRL

293

12 Jul 2023

Grid Cell-Inspired Fragmentation and Recall for Efficient Map Building

Jaedong Hwang

Ila Fiete

295

11 Jul 2023

When Do Transformers Shine in RL? Decoupling Memory from Credit Assignment

Neural Information Processing Systems (NeurIPS), 2023

Pierre-Luc Bacon

587

07 Jul 2023

Would I have gotten that reward? Long-term credit assignment by counterfactual contribution analysis

Neural Information Processing Systems (NeurIPS), 2023

416

29 Jun 2023

Decision S4: Efficient Sequence-Based RL via State Spaces Layers

International Conference on Learning Representations (ICLR), 2023

228

08 Jun 2023

Contrastive Retrospection: honing in on critical steps for rapid learning and generalization in RL

Neural Information Processing Systems (NeurIPS), 2022

Benjamin Alsbury-Nealy

Yoshua Bengio

Blake A. Richards

OffRL

397

12 Oct 2022

Reward Learning using Structural Motifs in Inverse Reinforcement Learning

Raeid Saqur

315

25 Sep 2022

Multi-Agent Reinforcement Learning for Long-Term Network Resource Allocation through Auction: a V2X Application

Computer Communications (Comput. Commun.), 2022

176

29 Jul 2022

Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

Neural Information Processing Systems (NeurIPS), 2022

397

27 Jul 2022

Off-Beat Multi-Agent Reinforcement Learning

Adaptive Agents and Multi-Agent Systems (AAMAS), 2022

Jianye Hao

Changjie Fan

230

27 May 2022

Modeling Human Behavior Part I -- Learning and Belief Approaches

Andrew Fuchs

A. Passarella

M. Conti

276

13 May 2022

Learning to Bid Long-Term: Multi-Agent Reinforcement Learning with Long-Term and Sparse Reward in Repeated Auction Games

Jing Tan

R. Khalili

Holger Karl

111

05 Apr 2022

Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act

Alexis Jacq

Johan Ferret

Olivier Pietquin

Matthieu Geist

217

16 Mar 2022

Selective Credit Assignment

221

20 Feb 2022

Bayesian sense of time in biological and artificial brains

Zafeirios Fountas

Alexey Zakharov

237

14 Jan 2022

Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

206

17 Dec 2021

Episodic Policy Gradient Training

Hung Le

Majid Abdolshah

Thommen George Karimpanal

214

03 Dec 2021

Model-Based Episodic Memory Induces Dynamic Hybrid Controls

Neural Information Processing Systems (NeurIPS), 2021

Hung Le

Thommen George Karimpanal

Majid Abdolshah

T. Tran

Svetha Venkatesh

212

03 Nov 2021

Biological learning in key-value memory networks

271

26 Oct 2021

Recurrent Model-Free RL Can Be a Strong Baseline for Many POMDPs

International Conference on Machine Learning (ICML), 2021

Tianwei Ni

Benjamin Eysenbach

Ruslan Salakhutdinov

456

155

11 Oct 2021

Evaluating the progress of Deep Reinforcement Learning in the real world: aligning domain-agnostic and domain-specific research

303

07 Jul 2021

Preferential Temporal Difference Learning

International Conference on Machine Learning (ICML), 2021

N. Anand

Doina Precup

OOD

194

11 Jun 2021

Towards Practical Credit Assignment for Deep Reinforcement Learning

Dmitry Vetrov

181

08 Jun 2021

Decision Transformer: Reinforcement Learning via Sequence Modeling

Neural Information Processing Systems (NeurIPS), 2021

Aravind Rajeswaran

Pieter Abbeel

708

2,154

02 Jun 2021

Towards mental time travel: a hierarchical memory for reinforcement learning agents

Neural Information Processing Systems (NeurIPS), 2021

406

28 May 2021

An Information-Theoretic Perspective on Credit Assignment in Reinforcement Learning

Dilip Arumugam

Peter Henderson

Pierre-Luc Bacon

176

10 Mar 2021

Synthetic Returns for Long-Term Credit Assignment

244

24 Feb 2021

Delayed Rewards Calibration via Reward Empirical Sufficiency

Hu Wang

226

21 Feb 2021