arXiv:2102.05261
Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States

Journal of Machine Learning Research (JMLR), 2021
10 February 2021
Shi Dong
Benjamin Van Roy
Zhengyuan Zhou

Papers citing "Simple Agent, Complex Environment: Efficient Reinforcement Learning with Agent States"

27 / 27 papers shown
Real-World Reinforcement Learning of Active Perception Behaviors
E. Hu
Jie Wang
Xingfang Yuan
Fiona Luo
Muyao Li
Gaspard Lambrechts
Oleh Rybkin
Dinesh Jayaraman
OffRL
287
3
0
01 Dec 2025
Forgetting is Everywhere
Ben Sanati
Thomas L. Lee
Trevor A. McInroe
Aidan Scannell
Nikolay Malkin
David Abel
Amos Storkey
OOD, CML
453
0
0
06 Nov 2025
Modeling Others' Minds as Code
Kunal Jha
Aydan Yuenan Huang
Eric Ye
Natasha Jaques
Max Kleiman-Weiner
SyDa
185
4
0
29 Sep 2025
Convergence of regularized agent-state-based Q-learning in POMDPs
Amit Sinha
Matthieu Geist
Aditya Mahajan
159
0
0
29 Aug 2025
BOFormer: Learning to Solve Multi-Objective Bayesian Optimization via Non-Markovian RL
International Conference on Learning Representations (ICLR), 2025
Yu-Heng Hung
Kai-Jie Lin
Yu-Heng Lin
Chien-Yi Wang
Cheng Sun
Ping-Chun Hsieh
377
6
0
28 May 2025
Plasticity as the Mirror of Empowerment
David Abel
Michael Bowling
André Barreto
Will Dabney
Shi Dong
...
Doina Precup
Jonathan Richens
Mark Rowland
Tom Schaul
Satinder Singh
AI4CE
501
3
0
15 May 2025
Toward Efficient Exploration by Large Language Model Agents
Dilip Arumugam
Thomas L. Griffiths
LLMAG
469
12
0
29 Apr 2025
Rethinking the Foundations for Continual Reinforcement Learning
Esraa Elelimy
David Szepesvari
Martha White
Michael Bowling
OffRL, LRM, CLL
512
13
0
10 Apr 2025
Agent-state based policies in POMDPs: Beyond belief-state MDPs
IEEE Conference on Decision and Control (CDC), 2024
Amit Sinha
Aditya Mahajan
318
11
0
24 Sep 2024
The Need for a Big World Simulator: A Scientific Challenge for Continual Learning
Saurabh Kumar
Hong Jun Jeon
Alex Lewandowski
Benjamin Van Roy
245
5
0
06 Aug 2024
Three Dogmas of Reinforcement Learning
David Abel
Mark K. Ho
Anna Harutyunyan
431
12
0
15 Jul 2024
Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy
Cameron Allen
Aaron Kirtland
Ruo Yu Tao
Sam Lobel
Daniel Scott
Nicholas Petrocelli
Omer Gottesman
Ronald E. Parr
M. L. Littman
George Konidaris
246
8
0
10 Jul 2024
Periodic agent-state based Q-learning for POMDPs
Amit Sinha
Matthieu Geist
Aditya Mahajan
365
5
0
08 Jul 2024
Meta-Gradient Search Control: A Method for Improving the Efficiency of Dyna-style Planning
Bradley Burega
John D. Martin
Luke Kapeluck
Michael Bowling
288
0
0
27 Jun 2024
Conditions on Preference Relations that Guarantee the Existence of Optimal Policies
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Jonathan Colaco Carr
Prakash Panangaden
Doina Precup
391
4
0
03 Nov 2023
Q-Learning for Stochastic Control under General Information Structures and Non-Markovian Environments
A. D. Kara
S. Yüksel
338
14
0
31 Oct 2023
A Definition of Continual Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
David Abel
André Barreto
Benjamin Van Roy
Doina Precup
H. V. Hasselt
Satinder Singh
CLL
589
125
0
20 Jul 2023
On the Convergence of Bounded Agents
David Abel
André Barreto
Hado van Hasselt
Benjamin Van Roy
Doina Precup
Satinder Singh
301
7
0
20 Jul 2023
Continual Learning as Computationally Constrained Reinforcement Learning
Saurabh Kumar
Henrik Marklund
Anand Srinivasa Rao
Yifan Zhu
Hong Jun Jeon
Yueyang Liu
Benjamin Van Roy
CLL
464
39
0
10 Jul 2023
Approximate information state based convergence analysis of recurrent Q-learning
Erfan Seyedsalehi
N. Akbarzadeh
Amit Sinha
Aditya Mahajan
214
6
0
09 Jun 2023
Fine-Tuning Language Models with Advantage-Induced Policy Alignment
Banghua Zhu
Hiteshi Sharma
Felipe Vieira Frujeri
Shi Dong
Chenguang Zhu
Michael I. Jordan
Jiantao Jiao
OSLM
342
49
0
04 Jun 2023
Bayesian Reinforcement Learning with Limited Cognitive Load
Open Mind (OM), 2023
Dilip Arumugam
Mark K. Ho
Noah D. Goodman
Benjamin Van Roy
OffRL
258
16
0
05 May 2023
Settling the Reward Hypothesis
International Conference on Machine Learning (ICML), 2022
Michael Bowling
John D. Martin
David Abel
Will Dabney
LRM
327
45
0
20 Dec 2022
Posterior Sampling for Continuing Environments
Wanqiao Xu
Shi Dong
Benjamin Van Roy
270
4
0
29 Nov 2022
Reinforcement Learning in Non-Markovian Environments
Siddharth Chandak
Pratik Shah
Vivek Borkar
Parth Dodhia
OOD
411
15
0
03 Nov 2022
On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning
Dilip Arumugam
Mark K. Ho
Noah D. Goodman
Benjamin Van Roy
301
7
0
30 Oct 2022
Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction
Neural Information Processing Systems (NeurIPS), 2022
Dilip Arumugam
Satinder Singh
238
7
0
30 Oct 2022