ResearchTrend.AI

arXiv: 2502.17543
Latest version: v4

Training a Generally Curious Agent

24 February 2025
Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Sadia Rahman, J. Zico Kolter, Jeff Schneider, Ruslan Salakhutdinov

Papers citing "Training a Generally Curious Agent"

Showing 50 of 58 citing papers
Benchmarking In-context Experiential Learning Through Repeated Product Recommendations
Gilbert Yang, Yaqin Chen, Thomson Yen, Hongseok Namkoong
27 Nov 2025

When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Sanxing Chen, Xiaoyin Chen, Yukun Huang, Roy Xie, Bhuwan Dhingra
29 Sep 2025

Towards Monotonic Improvement in In-Context Reinforcement Learning
Wenhao Zhang, Shao Zhang, Xihuai Wang, Yang Li, Ying Wen
27 Sep 2025 · OffRL

Outcome-based Exploration for LLM Reasoning
Yuda Song, Julia Kempe, Remi Munos
08 Sep 2025 · OffRL, LRM

Provably Learning from Language Feedback
Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng
12 Jun 2025

ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
Amirreza Rouhi, Solmaz Arezoomandan, Knut Peterson, Joseph T. Woods, David Han
10 Jun 2025 · VLM

Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Angelard-Gontier, Yoshua Bengio, Ehsan Kamalloo
20 May 2025 · ReLM, LRM

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
Boyuan Zheng, Michael Y. Fatemi, Xiaolong Jin, Liang Luo, Apurva Gandhi, …, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, Eric Fosler-Lussier
09 Apr 2025 · CLL

ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning
Shuyue Stella Li, Kumail Alhamoud, Faeze Brahman, Pedram Hosseini, Bryceton G. Thomas, Jessica M. Sin, Bing Ren, Jonathan Ilgen, Yulia Tsvetkov, Maarten Sap
20 Feb 2025 · LM&MA

Should You Use Your Large Language Model to Explore or Exploit?
Keegan Harris, Aleksandrs Slivkins
31 Jan 2025

GPT-4o System Card
OpenAI: Aaron Hurst, Adam Lerer, Adam P. Goucher, …, Yuchen He, Yuchen Zhang, Yujia Jin, Yunxing Dai, Yury Malkov
25 Oct 2024 · MLLM

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization (ICLR 2024)
Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
11 Oct 2024

GenQA: Generating Millions of Instructions from a Handful of Prompts
Jiuhai Chen, Rifaa Qadri, Yuxin Wen, Neel Jain, John Kirchenbauer, Wanrong Zhu, Tom Goldstein
14 Jun 2024 · ALM

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anika Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
22 Apr 2024

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weiling Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
16 Apr 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Abigail Z. Jacobs, Tatsunori Hashimoto
06 Apr 2024 · ALM

Can large language models explore in-context? (NeurIPS 2024)
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
22 Mar 2024 · LM&Ro, LLMAG, LRM

Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu
07 Mar 2024 · ReLM, LRM

Genie: Generative Interactive Environments
Jake Bruce, Michael Dennis, Ashley D. Edwards, Jack Parker-Holder, Yuge Shi, …, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktaschel
23 Feb 2024 · VGen, VLM

MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
Wai-Chung Kwan, Xingshan Zeng, Yuxin Jiang, Yufei Wang, Liangyou Li, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
30 Jan 2024 · LRM, ELM

Best Arm Identification with Fixed Budget: A Large Deviation Perspective
Po-An Wang, Ruo-Chun Tzeng, Alexandre Proutiere
19 Dec 2023

Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
06 Dec 2023 · OffRL

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charles Burton Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
30 Nov 2023 · LLMAG, OffRL, LRM

GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
20 Nov 2023 · AI4MH, ELM

Instruction-Following Evaluation for Large Language Models
Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou
14 Nov 2023 · ELM, ALM, LRM

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining (ICLR 2023)
Licong Lin, Yu Bai, Song Mei
12 Oct 2023 · OffRL

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback (ICLR 2023)
Xingyao Wang, Zihan Wang, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
19 Sep 2023 · LRM

Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre, T. Paine, S. Srinivasan, Ksenia Konyushkova, L. Weerts, …, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas
17 Aug 2023 · OffRL

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (ICLR 2023)
Tri Dao
17 Jul 2023 · LRM

Supervised Pretraining Can Learn In-Context Reinforcement Learning (NeurIPS 2023)
Jonathan Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
26 Jun 2023 · OffRL

Direct Preference Optimization: Your Language Model is Secretly a Reward Model (NeurIPS 2023)
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
29 May 2023 · ALM

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Boyao Wang, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
13 Apr 2023 · ALM

In-context Reinforcement Learning with Algorithm Distillation (ICLR 2022)
Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, …, Ethan A. Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
25 Oct 2022 · OffRL

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (NeurIPS 2022)
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
27 May 2022 · VLM

Large Language Models are Zero-Shot Reasoners (NeurIPS 2022)
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
24 May 2022 · ReLM, LRM

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (NeurIPS 2022)
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022 · LM&Ro, LRM, AI4CE, ReLM

Replay-Guided Adversarial Environment Design
Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob N. Foerster, Edward Grefenstette, Tim Rocktaschel
06 Oct 2021

Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Basel Alomair, Jacob Steinhardt
05 Mar 2021 · ReLM, FaML

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design (NeurIPS 2020)
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre M. Bayen, Stuart J. Russell, Andrew Critch, Sergey Levine
03 Dec 2020

Prioritized Level Replay
Minqi Jiang, Edward Grefenstette, Tim Rocktaschel
08 Oct 2020 · OffRL

Language Models are Few-Shot Learners (NeurIPS 2020)
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, …, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
28 May 2020 · BDL

Automatic Curriculum Learning For Deep RL: A Short Survey (IJCAI 2020)
Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, Pierre-Yves Oudeyer
10 Mar 2020 · ODL

Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments (CoRL 2019)
Rémy Portelas, Cédric Colas, Katja Hofmann, Pierre-Yves Oudeyer
16 Oct 2019

Interactive Fiction Games: A Colossal Adventure (AAAI 2019)
Matthew J. Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan
11 Sep 2019 · LLMAG, LM&Ro, AI4CE

Dynamics-Aware Unsupervised Discovery of Skills (ICLR 2019)
Archit Sharma, S. Gu, Sergey Levine, Vikash Kumar, Karol Hausman
02 Jul 2019

Self-Supervised Exploration via Disagreement (ICML 2019)
Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta
10 Jun 2019 · SSL

Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
Thomas W. Anthony, Robert Nishihara, Philipp Moritz, Tim Salimans, John Schulman
07 Apr 2019

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley
07 Jan 2019

Exploration by Random Network Distillation
Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov
30 Oct 2018

Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
16 Feb 2018

Page 1 of 2