ResearchTrend.AI

arXiv: 2502.17543
Latest version: v4

Training a Generally Curious Agent

24 February 2025
Fahim Tajwar, Yiding Jiang, Abitha Thankaraj, Sumaita Sadia Rahman, J. Zico Kolter, Jeff Schneider, Ruslan Salakhutdinov

Papers citing "Training a Generally Curious Agent"

Showing 50 of 58 citing papers
Benchmarking In-context Experiential Learning Through Repeated Product Recommendations
Gilbert Yang, Yaqin Chen, Thomson Yen, Hongseok Namkoong
27 Nov 2025

When Greedy Wins: Emergent Exploitation Bias in Meta-Bandit LLM Training
Sanxing Chen, Xiaoyin Chen, Yukun Huang, Roy Xie, Bhuwan Dhingra
29 Sep 2025

Towards Monotonic Improvement in In-Context Reinforcement Learning
Wenhao Zhang, Shao Zhang, Xihuai Wang, Yang Li, Ying Wen
27 Sep 2025 · OffRL

Outcome-based Exploration for LLM Reasoning
Yuda Song, Julia Kempe, Remi Munos
08 Sep 2025 · OffRL, LRM

Provably Learning from Language Feedback
Wanqiao Xu, Allen Nie, Ruijie Zheng, Aditya Modi, Adith Swaminathan, Ching-An Cheng
12 Jun 2025

ADAM: Autonomous Discovery and Annotation Model using LLMs for Context-Aware Annotations
Amirreza Rouhi, Solmaz Arezoomandan, Knut Peterson, Joseph T. Woods, David Han
10 Jun 2025 · VLM

Self-Evolving Curriculum for LLM Reasoning
Xiaoyin Chen, Jiarui Lu, Minsu Kim, Dinghuai Zhang, Jian Tang, Alexandre Piché, Nicolas Angelard-Gontier, Yoshua Bengio, Ehsan Kamalloo
20 May 2025 · ReLM, LRM

SkillWeaver: Web Agents can Self-Improve by Discovering and Honing Skills
Boyuan Zheng, Michael Y. Fatemi, Xiaolong Jin, Liang Luo, Apurva Gandhi, …, Yu Gu, Jayanth Srinivasa, Gaowen Liu, Graham Neubig, Eric Fosler-Lussier
09 Apr 2025 · CLL

ALFA: Aligning LLMs to Ask Good Questions A Case Study in Clinical Reasoning
Shuyue Stella Li, Kumail Alhamoud, Faeze Brahman, Pedram Hosseini, Bryceton G. Thomas, Jessica M. Sin, Bing Ren, Jonathan Ilgen, Yulia Tsvetkov, Maarten Sap
20 Feb 2025 · LM&MA

Should You Use Your Large Language Model to Explore or Exploit?
Keegan Harris, Aleksandrs Slivkins
31 Jan 2025

GPT-4o System Card
OpenAI: Aaron Hurst, Adam Lerer, Adam P. Goucher, …, Yuchen He, Yuchen Zhang, Yujia Jin, Yunxing Dai, Yury Malkov
25 Oct 2024 · MLLM

Unintentional Unalignment: Likelihood Displacement in Direct Preference Optimization (ICLR 2024)
Noam Razin, Sadhika Malladi, Adithya Bhaskar, Danqi Chen, Sanjeev Arora, Boris Hanin
11 Oct 2024

GenQA: Generating Millions of Instructions from a Handful of Prompts
Jiuhai Chen, Rifaa Qadri, Yuxin Wen, Neel Jain, John Kirchenbauer, Wanrong Zhu, Tom Goldstein
14 Jun 2024 · ALM

Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data
Fahim Tajwar, Anika Singh, Archit Sharma, Rafael Rafailov, Jeff Schneider, Tengyang Xie, Stefano Ermon, Chelsea Finn, Aviral Kumar
22 Apr 2024

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study
Shusheng Xu, Wei Fu, Jiaxuan Gao, Wenjie Ye, Weiling Liu, Zhiyu Mei, Guangju Wang, Chao Yu, Yi Wu
16 Apr 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Abigail Z. Jacobs, Tatsunori Hashimoto
06 Apr 2024 · ALM

Can large language models explore in-context? (NeurIPS 2024)
Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins
22 Mar 2024 · LM&Ro, LLMAG, LRM

Teaching Large Language Models to Reason with Reinforcement Learning
Alex Havrilla, Yuqing Du, Sharath Chandra Raparthy, Christoforos Nalmpantis, Jane Dwivedi-Yu, Maksym Zhuravinskyi, Eric Hambro, Sainbayar Sukhbaatar, Roberta Raileanu
07 Mar 2024 · ReLM, LRM

Genie: Generative Interactive Environments
Jake Bruce, Michael Dennis, Ashley D. Edwards, Jack Parker-Holder, Yuge Shi, …, Konrad Zolna, Jeff Clune, Nando de Freitas, Satinder Singh, Tim Rocktaschel
23 Feb 2024 · VGen, VLM

MT-Eval: A Multi-Turn Capabilities Evaluation Benchmark for Large Language Models
Wai-Chung Kwan, Xingshan Zeng, Yuxin Jiang, Yufei Wang, Liangyou Li, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
30 Jan 2024 · LRM, ELM

Best Arm Identification with Fixed Budget: A Large Deviation Perspective
Po-An Wang, Ruo-Chun Tzeng, Alexandre Proutiere
19 Dec 2023

Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
06 Dec 2023 · OffRL

LMRL Gym: Benchmarks for Multi-Turn Reinforcement Learning with Language Models
Marwa Abdulhai, Isadora White, Charles Burton Snell, Charles Sun, Joey Hong, Yuexiang Zhai, Kelvin Xu, Sergey Levine
30 Nov 2023 · LLMAG, OffRL, LRM

GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, Samuel R. Bowman
20 Nov 2023 · AI4MH, ELM

Instruction-Following Evaluation for Large Language Models
Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, Le Hou
14 Nov 2023 · ELM, ALM, LRM

Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining (ICLR 2023)
Licong Lin, Yu Bai, Song Mei
12 Oct 2023 · OffRL

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback (ICLR 2023)
Xingyao Wang, Zihan Wang, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
19 Sep 2023 · LRM

Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre, T. Paine, S. Srinivasan, Ksenia Konyushkova, L. Weerts, …, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas
17 Aug 2023 · OffRL

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning (ICLR 2023)
Tri Dao
17 Jul 2023 · LRM

Supervised Pretraining Can Learn In-Context Reinforcement Learning (NeurIPS 2023)
Jonathan Lee, Annie Xie, Aldo Pacchiano, Yash Chandak, Chelsea Finn, Ofir Nachum, Emma Brunskill
26 Jun 2023 · OffRL

Direct Preference Optimization: Your Language Model is Secretly a Reward Model (NeurIPS 2023)
Rafael Rafailov, Archit Sharma, E. Mitchell, Stefano Ermon, Christopher D. Manning, Chelsea Finn
29 May 2023 · ALM

RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong, Wei Xiong, Deepanshu Goyal, Yihan Zhang, Winnie Chow, Boyao Wang, Shizhe Diao, Jipeng Zhang, Kashun Shum, Tong Zhang
13 Apr 2023 · ALM

In-context Reinforcement Learning with Algorithm Distillation (ICLR 2022)
Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, …, Ethan A. Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
25 Oct 2022 · OffRL

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness (NeurIPS 2022)
Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
27 May 2022 · VLM

Large Language Models are Zero-Shot Reasoners (NeurIPS 2022)
Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa
24 May 2022 · ReLM, LRM

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (NeurIPS 2022)
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
28 Jan 2022 · LM&Ro, LRM, AI4CE, ReLM

Replay-Guided Adversarial Environment Design
Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob N. Foerster, Edward Grefenstette, Tim Rocktaschel
06 Oct 2021

Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Basel Alomair, Jacob Steinhardt
05 Mar 2021 · ReLM, FaML

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design (NeurIPS 2020)
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre M. Bayen, Stuart J. Russell, Andrew Critch, Sergey Levine
03 Dec 2020

Prioritized Level Replay
Minqi Jiang, Edward Grefenstette, Tim Rocktaschel
08 Oct 2020 · OffRL

Language Models are Few-Shot Learners (NeurIPS 2020)
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, …, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
28 May 2020 · BDL

Automatic Curriculum Learning For Deep RL: A Short Survey (IJCAI 2020)
Rémy Portelas, Cédric Colas, Lilian Weng, Katja Hofmann, Pierre-Yves Oudeyer
10 Mar 2020 · ODL

Teacher algorithms for curriculum learning of Deep RL in continuously parameterized environments (CoRL 2019)
Rémy Portelas, Cédric Colas, Katja Hofmann, Pierre-Yves Oudeyer
16 Oct 2019

Interactive Fiction Games: A Colossal Adventure (AAAI 2019)
Matthew J. Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, Xingdi Yuan
11 Sep 2019 · LLMAG, LM&Ro, AI4CE

Dynamics-Aware Unsupervised Discovery of Skills (ICLR 2019)
Archit Sharma, S. Gu, Sergey Levine, Vikash Kumar, Karol Hausman
02 Jul 2019

Self-Supervised Exploration via Disagreement (ICML 2019)
Deepak Pathak, Dhiraj Gandhi, Abhinav Gupta
10 Jun 2019 · SSL

Policy Gradient Search: Online Planning and Expert Iteration without Search Trees
Thomas W. Anthony, Robert Nishihara, Philipp Moritz, Tim Salimans, John Schulman
07 Apr 2019

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley
07 Jan 2019

Exploration by Random Network Distillation
Yuri Burda, Harrison Edwards, Amos Storkey, Oleg Klimov
30 Oct 2018

Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
16 Feb 2018

Page 1 of 2