ResearchTrend.AI
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents

10 June 2024
Peter Alexander Jansen
Marc-Alexandre Côté
Tushar Khot
Erin Bransom
Bhavana Dalvi Mishra
Bodhisattwa Prasad Majumder
Oyvind Tafjord
Peter Clark
    LLMAG

Papers citing "DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents"

20 / 20 papers shown
Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction
Vaishnavh Nagarajan
Chen Henry Wu
Charles Ding
Aditi Raghunathan
26
0
0
21 Apr 2025
Sparks of Science: Hypothesis Generation Using Structured Paper Data
Charles O'Neill
Tirthankar Ghosal
Roberta Răileanu
Mike Walmsley
Thang Bui
Kevin Schawinski
I. Ciucă
LRM
49
0
0
17 Apr 2025
HypoBench: Towards Systematic and Principled Benchmarking for Hypothesis Generation
Haokun Liu
Sicong Huang
Jingyu Hu
Yangqiaoyu Zhou
Chenhao Tan
25
0
0
15 Apr 2025
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace
Oliver Jaffe
Dane Sherburn
James Aung
Jun Shern Chan
...
Benjamin Kinsella
Wyatt Thompson
Johannes Heidecke
Amelia Glaese
Tejal Patwardhan
ALM
ELM
769
5
0
02 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
53
0
0
31 Mar 2025
debug-gym: A Text-Based Environment for Interactive Debugging
Xingdi Yuan
Morgane M Moss
Charbel El Feghali
Chinmay Singh
Darya Moldavskaya
...
Lucas Page-Caccia
Matheus Pereira
Minseon Kim
Alessandro Sordoni
Marc-Alexandre Côté
LLMAG
68
1
0
27 Mar 2025
AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall
Michael Moor
52
2
0
23 Mar 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
93
5
0
20 Mar 2025
LLM Agents for Education: Advances and Applications
Zhendong Chu
Shen Wang
Jian Xie
Tinghui Zhu
Yibo Yan
...
Aoxiao Zhong
Xuming Hu
Jing Liang
Philip S. Yu
Qingsong Wen
LLMAG
ELM
103
1
0
14 Mar 2025
Deep Learning based discovery of Integrable Systems
Shailesh Lal
Suvajit Majumder
E. Sobko
36
0
0
13 Mar 2025
From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems
Zekun Zhou
Xiaocheng Feng
L. Huang
Xiachong Feng
Ziyun Song
...
Baoxin Wang
Dayong Wu
Guoping Hu
Ting Liu
Bing Qin
AI4TS
66
0
0
03 Mar 2025
MLGym: A New Framework and Benchmark for Advancing AI Research Agents
Deepak Nathani
Lovish Madaan
Nicholas Roberts
Nikolay Bashlykov
Ajay Menon
...
Tatiana Shavrina
Jakob Foerster
Yoram Bachrach
William Yang Wang
Roberta Raileanu
LLMAG
75
7
0
21 Feb 2025
Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization
M. L. Olson
Neale Ratzlaff
Musashi Hinck
Man Luo
Sungduk Yu
Chendi Xue
Vasudev Lal
MoE
LRM
41
1
0
15 Feb 2025
The AI Agent Index
Stephen Casper
Luke Bailey
Rosco Hunter
Carson Ezell
Emma Cabalé
...
Phillip J. K. Christoffersen
A. Pinar Ozisik
Rakshit Trivedi
Dylan Hadfield-Menell
Noam Kolt
63
4
0
03 Feb 2025
MetaScientist: A Human-AI Synergistic Framework for Automated Mechanical Metamaterial Design
Jingyuan Qi
Z. Jia
Minqian Liu
Wangzhi Zhan
Junkai Zhang
...
Muhao Chen
Dawei Zhou
Ling Li
Wei Wang
Lifu Huang
AI4CE
71
1
0
20 Dec 2024
Cocoa: Co-Planning and Co-Execution with AI Agents
K. J. Kevin Feng
Kevin Pu
Matt Latzke
Tal August
Pao Siangliulue
Jonathan Bragg
Daniel S. Weld
Amy X. Zhang
Joseph Chee Chang
LM&Ro
LLMAG
87
4
0
14 Dec 2024
Agent-as-a-Judge: Evaluate Agents with Agents
Mingchen Zhuge
Changsheng Zhao
Dylan R. Ashley
Wenyi Wang
Dmitrii Khizbullin
...
Raghuraman Krishnamoorthi
Yuandong Tian
Yangyang Shi
Vikas Chandra
Jürgen Schmidhuber
ELM
57
32
0
14 Oct 2024
SUPER: Evaluating Agents on Setting Up and Executing Tasks from Research Repositories
Ben Bogin
Kejuan Yang
Shashank Gupta
Kyle Richardson
Erin Bransom
Peter Clark
Ashish Sabharwal
Tushar Khot
ELM
LRM
34
9
0
11 Sep 2024
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research
Mikayel Samvelyan
Robert Kirk
Vitaly Kurin
Jack Parker-Holder
Minqi Jiang
Eric Hambro
Fabio Petroni
Heinrich Küttler
Edward Grefenstette
Tim Rocktäschel
OffRL
220
89
0
27 Sep 2021