2310.03302
MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation
5 October 2023
Qian Huang
Jian Vora
Percy Liang
J. Leskovec
ELM
LLMAG
Papers citing
"MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation"
21 / 21 papers shown
Title
MLE-Dojo: Interactive Environments for Empowering LLM Agents in Machine Learning Engineering
Rushi Qiang
Yuchen Zhuang
Yinghao Li
D. Kilman
Rongzhi Zhang
...
Ian Shu-Hei Wong
Sherry Yang
Percy Liang
Chao Zhang
Bo Dai
ELM
39
0
0
12 May 2025
Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey
Da Zheng
Lun Du
Junwei Su
Yuchen Tian
Yuqi Zhu
Jintian Zhang
Lanning Wei
Ningyu Zhang
H. Chen
LRM
54
0
0
06 May 2025
ResearchCodeAgent: An LLM Multi-Agent System for Automated Codification of Research Methodologies
Shubham Gandhi
Dhruv Shah
Manasi S. Patwardhan
L. Vig
Gautam M. Shroff
LLMAG
AI4CE
113
0
0
28 Apr 2025
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
S. Hwang
AI4CE
37
0
0
24 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Lijun Sun
Roger Zimmermann
Jinhua Zhao
AI4CE
60
0
0
15 Apr 2025
ELT-Bench: An End-to-End Benchmark for Evaluating AI Agents on ELT Pipelines
Tengjun Jin
Yuxuan Zhu
Daniel Kang
LMTD
ELM
47
0
0
07 Apr 2025
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace
Oliver Jaffe
Dane Sherburn
James Aung
Jun Shern Chan
...
Benjamin Kinsella
Wyatt Thompson
Johannes Heidecke
Amelia Glaese
Tejal Patwardhan
ALM
ELM
791
6
0
02 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
59
1
0
31 Mar 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
77
6
0
18 Mar 2025
SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing
Xiangchao Yan
Shiyang Feng
Jiakang Yuan
Renqiu Xia
Bin Wang
Bo Zhang
Lei Bai
60
2
0
06 Mar 2025
AIDE: AI-Driven Exploration in the Space of Code
Zhengyao Jiang
Dominik Schmidt
Dhruv Srikanth
Dixing Xu
Ian Kaplan
Deniss Jacenko
Yuxiang Wu
67
5
0
18 Feb 2025
DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration
Sizhe Liu
Y. Lu
Siyu Chen
Xiyang Hu
Jieyu Zhao
Tianfan Fu
Yue Zhao
LLMAG
79
6
0
24 Nov 2024
Automating Traffic Model Enhancement with AI Research Agent
Xusen Guo
Xinxi Yang
Mingxing Peng
Hongliang Lu
Meixin Zhu
Hai Yang
62
0
0
25 Sep 2024
AI Agents That Matter
Sayash Kapoor
Benedikt Stroebl
Zachary S. Siegel
Nitya Nadgir
Arvind Narayanan
41
33
0
01 Jul 2024
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
Peter Alexander Jansen
Marc-Alexandre Côté
Tushar Khot
Erin Bransom
Bhavana Dalvi Mishra
Bodhisattwa Prasad Majumder
Oyvind Tafjord
Peter Clark
LLMAG
35
21
0
10 Jun 2024
SciMON: Scientific Inspiration Machines Optimized for Novelty
Qingyun Wang
Doug Downey
Heng Ji
Tom Hope
LLMAG
26
61
0
23 May 2023
AutoML-GPT: Automatic Machine Learning with GPT
Shujian Zhang
Chengyue Gong
Lemeng Wu
Xingchao Liu
Mi Zhou
LLMAG
52
59
0
04 May 2023
Automated Scientific Discovery: From Equation Discovery to Autonomous Discovery Systems
Stefan Kramer
Mattia Cerrato
S. Džeroski
R. King
26
10
0
03 May 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
215
1,733
0
07 Apr 2023
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
233
2,477
0
06 Oct 2022
The CLRS Algorithmic Reasoning Benchmark
Petar Veličković
Adria Puigdomenech Badia
David Budden
Razvan Pascanu
Andrea Banino
Mikhail Dashevskiy
R. Hadsell
Charles Blundell
157
87
0
31 May 2022