Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents

Abstract

Scientific experimentation, a cornerstone of human progress, demands rigor in reliability, methodical control, and interpretability to yield meaningful results. Despite the growing capabilities of large language models (LLMs) in automating different aspects of the scientific process, automating rigorous experimentation remains a significant challenge. To address this gap, we propose Curie, an AI agent framework designed to embed rigor into the experimentation process through three key components: an intra-agent rigor module to enhance reliability, an inter-agent rigor module to maintain methodical control, and an experiment knowledge module to enhance interpretability. To evaluate Curie, we design a novel experimental benchmark composed of 46 questions across four computer science domains, derived from influential research papers and widely adopted open-source projects. Compared to the strongest baseline tested, we achieve a 3.4× improvement in correctly answering experimental questions. Curie is open-sourced at this https URL.

@article{kon2025_2502.16069,
  title={Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents},
  author={Patrick Tser Jern Kon and Jiachen Liu and Qiuyi Ding and Yiming Qiu and Zhenning Yang and Yibo Huang and Jayanth Srinivasa and Myungjin Lee and Mosharaf Chowdhury and Ang Chen},
  journal={arXiv preprint arXiv:2502.16069},
  year={2025}
}