arXiv:2505.20538 (v3, latest)
AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy
26 May 2025
Sebastian Antony Joseph, Syed Murtaza Husain, Stella S. R. Offner, Stéphanie Juneau, Paul Torrey, Adam S. Bolton, Juan P. Farias, Niall Gaffney, Greg Durrett, Junyi Jessy Li
Papers citing "AstroVisBench: A Code Benchmark for Scientific Computing and Visualization in Astronomy"
6 / 6 papers shown
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace, Oliver Jaffe, Dane Sherburn, James Aung, Jun Shern Chan, ..., Benjamin Kinsella, Wyatt Thompson, Johannes Heidecke, Amelia Glaese, Tejal Patwardhan
ALM
ELM
969
23
0
02 Apr 2025
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
Ling Zhong, Yujing Lu, Jing Yang, Weiming Li, Peng Wei, Yongheng Wang, Manni Duan, Qing Zhang
148
2
0
25 Mar 2025
CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation
Peter Alexander Jansen, Oyvind Tafjord, Marissa Radensky, Pao Siangliulue, Tom Hope, Bhavana Dalvi Mishra, Bodhisattwa Prasad Majumder, Daniel S. Weld, Peter Clark
87
7
0
20 Mar 2025
CSR-Bench: Benchmarking LLM Agents in Deployment of Computer Science Research Repositories
Yijia Xiao, Runhui Wang, Luyang Kong, Davor Golac, Wei Wang
LLMAG
472
3
0
10 Feb 2025
Gravity-Bench-v1: A Benchmark on Gravitational Physics Discovery for Agents
Nolan Koblischke, Hyunseok Jang, Kristen Menou, M. Ali-Dib
155
2
0
30 Jan 2025
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo, Minh Chien Vu, Jenny Chim, Han Hu, Wenhao Yu, ..., David Lo, Daniel Fried, Xiaoning Du, H. D. Vries, Leandro von Werra
235
193
0
22 Jun 2024