ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark

17 September 2024
Zachary S. Siegel
Sayash Kapoor
Nitya Nagdir
Benedikt Stroebl
Arvind Narayanan
arXiv: 2409.11363

Papers citing "CORE-Bench: Fostering the Credibility of Published Research Through a Computational Reproducibility Agent Benchmark"

8 / 8 papers shown
FEABench: Evaluating Language Models on Multiphysics Reasoning Ability
N. Mudur
Hao Cui
Subhashini Venugopalan
Paul Raccuglia
M. Brenner
Peter C. Norgaard
LLMAG
ELM
LRM
38
0
0
08 Apr 2025
PaperBench: Evaluating AI's Ability to Replicate AI Research
Giulio Starace
Oliver Jaffe
Dane Sherburn
James Aung
Jun Shern Chan
...
Benjamin Kinsella
Wyatt Thompson
Johannes Heidecke
Amelia Glaese
Tejal Patwardhan
ALM
ELM
772
5
0
02 Apr 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at ResearchTrend Connect | LLMAG on 07 May 2025
93
5
0
20 Mar 2025
Landscape Complexity for the Empirical Risk of Generalized Linear Models: Discrimination between Structured Data
Theodoros G. Tsironis
Aris L. Moustakas
52
1
0
18 Mar 2025
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
Nicholas Carlini
Javier Rando
Edoardo Debenedetti
Milad Nasr
F. Tramèr
AAML
ELM
39
1
0
03 Mar 2025
From Hypothesis to Publication: A Comprehensive Survey of AI-Driven Research Support Systems
Zekun Zhou
Xiaocheng Feng
L. Huang
Xiachong Feng
Ziyun Song
...
Baoxin Wang
Dayong Wu
Guoping Hu
Ting Liu
Bing Qin
AI4TS
66
0
0
03 Mar 2025
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology
Ludovico Mitchener
Jon M. Laurent
Benjamin Tenmann
Siddharth Narayanan
Geemi P Wellawatte
A. White
Lorenzo Sani
Samuel G. Rodriques
LLMAG
LM&MA
ELM
62
2
0
28 Feb 2025
The AI Agent Index
Stephen Casper
Luke Bailey
Rosco Hunter
Carson Ezell
Emma Cabalé
...
Phillip J. K. Christoffersen
A. Pinar Ozisik
Rakshit Trivedi
Dylan Hadfield-Menell
Noam Kolt
66
4
0
03 Feb 2025