2412.21199
Cited By
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
3 January 2025
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
Papers citing
"HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation"
2 / 2 papers shown
Title
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Lennart Luettgau
Harry Coppock
Magda Dubois
Christopher Summerfield
Cozmin Ududec
9
0
0
08 May 2025
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
Sizhe Wang
Z. Wang
Dongsheng Ma
Yongan Yu
Rui Ling
Z. Li
Feiyu Xiong
W. Zhang
LRM
45
0
0
30 Apr 2025