Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

27 September 2024

Tom Goldstein

Anima Anandkumar

Furong Huang

Papers citing "Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization"


A Weighted Byzantine Fault Tolerance Consensus Driven Trusted Multiple Large Language Models Network
Haoxiang Luo, Gang Sun, Yinqiu Liu, Dongcheng Zhao, Dusit Niyato, Hongfang Yu, Schahram Dustdar
08 May 2025

Beyond the Singular: The Essential Role of Multiple Generations in Effective Benchmark Evaluation and Analysis
Wenbo Zhang, Hengrui Cai, Wenyu Chen
17 Feb 2025

EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM?
Aakriti Agrawal, Mucong Ding, Zora Che, Chenghao Deng, Anirudh Satheesh, John Langford, Furong Huang
06 Oct 2024