MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks

18 April 2025
Jaime Raldua Veuthey
Zainab Ali Majid
Suhas Hariharan
Jacob Haimes
Abstract

As Large Language Models (LLMs) advance, their potential for widespread societal impact grows as well. Rigorous LLM evaluations are therefore both a technical necessity and a social imperative. While numerous evaluation benchmarks have been developed, a critical gap remains in meta-evaluation: effectively assessing the quality of the benchmarks themselves. We propose MEQA, a framework for the meta-evaluation of question and answer (QA) benchmarks, to provide standardized assessments and quantifiable scores, and to enable meaningful intra-benchmark comparisons. We demonstrate this approach on cybersecurity benchmarks, using both human and LLM evaluators, and highlight the benchmarks' strengths and weaknesses. Our choice of test domain is motivated by AI models' dual nature as powerful defensive tools and as security threats.
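To make the idea of "quantifiable scores" from human and LLM evaluators concrete, here is a minimal sketch of a meta-evaluation scoring loop. The criteria wording, the 1-5 scale, and the `ask_llm_evaluator` helper are illustrative assumptions, not the paper's actual rubric or implementation.

```python
# Hypothetical sketch of a QA-benchmark meta-evaluation loop (not MEQA's actual code).
from statistics import mean

# Example criteria an evaluator might rate each benchmark item against (assumed, not from the paper).
CRITERIA = [
    "The question is unambiguous and answerable from the stated context.",
    "The gold answer is correct and verifiable.",
    "The question reflects realistic difficulty for the target domain (here: cybersecurity).",
]


def ask_llm_evaluator(question: str, gold_answer: str, criterion: str) -> int:
    """Placeholder for a human or LLM judge returning a 1-5 rating for one criterion."""
    raise NotImplementedError("Plug in an LLM call or human annotation pipeline here.")


def meta_evaluate(benchmark: list[dict]) -> dict:
    """Average each criterion's ratings over all QA items, then aggregate into one score."""
    per_criterion = {
        c: mean(
            ask_llm_evaluator(item["question"], item["answer"], c)
            for item in benchmark
        )
        for c in CRITERIA
    }
    return {"per_criterion": per_criterion, "overall": mean(per_criterion.values())}
```

Under this kind of scheme, per-criterion averages expose where a benchmark is weak (e.g., ambiguous questions vs. incorrect gold answers), while the aggregate score supports standardized comparison across benchmarks.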

@article{veuthey2025_2504.14039,
  title={MEQA: A Meta-Evaluation Framework for Question & Answer LLM Benchmarks},
  author={Jaime Raldua Veuthey and Zainab Ali Majid and Suhas Hariharan and Jacob Haimes},
  journal={arXiv preprint arXiv:2504.14039},
  year={2025}
}