Measuring what Matters: Construct Validity in Large Language Model Benchmarks

3 November 2025
Andrew M. Bean
Ryan Kearns
Angelika Romanou
Franziska Sofia Hafner
Harry Mayne
Jan Batzner
Negar Foroutan
Chris Schmitz
Karolina Korgul
Hunar Batra
Oishi Deb
Emma Beharry
Cornelius Emde
Thomas Foster
Anna Gausen
María Grandury
Simeng Han
Valentin Hofmann
Lujain Ibrahim
Hazel Kim
Hannah Rose Kirk
Fangru Lin
Gabrielle Kaili-May Liu
Lennart Luettgau
Jabez Magomere
Jonathan Rystrøm
Anna Sotnikova
Yushi Yang
Yilun Zhao
Adel Bibi
Antoine Bosselut
Ronald Clark
Arman Cohan
Jakob N. Foerster
Y. Gal
Scott A. Hale
Inioluwa Deborah Raji
Christopher Summerfield
Philip Torr
Cozmin Ududec
Luc Rocher
Adam Mahdi
    ALM
arXiv: 2511.04703

Papers citing "Measuring what Matters: Construct Validity in Large Language Model Benchmarks"

No papers found.