Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence

15 February 2024

Timothy R. McIntosh

Malka N. Halgamuge

Papers citing "Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence"

Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark. Jasper Götting, Pedro Medeiros, Jon G Sanders, Nathaniel Li, Long Phan, Karam Elabd, Lennart Justen, Dan Hendrycks, Seth Donoughe. [ELM] 21 Apr 2025
Latent Convergence Modulation in Large Language Models: A Novel Approach to Iterative Contextual Realignment. Patricia Porretta, Sylvester Pakenham, Huxley Ainsworth, Gregory Chatten, Godfrey Allerton, Simon Hollingsworth, Vance Periwinkle. 10 Feb 2025
Semantic Layered Embedding Diffusion in Large Language Models for Multi-Contextual Consistency. Irin Kabakum, Thomas Montgomery, Daniel Ravenwood, Genevieve Harrington. 26 Jan 2025
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals. Qingyang Wu, Ying Xu, Tingsong Xiao, Yunze Xiao, Yitong Li, ..., Yichi Zhang, Shanghai Zhong, Yuwei Zhang, Wei Lu, Yifan Yang. 17 Jan 2025
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities. Fabrizio Davide, Pietro Torre, Andrea Gaggioli. [ELM] 12 Dec 2024
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape. Timothy R. McIntosh, Teo Susnjak, Tong Liu, Paul Watters, Malka N. Halgamuge. 18 Dec 2023
Don't Make Your LLM an Evaluation Benchmark Cheater. Kun Zhou, Yutao Zhu, Zhipeng Chen, Wentong Chen, Wayne Xin Zhao, Xu Chen, Yankai Lin, Ji-Rong Wen, Jiawei Han. [ELM] 03 Nov 2023
LegalBench: Prototyping a Collaborative Benchmark for Legal Reasoning. Neel Guha, Daniel E. Ho, Julian Nyarko, Christopher Ré. [AILaw, ELM] 13 Sep 2022