arXiv:2402.09880
Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence
15 February 2024
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
ALM
ELM
Papers citing "Inadequacies of Large Language Model Benchmarks in the Era of Generative Artificial Intelligence"
8 of 8 papers shown
Virology Capabilities Test (VCT): A Multimodal Virology Q&A Benchmark
Jasper Götting
Pedro Medeiros
Jon G Sanders
Nathaniel Li
Long Phan
Karam Elabd
Lennart Justen
Dan Hendrycks
Seth Donoughe
ELM
47
2
0
21 Apr 2025
Latent Convergence Modulation in Large Language Models: A Novel Approach to Iterative Contextual Realignment
Patricia Porretta
Sylvester Pakenham
Huxley Ainsworth
Gregory Chatten
Godfrey Allerton
Simon Hollingsworth
Vance Periwinkle
52
0
0
10 Feb 2025
Semantic Layered Embedding Diffusion in Large Language Models for Multi-Contextual Consistency
Irin Kabakum
Thomas Montgomery
Daniel Ravenwood
Genevieve Harrington
30
0
0
26 Jan 2025
Surveying Attitudinal Alignment Between Large Language Models Vs. Humans Towards 17 Sustainable Development Goals
Qingyang Wu
Ying Xu
Tingsong Xiao
Yunze Xiao
Yitong Li
...
Yichi Zhang
Shanghai Zhong
Yuwei Zhang
Wei Lu
Yifan Yang
61
1
0
17 Jan 2025
AI Predicts AGI: Leveraging AGI Forecasting and Peer Review to Explore LLMs' Complex Reasoning Capabilities
Fabrizio Davide
Pietro Torre
Andrea Gaggioli
ELM
79
0
0
12 Dec 2024
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Malka N. Halgamuge
79
46
0
18 Dec 2023
Don't Make Your LLM an Evaluation Benchmark Cheater
Kun Zhou
Yutao Zhu
Zhipeng Chen
Wentong Chen
Wayne Xin Zhao
Xu Chen
Yankai Lin
Ji-Rong Wen
Jiawei Han
ELM
99
136
0
03 Nov 2023
LegalBench: Prototyping a Collaborative Benchmark for Legal Reasoning
Neel Guha
Daniel E. Ho
Julian Nyarko
Christopher Ré
AILaw
ELM
89
16
0
13 Sep 2022