Evaluating General-Purpose AI with Psychometrics

25 October 2023

Jose Hernandez-Orallo

David Stillwell

Xing Xie

Papers citing "Evaluating General-Purpose AI with Psychometrics"

Title | Authors | Tags | Counts | Date
A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models | Elena Kardanova, Alina Ivanova, Ksenia Tarasova, Taras Pashchenko, Aleksei Tikhoniuk, Elen Yusupova, Anatoly Kasprzhak, Yaroslav Kuzminov, Ekaterina Kruchinskaia, Irina Brun | | 35 1 0 | 29 Oct 2024
Cognitive phantoms in LLMs through the lens of latent variables | Sanne Peereboom, Inga Schwabe, Bennett Kleinberg | | 23 0 0 | 06 Sep 2024
Evaluating AI Evaluation: Perils and Prospects | John Burden | ELM | 25 8 0 | 12 Jul 2024
Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models | Bei Yan, Jie Zhang, Zheng Yuan, Shiguang Shan, Xilin Chen | VLM | 25 4 0 | 24 Jun 2024
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework | Olivier Binette, Jerome P. Reiter | | 23 0 0 | 14 Jun 2024
An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI | Ross Gruetzemacher, Alan Chan, Kevin Frazier, Christy Manning, Stepán Los, ..., Clíodhna Ní Ghuidhir, Mark M. Bailey, Daniel Eth, Toby D. Pilditch, Kyle A. Kilian | | 16 5 0 | 22 Oct 2023
Position: AI Evaluation Should Learn from How We Test Humans | Yan Zhuang, Q. Liu, Yuting Ning, Wei Huang, Rui Lv, Zhenya Huang, Guanhao Zhao, Zheng-Wei Zhang | ELM, ALM | 62 21 0 | 18 Jun 2023
Revealing the structure of language model capabilities | Ryan Burnell, Hank Hao, Andrew R. A. Conway, José Hernández Orallo | ELM | 37 17 0 | 14 Jun 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang | ELM, AI4MH, AI4CE, ALM | 203 2,232 0 | 22 Mar 2023
Using cognitive psychology to understand GPT-3 | Marcel Binz, Eric Schulz | ELM, LLMAG | 236 435 0 | 21 Jun 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models | Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou | LM&Ro, LRM, AI4CE, ReLM | 315 8,261 0 | 28 Jan 2022