2310.16379
Evaluating General-Purpose AI with Psychometrics
25 October 2023
Xiting Wang
Liming Jiang
Jose Hernandez-Orallo
David Stillwell
Luning Sun
Fang Luo
Xing Xie
AI4MH
ELM
Papers citing
"Evaluating General-Purpose AI with Psychometrics"
11 / 11 papers shown
A Novel Psychometrics-Based Approach to Developing Professional Competency Benchmark for Large Language Models
Elena Kardanova
Alina Ivanova
Ksenia Tarasova
Taras Pashchenko
Aleksei Tikhoniuk
Elen Yusupova
Anatoly Kasprzhak
Yaroslav Kuzminov
Ekaterina Kruchinskaia
Irina Brun
35
1
0
29 Oct 2024
Cognitive phantoms in LLMs through the lens of latent variables
Sanne Peereboom
Inga Schwabe
Bennett Kleinberg
23
0
0
06 Sep 2024
Evaluating AI Evaluation: Perils and Prospects
John Burden
ELM
25
8
0
12 Jul 2024
Evaluating the Quality of Hallucination Benchmarks for Large Vision-Language Models
Bei Yan
Jie Zhang
Zheng Yuan
Shiguang Shan
Xilin Chen
VLM
25
4
0
24 Jun 2024
Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework
Olivier Binette
Jerome P. Reiter
23
0
0
14 Jun 2024
An International Consortium for Evaluations of Societal-Scale Risks from Advanced AI
Ross Gruetzemacher
Alan Chan
Kevin Frazier
Christy Manning
Stepán Los
...
Clíodhna Ní Ghuidhir
Mark M. Bailey
Daniel Eth
Toby D. Pilditch
Kyle A. Kilian
16
5
0
22 Oct 2023
Position: AI Evaluation Should Learn from How We Test Humans
Yan Zhuang
Q. Liu
Yuting Ning
Wei Huang
Rui Lv
Zhenya Huang
Guanhao Zhao
Zheng-Wei Zhang
ELM
ALM
62
21
0
18 Jun 2023
Revealing the structure of language model capabilities
Ryan Burnell
Hank Hao
Andrew R. A. Conway
José Hernández Orallo
ELM
37
17
0
14 Jun 2023
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck
Varun Chandrasekaran
Ronen Eldan
J. Gehrke
Eric Horvitz
...
Scott M. Lundberg
Harsha Nori
Hamid Palangi
Marco Tulio Ribeiro
Yi Zhang
ELM
AI4MH
AI4CE
ALM
203
2,232
0
22 Mar 2023
Using cognitive psychology to understand GPT-3
Marcel Binz
Eric Schulz
ELM
LLMAG
236
435
0
21 Jun 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022