Revealing the structure of language model capabilities

14 June 2023

Andrew R. A. Conway

José Hernández Orallo

Papers citing "Revealing the structure of language model capabilities"

RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction. Jianhao Yan, Yun Luo, Yue Zhang. 25 Feb 2025.
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families. Felipe Maia Polo, S. Kamath S, Leshem Choshen, Yuekai Sun, Mikhail Yurochkin. 09 Dec 2024.
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do. Guijin Son, Hyunwoo Ko, Hoyoung Lee, Yewon Kim, Seunghyeok Hong. 17 Sep 2024.
100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances. Lorenzo Pacchiardi, Lucy G. Cheke, José Hernández Orallo. 05 Sep 2024.
AutoBencher: Towards Declarative Benchmark Construction. Xiang Lisa Li, E. Liu, Percy Liang, Tatsunori Hashimoto. 11 Jul 2024.
When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives. Yebowen Hu, Kaiqiang Song, Sangwoo Cho, Xiaoyang Wang, Wenlin Yao, H. Foroosh, Dong Yu, Fei Liu. 17 Jun 2024.
Dissociation of Faithful and Unfaithful Reasoning in LLMs. Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, R. Paturi, Leon Bergen. 23 May 2024.
Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches. Pablo Biedma, Xiaoyuan Yi, Linus Huang, Maosong Sun, Xing Xie. 19 Apr 2024.
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach. Kun Sun, Rong Wang, Anders Sogaard. 22 Mar 2024.
Dynamic Evaluation of Large Language Models by Meta Probing Agents. Kaijie Zhu, Jindong Wang, Qinlin Zhao, Ruochen Xu, Xing Xie. 21 Feb 2024.
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation. Yikun Wang, Rui Zheng, Haoming Li, Qi Zhang, Tao Gui, Fei Liu. 15 Nov 2023.
Evaluating General-Purpose AI with Psychometrics. Xiting Wang, Liming Jiang, Jose Hernandez-Orallo, David Stillwell, Luning Sun, Fang Luo, Xing Xie. 25 Oct 2023.
Language Models as a Service: Overview of a New Paradigm and its Challenges. Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony Cohn, Nigel Shadbolt, Michael Wooldridge. 28 Sep 2023.