arXiv:2306.10062
Revealing the structure of language model capabilities
14 June 2023
Ryan Burnell
Hank Hao
Andrew R. A. Conway
José Hernández Orallo
ELM
Papers citing "Revealing the structure of language model capabilities"
13 / 13 papers shown
RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction
Jianhao Yan
Yun Luo
Yue Zhang
LLMAG
50
1
0
25 Feb 2025
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Felipe Maia Polo
S. Kamath S
Leshem Choshen
Yuekai Sun
Mikhail Yurochkin
76
5
0
09 Dec 2024
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
Guijin Son
Hyunwoo Ko
Hoyoung Lee
Yewon Kim
Seunghyeok Hong
ALM
ELM
30
5
0
17 Sep 2024
100 instances is all you need: predicting the success of a new LLM on unseen data by testing on a few instances
Lorenzo Pacchiardi
Lucy G. Cheke
José Hernández Orallo
ALM
LRM
ELM
32
3
0
05 Sep 2024
AutoBencher: Towards Declarative Benchmark Construction
Xiang Lisa Li
E. Liu
Percy Liang
Tatsunori Hashimoto
35
1
0
11 Jul 2024
When Reasoning Meets Information Aggregation: A Case Study with Sports Narratives
Yebowen Hu
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
Wenlin Yao
H. Foroosh
Dong Yu
Fei Liu
32
6
0
17 Jun 2024
Dissociation of Faithful and Unfaithful Reasoning in LLMs
Evelyn Yee
Alice Li
Chenyu Tang
Yeon Ho Jung
R. Paturi
Leon Bergen
LRM
24
4
0
23 May 2024
Beyond Human Norms: Unveiling Unique Values of Large Language Models through Interdisciplinary Approaches
Pablo Biedma
Xiaoyuan Yi
Linus Huang
Maosong Sun
Xing Xie
PILM
32
1
0
19 Apr 2024
Comprehensive Reassessment of Large-Scale Evaluation Outcomes in LLMs: A Multifaceted Statistical Approach
Kun Sun
Rong Wang
Anders Sogaard
24
3
0
22 Mar 2024
Dynamic Evaluation of Large Language Models by Meta Probing Agents
Kaijie Zhu
Jindong Wang
Qinlin Zhao
Ruochen Xu
Xing Xie
30
30
0
21 Feb 2024
Rescue: Ranking LLM Responses with Partial Ordering to Improve Response Generation
Yikun Wang
Rui Zheng
Haoming Li
Qi Zhang
Tao Gui
Fei Liu
OffRL
14
3
0
15 Nov 2023
Evaluating General-Purpose AI with Psychometrics
Xiting Wang
Liming Jiang
Jose Hernandez-Orallo
David Stillwell
Luning Sun
Fang Luo
Xing Xie
AI4MH
ELM
17
12
0
25 Oct 2023
Language Models as a Service: Overview of a New Paradigm and its Challenges
Emanuele La Malfa
Aleksandar Petrov
Simon Frieder
Christoph Weinhuber
Ryan Burnell
Raza Nazar
Anthony Cohn
Nigel Shadbolt
Michael Wooldridge
ALM
ELM
22
3
0
28 Sep 2023