Comparing Test Sets with Item Response Theory

1 June 2021 · arXiv:2106.00840
Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Sam Bowman

Papers citing "Comparing Test Sets with Item Response Theory"

9 papers shown

Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta, Candace Ross, David Pantoja, R. Passonneau, Megan Ung, Adina Williams
26 Oct 2024

Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin
27 May 2024

Computational modeling of semantic change
Nina Tahmasebi, Haim Dubossarsky
13 Apr 2023

MonoByte: A Pool of Monolingual Byte-level Language Models
Hugo Queiroz Abonizio, Leandro Rodrigues de Souza, R. Lotufo, Rodrigo Nogueira
22 Sep 2022

py-irt: A Scalable Item Response Theory Library for Python
John P. Lalor, Pedro Rodriguez
02 Mar 2022

Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants
Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin Jia, Douwe Kiela
16 Dec 2021

Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair
Jason Phang, Angelica Chen, William Huang, Samuel R. Bowman
16 Nov 2021

The Benchmark Lottery
Mostafa Dehghani, Yi Tay, A. Gritsenko, Zhe Zhao, N. Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals
14 Jul 2021

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018