Comparing Test Sets with Item Response Theory

1 June 2021 · arXiv:2106.00840
Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Sam Bowman

Papers citing "Comparing Test Sets with Item Response Theory"

9 papers shown

Improving Model Evaluation using SMART Filtering of Benchmark Datasets
Vipul Gupta, Candace Ross, David Pantoja, R. Passonneau, Megan Ung, Adina Williams
26 Oct 2024

Efficient multi-prompt evaluation of LLMs
Felipe Maia Polo, Ronald Xu, Lucas Weber, Mírian Silva, Onkar Bhardwaj, Leshem Choshen, Allysson Flavio Melo de Oliveira, Yuekai Sun, Mikhail Yurochkin
27 May 2024

Computational modeling of semantic change
Nina Tahmasebi, Haim Dubossarsky
13 Apr 2023

MonoByte: A Pool of Monolingual Byte-level Language Models
Hugo Queiroz Abonizio, Leandro Rodrigues de Souza, R. Lotufo, Rodrigo Nogueira
22 Sep 2022

py-irt: A Scalable Item Response Theory Library for Python
John P. Lalor, Pedro Rodriguez
02 Mar 2022

Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants
Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin Jia, Douwe Kiela
16 Dec 2021

Adversarially Constructed Evaluation Sets Are More Challenging, but May Not Be Fair
Jason Phang, Angelica Chen, William Huang, Samuel R. Bowman
16 Nov 2021

The Benchmark Lottery
Mostafa Dehghani, Yi Tay, A. Gritsenko, Zhe Zhao, N. Houlsby, Fernando Diaz, Donald Metzler, Oriol Vinyals
14 Jul 2021

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
20 Apr 2018