arXiv:2007.04792
Targeting the Benchmark: On Methodology in Current Natural Language Processing Research
7 July 2020
David Schlangen
Papers citing "Targeting the Benchmark: On Methodology in Current Natural Language Processing Research"
12 / 12 papers shown
Title
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez Llorca
ELM
135
1
0
10 Feb 2025
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
48
19
0
14 Aug 2023
Weisfeiler and Leman Go Measurement Modeling: Probing the Validity of the WL Test
Arjun Subramonian
Adina Williams
Maximilian Nickel
Yizhou Sun
Levent Sagun
21
0
0
11 Jul 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
P. Sadler
David Schlangen
21
2
0
24 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
80
1,147
0
17 May 2023
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy
David Schlangen
ELM
21
13
0
14 Apr 2023
The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation
Barbara Plank
30
97
0
04 Nov 2022
StyLEx: Explaining Style Using Human Lexical Annotations
Shirley Anugrah Hayati
Kyumin Park
Dheeraj Rajagopal
Lyle Ungar
Dongyeop Kang
20
3
0
14 Oct 2022
Underspecification in Scene Description-to-Depiction Tasks
Ben Hutchinson
Jason Baldridge
Vinodkumar Prabhakaran
DiffM
66
32
0
11 Oct 2022
Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR
Nina Markl
S. McNulty
22
9
0
25 Feb 2022
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Bernard Koch
Emily L. Denton
A. Hanna
J. Foster
36
140
0
03 Dec 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018