Targeting the Benchmark: On Methodology in Current Natural Language Processing Research

7 July 2020
David Schlangen
arXiv: 2007.04792

Papers citing "Targeting the Benchmark: On Methodology in Current Natural Language Processing Research"

12 / 12 papers shown
Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Maria Eriksson
Erasmo Purificato
Arman Noroozian
Joao Vinagre
Guillaume Chaslot
Emilia Gomez
David Fernandez Llorca
ELM
135
1
0
10 Feb 2025
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
48
19
0
14 Aug 2023
Weisfeiler and Leman Go Measurement Modeling: Probing the Validity of the WL Test
Arjun Subramonian
Adina Williams
Maximilian Nickel
Yizhou Sun
Levent Sagun
21
0
0
11 Jul 2023
Pento-DIARef: A Diagnostic Dataset for Learning the Incremental Algorithm for Referring Expression Generation from Examples
P. Sadler
David Schlangen
21
2
0
24 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
80
1,147
0
17 May 2023
Dialogue Games for Benchmarking Language Understanding: Motivation, Taxonomy, Strategy
David Schlangen
ELM
21
13
0
14 Apr 2023
The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation
Barbara Plank
30
97
0
04 Nov 2022
StyLEx: Explaining Style Using Human Lexical Annotations
Shirley Anugrah Hayati
Kyumin Park
Dheeraj Rajagopal
Lyle Ungar
Dongyeop Kang
20
3
0
14 Oct 2022
Underspecification in Scene Description-to-Depiction Tasks
Ben Hutchinson
Jason Baldridge
Vinodkumar Prabhakaran
DiffM
66
32
0
11 Oct 2022
Language technology practitioners as language managers: arbitrating data bias and predictive bias in ASR
Nina Markl
S. McNulty
22
9
0
25 Feb 2022
Reduced, Reused and Recycled: The Life of a Dataset in Machine Learning Research
Bernard Koch
Emily L. Denton
A. Hanna
J. Foster
36
140
0
03 Dec 2021
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
297
6,956
0
20 Apr 2018