It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance

15 May 2023
Arjun Subramonian, Xingdi Yuan, Hal Daumé, Su Lin Blodgett

Papers citing "It Takes Two to Tango: Navigating Conceptualizations of NLP Tasks and Measurements of Performance"

11 / 11 papers shown

WinoPron: Revisiting English Winogender Schemas for Consistency, Coverage, and Grammatical Case
Vagrant Gautam, Julius Steuer, Eileen Bingert, Ray Johns, Anne Lauscher, Dietrich Klakow
46 · 3 · 0 · 09 Sep 2024

Stop Measuring Calibration When Humans Disagree
Joris Baan, Wilker Aziz, Barbara Plank, Raquel Fernández
16 · 53 · 0 · 28 Oct 2022

ACES: Translation Accuracy Challenge Sets for Evaluating Machine Translation Metrics
Chantal Amrhein, Nikita Moghe, Liane Guillou
ELM · 23 · 22 · 0 · 27 Oct 2022

Quantifying Social Biases Using Templates is Unreliable
P. Seshadri, Pouya Pezeshkpour, Sameer Singh
49 · 33 · 0 · 09 Oct 2022

State-of-the-art generalisation research in NLP: A taxonomy and review
Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin
103 · 92 · 0 · 06 Oct 2022

Extractive is not Faithful: An Investigation of Broad Unfaithfulness Problems in Extractive Summarization
Shiyue Zhang, David Wan, Mohit Bansal
HILM · 45 · 27 · 0 · 08 Sep 2022

Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou, Su Lin Blodgett, Adam Trischler, Hal Daumé, Kaheer Suleman, Alexandra Olteanu
ELM · 94 · 25 · 0 · 13 May 2022

The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Sebastian Gehrmann, Tosin P. Adewumi, Karmanya Aggarwal, Pawan Sasanka Ammanamanchi, Aremu Anuoluwapo, ..., Nishant Subramani, Wei-ping Xu, Diyi Yang, Akhila Yerukola, Jiawei Zhou
VLM · 243 · 284 · 0 · 02 Feb 2021

Hypothesis Only Baselines in Natural Language Inference
Adam Poliak, Jason Naradowsky, Aparajita Haldar, Rachel Rudinger, Benjamin Van Durme
187 · 576 · 0 · 02 May 2018

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
ELM · 294 · 6,943 · 0 · 20 Apr 2018

Text Summarization Techniques: A Brief Survey
M. Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, S. Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, K. Kochut
CVBM · 50 · 512 · 0 · 07 Jul 2017