MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics

7 October 2020

Gabriel Stanovsky

Papers citing "MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics"

Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators
Feng Gu, Zongxia Li, Carlos Rafael Colon, Benjamin Evans, Ishani Mondal, Jordan Boyd-Graber
09 Mar 2025

A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu, Yixiao Song, Mohit Iyyer, Eunsol Choi
29 May 2023 (ELM)

Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts
Qinyuan Ye, Juan Zha, Xiang Ren
25 May 2022 (MoE)

Repro: An Open-Source Library for Improving the Reproducibility and Usability of Publicly Available Research Code
Daniel Deutsch, Dan Roth
29 Apr 2022 (AI4CE)

Tomayto, Tomahto. Beyond Token-level Answer Equivalence for Question Answering Evaluation
Jannis Bulian, Christian Buck, Wojciech Gajewski, Benjamin Boerschinger, Tal Schuster
15 Feb 2022

Challenges in Information-Seeking QA: Unanswerable Questions and Paragraph Retrieval
Akari Asai, Eunsol Choi
22 Oct 2020 (RALM)

Are We Modeling the Task or the Annotator? An Investigation of Annotator Bias in Natural Language Understanding Datasets
Mor Geva, Yoav Goldberg, Jonathan Berant
21 Aug 2019