arXiv 2210.13746
DEMETR: Diagnosing Evaluation Metrics for Translation
25 October 2022
Marzena Karpinska
N. Raj
Katherine Thai
Yixiao Song
Ankita Gupta
Mohit Iyyer
Papers citing
"DEMETR: Diagnosing Evaluation Metrics for Translation"
27 / 27 papers shown
Title
AskQE: Question Answering as Automatic Evaluation for Machine Translation
Dayeon Ki
Kevin Duh
Marine Carpuat
24
0
0
15 Apr 2025
Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation
Yingfeng Luo
Tong Zheng
Yongyu Mu
B. Li
Qinghong Zhang
...
Ziqiang Xu
Peinan Feng
Xiaoqian Liu
Tong Xiao
Jingbo Zhu
AI4CE
68
0
0
09 Mar 2025
Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics
Stefano Perrella
Lorenzo Proietti
Pere-Lluís Huguet Cabot
Edoardo Barba
Roberto Navigli
14
2
0
07 Oct 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
Juraj Juraska
Daniel Deutsch
Mara Finkelstein
Markus Freitag
31
14
0
04 Oct 2024
What the Harm? Quantifying the Tangible Impact of Gender Bias in Machine Translation with a Human-centered Study
Beatrice Savoldi
Sara Papi
Matteo Negri
Ana Guerberof
L. Bentivogli
35
6
0
01 Oct 2024
Improving Minimum Bayes Risk Decoding with Multi-Prompt
David Heineman
Yao Dou
Wei-ping Xu
29
6
0
22 Jul 2024
One Thousand and One Pairs: A "novel" challenge for long-context language models
Marzena Karpinska
Katherine Thai
Kyle Lo
Tanya Goyal
Mohit Iyyer
LRM
36
40
0
24 Jun 2024
Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations
Dayeon Ki
Marine Carpuat
25
17
0
11 Apr 2024
Reference-based Metrics Disprove Themselves in Question Generation
Bang Nguyen
Mengxia Yu
Yun Huang
Meng-Long Jiang
HILM
21
2
0
18 Mar 2024
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu
Mingqi Gao
Sen Hu
Yang Zhang
Yicheng Chen
Teng Xu
Xiaojun Wan
AAML
ELM
26
21
0
19 Feb 2024
Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
Tom Kocmi
Vilém Zouhar
C. Federmann
Matt Post
21
10
0
12 Jan 2024
ACES: Translation Accuracy Challenge Sets at WMT 2023
Chantal Amrhein
Nikita Moghe
Liane Guillou
ELM
16
3
0
02 Nov 2023
Fighting Fire with Fire: The Dual Role of LLMs in Crafting and Detecting Elusive Disinformation
Jason Samuel Lucas
Adaku Uchendu
Michiharu Yamashita
Jooyoung Lee
Shaurya Rohatgi
Dongwon Lee
11
41
0
24 Oct 2023
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
Nuno M. Guerreiro
Ricardo Rei
Daan van Stigt
Luísa Coheur
Pierre Colombo
André F.T. Martins
35
109
0
16 Oct 2023
This is not correct! Negation-aware Evaluation of Language Generation Systems
Miriam Anschütz
Diego Miguel Lozano
Georg Groh
25
6
0
26 Jul 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
18
11
0
22 Jun 2023
BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation
T. Glushkova
Chrysoula Zerva
André F. T. Martins
25
6
0
30 May 2023
The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics
Ricardo Rei
Nuno M. Guerreiro
Marcos Vinícius Treviso
Luísa Coheur
A. Lavie
André F.T. Martins
19
15
0
19 May 2023
Discourse Centric Evaluation of Machine Translation with a Densely Annotated Parallel Corpus
Yu Jiang
Tianyu Liu
Shuming Ma
Dongdong Zhang
Mrinmaya Sachan
Ryan Cotterell
22
7
0
18 May 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
31
81
0
06 Apr 2023
DeltaScore: Fine-Grained Story Evaluation with Perturbations
Zhuohan Xie
Miao Li
Trevor Cohn
Jey Han Lau
22
4
0
15 Mar 2023
IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages
Ananya B. Sai
Vignesh Nagarajan
Tanay Dixit
Raj Dabre
Anoop Kunchukuttan
Pratyush Kumar
Mitesh M. Khapra
26
20
0
20 Dec 2022
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu
Liang Ding
Liping Xie
Kanjian Zhang
Derek F. Wong
Dacheng Tao
ELM
ALM
24
12
0
20 Dec 2022
MENLI: Robust Evaluation Metrics from Natural Language Inference
Yanran Chen
Steffen Eger
19
15
0
15 Aug 2022
Bidimensional Leaderboards: Generate and Evaluate Language Hand in Hand
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Lavinia Dunagan
Jacob Morrison
Alexander R. Fabbri
Yejin Choi
Noah A. Smith
49
39
0
08 Dec 2021
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya B. Sai
Tanay Dixit
D. Y. Sheth
S. Mohan
Mitesh M. Khapra
AAML
94
55
0
13 Sep 2021
Stanza: A Python Natural Language Processing Toolkit for Many Human Languages
Peng Qi
Yuhao Zhang
Yuhui Zhang
Jason Bolton
Christopher D. Manning
AI4TS
190
1,638
0
16 Mar 2020