arXiv: 2107.10821
To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation
22 July 2021
Tom Kocmi
C. Federmann
Roman Grundkiewicz
Marcin Junczys-Dowmunt
Hitokazu Matsushita
Arul Menezes
Papers citing
"To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation"
50 / 127 papers shown
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
Tobias Domhan
Dawei Zhu
26
0
0
03 May 2025
Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
Shaomu Tan
Christof Monz
32
0
0
18 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
68
0
0
01 Apr 2025
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy
Athiya Deviyani
Fernando Diaz
28
0
0
25 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama
Naome A. Etori
Kevin Lu
Randu Karisa
Arturs Kanepajs
LRM
ELM
89
0
0
14 Mar 2025
Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators
Feng Gu
Zongxia Li
Carlos Rafael Colon
Benjamin Evans
Ishani Mondal
Jordan Boyd-Graber
46
1
0
09 Mar 2025
Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation
Bryan Li
Jiaming Luo
Eleftheria Briakou
Colin Cherry
35
0
0
06 Mar 2025
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
Ameya Godbole
Robin Jia
HILM
51
1
0
24 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikoloas Boulgouris
HILM
59
2
0
03 Jan 2025
Enabling Scalable Evaluation of Bias Patterns in Medical LLMs
Hamed Fayyaz
Raphael Poulain
Rahmatollah Beheshti
32
1
0
18 Oct 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
Juraj Juraska
Daniel Deutsch
Mara Finkelstein
Markus Freitag
39
14
0
04 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
37
7
0
03 Oct 2024
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
Hippolyte Gisserot-Boukhlef
Ricardo Rei
Emmanuel Malherbe
Céline Hudelot
Pierre Colombo
Nuno M. Guerreiro
28
2
0
30 Sep 2024
MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators
Qingyu Lu
Liang Ding
Kanjian Zhang
Jinxia Zhang
Dacheng Tao
35
3
0
22 Sep 2024
Enhancing E-commerce Product Title Translation with Retrieval-Augmented Generation and Large Language Models
Bryan Zhang
Taichi Nakatani
Stephan Walter
28
0
0
19 Sep 2024
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
Brian Thompson
Nitika Mathur
Daniel Deutsch
Huda Khayrallah
25
9
0
15 Sep 2024
Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
Stefano Perrella
Lorenzo Proietti
Alessandro Sciré
Edoardo Barba
Roberto Navigli
23
3
0
25 Aug 2024
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
Chenming Tang
Zhixiang Wang
Yunfang Wu
LRM
21
0
0
16 Aug 2024
SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
Chenming Tang
Zhixiang Wang
Yunfang Wu
16
1
0
09 Aug 2024
Scaling Sign Language Translation
Biao Zhang
Garrett Tanzer
Orhan Firat
LRM
VLM
SLR
32
1
0
16 Jul 2024
NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task
Muhammad Abdul-Mageed
Amr Keleg
AbdelRahim Elmadany
Chiyu Zhang
Injy Hamed
Walid Magdy
Houda Bouamor
Nizar Habash
30
16
0
06 Jul 2024
Evaluating Automatic Metrics with Incremental Machine Translation Systems
Guojun Wu
Shay B. Cohen
Rico Sennrich
19
0
0
03 Jul 2024
On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?
Rochelle Choenni
Sara Rajaee
Christof Monz
Ekaterina Shutova
21
1
0
20 Jun 2024
Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
Tom Kocmi
Vilém Zouhar
Eleftherios Avramidis
Roman Grundkiewicz
Marzena Karpinska
Maja Popović
Mrinmaya Sachan
Mariya Shmatova
26
14
0
17 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
33
9
0
14 Jun 2024
GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
Maxime Darrin
Ines Arous
Pablo Piantanida
Jackie CK Cheung
29
2
0
11 Jun 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
14
1
0
03 Jun 2024
The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities
David Stap
Eva Hasler
Bill Byrne
Christof Monz
Ke M. Tran
27
8
0
30 May 2024
Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation
Eyal Liron Dolev
Clemens Fidel Lutz
Noemi Aepli
23
4
0
30 Apr 2024
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Justin Zhao
Timothy Wang
Wael Abid
Geoffrey Angus
Arnav Garg
Jeffery Kinnison
Alex Sherstinsky
Piero Molino
Travis Addair
Devvret Rishi
ALM
46
28
0
29 Apr 2024
Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study
Wan-Hua Her
Udo Kruschwitz
25
4
0
12 Apr 2024
Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation
Chenming Tang
Zhixiang Wang
Yunfang Wu
21
1
0
28 Mar 2024
TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement
Zhaopeng Feng
Yan Zhang
Hao Li
Bei Wu
Jiayu Liao
Wenqiang Liu
Jun Lang
Yang Feng
Jian Wu
Zuozhu Liu
LRM
40
9
0
26 Feb 2024
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Masanari Ohi
Masahiro Kaneko
Ryuto Koike
Mengsay Loem
Naoaki Okazaki
21
4
0
25 Feb 2024
Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
Nuo Xu
Jun Zhao
Can Zu
Sixian Li
Lu Chen
...
Shihan Dou
Wenjuan Qin
Tao Gui
Qi Zhang
Xuanjing Huang
43
6
0
18 Feb 2024
Large Language Models "Ad Referendum": How Good Are They at Machine Translation in the Legal Domain?
Vicent Briva-Iglesias
Joao Lucas Cavalheiro Camargo
Gokhan Dogru
AILaw
ELM
28
7
0
12 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
Simon Ging
M. A. Bravo
Thomas Brox
VLM
38
11
0
11 Feb 2024
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
Zhiwei He
Xing Wang
Wenxiang Jiao
Zhuosheng Zhang
Rui Wang
Shuming Shi
Zhaopeng Tu
ALM
29
24
0
23 Jan 2024
An Empirical Study of In-context Learning in LLMs for Machine Translation
Pranjal A. Chitale
Jay Gala
Raj Dabre
LRM
26
5
0
22 Jan 2024
Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
Tom Kocmi
Vilém Zouhar
C. Federmann
Matt Post
21
26
0
12 Jan 2024
Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation
Giorgos Vernikos
Andrei Popescu-Belis
30
14
0
12 Jan 2024
POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation
Shilong Pan
Zhiliang Tian
Liang Ding
Zhen Huang
Zhihua Wen
Dongsheng Li
32
2
0
11 Jan 2024
Quality and Quantity of Machine Translation References for Automatic Metrics
Vilém Zouhar
Ondrej Bojar
65
7
0
02 Jan 2024
Speech Translation with Large Language Models: An Industrial Practice
Zhichao Huang
Rong Ye
Tom Ko
Qianqian Dong
Shanbo Cheng
Mingxuan Wang
Hang Li
62
15
0
21 Dec 2023
Trained MT Metrics Learn to Cope with Machine-translated References
Jannis Vamvas
Tobias Domhan
Sony Trenous
Rico Sennrich
Eva Hasler
15
1
0
01 Dec 2023
JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing
Shester Gueuwou
Sophie Siake
Colin Leong
Mathias Müller
SLR
22
11
0
16 Nov 2023
Extending Multilingual Machine Translation through Imitation Learning
Wen Lai
Viktor Hangya
Alexander M. Fraser
LRM
CLL
14
3
0
14 Nov 2023
Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text
Isaac Caswell
Lisa Wang
Isabel Papadimitriou
26
0
0
11 Nov 2023
Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study
Injy Hamed
Nizar Habash
Ngoc Thang Vu
16
2
0
23 Oct 2023
GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4
Tom Kocmi
C. Federmann
19
73
0
21 Oct 2023