To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation

22 July 2021
Tom Kocmi
C. Federmann
Roman Grundkiewicz
Marcin Junczys-Dowmunt
Hitokazu Matsushita
Arul Menezes

Papers citing "To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation"

50 / 127 papers shown
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
Tobias Domhan
Dawei Zhu
26
0
0
03 May 2025
Remedy: Learning Machine Translation Evaluation from Human Preferences with Reward Modeling
Shaomu Tan
Christof Monz
32
0
0
18 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
68
0
0
01 Apr 2025
Contextual Metric Meta-Evaluation by Measuring Local Metric Accuracy
Athiya Deviyani
Fernando Diaz
28
0
0
25 Mar 2025
LAG-MMLU: Benchmarking Frontier LLM Understanding in Latvian and Giriama
Naome A. Etori
Kevin Lu
Randu Karisa
Arturs Kanepajs
LRM
ELM
89
0
0
14 Mar 2025
Large Language Models Are Effective Human Annotation Assistants, But Not Good Independent Annotators
Feng Gu
Zongxia Li
Carlos Rafael Colon
Benjamin Evans
Ishani Mondal
Jordan Boyd-Graber
46
1
0
09 Mar 2025
Leveraging Domain Knowledge at Inference Time for LLM Translation: Retrieval versus Generation
Bryan Li
Jiaming Luo
Eleftheria Briakou
Colin Cherry
35
0
0
06 Mar 2025
Verify with Caution: The Pitfalls of Relying on Imperfect Factuality Metrics
Ameya Godbole
Robin Jia
HILM
51
1
0
24 Jan 2025
A review of faithfulness metrics for hallucination assessment in Large Language Models
Ben Malin
Tatiana Kalganova
Nikolaos Boulgouris
HILM
59
2
0
03 Jan 2025
Enabling Scalable Evaluation of Bias Patterns in Medical LLMs
Hamed Fayyaz
Raphael Poulain
Rahmatollah Beheshti
32
1
0
18 Oct 2024
MetricX-24: The Google Submission to the WMT 2024 Metrics Shared Task
Juraj Juraska
Daniel Deutsch
Mara Finkelstein
Markus Freitag
39
14
0
04 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
37
7
0
03 Oct 2024
Is Preference Alignment Always the Best Option to Enhance LLM-Based Translation? An Empirical Analysis
Hippolyte Gisserot-Boukhlef
Ricardo Rei
Emmanuel Malherbe
Céline Hudelot
Pierre Colombo
Nuno M. Guerreiro
28
2
0
30 Sep 2024
MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators
Qingyu Lu
Liang Ding
Kanjian Zhang
Jinxia Zhang
Dacheng Tao
35
3
0
22 Sep 2024
Enhancing E-commerce Product Title Translation with Retrieval-Augmented Generation and Large Language Models
Bryan Zhang
Taichi Nakatani
Stephan Walter
26
0
0
19 Sep 2024
Improving Statistical Significance in Human Evaluation of Automatic Metrics via Soft Pairwise Accuracy
Brian Thompson
Nitika Mathur
Daniel Deutsch
Huda Khayrallah
25
9
0
15 Sep 2024
Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
Stefano Perrella
Lorenzo Proietti
Alessandro Sciré
Edoardo Barba
Roberto Navigli
23
3
0
25 Aug 2024
Large Language Models Might Not Care What You Are Saying: Prompt Format Beats Descriptions
Chenming Tang
Zhixiang Wang
Yunfang Wu
LRM
21
0
0
16 Aug 2024
SCOI: Syntax-augmented Coverage-based In-context Example Selection for Machine Translation
Chenming Tang
Zhixiang Wang
Yunfang Wu
16
1
0
09 Aug 2024
Scaling Sign Language Translation
Biao Zhang
Garrett Tanzer
Orhan Firat
LRM
VLM
SLR
32
1
0
16 Jul 2024
NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task
Muhammad Abdul-Mageed
Amr Keleg
AbdelRahim Elmadany
Chiyu Zhang
Injy Hamed
Walid Magdy
Houda Bouamor
Nizar Habash
30
16
0
06 Jul 2024
Evaluating Automatic Metrics with Incremental Machine Translation Systems
Guojun Wu
Shay B. Cohen
Rico Sennrich
19
0
0
03 Jul 2024
On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?
Rochelle Choenni
Sara Rajaee
Christof Monz
Ekaterina Shutova
19
1
0
20 Jun 2024
Error Span Annotation: A Balanced Approach for Human Evaluation of Machine Translation
Tom Kocmi
Vilém Zouhar
Eleftherios Avramidis
Roman Grundkiewicz
Marzena Karpinska
Maja Popović
Mrinmaya Sachan
Mariya Shmatova
26
14
0
17 Jun 2024
Quantifying Variance in Evaluation Benchmarks
Lovish Madaan
Aaditya K. Singh
Rylan Schaeffer
Andrew Poulton
Sanmi Koyejo
Pontus Stenetorp
Sharan Narang
Dieuwke Hupkes
33
9
0
14 Jun 2024
GLIMPSE: Pragmatically Informative Multi-Document Summarization for Scholarly Reviews
Maxime Darrin
Ines Arous
Pablo Piantanida
Jackie CK Cheung
29
2
0
11 Jun 2024
Favi-Score: A Measure for Favoritism in Automated Preference Ratings for Generative AI Evaluation
Pius von Daniken
Jan Deriu
Don Tuggener
Mark Cieliebak
14
1
0
03 Jun 2024
The Fine-Tuning Paradox: Boosting Translation Quality Without Sacrificing LLM Abilities
David Stap
Eva Hasler
Bill Byrne
Christof Monz
Ke M. Tran
27
8
0
30 May 2024
Does Whisper understand Swiss German? An automatic, qualitative, and human evaluation
Eyal Liron Dolev
Clemens Fidel Lutz
Noemi Aepli
21
4
0
30 Apr 2024
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Justin Zhao
Timothy Wang
Wael Abid
Geoffrey Angus
Arnav Garg
Jeffery Kinnison
Alex Sherstinsky
Piero Molino
Travis Addair
Devvret Rishi
ALM
46
28
0
29 Apr 2024
Investigating Neural Machine Translation for Low-Resource Languages: Using Bavarian as a Case Study
Wan-Hua Her
Udo Kruschwitz
25
4
0
12 Apr 2024
Going Beyond Word Matching: Syntax Improves In-context Example Selection for Machine Translation
Chenming Tang
Zhixiang Wang
Yunfang Wu
21
1
0
28 Mar 2024
TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement
Zhaopeng Feng
Yan Zhang
Hao Li
Bei Wu
Jiayu Liao
Wenqiang Liu
Jun Lang
Yang Feng
Jian Wu
Zuozhu Liu
LRM
40
9
0
26 Feb 2024
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Masanari Ohi
Masahiro Kaneko
Ryuto Koike
Mengsay Loem
Naoaki Okazaki
21
4
0
25 Feb 2024
Advancing Translation Preference Modeling with RLHF: A Step Towards Cost-Effective Solution
Nuo Xu
Jun Zhao
Can Zu
Sixian Li
Lu Chen
...
Shihan Dou
Wenjuan Qin
Tao Gui
Qi Zhang
Xuanjing Huang
43
6
0
18 Feb 2024
Large Language Models "Ad Referendum": How Good Are They at Machine Translation in the Legal Domain?
Vicent Briva-Iglesias
Joao Lucas Cavalheiro Camargo
Gokhan Dogru
AILaw
ELM
28
7
0
12 Feb 2024
Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy
Simon Ging
M. A. Bravo
Thomas Brox
VLM
38
11
0
11 Feb 2024
Improving Machine Translation with Human Feedback: An Exploration of Quality Estimation as a Reward Model
Zhiwei He
Xing Wang
Wenxiang Jiao
Zhuosheng Zhang
Rui Wang
Shuming Shi
Zhaopeng Tu
ALM
29
24
0
23 Jan 2024
An Empirical Study of In-context Learning in LLMs for Machine Translation
Pranjal A. Chitale
Jay Gala
Raj Dabre
LRM
26
5
0
22 Jan 2024
Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
Tom Kocmi
Vilém Zouhar
C. Federmann
Matt Post
21
26
0
12 Jan 2024
Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation
Giorgos Vernikos
Andrei Popescu-Belis
30
14
0
12 Jan 2024
POMP: Probability-driven Meta-graph Prompter for LLMs in Low-resource Unsupervised Neural Machine Translation
Shilong Pan
Zhiliang Tian
Liang Ding
Zhen Huang
Zhihua Wen
Dongsheng Li
32
2
0
11 Jan 2024
Quality and Quantity of Machine Translation References for Automatic Metrics
Vilém Zouhar
Ondrej Bojar
65
7
0
02 Jan 2024
Speech Translation with Large Language Models: An Industrial Practice
Zhichao Huang
Rong Ye
Tom Ko
Qianqian Dong
Shanbo Cheng
Mingxuan Wang
Hang Li
62
15
0
21 Dec 2023
Trained MT Metrics Learn to Cope with Machine-translated References
Jannis Vamvas
Tobias Domhan
Sony Trenous
Rico Sennrich
Eva Hasler
15
1
0
01 Dec 2023
JWSign: A Highly Multilingual Corpus of Bible Translations for more Diversity in Sign Language Processing
Shester Gueuwou
Sophie Siake
Colin Leong
Mathias Müller
SLR
22
11
0
16 Nov 2023
Extending Multilingual Machine Translation through Imitation Learning
Wen Lai
Viktor Hangya
Alexander M. Fraser
LRM
CLL
14
3
0
14 Nov 2023
Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text
Isaac Caswell
Lisa Wang
Isabel Papadimitriou
26
0
0
11 Nov 2023
Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study
Injy Hamed
Nizar Habash
Ngoc Thang Vu
16
2
0
23 Oct 2023
GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4
Tom Kocmi
C. Federmann
19
73
0
21 Oct 2023