Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

29 April 2021
Markus Freitag
George F. Foster
David Grangier
Viresh Ratnakar
Qijun Tan
Wolfgang Macherey

Papers citing "Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation"

50 / 75 papers shown
Calibrating Translation Decoding with Quality Estimation on LLMs
Di Wu
Yibin Lei
Christof Monz
70
0
0
26 Apr 2025
Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT
Joachim Minder
Guillaume Wisniewski
Natalie Kübler
28
0
0
21 Apr 2025
Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
Diana Galván-Sosa
Gabrielle Gaudeau
Pride Kavumba
Yunmeng Li
Hongyi Gu
Zheng Yuan
Keisuke Sakaguchi
P. Buttery
LRM
35
0
0
31 Mar 2025
Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin
Ernie Chang
Yangyang Shi
Vikas Chandra
63
0
0
18 Mar 2025
Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
Xiang Geng
Zhejian Lai
Jiajun Chen
Hao Yang
Shujian Huang
62
0
0
27 Feb 2025
Automatic Input Rewriting Improves Translation with Large Language Models
Dayeon Ki
Marine Carpuat
40
0
0
23 Feb 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
203
0
0
21 Feb 2025
Aligning Black-box Language Models with Human Judgments
Gerrit J. J. van den Burg
Gen Suzuki
Wei Liu
Murat Sensoy
ALM
82
0
0
07 Feb 2025
A comparison of translation performance between DeepL and Supertext
Alex Flückiger
Chantal Amrhein
Tim Graf
Frédéric Odermatt
Martin Pömsl
Philippe Schläpfer
Florian Schottmann
Samuel Läubli
ELM
40
0
0
04 Feb 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang
Wei-Ye Zhao
Steffen Eger
71
4
0
24 Oct 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
37
9
0
19 Jul 2024
AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar
Tom Kocmi
Mrinmaya Sachan
30
5
0
18 Jun 2024
Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation
Boxuan Lyu
Hidetaka Kamigaito
Kotaro Funakoshi
Manabu Okumura
38
0
0
17 Jun 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
39
4
0
29 May 2024
What Have We Achieved on Non-autoregressive Translation?
Yafu Li
Huajian Zhang
Jianhao Yan
Yongjing Yin
Yue Zhang
31
1
0
21 May 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
119
22
0
20 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
52
7
0
09 May 2024
Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations
Dayeon Ki
Marine Carpuat
36
17
0
11 Apr 2024
Human Evaluation of English–Irish Transformer-Based NMT
Séamus Lankford
Haithem Afli
Andy Way
35
10
0
04 Mar 2024
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Masanari Ohi
Masahiro Kaneko
Ryuto Koike
Mengsay Loem
Naoaki Okazaki
27
4
0
25 Feb 2024
Evaluating Optimal Reference Translations
Vilém Zouhar
Věra Kloudová
Martin Popel
Ondřej Bojar
29
2
0
28 Nov 2023
Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors
Nikita Mehandru
Sweta Agrawal
Yimin Xiao
Elaine C. Khoong
Ge Gao
Marine Carpuat
Niloufar Salehi
22
10
0
25 Oct 2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
David Heineman
Yao Dou
Wei-ping Xu
24
7
0
14 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
34
3
0
08 Aug 2023
Efficient Machine Translation Corpus Generation
K. Yuksel
Ahmet Gunduz
Shreyas Sharma
H. Sawaf
19
4
0
20 Jun 2023
BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation
T. Glushkova
Chrysoula Zerva
André F. T. Martins
33
6
0
30 May 2023
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu
Yixiao Song
Mohit Iyyer
Eunsol Choi
ELM
37
94
0
29 May 2023
Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents
Jannis Vamvas
Rico Sennrich
21
1
0
22 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
80
1,142
0
17 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
30
15
0
12 Apr 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
31
81
0
06 Apr 2023
Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
Qingyu Lu
Baopu Qiu
Liang Ding
Liping Xie
Tom Kocmi
Dacheng Tao
LRM
ALM
ELM
23
107
0
24 Mar 2023
Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors
Keqin Bao
Yu Wan
Dayiheng Liu
Baosong Yang
Wenqiang Lei
Xiangnan He
Derek F. Wong
Jun Xie
29
4
0
17 Feb 2023
The unreasonable effectiveness of few-shot learning for machine translation
Xavier Garcia
Yamini Bansal
Colin Cherry
George F. Foster
M. Krikun
Fan Feng
Melvin Johnson
Orhan Firat
27
102
0
02 Feb 2023
Extrinsic Evaluation of Machine Translation Metrics
Nikita Moghe
Tom Sherborne
Mark Steedman
Alexandra Birch
ELM
18
18
0
20 Dec 2022
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu
Liang Ding
Liping Xie
Kanjian Zhang
Derek F. Wong
Dacheng Tao
ELM
ALM
34
14
0
20 Dec 2022
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su
Weijia Shi
Jungo Kasai
Yizhong Wang
Yushi Hu
Mari Ostendorf
Wen-tau Yih
Noah A. Smith
Luke Zettlemoyer
Tao Yu
27
278
0
19 Dec 2022
Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models
Vikas Raunak
Matt Post
Arul Menezes
EGVM
27
0
0
19 Nov 2022
Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
21
24
0
16 Nov 2022
HilMeMe: A Human-in-the-Loop Machine Translation Evaluation Metric Looking into Multi-Word Expressions
Lifeng Han
12
2
0
09 Nov 2022
Dialect-robust Evaluation of Generated Text
Jiao Sun
Thibault Sellam
Elizabeth Clark
Tu Vu
Timothy Dozat
Dan Garrette
Aditya Siddhant
Jacob Eisenstein
Sebastian Gehrmann
15
19
0
02 Nov 2022
Searching for a higher power in the human evaluation of MT
Johnny Tian-Zheng Wei
Tom Kocmi
C. Federmann
16
6
0
20 Oct 2022
Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task
Yu Wan
Keqin Bao
Dayiheng Liu
Baosong Yang
Derek F. Wong
Lidia S. Chao
Wenqiang Lei
Jun Xie
22
9
0
18 Oct 2022
DICTDIS: Dictionary Constrained Disambiguation for Improved NMT
Ayush Maheshwari
Piyush Sharma
P. Jyothi
Ganesh Ramakrishnan
31
2
0
13 Oct 2022
Toxicity in Multilingual Machine Translation at Scale
Marta R. Costa-jussá
Eric Michael Smith
C. Ropers
Daniel Licht
Jean Maillard
Javier Ferrando
Carlos Escolano
22
24
0
06 Oct 2022
Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement
Zhen Yang
Fandong Meng
Yuanmeng Yan
Jie Zhou
21
3
0
13 Sep 2022
Automatic Correction of Human Translations
Jessy Lin
G. Kovács
Aditya Shastry
Joern Wuebker
John DeNero
28
3
0
17 Jun 2022
Resolving the Human Subjects Status of Machine Learning's Crowdworkers
Divyansh Kaushik
Zachary Chase Lipton
A. London
25
2
0
08 Jun 2022
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
19
4
0
19 May 2022