2104.14478
Cited By
Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation
29 April 2021
Markus Freitag
George F. Foster
David Grangier
Viresh Ratnakar
Qijun Tan
Wolfgang Macherey
Papers citing
"Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation"
50 / 76 papers shown
Title
Calibrating Translation Decoding with Quality Estimation on LLMs
Di Wu
Yibin Lei
Christof Monz
70
0
0
26 Apr 2025
Testing LLMs' Capabilities in Annotating Translations Based on an Error Typology Designed for LSP Translation: First Experiments with ChatGPT
Joachim Minder
Guillaume Wisniewski
Natalie Kübler
28
0
0
21 Apr 2025
Rubrik's Cube: Testing a New Rubric for Evaluating Explanations on the CUBE dataset
Diana Galván-Sosa
Gabrielle Gaudeau
Pride Kavumba
Yunmeng Li
Hongyi Gu
Zheng Yuan
Keisuke Sakaguchi
P. Buttery
LRM
35
0
0
31 Mar 2025
Self-Vocabularizing Training for Neural Machine Translation
Pin-Jie Lin
Ernie Chang
Yangyang Shi
Vikas Chandra
63
0
0
18 Mar 2025
Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
Xiang Geng
Zhejian Lai
Jiajun Chen
Hao Yang
Shujian Huang
62
0
0
27 Feb 2025
Automatic Input Rewriting Improves Translation with Large Language Models
Dayeon Ki
Marine Carpuat
40
0
0
23 Feb 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
203
0
0
21 Feb 2025
Aligning Black-box Language Models with Human Judgments
Gerrit J. J. van den Burg
Gen Suzuki
Wei Liu
Murat Sensoy
ALM
82
0
0
07 Feb 2025
A comparison of translation performance between DeepL and Supertext
Alex Flückiger
Chantal Amrhein
Tim Graf
Frédéric Odermatt
Martin Pömsl
Philippe Schläpfer
Florian Schottmann
Samuel Läubli
ELM
40
0
0
04 Feb 2025
Analyzing and Evaluating Correlation Measures in NLG Meta-Evaluation
Mingqi Gao
Xinyu Hu
Li Lin
Xiaojun Wan
28
1
0
28 Jan 2025
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang
Wei-Ye Zhao
Steffen Eger
71
4
0
24 Oct 2024
Impact of Model Size on Fine-tuned LLM Performance in Data-to-Text Generation: A State-of-the-Art Investigation
Joy Mahapatra
Utpal Garain
37
9
0
19 Jul 2024
AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar
Tom Kocmi
Mrinmaya Sachan
30
5
0
18 Jun 2024
Unveiling the Power of Source: Source-based Minimum Bayes Risk Decoding for Neural Machine Translation
Boxuan Lyu
Hidetaka Kamigaito
Kotaro Funakoshi
Manabu Okumura
38
0
0
17 Jun 2024
Critical Learning Periods: Leveraging Early Training Dynamics for Efficient Data Pruning
E. Chimoto
Jay Gala
Orevaoghene Ahia
Julia Kreutzer
Bruce A. Bassett
Sara Hooker
VLM
39
4
0
29 May 2024
What Have We Achieved on Non-autoregressive Translation?
Yafu Li
Huajian Zhang
Jianhao Yan
Yongjing Yin
Yue Zhang
31
1
0
21 May 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
119
22
0
20 May 2024
Natural Language Processing RELIES on Linguistics
Juri Opitz
Shira Wein
Nathan Schneider
AI4CE
52
7
0
09 May 2024
Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations
Dayeon Ki
Marine Carpuat
36
17
0
11 Apr 2024
Human Evaluation of English–Irish Transformer-Based NMT
Séamus Lankford
Haithem Afli
Andy Way
35
10
0
04 Mar 2024
Likelihood-based Mitigation of Evaluation Bias in Large Language Models
Masanari Ohi
Masahiro Kaneko
Ryuto Koike
Mengsay Loem
Naoaki Okazaki
27
4
0
25 Feb 2024
Evaluating Optimal Reference Translations
Vilém Zouhar
Věra Kloudová
Martin Popel
Ondrej Bojar
29
2
0
28 Nov 2023
Physician Detection of Clinical Harm in Machine Translation: Quality Estimation Aids in Reliance and Backtranslation Identifies Critical Errors
Nikita Mehandru
Sweta Agrawal
Yimin Xiao
Elaine C. Khoong
Ge Gao
Marine Carpuat
Niloufar Salehi
24
10
0
25 Oct 2023
Thresh: A Unified, Customizable and Deployable Platform for Fine-Grained Text Evaluation
David Heineman
Yao Dou
Wei-ping Xu
26
7
0
14 Aug 2023
Learning Evaluation Models from Large Language Models for Sequence Generation
Chenglong Wang
Hang Zhou
Kai-Chun Chang
Tongran Liu
Chunliang Zhang
Quan Du
Tong Xiao
Yue Zhang
Jingbo Zhu
ELM
34
3
0
08 Aug 2023
Efficient Machine Translation Corpus Generation
K. Yuksel
Ahmet Gunduz
Shreyas Sharma
H. Sawaf
19
4
0
20 Jun 2023
BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation
T. Glushkova
Chrysoula Zerva
André F. T. Martins
33
6
0
30 May 2023
A Critical Evaluation of Evaluations for Long-form Question Answering
Fangyuan Xu
Yixiao Song
Mohit Iyyer
Eunsol Choi
ELM
37
96
0
29 May 2023
Towards Unsupervised Recognition of Token-level Semantic Differences in Related Documents
Jannis Vamvas
Rico Sennrich
21
1
0
22 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
80
1,147
0
17 May 2023
Angler: Helping Machine Translation Practitioners Prioritize Model Improvements
Samantha Robertson
Zijie J. Wang
Dominik Moritz
Mary Beth Kery
Fred Hohman
30
15
0
12 Apr 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
31
81
0
06 Apr 2023
Error Analysis Prompting Enables Human-Like Translation Evaluation in Large Language Models
Qingyu Lu
Baopu Qiu
Liang Ding
Liping Xie
Tom Kocmi
Dacheng Tao
LRM
ALM
ELM
23
107
0
24 Mar 2023
Towards Fine-Grained Information: Identifying the Type and Location of Translation Errors
Keqin Bao
Yu Wan
Dayiheng Liu
Baosong Yang
Wenqiang Lei
Xiangnan He
Derek F. Wong
Jun Xie
29
4
0
17 Feb 2023
The unreasonable effectiveness of few-shot learning for machine translation
Xavier Garcia
Yamini Bansal
Colin Cherry
George F. Foster
M. Krikun
Fan Feng
Melvin Johnson
Orhan Firat
27
102
0
02 Feb 2023
Extrinsic Evaluation of Machine Translation Metrics
Nikita Moghe
Tom Sherborne
Mark Steedman
Alexandra Birch
ELM
18
18
0
20 Dec 2022
Toward Human-Like Evaluation for Natural Language Generation with Error Analysis
Qingyu Lu
Liang Ding
Liping Xie
Kanjian Zhang
Derek F. Wong
Dacheng Tao
ELM
ALM
34
14
0
20 Dec 2022
One Embedder, Any Task: Instruction-Finetuned Text Embeddings
Hongjin Su
Weijia Shi
Jungo Kasai
Yizhong Wang
Yushi Hu
Mari Ostendorf
Wen-tau Yih
Noah A. Smith
Luke Zettlemoyer
Tao Yu
27
278
0
19 Dec 2022
Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models
Vikas Raunak
Matt Post
Arul Menezes
EGVM
27
0
0
19 Nov 2022
Reward Gaming in Conditional Text Generation
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
24
24
0
16 Nov 2022
HilMeMe: A Human-in-the-Loop Machine Translation Evaluation Metric Looking into Multi-Word Expressions
Lifeng Han
12
2
0
09 Nov 2022
Dialect-robust Evaluation of Generated Text
Jiao Sun
Thibault Sellam
Elizabeth Clark
Tu Vu
Timothy Dozat
Dan Garrette
Aditya Siddhant
Jacob Eisenstein
Sebastian Gehrmann
15
19
0
02 Nov 2022
Searching for a higher power in the human evaluation of MT
Johnny Tian-Zheng Wei
Tom Kocmi
C. Federmann
16
6
0
20 Oct 2022
Alibaba-Translate China's Submission for WMT 2022 Metrics Shared Task
Yu Wan
Keqin Bao
Dayiheng Liu
Baosong Yang
Derek F. Wong
Lidia S. Chao
Wenqiang Lei
Jun Xie
22
9
0
18 Oct 2022
DICTDIS: Dictionary Constrained Disambiguation for Improved NMT
Ayush Maheshwari
Piyush Sharma
P. Jyothi
Ganesh Ramakrishnan
31
2
0
13 Oct 2022
Toxicity in Multilingual Machine Translation at Scale
Marta R. Costa-jussá
Eric Michael Smith
C. Ropers
Daniel Licht
Jean Maillard
Javier Ferrando
Carlos Escolano
22
24
0
06 Oct 2022
Rethink about the Word-level Quality Estimation for Machine Translation from Human Judgement
Zhen Yang
Fandong Meng
Yuanmeng Yan
Jie Zhou
21
3
0
13 Sep 2022
Automatic Correction of Human Translations
Jessy Lin
G. Kovács
Aditya Shastry
Joern Wuebker
John DeNero
28
3
0
17 Jun 2022
Resolving the Human Subjects Status of Machine Learning's Crowdworkers
Divyansh Kaushik
Zachary Chase Lipton
A. London
25
2
0
08 Jun 2022
Twist Decoding: Diverse Generators Guide Each Other
Jungo Kasai
Keisuke Sakaguchi
Ronan Le Bras
Hao Peng
Ximing Lu
Dragomir R. Radev
Yejin Choi
Noah A. Smith
SyDa
21
4
0
19 May 2022