2310.13988
Cited By
GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4
21 October 2023
Tom Kocmi
C. Federmann
Papers citing
"GEMBA-MQM: Detecting Translation Quality Error Spans with GPT-4"
50 / 52 papers shown
Title
Same evaluation, more tokens: On the effect of input length for machine translation evaluation using Large Language Models
Tobias Domhan
Dawei Zhu
24
0
0
03 May 2025
An LLM-as-a-judge Approach for Scalable Gender-Neutral Translation Evaluation
Andrea Piergentili
Beatrice Savoldi
Matteo Negri
L. Bentivogli
ELM
35
0
0
16 Apr 2025
AskQE: Question Answering as Automatic Evaluation for Machine Translation
Dayeon Ki
Kevin Duh
Marine Carpuat
24
0
0
15 Apr 2025
Large Language Models as Span Annotators
Zdeněk Kasner
Vilém Zouhar
Patrícia Schmidtová
Ivan Kartáč
Kristýna Onderková
Ondřej Plátek
Dimitra Gkatzia
Saad Mahamood
Ondrej Dusek
Simone Balloccu
ALM
27
0
0
11 Apr 2025
DeepSeek vs. o3-mini: How Well can Reasoning LLMs Evaluate MT and Summarization?
Daniil Larionov
Sotaro Takeshita
Ran Zhang
Yanran Chen
Christoph Leiter
Zhipin Wang
Christian Greisinger
Steffen Eger
ReLM
ELM
LRM
69
0
0
10 Apr 2025
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation
Baban Gain
Dibyanayan Bandyopadhyay
Asif Ekbal
LM&MA
52
0
0
02 Apr 2025
Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
José P. Pombal
Nuno M. Guerreiro
Ricardo Rei
André F. T. Martins
ALM
66
0
0
01 Apr 2025
XL-Instruct: Synthetic Data for Cross-Lingual Open-Ended Generation
Vivek Iyer
Ricardo Rei
Pinzhen Chen
Alexandra Birch
SyDa
LM&MA
66
0
0
29 Mar 2025
OpeNLGauge: An Explainable Metric for NLG Evaluation with Open-Weights LLMs
Ivan Kartáč
Mateusz Lango
Ondrej Dusek
ELM
46
1
0
14 Mar 2025
QE4PE: Word-level Quality Estimation for Human Post-Editing
Gabriele Sarti
Vilém Zouhar
Grzegorz Chrupała
Ana Guerberof Arenas
Malvina Nissim
Arianna Bisazza
38
0
0
04 Mar 2025
BatchGEMBA: Token-Efficient Machine Translation Evaluation with Batched Prompting and Prompt Compression
Daniil Larionov
Steffen Eger
VLM
MQ
74
0
0
04 Mar 2025
SwiLTra-Bench: The Swiss Legal Translation Benchmark
Joel Niklaus
Jakob Merane
Luka Nenadic
Sina Ahmadi
Yingqiang Gao
...
Matthew Guillod
Robin Mamié
Daniel Brunner
Julio Pereyra
Niko Grupen
AILaw
ELM
74
0
0
03 Mar 2025
Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation
Xiang Geng
Zhejian Lai
Jiajun Chen
Hao Yang
Shujian Huang
60
0
0
27 Feb 2025
M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation
Zhaopeng Feng
Jiayuan Su
Jiamei Zheng
Jiahan Ren
Yan Zhang
Jian Wu
Hongwei Wang
Zuozhu Liu
ELM
198
0
0
21 Feb 2025
Cascaded Self-Evaluation Augmented Training for Lightweight Multimodal LLMs
Zheqi Lv
Wenkai Wang
Jiawei Wang
Shengyu Zhang
Fei Wu
LRM
ReLM
51
0
0
10 Jan 2025
A 2-step Framework for Automated Literary Translation Evaluation: Its Promises and Pitfalls
Sheikh Shafayat
Dongkeun Yoon
Woori Jang
Jiwoo Choi
Alice H. Oh
Seohyon Jung
91
1
0
03 Jan 2025
Towards Automatic Evaluation for Image Transcreation
Simran Khanuja
Vivek Iyer
Claire He
Graham Neubig
ViT
77
1
0
18 Dec 2024
From Jack of All Trades to Master of One: Specializing LLM-based Autoraters to a Test Set
M. Finkelstein
Dan Deutsch
Parker Riley
Juraj Juraska
Geza Kovacs
Markus Freitag
66
0
0
23 Nov 2024
MetaMetrics-MT: Tuning Meta-Metrics for Machine Translation via Human Preference Calibration
David Anugraha
Garry Kuwanto
Lucky Susanto
Derry Wijaya
Genta Indra Winata
OSLM
30
2
0
01 Nov 2024
How Good Are LLMs for Literary Translation, Really? Literary Translation Evaluation with Humans and LLMs
Ran Zhang
Wei-Ye Zhao
Steffen Eger
68
4
0
24 Oct 2024
Findings of the WMT 2024 Shared Task on Chat Translation
Wafaa Mohammed
Sweta Agrawal
M. Amin Farajian
Vera Cabarrão
Bryan Eikema
Ana C. Farinha
José G. C. de Souza
19
3
0
15 Oct 2024
Are Large Language Models State-of-the-art Quality Estimators for Machine Translation of User-generated Content?
Shenbin Qian
Constantin Orasan
Diptesh Kanojia
Félix do Carmo
ELM
17
0
0
08 Oct 2024
Beyond Correlation: Interpretable Evaluation of Machine Translation Metrics
Stefano Perrella
Lorenzo Proietti
Pere-Lluís Huguet Cabot
Edoardo Barba
Roberto Navigli
14
2
0
07 Oct 2024
What do Large Language Models Need for Machine Translation Evaluation?
Shenbin Qian
Archchana Sindhujan
Minnie Kabra
Diptesh Kanojia
Constantin Orasan
Tharindu Ranasinghe
Frédéric Blain
ELM
LRM
ALM
LM&MA
18
0
0
04 Oct 2024
A Multi-task Learning Framework for Evaluating Machine Translation of Emotion-loaded User-generated Content
Shenbin Qian
Constantin Orasan
Diptesh Kanojia
Félix do Carmo
20
0
0
04 Oct 2024
MetaMetrics: Calibrating Metrics For Generation Tasks Using Human Preferences
Genta Indra Winata
David Anugraha
Lucky Susanto
Garry Kuwanto
Derry Wijaya
37
7
0
03 Oct 2024
Creative and Context-Aware Translation of East Asian Idioms with GPT-4
Kenan Tang
Peiyang Song
Yao Qin
Xifeng Yan
22
1
0
01 Oct 2024
MQM-APE: Toward High-Quality Error Annotation Predictors with Automatic Post-Editing in LLM Translation Evaluators
Qingyu Lu
Liang Ding
Kanjian Zhang
Jinxia Zhang
Dacheng Tao
24
3
0
22 Sep 2024
Guardians of the Machine Translation Meta-Evaluation: Sentinel Metrics Fall In!
Stefano Perrella
Lorenzo Proietti
Alessandro Sciré
Edoardo Barba
Roberto Navigli
18
3
0
25 Aug 2024
LLaMAX: Scaling Linguistic Horizons of LLM by Enhancing Translation Capabilities Beyond 100 Languages
Yinquan Lu
Wenhao Zhu
Lei Li
Yu Qiao
Fei Yuan
42
24
0
08 Jul 2024
PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation
Christoph Leiter
Steffen Eger
27
7
0
26 Jun 2024
LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks
A. Bavaresco
Raffaella Bernardi
Leonardo Bertolazzi
Desmond Elliott
Raquel Fernández
...
David Schlangen
Alessandro Suglia
Aditya K Surikuchi
Ece Takmaz
A. Testoni
ALM
ELM
36
62
0
26 Jun 2024
Themis: Towards Flexible and Interpretable NLG Evaluation
Xinyu Hu
Li Lin
Mingqi Gao
Xunjian Yin
Xiaojun Wan
ELM
29
6
0
26 Jun 2024
xCOMET-lite: Bridging the Gap Between Efficiency and Quality in Learned MT Evaluation Metrics
Daniil Larionov
Mikhail Seleznyov
Vasiliy Viskov
Alexander Panchenko
Steffen Eger
26
3
0
20 Jun 2024
AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar
Tom Kocmi
Mrinmaya Sachan
28
4
0
18 Jun 2024
PFID: Privacy First Inference Delegation Framework for LLMs
Haoyan Yang
Zhitao Li
Yong Zhang
Jianzong Wang
Ning Cheng
Ming Li
Jing Xiao
23
1
0
18 Jun 2024
How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?
Anushka Singh
Ananya B. Sai
Raj Dabre
Ratish Puduppully
Anoop Kunchukuttan
Mitesh Khapra
28
1
0
06 Jun 2024
Can Automatic Metrics Assess High-Quality Translations?
Sweta Agrawal
António Farinhas
Ricardo Rei
André F. T. Martins
21
8
0
28 May 2024
(Perhaps) Beyond Human Translation: Harnessing Multi-Agent Collaboration for Translating Ultra-Long Literary Texts
Minghao Wu
Jiahao Xu
Yulin Yuan
Gholamreza Haffari
Longyue Wang
Weihua Luo
Kaifu Zhang
LLMAG
114
22
0
20 May 2024
Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations
Dayeon Ki
Marine Carpuat
25
17
0
11 Apr 2024
Tower: An Open Multilingual Large Language Model for Translation-Related Tasks
Duarte M. Alves
José P. Pombal
Nuno M. Guerreiro
Pedro H. Martins
Joao Alves
...
Patrick Fernandes
Sweta Agrawal
Pierre Colombo
José G. C. de Souza
André F.T. Martins
LRM
40
128
0
27 Feb 2024
TEaR: Improving LLM-based Machine Translation with Systematic Self-Refinement
Zhaopeng Feng
Yan Zhang
Hao Li
Bei Wu
Jiayu Liao
Wenqiang Liu
Jun Lang
Yang Feng
Jian Wu
Zuozhu Liu
LRM
32
9
0
26 Feb 2024
LLM-based NLG Evaluation: Current Status and Challenges
Mingqi Gao
Xinyu Hu
Jie Ruan
Xiao Pu
Xiaojun Wan
ELM
LM&MA
53
28
0
02 Feb 2024
Beyond Traditional Benchmarks: Analyzing Behaviors of Open LLMs on Data-to-Text Generation
Zdeněk Kasner
Ondrej Dusek
22
8
0
18 Jan 2024
Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
Tom Kocmi
Vilém Zouhar
C. Federmann
Matt Post
21
10
0
12 Jan 2024
Lost in the Source Language: How Large Language Models Evaluate the Quality of Machine Translation
Xu Huang
Zhirui Zhang
Xiang Geng
Yichao Du
Jiajun Chen
Shujian Huang
40
7
0
12 Jan 2024
Adapting Large Language Models for Document-Level Machine Translation
Minghao Wu
Thuy-Trang Vu
Lizhen Qu
George F. Foster
Gholamreza Haffari
77
42
0
12 Jan 2024
MERA: A Comprehensive LLM Evaluation in Russian
Alena Fenogenova
Artem Chervyakov
Nikita Martynov
Anastasia Kozlova
Maria Tikhonova
...
Nikita Savushkin
Polina Mikhailova
Denis Dimitrov
Alexander Panchenko
Sergey Markov
ELM
20
10
0
09 Jan 2024
xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection
Nuno M. Guerreiro
Ricardo Rei
Daan van Stigt
Luísa Coheur
Pierre Colombo
André F.T. Martins
35
109
0
16 Oct 2023
Towards Explainable Evaluation Metrics for Machine Translation
Christoph Leiter
Piyawat Lertvittayakumjorn
M. Fomicheva
Wei-Ye Zhao
Yang Gao
Steffen Eger
ELM
18
11
0
22 Jun 2023