BLEURT: Learning Robust Metrics for Text Generation

Annual Meeting of the Association for Computational Linguistics (ACL), 2020
9 April 2020
Thibault Sellam
Dipanjan Das
Ankur P. Parikh

Papers citing "BLEURT: Learning Robust Metrics for Text Generation"

50 / 1,045 papers shown
SpecPV: Improving Self-Speculative Decoding for Long-Context Generation via Partial Verification
Zhendong Tan
Xingjun Zhang
Chaoyi Hu
Junjie Peng
Kun Xia
LRM
181
0
0
02 Dec 2025
Agreement-Constrained Probabilistic Minimum Bayes Risk Decoding
Koki Natsumi
Hiroyuki Deguchi
Yusuke Sakai
Hidetaka Kamigaito
Taro Watanabe
115
0
0
01 Dec 2025
HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment
Valentin Noël
Elimane Yassine Seidou
Charly Ken Capo-Chichi
Ghanem Amari
HILM
201
1
0
01 Dec 2025
A Systematic Analysis of Large Language Models with RAG-enabled Dynamic Prompting for Medical Error Detection and Correction
Farzad Ahmed
Joniel Augustine Jerome
Meliha Yetisgen
Özlem Uzuner
186
0
0
25 Nov 2025
ARQUSUMM: Argument-aware Quantitative Summarization of Online Conversations
A. Tang
Xiuzhen Zhang
M. Dinh
Zhuang Li
127
0
0
21 Nov 2025
SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation
Shrikant B. Kendre
Austin Xu
Honglu Zhou
Michael S Ryoo
Shafiq Joty
Juan Carlos Niebles
251
0
0
21 Nov 2025
WER is Unaware: Assessing How ASR Errors Distort Clinical Understanding in Patient Facing Dialogue
Zachary Ellis
Jared Joselowitz
Yash Deo
Yajie Vera He
Anna Kalygina
Aisling Higham
Mana Rahimzadeh
Yan Jia
Ibrahim Habli
Ernest Lim
329
1
0
20 Nov 2025
Music Recommendation with Large Language Models: Challenges, Opportunities, and Evaluation
Elena V. Epure
Yashar Deldjoo
Bruno Sguerra
Markus Schedl
Manuel Moussallam
218
0
0
20 Nov 2025
Beyond Surface-Level Similarity: Hierarchical Contamination Detection for Synthetic Training Data in Foundation Models
Sushant Mehta
162
0
0
18 Nov 2025
Revisiting NLI: Towards Cost-Effective and Human-Aligned Metrics for Evaluating LLMs in Question Answering
Sai Shridhar Balamurali
Lu Cheng
190
2
0
10 Nov 2025
VADER: Towards Causal Video Anomaly Understanding with Relation-Aware Large Language Models
Ying Cheng
Y. Lin
Min-Hung Chen
Fu-En Yang
S. Lai
280
0
0
10 Nov 2025
How to Evaluate Speech Translation with Source-Aware Neural MT Metrics
Mauro Cettolo
Marco Gaido
Matteo Negri
Sara Papi
L. Bentivogli
234
0
0
05 Nov 2025
AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs
Mo El-Haj
Paul Rayson
AIFin
499
0
0
03 Nov 2025
MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts
Naoto Iwase
Hiroki Okuyama
Junichiro Iwasawa
LRM ELM
145
2
0
01 Nov 2025
Rating Roulette: Self-Inconsistency in LLM-As-A-Judge Frameworks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
Rajarshi Haldar
Julia Hockenmaier
203
16
0
31 Oct 2025
Seeing, Signing, and Saying: A Vision-Language Model-Assisted Pipeline for Sign Language Data Acquisition and Curation from Social Media
Shakib Yazdani
Yasser Hamidullah
C. España-Bonet
Josef van Genabith
SLR
302
1
0
29 Oct 2025
A Critical Study of Automatic Evaluation in Sign Language Translation
Shakib Yazdani
Yasser Hamidullah
C. España-Bonet
Eleftherios Avramidis
Josef van Genabith
SLR
390
0
0
29 Oct 2025
A Survey on Unlearning in Large Language Models
Ruichen Qiu
Jiajun Tan
Jiayue Pu
Honglin Wang
Xiao-Shan Gao
Fei Sun
MU AILaw PILM
789
2
0
29 Oct 2025
Text Simplification with Sentence Embeddings
Matthew Shardlow
112
0
0
28 Oct 2025
MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task
Juraj Juraska
Tobias Domhan
M. Finkelstein
Tetsuji Nakagawa
Geza Kovacs
Daniel Deutsch
Pidong Wang
Markus Freitag
151
8
0
28 Oct 2025
Wisdom and Delusion of LLM Ensembles for Code Generation and Repair
Fernando Vallecillos Ruiz
Max Hort
Leon Moonen
211
3
0
24 Oct 2025
Structure-Conditional Minimum Bayes Risk Decoding
Bryan Eikema
Anna Rutkiewicz
Mario Giulianelli
201
1
0
23 Oct 2025
Spatio-temporal Sign Language Representation and Translation
Conference on Machine Translation (WMT), 2025
Yasser Hamidullah
Josef van Genabith
C. España-Bonet
SLR
329
7
0
22 Oct 2025
Sign Language Translation with Sentence Embedding Supervision
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yasser Hamidullah
Josef van Genabith
C. España-Bonet
SLR
390
15
0
22 Oct 2025
Re-evaluating Minimum Bayes Risk Decoding for Automatic Speech Recognition
Yuu Jinnai
185
1
0
22 Oct 2025
SONAR-SLT: Multilingual Sign Language Translation via Language-Agnostic Sentence Embedding Supervision
Yasser Hamidullah
Shakib Yazdani
Cennet Oguz
Josef van Genabith
C. España-Bonet
SLR VLM
460
3
0
22 Oct 2025
Evaluating Medical LLMs by Levels of Autonomy: A Survey Moving from Benchmarks to Applications
Xiao Ye
Jacob Dineen
Zhaonan Li
Zhikun Xu
Weiyu Chen
...
Ji-Eun Irene Yum
Muhammad Ali Khan
Muhammad Umar Afzal
Irbaz B. Riaz
Ben Zhou
LM&MA ELM
266
1
0
20 Oct 2025
Bolster Hallucination Detection via Prompt-Guided Data Augmentation
Wenyun Li
Zheng Zhang
Dongmei Jiang
Xiangyuan Lan
HILM
226
0
0
13 Oct 2025
Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models
Christopher Chiu
Silviu Pitis
Mihaela van der Schaar
LM&MA ELM LRM
221
1
0
11 Oct 2025
DITING: A Multi-Agent Evaluation Framework for Benchmarking Web Novel Translation
Enze Zhang
Jiaying Wang
Mengxi Xiao
Jifei Liu
Ziyan Kuang
Rui Dong
Eric Dong
Sophia Ananiadou
Min Peng
Qianqian Xie
182
2
0
10 Oct 2025
Revisiting Metric Reliability for Fine-grained Evaluation of Machine Translation and Summarization in Indian Languages
Amir Hossein Yari
Kalmit Kulkarni
Ahmad Raza Khan
Fajri Koto
146
0
0
08 Oct 2025
LASER: An LLM-based ASR Scoring and Evaluation Rubric
Amruta Parulekar
Preethi Jyothi
141
1
0
08 Oct 2025
Reproducibility Study of "XRec: Large Language Models for Explainable Recommendation"
Ranjan Mishra
Julian I. Bibo
Quinten van Engelen
Henk Schaapman
LRM
148
0
0
06 Oct 2025
Reward Models are Metrics in a Trench Coat
Sebastian Gehrmann
189
0
0
03 Oct 2025
Addressing Pitfalls in the Evaluation of Uncertainty Estimation Methods for Natural Language Generation
Mykyta Ielanskyi
Kajetan Schweighofer
L. Aichberger
Sepp Hochreiter
HILM
266
2
0
02 Oct 2025
Automatic Fact-checking in English and Telugu
Ravi Kiran Chikkala
Tatiana Anikina
N. Skachkova
Ivan Vykopal
Rodrigo Agerri
Josef van Genabith
HILM
501
0
0
30 Sep 2025
Model Fusion with Multi-LoRA Inference for Tool-Enhanced Game Dialogue Agents
Kangxu Wang
Ze Chen
Chengcheng Wei
Jiewen Zheng
Jiarong He
Max Gao
MoMe
130
0
0
29 Sep 2025
EasySteer: A Unified Framework for High-Performance and Extensible LLM Steering
Haolei Xu
Xinyu Mei
Yuchen Yan
Rui Zhou
Wenqi Zhang
Weiming Lu
Yueting Zhuang
Yongliang Shen
LLMSV
211
6
0
29 Sep 2025
LLM Hallucination Detection: HSAD
Jinxin Li
Gang Tu
Junjie Hu
253
1
0
28 Sep 2025
Semantic Voting: A Self-Evaluation-Free Approach for Efficient LLM Self-Improvement on Unverifiable Open-ended Tasks
Chunyang Jiang
Y. Zhang
Yiyang Cai
Chi-Min Chan
Yulong Liu
Mingming Chen
Wei Xue
Yike Guo
LRM
165
1
0
27 Sep 2025
Liaozhai through the Looking-Glass: On Paratextual Explicitation of Culture-Bound Terms in Machine Translation
Sherrie Shen
Weixuan Wang
Alexandra Birch
153
0
0
27 Sep 2025
Culture In a Frame: C$^3$B as a Comic-Based Benchmark for Multimodal Culturally Awareness
Yuchen Song
Andong Chen
Wenxin Zhu
Kehai Chen
X. Bai
Muyun Yang
Tiejun Zhao
211
1
0
27 Sep 2025
Temporal Generalization: A Reality Check
Divyam Madaan
S. Chopra
Kyunghyun Cho
OOD AI4TS
169
0
0
27 Sep 2025
MO-GRPO: Mitigating Reward Hacking of Group Relative Policy Optimization on Multi-Objective Problems
Yuki Ichihara
Yuu Jinnai
Tetsuro Morimura
Mitsuki Sakamoto
Ryota Mitsuhashi
Eiji Uchibe
250
5
0
26 Sep 2025
Semantic Agreement Enables Efficient Open-Ended LLM Cascades
Duncan Soiffer
Steven Kolawole
Virginia Smith
315
1
0
26 Sep 2025
EnAnchored-X2X: English-Anchored Optimization for Many-to-Many Translation
Sen Yang
Yu Bao
Yu Lu
Jiajun Chen
Shujian Huang
Shanbo Cheng
184
2
0
24 Sep 2025
Evaluating Language Translation Models by Playing Telephone
Syeda Jannatus Saba
Steven Skiena
155
0
0
23 Sep 2025
Specification-Aware Machine Translation and Evaluation for Purpose Alignment
Yoko Kayano
Saku Sugawara
159
0
0
22 Sep 2025
Extending Automatic Machine Translation Evaluation to Book-Length Documents
Kuang-Da Wang
Shuoyang Ding
Chao-Han Huck Yang
Ping-Chun Hsieh
Wen-Chih Peng
Vitaly Lavrukhin
Boris Ginsburg
196
2
0
21 Sep 2025
Deep learning and abstractive summarisation for radiological reports: an empirical study for adapting the PEGASUS models' family with scarce data
Claudio Benzoni
Martina Langhals
Martin Boeker
Luise Modersohn
Máté E. Maros
MedIm
149
0
0
18 Sep 2025