ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.17578
  4. Cited By
MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models
v1v2 (latest)

MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models

23 October 2024
Seunghyeok Hong
Dongkeun Yoon
Juyoung Suk
Javier Aula-Blasco
Mano Aslan
Vu Trong Kim
Shayekh Bin Islam
Jaume Prats-Cristià
Lucía Tormo-Bañuelos
Seungone Kim
    ELMLRM
ArXiv (abs)PDFHTMLHuggingFace (1 upvotes)

Papers citing "MM-Eval: A Multilingual Meta-Evaluation Benchmark for LLM-as-a-Judge and Reward Models"

5 / 5 papers shown
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and Languages
BLEnD: A Benchmark for LLMs on Everyday Knowledge in Diverse Cultures and LanguagesNeural Information Processing Systems (NeurIPS), 2024
Junho Myung
Nayeon Lee
Yi Zhou
Jiho Jin
Rifki Afina Putri
...
Seid Muhie Yimam
Mohammad Taher Pilehvar
N. Ousidhoum
Jose Camacho-Collados
Alice Oh
512
119
0
17 Jan 2025
Better Instruction-Following Through Minimum Bayes Risk
Better Instruction-Following Through Minimum Bayes RiskInternational Conference on Learning Representations (ICLR), 2024
Ian Wu
Patrick Fernandes
Amanda Bertsch
Seungone Kim
Sina Pakazad
Graham Neubig
594
15
0
03 Oct 2024
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Uncertainty-aware Reward Model: Teaching Reward Models to Know What is Unknown
Xingzhou Lou
Dong Yan
Wei Shen
Yuzi Yan
Jian Xie
Junge Zhang
413
44
0
01 Oct 2024
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language ModelsNorth American Chapter of the Association for Computational Linguistics (NAACL), 2024
Seungone Kim
Juyoung Suk
Ji Yong Cho
Shayne Longpre
Chaeeun Kim
...
Sean Welleck
Graham Neubig
Moontae Lee
Kyungjae Lee
Minjoon Seo
ELMALMLM&MA
431
71
0
09 Jun 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Abigail Z. Jacobs
Tatsunori Hashimoto
ALM
458
605
0
06 Apr 2024
1
Page 1 of 1