arXiv:2409.11239
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
17 September 2024
Guijin Son, Hyunwoo Ko, Hoyoung Lee, Yewon Kim, Seunghyeok Hong
ALM, ELM
ArXiv (abs) · PDF · HTML · HuggingFace (2 upvotes) · GitHub (3★)

Papers citing "LLM-as-a-Judge & Reward Model: What They Can and Cannot Do"

12 / 12 papers shown
ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
Gaole Dai, Shiqi Jiang, Ting Cao, Yuqing Yang, Yuanchun Li, Rui Tan, Mo Li, Lili Qiu
LRM
169 · 1 · 0 · 26 Sep 2025

Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
Francesco Pappone, Ruggero Marino Lazzaroni, Federico Califano, Niccolò Gentile, Roberto Marras
154 · 1 · 0 · 16 Sep 2025

Reinforcement Learning for Machine Learning Engineering Agents
Sherry Yang, Joy He-Yueya, Percy Liang
244 · 4 · 0 · 01 Sep 2025

Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Rudransh Agnihotri, Ananya Pandey
OffRL, ALM
267 · 1 · 0 · 06 Jun 2025

SkillVerse : Assessing and Enhancing LLMs with Tree Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yufei Tian, Jiao Sun, Nanyun Peng, Zizhao Zhang
176 · 1 · 0 · 31 May 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, ..., Jinyeop Song, Jinha Choi, Gonçalo Paulo, Youngjae Yu, Stella Biderman
420 · 15 · 0 · 17 May 2025

Heimdall: test-time scaling on the generative verification
Wenlei Shi, Xing Jin
LRM
499 · 25 · 0 · 14 Apr 2025

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Computer Vision and Pattern Recognition (CVPR), 2025
Jieming Cui, Tengyu Liu, Ziyu Meng, Jiale Yu, Ran Song, Wei Zhang, Yixin Zhu, Siyuan Huang
VLM
474 · 9 · 0 · 05 Apr 2025

OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
Chengfeng Zhou, Ji Wang, Juanjuan Qin, Yining Wang, Ling Sun, Weiwei Dai
LM&MA, ELM
465 · 1 · 0 · 03 Feb 2025

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier
ALM
1.2K · 148 · 0 · 20 Jan 2025

Generative Adversarial Reviews: When LLMs Become the Critic
Nicolas Bougie, Narimasa Watanabe
421 · 10 · 0 · 09 Dec 2024

Are We Done with MMLU?
Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, ..., R. McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini
ELM
520 · 123 · 0 · 06 Jun 2024