arXiv:2409.11239
LLM-as-a-Judge & Reward Model: What They Can and Cannot Do
17 September 2024
Guijin Son, Hyunwoo Ko, Hoyoung Lee, Yewon Kim, Seunghyeok Hong
ALM, ELM
ArXiv (abs) · PDF · HTML · HuggingFace (2 upvotes) · GitHub (3★)

Papers citing "LLM-as-a-Judge & Reward Model: What They Can and Cannot Do"

12 / 12 papers shown
ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
Gaole Dai, Shiqi Jiang, Ting Cao, Yuqing Yang, Yuanchun Li, Rui Tan, Mo Li, Lili Qiu
LRM
169 · 1 · 0 · 26 Sep 2025

Shaping Explanations: Semantic Reward Modeling with Encoder-Only Transformers for GRPO
Francesco Pappone, Ruggero Marino Lazzaroni, Federico Califano, Niccolò Gentile, Roberto Marras
154 · 1 · 0 · 16 Sep 2025

Reinforcement Learning for Machine Learning Engineering Agents
Sherry Yang, Joy He-Yueya, Percy Liang
244 · 4 · 0 · 01 Sep 2025

Efficient Online RFT with Plug-and-Play LLM Judges: Unlocking State-of-the-Art Performance
Rudransh Agnihotri, Ananya Pandey
OffRL, ALM
267 · 1 · 0 · 06 Jun 2025

SkillVerse : Assessing and Enhancing LLMs with Tree Evaluation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yufei Tian, Jiao Sun, Nanyun Peng, Zizhao Zhang
176 · 1 · 0 · 31 May 2025

When AI Co-Scientists Fail: SPOT-a Benchmark for Automated Verification of Scientific Research
Guijin Son, Jiwoo Hong, Honglu Fan, Heejeong Nam, Hyunwoo Ko, ..., Jinyeop Song, Jinha Choi, Gonçalo Paulo, Youngjae Yu, Stella Biderman
420 · 15 · 0 · 17 May 2025

Heimdall: test-time scaling on the generative verification
Wenlei Shi, Xing Jin
LRM
499 · 25 · 0 · 14 Apr 2025

GROVE: A Generalized Reward for Learning Open-Vocabulary Physical Skill
Computer Vision and Pattern Recognition (CVPR), 2025
Jieming Cui, Tengyu Liu, Ziyu Meng, Jiale Yu, Ran Song, Wei Zhang, Yixin Zhu, Siyuan Huang
VLM
474 · 9 · 0 · 05 Apr 2025

OphthBench: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Ophthalmology
Chengfeng Zhou, Ji Wang, Juanjuan Qin, Yining Wang, Ling Sun, Weiwei Dai
LM&MA, ELM
465 · 1 · 0 · 03 Feb 2025

Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators
Yinhong Liu, Han Zhou, Zhijiang Guo, Ehsan Shareghi, Ivan Vulić, Anna Korhonen, Nigel Collier
ALM
1.2K · 148 · 0 · 20 Jan 2025

Generative Adversarial Reviews: When LLMs Become the Critic
Nicolas Bougie, Narimasa Watanabe
421 · 10 · 0 · 09 Dec 2024

Are We Done with MMLU?
Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, ..., R. McHardy, Joshua Harris, Jean Kaddour, Emile van Krieken, Pasquale Minervini
ELM
520 · 123 · 0 · 06 Jun 2024