LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge

11 June 2025
Songze Li
Chuokun Xu
Jiaying Wang
Xueluan Gong
Chen Chen
J. Zhang
Jun Wang
K. Lam
R. Beyah
AAML, ELM

Papers citing "LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge"

4 / 4 papers shown
KEO: Knowledge Extraction on OMIn via Knowledge Graphs and RAG for Safety-Critical Aviation Maintenance
Kuangshi Ai
Jonathan A. Karr Jr.
Meng Jiang
Nitesh Chawla
Chaoli Wang
10 Apr 2026
Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions
Emi Soroka
Tanmay Chopra
Krish Desai
Sanjay Lall
ALM
04 Nov 2025
Distractor Injection Attacks on Large Reasoning Models: Characterization and Defense
Zhehao Zhang
Weijie Xu
Shixian Cui
Chandan K. Reddy
AAML, LRM
17 Oct 2025
Knowledge-Graph Based RAG System Evaluation Framework
Sicheng Dong
Vahid Zolfaghari
Nenad Petrovic
Alois C. Knoll
02 Oct 2025