ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
  • Feedback
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2406.12624
  4. Cited By
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
v1v2v3v4v5 (latest)

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

18 June 2024
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
    ELMALM
ArXiv (abs)PDFHTMLHuggingFace (38 upvotes)

Papers citing "Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges"

4 / 104 papers shown
Title
TruthfulQA: Measuring How Models Mimic Human Falsehoods
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
300
2,192
0
08 Sep 2021
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELMRALM
705
5,210
0
07 Sep 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
1.5K
45,749
0
28 May 2020
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for
  Reading Comprehension
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
709
2,912
0
09 May 2017
Previous
123