2406.12624
Cited By
v5 (latest)
Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
18 June 2024
Aman Singh Thakur
Kartik Choudhary
Venkat Srinik Ramayapally
Sankaran Vaidyanathan
Dieuwke Hupkes
ELM
ALM
HuggingFace (38 upvotes)
Papers citing "Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges"
4 / 104 papers shown
Title
TruthfulQA: Measuring How Models Mimic Human Falsehoods
Stephanie C. Lin
Jacob Hilton
Owain Evans
HILM
300
2,192
0
08 Sep 2021
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
Basel Alomair
Jacob Steinhardt
ELM
RALM
705
5,210
0
07 Sep 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
1.5K
45,749
0
28 May 2020
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
709
2,912
0
09 May 2017