Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
Hongyu Chen, Seraphina Goldfarb-Tarrant
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
arXiv:2503.09347 · 12 March 2025
Papers citing "Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts" (6 papers)
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Dacheng Tao
AAML · 31 Oct 2025

A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users
Nishant Balepur, Matthew Shu, Yoo Yeon Sung, Seraphina Goldfarb-Tarrant, Shi Feng, Fumeng Yang, Rachel Rudinger, Jordan L. Boyd-Graber
23 Sep 2025

Neither Valid nor Reliable? Investigating the Use of LLMs as Judges
Khaoula Chehbouni, Mohammed Haddou, Jackie CK Cheung, G. Farnadi
LLMAG · 25 Aug 2025

PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li, Teerth Patel, Xinya Du
LLMAG, ALM · 03 Jan 2025

From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li, Bohan Jiang, Liangjie Huang, Alimohammad Beigi, Chengshuai Zhao, ..., Canyu Chen, Tianhao Wu, Kai Shu, Lu Cheng, Huan Liu
ELM, AILaw · 25 Nov 2024

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges
Aman Singh Thakur, Kartik Choudhary, Venkat Srinik Ramayapally, Sankaran Vaidyanathan, Dieuwke Hupkes
ELM, ALM · 18 Jun 2024