2403.13213
Cited By
From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
20 March 2024
Khaoula Chehbouni
Megha Roshan
Emmanuel Ma
Futian Andrew Wei
Afaf Taik
Jackie CK Cheung
G. Farnadi
Papers citing "From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards"
4 / 4 papers shown
Title
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni
Jonathan Colaço-Carr
Yash More
Jackie CK Cheung
G. Farnadi
71
0
0
12 Nov 2024
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang
Chengzhi Hu
Paul Röttger
Barbara Plank
46
5
0
04 Oct 2024
Annotation alignment: Comparing LLM and human annotations of conversational safety
Rajiv Movva
Pang Wei Koh
Emma Pierson
ALM
27
3
0
10 Jun 2024
BBQ: A Hand-Built Bias Benchmark for Question Answering
Alicia Parrish
Angelica Chen
Nikita Nangia
Vishakh Padmakumar
Jason Phang
Jana Thompson
Phu Mon Htut
Sam Bowman
210
364
0
15 Oct 2021