2407.06866
Cited By
ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context
9 July 2024
Victoria R. Li
Yida Chen
Naomi Saphra
Papers citing
"ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context"
3 / 3 papers shown
Title
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
42
27
0
16 Jul 2024
Dialect prejudice predicts AI decisions about people's character, employability, and criminality
Valentin Hofmann
Pratyusha Kalluri
Dan Jurafsky
Sharese King
75
38
0
01 Mar 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022