ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context


9 July 2024
Victoria R. Li
Yida Chen
Naomi Saphra

Papers citing "ChatGPT Doesn't Trust Chargers Fans: Guardrail Sensitivity in Context"

3 / 3 papers shown
Does Refusal Training in LLMs Generalize to the Past Tense?
Maksym Andriushchenko
Nicolas Flammarion
42
27
0
16 Jul 2024
Dialect prejudice predicts AI decisions about people's character, employability, and criminality
Valentin Hofmann
Pratyusha Kalluri
Dan Jurafsky
Sharese King
75
38
0
01 Mar 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022