ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2502.16366
  4. Cited By
A generative approach to LLM harmfulness detection with special red flag tokens

A generative approach to LLM harmfulness detection with special red flag tokens

22 February 2025
Sophie Xhonneux
David Dobre
Mehrnaz Mohfakhami
Leo Schwinn
Gauthier Gidel
ArXivPDFHTML

Papers citing "A generative approach to LLM harmfulness detection with special red flag tokens"

1 / 1 papers shown
Title
Safety Pretraining: Toward the Next Generation of Safe AI
Safety Pretraining: Toward the Next Generation of Safe AI
Pratyush Maini
Sachin Goyal
Dylan Sam
Alex Robey
Yash Savani
Yiding Jiang
Andy Zou
Zacharcy C. Lipton
J. Zico Kolter
45
0
0
23 Apr 2025
1