Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models


26 May 2025
Makesh Narsimhan Sreedhar
Traian Rebedea
Christopher Parisien
    LRM

Papers citing "Safety Through Reasoning: An Empirical Study of Reasoning Guardrail Models"

7 / 7 papers shown
X-Guard: Multilingual Guard Agent for Content Moderation
Bibek Upadhayay
Vahid Behzadan, Ph.D
57
2
0
11 Apr 2025
Reasoning-to-Defend: Safety-Aware Reasoning Can Defend Large Language Models from Jailbreaking
Junda Zhu
Lingyong Yan
Shuaiqiang Wang
Dawei Yin
Lei Sha
AAML
LRM
60
5
0
18 Feb 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
156
20
0
30 Jan 2025
Aegis2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh
Prasoon Varshney
Makesh Narsimhan Sreedhar
Aishwarya Padmakumar
Traian Rebedea
Jibin Rajan Varghese
Christopher Parisien
69
14
0
15 Jan 2025
Controllable Safety Alignment: Inference-Time Adaptation to Diverse Safety Requirements
Jingyu Zhang
Ahmed Elgohary
Ahmed Magooda
Daniel Khashabi
Benjamin Van Durme
359
3
0
11 Oct 2024
CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues
Makesh Narsimhan Sreedhar
Traian Rebedea
Shaona Ghosh
Jiaqi Zeng
Christopher Parisien
ALM
63
6
0
04 Apr 2024
A Holistic Approach to Undesired Content Detection in the Real World
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
34
228
0
05 Aug 2022