On Calibration of LLM-based Guard Models for Reliable Content Moderation

14 October 2024

Papers citing "On Calibration of LLM-based Guard Models for Reliable Content Moderation"

Title
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun-Xiong Xia, Tianyi Wu, Zhiwei Xue, Y. Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi
AI4TS, LRM | 106 | 13 | 0 | 30 Jan 2025