arXiv:2505.19690
Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models
26 May 2025
Baihui Zheng
Boren Zheng
Kerui Cao
Y. Tan
Zhendong Liu
Weixun Wang
Jiaheng Liu
Jian Yang
Wenbo Su
Xiaoyong Zhu
Bo Zheng
Kaifu Zhang
ELM
Papers citing
"Beyond Safe Answers: A Benchmark for Evaluating True Risk Awareness in Large Reasoning Models"
16 / 16 papers shown
ReasoningGuard: Safeguarding Large Reasoning Models with Inference-time Safety Aha Moments
Yuquan Wang
Mi Zhang
Yining Wang
Geng Hong
Xiaoyu You
Min Yang
LRM
30
0
0
06 Aug 2025
Safety in Large Reasoning Models: A Survey
Cheng Wang
Teli Ma
Yangqiu Song
Duzhen Zhang
Hao Sun
...
Shengju Yu
Xinfeng Li
Junfeng Fang
Jiaheng Zhang
Bryan Hooi
LRM
625
26
0
24 Apr 2025
RealSafe-R1: Safety-Aligned DeepSeek-R1 without Compromising Reasoning Capability
Yuanhang Zhang
Zihao Zeng
Dongbai Li
Yao Huang
Zhijie Deng
Yinpeng Dong
LRM
139
22
0
14 Apr 2025
ShadowCoT: Cognitive Hijacking for Stealthy Reasoning Backdoors in LLMs
Gejian Zhao
Hanzhou Wu
Xinpeng Zhang
Athanasios V. Vasilakos
LRM
111
5
0
08 Apr 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
242
70
0
14 Mar 2025
Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan
Yilei Jiang
Yongbin Li
Qingbin Liu
Xingyuan Bu
Wenbo Su
Xiangyu Yue
Xiaoyong Zhu
Bo Zheng
ALM
203
10
0
17 Feb 2025
SafeChain: Safety of Language Models with Long Chain-of-Thought Reasoning Capabilities
Fengqing Jiang
Zhangchen Xu
Yuetai Li
Luyao Niu
Zhen Xiang
Yue Liu
Bill Yuchen Lin
Radha Poovendran
KELM
ELM
LRM
181
40
0
17 Feb 2025
Safety Evaluation of DeepSeek Models in Chinese Contexts
Wenjing Zhang
Xuejiao Lei
Zhaoxiang Liu
Rongjia Du
Zhenhong Long
...
Jiaojiao Zhao
Minjie Hua
Chaoyang Ma
Kai Wang
Kai Wang
ELM
268
10
0
16 Feb 2025
OverThink: Slowdown Attacks on Reasoning LLMs
A. Kumar
Jaechul Roh
A. Naseh
Marzena Karpinska
Mohit Iyyer
Amir Houmansadr
Eugene Bagdasarian
LRM
253
35
0
04 Feb 2025
Trading Inference-Time Compute for Adversarial Robustness
Wojciech Zaremba
Evgenia Nitishinskaya
Boaz Barak
Stephanie Lin
Sam Toyer
...
Rachel Dias
Eric Wallace
Kai Y. Xiao
Johannes Heidecke
Amelia Glaese
LRM
AAML
212
38
0
31 Jan 2025
GuardReasoner: Towards Reasoning-based LLM Safeguards
Yue Liu
Hongcheng Gao
Shengfang Zhai
Jun Xia
Tianyi Wu
Zhiwei Xue
Yuxiao Chen
Kenji Kawaguchi
Jiaheng Zhang
Bryan Hooi
AI4TS
LRM
370
34
0
30 Jan 2025
Early External Safety Testing of OpenAI's o3-mini: Insights from the Pre-Deployment Evaluation
Aitor Arrieta
Miriam Ugarte
Pablo Valle
José Antonio Parejo
Sergio Segura
ELM
LRM
108
10
0
29 Jan 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
525
3,182
0
22 Jan 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zihao Huang
Ziyao Xu
Zhiyong Yang
Zonghan Yang
Zongyu Lin
OffRL
ALM
AI4TS
VLM
LRM
500
473
0
22 Jan 2025
OR-Bench: An Over-Refusal Benchmark for Large Language Models
Justin Cui
Wei-Lin Chiang
Ion Stoica
Cho-Jui Hsieh
ALM
261
66
0
31 May 2024
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
David Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
452
311
0
20 Oct 2023