Output Length Effect on DeepSeek-R1's Safety in Forced Thinking

2 March 2025

Abstract

Large Language Models (LLMs) have demonstrated strong reasoning capabilities, but their safety under adversarial conditions remains a challenge. This study examines the impact of output length on the robustness of DeepSeek-R1, particularly in Forced Thinking scenarios. We analyze responses across various adversarial prompts and find that while longer outputs can improve safety through self-correction, certain attack types exploit extended generations. Our findings suggest that output length should be dynamically controlled to balance reasoning effectiveness and security. We propose reinforcement learning-based policy adjustments and adaptive token length regulation to enhance LLM safety.


@article{li2025_2503.01923,
  title={Output Length Effect on DeepSeek-R1's Safety in Forced Thinking},
  author={Xuying Li and Zhuo Li and Yuji Kosuga and Victor Bian},
  journal={arXiv preprint arXiv:2503.01923},
  year={2025}
}
