108
v1v2 (latest)

KOTOX: A Korean Toxic Dataset for Deobfuscation and Detoxification

Main:8 Pages
11 Figures
Bibliography:2 Pages
25 Tables
Appendix:16 Pages
Abstract

Online communication increasingly amplifies toxic language, and recent research actively explores methods for detecting and rewriting such content. Existing studies primarily focus on non-obfuscated text, which limits robustness in the situation where users intentionally disguise toxic expressions. In particular, Korean allows toxic expressions to be easily disguised through its agglutinative characteristic. However, obfuscation in Korean remains largely unexplored, which motivates us to introduce a KOTOX: Korean toxic dataset for deobfuscation and detoxification. We categorize Korean obfuscation patterns into linguistically grounded classes and define transformation rules derived from real-world examples. Using these rules, we provide paired neutral and toxic sentences alongside their obfuscated counterparts. Models trained on our dataset better handle obfuscated text without sacrificing performance on non-obfuscated text. This is the first dataset that simultaneously supports deobfuscation and detoxification for the Korean language. We expect it to facilitate better understanding and mitigation of obfuscated toxic content in LLM for Korean. Our code and data are available atthis https URL.

View on arXiv
Comments on this paper