KoACD: The First Korean Adolescent Dataset for Cognitive Distortion Analysis

Cognitive distortions are negative thinking patterns that can lead to mental health issues such as depression and anxiety in adolescents. Previous natural language processing (NLP) studies have focused mainly on small-scale adult datasets, with limited research on adolescents. This study introduces KoACD, the first large-scale dataset of cognitive distortions in Korean adolescents, containing 108,717 instances. We applied a multi-LLM (large language model) negotiation method to refine distortion classification and generate synthetic data using two approaches: cognitive clarification for textual clarity and cognitive balancing for diverse distortion representation. Validation through LLMs and expert evaluations showed that LLMs could classify distortions with explicit markers but struggled with context-dependent reasoning, where human evaluators demonstrated higher accuracy. KoACD aims to support future research on cognitive distortion detection.
@article{kim2025_2505.00367,
  title={KoACD: The First Korean Adolescent Dataset for Cognitive Distortion Analysis},
  author={JunSeo Kim and HyeHyeon Kim},
  journal={arXiv preprint arXiv:2505.00367},
  year={2025}
}