The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support

Abstract

Can small language models with 0.5B to 5B parameters meaningfully engage in trauma-informed, empathetic dialogue for individuals with PTSD? We address this question by introducing TIDE, a dataset of 10,000 two-turn dialogues spanning 500 diverse PTSD client personas and grounded in a three-factor empathy model: emotion recognition, distress normalization, and supportive reflection. All scenarios and reference responses were reviewed for realism and trauma sensitivity by a clinical psychologist specializing in PTSD. We evaluate eight small language models before and after fine-tuning, comparing their outputs to a frontier model (Claude Sonnet 3.5). Our IRB-approved human evaluation and automatic metrics show that fine-tuning generally improves perceived empathy, but gains are highly scenario- and user-dependent, with smaller models facing an empathy ceiling. Demographic analysis shows older adults value distress validation and graduate-educated users prefer nuanced replies, while gender effects are minimal. We highlight the limitations of automatic metrics and the need for context- and user-aware system design. Our findings, along with the planned release of TIDE, provide a foundation for building safe, resource-efficient, and ethically sound empathetic AI to supplement, not replace, clinical mental health care.

@article{bn2025_2505.15065,
  title={The Pursuit of Empathy: Evaluating Small Language Models for PTSD Dialogue Support},
  author={Suhas BN and Yash Mahajan and Dominik Mattioli and Andrew M. Sherrill and Rosa I. Arriaga and Chris W. Wiese and Saeed Abdullah},
  journal={arXiv preprint arXiv:2505.15065},
  year={2025}
}