Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models

Post-Traumatic Stress Disorder (PTSD) remains underdiagnosed in clinical settings, creating an opportunity for automated detection methods to identify at-risk patients. This study evaluates natural language processing approaches for detecting PTSD from clinical interview transcripts. We compared general and mental health-specific transformer models (BERT/RoBERTa), embedding-based methods (SentenceBERT/LLaMA), and large language model prompting strategies (zero-shot/few-shot/chain-of-thought) using the DAIC-WOZ dataset. Domain-specific models significantly outperformed general models (Mental-RoBERTa F1=0.643 vs. RoBERTa-base 0.485). LLaMA embeddings with neural networks achieved the highest performance (F1=0.700). Zero-shot prompting using DSM-5 criteria yielded competitive results without any training data (F1=0.657). Performance varied significantly with symptom severity and comorbidity status: accuracy was higher for severe PTSD cases and for patients with comorbid depression. Our findings highlight the potential of domain-adapted embeddings and LLMs for scalable screening, while underscoring the need for improved detection of nuanced presentations. They also offer insights for developing clinically viable AI tools for PTSD assessment.
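The zero-shot approach described above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual prompt: the criterion summaries, function names, and answer format are assumptions, and the DSM-5 cluster descriptions are abbreviated paraphrases.

```python
# Illustrative sketch of zero-shot PTSD screening via DSM-5 criteria.
# The prompt wording and label parsing are hypothetical, not taken
# from the paper.

DSM5_CLUSTERS = [
    "B. Intrusion symptoms (e.g., distressing memories, nightmares, flashbacks)",
    "C. Persistent avoidance of trauma-related stimuli",
    "D. Negative alterations in cognition and mood",
    "E. Marked alterations in arousal and reactivity (e.g., hypervigilance)",
]

def build_zero_shot_prompt(transcript: str) -> str:
    """Assemble a single zero-shot prompt asking an LLM to judge
    whether the interviewee likely meets DSM-5 PTSD criteria."""
    criteria = "\n".join(f"- {c}" for c in DSM5_CLUSTERS)
    return (
        "You are a clinical screening assistant. Using the DSM-5 PTSD "
        "symptom clusters below, decide whether the interviewee likely "
        "meets criteria for PTSD. Answer with exactly 'YES' or 'NO'.\n\n"
        f"DSM-5 criteria:\n{criteria}\n\n"
        f"Interview transcript:\n{transcript}\n\n"
        "Answer:"
    )

def parse_label(model_output: str) -> bool:
    """Map a raw model completion onto a binary screening label."""
    return model_output.strip().upper().startswith("YES")
```

In practice the prompt would be sent to an instruction-tuned LLM and the parsed labels scored against clinician-assigned diagnoses (e.g., with F1, as reported in the abstract).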
@article{chen2025_2504.01216,
  title={Detecting PTSD in Clinical Interviews: A Comparative Analysis of NLP Methods and Large Language Models},
  author={Feng Chen and Dror Ben-Zeev and Gillian Sparks and Arya Kadakia and Trevor Cohen},
  journal={arXiv preprint arXiv:2504.01216},
  year={2025}
}