Can Large Language Models (LLMs) and text-embedding models detect depression and suicide risk based on Sentence Completion Test (SCT) narratives of psychiatric patients?

Mental health assessment still relies heavily on clinical interviews and subjective judgment. With the recent development of LLMs, researchers are increasingly exploring whether Artificial Intelligence (AI) can support early detection of psychiatric symptoms through language analysis. AI-assisted approaches are particularly well suited to evaluating mental health outcomes from verbal and narrative information.

To further explore the potential of AI-based clinical tools, Lho et al. (2026) analyzed whether LLMs and text-embedding Machine Learning (ML) models can identify clinically significant depression and suicide risk from written narratives produced by psychiatric patients. The study focused specifically on narratives generated through the Sentence Completion Test (SCT), a semistructured psychological assessment in which participants complete unfinished sentences related to self-concept, interpersonal relationships, family, and gender perception. This investigation is closely related to ALENTAR-J-CM’s mission: preventing mental health problems and suicide among adolescents and youth by using high-tech, ethical AI-based tools.

In this study, SCT data from 1,064 Korean-speaking psychiatric patients were analyzed, comprising more than 52,000 narrative responses. Depression and suicide severity were defined using validated clinical scales (e.g., the Beck Depression Inventory-II [BDI-II] and the Zung Self-Rating Depression Scale [SDS]). Several AI approaches were compared, including GPT-4o, GPT-3.5 Turbo, Gemini 1.0 Pro, and embedding-based ML models combined with classification algorithms (e.g., Support Vector Machines [SVM], Logistic Regression, and Extreme Gradient Boosting [XGB]).
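To make the embedding-plus-classifier idea more concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the three-dimensional vectors are hypothetical stand-ins for real text embeddings (which have far more dimensions), the labels are invented, and a plain logistic regression trained by gradient descent is swapped in for the library implementations a real study would use. All function and variable names are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, weights, bias):
    """Probability that vector x belongs to the positive class."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic_regression(vectors, labels, lr=0.5, epochs=500):
    """Fit weights and bias with batch gradient descent on the log loss."""
    dim = len(vectors[0])
    weights = [0.0] * dim
    bias = 0.0
    n = len(vectors)
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(vectors, labels):
            err = predict_proba(x, weights, bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Hypothetical toy "embeddings": one vector per narrative (invented values).
toy_vectors = [
    [0.9, 0.1, 0.2],
    [0.8, 0.2, 0.1],
    [0.7, 0.2, 0.2],
    [0.2, 0.9, 0.7],
    [0.1, 0.8, 0.9],
    [0.2, 0.7, 0.8],
]
# Hypothetical labels: 1 = above a clinical cutoff, 0 = below it.
toy_labels = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic_regression(toy_vectors, toy_labels)
toy_scores = [predict_proba(x, weights, bias) for x in toy_vectors]
```

In practice, each narrative would first be converted into a vector by an embedding model, and the classifier would be evaluated on held-out patients rather than on its own training data.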

Adequate performance of LLMs in detecting depression and suicide risk

Overall, Lho et al. (2026) observed that both LLMs and embedding-based ML models were able to detect depression and suicide risk with relatively strong performance. Most models achieved area under the receiver operating characteristic curve (AUROC) values above 0.70, suggesting meaningful discriminative capacity. Among the LLMs:

GPT-4o achieved the highest performance with AUROC values around 0.73.
Gemini showed similar results.
GPT-3.5 performed less effectively, although few-shot prompting improved its accuracy.

The best overall performance came from embedding-based ML models. In particular, the “text-embedding-3-large” model combined with XGB achieved an AUROC of 0.841 and accuracy above 82%.
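For readers less familiar with AUROC, the metric can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case (0.5 is chance level, 1.0 is perfect ranking). The sketch below computes it directly from that definition. The labels and scores are invented for illustration and are not data from the study; the function name is our own.

```python
def auroc(labels, scores):
    """AUROC as the share of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented example: 1 = at-risk label, scores = hypothetical model outputs.
example_labels = [1, 1, 1, 0, 0, 0]
example_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
example_auroc = auroc(example_labels, example_scores)
```

This pairwise version is easy to follow but scales quadratically with sample size; statistical libraries use faster rank-based implementations that give the same result.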

One of the most important findings was that self-concept narratives yielded the strongest predictive performance. Narratives involving guilt, self-image, future expectations, and personal identity appeared especially informative for detecting depression and suicide risk.

Additionally, the qualitative analyses revealed that AI performance depended not only on narrative content, but also on how participants expressed themselves. For instance, patients who openly expressed pessimism and negative self-perception were more accurately classified. Conversely, defensive, superficial, or emotionally restricted narratives reduced model performance. This suggests that AI models may struggle when patients intentionally minimize distress or provide limited emotional information.

Clinical implications and future directions for LLMs in mental health assessment

This study highlights the potential of AI-assisted mental health screening tools based on natural language analysis. According to the authors, these tools can support the early detection of depression and suicide risk, which facilitates precision psychiatry approaches. Furthermore, LLM-based systems could assist clinicians in decision-making, thereby complementing traditional clinical assessment.

Despite its cross-sectional design and methodological limitations (a psychiatric sample, and self-reported measures instead of structured clinical diagnostic measures), this study provides strong evidence that LLMs and text-embedding models can meaningfully detect depression and suicide risk from patient narratives. The findings reinforce the growing role of AI and computational psychiatry in mental health assessment.

Nevertheless, the authors stress the importance of treating these technologies as supportive tools rather than replacements for clinical judgment. They also argue that further improvements in accuracy, explainability, safety, and ethical governance are necessary before real-world clinical implementation.

Read the full text

We really enjoyed this research study, as it shares ALENTAR-J-CM’s interests. If you would like to know more about the study, here you can find the link to the article!
