A study examined how major artificial intelligence chatbots respond to suicide related questions. The investigation singled out a particular chatbot and discussed how it sometimes provided direct answers even to high-risk prompts, raising significant safety and ethical concerns. Researchers from RAND Institute, Harvard University, and Brigham Young University conducted the study. Their paper was published on 26 August 2025 in Psychiatry Online.
The study evaluated ChatGPT running on GPT-4o Mini, Claude running on Sonnet 3.5, and Google Gemini 1.5 Pro. 30 carefully designed questions about suicide were used. Each question was categorized according to its risk level and was run 1000 times per platform, generating 9000 responses overall. The researchers then assessed whether these generative AI chatbots delivered direct informative answers or deflected questions. Below are the main findings:
• Performance at Low Risk: ChatGPT and Claude consistently provided direct and accurate answers to very low-risk questions, such as identifying suicide rates by state. Both systems responded with clear information in 100 percent of trials. This was indicative of strong reliability in this category.
• Responses at Very High Risk: Moreover, when presented with very high-risk prompts or queries, such as those asking how to ensure suicide attempts succeed, none of the tested AI chatbots offered direct answers. This showed that built-in safeguards can effectively block some of the most dangerous queries.
• Behavior at Immediate Risk: Responses were inconsistent at intermediate risk. ChatGPT answered 78 percent of these questions directly. It sometimes provided information that could be considered harmful. Claude and Gemini had fluctuating behavior. Both sometimes supply answers and at other times refuse.
• About Lethality Information: ChatGPT and Claude both supplied information about lethal methods, including details about poisons most associated with completed suicide. Most concerning, ChatGPT was the only model that explicitly explained how to tie a noose, raising safety and ethical concerns.
• Therapeutic Advice Response: Note that ChatGPT often declined to provide resources when asked therapeutic low-risk questions. Prompts about support services or online help often received vague or no answers. This demonstrated a gap in offering protective and supportive guidance to users.
Lead author Ryan McBain from RAND Institute highlighted that developers of generative AI chatbots must develop and implement stronger measures to protect their users. He specifically recommended creating clinician-informed benchmarks for all levels of suicide risk, connecting users directly to crisis hotlines, and providing explicit or clear links to professional therapy resources instead of vague disclaimers or refusals to respond.
The urgency of these recommendations was underscored by the case of 16-year-old Adam Raine, who died by suicide after long interactions with ChatGPT. His family has filed a wrongful death lawsuit against OpenAI, arguing that the system contributed to his death by validating and facilitating his plans. Other independent research groups have reported similar problems. These issues reflect broader problems across the artificial intelligence industry.
It is worth noting the particular AI models tested. OpenAI has rolled out its new large language model called GPT-5. The researchers tested ChatGPT running on the older GPT-4 Mini model. OpenAI has also provided several safeguards. The newer model powering ChatGPT now does not provide direct answers to complex personal issues involving emotional distress, relationships, mental health, or high-stakes and emotionally personal decisions.
FURTHER READING AND REFERENCE
- McBain, R. K., Cantor, J. H., Zhang, L. A., Baker, O., Zhang, F., Burnett, A., Kofner, A., Breslau, J., Stein, B. D., Mehrotra, A., and Yu, H. 2025. “Evaluation of Alignment Between Large Language Models and Expert Clinicians in Suicide Risk Assessment.” Psychiatric Services. DOI: 1176/appi.ps.20250086