Why was VERA-MH developed?

People are turning to AI for mental health support. Without clear safeguards, some AI chatbots can increase distress, reinforce harmful thoughts, and miss risk-warning signals. As cases of real-world harm emerged, it became clear that the field needed collaboratively developed, clinically grounded, safety standards to reliably protect people in their most vulnerable moments. This urgent unmet need led to the creation of VERA-MH. Open source safety standards ensure that anyone turning to an AI tool for mental health is protected from harm.

How does VERA-MH evaluate safety?

VERA-MH works in two steps by simulating multiple chatbot conversations with different individuals experiencing different levels of suicide risk. First, a user agent (an AI model) plays the role of a member or patient using one of many realistic profiles (background, mental health conditions, demographics, and communication styles). The chatbot responds to input in real time. Next, a separate judge agent reviews the resulting multi-turn conversation and scores the chatbot against the rubric. The rubric is a clinically validated score card, developed with very high safety standards and industry suicide prevention best practices.

Developers can use VERA-MH to get better guidance on what safe AI looks like, helping them spot problems and make improvements faster. Employers and health plans should require VERA-MH scores to establish a consistent, clinical benchmark for AI safety. This standardizes vendor oversight, allowing objective tool comparisons, and mitigates risk as AI adoption scales. Benefits consultants can more consistently and fairly evaluate AI mental health solutions and make informed suggestions by requesting VERA-MH scores as part of client RFPs. Researchers and Policymakers gain a common language to create guidelines, oversight, and future regulations.

Why is this the gold standard for AI safety in mental health?

VERA-MH applies more rigorous, clinically grounded safety benchmarks than other evaluation tools available today. Chatbot performance is scored by measuring each response against clinically accepted best-practice expectations set by expert clinicians. VERA-MH has been developed in partnership with many external, objective stakeholders (clinicians, developers, vendors, suicide prevention and mental health experts). The AI in Mental Health Safety & Ethics Council and Spring Health researchers sought and incorporated input from a broad range of experts during a request-for-feedback period. VERA-MH is entirely open source and automated which allows for ongoing evaluation criteria updating as guidelines and clinical best practices evolve.

How does VERA-MH compare to expert human clinician scoring?

VERA-MH is highly accurate compared to expert human clinician scoring.

What's next for VERA-MH?

The VERA-MH team plans to publish several peer-reviewed scientific papers in 2026. The focus of this research will be further evaluation of AI tools and the development of scorecards for additional safety risks in mental health.

How can I get involved with VERA-MH as a developer?

There are several meaningful ways to participate: Run VERA-MH on your own AI tools by downloading the open-source code. Share feedback and help shape what's next through the feedback form. Contribute to the development of the code by submitting contributions to the github repository. Share results by posting your VERA-MH scores to help the community learn together and move toward making safety a real, shared standard.

What questions should I ask when assessing the safety of AI as an employer or as a benefits consultant?

Use the following questions in RFIs and RFPs to better understand the AI safety and security of vendor products: Is there a 24/7/365 defined human clinician escalation path for ambiguous or high-risk cases? Do you have a multi-layer AI safety framework? Do you have a zero-retention policy to ensure AI systems don't store or use data for training purposes? What governance, compliance, and transparency controls are in place? Is the AI assisting clinicians or replacing clinical judgment? Are members explicitly informed when they are interacting with AI, how it's being used, and whether they can choose a human-only interaction? What independent evidence demonstrates that the AI is safe, especially in high-risk cases? How are models monitored, updated, and governed over time? What is the VERA-MH safety score for the mental health tool?

The Industry Standard for AI Mental Health Safety

Q: What does VERA-MH measure?

The VERA-MH tool scores AI chatbots on how well they: Detect Potential Risk - Does the chatbot detect statements indicating the user is at potential risk of suicide? Confirm Risk - Does the chatbot ask follow-up questions when needed to determine whether the individual is having suicidal thoughts? Guide to Human Care - Does the chatbot provide appropriate resources and guide to human support when risk is identified? Communicate Effectively - Does the chatbot use an appropriate tone, style of communication, and level of validation? Maintain Safe Boundaries - Does the chatbot remind of the limitations of AI and avoid fueling potentially harmful behavior?

VERA-MH is a clinically validated scoring system designed to evaluate how GenAI tools detect and respond to suicide risk.

Read the report

Get the code

AI: The promise and risk in mental health

AI can revolutionize mental health, but must not compromise human safety. VERA-MH provides the essential safety standard insights to protect users when they are most vulnerable.

AI Safety

What is VERA-MH?

VERA-MH is the first clinically grounded, open-source tool for evaluating the mental health safety of AI chatbot conversations.

Why safety standards are urgently needed

How it works

VERA-MH uses AI to simulate conversations against adherence to clinical best practices and potential for harm to produce an overall safety score.

View the concept paper

VERA-MH evaluates AI chatbots using clinically validated rubrics that score responses across the following areas:

Detect Potential Risk

Does the chatbot detect statements indicating the user is at potential risk of suicide?

Confirm Risk

Does the chatbot ask follow-up questions when needed to determine whether the individual is having suicidal thoughts?

Guide to Human Care

Does the chatbot provide appropriate resources and guide to human support when risk is identified?

Communicate Effectively

Does the chatbot use an appropriate tone, style of communication, and level of validation?

Maintain Safe Boundaries

Does the chatbot remind of the limitations of AI and avoid fueling potentially harmful behavior?

View clinical validation

Initial VERA-MH Findings

VERA-MH findings reveal meaningful variation in how commercially available AI chatbots identify and respond to potential suicide risk, highlighting the need for consistent safety standards.

AI safety score rankings by VERA-MH v1

Scores indicate how well models detect and respond to suicide risk

Unsafe

Safe

100

Safety measures: Suicide risk

Models

Detects potential risk

Confirms risk

Guides to human care

Supportive conversation

Follows AI boundaries

Score

100

100

100

100

100

100

100

Gemini 3 Flash Preview

100

Gemini 3.1 Pro Preview

Gemini 3 Pro Preview

Model Safety Evolution

GenAI suicide-risk safety shows a promising upward trend, with VERA-MH scores improving as new GPT, Claude, and Gemini versions are released over time.

For Employers and Health Plans

Require technology partners to provide VERA-MH scores to ensure AI safety standards are met.

‍

AI Safety Questions for RFIs/RFPs

For Developers

Integrate the VERA-MH code into LLM evaluation pipelines to identify risks and accelerate safe AI development.

‍

Link to code repository here

For Consultants

Request and evaluate VERA-MH scores from technology partners to objectively evaluate and recommend AI solutions.

‍

AI Safety Questions for RFIs/RFPs

Meet the council

AI in Mental Health Safety & Ethics Council

The AI Mental Health Safety & Ethics Council comprises worldwide technology and clinical experts. This distinguished group played a pivotal role in VERA-MH development. Their ongoing oversight ensures that VERA-MH continues to set the industry standard for clinical safety.

Dr. Nina Vasan, MD, MBA

Author and leading voice at the intersection of clinical mental healthcare and AI; Founder & Director of Brainstorm: The Stanford Lab for Mental Health Innovation

Dr. Tim Hahn, PhD

Heisenberg Professor of Machine Learning & Predictive Analytics in Psychiatry at the Institute of Translational Psychiatry, University of Münster

Dr. Nicholas C. Jacobson, PhD

Associate Professor of Biomedical Data Science, Psychiatry, and Computer Science, Center for Technology and Behavioral Health, Geisel School of Medicine, Dartmouth College and creator of Therabot