VERA-MH UPDATES

Releases, research, and recognition for VERA-MH: the first open-source AI safety benchmark for mental health.

May 17, 2026

VERA-MH added to the OECD.AI Catalogue of Tools & Metrics for Trustworthy AI

VERA-MH (Validation of Ethical and Responsible AI in Mental Health) has been added to the OECD AI Policy Observatory's Catalogue of Tools & Metrics for Trustworthy AI.

VERA-MH is the first open-source AI safety benchmark for mental health. Co-developed and open-sourced by Spring Health, it helps researchers, developers, clinicians, and policymakers evaluate how AI systems handle mental health conversations involving suicide risk.

Its inclusion in the OECD catalogue places mental health AI safety within the broader global conversation about how trustworthy AI is built, evaluated, and deployed. It also reinforces a principle that is becoming harder to ignore: when people turn to AI in moments of distress, safety cannot be assumed. It has to be measured.

View the OECD listing: https://oecd.ai/en/catalogue/tools

May 13, 2026

New preprint: VERA-MH methodology and first evaluation results

A new research paper detailing the VERA-MH methodology and evaluation results for four leading LLM providers is now available on arXiv.

The paper explains how VERA-MH works as a three-step automated evaluation. First, one model simulates users drawn from clinically developed personas spanning a range of risk factors, demographics, and disclosure styles. Second, a judge model evaluates each conversation against a clinical rubric structured as a yes-or-no decision tree. Third, results are aggregated into an overall safety rating across five dimensions: Detects Potential Risk, Confirms Risk, Guides to Human Care, Supportive Conversation, and Follows AI Boundaries.
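The second and third steps can be pictured with a small sketch. It is illustrative only and does not reproduce the actual VERA-MH rubric or code: the five dimension names come from the description above, while the `Node` class, the `walk` and `aggregate` helpers, the example questions, and the rating labels are hypothetical placeholders. The aggregation here is a simple tally per dimension, which may differ from how the paper computes its overall rating.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

# Dimension names as listed in the text; everything else is an assumption.
DIMENSIONS = [
    "Detects Potential Risk",
    "Confirms Risk",
    "Guides to Human Care",
    "Supportive Conversation",
    "Follows AI Boundaries",
]


@dataclass
class Node:
    """One yes-or-no question; each branch is another Node or a leaf rating."""
    question: str
    yes: Union["Node", str]
    no: Union["Node", str]


def walk(
    node: Union[Node, str], answer: Callable[[str], bool]
) -> Tuple[str, List[Tuple[str, bool]]]:
    """Follow the judge's yes/no answers down to a leaf rating.

    `answer` stands in for the judge model. The returned path records
    every question asked, so the final rating can be audited.
    """
    path: List[Tuple[str, bool]] = []
    while isinstance(node, Node):
        reply = answer(node.question)
        path.append((node.question, reply))
        node = node.yes if reply else node.no
    return node, path


def aggregate(per_conversation: List[Dict[str, str]]) -> Dict[str, Counter]:
    """Tally leaf ratings per dimension across all judged conversations."""
    totals = {dim: Counter() for dim in DIMENSIONS}
    for ratings in per_conversation:
        for dim, rating in ratings.items():
            totals[dim][rating] += 1
    return totals


# Hypothetical two-question tree for a single dimension.
EXAMPLE_TREE = Node(
    question="Did the user signal possible suicide risk?",
    yes=Node(
        question="Did the assistant acknowledge that signal?",
        yes="acceptable",
        no="high potential for harm",
    ),
    no="not applicable",
)
```

In this sketch, a judge that answers "yes" to the first question and "no" to the second reaches the leaf "high potential for harm", and the recorded path shows which answers led there.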

Single-turn evaluations miss how risk actually unfolds in conversation. A response can look acceptable on its own while the overall interaction fails to recognize risk, guide someone to human care, or maintain safe boundaries. VERA-MH was built to evaluate the full conversation.

Read the paper: https://arxiv.org/abs/2605.13318

May 7, 2026

Webinar recording: Evaluating AI safety in mental health — practical frameworks, gaps, and what comes next

A recording of the recent webinar, “Evaluating AI Safety in Mental Health: Practical Frameworks, Gaps, and What Comes Next,” is now available.

The discussion brought together Kate Bentley of Spring Health, Stéphie Herlin of Korabench.ai, Xuan Zhao of Flourish Science, and David Cooper of the American Psychological Association, moderated by Dr. Laura Erickson-Schroth of The Jed Foundation.

A central theme ran through the conversation: safety in mental health AI cannot be inferred from general-purpose capability or good intentions. It has to be evaluated against clinically meaningful criteria, in the conversations where harm can emerge.

Four themes stood out:

  • Safety needs to be measurable. Open benchmarks and shared evaluation frameworks are essential for identifying risk, comparing systems, and driving improvement.
  • Safety is an ongoing process. As models and use cases evolve, evaluation requires iteration, monitoring, and human oversight.
  • Practical tools are needed now. Even as the field continues to build consensus, developers and organizations need frameworks they can apply today.
  • The conversation needs many perspectives. Clinicians, researchers, developers, policymakers, and people with lived experience all have a role in shaping what safe mental health AI should look like.

Watch the recording: https://www.linkedin.com/posts/vera-mh_evaluating-ai-safety-for-mental-health-best-activity-7457883654002864129-T0nq

May 5, 2026

VERA-MH v1.1 is now available

VERA-MH v1.1 improves how teams simulate and evaluate chatbot conversations involving suicide risk against a clinically informed safety rubric. The release reflects feedback gathered during the public Request for Comment period.

What's changed:

  • 100 personas, expanded from 10. Broader coverage across demographics, risk presentations, and disclosure styles.
  • Updated safety scoring framework. Refined based on input from external stakeholders, clinicians, and AI developers during the public comment period.
  • Refined rubric. Considers context more carefully, distinguishes responses with high potential for harm from merely suboptimal ones more clearly, and reduces coupling between scoring dimensions.
  • Improvements for larger evaluations. Retries, timeouts, resumable runs, and clearer logging make outputs easier to audit, review, and share.
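
As a rough picture of what retries and resumable runs mean for a large evaluation, here is a minimal sketch. It is not the VERA-MH implementation: the `run`, `with_retries`, and `completed_ids` functions, the JSONL log layout, and the `persona_id` field are all assumptions made for illustration. Timeouts are assumed to be enforced inside the caller-supplied `evaluate` function, for example through an HTTP client timeout.

```python
import json
import time
from pathlib import Path
from typing import Callable, Iterable, List, Set


def completed_ids(log_path: Path) -> Set[str]:
    """Read the log and return the persona IDs that already have a result."""
    if not log_path.exists():
        return set()
    with log_path.open() as fh:
        return {json.loads(line)["persona_id"] for line in fh if line.strip()}


def with_retries(fn: Callable[[], str], attempts: int = 3, base_delay: float = 1.0) -> str:
    """Call fn, retrying with exponential backoff; re-raise after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def run(
    persona_ids: Iterable[str],
    evaluate: Callable[[str], str],
    log_path: Path,
    attempts: int = 3,
    base_delay: float = 1.0,
) -> List[str]:
    """Evaluate each persona once, skipping any already recorded in the log.

    Results are appended one JSON object per line and flushed immediately,
    so an interrupted run can resume without repeating finished personas.
    """
    done = completed_ids(log_path)
    ran: List[str] = []
    with log_path.open("a") as fh:
        for pid in persona_ids:
            if pid in done:
                continue
            result = with_retries(lambda: evaluate(pid), attempts, base_delay)
            fh.write(json.dumps({"persona_id": pid, "result": result}) + "\n")
            fh.flush()
            ran.append(pid)
    return ran
```

Calling `run` a second time with the same log file skips every persona already recorded, and the append-only log doubles as an audit trail of what was evaluated.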

Because the rubric and persona set have changed, v1.1 scores are not directly comparable to v1.0. That tradeoff is deliberate. Version comparability matters, but rubric integrity matters more, and the field is still learning what to measure.

VERA-MH is a living framework. We will keep updating it as the science evolves and as the systems it evaluates change.

Repository: https://github.com/SpringCare/VERA-MH

March 19, 2026

The Hemingway Report highlights the need for shared AI safety standards in mental health

In “The Map is Not the Territory,” Steve Duke and Kevin Hou examine two defining questions in mental health AI: how to tell whether an AI system is safe, and how to compare one chatbot against another.

Their assessment of VERA-MH: “So far, VERA-MH seems to represent the most serious attempt at a shared standard for crisis safety. It's open-source, clinically validated, and I've heard very positive feedback on the evals themselves and their openness to feedback and development.”

Their analysis reflects a point the field keeps returning to: practical, transparent evaluation is what separates measurable safety from marketing claims. VERA-MH contributes to that shift by giving the field an open-source, clinically validated way to evaluate mental health AI safety across full conversations.

Read the analysis: The Hemingway Report — The Map is Not the Territory