New study shows OpenAI's GPT-5.5 failed to outperform o3 on law school exams

Published June 29, 2026

Seven faculty members from the University of Maryland Francis King Carey School of Law graded both models using the same criteria applied to their students. In spring 2025, o3 earned A+ grades in Constitutional Law, Professional Responsibility, and Property Law, with grades ranging from A+ to B across eight exams. When the researchers tested GPT-5.5 the following spring using its "xhigh" reasoning effort setting, results showed only marginal gains: two A+s, three As, and a B+. The newer model demonstrated no clear superiority.

The study extends a multi-year tracking effort that began with GPT-3.5 in 2022, which scored mostly Cs and Ds, and continued through GPT-4, which passed the bar exam at the 90th percentile. The plateau in performance, despite increased computational resources and advanced reasoning features, suggests potential stagnation in AI progress on legal benchmarks.

Attorneys and legal institutions should monitor this finding as it complicates the narrative around AI capability scaling. If performance gains on complex legal reasoning tasks have genuinely plateaued, claims about AI readiness for high-stakes professional work warrant skepticism. The results may influence how courts, bar associations, and law firms evaluate and deploy AI tools in the years ahead.

