New study shows OpenAI's GPT-5.5 failed to outperform o3 on law school exams

Why it matters

University of Maryland law professors have found that OpenAI's GPT-5.5 did not meaningfully outperform its predecessor, o3, on law school final exams, a finding that challenges the assumption that newer AI models consistently improve.

Seven faculty members from the University of Maryland Francis King Carey School of Law graded both models using the same criteria applied to their students. In spring 2025, o3 earned A+ grades in Constitutional Law, Professional Responsibility, and Property Law, with grades ranging from A+ to B across eight exams. When the researchers tested GPT-5.5 the following spring using its "xhigh" reasoning effort setting, results showed only marginal gains: two A+s, three As, and a B+. The newer model demonstrated no clear superiority.

The study extends a multi-year tracking effort that began in 2022 with GPT-3.5, which scored mostly C's and D's, and continued through GPT-4, which passed the bar exam at the 90th percentile. The plateau in performance, despite increased computational resources and advanced reasoning features, suggests that AI progress on legal benchmarks may be stagnating.

Attorneys and legal institutions should monitor this finding because it complicates the narrative around AI capability scaling. If performance gains on complex legal reasoning tasks have genuinely plateaued, claims about AI readiness for high-stakes professional work warrant skepticism. The results may influence how courts, bar associations, and law firms evaluate and deploy AI tools in the years ahead.
