AI Reasoning Benchmarks

3 entries in Legal Intelligence Tracker

3 Contributing Entries

New study shows OpenAI's GPT-5.5 failed to outperform o3 on law school exams

University of Maryland law professors have found that OpenAI's GPT-5.5 did not meaningfully outperform its predecessor, o3, on law school final exams—a finding that challenges assumptions about consistent improvement in newer AI models.

UN independent panel warns unchecked AI progress poses catastrophic risks

On July 1, 2026, the UN's Independent International Scientific Panel on Artificial Intelligence released a preliminary report warning that unregulated AI development is outpacing both scientific understanding and government policy, with no guarantee against catastrophic harm. Led by UN Secretary-General António Guterres and computer scientist Yoshua Bengio, the panel identified specific risks: loss of control over autonomous systems, deceptive AI behaviors, and exploitation for fraud, cyberattacks, and biological threats. The report notes that AI already demonstrates expert-level reasoning in mathematics and science, and that the complexity of tasks it can handle is doubling every four to seven months. It also notes that current models, trained on only a fraction of the world's 7,000 languages, produce dangerous errors in health diagnoses for many populations.

Chinese startup Z.ai launches GLM-5.2, rivaling Anthropic and OpenAI at one-sixth the cost

Beijing-based startup Z.ai launched GLM-5.2 last month. The large language model now performs nearly as well as Anthropic's Claude Opus 4.8 on coding and agent tasks while operating at roughly one-sixth the cost of closed U.S. models like GPT and Claude. It has rapidly gained traction on third-party AI platforms, including OpenRouter, where it now ranks above Anthropic's offerings, and on Artificial Analysis' leaderboard, where it holds fifth place overall and second place for front-end coding. Industry observers have characterized the development as a "mini DeepSeek moment," a reference to the Chinese competitor that disrupted markets in 2025 with its own low-cost, high-capability model. Prominent Western tech leaders, including Snowflake CEO Sridhar Ramaswamy and venture capitalist Marc Andreessen, have publicly praised GLM-5.2's capabilities.
