AI Reasoning Benchmarks

3 entries in Legal Intelligence Tracker

New study shows OpenAI's GPT-5.5 failed to outperform o3 on law school exams

University of Maryland law professors have found that OpenAI's GPT-5.5 did not meaningfully outperform its predecessor, o3, on law school final exams, a finding that challenges the assumption that newer AI models consistently improve on older ones.

June 29, 2026

UN independent panel warns unchecked AI progress poses catastrophic risks

On July 1, 2026, the UN's Independent International Scientific Panel on Artificial Intelligence released a preliminary report warning that unregulated AI development is outpacing both scientific understanding and government policy, with no guarantee against catastrophic harm. Led by UN Secretary-General António Guterres and computer scientist Yoshua Bengio, the panel identified specific risks: loss of control over autonomous systems, deceptive AI behaviors, and exploitation for fraud, cyberattacks, and biological threats. The report notes that AI already demonstrates expert-level reasoning in mathematics and science, with task complexity doubling every four to seven months. It also notes that current models, trained on only a fraction of the world's 7,000 languages, produce dangerous errors in health diagnoses for many populations.

July 1, 2026

Chinese startup Z.ai launches GLM-5.2, rivaling Anthropic and OpenAI at one-sixth the cost

Beijing-based startup Z.ai launched GLM-5.2 last month. The large language model now performs nearly as well as Anthropic's Claude Opus 4.8 on coding and agent tasks while operating at roughly one-sixth the cost of closed U.S. models like GPT and Claude. It has rapidly gained traction on third-party AI platforms, including OpenRouter, where it now ranks above Anthropic's offerings, and on Artificial Analysis' leaderboard, where it holds fifth place overall and second place for front-end coding. Industry observers have characterized the development as a "mini DeepSeek moment," a reference to the Chinese competitor that disrupted markets in 2025 with its own low-cost, high-capability model. Prominent Western tech leaders, including Snowflake CEO Sridhar Ramaswamy and venture capitalist Marc Andreessen, have publicly praised GLM-5.2's capabilities.

July 2, 2026
