AI lab claims self-improving coding agents set new benchmark

Published May 15, 2026

Why it matters

Poetic's meta-system has reportedly achieved a score of 93.9 on the Soda benchmark, surpassing GPT-5.5, by running live code benchmarks and building its own test harnesses without fine-tuning or special access. In a separate effort, Prime Intellect provided idle compute to OpenAI's Codex and Anthropic's Claude Code to optimize a "nano GPT speedrun" track; after approximately 14,000 H200 GPU hours, the agents reportedly beat the human baseline, with Opus 4.7 recording a result of 2,930 steps. These developments were discussed in a May 15, 2026 episode of The Innermost Loop, hosted by Dr. Alex Wissner-Gross, which framed the activity as evidence that AI systems are beginning to optimize their own optimizers.

The claims remain unverified outside the podcast discussion. No independent benchmarking body has confirmed the reported scores, and details about Poetic's methodology and Prime Intellect's compute allocation have not been made public. The timeline and technical specifications come solely from the podcast episode and related materials.

Attorneys tracking AI liability and IP issues should note the shift in how these systems are being deployed. When AI agents design their own test harnesses and optimization loops, questions about ownership of improvements, reproducibility for patent prosecution, and liability for errors in self-generated benchmarks become material. The recursive nature of these tasks—machines improving the machines that improve machines—may also trigger closer scrutiny from regulators focused on autonomous AI development and safety validation.
