AI lab claims self-improving coding agents set new benchmark
Poetic's meta-system has reportedly scored 93.9 on the Soda benchmark, surpassing GPT-5.5, by running live code benchmarks and building its own test harnesses without fine-tuning or special access. In a separate effort, Prime Intellect provided idle compute to OpenAI's Codex and Anthropic's Claude Code to optimize a "nano GPT speedrun" track; after approximately 14,000 H200 GPU hours, the agents beat the human baseline, with Opus 4.7 recording a result of 2,930 steps. These developments were discussed in the May 15, 2026 episode of The Innermost Loop, hosted by Dr. Alex Wissner-Gross, which framed the activity as evidence that AI systems are beginning to optimize their own optimizers.