The problem traces to model degradation that set in around 2023, when large language models began absorbing AI-generated training data and reinforcing their own errors through repeated use. Error rates on complex queries now run between 15 and 30 percent. OpenAI, Microsoft, and Meta have acknowledged the issue in systems including ChatGPT and Llama (Microsoft's 2016 Tay chatbot was an earlier, cruder case of a model corrupted by the data it ingested), though no specific regulatory response or legislation has emerged. Industry efforts to close the gap, including confidence scoring, error monitoring, and human-in-the-loop verification, remain inconsistent and incomplete.
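To make the mitigation ideas concrete, here is a minimal sketch of confidence-gated output with human-in-the-loop escalation. The `Answer` type, the `route` function, and the 0.85 threshold are all illustrative assumptions, not any vendor's API.

```python
# Hypothetical sketch: gate model output on a confidence score and
# escalate low-confidence answers to independent human review.
from dataclasses import dataclass


@dataclass
class Answer:
    text: str
    confidence: float  # model-reported score in [0, 1]; assumed available


def route(answer: Answer, threshold: float = 0.85) -> str:
    """Return the answer directly only when confidence clears the bar;
    otherwise flag it for a human reviewer before it is used."""
    if answer.confidence >= threshold:
        return answer.text
    return f"[NEEDS HUMAN REVIEW] {answer.text}"


print(route(Answer("Smith v. Jones, 2019", 0.91)))
print(route(Answer("Doe v. Roe, 2031", 0.42)))
```

The design choice here is deliberate: a low score never suppresses the answer silently, it routes the answer to a person, which is what "human-in-the-loop" means in practice.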
Attorneys should treat AI-generated research and citations with heightened skepticism. Documented harms already include lawyers citing fabricated case law and physicians receiving flawed medical guidance. As AI agents integrate more deeply into professional workflows, the risk of undetected errors grows. Organizations deploying these systems should implement independent verification protocols and resist the false confidence that strong aggregate performance can create.
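One form an independent verification protocol can take is checking every model-emitted citation against a trusted index before it reaches a filing. The sketch below assumes such an index exists; the index contents and the fabricated-looking citation are invented for illustration.

```python
# Hypothetical sketch: partition model-generated citations into verified
# and unverified buckets against a trusted reference index.
TRUSTED_INDEX = {
    "Brown v. Board of Education, 347 U.S. 483 (1954)",
}


def verify_citations(citations: list[str]) -> dict[str, list[str]]:
    """Check each citation against the trusted index; anything not found
    is returned as unverified and must not be relied on."""
    result: dict[str, list[str]] = {"verified": [], "unverified": []}
    for citation in citations:
        key = "verified" if citation in TRUSTED_INDEX else "unverified"
        result[key].append(citation)
    return result


report = verify_citations([
    "Brown v. Board of Education, 347 U.S. 483 (1954)",
    "Acme Corp. v. Example LLC, 999 F.9th 1 (2031)",  # invented citation
])
print(report)
```

An exact-match lookup like this is only a floor; a production protocol would query an authoritative legal database rather than a local set, but the fail-closed shape (unknown means unverified) is the point.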