The mechanism underlying the concern is straightforward: as AI systems exhaust the human-generated training data available on the internet, they increasingly train on content they themselves created, producing a self-referential loop. The process resembles repeated JPEG compression: each generation loses a little fidelity, so models progressively forget rare knowledge and can eventually collapse into incoherent output. The concern has become timely because the volume of AI-generated content is accelerating; by 2023, over 1 percent of published scientific papers were AI-written. The specific legal and regulatory responses to model collapse remain undetermined, as does whether platforms will adopt technical measures to distinguish human-generated from AI-generated training data.
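A toy simulation makes the loop concrete. The sketch below is illustrative only: the fact count, Zipf-like frequency weights, and corpus size are assumptions chosen for the demonstration, not figures from any cited study. Each pass fits a "model" (here, just an empirical frequency distribution) to the previous generation's output and then generates the next corpus from it; facts that happen never to be sampled drop to zero probability and can never return, which is the loss of rare knowledge the JPEG analogy points at.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus: 1,000 distinct "facts" with Zipf-like
# frequencies, so most facts are rare (sizes are illustrative).
n_facts = 1_000
freq = 1.0 / np.arange(1, n_facts + 1)
probs = freq / freq.sum()

corpus_size = 20_000
for generation in range(11):
    surviving = np.count_nonzero(probs)
    print(f"gen {generation:2d}: {surviving:4d} of {n_facts} facts survive")
    # The "model" learns the empirical distribution of its training
    # corpus by sampling from it...
    counts = rng.multinomial(corpus_size, probs)
    # ...and the next corpus is generated entirely from that model.
    # A fact that was never sampled has probability zero forever:
    # the self-referential loop can only lose mass, never recover it.
    probs = counts / corpus_size
```

Note the design of the failure: no single generation breaks outright, but the set of surviving facts shrinks monotonically, which is consistent with the "slow-motion car crash" framing discussed below rather than with a discrete, detectable malfunction.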
Attorneys should monitor this issue for two reasons. First, as model reliability degrades, liability questions will emerge around AI-generated content used in professional contexts, from legal research to medical diagnostics. Second, regulators may mandate data provenance standards or require platforms to segregate training datasets, creating compliance obligations comparable to those under existing data governance frameworks. The neuroscientist's framing of the problem as a "slow-motion car crash" suggests it compounds over time rather than manifesting as discrete failures, which makes early attention to emerging standards and industry responses strategically important.