How content licensing solves AI's source quality crisis


A while back, my colleague Jason was trying to find the blazon (i.e., the official heraldic description) for Sir Brian Leveson’s armorial bearings. When he searched on Google, the first thing he saw was an AI overview claiming that Sir Brian’s arms featured a double-headed eagle, his crest a Rod of Asclepius and a Chinese dragon, and his badge a serpent conjoined to the axon and telodendria of a nerve.

At first blush, it looked like a thorough overview. It even cited the College of Arms in several places. But something didn’t add up. Jason wondered why Sir Brian, a retired judge, would have a Rod of Asclepius or the axon and telodendria of a nerve in his armorial bearings. When he dug deeper, he realized that Google’s AI overview had actually described an entirely different set of armorial ensigns belonging to an Australian named Nicholas Schaerf, which just happened to be featured in the College of Arms’ newsletter alongside Sir Brian’s arms.

That anecdote illustrates the real problem with AI overviews at the moment. Far too often, they produce answers that look grounded and reputable only to collapse when subjected to scrutiny.

How often are Google's AI overviews actually accurate?

This isn’t an outlier. The New York Times recently highlighted a study by AI startup Oumi which found that Google’s AI overviews are accurate around 90 percent of the time. That may sound reassuring, but the search giant processes over five trillion searches a year, so even a 10 percent error rate means it’s serving up tens of millions of inaccurate answers every hour. And more than half of the overviews were “ungrounded,” meaning they cited material that didn’t actually support their claims. Google’s AI also struggled to cite reputable sources: Oumi found that the overviews drew heavily from Facebook and Reddit rather than more authoritative material.

Even when Google does find an authoritative source, there’s no guarantee that it’ll be interpreted correctly. The College of Arms newsletter that Google cited in Jason’s response did mention that Sir Brian had been granted arms, but it was a brief notice and lacked a description or an image. Schaerf’s arms, on the other hand, were described in detail and illustrated. 

Why AI source quality determines answer quality

Problematic answers aren’t solely a failing of the AI models themselves. As the saying goes, garbage in, garbage out. If the AI can’t access high-quality information, it won’t be able to produce high-quality answers. At the moment, sources are often a witches’ brew of scraped pages, weak metadata, inconsistent attribution, partial snippets, and content that was never prepared for machine interpretation. In this environment, it’s easy for an AI to be led astray. And while some of these answers are obviously wrong (e.g., “Doctors recommend smoking 2-3 cigarettes per day during pregnancy”), that won’t always be the case. If Jason hadn’t known anything about Sir Brian Leveson, he might have taken the AI overview at face value. Ideally, people would take the time to check the AI’s output, but the reality is that many will simply accept it as authoritative.

How content licensing improves AI grounding and provenance

Happily, there is a solution: licensed, metadata-rich, authoritative content that AI models can find, interpret, and cite with confidence.

That is why structured content licensing isn’t just a matter of ethics. Publisher consent and compensation are still vital, but licensing can also make it easier for AI models to use source material by enhancing its readability and providing documented provenance. 

Microsoft has already embraced that logic. In its February 2026 announcement on a sustainable content economy for the agentic web, the company argued that as AI shifts from search results to conversational answers, content quality becomes mission-critical. Its Publisher Content Marketplace is explicitly designed to create licensed access to premium publisher content under transparent terms, with content quality positioned as a key differentiator in AI experiences.

Cloudflare’s moves point in the same direction from the infrastructure layer. In July 2025, it announced a permission-based model that blocks AI crawlers from accessing original content without permission and introduced “pay per crawl” tooling so publishers can control and monetize machine access. Whatever one thinks of the commercial details, the idea of a rules-based order for AI usage is clearly gaining traction.

Why content licensing is AI's path to trustworthy answers

Platforms would do well to read the room. Prioritizing licensed access, better metadata, and clearer provenance now will pay dividends down the line. Conversely, platforms that rely on scraped data at scale are making a risky bet. They may hope that model improvements alone will produce better outcomes, but that’s far from guaranteed. If a tech titan like Google, with world-class models and world-scale retrieval, cannot reliably produce grounded answers from unstructured web content, it’s hard to see anyone else making it work.

And for publishers, there is a secondary but important point. They’re already producing the kind of structured, authoritative content AI systems so desperately need. By making it available through content licensing, they can help shape the next generation of AI.