A study comes out saying “walking one mile a day hurts the performance of elite athletes.” A week later, the takeaway floating around social media is “walking is bad for your health.” The original finding might be perfectly valid for the specific population it studied, but the circulating interpretation is far broader than the data supports.
That’s roughly what’s happening with the recent Ahrefs schema and AI citations study.
The study, by Louise Linehan and Xibeijia Guan, tracked 1,885 pages that added JSON-LD between August 2025 and March 2026, matched them against 4,000 control pages, and measured citation changes on Google AI Overviews, Google AI Mode, and ChatGPT. They found almost no effect: AI Overviews moved -4.6%, while AI Mode and ChatGPT moved a few points in the positive direction, so little that it could be statistical noise.
The headline implication people are already pulling—“schema doesn’t help AI citations”—is not what the study says. The study says something narrower, more specific, and considerably more defensible. Both versions deserve to be taken seriously, and they deserve to be told apart.
What the study actually says
The Linehan/Guan piece is a rigorous attempt to put a number on whether schema markup, by itself, lifts how often AI systems cite a page. The methodology is honest. The team flags their own confounds. The findings are presented with appropriate hedges.
Here’s what they actually measured:
- 1,885 pages that already had 100+ AI Overview citations in February 2025
- Same pages, transitioning from no JSON-LD to some JSON-LD between August 2025 and March 2026
- Citation counts in the 30 days before and 30 days after the transition date
- Three platforms: Google AI Overviews, Google AI Mode, ChatGPT
- Matched against control pages on other domains with similar pre-period citation levels
The narrow claim that this study design actually supports is:
Among pages already receiving heavy AI Overview citation traffic, adding some form of JSON-LD did not produce a measurable lift in citation frequency on three AI surfaces within a 30-day window.
That’s a useful but very narrow finding. It deflates a piece of vendor hype: the pitch that JSON-LD always and everywhere boosts citation numbers. The study shows that there are circumstances where that’s not the case, at least not when the criterion is “any JSON-LD at all” being added.
What the study cannot support is the broader claim that’s already circulating: “schema doesn’t help AI citations.”
Barry Schwartz (whom I love and listen to five days a week) overstates the case.
Ahrefs’ own Ryan Law also overstates the case, then adds some caveats.
I don’t think Schwartz or Law are acting in bad faith here—I like them both and think they are good guys. And to be fair, the piece’s title (“We Tracked 1,885 Pages Adding Schema. AI Citations Barely Moved.”) and subheads like “Adding schema didn’t boost citations on any platform” make the results easy to misread. Unless you read the study carefully, it’s easy to walk away claiming more than it shows.
This is a structural problem with how research moves through the attention economy. Complex findings get simplified in transit, and the simplified version travels faster than the nuanced one.
All that said, I want to go through why the narrow claim (pages that were already heavily cited didn’t gain citations in a 30-day window after adding JSON-LD) does not support the much broader claim that structured data doesn’t improve AI citations at all.
The data just isn’t there for that broader claim, which Linehan and Guan make quite clear in their piece.
Six things between the finding and the misread
Six things stand between the narrow finding and the broader version making the rounds on X. None of these are nitpicks. Each describes a different layer of what schema and entity work actually do for publishers.
- The citation data is a sample, not a census.
- The cohort is pages already winning the AI citation game, not pages trying to enter it.
- The study doesn’t measure the four surfaces where entity work demonstrably moves the needle—Top Stories, Google News, Google Discover, and plain old organic search results.
- All schema types are pooled, even though they do completely different jobs.
- All citations are counted equally, even though their value to a publisher could vary by orders of magnitude.
- Pages adding schema in the real world are usually doing other things at the same time, and those other things confound the result.
Sampling, not census
The study filters down to pages that had 100+ AI Overview citations in February 2025. That sounds like a sturdy threshold. But “100+ citations” here means “100+ citations the sampler observed.” The actual universe of AI citations isn’t visible to any tool, including Ahrefs’ own.
AI surfaces don’t return one canonical results page per query. They synthesize answers across a sprawl of queries, including ultra-long-tail questions and prompts that traditional SEO tools have never tracked. A page can be cited thousands of times across questions (or families of similar questions) that are each asked only twelve times a month. None of those citations will ever show up in a measurement dashboard. The visible slice of AI citations is a sample of an unknown whole.
Saying “this page got 100+ citations in February 2025” really means “this page got 100+ citations our tool observed.” That’s not a knock on Ahrefs—their tool is great and has its uses. It’s simply a structural fact about measuring AI visibility.
Which means the 1,885 pages in this study are the measurable citation winners, not the actual citation winners. And the -4.6% movement is -4.6% of a sample, not -4.6% of the truth.
Already-cited pages only
Every page in the study was already a citation magnet. Pages going from zero citations to a few, or from a few to many, were structurally invisible to the design.
But that’s the path most schema implementations are actually trying to influence. A publisher doesn’t add structured data hoping a page that already gets 200 monthly AI citations gets 220 instead. They add it because their page on a topic they’re authoritative about isn’t getting picked up at all, and they want Google’s pipeline to understand what the page is about and who they are as a source.
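For concreteness, this is roughly the kind of markup that second publisher is adding. It’s a minimal, hypothetical sketch—the organization name, URLs, and the Wikidata identifier are all invented for illustration, not a recommended implementation:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "What the Latest Commission Ruling Means for Ratepayers",
  "publisher": {
    "@type": "Organization",
    "name": "Example Policy Journal",
    "url": "https://example-policy-journal.org",
    "sameAs": [
      "https://en.wikipedia.org/wiki/Example_Policy_Journal",
      "https://www.wikidata.org/wiki/Q00000000"
    ]
  }
}
```

The `sameAs` links are the load-bearing part: they tie the page to a specific real-world organization. That’s the “who they are as a source” problem, and a page already pulling 200 monthly citations has, by definition, already solved it.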
The Ahrefs study doesn’t see that population, by design. Therefore this study cannot tell us whether schema helps pages enter the citation set in the first place. The whole left side of the distribution is outside the frame. Generalizing from one slice of a distribution to the whole of it is what turns a valid narrow finding into a distorted broad falsehood.
The surfaces never measured
The study measured three AI surfaces. In our experience, Top Stories, Google News, Google Discover, and organic search are where entity recognition and Knowledge Graph connection actually pay out for editorial sites. None of those are in the dataset.
We can show what entity work does on those surfaces because we run it every week.
- Carolina Journal, after activating TopicalBoost on September 17, 2025, went from 22 Top Stories appearances in the prior six months to 106 in the next six. Across those carousels, 83 unique pages appeared, 34 of them in the first position. The site didn’t undergo a major rethink. The change was schema and internal linking working at the entity layer.
- Illinois Policy added 62% in organic traffic within 60 days of activation and sustained 37% growth over the following six months (935K to 1.28M sessions). Average monthly Google Discover clicks tripled, from roughly 49K to roughly 154K, peaking at 244K in December 2025. Average organic position moved from 10.5 to 7.0.
- Reason Magazine nearly quadrupled its Google Discover traffic and posted the largest single-day Google News result we’ve ever seen for a single article.
These movements happened on the surfaces the Ahrefs study didn’t measure, on timescales it didn’t measure, with the kind of schema it didn’t isolate.
Not all schema is the same
Gianluca Fiorelli, responding to the study, drew what I think is the most important conceptual distinction in this whole conversation. Structured data isn’t one tool; it’s at least two, and the two do very different jobs:
- Content type signals. Article schema, FAQPage, Product, LocalBusiness, HowTo. These declare what type of content is on the page. They help Google decide whether to surface the page in a product carousel, a map pack, a job listing, or an FAQ accordion. They’re rich-results infrastructure.
- Entity identity signals. Markup that connects the article to recognized entities in the Knowledge Graph: this article is about Marco Rubio (a person), the Brookings Institution (an organization), constitutional carry (a concept), Louisiana (a place). The job isn’t to claim a rich-results slot. It’s to tell Google what the subject matter is and to resolve ambiguity. Applied consistently, this markup helps Google distinguish a publisher who has covered Marco Rubio fifty times from one who has merely mentioned him once in passing.
These are not variations of the same tool. They sit at different layers of Google’s pipeline and feed different downstream features.
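To make the two layers concrete, here is a minimal sketch of both on a single hypothetical article. The headline is invented; the entities are the ones from the examples above. The `@type: NewsArticle` declaration is a content-type signal; the `about` and `mentions` arrays are entity-identity signals:

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "headline": "Rubio Weighs In on Constitutional Carry",
  "about": [
    {
      "@type": "Person",
      "name": "Marco Rubio",
      "sameAs": "https://en.wikipedia.org/wiki/Marco_Rubio"
    },
    {
      "@type": "Thing",
      "name": "Constitutional carry",
      "sameAs": "https://en.wikipedia.org/wiki/Constitutional_carry"
    }
  ],
  "mentions": [
    {
      "@type": "Place",
      "name": "Louisiana",
      "sameAs": "https://en.wikipedia.org/wiki/Louisiana"
    }
  ]
}
```

Nothing in the `about` array claims a rich-result slot. Its only job is disambiguation, and it pays off through repetition across a body of coverage, not through any single page.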
Yet the Ahrefs study pooled them: Article, FAQPage, Product, HowTo, Person, Organization, and concept markup all in one bucket. Then it took an average.
Pooling tools that do different jobs and averaging the result is how studies produce numbers that look meaningful but explain little. It’s like averaging “took an aspirin” with “started chemotherapy” and reporting on “medical intervention” effects. Sure, they’re both medical interventions, but wildly different kinds, and throwing them into the same bucket saps the average of meaning.
Not all citations are the same
The other thing the study averages is citations themselves. Every citation counts as one. A citation on a query asked thousands of times a day counts the same as a citation on a query asked twelve times a month. A citation on a purely informational search (“what is a 401k”) counts the same as a citation on a bottom-of-funnel comparison (“best Roth IRA provider 2026”).
Those are wildly different in value. I would rather lose 4% of my generic informational citations and pick up citations on queries that might actually lead to a click or a conversion. A net loss in citation volume can be exactly the right move if your scope is tightening toward queries where the citation converts.
Consider it from a publisher’s perspective. A think tank covering Louisiana energy policy doesn’t need 10,000 citations on “what is energy policy.” It needs to be the cited source when someone asks an AI about a specific Louisiana ballot measure or PSC ruling. Those are smaller-volume queries with much higher intent. Schema and entity work tend to move pages toward the second kind of citation, sometimes at the cost of the first.
A study that counts citations as a flat unit cannot see that trade. The -4.6% movement might be exactly the trade that should be celebrated.
The confound that cuts both ways
Ahrefs flags this themselves: pages that add JSON-LD often change other things at the same time, and the study can’t fully separate schema from those co-occurring changes. In real publisher work, schema changes usually ship as part of broader SEO work: content pruning, taxonomy cleanup, internal-link restructuring, sitemap revision, sometimes a full CMS migration. A 4.6% citation dip could be entirely explained by a simple pruning exercise. And if the schema was added as part of a CMS migration or site redesign, a 4.6% dip would actually be a good result, since those disruptions often harm SEO in the short term—and this study was only observing a 30-day window.
The confound cuts in the other direction too. Carolina Journal and Illinois Policy installed TopicalBoost without doing any broader site rethink. No mass pruning, no taxonomy collapse, no CMS migration. They added entity-identity schema and internal linking against topic pages, and that was it. The results—5x Top Stories appearances, tripled Discover clicks, dozens of entity queries moving from outside the top 100 into the top 10—are what entity work does when it’s deployed cleanly, in isolation.
The Ahrefs cohort isn’t doing what these clients are doing. The treatment being measured in the study is “schema added during an unspecified set of site changes.”
The treatment we measure on the publisher side is “entity-identity schema added to a working site.” Those aren’t the same intervention.
Ahrefs vs. Ahrefs
The cleanest evidence that the broader claims made from this study are a misread comes from Ahrefs’ own published material. Their Entity-Based SEO glossary entry lists schema markup as one of four pillars of entity-based SEO. Their own words: “Using schema markup to help search engines understand the entities and their relationships within the content.”
The study’s narrow finding and Ahrefs’ glossary can both be true. Schema is foundational to entity SEO—Ahrefs publishes that explicitly. Adding some JSON-LD to pages Google already understands perfectly doesn’t lift citations in 30 days—Ahrefs measured that.
These statements describe different layers of the same pipeline. The first is about how Google constructs its understanding of an entity and its relationships. The second is about whether a marginal markup tweak shifts retrieval-time outcomes on three AI surfaces over a short window.
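The glossary’s phrase “entities and their relationships” belongs to that first layer. As a hypothetical sketch of what relationship markup looks like in practice (every name and link here is invented for illustration):

```json
{
  "@context": "https://schema.org",
  "@type": "Person",
  "name": "Jane Example",
  "jobTitle": "Energy Policy Analyst",
  "worksFor": {
    "@type": "Organization",
    "name": "Example Policy Journal",
    "url": "https://example-policy-journal.org"
  },
  "sameAs": "https://www.linkedin.com/in/jane-example"
}
```

None of that is retrieval-time plumbing. It’s understanding-layer material, which may be why a 30-day citation window is the wrong instrument for measuring it.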
How to read this study
Before applying any SEO study to your own work, you need not only to question whether its methodology was sound—in this case I believe it is quite sound—but also to ask yourself these questions:
- What population does this examine?
- On what surface?
- Over what time window?
- With what type of intervention?
- Measured against what definition of the outcome?
When you ask those questions of this study, you get a result that is real, narrow, and useful. You also get a result that should not be stretched across the broader claim it’s being used to support.
Linehan and Guan are right about what they tested.
On the other hand, entity-identity schema, on the surfaces publishers actually care about, is a different test entirely. We run it every week. The results hold.

