The Mullet Effect: How Headless WordPress Can Go Wrong

Amazon CloudFront frontend, WordPress backend: a powerful combo? Not when crucial signals like RSS feeds and sitemaps are broken, leading to traffic loss.

I recently worked with a publisher running a headless WordPress setup, the mullet of website setups. All Amazon CloudFront business on the frontend, the wild party of WordPress in the backend.

In all seriousness, this website was nicely done. A modern stack, fast pages, and good architecture, at least in the abstract. But in the handoff between WordPress and AWS, two vital elements were broken.

The first was their RSS feeds. WordPress generates feeds for everything by default. A main feed, per-category feeds, per-tag feeds, per-author feeds, per-post comment feeds. On this client, none of them reached the public site. The static frontend was configured to pass through pages but not files, and /feed/ (along with everything that sat under it) was treated as a file. Anyone hitting any of those endpoints got a 404. The feeds existed in the backend, but were unreachable from the open web.
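A check along these lines would have caught the problem from the outside. It is a minimal sketch, not the monitoring I actually run: the function names are mine, the category, tag, and author slugs are placeholders, and the paths assume WordPress's default pretty-permalink feed URLs.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

FEED_MEDIA_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/xml",
    "text/xml",
}


def wordpress_feed_urls(site_root, category="news", tag="example", author="admin"):
    """List common WordPress feed URLs. The slugs are placeholders, not real values."""
    paths = [
        "feed/",
        "comments/feed/",
        f"category/{category}/feed/",
        f"tag/{tag}/feed/",
        f"author/{author}/feed/",
    ]
    base = site_root.rstrip("/") + "/"
    return [urljoin(base, path) for path in paths]


def looks_like_feed(status, content_type):
    """Return True if a response status and Content-Type plausibly describe a feed."""
    if status != 200:
        return False
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in FEED_MEDIA_TYPES


def check_feed(url, timeout=10):
    """Fetch one URL from the public site and report whether it looks like a feed."""
    request = urllib.request.Request(url, headers={"User-Agent": "feed-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            return looks_like_feed(response.status, content_type)
    except urllib.error.HTTPError as error:
        # A 404 from the static frontend lands here.
        return looks_like_feed(error.code, error.headers.get("Content-Type", ""))
    except urllib.error.URLError:
        return False
```

Running check_feed over wordpress_feed_urls("https://example.com") against the public hostname, not the WordPress origin, is the point: the feeds can be perfectly healthy in the backend and still fail here.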

The second was their XML sitemap. Every time the static site regenerated, the sitemap re-stamped the lastmod date on essentially every URL. Posts published last week, posts published last year, and posts published decades ago, some from before the web existed, all carried a recent lastmod. The signal was technically present but meaningless.

That second one is the more interesting failure. A 404 on /feed/ is at least obviously broken. A sitemap with the wrong modification dates is the kind of thing nobody notices until something starts behaving strangely or traffic slowly melts away.
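One way to make this failure visible is to measure how concentrated the lastmod dates are. The sketch below assumes a single urlset file (not a sitemap index) and uses a function name of my own choosing; it reports the most common date and the share of URLs carrying it.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def lastmod_concentration(sitemap_xml):
    """Return (most_common_date, share_of_urls) for the <lastmod> values in a urlset."""
    root = ET.fromstring(sitemap_xml)
    dates = []
    for url in root.findall("sm:url", NS):
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if lastmod:
            # Keep only the YYYY-MM-DD part of the W3C datetime.
            dates.append(lastmod[:10])
    if not dates:
        return None, 0.0
    date, count = Counter(dates).most_common(1)[0]
    return date, count / len(dates)
```

What share counts as suspicious is a judgment call. On a site with years of archives, nearly every URL sharing the date of the last build is a strong hint that lastmod reflects build time rather than edit time.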

Why I noticed

I run TopicalBoost for our clients. It’s a content analysis tool that identifies entities in a post, surfaces search-volume data inline in the editor, and writes Schema.org structured data and internal links on publish. To verify that it’s working on a given client site, I have an n8n workflow that monitors new posts and checks the resulting metadata.

On every other client site I have set up that way, the workflow points at the sitemap or the RSS feed. Both are designed for exactly this use case: tell an outside system what changed without making it re-crawl your homepage every hour.

On this client, neither worked. The feed wasn’t there. The sitemap claimed everything had just been updated, so I couldn’t distinguish a post that went up this morning from a post from the Reagan administration.

I ended up using ScrapingBee to scrape the homepage and category archives several times a day, diffing the URL lists, and treating new URLs as new posts. That works. It’s also a bad way to do it. It’s brittle, it’s wasteful, and it only catches posts that surface on the homepage or in a top-level archive within my polling window.
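The diffing step itself is small. This sketch assumes the URL lists have already been extracted from the scraped pages; the function name is hypothetical and the scraping call is left out on purpose.

```python
def new_urls(previous, current):
    """Return URLs from the current poll that were absent from the previous one."""
    seen = set(previous)
    fresh = []
    for url in current:
        if url not in seen:
            fresh.append(url)
            seen.add(url)  # also drops duplicates within the current list
    return fresh
```

The weakness is in what the function cannot see: a post that appears and scrolls off the scraped pages between two polls never shows up in either list.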

Small costs become big costs at scale

I had to run one crawling workflow watching one site. The cost of doing that for me is some extra API credits and a more fragile pipeline. Annoying, but survivable.

Google runs essentially this workflow against every publisher on the open web. They are doing the same job I was doing, at the scale of billions of pages. When your site provides a reliable sitemap and a working RSS feed, you have handed Google a cheap way to determine what’s new since it last checked. When your site doesn’t, Google has to fall back on the same thing I did: crawl pages and look for what changed.

Google can do that. Google has the resources to do that for anyone. But crawl is rationed, and that rationing is the thing publishers actually feel.

The surfaces that actually depend on freshness

This is the part where most publishers tune out, because their organic rankings look fine.

That’s the trap. Slow discovery doesn’t really sink regular organic ranking. Google can take its time finding your page, and once it’s in the index, your evergreen content competes on the same signals as everyone else’s: relevance, authority, content quality, entity coverage. A two-day delay in discovery doesn’t materially change whether your three-thousand-word policy explainer ranks for the right queries six months from now.

Three Google surfaces work differently, and they happen to be the three publishers care about most.

  • Google News filters for freshness by design. A story that gets discovered ninety minutes after publication has a meaningfully worse shot at appearing in a Google News feed than the same story discovered five minutes after publication. The window where news ranks as news closes quickly.
  • Top Stories carousels in regular Search sit on top of the same news index. Whatever delays a story from showing up in Google News also delays it from qualifying for the Top Stories carousel on the queries where that carousel renders.
  • Google Discover is the most freshness-sensitive of the three. Discover surfaces content into users’ feeds based on a combination of topical interest and recency. A piece that took an extra day to enter Google’s index is a piece that missed its Discover window entirely on the news cycle that would have driven the traffic.

So a publisher with broken freshness signals can look healthy on a rankings report. Organic positions hold. Evergreen pages perform. What erodes are the surfaces that deliver the biggest publisher traffic spikes. You don’t notice that on a dashboard that only tracks keyword positions.

What Google actually does with sitemaps and RSS

It’s worth grounding all this in what Google says rather than taking my word for it.

On lastmod, Google’s own documentation is explicit: “Google uses the <lastmod> value if it’s consistently and verifiably accurate.” Gary Illyes from Google’s search team has described the lastmod signal as essentially binary. If your site has a history of inaccurate lastmod values, Google stops trusting the tag for your whole site. Not just on the URLs where it was wrong. On all of them.

That’s the deeper consequence of the case I described. The static frontend wasn’t just stamping the wrong date on a few pages. It was poisoning the signal for the entire sitemap. Every legitimately fresh post on that site was carrying the same lastmod value as a page from 1972, and Google’s response to that pattern is to ignore the field rather than try to guess which entries are honest.

RSS plays a complementary role. Google has been explicit about how the two formats fit together since at least 2014, when the Search Central team published guidance recommending publishers use both: “XML sitemaps describe the whole set of URLs within a site, while RSS/Atom feeds describe recent changes.” The two aren’t redundant. The sitemap is the full inventory. The feed is the most-recently-changed slice of that inventory, delivered in a format Google can poll cheaply and frequently.
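To show why a feed is cheap to poll, here is a minimal sketch of the consumer side. It assumes an RSS 2.0 feed, which is what WordPress serves at /feed/ by default; an Atom feed would need different element names. The function name is mine, and this says nothing about how Google's own poller works.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def items_since(rss_xml, last_checked):
    """Return (link, published) pairs for RSS 2.0 items published after last_checked."""
    if last_checked.tzinfo is None:
        raise ValueError("last_checked must be timezone-aware")
    root = ET.fromstring(rss_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        pub_date = item.findtext("pubDate", default="").strip()
        if not link or not pub_date:
            continue
        published = parsedate_to_datetime(pub_date)
        if published.tzinfo is None:
            # Treat a date without a usable offset as UTC (an assumption).
            published = published.replace(tzinfo=timezone.utc)
        if published > last_checked:
            fresh.append((link, published))
    return fresh
```

One small request and one date comparison per item answers "what changed since I last looked," which is the question the homepage scrape answers badly.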

The practical move I make on every WordPress site I work with is to submit the /feed/ URL to Google Search Console alongside the regular sitemap. GSC accepts RSS feeds in the Sitemaps tool, and it gives Google a second channel for “here’s what just changed” without depending on the feed being auto-discovered from the site’s <head>. It’s a thirty-second action, and it puts the freshness signal in Google’s hands directly.

A couple of years ago I wrote a post called Sitemaps Aren’t Maps, and the point worth reinforcing here is the same one. A sitemap is not a map of your site. It’s not a substitute for internal linking, it doesn’t communicate anything about the importance or centrality of a page, and it won’t make up for an uncrawlable site.

It’s just a simple inventory, and the most useful column in that inventory is the dates. When Google trusts the lastmod field, it uses it as a signal for deciding what’s worth re-crawling and how soon. A sitemap with reliable, accurate dates is a prioritized work list for Googlebot. A sitemap with broken or re-stamped dates is a work list with no priorities, which Google handles by ignoring the dates and falling back on its own crawl scheduling. The inventory still exists, but the prioritization signal that made the inventory useful is gone.

This is becoming the rule, not the exception

I’m naming the architecture rather than the publisher because this isn’t a one-off.

Headless WordPress with a React, Next.js, or Gatsby frontend is a pattern that has spread across publishers over the last several years. The benefits are real: better page performance, more flexibility for design teams, separation between editorial workflow and presentation layer.

The cost most teams underestimate is that everything WordPress used to do for you about being a good web citizen now has to be re-implemented deliberately on the frontend.

WP Engine, which hosts many of these setups, has an entire developer guide on handling sitemaps in headless WordPress with Next.js. The guide exists because the problems exist. Their recommendation is WP Sitemap REST API, a plugin that exposes WordPress sitemap data through endpoints the Next.js frontend can consume. None of this is automatic.

What to do about it

Two reasonable paths.

The first is to stick with a standard WordPress setup. Yoast or RankMath sitemaps, WordPress’s native feed, a sensible cache. These tools have been refined against Google’s behavior for over fifteen years. Their default configuration produces correct freshness signals. If you don’t have a strong reason to go headless, the standard path solves this problem for free.

The second is to go headless deliberately. Decide that you’re trading default SEO infrastructure for design and performance benefits, and put real engineering effort into rebuilding the infrastructure you just lost. On the sitemap side, that means making sure lastmod reflects per-post post_modified rather than build time. On the RSS side, it means routing /feed/ (and any per-category feeds you publish) through to the public origin. Tools like next-sitemap for Next.js sites and the WP Sitemap REST API plugin on the WordPress side cover most of the work. Simply Static, if you’re using it, has an “Additional URLs” configuration where you can explicitly include /feed/ so it gets captured in the static build.
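On the sitemap side, the principle fits in a few lines. This is a sketch under assumptions, not the plugin's or next-sitemap's implementation: it takes post records shaped like the WordPress REST API's post objects, using the link and modified_gmt fields (the REST counterpart of the post's modification time), and writes each post's own date into lastmod. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(posts):
    """Build a urlset whose <lastmod> comes from each post, never from build time."""
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for post in posts:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = post["link"]
        # modified_gmt carries no UTC offset in the REST response, so append one.
        lastmod = ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod")
        lastmod.text = post["modified_gmt"] + "+00:00"
    return ET.tostring(urlset, encoding="unicode")
```

The thing to notice is what is absent: nothing in the function reads the current time. If the build step never has access to "now" when it writes lastmod, it cannot re-stamp the archive.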

What I’d avoid is the implicit middle path, where a team picks a headless architecture without anyone owning whether the freshness signals carried over. That’s the version where things look fine on the rankings dashboard and quietly underperform on the surfaces that drive the most traffic.

In this particular case, the fix was lighter than the problem looked. The team just needed to allow more things to pass through to the static frontend. So the breakage wasn’t an architectural mistake per se; rather, it was a configuration that nobody had a reason to revisit until somebody downstream tripped over it.

The floor under everything else

Topical authority, entity recognition, Knowledge Graph signals, structured data, internal linking strategy, the whole layer of work that determines how Google understands what your site is about, sits on top of an assumption: that Google can find your new content quickly and place it correctly on the timeline of what your site has been publishing.

If the discovery layer is broken, the work above it doesn’t stop mattering. It just stops compounding at the speed it should. The pieces you publish today take longer to enter the picture. The topical archives you’ve built up don’t get refreshed in Google’s eyes. The freshness-gated surfaces, the ones publishers care most about, drift out of reach.

Sitemaps and RSS feeds are the cheapest, oldest, least glamorous part of technical SEO. They’re also the part that decides whether Google sees you publishing in real time or finds out about your work next Tuesday. On a standard setup, they handle themselves. On a non-standard setup, somebody has to be deliberate about making them work. It’s worth checking which one you’re running.
