Understanding and correctly implementing robots meta tags will allow you to control how your content is indexed and presented in search engine results, improving your SEO efforts.
These tags are found within the <head> of an HTML document, the portion reserved for metadata of all sorts, like the page’s title, meta description, Facebook Open Graph data, or Twitter card data. The <head> may also contain instructions for the browser to access resources outside the site, like font libraries or tracking code.
Robots meta tags in the <head> give you granular control over how individual pages are handled by search engines. These page-by-page search engine directives work in conjunction with the typically broader rules found in your robots.txt file.
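As a rough illustration, a <head> that carries a robots meta tag alongside the other metadata mentioned above might look like the sketch below. The title, description, and choice of directive are placeholders for illustration, not recommendations for any particular page.

```html
<head>
  <meta charset="utf-8">
  <!-- Placeholder title and description -->
  <title>Example Article Title</title>
  <meta name="description" content="Placeholder description for this page.">
  <meta property="og:title" content="Example Article Title">
  <!-- Page-level directive addressed to search engine crawlers -->
  <meta name="robots" content="noindex">
</head>
```

Keep in mind that a crawler can only read this tag if your robots.txt rules allow it to fetch the page in the first place.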
Robots Meta Tags for Publishers
For our purposes, we’re going to focus on only a small number of the total robots meta tags: those that Google supports and that publishers typically use.
- noindex – Do not show this page in search results.
- nofollow – Do not follow the links on this page. This blocks the discovery of new URLs through a given page.
- noindex, nofollow – Combines the two directives above. This can also be expressed as none.
- all – No restrictions on indexing or crawling. This is the default, so it’s not necessary to add this to pages on a new site, but can be helpful if a site previously employed noindex or nofollow directives. This sends a clear message that your policy has changed.
- max-image-preview – By setting this directive to “large”, as in
<meta name="robots" content="max-image-preview:large">
you’re telling Google that a large preview image, up to the width of the viewport, may be shown for the page in search results, Google News, and Google Discover. This seems to have a particularly strong influence over Google Discover traffic.
- max-video-preview – Similar to max-image-preview, this should be set to a value of -1, meaning that there is no limit on the length of the video preview Google can show in its results.
- noarchive – This is largely obsolete, as Google no longer offers cached pages. Notably, this does not block The Internet Archive, which requires using robots.txt rules.
- nosnippet, notranslate, noimageindex – We have never employed these on a client site, but they do as they state, blocking snippets (the short excerpts at the top of some Google results), translation, and appearance in the Google image search index. All of our clients want to maximize where they appear online and haven’t used these more granular controls.
- unavailable_after – You may have content that makes its way behind a paywall or into a members-only area of your site after a certain number of months or years. In this case setting a sunset date like
<meta name="robots" content="unavailable_after: 2027-06-03">
will let Google know that the page will be inaccessible after that date. Google will repay you for this courtesy, as it will prevent your site from having indexed but inaccessible pages, which usually negatively affect overall site metrics.
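Several of the directives above can share a single tag, separated by commas. The sketch below shows two separate examples, each meant for a different kind of page; which combination suits a given page is a judgment call.

```html
<!-- For a page you want promoted: allow large image previews
     and place no limit on video preview length -->
<meta name="robots" content="max-image-preview:large, max-video-preview:-1">

<!-- For a page you want hidden: keep it out of results and do not
     follow its links (equivalent to content="none") -->
<meta name="robots" content="noindex, nofollow">
```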
When to use noindex or nofollow
Sometimes our think tank or publishing clients believe that these tags should be applied to pages like terms of service or privacy policies. However, these pages are actually important markers of a professional and trustworthy website, so they shouldn’t be blocked from indexing or crawling. In fact, there’s special markup for About, Contact, Ethics Policy, Publishing Principles, Ownership Info, and other specialty/disclosure pages.
Instead, noindex and nofollow should be reserved for pages that really shouldn’t appear in search results, like landing pages for a social media or email campaign, confirmation pages, or private event listings.
We’ve also had great success applying noindex, nofollow to pages that think tanks repost to their site from other sources, like op-eds or articles placed on other sites. Generally it’s a good idea not to duplicate content from other sites to your own, but think tanks often do it to show that they’re engaging in the public exchange of ideas, something donors value.
In those cases, we deploy canonicalization, noindex, or sometimes both. Google previously advised never combining canonical and noindex, but it has since suggested that the two can be used together.
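For a reposted op-ed, combining the two might look something like the sketch below, placed in the <head> of the copy on your own site. The URL is hypothetical and stands in for wherever the piece was originally published.

```html
<head>
  <!-- Hypothetical URL: points search engines to the original publication -->
  <link rel="canonical" href="https://www.example-newspaper.com/opinion/original-op-ed">
  <!-- Keeps the reposted copy out of search results -->
  <meta name="robots" content="noindex, nofollow">
</head>
```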
Use Caution with Meta Robots Tags
These simple single-line declarations in code can make or break an entire site. We’ve seen several examples where sites have had no traction in Google search because a developer left a noindex directive, a nofollow directive, or both in place after launching a new site. This mistake is common because these tags should be used on development or staging sites to make sure that Google doesn’t index these works in progress. This underscores why you should work with an SEO in addition to a great developer.
Tools like Yoast SEO make applying these rules page-by-page or sitewide incredibly easy, but that doesn’t mean they should be applied capriciously. When in doubt, let content be indexed. When in greater doubt, consult an SEO.