WordPress Robots.txt and Search Engine Indexing

In WordPress Settings->Reading you’ll see a check box entitled “Discourage search engines from indexing this site.” This seems like a good idea when you’ve just created a site, as you are probably not ready for the world to see it just yet. Before selecting this option, however, there are a few things you should know.

WordPress will serve a virtual robots.txt once this option is selected, and search engine crawlers read it as they access your site. This optional file contains a list of instructions for crawlers, indicating which parts of your site they may or may not crawl.

The virtual robots.txt document WordPress creates when you’ve enabled this setting tells these crawlers not to index your site. Your content stays out of the search rankings, which reduces the likelihood that someone will stumble onto your site before you’re ready.

Essentially WordPress has created a robots.txt file with the following:

User-agent: *
Disallow: /

This tells every crawler that everything on your site is off-limits.
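To see how a well-behaved crawler reads those two lines, here is a minimal sketch using `urllib.robotparser` from Python’s standard library. The `is_allowed` helper and the `example.com` URLs are placeholders invented for illustration; real search engines use their own parsers, so treat this as an approximation.

```python
from urllib.robotparser import RobotFileParser

# The rules shown above, as served when the "discourage" option is checked.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may crawl `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder domain, not a real WordPress site.
for path in ("/", "/about/", "/2013/09/some-post/"):
    print(path, is_allowed(BLOCK_ALL, "https://example.com" + path))
```

Every path prints `False`, because `Disallow: /` matches any URL path that starts with a slash, which is all of them.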

Do you feel a “but” coming? Here it is: once you’ve enabled this option, WordPress doesn’t always stop serving this virtual robots.txt file.

That means that even if you decide you want search engines to start indexing your site and disable the “Discourage search engines from indexing this site” option, WordPress may continue telling crawlers to go away for a while. This can be frustrating, as you can’t start ranking and building your SEO until you’re indexed.

Correcting This & Getting Your Site Indexed

There’s a relatively simple solution to this conundrum. A physical robots.txt file placed in the site’s main directory takes precedence over any virtual version WordPress generates. That means you can create your own simple text file that allows indexing, name it “robots.txt”, and place it in the base directory of your WordPress site (the directory with sub-directories such as “wp-admin” and “wp-content” in it).

The default format to use is:

User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes

This asks bots not to crawl your admin and includes directories, which is generally a good idea. Anything not specifically called out in this file remains open to crawling, so in this case all your pages, posts, categories, and tags are things search engines like Google and Bing will explore.
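As a rough check of which paths those rules leave open, the sketch below feeds them to Python’s standard `urllib.robotparser`. The `crawlable` helper, the `example.com` domain, and the sample paths are made up for illustration, and a real crawler’s matching may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The rules from the physical robots.txt described above.
RULES = """\
User-agent: *
Disallow: /wp-admin
Disallow: /wp-includes
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def crawlable(path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` on the placeholder domain."""
    return parser.can_fetch(agent, "https://example.com" + path)


# Hypothetical paths: two public pages, then one file in each blocked directory.
for path in ("/", "/category/news/", "/wp-admin/options.php",
             "/wp-includes/js/jquery/jquery.js"):
    status = "crawlable" if crawlable(path) else "blocked"
    print(f"{path}: {status}")
```

The first two paths come back as crawlable and the last two as blocked, since the `User-agent: *` group applies to any crawler that has no group of its own.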

Since this file overrides the virtual one WordPress made, the next time search engine crawlers visit your site they’ll see that they can index your site and will begin actually crawling its content.
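If you’d like to confirm what your site is serving before the crawlers come back, a small script can fetch robots.txt and test whether the site root is still blocked. This is only a sketch: `fetch_robots_txt` and `blocks_everything` are hypothetical helper names, and the check relies on Python’s standard `urllib.robotparser` rather than any search engine’s own parser.

```python
from urllib.parse import urljoin
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


def fetch_robots_txt(site_url: str, timeout: float = 10.0) -> str:
    """Download robots.txt from the root of the site's domain."""
    with urlopen(urljoin(site_url, "/robots.txt"), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def blocks_everything(robots_txt: str, agent: str = "*") -> bool:
    """Return True if the rules forbid `agent` from crawling the site root."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "/")
```

Calling `blocks_everything(fetch_robots_txt("https://example.com/"))` with your own domain in place of the placeholder should return `False` once the physical file is being served.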

Should I Use WordPress’ “Discourage Indexing” Option When Building A Site?

Generally speaking, no (in my opinion). Once search crawlers have visited your site and seen that they can’t index it, it can be a while before they come back. Because of this, even if you’ve followed the above steps and made your site accessible, they may not return to notice the change for some time. In the meantime, your site stays invisible to the likes of Google and Bing.

If there were a reliable way to get indexed immediately after un-checking the “discourage indexing” option, it’d be different.

While you’re in the midst of building a site out, chances are you haven’t linked to it or shared it anywhere. Any SEO work you’ve done probably won’t have taken hold yet, so even if you’re indexed it’s unlikely that you’ll actually rank for anything at first (especially if it’s a new domain). Therefore you’re probably not going to get accidental traffic anyway, and if you leave the option unchecked, your site is already accessible to search engines, with no delay, when you’re ready to officially launch.

An exception to this would be if you foresee the site being in development for a long time (several months or more).

If you’ve enabled indexing and are looking for a way to get the crawlers to revisit your site faster, share it on social networks and see if you can get others to engage with it. Google may notice the backlinks and the fact that it’s been ‘liked’ or shared, and may be more apt to revisit sooner.
