
Robots.txt for Subdomains

Do I need a unique robots.txt for each of my Subdomains?

The quick and dirty answer is yes. Spiders treat each subdomain as a separate website, so just as you create a unique robots.txt for each domain, one should also be created for each subdomain.

When a spider finds a URL, it takes the host name (everything between http:// and the next ‘/’), sticks ‘/robots.txt’ on the end of it, and looks for that file. If that file exists, the spider should read it to see where it is allowed to crawl.
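
As a rough illustration of that lookup, here is a minimal Python sketch using only the standard library (the page URL is a placeholder, and real crawlers handle far more edge cases):

    from urllib.parse import urlsplit, urlunsplit
    from urllib.robotparser import RobotFileParser

    page_url = "http://subdomain.domain.com/some/page.html"  # placeholder URL

    # Keep only the scheme and host, then append /robots.txt
    parts = urlsplit(page_url)
    robots_url = urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
    # robots_url is now "http://subdomain.domain.com/robots.txt"

    # Fetch and parse that file, then ask whether a given URL may be crawled
    rules = RobotFileParser()
    rules.set_url(robots_url)
    rules.read()
    print(rules.can_fetch("*", page_url))  # True if crawling is allowed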

In the case of multiple websites and sites with subdomains, the spider should try to access each site’s file, for example domain.com/robots.txt and subdomain.domain.com/robots.txt. The rules in each robots.txt file are treated as separate and unique, so disallowing robots in domain.com/robots.txt should result in domain.com/ being removed from search results, while subdomain.domain.com/ would remain unaffected and could still appear in the index. In some cases you can disallow an entire subdomain via the main website’s robots.txt file, but if you notice its pages appearing in the index, it’s time to go back to best practices and place unique robots.txt files at the subdomain level.
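
To make this concrete, here is a hypothetical pair of files (the host names and directives are for illustration only). The first is served at domain.com/robots.txt and blocks crawling of the main site; the second is served at subdomain.domain.com/robots.txt and leaves the subdomain fully crawlable:

    # http://domain.com/robots.txt
    User-agent: *
    Disallow: /

    # http://subdomain.domain.com/robots.txt
    User-agent: *
    Disallow:

A spider that respects the standard will stop crawling domain.com but keep crawling subdomain.domain.com, because it only reads the file served on the host it is currently crawling.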

Here is an excerpt from Bing Webmaster Tools that speaks directly to the example above:

Note that the host here is the full subdomain (us.contoso.com), not contoso.com nor http://www.contoso.com. This means that if you have multiple subdomains, BingBot must be able to fetch robots.txt at the root of each one of them, even if all these robots.txt files are the same. In particular, if a robots.txt file is missing from a subdomain, BingBot will not try to fall back to any other file in your domain, meaning it will consider itself allowed anywhere on the subdomain. BingBot does not “assume” directives from other hosts which have a robots.txt in place, associated with a domain.

Best Practice for Robots.txt

Place a robots.txt file on every domain and subdomain, every time.
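
As the Bing excerpt above notes, a crawler will not fall back to another host’s file, so even a subdomain with nothing to block should serve its own robots.txt. A minimal allow-everything file like this hypothetical one is enough:

    # Served at the root of the subdomain, e.g. http://blog.example.com/robots.txt
    User-agent: *
    Disallow: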

Free Robots.txt Tools

Resources for more information about Robots.txt

Examples of valid robots.txt URLs from Google WMT

The information in this table is taken from the Google Webmaster Tools guide to controlling crawling and indexing.

Robots.txt URL: http://example.com/robots.txt
Valid for: http://example.com/, http://example.com/folder/file
Not valid for: http://other.example.com/, https://example.com/, http://example.com:8181/
Comments: This is the general case. It is not valid for other subdomains, protocols or port numbers. It is valid for all files in all subdirectories on the same host, protocol and port number.

Robots.txt URL: http://www.example.com/robots.txt
Valid for: http://www.example.com/
Not valid for: http://example.com/, http://shop.www.example.com/, http://www.shop.example.com/
Comments: A robots.txt on a subdomain is only valid for that subdomain.

Robots.txt URL: http://example.com/folder/robots.txt
Comments: Not a valid robots.txt file! Crawlers will not check for robots.txt files in subdirectories.

Robots.txt URL: http://www.müller.eu/robots.txt
Valid for: http://www.müller.eu/, http://www.xn--mller-kva.eu/
Not valid for: http://www.muller.eu/
Comments: IDNs are equivalent to their punycode versions. See also RFC 3492.

Robots.txt URL: ftp://example.com/robots.txt
Valid for: ftp://example.com/
Not valid for: http://example.com/
Comments: Google-specific: We use the robots.txt for FTP resources.

Robots.txt URL: http://212.96.82.21/robots.txt
Valid for: http://212.96.82.21/
Not valid for: http://example.com/ (even if hosted on 212.96.82.21)
Comments: A robots.txt with an IP address as host name is only valid for crawling of that IP address as host name. It is not automatically valid for all websites hosted on that IP address (though it is possible that the robots.txt file is shared, in which case it would also be available under the shared host name).

Robots.txt URL: http://example.com:80/robots.txt
Valid for: http://example.com:80/, http://example.com/
Not valid for: http://example.com:81/
Comments: Standard port numbers (80 for http, 443 for https, 21 for ftp) are equivalent to their default host names.

Robots.txt URL: http://example.com:8181/robots.txt
Valid for: http://example.com:8181/
Not valid for: http://example.com/
Comments: Robots.txt files on non-standard port numbers are only valid for content made available through those port numbers.
  1. September 16, 2013 at 11:40 pm

    Hi, can I add the subdomain to the crawl list in the robots.txt file? I need http://blog.example.com to be crawled.

    • christijolson
      October 29, 2013 at 3:30 am

      Yes. You can include subdomains that you want crawled and indexed, as well as subdomains that you don’t want crawled or indexed. If you have a lot of rules for individual subdomains, what you can do is set up the main robots.txt file for the site with instructions pointing to each subdomain, and then develop a unique robots.txt for each individual subdomain.

  2. Jim
    December 8, 2014 at 6:38 pm

    Hi,
    Can you provide the specific documents from Google and Bing that state each subdomain should be treated separately for robots.txt?
