Robots.txt for Subdomains

April 18, 2013

Do I need a unique robots.txt for each of my Subdomains?

The quick and dirty answer is yes. Spiders treat subdomains as separate websites, and just as you create a unique robots.txt for each domain, one should also be created for each subdomain.

When a spider finds a URL, it takes the whole domain name (everything between http:// and the next ‘/’), then sticks a ‘/robots.txt’ on the end of it and looks for that file. If that file exists, then the spider should read it to see where it is allowed to crawl.
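As a rough illustration of that lookup, here is a minimal Python sketch using the standard library's urllib.parse; the page URL is just a placeholder:

    from urllib.parse import urlsplit, urlunsplit

    def robots_txt_url(page_url: str) -> str:
        """Derive the robots.txt URL a crawler would fetch for a given page URL."""
        parts = urlsplit(page_url)
        # Keep the scheme and host (including any subdomain and port),
        # then replace the path with /robots.txt.
        return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

    print(robots_txt_url("http://blog.example.com/2013/04/some-post.html"))
    # prints: http://blog.example.com/robots.txt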

In the case of multiple websites and sites with subdomains, the spider should try to access a robots.txt at the root of each host, for example http://www.example.com/robots.txt and http://blog.example.com/robots.txt. The rules in each robots.txt file are treated as separate and unique, so disallowing robots from http://blog.example.com should result in blog.example.com being removed from search results, while http://www.example.com would remain unaffected and could still appear in the index. Some sites try to disallow an entire subdomain via the main website's robots.txt file, but if you notice subdomain pages appearing in the index, it's time to go back to best practices and place unique robots.txt files at the subdomain level.
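For illustration, here is what the two files might look like, assuming a main site at www.example.com that should be fully crawlable and a blog.example.com subdomain that should be blocked entirely (both hosts are placeholders):

    # http://www.example.com/robots.txt (allow everything)
    User-agent: *
    Disallow:

    # http://blog.example.com/robots.txt (block everything)
    User-agent: *
    Disallow: /

An empty Disallow line permits full crawling, while "Disallow: /" blocks the entire host.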

Here is an excerpt from Bing Webmaster Tools that speaks directly to the example above:

Note that the host here is the full subdomain ([www.example.com], not [example.com] nor [blog.example.com]). This means that if you have multiple subdomains, BingBot must be able to fetch robots.txt at the root of each one of them, even if all these robots.txt files are the same. In particular, if a robots.txt file is missing from a subdomain, BingBot will not try to fall back to any other file in your domain, meaning it will consider itself allowed anywhere on the subdomain. BingBot does not “assume” directives from other hosts which have a robots.txt in place, associated with a domain.
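If you want to sanity-check this behavior across your own hosts, a small Python script can fetch robots.txt from each subdomain and report the HTTP status; the host names below are placeholders:

    from urllib.request import urlopen
    from urllib.error import HTTPError, URLError

    # Placeholder hosts; substitute your own domain and subdomains.
    hosts = ["www.example.com", "blog.example.com", "shop.example.com"]

    for host in hosts:
        url = f"http://{host}/robots.txt"
        try:
            with urlopen(url, timeout=10) as resp:
                print(f"{url}: HTTP {resp.status}")
        except HTTPError as e:
            # A 404 here means crawlers consider themselves allowed
            # everywhere on this subdomain; there is no fallback to
            # the parent domain's robots.txt.
            print(f"{url}: HTTP {e.code}")
        except URLError as e:
            print(f"{url}: unreachable ({e.reason})")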

Best Practice for Robots.txt

Place a robots.txt file on every domain and subdomain, every time.

Free Robots.txt Tools

Resources for more information about Robots.txt

Examples of valid robots.txt URLs from Google WMT

Information for this table is taken from the Google Webmaster Tools guide on controlling crawling and indexing.

Robots.txt URL: http://example.com/robots.txt
Valid for: http://example.com/, http://example.com/folder/file
Not valid for: http://other.example.com/, https://example.com/, http://example.com:8181/
Comments: This is the general case. It is not valid for other subdomains, protocols or port numbers. It is valid for all files in all subdirectories on the same host, protocol and port number.

Robots.txt URL: http://www.example.com/robots.txt
Valid for: http://www.example.com/
Not valid for: http://example.com/, http://shop.www.example.com/, http://www.shop.example.com/
Comments: A robots.txt on a subdomain is only valid for that subdomain.

Robots.txt URL: http://example.com/folder/robots.txt
Valid for: nothing (not a valid robots.txt file!)
Not valid for: any URL
Comments: Crawlers will not check for robots.txt files in subdirectories.

Robots.txt URL: http://www.müller.eu/robots.txt
Valid for: http://www.müller.eu/, http://www.xn--mller-kva.eu/
Not valid for: http://www.muller.eu/
Comments: IDNs are equivalent to their punycode versions. See also RFC 3492.

Robots.txt URL: ftp://example.com/robots.txt
Valid for: ftp://example.com/
Not valid for: http://example.com/
Comments: Google-specific: We use the robots.txt for FTP resources.

Robots.txt URL: http://212.96.82.21/robots.txt
Valid for: http://212.96.82.21/
Not valid for: http://example.com/ (even if hosted on 212.96.82.21)
Comments: A robots.txt with an IP address as host name is only valid for crawling of that IP address as host name. It is not automatically valid for all websites hosted on that IP address (though it is possible that the robots.txt file is shared, in which case it would also be available under the shared host name).

Robots.txt URL: http://example.com:80/robots.txt
Valid for: http://example.com/
Not valid for: http://example.com:8181/
Comments: Standard port numbers (80 for http, 443 for https, 21 for ftp) are equivalent to their default host names.

Robots.txt URL: http://example.com:8181/robots.txt
Valid for: http://example.com:8181/
Not valid for: http://example.com/
Comments: Robots.txt files on non-standard port numbers are only valid for content made available through those port numbers.
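As a closing illustration, Python's built-in urllib.robotparser applies exactly this per-host logic: it only honors the robots.txt fetched from the URL you give it. A short sketch, with placeholder URLs:

    from urllib.robotparser import RobotFileParser

    # Parse the subdomain's own robots.txt, not the parent domain's.
    rp = RobotFileParser("http://blog.example.com/robots.txt")
    rp.read()

    # Check whether any crawler ("*") may fetch these pages.
    print(rp.can_fetch("*", "http://blog.example.com/private/page.html"))
    print(rp.can_fetch("*", "http://blog.example.com/"))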