A robots.txt file is a useful way of flagging something to the web robots trawling your content, and it’s certainly nothing new. However, we’ve recently started using it in a rather different way: employing robots.txt validators that won’t validate the file unless an XML sitemap is specified in it.
Why this works…
It’s a really good idea to do this, as it ensures that the XML sitemap gets picked up by every web crawler that supports the ‘sitemap’ tag in the robots.txt file. Plenty of search engines do, including Google and MSN, which makes it doubly worthwhile.
An example of using robots.txt with a sitemap location
Let’s say, for example, that we’re using a sitemap for the domain example.com, located at http://www.example.com/sitemap.xml. The robots.txt file would look something like this:
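A minimal sketch, assuming every crawler is allowed to fetch everything. The User-agent and Disallow values are illustrative assumptions; only the sitemap URL comes from the example above.

```
User-agent: *
Disallow:

Sitemap: http://www.example.com/sitemap.xml
```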
It’s really important to leave a blank line between the user-agent tag and the sitemap tag, otherwise you might inadvertently cause problems for some web crawlers.
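The kind of check mentioned at the start, a validator that fails when no sitemap is declared, could be sketched roughly as follows. This is not the validator we use, just a minimal Python illustration; the function names find_sitemaps and has_sitemap are hypothetical, and the parsing is deliberately simple (it treats everything after a # as a comment and matches the tag name case-insensitively).

```python
from typing import List


def find_sitemaps(robots_txt: str) -> List[str]:
    """Return every URL declared on a 'Sitemap:' line, in file order."""
    sitemaps = []
    for raw_line in robots_txt.splitlines():
        # Drop anything after '#' (treated as a comment) and trim whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Split on the first colon only, so the URL's own colons are kept.
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


def has_sitemap(robots_txt: str) -> bool:
    """True if the robots.txt text declares at least one sitemap."""
    return bool(find_sitemaps(robots_txt))


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow:\n"
        "\n"
        "Sitemap: http://www.example.com/sitemap.xml\n"
    )
    print(find_sitemaps(sample))
    print(has_sitemap(sample))
```

A stricter validator might also confirm that the sitemap URL is absolute and actually reachable; this sketch only checks that the tag is present.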