This article explains what a robots.txt file is, the syntax it uses, and how to create one and upload it to your server.
What is a Robots.txt file?
Robots.txt is a text file that tells web robots (typically search engine crawlers such as Googlebot, Bingbot, Msnbot, discobot, and Slurp) which pages of a website they may or may not crawl.
In practice, a robots.txt file consists of one or more rules that allow or block crawling. Each rule works by “allowing” or “disallowing” particular paths for a specific user agent, or for all user agents.
The robots.txt syntax is described below:
There are six common terms that can be included in robots.txt as required:
1. User-agent: The specific web crawler to which you’re giving crawl instructions (usually a search engine robot such as Googlebot, Bingbot, Msnbot, discobot, or Slurp).
- Syntax: User-agent: [user-agent name]
- Example: User-agent: Googlebot
2. Disallow: The command to tell a user-agent not to crawl a particular folder or URL.
- Syntax: Disallow: [URL string not to be crawled]
- Example: Disallow: /content
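3. Allow: The command to tell a crawler it can access a page or subfolder even though its parent page or subfolder is disallowed. It is not limited to Googlebot: Allow is part of the Robots Exclusion Protocol standard (RFC 9309) and is honored by other major crawlers such as Bingbot.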
- Syntax: Allow: [URL string to be crawled]
- Example: Allow: /mywebpage
4. Crawl-delay: How many seconds a crawler should wait between successive requests to the site. Note that Googlebot ignores this command. Google Search Console formerly offered a separate crawl-rate setting, which Google has since retired.
- Syntax: Crawl-delay: [Seconds]
- Example: Crawl-delay: 10
5. Sitemap: Gives the location of any XML sitemap(s) associated with the site. Note that this command is supported only by Google, Ask, Bing, and Yahoo. The location should be a full URL, including the protocol.
- Syntax: Sitemap: [Sitemap URL]
- Example: Sitemap: https://abc.com/sitemap.xml
6. Noindex: An unofficial command that was used to ask crawlers not to index specific web pages, typically to keep duplicate content out of the index. Note that it was never part of the robots.txt standard, and Google stopped honoring it on September 1, 2019. A robots meta tag or an X-Robots-Tag HTTP header is the supported way to keep a page out of the index.
Ex: Suppose a post is published with five different tags. Six pages are produced in total (1 post page and 5 tag pages). Because the same content appears on all six pages, the duplicate content can hurt SEO.
- Syntax: Noindex: [URL string not to be indexed]
- Example: Noindex: /tag
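To see how these terms behave together, here is a minimal sketch using Python's standard-library urllib.robotparser. The rules, the /content/mywebpage path, and the abc.com URLs are assumptions made up for illustration. Python's parser applies the first matching rule in file order, so the Allow line is listed before the broader Disallow line; that ordering also gives the same answer under the most-specific-match rule that Google documents. site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; paths and sitemap URL are assumptions.
RULES = """\
User-agent: Googlebot
Allow: /content/mywebpage
Disallow: /content
Crawl-delay: 10
Sitemap: https://abc.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path: str) -> bool:
    """Return True if Googlebot may fetch the given path under RULES."""
    return parser.can_fetch("Googlebot", "https://abc.com" + path)


print(allowed("/content/mywebpage"))    # page explicitly allowed inside a blocked folder
print(allowed("/content/other"))        # falls under Disallow: /content
print(allowed("/about"))                # no rule matches, so crawling is permitted
print(parser.crawl_delay("Googlebot"))  # delay in seconds declared for this agent
print(parser.site_maps())               # list of declared sitemap URLs
```

This only shows how one parser reads the file. Each real crawler decides for itself which commands to honor.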
A Few Examples of Robots.txt:
Robots.txt file URL: www.example.com/robots.txt
Example 1: Blocking all web crawlers from all content.
The two lines “User-agent: *” and “Disallow: /” in a robots.txt file tell all web crawlers not to crawl any pages on www.example.com, including the homepage. Here, * means all web crawlers.
Example 2: Allowing all web crawlers access to all content.
The two lines “User-agent: *” and “Disallow:” (with nothing after the colon) in a robots.txt file tell web crawlers that they may crawl all pages on www.example.com, including the homepage. Here, * means all web crawlers.
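Examples 1 and 2 can be checked side by side with a short sketch, again using Python's urllib.robotparser. The rule text is the conventional form for "block everything" and "allow everything". The can_crawl helper and its name are hypothetical, introduced only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Example 1: block every crawler from every page.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Example 2: an empty Disallow value blocks nothing.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def can_crawl(rules: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


HOME = "https://www.example.com/"
for name, rules in (("block all", BLOCK_ALL), ("allow all", ALLOW_ALL)):
    print(name, can_crawl(rules, "Googlebot", HOME))
```

The only difference between the two files is the single slash after Disallow, which is why a stray "/" can hide an entire site from crawlers.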
Example 3: Allow only Google’s web crawler to crawl the site, but block it from the page named “mysecretpage”. The delay between crawling each page should be 5 seconds, and the sitemap location should be included in the robots.txt file.
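One possible reading of Example 3 is sketched below, with the rules checked through Python's urllib.robotparser. The sitemap URL and the catch-all group that blocks every other crawler are assumptions, since the description does not spell them out. Crawl-delay appears because the example asks for it, even though Googlebot itself ignores that command.

```python
from urllib.robotparser import RobotFileParser

# One interpretation of Example 3; the sitemap URL is an assumed placeholder.
EXAMPLE_3 = """\
User-agent: Googlebot
Disallow: /mysecretpage
Crawl-delay: 5

User-agent: *
Disallow: /

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_3.splitlines())

SITE = "https://www.example.com"
print(parser.can_fetch("Googlebot", SITE + "/about"))         # Googlebot may crawl ordinary pages
print(parser.can_fetch("Googlebot", SITE + "/mysecretpage"))  # but not the secret page
print(parser.can_fetch("Bingbot", SITE + "/about"))           # other crawlers hit the catch-all group
print(parser.crawl_delay("Googlebot"))                        # delay declared for Googlebot
print(parser.site_maps())                                     # requires Python 3.8+
```

If other crawlers should remain free to crawl, the "User-agent: *" group can simply be dropped.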
How to Create a Robots.txt file & Upload it to the Server?
Here are the steps to create a robots.txt file:
- On your PC, create a robots.txt file for your website using the syntax described above.
- Log in to your website server or cPanel.
- Go to the file manager and upload the robots.txt file to the main (root) directory of the website, so that it is served at www.example.com/robots.txt.
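The first step can also be scripted. The sketch below writes a robots.txt file with Python and reads it back. The rule content and the write_robots_txt helper are hypothetical, and a temporary directory stands in for the website's main directory. The upload itself depends on your hosting setup and is not shown.

```python
import tempfile
from pathlib import Path

# Hypothetical content; replace with the rules your site needs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /content
"""


def write_robots_txt(site_root: Path, content: str) -> Path:
    """Write `content` to <site_root>/robots.txt and return the file path."""
    target = site_root / "robots.txt"
    # The file is plain text; UTF-8 is the encoding assumed here.
    target.write_text(content, encoding="utf-8")
    return target


# A temporary directory stands in for the website's main directory.
with tempfile.TemporaryDirectory() as tmp:
    path = write_robots_txt(Path(tmp), ROBOTS_TXT)
    print(path.name)
    print(path.read_text(encoding="utf-8"))
```

The file name must be exactly robots.txt, in lowercase, because crawlers request that exact path.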