What is the robots.txt file?

Robots.txt is a text file that webmasters create to tell web robots (search engine robots) which pages on your website they may crawl and which they should not crawl.

The robots.txt file is primarily used to specify which parts of your website should be crawled by spiders or web crawlers. It can specify different rules for different spiders, each identified by its user-agent name.

Googlebot is an example of a spider. Google deploys it to crawl the Internet and record information about websites, which Google then uses to index pages and decide how high to rank them in search results.
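To make the idea of per-spider rules concrete, here is a minimal sketch that parses one robots.txt with a separate group for Googlebot, a group for Bingbot, and a catch-all `*` group, using Python's standard-library `urllib.robotparser`. The combined file and the `ExampleBot` agent name are hypothetical. Python's parser applies the first group whose User-agent matches the crawler and uses the `*` group only as a fallback, so a named crawler does not also inherit the `*` rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one group per crawler plus a catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /xyz-subfolder/

User-agent: Bingbot
Disallow: /xyz-subfolder/blocked-page.html

User-agent: *
Disallow: /
"""

# Parse from a string so no network access is needed.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

home = "https://www.xyz.com/"
print(parser.can_fetch("Googlebot", home))   # True: its group only blocks /xyz-subfolder/
print(parser.can_fetch("Bingbot", home))     # True: its group only blocks one page
print(parser.can_fetch("ExampleBot", home))  # False: no named group, falls back to *
```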

  • Example of a robots.txt file URL: https://www.xyz.com/robots.txt (the file must sit at the root of the site)
  • Blocking all web crawlers from all content

User-agent: *
Disallow: /

Using this syntax in a robots.txt file would tell all web crawlers not to crawl any pages of the website, including the homepage.
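One way to confirm how a compliant crawler reads these two lines is to feed them to Python's standard-library `urllib.robotparser` and ask whether a few URLs may be fetched. In this sketch the `build_parser` helper, the `/about.html` page and the `ExampleBot` agent are hypothetical; only the two directives come from the example above.

```python
from urllib.robotparser import RobotFileParser

# The "block everything" rules from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs on the example site, including the homepage.
for url in ["https://www.xyz.com/", "https://www.xyz.com/about.html"]:
    for agent in ["Googlebot", "Bingbot", "ExampleBot"]:
        print(agent, url, parser.can_fetch(agent, url))  # False for every pair
```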

  • Allowing all web crawlers access to all content

User-agent: *
Disallow:

Using this syntax in a robots.txt file tells web crawlers that they may crawl all pages of the website, including the homepage, because an empty Disallow value blocks nothing.
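The only difference between "allow everything" and "block everything" is the single slash after Disallow. A minimal sketch with Python's `urllib.robotparser` shows the two outcomes side by side; the page URL and the `ExampleBot` agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it over HTTP."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# An empty Disallow value blocks nothing.
allow_all = build_parser("User-agent: *\nDisallow:\n")
# A single slash blocks every path.
block_all = build_parser("User-agent: *\nDisallow: /\n")

url = "https://www.xyz.com/xyz-subfolder/page.html"  # hypothetical page
print(allow_all.can_fetch("ExampleBot", url))  # True
print(block_all.can_fetch("ExampleBot", url))  # False
```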

  • Blocking a specific web crawler from a specific folder

User-agent: Googlebot
Disallow: /xyz-subfolder/

This syntax tells only Google’s crawler (user-agent name Googlebot) not to crawl any page whose URL path begins with the string /xyz-subfolder/.
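Disallow values are compared against the start of the URL path, so this rule covers everything under /xyz-subfolder/, including nested folders, but not a sibling path that only shares the first few characters. The sketch below probes that prefix match with Python's `urllib.robotparser`; all URLs other than the folder name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /xyz-subfolder/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical URLs used to probe the prefix match.
checks = [
    ("Googlebot", "https://www.xyz.com/xyz-subfolder/page.html"),       # blocked
    ("Googlebot", "https://www.xyz.com/xyz-subfolder/deep/page.html"),  # blocked: nested path
    ("Googlebot", "https://www.xyz.com/xyz-subfolder-archive/"),        # allowed: different prefix
    ("Googlebot", "https://www.xyz.com/"),                              # allowed
    ("Bingbot", "https://www.xyz.com/xyz-subfolder/page.html"),         # allowed: no Bingbot group
]
for agent, url in checks:
    print(agent, url, parser.can_fetch(agent, url))
```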

  • Blocking a specific web crawler from a specific web page

User-agent: Bingbot
Disallow: /xyz-subfolder/blocked-page.html

This syntax tells only Bing’s crawler (user-agent name Bingbot) to avoid crawling the specific page /xyz-subfolder/blocked-page.html.
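A quick check with Python's `urllib.robotparser` confirms that this rule is narrow: it blocks the one listed page for Bingbot only, while the rest of the folder and every other crawler are unaffected. The `other-page.html` URL is hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Bingbot
Disallow: /xyz-subfolder/blocked-page.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

blocked = "https://www.xyz.com/xyz-subfolder/blocked-page.html"
sibling = "https://www.xyz.com/xyz-subfolder/other-page.html"  # hypothetical page

print(parser.can_fetch("Bingbot", blocked))    # False: the listed page
print(parser.can_fetch("Bingbot", sibling))    # True: the rest of the folder stays open
print(parser.can_fetch("Googlebot", blocked))  # True: the rule only names Bingbot
```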

There are two important considerations when using /robots.txt:

  • Robots can ignore your /robots.txt. Malware robots that scan the web for security vulnerabilities and email address harvesters used by spammers, in particular, will pay no attention to it.
  • The /robots.txt file is a publicly available file. Anyone can see what sections of your server you don’t want robots to use.
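Because the file is public, its Disallow lines double as a list of the paths a site would rather keep away from crawlers. The sketch below pulls them out with plain string handling; `disallowed_paths` is a hypothetical helper written for illustration, a simplified reader rather than a full robots.txt parser, and the file content reuses the examples above.

```python
# Hypothetical robots.txt content, as anyone could download it from /robots.txt.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /xyz-subfolder/

User-agent: Bingbot
Disallow: /xyz-subfolder/blocked-page.html
"""

def disallowed_paths(text: str) -> list[str]:
    """Return every non-empty Disallow value, in file order, ignoring comments."""
    paths = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths

print(disallowed_paths(ROBOTS_TXT))
# ['/xyz-subfolder/', '/xyz-subfolder/blocked-page.html']
```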
