- Wednesday, 15th Sep, 2010
If you are new to the web industry, you might wonder what the ‘robots.txt’ file stored in your server’s root directory is.
The robots.txt file is a regular text file that restricts access to your website by search engine robots, spiders, or crawlers. You can instruct robots not to crawl and index certain files or directories within your website. There are two ways to generate a robots.txt file: using an online robots.txt generator, or creating the file manually.
In order to use a robots.txt file, you must have access to the root directory of your domain, and the ‘robots.txt’ file must be stored in the root of your website (e.g. www.example.com/robots.txt).
Below is the basic format of a robots.txt file:
User-agent: *
Disallow:
With the above declared, all search engine robots are allowed to access and index all files on the website.
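As a quick sketch, you can verify this behaviour locally with Python’s built-in urllib.robotparser module (no web server needed; www.example.com is just a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# The basic format above: an empty Disallow line permits everything.
rules = """
User-agent: *
Disallow:
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Every URL is fetchable under these rules.
print(rp.can_fetch("*", "http://www.example.com/anything.html"))  # True
```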
There are a few common terms that you will use throughout the robots.txt file:
User-agent: This is the field used to specify the robot’s name. For example, you might want to restrict the Google search engine from accessing and indexing your website by writing ‘User-agent: Googlebot’. If the value is ‘*’, it means ALL search engine robots.
Allow: This is the field used to specify a URL that may be visited. For example, ‘Allow: /support’ allows robots to access and index all the files under the support folder.
Disallow: This is the field used to specify a URL that is NOT to be visited. For example, ‘Disallow: /support/about.html’ prevents robots from accessing and indexing ‘about.html’ in the support folder.
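The Disallow example above can be checked the same way with urllib.robotparser (again, www.example.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

# Block a single file under /support/, as in the field description above.
rules = """
User-agent: *
Disallow: /support/about.html
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# about.html is blocked, but the rest of /support/ stays accessible.
print(rp.can_fetch("*", "http://www.example.com/support/about.html"))  # False
print(rp.can_fetch("*", "http://www.example.com/support/index.html"))  # True
```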
1. The example below blocks all robots from accessing and indexing the website, except Googlebot:
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
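A sketch of how this plays out, using Python’s urllib.robotparser (www.example.com and page.html are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Example 1: block everything by default, but allow Googlebot.
rules = """
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

# Googlebot gets in; any other crawler falls back to the blanket Disallow.
print(rp.can_fetch("Googlebot", "http://www.example.com/page.html"))  # True
print(rp.can_fetch("OtherBot", "http://www.example.com/page.html"))   # False
```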
2. The example below tells all robots to access and index only logo.jpg and avatar.jpg in the images folder:
User-agent: *
Disallow: /images/
Allow: /images/logo.jpg
Allow: /images/avatar.jpg
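This example can also be checked with urllib.robotparser. One caveat: Python’s parser applies the first matching rule rather than the longest match that Google uses, so the Allow lines are listed before the Disallow line in this sketch (www.example.com and photo.png are placeholders):

```python
from urllib.robotparser import RobotFileParser

# Example 2: only logo.jpg and avatar.jpg are fetchable inside /images/.
# Allow lines come first because Python's parser stops at the first match.
rules = """
User-agent: *
Allow: /images/logo.jpg
Allow: /images/avatar.jpg
Disallow: /images/
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "http://www.example.com/images/logo.jpg"))   # True
print(rp.can_fetch("*", "http://www.example.com/images/photo.png"))  # False
```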