What is robots.txt?

Tutorial Details

Difficulty: Beginner
Technology: Basic Server-Side Knowledge
Supported Browser: –

Introduction

If you are new to the web industry, you might wonder what the ‘robots.txt’ file stored in your server's root directory is for.

A robots.txt file is a plain text file that tells search engine robots (also called spiders or crawlers) which parts of your website they may visit. You can instruct robots not to crawl certain files or directories within your website. Keep in mind that the file is a request that well-behaved crawlers honor, not an access control mechanism, so it should not be relied on to protect private content. There are two ways to create a robots.txt file: use an online robots.txt generator, or write the file manually.

Requirements

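In order to use a robots.txt file, you must have access to the root directory of your domain, because the ‘robots.txt’ file must be stored in the root of your website (e.g. www.example.com/robots.txt). Crawlers only look for the file at that location, so a robots.txt placed in a subdirectory is ignored.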
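As a small illustration of the location rule, the sketch below derives the robots.txt address from any page URL on the same host. The helper name `robots_url` and the sample page URL are assumptions made for this example only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


location = robots_url("http://www.example.com/support/about.html")
print(location)
```

Whatever page the crawler starts from, the file it consults is always the one at the root of that host.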

About the Format

Below is the basic format of the robots.txt:

[code]
User-agent: *
Disallow:
[/code]

With the above declared, all search engine robots are allowed to access and index all the files of the website, because an empty Disallow value means that nothing is disallowed.
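One way to confirm how this file is read is to feed it to Python's standard `urllib.robotparser` module. This is only a sketch: the robot name `AnyBot` and the URLs are made up for the example, and real crawlers may use their own parsers.

```python
from urllib.robotparser import RobotFileParser

# The basic "allow everything" file from above, one string per line.
allow_all = ["User-agent: *", "Disallow:"]

parser = RobotFileParser()
parser.parse(allow_all)

# "AnyBot" and both URLs are hypothetical.
home_ok = parser.can_fetch("AnyBot", "http://www.example.com/")
deep_ok = parser.can_fetch("AnyBot", "http://www.example.com/private/page.html")
print(home_ok, deep_ok)
```

Both checks should report that the robot is allowed, since no path has been disallowed.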

There are a few common fields that you will use throughout the robots.txt file:

User-agent

This field specifies the name of the robot that the rules apply to. For example, to restrict Google's crawler from accessing your website, write ‘User-agent: Googlebot’. If the value is ‘*’, the rules apply to ALL search engine robots.

Allow

This field specifies a URL path that robots may visit. For example, ‘Allow: /support’ allows robots to access every URL whose path begins with /support, which includes all the files under the support folder.

Disallow

This field specifies a URL path that robots must NOT visit. For example, ‘Disallow: /support/about.html’ tells robots not to access ‘about.html’ in the support folder.
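The three fields can be tried out together with Python's standard `urllib.robotparser`. The sketch below combines the Googlebot and about.html examples from the descriptions above; `OtherBot` and `faq.html` are hypothetical names added only to show what is left unaffected.

```python
from urllib.robotparser import RobotFileParser

# Rules that apply to Googlebot only and block a single file.
rules = """
User-agent: Googlebot
Disallow: /support/about.html
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "http://www.example.com"
googlebot_about = parser.can_fetch("Googlebot", base + "/support/about.html")
googlebot_faq = parser.can_fetch("Googlebot", base + "/support/faq.html")  # hypothetical file
otherbot_about = parser.can_fetch("OtherBot", base + "/support/about.html")  # hypothetical robot
print(googlebot_about, googlebot_faq, otherbot_about)
```

Only the combination of the named robot and the disallowed path is refused. Other files, and robots that no User-agent line matches, remain allowed.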

Example

1. The example below blocks all robots from accessing the website, except Googlebot, which is allowed to access everything:

[code]
# Block every robot by default
User-agent: *
Disallow: /

# Let Googlebot access everything
User-agent: Googlebot
Allow: /
[/code]
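To double-check which robots Example 1 lets in, the rules can be run through Python's standard `urllib.robotparser`. Treat this as a rough check, not as a statement of how any particular search engine behaves. `OtherBot` and the page URL are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

example_1 = """\
User-agent: *
Disallow: /

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(example_1.splitlines())

page = "http://www.example.com/index.html"  # hypothetical page
googlebot_allowed = parser.can_fetch("Googlebot", page)
otherbot_allowed = parser.can_fetch("OtherBot", page)  # hypothetical robot
print(googlebot_allowed, otherbot_allowed)
```

A robot follows the group that names it specifically and falls back to the ‘*’ group only when no such group exists, which is why Googlebot is allowed while every other robot is blocked.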

2. The example below tells all robots to stay out of the images folder, except for logo.jpg and avatar.jpg, which they may still access:

[code]
User-agent: *
Disallow: /images/
Allow: /images/logo.jpg
Allow: /images/avatar.jpg
[/code]
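Example 2 relies on the more specific Allow lines taking priority over the broader Disallow line. The sketch below models that with a "longest matching path wins" rule, which is the precedence described in RFC 9309 and in Google's documentation. Some parsers instead apply the first matching line in file order, so results can differ between tools. The function `is_allowed` and the file `banner.png` are hypothetical and exist only for this illustration.

```python
def is_allowed(rules, path):
    """Decide access with longest-match precedence.

    rules is a list of (directive, prefix) pairs taken from one group.
    With no matching rule the path is allowed; Allow wins a tie.
    """
    best_directive, best_prefix = "allow", ""
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > len(best_prefix)
        tie_for_allow = len(prefix) == len(best_prefix) and directive == "allow"
        if longer or tie_for_allow:
            best_directive, best_prefix = directive, prefix
    return best_directive == "allow"


example_2 = [
    ("disallow", "/images/"),
    ("allow", "/images/logo.jpg"),
    ("allow", "/images/avatar.jpg"),
]

logo_ok = is_allowed(example_2, "/images/logo.jpg")
banner_ok = is_allowed(example_2, "/images/banner.png")  # hypothetical file
home_ok = is_allowed(example_2, "/index.html")
print(logo_ok, banner_ok, home_ok)
```

Placing the Allow lines before the Disallow line gives the same outcome under both precedence rules, which can be a safer ordering when you are unsure how a given robot parses the file.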
