

The robots.txt file is used to deny or allow search engine spiders access to the whole site or to specific pages and web resources.

The file must be named robots.txt and must be placed in the root of the website: http://www.tuosito.com/robots.txt.

The format of the robots.txt file

The format and semantics of the “/robots.txt” file are as follows:

The file consists of one or more records, and each record contains lines of the form “<field>: <value>”. The field name is case insensitive (you can use either uppercase or lowercase).

Comments may be added with the ‘#’ character (the UNIX shell convention). Everything from the ‘#’ to the end of the line is ignored by the machine that reads the file.

A record begins with one or more User-agent lines, followed by one or more Disallow lines, as detailed below:
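The line format described above can be illustrated with a small sketch. The helper name parse_line is hypothetical and not part of any library; it only shows the three rules in action (comment stripping, splitting on the first colon, case-insensitive field names) and is not a complete robots.txt parser.

```python
def parse_line(line):
    """Return (field, value) for one robots.txt line, or None if it carries no field."""
    # Everything from '#' to the end of the line is a comment.
    line = line.split("#", 1)[0].strip()
    if ":" not in line:
        return None  # blank line, comment-only line, or malformed line
    # Split on the first colon only, so URLs in the value stay intact.
    field, value = line.split(":", 1)
    # Field names are case insensitive, so normalise them to lowercase.
    return field.strip().lower(), value.strip()


print(parse_line("Disallow: /tmp/  # temporary folder"))
print(parse_line("USER-AGENT: *"))
print(parse_line("# just a comment"))
```

Splitting on the first colon only is a deliberate choice here: a value such as a sitemap URL contains further colons that must not be cut.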

User-agent

– The value of this field is the name of the robot whose access the record describes.
– If you list more than one robot, all the listed robots receive the same treatment.
– If you assign the value ‘*’ to User-agent, the record applies to every robot that is not matched by another record.

Disallow

– The value of this field specifies a partial URL that should not be visited. For example: Disallow: /login
– At least one Disallow field must be present in each record.
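Because the Disallow value is a partial URL, matching works on prefixes. The sketch below is a simplified model of that rule; the function name is_disallowed is hypothetical, and real crawlers apply additional rules that are not covered here.

```python
def is_disallowed(path, disallow_values):
    """True if `path` starts with any non-empty Disallow value."""
    # An empty Disallow value blocks nothing, so it is skipped.
    return any(path.startswith(prefix) for prefix in disallow_values if prefix)


rules = ["/login"]
print(is_disallowed("/login", rules))             # the page itself
print(is_disallowed("/login/reset.html", rules))  # anything below it
print(is_disallowed("/logout", rules))            # different prefix
```

Note that under prefix matching a value of /login would also cover a path such as /loginpage, so a trailing slash is useful when only a directory is meant.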

Sitemap

The value of this field tells the robot where the sitemap of the site is located. It is not mandatory, and if you have several sitemaps you can add one line for each.

Example: Sitemap: http://www.mysite.com/sitemap.xml
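As a rough illustration of how a client might read these lines, Python's standard urllib.robotparser module exposes them through site_maps() (available since Python 3.8). The second sitemap URL and the Disallow rule below are made up for the example.

```python
from urllib.robotparser import RobotFileParser

lines = [
    "User-agent: *",
    "Disallow: /login",
    "Sitemap: http://www.mysite.com/sitemap.xml",
    "Sitemap: http://www.mysite.com/sitemap-news.xml",  # hypothetical second sitemap
]

parser = RobotFileParser()
parser.parse(lines)  # parse in-memory lines instead of fetching a URL

sitemaps = parser.site_maps()  # list of sitemap URLs, in file order
print(sitemaps)
```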

Keep in mind that an empty robots.txt file is treated by search engines as if it were not there at all, so robots consider themselves “welcome” to every file on the site.

Illustrative examples of robots.txt files
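The behaviour for an empty file can be checked with Python's standard urllib.robotparser module. This is only a sketch of how one parser treats the case; the robot name ExampleBot and the page path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse([])  # an empty robots.txt: no records at all

# With no records, nothing is disallowed for any robot.
allowed = parser.can_fetch("ExampleBot", "http://www.tuosito.com/any/page.html")
print(allowed)
```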

# Robots.txt written by Simon for http://www.esempio.com/

User-agent: *
Disallow: /admin/login/ # This is an infinite virtual URL space
Disallow: /tmp/ # temporary folder
Disallow: /quelchetipare.html
Sitemap: http://www.miosito.com/sitemap.xml

This file specifies that no robot may visit URLs that begin with “/admin/login/” or “/tmp/”, or the page /quelchetipare.html. It also indicates the location of the sitemap.

Everything written after “#” is just a comment and is not read by the robots.

Here is a more complex case that grants unrestricted access to one specific robot:
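One way to verify the first example is to feed it to Python's standard urllib.robotparser module and ask which URLs a robot may fetch. The robot name ExampleBot and the sample paths other than /quelchetipare.html are assumptions chosen for the test.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# Robots.txt written by Simon for http://www.esempio.com/

User-agent: *
Disallow: /admin/login/ # This is an infinite virtual URL space
Disallow: /tmp/ # temporary folder
Disallow: /quelchetipare.html
Sitemap: http://www.miosito.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

BASE = "http://www.esempio.com"
paths = ["/", "/admin/", "/admin/login/form.html", "/tmp/a.txt", "/quelchetipare.html"]
# Map each sample path to whether a generic robot may fetch it.
results = {path: parser.can_fetch("ExampleBot", BASE + path) for path in paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "blocked")
```

The path /admin/ stays allowed because only the longer prefix /admin/login/ is listed.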

# Robots.txt written by Simon for http://www.esempio.com/

User-agent: *
Disallow: /admin/login/ # This is an infinite virtual URL space
Disallow: /tmp/ # temporary folder
Disallow: /anythinck.html
Sitemap: http://www.miosito.com/sitemap.xml

# Yahoo! can index everything.
User-agent: Yahoo! Slurp
Disallow:

Leaving the Disallow field empty lets the Yahoo! robot access all files without restrictions.

As a final example, here is a robots.txt file that closes the door to all search engine bots:

# We do not want the site in any search engine
User-agent: *
Disallow: /
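The effect of the robot-specific record can be sketched with Python's standard urllib.robotparser module. The robot name OtherBot and the test URL are hypothetical; how a real crawler matches its own name against User-agent lines may differ from this parser.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/login/
Disallow: /tmp/
Disallow: /anythinck.html

# Yahoo! can index everything.
User-agent: Yahoo! Slurp
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

url = "http://www.esempio.com/tmp/cache.html"
# The record with an empty Disallow applies to the named robot only.
yahoo_allowed = parser.can_fetch("Yahoo! Slurp", url)
# Every other robot falls back to the '*' record.
other_allowed = parser.can_fetch("OtherBot", url)
print(yahoo_allowed, other_allowed)
```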
