Meta robots tags are part of a website's HTML code and tell search engines not to index a page or not to follow parts of it. There are different ways to use and apply them. By default, major search engines crawl and index all web pages whether or not a robots tag is present; how you define the tag gives search engines instructions on what to do when they do crawl the page.
If you want search engines to crawl all of your pages, simply omit the tags. By default, pages will then be crawled, indexed, and archived; you have to deny these behaviors explicitly with a robots tag in the on-page HTML code. Meta robots tags belong in the head section at the top of the page, which is where search engines look when they crawl a website.
There is also the option to include a robots.txt file, which can easily prohibit crawlers from going through certain pages at all. Meta robots tags, however, offer more options and finer control over which actions to prevent.
Different Types of Meta Robot Tags
As a rule of thumb, meta robots tags always go inside the head section at the top of the page’s code, and they can appear anywhere within it. As part of this tag, the name attribute should be set to robots, which applies to all crawlers. You can also change this attribute to target a specific search engine, and several such tags can be included. Here is an example:
<meta name="googlebot" content="xxx" />
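For instance, to give the same directive to more than one crawler, you could include one tag per bot (bingbot, shown here alongside googlebot, is simply another common crawler name):

<meta name="googlebot" content="noindex" />
<meta name="bingbot" content="noindex" />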
The content value that follows the name attribute should contain the directive you would like the search engine to follow. There are several values you can use to prevent the search engine from performing certain actions. A widely used value is noindex, which tells the search engine not to include the page in its index. Here is an example:
<meta name="robots" content="noindex" />
Another frequent value is nofollow, which tells crawlers not to follow the links on the page. Keep in mind that this does not guarantee the links will never be followed: they may still be discovered in other ways, such as through other sites linking to those pages without a nofollow, through browser toolbars, analytics tools, and so on. Here is an example:
<meta name="robots" content="nofollow" />
Another thing to keep in mind is that when you combine multiple directives, spaces within the tag do not make a difference; you can include or omit spaces after the commas. Per Google’s specifications, capitalization does not make a difference either, so you can use all caps or all lowercase.
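For example, the following tags are all equivalent ways of combining the noindex and nofollow directives:

<meta name="robots" content="noindex, nofollow" />
<meta name="robots" content="noindex,nofollow" />
<meta name="robots" content="NOINDEX, NOFOLLOW" />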
Are Meta Robots Necessary?
There are some instances where the meta robots tag should be used and is even essential. Search engines follow a set of guidelines to rank pages and certain things can lower your ranking or be against their rules. Some of these things can be avoided with the robots tags.
If your website has pages with duplicate content, this can hurt how you rank. Duplicate content can happen in several ways, including having two different pages whose content looks almost exactly the same. Sites may have this for different reasons, but if you would like to hide one of those pages from being indexed, you can use the noindex value in a meta robots tag. In some circumstances, it is better to use a canonical or pagination tag instead. These tell search engines which page to include in their indexes and which to exclude from the search results pages.
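As an illustration (the URL here is only a placeholder), a canonical tag placed in the head of a duplicate page points search engines at the preferred version:

<link rel="canonical" href="https://www.example.com/preferred-page" />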
Robots tags are also useful if your page has multiple URLs that you do not want followed; outgoing links from your page to other websites will then not be crawled. If the links on your page lead to untrusted content for any reason, you will want to use the nofollow value. Another reason to use it is to deprioritize certain links on your page; for instance, registration-type links are not useful to search engines.
<meta name="robots" content="nofollow" />
Using the Robots.txt File
Other than meta robots tags, there is another way you can prevent a search engine from crawling your pages. This can be done with the robots.txt file, which is part of the robots exclusion protocol (REP). It instructs search engine robots on how to crawl and index the site. Create a blank file in a text editor, save it as robots.txt, add your rules to it, and upload it to the top-level directory of the web server. Here is an example of the rules to disallow crawling of the entire site:

User-agent: *
Disallow: /
This is another way to avoid placing robots instructions in a tag in the page’s head, although it is not recommended to rely on this method alone. The file name in this case is case sensitive. Keep in mind that the file is still public, so anyone can see what has been blocked from the search engines; do not include private information in it. Every page on the website that you are trying to keep out of the index needs to be covered by a rule in this file.
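For instance (the paths here are only placeholders), rules blocking a specific directory and a specific page for all crawlers might look like this:

User-agent: *
Disallow: /private/
Disallow: /old-page.html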
If for any reason you need to exclude pages of your website from search results, the meta robots tag is usually the best way. Implementing this tag can be tricky or confusing at times. Contact Customer Paradigm today for a reliable and reputable company to help with all your Colorado SEO needs!