Custom robots.txt for Blogger (all-in-one guide)


“Blocked by robots.txt!” “Indexed, though blocked by robots.txt!” “Submitted URL blocked by robots.txt!” And how do you create a custom robots.txt for Blogger?

Do these messages look familiar? In this all-in-one guide you will get proper knowledge of, and a solution for, each of them, and quite a bit more. Needless to say, after reading this article you can set up your custom robots.txt in Blogger in one go, and you will understand every basic term used in a robots.txt file. Without wasting your valuable time, let’s get started.

What is robots.txt

Robots.txt is a set of instructions for web crawlers (also called spiders or bots) that controls which parts of a website may be crawled and which parts should not be crawled.
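For example, a minimal robots.txt placed at the root of a site could look like the sketch below (the /private/ path here is only a hypothetical placeholder):

User-agent: *
Disallow: /private/
Allow: /

This tells every bot that it may crawl the whole site except URLs under /private/.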

Why robots.txt is important

Using robots.txt is recommended by Google itself. It helps in the following ways:

1) Clear instructions for bots:

When bots enter your website, they look for instructions first. Robots.txt is the standard way to give them those instructions. It provides clarity on both sides, i.e., for the crawlers and for the website owner.

2) Control which pages are not crawled by bots:

You can block crawling of any pages that you do not want crawled. Below are a few cases.

i) Some sensitive information may be located on your website.

ii) You may want to launch a product only after a certain date; until then, you can keep its pages hidden from search engines.

3) Prevent your site from being overloaded with requests:

A website can generate far too many crawlable URLs. For example, a search directory entry is created every time somebody searches a query in the search box on your website; if that directory is crawled, the site can be overloaded with requests. Disallowing these kinds of directories prevents the overload and helps crawlers focus on the SEO-optimized posts you actually want crawled.

4) Optimization of crawl budget:

When Googlebot enters your website it starts to crawl every accessible URL, but it will only crawl your site for a limited amount of time, which depends on criteria such as authority and size. This is known as the crawl budget. Within that time it is best to direct the crawler to the necessary files only; otherwise your important files can be missed because of unnecessary requests. Hence it has a direct effect on your on-page SEO.

5) Better indexing:

Including robots.txt helps your well-written posts get indexed quickly, and you no longer have to worry about random pages being indexed. Let’s say you want “domain/URL 1” to be indexed. Without robots.txt there is a possibility that “domain/label 1/URL 1” gets indexed as well. So it is best to include robots.txt and index only the pages under the directories you actually want.
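As a rough sketch of how this applies to Blogger specifically, where label (tag) pages live under the /search/label/ path, a rule set like the one below keeps those duplicate label URLs out of the crawl while leaving the post URLs themselves open:

User-agent: *
Disallow: /search
Allow: /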

Limitations of robots.txt

Robots.txt has a few limitations too. For example:

1) Standard bots like Googlebot always obey robots.txt, whereas some bots do not care about it and crawl in their own way.

2) Some bots support special pattern-matching characters in the rules while others do not, so a rule may not be interpreted the way you intended (see the sketch below).
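For instance, Googlebot understands the wildcard characters * (any sequence of characters) and $ (end of the URL), but crawlers that do not support them may read such a rule differently or simply ignore it. A small sketch:

User-agent: Googlebot
# Block any URL that ends in .pdf; the "$" anchors the match to the end of the URL
Disallow: /*.pdf$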

Even if you have not included robots.txt, search engines will still crawl your site. However, including robots.txt is a wise step for crawling efficiency on your website.

Examples of robots.txt

Lots of examples could be given for robots.txt. Here you will find two.

1) Simple robots.txt example:

User-agent: *

Disallow: /search

Allow: /

Sitemap: https://www.bloggerguidepro.com/atom.xml?redirect=false&start-index=1&max-results=500

2) A relatively larger robots.txt example:

# Updated on Mon, 1 Aug 2022 09:30:35 GMT

Sitemap: https://www.domain or subdomain name/sitemap.xml

#instruction for Googlebot

User-agent: Googlebot

Disallow: /search

Disallow: /offline

Disallow: /information

Disallow: /online

Allow: /offline/employee

Allow: /information/selling

#instruction for Yahoo bot

User-agent: Slurp

Disallow: /search

Allow: /

# Block Semrush Bot

User-agent: SemrushBot

Disallow: /

#instruction for other bots

User-agent: *

Disallow: /search

Disallow: /category

Disallow: /tag

Allow: /

Understanding each term

Here you will find a clear explanation of each term, so that you can use robots.txt anywhere without hesitation. Let’s satisfy your curiosity.

We are going to explain all the elements used in the demo robots.txt below. It was generated with Labnol’s tool and is the one used on my own blog. We will discuss how to generate a custom robots.txt for Blogger after covering all the terms.

# Blogger Sitemap created on Mon, 1 Aug 2022 09:30:35 GMT

# Sitemap built with https://www.labnol.org/blogger/sitemap

User-agent: *

Disallow: /search

Disallow: /category/

Disallow: /tag/

Allow: /

Sitemap: https://www.bloggerguidepro.com/atom.xml?redirect=false&start-index=1&max-results=500

Let’s go through these one by one.

1) Using # for comments:

Any type of comment can be placed using a hash (#). Comments help readers understand what a specific rule does, and they are especially useful for recording when the file was last updated. You can see two comments used here: one notes when the file was updated and the other notes the tool it was created with.

2) User-agent:

To understand this term, simply break it down: it names the agent, i.e., the bot, that the following rules are meant for. That can be Googlebot, the Yahoo bot, or any number of other bots. You address whichever bots you want according to your own criteria. Here are a few examples:

“User-agent: Googlebot”

It means you are setting instructions for Googlebot.

“User-agent: Googlebot

User-agent: Bingbot”

This addresses two bots with the same group of rules. You can also write separate groups of instructions for many bots, as in the sketch below.
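A minimal sketch of separate rule groups, one per bot (the /private/ path is only a hypothetical placeholder):

# Rules for Googlebot only
User-agent: Googlebot
Disallow: /search
Allow: /

# Different rules for Bingbot
User-agent: Bingbot
Disallow: /private/
Allow: /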

Some notable user agents:

Here are a few notable user agents which can be useful for you.

Google – Googlebot

Bing – Bingbot

Yahoo – Slurp

DuckDuckGo – DuckDuckBot

Yandex – YandexBot

Baidu – Baiduspider

Google Images – Googlebot-Image

Bing Images and Videos – MSNBot-Media

Google Videos – Googlebot-Video

Google News – Googlebot-News

Semrush – SemrushBot

Moz – Rogerbot

Ahrefs – AhrefsSiteAudit

3) Use of asterisk (*):

When we want the instructions to apply to all bots, we mark this with an asterisk (*). It is very commonly used. A simple example is given below.

User-agent: *

Disallow: /search

Allow: /

This example tells all bots that they can crawl the whole website except URLs under the search path.

4) Disallow:

This directive prevents crawling of a specific page or path. We already know why that is important. Sometimes it is also used to block a crawler from the entire site. Here is an example that blocks Moz’s bot:

# Block Moz for crawling

User-agent: Rogerbot

Disallow: /

Disallow is mainly used to prevent crawling of specific pages or query URLs. It makes crawling more efficient by making clear which pages may be crawled and which may not.

5) Use of the leading slash (/):

It refers to the root directory, the folder at the topmost position in the hierarchy. One example of a root directory is https://www.bloggerguidepro.com/. When you use only a slash (/), the instruction applies to all files on your website. Here is an example:

#Block Ahrefs from crawling

User-agent: AhrefsSiteAudit

Disallow: /

This example says no file can be crawled by the Ahrefs bot.

6) Root directory and subdirectory:

The top-level directory is known as the root directory; it can be considered “layer 0”. The next directory down is a subdirectory, which can be called “layer 1”, and so on. Look at the examples below for a clear understanding.

1) Root directory (layer 0): https://www.bloggerguidepro.com/

2) Subdirectory (layer 1): https://www.bloggerguidepro.com/information/

3) Layer 2: https://www.bloggerguidepro.com/information/employee/

Similarly, below are layer 0, layer 1, and layer 2 examples for the robots.txt file.

Disallow: /

Disallow: /information/

Disallow: /information/employee

The third Disallow (layer 2) prevents crawling of the employee pages inside the information subdirectory of the website.
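One detail worth knowing, assuming Google-style prefix matching: a rule without a trailing slash matches every path that merely starts with that text, while a rule with a trailing slash matches only URLs inside that folder.

# Blocks /information, /information/, and also /information-update
Disallow: /information

# Blocks only URLs under the folder, e.g. /information/employee
Disallow: /information/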

7) Disallowing search:

This is used by most websites. Search can generate any number of URLs: one is produced every time somebody searches a query in the search box on your website. Below is one example:

https://www.bloggerguidepro.com/search?q=Types+of+blogs&m=1

So disallowing the search path prevents huge numbers of requests for unnecessary files and makes proper use of the crawl budget. An example:

User-agent: *

Disallow: /search

Allow: /

8) Disallowing category:

It works just like disallowing search: it prevents crawling of any URL that starts with the category directory or path. You can still allow a specific path underneath it.

User-agent: *

Disallow: /search

Disallow: /category/

Allow: /category/SEO

Allow: /

Here only queries in the SEO category are allowed to be crawled. It is written as an example only; there is no need to use it without a specific reason.

9) Disallowing tags:

It acts just like categories: any URL starting with the tag path is prevented from being crawled. In a custom robots.txt for Blogger the same idea is used to prevent label pages from being crawled. Below is an example.

User-agent: *

Disallow: /search

Disallow: /category/

Disallow: /tags/

Allow: /

10) Allow:

Allow permits crawling. You can allow several paths by placing one directive on each line. You can see this in the example below.

User-agent: Googlebot-Image

Disallow: /

Allow: /images

Allow: /blog/images

Allow: /gallery/images

Note: when rules conflict, the more specific (longer) rule wins. Here the Allow rules are more specific than the blanket Disallow, which is why those paths remain crawlable.
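To see how this plays out for Googlebot, the sketch below marks which of two hypothetical image URLs would be crawlable under a blanket Disallow plus a more specific Allow:

User-agent: Googlebot-Image
Disallow: /
Allow: /blog/images
# /logo.png          -> blocked: only "Disallow: /" matches
# /blog/images/a.png -> allowed: "Allow: /blog/images" is the longer, more specific match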

11) Sitemap:

A sitemap provides information about the pages on your site. It can be an XML sitemap (large) or an RSS/Atom feed (small); Google recommends using both.


Two examples are provided below:

i) XML sitemap:

https://www.domain name/sitemap.xml

ii) atom sitemap:

https://www.domain name/atom.xml?redirect=false&start-index=1&max-results=500
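Note that a robots.txt file may list more than one Sitemap line, so you can point crawlers at both at once (the example.com domain below is just a placeholder):

Sitemap: https://www.example.com/sitemap.xml
Sitemap: https://www.example.com/atom.xml?redirect=false&start-index=1&max-results=500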

Now you know about each element and are ready to generate a custom robots.txt for Blogger.

Some robots.txt best practices

Here are some best practices to keep in mind.

1) Set up robots.txt on your root domain. For this website, “https://www.bloggerguidepro.com/robots.txt” is the right location and “https://www.bloggerguidepro.com/blog/robots.txt” is not, because it is not at the root.

2) The file is case-sensitive. If you write SEO, it does not cover Seo or seo; it will only match “SEO” (see the example after this list).

3) Specify each user agent separately where it is needed.

4) Do not use robots.txt to handle duplicate content or to keep pages out of the index. Use a canonical tag or a meta robots noindex tag (for example, <meta name="robots" content="noindex">) instead.

5) When rules in robots.txt conflict, the most specific one wins, so use specific Disallow or Allow directives wherever they are required.
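To illustrate point 2, here is a small sketch with a hypothetical path:

# Blocks /SEO/on-page-guide but NOT /seo/on-page-guide or /Seo/on-page-guide
Disallow: /SEO/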

Custom robots.txt for Blogger

After creating a Blogger account, robots.txt must be set up to get the most benefit. Here you will generate a robots.txt for Blogger and set it up properly, with each step shown in detail. Follow these steps to set up your custom robots.txt for Blogger.

How to generate robots.txt

There are many ways to generate it. It can be written in Notepad or even directly in Blogger. Here, however, you will create it with a generator so there is nothing to worry about. Some popular robots.txt generators are:

i) Labnol

ii) TechWelkin

iii) Oflox

There are many websites for generating robots.txt, but not all of them follow Google’s standards. Labnol is used here so that you do not face any issues while setting up your custom robots.txt for Blogger. Follow this step-by-step process.

1) Go to https://www.labnol.org/blogger/sitemap/ to generate a sitemap for Blogger.

2) Enter your site name in the Blog URL box.

For my website I entered https://www.bloggerguidepro.com

If you have not bought a custom domain yet, you can enter

https://yoursitename.blogspot.com


3) Click on the generate XML sitemap button. After a few seconds, your sitemap text will be shown to you.

4) Copy the given text.

5) Open the Blogger account where you want to set up the custom robots.txt.

6) Click on the three-bar menu icon in the top-left corner.


7) Click on settings.


8) Find the “Crawlers and indexing” section.


9) Enable “Custom robots.txt” and enable “Custom robots header tags”.

10) Click on “Custom robots.txt”.

11) Paste the robots.txt text that you copied from Labnol.

12) Find the following lines:

User-agent: *

Disallow: /search


You have to add two lines after the Disallow: /search line. It is good practice to disallow categories and tags on Blogger. Add these two lines:

Disallow: /category/

Disallow: /tag/

After setting this up, make sure your text looks like the example below.


# Blogger Sitemap created on Wed, 14 Sep 2022 07:14:29 GMT

# Sitemap built with https://www.labnol.org/blogger/sitemap

User-agent: *

Disallow: /search

Disallow: /category/

Disallow: /tag/

Allow: /

Sitemap: https://www.bloggerguidepro.com/atom.xml?redirect=false&start-index=1&max-results=500

13) Click on Save.

Now you have to set up the custom robots header tags.

14) Custom robots header tags: you have already enabled them. Now you have to set up three things.

15) Home page tags: click on “Home page tags”, enable the “all” and “noodp” options, and then click Save.


16) Archive and search page tags: click on “Archive and search page tags”, enable the “noindex” and “noodp” options, and then click Save.


17) Post and page tags: click on “Post and page tags”, enable the “all” and “noodp” options, and then click Save.


You have now completed the whole custom robots.txt setup for Blogger. You should not see indexing issues caused by it, and you can verify this yourself for extra clarity.

How can I check/find my robots.txt file on a website?

Checking the robots.txt file of a website is very simple: take the site’s URL, add “/robots.txt” to it, and open it in your browser. For my website that is https://www.bloggerguidepro.com/robots.txt. The result is shown below.


You now have clarity about many areas of robots.txt. However, a few unfamiliar messages can still cause worry, so here are some common issues and terms along with their possible solutions.

Blocked by robots.txt

This is a status you may see in Google Search Console. There are two possibilities.

i) You successfully blocked unnecessary files from being crawled. In that case “Blocked by robots.txt” reflects exactly the outcome you wanted, and you do not have to do anything.

ii) An important page that should be crawled and indexed is blocked. In that case, check your robots.txt file and change the relevant rule from Disallow to Allow. For example, if a business wants “shoe” pages within the “store” directory to be crawled, an “Allow: /store/shoe/” line can be added to solve it:

User-agent: *

Disallow: /store

Allow: /store/shoe/

Allow: /

Submitted URL blocked by robots.txt

This means that at least one URL in your submitted sitemap is blocked by robots.txt. Check the following four things:

1) The sitemap does not contain any “noindex” pages.

2) It does not contain pages whose canonical points to a different URL.

3) It does not contain redirected pages.

4) Your robots.txt file is structured properly to allow these URLs.

Readjust your pages and robots.txt accordingly. You also have the option of finding the affected paths with the robots.txt tester.

Indexed, though blocked by robots.txt

This means Google has indexed a page even though your robots.txt blocks it, for example because the page was crawled before you added the Disallow directive or because it is linked from elsewhere. There are two solutions:

1) You disallowed it by mistake and actually want it indexed. In that case, open your robots.txt file and remove the Disallow directive for that specific page or path.

2) You do not want the page to stay indexed. Google says robots.txt is not the right tool for that. Remove the Disallow directive for the page so Google can crawl it, and then apply a “meta robots noindex” tag to prevent it from being indexed.

Wrapping up custom robots.txt for Blogger

A lot has been covered here to bring clarity about robots.txt. We started with what robots.txt is and why it is important, and looked at some examples. After that, every element was explained in detail and some best practices were considered. We generated a custom robots.txt for Blogger and completed the full setup together with the custom robots header tags. A simple way to check the file was shown, and finally some common issues and their possible solutions were discussed.

Robots.txt builds an excellent relationship with bots and brings the maximum benefit if it is set up properly; the other way around, a wrong directive in the robots.txt file can create several issues. So set up your robots.txt correctly and enjoy the best crawling by the different bots. If you face any issue, put it in my mailbox, and if you want to add any value, the whole comment box is waiting for you.
