Two words come up again and again in the blogging world: “crawling” and “indexing”.
Search on the web depends on these two processes, and they are among the most important elements of SEO. In the context of Google, they are also called Google crawling and Google indexing.
There are hundreds of millions of websites on the internet, so it is not possible for Google to check every website by hand, post by post, collect its data, and decide whether it is eligible to appear in search results.
So Google built Googlebot to do this work. Googlebot is bot software that Google sends out to move from website to website, collect data from the pages it visits, and report back to Google. The process in which this bot visits different sites is called crawling.
A short definition of crawling is:
“Crawling is a process in which Google's bot software visits websites, discovers new and updated data, and reports back to Google.”
Here “Google” means the Google search engine. Remember: these bots are also called “spiders”.
“Indexing is a process in which the information collected from websites by Googlebot (during crawling) is processed and added to Google's searchable index.”
In simple words, indexing is the process of adding your site's pages to the search engine.
Googlebot's crawl starts from the URLs generated during previous crawls, augmented with the sitemap data provided by webmasters.
When Googlebot visits these sites, it collects the links on each page, that is, the source (SRC) and hypertext reference (HREF) attributes, and it also notes newly created sites, changes to existing sites, and dead links. After collecting this information, the spider reports back to Google so that these SRC, HREF, and other links can be indexed.
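To make the idea of collecting SRC and HREF links concrete, here is a minimal Python sketch. It is only an illustration and not how Googlebot itself is written; the class name LinkCollector, the function collect_links, and the sample page are all made up for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the values of HREF and SRC attributes found on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if value is None:
                continue
            if name == "href":
                self.hrefs.append(value)
            elif name == "src":
                self.srcs.append(value)


def collect_links(html):
    """Return (hrefs, srcs) found in the given HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hrefs, collector.srcs


# Hypothetical page content, used only for illustration.
SAMPLE_PAGE = """
<html><body>
<a href="/about.html">About</a>
<img src="/logo.png" alt="logo">
<a href="https://example.com/post-1">Post</a>
</body></html>
"""

hrefs, srcs = collect_links(SAMPLE_PAGE)
print("HREF links:", hrefs)
print("SRC links:", srcs)
```

A real crawler would also resolve relative links such as /about.html against the page address before visiting them.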
Now let us discuss how Googlebot accesses your site.
How Googlebot Accesses Your Site
Generally, when Googlebot visits your site, it downloads only one copy of each page at a time.
If a network error occurs while Googlebot is downloading a page, the download stops, and Googlebot starts downloading the page again after some time.
So Googlebot normally downloads each page once per visit, but it may appear to download a page multiple times if there was a network error.
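The stop-and-try-again behaviour can be pictured with a small Python sketch. This is an assumption about the general pattern, not Googlebot's real code; download_with_retry, make_flaky_fetch, and the retry settings are hypothetical names chosen for this example, and the "network" is simulated so the snippet runs without an internet connection.

```python
import time


def download_with_retry(fetch, url, max_attempts=3, wait_seconds=0.0):
    """Call fetch(url); if a network error occurs, wait and try again."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except OSError as error:  # Python's network errors derive from OSError
            last_error = error
            if attempt < max_attempts:
                time.sleep(wait_seconds)
    # Every attempt failed, so report the last error to the caller.
    raise last_error


def make_flaky_fetch(failures):
    """Hypothetical stand-in for a download: fails `failures` times, then works."""
    state = {"calls": 0}

    def fetch(url):
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ConnectionError("simulated network error")
        return "<html>copy of " + url + "</html>"

    return fetch, state


fetch, state = make_flaky_fetch(failures=1)
page = download_with_retry(fetch, "https://example.com/")
print(page)
print("download attempts:", state["calls"])
```

From the server's side, the two attempts look like the same page being downloaded twice, which matches what you may see in your logs after a network error.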
An important point to note: Googlebot is designed to scale as the number of web pages grows.
Googlebot is distributed across many machines, and many of its crawlers run on machines located near the sites they crawl. These crawlers collect information and report it back to Google for indexing. Googlebot tries to crawl as many pages of your site as it can in one visit without overwhelming your server's bandwidth. If it cannot finish crawling all pages in one visit for any reason, it can visit again to crawl the remaining pages.
Stop Googlebot from Crawling Your Site
You can also keep Googlebot from crawling your confidential content and links. You have a number of options for blocking Googlebot from accessing these files and from crawling and indexing them in the search engine. Some of these options are given below.
- Use robots.txt
- Block URLs by password-protecting your server directories
- Block search indexing with meta tags
- Opt out of display on Google+ Local, Google Flights, Google Hotels, Google Shopping and Google Advisor. (This option applies only to services hosted on Google.com, not to other Google domains.)
These are some of the options bloggers and webmasters have for blocking Googlebot from crawling a whole site or specific pages.
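The first and third options can be tried out with a short Python sketch. It uses the standard urllib.robotparser module to read a robots.txt file and a small parser to spot a noindex meta tag. The robots.txt rules, the /private/ path, the sample pages, and the helper names googlebot_allowed and has_noindex are all assumptions made up for this example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that blocks Googlebot from one directory.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /tmp/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())


def googlebot_allowed(url):
    """Return True if the rules above let Googlebot fetch this URL."""
    return robots.can_fetch("Googlebot", url)


class RobotsMetaFinder(HTMLParser):
    """Looks for a robots (or googlebot) meta tag whose content has noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = (attributes.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def has_noindex(html):
    """Return True if the page asks search engines not to index it."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.noindex


print(googlebot_allowed("https://example.com/private/report.html"))
print(googlebot_allowed("https://example.com/blog/post.html"))
print(has_noindex('<head><meta name="robots" content="noindex"></head>'))
```

Note the difference between the two: robots.txt asks the bot not to crawl a page at all, while the meta tag sits inside a page and asks that the page be left out of the index.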
Make Your Site Crawlable
Every minute, new websites appear on the web.
There are now hundreds of millions of websites in the world, so Googlebot discovers new and existing sites by following links from page to page.
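Following links from page to page can be sketched in Python as a simple walk over a map of links. This is a simplified picture under stated assumptions, not Googlebot's actual algorithm; the SITE_LINKS map, its page paths, and the discover function are hypothetical.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/about", "/blog"],
    "/about": ["/"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog/post-2"],
    "/blog/post-2": [],
    "/orphan": ["/"],  # no other page links to this one
}


def discover(start, links):
    """Return pages reachable from `start` by following links, in visit order."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append(target)
    return order


found = discover("/", SITE_LINKS)
print(found)
print("/orphan discovered:", "/orphan" in found)
```

In this sketch the /orphan page is never found because no other page links to it. Link-following alone cannot reach such pages, so they have to be made known to the crawler some other way, for example through sitemap data.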
You can make your site's pages crawlable by removing the crawl errors from them. You should also review these crawl errors on a regular basis so you can identify any problems with your site.
For example, crawl errors may be the reason your site is not getting traffic despite building high-quality links and sharing it on all available channels such as social media. If you are running an AJAX application and you want its content to appear in search results, you can do so by making your AJAX-based content crawlable and indexable.