About once a month, Google updates its index by recalculating the PageRank of every web page it has crawled. The update period, now widely known as the "Google Dance," has become a monthly online event. During it, webmasters watch closely as Google's new index comes online and either celebrate or despair over their new rankings.
During the Google Dance, new URLs (the "addresses" of web pages) make their appearance, old ones are removed, perhaps to reappear later, and others shift around in their ranking positions in the Google index, which decides how high up on the results pages they appear for particular search queries.
The update does not proceed as a switch from one index to another at a single point in time. In fact, the index update takes several days to complete. During this period, the old and the new index alternate on www.google.com.
Some Technical Background
Google's search engine runs on more than 10,000 servers that operate on Linux. The computers are divided among a large number of data centers, and each data center has its own IP address. An index update therefore cannot take place on all those servers at the same time; one server after another has to be updated with the new index.
When you type the Google address into a browser, the DNS server of your service provider translates the address to an IP address. That server is permitted to keep Google's IP address for only five minutes, so every five minutes it has to look up Google's IP address again.
Each time a DNS server looks up Google's IP address, Google may return the IP address of a different data center, so searches made at different times can reach different data centers. This way Google spreads the required search load over many computers.
Google has two other searchable servers apart from www.google.com. They are www2.google.com and www3.google.com. Most of the time, the results on all 3 servers are the same, but during the dance, they are different.
The rankings that can be seen on www2 and www3 are the new rankings that will transfer to www when the dance is over. During the dance, the results from www2 and www3 will sometimes show on the www server, but only briefly. Also, new results on www2 and www3 can disappear for short periods. At the end of the dance, the results on www will match those on www2 and www3.
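The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name dance_in_progress and the result lists are made up for the example, and no request is sent to Google. It simply checks whether the three servers report the same ranking for one query.

```python
def dance_in_progress(results_by_host):
    """Return True if the hosts do not all report the same ranking."""
    # Turn each ranking into a tuple so that rankings can be put in a set.
    rankings = {tuple(urls) for urls in results_by_host.values()}
    return len(rankings) > 1


# Hypothetical, hand-made result lists for a single search query.
sample = {
    "www.google.com": ["a.example", "b.example", "c.example"],
    "www2.google.com": ["b.example", "a.example", "c.example"],
    "www3.google.com": ["b.example", "a.example", "c.example"],
}

print(dance_in_progress(sample))  # True: www differs from www2 and www3
```

When all three lists are identical the function returns False, which corresponds to the situation outside the dance or after it has finished.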
Google Dance and DNS
Data transfers on the Internet always take place between IP addresses. The mapping from domain names to IP addresses is provided by the name servers of the DNS. When we enter a domain into a browser, a locally configured name server gets the IP address for that domain by contacting the name server that is responsible for that domain. The result is cached by the local name server, so that it is not necessary to contact the responsible name server each time a connection to the domain is set up.
The records for a domain at the responsible name server specify how long a caching name server may cache them. This is the Time To Live (TTL) of the record. As soon as the TTL expires, the caching name server has to fetch the record again from the responsible name server. Quite often, the TTL is set to one or more days. In contrast, the TTL of www.google.com is only five minutes. So a name server may cache Google's IP address for only five minutes and then has to look it up again.
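The caching behaviour can be illustrated with a small Python sketch. It is a simplified model, not a real DNS implementation: the class CachingResolver, the authoritative function and the fake clock are all invented for the example, and the returned IP address is just one entry from the data center list in this article.

```python
import time


class CachingResolver:
    """Toy caching name server that honours the TTL of each record."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup  # function(name) -> (ip, ttl_in_seconds)
        self._clock = clock    # injectable so the demo can fake time
        self._cache = {}       # name -> (ip, expires_at)

    def resolve(self, name):
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]  # record is still fresh: answer from cache
        ip, ttl = self._lookup(name)  # expired or unknown: ask upstream
        self._cache[name] = (ip, now + ttl)
        return ip


# Demo with a fake clock and a stand-in for the responsible name server.
fake_now = [0.0]
upstream_calls = []


def authoritative(name):
    upstream_calls.append(name)
    return "216.239.33.100", 300  # TTL of five minutes


resolver = CachingResolver(authoritative, clock=lambda: fake_now[0])
resolver.resolve("www.google.com")  # first lookup goes upstream
fake_now[0] = 299.0
resolver.resolve("www.google.com")  # still cached
fake_now[0] = 300.0
resolver.resolve("www.google.com")  # TTL expired, goes upstream again
print(len(upstream_calls))  # 2
```

With a TTL of one day instead of 300 seconds, the same three calls would reach the responsible name server only once.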
Whenever Google's name server is contacted, it sends back the IP address of only one data center. In this way, Google queries are directed to different data centers by changing DNS records. On the one hand, the DNS records may be based on the load of the individual data centers; Google would then be using the DNS for a simple form of load balancing. On the other hand, the geographical location of a caching name server may influence how often it receives each data center's IP address, so that the distance for data transmissions can be reduced.
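How Google actually chooses the data center for each answer is not known here; the text only suggests load and geography as possible factors. As a rough sketch, the following Python models the simplest possible policy, a plain round robin, which is an assumption made for illustration. The class RotatingNameServer is hypothetical.

```python
import itertools

# A few of the data center addresses listed in this article.
DATA_CENTER_IPS = ["216.239.33.100", "216.239.35.100", "216.239.37.100"]


class RotatingNameServer:
    """Toy name server that answers each query with the next data center."""

    def __init__(self, ips, ttl=300):
        self._next_ip = itertools.cycle(ips)  # assumed policy: round robin
        self.ttl = ttl                        # five minutes, as for www.google.com

    def answer(self, name):
        # Only one IP address is returned per query, together with its TTL.
        return next(self._next_ip), self.ttl


name_server = RotatingNameServer(DATA_CENTER_IPS)
answers = [name_server.answer("www.google.com")[0] for _ in range(4)]
print(answers)
# ['216.239.33.100', '216.239.35.100', '216.239.37.100', '216.239.33.100']
```

A load-based or location-based policy would replace the itertools.cycle call with a choice that depends on the state of the data centers or on the address of the caching name server.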
Some of the IP Addresses and Domains of Google's Data Centers:
Google has domains that resolve to the IP addresses of the individual data centers. These domains and their IP addresses are shown in the following list.
Data centers
216.239.33.100 :: www-ex.google.com
216.239.35.100 :: www-sj.google.com
216.239.37.100 :: www-va.google.com
216.239.39.100 :: www-dc.google.com
216.239.41.100 :: www-fi.google.com
216.239.51.100 :: www-ab.google.com
216.239.53.100 :: www-in.google.com
216.239.55.100 :: www-zu.google.com
216.239.57.100 :: www-cw.google.com
216.239.59.100 :: www-gv.google.com
66.102.11.100 :: www-kr.google.com
66.102.7.100 :: www-mc.google.com
For every domain www-xx.google.com, there is an additional domain www-xx2.google.com. The IP address of such a domain ends in .101 instead of .100. Each such pair of domains and IP addresses belongs to the same data center and, hence, queries on either one search the same index.
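The naming rule can be written down as a small Python helper. The function name secondary_of is made up for this sketch; it only applies the pattern described above and does not check whether the resulting domain really exists.

```python
def secondary_of(domain, ip):
    """Derive the www-xx2 domain and .101 address from a www-xx / .100 pair."""
    host, rest = domain.split(".", 1)      # "www-ex", "google.com"
    prefix, last_octet = ip.rsplit(".", 1)  # "216.239.33", "100"
    if last_octet != "100":
        raise ValueError("expected an address ending in .100")
    return f"{host}2.{rest}", f"{prefix}.101"


print(secondary_of("www-ex.google.com", "216.239.33.100"))
# ('www-ex2.google.com', '216.239.33.101')
```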
Querying the data centers
To query the data centers, the Google Toolbar must be installed with its PageRank indicator switched on. Every time a page is received by the browser, the Toolbar requests its PageRank from one of Google's data centers. The information is returned as a one-line text file and stored in the Temporary Internet Files folder.
The Toolbar's request URL includes the URL of the page that it wants the PageRank for (the target page), and a checksum that matches that URL.
A typical Toolbar request URL (a "fat URL"), split across lines here for readability:
http://216.239.33.102/search
?client=navclient-auto
&ch=5150615727
&features=Rank:FVN
&q=info:http%3A%2F%2Fwww%2Eexampledomain%2Ecom%2F
If you copy and paste that fat URL into your browser, you will get Google's "forbidden" page back. That's because the target page and checksum don't match - it's just an example of the request URL.
Notice that the target page is in escaped format - some of the characters are represented by hexadecimal codes (e.g. %2F).
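To make the escaped format concrete, the example request can be taken apart with Python's standard urllib.parse module. The helper toolbar_escape is an assumption made for this sketch: it reproduces the escaping seen in the example (including %2E for the dot), but whether the Toolbar escapes other characters in the same way is not known here. Nothing is sent to Google.

```python
from urllib.parse import parse_qs, quote, urlsplit

# The example request from above, joined into a single string.
FAT_URL = (
    "http://216.239.33.102/search"
    "?client=navclient-auto"
    "&ch=5150615727"
    "&features=Rank:FVN"
    "&q=info:http%3A%2F%2Fwww%2Eexampledomain%2Ecom%2F"
)

parts = urlsplit(FAT_URL)
params = parse_qs(parts.query)  # parse_qs also decodes the %XX codes

query = params["q"][0]
target = query[len("info:"):] if query.startswith("info:") else query
print(target)           # http://www.exampledomain.com/
print(params["ch"][0])  # 5150615727 (the checksum)


def toolbar_escape(url):
    """Escape a URL the way the example does (assumed, not confirmed)."""
    # quote() leaves "." alone, so the dots are replaced in a second step.
    return quote(url, safe="").replace(".", "%2E")


print(toolbar_escape(target))  # http%3A%2F%2Fwww%2Eexampledomain%2Ecom%2F
```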
To get the new PageRank for a particular page, you need to make the same request that the Toolbar makes for it. That is, you need the fat URL that the Toolbar uses, and you need to request the PageRank from all of Google's data centers. The method is a bit long-winded but it works.
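One way to prepare those requests is to take the fat URL captured from the Toolbar and swap in the IP address of each data center. The Python below is a minimal sketch of that step only. It builds the URLs and does not send them, it does not compute the checksum (the fat URL, checksum included, has to come from the Toolbar), and it assumes that the .100 addresses from the list above accept the same request. The function name requests_for_all_data_centers is made up.

```python
from urllib.parse import urlsplit, urlunsplit

# Data center addresses and domains as listed in this article.
DATA_CENTERS = {
    "216.239.33.100": "www-ex.google.com",
    "216.239.35.100": "www-sj.google.com",
    "216.239.37.100": "www-va.google.com",
    "216.239.39.100": "www-dc.google.com",
    "216.239.41.100": "www-fi.google.com",
    "216.239.51.100": "www-ab.google.com",
    "216.239.53.100": "www-in.google.com",
    "216.239.55.100": "www-zu.google.com",
    "216.239.57.100": "www-cw.google.com",
    "216.239.59.100": "www-gv.google.com",
    "66.102.11.100": "www-kr.google.com",
    "66.102.7.100": "www-mc.google.com",
}


def requests_for_all_data_centers(fat_url, data_centers=DATA_CENTERS):
    """Return {domain: request URL}, one per data center, same query each time."""
    parts = urlsplit(fat_url)
    return {
        domain: urlunsplit(parts._replace(netloc=ip))  # only the host changes
        for ip, domain in data_centers.items()
    }


# The example fat URL from this article (its checksum is not a real one).
example = (
    "http://216.239.33.102/search?client=navclient-auto&ch=5150615727"
    "&features=Rank:FVN&q=info:http%3A%2F%2Fwww%2Eexampledomain%2Ecom%2F"
)
urls = requests_for_all_data_centers(example)
print(len(urls))  # 12
print(urls["www-sj.google.com"])
```

Each resulting URL can then be opened in the browser one after the other, and the returned values compared to see which data centers already carry the new index.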