Most importantly, I have estimated downstream traffic only, i.e., I do not seek to estimate Google's internal traffic between datacenters.
Google Search (text only) - around 200 Gb/s of downstream capacity required. We can estimate Google Search's average downstream traffic as: [downstream traffic/month for Google Search] = [number of Google Search requests / month] * [downstream data for a Google Search request]
Then, we will derive the amount of bandwidth that Google needs to provision for the Google Search service through the following equation: [downstream capacity required for Google Search] = [downstream traffic/month for Google Search] / [number of seconds in a month] * [overprovisioning ratio]
Let's estimate the value of our variables, starting with [downstream data for a Google Search request]. I did a quick test with Firebug.
- Google Search empty homepage size = 455 kBytes
- Homepage size with results = 55 kBytes
- So, based on the measurements above, let's assume that the average downstream data triggered by a Google Search request is about 500 kBytes.
Putting the pieces together, that makes: [downstream traffic/month for Google Search] = 13bn/month * 500 kBytes = 13*10^9 * 5*10^5 Bytes/month = 6.5*10^15 Bytes/month, i.e., about 5*10^16 bits/month of traffic for the US.
Google needs to dimension its pipes to accommodate the peak traffic, not just the average traffic we have calculated. So, let's assume that Google provisions four times the average bandwidth it needs for delivering Google Search results to end-users ([overprovisioning ratio] = 4). This leads us to: [downstream capacity required for Google Search] = 4 * 5*10^16 / (30*24*3600) ~= 80 Gbit/s for the US.
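The whole US back-of-envelope calculation above can be reproduced in a few lines. This is only a sketch of the post's own arithmetic; every input (13bn requests/month, 500 kBytes/request, overprovisioning ratio of 4) is an assumption taken from the text, not a measured value.

```python
# Back-of-envelope check of the US downstream capacity estimate.
# All inputs are the post's assumptions, not measured values.

REQUESTS_PER_MONTH = 13e9         # Google Search requests/month in the US
BYTES_PER_REQUEST = 500e3         # ~500 kBytes downstream per request
OVERPROVISIONING = 4              # peak-to-average provisioning ratio
SECONDS_PER_MONTH = 30 * 24 * 3600

# Monthly downstream traffic, converted to bits (8 bits per Byte).
traffic_bits_per_month = REQUESTS_PER_MONTH * BYTES_PER_REQUEST * 8

# Capacity to provision = overprovisioned average bit rate.
capacity_bps = OVERPROVISIONING * traffic_bits_per_month / SECONDS_PER_MONTH

print(f"Traffic: {traffic_bits_per_month:.1e} bits/month")  # ~5.2e16
print(f"Capacity: {capacity_bps / 1e9:.0f} Gb/s")           # ~80 Gb/s
```

Note that the text rounds 5.2*10^16 bits/month down to ~5*10^16, which is why the script's 80 Gb/s matches the post's figure.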
Now, let's consider that the US represents 45% of the requests, because it accounts for 45% of Google's revenues (source: Google's 2013 annual report). This brings us to a final figure of:
[downstream capacity required for Google Search] = 80 Gb/s / 45% ~= 180 Gb/s. To conclude, we see that about 180 Gb/s of downstream capacity is required for delivering Google Search traffic worldwide. That's smaller than I expected! It is not fully consistent with the peering capacity that Google deploys worldwide: indeed, Google deploys 400 Gb/s of connectivity at its largest interconnection points (see the post about Google's interconnection points).
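The worldwide scaling step is a single division, assuming (as the post does) that the US share of requests tracks its ~45% share of revenue:

```python
# Scale the US capacity estimate to a worldwide figure.
# The 45% share is the post's assumption (US revenue share, 2013).
US_CAPACITY_GBPS = 80
US_SHARE_OF_REQUESTS = 0.45

worldwide_gbps = US_CAPACITY_GBPS / US_SHARE_OF_REQUESTS
print(f"Worldwide: {worldwide_gbps:.0f} Gb/s")  # ~178 Gb/s, rounded to 180
```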
How can we explain the gap?
- The simple arithmetic above accounts only for the delivery of the text search page (not picture search). But, behind this page, there is a lot of computation and traffic exchanged among the datacenters for synchronization, load balancing, etc. In a research paper, Google explains that "On average, a single query on Google reads hundreds of megabytes of data and consumes tens of billions of CPU cycles." But this amount of data flows through the private point-to-point optical links interconnecting Google's datacenters, so it is not visible in our estimate of downstream connectivity to end-users. To understand this, I advise you to read the following Google research paper: "What Devices do Data Centers Need?".
- YouTube represents a much larger share of the downstream capacity needs (cf. my other post estimating the downstream traffic induced by YouTube).