In case you were wondering how much information Google stores, the BigTable paper I mentioned last week gives some interesting insights.
The Google search crawler uses 850 TB of data (1 TB = 1024 GB); that's the amount of raw information gathered from the web. Google Analytics uses 220 TB, stored in two tables: 200 TB for the raw data and 20 TB for the summaries.
Google Earth uses 70.5 TB: 70 TB for the raw imagery and 500 GB for the index data. The second table "is relatively small (500 GB), but it must serve tens of thousands of queries per second per datacenter with low latency".
Personalized Search doesn't need much data: only 4 TB. "Personalized Search stores each user's data in Bigtable. Each user has a unique userid and is assigned a row named by that userid. All user actions are stored in a table."
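To make that row-per-user layout concrete, here's a toy sketch in Python: the table is modeled as a plain dictionary, and the column labels and actions are my own illustration, not from the paper.

```python
# Toy model of the row-per-user layout quoted above: one table,
# one row per userid, one column per recorded action.
# Column names and example actions are illustrative, not from the paper.
table = {}  # row key (userid) -> {column name: value}

def record_action(userid: str, timestamp: int, action: str) -> None:
    """Append a user action to the row named by that userid."""
    row = table.setdefault(userid, {})
    row[f"action:{timestamp}"] = action

record_action("user42", 1159800000, "query: bigtable paper")
record_action("user42", 1159800060, "click: labs.google.com/papers")
print(table["user42"])
```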
Google Base uses 2 TB of data, and Orkut 9 TB.
If we take into account that most of this information is stored compressed (for example, the main crawl table has a compression ratio of 11%, so 800 TB become 88 TB), all the services mentioned above add up to roughly 220 TB on disk. It's also interesting to note that the size of the raw imagery in Google Earth is comparable to the size of the compressed web pages crawled by Google.
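If you want to check the math, here's a quick back-of-the-envelope script. The per-table sizes and compression ratios below are my reading of Table 2 in the paper; a dash there (None here) means the table isn't compressed.

```python
# Back-of-the-envelope check of the ~220 TB figure, using the sizes and
# compression ratios from Table 2 of the Bigtable paper (my reading).
# A ratio of None means the data is stored uncompressed.
tables = [
    ("Crawl",               800.0, 0.11),
    ("Crawl",                50.0, 0.33),
    ("Google Analytics",    200.0, 0.14),
    ("Google Analytics",     20.0, 0.29),
    ("Google Base",           2.0, 0.31),
    ("Google Earth",         70.0, None),  # raw imagery, not compressed
    ("Google Earth",          0.5, 0.64),
    ("Orkut",                 9.0, None),
    ("Personalized Search",   4.0, 0.47),
]

raw = sum(size for _, size, _ in tables)
compressed = sum(size * (ratio if ratio is not None else 1.0)
                 for _, size, ratio in tables)
print(f"raw: {raw:.1f} TB, compressed: {compressed:.1f} TB")
# raw: 1155.5 TB, compressed: 220.1 TB
```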