Exploring Google Books

Google Books and its associated ngram indices represent one of the largest publicly available databases in the world. At last count, Google had scanned and indexed over 25 million books containing over 1,000,000,000,000 terms (ngrams, that is, sequences of one or more consecutive words), roughly comparable to all of the text on all of the pages of the internet. The database is impressive not only for its scale and availability but also for the wealth of knowledge it holds about our culture over the last 200 years.
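To make the term "ngram" concrete, here is a minimal Python sketch that splits text into ngrams and counts them. It assumes naive lowercase whitespace tokenization; Google's actual tokenizer handles punctuation, hyphenation and scanning errors in ways this sketch does not, and the function names are illustrative only.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(text, n):
    # Naive tokenization (assumption): lowercase and split on whitespace.
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))

print(ngrams("the horse and buggy".split(), 2))
# [('the', 'horse'), ('horse', 'and'), ('and', 'buggy')]
```

With n=1 the same function yields single words (unigrams); the Google Books ngram indices are built from counts like these, computed per year of publication.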

Here are some examples of the insights you can gain with Google Books. These graphs show the relative frequency in printed material of the specified words and phrases by year, and to a good approximation they reflect what people were thinking (and writing) about during this time.

Political ideologies:


Modes of transportation:

Family roles:

Many more examples are here.
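"Relative frequency" in graphs like these means the count of a phrase in a given year divided by the total number of ngrams of the same length printed that year, so that years with more published books do not automatically look more interesting. The sketch below assumes per-year counts are already available as dictionaries; the numbers are hypothetical and are not real Google Books data.

```python
def relative_frequency(match_counts, total_counts):
    """Share of all ngrams printed in each year that match the phrase.

    match_counts: {year: occurrences of the phrase}
    total_counts: {year: total ngrams of the same length in the corpus}
    """
    return {
        year: match_counts.get(year, 0) / total
        for year, total in sorted(total_counts.items())
        if total > 0  # skip years with no data to avoid dividing by zero
    }

# Hypothetical counts, for illustration only.
matches = {1900: 120, 1950: 45}
totals = {1900: 1_000_000, 1950: 1_500_000, 2000: 2_000_000}
print(relative_frequency(matches, totals))
# {1900: 0.00012, 1950: 3e-05, 2000: 0.0}
```

Normalizing this way is what lets a curve fall even when the raw count of a word rises, because the rest of the printed corpus grew faster.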

Working with data sets this large required Google to pioneer new concepts in highly scalable parallel data processing, such as MapReduce, a programming model whose best-known open-source implementation is Apache Hadoop. These techniques allowed Google to break the massive problem of indexing this vast database into manageable chunks that many machines could process in parallel. These systems and techniques are now used by many companies for big data problems such as customer analytics and machine learning.
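The MapReduce idea can be illustrated with a toy, single-process Python sketch that counts bigrams across several books. A map step emits (key, 1) pairs for each document independently, a shuffle step groups the pairs by key, and a reduce step sums each group. The function names here are hypothetical; this is not Hadoop's API or Google's actual pipeline, only the shape of the pattern.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Mapper: emit (bigram, 1) for every bigram in one document."""
    tokens = document.lower().split()
    return [((tokens[i], tokens[i + 1]), 1) for i in range(len(tokens) - 1)]

def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reducer: sum the partial counts for one key."""
    return key, sum(values)

def map_reduce(documents):
    # On a real cluster each map_phase and reduce_phase call could run on a
    # different machine; here they run sequentially in one process.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())

books = ["the horse and buggy", "the horse and carriage"]
counts = map_reduce(books)
print(counts[("the", "horse")])  # 2
```

Because each mapper only looks at its own document and each reducer only sees one key, neither step needs shared state, which is what makes it practical to spread the work over many machines.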


Brian Conte founded Fast Track with over 15 years of entrepreneurial experience and technology expertise. Brian managed the development of Microsoft's first browser in 1985 and later founded hDC, the first Windows software company. Brian ran hDC, later named Express Systems, for 10 years before selling it to WRQ in 1996, where he remained as CTO. Brian spearheaded the development of one of WRQ's most successful products, Express 2000, which generated more than $10 million in its first year. Brian holds a BSE in Electrical Engineering and Computer Science from Princeton University.
Posted by Brian Conte Tuesday, October 18, 2016 1:31:00 AM Categories: B2B big data custom development enterprise technology web development