Ngram Viewer
The Google Books Ngram Viewer is a tool for analyzing how often words and phrases occur over time in the Google Books corpus.
Introduction
According to Culturomics (retrieved 12:56, 18 December 2010 (CET)), “The Google Labs N-gram Viewer is the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases ('United States of America') in books over time. You'll be searching through over 5.2 million books: ~4% of all books ever published!”
“We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. "Culturomics" extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.” (Quantitative Analysis of Culture Using Millions of Digitized Books, retrieved 18 December 2010)
“[The] Google Books data set, which is available for download along with the Google Books Ngram Viewer, is a free quantitative tool made available to supplement humanities research worldwide. It is based on the full text of about 5.2 million books, with more than 500 billion words in total. About 72 percent of its text is in English, with smaller amounts in French, Spanish, German, Chinese, and Russian.” (Oh, the humanity, retrieved 14:03, 18 December 2010 (CET))
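For readers who want to work with the downloadable data locally, the following is a minimal sketch of how yearly relative frequencies could be computed. It assumes a tab-separated row layout of ngram, year, match_count, page_count, volume_count; check this against the files you actually download. The sample rows, the per-year totals, and the function names `yearly_counts` and `relative_frequency` are all hypothetical and serve only as illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows, NOT real Google data. Assumed column layout:
# ngram <TAB> year <TAB> match_count <TAB> page_count <TAB> volume_count
SAMPLE_NGRAMS = (
    "banana\t1900\t10\t9\t5\n"
    "banana\t1950\t40\t35\t20\n"
    "apple\t1900\t30\t28\t12\n"
)

# Hypothetical total word counts per year, used to normalise raw counts.
SAMPLE_TOTALS = {1900: 1000, 1950: 2000}


def yearly_counts(lines, ngram):
    """Sum match_count per year for one n-gram from tab-separated rows."""
    counts = defaultdict(int)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip malformed rows and rows for other n-grams.
        if len(row) < 3 or row[0] != ngram:
            continue
        counts[int(row[1])] += int(row[2])
    return dict(counts)


def relative_frequency(counts, totals):
    """Divide each year's count by that year's total word count."""
    return {
        year: n / totals[year]
        for year, n in sorted(counts.items())
        if totals.get(year)  # ignore years with a missing or zero total
    }


counts = yearly_counts(io.StringIO(SAMPLE_NGRAMS), "banana")
freqs = relative_frequency(counts, SAMPLE_TOTALS)
for year, freq in freqs.items():
    print(year, f"{freq:.4%}")
```

For a real file, the `io.StringIO` object could be replaced by an open file handle. Normalising by the yearly total matters because the number of books per year varies widely, so raw counts alone would mostly reflect corpus size.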
See also: Visualization
Links
- Tools and data
- Ngrams Viewer (Google's tool)
- Data Sets: CSV files for local data processing.
- Official and semi-official Google blogs
- Inside Google Books (informal Google blog), e.g.
- find-out-whats-in-word-or-five, Inside Google Books (December 16, 2010, cross-posted to the Google blog)
- The Research Blog (Harvard, ...)
- Blog articles
- A-users-guide-to-culturomics
- The cultural genome: Google Books reveals traces of fame, censorship and changing languages, Discover Magazine, December 16, 2010.
- Oh, the humanity: Harvard, Google researchers use digitized books as a ‘cultural genome’
- Alternative online corpora
- The Corpus of Historical American English (COHA), by Mark Davies, Brigham Young University. A smaller corpus restricted to American English, but with better search capabilities.
References
- Michel, Jean-Baptiste; Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden (2010). Quantitative Analysis of Culture Using Millions of Digitized Books. Science. DOI:10.1126/science.1199644
- ScienceMag online version (12/16/2010), DOI:10.1126/science.1199644: Quantitative Analysis of Culture Using Millions of Digitized Books
- Erez Lieberman, Jean-Baptiste Michel, Joe Jackson, Tina Tang, and Martin Nowak (2007). Quantifying the Evolutionary Dynamics of Language. Nature 449.