Ngram Viewer

From EduTech Wiki

This article or section is a stub. A stub is an entry that did not yet receive substantial attention from editors, and as such does not yet contain enough information to be considered a real article. In other words, it is a short or insufficient piece of information and requires additions.

Draft

The Google Books Ngram Viewer is a tool for analyzing the frequency of words and phrases over time in the Google Books corpus.

1 Introduction

According to Culturomics, retrieved 12:56, 18 December 2010 (CET), “The Google Labs N-gram Viewer is the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases ('United States of America') in books over time. You'll be searching through over 5.2 million books: ~4% of all books ever published!”

“We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. "Culturomics" extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.” (Quantitative Analysis of Culture Using Millions of Digitized Books, retrieved 18 December 2010)

“[The] Google Books data set, which is available for download along with the Google Books Ngram Viewer, is a free quantitative tool made available to supplement humanities research worldwide. It is based on the full text of about 5.2 million books, with more than 500 billion words in total. About 72 percent of its text is in English, with smaller amounts in French, Spanish, German, Chinese, and Russian.” (Oh, the humanity, retrieved 14:03, 18 December 2010 (CET))
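The idea described in the quotations above, examining how often a word or phrase appears in books over time, can be illustrated with a small sketch. This is not the Ngram Viewer's own code: the toy corpus, the function names `ngrams` and `relative_frequency`, and the choice to normalize by the total number of n-grams of the same length in each year are all assumptions made for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) found in a sequence of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus_by_year, phrase):
    """Map each year to the share of that year's n-grams equal to `phrase`.

    Assumption: frequency is normalized by the total number of n-grams
    of the same length observed in that year.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    result = {}
    for year, texts in sorted(corpus_by_year.items()):
        counts = Counter()
        for text in texts:
            # Naive tokenization on whitespace; real corpora need more care.
            counts.update(ngrams(text.lower().split(), n))
        total = sum(counts.values())
        result[year] = counts[target] / total if total else 0.0
    return result


# Hypothetical toy corpus: year -> list of "book" texts.
toy_corpus = {
    1900: ["the united states of america is large", "a banana is a fruit"],
    1950: ["the united states of america and the united states of mexico"],
}

freq_phrase = relative_frequency(toy_corpus, "united states of")
freq_word = relative_frequency(toy_corpus, "banana")

for year in sorted(toy_corpus):
    print(year, freq_phrase[year], freq_word[year])
```

Plotting such per-year values for a word or phrase gives a curve of the kind the viewer displays; the real data set is far larger and its preprocessing is not reproduced here.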

See also: Visualization

2 Links

Tools and data
Official and semi-official Google blogs
The research blog (Harvard, ...)
Blog articles
  • Oh, the humanity: Harvard, Google researchers use digitized books as a ‘cultural genome’
Alternative online corpora

3 References