Ngram Viewer: Difference between revisions

Revision as of 14:03, 18 December 2010

This article or section is a stub. It does not yet contain enough information to be considered a real article. In other words, it is a short or insufficient piece of information and requires additions.

Draft

The Google Books Ngram Viewer is a tool for charting how often words and phrases occur over time in the Google Books corpus.

Introduction

According to Culturomics (A user's guide to culturomics, retrieved 12:56, 18 December 2010 (CET)), “The Google Labs N-gram Viewer is the first tool of its kind, capable of precisely and rapidly quantifying cultural trends based on massive quantities of data. It is a gateway to culturomics! The browser is designed to enable you to examine the frequency of words (banana) or phrases ('United States of America') in books over time. You'll be searching through over 5.2 million books: ~4% of all books ever published!”
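The idea of "the frequency of a word or phrase in books over time" can be illustrated with a minimal Python sketch. It assumes that a phrase's frequency in a given year is the number of times it occurs as an n-gram, divided by the total number of n-grams of the same length in that year's books. The input format (`books_by_year`), the whitespace tokenization and the function names are hypothetical and chosen only for illustration; the real corpus is tokenized and counted by Google's own pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(books_by_year, phrase):
    """Map each year to the share of its n-grams that equal `phrase`.

    `books_by_year` is a hypothetical input: {year: [book text, ...]}.
    Tokens are split on whitespace and case is preserved (an assumption;
    Google's corpus uses its own tokenizer).
    """
    target = tuple(phrase.split())
    n = len(target)
    result = {}
    for year, texts in sorted(books_by_year.items()):
        counts = Counter()
        for text in texts:
            counts.update(ngrams(text.split(), n))
        total = sum(counts.values())
        # Normalize by all n-grams of the same length found in that year.
        result[year] = counts[target] / total if total else 0.0
    return result


corpus = {
    1900: ["the united states of america", "a banana and a banana"],
    1950: ["banana bread"],
}
print(relative_frequency(corpus, "banana"))         # {1900: 0.2, 1950: 0.5}
print(relative_frequency(corpus, "united states"))  # {1900: 0.125, 1950: 0.0}
```

Normalizing by the yearly total, rather than plotting raw counts, keeps years with many more published books from dominating the curve.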

“We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of "culturomics", focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. "Culturomics" extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.” (Quantitative Analysis of Culture Using Millions of Digitized Books, Science, retrieved 18 December 2010)

“[The] Google Books data set, which is available for download along with the Google Books Ngram Viewer, is a free quantitative tool made available to supplement humanities research worldwide. It is based on the full text of about 5.2 million books, with more than 500 billion words in total. About 72 percent of its text is in English, with smaller amounts in French, Spanish, German, Chinese, and Russian.” (Oh, the humanity, Harvard Gazette, retrieved 14:03, 18 December 2010 (CET))

See also: Visualization

Links

Tools and data

Ngram Viewer (Google's tool): http://ngrams.googlelabs.com/
Data sets: CSV files for local data processing, http://ngrams.googlelabs.com/datasets
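Local processing of the downloadable files might look like the following Python sketch, which totals the yearly match counts of a single n-gram. It assumes, hypothetically, that each line of a data file holds tab-separated fields beginning with the n-gram, the year and the match count; check the layout actually described on the data sets page before relying on it. The function name `yearly_counts` and the sample rows are illustrative only.

```python
import csv
import io
from collections import defaultdict


def yearly_counts(lines, target):
    """Sum the match counts of one n-gram per year.

    Assumes (hypothetically) tab-separated lines that start with
    ngram, year, match_count; any further columns are ignored.
    """
    counts = defaultdict(int)
    # QUOTE_NONE keeps quotation marks inside n-grams as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or truncated lines
        ngram, year, match_count = row[0], int(row[1]), int(row[2])
        if ngram == target:
            counts[year] += match_count
    return dict(sorted(counts.items()))


sample = io.StringIO(
    "banana\t1900\t12\t10\t8\n"
    "banana\t1901\t15\t14\t9\n"
    "bandana\t1900\t3\t3\t2\n"
)
print(yearly_counts(sample, "banana"))  # {1900: 12, 1901: 15}
```

For a downloaded (and unzipped) file, pass `open(path, newline="", encoding="utf-8")` instead of the in-memory sample. Dividing these raw counts by each year's total would turn them into relative frequencies.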

Official and semi-official Google blogs

Inside Google Books (informal Google blog, http://booksearch.blogspot.com/), for example:
- Find out what's in a word, or five, with the Google Books Ngram Viewer, Inside Google Books (December 16, 2010; cross-posted to the Google blog)

The research blog (Harvard, ...)

Culturomics: http://www.culturomics.org/

Blog articles

A user's guide to culturomics, Culturomics.
The cultural genome: Google Books reveals traces of fame, censorship and changing languages, Discover Magazine, December 16, 2010.

Oh, the humanity: Harvard, Google researchers use digitized books as a ‘cultural genome’, Harvard Gazette.

Alternative online corpora

The Corpus of Historical American English (COHA), by Mark Davies, Brigham Young University: a smaller corpus restricted to American English, but with better search capabilities.

References

Michel, Jean-Baptiste; Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden (2010). Quantitative Analysis of Culture Using Millions of Digitized Books. Science. DOI:10.1126/science.1199644
- ScienceMag online version (12/16/2010), DOI:10.1126/science.1199644, Quantitative Analysis of Culture Using Millions of Digitized Books

Lieberman, Erez; Jean-Baptiste Michel, Joe Jackson, Tina Tang, and Martin Nowak (2007). Quantifying the Evolutionary Dynamics of Language. Nature 449.

