Books Ngram viewer | Google Labs

[2010] The study in Science

We constructed a corpus of digitized texts containing about 4% of all books ever printed. Analysis of this corpus enables us to investigate cultural trends quantitatively. We survey the vast terrain of ‘culturomics,’ focusing on linguistic and cultural phenomena that were reflected in the English language between 1800 and 2000. We show how this approach can provide insights about fields as diverse as lexicography, the evolution of grammar, collective memory, the adoption of technology, the pursuit of fame, censorship, and historical epidemiology. Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena spanning the social sciences and the humanities.
https://www.sciencemag.org/content/early/2010/12/15/science.1199644
http://blog.veronis.fr/2010/12/google-largest-linguistic-corpus-of-all.html

Review by Jean Véronis

When I was a student at the end of the 1970s, I never dared imagine, even in my wildest dreams, that the scientific community would one day have the means to analyze computerized text corpora of several hundred billion words. At the time, I marvelled at the Brown Corpus, which contained the extraordinary quantity of one million words of American English and which, after being used to compile the American Heritage Dictionary, was made widely available to scientists. This corpus, despite a size that now seems derisory, enabled an impressive number of studies and contributed greatly to the development of language technologies... The study to be published tomorrow in Science by a team comprising scientists from Google, Harvard, MIT, the Encyclopaedia Britannica and Houghton Mifflin Harcourt (publisher of the American Heritage Dictionary) deals with the largest linguistic corpus of all time: 500 billion words.
http://blog.veronis.fr/2010/12/google-le-plus-grand-corpus.html
When I was a student, at the end of the 1970s, I would never have dared imagine, even in my wildest dreams, that the scientific community would one day have the means to analyze computerized text corpora of several hundred billion words. At the time, I marvelled at the Brown Corpus, which contained the extraordinary quantity of one million words of American English and which, after being used to compile the American Heritage Dictionary, had been made fairly widely available to researchers. This corpus, despite a size that now seems derisory, made possible an impressive number of studies and contributed greatly to the rise of language technologies...

The largest linguistic corpus of all time