Culturomics

Culturomics 2.0: Forecasting large-scale human behavior using global news media tone in time and space. Recent literature has suggested that computational analysis of large text archives can yield novel insights into the functioning of society, including predicting future economic events, says Kalev Leetaru, Assistant Director for Text and Digital Media Analytics at the Institute for Computing in the Humanities, Arts, and Social Science at the University of Illinois and Center Affiliate of the National Center for Supercomputing Applications. The emerging field of “Culturomics” seeks to explore broad cultural trends through the computerized analysis of vast digital book archives, offering novel insights into the functioning of human society. Books, however, represent the “digested history” of humanity, written with the benefit of hindsight. Figures: global geocoded tone of all Summary of World Broadcasts content mentioning “Bin Laden”, January 1979–April 2011; global geocoded tone of all New York Times content, 2005. (Credit: UIC)

Google: The largest linguistic corpus of all time. When I was a student in the late 1970s, I would never have dared to imagine, even in my wildest dreams, that the scientific community would one day have the means to analyze computerized text corpora of several hundred billion words. At the time, I marveled at the Brown Corpus, which contained the extraordinary quantity of one million words of American English and which, after being used to compile the American Heritage Dictionary, had been made fairly widely available to researchers. That corpus, despite a size that now seems paltry, enabled an impressive number of studies and contributed greatly to the rise of language technologies... I was lucky enough to have access to the study before publication, and it made me somewhat dizzy... And for French? Well, everything remains to be done. Will linguists (French ones, at any rate) realize it?

Culturomics. External links: Culturomics.org, website by The Cultural Observatory at Harvard directed by Erez Lieberman Aiden and Jean-Baptiste Michel

In 500 Billion Words, a New Window on Culture. The digital storehouse, which comprises words and short phrases as well as a year-by-year count of how often they appear, represents the first time a data set of this magnitude, together with tools for searching it, has been at the disposal of Ph.D.’s, middle school students and anyone else who likes to spend time in front of a small screen. It consists of the 500 billion words contained in books published between 1500 and 2008 in English, French, Spanish, German, Chinese and Russian. The intended audience is scholarly, but a simple online tool allows anyone with a computer to plug in a string of up to five words and see a graph that charts the phrase’s use over time, a diversion that can quickly become as addictive as the habit-forming game Angry Birds. With a click you can see that “women,” in comparison with “men,” is rarely mentioned until the early 1970s, when feminism gained a foothold. The data set can be downloaded, and users can build their own search tools.
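Because the data set is described as a year-by-year count of how often words and phrases appear, a chart of a phrase's use over time can be approximated by dividing its yearly count by the total number of words for that year. The following is a minimal sketch of that idea in Python. The row layout `(phrase, year, count)`, the `totals` table, the function name and all the numbers are assumptions made up for illustration, not the format or content of the actual downloadable files.

```python
from collections import defaultdict


def relative_frequency(rows, totals, phrase):
    """Return {year: share of all words in that year} for one phrase.

    rows   -- iterable of (phrase, year, count) tuples (hypothetical layout)
    totals -- {year: total word count for that year} (hypothetical layout)
    """
    counts = defaultdict(int)
    for ngram, year, count in rows:
        if ngram == phrase:
            counts[year] += count  # the same phrase may appear in several rows
    return {year: counts[year] / totals[year] for year in sorted(counts)}


# Toy counts, invented for this example only.
rows = [
    ("women", 1960, 20),
    ("men", 1960, 80),
    ("women", 1980, 60),
    ("men", 1980, 90),
    ("women", 1980, 15),
]
totals = {1960: 1000, 1980: 1500}

freq = relative_frequency(rows, totals, "women")
for year, share in freq.items():
    print(year, f"{share:.3f}")
```

Normalizing by the yearly total matters because far more books are published in recent years, so raw counts alone would make almost every phrase appear to grow over time.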

Culturomics research uses quarter-century of media coverage to forecast human behavior. "Culturomics" is an emerging field of study into human culture that relies on the collection and analysis of large amounts of data. A previous culturomic research effort used Google's culturomic tool to examine a dataset made up of the text of about 5.2 million books to quantify cultural trends across seven languages and three centuries. Now a new research project has used a supercomputer to examine a dataset made up of a quarter-century of worldwide news coverage to forecast and visualize human behavior. Using the tone and location of news coverage, the research was able to retroactively predict the recent Arab Spring and successfully estimate the final location of Osama Bin Laden to within 200 km (124 miles). Tone: Leetaru says that examining the tone of a news story is one of the most important aspects of his version of culturomics and the most reliable metric for conflict. Figure: world "civilizations" according to SWB, 1979-2009 (Image: Leetaru)

When Google Books helps us understand our cultural genome. For once, this week's reading has good things to say about Google, by way of an article by Ed Yong published on the Discover Magazine website in December 2010. The article's title: “The cultural genome: Google Books reveals traces of fame, censorship and changing languages”. “In the same way that a fossil tells us things about the evolution of life on Earth,” Yong explains, “the words written in books tell the history of humanity.” “Fortunately,” Yong continues, “that is exactly what Google has been doing since 2004 with Google Books.” Fifteen million books have been digitized to date, or 12% of all the books ever published. The team worked on a third of the total corpus: 5 million books published in English, French, Spanish, German, Chinese, Russian and Hebrew, dating back to the 16th century. Image: the evolution of what we eat…

MemeTracker: tracking news phrases over the web

Our adventures in culturomics. Peter Aldhous, Jim Giles and MacGregor Campbell, reporters. Here in New Scientist's San Francisco bureau we can't resist an invitation to participate in an entirely new field of research. So after reading about the first analyses of word usage over time in Google's mammoth database of 5 million digitised books, we were excited to learn that the search giant has provided a neat tool, the Books Ngram Viewer, to perform your own "culturomic" studies. Diving straight into the US culture war, this result made us exclaim, "Science be praised!" We soon thought we'd made a real culturomic discovery: nanotechnology has been around since 1899. (Note: to see the clear peaks you need to set the "smoothing" value to zero.) Then we saw the same pattern for searches relating to the internet and cutting-edge biology. Was the world blessed with some spookily prescient authors around the dawn of the twentieth century? But why do glitches cluster around 1899 and 1905?
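The note about setting "smoothing" to zero can be illustrated with a small sketch. It assumes that a smoothing value of n means each year's value is replaced by the average of itself and up to n neighbouring years on each side, which is a common moving-average scheme; the function name and the toy series are invented for this example and are not taken from the Ngram Viewer itself.

```python
def smooth(values, n):
    """Centered moving average: average each point with up to n
    neighbours on each side (fewer at the ends of the series)."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - n)
        hi = min(len(values), i + n + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


# Toy series with a single-year spike, such as a misdated book might cause.
series = [0, 0, 9, 0, 0]

print(smooth(series, 0))  # smoothing off: the spike stays sharp
print(smooth(series, 1))  # smoothing on: the spike is spread over 3 years
```

With smoothing off the one-year spike keeps its full height, while any positive smoothing value flattens it into its neighbours, which is why isolated glitches are easiest to spot at zero.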

Bluefin Mines Social Media To Improve TV Analytics
