Google's bad bout of flu exposes the weaknesses of 'big data'

Why big data is in trouble: they forgot about applied statistics.

All of these articles warn about issues that statisticians have been thinking about for a very long time: sampling populations, confounders, multiple testing, bias, and overfitting.
In the rush to take advantage of the hype around big data, these ideas were ignored or not given sufficient attention. One reason is that when you actually take the time to do an analysis right, with careful attention to all the sources of variation in the data, it is almost a law that you will have to make smaller claims than you could if you just shoved your data into a machine learning algorithm and reported whatever came out the other side. The prime example in the press is Google Flu Trends. Google Flu Trends was originally developed as a machine learning algorithm for predicting the number of flu cases based on Google search terms. As we have seen, lack of expertise in statistics has led to fundamental errors in both genomic science and economics. All of this leads to two questions:

Radstats: Unemployment: How Official Statistics Distort Analysis and Policy, and Why.
David Webster
Minor corrections were made to this article on 8 May 2012 at the request of the author.
The current state of British official unemployment statistics fully vindicates the concerns of the statisticians who launched Radical Statistics 27 years ago - "a common concern about the political implications of their work and an awareness of the actual and potential misuse of statistics". Indeed this is probably a more graphic example than any contemplated at that time, which was something of a golden age for British official statistics. The official statistics represent British unemployment as much lower than it realistically is, by concealing the scale of disguised unemployment. They also misrepresent its geography, obscuring its high degree of concentration in the former industrial cities and coalfields.
The two issues of the level of unemployment, and its geography, are interlinked and it is difficult to understand one without the other.

Big data: are we making a big mistake?

Big data is a vague term for a massive phenomenon that has rapidly become an obsession with entrepreneurs, scientists, governments and the media. Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world’s top scientific journals, Nature.
Without needing the results of a single medical check-up, they were nevertheless able to track the spread of influenza across the US. What’s more, they could do it more quickly than the Centers for Disease Control and Prevention (CDC). Google’s tracking had only a day’s delay, compared with the week or more it took for the CDC to assemble a picture based on reports from doctors’ surgeries.

Google Flu Trends still appears sick.

Does Big Data have the flu?

Google Flu Trends was designed to predict the CDC’s reports of flu cases, but often misses its target.
These results led Northeastern University network scientists to take a closer look at how Big Data should be used to advance scientific research. These days, when people start feeling a fever and a sore throat coming on, their first move often isn’t to the medicine cabinet. Instead, it’s to a computer or smartphone to Google their symptoms. These queries, which make up only a tiny fraction of the more than 7 billion total queries the search engine handles each day, are all stored by Google. One example of the latter is Google Flu Trends, a statistical model developed by engineers at Google.org, the company’s foundational arm, in an effort to “now-cast” what’s happening with the flu on any given day.
But research has shown that GFT often misses its target.

Eight (No, Nine!) Problems With Big Data

Big data is suddenly everywhere.
Everyone seems to be collecting it, analyzing it, making money from it and celebrating (or fearing) its powers.

Big Data is a Big Deal.

Posted by Tom Kalil on March 29, 2012 at 09:23 AM EDT
[Editor's Note: Watch the live webcast today at 2pm ET of the Big Data Research and Development event.]
Today, the Obama Administration is announcing the “Big Data Research and Development Initiative.”
By improving our ability to extract knowledge and insights from large and complex collections of digital data, the initiative promises to help accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning. To launch the initiative, six Federal departments and agencies will announce more than $200 million in new commitments that, together, promise to greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.

Big Data: a challenge for companies in 2013.

A company's data is high-value information.
Today, digital marketing is a vast cloud that supplies companies with millions of data points. However, only 45% of companies acknowledge that they analyze this data and are aware of its high potential. A recent report by Infogroup Targeting Solutions, based on a sample of 700 marketers, concluded that Big Data analysis remains unfinished business for companies. One of the study's main findings was that 68% of marketers plan to increase their budget for analyzing and processing this information. No less important, more than half, 56%, want to add staff dedicated exclusively to this area. The main challenge in 2013.

‘Big data’ is dead. What’s next?

Big data can blind us to the long term.

Fallout from Snowden hurting bottom line of tech companies.