Criticisms of Big Data

El gripazo de Google muestra las flaquezas del 'big data' (Google's bad flu exposes the weaknesses of 'big data')

Why big data is in trouble: they forgot about applied statistics

All of these articles warn about issues that statisticians have been thinking about for a very long time: sampling populations, confounders, multiple testing, bias, and overfitting.
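
As a hedged illustration of one of the issues listed above, multiple testing, the following sketch screens many purely random series against a random target and reports the strongest correlation it finds. It is not taken from any of the linked articles and is not Google's method. The function names, the 20-week series length, and the 5000 candidate series are all assumptions chosen for illustration.

```python
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def best_spurious_correlation(n_weeks, n_terms, seed=0):
    """Largest |r| between a random target and n_terms unrelated random series."""
    rng = random.Random(seed)
    # Hypothetical stand-in for a short weekly series such as case counts.
    target = [rng.gauss(0, 1) for _ in range(n_weeks)]
    best = 0.0
    for _ in range(n_terms):
        # Hypothetical stand-in for one candidate predictor; pure noise by design.
        term = [rng.gauss(0, 1) for _ in range(n_weeks)]
        best = max(best, abs(pearson(term, target)))
    return best


few = best_spurious_correlation(n_weeks=20, n_terms=1)
many = best_spurious_correlation(n_weeks=20, n_terms=5000)
print(f"best |r| screening 1 term:     {few:.2f}")
print(f"best |r| screening 5000 terms: {many:.2f}")
```

Every series here is noise, so any strong correlation the screen turns up is spurious. The more candidates are screened against a short series, the stronger the best chance match tends to be, which is why an uncorrected "pick the best predictors" step can overstate what the data support.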

In the rush to take advantage of the hype around big data, these ideas were ignored or not given sufficient attention. One reason is that when you actually take the time to do an analysis right, with careful attention to all the sources of variation in the data, it is almost a law that you will have to make smaller claims than you could if you just shoved your data into a machine learning algorithm and reported whatever came out the other side. The prime example in the press is Google Flu Trends, which was originally developed as a machine learning algorithm for predicting the number of flu cases based on Google search terms. As we have seen, lack of expertise in statistics has led to fundamental errors in both genomic science and economics. All of this leads to two questions:

Radstats: Unemployment: How Official Statistics Distort Analysis and Policy, and Why

The current state of British official unemployment statistics fully vindicates the concerns of the statisticians who launched Radical Statistics 27 years ago: "a common concern about the political implications of their work and an awareness of the actual and potential misuse of statistics". Indeed this is probably a more graphic example than any contemplated at that time, which was something of a golden age for British official statistics. The official statistics represent British unemployment as much lower than it realistically is, by concealing the scale of disguised unemployment. They also misrepresent its geography, obscuring its high degree of concentration in the former industrial cities and coalfields.

Big data: are we making a big mistake?

Big data is a vague term for a massive phenomenon that has rapidly become an obsession with entrepreneurs, scientists, governments and the media. Five years ago, a team of researchers from Google announced a remarkable achievement in one of the world's top scientific journals, Nature.

Without needing the results of a single medical check-up, they were nevertheless able to track the spread of influenza across the US. What’s more, they could do it more quickly than the Centers for Disease Control and Prevention (CDC). Google’s tracking had only a day’s delay, compared with the week or more it took for the CDC to assemble a picture based on reports from doctors’ surgeries.

Google Flu Trends still appears sick

Does Big Data have the flu?

Google Flu Trends was designed to predict the CDC’s reports of flu cases, but often misses its target.

These results led Northeastern University network scientists to take a closer look at how Big Data should be used to advance scientific research. These days, when people start feeling a fever and a sore throat coming on, their first move often isn’t to the medicine cabinet. Instead, it’s to a computer or smartphone to Google their symptoms. These queries, which make up only a tiny fraction of the more than 7 billion total queries the search engine handles each day, are all stored by Google. One use of these stored queries is Google Flu Trends, a statistical model developed by engineers at the company’s foundational arm in an effort to “now-cast” what’s happening with the flu on any given day.

