Data analysis

Onomastics and Big Data

Every day, our brains interpret names in a language we understand, a culture we know, a region we visit: a restaurant menu, a company name... even a pet's name can reveal something about its owner. Names (first name, surname, pseudonym) carry meanings that vary with an individual's language and culture, but they often form an essential part of that person's identity.

The art of extracting meaning from names

If it were possible to program a computer to extract the meaning of names, would it give us valuable business intelligence? In the United States, a number of people are convinced it would. The CIA (Central Intelligence Agency) has long experience in this area. In Europe, the legal framework for taking advantage of such tools varies from country to country, but it is generally very strict. Take the case of Ireland. What do we see?

The Software That Predicts Crimes

Santa Cruz, California, August 2012. It is 12:30 p.m. A police officer walks a quiet street that he does not usually patrol. A few minutes later, he will arrest two men caught in the act of trying to steal a vehicle. A few months earlier, two of his colleagues staking out the edge of a downtown parking lot had arrested two women who were trying to force open a car door. In both cases, the officers were not there by chance.

L.A. Cops Embrace Crime-Predicting Algorithm

On patrol: A computer-generated “heat map,” left, shows predicted crime activity.

This is translated into patrol instructions in the form of the red boxes on the map, right.

A recent study suggests that computers could be better than seasoned police analysts at predicting when and where crime will strike next in a busy city. Software tested in Los Angeles was twice as good as human analysts at predicting where burglaries and car break-ins might happen, according to a company deploying the technology. When police in an L.A. precinct called Foothill division followed the computer’s advice and focused their patrols within the areas identified, those areas experienced a 25 percent drop in reported burglaries, an anomaly compared to neighboring areas.

Les Echos

Can math and science help solve crimes? Scientists work with Los Angeles police to identify and analyze crime 'hotspots'

UCLA's Jeffrey Brantingham works with the Los Angeles Police Department to analyze crime patterns. He also studies hunter-gatherers in Northern Tibet. If you tell him his research interests sound completely unrelated, he will quickly correct you. "Criminal offenders are essentially hunter-gatherers; they forage for opportunities to commit crimes," said Brantingham, a UCLA associate professor of anthropology. "The behaviors that a hunter-gatherer uses to choose a wildebeest versus a gazelle are the same calculations a criminal uses to choose a Honda versus a Lexus." Brantingham has been working for years with Andrea Bertozzi, a professor of mathematics and director of applied mathematics at UCLA, to apply sophisticated math to urban crime patterns.
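As a rough illustration of what identifying crime "hotspots" can mean computationally, the sketch below bins incident locations into a square grid and ranks the busiest cells. This simple counting approach is an assumption chosen for illustration; it is not the model used by Brantingham and Bertozzi or by the software tested in Los Angeles. The function name `top_hotspots`, the coordinates, and the cell size are all hypothetical.

```python
from collections import Counter


def top_hotspots(incidents, cell_size, k):
    """Return the k grid cells with the most incidents.

    incidents: iterable of (x, y) coordinates (hypothetical units: meters).
    cell_size: side length of each square grid cell, in the same units.
    Returns a list of ((cell_x, cell_y), count) pairs, busiest first.
    """
    # Map every incident to the grid cell that contains it and count per cell.
    counts = Counter(
        (int(x // cell_size), int(y // cell_size)) for x, y in incidents
    )
    # Sort by descending count; break ties by cell index so the order is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Made-up incident locations, for illustration only.
incidents = [
    (120, 80), (130, 95), (140, 60),   # three incidents close together
    (610, 420), (620, 440),            # two incidents in another area
    (900, 900),                        # one isolated incident
]

print(top_hotspots(incidents, cell_size=150, k=2))
# prints [((0, 0), 3), ((4, 2), 2)]
```

A patrol map like the one described in the excerpts would need far more than raw counts (time of day, recency, crime type), so this only shows the basic idea of turning point data into ranked areas.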

Brantingham and Bertozzi believe their findings apply not only to Los Angeles but to cities worldwide. Policing actions directed at one type of hotspot will have a very different effect from actions directed at the other type.

Why Netflix Never Implemented The Algorithm That Won The Netflix $1 Million Challenge

You probably recall all the excitement when a group finally won the big Netflix $1 million prize in 2009, improving Netflix's recommendation algorithm by 10%. But what you might not know is that Netflix never implemented that solution itself. Netflix recently put up a blog post discussing some of the details of its recommendation system, which (as an aside) explains why the winning entry was never used. First, they note that they did make use of an earlier bit of code that came out of the contest: "A year into the competition, the Korbell team won the first Progress Prize with an 8.43% improvement. They reported more than 2000 hours of work in order to come up with the final combination of 107 algorithms that gave them this prize." Neat. The post goes on: "We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment."
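For readers wondering what a "10%" or "8.43%" improvement refers to: the Netflix Prize scored entries by root-mean-square error (RMSE) on predicted ratings, and improvement was the relative reduction in RMSE compared with Netflix's existing system. The sketch below shows that calculation on invented ratings; the helper names `rmse` and `relative_improvement` and all the numbers are hypothetical, not contest data.

```python
import math


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(actual))


def relative_improvement(baseline_rmse, new_rmse):
    """Percentage reduction in RMSE relative to the baseline."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Invented ratings on a 1-5 scale, for illustration only.
actual = [4, 3, 5, 2]
baseline_predictions = [3, 3, 4, 3]
improved_predictions = [3.5, 3, 4.5, 2.5]

baseline_error = rmse(baseline_predictions, actual)
improved_error = rmse(improved_predictions, actual)
print(round(relative_improvement(baseline_error, improved_error), 2))
```

In this toy case every error is halved, so the improvement comes out at about 50 percent. The point of the excerpt is that a measurable gain on this kind of metric does not by itself justify the engineering cost of deploying the model that achieved it.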