DATA PROJECTS

Architecture overview — Scrapy 0.24.6 documentation. This document describes the architecture of Scrapy and how its components interact.

Scrapy Engine: The engine controls the data flow between all components of the system and triggers events when certain actions occur. See the Data Flow section above for more details.

Scheduler: The Scheduler receives requests from the engine and enqueues them, feeding them back to the engine when the engine asks for them.

Downloader: The Downloader fetches web pages and feeds them to the engine, which in turn feeds them to the spiders.
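The hand-offs described in this overview can be illustrated with a toy model in plain Python. This is only a sketch of the data flow, not Scrapy's actual API: the class names, method names, the `FAKE_WEB` dictionary and the example.com URLs are all assumptions made for illustration, and the spider and item pipeline roles are the ones the overview describes next.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page title, links found on the page).
FAKE_WEB = {
    "http://example.com/": ("Home", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("Page A", []),
    "http://example.com/b": ("Page B", []),
}


class Scheduler:
    """Holds requests until the engine asks for the next one."""

    def __init__(self):
        self._queue = deque()

    def enqueue(self, url):
        self._queue.append(url)

    def next_request(self):
        return self._queue.popleft() if self._queue else None


class Downloader:
    """Fetches a page; here it only looks the URL up in FAKE_WEB."""

    def fetch(self, url):
        title, links = FAKE_WEB[url]
        return {"url": url, "title": title, "links": links}


class Spider:
    """User-written logic: turns a response into items and new requests."""

    start_urls = ["http://example.com/"]

    def parse(self, response):
        yield {"item": {"url": response["url"], "title": response["title"]}}
        for link in response["links"]:
            yield {"request": link}


class Pipeline:
    """Processes items once the spider has extracted them."""

    def __init__(self):
        self.items = []

    def process_item(self, item):
        self.items.append(item)


def run_engine(spider, scheduler, downloader, pipeline):
    """The engine moves data between the other components."""
    for url in spider.start_urls:
        scheduler.enqueue(url)
    while True:
        url = scheduler.next_request()
        if url is None:
            break
        response = downloader.fetch(url)
        for result in spider.parse(response):
            if "request" in result:
                scheduler.enqueue(result["request"])
            else:
                pipeline.process_item(result["item"])
    return pipeline.items


items = run_engine(Spider(), Scheduler(), Downloader(), Pipeline())
print([item["title"] for item in items])  # ['Home', 'Page A', 'Page B']
```

The toy web has no cycles, so the model needs no duplicate filtering; a real crawl would have to avoid re-queuing URLs it has already seen.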

Spiders: Spiders are custom classes written by Scrapy users to parse responses and extract items (also called scraped items) from them, or additional requests to follow.

Item Pipeline: The Item Pipeline processes the items once they have been extracted (or scraped) by the spiders.

Downloader middlewares: Use a Downloader middleware if you need to do one of the following:

The most connected man in the world got devoured by his data. HOW - Face to Facebook.

How we did it: Using custom software, we collected data from more than 1,000,000 Facebook users.

What we collected is their "public data": some of their personal data (name, country, Facebook groups they subscribe to) plus their main profile picture and a few friend relationships. We built a database with all this data, then began to analyze the pictures that showed smiling faces. The vast majority of pictures were both amateurish and somehow almost involuntarily or unconsciously alluring, and the people in them are almost always smiling. It is also evident that the majority of users want to look their best.

Once the database was ready, we studied and customized a face recognition algorithm. After grouping them, we started to dive into these seas of faces, with all the perceptual consequences. In "The Love Delusion" essay, Dan Jones cites Martie Haselton’s research, which indicates that men typically overestimate the sexual interest conveyed by a woman's smile or laughter.

Big Bang Data. OLGA SUBIRÓS STUDIO.

We all generate data, with our mobile phones, sensors, social networks, digital photographs and videos, purchase transactions and GPS signals.

What is new is that it is increasingly easy to store and process these vast quantities of data that detect patterns (of incidents, behaviour, consumption, voting, investment, etc.). This fact is very quickly and completely changing the way decisions are made at all levels. Is data the new oil, a potentially boundless source of wealth? Is it the ammunition for arms of mass surveillance? Or should it be primarily an opportunity, an instrument for knowledge, prevention, efficiency and transparency, a tool to help construct a more transparent, participatory democracy?

Selfiecity London.

How we collected and filtered the data: To locate selfie photos, we randomly selected 140,000 photos (20,000-30,000 photos per city) from a total of 808,000 images we collected on Instagram. Two to four Amazon Mechanical Turk workers tagged each photo.
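The selection step for this dataset (several workers tag each photo, and a photo is kept when at least 2 workers mark it as a single-person selfie, up to a per-city limit) might be sketched as follows. This is an illustrative sketch only, not the project's actual code: the function name, the tuple layout and the sample data are hypothetical, and ranking by number of positive votes is an assumption, since the source does not say how the "top" photos were ordered.

```python
from collections import defaultdict


def select_selfies(tags, min_votes=2, per_city=1000):
    """Pick photos that enough workers tagged as a single-person selfie.

    tags: iterable of (photo_id, city, is_single_selfie) worker judgements.
    Returns a dict mapping each city to its selected photo ids.
    """
    votes = defaultdict(int)
    city_of = {}
    for photo_id, city, is_single_selfie in tags:
        city_of[photo_id] = city
        if is_single_selfie:
            votes[photo_id] += 1

    # Keep only photos with enough agreeing workers, grouped by city.
    by_city = defaultdict(list)
    for photo_id, count in votes.items():
        if count >= min_votes:
            by_city[city_of[photo_id]].append((count, photo_id))

    selected = {}
    for city, scored in by_city.items():
        # Assumption: "top" means most positive votes first.
        scored.sort(key=lambda pair: -pair[0])
        selected[city] = [photo_id for _, photo_id in scored[:per_city]]
    return selected


# Hypothetical worker judgements for three photos.
tags = [
    ("p1", "London", True), ("p1", "London", True), ("p1", "London", False),
    ("p2", "London", True), ("p2", "London", False),
    ("p3", "London", True), ("p3", "London", True), ("p3", "London", True),
]
selected = select_selfies(tags, per_city=2)
print(selected)  # {'London': ['p3', 'p1']}
```

Photo `p2` is dropped because only one worker tagged it as a single-person selfie.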

For these, we asked Mechanical Turk workers the simple question "Does this photo show a single selfie?" We then selected the top 1000 photos for each city (i.e., photos which at least 2 workers tagged as a single-person selfie).

Selfiexploratory. The upper part shows charts that you can use as filters, while the lower part displays the results.

Click on any of the circles to see only selfies taken in that city. Clicking on the map background will remove the filter. You can navigate through the results grid using the controls on the left and right. Note how the other charts react when you filter the data! Click and drag the mouse to create a filter. Another chart works the same as the map: just click on a circle to filter. The data shown is location (captured by the Instagram app) and gender and age (tagged and guessed by Mechanical Turk workers).

Erica Scourti. Big Bang Data. Big Bang Data & Electronic Superhighway.

S01E01 - Do Not Track. Culture at the mercy of algorithms? There is no use hiding it any longer: you have a serious soft spot for films starring Scarlett Johansson.

Netflix knows it. At the office, you listen to Lilly Wood & the Prick. Deezer sees it every day. And for your birthday, on January 27, you will treat yourself to Zlatan Ibrahimovic's autobiography. Amazon has already wrapped it. How do these websites manage to know you so well and anticipate your choices?

How SNCF is deploying Big Data to optimize passenger flows. SNCF Gares & Connexions is going to set up a Hadoop cluster to analyze Wi-Fi data from its stations.

Goal: track and optimize passenger flows in real time. SNCF has launched a vast project to optimize passenger movements in stations. The project is run within the SNCF Gares & Connexions division, whose mission is to maintain, fit out and develop some 3,000 railway stations across France. The challenge: to better understand the flows of the two billion passengers who pass through stations each year and connect with other means of transport (buses, coaches, taxis, self-service bikes...).

A vast Big Data initiative has been launched to meet that challenge.