
Big Data


How to extract the information on web pages as structured data. If you need to pull data out of web pages and turn it into structured data, I may have something you'll like. It's a bookmarklet that sits in your browser's bookmarks bar and lets you export the content of a web page as a CSV table. For example, on Amazon I can extract a results page as structured data. It's dead simple, but really handy. The bookmarklet is called ConvExtra, and you'll need to sign up on their site to export the results to CSV.

Big Data Challenges.
Quandl - Intelligent Search for Numerical Data.
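How ConvExtra works internally isn't described here, so the following is only a minimal sketch of the general idea behind exporting a page as a CSV table. It assumes the data sits in a well-formed HTML `<table>` with explicit closing tags, and uses Python's standard `html.parser` and `csv` modules. The names `TableExtractor` and `html_table_to_csv` are hypothetical, not part of ConvExtra.

```python
import csv
import io
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collects the text of each <td>/<th> cell, grouped by <tr> row.

    Hypothetical helper for illustration; not ConvExtra's implementation.
    Assumes explicit </td>, </th> and </tr> closing tags.
    """

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.rows = []      # completed rows (lists of cell strings)
        self._row = None    # row being built, or None outside a <tr>
        self._cell = None   # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so each cell becomes one clean value.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def html_table_to_csv(html: str) -> str:
    """Return the rows of every table in `html` as CSV text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out).writerows(parser.rows)  # quotes cells containing commas
    return out.getvalue()
```

For example, `html_table_to_csv("<table><tr><td>Smith, J</td><td>3</td></tr></table>")` yields a single CSV line in which the first cell is quoted because it contains a comma. Real pages (such as a search results page) rarely use a clean `<table>`, which is exactly the gap a dedicated tool tries to fill.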

All public Facebook posts ever made are now searchable. Facebook Graph Search now includes posts and status updates in its results, according to a Facebook blog post on Monday. Such searches will accept modifiers such as time (“All of my posts from 2012”, for instance), location, or the people who participated. This new aspect of Graph Search will take advantage of Facebook’s recently announced hashtags. One intended purpose is to let users search posts among different social groups by topic, e.g., “posts about Breaking Bad by my friends.” Graph Search will also allow searches based on tagged locations (“Posts from the Empire State Building”) or on the involvement of other users (“Posts my friend John Smith has commented on”). The search is still subject to privacy controls, so users won’t be able to see results they couldn’t view otherwise. Like the OG Graph Search before it, the new Graph Search seems prone to surfacing connections and trends we may never want to acknowledge.

Internet Mathematics 2010.

Task description. The goal of the Internet Mathematics 2010 contest is to predict the rate of traffic congestion from previous observations. Participants are given a graph of Moscow's streets and observation data: the speed of traffic flow on street segments over one month. The task is to predict the rate of traffic congestion on the last day of the month.

Data sets. The data sets provided for the contest have two components: the street graph and the traffic-flow data.

Street graph. Moscow’s street intersections are represented by vertices, and sections of the city’s streets correspond to edges (a two-way street is represented by two directed edges, one in each direction). The vertices.txt file contains all vertex IDs (first column) together with their groups (second column).
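A minimal sketch of loading this street graph, assuming vertices.txt holds whitespace-separated `vertex_id group_id` pairs, one per line, where vertices sharing a group ID belong to the same road intersection. The edge file format isn't given here, so edges are assumed (hypothetically) to arrive as `(from, to)` pairs of vertex IDs. All function names are illustrative, not part of the contest kit.

```python
from collections import defaultdict


def load_groups(vertices_txt: str) -> dict[int, int]:
    """Map each vertex ID (first column) to its group ID (second column)."""
    groups = {}
    for line in vertices_txt.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # skip blank or malformed lines
            groups[int(parts[0])] = int(parts[1])
    return groups


def intersections(groups: dict[int, int]) -> dict[int, list[int]]:
    """Invert the mapping: group ID -> sorted vertex IDs forming one intersection."""
    members = defaultdict(list)
    for vertex, group in groups.items():
        members[group].append(vertex)
    return {group: sorted(vs) for group, vs in members.items()}


def build_graph(edges, groups: dict[int, int]) -> dict[int, set[int]]:
    """Directed adjacency between intersections.

    `edges` is a hypothetical iterable of (from_vertex, to_vertex) pairs;
    a two-way street contributes one pair in each direction.
    """
    adj = defaultdict(set)
    for u, v in edges:
        gu, gv = groups.get(u, u), groups.get(v, v)  # unknown vertices stand alone
        if gu != gv:  # design choice: drop segments inside a single intersection
            adj[gu].add(gv)
    return adj
```

With rows such as `40 42`, `41 42` and `42 42`, `intersections` groups vertices 40, 41 and 42 under intersection 42. Dropping edges internal to one intersection yields an intersection-level graph; if traffic speeds are recorded on those short segments, keeping the raw vertex graph may be the better choice.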

In the sample vertices.txt from the task description, vertices 0, 1, 2, 3 are 'proper' vertices, while vertices 40, 41, 42 are parts of vertex 42, which means they all represent one and the same road intersection.

Evaluation.

Data - The Big Data Combine Engineered by BattleFin. For this competition, you are asked to predict the percentage change in a financial instrument 2 hours in the future. The data represents features of various financial securities (198 in total) recorded at 5-minute intervals throughout a trading day. To discourage cheating, you are not provided with the features' names or the specific dates.
data.zip - contains features for 510 days of trading, including 200 training days and 310 testing days
trainLabels.csv - contains the targets for the 200 training days
sampleSubmission.csv - shows the submission format
Each variable named O1, O2, O3, etc. (the outputs) represents a percent change in the value of a security. Each variable named I1, I2, I3, etc. Within each trading day, you are given the outputs as a relative percentage compared to the previous day's closing price. You are asked to predict the outputs 2 hours later, at 4PM ET.

DatasheetLib.com - The Ultimate Datasheet Library.
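For the BattleFin Big Data Combine data described above, an output expressed "as a relative percentage compared to the previous day's closing price" works out to 100 * (price - prev_close) / prev_close. The sketch below shows that conversion and its inverse, plus a last-value (persistence) forecast. The persistence forecast is a common naive baseline swapped in for illustration, not the competition's prescribed method, and all function names are hypothetical.

```python
def pct_change_vs_prev_close(price: float, prev_close: float) -> float:
    """Percent change of `price` relative to the previous day's closing price."""
    if prev_close == 0:
        raise ValueError("previous close must be non-zero")
    return (price - prev_close) / prev_close * 100.0


def price_from_pct(pct: float, prev_close: float) -> float:
    """Inverse transform: recover the price implied by a percent change."""
    return prev_close * (1.0 + pct / 100.0)


def last_value_baseline(intraday_outputs: list[float]) -> float:
    """Naive persistence forecast: assume the latest output still holds at 4PM ET."""
    if not intraday_outputs:
        raise ValueError("need at least one intraday observation")
    return intraday_outputs[-1]
```

Because every output is measured against the same daily reference (the previous close), the 4PM target for a security is on the same scale as its intraday outputs, which is what makes a "carry the last value forward" baseline meaningful as a point of comparison.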

Data - StumbleUpon Evergreen Classification Challenge.
RavenPack.
Renewable internal freshwater resources per capita (cubic meters).
Raw Data | Data.
The Morning Briefing: What is Big Data worth?
What you shouldn't need to know about Big Data and Machine Learning.

Big Data News Roundup: Correlation vs. Causation. In the first quarter of 2013, big data's stock experienced sudden declines followed by sporadic bouts of enthusiasm. This volatility, a new big data “V”, continues this month, and Ted Cuzzillo summed up the recent negative sentiment in “Big data, big hype, big danger” on SmartDataCollective: “A remarkable thing happened in Big Data last week. One of Big Data’s best friends poked fun at one of its cornerstones: the Three V’s.

The well-networked and alert observer Shawn Rogers, vice president of research at Enterprise Management Associates, tweeted his eight V’s: ‘…Vast, Volumes of Vigorously, Verified, Vexingly Variable Verbose yet Valuable Visualized high Velocity Data.’” Indeed, all the people who “got stuck” on Laney’s “definition” conveniently forgot that he first used the “three V’s” to describe data management challenges in 2001. Cuzzillo is joined by a growing chorus of critics who challenge some of the breathless pronouncements of big data enthusiasts.

Why Big Data Mining / Analytics is the New Gold Rush. Just mention the words “Big Data” to any technology entrepreneur or investor and watch their face light up with excitement. Given the perceived opportunity in Big Data, tech entrepreneurs and investors want to capitalize on it by starting or investing in a Big Data management, mining and analytics business.

Is this perceived opportunity in Big Data real, or is it a bubble that will soon burst? I think the perceived opportunity in Big Data is real and here to stay, because Big Data mining/analytics will fundamentally change the way business is done, not only online but also offline. Here’s why: Big Data is a key enabler of Social Business. Without Big Data mining/analytics, a large or medium-sized company can neither make sense of all the user-generated content online nor collaborate effectively with customers, suppliers and partners on social media channels.

Can you imagine any large or medium-sized business without e-Business in the internet age? What do you think?

How Big Data Is Rewriting Hollywood Scripts. I honestly can’t tell horror movies apart. From set and costume design to the trailers that all seem to have the same…tempo…of…HE’S RIGHT THERE, OMG!!, novelty has clearly given way to successful tropes and generic market testing. But how bad have things gotten, really? Today, box office analytics are being applied all the way down to the screenplay level. That’s right: the very core of the film. The New York Times profiled former statistics professor Vinny Bruzzese, who is apparently the go-to guy for data-driven script evaluation. For as much as $20,000 per script, Mr. … Now, one way to read this is that it’s nauseatingly unartistic.

Study: Big Data, how are companies adapting their strategy? The Economist Intelligence Unit, the research arm of The Economist magazine, recently published a study report on Big Data (commissioned by Wipro) and on how companies are approaching this shift. Clear benefits, identified obstacles and difficulties, types of applications: several executives gave their views and described their past experience. The main findings are as follows. Regarding the types of data collected, practices vary, with a clear trend toward more tracking of social network data and geolocation data.

The split between data collection and data exploitation varies noticeably between high-growth and low-growth companies, as the accompanying chart shows.