background preloader

Data Sets

Data Sets

http://archive.ics.uci.edu/ml/datasets.html

Related:  Computer ProjectsData Showdown

Open-source, distributed deep learning for the JVM Deeplearning4j is not the first open-source deep-learning project, but it is distinguished from its predecessors in both programming language and intent. DL4J is a JVM-based, industry-focused, commercially supported, distributed deep-learning framework intended to solve problems involving massive amounts of data in a reasonable amount of time. It integrates with Hadoop and Spark using an arbitrary number of GPUs or CPUs, and it has a number you can call if anything breaks. Get Started With Deeplearning4j Content

The Journalist-Engineer A couple months ago, I published an article comparing historic and present-day popularity of older music. I used two huge datasets: 50,000 Billboard songs and 1,4M tracks on Spotify. If I were writing an academic paper, I’d do a ton of analysis, regression, and modeling to figure out why certain songs have become more popular over time. Or I could just make some sick visualizations… U.S. Census Return Rate Challenge (Visualization Competition) Note: The prediction phase of this competition has ended. Please join the visualization competition which ends on Nov. 11, 2012. This challenge is to develop a statistical model to predict census mail return rates at the Census block group level of geography.

Visual Business Intelligence For data sensemakers and others who are concerned with the integrity of data sensemaking and its outcomes, the most important book published in 2016 was Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, by Cathy O’Neil. This book is much more than a clever title. It is a clarion call of imminent necessity. Technical Analysis: Chart Patterns By Cory Janssen, Chad Langager and Casey Murphy A chart pattern is a distinct formation on a stock chart that creates a trading signal, or a sign of future price movements. Chartists use these patterns to identify current trends and trend reversals and to trigger buy and sell signals. In the first section of this tutorial, we talked about the three assumptions of technical analysis, the third of which was that in technical analysis, history repeats itself. The theory behind chart patters is based on this assumption. The idea is that certain patterns are seen many times, and that these patterns signal a certain high probability move in a stock.

What is the Marital Status of Americans by Age? Visualization Data Notes A few months ago I created a visualization that allowed users to compare age distributions for various topics and another one that showed marital status by age range. gnumpy (Russian / Romanian / Belarussian translations by various people) Gnumpy is free software, but if you use it in scientific work that gets published, you should cite this tech report in your publication. Download: gnumpy.py (also be sure to have the most recent version of Cudamat) Documentation: here. Do you want to have both the compute power of GPU's and the programming convenience of Python numpy? Gnumpy + Cudamat will bring you that. VC blog Posted: November 26th, 2014 | Author: Manuel Lima | Filed under: Uncategorized | No Comments » As some attentive users of Visual Complexity might have noticed, the number of projects featured on the website has slowly come to a halt, with the perpetual grand total of 777 being a grieving reminder of inactivity for well over a year. Today, If you go the the main page and look at the top right corner, you will see an invigorating new message: “Indexing 782 projects”. Of course I didn’t want to write this blog post to announce that five new projects have been added to the database.

Implementing a Neural Network from Scratch in Python – An Introduction – WildML Get the code: To follow along, all the code is also available as an iPython notebook on Github. In this post we will implement a simple 3-layer neural network from scratch. We won’t derive all the math that’s required, but I will try to give an intuitive explanation of what we are doing. I will also point to resources for you read up on the details.

how fast does miles teller play in whiplash EDIT 05 Sep. 2015: The concept of Beat Per Minutes (BPM) has been mis-understood as mentioned by reddit. What I was supposed to write was Strokes Per Minutes (SPM). Released in 2014, Whiplash focuses on a promising young drummer (Miles Teller) pursuing his dream of greatness. A tutorial on Deep Learning Complex probabilistic models of unlabeled data can be created by combining simpler models. Mixture models are obtained by averaging the densities of simpler models and "products of experts" are obtained by multiplying the densities together and renormalizing. A far more powerful type of combination is to form a "composition of experts" by treating the values of the latent variables of one model as the data for learning the next model. The first half of the tutorial will show how deep belief nets -- directed generative models with many layers of hidden variables -- can be learned one layer at a time by composing simple, undirected, product of expert models that only have one hidden layer. It will also explain why composing directed models does not work.

Je vais donner accès aux groupes datasets et data tools pour garder le tout propre ;) by dishwasherz Jul 27

Un grand ensemble de datasets très variés. A garder de côté ! by simd3v Jul 27

Related: