
The Journalist-Engineer
A couple of months ago, I published an article comparing the historic and present-day popularity of older music. I used two huge datasets: 50,000 Billboard songs and 1.4M tracks on Spotify. If I were writing an academic paper, I’d do a ton of analysis, regression, and modeling to figure out why certain songs have become more popular over time. Or I could just make some sick visualizations… Instead of reporting on my “theory”, I wagered that readers would get more out of an elegant presentation of the data than an analysis of it. Here’s that same approach on another project: rappers and the size of their vocabulary. Instead of trying to prove that one rapper was better than another, I relied on the fact that readers are really good at absorbing the data and would much rather form their own judgements. A few years ago, Bret Victor wrote about the notion of passive and active readers. In theory, this sounds great…but kinda crazy. But it’s happening: there are active readers. I believe it’s a response to “too long, didn’t read.”
Learn R for beginners with our PDF
With so much emphasis on getting insight from data these days, it's no wonder that R is rapidly rising in popularity. R was designed from day one to handle statistics and data visualization; it's highly extensible, with many new packages aimed at solving real-world problems; and it's open source (read "free"). If you're ready to learn, we have just the ticket: a free PDF of Computerworld's "Beginner's guide to R." Included in this 45-page guide: Introduction: first steps, including downloading R and RStudio, setting your working directory, and installing and using packages. Sure, it will take more than any single guide to make you an R master. If you are not already part of the Computerworld Insider program, register for free and then download the guide. Happy learning!
Million Song Dataset | scaling MIR research
The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks. Its purposes are: to encourage research on algorithms that scale to commercial sizes; to provide a reference dataset for evaluating research; to serve as a shortcut alternative to creating a large dataset with APIs (e.g. The Echo Nest's); and to help new researchers get started in the MIR field. The core of the dataset is the feature analysis and metadata for one million songs, provided by The Echo Nest. The Million Song Dataset is also a cluster of complementary datasets contributed by the community. The Million Song Dataset started as a collaborative project between The Echo Nest and LabROSA. How to get started: to get a sense of the dataset, you can look at this description of one of the million songs. To start your own experiments, you can download the entire dataset (280 GB). We also have a set of suggested tasks, including snippets of code to get you started.
What is the Marital Status of Americans by Age?
A few months ago I created a visualization that allowed users to compare age distributions for various topics, and another one that showed marital status by age range. Fields: Marital Status; Sex (a pretty generic question here); Race (the ACS has six basic race categories); Employment Status (this field is broken out to let you see not only who is in the labor force and who isn’t, but also the ages of those employed in the Armed Forces); State (geography is often associated with different trends).
Hohli Online Charts Builder
New version: try the new version of Charts Builder, which is based on the new Google Charts API.
Load from image URL: the loaded chart data may not exactly equal the original, but will be very similar to it. This works only for images on chart.apis.google.com.
Chart type: 3D pie charts, lines, bar charts, pie charts (for pie charts with labels, choose the 1000x300 or 800x375 size), Venn diagrams, scatter plots, radar charts.
Chart size: horizontal 1000x300, 800x375, 600x500, 320x240; vertical 300x1000, 375x800, 500x600, 240x320; square 546x546, 400x400, 300x300, 200x200.
Data: should consist only of positive numbers, using minus one (-1) for a missing value, separated by commas, spaces, or semicolons (, ;), e.g.: 23, 432, 456, 341. For lines (pairs), input the data as x-axis and y-axis coordinates, e.g.: x1,y1, x2,y2, x3,y3.
Title: use a pipe character (|) to force a line break in the title.
Once the chart is ready, you can save it as an image: right-click the chart, select "Save image as", and save the image to your computer.
© 2011 Charts Builder. Developed by Anton Shevchuk
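The Charts Builder data rules above (positive numbers, -1 for a missing value, comma/space/semicolon separators, and x,y pairs for line charts) are easy to check before pasting data in. Below is a minimal Python sketch, assuming those rules are the whole format; `parse_series` and `parse_pairs` are hypothetical helpers written for illustration, not part of Charts Builder itself.

```python
import re
from typing import List, Optional, Tuple


def parse_series(text: str) -> List[Optional[float]]:
    """Parse a data string into numbers, treating -1 as a missing value (None).

    Values may be separated by commas, semicolons, or whitespace.
    Negative numbers other than -1 are rejected.
    """
    tokens = [tok for tok in re.split(r"[,;\s]+", text.strip()) if tok]
    values: List[Optional[float]] = []
    for tok in tokens:
        number = float(tok)
        if number == -1:
            values.append(None)  # -1 marks a missing value
        elif number < 0:
            raise ValueError(f"only positive numbers or -1 are allowed, got {tok!r}")
        else:
            values.append(number)
    return values


def parse_pairs(text: str) -> List[Tuple[Optional[float], Optional[float]]]:
    """Group line-chart data (x1,y1, x2,y2, ...) into (x, y) pairs."""
    values = parse_series(text)
    if len(values) % 2 != 0:
        raise ValueError("line data needs an even number of values (x, y pairs)")
    # Even positions are x coordinates, odd positions are y coordinates
    return list(zip(values[0::2], values[1::2]))
```

For example, `parse_series("23, 432, 456, 341")` gives `[23.0, 432.0, 456.0, 341.0]`, and `parse_series("23;-1 456")` gives `[23.0, None, 456.0]`.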
A brief introduction to “apply” in R | What You're Doing Is Rather Desperate
At any R Q&A site, you’ll frequently see an exchange like this one: Q: How can I use a loop to […insert task here…]? A: Don’t. Use one of the apply functions. So, what are these wondrous apply functions and how do they work? I think the best way to figure out anything in R is to learn by experimentation, using embarrassingly trivial data and functions. Let’s examine each of those.
1. apply
Description: “Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.” OK, we know about vectors/arrays and functions, but what are these “margins”? That last example was rather trivial; you could just as easily do “m[, 1:2]/2”, but you get the idea.
2. by
Updated 27/2/14: note that the original example in this section no longer works; use colMeans instead of mean.
Description: “Function ‘by’ is an object-oriented wrapper for ‘tapply’ applied to data frames.” The by function is a little more complex than that. The replicate function is very useful.
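To make “margins” concrete: in R, `apply(m, 1, f)` calls `f` on each row of a matrix and `apply(m, 2, f)` calls it on each column. The post itself works in R; the sketch below is only a rough Python analogue, using a hypothetical `apply_margin` helper, meant to show what a margin is rather than to reproduce R's behavior. It handles only 2-D input and does not simplify results into a matrix the way R can.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def apply_margin(matrix: List[List[float]], margin: int,
                 func: Callable[[Sequence[float]], T]) -> List[T]:
    """Rough Python analogue of R's apply(m, MARGIN, FUN) for a 2-D matrix.

    margin=1 calls func on each row, margin=2 on each column, mirroring
    R's 1-based MARGIN convention. Results always come back as a plain list.
    """
    if margin == 1:
        return [func(row) for row in matrix]
    if margin == 2:
        # zip(*matrix) transposes the rows into columns
        return [func(list(col)) for col in zip(*matrix)]
    raise ValueError("margin must be 1 (rows) or 2 (columns)")


# Same layout as R's matrix(1:6, ncol = 2), which fills column by column
m = [[1, 4],
     [2, 5],
     [3, 6]]
row_sums = apply_margin(m, 1, sum)  # [5, 7, 9]
col_sums = apply_margin(m, 2, sum)  # [6, 15]
```

The loop lives inside the helper, which is the point of the “Don’t. Use one of the apply functions” advice: you say what to do per margin, not how to iterate.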
How fast does Miles Teller play in Whiplash? | Alternative Interfaces
EDIT 05 Sep. 2015: As pointed out on reddit, I misused the concept of Beats Per Minute (BPM). What I meant to write was Strokes Per Minute (SPM). Released in 2014, Whiplash focuses on a promising young drummer (Miles Teller) pursuing his dream of greatness. Unfortunately, I am neither a musician nor an enlightened enthusiast, so what strikes me most is Miles Teller's ability to play very fast. The metric used is Beats Per Minute (BPM), which, for drums, simplifies to how many times the drummer hits his instrument per minute. Now let's see how Miles performs in the first scene of the movie. The BPM of the final scene (from 2:36 to 3:56 of the embedded video) has also been analyzed. Truly fast in my opinion. And finally, let's look at the BPM of the challenge given by the World's Fastest Drummer to Miles Teller. But as Miles said, “Why would you challenge a guy who played in some garage bands in Florida and has a fun time doing it?”
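The strokes-per-minute figure is plain arithmetic: SPM = strokes counted divided by the window length in minutes. A minimal Python sketch, assuming the strokes have already been counted for a window given as m:ss video timestamps; `to_seconds` and `strokes_per_minute` are hypothetical helpers, and the stroke count below is made up to show the arithmetic, not a figure from the post.

```python
def to_seconds(timestamp: str) -> int:
    """Convert an "m:ss" video timestamp to seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def strokes_per_minute(stroke_count: int, start: str, end: str) -> float:
    """Average strokes per minute over the window from start to end."""
    duration = to_seconds(end) - to_seconds(start)
    if duration <= 0:
        raise ValueError("end must come after start")
    return stroke_count * 60 / duration


# The final-scene window runs from 2:36 to 3:56, i.e. 80 seconds.
window = to_seconds("3:56") - to_seconds("2:36")  # 80
# 400 strokes is a hypothetical count, used only to illustrate the formula.
example_spm = strokes_per_minute(400, "2:36", "3:56")  # 300.0
```

Averaging over a fixed window hides bursts; counting strokes in shorter sub-windows would show where the playing peaks.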
Restaurant Revenue Prediction | Kaggle
Kaggle Scripts enable you to run R, Python, Julia, or R Markdown code directly on competition datasets. Script computations currently have the following restrictions: input data files are stored in the read-only /kaggle/input directory, and your scripts execute with /kaggle/working set as the current working directory.
PhysioBank Archive Index
This page lists all currently available databases in the PhysioBank archives, organized according to the types of signals and annotations contained in each database. If you prefer, you can view separate lists of these databases organized by class: Class 1 (completed reference databases); Class 2 (archival copies of raw data that support published research, contributed by authors or journals); Class 3 (other contributed collections of data, including works in progress). We make class 2 and class 3 data available via PhysioNet as a service to the research community. Contributed data are placed in classes 2 and 3 on acceptance, and may be admitted to class 1 after review and a public comment period. On this page, listings within each group are ordered by class, and then alphabetically by database name. Multi-Parameter Databases: these databases include a variety of digitized physiologic signals in each recording, e.g. [Class 1] MGH/MF Waveform Database.
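For the Kaggle Scripts restrictions described above (read-only input under /kaggle/input, /kaggle/working as the current working directory), a script typically reads from the first and writes results relative to the second. A minimal Python sketch under those stated assumptions; `list_input_files` and `write_output` are hypothetical helpers, and the directory arguments are parameters so the same code can be tried outside Kaggle.

```python
import os
from pathlib import Path
from typing import List, Optional


def list_input_files(input_dir: str = "/kaggle/input") -> List[str]:
    """Return the relative paths of all files under the (read-only) input directory."""
    root = Path(input_dir)
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def write_output(name: str, text: str, working_dir: Optional[str] = None) -> Path:
    """Write a result file into the working directory.

    By default this uses the current working directory, which Kaggle Scripts
    set to /kaggle/working; the input directory is read-only, so outputs
    must not go there.
    """
    target = Path(working_dir if working_dir is not None else os.getcwd()) / name
    target.write_text(text)
    return target
```

Keeping reads and writes in separate helpers makes the read-only boundary explicit: nothing in the sketch ever writes under the input directory.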
I'm going to give access to the datasets and data tools groups to keep everything tidy ;) by dishwasherz Jul 27
A large set of very varied datasets. One to keep handy! by simd3v Jul 27