
Data Science Specialization



IBM Many Eyes
Democratizing visualization. Advanced visualization from IBM can help you gain insight from the myriad of data that your company generates. You can understand much more about the underlying numbers in your data when you can see them. For your visualization to be effective, you need technology that simplifies the visualization creation process and guidance from visualization specialists who can show you the best format for presenting your data.

List
A list is a generic vector containing other objects. For example, the following variable x is a list containing copies of three vectors n, s, b, and a numeric value 3.

> n = c(2, 3, 5)
> s = c("aa", "bb", "cc", "dd", "ee")
> b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
> x = list(n, s, b, 3)   # x contains copies of n, s, b

List Slicing
We retrieve a list slice with the single square bracket "[]" operator.
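As a minimal sketch of the slicing described above, the following reuses the same n, s, b and x. The names slice, member and pair are hypothetical, introduced only for illustration, and the double square bracket lookup is added here for contrast rather than taken from the excerpt.

```r
# Rebuild the list from the excerpt.
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
b <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
x <- list(n, s, b, 3)

# Single brackets return a slice: still a list, here with one member.
slice <- x[2]

# Double brackets return the member itself: a character vector.
member <- x[[2]]

# An index vector selects several members at once, again as a list.
pair <- x[c(2, 4)]
```

The distinction matters in practice: slice is a list wrapping a copy of s, so functions that expect a character vector need member instead.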

Read Statistical inference for data science
About this book: This book is written as a companion book to the Statistical Inference Coursera class as part of the Data Science Specialization. However, if you do not take the class, the book mostly stands on its own.

Data Repository List
From Open Access Directory. This list is part of the Open Access Directory. This is a list of repositories and databases for open data. Please annotate the entries to indicate the hosting organization, scope, licensing, and usage restrictions (if any). If a repository is open in some respects but not others, please include it with an annotation rather than exclude it.

R Introduction
We offer here a couple of introductory tutorials on basic R concepts. They serve as background material for our main tutorial series Elementary Statistics with R. The only hardware requirement for most of the R tutorials is a PC with the latest free open source R software installed.

aggregate {stats}
Compute Summary Statistics of Data Subsets
Description: Splits the data into subsets, computes summary statistics for each, and returns the result in a convenient form.
Usage

New York Times APIs
The Article Search API: Search Times articles from 1851 to today, retrieving headlines, abstracts and links to associated multimedia.
The Books API
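To make the aggregate {stats} entry above concrete, here is a small sketch using the formula interface of aggregate(). The data frame scores and its columns group and value are made up for illustration and do not come from the excerpt.

```r
# Hypothetical toy data: five observations in two groups.
scores <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Split value by group, compute the mean of each subset,
# and return the result as a data frame with one row per group.
means <- aggregate(value ~ group, data = scores, FUN = mean)
```

Any function that reduces a vector to a summary (for example median or length) can be passed as FUN in the same way.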

Axes and Text
Many high level plotting functions (plot, hist, boxplot, etc.) allow you to include axis and text options (as well as other graphical parameters). For example:

# Specify axis options within plot()
plot(x, y, main="title", sub="subtitle",
     xlab="X-axis label", ylab="Y-axis label",
     xlim=c(xmin, xmax), ylim=c(ymin, ymax))

A deterministic statistical machine
As Roger pointed out, the most recent batch of Y Combinator startups included a bunch of data-focused companies. One of these companies, StatWing, is a web-based tool for data analysis that looks like an improvement on SPSS with more plain text, more visualization, and a lot of the technical statistical details “under the hood”. I first read about StatWing on TechCrunch, under the title “How Statwing Makes It Easier To Ask Questions About Data So You Don’t Have To Hire a Statistical Wizard”.

AWS Public Data Sets
A data set containing Google Books n-gram corpora. This data set is freely available on Amazon S3 in a Hadoop-friendly file format and is licensed under a Creative Commons Attribution 3.0 Unported License. The original dataset is available from
Last Modified: Jan 12, 2015 21:46 PM GMT

SparkR (R on Spark) - Spark 1.4.0 Documentation
SparkR is an R package that provides a light-weight frontend to use Apache Spark from R. In Spark 1.4.0, SparkR provides a distributed data frame implementation that supports operations like selection, filtering, and aggregation (similar to R data frames and dplyr) but on large datasets. A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R, but with richer optimizations under the hood. DataFrames can be constructed from a wide array of sources such as structured data files, tables in Hive, external databases, or existing local R data frames.