The Work of Edward Tufte and Graphics Press
Edward Tufte is a statistician and artist, and Professor Emeritus of Political Science, Statistics, and Computer Science at Yale University. He wrote, designed, and self-published four classic books on data visualization. The New York Times described ET as the "Leonardo da Vinci of data," and Business Week called him the "Galileo of graphics." He is now writing a book/film, The Thinking Eye, and constructing a 234-acre tree farm and sculpture park in northwest Connecticut, which will show his artworks and remain open space in perpetuity. He founded Graphics Press, ET Modern gallery/studio, and Hogpen Hill Farms LLC. Visual Display of Quantitative Information 200 pages

Datasets for Data Mining, Analytics and Knowledge Discovery
See also Data repositories. AssetMacro, historical data of Macroeconomic Indicators and Market Data.
List of academic databases and search engines
See the general list of search engines for all-purpose search engines that can be used for academic purposes, and bibliographic databases for information about databases giving bibliographic information for finding books and journal articles. Note that "free" or "subscription" can refer either to the availability of the database or to that of the journal articles included. This has been indicated as precisely as possible in the lists below. See also

Large Network Dataset Collection
Social networks, networks with ground-truth communities, communication networks, citation networks, collaboration networks, web graphs
Options: Chunk options and package options
The knitr package shares most options with Sweave, but some were dropped or changed and some new options were added. The default values are in the parentheses below. Note that the chunk label for each chunk is assumed to be unique, i.e., no two chunks share the same label.

Axes and Text
Many high-level plotting functions (plot, hist, boxplot, etc.) allow you to include axis and text options (as well as other graphical parameters). For example:

# Specify axis options within plot()
plot(x, y, main="title", sub="subtitle",
     xlab="X-axis label", ylab="y-axis label",
     xlim=c(xmin, xmax), ylim=c(ymin, ymax))

For finer control or for modularization, you can use the functions described below.
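
A minimal, runnable sketch of the plot() call quoted above. The data in x and y and the numeric axis limits are illustrative assumptions, since the source leaves x, y, xmin, xmax, ymin, and ymax undefined. The second half shows one way to get finer control with base R's title(), axis(), and box(); whether these are the functions the source goes on to describe is also an assumption.

```r
# Illustrative data (an assumption; the source does not define x or y)
x <- 1:10
y <- x^2

# Open a null PDF device so the sketch runs without a display
pdf(NULL)

# Axis and text options passed directly to plot()
plot(x, y,
     main = "title", sub = "subtitle",
     xlab = "X-axis label", ylab = "Y-axis label",
     xlim = c(0, 12), ylim = c(0, 120))

# The same labels added in separate steps for finer control
plot(x, y, ann = FALSE, axes = FALSE, xlim = c(0, 12), ylim = c(0, 120))
title(main = "title", sub = "subtitle",
      xlab = "X-axis label", ylab = "Y-axis label")
axis(side = 1, at = seq(0, 12, by = 2))    # bottom axis with custom ticks
axis(side = 2, at = seq(0, 120, by = 20))  # left axis with custom ticks
box()                                      # frame around the plot region

invisible(dev.off())
```

Suppressing annotations and axes in the second plot() call (ann = FALSE, axes = FALSE) keeps the labels from being drawn twice when title() and axis() add them afterwards.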
BBC Datasets - Machine Learning Group (UCD)
Two news article datasets, originating from BBC News, provided for use as benchmarks for machine learning research. These datasets are made available for non-commercial and research purposes only, and all data is provided in pre-processed matrix format. If you make use of these datasets please reference the publication:

6 dataset lists curated by data scientists
November 21, 2013, Scott Haylon
Since we do a lot of experimenting with data, we’re always excited to find new datasets to use with Mortar.

50 Resources for Getting the Most Out of Google Analytics
Google Analytics is a very useful free tool for tracking site statistics. For most users, however, it never becomes more than just a pretty interface with interesting graphs. The resources below will help anyone, from the beginner to those who have been using Google Analytics for some time, learn how to get the most out of this great tool. For Beginners
Do more with dates and times in R with lubridate 1.3.0
Note: This vignette is an updated version of the blog post first published at r-statistics.
Lubridate is an R package that makes it easier to work with dates and times. Below is a concise tour of some of the things lubridate can do for you. Lubridate was created by Garrett Grolemund and Hadley Wickham. Parsing dates and times

The ClueWeb09 Dataset
The ClueWeb09 dataset was created to support research on information retrieval and related human language technologies. It consists of about 1 billion web pages in ten languages that were collected in January and February 2009. The dataset is used by several tracks of the TREC conference.
Dataset Specifications
Web Pages: 1,040,809,705 web pages, in 10 languages
Size: 5 TB compressed (25 TB uncompressed)
24 Data Science Resources to Keep Your Finger on the Pulse
There are lots of resources out there to learn about, or to build upon what you already know about, data science. But where do you start? What are some of the best or most authoritative sources?