Analyze Core | data is beautiful, data is a story. Using R to quickly create an interactive online map using the leafletR package | Technical Tidbits From Spatial Analysis & Data Science. Making Maps with R | The Molecular Ecologist. First off, thanks to Tim and Jeremy for the invitation to write a guest post here on using R to make maps! As a brief introduction, my name is Kim Gilbert, and I am a Ph.D. student at the University of British Columbia working with Mike Whitlock. I am broadly interested in population genetics and population structure, and am currently studying local adaptation in a tree species.
If you want to know more, check out my website, where I also have this tutorial as a .pdf presentation. Okay, onward with R! Apologies in advance that the R code provided will not show color-coded text (a limitation of WordPress), but I decided it is more useful to leave it as plain text, allowing copying and pasting, rather than insert screenshots that might look nicer but would require retyping on your part. In the field of molecular ecology we see many, many maps. R is free, it is open source, and users are constantly contributing new packages and functions. Make a Simple Map Plotting GPS Data & Shapefiles. Visual Business Intelligence.
We typically think of quantitative scales as linear, with equal quantities from one labeled value to the next. For example, a quantitative scale ranging from 0 to 1000 might be subdivided into equal intervals of 100 each. Linear scales seem natural to us. If we took a car trip of 1000 miles, we might imagine that distance as subdivided into ten 100-mile segments. It isn’t likely that we would imagine it subdivided into four logarithmic segments consisting of 1, 9, 90, and 900 mile intervals. Similarly, we think of time’s passage—also quantitative—in terms of days, weeks, months, years, decades, centuries, or millennia; intervals that are equal (or in the case of months, roughly equal) in duration. Logarithms and their scales are quite useful in mathematics and at times in data analysis, but for presenting data they are only useful in those relatively rare cases when the audience consists of people trained to think in logarithms.
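The 1, 9, 90, and 900 mile segments come from cutting the 0-to-1000 range at equal powers of ten. A small numeric sketch (Python, with NumPy, chosen here just for illustration):

```python
import numpy as np

# Dividing a 1000-mile trip at equal logarithmic steps: the cut points sit
# at 10^0, 10^1, 10^2, and 10^3 miles.
boundaries = np.logspace(0, 3, 4)                 # [1, 10, 100, 1000]
segments = np.diff(np.concatenate([[0.0], boundaries]))
print(segments)                                   # 1, 9, 90, 900 mile intervals
```

Equal steps on the log scale are multiplicative, not additive, which is exactly why untrained readers misjudge them.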
VC blog. Posted: November 26th, 2014 | Author: Manuel Lima | Filed under: Uncategorized | No Comments » As some attentive users of Visual Complexity might have noticed, the number of projects featured on the website has slowly come to a halt, with the perpetual grand total of 777 being a grieving reminder of inactivity for well over a year. Today, if you go to the main page and look at the top right corner, you will see an invigorating new message: “Indexing 782 projects”. Of course I didn’t want to write this blog post just to announce that five new projects have been added to the database.
This recent addition is part of a larger plan I’ve been wanting to share with you for some time. In October 2015, Visual Complexity will celebrate its 10th Anniversary, a significant feat considering the life-span of many online projects, and an eerie reminder that a long time has gone by since I launched the website after graduating from an MFA program at Parsons School of Design. I was immediately hooked. Treemaps. Information aesthetics - Data Visualization & Information Design. Principal Component Analysis step by step. In this article I want to explain how a Principal Component Analysis (PCA) works by implementing it in Python step by step. At the end we will compare the results to the more convenient Python PCA() classes that are available through the popular matplotlib and scipy libraries and discuss how they differ.
The main purposes of a principal component analysis are to identify patterns in the data and to use those patterns to reduce the dimensionality of the dataset with minimal loss of information. Here, our desired outcome of the principal component analysis is to project a feature space (our dataset consisting of n x d-dimensional samples) onto a smaller subspace that represents our data "well". A possible application would be a pattern classification task, where we want to reduce the computational costs and the error of parameter estimation by reducing the number of dimensions of our feature space by extracting a subspace that describes our data "best". What is a "good" subspace? FlowingData | Data Visualization, Infographics, and Statistics.
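The projection described above (center the data, compute the covariance matrix, eigendecompose, project onto the top eigenvectors) can be sketched in NumPy. This is a generic sketch on synthetic data, not the article's own code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 3-D samples with most variance along the first two axes
X = rng.normal(size=(40, 3)) * np.array([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)            # 1. center the data
C = np.cov(Xc, rowvar=False)       # 2. covariance matrix (3 x 3)
vals, vecs = np.linalg.eigh(C)     # 3. eigenvalues / eigenvectors
order = np.argsort(vals)[::-1]     #    sort by descending variance
vals, vecs = vals[order], vecs[:, order]
k = 2
Z = Xc @ vecs[:, :k]               # 4. project onto the top-k subspace
print(Z.shape)                     # (40, 2)
```

The eigenvectors with the largest eigenvalues span the subspace that retains the most variance, which is the "best" subspace in the sense the article goes on to make precise.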
A mashup. There is an R package called sqldf that allows you to use SQL commands to extract data from an R data frame. Two things are needed: it must be installed on your machine (once), and it must be loaded in each R session where it is used. You can install sqldf by doing (one time only, and assuming an internet connection):

    install.packages("sqldf")

Then in each R session where you want to use it:

    require(sqldf)

To simplify the examples, we’ll slightly modify one of the inbuilt data frames:

    myCO2 <- CO2
    attributes(myCO2) <- attributes(CO2)[c("names", "row.names", "class")]
    class(myCO2) <- "data.frame"

Note that the character between C and 2 is a capital O and not a zero. Column names: in R the colnames function returns the names of the columns. Subsetting columns. All columns. Error Statistics Philosophy. Machine Learning (Theory)
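sqldf's trick of running SQL against in-memory tabular data has analogues outside R. For comparison, a minimal Python sketch using the standard-library sqlite3 module and made-up rows (not the real CO2 data):

```python
import sqlite3

# Same idea as sqldf: query in-memory tabular data with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE co2 (plant TEXT, conc REAL, uptake REAL)")
rows = [("Qn1", 95, 16.0), ("Qn1", 175, 30.4), ("Qc1", 95, 14.2)]  # hypothetical
conn.executemany("INSERT INTO co2 VALUES (?, ?, ?)", rows)
result = conn.execute(
    "SELECT plant, AVG(uptake) FROM co2 GROUP BY plant ORDER BY plant"
).fetchall()
print(result)
```

The appeal in both languages is the same: people who already think in SQL can aggregate and filter without learning a second idiom.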
Data Mining: Text Mining, Visualization and Social Media. The Turing Test for artificial intelligence is a reasonably well understood idea: if, through a written form of communication, a machine can convince a human that it too is a human, then it passes the test. The elegance of this approach (which I believe is its primary attraction) is that it avoids any troublesome definition of intelligence and appeals to an innate ability in humans to detect entities which are not 'one of us'.
This form of AI is the one that is generally presented in entertainment (films, novels, etc.). However, to an engineer, there are some problems with this as the accepted popular idea of artificial intelligence. I believe that software engineering can be evaluated in a simple measure of productivity. We either create things that make the impossible possible, going from 0 to 1, or we create things that amplify some value, generally a human's ability to do something, going from X to nX.
Hilarymason.com. Big Data, Plainly Spoken (aka Numbers Rule Your World) Two years ago, Wired breathlessly extolled the virtues of A/B testing (link). A lot of Web companies are at the forefront of running hundreds or thousands of tests daily. The reality is that most A/B tests fail. A/B tests fail for many reasons. Typically, business leaders consider a test to have failed when the analysis fails to support their hypothesis.
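In practice, "the analysis fails to support the hypothesis" usually means a significance test on two conversion rates comes back non-significant. A minimal sketch of the standard two-proportion z-test, with entirely hypothetical counts:

```python
import math

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical: variant A converts 120/2400, variant B converts 138/2400
z, p = two_proportion_ztest(120, 2400, 138, 2400)
```

With numbers like these, p lands well above 0.05, so the test "fails" in the business sense even though nothing went wrong with the experiment itself, which is the distinction the post is driving at.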
"We ran all these tests varying the color of the buttons, and nothing significant ever surfaced, and it was all a waste of time!" Bad outcome isn't the primary reason for A/B test failure. 1. 2. 3. These issues are often ignored or dismissed. The Facebook Data Science team just launched an open platform for running online experiments, called PlanOut. The rest of this post gets into some technical, sausage-factory stuff, so be warned. Bad design is when the experiment is set up in such a way that it does not provide data to answer the research question. The problem is noise in your data. The examples of failed designs are endless. Statisfaction | I can't get no. Doing Bayesian Data Analysis. Three-Toed Sloth. "Learning Spatio-Temporal Dynamics": Boasting about my student's just-completed doctoral dissertation. Over 2500 words extolling new statistical methods, plus mathematical symbols and ugly computer graphics, without any actual mathematical content, or even enough detail to let others in the field judge the results.
On Monday, my student Georg M. Goerg, last seen here leading Skynet to declare war on humanity at his dissertation proposal, defeated the snake — that is, defended his thesis: Learning Spatio-Temporal Dynamics: Nonparametric Methods for Optimal Forecasting and Automated Pattern Discovery. Many important scientific and data-driven problems involve quantities that vary over space and time. Because these are data-driven problems, it is important to have methods and algorithms that work well in practice for a wide range of spatio-temporal processes and various data types.
PDF [7 Mb] Since this is a simulation, we can work out the true predictive distribution.
Blog.
Old tails: a crude power law fit on ebook sales. We use R to take a very brief look at the distribution of e-book sales on Amazon.com. Read more…
You don’t need to understand pointers to program using R.
Practical Data Science with R: Release date announced. It took a little longer than we’d hoped, but we did it! Practical Data Science with R will be released on April 2nd (physical version). The eBook version will follow soon after, on April 15th. If you haven’t yet, order it now! (softbound, 416 pages, black and white; includes access to color PDF, ePub and Kindle when available)
Can a classifier that never says “yes” be useful? Many data science projects and presentations are needlessly derailed by not having set shared, business-relevant quantitative expectations early on (for some advice see Setting expectations in data science projects).
Some statistics about the book. The release date for Zumel, Mount “Practical Data Science with R” is getting close.
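The "crude power law fit" post works in R; the crude method itself, ordinary least squares on the log-log scale, is language-neutral. A comparable sketch in Python on synthetic sales data (all numbers made up):

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic "sales vs. rank" data following sales ~ C * rank^(-a), a = 0.8,
# with multiplicative noise
rank = np.arange(1, 201)
sales = 1000.0 * rank ** -0.8 * np.exp(rng.normal(0, 0.1, rank.size))

# Crude power-law fit: a straight line on the log-log scale; the slope
# estimates the (negative) exponent
slope, intercept = np.polyfit(np.log(rank), np.log(sales), 1)
print(-slope)  # recovered exponent, close to 0.8
```

Log-log regression is biased for heavy-tailed data in ways the post's "crude" label concedes; maximum-likelihood estimators are the careful alternative.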
You can read the rest of the article here. Access to Statistics. Freakonometrics | An Academic Blogging Experiment. Bayesianbiologist | Corey Chivers on P(A|B) ∝ P(B|A)P(A)