Principal Component Analysis Often, it is not helpful or informative to look only at the pairwise correlations or covariances among all the variables in a dataset. A preferable approach is to derive new variables from the original ones that preserve most of the information carried by their variances. Principal component analysis is a widely used statistical method for reducing data with many dimensions (variables) by projecting the data onto fewer dimensions using linear combinations of the variables, known as principal components. The new projected variables (principal components) are uncorrelated with each other and are ordered so that the first few components retain most of the variation present in the original variables.
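The idea can be sketched for the simplest case of two variables, where the covariance matrix is 2x2 and its eigen-decomposition has a closed form. This is only an illustration under stated assumptions: the function name `pca_2d` and the small dataset are made up for the example, and Python is used here purely for a dependency-free sketch.

```python
import math

def pca_2d(xs, ys):
    """PCA for two variables (hypothetical helper, illustration only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Center each variable on its mean.
    cx = [x - mx for x in xs]
    cy = [y - my for y in ys]
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(a * a for a in cx) / (n - 1)
    syy = sum(b * b for b in cy) / (n - 1)
    sxy = sum(a * b for a, b in zip(cx, cy)) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    mean = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    eigvals = (mean + radius, mean - radius)
    # Direction of the first principal axis; the second is perpendicular.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    # Scores: linear combinations of the centered variables.
    pc1 = [a * v1[0] + b * v1[1] for a, b in zip(cx, cy)]
    pc2 = [a * v2[0] + b * v2[1] for a, b in zip(cx, cy)]
    return eigvals, (v1, v2), (pc1, pc2)

# Made-up, strongly correlated toy data (not from any cited source).
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

eigvals, vectors, scores = pca_2d(xs, ys)
share = eigvals[0] / (eigvals[0] + eigvals[1])
print("variance retained by the first component:", round(share, 3))
```

The eigenvalues are the variances of the two components, so their ratio to the total shows how much variation the first component alone retains. For more than two variables a numerical eigen-solver would be needed instead of the closed form.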
Visualizing Data: Challenges to Presentation of Quality Graphics—and Solutions Naomi Robbins, a consultant and seminar leader who specializes in the graphical display of data, offers training on the effective presentation of data. She also reviews documents and presentations for clients. She is the author of Creating More Effective Graphs. Three common challenges statisticians and others face when preparing data for presentation include poor options and defaults in many software packages used for creating graphs, managers and colleagues who are socialized to expect figures that attract attention, and poor instructions from conference organizers. This article addresses each of these challenges and offers some tips for dealing with them.
Rtips. Revival 2012! Paul E. Johnson <pauljohn @ ku.edu> The original Rtips started in 1999. It became difficult to update because of limitations in the software with which it was created.
Implementation of a basic reproducible data analysis workflow In a previous post, I described the principles of my basic reproducible data analysis workflow. Today, let’s be more practical and see how to implement it. Note that it is a basic workflow.
Statistics with R Warning Here are the notes I took while discovering and using the statistical environment R. However, I do not claim any competence in the domains I tackle: I hope you will find those notes useful, but keep your eyes open -- errors and bad advice are still lurking in those pages... Should you want it, I have prepared a quick-and-dirty PDF version of this document. The old, French version is still available, in HTML or as a single file. You may also want all the code in this document.
knitr: Elegant, flexible and fast dynamic report generation with R Overview The knitr package was designed to be a transparent engine for dynamic report generation with R, to solve some long-standing problems in Sweave, and to combine features in other add-on packages into one package (knitr ≈ Sweave + cacheSweave + pgfSweave + weaver + animation::saveLatex + R2HTML::RweaveHTML + highlight::HighlightWeaveLatex + 0.2 * brew + 0.1 * SweaveListingUtils + more). This package is developed on GitHub; for installation instructions and FAQs, see README. This website serves as the full documentation of knitr, and you can find the main manual, the graphics manual and other demos / examples here.
Empirical Software Engineering using R: first draft available for download A draft of my book Empirical Software Engineering using R is now available for download. The book essentially comes in two parts: statistical techniques that are useful for analyzing software engineering data. This draft release contains most of the techniques I plan to cover. Software - Miquel De Cáceres Ainsa Indicspecies R package Indicator species are species that are used as ecological indicators of community or habitat types, environmental conditions, or environmental changes. In order to determine indicator species, the characteristic to be predicted is represented in the form of a classification of the sites, which is compared to the patterns of distribution of the species found in that set of sites. 'Indicspecies' is an R package that contains a set of functions to assess the strength of relationship between species and a classification of sites. As such, it includes the well-known IndVal method (Dufrêne & Legendre 1997) and extends it by allowing the user to study combinations of site groups (De Cáceres et al. 2010).