
PCA - CA - FA - MDS


Computing and visualizing PCA in R | Thiago G. Martins

Following my introduction to PCA, I will demonstrate how to apply and visualize PCA in R. There are many packages and functions that can apply PCA in R. In this post I will use the function prcomp from the stats package. I will also show how to visualize PCA in R using base R graphics. However, my favorite visualization function for PCA is ggbiplot, which is implemented by Vince Q. Vu and available on GitHub. Please let me know if you have better ways to visualize PCA in R.

Computing the Principal Components (PC)

I will use the classical iris dataset for the demonstration. We will apply PCA to the four continuous variables and use the categorical variable to visualize the PCs later. Since skewness and the magnitude of the variables influence the resulting PCs, it is good practice to apply a skewness transformation and to center and scale the variables before applying PCA.

Analyzing the results

The prcomp function returns an object of class prcomp, which has several methods available (print, summary, plot, predict and biplot).

Factoextra R package: Quick Multivariate data analysis (PCA, CA, MCA) and visualization - R software and data mining

The R package factoextra provides some easy-to-use functions to extract and visualize the output of PCA (Principal Component Analysis), CA (Correspondence Analysis) and MCA (Multiple Correspondence Analysis) functions from several packages: PCA, CA, MCA [FactoMineR]; prcomp and princomp [stats]; dudi.pca, dudi.coa, dudi.acm [ade4]; ca [ca]; corresp [MASS].
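The preprocessing recommended in the Martins excerpt can be sketched as follows. We assume a log transform is the skewness correction, which is valid here because all four iris measurements are strictly positive. The object names log.ir, ir.species and ir.pca are illustrative.

```r
# Preprocessing + PCA as described above, assuming a log transform is the
# skewness correction (every iris measurement is strictly positive).
data(iris)
log.ir <- log(iris[, 1:4])          # log-transform the four continuous variables
ir.species <- iris[, 5]             # categorical variable, kept for plotting later

# center = TRUE subtracts column means; scale. = TRUE divides by column SDs
ir.pca <- prcomp(log.ir, center = TRUE, scale. = TRUE)

print(ir.pca)                       # standard deviations and rotation (loadings)
summary(ir.pca)                     # proportion of variance explained per PC
plot(ir.pca, type = "l")            # scree plot of the component variances
```

With scale. = TRUE every variable has unit variance, so the component variances (ir.pca$sdev^2) add up to the number of variables, here 4.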

The ggplot2 plotting system is used. Principal Component Analysis summarizes the information contained in continuous (i.e., quantitative) multivariate data by reducing the dimensionality of the data without losing important information. Correspondence Analysis (CA) is an extension of Principal Component Analysis suited to analyzing a large contingency table formed by two qualitative (categorical) variables. Multiple Correspondence Analysis (MCA) is an adaptation of CA to a data table containing more than two categorical variables. If you want to do this, factoextra makes it simple.

Get_pca: Extract the results for individuals/variables in Principal Component Analysis - R software and data mining

Extract all the results (coordinates, squared cosine, contributions) for the active individuals/variables from Principal Component Analysis (PCA) outputs:

get_pca(): Extract the results for variables and individuals
get_pca_ind(): Extract the results for individuals only
get_pca_var(): Extract the results for variables only

These functions are included in the factoextra package.

The package devtools is required for the installation, as factoextra is hosted on GitHub (factoextra has since also been released on CRAN, so install.packages("factoextra") works too):

    library("devtools")
    install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

Usage:

    get_pca(res.pca, element = c("var", "ind"))
    get_pca_ind(res.pca, ...)
    get_pca_var(res.pca)

These functions return a list of matrices containing all the results for the active individuals/variables, including:

coord: coordinates for the individuals/variables
cos2: cos2 for the individuals/variables
contrib: contributions of the individuals/variables

Principal component analysis

    data(iris)
    head(iris)
    res.pca <- prcomp(iris[, -5], scale. = TRUE)
    var <- get_pca_var(res.pca)
    var
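To see what get_pca_var() reports, the three matrices can be recomputed from a prcomp() fit in base R. This sketch assumes factoextra uses the standard definitions:

- coordinates are the loadings multiplied by the component standard deviations;
- cos2 is the squared coordinates;
- contrib is cos2 as a percentage of each component's total.

pca_var_results() is a hypothetical helper, not part of factoextra.

```r
# Base-R recomputation of the matrices that get_pca_var() reports, assuming
# the standard definitions. pca_var_results() is a hypothetical helper.
pca_var_results <- function(pca) {
  # coord: loadings scaled by each component's standard deviation
  # (for standardized data these are variable-component correlations)
  coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
  # cos2: squared coordinates = quality of representation on each component
  cos2 <- coord^2
  # contrib: cos2 as a percentage of each component's column total
  contrib <- sweep(cos2, 2, colSums(cos2), `/`) * 100
  list(coord = coord, cos2 = cos2, contrib = contrib)
}

data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
var.res <- pca_var_results(res.pca)
round(var.res$contrib, 1)
```

Each column of contrib sums to 100. Because all four components are kept, each row of cos2 sums to 1.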

Get_mca: Extract the results for individuals/variables in Multiple Correspondence Analysis

Extract all the results (coordinates, squared cosine and contributions) for the active individuals/variable categories from Multiple Correspondence Analysis (MCA) outputs:

get_mca(): Extract the results for variables and individuals
get_mca_ind(): Extract the results for individuals only
get_mca_var(): Extract the results for variables only

These functions are included in the factoextra package.

The package devtools is required for the installation, as factoextra is hosted on GitHub:

    library("devtools")
    install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

Usage:

    get_mca(res.mca, element = c("var", "ind"))
    get_mca_var(res.mca)
    get_mca_ind(res.mca)

These functions return a list of matrices containing all the results for the active individuals/variables, including:

coord: coordinates for the individuals/variables
cos2: cos2 for the individuals/variables
contrib: contributions of the individuals/variables

Multiple Correspondence Analysis

    library("FactoMineR")
    data(poison)
    poison.active <- poison[1:55, 5:15]
    res.mca <- MCA(poison.active, graph = FALSE)

Extract the results for variables:

    var <- get_mca_var(res.mca)
    var
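MCA can be read as a correspondence analysis of the indicator (disjunctive) matrix, which has one 0/1 column per category. The sketch below builds that matrix and its principal inertias in base R. indicator_matrix() and mca_eig() are hypothetical helpers, and three factors from the built-in mtcars data stand in for poison.active so that no extra package is needed. For this formulation the total inertia equals J/Q - 1, where J is the number of categories and Q the number of variables.

```r
# MCA as a correspondence analysis of the indicator (disjunctive) matrix.
# indicator_matrix() and mca_eig() are hypothetical helpers for illustration.
indicator_matrix <- function(df) {
  blocks <- lapply(names(df), function(v) {
    f <- factor(df[[v]])
    # one 0/1 column per category of variable v
    m <- sapply(levels(f), function(l) as.numeric(f == l))
    colnames(m) <- paste(v, levels(f), sep = "_")
    m
  })
  do.call(cbind, blocks)
}

mca_eig <- function(df) {
  Z <- indicator_matrix(df)
  P <- Z / sum(Z)                         # correspondence matrix
  r <- rowSums(P)                         # row masses
  cm <- colSums(P)                        # column (category) masses
  S <- (P - r %o% cm) / sqrt(r %o% cm)    # standardized residuals
  d <- svd(S)$d
  d[d > 1e-10]^2                          # non-trivial principal inertias
}

# Stand-in data: three categorical variables built from mtcars
car.factors <- data.frame(lapply(mtcars[, c("cyl", "gear", "am")], factor))
eig <- mca_eig(car.factors)
round(eig, 3)
sum(eig)   # total inertia; here J = 8 categories, Q = 3 variables
```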

Get_ca: Extract the results for rows/columns in Correspondence Analysis - R software and data mining

Extract all the results (coordinates, squared cosine, contributions and inertia) for the active row/column variables from Correspondence Analysis (CA) outputs:

get_ca(): Extract the results for rows and columns
get_ca_row(): Extract the results for rows only
get_ca_col(): Extract the results for columns only

These functions are included in the factoextra package. The package devtools is required for the installation, as factoextra is hosted on GitHub:

    library("devtools")
    install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

Usage:

    get_ca(res.ca, element = c("row", "col"))
    get_ca_col(res.ca)
    get_ca_row(res.ca)

These functions return a list of matrices containing all the results for the active rows/columns, including:

coord: coordinates for the rows/columns
cos2: cos2 for the rows/columns
contrib: contributions of the rows/columns

Correspondence Analysis

A Correspondence Analysis (CA) is performed using the function CA() [in FactoMineR] and the housetasks data [in factoextra]:

    library("FactoMineR")
    data(housetasks)
    res.ca <- CA(housetasks, graph = FALSE)

Extract the results for column variables:

    col <- get_ca_col(res.ca)
    col
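The coordinates and inertia that get_ca() extracts come from a decomposition of the contingency table. The sketch below shows the textbook construction in base R: CA as the singular value decomposition (SVD) of the matrix of standardized residuals. ca_svd() is a hypothetical helper, not the FactoMineR or factoextra API. The built-in HairEyeColor table (hair colour by eye colour) stands in for housetasks. With this construction, the total inertia equals the chi-square statistic divided by the grand total.

```r
# Correspondence analysis via the SVD of standardized residuals.
# ca_svd() is a hypothetical helper, not the FactoMineR/factoextra API.
ca_svd <- function(N) {
  P  <- N / sum(N)                        # correspondence matrix
  r  <- rowSums(P)                        # row masses
  cm <- colSums(P)                        # column masses
  S  <- (P - r %o% cm) / sqrt(r %o% cm)   # standardized residuals
  dec <- svd(S)
  k <- min(dim(N)) - 1                    # number of non-trivial dimensions
  d <- dec$d[seq_len(k)]
  list(
    eig = d^2,                            # principal inertias (eigenvalues)
    # principal coordinates: divide singular vectors by sqrt(mass), times d
    row = dec$u[, seq_len(k), drop = FALSE] / sqrt(r) * rep(d, each = nrow(N)),
    col = dec$v[, seq_len(k), drop = FALSE] / sqrt(cm) * rep(d, each = ncol(N))
  )
}

# Stand-in table: hair colour x eye colour counts (built-in HairEyeColor)
N <- unclass(margin.table(HairEyeColor, c(1, 2)))
res <- ca_svd(N)
round(100 * res$eig / sum(res$eig), 1)    # % of inertia per dimension
```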

Fviz_pca: Quick Principal Component Analysis data visualization - R software and data mining

Draw the graph of individuals/variables from the output of Principal Component Analysis (PCA). The following functions from the factoextra package are used:

fviz_pca_ind(): Graph of individuals
fviz_pca_var(): Graph of variables
fviz_pca_biplot() (or fviz_pca()): Biplot of individuals and variables

The package devtools is required for the installation, as factoextra is hosted on GitHub:

    library("devtools")
    install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

Principal component analysis

A principal component analysis (PCA) is performed using the built-in R function prcomp() and the iris data:

    data(iris)
    head(iris)

      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
    1          5.1         3.5          1.4         0.2  setosa
    2          4.9         3.0          1.4         0.2  setosa
    3          4.7         3.2          1.3         0.2  setosa
    4          4.6         3.1          1.5         0.2  setosa
    5          5.0         3.6          1.4         0.2  setosa
    6          5.4         3.9          1.7         0.4  setosa

    res.pca <- prcomp(iris[, -5], scale. = TRUE)

fviz_pca_ind(): Graph of individuals

    fviz_pca_ind(res.pca)
    # Change the title and axis labels (labs() comes from ggplot2, which factoextra loads)
    fviz_pca_ind(res.pca) + labs(title = "PCA", x = "PC1", y = "PC2")
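The Martins excerpt mentions visualizing PCA with base R graphics. The following is a sketch of a base-graphics analogue of the individuals plot: it draws scores on two chosen components, coloured by a grouping factor, with the percentage of variance in the axis labels. plot_pca_ind() is a hypothetical helper, not part of factoextra.

```r
# Base-graphics sketch of an individuals plot, coloured by group.
# plot_pca_ind() is a hypothetical helper, not part of factoextra.
plot_pca_ind <- function(pca, groups, axes = c(1, 2)) {
  scores <- pca$x[, axes]
  pct <- round(100 * pca$sdev^2 / sum(pca$sdev^2), 1)   # % variance per PC
  groups <- factor(groups)
  plot(scores, col = as.integer(groups), pch = 19,
       xlab = paste0("PC", axes[1], " (", pct[axes[1]], "%)"),
       ylab = paste0("PC", axes[2], " (", pct[axes[2]], "%)"),
       main = "Individuals - PCA")
  abline(h = 0, v = 0, lty = 2, col = "grey")            # origin reference lines
  legend("topright", legend = levels(groups),
         col = seq_along(levels(groups)), pch = 19)
  invisible(scores)
}

data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
plot_pca_ind(res.pca, iris$Species)
```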

Fviz_mca: Quick Multiple Correspondence Analysis data visualization - R software and data mining

Draw the graph of individuals/variables from the output of Multiple Correspondence Analysis (MCA). The following functions from the factoextra package are used:

fviz_mca_ind(): Graph of individuals
fviz_mca_var(): Graph of variable categories
fviz_mca_biplot() (or fviz_mca()): Biplot of individuals and variable categories

The package devtools is required for the installation, as factoextra is hosted on GitHub:

    library("devtools")
    install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

The default plot of MCA is a “symmetric” plot in which both rows and columns are in principal coordinates.

Alternatively, “rowprincipal” or “colprincipal” produce asymmetric plots with either rows in principal coordinates and columns in standard coordinates, or vice versa.

Multiple Correspondence Analysis

A Multiple Correspondence Analysis (MCA) is performed using the function MCA() [in FactoMineR] and the poison data [in FactoMineR]:

    library("FactoMineR")
    data(poison)
    poison.active <- poison[1:55, 5:15]
    res.mca <- MCA(poison.active, graph = FALSE)

fviz_mca_ind(): Graph of individuals

    fviz_mca_ind(res.mca)

Fviz_contrib - Quick visualization of row/column contributions - R software and data mining

This function can be used to visualize the contributions of rows/columns from the results of Principal Component Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA) functions. The function fviz_contrib() [in the factoextra package] is used. The package devtools is required for the installation, as factoextra is hosted on GitHub:

    devtools::install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

Usage:

    fviz_contrib(X, choice = c("row", "col", "var", "ind"), axes = 1,
                 fill = "steelblue", color = "steelblue",
                 sort.val = c("desc", "asc", "none"), top = Inf)

The function fviz_contrib() creates a barplot of row/column contributions.
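The following is a base-graphics sketch of such a contribution barplot for the variables of a prcomp() fit. It includes a dashed reference line at the contribution expected if every variable contributed equally, that is, 100 divided by the number of variables. We assume this is what factoextra's reference line represents. For prcomp(), each column of the rotation matrix has unit length, so the squared loadings of a component already sum to 1. plot_var_contrib() is a hypothetical helper.

```r
# Base-R sketch of a contribution barplot with a uniform-contribution
# reference line. plot_var_contrib() is a hypothetical helper.
plot_var_contrib <- function(pca, axis = 1) {
  # rotation columns have unit length, so squared loadings sum to 1;
  # multiplying by 100 gives percentage contributions
  contrib <- 100 * pca$rotation[, axis]^2
  contrib <- sort(contrib, decreasing = TRUE)
  ref <- 100 / length(contrib)          # expected % if all contributed equally
  barplot(contrib, col = "steelblue", border = "steelblue",
          ylab = "Contributions (%)", main = paste0("Contribution to PC", axis))
  abline(h = ref, lty = 2, col = "red")
  # return the values and the variables above the reference line
  invisible(list(contrib = contrib, above = names(contrib)[contrib > ref]))
}

data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
out <- plot_var_contrib(res.pca, axis = 1)
out$above
```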

For a given dimension, any row/column with a contribution above the reference line could be considered important in contributing to that dimension. The line marks the expected contribution if all rows/columns contributed uniformly.

Principal component analysis

A principal component analysis (PCA) is performed using the built-in R function prcomp() and the decathlon2 data [in factoextra].

Fviz_ca: Quick Correspondence Analysis data visualization using factoextra - R software and data mining

Graph of column/row variables from the output of Correspondence Analysis (CA). The following functions from the factoextra package are used:

fviz_ca_row(): Graph of row variables
fviz_ca_col(): Graph of column variables
fviz_ca_biplot(): Biplot of row and column variables
fviz_ca(): An alias of fviz_ca_biplot()

These functions are included in the factoextra package. The package devtools is required for the installation, as factoextra is hosted on GitHub:

    library("devtools")
    install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

The default plot of CA is a “symmetric” plot in which both rows and columns are in principal coordinates.

Alternatively, “rowprincipal” or “colprincipal” produce asymmetric plots with either rows in principal coordinates and columns in standard coordinates, or vice versa.

Correspondence Analysis

Correspondence Analysis (CA) is performed using the function CA() [in FactoMineR] and the housetasks data [in factoextra]:

    library("FactoMineR")
    data(housetasks)
    head(housetasks)

Eigenvalues: Quick data visualization with factoextra - R software and data mining

This article describes how to extract and visualize the eigenvalues/variances of the dimensions from the results of Principal Component Analysis (PCA), Correspondence Analysis (CA) and Multiple Correspondence Analysis (MCA) functions. The R software and the factoextra package are used. The functions described here are:

get_eig() (or get_eigenvalue()): Extract the eigenvalues/variances of the principal dimensions
fviz_eig() (or fviz_screeplot()): Plot the eigenvalues/variances against the number of dimensions

The package devtools is required for the installation, as factoextra is hosted on GitHub:

    library("devtools")
    install_github("kassambara/factoextra")

Load factoextra:

    library("factoextra")

Usage:

    get_eig(X)
    fviz_eig(X, choice = c("variance", "eigenvalue"), geom = c("bar", "line"),
             barfill = "steelblue", barcolor = "steelblue", linecolor = "black",
             ncp = 5, addlabels = FALSE, ...)
    get_eigenvalue(X)
    fviz_screeplot(...)
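The eigenvalue table and scree plot can also be sketched in base R for a prcomp() fit. In PCA the eigenvalues are the component variances (sdev^2). eig_table() is a hypothetical helper; its column names mirror those we believe get_eig() uses, and the bar-plus-line layout imitates fviz_eig() with ncp = 5.

```r
# Base-R sketch of an eigenvalue table and scree plot for a prcomp() fit.
# eig_table() is a hypothetical helper.
eig_table <- function(pca) {
  eig <- pca$sdev^2                         # component variances = eigenvalues
  pct <- 100 * eig / sum(eig)
  data.frame(eigenvalue = eig,
             variance.percent = pct,
             cumulative.variance.percent = cumsum(pct),
             row.names = paste0("Dim.", seq_along(eig)))
}

data(iris)
res.pca <- prcomp(iris[, -5], scale. = TRUE)
eig <- eig_table(res.pca)
print(eig)

# Scree plot: bars for % of variance, with a line through the bar tops
ncp <- min(5, nrow(eig))
mids <- barplot(eig$variance.percent[1:ncp], names.arg = seq_len(ncp),
                col = "steelblue", border = "steelblue",
                xlab = "Dimensions", ylab = "Percentage of explained variances")
lines(as.vector(mids), eig$variance.percent[1:ncp], type = "b", pch = 19)
```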

Principal Component Analysis

    data(iris)
    head(iris)
    res.pca <- prcomp(iris[, -5], scale. = TRUE)
    fviz_eig(res.pca)

Correspondence Analysis

    library("FactoMineR")
    res.ca <- CA(housetasks, graph = FALSE)
    fviz_eig(res.ca)

Reinventing the wheel for ordination biplots with ggplot2 – R is my friend

I’ll be the first to admit that the topic of plotting ordination results using ggplot2 has been visited many times over. As is my typical fashion, I started creating a package for this purpose without completely searching for existing solutions. Specifically, the ggbiplot and factoextra packages already provide almost complete coverage of plotting results from multivariate and ordination analyses in R. Being a stubborn individual, I couldn’t give up on my own package, so I started exploring ways to improve some of the functionality of biplot methods in these existing packages.

For example, ggbiplot and factoextra work almost exclusively with results from principal components analysis, whereas numerous other multivariate analyses can be visualized using the biplot approach. I started to write methods to create biplots for some of the more common ordination techniques, in addition to all of the functions I could find in R that conduct PCA.

I’ll repeat myself again. Cheers, Marcus.

Multidimensional Scaling (MDS) with R | blog.RDataMining.com

This page shows Multidimensional Scaling (MDS) with R. It demonstrates MDS with an example of automatic layout of Australian cities based on the distances between them. The layout obtained with MDS is very close to their locations on a map. First, the distances between 8 cities in Australia are loaded from dist.au <- read.csv("

Alternatively, we can download the file first and then read it into R from a local drive:

    dist.au <- read.csv("dist-Aus.csv")
    dist.au

Then we remove the first column, which holds the city acronyms, and use it as the row names:

    row.names(dist.au) <- dist.au[, 1]
    dist.au <- dist.au[, -1]
    dist.au

After that, we run Multidimensional Scaling (MDS) with the function cmdscale() and get the x and y coordinates:

    fit <- cmdscale(dist.au, eig = TRUE, k = 2)
    x <- fit$points[, 1]
    y <- fit$points[, 2]
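cmdscale() implements classical (Torgerson) MDS. The sketch below shows what that computes. It double-centres the squared distance matrix, takes the leading eigenvectors and scales them by the square roots of their eigenvalues. It then compares the result with cmdscale() on the built-in eurodist road distances, used here because the Australian distance file is not bundled with R. classical_mds() is a hypothetical helper. Eigenvector signs are arbitrary, so the comparison uses absolute values.

```r
# Classical (Torgerson) MDS by hand, compared with stats::cmdscale().
# classical_mds() is a hypothetical helper for illustration.
classical_mds <- function(d, k = 2) {
  D2 <- as.matrix(d)^2                         # squared distances
  n <- nrow(D2)
  J <- diag(n) - matrix(1 / n, n, n)           # centering matrix
  B <- -0.5 * J %*% D2 %*% J                   # double-centred matrix
  e <- eigen(B, symmetric = TRUE)
  lambda <- e$values[seq_len(k)]
  # coordinates = eigenvectors scaled by sqrt of (non-negative) eigenvalues
  pts <- e$vectors[, seq_len(k), drop = FALSE] %*% diag(sqrt(pmax(lambda, 0)), k)
  rownames(pts) <- rownames(as.matrix(d))
  pts
}

mine <- classical_mds(eurodist, k = 2)
ref <- cmdscale(eurodist, k = 2)
all.equal(abs(unname(mine)), abs(unname(ref)))

# Rough check of how well the 2-D map reproduces the original distances
cor(as.vector(eurodist), as.vector(dist(mine)))
```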

Metaphors Matter: Factor Structure vs. Correlation Network Maps

The psych R package includes a data set called "bfi" with self-report ratings on 25 personality items along a 6-point agreement scale. All the details are provided in the documentation accompanying the package. My focus is how to represent the correlations among these ratings: factor analysis or network graphics?

Let's start with the correlation network map produced by the R package qgraph. As always, all the R code can be found at the end of this post. First, we need to discover the underlying pattern, so we will begin by looking for the nodes with the highest correlations, which are connected by the thickest lines. Red lines indicate negative correlations (e.g., those who claim that they are "indifferent to others" are unlikely to tell us that they "inquire about others" or "comfort others").
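A correlation network map draws one edge per pair of items: the width of the edge grows with |r| and its colour depends on the sign of r. The base-R sketch below builds that edge list, sorted so that the "thickest lines" come first. cor_edges() and its 0.3 cut-off are hypothetical choices. Green for positive edges is an assumption; the post only specifies red for negative ones. The built-in attitude survey data stands in for psych's bfi so the example runs without extra packages.

```r
# Build the weighted edge list behind a correlation network map.
# cor_edges() and the default cut-off are hypothetical choices.
cor_edges <- function(data, min_abs = 0.3) {
  r <- cor(data, use = "pairwise.complete.obs")
  idx <- which(upper.tri(r), arr.ind = TRUE)          # each item pair once
  edges <- data.frame(
    from = colnames(r)[idx[, 1]],
    to   = colnames(r)[idx[, 2]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
  # sign of r decides the colour: red for negative (as in the post),
  # green for positive (assumed here)
  edges$colour <- ifelse(edges$r < 0, "red", "green")
  edges <- edges[abs(edges$r) >= min_abs, , drop = FALSE]   # drop weak links
  edges[order(-abs(edges$r)), , drop = FALSE]               # thickest first
}

head(cor_edges(attitude), 5)
```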

Using this approach, we can identify several regions that are placed near each other because of their interconnections. The network model provides an alternative account.

Plotting principal component analysis with ggplot

Visualizing Principal Components « Systematic Investor

Interactive MDS visualisation using D3 « dahtah

Here’s a sneak peek into upcoming visualisation work. I’ve been working a bit on MDS (multidimensional scaling), a classical technique for visualising distance data. Classical MDS is useful, but interactive MDS is *much* more useful. Using D3, a JavaScript visualisation framework, it’s relatively easy to make interactive MDS plots.

This example shows how basic interaction can be used to show the approximation inherent in an MDS representation. The source code is available as well, but seeing as I’m new to JavaScript it’s not exactly a model of clarity and elegance.

7 Functions to do Metric Multidimensional Scaling in R « Data Analysis Visually Enforced