 # Cluster Analysis R has an amazing variety of functions for cluster analysis. In this section, I will describe three of the many approaches: hierarchical agglomerative, partitioning, and model based. While there are no best solutions for the problem of determining the number of clusters to extract, several approaches are given below. Data Preparation Prior to clustering data, you may want to remove or estimate missing data and rescale variables for comparability. # Prepare Data mydata <- na.omit(mydata) # listwise deletion of missing mydata <- scale(mydata) # standardize variables Partitioning K-means clustering is the most popular partitioning method. # Determine number of clusters wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var)) for (i in 2:15) wss[i] <- sum(kmeans(mydata, centers=i)\$withinss) plot(1:15, wss, type="b", xlab="Number of Clusters", ylab="Within groups sum of squares") A robust version of K-means based on mediods can be invoked by using pam( ) instead of kmeans( ). Hierarchical Agglomerative Related:  R ResourcesRbooks

ChemWiki: The Dynamic Chemistry E-textbook - Chemwiki Factor Analysis This section covers principal components and factor analysis. The later includes both exploratory and confirmatory methods. Principal Components The princomp( ) function produces an unrotated principal component analysis. # Pricipal Components Analysis # entering raw data and extracting PCs # from the correlation matrix fit <- princomp(mydata, cor=TRUE) summary(fit) # print variance accounted for loadings(fit) # pc loadings plot(fit,type="lines") # scree plot fit\$scores # the principal components biplot(fit) click to view Use cor=FALSE to base the principal components on the covariance matrix. The principal( ) function in the psych package can be used to extract and rotate principal components. # Varimax Rotated Principal Components # retaining 5 components library(psych) fit <- principal(mydata, nfactors=5, rotate="varimax") fit # print results mydata can be a raw data matrix or a covariance matrix. Exploratory Factor Analysis mydata can be a raw data matrix or a covariance matrix.