# Statistics

Statistics for HCI Research: Statistics with Crosstab Tables.

Introduction

A crosstab table is probably the most common way to present nominal (categorical) data. It is a table representing the distributions of the responses to two variables. A crosstab table can be 2 x 2 or n x m as follows. As I explain in the types of data page, you cannot do many things on categorical data.

Coefficients of Association (Phi-Coefficients, Contingency Coefficients and Cramer's V)

A coefficient of association is something like a correlation for categorical data. This crosstab table shows the distribution of the ownership of the two devices separated by users' ages.

    > data <- matrix(c(20, 10, 3, 27), ncol=2, byrow=T)
    > library(vcd)
    > assocstats(data)
                        X^2 df   P(> X^2)
    Likelihood Ratio 22.185  1 2.4762e-06
    Pearson          20.376  1 6.3622e-06

    Phi-Coefficient   : 0.583
    Contingency Coeff.: 0.503
    Cramer's V        : 0.583

Thus, Cramer's V is 0.58 in this example.
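As a cross-check on the `assocstats` output above, the three coefficients can be recomputed from the Pearson chi-square by hand. The following is a minimal sketch in plain Python rather than R; the function name `association_coefficients` is made up for illustration, and it assumes the uncorrected Pearson chi-square (no continuity correction), which is what the reported value of 20.376 corresponds to.

```python
import math

def association_coefficients(table):
    """Return (chi2, phi, contingency, cramers_v) for a table of counts.

    Hypothetical helper for illustration; uses the uncorrected Pearson chi-square.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    phi = math.sqrt(chi2 / n)                    # usually reported for 2 x 2 tables
    contingency = math.sqrt(chi2 / (chi2 + n))   # contingency coefficient
    k = min(len(table), len(table[0]))           # smaller of the two dimensions
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, phi, contingency, cramers_v

# Same counts as the R example: matrix(c(20, 10, 3, 27), ncol=2, byrow=T)
chi2, phi, contingency, cramers_v = association_coefficients([[20, 10], [3, 27]])
print(round(chi2, 3), round(phi, 3), round(contingency, 3), round(cramers_v, 3))
# prints: 20.376 0.583 0.503 0.583
```

For a 2 x 2 table the smaller dimension is 2, so Cramer's V reduces to the phi coefficient, which is why both are 0.583 here.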

Agreement and inter-rater reliability (Cohen's Kappa)

Cohen's Kappa for Nominal Data

Thus, you have 94% agreement.

Statistics for HCI Research: Normality and Data transformation.

Introduction

Data transformation is a powerful tool when the data do not appear to follow a normal distribution. The idea of data transformation is that you convert your data so that you can assume normality and use parametric tests. To determine whether we need any data transformation, we need to check the normality of the data. Although there are several statistical methods for checking normality, what you should do is look at a histogram and a QQ-plot, and then run a normality test.
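To make the QQ-plot step concrete, here is a rough sketch of how the points of a normal QQ-plot are built, written in plain Python rather than R. The helper names `qq_points` and `pearson_r`, the plotting positions (i + 0.5) / n, and the simulated sample are all assumptions for illustration; Python's random generator does not reproduce R's `rnorm` values even with the same seed.

```python
import math
import random
from statistics import NormalDist

def qq_points(sample):
    """Pair each sorted observation with a standard normal quantile.

    Uses plotting positions (i + 0.5) / n, one common convention among several.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, sd 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))

def pearson_r(pairs):
    """Plain Pearson correlation between the two coordinates of the pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(111)  # illustrative seed only
sample = [random.gauss(0, 1) for _ in range(20)]  # 20 draws from N(0, 1)

points = qq_points(sample)
r = pearson_r(points)
print(f"correlation of QQ points: {r:.3f}")
```

If the sample is roughly normal, the points fall close to a straight line, so the correlation is close to 1. This is only a visual aid in numeric form, not a substitute for a formal normality test.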

You should also read the section on the differences between the two statistical methods explained on this page. One important point about data transformation is that you must be able to defend your data transformation as legitimate.

Histogram

We prepare data by using a random function.

    > set.seed(111)
    > data_normal <- rnorm(20)

This means that we are randomly taking 20 samples from the normal distribution with mean = 0 and sd = 1. We will also prepare another kind of data for comparison.

Annotated Output of Proc Univariate.

SAS Annotated Output: Proc univariate

Below is an example of code used to investigate the distribution of a variable. In our example, we will use the hsb2 data set and we will investigate the distribution of the continuous variable write, which is the scores of 200 high school students on a writing test. We use the plots option on the proc univariate statement to produce the stem-and-leaf and normal probability plots shown at the bottom of the output.

We will start by showing all of the unaltered output produced by this command, and then we will annotate each section.

    proc univariate data = "D:\hsb2" plots;
      var write;
    run;

    The UNIVARIATE Procedure
    Variable: write (writing score)

                              Moments
    N                        200    Sum Weights              200
    Mean                  52.775    Sum Observations       10555
    Std Deviation     9.47858602    Variance           89.843593
    Skewness          -0.4820386    Kurtosis          -0.7502476
    Uncorrected SS        574919    Corrected SS       17878.875
    Coeff Variation   17.9603714    Std Error Mean    0.67023725

    Basic Statistical Measures
        Location    Variability

    Tests for Location: Mu0=0

Rt.uits.iu.edu/visualization/analytics/docs/normality-docs/normality.pdf.

Statwiki.

Academic.csuohio.edu/kneuendorf/c63111/hand22.pdf.

The CORR Procedure: Cronbach’s Coefficient Alpha.

The CORR Procedure: Computing Cronbach’s Coefficient Alpha.

Know How | Likert Scale – What is it? When to Use it? How to Analyze it?

In all likelihood, you have used a Likert scale (or something you’ve called a Likert scale) in a survey before. It might surprise you to learn that Likert scales are a very specific format and what you have been calling Likert may not be. Not to worry: researchers who have been doing surveys for years still get their definitions confused. In fact, many researchers do not even agree on the best way to report on the numeric values in a Likert scale. This article will explain the traditional and, in our opinion, most valuable way to use Likert scales and report on them.

What is a Likert Scale vs. a Likert Item

A “Likert scale” is actually the sum of responses to several Likert items. In a “good” Likert scale, the scale is balanced on both sides of a neutral option, creating a less biased measurement.
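The idea that a Likert scale is the sum of responses to several Likert items can be sketched in a few lines of Python. The respondents, the four items, and the 1 to 5 coding below are entirely hypothetical and only illustrate the arithmetic.

```python
# Hypothetical data: each respondent answers four Likert items coded 1..5.
LOW, HIGH = 1, 5

responses = {
    "respondent_1": [4, 5, 4, 3],
    "respondent_2": [2, 1, 2, 2],
    "respondent_3": [3, 4, 5, 4],
}

# Guard against miscoded answers before summing.
for name, items in responses.items():
    if not all(LOW <= value <= HIGH for value in items):
        raise ValueError(f"{name} has a response outside {LOW}..{HIGH}")

# The Likert scale score is the sum of that respondent's item responses.
scale_scores = {name: sum(items) for name, items in responses.items()}

# With four items coded 1..5, scale scores can range from 4 to 20.
n_items = 4
score_range = (n_items * LOW, n_items * HIGH)

print(scale_scores)  # {'respondent_1': 16, 'respondent_2': 7, 'respondent_3': 16}
print(score_range)   # (4, 20)
```

Each individual response is a Likert item; only the summed score per respondent is the Likert scale in the sense used here.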

A “Likert Item” is a statement that the respondent is asked to evaluate. Below is an example of a nearly perfect Likert scale. So given this new information, when should you use a Likert scale?

Analyzing Likert Data.

Introduction

Over the years, numerous methods have been used to measure character and personality traits (Likert, 1932). The difficulty of measuring attitudes, character, and personality traits lies in the procedure for transferring these qualities into a quantitative measure for data analysis purposes. The recent popularity of qualitative research techniques has relieved some of the burden associated with the dilemma; however, many social scientists still rely on quantitative measures of attitudes, character and personality traits. In response to the difficulty of measuring character and personality traits, Likert (1932) developed a procedure for measuring attitudinal scales. The original Likert scale used a series of questions with five response alternatives: strongly approve (1), approve (2), undecided (3), disapprove (4), and strongly disapprove (5).

He combined the responses from the series of questions to create an attitudinal measurement scale.

Likert-Type Versus Likert Scales.

Statistics Roundtable: Likert Scales and Data Analyses. By I. Elaine Allen and Christopher A. Seaman

Surveys are consistently used to measure quality. For example, surveys might be used to gauge customer perception of product quality or quality performance in service delivery. Likert scales are a common ratings format for surveys. Respondents rank quality from high to low or best to worst using five or seven levels. Statisticians have generally grouped data collected from these surveys into a hierarchy of four levels of measurement: Nominal data: The weakest level of measurement representing categories without numerical representation. Data analyses using nominal, interval and ratio data are generally straightforward and transparent.

An underlying reason for analyzing ordinal data as interval data might be the contention that parametric statistical tests (based on the central limit theorem) are more powerful than nonparametric alternatives.

Basics of Likert Scales. Analysis, Generalization To Continuous Indexes. Conclusion.

Wilcoxon.

Introduction to social network analysis: Chapter 18: Some statistical tools.

Introduction to social network methods. 18. Some statistical tools

This page is part of an on-line text by Robert A. Hanneman (Department of Sociology, University of California, Riverside) and Mark Riddle (Department of Sociology, University of Northern Colorado).

Contents of chapter 18: Some statistical tools

Introduction: Applying statistical tools to network data

Network analysis in the social sciences developed from a conjuncture of anthropologists' observations about relations in face-to-face groups and mathematical graph theory.

In more recent work, however, some of the focus of social network research has moved away from these roots. All of these concerns (large networks, sampling, concern about the reliability of observations) have led social network researchers to begin to apply the techniques of descriptive and inferential statistics in their work. Inferential statistics have also proven to have very useful applications to social network analysis.

## SAS

Nonparametric Statistics.

Types of Non-parametric Tests

Non-parametric Tests Summary Table

Non-parametric equivalent of Student's t tests: mechanics of the Mann Whitney U test, the analogue of the two independent samples t test; mechanics of the Wilcoxon Matched Pairs (or Signed Ranks, Paired Samples, Rank Sum) test, the analogue of the two dependent samples t test (calculations are based on a comparison of the sums of the absolute values of the positive rankings, R, and negative rankings, S, of differences between samples).

Non-parametric equivalent of the ANOVA: Kruskal-Wallis test.

Non-parametric equivalent of the Correlation Coefficient: Spearman Rank Correlation Coefficient.
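The mechanics of the Mann Whitney U test mentioned above can be sketched as follows: pool both samples, rank them (tied values share the average rank), and derive U from the rank sum of one group. This is a minimal illustration in plain Python with made-up data and hypothetical helper names; it computes only the U statistics and no p-value.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1  # average of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks

def mann_whitney_u(a, b):
    """Return (U_a, U_b) computed from the rank sum of sample a."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b

# Hypothetical scores for two independent groups.
group_a = [3, 4, 2, 6, 2, 5]
group_b = [9, 7, 5, 10, 6, 8]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b, min(u_a, u_b))  # prints: 2.0 34.0 2.0
```

The smaller of the two U values is the one usually compared against a critical value table; the two always sum to the product of the sample sizes (here 6 x 6 = 36).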

Trademark and Disclaimers. Copyright © 2001 Jay Pitocchelli.

Www.stat.cmu.edu/~hseltman/309/Book/chapter16.pdf.

Www.utdallas.edu/~herve/Abdi-PLS-pretty.pdf.

Mediation (David A. Kenny)

Some might benefit from Muthén (2011). Note that both the CDE and the NDE would equal the regression slope, or what was earlier called path c', if the model is linear, assumptions are met, and there is no XM interaction affecting Y; in that case the NIE would equal ab, and the TE would equal ab + c'. In the case in which the specifications made by the traditional mediation approach hold (e.g., linearity, no omitted variables, no XM interaction), the estimates would be the same. Here I give the general formulas for the NDE and NIE when X is measured at the interval level, based on Valeri & VanderWeele (2013). If the XM effect is added to the Y equation, that equation can be stated as and the intercept in the M equation can be denoted as iM.

Where X0 is a theoretical baseline score on X or a "zero" score and X1 is a theoretical "improvement" score on X or a "1" score. When X is a dichotomy, it is fairly obvious what values to use for X0 and X1.
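For the simplest case described in the mediation excerpt (linear model, assumptions met, no XM interaction), the effect decomposition reduces to arithmetic on the path coefficients. The sketch below covers only that special case, not the general Valeri & VanderWeele formulas; the function name and the coefficient values are hypothetical, and scaling by the change X1 - X0 is an assumption that reduces to the text's statement when the change is one unit.

```python
def linear_mediation_effects(a, b, c_prime, x0=0.0, x1=1.0):
    """Effects for a linear mediation model WITHOUT an XM interaction.

    a: path from X to M, b: path from M to Y, c_prime: direct path from X to Y.
    Hypothetical helper; effects are scaled by the change in X from x0 to x1.
    """
    change = x1 - x0
    nde = c_prime * change   # natural direct effect (equals the CDE in this case)
    nie = a * b * change     # natural indirect effect
    return {"NDE": nde, "NIE": nie, "TE": nde + nie}

# Made-up coefficients, with X a dichotomy coded 0 and 1.
effects = linear_mediation_effects(a=0.5, b=0.4, c_prime=0.3)
print(effects)
```

With a one-unit change in X, the NDE is c', the NIE is ab, and the TE is ab + c', matching the relationships stated in the text.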

Analytics.ncsu.edu/sesug/2004/TU04-Pappas.pdf.

## Cluster Analysis

Www.math.wpi.edu/saspdf/stat/chap47.pdf.