
Statistics


Correlation

When two sets of data are strongly linked together we say they have a High Correlation. The word Correlation is made of Co- (meaning "together"), and Relation.

Correlation is Positive when the values increase together, and Correlation is Negative when one value decreases as the other increases.

Correlation can have a value:
1 is a perfect positive correlation
0 is no correlation (the values don't seem linked at all)
-1 is a perfect negative correlation

The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.

Example: Ice Cream Sales

The local ice cream shop keeps track of how much ice cream they sell versus the temperature on that day, using their figures for the last 12 days. From a Scatter Plot of that data we can easily see that warmer weather leads to more sales; the relationship is good but not perfect.
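The scale from -1 to 1 described above can be tried out with a short Python sketch. It assumes the value meant is the usual Pearson correlation coefficient, and the numbers are made up for illustration (they are not the ice cream shop's figures). The name pearson_r is just a label chosen for this sketch.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of each pair's deviations from the two means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each list
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
steep = [10 * x for x in xs]     # rises steeply
shallow = [0.1 * x for x in xs]  # rises gently
falling = [-2 * x for x in xs]   # falls as x rises

print(pearson_r(xs, steep))    # very close to 1
print(pearson_r(xs, shallow))  # very close to 1
print(pearson_r(xs, falling))  # very close to -1
```

The steep and shallow lines both come out at (essentially) 1, which is the point made above: the value shows how good the correlation is, not how steep the line is.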

In fact the correlation for the ice cream data is 0.9575.

Correlation Is Not Good at Curves

Normal Distribution

Data can be "distributed" (spread out) in different ways. But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution". The "Bell Curve" is a Normal Distribution.

A histogram of real data often follows the curve closely, but not perfectly (which is usual). Many things closely follow a Normal Distribution: heights of people, size of things produced by machines, errors in measurements, blood pressure, marks on a test. We say the data is "normally distributed".

Standard Deviations

The Standard Deviation is a measure of how spread out numbers are. When you calculate the standard deviation of your data, you will find that (generally) about 68% of values are within 1 standard deviation of the mean, about 95% are within 2 standard deviations, and about 99.7% are within 3 standard deviations.

Example: 95% of students at school are between 1.1m and 1.7m tall. Assuming this data is normally distributed can you calculate the mean and standard deviation? The mean is halfway between 1.1m and 1.7m: (1.1 + 1.7) / 2 = 1.4m. 95% is 2 standard deviations either side of the mean (a total of 4 standard deviations), so 1 standard deviation is (1.7 - 1.1) / 4 = 0.15m.

Standard Scores

Standard Deviation and Variance

Deviation just means how far from the normal.

Standard Deviation

The Standard Deviation is a measure of how spread out numbers are.

Its symbol is σ (the Greek letter sigma). The formula is easy: it is the square root of the Variance. So now you ask, "What is the Variance?"

Variance

The Variance is defined as: the average of the squared differences from the Mean. To calculate the variance follow these steps:
1. Work out the Mean (the simple average of the numbers).
2. Then for each number: subtract the Mean and square the result (the squared difference).
3. Then work out the average of those squared differences.

Example

You and your friends have just measured the heights of your dogs (in millimeters). The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm. Find out the Mean, the Variance, and the Standard Deviation.

Your first step is to find the Mean:

Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394

so the mean (average) height is 394 mm.
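The three steps listed above can be followed in a few lines of Python. This is only a sketch of the calculation for the five dog heights; it divides by the number of values, as the definition of the Variance above says, and the variable names are chosen for this example.

```python
from math import sqrt

heights_mm = [600, 470, 170, 430, 300]

# Step 1: work out the Mean
mean = sum(heights_mm) / len(heights_mm)

# Step 2: for each number, subtract the Mean and square the result
squared_diffs = [(h - mean) ** 2 for h in heights_mm]

# Step 3: work out the average of those squared differences
variance = sum(squared_diffs) / len(squared_diffs)

# The Standard Deviation is the square root of the Variance
std_dev = sqrt(variance)

print(mean)            # 394.0
print(variance)        # 21704.0
print(round(std_dev))  # 147
```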

So the Variance is 21,704, and the Standard Deviation is its square root, about 147 mm.

Quartiles

Quartiles are the values that divide a list of numbers into quarters:
1. First put the list of numbers in order.
2. Then cut the list into four equal parts.
3. The Quartiles are at the "cuts".

Example: 5, 8, 4, 4, 6, 3, 8

Put them in order: 3, 4, 4, 5, 6, 8, 8

Cut the list into quarters, and the result is:
Quartile 1 (Q1) = 4
Quartile 2 (Q2), which is also the Median, = 5
Quartile 3 (Q3) = 8

Sometimes a "cut" is between two numbers. In that case the Quartile is the average of the two numbers.

Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8

The numbers are already in order. In this case Quartile 2 is half way between 5 and 6:
Quartile 1 (Q1) = 3
Quartile 2 (Q2) = 5.5
Quartile 3 (Q3) = 7

Interquartile Range

The "Interquartile Range" is from Q1 to Q3. To calculate it just subtract Quartile 1 from Quartile 3. For the example above, the Interquartile Range is Q3 - Q1 = 7 - 3 = 4.

Box and Whisker Plot

You can show all the important values in a "Box and Whisker Plot". A final example covering everything follows the same steps: put the numbers in order, then cut the list into quarters.
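The "put in order, then cut into quarters" steps can be sketched in Python as below. The function names are made up for this sketch, and it assumes one particular rule: Q1 and Q3 are the medians of the lower and upper halves, leaving the middle value out of both halves when the list has an odd length. That rule reproduces both examples above, but other tools use different quartile rules and may give slightly different answers.

```python
def median(sorted_values):
    """Middle value of an already sorted list (average of two if even)."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule."""
    data = sorted(values)  # first put the numbers in order
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd-length list the middle value belongs to neither half
    upper = data[half + 1:] if n % 2 == 1 else data[half:]
    return median(lower), median(data), median(upper)

q1, q2, q3 = quartiles([5, 8, 4, 4, 6, 3, 8])
print(q1, q2, q3)  # 4 5 8
print(q3 - q1)     # Interquartile Range: 4

print(quartiles([1, 3, 3, 4, 5, 6, 6, 7, 8, 8]))  # (3, 5.5, 7)
```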

Calculating the mean from a frequency table

It is easy to calculate the Mean: add up all the numbers, then divide by how many numbers there are.

Example 1: What is the Mean of the numbers 6, 11 and 7?

Add the numbers: 6 + 11 + 7 = 24
Divide by how many numbers (there are 3 numbers): 24 ÷ 3 = 8
The Mean is 8

But sometimes you won't have a simple list of numbers, you might have a frequency table (the "frequency" says how often each value occurs). For example, the table might say that score 1 occurred 2 times, score 2 occurred 5 times, etc.

You could list all the numbers out one by one. But rather than do lots of adds (like 3+3+3+3) it is often easier to use multiplication: multiply each score by its frequency. And rather than count how many numbers there are, we can add up the frequencies. The Mean is the sum of (score × frequency) divided by the sum of the frequencies.

And that is how to calculate the mean from a frequency table!
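As a rough Python sketch of the multiplication shortcut: the table below uses only the rows that can be read from the text (score 1 twice, score 2 five times) plus score 3 four times, which is inferred from the "3+3+3+3" remark. The rest of the original table is not known, so this is an illustration of the method, not the original example's answer.

```python
# score -> frequency (how often that score occurred)
# Only a partial, partly assumed table; see the note above.
frequency_table = {1: 2, 2: 5, 3: 4}

# Multiply each score by its frequency instead of adding it repeatedly
total = sum(score * freq for score, freq in frequency_table.items())

# Add up the frequencies instead of counting every number
count = sum(frequency_table.values())

mean = total / count
print(total, count)    # 24 11
print(round(mean, 2))  # 2.18
```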

Here is another example:

Example: Parking Spaces per House in Hampton Street

Isabella went up and down the street to find out how many parking spaces each house had. What is the mean number of Parking Spaces?

Notation: the mean is Σfx / Σf (where f is frequency and x is the matching value).

Standard Deviation Formulas

Deviation just means how far from the normal.

Standard Deviation

The Standard Deviation is a measure of how spread out numbers are. Here we explain the formulas.

The symbol for Standard Deviation is σ (the Greek letter sigma). This is the formula for Standard Deviation:

σ = √( (1/N) × Σ (xi − μ)² )

where N is the number of values and the sum runs over all of them.

Say what? OK. Say you have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11. To calculate the standard deviation of those numbers:
1. Work out the Mean (the simple average of the numbers).
2. Then for each number: subtract the Mean and square the result.
3. Then work out the mean of those squared differences.
4. Take the square root of that.

The formula actually says all of that, and I will show you how.

The Formula Explained

First, let us have some example values to work on:

Example: Sam has 20 Rose Bushes. The number of flowers on each bush is listed in Step 1. Work out the Standard Deviation.

Step 1. In the formula above μ (the Greek letter "mu") is the mean of all our values.

Example: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

The mean is: 140 / 20 = 7

So: μ = 7

Step 2. This is the part of the formula that says: (xi − μ)²

So what is xi? The xi are the individual values. In other words x1 = 9, x2 = 2, x3 = 5, etc.

Step 3.
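The whole calculation for Sam's rose bushes can be sketched in Python. This is a minimal sketch that assumes the population form of the formula (divide by N, the number of values); the variable names are chosen for this example.

```python
from math import sqrt

flowers = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3,
           7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

n = len(flowers)

# Step 1: the mean, mu
mu = sum(flowers) / n

# Step 2: for each value x_i, subtract the mean and square the result
squared_diffs = [(x - mu) ** 2 for x in flowers]

# Then average the squared differences and take the square root
variance = sum(squared_diffs) / n
sigma = sqrt(variance)

print(mu)               # 7.0
print(variance)         # 8.9
print(round(sigma, 3))  # 2.983
```

Python's standard library also offers statistics.pstdev for the same population calculation, which can be used to cross-check the result.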