
Standard Deviation and Variance

Deviation just means how far from the normal.
Standard Deviation
The Standard Deviation is a measure of how spread out numbers are. Its symbol is σ (the Greek letter sigma). The formula is easy: it is the square root of the Variance.
Variance
The Variance is defined as the average of the squared differences from the Mean. To calculate the variance, follow these steps:
Work out the Mean (the simple average of the numbers).
Then for each number: subtract the Mean and square the result (the squared difference).
Then work out the average of those squared differences.
Example
You and your friends have just measured the heights of your dogs (in millimeters). The heights (at the shoulders) are: 600mm, 470mm, 170mm, 430mm and 300mm. Find out the Mean, the Variance, and the Standard Deviation.
Your first step is to find the Mean:
Mean = (600 + 470 + 170 + 430 + 300) / 5 = 1970 / 5 = 394
so the mean (average) height is 394 mm.
Now we calculate each dog's difference from the Mean: 206, 76, -224, 36 and -94.
To calculate the Variance, square each difference and average the results: (206² + 76² + (-224)² + 36² + (-94)²) / 5 = 108,520 / 5 = 21,704.
So the Variance is 21,704.
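The three steps above can be sketched in a few lines of Python. This is only an illustration using the dog heights from the example; the function name `population_variance` is made up for this sketch. It divides by N, as the text does; a sample variance would divide by N - 1 instead.

```python
import math

def population_variance(values):
    """Average of the squared differences from the mean (divides by N)."""
    mean = sum(values) / len(values)                    # step 1: the mean
    squared_diffs = [(v - mean) ** 2 for v in values]   # step 2: squared differences
    return sum(squared_diffs) / len(squared_diffs)      # step 3: their average

heights_mm = [600, 470, 170, 430, 300]  # dog heights from the example

mean = sum(heights_mm) / len(heights_mm)
variance = population_variance(heights_mm)
std_dev = math.sqrt(variance)  # standard deviation = square root of the variance

print(f"mean = {mean:.0f} mm")
print(f"variance = {variance:.0f}")
print(f"standard deviation = {std_dev:.2f} mm")
```

The standard library's `statistics.pvariance` and `statistics.pstdev` compute the same population quantities.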

Modeling integers
When modeling integers, we can use colored chips to represent integers. One color can represent a positive number and another color can represent a negative number. Here, a yellow chip will represent a positive integer and a red chip will represent a negative integer.
For example, 4 is modeled with four yellow chips, -1 with one red chip, and -3 with three red chips.
It is extremely important to know how to model a zero. One yellow chip together with one red chip makes a zero pair, so two yellow and two red chips, three yellow and three red chips, and so on, all represent zero.
Adding and subtracting integers with modeling can be extremely helpful if you are having problems understanding integers. In modeling integers, adding and subtracting are always physical actions. If a board is used with the chips, adding always means "add something to the board" and subtracting always means "remove something from the board".
Here, we will use a big square to represent a board. Let's start with addition of integers:
Example #1: -2 + -1
Put two red chips on the board. Notice that big arrow represents the "+" sign or the action of adding
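The chip model can also be imitated in code. The following is a rough sketch, not part of the original lesson: the board is a plain list of color names, and the helper names `chips_for`, `add_to_board` and `board_value` are invented for illustration.

```python
def chips_for(n):
    """Chips that model the integer n: yellow for positive, red for negative."""
    return ["yellow"] * n if n >= 0 else ["red"] * (-n)

def add_to_board(board, n):
    """Adding means putting the chips for n on the board."""
    return board + chips_for(n)

def board_value(board):
    """Each yellow/red pair is a zero pair, so only the unmatched chips count."""
    return board.count("yellow") - board.count("red")

# Example #1 from the text: -2 + -1
board = add_to_board([], -2)     # put two red chips on the board
board = add_to_board(board, -1)  # add one more red chip
print(board, "->", board_value(board))

# A zero pair: one yellow chip and one red chip
print(board_value(["yellow", "red"]))
```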

7.1.6. What are outliers in the data?
A data set of N = 90 ordered observations is examined for outliers. The computations are as follows:
Median = (N+1)/2 th ordered point = 45.5th ordered point = the average of the 45th and 46th ordered points = (559 + 560)/2 = 559.5
Lower quartile = .25(N+1)th ordered point = 22.75th ordered point = 411 + .75(436-411) = 429.75
Upper quartile = .75(N+1)th ordered point = 68.25th ordered point = 739 + .25(752-739) = 742.25
Interquartile range = 742.25 - 429.75 = 312.5
Lower inner fence = 429.75 - 1.5 (312.5) = -39.0
Upper inner fence = 742.25 + 1.5 (312.5) = 1211.0
Lower outer fence = 429.75 - 3.0 (312.5) = -507.75
Upper outer fence = 742.25 + 3.0 (312.5) = 1679.75
From an examination of the fence points and the data, one point (1441) exceeds the upper inner fence and stands out as a mild outlier; there are no extreme outliers.
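The fence arithmetic above is easy to reproduce. The sketch below starts from the two quartiles quoted in the text, because the 90 observations themselves are not listed here; the function names `fences` and `classify` are illustrative only.

```python
def fences(q1, q3):
    """Inner (1.5 x IQR) and outer (3 x IQR) fences from the two quartiles."""
    iqr = q3 - q1
    return {
        "lower_inner": q1 - 1.5 * iqr,
        "upper_inner": q3 + 1.5 * iqr,
        "lower_outer": q1 - 3.0 * iqr,
        "upper_outer": q3 + 3.0 * iqr,
    }

def classify(x, f):
    """Label a point as 'extreme', 'mild' or 'none' relative to the fences."""
    if x < f["lower_outer"] or x > f["upper_outer"]:
        return "extreme"
    if x < f["lower_inner"] or x > f["upper_inner"]:
        return "mild"
    return "none"

f = fences(429.75, 742.25)  # quartiles from the worked example
for name, value in f.items():
    print(name, value)
print(1441, "->", classify(1441, f))
```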

Quartiles
Quartiles are the values that divide a list of numbers into quarters.
First put the list of numbers in order.
Then cut the list into four equal parts.
The Quartiles are at the "cuts".
Like this:
Example: 5, 8, 4, 4, 6, 3, 8
Put them in order: 3, 4, 4, 5, 6, 8, 8
Cut the list into quarters:
And the result is:
Quartile 1 (Q1) = 4
Quartile 2 (Q2), which is also the Median, = 5
Quartile 3 (Q3) = 8
Sometimes a "cut" is between two numbers ... the Quartile is the average of the two numbers.
Example: 1, 3, 3, 4, 5, 6, 6, 7, 8, 8
The numbers are already in order. In this case Quartile 2 is half way between 5 and 6:
Quartile 1 (Q1) = 3
Quartile 2 (Q2) = 5.5
Quartile 3 (Q3) = 7
Interquartile Range
The "Interquartile Range" is from Q1 to Q3. To calculate it, just subtract Quartile 1 from Quartile 3, like this:
Example: The Interquartile Range is:
Box and Whisker Plot
You can show all the important values in a "Box and Whisker Plot", like this:
A final example covering everything:
Put them in order:
Cut it into quarters:
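A short sketch of the "cut the list into quarters" idea, checked against the two examples above. It assumes the convention those examples appear to follow: Q2 is the median, and Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the list has an odd length. Other conventions exist (for instance interpolating at the .25(N+1)th ordered point) and can give slightly different quartiles. The function name `quartiles` is made up here.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    ordered = sorted(values)           # first put the numbers in order
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the middle cut
    upper = ordered[(n + 1) // 2 :]    # values above the middle cut
    return median(lower), median(ordered), median(upper)

first = quartiles([5, 8, 4, 4, 6, 3, 8])
second = quartiles([1, 3, 3, 4, 5, 6, 6, 7, 8, 8])

for q1, q2, q3 in (first, second):
    print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {q3 - q1}")
```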

Statistics Notes: Standard deviations and standard errors

Binary numeral system
The binary or base-two numeral system is a representation for numbers that uses a radix of two. It was first described by Gottfried Leibniz, and is used by most modern computers because of its ease of implementation using digital electronics; early 20th century computers were based on the on/off and true/false principles of Boolean algebra. Binary can be considered the most basic practical numeral system (the Unary system is simpler, but impractical for most computation).
Representation
A binary number can be represented by any set of bits (binary digits), which in turn may be represented by any mechanism capable of being in two mutually exclusive states. The following could all be interpreted as binary numbers:
0101001101011
on off off on off on
+ - - + - +
Y N N Y N Y
In keeping with customary representation of numerals using decimal digits, binary numbers are commonly written using the symbols 0 and 1.
100101 binary (explicit statement of format)
100101b (a suffix indicating binary format)
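To make the point about representation concrete, the sketch below reads three of the two-state sequences listed above as the same bit string and converts it to a decimal value. Treating "on", "+" and "Y" as the 1 state is an assumption made for this example, and the helper name `to_bits` is invented.

```python
def to_bits(sequence, one_symbol):
    """Map a space-separated two-state sequence to a string of 0s and 1s."""
    return "".join("1" if token == one_symbol else "0" for token in sequence.split())

# Assumed convention for this sketch: on, + and Y all stand for 1.
examples = [
    ("on off off on off on", "on"),
    ("+ - - + - +", "+"),
    ("Y N N Y N Y", "Y"),
]

for sequence, one_symbol in examples:
    bits = to_bits(sequence, one_symbol)
    value = int(bits, 2)  # base-2 conversion with the built-in int()
    print(f"{sequence!r} -> {bits} -> {value}")
```

With this convention all three sequences spell 100101, the numeral used in the notation examples.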

How to Read and Use a Box-and-Whisker Plot
The box-and-whisker plot is an exploratory graphic, created by John W. Tukey, used to show the distribution of a dataset (at a glance). Think of the type of data you might use a histogram with, and the box-and-whisker (or box plot, for short) could probably be useful. The box plot, although very useful, seems to get lost in areas outside of Statistics, but I'm not sure why. It could be that people don't know about it or maybe are clueless about how to interpret it.
Reading a Box-and-Whisker Plot
Let's say we ask 2,852 people (and they miraculously all respond) how many hamburgers they've consumed in the past week. Take the top 50% of the group (1,426) who ate more hamburgers; they are represented by everything above the median (the white line).
Find Skews in the Data
The box-and-whisker of course shows you more than just four split groups.

Normal Distribution
Data can be "distributed" (spread out) in different ways. But there are many cases where the data tends to be around a central value with no bias left or right, and it gets close to a "Normal Distribution" like this:
A Normal Distribution
The "Bell Curve" is a Normal Distribution.
Many things closely follow a Normal Distribution:
heights of people
size of things produced by machines
errors in measurements
blood pressure
marks on a test
We say the data is "normally distributed".
Standard Deviations
The Standard Deviation is a measure of how spread out numbers are (read that page for details on how to calculate it). When you calculate the standard deviation of your data, you will find that (generally):
Example: 95% of students at school are between 1.1m and 1.7m tall. Assuming this data is normally distributed, can you calculate the mean and standard deviation?
The mean is halfway between 1.1m and 1.7m:
Mean = (1.1m + 1.7m) / 2 = 1.4m
Standard Scores
How far is 1.85 from the mean?
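The example above can be carried through numerically. This sketch rests on a common rule of thumb that the excerpt itself does not spell out: about 95% of normally distributed data lies within 2 standard deviations of the mean (the more precise figure is about 1.96), so the 95% range spans roughly 4 standard deviations. The function name `standard_score` is illustrative.

```python
low_m, high_m = 1.1, 1.7  # the range containing 95% of students

mean_m = (low_m + high_m) / 2   # halfway between the two ends
# Assumption: 95% is about mean +/- 2 standard deviations,
# so the full range covers about 4 standard deviations.
std_m = (high_m - low_m) / 4

def standard_score(x, mean, std):
    """How many standard deviations x lies from the mean (the z-score)."""
    return (x - mean) / std

z = standard_score(1.85, mean_m, std_m)
print(f"mean = {mean_m:.2f} m, standard deviation = {std_m:.2f} m")
print(f"1.85 m is {z:.1f} standard deviations from the mean")
```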

Don't Want Cancer? Sweat It Off
Cancer sucks. No, wait: cancer REALLY sucks. Most of us perceive cancer as a gruesome condition that slowly degrades health and dignity. Though enormous advances in cancer therapy have been made in the last 30 years, the best option remains: just don't get it. Cancer can be difficult to conceptualize: many common forms (in first-world countries, save lung cancer and cervical cancer) have no clear pathogen. A few months ago Jordan Rapp asked me to pen an article or two.
1) People who do aerobic exercise tend to develop less cancer than sedentary individuals
2) The best mechanism to explain #1 has implications for slowing aging as well
In 2009 a group in Great Britain published a meta-analysis of 40 case-controlled studies in peer-reviewed scientific journals on exercise and cancer. I would like to point out that the aforementioned reports are correlation studies. Why does this matter? Telomere length is quantifiable.
Ryon
