# VassarStats: Statistical Computation Web Site

Sample Size Calculator - Confidence Level, Confidence Interval, Sample Size, Population Size, Relevant Population - Creative Research Systems This Sample Size Calculator is presented as a public service of Creative Research Systems survey software. You can use it to determine how many people you need to interview in order to get results that reflect the target population as precisely as needed. You can also find the level of precision you have in an existing sample. Before using the sample size calculator, there are two terms that you need to know: confidence interval and confidence level. The confidence interval (also called margin of error) is the plus-or-minus figure usually reported in newspaper or television opinion poll results. The confidence level tells you how sure you can be. Three factors affect confidence intervals: sample size, percentage, and population size.
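The precision of an existing sample can be estimated with a few lines of code. The following is a minimal sketch, assuming the usual normal-approximation formula for a proportion; the function name `margin_of_error`, its parameters, and the example figure of 1,000 respondents are illustrative and are not taken from the Creative Research Systems calculator.

```python
import math
from statistics import NormalDist


def margin_of_error(sample_size, proportion=0.5, confidence=0.95, population=None):
    """Half-width of the confidence interval for a sample proportion.

    Assumes the normal approximation; proportion=0.5 is the worst case.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    if population is not None:
        # Finite population correction: shrinks the error for small populations.
        standard_error *= math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error


# Hypothetical survey of 1,000 people drawn from a very large population.
print(round(margin_of_error(1000), 4))
```

Passing `population` applies the finite population correction, which only matters when the sample is a noticeable fraction of the population.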

Interactive Statistical Calculation Pages Sample Size Calculator by Raosoft, Inc. If 50% of all the people in a population of 20000 people drink coffee in the morning, and if you were to repeat the survey of 377 people ("Did you drink coffee this morning?") many times, then 95% of the time, your survey would find that between 45% and 55% of the people in your sample answered "Yes". The remaining 5% of the time, or for 1 in 20 survey questions, you would expect the survey response to be more than the margin of error away from the true answer. When you survey a sample of the population, you don't know that you've found the correct answer, but you do know that there's a 95% chance that you're within the margin of error of the correct answer. Try changing your sample size and watch what happens to the alternate scenarios. That tells you what happens if you don't use the recommended sample size, and how the margin of error (M.O.E.) and the confidence level (that 95%) are related. To learn more if you're a beginner, read Basic Statistics: A Modern Approach and The Cartoon Guide to Statistics.
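The figure of 377 respondents for a population of 20000 can be reproduced with a short calculation. This is a sketch under stated assumptions: it uses a standard sample-size formula for a proportion with a finite population correction, and it is an assumption that the calculator works the same way. The name `required_sample_size` and its parameters are illustrative.

```python
import math
from statistics import NormalDist


def required_sample_size(population, margin=0.05, confidence=0.95, proportion=0.5):
    """Sample size needed to estimate a proportion within +/- margin."""
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    x = z ** 2 * proportion * (1 - proportion)
    # Finite population form: n = N*x / ((N - 1)*E^2 + x)
    n = population * x / ((population - 1) * margin ** 2 + x)
    return math.ceil(n)


# Population of 20000, 5% margin of error, 95% confidence, 50% response split.
print(required_sample_size(20000))
```

Tightening the margin or raising the confidence level increases the required sample size.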

Sample Size Determination Every epidemiological study implicitly involves, in its design phase, determining the sample size needed to carry it out (1-4). Skipping this step can lead to two different situations. In the first, we conduct the study without an adequate number of patients, so we cannot estimate the parameters precisely and, moreover, we will fail to find significant differences when they do exist in reality. In the second, we may study an unnecessarily large number of patients, which entails not only lost time and an unnecessary increase in resources, but also a possible negative effect on the quality of the study as a result of that increase. To determine the sample size of a study, we must consider different situations (5-7): A. Studies to estimate parameters, that is, studies in which we aim to make inferences about population values (proportions, means) from a sample (Table 1).

Downloadable Sample SPSS Data Files Data Quality: Ensure that required fields contain data. Ensure that the required homicide (09A, 09B, 09C) offense segment data fields are complete. Ensure that the required homicide (09A, 09B, 09C) victim segment data fields are complete. Ensure that offenses coded as occurring at midnight are correct. Ensure that victim variables are reported where required and are correct when reported but not required. Standardizing the Display of IBR Data: An Examination of NIBRS Elements: Time of Juvenile Firearm Violence; Time of Day of Personal Robberies by Type of Location; Incidents on School Property by Hour; Temporal Distribution of Sexual Assault Within Victim Age Categories; Location of Juvenile and Adult Property Crime Victimizations; Robberies by Location; Frequency Distribution for Victim-Offender Relationship by Offender and Older Age Groups and Location. Analysis Examples: FBI's Analysis of Robbery; FBI's Analysis of Motor Vehicle Theft Using Survival Model.

RStats Resources - RStats Institute Statistics Tutoring: Undergraduate students who need assistance with statistics homework can receive one-on-one tutoring through Missouri State University's Bear CLAW (Center for Learning and Writing). Instructional Videos. Tables and Calculators: Normal Distribution Table, T Distribution Table, Critical Pearson's r Values, F Distribution Table, Chi Square Distribution Table and Calculator, Cohen's D Effect Size Calculator. Notes from Previous RStats Workshops. Information About RStats.

The R Trader » Blog Archive » BERT: a newcomer in the R Excel connection A few months ago a reader pointed out to me this new way of connecting R and Excel. I don’t know how long it has been around, but I had never come across it and I’ve never seen any blog post or article about it. So I decided to write a post, as the tool is really worth it, and before anyone asks, I’m not related to the company in any way. BERT stands for Basic Excel R Toolkit. It’s free (licensed under the GPL v2) and it has been developed by Structured Data LLC. At the time of writing, the current version of BERT is 1.07. In this post I’m not going to show you how R and Excel interact via BERT. How do I use BERT? My trading signals are generated using a long list of R files, but I need the flexibility of Excel to display results quickly and efficiently. Use XML to build user-defined menus and buttons in an Excel file. The above menus and buttons are essentially calls to VBA functions. Those VBA functions are wrappers around R functions defined using BERT.

Measuring Association in Case-Control Studies All the examples above were for cohort studies or clinical trials in which we compared either cumulative incidence or incidence rates among two or more exposure groups. However, in a true case-control study we don't measure and compare incidence. There is no "follow-up" period in case-control studies. In the module on Overview of Analytic Studies we considered a rare disease in a source population that looked like this: This view of the population is hypothetical because it shows us the exposure status of all subjects in the population. Another way of looking at this association is to consider that the "Diseased" column tells us the relative exposure status in people who developed the outcome (7/6 = 1.16667), and the "Total" column tells us the relative exposure status of the entire source population (1,007/5,640 = 0.1785). The Odds Ratio The relative exposure distributions (7/6) and (10/56) are really odds, i.e. the odds of exposure among cases and non-diseased controls.
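The odds ratio is the ratio of the two exposure odds quoted above. A minimal sketch of that arithmetic, using the counts from the excerpt (7 exposed and 6 unexposed cases, 10 exposed and 56 unexposed controls), follows; the function name `odds_ratio` and its parameter names are illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases           # 7/6 in the excerpt
    odds_controls = exposed_controls / unexposed_controls  # 10/56 in the excerpt
    return odds_cases / odds_controls


# Counts quoted in the excerpt above.
print(round(odds_ratio(7, 6, 10, 56), 2))
```

The same value can be obtained from the cross-product (7 × 56) / (6 × 10).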

How To Determine Sample Size, Determining Sample Size In order to prove that a process has been improved, you must measure the process capability before and after improvements are implemented. This allows you to quantify the process improvement (e.g., defect reduction or productivity increase) and translate the effects into an estimated financial result, something business leaders can understand and appreciate. If data is not readily available for the process, how many members of the population should be selected to ensure that the population is properly represented? If data has been collected, how do you determine if you have enough data? Determining sample size is a very important issue because samples that are too large may waste time, resources and money, while samples that are too small may lead to inaccurate results. When sample data is collected and the sample mean $\bar{x}$ is calculated, that sample mean is typically different from the population mean $\mu$. The margin of error $E$ is the maximum difference between the observed sample mean $\bar{x}$ and the population mean $\mu$: $E = z_{\alpha/2}\,\sigma/\sqrt{n}$, where $z_{\alpha/2}$ is the critical value for the chosen confidence level, $\sigma$ is the population standard deviation, and $n$ is the sample size.
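A rough sketch of the sample-size calculation for a mean follows. It assumes the standard relationship between margin of error and sample size, n = (z · σ / E)², with a known population standard deviation; the function name `sample_size_for_mean` and the example values (σ = 15, E = 2) are hypothetical and do not come from the article.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n such that z * sigma / sqrt(n) <= margin."""
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical process: standard deviation 15 units, desired margin of 2 units.
print(sample_size_for_mean(sigma=15, margin=2))
```

Because n grows with the square of σ/E, halving the margin of error roughly quadruples the required sample.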

Introduction to Principal Component Analysis (PCA) - Laura Diane Hamilton Principal Component Analysis (PCA) is a dimensionality-reduction technique that is often used to transform a high-dimensional dataset into a smaller-dimensional subspace prior to running a machine learning algorithm on the data. When should you use PCA? It is often helpful to use a dimensionality-reduction technique such as PCA prior to performing machine learning because reducing the dimensionality of the dataset reduces the size of the space on which k-nearest-neighbors (kNN) must calculate distance, which improves the performance of kNN. What does PCA do? Principal Component Analysis does just what it advertises; it finds the principal components of the dataset. Can you ELI5? Let’s say your original dataset has two variables, x1 and x2. Now, we want to identify the first principal component, the one that explains the highest amount of variance. Let's say we just wanted to project the data onto the first principal component only. You can think of this projection as something like a shadow.
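For a dataset with only two variables, the first principal component can be worked out by hand from the 2×2 covariance matrix. The following is a minimal sketch under that assumption, written in plain Python rather than with a numerical library; the function name `first_principal_component` and the sample points are illustrative, and the code assumes the points are not all identical.

```python
import math


def first_principal_component(points):
    """Return (direction, scores, explained_ratio) for 2-D points.

    direction: unit vector of the first principal component
    scores: each centered point projected onto that direction
    explained_ratio: share of total variance captured by the component
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)

    # Largest eigenvalue of a symmetric 2x2 matrix (closed form).
    largest = (a + c) / 2 + math.hypot((a - c) / 2, b)

    # Matching eigenvector; handle the already-diagonal case separately.
    if b != 0:
        vx, vy = b, largest - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    direction = (vx / norm, vy / norm)

    # The "shadow": one coordinate per point along the component.
    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    return direction, scores, largest / (a + c)


direction, scores, explained = first_principal_component([(1, 2), (2, 4), (3, 6)])
print(direction, explained)
```

In the example the points lie exactly on a line, so the single component captures all of the variance; real data would leave some variance in the second component.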

THE DECISION TREE FOR STATISTICS The material used in this guide is based upon "A Guide for Selecting Statistical Techniques for Analyzing Social Science Data," Second Edition, produced at the Institute for Social Research, The University of Michigan, under the authorship of Frank M. Andrews, Laura Klem, Terrence N. Davidson, Patrick O'Malley, and Willard L. Rodgers, copyright 1981 by The University of Michigan, All Rights Reserved. The Decision Tree helps select statistics or statistical techniques appropriate for the purpose and conditions of a particular analysis and to select the MicrOsiris commands which produce them or find the corresponding SPSS and SAS commands. Start with the first question on the next screen and choose one of the alternatives presented there by selecting the appropriate link. The "Statistics Programs" button provides a table of all statistics mentioned which can be produced by MicrOsiris, SPSS, or SAS and the corresponding commands for them.

R Tutorials--Logistic Regression Preliminaries Model Formulae You will need to know a bit about Model Formulae to understand this tutorial. Odds, Odds Ratios, and Logit When you go to the track, how do you know which horse to bet on? odds = p(one outcome) / p(the other outcome) = p(success) / p(failure) = p/q, where q = 1 - p. So for Sea Brisket, odds(winning) = (1/9)/(8/9) = 1/8. The natural log of odds is called the logit, or logit transformation, of p: logit(p) = log_e(p/q). If odds(success) = 1, then logit(p) = 0. Logistic regression is a method for fitting a regression curve, y = f(x), when y consists of proportions or probabilities, or binary coded (0,1--failure,success) data. y = [exp(b0 + b1x)] / [1 + exp(b0 + b1x)] Logistic regression fits b0 and b1, the regression coefficients (which were 0 and 1, respectively, for the graph above). logit(y) = b0 + b1x Odds ratio might best be illustrated by returning to our horse race.
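The odds, logit, and logistic curve defined above translate directly into code. The tutorial itself uses R; the sketch below is written in Python for illustration, and the function names `odds`, `logit`, and `logistic` are assumptions rather than anything from the tutorial.

```python
import math


def odds(p):
    """odds = p / q, where q = 1 - p."""
    return p / (1 - p)


def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))


def logistic(x, b0=0.0, b1=1.0):
    """y = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)); the inverse of the logit."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))


# Sea Brisket: p(winning) = 1/9, so odds(winning) = (1/9)/(8/9) = 1/8.
print(odds(1 / 9))
# When odds(success) = 1, i.e. p = 0.5, the logit is 0.
print(logit(0.5))
# With b0 = 0 and b1 = 1, the logistic curve undoes the logit.
print(logistic(logit(0.3)))
```

The defaults `b0=0.0` and `b1=1.0` match the coefficient values the excerpt mentions for its graph.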
