
Big Data, Data Mining, Predictive Analytics, Statistics, StatSoft Electronic Textbook

"Thank you and thank you again for providing a complete, well-structured, and easy-to-understand online resource. Every other website or snobbish research paper has not deigned to explain things in words consisting of less than four syllables. I was tossed to and fro like a man holding on to a frail plank that he calls his determination until I came across your electronic textbook...You have cleared the air for me. You have enlightened. You have illuminated. You have educated me." — Mr. "As a professional medical statistician of some 40 years standing, I can unreservedly recommend this textbook as a resource for self-education, teaching and on-the-fly illustration of specific statistical methodology in one-to-one statistical consulting. — Mr. "Excellent book. — Dr. "Just wanted to congratulate whoever wrote the 'Experimental Design' page. — James A. Read More Testimonials >> StatSoft has freely provided the Electronic Statistics Textbook as a public service since 1995. Proper citation:


OpenEpi provides statistics for counts and measurements in descriptive and analytic studies, stratified analysis with exact confidence limits, matched-pair and person-time analysis, sample size and power calculations, random numbers, sensitivity, specificity and other evaluation statistics, R x C tables, chi-square for dose-response, and links to other useful sites. OpenEpi is free and open-source software for epidemiologic statistics. It can be run from a web server or downloaded and run without a web connection; a server is not required.

Nakagami distribution The Nakagami distribution, or Nakagami-m distribution, is a probability distribution related to the gamma distribution. It has two parameters: a shape parameter m ≥ 1/2 and a second parameter controlling spread, Ω > 0. Its probability density function (pdf) is[1]

$$f(x; m, \Omega) = \frac{2m^m}{\Gamma(m)\,\Omega^m}\, x^{2m-1} \exp\left(-\frac{m}{\Omega} x^2\right), \quad x \ge 0.$$
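As a minimal sketch (Python, not part of the original source), the Nakagami density with shape m ≥ 1/2 and spread Ω > 0, f(x; m, Ω) = 2m^m / (Γ(m) Ω^m) · x^(2m−1) · exp(−m x²/Ω) for x ≥ 0, can be evaluated in log space. The sampler relies on the standard fact that if X is Nakagami(m, Ω), then X² follows a gamma distribution with shape m and scale Ω/m, which is the gamma connection mentioned in the excerpt. Both function names are hypothetical.

```python
import math
import random


def nakagami_pdf(x: float, m: float, omega: float) -> float:
    """Nakagami density f(x; m, omega) for shape m >= 0.5 and spread omega > 0."""
    if m < 0.5 or omega <= 0:
        raise ValueError("require m >= 0.5 and omega > 0")
    if x < 0:
        return 0.0
    # Log of the constant 2 m^m / (Gamma(m) omega^m); log space avoids overflow for large m.
    log_coef = math.log(2.0) + m * math.log(m) - math.lgamma(m) - m * math.log(omega)
    if x == 0:
        # x^(2m-1) is 0 at x = 0 for m > 0.5; for m = 0.5 it is 1 (the half-normal case).
        return math.exp(log_coef) if m == 0.5 else 0.0
    return math.exp(log_coef + (2 * m - 1) * math.log(x) - m * x * x / omega)


def nakagami_sample(m: float, omega: float, rng: random.Random) -> float:
    """Draw one variate via X^2 ~ Gamma(shape=m, scale=omega/m), then take the square root."""
    return math.sqrt(rng.gammavariate(m, omega / m))


# Hypothetical usage: the average of X^2 over many draws should be close to omega.
rng = random.Random(0)
draws = [nakagami_sample(2.0, 3.0, rng) for _ in range(20000)]
print(sum(d * d for d in draws) / len(draws))
```

Because E[X²] = Ω, the printed average should land near 3 for these parameters.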

Margin of Error and Confidence Levels Made Simple Pamela Hunter February 26, 2010 A survey is a valuable assessment tool in which a sample is selected and information from the sample can then be generalized to a larger population. Surveying has been likened to taste-testing soup – a few spoonfuls tell what the whole pot tastes like. The key to the validity of any survey is randomness. Just as the soup must be stirred in order for the few spoonfuls to represent the whole pot, when sampling a population, the group must be stirred before respondents are selected.
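The idea in the title can be made concrete with the standard normal-approximation margin of error for a sample proportion, z·√(p̂(1 − p̂)/n). This Python sketch assumes a simple random sample (the well-stirred pot) that is large enough for the normal approximation; the function name is hypothetical, and the formula is the common textbook one, not necessarily the one the article goes on to use.

```python
import math
from statistics import NormalDist


def margin_of_error(p_hat: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    if not 0.0 <= p_hat <= 1.0 or n <= 0 or not 0.0 < confidence < 1.0:
        raise ValueError("need 0 <= p_hat <= 1, n > 0 and 0 < confidence < 1")
    # Two-sided critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical poll: 1,000 respondents, observed proportion 0.5.
moe = margin_of_error(0.5, 1000)
print(f"±{moe:.1%}")  # prints ±3.1%
```

Because n sits under a square root, quadrupling the sample to 4,000 only halves the margin.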

The Central Limit Theorem To understand the wildness of samples, we would choose thousands of samples, calculate an x-bar for each, and display the x-bars in a histogram. This histogram represents a sampling distribution, and when we look at it we see something truly amazing. Sampling distributions tend to be far less variable, or wild, than the populations they are drawn from (see Fig. 1A, 1B, 1C and 1D); for samples of size n, the variance of x-bar is the population variance divided by n. They also have essentially the same mean as the population. Sampling distributions drawn from a uniformly distributed population start to look like normal distributions even with a sample size as small as 2 (see Fig. 1B). If the sample size is large enough, they form nearly perfect normal distributions (see Fig. 1C).

Autoregressive conditional heteroskedasticity: ARCH(q) model specification. Suppose one wishes to model a time series using an ARCH process. Let ε_t denote the error terms (return residuals, with respect to a mean process), i.e., the series terms.

December 31, 2006. The Experiment: On Saturday, December 9th, I decided to run an experiment. The experiment was intended to do several things. It needed to chronicle the Digg Effect. This has been done many times before, so I needed to come up with something that would provide more information than a typical traffic chart. I wanted to know more about the Digg community and how most people use the site. There has been plenty of coverage of the Top Users, but nothing that really shows their stats in the context of the entire Digg user base. It also needed to determine the profitability of "blog spamming" by tracking the ad revenue of one Google AdSense advertisement while the page was linked from the Digg front page. We have all seen people post a summary of a news story on their ad-invested blog and post the link to Digg.
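The Central Limit Theorem thought experiment in this excerpt (draw thousands of samples, compute an x-bar for each, look at their spread) can be sketched as a small simulation. This is a minimal Python illustration, assuming a Uniform(0, 1) population like the one behind Fig. 1B; the function name and the sample counts are hypothetical choices, not the article's.

```python
import random
import statistics


def sample_means(n: int, num_samples: int, rng: random.Random) -> list[float]:
    """Draw num_samples samples of size n from Uniform(0, 1) and return each sample's x-bar."""
    return [statistics.fmean(rng.random() for _ in range(n)) for _ in range(num_samples)]


rng = random.Random(42)
for n in (1, 2, 10, 30):
    xbars = sample_means(n, 10_000, rng)
    # Population mean is 0.5 and population variance is 1/12,
    # so theory predicts mean(x-bar) = 0.5 and var(x-bar) = (1/12) / n.
    print(n, round(statistics.fmean(xbars), 3),
          round(statistics.variance(xbars), 5), round(1 / 12 / n, 5))
```

Across the rows, the mean of the x-bars should stay near 0.5 while their variance shrinks roughly like (1/12)/n, which is the narrowing the histograms describe. Binning `xbars` into a histogram would reproduce the shapes in the figures.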

Sample Size Calculator - Confidence Level, Confidence Interval, Sample Size, Population Size, Relevant Population - Creative Research Systems This Sample Size Calculator is presented as a public service of Creative Research Systems survey software. You can use it to determine how many people you need to interview in order to get results that reflect the target population as precisely as needed. You can also find the level of precision you have in an existing sample. Before using the sample size calculator, there are two terms that you need to know.
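The calculator's inner workings are not shown in this excerpt, but a common textbook formula for the sample size needed to estimate a proportion is n₀ = z²·p(1 − p)/c², where z is the critical value for the chosen confidence level, c is the desired confidence interval (margin) expressed as a proportion, and p is the expected proportion (0.5 is the conservative choice). For a finite population N it is often adjusted to n₀/(1 + (n₀ − 1)/N). The Python sketch below implements that formula under those assumptions; the function name is hypothetical, and it is not claimed to reproduce Creative Research Systems' calculator exactly.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size(confidence: float, interval: float, p: float = 0.5,
                population: Optional[int] = None) -> int:
    """Respondents needed so a proportion is estimated within +/- interval."""
    if not 0.0 < confidence < 1.0 or interval <= 0 or not 0.0 <= p <= 1.0:
        raise ValueError("invalid confidence, interval or p")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    ss = z * z * p * (1.0 - p) / (interval * interval)
    if population is not None:
        # Finite population correction: smaller populations need fewer respondents.
        ss = ss / (1.0 + (ss - 1.0) / population)
    # Round up so the target precision is actually reached.
    return math.ceil(ss)


print(sample_size(0.95, 0.05))                   # very large population
print(sample_size(0.95, 0.05, population=1000))  # population of 1,000
```

With 95% confidence and an interval of ±5 percentage points, this gives 385 for a very large population and 278 for a population of 1,000.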

Lévy flight The term "Lévy flight" was coined by Benoît Mandelbrot,[1] who used this for one specific definition of the distribution of step sizes. He used the term Cauchy flight for the case where the distribution of step sizes is a Cauchy distribution,[2] and Rayleigh flight for when the distribution is a normal distribution[3] (which is not an example of a heavy-tailed probability distribution). Later researchers have extended the use of the term "Lévy flight" to include cases where the random walk takes place on a discrete grid rather than on a continuous space.[4][5] A Lévy flight is a random walk in which the steps are defined in terms of the step-lengths, which have a certain probability distribution, with the directions of the steps being isotropic and random. The particular case for which Mandelbrot used the term "Lévy flight"[1] is defined by the survivor function (commonly known as the survival function) of the distribution of step-sizes, U, being[6] for some k satisfying 1 < k < 3.
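The Lévy flight definition above (isotropic random directions, step lengths drawn from a heavy-tailed distribution) translates directly into a simulation. The sketch below is an illustration, not Mandelbrot's exact specification. It assumes a Pareto tail Pr(U > u) = u^(−alpha) for u ≥ 1, with a hypothetical exponent `alpha` whose relation to the k in the text is not asserted here, and it draws steps in two dimensions by inverse-transform sampling.

```python
import math
import random


def levy_flight(num_steps: int, alpha: float, rng: random.Random) -> list[tuple[float, float]]:
    """2-D random walk with isotropic directions and Pareto-tailed step lengths.

    Step lengths U satisfy Pr(U > u) = u**(-alpha) for u >= 1.
    `alpha` is a hypothetical parameter name chosen for this sketch.
    """
    x, y = 0.0, 0.0
    path = [(x, y)]
    for _ in range(num_steps):
        # Inverse transform: if R ~ Uniform(0, 1], then R**(-1/alpha) has the tail above.
        r = 1.0 - rng.random()  # in (0, 1], so the power below never divides by zero
        step = r ** (-1.0 / alpha)
        theta = rng.uniform(0.0, 2.0 * math.pi)  # isotropic (uniformly random) direction
        x += step * math.cos(theta)
        y += step * math.sin(theta)
        path.append((x, y))
    return path


path = levy_flight(1000, 1.5, random.Random(0))
print(path[-1])
```

Occasional very long jumps separated by clusters of short steps are the signature of the heavy tail. Drawing each coordinate increment as an independent `rng.gauss(0, 1)` instead gives the Gaussian (Rayleigh-flight) case for comparison.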

Mediation (David A. Kenny) Some might benefit from Muthén (2011). Note that if the model is linear, the assumptions are met, and there is no XM interaction affecting Y, then both the CDE and the NDE equal the regression slope earlier called path c', the NIE equals ab, and the TE equals ab + c'. Thus, when the specifications made by the traditional mediation approach (e.g., linearity, no omitted variables, no XM interaction) hold, the two approaches give the same estimates. Here I give the general formulas for the NDE and NIE when X is intervally measured, based on Valeri & VanderWeele (2013).
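To make the linear special case concrete, the Python sketch below fits the two ordinary least squares regressions of the traditional mediation approach on simulated data and reads off NDE = c', NIE = ab and TE = ab + c'. It assumes linearity, no XM interaction and no omitted variables, exactly the conditions listed in the mediation excerpt. The general Valeri & VanderWeele formulas are not implemented, and the data-generating coefficients (0.6, 0.5, 0.3) and the function names are hypothetical.

```python
import random


def _cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((p - mu) * (q - mv) for p, q in zip(u, v)) / (len(u) - 1)


def linear_mediation(x, m, y):
    """OLS paths for M = i1 + a*X and Y = i2 + c'*X + b*M (no XM interaction)."""
    sxx, smm, sxm = _cov(x, x), _cov(m, m), _cov(x, m)
    sxy, smy = _cov(x, y), _cov(m, y)
    a = sxm / sxx                          # path a: slope of M on X
    det = sxx * smm - sxm * sxm            # 2x2 normal equations for Y on X and M
    c_prime = (smm * sxy - sxm * smy) / det
    b = (sxx * smy - sxm * sxy) / det
    c_total = sxy / sxx                    # simple regression of Y on X (path c)
    return {"a": a, "b": b, "c_prime": c_prime,
            "NDE": c_prime, "NIE": a * b, "TE": a * b + c_prime, "c_total": c_total}


# Hypothetical simulated data with a = 0.6, b = 0.5, c' = 0.3.
rng = random.Random(7)
x = [rng.gauss(0, 1) for _ in range(20000)]
m = [0.6 * xi + rng.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.5 * mi + rng.gauss(0, 1) for xi, mi in zip(x, m)]
print(linear_mediation(x, m, y))
```

In OLS the identity c = c' + ab holds exactly within the sample, so `TE` and `c_total` agree up to rounding error, while the individual path estimates land near the simulated 0.6, 0.5 and 0.3.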

Adolescent and School Health: YRBSS Data & Documentation (CDC)