Probability and Statistics

Cross Validation. Cross validation is a model evaluation method that is better than residuals.

The problem with residual evaluations is that they do not give an indication of how well the learner will do when it is asked to make new predictions for data it has not already seen. One way to overcome this problem is to not use the entire data set when training a learner. Some of the data is removed before training begins. Then when training is done, the data that was removed can be used to test the performance of the learned model on "new" data.

Www-stat.stanford.edu/~susan/courses/s200/lectures/lect11.pdf.

Maximum likelihood. In statistics, maximum-likelihood estimation (MLE) is a method of estimating the parameters of a statistical model.
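
For the cross validation excerpt above, the holdout idea (remove some data before training, then test on it) can be sketched roughly as follows. This is only an illustration: the data, the toy "learner" that predicts the training mean, and the names `holdout_split`, `fit_mean_model` and `mean_absolute_error` are all assumptions made up here, not part of the original page.

```python
import random

def holdout_split(data, test_fraction=0.3, seed=0):
    """Shuffle the data and split it into a training part and a held-out part."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # The first n_test shuffled points are removed before training.
    return shuffled[n_test:], shuffled[:n_test]

def fit_mean_model(train):
    """Toy learner (an assumption): always predict the mean training target."""
    ys = [y for _, y in train]
    return sum(ys) / len(ys)

def mean_absolute_error(prediction, points):
    """Average absolute difference between the prediction and the targets."""
    return sum(abs(y - prediction) for _, y in points) / len(points)

# Hypothetical (x, y) data, invented for illustration only.
data = [(x, 2.0 * x + 1.0) for x in range(20)]

train, test = holdout_split(data)
model = fit_mean_model(train)

train_error = mean_absolute_error(model, train)  # residual-style error on seen data
test_error = mean_absolute_error(model, test)    # error on the held-out "new" data
print("training error:", train_error)
print("held-out error:", test_error)
```

The held-out error is the quantity of interest here, since it is measured on points the learner never saw during training.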

When applied to a data set and given a statistical model, maximum-likelihood estimation provides estimates for the model's parameters. The method of maximum likelihood corresponds to many well-known estimation methods in statistics. For example, one may be interested in the heights of adult female penguins, but be unable to measure the height of every single penguin in a population due to cost or time constraints.
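
As a minimal sketch of what such estimates look like, suppose (as an assumption for this example) that the heights follow a normal distribution with unknown mean and variance. The maximum-likelihood estimates are then the sample mean and the average squared deviation from it. The sample values below are hypothetical, and the function names are chosen here for illustration.

```python
import math

def normal_mle(sample):
    """Maximum-likelihood estimates of the mean and variance of a normal model."""
    n = len(sample)
    mu_hat = sum(sample) / n
    # The MLE of the variance divides by n (not n - 1).
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, sigma2_hat

def normal_log_likelihood(sample, mu, sigma2):
    """Log-likelihood of an i.i.d. normal sample for given parameters."""
    n = len(sample)
    squared_error = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - squared_error / (2 * sigma2)

# Hypothetical sample of measured heights in cm (invented values).
heights_cm = [61.0, 64.5, 58.2, 66.1, 62.7, 60.3]

mu_hat, sigma2_hat = normal_mle(heights_cm)
print("estimated mean:", mu_hat)
print("estimated variance:", sigma2_hat)
print("log-likelihood at the estimates:",
      normal_log_likelihood(heights_cm, mu_hat, sigma2_hat))
```

Any other choice of mean or variance gives a log-likelihood no larger than the value at these estimates, which is what "maximum likelihood" refers to.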

Poisson distribution. In probability theory and statistics, the Poisson distribution (French pronunciation [pwasɔ̃]; in English usually /ˈpwɑːsɒn/), named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate and independently of the time since the last event.[1] The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.
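
The probability described above can be computed from the standard Poisson probability mass function, P(k) = rate^k * exp(-rate) / k!. The sketch below tabulates it for an average rate of 4 events per interval; the rate is an illustrative value and `poisson_pmf` is a name chosen here.

```python
import math

def poisson_pmf(k, rate):
    """Probability of exactly k events in an interval with the given average rate."""
    return rate ** k * math.exp(-rate) / math.factorial(k)

rate = 4.0  # illustrative average number of events per interval

# Tabulate the probability of seeing 0, 1, ..., 8 events.
for k in range(9):
    print(k, round(poisson_pmf(k, rate), 4))
```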

For instance, suppose someone typically gets 4 pieces of mail per day on average.

Mercury.bio.uaf.edu/courses/wlf625/readings/MLEstimation.PDF.

People.physics.anu.edu.au/~tas110/Teaching/Lectures/L3/Material/Myung03.pdf.

IBM SPSS Statistics 19. Type: Applications > Windows. Size: 475.98 MiB (499099798 bytes).

Random assignment. Random assignment or random placement is an experimental technique for assigning subjects to different treatments (or no treatment).

The thinking behind random assignment is that randomizing treatment assignment makes the group attributes for the different treatments roughly equivalent, so any effect observed between treatment groups can be linked to the treatment and not to a characteristic of the individuals in the group. In experimental design, random assignment of participants to treatment and control groups helps to ensure that any differences between and within the groups are not systematic at the outset of the experiment. Random assignment does not guarantee that the groups are "matched" or equivalent, only that any differences are due to chance.
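
One simple way to carry out random assignment is to shuffle the list of subjects and deal them out to the groups in turn. The sketch below does this for a treatment and a control group; the subject labels, the group names and the function `randomly_assign` are hypothetical, chosen only for illustration.

```python
import random

def randomly_assign(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, subject in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(subject)
    return assignment

# Hypothetical subject labels.
subjects = [f"subject_{i:02d}" for i in range(1, 13)]

assignment = randomly_assign(subjects, seed=42)
for group, members in assignment.items():
    print(group, members)
```

Dealing from a shuffled list keeps the group sizes equal (or within one of each other). The groups can still differ in their attributes, but only by chance.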

Prediction interval. In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed.

Prediction intervals are often used in regression analysis. They are used in both frequentist and Bayesian statistics: a prediction interval bears the same relationship to a future observation that a frequentist confidence interval or Bayesian credible interval bears to an unobservable population parameter. Prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals of parameters predict the distribution of estimates of the true population mean or other quantity of interest that cannot be observed.

The distinction between confidence intervals, prediction intervals and tolerance intervals. When you fit a parameter to a model, the accuracy or precision can be expressed as a confidence interval, a prediction interval or a tolerance interval.

The three are quite distinct. The discussion below explains the three different intervals for the simple case of fitting a mean to a sample of data (assuming sampling from a Gaussian distribution). The same ideas can be applied to intervals for any best-fit parameter determined by regression. Confidence intervals tell you about how well you have determined the mean. Assume that the data really are randomly sampled from a Gaussian distribution.
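
For the simple case of a mean fitted to a Gaussian sample, the confidence interval and the prediction interval can be sketched with the usual textbook formulas: mean ± t·s/√n for the mean, and mean ± t·s·√(1 + 1/n) for a single future observation. The sample values below are hypothetical, the function name is chosen here, and the critical value 2.262 (two-sided 95%, Student's t with 9 degrees of freedom) is taken from a standard t table rather than computed. Tolerance intervals need a separate tabulated factor and are not covered.

```python
import math
import statistics

def mean_intervals(sample, t_crit):
    """Confidence interval for the mean and prediction interval for one new value."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    ci_half = t_crit * sd / math.sqrt(n)          # uncertainty in the mean
    pi_half = t_crit * sd * math.sqrt(1 + 1 / n)  # adds the scatter of a new point
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)

# Hypothetical sample of n = 10 measurements.
sample = [9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5, 10.0, 10.3]

# Two-sided 95% critical value of Student's t with n - 1 = 9 degrees of freedom,
# looked up in a t table (an input to this sketch, not computed here).
T_CRIT_95_DF9 = 2.262

ci, pi = mean_intervals(sample, T_CRIT_95_DF9)
print("95% confidence interval for the mean:", ci)
print("95% prediction interval for a new observation:", pi)
```

The prediction interval is always wider than the confidence interval, because it covers the scatter of an individual point as well as the uncertainty in the mean.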

EXCEL 2007: Two-Variable Regression Using Data Analysis Add-in. A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis. This January 2009 help sheet gives information on two-variable linear regression. Run the regression using the Data Analysis Add-in.

EXCEL 2007 Basics: Access and Activating Data Analysis Add-in. EXCEL: Access and Activating the Data Analysis Toolpack. A. Colin Cameron, Dept. of Economics, Univ. of Calif. - Davis. This January 2009 help sheet gives information on Excel access at U.C.-Davis, adding in the Data Analysis Toolpack, and Excel documentation. UCD computer labs have Excel.