An experiment that can result in different outcomes, even though it is repeated in the same manner every time, is called a **random experiment**.

The set of all possible outcomes of a random experiment is called the **sample space** of the experiment.

An **event** is a subset of the sample space of a random experiment.

A sample space is **discrete** if it consists of a finite (or countably infinite) set of outcomes.

**Probability** is a measure of certainty on a scale of 0 to 1. The probability of an impossible event is 0; the probability of an inevitable event is 1. If A and B are events, then P(A+B) = P(A) + P(B) - P(A.B), where A+B denotes set union and A.B denotes set intersection. Any one of the following three definitions of probability can be used to assign a probability to an event that is neither impossible nor inevitable.

The **relative frequency definition of probability** applies when the sample space consists of elementary outcomes which, through physical symmetry, are recognized as being equally likely. The probability of an event E is the number of elementary outcomes in E divided by the number of elementary outcomes in the sample space.

The **limiting frequency definition of probability** applies when you can envisage a sequence of independent trials. Consider the number of trials that result in an event E, divided by the total number of trials. The probability of E is the hypothetical limit to which any such series of trials will tend.

The **subjective definition of probability** defines your personal probability of an event E as the maximum amount of money you are willing to bet in order to win $1 if E occurs.
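As a small illustration of counting equally likely outcomes and of the addition rule P(A+B) = P(A) + P(B) - P(A.B), the sketch below uses two fair dice. The dice, the events `A` and `B`, and the helper `prob` are assumptions chosen for the example, not part of the definitions.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 ordered outcomes of two fair six-sided dice.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Outcomes in the event divided by outcomes in the sample space."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {o for o in sample_space if o[0] + o[1] == 7}   # the sum is 7
B = {o for o in sample_space if o[0] == 6}          # the first die shows 6

p_union = prob(A | B)                               # P(A+B), counted directly
p_rule = prob(A) + prob(B) - prob(A & B)            # P(A) + P(B) - P(A.B)
print(p_union, p_rule)
```

Exact fractions are used so that the two sides of the rule can be compared without rounding error.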

A **random variable** is a function that assigns a real number to each outcome in the sample space of a random experiment.

A **discrete random variable** is a random variable with a finite (or countably infinite) range.

A **continuous random variable** is a random variable with an interval (either finite or infinite) of real numbers for its range.

The **probability density function** for a random variable X is a non-negative function f(x) which gives the relative probability of each point x in the sample space of X. It integrates to 1 over the whole sample space. The integral over any subset of the sample space gives the probability that X will fall in that subset.

The probability density function for a discrete-valued random variable is sometimes called a **probability mass function**.

A **parameter** is a scalar or vector that indexes a family of probability distributions.

The **expected value** or **mean** or **average** of a random variable is computed as a sum (or integral) over all possible values of the random variable, each weighted by the probability of getting that value. It can be interpreted as the centre of mass of the probability distribution.

The **variance** of a random variable is the expected squared deviation from the mean.

The **covariance** of two random variables is the expected product of their deviations from their respective means.

The **correlation coefficient** is a dimensionless measure of association between two random variables. Pearson's correlation coefficient is computed as their covariance divided by the product of their standard deviations. It ranges from +1 (perfect linear relationship with positive slope) to -1 (perfect linear relationship with negative slope), with 0 indicating no linear relationship.
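The definitions of mean, variance, covariance and correlation can be traced through a small discrete example. The joint probability mass function `joint_pmf` below is hypothetical, and `expect` is a helper name introduced only for this sketch.

```python
from math import sqrt

# Hypothetical joint probability mass function of (X, Y): {(x, y): probability}.
joint_pmf = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """Expected value of g(X, Y): each value weighted by its probability."""
    return sum(p * g(x, y) for (x, y), p in joint_pmf.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)               # expected squared deviation
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))    # expected product of deviations
corr_xy = cov_xy / sqrt(var_x * var_y)                       # Pearson's correlation
print(mean_x, var_x, cov_xy, corr_xy)
```

Here X and Y agree with probability 0.8, so the covariance is positive and the correlation comes out at about 0.6.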

A **time series** is a sequence of observations ordered in time or space.

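**Autocorrelation** is correlation between observations in a time series that are a fixed number of steps (the lag) apart; the simplest case is the correlation between consecutive observations. A sequence of independent observations has zero autocorrelation.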
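One common way to estimate autocorrelation from data is sketched below. The estimator (lagged products of deviations divided by the total sum of squared deviations) is one of several in use, and the two series are made up for illustration.

```python
def autocorrelation(series, lag=1):
    """Sample autocorrelation at the given lag (one common estimator)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    numer = sum((series[i] - mean) * (series[i + lag] - mean)
                for i in range(n - lag))
    return numer / denom

trend = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]             # neighbours are alike
alternating = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]   # neighbours are opposite

print(autocorrelation(trend), autocorrelation(alternating))
```

The trending series gives a positive lag-1 value and the alternating series a negative one.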

When approximating a discrete distribution (defined on integer values) by a continuous distribution, the sum of probabilities up to and including the probability at a point *x* is usually approximated by the area under the continuous distribution up to *x* + 0.5; the 0.5 is called the **continuity correction**.

A **population** consists of the totality of the observations with which we are concerned.

A **sample** is a subset of observations selected from the population.

A **statistic** is any function of the observations in a sample. It may not include any unknown parameters.

The distribution of a statistic is called a **sampling distribution**. It describes how the statistic will vary from one sample to another.

A statistic will be of use in a given application only if its sampling distribution depends on the parameter of interest. If it also depends on a parameter that is not of interest, that parameter is called a **nuisance parameter**.

A **pivotal quantity** is a function of a statistic and the parameter of interest that follows a standard distribution. The distribution may not include any unknown parameters.

A **confidence interval** is a random interval which includes the true value of the parameter of interest with probability 1 - α. When we have computed a confidence interval from data, it is fixed by the data and no longer random, so we say that we are 100(1 - α)% *confident* that it includes the true value of the parameter.
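A minimal sketch of one such interval follows, assuming a normal sample whose standard deviation `sigma` is known, so that the pivotal quantity (sample mean - mu) / (sigma / sqrt(n)) is standard normal. The function name, the data and the value of `sigma` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample, sigma, alpha=0.05):
    """100(1 - alpha)% interval for a normal mean, assuming sigma is known."""
    n = len(sample)
    mean = sum(sample) / n
    z = NormalDist().inv_cdf(1 - alpha / 2)    # standard normal quantile
    half_width = z * sigma / sqrt(n)
    return mean - half_width, mean + half_width

data = [9.8, 10.2, 10.1, 9.9, 10.4, 9.6, 10.0, 10.3]   # hypothetical observations
low, high = z_confidence_interval(data, sigma=0.3)
print(low, high)
```

Before the data are seen, the endpoints are random and cover the true mean with probability 1 - alpha; the printed pair is one fixed realization.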

In parametric statistical inference, an **hypothesis** is a statement about the parameters of a probability distribution.

A **null hypothesis** states that there is no difference between the hypothesized value of a parameter and its true value.

The **alternative hypothesis** is an hypothesis that applies when the null hypothesis is false.

A **simple hypothesis** specifies a single value for a parameter; a **composite hypothesis** specifies more than one value, or a range of values.

A **test statistic** can be derived from a pivotal quantity by replacing the unknown parameter by its hypothesized value.

The distribution of a test statistic when the null hypothesis is true is called the **reference distribution** for the test.

Rejecting the null hypothesis when it is true is defined as a **type I error**. Failing to reject the null hypothesis when it is false is defined as a **type II error**.

In an accept-reject test of hypothesis, the conditional probability of committing a type I error, given that the null hypothesis is true, is called the **level of significance** of the test.

In an accept-reject test of hypothesis, the conditional probability of rejecting the null hypothesis, given that the alternative hypothesis is true, is called the **power** of the test. The **type II error rate** is computed as (1-power).
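To make level of significance, power and type II error rate concrete, the sketch below works them out for an upper-tailed z-test of a normal mean with known standard deviation. The function `z_test_power` and all numerical values are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the upper-tailed z-test of H0: mu = mu0 when the true mean is mu1."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)     # reject H0 when z > critical
    shift = (mu1 - mu0) / (sigma / sqrt(n))      # mean of z when the true mean is mu1
    return 1 - std_normal.cdf(critical - shift)

power = z_test_power(mu0=10.0, mu1=10.5, sigma=1.0, n=25)
type_ii_rate = 1 - power
print(power, type_ii_rate)
```

Setting `mu1` equal to `mu0` returns the rejection probability under the null hypothesis, which is the level of significance `alpha`.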

There are three definitions of **P-value**. Satisfy yourself that all three mean exactly the same thing.

(1) The **P-value** is the *smallest* level of significance that will lead to *rejection* of the null hypothesis with the given data.

(2) The **P-value** is the *largest* level of significance that will lead to *acceptance* of the null hypothesis with the given data.

(3) The **P-value** is the probability of getting a value of the test statistic as extreme as, or more extreme than, the value observed, *if the null hypothesis were true*. The alternative hypothesis determines the direction of "extreme".
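The agreement between the three definitions can be checked numerically. The sketch below assumes a two-sided z-test of a normal mean with known standard deviation; the function names and the data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic for H0: mu = mu0 when sigma is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))

def two_sided_p_value(z):
    """Definition (3): probability of a value at least as extreme as z under H0."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def rejects(z, alpha):
    """Accept-reject decision of the two-sided test at significance level alpha."""
    return abs(z) >= STD_NORMAL.inv_cdf(1 - alpha / 2)

z = z_statistic(sample_mean=10.4, mu0=10.0, sigma=1.0, n=25)   # hypothetical data
p_value = two_sided_p_value(z)

# Definitions (1) and (2): levels above the P-value reject, levels below accept.
for alpha in (0.10, 0.05, 0.04, 0.01):
    print(alpha, rejects(z, alpha), alpha >= p_value)
```

In each printed row the accept-reject decision matches the comparison of `alpha` with the P-value, which is what definitions (1) and (2) assert.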

A statistical method is said to be **robust** if it does what it is supposed to do even when the assumptions on which it is based are not satisfied. (For example, the z-test for a normal mean when the variance is known is robust against non-normality, but not against dependent data or an incorrectly specified variance.)

In the **simple linear regression** model, the conditional mean of the **dependent variable** (also called the **Y-variable** or **response variable** or **predicted variable**) is a linear function of a single **independent variable** (also called the **X-variable** or **explanatory variable** or **predictor variable** or **covariate**).

The term **regression** comes from breeding experiments. If inheritance were perfect, plotting a characteristic of an offspring against the same characteristic in the parent would give points along the diagonal. In reality, offspring tend to "regress" towards the population mean, so that offspring of superior parents tend to be less superior than their parents and offspring of inferior parents tend to be less inferior than their parents, hence the points will lie along a line with slope less than 1. This was called the "regression line" and fitted by least squares. Now, any model fitting with least squares is called "regression".
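A least-squares line of the kind described can be sketched in a few lines. The parent and offspring values below are invented to mimic regression towards the mean, and `least_squares_line` is a helper name introduced for this example.

```python
def least_squares_line(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x      # the line passes through the two means
    return intercept, slope

parent = [60, 64, 68, 72, 76]                # hypothetical parent measurements
offspring = [64, 66, 68, 70, 72]             # hypothetical offspring measurements
intercept, slope = least_squares_line(parent, offspring)
print(intercept, slope)
```

With these made-up values the slope is 0.5, less than the slope of 1 that perfect inheritance would give.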

In the **multiple linear regression** model, the conditional mean of the **dependent variable** is a linear function of more than one **independent variable**.

A categorical independent variable is called a **factor**. The categories are called the **levels** of the factor.

**Replications** are experimental observations made under the same conditions, that is, under the same combination of factor levels.

An experimental design is said to be **balanced** if each combination of factor levels is replicated the same number of times.

The case where the variance of a random variable is the same in each subpopulation, or at any given level of the covariates, is called **homoscedastic**. The contrary case is called **heteroscedastic**.

In **analysis of variance**, or **ANOVA**, the **sum of squared deviations** of the dependent variable about its mean is broken down into a sum of terms, each term a sum of squared deviations representing the variation attributable to an explanatory variable, and the **residual**, or **unexplained**, **variation**.

The terms **residual sum of squares** or **error sum of squares** apply to the sum of squared deviations after a model has been fitted, whether or not the model is correct, even if there are possible explanatory variables that have not been put into the model.

A residual sum of squares based on replication is sometimes called a **pure error sum of squares** because there is no explanation for the observed variability other than random error.

The **degrees of freedom** of a sum of squared deviations is the number of squares in the sum, minus the number of fitted parameters in the expected values about which the deviations are computed.

A **mean square** is a sum of squares divided by its degrees of freedom.
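The breakdown of sums of squares, degrees of freedom and mean squares can be followed in a small one-factor example. The data are hypothetical: a balanced design with one factor at three levels and three replications per level.

```python
# Hypothetical balanced design: one factor with levels A, B, C, three replications each.
groups = {"A": [4.0, 5.0, 6.0], "B": [7.0, 8.0, 9.0], "C": [1.0, 2.0, 3.0]}

all_values = [y for ys in groups.values() for y in ys]
n = len(all_values)
grand_mean = sum(all_values) / n

def mean(ys):
    return sum(ys) / len(ys)

# Total variation about the grand mean, and its two components.
total_ss = sum((y - grand_mean) ** 2 for y in all_values)
factor_ss = sum(len(ys) * (mean(ys) - grand_mean) ** 2 for ys in groups.values())
residual_ss = sum((y - mean(ys)) ** 2 for ys in groups.values() for y in ys)

# Degrees of freedom: number of squares minus number of fitted parameters.
total_df = n - 1                   # one fitted grand mean
residual_df = n - len(groups)      # one fitted mean per factor level
factor_df = len(groups) - 1

factor_ms = factor_ss / factor_df
residual_ms = residual_ss / residual_df
print(total_ss, factor_ss, residual_ss, factor_ms, residual_ms)
```

Because every observation here is a replication at some factor level, the residual sum of squares in this sketch is also a pure error sum of squares.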

The Bin(n, p) distribution can be approximated by the Pois(np) distribution when n is large and p is small.
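The quality of the Poisson approximation can be checked directly for particular values. The choice n = 1000 and p = 0.003 below is an arbitrary illustration of "n large and p small".

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Pois(mean)."""
    return exp(-mean) * mean**k / factorial(k)

n, p = 1000, 0.003     # arbitrary example values
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p))
              for k in range(21))
print(max_gap)
```

The largest pointwise difference between the two probability mass functions over k = 0, ..., 20 is small compared with the probabilities themselves.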

The Bin(n, p) distribution can be approximated by the N(np, np(1-p)) distribution when p < 0.5 and np > 5, or when q < 0.5 and nq > 5, where q = 1-p. The continuity correction is recommended.
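The effect of the continuity correction can be seen by comparing an exact binomial probability with the two normal approximations. The values n = 40, p = 0.3 and x = 10 are arbitrary choices that satisfy p < 0.5 and np > 5.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, x = 40, 0.3, 10      # arbitrary example values; np = 12

# Exact P(X <= x) for X ~ Bin(n, p).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Normal distribution with the same mean and variance.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
without_correction = approx.cdf(x)
with_correction = approx.cdf(x + 0.5)      # area up to x + 0.5

print(exact, without_correction, with_correction)
```

For these values the corrected area is noticeably closer to the exact probability than the uncorrected one.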

The Pois(m) distribution can be approximated by the N(m, m) distribution when m > 5. The continuity correction is recommended.

A process in time (or space) where events happen one at a time, at random, independently of each other, at a constant average rate λ, is called a **Poisson process**. The number of events in a fixed time interval of length t follows a **Poisson distribution** with mean λt. The time between events, or from an arbitrary time to the next event, follows an **exponential distribution** with mean 1/λ.
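The link between exponential waiting times and Poisson counts can be explored by simulation. In the sketch below the rate, the interval length, the seed and the function name `count_events` are all assumptions for illustration; it generates exponential gaps and counts how many events land in the interval.

```python
import random

def count_events(rate, t, rng):
    """Number of events in (0, t] when gaps between events are exponential with mean 1/rate."""
    count = 0
    clock = rng.expovariate(rate)        # time of the first event
    while clock <= t:
        count += 1
        clock += rng.expovariate(rate)   # add the gap to the next event
    return count

rng = random.Random(12345)               # fixed seed so the run is repeatable
rate, t = 2.0, 3.0                       # hypothetical rate and interval length
counts = [count_events(rate, t, rng) for _ in range(20000)]

sample_mean = sum(counts) / len(counts)
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (len(counts) - 1)
print(sample_mean, sample_var)
```

A Poisson distribution has equal mean and variance, so both printed values should be close to rate times t, which is 6 here.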

The relations between the Normal, Chi-square, t and F distributions can be illustrated with the following identities, which you should verify in the tables:

$$z_{p} = t_{\infty,\,p}$$

$$(z_{1-p/2})^{2} = (t_{\infty,\,1-p/2})^{2} = \chi^{2}_{1,\,1-p} = F_{1,\,\infty,\,1-p} = 1/F_{\infty,\,1,\,p}$$

$$(t_{d,\,1-p/2})^{2} = F_{1,\,d,\,1-p} = 1/F_{d,\,1,\,p}$$

$$\chi^{2}_{d,\,1-p}/d = F_{d,\,\infty,\,1-p} = 1/F_{\infty,\,d,\,p}$$
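One of these identities, the one equating the square of the normal point for 1-p/2 with the chi-square point on 1 degree of freedom for 1-p, can also be checked numerically without tables. The sketch assumes that the last subscript is a cumulative (lower-tail) probability, and it uses the fact that the chi-square distribution on 1 degree of freedom has cumulative distribution function erf(sqrt(x/2)); the t and F identities are not checked here.

```python
from math import erf, sqrt
from statistics import NormalDist

def chi_square_1_cdf(x):
    """Cumulative distribution function of chi-square on 1 degree of freedom."""
    return erf(sqrt(x / 2))

for p in (0.10, 0.05, 0.01):
    z = NormalDist().inv_cdf(1 - p / 2)         # normal point with area 1 - p/2 below it
    print(p, z**2, chi_square_1_cdf(z**2))      # last column should equal 1 - p
```

For p = 0.05 the squared normal point is about 3.84, the familiar tabulated chi-square value on 1 degree of freedom.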

Statistics 3N03