Use Excel or any other spreadsheet to draw graphs of the N(m, s^2) and Bin(n, p) distributions. Put a grid of x-values in column A and compute the corresponding f(x) values in column B. Put the two parameters ((m, s) for the Normal or (n, p) for the Binomial) in cells C1 and D1, respectively, or C2 and D2 if you have put column headings in row 1. Link the parameters to the calculations in column B so that the graph will automatically be re-drawn whenever you change the parameter values.
Show that when n is large and p is not too close to 0 or 1, the shape of the Binomial is close to that of the Normal.
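The exercise asks for a spreadsheet, but the same comparison can be sketched in R as a cross-check. This is only an illustration: the values n = 100 and p = 0.3 and the variable names are assumptions chosen here, and the Normal curve uses the matching mean n*p and standard deviation sqrt(n*p*(1-p)).

```r
# Hypothetical parameter values, chosen only for illustration
n <- 100
p <- 0.3

# Normal parameters that match the Binomial mean and standard deviation
m <- n * p
s <- sqrt(n * p * (1 - p))

# Binomial probabilities on the integer grid 0..n
x <- 0:n
fbin <- dbinom(x, n, p)

# Normal density evaluated at the same grid points
fnorm <- dnorm(x, m, s)

# Draw the Binomial probabilities as vertical bars and overlay the Normal curve
plot(x, fbin, type = "h", xlab = "x", ylab = "f(x)")
lines(x, fnorm)

# Largest absolute gap between the two sets of values
max(abs(fbin - fnorm))
```

Re-running with a small n, or with p close to 0 or 1, should make the gap between the bars and the curve visibly larger.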
Use R for this one. Generate 5 observations from a N(3, 9) distribution and draw a histogram. Do this a few times. You can use the up arrow to retrieve and repeat a command.
> hist(rnorm(5, 3, 3))
Note that the third argument of rnorm() is the standard deviation (3), not the variance (9). Does the histogram look like a N(3, 9) distribution? How much does it change from one sample of 5 observations to another? Repeat the process with samples of 10, 40 and 10000 observations.
> hist(rnorm(10, 3, 3))
> hist(rnorm(40, 3, 3))
> hist(rnorm(10000, 3, 3))
You can compare the histogram to the theoretical density function by plotting both on the same graph. Set freq = F to draw the histogram as a density. You know that virtually all the probability in a normal distribution lies within 3 standard deviations of the mean. In this example the mean is 3 and the standard deviation is 3, so define a grid xgr of 100 x-values running from -6 to +12 and use dnorm(xgr, 3, 3) to compute the N(3, 9) density function over the grid. The function lines() adds lines to the existing plot without initiating a new plot.
> hist(rnorm(40, 3, 3), freq = F)
> xgr <- seq(-6, 12, length = 100)
> lines(xgr, dnorm(xgr, 3, 3))
Now do the same for an Exponential distribution with mean equal to 3.
> hist(rexp(5, 1/3))
> hist(rexp(40, 1/3), freq = F)
> xgre <- seq(0, 15, length = 100)
> lines(xgre, dexp(xgre, 1/3))
> hist(rexp(1000, 1/3), freq = F)
> lines(xgre, dexp(xgre, 1/3))
Explain what you have learned.
One way to state the Central Limit Theorem is as follows. If the sample mean x_bar is computed from n independent observations drawn from a population with mean m and standard deviation s, then the sampling distribution of x_bar (that is, the distribution that describes how x_bar varies from sample to sample) has mean m and standard deviation s/sqrt(n). It is exactly Normal if the original data are Normal, and otherwise approximately Normal when n is large and some mathematical regularity conditions hold.
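As a small worked example of the s/sqrt(n) formula, the following sketch computes the standard deviation of x_bar for a population with s = 3. The sample sizes 5 and 40 and the variable names are chosen here for illustration.

```r
s <- 3               # population standard deviation, as for N(3, 9)
n <- c(5, 40)        # two illustrative sample sizes
se <- s / sqrt(n)    # standard deviation of x_bar for each sample size
round(se, 3)         # 1.342 0.474
```

Multiplying the sample size by 8 divides the standard deviation of x_bar by sqrt(8), which is roughly 2.8.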
In MINITAB, put 1000 N(3, 9) random numbers in each of C1-C5. Looking across the rows, you will have 1000 independent samples, each of size n = 5. Use the Row Means command to compute x_bar for each row, storing the results in C6. Looking down C6 you can see how x_bar varies from sample to sample. Plot a histogram of C6 and compute the mean, variance and standard deviation. The theory says that the sampling distribution of x_bar should be exactly normal with mean = 3 and standard deviation = 3/sqrt(5). Are your results consistent with the theory? Repeat for n = 40; you could put the data in C1-C40 and the row means in C41.
Here are the same steps programmed in R. The apply() function is used here to apply the function mean() to the first dimension (i.e. the rows) of the array normdat.
> normdat <- matrix(rnorm(5000,3,3), ncol=5)
> rowmeans <- apply(normdat,1,mean)
> hist(rowmeans)
> mean(rowmeans)
> var(rowmeans)
> sqrt(var(rowmeans))
To generate 1000 samples, each of size n = 40, use
> normdat <- matrix(rnorm(40000,3,3), ncol=40)
followed by the same commands as before to compute and display the 1000 means.
Repeat this demonstration with data from the Exponential distribution with mean m = 3. We can show by integration that, for this distribution, s = m = 3. The results should be similar to those obtained for Normal data, but, by the Central Limit Theorem, the sampling distribution of x_bar will be only approximately normal, with the approximation being better for n = 40 than it is for n = 5.
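One possible way to set up the Exponential version in R is sketched below. It mirrors the Normal commands, replacing rnorm() with rexp() and using rate 1/3 so that the mean is 3. The names expdat, rowmeans5 and rowmeans40 and the call to set.seed() are assumptions added here, not part of the original exercise.

```r
# Assumption: a fixed seed, added only so that the run is repeatable
set.seed(1)

# 1000 samples of size n = 5 from an Exponential distribution with mean 3
expdat <- matrix(rexp(5000, 1/3), ncol = 5)
rowmeans5 <- apply(expdat, 1, mean)
hist(rowmeans5)
mean(rowmeans5)          # theory: 3
sqrt(var(rowmeans5))     # theory: 3/sqrt(5)

# The same steps with samples of size n = 40
expdat <- matrix(rexp(40000, 1/3), ncol = 40)
rowmeans40 <- apply(expdat, 1, mean)
hist(rowmeans40)
mean(rowmeans40)         # theory: 3
sqrt(var(rowmeans40))    # theory: 3/sqrt(40)
```

The histogram for n = 5 should still show some of the right skew of the Exponential distribution, while the one for n = 40 should look much closer to a Normal curve.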