The sample mean is used to estimate the population mean \(\mu\):

\[\overline{X}_n=\frac{\sum_{i=1}^{n} X_i}{n}\]

\[\operatorname{bias}(\overline{X}_n) = E[\overline{X}_n] - \mu = 0\]

Proof

\(E[\overline{X}_n] = \frac{1}{n} \sum_{i} E[X_i] = \frac{1}{n} n E[X_i]= E[X_i]= \mu\)

given that \(E[\sum_{i=1}^n g(X_i)]= n E[g(X_1)]\) if \(X_1,\dotsc,X_n\) are identically distributed.

The standard error of \(\overline{X}_n\) is \[se = \sqrt{V[\overline{X}_n]} = \frac{\sigma}{\sqrt{n}}\] where \(\sigma^2\) is the population variance.

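\[V[\overline{X}_n]=V\left[\frac{1}{n}\sum_i X_i\right] = \frac{1}{n^2} \sum_i V[X_i] = \frac{1}{n^2} n \sigma^2 = \frac{\sigma^2}{n}\]

The second equality requires the \(X_i\) to be independent (or at least uncorrelated), so that the variance of the sum equals the sum of the variances.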
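As a quick numerical check of the bias and standard error formulas, the following sketch simulates many samples and compares the empirical mean and standard deviation of the sample means with \(\mu\) and \(\sigma/\sqrt{n}\). The values of `mu`, `sigma`, `n`, `M` and the seed are illustrative assumptions, and base R is used for brevity.

```r
set.seed(1)          # arbitrary seed, only for reproducibility
mu <- 100            # assumed population mean
sigma <- 16          # assumed population standard deviation
n <- 7               # sample size
M <- 20000           # number of simulated samples

# Each column of the matrix is one sample of size n
samples <- matrix(rnorm(n * M, mean = mu, sd = sigma), nrow = n)
sample_means <- colMeans(samples)

empirical_bias <- mean(sample_means) - mu   # should be close to 0
empirical_se <- sd(sample_means)            # should be close to sigma / sqrt(n)
theoretical_se <- sigma / sqrt(n)

print(c(bias = empirical_bias, se = empirical_se, theory = theoretical_se))
```

With these settings the empirical standard error should land near \(16/\sqrt{7}\), and the empirical bias should be small relative to it.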

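If \(X_1,\dotsc,X_n\) are independent and normally distributed \(N(\mu,\sigma^2)\), then \(\overline{X}_n\) is distributed exactly \(N(\mu,\frac{\sigma^2}{n})\). The following simulation illustrates this with samples of size 7: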

```
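library(tidyverse)

set.seed(123)  # arbitrary seed, for reproducibility

M <- 1:2000   # simulation replicate index
n <- 1:7      # observation index within each replicate (sample size 7)
mu <- 100     # population mean
sigma <- 16   # population standard deviation

sim_sample_mean <- crossing(M, n) %>%
  group_by(M, n) %>%
  # one normal draw per (replicate, observation) pair
  mutate(x = rnorm(1, mu, sigma),
         z = (x - mu) / sigma) %>%
  group_by(M) %>%
  # sample mean of each replicate, on the raw and standardized scales
  summarise(sample_mean_x = mean(x),
            sample_mean_z = mean(z))

# Normal QQ plot: points close to a straight line indicate normality
ggplot(sim_sample_mean, aes(sample = sample_mean_z)) +
  stat_qq() +
  stat_qq_line()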
```

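If \(X_1,\dotsc,X_n\) are independent and identically distributed with mean \(\mu\) and finite variance \(\sigma^2\), but not necessarily normal, then for large \(n\), \(\overline{X}_n\) is approximately distributed \(N(\mu,\frac{\sigma^2}{n})\) (central limit theorem).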

We say that \(\overline{X}_n\) is asymptotically normal. That means that probability statements about the sample mean can be approximated using a normal distribution.
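To see the approximation at work on non-normal data, the sketch below draws samples from an Exponential(1) distribution (an illustrative choice, for which \(\mu = \sigma = 1\)) and estimates the lower-tail probability \(P(Z_n \le -1.645)\) of the standardized sample mean \(Z_n = (\overline{X}_n - \mu)/(\sigma/\sqrt{n})\). A standard normal puts this probability at 0.05. The sample sizes, the number of replications `M`, the seed and the helper name `lower_tail` are all assumptions made for the example.

```r
set.seed(2)               # arbitrary seed, only for reproducibility
M <- 10000                # number of simulated samples per sample size
rate <- 1                 # Exponential(1): mean 1, standard deviation 1
mu <- 1 / rate
sigma <- 1 / rate
z_cut <- qnorm(0.05)      # 5% quantile of the standard normal

# Estimate P(Z_n <= z_cut) for samples of size n (hypothetical helper)
lower_tail <- function(n) {
  xbar <- colMeans(matrix(rexp(n * M, rate = rate), nrow = n))
  z <- (xbar - mu) / (sigma / sqrt(n))
  mean(z <= z_cut)
}

sample_sizes <- c(2, 10, 50, 500)
tail_by_n <- sapply(sample_sizes, lower_tail)
print(data.frame(n = sample_sizes, lower_tail = tail_by_n, normal = 0.05))
```

For \(n = 2\) the estimate is exactly 0, because the cut-off corresponds to a negative sample mean, which exponential data cannot produce. As \(n\) grows, the estimate should move toward the normal value of 0.05.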

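\(\overline{X}_n\) is a consistent estimator of \(\mu\) because it converges in probability to \(\mu\): for every \(\epsilon > 0\), \(P(|\overline{X}_n - \mu| > \epsilon) \to 0\) as \(n \to \infty\) (weak law of large numbers).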
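Convergence in probability can also be checked by simulation. The sketch below estimates \(P(|\overline{X}_n - \mu| > \epsilon)\) for increasing \(n\) and prints it next to the Chebyshev bound \(\sigma^2/(n\epsilon^2)\), capped at 1. The normal population, the tolerance `eps`, the sample sizes, the seed and the helper name `prob_far` are assumptions chosen for illustration.

```r
set.seed(3)               # arbitrary seed, only for reproducibility
mu <- 100                 # assumed population mean
sigma <- 16               # assumed population standard deviation
eps <- 2                  # tolerance epsilon (arbitrary)
M <- 2000                 # replications per sample size

# Estimate P(|sample mean - mu| > eps) for samples of size n (hypothetical helper)
prob_far <- function(n) {
  xbar <- colMeans(matrix(rnorm(n * M, mean = mu, sd = sigma), nrow = n))
  mean(abs(xbar - mu) > eps)
}

sample_sizes <- c(10, 100, 1000)
p_far <- sapply(sample_sizes, prob_far)

# Chebyshev: P(|xbar - mu| > eps) <= sigma^2 / (n * eps^2)
chebyshev <- pmin(1, sigma^2 / (sample_sizes * eps^2))

print(data.frame(n = sample_sizes, p_far = p_far, chebyshev = chebyshev))
```

The estimated probabilities should decrease toward 0 as \(n\) grows and stay below the Chebyshev bound, which is loose but valid for any distribution with finite variance.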