Action space

In point estimation, the action space is often the parameter space \(\Theta\) itself.

Decision rule

In the context of point estimation, the decision rule is called an estimator (also termed a statistic) of \(\theta\), usually denoted \(\widehat{\theta}\). It is any function of the data (a sample of the random variable): \(\widehat{\theta}=d(x)\).

Nuisance parameter

Parameters that are not of direct interest, but that must still be accounted for in the model.


Sample mean

\[\overline{X}_n=\frac{\sum_{i=1}^{n} X_i}{n}\]

Sample variance

\[S_n^2=\frac{\sum_{i=1}^{n} \left(X_i-\overline{X}_n\right)^2}{n-1}\]

Sampling distribution of a statistic \(\widehat{\theta}\)

The probability distribution of the statistic \(\widehat{\theta}\) over repeated samples is called the sampling distribution.


Sample mean for a normal distributed random variable

If \(X_1,\dotsc,X_n\) are normally distributed with mean \(\mu\) and variance \(\sigma^2\), then \(\overline{X}_n\) is distributed \(N(\mu,\frac{\sigma^2}{n})\).

  • Demonstration ….
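A quick Monte Carlo sketch of this sampling distribution (an empirical check, not an analytic demonstration), assuming numpy is available; the values of \(\mu\), \(\sigma\), and \(n\) are arbitrary:

```python
import numpy as np

# Empirical check: for normal data, the sample mean should be
# distributed N(mu, sigma^2 / n). All constants are arbitrary.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 25, 100_000

# reps independent samples of size n; one sample mean per row
means = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

print(means.mean())  # close to mu = 2
print(means.var())   # close to sigma^2 / n = 9 / 25 = 0.36
```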

Bias of \(\widehat{\theta}\)

\[bias(\widehat{\theta}) = E[\widehat{\theta}] - \theta\]


Sample mean

\(bias(\overline{X}_n) = E[\overline{X}_n] - \mu = 0\)

Sample variance

  • \(bias(S_n^2) = E[S^2_n] - \sigma^2 = 0\)
    • Demonstration …
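The unbiasedness of the corrected sample variance can also be checked empirically. A sketch assuming numpy, with arbitrary constants, contrasting the corrected (divisor \(n-1\)) and uncorrected (divisor \(n\)) versions:

```python
import numpy as np

# Monte Carlo check: the corrected sample variance (ddof=1) is unbiased
# for sigma^2; the uncorrected one (ddof=0) underestimates it by a
# factor of (n-1)/n. Constants are arbitrary.
rng = np.random.default_rng(1)
sigma2, n, reps = 4.0, 10, 200_000

x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
s2_corrected = x.var(axis=1, ddof=1)    # divides by n - 1
s2_uncorrected = x.var(axis=1, ddof=0)  # divides by n

print(s2_corrected.mean())    # close to sigma^2 = 4
print(s2_uncorrected.mean())  # close to (n-1)/n * sigma^2 = 3.6
```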

Standard error \(se\) of \(\widehat{\theta}\)

\[se(\widehat{\theta})=\sqrt{V(\widehat{\theta})}\]


Sample mean

\(se(\overline{X}_n)=\sqrt{V(\overline{X}_n)} = \frac{\sigma}{\sqrt{n}}\)

  • Demonstration

    Since the \(X_i\) are independent with common variance \(\sigma^2\), \(V[\overline{X}_n]=V(\frac{1}{n}\sum_i X_i) = \frac{1}{n^2} \sum_i V(X_i) = \frac{1}{n^2} n \sigma^2 = \frac{\sigma^2}{n}\)


Consistency

\(\widehat{\theta}_n\) is consistent if it converges in probability to \(\theta\) as \(n \to \infty\).


Sample mean

\(\overline{X}_n\) is consistent because it converges in probability to \(\mu\) (law of large numbers).

  • Demonstration:
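An empirical illustration of the law of large numbers (not a proof), assuming numpy; the running mean of a single growing sample drifts toward \(\mu\). Constants are arbitrary:

```python
import numpy as np

# Consistency illustration: the sample mean of one growing sample
# approaches mu as n increases (law of large numbers).
rng = np.random.default_rng(2)
mu = 5.0
x = rng.normal(mu, 10.0, size=1_000_000)

running_mean = np.cumsum(x) / np.arange(1, x.size + 1)
for n in (10, 1_000, 1_000_000):
    print(n, running_mean[n - 1])  # the error shrinks as n grows
```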

Sample variance

\(S_n^2\) is consistent because it converges in probability to \(\sigma^2\).

  • Demonstration:

Sample standard deviation

\(S_n\) is consistent because it converges in probability to \(\sigma\).

Uncorrected sample variance

Uncorrected \(S_n^2\) is consistent because it converges in probability to \(\sigma^2\).

Asymptotic normality

\(\widehat{\theta}_n\) is asymptotically normal if \(\frac{\widehat{\theta}_n - \theta}{se}\) converges in distribution to \(N(0,1)\).


Sample mean

\(\overline{X}_n\) is asymptotically normal: \(\frac{\overline{X}_n - \mu}{se}\), with \(se=\frac{\sigma}{\sqrt{n}}\), converges in distribution to \(N(0,1)\) (central limit theorem).
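The central limit theorem applies even when the data are not normal. A Monte Carlo sketch assuming numpy, using skewed exponential data with arbitrary constants:

```python
import numpy as np

# CLT illustration: even for skewed (exponential) data, the
# standardized sample mean is approximately N(0, 1) for large n.
rng = np.random.default_rng(3)
n, reps = 500, 100_000
theta = 1.0  # exponential mean; for this distribution the sd is also 1

x = rng.exponential(theta, size=(reps, n))
z = (x.mean(axis=1) - theta) / (theta / np.sqrt(n))

print(z.mean())  # close to 0
print(z.std())   # close to 1
```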



Loss and risk functions

Squared error

\(L(\theta,\widehat{\theta})=(\theta - \widehat{\theta})^2\)

The corresponding risk function is called the mean squared error (MSE).

\(MSE = R(\theta,\widehat{\theta})=E[L(\theta,\widehat{\theta})]= \int \left(\theta - \widehat{\theta}(x)\right)^2 f(x;\theta) \, dx\)


  • \(MSE = bias(\widehat{\theta})^2 + var(\widehat{\theta})\)

    Demonstration: …
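For completeness, the standard derivation: add and subtract \(E[\widehat{\theta}]\) inside the square; the cross term vanishes because \(E\left[\widehat{\theta}-E[\widehat{\theta}]\right]=0\).

\[\begin{aligned}
MSE &= E\left[(\widehat{\theta}-\theta)^2\right] \\
&= E\left[\left(\widehat{\theta}-E[\widehat{\theta}]+E[\widehat{\theta}]-\theta\right)^2\right] \\
&= E\left[\left(\widehat{\theta}-E[\widehat{\theta}]\right)^2\right] + \left(E[\widehat{\theta}]-\theta\right)^2 + 2\left(E[\widehat{\theta}]-\theta\right)E\left[\widehat{\theta}-E[\widehat{\theta}]\right] \\
&= var(\widehat{\theta}) + bias(\widehat{\theta})^2
\end{aligned}\]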


Example 1

\(X \sim N(\mu,1)\)

\(\widehat{\mu}_1 = 4\)

\(MSE_1 = bias(\widehat{\mu}_1)^2 + var(\widehat{\mu}_1)= \left( E[\widehat{\mu}_1] - \mu \right)^2+ var(\widehat{\mu}_1) = \left(4-\mu \right)^2 + 0 = \left(4-\mu \right)^2\)

\(\widehat{\mu}_2 = X\)

\(MSE_2 = bias(\widehat{\mu}_2)^2 + var(\widehat{\mu}_2)= \left( E[\widehat{\mu}_2] - \mu \right)^2+ var(\widehat{\mu}_2) = \left(E[X]-\mu \right)^2 + var(X) = \left(\mu-\mu \right)^2 + 1 = 1\)

If the parameter happens to be close to 4, the risk of the first estimator is lower than that of the second; otherwise, the second is better.
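The comparison of the two estimators above can be checked by simulation, assuming numpy; the grid of \(\mu\) values is arbitrary:

```python
import numpy as np

# Monte Carlo comparison of the two estimators for X ~ N(mu, 1):
# mu_hat_1 = 4 (constant) vs mu_hat_2 = X.
rng = np.random.default_rng(4)
reps = 200_000

results = {}
for mu in (4.0, 4.5, 7.0):
    x = rng.normal(mu, 1.0, size=reps)
    mse1 = (4.0 - mu) ** 2            # constant estimator: zero variance
    mse2 = np.mean((x - mu) ** 2)     # X itself: unbiased, variance 1
    results[mu] = (mse1, mse2)
    print(mu, mse1, mse2)
```

The constant estimator wins when \(\mu\) is near 4 (its squared bias is below 1) and loses badly otherwise, matching the analytic comparison above.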

Absolute error

\(L(\theta,\widehat{\theta})=|\theta - \widehat{\theta}|\)


Kullback–Leibler loss

\(L(\theta,\widehat{\theta})=\int \log \left( \frac{f(x;\theta)}{f(x;\widehat{\theta})} \right) f(x;\theta) \, dx\)
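As a concrete case: when \(f(x;\theta)\) is the \(N(\theta,1)\) density, this loss reduces to \(\frac{(\theta-\widehat{\theta})^2}{2}\), i.e. half the squared error. A numerical sketch assuming numpy and scipy; the values of \(\theta\) and \(\widehat{\theta}\) are arbitrary:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Numerical check of the KL loss for a normal model with unit variance:
# KL(N(theta,1) || N(theta_hat,1)) = (theta - theta_hat)^2 / 2.
theta, theta_hat = 1.0, 3.0

def integrand(x):
    # log(f_true / f_est) * f_true, computed via logpdf for stability
    log_ratio = norm.logpdf(x, loc=theta) - norm.logpdf(x, loc=theta_hat)
    return log_ratio * norm.pdf(x, loc=theta)

kl, _ = quad(integrand, -np.inf, np.inf)
print(kl)  # close to (1 - 3)^2 / 2 = 2
```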