## Action space

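Often the parameter space \(\Theta\) itself, since in point estimation each action is a guess of the value of \(\theta\).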

## Decision rule

In the context of point estimation, the decision rule is called a
statistical estimator (also termed a statistic, or simply an estimator) of \(\theta\). An estimator, usually denoted \(\widehat{\theta}\), is any function of the
data (a sample of the random variable): \(\widehat{\theta}=d(x)\).

We might want to estimate a parameter of \(f\) or some feature of the distribution, such as its mean \(\mu\).

### Nuisance parameter

Parameters that appear in the model but that we are not interested in estimating.

## Sampling distribution of a statistic \(\widehat{\theta}\)

The distribution of \(\widehat{\theta}\) is called the sampling
distribution.
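Because \(\widehat{\theta}\) is a function of a random sample, its sampling distribution can be approximated by simulation: draw many independent samples and evaluate the statistic on each one. The sketch below assumes data from \(N(2,1)\) with \(n=25\) and uses the sample mean as \(\widehat{\theta}\). The helper `sampling_distribution` is hypothetical and serves only as an illustration.

```python
import random
import statistics

def sampling_distribution(statistic, draw_sample, n_reps=5000, seed=0):
    """Approximate the sampling distribution of `statistic` by Monte Carlo:
    draw many independent samples and evaluate the statistic on each one."""
    rng = random.Random(seed)
    return [statistic(draw_sample(rng)) for _ in range(n_reps)]

# Illustrative setting (assumed): the sample mean of n = 25 draws from N(2, 1).
n = 25
draws = sampling_distribution(
    statistics.fmean,
    lambda rng: [rng.gauss(2.0, 1.0) for _ in range(n)],
)
print(statistics.fmean(draws))  # should be close to 2
print(statistics.stdev(draws))  # should be close to 1 / sqrt(25) = 0.2
```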

## Bias of \(\widehat{\theta}\)

\[bias(\widehat{\theta}) =
E[\widehat{\theta}] - \theta\]
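The bias can be estimated by averaging \(\widehat{\theta}\) over many simulated samples and subtracting the true \(\theta\). The classic example is the variance estimator with divisor \(n\). Its expectation is \(\frac{n-1}{n}\sigma^2\), so its bias is \(-\sigma^2/n\). The divisor \(n-1\) version (`statistics.variance`) is unbiased. This is a sketch that assumes \(N(0,1)\) data and \(n=5\). The function names are illustrative.

```python
import random
import statistics

def plugin_variance(xs):
    """Variance estimator with divisor n (the plug-in estimator)."""
    m = statistics.fmean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def monte_carlo_bias(estimator, true_value, n, n_reps=20000, seed=1):
    """Estimate bias = E[theta_hat] - theta by averaging the estimator
    over many simulated samples of size n from N(0, 1)."""
    rng = random.Random(seed)
    estimates = [
        estimator([rng.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(n_reps)
    ]
    return statistics.fmean(estimates) - true_value

# The true variance of N(0, 1) is 1; theory gives bias -1/n for the plug-in version.
bias_plugin = monte_carlo_bias(plugin_variance, 1.0, n=5)
bias_unbiased = monte_carlo_bias(statistics.variance, 1.0, n=5)  # divisor n - 1
print(bias_plugin, bias_unbiased)
```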

## Standard error \(se\) of \(\widehat{\theta}\)

\[se(\widehat{\theta})=\sqrt{V(\widehat{\theta})}\]
Often \(se\) is difficult to calculate,
because the variance depends on the unknown distribution \(F\). A typical solution is to approximate
\(F\) with the empirical distribution
\(\widehat{F}\) and use the bootstrap.
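Sampling from \(\widehat{F}\) means drawing \(n\) points from the observed data with replacement. Across many such resamples, the standard deviation of the statistic approximates \(se(\widehat{\theta})\). Below is a minimal nonparametric bootstrap sketch. The simulated \(N(0,1)\) data and the choice of 2000 resamples are assumptions. For the sample mean, the result can be checked against the usual estimate \(s/\sqrt{n}\).

```python
import random
import statistics

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Nonparametric bootstrap estimate of se(statistic).

    Each replicate evaluates the statistic on n points drawn with
    replacement from `data`, i.e. on a sample from the empirical
    distribution F_hat; the standard deviation of the replicates is
    returned as the standard-error estimate."""
    rng = random.Random(seed)
    n = len(data)
    replicates = [statistic(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.stdev(replicates)

rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]  # assumed example data
se_median = bootstrap_se(data, statistics.median)  # no simple closed form
se_mean = bootstrap_se(data, statistics.fmean)
# For the mean, compare with the textbook estimate s / sqrt(n).
print(se_mean, statistics.stdev(data) / len(data) ** 0.5)
```

The median is included because its standard error has no simple formula, which is exactly the situation where the bootstrap is most useful.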

## Consistency

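\(\widehat{\theta}_n\) is consistent
if it converges in probability to \(\theta\) as \(n \to \infty\), that is, if \(P(|\widehat{\theta}_n - \theta| > \epsilon) \to 0\) for every \(\epsilon > 0\).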
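The definition can be checked by simulation for the sample mean. The sketch estimates \(P(|\widehat{\theta}_n - \theta| > \epsilon)\) for growing \(n\) and watches it shrink toward 0. The Exponential(rate 1) data (true mean 1), \(\epsilon = 0.1\), and the helper `prob_far` are illustrative assumptions.

```python
import random
import statistics

def prob_far(n, eps=0.1, n_reps=1000, seed=0):
    """Monte Carlo estimate of P(|mean_n - mu| > eps) when the data are
    Exponential(rate=1), so the true mean is mu = 1."""
    rng = random.Random(seed)
    far = 0
    for _ in range(n_reps):
        m = statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        far += abs(m - 1.0) > eps  # count samples whose mean misses by more than eps
    return far / n_reps

probs = [prob_far(n) for n in (10, 100, 1000)]
print(probs)  # should shrink toward 0 as n grows
```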

## Asymptotic normality

\(\widehat{\theta}_n\) is
asymptotically normal if \(\frac{\widehat{\theta}_n - \theta}{se(\widehat{\theta}_n)}\)
converges in distribution to \(N(0,1)\).
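As a rough check, the sketch below standardizes the sample mean of skewed Exponential(rate 1) data. For that distribution \(\theta = 1\) and \(se = 1/\sqrt{n}\) are known. It then measures how often the result falls in \((-1.96, 1.96)\), which has probability about 0.95 under \(N(0,1)\). The data distribution and \(n = 200\) are assumptions chosen for illustration.

```python
import math
import random

def standardized_means(n, n_reps=4000, seed=0):
    """Draws of (mean_n - theta) / se for Exponential(rate=1) data,
    where theta = 1 and se = sigma / sqrt(n) = 1 / sqrt(n)."""
    rng = random.Random(seed)
    se = 1.0 / math.sqrt(n)
    return [
        (sum(rng.expovariate(1.0) for _ in range(n)) / n - 1.0) / se
        for _ in range(n_reps)
    ]

z = standardized_means(n=200)
coverage = sum(abs(v) < 1.96 for v in z) / len(z)
print(coverage)  # close to 0.95 if the N(0, 1) approximation is good
```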

## Loss and risk functions

### Squared error

\(L(\theta,\widehat{\theta})=(\theta -
\widehat{\theta})^2\)

Under squared error loss, the risk function is called the mean squared error (MSE).

\(MSE =
R(\theta,\widehat{\theta})=E[L(\theta,\widehat{\theta})]= \int (\theta -
\widehat{\theta}(x))^2 f(x;\theta) \, dx\)
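The examples below use the decomposition of the MSE into squared bias plus variance. To derive it, add and subtract \(E[\widehat{\theta}]\) inside the square. The cross term then vanishes, because \(E[\widehat{\theta}] - \theta\) is a constant and \(E\left[\widehat{\theta} - E[\widehat{\theta}]\right] = 0\):

\[
\begin{aligned}
MSE &= E\left[(\widehat{\theta} - \theta)^2\right]
= E\left[\left(\widehat{\theta} - E[\widehat{\theta}] + E[\widehat{\theta}] - \theta\right)^2\right] \\
&= E\left[\left(\widehat{\theta} - E[\widehat{\theta}]\right)^2\right]
+ 2\left(E[\widehat{\theta}] - \theta\right) E\left[\widehat{\theta} - E[\widehat{\theta}]\right]
+ \left(E[\widehat{\theta}] - \theta\right)^2 \\
&= var(\widehat{\theta}) + 0 + bias(\widehat{\theta})^2 .
\end{aligned}
\]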

#### Examples

##### Example 1

\(X \sim N(\mu,1)\)

\(\widehat{\mu}_1 = 4\)

\(MSE_1 = bias(\widehat{\mu}_1)^2 +
var(\widehat{\mu}_1)= \left( E[\widehat{\mu}_1] - \mu \right)^2+
var(\widehat{\mu}_1) = \left(4-\mu \right)^2 + 0 = \left(4-\mu
\right)^2\)

\(\widehat{\mu}_2 = X\)

\(MSE_2 = bias(\widehat{\mu}_2)^2 +
var(\widehat{\mu}_2)= \left( E[\widehat{\mu}_2] - \mu \right)^2+
var(\widehat{\mu}_2) = \left(E[X]-\mu \right)^2 + var(X) = \left(\mu-\mu
\right)^2 + 1 = 1\)

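If \(\mu\) happens to be close to 4, specifically if \(|\mu - 4| < 1\) so that \((4-\mu)^2 < 1\), the first
estimator has lower risk than the second. Otherwise, the second is
at least as good. Neither estimator dominates the other for all \(\mu\).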
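The comparison can be reproduced numerically. The sketch uses the closed form \((4-\mu)^2\) for \(\widehat{\mu}_1\) and a Monte Carlo estimate of the MSE of \(\widehat{\mu}_2 = X\), whose exact value is 1. It then reports which estimator has lower risk at a few values of \(\mu\). The grid of \(\mu\) values and the function names are illustrative.

```python
import random
import statistics

def mse_constant(mu):
    """Exact MSE of mu_hat_1 = 4: the variance is 0, so MSE = (4 - mu)^2."""
    return (4.0 - mu) ** 2

def mse_single_obs_mc(mu, n_reps=20000, seed=0):
    """Monte Carlo MSE of mu_hat_2 = X with X ~ N(mu, 1); the exact value is 1."""
    rng = random.Random(seed)
    return statistics.fmean([(rng.gauss(mu, 1.0) - mu) ** 2 for _ in range(n_reps)])

for mu in (2.0, 3.5, 4.0, 4.5, 6.0):
    r1, r2 = mse_constant(mu), mse_single_obs_mc(mu)
    better = "mu_hat_1" if r1 < r2 else "mu_hat_2"
    print(f"mu={mu}: MSE_1={r1:.3f}, MSE_2~{r2:.3f}, lower risk: {better}")
```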

### Absolute error

\(L(\theta,\widehat{\theta})=|\theta -
\widehat{\theta}|\)

### Kullback-Leibler

\(L(\theta,\widehat{\theta})=\int \log
\left( \frac{f(x;\theta)}{f(x;\widehat{\theta})} \right) f(x;\theta)
dx\)
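For the model \(f(x;\theta) = N(\theta, 1)\), this integral has the closed form \((\theta - \widehat{\theta})^2/2\), so the Kullback-Leibler loss is half the squared error loss. The sketch below approximates the integral with the midpoint rule and compares it to that closed form. The integration window \([-20, 20]\) and the step count are assumptions that are adequate only for moderate parameter values. Farther out, the densities could underflow.

```python
import math

def normal_pdf(x, mean):
    """Density of N(mean, 1) at x."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def kl_loss_numeric(theta, theta_hat, lo=-20.0, hi=20.0, steps=40000):
    """Midpoint-rule approximation of
    int log(f(x; theta) / f(x; theta_hat)) f(x; theta) dx
    for the model f(x; theta) = N(theta, 1), truncated to [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h  # midpoint of the i-th subinterval
        p = normal_pdf(x, theta)
        q = normal_pdf(x, theta_hat)
        total += math.log(p / q) * p * h
    return total

theta, theta_hat = 0.0, 1.5
print(kl_loss_numeric(theta, theta_hat), (theta - theta_hat) ** 2 / 2)
```

The two printed values should agree closely.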