Kullback–Leibler (KL) information $$I$$

The KL information, or distance, between full reality $$f$$ (considered fixed) and an approximating model $$g$$ is

$I(f,g)=\int f(x) \log \left( \frac{f(x)}{g(x | \theta)} \right) \, dx$

which, because $$\log(f/g) = \log f - \log g$$, can be written as

$I(f,g)=E_f \left[ \log f(x) \right] - E_f \left[ \log g(x | \theta) \right]$

The first expected value is a constant $$C$$ that does not depend on the approximating model. We will never know it, as we don’t know the full reality model, but we can still calculate relative distances:

$relative \, distance = I(f,g) - C = - E_f \left[ \log g(x | \theta) \right]$

For two approximating models:

$I(f,g_1) - I(f,g_2) = - E_f \left[ \log g_1(x | \theta) \right] + E_f \left[ \log g_2(x | \theta) \right]$

Notice that while $$I(f,g)$$ has a true zero (it is 0 only when $$g$$ equals $$f$$), the $$relative \, distance$$ does not, because the unknown constant $$C$$ has been subtracted.
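
As a rough numerical illustration of these identities, the sketch below uses discrete distributions, so the integral becomes a sum. The "full reality" `f` and the two approximating models `g1` and `g2` are arbitrary binomial distributions chosen only for illustration, and the helper names `kl`, `rel` and `C_const` are hypothetical. In practice `f` is unknown and `C_const` could not be computed.

```r
# Support of the toy distributions: counts 0..10
xs <- 0:10

# Assumed "full reality" f and two approximating models (illustrative only)
f  <- dbinom(xs, size = 10, prob = 0.50)
g1 <- dbinom(xs, size = 10, prob = 0.45)
g2 <- dbinom(xs, size = 10, prob = 0.30)

# KL information for discrete distributions: sum of f * log(f / g)
kl <- function(f, g) sum(f * log(f / g))

# Relative distance: -E_f[log g]
rel <- function(f, g) -sum(f * log(g))

# The constant C = E_f[log f], computable here only because f is assumed known
C_const <- sum(f * log(f))

# I(f, g) - C matches the relative distance
all.equal(kl(f, g1) - C_const, rel(f, g1))

# Differences between models do not depend on C
all.equal(kl(f, g1) - kl(f, g2), rel(f, g1) - rel(f, g2))

# I(f, f) is exactly zero, while the relative distance of f to itself is not
kl(f, f)
rel(f, f)
```

Here `g1` is closer to `f` than `g2`, and that ordering is the same whether it is read from `kl` or from `rel`.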

Often, we will only have an estimate of $$I$$, as we don’t know the parameters of the model and need to estimate them as $$\widehat{\theta}$$.

Akaike information criterion $$AIC$$

$AIC = -2\log{L(\widehat{\theta})} + 2 k$

where $$L(\widehat{\theta})$$ is the likelihood evaluated at the parameter values $$\widehat{\theta}$$ that maximize it and $$k$$ is the number of estimated parameters of the model.

Example

n <- 100
x <- c(.2, .4, .6, .8, 1)
k <- c(10, 26, 73, 94, 97)
r <- n - k
y <- k/n
dat <- data.frame(x, y, k, r, n)

Using quickpsy

library(quickpsy)
fit <- quickpsy(dat, x, k, n)
fit$aic
##        aic
## 1 30.89906

Using glm

model <- glm(cbind(k, r) ~ x, data = dat, family = binomial(probit))
model
##
## Call:  glm(formula = cbind(k, r) ~ x, family = binomial(probit), data = dat)
##
## Coefficients:
## (Intercept)            x
##      -2.279        4.572
##
## Degrees of Freedom: 4 Total (i.e. Null);  3 Residual
## Null Deviance:       304.4
## Residual Deviance: 6.662     AIC: 30.9
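
To connect this value to the formula, the following sketch recomputes the $$AIC$$ by hand from the binomial log-likelihood at the fitted probabilities. It repeats the data so that it runs on its own. The names `p_hat`, `loglik`, `n_par` and `aic_manual` are chosen here for illustration, and `n_par` plays the role of $$k$$ in the formula (in the code, `k` is already the number of successes).

```r
# Data from the example
n <- 100
x <- c(.2, .4, .6, .8, 1)
k <- c(10, 26, 73, 94, 97)
r <- n - k

model <- glm(cbind(k, r) ~ x, family = binomial(link = "probit"))

# Fitted probabilities at the maximum-likelihood estimates
p_hat <- fitted(model)

# Maximized log-likelihood, including the binomial coefficients
loglik <- sum(dbinom(k, size = n, prob = p_hat, log = TRUE))

# Number of estimated parameters (intercept and slope)
n_par <- length(coef(model))

# AIC = -2 log L + 2 k
aic_manual <- -2 * loglik + 2 * n_par

aic_manual
AIC(model)
```

Both printed values should agree with the `AIC: 30.9` reported by `glm`.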

Sometimes, other software provides a different value for the $$AIC$$ because it uses the log-likelihood without the binomial coefficients (which do not depend on the parameters). In general, this is not a problem, as we are interested in differences in the $$AIC$$, but it could be a problem when comparing models with different error distributions.
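
A minimal sketch of this point, assuming the same data and comparing a probit with a logit link: the helper `aic_kernel` (a hypothetical name, not a function from any package) computes the $$AIC$$ from the log-likelihood without the binomial coefficients. Both models share the same binomial error distribution, so the two conventions differ by the same constant and the difference in $$AIC$$ between the models is unchanged.

```r
# Data from the example
n <- 100
x <- c(.2, .4, .6, .8, 1)
k <- c(10, 26, 73, 94, 97)
r <- n - k

fit_probit <- glm(cbind(k, r) ~ x, family = binomial(link = "probit"))
fit_logit  <- glm(cbind(k, r) ~ x, family = binomial(link = "logit"))

# AIC from the log-likelihood with the binomial coefficients dropped
aic_kernel <- function(model) {
  p <- fitted(model)
  loglik <- sum(k * log(p) + (n - k) * log(1 - p))
  -2 * loglik + 2 * length(coef(model))
}

# Constant contributed by the binomial coefficients (parameter-free)
shift <- -2 * sum(lchoose(n, k))

# Each model's AIC moves by the same constant
all.equal(AIC(fit_probit) - aic_kernel(fit_probit), shift)
all.equal(AIC(fit_logit)  - aic_kernel(fit_logit),  shift)

# The difference between the two models is the same under both conventions
AIC(fit_probit) - AIC(fit_logit)
aic_kernel(fit_probit) - aic_kernel(fit_logit)
```

If the two models had different error distributions, the dropped terms would generally not be the same for both, and the cancellation shown here would no longer hold.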