library(ggplot2)

# five illustrative (x, y) observations
x<-c(.2,.4,.6,.8,1)
y<-c(1,2.3,3,3.5,4.9)
dat<-data.frame(x,y)

# scatter plot of the data
p<-ggplot()+
  geom_point(data=dat,aes(x=x,y=y))
p

In the simpler case of MLE, we have some iid (independent and identically distributed) observations and we want to find, within a parametric family of distributions, the one that most likely generated them. Here, we have observations \(y_1, y_2, \ldots, y_n\) that are independent but not identically distributed, because they are indexed by another variable \(x_1, x_2, \ldots, x_n\). We need to use a generalization of the likelihood function that includes the index \[L(\theta \mid x,y)=f(x,y \mid \theta)\] which, by independence, factorizes as \(\prod_{i=1}^{n} f(x_i,y_i \mid \theta)\).

We don’t know the exact probability density functions for \(y_1, y_2, \ldots, y_n\), but we know that each \(y_i\) is normally distributed with common standard deviation \(\sigma\), centered at \(\mu_i = A+Bx_i\). Then, the probability density function for \(y_i\) is \[f(x_i,y_i\mid\theta)=f(x_i,y_i\mid A,B,\sigma)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(y_i-A-Bx_i)^2}{2\sigma^2}}\]
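In R, this density is just dnorm evaluated at \(y_i\) with mean \(A+Bx_i\). As a quick check with arbitrary illustrative values (\(A=0\), \(B=1\), \(\sigma=1\), chosen only to exercise the formula):

A<-0; B<-1; sigma<-1 # arbitrary values, only to check the formula
i<-1                 # first observation
1/sqrt(2*pi*sigma^2)*exp(-(y[i]-A-B*x[i])^2/(2*sigma^2))
dnorm(y[i], mean=A+B*x[i], sd=sigma) # same value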

Taking the log of the likelihood (the product of the \(n\) densities), we obtain the log likelihood \[\log L(A,B,\sigma)=-\frac{n}{2} \log(2 \pi \sigma^2) - \frac{1}{2 \sigma^2} \sum_{i=1}^{n} (y_i-A-B x_i)^2\]
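We can verify this closed form against the sum of the log densities, reusing the illustrative values above:

n<-length(y)
-.5*n*log(2*pi*sigma^2)-1/(2*sigma^2)*sum((y-A-B*x)^2)
sum(dnorm(y, mean=A+B*x, sd=sigma, log=TRUE)) # same value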

It is clear that to maximize the likelihood we need to minimize the summation term, which is exactly the least squares minimization. So the MLE intercept and slope correspond to the standard regression model parameters. Substituting the best A \((A_{MLE})\) and B \((B_{MLE})\) and differentiating with respect to \(\sigma\), it is easy to find the MLE for \(\sigma^{2}\) \[\sigma_{MLE}^{2}=\frac{1}{n} \sum_{i=1}^{n} (y_i-A_{MLE}-B_{MLE} x_i)^2=\frac{SSE}{n}\]

where \(SSE\) is the sum of squared errors. Note that this differs from the \(\sigma^2\) estimate obtained using least squares, \[\sigma_{LS}^2 = \frac{SSE}{n - r}\] with \(r=2\) in this case (A and B).
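We can check both estimates numerically (a quick sketch using the closed-form least squares formulas \(B=S_{xy}/S_{xx}\) and \(A=\bar{y}-B\bar{x}\); note that summary(lm())$sigma reports the residual standard error, whose square is \(SSE/(n-2)\)):

B_MLE<-sum((x-mean(x))*(y-mean(y)))/sum((x-mean(x))^2) # slope: Sxy/Sxx
A_MLE<-mean(y)-B_MLE*mean(x)                           # intercept from the means
SSE<-sum((y-A_MLE-B_MLE*x)^2)
SSE/length(y)            # sigma^2_MLE = SSE/n
SSE/(length(y)-2)        # sigma^2_LS = SSE/(n-2)
summary(lm(y~x))$sigma^2 # matches SSE/(n-2)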

# log likelihood as a function of p = c(A, B, sigma)
logL<-function(p) { 
  n<-length(x)
  -.5*n*log(2*pi*p[3]^2)-1/(2*p[3]^2)*sum((y-p[1]-p[2]*x)^2)
  }
negLogL<-function(p) -logL(p) # optim minimizes, so we negate the log likelihood
MLEparameters<-optim(c(0,1,.1), negLogL)$par
MLEparameters
## [1] 0.2407149 4.4988915 0.2153094
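By default optim uses the Nelder-Mead method, starting here from c(0, 1, .1). Since the minimization is numerical, it is worth confirming that it converged (a quick check; optim returns a convergence code where 0 indicates success):

fitOptim<-optim(c(0,1,.1), negLogL) # keep the full optim result
fitOptim$convergence                # 0 indicates successful convergence
fitOptim$value                      # minimized negative log likelihood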
xSeq<-seq(0,1,.1)
yPred<-MLEparameters[1]+MLEparameters[2]*xSeq # fitted line: A_MLE + B_MLE * x
curveMLE<-data.frame(xSeq,yPred)
p+geom_line(data=curveMLE,aes(x=xSeq,y=yPred))

We can calculate the log likelihood for the best parameters

logL(MLEparameters)
## [1] 0.5814404

Of course, we can estimate the parameters and the log likelihood more directly with lm

lm(y~x)
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##        0.24         4.50
logLik(lm(y~x))
## 'log Lik.' 0.5814469 (df=3)
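The small numerical difference from our optim result comes from the optimizer stopping at its default tolerance, and df=3 counts the three estimated parameters (A, B and \(\sigma\)). As a consistency check, plugging the lm coefficients and \(\sigma_{MLE}\) into our logL function recovers the same value:

fit<-lm(y~x)
sigmaMLE<-sqrt(mean(resid(fit)^2)) # sigma_MLE = sqrt(SSE/n)
logL(c(coef(fit), sigmaMLE))       # matches logLik(fit)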
