In the simpler case of parameter estimation, we have iid (independent and identically distributed) observations and we want to find, within a parametric family of distributions, the most likely one. Here, we have observations $$Y_1, Y_2, \dots, Y_n$$ that are independent but not identically distributed, because each is indexed by another random variable $$X_1, X_2, \dots, X_n$$.

## Definition of the regression function $$r(x)$$

$$r(x)=E[Y|X=x]$$, where $$E[\,\cdot\mid X=x]$$ denotes the conditional expectation given $$X=x$$.

$$Y$$ is called the response variable.

$$X$$ is called the predictor variable, covariate or feature.
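The definition $$r(x)=E[Y|X=x]$$ can be made concrete with a small simulation. The following sketch uses a hypothetical discrete example (names and the choice $$r(x)=2x$$ are illustrative, not from the text): for each value of $$X$$, the sample mean of the corresponding $$Y$$ observations approximates the conditional expectation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: X takes values in {0, 1, 2} and
# Y = 2*X + noise, so the true regression function is r(x) = 2x.
X = rng.integers(0, 3, size=100_000)
Y = 2 * X + rng.normal(0.0, 1.0, size=X.shape)

# Empirical estimate of r(x) = E[Y | X = x]: average the Y values
# over the observations where X equals x.
r_hat = {x: Y[X == x].mean() for x in (0, 1, 2)}
print(r_hat)  # each r_hat[x] should be close to 2*x
```

With roughly 33,000 observations per value of $$x$$, the conditional sample means settle near the true $$r(x)$$.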

## Regression analysis

The goal is to obtain $$r(x)$$ from data of the form

$(X_1, Y_1), \dots, (X_n, Y_n) \sim F_{X,Y}$

If we define $$\epsilon$$ as $$\epsilon = Y - r(X)$$, then the regression model can be expressed as

$Y=r(X)+\epsilon$

with $$E[\epsilon]=0$$.

$$r(x)$$ corresponds to the deterministic part of the model; $$\epsilon$$ is the random part.
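The decomposition $$Y=r(X)+\epsilon$$ can be checked by simulation. The sketch below assumes an illustrative choice $$r(x)=\sin(x)$$ with Gaussian noise (neither is from the text), and recovers $$r$$ with a crude nonparametric estimator: averaging $$Y$$ within narrow bins of $$X$$, which makes the zero-mean noise cancel out.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: r(x) = sin(x) is the deterministic part and
# epsilon ~ N(0, 0.3^2) is the zero-mean noise, so Y = r(X) + epsilon.
n = 200_000
X = rng.uniform(0.0, np.pi, size=n)
eps = rng.normal(0.0, 0.3, size=n)
Y = np.sin(X) + eps

# Crude estimate of r: average Y within 20 narrow bins of X.
bins = np.linspace(0.0, np.pi, 21)
idx = np.digitize(X, bins) - 1
r_hat = np.array([Y[idx == k].mean() for k in range(20)])
centers = (bins[:-1] + bins[1:]) / 2

# Averaging removes the noise: r_hat tracks sin at the bin centers.
max_err = np.max(np.abs(r_hat - np.sin(centers)))
print(max_err)
```

The binned means are one of the simplest regression estimators; smoother alternatives (kernels, local polynomials) refine the same averaging idea.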

• Demonstration that $$E[\epsilon]=0$$: by the tower property of conditional expectation,

$E[\epsilon] = E[Y - r(X)] = E\big[\,E[Y - r(X) \mid X]\,\big] = E\big[\,E[Y \mid X] - r(X)\,\big] = E[r(X) - r(X)] = 0$

## References

Wasserman, L. (2004). *All of Statistics*. Springer Science & Business Media.