In the context of statistical modeling, a model is a set of probability distributions. It is said to be parametric when each distribution in the set is completely determined by a finite number of parameters. Take, for example, the simple linear regression model:

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i$

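Here the regression line is completely determined by the two parameters $\beta_0$ and $\beta_1$, while the noise term $\epsilon_i$ accounts for the scatter of the data around it (under a Gaussian noise assumption, its variance $\sigma^2$ adds one more parameter). We say that the parameter space of the model is finite-dimensional.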
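To make the "finite-dimensional" point concrete, here is a minimal least-squares sketch in Python with NumPy. The function names `fit_simple_linear` and `predict` and the simulated data are hypothetical, chosen only for illustration. Once $\hat\beta_0$ and $\hat\beta_1$ are computed, prediction needs nothing else.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Least-squares estimates (beta0, beta1) for y_i = beta0 + beta1 * x_i + eps_i."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_c = x - x.mean()  # centered predictor
    beta1 = np.sum(x_c * (y - y.mean())) / np.sum(x_c ** 2)
    beta0 = y.mean() - beta1 * x.mean()
    return beta0, beta1

def predict(beta0, beta1, x_new):
    # Prediction uses only the two fitted numbers: the training data is no longer needed.
    return beta0 + beta1 * np.asarray(x_new, dtype=float)

# Hypothetical simulated data: true beta0 = 1.5, beta1 = 2.0, Gaussian noise with sd 1.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.5 + 2.0 * x + rng.normal(0, 1.0, size=200)
beta0, beta1 = fit_simple_linear(x, y)
```

With 200 simulated points the estimates should land close to the true values, and the whole fitted model fits in two floating-point numbers.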

A model is said to be non-parametric, on the other hand, if it cannot be described by a finite set of parameters (which, by the way, can be confusing, as such a model often does have parameters).

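For example, the kernel density estimator has a smoothing parameter, the bandwidth $h$, yet it is non-parametric. A practical difference follows: a non-parametric model needs to keep the training data in order to make predictions, whereas a parametric model can discard it once the parameters have been estimated.

As an analogy, consider the space of continuous functions on a closed interval. By the Weierstrass approximation theorem, any such function can be approximated arbitrarily well by polynomials, but no single finite degree suffices for all of them, so describing an arbitrary continuous function may require infinitely many coefficients (much as an analytic function is described by the infinitely many coefficients of its Taylor series). The space of continuous functions is therefore an infinite-dimensional vector space.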
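The kernel density estimator makes the contrast visible. With a kernel $K$ and bandwidth $h$, it estimates the density from a sample $x_1, \dots, x_n$ as $\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$. The sketch below assumes a Gaussian kernel (a common choice, not the only one); `make_kde` is a hypothetical helper written for this illustration, not a library API.

```python
import numpy as np

def make_kde(train, h):
    """Build f_hat(x) = 1/(n*h) * sum_i K((x - x_i) / h) with a standard normal kernel K."""
    train = np.asarray(train, dtype=float)
    n = train.size

    def f_hat(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        u = (x[:, None] - train[None, :]) / h           # scaled distances, shape (len(x), n)
        k = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)  # standard normal kernel values
        return k.sum(axis=1) / (n * h)

    # The returned closure holds on to the full training sample: every evaluation
    # touches all n points, unlike a parametric model that only stores its parameters.
    return f_hat
```

Smaller values of `h` give a wigglier estimate and larger values a smoother one; either way, `f_hat` cannot be evaluated without the stored sample.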

A semi-parametric model combines the two: it has a finite-dimensional parameter vector and an infinite-dimensional component, typically an unknown function.

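In our regression example, such a model (known as the partially linear model) can be written as follows:

$Y = X^\top \beta + g(Z) + \epsilon$

where $\beta \in \mathbb{R}^d$ is the finite-dimensional, parametric part and $g$ is an arbitrary unknown function of $Z$, the non-parametric part.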
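The finite-dimensional part can be estimated without knowing $g$. Below is a minimal sketch, assuming a scalar $Z$, a Gaussian kernel and a hand-picked bandwidth `h`, of the double-residual approach usually attributed to Robinson (1988): smooth both $Y$ and $X$ against $Z$, regress the $Y$-residuals on the $X$-residuals to get $\beta$, then smooth what remains to estimate $g$. The function names are hypothetical, and this is one estimator among several rather than the definitive method. $X$ should not contain a constant column, since any intercept is absorbed into $g$.

```python
import numpy as np

def nw_smooth(z, target, h):
    """Nadaraya-Watson estimate of E[target | Z = z_j] at every training point z_j (Gaussian kernel)."""
    u = (z[:, None] - z[None, :]) / h
    w = np.exp(-0.5 * u ** 2)
    w /= w.sum(axis=1, keepdims=True)  # each row of weights sums to 1
    return w @ target                  # works for 1-D (n,) or 2-D (n, d) targets

def partially_linear_fit(x, z, y, h):
    """Estimate beta and g in Y = X^T beta + g(Z) + eps by double residuals.

    x: (n, d) or (n,) array, z: (n,) array, y: (n,) array.
    Returns beta (d,) and g_hat evaluated at the training points (n,).
    """
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    z = np.asarray(z, dtype=float)
    y = np.asarray(y, dtype=float)
    # Partial out Z non-parametrically from Y and from each column of X.
    y_res = y - nw_smooth(z, y, h)
    x_res = x - nw_smooth(z, x, h)
    # Ordinary least squares on the residuals gives the finite-dimensional part.
    beta, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
    # The non-parametric part: smooth what the linear term leaves over.
    g_hat = nw_smooth(z, y - x @ beta, h)
    return beta, g_hat
```

Because $X$ may be correlated with $Z$, simply regressing $Y$ on $X$ would generally bias $\beta$; removing the $Z$-dependence from both sides first is what lets the linear part be estimated at a parametric rate while $g$ stays unrestricted.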

One practical consequence is that non-parametric regression models typically need more data than parametric ones, because the data must determine not only the estimates but also the structure of the model itself.

Some interesting non-parametric regression model resources:

Alexis

Alexis is the founder of Aleph Technologies, a data infrastructure consulting and professional services provider based in Brussels, Belgium.