In the context of statistical modeling, a model is a set of distributions. It is said to be parametric when it is completely determined by a finite set of parameters, as in the case of the linear model

y_i = \beta_0 + \beta_1 x_i + \epsilon_i

In this case the regression line that fits the data is completely determined by the two parameters \beta_0 and \beta_1, together with the variance of the noise term \epsilon_i. We say that the parameter space of the model is finite-dimensional.
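As a minimal sketch (using hypothetical simulated data), fitting this linear model by ordinary least squares makes the point concrete: once the two parameters are estimated, prediction no longer needs the training data.

```python
import numpy as np

# Hypothetical data: a noisy linear relationship y = 2 + 0.5 x + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=100)

# Ordinary least squares: the fitted model is fully described by two numbers
beta1, beta0 = np.polyfit(x, y, deg=1)

# Prediction needs only (beta0, beta1); the training data can be discarded
def predict(x_new):
    return beta0 + beta1 * x_new
```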

On the other hand, a model is said to be non-parametric if it cannot be described by a finite set of parameters (which can be confusing, as such models often do have parameters).

As an example, the kernel density estimator has a smoothing parameter h, yet it is non-parametric: it needs the training data at prediction time, whereas a parametric model, once its parameters are estimated, does not. As an analogy, consider the space of continuous functions. A continuous function can be approximated by polynomials of finite degree, or expanded in a Taylor series, but representing it exactly may require infinitely many coefficients; the space of continuous functions is therefore an infinite-dimensional vector space.
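A short sketch of a Gaussian kernel density estimator illustrates this: the bandwidth h is a parameter, but every evaluation also requires the full training sample (the data below is assumed to be simulated draws from a standard normal).

```python
import numpy as np

def kde(x_eval, data, h):
    """Gaussian kernel density estimate with bandwidth h.

    Every training point enters each evaluation: the data itself is
    part of the fitted "model", unlike the parametric case.
    """
    u = (x_eval - data[:, None]) / h                    # shape (n_data, n_eval)
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=0) / h

rng = np.random.default_rng(1)
data = rng.normal(0.0, 1.0, size=500)   # training sample, kept around forever
grid = np.linspace(-3, 3, 61)
density = kde(grid, data, h=0.4)
```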

A semi-parametric model combines both: it has a finite-dimensional parameter vector together with an infinite-dimensional component, typically an unknown function.

In our regression example we would express such a model as follows:

Y = \beta^\top X + g(Z) + \epsilon

where we recover the finite-dimensional parametric part \beta and an arbitrary, infinite-dimensional function g.
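One rough way such a model can be estimated is by backfitting: alternate between a kernel smoother for the unknown function g and least squares for \beta. The sketch below uses simulated data, and the smoother, bandwidth, and iteration count are illustrative choices, not a reference implementation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
X = rng.normal(size=n)
Z = rng.uniform(-3, 3, size=n)
beta_true = 1.5
Y = beta_true * X + np.sin(Z) + rng.normal(0.0, 0.3, size=n)  # g(z) = sin(z)

def smooth(z_eval, z, y, h=0.3):
    """Nadaraya-Watson kernel smoother: the nonparametric piece."""
    w = np.exp(-0.5 * ((z_eval[:, None] - z) / h) ** 2)
    return (w * y).sum(axis=1) / w.sum(axis=1)

beta = 0.0
for _ in range(10):
    # Given beta, estimate g nonparametrically from the partial residuals
    g_hat = smooth(Z, Z, Y - beta * X)
    # Given g, estimate beta by least squares on the remaining residuals
    r = Y - g_hat
    beta = (X @ r) / (X @ X)
```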

One practical aspect of non-parametric regression models is that they need more data than parametric ones, because the data must supply both the parameter estimates and the model structure itself.

Some interesting non-parametric regression model resources:


Alexis is the founder of Aleph Technologies, a data infrastructure consulting and professional services provider based in Brussels, Belgium.
