Based on visibly discernible trends in data, one may quickly settle on an approximate parametric (for example, linear) model to represent a
supposed relationship. Although this may be an appropriate starting point for estimating a model, it carries considerable uncertainty
and may poorly approximate more specific, localized trends. A more suitable option for such analysis is locally weighted regression, or “loess”.
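Loess approximates non-parametric trends by fitting linear models to successive local subsets of the data. The subsets are intervals along one
dimension of the data, each containing a number of points that is either determined by the software or specified by the user, and the points in each
interval are weighted by a weight function (typically one that gives points close to the location being estimated more influence than distant ones). Let $X$ represent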
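multidimensional input data (a $U \times P$ matrix, with one row per observation and one column per dimension, whose transpose is $X^\top$) and $y$ be the
dependent variable. Parametric models assume the form
$$y = X\beta + \varepsilon,$$
where $\varepsilon$ is independently and identically distributed with mean 0 and standard deviation 1. To determine the
parameters, or weights, of the model, we write
$$y = X\hat{\beta} + \hat{\varepsilon},$$
where $\hat{\varepsilon}$ is model noise (the residual) and $X^\top \hat{\varepsilon} = 0$, that is, the residual is orthogonal to every column of $X$.
Thus,
$$X^\top y = X^\top X \hat{\beta},$$
which allows the parametric model's parameters $\hat{\beta}$, associated with each column of $X$ (representing the different dimensions of the data), to be calculated by
$$\hat{\beta} = (X^\top X)^{-1} X^\top y,$$
provided $X^\top X$ is invertible. Loess uses this same process to estimate parametric equations on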
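data subintervals as described above, except that each point's contribution is scaled by its weight, so the estimate on each interval becomes
$\hat{\beta} = (X^\top W X)^{-1} X^\top W y$, where $W$ is the diagonal matrix of point weights. Because weights are calculated for individual data points,
we can ensure that although the model may not be smooth, it will be continuous (no breaks or jumps). It is important to note that when data points are
unevenly distributed, selecting too few points for each interval can lead to overfitting and selecting too many can lead to underfitting; however,
modifying the kernel (or “bump”) function can help reduce this.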
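The procedure described above can be sketched in Python with NumPy. This is a minimal sketch, not a reference implementation: it assumes a single input dimension with distinct values, a local linear fit at each data point, a fixed fraction `frac` of points per neighborhood, and the tricube weight function commonly used with loess. The names `weighted_lstsq`, `tricube`, `loess`, and `frac` are hypothetical and do not come from any library. Each local fit solves the weighted normal equations; with every weight equal to 1, `weighted_lstsq` reduces to the ordinary estimate $\hat{\beta} = (X^\top X)^{-1} X^\top y$.

```python
import numpy as np


def weighted_lstsq(X, y, w):
    """Solve the weighted normal equations (X^T W X) beta = X^T W y.

    With w = 1 for every point this is ordinary least squares,
    beta = (X^T X)^{-1} X^T y.
    """
    XtW = X.T * w  # equals X.T @ diag(w) without building the diagonal matrix
    # Solving the linear system is more stable than forming the inverse explicitly
    return np.linalg.solve(XtW @ X, XtW @ y)


def tricube(d):
    """Hypothetical helper: tricube weight (1 - |d|^3)^3 for |d| < 1, else 0."""
    d = np.clip(np.abs(d), 0.0, 1.0)
    return (1.0 - d**3) ** 3


def loess(x, y, frac=0.3):
    """Minimal 1-D loess sketch: a weighted local linear fit at every x[i].

    frac is an assumed parameter: the fraction of points in each neighborhood.
    Assumes the x values are distinct.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # At least 3 points, since the farthest neighbor receives tricube weight 0
    k = max(3, int(np.ceil(frac * n)))
    X = np.column_stack([np.ones(n), x])  # intercept column plus the input
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = np.sort(dist)[k - 1]  # distance to the k-th nearest point (including x[i])
        if h == 0:
            raise ValueError("neighborhood has zero width; x values must be distinct")
        w = tricube(dist / h)  # points outside the neighborhood get weight 0
        beta = weighted_lstsq(X, y, w)
        fitted[i] = X[i] @ beta
    return fitted


# Usage: smooth a sine curve sampled at 50 evenly spaced points
x = np.linspace(0, 2 * np.pi, 50)
smooth = loess(x, np.sin(x), frac=0.3)
```

A smaller `frac` lets the curve follow local variation more closely at the risk of overfitting, while a larger `frac` makes the weights more uniform and the result behaves more like a single global linear fit.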