Data-410-Raposo

View the Project on GitHub aeraposo/Data-410-Raposo

Reading reflection - Locally Weighted Regression: An Approach to Regression Analysis by Local Fitting (Cleveland and Devlin)

Based on visibly discernible trends in data, one may quickly settle on an approximate parametric (linear) model to represent a supposed relationship. Although this may be an appropriate starting point for estimating a model, it is ultimately riddled with uncertainty and may poorly approximate more specific, localized trends. A more suitable option for such analysis is locally weighted regression, or "loess". Loess approximates non-parametric trends by fitting linear models to sequential subsets of the data. The data are divided into intervals along one dimension. Each interval contains a number of points that is either chosen by the computer or specified by the user, and the points within each interval are weighted by a weight function.

Let $X$ represent multidimensional input data (an $n \times p$ matrix with transpose $X^T$) and let $y$ be the dependent variable. Parametric models assume the form $y = X\beta + \sigma\epsilon$, where $\epsilon$ is independently and identically distributed with mean 0 and standard deviation 1. To determine the parameters, or weights, $\beta$ of the model, we multiply both sides by $X^T$ and write $X^T y = X^T X\beta + \sigma X^T\epsilon$, where $\sigma X^T\epsilon$ is model noise and $E[\epsilon] = 0$. Thus, $X^T y \approx X^T X\beta$. This allows the parametric model's $\beta$ parameters, one for each column of $X$ (representing the different dimensions of the data), to be calculated by $\beta = (X^T X)^{-1} X^T y$.

Loess uses this same process to estimate a parametric equation on each data subinterval. The difference is that each point's contribution is scaled by its weight: with $W$ the diagonal matrix of weights for the current interval, $\beta = (X^T W X)^{-1} X^T W y$. Because weights are calculated for individual datapoints, we can ensure that although the model may not be smooth, it will be continuous (no breaks or jumps). Note that with a widely varied distribution of datapoints, the fit can overfit if too few points are selected for each interval or underfit if too many are selected. However, modifications to the kernel (or "bump") function can help reduce this.
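To make the weighted fit concrete, here is a minimal sketch in Python for a single input dimension. It assumes distinct $x$ values and at least four points. It uses the tricube weight function commonly associated with loess and sets the neighborhood size as a fraction `frac` of the data. At each point it solves $\beta = (X^T W X)^{-1} X^T W y$ for an intercept and slope, where $W$ is a diagonal matrix of weights. With all weights equal to 1, this reduces to ordinary least squares, $\beta = (X^T X)^{-1} X^T y$. The function names `tricube`, `weighted_linear_fit`, and `loess` and the `frac` parameter are hypothetical and do not come from Cleveland and Devlin. The sketch also leaves out refinements such as robustness iterations.

```python
import math


def tricube(d):
    """Tricube weight: (1 - |d|^3)^3 for |d| < 1, and 0 otherwise."""
    d = abs(d)
    return (1 - d ** 3) ** 3 if d < 1 else 0.0


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = b0 + b1 * x.

    Solves beta = (X^T W X)^{-1} X^T W y with X = [1, x] and W = diag(ws),
    inverting the 2x2 matrix X^T W X in closed form.
    """
    # Entries of X^T W X = [[s0, s1], [s1, s2]] and X^T W y = [t0, t1]
    s0 = sum(ws)
    s1 = sum(w * x for w, x in zip(ws, xs))
    s2 = sum(w * x * x for w, x in zip(ws, xs))
    t0 = sum(w * y for w, y in zip(ws, ys))
    t1 = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = s0 * s2 - s1 * s1
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    return b0, b1


def loess(xs, ys, frac=0.5):
    """Local linear fit evaluated at every x in xs (distinct x, len >= 4)."""
    n = len(xs)
    # Points per neighborhood; at least 4 so that, with distinct x values,
    # at least two points get nonzero weight and det above is nonzero.
    k = min(n, max(4, math.ceil(frac * n)))
    fitted = []
    for x0 in xs:
        # Bandwidth h: distance from x0 to its k-th nearest point (x0 included)
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        ws = [tricube((x - x0) / h) for x in xs]
        b0, b1 = weighted_linear_fit(xs, ys, ws)
        fitted.append(b0 + b1 * x0)
    return fitted


if __name__ == "__main__":
    xs = [float(i) for i in range(10)]
    ys = [2 * x + 1 for x in xs]
    # Exactly linear data are reproduced: values close to 1, 3, 5, ..., 19
    print(loess(xs, ys, frac=0.5))
```

Increasing `frac` widens each neighborhood and smooths the curve, which risks underfitting. Decreasing it lets the curve follow local noise, which risks overfitting.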