Curve smoothing in R

Wed 13 November 2013

Say you have some data that looks like this:

Randomly Generated Data

It is hard to see any trends in the data owing to its high variance. A common technique for reducing this noise so that patterns become more apparent is referred to as 'curve smoothing'. Several algorithms are available for this, and this post covers three of them: rolling means, local regressions and smoothing splines.

Rolling Means

This is the simplest of the smoothing algorithms. The basic premise is that taking averages reduces the variance in a data set and thus dampens extreme values. A rolling mean replaces each value with the average of the last n values, so n = 10 averages the current value and the previous 9 values. A larger n gives a smoother curve, but one that reacts more slowly to real changes in the data. Here's how the original curve looks when the rolling mean algorithm is applied to it.

  • n = 10

Rolling Mean - n=10

  • n = 20

Rolling Mean - n=20

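In R, the zoo package provides a convenient rollmean function that takes the size of the rolling window as its k argument. Note that rollmean centers the window on each point by default; pass align = "right" to average the current value and the previous k - 1 values as described above.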
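As a rough sketch, here is one way to compute a trailing rolling mean. The simulated series (x and y) is an assumption standing in for the data plotted in this post, which is not shown. The base R version uses stats::filter, and the zoo call only runs if the package is installed.

```r
# Simulated noisy series (an assumption; the post's actual data is not shown).
set.seed(42)
x <- 1:200
y <- sin(x / 20) + rnorm(length(x), sd = 0.5)

n <- 10  # size of the rolling window

# Trailing rolling mean in base R: each value is the average of the current
# point and the previous n - 1 points. The first n - 1 entries are NA.
trailing <- as.numeric(stats::filter(y, rep(1 / n, n), sides = 1))

# The same calculation with zoo, if it is installed. align = "right" asks for
# a trailing window, and fill = NA keeps the output the same length as y.
if (requireNamespace("zoo", quietly = TRUE)) {
  trailing_zoo <- zoo::rollmean(y, k = n, fill = NA, align = "right")
}
```

To get a figure like the ones above, draw the raw points with plot(x, y) and overlay the smoothed values with lines(x, trailing).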

Local Regressions

In simple terms, this algorithm calculates a least squares fit around each point, using a subset of the data chosen with a nearest neighbors search. The fraction of the data points used in each local fit is controlled by the α parameter. It is also referred to as loess for brevity. Here's how the original curve looks when the local regression algorithm is applied to it.

  • α = 0.1

Loess - α=0.1

  • α = 0.6

Loess - α=0.6

In R, the loess function in the stats package (loaded by default) provides a good implementation. The span parameter controls the α value. Note that you need to pass the model returned by the loess function to predict to get the resulting y values.
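A minimal sketch of the loess workflow might look like the following. The simulated data frame is an assumption, since the data behind the figures is not shown, and the two span values are simply the ones used in the figures above.

```r
# Simulated noisy series (an assumption; the post's actual data is not shown).
set.seed(42)
x <- 1:200
y <- sin(x / 20) + rnorm(length(x), sd = 0.5)
df <- data.frame(x = x, y = y)

# span is the fraction of the data used in each local fit.
fit_narrow <- loess(y ~ x, data = df, span = 0.1)  # follows the data closely
fit_wide   <- loess(y ~ x, data = df, span = 0.6)  # much smoother curve

# loess returns a model object; predict turns it into smoothed y values.
smooth_narrow <- predict(fit_narrow, newdata = df)
smooth_wide   <- predict(fit_wide, newdata = df)
```

The smoothed values can then be drawn over the raw points with lines(x, smooth_wide).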

Smoothing Splines

This algorithm uses the properties of a spline function to calculate the smooth curve. The algorithm is iterative in nature and is controlled by the λ parameter, which sets how heavily roughness in the curve is penalized: larger values give a smoother curve. Here's how the original curve looks when the smoothing spline algorithm is applied to it.

  • λ = 0.1

Smoothing Spline - λ=0.1

  • λ = 0.6

Smoothing Spline - λ=0.6

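In R, the smooth.spline function in the stats package (loaded by default) provides a good implementation. The spar parameter controls the λ value: λ is computed as an increasing function of spar, and the function also accepts λ directly through its lambda argument.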
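Here is a short sketch of fitting smoothing splines at two settings. The simulated series is an assumption, and so is the reading that the 0.1 and 0.6 labels in the figures above are spar values rather than raw λ values.

```r
# Simulated noisy series (an assumption; the post's actual data is not shown).
set.seed(42)
x <- 1:200
y <- sin(x / 20) + rnorm(length(x), sd = 0.5)

# A small spar follows the data closely; a larger spar smooths more heavily.
fit_light <- smooth.spline(x, y, spar = 0.1)
fit_heavy <- smooth.spline(x, y, spar = 0.6)

# predict returns a list; the smoothed values are in its y component.
smooth_light <- predict(fit_light, x)$y
smooth_heavy <- predict(fit_heavy, x)$y

# The fitted object records the λ that was derived from spar.
lambda_light <- fit_light$lambda
lambda_heavy <- fit_heavy$lambda
```

Comparing lambda_light with lambda_heavy shows how a change in spar translates into the underlying penalty.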

Of course, there are many more ways to smooth data, and I would encourage you to find the one that makes the most sense for the problem domain you are working in.