Say you have some data that looks like this -

The high degree of variance makes it hard to see any trends in the data. A common technique for reducing this noise so that patterns become more apparent is called 'curve smoothing'. Various algorithms are available for this, and this post covers three of them: rolling means, local regressions and smoothing splines.

### Rolling Means

This is the simplest of the smoothing algorithms. The basic premise is that taking averages tends to reduce the variance in a data set and thus dampens extreme values. A rolling mean calculates each value by taking the average of the last `n` values. So `n = 10` would calculate the value by taking the average of the current value and the previous 9 values. Here's how the original curve looks when the rolling mean algorithm is applied to it.

- n = 10

- n = 20

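In R, the `zoo` package provides a convenient `rollmean` function that takes the size of the rolling window as its `k` parameter. Note that `rollmean` centers the window on each point by default; pass `align = "right"` to average the current value with the previous `k - 1` values.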
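As a rough sketch of the idea, the trailing rolling mean can be reproduced in base R with `stats::filter`. The synthetic data and the helper name `trailing_mean` are assumptions made for illustration; they are not from the original data set.

```r
# Synthetic noisy series; these values are made up for illustration.
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(length(x), sd = 0.5)

n <- 10  # window size

# Hypothetical helper: trailing rolling mean using only base R.
# Each output is the mean of the current value and the previous n - 1 values.
# The first n - 1 entries are NA because a full window is not yet available.
trailing_mean <- function(v, n) {
  as.numeric(stats::filter(v, rep(1 / n, n), sides = 1))
}

y_smooth <- trailing_mean(y, n)

# With the zoo package installed, the equivalent call would be:
#   zoo::rollmean(y, k = n, fill = NA, align = "right")
```

A larger `n` averages over more points, which gives a smoother curve at the cost of lagging further behind sharp changes in the data.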

### Local Regressions

In simple terms, for each point this algorithm calculates a weighted least squares fit over a subset of the data chosen using the nearest neighbors algorithm. The fraction of the data points used in each local fit is controlled by the `α` parameter. It is also referred to as `loess` for brevity. Here's how the original curve looks when the local regression algorithm is applied to it.

- α = 0.1

- α = 0.6

In R, the `loess` function in the `stats` package (loaded by default) provides a good implementation. The `span` parameter controls the `α` value. Note that you need to feed the model generated by the `loess` function to `predict` to get the resulting `y` values.
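A minimal sketch of that workflow, assuming a synthetic noisy sine wave in place of the original data (the data frame `df` and the variable names are made up for illustration):

```r
# Synthetic noisy series; these values are made up for illustration.
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(length(x), sd = 0.5)
df <- data.frame(x = x, y = y)

# Small span: each local fit uses about 10% of the points,
# so the curve follows the noise more closely.
fit_wiggly <- loess(y ~ x, data = df, span = 0.1)

# Larger span: each local fit uses about 60% of the points,
# so the curve is much smoother.
fit_smooth <- loess(y ~ x, data = df, span = 0.6)

# loess returns a model object; predict() yields the smoothed y values.
y_wiggly <- predict(fit_wiggly, newdata = df)
y_smooth <- predict(fit_smooth, newdata = df)

# To compare visually (not run here):
#   plot(x, y, col = "grey"); lines(x, y_wiggly); lines(x, y_smooth, lwd = 2)
```

Passing a different `newdata` to `predict` evaluates the fitted curve at other `x` values within the range of the original data.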

### Smoothing Splines

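This algorithm uses the properties of a spline function, a piecewise polynomial whose pieces join smoothly, to calculate the smooth curve. It trades off closeness to the data against the roughness of the curve, and that trade-off is controlled by the `λ` parameter: the larger the value, the smoother the result. Here's how the original curve looks when the smoothing spline algorithm is applied to it.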

- λ = 0.1

- λ = 0.6

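In R, the `smooth.spline` function in the `stats` package (loaded by default) provides a good implementation. The `spar` parameter controls the `λ` value. Note that `spar` is a rescaled smoothing parameter, typically in the range (0, 1], rather than `λ` itself; `λ` increases monotonically with `spar`.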
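Here is a small sketch of how this might look, again assuming a synthetic noisy sine wave rather than the original data (all variable names are made up for illustration):

```r
# Synthetic noisy series; these values are made up for illustration.
set.seed(42)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(length(x), sd = 0.5)

# Low spar: weak roughness penalty, so the curve stays close to the data.
fit_low <- smooth.spline(x, y, spar = 0.1)

# Higher spar: stronger penalty, so the curve is smoother.
fit_high <- smooth.spline(x, y, spar = 0.6)

# predict() on a smooth.spline fit returns a list with x and y components.
y_low <- predict(fit_low, x)$y
y_high <- predict(fit_high, x)$y

# The fitted object records the lambda that corresponds to the chosen spar.
lambda_low <- fit_low$lambda
lambda_high <- fit_high$lambda
```

If `spar` is left out, `smooth.spline` picks the amount of smoothing itself using cross-validation, which can be a reasonable starting point before tuning by eye.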

Of course, there are many more ways you could smooth the data and I would encourage you to find the one that makes the most sense for the problem domain you are working in.