3 Gordon Drive, P.O.Box 1347 Rockland, Maine 04841 U.S.A.
Find Tools for Your Chip


Subscribe to our Newsletter

© 2004 Avocet Systems, Inc.
Call Us Today at 207-596-0080
Avocet Systems, Inc. : The Complete Solution for Embedded Systems Development Tools
Convolutions

Copyright 1993, Jack G. Ganssle

Abstract

Convolutions are a really cool way to smooth and even differentiate data. You gotta check this out.

Published in Embedded Systems Programming, December 1993

In November of 1992 I wrote about taming analog noise, a subject that generated a heart warming flood of letters. A lot of you apparently
develop systems that read A/D converters and compute some sort of response based on the input data. In the good old days cheap 8 bit A/Ds
were more than adequate for many applications, and had such a crummy resolution that noise wasn't much of an issue. That has certainly
changed. I'm astonished at the number of ultra-high resolution converters now cheaply available. Maxim and others sell 16, 18, and even 20
bit A/Ds. Some are surprisingly cheap.

18 bits of conversion drives the weight of each bit into the microvolts. Each bit is worth:

(input voltage range)/(2**number of bits).

An old 8 bit A/D converting over a 0 to +5 range had a resolution of about 20 millivolts (.020 volts). 18 bits over the same range gives
each LSB 19 microvolts (.000019 volts)! It's awfully hard to produce analog circuits that have less than 19 microvolts of noise.

As I discussed a year ago, fast repetitive signals or those modulated on a high frequency carrier are fairly easy to filter. An example is
your TV set, which receives not much more than a few microvolts at the antenna, buried in many millivolts of noise. This is easy to
filter since the signal is modulated on the station's carrier frequency. Narrow bandwidth RF filters notch out virtually everything that
is not at almost exactly the carrier's center frequency. The noise is distributed over a huge bandwidth; the filters remove all but a tiny
portion of that bandwidth, taking the noise out as well.

Radios are rather a special case. Most signals presented to digital systems are at a much lower frequency. Indeed, most are non-periodic,
nearly DC voltages. No carrier exists; all of the "bandwidth" carries important information. Most solutions revolve around smart
or silly averaging techniques: you read the sample N times, sum each read, and then divide by N.

It's important that only the latest N readings are included in the average, so very old data can't skew the results. N is chosen large
enough to generate quiet data, but must be small compared to the rate data changes at, to avoid smearing the results.

For a running average, as each sample is collected the oldest is dropped and the new one added. A circular buffer is often used to store
the readings. Once the buffer is initially filled the system can provide a smoothed output by taking only one reading and computing the
average of the buffer's contents.

Any simple averaging technique reduces noise at a rate proportional to the square root of the number of samples in the average. Using 100
samples results in an order of magnitude improvement in noise figure. Since any system will eventually be bound by the computer and A/D
speed, this results in a sort of diminishing return. Increasing the number of averages to 1000 results in only a factor of three or so
improvement in noise over an average of 100. A tradeoff must be made between system response time and allowable noise level.

If your system acquires "swept" data (like that from an oscilloscope: signal strength on one axis and time or some other
parameter on the other), you have the option of averaging both in time (by averaging each matching point of several sweeps together) as
well as in the dimension the data is taken over. For a scope this dimension is time - a sample is taken so many times per second. Since
the data behaves as a continuous function, there are no sharp discontinuities between data points. Any point i is much like its
neighbors, so a short average over the time axis does not distort the result excessively.

Averaging over the baseline axis (say, time) will indeed smear the data somewhat, but is inherently faster than averaging multiple sweeps.
Your average is complete almost in real time.

To average in the baseline dimension, imagine the digitized data has been placed in an array D[i], where i runs from 0 (the first point of
the current scan) to k (the last point of the current scan). You can easily generate a smoothed version of this data by computing a new
array D1[i..k]. Every element i of D1 is the average of D[i-2], D[i-1], D[i], D[i+1], D[i+2].

Of course, this has the effect of smearing the output data. Sharp peaks will be somewhat flattened. Can we define an averaging strategy
that will avoid this side-effect?

The Convolution

The system just described generates an output for each point i that is the average of all the points in the vicinity of i. In effect, for
each i we make an output by multiplying the input waveform by a unit step function, and summing the results at each point. The unit step
is then slid one point to the right and the algorithm repeated. Adjusting the number of points included in each average simply changes the
width of the step. Wider steps (i.e., a bigger N) give more noise-free data, but at the price of smearing it. Narrow steps give noisy
signals that are more faithful to the input data. A step exactly one point wide gives the unmodified input signal. We call this technique
the convolution.

Now, when I went to collage, the professors baffled me with math that left the concept behind convolution a total mystery. Sure, I could
sling the exponential integrals the academics loved, but really had no idea the convolution is the simple process of using one function,
in a narrow sliding window against another function, to generate a third that is the combination of the two.

In averaging we slide a very simple function along the axis. It's coefficients are 0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0. Each string of zeroes
is infinitely long, and the string of ones is N points wide. Sure, we recognize that this smears the results. Why not "convolve"
the input data with a function that resembles the signal itself? In other words, pick a set of weights that accentuate the mid-point and
include reduced levels of the more distant points. Computationally this involves multiplying each point in the vicinity of i with a
weighting factor, and then averaging the results. Points far away from the center play a much less significant role in the result, giving
a more faithful output waveform. In other words, use a set of weighting values to fit the result to the anticipated result.

Remember that any continuous function can be approximated with a polynomial? We'll take advantage of this here, and represent the signal's
shape over a small interval with a polynomial of some degree. We'll fit a curve through the data to generate a more realistic model of
the signal we're digitizing. No one point will be assumed to be correct; the curve defined by a collection of points will yield data that
is, on the average, closer to the truth.

The goal of curve fitting techniques is to reduce the "amount of error" in the data. A least squares approach does just this. If
you wish to minimize the maximum error you can use the Chebyshev approach, which uses trig relations rather than polynomials to
approximate the function.

Least squares will find coefficients for a polynomial describing a curve that fits the data in such a fashion that the "sum-
square" error is minimized, i.e., the square root of the sum of the squares of the error at each point is a minimum. The error at
each point is the difference between what the fitted curve predicts, and what the actual data is.

It's too hard to compute a least squares polynomial in real time. Fortunately, the concept of convolutions can be extended to perform the
least squares automatically.

The derivation is too involved to present here. I suggest reference to the Savitsky and Golay (see reference). Suffice to say that a set
of integers can be defined that, when convolved with an input signal, gives an output waveform that is a least squares fit to an
"ideal" signal over a narrow range.

Picking the convolving functions is a bit of an art form, and depends on the characteristics of your input data. Very complex data will
need high-order polynomials to get a decent fit. Less busy data can probably be matched with simple third order equations.

Here are a few sets of integers that define convolving functions that will yield least squares fits to input data. Different convolution
intervals are given. A polynomial fit will be made over the number of sample points shown. Thus, the set of 5 integers will try to fit 5
sample points.

Just as you'd divide by N when doing a simple average of N samples, you must divide the sum by the "norm" as shown above. In
this case, the norm is 35.

If the data is very busy (i.e., over only a few sample points it undergoes a lot of maxima and minima), then a small set of integers
should be picked (e.g., 9 rather than 21). After all, the method attempts to fit a curve to a short segment of the input data; busy data
is all but impossible to fit under any circumstances. Data that changes slowly can use the larger sets of integers, resulting in more
smoothing. Differentiation

Stretch you mind back to college circuit theory. Those who managed to stay awake may remember that the convolution integral has an
important property, namely:

f'(t)=g(t)*h'(t) and, f'(t)=g'(t)*h(t)

where * represents the convolution process and the prime marks indicate the derivative.

This means if you need to both smooth and differentiate a signal, you can convolve the signal with the derivative of the convolving
function. You never need to explicitly differentiate the input signal, a task that can use far too much computer power to be practical.

Since there are many instances where it is important to compute the derivative of an input signal, it is worthwhile to pursue this a
little further. Remember that the convolving process primarily smoothes input data. If a useful smoothing function is picked, the
derivative of that function can be convolved with the input signal to both smooth and take the input's derivative. A single convolution
can do both.

Why compute a derivative? The first derivative of a function is nothing more than its rate of change, a factor used extensively in some
embedded systems, particularly in instruments.

So, if we compute the derivative of the least squares function, we can generate a new set of integers that both smoothes and
differentiates the data. Again, refer to the reference for the math details. Theoretically, you can extend this concept to any number of
derivatives, though it seems above about the third derivative the integers get unwieldy.

The following set of integers computes the first derivative:

Apply these in exactly the same manner as previously described.

Conclusion

A lot of us primarily use old-fashioned averaging for noise reduction, often simply summing successive sweeps together. The convolution is
a different way of looking at noise reduction, potentially improving response time by performing a smart average over the time dimension.
It even lets you differentiate the signal, for essentially no cost in speed or memory. Of course, any sort of averaging does reduce signal
fidelity somewhat, but in smoothing, as in life, everything is a compromise.

Those who are interested in pursuing this further should consult the classic paper by Abraham Savitzky and Marcel Golay in the July, 1964
issue of Analytical Chemistry (Volume 36 Number 8).