Convolutions
Copyright 1993,
Jack G. Ganssle
Abstract
Convolutions
are a really cool way to smooth and even differentiate data. You gotta check
this out.
Published in
Embedded Systems Programming, December 1993
In November of
1992 I wrote about taming analog noise, a subject that generated a heart warming
flood of letters. A lot of you apparently
develop systems that read A/D converters and compute some sort of response
based on the input data. In the good old days cheap 8 bit A/Ds
were more than adequate for many applications, and had such a crummy resolution
that noise wasn't much of an issue. That has certainly
changed. I'm astonished at the number of ultra-high resolution converters
now cheaply available. Maxim and others sell 16, 18, and even 20
bit A/Ds. Some are surprisingly cheap.
18 bits of conversion
drives the weight of each bit into the microvolts. Each bit is worth:
(input voltage
range)/(2**number of bits).
An old 8 bit
A/D converting over a 0 to +5 range had a resolution of about 20 millivolts
(.020 volts). 18 bits over the same range gives
each LSB 19 microvolts (.000019 volts)! It's awfully hard to produce analog
circuits that have less than 19 microvolts of noise.
As I discussed
a year ago, fast repetitive signals or those modulated on a high frequency
carrier are fairly easy to filter. An example is
your TV set, which receives not much more than a few microvolts at the antenna,
buried in many millivolts of noise. This is easy to
filter since the signal is modulated on the station's carrier frequency. Narrow
bandwidth RF filters notch out virtually everything that
is not at almost exactly the carrier's center frequency. The noise is distributed
over a huge bandwidth; the filters remove all but a tiny
portion of that bandwidth, taking the noise out as well.
Radios are rather
a special case. Most signals presented to digital systems are at a much lower
frequency. Indeed, most are non-periodic,
nearly DC voltages. No carrier exists; all of the "bandwidth" carries
important information. Most solutions revolve around smart
or silly averaging techniques: you read the sample N times, sum each read,
and then divide by N.
It's important
that only the latest N readings are included in the average, so very old data
can't skew the results. N is chosen large
enough to generate quiet data, but must be small compared to the rate data
changes at, to avoid smearing the results.
For a running
average, as each sample is collected the oldest is dropped and the new one
added. A circular buffer is often used to store
the readings. Once the buffer is initially filled the system can provide a
smoothed output by taking only one reading and computing the
average of the buffer's contents.
Any simple averaging
technique reduces noise at a rate proportional to the square root of the number
of samples in the average. Using 100
samples results in an order of magnitude improvement in noise figure. Since
any system will eventually be bound by the computer and A/D
speed, this results in a sort of diminishing return. Increasing the number
of averages to 1000 results in only a factor of three or so
improvement in noise over an average of 100. A tradeoff must be made between
system response time and allowable noise level.
If your system
acquires "swept" data (like that from an oscilloscope: signal strength
on one axis and time or some other
parameter on the other), you have the option of averaging both in time (by
averaging each matching point of several sweeps together) as
well as in the dimension the data is taken over. For a scope this dimension
is time - a sample is taken so many times per second. Since
the data behaves as a continuous function, there are no sharp discontinuities
between data points. Any point i is much like its
neighbors, so a short average over the time axis does not distort the result
excessively.
Averaging over
the baseline axis (say, time) will indeed smear the data somewhat, but is
inherently faster than averaging multiple sweeps.
Your average is complete almost in real time.
To average in
the baseline dimension, imagine the digitized data has been placed in an array
D[i], where i runs from 0 (the first point of
the current scan) to k (the last point of the current scan). You can easily
generate a smoothed version of this data by computing a new
array D1[i..k]. Every element i of D1 is the average of D[i-2], D[i-1], D[i],
D[i+1], D[i+2].
Of course, this
has the effect of smearing the output data. Sharp peaks will be somewhat flattened.
Can we define an averaging strategy
that will avoid this side-effect?
The Convolution
The system just
described generates an output for each point i that is the average of all
the points in the vicinity of i. In effect, for
each i we make an output by multiplying the input waveform by a unit step
function, and summing the results at each point. The unit step
is then slid one point to the right and the algorithm repeated. Adjusting
the number of points included in each average simply changes the
width of the step. Wider steps (i.e., a bigger N) give more noise-free data,
but at the price of smearing it. Narrow steps give noisy
signals that are more faithful to the input data. A step exactly one point
wide gives the unmodified input signal. We call this technique
the convolution.
Now, when I went
to collage, the professors baffled me with math that left the concept behind
convolution a total mystery. Sure, I could
sling the exponential integrals the academics loved, but really had no idea
the convolution is the simple process of using one function,
in a narrow sliding window against another function, to generate a third that
is the combination of the two.
In averaging
we slide a very simple function along the axis. It's coefficients are 0,0,0,0,0,0,0,1,1,1,1,1,0,0,0,0.
Each string of zeroes
is infinitely long, and the string of ones is N points wide. Sure, we recognize
that this smears the results. Why not "convolve"
the input data with a function that resembles the signal itself? In other
words, pick a set of weights that accentuate the mid-point and
include reduced levels of the more distant points. Computationally this involves
multiplying each point in the vicinity of i with a
weighting factor, and then averaging the results. Points far away from the
center play a much less significant role in the result, giving
a more faithful output waveform. In other words, use a set of weighting values
to fit the result to the anticipated result.
Remember that
any continuous function can be approximated with a polynomial? We'll take
advantage of this here, and represent the signal's
shape over a small interval with a polynomial of some degree. We'll fit a
curve through the data to generate a more realistic model of
the signal we're digitizing. No one point will be assumed to be correct; the
curve defined by a collection of points will yield data that
is, on the average, closer to the truth.
The goal of curve
fitting techniques is to reduce the "amount of error" in the data.
A least squares approach does just this. If
you wish to minimize the maximum error you can use the Chebyshev approach,
which uses trig relations rather than polynomials to
approximate the function.
Least squares
will find coefficients for a polynomial describing a curve that fits the data
in such a fashion that the "sum-
square" error is minimized, i.e., the square root of the sum of the squares
of the error at each point is a minimum. The error at
each point is the difference between what the fitted curve predicts, and what
the actual data is.
It's too hard
to compute a least squares polynomial in real time. Fortunately, the concept
of convolutions can be extended to perform the
least squares automatically.
The derivation
is too involved to present here. I suggest reference to the Savitsky and Golay
(see reference). Suffice to say that a set
of integers can be defined that, when convolved with an input signal, gives
an output waveform that is a least squares fit to an
"ideal" signal over a narrow range.
Picking the convolving
functions is a bit of an art form, and depends on the characteristics of your
input data. Very complex data will
need high-order polynomials to get a decent fit. Less busy data can probably
be matched with simple third order equations.
Here are a few
sets of integers that define convolving functions that will yield least squares
fits to input data. Different convolution
intervals are given. A polynomial fit will be made over the number of sample
points shown. Thus, the set of 5 integers will try to fit 5
sample points.

Just as you'd
divide by N when doing a simple average of N samples, you must divide the
sum by the "norm" as shown above. In
this case, the norm is 35.
If the data is
very busy (i.e., over only a few sample points it undergoes a lot of maxima
and minima), then a small set of integers
should be picked (e.g., 9 rather than 21). After all, the method attempts
to fit a curve to a short segment of the input data; busy data
is all but impossible to fit under any circumstances. Data that changes slowly
can use the larger sets of integers, resulting in more
smoothing. Differentiation
Stretch you mind
back to college circuit theory. Those who managed to stay awake may remember
that the convolution integral has an
important property, namely:
f'(t)=g(t)*h'(t)
and, f'(t)=g'(t)*h(t)
where * represents
the convolution process and the prime marks indicate the derivative.
This means if
you need to both smooth and differentiate a signal, you can convolve the signal
with the derivative of the convolving
function. You never need to explicitly differentiate the input signal, a task
that can use far too much computer power to be practical.
Since there are
many instances where it is important to compute the derivative of an input
signal, it is worthwhile to pursue this a
little further. Remember that the convolving process primarily smoothes input
data. If a useful smoothing function is picked, the
derivative of that function can be convolved with the input signal to both
smooth and take the input's derivative. A single convolution
can do both.
Why compute a
derivative? The first derivative of a function is nothing more than its rate
of change, a factor used extensively in some
embedded systems, particularly in instruments.
So, if we compute
the derivative of the least squares function, we can generate a new set of
integers that both smoothes and
differentiates the data. Again, refer to the reference for the math details.
Theoretically, you can extend this concept to any number of
derivatives, though it seems above about the third derivative the integers
get unwieldy.
The following
set of integers computes the first derivative:

Apply these in
exactly the same manner as previously described.
Conclusion
A lot of us
primarily use old-fashioned averaging for noise reduction, often simply summing
successive sweeps together. The convolution is
a different way of looking at noise reduction, potentially improving response
time by performing a smart average over the time dimension.
It even lets you differentiate the signal, for essentially no cost in speed
or memory. Of course, any sort of averaging does reduce signal
fidelity somewhat, but in smoothing, as in life, everything is a compromise.
Those who are
interested in pursuing this further should consult the classic paper by Abraham
Savitzky and Marcel Golay in the July, 1964
issue of Analytical Chemistry (Volume 36 Number 8).