Tricks of the Trade
Copyright 1996, Jack G. Ganssle
Abstract
More troubleshooting hints and kinks.
Troubleshooting
is more art than science. The Grand Masters of troubleshooting draw on a wealth
of experience, gleaned from battles fought
in the 3 AM getting-a-product-out trenches. One of the biggest challenges
faced by any engineer -- and any engineering manager -- is
gaining this experience quickly, so troubleshooting becomes yet another finely-honed
skill in one's toolbox.
In conversations
with engineers I've discovered a troubling pattern: more and more often troubleshooting
seems to be relegated to a
handful of old-timers. Are we seeing the beginning of the end of these critical
skills?
One of the fascinating
parts of having young children is learning the limits of wisdom. Kids just
have to make many of the same mistakes
we made, despite our desire to shield them from these errors, and despite
our best wishes to help them over life's rocky paths.
I think we unconsciously
adopt the same philosophy in all facets of business - more due to laziness
or lack of time than from
paternalistically watching our charges pick up things the hard way. Perhaps
the real reason we abandon an aggressive teaching strategy is
the lack of a codified "state of the art". How many of us, having
mastered troubleshooting or any other art, write about these skills? How
many of us even make rough notes about a cool trick or clever solution to
problems?
Fact is: virtually
no one does, so each generation of designers must reinvent the same skillset.
Seems pretty silly, doesn't it? I'm
always looking for ways of accelerating the learning process for the engineers
I work with. Ideally, we as an industry will someday
develop a handbook of troubleshooting wisdom. In the meantime the best we
can do is pass along our own experiences, and collect other
sources of knowledge.
One of the best
troubleshooting references is Bob Pease's book, Troubleshooting Analog Circuits
(Butterworth-Heinemann, Boston, 1991).
Though aimed squarely at the analog designer, it's still a must-read for us
digital folks. Never succumb to the temptation to forget that
digital electronics is still electronics; those ones and zeroes are merely
abstractions of specific voltages.
And so, here
are a few tips I've collected, mostly through the school of hard knocks.
Scoping
Next to a skeptical attitude, biased towards questioning every assumption, the
most important tool we use is the oscilloscope.
Emulators, logic analyzers, and all of those other nifty pieces of capital
equipment have their own very important roles, but nothing
measures up to a scope for 90% of normal troubleshooting.
I love toys -
the shinier and newer, the more knobs and displays the better. Scopes glitter
from the pages of catalogs, each with its
own special features luring us into a frenzy of high-tech lust. If you can
afford the best available, by all means scarf that puppy up and
enjoy the thrill of 2 GHz full digital acquisition at the touch of a button.
The rest of us
will often have to make do with something less extravagant. Though there is
no substitute for the correct test equipment,
clever use of that which we have may often be all that is required. One consultant
I know still uses a vacuum-tube based 545 with only
about 20 MHz bandwidth. Personally, I think he's working too hard, as spending
a few grand on a modern instrument seems like a minimal
price of entry to the field, but his deep knowledge of the scope, and troubleshooting
skills, makes him quite successful at finding tough
problems.
One of the worst
mistakes we make is neglecting probes. Crummy probes will turn that wonderful
1 GHz instrument into junk. Managers hate
to spend a lot on probes, when they see them drooling onto the floor, mixed
with all of the other debris. Worse, we always immediately
lose the tips and other accessories acquired at great expense, so we connect
to a node using a 12-inch clip lead hastily purchased at Radio
Shack.
Then, after destroying
a couple of chips by accidentally shorting things to ground with that nice
alligator ground clip mounted on the
probe, we tear it off in frustration, losing it as well. Tip: if you really
don't intend to use the ground connection, clip that alligator
lead to itself, keeping it out of harm's way but instantly available for use.
Take care of
your probes. Keep them off the floor; don't let your chair roll over the leads,
squishing the coax and changing its
impedance. Buy decent ones before every probe in the shop falls apart. After
trying all of the cheap varieties found in general electronic
catalogs, I now swallow hard and spend the $150 needed to get high quality
probes from Tektronix or HP.
Here's another
tip: when using a scope, if a signal looks weird, maybe there's something
wrong! Avoid the temptation to rationalize the
problem. Instead of blaming the signal on a lousy ground, quickly connect
that ground clip and test your assumption.
Never accept
something that looks awful. Convince yourself that either it's actually OK,
or find the source of the problem.
Walk through
your lab. You'll find most of the digital folks have their vertical amplifiers
set to 2 V/division, which eases displaying two
traces simultaneously. Unfortunately, too many of us seem to think the vertical
gain knob is welded into position. It's hard to
distinguish a valid zero from one drooling just a little too high with so
little resolution! Flip to 1 V/division occasionally to make
sure that zero is legitimate.
Every instrument
is a lying beast, a source of both information and disinformation. The scope
is no exception. A 100 MHz scope will show
even a perfect 50 MHz clock as a sine wave, not in its true square form.
Digital scopes exhibit aliasing - sweep too slowly (sampling below the
Nyquist rate) for a given signal, and that 50 MHz clock may look like a perfect
1 kHz signal, causing the inexperienced engineer to go
crazy searching for a problem that just does not exist. You have to know your
tools to use them effectively.
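Both effects in the paragraph above can be estimated with a little arithmetic. What follows is a minimal sketch, not a model of any particular scope. It folds a pure tone into the first Nyquist zone to find the frequency an ideal sampler would display, and it lists which odd harmonics of an ideal square wave fit under the scope's bandwidth. The bandwidth is treated as a hard cutoff, which is a simplification, since real front ends roll off gradually. The function names and the 49.999 MS/s sample rate are illustrative assumptions, chosen only to reproduce the 1 kHz figure.

```python
def apparent_frequency(signal_hz, sample_rate_hz):
    """Frequency an ideal sampler would show for a pure tone at signal_hz."""
    # Fold the tone into the first Nyquist zone, 0 .. sample_rate / 2.
    remainder = signal_hz % sample_rate_hz
    return min(remainder, sample_rate_hz - remainder)


def visible_harmonics(clock_hz, bandwidth_hz, max_harmonic=9):
    """Odd harmonics of an ideal square wave at or below the bandwidth."""
    # An ideal square wave contains only odd harmonics: 1x, 3x, 5x, ...
    return [n * clock_hz for n in range(1, max_harmonic + 1, 2)
            if n * clock_hz <= bandwidth_hz]


# Sampled fast enough, a 50 MHz tone shows up as 50 MHz.
print(apparent_frequency(50_000_000, 200_000_000))
# Sampled at a hypothetical 49.999 MS/s, it masquerades as 1 kHz.
print(apparent_frequency(50_000_000, 49_999_000))
# Only the fundamental of a 50 MHz clock fits in 100 MHz: a sine wave.
print(visible_harmonics(50_000_000, 100_000_000))
```

If the displayed frequency changes when you change the sweep speed, suspect aliasing rather than the circuit.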
We digital folks deal in ones and zeroes and tristates. Each condition means something.
When troubleshooting you've got to know which of
these three (not two!) states a node is in. Our best tool is the scope, yet
it is inherently incapable of distinguishing the tristate
condition.
In the good old
days of LS technology you could be pretty sure a tristated signal would show
up at around 1.5 volts - somewhere between a
zero and a one. With CMOS this assurance is gone, yet most engineers blithely
continue to assume that zero volts means zero. It just ain't
so!
My solution is
a little tool I made: a 1k resistor with a clip lead on each end. Mine is
nicely soldered together and covered with
insulation to avoid shorts. To tell the difference between a legal state and
high impedance, clip the tool to the node and alternately
touch the other end to Vcc and then ground. If the node moves more than a
trifle something is wrong. The scope, plus my tool, lets me
identify all three possible states. Without the tool I'm guessing, and guessing
while troubleshooting always sends you down time-consuming
blind alleys.
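The reason the resistor trick works is plain voltage-divider arithmetic, sketched below. This is a simplified model under stated assumptions: the driver is an ideal voltage source behind a fixed output impedance, and the 50 ohm driver impedance, 5 V supply, and 0.5 V "more than a trifle" threshold are all hypothetical values picked for illustration. Only the 1k resistor comes from the text.

```python
def node_voltage(drive_v, drive_ohms, pull_v, pull_ohms=1_000.0):
    """Voltage at a node when the 1k tool pulls it toward pull_v.

    drive_ohms is the driver's output impedance; None means tristated.
    """
    if drive_ohms is None:
        # Nothing is driving the node, so it simply follows the resistor.
        return pull_v
    # Driver and tool form a voltage divider between drive_v and pull_v.
    return drive_v + (pull_v - drive_v) * drive_ohms / (drive_ohms + pull_ohms)


def classify(drive_v, drive_ohms, vcc=5.0, threshold_v=0.5):
    """Return 'floating' if the node moves more than threshold_v, else 'driven'."""
    swing = (node_voltage(drive_v, drive_ohms, vcc)
             - node_voltage(drive_v, drive_ohms, 0.0))
    return "floating" if swing > threshold_v else "driven"


print(classify(0.0, 50.0))   # a solid zero barely moves
print(classify(5.0, 50.0))   # a solid one barely moves
print(classify(0.0, None))   # a tristated node follows the tool
```

With these assumed values a driven node shifts by roughly a quarter of a volt, while a floating one swings rail to rail, which is why the difference is so obvious on the scope.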
You can use a
variation of this approach when troubleshooting an intermittent problem. If
the silly thing refuses to fail when you're
working on it - a sure bet, given the perversity of nature - run your fingers
over the board's pins. A purely digital board should
continue to run despite the slight impedance changes brought about by your
fingers, yet these may be enough to drive a floating pin to the
other state, hopefully creating the failure you are looking for.
On SMT boards
it's tough to get at a device's pins. If there's one pin you are suspicious
of, touch it with an X-Acto knife. The sharp
blade will precisely align with any tiny pin, and its metal handle will conduct
your body impedance to the node. Sometimes I'll connect
my trusty pullup/pulldown clip lead to the knife itself to exercise the node
more deterministically.
Other Tricks
The most effective troubleshooting tool is a keen eye. With a working design,
most problems stem from poor manufacturing. How
many of us have spent hours troubleshooting a board, only to find a missing
chip? Perhaps the wrong part is installed, or the correct one
upside down.
In smaller companies
engineering is often production's backup for troubleshooting. Don't accept
boards unless a technician has performed a
careful visual inspection first.
Then, inspect
it yourself. It's far faster to find most manufacturing defects by eye than
by component-level diagnosis. Look for those
missing and backwards chips. Check soldering and solder splashes.
Inspect soldering
on through-hole boards using a not-terribly sharp pointer, like an awl. Move
it along every pin, using it as a guide for
your eye (which will otherwise quickly tire looking at a sea of pins). Scan
the board one chip at a time, working in a logical progression
from one side of the board to the other. Look for unsoldered and poorly-soldered
pins, as well as solder splashes. If it looks bad, it is.
PC board defects
are the most frustrating of all problems. Despite modern quality control processes
they are still far too common. Keep
the PCB artwork around as a reference, so you can see where the tracks run
when it's time to fix a short or a design problem.
Often a new design
suffers from a problem you just KNOW you can cure by grounding a signal. Be
wary of using a clip lead as a grounder:
high speed signals will see the lead's inductance as a high impedance. The
ground end will be at ground, for sure. The signal end may not
look much different than without the clip lead attached. Edges are so fast
now, even in slow systems, that wires no longer act like wires.
Solder a very short run to ground, perhaps using a discarded
resistor lead. I have found that grounding via a clip lead now only
works on DC signals. Sigh... in the good old days of slow systems a mountain
of clip leads was a troubleshooter's best weapon. Now look
warily at that mound, realizing that a wire is not a wire.
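A back-of-the-envelope calculation shows why the clip lead stops behaving like a wire. The sketch below rests on two common rules of thumb that are assumptions here, not figures from the text: roughly 20 nH of inductance per inch of wire, and an edge bandwidth of about 0.35 divided by the rise time. The 1 ns rise time and the lead lengths are likewise hypothetical.

```python
import math


def lead_inductance_h(length_in, nh_per_inch=20.0):
    """Rough inductance of a straight wire (rule-of-thumb assumption)."""
    return length_in * nh_per_inch * 1e-9


def edge_frequency_hz(rise_time_s):
    """Approximate bandwidth of an edge, using the 0.35 / rise-time rule."""
    return 0.35 / rise_time_s


def lead_impedance_ohms(length_in, rise_time_s):
    """Magnitude of the lead's inductive reactance, |Z| = 2*pi*f*L."""
    return (2 * math.pi * edge_frequency_hz(rise_time_s)
            * lead_inductance_h(length_in))


# A 12 inch clip lead against a 1 ns edge: hundreds of ohms, hardly a ground.
print(round(lead_impedance_ohms(12, 1e-9)))
# A quarter-inch resistor lead against the same edge: about 11 ohms.
print(round(lead_impedance_ohms(0.25, 1e-9)))
```

The exact numbers matter less than the ratio: under these assumptions the impedance scales directly with lead length, so the shortest possible run wins.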
Use all of your
tools. One of our scopes has a neat digital counter. We use it for tough hardware/software
troubleshooting problems.
Unsure if an interrupt comes as often as it should? The counter will tell
you without a doubt how many come along. Wondering if all
interrupts get serviced? Put one counter on the interrupt line, and another
on the acknowledge, and see that the values are identical.
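If your scope lacks a counter but can export a captured trace, the same comparison can be done in software. This is only a sketch of the idea, not the hardware-counter method described above. It assumes both lines have already been reduced to lists of 0/1 logic samples and that both signals are active high; the function names and sample data are hypothetical.

```python
def count_rising_edges(samples):
    """Count 0 -> 1 transitions in a sequence of logic samples."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if not prev and cur)


def unserviced_interrupts(irq_samples, ack_samples):
    """Requests seen minus acknowledges seen; zero means all were serviced."""
    return count_rising_edges(irq_samples) - count_rising_edges(ack_samples)


# Hypothetical captures: three requests, but only two acknowledges.
irq = [0, 1, 0, 1, 0, 1, 0]
ack = [0, 0, 1, 0, 0, 0, 1]
print(unserviced_interrupts(irq, ack))
```

A nonzero result only says that the counts differ over the capture window; an acknowledge that falls just outside the window will also show up as a mismatch.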
Computer systems
will crash and burn from a single event. Though digital scopes are wonderful
at capturing single-shot signals, it's
usually much easier to work with a problem that repeats itself, often, so
you can run tests at will. A logic analyzer excels at finding
these one-time problems, but most won't help much with electrical (say, marginal
signal levels) issues.
Always be on
the lookout for ways to cause these events to repeat. For example, the easiest
way to troubleshoot reset problems is to use a
pulse generator to reset a dead CPU repeatedly, so you can scope the reset
sequence.
Years ago we
used a shortwave radio to listen to the operation of our systems' code. With
a little experience we knew what sort of noise
to expect in each of the instrument's important operating modes. With the
volume turned to a quiet murmur, any change in its buzz
instantly signaled trouble. Troubleshooting is a multi-sensory experience.
Wait! What's that? It smells like a resistor burning - a wire-wound, by its odor.
The game's afoot!