The Perils of NMI
Copyright 1991, Jack G. Ganssle
Abstract
NMI is a critical
resource, yet all too often it's misused.
Published in
Embedded Systems Programming, April 1991
Wise amateurs
fear interrupts. Fools go where wise men fear to tread. Normal sequential
code is hard enough to understand, code, and
debug. Toss in a handful of asynchronous events that randomly change the processor's
execution path, perhaps thousands of times per second,
and you have a recipe for disaster.
Yet interrupts
are an important fact of life for all real time systems. No experienced programmer
would dream of replacing a clean
interrupt service routine with polled I/O, particularly where fast I/O response
is required.
In fact, interrupts
are both the best and worst microprocessor feature. Well thought out interrupt-driven
code will be reasonably easy
to write, debug and maintain. A poorly conceived interrupt routine is probably
the worst possible software to work on. Because interrupts
are so important to embedded systems, it is vital to become proficient with
their use.
If interrupts
are tough to work with, then the non-maskable interrupt (NMI) is the true
killer of the business. Before you
connect a peripheral to your processor's NMI input, think through the problems
carefully.
Almost every
processor has some sort of NMI signal, though it may be called something else.
On the 68000, a level 7 interrupt cannot be
masked, and is equivalent to NMI. Some 8051-family CPUs have no non-maskable
interrupt, an idea that is sort of appealing in terms of
enforcing interrupt discipline.
I'm a firm believer
in restricting NMI to those conditions that are truly unusual and of momentous
importance. Quite a few designers use
NMI as a general purpose interrupt, a practice that usually spells disaster.
When timing gets
tight, the code can easily disable a conventional interrupt. Indeed, the very
assertion of an interrupt signal
automatically turns all interrupts off until the software explicitly reenables
them, giving the code a clean window to process a high
priority task. Not so with NMI. An NMI at any time will interrupt the CPU
- no ifs, ands or buts. As long as the hardware supplies NMIs to
the processor, it will stop whatever it's doing and vector through the NMI
handler.
The very fact
that NMI can never be disabled makes it ideal for handling a small but vital
class of extremely high priority events. Chief
among these is a power failure. If a system must die gracefully, then hardware
that detects the imminent loss of power can assert NMI to
let the software park disk heads, put moving sensors into a "safe"
state, copy important variables from RAM to non-volatile
storage, and generally prepare for being down.
Modern power
supplies have little reserve capacity. Old linear designs had massive filtering
capacitors that acted like batteries with
several seconds of reserve capacity. Today's off-line switchers use comparatively
tiny capacitors; smart electronics does the filtering.
When the AC power goes down, the switcher's output quickly follows suit.
During the short
time it takes for power to trail away the code may very well be executing
with interrupts disabled. Only NMI is
guaranteed to be available at all times. Power fail is such an important event
that NMI is really the only option for notifying the
software of power's impending demise.
Perhaps more
should be said about power fail circuits at this point, since so many suffer
from serious design flaws. Most embedded systems
ignore power fail conditions. Running ROM based code with no dangerous or
critical external hardware, they can restart without harm from
the top when power resumes. However, two types of systems require power-fail
management hardware and software. The first category comprises
those systems controlling moving objects; a disk controller should park the
head, a robot should stop all motors, and an X-ray system
should shut down the beam.
The other class
comprises systems that preserve transient data through a power cycle. A data
acquisition system might need to keep logged data even when power goes down,
an instrument sometimes has to save painfully collected calibration constants,
and a video game should remember high scoring individuals' initials and totals.
Decaying Power
Far too many
designs rely on nothing more than battery backed up static RAMs or some true
non-volatile device like an EEPROM to store data
through multiple on/off cycles. More often than not these schemes work, but
all will sooner or later fail. Let's consider what happens
when the AC power fails.
Without AC, the
power supply stops working. The computer continues to run from the energy
stored in the supply's output capacitor. The
amount of time left before the computer goes haywire is proportional to the
size of the capacitor in microfarads and inversely
proportional to the amount of current consumed by the electronics.
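To put rough numbers on that relationship, assume the load draws a constant
current, so the rail droops linearly and t = C x dV / I. Real loads are not
this tidy, so treat the function below as a back-of-the-envelope sketch; the
name holdup_seconds and the example values are illustrative assumptions, not
taken from any particular supply.

```c
#include <assert.h>

/* Estimated time in seconds for the supply rail to droop from v_start to
 * v_min. ASSUMPTION: the load draws a constant current, which gives a
 * linear droop of dV/dt = I / C. */
double holdup_seconds(double cap_farads, double v_start, double v_min,
                      double load_amps)
{
    assert(load_amps > 0.0);      /* a zero load would never discharge */
    assert(v_start >= v_min);     /* the rail can only droop downward  */
    return cap_farads * (v_start - v_min) / load_amps;
}
```

With a hypothetical 1000 microfarad output capacitor and a 1 amp load,
holdup_seconds(1000e-6, 5.0, 4.75, 1.0) works out to roughly 250 microseconds.
Doubling the capacitor doubles the time; doubling the current halves it.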
Until the computer's
5 volts decays to about 4.75 it continues to run properly. At the 4.75 volt
level most of the system's chips are no
longer operating in their design region. No one can predict what will happen
with any certainty.
At about 4.8
to 4.9 volts the well-designed power fail circuit will inject an NMI into
the computer (some detect missing AC cycles, a
better but more expensive approach). Probably the system has only milliseconds
before Vcc decays to the 4.75 volt region of instability.
The NMI routine should quickly shut down external events and save critical
variables.
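A minimal sketch of such a handler follows. Everything hardware-related here
is a hypothetical stand-in: motor_enable plays the part of a memory-mapped
control register, and nv_copy plays the part of battery-backed RAM or EEPROM.
The check word is my own addition, one simple way for the restart code to tell
whether the save finished before the lights went out.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped motor control register. */
static volatile uint8_t motor_enable = 1;

typedef struct {
    uint32_t calibration;   /* example of a value worth preserving */
    uint32_t log_count;     /* another example                     */
    uint32_t check;         /* check word, always written last     */
} saved_state_t;

static saved_state_t live_state;   /* working copy in ordinary RAM       */
static saved_state_t nv_copy;      /* stands in for non-volatile storage */

/* Sum of the fields XORed with a constant, so that an all-zero record
 * (one that was never written) does not look valid. */
static uint32_t check_word(const saved_state_t *s)
{
    return (s->calibration + s->log_count) ^ 0xA5A5A5A5u;
}

/* Power-fail NMI handler: make the outside world safe first, then save. */
void nmi_power_fail(void)
{
    motor_enable = 0;                            /* stop anything that moves */
    nv_copy.calibration = live_state.calibration;
    nv_copy.log_count   = live_state.log_count;
    nv_copy.check       = check_word(&nv_copy);  /* marks the save complete  */
}

/* Called at restart: nonzero if the saved record looks complete. */
int saved_state_valid(void)
{
    return nv_copy.check == check_word(&nv_copy);
}
```

The ordering is the point: shut down the dangerous hardware before touching
memory, and write the check word last so that a half-finished save is
detectable when power returns.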
After processing
the power fail condition, the computer and external I/O are all in a safe state.
The voltage level continues to decline
past 4.75 volts, eventually reaching zero. Unfortunately, the voltage on the supply's
capacitor decays exponentially. It will provide something between
zero and 4.75 volts for a comparatively long time (perhaps seconds).
What do the
CPU chip, memories, and glue logic do with, say, 4 volts applied? No one knows.
No vendor will guarantee any behavior under
the 4.75 volt level. Frequently the program just runs wild, executing practically
random instructions. Your carefully saved data or
meticulously protected I/O could be destroyed by rogue instructions!
No power fail
circuit is complete unless it clamps the reset line whenever power is less
than the magic 4.75 volt level. A suitable
circuit keeps the CPU in a reset state, preventing wild execution from corrupting
the efforts of the NMI power save routine. Motorola
sells a 3-terminal reset controller for less than a dollar which will hold
reset down in low Vcc conditions.
Consider another
case: suppose the power grid's sadly overloaded summertime generating capacity
experiences a brownout. If the line drops
from 110 VAC to, say, 80 volts, what happens to the +5 volt output from your
system's power supply? Most likely it will go out of
regulation, giving perhaps 3 or 4 volts until the 110 input level is reestablished.
Hopefully the power fail circuit will assert an NMI to
the processor chip. Using the conventional resistor/capacitor unclamped reset
circuit, the reset input will decline only to the 3-4 volt
level, not nearly low enough to force a reset when power comes back.
The reset clamping
circuit will not only keep the CPU in a safe state; in this brownout case
it will also ensure that the system restarts
properly when +5 volts is reestablished.
Regardless, NMI
is the only reasonable interrupt choice for power fail detection.
NMI Abuse
Unfortunately,
NMI is widely abused as a general purpose interrupt. Use NMI only for events
that occur infrequently. Never substitute it
for poor design.
It's not too
unusual to see a divider circuit driving NMI, generating hundreds or thousands
of interrupts per second. Usually these
designs start life using a reasonable maskable interrupt. As the programmers
debug the system they find the CPU occasionally misses an
interrupt, so they switch over to NMI. This is a mistake. If the code misses
interrupts, there is a fundamental flaw in its design that
NMI will not cure.
Your code will
miss interrupts only if some bottleneck keeps them disabled for too long.
Always design the code to keep interrupts disabled only while servicing the
hardware. Reenable them as soon as possible. With good reentrant design, interrupts
should never be off for more than a few tens of microseconds.
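One common way to keep the interrupts-off window short is to have the ISR do
nothing but grab the data and queue it, leaving the real work to the main
loop. The sketch below is one illustration of that idea, not a prescription:
device_isr, queue_get and the queue size are hypothetical names and values,
and the single-byte indices assume one interrupt source writing and one main
loop reading.

```c
#include <stdint.h>

#define QUEUE_SIZE 16u   /* power of two, so indices wrap with a mask */

static volatile uint8_t queue[QUEUE_SIZE];
static volatile uint8_t head;   /* written only by the ISR       */
static volatile uint8_t tail;   /* written only by the main loop */

/* Hypothetical device ISR body: service the hardware, queue the byte,
 * and get out. No processing happens here. */
void device_isr(uint8_t data_from_hw)
{
    uint8_t next = (uint8_t)((head + 1u) & (QUEUE_SIZE - 1u));
    if (next != tail) {              /* queue full: drop the byte */
        queue[head] = data_from_hw;
        head = next;
    }
}

/* Called from the main loop with interrupts enabled.
 * Returns 1 and stores a byte in *out, or 0 if the queue is empty. */
int queue_get(uint8_t *out)
{
    if (tail == head) {
        return 0;
    }
    *out = queue[tail];
    tail = (uint8_t)((tail + 1u) & (QUEUE_SIZE - 1u));
    return 1;
}
```

The slow part, whatever the application does with each byte, runs in the main
loop where another interrupt can preempt it at any time.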
On the Edge
Quite a few processors
implement NMI as an edge sensitive interrupt. This guarantees that even a
breathtakingly short pulse will set the
CPU's internal NMI flip flop, so the interrupt simply cannot be missed. It
might, however, cause several kinds of nasty problems.
Suppose the input
comes from the real world, perhaps after having been transmitted a few feet.
Without proper pulse shaping circuitry, the
signal could easily have ragged edges or even multiple, closely spaced transitions.
Maskable interrupts live quite happily with short
bouncing on their lines, since the first transition will make the processor
disable the input and start the ISR. Even the fastest code
will take a few microseconds to service and reenable the interrupt, by which
time the transients will be long gone. NMI cannot be
disabled; every bit of bounce will reinitiate the NMI service routine. The
result: one real interrupt might masquerade as several
independent NMIs, each one pushing onto the stack and re-invoking the ISR.
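The difference is easy to see in a toy model. The function below counts how
many times a handler would start for a sampled input line; it is a deliberate
simplification (edges that arrive while the input is masked are simply
ignored), and count_triggers and mask_samples are names invented for this
sketch.

```c
#include <stddef.h>

/* Count handler invocations for a sampled input line (0 = idle, 1 = asserted).
 * mask_samples is how many samples after each trigger the input stays masked;
 * 0 models an edge sensitive NMI, which can never be masked.
 * SIMPLIFICATION: edges arriving while masked are ignored, not latched. */
unsigned count_triggers(const int *samples, size_t n, size_t mask_samples)
{
    unsigned triggers = 0;
    size_t masked_until = 0;   /* index of the first sample that is unmasked */
    int prev = 0;

    for (size_t i = 0; i < n; i++) {
        int rising = (samples[i] != 0) && (prev == 0);
        if (rising && i >= masked_until) {
            triggers++;
            masked_until = i + 1 + mask_samples;
        }
        prev = samples[i];
    }
    return triggers;
}
```

Feed it a line that bounces twice before settling, say {0,1,0,1,0,1,1,1}.
With mask_samples set to 0 the handler starts three times; with the input
masked for the whole burst it starts once.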
Edge sensitive
inputs respond when the input voltage crosses some threshold. Imperfect digital
circuits give a rather broad window to the
threshold. If the NMI input signal is perfectly clean but moves slowly from
the idle to the asserted state, it stays within the threshold
region for far too long, sometimes causing multiple NMI triggers.
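Hysteresis, as in a Schmitt trigger input, is one common form of pulse shaping
for slow edges, though it is my example rather than a specific recommendation
for any given design. The sketch below models it in software purely to show
the effect: count_rising and the voltages used are hypothetical, and setting
both thresholds equal models a plain single-threshold input.

```c
#include <stddef.h>

/* Count low-to-high transitions seen by an input with hysteresis.
 * The output goes high when v rises above v_high and low again only when
 * v falls below v_low. Pass v_low == v_high for a single threshold. */
unsigned count_rising(const double *v, size_t n, double v_low, double v_high)
{
    unsigned edges = 0;
    int state = 0;   /* current logic level of the shaped signal */

    for (size_t i = 0; i < n; i++) {
        if (!state && v[i] > v_high) {
            state = 1;
            edges++;
        } else if (state && v[i] < v_low) {
            state = 0;
        }
    }
    return edges;
}
```

Take a slow ramp carrying a little ripple: 0.0, 1.0, 1.6, 1.4, 1.7, 1.45,
1.8, 2.5, 3.5, 5.0 volts. Against a single 1.5 volt threshold it produces
three rising edges. With thresholds of 1.0 and 2.0 volts it produces one.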
Finally, the
edge sensitive nature of the NMI signal renders it susceptible to every stray
bit of electrical noise. A clean NMI driven by
a gate on the other side of a circuit board might pick up unexpected noise
from other parts of the circuit.
Edge sensitive
NMI inputs must be noise free and should switch quickly and cleanly.
Remember that
debugging NMI service routines is sometimes tough. How will you single step
in an NMI service routine if, while debugging,
dozens more NMIs keep coming? Most of us debug code by stopping at a breakpoint
and looking at the registers and variables. If, when
debugging the NMI handler, another comes along while we're stopped, after
resuming execution the service routine will re-invoke itself,
probably corrupting a non-reentrant value.
In summary, NMI
is a valuable feature. Don't abuse it; restrict its use to those few situations
where only an NMI will solve a problem.