Interrupt Predictability
Copyright 1995, Jack G. Ganssle
Abstract
Will your ISRs be fast enough? How do you know?
Published in Embedded Systems Programming, May 1995
"There are
strange things done in the midnight sun / By the men who moil for gold; / The Arctic trails have their secret tales / That would make your blood run cold; / The Northern Lights have seen queer sights, / But the queerest they ever did see / Was that night on the marge of Lake Lebarge / When I cremated Sam McGee."
Switch a few words in these lines from Robert Service's ode to the Yukon and you'd have a description of the weird and mysterious ways software developers make their real-time systems run properly. For it seems that when interrupts start coming fast and furious, like a shower of arrows, no one really knows how to ensure that each interrupt gets serviced on time and in time.
How do you know that your code handles every interrupt in a timely manner? A system crash is easy to spot, but some failures occur slowly and insidiously, like a Windows application that erratically leaks resources. At first, a few missed interrupts may only cause your system to lose track of time, or to miscount an encoder by just a few pulses. These cancerous infections may lurk for months or years before manifesting themselves as noticeable bugs.
Components of Disaster
A simple embedded system with a single interrupt will clearly run correctly as long as the ISR never takes longer to execute than the interval between successive interrupts. Correctness is easy to prove: measure the ISR's maximum execution time and compare it to the interrupt's minimum interval.
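The single-interrupt test really is one comparison. A minimal sketch, assuming the ISR's worst-case execution time and the minimum spacing between interrupts have already been measured in microseconds; both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the single-interrupt rule from the text. The ISR
   keeps up only if its worst-case execution time never exceeds the
   shortest interval between interrupts. */
bool isr_keeps_up(uint32_t isr_max_exec_us, uint32_t min_interval_us)
{
    return isr_max_exec_us <= min_interval_us;
}

/* Hypothetical helper: percentage of the minimum interval left over for
   everything else (0 when the ISR uses the whole interval or more). */
uint32_t isr_headroom_pct(uint32_t isr_max_exec_us, uint32_t min_interval_us)
{
    if (min_interval_us == 0 || isr_max_exec_us >= min_interval_us)
        return 0;
    return (uint32_t)(100u * (uint64_t)(min_interval_us - isr_max_exec_us)
                      / min_interval_us);
}
```

For example, an ISR measured at 40 microseconds against interrupts no closer than 50 microseconds passes, leaving 20% of each interval for the rest of the system.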
Unless you have very smart hardware that can queue up backlogged interrupts, the software must service each interrupt before two get backed up. Why two? One interrupt can go pending while another is being serviced. The processor ignores that single backlogged request while the first ISR is in progress, but once the ISR completes, the CPU responds to the still-asserted request.
If yet another interrupt were to occur (say, from that rotating encoder), one of the two backlogged requests would be lost. After all, the interrupt comes in to the CPU on but a single pin, which can only express "pending" or "not pending"; there just aren't enough bits to say "hey, now I've got two pending!"
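A small simulation, not taken from the article, makes the one-bit limit concrete. It assumes one interrupt line with a single pending latch, a non-nested ISR that always takes the same time, and arrival times sorted ascending in microseconds. It counts requests that arrive while one is already latched; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: one interrupt line, one pending latch, a non-nested
   ISR that always takes isr_us. arrivals[] must be sorted ascending.
   Returns the number of requests lost. */
size_t count_lost(const uint32_t *arrivals, size_t n, uint32_t isr_us)
{
    uint32_t busy_until = 0; /* time the ISR now running will finish */
    int pending = 0;         /* the latch can only say "pending" or not */
    size_t lost = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t t = arrivals[i];

        /* If a latched request was waiting and the running ISR finished
           by time t, the CPU serviced it back-to-back at busy_until. */
        if (pending && busy_until <= t) {
            busy_until += isr_us;
            pending = 0;
        }

        if (busy_until <= t)
            busy_until = t + isr_us; /* CPU idle: service at once */
        else if (!pending)
            pending = 1;             /* CPU busy: latch the request */
        else
            lost++;                  /* latch already set: request is gone */
    }
    return lost;
}
```

With a 30 microsecond ISR, requests at 0, 10 and 20 lose one (the third finds the latch already set), while requests at 0, 10 and 40 lose none, because the latched request was serviced at 30 and the latch was free again by 40.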
The obvious moral is to make sure interrupts are never disabled for so long that one can be missed. It's not easy (perhaps sometimes not even possible) to guarantee that the code will satisfy this condition. I contend that any reasonably complex system will probably not have an interrupt structure that is "practically" provably correct. "Practically" is the operative word: I have yet to speak to any embedded designer using any formal method of proving code correctness for any application. If the academics have a solution, we're not using it!
Crummy hardware design will create significant interrupt service problems as well. Most processors have level-sensitive interrupt inputs: any device requesting an interrupt must assert the request until the processor acknowledges it. You can't just pulse the input briefly and expect the CPU to catch it.
Design your hardware to assert the input until the CPU responds with an interrupt acknowledge cycle. Most modern processors require this anyway, since the hardware has to drop a vector on the bus during that cycle. Others, though, include default vectoring that sorely tempts a chip-limited designer to just assume the software will always be in an interrupt-ready state. Your code could be off doing something with interrupts disabled and miss that oh-so-short input signal.
Reentrancy
Well-designed interrupt handlers are largely reentrant. Reentrant functions, AKA "pure code," are often wrongly thought to be any code that does not modify itself. Too many programmers feel that if they simply avoid self-modifying code, their routines are guaranteed to be reentrant, and thus interrupt-safe. Nothing could be further from the truth.
A function is reentrant if, while it is being executed, it can be re-invoked by itself or by any other routine. Reentrancy was originally invented for mainframes, in the days when memory was a valuable commodity. System operators noticed that dozens or even hundreds of identical copies of a few big programs would be in the computer's memory array at any time. At the University of Maryland, my old hacking grounds, the monster Univac 1108 had one of the early reentrant FORTRAN compilers. It burned up a (for those days) breathtaking 32K words of system memory, but being reentrant, it required only 32K words even if 50 users were running it. Each user executed the same code, from the same set of addresses.
A routine must satisfy the following conditions to be reentrant:
1) It never modifies itself. That is, the instructions of the program are never changed. Period. Under any circumstances. Far too many embedded systems still violate this cardinal rule.
2) All variables changed by the routine must be allocated to a particular "instance" of the function's invocation. Thus, if reentrant function FOO is called by three different functions, then FOO's data must be stored in three different areas of RAM. The C language makes this trivial, assuming you are clever enough to use automatic variables in your code. Automatics are stored on the stack; each incarnation of a reentrant routine brings in its own stack frame, and its own set of automatics.
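Condition 2 in C terms. Both functions below are hypothetical illustrations: the first keeps its result in a static buffer, so an ISR calling it while the main line is still using the result overwrites that result; the second writes only to automatics and caller-supplied storage. The shared digits table is safe in both because it is read-only and never changed.

```c
/* NOT reentrant: every caller shares one static result buffer, so a
   second invocation (say, from an ISR) overwrites the first one's data. */
const char *hex_byte_static(unsigned char b)
{
    static char buf[3];
    static const char digits[] = "0123456789ABCDEF";
    buf[0] = digits[b >> 4];
    buf[1] = digits[b & 0x0F];
    buf[2] = '\0';
    return buf;
}

/* Reentrant: the only data it changes belongs to the caller, so each
   invocation gets its own "instance". The table is read-only. */
char *hex_byte(unsigned char b, char out[3])
{
    static const char digits[] = "0123456789ABCDEF";
    out[0] = digits[b >> 4];
    out[1] = digits[b & 0x0F];
    out[2] = '\0';
    return out;
}
```

Calling hex_byte_static twice returns the same pointer both times, so the first caller's "12" silently becomes "34"; hex_byte keeps both results intact.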
This is not the only reentrancy issue, though. Suppose your main-line routine and the ISRs are all coded in C. The compiler will certainly invoke runtime functions to support floating-point math, I/O, string manipulations, etc. If the runtime package is only partially reentrant, then your ISRs may very well corrupt the execution of the main-line code. This problem is common, but is virtually impossible to troubleshoot since symptoms appear only occasionally and erratically. Can you imagine the difficulty of isolating a bug that manifests itself only occasionally, and with totally different characteristics each time?
Be sure your compiler has a pure runtime package.
Now, sometimes we're tempted to cheat and write a nearly-pure routine. If your ISR merely increments a global 32-bit value, say, to maintain time, it would seem legal to produce code that does nothing more than a quick and dirty increment. Beware! Especially when writing code on an 8- or 16-bit processor, remember that the C compiler will surely generate several instructions to do the deed. On a 186, the construct ++j (with j a 32-bit long) might produce:
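mov ax,[j]      ; load low word of j
add ax,1        ; increment low word (sets carry on overflow)
mov [j],ax      ; store low word (MOV leaves the carry flag alone)
mov ax,[j+2]    ; load high word, two bytes above the low word
adc ax,0        ; propagate carry into high word of j
mov [j+2],ax    ; store high word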
An interrupt in the middle of this code will leave j only partially changed; if the ISR is reincarnated with j in transition, its value will surely be corrupt.
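The same hazard bites main-line code that reads j while the timer ISR may be updating it. One well-known remedy, offered here as a sketch rather than as the article's method, is to read the high half, then the low half, and retry if the high half changed in between; the alternative is to disable interrupts around the read. The sketch mimics a 16-bit CPU by holding the counter as two 16-bit halves, assumes the ISR runs with interrupts still disabled (so it never interrupts itself), and uses the hypothetical names timer_isr and read_ticks.

```c
#include <stdint.h>

/* Hypothetical 32-bit tick count stored as two 16-bit halves, the way a
   16-bit CPU has to hold it. */
static volatile uint16_t ticks_lo, ticks_hi;

/* Timer ISR body: bump the low half and carry into the high half.
   Assumed to run with interrupts disabled, so it cannot nest. */
void timer_isr(void)
{
    ticks_lo++;
    if (ticks_lo == 0)
        ticks_hi++;
}

/* Main-line reader. If the ISR runs between the two reads and carries
   into the high half, the re-check catches it and the loop reads again,
   so the caller never sees a half-updated value. */
uint32_t read_ticks(void)
{
    uint16_t hi, lo;
    do {
        hi = ticks_hi;
        lo = ticks_lo;
    } while (hi != ticks_hi);
    return ((uint32_t)hi << 16) | lo;
}
```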
Even the perfectly coded reentrant ISR leads to problems. If such a routine runs so slowly that interrupts keep giving birth to additional copies of it, eventually the stack will fill. Once the stack bangs into your variables, the program is on its way to oblivion. You must ensure that the average interrupt rate is low enough that the routine returns more often than it is invoked.
Measuring Interrupt Response
Though predicting a system's interrupt response is probably impossible, you can use a few tricks to get typical performance numbers.
Typical numbers are the best we can get. There's no assurance that measurements taken over a second, a year, or a century will represent worst-case system performance. Perhaps one day users select an unusual combination of inputs; the temperature is running a bit hot; interrupts are bunched up by a faster-than-usual serial stream, whose data for some reason consists of once-in-a-lifetime numbers that are tough to compute, burning more CPU time. One interrupt runs just a shade too long, causing another to back up till a third gets missed. It's a chaotic situation that we hope never occurs, but our hopes are based on nothing more than a nervous prayer. Thankfully the occupants of the aircraft whose autopilot your system controls don't understand just how poorly we know what we're doing!
Branch analyzers are the rage in larger systems. These devices, akin to emulators, monitor your code's execution to verify that every possible branch in the code is taken. A branch analyzer shows that the code has at least been fully exercised, though correctness is more difficult to monitor. Though a branch analyzer will prove that each ISR has executed at least once, it simply can't guarantee that interrupts will never be missed.
A scope can measure interrupt latency and response very effectively in a single-interrupt system, but when more than one device can interrupt the processor, the scope is generally unsatisfactory. Too much is going on, too fast, in too many dimensions, to monitor on even a fast digital scope. Similarly, logic analyzers do a poor job of finding crummy interrupt response.
Probably the best hardware tool you can use is a decent performance analyzer. Be sure to get one that measures more than the average response to an interrupt; it must log the worst-case, or maximum, time spent in each ISR. Make sure it can monitor all of the ISRs simultaneously. Run your tests for weeks over every possible condition, and then cross your fingers and hope things don't degenerate after the product starts to ship.
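Without a performance analyzer, ISRs can at least log their own worst case. A sketch, assuming a hypothetical read_timer() that returns a free-running 16-bit hardware timer; the timestamps are passed in as arguments so the logic can be exercised off-target. The timing code adds a little to each ISR, and on an 8-bit CPU the main line reading max_ticks faces the same torn-read risk as the ++j example.

```c
#include <stdint.h>

/* Per-ISR timing record: the worst case matters more than the average. */
typedef struct {
    uint16_t max_ticks; /* longest execution seen, in timer ticks */
    uint32_t count;     /* executions logged */
} isr_stats;

/* Log one execution. Unsigned 16-bit subtraction yields the correct
   elapsed time even if the timer wrapped once between start and end
   (valid for executions shorter than 65536 ticks).
   Typical use inside an ISR, with hypothetical names:
       uint16_t t0 = read_timer();
       ... service the device ...
       isr_stats_record(&uart_stats, t0, read_timer());          */
void isr_stats_record(isr_stats *s, uint16_t start, uint16_t end)
{
    uint16_t elapsed = (uint16_t)(end - start);
    if (elapsed > s->max_ticks)
        s->max_ticks = elapsed;
    s->count++;
}
```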
Personally, I think the best way to measure interrupt predictability is to instrument the code to fault when an error occurs. Plan for failure. If your system can at least alert the user that things have gone to hell, you'll avert a crash and will have the option of failing gracefully. In a life-critical application, add a little hardware to indicate "lost interrupt"... but don't tie the output of this circuit to the CPU's normal interrupt pin! Use NMI, as this situation is as catastrophic as a power failure.
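A purely software check in the same spirit, assuming the hardware offers an independent free-running counter that keeps counting whether or not the timer ISR runs: compare how many ticks should have occurred with how many the ISR actually counted. The function name and the one-tick slack are assumptions for illustration; on a mismatch the main loop would raise whatever alarm the system uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lost-interrupt check for a periodic timer.
   isr_ticks:         ticks counted by the timer ISR since start
   hw_counts_elapsed: independent free-running counter, same interval
   counts_per_tick:   counter counts per timer interrupt (nonzero)
   One tick of slack covers an interrupt that is pending right now. */
bool ticks_were_lost(uint32_t isr_ticks, uint32_t hw_counts_elapsed,
                     uint32_t counts_per_tick)
{
    uint32_t expected = hw_counts_elapsed / counts_per_tick;
    return expected > isr_ticks + 1u;
}
```

With 100 counts per tick and 1000 counts elapsed, ten ticks are expected; nine counted ticks is tolerated as a pending interrupt, eight means at least one was lost.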
Beware of reentrant routines. Add a bit of code in the system's main loop to monitor the stack pointer. If the SP bottoms out, you've clearly got a problem that could be related to getting interrupts faster than the system can process them. Any sort of creeping SP is a deadly problem that is easy to detect.
Common Sense Coding
Poorly coded interrupt service routines are the bane of our industry. Most ISRs are hastily thrown together, tuned at debug time to work, tossed in the "oh my god it works" pile, and forgotten. A few simple rules can alleviate many of the common problems.
First, don't even consider writing a line of code for your new embedded system until you lay out an interrupt map. List each interrupt, and give an English description of what its routine should do. Include your estimate of the interrupt's frequency.
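An interrupt map can live right in the code as a table. A sketch with hypothetical field names and made-up numbers; adding a worst-case execution estimate per row, as the next step suggests, gives a quick total load figure.

```c
#include <stddef.h>

/* One row of a hypothetical interrupt map: what each source does, how
   often it can fire, and a guess at how long its ISR will take. */
typedef struct {
    const char *name;
    const char *description;
    double max_rate_hz; /* estimated worst-case interrupt frequency */
    double est_exec_us; /* estimated ISR execution time */
} irq_entry;

/* Illustrative entries only; the numbers are made up. */
static const irq_entry irq_map[] = {
    { "timer",   "system tick, update time of day", 1000.0, 50.0 },
    { "uart_rx", "move received byte to a buffer",   960.0, 20.0 },
    { "encoder", "count one shaft encoder pulse",   5000.0, 10.0 },
};

/* Fraction of CPU time all ISRs need if every source fires at its
   worst-case rate at once. */
double isr_load(const irq_entry *map, size_t n)
{
    double load = 0.0;
    for (size_t i = 0; i < n; i++)
        load += map[i].max_rate_hz * map[i].est_exec_us * 1e-6;
    return load;
}
```

The three sample rows come to about 12% of the CPU. A total near or above 100% means the plan cannot work; a low total is necessary but, as argued above, no proof that an interrupt will never be missed.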
Now approximate the complexity of each ISR. Given the interrupt rate, and some idea of how long it'll take to service each, you can assign priorities (assuming your hardware includes some sort of interrupt controller). Some developers assign the highest priority to things that must get done; remember that in any embedded system every interrupt must be serviced sooner or later. Give the highest priority to things that must be done in staggeringly short times to satisfy the hardware or the system's mission (like accepting data coming in from a 1 Mb/sec source).
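Once each source has a deadline, the rule above amounts to sorting by deadline, an ordering usually called deadline-monotonic. A sketch with hypothetical names: at 1 Mb/sec a byte arrives roughly every 8 microseconds, which is why such a source lands at the top. Real interrupt controllers encode priority in their own ways, so mapping the result onto hardware is left out.

```c
#include <stdlib.h>

/* Hypothetical record: how quickly each source must be serviced. */
typedef struct {
    const char *name;
    double deadline_us; /* time from request until it is too late */
    int priority;       /* filled in below; 0 is highest */
} irq_req;

static int by_deadline(const void *a, const void *b)
{
    const irq_req *x = a, *y = b;
    return (x->deadline_us > y->deadline_us) - (x->deadline_us < y->deadline_us);
}

/* Shortest deadline gets the highest priority. Note that qsort
   reorders the caller's array. */
void assign_priorities(irq_req *irqs, size_t n)
{
    qsort(irqs, n, sizeof irqs[0], by_deadline);
    for (size_t i = 0; i < n; i++)
        irqs[i].priority = (int)i;
}
```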
The cardinal rule of interrupt handling is to keep the handlers short. A long ISR simply reduces the odds you'll be able to handle all time-critical events in a timely fashion. If the interrupt starts something truly complex, have the ISR spawn off a task that can run independently. This is an area where an RTOS is a real asset, as task management requires nothing more than a call from the application code.
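Without an RTOS, a common way to keep a handler short is to have the ISR drop its data into a queue and let the main loop do the real work. A minimal sketch of a single-producer, single-consumer ring buffer on a single CPU, assuming the ISR is the only writer of head and the main loop the only writer of tail; all names are hypothetical. The 8-bit indices are deliberate: on an 8- or 16-bit CPU a single-byte load or store cannot be torn the way the 32-bit increment above can.

```c
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 64u /* must be a power of two, at most 256 */

typedef struct {
    volatile uint8_t data[RB_SIZE];
    volatile uint8_t head; /* next slot to write; changed only by the ISR */
    volatile uint8_t tail; /* next slot to read; changed only by main line */
} ring_buf;

/* ISR side: store one byte and get out. Returns false if the buffer is
   full, which the ISR should treat as an overrun error. */
bool rb_put(ring_buf *rb, uint8_t byte)
{
    uint8_t next = (uint8_t)((rb->head + 1u) & (RB_SIZE - 1u));
    if (next == rb->tail)
        return false;
    rb->data[rb->head] = byte;
    rb->head = next; /* publish only after the data is stored */
    return true;
}

/* Main-line (or task) side: fetch one byte if one is waiting. */
bool rb_get(ring_buf *rb, uint8_t *out)
{
    if (rb->tail == rb->head)
        return false;
    *out = rb->data[rb->tail];
    rb->tail = (uint8_t)((rb->tail + 1u) & (RB_SIZE - 1u));
    return true;
}
```

One slot is always left empty to tell "full" from "empty", so the buffer holds RB_SIZE - 1 bytes.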
Re-enable interrupts as soon as practical in the ISR. Do the hardware-critical and non-reentrant things up front, then execute the interrupt enable instruction. Give other ISRs a fighting chance to do their thing.
Use reentrant code! Write your ISRs in C if at all possible, and use C's wonderful local variable scoping. Globals are an abomination in any programming environment; never more so than in interrupt handlers. Reentrant C code is orders of magnitude easier to write than reentrant assembly code.
Don't use NMI for anything other than catastrophic events. Power-fail, system shutdown, interrupt loss, and the apocalypse are all good things to monitor with NMI. Timer or UART interrupts are not.
When I see an embedded system with the timer tied to NMI, I know, for sure, that the developers found themselves missing interrupts. NMI may alleviate the symptoms, but it only masks deeper problems in the code that most certainly should be cured.
NMI will break a reentrant interrupt handler, since most ISRs are non-reentrant during the first few lines of code where the hardware is serviced. NMI will thwart your stack management efforts as well.
Conclusion
Start your interrupt planning before writing a single line of code. Work out the details, priorities, and maximum execution times. Plan for problems: include code that looks for failures. In a really busy system, try desperately to get time allocated for lots of testing, though we all know that when the system works at all, management will usually yell their mantra: "ship it!"
References:
The Cremation of Sam McGee, by Robert Service, from Collected Poems of Robert Service, 1907, G.P. Putnam's Sons, NY.