DMA
Copyright 1994,
Jack G. Ganssle
Abstract
DMA is an important
part of many embedded systems, yet far too many of us don't really understand
it. Read on...
Published in
Embedded Systems Programming, October 1994
Engineers love
to torture the English language, using words in convoluted ways, verbizing
the most passive of nouns, and inventing strange new words that might embarrass
your mother (the word "dikes", referring to diagonal cutters - wire
cutters - comes to mind).
Acronyms are
our special bane. Every three word noun phrase is immediately shortened to
its initials, even when this may make an acronym of an acronym. The passage
of time dulls one's memory till the original words referred to by the letters
slip away, so the acronym becomes its own word. For example, though "CRT"
really refers only to a large tube, it has come to stand for a complete monitor,
electronics, tube, and all. Even names of corporations reflect our reliance
on verbal shorthand. IBM, GE, SAIC - an alphabet soup of letters dances in
front of our eyes. CACI even reincorporated themselves some years back to
make the acronym the new, real corporate name, consigning the words the letters
stood for to eternal obscurity.
Many embedded
systems make use of DMA controllers. How many of us remember that DMA stands
for Direct Memory Access? What idiot invented
this meaningless phrase? I figure any CPU cycle directly accesses memory.
This is a case of an acronym conveniently sweeping an embarrassing piece of
verbal pomposity under the rug where it belongs.
Regardless, DMA
is nothing more than a way to bypass the CPU to get to system memory and/or
I/O. DMA is usually associated with an I/O device that needs very rapid access
to large chunks of RAM. For example, a data logger may need to save a massive
burst of data when some event occurs.
DMA requires
an extensive amount of special hardware to manage the data transfers and
to arbitrate access to the system bus. This might seem to violate our desire
to use software wherever possible. However, DMA makes sense when the transfer
rates exceed anything possible with software. Even the fastest loop in assembly
language comes burdened with lots of baggage. A short code fragment that reads
from a port, stores to memory, increments pointers, decrements a loop counter,
and then repeats based on the value of the counter takes quite a few clock
cycles per byte copied. A hardware DMA controller can do the same with no
wasted cycles and no CPU intervention.
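To make that per-byte baggage concrete, here is a minimal sketch in C of the kind of software loop described above. The port is simulated by a hypothetical read_port() function so the fragment runs on a host; on a real target it would be a volatile read of a device register. All names here are illustrative assumptions, not taken from any particular part.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an I/O port: returns an incrementing byte.
   On a real target this would be a volatile read of a device register. */
static uint8_t port_next_value = 0;

static uint8_t read_port(void)
{
    return port_next_value++;
}

/* Software copy loop: every byte costs a port read, a memory store,
   a pointer increment, a counter decrement, and a conditional branch. */
void polled_copy(uint8_t *dest, size_t count)
{
    while (count != 0) {
        *dest = read_port();   /* read from port, store to memory */
        dest++;                /* increment pointer */
        count--;               /* decrement loop counter */
    }                          /* branch back while counter is nonzero */
}
```

Each statement in the loop body turns into at least one instruction, and each instruction into several clocks. A DMA controller does the same work with only the bus cycles themselves.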
Admittedly, modern
processors often have blindingly fast looping instructions. The 386's REP MOVS
(repeat string move) moves data much faster than most applications will ever need.
However, the latency between a hardware event occurring and the code being
ready to execute the REP MOVS will surely be many microseconds even in the most
carefully crafted program - far too much time in many applications.
How it Works
Processors provide
one or two levels of DMA support. Since the dawn of the micro age just about
every CPU has had the basic bus exchange
support. This is quite simple, and usually consists of just a pair of pins.
"Bus Request"
(AKA "Hold" on Intel CPUs) is an input that, when asserted by some
external device, causes the CPU to tri-state its pins at the completion of
the next instruction. "Bus Grant" (AKA "Bus Acknowledge"
or "Hold
Acknowledge") signals that the processor is indeed tri-stated. This means
any other device can put addresses, data, and control signals on the bus.
The idea is that a DMA controller can cause the CPU to yield control, at which
point the controller takes over the bus and initiates bus cycles. Obviously,
the DMA controller must be pretty intelligent to properly handle the timing
and to drive external devices through the bus.
Modern high integration
processors often include DMA controllers built right on the processor's silicon.
This is part of the vendors' never-ending quest to move more and more of the
support silicon to the processor itself, greatly reducing the cost and complexity
of building an embedded system. In this case the Bus Request and Bus Grant
pins are connected to the onboard controller inside of the CPU package, though
they usually come out to pins as well, so really complex systems can run multiple
DMA controllers. It's a scary thought....
Every DMA transfer
starts with the software programming the DMA controller, the device (either
on board a high integration CPU chip or a discrete component) that manages these
transactions. The code must typically set up destination and source pointers
to tell the controller where the data is coming from, and where it is going
to. A counter must be programmed to track the number of bytes in the transfer.
Finally, numerous bits setting the DMA mode and type must be set. These may
include source or destination type (I/O or memory), action to take on completion
of the transaction (generate an interrupt, restart the controller, etc.),
wait states for each CPU cycle, etc.
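The setup sequence above might look something like the following in C. This is only a sketch: the register layout, the bit names, and the dma_setup() function are all hypothetical, invented here for illustration. Every real controller has its own register map, so treat the data sheet as the authority.

```c
#include <stdint.h>

/* Hypothetical register layout for one DMA channel. Real controllers
   differ in names, widths, and bit positions; check the data sheet. */
typedef struct {
    uint32_t source;       /* where the data comes from */
    uint32_t destination;  /* where the data goes */
    uint16_t count;        /* number of bytes to transfer */
    uint16_t mode;         /* mode and type bits */
} dma_channel_regs;

/* Illustrative mode bits (assumed, not from any real part). */
#define DMA_SRC_IS_IO      0x0001u  /* source is an I/O port, not memory */
#define DMA_DST_IS_IO      0x0002u  /* destination is an I/O port */
#define DMA_IRQ_ON_DONE    0x0004u  /* interrupt the CPU at completion */
#define DMA_AUTO_RESTART   0x0008u  /* re-arm after the block finishes */
#define DMA_WAIT_SHIFT     4        /* wait-state count lives in bits 4-5 */
#define DMA_WAIT_MASK      0x0030u

/* Program pointers, byte count, and mode bits for one channel. */
void dma_setup(volatile dma_channel_regs *ch,
               uint32_t src, uint32_t dst, uint16_t nbytes,
               uint16_t flags, unsigned wait_states)
{
    ch->source = src;
    ch->destination = dst;
    ch->count = nbytes;
    ch->mode = (uint16_t)(flags |
               ((wait_states << DMA_WAIT_SHIFT) & DMA_WAIT_MASK));
}
```

On a target, ch would point at the controller's memory-mapped registers. On a host, pointing it at an ordinary struct lets you check the bit packing before the hardware exists.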
Now the DMA controller
waits for some action to start the transfer. Perhaps an external I/O device
signals it is ready by toggling a bit. Sometimes the software simply sets
a "start now" flag. Regardless, the controller takes over the bus
and starts making transfers.
Each DMA transfer
looks just like a pair of normal CPU cycles. A memory or I/O read from the
DMA source address is followed by a corresponding write to the destination.
The source and destination devices cannot tell if the CPU is doing the accesses
or if the DMA controller is doing them.
During each DMA
cycle the CPU is dead - it's waiting patiently for access to the bus, but
cannot perform any useful computation during the transfer. The DMA controller
is the bus master. It has control until it chooses to release the bus back
to the CPU.
Depending on
the type of DMA transfer and the characteristics of the controller, a single
pair of cycles may terminate to allow the CPU to
run for a while, or a complete block of data may be moved without bringing
the processor back from the land of the idle.
Once the entire
transfer is complete the DMA controller may quietly go to sleep, or it may
restart the transfer when the I/O is ready for another block, or it may signal
the processor that the action is complete. In my experience this is always
signaled via an interrupt. The controller interrupts the CPU so the firmware
can take the appropriate action.
To summarize,
the processor programs the DMA controller with parameters about the transfers,
the controller tri-states the CPU and moves data over the CPU bus, and then
when the entire block is done the controller signals completion via an interrupt.
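Those three steps can be modeled on a host with a few lines of C. This is a toy model under obvious assumptions: a function pointer stands in for the completion interrupt, a loop stands in for the controller's bus cycles, and none of the names correspond to a real device.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a DMA controller, for illustration only. */
typedef struct {
    const uint8_t *source;
    uint8_t *destination;
    size_t count;                 /* bytes remaining */
    void (*on_complete)(void);    /* stands in for the completion interrupt */
} dma_model;

static volatile int transfer_done = 0;

/* Plays the role of the interrupt service routine. */
static void dma_isr(void)
{
    transfer_done = 1;
}

/* Step 1: the processor programs the controller. */
void dma_program(dma_model *dma, const uint8_t *src, uint8_t *dst, size_t n)
{
    dma->source = src;
    dma->destination = dst;
    dma->count = n;
    dma->on_complete = dma_isr;
    transfer_done = 0;
}

/* Steps 2 and 3: the controller owns the bus, moves the block,
   then signals completion. */
void dma_run(dma_model *dma)
{
    while (dma->count != 0) {
        *dma->destination++ = *dma->source++;  /* one read/write cycle pair */
        dma->count--;
    }
    if (dma->on_complete != NULL) {
        dma->on_complete();
    }
}
```

In real firmware the program would carry on with other work after the setup step and learn about completion only through the interrupt.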
DMA Types
DMA controllers
are wondrous and scary things, each with dozens of registers you must program
just right to get any sort of response. I
think these things are designed by committee, with each member throwing every
possible feature into the chip. Though it's nice to have so much capability,
writing the code can be a trial.
You must first
have a very clear idea of exactly what sort of DMA transfers your system needs.
DMA was invented to move data between I/O and memory, but now people use it
for a variety of other reasons as well.
Traditional Synchronous
DMA moves a byte or word at a time between system memory and a peripheral,
handshaking with the I/O port for each
transfer. This sort of transfer recognizes that the port may not always be
in a ready condition; the handshaking is a hardware mechanism to throttle
the transactions.
With this sort
of transfer, the program sets up the controller and then carries on, oblivious
to the state of the DMA transaction. The hardware moves one byte or word between
memory and I/O each time the I/O port signals it is ready for another transaction.
On each ready indication, the DMA controller asserts Bus Request, waits for
a Bus Acknowledge in response, and then takes over the bus for a single cycle.
Then, the DMA controller goes idle again, waiting for another ready signal
from the port. Thus, the program and DMA cycles share bus cycles, with the
controller winning any contest for control of the bus. Sometimes this is called
"Cycle Stealing".
Burst Mode DMA,
in contrast, generally assumes that the destination and source addresses can
take transfers as fast as the controller can generate them. The program sets
up the controller, and then (perhaps after a single ready indication from
a port occurs), the entire source block is copied to the destination. The
DMA controller gains exclusive access to the bus for the duration of the transfer,
during which time the program is effectively shut down. Burst mode DMA can
transfer data very rapidly indeed.
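The difference between the two modes is easy to see in a small host-side model. The sketch below is a simplification built on assumptions: each call to the hypothetical bus_cycle() represents one bus cycle, one byte moves per DMA cycle, and the bus grant handshake is taken as instantaneous. In cycle-steal mode the controller takes a cycle only when the port is ready; in burst mode a single ready indication hands it every cycle until the block is done.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { DMA_CYCLE_STEAL, DMA_BURST } dma_mode;

typedef struct {
    dma_mode mode;
    const uint8_t *source;
    uint8_t *destination;
    size_t remaining;      /* bytes left in the block */
    int burst_active;      /* set once a burst has been triggered */
    unsigned dma_cycles;   /* bus cycles taken by the controller */
    unsigned cpu_cycles;   /* bus cycles left to the program */
} bus_model;

/* Advance the model by one bus cycle. port_ready is nonzero when the
   I/O port signals ready. The controller wins any contest for the bus. */
void bus_cycle(bus_model *m, int port_ready)
{
    int dma_wants_bus = 0;

    if (m->remaining != 0) {
        if (m->mode == DMA_BURST) {
            if (port_ready) {
                m->burst_active = 1;       /* one trigger starts the block */
            }
            dma_wants_bus = m->burst_active;
        } else {
            dma_wants_bus = port_ready;    /* one transfer per ready signal */
        }
    }

    if (dma_wants_bus) {
        *m->destination++ = *m->source++;
        m->remaining--;
        m->dma_cycles++;
    } else {
        m->cpu_cycles++;                   /* the program keeps the bus */
    }
}
```

Feed both models the same number of cycles and the totals match. What differs is the interleaving: the cycle-steal transfers are scattered through the program's cycles, while the burst takes its cycles back to back and shuts the program out in between.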
Flyby DMA, something
that is not supported on many controllers, is a beast of a different color.
The DMA controller gains access to the bus and puts the source or destination
address out. Then, it initiates what is in effect a read and a write cycle
simultaneously. The data is read from the source address, and written to the
destination, at the same time. This implies that either the source or destination
does not require an address, since it is very unlikely that both would use
the same. An example might be copying data from memory to a FIFO port - the
source address (a pointer to memory) increments on each transfer, while the
destination is always the same FIFO.
Flyby transactions
are very fast since the read/write cycle pair is reduced to a single cycle.
Both burst and synchronous types of transfers can be supported.
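The arithmetic behind that speedup is simple enough to sketch. The two hypothetical helpers below count bus cycles for a block and convert the count to time. The bus cycle length is a parameter because it depends entirely on the part, its clock, and its wait states; nothing here describes a specific controller.

```c
#include <stddef.h>

/* Bus cycles needed to move a block. A conventional transfer is a read
   cycle followed by a write cycle; a flyby transfer does both at once. */
unsigned long block_bus_cycles(size_t transfers, int flyby)
{
    unsigned long cycles_per_transfer = flyby ? 1ul : 2ul;
    return (unsigned long)transfers * cycles_per_transfer;
}

/* Time the CPU is locked off the bus during a burst, in nanoseconds,
   given the length of one bus cycle (an input, not a property of any
   particular chip). */
unsigned long burst_lockout_ns(size_t transfers, int flyby,
                               unsigned long bus_cycle_ns)
{
    return block_bus_cycles(transfers, flyby) * bus_cycle_ns;
}
```

For a 1024 transfer burst with an assumed 400 ns bus cycle, the conventional read/write pair keeps the program off the bus for 819,200 ns, and flyby halves that to 409,600 ns.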
Typical Uses
The original
IBM PC, that 8088 based monstrosity we all once yearned for but now snicker
at, used a DMA controller to generate dynamic RAM
refresh addresses. It simply ran a null transfer every 15 microseconds or so to
generate the addresses needed by the DRAMs.
This was a very
clever design - a normal refresh controller is a mess of logic. The only down
side was that the PC's RAM was non-functional until the power-up code properly
programmed the DMA controller.
Both floppy and
hard disk controllers often use DMA to transfer data to and from the drive.
This is a natural and perfect application. The software arms the controller
and then carries on. The hardware waits for the drive to get to the correct
position and then performs the transfer without further reliance on the system
software.
If I'm working
on a microprocessor with a "free" DMA controller (one built onto
the chip), I'll sometimes use it for large memory transfers. This is especially
useful with processors with segmented address spaces, like the 80188 or Z180
(the Z180's space is not actually segmented, but is limited to 64k without
software intervention to program the MMU).
Both of these
CPUs include on-board DMA controllers that support transfers over the entire
1 MB address space of the part. Judicious programming of the controller lets
you do a simple and easy memory copy of any size to any address - all without
worrying about segment registers or the MMU. This is yet another argument
for encapsulation: write a DMA routine once, debug it thoroughly, and then
reuse it even for mundane tasks.
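Here is a rough sketch of what such an encapsulated routine could look like for an 80188-style part. The segment-to-linear conversion (segment times 16 plus offset, wrapped to 20 bits) is how the 8086 family forms real-mode addresses. Everything else is an assumption for illustration: phys_mem simulates memory so the code runs on a host, and the copy loop in the hypothetical dma_copy() stands in for programming the controller's source, destination, and count registers and starting the channel.

```c
#include <stdint.h>

#define ADDRESS_SPACE (1ul << 20)   /* 1 MB, as on the 80188 */

/* Simulated physical memory so the sketch runs on a host. */
static uint8_t phys_mem[ADDRESS_SPACE];

/* 8086-family real-mode address: segment * 16 + offset, wrapped to 20 bits. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & (ADDRESS_SPACE - 1);
}

/* Hypothetical reusable routine: copy count bytes between two linear
   addresses. The loop stands in for the controller; a target version
   would load the channel registers and start the transfer instead.
   Returns 0 on success, -1 if either block would run off the end of
   the address space. */
int dma_copy(uint32_t dst, uint32_t src, uint32_t count)
{
    if (src >= ADDRESS_SPACE || dst >= ADDRESS_SPACE ||
        count > ADDRESS_SPACE - src || count > ADDRESS_SPACE - dst) {
        return -1;
    }
    for (uint32_t i = 0; i < count; i++) {
        phys_mem[dst + i] = phys_mem[src + i];
    }
    return 0;
}
```

Because the routine works in linear addresses, a block that straddles a 64k boundary (say, 64 bytes starting at 1000:FFF0) needs no special handling by the caller.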
Over the years
I've profiled a lot of embedded code, and in many instances have found that
execution time seems to be really burned up by string copy and move routines
inside the C runtime library. If you have a spare DMA controller channel,
why not recode portions of the library to use a DMA channel to do the moves?
Depending on the processor a tremendous speed improvement can result.
I'll never forget
the one time I should have used DMA, but didn't. As a consultant, rushed to
get a job done, I carelessly threw together a hardware design figuring somehow
I could make things work by tuning the software. For some inexplicable reason
I did not put a DMA controller on the system, and suffered for weeks, tuning
a few instructions to move data off a tape drive without missing bytes. A
little more forethought would have made a big difference.
Troubleshooting DMA
This is a case
where a thorough knowledge of the hardware is essential to making the software
work. DMA is almost impossible to troubleshoot without using a logic analyzer.
No matter what
mode the transfers will ultimately use, and no matter what the source and
destination devices are, I always first write a routine to do a memory to
memory DMA transfer. This is much easier to troubleshoot than DMA to a complex
I/O port. You can use your ICE to see if the transfer happened (by looking
at the destination block), and to see if exactly the right number of bytes
were transferred.
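The same check can be automated. The sketch below is one way to do it, not the only one: it surrounds the destination block with guard bytes, runs the transfer, and then verifies both the data and the guards. The names are hypothetical, and dma_mem_to_mem() is just memcpy here so the code runs on a host; on a target it would be replaced by the real DMA routine. A deliberately broken routine is included to show the check catching an overrun.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 64
#define GUARD_SIZE 8
#define GUARD_BYTE 0xA5u

/* Signature of the routine under test. */
typedef void (*copy_fn)(uint8_t *dst, const uint8_t *src, size_t count);

/* Stand-in for the real DMA routine so the sketch runs on a host. */
void dma_mem_to_mem(uint8_t *dst, const uint8_t *src, size_t count)
{
    memcpy(dst, src, count);
}

/* Deliberately broken routine: moves one byte too many. */
void overrun_copy(uint8_t *dst, const uint8_t *src, size_t count)
{
    memcpy(dst, src, count);
    dst[count] = 0;
}

/* Returns 0 if exactly BLOCK_SIZE bytes arrived intact, 1 if the data is
   wrong or short, 2 if the transfer spilled into a guard region. */
int check_dma_copy(copy_fn copy)
{
    uint8_t src[BLOCK_SIZE];
    uint8_t dst[GUARD_SIZE + BLOCK_SIZE + GUARD_SIZE];
    size_t i;

    for (i = 0; i < BLOCK_SIZE; i++) {
        src[i] = (uint8_t)(i ^ 0x5Au);   /* pattern that never equals the guard */
    }
    memset(dst, GUARD_BYTE, sizeof dst);

    copy(dst + GUARD_SIZE, src, BLOCK_SIZE);

    if (memcmp(dst + GUARD_SIZE, src, BLOCK_SIZE) != 0) {
        return 1;
    }
    for (i = 0; i < GUARD_SIZE; i++) {
        if (dst[i] != GUARD_BYTE ||
            dst[GUARD_SIZE + BLOCK_SIZE + i] != GUARD_BYTE) {
            return 2;
        }
    }
    return 0;
}
```

Calling check_dma_copy(dma_mem_to_mem) returns 0, while check_dma_copy(overrun_copy) returns 2 because the extra byte lands in the trailing guard.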
At some point
you'll have to recode to direct the transfer to your device. Hook up a logic
analyzer to the DMA signals on the chip to be sure that the addresses and
byte count are correct. Check this even if things seem to work - a slight
mistake might trash part of your stack or data space.
Some high integration
CPUs with internal DMA controllers do not produce any sort of cycle that you
can flag as being associated with DMA. This drives me nuts - one lousy extra
pin would greatly ease debugging. The only way to track these transfers is
to trigger the logic analyzer on address ranges associated with the transfer,
but unfortunately these ranges may also have non-DMA activity in them.
Be aware that
DMA will destroy your timing calculations. Bit banging UARTs will not be reliable;
carefully crafted timing loops will run slower than expected. In the old days
we all counted T-states to figure how long a loop ran, but DMA, prefetchers,
cache, and all sorts of modern exoticness make it almost impossible to calculate
real execution time.
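A crude estimate of the damage is still possible. The hypothetical helper below assumes the worst case, where the CPU stalls for every bus cycle the controller takes, and where the stolen share is a steady fraction of the bus. Real systems are burstier than this, so treat the result as a rough bound rather than a measurement.

```c
/* Elapsed bus cycles for a loop that needs loop_cycles of its own,
   when DMA steals stolen_per_1000 out of every 1000 bus cycles.
   Assumes the CPU stalls whenever the controller holds the bus.
   Returns 0 if the stolen share is 1000 or more (the loop never runs).
   Very large loop_cycles values can overflow the intermediate product. */
unsigned long stretched_cycles(unsigned long loop_cycles,
                               unsigned stolen_per_1000)
{
    unsigned long available;

    if (stolen_per_1000 >= 1000u) {
        return 0;
    }
    available = 1000ul - stolen_per_1000;
    /* Round up: a partly used cycle still has to elapse. */
    return (loop_cycles * 1000ul + available - 1) / available;
}
```

With a quarter of the bus stolen, a bit time calibrated at 1000 cycles stretches to 1334, which is far more drift than a bit-banged serial link will normally put up with.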
Scopes
On another subject,
some time ago I wrote about using oscilloscopes to debug software. The subject
is really too big to do justice in a couple of short magazine pieces. However,
I just received a booklet from Tektronix that does justice to the subject.
It's called "Basic Concepts - XYZ of Analog and Digital Oscilloscopes",
and is their publication number 070869001. Highly recommended. Get the book,
borrow a scope, and play around for a while. It's fun and tremendously worthwhile.