Banking Basics
Copyright 1996, Jack G. Ganssle
Abstract
What do you do when you run out of address space? Go to a bigger processor? Maybe, but another option is building a memory manager.
Nelson Rockefeller, when asked how much money is enough, reportedly replied "just a little bit more." We poor folks may have trouble understanding his perspective, but we often exhibit the same response when picking the size of the address space for a new design. Given that code inexorably grows to fill any allocated space, "just a little more" is a plea we hear from the software people all too often.
Is the solution to use 32-bit machines exclusively, cramming a full 4 GB of RAM into our cost-sensitive application in the hope that no one could possibly use that much memory?
Most systems clearly couldn't tolerate the costs of such a poor decision, yet an awful lot of designers take a middle tack, selecting high-end processors to cover their (ahem) posterior parts.
32-bit CPUs have tons of address space. 16-bitters (generally) sport one to 16 MB. It's hard to imagine needing more than 16 MB for a typical embedded app; even 1 MB is enough for the vast majority of designs.
A typical 8-bit processor, though, is limited to 64k. Once this was an ocean of memory we could never imagine filling. Now C compilers let us reasonably produce applications far more complex than we dreamed of even a few years ago. Today the mid-range embedded systems I see usually burn up somewhere between 64k and 256k of program and data space, too much for an 8-bitter to handle without some help.
If horsepower were not an issue I'd simply toss in an 80188 and profit from its cheap 8-bit bus, which runs 16-bit instructions over 1 MB of address space. Sometimes this is simply not an option; an awful lot of us design upgrades to older systems. We're stuck with tens of thousands of lines of "legacy" code (sounds more like the name of a car than a technical term) that are too expensive to change. The code forces us to continue using the same CPU. Like taxes, programs always get bigger, demanding more address space than the processor can handle. Whatcha gonna do?
Perhaps the only solution is to add address bits. Build an external mapper using PLDs or discrete logic, and drive the high-order address lines of your RAM and ROM devices from its outputs. Add code to reprogram the mapper, swapping sections of program or data in and out as required.
Logical to Physical
Add a mapper, though, and you'll suddenly be confronted with two distinct address spaces that complicate software design.
The first is the physical space: the entire universe of memory on your system. Expand your processor's 64k limit to 256k by adding two address lines, and the physical space is 256k.
The second is the logical space. Logical addresses are the ones generated by your program and thence asserted onto the processor's bus. Executing a MOV A,(0FFFF) instruction tells the processor to read from the very last address in its 64k logical address space. External banking hardware can translate this to some other address, but the code itself remains blissfully unaware of such actions. All it knows is that some data comes from memory in response to the 0FFFF placed on the bus. The program can never generate a logical address beyond 64k (for a typical 8-bit CPU with 16 address lines).
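To make the two spaces concrete, here is a minimal C sketch. It assumes the 8-bit CPU with 16 address lines and the two extra mapper bits from the 256k example; the constant and function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model: an 8-bit CPU with 16 address lines, plus a
   mapper that adds EXTRA_BITS more address lines on the memory side. */
#define CPU_ADDR_BITS 16u
#define EXTRA_BITS    2u

/* Size of the logical space the program can name: 64k. */
uint32_t logical_space_size(void)
{
    return UINT32_C(1) << CPU_ADDR_BITS;
}

/* Size of the physical space the memory devices see: 256k. */
uint32_t physical_space_size(void)
{
    return UINT32_C(1) << (CPU_ADDR_BITS + EXTRA_BITS);
}

/* A logical address is only 16 bits wide, so stepping past 0xFFFF wraps
   to 0x0000. Code alone can never name a location above 64k. */
uint16_t next_logical(uint16_t addr)
{
    return (uint16_t)(addr + 1u);
}
```

Nothing in the 16-bit arithmetic can reach the upper 192k of the physical space; only the mapper can.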
This is very much like the situation faced by 80x86 assembly language programmers. 64k segments are essentially logical spaces. You can't get to the rest of physical memory without doing something; in this case, reloading a segment register.
Conversely, if there's no mapper, the physical and logical spaces are identical.
Hardware Issues
Consider doubling your address space by taking advantage of processor cycle types. If the CPU differentiates memory reads from instruction fetches, you may be able to easily produce separate data and code spaces. The 68000's seldom-used function codes are for just this purpose, potentially giving it distinct 16 MB code and data spaces.
Writes should clearly go to the data area (you're not writing self-modifying code, are you?). Reads are more problematic. It's easy to distinguish memory reads from fetches when the processor generates a fetch signal for every instruction byte. Some processors (e.g., the Z80) produce a fetch only on the read of the first byte of a multiple-byte instruction; subsequent bytes look the same as any data read. Forget trying to split the memory space if cycle types are not truly unique.
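The decode rule above can be modeled in a few lines of C. This is a sketch, not a circuit: the cycle-type names are hypothetical, and the CPU that flags only the first instruction byte as a fetch is modeled on the text's description of the Z80, with prefix bytes ignored.

```c
#include <assert.h>

/* Hypothetical bus cycle types, as an external decoder would see them. */
typedef enum { CYCLE_FETCH, CYCLE_READ, CYCLE_WRITE } cycle_t;
typedef enum { SPACE_CODE, SPACE_DATA } space_t;

/* The split rule: fetches go to code memory; reads and writes go to
   data memory. Only valid if every instruction byte is flagged as a fetch. */
space_t decode_space(cycle_t cycle)
{
    return (cycle == CYCLE_FETCH) ? SPACE_CODE : SPACE_DATA;
}

/* Cycle type for byte i of an instruction. A CPU that flags only the
   first byte as a fetch makes operand bytes look like ordinary reads. */
cycle_t cycle_for_byte(int i, int flags_every_byte)
{
    return (i == 0 || flags_every_byte) ? CYCLE_FETCH : CYCLE_READ;
}

/* How many bytes of an n-byte instruction the decoder steers to code
   memory. Anything less than n means operands come from the wrong space. */
int bytes_from_code_space(int n, int flags_every_byte)
{
    int count = 0;

    assert(n > 0);
    for (int i = 0; i < n; i++)
        if (decode_space(cycle_for_byte(i, flags_every_byte)) == SPACE_CODE)
            count++;
    return count;
}
```

bytes_from_code_space(3, 0) returns 1: two of the three bytes of such an instruction would be read from the data space, which is exactly why the split fails.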
When such a space-splitting scheme is impossible, build an external mapper that translates address lines. However, avoid the temptation to simply latch upper address lines. Though it's easy to store A16, A17 et al. in an output port, every time the latch changes the entire program gets mapped out, including the very code that changed the latch and the stack it is using. There are awkward ways to write code to deal with this, but a bit more hardware eases the software team's job.
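The trouble with a plain latch shows up immediately in a model. In this hypothetical sketch (the port variable and function names are illustrative), an output port supplies A16 and A17 for every logical address:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical naive mapper: an output port holds A16 and A17, and the
   memory devices see that value glued onto every logical address. */
static uint8_t upper_latch;  /* 0..3; simulates the output port */

void naive_set_latch(uint8_t value)
{
    assert(value < 4u);
    upper_latch = value;
}

/* Physical address for a logical one. No logical address is exempt:
   code, stack and variables all move when the latch changes. */
uint32_t naive_translate(uint16_t logical)
{
    return ((uint32_t)upper_latch << 16) | logical;
}
```

After naive_set_latch(1), whatever code was executing at logical 0100 now comes from physical 10100, and a stack at FF00 has moved to 1FF00; nothing stayed put.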
Design a circuit that maps just portions of the logical space in and out. Look at software requirements first to see what hardware configuration makes sense.
Every program needs access to a data area which holds the stack and miscellaneous variables. The stack, for sure, must always be visible to the processor so that calls and returns work. Some amount of "common" program storage should always be mapped in. At the very least, the remapping code should be stored here so that it doesn't disappear during a bank switch. Design the hardware so these regions are always available.
Is the address space limitation due to an excess of code or of data? Perhaps the code is tiny, but a gigantic array requires tons of RAM. Clearly, you'll then be mapping RAM in and out, leaving one area of ROM, big enough to hold the entire program, always in view. An obese program yields just the opposite design. In either case a logical address space split into three sections makes the most sense: common code (always visible, containing the mapping code and the runtime routines called by compiler-generated code), mapped code or data, and common RAM (the stack and other critical variables needed all the time).
For example, perhaps 0000 to 3FFF (hex) is common code. 4000 to 7FFF might be banked code; depending on the setting of a port it could map to almost any physical address. 8000 to FFFF is then common RAM.
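The example map can be written down as a small C classifier, handy in a bank manager or a debugger script. The boundaries are the ones from the example; the names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Example logical map (hex):
     0000-3FFF  common code
     4000-7FFF  banked code
     8000-FFFF  common RAM */
#define BANKED_START     0x4000u
#define COMMON_RAM_START 0x8000u

typedef enum { REGION_COMMON_CODE, REGION_BANKED, REGION_COMMON_RAM } region_t;

region_t region_of(uint16_t logical)
{
    if (logical >= COMMON_RAM_START)
        return REGION_COMMON_RAM;
    if (logical >= BANKED_START)
        return REGION_BANKED;
    return REGION_COMMON_CODE;
}

/* Only the banked window depends on the port setting; anything the
   processor must always see (stack, ISRs, the mapping code) belongs
   in one of the other two regions. */
bool always_visible(uint16_t logical)
{
    return region_of(logical) != REGION_BANKED;
}
```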
Sure, you can use heroic programming to simplify the hardware. I think it's a mistake, as the incremental parts cost is minuscule compared to the increased bug rate implicit in any complicated bit of code. It is possible, and reasonable, to remove one bank by copying the common code to RAM and executing it there, using one bank for both common code and data.
It's easy to implement a three-bank design. Suppose addresses are arranged as in the previous example. A0 to A14 go to the RAM, which is selected when A15 = 1.
Turn ROM on when A15 is low. Run A0 to A13 into the ROM. Assuming we're mapping a 128k x 8 ROM into the 32k logical space, generate fake A14, A15 and A16 lines (simple bits latched into an output port) that go to the ROM's A14, A15 and A16 inputs. However, feed these through AND gates. Enable the gates only when A15=0 (RAM off) and A14=1 (bank area enabled).
RAM is, of course, selected with logical addresses between 8000 and FFFF. Any address under 4000 disables the gates and enables the first 4000 (16k) locations in ROM. When A14 is a one, whatever values you've stuck into the fake A14 through A16 select a chunk of ROM 4000 bytes (16k) long, so all eight 16k pages of the ROM are reachable (page 0 simply shows the common code a second time).
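A decode like this can be checked in software before committing it to a PLD. The C model below is a sketch under stated assumptions: a 32k RAM on CPU A0 to A14, selected when A15 is high; a 128k ROM on CPU A0 to A13 plus three latched bank bits, gated onto ROM A14 to A16 only inside the 4000 to 7FFF window. The struct and function names are hypothetical. rom_fully_reachable brute-forces the "no wasted ROM" property.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of decoding one logical address. */
typedef struct {
    bool     rom_select;  /* ROM chip select (A15 == 0) */
    bool     ram_select;  /* RAM chip select (A15 == 1) */
    uint32_t rom_addr;    /* 17-bit address presented to the 128k ROM */
    uint16_t ram_addr;    /* 15-bit address presented to the 32k RAM */
} decode_t;

/* bank_port: the three bits latched in the mapping output port. */
decode_t decode(uint16_t logical, uint8_t bank_port)
{
    decode_t d = {0};
    bool a15 = (logical >> 15) & 1u;
    bool a14 = (logical >> 14) & 1u;

    assert(bank_port < 8u);
    d.ram_select = a15;
    d.rom_select = !a15;

    /* AND gates: the latched bits reach ROM A14..A16 only when
       A15 == 0 and A14 == 1 (the banked window 4000-7FFF). */
    uint32_t page = (!a15 && a14) ? bank_port : 0u;

    d.rom_addr = (page << 14) | (logical & 0x3FFFu); /* CPU A0..A13 direct */
    d.ram_addr = logical & 0x7FFFu;                  /* CPU A0..A14 direct */
    return d;
}

/* Every one of the 128k ROM locations must be reachable from some
   (logical address, bank) pair for the "no waste" claim to hold. */
bool rom_fully_reachable(void)
{
    static bool hit[0x20000];

    for (uint32_t bank = 0; bank < 8u; bank++)
        for (uint32_t a = 0; a < 0x8000u; a++)
            hit[decode((uint16_t)a, (uint8_t)bank).rom_addr] = true;
    for (uint32_t i = 0; i < 0x20000u; i++)
        if (!hit[i])
            return false;
    return true;
}
```

For example, with the port set to 5, logical 4010 reaches ROM location 14010, while logical 1234 reaches ROM location 1234 whatever the port holds.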
The virtue of this design is its great simplicity and its conservation of ROM: there are no wasted chunks of memory, a common problem with other mapping schemes.
Occasionally a designer directly generates chip selects (instead of extra address lines) from the mapping output port. I think this is a mistake. It complicates the ROM select logic. Worse, it can be awfully hard to make your debugging tools understand the translation from addresses to symbols. By translating addresses instead, you can give your debugger a logical-to-physical translation cheat sheet.
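Here is what such a cheat sheet might boil down to, as a hedged C sketch. It assumes 16k banks, bank n stored at physical n x 16k, and a window at 4000 to 7FFF; a real debugger would take these numbers from your own hardware documentation. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 16k banks, bank n stored at physical n * 16k,
   viewed through the logical window 4000-7FFF. */
#define BANK_SIZE   0x4000u
#define WINDOW_BASE 0x4000u

typedef struct {
    uint8_t  bank;     /* value written to the mapping port */
    uint16_t logical;  /* address the CPU puts on the bus */
} banked_addr_t;

/* Logical-to-physical: locate a symbol in the ROM image given the
   bank register and the PC. */
uint32_t to_physical(banked_addr_t a)
{
    assert(a.logical >= WINDOW_BASE && a.logical < WINDOW_BASE + BANK_SIZE);
    return (uint32_t)a.bank * BANK_SIZE + (a.logical - WINDOW_BASE);
}

/* Physical-to-logical: the reverse lookup, e.g. to set a breakpoint on
   a routine whose physical address comes from the linker map. */
banked_addr_t to_logical(uint32_t physical)
{
    banked_addr_t a;

    a.bank = (uint8_t)(physical / BANK_SIZE);
    a.logical = (uint16_t)(WINDOW_BASE + physical % BANK_SIZE);
    return a;
}
```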
The Software
In assembly language you control everything, so handling banked memory is not too difficult. The hardest part of designing remappable code is figuring out how to segment the banks. Casual calling of other routines is out, as you dare not call something that is not mapped in.
Some folks write a bank manager that tracks which routines are currently located in the logical space. All calls then go through the bank manager, which dynamically brings routines in and out as needed.
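The heart of such a bank manager is a far-call routine that maps the target in, calls it, and puts the caller's bank back. This is a minimal C sketch: the port is simulated with a variable, and far_call, report_bank and bank_port are hypothetical names, not any vendor's API.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the mapping output port. On real hardware this would
   be a write to an I/O address; here it is just a variable. */
static volatile uint8_t bank_port;

typedef int (*banked_fn)(int);

/* Hypothetical far call. It (and the stack) must live in common,
   never-mapped memory, or it would vanish when it switches banks. */
int far_call(uint8_t bank, banked_fn fn, int arg)
{
    uint8_t saved = bank_port;  /* remember the caller's bank */
    int result;

    bank_port = bank;           /* map the target routine in */
    result = fn(arg);           /* call it through the window */
    bank_port = saved;          /* restore before returning */
    return result;
}

/* Example "banked" routine: reports which bank was mapped while it ran. */
int report_bank(int arg)
{
    return arg * 100 + bank_port;
}
```

far_call(3, report_bank, 7) returns 703 and leaves the port as it found it. Because the caller's bank is saved and restored, a banked routine can itself far-call a routine in another bank.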
If you were foresighted enough to design your system around a real-time operating system (RTOS), managing the mapper is much simpler. Assign one task per bank. Modify the context switcher to remap whenever a new task is spawned or reawakened.
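The change to the context switcher is small. Below is a sketch, assuming a task control block you can extend with a bank field and a mapping port simulated by a variable; switch_to and tcb_t are hypothetical, and the RTOS's own register restore is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static volatile uint8_t bank_port;  /* stands in for the mapping port */

/* Hypothetical task control block, extended with the task's bank. */
typedef struct {
    const char *name;
    uint8_t     bank;  /* bank holding this task's code */
    /* saved registers, stack pointer, etc. would follow */
} tcb_t;

/* The extra step a context switcher needs: map the incoming task's
   bank in before resuming it. The RTOS's own register restore and
   jump into the task are not shown. */
void switch_to(const tcb_t *next)
{
    assert(next != NULL);
    bank_port = next->bank;
}
```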
Many tasks are quite small, much smaller than the logical banked area. Use memory more efficiently by giving each task two banking parameters: the bank number associated with the task, and a starting offset into the bank. If the context switcher both remaps and then starts the task at the given offset, you'll be able to pack multiple tasks per bank.
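Adding the offset parameter changes only a line or two. In this sketch (hypothetical names, a 16k window at 4000 assumed), map_task remaps and returns the logical address the context switcher would jump to.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_BASE 0x4000u  /* start of the banked window (assumed) */
#define WINDOW_SIZE 0x4000u  /* 16k window (assumed) */

static volatile uint8_t bank_port;  /* stands in for the mapping port */

/* Hypothetical per-task banking parameters. */
typedef struct {
    uint8_t  bank;    /* which bank holds the task's code */
    uint16_t offset;  /* where in that bank the task starts */
} task_bank_t;

/* Remap, then compute the logical entry point. Several small tasks
   can share one bank at different offsets. */
uint16_t map_task(const task_bank_t *t)
{
    assert(t->offset < WINDOW_SIZE);
    bank_port = t->bank;
    return (uint16_t)(WINDOW_BASE + t->offset);
}
```

Two tasks at offsets 0000 and 1800 in bank 4 start at logical 4000 and 5800 respectively.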
Some C compilers come with built-in banking support; check with your vendor. Some will completely manage a multiple-bank system, automatically remapping as needed to bring code in and out of the logical address space. Figure on making a few patches to the supplied remapping code to accommodate your unique hardware design.
In C or assembly, using an RTOS or not, be sure to put all of your interrupt service routines and associated vectors in a common area. Put the banking code there as well, along with all frequently used functions (when using a compiler, put the entire runtime package in unmapped memory).
As always, when designing the hardware, carefully document the approach you've selected. Include this information in the banking routine so some poor soul several years in the future has a fighting chance to figure out what you've done.