Huge
Z180 Arrays
Copyright 1990,
Jack G. Ganssle
Abstract
The Z180's banking
scheme is great for handling code; data is a bit more complex. Here's example
code.
Published in
Embedded Systems Programming, August 1990
Zilog's Z180 (aka
64180) is sometimes used in applications where huge quantities of data are collected
or analyzed. Their CMOS low power
design makes them perfect for battery powered data acquisition equipment. Or,
as part of an instrument, their architecture is ideal for
storing large amounts of data during analysis.
As I mentioned
in a previous article, these processors include a powerful Memory Management
Unit (MMU) designed to extend the address
space to a full megabyte. With some understanding of the intricacies of the
MMU, it's not too hard to upgrade an old Z80 application to
make use of the extra memory. Of course, the HD64180 still uses a 64k logical
address (for compatibility with the Z80), so a Z80 program
doesn't suddenly get a nice 1 Mb linear addressing range. The software must
be somewhat restructured if huge programs or data areas are
needed.
(As a refresher,
"logical" addresses are those issued by the program; on the HD64180
these can never exceed 64k, the limit imposed by
using 16 bit addresses in load and jump instructions. "Physical"
addresses are those appearing on the CPU's pins; the internal Memory
Management Unit converts Logical to Physical, using a translation algorithm
controlled by the programmer.)
Suppose you're
designing a remote data collection device that gets one 16 bit A/D reading
every second. This doesn't sound like much data,
but after only a day the system will have acquired 86,400 words (172,800 bytes)
- far more than the logical address space of 64k.
Managing a 64180's
memory in this sort of application demands careful analysis of both the logical
and physical address space needs. There
is no way all 176k will fit within the 64k logical space, so the program will
have to remap the MMU, possibly often, to get to the array.
Accessing Arrays
This problem
requires "sequential" access to the data, since the current data
point is appended immediately after the last. It is a
special case of the more general array accessing situation, where a program
might need any arbitrary data point, in no particular order.
"Random" array access is analogous to that employed by disk drives,
while sequential is typical of tapes.
Byte-oriented
data (like strings) is easy to work with. Element I is the "Ith"
address in the string; in other words, the address of
DATA(8) is the start address of the array plus 8 (assuming DATA(0) is a valid
element). This is called a vector - it has only one
dimension, that given by the subscript.
Of course, in
real applications we often work with data that is 16 or more bits long. Integers
are often 2 bytes; floating point values
typically are 4 or even 8 bytes. Array element I of a vector is found from:
base + I * element
size,
where "base"
is the start address of the array.
Again, this assumes
element 0 is valid. Element 0 is at the first address of the array, followed
sequentially by each other element.
Multidimensional
arrays use an extension of this formula. In most systems these arrays are
stored in "row major" order: given
A(row,column), all column elements of row 0 are stored first, then the column
elements of row 1, etc. We can get to any element A(I,J)
using the formula:
address=base
+ I * (Jmax * Esize) + J * Esize
Jmax is the number
of columns in the array and Esize is the number of bytes per element of the
array.
As you can imagine,
this can be a computationally expensive way to get to an array. Multiplications
are slow. The HD64180's multiply
instruction only handles 8 bit operations, and so by itself it is not adequate
for indexing into an array. Generally you can count on the
quantity Jmax * Esize to be a constant; for speed critical applications it
may be wise to set this to a convenient value (a power of two),
even if some memory is wasted. Similarly, aim for an element size that is
1, 2, 4, or 8 so shifts can be substituted for multiplies.
Any number of
dimensions can be accommodated by extending the formula. Higher dimension
arrays require even more math to access, so try to
limit them to one or two.
In real time
applications it's often nice to support two forms of data access. A perfectly
general form is useful for offline data
reduction; your application program can request any array element in any sequence.
During data acquisition you might need a shortcut to
avoid the computational overhead of computing an array index. If data is gathered
in some sequential form, it's easy to visualize the data
as a one dimensional vector (instead of a multidimensional array), and store
each value sequentially. We'll explore both methods. The
Memory Manager
Sure, it's easy
to write code to compute an array index along the lines presented. We'll run
into trouble when the array gets too big.
Suppose your code uses most of a 27256 PROM, leaving only 32k of logical address
space for data. If all of this were dedicated to a 2
dimensional array consisting of 4 byte-long elements, then only 8192 elements
fit in memory - ARRAY(100,100) exceeds address 64k.
Here the MMU
comes to the rescue, but not without a lot of help from you the programmer.
Obviously, we can simply reprogram the MMU's
control registers every so often and bank switch portions of RAM into the
processor's address space. The secret to success is careful
planning.
If you've never
really blown a software project by immediately jumping into coding, then you
are probably sick of hearing about software
methodologies. In fact, as processors and programs get more complicated careful
design is far more important than it once was. Oh for the
days of 4k programs! We could crank out a few thousand lines of code in a
couple of weeks and be done. Now, when 300k+ programs are then
norm, a carelessly designed system will be a disaster. Guaranteed.
This is certainly
true when using any sort of memory manager. One penalty of the MMU is a segmentation
of your logical address space that
must be designed in up front, and that very likely can NEVER be changed, without
completely rethinking the entire structure of the
program.
Using huge arrays
forces you into a three bank memory model on the HD64180 (note that on other
chips other options exist - e.g., the
Z280's fantastically complex and powerful MMU will let you have up to 16 banks
in the logical address space). One bank points to the ROMed
program; in most cases this logical to physical mapping will never change.
Another bank accesses an area of RAM for program variables and
transients. In all but extreme cases this never be remapped, because the stack
will reside here. Finally, one bank points to the huge
array(s).

Figure 1: Organization
of Logical Address Space
Figure 1 shows
one possible configuration of the memory management unit. This assumes 32k
of ROM from logical 0 to 7FFF, 28k of RAM from
8000 to EFFF, and 4k of huge array RAM at the end of memory.
Wait a minute
- 4k of RAM for huge arrays? This doesn't seem like much!
This 4k section
of the system's address space is essentially a "window" into the
huge array. We have to provide a section of logical space
to allow access to the data; the 4k window is this section. This is much like
a disk buffer used in operating systems - data is written to
the buffer and then flushed to disk when full. It's size is a result of the
mapping resolution of the MMU - 4k is the minimum amount we
can allocate. You can always make this larger, which will cut down on MMU
remapping, but you'll sacrifice either variable or program area.
The window is
for storage of huge arrays. Never, never store the stack or other variables
here, since remapping will invalidate the data.
The idea behind
using this window is to compute the physical (20 bit) address of the proper
array element, and then to position the window
to bring the data into view. Earlier we saw how to compute an index into any
array. Now, this windowing step is also introduced. The
algorithm is not complex, but a wise programmer will insulate his code from
the machinations of array element lookup by hiding the
details in subroutines.
The routine "PUT1D"
stores a byte to a huge one dimensional array using the map shown in figure
1. The code to get a byte from the array
is nearly identical and so is not shown. PUT1D accepts an array index in HL
that can range all the way up to 65k. Thus, the one 4k window
in logical address space gives the routine access to a 65k hunk of data. Note
that it supports random accesses - it stores the data in B
into the array at the index in HL. HL can be 3 on one invocation and 40000
on the next.
PUT1D locates
the array in physical memory at address: 0F000 + BASE_CBR*4096. Thus, if BASE_CBR
is 1, then the array runs from 10000 to
1FFFF. This implies that the memory manager's CBAR register must be set to
Fx (where "x" defines the logical start of the base area) so
that logical addresses from F000 to FFFF (our window) are translated into
common area 1.
For the sake
of simplicity the code stores a one byte value. A word version would require
the "index" to be multiplied by 2 before it is
used in the computations. Use a shift to do the multiply, but be wary of overflows.
Multiple dimension
arrays can be programmed just as easily, but you must compute the address
using the formula previously discussed.
Multiplies are slow and should be avoided; try to pick values of jmax that
are powers of two, and then use a series of shifts. Fast access
will require some thought to optimize the computations.
One easy speed
trick is shown in "NEXT1D". Especially when gathering data, we often
just put one value in the next location - random array
accesses are not needed. This means we can skip all of the multiplications
needed to compute the address; we just increment the last
address. NEXT1D illustrates this approach with the one dimensional array we've
already looked at. PUT1D saves the current CBR and logical
addresses in SAVE_CBR and SAVE_LOG for NEXT1D. After one call to PUT1D to
set things up, all further sequential accesses can use the
faster NEXT1D. With the single indexes shown, the speed difference is not
important; with a 2 or 3 dimensional array a tremendous time
saving will result.
What if your
program uses a number of big arrays? You can't assign more windows since the
HD64180 limits mapping to three sections.
Probably the easiest solution is to write a "PUT" and "GET"
routine for each array, using a different BASE_CBR for each. Avoid using
sequential accesses unless you can be really sure that accesses will be restricted
to one array at a time.
