Writing Relocatable Code
Copyright 1992, Jack G. Ganssle
Abstract
Some embedded
code must run at more than one address.
Published in
Embedded Systems Programming, February 1992
Employees of
large multinationals think of packing their house and home when hearing the
word "relocation"; however, embedded
programmers use relocation as a way to reduce hardware costs.
Relocatable code
is software whose execution address can be changed. A relocatable program
might run at address 0 in one instance, and at
10000 in another.
Just to confuse
the issue, partially built programs are composed of object modules unfortunately
called "relocatables". Linkers
combine multiple relocatables into one final program. The word "relocatable"
is applicable, since each is assembled at a pseudo-address
of 0. The linker corrects all address references to the proper execution
values. Once linked, the code is frequently no longer
relocatable, since it can typically run only at a single address.
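The linker's fix-up pass can be pictured with a toy model. The sketch below assumes 16-bit little-endian address fields and a made-up fix-up table listing where those fields live; the function name and the table layout are hypothetical, and real object formats are considerably richer.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of a linker fix-up pass (hypothetical format, not a real
 * object file layout). The module was assembled at pseudo-address 0, so
 * each absolute 16-bit address field holds an offset from the start of
 * the module. `fixups` lists the byte offsets of those fields. */
void relocate_module(uint8_t *image, const size_t *fixups, size_t n_fixups,
                     uint16_t load_address)
{
    for (size_t i = 0; i < n_fixups; i++) {
        uint8_t *field = image + fixups[i];
        /* Read the little-endian address field. */
        uint16_t addr = (uint16_t)(field[0] | (field[1] << 8));
        /* Rebase it on the module's real execution address. */
        addr = (uint16_t)(addr + load_address);
        field[0] = (uint8_t)(addr & 0xFF);
        field[1] = (uint8_t)(addr >> 8);
    }
}
```

Once the fields have been patched, the image holds absolute addresses and is tied to the chosen load address, which is why linked code is usually no longer relocatable.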
Obviously, this
sort of relocation is an important consideration for linking multi-module
programs. Without it we'd be doomed to giving
absolute start addresses to each of the modules before assembly, a mind-numbing
prospect since changing the length of any one module may
necessitate changing the origin of all of them.
Once a program is linked,
it may not be so obvious why you'd care whether it could be relocated.
Consider the case of an application running on a
multiuser system: if every user's program had to run at some absolute address,
who would assign these addresses? What if two users picked the
same one? It is possible to swap the programs in and out of memory, sharing
the same addresses, but this makes rather poor use of the huge
address spaces supported by modern CPUs. Certainly a system with 16 MB of
memory should be able to run a dozen or more moderate-sized
applications without the disk-intensive overhead of swapping!
Most mainframes
and other large systems avoid this by either requiring all programs to be
reentrant (and thus inherently relocatable), or
by assigning an absolute address range to each user. A hardware base register
defines the start address of this memory within the physical
address space.
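As a rough illustration of the base register idea, here is a minimal sketch in C. The structure and function names are invented for the example, and the limit check is an added assumption; the scheme described above needs only the base.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-user region. Only the base register comes from the
 * scheme described in the text; the limit field is an assumption added
 * so the sketch can reject out-of-range accesses. */
typedef struct {
    uint32_t base;   /* physical start of this user's memory */
    uint32_t limit;  /* size of the region in bytes */
} user_region;

/* Translate a program-relative address to a physical one.
 * Returns false if the address falls outside the user's region. */
bool to_physical(const user_region *region, uint32_t program_address,
                 uint32_t *physical)
{
    if (program_address >= region->limit)
        return false;
    *physical = region->base + program_address;
    return true;
}
```

Two users can both run code built for address 0; each sees the same program addresses while the hardware places them in different physical memory.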
It is useful,
though, to write intrinsically relocatable code even on small embedded systems;
code that, even after linking, can be
dynamically moved around in the computer's address space and executed at more
than one address. In embedded systems the relocation may be
handled in hardware, all but invisible to the program.
Start-up Problems
One of the more
common cases for simple relocation is dealing with the power-up sequence of
a CPU. The 68000 fetches its starting address
and initial stack pointer from address 0. This forces the system designer
to put ROM at the beginning of memory. However, like the 80x88,
the 68000 keeps all exception vectors in low memory. Quite a few systems dynamically
change interrupt pointers as the code executes, demanding
that these low memory vectors be in RAM. We're faced with a conflict: the
reset information at 0 must be in ROM, but the interrupt
vectors, also near 0, must be in RAM. Code relocation is one solution.
Back in the pre-DOS
dark ages of CP/M, Z80 systems solved the same sort of problem by mapping
in a "phantom" ROM at address 0000
during boot. Once the system was up and running, and the operating system
copied or loaded to RAM, the phantom ROM simply disappeared from
the processor's address space. Though the boot phase was ugly, once complete
the CPU had a clean full-address-space of RAM.
In minimal cost
systems a single ROM chip (or ROM pair, in the case of a 16 bit CPU using
bytewide memories) likely contains the entire
program. Though the phantom ROM approach is feasible (copy the ROM to RAM,
start executing code just above the interrupt vectors, and shut
down the ROM), it does mean you'll spend more on memories. That is, you'll
need to add as much RAM as there is ROM just to solve the
relocation problem. It makes more sense to write smarter code and design cleverer
hardware.
If the system's
recurring (i.e., manufacturing) costs are a concern, then by all means try
to make every part of the hardware do double
duty. Software is expensive, but you only pay for it once. Hardware costs
money each time an "instance" of the system is built.
Quite a few processors
have program counter (PC) relative jump and call instructions. Instead of
transferring control to a fixed address
embedded in the instruction, the destination address of PC relative instructions
is encoded as a positive or negative offset to be added
to the current program counter. This makes particularly good sense in a ROM-based
system, as pretty much all of the code lies contiguously
in one or more non-volatile memories; after each compilation the distance
between subroutines is fixed.
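The difference between the two kinds of transfer comes down to a little arithmetic, sketched here in C. The function names are illustrative only, and the exact program counter value a real processor uses (the address of the branch itself, or of the instruction after it) varies by CPU.

```c
#include <stdint.h>

/* Absolute transfer: the destination is baked into the instruction, so
 * it is the same no matter where the code is running. */
uint32_t absolute_target(uint32_t encoded_address)
{
    return encoded_address;
}

/* PC-relative transfer: the instruction holds a signed displacement
 * that is added to the current program counter. */
uint32_t pc_relative_target(uint32_t pc, int32_t displacement)
{
    return pc + (uint32_t)displacement;
}
```

Suppose a branch at offset 100 (hex) in the ROM reaches a subroutine at offset 200, a displacement of +100. With the ROM at 0 the target is 200; with the ROM at 800000 the same displacement yields 800200. An absolute jump to 200 would land in the wrong place after the move.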
Solving the power-up
relocation problem with PC relative instructions is straightforward: start
the code from ROM fixed at address 00000.
Enter an initialization routine that is located near 00000. Copy a short routine
to RAM:
    set an I/O bit
    jump to reloc_address
Then, jump to
this RAM routine. The I/O bit should move the start of the code to some other
address; one that is far from the interrupt
vectors in low memory. "reloc_address" is the start of the
code at the new, moved, address of the ROM. If all transfers are made
with PC-relative instructions, then the code will not notice or care that
it is operating at some other address.
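The sequence can be modeled on a host machine to see why the short routine has to run from memory that stays put when the I/O bit flips. Everything in this sketch is an assumption made for illustration: the memory map, the addresses, the two-byte distance from the I/O write to the jump, and the function names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory map, for illustration only. */
#define ROM_SIZE      0x10000u  /* 64k of ROM */
#define ROM_BOOT_BASE 0x00000u  /* where the ROM sits at reset */
#define ROM_RUN_BASE  0x80000u  /* where the ROM sits once the bit is set */
#define STUB_RAM_BASE 0x20000u  /* RAM assumed visible in both maps */
#define STUB_SIZE     0x00010u

typedef struct {
    bool rom_moved;             /* state of the relocation I/O bit */
} memory_map;

/* True if an instruction fetch from `addr` reaches real memory. */
bool fetch_ok(const memory_map *map, uint32_t addr)
{
    uint32_t rom_base = map->rom_moved ? ROM_RUN_BASE : ROM_BOOT_BASE;
    bool in_rom  = addr >= rom_base && addr < rom_base + ROM_SIZE;
    bool in_stub = addr >= STUB_RAM_BASE && addr < STUB_RAM_BASE + STUB_SIZE;
    return in_rom || in_stub;
}

/* Model "set an I/O bit; jump to reloc_address" executing at `pc`.
 * Returns the new program counter, or 0xFFFFFFFF if the jump could not
 * be fetched because the code vanished from under the CPU. */
uint32_t set_bit_and_jump(memory_map *map, uint32_t pc, uint32_t reloc_address)
{
    map->rom_moved = true;          /* set the I/O bit: the ROM moves */
    uint32_t jump_pc = pc + 2;      /* assumed address of the jump */
    if (!fetch_ok(map, jump_pc))
        return 0xFFFFFFFFu;         /* jump instruction is gone */
    return reloc_address;
}
```

Calling `set_bit_and_jump` with `pc` inside the boot-time ROM fails in this model, while calling it with `pc` at `STUB_RAM_BASE` arrives safely at the relocated code.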
A variant of
this involves tricking the assembler into assembling part of the code at the final,
moved destination address. Then PC relative
instructions are no longer needed as the assembler and linker will be able
to resolve all transfers directly. On some assemblers the
pseudo operators PHASE and DEPHASE let you assemble code for use at other
addresses. This is a better solution when using a high level
language where you have no control over what instructions are generated (i.e.,
you cannot ensure the compiler will emit PC-relative
transfers).
Both of these
solutions do require that some RAM exists for the transfer code. Think about
it: if the ROM is to suddenly move, you'd better
not be executing in that ROM during the jump! After the I/O instruction moves
the memory, the jump instruction will no longer be
accessible.
I have seen folks
use a CPU's prefetcher to pick up the jump before the I/O completes, but this
seems a little like playing Russian
roulette. Sooner or later a new mask of the silicon will change the prefetching
algorithm.
Sometimes the
hardware includes a little state machine that defers changing the ROM's base
address until just after the jump completes. While
elegant, this requires more hardware. Prefetchers (most modern CPUs have them
now) make it almost impossible to get the timing right.
Often you won't
have RAM available until the ROM has been moved. It's a lot easier to put
one big hunk of RAM at an address than several
little pieces scattered around. Before the ROM move, the low (at least) RAM
cannot be accessed as it is at the same address as the ROM.
What other solutions
are there? The embedded world is unlike that of big computers. Few of us use
the entire address space of the
processor, so why not minimally decode the address bus?
To take a fer
instance: the 68000 has 24 address lines, giving 16 MB of memory space. This
is a lot more than almost any low cost embedded
system needs. If your program eats up, say, 256k for code and 256k for data,
many of the address lines can be ignored.
In this case,
one solution is to use a pair of 128k by 8 ROMs and a similar pair of RAM
chips. Generate the chip select signals from a
single address wire. You might tie the inversion of A23 to the ROM chip select,
and the inversion of the ROM select to the RAM. (Static memory devices almost
universally use inverted, active-low, select signals.) Thus, the RAM is on
if the ROM is off. This is a trifle simplistic, since all I/O
on the 68000 is memory mapped, but you get the idea.
Now, if A23 is
high (true for any address of 1xxx xxxx xxxx xxxx xxxx xxxx, where "x"=don't
care) the ROM will be on. For all
other addresses the RAM is on. Thus, we've effectively split the address space
of the CPU into ROM and RAM halves, with the ROM eating up
the entire top half of a 16 MB address space and the RAM the lower.
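In software terms the single-wire decoder amounts to testing one bit. The sketch below is a host-side model only; the type and function names are made up, and like the text it ignores memory-mapped I/O.

```c
#include <stdint.h>

typedef enum { SELECT_RAM, SELECT_ROM } chip_select;

#define A23 (1u << 23)   /* top address line of a 24-bit bus */

/* Single-wire decode: A23 high selects the ROM, A23 low the RAM. */
chip_select decode(uint32_t address)
{
    return (address & A23) ? SELECT_ROM : SELECT_RAM;
}
```

Every address from 000000 through 7FFFFF selects RAM, and every address from 800000 through FFFFFF selects ROM.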
Add a latch to
force the A23 seen by the memories "high" immediately after reset.
Add an I/O instruction to disable this condition. After
reset the ROM will be at 0 and the RAM just doesn't exist (depending on hardware
implementation). The startup code should be a jump to
address 800000+the offset of the main code (the ROM's normal space), followed
by the I/O instruction needed to disable the funny reset
circuit.
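The reset circuit's behavior can be sketched as a small model. The names here are hypothetical, and the latch is reduced to a single flag that is set by reset and cleared by the I/O instruction.

```c
#include <stdbool.h>
#include <stdint.h>

#define A23_MASK (1u << 23)

typedef struct {
    bool force_rom;   /* set by reset, cleared by the I/O instruction */
} boot_latch;

void latch_reset(boot_latch *latch)   { latch->force_rom = true; }
void latch_disable(boot_latch *latch) { latch->force_rom = false; }

/* True if the ROM is selected for a given CPU address. While the latch
 * is set, the memories see A23 high regardless of what the CPU drives. */
bool rom_selected(const boot_latch *latch, uint32_t cpu_address)
{
    bool a23 = (cpu_address & A23_MASK) != 0;
    return latch->force_rom || a23;
}
```

Right after reset the ROM answers at address 0, so the CPU can fetch its reset information. An address such as 800100 selects the ROM both before and after the latch is disabled, which is why the code can safely jump there first and turn off the latch afterwards.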
The 8 MB jump
stays in ROM, since the hardware continues to decode a ROM chip select. Using
a single address line as a chip select
simplifies decoding logic, saving a buck or three in the unit's production,
and makes relocation easy. It's true that the ROM appears
multiple times in the processor's address space (it starts at 800000, 840000,
880000, and so on at every multiple of its size). The code couldn't
care less, as it is entirely contained in 256k.
Relocating with MMUs
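The repeated images of the ROM fall out of the minimal decoding: address lines between the top of the ROM and A23 are simply ignored. A short sketch, treating the ROM pair as one byte-addressed 256k block (a simplification) and using an invented function name:

```c
#include <stdint.h>

#define ROM_BYTES 0x40000u   /* 256k: a pair of 128k by 8 devices */

/* Only the low 18 address bits reach the ROM, so any two CPU addresses
 * that differ only in the ignored bits hit the same ROM location. */
uint32_t rom_offset(uint32_t cpu_address)
{
    return cpu_address & (ROM_BYTES - 1u);
}
```

Addresses 800123, 840123 and 880123 all map to offset 123 within the ROM.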
Recently I've
seen a lot of people doing this on the 64180 and Z180 processors. Both include
memory management units that translate the
core CPU's 64k logical address space into a 1 MB physical space. On these
processors most people relocating their code go one step
further: they trick the code into thinking it really executes at location
0, even when an upper order chip select is all that enables ROM.
This is pretty
easy to do on any processor with a memory management unit: just open a window
in the MMU to point to the ROM with a logical
address (that issued by the code's instructions) in low memory. Program the
MMU to add offsets to make the real, physical, address up
high. This is particularly useful when old technology assemblers and compilers
are used, which can handle only a 64k address space. The
logical addresses are always in 64k, even though the MMU-translated ones are
not, so the tools are happy.
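The translation itself is just an addition. The following is a deliberately simplified single-window model, not the register layout of any particular part (a real Z180-class MMU divides the logical space into several areas); the structure, its field, and the 4k page granularity are assumptions made for the sketch.

```c
#include <stdint.h>

/* Simplified single-window MMU model (illustrative only). */
typedef struct {
    uint8_t base_4k;   /* physical base of the window, in 4k pages */
} simple_mmu;

/* Map a 16-bit logical address to a 20-bit (1 MB) physical address. */
uint32_t translate(const simple_mmu *mmu, uint16_t logical)
{
    return (((uint32_t)mmu->base_4k << 12) + logical) & 0xFFFFFu;
}
```

With `base_4k` set to 80 hex, code assembled to run at logical address 0 is fetched from physical address 80000, yet every address the tools ever see stays within 64k.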
There is one
important "gotcha": debugging might be tough. If the code
is compiled at 0 but the MMU translates it to, say,
80000, then your logic analyzer, emulator, or other debugging engine will
not be able to work symbolically. These hardware debuggers see
the address bus (post-MMU translation), and try to match those addresses
against the pre-translation addresses produced by the linker. The best
solution is to get a modern linker that fixes symbol addresses automatically.
For some reason, though, a lot of programmers refuse to
desert their old tools.
Viruses, TSRs and the 80x88
Power-up is not
the only domain of relocatable code. In the DOS world too many of us have
been stricken with computer viruses, which are
nearly always relocatable. TSR programs are as well, as it is difficult to
say just where a TSR (or virus) will be loaded.
The 80x88's segment
registers let you build a more interesting relocation scheme, especially for
small programs. In effect, every address
reference is made relative to the base held in one of the processor's
segment registers. Just change the register, and the entire
address of the program changes as well. If the program is less than one segment
long (64k), it can be entirely relocatable with or without
using PC-relative instructions.
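The real-mode address arithmetic is easy to show in C: the segment register is shifted left four bits and added to the 16-bit offset, wrapping at 1 MB on the original parts. The function name is invented for the example.

```c
#include <stdint.h>

/* Real-mode 80x88 address formation: segment * 16 + offset, truncated
 * to the 20 address lines of the original parts. */
uint32_t physical_address(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}
```

A routine at offset 0100 runs at physical address 10100 when the segment register holds 1000, and at 20100 when it holds 2000. The offsets inside the program never change, so no instruction needs patching.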
However, Intel
did remove most of the power-up difficulties from the 80x88 family by making
it boot from the end of the address space,
rather than from the beginning. It seems most ROMed programs execute from
high memory to save the trouble of adding relocating circuitry.