386 Protected
Mode
Copyright 1991,
Jack G. Ganssle
Abstract
Part 2 of a two
part series on protected mode.
Published in
Embedded Systems Programming, August 1991
Last month I
introduced the architecture of the 386, and described how it uses "segment"
registers to access a 4 Gb address
space. Though many believe that segmentation isn't used in protected mode,
in fact it is every bit as crucial as with an 8088. Every
address reference is made via segmentation whether in real or protected mode.
However, protected mode segments can be any size, from a
single byte all the way up to 4 Gb (a full 32 bit offset).
To summarize
last month's description of 386 addressing, every protected mode memory reference
uses a selector, an offset, and a
descriptor to form a linear address. CS, DS, ES, SS, FS, and GS (segment registers
in real mode) are called "selectors", and are
pointers to data structures that define characteristics of a segment. These
8 byte data structures are known as "descriptors",
and are grouped into tables. The Global Descriptor Table (GDT) is available
to every task in a 386 program, and contains up to 8192
descriptors. Local Descriptor Tables (LDTs) can be private to individual tasks,
and also contain up to 8192 descriptors.
Every descriptor
contains the starting address of the segment (a 32 bit absolute number), the
segment limit (its size, used for checking out-of-bounds
errors), and miscellaneous access rights bits.
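A minimal sketch of pulling those fields out of a descriptor might look like the following. The byte positions follow the segment descriptor layout in Intel's 386 documentation; the names segment_info and decode_descriptor are my own, invented for illustration, and the sketch ignores the type bits entirely.

```c
#include <stdint.h>

/* Fields recovered from one 8-byte 386 segment descriptor.
   (Hypothetical type name, for illustration only.) */
typedef struct {
    uint32_t base;     /* 32-bit segment start address */
    uint32_t limit;    /* highest valid offset, in bytes */
    unsigned dpl;      /* descriptor privilege level, 0..3 */
    unsigned present;  /* 1 if the segment is in memory */
} segment_info;

/* Unpack the scattered base and limit fields from raw descriptor bytes. */
segment_info decode_descriptor(const uint8_t d[8])
{
    segment_info s;
    uint32_t raw_limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8)
                       | ((uint32_t)(d[6] & 0x0F) << 16);

    s.base = (uint32_t)d[2] | ((uint32_t)d[3] << 8)
           | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);

    /* G bit (byte 6, bit 7): the limit counts 4 KB units instead of bytes. */
    s.limit = (d[6] & 0x80) ? ((raw_limit << 12) | 0xFFF) : raw_limit;

    s.dpl     = (d[5] >> 5) & 0x3;   /* two DPL bits in the access byte */
    s.present = (d[5] >> 7) & 0x1;   /* P bit in the access byte */
    return s;
}
```

Feeding it the bytes FF FF 00 00 00 9A CF 00 (a common "flat" code segment) should give a base of 0 and a limit of FFFFFFFF hex, which is the whole 4 Gb space.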
Just like in
real mode, a "segment register" is associated with every
type of memory access. In protected mode, these selectors
contain a 13 bit index into the GDT or LDT. The instruction:
MOV AX,[data]
uses selector
DS (by default) to index into the GDT or LDT, where the processor finds the
base address of the segment containing item
"data". The CPU adds this base address to the offset (i.e.,
the address of "data" as stored in the instruction bytes)
to create a linear address to send to memory.
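To make the arithmetic concrete, here is a rough C model of that lookup. It is a sketch only: the descriptor type is stripped down to a base and a limit, the table is an ordinary array standing in for the GDT, and the function names are hypothetical. The selector bit positions (index in bits 15 to 3, table indicator in bit 2) are as Intel documents them.

```c
#include <stdint.h>

/* Simplified descriptor: only the fields needed for address formation. */
typedef struct {
    uint32_t base;   /* segment start address */
    uint32_t limit;  /* highest valid offset within the segment */
} descriptor;

/* Bits 15..3 of a selector hold the 13-bit table index. */
unsigned selector_index(uint16_t selector) { return selector >> 3; }

/* Bit 2 picks the table: 0 = GDT, 1 = LDT. */
unsigned selector_uses_ldt(uint16_t selector) { return (selector >> 2) & 1; }

/* Returns 0 and stores base + offset in *linear, or -1 where the
   real CPU would raise an exception for an out-of-bounds offset. */
int linear_address(const descriptor *table, uint16_t selector,
                   uint32_t offset, uint32_t *linear)
{
    const descriptor *d = &table[selector_index(selector)];

    if (offset > d->limit)
        return -1;
    *linear = d->base + offset;
    return 0;
}
```

With a table entry 1 whose base is 100000 hex, selector 0008 and offset 1234 hex come out as linear address 101234 hex.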
Thus, the descriptor
tables define the bases and sizes of every segment in the program, and define
the areas of memory that are
addressable. It's easy to set up the descriptor tables using special 386-aware
linkers available from a number of vendors.
Protection Systems
So far I've glossed
over the details of the format of selectors and descriptors. In fact, each
contains information used to keep ill-
behaved programs in check. The whole issue of capturing address violation
errors is perhaps a bit new to the embedded world, but with the
proliferation of ever more complex systems will certainly become important
in the next few years. As one who has suffered through watching
programs crash and write over themselves, I find it breathtaking to watch
buggy 386 code recover from practically any insult I toss at it;
the protection mechanisms ensure that the code never gets overwritten, and
that the operating system, if any, remains intact and
functional.
The 386 supports
4 privilege levels, numbered 0 to 3. The highest, most privileged level is
0 - a program running at this level can gain
access to any 386 resource. Programs running with lower privilege levels are
restricted in their ability to use memory, I/O, and some
instructions.
Privilege levels
are intimately tied to descriptors. As I mentioned, the descriptor contains
the base address of a segment, the segment
size, and access rights bits. Two of these bits specify the Descriptor's Privilege
Level (DPL). Privileges are thus associated with
segments, a somewhat novel concept when you consider that most CPUs simply
have a global privilege setting that affects all of memory
equally.
Before describing
how a segment's DPL affects memory access rights, it makes sense to answer
the obvious question: what defines the
processor's privilege level? Cleverly enough, this is handled entirely within
the context of segment privileges. The CPU runs at the
privilege level defined within the DPL of the current code segment - the Current
Privilege Level (CPL). Privileges are somewhat removed
from the code, then. A transfer to a segment with a DPL of 0 (say, the operating
system), will always run with the greatest access rights.
Vector off to a code segment with DPL=3 and you'll be very limited in your
ability to run amok.
Every time any
section of code accesses another segment, the 386 hardware compares the CPL
to the referenced segment's DPL (i.e., it
compares the privilege level the CPU is running at to the privilege defined
for the segment). If the CPL is at least as privileged as the DPL (the same or a smaller
number), then it can proceed with the access. An attempt to access
a segment more privileged than the computer's CPL results
in an exception, letting us know something is wrong.
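The comparison itself is tiny. A sketch of the rule as described here, with a made-up function name, is below. Note that it models only the CPL-versus-DPL test; the real processor also folds the selector's requested privilege level (RPL) into the check, which I have left out.

```c
#include <stdbool.h>

/* Privilege levels run from 0 (most privileged) to 3 (least privileged).
   Simplified model of the rule in the text: access is allowed when the
   CPL is numerically less than or equal to the target segment's DPL. */
bool access_allowed(unsigned cpl, unsigned dpl)
{
    return cpl <= dpl;
}
```

So code at CPL 0 can reach a DPL 3 segment, while code at CPL 3 asking for a DPL 0 segment is refused, which on the real part means an exception.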
Thus, code running
in a segment with a DPL of 0 pumps the CPU up to a CPL of 0, and gives the
CPU access to every other segment.
Novice 8086 assembly
programmers always moan about the complexity of segments and segment groups.
Sometimes the ASSUMEs, GROUPs, and other
pseudo-ops seem to be an awful lot of trouble. When you switch to the 386
suddenly these constructs make perfect sense: group like
segments together, simultaneously grouping privilege levels. Perhaps the operating
system will be grouped into one segment with a DPL of 0
so it can access any resource. Maybe device drivers can fit into a less important
group, giving them just as much power as needed but no
more, preventing them from trashing code. Finally, run the application program
at a very low privilege (i.e., high number, like 3), so it
cannot affect system data structures or I/O.
We're now talking
about two independent levels of protection. The first is defined by segment
sizes: no task can access outside of
whatever segment it is attempting to use, since an address that exceeds the
segment-size field in the descriptor will generate an
exception. Obviously, array subscripting errors just cannot cause major crashes
if the segments are defined cleverly. The second level of
protection is DPL checking, which prevents accesses to higher privileged segments.
In addition,
the processor provides hardware protection of certain dangerous instructions.
Obviously, the HLT instruction is one to be
limited only to very highly privileged tasks. In addition, those instructions
that load the 386's internal control registers (including
the debug registers), and those that load the descriptor table base pointers
should be restricted to only some tasks. These and a few
other instructions will cause an exception if they are executed by any code
running with a CPL greater than 0.
I/O instructions
are protected as well. An I/O protection level is defined in the processor's
EFLAGS register. Instructions to enable and
disable interrupts will cause an exception if executed from a section of code
less privileged than the I/O protection level. Any I/O
instruction will create a similar error only if a particular port is set to
"protected" in the I/O Permission bitmap, an array
of 64k bits that indicates the protection status for each and every port.
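As a rough illustration, the bitmap lookup can be modeled in a few lines of C. The type and function names are assumptions of mine. A set bit means "protected", and, following Intel's documentation, the bitmap is consulted only when the running code is less privileged than the I/O protection level. Real hardware keeps the bitmap in the task state segment and checks every byte of a multi-byte port access; this sketch checks a single port.

```c
#include <stdbool.h>
#include <stdint.h>

#define IO_PORTS 65536u

/* One bit per port; a set bit marks the port as protected. */
typedef struct {
    uint8_t bits[IO_PORTS / 8];
} io_bitmap;

/* Mark one port as protected. */
void protect_port(io_bitmap *map, uint16_t port)
{
    map->bits[port >> 3] |= (uint8_t)(1u << (port & 7));
}

/* True if code running at privilege cpl may touch the port without a fault. */
bool io_allowed(const io_bitmap *map, unsigned cpl, unsigned iopl, uint16_t port)
{
    if (cpl <= iopl)
        return true;   /* privileged enough: the bitmap is not consulted */
    return (map->bits[port >> 3] & (1u << (port & 7))) == 0;
}
```

An application at CPL 3 with the protection level set to 0 can then be locked out of, say, port 3F8 hex while still reaching its neighbors.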
Call Gates
Given that a
low privileged task cannot access code or data with a higher privilege (lower
number), then how can any task invoke the
operating system? The operating system, probably running at CPL 0, can access
outwards; a mechanism is needed to permit application
programs access to OS resources.
The 386 uses
"call gates" to access higher privileged routines. A call
gate is a special type of descriptor, stored in the GDT
or LDT, that contains a pointer to an entry point. To invoke a higher privilege
routine the linker will replace your CALL instructions
with a CALL that works indirectly through this new form of descriptor.
Where a normal
descriptor contains just the segment's base address, length, and access rights
bits, a call gate (which is also 8 bytes
long) has only the destination routine's selector, offset, and DPL. The call
gate is an indirect pointer to the destination segment's
descriptor.
Though this is
a bit tricky, essentially all a call gate does is remove the selector and
offset from the call instruction (where these
things would normally go), and place them inside of the descriptor table.
That is, the call gate contains the complete destination address
selection parameters. The CALL instruction itself has a selector (that selects
the call gate, just as any selector picks a descriptor),
and an ignored offset (since the offset to the routine is in the call gate).
If you use a
call gate to access routine invoke_os, the linker will replace your CALL with
a CALL to the gate - it will load the selector
with the gate's index in the descriptor table and probably store garbage in
the offset part of the instruction. At runtime, the 386 sees
the call, uses the selector to read the gate's 8 bytes, saves the offset part
from the descriptor, and uses the descriptor's selector to
load in the destination address's code segment descriptor. This yields a base
address (and length and access rights), which is added to
the offset from the call gate, generating the linear address of the routine.
The 386 uses
the DPL in the call gate to ensure the invoker is allowed to use the gate:
the caller must be at least as privileged as the
gate. It then switches to the privilege level indicated in the descriptor
pointed to by the gate. Thus, a low level application routine
calls for operating system service with a call gate. The transfer through
the gate will raise the privilege level to that of the OS.
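Putting the last few paragraphs together, a transfer through a gate can be sketched in C roughly as follows. This is a simplified model under my own assumptions: the structures hold only the fields mentioned in the text, the names are hypothetical, a plain array stands in for the descriptor table, and stack switching and parameter copying are ignored.

```c
#include <stdint.h>

/* Destination code segment, reduced to what the transfer needs. */
typedef struct {
    uint32_t base;    /* code segment start address */
    uint32_t limit;   /* highest valid offset */
    unsigned dpl;     /* privilege the CPU takes on after the transfer */
} code_descriptor;

/* The gate holds the complete destination address plus its own DPL. */
typedef struct {
    uint16_t selector; /* selects the destination code segment */
    uint32_t offset;   /* entry point within that segment */
    unsigned dpl;      /* least privilege allowed to use the gate */
} call_gate;

/* Follow a gate to its entry point. Returns 0 on success, -1 where the
   CPU would fault. The offset encoded in the CALL instruction never
   appears here, since the hardware ignores it. */
int call_through_gate(const call_gate *gate, const code_descriptor *table,
                      unsigned *cpl, uint32_t *entry)
{
    const code_descriptor *target;

    if (*cpl > gate->dpl)
        return -1;                       /* caller not privileged enough */
    target = &table[gate->selector >> 3];
    if (gate->offset > target->limit)
        return -1;                       /* entry point outside the segment */
    *entry = target->base + gate->offset;
    *cpl = target->dpl;                  /* e.g. application at 3 becomes OS at 0 */
    return 0;
}
```

An application at CPL 3 calling through a gate with a DPL of 3 into a DPL 0 code segment ends up running at CPL 0; the same call through a gate with a DPL of 0 is refused.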
Call gates add
yet another level of complexity to a program's structure, but most of the
details can be left to the linker. One of the
nice advantages of the gate is that every call to it uses the same selector.
If the gate is defined at some sacred location that never
changes from version to version, then the gate is sort of like a jump table.
I've always been a big fan of using jump tables in embedded
systems, so you can figure out where routines are, even in the field with
limited tools, even after 50 versions of the ROM.
Call gates are
designed mostly for use when privilege level transitions are needed. Since
they are stored in a descriptor table, you are
limited in the number of gates the system will support. Remember that the
GDT and each LDT are limited to 8k entries apiece, which is far from
infinity. Generally, gates are used to funnel requests for operating system
service through a single OS dispatcher. Other Goodies
The 386 is just
chock full of features for managing complex operating systems and code. This
list is far too extensive to cover here in
any detail. However, I'll briefly mention several other features that can
help in developing any kind of system, embedded or otherwise.
The processor
does support virtual memory. One of the attribute bits in every segment descriptor
indicates if the segment is present. A
reference to a not-present segment creates an exception, allowing system software
to load the required segment from disk. Frankly, I'm not
sure what this would be useful for in an embedded system, but it does seem
like a neat feature. I'd welcome ideas...
The processor's
memory management has yet another level beyond the segmentation I've described.
Optionally, you can divide the 4 Gb
address space into smaller chunks and then remap the physical address of each
chunk through page tables. You define the page tables to
translate practically any address into any other. Thus, two tasks could be
compiled at identical addresses, yet run at different physical
addresses by using different paging. Again, is this useful for an embedded
system? Does someone out there have some devilishly clever
technique you'd care to share with us?
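For the curious, the remapping can be sketched in C. The 386's pages are 4 Kb each, and a linear address splits into a 10 bit directory index, a 10 bit table index, and a 12 bit offset within the page. The model below is an assumption-laden simplification: it walks a single page table rather than the full two-level directory, and the function names are mine.

```c
#include <stdint.h>

#define PAGE_PRESENT 0x1u

/* Split a 32-bit linear address into its three paging fields. */
unsigned dir_index(uint32_t linear)   { return linear >> 22; }
unsigned table_index(uint32_t linear) { return (linear >> 12) & 0x3FF; }
unsigned page_offset(uint32_t linear) { return linear & 0xFFF; }

/* Simplified single page table: each entry is a 4 KB-aligned frame
   address with flag bits in the low 12 bits. Returns 0 on success,
   -1 where the CPU would raise a page fault. */
int translate(const uint32_t page_table[1024], uint32_t linear,
              uint32_t *physical)
{
    uint32_t entry = page_table[table_index(linear)];

    if (!(entry & PAGE_PRESENT))
        return -1;
    *physical = (entry & 0xFFFFF000u) | page_offset(linear);
    return 0;
}
```

Give two tasks their own tables, each mapping entry 4 to a different frame, and the same linear address 4123 hex lands at two different physical addresses.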
The 386 does
include a number of debug registers that let you set hardware breakpoints
on up to 4 addresses simultaneously. These
breakpoints work rather like those produced by an emulator: they are non-intrusive,
and work in ROM or RAM. You can set them on code or
data accesses. If you'd care to write a monitor to embed in the product (always
a good idea for long term product maintenance), then by
all means use these resources.
Conclusion
Why use protected
mode in embedded applications? The biggest attraction is the large, 32 bit
address space that becomes immediately
available. Of course, most any other 32 bit CPU will give easier access to
lots of memory.
Certainly the
DOS based tools that so many non-embedded people use are a compelling incentive
to stick with the 80x86 architecture. How
many millions use all of the great DOS Cs and assemblers? You can use any
of these on the 386, and as they become more 32 bit aware
they'll take even greater advantage of the 386's features. Quick development
cycles demand proven tools, and it's awfully hard to argue
against those from the DOS world. You can even do a lot of the development
on a DOS machine, and port to the harder embedded world after
removing most of the bugs.
A lot of embedded
folks are now putting DOS into ROM - a subject I know will see a lot of discussion
at the upcoming Embedded Systems
Conference. With the 386 you can run DOS as a task in its own segment, and
run other applications concurrently.
Finally, protected
mode really does protect your code. With the right segmentation, you'll never,
and I mean never, see a rogue program
overwrite the code. This could be important in medical and other life-critical
applications.
For those wishing
to explore the mysteries of this processor in more detail, be sure to get
the complete set of Intel reference manuals.
Intel's "Microprocessors"
manual (mine is dated 1990) contains a pretty complete hardware and software
description of the part,
but is definitely not for the faint hearted. It is complete but succinct.
Their "386
DX Microprocessor Programmer's Reference Manual" is far more readable,
but neglects all hardware issues. It gives a
pretty readable account of the operation of all of the processor's major modes.
This is a must read for serious 386 users.
Intel's "80386
System Software Writer's Guide", though thin, does include lots of
sample code, including routines to enter and
exit protected mode. It is a good adjunct to the Programmer's Reference Manual.
Finally, the
"80386 Microprocessor Hardware Reference Manual" helps explain
how to design hardware that will really work with
the 386. This is not a trivial problem, as the CPU can get out of sync with
its bus cycles - you have to build a sort of state machine to
determine what it is doing when. Even adding wait states is a bit challenging.