386 Protected Mode (part 1)
Copyright 1991, Jack G. Ganssle
Abstract
Here's how protected mode changes everything you know about
the x86.
Published in Embedded Systems Programming, August 1991
In the few years since Intel released the 386 processor, it
has gone from a tremendously overpriced compute engine to the minimum processor
for anyone considering purchasing a PC. Proliferating versions (like the 386SX
and AMD's variants) drive the chip cost down while
maintaining software compatibility with the rest of the line.
It seems those of us in the embedded world could ignore this
technology, since so many designs revolve around low performance controllers.
Now, however, more and more embedded systems use the 386 series of components.
Examples include high speed data communications devices
(though in cheap modems the Z80 still reigns supreme), graphics equipment,
and ultra-high-speed data acquisition gear. Even the cockpit
displays of some modern jetliners use 386s as controllers.
Why? What's so great about the 386 that compels a designer
to include a $325 processor in his embedded system? The 386 offers two
important features: raw compute horsepower, and the potential for a huge address
space.
I recently had the opportunity to design a rather complex
embedded system using a 386, and found the experience to be both frustrating
and rewarding. Frustrating, because Intel's documentation assumes the reader is
completely knowledgeable about protected mode. Rewarding,
because the processor's power and complexity are awesome. I ended the project
with a great deal of respect for those who mastered this
complexity to design the chip way back in the mid-80s.
386 Benefits
Most of us computing with a 386-based PC run the processor
in its slowest and least functional mode. Yet, even then we get staggering
performance improvements over that for which we lusted a decade ago. Most
PC applications run in "real mode", using 8088-like 20
bit addresses and 16 bit registers.
The 386 can and does often act just like a very fast 8088.
Its most obvious virtue is its raw speed. With no wait states, machine cycles
take only two clocks. At 33 MHz, this is a blazing 61 nsec per cycle. Short
instructions (e.g., a register to register move) complete in
two cycles, or about 122 nsec. This baby is no slouch at moving data!
There is a sort of hidden price to running so fast, though.
How many memory systems can present data so quickly? Inject a single wait
state, and the machine's performance declines by a third. Any high performance
embedded system will likely need costly cache to properly
match memory speeds to the processor's bandwidth.
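The arithmetic behind those figures can be sketched in a few lines of C. This is only a back-of-the-envelope model: it assumes every bus cycle takes two clocks plus any wait states, and the function names are invented for illustration.

```c
/* Duration of one bus cycle in nanoseconds: two clocks plus any wait states. */
double bus_cycle_ns(double clock_mhz, int wait_states)
{
    double clock_ns = 1000.0 / clock_mhz;   /* one clock period */
    return (2 + wait_states) * clock_ns;
}

/* Fraction of zero-wait-state bus throughput lost once wait states are added. */
double throughput_loss(int wait_states)
{
    return 1.0 - 2.0 / (2 + wait_states);
}
```

At 33 MHz, bus_cycle_ns(33.0, 0) comes out near 61 nsec, and throughput_loss(1) is one third, matching the numbers above.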
The 386 has a richer instruction set than its 80x88 cousins.
32 bit multiply/divides, barrel shifters that shift up to 32 bits in 7
cycles, and bit manipulations are all included. All registers are 32 bits,
so handling decent sized data is a breeze.
Embedded people might be disappointed with its lack of peripherals.
64180/Z180, 8051, 80196, and other embedded parts include timers,
serial ports, and the like, all designed to reduce the cost and size of a
system. Not so the 386, which is targeted only at high
performance, high cost applications. I hope Intel or AMD does eventually come
up with versions specifically for embedded markets,
including serial and parallel ports. It would seem a sensible use of the vendors'
ability to cram ever more functionality onto a piece of
silicon. After all, even the RISC folks are now targeting processors specifically
towards the embedded marketplace.
Protected vs. Real Modes
If you've worked with the 80x88 family, you are intimately
familiar with what 386 documentation calls "Real Mode".
Real Mode
addresses are limited to 20 bits, and are generated by adding a 16 bit segment
register, shifted left four bits, to a 16 bit offset. This
much maligned segmentation causes no end of grief for programmers trying to
access large data structures. Since an offset cannot exceed 16
bits, you just can't increment beyond 64k; you'll have to watch for a 64k
boundary and then play games with the segment register.
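As a rough illustration of the real mode calculation, here is a small C model. The type and function names are hypothetical, and far_pointer_add is just one way of "playing games with the segment register" when a pointer has to step past a 64k boundary.

```c
#include <stdint.h>

/* Real mode: physical address = (segment shifted left 4 bits) + offset. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_pointer;

/* Advance a segment:offset pair by a byte count, folding the carry
   into the segment so the 16 bit offset never overflows. */
far_pointer far_pointer_add(far_pointer p, uint32_t bytes)
{
    uint32_t linear = real_mode_address(p.segment, p.offset) + bytes;
    far_pointer result;
    result.segment = (uint16_t)(linear >> 4);
    result.offset  = (uint16_t)(linear & 0xF);
    return result;
}
```

For example, adding 1 to 1000:FFFF (hex) yields 2000:0000 rather than wrapping back to 1000:0000.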
The 386's Protected Mode changes everything you ever learned
about 80x88 segmentation. Protected mode offers direct access to 32 bit
addresses. Though segment registers still play a part in every address calculation,
their role is no longer one of directly specifying an
address. In protected mode segment registers are pointers to data structures
that define segmentation limits and addresses. More on this
later.
On a 386 operating in real mode you have access to practically
every feature the 386 has to offer - with the exception of 32 bit
addressing. Just about all of the new instructions are available. All operands
can be 8, 16, or even 32 bits. That's right - real mode
programs can easily handle double word long data, using 32 bit registers.
On the 386, in real or protected modes, you access operands as
follows:
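For instance, the accumulator can be read as an 8, 16, or 32 bit operand (AL or AH, AX, and EAX). The C helpers below are only a model of how those views overlap; the function names are made up for illustration.

```c
#include <stdint.h>

/* AL: the low 8 bits of EAX. */
uint8_t view_al(uint32_t eax) { return (uint8_t)(eax & 0xFF); }

/* AH: bits 8 through 15 of EAX. */
uint8_t view_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFF); }

/* AX: the low 16 bits of EAX. */
uint16_t view_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFF); }
```

With EAX holding 12345678 hex, AX is 5678, AH is 56, and AL is 78.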

You can use the 32 bit registers to address memory, but in
real mode the effective address may not exceed 20 bits. The 386 will generate
an exception if the address is too large.
Take advantage of the 386's extended instructions (even in
real mode), to greatly speed processing:

The processor includes extra segment registers. Where an
80x88 CPU only provides ES, DS, SS, and CS, the 386 adds FS and GS, which
you can use in real or protected mode.
Protected Mode Addressing
Segment registers are called "selectors"
when operating in protected mode, to distinguish their operation from that
of real mode, for these registers do indeed perform a selection process. In protected
mode, segment registers simply point to data structures
that contain the information needed to access a location.
Every protected mode program must include a table of "descriptors",
which are 8 byte data structures that define the start and
end of a segment. Depending on the type of segment, a descriptor may have
other information such as access rights and the like. A typical
descriptor contains the following information, packed into an 8 byte record:
Segment start: absolute 32 bit address
Segment limit: highest offset this segment can reference
Segment status: privilege level, segment present, segment
available, segment type, etc.
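To give a feel for how awkwardly these fields are packed, here is a C sketch that builds the 8 byte record. The byte positions follow Intel's published descriptor layout (the 32 bit start address and 20 bit limit are each split into pieces); the struct and function names are my own.

```c
#include <stdint.h>

typedef struct {
    uint32_t base;    /* segment start: 32 bit address */
    uint32_t limit;   /* 20 bit limit field */
    uint8_t  access;  /* present bit, privilege level, segment type */
    uint8_t  flags;   /* 4 bits: granularity, default size, available */
} segment_info;

/* Scatter the fields into the 8 byte descriptor format. */
void pack_descriptor(const segment_info *s, uint8_t out[8])
{
    out[0] = (uint8_t)(s->limit & 0xFF);           /* limit bits 0-7   */
    out[1] = (uint8_t)((s->limit >> 8) & 0xFF);    /* limit bits 8-15  */
    out[2] = (uint8_t)(s->base & 0xFF);            /* base bits 0-7    */
    out[3] = (uint8_t)((s->base >> 8) & 0xFF);     /* base bits 8-15   */
    out[4] = (uint8_t)((s->base >> 16) & 0xFF);    /* base bits 16-23  */
    out[5] = s->access;
    out[6] = (uint8_t)(((s->flags & 0x0F) << 4) |
                       ((s->limit >> 16) & 0x0F)); /* flags, limit 16-19 */
    out[7] = (uint8_t)((s->base >> 24) & 0xFF);    /* base bits 24-31  */
}

/* Reassemble the 32 bit start address from its three pieces. */
uint32_t descriptor_base(const uint8_t d[8])
{
    return (uint32_t)d[2] | ((uint32_t)d[3] << 8) |
           ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
}
```

A code segment starting at 0 with limit FFFFF hex, access byte 9A, and flags C packs to the bytes FF FF 00 00 00 9A CF 00.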
Thus, the descriptor tells the 386 everything it needs to know about accessing
data or code in a segment. Accesses to memory are qualified
by the descriptor selected by the current segment register. The selector
contains a 13 bit index indicating which entry to use in the
descriptor table; an index of 0 takes the first descriptor, an index
of 1 takes the second, etc. The 386 multiplies the
index by 8 (8 bytes per entry), and adds this to the base address of the
table of descriptors (contained in an internal 386 register
loaded by the programmer before switching to protected mode).
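The lookup can be modeled in C as follows. On the 386 the index occupies the upper 13 bits of the 16 bit selector; the low three bits hold a table indicator and a requested privilege level. The function names here are hypothetical, chosen only for this sketch.

```c
#include <stdint.h>

/* Upper 13 bits: which entry in the descriptor table. */
unsigned selector_index(uint16_t selector) { return selector >> 3; }

/* Bit 2: 0 selects the Global table, 1 the Local table. */
unsigned selector_uses_ldt(uint16_t selector) { return (selector >> 2) & 1; }

/* Bits 0-1: requested privilege level. */
unsigned selector_rpl(uint16_t selector) { return selector & 3; }

/* Build a selector from its three fields. */
uint16_t make_selector(unsigned index, unsigned use_ldt, unsigned rpl)
{
    return (uint16_t)((index << 3) | ((use_ldt & 1) << 2) | (rpl & 3));
}

/* Address of the 8 byte descriptor: table base + index * 8. */
uint32_t descriptor_address(uint32_t table_base, uint16_t selector)
{
    return table_base + (uint32_t)selector_index(selector) * 8;
}
```

So a selector for Global table entry 9 has the value 48 hex, and with the table at 1000 hex its descriptor is read from 1048 hex.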
For example, a code fetch always uses the current CS. A protected
mode fetch starts by multiplying the descriptor index held in CS by 8 and then adding the descriptor
base register. The 386 then reads an entire 8 byte record from the descriptor
table. The entry describes the start of the segment; the
processor adds the current instruction pointer to this start to get an effective
address.
A data access behaves the same way. A load from location DS:1000
makes the processor read a descriptor by shifting the index in DS left 3 bits (i.e.,
times 8), adding the table's base address (stored in the 386's on-board descriptor
table register), and reading the 8 byte descriptor at
this address. The descriptor contains the segment's start address, which is
added to the offset in the instruction (in this case 1000).
Offsets, and segment start addresses, are 32 bit numbers - it's really easy
to reference any location in memory.
Every memory access works through these 8 byte descriptors.
If they were stored only in user RAM the 386's throughput would be pathetic,
since each memory reference needs the information. Can you imagine waiting
for an 8 byte read before every memory access? Actually, the
processor caches a descriptor for each selector (one for CS, one for DS, etc)
on-chip, so the segment translation requires no overhead.
However, every load of a selector (like MOV DS,AX or POP ES) will make the
386 stop and read all 8 bytes into its internal cache, slowing
things down just a bit.
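A toy C model makes the cost split visible: loading a segment register fetches the descriptor once, and every later access uses the on-chip copy. The descriptor here is simplified to a start address and limit, and all names (including the table_reads counter) are assumptions made for this sketch.

```c
#include <stdint.h>

typedef struct {
    uint32_t base;
    uint32_t limit;
} cached_descriptor;

typedef struct {
    uint16_t          selector;  /* the visible segment register */
    cached_descriptor cached;    /* hidden on-chip copy of the descriptor */
} segment_register;

unsigned table_reads = 0;        /* counts descriptor fetches from memory */

/* Models MOV DS,AX or POP ES: one read of the descriptor table. */
void load_segment(segment_register *reg, const cached_descriptor *table,
                  uint16_t selector)
{
    reg->selector = selector;
    reg->cached   = table[selector >> 3];
    table_reads++;
}

/* Models an ordinary memory reference: no table read, just an add. */
uint32_t cached_linear_address(const segment_register *reg, uint32_t offset)
{
    return reg->cached.base + offset;
}
```

However many times cached_linear_address is called, table_reads only changes when a selector is reloaded.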
Figure 1 shows how addressing works. The figure ignores Paging,
yet another 386 feature that permits extending the address space far
beyond 4 GB.
It's all a little mind boggling. The CPU manipulates these
8 byte data structures automatically, reading, parsing, caching, and working
with them as needed, with no programmer intervention (once they are set up).
The CPU does more than translate addresses as described. In
parallel, it checks every memory reference to ensure it behaves properly.
Remember the "limit" field in the descriptor? If the offset
into the segment is greater than this limit, the 386
aborts the instruction and generates a protection violation exception. It
won't let you do something stupid. You can even specify that a
segment is read-only; a write will create the same exception.
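In C, the checks amount to something like the sketch below. It is a simplified model: the 386 compares the offset against the limit, only single byte accesses are considered, and the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t base;      /* segment start address */
    uint32_t limit;     /* highest legal offset */
    bool     writable;  /* false for a read-only segment */
} checked_segment;

typedef enum { ACCESS_OK, PROTECTION_FAULT } access_result;

/* Validate one access; on success store the translated address. */
access_result check_access(const checked_segment *seg, uint32_t offset,
                           bool is_write, uint32_t *linear)
{
    if (offset > seg->limit)
        return PROTECTION_FAULT;       /* beyond the end of the segment */
    if (is_write && !seg->writable)
        return PROTECTION_FAULT;       /* write to a read-only segment */
    *linear = seg->base + offset;
    return ACCESS_OK;
}
```

A read inside a read-only segment succeeds, while a write to it or a read past its limit reports a fault instead of touching memory.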
But wait a minute! Everyone seems to think that segments aren't
used in protected mode! In fact, segmentation is practically essential,
and is far more useful than you might think.
On an 80x88 processor you'll frequently write programs divided
into more than one named code segment. The linker combines like-named
segments together, and then groups the segments into one hunk. In the embedded
world, using a Locator (like ones sold by Systems &
Software and Paradigm), you can separate named segments into specific RAM
or ROM addresses to match the nuances of your particular
hardware environment. The 386 takes this one step further.
A 386 linker groups like-named segments together. Then, if
you wish, you can assign any group to any descriptor. The selector
uses 13 bits to pick a descriptor, and another bit selects which of two descriptor
tables to read from (the Local or Global tables), giving up
to 8192 segments in each table.
This is a lot of power; most DOS users ignore it. It is ideal
for embedded applications. Suppose you have memory mapped I/O: group it into
a named segment and assign read/write attributes to it. Even better, separate
read and write ports into different segments to ensure your
code never accidentally accesses one incorrectly. Make your code fetch-only,
so illegal accesses create protection violation errors -
debugging will be a lot easier with this enabled.
Some embedded systems include a ROMed version of DOS. DOS
runs in real mode only, so use the 386's segmentation to define real and
protected segments. The real ones will (sigh) not have the great protection
mechanisms. Restrict them to low addresses (under 20 bits),
and put the protected mode code up high. The real mode code will not physically
be able to generate a high address that might affect the
protected mode code.
Linkers
If we had to define the selectors and descriptors ourselves,
protected mode would be just too hard to use. The descriptors are arranged
in a nasty, hard to assemble format. Fortunately, Intel and others supply linkers
that do all of the hard work for you.
It is a little tedious to actually switch from real to protected
mode, but Intel application notes do a pretty good job of describing the
procedure. There seems to be surprisingly little written about actually building
an application. It turns out that the linker does most of
the work of building descriptors.
I've been using Systems & Software's (Irvine, CA) Link
& Locate 386 lately, and find that writing protected mode code with
it is a breeze. Writing protected mode code is really no different than for real mode.
Break your code into named segments, separating data and
code, and segment them further if you wish to restrict access in some fashion.
Assemble the code with any decent assembler: Microsoft's
MASM and Borland's TASM do just fine. Then, use a linker with a carefully
scripted command file to assign descriptors as wished. Figure 2
shows a script file for Link & Locate 386 for a typical application.
This program consists of just 4 segments. Real_code is real
mode code executed occasionally by the program. Cgroup is the bulk of the
program. Dgroup is a data area. Flat_seg is a special segment defined so the
program can make a linear access anywhere in memory.
Notice how the segments, in many cases, have absolute addresses
assigned, defining their start. The Linker puts in ending limits
automatically.
Flat_seg is a special case; we've set it to start at 0 and
end at the end of memory. This more or less bypasses protection checking,
as the segment's definition precludes getting an addressing error. Sometimes,
in embedded systems we need to access any area to get to
specific hardware.
A program operating with this structure will have its code
all in segment cgroup, and all data in dgroup. The program will start with
code that looks something like:

This looks just like 80x88 code. Now, suppose we want an absolute
reference anywhere in memory (say, we have some weird hardware device to
read from). Do this:

Since selector ES points to a descriptor that is a flat,
32 bit address space, any number in ESI is a 32 bit offset added to flat_seg's
start address of 0.
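The effect of a flat segment is easy to see in a small C model. This is a sketch only: flat_seg is represented here by a start address and a byte limit (a real descriptor would express the same span with a 20 bit limit and the granularity bit), and the function names are invented.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t base;
    uint32_t limit;
} flat_segment_t;

/* Model of flat_seg: starts at 0 and spans the whole 32 bit space. */
const flat_segment_t flat_seg = { 0x00000000u, 0xFFFFFFFFu };

/* The limit check can never fail for a segment this large. */
bool flat_offset_allowed(const flat_segment_t *seg, uint32_t offset)
{
    return offset <= seg->limit;
}

/* With a start address of 0, the linear address is the offset itself. */
uint32_t flat_linear_address(const flat_segment_t *seg, uint32_t offset)
{
    return seg->base + offset;
}
```

Whatever value is in ESI comes out unchanged as the linear address, which is why this segment gives the program an absolute pointer to any device.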
The code sets up selectors just like real mode 80x88 code sets
segment registers. There really is no difference. The linker replaces
segment references with pointers to the descriptor table. In the linker command
file, we've defined "gdt" (the Global Descriptor
Table), and specific entries for each segment. GDT entries 1 to 8 are reserved
in this case, but 9 corresponds to dgroup, 10 to cgroup,
etc. The linker will build the GDT and insert it into the program.
Conclusion
Next month I'll discuss the processor's protection mechanisms
in more detail, including the way the 386 handles context switching.
