© 2004 Avocet Systems, Inc.
Avocet Systems, Inc. : The Complete Solution for Embedded Systems Development Tools
Hints
Prefetchers

Abstract
Most modern CPUs (like the 80188, 68xxx, PIC and others) have on-chip prefetchers to increase performance. They can cause no end of debugging trouble, though...

Most 16-bit CPUs are really a combination of two tightly integrated processors. An Execution Unit runs the code. A Bus Interface Unit manages the processor's pins, always trying to fill a small prefetch queue with the next instruction to execute. Memory is slow; if the instruction is already on-chip, quite a bit of time will be saved.

The Bus Interface Unit (BIU) sits between the Execution Unit and the device's pins. If the CPU were a simple Z80, the BIU would just pass memory requests from the Execution Unit to the outside world. Instead, prefetching CPUs exploit idle bus times by adding intelligence to the BIU.

Code generally executes from a low address to a higher one. Sure, jumps and calls interrupt the monotonically increasing fetch sequence, but even in a short loop, more often than not the next instruction byte is located right after the one just fetched. The 80188's BIU uses this fact to keep the CPU-to-memory interface busy (i.e., to maximize the bus bandwidth).

The BIU is really rather stupid. It just blindly keeps fetching bytes from ROM, storing them in a little FIFO between it and the Execution Unit. When the EU is ready for the next instruction it might already be on-chip in the FIFO. Memory fetch delays are thus avoided. The FIFO is small but most of the time the next byte is there when needed.

Once in a while the CPU will decode and execute some sort of branch operation. Presumably the BIU will have at least partially filled the FIFO with bytes located sequentially beyond the jump; bytes that just are not needed. Jumps and calls flush the FIFO, erasing these unneeded entries. The BIU then starts fetching from the new execution address. Program transfers therefore essentially stall the CPU; the processor must wait for the first instruction at the jump destination before proceeding, just like any simple non-prefetching computer would. Soon, however, the BIU will again fill the FIFO, keeping a bit ahead of the EU's needs.
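The fill-and-flush behavior described above can be sketched in a few lines of Python. This is only a toy model, not a cycle-accurate picture of any real BIU: the class name PrefetchQueue, its methods, and the 4-byte queue depth are all assumptions made for illustration.

```python
from collections import deque


class PrefetchQueue:
    """Toy model of a BIU prefetch FIFO (hypothetical, not cycle-accurate)."""

    def __init__(self, memory, depth=4, start=0):
        self.memory = memory        # list of bytes standing in for ROM
        self.depth = depth          # FIFO capacity in bytes (assumed value)
        self.fetch_addr = start     # next address the BIU will fetch
        self.fifo = deque()         # (address, byte) pairs already on-chip
        self.bus_log = []           # every address the BIU put on the bus

    def biu_cycle(self):
        """One idle bus cycle: fetch the next sequential byte if there is room."""
        if len(self.fifo) < self.depth and self.fetch_addr < len(self.memory):
            self.fifo.append((self.fetch_addr, self.memory[self.fetch_addr]))
            self.bus_log.append(self.fetch_addr)
            self.fetch_addr += 1

    def eu_take(self):
        """The EU consumes the next byte; returns (address, byte, stalled)."""
        stalled = not self.fifo
        if stalled:
            self.biu_cycle()        # the EU has to wait for a real memory fetch
        addr, byte = self.fifo.popleft()
        return addr, byte, stalled

    def jump(self, target):
        """A taken branch discards the queue and redirects the BIU."""
        discarded = [addr for addr, _ in self.fifo]
        self.fifo.clear()
        self.fetch_addr = target
        return discarded


q = PrefetchQueue(memory=list(range(16)), depth=4)
for _ in range(6):
    q.biu_cycle()                   # the BIU runs ahead until the FIFO is full
first = q.eu_take()                 # served from the FIFO, no stall
flushed = q.jump(10)                # bytes fetched beyond the jump are thrown away
after_jump = q.eu_take()            # the EU stalls until the target byte arrives
```

Note that bus_log still records the flushed addresses. That is exactly what an emulator watching the pins sees: fetches that never turn into executed code.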

Some instructions read or write data in memory. A load or store operation causes a momentary disruption in the normal prefetching sequence. Loads and stores work around the FIFO; they temporarily suspend prefetching, transfer the data, and then resume without corrupting the FIFO's contents.
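A rough way to picture this bus sharing is a scheduler that gives data transfers priority over prefetches. The sketch below is an assumption-laden illustration: the function schedule_bus, its parameters, and the addresses are hypothetical, and the EU is not modeled as consuming bytes, so the FIFO simply fills up.

```python
def schedule_bus(cycles, data_requests, fifo_depth=4, start=0x100):
    """Return (events, fifo) for a toy bus shared by data and prefetch cycles.

    data_requests maps a cycle number to the data address the EU wants in
    that cycle (hypothetical input). Data transfers win the bus; prefetching
    pauses for that cycle and then resumes at the next sequential address.
    """
    events = []                 # (cycle, kind, address) as seen on the pins
    fifo = []                   # prefetched addresses, never touched by data cycles
    fetch_addr = start
    for cycle in range(cycles):
        if cycle in data_requests:
            events.append((cycle, "data", data_requests[cycle]))
        elif len(fifo) < fifo_depth:
            fifo.append(fetch_addr)
            events.append((cycle, "prefetch", fetch_addr))
            fetch_addr += 1
        # otherwise the FIFO is full and the bus sits idle this cycle
    return events, fifo


events, fifo = schedule_bus(cycles=4, data_requests={1: 0x2000})
```

In this run the data read at cycle 1 lands between the prefetches of 0x100 and 0x101, and the FIFO ends up holding 0x100, 0x101 and 0x102 in order, with nothing lost or duplicated.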

Prefetcher Perils

Prefetchers cause two sorts of emulation problems. They often hopelessly confuse the real-time trace data, and they sometimes cause incorrect breakpoint operation.

The CPU's Bus Interface Unit is fairly unintelligent. It constantly issues requests for the next sequential instruction until a jump or other program transfer invalidates the prefetched-but-unexecuted data. The real-time trace in some emulators cannot deal with these erratic and sometimes incomplete instruction fetches. When the processor prefetches an instruction, a jump that is pending in the internal prefetch queue could stop the fetch even before the entire instruction is read. If the trace system doesn't model the processor's internal operations, the displayed data will be meaningless. Softaid spent almost two years developing an algorithm that models the processor's operation and then correctly displays the trace data.

If you look at the trace data collected by our emulators you'll see that the "index", or position of a line in the trace data, is not monotonically increasing. This is due to the algorithm, which must move data around to properly disassemble the trace data. If a move-from-memory instruction is found, for example, the algorithm will look ahead in the trace data to find the bus cycles representing the data transfers and align them with the instruction that did the deed, making the programmer's life much simpler.
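To see why the index stops increasing monotonically, consider this small Python sketch of the look-ahead idea. It is not the emulator's actual algorithm; the function align_data_cycles, the accesses_memory predicate, and the trace record layout are all hypothetical, and each fetch is treated as one whole instruction.

```python
def align_data_cycles(raw_trace, accesses_memory):
    """Reorder a raw bus trace so data cycles follow their instruction.

    raw_trace: list of (kind, address) in bus order, where kind is
               "fetch", "read" or "write".
    accesses_memory: hypothetical predicate; given a fetch address, says
               whether the instruction there transfers data.
    Returns a list of (raw_index, kind, address) in display order.
    """
    claimed = set()             # raw indices already moved next to an instruction
    display = []
    for i, (kind, addr) in enumerate(raw_trace):
        if i in claimed:
            continue
        display.append((i, kind, addr))
        if kind == "fetch" and accesses_memory(addr):
            # Look ahead for the first unclaimed data cycle and pull it up.
            for j in range(i + 1, len(raw_trace)):
                if j not in claimed and raw_trace[j][0] != "fetch":
                    claimed.add(j)
                    display.append((j, raw_trace[j][0], raw_trace[j][1]))
                    break
    return display


raw = [
    ("fetch", 0x100),           # a move-from-memory instruction
    ("fetch", 0x101),           # prefetched before the data cycle happens
    ("fetch", 0x102),
    ("read", 0x2000),           # the data transfer belonging to 0x100
    ("fetch", 0x103),
]
shown = align_data_cycles(raw, lambda addr: addr == 0x100)
indices = [index for index, _, _ in shown]
```

Here the displayed index order comes out as 0, 3, 1, 2, 4: the read at raw position 3 is shown directly under the instruction that caused it.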

In addition, the algorithm deletes prefetched-but-unexecuted instructions from the trace display, since these instructions were never executed and would only confuse the programmer. (Note that in "Raw" display mode the algorithm is disabled, so you can see every bus transaction just as it took place.)
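One simplified way to think about this filtering: a taken jump shows up on the bus as a break in the otherwise sequential fetch addresses, shortly after the jump itself was fetched. The Python sketch below uses that observation. It is a toy under stated assumptions, not the emulator's algorithm: drop_unexecuted, the jump_targets decode table, the one-address-per-instruction layout, and the queue depth are all hypothetical.

```python
def drop_unexecuted(fetches, jump_targets, start, queue_depth=4):
    """Keep only the fetches that were actually executed (toy model).

    fetches: instruction-fetch addresses in bus order, one address per
             instruction (a simplification).
    jump_targets: hypothetical decode table, jump address -> target address.
    start: address where execution begins.
    queue_depth: how far the BIU can run ahead (assumed value).
    """
    executed = []
    expected = start            # next address the EU will really execute
    for i, addr in enumerate(fetches):
        if addr != expected:
            continue            # fetched, then flushed before execution
        executed.append(addr)
        expected = addr + 1
        target = jump_targets.get(addr)
        if target is None:
            continue
        # The jump was taken if the fetch stream breaks to its target
        # before the BIU could have run further ahead than the queue allows.
        prev = addr
        for nxt in fetches[i + 1 : i + 2 + queue_depth]:
            if nxt != prev + 1:
                if nxt == target:
                    expected = target
                break
            prev = nxt
    return executed


# Loop at 0x100..0x102 with the jump at 0x102; taken once, then falls through.
fetches = [0x100, 0x101, 0x102, 0x103, 0x104,   # 0x103 and 0x104 are prefetch only
           0x100, 0x101, 0x102, 0x103]
kept = drop_unexecuted(fetches, jump_targets={0x102: 0x100}, start=0x100)
```

The first pass drops 0x103 and 0x104 because the fetch stream breaks back to 0x100; the second pass keeps 0x103 because no break follows the jump.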

Since the Bus Interface Unit fetches ahead, regardless of what instruction is actually being executed, it may fetch an instruction that is never or only rarely executed, causing problems with breakpoints. Consider the loop:

label: < loop code>

jnz label

< more code>

If you place a breakpoint on < more code>, every loop iteration might trigger it, even when the jump is taken, because the BIU prefetches < more code> right behind the jump each time around. The emulator only sees the fetch and can't distinguish a fetch that will be executed from one that will not be. Softaid's breakpoint circuits have a state machine that models bus cycles in real time to see if the breakpoint really should be taken. The breakpoint will occur only if < more code> is really executed.
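The size of the problem is easy to see in a small simulation of the loop above. This Python sketch illustrates only the false triggers, not Softaid's state machine: run_loop, the addresses, and the assumption that the BIU stays two instructions ahead of the EU are all made up for the example.

```python
def run_loop(iterations, loop_start=0x100, jnz_addr=0x102, ahead=2):
    """Simulate the loop and return (fetch_log, exec_log).

    One address per instruction; <more code> sits at jnz_addr + 1.
    ahead: how many instructions the BIU stays in front of the EU (assumed).
    """
    fetch_log, exec_log = [], []
    remaining = iterations
    pc = loop_start             # instruction the EU is executing
    next_fetch = loop_start     # next address the BIU will fetch
    while True:
        while next_fetch <= pc + ahead:
            fetch_log.append(next_fetch)    # BIU prefetches blindly
            next_fetch += 1
        exec_log.append(pc)
        if pc == jnz_addr:
            remaining -= 1
            if remaining > 0:               # jump taken: flush and refetch
                pc = loop_start
                next_fetch = loop_start
                continue
        elif pc == jnz_addr + 1:            # <more code> really executed
            break
        pc += 1
    return fetch_log, exec_log


fetch_log, exec_log = run_loop(iterations=3)
breakpoint_addr = 0x103                     # <more code>
fetch_hits = fetch_log.count(breakpoint_addr)   # what a fetch-only comparator sees
exec_hits = exec_log.count(breakpoint_addr)     # what actually ran
```

With three iterations the breakpoint address appears on the bus three times but is executed only once, so a comparator that matches on fetches alone would stop the program two iterations too early.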