Microcontroller C Compilers
Copyright 1990, Jack G. Ganssle
Abstract
This article discusses the state of C for controllers circa
1990.
Published in Electronic Engineering Times, November 1990
C has taken the software industry by storm. In the last few
years even the microcontroller industry, the last bastion of pure assembly
language, has seen exponential growth in the use of C.
C's rapid proliferation spawned dozens of compilers targeted
to embedded systems. Their quality ranges from nearly perfect to downright
unusable. Sometimes the compiler's caliber is inversely proportional to the
data sheet's gloss and the product's cost. Unfortunately, the best characteristics
are apparent in the vendor's advertising; the worst don't show up until you've
used the compiler for some time.
Code size and speed are usually uppermost programming concerns,
especially to cynical dyed-in-the-wool assembly programmers being dragged
into a high level language for the first time. I hesitate to add to the hype
about efficiency, except to state the obvious, that C code is slower and larger
than comparable assembly. But not by much - in most cases the difference is
only about 25%.
It's hard to beat C for a large embedded project. Everyone
admits that it is less efficient than assembly, but C will reduce non-recurring
engineering (NRE) costs by a factor of 3 or more. In most products the savings
in design, coding and maintenance quickly justify the extra memory expenses.
Where is the crossover between NRE and recurring costs? How many units will
you have to sell before the larger ROM costs more than the extra $50, 100,
or 150 thousand in NRE?
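
The crossover can be estimated with simple arithmetic: divide the NRE saved by the extra ROM cost per unit. The sketch below is only an illustration. The helper name break_even_units is hypothetical, and the $2 per-unit ROM premium used in the note that follows is an assumed figure, not one taken from any real product.

```c
/* Number of units that must ship before the extra ROM cost of the
 * C version exceeds the engineering cost saved by writing in C.
 * Both inputs are in whole dollars (assumed figures, see the text). */
unsigned long break_even_units(unsigned long nre_savings,
                               unsigned long extra_rom_cost_per_unit)
{
    if (extra_rom_cost_per_unit == 0)
        return 0;   /* no ROM premium, so there is no crossover to report */
    return nre_savings / extra_rom_cost_per_unit;
}
```

With $50,000 of NRE saved and an assumed $2 ROM premium per unit, the crossover comes at 25,000 units; with $150,000 saved it moves out to 75,000 units.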
This conventional wisdom doesn't directly apply to microcontrollers,
where ROM sizes are fixed and just can't be increased. Still, the quality
of modern compilers is such that the penalty in memory space for using C is
small. Using a controller with a little extra memory can save a lot during
development.
Moore's law states that every two years the silicon wizards
will double the number of transistors on a chip. Therefore memory is cheap,
and gets ever cheaper as densities increase. Unfortunately, microcontrollers
seem to be at the tail end of the benefit chain of this rule. Even today it's
hard to find more than 16k or so of ROM on a controller. 4k is still not uncommon.
Intel did recently introduce a 32k version of the venerable 8051, but 32k
is but a pittance compared to the memory used by big microprocessor-based
programs.
The cardinal rule of design for microcontrollers is to carefully
manage your resources. Certainly the most important and scarcest resource
on any microcontroller is memory. Hoard this precious commodity; don't squander
it in careless decisions. Before selecting a compiler be sure it is as miserly
in its use of memory as you would be if writing assembly code.
Consider stack operations. The 6805 has no real stack, so
the compiler must laboriously manipulate automatic variables (i.e., ones stored
on the runtime stack) using the CPU's entirely inadequate 8 bit index register.
Stack overhead will therefore occupy a lot of ROM space. In this sort of situation
perhaps it makes sense to adapt to the processor's architecture and use static
variables everywhere, but you'll pay a penalty in RAM requirements. At the
other end of the spectrum, the embedded 68302 is a C programmer's dream platform,
since stack relative addressing is fast and easy.
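
As a sketch of what adapting to the architecture might look like, here is the same small checksum routine written twice, once with automatic locals and once with statics. The function names are hypothetical. The static version needs no stack-relative accesses for its locals, but its two bytes of RAM stay allocated for the life of the program.

```c
/* Automatic locals: sum and i exist on the runtime stack (or in
 * registers) only while the function is executing. */
unsigned char checksum_auto(const unsigned char *buf, unsigned char len)
{
    unsigned char sum = 0;
    unsigned char i;

    for (i = 0; i < len; i++)
        sum = (unsigned char)(sum + buf[i]);
    return sum;
}

/* Static locals: sum and i get fixed RAM addresses, so no stack
 * manipulation is needed, at the price of permanently reserved RAM. */
unsigned char checksum_static(const unsigned char *buf, unsigned char len)
{
    static unsigned char sum;
    static unsigned char i;

    sum = 0;
    for (i = 0; i < len; i++)
        sum = (unsigned char)(sum + buf[i]);
    return sum;
}
```

Both versions return the same 8 bit sum; only where the locals live differs.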
Many microcontroller C compilers gain speed at the expense of
memory. Take the 8051: stack accesses are so cumbersome that
many compilers allocate automatic variables as statics. In other words, even
if "x" is a temporary automatic whose scope is local to one function,
the compiler assigns it a permanent address in memory. With limited RAM size
this can be an important concern.
Most of us consider ROM size when making the C versus assembly
decision. RAM is just as important. Obviously the compiler will allocate RAM
space for the stack, variables, structures, and the like. Some compilers also
make use of potentially large amounts of RAM for internal compiler runtime
functions. Library routines invariably use RAM. Some compilers copy all string
literals from ROM to RAM during the initialization sequence. Why? Just to make
the compilation easier.
Compilers designed for non-embedded programming usually are
very poor at dividing memory into separate RAM and ROM sections. They assume
that the address space is all RAM. Embedded programs reside in ROM, while
the data is stored in a remote RAM area, so the ability to specify separate
starting addresses for code and data is crucial. But this is far from enough
- the ideal compiler will let you divide your code and data even further.
Suppose the product uses memory-mapped I/O; it's essential that these ports,
although looking like RAM variables to the compiler, be assigned to the proper
absolute addresses. Interrupt vectors are also stored at fixed locations;
the compiler/linker must let you define these at absolute addresses distinct
from the rest of the code.
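
One common way to pin a port to an absolute address in C is a cast to a volatile pointer. The sketch below assumes a hypothetical 8 bit output port; the address 0x8000, the TARGET_BUILD switch, and all of the names are invented for illustration, and many compilers and linkers offer their own non-portable mechanisms instead. So that the fragment can be tried on a host machine, the port is backed by an ordinary variable unless TARGET_BUILD is defined.

```c
#ifdef TARGET_BUILD
/* On the target: hypothetical port at absolute address 0x8000. */
#define LED_PORT (*(volatile unsigned char *)0x8000)
#else
/* On a host: a stand-in variable so the code can run anywhere. */
static volatile unsigned char simulated_led_port;
#define LED_PORT simulated_led_port
#endif

/* Set one bit (0..7) of the port without disturbing the others. */
void led_on(unsigned char bit)
{
    LED_PORT |= (unsigned char)(1u << bit);
}

/* Clear one bit (0..7) of the port without disturbing the others. */
void led_off(unsigned char bit)
{
    LED_PORT &= (unsigned char)~(1u << bit);
}

/* Read back the current port value. */
unsigned char led_state(void)
{
    return LED_PORT;
}
```

The volatile qualifier tells the compiler that every read and write must actually reach the port, which matters as soon as any optimization is enabled.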
All compilers will perform some amount of optimization to
minimize code size or increase speed. Some are truly remarkable, removing
constant expressions from loops and the like. This may not be a virtue; extensive
optimization makes the code impossible to debug. No one yet knows how to tie
optimized code to a debugger. All meaningful references between the object
code and the original source lines are lost when the compiler moves the source
around, so all debugger vendors insist that you debug with optimizations turned
off. Rather than rely on extensive optimization, write good code! Don't leave
a constant assignment inside a loop where it will be executed thousands or
millions of times. Don't ask the compiler to convert needlessly between floating
point and integer.
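
As an illustration of doing the optimizer's job by hand, the sketch below adds a fixed correction to an array of readings. The function names and the correction formula are hypothetical. The first version recomputes a value that never changes on every pass; the second computes it once, before the loop.

```c
/* Wasteful: gain * 4 + offset never changes, yet it is recomputed
 * on every iteration unless an optimizer moves it out of the loop. */
void scale_slow(int *data, int n, int gain, int offset)
{
    int i;

    for (i = 0; i < n; i++) {
        int k = gain * 4 + offset;
        data[i] = data[i] + k;
    }
}

/* Better: compute the loop-invariant value once, outside the loop. */
void scale_fast(int *data, int n, int gain, int offset)
{
    int i;
    int k = gain * 4 + offset;

    for (i = 0; i < n; i++)
        data[i] = data[i] + k;
}
```

Both functions produce identical results; the second simply does not depend on the compiler to notice the invariant.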
How fast does the compiler translate a program? Many programmers
are now familiar with Turbo C's blazing compilations. No cross compiler is
nearly so fast. In evaluations conducted at Softaid we measured compile/link
times for a 1000 line program ranging from Turbo's few seconds to over 20
minutes (on the most expensive compiler evaluated). Expect to use your tools
a lot. Demand reasonable translation times.
Linkers can be even slower than the compiler. After all, a
well designed program is built around quite a few small modules. After making
a change, you'll only recompile the one module that was affected, but you'll
relink the entire program. A slow linker is a curse to avoid at all costs.
Many cross compilers build their internal data structures
in memory, incorrectly assuming the "huge" address space of the
host development system will be adequate. On a PC much of the 640k address space
is taken up with the compiler itself, DOS, network drivers, and TSRs. It's
not unusual to find a cross compiler unable to compile big source programs.
Virtual products, which write intermediate tables to disk, support programs
of any size.
How important is it that the compiler conforms to the ANSI
standard? If the product will be reused many times by other engineering groups,
portability is vital. There is a lot to be said for using at least a close
facsimile of ANSI compatibility so the product can survive a midstream compiler
change. In any event, be sure the compiler at least supports function prototyping.
It's a simple way of automatically checking parameter lists that will eliminate
many hours of debugging. Be sure the compiler actually checks the parameters,
and doesn't merely accept the prototype syntax and then ignore it!
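
A minimal sketch of what a prototype buys you follows. The function names are hypothetical. With the prototype in scope, a conforming compiler converts the integer argument to double before the call, and should reject a call with the wrong number or type of arguments outright.

```c
/* ANSI prototype: the compiler knows half() takes exactly one double. */
double half(double x);

double half(double x)
{
    return x / 2.0;
}

/* Because the prototype is visible here, the int constant 3 is
 * converted to double before the call. A call such as half(1, 2)
 * or half("3") should be rejected at compile time. With an
 * old-style declaration (no parameter list) neither the conversion
 * nor the check would happen. */
double half_of_three(void)
{
    return half(3);
}
```

A quick acceptance test for a new compiler is to feed it a deliberately wrong call, such as half(1, 2), and confirm that it complains.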
If the system will use some form of multitasking or interrupt
handlers written in C, be sure the library is reentrant. Some aren't. Manuals
rarely allude to reentrancy, so compile a tiny "do nothing" program
and measure RAM use. Then, add library calls without adding variables. Be
suspicious if more RAM is linked in. The most common non-reentrant part
of a library is the floating point package.
Non-reentrant code might force you to write all of the interrupt
handlers in assembly. Remember that allocation of automatics can change the
reentrancy characteristics of a function. Automatics stored on the stack will
be intrinsically reentrant; those assigned to specific RAM locations will
not.
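
The effect is easy to demonstrate. The sketch below uses a hypothetical callback, interrupt_hook, to stand in for an interrupt that arrives in the middle of a function; all of the names are invented for illustration. The version whose temporary lives at a fixed RAM address is corrupted when the "interrupt handler" calls the same function; the version whose temporary is a true automatic is not.

```c
/* Hypothetical hook standing in for an interrupt that arrives
 * part way through the function. */
typedef void (*interrupt_hook)(void);

/* Non-reentrant: tmp has one fixed RAM address that is shared by
 * every activation of the function. */
int double_static(int x, interrupt_hook hook)
{
    static int tmp;

    tmp = x;
    if (hook)
        hook();              /* the "interrupt" occurs here */
    return tmp + tmp;
}

/* Reentrant: each activation gets its own private tmp. */
int double_auto(int x, interrupt_hook hook)
{
    int tmp = x;

    if (hook)
        hook();              /* the "interrupt" occurs here */
    return tmp + tmp;
}

/* Simulated interrupt handlers that call the same functions. */
void isr_static(void) { (void)double_static(100, 0); }
void isr_auto(void)   { (void)double_auto(100, 0); }
```

Calling double_static(1, isr_static) returns 200 rather than 2, because the handler overwrote tmp; double_auto(1, isr_auto) returns the expected 2. A compiler that quietly turns automatics into statics can convert the second function into the first.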
On the subject of libraries, be sure the compiler includes
all of the functions you'll need. While many embedded systems make no runtime
calls, others depend on extensive library support. Evaluate your requirements.
Will you require special CPU resources? Does the compiler support these? Most
compiler companies use a common parser and just replace code generation modules
when building variants for different processors. Therefore, register variables,
for example, may not really use registers.
A few compilers will support a CPU's internal memory management
unit. If your design requires extended memory that can be accessed only via
an MMU, be sure the compiler gives you some sort of MMU control. Some compilers
will even automatically remap it and insert C functions into individual maps.
Hitachi's 647180X microcontroller includes an MMU; others will as time moves
on and memory demands increase.
Is the compiler compatible with the assembler? Compiled C
code must be combined with assembled files via a common linker. Almost every
linker takes a different object file format, essentially guaranteeing problems
in combining tools from different vendors. In some cases the tools from a
single vendor are incompatible. Don't expect to work around a weak assembler
by substituting one from another software house, since they will rarely be
compatible.
Once the code is written you'll have to debug it. Plan to
use a source level debugger (SLD), and be sure the compiler is compatible
with the debugger. Prior to the widespread use of C it was common to mix and
match tools; the assembler, linker, and debugger could all come from different
vendors, yet work together with a minimum of trouble. The symbol and hex files
might need a trivial amount of conversion to get the tools to work well together,
but that was expected and was really not a lot of trouble. Block-structured
languages like C have changed all of this. Tools are sometimes like the construction
workers on the Tower of Babel. Few are now really compatible with each other.
Source level C debugging requires a tremendous amount of information
about the program's organization. C is not just an extension of assembly;
line number records and symbol addresses, while sufficient for programs created
with an assembler, are only a fraction of what is needed for C. Usually most
of the difficulty lies with data representations. Is a variable local to one
function? How is its scope defined in the debug file? Is it a static that
has an absolute memory address, or, as for an automatic type, is it assigned
as an offset from the stack? Is it a register variable? All of these questions
get even more complicated if the variable is an array or structure.
Unfortunately, many compilers produce little or no debug information,
rendering them all but useless in an embedded environment where troubleshooting
by adding print statements just doesn't work.
The 8 bit arena is especially chaotic. While a number of standards
for expressing source debug information have been proposed (such as IEEE-695
and COFF), few compilers produce these; those that do often add their own
extensions. The quality of information varies widely and changes almost daily
as the vendors scramble to get their products into better competitive positions.
The files are far more complex than a simple symbol file, so generating conversion
utilities is a difficult and time-consuming process. SLD vendors must think
long and hard before supporting a particular format.
The moral of the story is to ask hard questions of each
of your development tool vendors. Make no assumptions about compatibility.
Once you have a text file of C code, it must be compiled, linked, perhaps
located, and debugged through an SLD and emulator. Will each of these tools
work together? Will you get full source debug functionality, like local variables,
scope tracking and C line number support, or will some important feature be
compromised?
Finally, try to ensure that the compiler is reliable. Talk
to people who have successfully completed sizable embedded projects with the
tool. How often did the compiler crash, miscompile, or unexpectedly cost engineering
time? I know of one big Navy job where the government mandated the use of the
obscure language CMSC. At great expense the company acquired a DOD-approved
VAX cross compiler that was so unreliable they were forced to write code in
small sections, compile it, and examine all of the compiler's assembly output.
If the translation was obviously wrong, the programmers made a more or less
random source change and tried again. Your tax dollars at work.
A lot of factors go into compiler selection. When you finally
make a decision, buy the product and immediately run tests to evaluate the
product's usefulness. A few days of testing can reveal many fatal flaws. Return
it if it is unacceptable - reputable companies will always take a return if
made within the first week or two.