GNU Toolchain

From TauWiki

The entire GNU toolchain can be invoked with the appropriate arguments to the primary compiler drivers: gcc, g++, and g77. There are other important tools that can also be considered part of the GNU Toolchain, such as the [GNU Debugger].

The primary code-generation process can be split into four steps: preprocessing, compilation, assembly, and linking.



Preprocessor

C has always used a preprocessor. This program handles directives like #include and #define, and produces plain C code.

You can demonstrate this step with a command like:

    cpp mySourceFile.c t.i

You will note that a great deal of text has been inserted, mostly from #include directives.

Under GNU, you will also note some odd lines beginning with # that would not be legal C. These are linemarkers: special annotations that let the compiler report errors against the correct line of the original source file, or of an included file.

The GNU FORTRAN-77 compiler, g77, also has the ability to use a version of this C pre-processor, in addition to the ANSI FORTRAN include command. This can be very useful for writing cross-platform FORTRAN, by using constructs like #ifdef.

The GNU C++ compiler, g++, runs source through the same preprocessor. With the -E flag you can stop after preprocessing and see the expanded source:

   g++ -E mySource.cpp > mySource.i


Compiler

The GNU compiler, usually gcc, has the primary purpose of converting preprocessed C code into machine-specific assembler. The gcc driver can also invoke all of the other tools, so it is capable of performing all four steps. However, since it delegates to other executables, we consider the compilation-to-assembler stage to be its primary purpose.

You can examine your assembler code with a command like:

   gcc -S mySource.c

It should produce a file like mySource.s containing assembler code.

This is the real diamond of the GNU Toolchain. It is the hardest part to implement, since it involves very complicated processes like optimization. Unfortunately, GNU optimization is usually inferior to that of commercial tools such as Intel's icc.

How Computers Reproduce

The most exciting thing about this part of the GNU toolchain is that it is the key to how computers reproduce. The GNU compiler can be built as a cross-compiler. This means that I can build a version of gcc on my vanilla Linux box which produces code for the Broadcom processor in my Linksys router. With this tool, I can develop a custom kernel for the Linksys. Once I have a kernel, I can use the same cross-compiler to build the other tools from a small-Linux package to create a custom mini-Linux box.

In general, this is how you would bring up a new computer: use a working computer to build a new gcc cross-compiler (write a cross-assembler too, but that is a relatively trivial task), build bootstrap code, burn it to flash/ROM, plug it into the new machine, and boot the new hardware.

The relative ease of this process is the basis of all UN*X-like operating systems, including the Mac. It is a large part of the reason why Linux and NetBSD have been ported to so many platforms, and why SunOS has run on so many chips (old Motorolas, then Sun's own RISC, then the 386, then more RISC, including 64-bit, then modern Solaris for x86...).


Assembler

The assembler is responsible for converting machine-specific assembly code into native object code. You can use the GNU assembler, as, to do this; that is what I have almost always done. However, if the gcc-produced assembler meets the vendor's specs, you can often use the vendor-supplied assembler instead. I think I may have done that once on a large IBM system for some reason, I forget why.

This is a relatively simple process, since there is a fairly straightforward mapping between the ASCII text assembly files and the binary object code. The tricky part is that there are many object file formats, such as ELF, COFF, and perhaps others from specific vendors. Kind of annoying, but I'm sure these formats are fairly straightforward. I'm a little out of my depth here.


Linker

Object code is still in some ways generic. More work needs to be done to stitch it together into an executable. The GNU linker, ld, can do this. Again, at this stage it is often possible to substitute the vendor-supplied linker. I'm pretty sure that GNU ld was inadequate for that odd AIX cluster I used once, so we used the IBM linker.

The linker can read typical object (.o) files and object archives (.a), and it can insert code to load sharable object files (.so) at run time.

There are special flags on the GNU linker to produce relocatable objects, sharable objects, or executables. Archives, though, are built by a separate tool: ar, the GNU archiver, which ships alongside the linker in binutils but is not part of ld itself.

One of the great things about the GNU linker is that on at least SOME platforms, it can convert between .so and .a. I recall an issue where a major software package for the Alpha was available only as a .a, and we needed it as a .so to load as a plug-in for another application. All we needed to do was write the plug-in duct tape, link it against the .a, and specify sharable output. This produced a viable plug-in. The vendor-supplied linker would NOT do this. As I recall, on SunOS we had to go back to the source, alter the Makefiles (which did not produce a .so), and rebuild the package. That was a very expensive process (it took about a week), all because the package provider saw no reason to support .so.