Maestro now supports gcc!

2025-06-30T13:00:00+00:00

A dive into cross compilation and program loading


The goal of Maestro is to be a Linux-compatible operating system that is complete enough to fulfil most use cases while being lightweight. It uses the safety of Rust to help ensure the correctness of the codebase and improve security.

To be usable for production workloads, it needs to support plenty of programmes and libraries. As a guideline, I stated in my last article that my goal for this year is to be able to do software development on Maestro itself.

To fulfil this goal, I needed to port a compiler. This will also allow porting a lot more programmes to Maestro later.

This is now DONE: Maestro supports both gcc and g++, which makes it possible to build C and C++ applications!

Hello world! with gcc

Hello world! with g++

I have also tested donut.c, which is an obfuscated C programme that displays a rotating doughnut.

             k;double sin()
         ,cos();main(){float A=
       0,B=0,i,j,z[1760];char b[
     1760];printf("\x1b[2J");for(;;
  ){memset(b,32,1760);memset(z,0,7040)
  ;for(j=0;6.28>j;j+=0.07)for(i=0;6.28
 >i;i+=0.02){float c=sin(i),d=cos(j),e=
 sin(A),f=sin(j),g=cos(A),h=d+2,D=1/(c*
 h*e+f*g+5),l=cos      (i),m=cos(B),n=s\
in(B),t=c*h*g-f*        e;int x=40+30*D*
(l*h*m-t*n),y=            12+15*D*(l*h*n
+t*m),o=x+80*y,          N=8*((f*e-c*d*g
 )*m-c*d*e-f*g-l        *d*n);if(22>y&&
 y>0&&x>0&&80>x&&D>z[o]){z[o]=D;;;b[o]=
 ".,-~:;=!*#$@"[N>0?N:0];}}/*#****!!-*/
  printf("\x1b[H");for(k=0;1761>k;k++)
   putchar(k%80?b[k]:10);A+=0.04;B+=
     0.02;}}/*****####*******!!=;:~
       ~::==!!!**********!!!==::-
         .,~~;;;========;;;:~-.
             ..,--------,*/

It is compiled with this command line:

gcc donut.c -o donut -ansi

And here is the result:

Aside from a few ANSI escape sequences that are not supported yet, this is the expected behaviour :)

A little anecdote: at the end of the video, just before bash’s prompt, you may see a little heart character.

A little heart before the bash prompt

This is due to the way the TTY works. When Ctrl + C is typed, the TTY receives the ETX (End Of Text) character, which has the ASCII code 3. This is because C is the 3rd letter of the alphabet.

The TTY should interpret this character as a request to send the SIGINT signal to the foreground process group (and thus to the programme currently running on the TTY).

I simply have not implemented echoing the ETX character yet: it should show up as ^C (sending the signal, however, is implemented). So instead, the TTY displays the glyph that has the value 3 in the VGA text mode character set, which is a little heart <3
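The relation between Ctrl + C, the value 3 and the ^C echo can be sketched in a few lines of C. This is only an illustration: the function names are hypothetical and are not taken from Maestro's TTY code.

```c
/* Control code produced by Ctrl + <letter>: only the low 5 bits of the
 * letter are kept, so 'C' (0x43) becomes 3 (ETX). */
unsigned char ctrl_code(char letter)
{
    return (unsigned char)(letter & 0x1f);
}

/* Writes the caret notation of a control character into `out`
 * (for example "^C" for the value 3). `out` must hold at least 3 bytes.
 * Returns 1 if `c` is a control character below 0x20, 0 otherwise
 * (in which case `out` is set to the empty string). */
int caret_notation(unsigned char c, char out[3])
{
    out[0] = '\0';
    if (c >= 0x20)
        return 0;
    out[0] = '^';
    out[1] = (char)(c + 0x40); /* 3 + 0x40 = 'C' */
    out[2] = '\0';
    return 1;
}
```

A TTY that echoes control characters would print the result of caret_notation instead of the raw byte, which is what prevents the heart glyph from showing up.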

For reference, here is the default VGA character set:

VGA characters

Since I had to make a few fixes to the cross-compilation toolchain and the kernel’s programme loading code, I figured I would talk a bit about how it works!

Cross compilation

The first step was to build a cross-compiler, which is a compiler that builds for a different target than the one it is running on.

This is usually necessary when building for a different CPU architecture, but in my case, this is needed because Maestro uses musl, and the computer I build on uses glibc (the GNU C library).

The cross-compilation of gcc actually has to happen several times. This is due to the fact that a given build of gcc can only target one platform. Moreover, the standard C library (musl) is not present at the beginning.

Let A be the compilation target of whatever machine we build the cross-compiler on. B will be the target machine running Maestro (this is going to be x86_64-unknown-linux-musl).

We have to perform roughly the following steps:

  • Build binutils on machine A, targeting machine B, which will be used by the first compilation of gcc
  • Build gcc on machine A, targeting machine B. However:
    • We don’t have the C standard library (musl) built yet, so we have to use the --with-newlib option when configuring it (newlib is another libc) to trick the compiler into not relying on machine A’s libc
    • We cannot yet build the C++ standard library (which is shipped with gcc) since it needs the C standard library, so we also have to use --disable-hosted-libstdcxx
  • Build musl using the previously built gcc, targeting B
  • Build libstdc++, which can now be done since musl has been built, to run on machine B

This first stage gives us a cross-compiler that runs on machine A and produces programmes for machine B. It cannot itself run on Maestro, so we now need to build binutils and gcc again, using our cross-compiler, so that they can run on machine B.

For more details, you can check out the scripts I wrote to build the cross-compilation toolchain here. They are largely inspired by the Linux From Scratch book. Since it is meant for Linux, and Maestro is ABI-compatible with Linux, it fits pretty well.

Program loading

Once built, a programme can be loaded by the kernel in order to be executed.

Two types of ELF files can be loaded as programmes:

  • ET_EXEC: a fixed-position executable (this is what you get when you build with -fno-pie)
  • ET_DYN: a position-independent executable (-fpie)

When loading an ELF programme, if it is ET_DYN, the kernel can load it at the offset it desires (called the “load bias”). However, programmes that are ET_EXEC need to be loaded at the fixed virtual addresses specified in their program headers.
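As a minimal sketch of this distinction, the function below computes where a segment ends up in memory depending on the file type. The numeric values of ET_EXEC and ET_DYN come from the ELF specification; the function name and the preferred_bias parameter are hypothetical and do not reflect Maestro's actual loader.

```c
#include <stdint.h>

/* Values of the e_type field, as defined by the ELF specification. */
#define ELF_ET_EXEC 2 /* fixed-position executable */
#define ELF_ET_DYN  3 /* position-independent executable or shared object */

/* Returns the address at which a segment with virtual address `p_vaddr`
 * is mapped, or 0 if the file type cannot be loaded as a programme.
 * `preferred_bias` stands for the load bias chosen by the kernel; it is
 * ignored for ET_EXEC, whose addresses are fixed at link time. */
uint64_t segment_load_addr(uint16_t e_type, uint64_t p_vaddr,
                           uint64_t preferred_bias)
{
    switch (e_type) {
    case ELF_ET_EXEC:
        return p_vaddr;
    case ELF_ET_DYN:
        return preferred_bias + p_vaddr;
    default:
        return 0;
    }
}
```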

Some ELF programmes contain a programme header of type PT_INTERP, which specifies the path of the ELF interpreter to be used. Since Maestro uses musl by default, this path should be /lib/ld-musl-x86_64.so.1 in our case.
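To illustrate, here is a sketch of how the interpreter path could be extracted from a 64-bit ELF image held in memory. The structure layouts and the PT_INTERP value follow the ELF specification; the names (find_interp, elf64_ehdr, elf64_phdr) are made up for this example, and the code assumes the host has the same endianness as the file.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ELF_PT_INTERP 3 /* program header type holding the interpreter path */

/* 64-bit ELF file header, laid out as in the ELF specification. */
struct elf64_ehdr {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version;
    uint64_t e_entry, e_phoff, e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

/* 64-bit ELF program header. */
struct elf64_phdr {
    uint32_t p_type, p_flags;
    uint64_t p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align;
};

/* Returns a pointer to the NUL-terminated interpreter path inside `image`,
 * or NULL if the image has no valid PT_INTERP program header. */
const char *find_interp(const unsigned char *image, size_t len)
{
    struct elf64_ehdr eh;
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0
        || eh.e_phentsize < sizeof(struct elf64_phdr))
        return NULL;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        struct elf64_phdr ph;
        uint64_t off = eh.e_phoff + (uint64_t)i * eh.e_phentsize;
        if (off > len || len - off < sizeof ph)
            return NULL;
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type != ELF_PT_INTERP)
            continue;
        /* The path must lie inside the file and end with a NUL byte. */
        if (ph.p_filesz == 0 || ph.p_offset > len
            || len - ph.p_offset < ph.p_filesz)
            return NULL;
        if (image[ph.p_offset + ph.p_filesz - 1] != '\0')
            return NULL;
        return (const char *)(image + ph.p_offset);
    }
    return NULL;
}
```

On a programme built for Maestro, such a function would be expected to return /lib/ld-musl-x86_64.so.1, while a statically linked programme would yield NULL.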

From the point of view of the kernel, the following operations need to be done when loading an ELF programme:

  • Map the ELF programme’s segments (represented by “program headers”) into the process’s address space
  • Map the ELF interpreter’s segments into the process’s address space (if any)
  • Allocate a stack for the programme
  • Map the vDSO in memory
  • Write arguments, environment variables and the auxiliary vector onto the process’s stack, according to the System V ABI
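The last step can be sketched as follows. The layout (argc, then the argv pointers, a NULL, the environment pointers, a NULL, then the auxiliary vector terminated by an AT_NULL entry) is the one described by the System V ABI. Everything else is an assumption made for the sake of the example: the function name, the aux_entry structure, and the fact that the words are written to a plain array rather than to a real stack.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the auxiliary vector: a type and a value. */
struct aux_entry {
    uint64_t type;
    uint64_t value;
};

/* Fills `out` with the words found at the initial stack pointer, from the
 * lowest address to the highest:
 *   argc, argv[0..argc-1], NULL, envp[0..envc-1], NULL,
 *   auxv pairs, AT_NULL pair.
 * The strings themselves live elsewhere on the stack; they are represented
 * here only by their addresses. Returns the number of words written, or 0
 * if `out` (of capacity `cap` words) is too small. */
size_t build_initial_stack(uint64_t *out, size_t cap,
                           const uint64_t *argv, size_t argc,
                           const uint64_t *envp, size_t envc,
                           const struct aux_entry *auxv, size_t auxc)
{
    size_t need = 1 + argc + 1 + envc + 1 + 2 * (auxc + 1);
    size_t n = 0;
    if (need > cap)
        return 0;

    out[n++] = argc;
    for (size_t i = 0; i < argc; i++)
        out[n++] = argv[i];
    out[n++] = 0; /* end of argv */
    for (size_t i = 0; i < envc; i++)
        out[n++] = envp[i];
    out[n++] = 0; /* end of envp */
    for (size_t i = 0; i < auxc; i++) {
        out[n++] = auxv[i].type;
        out[n++] = auxv[i].value;
    }
    out[n++] = 0; /* AT_NULL type */
    out[n++] = 0; /* AT_NULL value */
    return n;
}
```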

Note: the stack is executable by default, but this can be disabled with a PT_GNU_STACK program header.

The kernel checks the permissions on this program header to determine whether the stack should be executable or not.
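A minimal sketch of this check, assuming the program headers have already been parsed. The PT_GNU_STACK and PF_X values are the standard ones; the phdr_flags structure is a simplification that only keeps the two fields needed here, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ELF_PT_GNU_STACK 0x6474e551u /* GNU extension: stack permissions */
#define ELF_PF_X         0x1u        /* segment is executable */

/* Only the two fields needed here; a real program header has more. */
struct phdr_flags {
    uint32_t p_type;
    uint32_t p_flags;
};

/* Returns whether the stack should be mapped as executable. */
bool stack_is_executable(const struct phdr_flags *phdrs, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (phdrs[i].p_type == ELF_PT_GNU_STACK)
            return (phdrs[i].p_flags & ELF_PF_X) != 0;
    }
    /* No PT_GNU_STACK header: keep the default, an executable stack. */
    return true;
}
```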

Note that performing relocations, either on the programme itself or on the ELF interpreter, is NOT the kernel’s job. The interpreter is responsible for doing it, even when the programme is statically linked (in which case the interpreter is embedded into the programme itself).

In this part I had the following issues:

  • ELF interpreters were not supported
  • The mmap system call was buggy and was overwriting the zone dedicated for brk

Auxiliary vector and ELF interpreter

The auxiliary vector is a set of values passed to the ELF interpreter, which gives information about the way the programme is loaded. Those values can either be integers or strings.

Important values include:

  • AT_PHDR is the address of the programme’s (not the interpreter’s) program header table (containing segments information)
  • AT_BASE is the address at which the interpreter (not the programme) is loaded
  • AT_ENTRY is the address of the entry point of the programme. This is the address the interpreter jumps to after dynamic linking is done (usually the _start function, which is responsible for calling main)
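From the interpreter's side, reading one of these values boils down to walking an array of (type, value) pairs until the AT_NULL entry. The sketch below uses the standard numeric values of the AT_* constants, but the auxv_entry structure and the auxv_lookup function are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Standard auxiliary vector types (same values as on Linux). */
#define AUX_AT_NULL  0 /* end of the vector */
#define AUX_AT_PHDR  3 /* address of the programme's program headers */
#define AUX_AT_BASE  7 /* base address of the interpreter */
#define AUX_AT_ENTRY 9 /* entry point of the programme */

struct auxv_entry {
    uint64_t a_type;
    uint64_t a_val;
};

/* Looks up `type` in an auxiliary vector terminated by AT_NULL.
 * Returns 1 and stores the value in `*value` if found, 0 otherwise. */
int auxv_lookup(const struct auxv_entry *auxv, uint64_t type, uint64_t *value)
{
    for (; auxv->a_type != AUX_AT_NULL; auxv++) {
        if (auxv->a_type == type) {
            *value = auxv->a_val;
            return 1;
        }
    }
    return 0;
}
```

In regular userspace programmes, both glibc and musl expose the same information through getauxval, declared in <sys/auxv.h>.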

The ELF interpreter uses those values to locate the programme in memory, then reads it to figure out which shared libraries it depends on (if any).

Then, the ELF interpreter, the ELF programme and the shared libraries it depends on all collaborate to perform relocations (that is, compute the final address of each symbol depending on the address at which each shared library is loaded).
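The simplest kind of relocation on x86_64 is R_X86_64_RELATIVE, where the value to store is the load address of the object plus a constant addend. The sketch below applies this kind only and skips every other type. The elf64_rela layout and the relocation number come from the ELF and x86_64 ABI specifications; the function itself is an illustration that works on a byte buffer standing in for the loaded image, not code from musl or Maestro.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define R_X86_64_RELATIVE_TYPE 8 /* value = base + addend */

/* Relocation entry with explicit addend, as in the ELF specification. */
struct elf64_rela {
    uint64_t r_offset; /* where to write, relative to the load address */
    uint64_t r_info;   /* symbol index (high 32 bits) and type (low 32 bits) */
    int64_t  r_addend;
};

/* Applies the R_X86_64_RELATIVE relocations of an object loaded at `base`.
 * `image` is a buffer holding the loaded object. Entries of another type,
 * or pointing outside the buffer, are skipped. Returns the number of
 * relocations applied. */
int apply_relative_relocs(unsigned char *image, size_t image_len,
                          uint64_t base,
                          const struct elf64_rela *rela, size_t count)
{
    int applied = 0;
    for (size_t i = 0; i < count; i++) {
        uint64_t value;
        if ((rela[i].r_info & 0xffffffffu) != R_X86_64_RELATIVE_TYPE)
            continue;
        if (rela[i].r_offset > image_len
            || image_len - rela[i].r_offset < sizeof value)
            continue;
        value = base + (uint64_t)rela[i].r_addend;
        memcpy(image + rela[i].r_offset, &value, sizeof value);
        applied++;
    }
    return applied;
}
```

Relocations that refer to symbols need a symbol lookup across the loaded objects on top of this, which is where most of the work of a dynamic linker goes.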

When relocations have been computed, the final programme image is ready, and the interpreter jumps to the AT_ENTRY address to execute the programme.

Running the GNU compiler

To build a programme (without LTO), GCC runs the following sub-programmes:

  • cc1: the actual C compiler (or cc1plus for C++)
  • as: turns the resulting assembly into object files
  • collect2: links object files a first time to look for constructor functions, then creates a table of them to be called by __main at start
    • ld: final executable linking

Those programmes are located in /usr/lib/gcc/x86_64-unknown-linux-musl/<gcc-version>.

It turns out that cc1 is surprisingly big (more than 300 MB). I had to rewrite the ELF parser to avoid reading the whole file at once, since it was way too slow.
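The general idea can be shown in userspace C: instead of loading the whole file, read the 64-byte ELF header, then seek to whatever part is needed next. This is only a sketch of the approach under my own assumptions (64-bit ELF, same endianness as the host, hypothetical names), not a description of how Maestro's parser is actually written. The field offsets are those of the ELF64 header.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reads exactly `len` bytes at `offset`. Returns 0 on success, -1 otherwise. */
int read_exact_at(FILE *f, long offset, void *buf, size_t len)
{
    if (fseek(f, offset, SEEK_SET) != 0)
        return -1;
    return fread(buf, 1, len, f) == len ? 0 : -1;
}

/* Location of the program header table inside the file. */
struct phdr_table {
    uint64_t offset;     /* e_phoff */
    uint16_t entry_size; /* e_phentsize */
    uint16_t count;      /* e_phnum */
};

/* Reads only the ELF64 header and extracts where the program headers are.
 * Returns 0 on success, -1 if the header cannot be read or is not ELF. */
int read_phdr_table_location(FILE *f, struct phdr_table *out)
{
    unsigned char hdr[64];
    if (read_exact_at(f, 0, hdr, sizeof hdr) != 0
        || memcmp(hdr, "\177ELF", 4) != 0)
        return -1;
    memcpy(&out->offset, hdr + 32, sizeof out->offset);         /* e_phoff */
    memcpy(&out->entry_size, hdr + 54, sizeof out->entry_size); /* e_phentsize */
    memcpy(&out->count, hdr + 56, sizeof out->count);           /* e_phnum */
    return 0;
}
```

The program headers can then be fetched with a second call to read_exact_at, and each segment is only touched when it gets mapped.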

Aside from this minor difficulty, supporting gcc was surprisingly easy once the required system calls were implemented (which was already the case). I just had to make a minor fix to the _llseek implementation which did not support negative offsets.
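For context, _llseek receives the offset as two 32-bit halves (offset_high and offset_low), which have to be combined into one signed 64-bit value. The sketch below shows one way to do this and to apply the result relative to the current position. It illustrates the kind of arithmetic involved and is not the actual fix: the function names are hypothetical.

```c
#include <stdint.h>

/* Combines the two halves passed to _llseek into a signed 64-bit offset.
 * The conversion to int64_t relies on two's complement wrapping, which is
 * what gcc and clang implement. */
int64_t llseek_offset(uint32_t offset_high, uint32_t offset_low)
{
    uint64_t raw = ((uint64_t)offset_high << 32) | offset_low;
    return (int64_t)raw;
}

/* Computes the new position for a seek relative to the current position.
 * Returns 0 and stores the position in `*result`, or -1 if the current
 * position is invalid or the result would be negative or overflow. */
int seek_from_current(int64_t pos, int64_t off, int64_t *result)
{
    if (pos < 0)
        return -1;
    if (off > 0 && pos > INT64_MAX - off)
        return -1; /* would overflow */
    if (off < 0 && pos + off < 0)
        return -1; /* would end up before the start of the file */
    *result = pos + off;
    return 0;
}
```

With both halves set to 0xffffffff, llseek_offset yields -1, which moves the position one byte backwards when combined with seek_from_current.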

And voilà!

What’s next?

Now that I am able to build C and C++ code, the next step is to port the programmes necessary for software development. This includes (non-exhaustive):

  • autoconf
  • automake
  • cargo
  • git
  • grep
  • make
  • rustc
  • vim

Since building those takes a bit of time, it makes sense to first add support for Symmetric MultiProcessing (SMP), an important feature Maestro lacks so far. This will allow using all the CPU cores of the machine, which will make building much faster.

cargo and git will also require network support, which should come by the end of the year. This will likely be the subject of a few blog articles.

All of this is the goal for the end of the year 2025.

For 2026, I will try building a desktop environment for Maestro.

I have already started writing an implementation of X11 in the past, although I would like to give Wayland a try since it is more modern.

Stay tuned!