boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs | README

commit 6f73803c2215ca80eee97a3c0c85d1238d6113a1
parent 5d78449f72b14b1a16ce75235d12da6c4e14d1c5
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 24 Apr 2026 17:52:57 -0700

rm post, moved to site

Diffstat:
Dpost.md | 423-------------------------------------------------------------------------------
1 file changed, 0 insertions(+), 423 deletions(-)

diff --git a/post.md b/post.md @@ -1,423 +0,0 @@ -# Playing with the bootstrap - -## Where does it all come from? - -Computers execute programs. Programs are specified as machine code, binary -blobs loaded into memory that the processor interprets on the fly. - -Humans write programs (well, some humans do, and now many non-humans as well). -But humans specify programs in programming languages, not machine code. It's -the job of a compiler to translate to machine code. - -The compiler is itself a program. Where did it come from? Well, it was compiled -of course, by another compiler! - -Uh oh. Infinite regress? Not quite. It bottoms out in something directly -specified in machine code, a "binary seed" from which the rest sprouts. - -For decades, though, we had lost the seed. We were just going around using -whatever compiler was on hand, figuring that there always would be one on hand, -and trusting that it would do what it said on its tin. People noticed it was a -bit odd that we had lost the chain, that we couldn't compile the compiler if -we lost the compiler, but not much was done about it. - -In 1983, Ken Thompson told a little horror story that spooked folks, but still, -it didn't really change much. Like many good horror stories, this one was about -a little invisible demon. Someone had put a virus into the compiler that -replicated itself into all future compilers but would scrub itself out of -existence if you ever tried compiling a program that went looking for it. All -of our compilers were infected and none of us knew. Shivers. (Search for -"Trusting Trust" if you want to read the full story). - -## Bottle universe - -Nobody did anything about it until someone did. Our heroes are Jeremiah Orians -and Jan Nieuwenhuizen; in 2023, they delivered "THE FULL BOOTSTRAP" (tm). A few -hundred bytes of binary seed that defined a little language called "hex0", and -then source files from there on until they could compile Fabrice Bellard's Tiny -C Compiler (TCC). Once you have TCC, you compile an old version of GCC that was -still written in C (GCC has long since moved to C++, woe be unto us all), then -compile a more recent GCC, then you use that to compile everything else, -including Linux. Truly amazing work. - -Jeremiah's early chain is quite beautiful: - -``` -hex0-seed = given - -hex0 = hex0-seed(hex0.hex0) # hex0 compiles itself from source -hex1 = hex0(hex1.hex0) -hex2 = hex1(hex2.hex1) -catm = hex2(catm.hex2) -M0 = hex2(catm(ELF.hex2, M0.hex2)) -``` - -**hex0**: the minimal seed hex compiler; it turns pairs of hexadecimal digits -into raw bytes. Supports `#` and `;` line comments. - -**hex1**: a small extension of hex0 that adds the first notion of symbolic -control flow. Adds single-character labels and 32-bit relative offsets to a -label. - -**hex2**: the final "hex language" and first real linker-like stage in the -chain. Extends hex1 with labels of arbitrary length, 8/16/32-bit relative -references (!, @, %), and 16/32-bit absolute references ($, &). - -**catm**: a minimal file concatenation utility, takes N files and outputs 1. - -**M0**: a basic macro preprocessor built on top of hex2, used as a mini -assembler. Features DEFINE substitution of names to hex bytes, hex and ascii -strings, 8/16/32-bit decimal and hex immediates (!, @, %). - -From this base, they build a "subset C" compiler and a mini shell called "kaem" -in M0 (per architecture), then use that to build up to a little userspace -called "M2-Planet" and the Mes Scheme interpreter and ecosystem. They then -define another C compiler in Scheme (MesCC), and use that to compile TCC. And -then you're off to the races. - -There's something elegant and simple and beautiful about this bottle universe -they constructed: an assembler and linker, a Scheme, a C. Timeless. What more -do you need really? - -## An alternative path - -Up through M0, it's about 2000 lines of code (2KLOC) per architecture. But it -gets goopy quick. It's ~30KLOC to get from M0 to MesCC, and then ~20KLOC for -TCC itself. And the early bits defined in M1, like the subset C compiler, are -all arch-specific. - -For fun, learning, and tremendous profit, I thought I'd play around with an -alternative path. One that hopefully will knock down that KLOC count and maybe -tidy up this little bottle universe that exists at the bottom of the software -sea. - -Two moves. - -**First, portability.** M1 source is arch-specific: not just the bytes you emit -depend on whether you're targeting aarch64 or amd64 or riscv, but the source -code you write is dictated by arch-specific defines. Replace that with a -portable pseudo-ISA called P1. The interface is entirely portable, implemented -by arch-specific defines. Each target ships a standard backend — -`P1A-aarch64.M1`, `P1A-amd64.M1`, and so on — that DEFINEs each P1 mnemonic as -the right raw bytes for that arch. Catm the backend in at build time and the -same P1 source assembles anywhere. Think of it as portable assembly, well -before you get to the portable assembly that we call C, but targetable from -a simple textual expander instead of a full-blown compiler. - -**Second, power.** M1 is austere — DEFINE substitution of simple names and raw -bytes — and writing anything of size in it is miserable. So layer a macro -preprocessor on top and you get M1pp (ht Bjarne). Named-parameter macros, -compile-time integer expressions, conditional expansion, local labels, a few -structural shorthands. Rewrite the P1 files with M1pp macros to get the -arch-specific `P1A.M1pp` and the arch-agnostic `P1.M1pp`; the latter I'll call -P1pp, our richer portable target. Expressive enough to host a Lisp in a file -you might actually want to read. - -Here's what I'm aiming for, in Lispy pseudocode: - -``` -;; Bootstrap from seed, same as before -(define hex0 (hex0-seed hex0.hex0)) -(define hex1 (hex0 hex1.hex0)) -(define hex2 (hex1 hex2.hex1)) -(define catm (hex2 catm.hex2)) -(define M0 (hex2 (catm ELF.hex2 M0.hex2))) - -;; Compile+Link for arch-specific M1 source -(defn exe (M1-src) (hex2 (catm ELF.hex2 (M0 M1-src)))) - -;; P1 — portable pseudo-ISA at the M1 level. -;; P1A.M1 is the arch-specific backend -;; m1pp is itself a P1 program. -(define m1pp (exe (catm P1A.M1 m1pp.P1))) - -;; P1pp — P1 rewritten with m1pp macros. Assemble any P1pp source via m1pp. -;; P1A.M1pp is the arch-specific backend, now rewritten to use M1pp -;; P1.M1pp is the arch-agnostic interface -;; P1pp.P1pp is "libp1pp", P1pp's standard library, niceties and utilities for -;; programming in P1pp -(defn ppexe (src) (exe (m1pp (catm P1A.M1pp P1.M1pp P1pp.P1pp src)))) - -;; To Scheme! -(define scheme (ppexe scheme1.P1pp)) - -;; And finally C -(defn scc (C-src) (ppexe (scheme cc.scm C-src))) -(define tcc0 (scc tcc.c)) ;; compiler: scheme cc.scm -(define tcc1 (tcc0 tcc.c)) ;; compiler: scheme-compiled tcc -(define tcc (tcc1 tcc.c)) ;; compiler: tcc-compiled tcc -``` - -So the new bits are: - -- P1: `P1A.M1` -- m1pp: `m1pp.P1` -- P1pp: `P1A.M1pp`, `P1.M1pp`, `P1pp.P1pp` -- Scheme: `scheme1.P1pp` -- C: `cc.scm` - -Today, I present P1, M1pp, and P1pp — the portability move and the power move. -Next will come a Scheme in P1pp, and finally enough of a C compiler in -Scheme to compile the tcc source. - -## P1 - -P1 is a portable pseudo-ISA. Source looks a lot like assembly; each mnemonic -is a macro, and each arch ships a backend that realizes those mnemonics as -raw bytes for that target. Flip the backend file you catm in, and the same -P1 source assembles for a different arch. - -Registers: - -- `a0`–`a3` — argument and caller-saved general registers. -- `t0`–`t2` — caller-saved temporaries. -- `s0`–`s3` — callee-saved general registers. -- `sp` — stack pointer. - -That's a modest budget — twelve visible registers. The goal is small backends -and uniform source across architectures. Anything beyond this that a backend -needs for encoding (scratch regs, link register, zero register) is -backend-private and exposed only through the portable ops. `LA_BR` is the one -exception on the source side: it loads a label into a hidden *branch-target -register* that the immediately following direct control-flow op (`B`, `CALL`, -`TAIL`, or any conditional branch) then consumes. The register-indirect forms -(`BR`, `CALLR`, `TAILR`) take their target in an ordinary register instead. - -A word is register-sized — 32 bits on 32-bit targets, 64 bits on 64-bit -targets. `LD`/`ST` move words; `LB`/`SB` move bytes. No 16/32-bit slots in -between; code that needs them can OR/shift. - -Ops: - -- Materialization: `LI` (load immediate), `LA` (load label address), `LA_BR` - (load label into the hidden branch-target register). -- Moves: `MOV`. -- Arithmetic: `ADD`, `SUB`, `AND`, `OR`, `XOR`, `SHL`, `SHR`, `SAR`, `MUL`, - `DIV`, `REM`. -- Arithmetic with immediates: `ADDI`, `ANDI`, `ORI`, `SHLI`, `SHRI`, `SARI`. - Immediate widths are backend-dependent; the backend errors if the constant - doesn't fit. -- Memory: `LD`, `ST`, `LB`, `SB`. -- Branching: `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ`. - Signed and unsigned less-than; `>=`, `>`, `<=` are synthesized by swapping - operands or inverting branch sense. -- Calls / returns: `CALL`, `CALLR`, `RET`, `ERET`, `TAIL`, `TAILR`. -- Frame management: `ENTER`. -- ABI arg access: `LDARG` — reads stack-passed incoming args without - hard-coding the frame layout. -- System: `SYSCALL`. - -Calling convention: - -- `a0`-`a3` hold the first four argument words; extras live in an incoming - stack-argument area, read with `LDARG`. -- `a0` is the one-word return register. Two-word returns use `a0`/`a1`. -- `a0`-`a3` and `t0`-`t2` are caller-saved; `s0`-`s3` and `sp` are - callee-saved. -- `ENTER` builds the standard frame; `ERET` tears it down and returns - (`TAIL`/`TAILR` likewise combine teardown with a jump). -- Stack-passed outgoing args are staged in a dedicated frame-local area - before `CALL`, so the callee finds them at a known offset from `sp`. -- Wider-than-two-word returns use the usual hidden-pointer trick: caller - passes a writable result buffer in `a0`, real args slide over by one, - callee returns that same pointer in `a0`. - -Program entry: portable source writes `:p1_main` as an ordinary P1 function -with `argc` in `a0` and `argv` in `a1`, and returns an exit status in `a0`. -The backend emits a small per-arch `:_start` stub that captures the native -entry state, calls `p1_main`, and hands its return value to `sys_exit`. The -portable side never sees the raw entry stack. - -This shape is aimed at portability and simplicity of targeting from pure macro -expansion, sacrificing some performance, which seems a reasonable tradeoff -for the bootstrap. - -P1 comes in two flavors, one per toolchain level. The M1-level flavor is a -pair of files you catm before any P1 source: - -- `P1A.M1` — arch-specific backend, one DEFINE per mnemonic. - -The M1pp-level flavor — P1pp — is the same idea with richer macros, where the -arch-specific encodings are specified in P1A.M1pp via macro instead of -hardcoded like in P1A.M1: - -- `P1A.M1pp` — arch-specific backend. -- `P1.M1pp` — the portable interface. - -## M1pp - -P1-at-the-M1-level gets us portability, but M1 is austere — DEFINE -substitution and raw bytes. Fine for hand-encoding a handful of instructions; -painful for anything of size. m1pp is a small macro expander layered on top -of M0 that makes low-level programming bearable enough to linger down here -near the bottom of the sea. It adds three things M0 doesn't have: macros with -named parameters, integer expressions that evaluate at expand time, and -conditional expansion. - -Features: - -- **Macros**: `%macro NAME(a, b) ... %endm`, called with `%NAME(x, y)`. Macro - bodies can themselves contain macro calls, so expansion is recursive. -- **Integer expressions**: emit little-endian hex from a Lisp expression — - `!(expr)` for 8 bits, `@(expr)` for 16, `%(expr)` for 32, `$(expr)` for 64. - The punctuation mirrors M0's decimal/hex immediate widths (`!@%`) and adds - `$` for the 64-bit case. -- **Conditional expansion**: `%select(cond, then, else)` evaluates `cond` as - an integer and emits exactly one of the two branches. Non-zero is true. -- **Token concat**: `a ## b` pastes two tokens into one identifier. -- **Struct / enum shorthands**: `%struct NAME { f1 f2 }` synthesizes - `%NAME.f1` → 0, `%NAME.f2` → 8, `%NAME.SIZE` → 16. `%enum` does the same - with stride 1 and a `COUNT` terminator. Saves hand-counting offsets for - records and small tag sets. -- **Local labels**: inside a macro body, `:@loop` and `&@loop` pick up a - fresh per-expansion suffix, so a macro can define its own labels without - colliding with itself at a second call site. -- **Scopes**: `%scope NAME ... %endscope` pushes a lexical scope. Inside - it, a label written `::foo` emits as `:NAME__foo` (stack joined by `__` - when nested), so several callers can reuse short names without collision. - Resolution follows the caller's scope, not the macro's, so a generic - `%break` or `%continue` inside a scoped loop lands on the right label. -- **Stringify**: `%str(foo)` turns a word token into the string literal - `"foo"`. - -Comments (`#` and `;`) and M0 `DEFINE`s pass through untouched. - -The expression syntax is, of course, Lisp-shaped: - -``` -atoms: decimal or 0x-prefixed integer literals -calls: (+ a b) (- a b) (* a b) (/ a b) (% a b) (<< a b) (>> a b) - (= a b) (!= a b) (< a b) (<= a b) (> a b) (>= a b) - (& a b) (| a b) (^ a b) (~ a) - (strlen "astr") -``` - -Here's a real macro from the aarch64 P1pp backend — the "add register, immediate" -instruction: - -``` -%macro aa64_add_imm(rd, ra, imm12) -%((| 0x91000000 (<< (& imm12 0xFFF) 10) (<< %aa64_reg(ra) 5) %aa64_reg(rd))) -%endm -``` - -A call like `%aa64_add_imm(a0, a0, 8)` expands to a single Lisp expression -that ORs together the opcode, the 12-bit immediate shifted into place, the -source register field, and the destination register field. The outer `%(...)` -evaluates that expression and emits it as four little-endian hex bytes, which -M0 and hex2 then hand to the linker. The nested `%aa64_reg(...)` calls in the -body show that macros can call other macros during expansion — `aa64_reg` is a -tiny macro per register name that produces the aarch64-native register number -(e.g. `a0` → 0, `s0` → 19). - -That one pattern — Lisp expression in, hex bytes out — covers all per-arch -instruction encoding. - -The scope mechanism is what lets libp1pp's standard library define a -generic function macro `%fn`: - -``` -%fn(parse_number, 16, { - ... - LA_BR &::done - BEQZ t0 - ... - ::done -}) -``` - -The `16` is the frame-local byte count passed through to `%enter`. The -`::done` labels mangle to `parse_number__done`, so every function gets -its own short namespace with no hand-prefixing, and scoped loops inside -the body (`%loop_scoped`, `%while_scoped_nez`, etc.) stack further — a -`%break` anywhere inside the innermost loop finds its `_end`. - -## Three programs - -The shortest useful P1 program is a bare return. `p1_main` gets `argc` in -`a0` on entry, `a0` is also the one-word return register, and the backend -stub hands the return value to `sys_exit` — so this returns `argc` as the -exit status: - -``` -:p1_main - %ret - -:ELF_end -``` - -Hello world is a single `sys_write` from a leaf function: - -``` -:p1_main - %li(a0) %sys_write - %li(a1) %1 %0 - %la(a2) &msg - %li(a3) %14 %0 - %syscall - %li(a0) %0 %0 - %ret - -:msg -"Hello, World! -" - -:ELF_end -``` - -A few things to notice. `%li(a0)` and `%la(a2)` don't take the immediate -as a macro argument — the backend emits a load-from-literal-pool prefix -and the bytes that follow in source become the literal. Here -`%sys_write` expands to the 8-byte Linux syscall number for write -(the aarch64 backend's choice); `&msg` is a 4-byte label reference -(addresses fit in 32 bits in the stage0 image layout). The two `%N %0` -pairs are M0 4-byte decimal immediates padded to 8 bytes. - -A function call, with a helper that doubles its argument: - -``` -:double - %shli(a0, a0, 1) - %ret - -:p1_main - %enter(0) - %call(&double) - %eret - -:ELF_end -``` - -`double` is a leaf and needs no frame. `p1_main` is not — it calls -`double`, so it opens a standard frame with `%enter(0)` to preserve the -hidden return-address state across the call, and closes it with -`%eret`, which tears down the frame and returns in one step. Run with -`./double a b c` and the exit status is `8` (argc=4, doubled). - -## What it cost - -Concrete sizes for what's landed today: - -- `m1pp.P1`: ~4KLOC — the P1 source of the macro expander itself. -- `P1.M1pp`: ~150 LOC — the portable P1 interface. -- `P1-{aarch64,riscv64,amd64}.M1pp`: ~{480,430,650} LOC — arch backends. -- `p1pp.P1pp`: ~1KLOC — libp1pp, the standard library of control-flow - macros and utilities. - -(These are non-comment, non-blank LOC) - -We can't really see the full payoff until we have scheme1.P1pp and cc.scm of -course, but I think it's a meaningful improvement over per-arch subset C -compilers, and it's a neat little macro language to boot. Adding support for a -new arch at this level is just a few hundred lines of macro definitions, not a -new compiler. - -I'll be working on a tiny Scheme implementation in P1pp, and will likely edit -P1 and M1pp as I do. - -The dream is that we go from P1pp to Scheme to C via nice direct hops and no -external dependencies until TCC, and specifically, to a tiny single-binary -cross-compiling TCC-like C compiler. A micro Clang/LLVM, maybe 80% of the bang -for 1% of the buck. Projects like TCC, QBE, and Tilde all make me hopeful that -this is possible, and as ever, people like Jeremiah, Jan, and Fabrice are -inspirations. Here's to the crazy ones.