post update - boot2 - Playing with the boostrap

commit 932b2491b2cd4b0d21da8e12eb8796f6deb74da5
parent 67f4606e7e4c31112760ac9855384a4315e3a7df
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 23 Apr 2026 11:15:21 -0700

post update

Diffstat:
M post.md  | 245 ++++++++++++++++++++++++++++++++++++++++++++++---------------------------------

1 file changed, 143 insertions(+), 102 deletions(-)
diff --git a/post.md b/post.md
@@ -1,31 +1,45 @@
-# Title
+# Playing with the bootstrap
 
-Your compiler compiles your code, but what compiled the compiler? And the one
-before that?
+## Where does it all come from?
 
-Ken Thompson freaked everyone out in 1983 by describing a nasty
-self-replicating attack on compilers themselves that reminded folks that it
-might be a good idea to have a "full bootstrap" from as small a binary seed as
-could be managed and then audited source from there forward to the toolchains
-we know and (maybe-kinda) love today.
+Computers execute programs. Programs are specified as machine code, binary
+blobs loaded into memory that the processor interprets on the fly.
 
-Jeremiah Orians and Jan Nieuwenhuizen delivered exactly that to the world in
-2023, going from `hex0-seed`, a few hundred byte seed binary that is the "hex0"
-compiler, through Fabrice Bellard's Tiny C Compiler (TCC), and then up through
-GCC and the rest of the stack. Truly awesome work.
+Humans write programs (well, some humans do, and now many non-humans as well).
+But humans specify programs in programming languages, not machine code. It's
+the job of a compiler to translate to machine code.
 
-Between "hex0" and TCC, there's a minimal assembler and linker in M0 and hex2,
-then a "subset C" compiler that builds out the Mes Scheme ecosystem, then
-a Mes C compiler that compiles TCC.
+The compiler is itself a program. Where did it come from? Well, it was compiled
+of course, by another compiler!
 
-From seed to M0 and hex2 is ~2KLOC. The "middle" stage is ~25KLOC to get to
-Mes Scheme and a collection of build and packaging tools. And the "late" stage
-of the Mes C compiler and TCC sources are another ~25KLOC.
+Uh oh. Infinite regress? Not quite. It bottoms out in something directly
+specified in machine code, a "binary seed" from which the rest sprouts.
 
-I find M0 and hex2 to be very cute and everything after that to be rather
-goopy. So, let's dive into the cuteness!
+For about 30 years though, we had lost the seed. We were just going around
+using whatever compiler was on hand, figuring that there always would be one
+on hand, and trusting that it would do what it said on its tin. People noticed
+it was a bit odd that we had lost the chain, that we couldn't compile the
+compiler if we had lost the compiler, but not much was done about it.
 
-Getting there is fairly straightforward:
+In 1983, Ken Thompson told a little horror story that spooked folks, but still,
+it didn't really change much. Like many good horror stories, this one was about
+a little invisible demon. Someone had put a virus into the compiler that
+replicated itself into all future compilers and would scrub itself out of
+existence if you ever tried compiling a program that went looking for it. All
+of our compilers were infected and none of us knew. Shivers.
+
+## Bottle universe
+
+Nobody did anything about it. Until Jeremiah Orians and Jan Nieuwenhuizen
+decided to do something about it, and in 2023, they delivered "the full
+bootstrap". A few hundred bytes of binary seed that defined a little language
+called "hex0", and then source files from there on until they could compile
+Fabrice Bellard's Tiny C Compiler (TCC). Once you have TCC, you compile an old
+version of GCC that was still written in C (GCC moved to C++), then compile a
+more recent GCC, then you use that to compile everything else, including Linux.
+Truly amazing work.
+
+Jeremiah's early chain is quite beautiful:
 
 ```
 hex0-seed = given
@@ -38,60 +52,117 @@ M0 = hex2(catm(ELF.hex2, M0.hex2))
 ```
 
 **hex0**: the minimal seed hex compiler; it turns pairs of hexadecimal digits
-into raw bytes. Supports # and ; line comments.
+into raw bytes. Supports `#` and `;` line comments.
 
 **hex1**: a small extension of hex0 that adds the first notion of symbolic
-control flow. Features everything in hex0, single-character labels, one size
-of relative jump/offset encoding.
+control flow. Adds single-character labels and 32-bit relative offsets to a
+label.
 
 **hex2**: the final "hex language" and first real linker-like stage in the
-chain. Features everything in hex1, labels of arbitrary length, relative
-references of multiple sizes (!, @, %), absolute references ($, &), enough
-symbol/address resolution to link later stages.
+chain. Extends hex1 with labels of arbitrary length, 8/16/32-bit relative
+references (!, @, %), and 16/32-bit absolute references ($, &).
+
+**catm**: a minimal file concatenation utility, takes N files and outputs 1.
+
+**M0**: a basic macro preprocessor built on top of hex2, used as a mini
+assembler. Features DEFINE substitution of names to hex bytes, hex and ascii
+strings, 8/16/32-bit decimal and hex immediates (!, @, %).
+
+From this base, they build a "subset C" compiler and a mini shell called "kaem"
+in M0, then use that to build up to a little userspace called "M2-Planet" and
+the Mes Scheme interpreter and ecosystem. They then define another C compiler
+in Scheme (MesCC), and use that to compile TCC. And then you're off to the races.
+
+There's something elegant and simple and beautiful about this bottle universe
+they constructed: an assembler and linker, a Scheme, a C. Timeless. What more
+do you need really?
+
+## An alternative path
+
+Up through M0, it's about 2000 lines of code (2KLOC). But it gets goopy quick.
+It's ~30KLOC to get from M0 to MesCC, and then ~20KLOC for TCC itself. And
+the early bits defined in M1, like the subset C compiler, are all
+arch-specific.
+
+For fun, learning, and tremendous profit, I thought I'd play around with an
+alternative path. One that hopefully will knock down that KLOC count and maybe
+tidy up this little bottle universe that exists at the bottom of the software
+sea.
+
+There are 2 parts. First, get to portability sooner, to knock down the amount
+of arch-specific code that's needed. I do this by defining a pseudo-ISA called
+P1 that can be mapped to each arch via simple expansions. Second, define a
+richer macro preprocessor to make P1 programming more ergonomic, ergonomic
+enough to... Third, define a Scheme interpreter in P1. Then this Scheme becomes
+the base to define a shell and a C compiler.
 
-**catm**: a minimal file concatenation utility, take N files and output 1.
+Here's what I'm aiming for, in Schemey pseudocode:
 
-**M0**: the first assembler built on top of hex2. Features DEFINE substitution
-of names to hex bytes, immediate values in various sizes/formats, symbolic
-assembly over hex2.
+```
+;; Bootstrap from seed, same as before
+(define hex2 (((hex0-seed hex0.hex0) hex1.hex0) hex2.hex1))
+(define catm (hex2 catm.hex2))
+(define M0   (hex2 (catm ELF.hex2 M0.hex2)))
+
+;; Compile+Link
+(defn exe (M1-src) (hex2 (catm ELF.hex2 (M0 M1-src))))
+
+;; We define a portable pseudo-ISA called P1, and use that to define a richer
+;; macro expander called m1pp (ht Bjarne)
+(define m1pp (exe (catm P1.M1 m1pp.M1)))
+
+;; From there, we redefine P1 in terms of m1pp, and use P1 sources
+;; P1A.M1pp is the arch-specific "backend" file
+;; P1pp.M1pp is P1 exposed as M1pp macro definitions that sources use
+(defn p1exe (P1-src) (exe (m1pp (catm P1A.M1pp P1pp.M1pp P1-src))))
+
+;; To Scheme!
+(define scheme (p1exe scheme.P1))
+
+;; And finally C
+(defn scc (C-src) (p1exe (scheme cc.scm C-src)))
+(define tcc0 (scc tcc.c))   ;; compiler: scheme cc.scm
+(define tcc1 (tcc0 tcc.c))  ;; compiler: scheme-compiled tcc 
+(define tcc  (tcc1 tcc.c))   ;; compiler: tcc-compiled tcc
+```
 
-M0 programs are architecture-specific because the DEFINEd set of mnemonics and
-their byte-level encodings are architecture-specific. So that means that the
-subset C compiler, defined in M0, has to be written once per architecture.
-That's a drag.
+So the new bits we need:
+- m1pp: P1.M1, m1pp.M1
+- P1: P1A.M1pp, P1pp.M1pp
+- Scheme: scheme.P1
+- C: cc.scm
 
-I thought it'd be rather fun to play around at this low level and see if we
-could build off M0 in a way that might get us to a functioning C compiler
-through a path that cuts down on the amount of code in that "middle" layer.
+Today, I present just m1pp and P1 defined in it. Next will come a Scheme in P1,
+and finally enough of a C compiler in Scheme to compile the tcc source.
 
-So I present to you "P1", a portable pseudo-ISA targetable via M1 DEFINEs, and
-"M1M", a macro expander for M1, written in P1.
+## P1
 
-P1
+"P1" is a portable pseudo-ISA targetable via M1 DEFINEs.
 
 Registers:
 
-- `a0`–`a3` — argument registers. Also caller-saved general registers.
+- `a0`–`a3` — argument registers, and caller-saved general registers.
 - `t0` — caller-saved temporary.
 - `s0`–`s1` — callee-saved general registers.
 - `sp` — stack pointer.
 
 Ops:
-- Materialization: `LI`, `LA`, `LA_BR`
+- Materialization: `LI` (load immediate), `LA` (load address), `LA_BR` (load branch register)
 - Moves: `MOV`
 - Arithmetic: `ADD`, `SUB`, `AND`, `OR`, `XOR`, `SHL`, `SHR`, `SAR`, `MUL`,
   `DIV`, `REM`
-- Immediate arithmetic: `ADDI`, `ANDI`, `ORI`, `SHLI`, `SHRI`, `SARI`
+- Arithmetic with immediates: `ADDI`, `ANDI`, `ORI`, `SHLI`, `SHRI`, `SARI`
 - Memory: `LD`, `ST`, `LB`, `SB`
 - Branching: `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BEQZ`, `BNEZ`, `BLTZ`
 - Calls / returns: `CALL`, `CALLR`, `RET`, `TAIL`, `TAILR`
 - Frame management: `ENTER`, `LEAVE`
-- ABI access: `LDARG`
+- ABI arg access: `LDARG`
 - System: `SYSCALL`
 
-It's a deliberately dumb little RISC: a handful of visible registers, a hidden
-branch/call target register behind `LA_BR`, and a generator that spits out the
-per-arch `DEFINE` tables for 32-bit and 64-bit x86, ARM, and RISC-V.
+It's a dumb little RISC, with a bit of sugar for frame management and tail
+calling: a handful of visible registers, a hidden branch/call target register
+behind `LA_BR`. It's meant to be implemented by a set of per-arch `DEFINEs`,
+aiming for 32-bit and 64-bit x86, ARM, and RISC-V.
 
 Calling convention:
 
@@ -100,81 +171,51 @@ Calling convention:
   `LDARG`.
 - `a0` is also the one-word return register.
 - `a0`-`a3` and `t0` are caller-saved; `s0`-`s1` and `sp` are callee-saved.
-- If you need a frame, `ENTER` builds the standard one and `LEAVE` tears it
-  down.
+- `ENTER` builds the standard frame and `LEAVE` tears it down.
 - Stack-passed outgoing args are staged in frame-local space before `CALL`.
 - Wider-than-one-word returns use the usual hidden pointer trick: caller passes
   a writable result buffer in `a0`, real args slide over by one, callee returns
   that same pointer in `a0`.
 
-That's enough to write a portable macro expander, a Lisp, or a tiny C
-compiler once, then retarget by swapping out the generated M1 defs instead of
-rewriting the whole program per architecture. That cuts down auditable code at
-the bottom of the stack quite a bit. And it may give us a shorter path from
-M0 to TCC (but no promises). Mostly it's just a cute base to work on.
+It ships as a pair of files you catm before your P1 source:
+- P1A.m1pp: architecture-specific backend implementing the portable interface
+- P1.m1pp: the portable interface that P1 sources program against
 
-```
-;; Bootstrap from seed
-(define hex0 (hex0-seed hex0.hex0))
-(define hex1 (hex0 hex1.hex0))
-(define hex2 (hex1 hex2.hex1))
-(define catm (hex2 catm.hex2))
-(define M0   (hex2 (catm ELF.hex2 M0.hex2)))
-
-(defn link (hex2-src) (hex2 (catm ELF.hex2 hex2-src)))
-
-;; Bootstrap the macro expander once without macros.
-(define m1m (link (M0 (catm P1.M1 m1m.M1))))
+## m1pp
 
-;; Normal P1 build: expand macros, then assemble/link.
-;; P1A.M1=arch-specific backend, P1M.M1=P1 macros
-(defn p1compile (src) (M0 (m1m (catm P1A.M1 P1M.M1 src))))
-(defn p1exe (src) (link (p1compile src)))
-```
-
-Now with P1, with macro support, we have a half-decent portable assembler.
-We use that to build a Scheme with file IO and process support.
-
-```
-;; Build the language that will host the rest of the climb.
-(define scheme (p1exe scheme.P1))
-```
+m1pp is a macro expander:
+- Define macros: `%macro NAME(a, b) ... %endm`
+- Use macros: `%NAME(x, y)`. These can appear within macros as well, so it's a
+  recursive expansion.
+- Concat tokens: `a ## b`
+- Evaluate arithmetic integer expressions and emit little-endian hex:
+  `!(expr)` (8-bit), `@(expr)` (16-bit), `%(expr)` (32-bit), `$(expr)` (64-bit)
+- Conditional expansions: `%select(cond_expr, then, else)`
 
-And then Scheme is our shell and our C compiler.
+The expression syntax is, of course, Lisp-shaped:
 
 ```
-(defn scc (src) (scheme cc.scm src))
-(defn ccexe (src) (p1exe (scc src)))
+atoms: decimal or 0x-prefixed integer literals
+calls: (+ a b) (- a b) (* a b) (/ a b) (% a b) (<< a b) (>> a b)
+       (= a b) (!= a b) (< a b) (<= a b) (> a b) (>= a b)
+       (& a b) (| a b) (^ a b) (~ a)
 ```
 
-So, just like in the current bootstrap, a unified source TCC would then compile
-as:
+It's enough for all per-arch instruction encoding and defining P1 mnemonics.
 
-```
-(define tcc0 (ccexe tcc.c))  ;; compiler: scheme
-(define tcc1 (tcc0 tcc.c))   ;; compiler: scheme-compiled tcc 
-(define tcc (tcc1 tcc.c))    ;; compiler: tcc-compiled tcc
-```
-
-So, what does P1 code with the macro expansion base look like?
+## Example P1 program
 
 ```
-## copy.M1
-##
-## And symbolic constants:
-##   SYS_open SYS_read SYS_write SYS_close SYS_exit O_RDONLY
-##   WORD
-##   SAVE_S0 SAVE_S1 BUF_OFF COPY_LOCALS BUF_SIZE
-##
-## Entry convention:
-##   a0 = argc
-##   a1 = argv
+## copy.P1
 
 DEFINE BUF_SIZE %128
 DEFINE SAVE_S0 %16
 DEFINE SAVE_S1 %24
 DEFINE COPY_LOCALS %144
 
+## Entry convention:
+##   a0 = argc
+##   a1 = argv
 :main
   %li(t0, 2)
   %blt(a0, t0, usage)          ; if argc < 2, exit 2

	boot2 Playing with the boostrap
	git clone https://git.ryansepassi.com/git/boot2.git
	Log \| Files \| Refs