commit 6f73803c2215ca80eee97a3c0c85d1238d6113a1
parent 5d78449f72b14b1a16ce75235d12da6c4e14d1c5
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Fri, 24 Apr 2026 17:52:57 -0700
rm post, moved to site
Diffstat:
| D | post.md | | | 423 | ------------------------------------------------------------------------------- |
1 file changed, 0 insertions(+), 423 deletions(-)
diff --git a/post.md b/post.md
@@ -1,423 +0,0 @@
-# Playing with the bootstrap
-
-## Where does it all come from?
-
-Computers execute programs. Programs are specified as machine code, binary
-blobs loaded into memory that the processor interprets on the fly.
-
-Humans write programs (well, some humans do, and now many non-humans as well).
-But humans specify programs in programming languages, not machine code. It's
-the job of a compiler to translate to machine code.
-
-The compiler is itself a program. Where did it come from? Well, it was compiled
-of course, by another compiler!
-
-Uh oh. Infinite regress? Not quite. It bottoms out in something directly
-specified in machine code, a "binary seed" from which the rest sprouts.
-
-For decades, though, we had lost the seed. We were just going around using
-whatever compiler was on hand, figuring that there always would be one on hand,
-and trusting that it would do what it said on its tin. People noticed it was a
-bit odd that we had lost the chain, that we couldn't compile the compiler if
-we lost the compiler, but not much was done about it.
-
-In 1983, Ken Thompson told a little horror story that spooked folks, but still,
-it didn't really change much. Like many good horror stories, this one was about
-a little invisible demon. Someone had put a virus into the compiler that
-replicated itself into all future compilers but would scrub itself out of
-existence if you ever tried compiling a program that went looking for it. All
-of our compilers were infected and none of us knew. Shivers. (Search for
-"Trusting Trust" if you want to read the full story).
-
-## Bottle universe
-
-Nobody did anything about it until someone did. Our heroes are Jeremiah Orians
-and Jan Nieuwenhuizen; in 2023, they delivered "THE FULL BOOTSTRAP" (tm). A few
-hundred bytes of binary seed that defined a little language called "hex0", and
-then source files from there on until they could compile Fabrice Bellard's Tiny
-C Compiler (TCC). Once you have TCC, you compile an old version of GCC that was
-still written in C (GCC has long since moved to C++, woe be unto us all), then
-compile a more recent GCC, then you use that to compile everything else,
-including Linux. Truly amazing work.
-
-Jeremiah's early chain is quite beautiful:
-
-```
-hex0-seed = given
-
-hex0 = hex0-seed(hex0.hex0) # hex0 compiles itself from source
-hex1 = hex0(hex1.hex0)
-hex2 = hex1(hex2.hex1)
-catm = hex2(catm.hex2)
-M0 = hex2(catm(ELF.hex2, M0.hex2))
-```
-
-**hex0**: the minimal seed hex compiler; it turns pairs of hexadecimal digits
-into raw bytes. Supports `#` and `;` line comments.
-
-**hex1**: a small extension of hex0 that adds the first notion of symbolic
-control flow. Adds single-character labels and 32-bit relative offsets to a
-label.
-
-**hex2**: the final "hex language" and first real linker-like stage in the
-chain. Extends hex1 with labels of arbitrary length, 8/16/32-bit relative
-references (!, @, %), and 16/32-bit absolute references ($, &).
-
-**catm**: a minimal file concatenation utility, takes N files and outputs 1.
-
-**M0**: a basic macro preprocessor built on top of hex2, used as a mini
-assembler. Features DEFINE substitution of names to hex bytes, hex and ascii
-strings, 8/16/32-bit decimal and hex immediates (!, @, %).
-
-From this base, they build a "subset C" compiler and a mini shell called "kaem"
-in M0 (per architecture), then use that to build up to a little userspace
-called "M2-Planet" and the Mes Scheme interpreter and ecosystem. They then
-define another C compiler in Scheme (MesCC), and use that to compile TCC. And
-then you're off to the races.
-
-There's something elegant and simple and beautiful about this bottle universe
-they constructed: an assembler and linker, a Scheme, a C. Timeless. What more
-do you need really?
-
-## An alternative path
-
-Up through M0, it's about 2000 lines of code (2KLOC) per architecture. But it
-gets goopy quick. It's ~30KLOC to get from M0 to MesCC, and then ~20KLOC for
-TCC itself. And the early bits defined in M1, like the subset C compiler, are
-all arch-specific.
-
-For fun, learning, and tremendous profit, I thought I'd play around with an
-alternative path. One that hopefully will knock down that KLOC count and maybe
-tidy up this little bottle universe that exists at the bottom of the software
-sea.
-
-Two moves.
-
-**First, portability.** M1 source is arch-specific: not just the bytes you emit
-depend on whether you're targeting aarch64 or amd64 or riscv, but the source
-code you write is dictated by arch-specific defines. Replace that with a
-portable pseudo-ISA called P1. The interface is entirely portable, implemented
-by arch-specific defines. Each target ships a standard backend —
-`P1A-aarch64.M1`, `P1A-amd64.M1`, and so on — that DEFINEs each P1 mnemonic as
-the right raw bytes for that arch. Catm the backend in at build time and the
-same P1 source assembles anywhere. Think of it as portable assembly, well
-before you get to the portable assembly that we call C, but targetable from
-a simple textual expander instead of a full-blown compiler.
-
-**Second, power.** M1 is austere — DEFINE substitution of simple names and raw
-bytes — and writing anything of size in it is miserable. So layer a macro
-preprocessor on top and you get M1pp (ht Bjarne). Named-parameter macros,
-compile-time integer expressions, conditional expansion, local labels, a few
-structural shorthands. Rewrite the P1 files with M1pp macros to get the
-arch-specific `P1A.M1pp` and the arch-agnostic `P1.M1pp`; the latter I'll call
-P1pp, our richer portable target. Expressive enough to host a Lisp in a file
-you might actually want to read.
-
-Here's what I'm aiming for, in Lispy pseudocode:
-
-```
-;; Bootstrap from seed, same as before
-(define hex0 (hex0-seed hex0.hex0))
-(define hex1 (hex0 hex1.hex0))
-(define hex2 (hex1 hex2.hex1))
-(define catm (hex2 catm.hex2))
-(define M0 (hex2 (catm ELF.hex2 M0.hex2)))
-
-;; Compile+Link for arch-specific M1 source
-(defn exe (M1-src) (hex2 (catm ELF.hex2 (M0 M1-src))))
-
-;; P1 — portable pseudo-ISA at the M1 level.
-;; P1A.M1 is the arch-specific backend
-;; m1pp is itself a P1 program.
-(define m1pp (exe (catm P1A.M1 m1pp.P1)))
-
-;; P1pp — P1 rewritten with m1pp macros. Assemble any P1pp source via m1pp.
-;; P1A.M1pp is the arch-specific backend, now rewritten to use M1pp
-;; P1.M1pp is the arch-agnostic interface
-;; P1pp.P1pp is "libp1pp", P1pp's standard library, niceties and utilities for
-;; programming in P1pp
-(defn ppexe (src) (exe (m1pp (catm P1A.M1pp P1.M1pp P1pp.P1pp src))))
-
-;; To Scheme!
-(define scheme (ppexe scheme1.P1pp))
-
-;; And finally C
-(defn scc (C-src) (ppexe (scheme cc.scm C-src)))
-(define tcc0 (scc tcc.c)) ;; compiler: scheme cc.scm
-(define tcc1 (tcc0 tcc.c)) ;; compiler: scheme-compiled tcc
-(define tcc (tcc1 tcc.c)) ;; compiler: tcc-compiled tcc
-```
-
-So the new bits are:
-
-- P1: `P1A.M1`
-- m1pp: `m1pp.P1`
-- P1pp: `P1A.M1pp`, `P1.M1pp`, `P1pp.P1pp`
-- Scheme: `scheme1.P1pp`
-- C: `cc.scm`
-
-Today, I present P1, M1pp, and P1pp — the portability move and the power move.
-Next will come a Scheme in P1pp, and finally enough of a C compiler in
-Scheme to compile the tcc source.
-
-## P1
-
-P1 is a portable pseudo-ISA. Source looks a lot like assembly; each mnemonic
-is a macro, and each arch ships a backend that realizes those mnemonics as
-raw bytes for that target. Flip the backend file you catm in, and the same
-P1 source assembles for a different arch.
-
-Registers:
-
-- `a0`–`a3` — argument and caller-saved general registers.
-- `t0`–`t2` — caller-saved temporaries.
-- `s0`–`s3` — callee-saved general registers.
-- `sp` — stack pointer.
-
-That's a modest budget — twelve visible registers. The goal is small backends
-and uniform source across architectures. Anything beyond this that a backend
-needs for encoding (scratch regs, link register, zero register) is
-backend-private and exposed only through the portable ops. `LA_BR` is the one
-exception on the source side: it loads a label into a hidden *branch-target
-register* that the immediately following direct control-flow op (`B`, `CALL`,
-`TAIL`, or any conditional branch) then consumes. The register-indirect forms
-(`BR`, `CALLR`, `TAILR`) take their target in an ordinary register instead.
-
-A word is register-sized — 32 bits on 32-bit targets, 64 bits on 64-bit
-targets. `LD`/`ST` move words; `LB`/`SB` move bytes. No 16/32-bit slots in
-between; code that needs them can OR/shift.
-
-Ops:
-
-- Materialization: `LI` (load immediate), `LA` (load label address), `LA_BR`
- (load label into the hidden branch-target register).
-- Moves: `MOV`.
-- Arithmetic: `ADD`, `SUB`, `AND`, `OR`, `XOR`, `SHL`, `SHR`, `SAR`, `MUL`,
- `DIV`, `REM`.
-- Arithmetic with immediates: `ADDI`, `ANDI`, `ORI`, `SHLI`, `SHRI`, `SARI`.
- Immediate widths are backend-dependent; the backend errors if the constant
- doesn't fit.
-- Memory: `LD`, `ST`, `LB`, `SB`.
-- Branching: `B`, `BR`, `BEQ`, `BNE`, `BLT`, `BLTU`, `BEQZ`, `BNEZ`, `BLTZ`.
- Signed and unsigned less-than; `>=`, `>`, `<=` are synthesized by swapping
- operands or inverting branch sense.
-- Calls / returns: `CALL`, `CALLR`, `RET`, `ERET`, `TAIL`, `TAILR`.
-- Frame management: `ENTER`.
-- ABI arg access: `LDARG` — reads stack-passed incoming args without
- hard-coding the frame layout.
-- System: `SYSCALL`.
-
-Calling convention:
-
-- `a0`-`a3` hold the first four argument words; extras live in an incoming
- stack-argument area, read with `LDARG`.
-- `a0` is the one-word return register. Two-word returns use `a0`/`a1`.
-- `a0`-`a3` and `t0`-`t2` are caller-saved; `s0`-`s3` and `sp` are
- callee-saved.
-- `ENTER` builds the standard frame; `ERET` tears it down and returns
- (`TAIL`/`TAILR` likewise combine teardown with a jump).
-- Stack-passed outgoing args are staged in a dedicated frame-local area
- before `CALL`, so the callee finds them at a known offset from `sp`.
-- Wider-than-two-word returns use the usual hidden-pointer trick: caller
- passes a writable result buffer in `a0`, real args slide over by one,
- callee returns that same pointer in `a0`.
-
-Program entry: portable source writes `:p1_main` as an ordinary P1 function
-with `argc` in `a0` and `argv` in `a1`, and returns an exit status in `a0`.
-The backend emits a small per-arch `:_start` stub that captures the native
-entry state, calls `p1_main`, and hands its return value to `sys_exit`. The
-portable side never sees the raw entry stack.
-
-This shape is aimed at portability and simplicity of targeting from pure macro
-expansion, sacrificing some performance, which seems a reasonable tradeoff
-for the bootstrap.
-
-P1 comes in two flavors, one per toolchain level. The M1-level flavor is a
-pair of files you catm before any P1 source:
-
-- `P1A.M1` — arch-specific backend, one DEFINE per mnemonic.
-
-The M1pp-level flavor — P1pp — is the same idea with richer macros, where the
-arch-specific encodings are specified in P1A.M1pp via macro instead of
-hardcoded like in P1A.M1:
-
-- `P1A.M1pp` — arch-specific backend.
-- `P1.M1pp` — the portable interface.
-
-## M1pp
-
-P1-at-the-M1-level gets us portability, but M1 is austere — DEFINE
-substitution and raw bytes. Fine for hand-encoding a handful of instructions;
-painful for anything of size. m1pp is a small macro expander layered on top
-of M0 that makes low-level programming bearable enough to linger down here
-near the bottom of the sea. It adds three things M0 doesn't have: macros with
-named parameters, integer expressions that evaluate at expand time, and
-conditional expansion.
-
-Features:
-
-- **Macros**: `%macro NAME(a, b) ... %endm`, called with `%NAME(x, y)`. Macro
- bodies can themselves contain macro calls, so expansion is recursive.
-- **Integer expressions**: emit little-endian hex from a Lisp expression —
- `!(expr)` for 8 bits, `@(expr)` for 16, `%(expr)` for 32, `$(expr)` for 64.
- The punctuation mirrors M0's decimal/hex immediate widths (`!@%`) and adds
- `$` for the 64-bit case.
-- **Conditional expansion**: `%select(cond, then, else)` evaluates `cond` as
- an integer and emits exactly one of the two branches. Non-zero is true.
-- **Token concat**: `a ## b` pastes two tokens into one identifier.
-- **Struct / enum shorthands**: `%struct NAME { f1 f2 }` synthesizes
- `%NAME.f1` → 0, `%NAME.f2` → 8, `%NAME.SIZE` → 16. `%enum` does the same
- with stride 1 and a `COUNT` terminator. Saves hand-counting offsets for
- records and small tag sets.
-- **Local labels**: inside a macro body, `:@loop` and `&@loop` pick up a
- fresh per-expansion suffix, so a macro can define its own labels without
- colliding with itself at a second call site.
-- **Scopes**: `%scope NAME ... %endscope` pushes a lexical scope. Inside
- it, a label written `::foo` emits as `:NAME__foo` (stack joined by `__`
- when nested), so several callers can reuse short names without collision.
- Resolution follows the caller's scope, not the macro's, so a generic
- `%break` or `%continue` inside a scoped loop lands on the right label.
-- **Stringify**: `%str(foo)` turns a word token into the string literal
- `"foo"`.
-
-Comments (`#` and `;`) and M0 `DEFINE`s pass through untouched.
-
-The expression syntax is, of course, Lisp-shaped:
-
-```
-atoms: decimal or 0x-prefixed integer literals
-calls: (+ a b) (- a b) (* a b) (/ a b) (% a b) (<< a b) (>> a b)
- (= a b) (!= a b) (< a b) (<= a b) (> a b) (>= a b)
- (& a b) (| a b) (^ a b) (~ a)
- (strlen "astr")
-```
-
-Here's a real macro from the aarch64 P1pp backend — the "add register, immediate"
-instruction:
-
-```
-%macro aa64_add_imm(rd, ra, imm12)
-%((| 0x91000000 (<< (& imm12 0xFFF) 10) (<< %aa64_reg(ra) 5) %aa64_reg(rd)))
-%endm
-```
-
-A call like `%aa64_add_imm(a0, a0, 8)` expands to a single Lisp expression
-that ORs together the opcode, the 12-bit immediate shifted into place, the
-source register field, and the destination register field. The outer `%(...)`
-evaluates that expression and emits it as four little-endian hex bytes, which
-M0 and hex2 then hand to the linker. The nested `%aa64_reg(...)` calls in the
-body show that macros can call other macros during expansion — `aa64_reg` is a
-tiny macro per register name that produces the aarch64-native register number
-(e.g. `a0` → 0, `s0` → 19).
-
-That one pattern — Lisp expression in, hex bytes out — covers all per-arch
-instruction encoding.
-
-The scope mechanism is what lets libp1pp's standard library define a
-generic function macro `%fn`:
-
-```
-%fn(parse_number, 16, {
- ...
- LA_BR &::done
- BEQZ t0
- ...
- ::done
-})
-```
-
-The `16` is the frame-local byte count passed through to `%enter`. The
-`::done` labels mangle to `parse_number__done`, so every function gets
-its own short namespace with no hand-prefixing, and scoped loops inside
-the body (`%loop_scoped`, `%while_scoped_nez`, etc.) stack further — a
-`%break` anywhere inside the innermost loop finds its `_end`.
-
-## Three programs
-
-The shortest useful P1 program is a bare return. `p1_main` gets `argc` in
-`a0` on entry, `a0` is also the one-word return register, and the backend
-stub hands the return value to `sys_exit` — so this returns `argc` as the
-exit status:
-
-```
-:p1_main
- %ret
-
-:ELF_end
-```
-
-Hello world is a single `sys_write` from a leaf function:
-
-```
-:p1_main
- %li(a0) %sys_write
- %li(a1) %1 %0
- %la(a2) &msg
- %li(a3) %14 %0
- %syscall
- %li(a0) %0 %0
- %ret
-
-:msg
-"Hello, World!
-"
-
-:ELF_end
-```
-
-A few things to notice. `%li(a0)` and `%la(a2)` don't take the immediate
-as a macro argument — the backend emits a load-from-literal-pool prefix
-and the bytes that follow in source become the literal. Here
-`%sys_write` expands to the 8-byte Linux syscall number for write
-(the aarch64 backend's choice); `&msg` is a 4-byte label reference
-(addresses fit in 32 bits in the stage0 image layout). The two `%N %0`
-pairs are M0 4-byte decimal immediates padded to 8 bytes.
-
-A function call, with a helper that doubles its argument:
-
-```
-:double
- %shli(a0, a0, 1)
- %ret
-
-:p1_main
- %enter(0)
- %call(&double)
- %eret
-
-:ELF_end
-```
-
-`double` is a leaf and needs no frame. `p1_main` is not — it calls
-`double`, so it opens a standard frame with `%enter(0)` to preserve the
-hidden return-address state across the call, and closes it with
-`%eret`, which tears down the frame and returns in one step. Run with
-`./double a b c` and the exit status is `8` (argc=4, doubled).
-
-## What it cost
-
-Concrete sizes for what's landed today:
-
-- `m1pp.P1`: ~4KLOC — the P1 source of the macro expander itself.
-- `P1.M1pp`: ~150 LOC — the portable P1 interface.
-- `P1-{aarch64,riscv64,amd64}.M1pp`: ~{480,430,650} LOC — arch backends.
-- `p1pp.P1pp`: ~1KLOC — libp1pp, the standard library of control-flow
- macros and utilities.
-
-(These are non-comment, non-blank LOC)
-
-We can't really see the full payoff until we have scheme1.P1pp and cc.scm of
-course, but I think it's a meaningful improvement over per-arch subset C
-compilers, and it's a neat little macro language to boot. Adding support for a
-new arch at this level is just a few hundred lines of macro definitions, not a
-new compiler.
-
-I'll be working on a tiny Scheme implementation in P1pp, and will likely edit
-P1 and M1pp as I do.
-
-The dream is that we go from P1pp to Scheme to C via nice direct hops and no
-external dependencies until TCC, and specifically, to a tiny single-binary
-cross-compiling TCC-like C compiler. A micro Clang/LLVM, maybe 80% of the bang
-for 1% of the buck. Projects like TCC, QBE, and Tilde all make me hopeful that
-this is possible, and as ever, people like Jeremiah, Jan, and Fabrice are
-inspirations. Here's to the crazy ones.