commit 0696a381fa35277134e0c1dd22511fc2de886e96
parent 7a1408b717a8edf57455a7733b9113603afe3bbf
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Tue, 21 Apr 2026 08:53:42 -0700
docs: drop C1 and kaem-minimal; collapse to three contributions
Remove C1 as a separate compiler layer — seed tools now compile
through the same Lisp-hosted C compiler that builds tcc-boot
(PLAN.md). Vendor live-bootstrap's mescc-tools-extra, simple-patch,
and M2libc portable layer instead of authoring C here; no dispatcher
binary. Expand Lisp's syscall surface to 8 (+clone/execve/waitid,
open→openat) so the Lisp program itself drives the tcc-boot build,
eliminating kaem-minimal as a separate artifact.
Post-M1 contributions now: P1 pseudo-ISA, Lisp interpreter, C
compiler in Lisp.
Diffstat:
 D docs/C1.md   | 486 -------------------------------------------------------------------------------
 M docs/LISP.md |   8 ++++++--
 M docs/PLAN.md |  69 ++++++++++++++++++++++++++++++++++++++++++++++++---------------------
 M docs/SEED.md | 356 +++++++++++++++++++++++++++++++++++++++++--------------------------------------
4 files changed, 239 insertions(+), 680 deletions(-)
diff --git a/docs/C1.md b/docs/C1.md
@@ -1,486 +0,0 @@
-# Bootstrap C-like Language
-
-A minimal C-like language tuned for trivial one-pass compilation. Strict LL(1)
-grammar, recursive-descent parseable with one token of lookahead, no semantic
-feedback into the parser. Integer-only, two's complement, word-sized.
-
-## Goals and non-goals
-
-Goals:
-
-- **LL(1) grammar.** Hand-written recursive descent with one-token lookahead.
- No symbol table needed for parsing.
-- **One-pass compilation.** Source order determines visibility; forward
- declarations are used for anything defined later.
-- **One integer width.** `int` (machine word) and `byte` (8 bits, memory only).
- No promotions, no rank, no integer zoo.
-- **No type-directed parsing.** The tokenizer and parser never ask "is this
- identifier a type?"
-- **Trivial code generation.** Tree walker emitting stack-machine-style output
- is sufficient. No register allocation required.
-- **Explicit over implicit.** Ambiguous precedence combinations require
- parentheses.
-
-Non-goals: ergonomics, expressiveness, optimization, source compatibility with
-C.
-
-## Lexical structure
-
-- **Identifiers:** `[A-Za-z_][A-Za-z0-9_]*`.
-- **Integers:** decimal `123`, hex `0x7F`, character `'a'` with escapes
- `\n \t \r \0 \\ \' \"`.
-- **Strings:** `"..."` with the same escapes. Type is `[]byte` — a slice
- with `.ptr` into static read-only storage and `.len` equal to the number
- of source bytes (no null terminator in the length, though a trailing
- `'\0'` byte is present for interop with C-style APIs).
-- **Comments:** `// ...` to end of line. No block comments.
-- **Keywords:** `var const fn type struct return if else while break continue
- switch default pub extern as sizeof null int byte`.
-- **Operators:** listed in the expression grammar below.
-
-## Top-level structure
-
-```
-program = { toplevel } EOF
-toplevel = [ 'pub' ] ( fn_def | var_decl | type_decl | fn_decl | const_decl )
- | 'extern' ( var_decl | fn_decl )
-fn_def = 'fn' IDENT '(' params ')' type block
-fn_decl = 'fn' IDENT '(' params ')' type ';'
-var_decl = 'var' IDENT type [ '=' const_expr ] ';'
-type_decl = 'type' IDENT [ '=' type ] ';' // no '=' means forward decl
-const_decl = 'const' IDENT '=' const_expr ';'
-params = [ param { ',' param } ]
-param = IDENT type
-```
-
-### Visibility
-
-All top-level names are **file-local by default**. Prefix with `pub` to export
-from the translation unit. `extern` declares a name defined in another unit
-and is always non-`pub` (it is a local reference to an external symbol).
-
-```
-var counter int; // file-local
-pub var version int = 1; // exported
-extern fn write(fd int, buf []byte) int; // defined elsewhere
-```
-
-### Forward declarations
-
-Because the compiler is one-pass, every name must be declared before use.
-Functions defined later in the same file need a forward `fn` declaration
-(signature with `;` instead of a body). Same for globals, if needed.
-
-```
-fn main() int; // forward decl; body appears later
-
-fn helper() int {
- return main(); // legal because of the forward decl
-}
-
-fn main() int {
- return 0;
-}
-```
-
-Forward declarations and definitions must agree on signature exactly.
-
-### Constants
-
-```
-const MAX_TOKENS = 1024;
-pub const AST_ADD = 1;
-pub const AST_SUB = AST_ADD + 1;
-```
-
-`const` binds a name to a compile-time integer. The right-hand side must be
-a constant integer expression — integer/character literals, earlier `const`
-names, `sizeof(T)`, and the arithmetic/bitwise/shift/compare operators
-applied to those. Constants have type `int` and can be used anywhere an
-`int` is expected, including array sizes, `switch` case labels, and `var`
-or `const` initializers.
-
-Constants occupy no storage and cannot be addressed with `&`. A `const` is
-local to the file unless marked `pub`.
-
-## Types
-
-Prefix constructors; read left-to-right.
-
-```
-type = 'int' | 'byte'
- | '*' type
- | '[' INT ']' type // array
- | '[' ']' type // slice
- | 'struct' '{' { IDENT type ';' } '}'
- | 'fn' '(' typelist ')' type
- | IDENT // named type
-typelist = [ type { ',' type } ]
-```
-
-Examples:
-
-| Type | Meaning |
-|------------------------------|-------------------------------------------|
-| `int` | signed machine word |
-| `byte` | 8-bit value, memory only |
-| `*int` | pointer to int |
-| `[10]int` | array of 10 ints |
-| `[]int` | slice of int (pointer + length) |
-| `*[10]*int` | pointer to array of 10 pointers to int |
-| `fn(int, int) int` | function pointer |
-| `struct { x int; y int; }` | anonymous struct type |
-
-### Named types
-
-```
-type Point = struct { x int; y int; };
-type NodePtr = *Node;
-type Node = struct { next *Node; value int; };
-```
-
-Self-reference through a pointer works in one pass because the pointer's
-pointee type is just a name at parse time. Mutual recursion across two `type`
-declarations requires a forward `type Name;` declaration (empty body) followed
-by the definition.
-
-### Slices
-
-A slice `[]T` is a two-word value: a pointer and a length. It is laid out
-as if declared:
-
-```
-struct { ptr *T; len int; }
-```
-
-with guaranteed field order and the members accessible as `.ptr` and `.len`.
-Unlike other aggregate types, **slices pass and return by value** — they are
-always exactly two words.
-
-Construct a slice from an array or another slice with slice syntax:
-
-```
-var buf [256]byte;
-var s1 []byte = buf[..]; // whole array
-var s2 []byte = buf[0..16]; // first 16 bytes
-var s3 []byte = s1[4..]; // from index 4 to end
-var s4 []byte = s1[..10]; // first 10
-```
-
-Or assemble one by assigning the fields directly:
-
-```
-var s []int;
-s.ptr = &arr[0];
-s.len = 10;
-```
-
-Indexing `s[i]` is sugar for `*(s.ptr + i)` with element-size scaling and is
-an lvalue. Taking `&s[i]` yields `*T`. There is no bounds checking.
-
-Slicing is allowed on arrays and on slices only. `expr[i..j]` requires
-`i <= j` and yields a slice of length `j - i`. Omitted bounds default to `0`
-(low) and the base's length (high). Slicing a bare `*T` is not supported —
-set `.ptr` and `.len` manually instead.
-
-Slices **cannot be compared by value.** `s1 == s2` is an error. Compare
-`s1.ptr == s2.ptr` and `s1.len == s2.len` if you mean identity, or write
-a byte-by-byte equality helper if you mean content.
-
-### No implicit conversions
-
-Every cross-type conversion goes through `as`:
-
-- `byte` ↔ `int`: explicit `as`. `byte as int` zero-extends; `int as byte`
- truncates to the low 8 bits.
-- `*T` ↔ `*U`: explicit `as`.
-- `int` ↔ `*T`: explicit `as`.
-- `null` is assignable to any `*T` without `as` (the only exception).
-
-### No decay
-
-Arrays and structs do **not** decay or copy implicitly. To pass one to a
-function, take its address with `&`. `&arr` yields `*T` pointing at the
-first element (not `*[N]T`). `&s` on a struct yields `*S`.
-
-```
-var buf [256]byte;
-write(1, &buf, 256); // pass pointer to first byte
-
-var p Point;
-init_point(&p, 3, 4); // pass pointer to struct
-```
-
-Arrays cannot be assigned, returned, or passed by value. Structs cannot be
-assigned, returned, or passed by value. If you want to copy, call a helper.
-
-## Statements
-
-```
-block = '{' { statement } '}'
-statement = var_decl
- | 'if' expr block [ 'else' ( if_tail | block ) ]
- | 'while' expr block
- | switch_stmt
- | 'return' [ expr ] ';'
- | 'break' ';'
- | 'continue' ';'
- | block
- | expr_stmt
-if_tail = 'if' expr block [ 'else' ( if_tail | block ) ]
-switch_stmt = 'switch' expr '{' { case_arm } [ default_arm ] '}'
-case_arm = const_expr { ',' const_expr } block
-default_arm = 'default' block
-expr_stmt = expr ( '=' expr ';' | ';' )
-```
-
-- **No parentheses on conditions.** The expression ends at the opening `{` of
- the block because `{` is never a valid continuation of an expression.
-- Braces are **mandatory** on every `if`, `else`, `while`. Dangling-else is
- impossible.
-- **Assignment is a statement, not an expression.** No chained assignment,
- no assignment inside conditions. `=` vs `==` confusion at the statement
- level is caught by grammar.
-- No `for`, no ternary, no comma operator, no compound assignment
- (`+=` etc.), no `++`/`--`.
-- Local variables are **uninitialized** unless `= expr` is given. Local
- initializers may be any expression, not just constants.
-- Scalar globals (`int`, `byte`, `*T`, `fn(...)...`): initializer must be a
- constant expression; zero-initialized if omitted.
-- Aggregate globals (arrays, structs, slices): **zero-initialized only** —
- there is no non-zero initializer syntax. Populate at program start if
- needed.
-
-### Switch
-
-```
-switch tok.kind {
- TK_PLUS, TK_MINUS { return parse_add(tok); }
- TK_STAR { return parse_mul(tok); }
- TK_LPAREN { return parse_group(tok); }
- default { return parse_error(tok); }
-}
-```
-
-- The scrutinee is any integer expression (`int` or `byte`).
-- Case labels must be compile-time integer constants. Multiple labels per
- arm are comma-separated. All labels across a `switch` must be distinct.
-- Each arm is a mandatory `{}` block. **No fallthrough.**
-- `default` is optional. With no default and no matching case, the `switch`
- has no effect.
-- `break` and `continue` inside an arm refer to the enclosing `while`, not
- the `switch`. `return` returns from the function as usual.
-
-## Expressions
-
-Eight precedence levels. Where a level is marked **non-chainable**, the
-operator may appear at most once; mixing with the surrounding levels requires
-explicit parentheses. This is how we kill the classic C precedence traps
-(`a & mask == 0` meaning `a & (mask == 0)`, etc.) with zero runtime cost and
-a trivial parser — each non-chainable level is a one-shot `[ OP operand ]`
-rather than a loop.
-
-```
-expr = logor
-logor = logand { '||' logand } // chainable with itself
-logand = compare { '&&' compare } // chainable with itself
-compare = bitwise [ CMPOP bitwise ] // non-chainable
-bitwise = shift [ BITOP shift ] // non-chainable
-shift = addsub [ SHIFT addsub ] // non-chainable
-addsub = muldiv { ('+'|'-') muldiv } // left-assoc chainable
-muldiv = unary { MULOP unary } // left-assoc chainable
-unary = ('-' | '!' | '~' | '*' | '&') unary
- | postfix
-postfix = primary { '(' args ')' | '[' expr ']'
- | '[' [ expr ] '..' [ expr ] ']'
- | '.' IDENT | 'as' type }
-primary = INT | CHAR | STRING | 'null' | IDENT
- | '(' expr ')'
- | 'sizeof' '(' type ')'
-args = [ expr { ',' expr } ]
-
-MULOP = '*' | '/' | '%' | '/u' | '%u'
-SHIFT = '<<' | '>>' | '>>u'
-BITOP = '&' | '|' | '^'
-CMPOP = '==' | '!=' | '<' | '<=' | '>' | '>='
- | '<u' | '<=u' | '>u' | '>=u'
-```
-
-Concretely, these require parentheses:
-
-```
-a & b | c // error: mixing & and |
-a << b + c // error: mixing shift and add
-a == b == c // error: chained comparison
-a < b < c // error: chained comparison
-a && b || c // error: mixing && and ||
-a & mask == 0 // error: mixing bitwise and compare
-```
-
-Write instead:
-
-```
-(a & b) | c
-a << (b + c)
-(a == b) && (b == c)
-(a < b) && (b < c)
-(a && b) || c
-(a & mask) == 0
-```
-
-Chaining is allowed within the arithmetic levels (`a + b - c + d`,
-`a * b / c`) and within `&&` and `||` individually (`a && b && c`,
-`x || y || z`). Mixing `&&` and `||` still requires parentheses.
-
-### Signed vs unsigned
-
-Operators are **signed by default**. Unsigned variants are distinct tokens:
-
-| Signed | Unsigned | Meaning |
-|--------|----------|--------------------------|
-| `/` | `/u` | division |
-| `%` | `%u` | remainder |
-| `>>` | `>>u` | right shift (arith/log) |
-| `<` | `<u` | less than |
-| `<=` | `<=u` | less or equal |
-| `>` | `>u` | greater than |
-| `>=` | `>=u` | greater or equal |
-
-`+`, `-`, `*`, `==`, `!=`, `<<`, `&`, `|`, `^`, `~` have identical behavior in
-two's complement and so have no signed/unsigned split.
-
-Signed overflow **wraps** (defined, not UB). Shift by >= word width is
-undefined.
-
-### Booleans
-
-No boolean type. Zero is false, non-zero is true. Comparisons and `!` yield
-0 or 1. `&&` and `||` short-circuit via branches.
-
-### Lvalues
-
-Exactly: `IDENT` (naming a variable), `*expr`, `lv.field`, `lv[expr]`.
-
-Field access **auto-dereferences pointers**: if `lv` has type `*S`, then
-`lv.field` means `(*lv).field`. This chains as needed: `p.a.b` where `p: *A`
-and `A.a: *B` means `(*(*p).a).b`. There is no separate `->` operator.
-
-`lv[expr]` requires the base to be a pointer, an array, or a slice.
-Element-size scaling is applied — see Pointer arithmetic.
-
-### `sizeof`
-
-`sizeof(T)` takes a type, never an expression. Result is `int`, compile-time
-constant.
-
-### Casts
-
-`expr as T` is postfix. `(T)expr` is **not** a cast — parentheses are only
-for grouping. This removes the only genuine LL(1) hazard in C.
-
-### Pointer arithmetic
-
-When one operand of `+` or `-` is a pointer `*T`:
-
-- `*T + int` and `int + *T` yield `*T`, advancing by `sizeof(T)` bytes per
- unit.
-- `*T - int` yields `*T`, retreating by `sizeof(T)` bytes per unit.
-- `*T - *T` (same pointee type) yields `int`, the signed element-count
- difference.
-- Pointer + pointer is not allowed.
-
-Indexing desugars to this arithmetic:
-
-- `p[i]` with `p: *T` is `*(p + i)`.
-- `arr[i]` with `arr: [N]T` is `*(&arr + i)` (since `&arr` has type `*T`).
-- `s[i]` with `s: []T` is `*(s.ptr + i)`.
-
-### Function values
-
-A bare function name is its own function pointer — no `&` required. If
-`my_func` has type `fn(int) int`, then `my_func` is directly usable wherever
-an `fn(int) int` value is expected, and `my_func(42)` calls it. Writing
-`&my_func` is legal and yields the same pointer.
-
-## Calling convention and ABI notes
-
-- Arguments passed by value, evaluated left to right (pushed right to left
- in typical stack-based targets).
-- Returns are word-sized: `int`, any `*T`, or `byte` (returned as `int`
- with zero-extension).
-- **Structs never cross function boundaries by value.** Pass `*S`, return
- `*S`, or use an out-parameter.
-- **Slices (`[]T`) do cross by value** — always two words (pointer + length),
- in a register pair or adjacent stack slots. This is the sole exception to
- the one-word return / no-aggregate-by-value rule.
-- No varargs. To print multiple values, call multiple helpers.
-
-## Preprocessor
-
-Only `#include "path"` is supported. Inclusion is textual but
-**idempotent per resolved path** — a file already included in the current
-compilation is silently skipped on subsequent `#include`s. No include
-guards needed, no macros, no conditional compilation, no `#define`. Named
-integer constants go in `const`; type aliases go in `type`.
-
-## Example
-
-```
-#include "io.lang"
-
-pub type Node = struct {
- next *Node;
- value int;
-};
-
-fn list_len(head *Node) int; // forward decl
-
-pub fn list_sum(head *Node) int {
- var total int = 0;
- var p *Node = head;
- while p != null {
- total = total + p.value;
- p = p.next;
- }
- return total;
-}
-
-fn list_len(head *Node) int {
- var n int = 0;
- var p *Node = head;
- while p != null {
- n = n + 1;
- p = p.next;
- }
- return n;
-}
-
-pub fn main() int {
- var nodes [3]Node;
- nodes[0].value = 10; nodes[0].next = &nodes[1];
- nodes[1].value = 20; nodes[1].next = &nodes[2];
- nodes[2].value = 30; nodes[2].next = null;
-
- var sum int = list_sum(&nodes[0]);
- if (sum > 0) && (sum <u 1000) {
- put_int(sum);
- }
- return 0;
-}
-```
-
-## Dropped from C
-
-For reference — these are intentionally absent:
-
-`float`, `double`, `long double`, `complex`, `short`, `long`, `long long`,
-`unsigned` as a type (use signed types with unsigned operators), `enum`
-(use `const`), `union`, bitfields, C's `const` as a type qualifier (the
-keyword is reused for named integer constants), `volatile`, `restrict`,
-`static` (replaced by file-local default + `pub`), `typedef` (replaced by
-`type`), K&R function syntax, variadic functions, designated initializers,
-compound literals, `for`, `do`/`while`, ternary `?:`, comma operator,
-compound assignment, `++`/`--`, block comments, pre-processor macros,
-implicit conversions, array and function decay, struct/array pass-by-value,
-C's `switch` (the keyword is reused with no `case`, no fallthrough, and
-mandatory-block arms).
diff --git a/docs/LISP.md b/docs/LISP.md
@@ -51,8 +51,12 @@ Load-bearing; the rest of the document assumes them.
9. **Tail calls via P1 `TAIL`.** `eval` dispatches tail-position calls
through `TAIL`; non-tail through `CALL`. Scheme-level tail-call
correctness falls out for free.
-10. **Five syscalls: `read`, `write`, `open`, `close`, `exit`.** Matches
- PLAN.md. No signals, no `lseek`, no `stat`.
+10. **Eight syscalls: `read`, `write`, `openat`, `close`, `exit`,
+ `clone`, `execve`, `waitid`.** Matches PLAN.md. The last three
+ let the Lisp program spawn M1/hex2 and act as the tcc-boot
+ build driver; `openat(AT_FDCWD, …)` replaces bare `open`
+ because aarch64/riscv64 lack it in the asm-generic table. No
+ signals, no `lseek`, no `stat`.
11. **Pair GC marks live in a separate bitmap**, not in the pair words.
~1.25 MB BSS for a 20 MB heap; keeps pairs at 16 bytes and keeps
fixnums at 61 bits.
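To make item 10's `open` to `openat` substitution concrete, here is a minimal C sketch, assuming Linux with glibc's `syscall(2)` wrapper. The helper name `open_compat` is hypothetical and is not part of the Lisp runtime. With `AT_FDCWD` as the directory argument, a relative path resolves against the current directory exactly as it does for `open`, so the switch loses nothing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD, O_* flags */
#include <sys/syscall.h>  /* SYS_openat */
#include <unistd.h>       /* syscall */

/* Same contract as open(path, flags, mode), issued as the openat syscall,
 * which exists on x86_64, aarch64, and riscv64. AT_FDCWD makes a relative
 * path resolve against the current working directory, exactly as open does.
 * Returns a file descriptor, or -1 with errno set. */
static int open_compat(const char *path, int flags, int mode)
{
    assert(path != NULL);
    return (int)syscall(SYS_openat, (long)AT_FDCWD, path, (long)flags,
                        (long)mode);
}
```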
diff --git a/docs/PLAN.md b/docs/PLAN.md
@@ -118,20 +118,18 @@ uses these heavily.
## Backend
-Two options, to be decided after the P1 spike:
-
-1. **Emit text M1 assembly** for x86_64, single-arch. Simplest codegen;
- tcc-boot only runs on amd64. Matches the original plan.
-2. **Emit P1** from the C compiler. The C compiler is written once in
- portable Lisp and also *emits* portable asm, so tcc-boot lands on all
- three arches for free (modulo tcc-boot's own arch support). Codegen gets
- slightly harder — P1 is deliberately dumb, so C idioms like `x += y`
- expand to multi-op P1 sequences — but we pay the ~2× code-size tax
- already budgeted in `P1.md` rather than writing three backends.
-
-Option 2 is the natural endpoint of the P1 investment. Defer the decision
-until we have measured P1 codegen quality on a non-trivial program (P1.md
-stage 5).
+**Settled: emit P1.** The C compiler is written once in portable Lisp and
+emits portable asm, so both the pre-tcc-boot seed userland (`SEED.md`) and
+tcc-boot itself land on all three arches without a second backend. Codegen
+is slightly harder than direct amd64 — P1 is deliberately dumb, so C
+idioms like `x += y` expand to multi-op P1 sequences — but we pay the
+~2× code-size tax already budgeted in `P1.md` rather than writing three
+backends.
+
+This forecloses the alternative of emitting amd64 M1 directly (simpler
+codegen, single-arch only). That option would have satisfied a
+tcc-boot-only goal, but `SEED.md` requires tri-arch seed binaries, so a
+portable backend is load-bearing.
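A rough illustration of why codegen gets slightly harder, written in C for concreteness. The mnemonics (`LD`, `ADD`, `ST`), the register names, and the `lower_add_assign` helper are placeholders invented for this sketch, not P1's real op set. The point is only that `x += y` on two stack locals becomes load, load, add, store on a deliberately dumb load/store machine, where amd64 needs as few as two instructions (load `y`, then `add` into `x`'s memory slot).

```c
#include <assert.h>
#include <stdio.h>

/* Lower the C statement `x += y`, with x and y in stack slots at the given
 * byte offsets, to text for a load/store machine. Returns the number of
 * characters written (as snprintf does). The mnemonics and registers are
 * placeholders for illustration, not P1's real op set. */
static int lower_add_assign(char *out, size_t cap, int x_off, int y_off)
{
    assert(out != NULL && cap > 0);
    return snprintf(out, cap,
                    "LD r0, [sp+%d]\n"   /* r0 = x */
                    "LD r1, [sp+%d]\n"   /* r1 = y */
                    "ADD r0, r0, r1\n"   /* r0 = x + y */
                    "ST r0, [sp+%d]\n",  /* x = r0 */
                    x_off, y_off, x_off);
}
```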
## Estimated budget
@@ -140,7 +138,7 @@ stage 5).
| Lisp interpreter in P1 (reader, eval, GC, primitives, I/O, pmatch) | 4,000–6,000 P1 |
| C lexer + recursive-descent parser + CPP (in Lisp) | 2,000–3,000 |
| Type checker + IR (slimmed compile.scm + info.scm) | 2,000–3,000 |
-| Codegen + asm emit (M1-amd64 or P1, see Backend) | 800–1,500 |
+| Codegen + P1 emit (see Backend) | 800–1,500 |
| **Total auditable (this plan)** | **~9,000–13,000 LOC** |
vs. **~54,000 LOC** current = **~4–6× shrink**, and the result is
@@ -175,9 +173,38 @@ with any future seed-stage program.
region at link time. No `brk`/`mmap` at runtime, no arena-sizing flag.
Keeps the P1 program to a minimal syscall surface and makes the
interpreter image self-describing.
-- **Syscalls: five.** `read`, `write`, `open`, `close`, `exit`. Each
- becomes one P1 `SYSCALL` op backed by a per-arch number table in the
- P1 defs file. `read-file` loops `read` into a growable string until
- EOF (no `stat`/`lseek`); `display`/`write`/`error` go through `write`
- on fd 1/2; `error` finishes with `exit`. No signals, time, fork/exec,
- or networking.
+- **Syscalls: eight.** `read`, `write`, `openat`, `close`, `exit`,
+ `clone`, `execve`, `waitid`. Each becomes one P1 `SYSCALL` op
+ backed by a per-arch number table in the P1 defs file.
+ `read-file` loops `read` into a growable string until EOF (no
+ `stat`/`lseek`); `display`/`write`/`error` go through `write` on
+ fd 1/2; `error` finishes with `exit`. `openat(AT_FDCWD, …)`
+ replaces `open` because aarch64/riscv64 lack bare `open` in the
+ asm-generic table. `clone(SIGCHLD)` + `execve` + `waitid` give
+ the Lisp enough to drive the tcc-boot build directly — see
+ "Build driver" below. No signals, time, or networking.
+
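The per-arch number table is the only arch-specific part of this layer. As a sketch, rendered as a C array rather than in the actual P1 defs-file syntax (not shown here), these are the Linux syscall numbers for the eight calls. aarch64 and riscv64 share the asm-generic numbering, which is also why bare `open` has no entry there. The struct and the `syscall_number` lookup are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Linux syscall numbers for the eight Lisp syscalls. aarch64 and riscv64
 * both use the asm-generic table, so they share one column. (x86_64's bare
 * open, number 2, has no asm-generic counterpart; hence openat.) */
struct syscall_row {
    const char *name;
    int x86_64;
    int generic;   /* aarch64 and riscv64 */
};

static const struct syscall_row SYSCALLS[] = {
    { "read",   0,   63  },
    { "write",  1,   64  },
    { "openat", 257, 56  },
    { "close",  3,   57  },
    { "exit",   60,  93  },
    { "clone",  56,  220 },
    { "execve", 59,  221 },
    { "waitid", 247, 95  },
};

/* Look up a syscall number by name for "x86_64", "aarch64", or "riscv64".
 * Returns -1 for an unknown name or arch. */
static int syscall_number(const char *name, const char *arch)
{
    assert(name != NULL && arch != NULL);
    int generic;
    if (strcmp(arch, "x86_64") == 0)
        generic = 0;
    else if (strcmp(arch, "aarch64") == 0 || strcmp(arch, "riscv64") == 0)
        generic = 1;
    else
        return -1;
    for (size_t i = 0; i < sizeof SYSCALLS / sizeof SYSCALLS[0]; i++)
        if (strcmp(SYSCALLS[i].name, name) == 0)
            return generic ? SYSCALLS[i].generic : SYSCALLS[i].x86_64;
    return -1;
}
```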
+## Build driver
+
+Once Lisp can spawn, the Lisp program itself is the build driver.
+There is no separate shell. A top-level Lisp source file reads the
+pinned list of tcc-boot translation units, iterates over them, and
+for each one:
+
+1. Reads the `.c` source into a Lisp string.
+2. Calls the Lisp-hosted C compiler (in-process) to produce P1 text.
+3. Writes the P1 text to a temp file.
+4. Spawns M1 (from stage0-posix, via `clone`+`execve`) to assemble
+ P1 → `.hex2`; waits via `waitid`, aborts on non-zero.
+5. Spawns hex2 to emit the final `.o` / ELF; waits, aborts on
+ non-zero.
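Steps 4 and 5 both reduce to one spawn-and-wait primitive. A minimal C sketch of what that primitive does at the syscall level, assuming Linux and glibc's `syscall(2)`; the Lisp version would issue the same three syscalls through P1 `SYSCALL` ops. `spawn_and_wait` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>       /* SIGCHLD, siginfo_t, CLD_EXITED */
#include <string.h>       /* memset */
#include <sys/syscall.h>  /* SYS_clone, SYS_execve, SYS_waitid */
#include <sys/wait.h>     /* P_PID, WEXITED */
#include <unistd.h>       /* syscall, _exit */

/* Run argv[0] (an absolute path) with the given environment and wait for
 * it. Returns the child's exit status (0..255), or -1 if it could not be
 * spawned or was killed by a signal. The caller aborts on != 0. */
static int spawn_and_wait(char *const argv[], char *const envp[])
{
    assert(argv != NULL && argv[0] != NULL);

    /* clone with only SIGCHLD in flags and a NULL stack behaves like fork:
     * the child runs on a copy-on-write copy of our stack. Every other
     * argument is 0, so clone's per-arch argument order does not matter. */
    long pid = syscall(SYS_clone, (long)SIGCHLD, 0L, 0L, 0L, 0L);
    if (pid < 0)
        return -1;
    if (pid == 0) {
        syscall(SYS_execve, argv[0], argv, envp);
        _exit(127);   /* reached only if execve failed */
    }

    siginfo_t info;
    memset(&info, 0, sizeof info);
    /* waitid(P_PID, pid, &info, WEXITED); the raw syscall also takes a
     * rusage pointer, passed as NULL. */
    if (syscall(SYS_waitid, (long)P_PID, pid, &info, (long)WEXITED, 0L) < 0)
        return -1;
    return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```

Calling raw `clone` like this is reasonable only because the child immediately calls `execve` or exits; it bypasses glibc's fork bookkeeping, which a multithreaded program would need.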
+
+The seed-tool builds (each mescc-tools-extra source → one ELF) run
+the same loop. Spawn-and-wait is a ~20 LOC Lisp primitive; the full
+driver, including the hard-coded tcc-boot file list, is ~100–200
+LOC of Lisp counted against this plan.
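The control flow of the loop itself, sketched in C with each step as a function pointer so it can be exercised without M1 or hex2 present. `run_build`, `step_fn`, and the stub steps are hypothetical; in the real driver the loop is Lisp and the steps are the in-process compile, the temp-file write, and the two spawns listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One build step applied to one translation unit; returns 0 on success. */
typedef int (*step_fn)(const char *unit);

/* Run every step, in order, for every unit. Abort at the first non-zero
 * status, report which step failed on which unit, and return that status.
 * Returns 0 when every step succeeded for every unit. */
static int run_build(const char *const units[], size_t n_units,
                     const step_fn steps[], const char *const names[],
                     size_t n_steps)
{
    assert(units != NULL && steps != NULL && names != NULL);
    for (size_t u = 0; u < n_units; u++) {
        for (size_t s = 0; s < n_steps; s++) {
            int status = steps[s](units[u]);
            if (status != 0) {
                fprintf(stderr, "build: %s failed on %s (status %d)\n",
                        names[s], units[u], status);
                return status;
            }
        }
    }
    return 0;
}

/* Hypothetical stand-ins for the real steps: each counts its calls, and
 * the compile stub fails on a unit named "bad.c". */
static int calls = 0;

static int stub_compile(const char *unit)
{
    calls++;
    return strcmp(unit, "bad.c") == 0 ? 2 : 0;
}

static int stub_assemble(const char *unit)
{
    (void)unit;
    calls++;
    return 0;
}
```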
+
+Concentrating orchestration in the Lisp program (rather than a
+separate P1/M1 shell) collapses the post-M1 contribution list to
+exactly three artifacts: P1, the Lisp interpreter, and the C
+compiler.
diff --git a/docs/SEED.md b/docs/SEED.md
@@ -4,14 +4,14 @@
Bridge the window between *Lisp exists* and *tcc-boot exists* without
touching M2-Planet, Mes, or MesCC. Inside that window, all code is
-either a Lisp program running on the Lisp interpreter or subcommands
-of a single monolithic C1 binary (`seed`) compiled through the
-Lisp-hosted C1 compiler → P1 → M1 → hex2 pipeline.
+either a Lisp program running on the Lisp interpreter or one of a
+small set of standalone C binaries compiled through the Lisp-hosted
+C compiler → P1 → M1 → hex2 pipeline.
This document covers only that window. Phases before it (`seed0 →
-hex0/hex1/hex2 → M1`, P1 defs, Lisp interpreter) are documented in
-`P1.md` and `PLAN.md`. tcc-boot itself and everything downstream are
-standard C and out of scope.
+hex0/hex1/hex2 → M1`, P1 defs, Lisp interpreter, and the Lisp-hosted
+C compiler) are documented in `P1.md` and `PLAN.md`. tcc-boot itself
+and everything downstream are standard C and out of scope.
## Position in the chain
@@ -19,142 +19,136 @@ standard C and out of scope.
stage0-posix: seed0 → hex0 → hex1 → hex2 → M1 (no C, no Lisp)
P1 layer: P1 defs files load into M1 (P1.md)
Lisp: P1 text (Lisp interp source) → M1 → hex2 (PLAN.md)
-C1 compiler: Lisp program, loaded into the Lisp image (this doc)
-──────── seed window begins here ────────
-seed binary: C1 source → Lisp+C1cc → P1 text → M1 → hex2 (this doc)
C compiler: Lisp program, loaded into the Lisp image (PLAN.md)
+──────── seed window begins here ────────
+seed tools: C source → Lisp+Ccc → P1 text → M1 → hex2 (this doc)
──────── seed window ends when tcc-boot is built ────────
tcc-boot: C source → Lisp+Ccc → P1 text → M1 → hex2 (PLAN.md)
```
-Two Lisp programs (C1 compiler, C compiler) and one statically-linked
-C1 binary. No M2-Planet artifact and no Mes Scheme module anywhere.
+One Lisp-hosted C compiler (shared with tcc-boot) and a handful of
+statically-linked C binaries. No M2-Planet artifact and no Mes
+Scheme module anywhere.
## Settled decisions
These are load-bearing; rest of the document assumes them.
-1. **C1 targets P1.** One C1 source per subcommand, tri-arch binary
- via the existing M1+hex2 path. Accepts P1's ~2× code-size tax.
-2. **C1 compiler lives in Lisp.** Same host as the C compiler; shares
- the Lisp runtime. ~1.5–2.5k LOC Lisp, counted against `PLAN.md`.
-3. **Monolithic `seed` binary.** One executable with subcommand
- dispatch on `argv[1]` (e.g. `seed kaem script.kaem`, `seed cat
- file`, `seed cp a b`). One audit unit, one copy of the runtime,
- no loader. Bug blast radius is the whole seed userland — mitigated
- by keeping each subcommand self-contained and tested in isolation.
-4. **Uncompressed tcc-boot mirror.** Host the upstream tcc-boot source
+1. **Seed programs compile through the same Lisp-hosted C compiler
+ as tcc-boot.** No separate seed-stage compiler. Seed sources must
+ stay within the C subset fixed in `PLAN.md`; the backend emits P1,
+ so the seed tools land tri-arch via the existing M1+hex2 path.
+ Accepts P1's ~2× code-size tax.
+2. **Vendor upstream C where it exists.** `cat`, `cp`, `mkdir`,
+ `rm`, `sha256sum`, `untar` are taken from live-bootstrap's
+ `mescc-tools-extra`; `patch-apply` from `simple-patch-1.0`.
+ The libc these sources depend on (`<stdio.h>`, `<string.h>`,
+ `<stdlib.h>`, etc.) is vendored M2libc's portable layer —
+ `bootstrappable.c`, `string.c`, `stdio.c`, `stdlib.c`, and the
+ small `ctype`/`fcntl` files (~1,500 LOC). Per-arch syscall
+ stubs backing M2libc's declarations are replaced with our
+ P1-based stubs (see "How seed tools reach syscalls" below). All
+ of the above was written against M2-Planet's C subset, which is
+ a subset of ours. Local adaptations ship as unified diffs in
+ the repo. **No C is written fresh here** — each vendored
+ source already has its own `main`.
+3. **The Lisp program is the build driver — no separate shell.**
+ Per `PLAN.md`, the Lisp's syscall surface includes `clone`,
+ `execve`, `waitid`, so a top-level Lisp file drives the whole
+ tcc-boot build: iterate over translation units, call the
+ Lisp-hosted C compiler in-process, spawn M1/hex2 to finish
+ each artifact, check exit status. No `kaem`, no `sh`, no flat
+ script — just Lisp code.
+4. **One binary per tool.** Each vendored source compiles to a
+ standalone ELF — `cat`, `cp`, `mkdir`, `rm`, `sha256sum`,
+ `untar`, `patch-apply`. Installed into a single directory
+ (say, `/seed/`) and invoked by absolute path from the Lisp
+ driver. No dispatcher, no argv[0] multiplexing, no fresh `main`
+ to write. Each tool is its own audit unit.
+5. **Uncompressed tcc-boot mirror.** Host the upstream tcc-boot source
as an uncompressed `.tar` with sha256 pinned. No gzip support
anywhere in the seed stage. Deletes ~1000–1500 LOC of deflate from
the audit.
-5. **Explicit patches via `seed patch-apply`.** Upstream source stays
+6. **Explicit patches via `patch-apply`.** Upstream source stays
verbatim. Our changes live as unified-diff files in this repo,
- applied by a ~200 LOC C1 subcommand. "Upstream vs ours" stays
- legible.
-6. **fork + execve for process spawn.** Simplest kernel contract,
- stable syscall numbers on all three arches. Plus `wait4` to
- reap children. No clone, vfork, or posix_spawn.
+ applied by the `simple-patch`-derived binary. "Upstream vs
+ ours" stays legible.
7. **Target self-build is primary; cross-build is a cache.** The
canonical build is a fresh target machine bootstrapping from
stage0-posix hex seed. Cross-built per-arch tarballs are supported
as a reproducibility cache — identical bytes expected, verified
against a target self-build, not trusted by assumption.
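Decision 2 replaces M2libc's per-arch syscall stubs with our own. A hypothetical C rendering of what two of those stubs amount to (the real ones are P1): the libc-level functions keep their POSIX shapes, while the underlying syscalls are the `*at` forms with `AT_FDCWD`, since aarch64 and riscv64 have no bare `mkdir` or `unlink` in the asm-generic table. The `seed_` prefix is an assumption that only avoids clashing with the host libc.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* AT_FDCWD */
#include <sys/syscall.h>  /* SYS_mkdirat, SYS_unlinkat */
#include <unistd.h>       /* syscall */

/* mkdir(path, mode) issued as mkdirat(AT_FDCWD, path, mode). */
static int seed_mkdir(const char *path, int mode)
{
    assert(path != NULL);
    return (int)syscall(SYS_mkdirat, (long)AT_FDCWD, path, (long)mode);
}

/* unlink(path) issued as unlinkat(AT_FDCWD, path, 0); flags = 0 removes a
 * file, whereas AT_REMOVEDIR would remove a directory. */
static int seed_unlink(const char *path)
{
    assert(path != NULL);
    return (int)syscall(SYS_unlinkat, (long)AT_FDCWD, path, 0L);
}
```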
-## The `seed` binary
-
-One ELF per arch, invoked as `seed <subcommand> [args...]`. Internal
-dispatch table maps `argv[1]` to a function; unknown subcommands error
-out. Startup shim parses `argc/argv`, calls the dispatch function,
-propagates its return code to `exit`.
-
-### Subcommands
-
-| Subcommand | Purpose | C1 LOC |
-|---------------|--------------------------------------------------|----------|
-| `kaem` | shell driving the tcc-boot build | 700–900 |
-| `untar` | POSIX ustar extract (no gzip, no creation) | 500–700 |
-| `patch-apply` | apply a unified diff in-place | ~200 |
-| `sha256sum` | verify source tarball hashes | 500–700 |
-| `cp` | copy one file | ~150 |
-| `mkdir` | single-level directory create | ~80 |
-| `rm` | remove one file (no `-r`, no `-f`) | ~120 |
-| `mv` | rename within one filesystem | ~150 |
-| `cat` | concatenate files to stdout | ~80 |
-| `test` | file and string predicates for kaem | ~280 |
-| `echo` | write args to stdout | ~50 |
-| dispatch + argv plumbing | top-level `main`, subcommand table | ~100 |
-| C1 runtime + mini libc | startup, syscalls, memcpy/memset/str* | ~400 |
-| **Total** | | **~3310–3910** |
-
-Dispatch is flat: there is no nesting, no aliases, no argv[0]-based
-dispatch. Kaem scripts write out `seed <sub>` in full. One installed
-file on disk, no symlinks, no `link` syscall needed.
-
-### Kaem feature set
-
-Line-oriented minimal shell:
-
-- One command per line. No `;`, no `&&`, no `||`.
-- Command = word (built-in or path) + whitespace-separated args.
- Quoting: `"..."` is one arg, with `\n \t \\ \"` escapes. No
- single-quote form.
-- Variable substitution: `${NAME}` from environment only.
-- Built-ins: `cd` (via `chdir` syscall), `set NAME=VALUE` (env),
- `exit`.
-- Redirection: `> file` (truncate) and `< file` (stdin). No append,
- no pipes.
-- Failure: non-zero exit from any command aborts the script.
-- Comments: `#` to end of line.
-- Expansion excluded: globbing, command substitution, arithmetic,
- here-docs, background jobs.
-
-That suffices to express "unpack → verify → compile each file → link
-→ install." Orchestration lives in kaem text, not C1.
+## The seed tools
+
+One ELF per tool per arch. Each tool is invoked by absolute path
+from the Lisp build driver (e.g. `/seed/sha256sum foo.tar`). Each
+binary links against the same vendored M2libc portable layer and
+the same P1 syscall stubs.
+
+### Inventory
+
+| Tool / layer | Purpose | Source / LOC |
+|--------------------|---------------------------------------------|-------------------------|
+| `untar` | POSIX ustar extract (no gzip, no creation) | mescc-tools-extra/untar.c (460) |
+| `patch-apply` | apply a unified diff in-place | simple-patch-1.0 (~200) |
+| `sha256sum` | verify source tarball hashes | mescc-tools-extra/sha256sum.c (586) |
+| `cp` | copy one file | mescc-tools-extra/cp.c (332) |
+| `mkdir` | single-level directory create | mescc-tools-extra/mkdir.c (117) |
+| `rm` | remove one file (no `-r`, no `-f`) | mescc-tools-extra/rm.c (54) |
| `cat` | concatenate files into an output file (`catm OUT IN...`) | mescc-tools-extra/catm.c (69) |
+| libc (portable) | stdio, string, stdlib, ctype, fcntl | vendored M2libc (~1,500) |
+| syscall stubs | per-arch bridge below M2libc | ~120 lines of P1, not C |
+| **Total C** | | **~3,300, fully vendored** |
+
+Deliberately excluded: `test`, `echo`, `mv`. The Lisp driver
+expresses any conditional logic directly in Lisp and emits progress
+messages through its own `write` calls. Neither the Lisp nor the
+tools have a `rename` syscall, so any step that would need `mv` must
+instead write its output directly to the final path.
+
+The driver is Lisp code, not a shell script; see `PLAN.md`'s
+"Build driver" section for the control flow.
## Syscall surface
-Combined with PLAN.md's compiler surface, the seed window requires
-**12 syscalls** total. Each gets one row in every `p1_<arch>.M1`
-defs file.
+The seed tools collectively need **7 syscalls** (process spawn
+lives in the Lisp driver, not in the tools).
| Syscall | Used by |
|------------|-------------------------------------------|
-| `read` | all file-reading subcommands, Lisp I/O |
+| `read` | all file-reading tools |
| `write` | stdout/stderr, all file-writing |
-| `open` | file open (`O_RDONLY` / `O_WRONLY|O_CREAT|O_TRUNC` with mode) |
+| `openat` | file open (`AT_FDCWD` + `O_RDONLY` / `O_WRONLY|O_CREAT|O_TRUNC` with mode) |
| `close` | all file ops |
| `exit` | program termination |
-| `fork` | kaem child spawn |
-| `execve` | kaem child spawn |
-| `wait4` | kaem reaping children |
-| `mkdir` | `seed mkdir`, `untar` (directory entries) |
-| `unlink` | `seed rm` |
-| `rename` | `seed mv` |
-| `access` | `seed test` (file predicates) |
-| `chdir` | kaem `cd` builtin |
-
-Bumps `PLAN.md`'s "five syscalls" contract to 13 (includes `chdir`);
-PLAN.md should be cross-referenced to this list, not restated
-independently. Deliberately excluded: `stat/fstat` (use `access`
-instead), `chmod` (rely on `open` mode bits for initial perms),
+| `mkdirat`  | `mkdir` tool, `untar` (directory entries) |
+| `unlinkat` | `rm` tool |
+
+PLAN.md's Lisp surface is 8 syscalls (`read`, `write`, `openat`,
+`close`, `exit`, `clone`, `execve`, `waitid`). The seed tools add
+`mkdirat` and `unlinkat` on top of that, for a window total of **10
+distinct syscalls**. Each gets one row in every `p1_<arch>.M1`
+defs file. Path-taking calls use the `*at` forms with `AT_FDCWD`
+throughout: Linux's generic syscall table (aarch64, riscv64) has no
+plain `open`, `mkdir`, or `unlink`, while the `*at` forms exist on
+every Linux arch. Deliberately excluded: `stat`/`fstat`, `access`,
+`rename`, `chmod` (rely on `openat` mode bits for initial perms),
`lseek` (all reads are sequential), `getdents`/`readdir` (no
directory traversal needed), `dup`/`pipe`/signals/time/net.
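To show how little the tools ask of the kernel, here is a minimal sketch of what `cp`, `mkdir`, and `rm` reduce to on this surface. It assumes POSIX libc wrappers (`openat`, `mkdirat`, `unlinkat`, `read`, `write`, `close`) as stand-ins for the per-arch P1 stubs; `mkdirat` and `unlinkat` are the `AT_FDCWD` spellings of `mkdir` and `unlink`. The `seed_*` names are hypothetical and this is not the vendored mescc-tools-extra code.

```c
/* Sketch: cp, mkdir and rm reduced to the seed syscall surface.
 * POSIX wrappers stand in for the P1 stubs (assumption). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>     /* openat, AT_FDCWD, O_* */
#include <sys/stat.h>  /* mkdirat */
#include <unistd.h>    /* read, write, close, unlinkat */

/* cp: copy src to dst. Returns 0 on success, -1 on any error. */
static int seed_copy(const char *src, const char *dst) {
    assert(src && dst);
    int in = openat(AT_FDCWD, src, O_RDONLY);
    if (in < 0) return -1;
    /* Initial permissions come from the mode argument; there is no chmod. */
    int out = openat(AT_FDCWD, dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) { close(in); return -1; }

    char buf[4096];
    ssize_t n = 0;
    int rc = 0;
    /* Strictly sequential reads: lseek is not in the surface. */
    while (rc == 0 && (n = read(in, buf, sizeof buf)) > 0) {
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w <= 0) { rc = -1; break; }
            off += w;
        }
    }
    if (n < 0) rc = -1;
    close(in);
    if (close(out) < 0) rc = -1;
    return rc;
}

/* mkdir: create a single directory level with mode 0755. */
static int seed_mkdir(const char *path) {
    return mkdirat(AT_FDCWD, path, 0755) == 0 ? 0 : -1;
}

/* rm: remove one file; no -r, no -f. */
static int seed_rm(const char *path) {
    return unlinkat(AT_FDCWD, path, 0) == 0 ? 0 : -1;
}
```

Nothing here needs `lseek`, `stat`, or `chmod`: reads are sequential and permissions come from the `openat` mode argument alone.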
-### How C1 reaches syscalls
+### How seed tools reach syscalls
-C1 has no inline asm and no intrinsics. Each syscall is exposed as an
-ordinary `extern fn` declaration, backed by a hand-written P1 stub in
-`runtime.p1`. The stubs are ~3 P1 ops each (load number, `SYSCALL`,
-`RET`), totalling ~40 lines of P1 for the whole surface.
+The Lisp-hosted C compiler has no inline asm and no intrinsics. Each
+syscall is exposed as an ordinary `extern` function declaration,
+backed by a hand-written P1 stub in `runtime.p1`. The stubs are ~3 P1
+ops each (load number, `SYSCALL`, `RET`), totalling ~40 lines of P1
+for the whole surface.
```
-:sys_write ; C1 args arrive in P1 r1-r6 per call ABI
+:sys_write ; C args arrive in P1 r1-r6 per call ABI
SYSCALL write ; expands per-arch via p1_<arch>.M1 defs
RET
```
```
-extern fn sys_write(fd int, buf ptr byte, n int) int;
+extern int sys_write(int fd, char *buf, int n);
```
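Code above the stub only ever sees the raw integer. A minimal sketch of a libc-level helper layered on `sys_write`, assuming the stub hands back the kernel's raw result (a negative errno on failure, as on Linux). Here `sys_write` is a hypothetical stand-in built on POSIX `write` so the sketch runs on a hosted system; in the seed it would be the P1 stub declared above. `write_all` is an illustrative name.

```c
/* Sketch: a libc-level helper layered on the raw sys_write stub. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical stand-in for the P1 stub so this runs on a hosted system:
 * returns the raw result, or a negative errno on failure. */
int sys_write(int fd, char *buf, int n) {
    ssize_t r = write(fd, buf, (size_t)n);
    return r < 0 ? -errno : (int)r;
}

/* Write all n bytes, retrying short writes.
 * Returns 0 on success or the stub's negative errno on failure. */
int write_all(int fd, char *buf, int n) {
    assert(n >= 0);
    while (n > 0) {
        int r = sys_write(fd, buf, n);
        if (r < 0) return r;      /* pass the raw -errno through */
        if (r == 0) return -EIO;  /* no progress: report an I/O error */
        buf += r;
        n -= r;
    }
    return 0;
}
```

Keeping the stub raw and putting retry and error policy in C means the P1 side stays at a few ops per syscall.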
Prerequisite: P1 picks its argument registers (`r1–r6`) to coincide
@@ -169,55 +163,65 @@ the result register. Wrappers return the raw integer; callers test
## Build ordering inside the seed window
-Once the Lisp interpreter binary exists and the C1 compiler Lisp
-source is loaded:
-
-1. Compile the `seed` monolith: one C1 source file (or small set
- `#include`d into one translation unit, since C1's preprocessor
- supports `#include` only) → P1 text → M1 → hex2 → `seed` ELF.
- Per-arch, repeat for each target.
-2. Install `seed` on the target (copy to a known path). No other
- setup required.
-
-The tcc-boot build then runs as kaem scripts:
-
-1. `seed sha256sum upstream.tar` against pinned hash.
-2. `seed untar upstream.tar`.
-3. For each patch file: `seed patch-apply patches/foo.diff`.
-4. Loop over tcc-boot `.c` files, invoking Lisp-as-C-compiler to
- emit P1 text, then M1+hex2 to produce per-object files or a
- single linked binary. (tcc-boot's build is simple enough to
- treat as one compilation unit; the loop is unrolled in kaem.)
-5. Install tcc-boot binary.
-
+Once the Lisp interpreter binary exists and the C compiler Lisp
+source is loaded (both per `PLAN.md`):
+
+1. Compile each seed tool independently: its vendored source plus
+ the vendored M2libc layer plus the per-arch P1 syscall stubs →
+ P1 text → M1 → hex2 → one ELF per tool. Per-arch, repeat for
+ each target.
+2. Install the tools into a single directory on the target (e.g.
+ `/seed/`). No other setup required.
+
+The tcc-boot build then runs as a Lisp program on the Lisp
+interpreter. The driver:
+
+1. Spawns `/seed/sha256sum upstream.tar` and checks against pinned
+ hash.
+2. Spawns `/seed/untar upstream.tar`.
+3. For each patch file: spawns `/seed/patch-apply patches/foo.diff`.
+4. Iterates over tcc-boot `.c` files. For each one, calls the
+ Lisp-hosted C compiler in-process to emit P1 text, then spawns
+ M1 and hex2 to produce the object or final linked binary.
+5. Installs the tcc-boot binary.
+
+See `PLAN.md` "Build driver" for the spawn-and-wait primitive.
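Each "spawns" step above is one use of the `clone`/`execve`/`waitid` trio from the Lisp surface. A C sketch of the same control flow, assuming POSIX `fork` as a stand-in for `clone` with `SIGCHLD` and an empty environment; `run_tool` is a hypothetical name, and the real primitive is Lisp code.

```c
/* Sketch: spawn a tool by absolute path and reap it with waitid.
 * fork() stands in for clone(SIGCHLD) (assumption). */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>     /* siginfo_t, CLD_EXITED */
#include <sys/types.h>  /* pid_t, id_t */
#include <sys/wait.h>   /* waitid, P_PID, WEXITED */
#include <unistd.h>     /* fork, execve, _exit */

/* Run argv[0] with an empty environment and wait for it.
 * Returns the child's exit status (127 if execve failed in the child),
 * or -1 if fork/waitid failed or the child was killed by a signal. */
static int run_tool(char *const argv[]) {
    assert(argv[0] && argv[0][0] == '/');  /* absolute path: no PATH search */
    char *const envp[] = { NULL };
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execve(argv[0], argv, envp);
        _exit(127);                        /* only reached if execve failed */
    }
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED) != 0) return -1;
    if (info.si_code != CLD_EXITED) return -1;
    return info.si_status;
}
```

A driver built this way would presumably call it once per step, e.g. with `{"/seed/sha256sum", "upstream.tar", NULL}`, and treat any non-zero result as failure.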
Seed window is closed.
## Target self-build vs cross-build
**Target self-build (primary).** A fresh machine of arch `A` starts
from the stage0-posix hex seed, runs the hex0→hex1→hex2→M1 chain,
-loads `p1_A.M1`, assembles the Lisp interpreter, loads the C1
-compiler into Lisp, compiles `seed`, runs the tcc-boot build. Whole
-process is a kaem script (bootstrapped from a hand-assembled first
-kaem, same way hex2 and M1 are) driving the toolchain.
+loads `p1_A.M1`, assembles the Lisp interpreter, loads the C
+compiler into Lisp, runs the Lisp build-driver program, which
+compiles each seed tool, then compiles and links tcc-boot.
+stage0-posix's own `kaem` runs the early hex0→M1 chain; above M1,
+the Lisp program takes over.
**Cross-build cache (secondary).** On an already-bootstrapped
-machine, produce `seed` binaries for all three arches and ship them
-as tarballs. Users who opt into this skip the target self-build and
-land directly at "seed installed." Trust claim: **none by
-assumption** — the cache is only trusted after a target self-build
-of at least one arch has verified byte-identical output. Cross-build
-is an optimization, not a trust input.
+machine, produce the seed tool binaries for all three arches and
+ship them as tarballs. Users who opt into this skip the target
+self-build and land directly at "seed tools installed." Trust
+claim: **none by assumption** — the cache is only trusted after a
+target self-build of at least one arch has verified byte-identical
+output. Cross-build is an optimization, not a trust input.
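One way to perform the byte-identical check, sketched in C under the assumption that the self-built and cached binaries are both available as files on one machine. `same_bytes` is a hypothetical helper that behaves like `cmp` without output; comparing pinned sha256 values would serve the same purpose.

```c
/* Sketch: byte-identical comparison of two build outputs. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>   /* openat, AT_FDCWD, O_RDONLY */
#include <string.h>  /* memcmp */
#include <unistd.h>  /* read, close, ssize_t */

/* Fill buf until n bytes or EOF. Returns bytes read, or -1 on error. */
static ssize_t read_full(int fd, char *buf, size_t n) {
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, buf + got, n - got);
        if (r < 0) return -1;
        if (r == 0) break;
        got += (size_t)r;
    }
    return (ssize_t)got;
}

/* 1 if the files are byte-identical, 0 if they differ, -1 on I/O error. */
static int same_bytes(const char *a, const char *b) {
    assert(a && b);
    int fa = openat(AT_FDCWD, a, O_RDONLY);
    int fb = openat(AT_FDCWD, b, O_RDONLY);
    int result = -1;
    if (fa >= 0 && fb >= 0) {
        char ba[4096], bb[4096];
        for (;;) {
            ssize_t na = read_full(fa, ba, sizeof ba);
            ssize_t nb = read_full(fb, bb, sizeof bb);
            if (na < 0 || nb < 0) { result = -1; break; }
            if (na != nb || memcmp(ba, bb, (size_t)na) != 0) { result = 0; break; }
            if (na == 0) { result = 1; break; }  /* both reached EOF together */
        }
    }
    if (fa >= 0) close(fa);
    if (fb >= 0) close(fb);
    return result;
}
```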
## Provenance
-Three kinds of artifact flow in:
+Artifacts flowing in:
- **stage0-posix hex seed + P1 defs**: part of this repo, audited
with the rest of it.
-- **Lisp interpreter source (in P1)**: part of this repo.
-- **C1 sources for `seed` + the C1 compiler + C compiler (in Lisp)**:
- part of this repo.
+- **Lisp interpreter source (in P1) and C compiler (in Lisp)**:
+ part of this repo, covered by `PLAN.md`.
+- **Vendored seed C sources**: pinned snapshots of
+ live-bootstrap's `mescc-tools-extra` (catm, cp, mkdir, rm,
+ sha256sum, untar), `simple-patch-1.0`, and M2libc's portable
+ layer (the libc the mescc-tools sources depend on — stdio,
+ string, stdlib, ctype, fcntl, bootstrappable). All shipped
+ verbatim as `.tar` files with sha256 pinned. Local adaptations
+ ride as unified diffs in the repo, applied by `patch-apply` at
+ build time so "upstream vs ours" stays legible.
- **Upstream tcc-boot source**: mirrored as uncompressed `.tar` at
a pinned URL + sha256. The mirror file is one of this repo's
auditable inputs; it can be re-derived from upstream by untaring
@@ -225,10 +229,14 @@ Three kinds of artifact flow in:
published `.tar.gz` by re-gzipping and comparing hashes on a
machine that has `gzip` (done once, out of band).
-`seed sha256sum` is the single piece of C1 whose correctness has a
-direct trust consequence downstream; unit-test it against known
-vectors (empty string, "abc", "abcdbcde..."-length tests) before
-declaring the seed build complete.
+No C is authored fresh in this repo for the seed window; the only
+seed-window artifacts written here are unified-diff patches against
+the vendored tree and the per-arch P1 syscall stubs (the Lisp build
+driver is counted under `PLAN.md`).
+
+`sha256sum` is the single seed tool whose correctness has a direct
+trust consequence downstream; unit-test it against known vectors
+(empty string, "abc", "abcdbcde..."-length tests) before declaring
+the seed build complete.
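An independent oracle makes those unit tests mechanical. Below is a minimal SHA-256 sketch following FIPS 180-4, checked against the standard FIPS example vectors. It is not the vendored `sha256sum.c`, and `sha256_hex` and `sha256_selftest` are hypothetical names.

```c
/* Sketch: independent SHA-256 (FIPS 180-4) as a test oracle.
 * Not the vendored sha256sum.c; names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static const uint32_t K[64] = {
  0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
  0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
  0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
  0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
  0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
  0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
  0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
  0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2};

static uint32_t rotr(uint32_t x, int n) { return (x >> n) | (x << (32 - n)); }

/* Fold one 64-byte block into the running state h. */
static void compress(uint32_t h[8], const uint8_t *blk) {
    uint32_t w[64];
    for (int i = 0; i < 16; i++)
        w[i] = (uint32_t)blk[4 * i] << 24 | (uint32_t)blk[4 * i + 1] << 16 |
               (uint32_t)blk[4 * i + 2] << 8 | (uint32_t)blk[4 * i + 3];
    for (int i = 16; i < 64; i++) {
        uint32_t s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = h[0], b = h[1], c = h[2], d = h[3];
    uint32_t e = h[4], f = h[5], g = h[6], hh = h[7];
    for (int i = 0; i < 64; i++) {
        uint32_t t1 = hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) +
                      ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) +
                      ((a & b) ^ (a & c) ^ (b & c));
        hh = g; g = f; f = e; e = d + t1;
        d = c; c = b; b = a; a = t1 + t2;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d;
    h[4] += e; h[5] += f; h[6] += g; h[7] += hh;
}

/* Hash len bytes; writes 64 lowercase hex digits plus NUL into out. */
static void sha256_hex(const void *data, size_t len, char out[65]) {
    uint32_t h[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
    assert(data != NULL || len == 0);
    const uint8_t *p = data;
    size_t n = len;
    for (; n >= 64; p += 64, n -= 64) compress(h, p);

    /* Padding: 0x80, zeros, then the message length in bits, big-endian. */
    uint8_t tail[128] = {0};
    if (n) memcpy(tail, p, n);
    tail[n] = 0x80;
    size_t tlen = n < 56 ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++) tail[tlen - 1 - i] = (uint8_t)(bits >> (8 * i));
    compress(h, tail);
    if (tlen == 128) compress(h, tail + 64);

    for (int i = 0; i < 8; i++) snprintf(out + 8 * i, 9, "%08x", (unsigned)h[i]);
}

/* FIPS example messages and their expected digests. */
static const struct { const char *msg, *hex; } VECTORS[] = {
    { "", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" },
    { "abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad" },
    { "abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
      "248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1" },
};

/* Returns how many vectors mismatch; 0 means every vector passed. */
static int sha256_selftest(void) {
    int bad = 0;
    char out[65];
    for (size_t i = 0; i < sizeof VECTORS / sizeof VECTORS[0]; i++) {
        sha256_hex(VECTORS[i].msg, strlen(VECTORS[i].msg), out);
        if (strcmp(out, VECTORS[i].hex) != 0) bad++;
    }
    return bad;
}
```

The 56-byte message also exercises the two-block padding path. To test the seed binary itself, each message could be written to a file and fed to `/seed/sha256sum`, comparing its printed digest with the table.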
## Interaction with tcc-boot
@@ -237,53 +245,59 @@ coreutils`. Mapping:
| tcc-boot expects | Seed provides |
|------------------|--------------------------------------------------|
-| `cc` / `gcc` | kaem loop invoking Lisp-as-C-compiler per `.c` |
-| `make` | flat kaem script (tcc-boot is simple enough) |
-| `sh` | `seed kaem` |
-| `cat`/`cp`/etc. | `seed <sub>` |
+| `cc` / `gcc` | Lisp-hosted C compiler, invoked in-process per `.c` |
+| `make` | Lisp driver program (tcc-boot is simple enough) |
+| `sh` | not provided — the Lisp driver spawns tools directly |
+| `cat`/`cp`/etc. | individual seed-tool binaries at absolute paths |
| `ld` | tcc-boot's built-in linker (for its own output) |
| `ar` | not needed; tcc-boot builds one static binary |
-A thin shim script under `scripts/` maps tcc-boot's literal command
-names (`cc`, `make`, `install`) to the `seed <sub>` / Lisp-invocation
-forms. That shim is kaem text, not C1.
+Any translation from tcc-boot's literal build-command names
+(`cc`, `make`, `install`) to seed tools lives in Lisp, not in a
+separate shim script.
## Budget rollup
Fresh auditable LOC introduced by this document, on top of PLAN.md:
-| Layer | LOC |
-|-----------------------------------------------|-----------------|
-| C1 compiler (Lisp, counted in PLAN.md) | (1,500–2,500) |
-| `seed` monolith (all subcommands + runtime) | 3,300–3,900 |
-| kaem scripts (orchestration, driver) | a few hundred |
-| **Seed window addition** | **~3,300–3,900**|
+| Layer | LOC |
+|--------------------------------------------------------|---------|
+| seed tools — vendored mescc-tools-extra + simple-patch | ~1,800 |
+| seed tools — vendored M2libc portable layer | ~1,500 |
+| syscall stubs (P1, not C) | ~120 |
+| Lisp build-driver program | counted in PLAN.md |
+| **Seed window addition** | **~3,300 C (all vendored) + ~120 P1** |
Combined PLAN.md + SEED.md audit surface: **~13–17k LOC**, tri-arch,
-M2-Planet-free and Mes-free.
+M2-Planet-free and Mes-free. No fresh C is authored for the seed
+window; the entire ~3,300 LOC is audited upstream code written
+against M2-Planet's C subset. The build driver is Lisp code
+counted against PLAN.md (~100–200 LOC).
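For reference, the ~1,800 and ~3,300 rows follow from the inventory table's per-tool counts (`untar`, `patch-apply` at its ~200 estimate, `sha256sum`, `cp`, `mkdir`, `rm`, `cat`) plus the ~1,500-line M2libc layer:

$$
\begin{aligned}
460 + 200 + 586 + 332 + 117 + 54 + 69 &= 1{,}818 \approx 1{,}800 \\
1{,}818 + 1{,}500 &= 3{,}318 \approx 3{,}300
\end{aligned}
$$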
## Handoff notes for the engineer
Approximate build order for implementation:
-1. **C1 compiler in Lisp** (blocks everything below). Write against
- a small corpus of C1 test programs. Validate by compiling a
- 20–50 LOC C1 program, running the output, confirming behavior.
-2. **C1 runtime + syscall wrappers + mini libc.** Smallest
- subcommand (`echo` or `cat`) is the bring-up test.
-3. **`seed` dispatch skeleton** plus `echo`, `cat`, `cp`, `mkdir`,
- `rm`, `mv`. Small, independent, easy to unit-test.
-4. **`sha256sum`** with unit tests before anything depends on its
- correctness.
-5. **`test`** (file predicates needed by kaem).
+1. **C compiler in Lisp** (blocks everything below). Per `PLAN.md`;
+ validate on a small corpus before touching seed.
+2. **Vendor M2libc's portable layer** and write the per-arch P1
+ syscall stubs that back its declarations. Bring-up test: link
+ `catm.c` (69 LOC) against this libc and run it.
+3. **Vendor mescc-tools-extra + simple-patch.** Pin sha256s.
+ Confirm each source compiles unmodified through the Lisp-hosted
+ C compiler; if anything trips, capture the delta as a unified
+ diff rather than editing the vendored tree in place.
+4. **Build the small tools** individually (`cat`, `cp`, `mkdir`,
+ `rm`) — each is its own ELF.
+5. **`sha256sum`** with unit tests (empty / "abc" / long vectors)
+ before anything depends on its correctness.
6. **`untar`** (ustar extract only).
7. **`patch-apply`** (unified-diff in-place).
-8. **`kaem`** (depends on `fork`, `execve`, `wait4`, `chdir`,
- redirect).
-9. **End-to-end bring-up**: kaem script running `sha256sum` →
- `untar` → `patch-apply` → Lisp-C-compile loop → linked
- tcc-boot. First full trip through the seed window.
+8. **End-to-end bring-up**: Lisp build-driver running
+ `sha256sum` → `untar` → `patch-apply` → in-process C-compile
+ loop (spawning M1/hex2 per `.c`) → linked tcc-boot. First full
+ trip through the seed window.
-Each step compiles standalone C1 and assembles through the existing
+Each step compiles standalone C and assembles through the existing
P1 → M1 → hex2 path; no new tooling infrastructure is needed
between steps.