boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit c049a3b304a6f4291d0903294a972db245f5d02c
parent f1f725af88ed3d7b2eba1997e8062c362de159f1
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue, 21 Apr 2026 05:33:19 -0700

C1 and SEED docs

Diffstat:
A C1.md   | 486 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A SEED.md | 289 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 775 insertions(+), 0 deletions(-)

diff --git a/C1.md b/C1.md
@@ -0,0 +1,486 @@
+# Bootstrap C-like Language
+
+A minimal C-like language tuned for trivial one-pass compilation. Strict LL(1)
+grammar, recursive-descent parseable with one token of lookahead, no semantic
+feedback into the parser. Integer-only, two's complement, word-sized.
+
+## Goals and non-goals
+
+Goals:
+
+- **LL(1) grammar.** Hand-written recursive descent with one-token lookahead.
+  No symbol table needed for parsing.
+- **One-pass compilation.** Source order determines visibility; forward
+  declarations are used for anything defined later.
+- **One integer width.** `int` (machine word) and `byte` (8 bits, memory only).
+  No promotions, no rank, no integer zoo.
+- **No type-directed parsing.** The tokenizer and parser never ask "is this
+  identifier a type?"
+- **Trivial code generation.** Tree walker emitting stack-machine-style output
+  is sufficient. No register allocation required.
+- **Explicit over implicit.** Ambiguous precedence combinations require
+  parentheses.
+
+Non-goals: ergonomics, expressiveness, optimization, source compatibility with
+C.
+
+## Lexical structure
+
+- **Identifiers:** `[A-Za-z_][A-Za-z0-9_]*`.
+- **Integers:** decimal `123`, hex `0x7F`, character `'a'` with escapes
+  `\n \t \r \0 \\ \' \"`.
+- **Strings:** `"..."` with the same escapes. Type is `[]byte`: a slice
+  with `.ptr` into static read-only storage and `.len` equal to the number
+  of source bytes (no null terminator in the length, though a trailing
+  `'\0'` byte is present for interop with C-style APIs).
+- **Comments:** `// ...` to end of line. No block comments.
+- **Keywords:** `var const fn type struct return if else while break continue
+  switch default pub extern as sizeof null int byte`.
+- **Operators:** listed in the expression grammar below.
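A minimal sketch of how the lexical rules above could be tokenized, written in Python rather than the project's Lisp host. The `tokenize` helper and the token kinds (`KW`, `IDENT`, `INT`, `OP`) are illustrative assumptions, not the compiler's code; only the identifier pattern, the escape set, and the keyword list come from the list above. Strings and multi-character operators such as `<=u` are left out to keep it short.

```python
import re

# Keyword list copied from the lexical rules above.
KEYWORDS = set("""var const fn type struct return if else while break continue
switch default pub extern as sizeof null int byte""".split())

# Escapes allowed in character literals: \n \t \r \0 \\ \' \"
ESCAPES = {"n": 10, "t": 9, "r": 13, "0": 0, "\\": 92, "'": 39, '"': 34}

# Hypothetical token pattern; operators are reduced to single characters here.
TOKEN_RE = re.compile(r"""
    (?P<ws>\s+|//[^\n]*)                  # whitespace and line comments
  | (?P<hex>0x[0-9A-Fa-f]+)
  | (?P<dec>[0-9]+)
  | (?P<char>'(?:\\.|[^'\\])')
  | (?P<ident>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<op>.)
""", re.VERBOSE)


def tokenize(src):
    """Return a list of (kind, value) pairs; kind is KW, IDENT, INT or OP."""
    out = []
    for m in TOKEN_RE.finditer(src):
        kind, text = m.lastgroup, m.group()
        if kind == "ws":
            continue
        if kind == "hex":
            out.append(("INT", int(text, 16)))
        elif kind == "dec":
            out.append(("INT", int(text)))
        elif kind == "char":
            body = text[1:-1]
            value = ESCAPES[body[1]] if body[0] == "\\" else ord(body)
            out.append(("INT", value))
        elif kind == "ident":
            out.append(("KW" if text in KEYWORDS else "IDENT", text))
        else:
            out.append(("OP", text))
    return out
```

Because keywords are recognized by a fixed table lookup, the tokenizer never needs to know whether an identifier names a type, which is the property the goals list asks for.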
+
+## Top-level structure
+
+```
+program    = { toplevel } EOF
+toplevel   = [ 'pub' ] ( fn_def | var_decl | type_decl | fn_decl | const_decl )
+           | 'extern' ( var_decl | fn_decl )
+fn_def     = 'fn' IDENT '(' params ')' type block
+fn_decl    = 'fn' IDENT '(' params ')' type ';'
+var_decl   = 'var' IDENT type [ '=' const_expr ] ';'
+type_decl  = 'type' IDENT [ '=' type ] ';'      // no '=' means forward decl
+const_decl = 'const' IDENT '=' const_expr ';'
+params     = [ param { ',' param } ]
+param      = IDENT type
+```
+
+### Visibility
+
+All top-level names are **file-local by default**. Prefix with `pub` to export
+from the translation unit. `extern` declares a name defined in another unit
+and is always non-`pub` (it is a local reference to an external symbol).
+
+```
+var counter int;                          // file-local
+pub var version int = 1;                  // exported
+extern fn write(fd int, buf []byte) int;  // defined elsewhere
+```
+
+### Forward declarations
+
+Because the compiler is one-pass, every name must be declared before use.
+Functions defined later in the same file need a forward `fn` declaration
+(signature with `;` instead of a body). Same for globals, if needed.
+
+```
+fn main() int;              // forward decl; body appears later
+
+fn helper() int {
+    return main();          // legal because of the forward decl
+}
+
+fn main() int {
+    return 0;
+}
+```
+
+Forward declarations and definitions must agree on signature exactly.
+
+### Constants
+
+```
+const MAX_TOKENS = 1024;
+pub const AST_ADD = 1;
+pub const AST_SUB = AST_ADD + 1;
+```
+
+`const` binds a name to a compile-time integer. The right-hand side must be
+a constant integer expression: integer/character literals, earlier `const`
+names, `sizeof(T)`, and the arithmetic/bitwise/shift/compare operators
+applied to those. Constants have type `int` and can be used anywhere an
+`int` is expected, including array sizes, `switch` case labels, and `var`
+or `const` initializers.
+
+Constants occupy no storage and cannot be addressed with `&`.
+A `const` is local to the file unless marked `pub`.
+
+## Types
+
+Prefix constructors; read left-to-right.
+
+```
+type     = 'int' | 'byte'
+         | '*' type
+         | '[' INT ']' type          // array
+         | '[' ']' type              // slice
+         | 'struct' '{' { IDENT type ';' } '}'
+         | 'fn' '(' typelist ')' type
+         | IDENT                     // named type
+typelist = [ type { ',' type } ]
+```
+
+Examples:
+
+| Type                         | Meaning                                   |
+|------------------------------|-------------------------------------------|
+| `int`                        | signed machine word                       |
+| `byte`                       | 8-bit value, memory only                  |
+| `*int`                       | pointer to int                            |
+| `[10]int`                    | array of 10 ints                          |
+| `[]int`                      | slice of int (pointer + length)           |
+| `*[10]*int`                  | pointer to array of 10 pointers to int    |
+| `fn(int, int) int`           | function pointer                          |
+| `struct { x int; y int; }`   | anonymous struct type                     |
+
+### Named types
+
+```
+type Point = struct { x int; y int; };
+type NodePtr = *Node;
+type Node = struct { next *Node; value int; };
+```
+
+Self-reference through a pointer works in one pass because the pointer's
+pointee type is just a name at parse time. Mutual recursion across two `type`
+declarations requires a forward `type Name;` declaration (empty body) followed
+by the definition.
+
+### Slices
+
+A slice `[]T` is a two-word value: a pointer and a length. It is laid out
+as if declared:
+
+```
+struct { ptr *T; len int; }
+```
+
+with guaranteed field order and the members accessible as `.ptr` and `.len`.
+Unlike other aggregate types, **slices pass and return by value**: they are
+always exactly two words.
+
+Construct a slice from an array or another slice with slice syntax:
+
+```
+var buf [256]byte;
+var s1 []byte = buf[..];        // whole array
+var s2 []byte = buf[0..16];     // first 16 bytes
+var s3 []byte = s1[4..];        // from index 4 to end
+var s4 []byte = s1[..10];       // first 10
+```
+
+Or assemble one by assigning the fields directly:
+
+```
+var s []int;
+s.ptr = &arr[0];
+s.len = 10;
+```
+
+Indexing `s[i]` is sugar for `*(s.ptr + i)` with element-size scaling and is
+an lvalue. Taking `&s[i]` yields `*T`. There is no bounds checking.
+
+Slicing is allowed on arrays and on slices only. `expr[i..j]` requires
+`i <= j` and yields a slice of length `j - i`. Omitted bounds default to `0`
+(low) and the base's length (high). Slicing a bare `*T` is not supported;
+set `.ptr` and `.len` manually instead.
+
+Slices **cannot be compared by value.** `s1 == s2` is an error. Compare
+`s1.ptr == s2.ptr` and `s1.len == s2.len` if you mean identity, or write
+a byte-by-byte equality helper if you mean content.
+
+### No implicit conversions
+
+Every cross-type conversion goes through `as`:
+
+- `byte` ↔ `int`: explicit `as`. `byte as int` zero-extends; `int as byte`
+  truncates to the low 8 bits.
+- `*T` ↔ `*U`: explicit `as`.
+- `int` ↔ `*T`: explicit `as`.
+- `null` is assignable to any `*T` without `as` (the only exception).
+
+### No decay
+
+Arrays and structs do **not** decay or copy implicitly. To pass one to a
+function, take its address with `&`. `&arr` yields `*T` pointing at the
+first element (not `*[N]T`). `&s` on a struct yields `*S`.
+
+```
+var buf [256]byte;
+write(1, &buf, 256);            // pass pointer to first byte
+
+var p Point;
+init_point(&p, 3, 4);           // pass pointer to struct
+```
+
+Arrays cannot be assigned, returned, or passed by value. Structs cannot be
+assigned, returned, or passed by value. If you want to copy, call a helper.
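The slicing rules from the Slices section (omitted bounds default to `0` and the base's length, `i <= j` is required, the result has length `j - i`) can be modelled in a few lines. This is an illustrative Python sketch, not the compiler's lowering: the `Slice` class and `reslice` function are hypothetical names, `ptr` is modelled as an element index rather than a machine address, and raising on `i > j` is an assumption made only to keep the precondition visible (the text does not say whether it is checked).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Slice:
    """Two-word slice value: index of the first element, and a length."""
    ptr: int
    len: int


def reslice(base: Slice, lo: Optional[int] = None, hi: Optional[int] = None) -> Slice:
    """Model of base[lo..hi]; None stands for an omitted bound."""
    if lo is None:
        lo = 0                # omitted low bound defaults to 0
    if hi is None:
        hi = base.len         # omitted high bound defaults to the base's length
    if lo > hi:
        # The text requires i <= j; raising here is an assumption of this sketch.
        raise ValueError("slice bounds out of order")
    return Slice(base.ptr + lo, hi - lo)


buf = Slice(ptr=0, len=256)       # var buf [256]byte; buf[..]
s2 = reslice(buf, 0, 16)          # buf[0..16]
s3 = reslice(buf, 4)              # s1[4..]
s4 = reslice(buf, None, 10)       # s1[..10]
```

Note that nothing compares `hi` against the base's length, mirroring the document's statement that there is no bounds checking.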
+
+## Statements
+
+```
+block       = '{' { statement } '}'
+statement   = var_decl
+            | 'if' expr block [ 'else' ( if_tail | block ) ]
+            | 'while' expr block
+            | switch_stmt
+            | 'return' [ expr ] ';'
+            | 'break' ';'
+            | 'continue' ';'
+            | block
+            | expr_stmt
+if_tail     = 'if' expr block [ 'else' ( if_tail | block ) ]
+switch_stmt = 'switch' expr '{' { case_arm } [ default_arm ] '}'
+case_arm    = const_expr { ',' const_expr } block
+default_arm = 'default' block
+expr_stmt   = expr ( '=' expr ';' | ';' )
+```
+
+- **No parentheses on conditions.** The expression ends at the opening `{` of
+  the block because `{` is never a valid continuation of an expression.
+- Braces are **mandatory** on every `if`, `else`, `while`. Dangling-else is
+  impossible.
+- **Assignment is a statement, not an expression.** No chained assignment,
+  no assignment inside conditions. `=` vs `==` confusion at the statement
+  level is caught by grammar.
+- No `for`, no ternary, no comma operator, no compound assignment
+  (`+=` etc.), no `++`/`--`.
+- Local variables are **uninitialized** unless `= expr` is given. Local
+  initializers may be any expression, not just constants.
+- Scalar globals (`int`, `byte`, `*T`, `fn(...)...`): initializer must be a
+  constant expression; zero-initialized if omitted.
+- Aggregate globals (arrays, structs, slices): **zero-initialized only**;
+  there is no non-zero initializer syntax. Populate at program start if
+  needed.
+
+### Switch
+
+```
+switch tok.kind {
+    TK_PLUS, TK_MINUS { return parse_add(tok); }
+    TK_STAR           { return parse_mul(tok); }
+    TK_LPAREN         { return parse_group(tok); }
+    default           { return parse_error(tok); }
+}
+```
+
+- The scrutinee is any integer expression (`int` or `byte`).
+- Case labels must be compile-time integer constants. Multiple labels per
+  arm are comma-separated. All labels across a `switch` must be distinct.
+- Each arm is a mandatory `{}` block. **No fallthrough.**
+- `default` is optional.
+  With no default and no matching case, the `switch` has no effect.
+- `break` and `continue` inside an arm refer to the enclosing `while`, not
+  the `switch`. `return` returns from the function as usual.
+
+## Expressions
+
+Eight precedence levels. Where a level is marked **non-chainable**, the
+operator may appear at most once; mixing with the surrounding levels requires
+explicit parentheses. This is how we kill the classic C precedence traps
+(`a & mask == 0` meaning `a & (mask == 0)`, etc.) with zero runtime cost and
+a trivial parser: each non-chainable level is a one-shot `[ OP operand ]`
+rather than a loop.
+
+```
+expr     = logor
+logor    = logand  { '||' logand }       // chainable with itself
+logand   = compare { '&&' compare }      // chainable with itself
+compare  = bitwise [ CMPOP bitwise ]     // non-chainable
+bitwise  = shift   [ BITOP shift ]       // non-chainable
+shift    = addsub  [ SHIFT addsub ]      // non-chainable
+addsub   = muldiv  { ('+'|'-') muldiv }  // left-assoc chainable
+muldiv   = unary   { MULOP unary }       // left-assoc chainable
+unary    = ('-' | '!' | '~' | '*' | '&') unary
+         | postfix
+postfix  = primary { '(' args ')' | '[' expr ']'
+                   | '[' [ expr ] '..' [ expr ] ']'
+                   | '.'
+                     IDENT | 'as' type }
+primary  = INT | CHAR | STRING | 'null' | IDENT
+         | '(' expr ')'
+         | 'sizeof' '(' type ')'
+args     = [ expr { ',' expr } ]
+
+MULOP    = '*' | '/' | '%' | '/u' | '%u'
+SHIFT    = '<<' | '>>' | '>>u'
+BITOP    = '&' | '|' | '^'
+CMPOP    = '==' | '!=' | '<' | '<=' | '>' | '>='
+         | '<u' | '<=u' | '>u' | '>=u'
+```
+
+Concretely, these require parentheses:
+
+```
+a & b | c          // error: mixing & and |
+a << b + c         // error: mixing shift and add
+a == b == c        // error: chained comparison
+a < b < c          // error: chained comparison
+a && b || c        // error: mixing && and ||
+a & mask == 0      // error: mixing bitwise and compare
+```
+
+Write instead:
+
+```
+(a & b) | c
+a << (b + c)
+(a == b) && (b == c)
+(a < b) && (b < c)
+(a && b) || c
+(a & mask) == 0
+```
+
+Chaining is allowed within the arithmetic levels (`a + b - c + d`,
+`a * b / c`) and within `&&` and `||` individually (`a && b && c`,
+`x || y || z`). Mixing `&&` and `||` still requires parentheses.
+
+### Signed vs unsigned
+
+Operators are **signed by default**. Unsigned variants are distinct tokens:
+
+| Signed | Unsigned | Meaning                  |
+|--------|----------|--------------------------|
+| `/`    | `/u`     | division                 |
+| `%`    | `%u`     | remainder                |
+| `>>`   | `>>u`    | right shift (arith/log)  |
+| `<`    | `<u`     | less than                |
+| `<=`   | `<=u`    | less or equal            |
+| `>`    | `>u`     | greater than             |
+| `>=`   | `>=u`    | greater or equal         |
+
+`+`, `-`, `*`, `==`, `!=`, `<<`, `&`, `|`, `^`, `~` have identical behavior in
+two's complement and so have no signed/unsigned split.
+
+Signed overflow **wraps** (defined, not UB). Shift by >= word width is
+undefined.
+
+### Booleans
+
+No boolean type. Zero is false, non-zero is true. Comparisons and `!` yield
+0 or 1. `&&` and `||` short-circuit via branches.
+
+### Lvalues
+
+Exactly: `IDENT` (naming a variable), `*expr`, `lv.field`, `lv[expr]`.
+
+Field access **auto-dereferences pointers**: if `lv` has type `*S`, then
+`lv.field` means `(*lv).field`.
+This chains as needed: `p.a.b` where `p: *A`
+and `A.a: *B` means `(*(*p).a).b`. There is no separate `->` operator.
+
+`lv[expr]` requires the base to be a pointer, an array, or a slice.
+Element-size scaling is applied; see Pointer arithmetic.
+
+### `sizeof`
+
+`sizeof(T)` takes a type, never an expression. Result is `int`, compile-time
+constant.
+
+### Casts
+
+`expr as T` is postfix. `(T)expr` is **not** a cast; parentheses are only
+for grouping. This removes the only genuine LL(1) hazard in C.
+
+### Pointer arithmetic
+
+When one operand of `+` or `-` is a pointer `*T`:
+
+- `*T + int` and `int + *T` yield `*T`, advancing by `sizeof(T)` bytes per
+  unit.
+- `*T - int` yields `*T`, retreating by `sizeof(T)` bytes per unit.
+- `*T - *T` (same pointee type) yields `int`, the signed element-count
+  difference.
+- Pointer + pointer is not allowed.
+
+Indexing desugars to this arithmetic:
+
+- `p[i]` with `p: *T` is `*(p + i)`.
+- `arr[i]` with `arr: [N]T` is `*(&arr + i)` (since `&arr` has type `*T`).
+- `s[i]` with `s: []T` is `*(s.ptr + i)`.
+
+### Function values
+
+A bare function name is its own function pointer; no `&` is required. If
+`my_func` has type `fn(int) int`, then `my_func` is directly usable wherever
+an `fn(int) int` value is expected, and `my_func(42)` calls it. Writing
+`&my_func` is legal and yields the same pointer.
+
+## Calling convention and ABI notes
+
+- Arguments passed by value, evaluated left to right (pushed right to left
+  in typical stack-based targets).
+- Returns are word-sized: `int`, any `*T`, or `byte` (returned as `int`
+  with zero-extension).
+- **Structs never cross function boundaries by value.** Pass `*S`, return
+  `*S`, or use an out-parameter.
+- **Slices (`[]T`) do cross by value**: always two words (pointer + length),
+  in a register pair or adjacent stack slots. This is the sole exception to
+  the one-word return / no-aggregate-by-value rule.
+- No varargs. To print multiple values, call multiple helpers.
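The pointer-arithmetic rules in the section above reduce to scaling by `sizeof(T)`. A small Python model with addresses as plain integers; the function names, the example address, and the 8-byte `int` size are assumptions chosen for illustration (the document only says `int` is a machine word).

```python
def ptr_add(addr: int, n: int, elem_size: int) -> int:
    """*T + int: advance by n elements, i.e. n * sizeof(T) bytes."""
    return addr + n * elem_size


def ptr_sub(addr: int, n: int, elem_size: int) -> int:
    """*T - int: retreat by n elements."""
    return addr - n * elem_size


def ptr_diff(a: int, b: int, elem_size: int) -> int:
    """*T - *T: signed element-count difference between two addresses."""
    return (a - b) // elem_size


SIZEOF_INT = 8                          # assumed word size for this example
base = 0x1000                           # address of arr[0] for some [N]int
third = ptr_add(base, 3, SIZEOF_INT)    # address of arr[3], i.e. &arr + 3
```

Indexing `arr[3]` would then be a load from `third`, which is why the three indexing forms listed above need no separate code-generation path.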
+
+## Preprocessor
+
+Only `#include "path"` is supported. Inclusion is textual but
+**idempotent per resolved path**: a file already included in the current
+compilation is silently skipped on subsequent `#include`s. No include
+guards needed, no macros, no conditional compilation, no `#define`. Named
+integer constants go in `const`; type aliases go in `type`.
+
+## Example
+
+```
+#include "io.lang"
+
+pub type Node = struct {
+    next *Node;
+    value int;
+};
+
+fn list_len(head *Node) int;     // forward decl
+
+pub fn list_sum(head *Node) int {
+    var total int = 0;
+    var p *Node = head;
+    while p != null {
+        total = total + p.value;
+        p = p.next;
+    }
+    return total;
+}
+
+fn list_len(head *Node) int {
+    var n int = 0;
+    var p *Node = head;
+    while p != null {
+        n = n + 1;
+        p = p.next;
+    }
+    return n;
+}
+
+pub fn main() int {
+    var nodes [3]Node;
+    nodes[0].value = 10; nodes[0].next = &nodes[1];
+    nodes[1].value = 20; nodes[1].next = &nodes[2];
+    nodes[2].value = 30; nodes[2].next = null;
+
+    var sum int = list_sum(&nodes[0]);
+    if (sum > 0) && (sum <u 1000) {
+        put_int(sum);
+    }
+    return 0;
+}
+```
+
+## Dropped from C
+
+For reference, these are intentionally absent:
+
+`float`, `double`, `long double`, `complex`, `short`, `long`, `long long`,
+`unsigned` as a type (use signed types with unsigned operators), `enum`
+(use `const`), `union`, bitfields, C's `const` as a type qualifier (the
+keyword is reused for named integer constants), `volatile`, `restrict`,
+`static` (replaced by file-local default + `pub`), `typedef` (replaced by
+`type`), K&R function syntax, variadic functions, designated initializers,
+compound literals, `for`, `do`/`while`, ternary `?:`, comma operator,
+compound assignment, `++`/`--`, block comments, pre-processor macros,
+implicit conversions, array and function decay, struct/array pass-by-value,
+C's `switch` (the keyword is reused with no `case`, no fallthrough, and
+mandatory-block arms).
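The idempotent `#include` rule from the Preprocessor section can be sketched as a set of already-seen resolved paths. This Python model is an illustration under assumptions: `expand` and its in-memory `files` mapping are hypothetical, includes are assumed to resolve relative to the including file, and `posixpath.normpath` stands in for whatever path resolution the real compiler performs.

```python
import posixpath
import re

INCLUDE_RE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand(path, files, seen=None):
    """Return the text of `path` with each include expanded at most once.

    `files` maps normalized paths to source text (a stand-in for the
    filesystem); `seen` holds the resolved paths already included.
    """
    seen = set() if seen is None else seen
    path = posixpath.normpath(path)
    if path in seen:
        return ""                     # already included: silently skipped
    seen.add(path)
    out = []
    for line in files[path].splitlines():
        m = INCLUDE_RE.match(line)
        if m:
            # Assumption: resolve relative to the including file's directory.
            target = posixpath.join(posixpath.dirname(path), m.group(1))
            out.append(expand(target, files, seen))
        else:
            out.append(line + "\n")
    return "".join(out)


files = {
    "main.lang": '#include "io.lang"\n#include "./io.lang"\nfn main() int;\n',
    "io.lang": "extern fn put_int(n int) int;\n",
}
text = expand("main.lang", files)
```

Both spellings of the path normalize to `io.lang`, so the second `#include` contributes nothing and no include guard is needed.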
diff --git a/SEED.md b/SEED.md
@@ -0,0 +1,289 @@
+# Seed userland: the pre-tcc-boot tools
+
+## Goal
+
+Bridge the window between *Lisp exists* and *tcc-boot exists* without
+touching M2-Planet, Mes, or MesCC. Inside that window, all code is
+either a Lisp program running on the Lisp interpreter or subcommands
+of a single monolithic C1 binary (`seed`) compiled through the
+Lisp-hosted C1 compiler → P1 → M1 → hex2 pipeline.
+
+This document covers only that window. Phases before it (`seed0 →
+hex0/hex1/hex2 → M1`, P1 defs, Lisp interpreter) are documented in
+`P1.md` and `PLAN.md`. tcc-boot itself and everything downstream are
+standard C and out of scope.
+
+## Position in the chain
+
+```
+stage0-posix:   seed0 → hex0 → hex1 → hex2 → M1           (no C, no Lisp)
+P1 layer:       P1 defs files load into M1                (P1.md)
+Lisp:           P1 text (Lisp interp source) → M1 → hex2  (PLAN.md)
+C1 compiler:    Lisp program, loaded into the Lisp image  (this doc)
+──────── seed window begins here ────────
+seed binary:    C1 source → Lisp+C1cc → P1 text → M1 → hex2  (this doc)
+C compiler:     Lisp program, loaded into the Lisp image     (PLAN.md)
+──────── seed window ends when tcc-boot is built ────────
+tcc-boot:       C source → Lisp+Ccc → P1 text → M1 → hex2    (PLAN.md)
+```
+
+Two Lisp programs (C1 compiler, C compiler) and one statically-linked
+C1 binary. No M2-Planet artifact and no Mes Scheme module anywhere.
+
+## Settled decisions
+
+These are load-bearing; the rest of the document assumes them.
+
+1. **C1 targets P1.** One C1 source per subcommand, tri-arch binary
+   via the existing M1+hex2 path. Accepts P1's ~2× code-size tax.
+2. **C1 compiler lives in Lisp.** Same host as the C compiler; shares
+   the Lisp runtime. ~1.5–2.5k LOC Lisp, counted against `PLAN.md`.
+3. **Monolithic `seed` binary.** One executable with subcommand
+   dispatch on `argv[1]` (e.g. `seed kaem script.kaem`, `seed cat
+   file`, `seed cp a b`). One audit unit, one copy of the runtime,
+   no loader.
+   Bug blast radius is the whole seed userland, mitigated
+   by keeping each subcommand self-contained and tested in isolation.
+4. **Uncompressed tcc-boot mirror.** Host the upstream tcc-boot source
+   as an uncompressed `.tar` with sha256 pinned. No gzip support
+   anywhere in the seed stage. Deletes ~1000–1500 LOC of deflate from
+   the audit.
+5. **Explicit patches via `seed patch-apply`.** Upstream source stays
+   verbatim. Our changes live as unified-diff files in this repo,
+   applied by a ~200 LOC C1 subcommand. "Upstream vs ours" stays
+   legible.
+6. **fork + execve for process spawn.** Simplest kernel contract,
+   stable syscall numbers on all three arches. Plus `wait4` to
+   reap children. No clone, vfork, or posix_spawn.
+7. **Target self-build is primary; cross-build is a cache.** The
+   canonical build is a fresh target machine bootstrapping from
+   stage0-posix hex seed. Cross-built per-arch tarballs are supported
+   as a reproducibility cache: identical bytes expected, verified
+   against a target self-build, not trusted by assumption.
+
+## The `seed` binary
+
+One ELF per arch, invoked as `seed <subcommand> [args...]`. Internal
+dispatch table maps `argv[1]` to a function; unknown subcommands error
+out. Startup shim parses `argc/argv`, calls the dispatch function,
+propagates its return code to `exit`.
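The `argv[1]` dispatch described above can be sketched as a table lookup. The real binary is C1; this Python model only shows the control flow, and the handler names, the usage message, and the exit code `2` for unknown subcommands are assumptions.

```python
import sys


def cmd_echo(args, out):
    """`seed echo`: write the arguments to stdout, space-separated."""
    out.write(" ".join(args) + "\n")
    return 0


def cmd_cat(args, out):
    """`seed cat`: concatenate the named files to stdout."""
    for name in args:
        with open(name, "r") as f:
            out.write(f.read())
    return 0


# Flat table: subcommand name -> handler. No aliases, no nesting.
DISPATCH = {"echo": cmd_echo, "cat": cmd_cat}


def seed_main(argv, out=sys.stdout, err=sys.stderr):
    """Look up argv[1] and return the handler's exit code."""
    if len(argv) < 2 or argv[1] not in DISPATCH:
        err.write("usage: seed <subcommand> [args...]\n")
        return 2                      # assumed error code for unknown subcommands
    return DISPATCH[argv[1]](argv[2:], out)
```

A startup shim in this model would be `sys.exit(seed_main(sys.argv))`, matching the "propagate the return code to `exit`" step.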
+
+### Subcommands
+
+| Subcommand               | Purpose                                      | C1 LOC         |
+|--------------------------|----------------------------------------------|----------------|
+| `kaem`                   | shell driving the tcc-boot build             | 700–900        |
+| `untar`                  | POSIX ustar extract (no gzip, no creation)   | 500–700        |
+| `patch-apply`            | apply a unified diff in-place                | ~200           |
+| `sha256sum`              | verify source tarball hashes                 | 500–700        |
+| `cp`                     | copy one file                                | ~150           |
+| `mkdir`                  | single-level directory create                | ~80            |
+| `rm`                     | remove one file (no `-r`, no `-f`)           | ~120           |
+| `mv`                     | rename within one filesystem                 | ~150           |
+| `cat`                    | concatenate files to stdout                  | ~80            |
+| `test`                   | file and string predicates for kaem          | ~280           |
+| `echo`                   | write args to stdout                         | ~50            |
+| dispatch + argv plumbing | top-level `main`, subcommand table           | ~100           |
+| C1 runtime + mini libc   | startup, syscalls, memcpy/memset/str*        | ~400           |
+| **Total**                |                                              | **~3310–3910** |
+
+Dispatch is flat: there is no nesting, no aliases, no argv[0]-based
+dispatch. Kaem scripts write out `seed <sub>` in full. One installed
+file on disk, no symlinks, no `link` syscall needed.
+
+### Kaem feature set
+
+Line-oriented minimal shell:
+
+- One command per line. No `;`, no `&&`, no `||`.
+- Command = word (built-in or path) + whitespace-separated args.
+  Quoting: `"..."` is one arg, with `\n \t \\ \"` escapes. No
+  single-quote form.
+- Variable substitution: `${NAME}` from environment only.
+- Built-ins: `cd` (via `chdir` syscall), `set NAME=VALUE` (env),
+  `exit`.
+- Redirection: `> file` (truncate) and `< file` (stdin). No append,
+  no pipes.
+- Failure: non-zero exit from any command aborts the script.
+- Comments: `#` to end of line.
+- Expansion excluded: globbing, command substitution, arithmetic,
+  here-docs, background jobs.
+
+That suffices to express "unpack → verify → compile each file → link
+→ install." Orchestration lives in kaem text, not C1.
+
+## Syscall surface
+
+Combined with PLAN.md's compiler surface, the seed window requires
+**13 syscalls** total.
+Each gets one row in every `p1_<arch>.M1` defs file.
+
+| Syscall    | Used by                                                         |
+|------------|-----------------------------------------------------------------|
+| `read`     | all file-reading subcommands, Lisp I/O                          |
+| `write`    | stdout/stderr, all file-writing                                 |
+| `open`     | file open (`O_RDONLY` / `O_WRONLY\|O_CREAT\|O_TRUNC` with mode) |
+| `close`    | all file ops                                                    |
+| `exit`     | program termination                                             |
+| `fork`     | kaem child spawn                                                |
+| `execve`   | kaem child spawn                                                |
+| `wait4`    | kaem reaping children                                           |
+| `mkdir`    | `seed mkdir`, `untar` (directory entries)                       |
+| `unlink`   | `seed rm`                                                       |
+| `rename`   | `seed mv`                                                       |
+| `access`   | `seed test` (file predicates)                                   |
+| `chdir`    | kaem `cd` builtin                                               |
+
+Bumps `PLAN.md`'s "five syscalls" contract to 13 (includes `chdir`);
+PLAN.md should be cross-referenced to this list, not restated
+independently. Deliberately excluded: `stat/fstat` (use `access`
+instead), `chmod` (rely on `open` mode bits for initial perms),
+`lseek` (all reads are sequential), `getdents`/`readdir` (no
+directory traversal needed), `dup`/`pipe`/signals/time/net.
+
+### How C1 reaches syscalls
+
+C1 has no inline asm and no intrinsics. Each syscall is exposed as an
+ordinary `extern fn` declaration, backed by a hand-written P1 stub in
+`runtime.p1`. The stubs are ~3 P1 ops each (load number, `SYSCALL`,
+`RET`), totalling ~40 lines of P1 for the whole surface.
+
+```
+:sys_write                 ; C1 args arrive in P1 r1-r6 per call ABI
+    SYSCALL write          ; expands per-arch via p1_<arch>.M1 defs
+    RET
+```
+
+```
+extern fn sys_write(fd int, buf *byte, n int) int;
+```
+
+Prerequisite: P1 picks its argument registers (`r1–r6`) to coincide
+with the native syscall arg registers on each arch (`rdi/rsi/…`,
+`x0–x5`, `a0–a5`), so stubs need no register shuffling beyond what
+`SYSCALL` already does. Confirm this in `P1.md` during implementation.
+
+Return convention: Linux returns `-errno` (values in `-1..-4095`) in
+the result register.
+Wrappers return the raw integer; callers test `r >=u -4095` to detect
+failure (the error range `-1..-4095` sits at the very top of the
+unsigned range, at any word width) and abort with a message. No
+`errno` global, no per-tool error recovery.
+
+## Build ordering inside the seed window
+
+Once the Lisp interpreter binary exists and the C1 compiler Lisp
+source is loaded:
+
+1. Compile the `seed` monolith: one C1 source file (or small set
+   `#include`d into one translation unit, since C1's preprocessor
+   supports `#include` only) → P1 text → M1 → hex2 → `seed` ELF.
+   Per-arch, repeat for each target.
+2. Install `seed` on the target (copy to a known path). No other
+   setup required.
+
+The tcc-boot build then runs as kaem scripts:
+
+1. `seed sha256sum upstream.tar` against pinned hash.
+2. `seed untar upstream.tar`.
+3. For each patch file: `seed patch-apply patches/foo.diff`.
+4. Loop over tcc-boot `.c` files, invoking Lisp-as-C-compiler to
+   emit P1 text, then M1+hex2 to produce per-object files or a
+   single linked binary. (tcc-boot's build is simple enough to
+   treat as one compilation unit; the loop is unrolled in kaem.)
+5. Install tcc-boot binary.
+
+Seed window is closed.
+
+## Target self-build vs cross-build
+
+**Target self-build (primary).** A fresh machine of arch `A` starts
+from the stage0-posix hex seed, runs the hex0→hex1→hex2→M1 chain,
+loads `p1_A.M1`, assembles the Lisp interpreter, loads the C1
+compiler into Lisp, compiles `seed`, runs the tcc-boot build. Whole
+process is a kaem script (bootstrapped from a hand-assembled first
+kaem, same way hex2 and M1 are) driving the toolchain.
+
+**Cross-build cache (secondary).** On an already-bootstrapped
+machine, produce `seed` binaries for all three arches and ship them
+as tarballs. Users who opt into this skip the target self-build and
+land directly at "seed installed." Trust claim: **none by
+assumption**: the cache is only trusted after a target self-build
+of at least one arch has verified byte-identical output. Cross-build
+is an optimization, not a trust input.
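The trust rule for the cross-build cache amounts to a byte-for-byte comparison against a self-built binary. A minimal Python sketch of that check follows; the function names and the paths passed in are hypothetical, and on the bootstrap target itself the same comparison would presumably go through `seed sha256sum` rather than Python.

```python
import hashlib


def sha256_file(path, chunk_size=65536):
    """Hash a file sequentially, one chunk at a time."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            h.update(chunk)
    return h.hexdigest()


def cache_is_trusted(self_built_path, cross_built_path):
    """Accept the cross-built artifact only if it matches the self-build exactly."""
    return sha256_file(self_built_path) == sha256_file(cross_built_path)
```

Comparing digests rather than raw bytes keeps the check usable when the two builds live on different machines and only the hash is exchanged.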
+
+## Provenance
+
+Four kinds of artifact flow in:
+
+- **stage0-posix hex seed + P1 defs**: part of this repo, audited
+  with the rest of it.
+- **Lisp interpreter source (in P1)**: part of this repo.
+- **C1 sources for `seed` + the C1 compiler + C compiler (in Lisp)**:
+  part of this repo.
+- **Upstream tcc-boot source**: mirrored as uncompressed `.tar` at
+  a pinned URL + sha256. The mirror file is one of this repo's
+  auditable inputs; it can be re-derived from upstream by untaring
+  and retaring in a canonical form, or checked against upstream's
+  published `.tar.gz` by re-gzipping and comparing hashes on a
+  machine that has `gzip` (done once, out of band).
+
+`seed sha256sum` is the single piece of C1 whose correctness has a
+direct trust consequence downstream; unit-test it against known
+vectors (empty string, "abc", "abcdbcde..."-length tests) before
+declaring the seed build complete.
+
+## Interaction with tcc-boot
+
+tcc-boot expects a build environment roughly like `cc + make + sh +
+coreutils`. Mapping:
+
+| tcc-boot expects | Seed provides                                    |
+|------------------|--------------------------------------------------|
+| `cc` / `gcc`     | kaem loop invoking Lisp-as-C-compiler per `.c`   |
+| `make`           | flat kaem script (tcc-boot is simple enough)     |
+| `sh`             | `seed kaem`                                      |
+| `cat`/`cp`/etc.  | `seed <sub>`                                     |
+| `ld`             | tcc-boot's built-in linker (for its own output)  |
+| `ar`             | not needed; tcc-boot builds one static binary    |
+
+A thin shim script under `scripts/` maps tcc-boot's literal command
+names (`cc`, `make`, `install`) to the `seed <sub>` / Lisp-invocation
+forms. That shim is kaem text, not C1.
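The known-answer tests mentioned under Provenance could start from the standard SHA-256 example messages: empty input, `abc`, and the 56-byte message beginning `abcdbcde` (presumably the one the text abbreviates). In this Python sketch `hashlib` stands in for the implementation under test; the `check_vectors` helper is hypothetical, and a real harness would run `seed sha256sum` on files holding these inputs instead.

```python
import hashlib

# Standard SHA-256 known-answer vectors (the usual NIST example messages).
VECTORS = [
    (b"", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
    (b"abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
    (b"abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
     "248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1"),
]


def check_vectors(sha256_hex):
    """Return the inputs for which `sha256_hex(data)` gives the wrong digest."""
    return [data for data, expected in VECTORS if sha256_hex(data) != expected]


# hashlib is the reference here; swap in the implementation under test.
failures = check_vectors(lambda data: hashlib.sha256(data).hexdigest())
```

The 56-byte message is the interesting one: it does not fit in a single 64-byte block once padding and the length field are added, so it exercises the two-block path.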
+
+## Budget rollup
+
+Fresh auditable LOC introduced by this document, on top of PLAN.md:
+
+| Layer                                         | LOC              |
+|-----------------------------------------------|------------------|
+| C1 compiler (Lisp, counted in PLAN.md)        | (1,500–2,500)    |
+| `seed` monolith (all subcommands + runtime)   | 3,300–3,900      |
+| kaem scripts (orchestration, driver)          | a few hundred    |
+| **Seed window addition**                      | **~3,300–3,900** |
+
+Combined PLAN.md + SEED.md audit surface: **~13–17k LOC**, tri-arch,
+M2-Planet-free and Mes-free.
+
+## Handoff notes for the engineer
+
+Approximate build order for implementation:
+
+1. **C1 compiler in Lisp** (blocks everything below). Write against
+   a small corpus of C1 test programs. Validate by compiling a
+   20–50 LOC C1 program, running the output, confirming behavior.
+2. **C1 runtime + syscall wrappers + mini libc.** Smallest
+   subcommand (`echo` or `cat`) is the bring-up test.
+3. **`seed` dispatch skeleton** plus `echo`, `cat`, `cp`, `mkdir`,
+   `rm`, `mv`. Small, independent, easy to unit-test.
+4. **`sha256sum`** with unit tests before anything depends on its
+   correctness.
+5. **`test`** (file predicates needed by kaem).
+6. **`untar`** (ustar extract only).
+7. **`patch-apply`** (unified-diff in-place).
+8. **`kaem`** (depends on `fork`, `execve`, `wait4`, `chdir`,
+   redirect).
+9. **End-to-end bring-up**: kaem script running `sha256sum` →
+   `untar` → `patch-apply` → Lisp-C-compile loop → linked
+   tcc-boot. First full trip through the seed window.
+
+Each step compiles standalone C1 and assembles through the existing
+P1 → M1 → hex2 path; no new tooling infrastructure is needed
+between steps.