commit c049a3b304a6f4291d0903294a972db245f5d02c
parent f1f725af88ed3d7b2eba1997e8062c362de159f1
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Tue, 21 Apr 2026 05:33:19 -0700
C1 and SEED docs
Diffstat:
| A | C1.md | | | 486 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | SEED.md | | | 289 | ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
2 files changed, 775 insertions(+), 0 deletions(-)
diff --git a/C1.md b/C1.md
@@ -0,0 +1,486 @@
+# Bootstrap C-like Language
+
+A minimal C-like language tuned for trivial one-pass compilation. Strict LL(1)
+grammar, recursive-descent parseable with one token of lookahead, no semantic
+feedback into the parser. Integer-only, two's complement, word-sized.
+
+## Goals and non-goals
+
+Goals:
+
+- **LL(1) grammar.** Hand-written recursive descent with one-token lookahead.
+ No symbol table needed for parsing.
+- **One-pass compilation.** Source order determines visibility; forward
+ declarations are used for anything defined later.
+- **One integer width.** `int` (machine word) and `byte` (8 bits, memory only).
+ No promotions, no rank, no integer zoo.
+- **No type-directed parsing.** The tokenizer and parser never ask "is this
+ identifier a type?"
+- **Trivial code generation.** Tree walker emitting stack-machine-style output
+ is sufficient. No register allocation required.
+- **Explicit over implicit.** Ambiguous precedence combinations require
+ parentheses.
+
+Non-goals: ergonomics, expressiveness, optimization, source compatibility with
+C.
+
+## Lexical structure
+
+- **Identifiers:** `[A-Za-z_][A-Za-z0-9_]*`.
+- **Integers:** decimal `123`, hex `0x7F`, character `'a'` with escapes
+ `\n \t \r \0 \\ \' \"`.
+- **Strings:** `"..."` with the same escapes. Type is `[]byte` — a slice
+ with `.ptr` into static read-only storage and `.len` equal to the number
+ of source bytes (no null terminator in the length, though a trailing
+ `'\0'` byte is present for interop with C-style APIs).
+- **Comments:** `// ...` to end of line. No block comments.
+- **Keywords:** `var const fn type struct return if else while break continue
+ switch default pub extern as sizeof null int byte`.
+- **Operators:** listed in the expression grammar below.
+
+## Top-level structure
+
+```
+program = { toplevel } EOF
+toplevel = [ 'pub' ] ( fn_def | var_decl | type_decl | fn_decl | const_decl )
+ | 'extern' ( var_decl | fn_decl )
+fn_def = 'fn' IDENT '(' params ')' type block
+fn_decl = 'fn' IDENT '(' params ')' type ';'
+var_decl = 'var' IDENT type [ '=' const_expr ] ';'
+type_decl = 'type' IDENT [ '=' type ] ';' // no '=' means forward decl
+const_decl = 'const' IDENT '=' const_expr ';'
+params = [ param { ',' param } ]
+param = IDENT type
+```
+
+### Visibility
+
+All top-level names are **file-local by default**. Prefix with `pub` to export
+from the translation unit. `extern` declares a name defined in another unit
+and is always non-`pub` (it is a local reference to an external symbol).
+
+```
+var counter int; // file-local
+pub var version int = 1; // exported
+extern fn write(fd int, buf []byte) int; // defined elsewhere
+```
+
+### Forward declarations
+
+Because the compiler is one-pass, every name must be declared before use.
+Functions defined later in the same file need a forward `fn` declaration
+(signature with `;` instead of a body). Same for globals, if needed.
+
+```
+fn main() int; // forward decl; body appears later
+
+fn helper() int {
+ return main(); // legal because of the forward decl
+}
+
+fn main() int {
+ return 0;
+}
+```
+
+Forward declarations and definitions must agree on signature exactly.
+
+### Constants
+
+```
+const MAX_TOKENS = 1024;
+pub const AST_ADD = 1;
+pub const AST_SUB = AST_ADD + 1;
+```
+
+`const` binds a name to a compile-time integer. The right-hand side must be
+a constant integer expression — integer/character literals, earlier `const`
+names, `sizeof(T)`, and the arithmetic/bitwise/shift/compare operators
+applied to those. Constants have type `int` and can be used anywhere an
+`int` is expected, including array sizes, `switch` case labels, and `var`
+or `const` initializers.
+
+Constants occupy no storage and cannot be addressed with `&`. A `const` is
+local to the file unless marked `pub`.
+
+## Types
+
+Prefix constructors; read left-to-right.
+
+```
+type = 'int' | 'byte'
+ | '*' type
+ | '[' INT ']' type // array
+ | '[' ']' type // slice
+ | 'struct' '{' { IDENT type ';' } '}'
+ | 'fn' '(' typelist ')' type
+ | IDENT // named type
+typelist = [ type { ',' type } ]
+```
+
+Examples:
+
+| Type | Meaning |
+|------------------------------|-------------------------------------------|
+| `int` | signed machine word |
+| `byte` | 8-bit value, memory only |
+| `*int` | pointer to int |
+| `[10]int` | array of 10 ints |
+| `[]int` | slice of int (pointer + length) |
+| `*[10]*int` | pointer to array of 10 pointers to int |
+| `fn(int, int) int` | function pointer |
+| `struct { x int; y int; }` | anonymous struct type |
+
+### Named types
+
+```
+type Point = struct { x int; y int; };
+type NodePtr = *Node;
+type Node = struct { next *Node; value int; };
+```
+
+Self-reference through a pointer works in one pass because the pointer's
+pointee type is just a name at parse time. Mutual recursion across two `type`
+declarations requires a forward `type Name;` declaration (empty body) followed
+by the definition.
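+
+As a sketch of that forward-declaration pattern (the names `Expr` and
+`Stmt` are hypothetical, chosen only for illustration), two mutually
+recursive types could be declared as:
+
+```
+type Expr;                                        // forward decl: no '='
+type Stmt = struct { cond *Expr; next *Stmt; };   // may now mention Expr
+type Expr = struct { body *Stmt; value int; };    // the definition
+```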
+
+### Slices
+
+A slice `[]T` is a two-word value: a pointer and a length. It is laid out
+as if declared:
+
+```
+struct { ptr *T; len int; }
+```
+
+with guaranteed field order and the members accessible as `.ptr` and `.len`.
+Unlike other aggregate types, **slices pass and return by value** — they are
+always exactly two words.
+
+Construct a slice from an array or another slice with slice syntax:
+
+```
+var buf [256]byte;
+var s1 []byte = buf[..]; // whole array
+var s2 []byte = buf[0..16]; // first 16 bytes
+var s3 []byte = s1[4..]; // from index 4 to end
+var s4 []byte = s1[..10]; // first 10
+```
+
+Or assemble one by assigning the fields directly:
+
+```
+var s []int;
+s.ptr = &arr[0];
+s.len = 10;
+```
+
+Indexing `s[i]` is sugar for `*(s.ptr + i)` with element-size scaling and is
+an lvalue. Taking `&s[i]` yields `*T`. There is no bounds checking.
+
+Slicing is allowed on arrays and on slices only. `expr[i..j]` requires
+`i <= j` and yields a slice of length `j - i`. Omitted bounds default to `0`
+(low) and the base's length (high). Slicing a bare `*T` is not supported —
+set `.ptr` and `.len` manually instead.
+
+Slices **cannot be compared by value.** `s1 == s2` is an error. Compare
+`s1.ptr == s2.ptr` and `s1.len == s2.len` if you mean identity, or write
+a byte-by-byte equality helper if you mean content.
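+
+A content-equality helper along those lines might look like the sketch
+below. The name `bytes_eq` is hypothetical, and the casts to `int` are an
+assumption that keeps the comparison in `int` rather than relying on
+`byte`-to-`byte` comparison being accepted.
+
+```
+// Returns 1 if a and b hold the same bytes, 0 otherwise.
+fn bytes_eq(a []byte, b []byte) int {
+    if a.len != b.len {
+        return 0;
+    }
+    var i int = 0;
+    while i < a.len {
+        if (a[i] as int) != (b[i] as int) {
+            return 0;
+        }
+        i = i + 1;
+    }
+    return 1;
+}
+```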
+
+### No implicit conversions
+
+Every cross-type conversion goes through `as`:
+
+- `byte` ↔ `int`: explicit `as`. `byte as int` zero-extends; `int as byte`
+ truncates to the low 8 bits.
+- `*T` ↔ `*U`: explicit `as`.
+- `int` ↔ `*T`: explicit `as`.
+- `null` is assignable to any `*T` without `as` (the only exception).
+
+### No decay
+
+Arrays and structs do **not** decay or copy implicitly. To pass one to a
+function, take its address with `&`. `&arr` yields `*T` pointing at the
+first element (not `*[N]T`). `&s` on a struct yields `*S`.
+
+```
+var buf [256]byte;
+write(1, &buf, 256); // pass pointer to first byte
+
+var p Point;
+init_point(&p, 3, 4); // pass pointer to struct
+```
+
+Arrays cannot be assigned, returned, or passed by value. Structs cannot be
+assigned, returned, or passed by value. If you want to copy, call a helper.
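+
+Such a copy helper could be sketched as follows. `copy_bytes` and
+`copy_point` are hypothetical names, `Point` is the named struct from the
+Named types section, and the sketch assumes `*Point as *byte` is an
+accepted pointer cast.
+
+```
+// Copies n bytes from src to dst; returns n.
+fn copy_bytes(dst *byte, src *byte, n int) int {
+    var i int = 0;
+    while i < n {
+        dst[i] = src[i];
+        i = i + 1;
+    }
+    return n;
+}
+
+// Struct "assignment" spelled out as an explicit byte copy.
+fn copy_point(dst *Point, src *Point) int {
+    return copy_bytes(dst as *byte, src as *byte, sizeof(Point));
+}
+```
+
+Because `as` is a postfix operator, it binds tighter than unary `&`. A
+cast applied to an address taken in place therefore needs parentheses:
+`(&p) as *byte`.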
+
+## Statements
+
+```
+block = '{' { statement } '}'
+statement = var_decl
+ | 'if' expr block [ 'else' ( if_tail | block ) ]
+ | 'while' expr block
+ | switch_stmt
+ | 'return' [ expr ] ';'
+ | 'break' ';'
+ | 'continue' ';'
+ | block
+ | expr_stmt
+if_tail = 'if' expr block [ 'else' ( if_tail | block ) ]
+switch_stmt = 'switch' expr '{' { case_arm } [ default_arm ] '}'
+case_arm = const_expr { ',' const_expr } block
+default_arm = 'default' block
+expr_stmt = expr ( '=' expr ';' | ';' )
+```
+
+- **No parentheses on conditions.** The expression ends at the opening `{` of
+ the block because `{` is never a valid continuation of an expression.
+- Braces are **mandatory** on every `if`, `else`, `while`. Dangling-else is
+ impossible.
+- **Assignment is a statement, not an expression.** No chained assignment,
+ no assignment inside conditions. `=` vs `==` confusion at the statement
+ level is caught by grammar.
+- No `for`, no ternary, no comma operator, no compound assignment
+ (`+=` etc.), no `++`/`--`.
+- Local variables are **uninitialized** unless `= expr` is given. Local
+ initializers may be any expression, not just constants.
+- Scalar globals (`int`, `byte`, `*T`, `fn(...)...`): initializer must be a
+ constant expression; zero-initialized if omitted.
+- Aggregate globals (arrays, structs, slices): **zero-initialized only** —
+ there is no non-zero initializer syntax. Populate at program start if
+ needed.
+
+### Switch
+
+```
+switch tok.kind {
+ TK_PLUS, TK_MINUS { return parse_add(tok); }
+ TK_STAR { return parse_mul(tok); }
+ TK_LPAREN { return parse_group(tok); }
+ default { return parse_error(tok); }
+}
+```
+
+- The scrutinee is any integer expression (`int` or `byte`).
+- Case labels must be compile-time integer constants. Multiple labels per
+ arm are comma-separated. All labels across a `switch` must be distinct.
+- Each arm is a mandatory `{}` block. **No fallthrough.**
+- `default` is optional. With no default and no matching case, the `switch`
+ has no effect.
+- `break` and `continue` inside an arm refer to the enclosing `while`, not
+ the `switch`. `return` returns from the function as usual.
+
+## Expressions
+
+Eight precedence levels. Where a level is marked **non-chainable**, the
+operator may appear at most once; mixing with the surrounding levels requires
+explicit parentheses. This is how we kill the classic C precedence traps
+(`a & mask == 0` meaning `a & (mask == 0)`, etc.) with zero runtime cost and
+a trivial parser — each non-chainable level is a one-shot `[ OP operand ]`
+rather than a loop.
+
+```
+expr = logor
+logor = logand { '||' logand } // chainable with itself
+logand = compare { '&&' compare } // chainable with itself
+compare = bitwise [ CMPOP bitwise ] // non-chainable
+bitwise = shift [ BITOP shift ] // non-chainable
+shift = addsub [ SHIFT addsub ] // non-chainable
+addsub = muldiv { ('+'|'-') muldiv } // left-assoc chainable
+muldiv = unary { MULOP unary } // left-assoc chainable
+unary = ('-' | '!' | '~' | '*' | '&') unary
+ | postfix
+postfix = primary { '(' args ')' | '[' expr ']'
+ | '[' [ expr ] '..' [ expr ] ']'
+ | '.' IDENT | 'as' type }
+primary = INT | CHAR | STRING | 'null' | IDENT
+ | '(' expr ')'
+ | 'sizeof' '(' type ')'
+args = [ expr { ',' expr } ]
+
+MULOP = '*' | '/' | '%' | '/u' | '%u'
+SHIFT = '<<' | '>>' | '>>u'
+BITOP = '&' | '|' | '^'
+CMPOP = '==' | '!=' | '<' | '<=' | '>' | '>='
+ | '<u' | '<=u' | '>u' | '>=u'
+```
+
+Concretely, these require parentheses:
+
+```
+a & b | c // error: mixing & and |
+a << b + c // error: mixing shift and add
+a == b == c // error: chained comparison
+a < b < c // error: chained comparison
+a && b || c // error: mixing && and ||
+a & mask == 0 // error: mixing bitwise and compare
+```
+
+Write instead:
+
+```
+(a & b) | c
+a << (b + c)
+(a == b) && (b == c)
+(a < b) && (b < c)
+(a && b) || c
+(a & mask) == 0
+```
+
+Chaining is allowed within the arithmetic levels (`a + b - c + d`,
+`a * b / c`) and within `&&` and `||` individually (`a && b && c`,
+`x || y || z`). Mixing `&&` and `||` still requires parentheses.
+
+### Signed vs unsigned
+
+Operators are **signed by default**. Unsigned variants are distinct tokens:
+
+| Signed | Unsigned | Meaning |
+|--------|----------|--------------------------|
+| `/` | `/u` | division |
+| `%` | `%u` | remainder |
+| `>>` | `>>u` | right shift (arith/log) |
+| `<` | `<u` | less than |
+| `<=` | `<=u` | less or equal |
+| `>` | `>u` | greater than |
+| `>=` | `>=u` | greater or equal |
+
+`+`, `-`, `*`, `==`, `!=`, `<<`, `&`, `|`, `^`, `~` have identical behavior in
+two's complement and so have no signed/unsigned split.
+
+Signed overflow **wraps** (defined, not UB). Shift by >= word width is
+undefined.
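+
+A small sketch of how the split plays out on a negative value (the
+function name is hypothetical; the comments describe the results that
+two's-complement arithmetic gives):
+
+```
+fn signed_vs_unsigned() int {
+    var x int = -8;
+    var a int = x >> 1;     // arithmetic shift keeps the sign: -4
+    var b int = x >>u 1;    // logical shift brings in a zero bit: large positive
+    var c int = x < 1;      // signed compare: 1
+    var d int = x <u 1;     // unsigned compare sees x as a huge value: 0
+    return c - d;
+}
+```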
+
+### Booleans
+
+No boolean type. Zero is false, non-zero is true. Comparisons and `!` yield
+0 or 1. `&&` and `||` short-circuit via branches.
+
+### Lvalues
+
+Exactly: `IDENT` (naming a variable), `*expr`, `lv.field`, `lv[expr]`.
+
+Field access **auto-dereferences pointers**: if `lv` has type `*S`, then
+`lv.field` means `(*lv).field`. This chains as needed: `p.a.b` where `p: *A`
+and `A.a: *B` means `(*(*p).a).b`. There is no separate `->` operator.
+
+`lv[expr]` requires the base to be a pointer, an array, or a slice.
+Element-size scaling is applied — see Pointer arithmetic.
+
+### `sizeof`
+
+`sizeof(T)` takes a type, never an expression. Result is `int`, compile-time
+constant.
+
+### Casts
+
+`expr as T` is postfix. `(T)expr` is **not** a cast — parentheses are only
+for grouping. This removes the only genuine LL(1) hazard in C.
+
+### Pointer arithmetic
+
+When one operand of `+` or `-` is a pointer `*T`:
+
+- `*T + int` and `int + *T` yield `*T`, advancing by `sizeof(T)` bytes per
+ unit.
+- `*T - int` yields `*T`, retreating by `sizeof(T)` bytes per unit.
+- `*T - *T` (same pointee type) yields `int`, the signed element-count
+ difference.
+- Pointer + pointer is not allowed.
+
+Indexing desugars to this arithmetic:
+
+- `p[i]` with `p: *T` is `*(p + i)`.
+- `arr[i]` with `arr: [N]T` is `*(&arr + i)` (since `&arr` has type `*T`).
+- `s[i]` with `s: []T` is `*(s.ptr + i)`.
+
+### Function values
+
+A bare function name is its own function pointer — no `&` required. If
+`my_func` has type `fn(int) int`, then `my_func` is directly usable wherever
+an `fn(int) int` value is expected, and `my_func(42)` calls it. Writing
+`&my_func` is legal and yields the same pointer.
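+
+As an illustration (all function names here are hypothetical), a function
+value can be stored in a variable and passed as an argument:
+
+```
+fn twice(x int) int { return x + x; }
+fn negate(x int) int { return -x; }
+
+fn apply(f fn(int) int, x int) int {
+    return f(x);
+}
+
+fn demo() int {
+    var op fn(int) int = twice;     // bare name, no '&' needed
+    var r int = apply(op, 21);      // 42
+    r = apply(&negate, r);          // '&' is legal and yields the same pointer
+    return r;                       // -42
+}
+```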
+
+## Calling convention and ABI notes
+
+- Arguments passed by value, evaluated left to right (pushed right to left
+ in typical stack-based targets).
+- Returns are word-sized: `int`, any `*T`, or `byte` (returned as `int`
+ with zero-extension).
+- **Structs never cross function boundaries by value.** Pass `*S`, return
+ `*S`, or use an out-parameter.
+- **Slices (`[]T`) do cross by value** — always two words (pointer + length),
+ in a register pair or adjacent stack slots. This is the sole exception to
+ the one-word return / no-aggregate-by-value rule.
+- No varargs. To print multiple values, call multiple helpers.
+
+## Preprocessor
+
+Only `#include "path"` is supported. Inclusion is textual but
+**idempotent per resolved path** — a file already included in the current
+compilation is silently skipped on subsequent `#include`s. No include
+guards needed, no macros, no conditional compilation, no `#define`. Named
+integer constants go in `const`; type aliases go in `type`.
+
+## Example
+
+```
+#include "io.lang"
+
+pub type Node = struct {
+ next *Node;
+ value int;
+};
+
+fn list_len(head *Node) int; // forward decl
+
+pub fn list_sum(head *Node) int {
+ var total int = 0;
+ var p *Node = head;
+ while p != null {
+ total = total + p.value;
+ p = p.next;
+ }
+ return total;
+}
+
+fn list_len(head *Node) int {
+ var n int = 0;
+ var p *Node = head;
+ while p != null {
+ n = n + 1;
+ p = p.next;
+ }
+ return n;
+}
+
+pub fn main() int {
+ var nodes [3]Node;
+ nodes[0].value = 10; nodes[0].next = &nodes[1];
+ nodes[1].value = 20; nodes[1].next = &nodes[2];
+ nodes[2].value = 30; nodes[2].next = null;
+
+ var sum int = list_sum(&nodes[0]);
+ if (sum > 0) && (sum <u 1000) {
+ put_int(sum);
+ }
+ return 0;
+}
+```
+
+## Dropped from C
+
+For reference — these are intentionally absent:
+
+`float`, `double`, `long double`, `complex`, `short`, `long`, `long long`,
+`unsigned` as a type (use signed types with unsigned operators), `enum`
+(use `const`), `union`, bitfields, C's `const` as a type qualifier (the
+keyword is reused for named integer constants), `volatile`, `restrict`,
+`static` (replaced by file-local default + `pub`), `typedef` (replaced by
+`type`), K&R function syntax, variadic functions, designated initializers,
+compound literals, `for`, `do`/`while`, ternary `?:`, comma operator,
+compound assignment, `++`/`--`, block comments, pre-processor macros,
+implicit conversions, array and function decay, struct/array pass-by-value,
+C's `switch` (the keyword is reused with no `case`, no fallthrough, and
+mandatory-block arms).
diff --git a/SEED.md b/SEED.md
@@ -0,0 +1,289 @@
+# Seed userland: the pre-tcc-boot tools
+
+## Goal
+
+Bridge the window between *Lisp exists* and *tcc-boot exists* without
+touching M2-Planet, Mes, or MesCC. Inside that window, all code is
+either a Lisp program running on the Lisp interpreter or subcommands
+of a single monolithic C1 binary (`seed`) compiled through the
+Lisp-hosted C1 compiler → P1 → M1 → hex2 pipeline.
+
+This document covers only that window. Phases before it (`seed0 →
+hex0/hex1/hex2 → M1`, P1 defs, Lisp interpreter) are documented in
+`P1.md` and `PLAN.md`. tcc-boot itself and everything downstream are
+standard C and out of scope.
+
+## Position in the chain
+
+```
+stage0-posix: seed0 → hex0 → hex1 → hex2 → M1 (no C, no Lisp)
+P1 layer: P1 defs files load into M1 (P1.md)
+Lisp: P1 text (Lisp interp source) → M1 → hex2 (PLAN.md)
+C1 compiler: Lisp program, loaded into the Lisp image (this doc)
+──────── seed window begins here ────────
+seed binary: C1 source → Lisp+C1cc → P1 text → M1 → hex2 (this doc)
+C compiler: Lisp program, loaded into the Lisp image (PLAN.md)
+──────── seed window ends when tcc-boot is built ────────
+tcc-boot: C source → Lisp+Ccc → P1 text → M1 → hex2 (PLAN.md)
+```
+
+Two Lisp programs (C1 compiler, C compiler) and one statically-linked
+C1 binary. No M2-Planet artifact and no Mes Scheme module anywhere.
+
+## Settled decisions
+
+These are load-bearing; the rest of the document assumes them.
+
+1. **C1 targets P1.** One C1 source per subcommand, tri-arch binary
+ via the existing M1+hex2 path. Accepts P1's ~2× code-size tax.
+2. **C1 compiler lives in Lisp.** Same host as the C compiler; shares
+ the Lisp runtime. ~1.5–2.5k LOC Lisp, counted against `PLAN.md`.
+3. **Monolithic `seed` binary.** One executable with subcommand
+ dispatch on `argv[1]` (e.g. `seed kaem script.kaem`, `seed cat
+ file`, `seed cp a b`). One audit unit, one copy of the runtime,
+ no loader. Bug blast radius is the whole seed userland — mitigated
+ by keeping each subcommand self-contained and tested in isolation.
+4. **Uncompressed tcc-boot mirror.** Host the upstream tcc-boot source
+ as an uncompressed `.tar` with sha256 pinned. No gzip support
+ anywhere in the seed stage. Deletes ~1000–1500 LOC of deflate from
+ the audit.
+5. **Explicit patches via `seed patch-apply`.** Upstream source stays
+ verbatim. Our changes live as unified-diff files in this repo,
+ applied by a ~200 LOC C1 subcommand. "Upstream vs ours" stays
+ legible.
+6. **fork + execve for process spawn.** Simplest kernel contract,
+ stable syscall numbers on all three arches. Plus `wait4` to
+ reap children. No clone, vfork, or posix_spawn.
+7. **Target self-build is primary; cross-build is a cache.** The
+ canonical build is a fresh target machine bootstrapping from
+ stage0-posix hex seed. Cross-built per-arch tarballs are supported
+ as a reproducibility cache — identical bytes expected, verified
+ against a target self-build, not trusted by assumption.
+
+## The `seed` binary
+
+One ELF per arch, invoked as `seed <subcommand> [args...]`. Internal
+dispatch table maps `argv[1]` to a function; unknown subcommands error
+out. Startup shim parses `argc/argv`, calls the dispatch function,
+propagates its return code to `exit`.
+
+### Subcommands
+
+| Subcommand | Purpose | C1 LOC |
+|---------------|--------------------------------------------------|----------|
+| `kaem` | shell driving the tcc-boot build | 700–900 |
+| `untar` | POSIX ustar extract (no gzip, no creation) | 500–700 |
+| `patch-apply` | apply a unified diff in-place | ~200 |
+| `sha256sum` | verify source tarball hashes | 500–700 |
+| `cp` | copy one file | ~150 |
+| `mkdir` | single-level directory create | ~80 |
+| `rm` | remove one file (no `-r`, no `-f`) | ~120 |
+| `mv` | rename within one filesystem | ~150 |
+| `cat` | concatenate files to stdout | ~80 |
+| `test` | file and string predicates for kaem | ~280 |
+| `echo` | write args to stdout | ~50 |
+| dispatch + argv plumbing | top-level `main`, subcommand table | ~100 |
+| C1 runtime + mini libc | startup, syscalls, memcpy/memset/str* | ~400 |
+| **Total** | | **~3310–3910** |
+
+Dispatch is flat: there is no nesting, no aliases, no argv[0]-based
+dispatch. Kaem scripts write out `seed <sub>` in full. One installed
+file on disk, no symlinks, no `link` syscall needed.
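+
+A minimal sketch of flat dispatch in C1 follows. Everything in it is an
+assumption made for illustration: the `cmd_*` signatures, the exit code
+`2` for usage errors, the `cstr_eq` helper, and the use of an `if` chain
+where the real binary may use a table.
+
+```
+fn cmd_echo(argc int, argv **byte) int;   // defined later in the file
+fn cmd_cat(argc int, argv **byte) int;
+
+// Compares two NUL-terminated strings; returns 1 if equal.
+fn cstr_eq(a *byte, b *byte) int {
+    var i int = 0;
+    while (a[i] as int) == (b[i] as int) {
+        if (a[i] as int) == 0 {
+            return 1;
+        }
+        i = i + 1;
+    }
+    return 0;
+}
+
+fn dispatch(argc int, argv **byte) int {
+    if argc < 2 {
+        return 2;                         // no subcommand given
+    }
+    var sub *byte = argv[1];
+    // String literals carry a trailing '\0', so .ptr works as a C string.
+    if cstr_eq(sub, "echo".ptr) {
+        return cmd_echo(argc - 2, argv + 2);
+    }
+    if cstr_eq(sub, "cat".ptr) {
+        return cmd_cat(argc - 2, argv + 2);
+    }
+    return 2;                             // unknown subcommand
+}
+```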
+
+### Kaem feature set
+
+Line-oriented minimal shell:
+
+- One command per line. No `;`, no `&&`, no `||`.
+- Command = word (built-in or path) + whitespace-separated args.
+ Quoting: `"..."` is one arg, with `\n \t \\ \"` escapes. No
+ single-quote form.
+- Variable substitution: `${NAME}` from environment only.
+- Built-ins: `cd` (via `chdir` syscall), `set NAME=VALUE` (env),
+ `exit`.
+- Redirection: `> file` (truncate) and `< file` (stdin). No append,
+ no pipes.
+- Failure: non-zero exit from any command aborts the script.
+- Comments: `#` to end of line.
+- Expansion excluded: globbing, command substitution, arithmetic,
+ here-docs, background jobs.
+
+That suffices to express "unpack → verify → compile each file → link
+→ install." Orchestration lives in kaem text, not C1.
+
+## Syscall surface
+
+Combined with PLAN.md's compiler surface, the seed window requires
+**13 syscalls** total. Each gets one row in every `p1_<arch>.M1`
+defs file.
+
+| Syscall | Used by |
+|------------|-------------------------------------------|
+| `read` | all file-reading subcommands, Lisp I/O |
+| `write` | stdout/stderr, all file-writing |
+| `open` | file open (`O_RDONLY` / `O_WRONLY|O_CREAT|O_TRUNC` with mode) |
+| `close` | all file ops |
+| `exit` | program termination |
+| `fork` | kaem child spawn |
+| `execve` | kaem child spawn |
+| `wait4` | kaem reaping children |
+| `mkdir` | `seed mkdir`, `untar` (directory entries) |
+| `unlink` | `seed rm` |
+| `rename` | `seed mv` |
+| `access` | `seed test` (file predicates) |
+| `chdir` | kaem `cd` builtin |
+
+Bumps `PLAN.md`'s "five syscalls" contract to 13 (includes `chdir`);
+PLAN.md should be cross-referenced to this list, not restated
+independently. Deliberately excluded: `stat/fstat` (use `access`
+instead), `chmod` (rely on `open` mode bits for initial perms),
+`lseek` (all reads are sequential), `getdents`/`readdir` (no
+directory traversal needed), `dup`/`pipe`/signals/time/net.
+
+### How C1 reaches syscalls
+
+C1 has no inline asm and no intrinsics. Each syscall is exposed as an
+ordinary `extern fn` declaration, backed by a hand-written P1 stub in
+`runtime.p1`. The stubs are ~3 P1 ops each (load number, `SYSCALL`,
+`RET`), totalling ~40 lines of P1 for the whole surface.
+
+```
+:sys_write ; C1 args arrive in P1 r1-r6 per call ABI
+ SYSCALL write ; expands per-arch via p1_<arch>.M1 defs
+ RET
+```
+
+```
+extern fn sys_write(fd int, buf *byte, n int) int;
+```
+
+Prerequisite: P1 picks its argument registers (`r1–r6`) to coincide
+with the native syscall arg registers on each arch (`rdi/rsi/…`,
+`x0–x5`, `a0–a5`), so stubs need no register shuffling beyond what
+`SYSCALL` already does. Confirm this in `P1.md` during implementation.
+
+Return convention: Linux returns `-errno` (values in `-1..-4095`) in
+the result register. Wrappers return the raw integer; callers test
+`r >=u -4095` to detect failure and abort with a message. No
+`errno` global, no per-tool error recovery.
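+
+A caller-side check could be sketched like this, assuming the
+`-1..-4095` failure range given above. `is_error`, `write_or_die`, and
+`sys_exit` are hypothetical names, and `sys_write` is declared with
+C1's `*byte` pointer syntax.
+
+```
+extern fn sys_write(fd int, buf *byte, n int) int;
+extern fn sys_exit(code int) int;         // hypothetical exit wrapper
+
+// Returns 1 if r lies in -4095..-1, i.e. the kernel reported -errno.
+// Viewed as unsigned, those values are the top 4095 of the word range.
+fn is_error(r int) int {
+    return r >=u -4095;
+}
+
+fn write_or_die(fd int, buf []byte) int {
+    var r int = sys_write(fd, buf.ptr, buf.len);
+    if is_error(r) {
+        sys_exit(1);                      // no recovery: abort the tool
+    }
+    return r;
+}
+```
+
+Comparing against `-4095` rather than a hex literal keeps the test
+independent of the machine word width.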
+
+## Build ordering inside the seed window
+
+Once the Lisp interpreter binary exists and the C1 compiler Lisp
+source is loaded:
+
+1. Compile the `seed` monolith: one C1 source file (or small set
+ `#include`d into one translation unit, since C1's preprocessor
+ supports `#include` only) → P1 text → M1 → hex2 → `seed` ELF.
+ Per-arch, repeat for each target.
+2. Install `seed` on the target (copy to a known path). No other
+ setup required.
+
+The tcc-boot build then runs as kaem scripts:
+
+1. `seed sha256sum upstream.tar` against pinned hash.
+2. `seed untar upstream.tar`.
+3. For each patch file: `seed patch-apply patches/foo.diff`.
+4. Loop over tcc-boot `.c` files, invoking Lisp-as-C-compiler to
+ emit P1 text, then M1+hex2 to produce per-object files or a
+ single linked binary. (tcc-boot's build is simple enough to
+ treat as one compilation unit; the loop is unrolled in kaem.)
+5. Install tcc-boot binary.
+
+The seed window is now closed.
+
+## Target self-build vs cross-build
+
+**Target self-build (primary).** A fresh machine of arch `A` starts
+from the stage0-posix hex seed, runs the hex0→hex1→hex2→M1 chain,
+loads `p1_A.M1`, assembles the Lisp interpreter, loads the C1
+compiler into Lisp, compiles `seed`, runs the tcc-boot build. Whole
+process is a kaem script (bootstrapped from a hand-assembled first
+kaem, same way hex2 and M1 are) driving the toolchain.
+
+**Cross-build cache (secondary).** On an already-bootstrapped
+machine, produce `seed` binaries for all three arches and ship them
+as tarballs. Users who opt into this skip the target self-build and
+land directly at "seed installed." Trust claim: **none by
+assumption** — the cache is only trusted after a target self-build
+of at least one arch has verified byte-identical output. Cross-build
+is an optimization, not a trust input.
+
+## Provenance
+
+Four kinds of artifact flow in:
+
+- **stage0-posix hex seed + P1 defs**: part of this repo, audited
+ with the rest of it.
+- **Lisp interpreter source (in P1)**: part of this repo.
+- **C1 sources for `seed` + the C1 compiler + C compiler (in Lisp)**:
+ part of this repo.
+- **Upstream tcc-boot source**: mirrored as uncompressed `.tar` at
+ a pinned URL + sha256. The mirror file is one of this repo's
+ auditable inputs; it can be re-derived from upstream by untarring
+ and re-tarring in a canonical form, or checked against upstream's
+ published `.tar.gz` by re-gzipping and comparing hashes on a
+ machine that has `gzip` (done once, out of band).
+
+`seed sha256sum` is the single piece of C1 whose correctness has a
+direct trust consequence downstream; unit-test it against known
+vectors (empty string, "abc", "abcdbcde..."-length tests) before
+declaring the seed build complete.
+
+## Interaction with tcc-boot
+
+tcc-boot expects a build environment roughly like `cc + make + sh +
+coreutils`. Mapping:
+
+| tcc-boot expects | Seed provides |
+|------------------|--------------------------------------------------|
+| `cc` / `gcc` | kaem loop invoking Lisp-as-C-compiler per `.c` |
+| `make` | flat kaem script (tcc-boot is simple enough) |
+| `sh` | `seed kaem` |
+| `cat`/`cp`/etc. | `seed <sub>` |
+| `ld` | tcc-boot's built-in linker (for its own output) |
+| `ar` | not needed; tcc-boot builds one static binary |
+
+A thin shim script under `scripts/` maps tcc-boot's literal command
+names (`cc`, `make`, `install`) to the `seed <sub>` / Lisp-invocation
+forms. That shim is kaem text, not C1.
+
+## Budget rollup
+
+Fresh auditable LOC introduced by this document, on top of PLAN.md:
+
+| Layer | LOC |
+|-----------------------------------------------|-----------------|
+| C1 compiler (Lisp, counted in PLAN.md) | (1,500–2,500) |
+| `seed` monolith (all subcommands + runtime) | 3,300–3,900 |
+| kaem scripts (orchestration, driver) | a few hundred |
+| **Seed window addition** | **~3,300–3,900**|
+
+Combined PLAN.md + SEED.md audit surface: **~13–17k LOC**, tri-arch,
+M2-Planet-free and Mes-free.
+
+## Handoff notes for the engineer
+
+Approximate build order for implementation:
+
+1. **C1 compiler in Lisp** (blocks everything below). Write against
+ a small corpus of C1 test programs. Validate by compiling a
+ 20–50 LOC C1 program, running the output, confirming behavior.
+2. **C1 runtime + syscall wrappers + mini libc.** Smallest
+ subcommand (`echo` or `cat`) is the bring-up test.
+3. **`seed` dispatch skeleton** plus `echo`, `cat`, `cp`, `mkdir`,
+ `rm`, `mv`. Small, independent, easy to unit-test.
+4. **`sha256sum`** with unit tests before anything depends on its
+ correctness.
+5. **`test`** (file predicates needed by kaem).
+6. **`untar`** (ustar extract only).
+7. **`patch-apply`** (unified-diff in-place).
+8. **`kaem`** (depends on `fork`, `execve`, `wait4`, `chdir`,
+ redirect).
+9. **End-to-end bring-up**: kaem script running `sha256sum` →
+ `untar` → `patch-apply` → Lisp-C-compile loop → linked
+ tcc-boot. First full trip through the seed window.
+
+Each step compiles standalone C1 and assembles through the existing
+P1 → M1 → hex2 path; no new tooling infrastructure is needed
+between steps.