commit 56150a05235e38ca5c19d28d886e1f5fa0dc2fb6
parent 7e6e49004c67be0c88cce368096d626450ec804a
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Tue, 5 May 2026 12:03:45 -0700
docs/OS-TODO: collapse landed items; add tcc3 self-host roadmap
The eleven OS.md items, seed-driver port for boot{3,4,5}, atomic
sys_spawn, HVF, STT_FILE prefix strip, and virtio-blk transport are
all in. Move them out of the body and into a one-paragraph done
summary. Replace with a tcc3 self-host section listing the kernel.S
mnemonics, sysreg names, and ld-script features still missing from
tcc 0.9.26's arm64 assembler/linker — the remaining gap before the
kernel can rebuild itself out of boot4.
Diffstat:
| M | docs/OS-TODO.md | | | 302 | ++++++++++++++++++++++--------------------------------------------------------- |
1 file changed, 84 insertions(+), 218 deletions(-)
diff --git a/docs/OS-TODO.md b/docs/OS-TODO.md
@@ -1,218 +1,84 @@
-# Seed kernel — gaps against docs/OS.md
-
-Audit of [`seed-kernel/`](../seed-kernel/) against the contract in
-[`OS.md`](OS.md). All eleven items are now resolved — the seed kernel
-boots, parses the DTB, unpacks an initramfs into an in-memory tmpfs,
-loads `/init` as a static aarch64 ELF, dispatches the eight Tier-1
-syscalls plus a single atomic `sys_spawn` (a private syscall replacing
-the original POSIX-style `clone`+`execve` pair) and `sys_waitid`, and
-supports both the host-side verification gates `scripts/tier1-gate.sh`
-and `scripts/tier2-gate.sh`. Verified against `boot0/catm`,
-`boot1/M1pp`, and `boot3/tcc0`; the canonical Tier-2 case (scheme1
-driver spawns tcc0 to compile a `.c` into a relocatable ELF object)
-round-trips end-to-end. The bootN scripts themselves now run on
-seed-kernel via `DRIVER=seed scripts/bootN.sh aarch64` for
-N∈{0,1,2,3,4,5}, producing byte-identical outputs to the podman path;
-`scripts/seed-accept.sh` exercises the boot2-built scheme1 spawning
-the boot2-built catm via the .scm prelude.
-
-## Tier 1
-
-1. **Real argv.** ✅ `build_user_stack` takes `(argc, argv[])`. argv is
- sourced from `/chosen/bootargs` (whitespace-tokenised), then from a
- `/init.argv` file in the initramfs, with `argc=1, argv[0]="init"`
- as the final fallback. The kernel reserves a `dumpfs` token in
- bootargs (stripped from user argv) that triggers the UART tmpfs
- dump on exit (item 9).
-
-2. **User load address.** ✅ Per-process L2 page table installs an
- `l2_user[]` covering the low 1 GB of VA in 2 MB blocks. Slot 0 is
- invalid (NULL traps); slots 1…384 are Normal user RAM backed by a
- 768 MB physical pool (`USER_POOL_PA`); slots 385…511 stay Device-
- identity for safety. The PL011 / GIC / virtio that used to live in
- the low 1 GB are now reached from kernel code via a high alias —
- `L1[4]` is a 1 GB Device block at PA 0, so VA `0x109000000` ↔ PA
- `0x09000000`. This lets the boot2 chain link at its native
- `-B 0x600000` and run unmodified on the seed kernel.
-
-3. **Bigger heap.** ✅ User pool is 768 MB (slots 1…384 × 2 MB),
- sized so tcc0/tcc-boot2 (which declare a 512 MB BSS at link base
- `0x600000` ⇒ end VA `0x20600000`) fit with a healthy brk window
- above end-of-bss. `load_elf` walks PT_LOAD segments and records
- the post-clip end-of-image in `g_user_image_end`; kmain and
- `do_execve` use it to seed `brk_base`. `brk_max` is
- `USER_VA_HI - 16 MB` (16 MB stack reserve at the top).
-
-4. **Per-segment ELF permissions.** ✅ Documented as a deliberate
- spec-permissible choice in `load_elf` — segments are RWX at EL1.
- OS.md §"Memory model" allows this; tcc-boot2 doesn't JIT.
-
-5. **`exit_group` exit-code masking.** ✅ `code &= 0xff` in
- `sys_exit_final` / `sys_exit_or_resume_parent`.
-
-## Tier 2
-
-6. **Atomic `spawn` (replaces `clone` + `execve`).** ✅ `sys_spawn`
- (private syscall 1024) folds the prelude's clone-then-immediate-
- execve sequence into a single kernel transaction. The kernel
- captures path/argv from the parent's pool into a kernel buffer,
- pushes parent state onto `proc_stack[]`, swaps `l2_user[]` to the
- alternate pool with **no memory copy** (the previous design paid one
- 768 MB `mem_cpy` per fork to seed the child's pool — needed only
- because user code ran a few interpreter cells between clone and
- execve, which would otherwise mutate parent BSS heap globals;
- folding the syscall closes that window entirely), `load_elf`s the
- new image into the alternate pool, resets brk above the new
- end-of-bss, builds a fresh user stack, and rewrites the trap frame
- so `eret` enters the child at the new entry point. `sys_waitid`
- populates siginfo at offsets 8 (CLD_EXITED) and 24 (status) per
- `scheme1/prelude.scm:497-506`. On `sys_exit_or_resume_parent` the
- kernel swaps `l2_user[]` back to the parent pool (still pristine),
- restores regs/brk/fds, runs `ic iallu`, and returns to the parent's
- `spawn()` site with `x0 = child_pid`.
-
-7. **Per-process state on a stack.** ✅ `proc_save` records regs +
- ELR + SPSR + sp_el0 + brk_base + brk_cur + fd table + which user
- pool (A or B) the parent was running in. `MAX_PROC_DEPTH = 1` — the
- scheme1 prelude only forks one level deep before waiting; one save
- frame plus two pools is all that's needed.
-
-8. **scheme1 prelude probes once, dispatches per environment.** ✅
- The same scheme1 binary runs on both Linux (boot{3,4,5} podman
- path) and the seed kernel. `prelude.scm` calls `(sys-spawn "" '())`
- once at init: on Linux that returns `-ENOSYS=38` and the prelude
- binds `(spawn …)` to the classic clone+execve sequence; on seed it
- returns `-ENOENT=2` (kernel finds no such file) and the prelude
- binds `(spawn …)` to `sys-spawn` directly. Both `sys-clone` and
- `sys-execve` primitives remain in the scheme1 binary as the Linux
- fallback path.
-
-## Verification harness
-
-9. **Output extraction.** ✅ The kernel emits a sentinel-framed
- hex dump of every tmpfs file on exit when bootargs contain the
- `dumpfs` token. Scripts:
- - [`scripts/extract-dump.sh`](../seed-kernel/scripts/extract-dump.sh) —
- scans a UART transcript for `=== DUMP-BEGIN ===` … `=== DUMP-END ===`,
- decodes each `=== FILE path=… size=… ===` payload, writes files.
-
-10. **Tier 1 gate.** ✅
- [`scripts/tier1-gate.sh`](../seed-kernel/scripts/tier1-gate.sh) —
- builds an initramfs containing a stage binary as `/init` plus
- arbitrary input files, runs the seed kernel under qemu with the
- stage's argv as bootargs, and extracts the post-run tmpfs.
- Verified against `boot0/catm` (multi-input concatenation, output
- matches host `cat`) and `boot3/tcc0` (compiles `int main(void)
- {return 42;}` into a valid aarch64 relocatable object).
-
-11. **Tier 2 gate.** ✅
- [`scripts/tier2-gate.sh`](../seed-kernel/scripts/tier2-gate.sh) —
- cats `prelude.scm` + a driver fixture into `combined.scm`, packs
- initramfs `/init=scheme1, /child-prog=<chain stage>, /combined.scm,
- <inputs>`, runs the seed kernel, asserts the driver exited 0, and
- extracts every output file. Verified end-to-end with the canonical
- fixture
- [`scripts/fixtures/tier2-tcc-driver.scm`](../seed-kernel/scripts/fixtures/tier2-tcc-driver.scm) —
- scheme1 evaluates `(run "child-prog" "-nostdlib" "-c" "-o" "out.o"
- "input.c")`, where `child-prog` is `boot3/tcc0`. Output `out.o` is a
- valid aarch64 ELF relocatable with the expected `add` and `main`
- symbols.
-
-## Things still worth doing (out of scope of the original list)
-
-- **FP/ASIMD enabled at EL0.** `setup_mmu` programs
- `CPACR_EL1.FPEN = 0b11` so user binaries can issue FP/ASIMD
- instructions without trapping. tcc-built tcc1/tcc2/tcc3 (boot4) emit
- FP register saves in their start glue; without this they trap with
- ESR EC=0x07. tcc0 (cc.scm-built) didn't, which is why the original
- Tier-2 fixture worked with FPEN=00.
-
-- **Port boot3/4 to the seed driver — landed.** A second DSL,
- [`scripts/lib-seed-runscm.sh`](../scripts/lib-seed-runscm.sh) (sibling
- to `lib-pipeline.sh`), packs an initramfs of `/init=scheme1`,
- `/run.scm` (= prelude.scm + the bootN driver), and every input file
- flat at top level; one qemu boot, scheme1 drives the chain via
- `(run …)`. `scripts/boot3.sh` ships a hand-written
- [`scripts/boot3-run.scm`](../scripts/boot3-run.scm) (catm cc-bundle →
- scheme1 libc/tcc → catm combined.M1pp → M1pp → catm linked.hex2pp →
- hex2pp -B 0x600000 → tcc0). `scripts/boot4.sh` generates run.scm via
- [`scripts/boot4-gen-runscm.sh`](../scripts/boot4-gen-runscm.sh) —
- per-arch values (LIB_HELPER_SRC, LIBTCC1_C_SRCS, LIBTCC1_ASM_SRCS,
- LIB_HELPER_DEFS) resolved on the host so the .scm body is straight-
- line `(run …)`. Both bootN.sh now branch on `DRIVER=podman|seed`,
- mirroring boot0/1/2's lib-pipeline.sh wiring.
- [`scripts/seed-accept-boot34.sh`](../scripts/seed-accept-boot34.sh)
- asserts byte-identity vs the podman path. boot3 (tcc0) and boot4
- (tcc3, hello) round-trip byte-identical. boot4's intermediate
- artifacts (`crt1.o`, `libc.a`, `libtcc1.a`) differ from the podman
- path by exactly the length of the embedded source-filename string —
- the seed harness stages files at flat basenames (`start.S`) while
- podman mounts them at `/work/in/start.S`, and tcc emits the input
- path into the .o relocations. tcc3 and hello are unaffected because
- the linker drops those strings in the final executable.
-
-- **Atomic spawn — landed (zero copy on fork).** Item 6 above describes
- the kernel side. The previous design had `sys_clone` eager-copy 768 MB
- parent→alternate-pool per fork; the only reason for the copy was the
- scheme1 prelude executing a few interpreter cells of user code in the
- child between clone and execve, which would have mutated parent BSS
- globals (`heap_next`, `current_heap_next_ptr`, `scratch_next`) if the
- child shared the parent's pool. `sys_spawn` folds clone+execve into
- one syscall, the child runs zero user code in the parent's address
- space, and the eager copy is gone. Same scheme1 binary still runs on
- Linux (boot{3,4,5} podman path) by probing `(sys-spawn "" '())` once
- at prelude init and binding `(spawn …)` to clone+execve when the probe
- returns -ENOSYS=38. boot4 acceptance still hits its `tcc2 == tcc3`
- fixed point under DRIVER=seed; per-spawn wall time on the boot4
- fixture dropped from ~5 s to well under 1 s.
-
-- **HVF acceleration.** All seed-driver qemu invocations use
- `-machine virt,gic-version=3,accel=hvf -cpu host` on macOS hosts.
- tier2-gate ≈ 22 s; seed-accept (boot0/1/2) ≈ 2 s; boot3 + boot4
- acceptance combined ≈ 5 min wall (boot3 alone was 5 min before
- sys_spawn; multi-hour under TCG without HVF).
-
-- **STT_FILE prefix strip — landed.** tcc emitted the unmodified
- argv path into each `.o`'s `STT_FILE` symbol, so podman-mounted
- `/work/in/start.S` and seed-staged flat `start.S` produced
- byte-different relocations. `simple-patches/tcc-0.9.26/strip-file-prefix`
- drops the bootstrap-internal prefixes (`/work/in/tcc-lib/`,
- `/work/in/`) under `#if BOOTSTRAP` before the symbol is emitted.
- Patch is gated on the existing `-D BOOTSTRAP=1` from
- `stage1-flatten.sh` so it bakes into `tcc.flat.c` and applies to
- cc.scm-built tcc0 plus every tccN it self-hosts. With this in,
- `seed-accept-boot34.sh` checks `tcc3`, `hello`, `crt1.o`, `libc.a`,
- and `libtcc1.a` for byte-identity vs the podman path; all pass.
-
-- **Port boot5 to the seed driver — landed.** With the per-spawn copy
- gone (sys_spawn), the naive 1300-spawn straight port works without
- needing tcc batch mode or in-kernel prelude caching. boot5.sh's
- `DRIVER=seed` branch wires
- [`scripts/boot5-gen-runscm.sh`](../scripts/boot5-gen-runscm.sh) to
- emit one `(run "tcc" …)` per source plus the CRT/ar/link tail; the
- full musl tree is staged in cpio at `/tmp/musl-1.2.5/...` (matching
- podman's tmpfs layout, so STT_FILE strings are byte-identical).
- Required kernel adjustments: `MAX_FILES` 64 → 4096 (the cpio carries
- ~2600 inputs plus ~1300 .o outputs), `path[64]` → `path[96]` (musl
- paths reach ~50 chars under the `/tmp/musl-1.2.5/obj/...` prefix),
- and a loud warning when `parse_cpio` drops files (silent drops on
- MAX_FILES exhaustion otherwise masquerade as random "include not
- found" tcc errors mid-build). New extension to lib-seed-runscm.sh:
- `seed_runscm_input_tree` stages a directory subtree into the cpio
- preserving relative paths.
- [`scripts/seed-accept-boot5.sh`](../scripts/seed-accept-boot5.sh)
- asserts byte-identity vs the podman path for libc.a, crt1.o, crti.o,
- crtn.o, hello.
-
-## Open
-
-- **NULL-page hardening.** Slot 0 is unmapped so a NULL deref faults to
- the kernel as a user sync; the kernel currently panics rather than
- delivering a SIGSEGV-equivalent. Acceptable per OS.md (default-action
- termination is sufficient) but a minor polish opportunity.
-
-- **Cache parsed prelude in kernel (optional optimization).** Each
- spawn re-parses the 24 KB `prelude.scm` from scratch. Hashing it
- once and reusing the AST across spawns would shave a fraction of
- per-spawn overhead. Not load-bearing now that sys_spawn removed the
- big copy; would matter again if a future driver crosses ~10k spawns.
+# Seed kernel — open items
+
+The [`OS.md`](OS.md) contract is fully met by [`seed-kernel/`](../seed-kernel/):
+it boots via the arm64 Linux boot protocol, parses the DTB, unpacks an
+initramfs into an in-memory tmpfs, loads `/init` as a static aarch64
+ELF, and dispatches the eight Tier-1 syscalls plus atomic `sys_spawn`
+(private syscall 1024, replacing the POSIX-style `clone`+`execve` pair)
+and `sys_waitid`, with virtio-blk in/out transports for boot{0..5} use.
+[`scripts/tier1-gate.sh`](../seed-kernel/scripts/tier1-gate.sh) and
+[`scripts/tier2-gate.sh`](../seed-kernel/scripts/tier2-gate.sh) cover
+acceptance; `DRIVER=seed scripts/boot{0..5}.sh` produces output
+byte-identical to the podman path. HVF acceleration is enabled on macOS hosts.
+
+This file tracks remaining polish.
+
+## tcc3 self-host of `seed-kernel/`
+
+The C side already compiles and runs cleanly under
+[`build/aarch64/boot4/tcc3`](../scripts/boot4.sh) — `kernel.c` and
+`user/{forktest,child,hello}.c` have no inline asm, all sysreg / barrier
+/ cache / TLB / PSCI ops route through the C-callable thunks at the
+bottom of [`kernel.S`](../seed-kernel/kernel.S), and per-syscall
+wrappers delegate to a single `syscall6` toplevel-asm thunk.
+
+Remaining blockers are all in tcc 0.9.26's arm64 assembler /
+linker (see [`docs/TCC-ARM64-ASM.md`](TCC-ARM64-ASM.md) for the
+phase-1/2/3 trajectory of
+`scripts/simple-patches/tcc-0.9.26/files/arm64-asm.c`).
+
+### `kernel.S` mnemonics not yet in phase-2
+
+- `msr` / `mrs` to/from named system registers: `mair_el1`,
+ `tcr_el1`, `ttbr0_el1`, `sctlr_el1`, `cpacr_el1`, `sp_el0`,
+ `esr_el1`, `far_el1`, `vbar_el1`, `hcr_el2`, `spsr_el2`,
+ `elr_el2`, `sp_el1`. Plus the immediate / pseudo forms
+ `msr daifset, #imm` and `mrs CurrentEL`.
+- `eret`.
+- `ic iallu`, `tlbi vmalle1`.
+- `dsb sy` / `dsb ish` / `dmb ish` / `dmb ishst` by name — the
+ current assembler wants `dsb #imm` and rejects the named scope.
+- `adrp` / `adr` + `:lo12:` relocations for label addresses (the
+ EL1 boot path uses `adrp Xn, sym; add Xn, Xn, :lo12:sym`).
+- `.macro` / `.endm` (VENTRY pattern in the exception vector table).
+- `.quad sym1 - sym2` (the arm64 Image header's `image_size` field).
+
+### `kernel.lds` — no `-T` in tcc3
+
+tcc3's linker accepts `-Wl,-Ttext=`, `-Wl,-image-base=`,
+`-Wl,-section-alignment=` only; there is no `-T script`. The seed
+kernel's link layout needs:
+
+- `KEEP(*(.head.text))` placed first (boot header at `0x40080000`).
+- Link-time symbol assignments: `__bss_start`, `__bss_end`,
+ `kstack_top`, `_end`, `_image_end`.
+- A custom `.stack` section sized to 64 KB.
+
+Two paths: tcc gains a small ld-script subset, or `kernel.S`
+defines the symbols itself and reserves stack space inline (with
+`-Wl,-Ttext=0x40080000` supplying the base address). The former is
+more work but reusable across kernel-style targets; the latter ties
+section ordering to input file order and is brittle.
+
+### Toplevel `asm()` `.globl` ordering — worked around
+
+tcc 0.9.26's toplevel-asm parser leaves a symbol marked `UND` if
+`.globl name` precedes `name:`; the reverse order registers it as
+defined (gcc accepts both). Worked around in `user/{forktest,
+child,hello}.c` with `label:` first and `.globl label` immediately
+after, with a source comment noting the constraint. A tcc-side
+fix would let the comments go.
+
+## Open polish
+
+- **NULL-page hardening.** Slot 0 is unmapped so a NULL deref faults
+ to the kernel as a user sync; the kernel currently panics rather
+ than delivering a SIGSEGV-equivalent. Acceptable per OS.md
+ (default-action termination is sufficient) but a minor polish
+ opportunity.
+
+- **Cache parsed prelude in kernel.** Each spawn re-parses the
+ 24 KB `prelude.scm` from scratch. Hashing it once and reusing the
+ AST across spawns would shave a fraction of per-spawn overhead.
+ Not load-bearing now that sys_spawn removed the big copy; would
+ matter again if a future driver crosses ~10k spawns.
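
Note on the named barrier scopes listed under the `kernel.S` mnemonics: the gap between `dsb #imm` and `dsb sy` is a name-to-immediate lookup. The following is a minimal sketch of that mapping, assuming the standard A64 encodings (the option name selects the 4-bit CRm field). The function names `barrier_option_imm`, `encode_dsb`, and `encode_dmb` are hypothetical and are not taken from `arm64-asm.c`; the real patch may structure this differently.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from arm64-asm.c): map a barrier option
 * name to its 4-bit CRm immediate. Returns -1 for an unknown name. */
int barrier_option_imm(const char *name)
{
    static const struct { const char *name; int crm; } opts[] = {
        { "oshld", 1 },  { "oshst", 2 },  { "osh", 3 },
        { "nshld", 5 },  { "nshst", 6 },  { "nsh", 7 },
        { "ishld", 9 },  { "ishst", 10 }, { "ish", 11 },
        { "ld",    13 }, { "st",    14 }, { "sy",  15 },
    };
    size_t i;

    for (i = 0; i < sizeof(opts) / sizeof(opts[0]); i++) {
        if (strcmp(name, opts[i].name) == 0)
            return opts[i].crm;
    }
    return -1;
}

/* A64 DSB: fixed bits 0xD503309F with CRm placed in bits [11:8]. */
uint32_t encode_dsb(int crm)
{
    return 0xD503309Fu | ((uint32_t)(crm & 0xF) << 8);
}

/* A64 DMB: fixed bits 0xD50330BF with CRm placed in bits [11:8]. */
uint32_t encode_dmb(int crm)
{
    return 0xD50330BFu | ((uint32_t)(crm & 0xF) << 8);
}
```

With this shape, `dsb ish` would resolve to `encode_dsb(barrier_option_imm("ish"))`, and the existing `dsb #imm` form could share the same encoder by passing the immediate through directly.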