boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 56150a05235e38ca5c19d28d886e1f5fa0dc2fb6
parent 7e6e49004c67be0c88cce368096d626450ec804a
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue,  5 May 2026 12:03:45 -0700

docs/OS-TODO: collapse landed items; add tcc3 self-host roadmap

The eleven OS.md items, seed-driver port for boot{3,4,5}, atomic
sys_spawn, HVF, STT_FILE prefix strip, and virtio-blk transport are
all in. Move them out of the body and into a one-paragraph done
summary. Replace with a tcc3 self-host section listing the kernel.S
mnemonics, sysreg names, and ld-script features still missing from
tcc 0.9.26's arm64 assembler/linker — the remaining gap before the
kernel can rebuild itself out of boot4.

Diffstat:
M docs/OS-TODO.md | 302 ++++++++++---------------------------------------------------------
1 file changed, 84 insertions(+), 218 deletions(-)

diff --git a/docs/OS-TODO.md b/docs/OS-TODO.md
@@ -1,218 +1,84 @@
-# Seed kernel — gaps against docs/OS.md
-
-Audit of [`seed-kernel/`](../seed-kernel/) against the contract in
-[`OS.md`](OS.md). All eleven items are now resolved — the seed kernel
-boots, parses the DTB, unpacks an initramfs into an in-memory tmpfs,
-loads `/init` as a static aarch64 ELF, dispatches the eight Tier-1
-syscalls plus a single atomic `sys_spawn` (a private syscall replacing
-the original POSIX-style `clone`+`execve` pair) and `sys_waitid`, and
-supports both the host-side verification gates `scripts/tier1-gate.sh`
-and `scripts/tier2-gate.sh`. Verified against `boot0/catm`,
-`boot1/M1pp`, and `boot3/tcc0`; the canonical Tier-2 case (scheme1
-driver spawns tcc0 to compile a `.c` into a relocatable ELF object)
-round-trips end-to-end. The bootN scripts themselves now run on
-seed-kernel via `DRIVER=seed scripts/bootN.sh aarch64` for
-N∈{0,1,2,3,4,5}, producing byte-identical outputs to the podman path;
-`scripts/seed-accept.sh` exercises the boot2-built scheme1 spawning
-the boot2-built catm via the .scm prelude.
-
-## Tier 1
-
-1. **Real argv.** ✅ `build_user_stack` takes `(argc, argv[])`. argv is
-   sourced from `/chosen/bootargs` (whitespace-tokenised), then from a
-   `/init.argv` file in the initramfs, with `argc=1, argv[0]="init"`
-   as the final fallback. The kernel reserves a `dumpfs` token in
-   bootargs (stripped from user argv) that triggers the UART tmpfs
-   dump on exit (item 9).
-
-2. **User load address.** ✅ Per-process L2 page table installs an
-   `l2_user[]` covering the low 1 GB of VA in 2 MB blocks. Slot 0 is
-   invalid (NULL traps); slots 1…384 are Normal user RAM backed by a
-   768 MB physical pool (`USER_POOL_PA`); slots 385…511 stay Device-
-   identity for safety. The PL011 / GIC / virtio that used to live in
-   the low 1 GB are now reached from kernel code via a high alias —
-   `L1[4]` is a 1 GB Device block at PA 0, so VA `0x109000000` ↔ PA
-   `0x09000000`. This lets the boot2 chain link at its native
-   `-B 0x600000` and run unmodified on the seed kernel.
-
-3. **Bigger heap.** ✅ User pool is 768 MB (slots 1…384 × 2 MB),
-   sized so tcc0/tcc-boot2 (which declare a 512 MB BSS at link base
-   `0x600000` ⇒ end VA `0x20600000`) fit with a healthy brk window
-   above end-of-bss. `load_elf` walks PT_LOAD segments and records
-   the post-clip end-of-image in `g_user_image_end`; kmain and
-   `do_execve` use it to seed `brk_base`. `brk_max` is
-   `USER_VA_HI - 16 MB` (16 MB stack reserve at the top).
-
-4. **Per-segment ELF permissions.** ✅ Documented as a deliberate
-   spec-permissible choice in `load_elf` — segments are RWX at EL1.
-   OS.md §"Memory model" allows this; tcc-boot2 doesn't JIT.
-
-5. **`exit_group` exit-code masking.** ✅ `code &= 0xff` in
-   `sys_exit_final` / `sys_exit_or_resume_parent`.
-
-## Tier 2
-
-6. **Atomic `spawn` (replaces `clone` + `execve`).** ✅ `sys_spawn`
-   (private syscall 1024) folds the prelude's clone-then-immediate-
-   execve sequence into a single kernel transaction. The kernel
-   captures path/argv from the parent's pool into a kernel buffer,
-   pushes parent state onto `proc_stack[]`, swaps `l2_user[]` to the
-   alternate pool with **no memory copy** (the previous design paid one
-   768 MB `mem_cpy` per fork to seed the child's pool — needed only
-   because user code ran a few interpreter cells between clone and
-   execve, which would otherwise mutate parent BSS heap globals;
-   folding the syscall closes that window entirely), `load_elf`s the
-   new image into the alternate pool, resets brk above the new
-   end-of-bss, builds a fresh user stack, and rewrites the trap frame
-   so `eret` enters the child at the new entry point. `sys_waitid`
-   populates siginfo at offsets 8 (CLD_EXITED) and 24 (status) per
-   `scheme1/prelude.scm:497-506`. On `sys_exit_or_resume_parent` the
-   kernel swaps `l2_user[]` back to the parent pool (still pristine),
-   restores regs/brk/fds, runs `ic iallu`, and returns to the parent's
-   `spawn()` site with `x0 = child_pid`.
-
-7. **Per-process state on a stack.** ✅ `proc_save` records regs +
-   ELR + SPSR + sp_el0 + brk_base + brk_cur + fd table + which user
-   pool (A or B) the parent was running in. `MAX_PROC_DEPTH = 1` — the
-   scheme1 prelude only forks one level deep before waiting; one save
-   frame plus two pools is all that's needed.
-
-8. **scheme1 prelude probes once, dispatches per environment.** ✅
-   The same scheme1 binary runs on both Linux (boot{3,4,5} podman
-   path) and the seed kernel. `prelude.scm` calls `(sys-spawn "" '())`
-   once at init: on Linux that returns `-ENOSYS=38` and the prelude
-   binds `(spawn …)` to the classic clone+execve sequence; on seed it
-   returns `-ENOENT=2` (kernel finds no such file) and the prelude
-   binds `(spawn …)` to `sys-spawn` directly. Both `sys-clone` and
-   `sys-execve` primitives remain in the scheme1 binary as the Linux
-   fallback path.
-
-## Verification harness
-
-9. **Output extraction.** ✅ The kernel emits a sentinel-framed
-   hex dump of every tmpfs file on exit when bootargs contain the
-   `dumpfs` token. Scripts:
-   - [`scripts/extract-dump.sh`](../seed-kernel/scripts/extract-dump.sh) —
-     scans a UART transcript for `=== DUMP-BEGIN ===` … `=== DUMP-END ===`,
-     decodes each `=== FILE path=… size=… ===` payload, writes files.
-
-10. **Tier 1 gate.** ✅
-    [`scripts/tier1-gate.sh`](../seed-kernel/scripts/tier1-gate.sh) —
-    builds an initramfs containing a stage binary as `/init` plus
-    arbitrary input files, runs the seed kernel under qemu with the
-    stage's argv as bootargs, and extracts the post-run tmpfs.
-    Verified against `boot0/catm` (multi-input concatenation, output
-    matches host `cat`) and `boot3/tcc0` (compiles `int main(void)
-    {return 42;}` into a valid aarch64 relocatable object).
-
-11. **Tier 2 gate.** ✅
-    [`scripts/tier2-gate.sh`](../seed-kernel/scripts/tier2-gate.sh) —
-    cats `prelude.scm` + a driver fixture into `combined.scm`, packs
-    initramfs `/init=scheme1, /child-prog=<chain stage>, /combined.scm,
-    <inputs>`, runs the seed kernel, asserts the driver exited 0, and
-    extracts every output file. Verified end-to-end with the canonical
-    fixture
-    [`scripts/fixtures/tier2-tcc-driver.scm`](../seed-kernel/scripts/fixtures/tier2-tcc-driver.scm) —
-    scheme1 evaluates `(run "child-prog" "-nostdlib" "-c" "-o" "out.o"
-    "input.c")`, where `child-prog` is `boot3/tcc0`. Output `out.o` is a
-    valid aarch64 ELF relocatable with the expected `add` and `main`
-    symbols.
-
-## Things still worth doing (out of scope of the original list)
-
-- **FP/ASIMD enabled at EL0.** `setup_mmu` programs
-  `CPACR_EL1.FPEN = 0b11` so user binaries can issue FP/ASIMD
-  instructions without trapping. tcc-built tcc1/tcc2/tcc3 (boot4) emit
-  FP register saves in their start glue; without this they trap with
-  ESR EC=0x07. tcc0 (cc.scm-built) didn't, which is why the original
-  Tier-2 fixture worked with FPEN=00.
-
-- **Port boot3/4 to the seed driver — landed.** A second DSL,
-  [`scripts/lib-seed-runscm.sh`](../scripts/lib-seed-runscm.sh) (sibling
-  to `lib-pipeline.sh`), packs an initramfs of `/init=scheme1`,
-  `/run.scm` (= prelude.scm + the bootN driver), and every input file
-  flat at top level; one qemu boot, scheme1 drives the chain via
-  `(run …)`. `scripts/boot3.sh` ships a hand-written
-  [`scripts/boot3-run.scm`](../scripts/boot3-run.scm) (catm cc-bundle →
-  scheme1 libc/tcc → catm combined.M1pp → M1pp → catm linked.hex2pp →
-  hex2pp -B 0x600000 → tcc0). `scripts/boot4.sh` generates run.scm via
-  [`scripts/boot4-gen-runscm.sh`](../scripts/boot4-gen-runscm.sh) —
-  per-arch values (LIB_HELPER_SRC, LIBTCC1_C_SRCS, LIBTCC1_ASM_SRCS,
-  LIB_HELPER_DEFS) resolved on the host so the .scm body is straight-
-  line `(run …)`. Both bootN.sh now branch on `DRIVER=podman|seed`,
-  mirroring boot0/1/2's lib-pipeline.sh wiring.
-  [`scripts/seed-accept-boot34.sh`](../scripts/seed-accept-boot34.sh)
-  asserts byte-identity vs the podman path. boot3 (tcc0) and boot4
-  (tcc3, hello) round-trip byte-identical. boot4's intermediate
-  artifacts (`crt1.o`, `libc.a`, `libtcc1.a`) differ from the podman
-  path by exactly the length of the embedded source-filename string —
-  the seed harness stages files at flat basenames (`start.S`) while
-  podman mounts them at `/work/in/start.S`, and tcc emits the input
-  path into the .o relocations. tcc3 and hello are unaffected because
-  the linker drops those strings in the final executable.
-
-- **Atomic spawn — landed (zero copy on fork).** Item 6 above describes
-  the kernel side. The previous design had `sys_clone` eager-copy 768 MB
-  parent→alternate-pool per fork; the only reason for the copy was the
-  scheme1 prelude executing a few interpreter cells of user code in the
-  child between clone and execve, which would have mutated parent BSS
-  globals (`heap_next`, `current_heap_next_ptr`, `scratch_next`) if the
-  child shared the parent's pool. `sys_spawn` folds clone+execve into
-  one syscall, the child runs zero user code in the parent's address
-  space, and the eager copy is gone. Same scheme1 binary still runs on
-  Linux (boot{3,4,5} podman path) by probing `(sys-spawn "" '())` once
-  at prelude init and binding `(spawn …)` to clone+execve when the probe
-  returns -ENOSYS=38. boot4 acceptance still hits its `tcc2 == tcc3`
-  fixed point under DRIVER=seed; per-spawn wall time on the boot4
-  fixture dropped from ~5 s to well under 1 s.
-
-- **HVF acceleration.** All seed-driver qemu invocations use
-  `-machine virt,gic-version=3,accel=hvf -cpu host` on macOS hosts.
-  tier2-gate ≈ 22 s; seed-accept (boot0/1/2) ≈ 2 s; boot3 + boot4
-  acceptance combined ≈ 5 min wall (boot3 alone was 5 min before
-  sys_spawn; multi-hour under TCG without HVF).
-
-- **STT_FILE prefix strip — landed.** tcc emitted the unmodified
-  argv path into each `.o`'s `STT_FILE` symbol, so podman-mounted
-  `/work/in/start.S` and seed-staged flat `start.S` produced
-  byte-different relocations. `simple-patches/tcc-0.9.26/strip-file-prefix`
-  drops the bootstrap-internal prefixes (`/work/in/tcc-lib/`,
-  `/work/in/`) under `#if BOOTSTRAP` before the symbol is emitted.
-  Patch is gated on the existing `-D BOOTSTRAP=1` from
-  `stage1-flatten.sh` so it bakes into `tcc.flat.c` and applies to
-  cc.scm-built tcc0 plus every tccN it self-hosts. With this in,
-  `seed-accept-boot34.sh` checks `tcc3`, `hello`, `crt1.o`, `libc.a`,
-  and `libtcc1.a` for byte-identity vs the podman path; all pass.
-
-- **Port boot5 to the seed driver — landed.** With the per-spawn copy
-  gone (sys_spawn), the naive 1300-spawn straight port works without
-  needing tcc batch mode or in-kernel prelude caching. boot5.sh's
-  `DRIVER=seed` branch wires
-  [`scripts/boot5-gen-runscm.sh`](../scripts/boot5-gen-runscm.sh) to
-  emit one `(run "tcc" …)` per source plus the CRT/ar/link tail; the
-  full musl tree is staged in cpio at `/tmp/musl-1.2.5/...` (matching
-  podman's tmpfs layout, so STT_FILE strings are byte-identical).
-  Required kernel adjustments: `MAX_FILES` 64 → 4096 (the cpio carries
-  ~2600 inputs plus ~1300 .o outputs), `path[64]` → `path[96]` (musl
-  paths reach ~50 chars under the `/tmp/musl-1.2.5/obj/...` prefix),
-  and a loud warning when `parse_cpio` drops files (silent drops on
-  MAX_FILES exhaustion otherwise masquerade as random "include not
-  found" tcc errors mid-build). New extension to lib-seed-runscm.sh:
-  `seed_runscm_input_tree` stages a directory subtree into the cpio
-  preserving relative paths.
-  [`scripts/seed-accept-boot5.sh`](../scripts/seed-accept-boot5.sh)
-  asserts byte-identity vs the podman path for libc.a, crt1.o, crti.o,
-  crtn.o, hello.
-
-## Open
-
-- **NULL-page hardening.** Slot 0 is unmapped so a NULL deref faults to
-  the kernel as a user sync; the kernel currently panics rather than
-  delivering a SIGSEGV-equivalent. Acceptable per OS.md (default-action
-  termination is sufficient) but a minor polish opportunity.
-
-- **Cache parsed prelude in kernel (optional optimization).** Each
-  spawn re-parses the 24 KB `prelude.scm` from scratch. Hashing it
-  once and reusing the AST across spawns would shave a fraction of
-  per-spawn overhead. Not load-bearing now that sys_spawn removed the
-  big copy; would matter again if a future driver crosses ~10k spawns.
+# Seed kernel — open items
+
+The [`OS.md`](OS.md) contract is fully met by [`seed-kernel/`](../seed-kernel/):
+boots via the arm64 Linux boot protocol, parses the DTB, unpacks an
+initramfs into an in-memory tmpfs, loads `/init` as a static aarch64
+ELF, dispatches the eight Tier-1 syscalls plus atomic `sys_spawn`
+(private syscall 1024, replaces POSIX `clone`+`execve`) and
+`sys_waitid`, with virtio-blk in/out transports for boot{0..5} use.
+[`scripts/tier1-gate.sh`](../seed-kernel/scripts/tier1-gate.sh) and
+[`scripts/tier2-gate.sh`](../seed-kernel/scripts/tier2-gate.sh) cover
+acceptance; `boot{0..5}.sh DRIVER=seed` is byte-identical to the
+podman path. HVF acceleration enabled.
+
+This file tracks remaining polish.
+
+## tcc3 self-host of `seed-kernel/`
+
+The C side already compiles and runs cleanly under
+[`build/aarch64/boot4/tcc3`](../scripts/boot4.sh) — `kernel.c` and
+`user/{forktest,child,hello}.c` have no inline asm, all sysreg / barrier
+/ cache / TLB / PSCI ops route through the C-callable thunks at the
+bottom of [`kernel.S`](../seed-kernel/kernel.S), and per-syscall
+wrappers delegate to a single `syscall6` toplevel-asm thunk.
+
+Remaining blockers are all in tcc 0.9.26's arm64 assembler /
+linker (see [`docs/TCC-ARM64-ASM.md`](TCC-ARM64-ASM.md) for the
+phase-1/2/3 trajectory of
+`scripts/simple-patches/tcc-0.9.26/files/arm64-asm.c`).
+
+### `kernel.S` mnemonics not yet in phase-2
+
+- `msr` / `mrs` to/from named system registers: `mair_el1`,
+  `tcr_el1`, `ttbr0_el1`, `sctlr_el1`, `cpacr_el1`, `sp_el0`,
+  `esr_el1`, `far_el1`, `vbar_el1`, `hcr_el2`, `spsr_el2`,
+  `elr_el2`, `sp_el1`. Plus the immediate / pseudo forms
+  `msr daifset, #imm` and `mrs CurrentEL`.
+- `eret`.
+- `ic iallu`, `tlbi vmalle1`.
+- `dsb sy` / `dsb ish` / `dmb ish` / `dmb ishst` by name — the
+  current assembler wants `dsb #imm` and rejects the named scope.
+- `adrp` / `adr` + `:lo12:` relocations for label addresses (the
+  EL1 boot path uses `adrp Xn, sym; add Xn, Xn, :lo12:sym`).
+- `.macro` / `.endm` (VENTRY pattern in the exception vector table).
+- `.quad sym1 - sym2` (the arm64 Image header's `image_size` field).
+
+### `kernel.lds` — no `-T` in tcc3
+
+tcc3's linker accepts `-Wl,-Ttext=`, `-Wl,-image-base=`,
+`-Wl,-section-alignment=` only; there is no `-T script`. The seed
+kernel's link layout needs:
+
+- `KEEP(*(.head.text))` placed first (boot header at `0x40080000`).
+- Link-time symbol assignments: `__bss_start`, `__bss_end`,
+  `kstack_top`, `_end`, `_image_end`.
+- A custom `.stack` section sized to 64 KB.
+
+Two paths: tcc gains a small ld-script subset, or `kernel.S`
+self-defines the symbols and reserves stack space inline (with
+`-Wl,-Ttext=0x40080000` replacing the base). The former is bigger
+but reusable across kernel-style targets; the latter ties section
+ordering to input file order and is brittle.
+
+### Toplevel `asm()` `.globl` ordering — worked around
+
+tcc 0.9.26's toplevel-asm parser leaves a symbol marked `UND` if
+`.globl name` precedes `name:`; the reverse order registers it as
+defined (gcc accepts both). Worked around in `user/{forktest,
+child,hello}.c` with `label:` first and `.globl label` immediately
+after, with a sourcecode comment noting the constraint. A tcc-side
+fix would let the comments go.
+
+## Open polish
+
+- **NULL-page hardening.** Slot 0 is unmapped so a NULL deref faults
+  to the kernel as a user sync; the kernel currently panics rather
+  than delivering a SIGSEGV-equivalent. Acceptable per OS.md
+  (default-action termination is sufficient) but a minor polish
+  opportunity.
+
+- **Cache parsed prelude in kernel.** Each spawn re-parses the
+  24 KB `prelude.scm` from scratch. Hashing it once and reusing the
+  AST across spawns would shave a fraction of per-spawn overhead.
+  Not load-bearing now that sys_spawn removed the big copy; would
+  matter again if a future driver crosses ~10k spawns.
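
Editorial sketch, not part of the commit: the new `docs/OS-TODO.md` notes that the patched assembler wants `dsb #imm` and rejects named barrier scopes. The named scopes are aliases for the 4-bit CRm field of the AArch64 DSB/DMB encodings, so one possible stopgap is to lower the names to immediates before assembly. The helper names below (`lower_barrier`, `encode_barrier`) are hypothetical, and whether the tcc assembler also takes `dmb #imm` is an assumption; the repository may solve this differently, for example by teaching the assembler the names.

```python
import re

# CRm values for the barrier option names, per the Arm architecture's
# DMB/DSB encoding. Only sy, ish and ishst appear in the kernel.S list;
# the rest are included for completeness.
BARRIER_SCOPES = {
    "oshld": 1, "oshst": 2, "osh": 3,
    "nshld": 5, "nshst": 6, "nsh": 7,
    "ishld": 9, "ishst": 10, "ish": 11,
    "ld": 13, "st": 14, "sy": 15,
}

# Matches a bare "dsb <name>" or "dmb <name>" line (no trailing comment,
# no trailing newline). Lines already using "#imm" do not match.
_BARRIER_RE = re.compile(r"^(\s*)(dsb|dmb)\s+([a-z]+)\s*$", re.IGNORECASE)


def lower_barrier(line: str) -> str:
    """Rewrite 'dsb ish' style lines to 'dsb #11'; pass others through."""
    m = _BARRIER_RE.match(line)
    if not m:
        return line
    indent, op, scope = m.groups()
    imm = BARRIER_SCOPES.get(scope.lower())
    if imm is None:
        return line  # unknown scope name: leave it for the assembler to reject
    return f"{indent}{op.lower()} #{imm}"


def encode_barrier(op: str, imm: int) -> int:
    """Return the 32-bit instruction word for dsb/dmb with CRm = imm."""
    base = {"dsb": 0xD503309F, "dmb": 0xD50330BF}[op]
    return base | ((imm & 0xF) << 8)


if __name__ == "__main__":
    for src in ["dsb sy", "dsb ish", "dmb ish", "dmb ishst", "eret"]:
        print(f"{src:12} -> {lower_barrier(src)}")
```

The table also makes the gap concrete: `dsb sy` and `dsb #15` assemble to the same word, so accepting the names in the assembler is a lookup in front of the existing immediate path.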