boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit bcb244c97bd9a962f8f7fb969f770d7e8d00a195
parent 060023ae36abcbc5aad2e62383201b9ef0f3834c
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Wed, 29 Apr 2026 20:49:37 -0700

Make trace macro preserve registers

Diffstat:
M P1/P1pp.P1pp     | 37 +++++++++++++++++++++++++++++++------
M docs/DEBUG.md    | 22 ++++++++++++----------
M docs/TCC-TODO.md | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++++---------------------------
3 files changed, 103 insertions(+), 46 deletions(-)
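
The new `%trace` comment in this commit says the macro borrows 112 aligned bytes: 16 for the backend frame prefix plus 88 for saved registers. Those two figures add up to 104, so the remaining 8 bytes are presumably alignment padding. The following is a minimal sketch of that arithmetic, not code from the repository. The 8-byte slot size is inferred from the stride of the `%st` offsets in the diff, and the 16-byte alignment is an assumption that happens to reproduce the 112 figure; `align_up` and `trace_frame` are hypothetical helper names.

```python
# Registers saved by %trace, in the order of the %st lines in the diff.
SAVED_REGS = ["a0", "a1", "a2", "a3", "t0", "t1", "t2", "s0", "s1", "s2", "s3"]

SLOT_BYTES = 8     # assumption: inferred from the 0, 8, 16, ... offset stride
PREFIX_BYTES = 16  # "backend frame prefix" size quoted in the macro comment
ALIGN_BYTES = 16   # assumption: alignment that yields the quoted 112 bytes


def align_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def trace_frame():
    """Return (per-register offsets, save-area bytes, total borrowed bytes)."""
    offsets = {reg: i * SLOT_BYTES for i, reg in enumerate(SAVED_REGS)}
    save_bytes = len(SAVED_REGS) * SLOT_BYTES
    total = align_up(PREFIX_BYTES + save_bytes, ALIGN_BYTES)
    return offsets, save_bytes, total


if __name__ == "__main__":
    offsets, save_bytes, total = trace_frame()
    print("save area:", save_bytes, "bytes; borrowed:", total, "bytes")
    for reg, off in offsets.items():
        print(reg, "-> offset", off)
```

Under these assumptions the computed offsets run from 0 for `a0` to 80 for `s3`, matching the `%st` / `%ld` pairs in the macro body. How the backend places the 16-byte prefix relative to those offsets is not visible in this diff, so the sketch leaves it out.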

diff --git a/P1/P1pp.P1pp b/P1/P1pp.P1pp
@@ -1441,9 +1441,9 @@
 #
 # %trace(tag_addr, tag_len) — emit a runtime stderr probe at the call
 # site. Prints `[trace @0xHEX TAG]\n` to stderr, where 0xHEX is the
-# runtime address of the instruction immediately following the trace's
-# call sequence (the address of `:@here` in this site's expansion) and
-# TAG is the byte string at [tag_addr..tag_addr+tag_len).
+# runtime address of this trace site (the address of `:@here` in this
+# site's expansion) and TAG is the byte string at
+# [tag_addr..tag_addr+tag_len).
 #
 # `tag_addr` is a label reference token (e.g. `&cc__str_3`) — the
 # caller is responsible for emitting the bytes at that label. cc.scm's
@@ -1457,15 +1457,40 @@
 # guarantees that each function's first instruction *is* a trace call,
 # so the printed address falls on a known function-entry boundary.
 #
-# Clobbers: a0..a2, ra, t0..t2 (per %call ABI). Use only inside a %fn
-# body where the caller has already spilled live argument regs (or
-# doesn't need them past the trace point).
+# Preserves all exposed P1 registers (a0..a3, t0..t2, s0..s3) by
+# borrowing 112 aligned bytes below the current stack pointer: 16 bytes
+# for the backend frame prefix plus 88 bytes for saved registers. Use
+# only inside an active %fn body, after %enter and before %eret.
 %macro trace(tag_addr, tag_len)
 :@here
+    %addi(sp, sp, -112)
+    %st(a0, sp, 0)
+    %st(a1, sp, 8)
+    %st(a2, sp, 16)
+    %st(a3, sp, 24)
+    %st(t0, sp, 32)
+    %st(t1, sp, 40)
+    %st(t2, sp, 48)
+    %st(s0, sp, 56)
+    %st(s1, sp, 64)
+    %st(s2, sp, 72)
+    %st(s3, sp, 80)
     %la(a0, &@here)
     %la(a1, tag_addr)
     %li(a2, tag_len)
     %call(&libp1pp__trace)
+    %ld(a0, sp, 0)
+    %ld(a1, sp, 8)
+    %ld(a2, sp, 16)
+    %ld(a3, sp, 24)
+    %ld(t0, sp, 32)
+    %ld(t1, sp, 40)
+    %ld(t2, sp, 48)
+    %ld(s0, sp, 56)
+    %ld(s1, sp, 64)
+    %ld(s2, sp, 72)
+    %ld(s3, sp, 80)
+    %addi(sp, sp, 112)
 %endm

 # libp1pp__trace(addr=a0, tag_addr=a1, tag_len=a2) — print
diff --git a/docs/DEBUG.md b/docs/DEBUG.md
@@ -13,10 +13,8 @@ body. At runtime each entry prints one line:
 [trace @601a34 main]
 ```

-The hex is the runtime address of the instruction immediately after
-the trace's call sequence (i.e. the first instruction of the body
-proper). The trailing word is the mangled function name, interned
-through cc's regular string pool.
+The hex is the runtime address of the trace site. The trailing word is
+the mangled function name, interned through cc's regular string pool.

 Build + run:

@@ -34,12 +32,16 @@ make tcc-boot2 ARCH=aarch64 CC_TRACE_EMIT=1
 ./build/aarch64/tcc-boot2/tcc-boot2 -version 2>trace.log
 ```

-Cost: ~6 instructions + one call per traced function. Off by default;
-the `%trace` macro itself lives in [P1/P1pp.P1pp](../P1/P1pp.P1pp)
-(§Tracepoint) and can also be invoked manually — drop a
-`%trace(&label, len)` into any `combined.M1pp` snapshot under
-`build/$ARCH/.work/<src>/`, re-run the m1pp/M0/hex2 stages, and bisect
-by stderr position.
+Cost: register save/restore traffic plus one call per traced function.
+Off by default; the `%trace` macro itself lives in
+[P1/P1pp.P1pp](../P1/P1pp.P1pp) (§Tracepoint) and can also be invoked
+manually — drop a `%trace(&label, len)` into any `combined.M1pp`
+snapshot under `build/$ARCH/.work/<src>/`, re-run the m1pp/M0/hex2
+stages, and bisect by stderr position. `%trace` preserves the exposed
+P1 registers (`a0..a3`, `t0..t2`, `s0..s3`) by borrowing temporary
+stack space, so it is safe to add inside an active `%fn` body after
+the function prologue. The borrowed area includes the backend's
+standard frame prefix, so trace saves stay below the caller's frame.

 To map an address back to its function, see the lookup tool below.

diff --git a/docs/TCC-TODO.md b/docs/TCC-TODO.md
@@ -37,7 +37,7 @@ head -c 50000 build/tcc/X86_64/tcc.flat.c \
 # then re-run the podman invocation against tcc.head.c
 ```

-## Status — parse + cg-finish complete on tcc.flat.c
+## Status — tcc-boot2 builds; runtime segfault remains

 The full 608 KB TU now parses to EOF (line 18800) and cg-finish emits
 ~6.5 MB of P1pp. No semantic-coverage gap remains in this TU. Last
@@ -53,18 +53,51 @@ aarch64 cc-debug run:
 [cc] phase=cg-finish: heap 90 674 020 out-bytes 6 489 215
 ```

-The remaining work is downstream of cc.scm:
+The emitted P1pp now assembles through m1pp → M0 → hex2 and links with
+the mes-libc subset via the `tcc-boot2` make target. The active blocker
+is runtime correctness: `build/aarch64/tcc-boot2/tcc-boot2 -version`
+still exits 139 with no stdout.

-1. **Assemble the emitted P1pp** through the existing
-   `scripts/boot-build-p1pp.sh` pipeline (m1pp → M0 → hex2). The output
-   is large by P1pp standards — about 2× the scheme1 binary's input —
-   so this exercises m1pp/M0 throughput at a scale they haven't yet
-   been used at. Expect to find table size or scratch caps that need
-   bumping in those tools, or P1pp emission patterns cc.scm produces
-   that the macro layer doesn't accept verbatim.
-2. **Run the resulting `tcc-boot2`** and verify `-version`. Beyond
-   that, milestone 4 in [CC.md §Validation milestones](CC.md) — full
-   self-host of tcc — is the end goal.
+Current traced aarch64 crash tail with `CC_TRACE_EMIT=1`:
+
+```
+[trace @663108 cc__next_nomacro]
+[trace @662d68 cc__next_nomacro_spc]
+[trace @658d20 cc__next_nomacro1]
+[trace @630580 cc__tok_alloc_new]
+[trace @62d228 cc__tal_realloc_impl]
+[trace @607bb4 memcpy]
+[trace @6078e8 _memcpy]
+Segmentation fault (core dumped)
+```
+
+Address lookup for the tail:
+
+```
+0x630580 cc__tok_alloc_new+0x30
+0x62d228 cc__tal_realloc_impl+0x30
+0x607bb4 memcpy+0x30
+0x6078e8 _memcpy+0x30
+```
+
+Source review puts the final `memcpy` after `tal_realloc_impl` returns
+in `tok_alloc_new`:
+
+```
+ts = tal_realloc_impl(&toksym_alloc, 0, sizeof(TokenSym) + len);
+...
+memcpy(ts->str, str, len);
+```
+
+So the next investigation should focus on the returned `TokenSym`
+pointer, the computed `TokenSym::str` offset, and the `len` / `str`
+arguments at that call site. The reduced
+`tests/cc-libc/18-tinyalloc-token.c` fixture currently passes, including
+with traced libc, so the failing condition likely depends on the full
+tcc struct layout or parser token stream rather than TinyAlloc alone.
+
+Milestone 4 in [CC.md §Validation milestones](CC.md) remains the end
+goal: compile tcc and verify `tcc-boot2 -version` runs.

 Harness target: `make tcc-boot2 ARCH=amd64` (see Makefile +
 `scripts/boot-build-cc.sh`) drives stage1-flatten on the host, runs
@@ -281,32 +314,29 @@ decl complete with parse heap at ~31 MB on the 1612-line cut.

 See [DEBUG.md](DEBUG.md) — `CC_TRACE_EMIT=1` injects per-function-entry
 stderr probes; `m1-symbols.py lookup` resolves the printed addresses
-back to functions.
+back to functions. `%trace` now saves/restores all exposed P1 registers
+(`a0..a3`, `t0..t2`, `s0..s3`) by borrowing stack space inside the
+current `%fn` frame, so manual probes can be inserted in live code
+without clobbering caller state.

 ## Expected next-tier blockers (downstream of cc.scm)

-The semantic parser has covered every construct in this TU. The next
-likely walls live in the assembly side and at runtime:
-
-- **m1pp / M0 / hex2 caps under a 6.5 MB P1pp**. These tools have only
-  ever been driven against scheme1-scale inputs (tens to hundreds of
-  KB of source, maybe a few MB after expansion). cc.scm's tcc.c output
-  is ~6.5 MB pre-expansion. Expect symbol-table, line-buffer, or
-  scratch-arena caps to need bumping.
-- **Patterns cc.scm emits that m1pp / M0 don't accept**. Until now the
-  cc has only been validated against the small `tests/cc/*` programs.
-  Larger programs may hit edge cases in label naming, literal sizing,
-  or directive ordering that the existing tests didn't reach.
-- **Wall-clock**. Parsing to EOF takes ~30 s under scheme1 today;
-  cg-finish adds another bump. Assembly is in addition. A first end-
-  to-end run will set the baseline.
+The semantic parser has covered every construct in this TU, and the
+large P1pp output now makes it through m1pp / M0 / hex2. The next likely
+walls are runtime/codegen mismatches:
+
 - **`tcc-boot2 -version` correctness**. Even when the toolchain
   produces an ELF, the runtime still has to walk through tcc's setup
   (string-table init, command-line parsing, output for `-version`)
   without tripping on cg semantics that pass the small tests but
   diverge from C in subtle ways.
-- **libc**. The 39 unresolved externals in [LIBC.txt](LIBC.txt) are
-  unmet — there is no libc in the link today. See §libc strategy below.
+- **Struct layout / flexible-tail object correctness**. The current
+  crash path is `tok_alloc_new` copying into `TokenSym::str`, so offsets
+  around `TokenSym`, `TinyAlloc`, and related tcc structs are high-value
+  targets for small focused tests.
+- **libc behavior under full tcc load**. The mes-libc subset is now in
+  the link, but runtime helpers still need validation under tcc's actual
+  allocation/string/token workloads.

 The end goal is milestone 4 in [CC.md §Validation milestones](CC.md) —
 "Compile tcc.c (under the tcc-mes defines) → tcc-boot2; verify