boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs | README

commit ca3b28606cf90cb1a43bfc82ba92858c587971d2
parent 1f7b74c622a0e62f2737ae7a364996207026684b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue, 28 Apr 2026 08:00:49 -0700

docs: CC-SCRATCH plan — bound parse heap via scratch arena

Captures the diagnosis from instrumented runs (per-token interpreter
overhead at ~4.5 KB/token, with an additional O(decls^2) leak in
linear alist-walks via env-cons cells) and the plan to bound it.

Three phases:

1. Pre-refactors that shrink the cross-decl boundary contract — drop
   ps-typedefs (redundant with ps-scope kind=typedef lookups), drop
   cg-globals (write-only after the parse-fn-body arena gate goes
   away), eliminate non-essential ctype mutation (enum-spec / array
   size patch), group the surviving roots into a world record, stop
   precomputing mangled names, lock in sym immutability.

2. scheme1 two-heap primitives — current_heap_ptr indirection through
   cons/alloc_hdr/alloc_bytes, plus use-scratch-heap! /
   use-main-heap! / reset-scratch-heap! / heap-in-main? prims.

3. cc parse-decl-or-fn boundary — scratch by default; at each
   top-level decl, switch to main, deep-copy world roots and
   iter-buffer survivors via a sharing-preserving promote-map, reset
   scratch. Subsumes parse-fn-body's bespoke heap-mark/rewind dance.

Companion to TCC-TODO.md (heap blocker) and CC.md (subset spec).

Diffstat:
Adocs/CC-SCRATCH.md | 356+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 356 insertions(+), 0 deletions(-)

diff --git a/docs/CC-SCRATCH.md b/docs/CC-SCRATCH.md @@ -0,0 +1,356 @@ +# Parse arena via scratch heap + +Plan to bound parse-phase heap residency by routing parser allocations +through a separate scratch heap that resets at top-level decl +boundaries. Companion to [TCC-TODO.md](TCC-TODO.md) (the heap-blocker +that motivates this) and [CC.md](CC.md) (the C subset). + +## Why this, not mark/rewind alone + +The streaming pipeline ([commit 257fdd5](../cc/cc.scm)) reduced +intermediate materialization but didn't bound steady-state heap. +Investigation pinned the cost on per-token interpreter overhead: +every `cond` arm, `let*`, named-let recursion, and CPS closure +allocates env-cons cells in scheme1's bump arena, with no GC to +reclaim them. Measured at ~4.5 KB allocated per consumed token after +removing the O(decls²) `alist-ref` walk leak — so the full TU still +needs ~700 MB at minimum, far above the 256 MiB cap. + +`heap-mark` / `heap-rewind!` can't fix this directly. Per-decl +survivors (newly-bound sym, ctype, name-bv) are constructed *during* +the parse, so they live above the mark; rewinding them as scratch +defeats their purpose. Pre-allocating survivor cells doesn't work +either — the parser doesn't know the shape of what it's building +until it's done. + +A second heap solves it cleanly: parse runs in scratch, and at each +decl boundary we deep-copy the surviving roots into the main heap and +reset scratch. The copy-out lands in main, so it's safe to reset +scratch wholesale. + +## Plan overview + +Three phases, each independently shippable: + +1. **Pre-refactors** — semantically null, but make the boundary + contract clearer and shrink the deep-copy surface. +2. **scheme1 two-heap primitives** — `current-heap-ptr` indirection, + `(use-scratch-heap!)` / `(use-main-heap!)` / `(reset-scratch-heap!)`. +3. **cc parse-decl-or-fn boundary** — scratch-by-default, promote-and-reset + at each top-level decl; subsumes parse-fn-body's bespoke arena. + +--- + +## Phase 1 — Pre-refactors + +### 1.1 Drop `ps-typedefs` + +Redundant index. A name is a typedef iff `scope-lookup` returns a sym +with `(eq? (sym-kind sm) 'typedef)`. + +```scheme +(define (typedef? ps n) + (let ((sm (scope-lookup ps n))) + (and sm (eq? (sym-kind sm) 'typedef)))) +``` + +`parse-decl-spec` (cc.scm:4112–4117) currently calls `(typedef? ps n)` +and then `(scope-lookup ps n)` — collapses to one walk. + +Removes one root from the boundary work and one slot from `pstate`. + +### 1.2 Drop `cg-globals` + +Only reader is `parse-fn-body` (5338, 5344) — the canary for "did the +body add a global?", gating the heap-rewind. With Phase 3 that gate +goes away, and the alist becomes write-only. + +`cg-emit-global` / `cg-emit-extern` keep emitting bytes into +`cg-data` / `cg-bss` (fixed-storage); they just stop maintaining the +alist. + +If anything later wants "all emitted globals", iterate ps-scope's +file-scope frame filtered to `kind ∈ {fn,var}` with `defined?` set. + +Removes another root and a `cg` slot. + +### 1.3 Eliminate non-essential ctype mutation + +Three mutation sites today: + +- **`complete-agg!` (4208–4210)** — forward struct/union completion. + Cross-decl, structurally necessary for forward-decl identity. + **Keep.** This is the one mutation the promote walker has to + handle. +- **`parse-enum-spec` (4226)** — same-decl. Build the members list + first, then construct the enum ctype with that ext directly. + Remove the `ctype-ext-set!`. +- **`%init-fix-array-size!` (4882–4886)** — same-decl. Build the + array ctype with the right size after the initializer count is + known instead of fixing it up. Remove both `ctype-ext-set!` / + `ctype-size-set!`. + +After: ctype has exactly one mutation site, and the doc comment on +the record (321–327) shrinks to "size/align/ext mutate only on +forward struct/union completion." + +### 1.4 Group cross-boundary state into a `world` record + +Today's persistent-across-decl state is split across `pstate` (scope, +tags, typedefs) and `cg` (globals, str-pool). After 1.1/1.2 it +collapses to three roots — make them one record: + +```scheme +(define-record-type world + (%world scope tags str-pool) + world? + (scope world-scope world-scope-set!) + (tags world-tags world-tags-set!) + (str-pool world-str-pool world-str-pool-set!)) +``` + +`pstate` and `cg` both reference the same `world`. The boundary +contract reads as one record definition. If macros come back +later, `world-macros` joins the same record. + +### 1.5 Stop precomputing mangled names + +`handle-decl` does `(bytevector-append "cc__" n)` at every var/fn +binding (4727, 4736–4738, 4755) and stores the result in `sym-slot`. +But the same mangling can happen at emit time — it's deterministic +from `(sym-name sm)`. + +Drop the precompute. `cg-push-sym` / `cg-emit-global` / +`cg-emit-extern` mangle on demand. `sym-slot` becomes a clean union: +fixnum (auto local), #f (fn / non-static var), bv only for the +block-static mangled case (4736–4738). + +Saves a bv per fn/var from being copied across the boundary; shrinks +the `promote-sym` walker. + +### 1.6 Note `sym` immutability as load-bearing + +Already true (no `sym-*-set!` exists). Add a doc-comment line so it +stays true. After 1.3, the only mutable record in cc is `ctype`. +Promotion is dramatically easier when records can't be mutated +post-construction. + +--- + +## Phase 2 — scheme1 two-heap primitives + +Add a second heap region and an indirection. All Lisp-level +allocation in scheme1 goes through three sites: `cons`, `alloc_hdr`, +`alloc_bytes`. Each currently bumps `heap_next`. Route them through +`current_heap_ptr` instead. + +### 2.1 New globals + +``` +heap_buf # main heap +heap_next +heap_end + +scratch_buf # scratch heap +scratch_next +scratch_end + +current_heap_ptr # &heap_next or &scratch_next +current_heap_end # &heap_end or &scratch_end +``` + +Default at `heap_init`: main. + +### 2.2 New primitives + +| Form | Effect | +|---|---| +| `(use-scratch-heap!)` | Set `current_heap_ptr = &scratch_next`, `current_heap_end = &scratch_end`. | +| `(use-main-heap!)` | Set `current_heap_ptr = &heap_next`, `current_heap_end = &heap_end`. | +| `(reset-scratch-heap!)` | `scratch_next = scratch_buf` (round-aligned). | +| `(in-main-heap?)` | bool — true iff currently allocating in main. (Optional; useful for assertions.) | +| `(heap-in-main? obj)` | bool — true iff `obj`'s pointer falls inside `[heap_buf, heap_buf+heap_size)`. Used by promote walkers to short-circuit on already-promoted / interned objects. | + +`heap-mark` / `heap-rewind!` continue to operate on whichever heap is +current. Mostly used in scratch from now on. + +### 2.3 Sizing + +Scratch needs to cover the worst-case decl + per-token churn between +boundaries. Estimate from the per-decl probe data: max ~1 MB per +decl observed in the test inputs, well below a 16 MiB scratch. Start +with 16 MiB scratch, 256 MiB main (current cap). + +ELF `p_memsz` grows by the scratch size — bumped from 512 MB to +`512 + scratch` MB. + +### 2.4 Fault behavior + +If an alloc would overflow scratch: `runtime_error` ("scratch +exhausted"). Cleaner than silently stomping into main. The "live +data per decl" is bounded by parser state; if scratch runs out it's +a real bug or pathological input, not a sizing oversight. + +### 2.5 Error path discipline + +`die` (sys-exits) is the only abnormal exit. It doesn't need to +restore heap selection — process exits. So mid-parse errors with +the allocator pointed at scratch are fine. + +### 2.6 Test + +A scheme1-level test under `tests/scheme1/` exercises the prims: +allocate in scratch, switch to main, allocate, verify both ptrs in +their respective ranges, reset scratch, verify scratch ptrs become +dangling. Same shape as `93-heap-mark-rewind.scm`. + +--- + +## Phase 3 — cc parse-decl-or-fn boundary + +### 3.1 Driver shape + +```scheme +(define (parse-translation-unit ps) + (cond + ((eq? (tok-kind (peek ps)) 'EOF) #t) + (else + (use-scratch-heap!) + (let ((R (snapshot-roots (ps-world ps)))) + (parse-decl-or-fn ps) + (use-main-heap!) + (promote-roots! (ps-world ps) R) + (promote-iter-buffers! (ps-iter ps))) + (reset-scratch-heap!) + (parse-translation-unit ps)))) +``` + +`snapshot-roots` captures the `eq?`-heads of the three world alists +(plus any peek-buffer survivors that crossed in from the previous +decl, which are already in main). `promote-roots!` walks each alist +from its current head down to the snapshotted head, deep-copying +each new entry into main and rebuilding the chain. + +### 3.2 Promote-map for sharing preservation + +Single alist mapping `scratch-ptr → main-ptr`, cleared at each +boundary. Used by all `promote-*` walkers to preserve `eq?` identity: + +- Forward struct/union ctype completed in this decl: tags promotion + finds the (already-in-main) ctype, walks its now-scratch `ext` and + rewrites it in place via `ctype-ext-set!` — the one allowed + ctype mutation, now happening from scratch context to a main-heap + record (safe). +- Same struct ctype referenced by multiple ptrs in the same decl: + map ensures all copies point at one main-heap ctype. + +### 3.3 Promote walkers + +``` +promote-bv ; bytevector-copy +promote-loc ; new record, copy file-bv via map +promote-tok ; new record, recurse on value (if bv) + loc + hide +promote-ctype ; map-cached, branches on kind +promote-sym ; new record, copy name-bv, promote type, copy slot if bv +promote-fields ; (name-bv ctype offset) list +promote-fn-params ; (name-bv-or-#f . ctype) list +promote-enum-members ; (name-bv . fixnum) list +``` + +All short, all leaf-recursive. Scratch context calls main-heap +allocators after `(use-main-heap!)` is in effect. + +### 3.4 Promotion order + +`world-tags` first (struct/union/enum identity anchors), then +`world-scope` and `world-str-pool`. The map preserves identity +across the second pass. + +### 3.5 Iter buffer copy + +`promote-iter-buffers!` walks `(tok-iter-buf pp-iter)`, +`(tok-iter-buf lex-iter)`, and pp-state's `up-pending` / `out-buf` / +`cur-file` / `macros`. Toks in those buffers are scratch-allocated by +lex; promote each. + +For tcc.flat.c (no macros, no #line): typically 0–2 toks in pp-iter +peek buf, 0–2 in lex-iter peek buf. Bounded constant. + +### 3.6 Delete `parse-fn-body`'s arena dance + +Lines 5328–5357 collapse to: + +```scheme +(define (parse-fn-body ps name dt) + (scope-bind! ps name + (%sym name 'fn 'extern dt #f #t)) + (%parse-fn-body-inner ps name dt)) +``` + +The hoisted `scope-bind!` / heap-mark / state-eq? gate / fn-meta +cleanup all go away. The body runs in scratch like everything else. +Block-statics, string literals, block-scope tags created inside the +body get promoted at the enclosing parse-decl-or-fn boundary along +with everything else. + +The doc block at 5305–5327 (the A→B→C arena pattern) shrinks to a +pointer at this doc. + +### 3.7 cc-init runs in main + +`cg-init`, `make-pstate`, the iter constructors, `%lex-scratch`, +the cg bufs — all the long-lived top-level state — must be allocated +in main. The default heap is main, so this is automatic provided +`use-scratch-heap!` is only called inside `parse-translation-unit`. + +--- + +## Expected impact + +For tcc.flat.c (608 KB, 18 896 lines): + +| state | predicted high water | +|--------------------------------|---------------------:| +| current | heap-exhausted >256 MB | +| post Phase 1 (refactors only) | unchanged — still exhausted | +| post Phase 2 (prims, unused) | unchanged | +| post Phase 3 (boundary live) | scratch ≤ 16 MB peak; main ≤ 30 MB (cg bufs + persistent decls). Total ~45 MB. | + +The fixed cost is `cg-init`'s pre-allocated bufs (~12 MiB) plus +linear-in-decls persistent state in main heap. Per-decl peak in +scratch is bounded by the most-allocating single decl (deep-nested +declarators, large struct bodies) — well under 16 MiB on observed +inputs. + +## Migration order + +Phase 1 lands as 4–6 small commits, each independently testable. +None touch heap behavior; existing tests cover them. + +Phase 2 lands as one commit to scheme1 + a scheme1-level test. cc +is unaffected (default heap is main; cc never calls the new prims). + +Phase 3 lands as one commit to cc.scm. The cc test suites and +tcc.flat.c probe (TCC-TODO.md repro) are the validation. + +## Open questions + +- **Do we want to also wrap `pp-iter-pull` per-token?** Inside + scratch, it costs nothing — the env-cons cells go to scratch and + reset at the next decl boundary. Probably no, simpler. +- **Promote-map as alist vs. hash-table.** Per-decl entry count is + small (< 100 typical, < 500 worst case), so alist is fine. Hash + is a scheme1 capability change anyway. +- **Do we need an `(in-main-heap thunk)` wrapper** that + save/use-main/run/restore for inline copy-out points? + Probably yes — makes promote walkers callable from inside other + contexts without manual juggling. Decide during Phase 3 build. + +## Related + +- [TCC-TODO.md](TCC-TODO.md) — the heap-explosion blocker this + plan resolves. +- [CC.md](CC.md) — the C subset the cc accepts. +- The deleted [CC-STREAM.md] (visible at git rev `67e43e7`) — prior + plan that turned the pipeline streaming. This plan addresses what + remained after streaming.