commit ca3b28606cf90cb1a43bfc82ba92858c587971d2
parent 1f7b74c622a0e62f2737ae7a364996207026684b
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Tue, 28 Apr 2026 08:00:49 -0700
docs: CC-SCRATCH plan — bound parse heap via scratch arena
Captures the diagnosis from instrumented runs (per-token interpreter
overhead at ~4.5 KB/token, with an additional O(decls^2) leak in
linear alist-walks via env-cons cells) and the plan to bound it.
Three phases:
1. Pre-refactors that shrink the cross-decl boundary contract — drop
ps-typedefs (redundant with ps-scope kind=typedef lookups), drop
cg-globals (write-only after the parse-fn-body arena gate goes
away), eliminate non-essential ctype mutation (enum-spec / array
size patch), group the surviving roots into a world record, stop
precomputing mangled names, lock in sym immutability.
2. scheme1 two-heap primitives — current_heap_ptr indirection through
cons/alloc_hdr/alloc_bytes, plus use-scratch-heap! /
use-main-heap! / reset-scratch-heap! / heap-in-main? prims.
3. cc parse-decl-or-fn boundary — scratch by default; at each
top-level decl, switch to main, deep-copy world roots and
iter-buffer survivors via a sharing-preserving promote-map, reset
scratch. Subsumes parse-fn-body's bespoke heap-mark/rewind dance.
Companion to TCC-TODO.md (heap blocker) and CC.md (subset spec).
Diffstat:
| A | docs/CC-SCRATCH.md | | | 356 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 file changed, 356 insertions(+), 0 deletions(-)
diff --git a/docs/CC-SCRATCH.md b/docs/CC-SCRATCH.md
@@ -0,0 +1,356 @@
+# Parse arena via scratch heap
+
+Plan to bound parse-phase heap residency by routing parser allocations
+through a separate scratch heap that resets at top-level decl
+boundaries. Companion to [TCC-TODO.md](TCC-TODO.md) (the heap-blocker
+that motivates this) and [CC.md](CC.md) (the C subset).
+
+## Why this, not mark/rewind alone
+
+The streaming pipeline ([commit 257fdd5](../cc/cc.scm)) reduced
+intermediate materialization but didn't bound steady-state heap.
+Investigation pinned the cost on per-token interpreter overhead:
+every `cond` arm, `let*`, named-let recursion, and CPS closure
+allocates env-cons cells in scheme1's bump arena, with no GC to
+reclaim them. Measured at ~4.5 KB allocated per consumed token after
+removing the O(decls²) `alist-ref` walk leak — so the full TU still
+needs ~700 MB at minimum, far above the 256 MiB cap.
+
+`heap-mark` / `heap-rewind!` can't fix this directly. Per-decl
+survivors (newly-bound sym, ctype, name-bv) are constructed *during*
+the parse, so they live above the mark; rewinding them as scratch
+defeats their purpose. Pre-allocating survivor cells doesn't work
+either — the parser doesn't know the shape of what it's building
+until it's done.
+
+A second heap solves it cleanly: parse runs in scratch, and at each
+decl boundary we deep-copy the surviving roots into the main heap and
+reset scratch. The copy-out lands in main, so it's safe to reset
+scratch wholesale.
+
+## Plan overview
+
+Three phases, each independently shippable:
+
+1. **Pre-refactors** — semantically null, but make the boundary
+ contract clearer and shrink the deep-copy surface.
+2. **scheme1 two-heap primitives** — `current-heap-ptr` indirection,
+ `(use-scratch-heap!)` / `(use-main-heap!)` / `(reset-scratch-heap!)`.
+3. **cc parse-decl-or-fn boundary** — scratch-by-default, promote-and-reset
+ at each top-level decl; subsumes parse-fn-body's bespoke arena.
+
+---
+
+## Phase 1 — Pre-refactors
+
+### 1.1 Drop `ps-typedefs`
+
+Redundant index. A name is a typedef iff `scope-lookup` returns a sym
+with `(eq? (sym-kind sm) 'typedef)`.
+
+```scheme
+(define (typedef? ps n)
+ (let ((sm (scope-lookup ps n)))
+ (and sm (eq? (sym-kind sm) 'typedef))))
+```
+
+`parse-decl-spec` (cc.scm:4112–4117) currently calls `(typedef? ps n)`
+and then `(scope-lookup ps n)` — collapses to one walk.
+
+Removes one root from the boundary work and one slot from `pstate`.
+
+### 1.2 Drop `cg-globals`
+
+Only reader is `parse-fn-body` (5338, 5344) — the canary for "did the
+body add a global?", gating the heap-rewind. With Phase 3 that gate
+goes away, and the alist becomes write-only.
+
+`cg-emit-global` / `cg-emit-extern` keep emitting bytes into
+`cg-data` / `cg-bss` (fixed-storage); they just stop maintaining the
+alist.
+
+If anything later wants "all emitted globals", iterate ps-scope's
+file-scope frame filtered to `kind ∈ {fn,var}` with `defined?` set.
+
+Removes another root and a `cg` slot.
+
+### 1.3 Eliminate non-essential ctype mutation
+
+Three mutation sites today:
+
+- **`complete-agg!` (4208–4210)** — forward struct/union completion.
+ Cross-decl, structurally necessary for forward-decl identity.
+ **Keep.** This is the one mutation the promote walker has to
+ handle.
+- **`parse-enum-spec` (4226)** — same-decl. Build the members list
+ first, then construct the enum ctype with that ext directly.
+ Remove the `ctype-ext-set!`.
+- **`%init-fix-array-size!` (4882–4886)** — same-decl. Build the
+ array ctype with the right size after the initializer count is
+ known instead of fixing it up. Remove both `ctype-ext-set!` /
+ `ctype-size-set!`.
+
+After: ctype has exactly one mutation site, and the doc comment on
+the record (321–327) shrinks to "size/align/ext mutate only on
+forward struct/union completion."
+
+### 1.4 Group cross-boundary state into a `world` record
+
+Today's persistent-across-decl state is split across `pstate` (scope,
+tags, typedefs) and `cg` (globals, str-pool). After 1.1/1.2 it
+collapses to three roots — make them one record:
+
+```scheme
+(define-record-type world
+ (%world scope tags str-pool)
+ world?
+ (scope world-scope world-scope-set!)
+ (tags world-tags world-tags-set!)
+ (str-pool world-str-pool world-str-pool-set!))
+```
+
+`pstate` and `cg` both reference the same `world`. The boundary
+contract reads as one record definition. If macros come back
+later, `world-macros` joins the same record.
+
+### 1.5 Stop precomputing mangled names
+
+`handle-decl` does `(bytevector-append "cc__" n)` at every var/fn
+binding (4727, 4736–4738, 4755) and stores the result in `sym-slot`.
+But the same mangling can happen at emit time — it's deterministic
+from `(sym-name sm)`.
+
+Drop the precompute. `cg-push-sym` / `cg-emit-global` /
+`cg-emit-extern` mangle on demand. `sym-slot` becomes a clean union:
+fixnum (auto local), #f (fn / non-static var), bv only for the
+block-static mangled case (4736–4738).
+
+Saves a bv per fn/var from being copied across the boundary; shrinks
+the `promote-sym` walker.
+
+### 1.6 Note `sym` immutability as load-bearing
+
+Already true (no `sym-*-set!` exists). Add a doc-comment line so it
+stays true. After 1.3, the only mutable record in cc is `ctype`.
+Promotion is dramatically easier when records can't be mutated
+post-construction.
+
+---
+
+## Phase 2 — scheme1 two-heap primitives
+
+Add a second heap region and an indirection. All Lisp-level
+allocation in scheme1 goes through three sites: `cons`, `alloc_hdr`,
+`alloc_bytes`. Each currently bumps `heap_next`. Route them through
+`current_heap_ptr` instead.
+
+### 2.1 New globals
+
+```
+heap_buf # main heap
+heap_next
+heap_end
+
+scratch_buf # scratch heap
+scratch_next
+scratch_end
+
+current_heap_ptr # &heap_next or &scratch_next
+current_heap_end # &heap_end or &scratch_end
+```
+
+Default at `heap_init`: main.
+
+### 2.2 New primitives
+
+| Form | Effect |
+|---|---|
+| `(use-scratch-heap!)` | Set `current_heap_ptr = &scratch_next`, `current_heap_end = &scratch_end`. |
+| `(use-main-heap!)` | Set `current_heap_ptr = &heap_next`, `current_heap_end = &heap_end`. |
+| `(reset-scratch-heap!)` | `scratch_next = scratch_buf` (round-aligned). |
+| `(in-main-heap?)` | bool — true iff currently allocating in main. (Optional; useful for assertions.) |
+| `(heap-in-main? obj)` | bool — true iff `obj`'s pointer falls inside `[heap_buf, heap_buf+heap_size)`. Used by promote walkers to short-circuit on already-promoted / interned objects. |
+
+`heap-mark` / `heap-rewind!` continue to operate on whichever heap is
+current. Mostly used in scratch from now on.
+
+### 2.3 Sizing
+
+Scratch needs to cover the worst-case decl + per-token churn between
+boundaries. Estimate from the per-decl probe data: max ~1 MB per
+decl observed in the test inputs, well below a 16 MiB scratch. Start
+with 16 MiB scratch, 256 MiB main (current cap).
+
+ELF `p_memsz` grows by the scratch size — bumped from 512 MB to
+`512 + scratch` MB.
+
+### 2.4 Fault behavior
+
+If an alloc would overflow scratch: `runtime_error` ("scratch
+exhausted"). Cleaner than silently stomping into main. The "live
+data per decl" is bounded by parser state; if scratch runs out it's
+a real bug or pathological input, not a sizing oversight.
+
+### 2.5 Error path discipline
+
+`die` (sys-exits) is the only abnormal exit. It doesn't need to
+restore heap selection — process exits. So mid-parse errors with
+the allocator pointed at scratch are fine.
+
+### 2.6 Test
+
+A scheme1-level test under `tests/scheme1/` exercises the prims:
+allocate in scratch, switch to main, allocate, verify both ptrs in
+their respective ranges, reset scratch, verify scratch ptrs become
+dangling. Same shape as `93-heap-mark-rewind.scm`.
+
+---
+
+## Phase 3 — cc parse-decl-or-fn boundary
+
+### 3.1 Driver shape
+
+```scheme
+(define (parse-translation-unit ps)
+ (cond
+ ((eq? (tok-kind (peek ps)) 'EOF) #t)
+ (else
+ (use-scratch-heap!)
+ (let ((R (snapshot-roots (ps-world ps))))
+ (parse-decl-or-fn ps)
+ (use-main-heap!)
+ (promote-roots! (ps-world ps) R)
+ (promote-iter-buffers! (ps-iter ps)))
+ (reset-scratch-heap!)
+ (parse-translation-unit ps))))
+```
+
+`snapshot-roots` captures the `eq?`-heads of the three world alists
+(plus any peek-buffer survivors that crossed in from the previous
+decl, which are already in main). `promote-roots!` walks each alist
+from its current head down to the snapshotted head, deep-copying
+each new entry into main and rebuilding the chain.
+
+### 3.2 Promote-map for sharing preservation
+
+Single alist mapping `scratch-ptr → main-ptr`, cleared at each
+boundary. Used by all `promote-*` walkers to preserve `eq?` identity:
+
+- Forward struct/union ctype completed in this decl: tags promotion
+ finds the (already-in-main) ctype, walks its now-scratch `ext` and
+ rewrites it in place via `ctype-ext-set!` — the one allowed
+ ctype mutation, now happening from scratch context to a main-heap
+ record (safe).
+- Same struct ctype referenced by multiple ptrs in the same decl:
+ map ensures all copies point at one main-heap ctype.
+
+### 3.3 Promote walkers
+
+```
+promote-bv ; bytevector-copy
+promote-loc ; new record, copy file-bv via map
+promote-tok ; new record, recurse on value (if bv) + loc + hide
+promote-ctype ; map-cached, branches on kind
+promote-sym ; new record, copy name-bv, promote type, copy slot if bv
+promote-fields ; (name-bv ctype offset) list
+promote-fn-params ; (name-bv-or-#f . ctype) list
+promote-enum-members ; (name-bv . fixnum) list
+```
+
+All short, all leaf-recursive. Scratch context calls main-heap
+allocators after `(use-main-heap!)` is in effect.
+
+### 3.4 Promotion order
+
+`world-tags` first (struct/union/enum identity anchors), then
+`world-scope` and `world-str-pool`. The map preserves identity
+across the second pass.
+
+### 3.5 Iter buffer copy
+
+`promote-iter-buffers!` walks `(tok-iter-buf pp-iter)`,
+`(tok-iter-buf lex-iter)`, and pp-state's `up-pending` / `out-buf` /
+`cur-file` / `macros`. Toks in those buffers are scratch-allocated by
+lex; promote each.
+
+For tcc.flat.c (no macros, no #line): typically 0–2 toks in pp-iter
+peek buf, 0–2 in lex-iter peek buf. Bounded constant.
+
+### 3.6 Delete `parse-fn-body`'s arena dance
+
+Lines 5328–5357 collapse to:
+
+```scheme
+(define (parse-fn-body ps name dt)
+ (scope-bind! ps name
+ (%sym name 'fn 'extern dt #f #t))
+ (%parse-fn-body-inner ps name dt))
+```
+
+The hoisted `scope-bind!` / heap-mark / state-eq? gate / fn-meta
+cleanup all go away. The body runs in scratch like everything else.
+Block-statics, string literals, block-scope tags created inside the
+body get promoted at the enclosing parse-decl-or-fn boundary along
+with everything else.
+
+The doc block at 5305–5327 (the A→B→C arena pattern) shrinks to a
+pointer at this doc.
+
+### 3.7 cc-init runs in main
+
+`cg-init`, `make-pstate`, the iter constructors, `%lex-scratch`,
+the cg bufs — all the long-lived top-level state — must be allocated
+in main. The default heap is main, so this is automatic provided
+`use-scratch-heap!` is only called inside `parse-translation-unit`.
+
+---
+
+## Expected impact
+
+For tcc.flat.c (608 KB, 18 896 lines):
+
+| state | predicted high water |
+|--------------------------------|---------------------:|
+| current | heap-exhausted >256 MB |
+| post Phase 1 (refactors only) | unchanged — still exhausted |
+| post Phase 2 (prims, unused) | unchanged |
+| post Phase 3 (boundary live) | scratch ≤ 16 MB peak; main ≤ 30 MB (cg bufs + persistent decls). Total ~45 MB. |
+
+The fixed cost is `cg-init`'s pre-allocated bufs (~12 MiB) plus
+linear-in-decls persistent state in main heap. Per-decl peak in
+scratch is bounded by the most-allocating single decl (deep-nested
+declarators, large struct bodies) — well under 16 MiB on observed
+inputs.
+
+## Migration order
+
+Phase 1 lands as 4–6 small commits, each independently testable.
+None touch heap behavior; existing tests cover them.
+
+Phase 2 lands as one commit to scheme1 + a scheme1-level test. cc
+is unaffected (default heap is main; cc never calls the new prims).
+
+Phase 3 lands as one commit to cc.scm. The cc test suites and
+tcc.flat.c probe (TCC-TODO.md repro) are the validation.
+
+## Open questions
+
+- **Do we want to also wrap `pp-iter-pull` per-token?** Inside
+ scratch, it costs nothing — the env-cons cells go to scratch and
+ reset at the next decl boundary. Probably no, simpler.
+- **Promote-map as alist vs. hash-table.** Per-decl entry count is
+ small (< 100 typical, < 500 worst case), so alist is fine. Hash
+ is a scheme1 capability change anyway.
+- **Do we need an `(in-main-heap thunk)` wrapper** that
+ save/use-main/run/restore for inline copy-out points?
+ Probably yes — makes promote walkers callable from inside other
+ contexts without manual juggling. Decide during Phase 3 build.
+
+## Related
+
+- [TCC-TODO.md](TCC-TODO.md) — the heap-explosion blocker this
+ plan resolves.
+- [CC.md](CC.md) — the C subset the cc accepts.
+- The deleted [CC-STREAM.md] (visible at git rev `67e43e7`) — prior
+ plan that turned the pipeline streaming. This plan addresses what
+ remained after streaming.