Parse arena via scratch heap
Plan to bound parse-phase heap residency by routing parser allocations through a separate scratch heap that resets at top-level decl boundaries. Companion to TCC-TODO.md (the heap-blocker that motivates this) and CC.md (the C subset).
Why this, not mark/rewind alone
The streaming pipeline (commit 257fdd5) reduced
intermediate materialization but didn't bound steady-state heap.
Investigation pinned the cost on per-token interpreter overhead:
every cond arm, let*, named-let recursion, and CPS closure
allocates env-cons cells in scheme1's bump arena, with no GC to
reclaim them. Measured at ~4.5 KB allocated per consumed token after
removing the O(decls²) alist-ref walk leak — so the full TU still
needs ~700 MB at minimum, far above the 256 MiB cap.
heap-mark / heap-rewind! can't fix this directly. Per-decl
survivors (newly-bound sym, ctype, name-bv) are constructed during
the parse, so they live above the mark; rewinding them as scratch
defeats their purpose. Pre-allocating survivor cells doesn't work
either — the parser doesn't know the shape of what it's building
until it's done.
A second heap solves it cleanly: parse runs in scratch, and at each decl boundary we deep-copy the surviving roots into the main heap and reset scratch. The copy-out lands in main, so it's safe to reset scratch wholesale.
Plan overview
Three phases, each independently shippable:
- Pre-refactors — semantically null, but make the boundary contract clearer and shrink the deep-copy surface.
- scheme1 two-heap primitives —
current-heap-ptrindirection,(use-scratch-heap!)/(use-main-heap!)/(reset-scratch-heap!). - cc parse-decl-or-fn boundary — scratch-by-default, promote-and-reset at each top-level decl; subsumes parse-fn-body's bespoke arena.
Phase 1 — Pre-refactors
1.1 Drop ps-typedefs
Redundant index. A name is a typedef iff scope-lookup returns a sym
with (eq? (sym-kind sm) 'typedef).
(define (typedef? ps n)
(let ((sm (scope-lookup ps n)))
(and sm (eq? (sym-kind sm) 'typedef))))
parse-decl-spec (cc.scm:4112–4117) currently calls (typedef? ps n)
and then (scope-lookup ps n) — collapses to one walk.
Removes one root from the boundary work and one slot from pstate.
1.2 Drop cg-globals
Only reader is parse-fn-body (5338, 5344) — the canary for "did the
body add a global?", gating the heap-rewind. With Phase 3 that gate
goes away, and the alist becomes write-only.
cg-emit-global / cg-emit-extern keep emitting bytes into
cg-data / cg-bss (fixed-storage); they just stop maintaining the
alist.
If anything later wants "all emitted globals", iterate ps-scope's
file-scope frame filtered to kind ∈ {fn,var} with defined? set.
Removes another root and a cg slot.
1.3 Eliminate non-essential ctype mutation
Three mutation sites today:
complete-agg!(4208–4210) — forward struct/union completion. Cross-decl, structurally necessary for forward-decl identity. Keep. This is the one mutation the promote walker has to handle.parse-enum-spec(4226) — same-decl. Build the members list first, then construct the enum ctype with that ext directly. Remove thectype-ext-set!.%init-fix-array-size!(4882–4886) — same-decl. Build the array ctype with the right size after the initializer count is known instead of fixing it up. Remove bothctype-ext-set!/ctype-size-set!.
After: ctype has exactly one mutation site, and the doc comment on the record (321–327) shrinks to "size/align/ext mutate only on forward struct/union completion."
1.4 Group cross-boundary state into a world record
Today's persistent-across-decl state is split across pstate (scope,
tags, typedefs) and cg (globals, str-pool). After 1.1/1.2 it
collapses to three roots — make them one record:
(define-record-type world
(%world scope tags str-pool)
world?
(scope world-scope world-scope-set!)
(tags world-tags world-tags-set!)
(str-pool world-str-pool world-str-pool-set!))
pstate and cg both reference the same world. The boundary
contract reads as one record definition. If macros come back
later, world-macros joins the same record.
1.5 Stop precomputing mangled names
handle-decl does (bytevector-append "cc__" n) at every var/fn
binding (4727, 4736–4738, 4755) and stores the result in sym-slot.
But the same mangling can happen at emit time — it's deterministic
from (sym-name sm).
Drop the precompute. cg-push-sym / cg-emit-global /
cg-emit-extern mangle on demand. sym-slot becomes a clean union:
fixnum (auto local), #f (fn / non-static var), bv only for the
block-static mangled case (4736–4738).
Saves a bv per fn/var from being copied across the boundary; shrinks
the promote-sym walker.
1.6 Note sym immutability as load-bearing
Already true (no sym-*-set! exists). Add a doc-comment line so it
stays true. After 1.3, the only mutable record in cc is ctype.
Promotion is dramatically easier when records can't be mutated
post-construction.
Phase 2 — scheme1 two-heap primitives
Add a second heap region and an indirection. All Lisp-level
allocation in scheme1 goes through three sites: cons, alloc_hdr,
alloc_bytes. Each currently bumps heap_next. Route them through
current_heap_ptr instead.
2.1 New globals
heap_buf # main heap
heap_next
heap_end
scratch_buf # scratch heap
scratch_next
scratch_end
current_heap_ptr # &heap_next or &scratch_next
current_heap_end # &heap_end or &scratch_end
Default at heap_init: main.
2.2 New primitives
| Form | Effect |
|---|---|
(use-scratch-heap!) |
Set current_heap_ptr = &scratch_next, current_heap_end = &scratch_end. |
(use-main-heap!) |
Set current_heap_ptr = &heap_next, current_heap_end = &heap_end. |
(reset-scratch-heap!) |
scratch_next = scratch_buf (round-aligned). |
(in-main-heap?) |
bool — true iff currently allocating in main. (Optional; useful for assertions.) |
(heap-in-main? obj) |
bool — true iff obj's pointer falls inside [heap_buf, heap_buf+heap_size). Used by promote walkers to short-circuit on already-promoted / interned objects. |
heap-mark / heap-rewind! continue to operate on whichever heap is
current. Mostly used in scratch from now on.
2.3 Sizing
Scratch needs to cover the worst-case decl + per-token churn between boundaries. Estimate from the per-decl probe data: max ~1 MB per decl observed in the test inputs, well below a 16 MiB scratch. Start with 16 MiB scratch, 256 MiB main (current cap).
ELF p_memsz grows by the scratch size — bumped from 512 MB to
512 + scratch MB.
2.4 Fault behavior
If an alloc would overflow scratch: runtime_error ("scratch
exhausted"). Cleaner than silently stomping into main. The "live
data per decl" is bounded by parser state; if scratch runs out it's
a real bug or pathological input, not a sizing oversight.
2.5 Error path discipline
die (sys-exits) is the only abnormal exit. It doesn't need to
restore heap selection — process exits. So mid-parse errors with
the allocator pointed at scratch are fine.
2.6 Test
A scheme1-level test under tests/scheme1/ exercises the prims:
allocate in scratch, switch to main, allocate, verify both ptrs in
their respective ranges, reset scratch, verify scratch ptrs become
dangling. Same shape as 93-heap-mark-rewind.scm.
Phase 3 — cc parse-decl-or-fn boundary
3.1 Driver shape
(define (parse-translation-unit ps)
(cond
((eq? (tok-kind (peek ps)) 'EOF) #t)
(else
(use-scratch-heap!)
(let ((R (snapshot-roots (ps-world ps))))
(parse-decl-or-fn ps)
(use-main-heap!)
(promote-roots! (ps-world ps) R)
(promote-iter-buffers! (ps-iter ps)))
(reset-scratch-heap!)
(parse-translation-unit ps))))
snapshot-roots captures the eq?-heads of the three world alists
(plus any peek-buffer survivors that crossed in from the previous
decl, which are already in main). promote-roots! walks each alist
from its current head down to the snapshotted head, deep-copying
each new entry into main and rebuilding the chain.
3.2 Promote-map for sharing preservation
Single alist mapping scratch-ptr → main-ptr, cleared at each
boundary. Used by all promote-* walkers to preserve eq? identity:
- Forward struct/union ctype completed in this decl: tags promotion
finds the (already-in-main) ctype, walks its now-scratch
extand rewrites it in place viactype-ext-set!— the one allowed ctype mutation, now happening from scratch context to a main-heap record (safe). - Same struct ctype referenced by multiple ptrs in the same decl: map ensures all copies point at one main-heap ctype.
3.3 Promote walkers
promote-bv ; bytevector-copy
promote-loc ; new record, copy file-bv via map
promote-tok ; new record, recurse on value (if bv) + loc + hide
promote-ctype ; map-cached, branches on kind
promote-sym ; new record, copy name-bv, promote type, copy slot if bv
promote-fields ; (name-bv ctype offset) list
promote-fn-params ; (name-bv-or-#f . ctype) list
promote-enum-members ; (name-bv . fixnum) list
All short, all leaf-recursive. Scratch context calls main-heap
allocators after (use-main-heap!) is in effect.
3.4 Promotion order
world-tags first (struct/union/enum identity anchors), then
world-scope and world-str-pool. The map preserves identity
across the second pass.
3.5 Iter buffer copy
promote-iter-buffers! walks (tok-iter-buf pp-iter),
(tok-iter-buf lex-iter), and pp-state's up-pending / out-buf /
cur-file / macros. Toks in those buffers are scratch-allocated by
lex; promote each.
For tcc.flat.c (no macros, no #line): typically 0–2 toks in pp-iter peek buf, 0–2 in lex-iter peek buf. Bounded constant.
3.6 Delete parse-fn-body's arena dance
Lines 5328–5357 collapse to:
(define (parse-fn-body ps name dt)
(scope-bind! ps name
(%sym name 'fn 'extern dt #f #t))
(%parse-fn-body-inner ps name dt))
The hoisted scope-bind! / heap-mark / state-eq? gate / fn-meta
cleanup all go away. The body runs in scratch like everything else.
Block-statics, string literals, block-scope tags created inside the
body get promoted at the enclosing parse-decl-or-fn boundary along
with everything else.
The doc block at 5305–5327 (the A→B→C arena pattern) shrinks to a pointer at this doc.
3.7 cc-init runs in main
cg-init, make-pstate, the iter constructors, %lex-scratch,
the cg bufs — all the long-lived top-level state — must be allocated
in main. The default heap is main, so this is automatic provided
use-scratch-heap! is only called inside parse-translation-unit.
Expected impact
For tcc.flat.c (608 KB, 18 896 lines):
| state | predicted high water |
|---|---|
| current | heap-exhausted >256 MB |
| post Phase 1 (refactors only) | unchanged — still exhausted |
| post Phase 2 (prims, unused) | unchanged |
| post Phase 3 (boundary live) | scratch ≤ 16 MB peak; main ≤ 30 MB (cg bufs + persistent decls). Total ~45 MB. |
The fixed cost is cg-init's pre-allocated bufs (~12 MiB) plus
linear-in-decls persistent state in main heap. Per-decl peak in
scratch is bounded by the most-allocating single decl (deep-nested
declarators, large struct bodies) — well under 16 MiB on observed
inputs.
Migration order
Phase 1 lands as 4–6 small commits, each independently testable. None touch heap behavior; existing tests cover them.
Phase 2 lands as one commit to scheme1 + a scheme1-level test. cc is unaffected (default heap is main; cc never calls the new prims).
Phase 3 lands as one commit to cc.scm. The cc test suites and tcc.flat.c probe (TCC-TODO.md repro) are the validation.
Open questions
- Do we want to also wrap
pp-iter-pullper-token? Inside scratch, it costs nothing — the env-cons cells go to scratch and reset at the next decl boundary. Probably no, simpler. - Promote-map as alist vs. hash-table. Per-decl entry count is small (< 100 typical, < 500 worst case), so alist is fine. Hash is a scheme1 capability change anyway.
- Do we need an
(in-main-heap thunk)wrapper that save/use-main/run/restore for inline copy-out points? Probably yes — makes promote walkers callable from inside other contexts without manual juggling. Decide during Phase 3 build.
Related
- TCC-TODO.md — the heap-explosion blocker this plan resolves.
- CC.md — the C subset the cc accepts.
- The deleted [CC-STREAM.md] (visible at git rev
67e43e7) — prior plan that turned the pipeline streaming. This plan addresses what remained after streaming.