boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs | README

Parse arena via scratch heap

Plan to bound parse-phase heap residency by routing parser allocations through a separate scratch heap that resets at top-level decl boundaries. Companion to TCC-TODO.md (the heap-blocker that motivates this) and CC.md (the C subset).

Why this, not mark/rewind alone

The streaming pipeline (commit 257fdd5) reduced intermediate materialization but didn't bound steady-state heap. Investigation pinned the cost on per-token interpreter overhead: every cond arm, let*, named-let recursion, and CPS closure allocates env-cons cells in scheme1's bump arena, with no GC to reclaim them. Measured at ~4.5 KB allocated per consumed token after removing the O(decls²) alist-ref walk leak — so the full TU still needs ~700 MB at minimum, far above the 256 MiB cap.

heap-mark / heap-rewind! can't fix this directly. Per-decl survivors (newly-bound sym, ctype, name-bv) are constructed during the parse, so they live above the mark; rewinding them as scratch defeats their purpose. Pre-allocating survivor cells doesn't work either — the parser doesn't know the shape of what it's building until it's done.

A second heap solves it cleanly: parse runs in scratch, and at each decl boundary we deep-copy the surviving roots into the main heap and reset scratch. The copy-out lands in main, so it's safe to reset scratch wholesale.

Plan overview

Three phases, each independently shippable:

  1. Pre-refactors — semantically null, but make the boundary contract clearer and shrink the deep-copy surface.
  2. scheme1 two-heap primitivescurrent-heap-ptr indirection, (use-scratch-heap!) / (use-main-heap!) / (reset-scratch-heap!).
  3. cc parse-decl-or-fn boundary — scratch-by-default, promote-and-reset at each top-level decl; subsumes parse-fn-body's bespoke arena.

Phase 1 — Pre-refactors

1.1 Drop ps-typedefs

Redundant index. A name is a typedef iff scope-lookup returns a sym with (eq? (sym-kind sm) 'typedef).

(define (typedef? ps n)
  (let ((sm (scope-lookup ps n)))
    (and sm (eq? (sym-kind sm) 'typedef))))

parse-decl-spec (cc.scm:4112–4117) currently calls (typedef? ps n) and then (scope-lookup ps n) — collapses to one walk.

Removes one root from the boundary work and one slot from pstate.

1.2 Drop cg-globals

Only reader is parse-fn-body (5338, 5344) — the canary for "did the body add a global?", gating the heap-rewind. With Phase 3 that gate goes away, and the alist becomes write-only.

cg-emit-global / cg-emit-extern keep emitting bytes into cg-data / cg-bss (fixed-storage); they just stop maintaining the alist.

If anything later wants "all emitted globals", iterate ps-scope's file-scope frame filtered to kind ∈ {fn,var} with defined? set.

Removes another root and a cg slot.

1.3 Eliminate non-essential ctype mutation

Three mutation sites today:

After: ctype has exactly one mutation site, and the doc comment on the record (321–327) shrinks to "size/align/ext mutate only on forward struct/union completion."

1.4 Group cross-boundary state into a world record

Today's persistent-across-decl state is split across pstate (scope, tags, typedefs) and cg (globals, str-pool). After 1.1/1.2 it collapses to three roots — make them one record:

(define-record-type world
  (%world scope tags str-pool)
  world?
  (scope    world-scope    world-scope-set!)
  (tags     world-tags     world-tags-set!)
  (str-pool world-str-pool world-str-pool-set!))

pstate and cg both reference the same world. The boundary contract reads as one record definition. If macros come back later, world-macros joins the same record.

1.5 Stop precomputing mangled names

handle-decl does (bytevector-append "cc__" n) at every var/fn binding (4727, 4736–4738, 4755) and stores the result in sym-slot. But the same mangling can happen at emit time — it's deterministic from (sym-name sm).

Drop the precompute. cg-push-sym / cg-emit-global / cg-emit-extern mangle on demand. sym-slot becomes a clean union: fixnum (auto local), #f (fn / non-static var), bv only for the block-static mangled case (4736–4738).

Saves a bv per fn/var from being copied across the boundary; shrinks the promote-sym walker.

1.6 Note sym immutability as load-bearing

Already true (no sym-*-set! exists). Add a doc-comment line so it stays true. After 1.3, the only mutable record in cc is ctype. Promotion is dramatically easier when records can't be mutated post-construction.


Phase 2 — scheme1 two-heap primitives

Add a second heap region and an indirection. All Lisp-level allocation in scheme1 goes through three sites: cons, alloc_hdr, alloc_bytes. Each currently bumps heap_next. Route them through current_heap_ptr instead.

2.1 New globals

heap_buf        # main heap
heap_next
heap_end

scratch_buf     # scratch heap
scratch_next
scratch_end

current_heap_ptr  # &heap_next or &scratch_next
current_heap_end  # &heap_end  or &scratch_end

Default at heap_init: main.

2.2 New primitives

Form Effect
(use-scratch-heap!) Set current_heap_ptr = &scratch_next, current_heap_end = &scratch_end.
(use-main-heap!) Set current_heap_ptr = &heap_next, current_heap_end = &heap_end.
(reset-scratch-heap!) scratch_next = scratch_buf (round-aligned).
(in-main-heap?) bool — true iff currently allocating in main. (Optional; useful for assertions.)
(heap-in-main? obj) bool — true iff obj's pointer falls inside [heap_buf, heap_buf+heap_size). Used by promote walkers to short-circuit on already-promoted / interned objects.

heap-mark / heap-rewind! continue to operate on whichever heap is current. Mostly used in scratch from now on.

2.3 Sizing

Scratch needs to cover the worst-case decl + per-token churn between boundaries. Estimate from the per-decl probe data: max ~1 MB per decl observed in the test inputs, well below a 16 MiB scratch. Start with 16 MiB scratch, 256 MiB main (current cap).

ELF p_memsz grows by the scratch size — bumped from 512 MB to 512 + scratch MB.

2.4 Fault behavior

If an alloc would overflow scratch: runtime_error ("scratch exhausted"). Cleaner than silently stomping into main. The "live data per decl" is bounded by parser state; if scratch runs out it's a real bug or pathological input, not a sizing oversight.

2.5 Error path discipline

die (sys-exits) is the only abnormal exit. It doesn't need to restore heap selection — process exits. So mid-parse errors with the allocator pointed at scratch are fine.

2.6 Test

A scheme1-level test under tests/scheme1/ exercises the prims: allocate in scratch, switch to main, allocate, verify both ptrs in their respective ranges, reset scratch, verify scratch ptrs become dangling. Same shape as 93-heap-mark-rewind.scm.


Phase 3 — cc parse-decl-or-fn boundary

3.1 Driver shape

(define (parse-translation-unit ps)
  (cond
    ((eq? (tok-kind (peek ps)) 'EOF) #t)
    (else
     (use-scratch-heap!)
     (let ((R (snapshot-roots (ps-world ps))))
       (parse-decl-or-fn ps)
       (use-main-heap!)
       (promote-roots! (ps-world ps) R)
       (promote-iter-buffers! (ps-iter ps)))
     (reset-scratch-heap!)
     (parse-translation-unit ps))))

snapshot-roots captures the eq?-heads of the three world alists (plus any peek-buffer survivors that crossed in from the previous decl, which are already in main). promote-roots! walks each alist from its current head down to the snapshotted head, deep-copying each new entry into main and rebuilding the chain.

3.2 Promote-map for sharing preservation

Single alist mapping scratch-ptr → main-ptr, cleared at each boundary. Used by all promote-* walkers to preserve eq? identity:

3.3 Promote walkers

promote-bv     ; bytevector-copy
promote-loc    ; new record, copy file-bv via map
promote-tok    ; new record, recurse on value (if bv) + loc + hide
promote-ctype  ; map-cached, branches on kind
promote-sym    ; new record, copy name-bv, promote type, copy slot if bv
promote-fields        ; (name-bv ctype offset) list
promote-fn-params     ; (name-bv-or-#f . ctype) list
promote-enum-members  ; (name-bv . fixnum) list

All short, all leaf-recursive. Scratch context calls main-heap allocators after (use-main-heap!) is in effect.

3.4 Promotion order

world-tags first (struct/union/enum identity anchors), then world-scope and world-str-pool. The map preserves identity across the second pass.

3.5 Iter buffer copy

promote-iter-buffers! walks (tok-iter-buf pp-iter), (tok-iter-buf lex-iter), and pp-state's up-pending / out-buf / cur-file / macros. Toks in those buffers are scratch-allocated by lex; promote each.

For tcc.flat.c (no macros, no #line): typically 0–2 toks in pp-iter peek buf, 0–2 in lex-iter peek buf. Bounded constant.

3.6 Delete parse-fn-body's arena dance

Lines 5328–5357 collapse to:

(define (parse-fn-body ps name dt)
  (scope-bind! ps name
               (%sym name 'fn 'extern dt #f #t))
  (%parse-fn-body-inner ps name dt))

The hoisted scope-bind! / heap-mark / state-eq? gate / fn-meta cleanup all go away. The body runs in scratch like everything else. Block-statics, string literals, block-scope tags created inside the body get promoted at the enclosing parse-decl-or-fn boundary along with everything else.

The doc block at 5305–5327 (the A→B→C arena pattern) shrinks to a pointer at this doc.

3.7 cc-init runs in main

cg-init, make-pstate, the iter constructors, %lex-scratch, the cg bufs — all the long-lived top-level state — must be allocated in main. The default heap is main, so this is automatic provided use-scratch-heap! is only called inside parse-translation-unit.


Expected impact

For tcc.flat.c (608 KB, 18 896 lines):

state predicted high water
current heap-exhausted >256 MB
post Phase 1 (refactors only) unchanged — still exhausted
post Phase 2 (prims, unused) unchanged
post Phase 3 (boundary live) scratch ≤ 16 MB peak; main ≤ 30 MB (cg bufs + persistent decls). Total ~45 MB.

The fixed cost is cg-init's pre-allocated bufs (~12 MiB) plus linear-in-decls persistent state in main heap. Per-decl peak in scratch is bounded by the most-allocating single decl (deep-nested declarators, large struct bodies) — well under 16 MiB on observed inputs.

Migration order

Phase 1 lands as 4–6 small commits, each independently testable. None touch heap behavior; existing tests cover them.

Phase 2 lands as one commit to scheme1 + a scheme1-level test. cc is unaffected (default heap is main; cc never calls the new prims).

Phase 3 lands as one commit to cc.scm. The cc test suites and tcc.flat.c probe (TCC-TODO.md repro) are the validation.

Open questions

Related