boot2

Playing with the boostrap
git clone https://git.ryansepassi.com/git/boot2.git
Log | Files | Refs | README

commit 736cf9c89accb9c4bd818732a717d354ac712820
parent 32f98055288f0ef2591cf9cd95f03eab81f761cc
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue, 28 Apr 2026 11:08:25 -0700

cc: CC-SCRATCH Phase 3 — parse-decl-or-fn boundary

Each top-level decl now parses in scratch heap; surviving roots
(world tags / scope / str-pool, pp/lex iter buffers, pps-macros) are
deep-copied into main and scratch is reset at the boundary. Drops
parse-fn-body's bespoke heap-mark/rewind dance — the same boundary
covers the body. complete-agg! tracks main-heap struct/union ctypes
so their forward-completion ext gets rewritten in main before reset.

Per-byte main-heap residency on tcc.flat.c prefixes drops from the
pre-Phase-3 6.5 KB/B to ~0.9 KB/B (and ~0.4 KB/B after amortizing
the cg-init bufs). Update docs/TCC-TODO.md with the post-Phase-3
probe table and call out the remaining single-decl scratch peak —
the 800-member enum tcc_token block exhausts even 128 MiB scratch
because scope-bind!'s alist-ref walk is O(N²) under the interpreter.

Diffstat:
Mcc/cc.scm | 394++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
Mdocs/TCC-TODO.md | 80++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
2 files changed, 402 insertions(+), 72 deletions(-)

diff --git a/cc/cc.scm b/cc/cc.scm @@ -411,7 +411,7 @@ ;; is shared by pstate and cg so its three slots — scope (var/typedef ;; bindings), tags (struct/union/enum tags), str-pool (interned string ;; literals) — can be reasoned about as one boundary contract. -;; Phase 3's promote walkers will deep-copy from this single root. +;; Phase 3's promote walkers deep-copy from this single root. ;; -------------------------------------------------------------------- (define-record-type world (%world scope tags str-pool) @@ -449,10 +449,10 @@ ;; -------------------------------------------------------------------- ;; fn-buf and prologue-buf are pre-allocated (cg-init) and reused across ;; functions — cg-fn-begin/v calls buf-reset! on them, cg-fn-end drains -;; them into cg-text via buf-drain!. No per-fn allocation, which is what -;; lets parse-fn-body wrap the body in heap-mark/heap-rewind! safely: -;; the destination buf storage lives pre-mark, byte writes are stable -;; across rewind, and the parse/cg scratch dies cleanly. +;; them into cg-text via buf-drain!. No per-fn allocation, which lets +;; the parse-decl-or-fn boundary (Phase 3, scratch heap) discard +;; everything the body allocated wholesale — fixed-storage byte writes +;; survive scratch reset because the buf storage was allocated in main. ;; ;; in-fn? discriminates "currently inside a function body" so ;; %cg-emit-buf can route emits to fn-buf during the body and cg-text @@ -2940,8 +2940,8 @@ ;; (memcpy, no allocation). Header/footer pieces go through buf-push! ;; on cg-text — also memcpy. Net result: zero net heap allocation in ;; cg-fn-end other than the small (%n N) bvs for staging-bytes / - ;; frame-size, which the surrounding parse-fn-body's heap-rewind! - ;; reclaims. + ;; frame-size, which the enclosing parse-decl-or-fn boundary's + ;; reset-scratch-heap! reclaims. (let* ((name (%cg-fn-get cg '%fn-name)) (ret-slot (%cg-fn-get cg '%fn-ret-slot)) (ret-type (%cg-fn-get cg '%fn-ret-type)) @@ -4221,7 +4221,15 @@ (else (align-up last ma))))) (ctype-size-set! ct sz) (ctype-align-set! ct ma) - (ctype-ext-set! ct (list tag #t fs)))) + (ctype-ext-set! ct (list tag #t fs)) + ;; Phase 3: if `ct` is a forward-declared struct/union that lived in + ;; main from a prior decl, its newly-set ext lives in scratch and + ;; would dangle on reset-scratch-heap!. Track it so promote-roots! + ;; can rewrite ext in main before the boundary fires. Scratch-resident + ;; ct (defined and completed in this decl) is promoted normally via + ;; the tag walker. + (set! %promote-pending-completions + (cons ct %promote-pending-completions)))) (define (parse-enum-spec ps) (advance ps) @@ -4709,10 +4717,316 @@ (values (reverse (cons (cons nm ty2) acc)) #f)) (else (die (tok-loc (peek ps)) "param"))))))))))) -(define (parse-translation-unit ps) +;; ==================================================================== +;; Phase 3: parse-decl-or-fn boundary — scratch by default, promote +;; surviving roots into main and reset scratch at each top-level decl. +;; +;; Per-decl mutable state: +;; %promote-map: alist mapping scratch-ptr -> main-ptr, cleared at +;; each boundary. Preserves eq? identity for objects that appear +;; multiple times (one ctype referenced from several syms; one +;; file bv shared by many tok locs). +;; %promote-pending-completions: list of struct/union ctypes that +;; complete-agg! mutated during the current decl. Each main-heap +;; entry needs its now-scratch ext rewritten to main before the +;; scratch reset. +;; ==================================================================== + +(define %promote-map '()) +(define %promote-pending-completions '()) + +(define (%promote-map-ref obj) + (alist-ref/eq obj %promote-map)) + +(define (%promote-map-add! obj cloned) + (set! %promote-map (cons (cons obj cloned) %promote-map)) + cloned) + +(define (%promote-clear!) + (set! %promote-map '()) + (set! %promote-pending-completions '())) + +;; Deep-copy a bytevector into the current heap. Idempotent on +;; already-promoted bvs via the promote-map (same content but a fresh +;; bv each scratch instance, so eq? matters). #f passes through (used +;; for anonymous fields, missing tags, missing param names). +(define (promote-bv bv) (cond - ((eq? (tok-kind (peek ps)) 'EOF) #t) - (else (parse-decl-or-fn ps) (parse-translation-unit ps)))) + ((not bv) bv) + ((heap-in-main? bv) bv) + (else + (let ((c (%promote-map-ref bv))) + (cond + (c c) + (else + (let ((n (bytevector-length bv))) + (%promote-map-add! bv (bytevector-copy bv 0 n))))))))) + +(define (promote-loc l) + (cond + ((not l) l) + ((heap-in-main? l) l) + (else + (let ((c (%promote-map-ref l))) + (cond + (c c) + (else + (%promote-map-add! l + (%loc (promote-bv (loc-file l)) + (loc-line l) (loc-col l))))))))) + +;; A list of bvs. Each element is a bv (no recursion). +(define (promote-bv-list xs) + (cond ((null? xs) '()) + (else (cons (promote-bv (car xs)) + (promote-bv-list (cdr xs)))))) + +(define (promote-tok t) + (cond + ((not t) t) + ((heap-in-main? t) t) + (else + (let ((c (%promote-map-ref t))) + (cond + (c c) + (else + (let* ((v (tok-value t)) + (v* (cond ((bytevector? v) (promote-bv v)) + (else v))) ; symbol / fixnum / #f + (loc* (promote-loc (tok-loc t))) + (hide* (promote-bv-list (tok-hide t)))) + (%promote-map-add! t (%tok (tok-kind t) v* loc* hide*))))))))) + +(define (promote-tok-list xs) + (cond ((null? xs) '()) + (else (cons (promote-tok (car xs)) + (promote-tok-list (cdr xs)))))) + +;; Field list element: (name-bv ctype offset). +(define (promote-fields fs) + (cond ((null? fs) '()) + (else + (let ((f (car fs))) + (cons (list (promote-bv (car f)) + (promote-ctype (cadr f)) + (car (cddr f))) ; offset is fixnum + (promote-fields (cdr fs))))))) + +;; Fn-param list element: (name-bv-or-#f . ctype). +(define (promote-fn-params ps) + (cond ((null? ps) '()) + (else + (let* ((p (car ps)) + (nm (car p))) + (cons (cons (cond ((not nm) #f) (else (promote-bv nm))) + (promote-ctype (cdr p))) + (promote-fn-params (cdr ps))))))) + +;; Enum member element: (name-bv . fixnum). +(define (promote-enum-members ms) + (cond ((null? ms) '()) + (else + (let ((m (car ms))) + (cons (cons (promote-bv (car m)) (cdr m)) + (promote-enum-members (cdr ms))))))) + +;; ctype ext walker, dispatched by kind. +(define (promote-ctype-ext k ext) + (cond + ((or (eq? k 'void) (eq? k 'i8) (eq? k 'u8) + (eq? k 'i16) (eq? k 'u16) (eq? k 'i32) (eq? k 'u32) + (eq? k 'i64) (eq? k 'u64) (eq? k 'bool) + (eq? k 'flt) (eq? k 'dbl) (eq? k 'ldbl)) + ext) ; #f + ((eq? k 'ptr) (promote-ctype ext)) + ((eq? k 'arr) + (cons (promote-ctype (car ext)) (cdr ext))) ; (ctype . n) + ((eq? k 'fn) + (let ((ret (car ext)) (par (cadr ext)) (var (car (cddr ext)))) + (list (promote-ctype ret) + (promote-fn-params par) + var))) ; variadic? is bool + ((or (eq? k 'struct) (eq? k 'union)) + (let ((tag (car ext)) (done? (cadr ext)) (fs (car (cddr ext)))) + (list (cond ((not tag) #f) (else (promote-bv tag))) + done? + (promote-fields fs)))) + ((eq? k 'enum) + (let ((tag (car ext)) (ms (cadr ext))) + (list (cond ((not tag) #f) (else (promote-bv tag))) + (promote-enum-members ms)))) + (else (die #f "promote-ctype-ext: unknown kind" k)))) + +(define (promote-ctype c) + (cond + ((not c) c) + ((heap-in-main? c) c) ; interned prims; any prior-decl ctype + (else + (let ((cached (%promote-map-ref c))) + (cond + (cached cached) + (else + ;; Allocate a stand-in eagerly and register it before recursing, + ;; so a self-referential ctype (struct with a pointer to itself) + ;; resolves the back-edge to the same main copy. + (let ((m (%ctype (ctype-kind c) + (ctype-size c) (ctype-align c) + #f))) + (%promote-map-add! c m) + (ctype-ext-set! m (promote-ctype-ext (ctype-kind c) + (ctype-ext c))) + m))))))) + +(define (promote-sym s) + (cond + ((not s) s) + ((heap-in-main? s) s) + (else + (let ((cached (%promote-map-ref s))) + (cond + (cached cached) + (else + (let* ((nm (sym-name s)) + (sl (sym-slot s)) + (sl* (cond ((bytevector? sl) (promote-bv sl)) + (else sl)))) ; fixnum or #f + (%promote-map-add! s + (%sym (promote-bv nm) + (sym-kind s) ; symbol + (sym-storage s) ; symbol or #f + (promote-ctype (sym-type s)) + sl* + (sym-defined? s)))))))))) + +;; --- Alist promotion: head -> snapshot-tail. +;; Walks new entries (head ... snapshot-tail) and deep-copies each (k . v) +;; cons in the current heap. Tail (snapshot and below) is already in main +;; and shared. +(define (%promote-alist-frame head tail key-fn val-fn) + (cond + ((eq? head tail) tail) + (else + (let* ((entry (car head)) + (k* (key-fn (car entry))) + (v* (val-fn (cdr entry)))) + (cons (cons k* v*) + (%promote-alist-frame (cdr head) tail key-fn val-fn)))))) + +;; A snapshot captures the eq?-heads of the three world alists so +;; promote-roots! knows where the new entries stop. +(define-record-type world-snap + (%world-snap scope-top tags-top str-pool) + world-snap? + (scope-top wsnap-scope-top) + (tags-top wsnap-tags-top) + (str-pool wsnap-str-pool)) + +(define (snapshot-roots w) + (%world-snap (car (world-scope w)) + (car (world-tags w)) + (world-str-pool w))) + +(define (promote-roots! w snap) + ;; Step 1: fix forward-completion ext on pre-existing main-heap ctypes. + ;; Their ext was set to a scratch-allocated (list tag #t fs) by + ;; complete-agg!; rewrite into main now. Scratch-resident ctypes in the + ;; pending list get promoted via the tag walker below — skip here. + (let pc ((xs %promote-pending-completions)) + (cond + ((null? xs) #t) + (else + (let ((c (car xs))) + (cond + ((heap-in-main? c) + (ctype-ext-set! c + (promote-ctype-ext (ctype-kind c) (ctype-ext c)))))) + (pc (cdr xs))))) + ;; Step 2: tags first (struct/union/enum identity anchors). The tags + ;; and scope worlds are stacks of frames; after parse-decl-or-fn only + ;; the file-scope top frame is live (nested frames popped via + ;; scope-leave!). Rebuild that frame; the rest (always '()) is + ;; preserved. + (let* ((tn (world-tags w)) + (nf (%promote-alist-frame (car tn) (wsnap-tags-top snap) + promote-bv promote-ctype))) + (world-tags-set! w (cons nf (cdr tn)))) + ;; Step 3: scope (var/typedef bindings). + (let* ((sn (world-scope w)) + (nf (%promote-alist-frame (car sn) (wsnap-scope-top snap) + promote-bv promote-sym))) + (world-scope-set! w (cons nf (cdr sn)))) + ;; Step 4: string-pool (bv-content . label-bv). Flat alist, no frames. + (world-str-pool-set! w + (%promote-alist-frame (world-str-pool w) (wsnap-str-pool snap) + promote-bv promote-bv))) + +;; Iter-buffer carryover: any tok already pulled from upstream and held +;; in tok-iter-buf / pps-up-pending / pps-out-buf is scratch-allocated by +;; lex; promote each. pps-cur-file is a bv installed by `#line "file"`. +;; pps-macros entries new this decl get promoted; the snapshot tail +;; carries the prior-decl macros (already in main). + +(define-record-type iter-snap + (%iter-snap macros) + iter-snap? + (macros isnap-macros)) + +(define (snapshot-iters ps-iter) + ;; ps-iter is the pp-iter; its state is a pp-state. + (let ((st (tok-iter-state ps-iter))) + (%iter-snap (pps-macros st)))) + +(define (%promote-tok-buf it) + (tok-iter-buf-set! it (promote-tok-list (tok-iter-buf it)))) + +(define (promote-macro m) + (cond + ((not m) m) + ((heap-in-main? m) m) + (else + (let ((cached (%promote-map-ref m))) + (cond + (cached cached) + (else + (%promote-map-add! m + (%macro (macro-kind m) + (promote-bv-list (macro-params m)) + (promote-tok-list (macro-body m)))))))))) + +(define (promote-iter-buffers! pp-it isnap) + (let* ((st (tok-iter-state pp-it)) + (lex-it (pps-lex-iter st))) + (%promote-tok-buf pp-it) + (cond (lex-it (%promote-tok-buf lex-it))) + (pps-up-pending-set! st (promote-tok-list (pps-up-pending st))) + (pps-out-buf-set! st (promote-tok-list (pps-out-buf st))) + (let ((cf (pps-cur-file st))) + (cond (cf (pps-cur-file-set! st (promote-bv cf))))) + (pps-macros-set! st + (%promote-alist-frame (pps-macros st) (isnap-macros isnap) + promote-bv promote-macro)))) + +(define (parse-translation-unit ps) + (let loop () + (use-scratch-heap!) + (cond + ((eq? (tok-kind (peek ps)) 'EOF) #t) + (else + (let ((wsnap (snapshot-roots (ps-world ps))) + (isnap (snapshot-iters (ps-iter ps)))) + (parse-decl-or-fn ps) + (use-main-heap!) + (promote-roots! (ps-world ps) wsnap) + (promote-iter-buffers! (ps-iter ps) isnap) + ;; cg-fn-meta may hold scratch alist conses left over from the + ;; just-finished function; cg-fn-begin would reset it, but a + ;; trailing fn means it'd dangle past reset-scratch-heap!. + (cg-fn-meta-set! (ps-cg ps) '()) + (%promote-clear!) + (reset-scratch-heap!)) + (loop)))) + (use-main-heap!) + #t) (define (parse-decl-or-fn ps) (let-values (((sto b) (parse-decl-spec ps))) @@ -5351,57 +5665,15 @@ (cons fname seen)))))))) -;; A → B → C arena pattern from tests/scheme1/93-heap-mark-rewind.scm: -;; -;; A = parse-decl-or-fn (caller, arena-unaware) -;; B = parse-fn-body (this fn — arena boundary) -;; C = %parse-fn-body-inner (the real per-fn parse + cg work) -;; -;; B's "out" is the cg's fixed-storage bufs (cg-text and friends): they -;; were allocated at cg-init (pre-mark) and only mutate via byte writes, -;; so byte-level work survives heap-rewind!. Everything C allocates — -;; vstack opnds, intermediate bvs, ctype scratch, scope frames, switch -;; case alists — is post-mark and discarded. -;; -;; The fn-name binding into the surrounding scope (used so recursive -;; calls can resolve the name during the body) is done BEFORE the mark -;; so it survives. Inner scope frames are popped via scope-leave! before -;; the rewind, so their cells become unreachable; rewind reclaims them. -;; -;; Rewind-safety guard: the body might add string-pool entries (string -;; literals) or tags. Those entries are post-mark and would dangle on -;; rewind. We snapshot the relevant alists before parsing and skip the -;; rewind if any changed — paying full heap cost only for functions -;; that genuinely mutate global state. -;; (Block-scope typedefs and block-static syms are popped with -;; scope-leave! so their cells become unreachable cleanly; emitted -;; .data/.bss bytes for block-statics live in pre-mark fixed-storage -;; bufs and are unaffected by rewind.) +;; parse-fn-body: bind the fn-sym for recursive lookup, then parse the +;; body. Heap discipline is handled at the parse-decl-or-fn boundary — +;; the body runs in scratch like the rest of the decl, and surviving +;; roots (block-statics, string literals, block-scope tags that escape +;; via the global tables) are promoted en masse there. See the Phase 3 +;; section above parse-translation-unit. (define (parse-fn-body ps name dt) - ;; Hoist the recursive-binding scope-bind! out of the marked region - ;; so the fn-sym cons survives rewind. defined?=#t marks this as a - ;; definition so a prior forward decl gets overwritten and a second - ;; definition with the same name fires sym-merge's redefinition error. - (scope-bind! ps name - (%sym name 'fn 'extern dt #f #t)) - (let* ((cg (ps-cg ps)) - (mark (heap-mark)) - (str-pool-before (cg-str-pool cg)) - (tags-before (ps-tags ps))) - (%parse-fn-body-inner ps name dt) - (cond - ((and (eq? str-pool-before (cg-str-pool cg)) - (eq? tags-before (ps-tags ps))) - ;; cg-fn-meta points at post-mark alist conses (fn metadata, - ;; switch-case lists, indirect-slots). Drop the reference before - ;; rewinding so the cg record holds no dangling pointers — the - ;; next cg-fn-begin/v would reset it anyway, but if this is the - ;; last fn, leaving it set leaves a latent landmine. - (cg-fn-meta-set! cg '()) - (heap-rewind! mark) - (debug-log "fn-rewound" name "heap" (heap-usage))) - (else - (debug-log "fn-kept" name "heap" (heap-usage)))))) + (scope-bind! ps name (%sym name 'fn 'extern dt #f #t)) + (%parse-fn-body-inner ps name dt)) (define (%parse-fn-body-inner ps name dt) (let* ((e (ctype-ext dt)) (ret (car e)) diff --git a/docs/TCC-TODO.md b/docs/TCC-TODO.md @@ -50,13 +50,71 @@ before the full TU is parsed. Probing prefixes against the catm'd cc | 50 000 B head | 49 986 | — | heap exhausted | | full tcc.flat.c |608 547 | — | heap exhausted | -So parse-phase residency is roughly 6.5 KB heap per source byte at the -moment, which puts the full 608 KB TU around 4 GB — far beyond any -reasonable scheme1 cap. Bumping the cap is not the fix; the parser's -per-token allocations need to drop into the per-token mark/rewind -arena that the lex pipeline already uses (commit 62dabed). +Pre-Phase-3 parse-phase residency was roughly 6.5 KB heap per source +byte, which puts the full 608 KB TU around 4 GB — far beyond any +reasonable scheme1 cap. The per-decl scratch arena introduced by +[CC-SCRATCH.md](CC-SCRATCH.md) (Phase 3) addresses this. + +### Post-Phase-3 measurements (scratch = 128 MiB, heap = 256 MiB) + +Re-running the probe with the catm'd cc after Phase 3 lands. Each cut +ends at a clean top-level `};` boundary so the parse completes: + +| line | bytes | heap after parse | Δ from start | KB / source byte | +|-----:|-------:|-----------------:|-------------:|-----------------:| +| 220 | 7 953 | 16 421 864 | 15 191 584 | 1.91 | +| 280 | 9 795 | 16 922 368 | 15 692 088 | 1.60 | +| 683 | 18 260 | 19 764 528 | 18 534 248 | 1.02 | +| 880 | 22 111 | 20 763 656 | 19 533 376 | 0.88 | +| 981 | 24 557 | 22 918 568 | 21 688 288 | 0.88 | +| 986 | 24 630 | 22 931 984 | 21 701 704 | 0.88 | + +Marginal residency (heap delta / new bytes between successive probes): +~0.26 – 0.88 KB per added source byte, depending on whether the new +content is dense typedef structs (high) or wider whitespace / forward +fn decls (low). The per-byte average converges to roughly **0.9 KB / +input byte** as the prefix grows past the per-decl baseline overhead — +a **7 ×** drop from the pre-Phase-3 6.5 KB / input byte. + +Heap delta minus the start-of-process baseline (~1.23 MB scheme1 +runtime + cc-init bufs at ~12 MB): -Repro: +``` +parse_heap - start_heap += persistent main-heap state introduced by parse += surviving roots (scope, tags, str-pool) + cg-text/data/bss bytes +``` + +The cg-init bufs themselves account for ~12 MB of the start baseline +(see %BUF-CAP-* in cc.scm). After amortizing them out, persistent +parse state is closer to **0.4 KB / input byte** — limit the average +adds across decls. + +### Remaining blocker — single-decl scratch peak + +Phase 3 bounds *steady-state* heap, but a single decl that allocates +> SCRATCH_CAP_BYTES of per-token churn still aborts with `scratch +exhausted`. tcc.flat.c contains exactly one such decl: the +`enum tcc_token` block (lines 987–1612, ~16 KB of source) defines +800+ enum constants in one go. scope-bind!'s `alist-ref` walk is O(N) +per binding, and each interpreted recursion step extends the env, so +the cumulative per-decl scratch is O(N²) in member count. 16 MiB, +64 MiB, and 128 MiB scratch all overflow before the closing `}`; +the actual peak for this single decl exceeds 128 MiB. + +Fix paths (any one suffices): + +- Make `scope-bind!` lookup sub-linear — bucketed alist or a real + hashtable in scheme1. Cheapest big-win for this enum and for the + general parser, which also pays alist-ref cost per identifier. +- Bump SCRATCH_CAP_BYTES enough to absorb the worst single decl. + Needs measurement to pick a final size; also needs the ELF + `p_memsz` to grow correspondingly (currently 512 MiB, with + 256 MiB heap + 128 MiB scratch + 1 MiB readbuf already inside it). +- Bypass scope-bind!'s walk for enum-const insertion specifically + (build the members list first, batch-bind at the end). + +Repro for the per-decl exhaustion: ``` podman run --rm --pull=never --platform linux/arm64 \ @@ -64,13 +122,13 @@ podman run --rm --pull=never --platform linux/arm64 \ -v "$(pwd)":/work -w /work boot2-busybox:aarch64 \ build/aarch64/scheme1 build/aarch64/cc/cc.scm --cc-debug \ build/cc-bootstrap/X86_64/tcc.flat.c /tmp/tcc.flat.P1pp +# -> [cc] phase=slurp ... +# scheme1: scratch exhausted ``` -Phase log shows heap climbs only during `parse`; `slurp` is cheap and -`cg-finish` does not move the needle. Investigation should focus on -which parser products escape the per-token arena — top suspects are -ctypes built up during decl parsing, expression vstack residue, and -the alist-based scope/tag/typedef tables. +Phase log shows the abort fires inside the big enum: `slurp` and the +pre-enum decls all complete normally; the hit lands once parse +descends into `parse-enum-spec`'s member loop. ## Suspected next-tier blockers (not yet observed)