commit 97244390fc17b2a65d6b516201addd537d325ec7
parent 0f31a272680990b067b56657b85fb69e396d33df
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sun, 10 May 2026 07:20:00 -0700
cg: CG-driven spill/reload on a finite physical-register pool
Implement doc/REGALLOC.md. Replace the aarch64 backend's monotonic
scratch-cursor + statement-boundary reset with bitmask RegPools that
hand back free regs on demand. CG tracks per-SValue residency
(INHERENT/REG/SPILLED), spills the deepest unpinned RES_REG when the
pool exhausts, and reloads on consumption. cg_call publishes its
in-flight avs[] so the spill driver can re-spill an already-
materialized arg's reg (rewriting storage to OPK_LOCAL) when the
stack victim list runs dry — required for calls with more reg-class
args than the pool can hold.
Removes cg_reset_scratch (which silently corrupted opt's bytes when
opt was in the chain) and the aa_panic("spill_reg/reload_reg") stubs.
free_reg becomes load-bearing; every pop site routes through release().
test/parse: 306 pass → 322 pass (+16, no regressions). New
6_5_33_regalloc_spill row exercises 12-arg call spill paths.
Diffstat:
7 files changed, 1048 insertions(+), 328 deletions(-)
diff --git a/doc/REGALLOC.md b/doc/REGALLOC.md
@@ -0,0 +1,346 @@
+# REGALLOC — CG-driven spill/reload on a finite physical-register pool
+
+The single-pass codegen produces target machine code by streaming
+`CGTarget` calls. Backends expose a finite scratch-register pool
+(aarch64 today: 10 INT, 16 FP). CG must drive that pool correctly under
+arbitrary register pressure: when the pool is empty and another reg is
+needed, spill the deepest live value on the CG value stack to a frame
+slot, free its register, and proceed. When a spilled value is consumed,
+reload it first.
+
+This document defines the contract between CG and the backend and the
+residency state machine on the value stack. Opt is out of scope —
+`opt_cgtarget` panics on `spill_reg`/`reload_reg` today and that
+remains the case until the deferred Phase 3 RA pass lands.
+
+---
+
+## 1. What's broken today
+
+`aa_alloc_reg` panics with "out of INT scratch (no spill yet)" once 10
+integer scratch regs are live. To work around this, CG calls
+`cg_reset_scratch` at every statement boundary
+(`parse/parse.c:1618`), which calls `aa_reset_scratch(g->target)`
+directly (`cg/cg.c:412`). That direct call is a layering violation:
+
+- It hardcodes the aarch64 backend by name.
+- When opt is in the chain, `g->target` is an `OptImpl*`, not an
+ `AAImpl*`. `aa_reset_scratch` casts it as the latter and writes
+ `used_int=0; used_fp=0;` at AAImpl offsets inside opt's bytes —
+ silent memory corruption on every statement boundary.
+- The reset hides the fact that CG never calls `free_reg` on consumed
+ SValues. The pool is recycled by fiat, not by tracking ownership.
+
+Both `cg_reset_scratch` and `aa_reset_scratch` are removed by this
+design. `free_reg` becomes load-bearing.
+
+---
+
+## 2. CGTarget contract changes
+
+### 2.1 `alloc_reg` returns `REG_NONE` on exhaustion
+
+```c
+Reg (*alloc_reg)(CGTarget*, RegClass, const Type*);
+```
+
+Returns a fresh physical reg, or `REG_NONE` if the class's pool is
+empty. Backends never panic on exhaustion — that decision belongs to
+CG. Other failure modes (unknown `RegClass`, internal invariant
+violation) still panic.
+
+### 2.2 `free_reg` is real
+
+```c
+void (*free_reg)(CGTarget*, Reg);
+```
+
+Returns `r` to the backend's free-list. Idempotent calls and
+double-frees are bugs and may panic. CG owes exactly one `free_reg`
+per `alloc_reg` over the lifetime of every value.
+
+### 2.3 `spill_reg` implies `free_reg`
+
+```c
+void (*spill_reg)(CGTarget*, Operand src_reg, FrameSlot, MemAccess);
+```
+
+`src_reg` must be `OPK_REG`. The call:
+
+1. Stores the register's contents to the frame slot per `MemAccess`.
+2. Returns the register to the backend's free-list.
+
+After `spill_reg`, the caller must not use `src_reg` and must not call
+`free_reg` on it. Coupling these two operations matches every CG
+caller's intent and removes a forget-to-free foot-gun.
+
+### 2.4 `reload_reg` is independent
+
+```c
+void (*reload_reg)(CGTarget*, Operand dst_reg, FrameSlot, MemAccess);
+```
+
+`dst_reg` must already have been allocated by the caller via
+`alloc_reg`. Loads the slot's bytes into the register. The slot is not
+released by this call — CG returns it to its own slot free-list.
+Multiple reloads from the same slot are well-defined; CG does not rely
+on this but the contract permits it.
+
+### 2.5 Backend pool implementation
+
+Each `RegClass` holds a `u32` free-mask: bit `i` set means the i-th
+register in that class's contiguous range is free. The aarch64 backend
+keeps two such masks alongside the per-class base register:
+
+```c
+typedef struct RegPool {
+ u32 free; /* bit i set ⇔ regs[base + i] is free */
+ u8 base; /* first physical reg in the class */
+ u8 nregs; /* count; bits [nregs..32) are always 0 */
+} RegPool;
+```
+
+For aarch64:
+
+- INT pool: `base = 19`, `nregs = 10`, initial `free = 0x000003FFu`
+ (x19..x28).
+- FP pool: `base = 8`, `nregs = 16`, initial `free = 0x0000FFFFu`
+ (v8..v23, callee-saves first then caller-saved scratch).
+
+The three pool ops are pure bit operations:
+
+```c
+static Reg pool_alloc(RegPool* p) {
+ if (p->free == 0) return REG_NONE;
+ u32 idx = (u32)__builtin_ctz(p->free);
+ p->free &= ~(1u << idx);
+ return (Reg)(p->base + idx);
+}
+
+static void pool_free(RegPool* p, Reg r) {
+ u32 idx = (u32)r - p->base;
+ /* Double-free is a CG bug — bit must currently be 0. */
+ if (p->free & (1u << idx))
+ compiler_panic(..., "free_reg: %u already free", (unsigned)r);
+ p->free |= (1u << idx);
+}
+```
+
+`spill_reg` emits the store via the backend's existing store path,
+then calls `pool_free` to release the bit.
+
+`__builtin_ctz` picks the lowest free bit, so allocation order is
+deterministic and the same physical regs stay hot — matching today's
+sequential allocation order for diff stability across the test corpus.
+The 32-bit mask is sufficient: every architecture cfree targets has
+fewer than 32 scratch regs per class.
+
+---
+
+## 3. CG residency model
+
+### 3.1 SValue extension
+
+```c
+typedef enum SResidency {
+ RES_INHERENT, /* IMM / LOCAL / GLOBAL — no reg owed */
+ RES_REG, /* op holds a Reg this SValue owns */
+ RES_SPILLED, /* register contents stored to spill_slot;
+ op.kind reflects the original value form
+ (REG or INDIRECT); op.v.reg is REG_NONE */
+} SResidency;
+
+typedef struct SValue {
+ Operand op;
+ const Type* type;
+ SResidency res;
+ FrameSlot spill_slot; /* valid iff res == RES_SPILLED */
+ u8 pinned; /* 1 = ineligible spill victim */
+} SValue;
+```
+
+`OPK_INDIRECT` lvalues hold a base register and are tracked with
+`RES_REG`. On spill the base reg is saved; on reload the SValue is
+restored to `OPK_INDIRECT` with the freshly-reloaded base. The
+deferred-load identity of the lvalue is preserved across spill/reload
+— CG does not eagerly materialize an INDIRECT to a plain rvalue just
+because it became a spill victim.
+
+`pinned` is set for the duration of one CG operation. Pinning prevents
+a freshly-reloaded operand from being chosen as the spill victim while
+CG is still arranging the other operand of a binop or call.
+
+### 3.2 Spill-slot pool
+
+CG maintains two free-lists of `FrameSlot`s, one per `RegClass`:
+
+- `RC_INT` slots are 8 bytes, 8-byte aligned.
+- `RC_FP` slots are 16 bytes, 16-byte aligned (covers `double` and
+ the spilled portion of `long double`).
+
+A spill takes a slot from the free-list (allocating a fresh
+`FrameSlot` from the backend if the list is empty). A reload returns
+the slot to the free-list. Frame footprint is bounded by peak
+concurrent spills per class, which on a stack machine of typical depth
+is small.
+
+Slots are per-function. `cg_func_end` discards both free-lists.
+
+### 3.3 Invariants
+
+After every public `cg_*` call returns:
+
+1. Every `RES_REG` SValue on the stack owns its register; the sum of
+ `RES_REG` regs equals the backend's allocated-reg count.
+2. No SValue is `pinned`.
+3. Every `RES_SPILLED` SValue holds a slot that is *not* on the
+ slot free-list.
+
+`cg_func_end` asserts the stack is empty and both free-lists are well-
+formed (every entry distinct, no aliasing with a live `RES_SPILLED`
+slot).
+
+---
+
+## 4. Spill / reload algorithm
+
+### 4.1 Allocation with fallback
+
+```
+alloc_reg_or_spill(g, cls, ty) -> Reg:
+ r = target->alloc_reg(cls, ty)
+ if r != REG_NONE:
+ return r
+ victim = pick_victim(g, cls)
+ if victim == NULL:
+ panic("regalloc: no spillable victim") // pinned-only is a CG bug
+ slot = take_spill_slot(g, cls)
+ target->spill_reg(victim.op, slot, mem_for_spill(victim, cls))
+ victim.spill_slot = slot
+ victim.res = RES_SPILLED
+ victim.op.v.reg = REG_NONE
+ r = target->alloc_reg(cls, ty)
+ assert(r != REG_NONE)
+ return r
+```
+
+`pick_victim` walks the value stack from index 0 upward and returns
+the first SValue with `res == RES_REG`, `pinned == 0`, and matching
+`RegClass`. This is FIFO from the bottom — equivalent to "deepest
+first" — and matches the intuition that the top of the stack is about
+to be consumed.
+
+`mem_for_spill` constructs a `MemAccess` from the victim's type with
+`alias.kind = ALIAS_LOCAL` rooted at the spill slot.
+
+### 4.2 Reload before consumption
+
+```
+ensure_reg(g, sv):
+ if sv.res != RES_SPILLED:
+ return
+ r = alloc_reg_or_spill(g, class_of(sv.type), sv.type)
+ target->reload_reg(op_reg(r, sv.type), sv.spill_slot,
+ mem_for_spill(sv, class_of(sv.type)))
+ return_spill_slot(g, sv.spill_slot, class_of(sv.type))
+ sv.spill_slot = FRAME_SLOT_NONE
+ if sv.op.kind == OPK_INDIRECT:
+ sv.op.v.ind.base = r
+ else:
+ sv.op = op_reg(r, sv.type)
+ sv.res = RES_REG
+```
+
+`ensure_reg` is called at the start of every operation that consumes
+a register-resident operand. The ensure-then-pin pattern is:
+
+```
+binop(g, op):
+ b = pop(g); ensure_reg(g, &b); b.pinned = 1
+ a = pop(g); ensure_reg(g, &a); a.pinned = 1
+ rd = alloc_reg_or_spill(g, class_of(result), result_ty)
+ target->binop(op, op_reg(rd, result_ty), a.op, b.op)
+ a.pinned = b.pinned = 0
+ target->free_reg(reg_of(a)); target->free_reg(reg_of(b))
+ push(g, svalue_reg(rd, result_ty))
+```
+
+Pinning is symmetric within one CG call. Nothing leaks across calls.
+
+### 4.3 Termination
+
+Each `alloc_reg_or_spill` call either succeeds on first try or evicts
+exactly one unpinned `RES_REG` SValue. The pinned set is bounded
+(within one CG op, at most a handful: two source operands plus a
+destination-in-progress). With at least `pinned + 1` registers in the
+class, the second `alloc_reg` call in `alloc_reg_or_spill` is
+guaranteed to succeed. Every backend's pool comfortably exceeds this
+bound (aarch64: 10 INT, 16 FP; minimum needed: ~3).
+
+`cg_call` is the one exception to "victims live on the value stack."
+While the pop loop materializes args, popped-but-not-yet-emitted regs
+accumulate in the local `CGABIValue avs[]` array — off the value
+stack and so invisible to `pick_victim`. For a call with more reg-
+class args than the pool size can hold, `pick_victim` will eventually
+return NULL while genuine victims remain in `avs[]`.
+
+To handle this, `cg_call` publishes its in-flight `avs` array via
+`CG.avs_in_flight` for the duration of the pop+materialize loop.
+`alloc_reg_or_spill` falls back to `spill_avs_victim` when the stack
+victim list is empty: it picks an `OPK_REG` arg storage entry,
+evicts it through `spill_reg`, and rewrites `avs[i].storage` to
+`OPK_LOCAL` so the backend's call lowering loads from the slot. After
+`T->call`, `cg_call` walks `avs` and returns each `OPK_LOCAL` slot to
+the spill-slot free-list. Without this fallback, a 12-arg INT call on
+aarch64 (10 INT pool) would be unsatisfiable.
+
+If a backend is added with a class smaller than the pinned bound for
+some operation outside `cg_call`, that's a wiring bug; CG asserts
+after the recursive `alloc_reg` call.
+
+### 4.4 Free-on-consume
+
+Every site that pops an SValue and is done with it must release any
+register it owned:
+
+```
+release(g, sv):
+ if sv.res == RES_REG:
+ if sv.op.kind == OPK_REG:
+ target->free_reg(sv.op.v.reg)
+ else if sv.op.kind == OPK_INDIRECT:
+ target->free_reg(sv.op.v.ind.base)
+ else if sv.res == RES_SPILLED:
+ return_spill_slot(g, sv.spill_slot, class_of(sv.type))
+ /* RES_INHERENT: nothing owed */
+```
+
+`cg_drop`, the result-discard path of `cg_store`, the operand pops in
+every `cg_binop`/`cg_unop`/`cg_cmp`/`cg_call`/`cg_load`/`cg_addr`/etc.
+all flow through `release`. This is the audit that bulks up the diff —
+CG has many `pop` sites and they currently leak.
+
+---
+
+## 5. Removed mechanisms
+
+The following are deleted as part of this change:
+
+- `cg_reset_scratch` (`cg/cg.c`) — no replacement; `release` in §4.4
+ makes it unnecessary.
+- The `extern void cg_reset_scratch(CG*)` declaration and the five call
+ sites in `parse/parse.c`.
+- `aa_reset_scratch` (`arch/aarch64.c`) and its forward declaration in
+ `cg/cg.c`.
+- The "(no spill yet)" panics in `aa_alloc_reg` — replaced by
+ `return REG_NONE`.
+- The `aa_panic("spill_reg")`/`aa_panic("reload_reg")` stubs —
+ replaced by real STR/LDR implementations that route through the
+ backend's existing store/load paths.
+
+The `used_int` / `used_fp` fields on `AAImpl` are repurposed: instead
+of monotonically-incrementing allocation cursors (where free was a
+no-op), they now track the highest-index-+1 ever allocated per class —
+the high-water mark needed by the prologue/epilogue to size the
+callee-save area. The actual free-list is the `int_free` / `fp_free`
+bitmasks added alongside.
diff --git a/src/arch/aarch64.c b/src/arch/aarch64.c
@@ -16,10 +16,14 @@
* local slots (cum_off bytes)
* x29, x30 save (16 bytes) -- x29 = sp + frame_size - 16
*
- * Single-pass register allocator: alloc_reg(RC_INT) hands out x19..x28 in
- * order; alloc_reg(RC_FP) hands out v8..v15. Both ranges are callee-saved
- * and only the prefix actually used is saved by the prologue. Width
- * derives from Operand.type via type_is_64. Spill/reload not implemented.
+ * Single-pass register allocator: a free-mask pool per class hands out
+ * the lowest free index. INT pool covers x19..x28 (10 callee-saves);
+ * FP pool covers v8..v23, with v8..v15 callee-saved and v16..v23
+ * caller-saved scratch — lowest-bit-first allocation prefers callee-
+ * saves. Only the prefix actually used (high-water mark) is saved by
+ * the prologue. Width derives from Operand.type via type_is_64. CG
+ * drives spill/reload through alloc_reg returning REG_NONE on
+ * exhaustion plus the spill_reg/reload_reg vtable entries.
*
* Multi-function: each func_begin/func_end pair owns its own frame state
* via the AAImpl fields, so the harness can build several functions in
@@ -257,6 +261,51 @@ static inline u32 aa64_bfm(u32 sf, u32 Rd, u32 Rn, u32 immr, u32 imms) {
}
/* ============================================================
+ * Per-class register pool (free-mask + high-water mark).
+ *
+ * The mask uses bit i for the i-th register in the class's contiguous
+ * range, so allocation is `__builtin_ctz` over the free mask and
+ * deallocation is bit-set. `hwm` records the highest-index-+1 ever
+ * allocated, which the prologue/epilogue uses to size the callee-save
+ * area. 32-bit masks suffice for every aarch64/x86_64/RISC-V class.
+ * ============================================================ */
+
+typedef struct RegPool {
+ u32 free; /* bit i set ⇔ regs[base + i] is free */
+ u32 hwm; /* highest-index-+1 ever allocated */
+ u8 base; /* first physical reg in the class */
+ u8 nregs; /* count; bits [nregs..32) are always 0 */
+ u8 pad[2];
+} RegPool;
+
+static void regpool_init(RegPool* p, u8 base, u8 nregs) {
+ p->base = base;
+ p->nregs = nregs;
+ p->hwm = 0;
+ p->free = (nregs >= 32u) ? 0xFFFFFFFFu : ((1u << nregs) - 1u);
+}
+
+static Reg regpool_alloc(RegPool* p) {
+ if (p->free == 0) return (Reg)REG_NONE;
+ u32 idx = (u32)__builtin_ctz(p->free);
+ p->free &= ~(1u << idx);
+ if (idx + 1u > p->hwm) p->hwm = idx + 1u;
+ return (Reg)(p->base + idx);
+}
+
+/* Returns 1 on successful free, 0 if `r` is outside this pool's range,
+ * -1 on double-free (caller is expected to panic). */
+static int regpool_free(RegPool* p, Reg r) {
+ u32 rn = (u32)r;
+ if (rn < p->base || rn >= (u32)(p->base + p->nregs)) return 0;
+ u32 idx = rn - p->base;
+ u32 bit = 1u << idx;
+ if (p->free & bit) return -1;
+ p->free |= bit;
+ return 1;
+}
+
+/* ============================================================
* AAImpl
* ============================================================ */
@@ -305,9 +354,18 @@ typedef struct AAImpl {
u8 has_sret; /* sret pointer arrived in x8 */
FrameSlot sret_ptr_slot; /* hidden slot holding incoming x8 */
- /* Reg allocator (callee-saved prefix). */
- u32 used_int; /* x19 + i, i in [0, used_int) */
- u32 used_fp; /* v8 + i, i in [0, used_fp ) */
+ /* Reg allocator pools. Bit i set in `free` means the i-th register in
+ * the class's contiguous range (base..base+nregs-1) is available. The
+ * high-water mark `hwm` is the largest index+1 ever allocated for the
+ * class — used by the prologue to decide how many callee-saves to push.
+ *
+ * INT pool: base = 19, nregs = 10 (x19..x28).
+ * FP pool : base = 8, nregs = 16 (v8..v23). The first 8 (v8..v15) are
+ * AAPCS64 callee-saves; v16..v23 are caller-saved scratch handed out
+ * after the callee-saved range fills. Allocation is lowest-bit-first
+ * so callee-saves are still preferred. */
+ RegPool int_pool;
+ RegPool fp_pool;
/* Structured-scope stack. Entries are not popped — IDs returned to
* the caller are stable indices into this array for the lifetime
@@ -343,6 +401,9 @@ static AAImpl* impl_of(CGTarget* t) { return (AAImpl*)t; }
static FrameSlot aa_frame_slot(CGTarget* t, const FrameSlotDesc* d);
static AASlot* slot_get(AAImpl* a, FrameSlot fs);
static u32 force_reg_int(CGTarget* t, Operand op, u32 sf, u32 scratch);
+static void aa_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma);
+static void aa_store(CGTarget* t, Operand addr, Operand src, MemAccess ma);
+static void aa_free_reg(CGTarget* t, Reg r);
/* ---- helpers ---- */
@@ -535,8 +596,8 @@ static void aa_func_begin(CGTarget* t, const CGFuncDesc* fd) {
a->has_sret = (fd->abi && fd->abi->has_sret) ? 1 : 0;
a->cum_off = 0;
a->max_outgoing = 0;
- a->used_int = 0;
- a->used_fp = 0;
+ regpool_init(&a->int_pool, /*base=*/19u, /*nregs=*/10u); /* x19..x28 */
+ regpool_init(&a->fp_pool, /*base=*/8u, /*nregs=*/16u); /* v8..v23 */
a->nslots = 0;
a->nscopes = 0;
a->has_alloca = 0;
@@ -614,8 +675,8 @@ static void aa_func_end(CGTarget* t) {
/* Compute callee-save layout. Only v8..v15 are callee-saved; the
* caller-saved v16..v23 are handed out by alloc_reg too but never
* appear in prologue saves. */
- u32 n_int_pairs = (a->used_int + 1) / 2; /* round up */
- u32 used_fp_cs = a->used_fp > 8 ? 8u : a->used_fp;
+ u32 n_int_pairs = (a->int_pool.hwm + 1) / 2; /* round up */
+ u32 used_fp_cs = a->fp_pool.hwm > 8 ? 8u : a->fp_pool.hwm;
u32 n_fp_pairs = (used_fp_cs + 1) / 2;
u32 outgoing_off = 0;
@@ -750,45 +811,28 @@ static void aa_func_end(CGTarget* t) {
static Reg aa_alloc_reg(CGTarget* t, RegClass cls, const Type* ty) {
AAImpl* a = impl_of(t);
(void)ty;
- if (cls == RC_INT) {
- if (a->used_int >= 10) {
- compiler_panic(t->c, a->loc,
- "aarch64 alloc_reg: out of INT scratch (no spill yet)");
- }
- return (Reg)(19u + a->used_int++);
- }
- if (cls == RC_FP) {
- /* v8..v15 are callee-saved (low 64 bits); v16..v23 are caller-saved
- * scratch. Hand out callee-saved first, then fall back to scratch
- * for short-lived materialization (e.g. j06 builds 9 FP arg regs
- * with no intervening call). */
- if (a->used_fp >= 16) {
- compiler_panic(t->c, a->loc,
- "aarch64 alloc_reg: out of FP scratch (no spill yet)");
- }
- u32 idx = a->used_fp++;
- return (Reg)(idx < 8 ? 8u + idx : 16u + (idx - 8u));
- }
+ /* Lowest-bit-first allocation hands out callee-saves before caller-
+ * saves on the FP side (v8..v15 then v16..v23) — short-lived
+ * materializations (e.g. j06 building 9 FP arg regs with no
+ * intervening call) thus reach into the caller-saved range. */
+ if (cls == RC_INT) return regpool_alloc(&a->int_pool);
+ if (cls == RC_FP) return regpool_alloc(&a->fp_pool);
compiler_panic(t->c, a->loc, "aarch64 alloc_reg: class %d unimpl", (int)cls);
}
static void aa_free_reg(CGTarget* t, Reg r) {
- (void)t;
- (void)r;
-}
-
-/* Reset the scratch-register cursors. The parser calls this between
- * statements (when its value stack is known empty), letting the next
- * statement reuse the entire scratch pool. Safe only when no live
- * register-resident SValue is still expected — the parser asserts
- * that precondition by checking cg's sp before forwarding. Without
- * this, any function whose body contains more than ~10 sequential
- * register-allocating operations exhausts the scratch pool. */
-void aa_reset_scratch(CGTarget* t);
-void aa_reset_scratch(CGTarget* t) {
AAImpl* a = impl_of(t);
- a->used_int = 0;
- a->used_fp = 0;
+ RegPool* pools[2] = {&a->int_pool, &a->fp_pool};
+ for (u32 i = 0; i < 2; ++i) {
+ int rc = regpool_free(pools[i], r);
+ if (rc == 1) return;
+ if (rc == -1) {
+ compiler_panic(t->c, a->loc, "aarch64 free_reg: reg %u already free",
+ (unsigned)r);
+ }
+ }
+ compiler_panic(t->c, a->loc, "aarch64 free_reg: reg %u not a scratch reg",
+ (unsigned)r);
}
static FrameSlot aa_frame_slot(CGTarget* t, const FrameSlotDesc* d) {
@@ -914,17 +958,35 @@ static const Reg* aa_clobbers(CGTarget* t, RegClass c, u32* n) {
(void)n;
aa_panic(t, "clobbers");
}
-static void aa_spill_reg(CGTarget* t, Operand s, FrameSlot f, MemAccess m) {
- (void)s;
- (void)f;
- (void)m;
- aa_panic(t, "spill_reg");
-}
-static void aa_reload_reg(CGTarget* t, Operand d, FrameSlot f, MemAccess m) {
- (void)d;
- (void)f;
- (void)m;
- aa_panic(t, "reload_reg");
+static void aa_spill_reg(CGTarget* t, Operand src, FrameSlot slot,
+ MemAccess ma) {
+ AAImpl* a = impl_of(t);
+ if (src.kind != OPK_REG) {
+ compiler_panic(t->c, a->loc, "aarch64 spill_reg: src is not OPK_REG");
+ }
+ Operand addr;
+ memset(&addr, 0, sizeof addr);
+ addr.kind = OPK_LOCAL;
+ addr.cls = RC_INT;
+ addr.type = ma.type;
+ addr.v.frame_slot = slot;
+ aa_store(t, addr, src, ma);
+ aa_free_reg(t, src.v.reg);
+}
+
+static void aa_reload_reg(CGTarget* t, Operand dst, FrameSlot slot,
+ MemAccess ma) {
+ AAImpl* a = impl_of(t);
+ if (dst.kind != OPK_REG) {
+ compiler_panic(t->c, a->loc, "aarch64 reload_reg: dst is not OPK_REG");
+ }
+ Operand addr;
+ memset(&addr, 0, sizeof addr);
+ addr.kind = OPK_LOCAL;
+ addr.cls = RC_INT;
+ addr.type = ma.type;
+ addr.v.frame_slot = slot;
+ aa_load(t, dst, addr, ma);
}
/* ---- labels / control flow ----
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -11,11 +11,29 @@
* - OPK_LOCAL / OPK_GLOBAL / OPK_INDIRECT are lvalues. cg_load promotes
* them to OPK_REG via target->load + a fresh scratch register.
*
- * This is the spine slice — enough for §6.5/§6.8 fixtures: scalar i32
- * locals, integer arithmetic, comparisons, control flow, and return.
- * Aggregates, atomics, calls, and the asm/setjmp/intrinsic methods are
- * placeholders pending their corpus rows. The interface in cg.h is the
- * commitment; this file fills in the slice that's exercised today. */
+ * Register pressure & spill (see doc/REGALLOC.md):
+ * - Each SValue carries an SResidency tag (INHERENT / REG / SPILLED).
+ * REG-residing SValues own a physical scratch register that must be
+ * released back to the pool when the value is consumed; SPILLED
+ * SValues own a frame slot instead and must be reloaded before use.
+ * - alloc_reg_or_spill is the single allocation entry point. On pool
+ * exhaustion it picks the deepest unpinned RES_REG SValue from the
+ * value stack as the spill victim, evicts its register through
+ * T->spill_reg, and retries. ensure_reg is the dual: it reloads a
+ * SPILLED SValue's register before consumption, possibly evicting
+ * another value to make room.
+ * - Pop sites (binop, cmp, store, branch, call, ...) call release()
+ * to return regs/slots after consumption — there is no statement-
+ * boundary scratch reset; ownership tracking is the discipline.
+ * - cg_call additionally exposes its in-flight CGABIValue array via
+ * CG.avs_in_flight so the spill driver can re-spill an already-
+ * materialized arg's register (rewriting storage to OPK_LOCAL) when
+ * the value stack runs out of victims; this lets calls with more
+ * reg-class args than the pool size can hold lower correctly.
+ *
+ * Aggregates, atomics, asm/setjmp/intrinsic methods are placeholders
+ * pending their corpus rows. The interface in cg.h is the commitment;
+ * this file fills in the slice that's exercised today. */
#include "cg/cg.h"
@@ -35,12 +53,28 @@
* Value stack
* ============================================================ */
+/* Residency: where the value's storage actually lives. INHERENT values
+ * (IMM, LOCAL, GLOBAL) carry no register obligation. REG values own a
+ * physical scratch register. SPILLED values had their register evicted
+ * to a frame slot under register pressure and must be reloaded before
+ * consumption. See doc/REGALLOC.md §3. */
+typedef enum SResidency {
+ RES_INHERENT,
+ RES_REG,
+ RES_SPILLED,
+} SResidency;
+
typedef struct SValue {
Operand op; /* IMM/REG (rvalue) or LOCAL/GLOBAL/INDIRECT (lvalue) */
const Type* type; /* C semantic type of the value (post-promotion) */
+ u8 res; /* SResidency */
+ u8 pinned; /* 1 = ineligible spill victim (cleared per CG op) */
+ u8 pad[2];
+ FrameSlot spill_slot; /* valid iff res == RES_SPILLED */
} SValue;
#define CG_STACK_INITIAL 16u
+#define CG_SPILL_FREE_INITIAL 4u
struct CG {
Compiler* c;
@@ -64,6 +98,29 @@ struct CG {
SValue* stack;
u32 sp;
u32 cap;
+
+ /* Per-function spill-slot free-lists, one per RegClass. A spill takes a
+ * slot from the free-list (allocating fresh from the backend if empty);
+ * a reload returns the slot. Frame footprint is bounded by the peak
+ * concurrent spills per class. Reset at func_end. See doc/REGALLOC.md
+ * §3.2. */
+ struct {
+ FrameSlot* free;
+ u32 n;
+ u32 cap;
+ } slot_pools[3]; /* indexed by RegClass; RC_VEC reserved */
+
+ /* Set during cg_call's pop+materialize loop to point at the call's
+ * in-flight CGABIValue array. When alloc_reg_or_spill exhausts the
+ * stack victim list, it falls back to spilling an OPK_REG arg
+ * storage entry from this array, rewriting it to OPK_LOCAL so the
+ * backend's call lowering loads it from the spill slot. NULL outside
+ * cg_call. Without this fallback, calls with more reg-class args
+ * than the pool can hold (e.g. 10+ INT args on aarch64) would have
+ * unreclaimable regs sitting in avs[] while the value stack runs
+ * out of victims. */
+ CGABIValue* avs_in_flight;
+ u32 avs_in_flight_n;
};
static void stack_grow(CG* g, u32 want) {
@@ -93,11 +150,24 @@ static SValue pop(CG* g) {
return g->stack[--g->sp];
}
-static SValue* top(CG* g) {
- if (g->sp == 0) {
- compiler_panic(g->c, g->cur_loc, "cg: stack empty");
- }
- return &g->stack[g->sp - 1];
+
+/* Residency of an Operand as it should land on the stack at construction
+ * time. REG → owns a register; INDIRECT → owns its base register; the
+ * rest carry no register obligation. */
+static u8 residency_for(const Operand* o) {
+ if (o->kind == OPK_REG) return RES_REG;
+ if (o->kind == OPK_INDIRECT) return RES_REG;
+ return RES_INHERENT;
+}
+
+static SValue make_sv(Operand op, const Type* ty) {
+ SValue sv;
+ memset(&sv, 0, sizeof sv);
+ sv.op = op;
+ sv.type = ty;
+ sv.res = residency_for(&op);
+ sv.spill_slot = FRAME_SLOT_NONE;
+ return sv;
}
/* ============================================================
@@ -187,6 +257,257 @@ static AliasKind alias_for_lvalue(const Operand* o) {
}
}
+/* MemAccess for a load/store through an lvalue Operand. Routes through
+ * derive_mem with the alias root and any local-id taken from the lvalue
+ * itself. The Operand must be OPK_LOCAL/GLOBAL/INDIRECT. */
+static MemAccess mem_for_lvalue(CG* g, const Operand* lv, const Type* ty) {
+ AliasKind ak = alias_for_lvalue(lv);
+ i32 al = (ak == ALIAS_LOCAL) ? (i32)lv->v.frame_slot : 0;
+ return derive_mem(g, ty, ak, al);
+}
+
+/* C type carried by an SValue, with a fallback to the Operand's own type
+ * field for SValues whose `type` was never set (notably the bare-slot
+ * cg_push_local path). The two should agree when both present. */
+static const Type* sv_type(const SValue* sv) {
+ return sv->type ? sv->type : sv->op.type;
+}
+
+/* Build an OPK_INDIRECT Operand for [base + ofs] of type `ty`. The base
+ * register is always RC_INT (it holds a pointer). */
+static Operand op_indirect(Reg base, i32 ofs, const Type* ty) {
+ Operand o;
+ memset(&o, 0, sizeof o);
+ o.kind = OPK_INDIRECT;
+ o.cls = RC_INT;
+ o.type = ty;
+ o.v.ind.base = base;
+ o.v.ind.ofs = ofs;
+ return o;
+}
+
+/* ============================================================
+ * Register-pool & spill driver — see doc/REGALLOC.md
+ * ============================================================ */
+
+/* Class an SValue's register lives in: RC_FP for float types, RC_INT for
+ * everything else (including OPK_INDIRECT base regs which always hold a
+ * pointer-sized integer). */
+static u8 class_of_sv(const SValue* sv) {
+ if (sv->op.kind == OPK_INDIRECT) return RC_INT;
+ return type_class(sv_type(sv));
+}
+
+/* The C type whose width matches the register an SValue currently owns.
+ * For OPK_INDIRECT, that's a pointer to the lvalue's type (the base reg
+ * holds an address); for everything else, the SValue's plain type. Used
+ * by the spill/reload machinery to size the slot store/load. */
+static const Type* sv_owned_reg_type(CG* g, const SValue* sv) {
+ if (sv->op.kind == OPK_INDIRECT) {
+ return type_ptr(g->pool, sv->type ? sv->type : type_void(g->pool));
+ }
+ return sv_type(sv);
+}
+
+static u8 reg_of_sv(const SValue* sv) {
+ if (sv->op.kind == OPK_REG) return (u8)sv->op.v.reg;
+ if (sv->op.kind == OPK_INDIRECT) return (u8)sv->op.v.ind.base;
+ return 0;
+}
+
+/* Write the register an SValue owns. For OPK_INDIRECT the ind.base
+ * field; for OPK_REG the v.reg field. Used by ensure_reg to restore
+ * the reloaded reg, and by alloc_reg_or_spill to mark a freshly-
+ * spilled SValue's reg slot as REG_NONE so any stale read fails fast. */
+static void set_owned_reg(SValue* sv, Reg r) {
+ if (sv->op.kind == OPK_INDIRECT) {
+ sv->op.v.ind.base = r;
+ } else {
+ sv->op.v.reg = r;
+ }
+}
+
+/* MemAccess for a spill/reload. Width is driven by the type of the value
+ * whose register is moving — pointer-sized for OPK_INDIRECT bases,
+ * otherwise the value's own type. The alias root is ALIAS_LOCAL with
+ * id 0: spill traffic is internal CG bookkeeping that opt's alias
+ * analysis never inspects, and the slot itself disambiguates accesses. */
+static MemAccess mem_for_spill(CG* g, const SValue* sv) {
+ return derive_mem(g, sv_owned_reg_type(g, sv), ALIAS_LOCAL, 0);
+}
+
+/* Per-class spill-slot size in bytes. FP slots are 16 bytes to cover
+ * `double` and the spilled portion of `long double`; INT slots are
+ * pointer-width. Both values are also the slot's alignment. */
+static u32 spill_slot_size(u8 cls) { return (cls == RC_FP) ? 16u : 8u; }
+
+/* Take a spill slot from the per-class free-list, or allocate a fresh one. */
+static FrameSlot take_spill_slot(CG* g, u8 cls) {
+ if (g->slot_pools[cls].n > 0) {
+ return g->slot_pools[cls].free[--g->slot_pools[cls].n];
+ }
+ FrameSlotDesc d;
+ memset(&d, 0, sizeof d);
+ d.size = spill_slot_size(cls);
+ d.align = d.size;
+ d.kind = FS_SPILL;
+ return g->target->frame_slot(g->target, &d);
+}
+
+static void return_spill_slot(CG* g, FrameSlot s, u8 cls) {
+ if (s == FRAME_SLOT_NONE) return;
+ if (g->slot_pools[cls].n == g->slot_pools[cls].cap) {
+ Heap* h = g->c->env->heap;
+ u32 cap = g->slot_pools[cls].cap;
+ u32 nc = cap ? cap * 2u : CG_SPILL_FREE_INITIAL;
+ FrameSlot* nb =
+ (FrameSlot*)h->alloc(h, sizeof(FrameSlot) * nc, _Alignof(FrameSlot));
+ if (g->slot_pools[cls].free) {
+ memcpy(nb, g->slot_pools[cls].free,
+ sizeof(FrameSlot) * g->slot_pools[cls].n);
+ h->free(h, g->slot_pools[cls].free, sizeof(FrameSlot) * cap);
+ }
+ g->slot_pools[cls].free = nb;
+ g->slot_pools[cls].cap = nc;
+ }
+ g->slot_pools[cls].free[g->slot_pools[cls].n++] = s;
+}
+
+/* Walk the value stack from index 0 upward and return the first
+ * unpinned RES_REG SValue that matches `cls`. FIFO-from-bottom == the
+ * deepest live value, which matches the intuition that the top of
+ * stack is about to be consumed. Pinned entries are skipped — they're
+ * the operands of an in-flight CG op that mustn't be evicted from
+ * under itself (see cg_dup, which keeps its source on the stack while
+ * allocating the duplicate's destination register). Returns NULL if
+ * no eligible victim exists; that's a CG bug at this call site. */
+static SValue* pick_victim(CG* g, u8 cls) {
+ for (u32 i = 0; i < g->sp; ++i) {
+ SValue* sv = &g->stack[i];
+ if (sv->res != RES_REG) continue;
+ if (sv->pinned) continue;
+ if (class_of_sv(sv) != cls) continue;
+ return sv;
+ }
+ return NULL;
+}
+
+/* Release the resources owned by a single in-flight CGABIValue arg
+ * after the call has returned: REG storage goes back to the reg pool,
+ * LOCAL storage (only produced by spill_avs_victim) returns its slot
+ * to the per-class spill-slot pool. IMM and other kinds carry no
+ * runtime ownership and need nothing. */
+static void release_arg_storage(CG* g, const Operand* st) {
+ if (st->kind == OPK_REG) {
+ g->target->free_reg(g->target, st->v.reg);
+ } else if (st->kind == OPK_LOCAL) {
+ return_spill_slot(g, st->v.frame_slot, st->cls);
+ }
+}
+
+/* Try to spill a CGABIValue arg from the in-flight set whose `storage`
+ * is an OPK_REG of `cls`. Mutates the entry: storage becomes OPK_LOCAL
+ * pointing at a freshly-taken spill slot, and the backend's call
+ * lowering will load from there when materializing the arg into its
+ * ABI register. Returns 1 if a victim was found and spilled, 0 if none
+ * was eligible. Used by alloc_reg_or_spill as a fallback when the
+ * value stack has no eligible victim — see CG.avs_in_flight. */
+static int spill_avs_victim(CG* g, u8 cls) {
+ CGTarget* T = g->target;
+ if (!g->avs_in_flight) return 0;
+ for (u32 i = 0; i < g->avs_in_flight_n; ++i) {
+ CGABIValue* av = &g->avs_in_flight[i];
+ if (av->storage.kind != OPK_REG) continue;
+ if (av->storage.cls != cls) continue;
+ FrameSlot slot = take_spill_slot(g, cls);
+ SValue tmp = make_sv(av->storage, av->type);
+ MemAccess ma = mem_for_spill(g, &tmp);
+ T->spill_reg(T, av->storage, slot, ma);
+ /* Rewrite storage to OPK_LOCAL so the backend reads from the slot.
+ * The cls is preserved on the operand so the cleanup loop knows
+ * which spill-slot pool to return the slot to. */
+ Operand local = op_local(slot, av->type);
+ local.cls = cls;
+ av->storage = local;
+ return 1;
+ }
+ return 0;
+}
+
+/* Allocate a register; on pool exhaustion, spill the deepest live
+ * RES_REG value to a frame slot and try again. If the value stack has
+ * no eligible victim, fall back to spilling an in-flight cg_call arg
+ * (see avs_in_flight) — without that fallback, a call with more args
+ * than the pool size can hold becomes unsatisfiable since the popped-
+ * but-not-yet-emitted arg regs would sit unreclaimable in avs[]. */
+static Reg alloc_reg_or_spill(CG* g, u8 cls, const Type* ty) {
+ CGTarget* T = g->target;
+ Reg r = T->alloc_reg(T, cls, ty);
+ if (r != (Reg)REG_NONE) return r;
+
+ SValue* victim = pick_victim(g, cls);
+ if (victim) {
+ FrameSlot slot = take_spill_slot(g, cls);
+ Operand victim_reg = op_reg((Reg)reg_of_sv(victim), victim->type);
+ T->spill_reg(T, victim_reg, slot, mem_for_spill(g, victim));
+
+ victim->spill_slot = slot;
+ victim->res = RES_SPILLED;
+ /* Mark the reg slot as REG_NONE so any stale read fails fast; the
+ * reload via ensure_reg will write the new reg back into the same
+ * field. */
+ set_owned_reg(victim, (Reg)REG_NONE);
+ } else if (!spill_avs_victim(g, cls)) {
+ compiler_panic(g->c, g->cur_loc,
+ "cg: regalloc — no spillable victim (class %u)",
+ (unsigned)cls);
+ }
+
+ r = T->alloc_reg(T, cls, ty);
+ if (r == (Reg)REG_NONE) {
+ compiler_panic(g->c, g->cur_loc,
+ "cg: regalloc — class %u still empty after spill",
+ (unsigned)cls);
+ }
+ return r;
+}
+
+/* Reload a spilled SValue back into a register (possibly evicting another
+ * live value). After return, `sv->res == RES_REG` and the operand reflects
+ * the reloaded register. INDIRECT lvalues are restored to OPK_INDIRECT
+ * with a fresh base; the deferred-load identity is preserved. */
+static void ensure_reg(CG* g, SValue* sv) {
+ if (sv->res != RES_SPILLED) return;
+ CGTarget* T = g->target;
+ u8 cls = class_of_sv(sv);
+ const Type* ty = sv_owned_reg_type(g, sv);
+ Reg r = alloc_reg_or_spill(g, cls, ty);
+ T->reload_reg(T, op_reg(r, ty), sv->spill_slot, mem_for_spill(g, sv));
+ return_spill_slot(g, sv->spill_slot, cls);
+ sv->spill_slot = FRAME_SLOT_NONE;
+ /* For INDIRECT, the lvalue's deferred-load identity is preserved: only
+ * the base reg changes. For everything else, refresh the whole REG
+ * operand so its `type` matches the SValue's current type tag. */
+ if (sv->op.kind == OPK_INDIRECT) {
+ sv->op.v.ind.base = r;
+ } else {
+ sv->op = op_reg(r, sv_type(sv));
+ }
+ sv->res = RES_REG;
+}
+
+/* Release any register or spill slot owned by an SValue popped off the
+ * stack and not consumed by a downstream operation. */
+static void release(CG* g, SValue* sv) {
+ if (sv->res == RES_REG) {
+ g->target->free_reg(g->target, (Reg)reg_of_sv(sv));
+ } else if (sv->res == RES_SPILLED) {
+ return_spill_slot(g, sv->spill_slot, class_of_sv(sv));
+ sv->spill_slot = FRAME_SLOT_NONE;
+ }
+ sv->res = RES_INHERENT;
+}
+
/* ============================================================
* Construction
* ============================================================ */
@@ -212,6 +533,12 @@ void cg_free(CG* g) {
if (!g) return;
h = g->c->env->heap;
if (g->stack) h->free(h, g->stack, sizeof(SValue) * g->cap);
+ for (u32 c = 0; c < 3; ++c) {
+ if (g->slot_pools[c].free) {
+ h->free(h, g->slot_pools[c].free,
+ sizeof(FrameSlot) * g->slot_pools[c].cap);
+ }
+ }
h->free(h, g, sizeof *g);
}
@@ -227,6 +554,10 @@ void cg_func_begin(CG* g, const CGFuncDesc* fd) {
g->fn_ret_type = fd->fn_type ? fd->fn_type->fn.ret : NULL;
g->fn_abi = fd->abi;
g->sp = 0;
+ /* Per-function spill-slot free-lists reset. The backing arrays are
+ * reused; only the counts go to zero since slot ids belong to the new
+ * function's frame. */
+ for (u32 c = 0; c < 3; ++c) g->slot_pools[c].n = 0;
/* Class-1 DWARF: a new subprogram opens. doc/DWARF.md §3.1 makes this
* the parser's job; we forward through cg as a convenience hook. */
@@ -270,10 +601,7 @@ void cg_bind_decl(CG* g, DeclId id) {
* ============================================================ */
void cg_push_int(CG* g, i64 v, const Type* ty) {
- SValue sv;
- sv.op = op_imm(v, ty);
- sv.type = ty;
- push(g, sv);
+ push(g, make_sv(op_imm(v, ty), ty));
}
void cg_push_const(CG* g, ConstBytes cb) {
@@ -281,15 +609,10 @@ void cg_push_const(CG* g, ConstBytes cb) {
* stack value is plain rvalue REG. The constant pool / immediate-encoding
* choice is the backend's. */
CGTarget* T = g->target;
- Reg r = T->alloc_reg(T, type_class(cb.type), cb.type);
+ Reg r = alloc_reg_or_spill(g, type_class(cb.type), cb.type);
Operand dst = op_reg(r, cb.type);
T->load_const(T, dst, cb);
- {
- SValue sv;
- sv.op = dst;
- sv.type = cb.type;
- push(g, sv);
- }
+ push(g, make_sv(dst, cb.type));
}
void cg_push_float(CG* g, double v, const Type* ty) {
@@ -330,43 +653,28 @@ void cg_push_local(CG* g, FrameSlot s) {
* scope record, not through cg, so the push uses NULL type and the
* subsequent cg_load supplies the right type. The parser actually pushes
* via the type-aware variant; this base entry is here for completeness. */
- SValue sv;
- sv.op = op_local(s, NULL);
- sv.type = NULL;
- push(g, sv);
+ push(g, make_sv(op_local(s, NULL), NULL));
}
/* Type-aware variants used by the parser. Not in the public header; the
* parser calls these directly via a small extension below. */
void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty);
void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty) {
- SValue sv;
- sv.op = op_local(s, ty);
- sv.type = ty;
- push(g, sv);
+ push(g, make_sv(op_local(s, ty), ty));
}
/* Pop a pointer rvalue and push an OPK_INDIRECT lvalue for the pointee.
* The parser uses this to implement unary `*`. The pointer is materialized
* into a register; the resulting lvalue's MemAccess alias root is unknown
* (not LOCAL/GLOBAL), which is the right conservative answer for *ptr. */
-static Operand force_reg(CG* g, SValue v, const Type* ty);
+static Operand force_reg(CG* g, SValue* v, const Type* ty);
void cg_deref(CG* g, const Type* pointee_ty);
void cg_deref(CG* g, const Type* pointee_ty) {
SValue v = pop(g);
- const Type* pty = v.type ? v.type : v.op.type;
- Operand src = force_reg(g, v, pty);
- Operand ind;
- SValue sv;
- memset(&ind, 0, sizeof ind);
- ind.kind = OPK_INDIRECT;
- ind.cls = RC_INT;
- ind.type = pointee_ty;
- ind.v.ind.base = src.v.reg;
- ind.v.ind.ofs = 0;
- sv.op = ind;
- sv.type = pointee_ty;
- push(g, sv);
+ /* The pointer reg becomes the new lvalue's base — ownership transfers
+ * from `v` to the new INDIRECT SValue, so no release on `v`. */
+ Operand src = force_reg(g, &v, sv_type(&v));
+ push(g, make_sv(op_indirect(src.v.reg, 0, pointee_ty), pointee_ty));
}
/* Read the type of the value currently on top of the stack without popping.
@@ -398,25 +706,8 @@ void cg_retag_top(CG* g, const Type* ty) {
g->stack[g->sp - 1].op.type = ty;
}
-/* Recycle the backend's scratch-register pool. Safe only when nothing on
- * the value stack holds a register operand (sp == 0 in particular, but
- * an all-IMM stack is also fine in principle). The parser calls this at
- * statement boundaries so a function body with many sequential reg-
- * allocating operations doesn't exhaust the fixed scratch window. */
-extern void aa_reset_scratch(CGTarget*);
-void cg_reset_scratch(CG* g);
-void cg_reset_scratch(CG* g) {
- if (g->sp != 0) return;
- /* For now we only know about the aarch64 backend; once a second arch
- * lands we promote this to a CGTarget vtable entry. */
- aa_reset_scratch(g->target);
-}
-
void cg_push_global(CG* g, ObjSymId sym, const Type* ty) {
- SValue sv;
- sv.op = op_global(sym, 0, ty);
- sv.type = ty;
- push(g, sv);
+ push(g, make_sv(op_global(sym, 0, ty), ty));
}
/* ============================================================
@@ -424,8 +715,47 @@ void cg_push_global(CG* g, ObjSymId sym, const Type* ty) {
* ============================================================ */
void cg_dup(CG* g) {
- SValue v = *top(g);
- push(g, v);
+ /* Duplicate the top SValue. INHERENT values (IMM/LOCAL/GLOBAL) carry
+ * no register ownership — copying the SValue is enough. REG-owning
+ * values must materialize into a second register so each side of the
+ * dup can be released independently; the contents come over via
+ * target->copy.
+ *
+ * The source stays on the stack while the destination register is
+ * allocated, which means alloc_reg_or_spill could in principle pick
+ * the source as the spill victim if pool pressure is high and there
+ * is no other eligible RES_REG on the stack. Pin the source for the
+ * duration to keep pick_victim away from it. */
+ CGTarget* T = g->target;
+ if (g->sp == 0) compiler_panic(g->c, g->cur_loc, "cg_dup: stack empty");
+ SValue* top_p = &g->stack[g->sp - 1];
+ ensure_reg(g, top_p);
+ SValue v = *top_p; /* snapshot AFTER reload — v.op now reflects fresh reg */
+ if (v.res != RES_REG) {
+ push(g, v);
+ return;
+ }
+ top_p->pinned = 1;
+ const Type* ty = sv_owned_reg_type(g, &v);
+ Reg r = alloc_reg_or_spill(g, class_of_sv(&v), ty);
+ T->copy(T, op_reg(r, ty), op_reg((Reg)reg_of_sv(&v), ty));
+ /* Refresh the stack pointer: alloc_reg_or_spill above may have spilled
+ * a different stack entry (the top was pinned, so it stayed put), but
+ * the SValue array address could have been reallocated by a future
+ * grow. Today no path inside copy/spill grows the stack, but reading
+ * through the original `top_p` after a potential realloc would be UB
+ * regardless. */
+ g->stack[g->sp - 1].pinned = 0;
+ /* The duplicate is `v` with its owned reg replaced by `r`. set_owned_reg
+ * writes into ind.base for INDIRECT lvalues or v.reg for REG rvalues —
+ * either way the duplicate ends up RES_REG with the freshly-copied
+ * value, independent of the source. */
+ SValue dup = v;
+ set_owned_reg(&dup, r);
+ dup.res = RES_REG;
+ dup.pinned = 0;
+ dup.spill_slot = FRAME_SLOT_NONE;
+ push(g, dup);
}
void cg_swap(CG* g) {
@@ -438,7 +768,10 @@ void cg_swap(CG* g) {
g->stack[g->sp - 2] = a;
}
-void cg_drop(CG* g) { (void)pop(g); }
+void cg_drop(CG* g) {
+ SValue v = pop(g);
+ release(g, &v);
+}
/* ============================================================
* load / store / addr
@@ -451,49 +784,34 @@ static int is_lvalue(const Operand* o) {
void cg_load(CG* g) {
SValue v = pop(g);
- CGTarget* T = g->target;
+ ensure_reg(g, &v);
if (!is_lvalue(&v.op)) {
/* Already an rvalue — passing-through is correct (cg_load is idempotent
* on rvalues so the parser can call it eagerly). */
push(g, v);
return;
}
- {
- const Type* ty = v.type ? v.type : v.op.type;
- Reg r = T->alloc_reg(T, type_class(ty), ty);
- Operand dst = op_reg(r, ty);
- MemAccess ma;
- AliasKind ak = alias_for_lvalue(&v.op);
- i32 alias_local = (ak == ALIAS_LOCAL) ? (i32)v.op.v.frame_slot : 0;
- ma = derive_mem(g, ty, ak, alias_local);
- T->load(T, dst, v.op, ma);
- {
- SValue rv;
- rv.op = dst;
- rv.type = ty;
- push(g, rv);
- }
- }
+ /* force_reg's lvalue branch does exactly what cg_load wants: alloc a
+ * fresh value reg, T->load through the lvalue's MemAccess, free the
+ * old INDIRECT base if any, retag v as RES_REG. */
+ const Type* ty = sv_type(&v);
+ Operand dst = force_reg(g, &v, ty);
+ push(g, make_sv(dst, ty));
}
void cg_addr(CG* g) {
SValue v = pop(g);
CGTarget* T = g->target;
+ ensure_reg(g, &v);
if (!is_lvalue(&v.op)) {
compiler_panic(g->c, g->cur_loc, "cg_addr: operand is not an lvalue");
}
- {
- const Type* pty = type_ptr(g->pool, v.type ? v.type : v.op.type);
- Reg r = T->alloc_reg(T, RC_INT, pty);
- Operand dst = op_reg(r, pty);
- T->addr_of(T, dst, v.op);
- {
- SValue rv;
- rv.op = dst;
- rv.type = pty;
- push(g, rv);
- }
- }
+ const Type* pty = type_ptr(g->pool, sv_type(&v));
+ Reg r = alloc_reg_or_spill(g, RC_INT, pty);
+ Operand dst = op_reg(r, pty);
+ T->addr_of(T, dst, v.op);
+ release(g, &v);
+ push(g, make_sv(dst, pty));
}
void cg_store(CG* g) {
@@ -506,34 +824,25 @@ void cg_store(CG* g) {
SValue rv = pop(g);
SValue lv = pop(g);
CGTarget* T = g->target;
+ ensure_reg(g, &rv);
+ ensure_reg(g, &lv);
if (!is_lvalue(&lv.op)) {
compiler_panic(g->c, g->cur_loc, "cg_store: destination is not an lvalue");
}
- {
- const Type* ty = lv.type ? lv.type : lv.op.type;
- AliasKind ak = alias_for_lvalue(&lv.op);
- i32 alias_local = (ak == ALIAS_LOCAL) ? (i32)lv.op.v.frame_slot : 0;
- MemAccess ma = derive_mem(g, ty, ak, alias_local);
- /* IMM is a legal source for store; otherwise force into a reg. */
- Operand src = rv.op;
- if (src.kind != OPK_REG && src.kind != OPK_IMM) {
- Reg r = T->alloc_reg(T, type_class(ty), ty);
- Operand dst = op_reg(r, ty);
- MemAccess mr;
- AliasKind sak = alias_for_lvalue(&src);
- i32 saloc = (sak == ALIAS_LOCAL) ? (i32)src.v.frame_slot : 0;
- mr = derive_mem(g, rv.type ? rv.type : ty, sak, saloc);
- T->load(T, dst, src, mr);
- src = dst;
- }
- T->store(T, lv.op, src, ma);
- {
- SValue out;
- out.op = src;
- out.type = ty;
- push(g, out);
- }
+ const Type* ty = sv_type(&lv);
+ /* IMM is a legal source for store; otherwise force the rvalue into a
+ * register. force_reg handles the lvalue → REG transition cleanly. */
+ Operand src;
+ if (rv.op.kind == OPK_IMM || rv.op.kind == OPK_REG) {
+ src = rv.op;
+ } else {
+ src = force_reg(g, &rv, sv_type(&rv));
}
+ T->store(T, lv.op, src, mem_for_lvalue(g, &lv.op, ty));
+ release(g, &lv);
+ /* Result of assignment expression: leave the stored rvalue on top.
+ * Ownership of any reg in `src` transfers to the new SValue. */
+ push(g, make_sv(src, ty));
}
/* ============================================================
@@ -561,26 +870,32 @@ void cg_bitfield_store(CG* g, BitFieldAccess b) {
* Arithmetic / compare / convert
* ============================================================ */
-/* Force an SValue into a register operand of the given type. */
-static Operand force_reg(CG* g, SValue v, const Type* ty) {
+/* Force an SValue (already popped, by reference) into a register operand
+ * of the given type. Mutates `*v` so that v->op is OPK_REG and v->res is
+ * RES_REG; on lvalue inputs this means the original lvalue's base reg is
+ * freed and replaced by the freshly-loaded value reg. The caller can
+ * then release(g, v) to give the register back when the operation is
+ * done with it. */
+static Operand force_reg(CG* g, SValue* v, const Type* ty) {
CGTarget* T = g->target;
- if (v.op.kind == OPK_REG) return v.op;
- if (v.op.kind == OPK_IMM) {
- Reg r = T->alloc_reg(T, type_class(ty), ty);
- Operand dst = op_reg(r, ty);
- T->load_imm(T, dst, v.op.v.imm);
- return dst;
- }
- if (is_lvalue(&v.op)) {
- Reg r = T->alloc_reg(T, type_class(ty), ty);
- Operand dst = op_reg(r, ty);
- AliasKind ak = alias_for_lvalue(&v.op);
- i32 al = (ak == ALIAS_LOCAL) ? (i32)v.op.v.frame_slot : 0;
- MemAccess ma = derive_mem(g, ty, ak, al);
- T->load(T, dst, v.op, ma);
- return dst;
+ ensure_reg(g, v);
+ if (v->op.kind == OPK_REG) return v->op;
+ Reg r = alloc_reg_or_spill(g, type_class(ty), ty);
+ Operand dst = op_reg(r, ty);
+ if (v->op.kind == OPK_IMM) {
+ T->load_imm(T, dst, v->op.v.imm);
+ } else if (is_lvalue(&v->op)) {
+ T->load(T, dst, v->op, mem_for_lvalue(g, &v->op, ty));
+ /* Old INDIRECT base reg is no longer referenced — release it. */
+ if (v->op.kind == OPK_INDIRECT) {
+ T->free_reg(T, v->op.v.ind.base);
+ }
+ } else {
+ compiler_panic(g->c, g->cur_loc, "cg: cannot force operand to register");
}
- compiler_panic(g->c, g->cur_loc, "cg: cannot force operand to register");
+ v->op = dst;
+ v->res = RES_REG;
+ return dst;
}
void cg_binop(CG* g, BinOp op) {
@@ -590,33 +905,26 @@ void cg_binop(CG* g, BinOp op) {
CGTarget* T = g->target;
/* Result type is `a`'s type at this slice (parser already coerced). */
const Type* ty = a.type ? a.type : b.type;
- Operand ra = force_reg(g, a, ty);
- Operand rb = force_reg(g, b, ty);
- Reg rr = T->alloc_reg(T, type_class(ty), ty);
+ Operand ra = force_reg(g, &a, ty);
+ Operand rb = force_reg(g, &b, ty);
+ Reg rr = alloc_reg_or_spill(g, type_class(ty), ty);
Operand dst = op_reg(rr, ty);
T->binop(T, op, dst, ra, rb);
- {
- SValue sv;
- sv.op = dst;
- sv.type = ty;
- push(g, sv);
- }
+ release(g, &a);
+ release(g, &b);
+ push(g, make_sv(dst, ty));
}
void cg_unop(CG* g, UnOp op) {
SValue a = pop(g);
CGTarget* T = g->target;
const Type* ty = a.type ? a.type : a.op.type;
- Operand ra = force_reg(g, a, ty);
- Reg rr = T->alloc_reg(T, type_class(ty), ty);
+ Operand ra = force_reg(g, &a, ty);
+ Reg rr = alloc_reg_or_spill(g, type_class(ty), ty);
Operand dst = op_reg(rr, ty);
T->unop(T, op, dst, ra);
- {
- SValue sv;
- sv.op = dst;
- sv.type = ty;
- push(g, sv);
- }
+ release(g, &a);
+ push(g, make_sv(dst, ty));
}
void cg_cmp(CG* g, CmpOp op) {
@@ -626,17 +934,14 @@ void cg_cmp(CG* g, CmpOp op) {
CGTarget* T = g->target;
const Type* opty = a.type ? a.type : b.type;
const Type* i32 = type_prim(g->pool, TY_INT);
- Operand ra = force_reg(g, a, opty);
- Operand rb = force_reg(g, b, opty);
- Reg rr = T->alloc_reg(T, RC_INT, i32);
+ Operand ra = force_reg(g, &a, opty);
+ Operand rb = force_reg(g, &b, opty);
+ Reg rr = alloc_reg_or_spill(g, RC_INT, i32);
Operand dst = op_reg(rr, i32);
T->cmp(T, op, dst, ra, rb);
- {
- SValue sv;
- sv.op = dst;
- sv.type = i32;
- push(g, sv);
- }
+ release(g, &a);
+ release(g, &b);
+ push(g, make_sv(dst, i32));
}
void cg_inc_dec(CG* g, BinOp op, int post) {
@@ -644,45 +949,31 @@ void cg_inc_dec(CG* g, BinOp op, int post) {
* because juggling lv + old + new through dup/swap from outside requires
* a 3-element rotate the stack API doesn't expose. */
CGTarget* T = g->target;
- SValue lv;
- const Type* ty;
- AliasKind ak;
- i32 alias_local;
- MemAccess ma;
- Reg r_old;
- Reg r_new;
- Operand o_old;
- Operand o_new;
- Operand o_one;
-
- lv = pop(g);
+ SValue lv = pop(g);
+ ensure_reg(g, &lv);
if (!is_lvalue(&lv.op)) {
compiler_panic(g->c, g->cur_loc,
"cg_inc_dec: target is not an lvalue");
}
- ty = lv.type ? lv.type : lv.op.type;
- ak = alias_for_lvalue(&lv.op);
- alias_local = (ak == ALIAS_LOCAL) ? (i32)lv.op.v.frame_slot : 0;
- ma = derive_mem(g, ty, ak, alias_local);
+ const Type* ty = sv_type(&lv);
+ MemAccess ma = mem_for_lvalue(g, &lv.op, ty);
/* Load current value into r_old, compute r_new = r_old +/- 1, store back. */
- r_old = T->alloc_reg(T, type_class(ty), ty);
- o_old = op_reg(r_old, ty);
+ Reg r_old = alloc_reg_or_spill(g, type_class(ty), ty);
+ Operand o_old = op_reg(r_old, ty);
T->load(T, o_old, lv.op, ma);
- r_new = T->alloc_reg(T, type_class(ty), ty);
- o_new = op_reg(r_new, ty);
- o_one = op_imm(1, ty);
- T->binop(T, op, o_new, o_old, o_one);
+ Reg r_new = alloc_reg_or_spill(g, type_class(ty), ty);
+ Operand o_new = op_reg(r_new, ty);
+ T->binop(T, op, o_new, o_old, op_imm(1, ty));
T->store(T, lv.op, o_new, ma);
- {
- SValue sv;
- sv.op = post ? o_old : o_new;
- sv.type = ty;
- push(g, sv);
- }
+ /* Free whichever register is NOT being returned, plus any base reg the
+ * lvalue owned. */
+ T->free_reg(T, post ? r_new : r_old);
+ release(g, &lv);
+ push(g, make_sv(post ? o_old : o_new, ty));
}
void cg_convert(CG* g, const Type* dst_ty) {
@@ -698,8 +989,8 @@ void cg_convert(CG* g, const Type* dst_ty) {
push(g, v);
return;
}
- src = force_reg(g, v, sty);
- rr = T->alloc_reg(T, type_class(dst_ty), dst_ty);
+ src = force_reg(g, &v, sty);
+ rr = alloc_reg_or_spill(g, type_class(dst_ty), dst_ty);
dst = op_reg(rr, dst_ty);
/* Pick a ConvKind from src/dst kinds. v1 spine only sees integer↔integer
* (sign/zero ext + trunc); float and bitcast follow the same dispatch. */
@@ -733,12 +1024,8 @@ void cg_convert(CG* g, const Type* dst_ty) {
}
}
T->convert(T, ck, dst, src);
- {
- SValue sv;
- sv.op = dst;
- sv.type = dst_ty;
- push(g, sv);
- }
+ release(g, &v);
+ push(g, make_sv(dst, dst_ty));
}
/* ============================================================
@@ -749,60 +1036,51 @@ void cg_call(CG* g, u32 nargs, const Type* fn_type) {
/* stack: [..., callee, arg0..argN-1] → [result] (or nothing if void) */
CGTarget* T = g->target;
const ABIFuncInfo* abi = abi_func_info(g->abi, fn_type);
- CGABIValue* avs = NULL;
- CGABIValue ret_v;
- CGCallDesc desc;
- Operand callee_op;
- SValue callee;
- u32 i;
+ const Type* ret_ty = fn_type->fn.ret;
+ int has_result = ret_ty && ret_ty->kind != TY_VOID;
if (g->sp < (u32)nargs + 1u) {
compiler_panic(g->c, g->cur_loc, "cg_call: stack underflow");
}
+ CGABIValue* avs = NULL;
if (nargs) {
avs = arena_array(g->c->tu, CGABIValue, nargs);
memset(avs, 0, sizeof(CGABIValue) * nargs);
}
- /* Pop args in reverse so we can fill avs[i] in declaration order. */
- for (i = 0; i < nargs; ++i) {
+ /* Expose avs to the regalloc fallback. As we pop and materialize args
+ * one at a time, the popped regs accumulate in avs[] off the value
+ * stack, where pick_victim can't reach them. If pressure exhausts the
+ * pool while reloading a spilled arg later in the loop, spill_avs_victim
+ * picks an already-materialized avs entry, stores it to a frame slot,
+ * and rewrites avs[i].storage to OPK_LOCAL — the backend's call
+ * lowering loads from the slot. */
+ g->avs_in_flight = avs;
+ g->avs_in_flight_n = nargs;
+
+ /* Pop args in reverse so we can fill avs[i] in declaration order.
+ * Lvalues materialize into a register through force_reg (which also
+ * frees an old INDIRECT base); OPK_IMM and OPK_REG pass through so
+ * the call sees the same operand. */
+ for (u32 i = 0; i < nargs; ++i) {
u32 idx = nargs - 1u - i;
SValue arg = pop(g);
+ ensure_reg(g, &arg);
const Type* aty = fn_type->fn.params ? fn_type->fn.params[idx] : arg.type;
- Operand src;
- /* Materialize into an Operand the backend can route through ABI parts.
- * For simple scalars REG/IMM is enough; aggregates would force LOCAL. */
- if (arg.op.kind == OPK_LOCAL || arg.op.kind == OPK_GLOBAL ||
- arg.op.kind == OPK_INDIRECT) {
- /* lvalue: backend may need an address (byval/indirect) or a loaded
- * value. Spine: scalars only — load to register. */
- Reg r = T->alloc_reg(T, type_class(aty), aty);
- Operand dst = op_reg(r, aty);
- AliasKind ak = alias_for_lvalue(&arg.op);
- i32 al = (ak == ALIAS_LOCAL) ? (i32)arg.op.v.frame_slot : 0;
- MemAccess ma = derive_mem(g, aty, ak, al);
- T->load(T, dst, arg.op, ma);
- src = dst;
- } else if (arg.op.kind == OPK_IMM) {
- src = arg.op;
- } else {
- src = arg.op;
- }
avs[idx].type = aty;
avs[idx].abi = &abi->params[idx];
- avs[idx].storage = src;
- avs[idx].parts = NULL;
- avs[idx].nparts = 0;
+ avs[idx].storage = is_lvalue(&arg.op) ? force_reg(g, &arg, aty) : arg.op;
}
- callee = pop(g);
- if (callee.op.kind == OPK_GLOBAL) {
- callee_op = callee.op;
- } else {
- /* Indirect call — force into a register if necessary. */
- callee_op = force_reg(g, callee, fn_type);
- }
+ SValue callee = pop(g);
+ ensure_reg(g, &callee);
+ /* Direct calls keep the OPK_GLOBAL operand; indirect calls force the
+ * function pointer into a register. */
+ Operand callee_op = (callee.op.kind == OPK_GLOBAL)
+ ? callee.op
+ : force_reg(g, &callee, fn_type);
+ CGCallDesc desc;
memset(&desc, 0, sizeof desc);
desc.fn_type = fn_type;
desc.abi = abi;
@@ -810,24 +1088,29 @@ void cg_call(CG* g, u32 nargs, const Type* fn_type) {
desc.args = avs;
desc.nargs = nargs;
desc.flags = CG_CALL_NONE;
- /* Return storage: REG of the right class for scalar returns; struct
- * returns would set parts/storage differently. */
- memset(&ret_v, 0, sizeof ret_v);
- ret_v.type = fn_type->fn.ret;
- ret_v.abi = &abi->ret;
- if (ret_v.type && ret_v.type->kind != TY_VOID) {
- Reg r = T->alloc_reg(T, type_class(ret_v.type), ret_v.type);
- ret_v.storage = op_reg(r, ret_v.type);
+ desc.ret.type = ret_ty;
+ desc.ret.abi = &abi->ret;
+ if (has_result) {
+ Reg r = alloc_reg_or_spill(g, type_class(ret_ty), ret_ty);
+ desc.ret.storage = op_reg(r, ret_ty);
}
- desc.ret = ret_v;
T->call(T, &desc);
- if (ret_v.type && ret_v.type->kind != TY_VOID) {
- SValue sv;
- sv.op = ret_v.storage;
- sv.type = ret_v.type;
- push(g, sv);
+ /* Tear down the in-flight arg set: each entry's storage may be a REG
+ * (return to pool) or OPK_LOCAL (a spill slot, return to per-class
+ * free-list). IMMs carry no runtime ownership. */
+ for (u32 i = 0; i < nargs; ++i) {
+ release_arg_storage(g, &avs[i].storage);
+ }
+ g->avs_in_flight = NULL;
+ g->avs_in_flight_n = 0;
+
+ if (callee.op.kind != OPK_GLOBAL) {
+ T->free_reg(T, callee_op.v.reg);
+ }
+ if (has_result) {
+ push(g, make_sv(desc.ret.storage, ret_ty));
}
}
@@ -848,13 +1131,14 @@ void cg_ret(CG* g, int has_value) {
{
SValue v = pop(g);
const Type* rty = g->fn_ret_type;
- Operand ret_op = force_reg(g, v, rty);
+ Operand ret_op = force_reg(g, &v, rty);
CGABIValue av;
memset(&av, 0, sizeof av);
av.type = rty;
av.abi = &abi->ret;
av.storage = ret_op;
T->ret(T, &av);
+ release(g, &v);
}
}
@@ -869,22 +1153,15 @@ void cg_alloca(CG* g) {
* size up to a 16-byte multiple, which is what keeps SP aligned). */
CGTarget* T = g->target;
SValue sz = pop(g);
- Operand sz_op;
const Type* void_ptr = type_ptr(g->pool, type_void(g->pool));
- Reg dst_r;
- Operand dst;
- SValue out;
- if (sz.op.kind == OPK_IMM) {
- sz_op = sz.op;
- } else {
- sz_op = force_reg(g, sz, sz.type ? sz.type : sz.op.type);
- }
- dst_r = T->alloc_reg(T, RC_INT, void_ptr);
- dst = op_reg(dst_r, void_ptr);
+ ensure_reg(g, &sz);
+ Operand sz_op =
+ (sz.op.kind == OPK_IMM) ? sz.op : force_reg(g, &sz, sv_type(&sz));
+ Reg dst_r = alloc_reg_or_spill(g, RC_INT, void_ptr);
+ Operand dst = op_reg(dst_r, void_ptr);
T->alloca_(T, dst, sz_op, /*align=*/16);
- out.op = dst;
- out.type = void_ptr;
- push(g, out);
+ release(g, &sz);
+ push(g, make_sv(dst, void_ptr));
}
void cg_va_start_(CG* g) {
compiler_panic(g->c, g->cur_loc, "cg_va_start: not in v1 slice");
@@ -945,9 +1222,10 @@ void cg_branch_true(CG* g, CGLabel l) {
SValue v = pop(g);
CGTarget* T = g->target;
const Type* ty = v.type ? v.type : type_prim(g->pool, TY_INT);
- Operand a = force_reg(g, v, ty);
+ Operand a = force_reg(g, &v, ty);
Operand zero = op_imm(0, ty);
T->cmp_branch(T, CMP_NE, a, zero, (Label)l);
+ release(g, &v);
}
void cg_branch_false(CG* g, CGLabel l) {
@@ -962,12 +1240,14 @@ void cg_branch_false(CG* g, CGLabel l) {
if (v.op.v.imm == 0) {
T->jump(T, (Label)l);
}
+ release(g, &v);
return;
}
{
- Operand a = force_reg(g, v, ty);
+ Operand a = force_reg(g, &v, ty);
Operand zero = op_imm(0, ty);
T->cmp_branch(T, CMP_EQ, a, zero, (Label)l);
+ release(g, &v);
}
}
@@ -986,7 +1266,11 @@ CGScope cg_scope_begin(CG* g, CGScopeConfig cfg) {
/* Pop the condition. */
SValue v = pop(g);
const Type* ty = v.type ? v.type : type_prim(g->pool, TY_INT);
- d.cond = force_reg(g, v, ty);
+ d.cond = force_reg(g, &v, ty);
+ /* The cond reg is consumed by the backend's scope_begin emit; once
+ * the comparison/branch is in flight there's no live use, so free
+ * it back to the pool now. */
+ release(g, &v);
}
return (CGScope)g->target->scope_begin(g->target, &d);
}
diff --git a/src/parse/parse.c b/src/parse/parse.c
@@ -53,7 +53,6 @@ extern void cg_retag_top(CG*, const Type*);
/* Recycle the backend's scratch-register pool when no value-stack entry
* holds a live register. Called at statement boundaries to avoid
* exhausting the fixed scratch window over the course of a function. */
-extern void cg_reset_scratch(CG*);
/* ============================================================
* Keywords
@@ -1636,7 +1635,6 @@ static int parse_decl_suffix(Parser* p, DeclSuffix* out) {
cg_swap(p->cg);
cg_store(p->cg);
cg_drop(p->cg);
- cg_reset_scratch(p->cg);
p->vla_pending = 1;
p->vla_pending_count_slot = out->vla_count_slot;
}
@@ -1823,7 +1821,6 @@ static void zero_init_at(Parser* p, FrameSlot slot, const Type* arr_ty,
cg_push_int(p->cg, 0, ty);
cg_store(p->cg);
cg_drop(p->cg);
- cg_reset_scratch(p->cg);
}
/* Parse the initializer for the sub-object at `offset` of type `ty`. Arrays
@@ -1862,7 +1859,6 @@ static void init_at(Parser* p, FrameSlot slot, const Type* arr_ty, u32 offset,
to_rvalue(p);
cg_store(p->cg);
cg_drop(p->cg);
- cg_reset_scratch(p->cg);
if (had_brace) {
accept_punct(p, ','); /* tolerate trailing comma inside `{ x, }` */
expect_punct(p, '}', "'}' after scalar initializer");
@@ -1900,7 +1896,6 @@ static void parse_init_declarator(Parser* p, const DeclSpecs* specs) {
cg_swap(p->cg);
cg_store(p->cg);
cg_drop(p->cg);
- cg_reset_scratch(p->cg);
if (accept_punct(p, '=')) {
perr(p, "VLA initializers are not allowed (§6.7.9 ¶3)");
}
@@ -2092,7 +2087,6 @@ static void parse_stmt(Parser* p) {
/* Each statement starts from an empty value stack; recycle scratch
* registers so a function body with many sequential reg-allocating
* operations isn't bounded by the backend's fixed scratch window. */
- cg_reset_scratch(p->cg);
cg_set_loc(p->cg, tok_loc(&p->cur));
if (is_punct(&p->cur, '{')) {
parse_compound_stmt(p);
diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md
@@ -143,6 +143,7 @@ here for completeness once they're real cases.
| `6_5_30_generic_selection`| · | `int x=42; return _Generic((x), int: x, default: 0);` | 42 |
| `6_5_31_subscript_commute`| ★ | `int a[5]={0,0,42,0,0}; return 2[a];` | 42 |
| `6_5_32_string_subscript` | ★ | `return "*"[0];` | 42 |
+| `6_5_33_regalloc_spill` | ★ | 12-arg `sum12(x1+0, ..., x12+0)` — exceeds the 10-INT scratch pool, exercises `spill_reg`/`reload_reg` and the cg_call avs-in-flight fallback (see doc/REGALLOC.md) | 78 |
## §6.6 Constant expressions
diff --git a/test/parse/cases/6_5_33_regalloc_spill.c b/test/parse/cases/6_5_33_regalloc_spill.c
@@ -0,0 +1,32 @@
+/* Regalloc spill stress.
+ *
+ * Each `xN + 0` arg is a binop that leaves a register-resident rvalue on
+ * the cg value stack until cg_call materializes it. With 12 simultaneously
+ * accumulating arg values and a 10-INT scratch pool on aarch64, several
+ * code paths the spill driver must handle get exercised:
+ *
+ * - alloc_reg_or_spill returning REG_NONE on pool exhaustion and
+ * picking the deepest unpinned RES_REG via pick_victim, then routing
+ * it through spill_reg / take_spill_slot.
+ * - ensure_reg reloading SPILLED args inside cg_call's arg-pop loop,
+ * consuming pool slots as it goes.
+ * - The avs-in-flight fallback: once cg_call has popped enough RES_REG
+ * args into avs[] to drain the pool, alloc_reg_or_spill finds no
+ * stack victim (later args are SPILLED, earlier args are off-stack
+ * in avs[]) and falls back to spilling an OPK_REG avs entry through
+ * spill_avs_victim, rewriting it to OPK_LOCAL so the backend's call
+ * lowering loads from the spill slot.
+ *
+ * 1+2+...+12 = 78. */
+
+int sum12(int a, int b, int c, int d, int e, int f,
+ int g, int h, int i, int j, int k, int l) {
+ return a + b + c + d + e + f + g + h + i + j + k + l;
+}
+
+int test_main(void) {
+ int x1 = 1, x2 = 2, x3 = 3, x4 = 4, x5 = 5, x6 = 6;
+ int x7 = 7, x8 = 8, x9 = 9, x10 = 10, x11 = 11, x12 = 12;
+ return sum12(x1 + 0, x2 + 0, x3 + 0, x4 + 0, x5 + 0, x6 + 0,
+ x7 + 0, x8 + 0, x9 + 0, x10 + 0, x11 + 0, x12 + 0);
+}
diff --git a/test/parse/cases/6_5_33_regalloc_spill.expected b/test/parse/cases/6_5_33_regalloc_spill.expected
@@ -0,0 +1 @@
+78