cg: CG-driven spill/reload on a finite physical-register pool - kit

commit 97244390fc17b2a65d6b516201addd537d325ec7
parent 0f31a272680990b067b56657b85fb69e396d33df
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 07:20:00 -0700

cg: CG-driven spill/reload on a finite physical-register pool

Implement doc/REGALLOC.md. Replace the aarch64 backend's monotonic
scratch-cursor + statement-boundary reset with bitmask RegPools that
hand back free regs on demand. CG tracks per-SValue residency
(INHERENT/REG/SPILLED), spills the deepest unpinned RES_REG when the
pool exhausts, and reloads on consumption. cg_call publishes its
in-flight avs[] so the spill driver can re-spill an already-
materialized arg's reg (rewriting storage to OPK_LOCAL) when the
stack victim list runs dry — required for calls with more reg-class
args than the pool can hold.

Removes cg_reset_scratch (which silently corrupted opt's bytes when
opt was in the chain) and the aa_panic("spill_reg/reload_reg") stubs.
free_reg becomes load-bearing; every pop site routes through release().

test/parse: 306 pass → 322 pass (+16, no regressions). New
6_5_33_regalloc_spill row exercises 12-arg call spill paths.

Diffstat:
A doc/REGALLOC.md  | 346 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/arch/aarch64.c  | 174 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------
M src/cg/cg.c  | 816 +++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------
M src/parse/parse.c  | 6 ------
M test/parse/CORPUS.md  | 1 +
A test/parse/cases/6_5_33_regalloc_spill.c  | 32 ++++++++++++++++++++++++++++++++
A test/parse/cases/6_5_33_regalloc_spill.expected  | 1 +

7 files changed, 1048 insertions(+), 328 deletions(-)
diff --git a/doc/REGALLOC.md b/doc/REGALLOC.md
@@ -0,0 +1,346 @@
+# REGALLOC — CG-driven spill/reload on a finite physical-register pool
+
+The single-pass codegen produces target machine code by streaming
+`CGTarget` calls. Backends expose a finite scratch-register pool
+(aarch64 today: 10 INT, 16 FP). CG must drive that pool correctly under
+arbitrary register pressure: when the pool is empty and another reg is
+needed, spill the deepest live value on the CG value stack to a frame
+slot, free its register, and proceed. When a spilled value is consumed,
+reload it first.
+
+This document defines the contract between CG and the backend and the
+residency state machine on the value stack. Opt is out of scope —
+`opt_cgtarget` panics on `spill_reg`/`reload_reg` today and that
+remains the case until the deferred Phase 3 RA pass lands.
+
+---
+
+## 1. What's broken today
+
+`aa_alloc_reg` panics with "out of INT scratch (no spill yet)" once 10
+integer scratch regs are live. To work around this, CG calls
+`cg_reset_scratch` at every statement boundary
+(`parse/parse.c:1618`), which calls `aa_reset_scratch(g->target)`
+directly (`cg/cg.c:412`). That direct call is a layering violation:
+
+- It hardcodes the aarch64 backend by name.
+- When opt is in the chain, `g->target` is an `OptImpl*`, not an
+  `AAImpl*`. `aa_reset_scratch` casts it as the latter and writes
+  `used_int=0; used_fp=0;` at AAImpl offsets inside opt's bytes —
+  silent memory corruption on every statement boundary.
+- The reset hides the fact that CG never calls `free_reg` on consumed
+  SValues. The pool is recycled by fiat, not by tracking ownership.
+
+Both `cg_reset_scratch` and `aa_reset_scratch` are removed by this
+design. `free_reg` becomes load-bearing.
+
+---
+
+## 2. CGTarget contract changes
+
+### 2.1 `alloc_reg` returns `REG_NONE` on exhaustion
+
+```c
+Reg (*alloc_reg)(CGTarget*, RegClass, const Type*);
+```
+
+Returns a fresh physical reg, or `REG_NONE` if the class's pool is
+empty. Backends never panic on exhaustion — that decision belongs to
+CG. Other failure modes (unknown `RegClass`, internal invariant
+violation) still panic.
+
+### 2.2 `free_reg` is real
+
+```c
+void (*free_reg)(CGTarget*, Reg);
+```
+
+Returns `r` to the backend's free-list. Idempotent calls and
+double-frees are bugs and may panic. CG owes exactly one `free_reg`
+per `alloc_reg` over the lifetime of every value.
+
+### 2.3 `spill_reg` implies `free_reg`
+
+```c
+void (*spill_reg)(CGTarget*, Operand src_reg, FrameSlot, MemAccess);
+```
+
+`src_reg` must be `OPK_REG`. The call:
+
+1. Stores the register's contents to the frame slot per `MemAccess`.
+2. Returns the register to the backend's free-list.
+
+After `spill_reg`, the caller must not use `src_reg` and must not call
+`free_reg` on it. Coupling these two operations matches every CG
+caller's intent and removes a forget-to-free foot-gun.
+
+### 2.4 `reload_reg` is independent
+
+```c
+void (*reload_reg)(CGTarget*, Operand dst_reg, FrameSlot, MemAccess);
+```
+
+`dst_reg` must already have been allocated by the caller via
+`alloc_reg`. Loads the slot's bytes into the register. The slot is not
+released by this call — CG returns it to its own slot free-list.
+Multiple reloads from the same slot are well-defined; CG does not rely
+on this but the contract permits it.
+
+### 2.5 Backend pool implementation
+
+Each `RegClass` holds a `u32` free-mask: bit `i` set means the i-th
+register in that class's contiguous range is free. The aarch64 backend
+keeps two such masks alongside the per-class base register:
+
+```c
+typedef struct RegPool {
+  u32 free;   /* bit i set ⇔ regs[base + i] is free */
+  u8 base;    /* first physical reg in the class */
+  u8 nregs;   /* count; bits [nregs..32) are always 0 */
+} RegPool;
+```
+
+For aarch64:
+
+- INT pool: `base = 19`, `nregs = 10`, initial `free = 0x000003FFu`
+  (x19..x28).
+- FP pool: `base = 8`, `nregs = 16`, initial `free = 0x0000FFFFu`
+  (v8..v23, callee-saves first then caller-saved scratch).
+
+The three pool ops are pure bit operations:
+
+```c
+static Reg pool_alloc(RegPool* p) {
+  if (p->free == 0) return REG_NONE;
+  u32 idx = (u32)__builtin_ctz(p->free);
+  p->free &= ~(1u << idx);
+  return (Reg)(p->base + idx);
+}
+
+static void pool_free(RegPool* p, Reg r) {
+  u32 idx = (u32)r - p->base;
+  /* Double-free is a CG bug — bit must currently be 0. */
+  if (p->free & (1u << idx))
+    compiler_panic(..., "free_reg: %u already free", (unsigned)r);
+  p->free |= (1u << idx);
+}
+```
+
+`spill_reg` emits the store via the backend's existing store path,
+then calls `pool_free` to release the bit.
+
+`__builtin_ctz` picks the lowest free bit, so allocation order is
+deterministic and the same physical regs stay hot — matching today's
+sequential allocation order for diff stability across the test corpus.
+The 32-bit mask is sufficient: every architecture cfree targets has
+fewer than 32 scratch regs per class.
+
+---
+
+## 3. CG residency model
+
+### 3.1 SValue extension
+
+```c
+typedef enum SResidency {
+  RES_INHERENT,  /* IMM / LOCAL / GLOBAL — no reg owed */
+  RES_REG,       /* op holds a Reg this SValue owns */
+  RES_SPILLED,   /* register contents stored to spill_slot;
+                    op.kind reflects the original value form
+                    (REG or INDIRECT); op.v.reg is REG_NONE */
+} SResidency;
+
+typedef struct SValue {
+  Operand op;
+  const Type* type;
+  SResidency res;
+  FrameSlot spill_slot;  /* valid iff res == RES_SPILLED */
+  u8 pinned;             /* 1 = ineligible spill victim */
+} SValue;
+```
+
+`OPK_INDIRECT` lvalues hold a base register and are tracked with
+`RES_REG`. On spill the base reg is saved; on reload the SValue is
+restored to `OPK_INDIRECT` with the freshly-reloaded base. The
+deferred-load identity of the lvalue is preserved across spill/reload
+— CG does not eagerly materialize an INDIRECT to a plain rvalue just
+because it became a spill victim.
+
+`pinned` is set for the duration of one CG operation. Pinning prevents
+a freshly-reloaded operand from being chosen as the spill victim while
+CG is still arranging the other operand of a binop or call.
+
+### 3.2 Spill-slot pool
+
+CG maintains two free-lists of `FrameSlot`s, one per `RegClass`:
+
+- `RC_INT` slots are 8 bytes, 8-byte aligned.
+- `RC_FP` slots are 16 bytes, 16-byte aligned (covers `double` and
+  the spilled portion of `long double`).
+
+A spill takes a slot from the free-list (allocating a fresh
+`FrameSlot` from the backend if the list is empty). A reload returns
+the slot to the free-list. Frame footprint is bounded by peak
+concurrent spills per class, which on a stack machine of typical depth
+is small.
+
+Slots are per-function. `cg_func_end` discards both free-lists.
+
+### 3.3 Invariants
+
+After every public `cg_*` call returns:
+
+1. Every `RES_REG` SValue on the stack owns its register; the sum of
+   `RES_REG` regs equals the backend's allocated-reg count.
+2. No SValue is `pinned`.
+3. Every `RES_SPILLED` SValue holds a slot that is *not* on the
+   slot free-list.
+
+`cg_func_end` asserts the stack is empty and both free-lists are well-
+formed (every entry distinct, no aliasing with a live `RES_SPILLED`
+slot).
+
+---
+
+## 4. Spill / reload algorithm
+
+### 4.1 Allocation with fallback
+
+```
+alloc_reg_or_spill(g, cls, ty) -> Reg:
+    r = target->alloc_reg(cls, ty)
+    if r != REG_NONE:
+        return r
+    victim = pick_victim(g, cls)
+    if victim == NULL:
+        panic("regalloc: no spillable victim")  // pinned-only is a CG bug
+    slot = take_spill_slot(g, cls)
+    target->spill_reg(victim.op, slot, mem_for_spill(victim, cls))
+    victim.spill_slot = slot
+    victim.res = RES_SPILLED
+    victim.op.v.reg = REG_NONE
+    r = target->alloc_reg(cls, ty)
+    assert(r != REG_NONE)
+    return r
+```
+
+`pick_victim` walks the value stack from index 0 upward and returns
+the first SValue with `res == RES_REG`, `pinned == 0`, and matching
+`RegClass`. This is FIFO from the bottom — equivalent to "deepest
+first" — and matches the intuition that the top of the stack is about
+to be consumed.
+
+`mem_for_spill` constructs a `MemAccess` from the victim's type with
+`alias.kind = ALIAS_LOCAL` rooted at the spill slot.
+
+### 4.2 Reload before consumption
+
+```
+ensure_reg(g, sv):
+    if sv.res != RES_SPILLED:
+        return
+    r = alloc_reg_or_spill(g, class_of(sv.type), sv.type)
+    target->reload_reg(op_reg(r, sv.type), sv.spill_slot,
+                       mem_for_spill(sv, class_of(sv.type)))
+    return_spill_slot(g, sv.spill_slot, class_of(sv.type))
+    sv.spill_slot = FRAME_SLOT_NONE
+    if sv.op.kind == OPK_INDIRECT:
+        sv.op.v.ind.base = r
+    else:
+        sv.op = op_reg(r, sv.type)
+    sv.res = RES_REG
+```
+
+`ensure_reg` is called at the start of every operation that consumes
+a register-resident operand. The ensure-then-pin pattern is:
+
+```
+binop(g, op):
+    b = pop(g);          ensure_reg(g, &b);  b.pinned = 1
+    a = pop(g);          ensure_reg(g, &a);  a.pinned = 1
+    rd = alloc_reg_or_spill(g, class_of(result), result_ty)
+    target->binop(op, op_reg(rd, result_ty), a.op, b.op)
+    a.pinned = b.pinned = 0
+    target->free_reg(reg_of(a)); target->free_reg(reg_of(b))
+    push(g, svalue_reg(rd, result_ty))
+```
+
+Pinning is symmetric within one CG call. Nothing leaks across calls.
+
+### 4.3 Termination
+
+Each `alloc_reg_or_spill` call either succeeds on first try or evicts
+exactly one unpinned `RES_REG` SValue. The pinned set is bounded
+(within one CG op, at most a handful: two source operands plus a
+destination-in-progress). With at least `pinned + 1` registers in the
+class, the second `alloc_reg` call in `alloc_reg_or_spill` is
+guaranteed to succeed. Every backend's pool comfortably exceeds this
+bound (aarch64: 10 INT, 16 FP; minimum needed: ~3).
+
+`cg_call` is the one exception to "victims live on the value stack."
+While the pop loop materializes args, popped-but-not-yet-emitted regs
+accumulate in the local `CGABIValue avs[]` array — off the value
+stack and so invisible to `pick_victim`. For a call with more reg-
+class args than the pool size can hold, `pick_victim` will eventually
+return NULL while genuine victims remain in `avs[]`.
+
+To handle this, `cg_call` publishes its in-flight `avs` array via
+`CG.avs_in_flight` for the duration of the pop+materialize loop.
+`alloc_reg_or_spill` falls back to `spill_avs_victim` when the stack
+victim list is empty: it picks an `OPK_REG` arg storage entry,
+evicts it through `spill_reg`, and rewrites `avs[i].storage` to
+`OPK_LOCAL` so the backend's call lowering loads from the slot. After
+`T->call`, `cg_call` walks `avs` and returns each `OPK_LOCAL` slot to
+the spill-slot free-list. Without this fallback, a 12-arg INT call on
+aarch64 (10 INT pool) would be unsatisfiable.
+
+If a backend is added with a class smaller than the pinned bound for
+some operation outside `cg_call`, that's a wiring bug; CG asserts
+after the recursive `alloc_reg` call.
+
+### 4.4 Free-on-consume
+
+Every site that pops an SValue and is done with it must release any
+register it owned:
+
+```
+release(g, sv):
+    if sv.res == RES_REG:
+        if sv.op.kind == OPK_REG:
+            target->free_reg(sv.op.v.reg)
+        else if sv.op.kind == OPK_INDIRECT:
+            target->free_reg(sv.op.v.ind.base)
+    else if sv.res == RES_SPILLED:
+        return_spill_slot(g, sv.spill_slot, class_of(sv.type))
+    /* RES_INHERENT: nothing owed */
+```
+
+`cg_drop`, the result-discard path of `cg_store`, the operand pops in
+every `cg_binop`/`cg_unop`/`cg_cmp`/`cg_call`/`cg_load`/`cg_addr`/etc.
+all flow through `release`. This is the audit that bulks up the diff —
+CG has many `pop` sites and they currently leak.
+
+---
+
+## 5. Removed mechanisms
+
+The following are deleted as part of this change:
+
+- `cg_reset_scratch` (`cg/cg.c`) — no replacement; `release` in §4.4
+  makes it unnecessary.
+- The `extern void cg_reset_scratch(CG*)` declaration and the five call
+  sites in `parse/parse.c`.
+- `aa_reset_scratch` (`arch/aarch64.c`) and its forward declaration in
+  `cg/cg.c`.
+- The "(no spill yet)" panics in `aa_alloc_reg` — replaced by
+  `return REG_NONE`.
+- The `aa_panic("spill_reg")`/`aa_panic("reload_reg")` stubs —
+  replaced by real STR/LDR implementations that route through the
+  backend's existing store/load paths.
+
+The `used_int` / `used_fp` fields on `AAImpl` are repurposed: instead
+of monotonically-incrementing allocation cursors (where free was a
+no-op), they now track the highest-index-+1 ever allocated per class —
+the high-water mark needed by the prologue/epilogue to size the
+callee-save area. The actual free-list is the `int_free` / `fp_free`
+bitmasks added alongside.
diff --git a/src/arch/aarch64.c b/src/arch/aarch64.c
@@ -16,10 +16,14 @@
  *   local slots    (cum_off bytes)
  *   x29, x30 save  (16 bytes)                -- x29 = sp + frame_size - 16
  *
- * Single-pass register allocator: alloc_reg(RC_INT) hands out x19..x28 in
- * order; alloc_reg(RC_FP) hands out v8..v15. Both ranges are callee-saved
- * and only the prefix actually used is saved by the prologue. Width
- * derives from Operand.type via type_is_64. Spill/reload not implemented.
+ * Single-pass register allocator: a free-mask pool per class hands out
+ * the lowest free index. INT pool covers x19..x28 (10 callee-saves);
+ * FP pool covers v8..v23, with v8..v15 callee-saved and v16..v23
+ * caller-saved scratch — lowest-bit-first allocation prefers callee-
+ * saves. Only the prefix actually used (high-water mark) is saved by
+ * the prologue. Width derives from Operand.type via type_is_64. CG
+ * drives spill/reload through alloc_reg returning REG_NONE on
+ * exhaustion plus the spill_reg/reload_reg vtable entries.
  *
  * Multi-function: each func_begin/func_end pair owns its own frame state
  * via the AAImpl fields, so the harness can build several functions in
@@ -257,6 +261,51 @@ static inline u32 aa64_bfm(u32 sf, u32 Rd, u32 Rn, u32 immr, u32 imms) {
 }
 
 /* ============================================================
+ * Per-class register pool (free-mask + high-water mark).
+ *
+ * The mask uses bit i for the i-th register in the class's contiguous
+ * range, so allocation is `__builtin_ctz` over the free mask and
+ * deallocation is bit-set. `hwm` records the highest-index-+1 ever
+ * allocated, which the prologue/epilogue uses to size the callee-save
+ * area. 32-bit masks suffice for every aarch64/x86_64/RISC-V class.
+ * ============================================================ */
+
+typedef struct RegPool {
+  u32 free; /* bit i set ⇔ regs[base + i] is free */
+  u32 hwm;  /* highest-index-+1 ever allocated */
+  u8 base;  /* first physical reg in the class */
+  u8 nregs; /* count; bits [nregs..32) are always 0 */
+  u8 pad[2];
+} RegPool;
+
+static void regpool_init(RegPool* p, u8 base, u8 nregs) {
+  p->base = base;
+  p->nregs = nregs;
+  p->hwm = 0;
+  p->free = (nregs >= 32u) ? 0xFFFFFFFFu : ((1u << nregs) - 1u);
+}
+
+static Reg regpool_alloc(RegPool* p) {
+  if (p->free == 0) return (Reg)REG_NONE;
+  u32 idx = (u32)__builtin_ctz(p->free);
+  p->free &= ~(1u << idx);
+  if (idx + 1u > p->hwm) p->hwm = idx + 1u;
+  return (Reg)(p->base + idx);
+}
+
+/* Returns 1 on successful free, 0 if `r` is outside this pool's range,
+ * -1 on double-free (caller is expected to panic). */
+static int regpool_free(RegPool* p, Reg r) {
+  u32 rn = (u32)r;
+  if (rn < p->base || rn >= (u32)(p->base + p->nregs)) return 0;
+  u32 idx = rn - p->base;
+  u32 bit = 1u << idx;
+  if (p->free & bit) return -1;
+  p->free |= bit;
+  return 1;
+}
+
+/* ============================================================
  * AAImpl
  * ============================================================ */
 
@@ -305,9 +354,18 @@ typedef struct AAImpl {
   u8 has_sret;             /* sret pointer arrived in x8 */
   FrameSlot sret_ptr_slot; /* hidden slot holding incoming x8 */
 
-  /* Reg allocator (callee-saved prefix). */
-  u32 used_int; /* x19 + i, i in [0, used_int) */
-  u32 used_fp;  /* v8  + i, i in [0, used_fp ) */
+  /* Reg allocator pools. Bit i set in `free` means the i-th register in
+   * the class's contiguous range (base..base+nregs-1) is available. The
+   * high-water mark `hwm` is the largest index+1 ever allocated for the
+   * class — used by the prologue to decide how many callee-saves to push.
+   *
+   * INT pool: base = 19, nregs = 10 (x19..x28).
+   * FP pool : base =  8, nregs = 16 (v8..v23). The first 8 (v8..v15) are
+   * AAPCS64 callee-saves; v16..v23 are caller-saved scratch handed out
+   * after the callee-saved range fills. Allocation is lowest-bit-first
+   * so callee-saves are still preferred. */
+  RegPool int_pool;
+  RegPool fp_pool;
 
   /* Structured-scope stack. Entries are not popped — IDs returned to
    * the caller are stable indices into this array for the lifetime
@@ -343,6 +401,9 @@ static AAImpl* impl_of(CGTarget* t) { return (AAImpl*)t; }
 static FrameSlot aa_frame_slot(CGTarget* t, const FrameSlotDesc* d);
 static AASlot* slot_get(AAImpl* a, FrameSlot fs);
 static u32 force_reg_int(CGTarget* t, Operand op, u32 sf, u32 scratch);
+static void aa_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma);
+static void aa_store(CGTarget* t, Operand addr, Operand src, MemAccess ma);
+static void aa_free_reg(CGTarget* t, Reg r);
 
 /* ---- helpers ---- */
 
@@ -535,8 +596,8 @@ static void aa_func_begin(CGTarget* t, const CGFuncDesc* fd) {
   a->has_sret = (fd->abi && fd->abi->has_sret) ? 1 : 0;
   a->cum_off = 0;
   a->max_outgoing = 0;
-  a->used_int = 0;
-  a->used_fp = 0;
+  regpool_init(&a->int_pool, /*base=*/19u, /*nregs=*/10u); /* x19..x28 */
+  regpool_init(&a->fp_pool, /*base=*/8u, /*nregs=*/16u);   /* v8..v23 */
   a->nslots = 0;
   a->nscopes = 0;
   a->has_alloca = 0;
@@ -614,8 +675,8 @@ static void aa_func_end(CGTarget* t) {
   /* Compute callee-save layout. Only v8..v15 are callee-saved; the
    * caller-saved v16..v23 are handed out by alloc_reg too but never
    * appear in prologue saves. */
-  u32 n_int_pairs = (a->used_int + 1) / 2; /* round up */
-  u32 used_fp_cs = a->used_fp > 8 ? 8u : a->used_fp;
+  u32 n_int_pairs = (a->int_pool.hwm + 1) / 2; /* round up */
+  u32 used_fp_cs = a->fp_pool.hwm > 8 ? 8u : a->fp_pool.hwm;
   u32 n_fp_pairs = (used_fp_cs + 1) / 2;
 
   u32 outgoing_off = 0;
@@ -750,45 +811,28 @@ static void aa_func_end(CGTarget* t) {
 static Reg aa_alloc_reg(CGTarget* t, RegClass cls, const Type* ty) {
   AAImpl* a = impl_of(t);
   (void)ty;
-  if (cls == RC_INT) {
-    if (a->used_int >= 10) {
-      compiler_panic(t->c, a->loc,
-                     "aarch64 alloc_reg: out of INT scratch (no spill yet)");
-    }
-    return (Reg)(19u + a->used_int++);
-  }
-  if (cls == RC_FP) {
-    /* v8..v15 are callee-saved (low 64 bits); v16..v23 are caller-saved
-     * scratch. Hand out callee-saved first, then fall back to scratch
-     * for short-lived materialization (e.g. j06 builds 9 FP arg regs
-     * with no intervening call). */
-    if (a->used_fp >= 16) {
-      compiler_panic(t->c, a->loc,
-                     "aarch64 alloc_reg: out of FP scratch (no spill yet)");
-    }
-    u32 idx = a->used_fp++;
-    return (Reg)(idx < 8 ? 8u + idx : 16u + (idx - 8u));
-  }
+  /* Lowest-bit-first allocation hands out callee-saves before caller-
+   * saves on the FP side (v8..v15 then v16..v23) — short-lived
+   * materializations (e.g. j06 building 9 FP arg regs with no
+   * intervening call) thus reach into the caller-saved range. */
+  if (cls == RC_INT) return regpool_alloc(&a->int_pool);
+  if (cls == RC_FP)  return regpool_alloc(&a->fp_pool);
   compiler_panic(t->c, a->loc, "aarch64 alloc_reg: class %d unimpl", (int)cls);
 }
 
 static void aa_free_reg(CGTarget* t, Reg r) {
-  (void)t;
-  (void)r;
-}
-
-/* Reset the scratch-register cursors. The parser calls this between
- * statements (when its value stack is known empty), letting the next
- * statement reuse the entire scratch pool. Safe only when no live
- * register-resident SValue is still expected — the parser asserts
- * that precondition by checking cg's sp before forwarding. Without
- * this, any function whose body contains more than ~10 sequential
- * register-allocating operations exhausts the scratch pool. */
-void aa_reset_scratch(CGTarget* t);
-void aa_reset_scratch(CGTarget* t) {
   AAImpl* a = impl_of(t);
-  a->used_int = 0;
-  a->used_fp = 0;
+  RegPool* pools[2] = {&a->int_pool, &a->fp_pool};
+  for (u32 i = 0; i < 2; ++i) {
+    int rc = regpool_free(pools[i], r);
+    if (rc == 1) return;
+    if (rc == -1) {
+      compiler_panic(t->c, a->loc, "aarch64 free_reg: reg %u already free",
+                     (unsigned)r);
+    }
+  }
+  compiler_panic(t->c, a->loc, "aarch64 free_reg: reg %u not a scratch reg",
+                 (unsigned)r);
 }
 
 static FrameSlot aa_frame_slot(CGTarget* t, const FrameSlotDesc* d) {
@@ -914,17 +958,35 @@ static const Reg* aa_clobbers(CGTarget* t, RegClass c, u32* n) {
   (void)n;
   aa_panic(t, "clobbers");
 }
-static void aa_spill_reg(CGTarget* t, Operand s, FrameSlot f, MemAccess m) {
-  (void)s;
-  (void)f;
-  (void)m;
-  aa_panic(t, "spill_reg");
-}
-static void aa_reload_reg(CGTarget* t, Operand d, FrameSlot f, MemAccess m) {
-  (void)d;
-  (void)f;
-  (void)m;
-  aa_panic(t, "reload_reg");
+static void aa_spill_reg(CGTarget* t, Operand src, FrameSlot slot,
+                         MemAccess ma) {
+  AAImpl* a = impl_of(t);
+  if (src.kind != OPK_REG) {
+    compiler_panic(t->c, a->loc, "aarch64 spill_reg: src is not OPK_REG");
+  }
+  Operand addr;
+  memset(&addr, 0, sizeof addr);
+  addr.kind = OPK_LOCAL;
+  addr.cls = RC_INT;
+  addr.type = ma.type;
+  addr.v.frame_slot = slot;
+  aa_store(t, addr, src, ma);
+  aa_free_reg(t, src.v.reg);
+}
+
+static void aa_reload_reg(CGTarget* t, Operand dst, FrameSlot slot,
+                          MemAccess ma) {
+  AAImpl* a = impl_of(t);
+  if (dst.kind != OPK_REG) {
+    compiler_panic(t->c, a->loc, "aarch64 reload_reg: dst is not OPK_REG");
+  }
+  Operand addr;
+  memset(&addr, 0, sizeof addr);
+  addr.kind = OPK_LOCAL;
+  addr.cls = RC_INT;
+  addr.type = ma.type;
+  addr.v.frame_slot = slot;
+  aa_load(t, dst, addr, ma);
 }
 
 /* ---- labels / control flow ----
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -11,11 +11,29 @@
  *   - OPK_LOCAL / OPK_GLOBAL / OPK_INDIRECT are lvalues. cg_load promotes
  *     them to OPK_REG via target->load + a fresh scratch register.
  *
- * This is the spine slice — enough for §6.5/§6.8 fixtures: scalar i32
- * locals, integer arithmetic, comparisons, control flow, and return.
- * Aggregates, atomics, calls, and the asm/setjmp/intrinsic methods are
- * placeholders pending their corpus rows. The interface in cg.h is the
- * commitment; this file fills in the slice that's exercised today. */
+ * Register pressure & spill (see doc/REGALLOC.md):
+ *   - Each SValue carries an SResidency tag (INHERENT / REG / SPILLED).
+ *     REG-residing SValues own a physical scratch register that must be
+ *     released back to the pool when the value is consumed; SPILLED
+ *     SValues own a frame slot instead and must be reloaded before use.
+ *   - alloc_reg_or_spill is the single allocation entry point. On pool
+ *     exhaustion it picks the deepest unpinned RES_REG SValue from the
+ *     value stack as the spill victim, evicts its register through
+ *     T->spill_reg, and retries. ensure_reg is the dual: it reloads a
+ *     SPILLED SValue's register before consumption, possibly evicting
+ *     another value to make room.
+ *   - Pop sites (binop, cmp, store, branch, call, ...) call release()
+ *     to return regs/slots after consumption — there is no statement-
+ *     boundary scratch reset; ownership tracking is the discipline.
+ *   - cg_call additionally exposes its in-flight CGABIValue array via
+ *     CG.avs_in_flight so the spill driver can re-spill an already-
+ *     materialized arg's register (rewriting storage to OPK_LOCAL) when
+ *     the value stack runs out of victims; this lets calls with more
+ *     reg-class args than the pool size can hold lower correctly.
+ *
+ * Aggregates, atomics, asm/setjmp/intrinsic methods are placeholders
+ * pending their corpus rows. The interface in cg.h is the commitment;
+ * this file fills in the slice that's exercised today. */
 
 #include "cg/cg.h"
 
@@ -35,12 +53,28 @@
  * Value stack
  * ============================================================ */
 
+/* Residency: where the value's storage actually lives. INHERENT values
+ * (IMM, LOCAL, GLOBAL) carry no register obligation. REG values own a
+ * physical scratch register. SPILLED values had their register evicted
+ * to a frame slot under register pressure and must be reloaded before
+ * consumption. See doc/REGALLOC.md §3. */
+typedef enum SResidency {
+  RES_INHERENT,
+  RES_REG,
+  RES_SPILLED,
+} SResidency;
+
 typedef struct SValue {
   Operand op;       /* IMM/REG (rvalue) or LOCAL/GLOBAL/INDIRECT (lvalue) */
   const Type* type; /* C semantic type of the value (post-promotion) */
+  u8 res;           /* SResidency */
+  u8 pinned;        /* 1 = ineligible spill victim (cleared per CG op) */
+  u8 pad[2];
+  FrameSlot spill_slot; /* valid iff res == RES_SPILLED */
 } SValue;
 
 #define CG_STACK_INITIAL 16u
+#define CG_SPILL_FREE_INITIAL 4u
 
 struct CG {
   Compiler* c;
@@ -64,6 +98,29 @@ struct CG {
   SValue* stack;
   u32 sp;
   u32 cap;
+
+  /* Per-function spill-slot free-lists, one per RegClass. A spill takes a
+   * slot from the free-list (allocating fresh from the backend if empty);
+   * a reload returns the slot. Frame footprint is bounded by the peak
+   * concurrent spills per class. Reset at func_end. See doc/REGALLOC.md
+   * §3.2. */
+  struct {
+    FrameSlot* free;
+    u32 n;
+    u32 cap;
+  } slot_pools[3]; /* indexed by RegClass; RC_VEC reserved */
+
+  /* Set during cg_call's pop+materialize loop to point at the call's
+   * in-flight CGABIValue array. When alloc_reg_or_spill exhausts the
+   * stack victim list, it falls back to spilling an OPK_REG arg
+   * storage entry from this array, rewriting it to OPK_LOCAL so the
+   * backend's call lowering loads it from the spill slot. NULL outside
+   * cg_call. Without this fallback, calls with more reg-class args
+   * than the pool can hold (e.g. 10+ INT args on aarch64) would have
+   * unreclaimable regs sitting in avs[] while the value stack runs
+   * out of victims. */
+  CGABIValue* avs_in_flight;
+  u32 avs_in_flight_n;
 };
 
 static void stack_grow(CG* g, u32 want) {
@@ -93,11 +150,24 @@ static SValue pop(CG* g) {
   return g->stack[--g->sp];
 }
 
-static SValue* top(CG* g) {
-  if (g->sp == 0) {
-    compiler_panic(g->c, g->cur_loc, "cg: stack empty");
-  }
-  return &g->stack[g->sp - 1];
+
+/* Residency of an Operand as it should land on the stack at construction
+ * time. REG → owns a register; INDIRECT → owns its base register; the
+ * rest carry no register obligation. */
+static u8 residency_for(const Operand* o) {
+  if (o->kind == OPK_REG) return RES_REG;
+  if (o->kind == OPK_INDIRECT) return RES_REG;
+  return RES_INHERENT;
+}
+
+static SValue make_sv(Operand op, const Type* ty) {
+  SValue sv;
+  memset(&sv, 0, sizeof sv);
+  sv.op = op;
+  sv.type = ty;
+  sv.res = residency_for(&op);
+  sv.spill_slot = FRAME_SLOT_NONE;
+  return sv;
 }
 
 /* ============================================================
@@ -187,6 +257,257 @@ static AliasKind alias_for_lvalue(const Operand* o) {
   }
 }
 
+/* MemAccess for a load/store through an lvalue Operand. Routes through
+ * derive_mem with the alias root and any local-id taken from the lvalue
+ * itself. The Operand must be OPK_LOCAL/GLOBAL/INDIRECT. */
+static MemAccess mem_for_lvalue(CG* g, const Operand* lv, const Type* ty) {
+  AliasKind ak = alias_for_lvalue(lv);
+  i32 al = (ak == ALIAS_LOCAL) ? (i32)lv->v.frame_slot : 0;
+  return derive_mem(g, ty, ak, al);
+}
+
+/* C type carried by an SValue, with a fallback to the Operand's own type
+ * field for SValues whose `type` was never set (notably the bare-slot
+ * cg_push_local path). The two should agree when both present. */
+static const Type* sv_type(const SValue* sv) {
+  return sv->type ? sv->type : sv->op.type;
+}
+
+/* Build an OPK_INDIRECT Operand for [base + ofs] of type `ty`. The base
+ * register is always RC_INT (it holds a pointer). */
+static Operand op_indirect(Reg base, i32 ofs, const Type* ty) {
+  Operand o;
+  memset(&o, 0, sizeof o);
+  o.kind = OPK_INDIRECT;
+  o.cls = RC_INT;
+  o.type = ty;
+  o.v.ind.base = base;
+  o.v.ind.ofs = ofs;
+  return o;
+}
+
+/* ============================================================
+ * Register-pool & spill driver — see doc/REGALLOC.md
+ * ============================================================ */
+
+/* Class an SValue's register lives in: RC_FP for float types, RC_INT for
+ * everything else (including OPK_INDIRECT base regs which always hold a
+ * pointer-sized integer). */
+static u8 class_of_sv(const SValue* sv) {
+  if (sv->op.kind == OPK_INDIRECT) return RC_INT;
+  return type_class(sv_type(sv));
+}
+
+/* The C type whose width matches the register an SValue currently owns.
+ * For OPK_INDIRECT, that's a pointer to the lvalue's type (the base reg
+ * holds an address); for everything else, the SValue's plain type. Used
+ * by the spill/reload machinery to size the slot store/load. */
+static const Type* sv_owned_reg_type(CG* g, const SValue* sv) {
+  if (sv->op.kind == OPK_INDIRECT) {
+    return type_ptr(g->pool, sv->type ? sv->type : type_void(g->pool));
+  }
+  return sv_type(sv);
+}
+
+static u8 reg_of_sv(const SValue* sv) {
+  if (sv->op.kind == OPK_REG) return (u8)sv->op.v.reg;
+  if (sv->op.kind == OPK_INDIRECT) return (u8)sv->op.v.ind.base;
+  return 0;
+}
+
+/* Write the register an SValue owns. For OPK_INDIRECT the ind.base
+ * field; for OPK_REG the v.reg field. Used by ensure_reg to restore
+ * the reloaded reg, and by alloc_reg_or_spill to mark a freshly-
+ * spilled SValue's reg slot as REG_NONE so any stale read fails fast. */
+static void set_owned_reg(SValue* sv, Reg r) {
+  if (sv->op.kind == OPK_INDIRECT) {
+    sv->op.v.ind.base = r;
+  } else {
+    sv->op.v.reg = r;
+  }
+}
+
+/* MemAccess for a spill/reload. Width is driven by the type of the value
+ * whose register is moving — pointer-sized for OPK_INDIRECT bases,
+ * otherwise the value's own type. The alias root is ALIAS_LOCAL with
+ * id 0: spill traffic is internal CG bookkeeping that opt's alias
+ * analysis never inspects, and the slot itself disambiguates accesses. */
+static MemAccess mem_for_spill(CG* g, const SValue* sv) {
+  return derive_mem(g, sv_owned_reg_type(g, sv), ALIAS_LOCAL, 0);
+}
+
+/* Per-class spill-slot size in bytes. FP slots are 16 bytes to cover
+ * `double` and the spilled portion of `long double`; INT slots are
+ * pointer-width. Both values are also the slot's alignment. */
+static u32 spill_slot_size(u8 cls) { return (cls == RC_FP) ? 16u : 8u; }
+
+/* Take a spill slot from the per-class free-list, or allocate a fresh one. */
+static FrameSlot take_spill_slot(CG* g, u8 cls) {
+  if (g->slot_pools[cls].n > 0) {
+    return g->slot_pools[cls].free[--g->slot_pools[cls].n];
+  }
+  FrameSlotDesc d;
+  memset(&d, 0, sizeof d);
+  d.size = spill_slot_size(cls);
+  d.align = d.size;
+  d.kind = FS_SPILL;
+  return g->target->frame_slot(g->target, &d);
+}
+
+static void return_spill_slot(CG* g, FrameSlot s, u8 cls) {
+  if (s == FRAME_SLOT_NONE) return;
+  if (g->slot_pools[cls].n == g->slot_pools[cls].cap) {
+    Heap* h = g->c->env->heap;
+    u32 cap = g->slot_pools[cls].cap;
+    u32 nc = cap ? cap * 2u : CG_SPILL_FREE_INITIAL;
+    FrameSlot* nb =
+        (FrameSlot*)h->alloc(h, sizeof(FrameSlot) * nc, _Alignof(FrameSlot));
+    if (g->slot_pools[cls].free) {
+      memcpy(nb, g->slot_pools[cls].free,
+             sizeof(FrameSlot) * g->slot_pools[cls].n);
+      h->free(h, g->slot_pools[cls].free, sizeof(FrameSlot) * cap);
+    }
+    g->slot_pools[cls].free = nb;
+    g->slot_pools[cls].cap = nc;
+  }
+  g->slot_pools[cls].free[g->slot_pools[cls].n++] = s;
+}
+
+/* Walk the value stack from index 0 upward and return the first
+ * unpinned RES_REG SValue that matches `cls`. FIFO-from-bottom == the
+ * deepest live value, which matches the intuition that the top of
+ * stack is about to be consumed. Pinned entries are skipped — they're
+ * the operands of an in-flight CG op that mustn't be evicted from
+ * under itself (see cg_dup, which keeps its source on the stack while
+ * allocating the duplicate's destination register). Returns NULL if
+ * no eligible victim exists; that's a CG bug at this call site. */
+static SValue* pick_victim(CG* g, u8 cls) {
+  for (u32 i = 0; i < g->sp; ++i) {
+    SValue* sv = &g->stack[i];
+    if (sv->res != RES_REG) continue;
+    if (sv->pinned) continue;
+    if (class_of_sv(sv) != cls) continue;
+    return sv;
+  }
+  return NULL;
+}
+
+/* Release the resources owned by a single in-flight CGABIValue arg
+ * after the call has returned: REG storage goes back to the reg pool,
+ * LOCAL storage (only produced by spill_avs_victim) returns its slot
+ * to the per-class spill-slot pool. IMM and other kinds carry no
+ * runtime ownership and need nothing. */
+static void release_arg_storage(CG* g, const Operand* st) {
+  if (st->kind == OPK_REG) {
+    g->target->free_reg(g->target, st->v.reg);
+  } else if (st->kind == OPK_LOCAL) {
+    return_spill_slot(g, st->v.frame_slot, st->cls);
+  }
+}
+
+/* Try to spill a CGABIValue arg from the in-flight set whose `storage`
+ * is an OPK_REG of `cls`. Mutates the entry: storage becomes OPK_LOCAL
+ * pointing at a freshly-taken spill slot, and the backend's call
+ * lowering will load from there when materializing the arg into its
+ * ABI register. Returns 1 if a victim was found and spilled, 0 if none
+ * was eligible. Used by alloc_reg_or_spill as a fallback when the
+ * value stack has no eligible victim — see CG.avs_in_flight. */
+static int spill_avs_victim(CG* g, u8 cls) {
+  CGTarget* T = g->target;
+  if (!g->avs_in_flight) return 0;
+  for (u32 i = 0; i < g->avs_in_flight_n; ++i) {
+    CGABIValue* av = &g->avs_in_flight[i];
+    if (av->storage.kind != OPK_REG) continue;
+    if (av->storage.cls != cls) continue;
+    FrameSlot slot = take_spill_slot(g, cls);
+    SValue tmp = make_sv(av->storage, av->type);
+    MemAccess ma = mem_for_spill(g, &tmp);
+    T->spill_reg(T, av->storage, slot, ma);
+    /* Rewrite storage to OPK_LOCAL so the backend reads from the slot.
+     * The cls is preserved on the operand so the cleanup loop knows
+     * which spill-slot pool to return the slot to. */
+    Operand local = op_local(slot, av->type);
+    local.cls = cls;
+    av->storage = local;
+    return 1;
+  }
+  return 0;
+}
+
+/* Allocate a register; on pool exhaustion, spill the deepest live
+ * RES_REG value to a frame slot and try again. If the value stack has
+ * no eligible victim, fall back to spilling an in-flight cg_call arg
+ * (see avs_in_flight) — without that fallback, a call with more args
+ * than the pool size can hold becomes unsatisfiable since the popped-
+ * but-not-yet-emitted arg regs would sit unreclaimable in avs[]. */
+static Reg alloc_reg_or_spill(CG* g, u8 cls, const Type* ty) {
+  CGTarget* T = g->target;
+  Reg r = T->alloc_reg(T, cls, ty);
+  if (r != (Reg)REG_NONE) return r;
+
+  SValue* victim = pick_victim(g, cls);
+  if (victim) {
+    FrameSlot slot = take_spill_slot(g, cls);
+    Operand victim_reg = op_reg((Reg)reg_of_sv(victim), victim->type);
+    T->spill_reg(T, victim_reg, slot, mem_for_spill(g, victim));
+
+    victim->spill_slot = slot;
+    victim->res = RES_SPILLED;
+    /* Mark the reg slot as REG_NONE so any stale read fails fast; the
+     * reload via ensure_reg will write the new reg back into the same
+     * field. */
+    set_owned_reg(victim, (Reg)REG_NONE);
+  } else if (!spill_avs_victim(g, cls)) {
+    compiler_panic(g->c, g->cur_loc,
+                   "cg: regalloc — no spillable victim (class %u)",
+                   (unsigned)cls);
+  }
+
+  r = T->alloc_reg(T, cls, ty);
+  if (r == (Reg)REG_NONE) {
+    compiler_panic(g->c, g->cur_loc,
+                   "cg: regalloc — class %u still empty after spill",
+                   (unsigned)cls);
+  }
+  return r;
+}
+
+/* Reload a spilled SValue back into a register (possibly evicting another
+ * live value). After return, `sv->res == RES_REG` and the operand reflects
+ * the reloaded register. INDIRECT lvalues are restored to OPK_INDIRECT
+ * with a fresh base; the deferred-load identity is preserved. */
+static void ensure_reg(CG* g, SValue* sv) {
+  if (sv->res != RES_SPILLED) return;
+  CGTarget* T = g->target;
+  u8 cls = class_of_sv(sv);
+  const Type* ty = sv_owned_reg_type(g, sv);
+  Reg r = alloc_reg_or_spill(g, cls, ty);
+  T->reload_reg(T, op_reg(r, ty), sv->spill_slot, mem_for_spill(g, sv));
+  return_spill_slot(g, sv->spill_slot, cls);
+  sv->spill_slot = FRAME_SLOT_NONE;
+  /* For INDIRECT, the lvalue's deferred-load identity is preserved: only
+   * the base reg changes. For everything else, refresh the whole REG
+   * operand so its `type` matches the SValue's current type tag. */
+  if (sv->op.kind == OPK_INDIRECT) {
+    sv->op.v.ind.base = r;
+  } else {
+    sv->op = op_reg(r, sv_type(sv));
+  }
+  sv->res = RES_REG;
+}
+
+/* Release any register or spill slot owned by an SValue popped off the
+ * stack and not consumed by a downstream operation. */
+static void release(CG* g, SValue* sv) {
+  if (sv->res == RES_REG) {
+    g->target->free_reg(g->target, (Reg)reg_of_sv(sv));
+  } else if (sv->res == RES_SPILLED) {
+    return_spill_slot(g, sv->spill_slot, class_of_sv(sv));
+    sv->spill_slot = FRAME_SLOT_NONE;
+  }
+  sv->res = RES_INHERENT;
+}
+
 /* ============================================================
  * Construction
  * ============================================================ */
@@ -212,6 +533,12 @@ void cg_free(CG* g) {
   if (!g) return;
   h = g->c->env->heap;
   if (g->stack) h->free(h, g->stack, sizeof(SValue) * g->cap);
+  for (u32 c = 0; c < 3; ++c) {
+    if (g->slot_pools[c].free) {
+      h->free(h, g->slot_pools[c].free,
+              sizeof(FrameSlot) * g->slot_pools[c].cap);
+    }
+  }
   h->free(h, g, sizeof *g);
 }
 
@@ -227,6 +554,10 @@ void cg_func_begin(CG* g, const CGFuncDesc* fd) {
   g->fn_ret_type = fd->fn_type ? fd->fn_type->fn.ret : NULL;
   g->fn_abi = fd->abi;
   g->sp = 0;
+  /* Per-function spill-slot free-lists reset. The backing arrays are
+   * reused; only the counts go to zero since slot ids belong to the new
+   * function's frame. */
+  for (u32 c = 0; c < 3; ++c) g->slot_pools[c].n = 0;
 
   /* Class-1 DWARF: a new subprogram opens. doc/DWARF.md §3.1 makes this
    * the parser's job; we forward through cg as a convenience hook. */
@@ -270,10 +601,7 @@ void cg_bind_decl(CG* g, DeclId id) {
  * ============================================================ */
 
 void cg_push_int(CG* g, i64 v, const Type* ty) {
-  SValue sv;
-  sv.op = op_imm(v, ty);
-  sv.type = ty;
-  push(g, sv);
+  push(g, make_sv(op_imm(v, ty), ty));
 }
 
 void cg_push_const(CG* g, ConstBytes cb) {
@@ -281,15 +609,10 @@ void cg_push_const(CG* g, ConstBytes cb) {
    * stack value is plain rvalue REG. The constant pool / immediate-encoding
    * choice is the backend's. */
   CGTarget* T = g->target;
-  Reg r = T->alloc_reg(T, type_class(cb.type), cb.type);
+  Reg r = alloc_reg_or_spill(g, type_class(cb.type), cb.type);
   Operand dst = op_reg(r, cb.type);
   T->load_const(T, dst, cb);
-  {
-    SValue sv;
-    sv.op = dst;
-    sv.type = cb.type;
-    push(g, sv);
-  }
+  push(g, make_sv(dst, cb.type));
 }
 
 void cg_push_float(CG* g, double v, const Type* ty) {
@@ -330,43 +653,28 @@ void cg_push_local(CG* g, FrameSlot s) {
    * scope record, not through cg, so the push uses NULL type and the
    * subsequent cg_load supplies the right type. The parser actually pushes
    * via the type-aware variant; this base entry is here for completeness. */
-  SValue sv;
-  sv.op = op_local(s, NULL);
-  sv.type = NULL;
-  push(g, sv);
+  push(g, make_sv(op_local(s, NULL), NULL));
 }
 
 /* Type-aware variants used by the parser. Not in the public header; the
  * parser calls these directly via a small extension below. */
 void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty);
 void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty) {
-  SValue sv;
-  sv.op = op_local(s, ty);
-  sv.type = ty;
-  push(g, sv);
+  push(g, make_sv(op_local(s, ty), ty));
 }
 
 /* Pop a pointer rvalue and push an OPK_INDIRECT lvalue for the pointee.
  * The parser uses this to implement unary `*`. The pointer is materialized
  * into a register; the resulting lvalue's MemAccess alias root is unknown
  * (not LOCAL/GLOBAL), which is the right conservative answer for *ptr. */
-static Operand force_reg(CG* g, SValue v, const Type* ty);
+static Operand force_reg(CG* g, SValue* v, const Type* ty);
 void cg_deref(CG* g, const Type* pointee_ty);
 void cg_deref(CG* g, const Type* pointee_ty) {
   SValue v = pop(g);
-  const Type* pty = v.type ? v.type : v.op.type;
-  Operand src = force_reg(g, v, pty);
-  Operand ind;
-  SValue sv;
-  memset(&ind, 0, sizeof ind);
-  ind.kind = OPK_INDIRECT;
-  ind.cls = RC_INT;
-  ind.type = pointee_ty;
-  ind.v.ind.base = src.v.reg;
-  ind.v.ind.ofs = 0;
-  sv.op = ind;
-  sv.type = pointee_ty;
-  push(g, sv);
+  /* The pointer reg becomes the new lvalue's base — ownership transfers
+   * from `v` to the new INDIRECT SValue, so no release on `v`. */
+  Operand src = force_reg(g, &v, sv_type(&v));
+  push(g, make_sv(op_indirect(src.v.reg, 0, pointee_ty), pointee_ty));
 }
 
 /* Read the type of the value currently on top of the stack without popping.
@@ -398,25 +706,8 @@ void cg_retag_top(CG* g, const Type* ty) {
   g->stack[g->sp - 1].op.type = ty;
 }
 
-/* Recycle the backend's scratch-register pool. Safe only when nothing on
- * the value stack holds a register operand (sp == 0 in particular, but
- * an all-IMM stack is also fine in principle). The parser calls this at
- * statement boundaries so a function body with many sequential reg-
- * allocating operations doesn't exhaust the fixed scratch window. */
-extern void aa_reset_scratch(CGTarget*);
-void cg_reset_scratch(CG* g);
-void cg_reset_scratch(CG* g) {
-  if (g->sp != 0) return;
-  /* For now we only know about the aarch64 backend; once a second arch
-   * lands we promote this to a CGTarget vtable entry. */
-  aa_reset_scratch(g->target);
-}
-
 void cg_push_global(CG* g, ObjSymId sym, const Type* ty) {
-  SValue sv;
-  sv.op = op_global(sym, 0, ty);
-  sv.type = ty;
-  push(g, sv);
+  push(g, make_sv(op_global(sym, 0, ty), ty));
 }
 
 /* ============================================================
@@ -424,8 +715,47 @@ void cg_push_global(CG* g, ObjSymId sym, const Type* ty) {
  * ============================================================ */
 
 void cg_dup(CG* g) {
-  SValue v = *top(g);
-  push(g, v);
+  /* Duplicate the top SValue. INHERENT values (IMM/LOCAL/GLOBAL) carry
+   * no register ownership — copying the SValue is enough. REG-owning
+   * values must materialize into a second register so each side of the
+   * dup can be released independently; the contents come over via
+   * target->copy.
+   *
+   * The source stays on the stack while the destination register is
+   * allocated, which means alloc_reg_or_spill could in principle pick
+   * the source as the spill victim if pool pressure is high and there
+   * is no other eligible RES_REG on the stack. Pin the source for the
+   * duration to keep pick_victim away from it. */
+  CGTarget* T = g->target;
+  if (g->sp == 0) compiler_panic(g->c, g->cur_loc, "cg_dup: stack empty");
+  SValue* top_p = &g->stack[g->sp - 1];
+  ensure_reg(g, top_p);
+  SValue v = *top_p; /* snapshot AFTER reload — v.op now reflects fresh reg */
+  if (v.res != RES_REG) {
+    push(g, v);
+    return;
+  }
+  top_p->pinned = 1;
+  const Type* ty = sv_owned_reg_type(g, &v);
+  Reg r = alloc_reg_or_spill(g, class_of_sv(&v), ty);
+  T->copy(T, op_reg(r, ty), op_reg((Reg)reg_of_sv(&v), ty));
+  /* Refresh the stack pointer: alloc_reg_or_spill above may have spilled
+   * a different stack entry (the top was pinned, so it stayed put), but
+   * the SValue array address could have been reallocated by a future
+   * grow. Today no path inside copy/spill grows the stack, but reading
+   * through the original `top_p` after a potential realloc would be UB
+   * regardless. */
+  g->stack[g->sp - 1].pinned = 0;
+  /* The duplicate is `v` with its owned reg replaced by `r`. set_owned_reg
+   * writes into ind.base for INDIRECT lvalues or v.reg for REG rvalues —
+   * either way the duplicate ends up RES_REG with the freshly-copied
+   * value, independent of the source. */
+  SValue dup = v;
+  set_owned_reg(&dup, r);
+  dup.res = RES_REG;
+  dup.pinned = 0;
+  dup.spill_slot = FRAME_SLOT_NONE;
+  push(g, dup);
 }
 
 void cg_swap(CG* g) {
@@ -438,7 +768,10 @@ void cg_swap(CG* g) {
   g->stack[g->sp - 2] = a;
 }
 
-void cg_drop(CG* g) { (void)pop(g); }
+void cg_drop(CG* g) {
+  SValue v = pop(g);
+  release(g, &v);
+}
 
 /* ============================================================
  * load / store / addr
@@ -451,49 +784,34 @@ static int is_lvalue(const Operand* o) {
 
 void cg_load(CG* g) {
   SValue v = pop(g);
-  CGTarget* T = g->target;
+  ensure_reg(g, &v);
   if (!is_lvalue(&v.op)) {
     /* Already an rvalue — passing-through is correct (cg_load is idempotent
      * on rvalues so the parser can call it eagerly). */
     push(g, v);
     return;
   }
-  {
-    const Type* ty = v.type ? v.type : v.op.type;
-    Reg r = T->alloc_reg(T, type_class(ty), ty);
-    Operand dst = op_reg(r, ty);
-    MemAccess ma;
-    AliasKind ak = alias_for_lvalue(&v.op);
-    i32 alias_local = (ak == ALIAS_LOCAL) ? (i32)v.op.v.frame_slot : 0;
-    ma = derive_mem(g, ty, ak, alias_local);
-    T->load(T, dst, v.op, ma);
-    {
-      SValue rv;
-      rv.op = dst;
-      rv.type = ty;
-      push(g, rv);
-    }
-  }
+  /* force_reg's lvalue branch does exactly what cg_load wants: alloc a
+   * fresh value reg, T->load through the lvalue's MemAccess, free the
+   * old INDIRECT base if any, retag v as RES_REG. */
+  const Type* ty = sv_type(&v);
+  Operand dst = force_reg(g, &v, ty);
+  push(g, make_sv(dst, ty));
 }
 
 void cg_addr(CG* g) {
   SValue v = pop(g);
   CGTarget* T = g->target;
+  ensure_reg(g, &v);
   if (!is_lvalue(&v.op)) {
     compiler_panic(g->c, g->cur_loc, "cg_addr: operand is not an lvalue");
   }
-  {
-    const Type* pty = type_ptr(g->pool, v.type ? v.type : v.op.type);
-    Reg r = T->alloc_reg(T, RC_INT, pty);
-    Operand dst = op_reg(r, pty);
-    T->addr_of(T, dst, v.op);
-    {
-      SValue rv;
-      rv.op = dst;
-      rv.type = pty;
-      push(g, rv);
-    }
-  }
+  const Type* pty = type_ptr(g->pool, sv_type(&v));
+  Reg r = alloc_reg_or_spill(g, RC_INT, pty);
+  Operand dst = op_reg(r, pty);
+  T->addr_of(T, dst, v.op);
+  release(g, &v);
+  push(g, make_sv(dst, pty));
 }
 
 void cg_store(CG* g) {
@@ -506,34 +824,25 @@ void cg_store(CG* g) {
   SValue rv = pop(g);
   SValue lv = pop(g);
   CGTarget* T = g->target;
+  ensure_reg(g, &rv);
+  ensure_reg(g, &lv);
   if (!is_lvalue(&lv.op)) {
     compiler_panic(g->c, g->cur_loc, "cg_store: destination is not an lvalue");
   }
-  {
-    const Type* ty = lv.type ? lv.type : lv.op.type;
-    AliasKind ak = alias_for_lvalue(&lv.op);
-    i32 alias_local = (ak == ALIAS_LOCAL) ? (i32)lv.op.v.frame_slot : 0;
-    MemAccess ma = derive_mem(g, ty, ak, alias_local);
-    /* IMM is a legal source for store; otherwise force into a reg. */
-    Operand src = rv.op;
-    if (src.kind != OPK_REG && src.kind != OPK_IMM) {
-      Reg r = T->alloc_reg(T, type_class(ty), ty);
-      Operand dst = op_reg(r, ty);
-      MemAccess mr;
-      AliasKind sak = alias_for_lvalue(&src);
-      i32 saloc = (sak == ALIAS_LOCAL) ? (i32)src.v.frame_slot : 0;
-      mr = derive_mem(g, rv.type ? rv.type : ty, sak, saloc);
-      T->load(T, dst, src, mr);
-      src = dst;
-    }
-    T->store(T, lv.op, src, ma);
-    {
-      SValue out;
-      out.op = src;
-      out.type = ty;
-      push(g, out);
-    }
+  const Type* ty = sv_type(&lv);
+  /* IMM is a legal source for store; otherwise force the rvalue into a
+   * register. force_reg handles the lvalue → REG transition cleanly. */
+  Operand src;
+  if (rv.op.kind == OPK_IMM || rv.op.kind == OPK_REG) {
+    src = rv.op;
+  } else {
+    src = force_reg(g, &rv, sv_type(&rv));
   }
+  T->store(T, lv.op, src, mem_for_lvalue(g, &lv.op, ty));
+  release(g, &lv);
+  /* Result of assignment expression: leave the stored rvalue on top.
+   * Ownership of any reg in `src` transfers to the new SValue. */
+  push(g, make_sv(src, ty));
 }
 
 /* ============================================================
@@ -561,26 +870,32 @@ void cg_bitfield_store(CG* g, BitFieldAccess b) {
  * Arithmetic / compare / convert
  * ============================================================ */
 
-/* Force an SValue into a register operand of the given type. */
-static Operand force_reg(CG* g, SValue v, const Type* ty) {
+/* Force an SValue (already popped, by reference) into a register operand
+ * of the given type. Mutates `*v` so that v->op is OPK_REG and v->res is
+ * RES_REG; on lvalue inputs this means the original lvalue's base reg is
+ * freed and replaced by the freshly-loaded value reg. The caller can
+ * then release(g, v) to give the register back when the operation is
+ * done with it. */
+static Operand force_reg(CG* g, SValue* v, const Type* ty) {
   CGTarget* T = g->target;
-  if (v.op.kind == OPK_REG) return v.op;
-  if (v.op.kind == OPK_IMM) {
-    Reg r = T->alloc_reg(T, type_class(ty), ty);
-    Operand dst = op_reg(r, ty);
-    T->load_imm(T, dst, v.op.v.imm);
-    return dst;
-  }
-  if (is_lvalue(&v.op)) {
-    Reg r = T->alloc_reg(T, type_class(ty), ty);
-    Operand dst = op_reg(r, ty);
-    AliasKind ak = alias_for_lvalue(&v.op);
-    i32 al = (ak == ALIAS_LOCAL) ? (i32)v.op.v.frame_slot : 0;
-    MemAccess ma = derive_mem(g, ty, ak, al);
-    T->load(T, dst, v.op, ma);
-    return dst;
+  ensure_reg(g, v);
+  if (v->op.kind == OPK_REG) return v->op;
+  Reg r = alloc_reg_or_spill(g, type_class(ty), ty);
+  Operand dst = op_reg(r, ty);
+  if (v->op.kind == OPK_IMM) {
+    T->load_imm(T, dst, v->op.v.imm);
+  } else if (is_lvalue(&v->op)) {
+    T->load(T, dst, v->op, mem_for_lvalue(g, &v->op, ty));
+    /* Old INDIRECT base reg is no longer referenced — release it. */
+    if (v->op.kind == OPK_INDIRECT) {
+      T->free_reg(T, v->op.v.ind.base);
+    }
+  } else {
+    compiler_panic(g->c, g->cur_loc, "cg: cannot force operand to register");
   }
-  compiler_panic(g->c, g->cur_loc, "cg: cannot force operand to register");
+  v->op = dst;
+  v->res = RES_REG;
+  return dst;
 }
 
 void cg_binop(CG* g, BinOp op) {
@@ -590,33 +905,26 @@ void cg_binop(CG* g, BinOp op) {
   CGTarget* T = g->target;
   /* Result type is `a`'s type at this slice (parser already coerced). */
   const Type* ty = a.type ? a.type : b.type;
-  Operand ra = force_reg(g, a, ty);
-  Operand rb = force_reg(g, b, ty);
-  Reg rr = T->alloc_reg(T, type_class(ty), ty);
+  Operand ra = force_reg(g, &a, ty);
+  Operand rb = force_reg(g, &b, ty);
+  Reg rr = alloc_reg_or_spill(g, type_class(ty), ty);
   Operand dst = op_reg(rr, ty);
   T->binop(T, op, dst, ra, rb);
-  {
-    SValue sv;
-    sv.op = dst;
-    sv.type = ty;
-    push(g, sv);
-  }
+  release(g, &a);
+  release(g, &b);
+  push(g, make_sv(dst, ty));
 }
 
 void cg_unop(CG* g, UnOp op) {
   SValue a = pop(g);
   CGTarget* T = g->target;
   const Type* ty = a.type ? a.type : a.op.type;
-  Operand ra = force_reg(g, a, ty);
-  Reg rr = T->alloc_reg(T, type_class(ty), ty);
+  Operand ra = force_reg(g, &a, ty);
+  Reg rr = alloc_reg_or_spill(g, type_class(ty), ty);
   Operand dst = op_reg(rr, ty);
   T->unop(T, op, dst, ra);
-  {
-    SValue sv;
-    sv.op = dst;
-    sv.type = ty;
-    push(g, sv);
-  }
+  release(g, &a);
+  push(g, make_sv(dst, ty));
 }
 
 void cg_cmp(CG* g, CmpOp op) {
@@ -626,17 +934,14 @@ void cg_cmp(CG* g, CmpOp op) {
   CGTarget* T = g->target;
   const Type* opty = a.type ? a.type : b.type;
   const Type* i32 = type_prim(g->pool, TY_INT);
-  Operand ra = force_reg(g, a, opty);
-  Operand rb = force_reg(g, b, opty);
-  Reg rr = T->alloc_reg(T, RC_INT, i32);
+  Operand ra = force_reg(g, &a, opty);
+  Operand rb = force_reg(g, &b, opty);
+  Reg rr = alloc_reg_or_spill(g, RC_INT, i32);
   Operand dst = op_reg(rr, i32);
   T->cmp(T, op, dst, ra, rb);
-  {
-    SValue sv;
-    sv.op = dst;
-    sv.type = i32;
-    push(g, sv);
-  }
+  release(g, &a);
+  release(g, &b);
+  push(g, make_sv(dst, i32));
 }
 
 void cg_inc_dec(CG* g, BinOp op, int post) {
@@ -644,45 +949,31 @@ void cg_inc_dec(CG* g, BinOp op, int post) {
    * because juggling lv + old + new through dup/swap from outside requires
    * a 3-element rotate the stack API doesn't expose. */
   CGTarget* T = g->target;
-  SValue lv;
-  const Type* ty;
-  AliasKind ak;
-  i32 alias_local;
-  MemAccess ma;
-  Reg r_old;
-  Reg r_new;
-  Operand o_old;
-  Operand o_new;
-  Operand o_one;
-
-  lv = pop(g);
+  SValue lv = pop(g);
+  ensure_reg(g, &lv);
   if (!is_lvalue(&lv.op)) {
     compiler_panic(g->c, g->cur_loc,
                    "cg_inc_dec: target is not an lvalue");
   }
-  ty = lv.type ? lv.type : lv.op.type;
-  ak = alias_for_lvalue(&lv.op);
-  alias_local = (ak == ALIAS_LOCAL) ? (i32)lv.op.v.frame_slot : 0;
-  ma = derive_mem(g, ty, ak, alias_local);
+  const Type* ty = sv_type(&lv);
+  MemAccess ma = mem_for_lvalue(g, &lv.op, ty);
 
   /* Load current value into r_old, compute r_new = r_old +/- 1, store back. */
-  r_old = T->alloc_reg(T, type_class(ty), ty);
-  o_old = op_reg(r_old, ty);
+  Reg r_old = alloc_reg_or_spill(g, type_class(ty), ty);
+  Operand o_old = op_reg(r_old, ty);
   T->load(T, o_old, lv.op, ma);
 
-  r_new = T->alloc_reg(T, type_class(ty), ty);
-  o_new = op_reg(r_new, ty);
-  o_one = op_imm(1, ty);
-  T->binop(T, op, o_new, o_old, o_one);
+  Reg r_new = alloc_reg_or_spill(g, type_class(ty), ty);
+  Operand o_new = op_reg(r_new, ty);
+  T->binop(T, op, o_new, o_old, op_imm(1, ty));
 
   T->store(T, lv.op, o_new, ma);
 
-  {
-    SValue sv;
-    sv.op = post ? o_old : o_new;
-    sv.type = ty;
-    push(g, sv);
-  }
+  /* Free whichever register is NOT being returned, plus any base reg the
+   * lvalue owned. */
+  T->free_reg(T, post ? r_new : r_old);
+  release(g, &lv);
+  push(g, make_sv(post ? o_old : o_new, ty));
 }
 
 void cg_convert(CG* g, const Type* dst_ty) {
@@ -698,8 +989,8 @@ void cg_convert(CG* g, const Type* dst_ty) {
     push(g, v);
     return;
   }
-  src = force_reg(g, v, sty);
-  rr = T->alloc_reg(T, type_class(dst_ty), dst_ty);
+  src = force_reg(g, &v, sty);
+  rr = alloc_reg_or_spill(g, type_class(dst_ty), dst_ty);
   dst = op_reg(rr, dst_ty);
   /* Pick a ConvKind from src/dst kinds. v1 spine only sees integer↔integer
    * (sign/zero ext + trunc); float and bitcast follow the same dispatch. */
@@ -733,12 +1024,8 @@ void cg_convert(CG* g, const Type* dst_ty) {
     }
   }
   T->convert(T, ck, dst, src);
-  {
-    SValue sv;
-    sv.op = dst;
-    sv.type = dst_ty;
-    push(g, sv);
-  }
+  release(g, &v);
+  push(g, make_sv(dst, dst_ty));
 }
 
 /* ============================================================
@@ -749,60 +1036,51 @@ void cg_call(CG* g, u32 nargs, const Type* fn_type) {
   /* stack: [..., callee, arg0..argN-1] → [result] (or nothing if void) */
   CGTarget* T = g->target;
   const ABIFuncInfo* abi = abi_func_info(g->abi, fn_type);
-  CGABIValue* avs = NULL;
-  CGABIValue ret_v;
-  CGCallDesc desc;
-  Operand callee_op;
-  SValue callee;
-  u32 i;
+  const Type* ret_ty = fn_type->fn.ret;
+  int has_result = ret_ty && ret_ty->kind != TY_VOID;
 
   if (g->sp < (u32)nargs + 1u) {
     compiler_panic(g->c, g->cur_loc, "cg_call: stack underflow");
   }
+  CGABIValue* avs = NULL;
   if (nargs) {
     avs = arena_array(g->c->tu, CGABIValue, nargs);
     memset(avs, 0, sizeof(CGABIValue) * nargs);
   }
 
-  /* Pop args in reverse so we can fill avs[i] in declaration order. */
-  for (i = 0; i < nargs; ++i) {
+  /* Expose avs to the regalloc fallback. As we pop and materialize args
+   * one at a time, the popped regs accumulate in avs[] off the value
+   * stack, where pick_victim can't reach them. If pressure exhausts the
+   * pool while reloading a spilled arg later in the loop, spill_avs_victim
+   * picks an already-materialized avs entry, stores it to a frame slot,
+   * and rewrites avs[i].storage to OPK_LOCAL — the backend's call
+   * lowering loads from the slot. */
+  g->avs_in_flight = avs;
+  g->avs_in_flight_n = nargs;
+
+  /* Pop args in reverse so we can fill avs[i] in declaration order.
+   * Lvalues materialize into a register through force_reg (which also
+   * frees an old INDIRECT base); OPK_IMM and OPK_REG pass through so
+   * the call sees the same operand. */
+  for (u32 i = 0; i < nargs; ++i) {
     u32 idx = nargs - 1u - i;
     SValue arg = pop(g);
+    ensure_reg(g, &arg);
     const Type* aty = fn_type->fn.params ? fn_type->fn.params[idx] : arg.type;
-    Operand src;
-    /* Materialize into an Operand the backend can route through ABI parts.
-     * For simple scalars REG/IMM is enough; aggregates would force LOCAL. */
-    if (arg.op.kind == OPK_LOCAL || arg.op.kind == OPK_GLOBAL ||
-        arg.op.kind == OPK_INDIRECT) {
-      /* lvalue: backend may need an address (byval/indirect) or a loaded
-       * value. Spine: scalars only — load to register. */
-      Reg r = T->alloc_reg(T, type_class(aty), aty);
-      Operand dst = op_reg(r, aty);
-      AliasKind ak = alias_for_lvalue(&arg.op);
-      i32 al = (ak == ALIAS_LOCAL) ? (i32)arg.op.v.frame_slot : 0;
-      MemAccess ma = derive_mem(g, aty, ak, al);
-      T->load(T, dst, arg.op, ma);
-      src = dst;
-    } else if (arg.op.kind == OPK_IMM) {
-      src = arg.op;
-    } else {
-      src = arg.op;
-    }
     avs[idx].type = aty;
     avs[idx].abi = &abi->params[idx];
-    avs[idx].storage = src;
-    avs[idx].parts = NULL;
-    avs[idx].nparts = 0;
+    avs[idx].storage = is_lvalue(&arg.op) ? force_reg(g, &arg, aty) : arg.op;
   }
 
-  callee = pop(g);
-  if (callee.op.kind == OPK_GLOBAL) {
-    callee_op = callee.op;
-  } else {
-    /* Indirect call — force into a register if necessary. */
-    callee_op = force_reg(g, callee, fn_type);
-  }
+  SValue callee = pop(g);
+  ensure_reg(g, &callee);
+  /* Direct calls keep the OPK_GLOBAL operand; indirect calls force the
+   * function pointer into a register. */
+  Operand callee_op = (callee.op.kind == OPK_GLOBAL)
+                          ? callee.op
+                          : force_reg(g, &callee, fn_type);
 
+  CGCallDesc desc;
   memset(&desc, 0, sizeof desc);
   desc.fn_type = fn_type;
   desc.abi = abi;
@@ -810,24 +1088,29 @@ void cg_call(CG* g, u32 nargs, const Type* fn_type) {
   desc.args = avs;
   desc.nargs = nargs;
   desc.flags = CG_CALL_NONE;
-  /* Return storage: REG of the right class for scalar returns; struct
-   * returns would set parts/storage differently. */
-  memset(&ret_v, 0, sizeof ret_v);
-  ret_v.type = fn_type->fn.ret;
-  ret_v.abi = &abi->ret;
-  if (ret_v.type && ret_v.type->kind != TY_VOID) {
-    Reg r = T->alloc_reg(T, type_class(ret_v.type), ret_v.type);
-    ret_v.storage = op_reg(r, ret_v.type);
+  desc.ret.type = ret_ty;
+  desc.ret.abi = &abi->ret;
+  if (has_result) {
+    Reg r = alloc_reg_or_spill(g, type_class(ret_ty), ret_ty);
+    desc.ret.storage = op_reg(r, ret_ty);
   }
-  desc.ret = ret_v;
 
   T->call(T, &desc);
 
-  if (ret_v.type && ret_v.type->kind != TY_VOID) {
-    SValue sv;
-    sv.op = ret_v.storage;
-    sv.type = ret_v.type;
-    push(g, sv);
+  /* Tear down the in-flight arg set: each entry's storage may be a REG
+   * (return to pool) or OPK_LOCAL (a spill slot, return to per-class
+   * free-list). IMMs carry no runtime ownership. */
+  for (u32 i = 0; i < nargs; ++i) {
+    release_arg_storage(g, &avs[i].storage);
+  }
+  g->avs_in_flight = NULL;
+  g->avs_in_flight_n = 0;
+
+  if (callee.op.kind != OPK_GLOBAL) {
+    T->free_reg(T, callee_op.v.reg);
+  }
+  if (has_result) {
+    push(g, make_sv(desc.ret.storage, ret_ty));
   }
 }
 
@@ -848,13 +1131,14 @@ void cg_ret(CG* g, int has_value) {
   {
     SValue v = pop(g);
     const Type* rty = g->fn_ret_type;
-    Operand ret_op = force_reg(g, v, rty);
+    Operand ret_op = force_reg(g, &v, rty);
     CGABIValue av;
     memset(&av, 0, sizeof av);
     av.type = rty;
     av.abi = &abi->ret;
     av.storage = ret_op;
     T->ret(T, &av);
+    release(g, &v);
   }
 }
 
@@ -869,22 +1153,15 @@ void cg_alloca(CG* g) {
    * size up to a 16-byte multiple, which is what keeps SP aligned). */
   CGTarget* T = g->target;
   SValue sz = pop(g);
-  Operand sz_op;
   const Type* void_ptr = type_ptr(g->pool, type_void(g->pool));
-  Reg dst_r;
-  Operand dst;
-  SValue out;
-  if (sz.op.kind == OPK_IMM) {
-    sz_op = sz.op;
-  } else {
-    sz_op = force_reg(g, sz, sz.type ? sz.type : sz.op.type);
-  }
-  dst_r = T->alloc_reg(T, RC_INT, void_ptr);
-  dst = op_reg(dst_r, void_ptr);
+  ensure_reg(g, &sz);
+  Operand sz_op =
+      (sz.op.kind == OPK_IMM) ? sz.op : force_reg(g, &sz, sv_type(&sz));
+  Reg dst_r = alloc_reg_or_spill(g, RC_INT, void_ptr);
+  Operand dst = op_reg(dst_r, void_ptr);
   T->alloca_(T, dst, sz_op, /*align=*/16);
-  out.op = dst;
-  out.type = void_ptr;
-  push(g, out);
+  release(g, &sz);
+  push(g, make_sv(dst, void_ptr));
 }
 void cg_va_start_(CG* g) {
   compiler_panic(g->c, g->cur_loc, "cg_va_start: not in v1 slice");
@@ -945,9 +1222,10 @@ void cg_branch_true(CG* g, CGLabel l) {
   SValue v = pop(g);
   CGTarget* T = g->target;
   const Type* ty = v.type ? v.type : type_prim(g->pool, TY_INT);
-  Operand a = force_reg(g, v, ty);
+  Operand a = force_reg(g, &v, ty);
   Operand zero = op_imm(0, ty);
   T->cmp_branch(T, CMP_NE, a, zero, (Label)l);
+  release(g, &v);
 }
 
 void cg_branch_false(CG* g, CGLabel l) {
@@ -962,12 +1240,14 @@ void cg_branch_false(CG* g, CGLabel l) {
     if (v.op.v.imm == 0) {
       T->jump(T, (Label)l);
     }
+    release(g, &v);
     return;
   }
   {
-    Operand a = force_reg(g, v, ty);
+    Operand a = force_reg(g, &v, ty);
     Operand zero = op_imm(0, ty);
     T->cmp_branch(T, CMP_EQ, a, zero, (Label)l);
+    release(g, &v);
   }
 }
 
@@ -986,7 +1266,11 @@ CGScope cg_scope_begin(CG* g, CGScopeConfig cfg) {
     /* Pop the condition. */
     SValue v = pop(g);
     const Type* ty = v.type ? v.type : type_prim(g->pool, TY_INT);
-    d.cond = force_reg(g, v, ty);
+    d.cond = force_reg(g, &v, ty);
+    /* The cond reg is consumed by the backend's scope_begin emit; once
+     * the comparison/branch is in flight there's no live use, so free
+     * it back to the pool now. */
+    release(g, &v);
   }
   return (CGScope)g->target->scope_begin(g->target, &d);
 }
diff --git a/src/parse/parse.c b/src/parse/parse.c
@@ -53,7 +53,6 @@ extern void cg_retag_top(CG*, const Type*);
 /* Recycle the backend's scratch-register pool when no value-stack entry
  * holds a live register. Called at statement boundaries to avoid
  * exhausting the fixed scratch window over the course of a function. */
-extern void cg_reset_scratch(CG*);
 
 /* ============================================================
  * Keywords
@@ -1636,7 +1635,6 @@ static int parse_decl_suffix(Parser* p, DeclSuffix* out) {
         cg_swap(p->cg);
         cg_store(p->cg);
         cg_drop(p->cg);
-        cg_reset_scratch(p->cg);
         p->vla_pending = 1;
         p->vla_pending_count_slot = out->vla_count_slot;
       }
@@ -1823,7 +1821,6 @@ static void zero_init_at(Parser* p, FrameSlot slot, const Type* arr_ty,
   cg_push_int(p->cg, 0, ty);
   cg_store(p->cg);
   cg_drop(p->cg);
-  cg_reset_scratch(p->cg);
 }
 
 /* Parse the initializer for the sub-object at `offset` of type `ty`. Arrays
@@ -1862,7 +1859,6 @@ static void init_at(Parser* p, FrameSlot slot, const Type* arr_ty, u32 offset,
   to_rvalue(p);
   cg_store(p->cg);
   cg_drop(p->cg);
-  cg_reset_scratch(p->cg);
   if (had_brace) {
     accept_punct(p, ','); /* tolerate trailing comma inside `{ x, }` */
     expect_punct(p, '}', "'}' after scalar initializer");
@@ -1900,7 +1896,6 @@ static void parse_init_declarator(Parser* p, const DeclSpecs* specs) {
     cg_swap(p->cg);
     cg_store(p->cg);
     cg_drop(p->cg);
-    cg_reset_scratch(p->cg);
     if (accept_punct(p, '=')) {
       perr(p, "VLA initializers are not allowed (§6.7.9 ¶3)");
     }
@@ -2092,7 +2087,6 @@ static void parse_stmt(Parser* p) {
   /* Each statement starts from an empty value stack; recycle scratch
    * registers so a function body with many sequential reg-allocating
    * operations isn't bounded by the backend's fixed scratch window. */
-  cg_reset_scratch(p->cg);
   cg_set_loc(p->cg, tok_loc(&p->cur));
   if (is_punct(&p->cur, '{')) {
     parse_compound_stmt(p);
diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md
@@ -143,6 +143,7 @@ here for completeness once they're real cases.
 | `6_5_30_generic_selection`| · | `int x=42; return _Generic((x), int: x, default: 0);` | 42 |
 | `6_5_31_subscript_commute`| ★ | `int a[5]={0,0,42,0,0}; return 2[a];`     |  42 |
 | `6_5_32_string_subscript` | ★ | `return "*"[0];`                          |  42 |
+| `6_5_33_regalloc_spill`   | ★ | 12-arg `sum12(x1+0, ..., x12+0)` — exceeds the 10-INT scratch pool, exercises `spill_reg`/`reload_reg` and the cg_call avs-in-flight fallback (see doc/REGALLOC.md) | 78 |
 
 ## §6.6 Constant expressions
 
diff --git a/test/parse/cases/6_5_33_regalloc_spill.c b/test/parse/cases/6_5_33_regalloc_spill.c
@@ -0,0 +1,32 @@
+/* Regalloc spill stress.
+ *
+ * Each `xN + 0` arg is a binop that leaves a register-resident rvalue on
+ * the cg value stack until cg_call materializes it. With 12 simultaneously
+ * accumulating arg values and a 10-INT scratch pool on aarch64, several
+ * code paths the spill driver must handle get exercised:
+ *
+ *   - alloc_reg_or_spill returning REG_NONE on pool exhaustion and
+ *     picking the deepest unpinned RES_REG via pick_victim, then routing
+ *     it through spill_reg / take_spill_slot.
+ *   - ensure_reg reloading SPILLED args inside cg_call's arg-pop loop,
+ *     consuming pool slots as it goes.
+ *   - The avs-in-flight fallback: once cg_call has popped enough RES_REG
+ *     args into avs[] to drain the pool, alloc_reg_or_spill finds no
+ *     stack victim (later args are SPILLED, earlier args are off-stack
+ *     in avs[]) and falls back to spilling an OPK_REG avs entry through
+ *     spill_avs_victim, rewriting it to OPK_LOCAL so the backend's call
+ *     lowering loads from the spill slot.
+ *
+ * 1+2+...+12 = 78. */
+
+int sum12(int a, int b, int c, int d, int e, int f,
+          int g, int h, int i, int j, int k, int l) {
+  return a + b + c + d + e + f + g + h + i + j + k + l;
+}
+
+int test_main(void) {
+  int x1 = 1, x2 = 2, x3 = 3, x4 = 4, x5 = 5, x6 = 6;
+  int x7 = 7, x8 = 8, x9 = 9, x10 = 10, x11 = 11, x12 = 12;
+  return sum12(x1 + 0, x2 + 0, x3 + 0, x4 + 0, x5 + 0, x6 + 0,
+               x7 + 0, x8 + 0, x9 + 0, x10 + 0, x11 + 0, x12 + 0);
+}
diff --git a/test/parse/cases/6_5_33_regalloc_spill.expected b/test/parse/cases/6_5_33_regalloc_spill.expected
@@ -0,0 +1 @@
+78

	kit kit
	git clone https://git.ryansepassi.com/git/kit.git
	Log \| Files \| Refs \| README

A	doc/REGALLOC.md	\|	346	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M	src/arch/aarch64.c	\|	174	+++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------
M	src/cg/cg.c	\|	816	+++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------
M	src/parse/parse.c	\|	6	------
M	test/parse/CORPUS.md	\|	1	+
A	test/parse/cases/6_5_33_regalloc_spill.c	\|	32	++++++++++++++++++++++++++++++++
A	test/parse/cases/6_5_33_regalloc_spill.expected	\|	1	+