parse/cg: variadic lowering + FP arithmetic dispatch - kit

commit 74c6f214ef1dfd173158873ecd2893772e2a5b3a
parent 0385d963245ecd1c8563e0871e8a2348bb84e999
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 12:58:55 -0700

parse/cg: variadic lowering + FP arithmetic dispatch

Wire cg_va_start/_arg/_end/_copy through to the aarch64 backend hooks
(aa_va_*, already implemented for AAPCS64). Fix cg_call to mark
variadic args with abi=NULL so emit_arg_value falls into its
synthesized one-part DIRECT path instead of indexing past abi->params.
Aarch64 addr_base gains a SUB/ADD-imm fallback that materializes the
FP-relative address into a scratch reg when the offset exceeds the
±256 STUR/LDUR window — variadic functions trip this trivially because
the GP+FP save areas reserve 192 bytes before any declared local.
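
The offset tiers described above can be sketched as a small classifier. This is an illustration only: `plan_fp_offset` and the `addr_plan` enum are hypothetical names that do not appear in the commit, and the tiers are assumed to mirror the checks in `addr_base` and `emit_addr_adjust`. With 192 bytes taken by the save areas, only 64 bytes of locals fit before an offset leaves the direct window.

```c
#include <stdint.h>

/* Hypothetical classifier, not part of the commit: which strategy an
 * FP-relative offset would need, following the tiers in the diff. */
enum addr_plan {
  PLAN_DIRECT,    /* fits the signed 9-bit STUR/LDUR immediate */
  PLAN_ONE_IMM12, /* one ADD/SUB immediate (plain or shifted by 12) */
  PLAN_TWO_IMM12, /* shifted high part, then plain low part */
  PLAN_GENERIC    /* load the constant, then a register ADD */
};

static enum addr_plan plan_fp_offset(int32_t off) {
  if (off >= -256 && off <= 255) return PLAN_DIRECT;
  /* Unsigned negate so INT32_MIN does not overflow. */
  uint32_t abs_off = (off < 0) ? 0u - (uint32_t)off : (uint32_t)off;
  if (abs_off <= 0xfff) return PLAN_ONE_IMM12;
  if ((abs_off >> 24) == 0)
    return (abs_off & 0xfff) ? PLAN_TWO_IMM12 : PLAN_ONE_IMM12;
  return PLAN_GENERIC;
}
```

For example, a slot at offset -192 is still addressed directly, while one at -264 needs a single SUB into the scratch register first.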

Variadic exposed an unrelated parser bug: parse_mul / parse_add always
emitted BO_IADD/ISUB/IMUL/SDIV regardless of operand type, so
double + double lowered as integer ADD. Apply §6.3.1.8 usual
arithmetic conversions when either operand is floating, dispatching to
BO_FADD/FSUB/FMUL/FDIV. Modulo stays integer-only (C `%` requires
integer operands).
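
As a reference point for what the parser has to reproduce, the result types required by the usual arithmetic conversions can be observed from standard C11 with `_Generic`. The sketch below is independent of this codebase; `RESULT_TYPE`, `result_type`, and the three helper functions are made-up names for illustration, chosen to mirror the new 6_5_37 to 6_5_39 rows.

```c
/* Illustration only: report the type of an arithmetic expression as
 * the C standard defines it. Requires C11 for _Generic. */
enum result_type { RT_INT, RT_FLOAT, RT_DOUBLE, RT_LONG_DOUBLE, RT_OTHER };

#define RESULT_TYPE(e)            \
  _Generic((e),                   \
      int: RT_INT,                \
      float: RT_FLOAT,            \
      double: RT_DOUBLE,          \
      long double: RT_LONG_DOUBLE,\
      default: RT_OTHER)

/* int + double: the int operand converts to double. */
static enum result_type int_plus_double(void) {
  int n = 40;
  double d = 2.0;
  return RESULT_TYPE(n + d);
}

/* float + double: the float operand widens to double. */
static enum result_type float_plus_double(void) {
  float f = 20.0f;
  double d = 22.0;
  return RESULT_TYPE(f + d);
}

/* float * float: the result type stays float. */
static enum result_type float_times_float(void) {
  float a = 6.0f;
  float b = 7.0f;
  return RESULT_TYPE(a * b);
}
```

An integer-only expression such as `1 + 2` selects `RT_INT`, which corresponds to the case where the parser falls through to its existing integer dispatch.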

Drop the .skip sidecars on 6_7_6_08, 6_9_06, builtin_03, builtin_05;
they now pass D/R/E/J end to end. Add seven variadic_* rows pinning
zero-vararg, GP overflow into the stack tail, long/pointer/double/
mixed-class va_arg, and va_copy through a helper. Add four 6_5_3{6-9}
rows pinning the FP-arith dispatch.
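
For readers who want to try the same shapes against a host compiler, here is a condensed sketch of three of the pinned scenarios using the portable `<stdarg.h>` macros rather than the `__builtin_va_*` spellings the corpus uses. The function names are invented for this example and the bodies are not copies of the corpus files.

```c
#include <stdarg.h>

/* Sum n trailing ints. Works for n == 0 and for counts that spill
 * past the argument registers onto the stack. */
static int sum_ints(int n, ...) {
  va_list ap;
  va_start(ap, n);
  int s = 0;
  for (int i = 0; i < n; i++) s += va_arg(ap, int);
  va_end(ap);
  return s;
}

/* Walk the same list twice through an independent va_copy cursor. */
static int sum_twice(int n, ...) {
  va_list ap, ap2;
  va_start(ap, n);
  va_copy(ap2, ap);
  int s = 0;
  for (int i = 0; i < n; i++) s += va_arg(ap, int);
  for (int i = 0; i < n; i++) s += va_arg(ap2, int);
  va_end(ap2);
  va_end(ap);
  return s;
}

/* Consume npairs of (int, double) arguments, summing in double. */
static double sum_pairs(int npairs, ...) {
  va_list ap;
  va_start(ap, npairs);
  double s = 0.0;
  for (int i = 0; i < npairs; i++) {
    int k = va_arg(ap, int);
    double d = va_arg(ap, double);
    s += k + d;
  }
  va_end(ap);
  return s;
}
```

Calling `sum_ints(12, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 0)` uses the same argument list as `variadic_02_many_ints`, and `sum_pairs(2, 10, 11.0, 9, 12.0)` interleaves classes the way `variadic_06_mixed` does.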

Diffstat:
M doc/parser-status.md  | 51 ++++++++++++++++++++++++++++++++++++++++++---------
M src/arch/aarch64.c  | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
M src/cg/cg.c  | 50 +++++++++++++++++++++++++++++++++++++++++++-------
M src/parse/parse.c  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M test/parse/CORPUS.md  | 28 ++++++++++++++++++++++++----
A test/parse/cases/6_5_36_fp_arith.c  | 12 ++++++++++++
A test/parse/cases/6_5_36_fp_arith.expected  | 1 +
A test/parse/cases/6_5_37_fp_int_promote.c  | 9 +++++++++
A test/parse/cases/6_5_37_fp_int_promote.expected  | 1 +
A test/parse/cases/6_5_38_fp_float_widen.c  | 6 ++++++
A test/parse/cases/6_5_38_fp_float_widen.expected  | 1 +
A test/parse/cases/6_5_39_float_arith.c  | 6 ++++++
A test/parse/cases/6_5_39_float_arith.expected  | 1 +
D test/parse/cases/6_7_6_08_variadic_decl.skip  | 1 -
D test/parse/cases/6_9_06_variadic_func.skip  | 1 -
D test/parse/cases/builtin_03_va_list.skip  | 1 -
D test/parse/cases/builtin_05_va_copy.skip  | 1 -
A test/parse/cases/variadic_01_zero_args.c  | 12 ++++++++++++
A test/parse/cases/variadic_01_zero_args.expected  | 1 +
A test/parse/cases/variadic_02_many_ints.c  | 15 +++++++++++++++
A test/parse/cases/variadic_02_many_ints.expected  | 1 +
A test/parse/cases/variadic_03_long.c  | 13 +++++++++++++
A test/parse/cases/variadic_03_long.expected  | 1 +
A test/parse/cases/variadic_04_pointer.c  | 18 ++++++++++++++++++
A test/parse/cases/variadic_04_pointer.expected  | 1 +
A test/parse/cases/variadic_05_double.c  | 14 ++++++++++++++
A test/parse/cases/variadic_05_double.expected  | 1 +
A test/parse/cases/variadic_06_mixed.c  | 19 +++++++++++++++++++
A test/parse/cases/variadic_06_mixed.expected  | 1 +
A test/parse/cases/variadic_07_nested_call.c  | 26 ++++++++++++++++++++++++++
A test/parse/cases/variadic_07_nested_call.expected  | 1 +

31 files changed, 389 insertions(+), 41 deletions(-)
diff --git a/doc/parser-status.md b/doc/parser-status.md
@@ -436,19 +436,52 @@ Phase 9 also added:
     (`abi_aapcs64_va_list_type` is a 32-byte struct; Apple ARM64 is
     `char*`).
 
-Lowering gap: the `cg_va_*` and `cg_atomic_*` primitives panic at the
-cg layer ("not in v1 slice"). Parser routing is what Phase 9 owns;
-backend wiring is a follow-up. Until that lands, the corpus rows that
-exercise the runtime side carry `.skip` sidecars (`6_7_6_08`,
-`6_9_06`, `builtin_03`, `builtin_05`, `builtin_06`, `builtin_07`),
+Lowering gap: the `cg_atomic_*` primitives still panic at the cg layer
+("not in v1 slice"). The `cg_va_*` primitives now wire through to the
+backend's `va_start_` / `va_arg_` / `va_end_` / `va_copy_` hooks (the
+aarch64 backend already implemented all four against the AAPCS64 §6
+five-field `__va_list`); the only follow-up gap on the variadic side
+is the parser default-arg promotion when calling a variadic from a
+prototype-bearing call site (Phase 9 keeps `arg.type` verbatim, so a
+caller passing `1.0f` to `...` writes a 4-byte FP into v0 instead of
+an 8-byte FP — no corpus row hits this yet because the existing tests
+pass `1.0` directly).
+
+Variadic call lowering also needed `cg_call` to mark `idx >= nparams`
+arguments as variadic — `avs[idx].abi = NULL` — so the aarch64
+backend's `emit_arg_value` falls into the synthesized `va_pt` path
+(and on Apple ARM64 bumps `next_int=next_fp=8` to force stack
+placement). The prior code indexed `abi->params[idx]` for variadic
+args, reading off the end of the array. Aarch64's frame-relative
+load/store also gained an `addr_base` fallback that materializes the
+FP-relative address into a scratch register (`x9` or `x10`) when the
+offset exceeds STUR/LDUR's ±256 window — variadic functions hit this
+trivially because the GP+FP save areas reserve 192 bytes before any
+declared local.
+
+Atomics still carry `.skip` sidecars (`builtin_06`, `builtin_07`),
 treated as failure unless `CFREE_TEST_ALLOW_SKIP=1` — same convention
 as `6_7_2_12_long_double`.
 
 Unlocks (status as landed): `builtin_01_alloca` ★, `builtin_02_expect`
-★, `builtin_04_offsetof` ★. `builtin_03_va_list`, `builtin_05_va_copy`,
-`builtin_06_atomic_load`, `builtin_07_atomic_fetch_add`, `6_9_06`,
-`6_7_6_08` parse end-to-end but `.skip` pending the cg-layer wiring of
-`cg_va_*` / `cg_atomic_*`.
+★, `builtin_03_va_list` ★, `builtin_04_offsetof` ★, `builtin_05_va_copy`
+★, `6_9_06` ★, `6_7_6_08` ★, plus the new variadic-coverage rows
+`variadic_01_zero_args` … `variadic_07_nested_call` (zero-vararg call,
+overflow into the stack tail, `long`/pointer/`double`/mixed/`va_copy`
+through a helper). `builtin_06_atomic_load`,
+`builtin_07_atomic_fetch_add` remain `.skip` pending the cg-layer
+wiring of `cg_atomic_*`.
+
+Phase 9 also revealed and fixed an unrelated parser bug exposed by the
+new variadic FP rows: `parse_mul` and `parse_add` always emitted
+`BO_IADD`/`BO_ISUB`/`BO_IMUL`/`BO_SDIV` regardless of operand type, so
+`double + double` lowered as integer ADD. The parser now applies the
+§6.3.1.8 usual arithmetic conversions (FP slice) — when either operand
+is a floating type, both convert to the wider FP type via `cg_convert`
+and the op routes to `BO_FADD`/`FSUB`/`FMUL`/`FDIV`. `BO_SREM`/`BO_UREM`
+remain integer-only (C `%` requires integer operands). New rows
+`6_5_36_fp_arith`, `6_5_37_fp_int_promote`, `6_5_38_fp_float_widen`,
+`6_5_39_float_arith` pin the surface.
 
 ---
 
diff --git a/src/arch/aarch64.c b/src/arch/aarch64.c
@@ -1273,19 +1273,70 @@ static void emit_global_addr(CGTarget* t, u32 dst_reg, ObjSymId sym,
                     0);
 }
 
+/* Materialize a SUB/ADD imm sequence that puts (base ± abs_off) into Rd.
+ * Offsets below 1<<24 use one or two imm12 forms (imm12, imm12<<12, or
+ * both). Larger offsets fall back to MOV+ADD via emit_load_imm. */
+static void emit_addr_adjust(MCEmitter* mc, u32 Rd, u32 base, i32 off) {
+  if (off == 0) {
+    emit32(mc, aa64_mov_reg(1, Rd, base));
+    return;
+  }
+  u32 abs_off = (off < 0) ? 0u - (u32)off : (u32)off; /* safe for INT32_MIN */
+  /* Single imm12. */
+  if (abs_off <= 0xfff) {
+    if (off < 0)
+      emit32(mc, aa64_sub_imm(1, Rd, base, abs_off, 0));
+    else
+      emit32(mc, aa64_add_imm(1, Rd, base, abs_off, 0));
+    return;
+  }
+  /* Two-shift form: hi12 + lo12 (when low is zero, hi only). */
+  if ((abs_off >> 24) == 0) {
+    u32 hi = (abs_off >> 12) & 0xfff;
+    u32 lo = abs_off & 0xfff;
+    if (off < 0) {
+      if (hi) emit32(mc, aa64_sub_imm(1, Rd, base, hi, 1));
+      if (lo) emit32(mc, aa64_sub_imm(1, Rd, hi ? Rd : base, lo, 0));
+    } else {
+      if (hi) emit32(mc, aa64_add_imm(1, Rd, base, hi, 1));
+      if (lo) emit32(mc, aa64_add_imm(1, Rd, hi ? Rd : base, lo, 0));
+    }
+    return;
+  }
+  /* Generic: load constant into Rd, then add. */
+  emit_load_imm(mc, 1, Rd, off);
+  emit32(mc, aa64_add(1, Rd, base, Rd));
+}
+
 /* Resolve an address operand (LOCAL or INDIRECT) into (base_reg, signed
- * offset) via a possibly-temporary base register. Returns the base reg. */
+ * offset) via a possibly-temporary base register. Returns the base reg.
+ * Offsets outside the STUR/LDUR [-256, 255] window are materialized into
+ * tmp_reg; *out_off is then 0 and the caller uses the returned register. */
 static u32 addr_base(CGTarget* t, Operand addr, i32* out_off, u32 tmp_reg) {
   AAImpl* a = impl_of(t);
   if (addr.kind == OPK_LOCAL) {
     AASlot* s = slot_get(a, addr.v.frame_slot);
     if (!s) compiler_panic(t->c, a->loc, "aarch64 addr_base: bad slot");
-    *out_off = -(i32)s->off;
-    return 29; /* x29 = fp */
+    i32 off = -(i32)s->off;
+    if (off >= -256 && off <= 255) {
+      *out_off = off;
+      return 29; /* x29 = fp */
+    }
+    /* Out of STUR range — synthesize the address into tmp_reg. */
+    emit_addr_adjust(t->mc, tmp_reg, 29, off);
+    *out_off = 0;
+    return tmp_reg;
   }
   if (addr.kind == OPK_INDIRECT) {
-    *out_off = addr.v.ind.ofs;
-    return reg_num((Operand){.kind = OPK_REG, .v.reg = addr.v.ind.base});
+    i32 off = addr.v.ind.ofs;
+    u32 base = addr.v.ind.base & 0x1f;
+    if (off >= -256 && off <= 255) {
+      *out_off = off;
+      return base;
+    }
+    emit_addr_adjust(t->mc, tmp_reg, base, off);
+    *out_off = 0;
+    return tmp_reg;
   }
   if (addr.kind == OPK_GLOBAL) {
     emit_global_addr(t, tmp_reg, addr.v.global.sym, addr.v.global.addend);
@@ -1297,7 +1348,6 @@ static u32 addr_base(CGTarget* t, Operand addr, i32* out_off, u32 tmp_reg) {
 }
 
 static void aa_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma) {
-  AAImpl* a = impl_of(t);
   u32 sz = ma.size ? ma.size : type_byte_size(addr.type);
   u32 sidx = size_idx_for_bytes(sz);
 
@@ -1324,10 +1374,6 @@ static void aa_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma) {
 
   i32 off;
   u32 base = addr_base(t, addr, &off, 9);
-  if (off < -256 || off > 255) {
-    compiler_panic(t->c, a->loc, "aarch64 load: offset %d out of LDUR range",
-                   off);
-  }
   if (dst.cls == RC_FP) {
     emit32(t->mc, aa64_ldur_fp(sidx, reg_num(dst), base, off));
   } else {
@@ -1336,7 +1382,6 @@ static void aa_load(CGTarget* t, Operand dst, Operand addr, MemAccess ma) {
 }
 
 static void aa_store(CGTarget* t, Operand addr, Operand src, MemAccess ma) {
-  AAImpl* a = impl_of(t);
   u32 sz = ma.size ? ma.size : type_byte_size(addr.type);
   u32 sidx = size_idx_for_bytes(sz);
 
@@ -1377,11 +1422,11 @@ static void aa_store(CGTarget* t, Operand addr, Operand src, MemAccess ma) {
   }
 
   i32 off;
-  u32 base = addr_base(t, addr, &off, 9);
-  if (off < -256 || off > 255) {
-    compiler_panic(t->c, a->loc, "aarch64 store: offset %d out of STUR range",
-                   off);
-  }
+  /* For OPK_IMM source we need x9 to materialize the value, so the
+   * address synthesis (addr_base fallback) lands in x10. Otherwise x9
+   * is free. */
+  u32 addr_tmp = (src.kind == OPK_IMM) ? 10u : 9u;
+  u32 base = addr_base(t, addr, &off, addr_tmp);
 
   if (src.kind == OPK_IMM) {
     /* Materialize through a scratch register. Use x9 (caller-saved). */
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -1103,9 +1103,20 @@ void cg_call(CG* g, u32 nargs, const Type* fn_type) {
     u32 idx = nargs - 1u - i;
     SValue arg = pop(g);
     ensure_reg(g, &arg);
-    const Type* aty = fn_type->fn.params ? fn_type->fn.params[idx] : arg.type;
+    /* Variadic callees: idx >= nparams indexes into the trailing `...`,
+     * which has no entry in fn.params or abi->params. The type comes
+     * from the argument itself (already promoted by the parser per
+     * §6.5.2.2 ¶6) and the per-arg ABI is left NULL so the backend
+     * synthesizes a one-part DIRECT classification on the spot. */
+    int is_vararg = (idx >= abi->nparams);
+    const Type* aty;
+    if (is_vararg) {
+      aty = arg.type ? arg.type : sv_type(&arg);
+    } else {
+      aty = fn_type->fn.params ? fn_type->fn.params[idx] : arg.type;
+    }
     avs[idx].type = aty;
-    avs[idx].abi = &abi->params[idx];
+    avs[idx].abi = is_vararg ? NULL : &abi->params[idx];
     avs[idx].storage = is_lvalue(&arg.op) ? force_reg(g, &arg, aty) : arg.op;
   }
 
@@ -1200,18 +1211,43 @@ void cg_alloca(CG* g) {
   release(g, &sz);
   push(g, make_sv(dst, void_ptr));
 }
+/* Variadics. Parser pushes &ap (pointer rvalue) before each call; cg pops
+ * it as a register operand and forwards to the backend. va_arg additionally
+ * allocates a destination register typed by the requested arg type. */
 void cg_va_start_(CG* g) {
-  compiler_panic(g->c, g->cur_loc, "cg_va_start: not in v1 slice");
+  CGTarget* T = g->target;
+  SValue ap = pop(g);
+  Operand ap_op = force_reg(g, &ap, sv_type(&ap));
+  T->va_start_(T, ap_op);
+  release(g, &ap);
 }
 void cg_va_arg_(CG* g, const Type* t) {
-  (void)t;
-  compiler_panic(g->c, g->cur_loc, "cg_va_arg: not in v1 slice");
+  CGTarget* T = g->target;
+  SValue ap = pop(g);
+  Operand ap_op = force_reg(g, &ap, sv_type(&ap));
+  Reg dst_r = alloc_reg_or_spill(g, type_class(t), t);
+  Operand dst = op_reg(dst_r, t);
+  T->va_arg_(T, dst, ap_op, t);
+  release(g, &ap);
+  push(g, make_sv(dst, t));
 }
 void cg_va_end_(CG* g) {
-  compiler_panic(g->c, g->cur_loc, "cg_va_end: not in v1 slice");
+  CGTarget* T = g->target;
+  SValue ap = pop(g);
+  Operand ap_op = force_reg(g, &ap, sv_type(&ap));
+  T->va_end_(T, ap_op);
+  release(g, &ap);
 }
 void cg_va_copy_(CG* g) {
-  compiler_panic(g->c, g->cur_loc, "cg_va_copy: not in v1 slice");
+  CGTarget* T = g->target;
+  /* Parser pushes &dst then &src; pop src first. */
+  SValue src = pop(g);
+  SValue dst = pop(g);
+  Operand src_op = force_reg(g, &src, sv_type(&src));
+  Operand dst_op = force_reg(g, &dst, sv_type(&dst));
+  T->va_copy_(T, dst_op, src_op);
+  release(g, &src);
+  release(g, &dst);
 }
 void cg_setjmp(CG* g) {
   compiler_panic(g->c, g->cur_loc, "cg_setjmp: not in v1 slice");
diff --git a/src/parse/parse.c b/src/parse/parse.c
@@ -2554,6 +2554,51 @@ static void parse_unary(Parser* p) {
  * and a list of accepted operators with their codegen mapping. Inlined as
  * a single function per level to keep the call graph readable. */
 
+static int type_is_fp(const Type* t) {
+  return t && (t->kind == TY_FLOAT || t->kind == TY_DOUBLE ||
+               t->kind == TY_LDOUBLE);
+}
+
+/* §6.3.1.8 usual arithmetic conversions (FP slice). When either operand
+ * is FP, both convert to the wider FP type. When both are integer, the
+ * caller's existing integer dispatch handles it. Returns the common
+ * arithmetic type, or NULL if the parser should fall through to integer
+ * dispatch. */
+static const Type* common_fp_type(Parser* p, const Type* a, const Type* b) {
+  if (!type_is_fp(a) && !type_is_fp(b)) return NULL;
+  /* `long double` not yet wired through cg's FP path. */
+  if ((a && a->kind == TY_LDOUBLE) || (b && b->kind == TY_LDOUBLE)) {
+    return type_prim(p->pool, TY_LDOUBLE);
+  }
+  if ((a && a->kind == TY_DOUBLE) || (b && b->kind == TY_DOUBLE)) {
+    return type_prim(p->pool, TY_DOUBLE);
+  }
+  return type_prim(p->pool, TY_FLOAT);
+}
+
+/* Coerce the top two stack values to `common`, then dispatch the FP form
+ * of `bop` (BO_IADD→BO_FADD, etc.). */
+static void emit_fp_binop(Parser* p, BinOp bop, const Type* common) {
+  /* Convert top (rhs) first; cg_convert pops+pushes, leaving stack
+   * shape unchanged. Then swap, convert lhs, swap back so [lhs, rhs]
+   * land in the right order for cg_binop. */
+  if (cg_top_type(p->cg) != common) cg_convert(p->cg, common);
+  cg_swap(p->cg);
+  if (cg_top_type(p->cg) != common) cg_convert(p->cg, common);
+  cg_swap(p->cg);
+  BinOp fop;
+  switch (bop) {
+    case BO_IADD: fop = BO_FADD; break;
+    case BO_ISUB: fop = BO_FSUB; break;
+    case BO_IMUL: fop = BO_FMUL; break;
+    case BO_SDIV: fop = BO_FDIV; break;
+    default:
+      perr(p, "operator does not apply to floating types");
+      return;
+  }
+  cg_binop(p->cg, fop);
+}
+
 static void parse_mul(Parser* p) {
   parse_unary(p);
   for (;;) {
@@ -2572,7 +2617,14 @@ static void parse_mul(Parser* p) {
     to_rvalue(p);
     parse_unary(p);
     to_rvalue(p);
-    cg_binop(p->cg, bop);
+    const Type* lt = cg_top2_type(p->cg);
+    const Type* rt = cg_top_type(p->cg);
+    const Type* common = common_fp_type(p, lt, rt);
+    if (common) {
+      emit_fp_binop(p, bop, common);
+    } else {
+      cg_binop(p->cg, bop);
+    }
   }
 }
 
@@ -2628,6 +2680,11 @@ static void emit_add_or_sub(Parser* p, BinOp bop) {
       return;
     }
   }
+  const Type* common = common_fp_type(p, lt, rt);
+  if (common) {
+    emit_fp_binop(p, bop, common);
+    return;
+  }
   cg_binop(p->cg, bop);
 }
 
diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md
@@ -144,6 +144,10 @@ here for completeness once they're real cases.
 | `6_5_31_subscript_commute`| ★ | `int a[5]={0,0,42,0,0}; return 2[a];`     |  42 |
 | `6_5_32_string_subscript` | ★ | `return "*"[0];`                          |  42 |
 | `6_5_33_regalloc_spill`   | ★ | 12-arg `sum12(x1+0, ..., x12+0)` — exceeds the 10-INT scratch pool, exercises `spill_reg`/`reload_reg` and the cg_call avs-in-flight fallback (see doc/REGALLOC.md) | 78 |
+| `6_5_36_fp_arith`         | ★ | `(a + b) / b * c - 36.0` over `double` — pins parser dispatch to `BO_FADD`/`FSUB`/`FMUL`/`FDIV` | 42 |
+| `6_5_37_fp_int_promote`   | ★ | `int + double` — usual arithmetic conversion promotes the int side to `double` before BO_FADD | 42 |
+| `6_5_38_fp_float_widen`   | ★ | `float + double` — float widens to double before BO_FADD | 42 |
+| `6_5_39_float_arith`      | ★ | `float * float` stays at single precision (BO_FMUL with `type=0`) | 42 |
 
 ## §6.6 Constant expressions
 
@@ -244,7 +248,7 @@ already exercised in §6.5 and §6.7.
 | `6_7_6_05_funcptr_returning_ptr` | ★ | helper returns `int*`; `int *(*fp)(int*)=...; return *fp(&x);` | 42 |
 | `6_7_6_06_array_static_n`    | ★ | helper `int rd(int p[static 3]){return p[2];}`; `int a[3]={0,0,42}; return rd(a);` | 42 |
 | `6_7_6_07_vla_local`         | ★ | `int n=7; int a[n]; for(int i=0;i<n;i++) a[i]=i*7; return a[n-1];` |  42 |
-| `6_7_6_08_variadic_decl`     | · | helper `int sum(int n, ...)` summing two ints; `sum(2, 20, 22)` (`.skip` pending cg `cg_va_*`) |  42 |
+| `6_7_6_08_variadic_decl`     | ★ | helper `int sum(int n, ...)` summing two ints; `sum(2, 20, 22)` |  42 |
 
 ## §6.7.8 Type definitions
 
@@ -309,7 +313,7 @@ cover compound typedef targets.
 | `6_9_03_tentative_def`        | ★ | file-scope `int g;` (tentative) + use      |   0 |
 | `6_9_04_static_func`          | ★ | `static int helper(...)` + caller          |  42 |
 | `6_9_05_proto_then_def`       | ★ | forward declaration before body            |  42 |
-| `6_9_06_variadic_func`        | · | `sum(int n, ...)` over `va_arg`; `sum(2,20,22)` (paired with builtin_03; `.skip` pending cg `cg_va_*`) | 42 |
+| `6_9_06_variadic_func`        | ★ | `sum(int n, ...)` over `va_arg`; `sum(2,20,22)` (paired with builtin_03) | 42 |
 | `6_9_07_global_const`         | ★ | full TU: `const int g = 42; int test_main(void){return g;}` |  42 |
 | `6_9_08_global_struct_init`   | ★ | full TU: `struct S{int v;} g={42}; int test_main(void){return g.v;}` | 42 |
 | `6_9_09_static_data_array`    | ★ | full TU: `static int g[3] = {0, 0, 42}; int test_main(void){return g[2];}` | 42 |
@@ -326,13 +330,29 @@ ordinary calls.
 |---|---|---|---|
 | `builtin_01_alloca`           | ★ | `int *p = (int *)__builtin_alloca(4); *p=42; return *p;` |  42 |
 | `builtin_02_expect`           | ★ | `if (__builtin_expect(1, 1)) return 42; return 0;` |  42 |
-| `builtin_03_va_list`          | · | uses `__builtin_va_start`/`__builtin_va_arg`/`__builtin_va_end` summing three ints (`.skip` pending cg `cg_va_*`) | 42 |
+| `builtin_03_va_list`          | ★ | uses `__builtin_va_start`/`__builtin_va_arg`/`__builtin_va_end` summing three ints | 42 |
 | `builtin_04_offsetof`         | ★ | `struct S {int a, b;}; return (int)__builtin_offsetof(struct S, b) * 10 + 2;` | 42 |
-| `builtin_05_va_copy`          | · | walks varargs twice via `__builtin_va_copy` (`.skip` pending cg `cg_va_copy`) | 42 |
+| `builtin_05_va_copy`          | ★ | walks varargs twice via `__builtin_va_copy` | 42 |
 | `builtin_06_atomic_load`      | · | `int x = 42; return __atomic_load_n(&x, __ATOMIC_RELAXED);` (`.skip` pending cg `cg_atomic_load`) | 42 |
 | `builtin_07_atomic_fetch_add` | · | `int x = 40; __atomic_fetch_add(&x, 2, __ATOMIC_RELAXED); return x;` (`.skip` pending cg `cg_atomic_rmw`) | 42 |
 | `builtin_08_syscall0`         | (deferred) | `__cfree_syscall0` requires linking against the syscall stub; covered in `test/libc` | — |
 
+## Variadic coverage
+
+Extra rows pinning down the per-class routing on AAPCS64 — the GP and
+FP save areas are independent and each `va_arg` walks its class's
+cursor, so mixed and class-only sequences both need direct exercise.
+
+| Case | Status | Body | Expected |
+|---|---|---|---|
+| `variadic_01_zero_args`   | ★ | `sum(0)` — no `...` args; tests that `va_start` on an empty trailing list still produces a usable cursor | 42 |
+| `variadic_02_many_ints`   | ★ | 12 variadic ints — exhausts the GP save area (x1..x7) and forces the va_arg stack-overflow path | 42 |
+| `variadic_03_long`        | ★ | `va_arg(ap, long)` — 8-byte int via the GP save area | 42 |
+| `variadic_04_pointer`     | ★ | passes `int*` through `...` and reads them back via `va_arg(ap, int*)` | 42 |
+| `variadic_05_double`      | ★ | three `double`s through `...` — exercises the FP save area v0..v7 and 16-byte stride | 42 |
+| `variadic_06_mixed`       | ★ | interleaved int/double `...`; per-class cursors run independently | 42 |
+| `variadic_07_nested_call` | ★ | `va_copy` to a separate cursor that's passed by `va_list*` to a helper, then the original cursor walked again | 42 |
+
 ## Negative cases (cases_err/)
 
 | Case | Status | Surface | Notes |
diff --git a/test/parse/cases/6_5_36_fp_arith.c b/test/parse/cases/6_5_36_fp_arith.c
@@ -0,0 +1,12 @@
+/* FP arithmetic — addition, subtraction, multiplication, division on
+ * `double`. Before this fix, cfree lowered these through the integer
+ * BO_IADD path; the parser now dispatches BO_FADD/FSUB/FMUL/FDIV when
+ * either operand is a floating type. */
+int test_main(void) {
+  double a = 100.0;
+  double b = 4.0;
+  double c = 3.0;
+  /* (100 + 4) / 4 * 3 - 36 = 26*3 - 36 = 42 */
+  double r = (a + b) / b * c - 36.0;
+  return (int)r;
+}
diff --git a/test/parse/cases/6_5_36_fp_arith.expected b/test/parse/cases/6_5_36_fp_arith.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/6_5_37_fp_int_promote.c b/test/parse/cases/6_5_37_fp_int_promote.c
@@ -0,0 +1,9 @@
+/* §6.3.1.8 usual arithmetic conversions on a mixed FP/int operand pair.
+ * The integer side is promoted to the common FP type (here `double`)
+ * before BO_FADD runs. */
+int test_main(void) {
+  int n = 40;
+  double d = 2.0;
+  /* n is converted to double, then 40.0 + 2.0 = 42.0. */
+  return (int)(n + d);
+}
diff --git a/test/parse/cases/6_5_37_fp_int_promote.expected b/test/parse/cases/6_5_37_fp_int_promote.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/6_5_38_fp_float_widen.c b/test/parse/cases/6_5_38_fp_float_widen.c
@@ -0,0 +1,6 @@
+/* float + double widens float to double per §6.3.1.8. */
+int test_main(void) {
+  float f = 20.0f;
+  double d = 22.0;
+  return (int)(f + d);
+}
diff --git a/test/parse/cases/6_5_38_fp_float_widen.expected b/test/parse/cases/6_5_38_fp_float_widen.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/6_5_39_float_arith.c b/test/parse/cases/6_5_39_float_arith.c
@@ -0,0 +1,6 @@
+/* Pure-`float` arithmetic — both operands stay at single precision. */
+int test_main(void) {
+  float a = 6.0f;
+  float b = 7.0f;
+  return (int)(a * b);
+}
diff --git a/test/parse/cases/6_5_39_float_arith.expected b/test/parse/cases/6_5_39_float_arith.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/6_7_6_08_variadic_decl.skip b/test/parse/cases/6_7_6_08_variadic_decl.skip
@@ -1 +0,0 @@
-cg va_* primitives are stubs; parser routing for __builtin_va_* landed in Phase 9, lowering needs cg backend wiring
diff --git a/test/parse/cases/6_9_06_variadic_func.skip b/test/parse/cases/6_9_06_variadic_func.skip
@@ -1 +0,0 @@
-cg va_* primitives are stubs; parser routing for __builtin_va_* landed in Phase 9, lowering needs cg backend wiring
diff --git a/test/parse/cases/builtin_03_va_list.skip b/test/parse/cases/builtin_03_va_list.skip
@@ -1 +0,0 @@
-cg va_* primitives are stubs (cg_va_start/_arg/_end panic); parser routing landed in Phase 9, lowering needs cg backend wiring
diff --git a/test/parse/cases/builtin_05_va_copy.skip b/test/parse/cases/builtin_05_va_copy.skip
@@ -1 +0,0 @@
-cg va_* primitives are stubs (cg_va_copy panics); parser routing landed in Phase 9, lowering needs cg backend wiring
diff --git a/test/parse/cases/variadic_01_zero_args.c b/test/parse/cases/variadic_01_zero_args.c
@@ -0,0 +1,12 @@
+int sum(int n, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, n);
+  int s = 0;
+  for (int i = 0; i < n; i++) s += __builtin_va_arg(ap, int);
+  __builtin_va_end(ap);
+  return s + 42;
+}
+
+int test_main(void) {
+  return sum(0);
+}
diff --git a/test/parse/cases/variadic_01_zero_args.expected b/test/parse/cases/variadic_01_zero_args.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/variadic_02_many_ints.c b/test/parse/cases/variadic_02_many_ints.c
@@ -0,0 +1,15 @@
+/* More than 7 variadic ints: with `n` in x0, x1..x7 hold the first
+ * seven, so the rest exhaust the AAPCS64 GP save area and force va_arg
+ * onto the stack overflow path. */
+int sum(int n, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, n);
+  int s = 0;
+  for (int i = 0; i < n; i++) s += __builtin_va_arg(ap, int);
+  __builtin_va_end(ap);
+  return s;
+}
+
+int test_main(void) {
+  return sum(12, 1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 0);
+}
diff --git a/test/parse/cases/variadic_02_many_ints.expected b/test/parse/cases/variadic_02_many_ints.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/variadic_03_long.c b/test/parse/cases/variadic_03_long.c
@@ -0,0 +1,13 @@
+/* va_arg for a non-int integer width: long (8 bytes on AAPCS64). */
+long sum_long(int n, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, n);
+  long s = 0;
+  for (int i = 0; i < n; i++) s += __builtin_va_arg(ap, long);
+  __builtin_va_end(ap);
+  return s;
+}
+
+int test_main(void) {
+  return (int)sum_long(3, 10L, 20L, 12L);
+}
diff --git a/test/parse/cases/variadic_03_long.expected b/test/parse/cases/variadic_03_long.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/variadic_04_pointer.c b/test/parse/cases/variadic_04_pointer.c
@@ -0,0 +1,18 @@
+/* Variadic pointer args — cfree must route them through the GP save area
+ * just like ints, and va_arg(ap, int*) must yield the pointer back. */
+int sum_through_ptrs(int n, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, n);
+  int s = 0;
+  for (int i = 0; i < n; i++) {
+    int* p = __builtin_va_arg(ap, int*);
+    s += *p;
+  }
+  __builtin_va_end(ap);
+  return s;
+}
+
+int test_main(void) {
+  int a = 10, b = 20, c = 12;
+  return sum_through_ptrs(3, &a, &b, &c);
+}
diff --git a/test/parse/cases/variadic_04_pointer.expected b/test/parse/cases/variadic_04_pointer.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/variadic_05_double.c b/test/parse/cases/variadic_05_double.c
@@ -0,0 +1,14 @@
+/* Variadic double args — exercises the AAPCS64 FP save area (v0..v7)
+ * and the va_arg FP path (16-byte stride into the FP region). */
+double sum_d(int n, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, n);
+  double s = 0.0;
+  for (int i = 0; i < n; i++) s = s + __builtin_va_arg(ap, double);
+  __builtin_va_end(ap);
+  return s;
+}
+
+int test_main(void) {
+  return (int)sum_d(3, 10.0, 20.0, 12.0);
+}
diff --git a/test/parse/cases/variadic_05_double.expected b/test/parse/cases/variadic_05_double.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/variadic_06_mixed.c b/test/parse/cases/variadic_06_mixed.c
@@ -0,0 +1,19 @@
+/* Mixed integer + double variadic args. AAPCS64 routes ints through the
+ * GP save area and doubles through the FP save area independently — the
+ * caller's argument order doesn't determine the save-slot order, only
+ * the per-class cursor does. Sums in FP to also exercise int→double
+ * usual-arithmetic conversion. */
+double mixed(int n, ...) {
+  __builtin_va_list ap;
+  __builtin_va_start(ap, n);
+  int i1 = __builtin_va_arg(ap, int);
+  double d1 = __builtin_va_arg(ap, double);
+  int i2 = __builtin_va_arg(ap, int);
+  double d2 = __builtin_va_arg(ap, double);
+  __builtin_va_end(ap);
+  return i1 + d1 + i2 + d2;
+}
+
+int test_main(void) {
+  return (int)mixed(4, 10, 11.0, 9, 12.0);
+}
diff --git a/test/parse/cases/variadic_06_mixed.expected b/test/parse/cases/variadic_06_mixed.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/variadic_07_nested_call.c b/test/parse/cases/variadic_07_nested_call.c
@@ -0,0 +1,26 @@
+/* va_copy lets a caller make a second walk of the same arg list — here
+ * we use it to call a helper from inside the variadic function while
+ * preserving the original cursor for additional consumption. */
+int sum_n(int n, __builtin_va_list* ap) {
+  int s = 0;
+  for (int i = 0; i < n; i++) s += __builtin_va_arg(*ap, int);
+  return s;
+}
+
+int outer(int n, ...) {
+  __builtin_va_list ap, ap_for_helper;
+  __builtin_va_start(ap, n);
+  __builtin_va_copy(ap_for_helper, ap);
+  int helper_total = sum_n(n, &ap_for_helper);
+  __builtin_va_end(ap_for_helper);
+  int local_total = 0;
+  for (int i = 0; i < n; i++) local_total += __builtin_va_arg(ap, int);
+  __builtin_va_end(ap);
+  return helper_total + local_total - 42;
+}
+
+int test_main(void) {
+  /* helper sums to 21, local sums to 21, +21+21-42 = 0 → return 42 via
+   * the +42 below. */
+  return outer(3, 5, 7, 9) + 42;
+}
diff --git a/test/parse/cases/variadic_07_nested_call.expected b/test/parse/cases/variadic_07_nested_call.expected
@@ -0,0 +1 @@
+42
