cg: extract call-shape helpers and central storage-shape predicate - kit

commit 889ad29eec3c9e6ab03bc7592c419335285b2463
parent 96469fa516e26b1e76409a026678a264cb06a928
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Tue, 19 May 2026 15:29:41 -0700

cg: extract call-shape helpers and central storage-shape predicate

Preparatory refactor for an ABI-driven storage-shape decision (see
doc/CBACKEND.md). Adds api_arg_storage_must_be_addr as the single
predicate consulted by api_release_arg_storage and the new
api_alloc_call_ret_storage helper. Dedupes cfree_cg_call and
api_call_symbol_common via api_alloc_call_args, api_pack_call_arg,
api_alloc_call_ret_storage, api_release_call_args, and
api_push_call_result. Behavior preserving: test-cg-api (610), test-opt,
test-toy, test-smoke-x64 all green.

Diffstat:
A doc/CBACKEND.md  | 298 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/api/cg.c  | 239 ++++++++++++++++++++++++++++++++++++-------------------------------------------

2 files changed, 406 insertions(+), 131 deletions(-)
diff --git a/doc/CBACKEND.md b/doc/CBACKEND.md
@@ -0,0 +1,298 @@
+# C Source Backend and ABI Storage-Shape Refactor
+
+## Motivation
+
+cfree's no-deps posture rules out linking against LLVM or GCC's optimizer
+directly. The practical path to "industrial-strength" optimization for cfree
+users is to emit C from the CG layer and hand the result to gcc/clang. A C
+backend lives at the same layer as `arch_impl_aa64`, `arch_impl_x64`, etc.: a
+new `arch_impl_c` with its own `CGTarget` and ABI vtable. Frontends do not need
+to know it exists.
+
+GCC/clang-extension C covers what looked like blockers on first read:
+
+- inline asm — `IRAsmAux` is already GCC's `asm(tmpl : outs : ins : clobbers)` shape.
+- overflow/trap — `__builtin_{add,sub,mul}_overflow`, `__builtin_trap`,
+  `__builtin_unreachable`.
+- atomics — `_Atomic` + `<stdatomic.h>` with explicit `memory_order_*`.
+- TLS — `__thread` or `_Thread_local`.
+
+The real blocker is one layer up: the CG layer makes aggregate-passing
+decisions that bypass the ABI vtable. A trivial "C ABI" — every arg
+`ABI_ARG_DIRECT` with one full-coverage part — would still see the CG layer
+materialize aggregates as addresses and allocate sret slots.
+
+This document plans the refactor that makes those decisions ABI-driven, so a
+trivial C ABI vtable produces value-shaped storage suitable for emitting
+`ret = f(a, b, c)` C source.
+
+## Current State
+
+### ABI Vtable Selection
+
+Native ABIs already classify small aggregates as `ABI_ARG_DIRECT` with multiple
+parts (e.g. SysV-x64 splits a 16B struct into two `ABI_CLASS_INT` parts in
+`src/abi/abi_sysv_x64.c:53-71`). Large aggregates classify as
+`ABI_ARG_INDIRECT`. The ABI vtable is selected per-target via
+`ArchImpl.abi_vtable` (`src/arch/arch.h:899`) and dispatched through
+`abi_init` → `select_vtable` (`src/abi/abi.c:176`).
+
+### Preparatory Refactors (landed)
+
+Two preparatory passes shaped `src/api/cg.c` so the functional change can be
+small and confined to a couple of helper bodies:
+
+**Prep A — central predicate.** Added a single helper that today encodes the
+type-shape decision; the Phase 1 change needs to rewrite only its body.
+
+```c
+/* src/api/cg.c:1323 */
+static int api_arg_storage_must_be_addr(Compiler *c, CfreeCgTypeId ty) {
+  return cg_type_is_aggregate(c, ty) || api_is_wide16_scalar_type(c, ty);
+}
+```
+
+Used by `api_release_arg_storage` (`src/api/cg.c:2129`) and
+`api_alloc_call_ret_storage` (`src/api/cg.c:6408`).
+
+**Prep B — call-shape helpers.** `cfree_cg_call` and `api_call_symbol_common`
+shared a ~80-line duplicated body. Extracted five helpers and reduced both
+public entry points to thin orchestration:
+
+| Helper                          | Location              | Role                                       |
+| ------------------------------- | --------------------- | ------------------------------------------ |
+| `api_alloc_call_args`           | `src/api/cg.c:6363`   | `avs` array + `avs_in_flight` setup        |
+| `api_pack_call_arg`             | `src/api/cg.c:6374`   | per-arg type resolution + 3-way packaging  |
+| `api_alloc_call_ret_storage`    | `src/api/cg.c:6406`   | return slot vs return register             |
+| `api_release_call_args`         | `src/api/cg.c:6424`   | post-call release loop                     |
+| `api_push_call_result`          | `src/api/cg.c:6432`   | lv/sv push based on storage kind           |
+
+After Prep A+B, the CG-side surface area that needs to change is reduced to
+two helper bodies and one as-yet-unextracted ret packaging function.
+
+### Remaining Predicate Sites
+
+After the prep refactors, the type-shape decisions that still need to become
+ABI-driven live in just three places:
+
+1. **`api_arg_storage_must_be_addr`** (`src/api/cg.c:1323`) — the central
+   predicate consulted by `api_release_arg_storage` and
+   `api_alloc_call_ret_storage`. Today: `is_aggregate || wide16`.
+2. **`api_pack_call_arg`** (`src/api/cg.c:6374`) — per-arg packaging still
+   has a three-way switch (`api_is_wide16_scalar_type` at line 6387,
+   `cg_type_is_aggregate` at 6392, scalar fall-through). The three branches
+   collapse to "address-shaped" vs "value-shaped" under ABI control.
+3. **`cfree_cg_ret`** (`src/api/cg.c:6636`) — ret packaging still has the
+   same three-way switch inline (`is_aggregate` at 6654, `wide16` at 6662).
+   Not extracted yet because Prep B's scope was call/call_symbol dedupe.
+
+Together these three are the entire functional surface for Phase 1.
+
+## Refactor Plan
+
+### Invariant to Introduce
+
+`CGABIValue.storage` shape is determined by an ABI helper, not by
+`cg_type_is_aggregate`:
+
+```c
+/* In src/abi/abi.h */
+typedef enum ABIStorageShape {
+  ABI_STORAGE_VALUE,  /* storage is the value itself (REG/IMM/GLOBAL/LOCAL) */
+  ABI_STORAGE_ADDR,   /* storage is the address of a memory image */
+} ABIStorageShape;
+
+ABIStorageShape abi_arg_storage_shape(const ABIArgInfo*, u32 type_size);
+```
+
+Rule:
+
+- `ABI_STORAGE_ADDR` iff `kind == ABI_ARG_INDIRECT`, **or**
+  `kind == ABI_ARG_DIRECT && (nparts > 1 || parts[0].src_offset != 0 ||
+  parts[0].size != type_size)`.
+- Otherwise `ABI_STORAGE_VALUE`.
+
+This makes today's native behavior fall out unchanged: small structs
+(multi-part DIRECT) → ADDR; large structs (INDIRECT) → ADDR. Only a
+trivial DIRECT — one part, full coverage, zero offset — produces VALUE,
+which is exactly what the C target will register.
+
+### Touch List
+
+**`src/abi/abi.h` / `src/abi/abi.c`** — add `ABIStorageShape` enum and
+`abi_arg_storage_shape()`.
+
+**`src/api/cg.c`** — rewrite three function bodies:
+
+| Site                              | Location              | Change                                                                 |
+| --------------------------------- | --------------------- | ---------------------------------------------------------------------- |
+| `api_arg_storage_must_be_addr`    | `src/api/cg.c:1323`   | body becomes `abi_arg_storage_shape(abi, size) == ABI_STORAGE_ADDR`    |
+| `api_pack_call_arg`               | `src/api/cg.c:6374`   | collapse 3-way switch to `must_be_addr`-driven materialization         |
+| `cfree_cg_ret`                    | `src/api/cg.c:6636`   | same collapse, OR pre-extract a `api_pack_ret_value` helper first      |
+
+Optionally extract `api_pack_ret_value` from `cfree_cg_ret` as Prep C before
+the functional change, so the three-way collapse lives in helper bodies
+rather than mid-public-function. Small, mechanical, ~20 LOC.
+
+**`src/abi/abi_sysv_x64.c`, `abi_aapcs64.c`, `abi_apple_arm64.c`,
+`abi_rv64.c`** — extend `classify_one`/`classify_scalar` to classify wide16
+scalars (i128, long double) as `ABI_ARG_DIRECT` with multi-part shape where
+they do not already (RV64 does today). See Risks and Phase 2 below.
+
+**Untouched** — other `cg_type_is_aggregate` sites in `cg.c` (lines 1754,
+1782, 3823ff, 3945ff, 5094, 5103). Those handle assignments, lvalue
+conversion, and address-of. They are correctly aggregate-policy, not
+ABI-policy.
+
+**Native backends** — no expected change. They already consult
+`desc.args[i].abi` and the invariant preserves what they see today.
+
+## Phasing
+
+### Phase 0 (done) — preparatory refactors
+
+- Prep A: central `api_arg_storage_must_be_addr` predicate.
+- Prep B: extract `api_alloc_call_args`, `api_pack_call_arg`,
+  `api_alloc_call_ret_storage`, `api_release_call_args`,
+  `api_push_call_result`.
+- Verified by `test-cg-api` (610 pass), `test-opt`, `test-toy`,
+  `test-smoke-x64`.
+
+### Phase 1 — Helper bodies become ABI-driven
+
+- Add `abi_arg_storage_shape()` in `abi.h`/`abi.c`.
+- Rewrite `api_arg_storage_must_be_addr` body to delegate to the new helper
+  (needs `const ABIArgInfo*` and `type_size` — adjust the helper signature
+  accordingly, and pass them through from `api_pack_call_arg` /
+  `api_alloc_call_ret_storage` / `api_release_arg_storage`).
+- Collapse the three-way switches in `api_pack_call_arg` and `cfree_cg_ret`
+  (or extracted `api_pack_ret_value`) into a single `must_be_addr` branch.
+
+Acceptance: `make test-cg-api test-opt test-link test-elf test-toy
+test-smoke-x64 test-smoke-rv64 test-aa64-inline` pass; spot-check `.o`
+outputs on a representative corpus against the current state to confirm
+byte-identical codegen for native ABIs.
+
+### Phase 2 — Migrate wide16 to ABI classification
+
+Today `api_is_wide16_scalar_type` papers over incomplete ABI classifiers in
+some native targets (see Risks below). Phase 2 fixes the classifiers, then
+removes the wide16-specific code path from the predicate.
+
+- Fix SysV-x64 `classify_scalar` to emit DIRECT/2-INT-parts for
+  `ti.size == 16 && ti.scalar_kind == ABI_SC_INT` (the i128 case),
+  matching what RV64 and AAPCS64 already do.
+- Defer long-double-as-FP correctness — long double passes through
+  memory in current cfree even on native targets, and the existing
+  wide16 shortcut effectively forces that. Either retain the
+  `is_wide16` check just for long-double cases (narrow the branch),
+  or introduce a dedicated x87 / 16B-FP ABI class as a separate piece
+  of work. The C-backend refactor does not require this fix.
+- After classifiers are correct, drop the `api_is_wide16_scalar_type`
+  clause from `api_arg_storage_must_be_addr`.
+
+Acceptance: same as Phase 1, plus `test-libc` (long double through
+musl/glibc paths) and any i128 coverage.
+
+### Phase 3 — Negative-control fixture
+
+Add a unit test in `test/api/` that constructs a `Compiler` with a
+synthetic ABI vtable returning trivial DIRECT/one-full-part for everything,
+drives `cfree_cg_call` with an aggregate arg and aggregate return, and
+asserts `desc.args[0].storage.kind != OPK_INDIRECT` and that no sret frame
+slot was allocated.
+
+This fixture locks in the new invariant so future changes cannot
+accidentally regress to always-address-for-aggregate.
+
+### Phase 4 (out of scope here, but enabled by this work)
+
+- Add `arch_impl_c` with a `c_abi_vtable` whose `compute_func_info` returns
+  trivial DIRECT/one-full-part for every arg and return.
+- Stub `cgtarget_new` to a placeholder that records call/ret shapes for
+  inspection.
+- The actual C-source emitter is a separate piece of work, driven by the
+  recorded `CGCallDesc` shape that this refactor now makes value-typed for
+  aggregates.
+
+## Risks and Open Items
+
+Investigated post-plan and after the prep refactors:
+
+- **`api_release_arg_storage` (resolved by Prep A).**
+  Originally a fifth open site; now uses `api_arg_storage_must_be_addr`
+  directly (`src/api/cg.c:2129`). Resolution: same helper drives the
+  decision.
+
+- **`call_symbol` duplication (resolved by Prep B).**
+  Both `cfree_cg_call` and `api_call_symbol_common` now share the five
+  extracted helpers and contain only the call-shape orchestration. Drift
+  is no longer a maintenance concern.
+
+- **`fn_abi` is reliably non-null inside a function body.**
+  Set at `cfree_cg_func_begin` (`src/api/cg.c:3125`) and cleared at
+  `func_end` (line 3149). `cfree_cg_ret` only runs within that window.
+  No null-safe fallback needed.
+
+- **CGCallPlan / backends are already fully ABI-driven.**
+  Grep across `src/arch/` finds zero `cg_type_is_aggregate` references.
+  Every site branches on `ai->kind == ABI_ARG_INDIRECT` or iterates
+  `ai->parts`. Examples: `arch/aa64/ops.c:904`, `arch/x64/alloc.c:54`,
+  `arch/rv64/opt_coord.c:178`. The new invariant preserves what native
+  backends observe (multi-part DIRECT aggregates still produce ADDR
+  storage), so backends do not change.
+
+- **Wide16 classification is incomplete in some native ABIs — this is
+  the biggest finding and Phase 2's largest hidden cost.**
+  Today the wide16 check in `api_arg_storage_must_be_addr` papers over
+  bugs in the underlying ABI classifiers. Per-target status:
+
+  - **RV64** (`src/abi/abi_rv64.c:23-43`): correctly classifies 16B
+    INT or FLOAT scalars as DIRECT with two 8B INT parts. ✓
+  - **AAPCS64** (`src/abi/abi_aapcs64.c:23-39`): correctly classifies
+    16B INT scalars (i128) as DIRECT/2-parts. **Missing**: 16B FP
+    (long double on ARM64) should be DIRECT with one or two FP parts
+    in Q-registers, rather than falling through to a single 16B INT part.
+  - **Apple ARM64** (`src/abi/abi_apple_arm64.c`): delegates to AAPCS64;
+    inherits the same long-double gap.
+  - **SysV-x64** (`src/abi/abi_sysv_x64.c:28-44`): **no 16B branch
+    at all**. i128 currently falls through to a single 16B INT part —
+    malformed because no GPR can hold it. Long double is 80-bit x87
+    with 16B alignment and needs a target-specific class entirely.
+    The wide16 clause in `api_arg_storage_must_be_addr` hides both bugs
+    by always routing wide16 through a memory image.
+
+  **Consequence**: if Phase 2 drops the wide16 clause before fixing the
+  SysV-x64 and AAPCS64 classifiers, native codegen breaks. The new
+  `abi_arg_storage_shape` would compute VALUE for a malformed single-part
+  DIRECT (one part, `src_offset==0, size==type_size==16`), but no Operand
+  kind can hold a 16B value.
+
+- **HFA / HVA in AAPCS64**: the existing classifier explicitly defers
+  HFA refinement (see comment at `src/abi/abi_aapcs64.c:9` and `:68-69`).
+  Small aggregates today classify uniformly as DIRECT/INT-parts. Wide16
+  classification (i128) does not collide with HFA logic because the two
+  enter `classify_one` through disjoint type kinds (RECORD vs scalar).
+  Confirmed safe.
+
+- **`tail` interaction**: the tail-call path
+  (`src/api/cg.c:6500-6501`) calls `api_regalloc_finish` before
+  `T->call`, which can mutate live storage state. The storage-shape
+  helper is queried per-arg during pre-call packaging, before this
+  finish call, so the decision sequencing is unchanged. No additional
+  risk.
+
+## Estimated Size
+
+- Phase 0 (done): Prep A (~25 LOC) + Prep B (~95 LOC of helpers, ~120 LOC
+  of duplicate body deleted from `cfree_cg_call` / `api_call_symbol_common`).
+- Phase 1: ~20 LOC for `abi_arg_storage_shape` + rewriting three function
+  bodies in `cg.c` (signature changes to thread `ABIArgInfo*` + size into
+  the helpers).
+- Optional Prep C (extract `api_pack_ret_value` from `cfree_cg_ret`): ~20 LOC.
+- Phase 2a (i128 classification fix): ~30 LOC in `abi_sysv_x64.c` +
+  removing the i128 path from the wide16 clause. ~50 LOC total.
+- Phase 2b (long-double, optional / deferable): not required for the C
+  backend. Treat as separate work.
+- Phase 3 (negative-control fixture): one ~150 LOC test file.
+- Total remaining for C-backend prerequisite: ~250 LOC, no public API change.
diff --git a/src/api/cg.c b/src/api/cg.c
@@ -1314,6 +1314,16 @@ static int api_is_wide16_scalar_type(Compiler *c, CfreeCgTypeId ty) {
   return api_is_f128_type(c, ty) || api_is_i128_type(c, ty);
 }
 
+/* Whether a CGABIValue.storage for `ty` must be an address operand (pointing
+ * to a memory image of the value) rather than a value operand. Today this is
+ * driven by the type shape — aggregates and wide16 scalars cannot fit in a
+ * single Operand. A future refactor will key this off ABIArgInfo so a
+ * trivial-DIRECT ABI (e.g. for a C-source backend) can keep aggregates as
+ * value operands. See doc/CBACKEND.md. */
+static int api_arg_storage_must_be_addr(Compiler *c, CfreeCgTypeId ty) {
+  return cg_type_is_aggregate(c, ty) || api_is_wide16_scalar_type(c, ty);
+}
+
 static Operand api_op_imm(i64 v, CfreeCgTypeId ty) {
   Operand o;
   memset(&o, 0, sizeof o);
@@ -2116,7 +2126,7 @@ static void api_release_arg_storage(CfreeCg *g, Operand *storage) {
     api_free_reg(g, storage->v.reg, storage->cls);
   } else if (storage->kind == OPK_LOCAL && storage->cls < 3) {
     CfreeCgTypeId ty = storage->type;
-    if (cg_type_is_aggregate(g->c, ty) || api_is_wide16_scalar_type(g->c, ty))
+    if (api_arg_storage_must_be_addr(g->c, ty))
       return;
     api_return_spill_slot(g, storage->v.frame_slot, storage->cls);
   } else if (storage->kind == OPK_INDIRECT) {
@@ -6342,6 +6352,93 @@ void cfree_cg_field(CfreeCg *g, uint32_t field_index) {
  * Calls / return
  * ============================================================ */
 
+/* Shared scaffolding for cfree_cg_call / cfree_cg_call_symbol. The two
+ * public entry points differ only in how the callee is obtained and in
+ * their pre-call stack-depth check; everything else (arg packaging, return
+ * storage allocation, post-call release, result push) is identical. These
+ * helpers carry the common shape and are the natural targets for any future
+ * change that wants to vary call-shape policy (e.g. an ABI-driven storage
+ * decision). */
+
+static CGABIValue *api_alloc_call_args(CfreeCg *g, u32 nargs) {
+  CGABIValue *avs = NULL;
+  if (nargs) {
+    avs = arena_array(g->c->tu, CGABIValue, nargs);
+    memset(avs, 0, sizeof(CGABIValue) * nargs);
+  }
+  g->avs_in_flight = avs;
+  g->avs_in_flight_n = nargs;
+  return avs;
+}
+
+static void api_pack_call_arg(CfreeCg *g, CGABIValue *av, CfreeCgTypeId fty,
+                              const ABIFuncInfo *abi, u32 idx) {
+  ApiSValue arg = api_pop(g);
+  int is_vararg = (idx >= abi->nparams);
+  CfreeCgTypeId aty = is_vararg
+                          ? (arg.type ? arg.type : api_sv_type(&arg))
+                          : cg_type_func_param_id(g->c, fty, idx);
+  if (!aty)
+    aty = arg.type;
+
+  av->type = aty;
+  av->abi = is_vararg ? NULL : &abi->params[idx];
+
+  if (api_is_wide16_scalar_type(g->c, aty)) {
+    ApiSValue lv = api_wide16_materialize_lvalue(g, &arg, aty);
+    av->storage = lv.op;
+    av->storage.type = aty;
+    av->size = 16;
+  } else if (cg_type_is_aggregate(g->c, aty)) {
+    api_ensure_reg(g, &arg);
+    Operand st = arg.op;
+    st.type = aty;
+    av->storage = st;
+    av->size = abi_cg_sizeof(g->c->abi, aty);
+  } else {
+    api_ensure_reg(g, &arg);
+    av->storage = (api_is_lvalue_sv(&arg) || arg.op.kind == OPK_GLOBAL)
+                      ? api_force_reg(g, &arg, aty)
+                      : arg.op;
+  }
+}
+
+static void api_alloc_call_ret_storage(CfreeCg *g, CGTarget *T,
+                                       CfreeCgTypeId ret_ty, Operand *out) {
+  if (api_arg_storage_must_be_addr(g->c, ret_ty)) {
+    FrameSlotDesc fsd;
+    memset(&fsd, 0, sizeof fsd);
+    fsd.type = ret_ty;
+    fsd.size = abi_cg_sizeof(g->c->abi, ret_ty);
+    fsd.align = abi_cg_alignof(g->c->abi, ret_ty);
+    fsd.kind = FS_LOCAL;
+    fsd.flags = FSF_ADDR_TAKEN;
+    FrameSlot slot = T->frame_slot(T, &fsd);
+    *out = api_op_local(slot, ret_ty);
+  } else {
+    Reg r = api_alloc_reg_or_spill(g, api_type_class(ret_ty), ret_ty);
+    *out = api_op_reg(r, ret_ty);
+  }
+}
+
+static void api_release_call_args(CfreeCg *g, CGABIValue *avs, u32 nargs) {
+  for (u32 i = 0; i < nargs; ++i) {
+    api_release_arg_storage(g, &avs[i].storage);
+  }
+  g->avs_in_flight = NULL;
+  g->avs_in_flight_n = 0;
+}
+
+static void api_push_call_result(CfreeCg *g, Operand ret_storage,
+                                 CfreeCgTypeId ret_ty) {
+  if (ret_storage.kind == OPK_LOCAL || ret_storage.kind == OPK_GLOBAL ||
+      ret_storage.kind == OPK_INDIRECT) {
+    api_push(g, api_make_lv(ret_storage, ret_ty));
+  } else {
+    api_push(g, api_make_sv(ret_storage, ret_ty));
+  }
+}
+
 void cfree_cg_call(CfreeCg *g, uint32_t nargs, CfreeCgTypeId fn_type,
                    CfreeCgCallAttrs attrs) {
   CGTarget *T;
@@ -6371,48 +6468,10 @@ void cfree_cg_call(CfreeCg *g, uint32_t nargs, CfreeCgTypeId fn_type,
     return;
   }
 
-  avs = NULL;
-  if (nargs) {
-    avs = arena_array(g->c->tu, CGABIValue, nargs);
-    memset(avs, 0, sizeof(CGABIValue) * nargs);
-  }
-
-  g->avs_in_flight = avs;
-  g->avs_in_flight_n = nargs;
-
+  avs = api_alloc_call_args(g, nargs);
   for (u32 i = 0; i < nargs; ++i) {
     u32 idx = nargs - 1u - i;
-    ApiSValue arg = api_pop(g);
-    int is_vararg = (idx >= abi->nparams);
-    CfreeCgTypeId aty;
-    if (is_vararg) {
-      aty = arg.type ? arg.type : api_sv_type(&arg);
-    } else {
-      aty = cg_type_func_param_id(g->c, fty, idx);
-      if (!aty)
-        aty = arg.type;
-    }
-    avs[idx].type = aty;
-    avs[idx].abi = is_vararg ? NULL : &abi->params[idx];
-    int is_aggregate = cg_type_is_aggregate(g->c, aty);
-    if (api_is_wide16_scalar_type(g->c, aty)) {
-      ApiSValue lv = api_wide16_materialize_lvalue(g, &arg, aty);
-      avs[idx].storage = lv.op;
-      avs[idx].storage.type = aty;
-      avs[idx].size = 16;
-    } else if (is_aggregate) {
-      api_ensure_reg(g, &arg);
-      Operand st = arg.op;
-      st.type = aty;
-      avs[idx].storage = st;
-      avs[idx].size = abi_cg_sizeof(g->c->abi, aty);
-    } else {
-      api_ensure_reg(g, &arg);
-      avs[idx].storage =
-          (api_is_lvalue_sv(&arg) || arg.op.kind == OPK_GLOBAL)
-              ? api_force_reg(g, &arg, aty)
-              : arg.op;
-    }
+    api_pack_call_arg(g, &avs[idx], fty, abi, idx);
   }
 
   callee = api_pop(g);
@@ -6432,22 +6491,7 @@ void cfree_cg_call(CfreeCg *g, uint32_t nargs, CfreeCgTypeId fn_type,
   desc.ret.abi = &abi->ret;
 
   if (has_result) {
-    int ret_is_aggregate = cg_type_is_aggregate(g->c, ret_ty);
-    if (ret_is_aggregate || api_is_wide16_scalar_type(g->c, ret_ty)) {
-      FrameSlotDesc fsd;
-      memset(&fsd, 0, sizeof fsd);
-      fsd.type = ret_ty;
-      fsd.size = abi_cg_sizeof(g->c->abi, ret_ty);
-      fsd.align = abi_cg_alignof(g->c->abi, ret_ty);
-      fsd.kind = FS_LOCAL;
-      if (ret_is_aggregate || api_is_wide16_scalar_type(g->c, ret_ty))
-        fsd.flags = FSF_ADDR_TAKEN;
-      FrameSlot ret_slot = T->frame_slot(T, &fsd);
-      desc.ret.storage = api_op_local(ret_slot, ret_ty);
-    } else {
-      Reg r = api_alloc_reg_or_spill(g, api_type_class(ret_ty), ret_ty);
-      desc.ret.storage = api_op_reg(r, ret_ty);
-    }
+    api_alloc_call_ret_storage(g, T, ret_ty, &desc.ret.storage);
   } else {
     desc.ret.storage = api_op_imm(0, builtin_id(CFREE_CG_BUILTIN_VOID));
   }
@@ -6456,24 +6500,14 @@ void cfree_cg_call(CfreeCg *g, uint32_t nargs, CfreeCgTypeId fn_type,
     api_regalloc_finish(g);
   T->call(T, &desc);
 
-  for (u32 i = 0; i < nargs; ++i) {
-    api_release_arg_storage(g, &avs[i].storage);
-  }
-  g->avs_in_flight = NULL;
-  g->avs_in_flight_n = 0;
+  api_release_call_args(g, avs, nargs);
 
   if (callee.op.kind != OPK_GLOBAL) {
     api_free_reg(g, callee_op.v.reg, RC_INT);
   }
 
   if (has_result) {
-    if (desc.ret.storage.kind == OPK_LOCAL ||
-        desc.ret.storage.kind == OPK_GLOBAL ||
-        desc.ret.storage.kind == OPK_INDIRECT) {
-      api_push(g, api_make_lv(desc.ret.storage, ret_ty));
-    } else {
-      api_push(g, api_make_sv(desc.ret.storage, ret_ty));
-    }
+    api_push_call_result(g, desc.ret.storage, ret_ty);
   }
 }
 
@@ -6565,42 +6599,10 @@ static void api_call_symbol_common(CfreeCg *g, CfreeCgSym sym, uint32_t nargs,
     compiler_panic(g->c, g->cur_loc, "CfreeCg: call stack underflow");
     return;
   }
-  avs = NULL;
-  if (nargs) {
-    avs = arena_array(g->c->tu, CGABIValue, nargs);
-    memset(avs, 0, sizeof(CGABIValue) * nargs);
-  }
-  g->avs_in_flight = avs;
-  g->avs_in_flight_n = nargs;
+  avs = api_alloc_call_args(g, nargs);
   for (u32 i = 0; i < nargs; ++i) {
     u32 idx = nargs - 1u - i;
-    ApiSValue arg = api_pop(g);
-    int is_vararg = (idx >= abi->nparams);
-    CfreeCgTypeId aty;
-    aty = is_vararg ? (arg.type ? arg.type : api_sv_type(&arg))
-                    : cg_type_func_param_id(g->c, fty, idx);
-    if (!aty)
-      aty = arg.type;
-    avs[idx].type = aty;
-    avs[idx].abi = is_vararg ? NULL : &abi->params[idx];
-    if (api_is_wide16_scalar_type(g->c, aty)) {
-      ApiSValue lv = api_wide16_materialize_lvalue(g, &arg, aty);
-      avs[idx].storage = lv.op;
-      avs[idx].storage.type = aty;
-      avs[idx].size = 16;
-    } else if (cg_type_is_aggregate(g->c, aty)) {
-      api_ensure_reg(g, &arg);
-      Operand st = arg.op;
-      st.type = aty;
-      avs[idx].storage = st;
-      avs[idx].size = abi_cg_sizeof(g->c->abi, aty);
-    } else {
-      api_ensure_reg(g, &arg);
-      avs[idx].storage =
-          (api_is_lvalue_sv(&arg) || arg.op.kind == OPK_GLOBAL)
-              ? api_force_reg(g, &arg, aty)
-              : arg.op;
-    }
+    api_pack_call_arg(g, &avs[idx], fty, abi, idx);
   }
   callee_op = api_op_global((ObjSymId)sym, 0, cg_type_ptr_to(g->c, fty));
   memset(&desc, 0, sizeof desc);
@@ -6613,41 +6615,16 @@ static void api_call_symbol_common(CfreeCg *g, CfreeCgSym sym, uint32_t nargs,
   desc.ret.type = ret_ty;
   desc.ret.abi = &abi->ret;
   if (has_result) {
-    if (cg_type_is_aggregate(g->c, ret_ty) ||
-        api_is_wide16_scalar_type(g->c, ret_ty)) {
-      FrameSlotDesc fsd;
-      FrameSlot ret_slot;
-      memset(&fsd, 0, sizeof fsd);
-      fsd.type = ret_ty;
-      fsd.size = abi_cg_sizeof(g->c->abi, ret_ty);
-      fsd.align = abi_cg_alignof(g->c->abi, ret_ty);
-      fsd.kind = FS_LOCAL;
-      fsd.flags = FSF_ADDR_TAKEN;
-      ret_slot = T->frame_slot(T, &fsd);
-      desc.ret.storage = api_op_local(ret_slot, ret_ty);
-    } else {
-      Reg r = api_alloc_reg_or_spill(g, api_type_class(ret_ty), ret_ty);
-      desc.ret.storage = api_op_reg(r, ret_ty);
-    }
+    api_alloc_call_ret_storage(g, T, ret_ty, &desc.ret.storage);
   } else {
     desc.ret.storage = api_op_imm(0, builtin_id(CFREE_CG_BUILTIN_VOID));
   }
   if (tail)
     api_regalloc_finish(g);
   T->call(T, &desc);
-  for (u32 i = 0; i < nargs; ++i) {
-    api_release_arg_storage(g, &avs[i].storage);
-  }
-  g->avs_in_flight = NULL;
-  g->avs_in_flight_n = 0;
+  api_release_call_args(g, avs, nargs);
   if (has_result) {
-    if (desc.ret.storage.kind == OPK_LOCAL ||
-        desc.ret.storage.kind == OPK_GLOBAL ||
-        desc.ret.storage.kind == OPK_INDIRECT) {
-      api_push(g, api_make_lv(desc.ret.storage, ret_ty));
-    } else {
-      api_push(g, api_make_sv(desc.ret.storage, ret_ty));
-    }
+    api_push_call_result(g, desc.ret.storage, ret_ty);
   }
 }
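
The storage-shape rule that doc/CBACKEND.md proposes for Phase 1 can be sketched in isolation. This is a minimal illustration, not code from this commit. The `ABIArgInfo`, `ABIArgPart`, and `ABIArgKind` definitions below are simplified stand-ins that borrow only the field names used in the doc (`kind`, `nparts`, `parts[0].src_offset`, `parts[0].size`); the real layout in `src/abi/abi.h` is not shown here and may differ. The `sketch_*` constructors are hypothetical helpers added for the example. Vararg slots, for which `api_pack_call_arg` sets `abi` to NULL, are deliberately left out of scope.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Stand-ins for the real definitions in src/abi/abi.h, which this commit
 * does not show. Field names follow doc/CBACKEND.md; the layout is assumed. */
typedef enum ABIArgKind { ABI_ARG_DIRECT, ABI_ARG_INDIRECT } ABIArgKind;

typedef struct ABIArgPart {
  u32 src_offset; /* byte offset of this part within the value */
  u32 size;       /* bytes covered by this part */
} ABIArgPart;

typedef struct ABIArgInfo {
  ABIArgKind kind;
  u32 nparts;
  ABIArgPart parts[2]; /* two parts are enough for this illustration */
} ABIArgInfo;

typedef enum ABIStorageShape {
  ABI_STORAGE_VALUE, /* storage is the value itself */
  ABI_STORAGE_ADDR,  /* storage is the address of a memory image */
} ABIStorageShape;

/* ADDR for INDIRECT, and for any DIRECT classification that is not exactly
 * one part covering the whole type at offset zero; VALUE otherwise. */
ABIStorageShape abi_arg_storage_shape(const ABIArgInfo *ai, u32 type_size) {
  assert(ai); /* NULL (vararg) handling is not part of this sketch */
  if (ai->kind == ABI_ARG_INDIRECT)
    return ABI_STORAGE_ADDR;
  if (ai->nparts > 1 || ai->parts[0].src_offset != 0 ||
      ai->parts[0].size != type_size)
    return ABI_STORAGE_ADDR;
  return ABI_STORAGE_VALUE;
}

/* Hypothetical constructors for the cases the doc discusses. */
ABIArgInfo sketch_indirect(void) {
  ABIArgInfo ai = {ABI_ARG_INDIRECT, 0, {{0, 0}, {0, 0}}};
  return ai;
}

ABIArgInfo sketch_direct_one(u32 src_offset, u32 size) {
  ABIArgInfo ai = {ABI_ARG_DIRECT, 1, {{src_offset, size}, {0, 0}}};
  return ai;
}

ABIArgInfo sketch_direct_two(u32 size0, u32 size1) {
  /* Second part starts where the first one ends. */
  ABIArgInfo ai = {ABI_ARG_DIRECT, 2, {{0, size0}, {size0, size1}}};
  return ai;
}
```

With these stand-ins, `sketch_direct_two(8, 8)` queried with `type_size` 16 models the SysV-x64 two-part 16B struct and yields `ABI_STORAGE_ADDR`, while `sketch_direct_one(0, 16)` models the trivial one-full-part classification and yields `ABI_STORAGE_VALUE`. The second case is also the shape the doc's Risks section attributes to the malformed SysV-x64 i128 classification, which is why the wide16 clause has to stay in `api_arg_storage_must_be_addr` until the classifiers are fixed.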
