Remove legacy cg implementation - kit

commit df89de575e43a34401cd2a8cfbc73ae97c2e6c0e
parent 9ef2979b70072891cc096bffa3b00a6db7a3f68d
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 14 May 2026 09:11:33 -0700

Remove legacy cg implementation

Diffstat:
M doc/cg-type-migration-plan.md  | 545 ++++++++++++++++++++++---------------------------------------------------------
M include/abi/abi.h  | 11 ++++++++++-
M src/abi/abi.c  | 339 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------
M src/abi/abi.h  | 11 ++++++++++-
M src/abi/abi_aapcs64.c  | 37 +++++++++++++++++++++----------------
M src/abi/abi_apple_arm64.c  | 4 ++--
M src/abi/abi_internal.h  | 10 +++++-----
M src/abi/abi_rv64.c  | 37 +++++++++++++++++++++----------------
M src/abi/abi_sysv_x64.c  | 38 ++++++++++++++++++++++----------------
M src/api/cg.c  | 112 +++++++++++++++++++++----------------------------------------------------------
M src/api/cg_api.h  | 9 +--------
A src/api/cg_type.h  | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/api/stubs.c  | 3 +--
M src/arch/arch.h  | 3 ++-
D src/cg/cg.c  | 1995 -------------------------------------------------------------------------------
D src/cg/cg.h  | 195 -------------------------------------------------------------------------------
D src/cg/fold.c  | 154 -------------------------------------------------------------------------------
D src/cg/fold.h  | 47 -----------------------------------------------
M src/emu/emu.c  | 28 +++++++++-------------------
M src/emu/emu.h  | 6 +++---
M src/emu/lift.c  | 6 +++---
D test/cg/CORPUS.md  | 436 -------------------------------------------------------------------------------
D test/cg/binder_test.c  | 538 -------------------------------------------------------------------------------
D test/cg/dwarf_validate.sh  | 81 -------------------------------------------------------------------------------
D test/cg/harness/cases.c  | 555 -------------------------------------------------------------------------------
D test/cg/harness/cases_a.c  | 112 -------------------------------------------------------------------------------
D test/cg/harness/cases_asm.c  | 101 -------------------------------------------------------------------------------
D test/cg/harness/cases_b.c  | 315 -------------------------------------------------------------------------------
D test/cg/harness/cases_c.c  | 204 -------------------------------------------------------------------------------
D test/cg/harness/cases_d.c  | 230 -------------------------------------------------------------------------------
D test/cg/harness/cases_e.c  | 258 -------------------------------------------------------------------------------
D test/cg/harness/cases_f.c  | 327 -------------------------------------------------------------------------------
D test/cg/harness/cases_g.c  | 660 -------------------------------------------------------------------------------
D test/cg/harness/cases_h.c  | 655 -------------------------------------------------------------------------------
D test/cg/harness/cases_i.c  | 435 -------------------------------------------------------------------------------
D test/cg/harness/cases_j.c  | 573 -------------------------------------------------------------------------------
D test/cg/harness/cases_k.c  | 210 -------------------------------------------------------------------------------
D test/cg/harness/cases_l.c  | 396 -------------------------------------------------------------------------------
D test/cg/harness/cases_mc.c  | 24 ------------------------
D test/cg/harness/cases_n.c  | 286 -------------------------------------------------------------------------------
D test/cg/harness/cases_o.c  | 381 -------------------------------------------------------------------------------
D test/cg/harness/cases_p.c  | 132 -------------------------------------------------------------------------------
D test/cg/harness/cases_q.c  | 473 -------------------------------------------------------------------------------
D test/cg/harness/cases_shared.c  | 14 --------------
D test/cg/harness/cases_shared.h  | 17 -----------------
D test/cg/harness/cg_check_dwarf.c  | 429 -------------------------------------------------------------------------------
D test/cg/harness/cg_runner.c  | 657 -------------------------------------------------------------------------------
D test/cg/harness/cg_test.c  | 456 -------------------------------------------------------------------------------
D test/cg/harness/cg_test.h  | 300 -------------------------------------------------------------------------------
D test/cg/run.sh  | 652 -------------------------------------------------------------------------------
M test/test.mk  | 33 ++++++---------------------------

51 files changed, 665 insertions(+), 12937 deletions(-)
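
Note on the `src/abi/abi.c` hunk: `compute_record_layout` now reads field facts from `CgType` (`record.is_union`, per-field `align_override`) instead of the C `Type`. The layout rule it applies can be illustrated with a minimal, self-contained sketch. Every name here (`DemoField`, `DemoLayout`, `demo_*`) is hypothetical and invented for illustration; this is not the project's code, and it assumes all alignments are powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for CgTypeField / ABIRecordLayout. */
typedef struct DemoField {
  uint32_t size;           /* field size in bytes */
  uint32_t align;          /* natural alignment, power of two */
  uint32_t align_override; /* 0 = none, 1 = packed, larger values raise align */
} DemoField;

typedef struct DemoLayout {
  uint32_t size;
  uint32_t align;
} DemoLayout;

/* Round off up to the next multiple of align (align must be a power of two). */
static uint32_t demo_align_up(uint32_t off, uint32_t align) {
  assert(align != 0 && (align & (align - 1)) == 0);
  uint32_t mask = align - 1;
  return (off + mask) & ~mask;
}

/* Effective field alignment: an override of 1 packs the field, a larger
 * override raises the alignment, mirroring the two checks in the diff. */
static uint32_t demo_field_align(const DemoField* f) {
  uint32_t a = f->align;
  if (f->align_override == 1) a = 1;
  if (f->align_override > a) a = f->align_override;
  return a;
}

/* Struct: place fields at increasing aligned offsets.
 * Union: every field sits at offset 0, so the extent is the largest field.
 * In both cases the final size is rounded up to the record alignment. */
static DemoLayout demo_record_layout(const DemoField* fields, uint32_t n,
                                     int is_union,
                                     uint32_t record_align_override) {
  DemoLayout L = {0, 1};
  uint32_t extent = 0;
  for (uint32_t i = 0; i < n; ++i) {
    uint32_t a = demo_field_align(&fields[i]);
    if (a > L.align) L.align = a;
    if (is_union) {
      if (fields[i].size > extent) extent = fields[i].size;
    } else {
      extent = demo_align_up(extent, a) + fields[i].size;
    }
  }
  if (record_align_override > L.align) L.align = record_align_override;
  L.size = demo_align_up(extent, L.align);
  return L;
}
```

Under these assumptions, a struct of a 1-byte field followed by a 4-byte, 4-aligned field occupies 8 bytes, the same fields as a union occupy 4 bytes, and packing the second field shrinks the struct to 5 bytes with alignment 1.
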
diff --git a/doc/cg-type-migration-plan.md b/doc/cg-type-migration-plan.md
@@ -1,404 +1,157 @@
-# CG Type Migration Plan
+# Remove C `Type` From `src/`
 
 ## Goal
 
-Make the C frontend just another language frontend, like `lang/toy`, while
-removing the C language `Type` dependency from `src/`.
+`src/` must be language-neutral. C semantic types stay in `lang/c`; generic
+codegen, ABI, arch lowering, optimizer, debug, object emission, and emu use
+`CfreeCgTypeId`, `CgType`, debug type IDs, or explicit storage facts.
 
-The C frontend should keep its rich C semantic type system privately in
-`lang/c`. Codegen, ABI, arch lowering, optimizer, and object emission should
-use a narrower, language-neutral `CgType` model constructed only through the
-public `include/cfree/cg.h` type constructors.
+Completion means:
 
-## Status Checklist
-
-Keep this section current as migration work lands.
-
-- [x] Phase 1: Add an internal `CgType` payload behind `CfreeCgTypeId`.
-- [x] Phase 1: Add internal `cg_type_*` lookup, layout, and classification
-  helpers.
-- [x] Phase 1: Populate `CgType` from public CG constructors and the temporary
-  legacy C `Type*` import bridge.
-- [x] Phase 1: Move public CG type query APIs to `CgType`.
-- [ ] Phase 1: Move `CgType` into a dedicated internal header/module if the
-  next migration slice needs it outside `src/api/cg.c`.
-- [ ] Phase 2: Cache lowered `CfreeCgTypeId` values in the C frontend or in a
-  frontend-owned `Type*` map.
-- [ ] Phase 2: Replace recursive-record placeholder lowering with a real
-  forward/begin/complete public CG record API.
-- [ ] Phase 3: Replace stored `const Type*` in `src/api/cg.c` state with
-  `CfreeCgTypeId` or `CgType` facts.
-- [ ] Phase 4: Migrate ABI APIs and record/function caches from C `Type*` to
-  CG type handles.
-- [ ] Phase 5: Migrate `CGTarget` and arch lowering away from C semantic
-  types.
-- [ ] Phase 6: Migrate optimizer and generic debug bridges away from C
-  semantic types.
-- [ ] Phase 7: Remove legacy C type bridges and shim headers.
-- [ ] Phase 8: Register C through the frontend mechanism and remove direct
-  `src` dependencies on `lang/c`.
-
-## Target Boundary
-
-- `lang/c/Type`: C semantic type. It carries C-only facts such as qualifiers,
-  typedef/tag behavior, incomplete types, decay rules, bitfield syntax, and
-  source-level compatibility.
-- `CgType`: language-neutral codegen type. It carries only storage, layout,
-  calling convention, and lowering information expressible through
-  `include/cfree/cg.h`.
-- `src/`: must not include or depend on `lang/c` headers.
-- `ObjBuilder`: should see symbols, sections, relocs, and bytes, not C types.
-- `CGTarget`: should see codegen facts such as size, alignment, register class,
-  pointer/float/int shape, and operation flags, not C semantic facts.
-
-## Why This Is Needed
-
-Today `CfreeCgTypeId` is public, but internally `src/api/cg.c` resolves it back
-to the C frontend `Type*`. That makes the public CG API a wrapper around the C
-type system and keeps `src` coupled to `lang/c` through compatibility headers.
-
-The main current dependency shape is:
-
-- public `cfree_cg_type_*` constructors create/import C `Type*`
-- CG stack values, operands, slots, symbols, ABI calls, and target descriptors
-  store `const Type*`
-- ABI and arch helpers inspect `Type*` directly
-- `src/type/type.h` is currently a shim to `lang/c/type/type.h`
-
-The migration inverts that relationship. C `Type` lowers to `CfreeCgTypeId`;
-CG internals resolve `CfreeCgTypeId` to `CgType`.
-
-## CgType Shape
-
-Add an internal `CgType` representation, probably under `src/api/cg_type.h` or
-`src/cg/type.h`.
-
-It should model only concepts from public `include/cfree/cg.h`:
-
-```c
-typedef struct CgType CgType;
-
-typedef struct CgTypeField {
-  CfreeSym name;
-  CfreeCgTypeId type;
-  uint64_t offset;
-  uint32_t align_override;
-} CgTypeField;
-
-struct CgType {
-  CfreeCgTypeKind kind;
-  uint64_t size;
-  uint32_t align;
-
-  union {
-    struct {
-      uint32_t width;
-    } integer;
-
-    struct {
-      uint32_t width;
-    } fp;
-
-    struct {
-      CfreeCgTypeId pointee;
-      uint32_t address_space;
-    } ptr;
-
-    struct {
-      CfreeCgTypeId elem;
-      uint64_t count;
-    } array;
-
-    struct {
-      CfreeCgTypeId ret;
-      CfreeCgParam* params;
-      uint32_t nparams;
-      CfreeCgCallConv call_conv;
-      int abi_variadic;
-      CfreeCgAbiAttrs ret_attrs;
-    } func;
-
-    struct {
-      CfreeSym tag;
-      CgTypeField* fields;
-      uint32_t nfields;
-      int is_union;
-      uint32_t align_override;
-      uint32_t flags;
-    } record;
-
-    struct {
-      CfreeSym tag;
-      CfreeCgTypeId base;
-      CfreeCgEnumValue* values;
-      uint32_t nvalues;
-    } enum_;
-
-    struct {
-      CfreeSym name;
-      CfreeCgTypeId base;
-    } alias;
-  };
-};
-```
-
-Exact field names can change, but the important rule is that `CgType` cannot
-grow C-only semantics.
-
-## Public API Gaps To Address
-
-Before fully migrating internals, confirm the public type constructors can
-represent codegen needs without C-specific escape hatches.
-
-Known gaps:
-
-- Record construction needs a forward/incomplete or begin/complete story for
-  self-referential records.
-- Records need to distinguish struct and union.
-- Packed/aligned record and field layout must be expressible.
-- Bitfields need either direct representation or an explicit frontend-lowered
-  representation.
-- Signedness should stay out of storage type identity where possible. Keep it
-  on integer operations, comparisons, conversions, and ABI extension attrs.
-- `va_list` should be expressible as a target-provided CG type without relying
-  on C `Type`.
-
-## Migration Phases
-
-### Phase 1: Add CgType Registry Behind CfreeCgTypeId
-
-Change the existing public type registry in `src/api/cg.c` so each
-`CfreeCgTypeId` resolves to a canonical `CgType`.
-
-During this phase, keep the legacy C type pointer as a bridge:
-
-```c
-typedef struct CgApiType {
-  CgType cg;
-  const Type* legacy_type; /* temporary migration bridge */
-} CgApiType;
-```
-
-Add internal helpers:
-
-```c
-const CgType* cg_type_get(Compiler*, CfreeCgTypeId);
-uint64_t cg_type_size(Compiler*, CfreeCgTypeId);
-uint32_t cg_type_align(Compiler*, CfreeCgTypeId);
-int cg_type_is_int(Compiler*, CfreeCgTypeId);
-int cg_type_is_float(Compiler*, CfreeCgTypeId);
-int cg_type_is_ptr(Compiler*, CfreeCgTypeId);
-int cg_type_is_record(Compiler*, CfreeCgTypeId);
+```sh
+rg 'lang/c|type/type\.h|const Type\*|TypeKind|TY_' src include/abi
 ```
 
-Public query APIs such as `cfree_cg_type_size` and
-`cfree_cg_type_record_field` should use `CgType`, not C `Type`.
-
-Deliverable:
-
-- No behavior change.
-- Public tests still pass.
-- Existing code may still use `legacy_type`, but new code should not add new
-  direct `Type*` usage.
-
-### Phase 2: Cache CgTypeId In The C Frontend Type
-
-Keep the C frontend `Type`, but make its lowered CG representation explicit.
-
-Add a cache field to `lang/c/type/type.h`:
-
-```c
-CfreeCgTypeId cg_id;
+finds no generic `src` dependency on C semantic types. C-specific files under
+`lang/c` may still use `Type`.
+
+## Current Blockers
+
+These are the remaining dependency clusters to remove.
+
+1. **C compatibility shims in `src/`**
+   - `src/type/type.h`
+   - `src/decl/decl.h`
+   - `src/decl/decl_attrs.h`
+   - `src/lex/lex.h`
+   - `src/pp/pp.h`
+   - `src/parse/cg_public_compat.h`
+   - `src/api/pipeline.c -> lang/c/c.h`
+
+2. **ABI still exposes C `Type*` bridge APIs**
+   - `include/abi/abi.h` and `src/abi/abi.h` include `type/type.h`.
+   - `abi_type_info`, `abi_sizeof`, `abi_alignof`, `abi_record_layout`, and
+     `abi_func_info` still take `const Type*`.
+   - `abi_size_type`, `abi_ptrdiff_type`, `abi_intptr_type`,
+     `abi_uintptr_type`, and `abi_va_list_type` still manufacture C types.
+   - `src/abi/abi.c` still has C bridge classification/layout code.
+
+3. **Public CG implementation still stores C `Type*` internally**
+   - `src/api/cg.c` keeps `CgApiType.type`, `resolve_type`,
+     `cg_api_type_import`, `cg_api_type_resolve`, stack value types, slot type
+     tables, symbol type tables, function return types, and bridge helpers.
+   - It builds legacy C `Type*` values when public CG type constructors are
+     called.
+
+4. **`CGTarget` and arch lowering still use C type identity**
+   - `src/arch/arch.h` forward-declares `Type` and uses `const Type*` in
+     `FrameSlotDesc`, `MemAccess`, `ConstBytes`, `AggregateAccess`,
+     `BitFieldAccess`, `Operand`, `CGABIValue`, `CGParamDesc`, `CGFuncDesc`,
+     `CGCallDesc`, `CGScopeDesc`, `AsmConstraint`, `alloc_reg`, and
+     `va_arg_`.
+   - Arch internals include `type/type.h` and use helpers such as
+     `type_is_64`, `type_is_fp_double`, `type_byte_size`, and
+     `type_is_signed`.
+
+5. **Optimizer IR stores C `Type*`**
+   - `src/opt/ir.h`, `src/opt/ir.c`, `src/opt/opt.c`,
+     `src/opt/pass_lower.c`.
+   - `Func.val_type`, instruction result types, frame slots, call metadata,
+     and `IR_VA_ARG` aux data are still `const Type*`.
+
+6. **Generic debug has the C debug adapter in `src`**
+   - `src/debug/c_debug.c` and `src/debug/c_debug.h` walk `Type*`.
+   - Generic debug comments and APIs still refer to C `Type*` caches.
+
+7. **Emu stubs still synthesize C `Type*`**
+   - `src/emu/emu.h` exposes `emu_cpu_type` and `emu_block_fn_type` as
+     `const Type*`.
+   - `src/emu/cpu.c` constructs CPU/block types through C type constructors.
+
+8. **Core pool still has a C type hook**
+   - `src/core/pool.h` forward-declares `Type`.
+   - `pool_type` exists only for the old C type interning shape and should move
+     to `lang/c` or disappear.
+
+## Removal Order
+
+Do this in order; each step should keep `make lib`, `make bin`, and
+`make test-cg-api` green. Run parse/link tests when touching frontend or ABI
+behavior.
+
+1. **Make C lowering own the `Type* -> CfreeCgTypeId` cache**
+   - Add a cache field or map in `lang/c`.
+   - Ensure all C parser/codegen adapters call public CG constructors once per
+     C type.
+   - Add public CG record forward/begin/complete support before removing the
+     recursive-record placeholder bridge.
+
+2. **Finish `src/api/cg.c` migration**
+   - Replace all stored `const Type*` with `CfreeCgTypeId` or `CgType` facts.
+   - Remove legacy C type construction from public CG constructors.
+   - Keep any unavoidable bridge in tiny, named functions until step 8.
+
+3. **Make ABI purely CG-typed**
+   - Rename or replace the `abi_cg_*` APIs as the only ABI layout/classification
+     APIs.
+   - Delete C `Type*` ABI APIs and C bridge classification/layout code from
+     `src/abi`.
+   - Replace target library type helpers with CG type IDs or move C spellings
+     of `size_t`, `ptrdiff_t`, `intptr_t`, `uintptr_t`, and `va_list` to
+     `lang/c`.
+   - Remove `type/type.h` from `include/abi/abi.h` and `src/abi/abi.h`.
+
+4. **Make `CGTarget` language-neutral**
+   - Change target-facing descriptors in `src/arch/arch.h` from `Type*` to
+     `CfreeCgTypeId` or explicit facts: size, align, reg class, integer width,
+     float width, pointer/address-space, signedness where operation-specific.
+   - Replace arch helper reads of C types with CG helpers or operation flags.
+   - Remove `type/type.h` includes from `src/arch/**`.
+
+5. **Move optimizer IR off C types**
+   - Replace IR value/frame/instruction type fields with `CfreeCgTypeId` or
+     compact derived facts.
+   - Replace `IR_VA_ARG` `Type*` aux with a CG type handle.
+   - Remove `type/type.h` from `src/opt/**`.
+
+6. **Move C debug lowering out of generic debug**
+   - Move `src/debug/c_debug.*` to `lang/c/debug` or another C frontend adapter.
+   - Generic debug should consume frontend-provided `DebugTypeId` values, not
+     inspect C `Type`.
+   - Remove C type cache language from generic `src/debug` docs/comments.
+
+7. **Update emu stubs**
+   - Replace `emu_cpu_type` / `emu_block_fn_type` with CG type IDs or explicit
+     layout records.
+   - Build CPU state and block signatures through public CG constructors.
+   - Remove `type/type.h` from `src/emu/**`.
+
+8. **Move pool/type interning ownership to `lang/c`**
+   - Delete `pool_type` from `src/core/pool.*` or move the C-specific type
+     interning helper under `lang/c/type`.
+   - Remove the `Type` forward declaration from `src/core/pool.h`.
+
+9. **Delete compatibility shims and register C like Toy**
+   - Delete `src/type`, `src/decl`, `src/lex`, `src/pp`, and
+     `src/parse/cg_public_compat.h` once no `src` file includes them.
+   - Remove `src/api/pipeline.c`'s direct `lang/c/c.h` include and hardcoded C
+     branch.
+   - Register C through the frontend mechanism used by Toy.
+
+## Do Not Regress
+
+- Do not put C-only facts into `CgType`.
+- Signedness should live on operations, comparisons, conversions, ABI attrs, or
+  explicit lowering metadata, not storage type identity.
+- Object emission must remain byte/section/symbol/reloc based.
+- Keep frontend-specific debug/type lowering outside generic `src`.
+
+## Useful Checks
+
+```sh
+make lib
+make bin
+make test-cg-api
+rg 'lang/c|type/type\.h|const Type\*|TypeKind|TY_' src include/abi
+rg 'cg_api_type_import|cg_api_type_resolve|cfree_cg_internal_.*type' src
 ```
-
-or, if mutability of interned `Type` is undesirable, add a frontend-side
-map from `Type*` to `CfreeCgTypeId`.
-
-Update `type_cg_id(CfreeCompiler*, const Type*)` so it constructs through the
-public CG API once and returns the cached id thereafter.
-
-Recursive records need special handling. Prefer a real public
-begin/complete/forward record API over the current placeholder behavior.
-
-Deliverable:
-
-- `lang/c` lowers to public CG type constructors.
-- `src` still builds with the legacy bridge.
-
-### Phase 3: Migrate src/api/cg.c State To CgType
-
-Replace stored `const Type*` in CG state with `CfreeCgTypeId` or
-`const CgType*`.
-
-High-priority structures:
-
-- stack values
-- operands
-- slots
-- symbol type table
-- function return type
-- memory access descriptors
-- conversion helpers
-- call descriptors
-- intrinsic and atomic lowering helpers
-
-Temporary bridge calls into old ABI/arch code are acceptable, but they should
-be centralized. Do not continue storing C `Type*` in CG state.
-
-Deliverable:
-
-- `src/api/cg.c` mostly reasons in `CfreeCgTypeId`/`CgType`.
-- Remaining `Type*` use is isolated to bridge functions.
-
-### Phase 4: Migrate ABI To CgType
-
-Change ABI APIs from C `Type*` to CG type handles.
-
-Current APIs to migrate:
-
-```c
-ABITypeInfo abi_type_info(TargetABI*, const Type*);
-u32 abi_sizeof(TargetABI*, const Type*);
-u32 abi_alignof(TargetABI*, const Type*);
-const ABIRecordLayout* abi_record_layout(TargetABI*, const Type*);
-const ABIFuncInfo* abi_func_info(TargetABI*, const Type* fn_type);
-```
-
-Target APIs should look more like:
-
-```c
-ABITypeInfo abi_type_info(TargetABI*, CfreeCgTypeId);
-u32 abi_sizeof(TargetABI*, CfreeCgTypeId);
-u32 abi_alignof(TargetABI*, CfreeCgTypeId);
-const ABIRecordLayout* abi_record_layout(TargetABI*, CfreeCgTypeId);
-const ABIFuncInfo* abi_func_info(TargetABI*, CfreeCgTypeId fn_type);
-```
-
-Record layout should be cached by `CfreeCgTypeId`, not `Type*`.
-
-Deliverable:
-
-- ABI no longer includes `type/type.h`.
-- ABI classification uses only `CgType` and target facts.
-
-### Phase 5: Migrate CGTarget And Arch Lowering
-
-Change target-facing structs in `src/arch/arch.h` from `const Type*` to
-`CfreeCgTypeId` or derived `CgType` facts.
-
-Most arch code only needs:
-
-- byte size
-- alignment
-- integer width
-- float width
-- pointer vs integer vs float
-- register class
-- signedness for specific signed operations
-
-Replace helpers such as:
-
-```c
-type_is_64(t)
-type_is_fp_double(t)
-type_byte_size(t)
-type_is_signed(t)
-```
-
-with CG-type helpers. Where signedness is operation-specific, pass it through
-operation metadata instead of reading it from type identity.
-
-Deliverable:
-
-- arch code does not include `type/type.h`
-- `CGTarget` is language-neutral
-
-### Phase 6: Migrate Optimizer And Debug Bridges
-
-Optimizer IR currently stores `const Type*` in several places. Move it to
-`CfreeCgTypeId` or `CgType` facts.
-
-Debug should remain a separate concern:
-
-- Codegen debug emission can consume frontend-provided debug type IDs.
-- C-specific debug lowering from C `Type` should live in `lang/c` or an
-  explicit C debug adapter, not in generic `src` code.
-
-Deliverable:
-
-- optimizer no longer includes C type headers
-- generic debug producer does not depend on C type
-
-### Phase 7: Remove Legacy C Type Bridges
-
-After CG, ABI, arch, opt, and generic debug no longer need C `Type*`, remove:
-
-- `legacy_type` from `CgApiType`
-- `cg_api_type_import`
-- `cg_api_type_resolve`
-- `cfree_cg_internal_*_type`
-- `src/type/type.h` shim
-- `src/decl/*` shims if no longer needed
-
-Deliverable:
-
-- `src` no longer depends on `lang/c/type`.
-- `lang/c` is the only owner of C semantic types.
-
-### Phase 8: Make C A Registered Frontend
-
-Once `src` no longer needs C headers, make C follow the same pattern as Toy.
-
-Current special case:
-
-- `src/api/pipeline.c` directly includes `../../lang/c/c.h`
-- `compile_into` has a hardcoded `CFREE_LANG_C` branch
-
-Target:
-
-- `cfree_c_compile` is registered through the same frontend mechanism as Toy.
-- `src/api/pipeline.c` only calls `c->frontends[input->lang]` for language
-  compilation.
-- Eventually replace enum-indexed registration with string registration.
-
-ASM can remain a builtin path temporarily, or be moved to its own registered
-frontend in a later cleanup.
-
-Deliverable:
-
-- no `src -> lang/c` include
-- C frontend is linked/registered the same way as Toy
-
-## Suggested PR Sequence
-
-1. Add `CgType` and registry helpers; keep legacy `Type*`.
-2. Add C frontend `Type -> CfreeCgTypeId` cache.
-3. Convert public type query APIs to read `CgType`.
-4. Convert `src/api/cg.c` stack/operand/slot/symbol state to CG types.
-5. Convert ABI to CG types.
-6. Convert `CGTarget` and arch lowering to CG types.
-7. Convert optimizer/debug generic paths.
-8. Delete legacy bridges and shim headers.
-9. Register C like Toy and remove direct `src/api/pipeline.c -> lang/c/c.h`.
-
-Each PR should keep `make lib` and `CFREE_TEST_ALLOW_SKIP=1 make test-parse`
-green. Broader `test-cg` and arch-specific tests should be run around phases
-4 through 6.
-
-## Non-Goals
-
-- Do not remove the C frontend `Type` early. It is still needed for C language
-  semantics.
-- Do not move C-only rules into `CgType`.
-- Do not make `ObjBuilder` understand type systems.
-- Do not make `CGTarget` inspect language-specific types.
-
-## Completion Criteria
-
-- No `src` file includes `lang/c` headers directly or indirectly.
-- No `src` file includes `type/type.h` as a C semantic type.
-- Public CG type constructors create all codegen-visible type facts.
-- ABI, arch, CG, and optimizer operate on `CgType`/`CfreeCgTypeId`.
-- C and Toy are both registered language frontends.
-- C-specific lexer, preprocessor, parser, decl, and type code live under
-  `lang/c` only.
diff --git a/include/abi/abi.h b/include/abi/abi.h
@@ -1,6 +1,8 @@
 #ifndef CFREE_ABI_H
 #define CFREE_ABI_H
 
+#include <cfree/cg.h>
+
 #include "core/core.h"
 #include "type/type.h"
 
@@ -115,7 +117,14 @@ void abi_fini(TargetABI*);
 TargetABI* abi_new(Compiler*);
 void abi_free(TargetABI*);
 
-/* Builtin scalar profiles and general type layout. */
+/* Builtin scalar profiles and general type layout. New code should enter
+ * through CfreeCgTypeId; Type* overloads are the temporary C frontend bridge. */
+ABITypeInfo abi_cg_type_info(TargetABI*, CfreeCgTypeId);
+u32 abi_cg_sizeof(TargetABI*, CfreeCgTypeId);
+u32 abi_cg_alignof(TargetABI*, CfreeCgTypeId);
+const ABIRecordLayout* abi_cg_record_layout(TargetABI*, CfreeCgTypeId);
+const ABIFuncInfo* abi_cg_func_info(TargetABI*, CfreeCgTypeId fn_type);
+
 ABITypeInfo abi_type_info(TargetABI*, const Type*);
 u32 abi_sizeof(TargetABI*, const Type*);
 u32 abi_alignof(TargetABI*, const Type*);
diff --git a/src/abi/abi.c b/src/abi/abi.c
@@ -17,6 +17,8 @@
 #include <string.h>
 
 #include "abi/abi_internal.h"
+#include "api/cg_api.h"
+#include "api/cg_type.h"
 #include "core/arena.h"
 #include "core/core.h"
 #include "core/pool.h"
@@ -133,50 +135,80 @@ static ABITypeInfo prim_info(TargetABI* a, TypeKind k) {
   }
 }
 
-ABITypeInfo abi_type_info(TargetABI* a, const Type* t) {
+static ABITypeInfo abi_c_type_info_no_bridge(TargetABI* a, const Type* t);
+
+ABITypeInfo abi_cg_type_info(TargetABI* a, CfreeCgTypeId id) {
   ABITypeInfo r = {0, 0, ABI_SC_VOID, 0, 0, 0};
+  const CgType* t;
+  if (!id) return r;
+  t = cg_type_get(a->c, id);
   if (!t) return r;
   switch (t->kind) {
-    case TY_PTR:
+    case CFREE_CG_TYPE_ALIAS:
+      return abi_cg_type_info(a, t->alias.base);
+    case CFREE_CG_TYPE_PTR:
       r.size = a->c->target.ptr_size ? a->c->target.ptr_size : 8;
       r.align = a->c->target.ptr_align ? a->c->target.ptr_align : 8;
       r.scalar_kind = ABI_SC_PTR;
       return r;
-    case TY_ARRAY: {
-      ABITypeInfo e = abi_type_info(a, t->arr.elem);
-      r.size = e.size * t->arr.count;
+    case CFREE_CG_TYPE_ARRAY: {
+      ABITypeInfo e = abi_cg_type_info(a, t->array.elem);
+      r.size = e.size * t->array.count;
       r.align = e.align;
       return r;
     }
-    case TY_STRUCT:
-    case TY_UNION: {
-      const ABIRecordLayout* L = abi_record_layout(a, t);
+    case CFREE_CG_TYPE_RECORD: {
+      const ABIRecordLayout* L = abi_cg_record_layout(a, id);
       if (L) {
         r.size = L->size;
         r.align = L->align;
       }
       return r;
     }
-    case TY_ENUM:
-      return abi_type_info(
-          a, t->enm.base ? t->enm.base : type_prim(a->c->global, TY_INT));
-    case TY_FUNC:
+    case CFREE_CG_TYPE_ENUM:
+      return abi_cg_type_info(a, t->enum_.base);
+    case CFREE_CG_TYPE_FUNC:
       /* sizeof(function) is undefined in C; use 1 for arithmetic. */
       r.size = 1;
       r.align = 1;
       return r;
+    case CFREE_CG_TYPE_VOID:
+      r.align = 1;
+      r.scalar_kind = ABI_SC_VOID;
+      return r;
+    case CFREE_CG_TYPE_BOOL:
+      r.size = t->size;
+      r.align = t->align;
+      r.scalar_kind = ABI_SC_BOOL;
+      return r;
+    case CFREE_CG_TYPE_INT:
+      r.size = t->size;
+      r.align = t->align;
+      r.scalar_kind = ABI_SC_INT;
+      return r;
+    case CFREE_CG_TYPE_FLOAT:
+      r.size = t->size;
+      r.align = t->align;
+      r.scalar_kind = ABI_SC_FLOAT;
+      return r;
+    case CFREE_CG_TYPE_VARARG_STATE:
+      r.size = t->size;
+      r.align = t->align;
+      return r;
     default:
-      return prim_info(a, (TypeKind)t->kind);
+      return r;
   }
 }
 
-ABITypeInfo abi_internal_type_info(TargetABI* a, const Type* t) {
-  return abi_type_info(a, t);
+ABITypeInfo abi_internal_type_info(TargetABI* a, CfreeCgTypeId id) {
+  return abi_cg_type_info(a, id);
 }
 
-u32 abi_sizeof(TargetABI* a, const Type* t) { return abi_type_info(a, t).size; }
-u32 abi_alignof(TargetABI* a, const Type* t) {
-  return abi_type_info(a, t).align;
+u32 abi_cg_sizeof(TargetABI* a, CfreeCgTypeId id) {
+  return abi_cg_type_info(a, id).size;
+}
+u32 abi_cg_alignof(TargetABI* a, CfreeCgTypeId id) {
+  return abi_cg_type_info(a, id).align;
 }
 
 /* ---- record layout (struct/union) ----
@@ -185,23 +217,25 @@ u32 abi_alignof(TargetABI* a, const Type* t) {
  * natural alignment, no bitfield packing extensions. When a Windows-x64
  * (MSVC bitfield rules) ABI lands, promote this into the vtable. */
 
-static ABIRecordLayout* compute_record_layout(TargetABI* a, const Type* t) {
+static ABIRecordLayout* compute_record_layout(TargetABI* a, CfreeCgTypeId id) {
   ABIRecordLayout* L = arena_new(a->c->tu, ABIRecordLayout);
+  const CgType* t = cg_type_get(a->c, id);
   if (!L) return NULL;
+  if (!t || t->kind != CFREE_CG_TYPE_RECORD) return NULL;
   memset(L, 0, sizeof *L);
   ABIFieldLayout* fl = NULL;
-  if (t->rec.nfields) {
-    fl = arena_array(a->c->tu, ABIFieldLayout, t->rec.nfields);
-    memset(fl, 0, sizeof(ABIFieldLayout) * t->rec.nfields);
+  if (t->record.nfields) {
+    fl = arena_array(a->c->tu, ABIFieldLayout, t->record.nfields);
+    memset(fl, 0, sizeof(ABIFieldLayout) * t->record.nfields);
   }
 
   u32 max_align = 1;
-  if (t->kind == TY_STRUCT) {
+  if (!t->record.is_union) {
     u32 off = 0;
-    for (u16 i = 0; i < t->rec.nfields; ++i) {
-      const Field* f = &t->rec.fields[i];
-      ABITypeInfo fi = abi_type_info(a, f->type);
-      if (t->rec.packed) fi.align = 1;
+    for (u32 i = 0; i < t->record.nfields; ++i) {
+      const CgTypeField* f = &t->record.fields[i];
+      ABITypeInfo fi = abi_cg_type_info(a, f->type);
+      if (f->align_override == 1) fi.align = 1;
       if (f->align_override > fi.align) fi.align = f->align_override;
       if (fi.align > max_align) max_align = fi.align;
       u32 mask = fi.align ? fi.align - 1 : 0;
@@ -216,10 +250,10 @@ static ABIRecordLayout* compute_record_layout(TargetABI* a, const Type* t) {
     L->size = (off + mask) & ~mask;
   } else { /* TY_UNION */
     u32 mx = 0;
-    for (u16 i = 0; i < t->rec.nfields; ++i) {
-      const Field* f = &t->rec.fields[i];
-      ABITypeInfo fi = abi_type_info(a, f->type);
-      if (t->rec.packed) fi.align = 1;
+    for (u32 i = 0; i < t->record.nfields; ++i) {
+      const CgTypeField* f = &t->record.fields[i];
+      ABITypeInfo fi = abi_cg_type_info(a, f->type);
+      if (f->align_override == 1) fi.align = 1;
       if (f->align_override > fi.align) fi.align = f->align_override;
       if (fi.align > max_align) max_align = fi.align;
       if (fi.size > mx) mx = fi.size;
@@ -230,25 +264,26 @@ static ABIRecordLayout* compute_record_layout(TargetABI* a, const Type* t) {
     L->size = (mx + mask) & ~mask;
   }
   L->align = max_align;
-  if (t->rec.align_override > L->align) {
-    L->align = t->rec.align_override;
+  if (t->record.align_override > L->align) {
+    L->align = t->record.align_override;
     u32 mask = L->align - 1;
     L->size = (L->size + mask) & ~mask;
   }
-  L->nfields = t->rec.nfields;
+  L->nfields = t->record.nfields;
   L->fields = fl;
   return L;
 }
 
-const ABIRecordLayout* abi_record_layout(TargetABI* a, const Type* t) {
-  if (!t || (t->kind != TY_STRUCT && t->kind != TY_UNION)) return NULL;
+const ABIRecordLayout* abi_cg_record_layout(TargetABI* a, CfreeCgTypeId id) {
+  const CgType* t = cg_type_get(a->c, id);
+  if (!t || t->kind != CFREE_CG_TYPE_RECORD) return NULL;
   for (RecordLayoutCacheEntry* e = a->rec_cache; e; e = e->next) {
-    if (e->ty == t) return e->layout;
+    if (e->ty == id) return e->layout;
   }
-  ABIRecordLayout* L = compute_record_layout(a, t);
+  ABIRecordLayout* L = compute_record_layout(a, id);
   if (!L) return NULL;
   RecordLayoutCacheEntry* e = arena_new(a->c->tu, RecordLayoutCacheEntry);
-  e->ty = t;
+  e->ty = id;
   e->layout = L;
   e->next = a->rec_cache;
   a->rec_cache = e;
@@ -257,8 +292,9 @@ const ABIRecordLayout* abi_record_layout(TargetABI* a, const Type* t) {
 
 /* ---- function classification (vtabled) ---- */
 
-const ABIFuncInfo* abi_func_info(TargetABI* a, const Type* fn_type) {
-  if (!fn_type || fn_type->kind != TY_FUNC) return NULL;
+const ABIFuncInfo* abi_cg_func_info(TargetABI* a, CfreeCgTypeId fn_type) {
+  const CgType* fn = cg_type_get(a->c, fn_type);
+  if (!fn || fn->kind != CFREE_CG_TYPE_FUNC) return NULL;
   for (FuncInfoCacheEntry* e = a->fn_cache; e; e = e->next) {
     if (e->fn == fn_type) return e->info;
   }
@@ -272,6 +308,227 @@ const ABIFuncInfo* abi_func_info(TargetABI* a, const Type* fn_type) {
   return info;
 }
 
+static ABITypeInfo abi_c_type_info_no_bridge(TargetABI* a, const Type* t) {
+  ABITypeInfo r = {0, 0, ABI_SC_VOID, 0, 0, 0};
+  if (!t) return r;
+  switch (t->kind) {
+    case TY_VOID:
+    case TY_BOOL:
+    case TY_CHAR:
+    case TY_SCHAR:
+    case TY_UCHAR:
+    case TY_SHORT:
+    case TY_USHORT:
+    case TY_INT:
+    case TY_UINT:
+    case TY_LONG:
+    case TY_ULONG:
+    case TY_LLONG:
+    case TY_ULLONG:
+    case TY_INT128:
+    case TY_UINT128:
+    case TY_FLOAT:
+    case TY_DOUBLE:
+    case TY_LDOUBLE:
+      return prim_info(a, (TypeKind)t->kind);
+    case TY_ENUM:
+      return abi_c_type_info_no_bridge(
+          a, t->enm.base ? t->enm.base : type_prim(a->c->global, TY_INT));
+    default:
+      r = abi_cg_type_info(a, cg_api_type_import(a->c, t));
+      switch (t->kind) {
+        case TY_CHAR:
+        case TY_SCHAR:
+        case TY_SHORT:
+        case TY_INT:
+        case TY_LONG:
+        case TY_LLONG:
+        case TY_INT128:
+          r.signed_ = 1;
+          break;
+        default:
+          r.signed_ = 0;
+          break;
+      }
+      return r;
+  }
+}
+
+ABITypeInfo abi_type_info(TargetABI* a, const Type* t) {
+  return abi_c_type_info_no_bridge(a, t);
+}
+
+u32 abi_sizeof(TargetABI* a, const Type* t) { return abi_type_info(a, t).size; }
+u32 abi_alignof(TargetABI* a, const Type* t) {
+  return abi_type_info(a, t).align;
+}
+
+const ABIRecordLayout* abi_record_layout(TargetABI* a, const Type* t) {
+  ABIRecordLayout* L;
+  ABIFieldLayout* fl = NULL;
+  u32 max_align = 1;
+  if (!t || (t->kind != TY_STRUCT && t->kind != TY_UNION)) return NULL;
+  L = arena_new(a->c->tu, ABIRecordLayout);
+  if (!L) return NULL;
+  memset(L, 0, sizeof *L);
+  if (t->rec.nfields) {
+    fl = arena_array(a->c->tu, ABIFieldLayout, t->rec.nfields);
+    memset(fl, 0, sizeof(ABIFieldLayout) * t->rec.nfields);
+  }
+  if (t->kind == TY_STRUCT) {
+    u32 off = 0;
+    for (u16 i = 0; i < t->rec.nfields; ++i) {
+      const Field* f = &t->rec.fields[i];
+      ABITypeInfo fi = abi_type_info(a, f->type);
+      if (t->rec.packed) fi.align = 1;
+      if (f->align_override > fi.align) fi.align = f->align_override;
+      if (fi.align > max_align) max_align = fi.align;
+      u32 mask = fi.align ? fi.align - 1 : 0;
+      off = (off + mask) & ~mask;
+      fl[i].offset = off;
+      fl[i].storage_size = fi.size;
+      off += fi.size;
+    }
+    {
+      u32 mask = max_align - 1;
+      L->size = (off + mask) & ~mask;
+    }
+  } else {
+    u32 mx = 0;
+    for (u16 i = 0; i < t->rec.nfields; ++i) {
+      const Field* f = &t->rec.fields[i];
+      ABITypeInfo fi = abi_type_info(a, f->type);
+      if (t->rec.packed) fi.align = 1;
+      if (f->align_override > fi.align) fi.align = f->align_override;
+      if (fi.align > max_align) max_align = fi.align;
+      if (fi.size > mx) mx = fi.size;
+      fl[i].offset = 0;
+      fl[i].storage_size = fi.size;
+    }
+    {
+      u32 mask = max_align - 1;
+      L->size = (mx + mask) & ~mask;
+    }
+  }
+  L->align = max_align;
+  if (t->rec.align_override > L->align) {
+    L->align = t->rec.align_override;
+    u32 mask = L->align - 1;
+    L->size = (L->size + mask) & ~mask;
+  }
+  L->nfields = t->rec.nfields;
+  L->fields = fl;
+  return L;
+}
+
+static void classify_type_void(ABIArgInfo* out) {
+  memset(out, 0, sizeof *out);
+  out->kind = ABI_ARG_IGNORE;
+}
+
+static void classify_type_scalar(TargetABI* a, const Type* t,
+                                 ABIArgInfo* out) {
+  ABITypeInfo ti = abi_type_info(a, t);
+  ABIArgPart* parts = arena_new(a->c->tu, ABIArgPart);
+  memset(out, 0, sizeof *out);
+  memset(parts, 0, sizeof *parts);
+  out->kind = ABI_ARG_DIRECT;
+  out->parts = parts;
+  out->nparts = 1;
+  parts->cls = (ti.scalar_kind == ABI_SC_FLOAT) ? ABI_CLASS_FP : ABI_CLASS_INT;
+  parts->loc = ABI_LOC_REG;
+  parts->size = ti.size;
+  parts->align = ti.align;
+}
+
+static void classify_type_aggregate(TargetABI* a, const Type* t,
+                                    ABIArgInfo* out, int is_return) {
+  ABITypeInfo ti = abi_type_info(a, t);
+  memset(out, 0, sizeof *out);
+  if (ti.size == 0) {
+    classify_type_void(out);
+    return;
+  }
+  if (ti.size <= 16) {
+    u32 nparts = (ti.size + 7) / 8;
+    ABIArgPart* parts = arena_array(a->c->tu, ABIArgPart, nparts);
+    memset(parts, 0, sizeof(ABIArgPart) * nparts);
+    u32 off = 0;
+    for (u32 i = 0; i < nparts; ++i) {
+      u32 chunk = (ti.size - off > 8) ? 8 : (ti.size - off);
+      parts[i].cls = ABI_CLASS_INT;
+      parts[i].loc = ABI_LOC_REG;
+      parts[i].size = chunk;
+      parts[i].align = 8;
+      parts[i].src_offset = off;
+      off += chunk;
+    }
+    out->kind = ABI_ARG_DIRECT;
+    out->parts = parts;
+    out->nparts = (u16)nparts;
+  } else {
+    out->kind = ABI_ARG_INDIRECT;
+    out->flags = is_return ? ABI_AF_SRET : ABI_AF_BYVAL;
+    out->indirect_align = ti.align ? ti.align : 8;
+  }
+}
+
+static void classify_type_one(TargetABI* a, const Type* t, ABIArgInfo* out,
+                              int is_return) {
+  if (!t || t->kind == TY_VOID) {
+    classify_type_void(out);
+    return;
+  }
+  switch (t->kind) {
+    case TY_STRUCT:
+    case TY_UNION:
+      classify_type_aggregate(a, t, out, is_return);
+      return;
+    default:
+      classify_type_scalar(a, t, out);
+      return;
+  }
+}
+
+static ABIFuncInfo* compute_type_func_info(TargetABI* a, const Type* fn_type) {
+  ABIFuncInfo* info = arena_new(a->c->tu, ABIFuncInfo);
+  memset(info, 0, sizeof *info);
+  classify_type_one(a, fn_type->fn.ret, &info->ret, 1);
+  info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
+  info->variadic = fn_type->fn.variadic;
+  info->nparams = fn_type->fn.nparams;
+  if (a->c->target.arch == CFREE_ARCH_ARM_64 &&
+      a->c->target.os == CFREE_OS_MACOS) {
+    info->vararg_on_stack = 1;
+  }
+  if (fn_type->fn.nparams) {
+    ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fn_type->fn.nparams);
+    memset(arr, 0, sizeof(ABIArgInfo) * fn_type->fn.nparams);
+    for (u16 i = 0; i < fn_type->fn.nparams; ++i) {
+      classify_type_one(a, fn_type->fn.params[i], &arr[i], 0);
+    }
+    info->params = arr;
+  }
+  return info;
+}
+
+const ABIFuncInfo* abi_func_info(TargetABI* a, const Type* fn_type) {
+  CfreeCgTypeId id;
+  if (!fn_type || fn_type->kind != TY_FUNC) return NULL;
+  id = cg_api_type_import(a->c, fn_type);
+  for (FuncInfoCacheEntry* e = a->fn_cache; e; e = e->next) {
+    if (e->fn == id) return e->info;
+  }
+  ABIFuncInfo* info = compute_type_func_info(a, fn_type);
+  if (!info) return NULL;
+  FuncInfoCacheEntry* e = arena_new(a->c->tu, FuncInfoCacheEntry);
+  e->fn = id;
+  e->info = info;
+  e->next = a->fn_cache;
+  a->fn_cache = e;
+  return info;
+}
+
 /* ---- target-defined library types ---- */
 
 static const Type* size_or_uintptr(TargetABI* a, Pool* p) {
diff --git a/src/abi/abi.h b/src/abi/abi.h
@@ -1,6 +1,8 @@
 #ifndef CFREE_ABI_H
 #define CFREE_ABI_H
 
+#include <cfree/cg.h>
+
 #include "core/core.h"
 #include "type/type.h"
 
@@ -115,7 +117,14 @@ void abi_fini(TargetABI*);
 TargetABI* abi_new(Compiler*);
 void abi_free(TargetABI*);
 
-/* Builtin scalar profiles and general type layout. */
+/* Builtin scalar profiles and general type layout. New code should enter
+ * through CfreeCgTypeId; Type* overloads are the temporary C frontend bridge. */
+ABITypeInfo abi_cg_type_info(TargetABI*, CfreeCgTypeId);
+u32 abi_cg_sizeof(TargetABI*, CfreeCgTypeId);
+u32 abi_cg_alignof(TargetABI*, CfreeCgTypeId);
+const ABIRecordLayout* abi_cg_record_layout(TargetABI*, CfreeCgTypeId);
+const ABIFuncInfo* abi_cg_func_info(TargetABI*, CfreeCgTypeId fn_type);
+
 ABITypeInfo abi_type_info(TargetABI*, const Type*);
 u32 abi_sizeof(TargetABI*, const Type*);
 u32 abi_alignof(TargetABI*, const Type*);
diff --git a/src/abi/abi_aapcs64.c b/src/abi/abi_aapcs64.c
@@ -14,11 +14,12 @@
 #include <string.h>
 
 #include "abi/abi_internal.h"
+#include "api/cg_type.h"
 #include "core/arena.h"
 #include "core/core.h"
 #include "core/pool.h"
 
-static void classify_scalar(TargetABI* a, const Type* t, ABIArgInfo* out) {
+static void classify_scalar(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out) {
   ABITypeInfo ti = abi_internal_type_info(a, t);
   out->kind = ABI_ARG_DIRECT;
   out->flags = ABI_AF_NONE;
@@ -41,7 +42,7 @@ static void classify_void(ABIArgInfo* out) {
   out->kind = ABI_ARG_IGNORE;
 }
 
-static void classify_aggregate(TargetABI* a, const Type* t, ABIArgInfo* out,
+static void classify_aggregate(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out,
                                int is_return) {
   ABITypeInfo ti = abi_internal_type_info(a, t);
   if (ti.size == 0) {
@@ -79,17 +80,20 @@ static void classify_aggregate(TargetABI* a, const Type* t, ABIArgInfo* out,
   }
 }
 
-static void classify_one(TargetABI* a, const Type* t, ABIArgInfo* out,
+static void classify_one(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out,
                          int is_return) {
-  if (!t || t->kind == TY_VOID) {
+  const CgType* ty = cg_type_get(a->c, t);
+  if (!ty || ty->kind == CFREE_CG_TYPE_VOID) {
     classify_void(out);
     return;
   }
-  switch (t->kind) {
-    case TY_STRUCT:
-    case TY_UNION:
+  switch (ty->kind) {
+    case CFREE_CG_TYPE_RECORD:
       classify_aggregate(a, t, out, is_return);
       return;
+    case CFREE_CG_TYPE_ALIAS:
+      classify_one(a, ty->alias.base, out, is_return);
+      return;
     default:
       classify_scalar(a, t, out);
       return;
@@ -98,20 +102,21 @@ static void classify_one(TargetABI* a, const Type* t, ABIArgInfo* out,
 
 /* Non-static so apple_arm64_compute_func_info can delegate to it during
  * the Phase 1 alias period — see abi_apple_arm64.c. */
-ABIFuncInfo* aapcs64_compute_func_info(TargetABI* a, const Type* fn) {
+ABIFuncInfo* aapcs64_compute_func_info(TargetABI* a, CfreeCgTypeId fn) {
   ABIFuncInfo* info = arena_new(a->c->tu, ABIFuncInfo);
+  const CgType* fnty = cg_type_get(a->c, fn);
   memset(info, 0, sizeof *info);
 
-  classify_one(a, fn->fn.ret, &info->ret, /*is_return=*/1);
+  classify_one(a, fnty->func.ret, &info->ret, /*is_return=*/1);
   info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
-  info->variadic = fn->fn.variadic;
+  info->variadic = fnty->func.abi_variadic;
 
-  info->nparams = fn->fn.nparams;
-  if (fn->fn.nparams) {
-    ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fn->fn.nparams);
-    memset(arr, 0, sizeof(ABIArgInfo) * fn->fn.nparams);
-    for (u16 i = 0; i < fn->fn.nparams; ++i) {
-      classify_one(a, fn->fn.params[i], &arr[i], /*is_return=*/0);
+  info->nparams = (u16)fnty->func.nparams;
+  if (fnty->func.nparams) {
+    ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fnty->func.nparams);
+    memset(arr, 0, sizeof(ABIArgInfo) * fnty->func.nparams);
+    for (u32 i = 0; i < fnty->func.nparams; ++i) {
+      classify_one(a, fnty->func.params[i].type, &arr[i], /*is_return=*/0);
     }
     info->params = arr;
   } else {
diff --git a/src/abi/abi_apple_arm64.c b/src/abi/abi_apple_arm64.c
@@ -29,10 +29,10 @@
 #include "core/pool.h"
 #include "type/type.h"
 
-extern ABIFuncInfo* aapcs64_compute_func_info(TargetABI*, const Type*);
+extern ABIFuncInfo* aapcs64_compute_func_info(TargetABI*, CfreeCgTypeId);
 
 static ABIFuncInfo* apple_arm64_compute_func_info(TargetABI* a,
-                                                  const Type* fn) {
+                                                  CfreeCgTypeId fn) {
   /* Phase 2: spell out the Darwin variadic / stack-arg-promotion
    * deltas.  For now the AAPCS64 classifier produces ABI-correct
    * output for the fixed-args-only programs in the v1 cg suite,
diff --git a/src/abi/abi_internal.h b/src/abi/abi_internal.h
@@ -11,8 +11,8 @@
 
 typedef struct ABIVtable {
   /* Compute the ABIFuncInfo for a function type. The cache wrapper in
-   * abi.c calls this once per Type and memoizes the result. */
-  ABIFuncInfo* (*compute_func_info)(TargetABI*, const Type* fn);
+   * abi.c calls this once per CgTypeId and memoizes the result. */
+  ABIFuncInfo* (*compute_func_info)(TargetABI*, CfreeCgTypeId fn);
   /* Build the per-ABI __va_list type. The wrapper in abi.c memoizes. */
   const Type* (*va_list_type)(TargetABI*, Pool*);
 } ABIVtable;
@@ -33,13 +33,13 @@ typedef struct FuncInfoCacheEntry FuncInfoCacheEntry;
 typedef struct RecordLayoutCacheEntry RecordLayoutCacheEntry;
 
 struct FuncInfoCacheEntry {
-  const Type* fn;
+  CfreeCgTypeId fn;
   ABIFuncInfo* info;
   FuncInfoCacheEntry* next;
 };
 
 struct RecordLayoutCacheEntry {
-  const Type* ty;
+  CfreeCgTypeId ty;
   ABIRecordLayout* layout;
   RecordLayoutCacheEntry* next;
 };
@@ -53,6 +53,6 @@ struct TargetABI {
 };
 
 /* Shared helpers exposed to per-ABI TUs. */
-ABITypeInfo abi_internal_type_info(TargetABI*, const Type*);
+ABITypeInfo abi_internal_type_info(TargetABI*, CfreeCgTypeId);
 
 #endif
diff --git a/src/abi/abi_rv64.c b/src/abi/abi_rv64.c
@@ -14,11 +14,12 @@
 #include <string.h>
 
 #include "abi/abi_internal.h"
+#include "api/cg_type.h"
 #include "core/arena.h"
 #include "core/core.h"
 #include "core/pool.h"
 
-static void classify_scalar(TargetABI* a, const Type* t, ABIArgInfo* out) {
+static void classify_scalar(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out) {
   ABITypeInfo ti = abi_internal_type_info(a, t);
   out->kind = ABI_ARG_DIRECT;
   out->flags = ABI_AF_NONE;
@@ -41,7 +42,7 @@ static void classify_void(ABIArgInfo* out) {
   out->kind = ABI_ARG_IGNORE;
 }
 
-static void classify_aggregate(TargetABI* a, const Type* t, ABIArgInfo* out,
+static void classify_aggregate(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out,
                                int is_return) {
   ABITypeInfo ti = abi_internal_type_info(a, t);
   if (ti.size == 0) {
@@ -76,37 +77,41 @@ static void classify_aggregate(TargetABI* a, const Type* t, ABIArgInfo* out,
   }
 }
 
-static void classify_one(TargetABI* a, const Type* t, ABIArgInfo* out,
+static void classify_one(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out,
                          int is_return) {
-  if (!t || t->kind == TY_VOID) {
+  const CgType* ty = cg_type_get(a->c, t);
+  if (!ty || ty->kind == CFREE_CG_TYPE_VOID) {
     classify_void(out);
     return;
   }
-  switch (t->kind) {
-    case TY_STRUCT:
-    case TY_UNION:
+  switch (ty->kind) {
+    case CFREE_CG_TYPE_RECORD:
       classify_aggregate(a, t, out, is_return);
       return;
+    case CFREE_CG_TYPE_ALIAS:
+      classify_one(a, ty->alias.base, out, is_return);
+      return;
     default:
       classify_scalar(a, t, out);
       return;
   }
 }
 
-static ABIFuncInfo* rv64_compute_func_info(TargetABI* a, const Type* fn) {
+static ABIFuncInfo* rv64_compute_func_info(TargetABI* a, CfreeCgTypeId fn) {
   ABIFuncInfo* info = arena_new(a->c->tu, ABIFuncInfo);
+  const CgType* fnty = cg_type_get(a->c, fn);
   memset(info, 0, sizeof *info);
 
-  classify_one(a, fn->fn.ret, &info->ret, /*is_return=*/1);
+  classify_one(a, fnty->func.ret, &info->ret, /*is_return=*/1);
   info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
-  info->variadic = fn->fn.variadic;
+  info->variadic = fnty->func.abi_variadic;
 
-  info->nparams = fn->fn.nparams;
-  if (fn->fn.nparams) {
-    ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fn->fn.nparams);
-    memset(arr, 0, sizeof(ABIArgInfo) * fn->fn.nparams);
-    for (u16 i = 0; i < fn->fn.nparams; ++i) {
-      classify_one(a, fn->fn.params[i], &arr[i], /*is_return=*/0);
+  info->nparams = (u16)fnty->func.nparams;
+  if (fnty->func.nparams) {
+    ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fnty->func.nparams);
+    memset(arr, 0, sizeof(ABIArgInfo) * fnty->func.nparams);
+    for (u32 i = 0; i < fnty->func.nparams; ++i) {
+      classify_one(a, fnty->func.params[i].type, &arr[i], /*is_return=*/0);
     }
     info->params = arr;
   } else {
diff --git a/src/abi/abi_sysv_x64.c b/src/abi/abi_sysv_x64.c
@@ -16,6 +16,7 @@
 #include <string.h>
 
 #include "abi/abi_internal.h"
+#include "api/cg_type.h"
 #include "core/arena.h"
 #include "core/core.h"
 #include "core/pool.h"
@@ -25,7 +26,7 @@ static void classify_void(ABIArgInfo* out) {
   out->kind = ABI_ARG_IGNORE;
 }
 
-static void classify_scalar(TargetABI* a, const Type* t, ABIArgInfo* out) {
+static void classify_scalar(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out) {
   ABITypeInfo ti = abi_internal_type_info(a, t);
   out->kind = ABI_ARG_DIRECT;
   out->flags = ABI_AF_NONE;
@@ -43,7 +44,7 @@ static void classify_scalar(TargetABI* a, const Type* t, ABIArgInfo* out) {
   out->nparts = 1;
 }
 
-static void classify_aggregate(TargetABI* a, const Type* t, ABIArgInfo* out,
+static void classify_aggregate(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out,
                                int is_return) {
   ABITypeInfo ti = abi_internal_type_info(a, t);
   if (ti.size == 0) {
@@ -78,37 +79,42 @@ static void classify_aggregate(TargetABI* a, const Type* t, ABIArgInfo* out,
   }
 }
 
-static void classify_one(TargetABI* a, const Type* t, ABIArgInfo* out,
+static void classify_one(TargetABI* a, CfreeCgTypeId t, ABIArgInfo* out,
                          int is_return) {
-  if (!t || t->kind == TY_VOID) {
+  const CgType* ty = cg_type_get(a->c, t);
+  if (!ty || ty->kind == CFREE_CG_TYPE_VOID) {
     classify_void(out);
     return;
   }
-  switch (t->kind) {
-    case TY_STRUCT:
-    case TY_UNION:
+  switch (ty->kind) {
+    case CFREE_CG_TYPE_RECORD:
       classify_aggregate(a, t, out, is_return);
       return;
+    case CFREE_CG_TYPE_ALIAS:
+      classify_one(a, ty->alias.base, out, is_return);
+      return;
     default:
       classify_scalar(a, t, out);
       return;
   }
 }
 
-static ABIFuncInfo* sysv_x64_compute_func_info(TargetABI* a, const Type* fn) {
+static ABIFuncInfo* sysv_x64_compute_func_info(TargetABI* a,
+                                               CfreeCgTypeId fn) {
   ABIFuncInfo* info = arena_new(a->c->tu, ABIFuncInfo);
+  const CgType* fnty = cg_type_get(a->c, fn);
   memset(info, 0, sizeof *info);
 
-  classify_one(a, fn->fn.ret, &info->ret, /*is_return=*/1);
+  classify_one(a, fnty->func.ret, &info->ret, /*is_return=*/1);
   info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
-  info->variadic = fn->fn.variadic;
+  info->variadic = fnty->func.abi_variadic;
 
-  info->nparams = fn->fn.nparams;
-  if (fn->fn.nparams) {
-    ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fn->fn.nparams);
-    memset(arr, 0, sizeof(ABIArgInfo) * fn->fn.nparams);
-    for (u16 i = 0; i < fn->fn.nparams; ++i) {
-      classify_one(a, fn->fn.params[i], &arr[i], /*is_return=*/0);
+  info->nparams = (u16)fnty->func.nparams;
+  if (fnty->func.nparams) {
+    ABIArgInfo* arr = arena_array(a->c->tu, ABIArgInfo, fnty->func.nparams);
+    memset(arr, 0, sizeof(ABIArgInfo) * fnty->func.nparams);
+    for (u32 i = 0; i < fnty->func.nparams; ++i) {
+      classify_one(a, fnty->func.params[i].type, &arr[i], /*is_return=*/0);
     }
     info->params = arr;
   } else {
diff --git a/src/api/cg.c b/src/api/cg.c
@@ -6,6 +6,7 @@
 
 #include "abi/abi.h"
 #include "api/cg_api.h"
+#include "api/cg_type.h"
 #include "arch/arch.h"
 #include "core/arena.h"
 #include "core/heap.h"
@@ -13,62 +14,6 @@
 #include "obj/obj.h"
 #include "type/type.h"
 
-typedef struct CgTypeField {
-  CfreeSym name;
-  CfreeCgTypeId type;
-  u64 offset;
-  u32 align_override;
-} CgTypeField;
-
-typedef struct CgType {
-  CfreeCgTypeKind kind;
-  u64 size;
-  u32 align;
-  u32 pad;
-  union {
-    struct {
-      u32 width;
-    } integer;
-    struct {
-      u32 width;
-    } fp;
-    struct {
-      CfreeCgTypeId pointee;
-      u32 address_space;
-    } ptr;
-    struct {
-      CfreeCgTypeId elem;
-      u64 count;
-    } array;
-    struct {
-      CfreeCgTypeId ret;
-      CfreeCgParam* params;
-      u32 nparams;
-      CfreeCgCallConv call_conv;
-      int abi_variadic;
-      CfreeCgAbiAttrs ret_attrs;
-    } func;
-    struct {
-      CfreeSym tag;
-      CgTypeField* fields;
-      u32 nfields;
-      int is_union;
-      u32 align_override;
-      u32 flags;
-    } record;
-    struct {
-      CfreeSym tag;
-      CfreeCgTypeId base;
-      CfreeCgEnumValue* values;
-      u32 nvalues;
-    } enum_;
-    struct {
-      CfreeSym name;
-      CfreeCgTypeId base;
-    } alias;
-  };
-} CgType;
-
 typedef enum CgApiTypeKind {
   CG_API_TYPE_PTR,
   CG_API_TYPE_ARRAY,
@@ -1179,8 +1124,7 @@ void cg_api_fini(Compiler* c) {
 /* ============================================================
  * CfreeCg: public codegen API implementation
  *
- * Drives CGTarget directly with its own value stack, mirroring
- * the internal CG in src/cg/cg.c but without depending on it.
+ * Drives CGTarget directly with its own value stack.
  * ============================================================ */
 
 typedef enum SResidency {
@@ -2137,8 +2081,8 @@ void cfree_cg_func_begin(CfreeCg* g, CfreeCgSym cg_sym) {
   sym = (ObjSymId)cg_sym;
   fty = api_sym_type(g, cg_sym);
   if (!fty) return;
-  abi = abi_func_info(c->abi, fty);
   attrs = api_sym_attrs(g, cg_sym);
+  abi = abi_func_info(c->abi, fty);
 
   text_sec = obj_section(ob, pool_intern_cstr(c->global, ".text"), SEC_TEXT,
                          SF_EXEC | SF_ALLOC, 4);
@@ -2193,8 +2137,8 @@ CfreeCgSlot cfree_cg_local_slot(CfreeCg* g, CfreeCgTypeId type,
   fsd.type = ty;
   fsd.name = (Sym)attrs.name;
   fsd.loc = g->cur_loc;
-  fsd.size = abi_sizeof(g->c->abi, ty);
-  fsd.align = attrs.align ? attrs.align : abi_alignof(g->c->abi, ty);
+  fsd.size = abi_cg_sizeof(g->c->abi, type);
+  fsd.align = attrs.align ? attrs.align : abi_cg_alignof(g->c->abi, type);
   fsd.kind = FS_LOCAL;
   if (attrs.flags & CFREE_CG_SLOT_ADDRESS_TAKEN) fsd.flags |= FSF_ADDR_TAKEN;
   slot = g->target->frame_slot(g->target, &fsd);
@@ -2216,8 +2160,8 @@ CfreeCgSlot cfree_cg_param_slot(CfreeCg* g, uint32_t index, CfreeCgTypeId type,
   fsd.type = ty;
   fsd.name = (Sym)attrs.name;
   fsd.loc = g->cur_loc;
-  fsd.size = abi_sizeof(g->c->abi, ty);
-  fsd.align = attrs.align ? attrs.align : abi_alignof(g->c->abi, ty);
+  fsd.size = abi_cg_sizeof(g->c->abi, type);
+  fsd.align = attrs.align ? attrs.align : abi_cg_alignof(g->c->abi, type);
   fsd.kind = FS_PARAM;
   if (attrs.flags & CFREE_CG_SLOT_ADDRESS_TAKEN) fsd.flags |= FSF_ADDR_TAKEN;
   slot = g->target->frame_slot(g->target, &fsd);
@@ -2265,8 +2209,8 @@ void cfree_cg_push_float(CfreeCg* g, double value, CfreeCgTypeId type) {
   if (!ty) return;
   T = g->target;
   cb.type = ty;
-  cb.size = (u32)abi_sizeof(g->c->abi, ty);
-  cb.align = (u32)abi_alignof(g->c->abi, ty);
+  cb.size = (u32)abi_cg_sizeof(g->c->abi, type);
+  cb.align = (u32)abi_cg_alignof(g->c->abi, type);
   if (ty->kind == TY_FLOAT)
     u.f = (float)value;
   else
@@ -2305,8 +2249,9 @@ CfreeCgSym cfree_cg_const_data(CfreeCg* g, const uint8_t* data, size_t len,
   if (!pty) return CFREE_CG_SYM_NONE;
   sec_name = pool_intern_cstr(c->global, ".rodata");
   sec = obj_section(ob, sec_name, SEC_RODATA, SF_ALLOC,
-                    align ? align : (u32)abi_alignof(c->abi, pty));
-  base = obj_align_to(ob, sec, align ? align : (u32)abi_alignof(c->abi, pty));
+                    align ? align : (u32)abi_cg_alignof(c->abi, pointee_type));
+  base = obj_align_to(
+      ob, sec, align ? align : (u32)abi_cg_alignof(c->abi, pointee_type));
   obj_write(ob, sec, data, len);
   snprintf(name_buf, sizeof(name_buf), ".Lcfree_ro.%u", g->rodata_counter++);
   anon_name = pool_intern_cstr(c->global, name_buf);
@@ -2647,7 +2592,8 @@ static void api_cg_convert_kind(CfreeCg* g, CfreeCgTypeId dst_type,
     api_push(g, v);
     return;
   }
-  if (ck == CV_BITCAST && abi_sizeof(g->c->abi, sty) == abi_sizeof(g->c->abi, dty) &&
+  if (ck == CV_BITCAST &&
+      abi_sizeof(g->c->abi, sty) == abi_cg_sizeof(g->c->abi, dst_type) &&
       api_type_class(sty) == api_type_class(dty)) {
     v.type = dty;
     v.op.type = dty;
@@ -2757,8 +2703,8 @@ void cfree_cg_float_to_uint(CfreeCg* g, CfreeCgTypeId dst,
  * ============================================================ */
 
 static IntrinKind api_map_intrinsic(CfreeCg* g, CfreeCgIntrinsic intrin,
-                                    const Type* result_type) {
-  u32 size = result_type ? abi_sizeof(g->c->abi, result_type) : 0;
+                                    CfreeCgTypeId result_type) {
+  u32 size = result_type ? abi_cg_sizeof(g->c->abi, result_type) : 0;
   switch (intrin) {
     case CFREE_CG_INTRIN_TRAP:
       return INTRIN_TRAP;
@@ -2845,7 +2791,7 @@ void cfree_cg_intrinsic(CfreeCg* g, CfreeCgIntrinsic intrin, uint32_t nargs,
   h = g->c->env->heap;
   rty = resolve_type(g->c, result_type);
   int_ty = type_prim(g->c->global, TY_INT);
-  kind = api_map_intrinsic(g, intrin, rty);
+  kind = api_map_intrinsic(g, intrin, result_type);
   if (kind == INTRIN_NONE) {
     compiler_panic(g->c, g->cur_loc, "CfreeCg: unsupported intrinsic");
     return;
@@ -2917,8 +2863,10 @@ static MemAccess api_mem_for_atomic(CfreeCg* g, const Type* val_ty) {
   MemAccess ma;
   memset(&ma, 0, sizeof ma);
   ma.type = val_ty;
-  ma.size = val_ty ? abi_sizeof(g->c->abi, val_ty) : 0;
-  ma.align = val_ty ? abi_alignof(g->c->abi, val_ty) : 0;
+  ma.size = val_ty ? abi_cg_sizeof(g->c->abi, cg_api_type_import(g->c, val_ty))
+                   : 0;
+  ma.align =
+      val_ty ? abi_cg_alignof(g->c->abi, cg_api_type_import(g->c, val_ty)) : 0;
   ma.flags = MF_ATOMIC;
   ma.alias.kind = (u8)ALIAS_UNKNOWN;
   return ma;
@@ -2929,13 +2877,13 @@ int cfree_cg_atomic_is_legal(CfreeCompiler* c, CfreeCgMemAccess access,
   const Type* ty = resolve_type(c, access.type);
   (void)order;
   if (!ty) return 0;
-  return abi_sizeof(c->abi, ty) <= 8;
+  return abi_cg_sizeof(c->abi, access.type) <= 8;
 }
 
 int cfree_cg_atomic_is_lock_free(CfreeCompiler* c, CfreeCgMemAccess access) {
   const Type* ty = resolve_type(c, access.type);
   if (!ty) return 0;
-  return abi_sizeof(c->abi, ty) <= (u32)c->target.ptr_size;
+  return abi_cg_sizeof(c->abi, access.type) <= (u32)c->target.ptr_size;
 }
 
 void cfree_cg_atomic_load(CfreeCg* g, CfreeCgMemAccess access,
@@ -3569,8 +3517,8 @@ CfreeCgScope cfree_cg_scope_begin(CfreeCg* g, CfreeCgTypeId result_type) {
     FrameSlotDesc fsd;
     memset(&fsd, 0, sizeof fsd);
     fsd.type = s->result_type;
-    fsd.size = abi_sizeof(g->c->abi, s->result_type);
-    fsd.align = abi_alignof(g->c->abi, s->result_type);
+    fsd.size = abi_cg_sizeof(g->c->abi, result_type);
+    fsd.align = abi_cg_alignof(g->c->abi, result_type);
     fsd.kind = FS_LOCAL;
     s->result_slot = g->target->frame_slot(g->target, &fsd);
   }
@@ -3928,7 +3876,7 @@ void cfree_cg_index(CfreeCg* g, uint64_t offset) {
                    "CfreeCg: index base is not a pointer or array lvalue");
     return;
   }
-  elemsz = (u32)abi_sizeof(g->c->abi, elem_ty);
+  elemsz = (u32)abi_cg_sizeof(g->c->abi, cg_api_type_import(g->c, elem_ty));
   idx_ty = idx.type ? idx.type : idx.op.type;
   if (!idx_ty) idx_ty = type_prim(g->c->global, TY_INT);
   if (base_ty && base_ty->kind == TY_ARRAY) {
@@ -3982,7 +3930,7 @@ void cfree_cg_field(CfreeCg* g, uint32_t field_index) {
     compiler_panic(g->c, g->cur_loc, "CfreeCg: field base is not an lvalue");
     return;
   }
-  layout = abi_record_layout(g->c->abi, rec_ty);
+  layout = abi_cg_record_layout(g->c->abi, cg_api_type_import(g->c, rec_ty));
   if (!layout || field_index >= layout->nfields) {
     compiler_panic(g->c, g->cur_loc, "CfreeCg: invalid field index");
     return;
@@ -4361,7 +4309,7 @@ void cfree_cg_data_begin(CfreeCg* g, CfreeCgSym cg_sym,
   ty = api_sym_type(g, cg_sym);
   if (!ty) return;
   decl_attrs = api_sym_attrs(g, cg_sym);
-  align = attrs.align ? attrs.align : (u32)abi_alignof(c->abi, ty);
+  align = attrs.align ? attrs.align : (u32)abi_cg_alignof(c->abi, decl_attrs.type);
   if (!attrs.section && decl_attrs.as.object.section) {
     attrs.section = decl_attrs.as.object.section;
   }
@@ -4408,7 +4356,7 @@ void cfree_cg_data_begin(CfreeCg* g, CfreeCgSym cg_sym,
   g->data_size = 0;
   if (sym != OBJ_SYM_NONE) {
     obj_symbol_define(ob, sym, sec, (u64)g->data_base,
-                      (u64)abi_sizeof(c->abi, ty));
+                      (u64)abi_cg_sizeof(c->abi, decl_attrs.type));
   }
 }
 
@@ -4458,7 +4406,7 @@ void cfree_cg_data_int(CfreeCg* g, uint64_t value, CfreeCgTypeId type) {
   if (!g) return;
   ty = resolve_type(g->c, type);
   if (!ty) return;
-  size = (u32)abi_sizeof(g->c->abi, ty);
+  size = (u32)abi_cg_sizeof(g->c->abi, type);
   if (size > sizeof(bytes)) return;
   for (u32 i = 0; i < size; ++i) {
     u32 shift = g->c->target.big_endian ? (size - 1u - i) * 8u : i * 8u;
diff --git a/src/api/cg_api.h b/src/api/cg_api.h
@@ -3,12 +3,12 @@
 
 #include <cfree/cg.h>
 
+#include "api/cg_type.h"
 #include "core/core.h"
 #include "type/type.h"
 
 typedef struct CGTarget CGTarget;
 typedef struct MCEmitter MCEmitter;
-typedef struct CgType CgType;
 typedef uint32_t ObjSymId;
 
 enum {
@@ -21,13 +21,6 @@ enum {
 
 const Type* cg_api_type_resolve(Compiler*, CfreeCgTypeId);
 CfreeCgTypeId cg_api_type_import(Compiler*, const Type*);
-const CgType* cg_type_get(Compiler*, CfreeCgTypeId);
-uint64_t cg_type_size(Compiler*, CfreeCgTypeId);
-uint32_t cg_type_align(Compiler*, CfreeCgTypeId);
-int cg_type_is_int(Compiler*, CfreeCgTypeId);
-int cg_type_is_float(Compiler*, CfreeCgTypeId);
-int cg_type_is_ptr(Compiler*, CfreeCgTypeId);
-int cg_type_is_record(Compiler*, CfreeCgTypeId);
 Compiler* cfree_cg_internal_compiler(CfreeCg*);
 CGTarget* cfree_cg_internal_target(CfreeCg*);
 MCEmitter* cfree_cg_internal_mc(CfreeCg*);
diff --git a/src/api/cg_type.h b/src/api/cg_type.h
@@ -0,0 +1,72 @@
+#ifndef CFREE_API_CG_TYPE_H
+#define CFREE_API_CG_TYPE_H
+
+#include <cfree/cg.h>
+
+#include "core/core.h"
+
+typedef struct CgTypeField {
+  CfreeSym name;
+  CfreeCgTypeId type;
+  u64 offset;
+  u32 align_override;
+} CgTypeField;
+
+typedef struct CgType {
+  CfreeCgTypeKind kind;
+  u64 size;
+  u32 align;
+  u32 pad;
+  union {
+    struct {
+      u32 width;
+    } integer;
+    struct {
+      u32 width;
+    } fp;
+    struct {
+      CfreeCgTypeId pointee;
+      u32 address_space;
+    } ptr;
+    struct {
+      CfreeCgTypeId elem;
+      u64 count;
+    } array;
+    struct {
+      CfreeCgTypeId ret;
+      CfreeCgParam* params;
+      u32 nparams;
+      CfreeCgCallConv call_conv;
+      int abi_variadic;
+      CfreeCgAbiAttrs ret_attrs;
+    } func;
+    struct {
+      CfreeSym tag;
+      CgTypeField* fields;
+      u32 nfields;
+      int is_union;
+      u32 align_override;
+      u32 flags;
+    } record;
+    struct {
+      CfreeSym tag;
+      CfreeCgTypeId base;
+      CfreeCgEnumValue* values;
+      u32 nvalues;
+    } enum_;
+    struct {
+      CfreeSym name;
+      CfreeCgTypeId base;
+    } alias;
+  };
+} CgType;
+
+const CgType* cg_type_get(Compiler*, CfreeCgTypeId);
+uint64_t cg_type_size(Compiler*, CfreeCgTypeId);
+uint32_t cg_type_align(Compiler*, CfreeCgTypeId);
+int cg_type_is_int(Compiler*, CfreeCgTypeId);
+int cg_type_is_float(Compiler*, CfreeCgTypeId);
+int cg_type_is_ptr(Compiler*, CfreeCgTypeId);
+int cg_type_is_record(Compiler*, CfreeCgTypeId);
+
+#endif
diff --git a/src/api/stubs.c b/src/api/stubs.c
@@ -34,8 +34,7 @@ static _Noreturn void unimplemented(Compiler* c, const char* what) {
 /* Preprocessor implementation lives in src/pp/pp.c. */
 
 /* parse_c lives in src/parse/parse.c.  parse_asm lives in
- * src/parse/parse_asm.c.  DeclTable lives in src/decl/decl.c.
- * CG lives in src/cg/cg.c. */
+ * src/parse/parse_asm.c.  DeclTable lives in src/decl/decl.c. */
 
 /* mc_new / mc_free live in src/arch/mc.c.
  * cgtarget_new / cgtarget_finalize / cgtarget_free live in src/arch/<target>.c
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -4,7 +4,8 @@
 #include "abi/abi.h"
 #include "core/core.h"
 #include "obj/obj.h"
-#include "type/type.h"
+
+typedef struct Type Type;
 
 /* Forward-declared so CGTarget can carry an optional Debug* without
  * pulling debug/debug.h into every translation unit that includes arch.h.
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -1,1995 +0,0 @@
-/* Single-pass code generator with a TCC-style value stack.
- *
- * The parser pushes values (lvalues, immediates, register rvalues) and
- * issues operations; cg materializes operands and dispatches to CGTarget.
- * No AST. At -O0 the wrapped target backend is a real CGTarget; at -O1+
- * opt_cgtarget records the same calls into IR for cross-function passes.
- *
- * Value stack semantics:
- *   - SValue.op carries an Operand whose `kind` decides what the value is.
- *   - OPK_IMM / OPK_REG are rvalues (can be consumed by binop/cmp/store).
- *   - OPK_LOCAL / OPK_GLOBAL / OPK_INDIRECT are lvalues. cg_load promotes
- *     them to OPK_REG via target->load + a fresh scratch register.
- *
- * Register pressure & spill:
- *   - Each SValue carries an SResidency tag (INHERENT / REG / SPILLED).
- *     REG-residing SValues own a physical scratch register that must be
- *     released back to the pool when the value is consumed; SPILLED
- *     SValues own a frame slot instead and must be reloaded before use.
- *   - alloc_reg_or_spill is the single allocation entry point. On pool
- *     exhaustion it picks the deepest unpinned RES_REG SValue from the
- *     value stack as the spill victim, evicts its register through
- *     T->spill_reg, and retries. ensure_reg is the dual: it reloads a
- *     SPILLED SValue's register before consumption, possibly evicting
- *     another value to make room.
- *   - Pop sites (binop, cmp, store, branch, call, ...) call release()
- *     to return regs/slots after consumption — there is no statement-
- *     boundary scratch reset; ownership tracking is the discipline.
- *   - cg_call additionally exposes its in-flight CGABIValue array via
- *     CG.avs_in_flight so the spill driver can re-spill an already-
- *     materialized arg's register (rewriting storage to OPK_LOCAL) when
- *     the value stack runs out of victims; this lets calls with more
- *     reg-class args than the pool size can hold lower correctly.
- *
- * Some aggregate and backend-specific intrinsic cases are still limited by
- * their corpus rows. The interface in cg.h is the commitment; this file fills
- * in the slice that's exercised today. */
-
-#include "cg/cg.h"
-
-#include <string.h>
-
-#include "abi/abi.h"
-#include "arch/arch.h"
-#include "cg/fold.h"
-#include "core/arena.h"
-#include "core/core.h"
-#include "core/heap.h"
-#include "core/pool.h"
-#include "debug/debug.h"
-#include "obj/obj.h"
-#include "type/type.h"
-
-/* ============================================================
- * Value stack
- * ============================================================ */
-
-/* Residency: where the value's storage actually lives. INHERENT values
- * (IMM, LOCAL, GLOBAL) carry no register obligation. REG values own a
- * physical scratch register. SPILLED values had their register evicted
- * to a frame slot under register pressure and must be reloaded before
- * consumption. */
-typedef enum SResidency {
-  RES_INHERENT,
-  RES_REG,
-  RES_SPILLED,
-} SResidency;
-
-typedef struct SValue {
-  Operand op;       /* IMM/REG (rvalue) or LOCAL/GLOBAL/INDIRECT (lvalue) */
-  const Type* type; /* C semantic type of the value (post-promotion) */
-  u8 res;           /* SResidency */
-  u8 pinned;        /* 1 = ineligible spill victim (cleared per CG op) */
-  u8 pad[2];
-  FrameSlot spill_slot; /* valid iff res == RES_SPILLED */
-} SValue;
-
-#define CG_STACK_INITIAL 16u
-#define CG_SPILL_FREE_INITIAL 4u
-
-struct CG {
-  Compiler* c;
-  CGTarget* target;
-  Debug* debug;
-  TargetABI* abi;
-  Pool* pool;
-
-  /* Function scope */
-  const CGFuncDesc* fn_desc;
-  ObjSymId fn_sym;
-  ObjSecId fn_text_sec;
-  u32 fn_begin_pos;
-  const Type* fn_ret_type;
-  const ABIFuncInfo* fn_abi;
-
-  SrcLoc cur_loc;
-
-  /* Value stack — grown via heap; arena would also work but heap is fine
-   * since it's freed in cg_free. */
-  SValue* stack;
-  u32 sp;
-  u32 cap;
-
-  /* Per-function spill-slot free-lists, one per RegClass. A spill takes a
-   * slot from the free-list (allocating fresh from the backend if empty);
-   * a reload returns the slot. Frame footprint is bounded by the peak
-   * concurrent spills per class. Reset at func_end. */
-  struct {
-    FrameSlot* free;
-    u32 n;
-    u32 cap;
-  } slot_pools[3]; /* indexed by RegClass; RC_VEC reserved */
-
-  /* Set during cg_call's pop+materialize loop to point at the call's
-   * in-flight CGABIValue array. When alloc_reg_or_spill exhausts the
-   * stack victim list, it falls back to spilling an OPK_REG arg
-   * storage entry from this array, rewriting it to OPK_LOCAL so the
-   * backend's call lowering loads it from the spill slot. NULL outside
-   * cg_call. Without this fallback, calls with more reg-class args
-   * than the pool can hold (e.g. 10+ INT args on aarch64) would have
-   * unreclaimable regs sitting in avs[] while the value stack runs
-   * out of victims. */
-  CGABIValue* avs_in_flight;
-  u32 avs_in_flight_n;
-};
-
-static void stack_grow(CG* g, u32 want) {
-  Heap* h = g->c->env->heap;
-  u32 cap = g->cap;
-  SValue* nb;
-  if (cap >= want) return;
-  while (cap < want) cap = cap ? cap * 2u : CG_STACK_INITIAL;
-  nb = (SValue*)h->alloc(h, sizeof(SValue) * cap, _Alignof(SValue));
-  if (g->stack) {
-    memcpy(nb, g->stack, sizeof(SValue) * g->sp);
-    h->free(h, g->stack, sizeof(SValue) * g->cap);
-  }
-  g->stack = nb;
-  g->cap = cap;
-}
-
-static void push(CG* g, SValue v) {
-  stack_grow(g, g->sp + 1);
-  g->stack[g->sp++] = v;
-}
-
-/* __int128 / unsigned __int128 are parsed and laid out, but no backend
- * implements arithmetic, conversion, or load/store on them yet. Trip a
- * clear panic at the first codegen op that would touch the value. */
-static void reject_int128(CG* g, const Type* ty, const char* where) {
-  if (ty && (ty->kind == TY_INT128 || ty->kind == TY_UINT128)) {
-    compiler_panic(g->c, g->cur_loc,
-                   "%s: __int128 codegen not implemented", where);
-  }
-}
-
-static SValue pop(CG* g) {
-  if (g->sp == 0) {
-    compiler_panic(g->c, g->cur_loc, "cg: stack underflow");
-  }
-  return g->stack[--g->sp];
-}
-
-
-/* Residency of an Operand as it should land on the stack at construction
- * time. REG → owns a register; INDIRECT → owns its base register; the
- * rest carry no register obligation. */
-static u8 residency_for(const Operand* o) {
-  if (o->kind == OPK_REG) return RES_REG;
-  if (o->kind == OPK_INDIRECT) return RES_REG;
-  return RES_INHERENT;
-}
-
-static SValue make_sv(Operand op, const Type* ty) {
-  SValue sv;
-  memset(&sv, 0, sizeof sv);
-  sv.op = op;
-  sv.type = ty;
-  sv.res = residency_for(&op);
-  sv.spill_slot = FRAME_SLOT_NONE;
-  return sv;
-}
-
-/* ============================================================
- * Operand sugar
- * ============================================================ */
-
-static u8 type_class(const Type* ty) {
-  if (ty && (ty->kind == TY_FLOAT || ty->kind == TY_DOUBLE ||
-             ty->kind == TY_LDOUBLE)) {
-    return RC_FP;
-  }
-  return RC_INT;
-}
-
-static Operand op_imm(i64 v, const Type* ty) {
-  Operand o;
-  memset(&o, 0, sizeof o);
-  o.kind = OPK_IMM;
-  o.cls = type_class(ty);
-  o.type = ty;
-  o.v.imm = v;
-  return o;
-}
-
-static Operand op_reg(Reg r, const Type* ty) {
-  Operand o;
-  memset(&o, 0, sizeof o);
-  o.kind = OPK_REG;
-  o.cls = type_class(ty);
-  o.type = ty;
-  o.v.reg = r;
-  return o;
-}
-
-static Operand op_local(FrameSlot s, const Type* ty) {
-  Operand o;
-  memset(&o, 0, sizeof o);
-  o.kind = OPK_LOCAL;
-  o.cls = RC_INT;
-  o.type = ty;
-  o.v.frame_slot = s;
-  return o;
-}
-
-static Operand op_global(ObjSymId sym, i64 addend, const Type* ty) {
-  Operand o;
-  memset(&o, 0, sizeof o);
-  o.kind = OPK_GLOBAL;
-  o.cls = RC_INT;
-  o.type = ty;
-  o.v.global.sym = sym;
-  o.v.global.addend = addend;
-  return o;
-}
-
-/* ============================================================
- * MemAccess derivation
- * ============================================================ */
-
-static MemAccess derive_mem(CG* g, const Type* ty, AliasKind alias_kind,
-                            i32 alias_local) {
-  MemAccess m;
-  memset(&m, 0, sizeof m);
-  m.type = ty;
-  m.size = abi_sizeof(g->abi, ty);
-  m.align = abi_alignof(g->abi, ty);
-  m.flags = MF_NONE;
-  if (ty && (ty->qual & Q_VOLATILE)) m.flags |= MF_VOLATILE;
-  if (ty && (ty->qual & Q_ATOMIC)) m.flags |= MF_ATOMIC;
-  m.alias.kind = (u8)alias_kind;
-  if (alias_kind == ALIAS_LOCAL) {
-    m.alias.v.local_id = alias_local;
-  }
-  return m;
-}
-
-/* Pick an alias root from an lvalue Operand. */
-static AliasKind alias_for_lvalue(const Operand* o) {
-  switch (o->kind) {
-    case OPK_LOCAL:
-      return ALIAS_LOCAL;
-    case OPK_GLOBAL:
-      return ALIAS_GLOBAL;
-    case OPK_INDIRECT:
-    default:
-      return ALIAS_UNKNOWN;
-  }
-}
-
-/* MemAccess for a load/store through an lvalue Operand. Routes through
- * derive_mem with the alias root and any local-id taken from the lvalue
- * itself. The Operand must be OPK_LOCAL/GLOBAL/INDIRECT. */
-static MemAccess mem_for_lvalue(CG* g, const Operand* lv, const Type* ty) {
-  AliasKind ak = alias_for_lvalue(lv);
-  i32 al = (ak == ALIAS_LOCAL) ? (i32)lv->v.frame_slot : 0;
-  return derive_mem(g, ty, ak, al);
-}
-
-/* C type carried by an SValue, with a fallback to the Operand's own type
- * field for SValues whose `type` was never set (notably the bare-slot
- * cg_push_local path). The two should agree when both present. */
-static const Type* sv_type(const SValue* sv) {
-  return sv->type ? sv->type : sv->op.type;
-}
-
-/* Build an OPK_INDIRECT Operand for [base + ofs] of type `ty`. The base
- * register is always RC_INT (it holds a pointer). */
-static Operand op_indirect(Reg base, i32 ofs, const Type* ty) {
-  Operand o;
-  memset(&o, 0, sizeof o);
-  o.kind = OPK_INDIRECT;
-  o.cls = RC_INT;
-  o.type = ty;
-  o.v.ind.base = base;
-  o.v.ind.ofs = ofs;
-  return o;
-}
-
-/* ============================================================
- * Register-pool & spill driver
- * ============================================================ */
-
-/* Class an SValue's register lives in: RC_FP for float types, RC_INT for
- * everything else (including OPK_INDIRECT base regs which always hold a
- * pointer-sized integer). */
-static u8 class_of_sv(const SValue* sv) {
-  if (sv->op.kind == OPK_INDIRECT) return RC_INT;
-  return type_class(sv_type(sv));
-}
-
-/* The C type whose width matches the register an SValue currently owns.
- * For OPK_INDIRECT, that's a pointer to the lvalue's type (the base reg
- * holds an address); for everything else, the SValue's plain type. Used
- * by the spill/reload machinery to size the slot store/load. */
-static const Type* sv_owned_reg_type(CG* g, const SValue* sv) {
-  if (sv->op.kind == OPK_INDIRECT) {
-    return type_ptr(g->pool, sv->type ? sv->type : type_void(g->pool));
-  }
-  return sv_type(sv);
-}
-
-static u8 reg_of_sv(const SValue* sv) {
-  if (sv->op.kind == OPK_REG) return (u8)sv->op.v.reg;
-  if (sv->op.kind == OPK_INDIRECT) return (u8)sv->op.v.ind.base;
-  return 0;
-}
-
-/* Write the register an SValue owns. For OPK_INDIRECT the ind.base
- * field; for OPK_REG the v.reg field. Used by ensure_reg to restore
- * the reloaded reg, and by alloc_reg_or_spill to mark a freshly-
- * spilled SValue's reg slot as REG_NONE so any stale read fails fast. */
-static void set_owned_reg(SValue* sv, Reg r) {
-  if (sv->op.kind == OPK_INDIRECT) {
-    sv->op.v.ind.base = r;
-  } else {
-    sv->op.v.reg = r;
-  }
-}
-
-/* MemAccess for a spill/reload. Width is driven by the type of the value
- * whose register is moving — pointer-sized for OPK_INDIRECT bases,
- * otherwise the value's own type. The alias root is ALIAS_LOCAL with
- * id 0: spill traffic is internal CG bookkeeping that opt's alias
- * analysis never inspects, and the slot itself disambiguates accesses. */
-static MemAccess mem_for_spill(CG* g, const SValue* sv) {
-  return derive_mem(g, sv_owned_reg_type(g, sv), ALIAS_LOCAL, 0);
-}
-
-/* Per-class spill-slot size in bytes. FP slots are 16 bytes to cover
- * `double` and the spilled portion of `long double`; INT slots are
- * pointer-width. Both values are also the slot's alignment. */
-static u32 spill_slot_size(u8 cls) { return (cls == RC_FP) ? 16u : 8u; }
-
-/* Take a spill slot from the per-class free-list, or allocate a fresh one. */
-static FrameSlot take_spill_slot(CG* g, u8 cls) {
-  if (g->slot_pools[cls].n > 0) {
-    return g->slot_pools[cls].free[--g->slot_pools[cls].n];
-  }
-  FrameSlotDesc d;
-  memset(&d, 0, sizeof d);
-  d.size = spill_slot_size(cls);
-  d.align = d.size;
-  d.kind = FS_SPILL;
-  return g->target->frame_slot(g->target, &d);
-}
-
-static void return_spill_slot(CG* g, FrameSlot s, u8 cls) {
-  if (s == FRAME_SLOT_NONE) return;
-  if (g->slot_pools[cls].n == g->slot_pools[cls].cap) {
-    Heap* h = g->c->env->heap;
-    u32 cap = g->slot_pools[cls].cap;
-    u32 nc = cap ? cap * 2u : CG_SPILL_FREE_INITIAL;
-    FrameSlot* nb =
-        (FrameSlot*)h->alloc(h, sizeof(FrameSlot) * nc, _Alignof(FrameSlot));
-    if (g->slot_pools[cls].free) {
-      memcpy(nb, g->slot_pools[cls].free,
-             sizeof(FrameSlot) * g->slot_pools[cls].n);
-      h->free(h, g->slot_pools[cls].free, sizeof(FrameSlot) * cap);
-    }
-    g->slot_pools[cls].free = nb;
-    g->slot_pools[cls].cap = nc;
-  }
-  g->slot_pools[cls].free[g->slot_pools[cls].n++] = s;
-}
-
-/* Walk the value stack from index 0 upward and return the first
- * unpinned RES_REG SValue that matches `cls`. FIFO-from-bottom == the
- * deepest live value, which matches the intuition that the top of
- * stack is about to be consumed. Pinned entries are skipped — they're
- * the operands of an in-flight CG op that mustn't be evicted from
- * under itself (see cg_dup, which keeps its source on the stack while
- * allocating the duplicate's destination register). Returns NULL if
- * no eligible victim exists; that's a CG bug at this call site. */
-static SValue* pick_victim(CG* g, u8 cls) {
-  for (u32 i = 0; i < g->sp; ++i) {
-    SValue* sv = &g->stack[i];
-    if (sv->res != RES_REG) continue;
-    if (sv->pinned) continue;
-    if (class_of_sv(sv) != cls) continue;
-    return sv;
-  }
-  return NULL;
-}
-
-/* Release the resources owned by a single in-flight CGABIValue arg
- * after the call has returned: REG storage goes back to the reg pool,
- * LOCAL storage produced by spill_avs_victim returns its slot to the
- * per-class spill-slot pool, INDIRECT storage (an aggregate arg reached
- * through a pointer, e.g. `f(p->aggr)`) returns its base reg. IMM and
- * other kinds carry no runtime ownership and need nothing.
- *
- * Aggregate-typed OPK_LOCAL is a borrowed lvalue — the slot belongs to
- * a user local or a stable byval/return frame slot — and must NOT
- * return to the spill pool (size and class mismatch corrupt it). The
- * scalar-vs-aggregate type check is the discriminator since
- * spill_avs_victim only ever spills scalar-typed REG storage. */
-static void release_arg_storage(CG* g, const Operand* st) {
-  if (st->kind == OPK_REG) {
-    g->target->free_reg(g->target, st->v.reg, st->cls);
-  } else if (st->kind == OPK_LOCAL) {
-    const Type* t = st->type;
-    if (t && (t->kind == TY_STRUCT || t->kind == TY_UNION)) return;
-    return_spill_slot(g, st->v.frame_slot, st->cls);
-  } else if (st->kind == OPK_INDIRECT) {
-    g->target->free_reg(g->target, st->v.ind.base, RC_INT);
-  }
-}
-
-/* Try to spill a CGABIValue arg from the in-flight set whose `storage`
- * is an OPK_REG of `cls`. Mutates the entry: storage becomes OPK_LOCAL
- * pointing at a freshly-taken spill slot, and the backend's call
- * lowering will load from there when materializing the arg into its
- * ABI register. Returns 1 if a victim was found and spilled, 0 if none
- * was eligible. Used by alloc_reg_or_spill as a fallback when the
- * value stack has no eligible victim — see CG.avs_in_flight. */
-static int spill_avs_victim(CG* g, u8 cls) {
-  CGTarget* T = g->target;
-  if (!g->avs_in_flight) return 0;
-  for (u32 i = 0; i < g->avs_in_flight_n; ++i) {
-    CGABIValue* av = &g->avs_in_flight[i];
-    if (av->storage.kind != OPK_REG) continue;
-    if (av->storage.cls != cls) continue;
-    FrameSlot slot = take_spill_slot(g, cls);
-    SValue tmp = make_sv(av->storage, av->type);
-    MemAccess ma = mem_for_spill(g, &tmp);
-    T->spill_reg(T, av->storage, slot, ma);
-    /* Rewrite storage to OPK_LOCAL so the backend reads from the slot.
-     * The cls is preserved on the operand so the cleanup loop knows
-     * which spill-slot pool to return the slot to. */
-    Operand local = op_local(slot, av->type);
-    local.cls = cls;
-    av->storage = local;
-    return 1;
-  }
-  return 0;
-}
-
-/* Allocate a register; on pool exhaustion, spill the deepest live
- * RES_REG value to a frame slot and try again. If the value stack has
- * no eligible victim, fall back to spilling an in-flight cg_call arg
- * (see avs_in_flight) — without that fallback, a call with more args
- * than the pool size can hold becomes unsatisfiable since the popped-
- * but-not-yet-emitted arg regs would sit unreclaimable in avs[]. */
-static Reg alloc_reg_or_spill(CG* g, u8 cls, const Type* ty) {
-  CGTarget* T = g->target;
-  Reg r = T->alloc_reg(T, cls, ty);
-  if (r != (Reg)REG_NONE) return r;
-
-  SValue* victim = pick_victim(g, cls);
-  if (victim) {
-    FrameSlot slot = take_spill_slot(g, cls);
-    Operand victim_reg = op_reg((Reg)reg_of_sv(victim), victim->type);
-    T->spill_reg(T, victim_reg, slot, mem_for_spill(g, victim));
-
-    victim->spill_slot = slot;
-    victim->res = RES_SPILLED;
-    /* Mark the reg slot as REG_NONE so any stale read fails fast; the
-     * reload via ensure_reg will write the new reg back into the same
-     * field. */
-    set_owned_reg(victim, (Reg)REG_NONE);
-  } else if (!spill_avs_victim(g, cls)) {
-    compiler_panic(g->c, g->cur_loc,
-                   "cg: regalloc — no spillable victim (class %u)",
-                   (unsigned)cls);
-  }
-
-  r = T->alloc_reg(T, cls, ty);
-  if (r == (Reg)REG_NONE) {
-    compiler_panic(g->c, g->cur_loc,
-                   "cg: regalloc — class %u still empty after spill",
-                   (unsigned)cls);
-  }
-  return r;
-}
-
-/* Reload a spilled SValue back into a register (possibly evicting another
- * live value). After return, `sv->res == RES_REG` and the operand reflects
- * the reloaded register. INDIRECT lvalues are restored to OPK_INDIRECT
- * with a fresh base; the deferred-load identity is preserved. */
-static void ensure_reg(CG* g, SValue* sv) {
-  if (sv->res != RES_SPILLED) return;
-  CGTarget* T = g->target;
-  u8 cls = class_of_sv(sv);
-  const Type* ty = sv_owned_reg_type(g, sv);
-  Reg r = alloc_reg_or_spill(g, cls, ty);
-  T->reload_reg(T, op_reg(r, ty), sv->spill_slot, mem_for_spill(g, sv));
-  return_spill_slot(g, sv->spill_slot, cls);
-  sv->spill_slot = FRAME_SLOT_NONE;
-  /* For INDIRECT, the lvalue's deferred-load identity is preserved: only
-   * the base reg changes. For everything else, refresh the whole REG
-   * operand so its `type` matches the SValue's current type tag. */
-  if (sv->op.kind == OPK_INDIRECT) {
-    sv->op.v.ind.base = r;
-  } else {
-    sv->op = op_reg(r, sv_type(sv));
-  }
-  sv->res = RES_REG;
-}
-
-/* Release any register or spill slot owned by an SValue popped off the
- * stack and not consumed by a downstream operation. */
-static void release(CG* g, SValue* sv) {
-  if (sv->res == RES_REG) {
-    g->target->free_reg(g->target, (Reg)reg_of_sv(sv), class_of_sv(sv));
-  } else if (sv->res == RES_SPILLED) {
-    return_spill_slot(g, sv->spill_slot, class_of_sv(sv));
-    sv->spill_slot = FRAME_SLOT_NONE;
-  }
-  sv->res = RES_INHERENT;
-}
-
-/* ============================================================
- * Construction
- * ============================================================ */
-
-CG* cg_new(Compiler* c, CGTarget* t, Debug* d) {
-  Heap* h = c->env->heap;
-  CG* g = (CG*)h->alloc(h, sizeof(CG), _Alignof(CG));
-  memset(g, 0, sizeof *g);
-  g->c = c;
-  g->target = t;
-  g->debug = d;
-  g->abi = c->abi;
-  g->pool = c->global;
-  /* Wire Debug into the backend so per-instruction emit calls can attribute
-   * line rows. cg owns this hookup per DESIGN §11. */
-  if (t) t->debug = d;
-  if (t && t->mc) t->mc->debug = d;
-  return g;
-}
-
-void cg_free(CG* g) {
-  Heap* h;
-  if (!g) return;
-  h = g->c->env->heap;
-  if (g->stack) h->free(h, g->stack, sizeof(SValue) * g->cap);
-  for (u32 c = 0; c < 3; ++c) {
-    if (g->slot_pools[c].free) {
-      h->free(h, g->slot_pools[c].free,
-              sizeof(FrameSlot) * g->slot_pools[c].cap);
-    }
-  }
-  h->free(h, g, sizeof *g);
-}
-
-CGTarget* cg_target(CG* g) {
-  return g ? g->target : NULL;
-}
-
-/* ============================================================
- * Function lifecycle
- * ============================================================ */
-
-void cg_func_begin(CG* g, const CGFuncDesc* fd) {
-  CGTarget* T = g->target;
-  g->fn_desc = fd;
-  g->fn_sym = fd->sym;
-  g->fn_text_sec = fd->text_section_id;
-  g->fn_ret_type = fd->fn_type ? fd->fn_type->fn.ret : NULL;
-  g->fn_abi = fd->abi;
-  g->sp = 0;
-  /* Per-function spill-slot free-lists reset. The backing arrays are
-   * reused; only the counts go to zero since slot ids belong to the new
-   * function's frame. */
-  for (u32 c = 0; c < 3; ++c) g->slot_pools[c].n = 0;
-
-  /* Class-1 DWARF: a new subprogram opens. doc/DWARF.md §3.1 makes this
-   * the parser's job; we forward through cg as a convenience hook. */
-  if (g->debug) {
-    debug_func_begin(g->debug, fd->sym, DEBUG_TYPE_NONE, fd->loc);
-  }
-
-  g->fn_begin_pos = T->mc ? T->mc->pos(T->mc) : 0u;
-  T->func_begin(T, fd);
-}
-
-void cg_func_end(CG* g) {
-  CGTarget* T = g->target;
-  T->func_end(T);
-  if (g->debug && T->mc) {
-    u32 end_pos = T->mc->pos(T->mc);
-    debug_func_pc_range(g->debug, g->fn_text_sec, g->fn_begin_pos, end_pos);
-    debug_func_end(g->debug);
-  }
-  g->fn_desc = NULL;
-}
-
-/* ============================================================
- * Locals / parameters
- * ============================================================ */
-
-FrameSlot cg_local(CG* g, const FrameSlotDesc* d) {
-  return g->target->frame_slot(g->target, d);
-}
-
-void cg_param(CG* g, const CGParamDesc* d) { g->target->param(g->target, d); }
-
-void cg_bind_decl(CG* g, DeclId id) {
-  /* Decl binding is parser territory at this slice; nothing for cg to do. */
-  (void)g;
-  (void)id;
-}
-
-/* ============================================================
- * Pushes
- * ============================================================ */
-
-void cg_push_int(CG* g, i64 v, const Type* ty) {
-  push(g, make_sv(op_imm(v, ty), ty));
-}
-
-void cg_push_const(CG* g, ConstBytes cb) {
-  /* Materialize into a fresh register through target->load_const so the
-   * stack value is plain rvalue REG. The constant pool / immediate-encoding
-   * choice is the backend's. */
-  CGTarget* T = g->target;
-  Reg r = alloc_reg_or_spill(g, type_class(cb.type), cb.type);
-  Operand dst = op_reg(r, cb.type);
-  T->load_const(T, dst, cb);
-  push(g, make_sv(dst, cb.type));
-}
-
-void cg_push_float(CG* g, double v, const Type* ty) {
-  /* Convenience path that sidesteps exact-bit literal materialization.
-   * Conforming literal parsing should prefer cg_push_const. */
-  CGTarget* T = g->target;
-  union {
-    double d;
-    float f;
-    u8 b[8];
-  } u;
-  ConstBytes cb;
-  /* `long double` (binary128 on AAPCS64) needs the rt soft-float helpers
-   * — `__floatsitf`, `__extenddftf2`, `__addtf3`, ... — which cg does
-   * not yet route through. Refuse to silently lower a TF literal as a
-   * narrower precision; the caller has miscategorized the type or is
-   * ahead of the wiring. */
-  if (ty && ty->kind == TY_LDOUBLE) {
-    compiler_panic(g->c, g->cur_loc,
-                   "cg_push_float: long double (binary128) literal needs "
-                   "rt soft-float wiring (rt/lib/fp_tf); not yet routed "
-                   "through cg");
-  }
-  cb.type = ty;
-  cb.size = abi_sizeof(g->abi, ty);
-  cb.align = abi_alignof(g->abi, ty);
-  if (ty && ty->kind == TY_FLOAT) {
-    u.f = (float)v;
-  } else {
-    u.d = v;
-  }
-  cb.bytes = u.b;
-  cg_push_const(g, cb);
-  (void)T;
-}
-
-void cg_push_str(CG* g, Sym str_id, const Type* ty) {
-  /* Place the string bytes in .rodata and push a pointer. v1 unused by
-   * the spine corpus; left as a clean stub. */
-  (void)g;
-  (void)str_id;
-  (void)ty;
-  compiler_panic(g->c, g->cur_loc, "cg_push_str: not implemented in v1 slice");
-}
-
-void cg_push_local(CG* g, FrameSlot s) {
-  /* The slot's type isn't recorded in cg directly — we trust the parser's
-   * declared local type. Spine: local types come back through the parser's
-   * scope record, not through cg, so the push uses NULL type and the
-   * subsequent cg_load supplies the right type. The parser actually pushes
-   * via the type-aware variant; this base entry is here for completeness. */
-  push(g, make_sv(op_local(s, NULL), NULL));
-}
-
-/* Type-aware variants used by the parser. Not in the public header; the
- * parser calls these directly via a small extension below. */
-void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty);
-void cg_push_local_typed(CG* g, FrameSlot s, const Type* ty) {
-  push(g, make_sv(op_local(s, ty), ty));
-}
-
-/* Pop a pointer rvalue and push an OPK_INDIRECT lvalue for the pointee.
- * The parser uses this to implement unary `*`. The pointer is materialized
- * into a register; the resulting lvalue's MemAccess alias root is unknown
- * (not LOCAL/GLOBAL), which is the right conservative answer for *ptr. */
-static Operand force_reg(CG* g, SValue* v, const Type* ty);
-void cg_deref(CG* g, const Type* pointee_ty);
-void cg_deref(CG* g, const Type* pointee_ty) {
-  SValue v = pop(g);
-  /* The pointer reg becomes the new lvalue's base — ownership transfers
-   * from `v` to the new INDIRECT SValue, so no release on `v`. */
-  Operand src = force_reg(g, &v, sv_type(&v));
-  push(g, make_sv(op_indirect(src.v.reg, 0, pointee_ty), pointee_ty));
-}
-
-/* Read the type of the value currently on top of the stack without popping.
- * The parser uses this for type-driven dispatch (e.g. function-call lowering
- * needs the callee's TY_FUNC) without re-deriving from its own state. */
-const Type* cg_top_type(CG* g);
-const Type* cg_top_type(CG* g) {
-  if (g->sp == 0) return NULL;
-  return g->stack[g->sp - 1].type;
-}
-
-/* Type of the second-from-top SValue. Used by the parser when both operands
- * of a binary operator are already on the stack and it needs to pick a
- * pointer-arithmetic vs. integer-arithmetic lowering. */
-const Type* cg_top2_type(CG* g);
-const Type* cg_top2_type(CG* g) {
-  if (g->sp < 2) return NULL;
-  return g->stack[g->sp - 2].type;
-}
-
-/* Replace the type tag on the top SValue without emitting code. Used by
- * the parser for casts that are no-ops at the value level (e.g. pointer-
- * to-pointer of the same width); the underlying register/operand stays
- * the same, only the C type the parser/backend will read changes. */
-void cg_retag_top(CG* g, const Type* ty);
-void cg_retag_top(CG* g, const Type* ty) {
-  if (g->sp == 0) return;
-  g->stack[g->sp - 1].type = ty;
-  g->stack[g->sp - 1].op.type = ty;
-}
-
-void cg_push_global(CG* g, ObjSymId sym, const Type* ty) {
-  /* TLS storage isn't reachable via a single (ADRP+ADD)-style addressing
-   * mode: the access sequence is multi-instruction (LE: tpidr_el0 + tprel
-   * relocs; macho: TLV descriptor call). Materialize the per-thread
-   * address eagerly through target->tls_addr_of and push it as an
-   * OPK_INDIRECT lvalue so subsequent load/store/addr_of paths emit the
-   * normal indirect sequence rather than the OPK_GLOBAL path. */
-  CGTarget* T = g->target;
-  const ObjSym* os = obj_symbol_get(T->obj, sym);
-  if (os && os->kind == SK_TLS) {
-    const Type* pty = type_ptr(g->pool, ty);
-    Reg r = alloc_reg_or_spill(g, RC_INT, pty);
-    Operand dst = op_reg(r, pty);
-    T->tls_addr_of(T, dst, sym, 0);
-    push(g, make_sv(op_indirect(r, 0, ty), ty));
-    return;
-  }
-  push(g, make_sv(op_global(sym, 0, ty), ty));
-}
-
-/* ============================================================
- * Stack manipulation
- * ============================================================ */
-
-void cg_dup(CG* g) {
-  /* Duplicate the top SValue. INHERENT values (IMM/LOCAL/GLOBAL) carry
-   * no register ownership — copying the SValue is enough. REG-owning
-   * values must materialize into a second register so each side of the
-   * dup can be released independently; the contents come over via
-   * target->copy.
-   *
-   * The source stays on the stack while the destination register is
-   * allocated, which means alloc_reg_or_spill could in principle pick
-   * the source as the spill victim if pool pressure is high and there
-   * is no other eligible RES_REG on the stack. Pin the source for the
-   * duration to keep pick_victim away from it. */
-  CGTarget* T = g->target;
-  if (g->sp == 0) compiler_panic(g->c, g->cur_loc, "cg_dup: stack empty");
-  SValue* top_p = &g->stack[g->sp - 1];
-  ensure_reg(g, top_p);
-  SValue v = *top_p; /* snapshot AFTER reload — v.op now reflects fresh reg */
-  if (v.res != RES_REG) {
-    push(g, v);
-    return;
-  }
-  top_p->pinned = 1;
-  const Type* ty = sv_owned_reg_type(g, &v);
-  Reg r = alloc_reg_or_spill(g, class_of_sv(&v), ty);
-  T->copy(T, op_reg(r, ty), op_reg((Reg)reg_of_sv(&v), ty));
-  /* Refresh the stack pointer: alloc_reg_or_spill above may have spilled
-   * a different stack entry (the top was pinned, so it stayed put), but
-   * the SValue array address could have been reallocated by a future
-   * grow. Today no path inside copy/spill grows the stack, but reading
-   * through the original `top_p` after a potential realloc would be UB
-   * regardless. */
-  g->stack[g->sp - 1].pinned = 0;
-  /* The duplicate is `v` with its owned reg replaced by `r`. set_owned_reg
-   * writes into ind.base for INDIRECT lvalues or v.reg for REG rvalues —
-   * either way the duplicate ends up RES_REG with the freshly-copied
-   * value, independent of the source. */
-  SValue dup = v;
-  set_owned_reg(&dup, r);
-  dup.res = RES_REG;
-  dup.pinned = 0;
-  dup.spill_slot = FRAME_SLOT_NONE;
-  push(g, dup);
-}
-
-void cg_swap(CG* g) {
-  SValue a;
-  SValue b;
-  if (g->sp < 2) compiler_panic(g->c, g->cur_loc, "cg_swap: need 2 values");
-  a = g->stack[g->sp - 1];
-  b = g->stack[g->sp - 2];
-  g->stack[g->sp - 1] = b;
-  g->stack[g->sp - 2] = a;
-}
-
-void cg_drop(CG* g) {
-  SValue v = pop(g);
-  release(g, &v);
-}
-
-/* ============================================================
- * load / store / addr
- * ============================================================ */
-
-static int is_lvalue(const Operand* o) {
-  return o->kind == OPK_LOCAL || o->kind == OPK_GLOBAL ||
-         o->kind == OPK_INDIRECT;
-}
-
-void cg_load(CG* g) {
-  SValue v = pop(g);
-  ensure_reg(g, &v);
-  if (!is_lvalue(&v.op)) {
-    /* Already an rvalue: passing it through unchanged is correct (cg_load
-     * is idempotent on rvalues so the parser can call it eagerly). */
-    push(g, v);
-    return;
-  }
-  /* force_reg's lvalue branch does exactly what cg_load wants: alloc a
-   * fresh value reg, T->load through the lvalue's MemAccess, free the
-   * old INDIRECT base if any, retag v as RES_REG. */
-  const Type* ty = sv_type(&v);
-  reject_int128(g, ty, "cg_load");
-  Operand dst = force_reg(g, &v, ty);
-  push(g, make_sv(dst, ty));
-}
-
-void cg_addr(CG* g) {
-  SValue v = pop(g);
-  CGTarget* T = g->target;
-  ensure_reg(g, &v);
-  if (!is_lvalue(&v.op)) {
-    compiler_panic(g->c, g->cur_loc, "cg_addr: operand is not an lvalue");
-  }
-  const Type* pty = type_ptr(g->pool, sv_type(&v));
-  Reg r = alloc_reg_or_spill(g, RC_INT, pty);
-  Operand dst = op_reg(r, pty);
-  T->addr_of(T, dst, v.op);
-  release(g, &v);
-  push(g, make_sv(dst, pty));
-}
-
-void cg_store(CG* g) {
-  /* stack: [..., lv, rv] → [..., rv]
-   *
-   * C semantics: the value of an assignment expression is the value
-   * stored. Leaving rv on top of the stack lets the parser fall through
-   * to the next operator naturally; statement-context callers cg_drop
-   * the leftover. */
-  SValue rv = pop(g);
-  SValue lv = pop(g);
-  CGTarget* T = g->target;
-  ensure_reg(g, &rv);
-  ensure_reg(g, &lv);
-  if (!is_lvalue(&lv.op)) {
-    compiler_panic(g->c, g->cur_loc, "cg_store: destination is not an lvalue");
-  }
-  const Type* ty = sv_type(&lv);
-  reject_int128(g, ty, "cg_store");
-  /* IMM is a legal source for store; otherwise force the rvalue into a
-   * register. force_reg handles the lvalue → REG transition cleanly. */
-  Operand src;
-  if (rv.op.kind == OPK_IMM || rv.op.kind == OPK_REG) {
-    src = rv.op;
-  } else {
-    src = force_reg(g, &rv, sv_type(&rv));
-  }
-  T->store(T, lv.op, src, mem_for_lvalue(g, &lv.op, ty));
-  release(g, &lv);
-  /* Result of assignment expression: leave the stored rvalue on top.
-   * Ownership of any reg in `src` transfers to the new SValue. */
-  push(g, make_sv(src, ty));
-}
-
-/* ============================================================
- * Aggregates / bitfields — placeholders
- * ============================================================ */
-
-void cg_copy_aggregate(CG* g, AggregateAccess a) {
-  (void)a;
-  compiler_panic(g->c, g->cur_loc, "cg_copy_aggregate: not in v1 slice");
-}
-void cg_set_aggregate(CG* g, AggregateAccess a) {
-  (void)a;
-  compiler_panic(g->c, g->cur_loc, "cg_set_aggregate: not in v1 slice");
-}
-void cg_bitfield_load(CG* g, BitFieldAccess b) {
-  (void)b;
-  compiler_panic(g->c, g->cur_loc, "cg_bitfield_load: not in v1 slice");
-}
-void cg_bitfield_store(CG* g, BitFieldAccess b) {
-  (void)b;
-  compiler_panic(g->c, g->cur_loc, "cg_bitfield_store: not in v1 slice");
-}
-
-/* ============================================================
- * Arithmetic / compare / convert
- * ============================================================ */
-
-/* Like force_reg, but leaves an OPK_IMM SValue alone — the CGTarget
- * contract for binop/unop/cmp accepts IMM sources, so we avoid burning
- * a value-stack register on `x + 3` style sites. The backend decides
- * imm-form vs. materialize per the literal's width. */
-static Operand force_reg_unless_imm(CG* g, SValue* v, const Type* ty);
-
-/* Force an SValue (already popped, by reference) into a register operand
- * of the given type. Mutates `*v` so that v->op is OPK_REG and v->res is
- * RES_REG; on lvalue inputs this means the original lvalue's base reg is
- * freed and replaced by the freshly-loaded value reg. The caller can
- * then release(g, v) to give the register back when the operation is
- * done with it. */
-static Operand force_reg(CG* g, SValue* v, const Type* ty) {
-  CGTarget* T = g->target;
-  ensure_reg(g, v);
-  if (v->op.kind == OPK_REG) return v->op;
-  Reg r = alloc_reg_or_spill(g, type_class(ty), ty);
-  Operand dst = op_reg(r, ty);
-  if (v->op.kind == OPK_IMM) {
-    T->load_imm(T, dst, v->op.v.imm);
-  } else if (is_lvalue(&v->op)) {
-    T->load(T, dst, v->op, mem_for_lvalue(g, &v->op, ty));
-    /* Old INDIRECT base reg is no longer referenced — release it.
-     * INDIRECT bases are always pointer-typed (RC_INT). */
-    if (v->op.kind == OPK_INDIRECT) {
-      T->free_reg(T, v->op.v.ind.base, RC_INT);
-    }
-  } else {
-    compiler_panic(g->c, g->cur_loc, "cg: cannot force operand to register");
-  }
-  v->op = dst;
-  v->res = RES_REG;
-  return dst;
-}
-
-static Operand force_reg_unless_imm(CG* g, SValue* v, const Type* ty) {
-  if (v->op.kind == OPK_IMM) return v->op;
-  return force_reg(g, v, ty);
-}
-
-void cg_binop(CG* g, BinOp op) {
-  /* stack: [a, b] → [a OP b] */
-  SValue b = pop(g);
-  SValue a = pop(g);
-  CGTarget* T = g->target;
-  /* Result type is `a`'s type at this slice (parser already coerced). */
-  const Type* ty = a.type ? a.type : b.type;
-  reject_int128(g, ty, "cg_binop");
-
-  /* Tier 1+2: constant-fold or apply algebraic identities via the
-   * pure fold helper. KEEP_A/KEEP_B re-push the non-constant operand
-   * unchanged after releasing the IMM side (IMM carries no reg/slot
-   * obligation, but the helper is symmetric and a no-op release is
-   * cheap). */
-  {
-    Operand folded;
-    switch (cg_fold_binop(op, a.op, b.op, ty, g->abi, &folded)) {
-      case CG_FOLD_IMM:
-        release(g, &a);
-        release(g, &b);
-        push(g, make_sv(folded, ty));
-        return;
-      case CG_FOLD_KEEP_A:
-        release(g, &b);
-        push(g, a);
-        return;
-      case CG_FOLD_KEEP_B:
-        release(g, &a);
-        push(g, b);
-        return;
-      case CG_FOLD_NONE: break;
-    }
-  }
-
-  /* IMM sources are legal per the binop contract (arch.h) — the backend
-   * picks imm-form vs. materialize. cg_fold_binop has already collapsed
-   * IMM+IMM, so at most one operand here is IMM. */
-  Operand ra = force_reg_unless_imm(g, &a, ty);
-  Operand rb = force_reg_unless_imm(g, &b, ty);
-  Reg rr = alloc_reg_or_spill(g, type_class(ty), ty);
-  Operand dst = op_reg(rr, ty);
-  T->binop(T, op, dst, ra, rb);
-  release(g, &a);
-  release(g, &b);
-  push(g, make_sv(dst, ty));
-}
-
-void cg_unop(CG* g, UnOp op) {
-  SValue a = pop(g);
-  CGTarget* T = g->target;
-  const Type* ty = a.type ? a.type : a.op.type;
-  reject_int128(g, ty, "cg_unop");
-
-  {
-    Operand folded;
-    if (cg_fold_unop(op, a.op, ty, g->abi, &folded) == CG_FOLD_IMM) {
-      release(g, &a);
-      push(g, make_sv(folded, ty));
-      return;
-    }
-  }
-
-  Operand ra = force_reg_unless_imm(g, &a, ty);
-  Reg rr = alloc_reg_or_spill(g, type_class(ty), ty);
-  Operand dst = op_reg(rr, ty);
-  T->unop(T, op, dst, ra);
-  release(g, &a);
-  push(g, make_sv(dst, ty));
-}
-
-void cg_cmp(CG* g, CmpOp op) {
-  /* stack: [a, b] → [i32 result 0/1] */
-  SValue b = pop(g);
-  SValue a = pop(g);
-  CGTarget* T = g->target;
-  const Type* opty = a.type ? a.type : b.type;
-  const Type* i32 = type_prim(g->pool, TY_INT);
-
-  {
-    Operand folded;
-    if (cg_fold_cmp(op, a.op, b.op, i32, g->abi, &folded) == CG_FOLD_IMM) {
-      release(g, &a);
-      release(g, &b);
-      push(g, make_sv(folded, i32));
-      return;
-    }
-  }
-
-  Operand ra = force_reg_unless_imm(g, &a, opty);
-  Operand rb = force_reg_unless_imm(g, &b, opty);
-  Reg rr = alloc_reg_or_spill(g, RC_INT, i32);
-  Operand dst = op_reg(rr, i32);
-  T->cmp(T, op, dst, ra, rb);
-  release(g, &a);
-  release(g, &b);
-  push(g, make_sv(dst, i32));
-}
-
-void cg_inc_dec(CG* g, BinOp op, int post) {
-  /* stack: [lv] → [resultval]. Materialize the in-place update inside cg
-   * because juggling lv + old + new through dup/swap from outside requires
-   * a 3-element rotate the stack API doesn't expose. */
-  CGTarget* T = g->target;
-  SValue lv = pop(g);
-  ensure_reg(g, &lv);
-  if (!is_lvalue(&lv.op)) {
-    compiler_panic(g->c, g->cur_loc,
-                   "cg_inc_dec: target is not an lvalue");
-  }
-  const Type* ty = sv_type(&lv);
-  MemAccess ma = mem_for_lvalue(g, &lv.op, ty);
-
-  /* Load current value into r_old, compute r_new = r_old +/- 1, store back. */
-  Reg r_old = alloc_reg_or_spill(g, type_class(ty), ty);
-  Operand o_old = op_reg(r_old, ty);
-  T->load(T, o_old, lv.op, ma);
-
-  Reg r_new = alloc_reg_or_spill(g, type_class(ty), ty);
-  Operand o_new = op_reg(r_new, ty);
-  T->binop(T, op, o_new, o_old, op_imm(1, ty));
-
-  T->store(T, lv.op, o_new, ma);
-
-  /* Free whichever register is NOT being returned, plus any base reg the
-   * lvalue owned. */
-  T->free_reg(T, post ? r_new : r_old, type_class(ty));
-  release(g, &lv);
-  push(g, make_sv(post ? o_old : o_new, ty));
-}
-
-void cg_convert(CG* g, const Type* dst_ty) {
-  SValue v = pop(g);
-  CGTarget* T = g->target;
-  const Type* sty = v.type ? v.type : v.op.type;
-  reject_int128(g, sty, "cg_convert");
-  reject_int128(g, dst_ty, "cg_convert");
-  ConvKind ck;
-  Operand src;
-  Reg rr;
-  Operand dst;
-  /* Trivial: same type. */
-  if (sty == dst_ty) {
-    push(g, v);
-    return;
-  }
-  /* `long double` (binary128) conversions need the rt soft-float helpers
-   * — `__floatsitf`, `__fixtfsi`, `__extenddftf2`, `__trunctfdf2` — which
-   * cg does not yet emit. Refuse rather than silently miscompile through
-   * the FP convert dispatch below (the aarch64 backend would otherwise
-   * mis-encode a 16-byte operand as a `d` register). */
-  if ((sty && sty->kind == TY_LDOUBLE) ||
-      (dst_ty && dst_ty->kind == TY_LDOUBLE)) {
-    compiler_panic(g->c, g->cur_loc,
-                   "cg_convert: long double (binary128) conversion needs "
-                   "rt soft-float wiring (rt/lib/fp_tf); not yet routed "
-                   "through cg");
-  }
-  /* Pick a ConvKind from src/dst kinds. Same-size same-class integer
-   * reinterprets are bit-identity and reduce to a retag (no instruction);
-   * everything else routes to the backend's convert hook. */
-  {
-    int s_int = type_is_int(sty);
-    int d_int = type_is_int(dst_ty);
-    int s_flt = sty && (sty->kind == TY_FLOAT || sty->kind == TY_DOUBLE ||
-                        sty->kind == TY_LDOUBLE);
-    int d_flt = dst_ty && (dst_ty->kind == TY_FLOAT || dst_ty->kind == TY_DOUBLE ||
-                           dst_ty->kind == TY_LDOUBLE);
-    u32 s_sz = sty ? abi_sizeof(g->abi, sty) : 0;
-    u32 d_sz = dst_ty ? abi_sizeof(g->abi, dst_ty) : 0;
-    int s_signed = sty ? abi_type_info(g->abi, sty).signed_ : 0;
-    int s_ptr = type_is_ptr(sty);
-    int d_ptr = type_is_ptr(dst_ty);
-    /* Pointers are scalar GPR-class values that convert to/from integers
-     * the same way an unsigned of equal width would: same-size is a
-     * retag, narrowing is a TRUNC, widening is a ZEXT. Treat them as int
-     * for the purposes of selecting a ConvKind. */
-    int s_int_or_ptr = s_int || s_ptr;
-    int d_int_or_ptr = d_int || d_ptr;
-    if (s_int_or_ptr && d_int_or_ptr) {
-      if (d_sz < s_sz) {
-        ck = CV_TRUNC;
-      } else if (d_sz > s_sz) {
-        ck = (s_int && s_signed) ? CV_SEXT : CV_ZEXT;
-      } else {
-        /* Same-size reinterpret (signed↔unsigned, ptr↔int, ptr↔ptr). The
-         * bit pattern is unchanged; just retag the C type and push back. */
-        v.type = dst_ty;
-        v.op.type = dst_ty;
-        push(g, v);
-        return;
-      }
-    } else if (s_int && d_flt) {
-      ck = s_signed ? CV_ITOF_S : CV_ITOF_U;
-    } else if (s_flt && d_int) {
-      int d_signed = abi_type_info(g->abi, dst_ty).signed_;
-      ck = d_signed ? CV_FTOI_S : CV_FTOI_U;
-    } else if (s_flt && d_flt) {
-      ck = (d_sz > s_sz) ? CV_FEXT : CV_FTRUNC;
-    } else {
-      ck = CV_BITCAST;
-    }
-  }
-  src = force_reg(g, &v, sty);
-  rr = alloc_reg_or_spill(g, type_class(dst_ty), dst_ty);
-  dst = op_reg(rr, dst_ty);
-  T->convert(T, ck, dst, src);
-  release(g, &v);
-  push(g, make_sv(dst, dst_ty));
-}
-
-/* ============================================================
- * Calls / return
- * ============================================================ */
-
-void cg_call(CG* g, u32 nargs, const Type* fn_type) {
-  /* stack: [..., callee, arg0..argN-1] → [result] (or nothing if void) */
-  CGTarget* T = g->target;
-  const ABIFuncInfo* abi = abi_func_info(g->abi, fn_type);
-  const Type* ret_ty = fn_type->fn.ret;
-  int has_result = ret_ty && ret_ty->kind != TY_VOID;
-
-  if (g->sp < (u32)nargs + 1u) {
-    compiler_panic(g->c, g->cur_loc, "cg_call: stack underflow");
-  }
-  CGABIValue* avs = NULL;
-  if (nargs) {
-    avs = arena_array(g->c->tu, CGABIValue, nargs);
-    memset(avs, 0, sizeof(CGABIValue) * nargs);
-  }
-
-  /* Expose avs to the regalloc fallback. As we pop and materialize args
-   * one at a time, the popped regs accumulate in avs[] off the value
-   * stack, where pick_victim can't reach them. If pressure exhausts the
-   * pool while reloading a spilled arg later in the loop, spill_avs_victim
-   * picks an already-materialized avs entry, stores it to a frame slot,
-   * and rewrites avs[i].storage to OPK_LOCAL — the backend's call
-   * lowering loads from the slot. */
-  g->avs_in_flight = avs;
-  g->avs_in_flight_n = nargs;
-
-  /* Pop args in reverse so we can fill avs[i] in declaration order.
-   * Scalar lvalues materialize into a register through force_reg (which
-   * also frees an old INDIRECT base); OPK_IMM and OPK_REG pass through
-   * so the call sees the same operand. Aggregate args (struct/union)
-   * stay as lvalues — the backend reads each ABI part from
-   * &storage + part->src_offset (DIRECT) or passes the address
-   * itself (INDIRECT/byval). The parser is expected to have left an
-   * OPK_LOCAL/GLOBAL/INDIRECT on the value stack for them. */
-  for (u32 i = 0; i < nargs; ++i) {
-    u32 idx = nargs - 1u - i;
-    SValue arg = pop(g);
-    ensure_reg(g, &arg);
-    int is_vararg = (idx >= abi->nparams);
-    const Type* aty;
-    if (is_vararg) {
-      aty = arg.type ? arg.type : sv_type(&arg);
-    } else {
-      aty = fn_type->fn.params ? fn_type->fn.params[idx] : arg.type;
-    }
-    avs[idx].type = aty;
-    avs[idx].abi = is_vararg ? NULL : &abi->params[idx];
-    int is_aggregate = aty && (aty->kind == TY_STRUCT || aty->kind == TY_UNION);
-    if (is_aggregate) {
-      if (!is_lvalue(&arg.op)) {
-        compiler_panic(g->c, g->cur_loc,
-                       "cg_call: aggregate arg requires an lvalue source "
-                       "(got operand kind %d)",
-                       (int)arg.op.kind);
-      }
-      /* Stamp the operand's type with the aggregate type so
-       * release_arg_storage recognizes this as a borrowed lvalue and
-       * leaves the slot alone. */
-      Operand st = arg.op;
-      st.type = aty;
-      avs[idx].storage = st;
-      avs[idx].size = abi_sizeof(g->abi, aty);
-    } else {
-      avs[idx].storage =
-          is_lvalue(&arg.op) ? force_reg(g, &arg, aty) : arg.op;
-    }
-  }
-
-  SValue callee = pop(g);
-  ensure_reg(g, &callee);
-  /* Direct calls keep the OPK_GLOBAL operand; indirect calls force the
-   * function pointer into a register. */
-  Operand callee_op = (callee.op.kind == OPK_GLOBAL)
-                          ? callee.op
-                          : force_reg(g, &callee, fn_type);
-
-  CGCallDesc desc;
-  memset(&desc, 0, sizeof desc);
-  desc.fn_type = fn_type;
-  desc.abi = abi;
-  desc.callee = callee_op;
-  desc.args = avs;
-  desc.nargs = nargs;
-  desc.flags = CG_CALL_NONE;
-  desc.ret.type = ret_ty;
-  desc.ret.abi = &abi->ret;
-  int ret_is_aggregate =
-      has_result && (ret_ty->kind == TY_STRUCT || ret_ty->kind == TY_UNION);
-  FrameSlot ret_slot = FRAME_SLOT_NONE;
-  if (has_result) {
-    if (ret_is_aggregate) {
-      /* Caller-side home for the return: INDIRECT (sret) writes through
-       * the hidden destination pointer into this slot; DIRECT multi-part
-       * has the backend store each return register at part->src_offset
-       * within it. Either way the parser receives an OPK_LOCAL lvalue. */
-      FrameSlotDesc fsd;
-      memset(&fsd, 0, sizeof fsd);
-      fsd.type = ret_ty;
-      fsd.size = abi_sizeof(g->abi, ret_ty);
-      fsd.align = abi_alignof(g->abi, ret_ty);
-      fsd.kind = FS_LOCAL;
-      fsd.flags = FSF_ADDR_TAKEN;
-      ret_slot = g->target->frame_slot(g->target, &fsd);
-      desc.ret.storage = op_local(ret_slot, ret_ty);
-    } else {
-      Reg r = alloc_reg_or_spill(g, type_class(ret_ty), ret_ty);
-      desc.ret.storage = op_reg(r, ret_ty);
-    }
-  }
-
-  T->call(T, &desc);
-
-  /* Tear down the in-flight arg set: each entry's storage may be a REG
-   * (return to pool) or OPK_LOCAL (a spill slot, return to per-class
-   * free-list). IMMs carry no runtime ownership. */
-  for (u32 i = 0; i < nargs; ++i) {
-    release_arg_storage(g, &avs[i].storage);
-  }
-  g->avs_in_flight = NULL;
-  g->avs_in_flight_n = 0;
-
-  if (callee.op.kind != OPK_GLOBAL) {
-    /* Indirect callees are function pointers; they live in int regs. */
-    T->free_reg(T, callee_op.v.reg, RC_INT);
-  }
-  if (has_result) {
-    push(g, make_sv(desc.ret.storage, ret_ty));
-  }
-}
-
-void cg_tail_call(CG* g, u32 nargs, const Type* fn_type) {
-  /* Sibling-call form, meant to route through cg_call with CG_CALL_TAIL.
-   * Not wired up in the v1 slice: this panics unconditionally. */
-  (void)nargs;
-  (void)fn_type;
-  compiler_panic(g->c, g->cur_loc, "cg_tail_call: not in v1 slice");
-}
-
-void cg_ret(CG* g, int has_value) {
-  CGTarget* T = g->target;
-  const ABIFuncInfo* abi = g->fn_abi;
-  if (!has_value) {
-    T->ret(T, NULL);
-    return;
-  }
-  {
-    SValue v = pop(g);
-    const Type* rty = g->fn_ret_type;
-    int is_aggregate = rty && (rty->kind == TY_STRUCT || rty->kind == TY_UNION);
-    CGABIValue av;
-    memset(&av, 0, sizeof av);
-    av.type = rty;
-    av.abi = &abi->ret;
-    if (is_aggregate) {
-      /* Aggregate return: backend reads parts from the source lvalue
-       * (DIRECT) or memcpys it through the sret pointer (INDIRECT). */
-      if (!is_lvalue(&v.op)) {
-        compiler_panic(g->c, g->cur_loc,
-                       "cg_ret: aggregate return requires an lvalue source "
-                       "(got operand kind %d)",
-                       (int)v.op.kind);
-      }
-      av.storage = v.op;
-      av.storage.type = rty;
-      av.size = abi_sizeof(g->abi, rty);
-      T->ret(T, &av);
-      /* No register/spill obligation to release — the source slot is
-       * borrowed and the underlying lvalue's owner (e.g. the function's
-       * local) cleans up at func_end. */
-      return;
-    }
-    Operand ret_op = force_reg(g, &v, rty);
-    av.storage = ret_op;
-    T->ret(T, &av);
-    release(g, &v);
-  }
-}
-
-/* ============================================================
- * alloca / variadics / setjmp / atomics
- * ============================================================ */
-
-void cg_alloca(CG* g) {
-  /* Pop the size (i64 imm or reg), call CGTarget.alloca_, push the resulting
-   * void* aligned to max_align_t. The 16-byte alignment is the AAPCS64
-   * max_align_t; cg trusts the backend to honor it (aa_alloca_ rounds the
-   * size up to a 16-byte multiple, which is what keeps SP aligned). */
-  CGTarget* T = g->target;
-  SValue sz = pop(g);
-  const Type* void_ptr = type_ptr(g->pool, type_void(g->pool));
-  ensure_reg(g, &sz);
-  Operand sz_op =
-      (sz.op.kind == OPK_IMM) ? sz.op : force_reg(g, &sz, sv_type(&sz));
-  Reg dst_r = alloc_reg_or_spill(g, RC_INT, void_ptr);
-  Operand dst = op_reg(dst_r, void_ptr);
-  T->alloca_(T, dst, sz_op, /*align=*/16);
-  release(g, &sz);
-  push(g, make_sv(dst, void_ptr));
-}
-/* Variadics. Parser pushes &ap (pointer rvalue) before each call; cg pops
- * it as a register operand and forwards to the backend. va_arg additionally
- * allocates a destination register typed by the requested arg type. */
-void cg_va_start_(CG* g) {
-  CGTarget* T = g->target;
-  SValue ap = pop(g);
-  Operand ap_op = force_reg(g, &ap, sv_type(&ap));
-  T->va_start_(T, ap_op);
-  release(g, &ap);
-}
-void cg_va_arg_(CG* g, const Type* t) {
-  CGTarget* T = g->target;
-  SValue ap = pop(g);
-  Operand ap_op = force_reg(g, &ap, sv_type(&ap));
-  Reg dst_r = alloc_reg_or_spill(g, type_class(t), t);
-  Operand dst = op_reg(dst_r, t);
-  T->va_arg_(T, dst, ap_op, t);
-  release(g, &ap);
-  push(g, make_sv(dst, t));
-}
-void cg_va_end_(CG* g) {
-  CGTarget* T = g->target;
-  SValue ap = pop(g);
-  Operand ap_op = force_reg(g, &ap, sv_type(&ap));
-  T->va_end_(T, ap_op);
-  release(g, &ap);
-}
-void cg_va_copy_(CG* g) {
-  CGTarget* T = g->target;
-  /* Parser pushes &dst then &src; pop src first. */
-  SValue src = pop(g);
-  SValue dst = pop(g);
-  Operand src_op = force_reg(g, &src, sv_type(&src));
-  Operand dst_op = force_reg(g, &dst, sv_type(&dst));
-  T->va_copy_(T, dst_op, src_op);
-  release(g, &src);
-  release(g, &dst);
-}
-void cg_setjmp(CG* g) {
-  CGTarget* T = g->target;
-  SValue buf = pop(g);
-  Operand buf_op = force_reg(g, &buf, sv_type(&buf));
-  const Type* int_ty = type_prim(g->pool, TY_INT);
-  Reg dst_r = alloc_reg_or_spill(g, RC_INT, int_ty);
-  Operand dst = op_reg(dst_r, int_ty);
-  T->intrinsic(T, INTRIN_SETJMP, &dst, 1u, &buf_op, 1u);
-  release(g, &buf);
-  push(g, make_sv(dst, int_ty));
-}
-void cg_longjmp(CG* g) {
-  CGTarget* T = g->target;
-  SValue val = pop(g);
-  SValue buf = pop(g);
-  Operand args[2];
-  args[0] = force_reg(g, &buf, sv_type(&buf));
-  args[1] = (val.op.kind == OPK_IMM || val.op.kind == OPK_REG)
-                ? val.op
-                : force_reg(g, &val, sv_type(&val));
-  T->intrinsic(T, INTRIN_LONGJMP, NULL, 0u, args, 2u);
-  release(g, &val);
-  release(g, &buf);
-}
-/* Atomics. The parser pushes the address as a pointer rvalue (typed `T*`)
- * and any value operands as plain rvalues; cg pops them, materializes
- * registers, derives a MemAccess from the pointee type, and dispatches to
- * the backend. MF_ATOMIC is set on the MemAccess so opt sees the access
- * as atomic regardless of any qualifier on the pointee. */
-static const Type* atomic_pointee(CG* g, const Type* pty, const char* who) {
-  if (!pty || pty->kind != TY_PTR) {
-    compiler_panic(g->c, g->cur_loc, "%s: operand is not a pointer", who);
-  }
-  return pty->ptr.pointee;
-}
-
-static MemAccess mem_for_atomic(CG* g, const Type* val_ty) {
-  MemAccess ma = derive_mem(g, val_ty, ALIAS_UNKNOWN, 0);
-  ma.flags |= MF_ATOMIC;
-  return ma;
-}
-
-void cg_atomic_load(CG* g, MemOrder o) {
-  CGTarget* T = g->target;
-  SValue ptr = pop(g);
-  ensure_reg(g, &ptr);
-  const Type* pty = sv_type(&ptr);
-  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_load");
-  Operand addr = force_reg(g, &ptr, pty);
-  Reg dst_r = alloc_reg_or_spill(g, type_class(val_ty), val_ty);
-  Operand dst = op_reg(dst_r, val_ty);
-  T->atomic_load(T, dst, addr, mem_for_atomic(g, val_ty), o);
-  release(g, &ptr);
-  push(g, make_sv(dst, val_ty));
-}
-
-void cg_atomic_store(CG* g, MemOrder o) {
-  CGTarget* T = g->target;
-  SValue val = pop(g);
-  SValue ptr = pop(g);
-  ensure_reg(g, &val);
-  ensure_reg(g, &ptr);
-  const Type* pty = sv_type(&ptr);
-  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_store");
-  Operand addr = force_reg(g, &ptr, pty);
-  Operand src = (val.op.kind == OPK_IMM || val.op.kind == OPK_REG)
-                    ? val.op
-                    : force_reg(g, &val, val_ty);
-  T->atomic_store(T, addr, src, mem_for_atomic(g, val_ty), o);
-  release(g, &val);
-  release(g, &ptr);
-}
-
-void cg_atomic_rmw(CG* g, AtomicOp a, MemOrder o) {
-  CGTarget* T = g->target;
-  SValue val = pop(g);
-  SValue ptr = pop(g);
-  ensure_reg(g, &val);
-  ensure_reg(g, &ptr);
-  const Type* pty = sv_type(&ptr);
-  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_rmw");
-  Operand addr = force_reg(g, &ptr, pty);
-  Operand vop = (val.op.kind == OPK_IMM || val.op.kind == OPK_REG)
-                    ? val.op
-                    : force_reg(g, &val, val_ty);
-  Reg dst_r = alloc_reg_or_spill(g, type_class(val_ty), val_ty);
-  Operand dst = op_reg(dst_r, val_ty);
-  T->atomic_rmw(T, a, dst, addr, vop, mem_for_atomic(g, val_ty), o);
-  release(g, &val);
-  release(g, &ptr);
-  push(g, make_sv(dst, val_ty));
-}
-
-void cg_atomic_cas(CG* g, MemOrder succ, MemOrder fail) {
-  CGTarget* T = g->target;
-  SValue desired = pop(g);
-  SValue expected = pop(g);
-  SValue ptr = pop(g);
-  ensure_reg(g, &desired);
-  ensure_reg(g, &expected);
-  ensure_reg(g, &ptr);
-  const Type* pty = sv_type(&ptr);
-  const Type* val_ty = atomic_pointee(g, pty, "cg_atomic_cas");
-  Operand addr = force_reg(g, &ptr, pty);
-  Operand exp_op = (expected.op.kind == OPK_IMM || expected.op.kind == OPK_REG)
-                       ? expected.op
-                       : force_reg(g, &expected, val_ty);
-  Operand des_op = (desired.op.kind == OPK_IMM || desired.op.kind == OPK_REG)
-                       ? desired.op
-                       : force_reg(g, &desired, val_ty);
-  Reg prior_r = alloc_reg_or_spill(g, type_class(val_ty), val_ty);
-  const Type* i32 = type_prim(g->pool, TY_INT);
-  Reg ok_r = alloc_reg_or_spill(g, RC_INT, i32);
-  Operand prior = op_reg(prior_r, val_ty);
-  Operand ok = op_reg(ok_r, i32);
-  T->atomic_cas(T, prior, ok, addr, exp_op, des_op, mem_for_atomic(g, val_ty),
-                succ, fail);
-  release(g, &desired);
-  release(g, &expected);
-  release(g, &ptr);
-  push(g, make_sv(prior, val_ty));
-  push(g, make_sv(ok, i32));
-}
-
-void cg_fence(CG* g, MemOrder o) { g->target->fence(g->target, o); }
-
-/* One-arg, one-result intrinsic returning C `int`. Used by __builtin_ctz /
- * clz / popcount: the operand drives the width (sf bit on aa64, REX.W on
- * x64, the *W opcode variants on rv64) while the result is always `int`. */
-void cg_intrinsic_unary_to_int(CG* g, IntrinKind kind) {
-  CGTarget* T = g->target;
-  SValue v = pop(g);
-  const Type* arg_ty = sv_type(&v);
-  ensure_reg(g, &v);
-  Operand arg = force_reg(g, &v, arg_ty);
-  const Type* int_ty = type_prim(g->pool, TY_INT);
-  Reg dst_r = alloc_reg_or_spill(g, RC_INT, int_ty);
-  Operand dst = op_reg(dst_r, int_ty);
-  T->intrinsic(T, kind, &dst, 1u, &arg, 1u);
-  release(g, &v);
-  push(g, make_sv(dst, int_ty));
-}
-
-void cg_intrinsic_void(CG* g, IntrinKind kind) {
-  CGTarget* T = g->target;
-  T->intrinsic(T, kind, NULL, 0u, NULL, 0u);
-}
-
-/* ============================================================
- * Control flow — flat labels
- * ============================================================ */
-
-CGLabel cg_label_new(CG* g) { return (CGLabel)g->target->label_new(g->target); }
-
-void cg_label_place(CG* g, CGLabel l) {
-  g->target->label_place(g->target, (Label)l);
-}
-
-void cg_jump(CG* g, CGLabel l) { g->target->jump(g->target, (Label)l); }
-
-void cg_branch_true(CG* g, CGLabel l) {
-  /* Pop i1 and branch if nonzero. v1 synthesizes cmp_branch(CMP_NE, val, 0). */
-  SValue v = pop(g);
-  CGTarget* T = g->target;
-  const Type* ty = v.type ? v.type : type_prim(g->pool, TY_INT);
-  /* Mirror cg_branch_false: a literal condition resolves at compile time. */
-  if (v.op.kind == OPK_IMM) {
-    if (v.op.v.imm != 0) {
-      T->jump(T, (Label)l);
-    }
-    release(g, &v);
-    return;
-  }
-  Operand a = force_reg(g, &v, ty);
-  Operand zero = op_imm(0, ty);
-  T->cmp_branch(T, CMP_NE, a, zero, (Label)l);
-  release(g, &v);
-}
-
-void cg_branch_false(CG* g, CGLabel l) {
-  SValue v = pop(g);
-  CGTarget* T = g->target;
-  const Type* ty = v.type ? v.type : type_prim(g->pool, TY_INT);
-  /* Constant-fold: branch on a known-zero immediate becomes unconditional;
-   * branch on a known-nonzero immediate becomes a no-op. The aarch64
-   * cmp_branch handles immediates too, but folding here keeps the emitted
-   * code clean and lets `if (1) ...` skip the cmp entirely. */
-  if (v.op.kind == OPK_IMM) {
-    if (v.op.v.imm == 0) {
-      T->jump(T, (Label)l);
-    }
-    release(g, &v);
-    return;
-  }
-  {
-    Operand a = force_reg(g, &v, ty);
-    Operand zero = op_imm(0, ty);
-    T->cmp_branch(T, CMP_EQ, a, zero, (Label)l);
-    release(g, &v);
-  }
-}
-
-/* ============================================================
- * Structured control flow — passthrough to target
- * ============================================================ */
-
-CGScope cg_scope_begin(CG* g, CGScopeConfig cfg) {
-  CGScopeDesc d;
-  memset(&d, 0, sizeof d);
-  d.kind = (u8)cfg.kind;
-  d.break_label = (Label)cfg.break_label;
-  d.continue_label = (Label)cfg.continue_label;
-  d.result_type = cfg.result_type;
-  if (cfg.kind == SCOPE_IF) {
-    /* Pop the condition. */
-    SValue v = pop(g);
-    const Type* ty = v.type ? v.type : type_prim(g->pool, TY_INT);
-    d.cond = force_reg(g, &v, ty);
-    /* The cond reg is consumed by the backend's scope_begin emit; once
-     * the comparison/branch is in flight there's no live use, so free
-     * it back to the pool now. */
-    release(g, &v);
-  }
-  return (CGScope)g->target->scope_begin(g->target, &d);
-}
-
-void cg_scope_else(CG* g, CGScope s) {
-  g->target->scope_else(g->target, (CGScope)s);
-}
-
-void cg_scope_end(CG* g, CGScope s) {
-  g->target->scope_end(g->target, (CGScope)s);
-}
-
-void cg_break(CG* g, CGScope s) {
-  g->target->break_to(g->target, (CGScope)s);
-}
-
-void cg_continue(CG* g, CGScope s) {
-  g->target->continue_to(g->target, (CGScope)s);
-}
-
-/* ============================================================
- * Source location
- * ============================================================ */
-
-void cg_set_loc(CG* g, SrcLoc loc) {
-  g->cur_loc = loc;
-  if (g->target->set_loc) g->target->set_loc(g->target, loc);
-  if (g->debug) debug_set_pending_loc(g->debug, loc);
-}
-
-/* ============================================================
- * Inline asm — constraint binder (doc/INLINEASM.md §5).
- *
- * The parser pushed `nin` input SValues onto the value stack in declaration
- * order (the Nth input is at the top). Outputs come back as fresh SValues
- * that the parser assigns to its declared lvalues. AsmConstraint.type
- * carries the bound expression's C type (parser-populated); the binder
- * routes it through alloc_reg + type_class so FP outputs land in RC_FP,
- * pointer outputs keep their pointer type, and narrow types get the right
- * width. Hand-built test constraints (NULL type) fall back to 64-bit int.
- *
- * Constraints handled:
- *   inputs  : "r" (force into REG), "i" (must be IMM),
- *             "m" (materialize an INDIRECT lvalue),
- *             "0".."9" (matching: bind to out_ops[N].v.reg)
- *   outputs : "=r" (alloc fresh), "+r" (alloc fresh; expects a parallel
- *             matching input slot), "=&r" (early-clobber: alloc disjoint
- *             from any input reg)
- * Clobbers:
- *   "memory" — spill all live RES_REG SValues so subsequent reads reload.
- *   register names — passed through to target->asm_block (the arch backend
- *     routes them through its call-clobber set).
- *   "cc" — silently ignored on aarch64 (NZCV is reserved across blocks). */
-
-/* Parse a leading non-negative decimal index from a constraint string.
- * Returns -1 if the first character isn't a digit. */
-static int asm_parse_match_index(const char* s) {
-  if (!s || s[0] < '0' || s[0] > '9') return -1;
-  int n = 0;
-  for (const char* p = s; *p >= '0' && *p <= '9'; ++p) {
-    n = n * 10 + (*p - '0');
-  }
-  return n;
-}
-
-/* Skip a leading "=&" / "=" / "+" / "&" modifier prefix and return a
- * pointer past it. The remainder is the body letter ("r", "m", ...). */
-static const char* asm_constraint_body(const char* s) {
-  if (!s) return "";
-  if (s[0] == '=' && s[1] == '&') return s + 2;
-  if (s[0] == '=' || s[0] == '+' || s[0] == '&') return s + 1;
-  return s;
-}
-
-static int asm_is_early_clobber(const char* s) {
-  if (!s) return 0;
-  if (s[0] == '=' && s[1] == '&') return 1;
-  if (s[0] == '&') return 1;
-  return 0;
-}
-
-void cg_inline_asm(CG* g, const char* tmpl, const AsmConstraint* outs, u32 nout,
-                   const AsmConstraint* ins, u32 nin, const Sym* clobbers,
-                   u32 nclob) {
-  CGTarget* T = g->target;
-  Heap* h = g->c->env->heap;
-  /* Fallback for hand-built test constraints that don't carry a type. The
-   * parser always populates AsmConstraint.type from the bound expression's
-   * C type; only unit-test constraints leave it NULL. */
-  const Type* fallback_ty = type_prim(g->pool, TY_LLONG);
-
-  /* ---- pop inputs in reverse, store in declaration order ---- */
-  SValue* in_svs = NULL;
-  if (nin) {
-    in_svs = (SValue*)h->alloc(h, sizeof(SValue) * nin, _Alignof(SValue));
-    for (u32 i = 0; i < nin; ++i) {
-      u32 idx = nin - 1u - i;
-      in_svs[idx] = pop(g);
-      ensure_reg(g, &in_svs[idx]);
-    }
-  }
-
-  Operand* in_ops = NULL;
-  if (nin) {
-    in_ops = (Operand*)h->alloc(h, sizeof(Operand) * nin, _Alignof(Operand));
-    memset(in_ops, 0, sizeof(Operand) * nin);
-  }
-  Operand* out_ops = NULL;
-  if (nout) {
-    out_ops = (Operand*)h->alloc(h, sizeof(Operand) * nout, _Alignof(Operand));
-    memset(out_ops, 0, sizeof(Operand) * nout);
-  }
-  /* Tracks whether each out_ops[i] reg was freshly allocated (and should be
-   * pushed back as RES_REG owning that reg) vs. shared with an input that
-   * still owns the reg. */
-  u8* out_reg_owned = NULL;
-  if (nout) {
-    out_reg_owned = (u8*)h->alloc(h, nout, 1);
-    memset(out_reg_owned, 0, nout);
-  }
-
-  /* ---- Pass 1: allocate output regs that are NOT early-clobber. ----
-   * Early-clobber (=&r) outputs are allocated in pass 3 once input regs
-   * are known so the disjoint-set property is checkable. */
-  for (u32 i = 0; i < nout; ++i) {
-    const char* body = asm_constraint_body(outs[i].str);
-    if (asm_is_early_clobber(outs[i].str)) continue;
-    if (body[0] == 'r') {
-      const Type* oty = outs[i].type ? outs[i].type : fallback_ty;
-      u8 cls = type_class(oty);
-      Reg r = alloc_reg_or_spill(g, cls, oty);
-      out_ops[i] = op_reg(r, oty);
-      out_reg_owned[i] = 1;
-    } else {
-      compiler_panic(g->c, g->cur_loc,
-                     "cg_inline_asm: unsupported output constraint '%s'",
-                     outs[i].str ? outs[i].str : "");
-    }
-  }
-
-  /* ---- Pass 2: materialize inputs per constraint. ----
-   * Matching constraints ("0".."9") need their referenced output's reg to
-   * already exist; non-early outputs satisfy that after pass 1. (An output
-   * referenced by a matching input must not itself be early-clobber — that
-   * combination is meaningless; we panic below if the parser produced it.) */
-  for (u32 i = 0; i < nin; ++i) {
-    const char* s = ins[i].str ? ins[i].str : "";
-    int matched = asm_parse_match_index(s);
-    if (matched >= 0) {
-      if ((u32)matched >= nout) {
-        compiler_panic(g->c, g->cur_loc,
-                       "cg_inline_asm: matching constraint '%s' references "
-                       "out-of-range output %d",
-                       s, matched);
-      }
-      if (asm_is_early_clobber(outs[matched].str)) {
-        compiler_panic(g->c, g->cur_loc,
-                       "cg_inline_asm: matching input '%s' references "
-                       "early-clobber output =&r",
-                       s);
-      }
-      /* Force input into the output's register. If the input is already an
-       * IMM or in a different reg, materialize via target->copy/load_imm
-       * into the bound output reg. The input SValue keeps its own reg
-       * (which we'll release at the end); the binding only needs the
-       * value to be present in out_ops[matched].v.reg before the asm runs. */
-      Operand bound = out_ops[matched];
-      ensure_reg(g, &in_svs[i]);
-      if (in_svs[i].op.kind == OPK_REG &&
-          in_svs[i].op.v.reg == bound.v.reg) {
-        /* Already in place. */
-      } else if (in_svs[i].op.kind == OPK_IMM) {
-        T->load_imm(T, bound, in_svs[i].op.v.imm);
-      } else {
-        Operand src = force_reg(g, &in_svs[i], sv_type(&in_svs[i]));
-        T->copy(T, bound, src);
-      }
-      in_ops[i] = bound;
-      continue;
-    }
-    if (s[0] == 'r') {
-      in_ops[i] = force_reg(g, &in_svs[i], sv_type(&in_svs[i]));
-    } else if (s[0] == 'i') {
-      if (in_svs[i].op.kind != OPK_IMM) {
-        compiler_panic(g->c, g->cur_loc,
-                       "cg_inline_asm: 'i' constraint requires constant input");
-      }
-      in_ops[i] = in_svs[i].op;
-    } else if (s[0] == 'm') {
-      if (in_svs[i].op.kind == OPK_INDIRECT) {
-        in_ops[i] = in_svs[i].op;
-      } else if (is_lvalue(&in_svs[i].op)) {
-        const Type* lt = sv_type(&in_svs[i]);
-        const Type* pty = type_ptr(g->pool, lt ? lt : type_void(g->pool));
-        Reg r = alloc_reg_or_spill(g, RC_INT, pty);
-        Operand dst = op_reg(r, pty);
-        T->addr_of(T, dst, in_svs[i].op);
-        /* Replace the SValue's lvalue with an INDIRECT pointing at the
-         * freshly-loaded address; the new INDIRECT owns the base reg, so
-         * release() at the end of the block will free it. */
-        if (in_svs[i].op.kind == OPK_INDIRECT) {
-          T->free_reg(T, in_svs[i].op.v.ind.base, RC_INT);
-        }
-        in_svs[i].op = op_indirect(r, 0, lt);
-        in_svs[i].res = RES_REG;
-        in_ops[i] = in_svs[i].op;
-      } else {
-        compiler_panic(
-            g->c, g->cur_loc,
-            "cg_inline_asm: 'm' constraint requires an addressable operand");
-      }
-    } else {
-      compiler_panic(g->c, g->cur_loc,
-                     "cg_inline_asm: unsupported input constraint '%s'", s);
-    }
-  }
-
-  /* ---- Pass 3: allocate early-clobber outputs (=&r) disjoint from inputs.
-   * The reg pool only hands out free regs, so any reg returned by
-   * alloc_reg_or_spill is by construction not in use by any input
-   * materialized above. None of the input materializers call free_reg on
-   * input regs while inputs are still live, so a single alloc suffices
-   * and no retry loop is needed. The collision walk below documents the
-   * intent and gives a clean panic point if a future change breaks the
-   * invariant. */
-  for (u32 i = 0; i < nout; ++i) {
-    if (!asm_is_early_clobber(outs[i].str)) continue;
-    const char* body = asm_constraint_body(outs[i].str);
-    if (body[0] != 'r') {
-      compiler_panic(g->c, g->cur_loc,
-                     "cg_inline_asm: unsupported early-clobber constraint '%s'",
-                     outs[i].str);
-    }
-    const Type* oty = outs[i].type ? outs[i].type : fallback_ty;
-    u8 cls = type_class(oty);
-    Reg r = alloc_reg_or_spill(g, cls, oty);
-    /* Validate disjoint: walk inputs, collide-check. The pool guarantees
-     * uniqueness against currently-allocated regs, so this is belt-and-
-     * suspenders, but the panic gives a meaningful diagnostic if any
-     * future binder change breaks the invariant. */
-    for (u32 k = 0; k < nin; ++k) {
-      if (in_ops[k].kind == OPK_REG && in_ops[k].v.reg == r) {
-        compiler_panic(g->c, g->cur_loc,
-                       "cg_inline_asm: early-clobber output collided with "
-                       "input reg (binder bug)");
-      }
-      if (in_ops[k].kind == OPK_INDIRECT && in_ops[k].v.ind.base == r) {
-        compiler_panic(g->c, g->cur_loc,
-                       "cg_inline_asm: early-clobber output collided with "
-                       "input INDIRECT base (binder bug)");
-      }
-    }
-    out_ops[i] = op_reg(r, oty);
-    out_reg_owned[i] = 1;
-  }
-
-  /* ---- "memory" clobber: spill all live RES_REG SValues. ----
-   * Intern "memory" once per call; Sym equality is pointer-equal after
-   * interning. The remaining stack values become RES_SPILLED, so a later
-   * read goes through ensure_reg → reload_reg. */
-  Sym sym_memory = pool_intern_cstr(g->pool, "memory");
-  int has_memory_clobber = 0;
-  for (u32 i = 0; i < nclob; ++i) {
-    if (clobbers[i] == sym_memory) {
-      has_memory_clobber = 1;
-      break;
-    }
-  }
-  if (has_memory_clobber) {
-    for (u32 i = 0; i < g->sp; ++i) {
-      SValue* sv = &g->stack[i];
-      if (sv->res != RES_REG) continue;
-      u8 cls = class_of_sv(sv);
-      FrameSlot slot = take_spill_slot(g, cls);
-      Operand victim_reg = op_reg((Reg)reg_of_sv(sv), sv->type);
-      T->spill_reg(T, victim_reg, slot, mem_for_spill(g, sv));
-      T->free_reg(T, (Reg)reg_of_sv(sv), cls);
-      sv->spill_slot = slot;
-      sv->res = RES_SPILLED;
-      set_owned_reg(sv, (Reg)REG_NONE);
-    }
-  }
-
-  /* ---- Named register clobbers: spill any live SValue currently bound
-   * to a clobbered physical reg.  Skipped when "memory" already swept
-   * the stack above. Backends without resolve_reg_name accept all named
-   * clobbers as no-ops (matches pre-v1 behavior). */
-  if (!has_memory_clobber && T->resolve_reg_name) {
-    for (u32 i = 0; i < nclob; ++i) {
-      Reg phys;
-      RegClass cls;
-      if (T->resolve_reg_name(T, clobbers[i], &phys, &cls) != 0) continue;
-      /* Reject overlap with bound in/out operands (GCC contract). */
-      for (u32 k = 0; k < nout; ++k) {
-        if (out_ops[k].kind == OPK_REG && out_ops[k].cls == cls &&
-            (Reg)out_ops[k].v.reg == phys) {
-          compiler_panic(g->c, g->cur_loc,
-                         "cg_inline_asm: named clobber overlaps output reg");
-        }
-      }
-      for (u32 k = 0; k < nin; ++k) {
-        if (in_ops[k].kind == OPK_REG && in_ops[k].cls == cls &&
-            (Reg)in_ops[k].v.reg == phys) {
-          compiler_panic(g->c, g->cur_loc,
-                         "cg_inline_asm: named clobber overlaps input reg");
-        }
-      }
-      for (u32 k = 0; k < g->sp; ++k) {
-        SValue* sv = &g->stack[k];
-        if (sv->res != RES_REG) continue;
-        if (class_of_sv(sv) != cls) continue;
-        if ((Reg)reg_of_sv(sv) != phys) continue;
-        FrameSlot slot = take_spill_slot(g, cls);
-        Operand victim_reg = op_reg(phys, sv->type);
-        T->spill_reg(T, victim_reg, slot, mem_for_spill(g, sv));
-        T->free_reg(T, phys, cls);
-        sv->spill_slot = slot;
-        sv->res = RES_SPILLED;
-        set_owned_reg(sv, (Reg)REG_NONE);
-      }
-    }
-  }
-
-  /* ---- Call the per-arch asm_block. ---- */
-  T->asm_block(T, tmpl, outs, nout, out_ops, ins, nin, in_ops, clobbers, nclob);
-
-  /* ---- Release input SValue resources. ----
-   * Inputs are consumed by the asm block. Their owned regs/slots return to
-   * the pool. Note: matching inputs that were copied into an output reg
-   * still own their original input reg — release frees that one; the
-   * output reg lives on through the pushed output SValue. */
-  for (u32 i = 0; i < nin; ++i) {
-    release(g, &in_svs[i]);
-  }
-
-  /* ---- Push outputs back as fresh SValues for the parser to assign. ----
-   * Each pushed SValue owns the freshly-allocated reg (RES_REG), so the
-   * parser's eventual cg_store on it will release the reg after consuming. */
-  for (u32 i = 0; i < nout; ++i) {
-    const Type* oty = outs[i].type ? outs[i].type : fallback_ty;
-    SValue sv = make_sv(out_ops[i], oty);
-    /* If the target overwrote out_ops[i] with a different kind (e.g. a
-     * memory location), make_sv already classified residency correctly. */
-    if (!out_reg_owned[i] && sv.res == RES_REG) {
-      /* Not owned by us — the value is borrowed from elsewhere. Treat as
-       * inherent to avoid double-free. (No production path produces this
-       * today, but the bookkeeping is explicit.) */
-      sv.res = RES_INHERENT;
-    }
-    push(g, sv);
-  }
-
-  if (in_svs) h->free(h, in_svs, sizeof(SValue) * nin);
-  if (in_ops) h->free(h, in_ops, sizeof(Operand) * nin);
-  if (out_ops) h->free(h, out_ops, sizeof(Operand) * nout);
-  if (out_reg_owned) h->free(h, out_reg_owned, nout);
-}
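Note on the removed binder: the constraint-string helpers above decide which pass an output is allocated in. The following is a minimal standalone sketch of that classification, not code from the tree. All `demo_` names are hypothetical, and `demo_output_pass` is an illustrative wrapper that the removed file did not contain.

```c
/* Hypothetical standalone re-creation of the removed constraint helpers.
 * Names carry a demo_ prefix to mark them as illustrative only. */

/* Leading decimal index of a matching constraint ("0", "12"), or -1. */
static int demo_match_index(const char* s) {
  if (!s || s[0] < '0' || s[0] > '9') return -1;
  int n = 0;
  for (const char* p = s; *p >= '0' && *p <= '9'; ++p) {
    n = n * 10 + (*p - '0');
  }
  return n;
}

/* Skip a "=&", "=", "+", or "&" prefix and return the body ("r", "m", ...). */
static const char* demo_constraint_body(const char* s) {
  if (!s) return "";
  if (s[0] == '=' && s[1] == '&') return s + 2;
  if (s[0] == '=' || s[0] == '+' || s[0] == '&') return s + 1;
  return s;
}

static int demo_is_early_clobber(const char* s) {
  if (!s) return 0;
  return (s[0] == '=' && s[1] == '&') || s[0] == '&';
}

/* Assumed wrapper: which binder pass allocates this output register?
 * 1 = before inputs are materialized, 3 = after (early-clobber),
 * -1 = unsupported body (the removed binder panicked here). */
static int demo_output_pass(const char* s) {
  if (demo_constraint_body(s)[0] != 'r') return -1;
  return demo_is_early_clobber(s) ? 3 : 1;
}
```

Under these assumptions, "=r" and "+r" classify as pass 1 and "=&r" as pass 3, which mirrors the ordering that lets matching inputs ("0".."9") find an already-allocated output register in pass 2.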
diff --git a/src/cg/cg.h b/src/cg/cg.h
@@ -1,195 +0,0 @@
-#ifndef CFREE_CG_H
-#define CFREE_CG_H
-
-#include "arch/arch.h"
-#include "decl/decl.h"
-#include "type/type.h"
-
-typedef struct CG CG;
-typedef struct Debug Debug;
-
-/* Debug is optional; pass NULL when -g is off. */
-CG* cg_new(Compiler*, CGTarget*, Debug*);
-void cg_free(CG*);
-CGTarget* cg_target(CG*);
-
-/* ----- functions ----- */
-void cg_func_begin(CG*, const CGFuncDesc*);
-void cg_func_end(CG*);
-
-/* ----- locals & params ----- */
-FrameSlot cg_local(
-    CG*, const FrameSlotDesc*); /* returns frame slot; pushes nothing */
-void cg_param(CG*, const CGParamDesc*);
-
-/* ----- value-stack pushes ----- */
-void cg_push_int(CG*, i64, const Type*);
-void cg_push_const(CG*, ConstBytes); /* exact ABI bytes */
-void cg_push_float(CG*, double,
-                   const Type*); /* convenience for simple parser paths */
-void cg_push_str(CG*, Sym str_id,
-                 const Type*);      /* into rodata; pushes pointer */
-void cg_push_local(CG*, FrameSlot); /* lvalue */
-void cg_push_global(CG*, ObjSymId, const Type*); /* lvalue */
-
-/* ----- value-stack manipulation ----- */
-void cg_load(CG*);  /* lvalue → rvalue; derives MemAccess */
-void cg_addr(CG*);  /* lvalue → ptr rvalue */
-void cg_store(CG*); /* [..., lv, rv] → []; derives MemAccess */
-void cg_dup(CG*);
-void cg_swap(CG*);
-void cg_drop(CG*);
-
-/* Aggregate and bitfield operations keep C object semantics visible to direct
- * targets and opt. Addresses are lvalues or pointer rvalues on the value stack;
- * sizes, offsets, storage units, and alignments come from TargetABI. */
-void cg_copy_aggregate(CG*,
-                       AggregateAccess); /* [..., dst_addr, src_addr] → [] */
-void cg_set_aggregate(CG*, AggregateAccess); /* [..., dst_addr, byte] → [] */
-void cg_bitfield_load(CG*, BitFieldAccess);  /* [..., record_addr] → value */
-void cg_bitfield_store(CG*,
-                       BitFieldAccess); /* [..., record_addr, value] → [] */
-
-void cg_binop(CG*, BinOp);
-void cg_unop(CG*, UnOp);
-void cg_cmp(CG*, CmpOp);
-void cg_convert(CG*, const Type* dst); /* picks ConvKind from src/dst */
-
-/* Increment/decrement an lvalue in place. Pops the lvalue from the value
- * stack, performs `*lv = *lv +/- 1`, and pushes the result rvalue. With
- * `post=1` the pushed value is the OLD value (post-inc/dec); with
- * `post=0` it is the NEW value (pre-inc/dec). `op` is BO_IADD or BO_ISUB.
- * The integer-1 step is the parser's responsibility for non-integer
- * types (pointer arithmetic), but the spine slice deals only with
- * integer locals. */
-void cg_inc_dec(CG*, BinOp op, int post);
-
-/* Direct vs indirect: callee on the stack distinguishes itself by
- * SValue/operand kind. CG obtains ABIFuncInfo from Compiler.abi, materializes
- * CGABIValue argument/return parts, then calls CGTarget.call with a CGCallDesc.
- * On WASM, fn_type selects the call_indirect type index (interned Type*
- * identity is the index source of truth). */
-void cg_call(CG*, u32 nargs,
-             const Type* fn_type); /* stack: [..., callee, arg0..argN-1]
-                                      → result (if non-void) */
-/* Sibling call: pops [callee, arg0..argN-1], lowers as a tail call. The
- * caller's epilogue runs and control transfers to the callee, whose RET
- * returns to the caller's caller. cg_tail_call implicitly terminates the
- * function — the parser must NOT follow it with cg_ret. v1 has must-tail
- * semantics: legality (ABI-compatible return, args fit in registers, no
- * caller-frame pointer escapes via byval) is the caller's responsibility;
- * the backend panics if it cannot honor the tail request. */
-void cg_tail_call(CG*, u32 nargs, const Type* fn_type);
-void cg_ret(CG*, int has_value);
-
-/* ----- C declarations and global initializers -----
- * Parser records C declaration semantics through DeclTable. CG consumes DeclIds
- * only when a declaration becomes executable code or an addressable object. */
-void cg_bind_decl(CG*, DeclId);
-
-/* ----- alloca -----
- * Dynamic stack allocation. Pops `size_bytes` (i64), pushes `void*` aligned to
- * max_align_t. v1 does not parse C99/C11 VLAs (predefines __STDC_NO_VLA__);
- * cg_alloca is reachable only via the __builtin_alloca path. */
-void cg_alloca(CG*);
-
-/* ----- variadics -----
- * va_list type is per-arch (defined in <stdarg.h>). The four ops match the C
- * macros after builtin substitution. cg_va_arg pops &ap and pushes the next
- * arg of `t`. cg_va_start/end/copy pop the va_list addresses and push nothing.
- */
-/* The trailing underscores avoid colliding with <stdarg.h> macros — cfree
- * sources include stdarg.h for compiler_panicv (see core.h). */
-void cg_va_start_(CG*);              /* pop &ap */
-void cg_va_arg_(CG*, const Type* t); /* pop &ap; push value */
-void cg_va_end_(CG*);                /* pop &ap */
-void cg_va_copy_(CG*);               /* pop &dst, &src */
-
-/* ----- setjmp / longjmp -----
- * Intrinsic lowering for targets that cannot use a plain libc call. Real
- * native arches generally parse <setjmp.h>'s setjmp as a normal call.
- * cg_setjmp pops &buf and pushes i32 (0 on direct return, nonzero on longjmp).
- * cg_longjmp pops &buf and val; does not return. */
-void cg_setjmp(CG*);
-void cg_longjmp(CG*);
-
-/* ----- atomics -----
- * Pointer operands are typed `_Atomic T*`. cg derives MemAccess from the
- * pointee type, qualifiers, alignment facts, and alias root; the pointee type
- * drives width and tells the backend whether the op fits inline or routes to
- * compiler-rt. */
-void cg_atomic_load(CG*, MemOrder);          /* pops ptr; pushes value */
-void cg_atomic_store(CG*, MemOrder);         /* pops ptr, value */
-void cg_atomic_rmw(CG*, AtomicOp, MemOrder); /* pops ptr, val; pushes prior */
-void cg_atomic_cas(CG*, MemOrder success, MemOrder failure);
-/* pops ptr, expected, desired;
- * pushes (prior, ok_i1) */
-void cg_fence(CG*, MemOrder);
-
-/* ----- intrinsics -----
- * Builtin lowering for one-arg one-result intrinsics whose width is taken from
- * the operand and whose C result type is `int` (e.g. __builtin_ctz / clz /
- * popcount). Pops one rvalue, dispatches to CGTarget.intrinsic with the given
- * kind, pushes the result as `int`. */
-void cg_intrinsic_unary_to_int(CG*, IntrinKind);
-
-/* Zero-operand, zero-result intrinsic (e.g. __builtin_trap,
- * __builtin_unreachable). Lowers via CGTarget.intrinsic and pushes no
- * value — caller is responsible for pushing a dummy `int 0` if the
- * builtin appears in an expression context. */
-void cg_intrinsic_void(CG*, IntrinKind);
-
-/* ----- control flow (CG-level labels) -----
- * cg_branch_true fuses with a preceding cg_cmp into a single
- * CGTarget.cmp_branch when the i1 on top of stack is the unconsumed result of
- * that cmp. For a non-cmp i1, it emits cmp_branch(CMP_NE, val, IMM_ZERO,
- * label). */
-typedef u32 CGLabel;
-CGLabel cg_label_new(CG*);
-void cg_label_place(CG*, CGLabel);
-void cg_jump(CG*, CGLabel);
-void cg_branch_true(CG*, CGLabel); /* pops i1 */
-void cg_branch_false(CG*, CGLabel);
-
-/* ----- structured control flow -----
- * Used for if / while / for / do — the cases where the parser already knows
- * the structure. Nests like a stack: every scope_begin must pair with one
- * scope_end at the same nesting depth. Break and continue targets are explicit
- * so C `for` continue jumps to the increment expression, not necessarily the
- * loop header.
- *
- * Real backends implement these as a thin shim over label_place/jump (no code
- * size cost). The WASM backend consumes them directly to emit block/loop/if
- * with structurally-bounded br targets — that's the source of CFI on WASM
- * without invoking the relooper.
- *
- * goto, computed-goto, and switch fallthrough still go through the flat label
- * API above. opt's IR is flat-CFG; at -O2 the WASM lowering pass relooper
- * reconstructs structure from the flat IR. At -O0/-O1 (no opt wrapper),
- * CG drives the WASM CGTarget directly with scope ops and no relooper runs. */
-/* ScopeKind is shared with CGTarget (see arch.h). */
-typedef u32 CGScope;
-typedef struct CGScopeConfig {
-  ScopeKind kind;
-  CGLabel break_label;
-  CGLabel continue_label;
-  const Type* result_type;
-} CGScopeConfig;
-CGScope cg_scope_begin(CG*, CGScopeConfig); /* IF: pops i1 */
-void cg_scope_else(CG*, CGScope);           /* IF only */
-void cg_scope_end(CG*, CGScope);
-void cg_break(CG*, CGScope);
-void cg_continue(CG*, CGScope); /* LOOP only */
-
-/* ----- source location ----- */
-void cg_set_loc(CG*, SrcLoc); /* propagates to CGTarget and Debug */
-
-/* ----- inline asm -----
- * Inputs are popped from the CG stack last-declared first and stored in
- * declaration order; outputs are then pushed back as fresh SValues.
- * Constraints are GCC-style; CG binds operands, CGTarget.asm_block emits. */
-void cg_inline_asm(CG*, const char* tmpl, const AsmConstraint* outs, u32 nout,
-                   const AsmConstraint* ins, u32 nin, const Sym* clobbers,
-                   u32 nclob);
-
-#endif
diff --git a/src/cg/fold.c b/src/cg/fold.c
@@ -1,154 +0,0 @@
-#include "cg/fold.h"
-
-/* Truncate (and re-sign-extend, for signed types) a folded i64 down to
- * the width of `ty`, so that subsequent folds and compares see the same
- * value the backend would have produced after narrowing to the
- * destination register. No-op for >= 8-byte types and for NULL ty. */
-static i64 narrow(TargetABI* abi, const Type* ty, i64 v) {
-  if (!ty || !abi) return v;
-  u32 sz = abi_sizeof(abi, ty);
-  if (sz >= 8) return v;
-  u64 mask = ((u64)1 << (sz * 8)) - 1;
-  u64 u = (u64)v & mask;
-  if (abi_type_info(abi, ty).signed_) {
-    u64 sign_bit = (u64)1 << (sz * 8 - 1);
-    if (u & sign_bit) u |= ~mask;
-  }
-  return (i64)u;
-}
-
-static Operand make_imm(i64 v, const Type* ty) {
-  Operand o;
-  o.kind = OPK_IMM;
-  o.cls = RC_INT;
-  o.pad = 0;
-  o.type = ty;
-  o.v.imm = v;
-  return o;
-}
-
-/* Literal-literal integer binop. Returns 1 with *out set, or 0 if `op`
- * isn't a foldable kind. Excludes SDIV/UDIV/SREM/UREM (must trap on
- * divisor 0 and INT_MIN/-1), SHL/SHR_* (count >= width is type-width-
- * dependent and not in this tier), and float ops (rounding/NaN belong
- * to the backend). */
-static int literal_binop(BinOp op, i64 a, i64 b, i64* out) {
-  switch (op) {
-    case BO_IADD: *out = (i64)((u64)a + (u64)b); return 1;
-    case BO_ISUB: *out = (i64)((u64)a - (u64)b); return 1;
-    case BO_IMUL: *out = (i64)((u64)a * (u64)b); return 1;
-    case BO_AND:  *out = a & b; return 1;
-    case BO_OR:   *out = a | b; return 1;
-    case BO_XOR:  *out = a ^ b; return 1;
-    default: return 0;
-  }
-}
-
-/* Algebraic-identity dispatch for an integer binop with one literal
- * operand. `k` is the constant; `k_on_right` distinguishes lhs vs rhs
- * for non-commutative ops (ISUB, SHL, SHR_*, SDIV, UDIV).
- *   FID_NONE  — no identity, caller emits normally.
- *   FID_KEEP  — drop the IMM, the non-constant operand is the result.
- *   FID_ZERO  — result is constant 0; drop both operands. */
-typedef enum FoldIdent { FID_NONE, FID_KEEP, FID_ZERO } FoldIdent;
-
-static FoldIdent identity_for(BinOp op, i64 k, int k_on_right) {
-  switch (op) {
-    case BO_IADD: case BO_OR: case BO_XOR:
-      /* x + 0, 0 + x, x | 0, 0 | x, x ^ 0, 0 ^ x */
-      return (k == 0) ? FID_KEEP : FID_NONE;
-    case BO_ISUB:
-      /* x - 0 only; 0 - x needs a UO_NEG and isn't an identity. */
-      return (k == 0 && k_on_right) ? FID_KEEP : FID_NONE;
-    case BO_IMUL:
-      if (k == 1) return FID_KEEP;
-      if (k == 0) return FID_ZERO;
-      return FID_NONE;
-    case BO_AND:
-      if (k == 0) return FID_ZERO;
-      /* All-ones mask of any width sign-extends to -1 as i64. */
-      if (k == -1) return FID_KEEP;
-      return FID_NONE;
-    case BO_SDIV: case BO_UDIV:
-      /* x / 1 only; 1 / x isn't an identity, and a literal dividend
-       * (lhs) gives no useful fold. */
-      return (k == 1 && k_on_right) ? FID_KEEP : FID_NONE;
-    case BO_SHL: case BO_SHR_S: case BO_SHR_U:
-      /* x << 0, x >> 0 only; a zero on the lhs is the value being
-       * shifted, not the count. Folding that case would need to release
-       * the rhs operand, so it is deferred until a use exists. */
-      return (k == 0 && k_on_right) ? FID_KEEP : FID_NONE;
-    default:
-      return FID_NONE;
-  }
-}
-
-CGFoldKind cg_fold_binop(BinOp op, Operand a, Operand b, const Type* ty,
-                         TargetABI* abi, Operand* out) {
-  /* Tier 1: both literal — fold to a single IMM. */
-  if (a.kind == OPK_IMM && b.kind == OPK_IMM) {
-    i64 r;
-    if (literal_binop(op, a.v.imm, b.v.imm, &r)) {
-      *out = make_imm(narrow(abi, ty, r), ty);
-      return CG_FOLD_IMM;
-    }
-  }
-  /* Tier 2: algebraic identities. Side-effect-free: the non-constant
-   * operand has already been materialized onto the value stack, so any
-   * computation that produced it has already executed. Dropping the
-   * IMM side is the caller's responsibility (release reg/slot if any). */
-  if (b.kind == OPK_IMM) {
-    switch (identity_for(op, b.v.imm, /*k_on_right=*/1)) {
-      case FID_KEEP: return CG_FOLD_KEEP_A;
-      case FID_ZERO: *out = make_imm(0, ty); return CG_FOLD_IMM;
-      case FID_NONE: break;
-    }
-  }
-  if (a.kind == OPK_IMM) {
-    switch (identity_for(op, a.v.imm, /*k_on_right=*/0)) {
-      case FID_KEEP: return CG_FOLD_KEEP_B;
-      case FID_ZERO: *out = make_imm(0, ty); return CG_FOLD_IMM;
-      case FID_NONE: break;
-    }
-  }
-  return CG_FOLD_NONE;
-}
-
-CGFoldKind cg_fold_unop(UnOp op, Operand a, const Type* ty,
-                        TargetABI* abi, Operand* out) {
-  if (a.kind != OPK_IMM) return CG_FOLD_NONE;
-  i64 v = a.v.imm;
-  i64 r;
-  switch (op) {
-    case UO_NEG:  r = (i64)(-(u64)v); break;
-    case UO_BNOT: r = ~v; break;
-    case UO_NOT:  r = v ? 0 : 1; break;
-    default: return CG_FOLD_NONE;
-  }
-  *out = make_imm(narrow(abi, ty, r), ty);
-  return CG_FOLD_IMM;
-}
-
-CGFoldKind cg_fold_cmp(CmpOp op, Operand a, Operand b, const Type* int_ty,
-                       TargetABI* abi, Operand* out) {
-  if (a.kind != OPK_IMM || b.kind != OPK_IMM) return CG_FOLD_NONE;
-  (void)abi; /* compare result is `int` 0/1 — no narrowing needed */
-  i64 x = a.v.imm;
-  i64 y = b.v.imm;
-  i64 r;
-  switch (op) {
-    case CMP_EQ:   r = (x == y); break;
-    case CMP_NE:   r = (x != y); break;
-    case CMP_LT_S: r = (x <  y); break;
-    case CMP_LE_S: r = (x <= y); break;
-    case CMP_GT_S: r = (x >  y); break;
-    case CMP_GE_S: r = (x >= y); break;
-    case CMP_LT_U: r = ((u64)x <  (u64)y); break;
-    case CMP_LE_U: r = ((u64)x <= (u64)y); break;
-    case CMP_GT_U: r = ((u64)x >  (u64)y); break;
-    case CMP_GE_U: r = ((u64)x >= (u64)y); break;
-    default: return CG_FOLD_NONE;
-  }
-  *out = make_imm(r, int_ty);
-  return CG_FOLD_IMM;
-}
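Note on the removed fold path: literal folds compute in 64 bits with wrapping unsigned arithmetic and then narrow to the result type's width. The sketch below illustrates that narrowing step in isolation. It is a hypothetical variant that takes the byte size and signedness directly rather than a `TargetABI*` and `Type*`, and the zero-size guard is an added assumption that the removed `narrow` did not have.

```c
#include <stdint.h>

/* Hypothetical standalone variant of the removed narrow(): truncate v to
 * size_bytes and, for signed types, re-sign-extend to 64 bits.
 * The size_bytes == 0 guard is an assumption added for this sketch. */
static int64_t demo_narrow(uint32_t size_bytes, int is_signed, int64_t v) {
  if (size_bytes == 0 || size_bytes >= 8) return v;
  uint64_t mask = ((uint64_t)1 << (size_bytes * 8)) - 1;
  uint64_t u = (uint64_t)v & mask;
  if (is_signed) {
    uint64_t sign_bit = (uint64_t)1 << (size_bytes * 8 - 1);
    if (u & sign_bit) u |= ~mask; /* replicate the sign bit upward */
  }
  return (int64_t)u;
}

/* Literal-literal add: wrap in u64 (no signed-overflow UB), then narrow. */
static int64_t demo_fold_add(uint32_t size_bytes, int is_signed, int64_t a,
                             int64_t b) {
  int64_t r = (int64_t)((uint64_t)a + (uint64_t)b);
  return demo_narrow(size_bytes, is_signed, r);
}
```

With these definitions, a 1-byte signed 127 + 1 folds to -128 and a 1-byte unsigned 255 + 1 folds to 0, the same values a backend would hold after narrowing to the destination register.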
diff --git a/src/cg/fold.h b/src/cg/fold.h
@@ -1,47 +0,0 @@
-#ifndef CFREE_CG_FOLD_H
-#define CFREE_CG_FOLD_H
-
-/* Pure constant-folding and algebraic-identity helpers for binop/unop/cmp
- * on Operand inputs. No CG or IR state: callers (cg.c today, opt's
- * pass_gvn / pass_combine eventually) inspect the result and apply it
- * to whichever value representation they hold. All folds are restricted
- * to integer domain — float ops never reach the OPK_IMM path (FP
- * literals materialize via load_const to OPK_REG before they enter the
- * value stack). Division, remainder, shifts, and FP arithmetic are
- * deliberately excluded from the literal-fold paths to preserve trap
- * semantics and rounding behavior; algebraic identities on those ops
- * are limited to cases that don't depend on UB-exploiting transforms
- * (see doc/OPT.md §5.5). */
-
-#include "abi/abi.h"
-#include "arch/arch.h"
-#include "type/type.h"
-
-typedef enum CGFoldKind {
-  CG_FOLD_NONE,   /* no fold; caller emits normally */
-  CG_FOLD_IMM,    /* result is the OPK_IMM Operand in *out; drop both inputs */
-  CG_FOLD_KEEP_A, /* result is `a` unchanged; drop `b` */
-  CG_FOLD_KEEP_B, /* result is `b` unchanged; drop `a` */
-} CGFoldKind;
-
-/* Binop fold + identity. Examines `a` and `b` against `op`:
- *   - both OPK_IMM        → CG_FOLD_IMM, *out = literal narrowed to `ty`
- *   - one OPK_IMM identity → CG_FOLD_KEEP_A or _B, or CG_FOLD_IMM (zero)
- *   - otherwise            → CG_FOLD_NONE
- * `ty` is the result type (used for width-narrowing on the fold path).
- * `abi` supplies size/signedness; required when ty is non-NULL. */
-CGFoldKind cg_fold_binop(BinOp op, Operand a, Operand b, const Type* ty,
-                         TargetABI* abi, Operand* out);
-
-/* Unop fold. Returns CG_FOLD_IMM with *out set on success, CG_FOLD_NONE
- * otherwise. Only integer-domain unops are folded. */
-CGFoldKind cg_fold_unop(UnOp op, Operand a, const Type* ty,
-                        TargetABI* abi, Operand* out);
-
-/* Integer-compare fold. Returns CG_FOLD_IMM with *out set to 0 or 1 of
- * type `int_ty` on success. FP compares (CMP_*_F) return CG_FOLD_NONE —
- * NaN/ordering belongs to the backend. */
-CGFoldKind cg_fold_cmp(CmpOp op, Operand a, Operand b, const Type* int_ty,
-                       TargetABI* abi, Operand* out);
-
-#endif
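Note on the removed fold.h contract: the result kind tells the caller which operands to drop. The sketch below shows that dispatch with simplified types, covering only add, subtract, and multiply. Every `Demo`/`demo_` name is hypothetical, and the operand struct is a stand-in for `Operand`, not the real layout.

```c
#include <stdint.h>

typedef enum {
  DEMO_FOLD_NONE,   /* no fold; caller emits normally */
  DEMO_FOLD_IMM,    /* result is *out; drop both inputs */
  DEMO_FOLD_KEEP_A, /* result is a; drop b */
  DEMO_FOLD_KEEP_B  /* result is b; drop a */
} DemoFoldKind;

typedef enum { DEMO_ADD, DEMO_SUB, DEMO_MUL } DemoOp;

/* Stand-in for Operand: either a literal or an opaque non-constant value. */
typedef struct { int is_imm; int64_t imm; } DemoOperand;

static DemoFoldKind demo_fold(DemoOp op, DemoOperand a, DemoOperand b,
                              int64_t* out) {
  /* Tier 1: both literal, fold with wrapping unsigned arithmetic. */
  if (a.is_imm && b.is_imm) {
    uint64_t x = (uint64_t)a.imm, y = (uint64_t)b.imm;
    switch (op) {
      case DEMO_ADD: *out = (int64_t)(x + y); return DEMO_FOLD_IMM;
      case DEMO_SUB: *out = (int64_t)(x - y); return DEMO_FOLD_IMM;
      case DEMO_MUL: *out = (int64_t)(x * y); return DEMO_FOLD_IMM;
      default: return DEMO_FOLD_NONE;
    }
  }
  /* Tier 2: identities with the literal on the right (x+0, x-0, x*1, x*0). */
  if (b.is_imm) {
    if ((op == DEMO_ADD || op == DEMO_SUB) && b.imm == 0) return DEMO_FOLD_KEEP_A;
    if (op == DEMO_MUL && b.imm == 1) return DEMO_FOLD_KEEP_A;
    if (op == DEMO_MUL && b.imm == 0) { *out = 0; return DEMO_FOLD_IMM; }
  }
  /* Literal on the left: 0+x, 1*x, 0*x. 0-x is a negation, not an identity. */
  if (a.is_imm) {
    if (op == DEMO_ADD && a.imm == 0) return DEMO_FOLD_KEEP_B;
    if (op == DEMO_MUL && a.imm == 1) return DEMO_FOLD_KEEP_B;
    if (op == DEMO_MUL && a.imm == 0) { *out = 0; return DEMO_FOLD_IMM; }
  }
  return DEMO_FOLD_NONE;
}
```

A caller would switch on the returned kind: push `*out` as an immediate for `DEMO_FOLD_IMM`, keep one operand and release the other for the `KEEP` kinds, and emit the operation normally for `DEMO_FOLD_NONE`.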
diff --git a/src/emu/emu.c b/src/emu/emu.c
@@ -11,16 +11,14 @@
 #include "emu/emu.h"
 
 #include <cfree.h>
+#include <cfree/cg.h>
 #include <setjmp.h>
 #include <string.h>
 
-#include "arch/arch.h"
-#include "cg/cg.h"
 #include "core/heap.h"
 #include "core/pool.h"
 #include "link/link.h"
 #include "obj/obj.h"
-#include "opt/opt.h"
 
 /* ---- Lifecycle ---- */
 
@@ -140,9 +138,7 @@ static void* translate_block(CfreeEmu* e, u64 guest_pc) {
   EmuInst insts[EMU_MAX_INSTS_PER_BLOCK];
   u32 ninsts;
   ObjBuilder* ob;
-  MCEmitter* mc;
-  CGTarget* target;
-  CG* cg;
+  CfreeCg* cg;
   Sym block_name;
   ObjSymId block_sym;
   EmuLiftCtx ctx;
@@ -172,18 +168,16 @@ static void* translate_block(CfreeEmu* e, u64 guest_pc) {
     for (j = 0; j < ninsts; ++j) emu_trace_insn(e->c, guest_pc, &insts[j]);
   }
 
-  /* Per-block ObjBuilder + MC + CGTarget pipeline. The block lands
-   * as a single host function. */
+  /* Per-block ObjBuilder + public CG pipeline. The block lands as a single
+   * host function once per-ISA lifters start emitting real code. */
   ob = obj_new(e->c);
-  mc = mc_new(e->c, ob);
-  target = cgtarget_new(e->c, ob, mc);
-  if (e->opt_level > 0) target = opt_cgtarget_new(e->c, target, e->opt_level);
-  cg = cg_new(e->c, target, /*Debug*/ NULL);
+  cg = cfree_cg_new(e->c, ob);
+  if (!cg) compiler_panic(e->c, no_loc(), "emu: cfree_cg_new failed");
 
   block_name = emu_block_sym_name(e->c, guest_pc);
   /* Forward-declare the block's symbol so the lifter can refer to it
-   * via cg_func_begin. obj_symbol_define fills in (section, value, size)
-   * once the function is emitted. */
+   * via cfree_cg_func_begin. obj_symbol_define fills in (section, value,
+   * size) once the function is emitted. */
   block_sym =
       obj_symbol(ob, block_name, SB_GLOBAL, SK_FUNC, OBJ_SEC_NONE, 0, 0);
 
@@ -196,13 +190,9 @@ static void* translate_block(CfreeEmu* e, u64 guest_pc) {
 
   emu_lift_block(e->guest_arch, cg, insts, ninsts, &ctx);
 
-  cgtarget_finalize(target);
+  cfree_cg_free(cg);
   obj_finalize(ob);
 
-  cg_free(cg);
-  cgtarget_free(target); /* opt_cgtarget cascades to wrapped target */
-  mc_free(mc);
-
   /* Add the block's object to the session linker and extend the
    * image. link_resolve_extend places the new section at the next
    * free offset within the reserved VA region (must not change host
diff --git a/src/emu/emu.h b/src/emu/emu.h
@@ -12,12 +12,12 @@
  * never reaches into ISA-specific code. */
 
 #include <cfree.h>
+#include <cfree/cg.h>
 
 #include "core/core.h"
 #include "obj/obj.h"
 #include "type/type.h"
 
-typedef struct CG CG;
 typedef struct LinkImage LinkImage;
 typedef struct Linker Linker;
 
@@ -107,8 +107,8 @@ typedef struct EmuLiftCtx {
 } EmuLiftCtx;
 
 /* Walk `insts` and emit one CG function (signature next_pc_t(CPUState*))
- * for the block. Calls cg_func_begin/end exactly once. */
-void emu_lift_block(CfreeEmuArch, CG*, const EmuInst* insts, u32 n,
+ * for the block. Calls cfree_cg_func_begin/end exactly once. */
+void emu_lift_block(CfreeEmuArch, CfreeCg*, const EmuInst* insts, u32 n,
                     const EmuLiftCtx*);
 
 /* ---- Code cache ------------------------------------------------- */
diff --git a/src/emu/lift.c b/src/emu/lift.c
@@ -4,12 +4,12 @@
  * pipeline below CG is unchanged from the C front-end. */
 
 #include <cfree.h>
+#include <cfree/cg.h>
 
-#include "cg/cg.h"
 #include "emu/emu.h"
 
-void emu_lift_block(CfreeEmuArch arch, CG* cg, const EmuInst* insts, u32 n,
-                    const EmuLiftCtx* ctx) {
+void emu_lift_block(CfreeEmuArch arch, CfreeCg* cg, const EmuInst* insts,
+                    u32 n, const EmuLiftCtx* ctx) {
   /* Per-ISA lifter tables not yet landed. translate_block panics
    * before it would finalize an empty block, so this stub never
    * silently produces an executable host function. */
diff --git a/test/cg/CORPUS.md b/test/cg/CORPUS.md
@@ -1,436 +0,0 @@
-# cg / CGTarget / MCEmitter test corpus
-
-Coverage matrix for `test/cg/`. Each registered case in
-`harness/cases.c` is one row; behavioral oracle is `test_main`'s return
-value (mod 256, since POSIX exit codes are one byte). Mirrors the CORPUS
-shape used by `test/elf/` and `test/link/`.
-
-Test paths per case (run.sh):
-
-- **D** in-process JIT (aarch64 host only) — `cg-runner --jit NAME`.
-- **R** ELF roundtrip (host-arch agnostic) — `cg-runner --emit` →
-  `cfree-roundtrip` → `readelf` + `normalize.py` diff.
-- **E** exec via qemu/podman — `cg-runner --emit` + `start.o` →
-  `link-exe-runner` → run.
-- **J** jit-via-file (aarch64 host only) — `cg-runner --emit` →
-  `jit-runner`.
-- **W** DWARF check (Group P only) — `cg-runner --emit` +
-  `cg-runner --dwarf-checks NAME | cg_check_dwarf OBJ`. Opens the obj
-  via `cfree_dwarf_open` and asserts the line program / subprograms
-  registered for the case. Cases without registered checks skip
-  silently.
-
-`O` (opt-wrapped) lands once `opt_cgtarget` is implemented.
-
-The harness drives the same building blocks the parser will: pool-interned
-Types via `type_*`, ABI classification via `abi_func_info`, and `CGTarget`
-for lowering. There are no ABI mocks. Cases that exercise interfaces the
-lib does not yet implement (param/call/aggregate/FP methods on the
-backend; `type_func`/`abi_func_info`) link against the same symbols the
-parser will, and fail at runtime until those land — that is intentional.
-
-## Status legend
-
-- ★ landed in the spine
-- · planned (case registered, expected value fixed)
-- (deferred) — explicit non-goal for the current pass
-
-## MC-only — direct MCEmitter
-
-| Case | Status | Expected | Notes |
-|---|---|---|---|
-| `mc_smoke` | ★ | 42 | hand-built `mov w0, #42; ret`; analogue of `test/elf/unit/smoke.c` |
-
-## Group A — function lifecycle and return
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `a01_return_const_42`     | ★ | `alloc_reg; load_imm 42; ret reg`           |  42 |
-| `a02_return_zero`         | ★ | `load_imm 0; ret reg`                       |   0 |
-| `a03_ret_imm`             | ★ | `ret IMM 17` (backend materializes)         |  17 |
-| `a04_copy_reg`            | ★ | `load_imm 7; copy r1->r2; ret r2`           |   7 |
-| `a05_return_neg_small`    | ★ | `load_imm -7` via MOVN; ret                 | 249 (= -7 & 0xff) |
-| `a06_return_i64`          | ★ | i64 `load_imm 0x1_0000_002A`; ret as i64    |  42 (low 32 of x0) |
-| `a07_void_return`         | ★ | `ret(NULL)`                                 |   0 (via _start zeroing x0) |
-| `a08_multiple_returns`    | ★ | `ret_imm 1; ret_imm 2` (second is dead)     |   1 |
-| `a09_load_imm_movz_movk`  | ★ | `load_imm 0xABCD` (multi-step materialize)  | 205 (= 0xCD) |
-| `a10_return_u8`           | ★ | `load_imm 200` into u8 reg; ret             | 200 |
-
-## Group B — frame slots, parameters, locals
-
-Param/call cases pair a helper function with `test_main`. Both share one
-`CGTarget` instance — the backend must support multiple
-`func_begin`/`func_end` pairs per TU. `cgtest_begin_func` builds
-`CGFuncDesc` from a real `type_func`/`abi_func_info` pair; param
-materialization, slot allocation, and call lowering use the live
-`TargetABI`.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `b01_param_int`        | ★ | `int echo(int x){return x;}; echo(201)`                                    | 201 |
-| `b02_param_sum`        | ★ | `int sum2(int a,int b){return a+b;}; sum2(40,2)`                           |  42 |
-| `b03_param_spill`      | ★ | `int sum9(a..i)`; nine int params (8 GPR, 1 stack); `sum9(1..9)`           |  45 |
-| `b04_local_int`        | ★ | local int slot; `*p = 42; return *p`                                       |  42 |
-| `b05_addr_taken_local` | ★ | `int x=17; int*p=&x; *p+=1; return *p`                                     |  18 |
-| `b06_sret`             | ★ | `struct Pt{int a,b;}; Pt mk(){{10,32}}; pt=mk(); return pt.a+pt.b`         |  42 |
-| `b07_byval_param`      | ★ | `int take(struct Pt p){return p.a+p.b;}; take({15,27})`                    |  42 |
-| `b08_fp_param`         | ★ | `int trunc(float f){return (int)f;}; trunc(7.5f)`                          |   7 |
-
-## Group C — integer arithmetic
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `c01_add`            | ★ | `1 + 2`                              |   3 |
-| `c02_sub_mul`        | ★ | `7 * 3 - 4`                          |  17 |
-| `c03_bitwise`        | ★ | `(~3) & 0xff`                        | 252 |
-| `c04_shift`          | ★ | `(1<<5) \| (16>>1)` (logical shr)    |  40 |
-| `c05_div_mod`        | ★ | `23 / 4 + 23 % 4` (signed)           |   8 |
-| `c06_xor`            | ★ | `0xa5 ^ 0x5a`                        | 255 |
-| `c07_iadd_i64`       | ★ | i64 `0x1_0000_0029 + 0x1_0000_0001`  |  42 (low 32) |
-| `c08_unsigned_div`   | ★ | `100u / 7u`                          |  14 |
-| `c09_neg`            | ★ | `UO_NEG` 42                          | 214 (= -42 & 0xff) |
-| `c10_logical_not`    | ★ | `UO_NOT 0` (zero-test → 0/1)         |   1 |
-| `c11_shr_signed`     | ★ | `-16 >>(s) 2`                        | 252 (= -4 & 0xff) |
-| `c12_imul_i64`       | ★ | i64 `7 * 6`                          |  42 |
-
-## Group D — compare and branch
-
-Both arms of `cmp` (materializes 0/1 in a GPR) and `cmp_branch` (fused
-test+branch) plus the structured-CFG ops `scope_*`. `cmp` cases return
-the materialized 0/1 directly; `cmp_branch` cases return distinct
-sentinels from the taken vs. fallthrough paths so the oracle can tell
-them apart.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `d01_cmp_eq_true`           | · | `cmp(EQ, 5, 5)` materialize → 1                                |   1 |
-| `d02_cmp_eq_false`          | · | `cmp(EQ, 5, 6)` → 0                                            |   0 |
-| `d03_cmp_ne`                | · | `cmp(NE, 5, 6)` → 1                                            |   1 |
-| `d04_cmp_lt_signed`         | · | `cmp(LT_S, -1, 1)` → 1                                         |   1 |
-| `d05_cmp_lt_unsigned`       | · | `cmp(LT_U, 0xFFFFFFFFu, 1u)` → 0 (signedness in op, not Type)  |   0 |
-| `d06_cmp_ge_signed`         | · | `cmp(GE_S, 5, 5)` → 1 (boundary on inclusive ops)              |   1 |
-| `d07_cmp_branch_taken`      | · | `cmp_branch(EQ, 7, 7) → L`; landing pad returns 42              |  42 |
-| `d08_cmp_branch_not_taken`  | · | `cmp_branch(EQ, 5, 6) → L`; fallthrough returns 33              |  33 |
-| `d09_cmp_branch_lt_signed`  | · | `cmp_branch(LT_S, -3, 0) → L`; landing pad returns 9            |   9 |
-| `d10_jump`                  | · | unconditional `jump L`; early ret is dead                      |   5 |
-| `d11_scope_if_true`         | · | `int x=99; if(1) x=33; return x;`                              |  33 |
-| `d12_scope_if_false`        | · | `int x=99; if(0) x=33; return x;`                              |  99 |
-| `d13_scope_if_else`         | · | `if(0) x=10; else x=7; return x;` (exercises `scope_else`)     |   7 |
-
-## Group E — conversions
-
-One `ConvKind` per case, plus the boundary widths the AArch64 backend
-selects between (`UXTB`/`SXTB` vs `UXTH`/`SXTH` vs `UBFX`/`SBFX`,
-32→64 sign- vs zero-extend). FP cases all funnel back through
-`CV_FTOI_S` so the runner sees an int exit code.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `e01_sext_i8_i32`     | · | `sext (i8)-1 → i32` = 0xFFFFFFFF; low 8 = 255                     | 255 |
-| `e02_zext_u8_i32`     | · | `zext (u8)0xFF → i32` = 0x000000FF; low 8 = 255                   | 255 |
-| `e03_sext_i16_i32`    | · | `sext (i16)-1000 → i32` = 0xFFFFFC18; low 8 = 0x18 = 24           |  24 |
-| `e04_zext_u16_i32`    | · | `zext (u16)0xABCD → i32` = 0x0000ABCD; low 8 = 0xCD = 205         | 205 |
-| `e05_zext_u32_i64`    | · | `zext (u32)0xFFFFFFFF → i64`; low 32 = 0xFFFFFFFF; low 8 = 255    | 255 |
-| `e06_sext_i32_i64`    | · | `sext (i32)-1 → i64` = -1; low 32 = 0xFFFFFFFF; low 8 = 255       | 255 |
-| `e07_trunc_i64_i32`   | · | `trunc 0x100000080 → i32` = 0x80                                  | 128 |
-| `e08_trunc_i32_i8`    | · | `trunc 0x1FF → i8` = 0xFF; returned as u8                         | 255 |
-| `e09_itof_s_i32_f32`  | · | `(i32)7 → f32 7.0 → ftoi_s` round-trip                            |   7 |
-| `e10_itof_u_u32_f64`  | · | `(u32)100 → f64 100.0 → ftoi_s` cross-width                       | 100 |
-| `e11_ftoi_s_neg`      | · | `ftoi_s(-1.5f) = -1` (truncate toward zero); low 8 = 255          | 255 |
-| `e12_ftoi_u_pos`      | · | `ftoi_u(200.7f) = 200u`                                           | 200 |
-| `e13_fext_f32_f64`    | · | `fext 3.5f → 3.5 → ftoi_s` → 3                                    |   3 |
-| `e14_ftrunc_f64_f32`  | · | `ftrunc 7.875 → 7.875f → ftoi_s` → 7                              |   7 |
-| `e15_bitcast_i32_f32` | · | `bitcast 0x40A00000 → f32 5.0f → ftoi_s` (same-size cross-class)  |   5 |
-
-## Group F — memory (loads/stores beyond locals)
-
-Group B already exercises basic load/store of an i32 local. Group F
-pushes the surface: every scalar width, FP load/store, indirect
-non-zero offsets, store-from-IMM vs store-from-REG, `copy_bytes`,
-`set_bytes`, volatile, and the bitfield methods.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `f01_load_store_i8`              | · | local u8; store IMM 200; load; return                           | 200 |
-| `f02_load_store_i16`             | · | local i16; store 0x1234; load; low 8 = 0x34                     |  52 |
-| `f03_load_store_i64`             | · | local i64; store 0x1_0000_0042; runner reads w0 = 0x42          |  66 |
-| `f04_load_store_f32`             | · | local f32; store FP reg = 7.5f; load; ftoi_s                    |   7 |
-| `f05_load_store_f64`             | · | local f64; store FP reg = 3.25; load; ftoi_s                    |   3 |
-| `f06_indirect_nonzero_offset`    | · | i64 local addr-taken; store i32 at +0 (sentinel) and +4         |  42 |
-| `f07_store_reg`                  | · | store from REG (vs IMM in b04) into a local i32; reload         |  17 |
-| `f08_copy_bytes`                 | · | `copy_bytes(dst, src, Pt {10,32})`; sum dst.a+dst.b             |  42 |
-| `f09_set_bytes_zero`             | · | `set_bytes(0)` over an i32 buffer; load → 0                     |   0 |
-| `f10_set_bytes_ff`               | · | `set_bytes(0xFF)` over an i32 buffer; load = 0xFFFFFFFF; low 8  | 255 |
-| `f11_volatile_rw`                | · | b04 body with `MF_VOLATILE` on both store and load              |  42 |
-| `f12_bitfield_unsigned`          | · | `{u: 5}` at bit_offset=3; store 21; load (zero-extend)          |  21 |
-| `f13_bitfield_signed`            | · | `{s: 5}` at bit_offset=0; store -1; load sign-extends; low 8    | 255 |
-
-## Group G — calls (beyond direct-call path)
-
-Group B established the direct-call mechanics (param/return, stack
-spill, sret, byval, fp param). Group G stresses what falls out once
-calls *compose*: indirect calls through function pointers, recursion,
-register-preservation across calls, HFAs, and pass-by-pointer for
-oversized aggregates. Each helper is its own `func_begin`/`func_end`
-under the same `CGTarget`. `cmp_branch`-driven recursion bases share
-the Group D control surface.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `g01_indirect_call`                 | · | `int (*fp)(int) = echo; return fp(42);` (call via REG, not direct symbol) |  42 |
-| `g02_recursion_factorial`           | · | `int f(int n){return n<2?1:n*f(n-1);}; f(5)`                              | 120 |
-| `g03_recursion_fib`                 | · | `int f(int n){return n<2?n:f(n-1)+f(n-2);}; f(10)`                        |  55 |
-| `g04_mutual_recursion`              | · | `is_even`/`is_odd` cross-recursion; `is_even(8)`                          |   1 |
-| `g05_chained_calls`                 | · | `inc(inc(inc(39)))` — return value of one is the arg of the next          |  42 |
-| `g06_mixed_int_fp_params`           | · | `int f(int a, float b, int c, double d, int e)`; integer sum truncated     |  42 |
-| `g07_void_call_outparam`            | · | `void fill(int *p, int v); int x; fill(&x, 42); return x;`                |  42 |
-| `g08_large_struct_byval`            | · | `struct S{int a[8];}` (>16B) passed by value (ABI: indirect)              |  42 |
-| `g09_hfa_param_f32x2`               | · | `struct V{float x,y;}` HFA param `(1.5,1.5)`; ftoi_s of caller-side sum   |   3 |
-| `g10_hfa_return_f32x2`              | · | HFA return `{1.5f,1.5f}`; ftoi_s of caller-side sum                       |   3 |
-| `g11_caller_saved_live_across_call` | · | local `int x=42` live across a clobbering call; backend must preserve     |  42 |
-| `g12_addr_taken_local_across_call`  | · | b05-style addr-taken local survives an intervening call                   |  18 |
-| `g13_call_in_loop_induction`        | · | `for(i=0;i<10;i++) s += id(i);` — induction var preserved across call     |  45 |
-
-## Group H — control flow
-
-Builds out the loop and multi-way branch surface beyond Group D's
-`scope_if`/`scope_else`. Includes both structured loop ops
-(`scope_loop`, `scope_break`, `scope_continue`) and the unstructured
-`jump`/label form so the backend exercises arbitrary CFGs (forward
-and backward `jump`). `switch`-style multi-way uses repeated
-`cmp_branch`. Short-circuit `&&`/`||` are exercised at the IR level
-(lowered to chained `cmp_branch` + materialize) — the test proves
-short-circuit by observing that the RHS side effect did *not* run.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `h01_while_sum_0_to_9`        | · | `int s=0,i=0; while(i<10){ s+=i; i++; } return s;`                       |  45 |
-| `h02_do_while_once`           | · | `int i=0; do { i=42; } while(0); return i;`                              |  42 |
-| `h03_for_count_to_10`         | · | `int s=0; for(int i=1;i<=10;i++) s+=i; return s;`                        |  55 |
-| `h04_loop_break`              | · | `for(i=0;;i++) if(i==42) break; return i;`                               |  42 |
-| `h05_loop_continue`           | · | sum of even i in `[0,20)` using `continue` to skip odds                  |  90 |
-| `h06_nested_loops`            | · | `for(i=0;i<3;i++) for(j=0;j<2;j++) s++; return s;`                       |   6 |
-| `h07_break_inner_only`        | · | `break` exits inner loop only — outer continues                          |   9 |
-| `h08_early_return_in_loop`    | · | `for(i=0;;i++) if(i==17) return i;`                                      |  17 |
-| `h09_switch_three_cases`      | · | `switch(2){case 1:r=10;break; case 2:r=42;break; case 3:r=99;break;}`    |  42 |
-| `h10_switch_fallthrough`      | · | `case 1: r+=10; case 2: r+=20;` (no break) on input 1                    |  30 |
-| `h11_switch_default`          | · | `switch(99){case 1:..;break; default: r=7;}` returns default             |   7 |
-| `h12_jump_forward`            | · | `jump L; ret 99 (unreachable); L: ret 42;` — backend tolerates dead op   |  42 |
-| `h13_jump_backward`           | · | counter loop built from `cmp_branch` + backward `jump` (no scope ops)    |  10 |
-| `h14_short_circuit_and_skip`  | · | `int s=0; (0) && (s=99,1); return s;` — RHS side effect must be skipped  |   0 |
-| `h15_short_circuit_or_skip`   | · | `int s=0; (1) \|\| (s=99,1); return s;` — RHS side effect must be skipped|   0 |
-| `h16_ternary`                 | · | `int x = (5>3) ? 42 : 7; return x;`                                      |  42 |
-| `h17_ternary_side_effect_one_arm` | · | `int s=0; (1) ? (s=42) : (s=99); return s;` — only taken arm runs    |  42 |
-| `h18_unreachable_after_ret`   | · | ops emitted after a `ret` (dead block); backend must not crash           |  42 |
-
-## Group I — alloca / VLA
-
-Stack-allocated runtime-sized memory: the `alloca` op (constant- and
-runtime-size), required-alignment variants, two-allocas-disjoint, and
-VLAs as parameters. Oracles exercise both the *address* (alignment,
-distinct per allocation) and the *contents* (writes survive, helpers
-can deref).
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `i01_alloca_const_int`        | · | `int *p = alloca(sizeof(int)); *p = 42; return *p;`                       |  42 |
-| `i02_alloca_runtime_size`     | · | `int n=5; int *p = alloca(n*sizeof(int));` fill `1..5`; sum                |  15 |
-| `i03_alloca_align_16`         | · | alloca with 16-byte alignment request; return `((uintptr_t)p & 0xF)==0`   |   1 |
-| `i04_alloca_in_loop_distinct` | · | 3 iters, each `alloca(4)` + record addr; return `(a!=b && b!=c)`          |   1 |
-| `i05_alloca_then_call`        | · | alloca buf; pass to helper that writes 42; reload after call              |  42 |
-| `i06_two_allocas_disjoint`    | · | `int *p=alloca(4); int *q=alloca(4); *p=1; *q=2; return *p+*q;`           |   3 |
-| `i07_alloca_addr_escapes`     | · | alloca buf; helper stores `&buf` then reads it back                        |  42 |
-| `i08_vla_param_sum`           | · | helper `int sum(int n, int a[n])`; pass VLA `1..9`; sum                    |  45 |
-| `i09_alloca_preserves_locals` | · | named `int` locals before+after alloca; both readable post-alloca         |  42 |
-| `i10_alloca_after_named_local`| · | alloca after a fixed local — frame layout must keep both addressable      |  42 |
-
-## Group J — varargs
-
-Drives `va_start_`, `va_arg_`, `va_end_`, `va_copy_` on `CGTarget` and
-the ABI's vararg classification (`abi_va_list_type` + the
-`vararg_*_offset` fields on `ABIFuncInfo`). Each case pairs a variadic
-helper (`int f(int n, ...)`) with a `test_main` caller; the helper
-allocates an `ap` of `abi_va_list_type` size in a local slot and passes
-its address to `va_start_`/`va_arg_`. AArch64 PCS routes int and FP var
-args through separate save areas, so spill cases exist for each.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `j01_va_int_sum_3`        | · | `int sum(int n, ...)`; `sum(3, 1, 2, 3)` (basic va_start/va_arg/va_end) |   6 |
-| `j02_va_zero_args`        | · | `sum(0)` — va_start/va_end with zero va_arg calls                       |   0 |
-| `j03_va_int_spill`        | · | `sum(10, 1..10)` — 10 var ints (>7 in GPR save area; rest spill)        |  55 |
-| `j04_va_int64`            | · | `sum_ll(2, 21LL, 21LL)` — i64 var args; low 32 of sum                   |  42 |
-| `j05_va_double_sum`       | · | `int sumd(int n, ...){ftoi_s of fp accumulator}`; `sumd(3, 1.5, 2.0, 3.5)` |   7 |
-| `j06_va_double_spill`     | · | `sumd(9, 0.5×9)` — exhaust FP save area; last spills                    |   4 |
-| `j07_va_mixed_int_dbl`    | · | `int f(int n, int, double, int, double)`; sum truncated to int          |  42 |
-| `j08_va_copy`             | · | `va_copy(b, a)`; consume first arg from each — equal halves of `42`     |  42 |
-| `j09_va_two_fixed`        | · | `int f(int a, int b, ...) { return a+b+va_arg(); }` — second fixed slot |  42 |
-
-## Group K — atomics
-
-Exercises `atomic_load`, `atomic_store`, `atomic_rmw` (every `AtomicOp`
-kind), `atomic_cas` (success and failure paths), and `fence`. Each case
-stores into an `FSF_ADDR_TAKEN` i32/i64 local, performs one atomic op
-via the helper's address operand, then reads back via plain load to
-verify the post-state. `MemOrder` is varied across cases so a backend
-that hard-codes one ordering is still exercised under each of them. A
-successful CAS returns the prior value; failure leaves memory unchanged.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `k01_atomic_load_relaxed`   | · | `int x=42; r=atomic_load(&x, RELAXED); return r;`                          |  42 |
-| `k02_atomic_store_load_acq` | · | `atomic_store(&x, 42, RELEASE); r=atomic_load(&x, ACQUIRE);`                |  42 |
-| `k03_atomic_load_seq_cst`   | · | `atomic_load(&x, SEQ_CST)` — full barrier ordering                         |  42 |
-| `k04_atomic_rmw_add`        | · | `x=40; prior=rmw(ADD,&x,2,SEQ_CST); return atomic_load(&x);` post-state    |  42 |
-| `k05_atomic_rmw_xchg`       | · | `x=99; rmw(XCHG,&x,42); return load(&x);`                                  |  42 |
-| `k06_atomic_rmw_and`        | · | `x=0xFF; rmw(AND,&x,0x2A); return load(&x);`                               |  42 |
-| `k07_atomic_rmw_or`         | · | `x=0x20; rmw(OR,&x,0x0A); return load(&x);`                                |  42 |
-| `k08_atomic_rmw_xor`        | · | `x=0xFF; rmw(XOR,&x,0xD5); return load(&x);` (= 0x2A)                      |  42 |
-| `k09_atomic_rmw_sub`        | · | `x=44; rmw(SUB,&x,2); return load(&x);`                                    |  42 |
-| `k10_atomic_rmw_nand`       | · | `x=0xFF; rmw(NAND,&x,0xD5);` post-state low 8 = `~(0xFF&0xD5)&0xFF = 0x2A` |  42 |
-| `k11_atomic_cas_success`    | · | `x=10; cas(&x,exp=10,des=42)→ok=1;` post-load                              |  42 |
-| `k12_atomic_cas_failure`    | · | `x=10; cas(&x,exp=99,des=42)→ok=0;` post-load (unchanged)                  |  10 |
-| `k13_atomic_load_i64`       | · | i64 atomic load of `0x1_0000_002A`; low 8                                  |  42 |
-| `k14_atomic_rmw_prior`      | · | return `prior` from `rmw(ADD,&x=40,2)` (not post-state) → 40               |  40 |
-| `k15_fence_seq_cst`         | · | `fence(SEQ_CST)` between two plain stores+loads; no observable race        |  42 |
-
-## Group L — intrinsics
-
-Drives `CGTarget.intrinsic` across every `IntrinKind` group. Bit ops
-return their result in a single REG dst. `MEMCPY`/`MEMMOVE`/`MEMSET`
-take three address/byte/n args and write through memory. Hint kinds
-(`PREFETCH`, `EXPECT`, `UNREACHABLE`, `TRAP`, `ASSUME_ALIGNED`) are
-emitted on a path the test then steps over; the oracle is the post-hint
-control flow. Checked-arith intrinsics return `(result, overflow_flag)`
-in two REG dsts; cases observe each independently.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `l01_popcount_u32`       | · | `popcount(0x000000FF) → 8`                                                  |   8 |
-| `l02_popcount_u64`       | · | `popcount((u64)-1) → 64`                                                    |  64 |
-| `l03_ctz_u32`            | · | `ctz(0x80) → 7`                                                             |   7 |
-| `l04_clz_u32`            | · | `clz(0x000000FF) → 24` (32-bit)                                             |  24 |
-| `l05_bswap16`            | · | `bswap16(0x1234) → 0x3412`; low 8                                           |  18 |
-| `l06_bswap32`            | · | `bswap32(0x11223344) → 0x44332211`; low 8                                   |  17 |
-| `l07_bswap64`            | · | `bswap64(0x1122334455667788) → 0x8877665544332211`; low 8                   |  17 |
-| `l08_memcpy_4`           | · | i32 src=42; `memcpy(&dst,&src,4)`; return dst                               |  42 |
-| `l09_memmove_overlap`    | · | `int a[5]={1,2,3,4,5}; memmove(a+1,a,16); return a[4];` (overlap-safe)      |   4 |
-| `l10_memset_zero`        | · | `int b[4]; memset(b,0,16); return b[2];`                                    |   0 |
-| `l11_memset_ff`          | · | `int b; memset(&b,0xFF,4); return b;` low 8                                 | 255 |
-| `l12_expect_taken`       | · | `if (__builtin_expect(x==1,1)) return 42;` with `x=1`                       |  42 |
-| `l13_unreachable_live`   | · | `if (x) return 42; else __builtin_unreachable();` with `x=1`                |  42 |
-| `l14_trap_live`          | · | `if (x) return 42; else __builtin_trap();` with `x=1` — trap path unreached |  42 |
-| `l15_prefetch_noop`      | · | `__builtin_prefetch(p); *p = 42; return *p;` — hint must not corrupt p     |  42 |
-| `l16_assume_aligned`     | · | `p = assume_aligned(p,8); *p=42; return *p;` — hint must round-trip p      |  42 |
-| `l17_add_overflow_no`    | · | `add_overflow(20,22,&r) → ovf=0`; return `r`                                |  42 |
-| `l18_add_overflow_yes`   | · | `add_overflow(INT_MAX,1,&r) → ovf=1`; return `ovf`                          |   1 |
-| `l19_sub_overflow_yes`   | · | `sub_overflow(INT_MIN,1,&r) → ovf=1`; return `ovf`                          |   1 |
-| `l20_mul_overflow_no`    | · | `mul_overflow(6,7,&r) → ovf=0`; return `r`                                  |  42 |
-
-## Group N — TLS (thread-local storage)
-
-Drives `CGTarget.tls_addr_of` plus the `SK_TLS` / `SF_TLS` section/symbol
-machinery on `ObjBuilder`. Each case allocates a `.tdata` (initialized)
-or `.tbss` (zero-init) section, defines a `SK_TLS` symbol in it, and
-accesses storage via `tls_addr_of` → INDIRECT load/store. The backend
-chooses the TLS model (LE/IE/LD/GD) from `c->target` and the symbol's
-visibility; the expectations here don't presume one.
-
-The aarch64 backend currently implements TLS Local-Exec only (commit
-c1cf117). Path E requires `test/link/harness/start.c`'s TCB+TLS image
-setup; paths D/J have no per-thread TLS context yet and are expected to
-fail until the JIT runners grow it.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `n01_tls_load_le`        | · | `_Thread_local int x=42; return x;` (`.tdata`)          |  42 |
-| `n02_tls_store_le`       | · | `_Thread_local int x; x=42; return x;` (`.tbss`)         |  42 |
-| `n03_tls_addr_taken`     | · | `_Thread_local int x=17; int*p=&x; *p+=1; return *p;`    |  18 |
-| `n04_tls_i64`            | · | `_Thread_local long long x=0x1_0000_002A; return (int)x;` |  42 |
-| `n05_tls_in_loop`        | · | TLS access inside loop — addr may be hoisted but correct |  10 |
-| `n06_tls_two_vars`       | · | two distinct TLS vars; `a+b = 10+32`                     |  42 |
-| `n07_tls_bss_zero_init`  | · | `_Thread_local int x;` — `.tbss` reads as zero            |   0 |
-| `n08_tls_addend_offset`  | · | `_Thread_local int a[8]={...,42}; return a[7];`          |  42 |
-
-## Group O — sections and globals (non-TLS)
-
-Drives `addr_of` on `OPK_GLOBAL` operands plus direct `load`/`store`
-through `GLOBAL_op` (the spec accepts `LOCAL|GLOBAL|INDIRECT` addr
-operands). Exercises the `SecKind` × `SymKind` × `SymBind` matrix on
-`ObjBuilder`: `SEC_DATA`, `SEC_BSS`, `SEC_RODATA` × `SK_OBJ` ×
-`SB_GLOBAL` / `SB_LOCAL`, plus a named non-default text section for a
-function. Aggregate-global cases reuse the `Pt` type from
-`cases_shared.c` so its `TagId` interns once across groups.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `o01_global_load_data`         | · | `int g=42; return g;` — direct GLOBAL load              |  42 |
-| `o02_global_store_data`        | · | `int g; g=42; return g;` — store via GLOBAL operand     |  42 |
-| `o03_global_bss_zero`          | · | uninitialized `.bss` reads as zero                       |   0 |
-| `o04_global_addr_taken`        | · | b05 over a global: `&g`, +=1, reload                     |  18 |
-| `o05_global_i64`               | · | `long long g=0x1_0000_002A; return (int)g;`             |  42 |
-| `o06_rodata_load`              | · | `static const int rd[4]={1,2,42,4}; return rd[2];`      |  42 |
-| `o07_global_struct_field`      | · | `struct Pt g={10,32}; return g.a + g.b;`                |  42 |
-| `o08_global_array_runtime_idx` | · | `int g[5]={1..5}; int i=2; return g[i];`                |   3 |
-| `o09_static_local_linkage`     | · | `static int g=42;` — SB_LOCAL data sym                  |  42 |
-| `o10_global_addend`            | · | `int g[8]={...,42};` access via OPK_GLOBAL addend = 28  |  42 |
-| `o11_text_section_named`       | · | helper placed in `.text.o11_helper`; main calls it       |  42 |
-| `o12_global_across_call`       | · | `&g` materialized; intervening call must not corrupt it  |  42 |
-
-## Group Q — multi-function (extends Group B's two-function pattern)
-
-Group B established that two `func_begin`/`func_end` pairs work in one
-TU. Group Q stresses what falls out as the function count grows and
-linkage/section attributes vary: many small helpers, mixed
-`SB_GLOBAL`/`SB_LOCAL`, distinct signatures sharing one `CGTarget`,
-per-function text sections (`-ffunction-sections` analogue), and
-forward-declared helpers defined later in the TU.
-
-| Case | Status | Body | Expected |
-|---|---|---|---|
-| `q01_three_helpers`                 | · | `a()+b()+c() = 10+15+17`                                    |  42 |
-| `q02_static_internal_linkage`       | · | `static int helper(void){return 42;}` — SB_LOCAL            |  42 |
-| `q03_intra_tu_call_chain`           | · | `a→b→c→d`; d returns 42                                     |  42 |
-| `q04_eight_helpers`                 | · | start at 6, chain through 8 helpers each adding `i+1`        |  42 |
-| `q05_distinct_signatures`           | · | int(int), long(long,long), void(int*), int(void) all called |  42 |
-| `q06_function_section_distinct`     | · | helper in `.text.q06_helper`, main in default `.text`        |  42 |
-| `q07_cross_section_calls`           | · | a in `.text.q07_a` calls b in `.text.q07_b`; main calls a    |  42 |
-| `q08_forward_decl_define_late`      | · | main calls helper before the helper body is emitted          |  42 |
-| `q09_helper_calls_helper`           | · | `a()` calls `b()`; main calls `a()`                          |  42 |
-| `q10_global_and_static_mix`         | · | one SB_GLOBAL + two SB_LOCAL helpers; sum = 12+15+15        |  42 |
-| `q11_addr_of_helper_through_global` | · | function ptr stored in `.data` (R_ABS64); indirect call     |  42 |
-
-## Group P — set_loc / debug
-
-Drives the producer-side wiring described in `doc/DWARF.md` §3:
-`cgtest_set_loc` fans the SrcLoc to both `CGTarget.set_loc` (→ MCEmitter
-→ per-instruction `debug_emit_row`) and `debug_set_pending_loc`. The
-runner constructs a `Debug*` for cases that register W directives,
-plumbs it onto `MCEmitter.debug` and `CGTarget.debug`, and calls
-`debug_emit` between `cgtarget_finalize` and `obj_finalize`. The case
-body still returns 42 so D/R/E/J keep passing; the **W** path is the
-metadata oracle and reads the emitted obj back through `cfree_dwarf_*`.
-
-Phase status:
-- Phase 0 wiring (this group's prerequisite): `cgtest_set_loc`,
-  `MCEmitter.debug` line-row fanout in `emit32`, `CGTarget.debug`, and
-  `cgtest_begin_func` / `cgtest_end` calling `debug_func_begin` /
-  `debug_func_pc_range` are all in place.
-- Phase 1+2 (real `.debug_*` sections + `cfree_dwarf_open`): owned by
-  Agents A/B; W flips green for p01..p05 once both land.
-- Phase 3 (`debug_local`, `cfree_dwarf_var_at`): unblocks p07.
-
-| Case | Status | Body | Expected (D/E/J / W) |
-|---|---|---|---|
-| `p01_line_one_inst` | · | `set_loc(p01.c:10)` before single `load_imm 42; ret`; W asserts addr↔line round-trip and `subprogram test_main` | 42 / line p01.c:10 + subprogram test_main |
-| `p02_line_monotone` | · | three `set_loc` transitions on (p02.c, 1/2/3), each followed by a `load_imm`; W asserts all three lines round-trip | 42 / lines p02.c:1,2,3 + subprogram test_main |
-| `p03_line_repeat` | · | `set_loc(p03.c:7)` → `load_imm`; `set_loc(p03.c:8)` → `load_imm`; `set_loc(p03.c:7)` again before final `load_imm`. W asserts the (p03.c, 7) binding survives the round-trip | 42 / line p03.c:7 + subprogram test_main |
-| `p05_func_pc_range` | · | identical to p01 with file `p05.c`; W additionally asserts the subprogram pc range size lies in [16, 256] bytes | 42 / line p05.c:11 + subprogram + pc_range |
-| `p07_local_loc` | Phase 3 | one i32 local (`my_local`) stored to and reloaded from a frame slot; W asserts `var_at` returns a frame-relative location for the name | 42 / line p07.c:5 + subprogram + var (Phase 3) |
-
-## Deferred groups
-
-| Group | Theme |
-|---|---|
-| M | inline asm |
-| R | opt-wrapped equivalence |
diff --git a/test/cg/binder_test.c b/test/cg/binder_test.c
@@ -1,538 +0,0 @@
-/* Unit test for cg_inline_asm — the constraint binder (Track B of
- * doc/INLINEASM.md). Builds a Compiler with a stand-in CGTarget that
- * records every operand the binder hands to asm_block, then asserts the
- * binding shape for each constraint kind:
- *
- *   "r"        — input forced to OPK_REG.
- *   "=r"       — output gets a fresh REG, pushed back as an SValue.
- *   "+r"       — output reg is the same as the matching input slot's reg.
- *   "=&r"      — output reg is disjoint from any input reg.
- *   "i"        — input must be OPK_IMM; passes through.
- *   "m"        — addressable lvalue → OPK_INDIRECT in the bound input.
- *   "0"..."N"  — matching input bound to out_ops[N].v.reg.
- *   "memory"   — every live RES_REG SValue on the CG stack is spilled
- *                via target->spill_reg before asm_block fires.
- *   register-name — passed straight through in the clobbers array.
- *   "cc"       — accepted-and-dropped on the binder side (still appears
- *                in the clobbers array we forward — the arch backend
- *                handles the no-op).
- *
- * The mock target is the smallest thing that compiles: it hands out reg
- * ids 1, 2, 3, ... from a tiny pool, refuses to do real codegen, and
- * appends every call into a log buffer the test asserts against.
- *
- * Built standalone (no cg-runner dependency) so the test runs without
- * the JIT / link harness. Wired into test/test.mk as a separate target
- * (test-cg-binder). */
-
-#include <stdarg.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-#include <cfree.h>
-
-#include "abi/abi.h"
-#include "arch/arch.h"
-#include "cg/cg.h"
-#include "core/core.h"
-#include "core/heap.h"
-#include "core/pool.h"
-#include "type/type.h"
-
-/* ---- host glue ------------------------------------------------------- */
-
-static void* h_alloc(CfreeHeap* h, size_t n, size_t a) {
-  (void)h;
-  (void)a;
-  return n ? malloc(n) : NULL;
-}
-static void* h_realloc(CfreeHeap* h, void* p, size_t o, size_t n, size_t a) {
-  (void)h;
-  (void)o;
-  (void)a;
-  return realloc(p, n);
-}
-static void h_free(CfreeHeap* h, void* p, size_t n) {
-  (void)h;
-  (void)n;
-  free(p);
-}
-static CfreeHeap g_heap = {h_alloc, h_realloc, h_free, NULL};
-
-static void diag_emit(CfreeDiagSink* s, CfreeDiagKind k, CfreeSrcLoc loc,
-                      const char* fmt, va_list ap) {
-  static const char* names[] = {"note", "warning", "error", "fatal"};
-  (void)s;
-  (void)loc;
-  fprintf(stderr, "%s: ", names[k]);
-  vfprintf(stderr, fmt, ap);
-  fputc('\n', stderr);
-}
-static CfreeDiagSink g_diag = {diag_emit, NULL, 0, 0};
-
-/* ---- mock CGTarget --------------------------------------------------- */
-
-#define MOCK_LOG_CAP 4096u
-#define MOCK_REG_CAP 16u
-
-typedef struct MockTarget {
-  CGTarget base;
-
-  /* Tiny reg pool: hand out ids in [1, MOCK_REG_CAP]. 0 is "free", 1 is
-   * "in use". Class is ignored — the binder asks for RC_INT throughout. */
-  u8 in_use[MOCK_REG_CAP + 1u];
-
-  /* Spill-slot id counter; binder doesn't care about layout, only id
-   * uniqueness. */
-  FrameSlot next_slot;
-
-  /* Recorded asm_block call (last one wins; the binder only fires once
-   * per cg_inline_asm). */
-  int asm_called;
-  const char* tmpl;
-  u32 nout, nin, nclob;
-  Operand out_ops[8];
-  Operand in_ops[8];
-  Sym clobbers[8];
-
-  /* Log of side effects: spills, copies, load_imms, addr_ofs, free_regs.
-   * Each entry is a one-line summary; the test scans for substrings. */
-  char log[MOCK_LOG_CAP];
-  u32 log_len;
-} MockTarget;
-
-static void mock_logf(MockTarget* m, const char* fmt, ...) {
-  if (m->log_len >= MOCK_LOG_CAP - 1u) return;
-  va_list ap;
-  va_start(ap, fmt);
-  int n = vsnprintf(m->log + m->log_len, MOCK_LOG_CAP - m->log_len, fmt, ap);
-  va_end(ap);
-  if (n > 0) m->log_len += (u32)n;
-}
-
-static Reg m_alloc_reg(CGTarget* t, RegClass cls, const Type* ty) {
-  (void)cls;
-  (void)ty;
-  MockTarget* m = (MockTarget*)t;
-  for (u32 i = 1; i <= MOCK_REG_CAP; ++i) {
-    if (!m->in_use[i]) {
-      m->in_use[i] = 1;
-      return (Reg)i;
-    }
-  }
-  return (Reg)REG_NONE;
-}
-
-static void m_free_reg(CGTarget* t, Reg r, RegClass cls) {
-  (void)cls;
-  MockTarget* m = (MockTarget*)t;
-  if (r != (Reg)REG_NONE && r <= MOCK_REG_CAP) m->in_use[r] = 0;
-  mock_logf(m, "free_reg r%u\n", (unsigned)r);
-}
-
-static FrameSlot m_frame_slot(CGTarget* t, const FrameSlotDesc* d) {
-  (void)d;
-  MockTarget* m = (MockTarget*)t;
-  return ++m->next_slot;
-}
-
-static void m_spill_reg(CGTarget* t, Operand src, FrameSlot s, MemAccess ma) {
-  (void)ma;
-  MockTarget* m = (MockTarget*)t;
-  mock_logf(m, "spill_reg r%u -> slot %u\n", (unsigned)src.v.reg, (unsigned)s);
-}
-
-static void m_reload_reg(CGTarget* t, Operand dst, FrameSlot s, MemAccess ma) {
-  (void)ma;
-  MockTarget* m = (MockTarget*)t;
-  mock_logf(m, "reload_reg slot %u -> r%u\n", (unsigned)s, (unsigned)dst.v.reg);
-}
-
-static void m_load_imm(CGTarget* t, Operand dst, i64 imm) {
-  MockTarget* m = (MockTarget*)t;
-  mock_logf(m, "load_imm r%u = %lld\n", (unsigned)dst.v.reg, (long long)imm);
-}
-
-static void m_copy(CGTarget* t, Operand dst, Operand src) {
-  MockTarget* m = (MockTarget*)t;
-  mock_logf(m, "copy r%u <- r%u\n", (unsigned)dst.v.reg, (unsigned)src.v.reg);
-}
-
-static void m_addr_of(CGTarget* t, Operand dst, Operand lv) {
-  MockTarget* m = (MockTarget*)t;
-  unsigned src_id = 0;
-  switch (lv.kind) {
-    case OPK_LOCAL:   src_id = (unsigned)lv.v.frame_slot; break;
-    case OPK_GLOBAL:  src_id = (unsigned)lv.v.global.sym; break;
-    case OPK_INDIRECT: src_id = (unsigned)lv.v.ind.base; break;
-    default: src_id = 0;
-  }
-  mock_logf(m, "addr_of r%u <- kind=%u id=%u\n", (unsigned)dst.v.reg,
-            (unsigned)lv.kind, src_id);
-}
-
-static void m_set_loc(CGTarget* t, SrcLoc loc) {
-  (void)t;
-  (void)loc;
-}
-
-static void m_func_begin(CGTarget* t, const CGFuncDesc* fd) {
-  (void)t;
-  (void)fd;
-}
-static void m_func_end(CGTarget* t) { (void)t; }
-static void m_param(CGTarget* t, const CGParamDesc* d) {
-  (void)t;
-  (void)d;
-}
-
-static void m_asm_block(CGTarget* t, const char* tmpl,
-                        const AsmConstraint* outs, u32 nout, Operand* out_ops,
-                        const AsmConstraint* ins, u32 nin,
-                        const Operand* in_ops, const Sym* clobbers, u32 nclob) {
-  (void)outs;
-  (void)ins;
-  MockTarget* m = (MockTarget*)t;
-  m->asm_called = 1;
-  m->tmpl = tmpl;
-  m->nout = nout;
-  m->nin = nin;
-  m->nclob = nclob;
-  if (nout > 8) nout = 8;
-  if (nin > 8) nin = 8;
-  if (nclob > 8) nclob = 8;
-  for (u32 i = 0; i < nout; ++i) m->out_ops[i] = out_ops[i];
-  for (u32 i = 0; i < nin; ++i) m->in_ops[i] = in_ops[i];
-  for (u32 i = 0; i < nclob; ++i) m->clobbers[i] = clobbers[i];
-  mock_logf(m, "asm_block tmpl=%s nout=%u nin=%u nclob=%u\n",
-            tmpl ? tmpl : "(null)", (unsigned)m->nout, (unsigned)m->nin,
-            (unsigned)m->nclob);
-}
-
-static void mock_target_init(MockTarget* m, Compiler* c) {
-  memset(m, 0, sizeof *m);
-  m->base.c = c;
-  m->base.alloc_reg = m_alloc_reg;
-  m->base.free_reg = m_free_reg;
-  m->base.frame_slot = m_frame_slot;
-  m->base.param = m_param;
-  m->base.spill_reg = m_spill_reg;
-  m->base.reload_reg = m_reload_reg;
-  m->base.load_imm = m_load_imm;
-  m->base.copy = m_copy;
-  m->base.addr_of = m_addr_of;
-  m->base.func_begin = m_func_begin;
-  m->base.func_end = m_func_end;
-  m->base.set_loc = m_set_loc;
-  m->base.asm_block = m_asm_block;
-}
-
-/* ---- per-test compiler scaffold ------------------------------------- */
-
-typedef struct TestCtx {
-  Compiler cc;
-  Compiler* c;
-  MockTarget mt;
-  CG* g;
-  const Type* i64_ty;
-} TestCtx;
-
-static void tc_init(TestCtx* tc) {
-  CfreeEnv env;
-  memset(&env, 0, sizeof env);
-  env.heap = &g_heap;
-  env.diag = &g_diag;
-  env.now = -1;
-
-  CfreeTarget tgt;
-  memset(&tgt, 0, sizeof tgt);
-  tgt.arch = CFREE_ARCH_ARM_64;
-  tgt.os = CFREE_OS_LINUX;
-  tgt.obj = CFREE_OBJ_ELF;
-  tgt.ptr_size = 8;
-  tgt.ptr_align = 8;
-
-  /* compiler_init wants the env on the heap-pointer side; stash it. */
-  static CfreeEnv s_env_stash;
-  s_env_stash = env;
-  compiler_init(&tc->cc, tgt, &s_env_stash);
-  tc->c = &tc->cc;
-
-  mock_target_init(&tc->mt, tc->c);
-  tc->g = cg_new(tc->c, &tc->mt.base, NULL);
-  tc->i64_ty = type_prim(tc->c->global, TY_LLONG);
-}
-
-static void tc_fini(TestCtx* tc) {
-  cg_free(tc->g);
-  compiler_fini(&tc->cc);
-}
-
-/* ---- assertion helpers ----------------------------------------------- */
-
-static int g_fails = 0;
-static int g_cases = 0;
-
-#define EXPECT(cond, ...) do {                                           \
-  ++g_cases;                                                             \
-  if (!(cond)) {                                                         \
-    ++g_fails;                                                           \
-    fprintf(stderr, "FAIL %s:%d: %s\n", __FILE__, __LINE__, #cond);      \
-    fprintf(stderr, "  ");                                               \
-    fprintf(stderr, __VA_ARGS__);                                        \
-    fputc('\n', stderr);                                                 \
-  }                                                                      \
-} while (0)
-
-static int log_contains(const MockTarget* m, const char* needle) {
-  return strstr(m->log, needle) != NULL;
-}
-
-/* ---- test cases ------------------------------------------------------ */
-
-static void test_r_in(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* asm("nop" :: "r"(42)) */
-  cg_push_int(tc.g, 42, tc.i64_ty);
-  AsmConstraint ins[1] = {{.str="r", .dir=ASM_IN}};
-  cg_inline_asm(tc.g, "nop", NULL, 0, ins, 1, NULL, 0);
-  EXPECT(tc.mt.asm_called, "asm_block was not invoked");
-  EXPECT(tc.mt.nin == 1, "nin=%u", tc.mt.nin);
-  EXPECT(tc.mt.in_ops[0].kind == OPK_REG, "in_ops[0].kind=%u",
-         tc.mt.in_ops[0].kind);
-  /* The IMM was materialized into a freshly-allocated reg; load_imm shows it. */
-  EXPECT(log_contains(&tc.mt, "load_imm"), "missing load_imm in log:\n%s",
-         tc.mt.log);
-  tc_fini(&tc);
-}
-
-static void test_eq_r_out(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* asm("mov %0, #1" : "=r"(x)) — pushes an output SValue back. */
-  AsmConstraint outs[1] = {{.str="=r", .dir=ASM_OUT}};
-  cg_inline_asm(tc.g, "mov %0, #1", outs, 1, NULL, 0, NULL, 0);
-  EXPECT(tc.mt.asm_called, "asm_block was not invoked");
-  EXPECT(tc.mt.nout == 1, "nout=%u", tc.mt.nout);
-  EXPECT(tc.mt.out_ops[0].kind == OPK_REG, "out_ops[0].kind=%u",
-         tc.mt.out_ops[0].kind);
-  EXPECT(tc.mt.out_ops[0].v.reg != (Reg)REG_NONE, "out reg should be allocated");
-  tc_fini(&tc);
-}
-
-static void test_plus_r_inout(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* +r is GCC's "use this reg as both input and output". The parser
-   * convention this binder honors: emit one output with =r-style behavior
-   * and one matching input "0" with the input value to seed the reg. */
-  cg_push_int(tc.g, 7, tc.i64_ty);
-  AsmConstraint outs[1] = {{.str="+r", .dir=ASM_INOUT}};
-  AsmConstraint ins[1]  = {{.str="0", .dir=ASM_IN}};
-  cg_inline_asm(tc.g, "add %0, %0, #1", outs, 1, ins, 1, NULL, 0);
-  EXPECT(tc.mt.nout == 1 && tc.mt.nin == 1, "nout/nin");
-  EXPECT(tc.mt.out_ops[0].kind == OPK_REG, "out reg");
-  EXPECT(tc.mt.in_ops[0].kind == OPK_REG, "in reg");
-  EXPECT(tc.mt.out_ops[0].v.reg == tc.mt.in_ops[0].v.reg,
-         "matching constraint should bind to same reg (out=%u in=%u)",
-         (unsigned)tc.mt.out_ops[0].v.reg, (unsigned)tc.mt.in_ops[0].v.reg);
-  tc_fini(&tc);
-}
-
-static void test_eq_amp_r_early_clobber(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* asm("..." : "=&r"(x) : "r"(y)) — output reg must differ from input reg. */
-  cg_push_int(tc.g, 5, tc.i64_ty);
-  AsmConstraint outs[1] = {{.str="=&r", .dir=ASM_OUT}};
-  AsmConstraint ins[1]  = {{.str="r", .dir=ASM_IN}};
-  cg_inline_asm(tc.g, "tmpl", outs, 1, ins, 1, NULL, 0);
-  EXPECT(tc.mt.out_ops[0].kind == OPK_REG && tc.mt.in_ops[0].kind == OPK_REG,
-         "REGs expected");
-  EXPECT(tc.mt.out_ops[0].v.reg != tc.mt.in_ops[0].v.reg,
-         "early-clobber should be disjoint (out=%u in=%u)",
-         (unsigned)tc.mt.out_ops[0].v.reg, (unsigned)tc.mt.in_ops[0].v.reg);
-  tc_fini(&tc);
-}
-
-static void test_i_constant(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  cg_push_int(tc.g, 99, tc.i64_ty);
-  AsmConstraint ins[1] = {{.str="i", .dir=ASM_IN}};
-  cg_inline_asm(tc.g, "tmpl", NULL, 0, ins, 1, NULL, 0);
-  EXPECT(tc.mt.in_ops[0].kind == OPK_IMM, "in kind=%u", tc.mt.in_ops[0].kind);
-  EXPECT(tc.mt.in_ops[0].v.imm == 99, "in imm=%lld",
-         (long long)tc.mt.in_ops[0].v.imm);
-  /* No load_imm (the binder forwards the IMM unchanged). */
-  EXPECT(!log_contains(&tc.mt, "load_imm"),
-         "'i' should not load_imm, log:\n%s", tc.mt.log);
-  tc_fini(&tc);
-}
-
-static void test_m_memory_lvalue(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* Push a local lvalue; "m" should materialize it into OPK_INDIRECT
-   * via target->addr_of. */
-  FrameSlotDesc fsd;
-  memset(&fsd, 0, sizeof fsd);
-  fsd.size = 8;
-  fsd.align = 8;
-  fsd.kind = FS_LOCAL;
-  FrameSlot s = cg_local(tc.g, &fsd);
-  /* Use the type-aware push path. We declare it via prototype: */
-  void cg_push_local_typed(CG*, FrameSlot, const Type*);
-  cg_push_local_typed(tc.g, s, tc.i64_ty);
-  AsmConstraint ins[1] = {{.str="m", .dir=ASM_IN}};
-  cg_inline_asm(tc.g, "ldr w0, %0", NULL, 0, ins, 1, NULL, 0);
-  EXPECT(tc.mt.in_ops[0].kind == OPK_INDIRECT, "in kind=%u",
-         tc.mt.in_ops[0].kind);
-  EXPECT(log_contains(&tc.mt, "addr_of"),
-         "expected addr_of in log:\n%s", tc.mt.log);
-  tc_fini(&tc);
-}
-
-static void test_matching_input(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* Output =r at index 0; input "0" should bind to its reg. */
-  cg_push_int(tc.g, 11, tc.i64_ty);
-  AsmConstraint outs[1] = {{.str="=r", .dir=ASM_OUT}};
-  AsmConstraint ins[1]  = {{.str="0", .dir=ASM_IN}};
-  cg_inline_asm(tc.g, "tmpl", outs, 1, ins, 1, NULL, 0);
-  EXPECT(tc.mt.out_ops[0].v.reg == tc.mt.in_ops[0].v.reg,
-         "matching '0' input should reuse out reg (out=%u in=%u)",
-         (unsigned)tc.mt.out_ops[0].v.reg, (unsigned)tc.mt.in_ops[0].v.reg);
-  tc_fini(&tc);
-}
-
-static void test_memory_clobber_spills_live_regs(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* Push a LOCAL lvalue, load it into a reg via cg_load — that leaves a
-   * live RES_REG SValue at the bottom of the stack. Then call asm with a
-   * "memory" clobber and verify the live reg got spilled before the
-   * asm_block fired. */
-  FrameSlotDesc fsd;
-  memset(&fsd, 0, sizeof fsd);
-  fsd.size = 8;
-  fsd.align = 8;
-  fsd.kind = FS_LOCAL;
-  FrameSlot s = cg_local(tc.g, &fsd);
-  void cg_push_local_typed(CG*, FrameSlot, const Type*);
-  cg_push_local_typed(tc.g, s, tc.i64_ty);
-  /* Need a load implementation on the mock to promote LOCAL → REG.
-   * The mock doesn't implement target->load — instead push an immediate
-   * to get a REG-resident SValue without calling cg_load. */
-  cg_push_int(tc.g, 0, tc.i64_ty); /* IMM, not REG, won't be a spill victim */
-
-  /* Force a REG-resident SValue at the bottom by pushing an int and
-   * promoting via cg_inline_asm itself — easier: skip this complexity and
-   * directly observe spill via a real reg-resident value built by =r
-   * output being pushed back. */
-  AsmConstraint outs1[1] = {{.str="=r", .dir=ASM_OUT}};
-  cg_inline_asm(tc.g, "produce", outs1, 1, NULL, 0, NULL, 0);
-  /* Now the stack has a REG-resident SValue from the produced output.
-   * Reset the log before the second call so we can scan for spill_reg
-   * specifically caused by "memory". */
-  tc.mt.log_len = 0;
-  tc.mt.log[0] = '\0';
-
-  Sym mem_sym = pool_intern_cstr(tc.c->global, "memory");
-  Sym clobs[1] = {mem_sym};
-  cg_inline_asm(tc.g, "barrier", NULL, 0, NULL, 0, clobs, 1);
-  EXPECT(log_contains(&tc.mt, "spill_reg"),
-         "expected spill_reg from memory clobber, log:\n%s", tc.mt.log);
-  tc_fini(&tc);
-}
-
-static void test_register_clobber_passthrough(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  Sym x0_sym = pool_intern_cstr(tc.c->global, "x0");
-  Sym clobs[1] = {x0_sym};
-  cg_inline_asm(tc.g, "tmpl", NULL, 0, NULL, 0, clobs, 1);
-  EXPECT(tc.mt.nclob == 1, "nclob=%u", tc.mt.nclob);
-  EXPECT(tc.mt.clobbers[0] == x0_sym, "register clobber not forwarded");
-  tc_fini(&tc);
-}
-
-static void test_cc_clobber_silent(void) {
-  TestCtx tc;
-  tc_init(&tc);
-  /* "cc" should not cause spills; the binder forwards it but does no
-   * special work. (Arch backends drop it on aarch64.) */
-  Sym cc_sym = pool_intern_cstr(tc.c->global, "cc");
-  Sym clobs[1] = {cc_sym};
-  /* Arrange a live REG-resident SValue first; verify it is NOT spilled. */
-  AsmConstraint outs1[1] = {{.str="=r", .dir=ASM_OUT}};
-  cg_inline_asm(tc.g, "produce", outs1, 1, NULL, 0, NULL, 0);
-  tc.mt.log_len = 0;
-  tc.mt.log[0] = '\0';
-  cg_inline_asm(tc.g, "tmpl", NULL, 0, NULL, 0, clobs, 1);
-  EXPECT(!log_contains(&tc.mt, "spill_reg"),
-         "'cc' must not trigger spills, log:\n%s", tc.mt.log);
-  EXPECT(tc.mt.nclob == 1, "nclob=%u", tc.mt.nclob);
-  tc_fini(&tc);
-}
-
-/* AsmConstraint.type drives RegClass for fresh output regs. An FP-typed
- * output must land in RC_FP; a pointer-typed output stays in RC_INT.
- * Hand-built (NULL-type) constraints fall back to int / 64-bit (covered
- * by every other case in this file). */
-static void test_output_type_drives_regclass(void) {
-  TestCtx tc;
-  tc_init(&tc);
-
-  /* asm("..." : "=r"(double_var))  → RC_FP allocation. */
-  const Type* dbl_ty = type_prim(tc.c->global, TY_DOUBLE);
-  AsmConstraint outs_fp[1] = {{.str="=r", .type=dbl_ty, .dir=ASM_OUT}};
-  cg_inline_asm(tc.g, "fmov %0, #1.0", outs_fp, 1, NULL, 0, NULL, 0);
-  EXPECT(tc.mt.nout == 1, "nout=%u", tc.mt.nout);
-  EXPECT(tc.mt.out_ops[0].kind == OPK_REG, "fp out kind=%u",
-         tc.mt.out_ops[0].kind);
-  EXPECT(tc.mt.out_ops[0].cls == RC_FP,
-         "fp out cls=%u (expected RC_FP=%u)", tc.mt.out_ops[0].cls, RC_FP);
-  EXPECT(tc.mt.out_ops[0].type == dbl_ty,
-         "fp out type lost through binder");
-
-  /* Drop the SValue the previous call pushed back so the next call
-   * starts from a clean stack. */
-  cg_drop(tc.g);
-  tc.mt.asm_called = 0;
-  tc.mt.nout = 0;
-
-  /* asm("..." : "=r"(int_ptr))  → RC_INT, pointer type preserved. */
-  const Type* ptr_ty = type_ptr(tc.c->global, type_prim(tc.c->global, TY_INT));
-  AsmConstraint outs_p[1] = {{.str="=r", .type=ptr_ty, .dir=ASM_OUT}};
-  cg_inline_asm(tc.g, "mov %0, sp", outs_p, 1, NULL, 0, NULL, 0);
-  EXPECT(tc.mt.out_ops[0].kind == OPK_REG, "ptr out kind=%u",
-         tc.mt.out_ops[0].kind);
-  EXPECT(tc.mt.out_ops[0].cls == RC_INT,
-         "ptr out cls=%u (expected RC_INT=%u)", tc.mt.out_ops[0].cls, RC_INT);
-  EXPECT(tc.mt.out_ops[0].type == ptr_ty,
-         "ptr out type lost through binder");
-
-  cg_drop(tc.g);
-  tc_fini(&tc);
-}
-
-int main(void) {
-  test_r_in();
-  test_eq_r_out();
-  test_plus_r_inout();
-  test_eq_amp_r_early_clobber();
-  test_i_constant();
-  test_m_memory_lvalue();
-  test_matching_input();
-  test_memory_clobber_spills_live_regs();
-  test_register_clobber_passthrough();
-  test_cc_clobber_silent();
-  test_output_type_drives_regclass();
-
-  fprintf(stderr, "binder_test: %d cases, %d failures\n", g_cases, g_fails);
-  return g_fails ? 1 : 0;
-}
diff --git a/test/cg/dwarf_validate.sh b/test/cg/dwarf_validate.sh
@@ -1,81 +0,0 @@
-#!/usr/bin/env bash
-# test/cg/dwarf_validate.sh — optional third-party DWARF validators.
-#
-# Per doc/DWARF.md §5.3: run `llvm-dwarfdump --verify` and `readelf` over
-# the Phase-1 obj files Group P produces. These are NOT the oracle for
-# any case; the W path's `cg_check_dwarf` is. They exist to catch wire-
-# format errors that our own consumer would miss in the same way the
-# producer makes them.
-#
-# Usage:
-#   test/cg/dwarf_validate.sh [obj-file ...]
-#
-# With no arguments, validates every emitted obj under build/test/cg/p*/.
-# Tools are gated on `command -v` checks; missing tools are skipped
-# silently (exit 0). One non-zero per failed verify; the script returns
-# the count of failures.
-
-set -u
-
-ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
-BUILD_DIR="$ROOT/build/test/cg"
-
-DWARFDUMP="$(command -v llvm-dwarfdump 2>/dev/null || true)"
-READELF_BIN="$(command -v llvm-readelf 2>/dev/null || command -v readelf 2>/dev/null || true)"
-
-if [ -z "$DWARFDUMP" ] && [ -z "$READELF_BIN" ]; then
-    printf 'dwarf_validate: neither llvm-dwarfdump nor readelf in PATH; skipping\n'
-    exit 0
-fi
-
-# Collect targets.
-declare -a OBJS
-if [ $# -gt 0 ]; then
-    OBJS=("$@")
-else
-    if [ ! -d "$BUILD_DIR" ]; then
-        printf 'dwarf_validate: %s does not exist; run test-cg first\n' "$BUILD_DIR" >&2
-        exit 0
-    fi
-    while IFS= read -r f; do OBJS+=("$f"); done \
-        < <(find "$BUILD_DIR" -path '*/p*/p*.o' -type f 2>/dev/null)
-fi
-
-if [ ${#OBJS[@]} -eq 0 ]; then
-    printf 'dwarf_validate: no Group P obj files found; skipping\n'
-    exit 0
-fi
-
-fails=0
-for obj in "${OBJS[@]}"; do
-    [ -f "$obj" ] || continue
-    printf '== %s ==\n' "$obj"
-
-    if [ -n "$DWARFDUMP" ]; then
-        if ! "$DWARFDUMP" --verify "$obj" >/tmp/dwarf_verify.out 2>&1; then
-            printf '  FAIL llvm-dwarfdump --verify\n'
-            sed -n '1,40p' /tmp/dwarf_verify.out | sed 's/^/    /'
-            fails=$((fails + 1))
-        else
-            printf '  PASS llvm-dwarfdump --verify\n'
-        fi
-    fi
-
-    if [ -n "$READELF_BIN" ]; then
-        # Reference render. Non-zero return is a structural error; we
-        # don't diff content, just confirm the reader can walk every
-        # required section.
-        if ! "$READELF_BIN" --debug-dump=info,line,abbrev,aranges \
-                "$obj" >/tmp/dwarf_readelf.out 2>&1; then
-            printf '  FAIL readelf --debug-dump=info,line,abbrev,aranges\n'
-            sed -n '1,20p' /tmp/dwarf_readelf.out | sed 's/^/    /'
-            fails=$((fails + 1))
-        else
-            printf '  PASS readelf --debug-dump=info,line,abbrev,aranges\n'
-        fi
-    fi
-done
-
-total=${#OBJS[@]}
-printf '\ndwarf_validate: %d/%d objs failed\n' "$fails" "$total"
-exit "$fails"
diff --git a/test/cg/harness/cases.c b/test/cg/harness/cases.c
@@ -1,555 +0,0 @@
-/* Test case registry.
- *
- * Each case is a builder function plus an entry in cg_cases[]. Builders
- * live in per-group files (cases_a.c .. cases_i.c, cases_mc.c); helpers
- * shared across groups live in cases_shared.c. This file is just
- * forward-declarations + the flat registry the runner walks.
- *
- * Adding a case: add a `static`/non-static `build_<name>` to the
- * appropriate cases_<x>.c, list its prototype below, append a row to
- * cg_cases[], and update CORPUS.md.
- *
- * Expected exit codes are stored modulo 256 — POSIX exit() truncates to
- * one byte, and the harness compares against (expected & 0xff). */
-
-#include "cg_test.h"
-
-/* ---- builder forward decls ---- */
-
-void build_mc_smoke(CgTestCtx*);
-
-void build_a01_return_const_42(CgTestCtx*);
-void build_a02_return_zero(CgTestCtx*);
-void build_a03_ret_imm(CgTestCtx*);
-void build_a04_copy_reg(CgTestCtx*);
-void build_a05_return_neg_small(CgTestCtx*);
-void build_a06_return_i64(CgTestCtx*);
-void build_a07_void_return(CgTestCtx*);
-void build_a08_multiple_returns(CgTestCtx*);
-void build_a09_load_imm_movz_movk(CgTestCtx*);
-void build_a10_return_u8(CgTestCtx*);
-
-void build_b01_param_int(CgTestCtx*);
-void build_b02_param_sum(CgTestCtx*);
-void build_b03_param_spill(CgTestCtx*);
-void build_b04_local_int(CgTestCtx*);
-void build_b05_addr_taken_local(CgTestCtx*);
-void build_b06_sret(CgTestCtx*);
-void build_b07_byval_param(CgTestCtx*);
-void build_b08_fp_param(CgTestCtx*);
-
-void build_c01_add(CgTestCtx*);
-void build_c02_sub_mul(CgTestCtx*);
-void build_c03_bitwise(CgTestCtx*);
-void build_c04_shift(CgTestCtx*);
-void build_c05_div_mod(CgTestCtx*);
-void build_c06_xor(CgTestCtx*);
-void build_c07_iadd_i64(CgTestCtx*);
-void build_c08_unsigned_div(CgTestCtx*);
-void build_c09_neg(CgTestCtx*);
-void build_c10_logical_not(CgTestCtx*);
-void build_c11_shr_signed(CgTestCtx*);
-void build_c12_imul_i64(CgTestCtx*);
-
-void build_d01_cmp_eq_true(CgTestCtx*);
-void build_d02_cmp_eq_false(CgTestCtx*);
-void build_d03_cmp_ne(CgTestCtx*);
-void build_d04_cmp_lt_signed(CgTestCtx*);
-void build_d05_cmp_lt_unsigned(CgTestCtx*);
-void build_d06_cmp_ge_signed(CgTestCtx*);
-void build_d07_cmp_branch_taken(CgTestCtx*);
-void build_d08_cmp_branch_not_taken(CgTestCtx*);
-void build_d09_cmp_branch_lt_signed(CgTestCtx*);
-void build_d10_jump(CgTestCtx*);
-void build_d11_scope_if_true(CgTestCtx*);
-void build_d12_scope_if_false(CgTestCtx*);
-void build_d13_scope_if_else(CgTestCtx*);
-
-void build_e01_sext_i8_i32(CgTestCtx*);
-void build_e02_zext_u8_i32(CgTestCtx*);
-void build_e03_sext_i16_i32(CgTestCtx*);
-void build_e04_zext_u16_i32(CgTestCtx*);
-void build_e05_zext_u32_i64(CgTestCtx*);
-void build_e06_sext_i32_i64(CgTestCtx*);
-void build_e07_trunc_i64_i32(CgTestCtx*);
-void build_e08_trunc_i32_i8(CgTestCtx*);
-void build_e09_itof_s_i32_f32(CgTestCtx*);
-void build_e10_itof_u_u32_f64(CgTestCtx*);
-void build_e11_ftoi_s_neg(CgTestCtx*);
-void build_e12_ftoi_u_pos(CgTestCtx*);
-void build_e13_fext_f32_f64(CgTestCtx*);
-void build_e14_ftrunc_f64_f32(CgTestCtx*);
-void build_e15_bitcast_i32_f32(CgTestCtx*);
-
-void build_f01_load_store_i8(CgTestCtx*);
-void build_f02_load_store_i16(CgTestCtx*);
-void build_f03_load_store_i64(CgTestCtx*);
-void build_f04_load_store_f32(CgTestCtx*);
-void build_f05_load_store_f64(CgTestCtx*);
-void build_f06_indirect_nonzero_offset(CgTestCtx*);
-void build_f07_store_reg(CgTestCtx*);
-void build_f08_copy_bytes(CgTestCtx*);
-void build_f09_set_bytes_zero(CgTestCtx*);
-void build_f10_set_bytes_ff(CgTestCtx*);
-void build_f11_volatile_rw(CgTestCtx*);
-void build_f12_bitfield_unsigned(CgTestCtx*);
-void build_f13_bitfield_signed(CgTestCtx*);
-
-void build_g01_indirect_call(CgTestCtx*);
-void build_g02_recursion_factorial(CgTestCtx*);
-void build_g03_recursion_fib(CgTestCtx*);
-void build_g04_mutual_recursion(CgTestCtx*);
-void build_g05_chained_calls(CgTestCtx*);
-void build_g06_mixed_int_fp_params(CgTestCtx*);
-void build_g07_void_call_outparam(CgTestCtx*);
-void build_g08_large_struct_byval(CgTestCtx*);
-void build_g09_hfa_param_f32x2(CgTestCtx*);
-void build_g10_hfa_return_f32x2(CgTestCtx*);
-void build_g11_caller_saved_live_across_call(CgTestCtx*);
-void build_g12_addr_taken_local_across_call(CgTestCtx*);
-void build_g13_call_in_loop_induction(CgTestCtx*);
-
-void build_h01_while_sum_0_to_9(CgTestCtx*);
-void build_h02_do_while_once(CgTestCtx*);
-void build_h03_for_count_to_10(CgTestCtx*);
-void build_h04_loop_break(CgTestCtx*);
-void build_h05_loop_continue(CgTestCtx*);
-void build_h06_nested_loops(CgTestCtx*);
-void build_h07_break_inner_only(CgTestCtx*);
-void build_h08_early_return_in_loop(CgTestCtx*);
-void build_h09_switch_three_cases(CgTestCtx*);
-void build_h10_switch_fallthrough(CgTestCtx*);
-void build_h11_switch_default(CgTestCtx*);
-void build_h12_jump_forward(CgTestCtx*);
-void build_h13_jump_backward(CgTestCtx*);
-void build_h14_short_circuit_and_skip(CgTestCtx*);
-void build_h15_short_circuit_or_skip(CgTestCtx*);
-void build_h16_ternary(CgTestCtx*);
-void build_h17_ternary_side_effect_one_arm(CgTestCtx*);
-void build_h18_unreachable_after_ret(CgTestCtx*);
-
-void build_i01_alloca_const_int(CgTestCtx*);
-void build_i02_alloca_runtime_size(CgTestCtx*);
-void build_i03_alloca_align_16(CgTestCtx*);
-void build_i04_alloca_in_loop_distinct(CgTestCtx*);
-void build_i05_alloca_then_call(CgTestCtx*);
-void build_i06_two_allocas_disjoint(CgTestCtx*);
-void build_i07_alloca_addr_escapes(CgTestCtx*);
-void build_i08_vla_param_sum(CgTestCtx*);
-void build_i09_alloca_preserves_locals(CgTestCtx*);
-void build_i10_alloca_after_named_local(CgTestCtx*);
-
-void build_j01_va_int_sum_3(CgTestCtx*);
-void build_j02_va_zero_args(CgTestCtx*);
-void build_j03_va_int_spill(CgTestCtx*);
-void build_j04_va_int64(CgTestCtx*);
-void build_j05_va_double_sum(CgTestCtx*);
-void build_j06_va_double_spill(CgTestCtx*);
-void build_j07_va_mixed_int_dbl(CgTestCtx*);
-void build_j08_va_copy(CgTestCtx*);
-void build_j09_va_two_fixed(CgTestCtx*);
-
-void build_k01_atomic_load_relaxed(CgTestCtx*);
-void build_k02_atomic_store_load_acq(CgTestCtx*);
-void build_k03_atomic_load_seq_cst(CgTestCtx*);
-void build_k04_atomic_rmw_add(CgTestCtx*);
-void build_k05_atomic_rmw_xchg(CgTestCtx*);
-void build_k06_atomic_rmw_and(CgTestCtx*);
-void build_k07_atomic_rmw_or(CgTestCtx*);
-void build_k08_atomic_rmw_xor(CgTestCtx*);
-void build_k09_atomic_rmw_sub(CgTestCtx*);
-void build_k10_atomic_rmw_nand(CgTestCtx*);
-void build_k11_atomic_cas_success(CgTestCtx*);
-void build_k12_atomic_cas_failure(CgTestCtx*);
-void build_k13_atomic_load_i64(CgTestCtx*);
-void build_k14_atomic_rmw_prior(CgTestCtx*);
-void build_k15_fence_seq_cst(CgTestCtx*);
-
-void build_l01_popcount_u32(CgTestCtx*);
-void build_l02_popcount_u64(CgTestCtx*);
-void build_l03_ctz_u32(CgTestCtx*);
-void build_l04_clz_u32(CgTestCtx*);
-void build_l05_bswap16(CgTestCtx*);
-void build_l06_bswap32(CgTestCtx*);
-void build_l07_bswap64(CgTestCtx*);
-void build_l08_memcpy_4(CgTestCtx*);
-void build_l09_memmove_overlap(CgTestCtx*);
-void build_l10_memset_zero(CgTestCtx*);
-void build_l11_memset_ff(CgTestCtx*);
-void build_l12_expect_taken(CgTestCtx*);
-void build_l13_unreachable_live(CgTestCtx*);
-void build_l14_trap_live(CgTestCtx*);
-void build_l15_prefetch_noop(CgTestCtx*);
-void build_l16_assume_aligned(CgTestCtx*);
-void build_l17_add_overflow_no(CgTestCtx*);
-void build_l18_add_overflow_yes(CgTestCtx*);
-void build_l19_sub_overflow_yes(CgTestCtx*);
-void build_l20_mul_overflow_no(CgTestCtx*);
-
-void build_n01_tls_load_le(CgTestCtx*);
-void build_n02_tls_store_le(CgTestCtx*);
-void build_n03_tls_addr_taken(CgTestCtx*);
-void build_n04_tls_i64(CgTestCtx*);
-void build_n05_tls_in_loop(CgTestCtx*);
-void build_n06_tls_two_vars(CgTestCtx*);
-void build_n07_tls_bss_zero_init(CgTestCtx*);
-void build_n08_tls_addend_offset(CgTestCtx*);
-
-void build_o01_global_load_data(CgTestCtx*);
-void build_o02_global_store_data(CgTestCtx*);
-void build_o03_global_bss_zero(CgTestCtx*);
-void build_o04_global_addr_taken(CgTestCtx*);
-void build_o05_global_i64(CgTestCtx*);
-void build_o06_rodata_load(CgTestCtx*);
-void build_o07_global_struct_field(CgTestCtx*);
-void build_o08_global_array_runtime_idx(CgTestCtx*);
-void build_o09_static_local_linkage(CgTestCtx*);
-void build_o10_global_addend(CgTestCtx*);
-void build_o11_text_section_named(CgTestCtx*);
-void build_o12_global_across_call(CgTestCtx*);
-
-void build_p01_line_one_inst(CgTestCtx*);
-void build_p02_line_monotone(CgTestCtx*);
-void build_p03_line_repeat(CgTestCtx*);
-void build_p05_func_pc_range(CgTestCtx*);
-void build_p07_local_loc(CgTestCtx*);
-
-void build_q01_three_helpers(CgTestCtx*);
-void build_q02_static_internal_linkage(CgTestCtx*);
-void build_q03_intra_tu_call_chain(CgTestCtx*);
-void build_q04_eight_helpers(CgTestCtx*);
-void build_q05_distinct_signatures(CgTestCtx*);
-void build_q06_function_section_distinct(CgTestCtx*);
-void build_q07_cross_section_calls(CgTestCtx*);
-void build_q08_forward_decl_define_late(CgTestCtx*);
-void build_q09_helper_calls_helper(CgTestCtx*);
-void build_q10_global_and_static_mix(CgTestCtx*);
-void build_q11_addr_of_helper_through_global(CgTestCtx*);
-void build_asm01_mov_imm_out(CgTestCtx*);
-void build_asm02_copy_input(CgTestCtx*);
-void build_asm03_named_clobber(CgTestCtx*);
-
-/* ---- registry ---- */
-
-const CgCase cg_cases[] = {
-    /* MC-only */
-    {"mc_smoke", build_mc_smoke, 42, CG_CASE_MC_ONLY, CG_ARCH_AARCH64},
-
-    /* Group A — function lifecycle and return */
-    {"a01_return_const_42", build_a01_return_const_42, 42, CG_CASE_DEFAULT},
-    {"a02_return_zero", build_a02_return_zero, 0, CG_CASE_DEFAULT},
-    {"a03_ret_imm", build_a03_ret_imm, 17, CG_CASE_DEFAULT},
-    {"a04_copy_reg", build_a04_copy_reg, 7, CG_CASE_DEFAULT},
-    {"a05_return_neg_small", build_a05_return_neg_small, 249, CG_CASE_DEFAULT},
-    {"a06_return_i64", build_a06_return_i64, 42, CG_CASE_DEFAULT},
-    {"a07_void_return", build_a07_void_return, 0, CG_CASE_DEFAULT},
-    {"a08_multiple_returns", build_a08_multiple_returns, 1, CG_CASE_DEFAULT},
-    {"a09_load_imm_movz_movk", build_a09_load_imm_movz_movk, 205,
-     CG_CASE_DEFAULT},
-    {"a10_return_u8", build_a10_return_u8, 200, CG_CASE_DEFAULT},
-
-    /* Group B — frame slots, parameters, locals */
-    {"b01_param_int", build_b01_param_int, 201, CG_CASE_DEFAULT},
-    {"b02_param_sum", build_b02_param_sum, 42, CG_CASE_DEFAULT},
-    {"b03_param_spill", build_b03_param_spill, 45, CG_CASE_DEFAULT},
-    {"b04_local_int", build_b04_local_int, 42, CG_CASE_DEFAULT},
-    {"b05_addr_taken_local", build_b05_addr_taken_local, 18, CG_CASE_DEFAULT},
-    {"b06_sret", build_b06_sret, 42, CG_CASE_DEFAULT},
-    {"b07_byval_param", build_b07_byval_param, 42, CG_CASE_DEFAULT},
-    {"b08_fp_param", build_b08_fp_param, 7, CG_CASE_DEFAULT},
-
-    /* Group C — integer arithmetic */
-    {"c01_add", build_c01_add, 3, CG_CASE_DEFAULT},
-    {"c02_sub_mul", build_c02_sub_mul, 17, CG_CASE_DEFAULT},
-    {"c03_bitwise", build_c03_bitwise, 252, CG_CASE_DEFAULT},
-    {"c04_shift", build_c04_shift, 40, CG_CASE_DEFAULT},
-    {"c05_div_mod", build_c05_div_mod, 8, CG_CASE_DEFAULT},
-    {"c06_xor", build_c06_xor, 255, CG_CASE_DEFAULT},
-    {"c07_iadd_i64", build_c07_iadd_i64, 42, CG_CASE_DEFAULT},
-    {"c08_unsigned_div", build_c08_unsigned_div, 14, CG_CASE_DEFAULT},
-    {"c09_neg", build_c09_neg, 214, CG_CASE_DEFAULT},
-    {"c10_logical_not", build_c10_logical_not, 1, CG_CASE_DEFAULT},
-    {"c11_shr_signed", build_c11_shr_signed, 252, CG_CASE_DEFAULT},
-    {"c12_imul_i64", build_c12_imul_i64, 42, CG_CASE_DEFAULT},
-
-    /* Group D — compare and branch */
-    {"d01_cmp_eq_true", build_d01_cmp_eq_true, 1, CG_CASE_DEFAULT},
-    {"d02_cmp_eq_false", build_d02_cmp_eq_false, 0, CG_CASE_DEFAULT},
-    {"d03_cmp_ne", build_d03_cmp_ne, 1, CG_CASE_DEFAULT},
-    {"d04_cmp_lt_signed", build_d04_cmp_lt_signed, 1, CG_CASE_DEFAULT},
-    {"d05_cmp_lt_unsigned", build_d05_cmp_lt_unsigned, 0, CG_CASE_DEFAULT},
-    {"d06_cmp_ge_signed", build_d06_cmp_ge_signed, 1, CG_CASE_DEFAULT},
-    {"d07_cmp_branch_taken", build_d07_cmp_branch_taken, 42, CG_CASE_DEFAULT},
-    {"d08_cmp_branch_not_taken", build_d08_cmp_branch_not_taken, 33,
-     CG_CASE_DEFAULT},
-    {"d09_cmp_branch_lt_signed", build_d09_cmp_branch_lt_signed, 9,
-     CG_CASE_DEFAULT},
-    {"d10_jump", build_d10_jump, 5, CG_CASE_DEFAULT},
-    {"d11_scope_if_true", build_d11_scope_if_true, 33, CG_CASE_DEFAULT},
-    {"d12_scope_if_false", build_d12_scope_if_false, 99, CG_CASE_DEFAULT},
-    {"d13_scope_if_else", build_d13_scope_if_else, 7, CG_CASE_DEFAULT},
-
-    /* Group E — conversions */
-    {"e01_sext_i8_i32", build_e01_sext_i8_i32, 255, CG_CASE_DEFAULT},
-    {"e02_zext_u8_i32", build_e02_zext_u8_i32, 255, CG_CASE_DEFAULT},
-    {"e03_sext_i16_i32", build_e03_sext_i16_i32, 24, CG_CASE_DEFAULT},
-    {"e04_zext_u16_i32", build_e04_zext_u16_i32, 205, CG_CASE_DEFAULT},
-    {"e05_zext_u32_i64", build_e05_zext_u32_i64, 255, CG_CASE_DEFAULT},
-    {"e06_sext_i32_i64", build_e06_sext_i32_i64, 255, CG_CASE_DEFAULT},
-    {"e07_trunc_i64_i32", build_e07_trunc_i64_i32, 128, CG_CASE_DEFAULT},
-    {"e08_trunc_i32_i8", build_e08_trunc_i32_i8, 255, CG_CASE_DEFAULT},
-    {"e09_itof_s_i32_f32", build_e09_itof_s_i32_f32, 7, CG_CASE_DEFAULT},
-    {"e10_itof_u_u32_f64", build_e10_itof_u_u32_f64, 100, CG_CASE_DEFAULT},
-    {"e11_ftoi_s_neg", build_e11_ftoi_s_neg, 255, CG_CASE_DEFAULT},
-    {"e12_ftoi_u_pos", build_e12_ftoi_u_pos, 200, CG_CASE_DEFAULT},
-    {"e13_fext_f32_f64", build_e13_fext_f32_f64, 3, CG_CASE_DEFAULT},
-    {"e14_ftrunc_f64_f32", build_e14_ftrunc_f64_f32, 7, CG_CASE_DEFAULT},
-    {"e15_bitcast_i32_f32", build_e15_bitcast_i32_f32, 5, CG_CASE_DEFAULT},
-
-    /* Group F — memory (loads/stores beyond locals) */
-    {"f01_load_store_i8", build_f01_load_store_i8, 200, CG_CASE_DEFAULT},
-    {"f02_load_store_i16", build_f02_load_store_i16, 52, CG_CASE_DEFAULT},
-    {"f03_load_store_i64", build_f03_load_store_i64, 66, CG_CASE_DEFAULT},
-    {"f04_load_store_f32", build_f04_load_store_f32, 7, CG_CASE_DEFAULT},
-    {"f05_load_store_f64", build_f05_load_store_f64, 3, CG_CASE_DEFAULT},
-    {"f06_indirect_nonzero_offset", build_f06_indirect_nonzero_offset, 42,
-     CG_CASE_DEFAULT},
-    {"f07_store_reg", build_f07_store_reg, 17, CG_CASE_DEFAULT},
-    {"f08_copy_bytes", build_f08_copy_bytes, 42, CG_CASE_DEFAULT},
-    {"f09_set_bytes_zero", build_f09_set_bytes_zero, 0, CG_CASE_DEFAULT},
-    {"f10_set_bytes_ff", build_f10_set_bytes_ff, 255, CG_CASE_DEFAULT},
-    {"f11_volatile_rw", build_f11_volatile_rw, 42, CG_CASE_DEFAULT},
-    {"f12_bitfield_unsigned", build_f12_bitfield_unsigned, 21, CG_CASE_DEFAULT},
-    {"f13_bitfield_signed", build_f13_bitfield_signed, 255, CG_CASE_DEFAULT},
-
-    /* Group G — calls (beyond direct-call path) */
-    {"g01_indirect_call", build_g01_indirect_call, 42, CG_CASE_DEFAULT},
-    {"g02_recursion_factorial", build_g02_recursion_factorial, 120,
-     CG_CASE_DEFAULT},
-    {"g03_recursion_fib", build_g03_recursion_fib, 55, CG_CASE_DEFAULT},
-    {"g04_mutual_recursion", build_g04_mutual_recursion, 1, CG_CASE_DEFAULT},
-    {"g05_chained_calls", build_g05_chained_calls, 42, CG_CASE_DEFAULT},
-    {"g06_mixed_int_fp_params", build_g06_mixed_int_fp_params, 42,
-     CG_CASE_DEFAULT},
-    {"g07_void_call_outparam", build_g07_void_call_outparam, 42,
-     CG_CASE_DEFAULT},
-    {"g08_large_struct_byval", build_g08_large_struct_byval, 42,
-     CG_CASE_DEFAULT},
-    {"g09_hfa_param_f32x2", build_g09_hfa_param_f32x2, 3, CG_CASE_DEFAULT},
-    {"g10_hfa_return_f32x2", build_g10_hfa_return_f32x2, 3, CG_CASE_DEFAULT},
-    {"g11_caller_saved_live_across_call",
-     build_g11_caller_saved_live_across_call, 42, CG_CASE_DEFAULT},
-    {"g12_addr_taken_local_across_call", build_g12_addr_taken_local_across_call,
-     18, CG_CASE_DEFAULT},
-    {"g13_call_in_loop_induction", build_g13_call_in_loop_induction, 45,
-     CG_CASE_DEFAULT},
-
-    /* Group H — control flow */
-    {"h01_while_sum_0_to_9", build_h01_while_sum_0_to_9, 45, CG_CASE_DEFAULT},
-    {"h02_do_while_once", build_h02_do_while_once, 42, CG_CASE_DEFAULT},
-    {"h03_for_count_to_10", build_h03_for_count_to_10, 55, CG_CASE_DEFAULT},
-    {"h04_loop_break", build_h04_loop_break, 42, CG_CASE_DEFAULT},
-    {"h05_loop_continue", build_h05_loop_continue, 90, CG_CASE_DEFAULT},
-    {"h06_nested_loops", build_h06_nested_loops, 6, CG_CASE_DEFAULT},
-    {"h07_break_inner_only", build_h07_break_inner_only, 9, CG_CASE_DEFAULT},
-    {"h08_early_return_in_loop", build_h08_early_return_in_loop, 17,
-     CG_CASE_DEFAULT},
-    {"h09_switch_three_cases", build_h09_switch_three_cases, 42,
-     CG_CASE_DEFAULT},
-    {"h10_switch_fallthrough", build_h10_switch_fallthrough, 30,
-     CG_CASE_DEFAULT},
-    {"h11_switch_default", build_h11_switch_default, 7, CG_CASE_DEFAULT},
-    {"h12_jump_forward", build_h12_jump_forward, 42, CG_CASE_DEFAULT},
-    {"h13_jump_backward", build_h13_jump_backward, 10, CG_CASE_DEFAULT},
-    {"h14_short_circuit_and_skip", build_h14_short_circuit_and_skip, 0,
-     CG_CASE_DEFAULT},
-    {"h15_short_circuit_or_skip", build_h15_short_circuit_or_skip, 0,
-     CG_CASE_DEFAULT},
-    {"h16_ternary", build_h16_ternary, 42, CG_CASE_DEFAULT},
-    {"h17_ternary_side_effect_one_arm", build_h17_ternary_side_effect_one_arm,
-     42, CG_CASE_DEFAULT},
-    {"h18_unreachable_after_ret", build_h18_unreachable_after_ret, 42,
-     CG_CASE_DEFAULT},
-
-    /* Group I — alloca / VLA */
-    {"i01_alloca_const_int", build_i01_alloca_const_int, 42, CG_CASE_DEFAULT},
-    {"i02_alloca_runtime_size", build_i02_alloca_runtime_size, 15,
-     CG_CASE_DEFAULT},
-    {"i03_alloca_align_16", build_i03_alloca_align_16, 1, CG_CASE_DEFAULT},
-    {"i04_alloca_in_loop_distinct", build_i04_alloca_in_loop_distinct, 1,
-     CG_CASE_DEFAULT},
-    {"i05_alloca_then_call", build_i05_alloca_then_call, 42, CG_CASE_DEFAULT},
-    {"i06_two_allocas_disjoint", build_i06_two_allocas_disjoint, 3,
-     CG_CASE_DEFAULT},
-    {"i07_alloca_addr_escapes", build_i07_alloca_addr_escapes, 42,
-     CG_CASE_DEFAULT},
-    {"i08_vla_param_sum", build_i08_vla_param_sum, 45, CG_CASE_DEFAULT},
-    {"i09_alloca_preserves_locals", build_i09_alloca_preserves_locals, 42,
-     CG_CASE_DEFAULT},
-    {"i10_alloca_after_named_local", build_i10_alloca_after_named_local, 42,
-     CG_CASE_DEFAULT},
-
-    /* Group J — varargs */
-    {"j01_va_int_sum_3", build_j01_va_int_sum_3, 6, CG_CASE_DEFAULT},
-    {"j02_va_zero_args", build_j02_va_zero_args, 0, CG_CASE_DEFAULT},
-    {"j03_va_int_spill", build_j03_va_int_spill, 55, CG_CASE_DEFAULT},
-    {"j04_va_int64", build_j04_va_int64, 42, CG_CASE_DEFAULT},
-    {"j05_va_double_sum", build_j05_va_double_sum, 7, CG_CASE_DEFAULT},
-    {"j06_va_double_spill", build_j06_va_double_spill, 4, CG_CASE_DEFAULT},
-    {"j07_va_mixed_int_dbl", build_j07_va_mixed_int_dbl, 42, CG_CASE_DEFAULT},
-    {"j08_va_copy", build_j08_va_copy, 42, CG_CASE_DEFAULT},
-    {"j09_va_two_fixed", build_j09_va_two_fixed, 42, CG_CASE_DEFAULT},
-
-    /* Group K — atomics */
-    {"k01_atomic_load_relaxed", build_k01_atomic_load_relaxed, 42,
-     CG_CASE_DEFAULT},
-    {"k02_atomic_store_load_acq", build_k02_atomic_store_load_acq, 42,
-     CG_CASE_DEFAULT},
-    {"k03_atomic_load_seq_cst", build_k03_atomic_load_seq_cst, 42,
-     CG_CASE_DEFAULT},
-    {"k04_atomic_rmw_add", build_k04_atomic_rmw_add, 42, CG_CASE_DEFAULT},
-    {"k05_atomic_rmw_xchg", build_k05_atomic_rmw_xchg, 42, CG_CASE_DEFAULT},
-    {"k06_atomic_rmw_and", build_k06_atomic_rmw_and, 42, CG_CASE_DEFAULT},
-    {"k07_atomic_rmw_or", build_k07_atomic_rmw_or, 42, CG_CASE_DEFAULT},
-    {"k08_atomic_rmw_xor", build_k08_atomic_rmw_xor, 42, CG_CASE_DEFAULT},
-    {"k09_atomic_rmw_sub", build_k09_atomic_rmw_sub, 42, CG_CASE_DEFAULT},
-    {"k10_atomic_rmw_nand", build_k10_atomic_rmw_nand, 42, CG_CASE_DEFAULT},
-    {"k11_atomic_cas_success", build_k11_atomic_cas_success, 42,
-     CG_CASE_DEFAULT},
-    {"k12_atomic_cas_failure", build_k12_atomic_cas_failure, 10,
-     CG_CASE_DEFAULT},
-    {"k13_atomic_load_i64", build_k13_atomic_load_i64, 42, CG_CASE_DEFAULT},
-    {"k14_atomic_rmw_prior", build_k14_atomic_rmw_prior, 40, CG_CASE_DEFAULT},
-    {"k15_fence_seq_cst", build_k15_fence_seq_cst, 42, CG_CASE_DEFAULT},
-
-    /* Group L — intrinsics */
-    {"l01_popcount_u32", build_l01_popcount_u32, 8, CG_CASE_DEFAULT},
-    {"l02_popcount_u64", build_l02_popcount_u64, 64, CG_CASE_DEFAULT},
-    {"l03_ctz_u32", build_l03_ctz_u32, 7, CG_CASE_DEFAULT},
-    {"l04_clz_u32", build_l04_clz_u32, 24, CG_CASE_DEFAULT},
-    {"l05_bswap16", build_l05_bswap16, 18, CG_CASE_DEFAULT},
-    {"l06_bswap32", build_l06_bswap32, 17, CG_CASE_DEFAULT},
-    {"l07_bswap64", build_l07_bswap64, 17, CG_CASE_DEFAULT},
-    {"l08_memcpy_4", build_l08_memcpy_4, 42, CG_CASE_DEFAULT},
-    {"l09_memmove_overlap", build_l09_memmove_overlap, 4, CG_CASE_DEFAULT},
-    {"l10_memset_zero", build_l10_memset_zero, 0, CG_CASE_DEFAULT},
-    {"l11_memset_ff", build_l11_memset_ff, 255, CG_CASE_DEFAULT},
-    {"l12_expect_taken", build_l12_expect_taken, 42, CG_CASE_DEFAULT},
-    {"l13_unreachable_live", build_l13_unreachable_live, 42, CG_CASE_DEFAULT},
-    {"l14_trap_live", build_l14_trap_live, 42, CG_CASE_DEFAULT},
-    {"l15_prefetch_noop", build_l15_prefetch_noop, 42, CG_CASE_DEFAULT},
-    {"l16_assume_aligned", build_l16_assume_aligned, 42, CG_CASE_DEFAULT},
-    {"l17_add_overflow_no", build_l17_add_overflow_no, 42, CG_CASE_DEFAULT},
-    {"l18_add_overflow_yes", build_l18_add_overflow_yes, 1, CG_CASE_DEFAULT},
-    {"l19_sub_overflow_yes", build_l19_sub_overflow_yes, 1, CG_CASE_DEFAULT},
-    {"l20_mul_overflow_no", build_l20_mul_overflow_no, 42, CG_CASE_DEFAULT},
-
-    /* Group N — TLS */
-    {"n01_tls_load_le", build_n01_tls_load_le, 42, CG_CASE_DEFAULT},
-    {"n02_tls_store_le", build_n02_tls_store_le, 42, CG_CASE_DEFAULT},
-    {"n03_tls_addr_taken", build_n03_tls_addr_taken, 18, CG_CASE_DEFAULT},
-    {"n04_tls_i64", build_n04_tls_i64, 42, CG_CASE_DEFAULT},
-    {"n05_tls_in_loop", build_n05_tls_in_loop, 10, CG_CASE_DEFAULT},
-    {"n06_tls_two_vars", build_n06_tls_two_vars, 42, CG_CASE_DEFAULT},
-    {"n07_tls_bss_zero_init", build_n07_tls_bss_zero_init, 0, CG_CASE_DEFAULT},
-    {"n08_tls_addend_offset", build_n08_tls_addend_offset, 42, CG_CASE_DEFAULT},
-
-    /* Group O — sections and globals */
-    {"o01_global_load_data", build_o01_global_load_data, 42, CG_CASE_DEFAULT},
-    {"o02_global_store_data", build_o02_global_store_data, 42, CG_CASE_DEFAULT},
-    {"o03_global_bss_zero", build_o03_global_bss_zero, 0, CG_CASE_DEFAULT},
-    {"o04_global_addr_taken", build_o04_global_addr_taken, 18, CG_CASE_DEFAULT},
-    {"o05_global_i64", build_o05_global_i64, 42, CG_CASE_DEFAULT},
-    {"o06_rodata_load", build_o06_rodata_load, 42, CG_CASE_DEFAULT},
-    {"o07_global_struct_field", build_o07_global_struct_field, 42,
-     CG_CASE_DEFAULT},
-    {"o08_global_array_runtime_idx", build_o08_global_array_runtime_idx, 3,
-     CG_CASE_DEFAULT},
-    {"o09_static_local_linkage", build_o09_static_local_linkage, 42,
-     CG_CASE_DEFAULT},
-    {"o10_global_addend", build_o10_global_addend, 42, CG_CASE_DEFAULT},
-    {"o11_text_section_named", build_o11_text_section_named, 42,
-     CG_CASE_DEFAULT},
-    {"o12_global_across_call", build_o12_global_across_call, 42,
-     CG_CASE_DEFAULT},
-
-    /* Group P — set_loc / debug. The exit-code oracle (D/E/J) is 42; the
-     * W path checks the line program. See cases_p.c for the contract.
-     * Phase-1 producer + Phase-2 consumer make p01..p05 viable; p07
-     * additionally needs Phase-3 (debug_local). */
-    {"p01_line_one_inst", build_p01_line_one_inst, 42, CG_CASE_DEFAULT},
-    {"p02_line_monotone", build_p02_line_monotone, 42, CG_CASE_DEFAULT},
-    {"p03_line_repeat", build_p03_line_repeat, 42, CG_CASE_DEFAULT},
-    {"p05_func_pc_range", build_p05_func_pc_range, 42, CG_CASE_DEFAULT},
-    {"p07_local_loc", build_p07_local_loc, 42, CG_CASE_DEFAULT},
-
-    /* Group Q — multi-function */
-    {"q01_three_helpers", build_q01_three_helpers, 42, CG_CASE_DEFAULT},
-    {"q02_static_internal_linkage", build_q02_static_internal_linkage, 42,
-     CG_CASE_DEFAULT},
-    {"q03_intra_tu_call_chain", build_q03_intra_tu_call_chain, 42,
-     CG_CASE_DEFAULT},
-    {"q04_eight_helpers", build_q04_eight_helpers, 42, CG_CASE_DEFAULT},
-    {"q05_distinct_signatures", build_q05_distinct_signatures, 42,
-     CG_CASE_DEFAULT},
-    {"q06_function_section_distinct", build_q06_function_section_distinct, 42,
-     CG_CASE_DEFAULT},
-    {"q07_cross_section_calls", build_q07_cross_section_calls, 42,
-     CG_CASE_DEFAULT},
-    {"q08_forward_decl_define_late", build_q08_forward_decl_define_late, 42,
-     CG_CASE_DEFAULT},
-    {"q09_helper_calls_helper", build_q09_helper_calls_helper, 42,
-     CG_CASE_DEFAULT},
-    {"q10_global_and_static_mix", build_q10_global_and_static_mix, 42,
-     CG_CASE_DEFAULT},
-    {"q11_addr_of_helper_through_global",
-     build_q11_addr_of_helper_through_global, 42, CG_CASE_DEFAULT},
-
-    /* Group ASM — inline asm */
-    {"asm01_mov_imm_out", build_asm01_mov_imm_out, 42, CG_CASE_DEFAULT,
-     CG_ARCH_AARCH64},
-    {"asm02_copy_input", build_asm02_copy_input, 7, CG_CASE_DEFAULT,
-     CG_ARCH_AARCH64},
-    {"asm03_named_clobber", build_asm03_named_clobber, 99, CG_CASE_DEFAULT,
-     CG_ARCH_AARCH64},
-};
-
-const unsigned cg_cases_count = sizeof(cg_cases) / sizeof(cg_cases[0]);
-
-/* ---- DWARF check registry (path W) ----
- * Only Group P has entries today. See cg_test.h for the directive
- * vocabulary. */
-const CgDwarfCheck cg_dwarf_checks[] = {
-    {"p01_line_one_inst",
-     "subprogram test_main\n"
-     "line p01.c 10\n"},
-    /* p02 — three statements, three line rows (monotone). */
-    {"p02_line_monotone",
-     "subprogram test_main\n"
-     "line p02.c 1\n"
-     "line p02.c 2\n"
-     "line p02.c 3\n"},
-    /* p03 — same line repeated on two distinct PCs; one round-trip is
-     * enough to assert the binding survives. */
-    {"p03_line_repeat",
-     "subprogram test_main\n"
-     "line p03.c 7\n"},
-    /* p05 — function pc range. test_main is a tiny prologue + load_imm +
-     * ret + epilogue; the AArch64 prologue+epilogue alone are ~7 words
-     * (28 bytes), so the function size easily exceeds 16 bytes and is
-     * comfortably under 256 bytes. */
-    {"p05_func_pc_range",
-     "subprogram test_main\n"
-     "line p05.c 11\n"
-     "pc_range p05.c 11 16 256\n"},
-    /* p07 — local variable location. The decl-info pipeline (debug_local)
-     * is Phase 3; until that lands the var directive will fail and the
-     * line/subprogram directives keep us honest about what is wired. */
-    {"p07_local_loc",
-     "subprogram test_main\n"
-     "line p07.c 5\n"
-     "var 0x0 my_local frame *\n"},
-};
-
-const unsigned cg_dwarf_checks_count =
-    sizeof(cg_dwarf_checks) / sizeof(cg_dwarf_checks[0]);
diff --git a/test/cg/harness/cases_a.c b/test/cg/harness/cases_a.c
@@ -1,112 +0,0 @@
-/* Group A — function lifecycle and return.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group A: function lifecycle and return
- * ============================================================ */
-
-/* a01_return_const_42 — alloc reg, load_imm(42), ret reg. */
-void build_a01_return_const_42(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 42);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* a02_return_zero — load_imm(0). */
-void build_a02_return_zero(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 0);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* a03_ret_imm — backend materializes the imm directly inside ret(). */
-void build_a03_ret_imm(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  cgtest_ret_imm(tf, 17, I32);
-  cgtest_end(tf);
-}
-
-/* a04_copy_reg — load_imm(7) into r1, copy r1->r2, ret r2. */
-void build_a04_copy_reg(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r1 = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  Reg r2 = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(r1, I32), 7);
-  ctx->target->copy(ctx->target, REG_op(r2, I32), REG_op(r1, I32));
-  cgtest_ret_reg(tf, r2, I32);
-  cgtest_end(tf);
-}
-
-/* a05_return_neg_small — load_imm(-7), ret. Backend should select MOVN.
- * Exit code = -7 & 0xff = 249. */
-void build_a05_return_neg_small(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), -7);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* a06_return_i64 — i64 load_imm with high bits set, return as i64.
- * test_main is cast to int(*)(void) by the runner, which reads w0
- * (low 32 of x0). Value 0x100000002A → low 32 = 42. */
-void build_a06_return_i64(CgTestCtx* ctx) {
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I64);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I64);
-  ctx->target->load_imm(ctx->target, REG_op(r, I64), 0x100000002Aull);
-  cgtest_ret_reg(tf, r, I64);
-  cgtest_end(tf);
-}
-
-/* a07_void_return — function returns void. The harness wrapper (start.c
- * for path E; the C compiler for path D/J) zeroes x0 before the call, so
- * the observed exit code is 0. */
-void build_a07_void_return(CgTestCtx* ctx) {
-  CgTestFn* tf = cgtest_begin_main(ctx, T_void(ctx));
-  cgtest_ret_void(tf);
-  cgtest_end(tf);
-}
-
-/* a08_multiple_returns — two consecutive ret() calls in straight-line
- * code. The first is taken; the second is dead. Exercises that the
- * backend can emit more than one return-shaped sequence per function. */
-void build_a08_multiple_returns(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  cgtest_ret_imm(tf, 1, I32);
-  cgtest_ret_imm(tf, 2, I32); /* unreachable */
-  cgtest_end(tf);
-}
-
-/* a09_load_imm_movz_movk — value that requires multi-step materialization
- * on AArch64 (intended; 0xABCD itself has high 16 == 0). Exit 0xCD = 205.
-void build_a09_load_imm_movz_movk(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 0xABCD);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* a10_return_u8 — narrow integer return. Value 200 fits in u8 → 200. */
-void build_a10_return_u8(CgTestCtx* ctx) {
-  const Type* U8 = T_u8(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, U8);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, U8);
-  ctx->target->load_imm(ctx->target, REG_op(r, U8), 200);
-  cgtest_ret_reg(tf, r, U8);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_asm.c b/test/cg/harness/cases_asm.c
@@ -1,101 +0,0 @@
-/* Group ASM — inline-asm smoke cases.
- *
- * Each builder drives ctx->target->asm_block directly with hand-built
- * AsmConstraint / Operand arrays, then returns the asm output through
- * cgtest_ret_reg. At opt-level 0 the call lands in aa_asm_block; at
- * opt-level >0 it is captured as IR_ASM_BLOCK by w_asm_block and replayed
- * to the real backend at lowering time. Both paths must produce the
- * same exit code.
- *
- * See CORPUS.md for the case list and expected values. */
-
-#include <string.h>
-
-#include "cg_test.h"
-
-/* asm01_mov_imm_out — `__asm__("mov %w0, #42" : "=r"(rc));` then return rc. */
-void build_asm01_mov_imm_out(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  AsmConstraint outs[1];
-  memset(outs, 0, sizeof outs);
-  outs[0].str = "=r";
-  outs[0].dir = ASM_OUT;
-  outs[0].type = I32;
-
-  Operand out_ops[1];
-  out_ops[0] = REG_op(r, I32);
-
-  ctx->target->asm_block(ctx->target, "mov %w0, #42",
-                         outs, /*nout=*/1, out_ops,
-                         /*ins=*/NULL, /*nin=*/0, /*in_ops=*/NULL,
-                         /*clobbers=*/NULL, /*nclob=*/0);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* asm02_copy_input — `__asm__("mov %w0, %w1" : "=r"(rc) : "r"(7));` */
-void build_asm02_copy_input(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg rin = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(rin, I32), 7);
-
-  Reg rout = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  AsmConstraint outs[1];
-  memset(outs, 0, sizeof outs);
-  outs[0].str = "=r";
-  outs[0].dir = ASM_OUT;
-  outs[0].type = I32;
-  Operand out_ops[1];
-  out_ops[0] = REG_op(rout, I32);
-
-  AsmConstraint ins[1];
-  memset(ins, 0, sizeof ins);
-  ins[0].str = "r";
-  ins[0].dir = ASM_IN;
-  ins[0].type = I32;
-  Operand in_ops[1];
-  in_ops[0] = REG_op(rin, I32);
-
-  ctx->target->asm_block(ctx->target, "mov %w0, %w1",
-                         outs, /*nout=*/1, out_ops,
-                         ins, /*nin=*/1, in_ops,
-                         /*clobbers=*/NULL, /*nclob=*/0);
-  ctx->target->free_reg(ctx->target, rin, RC_INT);
-  cgtest_ret_reg(tf, rout, I32);
-  cgtest_end(tf);
-}
-
-/* asm03_named_clobber — clobber x19 (callee-saved) without using it. The
- * func prologue must save/restore x19 even though no SValue ever bound
- * it; this is the hwm-bump path in aa_asm_block. The asm body trashes
- * x19 (mov x19, #1) and returns a constant; correct exit code
- * indicates x19 was preserved by the prologue. */
-void build_asm03_named_clobber(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  AsmConstraint outs[1];
-  memset(outs, 0, sizeof outs);
-  outs[0].str = "=r";
-  outs[0].dir = ASM_OUT;
-  outs[0].type = I32;
-  Operand out_ops[1];
-  out_ops[0] = REG_op(r, I32);
-
-  Sym clobs[1];
-  clobs[0] = pool_intern_cstr(ctx->pool, "x19");
-
-  ctx->target->asm_block(ctx->target,
-                         "mov x19, #1; mov %w0, #99",
-                         outs, /*nout=*/1, out_ops,
-                         /*ins=*/NULL, /*nin=*/0, /*in_ops=*/NULL,
-                         clobs, /*nclob=*/1);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_b.c b/test/cg/harness/cases_b.c
@@ -1,315 +0,0 @@
-/* Group B — frame slots, parameters, locals.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cases_shared.h"
-#include "cg_test.h"
-
-/* ============================================================
- * Group B: frame slots, parameters, locals
- *
- * Several cases here define a helper function and have test_main call
- * it. The helper exercises param wiring; test_main exercises the call
- * sequence. Both share one CGTarget instance — the backend must support
- * multiple func_begin/func_end pairs per TU.
- * ============================================================ */
-
-/* helper used by b01: int echo(int x) { return x; } */
-static ObjSymId build_b01_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "b01_echo", I32, params, 1);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), cgtest_param_slot(tf, 0), I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* b01_param_int — echo(201) → 201. */
-void build_b01_param_int(CgTestCtx* ctx) {
-  ObjSymId echo = build_b01_helper(ctx);
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 201}};
-  cgtest_call(tf, echo, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by b02: int sum2(int a, int b) { return a + b; } */
-static ObjSymId build_b02_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32, I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "b02_sum2", I32, params, 2);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(a, I32), cgtest_param_slot(tf, 0), I32);
-  cgtest_load_local(tf, REG_op(b, I32), cgtest_param_slot(tf, 1), I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* b02_param_sum — sum2(40, 2) → 42. */
-void build_b02_param_sum(CgTestCtx* ctx) {
-  ObjSymId sum2 = build_b02_helper(ctx);
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32, I32};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 40},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 2},
-  };
-  cgtest_call(tf, sum2, I32, params, args, 2, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by b03: int sum9(a..i) { return a+b+c+d+e+f+g+h+i; }
- * Nine int parameters force at least one to spill onto the stack on
- * AArch64 under AAPCS64 (8 GP arg registers). */
-static ObjSymId build_b03_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* p[9] = {I32, I32, I32, I32, I32, I32, I32, I32, I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "b03_sum9", I32, p, 9);
-  CGTarget* T = ctx->target;
-  Reg accum = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(accum, I32), 0);
-  for (u32 i = 0; i < 9; ++i) {
-    Reg t = T->alloc_reg(T, RC_INT, I32);
-    cgtest_load_local(tf, REG_op(t, I32), cgtest_param_slot(tf, i), I32);
-    T->binop(T, BO_IADD, REG_op(accum, I32), REG_op(accum, I32),
-             REG_op(t, I32));
-  }
-  cgtest_ret_reg(tf, accum, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* b03_param_spill — sum9(1..9) = 45. */
-void build_b03_param_spill(CgTestCtx* ctx) {
-  ObjSymId sum9 = build_b03_helper(ctx);
-  const Type* I32 = T_i32(ctx);
-  const Type* params[9] = {I32, I32, I32, I32, I32, I32, I32, I32, I32};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  CgTestArg args[9];
-  for (int i = 0; i < 9; ++i) {
-    args[i] = (CgTestArg){.kind = CGT_ARG_IMM, .type = I32, .v.imm = i + 1};
-  }
-  cgtest_call(tf, sum9, I32, params, args, 9, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* b04_local_int — alloc local int, store 42, load it back, return. */
-void build_b04_local_int(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, s, IMM_op(42, I32), I32);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), s, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* b05_addr_taken_local — addr_of forces frame residency. Take address,
- * store via INDIRECT, load via INDIRECT, return.
- *   int x;        store-to-slot 17
- *   int* p = &x;
- *   *p = *p + 1;  → 18
- *   return *p; */
-void build_b05_addr_taken_local(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  cgtest_store_local(tf, x, IMM_op(17, I32), I32);
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(p, T_ptr(ctx, I32)), LOCAL_op(x, I32));
-
-  /* val = *p; val += 1; *p = val; return val; */
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->load(T, REG_op(val, I32), IND_op(p, 0, I32), ma);
-  T->binop(T, BO_IADD, REG_op(val, I32), REG_op(val, I32), IMM_op(1, I32));
-  T->store(T, IND_op(p, 0, I32), REG_op(val, I32), ma);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(out, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by b06:
- *   struct Pt { int a; int b; };
- *   struct Pt mk(void) { return (struct Pt){10, 32}; }
- * On AArch64 SysV an 8-byte struct returns in x0 (or split across regs);
- * the harness leaves register placement to abi_func_info. The body
- * stores .a=10 and .b=32 into a local struct and returns it. */
-
-static ObjSymId build_b06_helper(CgTestCtx* ctx) {
-  const Type* PT = cases_pt_type(ctx);
-  CgTestFn* tf = cgtest_begin_func(ctx, "b06_mk", PT, NULL, 0);
-  CGTarget* T = ctx->target;
-
-  /* Build the struct in a local then return its address; the backend
-   * uses fn->abi_info->ret to decide whether to copy into the sret
-   * pointer (large struct) or load into the return regs (small). */
-  FrameSlot s = cgtest_local(tf, PT, FSF_NONE);
-  /* &s + 0 = .a, &s + 4 = .b */
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, PT));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, PT)), LOCAL_op(s, PT));
-  MemAccess ma_i32 = {
-      .type = T_i32(ctx), .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(base, 0, T_i32(ctx)), IMM_op(10, T_i32(ctx)), ma_i32);
-  T->store(T, IND_op(base, 4, T_i32(ctx)), IMM_op(32, T_i32(ctx)), ma_i32);
-
-  cgtest_ret_indirect(tf, s);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* b06_sret — pt = mk(); return pt.a + pt.b → 42. */
-void build_b06_sret(CgTestCtx* ctx) {
-  const Type* PT = cases_pt_type(ctx);
-  ObjSymId mk = build_b06_helper(ctx);
-  const Type* I32 = T_i32(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Caller provides a local for the sret destination. The harness
-   * passes its address as the ret_storage operand; the backend will
-   * either materialize an sret pointer (large) or unpack regs into
-   * the local (small) per the ABI. */
-  FrameSlot dst = cgtest_local(tf, PT, FSF_ADDR_TAKEN);
-  cgtest_call(tf, mk, PT, NULL, NULL, 0, LOCAL_op(dst, PT));
-
-  /* Load .a, .b, sum, return. */
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, PT));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, PT)), LOCAL_op(dst, PT));
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  Reg ra = T->alloc_reg(T, RC_INT, I32);
-  Reg rb = T->alloc_reg(T, RC_INT, I32);
-  Reg sum = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(ra, I32), IND_op(base, 0, I32), ma);
-  T->load(T, REG_op(rb, I32), IND_op(base, 4, I32), ma);
-  T->binop(T, BO_IADD, REG_op(sum, I32), REG_op(ra, I32), REG_op(rb, I32));
-  cgtest_ret_reg(tf, sum, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by b07:
- *   struct Pt { int a; int b; };
- *   int take(struct Pt p) { return p.a + p.b; }
- * Aggregate-by-value parameter. Caller builds a local Pt and passes its
- * address with byval semantics; the callee receives a local copy. */
-static ObjSymId build_b07_helper(CgTestCtx* ctx) {
-  const Type* PT = cases_pt_type(ctx);
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {PT};
-  CgTestFn* tf = cgtest_begin_func(ctx, "b07_take", I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  /* The param's home slot holds the byval copy. Compute its address
-   * and load .a, .b. */
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, PT));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, PT)),
-             LOCAL_op(cgtest_param_slot(tf, 0), PT));
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_PARAM};
-  Reg ra = T->alloc_reg(T, RC_INT, I32);
-  Reg rb = T->alloc_reg(T, RC_INT, I32);
-  Reg sum = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(ra, I32), IND_op(base, 0, I32), ma);
-  T->load(T, REG_op(rb, I32), IND_op(base, 4, I32), ma);
-  T->binop(T, BO_IADD, REG_op(sum, I32), REG_op(ra, I32), REG_op(rb, I32));
-  cgtest_ret_reg(tf, sum, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* b07_byval_param — take({.a=15, .b=27}) → 42. */
-void build_b07_byval_param(CgTestCtx* ctx) {
-  const Type* PT = cases_pt_type(ctx);
-  const Type* I32 = T_i32(ctx);
-  ObjSymId take = build_b07_helper(ctx);
-  const Type* params[] = {PT};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot src = cgtest_local(tf, PT, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, PT));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, PT)), LOCAL_op(src, PT));
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(base, 0, I32), IMM_op(15, I32), ma);
-  T->store(T, IND_op(base, 4, I32), IMM_op(27, I32), ma);
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_BYVAL_LOCAL, .type = PT, .v.slot = src}};
-  cgtest_call(tf, take, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by b08: int trunc(float f) { return (int)f; } */
-static ObjSymId build_b08_helper(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {F32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "b08_trunc", I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  Reg f = T->alloc_reg(T, RC_FP, F32);
-  cgtest_load_local(tf, REG_op(f, F32), cgtest_param_slot(tf, 0), F32);
-  Reg i = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(i, I32), REG_op(f, F32));
-  cgtest_ret_reg(tf, i, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* b08_fp_param — trunc(7.5f) → 7. The IMM operand encodes the float
- * bit-pattern; backend uses cls=RC_FP + type=float to materialize via
- * load_const or fmov literal. */
-void build_b08_fp_param(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* I32 = T_i32(ctx);
-  ObjSymId fn = build_b08_helper(ctx);
-  const Type* params[] = {F32};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Materialize 7.5f via load_const (immediate float). */
-  static const u8 BYTES_75F[4] = {0x00, 0x00, 0xF0, 0x40}; /* IEEE 7.5f LE */
-  Reg f = T->alloc_reg(T, RC_FP, F32);
-  ConstBytes cb = {.type = F32, .bytes = BYTES_75F, .size = 4, .align = 4};
-  T->load_const(T, REG_op(f, F32), cb);
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_REG, .type = F32, .v.reg = f}};
-  cgtest_call(tf, fn, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_c.c b/test/cg/harness/cases_c.c
@@ -1,204 +0,0 @@
-/* Group C — integer arithmetic.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group C: integer arithmetic
- * ============================================================ */
-
-/* c01_add — 1 + 2 = 3 */
-void build_c01_add(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(a, I32), 1);
-  T->load_imm(T, REG_op(b, I32), 2);
-  T->binop(T, BO_IADD, REG_op(d, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* c02_sub_mul — 7 * 3 - 4 = 17 */
-void build_c02_sub_mul(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r7 = T->alloc_reg(T, RC_INT, I32);
-  Reg r3 = T->alloc_reg(T, RC_INT, I32);
-  Reg r4 = T->alloc_reg(T, RC_INT, I32);
-  Reg rmul = T->alloc_reg(T, RC_INT, I32);
-  Reg rsub = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r7, I32), 7);
-  T->load_imm(T, REG_op(r3, I32), 3);
-  T->load_imm(T, REG_op(r4, I32), 4);
-  T->binop(T, BO_IMUL, REG_op(rmul, I32), REG_op(r7, I32), REG_op(r3, I32));
-  T->binop(T, BO_ISUB, REG_op(rsub, I32), REG_op(rmul, I32), REG_op(r4, I32));
-  cgtest_ret_reg(tf, rsub, I32);
-  cgtest_end(tf);
-}
-
-/* c03_bitwise — (~3) & 0xff = 252 */
-void build_c03_bitwise(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r3 = T->alloc_reg(T, RC_INT, I32);
-  Reg rinv = T->alloc_reg(T, RC_INT, I32);
-  Reg rmask = T->alloc_reg(T, RC_INT, I32);
-  Reg rand_ = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r3, I32), 3);
-  T->load_imm(T, REG_op(rmask, I32), 0xff);
-  T->unop(T, UO_BNOT, REG_op(rinv, I32), REG_op(r3, I32));
-  T->binop(T, BO_AND, REG_op(rand_, I32), REG_op(rinv, I32),
-           REG_op(rmask, I32));
-  cgtest_ret_reg(tf, rand_, I32);
-  cgtest_end(tf);
-}
-
-/* c04_shift — (1<<5) | (16>>1) = 40 */
-void build_c04_shift(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r1 = T->alloc_reg(T, RC_INT, I32);
-  Reg r5 = T->alloc_reg(T, RC_INT, I32);
-  Reg r16 = T->alloc_reg(T, RC_INT, I32);
-  Reg r1s = T->alloc_reg(T, RC_INT, I32);
-  Reg rshl = T->alloc_reg(T, RC_INT, I32);
-  Reg rshr = T->alloc_reg(T, RC_INT, I32);
-  Reg ror = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r1, I32), 1);
-  T->load_imm(T, REG_op(r5, I32), 5);
-  T->load_imm(T, REG_op(r16, I32), 16);
-  T->load_imm(T, REG_op(r1s, I32), 1);
-  T->binop(T, BO_SHL, REG_op(rshl, I32), REG_op(r1, I32), REG_op(r5, I32));
-  T->binop(T, BO_SHR_U, REG_op(rshr, I32), REG_op(r16, I32), REG_op(r1s, I32));
-  T->binop(T, BO_OR, REG_op(ror, I32), REG_op(rshl, I32), REG_op(rshr, I32));
-  cgtest_ret_reg(tf, ror, I32);
-  cgtest_end(tf);
-}
-
-/* c05_div_mod — 23/4 + 23%4 = 5 + 3 = 8 (signed) */
-void build_c05_div_mod(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r23 = T->alloc_reg(T, RC_INT, I32);
-  Reg r4 = T->alloc_reg(T, RC_INT, I32);
-  Reg rd = T->alloc_reg(T, RC_INT, I32);
-  Reg rm = T->alloc_reg(T, RC_INT, I32);
-  Reg rs = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r23, I32), 23);
-  T->load_imm(T, REG_op(r4, I32), 4);
-  T->binop(T, BO_SDIV, REG_op(rd, I32), REG_op(r23, I32), REG_op(r4, I32));
-  T->binop(T, BO_SREM, REG_op(rm, I32), REG_op(r23, I32), REG_op(r4, I32));
-  T->binop(T, BO_IADD, REG_op(rs, I32), REG_op(rd, I32), REG_op(rm, I32));
-  cgtest_ret_reg(tf, rs, I32);
-  cgtest_end(tf);
-}
-
-/* c06_xor — 0xa5 ^ 0x5a = 0xff = 255 */
-void build_c06_xor(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg ra = T->alloc_reg(T, RC_INT, I32);
-  Reg rb = T->alloc_reg(T, RC_INT, I32);
-  Reg rx = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(ra, I32), 0xa5);
-  T->load_imm(T, REG_op(rb, I32), 0x5a);
-  T->binop(T, BO_XOR, REG_op(rx, I32), REG_op(ra, I32), REG_op(rb, I32));
-  cgtest_ret_reg(tf, rx, I32);
-  cgtest_end(tf);
-}
-
-/* c07_iadd_i64 — i64 add. Two i64 values added; low 32 bits returned.
- * (1<<32 | 0x29) + (1<<32 | 0x01) = (2<<32 | 0x2A) → low 32 = 42. */
-void build_c07_iadd_i64(CgTestCtx* ctx) {
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I64);
-  CGTarget* T = ctx->target;
-  Reg ra = T->alloc_reg(T, RC_INT, I64);
-  Reg rb = T->alloc_reg(T, RC_INT, I64);
-  Reg rs = T->alloc_reg(T, RC_INT, I64);
-  T->load_imm(T, REG_op(ra, I64), 0x100000029ll);
-  T->load_imm(T, REG_op(rb, I64), 0x100000001ll);
-  T->binop(T, BO_IADD, REG_op(rs, I64), REG_op(ra, I64), REG_op(rb, I64));
-  cgtest_ret_reg(tf, rs, I64);
-  cgtest_end(tf);
-}
-
-/* c08_unsigned_div — 100u / 7u = 14 */
-void build_c08_unsigned_div(CgTestCtx* ctx) {
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, U32);
-  CGTarget* T = ctx->target;
-  Reg r100 = T->alloc_reg(T, RC_INT, U32);
-  Reg r7 = T->alloc_reg(T, RC_INT, U32);
-  Reg rq = T->alloc_reg(T, RC_INT, U32);
-  T->load_imm(T, REG_op(r100, U32), 100);
-  T->load_imm(T, REG_op(r7, U32), 7);
-  T->binop(T, BO_UDIV, REG_op(rq, U32), REG_op(r100, U32), REG_op(r7, U32));
-  cgtest_ret_reg(tf, rq, U32);
-  cgtest_end(tf);
-}
-
-/* c09_neg — UO_NEG. -42 → exit 256-42 = 214. */
-void build_c09_neg(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  Reg n = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r, I32), 42);
-  T->unop(T, UO_NEG, REG_op(n, I32), REG_op(r, I32));
-  cgtest_ret_reg(tf, n, I32);
-  cgtest_end(tf);
-}
-
-/* c10_logical_not — UO_NOT yields 0/1 from any int. !0 = 1. */
-void build_c10_logical_not(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg z = T->alloc_reg(T, RC_INT, I32);
-  Reg ln = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(z, I32), 0);
-  T->unop(T, UO_NOT, REG_op(ln, I32), REG_op(z, I32));
-  cgtest_ret_reg(tf, ln, I32);
-  cgtest_end(tf);
-}
-
-/* c11_shr_signed — -16 >>(s) 2 = -4 → exit 252. */
-void build_c11_shr_signed(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg rv = T->alloc_reg(T, RC_INT, I32);
-  Reg r2 = T->alloc_reg(T, RC_INT, I32);
-  Reg rs = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(rv, I32), -16);
-  T->load_imm(T, REG_op(r2, I32), 2);
-  T->binop(T, BO_SHR_S, REG_op(rs, I32), REG_op(rv, I32), REG_op(r2, I32));
-  cgtest_ret_reg(tf, rs, I32);
-  cgtest_end(tf);
-}
-
-/* c12_imul_i64 — i64 mul. 7 * 6 = 42; high bits zero. Exit 42. */
-void build_c12_imul_i64(CgTestCtx* ctx) {
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I64);
-  CGTarget* T = ctx->target;
-  Reg r7 = T->alloc_reg(T, RC_INT, I64);
-  Reg r6 = T->alloc_reg(T, RC_INT, I64);
-  Reg rm = T->alloc_reg(T, RC_INT, I64);
-  T->load_imm(T, REG_op(r7, I64), 7);
-  T->load_imm(T, REG_op(r6, I64), 6);
-  T->binop(T, BO_IMUL, REG_op(rm, I64), REG_op(r7, I64), REG_op(r6, I64));
-  cgtest_ret_reg(tf, rm, I64);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_d.c b/test/cg/harness/cases_d.c
@@ -1,230 +0,0 @@
-/* Group D — compare and branch.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group D: compare and branch
- * ============================================================ */
-
-/* d01_cmp_eq_true — cmp materializes 0/1; (5 == 5) → 1. */
-void build_d01_cmp_eq_true(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(a, I32), 5);
-  T->load_imm(T, REG_op(b, I32), 5);
-  T->cmp(T, CMP_EQ, REG_op(d, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* d02_cmp_eq_false — (5 == 6) → 0. */
-void build_d02_cmp_eq_false(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(a, I32), 5);
-  T->load_imm(T, REG_op(b, I32), 6);
-  T->cmp(T, CMP_EQ, REG_op(d, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* d03_cmp_ne — (5 != 6) → 1. */
-void build_d03_cmp_ne(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(a, I32), 5);
-  T->load_imm(T, REG_op(b, I32), 6);
-  T->cmp(T, CMP_NE, REG_op(d, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* d04_cmp_lt_signed — (-1 < 1) signed → 1. */
-void build_d04_cmp_lt_signed(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(a, I32), -1);
-  T->load_imm(T, REG_op(b, I32), 1);
-  T->cmp(T, CMP_LT_S, REG_op(d, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* d05_cmp_lt_unsigned — same bit patterns as d04 but unsigned: 0xFFFFFFFF
- * is huge, so (0xFFFFFFFF < 1) → 0. Signedness lives in CmpOp, not Type. */
-void build_d05_cmp_lt_unsigned(CgTestCtx* ctx) {
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, U32);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, U32);
-  Reg b = T->alloc_reg(T, RC_INT, U32);
-  Reg d = T->alloc_reg(T, RC_INT, U32);
-  T->load_imm(T, REG_op(a, U32), -1);
-  T->load_imm(T, REG_op(b, U32), 1);
-  T->cmp(T, CMP_LT_U, REG_op(d, U32), REG_op(a, U32), REG_op(b, U32));
-  cgtest_ret_reg(tf, d, U32);
-  cgtest_end(tf);
-}
-
-/* d06_cmp_ge_signed — boundary: (5 >= 5) → 1 (LE/GE families include eq). */
-void build_d06_cmp_ge_signed(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(a, I32), 5);
-  T->load_imm(T, REG_op(b, I32), 5);
-  T->cmp(T, CMP_GE_S, REG_op(d, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* d07_cmp_branch_taken — fused cmp_branch with the branch taken; landing
- * pad past the label returns 42. */
-void build_d07_cmp_branch_taken(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r, I32), 7);
-  Label L = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(r, I32), IMM_op(7, I32), L);
-  cgtest_ret_imm(tf, 0, I32); /* dead */
-  T->label_place(T, L);
-  cgtest_ret_imm(tf, 42, I32);
-  cgtest_end(tf);
-}
-
-/* d08_cmp_branch_not_taken — branch not taken; fallthrough returns 33. */
-void build_d08_cmp_branch_not_taken(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r, I32), 5);
-  Label L = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(r, I32), IMM_op(6, I32), L);
-  cgtest_ret_imm(tf, 33, I32);
-  T->label_place(T, L);
-  cgtest_ret_imm(tf, 0, I32); /* dead */
-  cgtest_end(tf);
-}
-
-/* d09_cmp_branch_lt_signed — signed compare-and-branch with negative LHS;
- * (-3 < 0) is true. */
-void build_d09_cmp_branch_lt_signed(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(r, I32), -3);
-  Label L = T->label_new(T);
-  T->cmp_branch(T, CMP_LT_S, REG_op(r, I32), IMM_op(0, I32), L);
-  cgtest_ret_imm(tf, 0, I32); /* dead */
-  T->label_place(T, L);
-  cgtest_ret_imm(tf, 9, I32);
-  cgtest_end(tf);
-}
-
-/* d10_jump — unconditional jump; the early ret is skipped. */
-void build_d10_jump(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Label L = T->label_new(T);
-  T->jump(T, L);
-  cgtest_ret_imm(tf, 0, I32); /* dead */
-  T->label_place(T, L);
-  cgtest_ret_imm(tf, 5, I32);
-  cgtest_end(tf);
-}
-
-/* d11_scope_if_true — `int x = 99; if (1) x = 33; return x;`
- * SCOPE_IF consumes the cond at scope_begin; then-branch updates the
- * local; scope_end closes the join. */
-void build_d11_scope_if_true(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, x, IMM_op(99, I32), I32);
-
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(c, I32), 1);
-  CGScopeDesc desc = {.kind = SCOPE_IF, .cond = REG_op(c, I32)};
-  CGScope s = T->scope_begin(T, &desc);
-  cgtest_store_local(tf, x, IMM_op(33, I32), I32);
-  T->scope_end(T, s);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), x, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* d12_scope_if_false — `int x = 99; if (0) x = 33; return x;`
- * Then-branch is dead; the local keeps its initial value. */
-void build_d12_scope_if_false(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, x, IMM_op(99, I32), I32);
-
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(c, I32), 0);
-  CGScopeDesc desc = {.kind = SCOPE_IF, .cond = REG_op(c, I32)};
-  CGScope s = T->scope_begin(T, &desc);
-  cgtest_store_local(tf, x, IMM_op(33, I32), I32);
-  T->scope_end(T, s);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), x, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* d13_scope_if_else — `int x; if (0) x = 10; else x = 7; return x;`
- * Exercises scope_else: cond is 0, so the else body wins. */
-void build_d13_scope_if_else(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_NONE);
-
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(c, I32), 0);
-  CGScopeDesc desc = {.kind = SCOPE_IF, .cond = REG_op(c, I32)};
-  CGScope s = T->scope_begin(T, &desc);
-  cgtest_store_local(tf, x, IMM_op(10, I32), I32);
-  T->scope_else(T, s);
-  cgtest_store_local(tf, x, IMM_op(7, I32), I32);
-  T->scope_end(T, s);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), x, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_e.c b/test/cg/harness/cases_e.c
@@ -1,258 +0,0 @@
-/* Group E — conversions.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group E: conversions
- *
- * One ConvKind per case, plus the boundary widths the AArch64 backend
- * actually selects between (UXTB/SXTB vs UXTH/SXTH vs UBFX/SBFX, 32→64
- * sign-extend). FP conversions all funnel through ftoi_s so the runner
- * sees an int exit code.
- * ============================================================ */
-
-/* e01_sext_i8_i32 — sext (i8)-1 → i32 = 0xFFFFFFFF; low 8 = 0xFF = 255. */
-void build_e01_sext_i8_i32(CgTestCtx* ctx) {
-  const Type* I8 = T_i8(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, I8);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(s, I8), -1);
-  T->convert(T, CV_SEXT, REG_op(d, I32), REG_op(s, I8));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e02_zext_u8_i32 — zext (u8)0xFF → i32 = 0xFF; low 8 = 255. The high
- * bits are zeroed, distinguishing this from e01. */
-void build_e02_zext_u8_i32(CgTestCtx* ctx) {
-  const Type* U8 = T_u8(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, U8);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(s, U8), 0xFF);
-  T->convert(T, CV_ZEXT, REG_op(d, I32), REG_op(s, U8));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e03_sext_i16_i32 — sext (i16)-1000 → 0xFFFFFC18; low 8 = 0x18 = 24. */
-void build_e03_sext_i16_i32(CgTestCtx* ctx) {
-  const Type* I16 = T_i16(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, I16);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(s, I16), -1000);
-  T->convert(T, CV_SEXT, REG_op(d, I32), REG_op(s, I16));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e04_zext_u16_i32 — zext (u16)0xABCD → 0x0000ABCD; low 8 = 0xCD = 205. */
-void build_e04_zext_u16_i32(CgTestCtx* ctx) {
-  const Type* U16 = T_u16(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, U16);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(s, U16), 0xABCD);
-  T->convert(T, CV_ZEXT, REG_op(d, I32), REG_op(s, U16));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e05_zext_u32_i64 — zext (u32)0xFFFFFFFF → i64 = 0x00000000FFFFFFFF;
- * runner reads w0 = 0xFFFFFFFF; low 8 = 255. Distinct from e06: high
- * 32 bits are zero. */
-void build_e05_zext_u32_i64(CgTestCtx* ctx) {
-  const Type* U32 = T_u32(ctx);
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I64);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, U32);
-  Reg d = T->alloc_reg(T, RC_INT, I64);
-  T->load_imm(T, REG_op(s, U32), 0xFFFFFFFFll);
-  T->convert(T, CV_ZEXT, REG_op(d, I64), REG_op(s, U32));
-  cgtest_ret_reg(tf, d, I64);
-  cgtest_end(tf);
-}
-
-/* e06_sext_i32_i64 — sext (i32)-1 → i64 = -1; low 8 = 255. Same low-byte
- * exit as e05 but the high bits differ — exercises SXTW vs UXTW. */
-void build_e06_sext_i32_i64(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I64);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, I64);
-  T->load_imm(T, REG_op(s, I32), -1);
-  T->convert(T, CV_SEXT, REG_op(d, I64), REG_op(s, I32));
-  cgtest_ret_reg(tf, d, I64);
-  cgtest_end(tf);
-}
-
-/* e07_trunc_i64_i32 — trunc 0x100000080 → low 32 = 0x80 = 128. */
-void build_e07_trunc_i64_i32(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, I64);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(s, I64), 0x100000080ll);
-  T->convert(T, CV_TRUNC, REG_op(d, I32), REG_op(s, I64));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e08_trunc_i32_i8 — trunc 0x1FF → low 8 = 0xFF; returned as u8 = 255. */
-void build_e08_trunc_i32_i8(CgTestCtx* ctx) {
-  const Type* U8 = T_u8(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, U8);
-  CGTarget* T = ctx->target;
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  Reg d = T->alloc_reg(T, RC_INT, U8);
-  T->load_imm(T, REG_op(s, I32), 0x1FF);
-  T->convert(T, CV_TRUNC, REG_op(d, U8), REG_op(s, I32));
-  cgtest_ret_reg(tf, d, U8);
-  cgtest_end(tf);
-}
-
-/* e09_itof_s_i32_f32 — i32(7) → f32(7.0) → ftoi_s i32 → 7. Exact
- * round-trip; verifies SCVTF + FCVTZS form a valid pair. */
-void build_e09_itof_s_i32_f32(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F32 = T_f32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg si = T->alloc_reg(T, RC_INT, I32);
-  Reg f = T->alloc_reg(T, RC_FP, F32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(si, I32), 7);
-  T->convert(T, CV_ITOF_S, REG_op(f, F32), REG_op(si, I32));
-  T->convert(T, CV_FTOI_S, REG_op(d, I32), REG_op(f, F32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e10_itof_u_u32_f64 — u32(100) → f64(100.0) → ftoi_s i32 → 100.
- * Crosses width on the way up (UCVTF Dn,Wn) and back down. */
-void build_e10_itof_u_u32_f64(CgTestCtx* ctx) {
-  const Type* U32 = T_u32(ctx);
-  const Type* I32 = T_i32(ctx);
-  const Type* F64 = T_f64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg si = T->alloc_reg(T, RC_INT, U32);
-  Reg f = T->alloc_reg(T, RC_FP, F64);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(si, U32), 100);
-  T->convert(T, CV_ITOF_U, REG_op(f, F64), REG_op(si, U32));
-  T->convert(T, CV_FTOI_S, REG_op(d, I32), REG_op(f, F64));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e11_ftoi_s_neg — ftoi_s(-1.5f) = -1; low 8 = 255. C99 truncation
- * rounds toward zero. */
-void build_e11_ftoi_s_neg(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  static const u8 BYTES_NEG_1_5[4] = {0x00, 0x00, 0xC0, 0xBF}; /* -1.5f LE */
-  Reg f = T->alloc_reg(T, RC_FP, F32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  ConstBytes cb = {.type = F32, .bytes = BYTES_NEG_1_5, .size = 4, .align = 4};
-  T->load_const(T, REG_op(f, F32), cb);
-  T->convert(T, CV_FTOI_S, REG_op(d, I32), REG_op(f, F32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e12_ftoi_u_pos — ftoi_u(200.7f) = 200u. Truncation toward zero,
- * matching C's (unsigned)x. */
-void build_e12_ftoi_u_pos(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, U32);
-  CGTarget* T = ctx->target;
-  static const u8 BYTES_200_7[4] = {0x33, 0xB3, 0x48, 0x43}; /* 200.7f LE */
-  Reg f = T->alloc_reg(T, RC_FP, F32);
-  Reg d = T->alloc_reg(T, RC_INT, U32);
-  ConstBytes cb = {.type = F32, .bytes = BYTES_200_7, .size = 4, .align = 4};
-  T->load_const(T, REG_op(f, F32), cb);
-  T->convert(T, CV_FTOI_U, REG_op(d, U32), REG_op(f, F32));
-  cgtest_ret_reg(tf, d, U32);
-  cgtest_end(tf);
-}
-
-/* e13_fext_f32_f64 — float→double promotion preserves an exactly
- * representable value (3.5f = 3.5). ftoi_s then yields 3. */
-void build_e13_fext_f32_f64(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* F64 = T_f64(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  static const u8 BYTES_3_5F[4] = {0x00, 0x00, 0x60, 0x40}; /* 3.5f LE */
-  Reg f32r = T->alloc_reg(T, RC_FP, F32);
-  Reg f64r = T->alloc_reg(T, RC_FP, F64);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  ConstBytes cb = {.type = F32, .bytes = BYTES_3_5F, .size = 4, .align = 4};
-  T->load_const(T, REG_op(f32r, F32), cb);
-  T->convert(T, CV_FEXT, REG_op(f64r, F64), REG_op(f32r, F32));
-  T->convert(T, CV_FTOI_S, REG_op(d, I32), REG_op(f64r, F64));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e14_ftrunc_f64_f32 — double→float demotion of 7.875 (exact in both);
- * ftoi_s yields 7. */
-void build_e14_ftrunc_f64_f32(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* F64 = T_f64(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  static const u8 BYTES_7_875[8] = {
-      0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x1F, 0x40, /* 7.875 LE double */
-  };
-  Reg f64r = T->alloc_reg(T, RC_FP, F64);
-  Reg f32r = T->alloc_reg(T, RC_FP, F32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  ConstBytes cb = {.type = F64, .bytes = BYTES_7_875, .size = 8, .align = 8};
-  T->load_const(T, REG_op(f64r, F64), cb);
-  T->convert(T, CV_FTRUNC, REG_op(f32r, F32), REG_op(f64r, F64));
-  T->convert(T, CV_FTOI_S, REG_op(d, I32), REG_op(f32r, F32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* e15_bitcast_i32_f32 — same-size cross-class reinterpret. 0x40A00000
- * is the IEEE-754 single bit pattern for 5.0f. ftoi_s yields 5,
- * confirming the bits travelled to the FP register intact. */
-void build_e15_bitcast_i32_f32(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F32 = T_f32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg si = T->alloc_reg(T, RC_INT, I32);
-  Reg f = T->alloc_reg(T, RC_FP, F32);
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(si, I32), 0x40A00000); /* 5.0f bit pattern */
-  T->convert(T, CV_BITCAST, REG_op(f, F32), REG_op(si, I32));
-  T->convert(T, CV_FTOI_S, REG_op(d, I32), REG_op(f, F32));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
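The Group E cases above hard-code IEEE-754 constants as little-endian byte arrays (`BYTES_3_5F`, `BYTES_7_875`) and as a raw bit pattern (`0x40A00000`). The following host-side sketch is not part of the removed harness; it only cross-checks that those bytes decode to the values the comments claim. The helper names (`le_bits`, `f32_from_le`, `f64_from_le`, `f32_from_bits`, `group_e_constants_ok`) are hypothetical, and the code assumes the host uses IEEE-754 binary32/binary64 for `float`/`double`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assemble n little-endian bytes into an integer, independent of the
 * host's byte order. */
static uint64_t le_bits(const uint8_t* b, int n) {
  uint64_t v = 0;
  for (int i = n - 1; i >= 0; --i) v = (v << 8) | b[i];
  return v;
}

/* Reinterpret a 32-bit pattern as a float (assumes IEEE-754 binary32). */
static float f32_from_bits(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

static float f32_from_le(const uint8_t b[4]) {
  return f32_from_bits((uint32_t)le_bits(b, 4));
}

/* Reinterpret eight little-endian bytes as a double (assumes binary64). */
static double f64_from_le(const uint8_t b[8]) {
  uint64_t bits = le_bits(b, 8);
  double d;
  memcpy(&d, &bits, sizeof d);
  return d;
}

/* Returns 1 if the constants used by e13/e14/e15 decode as documented
 * and truncate toward zero to the integers the cases expect. */
static int group_e_constants_ok(void) {
  static const uint8_t BYTES_3_5F[4] = {0x00, 0x00, 0x60, 0x40};
  static const uint8_t BYTES_7_875[8] = {0x00, 0x00, 0x00, 0x00,
                                         0x00, 0x80, 0x1F, 0x40};
  float a = f32_from_le(BYTES_3_5F);
  double b = f64_from_le(BYTES_7_875);
  float c = f32_from_bits(0x40A00000u);
  return a == 3.5f && b == 7.875 && c == 5.0f &&
         (int)(double)a == 3 && (int)(float)b == 7 && (int)c == 5;
}
```

Calling `group_e_constants_ok()` from any test driver is enough to confirm the byte tables. The C casts mirror the `CV_FEXT`, `CV_FTRUNC` and `CV_FTOI_S` steps only loosely, since both truncate toward zero for these in-range values.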
diff --git a/test/cg/harness/cases_f.c b/test/cg/harness/cases_f.c
@@ -1,327 +0,0 @@
-/* Group F — memory (loads/stores beyond locals).
- * See CORPUS.md for the case list and expected values. */
-
-#include "cases_shared.h"
-#include "cg_test.h"
-
-/* ============================================================
- * Group F: memory (loads/stores beyond locals)
- *
- * Group B already exercises the basic load/store-of-local path. Group F
- * pushes the surface: every scalar width, FP load/store, indirect
- * non-zero offsets, store-from-IMM vs store-from-REG, copy_bytes,
- * set_bytes, volatile, and the bitfield methods.
- * ============================================================ */
-
-/* f01_load_store_i8 — local u8; store IMM 200; load; return. */
-void build_f01_load_store_i8(CgTestCtx* ctx) {
-  const Type* U8 = T_u8(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, U8);
-  CGTarget* T = ctx->target;
-  FrameSlot s = cgtest_local(tf, U8, FSF_NONE);
-  cgtest_store_local(tf, s, IMM_op(200, U8), U8);
-  Reg r = T->alloc_reg(T, RC_INT, U8);
-  cgtest_load_local(tf, REG_op(r, U8), s, U8);
-  cgtest_ret_reg(tf, r, U8);
-  cgtest_end(tf);
-}
-
-/* f02_load_store_i16 — local i16; store 0x1234; load; low 8 = 0x34 = 52. */
-void build_f02_load_store_i16(CgTestCtx* ctx) {
-  const Type* I16 = T_i16(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I16);
-  CGTarget* T = ctx->target;
-  FrameSlot s = cgtest_local(tf, I16, FSF_NONE);
-  cgtest_store_local(tf, s, IMM_op(0x1234, I16), I16);
-  Reg r = T->alloc_reg(T, RC_INT, I16);
-  cgtest_load_local(tf, REG_op(r, I16), s, I16);
-  cgtest_ret_reg(tf, r, I16);
-  cgtest_end(tf);
-}
-
-/* f03_load_store_i64 — local i64; store 0x1_0000_0042; load; runner
- * reads w0 = low 32 = 0x42 = 66. */
-void build_f03_load_store_i64(CgTestCtx* ctx) {
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I64);
-  CGTarget* T = ctx->target;
-  FrameSlot s = cgtest_local(tf, I64, FSF_NONE);
-  cgtest_store_local(tf, s, IMM_op(0x100000042ll, I64), I64);
-  Reg r = T->alloc_reg(T, RC_INT, I64);
-  cgtest_load_local(tf, REG_op(r, I64), s, I64);
-  cgtest_ret_reg(tf, r, I64);
-  cgtest_end(tf);
-}
-
-/* f04_load_store_f32 — local f32 home; store FP reg holding 7.5f; load
- * back; ftoi_s → 7. Exercises STR Sn / LDR Sn forms. */
-void build_f04_load_store_f32(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  static const u8 BYTES_75F[4] = {0x00, 0x00, 0xF0, 0x40}; /* 7.5f LE */
-
-  FrameSlot s = cgtest_local(tf, F32, FSF_NONE);
-  Reg src = T->alloc_reg(T, RC_FP, F32);
-  ConstBytes cb = {.type = F32, .bytes = BYTES_75F, .size = 4, .align = 4};
-  T->load_const(T, REG_op(src, F32), cb);
-  cgtest_store_local(tf, s, REG_op(src, F32), F32);
-
-  Reg dst = T->alloc_reg(T, RC_FP, F32);
-  cgtest_load_local(tf, REG_op(dst, F32), s, F32);
-  Reg ri = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(ri, I32), REG_op(dst, F32));
-  cgtest_ret_reg(tf, ri, I32);
-  cgtest_end(tf);
-}
-
-/* f05_load_store_f64 — local f64 home; store FP reg holding 3.25; load
- * back; ftoi_s → 3. STR Dn / LDR Dn. */
-void build_f05_load_store_f64(CgTestCtx* ctx) {
-  const Type* F64 = T_f64(ctx);
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  static const u8 BYTES_3_25[8] = {
-      0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x0A, 0x40, /* 3.25 LE double */
-  };
-
-  FrameSlot s = cgtest_local(tf, F64, FSF_NONE);
-  Reg src = T->alloc_reg(T, RC_FP, F64);
-  ConstBytes cb = {.type = F64, .bytes = BYTES_3_25, .size = 8, .align = 8};
-  T->load_const(T, REG_op(src, F64), cb);
-  cgtest_store_local(tf, s, REG_op(src, F64), F64);
-
-  Reg dst = T->alloc_reg(T, RC_FP, F64);
-  cgtest_load_local(tf, REG_op(dst, F64), s, F64);
-  Reg ri = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(ri, I32), REG_op(dst, F64));
-  cgtest_ret_reg(tf, ri, I32);
-  cgtest_end(tf);
-}
-
-/* f06_indirect_nonzero_offset — addr_of an i64 local, then store/load
- * an i32 at +4. Exercises [base + #imm] addressing past byte 0; the
- * sentinel 99 at +0 exposes a backend that drops the offset (→ 99, not 42). */
-void build_f06_indirect_nonzero_offset(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I64, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, I64));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, I64)), LOCAL_op(s, I64));
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(base, 0, I32), IMM_op(99, I32), ma);
-  T->store(T, IND_op(base, 4, I32), IMM_op(42, I32), ma);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(base, 4, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* f07_store_reg — store from REG (not IMM) into a local slot. b04 stored
- * an immediate; this distinguishes the REG-source store path. */
-void build_f07_store_reg(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I32, FSF_NONE);
-  Reg src = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(src, I32), 17);
-  cgtest_store_local(tf, s, REG_op(src, I32), I32);
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(dst, I32), s, I32);
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* f08_copy_bytes — copy_bytes(dst, src, Pt {10,32}); read back dst.a +
- * dst.b → 42. The aggregate move is the operation under test; the per-
- * field load/store after it just reads the result. */
-void build_f08_copy_bytes(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PT = cases_pt_type(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot src = cgtest_local(tf, PT, FSF_ADDR_TAKEN);
-  FrameSlot dst = cgtest_local(tf, PT, FSF_ADDR_TAKEN);
-
-  /* Initialize src to {10, 32}. */
-  Reg src_addr = T->alloc_reg(T, RC_INT, T_ptr(ctx, PT));
-  T->addr_of(T, REG_op(src_addr, T_ptr(ctx, PT)), LOCAL_op(src, PT));
-  MemAccess ma_i32 = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(src_addr, 0, I32), IMM_op(10, I32), ma_i32);
-  T->store(T, IND_op(src_addr, 4, I32), IMM_op(32, I32), ma_i32);
-
-  Reg dst_addr = T->alloc_reg(T, RC_INT, T_ptr(ctx, PT));
-  T->addr_of(T, REG_op(dst_addr, T_ptr(ctx, PT)), LOCAL_op(dst, PT));
-
-  AggregateAccess agg = {
-      .type = PT,
-      .size = 8,
-      .align = 4,
-      .mem = {.type = PT, .size = 8, .align = 4, .alias.kind = ALIAS_LOCAL},
-  };
-  T->copy_bytes(T, REG_op(dst_addr, T_ptr(ctx, PT)),
-                REG_op(src_addr, T_ptr(ctx, PT)), agg);
-
-  Reg ra = T->alloc_reg(T, RC_INT, I32);
-  Reg rb = T->alloc_reg(T, RC_INT, I32);
-  Reg sum = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(ra, I32), IND_op(dst_addr, 0, I32), ma_i32);
-  T->load(T, REG_op(rb, I32), IND_op(dst_addr, 4, I32), ma_i32);
-  T->binop(T, BO_IADD, REG_op(sum, I32), REG_op(ra, I32), REG_op(rb, I32));
-  cgtest_ret_reg(tf, sum, I32);
-  cgtest_end(tf);
-}
-
-/* f09_set_bytes_zero — set_bytes(0) on an i32-sized buffer; load the
- * word back → 0. Exercises the "memset to zero" path which backends
- * often special-case (e.g. STR WZR for a 4-byte zero on AArch64). */
-void build_f09_set_bytes_zero(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U8 = T_u8(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, I32)), LOCAL_op(s, I32));
-
-  AggregateAccess agg = {
-      .type = I32,
-      .size = 4,
-      .align = 4,
-      .mem = {.type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL},
-  };
-  T->set_bytes(T, REG_op(base, T_ptr(ctx, I32)), IMM_op(0, U8), agg);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(base, 0, I32), agg.mem);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* f10_set_bytes_ff — set_bytes(0xFF) on an i32-sized buffer; load the
- * word → 0xFFFFFFFF; low 8 = 255. Exercises the byte-broadcast path. */
-void build_f10_set_bytes_ff(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U8 = T_u8(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, I32)), LOCAL_op(s, I32));
-
-  AggregateAccess agg = {
-      .type = I32,
-      .size = 4,
-      .align = 4,
-      .mem = {.type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL},
-  };
-  T->set_bytes(T, REG_op(base, T_ptr(ctx, I32)), IMM_op(0xFF, U8), agg);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(base, 0, I32), agg.mem);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* f11_volatile_rw — same body as b04 but with MF_VOLATILE on both the
- * store and the load. The expected exit value is identical; the
- * difference is in the emitted code (no DSE/DCE, no fold-through-store). */
-void build_f11_volatile_rw(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I32, FSF_NONE);
-  MemAccess ma = {.type = I32,
-                  .size = 4,
-                  .align = 4,
-                  .flags = MF_VOLATILE,
-                  .alias.kind = ALIAS_LOCAL};
-  T->store(T, LOCAL_op(s, I32), IMM_op(42, I32), ma);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), LOCAL_op(s, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* f12_bitfield_unsigned — { unsigned x : 5; } at bit_offset=3 inside a
- * zeroed i32 storage word; store 21; load → 21 (zero-extended). The
- * non-zero bit_offset forces the backend's mask+shift logic. */
-void build_f12_bitfield_unsigned(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, U32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, I32)), LOCAL_op(s, I32));
-
-  /* Zero the storage word so neighboring bits don't perturb the read. */
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(base, 0, I32), IMM_op(0, I32), ma);
-
-  BitFieldAccess bf = {
-      .field_type = U32,
-      .storage = ma,
-      .storage_offset = 0,
-      .bit_offset = 3,
-      .bit_width = 5,
-      .signed_ = 0,
-  };
-  T->bitfield_store(T, REG_op(base, T_ptr(ctx, I32)), IMM_op(21, U32), bf);
-
-  Reg r = T->alloc_reg(T, RC_INT, U32);
-  T->bitfield_load(T, REG_op(r, U32), REG_op(base, T_ptr(ctx, I32)), bf);
-  cgtest_ret_reg(tf, r, U32);
-  cgtest_end(tf);
-}
-
-/* f13_bitfield_signed — { signed x : 5; } at bit_offset=0; store -1
- * (5-bit all-ones); load sign-extends to -1; low 8 = 255. */
-void build_f13_bitfield_signed(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot s = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, I32)), LOCAL_op(s, I32));
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(base, 0, I32), IMM_op(0, I32), ma);
-
-  BitFieldAccess bf = {
-      .field_type = I32,
-      .storage = ma,
-      .storage_offset = 0,
-      .bit_offset = 0,
-      .bit_width = 5,
-      .signed_ = 1,
-  };
-  T->bitfield_store(T, REG_op(base, T_ptr(ctx, I32)), IMM_op(-1, I32), bf);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->bitfield_load(T, REG_op(r, I32), REG_op(base, T_ptr(ctx, I32)), bf);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
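The two bitfield cases above (f12, f13) depend on the backend's mask-and-shift logic. As a reference for what they expect, here is a minimal host-side sketch of insert, zero-extending extract, and sign-extending extract on a 32-bit storage word. The function names (`bf_insert`, `bf_extract_u`, `bf_extract_s`) are hypothetical and are not the removed backend's implementation; the sketch assumes `bit_offset + width <= 32`, and `1 <= width <= 31` for the signed variant.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `width` bits set (width in 1..32). */
static uint32_t bf_mask(unsigned width) {
  return width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
}

/* Write the low `width` bits of `value` into `word` at `bit_offset`,
 * leaving all neighboring bits untouched. */
static uint32_t bf_insert(uint32_t word, uint32_t value, unsigned bit_offset,
                          unsigned width) {
  uint32_t mask = bf_mask(width);
  return (word & ~(mask << bit_offset)) | ((value & mask) << bit_offset);
}

/* Read an unsigned field: shift down, then mask (zero-extends). */
static uint32_t bf_extract_u(uint32_t word, unsigned bit_offset,
                             unsigned width) {
  return (word >> bit_offset) & bf_mask(width);
}

/* Read a signed field: extract, then sign-extend from bit width-1.
 * Requires 1 <= width <= 31 so the arithmetic stays in int32_t range. */
static int32_t bf_extract_s(uint32_t word, unsigned bit_offset,
                            unsigned width) {
  uint32_t v = bf_extract_u(word, bit_offset, width);
  uint32_t sign = 1u << (width - 1);
  return (int32_t)(v ^ sign) - (int32_t)sign;
}
```

With these helpers, the f12 scenario is `bf_extract_u(bf_insert(0, 21, 3, 5), 3, 5)` and the f13 scenario is `bf_extract_s(bf_insert(0, (uint32_t)-1, 0, 5), 0, 5)`. The XOR-and-subtract form of sign extension avoids relying on arithmetic right shift of negative values.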
diff --git a/test/cg/harness/cases_g.c b/test/cg/harness/cases_g.c
@@ -1,660 +0,0 @@
-/* Group G — calls (beyond direct-call path).
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group G: calls (beyond direct-call path)
- *
- * Group B established direct-call mechanics. Group G stresses what falls
- * out once calls compose: indirect calls, recursion, mutual recursion,
- * register-preservation across calls, HFAs, oversized struct byval.
- * ============================================================ */
-
-/* helper used by g01 and g11/g12: int echo(int x) { return x; } */
-static ObjSymId build_g_echo_helper(CgTestCtx* ctx, const char* name) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, name, I32, params, 1);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), cgtest_param_slot(tf, 0), I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* g01_indirect_call — int (*fp)(int) = echo; return fp(42); */
-void build_g01_indirect_call(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId echo = build_g_echo_helper(ctx, "g01_echo");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Materialize the function pointer from the GLOBAL symbol. */
-  const Type* fn_ty = type_func(ctx->pool, I32, params, 1, 0);
-  const Type* fnp_ty = T_ptr(ctx, fn_ty);
-  Reg fp = T->alloc_reg(T, RC_INT, fnp_ty);
-  T->addr_of(T, REG_op(fp, fnp_ty), GLOBAL_op(echo, 0));
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 42}};
-  cgtest_call_indirect(tf, fp, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by g02: int fact(int n) { return n<2 ? 1 : n*fact(n-1); }
- * Forward-decl the symbol so the body can reference it for recursion. */
-static ObjSymId build_g02_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId sym = cgtest_decl_func(ctx, "g02_fact");
-  CgTestFn* tf = cgtest_begin_func_at(ctx, sym, I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  Reg n = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(n, I32), cgtest_param_slot(tf, 0), I32);
-
-  /* if (n < 2) goto base; */
-  Label base = T->label_new(T);
-  T->cmp_branch(T, CMP_LT_S, REG_op(n, I32), IMM_op(2, I32), base);
-
-  /* recursive: tmp = fact(n - 1); return n * tmp; */
-  Reg n1 = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_ISUB, REG_op(n1, I32), REG_op(n, I32), IMM_op(1, I32));
-
-  Reg rec = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg rec_args[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = n1}};
-  cgtest_call(tf, sym, I32, params, rec_args, 1, REG_op(rec, I32));
-
-  Reg prod = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IMUL, REG_op(prod, I32), REG_op(n, I32), REG_op(rec, I32));
-  cgtest_ret_reg(tf, prod, I32);
-
-  /* base: return 1; */
-  T->label_place(T, base);
-  cgtest_ret_imm(tf, 1, I32);
-  cgtest_end(tf);
-  return sym;
-}
-
-/* g02_recursion_factorial — fact(5) = 120. */
-void build_g02_recursion_factorial(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId fact = build_g02_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 5}};
-  cgtest_call(tf, fact, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by g03: int fib(int n) { return n<2?n:fib(n-1)+fib(n-2); } */
-static ObjSymId build_g03_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId sym = cgtest_decl_func(ctx, "g03_fib");
-  CgTestFn* tf = cgtest_begin_func_at(ctx, sym, I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  Reg n = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(n, I32), cgtest_param_slot(tf, 0), I32);
-
-  /* if (n < 2) return n; */
-  Label base = T->label_new(T);
-  T->cmp_branch(T, CMP_LT_S, REG_op(n, I32), IMM_op(2, I32), base);
-
-  /* a = fib(n-1); b = fib(n-2); return a+b; */
-  Reg n1 = T->alloc_reg(T, RC_INT, I32);
-  Reg n2 = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_ISUB, REG_op(n1, I32), REG_op(n, I32), IMM_op(1, I32));
-  T->binop(T, BO_ISUB, REG_op(n2, I32), REG_op(n, I32), IMM_op(2, I32));
-
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg a1[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = n1}};
-  CgTestArg a2[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = n2}};
-  cgtest_call(tf, sym, I32, params, a1, 1, REG_op(a, I32));
-  cgtest_call(tf, sym, I32, params, a2, 1, REG_op(b, I32));
-
-  Reg sum = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(sum, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, sum, I32);
-
-  T->label_place(T, base);
-  cgtest_ret_reg(tf, n, I32);
-  cgtest_end(tf);
-  return sym;
-}
-
-/* g03_recursion_fib — fib(10) = 55. */
-void build_g03_recursion_fib(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId fib = build_g03_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 10}};
-  cgtest_call(tf, fib, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* g04_mutual_recursion — is_even(8) = 1.
- *   int is_even(int n) { return n==0 ? 1 : is_odd(n-1); }
- *   int is_odd (int n) { return n==0 ? 0 : is_even(n-1); }
- * Forward-declare both symbols up front so each body can reference the
- * other before it has been emitted. */
-void build_g04_mutual_recursion(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  CGTarget* T = ctx->target;
-
-  ObjSymId sym_e = cgtest_decl_func(ctx, "g04_is_even");
-  ObjSymId sym_o = cgtest_decl_func(ctx, "g04_is_odd");
-
-  /* is_even body. */
-  {
-    CgTestFn* tf = cgtest_begin_func_at(ctx, sym_e, I32, params, 1);
-    Reg n = T->alloc_reg(T, RC_INT, I32);
-    cgtest_load_local(tf, REG_op(n, I32), cgtest_param_slot(tf, 0), I32);
-    Label base = T->label_new(T);
-    T->cmp_branch(T, CMP_EQ, REG_op(n, I32), IMM_op(0, I32), base);
-    Reg n1 = T->alloc_reg(T, RC_INT, I32);
-    T->binop(T, BO_ISUB, REG_op(n1, I32), REG_op(n, I32), IMM_op(1, I32));
-    Reg r = T->alloc_reg(T, RC_INT, I32);
-    CgTestArg args[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = n1}};
-    cgtest_call(tf, sym_o, I32, params, args, 1, REG_op(r, I32));
-    cgtest_ret_reg(tf, r, I32);
-    T->label_place(T, base);
-    cgtest_ret_imm(tf, 1, I32);
-    cgtest_end(tf);
-  }
-
-  /* is_odd body. */
-  {
-    CgTestFn* tf = cgtest_begin_func_at(ctx, sym_o, I32, params, 1);
-    Reg n = T->alloc_reg(T, RC_INT, I32);
-    cgtest_load_local(tf, REG_op(n, I32), cgtest_param_slot(tf, 0), I32);
-    Label base = T->label_new(T);
-    T->cmp_branch(T, CMP_EQ, REG_op(n, I32), IMM_op(0, I32), base);
-    Reg n1 = T->alloc_reg(T, RC_INT, I32);
-    T->binop(T, BO_ISUB, REG_op(n1, I32), REG_op(n, I32), IMM_op(1, I32));
-    Reg r = T->alloc_reg(T, RC_INT, I32);
-    CgTestArg args[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = n1}};
-    cgtest_call(tf, sym_e, I32, params, args, 1, REG_op(r, I32));
-    cgtest_ret_reg(tf, r, I32);
-    T->label_place(T, base);
-    cgtest_ret_imm(tf, 0, I32);
-    cgtest_end(tf);
-  }
-
-  /* test_main: return is_even(8) → 1. */
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 8}};
-  cgtest_call(tf, sym_e, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by g05: int inc(int x) { return x+1; } */
-static ObjSymId build_g05_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "g05_inc", I32, params, 1);
-  CGTarget* T = ctx->target;
-  Reg x = T->alloc_reg(T, RC_INT, I32);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(x, I32), cgtest_param_slot(tf, 0), I32);
-  T->binop(T, BO_IADD, REG_op(r, I32), REG_op(x, I32), IMM_op(1, I32));
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* g05_chained_calls — inc(inc(inc(39))) = 42. */
-void build_g05_chained_calls(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId inc = build_g05_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg r1 = T->alloc_reg(T, RC_INT, I32);
-  Reg r2 = T->alloc_reg(T, RC_INT, I32);
-  Reg r3 = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg a1[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 39}};
-  cgtest_call(tf, inc, I32, params, a1, 1, REG_op(r1, I32));
-  CgTestArg a2[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = r1}};
-  cgtest_call(tf, inc, I32, params, a2, 1, REG_op(r2, I32));
-  CgTestArg a3[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = r2}};
-  cgtest_call(tf, inc, I32, params, a3, 1, REG_op(r3, I32));
-  cgtest_ret_reg(tf, r3, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by g06:
- *   int f(int a, float b, int c, double d, int e)
- *     { return a + (int)b + c + (int)d + e; }
- * Mixes int and FP params — abi_func_info routes int→GPR and FP→FP. */
-static ObjSymId build_g06_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F32 = T_f32(ctx);
-  const Type* F64 = T_f64(ctx);
-  const Type* params[] = {I32, F32, I32, F64, I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "g06_f", I32, params, 5);
-  CGTarget* T = ctx->target;
-
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  Reg e = T->alloc_reg(T, RC_INT, I32);
-  Reg fb = T->alloc_reg(T, RC_FP, F32);
-  Reg fd = T->alloc_reg(T, RC_FP, F64);
-  cgtest_load_local(tf, REG_op(a, I32), cgtest_param_slot(tf, 0), I32);
-  cgtest_load_local(tf, REG_op(fb, F32), cgtest_param_slot(tf, 1), F32);
-  cgtest_load_local(tf, REG_op(c, I32), cgtest_param_slot(tf, 2), I32);
-  cgtest_load_local(tf, REG_op(fd, F64), cgtest_param_slot(tf, 3), F64);
-  cgtest_load_local(tf, REG_op(e, I32), cgtest_param_slot(tf, 4), I32);
-
-  Reg ib = T->alloc_reg(T, RC_INT, I32);
-  Reg id = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(ib, I32), REG_op(fb, F32));
-  T->convert(T, CV_FTOI_S, REG_op(id, I32), REG_op(fd, F64));
-
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(a, I32), REG_op(ib, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(c, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(id, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(e, I32));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* g06_mixed_int_fp_params — f(2, 3.0f, 5, 7.0, 25) → 42. */
-void build_g06_mixed_int_fp_params(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F32 = T_f32(ctx);
-  const Type* F64 = T_f64(ctx);
-  const Type* params[] = {I32, F32, I32, F64, I32};
-  ObjSymId f = build_g06_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Materialize 3.0f and 7.0 in FP regs via load_const. */
-  static const u8 BYTES_3F[4] = {0x00, 0x00, 0x40, 0x40}; /* 3.0f LE */
-  static const u8 BYTES_7D[8] = {0x00, 0x00, 0x00, 0x00,
-                                 0x00, 0x00, 0x1C, 0x40}; /* 7.0   LE double */
-  Reg fb = T->alloc_reg(T, RC_FP, F32);
-  Reg fd = T->alloc_reg(T, RC_FP, F64);
-  ConstBytes cbf = {.type = F32, .bytes = BYTES_3F, .size = 4, .align = 4};
-  ConstBytes cbd = {.type = F64, .bytes = BYTES_7D, .size = 8, .align = 8};
-  T->load_const(T, REG_op(fb, F32), cbf);
-  T->load_const(T, REG_op(fd, F64), cbd);
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 2},
-      {.kind = CGT_ARG_REG, .type = F32, .v.reg = fb},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 5},
-      {.kind = CGT_ARG_REG, .type = F64, .v.reg = fd},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 25},
-  };
-  cgtest_call(tf, f, I32, params, args, 5, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by g07: void fill(int *p, int v) { *p = v; } */
-static ObjSymId build_g07_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* VOID = T_void(ctx);
-  const Type* params[] = {PI32, I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "g07_fill", VOID, params, 2);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  Reg v = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(p, PI32), cgtest_param_slot(tf, 0), PI32);
-  cgtest_load_local(tf, REG_op(v, I32), cgtest_param_slot(tf, 1), I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(p, 0, I32), REG_op(v, I32), ma);
-  cgtest_ret_void(tf);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* g07_void_call_outparam — int x; fill(&x, 42); return x → 42. */
-void build_g07_void_call_outparam(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* VOID = T_void(ctx);
-  const Type* params[] = {PI32, I32};
-  ObjSymId fill = build_g07_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->addr_of(T, REG_op(p, PI32), LOCAL_op(x, I32));
-
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_REG, .type = PI32, .v.reg = p},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 42},
-  };
-  cgtest_call(tf, fill, VOID, params, args, 2, IMM_op(0, VOID));
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), x, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* struct S { int a0, ..., a7; }; — 32 bytes, exceeds the 16-byte threshold
- * and is passed by reference (caller-allocated copy) under AAPCS64. */
-static const Type* build_g08_struct_type(CgTestCtx* ctx) {
-  Sym tag = pool_intern_cstr(ctx->pool, "S32");
-  TagId tid = type_tag_new(ctx->pool, TAG_STRUCT, tag, (SrcLoc){0, 0, 0});
-  TypeRecordBuilder* b = type_record_begin(ctx->pool, TY_STRUCT, tid, tag);
-  /* Eight i32 fields named a0..a7. */
-  for (int i = 0; i < 8; ++i) {
-    char name[8];
-    name[0] = 'a';
-    name[1] = (char)('0' + i);
-    name[2] = 0;
-    type_record_field(b, (Field){.name = pool_intern_cstr(ctx->pool, name),
-                                 .type = T_i32(ctx)});
-  }
-  return type_record_end(ctx->pool, b);
-}
-
-/* helper used by g08: int take(struct S s) { return s.a7; } */
-static ObjSymId build_g08_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* S = build_g08_struct_type(ctx);
-  const Type* params[] = {S};
-  CgTestFn* tf = cgtest_begin_func(ctx, "g08_take", I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, S));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, S)),
-             LOCAL_op(cgtest_param_slot(tf, 0), S));
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_PARAM};
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  /* a7 lives at offset 28. */
-  T->load(T, REG_op(r, I32), IND_op(base, 28, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* g08_large_struct_byval — 32-byte struct passed by value. */
-void build_g08_large_struct_byval(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* S = build_g08_struct_type(ctx);
-  const Type* params[] = {S};
-  ObjSymId take = build_g08_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot src = cgtest_local(tf, S, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, S));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, S)), LOCAL_op(src, S));
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  /* Zero out a0..a6 so the helper observation depends only on a7. */
-  for (int i = 0; i < 7; ++i) {
-    T->store(T, IND_op(base, i * 4, I32), IMM_op(0, I32), ma);
-  }
-  T->store(T, IND_op(base, 28, I32), IMM_op(42, I32), ma);
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_BYVAL_LOCAL, .type = S, .v.slot = src}};
-  cgtest_call(tf, take, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* struct V { float x, y; }; — HFA of two f32. AAPCS64 passes it in s0, s1
- * (the low 32 bits of v0, v1) and returns it in the same registers. */
-static const Type* build_g_hfa_type(CgTestCtx* ctx) {
-  Sym tag = pool_intern_cstr(ctx->pool, "V");
-  TagId tid = type_tag_new(ctx->pool, TAG_STRUCT, tag, (SrcLoc){0, 0, 0});
-  TypeRecordBuilder* b = type_record_begin(ctx->pool, TY_STRUCT, tid, tag);
-  type_record_field(
-      b, (Field){.name = pool_intern_cstr(ctx->pool, "x"), .type = T_f32(ctx)});
-  type_record_field(
-      b, (Field){.name = pool_intern_cstr(ctx->pool, "y"), .type = T_f32(ctx)});
-  return type_record_end(ctx->pool, b);
-}
-
-/* helper used by g09: int f(struct V v) { return (int)(v.x + v.y); } */
-static ObjSymId build_g09_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F32 = T_f32(ctx);
-  const Type* V = build_g_hfa_type(ctx);
-  const Type* params[] = {V};
-  CgTestFn* tf = cgtest_begin_func(ctx, "g09_f", I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, V));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, V)),
-             LOCAL_op(cgtest_param_slot(tf, 0), V));
-  MemAccess ma = {
-      .type = F32, .size = 4, .align = 4, .alias.kind = ALIAS_PARAM};
-  Reg fx = T->alloc_reg(T, RC_FP, F32);
-  Reg fy = T->alloc_reg(T, RC_FP, F32);
-  Reg fs = T->alloc_reg(T, RC_FP, F32);
-  T->load(T, REG_op(fx, F32), IND_op(base, 0, F32), ma);
-  T->load(T, REG_op(fy, F32), IND_op(base, 4, F32), ma);
-  T->binop(T, BO_FADD, REG_op(fs, F32), REG_op(fx, F32), REG_op(fy, F32));
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(r, I32), REG_op(fs, F32));
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* g09_hfa_param_f32x2 — f({1.5f, 1.5f}) → 3. */
-void build_g09_hfa_param_f32x2(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F32 = T_f32(ctx);
-  const Type* V = build_g_hfa_type(ctx);
-  const Type* params[] = {V};
-  ObjSymId f = build_g09_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Build local {1.5f, 1.5f} via FP load_const + store_local. */
-  static const u8 BYTES_15F[4] = {0x00, 0x00, 0xC0, 0x3F}; /* 1.5f LE */
-  FrameSlot src = cgtest_local(tf, V, FSF_ADDR_TAKEN);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, V));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, V)), LOCAL_op(src, V));
-  Reg fc = T->alloc_reg(T, RC_FP, F32);
-  ConstBytes cb = {.type = F32, .bytes = BYTES_15F, .size = 4, .align = 4};
-  T->load_const(T, REG_op(fc, F32), cb);
-  MemAccess ma = {
-      .type = F32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(base, 0, F32), REG_op(fc, F32), ma);
-  T->store(T, IND_op(base, 4, F32), REG_op(fc, F32), ma);
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_BYVAL_LOCAL, .type = V, .v.slot = src}};
-  cgtest_call(tf, f, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by g10: struct V g10_mk(void) { return (struct V){1.5f, 1.5f}; }
- * Returned via the HFA path — abi_func_info classifies the struct as
- * homogeneous-FP, so the backend places fields into v0/v1 instead of
- * memcpying through an sret pointer. cgtest_ret_indirect drives both paths. */
-static ObjSymId build_g10_helper(CgTestCtx* ctx) {
-  const Type* F32 = T_f32(ctx);
-  const Type* V = build_g_hfa_type(ctx);
-  CgTestFn* tf = cgtest_begin_func(ctx, "g10_mk", V, NULL, 0);
-  CGTarget* T = ctx->target;
-
-  static const u8 BYTES_15F[4] = {0x00, 0x00, 0xC0, 0x3F};
-  FrameSlot s = cgtest_local(tf, V, FSF_NONE);
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, V));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, V)), LOCAL_op(s, V));
-  Reg fc = T->alloc_reg(T, RC_FP, F32);
-  ConstBytes cb = {.type = F32, .bytes = BYTES_15F, .size = 4, .align = 4};
-  T->load_const(T, REG_op(fc, F32), cb);
-  MemAccess ma = {
-      .type = F32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(base, 0, F32), REG_op(fc, F32), ma);
-  T->store(T, IND_op(base, 4, F32), REG_op(fc, F32), ma);
-
-  cgtest_ret_indirect(tf, s);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* g10_hfa_return_f32x2 — sum fields of returned HFA, ftoi_s → 3. */
-void build_g10_hfa_return_f32x2(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F32 = T_f32(ctx);
-  const Type* V = build_g_hfa_type(ctx);
-  ObjSymId mk = build_g10_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot dst = cgtest_local(tf, V, FSF_ADDR_TAKEN);
-  cgtest_call(tf, mk, V, NULL, NULL, 0, LOCAL_op(dst, V));
-
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, V));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, V)), LOCAL_op(dst, V));
-  MemAccess ma = {
-      .type = F32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  Reg fx = T->alloc_reg(T, RC_FP, F32);
-  Reg fy = T->alloc_reg(T, RC_FP, F32);
-  Reg fs = T->alloc_reg(T, RC_FP, F32);
-  T->load(T, REG_op(fx, F32), IND_op(base, 0, F32), ma);
-  T->load(T, REG_op(fy, F32), IND_op(base, 4, F32), ma);
-  T->binop(T, BO_FADD, REG_op(fs, F32), REG_op(fx, F32), REG_op(fy, F32));
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(r, I32), REG_op(fs, F32));
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* g11_caller_saved_live_across_call — x=42 must survive a call that
- * clobbers caller-saved regs. */
-void build_g11_caller_saved_live_across_call(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId echo = build_g_echo_helper(ctx, "g11_echo");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg x = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(x, I32), 42);
-
-  Reg ignored = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 99}};
-  cgtest_call(tf, echo, I32, params, args, 1, REG_op(ignored, I32));
-
-  cgtest_ret_reg(tf, x, I32);
-  cgtest_end(tf);
-}
-
-/* g12_addr_taken_local_across_call — addr-taken local survives an
- * intervening call. b05 body with a side call between increment and
- * read-back. */
-void build_g12_addr_taken_local_across_call(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId echo = build_g_echo_helper(ctx, "g12_echo");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  cgtest_store_local(tf, x, IMM_op(17, I32), I32);
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(p, T_ptr(ctx, I32)), LOCAL_op(x, I32));
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(val, I32), IND_op(p, 0, I32), ma);
-  T->binop(T, BO_IADD, REG_op(val, I32), REG_op(val, I32), IMM_op(1, I32));
-  T->store(T, IND_op(p, 0, I32), REG_op(val, I32), ma);
-
-  /* intervening call — must not corrupt the local or its address. */
-  Reg ignored = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 99}};
-  cgtest_call(tf, echo, I32, params, args, 1, REG_op(ignored, I32));
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(out, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* g13_call_in_loop_induction — for(i=0;i<10;i++) s += id(i); → 45.
- * Built on flat cmp_branch + jump, no SCOPE_LOOP. The induction var
- * lives in a frame slot (FSF_NONE) and is reloaded after the call. */
-void build_g13_call_in_loop_induction(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  ObjSymId id = build_g_echo_helper(ctx, "g13_id");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot islot = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot sslot = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, islot, IMM_op(0, I32), I32);
-  cgtest_store_local(tf, sslot, IMM_op(0, I32), I32);
-
-  Label top = T->label_new(T);
-  Label done = T->label_new(T);
-  T->label_place(T, top);
-  Reg ireg = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ireg, I32), islot, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ireg, I32), IMM_op(10, I32), done);
-
-  /* res = id(i); */
-  Reg res = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = ireg}};
-  cgtest_call(tf, id, I32, params, args, 1, REG_op(res, I32));
-
-  /* s += res; */
-  Reg sreg = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(sreg, I32), sslot, I32);
-  T->binop(T, BO_IADD, REG_op(sreg, I32), REG_op(sreg, I32), REG_op(res, I32));
-  cgtest_store_local(tf, sslot, REG_op(sreg, I32), I32);
-
-  /* i++; jump top. */
-  Reg inew = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(inew, I32), islot, I32);
-  T->binop(T, BO_IADD, REG_op(inew, I32), REG_op(inew, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, islot, REG_op(inew, I32), I32);
-  T->jump(T, top);
-
-  T->label_place(T, done);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), sslot, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_h.c b/test/cg/harness/cases_h.c
@@ -1,655 +0,0 @@
-/* Group H — control flow.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group H: control flow
- *
- * Loops and multi-way branch beyond Group D's scope_if/scope_else.
- * Loops use SCOPE_LOOP with explicit break/continue labels — the
- * caller places continue at the appropriate point (top for while,
- * after-body-before-incr for for-loops). Switches lower to chained
- * cmp_branch + jump (no dedicated switch op). Short-circuit && / ||
- * are exercised by observing that the RHS side effect did not run.
- * ============================================================ */
-
-/* h01_while_sum_0_to_9 — int s=0,i=0; while(i<10){s+=i;i++;} return s; → 45. */
-void build_h01_while_sum_0_to_9(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label brk = T->label_new(T);
-  Label cnt = T->label_new(T);
-  CGScopeDesc d = {
-      .kind = SCOPE_LOOP, .break_label = brk, .continue_label = cnt};
-  CGScope sc = T->scope_begin(T, &d);
-  T->label_place(T, cnt);
-
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), IMM_op(10, I32), brk);
-
-  Reg sr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(sr, I32), ss, I32);
-  T->binop(T, BO_IADD, REG_op(sr, I32), REG_op(sr, I32), REG_op(ir, I32));
-  cgtest_store_local(tf, ss, REG_op(sr, I32), I32);
-
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, cnt);
-
-  T->label_place(T, brk);
-  T->scope_end(T, sc);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h02_do_while_once — int i=0; do { i=42; } while(0); return i; → 42. */
-void build_h02_do_while_once(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label brk = T->label_new(T);
-  Label cnt = T->label_new(T);
-  CGScopeDesc d = {
-      .kind = SCOPE_LOOP, .break_label = brk, .continue_label = cnt};
-  CGScope sc = T->scope_begin(T, &d);
-  T->label_place(T, cnt);
-
-  /* body: i = 42; */
-  cgtest_store_local(tf, is, IMM_op(42, I32), I32);
-
-  /* condition: while (0) — never taken. */
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(c, I32), 0);
-  T->cmp_branch(T, CMP_NE, REG_op(c, I32), IMM_op(0, I32), cnt);
-
-  T->label_place(T, brk);
-  T->scope_end(T, sc);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), is, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h03_for_count_to_10 — for(i=1;i<=10;i++) s+=i; → 55. */
-void build_h03_for_count_to_10(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-  cgtest_store_local(tf, is, IMM_op(1, I32), I32);
-
-  Label brk = T->label_new(T);
-  Label cnt = T->label_new(T); /* increment site */
-  Label top = T->label_new(T); /* condition test */
-  CGScopeDesc d = {
-      .kind = SCOPE_LOOP, .break_label = brk, .continue_label = cnt};
-  CGScope sc = T->scope_begin(T, &d);
-
-  T->label_place(T, top);
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GT_S, REG_op(ir, I32), IMM_op(10, I32), brk);
-
-  /* body: s += i; */
-  Reg sr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(sr, I32), ss, I32);
-  T->binop(T, BO_IADD, REG_op(sr, I32), REG_op(sr, I32), REG_op(ir, I32));
-  cgtest_store_local(tf, ss, REG_op(sr, I32), I32);
-
-  /* increment: i++; */
-  T->label_place(T, cnt);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, top);
-
-  T->label_place(T, brk);
-  T->scope_end(T, sc);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h04_loop_break — for(i=0;;i++) if(i==42) break; return i; → 42. */
-void build_h04_loop_break(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label brk = T->label_new(T);
-  Label cnt = T->label_new(T);
-  CGScopeDesc d = {
-      .kind = SCOPE_LOOP, .break_label = brk, .continue_label = cnt};
-  CGScope sc = T->scope_begin(T, &d);
-  T->label_place(T, cnt);
-
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_EQ, REG_op(ir, I32), IMM_op(42, I32), brk);
-
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, cnt);
-
-  T->label_place(T, brk);
-  T->scope_end(T, sc);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), is, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h05_loop_continue — sum of even i in [0,20) using continue → 90. */
-void build_h05_loop_continue(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label brk = T->label_new(T);
-  Label cnt = T->label_new(T); /* the increment site */
-  Label top = T->label_new(T);
-  CGScopeDesc d = {
-      .kind = SCOPE_LOOP, .break_label = brk, .continue_label = cnt};
-  CGScope sc = T->scope_begin(T, &d);
-
-  T->label_place(T, top);
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), IMM_op(20, I32), brk);
-
-  /* if (i & 1) continue; — odd → skip add. */
-  Reg parity = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_AND, REG_op(parity, I32), REG_op(ir, I32), IMM_op(1, I32));
-  T->cmp_branch(T, CMP_NE, REG_op(parity, I32), IMM_op(0, I32), cnt);
-
-  /* s += i; */
-  Reg sr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(sr, I32), ss, I32);
-  T->binop(T, BO_IADD, REG_op(sr, I32), REG_op(sr, I32), REG_op(ir, I32));
-  cgtest_store_local(tf, ss, REG_op(sr, I32), I32);
-
-  T->label_place(T, cnt);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, top);
-
-  T->label_place(T, brk);
-  T->scope_end(T, sc);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h06_nested_loops — for(i=0;i<3;i++) for(j=0;j<2;j++) s++; → 6. */
-void build_h06_nested_loops(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot js = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label outer_brk = T->label_new(T);
-  Label outer_cnt = T->label_new(T);
-  CGScopeDesc d_o = {.kind = SCOPE_LOOP,
-                     .break_label = outer_brk,
-                     .continue_label = outer_cnt};
-  CGScope outer = T->scope_begin(T, &d_o);
-  T->label_place(T, outer_cnt);
-
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), IMM_op(3, I32), outer_brk);
-
-  cgtest_store_local(tf, js, IMM_op(0, I32), I32);
-  Label inner_brk = T->label_new(T);
-  Label inner_cnt = T->label_new(T);
-  CGScopeDesc d_i = {.kind = SCOPE_LOOP,
-                     .break_label = inner_brk,
-                     .continue_label = inner_cnt};
-  CGScope inner = T->scope_begin(T, &d_i);
-  T->label_place(T, inner_cnt);
-
-  Reg jr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(jr, I32), js, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(jr, I32), IMM_op(2, I32), inner_brk);
-
-  /* s++ */
-  Reg sr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(sr, I32), ss, I32);
-  T->binop(T, BO_IADD, REG_op(sr, I32), REG_op(sr, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, ss, REG_op(sr, I32), I32);
-
-  /* j++ */
-  T->binop(T, BO_IADD, REG_op(jr, I32), REG_op(jr, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, js, REG_op(jr, I32), I32);
-  T->jump(T, inner_cnt);
-  T->label_place(T, inner_brk);
-  T->scope_end(T, inner);
-
-  /* i++ */
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, outer_cnt);
-  T->label_place(T, outer_brk);
-  T->scope_end(T, outer);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h07_break_inner_only — outer counts 3 iterations, inner breaks after
- * incrementing s by 3 each time → s = 9. */
-void build_h07_break_inner_only(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label outer_brk = T->label_new(T);
-  Label outer_cnt = T->label_new(T);
-  CGScopeDesc d_o = {.kind = SCOPE_LOOP,
-                     .break_label = outer_brk,
-                     .continue_label = outer_cnt};
-  CGScope outer = T->scope_begin(T, &d_o);
-  T->label_place(T, outer_cnt);
-
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), IMM_op(3, I32), outer_brk);
-
-  /* inner loop: j counts 0..2; the inner break fires once j reaches 3,
-   * so each outer iteration adds 3 to s. */
-  FrameSlot js = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, js, IMM_op(0, I32), I32);
-  Label inner_brk = T->label_new(T);
-  Label inner_cnt = T->label_new(T);
-  CGScopeDesc d_i = {.kind = SCOPE_LOOP,
-                     .break_label = inner_brk,
-                     .continue_label = inner_cnt};
-  CGScope inner = T->scope_begin(T, &d_i);
-  T->label_place(T, inner_cnt);
-
-  Reg jr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(jr, I32), js, I32);
-  /* if (j >= 3) inner-break */
-  T->cmp_branch(T, CMP_GE_S, REG_op(jr, I32), IMM_op(3, I32), inner_brk);
-
-  /* s++ */
-  Reg sr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(sr, I32), ss, I32);
-  T->binop(T, BO_IADD, REG_op(sr, I32), REG_op(sr, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, ss, REG_op(sr, I32), I32);
-
-  /* j++ */
-  T->binop(T, BO_IADD, REG_op(jr, I32), REG_op(jr, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, js, REG_op(jr, I32), I32);
-  T->jump(T, inner_cnt);
-  T->label_place(T, inner_brk);
-  T->scope_end(T, inner);
-
-  /* outer must continue past the inner break — i++ */
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, outer_cnt);
-  T->label_place(T, outer_brk);
-  T->scope_end(T, outer);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h08_early_return_in_loop — for(i=0;;i++) if(i==17) return i; → 17. */
-void build_h08_early_return_in_loop(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label brk = T->label_new(T);
-  Label cnt = T->label_new(T);
-  Label hit = T->label_new(T);
-  CGScopeDesc d = {
-      .kind = SCOPE_LOOP, .break_label = brk, .continue_label = cnt};
-  CGScope sc = T->scope_begin(T, &d);
-  T->label_place(T, cnt);
-
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_EQ, REG_op(ir, I32), IMM_op(17, I32), hit);
-
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, cnt);
-
-  T->label_place(T, hit);
-  cgtest_ret_reg(tf, ir, I32);
-  /* the rest is dead. */
-  T->label_place(T, brk);
-  T->scope_end(T, sc);
-  cgtest_ret_imm(tf, 0, I32);
-  cgtest_end(tf);
-}
-
-/* h09_switch_three_cases — switch(2) {case 1:r=10;break; case 2:r=42;break;
- *                                       case 3:r=99;break;} → 42. */
-void build_h09_switch_three_cases(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot rs = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, rs, IMM_op(0, I32), I32);
-
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(val, I32), 2);
-
-  Label l1 = T->label_new(T), l2 = T->label_new(T), l3 = T->label_new(T);
-  Label end = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(val, I32), IMM_op(1, I32), l1);
-  T->cmp_branch(T, CMP_EQ, REG_op(val, I32), IMM_op(2, I32), l2);
-  T->cmp_branch(T, CMP_EQ, REG_op(val, I32), IMM_op(3, I32), l3);
-  T->jump(T, end);
-
-  T->label_place(T, l1);
-  cgtest_store_local(tf, rs, IMM_op(10, I32), I32);
-  T->jump(T, end);
-  T->label_place(T, l2);
-  cgtest_store_local(tf, rs, IMM_op(42, I32), I32);
-  T->jump(T, end);
-  T->label_place(T, l3);
-  cgtest_store_local(tf, rs, IMM_op(99, I32), I32);
-  T->jump(T, end);
-
-  T->label_place(T, end);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), rs, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h10_switch_fallthrough — switch(1){case 1: r+=10; case 2: r+=20;} → 30. */
-void build_h10_switch_fallthrough(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot rs = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, rs, IMM_op(0, I32), I32);
-
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(val, I32), 1);
-
-  Label l1 = T->label_new(T), l2 = T->label_new(T);
-  Label end = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(val, I32), IMM_op(1, I32), l1);
-  T->cmp_branch(T, CMP_EQ, REG_op(val, I32), IMM_op(2, I32), l2);
-  T->jump(T, end);
-
-  T->label_place(T, l1);
-  {
-    Reg r = T->alloc_reg(T, RC_INT, I32);
-    cgtest_load_local(tf, REG_op(r, I32), rs, I32);
-    T->binop(T, BO_IADD, REG_op(r, I32), REG_op(r, I32), IMM_op(10, I32));
-    cgtest_store_local(tf, rs, REG_op(r, I32), I32);
-  }
-  /* no break — fall through. */
-  T->label_place(T, l2);
-  {
-    Reg r = T->alloc_reg(T, RC_INT, I32);
-    cgtest_load_local(tf, REG_op(r, I32), rs, I32);
-    T->binop(T, BO_IADD, REG_op(r, I32), REG_op(r, I32), IMM_op(20, I32));
-    cgtest_store_local(tf, rs, REG_op(r, I32), I32);
-  }
-  T->jump(T, end);
-
-  T->label_place(T, end);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), rs, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h11_switch_default — switch(99){case 1:r=10;break; default:r=7;} → 7. */
-void build_h11_switch_default(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot rs = cgtest_local(tf, I32, FSF_NONE);
-
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(val, I32), 99);
-
-  Label l1 = T->label_new(T), ldef = T->label_new(T), end = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(val, I32), IMM_op(1, I32), l1);
-  T->jump(T, ldef);
-
-  T->label_place(T, l1);
-  cgtest_store_local(tf, rs, IMM_op(10, I32), I32);
-  T->jump(T, end);
-  T->label_place(T, ldef);
-  cgtest_store_local(tf, rs, IMM_op(7, I32), I32);
-  T->jump(T, end);
-
-  T->label_place(T, end);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), rs, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h12_jump_forward — jump L; ret 99 (dead); L: ret 42; → 42. */
-void build_h12_jump_forward(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Label L = T->label_new(T);
-  T->jump(T, L);
-  cgtest_ret_imm(tf, 99, I32); /* dead */
-  T->label_place(T, L);
-  cgtest_ret_imm(tf, 42, I32);
-  cgtest_end(tf);
-}
-
-/* h13_jump_backward — counter loop entirely from cmp_branch + backward
- * jump (no SCOPE_LOOP). Loops until i == 10, then returns i → 10. */
-void build_h13_jump_backward(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label top = T->label_new(T);
-  Label end = T->label_new(T);
-  T->label_place(T, top);
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), IMM_op(10, I32), end);
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, top);
-
-  T->label_place(T, end);
-  cgtest_ret_reg(tf, ir, I32);
-  cgtest_end(tf);
-}
-
-/* h14_short_circuit_and_skip — `int s=0; (0) && (s=99,1); return s;` → 0.
- * The RHS side effect must NOT execute when the LHS is 0. */
-void build_h14_short_circuit_and_skip(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-
-  Reg lhs = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(lhs, I32), 0);
-
-  Label rhs = T->label_new(T);
-  Label after = T->label_new(T);
-  /* if (lhs != 0) goto rhs; else jump to "after", skipping the RHS. */
-  T->cmp_branch(T, CMP_NE, REG_op(lhs, I32), IMM_op(0, I32), rhs);
-  T->jump(T, after);
-
-  T->label_place(T, rhs);
-  /* RHS side effect: s = 99 (must not run). */
-  cgtest_store_local(tf, ss, IMM_op(99, I32), I32);
-
-  T->label_place(T, after);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h15_short_circuit_or_skip — `int s=0; (1) || (s=99,1); return s;` → 0. */
-void build_h15_short_circuit_or_skip(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-
-  Reg lhs = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(lhs, I32), 1);
-
-  Label after = T->label_new(T);
-  /* if (lhs != 0) skip RHS. */
-  T->cmp_branch(T, CMP_NE, REG_op(lhs, I32), IMM_op(0, I32), after);
-
-  /* RHS side effect (must not run). */
-  cgtest_store_local(tf, ss, IMM_op(99, I32), I32);
-
-  T->label_place(T, after);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h16_ternary — int x = (5 > 3) ? 42 : 7; return x; → 42. Uses scope_if. */
-void build_h16_ternary(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot xs = cgtest_local(tf, I32, FSF_NONE);
-
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(a, I32), 5);
-  T->load_imm(T, REG_op(b, I32), 3);
-  T->cmp(T, CMP_GT_S, REG_op(c, I32), REG_op(a, I32), REG_op(b, I32));
-  CGScopeDesc desc = {.kind = SCOPE_IF, .cond = REG_op(c, I32)};
-  CGScope s = T->scope_begin(T, &desc);
-  cgtest_store_local(tf, xs, IMM_op(42, I32), I32);
-  T->scope_else(T, s);
-  cgtest_store_local(tf, xs, IMM_op(7, I32), I32);
-  T->scope_end(T, s);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), xs, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h17_ternary_side_effect_one_arm — int s=0; (1)?(s=42):(s=99); return s; → 42.
- * Only the taken arm runs. Uses cmp_branch + flat labels (no scope). */
-void build_h17_ternary_side_effect_one_arm(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(c, I32), 1);
-
-  Label then_l = T->label_new(T);
-  Label end = T->label_new(T);
-  T->cmp_branch(T, CMP_NE, REG_op(c, I32), IMM_op(0, I32), then_l);
-  /* else arm */
-  cgtest_store_local(tf, ss, IMM_op(99, I32), I32);
-  T->jump(T, end);
-  T->label_place(T, then_l);
-  cgtest_store_local(tf, ss, IMM_op(42, I32), I32);
-  T->label_place(T, end);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* h18_unreachable_after_ret — emit a scalar ret followed by additional
- * (unreachable) ops; backend must tolerate the dead tail. */
-void build_h18_unreachable_after_ret(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  cgtest_ret_imm(tf, 42, I32);
-
-  /* Dead instructions — should not execute, but the emitter must accept them.
-   */
-  Reg dead = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(dead, I32), 99);
-  cgtest_ret_reg(tf, dead, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_i.c b/test/cg/harness/cases_i.c
@@ -1,435 +0,0 @@
-/* Group I — alloca / VLA.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group I: alloca / VLA
- *
- * Stack-allocated runtime-sized memory: alloca with const and runtime
- * sizes, alignment, in-loop distinctness, and crossing a call boundary
- * with the alloca'd pointer. The alloca op signature is
- *   alloca_(target, dst REG, size Operand, align).
- * ============================================================ */
-
-/* i01_alloca_const_int — int *p = alloca(4); *p = 42; return *p. */
-void build_i01_alloca_const_int(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(4, I64), 4);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(p, 0, I32), IMM_op(42, I32), ma);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* i02_alloca_runtime_size — int n=5; int *p = alloca(n*4); fill 1..5; sum=15.
- */
-void build_i02_alloca_runtime_size(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* size_bytes = 5 * 4. Use I64 to match alloca's size operand. */
-  Reg sz = T->alloc_reg(T, RC_INT, I64);
-  T->load_imm(T, REG_op(sz, I64), 20);
-
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), REG_op(sz, I64), 4);
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  /* p[0..4] = 1..5 */
-  for (int i = 0; i < 5; ++i) {
-    T->store(T, IND_op(p, (i32)(i * 4), I32), IMM_op(i + 1, I32), ma);
-  }
-  /* sum */
-  Reg acc = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(acc, I32), 0);
-  for (int i = 0; i < 5; ++i) {
-    Reg v = T->alloc_reg(T, RC_INT, I32);
-    T->load(T, REG_op(v, I32), IND_op(p, (i32)(i * 4), I32), ma);
-    T->binop(T, BO_IADD, REG_op(acc, I32), REG_op(acc, I32), REG_op(v, I32));
-  }
-  cgtest_ret_reg(tf, acc, I32);
-  cgtest_end(tf);
-}
-
-/* i03_alloca_align_16 — alloca(16, align=16); return ((p & 0xF) == 0). */
-void build_i03_alloca_align_16(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PV = T_ptr_void(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, PV);
-  T->alloca_(T, REG_op(p, PV), IMM_op(16, I64), 16);
-
-  /* low_bits = p & 0xF */
-  Reg lb = T->alloc_reg(T, RC_INT, I64);
-  T->binop(T, BO_AND, REG_op(lb, I64), REG_op(p, I64), IMM_op(0xF, I64));
-
-  /* result = (low_bits == 0) */
-  Reg d = T->alloc_reg(T, RC_INT, I32);
-  T->cmp(T, CMP_EQ, REG_op(d, I32), REG_op(lb, I64), IMM_op(0, I64));
-  cgtest_ret_reg(tf, d, I32);
-  cgtest_end(tf);
-}
-
-/* i04_alloca_in_loop_distinct — three alloca(4)s in a loop; return
- * (a != b && b != c). Addresses must differ across iterations. */
-void build_i04_alloca_in_loop_distinct(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Three slots to record the alloca'd addresses. */
-  FrameSlot a = cgtest_local(tf, PI32, FSF_NONE);
-  FrameSlot b = cgtest_local(tf, PI32, FSF_NONE);
-  FrameSlot c = cgtest_local(tf, PI32, FSF_NONE);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label brk = T->label_new(T);
-  Label cnt = T->label_new(T);
-  CGScopeDesc d = {
-      .kind = SCOPE_LOOP, .break_label = brk, .continue_label = cnt};
-  CGScope sc = T->scope_begin(T, &d);
-  T->label_place(T, cnt);
-
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), IMM_op(3, I32), brk);
-
-  /* p = alloca(4) */
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(4, I64), 4);
-
-  /* select destination slot by i. */
-  Label sa = T->label_new(T), sb = T->label_new(T), sc_l = T->label_new(T);
-  Label after_store = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(ir, I32), IMM_op(0, I32), sa);
-  T->cmp_branch(T, CMP_EQ, REG_op(ir, I32), IMM_op(1, I32), sb);
-  T->jump(T, sc_l);
-
-  T->label_place(T, sa);
-  cgtest_store_local(tf, a, REG_op(p, PI32), PI32);
-  T->jump(T, after_store);
-  T->label_place(T, sb);
-  cgtest_store_local(tf, b, REG_op(p, PI32), PI32);
-  T->jump(T, after_store);
-  T->label_place(T, sc_l);
-  cgtest_store_local(tf, c, REG_op(p, PI32), PI32);
-  T->label_place(T, after_store);
-
-  /* i++ */
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, cnt);
-  T->label_place(T, brk);
-  T->scope_end(T, sc);
-
-  /* return (a != b) & (b != c) */
-  Reg ra = T->alloc_reg(T, RC_INT, PI32);
-  Reg rb = T->alloc_reg(T, RC_INT, PI32);
-  Reg rc = T->alloc_reg(T, RC_INT, PI32);
-  cgtest_load_local(tf, REG_op(ra, PI32), a, PI32);
-  cgtest_load_local(tf, REG_op(rb, PI32), b, PI32);
-  cgtest_load_local(tf, REG_op(rc, PI32), c, PI32);
-
-  Reg ne1 = T->alloc_reg(T, RC_INT, I32);
-  Reg ne2 = T->alloc_reg(T, RC_INT, I32);
-  Reg both = T->alloc_reg(T, RC_INT, I32);
-  T->cmp(T, CMP_NE, REG_op(ne1, I32), REG_op(ra, PI32), REG_op(rb, PI32));
-  T->cmp(T, CMP_NE, REG_op(ne2, I32), REG_op(rb, PI32), REG_op(rc, PI32));
-  T->binop(T, BO_AND, REG_op(both, I32), REG_op(ne1, I32), REG_op(ne2, I32));
-  cgtest_ret_reg(tf, both, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by i05: void fill(int *p, int v) { *p = v; } — same shape as
- * g07 but a separate symbol so the cases don't share state. */
-static ObjSymId build_i05_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* VOID = T_void(ctx);
-  const Type* params[] = {PI32, I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "i05_fill", VOID, params, 2);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  Reg v = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(p, PI32), cgtest_param_slot(tf, 0), PI32);
-  cgtest_load_local(tf, REG_op(v, I32), cgtest_param_slot(tf, 1), I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(p, 0, I32), REG_op(v, I32), ma);
-  cgtest_ret_void(tf);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* i05_alloca_then_call — alloca buf; helper writes 42; load and return. */
-void build_i05_alloca_then_call(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* VOID = T_void(ctx);
-  const Type* params[] = {PI32, I32};
-  ObjSymId fill = build_i05_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(4, I64), 4);
-
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_REG, .type = PI32, .v.reg = p},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 42},
-  };
-  cgtest_call(tf, fill, VOID, params, args, 2, IMM_op(0, VOID));
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* i06_two_allocas_disjoint — *p=1; *q=2; return *p + *q → 3. */
-void build_i06_two_allocas_disjoint(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  Reg q = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(4, I64), 4);
-  T->alloca_(T, REG_op(q, PI32), IMM_op(4, I64), 4);
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(p, 0, I32), IMM_op(1, I32), ma);
-  T->store(T, IND_op(q, 0, I32), IMM_op(2, I32), ma);
-
-  Reg vp = T->alloc_reg(T, RC_INT, I32);
-  Reg vq = T->alloc_reg(T, RC_INT, I32);
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(vp, I32), IND_op(p, 0, I32), ma);
-  T->load(T, REG_op(vq, I32), IND_op(q, 0, I32), ma);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(vp, I32), REG_op(vq, I32));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-}
-
-/* i07_alloca_addr_escapes — alloca'd pointer round-trips through an
- * addr-taken local int**, then is dereferenced to write 42. The escape
- * forces the alloca's pointer to be a real value, not a register-only
- * temporary the optimizer could fold away. */
-void build_i07_alloca_addr_escapes(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* PPI32 = T_ptr(ctx, PI32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* int *holder; */
-  FrameSlot holder = cgtest_local(tf, PI32, FSF_ADDR_TAKEN);
-
-  /* p = alloca(4); */
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(4, I64), 4);
-
-  /* holder = p; */
-  cgtest_store_local(tf, holder, REG_op(p, PI32), PI32);
-
-  /* int **pp = &holder; */
-  Reg pp = T->alloc_reg(T, RC_INT, PPI32);
-  T->addr_of(T, REG_op(pp, PPI32), LOCAL_op(holder, PI32));
-
-  /* int *back = *pp; *back = 42; return *back; */
-  MemAccess ma_p = {
-      .type = PI32, .size = 8, .align = 8, .alias.kind = ALIAS_LOCAL};
-  Reg back = T->alloc_reg(T, RC_INT, PI32);
-  T->load(T, REG_op(back, PI32), IND_op(pp, 0, PI32), ma_p);
-
-  MemAccess ma_i = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(back, 0, I32), IMM_op(42, I32), ma_i);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(back, 0, I32), ma_i);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* helper used by i08: int sum(int n, int *p) — n must be > 0. */
-static ObjSymId build_i08_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* params[] = {I32, PI32};
-  CgTestFn* tf = cgtest_begin_func(ctx, "i08_sum", I32, params, 2);
-  CGTarget* T = ctx->target;
-
-  Reg n = T->alloc_reg(T, RC_INT, I32);
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  cgtest_load_local(tf, REG_op(n, I32), cgtest_param_slot(tf, 0), I32);
-  cgtest_load_local(tf, REG_op(p, PI32), cgtest_param_slot(tf, 1), PI32);
-
-  /* int s=0; for (i=0;i<n;i++) s += p[i]; */
-  FrameSlot ss = cgtest_local(tf, I32, FSF_NONE);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, I32), I32);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label top = T->label_new(T);
-  Label end = T->label_new(T);
-  T->label_place(T, top);
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), REG_op(n, I32), end);
-
-  /* offset_bytes = i * 4 */
-  Reg ofs = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_SHL, REG_op(ofs, I32), REG_op(ir, I32), IMM_op(2, I32));
-  /* p_i = p + offset (use I64 ptr arith) */
-  Reg pi = T->alloc_reg(T, RC_INT, PI32);
-  T->binop(T, BO_IADD, REG_op(pi, PI32), REG_op(p, PI32), REG_op(ofs, I32));
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  Reg v = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(v, I32), IND_op(pi, 0, I32), ma);
-
-  Reg sr = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(sr, I32), ss, I32);
-  T->binop(T, BO_IADD, REG_op(sr, I32), REG_op(sr, I32), REG_op(v, I32));
-  cgtest_store_local(tf, ss, REG_op(sr, I32), I32);
-
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, top);
-
-  T->label_place(T, end);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(out, I32), ss, I32);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* i08_vla_param_sum — alloca 9 ints, fill 1..9, helper sums → 45. */
-void build_i08_vla_param_sum(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* params[] = {I32, PI32};
-  ObjSymId sum = build_i08_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* alloca 9*4 = 36 bytes (round up to 40 for 8B align is fine). */
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(36, I64), 4);
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  for (int i = 0; i < 9; ++i) {
-    T->store(T, IND_op(p, (i32)(i * 4), I32), IMM_op(i + 1, I32), ma);
-  }
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 9},
-      {.kind = CGT_ARG_REG, .type = PI32, .v.reg = p},
-  };
-  cgtest_call(tf, sum, I32, params, args, 2, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* i09_alloca_preserves_locals — named locals declared before *and* after
- * an alloca remain readable; the alloca must not overlap their slots. */
-void build_i09_alloca_preserves_locals(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* int x = 17 (declared before alloca). */
-  FrameSlot x = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, x, IMM_op(17, I32), I32);
-
-  /* alloca 4 bytes. */
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(4, I64), 4);
-
-  /* int y = 25 (declared after alloca). */
-  FrameSlot y = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, y, IMM_op(25, I32), I32);
-
-  /* Touch the alloca'd memory so it isn't dead. */
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(p, 0, I32), IMM_op(99, I32), ma);
-
-  /* return x + y → 42. */
-  Reg rx = T->alloc_reg(T, RC_INT, I32);
-  Reg ry = T->alloc_reg(T, RC_INT, I32);
-  Reg rs = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(rx, I32), x, I32);
-  cgtest_load_local(tf, REG_op(ry, I32), y, I32);
-  T->binop(T, BO_IADD, REG_op(rs, I32), REG_op(rx, I32), REG_op(ry, I32));
-  cgtest_ret_reg(tf, rs, I32);
-  cgtest_end(tf);
-}
-
-/* i10_alloca_after_named_local — frame layout must keep both addressable
- * even when the named local is addr-taken. Same expected value as i09, but the
- * named local has FSF_ADDR_TAKEN so the backend must place it in the
- * fixed-frame region, not in the dynamic alloca region. */
-void build_i10_alloca_after_named_local(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  cgtest_store_local(tf, x, IMM_op(42, I32), I32);
-
-  /* Take the address of x BEFORE the alloca. */
-  Reg px = T->alloc_reg(T, RC_INT, PI32);
-  T->addr_of(T, REG_op(px, PI32), LOCAL_op(x, I32));
-
-  /* alloca; must not invalidate &x. */
-  Reg dyn = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(dyn, PI32), IMM_op(8, I64), 4);
-
-  /* Reload via the saved &x. */
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(px, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_j.c b/test/cg/harness/cases_j.c
@@ -1,573 +0,0 @@
-/* Group J — varargs.
- * See CORPUS.md for the case list and expected values. */
-
-#include <string.h>
-
-#include "cg_test.h"
-#include "core/arena.h"
-#include "core/pool.h"
-
-/* ============================================================
- * Group J: varargs
- *
- * Drives va_start_/va_arg_/va_end_/va_copy_ on CGTarget plus the ABI's
- * variadic classification (abi_func_info on a type_func with variadic=1
- * carries vararg_gp_offset/vararg_fp_offset/vararg_overflow_offset). The
- * standard cgtest_begin_func/cgtest_call helpers hardcode variadic=0 so
- * this file mirrors the variadic-aware paths locally.
- *
- * Each test_main calls a variadic helper. The helper allocates an `ap`
- * local of abi_va_list_type size (FSF_ADDR_TAKEN), invokes va_start_
- * with &ap, runs n va_arg_'s, calls va_end_, and returns the
- * accumulator.
- * ============================================================ */
-
-/* ---- variadic-aware helpers ---- */
-
-/* Mirrors cgtest_begin_func_at but builds fn_type with variadic=1.
- * The caller passes the count of fixed (named) params; var args are
- * appended at the call site. */
-static CgTestFn* j_begin_va_func(CgTestCtx* ctx, const char* name,
-                                 const Type* ret_ty,
-                                 const Type* const* fixed_param_types,
-                                 u32 nfixed) {
-  CgTestFn* tf = arena_new(ctx->c->tu, CgTestFn);
-  memset(tf, 0, sizeof *tf);
-  tf->ctx = ctx;
-  tf->ret_ty = ret_ty;
-
-  const Type** ptypes = NULL;
-  if (nfixed) {
-    ptypes = arena_array(ctx->c->tu, const Type*, nfixed);
-    for (u32 i = 0; i < nfixed; ++i) ptypes[i] = fixed_param_types[i];
-  }
-  tf->fn_type = type_func(ctx->pool, ret_ty, ptypes, (u16)nfixed, 1);
-  tf->abi_info = abi_func_info(ctx->c->abi, tf->fn_type);
-  tf->sym = cgtest_decl_func(ctx, name);
-
-  CGParamDesc* pds = NULL;
-  if (nfixed) {
-    tf->params = arena_array(ctx->c->tu, CgTestParam, nfixed);
-    memset(tf->params, 0, sizeof(CgTestParam) * nfixed);
-    pds = arena_array(ctx->c->tu, CGParamDesc, nfixed);
-    memset(pds, 0, sizeof(CGParamDesc) * nfixed);
-    for (u32 i = 0; i < nfixed; ++i) {
-      tf->params[i].type = ptypes[i];
-      tf->params[i].abi = &tf->abi_info->params[i];
-      pds[i].index = i;
-      pds[i].type = ptypes[i];
-      pds[i].slot = FRAME_SLOT_NONE;
-      pds[i].abi = &tf->abi_info->params[i];
-      pds[i].incoming = tf->abi_info->params[i].parts;
-      pds[i].nincoming = tf->abi_info->params[i].nparts;
-    }
-  }
-  tf->nparams = nfixed;
-
-  tf->fd.sym = tf->sym;
-  tf->fd.text_section_id = ctx->text_sec;
-  tf->fd.group_id = OBJ_GROUP_NONE;
-  tf->fd.fn_type = tf->fn_type;
-  tf->fd.abi = tf->abi_info;
-  tf->fd.params = pds;
-  tf->fd.nparams = nfixed;
-
-  ctx->target->func_begin(ctx->target, &tf->fd);
-
-  for (u32 i = 0; i < nfixed; ++i) {
-    FrameSlotDesc fsd = {
-        .type = ptypes[i],
-        .size = abi_sizeof(ctx->c->abi, ptypes[i]),
-        .align = abi_alignof(ctx->c->abi, ptypes[i]),
-        .kind = FS_PARAM,
-        .flags = FSF_NONE,
-    };
-    FrameSlot s = ctx->target->frame_slot(ctx->target, &fsd);
-    tf->params[i].slot = s;
-    pds[i].slot = s;
-    ctx->target->param(ctx->target, &pds[i]);
-  }
-  return tf;
-}
-
-/* Direct call to a variadic callee. fn_type built with variadic=1; the
- * abi info reports per-arg classification including the ABI's
- * vararg-vs-fixed split. */
-static void j_call_va(CgTestFn* caller, ObjSymId callee_sym, const Type* ret_ty,
-                      const Type* const* arg_types, const CgTestArg* args,
-                      u32 nargs, u32 nfixed, Operand ret_storage) {
-  CgTestCtx* ctx = caller->ctx;
-  const Type** ptypes = NULL;
-  if (nargs) {
-    ptypes = arena_array(ctx->c->tu, const Type*, nargs);
-    for (u32 i = 0; i < nargs; ++i) ptypes[i] = arg_types[i];
-  }
-  /* type_func with variadic=1; nparams is the fixed count. nfixed must
-   * match the helper's named-param count even though we pass nargs
-   * Type pointers — abi_func_info reads its variadic flag from the
-   * Type and handles per-arg classification via ABIFuncInfo.params[]. */
-  const Type* fn_ty = type_func(ctx->pool, ret_ty, ptypes, (u16)nfixed, 1);
-  const ABIFuncInfo* info = abi_func_info(ctx->c->abi, fn_ty);
-
-  CGABIValue* avs = NULL;
-  if (nargs) {
-    avs = arena_array(ctx->c->tu, CGABIValue, nargs);
-    memset(avs, 0, sizeof(CGABIValue) * nargs);
-    for (u32 i = 0; i < nargs; ++i) {
-      CGABIValue* av = &avs[i];
-      av->type = arg_types[i];
-      av->abi = (i < info->nparams) ? &info->params[i] : NULL;
-      switch (args[i].kind) {
-        case CGT_ARG_IMM:
-          av->storage = IMM_op(args[i].v.imm, arg_types[i]);
-          break;
-        case CGT_ARG_REG:
-          av->storage = REG_op(args[i].v.reg, arg_types[i]);
-          break;
-        default:
-          av->storage = LOCAL_op(args[i].v.slot, arg_types[i]);
-          break;
-      }
-    }
-  }
-
-  CGCallDesc desc;
-  memset(&desc, 0, sizeof desc);
-  desc.fn_type = fn_ty;
-  desc.abi = info;
-  desc.callee = GLOBAL_op(callee_sym, 0);
-  desc.args = avs;
-  desc.nargs = nargs;
-  desc.ret.type = ret_ty;
-  desc.ret.abi = &info->ret;
-  desc.ret.storage = ret_storage;
-  ctx->target->call(ctx->target, &desc);
-}
-
-/* ---- shared helpers ---- */
-
-/* Allocate an ap local of abi_va_list_type and addr_of into a register. */
-typedef struct VaApRegs {
-  FrameSlot slot;
-  Reg ap_addr;
-  const Type* ap_ty;
-} VaApRegs;
-
-static VaApRegs j_alloc_ap(CgTestFn* tf) {
-  CgTestCtx* ctx = tf->ctx;
-  const Type* ap_ty = abi_va_list_type(ctx->c->abi, ctx->pool);
-  const Type* ap_pty = T_ptr(ctx, ap_ty);
-  FrameSlot ap_slot = cgtest_local(tf, ap_ty, FSF_ADDR_TAKEN);
-  Reg ap_addr = ctx->target->alloc_reg(ctx->target, RC_INT, ap_pty);
-  ctx->target->addr_of(ctx->target, REG_op(ap_addr, ap_pty),
-                       LOCAL_op(ap_slot, ap_ty));
-  return (VaApRegs){ap_slot, ap_addr, ap_ty};
-}
-
-/* Build helper: int sum(int n, ...) { va_start(ap); int s=0; for(i=0;i<n;i++)
- * s += va_arg(ap, T); va_end(ap); return s; } — T is the va_arg type. */
-static ObjSymId j_build_int_sum_helper(CgTestCtx* ctx, const char* name,
-                                       const Type* va_ty, const Type* acc_ty) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = j_begin_va_func(ctx, name, acc_ty, params, 1);
-  CGTarget* T = ctx->target;
-
-  Reg n = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(n, I32), cgtest_param_slot(tf, 0), I32);
-
-  VaApRegs ap = j_alloc_ap(tf);
-  T->va_start_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-
-  /* Accumulator starts at 0. */
-  FrameSlot ss = cgtest_local(tf, acc_ty, FSF_NONE);
-  cgtest_store_local(tf, ss, IMM_op(0, acc_ty), acc_ty);
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label top = T->label_new(T);
-  Label end = T->label_new(T);
-  T->label_place(T, top);
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), REG_op(n, I32), end);
-
-  Reg v = T->alloc_reg(T, RC_INT, va_ty);
-  T->va_arg_(T, REG_op(v, va_ty), REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)),
-             va_ty);
-
-  Reg sr = T->alloc_reg(T, RC_INT, acc_ty);
-  cgtest_load_local(tf, REG_op(sr, acc_ty), ss, acc_ty);
-  T->binop(T, BO_IADD, REG_op(sr, acc_ty), REG_op(sr, acc_ty),
-           REG_op(v, va_ty));
-  cgtest_store_local(tf, ss, REG_op(sr, acc_ty), acc_ty);
-
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, top);
-  T->label_place(T, end);
-
-  T->va_end_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-  Reg out = T->alloc_reg(T, RC_INT, acc_ty);
-  cgtest_load_local(tf, REG_op(out, acc_ty), ss, acc_ty);
-  cgtest_ret_reg(tf, out, acc_ty);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* Build helper: int sumd(int n, ...) — fp accumulator, ftoi_s before return. */
-static ObjSymId j_build_double_sum_helper(CgTestCtx* ctx, const char* name) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F64 = T_f64(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = j_begin_va_func(ctx, name, I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  Reg n = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(n, I32), cgtest_param_slot(tf, 0), I32);
-
-  VaApRegs ap = j_alloc_ap(tf);
-  T->va_start_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-
-  FrameSlot ss = cgtest_local(tf, F64, FSF_NONE);
-  Reg zero = T->alloc_reg(T, RC_FP, F64);
-  /* Materialize 0.0 by converting an integer zero (simpler than a bitcast). */
-  Reg iz = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(iz, I32), 0);
-  T->convert(T, CV_ITOF_S, REG_op(zero, F64), REG_op(iz, I32));
-  cgtest_store_local(tf, ss, REG_op(zero, F64), F64);
-
-  FrameSlot is = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, is, IMM_op(0, I32), I32);
-
-  Label top = T->label_new(T);
-  Label end = T->label_new(T);
-  T->label_place(T, top);
-  Reg ir = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ir, I32), is, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ir, I32), REG_op(n, I32), end);
-
-  Reg v = T->alloc_reg(T, RC_FP, F64);
-  T->va_arg_(T, REG_op(v, F64), REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)), F64);
-
-  Reg sr = T->alloc_reg(T, RC_FP, F64);
-  cgtest_load_local(tf, REG_op(sr, F64), ss, F64);
-  T->binop(T, BO_FADD, REG_op(sr, F64), REG_op(sr, F64), REG_op(v, F64));
-  cgtest_store_local(tf, ss, REG_op(sr, F64), F64);
-
-  T->binop(T, BO_IADD, REG_op(ir, I32), REG_op(ir, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, is, REG_op(ir, I32), I32);
-  T->jump(T, top);
-  T->label_place(T, end);
-
-  T->va_end_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-  Reg final = T->alloc_reg(T, RC_FP, F64);
-  cgtest_load_local(tf, REG_op(final, F64), ss, F64);
-  Reg ir32 = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(ir32, I32), REG_op(final, F64));
-  cgtest_ret_reg(tf, ir32, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* ---- cases ---- */
-
-/* j01_va_int_sum_3 — sum(3, 1, 2, 3) → 6. */
-void build_j01_va_int_sum_3(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId sum = j_build_int_sum_helper(ctx, "j01_sum", I32, I32);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  const Type* atypes[] = {I32, I32, I32, I32};
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 3},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 1},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 2},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 3},
-  };
-  j_call_va(tf, sum, I32, atypes, args, 4, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* j02_va_zero_args — sum(0); va_start/va_end with no va_arg → 0. */
-void build_j02_va_zero_args(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId sum = j_build_int_sum_helper(ctx, "j02_sum", I32, I32);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  const Type* atypes[] = {I32};
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 0}};
-  j_call_va(tf, sum, I32, atypes, args, 1, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* j03_va_int_spill — sum(10, 1..10) → 55. Exhausts AArch64 GPR save area. */
-void build_j03_va_int_spill(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId sum = j_build_int_sum_helper(ctx, "j03_sum", I32, I32);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  const Type* atypes[11] = {I32, I32, I32, I32, I32, I32,
-                            I32, I32, I32, I32, I32};
-  CgTestArg args[11];
-  args[0] = (CgTestArg){.kind = CGT_ARG_IMM, .type = I32, .v.imm = 10};
-  for (int i = 0; i < 10; ++i) {
-    args[i + 1] = (CgTestArg){.kind = CGT_ARG_IMM, .type = I32, .v.imm = i + 1};
-  }
-  j_call_va(tf, sum, I32, atypes, args, 11, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* j04_va_int64 — sum_ll(2, 21LL, 21LL); low 32 of result → 42. */
-void build_j04_va_int64(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  ObjSymId sum = j_build_int_sum_helper(ctx, "j04_sum_ll", I64, I64);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r64 = T->alloc_reg(T, RC_INT, I64);
-  const Type* atypes[] = {I32, I64, I64};
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 2},
-      {.kind = CGT_ARG_IMM, .type = I64, .v.imm = 21},
-      {.kind = CGT_ARG_IMM, .type = I64, .v.imm = 21},
-  };
-  j_call_va(tf, sum, I64, atypes, args, 3, 1, REG_op(r64, I64));
-  /* Truncate to i32. */
-  Reg r32 = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_TRUNC, REG_op(r32, I32), REG_op(r64, I64));
-  cgtest_ret_reg(tf, r32, I32);
-  cgtest_end(tf);
-}
-
-/* ---- helpers for fp + double-arg passing ---- */
-
-/* Emit a load_const for a double-precision FP constant from raw little-endian
- * bytes; returns the FP reg. */
-static Reg j_load_f64(CgTestCtx* ctx, const u8* bytes_le8) {
-  const Type* F64 = T_f64(ctx);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_FP, F64);
-  ConstBytes cb = {.type = F64, .bytes = bytes_le8, .size = 8, .align = 8};
-  ctx->target->load_const(ctx->target, REG_op(r, F64), cb);
-  return r;
-}
-
-/* j05_va_double_sum — sumd(3, 1.5, 2.0, 3.5) → 7. */
-void build_j05_va_double_sum(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F64 = T_f64(ctx);
-  ObjSymId sumd = j_build_double_sum_helper(ctx, "j05_sumd");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  /* 1.5, 2.0, 3.5 as little-endian double bytes. */
-  static const u8 D15[8] = {0, 0, 0, 0, 0, 0, 0xF8, 0x3F};
-  static const u8 D20[8] = {0, 0, 0, 0, 0, 0, 0x00, 0x40};
-  static const u8 D35[8] = {0, 0, 0, 0, 0, 0, 0x0C, 0x40};
-  Reg r1 = j_load_f64(ctx, D15);
-  Reg r2 = j_load_f64(ctx, D20);
-  Reg r3 = j_load_f64(ctx, D35);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  const Type* atypes[] = {I32, F64, F64, F64};
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 3},
-      {.kind = CGT_ARG_REG, .type = F64, .v.reg = r1},
-      {.kind = CGT_ARG_REG, .type = F64, .v.reg = r2},
-      {.kind = CGT_ARG_REG, .type = F64, .v.reg = r3},
-  };
-  j_call_va(tf, sumd, I32, atypes, args, 4, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* j06_va_double_spill — sumd(9, 0.5×9) → 4 (after ftoi_s of 4.5). */
-void build_j06_va_double_spill(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F64 = T_f64(ctx);
-  ObjSymId sumd = j_build_double_sum_helper(ctx, "j06_sumd");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  static const u8 D05[8] = {0, 0, 0, 0, 0, 0, 0xE0, 0x3F}; /* 0.5 */
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  const Type* atypes[10] = {I32, F64, F64, F64, F64, F64, F64, F64, F64, F64};
-  CgTestArg args[10];
-  args[0] = (CgTestArg){.kind = CGT_ARG_IMM, .type = I32, .v.imm = 9};
-  for (int i = 0; i < 9; ++i) {
-    Reg r = j_load_f64(ctx, D05);
-    args[i + 1] = (CgTestArg){.kind = CGT_ARG_REG, .type = F64, .v.reg = r};
-  }
-  j_call_va(tf, sumd, I32, atypes, args, 10, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper for j07: int f(int n, int a, double b, int c, double d) — fixed n,
- * then 4 var args of mixed kind. Body sums int+(int)b+int+(int)d. */
-static ObjSymId j_build_j07_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F64 = T_f64(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = j_begin_va_func(ctx, "j07_f", I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  VaApRegs ap = j_alloc_ap(tf);
-  T->va_start_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_FP, F64);
-  Reg d = T->alloc_reg(T, RC_FP, F64);
-  T->va_arg_(T, REG_op(a, I32), REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)), I32);
-  T->va_arg_(T, REG_op(b, F64), REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)), F64);
-  T->va_arg_(T, REG_op(c, I32), REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)), I32);
-  T->va_arg_(T, REG_op(d, F64), REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)), F64);
-
-  Reg ib = T->alloc_reg(T, RC_INT, I32);
-  Reg id = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_FTOI_S, REG_op(ib, I32), REG_op(b, F64));
-  T->convert(T, CV_FTOI_S, REG_op(id, I32), REG_op(d, F64));
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(a, I32), REG_op(ib, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(c, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(id, I32));
-
-  T->va_end_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* j07_va_mixed_int_dbl — f(_, 10, 16.0, 8, 8.0) → 42. The constants are chosen
- * so the truncated sum lands at 42: 10 + (int)16.0 + 8 + (int)8.0 = 42.
- */
-void build_j07_va_mixed_int_dbl(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* F64 = T_f64(ctx);
-  ObjSymId f = j_build_j07_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  static const u8 D16[8] = {0, 0, 0, 0, 0, 0, 0x30, 0x40}; /* 16.0 */
-  static const u8 D08[8] = {0, 0, 0, 0, 0, 0, 0x20, 0x40}; /* 8.0 */
-  Reg b = j_load_f64(ctx, D16);
-  Reg d = j_load_f64(ctx, D08);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  const Type* atypes[] = {I32, I32, F64, I32, F64};
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 0 /* unused n */},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 10},
-      {.kind = CGT_ARG_REG, .type = F64, .v.reg = b},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 8},
-      {.kind = CGT_ARG_REG, .type = F64, .v.reg = d},
-  };
-  j_call_va(tf, f, I32, atypes, args, 5, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper for j08: int f(int n, ...) { va_list a, b; va_start(a); va_copy(b,a);
- *   int x = va_arg(a, int); int y = va_arg(b, int); return x + y; }
- * Both ap and bp see the same first var arg, so x == y. */
-static ObjSymId j_build_j08_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = j_begin_va_func(ctx, "j08_f", I32, params, 1);
-  CGTarget* T = ctx->target;
-
-  /* Two va_list locals + their addresses. */
-  const Type* ap_ty = abi_va_list_type(ctx->c->abi, ctx->pool);
-  const Type* ap_pty = T_ptr(ctx, ap_ty);
-  FrameSlot ap = cgtest_local(tf, ap_ty, FSF_ADDR_TAKEN);
-  FrameSlot bp = cgtest_local(tf, ap_ty, FSF_ADDR_TAKEN);
-  Reg a_addr = T->alloc_reg(T, RC_INT, ap_pty);
-  Reg b_addr = T->alloc_reg(T, RC_INT, ap_pty);
-  T->addr_of(T, REG_op(a_addr, ap_pty), LOCAL_op(ap, ap_ty));
-  T->addr_of(T, REG_op(b_addr, ap_pty), LOCAL_op(bp, ap_ty));
-
-  T->va_start_(T, REG_op(a_addr, ap_pty));
-  T->va_copy_(T, REG_op(b_addr, ap_pty), REG_op(a_addr, ap_pty));
-
-  Reg x = T->alloc_reg(T, RC_INT, I32);
-  Reg y = T->alloc_reg(T, RC_INT, I32);
-  T->va_arg_(T, REG_op(x, I32), REG_op(a_addr, ap_pty), I32);
-  T->va_arg_(T, REG_op(y, I32), REG_op(b_addr, ap_pty), I32);
-
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(x, I32), REG_op(y, I32));
-
-  T->va_end_(T, REG_op(a_addr, ap_pty));
-  T->va_end_(T, REG_op(b_addr, ap_pty));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* j08_va_copy — f(_, 21) → 21+21 = 42 (both va_lists see arg 0). */
-void build_j08_va_copy(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId f = j_build_j08_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  const Type* atypes[] = {I32, I32};
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 0},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 21},
-  };
-  j_call_va(tf, f, I32, atypes, args, 2, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* helper for j09: int f(int a, int b, ...) { va_list ap; va_start(ap, b);
- *   int c = va_arg(ap, int); va_end(ap); return a + b + c; } */
-static ObjSymId j_build_j09_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32, I32};
-  CgTestFn* tf = j_begin_va_func(ctx, "j09_f", I32, params, 2);
-  CGTarget* T = ctx->target;
-
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(a, I32), cgtest_param_slot(tf, 0), I32);
-  cgtest_load_local(tf, REG_op(b, I32), cgtest_param_slot(tf, 1), I32);
-
-  VaApRegs ap = j_alloc_ap(tf);
-  T->va_start_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-  Reg c = T->alloc_reg(T, RC_INT, I32);
-  T->va_arg_(T, REG_op(c, I32), REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)), I32);
-  T->va_end_(T, REG_op(ap.ap_addr, T_ptr(ctx, ap.ap_ty)));
-
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(a, I32), REG_op(b, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(c, I32));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* j09_va_two_fixed — f(10, 15, 17) → 42. */
-void build_j09_va_two_fixed(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId f = j_build_j09_helper(ctx);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  const Type* atypes[] = {I32, I32, I32};
-  CgTestArg args[] = {
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 10},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 15},
-      {.kind = CGT_ARG_IMM, .type = I32, .v.imm = 17},
-  };
-  j_call_va(tf, f, I32, atypes, args, 3, 2, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_k.c b/test/cg/harness/cases_k.c
@@ -1,210 +0,0 @@
-/* Group K — atomics.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group K: atomics
- *
- * Drives atomic_load / atomic_store / atomic_rmw / atomic_cas / fence
- * on CGTarget across every AtomicOp and several MemOrders. Every case
- * uses an FSF_ADDR_TAKEN i32 (or i64 for k13) local as the atomic
- * object: store-into via plain store sets the prior state, the atomic
- * op is then dispatched against the address, and a plain load after
- * reads the post-state for the oracle. The MF_ATOMIC flag rides along
- * the MemAccess so the backend can route to ldar/stlr-class encodings.
- * ============================================================ */
-
-/* Helper: build the standard prelude — a single addr-taken i32 local x
- * pre-initialized to `init`, plus its address in a register. */
-typedef struct KCtx {
-  CgTestFn* tf;
-  FrameSlot x;
-  Reg p_addr;
-} KCtx;
-
-static KCtx k_open_i32(CgTestCtx* ctx, i64 init) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  FrameSlot x = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  cgtest_store_local(tf, x, IMM_op(init, I32), I32);
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->addr_of(T, REG_op(p, PI32), LOCAL_op(x, I32));
-  return (KCtx){tf, x, p};
-}
-
-/* MemAccess for a 4-byte i32 atomic at &x. */
-static MemAccess k_ma32(CgTestCtx* ctx) {
-  MemAccess ma = {0};
-  ma.type = T_i32(ctx);
-  ma.size = 4;
-  ma.align = 4;
-  ma.flags = MF_ATOMIC;
-  ma.alias.kind = ALIAS_LOCAL;
-  return ma;
-}
-
-/* Reload x and return; helper for the post-state oracle. */
-static void k_close_load_x(KCtx* k) {
-  CgTestCtx* ctx = k->tf->ctx;
-  const Type* I32 = T_i32(ctx);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_load_local(k->tf, REG_op(r, I32), k->x, I32);
-  cgtest_ret_reg(k->tf, r, I32);
-  cgtest_end(k->tf);
-}
-
-/* k01_atomic_load_relaxed — return atomic_load(&x=42, RELAXED). */
-void build_k01_atomic_load_relaxed(CgTestCtx* ctx) {
-  KCtx k = k_open_i32(ctx, 42);
-  const Type* I32 = T_i32(ctx);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->atomic_load(ctx->target, REG_op(r, I32),
-                           REG_op(k.p_addr, T_ptr(ctx, I32)), k_ma32(ctx),
-                           MO_RELAXED);
-  cgtest_ret_reg(k.tf, r, I32);
-  cgtest_end(k.tf);
-}
-
-/* k02_atomic_store_load_acq — atomic_store(&x, 42, RELEASE) then
- * atomic_load(&x, ACQUIRE). */
-void build_k02_atomic_store_load_acq(CgTestCtx* ctx) {
-  KCtx k = k_open_i32(ctx, 0);
-  const Type* I32 = T_i32(ctx);
-  CGTarget* T = ctx->target;
-  T->atomic_store(T, REG_op(k.p_addr, T_ptr(ctx, I32)), IMM_op(42, I32),
-                  k_ma32(ctx), MO_RELEASE);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->atomic_load(T, REG_op(r, I32), REG_op(k.p_addr, T_ptr(ctx, I32)),
-                 k_ma32(ctx), MO_ACQUIRE);
-  cgtest_ret_reg(k.tf, r, I32);
-  cgtest_end(k.tf);
-}
-
-/* k03_atomic_load_seq_cst — full-barrier load. */
-void build_k03_atomic_load_seq_cst(CgTestCtx* ctx) {
-  KCtx k = k_open_i32(ctx, 42);
-  const Type* I32 = T_i32(ctx);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->atomic_load(ctx->target, REG_op(r, I32),
-                           REG_op(k.p_addr, T_ptr(ctx, I32)), k_ma32(ctx),
-                           MO_SEQ_CST);
-  cgtest_ret_reg(k.tf, r, I32);
-  cgtest_end(k.tf);
-}
-
-/* Shared body for the rmw post-state cases (k04..k10). */
-static void k_rmw_post(CgTestCtx* ctx, AtomicOp op, i64 init, i64 val) {
-  KCtx k = k_open_i32(ctx, init);
-  const Type* I32 = T_i32(ctx);
-  Reg prior = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->atomic_rmw(ctx->target, op, REG_op(prior, I32),
-                          REG_op(k.p_addr, T_ptr(ctx, I32)), IMM_op(val, I32),
-                          k_ma32(ctx), MO_SEQ_CST);
-  k_close_load_x(&k);
-}
-
-void build_k04_atomic_rmw_add(CgTestCtx* c) { k_rmw_post(c, AO_ADD, 40, 2); }
-void build_k05_atomic_rmw_xchg(CgTestCtx* c) { k_rmw_post(c, AO_XCHG, 99, 42); }
-void build_k06_atomic_rmw_and(CgTestCtx* c) {
-  k_rmw_post(c, AO_AND, 0xFF, 0x2A);
-}
-void build_k07_atomic_rmw_or(CgTestCtx* c) { k_rmw_post(c, AO_OR, 0x20, 0x0A); }
-void build_k08_atomic_rmw_xor(CgTestCtx* c) {
-  k_rmw_post(c, AO_XOR, 0xFF, 0xD5);
-}
-void build_k09_atomic_rmw_sub(CgTestCtx* c) { k_rmw_post(c, AO_SUB, 44, 2); }
-
-/* k10_atomic_rmw_nand — post-state low 8: ~(0xFF & 0xD5) & 0xFF = 0x2A = 42. */
-void build_k10_atomic_rmw_nand(CgTestCtx* c) {
-  k_rmw_post(c, AO_NAND, 0xFF, 0xD5);
-}
-
-/* k11_atomic_cas_success — x=10; cas(&x, exp=10, des=42) → ok=1; load → 42. */
-void build_k11_atomic_cas_success(CgTestCtx* ctx) {
-  KCtx k = k_open_i32(ctx, 10);
-  const Type* I32 = T_i32(ctx);
-  CGTarget* T = ctx->target;
-  Reg prior = T->alloc_reg(T, RC_INT, I32);
-  Reg ok = T->alloc_reg(T, RC_INT, I32);
-  T->atomic_cas(T, REG_op(prior, I32), REG_op(ok, I32),
-                REG_op(k.p_addr, T_ptr(ctx, I32)), IMM_op(10, I32),
-                IMM_op(42, I32), k_ma32(ctx), MO_SEQ_CST, MO_RELAXED);
-  k_close_load_x(&k);
-}
-
-/* k12_atomic_cas_failure — x=10; cas(&x, exp=99, des=42) → ok=0; x unchanged.
- */
-void build_k12_atomic_cas_failure(CgTestCtx* ctx) {
-  KCtx k = k_open_i32(ctx, 10);
-  const Type* I32 = T_i32(ctx);
-  CGTarget* T = ctx->target;
-  Reg prior = T->alloc_reg(T, RC_INT, I32);
-  Reg ok = T->alloc_reg(T, RC_INT, I32);
-  T->atomic_cas(T, REG_op(prior, I32), REG_op(ok, I32),
-                REG_op(k.p_addr, T_ptr(ctx, I32)), IMM_op(99, I32),
-                IMM_op(42, I32), k_ma32(ctx), MO_SEQ_CST, MO_RELAXED);
-  k_close_load_x(&k);
-}
-
-/* k13_atomic_load_i64 — i64 atomic load of 0x1_0000_002A; return low 32 = 42.
- */
-void build_k13_atomic_load_i64(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI64 = T_ptr(ctx, I64);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I64, FSF_ADDR_TAKEN);
-  /* Materialize via load_imm into a 64-bit reg, then store. */
-  Reg init = T->alloc_reg(T, RC_INT, I64);
-  T->load_imm(T, REG_op(init, I64), 0x10000002Aull);
-  cgtest_store_local(tf, x, REG_op(init, I64), I64);
-
-  Reg p = T->alloc_reg(T, RC_INT, PI64);
-  T->addr_of(T, REG_op(p, PI64), LOCAL_op(x, I64));
-
-  MemAccess ma = {.type = I64,
-                  .size = 8,
-                  .align = 8,
-                  .flags = MF_ATOMIC,
-                  .alias.kind = ALIAS_LOCAL};
-  Reg r64 = T->alloc_reg(T, RC_INT, I64);
-  T->atomic_load(T, REG_op(r64, I64), REG_op(p, PI64), ma, MO_SEQ_CST);
-
-  Reg r32 = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_TRUNC, REG_op(r32, I32), REG_op(r64, I64));
-  cgtest_ret_reg(tf, r32, I32);
-  cgtest_end(tf);
-}
-
-/* k14_atomic_rmw_prior — return the prior value rmw produced (40), not the
- * post-state. */
-void build_k14_atomic_rmw_prior(CgTestCtx* ctx) {
-  KCtx k = k_open_i32(ctx, 40);
-  const Type* I32 = T_i32(ctx);
-  Reg prior = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->atomic_rmw(ctx->target, AO_ADD, REG_op(prior, I32),
-                          REG_op(k.p_addr, T_ptr(ctx, I32)), IMM_op(2, I32),
-                          k_ma32(ctx), MO_SEQ_CST);
-  cgtest_ret_reg(k.tf, prior, I32);
-  cgtest_end(k.tf);
-}
-
-/* k15_fence_seq_cst — relaxed store, SEQ_CST fence, then relaxed load → 42. */
-void build_k15_fence_seq_cst(CgTestCtx* ctx) {
-  KCtx k = k_open_i32(ctx, 0);
-  const Type* I32 = T_i32(ctx);
-  CGTarget* T = ctx->target;
-  T->atomic_store(T, REG_op(k.p_addr, T_ptr(ctx, I32)), IMM_op(42, I32),
-                  k_ma32(ctx), MO_RELAXED);
-  T->fence(T, MO_SEQ_CST);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->atomic_load(T, REG_op(r, I32), REG_op(k.p_addr, T_ptr(ctx, I32)),
-                 k_ma32(ctx), MO_RELAXED);
-  cgtest_ret_reg(k.tf, r, I32);
-  cgtest_end(k.tf);
-}
diff --git a/test/cg/harness/cases_l.c b/test/cg/harness/cases_l.c
@@ -1,396 +0,0 @@
-/* Group L — intrinsics.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group L: compiler intrinsics
- *
- * Drives CGTarget.intrinsic across every IntrinKind. Operand shapes
- * follow arch.h's documentation:
- *   POPCOUNT/CTZ/CLZ/BSWAP* : dsts[0] REG, args[0] REG
- *   MEMCPY/MEMMOVE          : args = (dst_addr, src_addr, n_bytes)
- *   MEMSET                  : args = (dst_addr, byte_value, n_bytes)
- *   PREFETCH                : args = (addr)
- *   ASSUME_ALIGNED          : dsts[0] REG, args = (ptr, align)
- *   EXPECT                  : dsts[0] REG, args = (val, expected)
- *   UNREACHABLE / TRAP      : no dsts, no args
- *   *_OVERFLOW              : dsts[0] result, dsts[1] i1 ovf; args = (a, b)
- * ============================================================ */
-
-/* helper: emit a single-result bit-op intrinsic on `imm` (returns dst reg). */
-static Reg l_bitop(CgTestCtx* ctx, IntrinKind kind, const Type* arg_ty,
-                   i64 imm) {
-  const Type* I32 = T_i32(ctx);
-  CGTarget* T = ctx->target;
-  Reg src = T->alloc_reg(T, RC_INT, arg_ty);
-  T->load_imm(T, REG_op(src, arg_ty), imm);
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  Operand dsts[1] = {REG_op(dst, I32)};
-  Operand args[1] = {REG_op(src, arg_ty)};
-  T->intrinsic(T, kind, dsts, 1, args, 1);
-  return dst;
-}
-
-/* l01_popcount_u32 — popcount(0xFF) → 8. */
-void build_l01_popcount_u32(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_bitop(ctx, INTRIN_POPCOUNT, U32, 0xFF);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l02_popcount_u64 — popcount((u64)-1) → 64. */
-void build_l02_popcount_u64(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U64 = T_u64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_bitop(ctx, INTRIN_POPCOUNT, U64, -1);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l03_ctz_u32 — ctz(0x80) → 7. */
-void build_l03_ctz_u32(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_bitop(ctx, INTRIN_CTZ, U32, 0x80);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l04_clz_u32 — clz(0xFF) over 32 bits → 24. */
-void build_l04_clz_u32(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_bitop(ctx, INTRIN_CLZ, U32, 0xFF);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l05_bswap16 — bswap16(0x1234) → 0x3412 (low 8 = 0x12 = 18). */
-void build_l05_bswap16(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U16 = T_u16(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_bitop(ctx, INTRIN_BSWAP16, U16, 0x1234);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l06_bswap32 — bswap32(0x11223344) → 0x44332211 (low 8 = 0x11 = 17). */
-void build_l06_bswap32(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U32 = T_u32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_bitop(ctx, INTRIN_BSWAP32, U32, 0x11223344);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l07_bswap64 — bswap64(0x1122334455667788) → 0x8877665544332211; low 8 = 17.
- */
-void build_l07_bswap64(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* U64 = T_u64(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg src = T->alloc_reg(T, RC_INT, U64);
-  T->load_imm(T, REG_op(src, U64), 0x1122334455667788ll);
-  Reg dst64 = T->alloc_reg(T, RC_INT, U64);
-  Operand dsts[1] = {REG_op(dst64, U64)};
-  Operand args[1] = {REG_op(src, U64)};
-  T->intrinsic(T, INTRIN_BSWAP64, dsts, 1, args, 1);
-
-  Reg r32 = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_TRUNC, REG_op(r32, I32), REG_op(dst64, U64));
-  cgtest_ret_reg(tf, r32, I32);
-  cgtest_end(tf);
-}
-
-/* l08_memcpy_4 — int src=42; memcpy(&dst,&src,4); return dst. */
-void build_l08_memcpy_4(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot src = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  FrameSlot dst = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  cgtest_store_local(tf, src, IMM_op(42, I32), I32);
-
-  Reg ps = T->alloc_reg(T, RC_INT, PI32);
-  Reg pd = T->alloc_reg(T, RC_INT, PI32);
-  T->addr_of(T, REG_op(ps, PI32), LOCAL_op(src, I32));
-  T->addr_of(T, REG_op(pd, PI32), LOCAL_op(dst, I32));
-
-  Operand args[3] = {
-      REG_op(pd, PI32),
-      REG_op(ps, PI32),
-      IMM_op(4, I64),
-  };
-  T->intrinsic(T, INTRIN_MEMCPY, NULL, 0, args, 3);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), dst, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l09_memmove_overlap — int a[5]={1..5}; memmove(a+1,a,16); return a[4]→4. */
-void build_l09_memmove_overlap(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* U8 = T_u8(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* alloca 5*4 = 20 bytes, aligned to 4. */
-  Reg buf = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->alloca_(T, REG_op(buf, T_ptr(ctx, I32)), IMM_op(20, I64), 4);
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  for (int i = 0; i < 5; ++i) {
-    T->store(T, IND_op(buf, (i32)(i * 4), I32), IMM_op(i + 1, I32), ma);
-  }
-
-  /* dst = (u8*)a + 4, i.e. &a[1]; use byte arithmetic for the addr. */
-  Reg dst = T->alloc_reg(T, RC_INT, T_ptr(ctx, U8));
-  T->binop(T, BO_IADD, REG_op(dst, T_ptr(ctx, U8)),
-           REG_op(buf, T_ptr(ctx, I32)), IMM_op(4, I64));
-
-  Operand args[3] = {
-      REG_op(dst, T_ptr(ctx, U8)),
-      REG_op(buf, T_ptr(ctx, I32)),
-      IMM_op(16, I64),
-  };
-  T->intrinsic(T, INTRIN_MEMMOVE, NULL, 0, args, 3);
-
-  /* return a[4] (byte offset 16 from buf — old a[3]=4 was copied here). */
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(buf, 16, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l10_memset_zero — int b[4]; memset(b,0,16); return b[2] → 0. */
-void build_l10_memset_zero(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* U8 = T_u8(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg buf = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->alloca_(T, REG_op(buf, T_ptr(ctx, I32)), IMM_op(16, I64), 4);
-
-  /* Pre-poison so the memset is observable. */
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-  for (int i = 0; i < 4; ++i)
-    T->store(T, IND_op(buf, (i32)(i * 4), I32), IMM_op(0xDEAD, I32), ma);
-
-  Operand args[3] = {
-      REG_op(buf, T_ptr(ctx, I32)),
-      IMM_op(0, U8),
-      IMM_op(16, I64),
-  };
-  T->intrinsic(T, INTRIN_MEMSET, NULL, 0, args, 3);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(buf, 8, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l11_memset_ff — int b; memset(&b,0xFF,4); load → 0xFFFFFFFF; low 8 = 255. */
-void build_l11_memset_ff(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* U8 = T_u8(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot b = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->addr_of(T, REG_op(p, PI32), LOCAL_op(b, I32));
-
-  Operand args[3] = {
-      REG_op(p, PI32),
-      IMM_op(0xFF, U8),
-      IMM_op(4, I64),
-  };
-  T->intrinsic(T, INTRIN_MEMSET, NULL, 0, args, 3);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), b, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l12_expect_taken — int x = expect(1==1, 1); if (x) return 42; else return 99.
- */
-void build_l12_expect_taken(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg cond = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(cond, I32), 1);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  Operand dsts[1] = {REG_op(out, I32)};
-  Operand args[2] = {REG_op(cond, I32), IMM_op(1, I32)};
-  T->intrinsic(T, INTRIN_EXPECT, dsts, 1, args, 2);
-
-  Label miss = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(out, I32), IMM_op(0, I32), miss);
-  cgtest_ret_imm(tf, 42, I32);
-  T->label_place(T, miss);
-  cgtest_ret_imm(tf, 99, I32);
-  cgtest_end(tf);
-}
-
-/* l13_unreachable_live — if(x) return 42; else __builtin_unreachable(). */
-void build_l13_unreachable_live(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg x = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(x, I32), 1);
-
-  Label dead = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(x, I32), IMM_op(0, I32), dead);
-  cgtest_ret_imm(tf, 42, I32);
-
-  T->label_place(T, dead);
-  T->intrinsic(T, INTRIN_UNREACHABLE, NULL, 0, NULL, 0);
-  cgtest_end(tf);
-}
-
-/* l14_trap_live — if(x) return 42; else __builtin_trap(). */
-void build_l14_trap_live(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg x = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(x, I32), 1);
-
-  Label trap_lbl = T->label_new(T);
-  T->cmp_branch(T, CMP_EQ, REG_op(x, I32), IMM_op(0, I32), trap_lbl);
-  cgtest_ret_imm(tf, 42, I32);
-
-  T->label_place(T, trap_lbl);
-  T->intrinsic(T, INTRIN_TRAP, NULL, 0, NULL, 0);
-  cgtest_end(tf);
-}
-
-/* l15_prefetch_noop — prefetch(&x); *p=42; return *p. */
-void build_l15_prefetch_noop(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot x = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->addr_of(T, REG_op(p, PI32), LOCAL_op(x, I32));
-
-  Operand pf_args[1] = {REG_op(p, PI32)};
-  T->intrinsic(T, INTRIN_PREFETCH, NULL, 0, pf_args, 1);
-
-  cgtest_store_local(tf, x, IMM_op(42, I32), I32);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), x, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l16_assume_aligned — p = assume_aligned(p, 8); *p = 42; return *p. */
-void build_l16_assume_aligned(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, PI32);
-  T->alloca_(T, REG_op(p, PI32), IMM_op(8, I64), 8);
-
-  Reg p2 = T->alloc_reg(T, RC_INT, PI32);
-  Operand dsts[1] = {REG_op(p2, PI32)};
-  Operand args[2] = {REG_op(p, PI32), IMM_op(8, I32)};
-  T->intrinsic(T, INTRIN_ASSUME_ALIGNED, dsts, 1, args, 2);
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 8, .alias.kind = ALIAS_LOCAL};
-  T->store(T, IND_op(p2, 0, I32), IMM_op(42, I32), ma);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(p2, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* Helper: emit a 2-operand checked-arith intrinsic; return either the result
- * or the overflow bit per `which`. */
-static Reg l_chkarith(CgTestCtx* ctx, IntrinKind kind, i64 a, i64 b,
-                      int which /*0=value,1=ovf*/) {
-  const Type* I32 = T_i32(ctx);
-  CGTarget* T = ctx->target;
-  Reg ra = T->alloc_reg(T, RC_INT, I32);
-  Reg rb = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(ra, I32), a);
-  T->load_imm(T, REG_op(rb, I32), b);
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  Reg ovf = T->alloc_reg(T, RC_INT, I32);
-  Operand dsts[2] = {REG_op(val, I32), REG_op(ovf, I32)};
-  Operand args[2] = {REG_op(ra, I32), REG_op(rb, I32)};
-  T->intrinsic(T, kind, dsts, 2, args, 2);
-  return which ? ovf : val;
-}
-
-/* l17_add_overflow_no — add_overflow(20,22) → val=42, ovf=0; return val. */
-void build_l17_add_overflow_no(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_chkarith(ctx, INTRIN_ADD_OVERFLOW, 20, 22, 0);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l18_add_overflow_yes — add_overflow(INT_MAX,1) → ovf=1; return ovf. */
-void build_l18_add_overflow_yes(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_chkarith(ctx, INTRIN_ADD_OVERFLOW, 0x7FFFFFFF, 1, 1);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l19_sub_overflow_yes — sub_overflow(INT_MIN,1) → ovf=1. */
-void build_l19_sub_overflow_yes(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_chkarith(ctx, INTRIN_SUB_OVERFLOW, (i64)(i32)0x80000000, 1, 1);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* l20_mul_overflow_no — mul_overflow(6,7) → val=42, ovf=0; return val. */
-void build_l20_mul_overflow_no(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = l_chkarith(ctx, INTRIN_MUL_OVERFLOW, 6, 7, 0);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_mc.c b/test/cg/harness/cases_mc.c
@@ -1,24 +0,0 @@
-/* MC-only cases — direct MCEmitter byte path; no CGTarget.
- * See CORPUS.md for the case list and expected values. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group: MC-only (lowest layer)
- * ============================================================ */
-
-/* mc_smoke — emit `mov w0, #42; ret` as raw AArch64 bytes through
- * MCEmitter. No CGTarget involved. Validates the byte path end-to-end. */
-void build_mc_smoke(CgTestCtx* ctx) {
-  static const u8 BYTES[8] = {
-      /* mov w0, #42 */ 0x40, 0x05, 0x80, 0x52,
-      /* ret */ 0xc0,         0x03, 0x5f, 0xd6,
-  };
-
-  ObjSymId sym = cgtest_mc_begin_main(ctx);
-  ctx->mc->set_section(ctx->mc, ctx->text_sec);
-  ctx->mc->emit_align(ctx->mc, 4, 0);
-  u32 start = ctx->mc->pos(ctx->mc);
-  ctx->mc->emit_bytes(ctx->mc, BYTES, sizeof BYTES);
-  cgtest_mc_end_main(ctx, sym, start);
-}
diff --git a/test/cg/harness/cases_n.c b/test/cg/harness/cases_n.c
@@ -1,286 +0,0 @@
-/* Group N — TLS (thread-local storage).
- * See CORPUS.md for the case list and expected values.
- *
- * Drives CGTarget.tls_addr_of and the SK_TLS / SF_TLS section/symbol
- * machinery on ObjBuilder. Each case allocates a `.tdata` (initialized)
- * or `.tbss` (zero-init) section, defines an SK_TLS symbol in it, and
- * accesses the storage via tls_addr_of → INDIRECT load/store.
- *
- * The aarch64 backend currently implements TLS Local-Exec only (see
- * c1cf117); GD/IE/LD models are not wired up. Path E (link+run) requires
- * test/link/harness/start.c's TCB+TLS setup; paths D/J have no TLS host
- * thread context, so they are expected to fail until the JIT runner
- * grows TLS support. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group N: TLS — _Thread_local globals via tls_addr_of
- * ============================================================ */
-
-/* Helper: define a `.tdata` section once, return its id. */
-static ObjSecId tls_get_tdata(CgTestCtx* ctx) {
-  Sym name = pool_intern_cstr(ctx->pool, ".tdata");
-  return obj_section(ctx->ob, name, SEC_DATA, SF_ALLOC | SF_WRITE | SF_TLS, 4);
-}
-
-/* Helper: define a `.tbss` section once, return its id. */
-static ObjSecId tls_get_tbss(CgTestCtx* ctx) {
-  Sym name = pool_intern_cstr(ctx->pool, ".tbss");
-  return obj_section_ex(ctx->ob, name, SEC_BSS, SSEM_NOBITS,
-                        SF_ALLOC | SF_WRITE | SF_TLS, 4, 0, 0, 0);
-}
-
-/* Helper: define an initialized TLS symbol. Writes `bytes` at the
- * current `.tdata` position and emits a SK_TLS symbol pointing to it. */
-static ObjSymId tls_define_init(CgTestCtx* ctx, const char* name,
-                                const u8* bytes, u32 size, u32 align) {
-  ObjSecId sec = tls_get_tdata(ctx);
-  obj_section_set_align(ctx->ob, sec, align);
-  u32 ofs = obj_pos(ctx->ob, sec);
-  /* Pad up to alignment. */
-  while (ofs & (align - 1)) {
-    u8 zero = 0;
-    obj_write(ctx->ob, sec, &zero, 1);
-    ofs++;
-  }
-  obj_write(ctx->ob, sec, bytes, size);
-  Sym sname = pool_intern_cstr(ctx->pool, name);
-  return obj_symbol(ctx->ob, sname, SB_GLOBAL, SK_TLS, sec, ofs, size);
-}
-
-/* Helper: define a zero-initialized TLS symbol in `.tbss`. */
-static ObjSymId tls_define_bss(CgTestCtx* ctx, const char* name, u32 size,
-                               u32 align) {
-  ObjSecId sec = tls_get_tbss(ctx);
-  obj_section_set_align(ctx->ob, sec, align);
-  /* obj_reserve_bss tracks bss_size; the symbol value is the offset
-   * within .tbss, which equals the section's bss_size before reserve. */
-  const Section* s = obj_section_get(ctx->ob, sec);
-  u32 ofs = s->bss_size;
-  while (ofs & (align - 1)) ofs++;
-  obj_reserve_bss(ctx->ob, sec, ofs - s->bss_size + size, align);
-  Sym sname = pool_intern_cstr(ctx->pool, name);
-  return obj_symbol(ctx->ob, sname, SB_GLOBAL, SK_TLS, sec, ofs, size);
-}
-
-/* n01_tls_load_le — _Thread_local int x = 42; return x; */
-void build_n01_tls_load_le(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[4] = {42, 0, 0, 0};
-  ObjSymId x = tls_define_init(ctx, "n01_x", INIT, 4, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(p, T_ptr(ctx, I32)), x, 0);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  T->load(T, REG_op(r, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* n02_tls_store_le — _Thread_local int x; x = 42; return x; */
-void build_n02_tls_store_le(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId x = tls_define_bss(ctx, "n02_x", 4, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(p, T_ptr(ctx, I32)), x, 0);
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  T->store(T, IND_op(p, 0, I32), IMM_op(42, I32), ma);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* n03_tls_addr_taken — _Thread_local int x = 17; int *p = &x; *p += 1;
- * return *p; — addr-taken TLS local; one materialization of the
- * thread pointer is reused for the load/store/load sequence. */
-void build_n03_tls_addr_taken(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[4] = {17, 0, 0, 0};
-  ObjSymId x = tls_define_init(ctx, "n03_x", INIT, 4, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(p, T_ptr(ctx, I32)), x, 0);
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(val, I32), IND_op(p, 0, I32), ma);
-  T->binop(T, BO_IADD, REG_op(val, I32), REG_op(val, I32), IMM_op(1, I32));
-  T->store(T, IND_op(p, 0, I32), REG_op(val, I32), ma);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(out, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* n04_tls_i64 — _Thread_local long long x = 0x1_0000_002A;
- * return (int)x; — exercises 8-byte TLS access with TLSLE_LDST64
- * relocation kinds (vs the 32-bit family in n01/n02). */
-void build_n04_tls_i64(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  static const u8 INIT[8] = {0x2A, 0, 0, 0, 0x01, 0, 0, 0};
-  ObjSymId x = tls_define_init(ctx, "n04_x", INIT, 8, 8);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I64));
-  T->tls_addr_of(T, REG_op(p, T_ptr(ctx, I64)), x, 0);
-
-  Reg r64 = T->alloc_reg(T, RC_INT, I64);
-  MemAccess ma = {
-      .type = I64, .size = 8, .align = 8, .alias.kind = ALIAS_GLOBAL};
-  T->load(T, REG_op(r64, I64), IND_op(p, 0, I64), ma);
-
-  Reg r32 = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_TRUNC, REG_op(r32, I32), REG_op(r64, I64));
-  cgtest_ret_reg(tf, r32, I32);
-  cgtest_end(tf);
-}
-
-/* n05_tls_in_loop — TLS access inside a loop; the address materialization
- * may be hoisted by opt_cgtarget but must remain correct. Body:
- *   _Thread_local int x = 0;
- *   for (i = 0; i < 10; i++) x += 1;
- *   return x; → 10 */
-void build_n05_tls_in_loop(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[4] = {0, 0, 0, 0};
-  ObjSymId x = tls_define_init(ctx, "n05_x", INIT, 4, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  FrameSlot islot = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, islot, IMM_op(0, I32), I32);
-
-  Label top = T->label_new(T);
-  Label done = T->label_new(T);
-  T->label_place(T, top);
-  Reg ireg = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ireg, I32), islot, I32);
-  T->cmp_branch(T, CMP_GE_S, REG_op(ireg, I32), IMM_op(10, I32), done);
-
-  /* x += 1; */
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(p, T_ptr(ctx, I32)), x, 0);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  Reg cur = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(cur, I32), IND_op(p, 0, I32), ma);
-  T->binop(T, BO_IADD, REG_op(cur, I32), REG_op(cur, I32), IMM_op(1, I32));
-  T->store(T, IND_op(p, 0, I32), REG_op(cur, I32), ma);
-
-  Reg inew = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(inew, I32), islot, I32);
-  T->binop(T, BO_IADD, REG_op(inew, I32), REG_op(inew, I32), IMM_op(1, I32));
-  cgtest_store_local(tf, islot, REG_op(inew, I32), I32);
-  T->jump(T, top);
-
-  T->label_place(T, done);
-  Reg p2 = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(p2, T_ptr(ctx, I32)), x, 0);
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(out, I32), IND_op(p2, 0, I32), ma);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* n06_tls_two_vars — two distinct TLS variables; sum = 42.
- *   _Thread_local int a = 10;
- *   _Thread_local int b = 32;
- *   return a + b; */
-void build_n06_tls_two_vars(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT_A[4] = {10, 0, 0, 0};
-  static const u8 INIT_B[4] = {32, 0, 0, 0};
-  ObjSymId a = tls_define_init(ctx, "n06_a", INIT_A, 4, 4);
-  ObjSymId b = tls_define_init(ctx, "n06_b", INIT_B, 4, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-
-  Reg pa = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  Reg pb = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(pa, T_ptr(ctx, I32)), a, 0);
-  T->tls_addr_of(T, REG_op(pb, T_ptr(ctx, I32)), b, 0);
-
-  Reg ra = T->alloc_reg(T, RC_INT, I32);
-  Reg rb = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(ra, I32), IND_op(pa, 0, I32), ma);
-  T->load(T, REG_op(rb, I32), IND_op(pb, 0, I32), ma);
-
-  Reg sum = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(sum, I32), REG_op(ra, I32), REG_op(rb, I32));
-  cgtest_ret_reg(tf, sum, I32);
-  cgtest_end(tf);
-}
-
-/* n07_tls_bss_zero_init — _Thread_local int x; (no initializer → .tbss);
- * return x; → 0. The TLS image must zero-fill .tbss in the per-thread
- * area; the harness's start.c is responsible for that on path E. */
-void build_n07_tls_bss_zero_init(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId x = tls_define_bss(ctx, "n07_x", 4, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(p, T_ptr(ctx, I32)), x, 0);
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  T->load(T, REG_op(r, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* n08_tls_addend_offset — _Thread_local int a[8] = {0,..,0,42};
- * return a[7]; — exercises the addend on tls_addr_of (or an indirect
- * +offset load). 32 bytes, 4-byte align. Offset of a[7] = 28. */
-void build_n08_tls_addend_offset(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[32] = {
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0, 0, 0,
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0,
-  };
-  ObjSymId arr = tls_define_init(ctx, "n08_arr", INIT, 32, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Address of arr (no addend); read from base+28. The backend may
-   * fold the addend into the TLSLE relocation sequence. */
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->tls_addr_of(T, REG_op(p, T_ptr(ctx, I32)), arr, 0);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  T->load(T, REG_op(r, I32), IND_op(p, 28, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_o.c b/test/cg/harness/cases_o.c
@@ -1,381 +0,0 @@
-/* Group O — sections and globals (non-TLS).
- * See CORPUS.md for the case list and expected values.
- *
- * Drives addr_of on OPK_GLOBAL operands plus direct GLOBAL load/store
- * (the load/store methods accept LOCAL|GLOBAL|INDIRECT addr operands).
- * Also exercises the SecKind / SymKind / SymBind matrix on ObjBuilder:
- * SEC_DATA, SEC_BSS, SEC_RODATA × SK_OBJ × SB_GLOBAL/SB_LOCAL, plus a
- * named non-default text section for a function. The aggregate-global
- * cases reuse cases_shared's Pt to keep one TagId interned across
- * groups. */
-
-#include "cases_shared.h"
-#include "cg_test.h"
-
-/* ============================================================
- * Group O: sections and globals
- * ============================================================ */
-
-/* Helper: define a `.data` symbol initialized to `bytes`. */
-static ObjSymId data_define(CgTestCtx* ctx, const char* name, const u8* bytes,
-                            u32 size, u32 align, SymBind bind) {
-  Sym sec_name = pool_intern_cstr(ctx->pool, ".data");
-  ObjSecId sec =
-      obj_section(ctx->ob, sec_name, SEC_DATA, SF_ALLOC | SF_WRITE, align);
-  obj_section_set_align(ctx->ob, sec, align);
-  u32 ofs = obj_pos(ctx->ob, sec);
-  while (ofs & (align - 1)) {
-    u8 z = 0;
-    obj_write(ctx->ob, sec, &z, 1);
-    ofs++;
-  }
-  obj_write(ctx->ob, sec, bytes, size);
-  Sym sname = pool_intern_cstr(ctx->pool, name);
-  return obj_symbol(ctx->ob, sname, bind, SK_OBJ, sec, ofs, size);
-}
-
-/* Helper: define a zero-initialized `.bss` symbol. */
-static ObjSymId bss_define(CgTestCtx* ctx, const char* name, u32 size,
-                           u32 align, SymBind bind) {
-  Sym sec_name = pool_intern_cstr(ctx->pool, ".bss");
-  ObjSecId sec = obj_section_ex(ctx->ob, sec_name, SEC_BSS, SSEM_NOBITS,
-                                SF_ALLOC | SF_WRITE, align, 0, 0, 0);
-  const Section* s = obj_section_get(ctx->ob, sec);
-  u32 ofs = s->bss_size;
-  while (ofs & (align - 1)) ofs++;
-  obj_reserve_bss(ctx->ob, sec, (ofs - s->bss_size) + size, align);
-  Sym sname = pool_intern_cstr(ctx->pool, name);
-  return obj_symbol(ctx->ob, sname, bind, SK_OBJ, sec, ofs, size);
-}
-
-/* Helper: define a `.rodata` symbol initialized to `bytes`. */
-static ObjSymId rodata_define(CgTestCtx* ctx, const char* name, const u8* bytes,
-                              u32 size, u32 align) {
-  Sym sec_name = pool_intern_cstr(ctx->pool, ".rodata");
-  ObjSecId sec = obj_section(ctx->ob, sec_name, SEC_RODATA, SF_ALLOC, align);
-  u32 ofs = obj_pos(ctx->ob, sec);
-  while (ofs & (align - 1)) {
-    u8 z = 0;
-    obj_write(ctx->ob, sec, &z, 1);
-    ofs++;
-  }
-  obj_write(ctx->ob, sec, bytes, size);
-  Sym sname = pool_intern_cstr(ctx->pool, name);
-  return obj_symbol(ctx->ob, sname, SB_LOCAL, SK_OBJ, sec, ofs, size);
-}
-
-/* o01_global_load_data — int g = 42; return g; — direct GLOBAL load. */
-void build_o01_global_load_data(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[4] = {42, 0, 0, 0};
-  ObjSymId g = data_define(ctx, "o01_g", INIT, 4, 4, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  /* load directly from a GLOBAL operand — the backend lowers the
-   * page-relative addressing internally. */
-  Operand addr = GLOBAL_op(g, 0);
-  addr.type = I32;
-  T->load(T, REG_op(r, I32), addr, ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* o02_global_store_data — int g = 0; g = 42; return g; — store via
- * GLOBAL operand, then read back. */
-void build_o02_global_store_data(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[4] = {0, 0, 0, 0};
-  ObjSymId g = data_define(ctx, "o02_g", INIT, 4, 4, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  Operand addr = GLOBAL_op(g, 0);
-  addr.type = I32;
-  T->store(T, addr, IMM_op(42, I32), ma);
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), addr, ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* o03_global_bss_zero — int g; return g; — uninitialized .bss reads
- * back as zero. The exit code is 0. */
-void build_o03_global_bss_zero(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId g = bss_define(ctx, "o03_g", 4, 4, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  Operand addr = GLOBAL_op(g, 0);
-  addr.type = I32;
-  T->load(T, REG_op(r, I32), addr, ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* o04_global_addr_taken — int g = 17; int *p = &g; *p += 1; return *p;
- * Mirrors b05 over a global storage class. */
-void build_o04_global_addr_taken(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[4] = {17, 0, 0, 0};
-  ObjSymId g = data_define(ctx, "o04_g", INIT, 4, 4, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(p, T_ptr(ctx, I32)), GLOBAL_op(g, 0));
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  Reg val = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(val, I32), IND_op(p, 0, I32), ma);
-  T->binop(T, BO_IADD, REG_op(val, I32), REG_op(val, I32), IMM_op(1, I32));
-  T->store(T, IND_op(p, 0, I32), REG_op(val, I32), ma);
-
-  Reg out = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(out, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, out, I32);
-  cgtest_end(tf);
-}
-
-/* o05_global_i64 — long long g = 0x1_0000_002A; return (int)g; — 8-byte
- * global; exercises wider .data alignment + LDR Xt and downcast. */
-void build_o05_global_i64(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  static const u8 INIT[8] = {0x2A, 0, 0, 0, 0x01, 0, 0, 0};
-  ObjSymId g = data_define(ctx, "o05_g", INIT, 8, 8, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r64 = T->alloc_reg(T, RC_INT, I64);
-  MemAccess ma = {
-      .type = I64, .size = 8, .align = 8, .alias.kind = ALIAS_GLOBAL};
-  Operand addr = GLOBAL_op(g, 0);
-  addr.type = I64;
-  T->load(T, REG_op(r64, I64), addr, ma);
-
-  Reg r32 = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_TRUNC, REG_op(r32, I32), REG_op(r64, I64));
-  cgtest_ret_reg(tf, r32, I32);
-  cgtest_end(tf);
-}
-
-/* o06_rodata_load — static const int rd[4] = {1, 2, 42, 4}; return rd[2];
- * The case only reads; a write to SEC_RODATA would fault at runtime if
- * the linker maps the section read-only (which is the point). */
-void build_o06_rodata_load(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[16] = {
-      1, 0, 0, 0, 2, 0, 0, 0, 42, 0, 0, 0, 4, 0, 0, 0,
-  };
-  ObjSymId rd = rodata_define(ctx, "o06_rd", INIT, 16, 4);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(p, T_ptr(ctx, I32)), GLOBAL_op(rd, 0));
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  T->load(T, REG_op(r, I32), IND_op(p, 8, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* o07_global_struct_field — struct Pt g = {10, 32}; return g.a + g.b; */
-void build_o07_global_struct_field(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* PT = cases_pt_type(ctx);
-  (void)PT;
-  static const u8 INIT[8] = {10, 0, 0, 0, 32, 0, 0, 0};
-  ObjSymId g = data_define(ctx, "o07_g", INIT, 8, 4, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(p, T_ptr(ctx, I32)), GLOBAL_op(g, 0));
-
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  Reg a = T->alloc_reg(T, RC_INT, I32);
-  Reg b = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(a, I32), IND_op(p, 0, I32), ma);
-  T->load(T, REG_op(b, I32), IND_op(p, 4, I32), ma);
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(a, I32), REG_op(b, I32));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-}
-
-/* o08_global_array_runtime_idx — int g[5] = {1,2,3,4,5}; int i=2; return g[i];
- * Index is loaded from a local at runtime; the address is &g + i*4. */
-void build_o08_global_array_runtime_idx(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[20] = {
-      1, 0, 0, 0, 2, 0, 0, 0, 3, 0, 0, 0, 4, 0, 0, 0, 5, 0, 0, 0,
-  };
-  ObjSymId g = data_define(ctx, "o08_g", INIT, 20, 4, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* int i = 2; — keep i in a local so the index is dynamic. */
-  FrameSlot islot = cgtest_local(tf, I32, FSF_NONE);
-  cgtest_store_local(tf, islot, IMM_op(2, I32), I32);
-
-  Reg base = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(base, T_ptr(ctx, I32)), GLOBAL_op(g, 0));
-
-  Reg ireg = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(ireg, I32), islot, I32);
-  /* offs = i * 4 = i << 2 */
-  Reg offs = T->alloc_reg(T, RC_INT, T_i64(ctx));
-  T->convert(T, CV_SEXT, REG_op(offs, T_i64(ctx)), REG_op(ireg, I32));
-  T->binop(T, BO_SHL, REG_op(offs, T_i64(ctx)), REG_op(offs, T_i64(ctx)),
-           IMM_op(2, T_i64(ctx)));
-
-  /* addr = base + offs */
-  Reg addr = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->binop(T, BO_IADD, REG_op(addr, T_ptr(ctx, I32)),
-           REG_op(base, T_ptr(ctx, I32)), REG_op(offs, T_i64(ctx)));
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  T->load(T, REG_op(r, I32), IND_op(addr, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* o09_static_local_linkage — static int g = 42; return g; — SB_LOCAL
- * (file-static) symbol. The relocation must resolve to the local
- * definition without going through a GOT. */
-void build_o09_static_local_linkage(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[4] = {42, 0, 0, 0};
-  ObjSymId g = data_define(ctx, "o09_g", INIT, 4, 4, SB_LOCAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  Operand addr = GLOBAL_op(g, 0);
-  addr.type = I32;
-  T->load(T, REG_op(r, I32), addr, ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* o10_global_addend — int g[8] = {0,...,0,42}; return *(g+7); — addend
- * encoded into the OPK_GLOBAL operand rather than a runtime add. The
- * backend may fold the addend into ADD_LO12_NC (or equivalent). */
-void build_o10_global_addend(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  static const u8 INIT[32] = {
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,  0, 0, 0,
-      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 42, 0, 0, 0,
-  };
-  ObjSymId g = data_define(ctx, "o10_g", INIT, 32, 4, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* addr_of(GLOBAL{g, 28}); load *addr. */
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(p, T_ptr(ctx, I32)), GLOBAL_op(g, 28));
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-  T->load(T, REG_op(r, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* o11_text_section_named: function placed in `.text.o11_helper`, called
- * from test_main in the default `.text`. Models -ffunction-sections /
- * __attribute__((section("..."))) on a function. */
-void build_o11_text_section_named(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-
-  /* Create a separate text section and aim the next func_begin at it. */
-  Sym sec_name = pool_intern_cstr(ctx->pool, ".text.o11_helper");
-  ObjSecId helper_sec =
-      obj_section(ctx->ob, sec_name, SEC_TEXT, SF_ALLOC | SF_EXEC, 4);
-  ObjSecId saved = ctx->text_sec;
-  ctx->text_sec = helper_sec;
-  ctx->mc->set_section(ctx->mc, helper_sec);
-
-  /* Helper: int echo(int x) { return x; } */
-  CgTestFn* h = cgtest_begin_func(ctx, "o11_helper", I32, params, 1);
-  Reg hr = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_load_local(h, REG_op(hr, I32), cgtest_param_slot(h, 0), I32);
-  cgtest_ret_reg(h, hr, I32);
-  cgtest_end(h);
-  ObjSymId helper_sym = h->sym;
-
-  /* Restore default text section for test_main. */
-  ctx->text_sec = saved;
-  ctx->mc->set_section(ctx->mc, saved);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 42}};
-  cgtest_call(tf, helper_sym, I32, params, args, 1, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* o12_global_across_call — int g = 42; helper modifies nothing relevant;
- * return g; — verifies global address materialization is not corrupted
- * by an intervening call (caller-saved register policy). */
-void build_o12_global_across_call(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  static const u8 INIT[4] = {42, 0, 0, 0};
-  ObjSymId g = data_define(ctx, "o12_g", INIT, 4, 4, SB_GLOBAL);
-
-  /* Simple int echo helper, isolated to this case. */
-  CgTestFn* h = cgtest_begin_func(ctx, "o12_echo", I32, params, 1);
-  Reg hr = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_load_local(h, REG_op(hr, I32), cgtest_param_slot(h, 0), I32);
-  cgtest_ret_reg(h, hr, I32);
-  cgtest_end(h);
-  ObjSymId echo = h->sym;
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  MemAccess ma = {
-      .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_GLOBAL};
-
-  /* Materialize &g, do an intervening call that may clobber p, then
-   * load *p. The backend must either preserve p or rematerialize &g. */
-  Reg p = T->alloc_reg(T, RC_INT, T_ptr(ctx, I32));
-  T->addr_of(T, REG_op(p, T_ptr(ctx, I32)), GLOBAL_op(g, 0));
-
-  Reg ignored = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg args[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 99}};
-  cgtest_call(tf, echo, I32, params, args, 1, REG_op(ignored, I32));
-
-  Reg r = T->alloc_reg(T, RC_INT, I32);
-  T->load(T, REG_op(r, I32), IND_op(p, 0, I32), ma);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_p.c b/test/cg/harness/cases_p.c
@@ -1,132 +0,0 @@
-/* Group P — set_loc / debug.
- * See CORPUS.md for the case list and expected values.
- *
- * Group P's oracle is metadata, not exit code: the case still returns 42
- * (so D/E/J keep passing) but the *real* assertion runs through path W,
- * which opens the emitted obj with cfree_dwarf_open and checks the line
- * program against the (file, line) pairs the case set via cgtest_set_loc.
- *
- * The harness is the parser stand-in per doc/DWARF.md §3.1: cgtest_set_loc
- * fans the loc to both CGTarget (which forwards to MCEmitter so per-insn
- * emit gets attribution) and Debug (debug_set_pending_loc). Group P cases
- * register dwarf-check directives in cases.c so cg-runner emits them on
- * --dwarf-checks NAME for the W path runner. */
-
-#include "cg_test.h"
-#include "core/core.h"
-
-/* p01_line_one_inst — one instruction at a known SrcLoc.
- *
- * Registers a synthetic source file "p01.c" with the SourceManager,
- * stamps line 10 onto a single load_imm via cgtest_set_loc, and returns
- * 42. Path W asserts that the emitted obj's .debug_line maps some PC
- * inside test_main back to (p01.c, 10). */
-void build_p01_line_one_inst(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-
-  u32 file_id = source_add_memory(ctx->c->sources, "p01.c");
-  SrcLoc loc = {file_id, 10, 0};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  cgtest_set_loc(ctx, loc);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 42);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* p02_line_monotone — three lines, three rows.
- *
- * Three statement-level set_loc transitions on the same file; each
- * straddles at least one emitted instruction. The W path checks all
- * three (file, line) pairs round-trip via line_to_addr / addr_to_line.
- * Verifies the line program advances PC and line monotonically. */
-void build_p02_line_monotone(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  u32 file_id = source_add_memory(ctx->c->sources, "p02.c");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  cgtest_set_loc(ctx, (SrcLoc){file_id, 1, 0});
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 1);
-
-  cgtest_set_loc(ctx, (SrcLoc){file_id, 2, 0});
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 2);
-
-  cgtest_set_loc(ctx, (SrcLoc){file_id, 3, 0});
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 42);
-
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* p03_line_repeat — same line on two distinct PCs.
- *
- * Two statement-level set_loc transitions onto (p03.c, 7) interleaved
- * with intervening emits at a different line. Per doc/DWARF.md §3.4 the
- * line program records a row whenever PC advances, even if the line
- * doesn't change; one round-trip directive is enough to assert the
- * binding survives. */
-void build_p03_line_repeat(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  u32 file_id = source_add_memory(ctx->c->sources, "p03.c");
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-
-  cgtest_set_loc(ctx, (SrcLoc){file_id, 7, 0});
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 1);
-
-  cgtest_set_loc(ctx, (SrcLoc){file_id, 8, 0});
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 2);
-
-  cgtest_set_loc(ctx, (SrcLoc){file_id, 7, 0});
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 42);
-
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* p05_func_pc_range — exercise the (low_pc, high_pc) bounds.
- *
- * Body is identical to p01; the directive set adds `pc_range` which
- * checks the subprogram's range covers more than one instruction (i.e.
- * cgtest_end's debug_func_pc_range handed off real bounds). */
-void build_p05_func_pc_range(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  u32 file_id = source_add_memory(ctx->c->sources, "p05.c");
-  SrcLoc loc = {file_id, 11, 0};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  cgtest_set_loc(ctx, loc);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  ctx->target->load_imm(ctx->target, REG_op(r, I32), 42);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* p07_local_loc — variable-location query.
- *
- * Allocates a single i32 local named "my_local", stores 42 into it, and
- * reloads before return. cgtest_local_named registers a DW_TAG_variable
- * with a DW_OP_fbreg location; the W path's `var` directive checks the
- * round-trip kind (frame) but accepts any encoded offset (`*`). The
- * frame_ofs passed here is a synthetic value — backends don't expose a
- * real fp-relative offset for a FrameSlot. */
-void build_p07_local_loc(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  u32 file_id = source_add_memory(ctx->c->sources, "p07.c");
-  SrcLoc loc = {file_id, 5, 0};
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  cgtest_set_loc(ctx, loc);
-
-  FrameSlot slot = cgtest_local_named(tf, I32, FSF_NONE, "my_local", loc, -8);
-  cgtest_store_local(tf, slot, IMM_op(42, I32), I32);
-
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), slot, I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_q.c b/test/cg/harness/cases_q.c
@@ -1,473 +0,0 @@
-/* Group Q — multi-function (extends Group B's two-function pattern).
- * See CORPUS.md for the case list and expected values.
- *
- * Group B already validates that two func_begin/func_end pairs work in
- * one TU. Group Q stresses what falls out as the function count grows:
- *   - many small helpers (8+ functions per TU)
- *   - mixed SB_GLOBAL / SB_LOCAL (file-static) linkage
- *   - distinct param/return signatures sharing a CGTarget
- *   - per-function text sections (-ffunction-sections analogue)
- *   - calls between functions placed in different text sections
- *   - forward-declared helpers defined later in the TU
- *
- * Each case constructs a flat call graph rooted at test_main; the oracle
- * is the final exit code. */
-
-#include "cg_test.h"
-
-/* ============================================================
- * Group Q: multi-function
- * ============================================================ */
-
-/* Helper: int return, no params, body returns IMM `v`. */
-static ObjSymId qfn_const(CgTestCtx* ctx, const char* name, i64 v,
-                          SymBind bind) {
-  const Type* I32 = T_i32(ctx);
-  Sym sname = pool_intern_cstr(ctx->pool, name);
-  ObjSymId sym = obj_symbol(ctx->ob, sname, bind, SK_FUNC, OBJ_SEC_NONE, 0, 0);
-  CgTestFn* tf = cgtest_begin_func_at(ctx, sym, I32, NULL, 0);
-  cgtest_ret_imm(tf, v, I32);
-  cgtest_end(tf);
-  return sym;
-}
-
-/* Helper: int echo(int x) — distinct symbol per case. */
-static ObjSymId qfn_echo(CgTestCtx* ctx, const char* name) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-  CgTestFn* tf = cgtest_begin_func(ctx, name, I32, params, 1);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r, I32), cgtest_param_slot(tf, 0), I32);
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-  return tf->sym;
-}
-
-/* q01_three_helpers — three int(void) helpers a/b/c each returning
- * a partial sum; main returns a()+b()+c() = 10+15+17 = 42. */
-void build_q01_three_helpers(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId a = qfn_const(ctx, "q01_a", 10, SB_GLOBAL);
-  ObjSymId b = qfn_const(ctx, "q01_b", 15, SB_GLOBAL);
-  ObjSymId c = qfn_const(ctx, "q01_c", 17, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg ra = T->alloc_reg(T, RC_INT, I32);
-  Reg rb = T->alloc_reg(T, RC_INT, I32);
-  Reg rc = T->alloc_reg(T, RC_INT, I32);
-  cgtest_call(tf, a, I32, NULL, NULL, 0, REG_op(ra, I32));
-  cgtest_call(tf, b, I32, NULL, NULL, 0, REG_op(rb, I32));
-  cgtest_call(tf, c, I32, NULL, NULL, 0, REG_op(rc, I32));
-
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(ra, I32), REG_op(rb, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(rc, I32));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-}
-
-/* q02_static_internal_linkage — `static int helper(void) { return 42; }`
- * SB_LOCAL symbol; the call lowers to a near branch resolved within
- * this TU (no PLT/GOT). */
-void build_q02_static_internal_linkage(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId h = qfn_const(ctx, "q02_helper", 42, SB_LOCAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_call(tf, h, I32, NULL, NULL, 0, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* q03_intra_tu_call_chain: a→b→c→d, where a/b/c each call the next and
- * return its result, and d returns 42. Built without CG_CALL_TAIL, so
- * the calls are ordinary (non-tail) and form a 4-deep linear call stack. */
-void build_q03_intra_tu_call_chain(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-
-  /* d: returns 42. */
-  ObjSymId d = qfn_const(ctx, "q03_d", 42, SB_GLOBAL);
-  /* c: returns d(); */
-  ObjSymId c;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q03_c", I32, NULL, 0);
-    Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-    cgtest_call(tf, d, I32, NULL, NULL, 0, REG_op(r, I32));
-    cgtest_ret_reg(tf, r, I32);
-    cgtest_end(tf);
-    c = tf->sym;
-  }
-  /* b: returns c(); */
-  ObjSymId b;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q03_b", I32, NULL, 0);
-    Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-    cgtest_call(tf, c, I32, NULL, NULL, 0, REG_op(r, I32));
-    cgtest_ret_reg(tf, r, I32);
-    cgtest_end(tf);
-    b = tf->sym;
-  }
-  /* a: returns b(); */
-  ObjSymId a;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q03_a", I32, NULL, 0);
-    Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-    cgtest_call(tf, b, I32, NULL, NULL, 0, REG_op(r, I32));
-    cgtest_ret_reg(tf, r, I32);
-    cgtest_end(tf);
-    a = tf->sym;
-  }
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_call(tf, a, I32, NULL, NULL, 0, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* q04_eight_helpers — eight int(int) helpers, each adding a constant.
- * Composing them in order yields 0 + 1+2+3+4+5+6+7+8 = 36. Plus a 6
- * baseline → 42. Stresses many func_begin/func_end pairs in one TU. */
-void build_q04_eight_helpers(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* params[] = {I32};
-
-  ObjSymId helpers[8];
-  for (int i = 0; i < 8; ++i) {
-    char name[16];
-    name[0] = 'q';
-    name[1] = '0';
-    name[2] = '4';
-    name[3] = '_';
-    name[4] = 'h';
-    name[5] = (char)('1' + i);
-    name[6] = 0;
-    Sym sn = pool_intern_cstr(ctx->pool, name);
-    ObjSymId sym =
-        obj_symbol(ctx->ob, sn, SB_GLOBAL, SK_FUNC, OBJ_SEC_NONE, 0, 0);
-    CgTestFn* tf = cgtest_begin_func_at(ctx, sym, I32, params, 1);
-    CGTarget* T = ctx->target;
-    Reg x = T->alloc_reg(T, RC_INT, I32);
-    cgtest_load_local(tf, REG_op(x, I32), cgtest_param_slot(tf, 0), I32);
-    Reg r = T->alloc_reg(T, RC_INT, I32);
-    T->binop(T, BO_IADD, REG_op(r, I32), REG_op(x, I32), IMM_op(i + 1, I32));
-    cgtest_ret_reg(tf, r, I32);
-    cgtest_end(tf);
-    helpers[i] = sym;
-  }
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-  /* Start at 6, then chain h1..h8. 6 + (1+2+...+8) = 6 + 36 = 42. */
-  Reg cur = T->alloc_reg(T, RC_INT, I32);
-  T->load_imm(T, REG_op(cur, I32), 6);
-  for (int i = 0; i < 8; ++i) {
-    Reg next = T->alloc_reg(T, RC_INT, I32);
-    CgTestArg args[] = {{.kind = CGT_ARG_REG, .type = I32, .v.reg = cur}};
-    cgtest_call(tf, helpers[i], I32, params, args, 1, REG_op(next, I32));
-    cur = next;
-  }
-  cgtest_ret_reg(tf, cur, I32);
-  cgtest_end(tf);
-}
-
-/* q05_distinct_signatures — four helpers with distinct (ret, params)
- * signatures, all called from main; sum truncated to 42.
- *   int   h_int (int);
- *   long  h_long(long, long);
- *   void  h_void(int*);
- *   int   h_zero(void);
- * Sum: 10 + 20 + 5 + 7 = 42 (low 32 of long sum). */
-void build_q05_distinct_signatures(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  const Type* I64 = T_i64(ctx);
-  const Type* PI32 = T_ptr(ctx, I32);
-  const Type* VOID = T_void(ctx);
-
-  /* h_int(x) = x + 5 */
-  const Type* p_int[] = {I32};
-  ObjSymId h_int;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q05_h_int", I32, p_int, 1);
-    CGTarget* T = ctx->target;
-    Reg x = T->alloc_reg(T, RC_INT, I32);
-    cgtest_load_local(tf, REG_op(x, I32), cgtest_param_slot(tf, 0), I32);
-    Reg r = T->alloc_reg(T, RC_INT, I32);
-    T->binop(T, BO_IADD, REG_op(r, I32), REG_op(x, I32), IMM_op(5, I32));
-    cgtest_ret_reg(tf, r, I32);
-    cgtest_end(tf);
-    h_int = tf->sym;
-  }
-  /* h_long(a, b) = a + b */
-  const Type* p_long[] = {I64, I64};
-  ObjSymId h_long;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q05_h_long", I64, p_long, 2);
-    CGTarget* T = ctx->target;
-    Reg a = T->alloc_reg(T, RC_INT, I64);
-    Reg b = T->alloc_reg(T, RC_INT, I64);
-    cgtest_load_local(tf, REG_op(a, I64), cgtest_param_slot(tf, 0), I64);
-    cgtest_load_local(tf, REG_op(b, I64), cgtest_param_slot(tf, 1), I64);
-    Reg s = T->alloc_reg(T, RC_INT, I64);
-    T->binop(T, BO_IADD, REG_op(s, I64), REG_op(a, I64), REG_op(b, I64));
-    cgtest_ret_reg(tf, s, I64);
-    cgtest_end(tf);
-    h_long = tf->sym;
-  }
-  /* h_void(p) { *p = 5; } */
-  const Type* p_void[] = {PI32};
-  ObjSymId h_void;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q05_h_void", VOID, p_void, 1);
-    CGTarget* T = ctx->target;
-    Reg p = T->alloc_reg(T, RC_INT, PI32);
-    cgtest_load_local(tf, REG_op(p, PI32), cgtest_param_slot(tf, 0), PI32);
-    MemAccess ma = {
-        .type = I32, .size = 4, .align = 4, .alias.kind = ALIAS_LOCAL};
-    T->store(T, IND_op(p, 0, I32), IMM_op(5, I32), ma);
-    cgtest_ret_void(tf);
-    cgtest_end(tf);
-    h_void = tf->sym;
-  }
-  /* h_zero(void) = 7 */
-  ObjSymId h_zero = qfn_const(ctx, "q05_h_zero", 7, SB_GLOBAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* h_int(5) = 10 */
-  Reg r_int = T->alloc_reg(T, RC_INT, I32);
-  CgTestArg a_int[] = {{.kind = CGT_ARG_IMM, .type = I32, .v.imm = 5}};
-  cgtest_call(tf, h_int, I32, p_int, a_int, 1, REG_op(r_int, I32));
-
-  /* h_long(8, 12) = 20 */
-  Reg r_long = T->alloc_reg(T, RC_INT, I64);
-  CgTestArg a_long[] = {
-      {.kind = CGT_ARG_IMM, .type = I64, .v.imm = 8},
-      {.kind = CGT_ARG_IMM, .type = I64, .v.imm = 12},
-  };
-  cgtest_call(tf, h_long, I64, p_long, a_long, 2, REG_op(r_long, I64));
-
-  /* int x; h_void(&x); — x is set to 5. */
-  FrameSlot xslot = cgtest_local(tf, I32, FSF_ADDR_TAKEN);
-  cgtest_store_local(tf, xslot, IMM_op(0, I32), I32);
-  Reg px = T->alloc_reg(T, RC_INT, PI32);
-  T->addr_of(T, REG_op(px, PI32), LOCAL_op(xslot, I32));
-  CgTestArg a_void[] = {{.kind = CGT_ARG_REG, .type = PI32, .v.reg = px}};
-  cgtest_call(tf, h_void, VOID, p_void, a_void, 1, IMM_op(0, VOID));
-  Reg r_void = T->alloc_reg(T, RC_INT, I32);
-  cgtest_load_local(tf, REG_op(r_void, I32), xslot, I32);
-
-  /* h_zero() = 7 */
-  Reg r_zero = T->alloc_reg(T, RC_INT, I32);
-  cgtest_call(tf, h_zero, I32, NULL, NULL, 0, REG_op(r_zero, I32));
-
-  /* sum = r_int + (i32)r_long + r_void + r_zero. */
-  Reg r_long_lo = T->alloc_reg(T, RC_INT, I32);
-  T->convert(T, CV_TRUNC, REG_op(r_long_lo, I32), REG_op(r_long, I64));
-
-  /* Accumulate into r_int rather than allocating a fresh sum reg — keeps
-   * concurrent Val count at 6, matching the x64 INT pool (RBX, R12..R15,
-   * R10). A 7th alloc would return REG_NONE here, and opt_cgtarget's
-   * replay alloc-on-first-use exhausts the same pool at -O1. */
-  T->binop(T, BO_IADD, REG_op(r_int, I32), REG_op(r_int, I32),
-           REG_op(r_long_lo, I32));
-  T->binop(T, BO_IADD, REG_op(r_int, I32), REG_op(r_int, I32),
-           REG_op(r_void, I32));
-  T->binop(T, BO_IADD, REG_op(r_int, I32), REG_op(r_int, I32),
-           REG_op(r_zero, I32));
-  cgtest_ret_reg(tf, r_int, I32);
-  cgtest_end(tf);
-}
-
-/* q06_function_section_distinct — helper placed in `.text.q06_helper`,
- * test_main in default `.text`. CGFuncDesc.text_section_id varies per
- * function; the backend must honor it. */
-void build_q06_function_section_distinct(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-
-  Sym sec_name = pool_intern_cstr(ctx->pool, ".text.q06_helper");
-  ObjSecId helper_sec =
-      obj_section(ctx->ob, sec_name, SEC_TEXT, SF_ALLOC | SF_EXEC, 4);
-  ObjSecId saved = ctx->text_sec;
-  ctx->text_sec = helper_sec;
-  ctx->mc->set_section(ctx->mc, helper_sec);
-
-  ObjSymId helper = qfn_const(ctx, "q06_helper", 42, SB_GLOBAL);
-
-  ctx->text_sec = saved;
-  ctx->mc->set_section(ctx->mc, saved);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_call(tf, helper, I32, NULL, NULL, 0, REG_op(r, I32));
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* q07_cross_section_calls — two helpers, each in its own
- * `.text.<name>`: q07_a calls q07_b, and test_main calls q07_a.
- * Caller and callee in distinct text sections must produce a CALL26
- * relocation (or veneer) rather than a fixed PC-relative offset. */
-void build_q07_cross_section_calls(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-
-  /* helper_a in .text.q07_a — returns helper_b() + 10. */
-  Sym sa = pool_intern_cstr(ctx->pool, ".text.q07_a");
-  Sym sb = pool_intern_cstr(ctx->pool, ".text.q07_b");
-  ObjSecId sec_a = obj_section(ctx->ob, sa, SEC_TEXT, SF_ALLOC | SF_EXEC, 4);
-  ObjSecId sec_b = obj_section(ctx->ob, sb, SEC_TEXT, SF_ALLOC | SF_EXEC, 4);
-
-  /* Forward-declare q07_b so q07_a can call it before it is defined. */
-  ObjSymId hb = cgtest_decl_func(ctx, "q07_b");
-
-  ObjSecId saved = ctx->text_sec;
-
-  /* helper_a body in sec_a. */
-  ctx->text_sec = sec_a;
-  ctx->mc->set_section(ctx->mc, sec_a);
-  ObjSymId ha;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q07_a", I32, NULL, 0);
-    CGTarget* T = ctx->target;
-    Reg r_b = T->alloc_reg(T, RC_INT, I32);
-    cgtest_call(tf, hb, I32, NULL, NULL, 0, REG_op(r_b, I32));
-    Reg r = T->alloc_reg(T, RC_INT, I32);
-    T->binop(T, BO_IADD, REG_op(r, I32), REG_op(r_b, I32), IMM_op(10, I32));
-    cgtest_ret_reg(tf, r, I32);
-    cgtest_end(tf);
-    ha = tf->sym;
-  }
-
-  /* helper_b body in sec_b — returns 32. */
-  ctx->text_sec = sec_b;
-  ctx->mc->set_section(ctx->mc, sec_b);
-  {
-    CgTestFn* tf = cgtest_begin_func_at(ctx, hb, I32, NULL, 0);
-    cgtest_ret_imm(tf, 32, I32);
-    cgtest_end(tf);
-  }
-
-  /* test_main back in default `.text`, calls helper_a → 42. */
-  ctx->text_sec = saved;
-  ctx->mc->set_section(ctx->mc, saved);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_call(tf, ha, I32, NULL, NULL, 0, REG_op(r, I32));
-  cgtest_ret_reg(tf, r, I32);
-  cgtest_end(tf);
-}
-
-/* q08_forward_decl_define_late — declare helper at the start, define it
- * after test_main. test_main's call site is emitted before the symbol
- * has a section/value; obj_finalize is responsible for resolving the
- * relocation once cgtest_begin_func_at fills in the symbol body. */
-void build_q08_forward_decl_define_late(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId h = cgtest_decl_func(ctx, "q08_late");
-
-  /* Emit test_main first — it calls h before h has a body. */
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_call(tf, h, I32, NULL, NULL, 0, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-
-  /* Now define h. */
-  {
-    CgTestFn* tf2 = cgtest_begin_func_at(ctx, h, I32, NULL, 0);
-    cgtest_ret_imm(tf2, 42, I32);
-    cgtest_end(tf2);
-  }
-}
-
-/* q09_helper_calls_helper — a → b, both globals; main calls a. */
-void build_q09_helper_calls_helper(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId b = qfn_const(ctx, "q09_b", 42, SB_GLOBAL);
-
-  /* a returns b(). */
-  ObjSymId a;
-  {
-    CgTestFn* tf = cgtest_begin_func(ctx, "q09_a", I32, NULL, 0);
-    Reg r = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-    cgtest_call(tf, b, I32, NULL, NULL, 0, REG_op(r, I32));
-    cgtest_ret_reg(tf, r, I32);
-    cgtest_end(tf);
-    a = tf->sym;
-  }
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  Reg dst = ctx->target->alloc_reg(ctx->target, RC_INT, I32);
-  cgtest_call(tf, a, I32, NULL, NULL, 0, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
-
-/* q10_global_and_static_mix — three helpers in one TU: SB_GLOBAL +
- * SB_LOCAL + SB_LOCAL. All three are called; sum = 12+15+15 = 42. */
-void build_q10_global_and_static_mix(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  ObjSymId g = qfn_const(ctx, "q10_global", 12, SB_GLOBAL);
-  ObjSymId s1 = qfn_const(ctx, "q10_static1", 15, SB_LOCAL);
-  ObjSymId s2 = qfn_const(ctx, "q10_static2", 15, SB_LOCAL);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  Reg rg = T->alloc_reg(T, RC_INT, I32);
-  Reg rs1 = T->alloc_reg(T, RC_INT, I32);
-  Reg rs2 = T->alloc_reg(T, RC_INT, I32);
-  cgtest_call(tf, g, I32, NULL, NULL, 0, REG_op(rg, I32));
-  cgtest_call(tf, s1, I32, NULL, NULL, 0, REG_op(rs1, I32));
-  cgtest_call(tf, s2, I32, NULL, NULL, 0, REG_op(rs2, I32));
-
-  Reg s = T->alloc_reg(T, RC_INT, I32);
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(rg, I32), REG_op(rs1, I32));
-  T->binop(T, BO_IADD, REG_op(s, I32), REG_op(s, I32), REG_op(rs2, I32));
-  cgtest_ret_reg(tf, s, I32);
-  cgtest_end(tf);
-}
-
-/* q11_addr_of_helper_through_global — store helper's address into a
- * data global, load it, indirect-call. Tests function-symbol relocation
- * into a non-text section (data ABS64) and indirect call via REG. */
-void build_q11_addr_of_helper_through_global(CgTestCtx* ctx) {
-  const Type* I32 = T_i32(ctx);
-  /* The helper. */
-  ObjSymId h = qfn_const(ctx, "q11_helper", 42, SB_GLOBAL);
-
-  /* Allocate a .data slot of pointer size with an ABS64 reloc to h. */
-  Sym dn = pool_intern_cstr(ctx->pool, ".data");
-  ObjSecId data_sec =
-      obj_section(ctx->ob, dn, SEC_DATA, SF_ALLOC | SF_WRITE, 8);
-  static const u8 ZERO8[8] = {0};
-  u32 dofs = obj_pos(ctx->ob, data_sec);
-  obj_write(ctx->ob, data_sec, ZERO8, 8);
-  obj_reloc(ctx->ob, data_sec, dofs, R_ABS64, h, 0);
-  Sym fn = pool_intern_cstr(ctx->pool, "q11_fp");
-  ObjSymId fp_sym =
-      obj_symbol(ctx->ob, fn, SB_GLOBAL, SK_OBJ, data_sec, dofs, 8);
-
-  CgTestFn* tf = cgtest_begin_main(ctx, I32);
-  CGTarget* T = ctx->target;
-
-  /* Load the function pointer from the global slot. */
-  const Type* fn_ty = type_func(ctx->pool, I32, NULL, 0, 0);
-  const Type* fnp_ty = T_ptr(ctx, fn_ty);
-  Reg fp = T->alloc_reg(T, RC_INT, fnp_ty);
-  MemAccess ma = {
-      .type = fnp_ty, .size = 8, .align = 8, .alias.kind = ALIAS_GLOBAL};
-  Operand addr = GLOBAL_op(fp_sym, 0);
-  addr.type = fnp_ty;
-  T->load(T, REG_op(fp, fnp_ty), addr, ma);
-
-  Reg dst = T->alloc_reg(T, RC_INT, I32);
-  cgtest_call_indirect(tf, fp, I32, NULL, NULL, 0, REG_op(dst, I32));
-  cgtest_ret_reg(tf, dst, I32);
-  cgtest_end(tf);
-}
diff --git a/test/cg/harness/cases_shared.c b/test/cg/harness/cases_shared.c
@@ -1,14 +0,0 @@
-/* Shared helpers across Group case files. See cases_shared.h. */
-
-#include "cases_shared.h"
-
-const Type* cases_pt_type(CgTestCtx* ctx) {
-  Sym tag = pool_intern_cstr(ctx->pool, "Pt");
-  TagId tid = type_tag_new(ctx->pool, TAG_STRUCT, tag, (SrcLoc){0, 0, 0});
-  TypeRecordBuilder* b = type_record_begin(ctx->pool, TY_STRUCT, tid, tag);
-  type_record_field(
-      b, (Field){.name = pool_intern_cstr(ctx->pool, "a"), .type = T_i32(ctx)});
-  type_record_field(
-      b, (Field){.name = pool_intern_cstr(ctx->pool, "b"), .type = T_i32(ctx)});
-  return type_record_end(ctx->pool, b);
-}
diff --git a/test/cg/harness/cases_shared.h b/test/cg/harness/cases_shared.h
@@ -1,17 +0,0 @@
-/* Helpers shared across more than one Group's case file.
- *
- * Per-case helpers and per-group struct types stay file-static in their
- * cases_<x>.c. This header is for the rare item that genuinely crosses a
- * group boundary — e.g., the 8-byte Pt struct used by both Group B's
- * sret/byval cases and Group F's copy_bytes case. Routing through one
- * shared definition keeps the type's TagId interned to a single id. */
-
-#ifndef CFREE_TEST_CG_CASES_SHARED_H
-#define CFREE_TEST_CG_CASES_SHARED_H
-
-#include "cg_test.h"
-
-/* struct Pt { int a; int b; }; — 8-byte two-i32 record. */
-const Type* cases_pt_type(CgTestCtx*);
-
-#endif
diff --git a/test/cg/harness/cg_check_dwarf.c b/test/cg/harness/cg_check_dwarf.c
@@ -1,429 +0,0 @@
-/* cg_check_dwarf — path W oracle for the test/cg harness.
- *
- *   cg_check_dwarf <obj_path>     # reads directives from stdin, one per line
- *
- * Directives (see cg_test.h for the contract):
- *
- *   line FILE LINE
- *       Some PC inside the object's text must map to (FILE, LINE) via
- *       cfree_dwarf_addr_to_line, and cfree_dwarf_line_to_addr(FILE, LINE)
- *       must return a PC that maps back to (FILE, LINE).
- *
- *   subprogram NAME
- *       cfree_dwarf_subprogram_at must report a non-empty pc range whose
- *       name equals NAME.
- *
- *   pc_range FILE LINE MIN_SIZE MAX_SIZE
- *       Resolve (FILE, LINE) -> pc, then call subprogram_at(pc) and
- *       require (high_pc - low_pc) to fall in [MIN_SIZE, MAX_SIZE]. This
- *       sanity-checks that debug_func_pc_range fed real bounds and
- *       neither under- nor over-flowed.
- *
- *   var PC NAME EXPECT_KIND EXPECT_VALUE
- *       cfree_dwarf_var_at(pc=PC, name=NAME) must succeed. EXPECT_KIND
- *       is one of: reg, frame, global. EXPECT_VALUE is parsed against
- *       the kind: an unsigned integer for reg / global, a signed integer
- *       for frame. The "*" wildcard accepts any value of that kind.
- *
- * Exit code: 0 if every directive passes; 1 if any directive fails or the
- * object (or its DWARF) cannot be opened; 2 on a usage error. Blank lines
- * and lines beginning with '#' are ignored. */
-
-#include <cfree.h>
-#include <fcntl.h>
-#include <stdarg.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/stat.h>
-#include <unistd.h>
-
-/* ---- env ---- */
-
-static void* h_alloc(CfreeHeap* h, size_t n, size_t a) {
-  (void)h;
-  (void)a;
-  return n ? malloc(n) : NULL;
-}
-static void* h_realloc(CfreeHeap* h, void* p, size_t o, size_t n, size_t a) {
-  (void)h;
-  (void)o;
-  (void)a;
-  return realloc(p, n);
-}
-static void h_free(CfreeHeap* h, void* p, size_t n) {
-  (void)h;
-  (void)n;
-  free(p);
-}
-static CfreeHeap g_heap = {h_alloc, h_realloc, h_free, NULL};
-
-static void diag_emit(CfreeDiagSink* s, CfreeDiagKind k, CfreeSrcLoc loc,
-                      const char* fmt, va_list ap) {
-  static const char* names[] = {"note", "warning", "error", "fatal"};
-  (void)s;
-  (void)loc;
-  fprintf(stderr, "%s: ", names[k]);
-  vfprintf(stderr, fmt, ap);
-  fputc('\n', stderr);
-}
-static CfreeDiagSink g_diag = {diag_emit, NULL, 0, 0};
-
-/* ---- file slurp ---- */
-
-static int slurp(const char* path, uint8_t** out, size_t* n_out) {
-  int fd = open(path, O_RDONLY);
-  if (fd < 0) {
-    perror(path);
-    return 1;
-  }
-  struct stat st;
-  if (fstat(fd, &st) != 0) {
-    perror("fstat");
-    close(fd);
-    return 1;
-  }
-  size_t n = (size_t)st.st_size;
-  uint8_t* buf = (uint8_t*)malloc(n);
-  if (!buf) {
-    close(fd);
-    return 1;
-  }
-  size_t off = 0;
-  while (off < n) {
-    ssize_t k = read(fd, buf + off, n - off);
-    if (k <= 0) {
-      perror("read");
-      close(fd);
-      free(buf);
-      return 1;
-    }
-    off += (size_t)k;
-  }
-  close(fd);
-  *out = buf;
-  *n_out = n;
-  return 0;
-}
-
-/* ---- directive checks ---- */
-
-typedef struct Ctx {
-  CfreeCompiler* cc;
-  CfreeDebugInfo* di;
-  int fails;
-} Ctx;
-
-static void fail(Ctx* c, const char* fmt, ...) {
-  va_list ap;
-  va_start(ap, fmt);
-  fputs("FAIL ", stdout);
-  vfprintf(stdout, fmt, ap);
-  fputc('\n', stdout);
-  va_end(ap);
-  c->fails++;
-}
-
-static void pass(const char* fmt, ...) {
-  va_list ap;
-  va_start(ap, fmt);
-  fputs("PASS ", stdout);
-  vfprintf(stdout, fmt, ap);
-  fputc('\n', stdout);
-  va_end(ap);
-}
-
-static void check_line(Ctx* c, const char* file, uint32_t line) {
-  uint64_t pc = 0;
-  if (cfree_dwarf_line_to_addr(c->di, file, line, &pc) != 0) {
-    fail(c, "line %s:%u — line_to_addr returned no PC", file, line);
-    return;
-  }
-  const char* got_file = NULL;
-  uint32_t got_line = 0, got_col = 0;
-  if (cfree_dwarf_addr_to_line(c->di, pc, &got_file, &got_line, &got_col) !=
-      0) {
-    fail(c, "line %s:%u — addr_to_line(0x%llx) returned no entry", file, line,
-         (unsigned long long)pc);
-    return;
-  }
-  if (!got_file || strcmp(got_file, file) != 0 || got_line != line) {
-    fail(c, "line %s:%u — round-tripped to %s:%u (pc=0x%llx)", file, line,
-         got_file ? got_file : "(null)", got_line, (unsigned long long)pc);
-    return;
-  }
-  pass("line %s:%u (pc=0x%llx)", file, line, (unsigned long long)pc);
-}
-
-static void check_pc_range(Ctx* c, const char* file, uint32_t line,
-                           uint64_t min_size, uint64_t max_size) {
-  uint64_t pc = 0;
-  if (cfree_dwarf_line_to_addr(c->di, file, line, &pc) != 0) {
-    fail(c, "pc_range %s:%u — line_to_addr returned no PC", file, line);
-    return;
-  }
-  CfreeDwarfSubprogram sp;
-  if (cfree_dwarf_subprogram_at(c->di, pc, &sp) != 0) {
-    fail(c, "pc_range %s:%u — subprogram_at(0x%llx) returned no entry", file,
-         line, (unsigned long long)pc);
-    return;
-  }
-  if (sp.high_pc <= sp.low_pc) {
-    fail(c, "pc_range %s:%u — empty pc range [0x%llx, 0x%llx)", file, line,
-         (unsigned long long)sp.low_pc, (unsigned long long)sp.high_pc);
-    return;
-  }
-  uint64_t size = sp.high_pc - sp.low_pc;
-  if (size < min_size || size > max_size) {
-    fail(c, "pc_range %s:%u — size %llu not in [%llu, %llu]", file, line,
-         (unsigned long long)size, (unsigned long long)min_size,
-         (unsigned long long)max_size);
-    return;
-  }
-  pass("pc_range %s:%u size=%llu", file, line, (unsigned long long)size);
-}
-
-static const char* loc_kind_str(CfreeDwarfLocKind k) {
-  switch (k) {
-    case CFREE_DLOC_REG:
-      return "reg";
-    case CFREE_DLOC_FRAME_OFS:
-      return "frame";
-    case CFREE_DLOC_GLOBAL:
-      return "global";
-    case CFREE_DLOC_EXPR:
-      return "expr";
-  }
-  return "?";
-}
-
-static void check_var(Ctx* c, uint64_t pc, const char* name,
-                      const char* expect_kind, const char* expect_value) {
-  CfreeDwarfVarLoc loc;
-  memset(&loc, 0, sizeof loc);
-  if (cfree_dwarf_var_at(c->di, pc, name, &loc) != 0) {
-    fail(c, "var 0x%llx %s — var_at returned no entry", (unsigned long long)pc,
-         name);
-    return;
-  }
-
-  CfreeDwarfLocKind want;
-  if (strcmp(expect_kind, "reg") == 0)
-    want = CFREE_DLOC_REG;
-  else if (strcmp(expect_kind, "frame") == 0)
-    want = CFREE_DLOC_FRAME_OFS;
-  else if (strcmp(expect_kind, "global") == 0)
-    want = CFREE_DLOC_GLOBAL;
-  else {
-    fail(c, "var %s — unknown expect_kind %s", name, expect_kind);
-    return;
-  }
-  if (loc.kind != want) {
-    fail(c, "var %s — kind %s, expected %s", name, loc_kind_str(loc.kind),
-         expect_kind);
-    return;
-  }
-
-  if (strcmp(expect_value, "*") != 0) {
-    if (want == CFREE_DLOC_REG) {
-      uint32_t want_r = (uint32_t)strtoul(expect_value, NULL, 0);
-      if (loc.v.reg != want_r) {
-        fail(c, "var %s — reg %u, expected %u", name, loc.v.reg, want_r);
-        return;
-      }
-    } else if (want == CFREE_DLOC_FRAME_OFS) {
-      int32_t want_o = (int32_t)strtol(expect_value, NULL, 0);
-      if (loc.v.frame_ofs != want_o) {
-        fail(c, "var %s — frame_ofs %d, expected %d", name, loc.v.frame_ofs,
-             want_o);
-        return;
-      }
-    } else if (want == CFREE_DLOC_GLOBAL) {
-      uint64_t want_g = strtoull(expect_value, NULL, 0);
-      if (loc.v.global != want_g) {
-        fail(c, "var %s — global 0x%llx, expected 0x%llx", name,
-             (unsigned long long)loc.v.global, (unsigned long long)want_g);
-        return;
-      }
-    }
-  }
-  pass("var %s kind=%s", name, expect_kind);
-}
-
-static void check_subprogram(Ctx* c, const char* name) {
-  /* No "find subprogram by name" entry exists in cfree_dwarf_*; we have
-   * subprogram_at(pc, ...). Walk a small probe range starting at 0 and
-   * accept the first hit whose name matches. This is a stopgap that
-   * keeps the directive vocabulary stable; once a name-keyed query
-   * lands we'll switch to it. */
-  CfreeDwarfSubprogram sp;
-  for (uint64_t pc = 0; pc < 0x10000ull; pc += 4) {
-    if (cfree_dwarf_subprogram_at(c->di, pc, &sp) != 0) continue;
-    if (sp.name && strcmp(sp.name, name) == 0) {
-      if (sp.high_pc <= sp.low_pc) {
-        fail(c, "subprogram %s — empty pc range [0x%llx, 0x%llx)", name,
-             (unsigned long long)sp.low_pc, (unsigned long long)sp.high_pc);
-        return;
-      }
-      pass("subprogram %s [0x%llx, 0x%llx)", name,
-           (unsigned long long)sp.low_pc, (unsigned long long)sp.high_pc);
-      return;
-    }
-  }
-  fail(c, "subprogram %s — not found in first 64KB of text", name);
-}
-
-static void run_directive(Ctx* c, char* line) {
-  while (*line == ' ' || *line == '\t') line++;
-  if (*line == '\0' || *line == '\n' || *line == '#') return;
-
-  /* strip trailing newline */
-  size_t n = strlen(line);
-  while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r')) line[--n] = 0;
-
-  char* sp = strchr(line, ' ');
-  if (!sp) {
-    fail(c, "bad directive: %s", line);
-    return;
-  }
-  *sp = 0;
-  const char* op = line;
-  char* rest = sp + 1;
-
-  if (strcmp(op, "line") == 0) {
-    char* sp2 = strchr(rest, ' ');
-    if (!sp2) {
-      fail(c, "line: expected FILE LINE");
-      return;
-    }
-    *sp2 = 0;
-    const char* file = rest;
-    long ln = strtol(sp2 + 1, NULL, 10);
-    if (ln <= 0) {
-      fail(c, "line: bad line number");
-      return;
-    }
-    check_line(c, file, (uint32_t)ln);
-  } else if (strcmp(op, "subprogram") == 0) {
-    check_subprogram(c, rest);
-  } else if (strcmp(op, "pc_range") == 0) {
-    /* pc_range FILE LINE MIN_SIZE MAX_SIZE */
-    char* tok[4];
-    int ntok = 0;
-    char* p = rest;
-    while (ntok < 4) {
-      tok[ntok++] = p;
-      char* nxt = strchr(p, ' ');
-      if (!nxt) break;
-      *nxt = 0;
-      p = nxt + 1;
-    }
-    if (ntok != 4) {
-      fail(c, "pc_range: expected FILE LINE MIN_SIZE MAX_SIZE");
-      return;
-    }
-    const char* file = tok[0];
-    long ln = strtol(tok[1], NULL, 10);
-    unsigned long long mn = strtoull(tok[2], NULL, 0);
-    unsigned long long mx = strtoull(tok[3], NULL, 0);
-    if (ln <= 0) {
-      fail(c, "pc_range: bad line number");
-      return;
-    }
-    check_pc_range(c, file, (uint32_t)ln, mn, mx);
-  } else if (strcmp(op, "var") == 0) {
-    /* var PC NAME EXPECT_KIND EXPECT_VALUE */
-    char* tok[4];
-    int ntok = 0;
-    char* p = rest;
-    while (ntok < 4) {
-      tok[ntok++] = p;
-      char* nxt = strchr(p, ' ');
-      if (!nxt) break;
-      *nxt = 0;
-      p = nxt + 1;
-    }
-    if (ntok != 4) {
-      fail(c, "var: expected PC NAME EXPECT_KIND EXPECT_VALUE");
-      return;
-    }
-    uint64_t pc = strtoull(tok[0], NULL, 0);
-    check_var(c, pc, tok[1], tok[2], tok[3]);
-  } else {
-    fail(c, "unknown directive: %s", op);
-  }
-}
-
-/* ---- main ---- */
-
-int main(int argc, char** argv) {
-  if (argc != 2) {
-    fprintf(stderr, "usage: cg_check_dwarf <obj_path>\n");
-    return 2;
-  }
-  const char* obj_path = argv[1];
-
-  /* Slurp the obj. */
-  uint8_t* bytes = NULL;
-  size_t nbytes = 0;
-  if (slurp(obj_path, &bytes, &nbytes) != 0) return 1;
-
-  CfreeTarget target;
-  memset(&target, 0, sizeof target);
-  target.arch = CFREE_ARCH_ARM_64;
-  target.os = CFREE_OS_LINUX;
-  target.obj = CFREE_OBJ_ELF;
-  target.ptr_size = 8;
-  target.ptr_align = 8;
-  CfreeEnv env;
-  memset(&env, 0, sizeof env);
-  env.heap = &g_heap;
-  env.diag = &g_diag;
-  env.now = -1;
-
-  CfreeCompiler* cc = cfree_compiler_new(target, &env);
-  if (!cc) {
-    fprintf(stderr, "cg_check_dwarf: compiler_new failed\n");
-    free(bytes);
-    return 1;
-  }
-
-  CfreeBytesInput in;
-  memset(&in, 0, sizeof in);
-  in.name = obj_path;
-  in.data = bytes;
-  in.len = nbytes;
-  CfreeObjFile* obj = cfree_obj_open(&env, &in);
-  if (!obj) {
-    fprintf(stderr, "cg_check_dwarf: cannot open %s as object\n", obj_path);
-    cfree_compiler_free(cc);
-    free(bytes);
-    return 1;
-  }
-
-  CfreeDebugInfo* di = cfree_dwarf_open(cc, obj);
-  if (!di) {
-    fprintf(stderr,
-            "cg_check_dwarf: %s has no DWARF (cfree_dwarf_open returned "
-            "NULL)\n",
-            obj_path);
-    cfree_obj_close(obj);
-    cfree_compiler_free(cc);
-    free(bytes);
-    return 1;
-  }
-
-  Ctx ctx = {cc, di, 0};
-
-  /* Stream directives from stdin. */
-  char buf[1024];
-  while (fgets(buf, sizeof buf, stdin)) {
-    run_directive(&ctx, buf);
-  }
-
-  cfree_dwarf_close(di);
-  cfree_obj_close(obj);
-  cfree_compiler_free(cc);
-  free(bytes);
-  return ctx.fails ? 1 : 0;
-}
diff --git a/test/cg/harness/cg_runner.c b/test/cg/harness/cg_runner.c
@@ -1,657 +0,0 @@
-/* cg_runner — multi-mode test runner for the cg/CGTarget/MCEmitter stack.
- *
- *   cg-runner --list                     # print every registered case name
- *   cg-runner --expected NAME            # print expected exit code (stdout)
- *   cg-runner --emit NAME OUT.o          # build, emit_elf, write to OUT.o
- *   cg-runner --jit  NAME                # build, link, JIT, call test_main;
- *                                        # exit code = test_main's return
- *
- * The --jit path uses link_add_obj on the in-process ObjBuilder, so it
- * exercises the live OB → JIT mapping (no .o serialization). The --emit
- * path produces a .o that the existing test/link harness binaries
- * (cfree-roundtrip, link-exe-runner, jit-runner) consume to drive paths
- * R, E, and J. The shell harness compares the exit codes of those runs
- * against the value reported by --expected. */
-
-#include <cfree.h>
-#include <fcntl.h>
-#include <stdarg.h>
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-#include <sys/mman.h>
-#include <sys/stat.h>
-#include <unistd.h>
-
-#include "abi/abi.h"
-#include "arch/arch.h"
-#include "cg_test.h"
-#include "core/core.h"
-#include "core/pool.h"
-#include "debug/debug.h"
-#include "lib/cfree_test_target.h"
-#include "link/link.h"
-#include "obj/obj.h"
-#include "opt/opt.h"
-#include "type/type.h"
-
-/* --opt-level N: wrap the constructed CGTarget with opt_cgtarget_new(level)
- * before each case runs. 0 (default) drives the backend directly; 1 / 2
- * exercise the opt pipeline. The corpus is the equivalence oracle — every
- * case's exit code at level 0 must match levels 1 / 2. */
-static int g_opt_level = 0;
-
-/* ---- env ---- */
-
-static void* h_alloc(CfreeHeap* h, size_t n, size_t a) {
-  (void)h;
-  (void)a;
-  return n ? malloc(n) : NULL;
-}
-static void* h_realloc(CfreeHeap* h, void* p, size_t o, size_t n, size_t a) {
-  (void)h;
-  (void)o;
-  (void)a;
-  return realloc(p, n);
-}
-static void h_free(CfreeHeap* h, void* p, size_t n) {
-  (void)h;
-  (void)n;
-  free(p);
-}
-static CfreeHeap g_heap = {h_alloc, h_realloc, h_free, NULL};
-
-static void diag_emit(CfreeDiagSink* s, CfreeDiagKind k, CfreeSrcLoc loc,
-                      const char* fmt, va_list ap) {
-  static const char* names[] = {"note", "warning", "error", "fatal"};
-  (void)s;
-  (void)loc;
-  fprintf(stderr, "%s: ", names[k]);
-  vfprintf(stderr, fmt, ap);
-  fputc('\n', stderr);
-}
-static CfreeDiagSink g_diag = {diag_emit, NULL, 0, 0};
-
-/* posix-backed CfreeExecMem for the JIT path. Mirrors driver/env.c — see
- * that file for the strict-W^X dual-mapping rationale. Apple uses
- * mach_vm_remap; Linux uses memfd_create + dual mmap; other POSIX falls
- * back to a single mapping with mprotect transitions. */
-#if defined(__APPLE__)
-#include <mach/mach.h>
-#include <mach/mach_vm.h>
-#define XM_DUAL_APPLE 1
-#else
-#define XM_DUAL_APPLE 0
-#endif
-#if defined(__linux__)
-#include <sys/syscall.h>
-#define XM_DUAL_LINUX 1
-#else
-#define XM_DUAL_LINUX 0
-#endif
-static int xm_to_posix(int p) {
-  int q = 0;
-  if (p & CFREE_PROT_READ) q |= PROT_READ;
-  if (p & CFREE_PROT_WRITE) q |= PROT_WRITE;
-  if (p & CFREE_PROT_EXEC) q |= PROT_EXEC;
-  return q;
-}
-typedef struct XmTok {
-  void* w;
-  void* r;
-  size_t n;
-} XmTok;
-static int xm_reserve_single(size_t n, CfreeExecMemRegion* out) {
-  void* p =
-      mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
-  if (p == MAP_FAILED) return 1;
-  out->write = out->runtime = p;
-  out->size = n;
-  out->token = NULL;
-  return 0;
-}
-static int xm_reserve(void* u, size_t n, int p, CfreeExecMemRegion* out) {
-  (void)u;
-  if (!out || !n) return 1;
-  if (!(p & CFREE_PROT_EXEC)) return xm_reserve_single(n, out);
-#if XM_DUAL_APPLE
-  {
-    void* w =
-        mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
-    mach_vm_address_t r = 0;
-    vm_prot_t cur = 0, max = 0;
-    XmTok* tok;
-    if (w == MAP_FAILED) return 1;
-    if (mach_vm_remap(mach_task_self(), &r, (mach_vm_size_t)n, 0,
-                      VM_FLAGS_ANYWHERE, mach_task_self(),
-                      (mach_vm_address_t)(uintptr_t)w, FALSE, &cur, &max,
-                      VM_INHERIT_NONE) != KERN_SUCCESS) {
-      munmap(w, n);
-      return 1;
-    }
-    if (mprotect((void*)(uintptr_t)r, n, PROT_READ) != 0) {
-      munmap((void*)(uintptr_t)r, n);
-      munmap(w, n);
-      return 1;
-    }
-    tok = (XmTok*)malloc(sizeof(*tok));
-    if (!tok) {
-      munmap((void*)(uintptr_t)r, n);
-      munmap(w, n);
-      return 1;
-    }
-    tok->w = w;
-    tok->r = (void*)(uintptr_t)r;
-    tok->n = n;
-    out->write = w;
-    out->runtime = (void*)(uintptr_t)r;
-    out->size = n;
-    out->token = tok;
-    return 0;
-  }
-#elif XM_DUAL_LINUX
-  {
-    int fd = (int)syscall(SYS_memfd_create, "cfree-jit-test", 0u);
-    void *w, *r;
-    XmTok* tok;
-    if (fd < 0) return 1;
-    if (ftruncate(fd, (off_t)n) != 0) {
-      close(fd);
-      return 1;
-    }
-    w = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
-    if (w == MAP_FAILED) {
-      close(fd);
-      return 1;
-    }
-    r = mmap(NULL, n, PROT_READ, MAP_SHARED, fd, 0);
-    close(fd);
-    if (r == MAP_FAILED) {
-      munmap(w, n);
-      return 1;
-    }
-    tok = (XmTok*)malloc(sizeof(*tok));
-    if (!tok) {
-      munmap(r, n);
-      munmap(w, n);
-      return 1;
-    }
-    tok->w = w;
-    tok->r = r;
-    tok->n = n;
-    out->write = w;
-    out->runtime = r;
-    out->size = n;
-    out->token = tok;
-    return 0;
-  }
-#else
-  return xm_reserve_single(n, out);
-#endif
-}
-static int xm_protect(void* u, void* a, size_t n, int p) {
-  (void)u;
-  return mprotect(a, n, xm_to_posix(p));
-}
-static void xm_release(void* u, CfreeExecMemRegion* region) {
-  (void)u;
-  if (!region || !region->size) return;
-  if (region->token) {
-    XmTok* tok = (XmTok*)region->token;
-    if (tok->r && tok->r != tok->w) munmap(tok->r, tok->n);
-    if (tok->w) munmap(tok->w, tok->n);
-    free(tok);
-  } else if (region->write) {
-    munmap(region->write, region->size);
-  }
-  region->write = region->runtime = NULL;
-  region->size = 0;
-  region->token = NULL;
-}
-static void xm_flush(void* u, void* a, size_t n) {
-  (void)u;
-#if defined(__aarch64__) || defined(__arm__)
-  __builtin___clear_cache((char*)a, (char*)a + n);
-#else
-  (void)a;
-  (void)n;
-#endif
-}
-static CfreeExecMem g_execmem = {
-    16 * 1024, xm_reserve, xm_protect, xm_release, xm_flush, NULL,
-};
-
-/* ---- helpers ---- */
-
-static const CgCase* find_case(const char* name) {
-  for (unsigned i = 0; i < cg_cases_count; ++i) {
-    if (strcmp(cg_cases[i].name, name) == 0) return &cg_cases[i];
-  }
-  return NULL;
-}
-
-static void target_from_env(CfreeTarget* t) {
-  if (cfree_test_target_init(t) != 0) {
-    fprintf(stderr, "cg-runner: cfree_test_target_init failed\n");
-    exit(2);
-  }
-}
-
-/* Has this case registered any path-W DWARF directives? Used to decide
- * whether to construct a Debug producer for the build. */
-static int case_wants_dwarf(const char* name) {
-  for (unsigned i = 0; i < cg_dwarf_checks_count; ++i) {
-    if (strcmp(cg_dwarf_checks[i].case_name, name) == 0) return 1;
-  }
-  return 0;
-}
-
-/* State for building one case. build_case() returns 0 on success with
- * st->ob finalized; on panic it returns nonzero (the diagnostic was
- * already emitted). */
-typedef struct BuildState {
-  Compiler* c;
-  ObjBuilder* ob;
-  MCEmitter* mc;
-  CGTarget* target;
-  Debug* debug;
-  CgTestCtx ctx;
-} BuildState;
-
-static int build_case(BuildState* st, const CgCase* cc) {
-  Compiler* c = st->c;
-
-  if (setjmp(c->panic)) {
-    compiler_run_cleanups(c);
-    return 1;
-  }
-
-  st->ob = obj_new(c);
-  st->mc = mc_new(c, st->ob);
-
-  if (cc->kind != CG_CASE_MC_ONLY) {
-    st->target = cgtarget_new(c, st->ob, st->mc);
-    if (g_opt_level > 0) {
-      st->target = opt_cgtarget_new(c, st->target, g_opt_level);
-    }
-  } else {
-    st->target = NULL;
-  }
-
-  /* Construct a Debug producer for cases that register W-path directives.
-   * The harness is the parser stand-in per doc/DWARF.md §3.1; it owns
-   * Class-1 (debug_func_begin) and Class-3 (debug_func_pc_range) calls,
-   * dispatched from cgtest_begin_func / cgtest_end. The backend's
-   * Class-2 line-row fanout is reached through the Debug pointer we hand
-   * to MCEmitter and CGTarget below. */
-  if (case_wants_dwarf(cc->name) && st->target) {
-    st->debug = debug_new(c, st->ob);
-    st->mc->debug = st->debug;
-    st->target->debug = st->debug;
-  } else {
-    st->debug = NULL;
-  }
-
-  Sym text_name = pool_intern_cstr(c->global, ".text");
-  ObjSecId text_sec =
-      obj_section(st->ob, text_name, SEC_TEXT, SF_ALLOC | SF_EXEC, 4);
-
-  st->ctx.c = c;
-  st->ctx.ob = st->ob;
-  st->ctx.mc = st->mc;
-  st->ctx.target = st->target;
-  st->ctx.text_sec = text_sec;
-  st->ctx.pool = c->global;
-  st->ctx.debug = st->debug;
-
-  if (st->target) {
-    st->mc->set_section(st->mc, text_sec);
-  }
-
-  cc->build(&st->ctx);
-
-  if (st->target) cgtarget_finalize(st->target);
-  /* debug_emit must run after the backend has finished writing text but
-   * before obj_finalize, per doc/DWARF.md §3 / debug.h contract. */
-  if (st->debug) debug_emit(st->debug);
-  obj_finalize(st->ob);
-  return 0;
-}
-
-/* ---- modes ---- */
-
-static int mode_list(void) {
-  for (unsigned i = 0; i < cg_cases_count; ++i) {
-    fprintf(stdout, "%s\n", cg_cases[i].name);
-  }
-  return 0;
-}
-
-static int mode_expected(const char* name) {
-  const CgCase* cc = find_case(name);
-  if (!cc) {
-    fprintf(stderr, "cg-runner: unknown case '%s'\n", name);
-    return 2;
-  }
-  fprintf(stdout, "%d\n", cc->expected);
-  return 0;
-}
-
-/* --arches NAME — print one arch token per line for the named case.
- * Used by test/cg/run.sh to decide which exec_target backend to dispatch
- * path E through. Empty/zero arches in the registry mean CG_ARCH_DEFAULT
- * (aarch64 today). */
-static int mode_arches(const char* name) {
-  const CgCase* cc = find_case(name);
-  if (!cc) {
-    fprintf(stderr, "cg-runner: unknown case '%s'\n", name);
-    return 2;
-  }
-  unsigned arches = cc->arches ? cc->arches : (unsigned)CG_ARCH_DEFAULT;
-  if (arches & CG_ARCH_AARCH64) fputs("aarch64\n", stdout);
-  if (arches & CG_ARCH_X64)     fputs("x64\n", stdout);
-  if (arches & CG_ARCH_RV64)    fputs("rv64\n", stdout);
-  return 0;
-}
-
-/* CfreeWriter that wraps stdout; used by --dump-tape. */
-typedef struct StdoutWriter {
-  CfreeWriter base;
-} StdoutWriter;
-
-static void sw_write(CfreeWriter* w, const void* data, size_t n) {
-  (void)w;
-  fwrite(data, 1, n, stdout);
-}
-static void sw_seek(CfreeWriter* w, uint64_t off) {
-  (void)w;
-  (void)off;
-}
-static uint64_t sw_tell(CfreeWriter* w) {
-  (void)w;
-  return 0;
-}
-static int sw_error(CfreeWriter* w) {
-  (void)w;
-  return 0;
-}
-static void sw_close(CfreeWriter* w) { (void)w; }
-
-static StdoutWriter g_stdout_writer = {{sw_write, sw_seek, sw_tell, sw_error,
-                                        sw_close}};
-
-/* --dump-tape NAME — build the case at the current --opt-level (must be
- * >= 1) and print each function's recorded tape to stdout instead of
- * just running the equivalence path. Useful for ad-hoc inspection and
- * golden-file diffs. */
-static int mode_dump_tape(const char* name) {
-  const CgCase* cc = find_case(name);
-  if (!cc) {
-    fprintf(stderr, "cg-runner: unknown case '%s'\n", name);
-    return 2;
-  }
-  if (g_opt_level < 1) {
-    fprintf(stderr, "cg-runner: --dump-tape requires --opt-level >= 1\n");
-    return 2;
-  }
-
-  CfreeTarget target;
-  target_from_env(&target);
-  CfreeEnv env;
-  memset(&env, 0, sizeof env);
-  env.heap = &g_heap;
-  env.diag = &g_diag;
-  env.execmem = &g_execmem;
-  env.now = -1;
-
-  CfreeCompiler* cc_ = cfree_compiler_new(target, &env);
-  if (!cc_) return 2;
-
-  BuildState st;
-  memset(&st, 0, sizeof st);
-  st.c = (Compiler*)cc_;
-
-  /* Pre-empt build_case so we can install the dump writer before the
-   * case runs through func_begin/func_end. */
-  Compiler* c = st.c;
-  if (setjmp(c->panic)) {
-    compiler_run_cleanups(c);
-    cfree_compiler_free(cc_);
-    return 1;
-  }
-  st.ob = obj_new(c);
-  st.mc = mc_new(c, st.ob);
-  st.target = cgtarget_new(c, st.ob, st.mc);
-  st.target = opt_cgtarget_new(c, st.target, g_opt_level);
-  opt_set_dump_writer(st.target, &g_stdout_writer.base);
-
-  Sym text_name = pool_intern_cstr(c->global, ".text");
-  ObjSecId text_sec =
-      obj_section(st.ob, text_name, SEC_TEXT, SF_ALLOC | SF_EXEC, 4);
-
-  st.ctx.c = c;
-  st.ctx.ob = st.ob;
-  st.ctx.mc = st.mc;
-  st.ctx.target = st.target;
-  st.ctx.text_sec = text_sec;
-  st.ctx.pool = c->global;
-  st.ctx.debug = NULL;
-  st.mc->set_section(st.mc, text_sec);
-  cc->build(&st.ctx);
-  cgtarget_finalize(st.target);
-
-  cfree_compiler_free(cc_);
-  return 0;
-}
-
-/* --dwarf-checks NAME — print the W-path directive blob registered for
- * NAME, or nothing if the case has no DWARF checks. The shell harness
- * pipes this into cg_check_dwarf <obj>. */
-static int mode_dwarf_checks(const char* name) {
-  for (unsigned i = 0; i < cg_dwarf_checks_count; ++i) {
-    if (strcmp(cg_dwarf_checks[i].case_name, name) == 0) {
-      fputs(cg_dwarf_checks[i].directives, stdout);
-      return 0;
-    }
-  }
-  return 0; /* not registered → empty stdout, harness skips W */
-}
-
-static int mode_emit(const char* name, const char* out_path) {
-  const CgCase* cc = find_case(name);
-  if (!cc) {
-    fprintf(stderr, "cg-runner: unknown case '%s'\n", name);
-    return 2;
-  }
-
-  CfreeTarget target;
-  target_from_env(&target);
-  CfreeEnv env;
-  memset(&env, 0, sizeof env);
-  env.heap = &g_heap;
-  env.diag = &g_diag;
-  env.execmem = &g_execmem;
-  env.now = -1;
-
-  CfreeCompiler* cc_ = cfree_compiler_new(target, &env);
-  if (!cc_) {
-    fprintf(stderr, "cg-runner: compiler_new failed\n");
-    return 2;
-  }
-
-  BuildState st;
-  memset(&st, 0, sizeof st);
-  st.c = (Compiler*)cc_;
-  if (build_case(&st, cc)) {
-    cfree_compiler_free(cc_);
-    return 1;
-  }
-
-  /* Emit ELF to a memory writer, then dump to OUT_PATH. */
-  CfreeWriter* w = cfree_writer_mem(&g_heap);
-  emit_elf(st.c, st.ob, w);
-
-  size_t len = 0;
-  const uint8_t* data = cfree_writer_mem_bytes(w, &len);
-
-  int rc = 0;
-  int fd = open(out_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
-  if (fd < 0) {
-    perror(out_path);
-    rc = 2;
-  } else {
-    size_t off = 0;
-    while (off < len) {
-      ssize_t k = write(fd, data + off, len - off);
-      if (k <= 0) {
-        perror("write");
-        rc = 2;
-        break;
-      }
-      off += (size_t)k;
-    }
-    close(fd);
-  }
-  cfree_writer_close(w);
-  cfree_compiler_free(cc_);
-  return rc;
-}
-
-static int mode_jit(const char* name) {
-  const CgCase* cc = find_case(name);
-  if (!cc) {
-    fprintf(stderr, "cg-runner: unknown case '%s'\n", name);
-    return 2;
-  }
-
-  CfreeTarget target;
-  target_from_env(&target);
-  CfreeEnv env;
-  memset(&env, 0, sizeof env);
-  env.heap = &g_heap;
-  env.diag = &g_diag;
-  env.execmem = &g_execmem;
-  env.now = -1;
-
-  CfreeCompiler* cc_ = cfree_compiler_new(target, &env);
-  if (!cc_) {
-    fprintf(stderr, "cg-runner: compiler_new failed\n");
-    return 2;
-  }
-
-  BuildState st;
-  memset(&st, 0, sizeof st);
-  st.c = (Compiler*)cc_;
-  if (build_case(&st, cc)) {
-    cfree_compiler_free(cc_);
-    return 1;
-  }
-
-  /* Direct in-process link: hand the ObjBuilder to the linker. */
-  Compiler* c = st.c;
-  if (setjmp(c->panic)) {
-    compiler_run_cleanups(c);
-    cfree_compiler_free(cc_);
-    return 1;
-  }
-
-  Linker* lk = link_new(c);
-  link_add_obj(lk, st.ob);
-  link_set_entry(lk, "test_main");
-  LinkImage* img = link_resolve(lk);
-  if (!img) {
-    link_free(lk);
-    cfree_compiler_free(cc_);
-    return 1;
-  }
-  CfreeJit* jit = cfree_jit_from_image(img);
-  if (!jit) {
-    link_free(lk);
-    cfree_compiler_free(cc_);
-    return 1;
-  }
-
-  int (*fn)(void) = (int (*)(void))cfree_jit_lookup(jit, "test_main");
-
-  /* AArch64 TLS Local-Exec setup, mirroring jit_runner.c. Build a
-   * thread-local image (16-byte TCB + .tdata copy + .tbss zero-fill) and
-   * point TPIDR_EL0 at it just before invoking test_main. On Darwin,
-   * libc functions clobber TPIDR_EL0 (probably via dyld stub binding /
-   * locale TSD), so msr → call() must be back-to-back with NO libc
-   * invocations between. */
-#if defined(__aarch64__) || defined(__arm64__)
-  static char tls_block[8192] __attribute__((aligned(16)));
-  {
-    char* td_start = (char*)cfree_jit_lookup(jit, "__tdata_start");
-    char* td_end = (char*)cfree_jit_lookup(jit, "__tdata_end");
-    unsigned long bs_n =
-        (unsigned long)(unsigned long long)cfree_jit_lookup(jit, "__tbss_size");
-    if (td_start && td_end) {
-      unsigned long td_n = (unsigned long)(td_end - td_start);
-      unsigned long i;
-      /* Plain loops at -O0 stay loops; do NOT use memcpy/memset
-       * here — those go through dyld's stub binder on first call
-       * and clobber TPIDR_EL0. */
-      for (i = 0; i < td_n; ++i) tls_block[16 + i] = td_start[i];
-      for (i = 0; i < bs_n; ++i) tls_block[16 + td_n + i] = 0;
-    }
-  }
-#endif
-
-  int result;
-  if (fn) {
-#if defined(__aarch64__) || defined(__arm64__)
-    __asm__ volatile("msr tpidr_el0, %0" ::"r"(tls_block) : "memory");
-#endif
-    result = fn();
-  } else {
-    result = 1;
-  }
-
-  cfree_jit_free(jit);
-  link_free(lk);
-  cfree_compiler_free(cc_);
-  return result;
-}
-
-/* ---- main ---- */
-
-static int usage(void) {
-  fprintf(stderr,
-          "usage: cg-runner [--opt-level N] --list\n"
-          "       cg-runner [--opt-level N] --expected NAME\n"
-          "       cg-runner [--opt-level N] --arches NAME\n"
-          "       cg-runner [--opt-level N] --dwarf-checks NAME\n"
-          "       cg-runner [--opt-level N] --emit NAME OUT.o\n"
-          "       cg-runner [--opt-level N] --jit  NAME\n"
-          "       cg-runner --opt-level N --dump-tape NAME\n");
-  return 2;
-}
-
-int main(int argc, char** argv) {
-  {
-    long ps = sysconf(_SC_PAGESIZE);
-    if (ps > 0) g_execmem.page_size = (size_t)ps;
-  }
-  /* Optional leading --opt-level N flag. */
-  if (argc >= 3 && !strcmp(argv[1], "--opt-level")) {
-    g_opt_level = atoi(argv[2]);
-    argc -= 2;
-    argv += 2;
-  }
-  if (argc < 2) return usage();
-  if (!strcmp(argv[1], "--list"))
-    return mode_list();
-  else if (!strcmp(argv[1], "--expected") && argc == 3)
-    return mode_expected(argv[2]);
-  else if (!strcmp(argv[1], "--arches") && argc == 3)
-    return mode_arches(argv[2]);
-  else if (!strcmp(argv[1], "--dwarf-checks") && argc == 3)
-    return mode_dwarf_checks(argv[2]);
-  else if (!strcmp(argv[1], "--emit") && argc == 4)
-    return mode_emit(argv[2], argv[3]);
-  else if (!strcmp(argv[1], "--jit") && argc == 3)
-    return mode_jit(argv[2]);
-  else if (!strcmp(argv[1], "--dump-tape") && argc == 3)
-    return mode_dump_tape(argv[2]);
-  return usage();
-}
diff --git a/test/cg/harness/cg_test.c b/test/cg/harness/cg_test.c
@@ -1,456 +0,0 @@
-/* test/cg fixture API implementation.
- *
- * Drives the same building blocks the parser will: pool-interned Types,
- * abi_func_info classification, and the CGTarget lowering interface. No
- * ABI mocks; CGFuncDesc/CGParamDesc/CGCallDesc/CGABIValue are populated
- * from `c->abi` exactly as the parser will populate them. */
-
-#include "cg_test.h"
-
-#include <string.h>
-
-#include "core/arena.h"
-#include "core/pool.h"
-#include "debug/c_debug.h"
-#include "debug/debug.h"
-
-/* ---- pre-interned type accessors ---- */
-
-const Type* T_void(CgTestCtx* x) { return type_void(x->pool); }
-const Type* T_i8(CgTestCtx* x) { return type_prim(x->pool, TY_SCHAR); }
-const Type* T_u8(CgTestCtx* x) { return type_prim(x->pool, TY_UCHAR); }
-const Type* T_i16(CgTestCtx* x) { return type_prim(x->pool, TY_SHORT); }
-const Type* T_u16(CgTestCtx* x) { return type_prim(x->pool, TY_USHORT); }
-const Type* T_i32(CgTestCtx* x) { return type_prim(x->pool, TY_INT); }
-const Type* T_u32(CgTestCtx* x) { return type_prim(x->pool, TY_UINT); }
-const Type* T_i64(CgTestCtx* x) { return type_prim(x->pool, TY_LLONG); }
-const Type* T_u64(CgTestCtx* x) { return type_prim(x->pool, TY_ULLONG); }
-const Type* T_f32(CgTestCtx* x) { return type_prim(x->pool, TY_FLOAT); }
-const Type* T_f64(CgTestCtx* x) { return type_prim(x->pool, TY_DOUBLE); }
-const Type* T_ptr_void(CgTestCtx* x) {
-  return type_ptr(x->pool, type_void(x->pool));
-}
-const Type* T_ptr(CgTestCtx* x, const Type* p) { return type_ptr(x->pool, p); }
-
-/* ---- operand sugar ---- */
-
-Operand IMM_op(i64 v, const Type* ty) {
-  Operand o = {0};
-  o.kind = OPK_IMM;
-  o.cls = (ty && (ty->kind == TY_FLOAT || ty->kind == TY_DOUBLE ||
-                  ty->kind == TY_LDOUBLE))
-              ? RC_FP
-              : RC_INT;
-  o.type = ty;
-  o.v.imm = v;
-  return o;
-}
-Operand REG_op(Reg r, const Type* ty) {
-  Operand o = {0};
-  o.kind = OPK_REG;
-  o.cls = (ty && (ty->kind == TY_FLOAT || ty->kind == TY_DOUBLE ||
-                  ty->kind == TY_LDOUBLE))
-              ? RC_FP
-              : RC_INT;
-  o.type = ty;
-  o.v.reg = r;
-  return o;
-}
-Operand LOCAL_op(FrameSlot s, const Type* ty) {
-  Operand o = {0};
-  o.kind = OPK_LOCAL;
-  o.cls = RC_INT; /* address class is INT */
-  o.type = ty;
-  o.v.frame_slot = s;
-  return o;
-}
-Operand IND_op(Reg base, i32 ofs, const Type* ty) {
-  Operand o = {0};
-  o.kind = OPK_INDIRECT;
-  o.cls = (ty && (ty->kind == TY_FLOAT || ty->kind == TY_DOUBLE ||
-                  ty->kind == TY_LDOUBLE))
-              ? RC_FP
-              : RC_INT;
-  o.type = ty;
-  o.v.ind.base = base;
-  o.v.ind.ofs = ofs;
-  return o;
-}
-Operand GLOBAL_op(ObjSymId sym, i64 addend) {
-  Operand o = {0};
-  o.kind = OPK_GLOBAL;
-  o.cls = RC_INT;
-  o.type = NULL;
-  o.v.global.sym = sym;
-  o.v.global.addend = addend;
-  return o;
-}
-
-void cgtest_set_loc(CgTestCtx* ctx, SrcLoc loc) {
-  /* CGTarget.set_loc forwards to MCEmitter, which is what subsequent
-   * emit32 calls read for line-row attribution. Debug gets the same loc
-   * so that a row whose offset hasn't been emitted yet picks up the
-   * right pending value. */
-  if (ctx->target) ctx->target->set_loc(ctx->target, loc);
-  if (ctx->debug) debug_set_pending_loc(ctx->debug, loc);
-}
-
-/* ---- internal helpers ---- */
-
-static MemAccess default_memaccess(CgTestCtx* ctx, const Type* ty) {
-  MemAccess ma = {0};
-  ma.type = ty;
-  ma.size = abi_sizeof(ctx->c->abi, ty);
-  ma.align = abi_alignof(ctx->c->abi, ty);
-  ma.flags = MF_NONE;
-  ma.alias.kind = ALIAS_LOCAL;
-  return ma;
-}
-
-/* ---- function-fixture helpers ---- */
-
-CgTestFn* cgtest_begin_main(CgTestCtx* ctx, const Type* ret_ty) {
-  return cgtest_begin_func(ctx, "test_main", ret_ty, NULL, 0);
-}
-
-ObjSymId cgtest_decl_func(CgTestCtx* ctx, const char* name) {
-  Sym sname = pool_intern_cstr(ctx->pool, name);
-  return obj_symbol(ctx->ob, sname, SB_GLOBAL, SK_FUNC, OBJ_SEC_NONE, 0, 0);
-}
-
-CgTestFn* cgtest_begin_func(CgTestCtx* ctx, const char* name,
-                            const Type* ret_ty, const Type* const* param_types,
-                            u32 nparams) {
-  return cgtest_begin_func_at(ctx, cgtest_decl_func(ctx, name), ret_ty,
-                              param_types, nparams);
-}
-
-CgTestFn* cgtest_begin_func_at(CgTestCtx* ctx, ObjSymId pre_sym,
-                               const Type* ret_ty,
-                               const Type* const* param_types, u32 nparams) {
-  CgTestFn* tf = arena_new(ctx->c->tu, CgTestFn);
-  memset(tf, 0, sizeof *tf);
-  tf->ctx = ctx;
-  tf->ret_ty = ret_ty;
-
-  /* Build TY_FUNC and classify with the live TargetABI. */
-  const Type** ptypes = NULL;
-  if (nparams) {
-    ptypes = arena_array(ctx->c->tu, const Type*, nparams);
-    for (u32 i = 0; i < nparams; ++i) ptypes[i] = param_types[i];
-  }
-  tf->fn_type = type_func(ctx->pool, ret_ty, ptypes, (u16)nparams, 0);
-  tf->abi_info = abi_func_info(ctx->c->abi, tf->fn_type);
-
-  tf->sym = pre_sym;
-
-  /* Param slots + descriptors. Frame slots must be allocated against the
-   * function's frame, which begins at func_begin — so we do this AFTER
-   * func_begin below. We pre-allocate the descriptor array here. */
-  CGParamDesc* pds = NULL;
-  if (nparams) {
-    tf->params = arena_array(ctx->c->tu, CgTestParam, nparams);
-    memset(tf->params, 0, sizeof(CgTestParam) * nparams);
-    pds = arena_array(ctx->c->tu, CGParamDesc, nparams);
-    memset(pds, 0, sizeof(CGParamDesc) * nparams);
-    for (u32 i = 0; i < nparams; ++i) {
-      tf->params[i].type = ptypes[i];
-      tf->params[i].abi = &tf->abi_info->params[i];
-      pds[i].index = i;
-      pds[i].name = 0;
-      pds[i].type = ptypes[i];
-      pds[i].slot = FRAME_SLOT_NONE; /* filled below */
-      pds[i].abi = &tf->abi_info->params[i];
-      pds[i].incoming = tf->abi_info->params[i].parts;
-      pds[i].nincoming = tf->abi_info->params[i].nparts;
-      pds[i].loc = (SrcLoc){0, 0, 0};
-    }
-  }
-  tf->nparams = nparams;
-
-  tf->fd.sym = tf->sym;
-  tf->fd.text_section_id = ctx->text_sec;
-  tf->fd.group_id = OBJ_GROUP_NONE;
-  tf->fd.fn_type = tf->fn_type;
-  tf->fd.abi = tf->abi_info;
-  tf->fd.params = pds;
-  tf->fd.nparams = nparams;
-  tf->fd.loc = (SrcLoc){0, 0, 0};
-
-  /* Class-1 (parser-driven) DWARF event: a new subprogram opens. The
-   * harness doesn't run c_debug_type on the function's TY_FUNC — the W
-   * directives that exist today (`subprogram`, `pc_range`) only need
-   * (name, low_pc, high_pc), so we pass DEBUG_TYPE_NONE and skip the type
-   * DIE for the function itself. Capture the entry text offset so
-   * cgtest_end can hand (begin_ofs, end_ofs) to debug_func_pc_range. */
-  tf->func_begin_ofs = obj_pos(ctx->ob, ctx->text_sec);
-  if (ctx->debug) {
-    debug_func_begin(ctx->debug, tf->sym, DEBUG_TYPE_NONE, tf->fd.loc);
-  }
-
-  ctx->target->func_begin(ctx->target, &tf->fd);
-
-  /* Allocate FS_PARAM slots and dispatch param() in declaration order. */
-  for (u32 i = 0; i < nparams; ++i) {
-    FrameSlotDesc fsd = {
-        .type = ptypes[i],
-        .name = 0,
-        .loc = (SrcLoc){0, 0, 0},
-        .size = abi_sizeof(ctx->c->abi, ptypes[i]),
-        .align = abi_alignof(ctx->c->abi, ptypes[i]),
-        .kind = FS_PARAM,
-        .flags = FSF_NONE,
-    };
-    FrameSlot s = ctx->target->frame_slot(ctx->target, &fsd);
-    tf->params[i].slot = s;
-    pds[i].slot = s;
-    ctx->target->param(ctx->target, &pds[i]);
-  }
-  return tf;
-}
-
-FrameSlot cgtest_param_slot(CgTestFn* tf, u32 idx) {
-  return tf->params[idx].slot;
-}
-
-/* ---- frame slots and memory ---- */
-
-FrameSlot cgtest_local(CgTestFn* tf, const Type* ty, u16 flags) {
-  FrameSlotDesc fsd = {
-      .type = ty,
-      .name = 0,
-      .loc = (SrcLoc){0, 0, 0},
-      .size = abi_sizeof(tf->ctx->c->abi, ty),
-      .align = abi_alignof(tf->ctx->c->abi, ty),
-      .kind = FS_LOCAL,
-      .flags = flags,
-  };
-  return tf->ctx->target->frame_slot(tf->ctx->target, &fsd);
-}
-
-FrameSlot cgtest_local_named(CgTestFn* tf, const Type* ty, u16 flags,
-                             const char* name, SrcLoc decl, i32 frame_ofs) {
-  CgTestCtx* ctx = tf->ctx;
-  Sym name_sym = pool_intern_cstr(ctx->pool, name);
-  FrameSlotDesc fsd = {
-      .type = ty,
-      .name = name_sym,
-      .loc = decl,
-      .size = abi_sizeof(ctx->c->abi, ty),
-      .align = abi_alignof(ctx->c->abi, ty),
-      .kind = FS_LOCAL,
-      .flags = flags,
-  };
-  FrameSlot s = ctx->target->frame_slot(ctx->target, &fsd);
-  if (ctx->debug) {
-    DebugTypeId tid = c_debug_type(ctx->debug, ctx->c->abi, ty);
-    DebugVarLoc vloc = {0};
-    vloc.kind = DVL_FRAME;
-    vloc.v.frame_ofs = frame_ofs;
-    debug_local(ctx->debug, name_sym, tid, decl, vloc);
-  }
-  return s;
-}
-
-void cgtest_load_local(CgTestFn* tf, Operand dst_reg, FrameSlot s,
-                       const Type* ty) {
-  MemAccess ma = default_memaccess(tf->ctx, ty);
-  tf->ctx->target->load(tf->ctx->target, dst_reg, LOCAL_op(s, ty), ma);
-}
-
-void cgtest_store_local(CgTestFn* tf, FrameSlot s, Operand src,
-                        const Type* ty) {
-  MemAccess ma = default_memaccess(tf->ctx, ty);
-  tf->ctx->target->store(tf->ctx->target, LOCAL_op(s, ty), src, ma);
-}
-
-/* ---- return ---- */
-
-void cgtest_ret_reg(CgTestFn* tf, Reg r, const Type* ty) {
-  CGABIValue v = {0};
-  v.type = ty;
-  v.abi = &tf->abi_info->ret;
-  v.storage = REG_op(r, ty);
-  v.parts = NULL;
-  v.nparts = 0;
-  tf->ctx->target->ret(tf->ctx->target, &v);
-}
-
-void cgtest_ret_imm(CgTestFn* tf, i64 imm, const Type* ty) {
-  CGABIValue v = {0};
-  v.type = ty;
-  v.abi = &tf->abi_info->ret;
-  v.storage = IMM_op(imm, ty);
-  v.parts = NULL;
-  v.nparts = 0;
-  tf->ctx->target->ret(tf->ctx->target, &v);
-}
-
-void cgtest_ret_void(CgTestFn* tf) {
-  tf->ctx->target->ret(tf->ctx->target, NULL);
-}
-
-void cgtest_ret_indirect(CgTestFn* tf, FrameSlot addr_local) {
-  CGABIValue v = {0};
-  v.type = tf->ret_ty;
-  v.abi = &tf->abi_info->ret;
-  v.storage = LOCAL_op(addr_local, tf->ret_ty);
-  v.parts = NULL;
-  v.nparts = 0;
-  tf->ctx->target->ret(tf->ctx->target, &v);
-}
-
-void cgtest_ret_struct_in_regs(CgTestFn* tf, const Reg* part_regs, u32 nparts) {
-  CGABIValue v = {0};
-  const ABIArgInfo* a = &tf->abi_info->ret;
-  CGABIPart* parts = arena_array(tf->ctx->c->tu, CGABIPart, nparts);
-  memset(parts, 0, sizeof(CGABIPart) * nparts);
-  for (u32 i = 0; i < nparts; ++i) {
-    parts[i].abi_part = &a->parts[i];
-    parts[i].op = REG_op(part_regs[i], NULL);
-    parts[i].src_offset = a->parts[i].src_offset;
-    parts[i].size = a->parts[i].size;
-    parts[i].flags = CG_ABI_PART_NONE;
-  }
-  v.type = tf->ret_ty;
-  v.abi = a;
-  v.storage = (Operand){0};
-  v.parts = parts;
-  v.nparts = nparts;
-  tf->ctx->target->ret(tf->ctx->target, &v);
-}
-
-void cgtest_end(CgTestFn* tf) {
-  CgTestCtx* ctx = tf->ctx;
-  ctx->target->func_end(ctx->target);
-  if (ctx->debug) {
-    /* Class-3 fanout: function bounds are known only after func_end has
-     * finalized the function size. doc/DWARF.md §3.1 puts the call to
-     * debug_func_pc_range in cg_func_end after target->func_end returns —
-     * the harness mirrors that, since it's the CG stand-in here. */
-    u32 end_ofs = obj_pos(ctx->ob, ctx->text_sec);
-    debug_func_pc_range(ctx->debug, ctx->text_sec, tf->func_begin_ofs, end_ofs);
-    debug_func_end(ctx->debug);
-  }
-}
-
-/* ---- calls ---- */
-
-/* Shared body for direct and indirect calls. Direct sets callee.kind =
- * OPK_GLOBAL; indirect sets OPK_REG. Everything else is identical. */
-static void cgtest_call_with_callee(CgTestFn* caller, Operand callee,
-                                    const Type* ret_ty,
-                                    const Type* const* arg_types,
-                                    const CgTestArg* args, u32 nargs,
-                                    Operand ret_storage) {
-  CgTestCtx* ctx = caller->ctx;
-
-  /* Build callee fn_type and ABIFuncInfo independently of the caller's. */
-  const Type** ptypes = NULL;
-  if (nargs) {
-    ptypes = arena_array(ctx->c->tu, const Type*, nargs);
-    for (u32 i = 0; i < nargs; ++i) ptypes[i] = arg_types[i];
-  }
-  const Type* fn_ty = type_func(ctx->pool, ret_ty, ptypes, (u16)nargs, 0);
-  const ABIFuncInfo* info = abi_func_info(ctx->c->abi, fn_ty);
-
-  /* Materialize a CGABIValue per arg. */
-  CGABIValue* avs = NULL;
-  if (nargs) {
-    avs = arena_array(ctx->c->tu, CGABIValue, nargs);
-    memset(avs, 0, sizeof(CGABIValue) * nargs);
-    for (u32 i = 0; i < nargs; ++i) {
-      CGABIValue* av = &avs[i];
-      av->type = arg_types[i];
-      av->abi = &info->params[i];
-      av->parts = NULL;
-      av->nparts = 0;
-      switch (args[i].kind) {
-        case CGT_ARG_IMM:
-          av->storage = IMM_op(args[i].v.imm, arg_types[i]);
-          break;
-        case CGT_ARG_REG:
-          av->storage = REG_op(args[i].v.reg, arg_types[i]);
-          break;
-        case CGT_ARG_LOCAL_VALUE: {
-          /* Load into a fresh reg; storage is the reg. */
-          Reg r = ctx->target->alloc_reg(
-              ctx->target,
-              (av->abi->parts && av->abi->parts[0].cls == ABI_CLASS_FP)
-                  ? RC_FP
-                  : RC_INT,
-              arg_types[i]);
-          cgtest_load_local(caller, REG_op(r, arg_types[i]), args[i].v.slot,
-                            arg_types[i]);
-          av->storage = REG_op(r, arg_types[i]);
-          break;
-        }
-        case CGT_ARG_BYVAL_LOCAL:
-        case CGT_ARG_INDIRECT_LOCAL:
-          /* Storage is the address of the local; backend reads
-           * abi.flags (BYVAL/INDIRECT) and copies as needed. */
-          av->storage = LOCAL_op(args[i].v.slot, arg_types[i]);
-          break;
-        default:
-          break;
-      }
-    }
-  }
-
-  CGCallDesc desc;
-  memset(&desc, 0, sizeof desc);
-  desc.fn_type = fn_ty;
-  desc.abi = info;
-  desc.callee = callee;
-  desc.args = avs;
-  desc.nargs = nargs;
-  desc.flags = CG_CALL_NONE;
-  desc.ret.type = ret_ty;
-  desc.ret.abi = &info->ret;
-  desc.ret.storage = ret_storage;
-  desc.ret.parts = NULL;
-  desc.ret.nparts = 0;
-
-  ctx->target->call(ctx->target, &desc);
-}
-
-void cgtest_call(CgTestFn* caller, ObjSymId callee_sym, const Type* ret_ty,
-                 const Type* const* arg_types, const CgTestArg* args, u32 nargs,
-                 Operand ret_storage) {
-  cgtest_call_with_callee(caller, GLOBAL_op(callee_sym, 0), ret_ty, arg_types,
-                          args, nargs, ret_storage);
-}
-
-void cgtest_call_indirect(CgTestFn* caller, Reg callee, const Type* ret_ty,
-                          const Type* const* arg_types, const CgTestArg* args,
-                          u32 nargs, Operand ret_storage) {
-  /* Function-pointer type for the callee operand; the backend reads
-   * desc.fn_type for ABI but uses callee.kind == OPK_REG to know it's
-   * indirect. The Type on the operand is informational. type_func wants
-   * a non-const argv, so copy through a fresh array. */
-  const Type** ptypes_for_op = NULL;
-  if (nargs) {
-    ptypes_for_op = arena_array(caller->ctx->c->tu, const Type*, nargs);
-    for (u32 i = 0; i < nargs; ++i) ptypes_for_op[i] = arg_types[i];
-  }
-  const Type* fn_ty_for_op =
-      type_func(caller->ctx->pool, ret_ty, ptypes_for_op, (u16)nargs, 0);
-  const Type* fnp_ty = type_ptr(caller->ctx->pool, fn_ty_for_op);
-  cgtest_call_with_callee(caller, REG_op(callee, fnp_ty), ret_ty, arg_types,
-                          args, nargs, ret_storage);
-}
-
-/* ---- MC-only case helpers ---- */
-
-ObjSymId cgtest_mc_begin_main(CgTestCtx* ctx) {
-  Sym name = pool_intern_cstr(ctx->pool, "test_main");
-  ObjSymId sym =
-      obj_symbol(ctx->ob, name, SB_GLOBAL, SK_FUNC, OBJ_SEC_NONE, 0, 0);
-  return sym;
-}
-
-void cgtest_mc_end_main(CgTestCtx* ctx, ObjSymId sym, u32 start_pos) {
-  u32 end = ctx->mc->pos(ctx->mc);
-  obj_symbol_define(ctx->ob, sym, ctx->text_sec, (u64)start_pos,
-                    (u64)(end - start_pos));
-}
diff --git a/test/cg/harness/cg_test.h b/test/cg/harness/cg_test.h
@@ -1,300 +0,0 @@
-/* test/cg fixture API.
- *
- * Each case is a small builder function that constructs `int test_main(void)`
- * (or another named function returning int) by driving CGTarget directly,
- * MCEmitter directly, or — once cg.h is implemented — the cg.h value-stack
- * API. The runner finds the case by name, runs build(), finalizes the
- * ObjBuilder, and exposes it through one of three exit paths:
- *
- *   --emit NAME OUT.o  : emit_elf to OUT.o (used by R/E/J path scripts)
- *   --jit  NAME        : link in-process and call test_main, exit with result
- *   --list             : list every registered case name
- *
- * The harness drives the same building blocks the parser will use:
- *   type_*       — to construct interned Types (TY_INT, TY_PTR, TY_FUNC, ...)
- *   abi_*        — to classify return + parameters (abi_func_info)
- *   CGTarget     — to lower function lifecycle, params, locals, calls, ret
- *   MCEmitter    — for raw byte emission (mc_smoke and similar)
- *
- * No ABI mocks: the harness asks the live TargetABI for ABIFuncInfo from a
- * pool-interned function Type. That is the same contract the parser will
- * rely on, so test cases here double as a behavioral spec for those
- * interfaces. Cases requiring features the lib does not yet implement
- * (type_func, abi_func_info, the call/param/aggregate methods on CGTarget)
- * fail at link/runtime until the dependencies land — that is intentional. */
-
-#ifndef CFREE_TEST_CG_TEST_H
-#define CFREE_TEST_CG_TEST_H
-
-#include "abi/abi.h"
-#include "arch/arch.h"
-#include "core/core.h"
-#include "obj/obj.h"
-#include "type/type.h"
-
-/* ---- ctx + case registry ---- */
-
-/* Forward decl — included by harness sources that need it; cases that only
- * touch ctx->debug as an opaque pointer don't need debug/debug.h. */
-typedef struct Debug Debug;
-
-typedef struct CgTestCtx {
-  Compiler* c;
-  ObjBuilder* ob;
-  MCEmitter* mc;
-  CGTarget* target;
-  ObjSecId text_sec;
-  Pool* pool;
-
-  /* Optional Debug producer. The cg-runner constructs one for cases that
-   * register DWARF checks (path W) and leaves it NULL otherwise. The
-   * harness is the parser stand-in per doc/DWARF.md §3.1, so it owns the
-   * Class-1 calls (debug_func_begin / debug_func_pc_range — emitted from
-   * cgtest_begin_func / cgtest_end when debug != NULL) and Class-2's
-   * pending-loc fanout (cgtest_set_loc). */
-  Debug* debug;
-} CgTestCtx;
-
-typedef void (*CgCaseFn)(CgTestCtx*);
-
-typedef enum {
-  CG_CASE_DEFAULT = 0, /* uses CGTarget (default) */
-  CG_CASE_MC_ONLY = 1, /* uses MCEmitter only — no CGTarget construction */
-} CgCaseKind;
-
-/* Per-case arch mask. Cases tagged with the arches they're known to run
- * on; path E (exec) dispatches to the runner for whichever arches the
- * case advertises. Today every case is aarch64-only — x86_64 cases
- * arrive alongside x64 codegen in MULTIARCH phase 3. CG_ARCH_DEFAULT
- * exists so the registry doesn't need a tag on every row. */
-enum {
-  CG_ARCH_AARCH64 = 1u << 0,
-  CG_ARCH_X64     = 1u << 1,
-  CG_ARCH_RV64    = 1u << 2,
-  /* Default = portable across all implemented backends. Cases that emit
-   * hand-crafted bytes for a specific arch (mc_smoke today) must set
-   * their arch mask explicitly. */
-  CG_ARCH_DEFAULT = CG_ARCH_AARCH64 | CG_ARCH_X64 | CG_ARCH_RV64,
-};
-
-typedef struct CgCase {
-  const char* name;
-  CgCaseFn build;
-  int expected;   /* test_main return value (default 0) */
-  unsigned kind;  /* CgCaseKind */
-  unsigned arches; /* CG_ARCH_* mask; 0 = CG_ARCH_DEFAULT */
-} CgCase;
-
-extern const CgCase cg_cases[];
-extern const unsigned cg_cases_count;
-
-/* ---- DWARF checks (path W) ----
- * Optional per-case directives consumed by test/cg/harness/cg_check_dwarf
- * after --emit. Each entry pairs a case name with a directive blob: one
- * directive per line, blank lines ignored. Cases not listed here are
- * skipped on path W. Supported directives:
- *
- *   line FILE LINE
- *       Some PC inside the obj's text must map to (FILE, LINE) and the
- *       inverse line_to_addr must round-trip.
- *
- *   subprogram NAME
- *       cfree_dwarf_subprogram_at must report a non-empty pc range for
- *       the named symbol.
- *
- * The cfree_dwarf_* consumers are stubbed today (src/api/stubs.c), so
- * every directive currently fails — that's intentional. */
-typedef struct CgDwarfCheck {
-  const char* case_name;
-  const char* directives;
-} CgDwarfCheck;
-
-extern const CgDwarfCheck cg_dwarf_checks[];
-extern const unsigned cg_dwarf_checks_count;
-
-/* ---- pre-interned type accessors ----
- * Resolved once per ctx via type_prim/type_void/type_ptr against
- * ctx->pool. Sugar so cases don't repeat the lookup. */
-const Type* T_void(CgTestCtx*);
-const Type* T_i8(CgTestCtx*);
-const Type* T_u8(CgTestCtx*);
-const Type* T_i16(CgTestCtx*);
-const Type* T_u16(CgTestCtx*);
-const Type* T_i32(CgTestCtx*);
-const Type* T_u32(CgTestCtx*);
-const Type* T_i64(CgTestCtx*);
-const Type* T_u64(CgTestCtx*);
-const Type* T_f32(CgTestCtx*);
-const Type* T_f64(CgTestCtx*);
-const Type* T_ptr_void(CgTestCtx*);
-const Type* T_ptr(CgTestCtx*, const Type* pointee);
-
-/* ---- operand sugar ---- */
-Operand IMM_op(i64 v, const Type* ty);
-Operand REG_op(Reg r, const Type* ty);
-Operand LOCAL_op(FrameSlot s, const Type* ty);
-Operand IND_op(Reg base, i32 ofs, const Type* ty);
-Operand GLOBAL_op(ObjSymId sym, i64 addend);
-
-/* ---- function-fixture helpers ---- */
-
-typedef struct CgTestParam {
-  const Type* type;
-  FrameSlot slot;        /* FS_PARAM home, allocated by helper */
-  const ABIArgInfo* abi; /* points into ABIFuncInfo.params[i] */
-} CgTestParam;
-
-typedef struct CgTestFn {
-  CgTestCtx* ctx;
-  const Type* fn_type; /* TY_FUNC; built from ret + param types */
-  const Type* ret_ty;
-  const ABIFuncInfo* abi_info; /* abi_func_info(c->abi, fn_type) */
-  ObjSymId sym;
-  CGFuncDesc fd;
-  CgTestParam* params;
-  u32 nparams;
-  u32 func_begin_ofs; /* obj_pos at func_begin entry; used to compute the
-                         (begin, end) PC range passed to debug_func_pc_range
-                         in cgtest_end when ctx->debug != NULL. Mirrors the
-                         field doc/DWARF.md §3.1 expects on CG. */
-} CgTestFn;
-
-/* Set the pending source loc, fanning out to both CGTarget (which forwards
- * to MCEmitter) and Debug (debug_set_pending_loc). The harness is the
- * parser stand-in per doc/DWARF.md §3.1; this is the parser-half of the
- * Class-2 line-row protocol. Cases that need to stamp specific (file,
- * line) onto an instruction range should call this rather than
- * target->set_loc directly so the Debug fanout happens. */
-void cgtest_set_loc(CgTestCtx* ctx, SrcLoc loc);
-
-/* Begin a function returning ret_ty with no parameters. test_main is the
- * canonical entry; the runner casts it to int(*)(void). Internally calls
- * cgtest_begin_func with name="test_main" and zero params. */
-CgTestFn* cgtest_begin_main(CgTestCtx* ctx, const Type* ret_ty);
-
-/* Begin an arbitrary named function. param_types[i] is the type of param i.
- *
- *   - Builds fn_type via type_func(pool, ret_ty, param_types, nparams, 0).
- *   - Computes ABIFuncInfo via abi_func_info(c->abi, fn_type).
- *   - Allocates an FS_PARAM frame slot for each param (size/align from
- *     abi_sizeof/abi_alignof on the param type).
- *   - Constructs CGParamDesc{index,name=0,type,slot,abi=info->params[i],
- *     incoming=info->params[i]->parts, nincoming=info->params[i]->nparts,
- *     loc=0} and stores into fd.params[].
- *   - Calls target->func_begin(target, &fd).
- *   - For each param, calls target->param(target, &fd.params[i]).
- *
- * Returns a CgTestFn the body can use; cgtest_param_slot(tf,i) reads the
- * home slot for param i. */
-CgTestFn* cgtest_begin_func(CgTestCtx* ctx, const char* name,
-                            const Type* ret_ty, const Type* const* param_types,
-                            u32 nparams);
-
-/* Like cgtest_begin_func, but uses an already-allocated ObjSymId instead of
- * creating one. Lets a case forward-declare a symbol with cgtest_decl_func
- * (so a mutually-recursive partner can refer to it before its body emits)
- * and then attach the definition here. */
-CgTestFn* cgtest_begin_func_at(CgTestCtx* ctx, ObjSymId pre_sym,
-                               const Type* ret_ty,
-                               const Type* const* param_types, u32 nparams);
-
-/* Forward-declare a function symbol with the given name. Returns an
- * ObjSymId callable via cgtest_call before its body is emitted. The symbol
- * is defined later by cgtest_begin_func_at(..., pre_sym, ...). */
-ObjSymId cgtest_decl_func(CgTestCtx*, const char* name);
-
-FrameSlot cgtest_param_slot(CgTestFn*, u32 idx);
-
-/* ---- frame slots and memory ---- */
-
-/* Allocate a local frame slot of the given type with default size/align
- * from the live TargetABI. flags is a FrameSlotFlag mask (FSF_ADDR_TAKEN,
- * etc.). */
-FrameSlot cgtest_local(CgTestFn*, const Type* ty, u16 flags);
-
-/* Like cgtest_local but additionally registers a DW_TAG_variable when the
- * harness was constructed with Debug. The caller supplies the source-level
- * decl name and SrcLoc; the variable's location is encoded as DW_OP_fbreg
- * with the supplied frame_ofs. The harness has no public API to read a
- * FrameSlot's actual fp-relative offset, so callers wanting a specific
- * encoded value pass it explicitly — directives that don't care use 0 and
- * accept the wildcard "*". */
-FrameSlot cgtest_local_named(CgTestFn*, const Type* ty, u16 flags,
-                             const char* name, SrcLoc decl, i32 frame_ofs);
-
-/* Convenience wrappers around target->load/store with a default MemAccess
- * derived from `ty` (size/align from TargetABI, alias=ALIAS_LOCAL). */
-void cgtest_load_local(CgTestFn*, Operand dst_reg, FrameSlot, const Type*);
-void cgtest_store_local(CgTestFn*, FrameSlot, Operand src, const Type*);
-
-/* ---- return ---- */
-
-void cgtest_ret_reg(CgTestFn*, Reg r, const Type* ty);
-void cgtest_ret_imm(CgTestFn*, i64 imm, const Type* ty);
-void cgtest_ret_void(CgTestFn*);
-/* Aggregate / sret return: result lives at the address held in addr_local
- * (typically an FS_LOCAL of the ret type). Builds CGABIValue with
- * abi=fn->abi_info->ret, storage=OPK_LOCAL{addr_local}, parts=NULL. */
-void cgtest_ret_indirect(CgTestFn*, FrameSlot addr_local);
-/* For a struct return that is split into two registers per ABI: caller has
- * already loaded each part into a register; this packs them into the
- * CGABIValue.parts array so the backend can place them in the ABI-classed
- * registers. */
-void cgtest_ret_struct_in_regs(CgTestFn*, const Reg* part_regs, u32 nparts);
-
-void cgtest_end(CgTestFn*);
-
-/* ---- direct calls ---- */
-
-typedef enum {
-  CGT_ARG_IMM,            /* scalar immediate */
-  CGT_ARG_REG,            /* scalar register */
-  CGT_ARG_LOCAL_VALUE,    /* scalar value loaded from a local slot */
-  CGT_ARG_BYVAL_LOCAL,    /* aggregate by value: backend reads from &local */
-  CGT_ARG_INDIRECT_LOCAL, /* aggregate indirect: pointer to &local */
-} CgTestArgKind;
-
-typedef struct CgTestArg {
-  u8 kind; /* CgTestArgKind */
-  const Type* type;
-  union {
-    i64 imm;
-    Reg reg;
-    FrameSlot slot;
-  } v;
-} CgTestArg;
-
-/* Emit a direct call to `callee_sym` whose signature matches the function
- * defined by cgtest_begin_func with `ret_ty` + `arg_types`. Internally:
- *
- *   - Builds fn_type = type_func(pool, ret_ty, arg_types, nargs, 0).
- *   - Looks up abi_func_info(c->abi, fn_type) to classify ret + each arg.
- *   - Materializes a CGABIValue for each arg using args[i] (IMM / REG /
- *     LOCAL_VALUE pack into storage; BYVAL_LOCAL/INDIRECT_LOCAL pack the
- *     local's address as storage).
- *   - Builds CGCallDesc.callee = OPK_GLOBAL{callee_sym, 0} for direct call.
- *   - Sets CGCallDesc.ret with ret_storage as the destination operand:
- *       scalar return : REG_op(dst, ret_ty)
- *       sret return   : LOCAL_op(sret_slot, ret_ty)
- *       void return   : IMM_op(0, T_void) (storage unused)
- *   - Calls target->call(target, &desc). */
-void cgtest_call(CgTestFn* caller, ObjSymId callee_sym, const Type* ret_ty,
-                 const Type* const* arg_types, const CgTestArg* args, u32 nargs,
-                 Operand ret_storage);
-
-/* Like cgtest_call, but the callee is held in a register at runtime — emits
- * an indirect call. CGCallDesc.callee.kind is set to OPK_REG (as opposed to
- * OPK_GLOBAL); the rest of the wiring (ABI, args, ret) is identical. */
-void cgtest_call_indirect(CgTestFn* caller, Reg callee, const Type* ret_ty,
-                          const Type* const* arg_types, const CgTestArg* args,
-                          u32 nargs, Operand ret_storage);
-
-/* ---- low-level helpers (used by mc_smoke and similar) ---- */
-
-/* begin declares the test_main symbol; end defines it at start_pos with
- * size = current_pos - start_pos. Used by MC-only cases that emit bytes
- * directly without a CGTarget. */
-ObjSymId cgtest_mc_begin_main(CgTestCtx*);
-void cgtest_mc_end_main(CgTestCtx*, ObjSymId, u32 start_pos);
-
-#endif
diff --git a/test/cg/run.sh b/test/cg/run.sh
@@ -1,652 +0,0 @@
-#!/usr/bin/env bash
-# test/cg/run.sh — fixture-driven cg / CGTarget / MCEmitter test harness.
-#
-# For each registered case (cg-runner --list), runs up to six paths:
-#
-#   D  in-process JIT — cg-runner --jit NAME → exit code matches expected.
-#                       No file I/O. aarch64 host only.
-#   R  ELF roundtrip  — cg-runner --emit NAME → cfree-roundtrip → readelf+
-#                       normalize diff. Validates emitter+reader fidelity.
-#   E  exec via qemu  — cg-runner --emit + start.o → link-exe-runner → qemu/
-#                       podman → exit code. Cross-host friendly.
-#   J  jit-via-file   — cg-runner --emit + jit-runner. aarch64 host.
-#   W  DWARF check   — cg-runner --emit + cg-runner --dwarf-checks NAME |
-#                       cg_check_dwarf OBJ. Group P only; cases that don't
-#                       register checks are silently skipped. Today every
-#                       check fails by design — debug_emit and the
-#                       cfree_dwarf_* consumers are stubs.
-#   S  asm roundtrip — for every cg-emitted aarch64 binary, walk .text
-#                       through cfree_disasm_iter_*, re-assemble the
-#                       resulting text via the asm-runner, byte-compare.
-#                       Phase 1 (per doc/ASM.md §5): always reports
-#                       SKIP — the disasm iterator and asm parser are
-#                       stubs in src/api/stubs.c. S is opt-in (not in
-#                       the default DREJW path matrix) until phase 4
-#                       lands; run with `./run.sh '' S` or
-#                       CFREE_TEST_PATHS=DREJWS.
-#
-# Reuses the existing test/link harness binaries (link-exe-runner,
-# jit-runner, cfree-roundtrip) verbatim.
-#
-# Skip-vs-fail follows test/link convention: skipped layers are treated as
-# failures unless CFREE_TEST_ALLOW_SKIP=1.
-#
-# Filtering:
-#   ./run.sh [name_filter] [paths]
-#     name_filter   substring match against case name (e.g. "a01", "add")
-#     paths         subset of "DREJWS" (default "DREJW")
-#   Equivalent env vars: CFREE_TEST_FILTER, CFREE_TEST_PATHS.
-#
-# Parallelism:
-#   default                 run in parallel with a capped CPU-count default.
-#   CFREE_TEST_JOBS=N       run up to N cases per opt level concurrently.
-#   CFREE_TEST_JOBS=auto    same as the default.
-
-set -u
-
-ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
-TEST_DIR="$ROOT/test/cg"
-LINK_TEST_DIR="$ROOT/test/link"
-BUILD_DIR="$ROOT/build/test"
-LIB_AR="$ROOT/build/libcfree.a"
-
-CG_RUNNER="$BUILD_DIR/cg-runner"
-ROUNDTRIP_BIN="$BUILD_DIR/cfree-roundtrip"
-LINK_EXE_RUNNER="$BUILD_DIR/link-exe-runner"
-JIT_RUNNER="$BUILD_DIR/jit-runner"
-DWARF_CHECK="$BUILD_DIR/cg-check-dwarf"
-NORMALIZE="$ROOT/test/elf/normalize.py"
-
-# shellcheck source=../lib/parallel.sh
-source "$ROOT/test/lib/parallel.sh"
-
-# CFREE_TEST_ARCH and CFREE_TEST_OBJ together select the cross-target
-# the harness drives the compiler at. Defaults aa64+elf preserve
-# historical behavior. The runners (cg-runner / link-exe-runner /
-# jit-runner) read the same env vars via test/lib/cfree_test_target.h,
-# so the C side and the shell side stay in lockstep.
-CFREE_TEST_ARCH="${CFREE_TEST_ARCH:-aa64}"
-CFREE_TEST_OBJ="${CFREE_TEST_OBJ:-elf}"
-case "$CFREE_TEST_ARCH" in
-    aa64|aarch64|arm64)   TEST_ARCH=aa64;   EXEC_ARCH=aarch64 ;;
-    x64|x86_64|amd64)     TEST_ARCH=x64;    EXEC_ARCH=x64 ;;
-    rv64|riscv64)         TEST_ARCH=rv64;   EXEC_ARCH=rv64 ;;
-    *) printf 'unknown CFREE_TEST_ARCH=%s\n' "$CFREE_TEST_ARCH" >&2; exit 2 ;;
-esac
-case "$CFREE_TEST_OBJ" in
-    elf)
-        EXEC_OS=linux
-        case "$TEST_ARCH" in
-            aa64) CLANG_TRIPLE=aarch64-linux-gnu ;;
-            x64)  CLANG_TRIPLE=x86_64-linux-gnu ;;
-            rv64) CLANG_TRIPLE=riscv64-linux-gnu ;;
-        esac
-        ;;
-    macho)
-        EXEC_OS=macos
-        case "$TEST_ARCH" in
-            aa64) CLANG_TRIPLE=arm64-apple-macos ;;
-            x64)  CLANG_TRIPLE=x86_64-apple-macos ;;
-            rv64) printf 'CFREE_TEST_OBJ=macho has no rv64 target\n' >&2; exit 2 ;;
-        esac
-        ;;
-    *) printf 'unknown CFREE_TEST_OBJ=%s\n' "$CFREE_TEST_OBJ" >&2; exit 2 ;;
-esac
-EXEC_TAG="${EXEC_ARCH}-${EXEC_OS}"
-export CFREE_TEST_ARCH CFREE_TEST_OBJ
-
-CLANG_TARGET="--target=$CLANG_TRIPLE"
-CC="${CC:-cc}"
-CFREE_CFLAGS="-I$ROOT/include -I$ROOT/src -I$ROOT/test -I$TEST_DIR/harness"
-ALLOW_SKIP="${CFREE_TEST_ALLOW_SKIP:-0}"
-
-# Filters (env vars or positional args; args win):
-#   $1 / CFREE_TEST_FILTER — substring match against case name
-#   $2 / CFREE_TEST_PATHS  — subset of "DREJWS" (default "DREJW")
-#   CFREE_OPT_LEVELS — space-separated opt levels to exercise. Default
-#                      "0 1": directly against the backend (level 0)
-#                      and through the opt_cgtarget wrapper (level 1).
-#                      Level 2 (Phase 3 dry-run build_cfg + build_ssa,
-#                      discarded before replay) is opt-in via
-#                      CFREE_OPT_LEVELS="0 1 2".
-#                      Path W (DWARF) only runs at level 0 — opt-level
-#                      DWARF equivalence is a later phase concern.
-FILTER="${1:-${CFREE_TEST_FILTER:-}}"
-PATHS="${2:-${CFREE_TEST_PATHS:-DREJW}}"
-OPT_LEVELS="${CFREE_OPT_LEVELS:-0 1}"
-case "$PATHS" in *D*) RUN_D=1;; *) RUN_D=0;; esac
-case "$PATHS" in *R*) RUN_R=1;; *) RUN_R=0;; esac
-case "$PATHS" in *E*) RUN_E=1;; *) RUN_E=0;; esac
-case "$PATHS" in *J*) RUN_J=1;; *) RUN_J=0;; esac
-case "$PATHS" in *W*) RUN_W=1;; *) RUN_W=0;; esac
-case "$PATHS" in *S*) RUN_S=1;; *) RUN_S=0;; esac
-T_D=0; T_R=0; T_E=0; T_J=0; T_W=0; T_S=0  # accumulated wall-clock seconds per path
-now_ms() { python3 -c 'import time;print(int(time.time()*1000))'; }
-
-mkdir -p "$BUILD_DIR" "$BUILD_DIR/cg"
-
-TEST_JOBS="$(cfree_parallel_jobs)" || exit 2
-PARALLEL_DIR="$BUILD_DIR/cg.parallel/$$"
-mkdir -p "$PARALLEL_DIR"
-
-PASS=0; FAIL=0; SKIP=0
-FAIL_NAMES=(); SKIP_NAMES=()
-
-color_red() { printf '\033[31m%s\033[0m' "$1"; }
-color_grn() { printf '\033[32m%s\033[0m' "$1"; }
-color_yel() { printf '\033[33m%s\033[0m' "$1"; }
-
-note_pass() { PASS=$((PASS+1)); printf '  %s %s\n' "$(color_grn PASS)" "$1"; }
-note_fail() { FAIL=$((FAIL+1)); FAIL_NAMES+=("$1"); printf '  %s %s\n' "$(color_red FAIL)" "$1"; }
-note_skip() { SKIP=$((SKIP+1)); SKIP_NAMES+=("$1"); printf '  %s %s — %s\n' "$(color_yel SKIP)" "$1" "$2"; }
-
-event_path() { printf '%s/%s.%04d.events' "$PARALLEL_DIR" "$1" "$2"; }
-worker_stdout_path() { printf '%s/%s.%04d.stdout' "$PARALLEL_DIR" "$1" "$2"; }
-worker_stderr_path() { printf '%s/%s.%04d.stderr' "$PARALLEL_DIR" "$1" "$2"; }
-
-emit_event() {
-    local file="$1" kind="$2"
-    shift 2
-    printf '%s' "$kind" >> "$file"
-    while [ $# -gt 0 ]; do
-        printf '\t%s' "$1" >> "$file"
-        shift
-    done
-    printf '\n' >> "$file"
-}
-
-replay_events() {
-    local event="$1" stdout_log="$2" stderr_log="$3"
-    local kind a b c d e f
-
-    if [ ! -s "$event" ]; then
-        note_fail "internal: missing worker result $event"
-        if [ -s "$stdout_log" ]; then sed 's/^/    | /' "$stdout_log"; fi
-        if [ -s "$stderr_log" ]; then sed 's/^/    | /' "$stderr_log"; fi
-        return
-    fi
-
-    while IFS=$'\t' read -r kind a b c d e f; do
-        case "$kind" in
-            NOOP) : ;;
-            PASS) note_pass "$a" ;;
-            FAIL) note_fail "$a" ;;
-            SKIP) note_skip "$a" "$b" ;;
-            TIME)
-                case "$a" in
-                    D) T_D=$(( T_D + b )) ;;
-                    R) T_R=$(( T_R + b )) ;;
-                    E) T_E=$(( T_E + b )) ;;
-                    J) T_J=$(( T_J + b )) ;;
-                    W) T_W=$(( T_W + b )) ;;
-                    S) T_S=$(( T_S + b )) ;;
-                esac
-                ;;
-            QUEUE_E)
-                E_NAMES+=("$a")
-                E_WORK+=("$b")
-                E_LINK_MS+=("$c")
-                E_EXPECTED+=("$d")
-                T_E=$(( T_E + c ))
-                exec_target_queue "$f" "$e" "$b/linked.exe" \
-                    "$b/exec.out" "$b/exec.err" "$b/exec.rc"
-                ;;
-            *)
-                note_fail "internal: malformed worker event in $event"
-                ;;
-        esac
-    done < "$event"
-}
-
-run_parallel_items() {
-    local layer="$1" worker="$2"
-    shift 2
-
-    local events=()
-    local stdout_logs=()
-    local stderr_logs=()
-    local idx=0
-    local item event stdout_log stderr_log
-
-    for item in "$@"; do
-        event="$(event_path "$layer" "$idx")"
-        stdout_log="$(worker_stdout_path "$layer" "$idx")"
-        stderr_log="$(worker_stderr_path "$layer" "$idx")"
-        : > "$event"
-        : > "$stdout_log"
-        : > "$stderr_log"
-        events+=("$event")
-        stdout_logs+=("$stdout_log")
-        stderr_logs+=("$stderr_log")
-        cfree_parallel_run "$TEST_JOBS" "$worker" "$idx" "$item" "$event" \
-            > "$stdout_log" 2> "$stderr_log"
-        idx=$((idx+1))
-    done
-
-    cfree_parallel_wait_all || true
-
-    idx=0
-    while [ $idx -lt ${#events[@]} ]; do
-        replay_events "${events[$idx]}" "${stdout_logs[$idx]}" "${stderr_logs[$idx]}"
-        idx=$((idx+1))
-    done
-}
-
-# ---- tool detection --------------------------------------------------------
-
-have_clang_cross=0
-have_readelf=0
-have_python3=0
-have_qemu=0
-have_podman=0
-have_runner=0
-have_roundtrip=0
-have_exe_runner=0
-have_jit_runner=0
-is_aarch64=0
-
-if clang $CLANG_TARGET -c -x c - -o /dev/null < /dev/null 2>/dev/null; then
-    have_clang_cross=1
-fi
-command -v llvm-readelf >/dev/null 2>&1 && have_readelf=1
-command -v readelf      >/dev/null 2>&1 && have_readelf=1
-command -v python3      >/dev/null 2>&1 && have_python3=1
-
-QEMU_BIN="$(command -v qemu-aarch64-static 2>/dev/null || command -v qemu-aarch64 2>/dev/null || true)"
-[ -n "$QEMU_BIN" ] && have_qemu=1
-command -v podman >/dev/null 2>&1 && have_podman=1
-{ [ $have_qemu -eq 1 ] || [ $have_podman -eq 1 ]; } && have_runner=1
-
-arch_raw="$(uname -m 2>/dev/null || true)"
-{ [ "$arch_raw" = "aarch64" ] || [ "$arch_raw" = "arm64" ]; } && is_aarch64=1
-
-# is_native_target=1 when the cross-target arch matches the host arch.
-# Path D (in-process JIT) and path J (jit-runner) require native execution
-# of cfree-emitted code; on a non-matching host we skip them.
-is_native_target=0
-case "$TEST_ARCH" in
-    aa64) [ $is_aarch64 -eq 1 ] && is_native_target=1 ;;
-    x64)  { [ "$arch_raw" = "x86_64" ] || [ "$arch_raw" = "amd64" ]; } && is_native_target=1 ;;
-    rv64) [ "$arch_raw" = "riscv64" ] && is_native_target=1 ;;
-esac
-
-READELF_BIN="$(command -v llvm-readelf 2>/dev/null || command -v readelf 2>/dev/null || true)"
-
-# Shared per-arch exec helper — see test/lib/exec_target.sh. Path E
-# queues each linked.exe and we drain the queue in a single batched
-# podman run per arch after the case loop, amortizing the per-launch
-# podman overhead across all ~200 cg cases.
-EXEC_TARGET_MOUNT_ROOT="$BUILD_DIR"
-# shellcheck source=../lib/exec_target.sh
-source "$ROOT/test/lib/exec_target.sh"
-
-# ---- build harness binaries ------------------------------------------------
-
-printf 'Building harness...\n'
-
-if [ ! -f "$LIB_AR" ]; then
-    printf '  FATAL: %s not found — run "make lib" first\n' "$LIB_AR" >&2
-    exit 1
-fi
-
-# cg-runner
-if $CC $CFREE_CFLAGS \
-        "$TEST_DIR/harness/cg_runner.c" \
-        "$TEST_DIR/harness/cg_test.c" \
-        "$TEST_DIR/harness/cases.c" \
-        "$TEST_DIR/harness/cases_shared.c" \
-        "$TEST_DIR/harness/cases_mc.c" \
-        "$TEST_DIR/harness/cases_a.c" \
-        "$TEST_DIR/harness/cases_b.c" \
-        "$TEST_DIR/harness/cases_c.c" \
-        "$TEST_DIR/harness/cases_d.c" \
-        "$TEST_DIR/harness/cases_e.c" \
-        "$TEST_DIR/harness/cases_f.c" \
-        "$TEST_DIR/harness/cases_g.c" \
-        "$TEST_DIR/harness/cases_h.c" \
-        "$TEST_DIR/harness/cases_i.c" \
-        "$TEST_DIR/harness/cases_j.c" \
-        "$TEST_DIR/harness/cases_k.c" \
-        "$TEST_DIR/harness/cases_l.c" \
-        "$TEST_DIR/harness/cases_n.c" \
-        "$TEST_DIR/harness/cases_o.c" \
-        "$TEST_DIR/harness/cases_p.c" \
-        "$TEST_DIR/harness/cases_q.c" \
-        "$TEST_DIR/harness/cases_asm.c" \
-        "$LIB_AR" -o "$CG_RUNNER" 2>"$BUILD_DIR/cg-runner.err"; then
-    printf '  %s cg-runner\n' "$(color_grn built)"
-else
-    printf '  %s cg-runner (see %s)\n' \
-        "$(color_red FATAL)" "$BUILD_DIR/cg-runner.err" >&2
-    exit 1
-fi
-
-# cfree-roundtrip — for path R. test/elf/run.sh builds this; skip path R if
-# we can't find or build it.
-if [ ! -x "$ROUNDTRIP_BIN" ]; then
-    if $CC -I"$ROOT/include" -I"$ROOT/src" \
-           "$ROOT/test/elf/cfree-roundtrip.c" "$LIB_AR" \
-           -o "$ROUNDTRIP_BIN" 2>"$BUILD_DIR/cfree-roundtrip.err"; then
-        have_roundtrip=1
-        printf '  %s cfree-roundtrip\n' "$(color_grn built)"
-    else
-        printf '  %s cfree-roundtrip (see %s)\n' \
-            "$(color_yel warn)" "$BUILD_DIR/cfree-roundtrip.err" >&2
-    fi
-else
-    have_roundtrip=1
-fi
-
-# link-exe-runner — for path E.
-if [ ! -x "$LINK_EXE_RUNNER" ]; then
-    if $CC -I"$ROOT/include" -I"$ROOT/test" \
-           "$LINK_TEST_DIR/harness/link_exe_runner.c" \
-           "$LIB_AR" -o "$LINK_EXE_RUNNER" 2>"$BUILD_DIR/link-exe-runner.err"; then
-        have_exe_runner=1
-        printf '  %s link-exe-runner\n' "$(color_grn built)"
-    else
-        printf '  %s link-exe-runner (see %s)\n' \
-            "$(color_yel warn)" "$BUILD_DIR/link-exe-runner.err" >&2
-    fi
-else
-    have_exe_runner=1
-fi
-
-# jit-runner — for path J. Only when the host arch matches the cross-target
-# (otherwise the JIT can't execute the emitted code natively).
-if [ $is_native_target -eq 1 ]; then
-    if [ ! -x "$JIT_RUNNER" ]; then
-        if $CC -I"$ROOT/include" -I"$ROOT/test" "$LINK_TEST_DIR/harness/jit_runner.c" \
-               "$LIB_AR" -o "$JIT_RUNNER" 2>"$BUILD_DIR/jit-runner.err"; then
-            have_jit_runner=1
-            printf '  %s jit-runner\n' "$(color_grn built)"
-        else
-            printf '  %s jit-runner (see %s)\n' \
-                "$(color_yel warn)" "$BUILD_DIR/jit-runner.err" >&2
-        fi
-    else
-        have_jit_runner=1
-    fi
-fi
-
-# cg-check-dwarf — for path W. Always rebuild (small file, picks up
-# changes alongside the rest of the harness).
-have_dwarf_check=0
-if $CC -I"$ROOT/include" "$TEST_DIR/harness/cg_check_dwarf.c" \
-       "$LIB_AR" -o "$DWARF_CHECK" 2>"$BUILD_DIR/cg-check-dwarf.err"; then
-    have_dwarf_check=1
-    printf '  %s cg-check-dwarf\n' "$(color_grn built)"
-else
-    printf '  %s cg-check-dwarf (see %s)\n' \
-        "$(color_yel warn)" "$BUILD_DIR/cg-check-dwarf.err" >&2
-fi
-
-# Cached start.o — every case used to recompile this from the same source
-# (~40 ms × N cases). Build it once for the whole harness run.
-START_OBJ="$BUILD_DIR/cg_start.o"
-have_start_obj=0
-if [ $have_clang_cross -eq 1 ]; then
-    if clang $CLANG_TARGET -O1 -ffreestanding -fno-stack-protector \
-            -fno-PIC -fno-pie \
-            -c "$LINK_TEST_DIR/harness/start.c" -o "$START_OBJ" 2>/dev/null; then
-        have_start_obj=1
-    fi
-fi
-
-CASES="$($CG_RUNNER --list)"
-
-run_cg_case() {
-    local _idx="$1" name="$2" event="$3"
-    local work case_arches expected expected_byte case_tag obj t0 dt
-    local d_rc rt r_ok r_msg exe link_dt j_rc w_rc
-    : "$_idx"
-
-    work="$BUILD_DIR/$WORK_SUB/$name"
-    mkdir -p "$work"
-
-    # Filter cases whose declared arch mask excludes the test arch.
-    # cg-runner --arches NAME prints one token per arch the case
-    # supports; skip if our $EXEC_ARCH isn't listed.
-    case_arches="$("${CG_RUN[@]}" --arches "$name" 2>/dev/null)"
-    if [ -n "$case_arches" ] && \
-            ! printf '%s\n' "$case_arches" | grep -qx "$EXEC_ARCH"; then
-        emit_event "$event" NOOP
-        return 0
-    fi
-
-    expected="$("${CG_RUN[@]}" --expected "$name" 2>/dev/null)"
-    expected="${expected:-0}"
-    # Exit codes are mod 256 on POSIX; mask the expected the same way so
-    # negative-return cases compare correctly.
-    expected_byte=$(( expected & 0xff ))
-
-    # Path E target tag. The shell drives every case at the
-    # (CFREE_TEST_ARCH, CFREE_TEST_OBJ)-selected target — emit panics
-    # on stub backends surface as case failures rather than harness
-    # skips, which is the multi-arch/multi-obj contract through
-    # Phase 2. cg-runner's --arches output is informational at this
-    # stage.
-    case_tag="$EXEC_TAG"
-
-    # ---- Path D: in-process JIT (only when host arch == cross-target) ----
-    if [ $RUN_D -eq 1 ]; then
-        if [ $is_native_target -eq 1 ]; then
-            t0=$(now_ms)
-            "${CG_RUN[@]}" --jit "$name" >"$work/d.out" 2>"$work/d.err"
-            d_rc=$?
-            dt=$(( $(now_ms) - t0 ))
-            emit_event "$event" TIME D "$dt"
-            if [ "$d_rc" -eq "$expected_byte" ]; then
-                emit_event "$event" PASS "$name/D${TAG} (${dt}ms)"
-            else
-                emit_event "$event" FAIL "$name/D${TAG} (expected $expected_byte got $d_rc, ${dt}ms)"
-            fi
-        else
-            emit_event "$event" SKIP "$name/D${TAG}" "host arch != $TEST_ARCH (no native JIT)"
-        fi
-    fi
-
-    # ---- emit (needed by R/E/J/W) -----------------------------------------
-    obj="$work/$name.o"
-    if [ $RUN_R -eq 1 ] || [ $RUN_E -eq 1 ] || [ $RUN_J -eq 1 ] \
-            || [ $RUN_W -eq 1 ]; then
-        if ! "${CG_RUN[@]}" --emit "$name" "$obj" 2>"$work/emit.err"; then
-            emit_event "$event" FAIL "$name/emit${TAG} (cg-runner --emit failed; see $work/emit.err)"
-            return 0
-        fi
-    fi
-
-    # ---- Path R: ELF roundtrip --------------------------------------------
-    if [ $RUN_R -eq 1 ]; then
-        if [ $have_roundtrip -eq 1 ] && [ $have_readelf -eq 1 ] && [ $have_python3 -eq 1 ]; then
-            t0=$(now_ms)
-            rt="$work/$name.rt.o"
-            r_ok=1; r_msg=""
-            if ! "$ROUNDTRIP_BIN" "$obj" "$rt" 2>"$work/rt.err"; then
-                r_ok=0; r_msg=" (roundtrip failed)"
-            else
-                "$READELF_BIN" -aW "$obj" | python3 "$NORMALIZE" >"$work/golden.norm" 2>/dev/null
-                "$READELF_BIN" -aW "$rt"  | python3 "$NORMALIZE" >"$work/rt.norm"     2>/dev/null
-                diff -u "$work/golden.norm" "$work/rt.norm" >"$work/r.diff" 2>&1 || r_ok=0
-            fi
-            dt=$(( $(now_ms) - t0 ))
-            emit_event "$event" TIME R "$dt"
-            if [ $r_ok -eq 1 ]; then emit_event "$event" PASS "$name/R${TAG} (${dt}ms)"
-            else emit_event "$event" FAIL "$name/R${TAG}${r_msg} (${dt}ms)"; fi
-        else
-            emit_event "$event" SKIP "$name/R${TAG}" "missing roundtrip/readelf/python3"
-        fi
-    fi
-
-    # ---- Path E: link + (batched) qemu/podman ------------------------------
-    # Link now (per case); the run is queued for the post-loop flush.
-    if [ $RUN_E -eq 1 ]; then
-        if [ $have_exe_runner -eq 1 ] && [ $have_clang_cross -eq 1 ] \
-                && [ $have_start_obj -eq 1 ]; then
-            t0=$(now_ms)
-            exe="$work/linked.exe"
-            if ! "$LINK_EXE_RUNNER" -o "$exe" "$obj" "$START_OBJ" \
-                    >"$work/exec_link.out" 2>"$work/exec_link.err"; then
-                dt=$(( $(now_ms) - t0 ))
-                emit_event "$event" TIME E "$dt"
-                emit_event "$event" FAIL "$name/E${TAG} (link failed, ${dt}ms)"
-            elif exec_target_supported "$case_tag"; then
-                link_dt=$(( $(now_ms) - t0 ))
-                # Queue with a level-tagged key so cases at different
-                # opt levels don't collide in the batched runner.
-                emit_event "$event" QUEUE_E "$name" "$work" "$link_dt" \
-                    "$expected_byte" "L${OPT_LEVEL}_${name}" "$case_tag"
-            else
-                emit_event "$event" SKIP "$name/E${TAG}" "no runner for $case_tag"
-            fi
-        else
-            emit_event "$event" SKIP "$name/E${TAG}" "no link-exe-runner, aarch64 clang, or start.o"
-        fi
-    fi
-
-    # ---- Path J: jit-via-file ---------------------------------------------
-    if [ $RUN_J -eq 1 ]; then
-        if [ $have_jit_runner -eq 1 ]; then
-            t0=$(now_ms)
-            "$JIT_RUNNER" "$obj" >"$work/jit.out" 2>"$work/jit.err"
-            j_rc=$?
-            dt=$(( $(now_ms) - t0 ))
-            emit_event "$event" TIME J "$dt"
-            if [ "$j_rc" -eq "$expected_byte" ]; then
-                emit_event "$event" PASS "$name/J${TAG} (${dt}ms)"
-            else
-                emit_event "$event" FAIL "$name/J${TAG} (expected $expected_byte got $j_rc, ${dt}ms)"
-            fi
-        else
-            emit_event "$event" SKIP "$name/J${TAG}" "no jit-runner (host arch != $TEST_ARCH)"
-        fi
-    fi
-
-    # ---- Path W: DWARF check ----------------------------------------------
-    # Cases that don't register directives produce empty stdout from
-    # --dwarf-checks; we silently skip those (no SKIP entry, since W is
-    # opt-in per case rather than per host). DWARF / opt-level
-    # equivalence is a Phase 5+ concern, so skip W when level > 0.
-    if [ $RUN_W -eq 1 ] && [ "$OPT_LEVEL" = "0" ]; then
-        "${CG_RUN[@]}" --dwarf-checks "$name" >"$work/w.directives" \
-            2>"$work/w.dc.err"
-        if [ -s "$work/w.directives" ]; then
-            if [ $have_dwarf_check -eq 1 ]; then
-                t0=$(now_ms)
-                "$DWARF_CHECK" "$obj" <"$work/w.directives" \
-                    >"$work/w.out" 2>"$work/w.err"
-                w_rc=$?
-                dt=$(( $(now_ms) - t0 ))
-                emit_event "$event" TIME W "$dt"
-                if [ "$w_rc" -eq 0 ]; then
-                    emit_event "$event" PASS "$name/W (${dt}ms)"
-                else
-                    emit_event "$event" FAIL "$name/W (see $work/w.out, $work/w.err; ${dt}ms)"
-                fi
-            else
-                emit_event "$event" SKIP "$name/W" "no cg-check-dwarf"
-            fi
-        fi
-    fi
-
-    # ---- Path S: asm roundtrip (phase-1 stub) -----------------------------
-    # Walks .text through cfree_disasm_iter_*, reassembles via
-    # asm-runner --encode, byte-compares against the emitted bytes.
-    # Phase 1 per doc/ASM.md §5: the iterator and parse_asm are still
-    # stubs, so we report SKIP unconditionally when S is requested.
-    # When phase 3+4 land, replace this block with the real
-    # disasm/reassemble pipeline.
-    if [ $RUN_S -eq 1 ]; then
-        emit_event "$event" SKIP "$name/S${TAG}" "phase 1: cfree_disasm_iter_* / parse_asm are stubs"
-    fi
-
-    # W-only runs intentionally produce no output for cases without DWARF
-    # directives. Mark those as handled so replay can still distinguish them
-    # from workers that failed before writing a result.
-    emit_event "$event" NOOP
-    return 0
-}
-
-# Each level wraps cg-runner with --opt-level N. Level 0 drives the AArch64
-# backend directly; level >0 inserts opt_cgtarget. Cases tagged with /L<N>
-# in the output when level>0 so failures localize to the level.
-for OPT_LEVEL in $OPT_LEVELS; do
-    if [ "$OPT_LEVEL" = "0" ]; then
-        CG_RUN=("$CG_RUNNER")
-        TAG=""
-        WORK_SUB="cg"
-    else
-        CG_RUN=("$CG_RUNNER" "--opt-level" "$OPT_LEVEL")
-        TAG="/L${OPT_LEVEL}"
-        WORK_SUB="cg-L${OPT_LEVEL}"
-    fi
-
-    printf 'Running cases (opt-level %s, %s jobs)...\n' "$OPT_LEVEL" "$TEST_JOBS"
-
-    # Path E result bookkeeping (per level — flushed at end of this iteration).
-    E_NAMES=()
-    E_WORK=()
-    E_LINK_MS=()
-    E_EXPECTED=()
-
-    FILTERED_CASES=()
-    for name in $CASES; do
-        [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && continue
-        FILTERED_CASES+=("$name")
-    done
-
-    run_parallel_items "cg-L${OPT_LEVEL}" run_cg_case "${FILTERED_CASES[@]}"
-
-    # ---- batched path-E flush + verification (per level) -------------------
-    # Run every queued case in a single podman invocation per arch, then
-    # iterate the queue to read each exit code and emit PASS/FAIL.
-    if [ "$(exec_target_queue_size)" -gt 0 ]; then
-        printf 'Running path E%s (%d cases batched)...\n' \
-            "$TAG" "$(exec_target_queue_size)"
-        t0=$(now_ms)
-        exec_target_flush
-        DELTA=$(( $(now_ms) - t0 ))
-        T_E_BATCH=$(( ${T_E_BATCH:-0} + DELTA )); T_E=$(( T_E + DELTA ))
-
-        i=0
-        while [ $i -lt ${#E_NAMES[@]} ]; do
-            name="${E_NAMES[$i]}"
-            work="${E_WORK[$i]}"
-            link_dt="${E_LINK_MS[$i]}"
-            expected_byte="${E_EXPECTED[$i]}"
-            if [ ! -f "$work/exec.rc" ]; then
-                note_fail "$name/E${TAG} (no rc; podman batch did not produce results)"
-            else
-                RUN_RC="$(cat "$work/exec.rc")"
-                if [ "$RUN_RC" -eq "$expected_byte" ]; then
-                    note_pass "$name/E${TAG} (link ${link_dt}ms)"
-                else
-                    note_fail "$name/E${TAG} (expected $expected_byte got $RUN_RC, link ${link_dt}ms)"
-                fi
-            fi
-            i=$((i+1))
-        done
-    fi
-done
-
-T_E_BATCH=${T_E_BATCH:-0}
-
-# ---- summary ---------------------------------------------------------------
-
-if [ ${#FAIL_NAMES[@]} -gt 0 ]; then
-    printf '\nFailed:\n'
-    for n in "${FAIL_NAMES[@]}"; do printf '  %s\n' "$n"; done
-fi
-
-if [ ${#SKIP_NAMES[@]} -gt 0 ] && [ "$ALLOW_SKIP" != "1" ]; then
-    printf '\nSkipped (treat as failure; set CFREE_TEST_ALLOW_SKIP=1 to allow):\n'
-    for n in "${SKIP_NAMES[@]}"; do printf '  %s\n' "$n"; done
-fi
-
-printf '\nResults: %s pass, %s fail, %s skip\n' "$PASS" "$FAIL" "$SKIP"
-printf 'Time:    D=%dms  R=%dms  E=%dms (batch %dms)  J=%dms  W=%dms  S=%dms\n' \
-    "$T_D" "$T_R" "$T_E" "$T_E_BATCH" "$T_J" "$T_W" "$T_S"
-
-if [ $FAIL -gt 0 ]; then exit 1; fi
-if [ $SKIP -gt 0 ] && [ "$ALLOW_SKIP" != "1" ]; then exit 1; fi
-exit 0
diff --git a/test/test.mk b/test/test.mk
@@ -15,13 +15,9 @@
 # - test-link: linker + JIT behavioral harness in test/link/; three paths
 #   per case (roundtrip R, ELF exec E, JIT J). Depends only on libcfree.a.
 #   Set CFREE_TEST_ALLOW_SKIP=1 to allow skipped layers.
-# - test-cg:   cg / CGTarget / MCEmitter behavioral harness in test/cg/;
-#   four paths per case (D direct-JIT, R roundtrip, E exec, J jit-via-file).
-#   Depends only on libcfree.a; reuses test/link harness binaries.
 # - test-parse / test-parse-err: file-driven C parser harness in
-#   test/parse/; same path matrix as test-cg, but each case is a .c
-#   source file rather than a hand-built ObjBuilder fixture. Built
-#   against the public cfree.h surface; reuses cfree-roundtrip,
+#   test/parse/; each case is a .c source file. Built against the public
+#   cfree.h surface; reuses cfree-roundtrip,
 #   link-exe-runner, and jit-runner.
 # - test-asm: file-driven assembler/disassembler harness in test/asm/.
 #   Three sub-corpora (encode/, decode/, listing/), one mode per
@@ -29,9 +25,9 @@
 #   parse_asm / cfree_disasm_iter_* are still stubs; the harness builds
 #   and runs end-to-end so the wiring stays exercised. See doc/ASM.md.
 
-.PHONY: test test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-cg test-cg-api test-cg-binder test-toy test-opt test-dwarf test-debug test-parse test-parse-err test-asm test-isa test-aa64-inline test-libc test-musl test-glibc test-lib-deps test-smoke-x64 test-smoke-rv64
+.PHONY: test test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-cg-api test-toy test-opt test-dwarf test-debug test-parse test-parse-err test-asm test-isa test-aa64-inline test-libc test-musl test-glibc test-lib-deps test-smoke-x64 test-smoke-rv64
 
-test: test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-cg test-cg-binder test-toy test-dwarf test-debug test-parse test-parse-err test-asm test-isa test-aa64-inline test-lib-deps
+test: test-lex test-pp test-pp-err test-elf test-ar test-ar-driver test-link test-toy test-dwarf test-debug test-parse test-parse-err test-asm test-isa test-aa64-inline test-lib-deps
 
 test-lex: bin
 	@CFREE=$(abspath $(BIN)) test/lex/run.sh
@@ -79,8 +75,7 @@ $(DWARF_TEST_BIN): test/dwarf/dwarf_test.c $(LIB_AR)
 # debug_emit, asserts the produced sections have valid DWARF 5 structure
 # (length fields, version, address sizes, expected relocations against
 # function symbol). Deliberately bypasses the consumer (cfree_dwarf_open)
-# so encoder bugs aren't masked by matching decoder bugs — end-to-end
-# round-trip lives in test/cg path W.
+# so encoder bugs aren't masked by matching decoder bugs.
 DEBUG_TEST_BIN = build/test/debug_roundtrip_unit
 
 test-debug: $(DEBUG_TEST_BIN)
@@ -103,12 +98,6 @@ $(AA64_ISA_TEST_BIN): test/arch/aa64_isa_test.c $(LIB_AR)
 	@mkdir -p $(dir $@)
 	$(CC) $(DRIVER_CFLAGS) -Isrc test/arch/aa64_isa_test.c $(LIB_AR) -o $@
 
-# cg_inline_asm constraint binder unit test (doc/INLINEASM.md Track B).
-# Drives cg_inline_asm against a stand-in CGTarget that records every
-# operand handed to asm_block; covers r/=r/+r/=&r/i/m/0, the "memory"
-# clobber spill behaviour, register-name passthrough, and "cc" no-op.
-# Internal cg/ + arch/ surface — needs -Isrc.
-CG_BINDER_TEST_BIN = build/test/cg_binder_test
 CG_API_TEST_BIN = build/test/cg_api_test
 
 test-cg-api: $(CG_API_TEST_BIN)
@@ -118,13 +107,6 @@ $(CG_API_TEST_BIN): test/api/cg_type_test.c $(LIB_AR)
 	@mkdir -p $(dir $@)
 	$(CC) $(DRIVER_CFLAGS) test/api/cg_type_test.c $(LIB_AR) -o $@
 
-test-cg-binder: $(CG_BINDER_TEST_BIN)
-	$(CG_BINDER_TEST_BIN)
-
-$(CG_BINDER_TEST_BIN): test/cg/binder_test.c $(LIB_AR)
-	@mkdir -p $(dir $@)
-	$(CC) $(DRIVER_CFLAGS) -Isrc test/cg/binder_test.c $(LIB_AR) -o $@
-
 test-toy: bin
 	@CFREE=$(abspath $(BIN)) test/toy/run.sh
 
@@ -142,7 +124,7 @@ $(AA64_INLINE_TEST_BIN): test/arch/aa64_inline_test.c $(LIB_AR)
 	@mkdir -p $(dir $@)
 	$(CC) $(DRIVER_CFLAGS) -Isrc test/arch/aa64_inline_test.c $(LIB_AR) -o $@
 
-# Test harness binaries shared by test-elf, test-link, and test-cg.
+# Test harness binaries shared by test-elf and test-link.
 # Declared as Make targets (not built by the run.sh scripts) so they pick
 # up libcfree.a changes deterministically.
 #
@@ -185,9 +167,6 @@ test-elf: lib bin-soft $(ROUNDTRIP_BIN)
 test-link: lib $(ROUNDTRIP_BIN) $(ROUNDTRIP_BIN_MACHO) $(LINK_EXE_RUNNER) $(JIT_RUNNER)
 	bash test/link/run.sh
 
-test-cg: lib $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER)
-	bash test/cg/run.sh
-
 OPT_TEST_BIN = build/test/opt_test
 
 test-opt: $(OPT_TEST_BIN)

