kit

kit
git clone https://git.ryansepassi.com/git/kit.git
Log | Files | Refs | README

commit 72f78c0141fca9e68a8efb575a098ba4db5b44ce
parent 662ea223050b63f620b330a9f129a1b352dfc0af
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat, 23 May 2026 15:26:19 -0700

ssa2 plan

Diffstat:
Adoc/SSA2.md | 932+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 932 insertions(+), 0 deletions(-)

diff --git a/doc/SSA2.md b/doc/SSA2.md @@ -0,0 +1,932 @@ +# SSA2 — Incremental SSA Def-Use, MIR-shape + +This document is the design sketch for replacing cfree's wholesale rebuilt +def-use representation (`opt_rebuild_def_use` in `src/opt/pass_analysis.c:327`) +with MIR-style incremental SSA edges. The structural move lands first; every +SSA-era downstream pass is disabled in-tree and then re-introduced one at a +time, each matching or improving on the corresponding MIR pass. + +Background: see `doc/OPT.md` for the O2 pipeline shape, `doc/OPT_PERF.md` "O2 +Gap Analysis vs MIR" for the measured impact of wholesale rebuilds (38 call +sites, 11 SSA-era DCE/copy invocations vs MIR's 2). + +## Status quo: what we are replacing + +cfree today carries a flat side table of uses, rebuilt from scratch by every +pass that mutates instructions. + +`OptUse` (`src/opt/ir.h:374-385`) records one use: + +```c +typedef struct OptUse { + Val val; + u32 block; + u32 inst; + InstId inst_id; + u32 next_for_val; /* singly-linked through opt_uses[] */ + u32 operand_index; + u32 phi_pred_index; + Operand* operand; /* pointer into Inst.opnds; valid only between rebuilds */ + u8 kind; /* OPT_USE_{OPERAND,INDIRECT_BASE,INDIRECT_INDEX,PHI_INPUT} */ + u8 pad[3]; +} OptUse; +``` + +`Func` owns `opt_uses[]` plus the per-Val head array +`opt_first_use_by_val[v]` (`ir.h:471-473`). The contract: + +- After `opt_rebuild_def_use(f)`, every use of every live Val is enumerable by + walking the `next_for_val` chain starting at `opt_first_use_by_val[v]`. +- Defs are addressed separately through `Func.val_def_block[v]` and + `Func.val_def_inst[v]` (`ir.h:405-406`) — one def per Val (SSA single-def + invariant). +- Any pass that adds, deletes, or rewrites an instruction must re-run + `opt_rebuild_def_use` before the next query, or the `operand` pointers in + `OptUse` go stale and the chains misrepresent the IR. + +That contract is what produces the 38 rebuild call sites and the 11 SSA-era +DCE/copy interleaving documented in `doc/OPT_PERF.md`. + +## MIR's representation + +MIR encodes def-use as a per-operand linked-list system (mir-gen.c:2134-2164): + +```c +typedef struct ssa_edge *ssa_edge_t; +struct ssa_edge { + bb_insn_t use, def; + char flag; /* scratch bit used by renaming/worklist */ + uint16_t def_op_num; + uint32_t use_op_num; + ssa_edge_t prev_use, next_use; /* doubly-linked list of uses of the same def */ +}; +``` + +The clever part: every operand carries a single pointer `op->data`. Its +meaning depends on whether the operand is a def or a use: + +- **Def operand** (`bb_insn->insn->ops[def_op_num].data`): head of the + doubly-linked list of `ssa_edge_t` records, one per use of this def. +- **Use operand** (`bb_insn->insn->ops[use_op_num].data`): the single + `ssa_edge_t` representing this use, embedded in the def's list via + `prev_use`/`next_use`. + +So traversing all uses of a def is `for (se = def->ops[i].data; se; se = se->next_use)`. Removing one use is O(1) via the doubly-linked pointers +(`mir-gen.c:2422-2434`). The shared `flag` byte is reused by renaming +(`push_to_rename`/`pop_to_rename`, `mir-gen.c:2550-2616`) and other worklist +algorithms — handy because it gives any pass a per-edge scratch bit without a +separate map. + +The companion structures (also `mir-gen.c:2166-2186`): + +- `def_tab` — hash `(bb, reg) → bb_insn_t` used by demand-driven `get_def` + (mir-gen.c:2284-2304). Only live during SSA construction; cleared after. +- `phis` / `deleted_phis` — running lists used by `minimize_ssa` + (mir-gen.c:2323-2372). +- `ssa_edges_to_process` — VARR worklist used by renaming. + +The whole machinery is built once by `build_ssa` (mir-gen.c:2674-2709), +maintained incrementally for the lifetime of SSA, and torn down by +`undo_build_ssa` (mir-gen.c:2789-2820, walking ops and freeing edges). + +## Why this works for MIR and what changes for cfree + +MIR can stash a pointer directly on each operand because: + +1. `MIR_op_t` already has a `void* data` field for back-end use. +2. `MIR_insn_t` is heap-allocated and reached via a doubly-linked list — instruction objects never move. +3. `MIR_OP_VAR_MEM` packs base+index into one operand; in practice MIR + treats each operand position as a single use site (the back-end iterates + vars within an op separately). + +cfree differs on every count and needs a slightly different layout: + +1. `Operand` (`src/opt/ir.h`) has no scratch field. We have to add one or + keep edges in a side table. +2. `Block.insts` is an `Inst*` array that can be reallocated on growth. + `Inst*` pointers are not stable — we must address instructions by + `(block_id, inst_index)`. `Inst.opnds` is arena-backed and stable for the + life of the instruction, so `Operand*` is stable as long as the instruction + exists, but it is not unique-id-able across arena resets. +3. `OPK_INDIRECT` carries two register uses (`v.ind.base`, `v.ind.index`) + within one `Operand`. We need two edge slots per indirect operand. +4. Many uses live in `extra.aux` (phi inputs in `IRPhiAux.pred_vals[]`, call + args in `IRCallAux`, asm operands in `IRAsmAux`, intrinsic args, return + value, scope cond) — not in `opnds[]`. Each of those needs an edge slot + too. cfree already enumerates them in `opt_collect_inst_uses` + (`pass_analysis.c:269-325`); the same enumeration is the basis for + maintenance. + +## Design constraints + +Speed is paramount, second only to correctness. The hot operations are walk +uses of a Val (GVN, DSE, LICM, DCE), relink an operand (copy prop, +substitute, addr_xform), remove a use (delete inst, rewrite operand), and +redirect a def (GVN replacement, copy_cleanup). Each must be O(1) per +operation aside from `redirect_def` which is O(uses_of_val). + +Three structural decisions that follow from this: + +1. **`u32` ids throughout — no pointers stored across operations.** Edge + ids are u32 indices into a pool. Storing `OptSsaEdge*` across an `add` + or `remove` requires pointer-stable backing storage; ids don't. +2. **`SegVec` (`src/core/segvec.h`) for the edge pool.** Push never moves + existing elements. A free list through `next_use` reuses freed slots, so + the segment count tracks peak live edges, not lifetime allocations. MIR + allocates each `ssa_edge_t` separately with `gen_malloc`; we get the + same pointer stability with better cache density and no per-edge malloc + overhead. +3. **Use-side edge id stored on the use site itself.** Operands gain a + `u32 edge` field. Aux structs gain a parallel `u32* edges` array sized + at aux creation. This makes "rewrite this operand's Val" a single edge + lookup, never a hash hit or a scan of the def's use list. + +We do **not** use MIR's `op->data` double-duty (head-at-def, edge-at-use). +cfree already has `opt_first_use_by_val[v]` keyed directly by Val id, which +is faster than walking from a def-side head — one array load vs an +operand load plus an Inst lookup. Use sites store their edge id, defs +store nothing in the operand. + +## Data structures + +All names use the `opt_ssa_` / `OptSsa` prefix to avoid colliding with the +legacy `OptUse` types during the migration. + +### Edge record + +```c +#define OPT_SSA_EDGE_NONE 0xffffffffu + +typedef enum OptSsaUseKind { + OPT_SSA_USE_OPERAND, /* Inst.opnds[op_idx] is an OPK_REG */ + OPT_SSA_USE_INDIRECT_BASE, /* Inst.opnds[op_idx].v.ind.base */ + OPT_SSA_USE_INDIRECT_INDEX, /* Inst.opnds[op_idx].v.ind.index */ + OPT_SSA_USE_PHI_INPUT, /* IRPhiAux.pred_vals[sub_idx] */ + OPT_SSA_USE_AUX_OPERAND, /* call/asm/intrinsic/ret/scope aux operand */ +} OptSsaUseKind; + +typedef struct OptSsaEdge { + /* Val whose def this edge refers to. Cached on the edge so the per-Val + * list head update on remove doesn't need to chase the use operand. */ + Val val; + + /* Use site. (use_block, use_inst) identify the consuming instruction; + * use_op_idx and sub_idx locate the specific use within it. */ + u32 use_block; + u32 use_inst; + u16 use_op_idx; + u16 sub_idx; /* phi pred index or aux sub-index; 0 otherwise */ + + /* Def location. Single-def SSA invariant means there's exactly one + * producing inst per Val; def_op_idx is which output position of that + * inst produces val (0 for primary def; >0 for multi-result via + * Inst.defs[]). Cached to avoid scanning the def inst on each touch. */ + u32 def_block; + u32 def_inst; + u8 def_op_idx; + + u8 kind; /* OptSsaUseKind */ + u8 flag; /* scratch bit; reused by renaming/GVN worklist */ + u8 pad; + + /* Doubly-linked use list per Val. Head at f->opt_first_use_by_val[val]. */ + u32 prev_use; /* OPT_SSA_EDGE_NONE if first */ + u32 next_use; /* OPT_SSA_EDGE_NONE if last; also free-list link */ +} OptSsaEdge; +``` + +The struct is 32 bytes on a 64-bit host. Two edges per cache line. + +### Edge pool + +```c +SEGVEC_DEFINE(OptSsaEdgeVec, OptSsaEdge, 7); /* 128 edges per segment ≈ 4 KB */ + +typedef struct OptSsaState { + OptSsaEdgeVec edges; /* pool; index = edge id */ + u32 free_head; /* head of free list through next_use */ + /* Per-Val use list heads. Indexed by Val id. */ + u32* first_use_by_val; + u32 first_use_by_val_cap; +} OptSsaState; +``` + +`OptSsaState` lives on `Func` (`Func.opt_ssa`) and is initialized at +`opt_build_reg_ssa` / finalized at `opt_undo_ssa`. The SegVec uses the +Compiler's `Heap`; segments are freed at `_fini`. + +Per-Val head array can be arena-grown (`Func.arena`): it holds only u32 +indices, never pointers, so reallocation on Val mint is safe. + +`first_use_by_val_cap` tracks current capacity; on Val mint, if `nvals > +cap`, double and copy. The copy is cheap (u32s, count = active Vals). + +### Use-site edge storage + +The use site stores its edge id so that `relink` and `remove` are one +lookup, never a list walk. + +**`Operand`** (in `src/opt/ir.h`) — add to the `OPK_REG` and `OPK_INDIRECT` +variants: + +```c +union { + struct { Reg r; u32 edge; } reg; + struct { Reg base, index; i32 ofs; u32 edge_base; + u32 edge_index; } ind; + /* other variants unchanged */ +} v; +``` + +`Operand` grows by 4 bytes for `reg` and 8 bytes for `ind`. Non-reg, +non-indirect variants are untouched. + +**Aux structs** (in `src/opt/ir.h`) — each gains a parallel edge-id array, +sized at aux construction: + +```c +typedef struct IRPhiAux { + /* existing fields */ + Val* pred_vals; + u32* pred_edges; /* edge id per pred; OPT_SSA_EDGE_NONE if none */ + u32 npreds; + /* ... */ +} IRPhiAux; +``` + +Same shape for `IRCallAux`, `IRAsmAux`, `IRIntrinAux`, `IRRetAux`, +`IRScopeAux`. Sized exactly to the use count enumerated by +`opt_collect_inst_uses` today (`pass_analysis.c:269-325`), so no growth. + +Memory accounting: per use site we spend 4 bytes (edge id) at the use plus +32 bytes (edge record) in the pool, vs today's 24 bytes per `OptUse` entry +plus an entire `opt_uses[]` rebuild per pass. The win is in not paying the +rebuild cost across ~38 sites per O2 pipeline. + +### Defs + +Existing `Func.val_def_block[v]` / `val_def_inst[v]` continue to identify +the single def for each Val (`ir.h:405-406`). The edge's `def_block`, +`def_inst`, `def_op_idx` are caches of the same information — written at +edge creation, updated by `opt_ssa_redirect_def`. They exist so that walks +of a Val's use list don't have to re-lookup the def through `val_def_*` +just to know "what produced this". + +## Core operations + +The API splits into two layers: + +- **Low-level pool/list primitives** (this section). Hot path. Inline in + `src/opt/ssa_edges.h`. These touch the edge pool and the per-Val use + list only; they do not know about Operand or aux layouts. Suitable for + use inside tight walks where the caller already has the use-site handle. + +- **Use-site-aware public helpers** (next section). What passes call. + Out-of-line in `src/opt/ssa_edges.c`. These wrap the primitives and also + resolve and update the use-site slot (Operand field, aux edges array). + +Splitting this way lets the inner-most loops — walking a Val's use list +and rewriting edges as you go — call only the primitives without paying +for a use-site lookup that the caller already has implicitly. + +### Lookup and iteration + +```c +/* O(1) — two dependent loads inside SegVec_at. */ +static inline OptSsaEdge* opt_ssa_edge(Func* f, u32 edge_id) { + return OptSsaEdgeVec_at(&f->opt_ssa.edges, edge_id); +} + +/* Walk every use of val. Cache next_use before the body if the body may + * remove the current edge — see the "iterator invalidation" note below. */ +#define OPT_SSA_FOR_EACH_USE(F, V, E) \ + for (u32 E = (F)->opt_ssa.first_use_by_val[V]; \ + E != OPT_SSA_EDGE_NONE; \ + E = opt_ssa_edge((F), E)->next_use) +``` + +### Pool allocate / free + +```c +/* O(1). Pop from free list if non-empty, else SegVec_push. Returned slot + * is uninitialized — the caller writes every field. */ +static inline u32 opt_ssa_pool_alloc(Func* f) { + u32 id = f->opt_ssa.free_head; + if (id != OPT_SSA_EDGE_NONE) { + f->opt_ssa.free_head = opt_ssa_edge(f, id)->next_use; + return id; + } + (void)OptSsaEdgeVec_push(&f->opt_ssa.edges, &id); + return id; +} + +/* O(1). Push onto free list via next_use. flag = 0xff marks the slot as + * free for the debug verifier. */ +static inline void opt_ssa_pool_free(Func* f, u32 id) { + OptSsaEdge* e = opt_ssa_edge(f, id); +#ifndef NDEBUG + e->flag = 0xff; +#endif + e->next_use = f->opt_ssa.free_head; + f->opt_ssa.free_head = id; +} +``` + +### List link / unlink + +These are the inner loops of `add`/`remove`/`relink`. Inlined. + +```c +/* O(1). Insert edge at the head of val's use list. Caller has already + * written e->val. */ +static inline void opt_ssa_list_push(Func* f, u32 edge_id) { + OptSsaEdge* e = opt_ssa_edge(f, edge_id); + u32 head = f->opt_ssa.first_use_by_val[e->val]; + e->prev_use = OPT_SSA_EDGE_NONE; + e->next_use = head; + if (head != OPT_SSA_EDGE_NONE) + opt_ssa_edge(f, head)->prev_use = edge_id; + f->opt_ssa.first_use_by_val[e->val] = edge_id; +} + +/* O(1). Unlink edge from val's use list. The edge fields are left as-is; + * the caller decides whether to free or relink. */ +static inline void opt_ssa_list_unlink(Func* f, u32 edge_id) { + OptSsaEdge* e = opt_ssa_edge(f, edge_id); + if (e->prev_use != OPT_SSA_EDGE_NONE) + opt_ssa_edge(f, e->prev_use)->next_use = e->next_use; + else + f->opt_ssa.first_use_by_val[e->val] = e->next_use; + if (e->next_use != OPT_SSA_EDGE_NONE) + opt_ssa_edge(f, e->next_use)->prev_use = e->prev_use; +} +``` + +### Primitive add / remove / relink + +```c +/* O(1). Allocate, initialize, and link an edge. Caller writes the new + * edge id into the use-site handle. */ +static inline u32 opt_ssa_edge_add_raw( + Func* f, Val val, + u32 def_block, u32 def_inst, u8 def_op_idx, + u32 use_block, u32 use_inst, + OptSsaUseKind kind, u16 use_op_idx, u16 sub_idx) { + u32 id = opt_ssa_pool_alloc(f); + OptSsaEdge* e = opt_ssa_edge(f, id); + e->val = val; + e->def_block = def_block; + e->def_inst = def_inst; + e->def_op_idx = def_op_idx; + e->use_block = use_block; + e->use_inst = use_inst; + e->use_op_idx = use_op_idx; + e->sub_idx = sub_idx; + e->kind = (u8)kind; + e->flag = 0; + opt_ssa_list_push(f, id); + return id; +} + +/* O(1). Unlink and free. Caller clears the use-site handle. */ +static inline void opt_ssa_edge_remove_raw(Func* f, u32 edge_id) { + opt_ssa_list_unlink(f, edge_id); + opt_ssa_pool_free(f, edge_id); +} + +/* O(1). Repoint an existing edge to a new Val. Caller is responsible for + * rewriting the use-site Reg/Val to match. The edge id and its position + * in the use site stay the same — only the val and the use-list it + * belongs to change. */ +static inline void opt_ssa_edge_relink_raw(Func* f, u32 edge_id, Val new_val, + u32 new_def_block, + u32 new_def_inst, + u8 new_def_op_idx) { + opt_ssa_list_unlink(f, edge_id); + OptSsaEdge* e = opt_ssa_edge(f, edge_id); + e->val = new_val; + e->def_block = new_def_block; + e->def_inst = new_def_inst; + e->def_op_idx = new_def_op_idx; + opt_ssa_list_push(f, edge_id); +} +``` + +### Iterator invalidation + +Removing the current edge during a `OPT_SSA_FOR_EACH_USE` walk corrupts +the iteration because the macro fetches `next_use` *after* the body. The +safe pattern: + +```c +u32 e = f->opt_ssa.first_use_by_val[v]; +while (e != OPT_SSA_EDGE_NONE) { + u32 next = opt_ssa_edge(f, e)->next_use; + if (should_remove_p(e)) opt_ssa_edge_remove_raw(f, e); + e = next; +} +``` + +This is the same shape as MIR's removal loops (mir-gen.c:2810-2811). The +macro is for read-only walks; mutation walks use the explicit form above. + +## Public helpers + +Out-of-line in `src/opt/ssa_edges.c`. These are what passes actually call; +each resolves the use-site slot and then calls the primitive layer. + +### Use-site slot resolver + +A single switch from `(use_block, use_inst, kind, op_idx, sub_idx)` to a +pointer to the u32 edge slot. Used by `_drop_inst`, `_add_inst`, the high- +level relink helpers, and the verifier. + +```c +static u32* opt_ssa_use_slot(Func* f, u32 use_block, u32 use_inst, + OptSsaUseKind kind, u16 op_idx, u16 sub_idx) { + Inst* in = &f->blocks[use_block].insts[use_inst]; + switch (kind) { + case OPT_SSA_USE_OPERAND: + return &in->opnds[op_idx].v.reg.edge; + case OPT_SSA_USE_INDIRECT_BASE: + return &in->opnds[op_idx].v.ind.edge_base; + case OPT_SSA_USE_INDIRECT_INDEX: + return &in->opnds[op_idx].v.ind.edge_index; + case OPT_SSA_USE_PHI_INPUT: + return &((IRPhiAux*)in->extra.aux)->pred_edges[sub_idx]; + case OPT_SSA_USE_AUX_OPERAND: + /* Dispatch on IROp; each aux struct carries its own *_edges[] sized + * to the use count enumerated by opt_collect_inst_uses. */ + switch ((IROp)in->op) { + case IR_CALL: return &((IRCallAux*)in->extra.aux)->arg_edges[sub_idx]; + case IR_ASM_BLOCK: return &((IRAsmAux*)in->extra.aux)->in_edges[sub_idx]; + case IR_INTRINSIC: return &((IRIntrinAux*)in->extra.aux)->arg_edges[sub_idx]; + case IR_RET: return &((IRRetAux*)in->extra.aux)->val_edges[sub_idx]; + case IR_SCOPE_BEGIN: return &((IRScopeAux*)in->extra.aux)->cond_edge; + default: cfree_unreachable(); + } + } + cfree_unreachable(); +} +``` + +The dispatch is one switch and one struct field load per resolution. +Hot-loop callers (relink during copy_prop, GVN substitute) avoid this by +holding the slot pointer or the Operand directly — see the relink-operand +helpers below. + +### Inst-level add / drop + +```c +/* Walk every use site on the instruction and add an edge. Called once + * per instruction from opt_ssa_edges_add_inst's outer loop, plus per + * new instruction emitted mid-pipeline. */ +void opt_ssa_edges_add_inst(Func* f, u32 block, u32 inst_idx); + +/* Walk every use site and remove its edge. Asserts (debug) that none of + * the instruction's defined Vals still has uses — caller has done its + * rewriting. */ +void opt_ssa_edges_drop_inst(Func* f, u32 block, u32 inst_idx); +``` + +The structure of both mirrors `opt_collect_inst_uses` (pass_analysis.c: +269-325) exactly — same operand walk, same aux-struct dispatch. The +difference is that today's path appends to `opt_uses[]` and writes +`opt_first_use_by_val[v]` once at the end; the new path calls +`opt_ssa_edge_add_raw` for each site and writes the returned edge id back +through `opt_ssa_use_slot`. + +### Operand relink (the most common public helper) + +This is what copy_prop, GVN substitute, addr_xform, and ssa_combine call +to rewrite an operand's Val: + +```c +/* Rewrite an OPK_REG operand from old_val to new_val, maintaining edges. + * Reads the edge id from op->v.reg.edge; relinks it to new_val's list; + * writes op->v.reg.r = (Reg)new_val. */ +static inline void opt_ssa_relink_operand(Func* f, Operand* op, Val new_val) { + u32 edge_id = op->v.reg.edge; + opt_ssa_edge_relink_raw(f, edge_id, new_val, + f->val_def_block[new_val], + f->val_def_inst[new_val], + opt_ssa_val_def_op_idx(f, new_val)); + op->v.reg.r = (Reg)new_val; +} + +/* OPK_INDIRECT.base variant. Same shape against op->v.ind.edge_base. */ +static inline void opt_ssa_relink_indirect_base(Func* f, Operand* op, Val new_val); +static inline void opt_ssa_relink_indirect_index(Func* f, Operand* op, Val new_val); + +/* Phi input: aux->pred_edges[p], aux->pred_vals[p] = new_val. */ +static inline void opt_ssa_relink_phi_input(Func* f, IRPhiAux* aux, + u32 pred_idx, Val new_val); +``` + +All five are inline because the body is short and they're called inside +inner loops. The cost is one edge lookup, one list unlink + push, two +field writes — no use-site resolver call because the caller already has +the Operand/aux pointer. + +### Redirect def (whole-Val substitution) + +O(uses_of(from)). One pass over the list, rewriting each edge's +`val`/`def_*` and the operand it refers to, then splicing the whole list +onto `to`'s head. Single-pass equivalent of N individual relinks. + +```c +void opt_ssa_redirect_def(Func* f, Val from, Val to) { + u32 head = f->opt_ssa.first_use_by_val[from]; + if (head == OPT_SSA_EDGE_NONE) return; + + u32 to_def_block = f->val_def_block[to]; + u32 to_def_inst = f->val_def_inst[to]; + u8 to_def_op_idx = opt_ssa_val_def_op_idx(f, to); + Reg to_reg = (Reg)to; + + /* Walk from-val's list, rewriting val + operand at each edge. */ + u32 tail = head; + for (u32 e = head; e != OPT_SSA_EDGE_NONE; ) { + OptSsaEdge* ed = opt_ssa_edge(f, e); + ed->val = to; + ed->def_block = to_def_block; + ed->def_inst = to_def_inst; + ed->def_op_idx = to_def_op_idx; + /* Rewrite the use-site operand to match. */ + u32* slot = opt_ssa_use_slot(f, ed->use_block, ed->use_inst, + (OptSsaUseKind)ed->kind, + ed->use_op_idx, ed->sub_idx); + (void)slot; /* slot already holds e; what we update is the Reg/Val it points to */ + opt_ssa_write_use_reg(f, ed, to_reg); + tail = e; + e = ed->next_use; + } + + /* Splice the whole list onto to's head. */ + u32 to_head = f->opt_ssa.first_use_by_val[to]; + opt_ssa_edge(f, tail)->next_use = to_head; + if (to_head != OPT_SSA_EDGE_NONE) + opt_ssa_edge(f, to_head)->prev_use = tail; + f->opt_ssa.first_use_by_val[to] = head; + f->opt_ssa.first_use_by_val[from] = OPT_SSA_EDGE_NONE; +} +``` + +`opt_ssa_write_use_reg` is a small helper that switches on the edge's +`kind` and writes the appropriate Reg field of the use's Operand or aux +entry. It's the inverse of `opt_ssa_use_slot` but for the Val/Reg value +rather than the edge id. + +This mirrors MIR's `change_ssa_edge_list_def` (mir-gen.c:2445-2462) one +to one, with cfree's per-kind dispatch for the operand-side write. + +### Inst-level delete + +```c +void opt_ssa_delete_inst(Func* f, u32 block, u32 inst_idx) { + opt_ssa_edges_drop_inst(f, block, inst_idx); + /* existing inst-removal path: mark NOP / shift block.insts / etc. */ + ir_block_remove_inst(f, block, inst_idx); +} +``` + +Wraps the existing inst-removal path with the edge drop step. Passes +that delete an inst always go through this rather than touching the +block insts array directly. + +### State lifecycle + +```c +void opt_ssa_state_init(Func* f); /* opt_build_reg_ssa */ +void opt_ssa_state_fini(Func* f); /* opt_undo_ssa */ + +/* On Val mint: ensure first_use_by_val[] is large enough for the new + * highest Val id. Caller passes the new minimum capacity. */ +void opt_ssa_state_ensure_val_cap(Func* f, u32 nvals_needed); +``` + +`_init` zero-inits `OptSsaState`, sets `free_head = OPT_SSA_EDGE_NONE`, +initializes the SegVec on the Compiler heap, allocates +`first_use_by_val` with capacity `nvals` and fills with +`OPT_SSA_EDGE_NONE`. + +`_fini` calls `OptSsaEdgeVec_fini` and clears the pointers. The arena- +backed `first_use_by_val` is reclaimed by the function-end arena reset. + +### Debug verifier + +```c +#ifndef NDEBUG +/* Walk all blocks/insts; re-derive expected use set; compare against + * live edges. Called at OPT_VERIFY checkpoints in pass_o2.c. */ +void opt_ssa_edges_check(Func* f, const char* stage); +#endif +``` + +For each instruction: + +1. For every use site, read the edge id from its slot. If non-NONE, + look up the edge and assert `e.val == operand_val`, + `e.use_block/use_inst/op_idx/sub_idx/kind` match, and the edge is on + `first_use_by_val[e.val]`'s doubly-linked list. +2. For every Val the inst defines, walk `first_use_by_val[v]` and + assert each edge's `def_block`/`def_inst`/`def_op_idx` agrees with + the producing inst. +3. Verify the free list — no live edge id appears on it, every free + edge has `flag == 0xff`. + +This is O(insts + uses) per call. Gated under `CFREE_OPT_SSA_VERIFY` so +that day-to-day debug builds stay fast and only the full verify runs +under explicit opt-in. + +## Data flow + +The lifecycle of an edge, from creation to teardown. + +**Creation (SSA construction).** `opt_build_reg_ssa` and `opt_build_ssa` +walk the function once after IDF-based phi insertion. For each +instruction, they call `opt_ssa_edges_add_inst`, which enumerates use +sites the same way `opt_collect_inst_uses` does today. Each use site: + +1. Look up `val_def_block[v]`, `val_def_inst[v]` for the producing inst. +2. Allocate edge (free list or pool push). +3. Initialize all fields. +4. Write `next_use = first_use_by_val[v]; prev_use = NONE`. If the + existing head exists, set its `prev_use = new_edge_id`. +5. Set `first_use_by_val[v] = new_edge_id`. +6. Write the edge id into the use-site handle (Operand or aux array). + +**Use walk (the most common operation).** Passes consult uses of a Val v. +The walk reads `first_use_by_val[v]`, then chases `next_use`. Each step is +one SegVec `_at` (two dependent loads). Within a walk, edges are stable — +the caller can hold `OptSsaEdge*` for the duration of one iteration safely +across edge removal at the current cursor (cache the `next_use` before +removing). + +**Relink (operand rewrite).** A pass discovers that an operand currently +referring to Val `old_v` should now refer to `new_v`. It reads the edge id +from the use-site handle, calls `opt_ssa_edge_relink`. The edge's +`prev_use`/`next_use` are unlinked from `old_v`'s list, the edge's `val` +and the operand's Reg are updated, and the edge is linked at the head of +`new_v`'s list. Six pointer rewrites total, no list walk. + +**Redirect def (whole-Val substitution).** A pass discovers Val `a` is +equivalent to Val `b` (GVN substitute, copy_cleanup). `opt_ssa_redirect_def` +walks `a`'s use list once. For each edge: rewrite its `val` and the +operand it indexes. After the walk, splice the tail of `a`'s list onto +`b`'s list head: + +```c +u32 head_a = first_use_by_val[a]; +if (head_a == NONE) { return; } +u32 tail = walk(head_a); /* find tail, rewriting val + operand along the way */ +opt_ssa_edge(tail)->next_use = first_use_by_val[b]; +if (first_use_by_val[b] != NONE) + opt_ssa_edge(first_use_by_val[b])->prev_use = tail; +first_use_by_val[b] = head_a; +first_use_by_val[a] = NONE; +``` + +O(uses_of(a)) with no per-element list-juggling. The producing inst of `a` +is left in place (DCE will remove it once unused). + +**Inst deletion.** Three steps: + +1. `opt_ssa_edges_drop_inst(f, block, inst_idx)`. For each use site on the + inst, remove the edge (incoming). For each Val the inst defines, walk + its use list — if non-empty, the caller is wrong to be deleting this + inst; in debug builds, assert. The walk is fine because deletion implies + no remaining uses (the caller has already done its rewriting). +2. Remove the inst from `Block.insts[]` (or mark NOP — existing + convention). +3. The arena memory for `Inst.opnds` and aux structs is not freed + individually; it's reclaimed at function-end arena reset. + +**Inst creation mid-pipeline.** A pass emits a new inst (e.g., a copy +inserted by copy_cleanup, a hoisted invariant by LICM): + +1. `ir_emit` allocates the inst with Operand fields initialized to + `OPT_SSA_EDGE_NONE`. +2. Pass fills in operands and aux. +3. Pass calls `opt_ssa_edges_add_inst` on the new inst. + +The order matters: edges are added *after* operands are populated, so the +add path can read the Reg values to find their producing Val and edge to. + +**Teardown (opt_undo_ssa).** Walks all blocks/insts, calls +`opt_ssa_edges_drop_inst` on each. After the walk, `OptSsaState` is reset: +edges SegVec emptied (or kept for the next function), free_head reset, +first_use_by_val cleared. + +## Lifetimes + +Everything is per-function. SSA edges are not retained across functions in +the optimizer's `FuncSet`; each function is built, lowered, and emitted in +isolation. The lifetimes: + +| Object | Born | Dies | Storage | +| --- | --- | --- | --- | +| `OptSsaEdge` (record) | `opt_ssa_edge_add` | `opt_ssa_edge_remove` | `OptSsaState.edges` (SegVec on Heap) | +| Edge slot in SegVec | first segment push | `OptSsaState_fini` | Heap; freed at function teardown | +| `OptSsaState.free_head` | `opt_build_reg_ssa` | `opt_undo_ssa` | `Func.opt_ssa` | +| `first_use_by_val[]` | `opt_build_reg_ssa` / on Val mint grow | `opt_undo_ssa` | `Func.arena` | +| `Operand.v.reg.edge` | `ir_emit` (init to NONE) → `opt_ssa_edge_add` | `opt_ssa_edge_remove` / inst deletion | inside `Inst.opnds` (arena) | +| `IRPhiAux.pred_edges[]` | aux construction | function arena reset | `Func.arena` | +| Other aux `*_edges[]` | aux construction | function arena reset | `Func.arena` | + +Three lifetime invariants for callers: + +1. **Edge ids are valid only between `opt_build_*_ssa` and `opt_undo_ssa`.** + At O1, no edges exist. Any code path that runs at both O1 and O2 must + gate edge maintenance on `Func.opt_reg_ssa`. +2. **`OptSsaEdge*` is stable across `add`/`remove`/`relink`** thanks to + SegVec. It is NOT stable across `OptSsaState_fini`. Within a pass it's + safe to hold; across passes use the edge id. +3. **An edge's use-site handle must always agree with the edge.** If a + pass rewrites an operand's Reg directly (not through `relink`), the + edge's `val` becomes stale and the use list is wrong. The debug + verifier (`opt_ssa_edges_check`) walks all insts, re-derives the + expected use set, and compares; this is the catch-all. + +## Operand mutation: the contract + +Every place that mutates an instruction operand or aux use moves to a +helper: + +| Today | New | +| --- | --- | +| `op->v.reg = new_val;` | `opt_ssa_relink_operand(f, op, new_val);` | +| `aux->pred_vals[p] = new_val;` | `opt_ssa_relink_phi_input(f, aux, p, new_val);` | +| `block.insts[i].op = IR_NOP;` (delete) | `opt_ssa_delete_inst(f, b, i);` | +| emit new inst, then read its operands | emit new inst, then `opt_ssa_edges_add_inst(f, b, i);` | +| `opt_rebuild_def_use(f);` | deleted; the invariant is maintained | + +The migration tax is one-time: every pass that mutates touches these +helpers exactly once per mutation kind, then is done. + +## Construction and teardown + +`opt_build_reg_ssa` / `opt_build_ssa` (currently in `src/opt/pass_ssa.c`) are +rewritten in two phases: + +1. **Today's shape, edge-aware**: replace the post-construction + `opt_rebuild_def_use(f)` call (`pass_ssa.c:548`, `:807`) with a single + forward walk that calls `opt_ssa_edges_add_inst` once per instruction. + IDF-based phi insertion stays unchanged. Net structural move with no + algorithm change. + +2. **Later (post-roadmap re-enable)**: switch to MIR-shape demand-driven + construction with `get_def` and `minimize_ssa` + (mir-gen.c:2284-2372). Add live-in filtering to mem2reg + (`compute_phi_sites`, `pass_ssa.c:623-654`). + +Teardown is symmetric: `opt_undo_ssa` calls `opt_ssa_edges_drop_inst` for +each instruction as it converts back to PReg form, returning all edges to +the free list. + +## Invalidation and verification + +`Func.opt_valid_analyses` (`ir.h:475`) currently has an +`OPT_ANALYSIS_DEF_USE` bit. Under the new representation, def-use is +**always valid in SSA mode** — there is no rebuild. The bit becomes a +verification flag rather than a state flag: + +- Set after `opt_build_*_ssa` (matches today). +- Cleared by `opt_undo_ssa`. +- Never cleared by individual passes. If a pass needs to mutate instructions, + it goes through `opt_ssa_edges_*` and the invariant survives. + +In debug builds, `opt_verify` (`pass_o2.c` checkpoints) gains an +`opt_ssa_edges_check` step that re-derives uses from instructions and +compares against the live edge lists. This is the safety net for the +migration: any pass that mutates instructions without maintaining edges +trips the verifier at the next checkpoint. + +## Migration plan + +The migration treats every SSA-era pass as opt-in. The structural move lands +green with downstream passes disabled, then each pass is re-enabled in turn +with its MIR-parity work. + +### Phase 0 — quarantine SSA-era passes + +In `src/opt/pass_o2.c:opt_cleanup`, gate every pass between +`opt_build_reg_ssa` and `opt_undo_ssa` behind a per-pass `#ifdef +CFREE_OPT_SSA_PASS_<NAME>`. By default all are undefined. The schedule then +runs: + +```text +opt_build_cfg, opt_jump_cleanup, opt_build_cfg +opt_build_reg_ssa +opt_build_ssa +opt_undo_ssa +opt_jump_opt +``` + +That is the smallest legal O2 schedule: build SSA, immediately undo, run +the shared lowering pipeline. It is functionally equivalent to O1 plus the +SSA round-trip — expensive but correct, and the bisection floor for the +structural change. + +Tests that currently exercise SSA-era transformations +(`128_o2_branch_join_addr_mem.toy` and friends) move to an "expected +mediocre" tier in `test-opt` and `test-toy`: they must continue to compile +and produce correct output, but quality is not asserted until the +corresponding pass is re-enabled. This avoids losing coverage during the +migration. `make bench-opt` at O2 will regress against today's numbers +during the quarantine window — that is expected and recorded in +`OPT_PERF.md` as a known trough. + +### Phase 1 — land the data structures + +1. Add `OptSsaEdge`, the pool fields on `Func`, and the `u32 edge` / + `u32 edge_index` slots on `Operand`. +2. Add `u32* pred_edges` to `IRPhiAux`; `u32* operand_edges` to the call / + asm / intrinsic / ret / scope aux structs (sizes track the existing + use enumeration in `opt_collect_inst_uses`). +3. Implement the core operations: `opt_ssa_edge_add`, + `opt_ssa_edge_remove`, `opt_ssa_edge_relink`, `opt_ssa_redirect_def`, + `opt_ssa_edges_add_inst`, `opt_ssa_edges_drop_inst`. +4. Implement `opt_ssa_edges_check` (debug-only verifier). +5. Wire `opt_build_*_ssa` to build edges in one forward pass; wire + `opt_undo_ssa` to drop them. +6. Delete `opt_uses[]`, `OptUse`, `opt_rebuild_def_use`, and every call to + it. Each call site converts to a direct edge walk. + +Acceptance: O2 quarantine schedule (Phase 0) passes `make test-opt +test-toy`, with the verifier enabled in debug builds. + +### Phase 2 — re-enable passes, one per landing + +Each phase is its own commit (or small series), shaped: + +1. Implement the MIR-equivalent algorithm using incremental edges from the + start — never reintroduce a wholesale rebuild. +2. Define the corresponding `CFREE_OPT_SSA_PASS_<NAME>` and enable it in + the O2 schedule. +3. Restore the pass's quality assertions in `test-opt` / `test-toy`. +4. Refresh the corresponding row in `doc/OPT_PERF.md` and add an iteration + note documenting the MIR parity (or improvement) achieved. + +Suggested order, chosen so each pass exercises only edges already proven by +its predecessors: + +1. **SSA DCE** (`opt_ssa_dce`). The simplest consumer: walks edges, deletes + instructions with no uses, calls `opt_ssa_edges_drop_inst`. Validates the + delete path. +2. **Copy cleanup / copy prop** (`opt_copy_cleanup`, `opt_copy_prop`). Heavy + use of `opt_ssa_redirect_def`. Validates the redirect path. Folds into a + single pass (vs MIR's `copy_prop` — see `OPT_PERF.md` gap analysis). +3. **Address transform** (`opt_addr_xform`). Per-use rewrite using + `opt_ssa_edge_relink`. Implements MIR's per-use folding (vs cfree's + all-or-nothing today). +4. **Simplify** (`opt_simplify`). Pure local rewrites — minimal new edge + maintenance, mostly relink. +5. **GVN** (`opt_gvn`). Worklist re-enqueue after branch fold (MIR parity + item from gap analysis). Use the per-edge `flag` for the worklist. +6. **DSE** (`opt_dse`). Liveness computed once outside; pass walks + incrementally. Drops the inline fixpoint. +7. **LICM** (`opt_licm`) — with pressure filter and loop-tree reuse from + the start. This is the priority runtime fix from `OPT_PERF.md`; it + waits until edges are stable because the filter needs use-set queries. +8. **Pressure relief**, **conventional SSA**, **ssa_combine**, **undo SSA** + round out the schedule. + +The structural move itself is Phases 0–1. Phase 2 is the pass-by-pass +re-introduction and overlaps the O2 runtime roadmap in `OPT_PERF.md`. + +## Files touched + +Structural change (Phase 0–1): + +- `src/opt/ir.h` — `OptSsaEdge`, pool fields on `Func`, `Operand.edge` and + `Operand.edge_index`, aux struct edge arrays. +- `src/opt/pass_analysis.c` — replace `opt_rebuild_def_use` and `OptUse` + collection with edge-aware enumeration helpers. +- `src/opt/pass_ssa.c` — `opt_build_reg_ssa`, `opt_build_ssa`, `opt_undo_ssa` + switch to edge maintenance. +- `src/opt/pass_o2.c` — add `CFREE_OPT_SSA_PASS_*` gates around the SSA-era + schedule. +- New `src/opt/ssa_edges.c` (or extend `pass_analysis.c`) — pool, free + list, core operations, debug verifier. + +Pass-by-pass (Phase 2): one file per pass, edited in place. + +## Open questions + +These are deferred to implementation: + +- **`SEG_SHIFT` choice.** 7 (128 edges/segment ≈ 4 KB) is the starting + point. Measure peak edge count on `matrix`/`hash` and tune up to 8 if + segments are routinely full. Don't tune down — segment header overhead + outweighs waste at small sizes. +- **`OPK_INDIRECT.edge_base` / `edge_index` packing.** Two adjacent u32s + is the obvious layout. Alternative: pack as `u32 edges[2]` with kind + implicit by slot. Pick whichever reads cleanest during the pass_analysis + rewrite. No performance difference. +- **Per-edge `flag` reuse.** MIR uses it for renaming and for GVN/copy + worklists. cfree passes may want more than one bit — start with one, + add a second `u8 flag2` only when a pass concretely needs it. +- **Verifier cost.** `opt_ssa_edges_check` is debug-only. If it becomes + too slow even in debug, gate it behind a separate `CFREE_OPT_SSA_VERIFY` + define so day-to-day debug builds still complete in reasonable time.