commit 72f78c0141fca9e68a8efb575a098ba4db5b44ce
parent 662ea223050b63f620b330a9f129a1b352dfc0af
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sat, 23 May 2026 15:26:19 -0700
ssa2 plan
Diffstat:
| A | doc/SSA2.md | | | 932 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 file changed, 932 insertions(+), 0 deletions(-)
diff --git a/doc/SSA2.md b/doc/SSA2.md
@@ -0,0 +1,932 @@
+# SSA2 — Incremental SSA Def-Use, MIR-shape
+
+This document is the design sketch for replacing cfree's wholesale rebuilt
+def-use representation (`opt_rebuild_def_use` in `src/opt/pass_analysis.c:327`)
+with MIR-style incremental SSA edges. The structural move lands first; every
+SSA-era downstream pass is disabled in-tree and then re-introduced one at a
+time, each matching or improving on the corresponding MIR pass.
+
+Background: see `doc/OPT.md` for the O2 pipeline shape, `doc/OPT_PERF.md` "O2
+Gap Analysis vs MIR" for the measured impact of wholesale rebuilds (38 call
+sites, 11 SSA-era DCE/copy invocations vs MIR's 2).
+
+## Status quo: what we are replacing
+
+cfree today carries a flat side table of uses, rebuilt from scratch by every
+pass that mutates instructions.
+
+`OptUse` (`src/opt/ir.h:374-385`) records one use:
+
+```c
+typedef struct OptUse {
+ Val val;
+ u32 block;
+ u32 inst;
+ InstId inst_id;
+ u32 next_for_val; /* singly-linked through opt_uses[] */
+ u32 operand_index;
+ u32 phi_pred_index;
+ Operand* operand; /* pointer into Inst.opnds; valid only between rebuilds */
+ u8 kind; /* OPT_USE_{OPERAND,INDIRECT_BASE,INDIRECT_INDEX,PHI_INPUT} */
+ u8 pad[3];
+} OptUse;
+```
+
+`Func` owns `opt_uses[]` plus the per-Val head array
+`opt_first_use_by_val[v]` (`ir.h:471-473`). The contract:
+
+- After `opt_rebuild_def_use(f)`, every use of every live Val is enumerable by
+ walking the `next_for_val` chain starting at `opt_first_use_by_val[v]`.
+- Defs are addressed separately through `Func.val_def_block[v]` and
+ `Func.val_def_inst[v]` (`ir.h:405-406`) — one def per Val (SSA single-def
+ invariant).
+- Any pass that adds, deletes, or rewrites an instruction must re-run
+ `opt_rebuild_def_use` before the next query, or the `operand` pointers in
+ `OptUse` go stale and the chains misrepresent the IR.
+
+That contract is what produces the 38 rebuild call sites and the 11 SSA-era
+DCE/copy interleaving documented in `doc/OPT_PERF.md`.
+
+## MIR's representation
+
+MIR encodes def-use as a per-operand linked-list system (mir-gen.c:2134-2164):
+
+```c
+typedef struct ssa_edge *ssa_edge_t;
+struct ssa_edge {
+ bb_insn_t use, def;
+ char flag; /* scratch bit used by renaming/worklist */
+ uint16_t def_op_num;
+ uint32_t use_op_num;
+ ssa_edge_t prev_use, next_use; /* doubly-linked list of uses of the same def */
+};
+```
+
+The clever part: every operand carries a single pointer `op->data`. Its
+meaning depends on whether the operand is a def or a use:
+
+- **Def operand** (`bb_insn->insn->ops[def_op_num].data`): head of the
+ doubly-linked list of `ssa_edge_t` records, one per use of this def.
+- **Use operand** (`bb_insn->insn->ops[use_op_num].data`): the single
+ `ssa_edge_t` representing this use, embedded in the def's list via
+ `prev_use`/`next_use`.
+
+So traversing all uses of a def is `for (se = def->ops[i].data; se; se = se->next_use)`. Removing one use is O(1) via the doubly-linked pointers
+(`mir-gen.c:2422-2434`). The shared `flag` byte is reused by renaming
+(`push_to_rename`/`pop_to_rename`, `mir-gen.c:2550-2616`) and other worklist
+algorithms — handy because it gives any pass a per-edge scratch bit without a
+separate map.
+
+The companion structures (also `mir-gen.c:2166-2186`):
+
+- `def_tab` — hash `(bb, reg) → bb_insn_t` used by demand-driven `get_def`
+ (mir-gen.c:2284-2304). Only live during SSA construction; cleared after.
+- `phis` / `deleted_phis` — running lists used by `minimize_ssa`
+ (mir-gen.c:2323-2372).
+- `ssa_edges_to_process` — VARR worklist used by renaming.
+
+The whole machinery is built once by `build_ssa` (mir-gen.c:2674-2709),
+maintained incrementally for the lifetime of SSA, and torn down by
+`undo_build_ssa` (mir-gen.c:2789-2820, walking ops and freeing edges).
+
+## Why this works for MIR and what changes for cfree
+
+MIR can stash a pointer directly on each operand because:
+
+1. `MIR_op_t` already has a `void* data` field for back-end use.
+2. `MIR_insn_t` is heap-allocated and reached via a doubly-linked list — instruction objects never move.
+3. `MIR_OP_VAR_MEM` packs base+index into one operand; in practice MIR
+ treats each operand position as a single use site (the back-end iterates
+ vars within an op separately).
+
+cfree differs on every count and needs a slightly different layout:
+
+1. `Operand` (`src/opt/ir.h`) has no scratch field. We have to add one or
+ keep edges in a side table.
+2. `Block.insts` is an `Inst*` array that can be reallocated on growth.
+ `Inst*` pointers are not stable — we must address instructions by
+ `(block_id, inst_index)`. `Inst.opnds` is arena-backed and stable for the
+ life of the instruction, so `Operand*` is stable as long as the instruction
+ exists, but it is not unique-id-able across arena resets.
+3. `OPK_INDIRECT` carries two register uses (`v.ind.base`, `v.ind.index`)
+ within one `Operand`. We need two edge slots per indirect operand.
+4. Many uses live in `extra.aux` (phi inputs in `IRPhiAux.pred_vals[]`, call
+ args in `IRCallAux`, asm operands in `IRAsmAux`, intrinsic args, return
+ value, scope cond) — not in `opnds[]`. Each of those needs an edge slot
+ too. cfree already enumerates them in `opt_collect_inst_uses`
+ (`pass_analysis.c:269-325`); the same enumeration is the basis for
+ maintenance.
+
+## Design constraints
+
+Speed is paramount, second only to correctness. The hot operations are walk
+uses of a Val (GVN, DSE, LICM, DCE), relink an operand (copy prop,
+substitute, addr_xform), remove a use (delete inst, rewrite operand), and
+redirect a def (GVN replacement, copy_cleanup). Each must be O(1) per
+operation aside from `redirect_def` which is O(uses_of_val).
+
+Three structural decisions that follow from this:
+
+1. **`u32` ids throughout — no pointers stored across operations.** Edge
+ ids are u32 indices into a pool. Storing `OptSsaEdge*` across an `add`
+ or `remove` requires pointer-stable backing storage; ids don't.
+2. **`SegVec` (`src/core/segvec.h`) for the edge pool.** Push never moves
+ existing elements. A free list through `next_use` reuses freed slots, so
+ the segment count tracks peak live edges, not lifetime allocations. MIR
+ allocates each `ssa_edge_t` separately with `gen_malloc`; we get the
+ same pointer stability with better cache density and no per-edge malloc
+ overhead.
+3. **Use-side edge id stored on the use site itself.** Operands gain a
+ `u32 edge` field. Aux structs gain a parallel `u32* edges` array sized
+ at aux creation. This makes "rewrite this operand's Val" a single edge
+ lookup, never a hash hit or a scan of the def's use list.
+
+We do **not** use MIR's `op->data` double-duty (head-at-def, edge-at-use).
+cfree already has `opt_first_use_by_val[v]` keyed directly by Val id, which
+is faster than walking from a def-side head — one array load vs an
+operand load plus an Inst lookup. Use sites store their edge id, defs
+store nothing in the operand.
+
+## Data structures
+
+All names use the `opt_ssa_` / `OptSsa` prefix to avoid colliding with the
+legacy `OptUse` types during the migration.
+
+### Edge record
+
+```c
+#define OPT_SSA_EDGE_NONE 0xffffffffu
+
+typedef enum OptSsaUseKind {
+ OPT_SSA_USE_OPERAND, /* Inst.opnds[op_idx] is an OPK_REG */
+ OPT_SSA_USE_INDIRECT_BASE, /* Inst.opnds[op_idx].v.ind.base */
+ OPT_SSA_USE_INDIRECT_INDEX, /* Inst.opnds[op_idx].v.ind.index */
+ OPT_SSA_USE_PHI_INPUT, /* IRPhiAux.pred_vals[sub_idx] */
+ OPT_SSA_USE_AUX_OPERAND, /* call/asm/intrinsic/ret/scope aux operand */
+} OptSsaUseKind;
+
+typedef struct OptSsaEdge {
+ /* Val whose def this edge refers to. Cached on the edge so the per-Val
+ * list head update on remove doesn't need to chase the use operand. */
+ Val val;
+
+ /* Use site. (use_block, use_inst) identify the consuming instruction;
+ * use_op_idx and sub_idx locate the specific use within it. */
+ u32 use_block;
+ u32 use_inst;
+ u16 use_op_idx;
+ u16 sub_idx; /* phi pred index or aux sub-index; 0 otherwise */
+
+ /* Def location. Single-def SSA invariant means there's exactly one
+ * producing inst per Val; def_op_idx is which output position of that
+ * inst produces val (0 for primary def; >0 for multi-result via
+ * Inst.defs[]). Cached to avoid scanning the def inst on each touch. */
+ u32 def_block;
+ u32 def_inst;
+ u8 def_op_idx;
+
+ u8 kind; /* OptSsaUseKind */
+ u8 flag; /* scratch bit; reused by renaming/GVN worklist */
+ u8 pad;
+
+ /* Doubly-linked use list per Val. Head at f->opt_first_use_by_val[val]. */
+ u32 prev_use; /* OPT_SSA_EDGE_NONE if first */
+ u32 next_use; /* OPT_SSA_EDGE_NONE if last; also free-list link */
+} OptSsaEdge;
+```
+
+The struct is 32 bytes on a 64-bit host. Two edges per cache line.
+
+### Edge pool
+
+```c
+SEGVEC_DEFINE(OptSsaEdgeVec, OptSsaEdge, 7); /* 128 edges per segment ≈ 4 KB */
+
+typedef struct OptSsaState {
+ OptSsaEdgeVec edges; /* pool; index = edge id */
+ u32 free_head; /* head of free list through next_use */
+ /* Per-Val use list heads. Indexed by Val id. */
+ u32* first_use_by_val;
+ u32 first_use_by_val_cap;
+} OptSsaState;
+```
+
+`OptSsaState` lives on `Func` (`Func.opt_ssa`) and is initialized at
+`opt_build_reg_ssa` / finalized at `opt_undo_ssa`. The SegVec uses the
+Compiler's `Heap`; segments are freed at `_fini`.
+
+Per-Val head array can be arena-grown (`Func.arena`): it holds only u32
+indices, never pointers, so reallocation on Val mint is safe.
+
+`first_use_by_val_cap` tracks current capacity; on Val mint, if `nvals >
+cap`, double and copy. The copy is cheap (u32s, count = active Vals).
+
+### Use-site edge storage
+
+The use site stores its edge id so that `relink` and `remove` are one
+lookup, never a list walk.
+
+**`Operand`** (in `src/opt/ir.h`) — add to the `OPK_REG` and `OPK_INDIRECT`
+variants:
+
+```c
+union {
+ struct { Reg r; u32 edge; } reg;
+ struct { Reg base, index; i32 ofs; u32 edge_base;
+ u32 edge_index; } ind;
+ /* other variants unchanged */
+} v;
+```
+
+`Operand` grows by 4 bytes for `reg` and 8 bytes for `ind`. Non-reg,
+non-indirect variants are untouched.
+
+**Aux structs** (in `src/opt/ir.h`) — each gains a parallel edge-id array,
+sized at aux construction:
+
+```c
+typedef struct IRPhiAux {
+ /* existing fields */
+ Val* pred_vals;
+ u32* pred_edges; /* edge id per pred; OPT_SSA_EDGE_NONE if none */
+ u32 npreds;
+ /* ... */
+} IRPhiAux;
+```
+
+Same shape for `IRCallAux`, `IRAsmAux`, `IRIntrinAux`, `IRRetAux`,
+`IRScopeAux`. Sized exactly to the use count enumerated by
+`opt_collect_inst_uses` today (`pass_analysis.c:269-325`), so no growth.
+
+Memory accounting: per use site we spend 4 bytes (edge id) at the use plus
+32 bytes (edge record) in the pool, vs today's 24 bytes per `OptUse` entry
+plus an entire `opt_uses[]` rebuild per pass. The win is in not paying the
+rebuild cost across ~38 sites per O2 pipeline.
+
+### Defs
+
+Existing `Func.val_def_block[v]` / `val_def_inst[v]` continue to identify
+the single def for each Val (`ir.h:405-406`). The edge's `def_block`,
+`def_inst`, `def_op_idx` are caches of the same information — written at
+edge creation, updated by `opt_ssa_redirect_def`. They exist so that walks
+of a Val's use list don't have to re-lookup the def through `val_def_*`
+just to know "what produced this".
+
+## Core operations
+
+The API splits into two layers:
+
+- **Low-level pool/list primitives** (this section). Hot path. Inline in
+ `src/opt/ssa_edges.h`. These touch the edge pool and the per-Val use
+ list only; they do not know about Operand or aux layouts. Suitable for
+ use inside tight walks where the caller already has the use-site handle.
+
+- **Use-site-aware public helpers** (next section). What passes call.
+ Out-of-line in `src/opt/ssa_edges.c`. These wrap the primitives and also
+ resolve and update the use-site slot (Operand field, aux edges array).
+
+Splitting this way lets the inner-most loops — walking a Val's use list
+and rewriting edges as you go — call only the primitives without paying
+for a use-site lookup that the caller already has implicitly.
+
+### Lookup and iteration
+
+```c
+/* O(1) — two dependent loads inside SegVec_at. */
+static inline OptSsaEdge* opt_ssa_edge(Func* f, u32 edge_id) {
+ return OptSsaEdgeVec_at(&f->opt_ssa.edges, edge_id);
+}
+
+/* Walk every use of val. Cache next_use before the body if the body may
+ * remove the current edge — see the "iterator invalidation" note below. */
+#define OPT_SSA_FOR_EACH_USE(F, V, E) \
+ for (u32 E = (F)->opt_ssa.first_use_by_val[V]; \
+ E != OPT_SSA_EDGE_NONE; \
+ E = opt_ssa_edge((F), E)->next_use)
+```
+
+### Pool allocate / free
+
+```c
+/* O(1). Pop from free list if non-empty, else SegVec_push. Returned slot
+ * is uninitialized — the caller writes every field. */
+static inline u32 opt_ssa_pool_alloc(Func* f) {
+ u32 id = f->opt_ssa.free_head;
+ if (id != OPT_SSA_EDGE_NONE) {
+ f->opt_ssa.free_head = opt_ssa_edge(f, id)->next_use;
+ return id;
+ }
+ (void)OptSsaEdgeVec_push(&f->opt_ssa.edges, &id);
+ return id;
+}
+
+/* O(1). Push onto free list via next_use. flag = 0xff marks the slot as
+ * free for the debug verifier. */
+static inline void opt_ssa_pool_free(Func* f, u32 id) {
+ OptSsaEdge* e = opt_ssa_edge(f, id);
+#ifndef NDEBUG
+ e->flag = 0xff;
+#endif
+ e->next_use = f->opt_ssa.free_head;
+ f->opt_ssa.free_head = id;
+}
+```
+
+### List link / unlink
+
+These are the inner loops of `add`/`remove`/`relink`. Inlined.
+
+```c
+/* O(1). Insert edge at the head of val's use list. Caller has already
+ * written e->val. */
+static inline void opt_ssa_list_push(Func* f, u32 edge_id) {
+ OptSsaEdge* e = opt_ssa_edge(f, edge_id);
+ u32 head = f->opt_ssa.first_use_by_val[e->val];
+ e->prev_use = OPT_SSA_EDGE_NONE;
+ e->next_use = head;
+ if (head != OPT_SSA_EDGE_NONE)
+ opt_ssa_edge(f, head)->prev_use = edge_id;
+ f->opt_ssa.first_use_by_val[e->val] = edge_id;
+}
+
+/* O(1). Unlink edge from val's use list. The edge fields are left as-is;
+ * the caller decides whether to free or relink. */
+static inline void opt_ssa_list_unlink(Func* f, u32 edge_id) {
+ OptSsaEdge* e = opt_ssa_edge(f, edge_id);
+ if (e->prev_use != OPT_SSA_EDGE_NONE)
+ opt_ssa_edge(f, e->prev_use)->next_use = e->next_use;
+ else
+ f->opt_ssa.first_use_by_val[e->val] = e->next_use;
+ if (e->next_use != OPT_SSA_EDGE_NONE)
+ opt_ssa_edge(f, e->next_use)->prev_use = e->prev_use;
+}
+```
+
+### Primitive add / remove / relink
+
+```c
+/* O(1). Allocate, initialize, and link an edge. Caller writes the new
+ * edge id into the use-site handle. */
+static inline u32 opt_ssa_edge_add_raw(
+ Func* f, Val val,
+ u32 def_block, u32 def_inst, u8 def_op_idx,
+ u32 use_block, u32 use_inst,
+ OptSsaUseKind kind, u16 use_op_idx, u16 sub_idx) {
+ u32 id = opt_ssa_pool_alloc(f);
+ OptSsaEdge* e = opt_ssa_edge(f, id);
+ e->val = val;
+ e->def_block = def_block;
+ e->def_inst = def_inst;
+ e->def_op_idx = def_op_idx;
+ e->use_block = use_block;
+ e->use_inst = use_inst;
+ e->use_op_idx = use_op_idx;
+ e->sub_idx = sub_idx;
+ e->kind = (u8)kind;
+ e->flag = 0;
+ opt_ssa_list_push(f, id);
+ return id;
+}
+
+/* O(1). Unlink and free. Caller clears the use-site handle. */
+static inline void opt_ssa_edge_remove_raw(Func* f, u32 edge_id) {
+ opt_ssa_list_unlink(f, edge_id);
+ opt_ssa_pool_free(f, edge_id);
+}
+
+/* O(1). Repoint an existing edge to a new Val. Caller is responsible for
+ * rewriting the use-site Reg/Val to match. The edge id and its position
+ * in the use site stay the same — only the val and the use-list it
+ * belongs to change. */
+static inline void opt_ssa_edge_relink_raw(Func* f, u32 edge_id, Val new_val,
+ u32 new_def_block,
+ u32 new_def_inst,
+ u8 new_def_op_idx) {
+ opt_ssa_list_unlink(f, edge_id);
+ OptSsaEdge* e = opt_ssa_edge(f, edge_id);
+ e->val = new_val;
+ e->def_block = new_def_block;
+ e->def_inst = new_def_inst;
+ e->def_op_idx = new_def_op_idx;
+ opt_ssa_list_push(f, edge_id);
+}
+```
+
+### Iterator invalidation
+
+Removing the current edge during a `OPT_SSA_FOR_EACH_USE` walk corrupts
+the iteration because the macro fetches `next_use` *after* the body. The
+safe pattern:
+
+```c
+u32 e = f->opt_ssa.first_use_by_val[v];
+while (e != OPT_SSA_EDGE_NONE) {
+ u32 next = opt_ssa_edge(f, e)->next_use;
+ if (should_remove_p(e)) opt_ssa_edge_remove_raw(f, e);
+ e = next;
+}
+```
+
+This is the same shape as MIR's removal loops (mir-gen.c:2810-2811). The
+macro is for read-only walks; mutation walks use the explicit form above.
+
+## Public helpers
+
+Out-of-line in `src/opt/ssa_edges.c`. These are what passes actually call;
+each resolves the use-site slot and then calls the primitive layer.
+
+### Use-site slot resolver
+
+A single switch from `(use_block, use_inst, kind, op_idx, sub_idx)` to a
+pointer to the u32 edge slot. Used by `_drop_inst`, `_add_inst`, the high-
+level relink helpers, and the verifier.
+
+```c
+static u32* opt_ssa_use_slot(Func* f, u32 use_block, u32 use_inst,
+ OptSsaUseKind kind, u16 op_idx, u16 sub_idx) {
+ Inst* in = &f->blocks[use_block].insts[use_inst];
+ switch (kind) {
+ case OPT_SSA_USE_OPERAND:
+ return &in->opnds[op_idx].v.reg.edge;
+ case OPT_SSA_USE_INDIRECT_BASE:
+ return &in->opnds[op_idx].v.ind.edge_base;
+ case OPT_SSA_USE_INDIRECT_INDEX:
+ return &in->opnds[op_idx].v.ind.edge_index;
+ case OPT_SSA_USE_PHI_INPUT:
+ return &((IRPhiAux*)in->extra.aux)->pred_edges[sub_idx];
+ case OPT_SSA_USE_AUX_OPERAND:
+ /* Dispatch on IROp; each aux struct carries its own *_edges[] sized
+ * to the use count enumerated by opt_collect_inst_uses. */
+ switch ((IROp)in->op) {
+ case IR_CALL: return &((IRCallAux*)in->extra.aux)->arg_edges[sub_idx];
+ case IR_ASM_BLOCK: return &((IRAsmAux*)in->extra.aux)->in_edges[sub_idx];
+ case IR_INTRINSIC: return &((IRIntrinAux*)in->extra.aux)->arg_edges[sub_idx];
+ case IR_RET: return &((IRRetAux*)in->extra.aux)->val_edges[sub_idx];
+ case IR_SCOPE_BEGIN: return &((IRScopeAux*)in->extra.aux)->cond_edge;
+ default: cfree_unreachable();
+ }
+ }
+ cfree_unreachable();
+}
+```
+
+The dispatch is one switch and one struct field load per resolution.
+Hot-loop callers (relink during copy_prop, GVN substitute) avoid this by
+holding the slot pointer or the Operand directly — see the relink-operand
+helpers below.
+
+### Inst-level add / drop
+
+```c
+/* Walk every use site on the instruction and add an edge. Called once
+ * per instruction from opt_ssa_edges_add_inst's outer loop, plus per
+ * new instruction emitted mid-pipeline. */
+void opt_ssa_edges_add_inst(Func* f, u32 block, u32 inst_idx);
+
+/* Walk every use site and remove its edge. Asserts (debug) that none of
+ * the instruction's defined Vals still has uses — caller has done its
+ * rewriting. */
+void opt_ssa_edges_drop_inst(Func* f, u32 block, u32 inst_idx);
+```
+
+The structure of both mirrors `opt_collect_inst_uses` (pass_analysis.c:
+269-325) exactly — same operand walk, same aux-struct dispatch. The
+difference is that today's path appends to `opt_uses[]` and writes
+`opt_first_use_by_val[v]` once at the end; the new path calls
+`opt_ssa_edge_add_raw` for each site and writes the returned edge id back
+through `opt_ssa_use_slot`.
+
+### Operand relink (the most common public helper)
+
+This is what copy_prop, GVN substitute, addr_xform, and ssa_combine call
+to rewrite an operand's Val:
+
+```c
+/* Rewrite an OPK_REG operand from old_val to new_val, maintaining edges.
+ * Reads the edge id from op->v.reg.edge; relinks it to new_val's list;
+ * writes op->v.reg.r = (Reg)new_val. */
+static inline void opt_ssa_relink_operand(Func* f, Operand* op, Val new_val) {
+ u32 edge_id = op->v.reg.edge;
+ opt_ssa_edge_relink_raw(f, edge_id, new_val,
+ f->val_def_block[new_val],
+ f->val_def_inst[new_val],
+ opt_ssa_val_def_op_idx(f, new_val));
+ op->v.reg.r = (Reg)new_val;
+}
+
+/* OPK_INDIRECT.base variant. Same shape against op->v.ind.edge_base. */
+static inline void opt_ssa_relink_indirect_base(Func* f, Operand* op, Val new_val);
+static inline void opt_ssa_relink_indirect_index(Func* f, Operand* op, Val new_val);
+
+/* Phi input: aux->pred_edges[p], aux->pred_vals[p] = new_val. */
+static inline void opt_ssa_relink_phi_input(Func* f, IRPhiAux* aux,
+ u32 pred_idx, Val new_val);
+```
+
+All five are inline because the body is short and they're called inside
+inner loops. The cost is one edge lookup, one list unlink + push, two
+field writes — no use-site resolver call because the caller already has
+the Operand/aux pointer.
+
+### Redirect def (whole-Val substitution)
+
+O(uses_of(from)). One pass over the list, rewriting each edge's
+`val`/`def_*` and the operand it refers to, then splicing the whole list
+onto `to`'s head. Single-pass equivalent of N individual relinks.
+
+```c
+void opt_ssa_redirect_def(Func* f, Val from, Val to) {
+ u32 head = f->opt_ssa.first_use_by_val[from];
+ if (head == OPT_SSA_EDGE_NONE) return;
+
+ u32 to_def_block = f->val_def_block[to];
+ u32 to_def_inst = f->val_def_inst[to];
+ u8 to_def_op_idx = opt_ssa_val_def_op_idx(f, to);
+ Reg to_reg = (Reg)to;
+
+ /* Walk from-val's list, rewriting val + operand at each edge. */
+ u32 tail = head;
+ for (u32 e = head; e != OPT_SSA_EDGE_NONE; ) {
+ OptSsaEdge* ed = opt_ssa_edge(f, e);
+ ed->val = to;
+ ed->def_block = to_def_block;
+ ed->def_inst = to_def_inst;
+ ed->def_op_idx = to_def_op_idx;
+ /* Rewrite the use-site operand to match. */
+ u32* slot = opt_ssa_use_slot(f, ed->use_block, ed->use_inst,
+ (OptSsaUseKind)ed->kind,
+ ed->use_op_idx, ed->sub_idx);
+ (void)slot; /* slot already holds e; what we update is the Reg/Val it points to */
+ opt_ssa_write_use_reg(f, ed, to_reg);
+ tail = e;
+ e = ed->next_use;
+ }
+
+ /* Splice the whole list onto to's head. */
+ u32 to_head = f->opt_ssa.first_use_by_val[to];
+ opt_ssa_edge(f, tail)->next_use = to_head;
+ if (to_head != OPT_SSA_EDGE_NONE)
+ opt_ssa_edge(f, to_head)->prev_use = tail;
+ f->opt_ssa.first_use_by_val[to] = head;
+ f->opt_ssa.first_use_by_val[from] = OPT_SSA_EDGE_NONE;
+}
+```
+
+`opt_ssa_write_use_reg` is a small helper that switches on the edge's
+`kind` and writes the appropriate Reg field of the use's Operand or aux
+entry. It's the inverse of `opt_ssa_use_slot` but for the Val/Reg value
+rather than the edge id.
+
+This mirrors MIR's `change_ssa_edge_list_def` (mir-gen.c:2445-2462) one
+to one, with cfree's per-kind dispatch for the operand-side write.
+
+### Inst-level delete
+
+```c
+void opt_ssa_delete_inst(Func* f, u32 block, u32 inst_idx) {
+ opt_ssa_edges_drop_inst(f, block, inst_idx);
+ /* existing inst-removal path: mark NOP / shift block.insts / etc. */
+ ir_block_remove_inst(f, block, inst_idx);
+}
+```
+
+Wraps the existing inst-removal path with the edge drop step. Passes
+that delete an inst always go through this rather than touching the
+block insts array directly.
+
+### State lifecycle
+
+```c
+void opt_ssa_state_init(Func* f); /* opt_build_reg_ssa */
+void opt_ssa_state_fini(Func* f); /* opt_undo_ssa */
+
+/* On Val mint: ensure first_use_by_val[] is large enough for the new
+ * highest Val id. Caller passes the new minimum capacity. */
+void opt_ssa_state_ensure_val_cap(Func* f, u32 nvals_needed);
+```
+
+`_init` zero-inits `OptSsaState`, sets `free_head = OPT_SSA_EDGE_NONE`,
+initializes the SegVec on the Compiler heap, allocates
+`first_use_by_val` with capacity `nvals` and fills with
+`OPT_SSA_EDGE_NONE`.
+
+`_fini` calls `OptSsaEdgeVec_fini` and clears the pointers. The arena-
+backed `first_use_by_val` is reclaimed by the function-end arena reset.
+
+### Debug verifier
+
+```c
+#ifndef NDEBUG
+/* Walk all blocks/insts; re-derive expected use set; compare against
+ * live edges. Called at OPT_VERIFY checkpoints in pass_o2.c. */
+void opt_ssa_edges_check(Func* f, const char* stage);
+#endif
+```
+
+For each instruction:
+
+1. For every use site, read the edge id from its slot. If non-NONE,
+ look up the edge and assert `e.val == operand_val`,
+ `e.use_block/use_inst/op_idx/sub_idx/kind` match, and the edge is on
+ `first_use_by_val[e.val]`'s doubly-linked list.
+2. For every Val the inst defines, walk `first_use_by_val[v]` and
+ assert each edge's `def_block`/`def_inst`/`def_op_idx` agrees with
+ the producing inst.
+3. Verify the free list — no live edge id appears on it, every free
+ edge has `flag == 0xff`.
+
+This is O(insts + uses) per call. Gated under `CFREE_OPT_SSA_VERIFY` so
+that day-to-day debug builds stay fast and only the full verify runs
+under explicit opt-in.
+
+## Data flow
+
+The lifecycle of an edge, from creation to teardown.
+
+**Creation (SSA construction).** `opt_build_reg_ssa` and `opt_build_ssa`
+walk the function once after IDF-based phi insertion. For each
+instruction, they call `opt_ssa_edges_add_inst`, which enumerates use
+sites the same way `opt_collect_inst_uses` does today. Each use site:
+
+1. Look up `val_def_block[v]`, `val_def_inst[v]` for the producing inst.
+2. Allocate edge (free list or pool push).
+3. Initialize all fields.
+4. Write `next_use = first_use_by_val[v]; prev_use = NONE`. If the
+ existing head exists, set its `prev_use = new_edge_id`.
+5. Set `first_use_by_val[v] = new_edge_id`.
+6. Write the edge id into the use-site handle (Operand or aux array).
+
+**Use walk (the most common operation).** Passes consult uses of a Val v.
+The walk reads `first_use_by_val[v]`, then chases `next_use`. Each step is
+one SegVec `_at` (two dependent loads). Within a walk, edges are stable —
+the caller can hold `OptSsaEdge*` for the duration of one iteration safely
+across edge removal at the current cursor (cache the `next_use` before
+removing).
+
+**Relink (operand rewrite).** A pass discovers that an operand currently
+referring to Val `old_v` should now refer to `new_v`. It reads the edge id
+from the use-site handle, calls `opt_ssa_edge_relink`. The edge's
+`prev_use`/`next_use` are unlinked from `old_v`'s list, the edge's `val`
+and the operand's Reg are updated, and the edge is linked at the head of
+`new_v`'s list. Six pointer rewrites total, no list walk.
+
+**Redirect def (whole-Val substitution).** A pass discovers Val `a` is
+equivalent to Val `b` (GVN substitute, copy_cleanup). `opt_ssa_redirect_def`
+walks `a`'s use list once. For each edge: rewrite its `val` and the
+operand it indexes. After the walk, splice the tail of `a`'s list onto
+`b`'s list head:
+
+```c
+u32 head_a = first_use_by_val[a];
+if (head_a == NONE) { return; }
+u32 tail = walk(head_a); /* find tail, rewriting val + operand along the way */
+opt_ssa_edge(tail)->next_use = first_use_by_val[b];
+if (first_use_by_val[b] != NONE)
+ opt_ssa_edge(first_use_by_val[b])->prev_use = tail;
+first_use_by_val[b] = head_a;
+first_use_by_val[a] = NONE;
+```
+
+O(uses_of(a)) with no per-element list-juggling. The producing inst of `a`
+is left in place (DCE will remove it once unused).
+
+**Inst deletion.** Three steps:
+
+1. `opt_ssa_edges_drop_inst(f, block, inst_idx)`. For each use site on the
+ inst, remove the edge (incoming). For each Val the inst defines, walk
+ its use list — if non-empty, the caller is wrong to be deleting this
+ inst; in debug builds, assert. The walk is fine because deletion implies
+ no remaining uses (the caller has already done its rewriting).
+2. Remove the inst from `Block.insts[]` (or mark NOP — existing
+ convention).
+3. The arena memory for `Inst.opnds` and aux structs is not freed
+ individually; it's reclaimed at function-end arena reset.
+
+**Inst creation mid-pipeline.** A pass emits a new inst (e.g., a copy
+inserted by copy_cleanup, a hoisted invariant by LICM):
+
+1. `ir_emit` allocates the inst with Operand fields initialized to
+ `OPT_SSA_EDGE_NONE`.
+2. Pass fills in operands and aux.
+3. Pass calls `opt_ssa_edges_add_inst` on the new inst.
+
+The order matters: edges are added *after* operands are populated, so the
+add path can read the Reg values to find their producing Val and edge to.
+
+**Teardown (opt_undo_ssa).** Walks all blocks/insts, calls
+`opt_ssa_edges_drop_inst` on each. After the walk, `OptSsaState` is reset:
+edges SegVec emptied (or kept for the next function), free_head reset,
+first_use_by_val cleared.
+
+## Lifetimes
+
+Everything is per-function. SSA edges are not retained across functions in
+the optimizer's `FuncSet`; each function is built, lowered, and emitted in
+isolation. The lifetimes:
+
+| Object | Born | Dies | Storage |
+| --- | --- | --- | --- |
+| `OptSsaEdge` (record) | `opt_ssa_edge_add` | `opt_ssa_edge_remove` | `OptSsaState.edges` (SegVec on Heap) |
+| Edge slot in SegVec | first segment push | `OptSsaState_fini` | Heap; freed at function teardown |
+| `OptSsaState.free_head` | `opt_build_reg_ssa` | `opt_undo_ssa` | `Func.opt_ssa` |
+| `first_use_by_val[]` | `opt_build_reg_ssa` / on Val mint grow | `opt_undo_ssa` | `Func.arena` |
+| `Operand.v.reg.edge` | `ir_emit` (init to NONE) → `opt_ssa_edge_add` | `opt_ssa_edge_remove` / inst deletion | inside `Inst.opnds` (arena) |
+| `IRPhiAux.pred_edges[]` | aux construction | function arena reset | `Func.arena` |
+| Other aux `*_edges[]` | aux construction | function arena reset | `Func.arena` |
+
+Three lifetime invariants for callers:
+
+1. **Edge ids are valid only between `opt_build_*_ssa` and `opt_undo_ssa`.**
+ At O1, no edges exist. Any code path that runs at both O1 and O2 must
+ gate edge maintenance on `Func.opt_reg_ssa`.
+2. **`OptSsaEdge*` is stable across `add`/`remove`/`relink`** thanks to
+ SegVec. It is NOT stable across `OptSsaState_fini`. Within a pass it's
+ safe to hold; across passes use the edge id.
+3. **An edge's use-site handle must always agree with the edge.** If a
+ pass rewrites an operand's Reg directly (not through `relink`), the
+ edge's `val` becomes stale and the use list is wrong. The debug
+ verifier (`opt_ssa_edges_check`) walks all insts, re-derives the
+ expected use set, and compares; this is the catch-all.
+
+## Operand mutation: the contract
+
+Every place that mutates an instruction operand or aux use moves to a
+helper:
+
+| Today | New |
+| --- | --- |
+| `op->v.reg = new_val;` | `opt_ssa_relink_operand(f, op, new_val);` |
+| `aux->pred_vals[p] = new_val;` | `opt_ssa_relink_phi_input(f, aux, p, new_val);` |
+| `block.insts[i].op = IR_NOP;` (delete) | `opt_ssa_delete_inst(f, b, i);` |
+| emit new inst, then read its operands | emit new inst, then `opt_ssa_edges_add_inst(f, b, i);` |
+| `opt_rebuild_def_use(f);` | deleted; the invariant is maintained |
+
+The migration tax is one-time: every pass that mutates touches these
+helpers exactly once per mutation kind, then is done.
+
+## Construction and teardown
+
+`opt_build_reg_ssa` / `opt_build_ssa` (currently in `src/opt/pass_ssa.c`) are
+rewritten in two phases:
+
+1. **Today's shape, edge-aware**: replace the post-construction
+ `opt_rebuild_def_use(f)` call (`pass_ssa.c:548`, `:807`) with a single
+ forward walk that calls `opt_ssa_edges_add_inst` once per instruction.
+ IDF-based phi insertion stays unchanged. Net structural move with no
+ algorithm change.
+
+2. **Later (post-roadmap re-enable)**: switch to MIR-shape demand-driven
+ construction with `get_def` and `minimize_ssa`
+ (mir-gen.c:2284-2372). Add live-in filtering to mem2reg
+ (`compute_phi_sites`, `pass_ssa.c:623-654`).
+
+Teardown is symmetric: `opt_undo_ssa` calls `opt_ssa_edges_drop_inst` for
+each instruction as it converts back to PReg form, returning all edges to
+the free list.
+
+## Invalidation and verification
+
+`Func.opt_valid_analyses` (`ir.h:475`) currently has an
+`OPT_ANALYSIS_DEF_USE` bit. Under the new representation, def-use is
+**always valid in SSA mode** — there is no rebuild. The bit becomes a
+verification flag rather than a state flag:
+
+- Set after `opt_build_*_ssa` (matches today).
+- Cleared by `opt_undo_ssa`.
+- Never cleared by individual passes. If a pass needs to mutate instructions,
+ it goes through `opt_ssa_edges_*` and the invariant survives.
+
+In debug builds, `opt_verify` (`pass_o2.c` checkpoints) gains an
+`opt_ssa_edges_check` step that re-derives uses from instructions and
+compares against the live edge lists. This is the safety net for the
+migration: any pass that mutates instructions without maintaining edges
+trips the verifier at the next checkpoint.
+
+## Migration plan
+
+The migration treats every SSA-era pass as opt-in. The structural move lands
+green with downstream passes disabled, then each pass is re-enabled in turn
+with its MIR-parity work.
+
+### Phase 0 — quarantine SSA-era passes
+
+In `src/opt/pass_o2.c:opt_cleanup`, gate every pass between
+`opt_build_reg_ssa` and `opt_undo_ssa` behind a per-pass `#ifdef
+CFREE_OPT_SSA_PASS_<NAME>`. By default all are undefined. The schedule then
+runs:
+
+```text
+opt_build_cfg, opt_jump_cleanup, opt_build_cfg
+opt_build_reg_ssa
+opt_build_ssa
+opt_undo_ssa
+opt_jump_opt
+```
+
+That is the smallest legal O2 schedule: build SSA, immediately undo, run
+the shared lowering pipeline. It is functionally equivalent to O1 plus the
+SSA round-trip — expensive but correct, and the bisection floor for the
+structural change.
+
+Tests that currently exercise SSA-era transformations
+(`128_o2_branch_join_addr_mem.toy` and friends) move to an "expected
+mediocre" tier in `test-opt` and `test-toy`: they must continue to compile
+and produce correct output, but quality is not asserted until the
+corresponding pass is re-enabled. This avoids losing coverage during the
+migration. `make bench-opt` at O2 will regress against today's numbers
+during the quarantine window — that is expected and recorded in
+`OPT_PERF.md` as a known trough.
+
+### Phase 1 — land the data structures
+
+1. Add `OptSsaEdge`, the pool fields on `Func`, and the `u32 edge` /
+ `u32 edge_index` slots on `Operand`.
+2. Add `u32* pred_edges` to `IRPhiAux`; `u32* operand_edges` to the call /
+ asm / intrinsic / ret / scope aux structs (sizes track the existing
+ use enumeration in `opt_collect_inst_uses`).
+3. Implement the core operations: `opt_ssa_edge_add`,
+ `opt_ssa_edge_remove`, `opt_ssa_edge_relink`, `opt_ssa_redirect_def`,
+ `opt_ssa_edges_add_inst`, `opt_ssa_edges_drop_inst`.
+4. Implement `opt_ssa_edges_check` (debug-only verifier).
+5. Wire `opt_build_*_ssa` to build edges in one forward pass; wire
+ `opt_undo_ssa` to drop them.
+6. Delete `opt_uses[]`, `OptUse`, `opt_rebuild_def_use`, and every call to
+ it. Each call site converts to a direct edge walk.
+
+Acceptance: O2 quarantine schedule (Phase 0) passes `make test-opt
+test-toy`, with the verifier enabled in debug builds.
+
+### Phase 2 — re-enable passes, one per landing
+
+Each phase is its own commit (or small series), shaped:
+
+1. Implement the MIR-equivalent algorithm using incremental edges from the
+ start — never reintroduce a wholesale rebuild.
+2. Define the corresponding `CFREE_OPT_SSA_PASS_<NAME>` and enable it in
+ the O2 schedule.
+3. Restore the pass's quality assertions in `test-opt` / `test-toy`.
+4. Refresh the corresponding row in `doc/OPT_PERF.md` and add an iteration
+ note documenting the MIR parity (or improvement) achieved.
+
+Suggested order, chosen so each pass exercises only edges already proven by
+its predecessors:
+
+1. **SSA DCE** (`opt_ssa_dce`). The simplest consumer: walks edges, deletes
+ instructions with no uses, calls `opt_ssa_edges_drop_inst`. Validates the
+ delete path.
+2. **Copy cleanup / copy prop** (`opt_copy_cleanup`, `opt_copy_prop`). Heavy
+ use of `opt_ssa_redirect_def`. Validates the redirect path. Folds into a
+ single pass (vs MIR's `copy_prop` — see `OPT_PERF.md` gap analysis).
+3. **Address transform** (`opt_addr_xform`). Per-use rewrite using
+ `opt_ssa_edge_relink`. Implements MIR's per-use folding (vs cfree's
+ all-or-nothing today).
+4. **Simplify** (`opt_simplify`). Pure local rewrites — minimal new edge
+ maintenance, mostly relink.
+5. **GVN** (`opt_gvn`). Worklist re-enqueue after branch fold (MIR parity
+ item from gap analysis). Use the per-edge `flag` for the worklist.
+6. **DSE** (`opt_dse`). Liveness computed once outside; pass walks
+ incrementally. Drops the inline fixpoint.
+7. **LICM** (`opt_licm`) — with pressure filter and loop-tree reuse from
+ the start. This is the priority runtime fix from `OPT_PERF.md`; it
+ waits until edges are stable because the filter needs use-set queries.
+8. **Pressure relief**, **conventional SSA**, **ssa_combine**, **undo SSA**
+ round out the schedule.
+
+The structural move itself is Phases 0–1. Phase 2 is the pass-by-pass
+re-introduction and overlaps the O2 runtime roadmap in `OPT_PERF.md`.
+
+## Files touched
+
+Structural change (Phase 0–1):
+
+- `src/opt/ir.h` — `OptSsaEdge`, pool fields on `Func`, `Operand.edge` and
+ `Operand.edge_index`, aux struct edge arrays.
+- `src/opt/pass_analysis.c` — replace `opt_rebuild_def_use` and `OptUse`
+ collection with edge-aware enumeration helpers.
+- `src/opt/pass_ssa.c` — `opt_build_reg_ssa`, `opt_build_ssa`, `opt_undo_ssa`
+ switch to edge maintenance.
+- `src/opt/pass_o2.c` — add `CFREE_OPT_SSA_PASS_*` gates around the SSA-era
+ schedule.
+- New `src/opt/ssa_edges.c` (or extend `pass_analysis.c`) — pool, free
+ list, core operations, debug verifier.
+
+Pass-by-pass (Phase 2): one file per pass, edited in place.
+
+## Open questions
+
+These are deferred to implementation:
+
+- **`SEG_SHIFT` choice.** 7 (128 edges/segment ≈ 4 KB) is the starting
+ point. Measure peak edge count on `matrix`/`hash` and tune up to 8 if
+ segments are routinely full. Don't tune down — segment header overhead
+ outweighs waste at small sizes.
+- **`OPK_INDIRECT.edge_base` / `edge_index` packing.** Two adjacent u32s
+ is the obvious layout. Alternative: pack as `u32 edges[2]` with kind
+ implicit by slot. Pick whichever reads cleanest during the pass_analysis
+ rewrite. No performance difference.
+- **Per-edge `flag` reuse.** MIR uses it for renaming and for GVN/copy
+ worklists. cfree passes may want more than one bit — start with one,
+ add a second `u8 flag2` only when a pass concretely needs it.
+- **Verifier cost.** `opt_ssa_edges_check` is debug-only. If it becomes
+ too slow even in debug, gate it behind a separate `CFREE_OPT_SSA_VERIFY`
+ define so day-to-day debug builds still complete in reasonable time.