commit 32f88989b99113efd573a10ab332c1e7732b54ff
parent bf1dc146e22105508838aeb887215cd591aeb6fb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sun, 10 May 2026 10:24:27 -0700
opt: Phase 3 — IR-native recording, SSA construction, dry-run
Refactor opt_cgtarget to record CGTarget calls directly into the SSA
IR (Func/Block/Inst) instead of into a shadow tape. Each method lands
as exactly one Inst; replay walks the Func and re-issues calls onto
the wrapped target. Levels 1 and 2 share the recording and replay
path; level 2 additionally runs build_cfg + build_ssa as a dry-run
(output discarded) to shake out IR-shape bugs ahead of Phase 4's
lowering pipeline.
- src/opt/ir.h, src/opt/ir.c: extend IR with CGTarget-shape ops
(IR_COPY, IR_ADDR_OF, IR_TLS_ADDR_OF, IR_CMP_BRANCH,
IR_SCOPE_BEGIN/ELSE/END, IR_BREAK_TO/CONTINUE_TO,
IR_BINOP/UNOP/CMP/CONVERT, IR_LOAD_IMM/CONST, etc.); switch
Inst.opnds from Val* to Operand* (collapsing Reg ↔ Val per
doc/OPT.md §5.1); add Func.emit_order so replay walks blocks in
the order CG visited them (cmp_branch fallthrough blocks are
created after a label_new'd block but must physically follow it).
- src/opt/opt.c: drop the tape; record straight into Func; replay
allocates target Regs lazily on first def via target->alloc_reg.
- src/opt/pass_cfg.c: rebuild Block.preds from succ[]/nsucc.
- src/opt/pass_ssa.c: Cooper-Harvey-Kennedy iterative dominators,
dominance frontiers, mem2reg phi insertion at iterated DF, dom-
tree rename DFS. Promotable slots identified via OPK_LOCAL on
IR_LOAD/STORE.
- test/cg/run.sh: default CFREE_OPT_LEVELS to "0 1 2".
- doc/OPT.md: revise the level dial — both levels build CFG + SSA
and (post-Phase 4) emit through SSA → machinize → regalloc →
emit; the level selects only the optimization schedule. Level 1
is the minimal/bisection-floor set; level 2 is the full pipeline
with IPO.
Corpus stays green at all three levels (1742 pass).
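For readers unfamiliar with the pass_cfg.c / pass_ssa.c steps named
above, here is a minimal sketch of rebuilding predecessor lists from
succ[]/nsucc and then running the Cooper-Harvey-Kennedy iterative
dominator computation over them. It is not the code in this commit:
MiniBlock, MAXB, rebuild_preds, compute_idom, and example_idoms are
hypothetical names, and fixed-size arrays stand in for the arena-backed
storage the real IR uses.

```c
#include <assert.h>
#include <stdint.h>

#define MAXB 16            /* hypothetical cap on blocks for this sketch */
#define UNDEF UINT32_MAX   /* "no immediate dominator computed yet" */

/* Hypothetical miniature block: only the fields the sketch needs. */
typedef struct {
  uint32_t succ[2];
  uint8_t nsucc;
  uint32_t preds[2 * MAXB]; /* each block has <= 2 succs, so <= 2*MAXB preds */
  uint32_t npreds;
} MiniBlock;

/* Rebuild every preds list from succ[]/nsucc. */
static void rebuild_preds(MiniBlock* b, uint32_t n) {
  for (uint32_t i = 0; i < n; ++i) b[i].npreds = 0;
  for (uint32_t i = 0; i < n; ++i)
    for (uint8_t s = 0; s < b[i].nsucc; ++s) {
      MiniBlock* t = &b[b[i].succ[s]];
      t->preds[t->npreds++] = i;
    }
}

/* Depth-first walk that appends each reachable block in postorder. */
static void dfs(const MiniBlock* b, uint32_t u, uint8_t* seen,
                uint32_t* post, uint32_t* npost) {
  seen[u] = 1;
  for (uint8_t s = 0; s < b[u].nsucc; ++s)
    if (!seen[b[u].succ[s]]) dfs(b, b[u].succ[s], seen, post, npost);
  post[(*npost)++] = u;
}

/* Walk two candidates up the partial dominator tree until they meet. */
static uint32_t intersect(const uint32_t* idom, const uint32_t* po_num,
                          uint32_t a, uint32_t c) {
  while (a != c) {
    while (po_num[a] < po_num[c]) a = idom[a];
    while (po_num[c] < po_num[a]) c = idom[c];
  }
  return a;
}

/* Iterate to a fixed point; unreachable blocks keep idom == UNDEF. */
static void compute_idom(const MiniBlock* b, uint32_t n, uint32_t entry,
                         uint32_t* idom) {
  uint8_t seen[MAXB] = {0};
  uint32_t post[MAXB], po_num[MAXB], npost = 0;
  assert(n <= MAXB && entry < n);
  dfs(b, entry, seen, post, &npost);
  for (uint32_t i = 0; i < n; ++i) { idom[i] = UNDEF; po_num[i] = 0; }
  for (uint32_t i = 0; i < npost; ++i) po_num[post[i]] = i;
  idom[entry] = entry;
  for (int changed = 1; changed;) {
    changed = 0;
    /* Reverse postorder, skipping the entry (last in postorder). */
    for (uint32_t k = npost - 1; k-- > 0;) {
      uint32_t u = post[k], best = UNDEF;
      for (uint32_t p = 0; p < b[u].npreds; ++p) {
        uint32_t q = b[u].preds[p];
        if (idom[q] == UNDEF) continue; /* not processed yet, or unreachable */
        best = (best == UNDEF) ? q : intersect(idom, po_num, q, best);
      }
      if (best != idom[u]) { idom[u] = best; changed = 1; }
    }
  }
}

/* Example CFG: 0 -> {1,2}; 1 -> 3; 2 -> 3; 3 -> {4,1}; 4 returns. */
static void example_idoms(uint32_t idom[MAXB]) {
  MiniBlock b[5] = {
    {.succ = {1, 2}, .nsucc = 2},
    {.succ = {3, 0}, .nsucc = 1},
    {.succ = {3, 0}, .nsucc = 1},
    {.succ = {4, 1}, .nsucc = 2},
    {.nsucc = 0},
  };
  rebuild_preds(b, 5);
  compute_idom(b, 5, 0, idom);
}
```

On the example CFG, example_idoms fills idom[0..4] with {0, 0, 0, 0, 3}:
block 3 is reachable through both 1 and 2, so only the entry dominates
it, and the back edge 3 -> 1 does not change block 1's dominator.
Dominance frontiers and phi placement would build on this idom array.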
Diffstat:
| M | doc/OPT.md | | | 225 | +++++++++++++++++++++++++++++++++++++++++++++---------------------------------- |
| A | src/opt/ir.c | | | 176 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| M | src/opt/ir.h | | | 353 | ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------- |
| M | src/opt/opt.c | | | 2325 | ++++++++++++++++++++++++++++++------------------------------------------------- |
| A | src/opt/pass_cfg.c | | | 116 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| A | src/opt/pass_ssa.c | | | 427 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
| M | test/cg/run.sh | | | 60 | +++++++++++++++++++++++++++++++++++++----------------------- |
7 files changed, 1966 insertions(+), 1716 deletions(-)
diff --git a/doc/OPT.md b/doc/OPT.md
@@ -19,49 +19,63 @@ and builds out to a real intra+IPO pipeline.
## 1. What working opt must look like
-Two distinct replay engines, sharing one recording front-end.
-
-### 1.1 Level-1 (function-at-a-time, no SSA)
+A single shared engine. CG drives `opt_cgtarget`; every CGTarget call
+lands as exactly one `Inst` in a per-function flat-CFG IR (§5.1). On
+`func_end` (intra-procedural) and `finalize` (inter-procedural), the
+wrapper runs an optimization schedule, lowers through machinize →
+regalloc → emit, and drives the wrapped target CGTarget to produce
+machine code.
```
-parse → cg → opt_cgtarget {record into Func} → on func_end:
- optionally rewrite the recorded tape (peephole, constfold)
- replay each Inst back as the matching CGTarget call →
- wrapped target → MCEmitter → ObjBuilder
+parse → cg → opt_cgtarget {record into Func} →
+ on func_end: build_cfg → build_ssa → <intra schedule> → <lower>
+ on finalize: <inter schedule> → for each dirty Func: <lower>
+ → wrapped target CGTarget → MCEmitter → ObjBuilder
+
+ <lower> = make_conventional_ssa → ssa_combine → undo_ssa →
+ machinize → live_info → coalesce → regalloc → combine →
+ dce → opt_emit
```
-Replay is 1:1: each recorded `Inst` corresponds to one `target->method(...)`
-call. No phis, no register allocation, no IR-level CFG transforms that
-would break the correspondence. This preserves `doc/DESIGN.md` §8's
-"function-at-a-time" streaming guarantee at -O1.
+The level dial selects only the optimization schedule; everything
+else is shared.
-### 1.2 Level-2 (TU-buffered, SSA, IPO, full lowering)
+### 1.1 Level 1 — minimal
-```
-parse → cg → opt_cgtarget {record into Func} → on func_end:
- store Func; do nothing else.
- on cgtarget_finalize:
- for each Func: build_cfg, build_ssa (incl. mem2reg), gvn,
- copy_prop, dse, ssa_dce, licm, pressure_relief,
- make_conventional_ssa, ssa_combine, undo_ssa, jump_opt
- opt_inline + opt_cleanup (bounded iterations, -finline-iters=N)
- for each Func: machinize, live_info, coalesce, regalloc,
- combine, dce, opt_emit → wrapped target → MCEmitter
-```
+Intra: `build_cfg`, `build_ssa` (mem2reg), `jump_opt`. No GVN, no
+LICM, no DSE.
+Inter: none (no inlining; no cleanup iteration).
+
+Just enough to land code through the SSA pipeline at quality
+comparable to direct `CGTarget` lowering. The point of level 1 is
+isolation: when a bug reproduces here, it is in IR construction,
+SSA, or lowering — not in any of the value-changing passes. It is
+the working bisection floor for the rest of the pipeline.
-The level-2 path *cannot* preserve the 1:1 record→replay correspondence
-once SSA/regalloc has run — the lowering pipeline is the only path
-back to the wrapped CGTarget. This split is the single biggest
-load-bearing decision in the design.
+### 1.2 Level 2 — full
+
+Intra (per `doc/DESIGN.md` §9.2): `build_cfg`, `build_ssa` (mem2reg),
+`gvn` (incl. constprop, redundant-load elim), `copy_prop` (incl.
+redundant-extension elim), `dse`, `ssa_dce`, `jump_opt`,
+`build_loop_tree` + `licm`, `addr_xform`, `block_cloning`,
+`pressure_relief`.
+Inter: `opt_inline` + `opt_cleanup` (bounded by `-finline-iters=N`,
+default 1).
### 1.3 Equivalence we commit to
For every case in `test/cg/CORPUS.md` Groups A–Q, building with
-`opt_cgtarget_new(c, target, level)` for `level ∈ {1, 2}` must produce
-the same `test_main` exit code as building against the AArch64 target
-directly. DWARF (W path) equivalence is weaker — opt at level 2 may
-collapse line rows or move locations into loclists — but Group P at
-level 0/1 must be byte-identical.
+`opt_cgtarget_new(c, target, level)` for `level ∈ {1, 2}` must
+produce the same `test_main` exit code as building against the
+AArch64 target directly. Both levels emit through the SSA →
+machinize → regalloc → emit path, so neither is byte-equivalent to
+level 0 — exit-code equivalence is the contract.
+
+DWARF (W path) equivalence is weaker. Level 1 aims for row-by-row
+parity with level 0 in Group P; level 2 may collapse line rows or
+move locations into loclists when value-changing passes fire.
+Neither is a hard contract until the DWARF cross-level harness lands
+(§3 Phase 5+).
---
@@ -126,8 +140,10 @@ level 0/1 must be byte-identical.
Each phase ends at a green test surface. Phases 0–3 are reversible —
strip the wrapper back to identity replay and the system still works.
-Phase 4 is the wedge: once SSA/regalloc is in the pipeline, you need
-the full lowering path to get bytes back.
+Phase 4 is the wedge: once the SSA → lowering pipeline is the only
+path to bytes, both levels go through it. Levels 1 and 2 share the
+pipeline from Phase 4 onward and diverge only in which optimization
+passes the schedule includes (§1.1, §1.2).
### Phase 0 — wrapper skeleton + equivalence harness
@@ -152,10 +168,14 @@ Harness:
- Add `--opt-level N` flag to `test/cg/harness/cg_runner.c`. When
`N > 0`, wrap the constructed `CGTarget*` with
`opt_cgtarget_new(c, target, N)` before running the case.
-- `test/cg/run.sh`: each case in Groups A–Q runs at `--opt-level 0`
- and `--opt-level 1`; exit codes must match. Add `--opt-level 2`
- once Phase 4 lands. Group P (DWARF/W path) stays at level 0 only
- for now — opt-level equivalence on DWARF is a Phase 5+ concern.
+- `test/cg/run.sh`: each case in Groups A–Q runs at `--opt-level 0`,
+ `1`, and `2`; exit codes must match across all three. Through
+ Phase 3, levels 1 and 2 share the recorder's 1:1 replay path with
+ level 2 additionally exercising `build_cfg` + `build_ssa` as a
+ dry-run (output discarded). Phase 4 promotes both levels to the
+ shared SSA → lower path. Group P (DWARF/W path) stays at level 0
+ only for now — opt-level equivalence on DWARF is a Phase 5+
+ concern.
Tests: full Groups A–Q D/R/E/J pass at `--opt-level 0` and
`--opt-level 1`. Any divergence means the forwarding lost data — fix
@@ -222,30 +242,35 @@ disassembly diff against `--opt-level 0`.
### Phase 3 — SSA construction, dry-run
Goal: build SSA without consuming it. Catches IR-shape bugs before
-they matter.
+they matter — Phase 4's lowering pipeline depends on the SSA being
+correct, so we shake out construction bugs separately.
- `src/opt/pass_cfg.c`: `opt_build_cfg` — derives
`Block.preds`/`succ`/`nsucc` from terminators (`IR_BR`, `IR_CONDBR`,
- `IR_RET`, `IR_LONGJMP`, `IR_INTRINSIC{TRAP,UNREACHABLE}`).
- `IR_SETJMP` is a control barrier — splits its block but is not a
- terminator (control falls through).
+ `IR_CMP_BRANCH`, `IR_RET`, `IR_LONGJMP`, `IR_BREAK_TO`,
+ `IR_CONTINUE_TO`, `IR_INTRINSIC{TRAP,UNREACHABLE}`). `IR_SETJMP` is
+ a control barrier — splits its block but is not a terminator
+ (control falls through).
- `src/opt/pass_ssa.c`: `opt_build_ssa` — standard dominance-frontier
algorithm; promotes any `FrameSlot` whose `FSF_ADDR_TAKEN` bit was
never set (mem2reg folded in per `doc/DESIGN.md` §12). Inserts
`IR_PHI` instructions with `IRPhiAux` populated.
-- At level 2 these run on `func_end` but their output is *discarded*
- before replay. Goal is "no panics on the corpus", not "improved
+- Through Phase 3 these run on `func_end` at level 2 only, and their
+ output is *discarded* before replay; Phase 4 lands the lowering path
+ and runs them at both levels. Goal at this phase is "no panics on
+ the corpus", not "improved code".
-Tests: at level 2, recorder runs `build_cfg` + `build_ssa` then
-forces level-1 replay. Corpus stays green; any panic is a real bug
-(unhandled IROp, malformed CFG from goto/switch lowering, address-
-taken-detection miss).
+Tests: at level 2, recorder runs `build_cfg` + `build_ssa`, discards
+the result, and falls back to the recorder's 1:1 replay. Corpus stays
+green; any panic is a real bug (unhandled IROp, malformed CFG from
+goto/switch lowering, address-taken-detection miss).
### Phase 4 — lowering pipeline (the wedge)
-Goal: real level-2 path: SSA → machinize → regalloc → emit. From
-this point, level-2 cannot fall back to level-1 replay.
+Goal: SSA → make_conventional_ssa → undo_ssa → machinize →
+live_info → coalesce → regalloc → combine → dce → opt_emit, replacing
+the recorder's 1:1 replay at *both* levels. From this point, neither
+level falls back to record→replay.
- `src/opt/pass_machinize.c`: `opt_machinize(Func*, Target)` — ABI
lowering (calls into `TargetABI` for argument/return part
@@ -259,53 +284,59 @@ this point, level-2 cannot fall back to level-1 replay.
target for clobbers and the param/sret physical mapping).
- `src/opt/pass_emit.c`: `opt_emit(Compiler*, Func*, CGTarget*)` —
walks the lowered IR and drives the wrapped target via the
- emit-side CGTarget surface. This is symmetric with `replay.c` from
- Phase 1: Inst → target call. Difference is that operands are now
- physical Operands (post-RA) and prolog/epilog/spill insertion has
- happened.
-- Wrapper's `finalize` at level 2 runs the per-Func intra pipeline,
- then for each Func runs the lowering pipeline, then calls
- `target->finalize(target)`.
-
-Initial pass set inside the intra pipeline is *empty* — `build_ssa`
-followed immediately by `undo_ssa`/`make_conventional_ssa`. The
-lowering pipeline does the real work. This isolates the lowering bugs
-from the optimization bugs.
-
-Tests: `--opt-level 2` corpus is green. Equivalence harness extended
-to cover level 2.
-
-Exit criterion: level-2 corpus green, no regressions in the level-1
-or level-0 suites.
-
-### Phase 5 — intra-procedural passes, one at a time
-
-Goal: enable real optimizations. Each pass is independently
-toggleable for bisection.
-
-Order (tracking `doc/DESIGN.md` §9.2):
-
-1. `opt_gvn` (with constprop, redundant-load elim folded in).
-2. `opt_copy_prop` (with redundant-extension elim).
-3. `opt_dse`.
-4. `opt_ssa_dce`.
-5. `opt_jump_opt`.
-6. `opt_build_loop_tree` + `opt_licm`.
-7. `opt_addr_xform`.
-8. `opt_block_cloning`.
-9. `opt_pressure_relief`.
-10. Lowering-time additions: `opt_coalesce`, `opt_combine`,
- `opt_dce` (post-RA), live-range splitting in `opt_regalloc`.
-
-Each pass lands behind a flag so the equivalence harness can run with
-just-this-pass to localize bugs. No UB-exploiting transformations
-(`doc/DESIGN.md` §9): no signed-overflow-is-unreachable, no
-shift-by-≥-width-is-unreachable, no division-by-zero-is-unreachable,
-no null-deref-is-unreachable.
-
-### Phase 6 — inter-procedural
-
-Goal: cross-function inlining + cleanup iteration.
+ emit-side CGTarget surface. Inst → target call, but operands are
+ now physical Operands (post-RA) and prolog/epilog/spill insertion
+ has happened.
+- Wrapper's `func_end` runs `build_cfg` + `build_ssa` +
+ `make_conventional_ssa` + `undo_ssa` + the lowering pipeline at
+ both levels. No optimization passes between SSA build and undo at
+ this phase — the lowering pipeline does all the work, so we
+ isolate lowering bugs from optimization-pass bugs. Level 1 and
+ level 2 are functionally identical at the end of Phase 4.
+
+Tests: `--opt-level 1` and `--opt-level 2` corpora green. The
+recorder's old 1:1 replay path is removed; SSA → lower is the only
+path to bytes.
+
+Exit criterion: levels 1 and 2 both green, no regressions in the
+level-0 suite.
+
+### Phase 5 — intra-procedural passes, level-gated
+
+Goal: populate the optimization schedule. All new passes land in the
+*level-2* schedule only; level 1's schedule stays at the Phase 4 set
+(`build_cfg`, `build_ssa`, `jump_opt` once it's wired in) so it
+remains the bisection floor for IR/SSA/lowering bugs.
+
+Order (tracking `doc/DESIGN.md` §9.2; each pass lands behind a flag
+so the equivalence harness can run with just-this-pass to localize
+bugs):
+
+1. `opt_jump_opt` — moves into the level-1 schedule too (cheap,
+ high-value, doesn't change values).
+2. `opt_gvn` (with constprop, redundant-load elim folded in) —
+ level 2 only.
+3. `opt_copy_prop` (with redundant-extension elim) — level 2 only.
+4. `opt_dse` — level 2 only.
+5. `opt_ssa_dce` — level 2 only.
+6. `opt_build_loop_tree` + `opt_licm` — level 2 only.
+7. `opt_addr_xform` — level 2 only.
+8. `opt_block_cloning` — level 2 only.
+9. `opt_pressure_relief` — level 2 only.
+10. Lowering-time additions (run at both levels): `opt_coalesce`,
+ `opt_combine`, `opt_dce` (post-RA), live-range splitting in
+ `opt_regalloc`.
+
+No UB-exploiting transformations (`doc/DESIGN.md` §9): no
+signed-overflow-is-unreachable, no shift-by-≥-width-is-unreachable,
+no division-by-zero-is-unreachable, no null-deref-is-unreachable.
+
+### Phase 6 — inter-procedural (level 2 only)
+
+Goal: cross-function inlining + cleanup iteration. Level 1 stays
+intra-procedural — the inliner is the largest source of correctness
+risk and the largest divergence from level-0 codegen, so it earns
+its own gating.
- `opt_inline`: bottom-up call-graph walk. SCCs (mutual recursion)
skipped. Heuristic: instruction count + call site count. Inlining
diff --git a/src/opt/ir.c b/src/opt/ir.c
@@ -0,0 +1,176 @@
+/* ir.c — Func/Block/Inst plumbing for the SSA IR (doc/OPT.md §1).
+ *
+ * Each CGTarget call recorded by opt_cgtarget produces exactly one Inst
+ * (or none, for pure bookkeeping calls like alloc_reg). Storage is per-
+ * Func arena, allocated against c->tu so the Func survives until
+ * cgtarget_finalize.
+ *
+ * Invariants:
+ * - VAL_NONE (= 0) is reserved; first allocated Val is 1.
+ * - val_def_block / val_def_inst / val_type / val_cls are parallel
+ * arrays indexed by Val.
+ * - Inst.opnds is Operand[] (not Val[]): Reg/Val are collapsed
+ * (doc/OPT.md §5.1) and OPK_REG operands' v.reg field IS the Val
+ * used at this site. Other OpKinds (IMM/LOCAL/GLOBAL/INDIRECT) are
+ * not Val uses for SSA dataflow.
+ */
+
+#include "opt/ir.h"
+
+#include <string.h>
+
+#include "core/arena.h"
+#include "core/core.h"
+
+/* ---- val table ---- */
+
+static void val_table_grow(Func* f, u32 needed) {
+ if (needed <= f->vals_cap) return;
+ u32 ncap = f->vals_cap ? f->vals_cap : 16u;
+ while (ncap < needed) ncap *= 2u;
+ u32* nb_blk = arena_zarray(f->arena, u32, ncap);
+ u32* nb_ins = arena_zarray(f->arena, u32, ncap);
+ const Type** nb_ty = arena_zarray(f->arena, const Type*, ncap);
+ u8* nb_cls = arena_zarray(f->arena, u8, ncap);
+ if (f->nvals) {
+ memcpy(nb_blk, f->val_def_block, sizeof(u32) * f->nvals);
+ memcpy(nb_ins, f->val_def_inst, sizeof(u32) * f->nvals);
+ memcpy(nb_ty, f->val_type, sizeof(const Type*) * f->nvals);
+ memcpy(nb_cls, f->val_cls, sizeof(u8) * f->nvals);
+ }
+ f->val_def_block = nb_blk;
+ f->val_def_inst = nb_ins;
+ f->val_type = nb_ty;
+ f->val_cls = nb_cls;
+ f->vals_cap = ncap;
+}
+
+Val ir_alloc_val(Func* f, const Type* t, u8 cls) {
+ Val v;
+ if (f->nvals == 0) {
+ val_table_grow(f, 16);
+ f->nvals = 1; /* reserve slot 0 for VAL_NONE */
+ }
+ if (f->nvals == f->vals_cap) val_table_grow(f, f->nvals + 1);
+ v = f->nvals++;
+ f->val_def_block[v] = 0;
+ f->val_def_inst[v] = 0;
+ f->val_type[v] = t;
+ f->val_cls[v] = cls;
+ return v;
+}
+
+/* ---- blocks ---- */
+
+u32 ir_block_new(Func* f) {
+ Block* b;
+ if (f->nblocks == f->blocks_cap) {
+ u32 ncap = f->blocks_cap ? f->blocks_cap * 2u : 8u;
+ Block* nb = arena_zarray(f->arena, Block, ncap);
+ if (f->blocks) memcpy(nb, f->blocks, sizeof(Block) * f->nblocks);
+ f->blocks = nb;
+ f->blocks_cap = ncap;
+ }
+ b = &f->blocks[f->nblocks];
+ memset(b, 0, sizeof *b);
+ b->id = f->nblocks;
+ return f->nblocks++;
+}
+
+/* ---- emit order ---- */
+
+void ir_note_emit(Func* f, u32 block) {
+ /* Linear scan: emit_order is small in practice (one entry per
+ * placed block, dozens at most for the corpus) and we only ever
+ * append, so a hash table would be overkill. */
+ for (u32 i = 0; i < f->emit_order_n; ++i)
+ if (f->emit_order[i] == block) return;
+ if (f->emit_order_n == f->emit_order_cap) {
+ u32 ncap = f->emit_order_cap ? f->emit_order_cap * 2u : 8u;
+ u32* nb = arena_array(f->arena, u32, ncap);
+ if (f->emit_order)
+ memcpy(nb, f->emit_order, sizeof(u32) * f->emit_order_n);
+ f->emit_order = nb;
+ f->emit_order_cap = ncap;
+ }
+ f->emit_order[f->emit_order_n++] = block;
+}
+
+/* ---- inst append ---- */
+
+Inst* ir_emit(Func* f, u32 block, IROp op) {
+ Block* b = &f->blocks[block];
+ Inst* in;
+ if (b->ninsts == b->cap) {
+ u32 ncap = b->cap ? b->cap * 2u : 8u;
+ Inst* nb = arena_zarray(f->arena, Inst, ncap);
+ if (b->insts) memcpy(nb, b->insts, sizeof(Inst) * b->ninsts);
+ b->insts = nb;
+ b->cap = ncap;
+ }
+ in = &b->insts[b->ninsts++];
+ memset(in, 0, sizeof *in);
+ in->op = (u16)op;
+ return in;
+}
+
+/* ---- frame slots / params ---- */
+
+FrameSlot ir_frame_slot_new(Func* f, const FrameSlotDesc* d) {
+ IRFrameSlot* s;
+ FrameSlot id;
+ if (f->nframe_slots == f->frame_slots_cap) {
+ u32 ncap = f->frame_slots_cap ? f->frame_slots_cap * 2u : 8u;
+ IRFrameSlot* nb = arena_zarray(f->arena, IRFrameSlot, ncap);
+ if (f->frame_slots)
+ memcpy(nb, f->frame_slots, sizeof(IRFrameSlot) * f->nframe_slots);
+ f->frame_slots = nb;
+ f->frame_slots_cap = ncap;
+ }
+ id = (FrameSlot)(f->nframe_slots + 1);
+ s = &f->frame_slots[f->nframe_slots++];
+ s->id = id;
+ s->type = d->type;
+ s->name = d->name;
+ s->loc = d->loc;
+ s->size = d->size;
+ s->align = d->align;
+ s->kind = d->kind;
+ s->flags = d->flags;
+ return id;
+}
+
+void ir_param_add(Func* f, const CGParamDesc* d) {
+ IRParam* p;
+ if (f->nparams == f->params_cap) {
+ u32 ncap = f->params_cap ? f->params_cap * 2u : 4u;
+ IRParam* nb = arena_zarray(f->arena, IRParam, ncap);
+ if (f->params) memcpy(nb, f->params, sizeof(IRParam) * f->nparams);
+ f->params = nb;
+ f->params_cap = ncap;
+ }
+ p = &f->params[f->nparams++];
+ p->index = d->index;
+ p->name = d->name;
+ p->type = d->type;
+ p->slot = d->slot;
+ p->abi = d->abi;
+ p->loc = d->loc;
+}
+
+/* ---- construction ---- */
+
+Func* ir_func_new(Compiler* c, const CGFuncDesc* desc) {
+ Func* f = arena_znew(c->tu, Func);
+ f->arena = c->tu;
+ f->desc = *desc;
+ f->name = desc->sym;
+ f->type = desc->fn_type;
+ /* Reserve slot 0 of the val table eagerly so the very first ir_alloc_val
+ * returns Val=1. */
+ val_table_grow(f, 16);
+ f->nvals = 1;
+ /* Caller is expected to ir_block_new(f) for entry, then assign
+ * f->entry. */
+ return f;
+}
diff --git a/src/opt/ir.h b/src/opt/ir.h
@@ -6,122 +6,104 @@
#include "core/core.h"
#include "type/type.h"
+/* SSA value id. Identical to the Reg space (doc/OPT.md §5.1): when an
+ * Operand has kind OPK_REG, v.reg names the Val it carries. VAL_NONE=0
+ * is reserved as a sentinel. */
typedef u32 Val;
#define VAL_NONE 0u
+/* IROps mirror the CGTarget surface 1:1 plus a handful of SSA-only ops
+ * (IR_PHI, IR_CONST_I, IR_CONST_BYTES). Each CGTarget method records as
+ * exactly one Inst, so level-1 replay is a flat walk that re-issues
+ * each Inst as one wrapped target call. doc/OPT.md §1.1. */
typedef enum IROp {
IR_NOP,
+ /* SSA-only constants (used by const-propagation; recording uses
+ * load_imm/load_const which become IR_LOAD_IMM/IR_LOAD_CONST). */
IR_CONST_I,
IR_CONST_BYTES,
- IR_PARAM,
- IR_ALLOCA,
- IR_LOAD,
- IR_STORE,
- IR_AGG_COPY,
- IR_AGG_SET,
- IR_BITFIELD_LOAD,
- IR_BITFIELD_STORE,
- IR_IADD,
- IR_ISUB,
- IR_IMUL,
- IR_SDIV,
- IR_UDIV,
- IR_SREM,
- IR_UREM,
- IR_FADD,
- IR_FSUB,
- IR_FMUL,
- IR_FDIV,
- IR_AND,
- IR_OR,
- IR_XOR,
- IR_SHL,
- IR_ASHR,
- IR_LSHR,
- IR_NEG,
- IR_BNOT,
- IR_CMP_EQ,
- IR_CMP_NE,
- IR_CMP_SLT,
- IR_CMP_SLE,
- IR_CMP_ULT,
- IR_CMP_ULE,
- IR_CMP_FLT,
- IR_CMP_FLE,
- IR_CMP_FEQ,
- IR_CMP_FNE,
- IR_SEXT,
- IR_ZEXT,
- IR_TRUNC,
- IR_BITCAST,
- IR_SITOFP,
- IR_UITOFP,
- IR_FPTOSI,
- IR_FPTOUI,
- IR_FPEXT,
- IR_FPTRUNC,
- IR_GEP,
+
+ /* Param/frame declarations recorded from CGTarget.param. The frame
+ * slot id table lives separately on Func; this op records the
+ * sequence point so replay can re-issue target->param in order. */
+ IR_PARAM_DECL,
+
+ /* Address-bearing data movement. */
+ IR_LOAD_IMM, /* opnds[0] dst REG; extra.imm = imm */
+ IR_LOAD_CONST, /* opnds[0] dst REG; extra.cbytes */
+ IR_COPY, /* opnds[0] dst REG, opnds[1] src REG */
+ IR_LOAD, /* opnds[0] dst REG, opnds[1] addr; extra.mem */
+ IR_STORE, /* opnds[0] addr, opnds[1] src REG|IMM; extra.mem */
+ IR_ADDR_OF, /* opnds[0] dst REG, opnds[1] lv (LOCAL|GLOBAL|INDIRECT) */
+ IR_TLS_ADDR_OF, /* opnds[0] dst REG; extra.aux = IRTlsAux */
+ IR_AGG_COPY, /* opnds[0] dst, opnds[1] src; extra.aux = IRAggAux */
+ IR_AGG_SET, /* opnds[0] dst, opnds[1] byte; extra.aux = IRAggAux */
+ IR_BITFIELD_LOAD, /* opnds[0] dst REG, opnds[1] record; extra.aux */
+ IR_BITFIELD_STORE, /* opnds[0] record, opnds[1] src; extra.aux */
+
+ /* Arithmetic / cmp / convert. opnds[0] dst REG, [1] a, optionally [2] b.
+ * extra.imm carries the BinOp/UnOp/CmpOp/ConvKind tag. */
+ IR_BINOP,
+ IR_UNOP,
+ IR_CMP,
+ IR_CONVERT,
+
+ /* Calls. extra.aux = IRCallAux (see below). defs = result Vals. */
IR_CALL,
+
+ /* Phis. extra.aux = IRPhiAux. */
IR_PHI,
- IR_BR,
- IR_CONDBR,
- IR_RET,
- IR_ATOMIC_LOAD,
- IR_ATOMIC_STORE,
- IR_ATOMIC_RMW, /* extra.imm encodes (AtomicOp << 8) | MemOrder */
- IR_ATOMIC_CAS, /* extra.imm encodes (success << 8) | failure */
- IR_FENCE, /* extra.imm = MemOrder */
- IR_VA_START,
- IR_VA_ARG,
- IR_VA_END,
- IR_VA_COPY,
- IR_SETJMP, /* returns-twice; opt treats as control barrier */
- IR_LONGJMP, /* terminator-like; control does not return */
- IR_ASM_BLOCK, /* opaque to most passes; preserves order, defines outs,
- clobbers */
- IR_INTRINSIC, /* extra.imm = IntrinKind; multi-result for *_OVERFLOW */
-} IROp;
-typedef struct IRCallAux {
- const Type* fn_type;
- const ABIFuncInfo* abi;
- ObjSymId direct_sym; /* OBJ_SYM_NONE for indirect */
- Val callee; /* VAL_NONE for direct_sym calls */
- u32 nargs;
- Val* args;
- u32 nresults;
- Val* results; /* ABI return parts and multi-result builtins */
- CGABIValue ret_abi;
-} IRCallAux;
+ /* Control flow / scopes. */
+ IR_BR, /* unconditional. block.succ[0] = target block id. */
+ IR_CONDBR, /* opnds[0] cond REG; succ[0] = true, succ[1] = false. */
+ IR_CMP_BRANCH, /* fused. opnds = [a, b]; extra.imm = CmpOp;
+ succ[0] = taken, succ[1] = fallthrough. */
+ IR_RET, /* extra.aux = IRRetAux* (NULL for void). */
+ IR_SCOPE_BEGIN, /* extra.aux = IRScopeAux. defs[0] = scope id Val. */
+ IR_SCOPE_ELSE, /* extra.imm = scope id (Val). */
+ IR_SCOPE_END, /* extra.imm = scope id (Val). */
+ IR_BREAK_TO, /* extra.imm = scope id (Val). */
+ IR_CONTINUE_TO, /* extra.imm = scope id (Val). */
-typedef struct IRFrameSlot {
- FrameSlot id;
- const Type* type;
- Sym name;
- SrcLoc loc;
- u32 size;
- u32 align;
- u8 kind; /* FrameSlotKind */
- u8 pad;
- u16 flags; /* FrameSlotFlag */
-} IRFrameSlot;
+ /* alloca / variadics. */
+ IR_ALLOCA, /* opnds = [dst REG, size]; extra.imm = align */
+ IR_VA_START, /* opnds = [ap] */
+ IR_VA_ARG, /* opnds = [dst REG, ap]; extra.aux = const Type* */
+ IR_VA_END, /* opnds = [ap] */
+ IR_VA_COPY, /* opnds = [dst, src] */
-typedef struct IRParam {
- u32 index;
- Sym name;
- const Type* type;
- FrameSlot slot;
- const ABIArgInfo* abi;
- SrcLoc loc;
-} IRParam;
+ /* setjmp/longjmp. */
+ IR_SETJMP, /* opnds = [dst REG, buf] */
+ IR_LONGJMP, /* opnds = [buf, val]; (terminator-like, no fallthrough) */
-typedef struct IRMemAux {
- MemAccess mem;
-} IRMemAux;
+ /* Atomics. */
+ IR_ATOMIC_LOAD, /* opnds = [dst, addr]; extra.aux = IRAtomicAux */
+ IR_ATOMIC_STORE, /* opnds = [addr, src]; extra.aux = IRAtomicAux */
+ IR_ATOMIC_RMW, /* opnds = [dst, addr, val]; extra.aux */
+ IR_ATOMIC_CAS, /* defs = [prior, ok]; extra.aux = IRCasAux */
+ IR_FENCE, /* extra.imm = MemOrder */
+
+ /* Inline asm. extra.aux = IRAsmAux. */
+ IR_ASM_BLOCK,
-typedef struct IRAggregateAux {
+ /* Compiler intrinsics (see arch.h IntrinKind). extra.aux = IRIntrinAux. */
+ IR_INTRINSIC,
+
+ /* set_loc is *not* an IR op — SrcLoc is sticky on Inst.loc, applied at
+ * recording time from the wrapper's pending_loc. */
+} IROp;
+
+/* ---- per-op aux structs ---- */
+
+typedef struct IRTlsAux {
+ ObjSymId sym;
+ i64 addend;
+} IRTlsAux;
+
+typedef struct IRAggAux {
AggregateAccess access;
-} IRAggregateAux;
+} IRAggAux;
typedef struct IRBitFieldAux {
BitFieldAccess access;
@@ -137,42 +119,119 @@ typedef struct IRPhiAux {
u32 npreds;
u32* pred_blocks;
Val* pred_vals;
+ u32 slot_id; /* 0 if not from mem2reg; else 1-based FrameSlot id */
} IRPhiAux;
-typedef struct IRAsmAux {
- const char* tmpl;
- const AsmConstraint* outs;
- const AsmConstraint* ins;
- const Sym* clobbers;
- u32 nout, nin, nclob;
-} IRAsmAux;
+/* IR_CALL aux. The CGTarget interface is rich enough that we keep the
+ * full descriptor for replay; SSA passes inspect args/results in their
+ * Val form via the CGABIValue.storage.v.reg fields where applicable. */
+typedef struct IRCallAux {
+ CGCallDesc desc;
+ /* Result Vals (one per ABI-decomposed return part). 0 for void. */
+ u32 nresults;
+ Val* results;
+ /* For SSA-aware passes: the call may have args that are REG operands
+ * (carrying Val uses). They live in desc.args[i].storage.v.reg or
+ * desc.args[i].parts[k].op.v.reg. */
+} IRCallAux;
+
+typedef struct IRRetAux {
+ u8 present; /* 0 → void return; ret(NULL) at replay */
+ CGABIValue val;
+} IRRetAux;
+
+typedef struct IRScopeAux {
+ CGScopeDesc desc;
+ u32 scope_id; /* 1-based; the CGScope handed back to the caller */
+ /* For SCOPE_IF: blocks for then-arm, else-arm, and join after end.
+ * For SCOPE_LOOP/SCOPE_BLOCK: the caller-supplied break/continue
+ * labels translated to block ids; if the desc passed LABEL_NONE we
+ * leave 0 (unused — break_to/continue_to is illegal in that case). */
+ u32 if_then_block;
+ u32 if_else_block;
+ u32 if_end_block;
+ u32 loop_break_block;
+ u32 loop_continue_block;
+ u8 if_has_else;
+} IRScopeAux;
+
+typedef struct IRAtomicAux {
+ MemAccess mem;
+ MemOrder mo;
+ u8 op; /* AtomicOp; valid for IR_ATOMIC_RMW */
+} IRAtomicAux;
typedef struct IRCasAux {
MemAccess mem;
MemOrder success;
MemOrder failure;
- Val prior;
- Val ok;
} IRCasAux;
+typedef struct IRAsmAux {
+ const char* tmpl;
+ AsmConstraint* outs;
+ AsmConstraint* ins;
+ Sym* clobbers;
+ Operand* out_ops; /* nout slots; the wrapped target may fill in REG location */
+ u32 nout, nin, nclob;
+} IRAsmAux;
+
+typedef struct IRIntrinAux {
+ IntrinKind kind;
+ Operand* dsts; /* ndst */
+ Operand* args; /* narg */
+ Val* result_vals; /* one per dst that's a REG, parallel to dsts */
+ u32 ndst, narg;
+} IRIntrinAux;
+
+typedef struct IRParamDeclAux {
+ CGParamDesc desc;
+} IRParamDeclAux;
+
+/* ---- frame slots / params ---- */
+
+typedef struct IRFrameSlot {
+ FrameSlot id;
+ const Type* type;
+ Sym name;
+ SrcLoc loc;
+ u32 size;
+ u32 align;
+ u8 kind; /* FrameSlotKind */
+ u8 pad;
+ u16 flags; /* FrameSlotFlag */
+} IRFrameSlot;
+
+typedef struct IRParam {
+ u32 index;
+ Sym name;
+ const Type* type;
+ FrameSlot slot;
+ const ABIArgInfo* abi;
+ SrcLoc loc;
+} IRParam;
+
+/* ---- Inst / Block / Func ---- */
+
typedef struct Inst {
u16 op;
- u16 flags; /* per-op flags (e.g. nsw/nuw, volatile) */
- SrcLoc loc; /* set from CGTarget.set_loc when this insn was recorded */
+ u16 flags;
+ SrcLoc loc; /* sticky from CGTarget.set_loc at recording */
const Type* type;
- Val def; /* this instruction's SSA value, or VAL_NONE */
- u32 ndefs; /* multi-result instructions use defs[0..ndefs) */
- Val* defs; /* arena-allocated; NULL when ndefs <= 1 */
+ Val def; /* primary SSA def, or VAL_NONE */
+ u32 ndefs; /* multi-result */
+ Val* defs;
+ /* Operands. We use Operand instead of Val so that the original
+ * CGTarget call shape (IMM / LOCAL / GLOBAL / INDIRECT in addition to
+ * REG) round-trips at replay. SSA passes treat OPK_REG operands as
+ * Val uses (via .v.reg). */
u32 nopnds;
- Val* opnds; /* arena-allocated */
+ Operand* opnds;
union {
i64 imm;
ConstBytes cbytes;
- struct {
- ObjSymId sym;
- } objsym;
MemAccess mem;
- void* aux; /* one of IR*Aux, arena-owned and typed by op */
+ void* aux;
} extra;
} Inst;
@@ -182,44 +241,66 @@ typedef struct Block {
u32 ninsts, cap;
u32* preds;
u32 npreds;
- u32 succ[2]; /* condbr: 2; br: 1; ret: 0 */
+ u32 succ[2];
u8 nsucc;
} Block;
typedef struct Func {
- /* IR storage. Lives until cgtarget_finalize so inter-procedural passes can
- * read every Func in the TU. Per-pass scratch goes in Arena scratch, not
- * here. */
Arena* arena;
- ObjSymId name;
+ CGFuncDesc desc; /* preserved for level-1 replay func_begin */
+ ObjSymId name; /* alias for desc.sym (kept for older callers) */
const Type* type;
Block* blocks;
u32 nblocks, blocks_cap;
- u32 entry; /* index of entry block */
+ u32 entry;
IRFrameSlot* frame_slots;
u32 nframe_slots, frame_slots_cap;
IRParam* params;
u32 nparams, params_cap;
- /* Value table: for each Val, where it's defined and its type. */
+ /* Value table. Index 0 is VAL_NONE; first allocated Val is 1. */
u32* val_def_block;
u32* val_def_inst;
const Type** val_type;
+ u8* val_cls; /* RegClass per Val, used by replay to reconstruct REG operands */
u32 nvals, vals_cap;
+
+ /* Scope id table. Indexed by scope_id (1-based). Values map to
+ * IRScopeAux entries (via the IR_SCOPE_BEGIN inst). Stored as a flat
+ * pointer table for O(1) lookup during scope_else/end/break/continue
+ * recording and replay. */
+ Inst** scope_aux_inst;
+ u32 nscopes, scopes_cap;
+
+ /* Emit order: the sequence in which blocks first became `cur` during
+ * recording. This is the natural order CG's emit cursor visited each
+ * block, so replay must follow it (block-creation order can differ —
+ * e.g. label_new(L) precedes cmp_branch but the cmp_branch's
+ * fallthrough block is what physically follows the cmp_branch in
+ * code). Blocks not present here are unreachable / unplaced; replay
+ * skips them. */
+ u32* emit_order;
+ u32 emit_order_n, emit_order_cap;
} Func;
-Func* ir_func_new(Arena*, ObjSymId, const Type* fn_type);
+/* ---- API ---- */
+
+Func* ir_func_new(Compiler*, const CGFuncDesc*);
+
u32 ir_block_new(Func*);
-Val ir_emit(Func*, u32 block, IROp, const Type* result, const Val* opnds,
- u32 n);
-void ir_emit_multi(Func*, u32 block, IROp, const Type** results, Val* defs,
- u32 ndefs, const Val* opnds, u32 nopnds);
-Val ir_emit_const_i(Func*, u32 block, const Type*, i64);
-Val ir_emit_const_bytes(Func*, u32 block, ConstBytes);
FrameSlot ir_frame_slot_new(Func*, const FrameSlotDesc*);
void ir_param_add(Func*, const CGParamDesc*);
-void ir_set_terminator(Func*, u32 block, IROp, u32 succ_a, u32 succ_b,
- Val cond);
+
+Val ir_alloc_val(Func*, const Type*, u8 cls);
+
+Inst* ir_emit(Func*, u32 block, IROp);
+
+/* Append `block` to f->emit_order if not already present. Called by
+ * the wrapper whenever cur transitions to a block. */
+void ir_note_emit(Func*, u32 block);
+/* Append Inst to block; caller fills op-specific fields. The Inst is
+ * arena-resident; its address is stable until the Block.insts array is
+ * reallocated by phi insertion (which fixes up references). */
#endif
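
The emit-order rule described in the Func comment above (blocks are replayed in the order they first became `cur`, not in creation order) can be sketched in isolation. This is a minimal illustration under stated assumptions, not the commit's ir_note_emit: the EmitOrder struct and the note_emit name are hypothetical, the duplicate check is a plain linear scan, and the array grows with realloc where the real code presumably allocates from the Func arena.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32;

/* Hypothetical stand-in for the emit_order fields of Func. */
typedef struct {
  u32* emit_order;
  u32 emit_order_n, emit_order_cap;
} EmitOrder;

/* Append `block` to the emit order unless it is already present.
 * Assumption: a linear scan is acceptable for a sketch; a real
 * implementation might keep a per-block "placed" flag instead. */
static void note_emit(EmitOrder* eo, u32 block) {
  assert(eo != NULL);
  for (u32 i = 0; i < eo->emit_order_n; ++i)
    if (eo->emit_order[i] == block) return;
  if (eo->emit_order_n == eo->emit_order_cap) {
    u32 ncap = eo->emit_order_cap ? eo->emit_order_cap * 2u : 8u;
    u32* nb = realloc(eo->emit_order, sizeof(u32) * ncap);
    if (!nb) abort();
    eo->emit_order = nb;
    eo->emit_order_cap = ncap;
  }
  eo->emit_order[eo->emit_order_n++] = block;
}
```

With this sketch, the scenario from the comment plays out as follows: the entry block 0 becomes cur first, label_new creates block 1 without visiting it, cmp_branch creates fallthrough block 2 and makes it cur, and a later label_place makes block 1 cur. Calling note_emit with 0, 2, 1 in that sequence leaves the order as 0, 2, 1, so the fallthrough block physically follows the cmp_branch even though block 1 was created earlier.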
diff --git a/src/opt/opt.c b/src/opt/opt.c
@@ -1,22 +1,13 @@
-/* opt — CGTarget wrapper that records each function as a tape of
- * CGTarget calls, then replays them onto the wrapped target on
- * func_end. See doc/OPT.md for the phased plan.
+/* opt.c — CGTarget wrapper that records each function as IR (doc/OPT.md
+ * §1). Each CGTarget call lands as exactly one Inst in the current
+ * Func's current block; the wrapper's vreg/vlabel/vslot/vscope ids
+ * coincide with their IR counterparts (Reg ↔ Val, label ↔ block id,
+ * vslot ↔ IR FrameSlot, vscope ↔ scope_aux index — doc/OPT.md §5.1).
*
- * Phase 1 (current): record every emit-side call into a per-function
- * tape; alloc_reg / frame_slot / label_new / scope_begin hand out
- * wrapper-local virtual ids. On func_end the tape is replayed
- * linearly: each entry produces exactly one wrapped target call,
- * with virtual ids translated to target-side ids on the fly. This
- * preserves doc/DESIGN.md §8's "function-at-a-time" streaming
- * guarantee at -O1.
- *
- * Phase 2 (current): a small, safe peephole pass runs over the tape
- * between recording and replay. See try_peephole_constfold.
- *
- * Phase 3+ (deferred): build CFG and SSA from the tape, run
- * intra-procedural passes, lower through machinize → regalloc →
- * emit. Until that lands, level 2 is functionally identical to
- * level 1 (per-function record + replay).
+ * Phase 3 (current): on func_end the wrapper optionally runs
+ * opt_build_cfg + opt_build_ssa (level 2; output discarded), then
+ * replays each Func into the wrapped target. Level 1 skips the SSA
+ * passes; in both cases replay is the level-1 1:1 walk.
*
* Methods the wrapper rejects under unbounded virtuals:
* - clobbers / spill_reg / reload_reg are CG -O0 register-pressure
@@ -31,346 +22,21 @@
#include "core/arena.h"
#include "core/core.h"
-
-/* ---- tape op tags ---- */
-
-typedef enum {
- TOP_FUNC_BEGIN,
- TOP_FUNC_END,
- TOP_ALLOC_REG,
- TOP_FRAME_SLOT,
- TOP_PARAM,
- TOP_LABEL_NEW,
- TOP_LABEL_PLACE,
- TOP_JUMP,
- TOP_CMP_BRANCH,
- TOP_SCOPE_BEGIN,
- TOP_SCOPE_ELSE,
- TOP_SCOPE_END,
- TOP_BREAK_TO,
- TOP_CONTINUE_TO,
- TOP_LOAD_IMM,
- TOP_LOAD_CONST,
- TOP_COPY,
- TOP_LOAD,
- TOP_STORE,
- TOP_ADDR_OF,
- TOP_TLS_ADDR_OF,
- TOP_COPY_BYTES,
- TOP_SET_BYTES,
- TOP_BITFIELD_LOAD,
- TOP_BITFIELD_STORE,
- TOP_BINOP,
- TOP_UNOP,
- TOP_CMP,
- TOP_CONVERT,
- TOP_CALL,
- TOP_RET,
- TOP_ALLOCA,
- TOP_VA_START,
- TOP_VA_ARG,
- TOP_VA_END,
- TOP_VA_COPY,
- TOP_SETJMP,
- TOP_LONGJMP,
- TOP_ATOMIC_LOAD,
- TOP_ATOMIC_STORE,
- TOP_ATOMIC_RMW,
- TOP_ATOMIC_CAS,
- TOP_FENCE,
- TOP_INTRINSIC,
- TOP_SET_LOC,
-} TapeOpKind;
-
-/* TapeEntry: one recorded CGTarget call. The tagged union is wide; we
- * pay arena bytes for clarity. */
-typedef struct TapeEntry {
- u8 op; /* TapeOpKind */
- u8 dead; /* set by peepholes; replay skips dead entries */
- u16 padding;
- SrcLoc loc;
- union {
- /* WOP_FUNC_BEGIN: deep-copied descriptor. The caller's CGFuncDesc
- * may be stack-allocated, so we copy by value into our arena.
- * params[] is also copied; field shapes inside (Type*, ABIArgInfo*,
- * incoming pointer) are TU-lifetime and shared. */
- struct {
- CGFuncDesc desc;
- CGParamDesc* params; /* arena copy of fd.params */
- } func_begin;
-
- /* WOP_ALLOC_REG: returns a vreg, indexed into reg_map at replay. */
- struct {
- RegClass cls;
- const Type* ty;
- Reg vreg;
- } alloc_reg;
-
- /* WOP_FRAME_SLOT */
- struct {
- FrameSlotDesc desc;
- FrameSlot vslot;
- } frame_slot;
-
- /* WOP_PARAM */
- struct {
- CGParamDesc desc;
- } param;
-
- /* WOP_LABEL_NEW */
- struct {
- Label vlabel;
- } label_new;
-
- /* WOP_LABEL_PLACE / WOP_JUMP */
- struct {
- Label vlabel;
- } label_op;
-
- /* WOP_CMP_BRANCH */
- struct {
- CmpOp op;
- Operand a, b;
- Label vlabel;
- } cmp_branch;
-
- /* WOP_SCOPE_BEGIN */
- struct {
- CGScopeDesc desc;
- CGScope vscope;
- } scope_begin;
-
- /* WOP_SCOPE_ELSE / WOP_SCOPE_END / WOP_BREAK_TO / WOP_CONTINUE_TO */
- struct {
- CGScope vscope;
- } scope_op;
-
- /* WOP_LOAD_IMM */
- struct {
- Operand dst;
- i64 imm;
- } load_imm;
-
- /* WOP_LOAD_CONST */
- struct {
- Operand dst;
- ConstBytes cb;
- } load_const;
-
- /* WOP_COPY / WOP_ADDR_OF / WOP_VA_COPY */
- struct {
- Operand dst;
- Operand src;
- } copy;
-
- /* WOP_LOAD */
- struct {
- Operand dst;
- Operand addr;
- MemAccess mem;
- } load;
-
- /* WOP_STORE */
- struct {
- Operand addr;
- Operand src;
- MemAccess mem;
- } store;
-
- /* WOP_TLS_ADDR_OF */
- struct {
- Operand dst;
- ObjSymId sym;
- i64 addend;
- } tls_addr_of;
-
- /* WOP_COPY_BYTES / WOP_SET_BYTES */
- struct {
- Operand a;
- Operand b;
- AggregateAccess agg;
- } agg;
-
- /* WOP_BITFIELD_LOAD */
- struct {
- Operand dst;
- Operand record;
- BitFieldAccess bf;
- } bitfield_load;
-
- /* WOP_BITFIELD_STORE */
- struct {
- Operand record;
- Operand src;
- BitFieldAccess bf;
- } bitfield_store;
-
- /* WOP_BINOP */
- struct {
- BinOp op;
- Operand dst, a, b;
- } binop;
-
- /* WOP_UNOP */
- struct {
- UnOp op;
- Operand dst, a;
- } unop;
-
- /* WOP_CMP */
- struct {
- CmpOp op;
- Operand dst, a, b;
- } cmp;
-
- /* WOP_CONVERT */
- struct {
- ConvKind kind;
- Operand dst, src;
- } convert;
-
- /* WOP_CALL: deep-copied descriptor and inner arrays. */
- struct {
- CGCallDesc desc;
- CGABIValue* args; /* len = desc.nargs */
- CGABIPart* ret_parts; /* len = desc.ret.nparts; NULL if 0 */
- CGABIPart** arg_parts; /* per-arg parts arrays; entry is NULL if 0 */
- } call;
-
- /* WOP_RET: present == 1 means there is a CGABIValue; otherwise a
- * void return. parts is deep-copied. */
- struct {
- u8 present;
- CGABIValue val;
- CGABIPart* parts; /* len = val.nparts */
- } ret;
-
- /* WOP_ALLOCA */
- struct {
- Operand dst;
- Operand size;
- u32 align;
- } alloca_;
-
- /* WOP_VA_START / WOP_VA_END */
- struct {
- Operand ap;
- } va_se;
-
- /* WOP_VA_ARG */
- struct {
- Operand dst;
- Operand ap;
- const Type* ty;
- } va_arg_;
-
- /* WOP_SETJMP */
- struct {
- Operand dst;
- Operand buf;
- } setjmp_;
-
- /* WOP_LONGJMP */
- struct {
- Operand buf;
- Operand val;
- } longjmp_;
-
- /* WOP_ATOMIC_LOAD */
- struct {
- Operand dst;
- Operand addr;
- MemAccess mem;
- MemOrder mo;
- } atomic_load;
-
- /* WOP_ATOMIC_STORE */
- struct {
- Operand addr;
- Operand src;
- MemAccess mem;
- MemOrder mo;
- } atomic_store;
-
- /* WOP_ATOMIC_RMW */
- struct {
- AtomicOp op;
- Operand dst;
- Operand addr;
- Operand val;
- MemAccess mem;
- MemOrder mo;
- } atomic_rmw;
-
- /* WOP_ATOMIC_CAS */
- struct {
- Operand prior;
- Operand ok;
- Operand addr;
- Operand expected;
- Operand desired;
- MemAccess mem;
- MemOrder success;
- MemOrder failure;
- } atomic_cas;
-
- /* WOP_FENCE */
- struct {
- MemOrder mo;
- } fence;
-
- /* WOP_INTRINSIC */
- struct {
- IntrinKind kind;
- Operand* dsts; /* deep-copied */
- u32 ndst;
- Operand* args; /* deep-copied */
- u32 narg;
- } intrinsic;
-
- /* WOP_SET_LOC */
- struct {
- SrcLoc loc;
- } set_loc;
- } u;
-} TapeEntry;
+#include "opt/ir.h"
/* ---- wrapper state ---- */
typedef struct OptImpl {
CGTarget base;
- CGTarget* target; /* wrapped */
+ CGTarget* target;
int level;
Compiler* c;
- /* Tape: per-function, reset on func_begin. Allocated from c->tu so
- * the buffer survives panic via compiler_defer cleanups. */
- TapeEntry* tape;
- u32 ntape, tape_cap;
-
- /* Wrapper-local virtual id counters. 1-based; 0 reserved as NONE.
- * Reset on each func_begin. */
- Reg next_vreg;
- FrameSlot next_vslot;
- Label next_vlabel;
- CGScope next_vscope;
-
- /* Replay-time translation tables. Index by virtual id; entry 0 is
- * the NONE sentinel and never referenced. Allocated lazily on first
- * replay so peak size matches the largest function. */
- Reg* reg_map;
- u32 reg_map_cap;
- FrameSlot* slot_map;
- u32 slot_map_cap;
- Label* label_map;
- u32 label_map_cap;
- CGScope* scope_map;
- u32 scope_map_cap;
-
- SrcLoc pending_loc; /* most recent set_loc; stamped onto each entry */
+ /* Current function being recorded. NULL between functions. */
+ Func* f;
+ u32 cur; /* current block id */
+ SrcLoc pending_loc; /* most recent set_loc; stamped on each Inst */
- /* If non-NULL, dump the tape to this writer on each func_end (before
- * replay). Used by cg-runner --dump-tape and ad-hoc debugging. */
Writer* dump_writer;
} OptImpl;
@@ -378,179 +44,94 @@ static OptImpl* impl_of(CGTarget* t) { return (OptImpl*)t; }
static _Noreturn void panic_unsupported(OptImpl* o, const char* what) {
SrcLoc loc = {0, 0, 0};
- compiler_panic(o->c, loc, "opt_cgtarget: %s called under unbounded virtuals",
- what);
+ compiler_panic(o->c, loc,
+ "opt_cgtarget: %s called under unbounded virtuals", what);
}
-/* ---- tape append ---- */
+/* ---- recording helpers ---- */
-static TapeEntry* tape_append(OptImpl* o, TapeOpKind op) {
- TapeEntry* e;
- if (o->ntape == o->tape_cap) {
- u32 ncap = o->tape_cap ? o->tape_cap * 2u : 64u;
- TapeEntry* nb = arena_array(o->c->tu, TapeEntry, ncap);
- if (o->tape) memcpy(nb, o->tape, sizeof(TapeEntry) * o->ntape);
- o->tape = nb;
- o->tape_cap = ncap;
- }
- e = &o->tape[o->ntape++];
- memset(e, 0, sizeof *e);
- e->op = (u8)op;
- e->loc = o->pending_loc;
- return e;
+static Inst* rec(OptImpl* o, IROp op) {
+ Inst* in = ir_emit(o->f, o->cur, op);
+ in->loc = o->pending_loc;
+ return in;
}
-/* ---- deep-copy helpers ---- */
+static void set_def(Func* f, Inst* in, u32 block, Val v, const Type* t) {
+ in->def = v;
+ in->type = t;
+ if (v != VAL_NONE && v < f->nvals) {
+ f->val_def_block[v] = block;
+ f->val_def_inst[v] = f->blocks[block].ninsts - 1u;
+ }
+}
-static CGParamDesc* copy_params(Compiler* c, const CGParamDesc* src, u32 n) {
- CGParamDesc* dst;
+static Operand* dup_opnds(Func* f, const Operand* src, u32 n) {
if (!n) return NULL;
- dst = arena_array(c->tu, CGParamDesc, n);
- memcpy(dst, src, sizeof(CGParamDesc) * n);
+ Operand* dst = arena_array(f->arena, Operand, n);
+ memcpy(dst, src, sizeof(Operand) * n);
return dst;
}
-static CGABIPart* copy_parts(Compiler* c, const CGABIPart* src, u32 n) {
- CGABIPart* dst;
- if (!n) return NULL;
- dst = arena_array(c->tu, CGABIPart, n);
- memcpy(dst, src, sizeof(CGABIPart) * n);
- return dst;
+static int cur_terminated(OptImpl* o) {
+ Block* b = &o->f->blocks[o->cur];
+ if (b->nsucc > 0) return 1;
+ if (b->ninsts == 0) return 0;
+ IROp last = (IROp)b->insts[b->ninsts - 1].op;
+ return last == IR_RET || last == IR_LONGJMP;
}
-static Operand* copy_operands(Compiler* c, const Operand* src, u32 n) {
- Operand* dst;
- if (!n) return NULL;
- dst = arena_array(c->tu, Operand, n);
- memcpy(dst, src, sizeof(Operand) * n);
- return dst;
+static void set_cur(OptImpl* o, u32 b) {
+ o->cur = b;
+ ir_note_emit(o->f, b);
}
-/* ---- map helpers (replay-time) ----
- * The maps are direct-indexed by the 1-based virtual id; entry 0 is
- * the NONE sentinel. */
-
-static void map_reg_grow(OptImpl* o, u32 needed) {
- u32 ncap;
- Reg* nb;
- if (needed <= o->reg_map_cap) return;
- ncap = o->reg_map_cap ? o->reg_map_cap : 16u;
- while (ncap < needed) ncap *= 2u;
- nb = arena_array(o->c->tu, Reg, ncap);
- if (o->reg_map) memcpy(nb, o->reg_map, sizeof(Reg) * o->reg_map_cap);
- /* New slots default to REG_NONE (0xffffffff). */
- for (u32 i = o->reg_map_cap; i < ncap; ++i) nb[i] = REG_NONE;
- o->reg_map = nb;
- o->reg_map_cap = ncap;
-}
-
-static void map_slot_grow(OptImpl* o, u32 needed) {
- u32 ncap;
- FrameSlot* nb;
- if (needed <= o->slot_map_cap) return;
- ncap = o->slot_map_cap ? o->slot_map_cap : 16u;
- while (ncap < needed) ncap *= 2u;
- nb = arena_array(o->c->tu, FrameSlot, ncap);
- if (o->slot_map) memcpy(nb, o->slot_map, sizeof(FrameSlot) * o->slot_map_cap);
- for (u32 i = o->slot_map_cap; i < ncap; ++i) nb[i] = FRAME_SLOT_NONE;
- o->slot_map = nb;
- o->slot_map_cap = ncap;
-}
-
-static void map_label_grow(OptImpl* o, u32 needed) {
- u32 ncap;
- Label* nb;
- if (needed <= o->label_map_cap) return;
- ncap = o->label_map_cap ? o->label_map_cap : 16u;
- while (ncap < needed) ncap *= 2u;
- nb = arena_array(o->c->tu, Label, ncap);
- if (o->label_map) memcpy(nb, o->label_map, sizeof(Label) * o->label_map_cap);
- for (u32 i = o->label_map_cap; i < ncap; ++i) nb[i] = LABEL_NONE;
- o->label_map = nb;
- o->label_map_cap = ncap;
-}
-
-static void map_scope_grow(OptImpl* o, u32 needed) {
- u32 ncap;
- CGScope* nb;
- if (needed <= o->scope_map_cap) return;
- ncap = o->scope_map_cap ? o->scope_map_cap : 8u;
- while (ncap < needed) ncap *= 2u;
- nb = arena_array(o->c->tu, CGScope, ncap);
- if (o->scope_map) memcpy(nb, o->scope_map, sizeof(CGScope) * o->scope_map_cap);
- for (u32 i = o->scope_map_cap; i < ncap; ++i) nb[i] = CG_SCOPE_NONE;
- o->scope_map = nb;
- o->scope_map_cap = ncap;
-}
-
-/* ---- recording: every emit-side method records a tape entry.
- *
- * Allocator methods (alloc_reg, frame_slot, label_new, scope_begin)
- * additionally hand back a wrapper-local virtual id; the underlying
- * target is not consulted until replay. */
+/* After emitting a terminator, allocate a fresh block for any
+ * subsequent (likely unreachable) recording. */
+static void after_terminator(OptImpl* o) {
+ set_cur(o, ir_block_new(o->f));
+}
+
+/* ---- function lifecycle ---- */
static void w_func_begin(CGTarget* t, const CGFuncDesc* fd) {
OptImpl* o = impl_of(t);
- TapeEntry* e;
-
- /* Reset per-function state. */
- o->tape = NULL;
- o->ntape = 0;
- o->tape_cap = 0;
- o->next_vreg = 1;
- o->next_vslot = 1;
- o->next_vlabel = 1;
- o->next_vscope = 1;
+ o->f = ir_func_new(o->c, fd);
+ u32 entry = ir_block_new(o->f);
+ o->f->entry = entry;
+ set_cur(o, entry);
o->pending_loc = (SrcLoc){0, 0, 0};
- /* Reset translation maps; capacities are kept for amortization. */
- for (u32 i = 0; i < o->reg_map_cap; ++i) o->reg_map[i] = REG_NONE;
- for (u32 i = 0; i < o->slot_map_cap; ++i) o->slot_map[i] = FRAME_SLOT_NONE;
- for (u32 i = 0; i < o->label_map_cap; ++i) o->label_map[i] = LABEL_NONE;
- for (u32 i = 0; i < o->scope_map_cap; ++i) o->scope_map[i] = CG_SCOPE_NONE;
-
- e = tape_append(o, TOP_FUNC_BEGIN);
- /* Shallow-copy the descriptor by value, then deep-copy the params
- * array — the harness mutates pds[i].slot AFTER func_begin returns,
- * so we can't rely on pointer-shallow-copy for that field. The slots
- * we record here are wrapper vslots (allocated by w_frame_slot in the
- * subsequent param-setup loop); replay translates them. */
- e->u.func_begin.desc = *fd;
- e->u.func_begin.params = copy_params(o->c, fd->params, fd->nparams);
- e->u.func_begin.desc.params = e->u.func_begin.params;
}
static void w_func_end(CGTarget* t);
+/* ---- registers and frame slots ---- */
+
static Reg w_alloc_reg(CGTarget* t, RegClass cls, const Type* ty) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_ALLOC_REG);
- Reg vreg = o->next_vreg++;
- e->u.alloc_reg.cls = cls;
- e->u.alloc_reg.ty = ty;
- e->u.alloc_reg.vreg = vreg;
- return vreg;
+ Val v = ir_alloc_val(o->f, ty, (u8)cls);
+ return (Reg)v;
}
static void w_free_reg(CGTarget* t, Reg r) {
- /* Hint; opt_cgtarget ignores. The wrapper's vregs are unbounded —
- * there is no pool to return to. */
(void)t;
(void)r;
}
static FrameSlot w_frame_slot(CGTarget* t, const FrameSlotDesc* d) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_FRAME_SLOT);
- FrameSlot vslot = o->next_vslot++;
- e->u.frame_slot.desc = *d;
- e->u.frame_slot.vslot = vslot;
- return vslot;
+ return ir_frame_slot_new(o->f, d);
}
static void w_param(CGTarget* t, const CGParamDesc* d) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_PARAM);
- e->u.param.desc = *d;
+ /* Deep-copy parts so caller-stack memory isn't relied on. */
+ CGParamDesc copy = *d;
+ if (d->nincoming) {
+ CGABIPart* parts = arena_array(o->f->arena, CGABIPart, d->nincoming);
+ memcpy(parts, d->incoming, sizeof(CGABIPart) * d->nincoming);
+ copy.incoming = parts;
+ }
+ ir_param_add(o->f, &copy);
}
static const Reg* w_clobbers(CGTarget* t, RegClass cls, u32* nregs) {
@@ -571,317 +152,541 @@ static void w_reload_reg(CGTarget* t, Operand dst, FrameSlot s, MemAccess m) {
panic_unsupported(impl_of(t), "reload_reg");
}
+/* ---- labels and control flow ---- */
+
static Label w_label_new(CGTarget* t) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_LABEL_NEW);
- Label v = o->next_vlabel++;
- e->u.label_new.vlabel = v;
- return v;
+ return (Label)ir_block_new(o->f);
}
static void w_label_place(CGTarget* t, Label l) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_LABEL_PLACE);
- e->u.label_op.vlabel = l;
+ u32 target_blk = (u32)l;
+ if (target_blk >= o->f->nblocks) {
+ SrcLoc loc = {0, 0, 0};
+ compiler_panic(o->c, loc, "opt: label_place(%u) out of range",
+ (unsigned)l);
+ }
+ if (!cur_terminated(o)) {
+ Block* cb = &o->f->blocks[o->cur];
+ rec(o, IR_BR);
+ cb->succ[0] = target_blk;
+ cb->nsucc = 1;
+ }
+ set_cur(o, target_blk);
}
+
static void w_jump(CGTarget* t, Label l) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_JUMP);
- e->u.label_op.vlabel = l;
+ u32 target_blk = (u32)l;
+ Block* cb = &o->f->blocks[o->cur];
+ rec(o, IR_BR);
+ cb->succ[0] = target_blk;
+ cb->nsucc = 1;
+ after_terminator(o);
}
+
static void w_cmp_branch(CGTarget* t, CmpOp op, Operand a, Operand b, Label l) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_CMP_BRANCH);
- e->u.cmp_branch.op = op;
- e->u.cmp_branch.a = a;
- e->u.cmp_branch.b = b;
- e->u.cmp_branch.vlabel = l;
+ u32 taken = (u32)l;
+ Inst* in = rec(o, IR_CMP_BRANCH);
+ Operand ops[2] = {a, b};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ in->extra.imm = (i64)op;
+ Block* cb = &o->f->blocks[o->cur];
+ cb->succ[0] = taken;
+ u32 ft = ir_block_new(o->f);
+ cb->succ[1] = ft;
+ cb->nsucc = 2;
+ set_cur(o, ft);
+}
+
+/* ---- structured scopes ---- */
+
+static u32 scope_register(Func* f, Inst* in) {
+ if (f->nscopes == f->scopes_cap) {
+ u32 ncap = f->scopes_cap ? f->scopes_cap * 2u : 4u;
+ Inst** nb = arena_zarray(f->arena, Inst*, ncap);
+ if (f->scope_aux_inst)
+ memcpy(nb, f->scope_aux_inst, sizeof(Inst*) * f->nscopes);
+ f->scope_aux_inst = nb;
+ f->scopes_cap = ncap;
+ }
+ f->scope_aux_inst[f->nscopes++] = in;
+ return f->nscopes;
+}
+
+static IRScopeAux* scope_lookup(OptImpl* o, CGScope s) {
+ if (s == CG_SCOPE_NONE || s > o->f->nscopes) {
+ SrcLoc loc = {0, 0, 0};
+ compiler_panic(o->c, loc, "opt: bad scope id %u", (unsigned)s);
+ }
+ return (IRScopeAux*)o->f->scope_aux_inst[s - 1]->extra.aux;
}
static CGScope w_scope_begin(CGTarget* t, const CGScopeDesc* d) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_SCOPE_BEGIN);
- CGScope v = o->next_vscope++;
- e->u.scope_begin.desc = *d;
- e->u.scope_begin.vscope = v;
- return v;
+ Inst* in = rec(o, IR_SCOPE_BEGIN);
+ IRScopeAux* aux = arena_znew(o->f->arena, IRScopeAux);
+ aux->desc = *d;
+ in->extra.aux = aux;
+ u32 sid = scope_register(o->f, in);
+ aux->scope_id = sid;
+
+ if (d->kind == SCOPE_IF) {
+ aux->if_then_block = ir_block_new(o->f);
+ aux->if_else_block = ir_block_new(o->f);
+ aux->if_end_block = ir_block_new(o->f);
+ Block* cb = &o->f->blocks[o->cur];
+ cb->succ[0] = aux->if_then_block;
+ cb->succ[1] = aux->if_else_block;
+ cb->nsucc = 2;
+ set_cur(o, aux->if_then_block);
+ } else if (d->kind == SCOPE_LOOP || d->kind == SCOPE_BLOCK) {
+ aux->loop_break_block =
+ d->break_label != LABEL_NONE ? (u32)d->break_label : 0;
+ aux->loop_continue_block =
+ d->continue_label != LABEL_NONE ? (u32)d->continue_label : 0;
+ }
+ return (CGScope)sid;
}
+
static void w_scope_else(CGTarget* t, CGScope s) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_SCOPE_ELSE);
- e->u.scope_op.vscope = s;
+ IRScopeAux* aux = scope_lookup(o, s);
+ if (aux->desc.kind != SCOPE_IF) {
+ SrcLoc loc = {0, 0, 0};
+ compiler_panic(o->c, loc, "opt: scope_else on non-IF scope %u",
+ (unsigned)s);
+ }
+ Inst* in = rec(o, IR_SCOPE_ELSE);
+ in->extra.imm = (i64)s;
+ if (!cur_terminated(o)) {
+ Block* cb = &o->f->blocks[o->cur];
+ cb->succ[0] = aux->if_end_block;
+ cb->nsucc = 1;
+ }
+ aux->if_has_else = 1;
+ set_cur(o, aux->if_else_block);
}
+
static void w_scope_end(CGTarget* t, CGScope s) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_SCOPE_END);
- e->u.scope_op.vscope = s;
+ IRScopeAux* aux = scope_lookup(o, s);
+ Inst* in = rec(o, IR_SCOPE_END);
+ in->extra.imm = (i64)s;
+ if (aux->desc.kind == SCOPE_IF) {
+ if (!cur_terminated(o)) {
+ Block* cb = &o->f->blocks[o->cur];
+ cb->succ[0] = aux->if_end_block;
+ cb->nsucc = 1;
+ }
+ if (!aux->if_has_else) {
+ Block* eb = &o->f->blocks[aux->if_else_block];
+ if (eb->nsucc == 0) {
+ eb->succ[0] = aux->if_end_block;
+ eb->nsucc = 1;
+ }
+ /* Else block was never visited as cur, but it has code (the
+ * fall-through from scope_begin) — record it before end so emit
+ * order has it. */
+ ir_note_emit(o->f, aux->if_else_block);
+ }
+ set_cur(o, aux->if_end_block);
+ }
}
+
static void w_break_to(CGTarget* t, CGScope s) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_BREAK_TO);
- e->u.scope_op.vscope = s;
+ IRScopeAux* aux = scope_lookup(o, s);
+ Inst* in = rec(o, IR_BREAK_TO);
+ in->extra.imm = (i64)s;
+ Block* cb = &o->f->blocks[o->cur];
+ cb->succ[0] = aux->loop_break_block;
+ cb->nsucc = 1;
+ after_terminator(o);
}
+
static void w_continue_to(CGTarget* t, CGScope s) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_CONTINUE_TO);
- e->u.scope_op.vscope = s;
+ IRScopeAux* aux = scope_lookup(o, s);
+ Inst* in = rec(o, IR_CONTINUE_TO);
+ in->extra.imm = (i64)s;
+ Block* cb = &o->f->blocks[o->cur];
+ cb->succ[0] = aux->loop_continue_block;
+ cb->nsucc = 1;
+ after_terminator(o);
}
+/* ---- data movement ---- */
+
static void w_load_imm(CGTarget* t, Operand dst, i64 imm) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_LOAD_IMM);
- e->u.load_imm.dst = dst;
- e->u.load_imm.imm = imm;
+ Inst* in = rec(o, IR_LOAD_IMM);
+ Operand ops[1] = {dst};
+ in->opnds = dup_opnds(o->f, ops, 1);
+ in->nopnds = 1;
+ in->extra.imm = imm;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_load_const(CGTarget* t, Operand dst, ConstBytes cb) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_LOAD_CONST);
- e->u.load_const.dst = dst;
- e->u.load_const.cb = cb;
+ Inst* in = rec(o, IR_LOAD_CONST);
+ Operand ops[1] = {dst};
+ in->opnds = dup_opnds(o->f, ops, 1);
+ in->nopnds = 1;
+ in->extra.cbytes = cb;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_copy(CGTarget* t, Operand dst, Operand src) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_COPY);
- e->u.copy.dst = dst;
- e->u.copy.src = src;
+ Inst* in = rec(o, IR_COPY);
+ Operand ops[2] = {dst, src};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_load(CGTarget* t, Operand dst, Operand addr, MemAccess m) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_LOAD);
- e->u.load.dst = dst;
- e->u.load.addr = addr;
- e->u.load.mem = m;
+ Inst* in = rec(o, IR_LOAD);
+ Operand ops[2] = {dst, addr};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ in->extra.mem = m;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_store(CGTarget* t, Operand addr, Operand src, MemAccess m) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_STORE);
- e->u.store.addr = addr;
- e->u.store.src = src;
- e->u.store.mem = m;
+ Inst* in = rec(o, IR_STORE);
+ Operand ops[2] = {addr, src};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ in->extra.mem = m;
}
+
static void w_addr_of(CGTarget* t, Operand dst, Operand lv) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_ADDR_OF);
- e->u.copy.dst = dst;
- e->u.copy.src = lv;
+ Inst* in = rec(o, IR_ADDR_OF);
+ Operand ops[2] = {dst, lv};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_tls_addr_of(CGTarget* t, Operand dst, ObjSymId sym, i64 addend) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_TLS_ADDR_OF);
- e->u.tls_addr_of.dst = dst;
- e->u.tls_addr_of.sym = sym;
- e->u.tls_addr_of.addend = addend;
+ Inst* in = rec(o, IR_TLS_ADDR_OF);
+ Operand ops[1] = {dst};
+ in->opnds = dup_opnds(o->f, ops, 1);
+ in->nopnds = 1;
+ IRTlsAux* aux = arena_znew(o->f->arena, IRTlsAux);
+ aux->sym = sym;
+ aux->addend = addend;
+ in->extra.aux = aux;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_copy_bytes(CGTarget* t, Operand dst, Operand src,
AggregateAccess agg) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_COPY_BYTES);
- e->u.agg.a = dst;
- e->u.agg.b = src;
- e->u.agg.agg = agg;
+ Inst* in = rec(o, IR_AGG_COPY);
+ Operand ops[2] = {dst, src};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ IRAggAux* aux = arena_znew(o->f->arena, IRAggAux);
+ aux->access = agg;
+ in->extra.aux = aux;
}
+
static void w_set_bytes(CGTarget* t, Operand dst, Operand byte,
AggregateAccess agg) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_SET_BYTES);
- e->u.agg.a = dst;
- e->u.agg.b = byte;
- e->u.agg.agg = agg;
+ Inst* in = rec(o, IR_AGG_SET);
+ Operand ops[2] = {dst, byte};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ IRAggAux* aux = arena_znew(o->f->arena, IRAggAux);
+ aux->access = agg;
+ in->extra.aux = aux;
}
+
static void w_bitfield_load(CGTarget* t, Operand dst, Operand record,
BitFieldAccess bf) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_BITFIELD_LOAD);
- e->u.bitfield_load.dst = dst;
- e->u.bitfield_load.record = record;
- e->u.bitfield_load.bf = bf;
+ Inst* in = rec(o, IR_BITFIELD_LOAD);
+ Operand ops[2] = {dst, record};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ IRBitFieldAux* aux = arena_znew(o->f->arena, IRBitFieldAux);
+ aux->access = bf;
+ in->extra.aux = aux;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_bitfield_store(CGTarget* t, Operand record, Operand src,
BitFieldAccess bf) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_BITFIELD_STORE);
- e->u.bitfield_store.record = record;
- e->u.bitfield_store.src = src;
- e->u.bitfield_store.bf = bf;
+ Inst* in = rec(o, IR_BITFIELD_STORE);
+ Operand ops[2] = {record, src};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ IRBitFieldAux* aux = arena_znew(o->f->arena, IRBitFieldAux);
+ aux->access = bf;
+ in->extra.aux = aux;
}
+/* ---- arithmetic / cmp / convert ---- */
+
static void w_binop(CGTarget* t, BinOp op, Operand dst, Operand a, Operand b) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_BINOP);
- e->u.binop.op = op;
- e->u.binop.dst = dst;
- e->u.binop.a = a;
- e->u.binop.b = b;
+ Inst* in = rec(o, IR_BINOP);
+ Operand ops[3] = {dst, a, b};
+ in->opnds = dup_opnds(o->f, ops, 3);
+ in->nopnds = 3;
+ in->extra.imm = (i64)op;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_unop(CGTarget* t, UnOp op, Operand dst, Operand a) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_UNOP);
- e->u.unop.op = op;
- e->u.unop.dst = dst;
- e->u.unop.a = a;
+ Inst* in = rec(o, IR_UNOP);
+ Operand ops[2] = {dst, a};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ in->extra.imm = (i64)op;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_cmp(CGTarget* t, CmpOp op, Operand dst, Operand a, Operand b) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_CMP);
- e->u.cmp.op = op;
- e->u.cmp.dst = dst;
- e->u.cmp.a = a;
- e->u.cmp.b = b;
+ Inst* in = rec(o, IR_CMP);
+ Operand ops[3] = {dst, a, b};
+ in->opnds = dup_opnds(o->f, ops, 3);
+ in->nopnds = 3;
+ in->extra.imm = (i64)op;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_convert(CGTarget* t, ConvKind k, Operand dst, Operand src) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_CONVERT);
- e->u.convert.kind = k;
- e->u.convert.dst = dst;
- e->u.convert.src = src;
+ Inst* in = rec(o, IR_CONVERT);
+ Operand ops[2] = {dst, src};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ in->extra.imm = (i64)k;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
+}
+
+/* ---- calls / return ---- */
+
+static CGABIPart* dup_parts(Arena* a, const CGABIPart* src, u32 n) {
+ if (!n) return NULL;
+ CGABIPart* dst = arena_array(a, CGABIPart, n);
+ memcpy(dst, src, sizeof(CGABIPart) * n);
+ return dst;
}
static void w_call(CGTarget* t, const CGCallDesc* d) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_CALL);
- CGABIValue* args_copy = NULL;
- CGABIPart** arg_parts_copy = NULL;
- CGABIPart* ret_parts_copy = NULL;
- u32 i;
-
- /* Deep-copy the argv. Caller-owned d may be on the stack, and
- * args[i].parts may be too. */
+ Inst* in = rec(o, IR_CALL);
+ IRCallAux* aux = arena_znew(o->f->arena, IRCallAux);
+ aux->desc = *d;
if (d->nargs) {
- args_copy = arena_array(o->c->tu, CGABIValue, d->nargs);
- arg_parts_copy = arena_array(o->c->tu, CGABIPart*, d->nargs);
- for (i = 0; i < d->nargs; ++i) {
- args_copy[i] = d->args[i];
- arg_parts_copy[i] =
- copy_parts(o->c, d->args[i].parts, d->args[i].nparts);
- args_copy[i].parts = arg_parts_copy[i];
+ CGABIValue* args = arena_array(o->f->arena, CGABIValue, d->nargs);
+ for (u32 i = 0; i < d->nargs; ++i) {
+ args[i] = d->args[i];
+ args[i].parts =
+ dup_parts(o->f->arena, d->args[i].parts, d->args[i].nparts);
}
+ aux->desc.args = args;
}
- ret_parts_copy = copy_parts(o->c, d->ret.parts, d->ret.nparts);
-
- e->u.call.desc = *d;
- e->u.call.desc.args = args_copy;
- e->u.call.desc.ret.parts = ret_parts_copy;
- e->u.call.args = args_copy;
- e->u.call.arg_parts = arg_parts_copy;
- e->u.call.ret_parts = ret_parts_copy;
+ aux->desc.ret = d->ret;
+ aux->desc.ret.parts = dup_parts(o->f->arena, d->ret.parts, d->ret.nparts);
+ in->extra.aux = aux;
+ in->type = d->fn_type;
}
static void w_ret(CGTarget* t, const CGABIValue* v) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_RET);
- if (!v) {
- e->u.ret.present = 0;
- return;
+ Inst* in = rec(o, IR_RET);
+ IRRetAux* aux = arena_znew(o->f->arena, IRRetAux);
+ if (v) {
+ aux->present = 1;
+ aux->val = *v;
+ aux->val.parts = dup_parts(o->f->arena, v->parts, v->nparts);
}
- e->u.ret.present = 1;
- e->u.ret.val = *v;
- e->u.ret.parts = copy_parts(o->c, v->parts, v->nparts);
- e->u.ret.val.parts = e->u.ret.parts;
+ in->extra.aux = aux;
+ Block* cb = &o->f->blocks[o->cur];
+ cb->nsucc = 0;
+ after_terminator(o);
}
+/* ---- alloca / variadics / setjmp / atomics / fence / intrinsic ---- */
+
static void w_alloca_(CGTarget* t, Operand dst, Operand size, u32 align) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_ALLOCA);
- e->u.alloca_.dst = dst;
- e->u.alloca_.size = size;
- e->u.alloca_.align = align;
+ Inst* in = rec(o, IR_ALLOCA);
+ Operand ops[2] = {dst, size};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ in->extra.imm = (i64)align;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
static void w_va_start_(CGTarget* t, Operand ap) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_VA_START);
- e->u.va_se.ap = ap;
+ Inst* in = rec(o, IR_VA_START);
+ Operand ops[1] = {ap};
+ in->opnds = dup_opnds(o->f, ops, 1);
+ in->nopnds = 1;
}
+
static void w_va_arg_(CGTarget* t, Operand dst, Operand ap, const Type* ty) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_VA_ARG);
- e->u.va_arg_.dst = dst;
- e->u.va_arg_.ap = ap;
- e->u.va_arg_.ty = ty;
+ Inst* in = rec(o, IR_VA_ARG);
+ Operand ops[2] = {dst, ap};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ in->extra.aux = (void*)ty;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_va_end_(CGTarget* t, Operand ap) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_VA_END);
- e->u.va_se.ap = ap;
+ Inst* in = rec(o, IR_VA_END);
+ Operand ops[1] = {ap};
+ in->opnds = dup_opnds(o->f, ops, 1);
+ in->nopnds = 1;
}
+
static void w_va_copy_(CGTarget* t, Operand dst, Operand src) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_VA_COPY);
- e->u.copy.dst = dst;
- e->u.copy.src = src;
+ Inst* in = rec(o, IR_VA_COPY);
+ Operand ops[2] = {dst, src};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
}
static void w_setjmp_(CGTarget* t, Operand dst, Operand buf) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_SETJMP);
- e->u.setjmp_.dst = dst;
- e->u.setjmp_.buf = buf;
+ Inst* in = rec(o, IR_SETJMP);
+ Operand ops[2] = {dst, buf};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_longjmp_(CGTarget* t, Operand buf, Operand val) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_LONGJMP);
- e->u.longjmp_.buf = buf;
- e->u.longjmp_.val = val;
+ Inst* in = rec(o, IR_LONGJMP);
+ Operand ops[2] = {buf, val};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ Block* cb = &o->f->blocks[o->cur];
+ cb->nsucc = 0;
+ after_terminator(o);
}
static void w_atomic_load(CGTarget* t, Operand dst, Operand addr, MemAccess m,
MemOrder mo) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_ATOMIC_LOAD);
- e->u.atomic_load.dst = dst;
- e->u.atomic_load.addr = addr;
- e->u.atomic_load.mem = m;
- e->u.atomic_load.mo = mo;
+ Inst* in = rec(o, IR_ATOMIC_LOAD);
+ Operand ops[2] = {dst, addr};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ IRAtomicAux* aux = arena_znew(o->f->arena, IRAtomicAux);
+ aux->mem = m;
+ aux->mo = mo;
+ in->extra.aux = aux;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_atomic_store(CGTarget* t, Operand addr, Operand src, MemAccess m,
MemOrder mo) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_ATOMIC_STORE);
- e->u.atomic_store.addr = addr;
- e->u.atomic_store.src = src;
- e->u.atomic_store.mem = m;
- e->u.atomic_store.mo = mo;
+ Inst* in = rec(o, IR_ATOMIC_STORE);
+ Operand ops[2] = {addr, src};
+ in->opnds = dup_opnds(o->f, ops, 2);
+ in->nopnds = 2;
+ IRAtomicAux* aux = arena_znew(o->f->arena, IRAtomicAux);
+ aux->mem = m;
+ aux->mo = mo;
+ in->extra.aux = aux;
}
+
static void w_atomic_rmw(CGTarget* t, AtomicOp op, Operand dst, Operand addr,
Operand val, MemAccess m, MemOrder mo) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_ATOMIC_RMW);
- e->u.atomic_rmw.op = op;
- e->u.atomic_rmw.dst = dst;
- e->u.atomic_rmw.addr = addr;
- e->u.atomic_rmw.val = val;
- e->u.atomic_rmw.mem = m;
- e->u.atomic_rmw.mo = mo;
+ Inst* in = rec(o, IR_ATOMIC_RMW);
+ Operand ops[3] = {dst, addr, val};
+ in->opnds = dup_opnds(o->f, ops, 3);
+ in->nopnds = 3;
+ IRAtomicAux* aux = arena_znew(o->f->arena, IRAtomicAux);
+ aux->mem = m;
+ aux->mo = mo;
+ aux->op = (u8)op;
+ in->extra.aux = aux;
+ if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type);
}
+
static void w_atomic_cas(CGTarget* t, Operand prior, Operand ok, Operand addr,
Operand expected, Operand desired, MemAccess m,
MemOrder s, MemOrder f) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_ATOMIC_CAS);
- e->u.atomic_cas.prior = prior;
- e->u.atomic_cas.ok = ok;
- e->u.atomic_cas.addr = addr;
- e->u.atomic_cas.expected = expected;
- e->u.atomic_cas.desired = desired;
- e->u.atomic_cas.mem = m;
- e->u.atomic_cas.success = s;
- e->u.atomic_cas.failure = f;
+ Inst* in = rec(o, IR_ATOMIC_CAS);
+ Operand ops[5] = {prior, ok, addr, expected, desired};
+ in->opnds = dup_opnds(o->f, ops, 5);
+ in->nopnds = 5;
+ IRCasAux* aux = arena_znew(o->f->arena, IRCasAux);
+ aux->mem = m;
+ aux->success = s;
+ aux->failure = f;
+ in->extra.aux = aux;
+ if (prior.kind == OPK_REG)
+ set_def(o->f, in, o->cur, (Val)prior.v.reg, prior.type);
}
+
static void w_fence(CGTarget* t, MemOrder mo) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_FENCE);
- e->u.fence.mo = mo;
+ Inst* in = rec(o, IR_FENCE);
+ in->extra.imm = (i64)mo;
}
-static void w_intrinsic(CGTarget* t, IntrinKind k, Operand* dsts, u32 nd,
+static void w_intrinsic(CGTarget* t, IntrinKind kind, Operand* dsts, u32 nd,
const Operand* args, u32 na) {
OptImpl* o = impl_of(t);
- TapeEntry* e = tape_append(o, TOP_INTRINSIC);
- e->u.intrinsic.kind = k;
- e->u.intrinsic.ndst = nd;
- e->u.intrinsic.narg = na;
- e->u.intrinsic.dsts = copy_operands(o->c, dsts, nd);
- e->u.intrinsic.args = copy_operands(o->c, args, na);
+ Inst* in = rec(o, IR_INTRINSIC);
+ IRIntrinAux* aux = arena_znew(o->f->arena, IRIntrinAux);
+ aux->kind = kind;
+ aux->ndst = nd;
+ aux->narg = na;
+ aux->dsts = nd ? arena_array(o->f->arena, Operand, nd) : NULL;
+ aux->args = na ? arena_array(o->f->arena, Operand, na) : NULL;
+ if (nd) memcpy(aux->dsts, dsts, sizeof(Operand) * nd);
+ if (na) memcpy(aux->args, args, sizeof(Operand) * na);
+ in->extra.aux = aux;
+ if (nd == 1 && dsts[0].kind == OPK_REG) {
+ set_def(o->f, in, o->cur, (Val)dsts[0].v.reg, dsts[0].type);
+ } else if (nd > 1) {
+ in->ndefs = nd;
+ in->defs = arena_array(o->f->arena, Val, nd);
+ for (u32 i = 0; i < nd; ++i) {
+ in->defs[i] =
+ (dsts[i].kind == OPK_REG) ? (Val)dsts[i].v.reg : VAL_NONE;
+ if (in->defs[i] != VAL_NONE && in->defs[i] < o->f->nvals) {
+ o->f->val_def_block[in->defs[i]] = o->cur;
+ o->f->val_def_inst[in->defs[i]] =
+ o->f->blocks[o->cur].ninsts - 1u;
+ }
+ }
+ in->def = in->defs[0];
+ in->type = dsts[0].type;
+ }
}
static void w_asm_block(CGTarget* t, const char* tmpl,
@@ -897,877 +702,473 @@ static void w_asm_block(CGTarget* t, const char* tmpl,
(void)in_ops;
(void)clobbers;
(void)nclob;
- /* Group M (inline asm) is deferred in the corpus; the wrapper does
- * not yet support it. */
panic_unsupported(impl_of(t), "asm_block");
}
static void w_set_loc(CGTarget* t, SrcLoc loc) {
OptImpl* o = impl_of(t);
- TapeEntry* e;
o->pending_loc = loc;
- e = tape_append(o, TOP_SET_LOC);
- e->u.set_loc.loc = loc;
}
-/* ---- replay-time translation ---- */
+/* ============================================================
+ * Replay: walk the recorded Func and emit to the wrapped target.
+ * ============================================================ */
-static Reg xlat_reg(OptImpl* o, Reg vreg) {
- if (vreg == REG_NONE || vreg == 0u) return vreg;
- if (vreg >= o->reg_map_cap || o->reg_map[vreg] == REG_NONE) {
- SrcLoc loc = {0, 0, 0};
- compiler_panic(o->c, loc, "opt replay: unmapped vreg %u", (unsigned)vreg);
- }
- return o->reg_map[vreg];
-}
-
-static FrameSlot xlat_slot(OptImpl* o, FrameSlot vs) {
- if (vs == FRAME_SLOT_NONE) return FRAME_SLOT_NONE;
- if (vs >= o->slot_map_cap || o->slot_map[vs] == FRAME_SLOT_NONE) {
+typedef struct ReplayCtx {
+ OptImpl* o;
+ CGTarget* tgt;
+ Reg* val_to_reg;
+ FrameSlot* slot_map;
+ Label* label_map;
+ CGScope* scope_map;
+ u8* val_alloced;
+ u8* block_label_placed;
+} ReplayCtx;
+
+static Reg val_to_target_reg(ReplayCtx* r, Val v) {
+ Func* f = r->o->f;
+ if (v == VAL_NONE) return REG_NONE;
+ if (v >= f->nvals) {
SrcLoc loc = {0, 0, 0};
- compiler_panic(o->c, loc, "opt replay: unmapped vslot %u", (unsigned)vs);
+ compiler_panic(r->o->c, loc, "opt replay: Val %u out of range", (unsigned)v);
}
- return o->slot_map[vs];
-}
-
-static Label xlat_label(OptImpl* o, Label vl) {
- if (vl == LABEL_NONE) return LABEL_NONE;
- if (vl >= o->label_map_cap || o->label_map[vl] == LABEL_NONE) {
- SrcLoc loc = {0, 0, 0};
- compiler_panic(o->c, loc, "opt replay: unmapped vlabel %u", (unsigned)vl);
+ if (!r->val_alloced[v]) {
+ r->val_to_reg[v] =
+ r->tgt->alloc_reg(r->tgt, (RegClass)f->val_cls[v], f->val_type[v]);
+ r->val_alloced[v] = 1;
}
- return o->label_map[vl];
+ return r->val_to_reg[v];
}
-static CGScope xlat_scope(OptImpl* o, CGScope vs) {
- if (vs == CG_SCOPE_NONE) return CG_SCOPE_NONE;
- if (vs >= o->scope_map_cap || o->scope_map[vs] == CG_SCOPE_NONE) {
+static FrameSlot slot_to_target(ReplayCtx* r, FrameSlot vs) {
+ if (vs == FRAME_SLOT_NONE) return FRAME_SLOT_NONE;
+ if (vs >= r->o->f->nframe_slots + 1u) {
SrcLoc loc = {0, 0, 0};
- compiler_panic(o->c, loc, "opt replay: unmapped vscope %u", (unsigned)vs);
+ compiler_panic(r->o->c, loc, "opt replay: vslot %u out of range",
+ (unsigned)vs);
}
- return o->scope_map[vs];
+ return r->slot_map[vs];
}
-static Operand xlat_op(OptImpl* o, Operand op) {
+static Operand xlat_op(ReplayCtx* r, Operand op) {
switch ((OpKind)op.kind) {
case OPK_IMM:
case OPK_GLOBAL:
return op;
case OPK_REG:
- op.v.reg = xlat_reg(o, op.v.reg);
+ op.v.reg = val_to_target_reg(r, (Val)op.v.reg);
return op;
case OPK_LOCAL:
- op.v.frame_slot = xlat_slot(o, op.v.frame_slot);
+ op.v.frame_slot = slot_to_target(r, op.v.frame_slot);
return op;
case OPK_INDIRECT:
- op.v.ind.base = xlat_reg(o, op.v.ind.base);
+ op.v.ind.base = val_to_target_reg(r, (Val)op.v.ind.base);
return op;
}
- /* unreachable */
return op;
}
-static CGABIValue xlat_abivalue(OptImpl* o, const CGABIValue* in,
+static CGABIValue xlat_abivalue(ReplayCtx* r, const CGABIValue* in,
CGABIPart* parts_out) {
CGABIValue out = *in;
- out.storage = xlat_op(o, in->storage);
+ out.storage = xlat_op(r, in->storage);
if (in->nparts && parts_out) {
for (u32 i = 0; i < in->nparts; ++i) {
parts_out[i] = in->parts[i];
- parts_out[i].op = xlat_op(o, in->parts[i].op);
+ parts_out[i].op = xlat_op(r, in->parts[i].op);
}
out.parts = parts_out;
+ } else {
+ out.parts = NULL;
}
return out;
}
-/* ---- replay ---- */
-
-static void replay(OptImpl* o) {
- CGTarget* w = o->target;
-
- /* Pre-size the maps to the high-water mark for this function. */
- if (o->next_vreg > 1) map_reg_grow(o, o->next_vreg);
- if (o->next_vslot > 1) map_slot_grow(o, o->next_vslot);
- if (o->next_vlabel > 1) map_label_grow(o, o->next_vlabel);
- if (o->next_vscope > 1) map_scope_grow(o, o->next_vscope);
-
- for (u32 i = 0; i < o->ntape; ++i) {
- TapeEntry* e = &o->tape[i];
- if (e->dead) continue;
- switch ((TapeOpKind)e->op) {
- case TOP_FUNC_BEGIN: {
- /* Build a fresh CGFuncDesc with translated param slots. */
- CGFuncDesc fd = e->u.func_begin.desc;
- if (fd.nparams) {
- CGParamDesc* params = arena_array(o->c->tu, CGParamDesc, fd.nparams);
- for (u32 k = 0; k < fd.nparams; ++k) {
- params[k] = e->u.func_begin.params[k];
- params[k].slot = xlat_slot(o, e->u.func_begin.params[k].slot);
- }
- fd.params = params;
- }
- w->func_begin(w, &fd);
- break;
- }
- case TOP_FUNC_END:
- w->func_end(w);
- break;
- case TOP_ALLOC_REG: {
- Reg r =
- w->alloc_reg(w, e->u.alloc_reg.cls, e->u.alloc_reg.ty);
- Reg v = e->u.alloc_reg.vreg;
- if (v >= o->reg_map_cap) map_reg_grow(o, v + 1);
- o->reg_map[v] = r;
- break;
- }
- case TOP_FRAME_SLOT: {
- FrameSlot s = w->frame_slot(w, &e->u.frame_slot.desc);
- FrameSlot v = e->u.frame_slot.vslot;
- if (v >= o->slot_map_cap) map_slot_grow(o, v + 1);
- o->slot_map[v] = s;
- break;
- }
- case TOP_PARAM: {
- CGParamDesc d = e->u.param.desc;
- d.slot = xlat_slot(o, d.slot);
- w->param(w, &d);
- break;
- }
- case TOP_LABEL_NEW: {
- Label l = w->label_new(w);
- Label v = e->u.label_new.vlabel;
- if (v >= o->label_map_cap) map_label_grow(o, v + 1);
- o->label_map[v] = l;
- break;
- }
- case TOP_LABEL_PLACE:
- w->label_place(w, xlat_label(o, e->u.label_op.vlabel));
- break;
- case TOP_JUMP:
- w->jump(w, xlat_label(o, e->u.label_op.vlabel));
- break;
- case TOP_CMP_BRANCH:
- w->cmp_branch(w, e->u.cmp_branch.op, xlat_op(o, e->u.cmp_branch.a),
- xlat_op(o, e->u.cmp_branch.b),
- xlat_label(o, e->u.cmp_branch.vlabel));
- break;
- case TOP_SCOPE_BEGIN: {
- CGScopeDesc d = e->u.scope_begin.desc;
- d.cond = xlat_op(o, d.cond);
- d.break_label = xlat_label(o, d.break_label);
- d.continue_label = xlat_label(o, d.continue_label);
- CGScope s = w->scope_begin(w, &d);
- CGScope v = e->u.scope_begin.vscope;
- if (v >= o->scope_map_cap) map_scope_grow(o, v + 1);
- o->scope_map[v] = s;
- break;
- }
- case TOP_SCOPE_ELSE:
- w->scope_else(w, xlat_scope(o, e->u.scope_op.vscope));
- break;
- case TOP_SCOPE_END:
- w->scope_end(w, xlat_scope(o, e->u.scope_op.vscope));
- break;
- case TOP_BREAK_TO:
- w->break_to(w, xlat_scope(o, e->u.scope_op.vscope));
- break;
- case TOP_CONTINUE_TO:
- w->continue_to(w, xlat_scope(o, e->u.scope_op.vscope));
- break;
- case TOP_LOAD_IMM:
- w->load_imm(w, xlat_op(o, e->u.load_imm.dst), e->u.load_imm.imm);
- break;
- case TOP_LOAD_CONST:
- w->load_const(w, xlat_op(o, e->u.load_const.dst), e->u.load_const.cb);
- break;
- case TOP_COPY:
- w->copy(w, xlat_op(o, e->u.copy.dst), xlat_op(o, e->u.copy.src));
- break;
- case TOP_LOAD:
- w->load(w, xlat_op(o, e->u.load.dst), xlat_op(o, e->u.load.addr),
- e->u.load.mem);
- break;
- case TOP_STORE:
- w->store(w, xlat_op(o, e->u.store.addr), xlat_op(o, e->u.store.src),
- e->u.store.mem);
- break;
- case TOP_ADDR_OF:
- w->addr_of(w, xlat_op(o, e->u.copy.dst), xlat_op(o, e->u.copy.src));
- break;
- case TOP_TLS_ADDR_OF:
- w->tls_addr_of(w, xlat_op(o, e->u.tls_addr_of.dst),
- e->u.tls_addr_of.sym, e->u.tls_addr_of.addend);
- break;
- case TOP_COPY_BYTES:
- w->copy_bytes(w, xlat_op(o, e->u.agg.a), xlat_op(o, e->u.agg.b),
- e->u.agg.agg);
- break;
- case TOP_SET_BYTES:
- w->set_bytes(w, xlat_op(o, e->u.agg.a), xlat_op(o, e->u.agg.b),
- e->u.agg.agg);
- break;
- case TOP_BITFIELD_LOAD:
- w->bitfield_load(w, xlat_op(o, e->u.bitfield_load.dst),
- xlat_op(o, e->u.bitfield_load.record),
- e->u.bitfield_load.bf);
- break;
- case TOP_BITFIELD_STORE:
- w->bitfield_store(w, xlat_op(o, e->u.bitfield_store.record),
- xlat_op(o, e->u.bitfield_store.src),
- e->u.bitfield_store.bf);
- break;
- case TOP_BINOP:
- w->binop(w, e->u.binop.op, xlat_op(o, e->u.binop.dst),
- xlat_op(o, e->u.binop.a), xlat_op(o, e->u.binop.b));
- break;
- case TOP_UNOP:
- w->unop(w, e->u.unop.op, xlat_op(o, e->u.unop.dst),
- xlat_op(o, e->u.unop.a));
- break;
- case TOP_CMP:
- w->cmp(w, e->u.cmp.op, xlat_op(o, e->u.cmp.dst),
- xlat_op(o, e->u.cmp.a), xlat_op(o, e->u.cmp.b));
- break;
- case TOP_CONVERT:
- w->convert(w, e->u.convert.kind, xlat_op(o, e->u.convert.dst),
- xlat_op(o, e->u.convert.src));
- break;
- case TOP_CALL: {
- CGCallDesc cd = e->u.call.desc;
- cd.callee = xlat_op(o, cd.callee);
- CGABIValue* args = NULL;
- if (cd.nargs) {
- args = arena_array(o->c->tu, CGABIValue, cd.nargs);
- for (u32 k = 0; k < cd.nargs; ++k) {
- CGABIPart* parts =
- e->u.call.args[k].nparts
- ? arena_array(o->c->tu, CGABIPart,
- e->u.call.args[k].nparts)
- : NULL;
- args[k] = xlat_abivalue(o, &e->u.call.args[k], parts);
- }
- cd.args = args;
- } else {
- cd.args = NULL;
+static Label ensure_label(ReplayCtx* r, u32 b) {
+ if (b >= r->o->f->nblocks) return LABEL_NONE;
+ if (r->label_map[b] == LABEL_NONE) {
+ r->label_map[b] = r->tgt->label_new(r->tgt);
+ }
+ return r->label_map[b];
+}
+
+static void ensure_label_placed(ReplayCtx* r, u32 b) {
+ if (r->block_label_placed[b]) return;
+ r->block_label_placed[b] = 1;
+ if (b == r->o->f->entry) return;
+ Label l = ensure_label(r, b);
+ r->tgt->label_place(r->tgt, l);
+}
+
+static void replay_inst(ReplayCtx* r, u32 b, Inst* in) {
+ CGTarget* w = r->tgt;
+ w->set_loc(w, in->loc);
+
+ switch ((IROp)in->op) {
+ case IR_NOP:
+ case IR_CONST_I:
+ case IR_CONST_BYTES:
+ case IR_PARAM_DECL:
+ case IR_PHI:
+ case IR_CONDBR:
+ case IR_ASM_BLOCK:
+ break;
+ case IR_LOAD_IMM: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ w->load_imm(w, dst, in->extra.imm);
+ break;
+ }
+ case IR_LOAD_CONST: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ w->load_const(w, dst, in->extra.cbytes);
+ break;
+ }
+ case IR_COPY: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand src = xlat_op(r, in->opnds[1]);
+ w->copy(w, dst, src);
+ break;
+ }
+ case IR_LOAD: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand addr = xlat_op(r, in->opnds[1]);
+ w->load(w, dst, addr, in->extra.mem);
+ break;
+ }
+ case IR_STORE: {
+ Operand addr = xlat_op(r, in->opnds[0]);
+ Operand src = xlat_op(r, in->opnds[1]);
+ w->store(w, addr, src, in->extra.mem);
+ break;
+ }
+ case IR_ADDR_OF: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand lv = xlat_op(r, in->opnds[1]);
+ w->addr_of(w, dst, lv);
+ break;
+ }
+ case IR_TLS_ADDR_OF: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ IRTlsAux* aux = (IRTlsAux*)in->extra.aux;
+ w->tls_addr_of(w, dst, aux->sym, aux->addend);
+ break;
+ }
+ case IR_AGG_COPY: {
+ Operand a = xlat_op(r, in->opnds[0]);
+ Operand bo = xlat_op(r, in->opnds[1]);
+ IRAggAux* aux = (IRAggAux*)in->extra.aux;
+ w->copy_bytes(w, a, bo, aux->access);
+ break;
+ }
+ case IR_AGG_SET: {
+ Operand a = xlat_op(r, in->opnds[0]);
+ Operand bo = xlat_op(r, in->opnds[1]);
+ IRAggAux* aux = (IRAggAux*)in->extra.aux;
+ w->set_bytes(w, a, bo, aux->access);
+ break;
+ }
+ case IR_BITFIELD_LOAD: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand rec_ = xlat_op(r, in->opnds[1]);
+ IRBitFieldAux* aux = (IRBitFieldAux*)in->extra.aux;
+ w->bitfield_load(w, dst, rec_, aux->access);
+ break;
+ }
+ case IR_BITFIELD_STORE: {
+ Operand rec_ = xlat_op(r, in->opnds[0]);
+ Operand src = xlat_op(r, in->opnds[1]);
+ IRBitFieldAux* aux = (IRBitFieldAux*)in->extra.aux;
+ w->bitfield_store(w, rec_, src, aux->access);
+ break;
+ }
+ case IR_BINOP: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand a = xlat_op(r, in->opnds[1]);
+ Operand bo = xlat_op(r, in->opnds[2]);
+ w->binop(w, (BinOp)in->extra.imm, dst, a, bo);
+ break;
+ }
+ case IR_UNOP: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand a = xlat_op(r, in->opnds[1]);
+ w->unop(w, (UnOp)in->extra.imm, dst, a);
+ break;
+ }
+ case IR_CMP: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand a = xlat_op(r, in->opnds[1]);
+ Operand bo = xlat_op(r, in->opnds[2]);
+ w->cmp(w, (CmpOp)in->extra.imm, dst, a, bo);
+ break;
+ }
+ case IR_CONVERT: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand src = xlat_op(r, in->opnds[1]);
+ w->convert(w, (ConvKind)in->extra.imm, dst, src);
+ break;
+ }
+ case IR_CALL: {
+ IRCallAux* aux = (IRCallAux*)in->extra.aux;
+ CGCallDesc cd = aux->desc;
+ cd.callee = xlat_op(r, cd.callee);
+ CGABIValue* args = NULL;
+ if (cd.nargs) {
+ args = arena_array(r->o->f->arena, CGABIValue, cd.nargs);
+ for (u32 k = 0; k < cd.nargs; ++k) {
+ CGABIPart* parts =
+ aux->desc.args[k].nparts
+ ? arena_array(r->o->f->arena, CGABIPart,
+ aux->desc.args[k].nparts)
+ : NULL;
+ args[k] = xlat_abivalue(r, &aux->desc.args[k], parts);
}
- CGABIPart* ret_parts =
- cd.ret.nparts
- ? arena_array(o->c->tu, CGABIPart, cd.ret.nparts)
- : NULL;
- cd.ret = xlat_abivalue(o, &e->u.call.desc.ret, ret_parts);
- w->call(w, &cd);
- break;
+ cd.args = args;
+ } else {
+ cd.args = NULL;
}
- case TOP_RET: {
- if (!e->u.ret.present) {
- w->ret(w, NULL);
- break;
- }
+ CGABIPart* ret_parts =
+ cd.ret.nparts
+ ? arena_array(r->o->f->arena, CGABIPart, cd.ret.nparts)
+ : NULL;
+ cd.ret = xlat_abivalue(r, &aux->desc.ret, ret_parts);
+ w->call(w, &cd);
+ break;
+ }
+ case IR_BR: {
+ Block* bl = &r->o->f->blocks[b];
+ if (bl->nsucc < 1) break;
+ Label l = ensure_label(r, bl->succ[0]);
+ w->jump(w, l);
+ break;
+ }
+ case IR_CMP_BRANCH: {
+ Operand a = xlat_op(r, in->opnds[0]);
+ Operand bo = xlat_op(r, in->opnds[1]);
+ Block* bl = &r->o->f->blocks[b];
+ Label taken = ensure_label(r, bl->succ[0]);
+ w->cmp_branch(w, (CmpOp)in->extra.imm, a, bo, taken);
+ break;
+ }
+ case IR_RET: {
+ IRRetAux* aux = (IRRetAux*)in->extra.aux;
+ if (!aux || !aux->present) {
+ w->ret(w, NULL);
+ } else {
CGABIPart* parts =
- e->u.ret.val.nparts
- ? arena_array(o->c->tu, CGABIPart, e->u.ret.val.nparts)
+ aux->val.nparts
+ ? arena_array(r->o->f->arena, CGABIPart, aux->val.nparts)
: NULL;
- CGABIValue v = xlat_abivalue(o, &e->u.ret.val, parts);
+ CGABIValue v = xlat_abivalue(r, &aux->val, parts);
w->ret(w, &v);
- break;
}
- case TOP_ALLOCA:
- w->alloca_(w, xlat_op(o, e->u.alloca_.dst),
- xlat_op(o, e->u.alloca_.size), e->u.alloca_.align);
- break;
- case TOP_VA_START:
- w->va_start_(w, xlat_op(o, e->u.va_se.ap));
- break;
- case TOP_VA_ARG:
- w->va_arg_(w, xlat_op(o, e->u.va_arg_.dst),
- xlat_op(o, e->u.va_arg_.ap), e->u.va_arg_.ty);
- break;
- case TOP_VA_END:
- w->va_end_(w, xlat_op(o, e->u.va_se.ap));
- break;
- case TOP_VA_COPY:
- w->va_copy_(w, xlat_op(o, e->u.copy.dst), xlat_op(o, e->u.copy.src));
- break;
- case TOP_SETJMP:
- w->setjmp_(w, xlat_op(o, e->u.setjmp_.dst),
- xlat_op(o, e->u.setjmp_.buf));
- break;
- case TOP_LONGJMP:
- w->longjmp_(w, xlat_op(o, e->u.longjmp_.buf),
- xlat_op(o, e->u.longjmp_.val));
- break;
- case TOP_ATOMIC_LOAD:
- w->atomic_load(w, xlat_op(o, e->u.atomic_load.dst),
- xlat_op(o, e->u.atomic_load.addr),
- e->u.atomic_load.mem, e->u.atomic_load.mo);
- break;
- case TOP_ATOMIC_STORE:
- w->atomic_store(w, xlat_op(o, e->u.atomic_store.addr),
- xlat_op(o, e->u.atomic_store.src),
- e->u.atomic_store.mem, e->u.atomic_store.mo);
- break;
- case TOP_ATOMIC_RMW:
- w->atomic_rmw(w, e->u.atomic_rmw.op, xlat_op(o, e->u.atomic_rmw.dst),
- xlat_op(o, e->u.atomic_rmw.addr),
- xlat_op(o, e->u.atomic_rmw.val), e->u.atomic_rmw.mem,
- e->u.atomic_rmw.mo);
- break;
- case TOP_ATOMIC_CAS:
- w->atomic_cas(w, xlat_op(o, e->u.atomic_cas.prior),
- xlat_op(o, e->u.atomic_cas.ok),
- xlat_op(o, e->u.atomic_cas.addr),
- xlat_op(o, e->u.atomic_cas.expected),
- xlat_op(o, e->u.atomic_cas.desired),
- e->u.atomic_cas.mem, e->u.atomic_cas.success,
- e->u.atomic_cas.failure);
- break;
- case TOP_FENCE:
- w->fence(w, e->u.fence.mo);
- break;
- case TOP_INTRINSIC: {
- Operand* dsts = NULL;
- Operand* args = NULL;
- if (e->u.intrinsic.ndst) {
- dsts = arena_array(o->c->tu, Operand, e->u.intrinsic.ndst);
- for (u32 k = 0; k < e->u.intrinsic.ndst; ++k) {
- dsts[k] = xlat_op(o, e->u.intrinsic.dsts[k]);
- }
- }
- if (e->u.intrinsic.narg) {
- args = arena_array(o->c->tu, Operand, e->u.intrinsic.narg);
- for (u32 k = 0; k < e->u.intrinsic.narg; ++k) {
- args[k] = xlat_op(o, e->u.intrinsic.args[k]);
- }
- }
- w->intrinsic(w, e->u.intrinsic.kind, dsts, e->u.intrinsic.ndst, args,
- e->u.intrinsic.narg);
- break;
+ break;
+ }
+ case IR_SCOPE_BEGIN: {
+ IRScopeAux* aux = (IRScopeAux*)in->extra.aux;
+ CGScopeDesc d = aux->desc;
+ d.cond = xlat_op(r, d.cond);
+ if (aux->desc.kind == SCOPE_LOOP || aux->desc.kind == SCOPE_BLOCK) {
+ d.break_label = aux->loop_break_block
+ ? ensure_label(r, aux->loop_break_block)
+ : LABEL_NONE;
+ d.continue_label = aux->loop_continue_block
+ ? ensure_label(r, aux->loop_continue_block)
+ : LABEL_NONE;
}
- case TOP_SET_LOC:
- w->set_loc(w, e->u.set_loc.loc);
- break;
+ CGScope cs = w->scope_begin(w, &d);
+ r->scope_map[aux->scope_id] = cs;
+ break;
+ }
+ case IR_SCOPE_ELSE:
+ w->scope_else(w, r->scope_map[(u32)in->extra.imm]);
+ break;
+ case IR_SCOPE_END:
+ w->scope_end(w, r->scope_map[(u32)in->extra.imm]);
+ break;
+ case IR_BREAK_TO:
+ w->break_to(w, r->scope_map[(u32)in->extra.imm]);
+ break;
+ case IR_CONTINUE_TO:
+ w->continue_to(w, r->scope_map[(u32)in->extra.imm]);
+ break;
+ case IR_ALLOCA: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand size = xlat_op(r, in->opnds[1]);
+ w->alloca_(w, dst, size, (u32)in->extra.imm);
+ break;
+ }
+ case IR_VA_START: {
+ Operand ap = xlat_op(r, in->opnds[0]);
+ w->va_start_(w, ap);
+ break;
+ }
+ case IR_VA_ARG: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand ap = xlat_op(r, in->opnds[1]);
+ const Type* ty = (const Type*)in->extra.aux;
+ w->va_arg_(w, dst, ap, ty);
+ break;
+ }
+ case IR_VA_END: {
+ Operand ap = xlat_op(r, in->opnds[0]);
+ w->va_end_(w, ap);
+ break;
+ }
+ case IR_VA_COPY: {
+ Operand a = xlat_op(r, in->opnds[0]);
+ Operand src = xlat_op(r, in->opnds[1]);
+ w->va_copy_(w, a, src);
+ break;
+ }
+ case IR_SETJMP: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand buf = xlat_op(r, in->opnds[1]);
+ w->setjmp_(w, dst, buf);
+ break;
+ }
+ case IR_LONGJMP: {
+ Operand buf = xlat_op(r, in->opnds[0]);
+ Operand val = xlat_op(r, in->opnds[1]);
+ w->longjmp_(w, buf, val);
+ break;
+ }
+ case IR_ATOMIC_LOAD: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand addr = xlat_op(r, in->opnds[1]);
+ IRAtomicAux* aux = (IRAtomicAux*)in->extra.aux;
+ w->atomic_load(w, dst, addr, aux->mem, aux->mo);
+ break;
+ }
+ case IR_ATOMIC_STORE: {
+ Operand addr = xlat_op(r, in->opnds[0]);
+ Operand src = xlat_op(r, in->opnds[1]);
+ IRAtomicAux* aux = (IRAtomicAux*)in->extra.aux;
+ w->atomic_store(w, addr, src, aux->mem, aux->mo);
+ break;
+ }
+ case IR_ATOMIC_RMW: {
+ Operand dst = xlat_op(r, in->opnds[0]);
+ Operand addr = xlat_op(r, in->opnds[1]);
+ Operand val = xlat_op(r, in->opnds[2]);
+ IRAtomicAux* aux = (IRAtomicAux*)in->extra.aux;
+ w->atomic_rmw(w, (AtomicOp)aux->op, dst, addr, val, aux->mem, aux->mo);
+ break;
+ }
+ case IR_ATOMIC_CAS: {
+ Operand prior = xlat_op(r, in->opnds[0]);
+ Operand ok = xlat_op(r, in->opnds[1]);
+ Operand addr = xlat_op(r, in->opnds[2]);
+ Operand expected = xlat_op(r, in->opnds[3]);
+ Operand desired = xlat_op(r, in->opnds[4]);
+ IRCasAux* aux = (IRCasAux*)in->extra.aux;
+ w->atomic_cas(w, prior, ok, addr, expected, desired, aux->mem,
+ aux->success, aux->failure);
+ break;
+ }
+ case IR_FENCE:
+ w->fence(w, (MemOrder)in->extra.imm);
+ break;
+ case IR_INTRINSIC: {
+ IRIntrinAux* aux = (IRIntrinAux*)in->extra.aux;
+ Operand* dsts =
+ aux->ndst ? arena_array(r->o->f->arena, Operand, aux->ndst) : NULL;
+ Operand* args =
+ aux->narg ? arena_array(r->o->f->arena, Operand, aux->narg) : NULL;
+ for (u32 k = 0; k < aux->ndst; ++k) dsts[k] = xlat_op(r, aux->dsts[k]);
+ for (u32 k = 0; k < aux->narg; ++k) args[k] = xlat_op(r, aux->args[k]);
+ w->intrinsic(w, aux->kind, dsts, aux->ndst, args, aux->narg);
+ break;
}
}
}
-/* ---- printer ---- */
-
-static void wstr(Writer* w, const char* s) {
- size_t n = 0;
- while (s[n]) ++n;
- if (n) w->write(w, s, n);
-}
-
-/* Minimal i64 → decimal formatter. Writes into a 32-byte buffer (enough
- * for INT64_MIN). Returns nothing; the caller hands the buffer to wstr. */
-static void fmt_i64(i64 v, char* out) {
- char tmp[32];
- u32 n = 0;
- u64 u;
- int neg = 0;
- if (v < 0) {
- neg = 1;
- u = (u64)(-(v + 1)) + 1u; /* avoid UB for INT64_MIN */
- } else {
- u = (u64)v;
- }
- do {
- tmp[n++] = (char)('0' + (u % 10u));
- u /= 10u;
- } while (u);
- if (neg) tmp[n++] = '-';
- /* reverse */
- for (u32 i = 0; i < n; ++i) out[i] = tmp[n - 1 - i];
- out[n] = 0;
-}
-
-static void wint(Writer* w, i64 v) {
- char buf[32];
- fmt_i64(v, buf);
- wstr(w, buf);
-}
-
-static const char* binop_name(BinOp op) {
- switch (op) {
- case BO_IADD: return "iadd";
- case BO_ISUB: return "isub";
- case BO_IMUL: return "imul";
- case BO_SDIV: return "sdiv";
- case BO_UDIV: return "udiv";
- case BO_SREM: return "srem";
- case BO_UREM: return "urem";
- case BO_FADD: return "fadd";
- case BO_FSUB: return "fsub";
- case BO_FMUL: return "fmul";
- case BO_FDIV: return "fdiv";
- case BO_AND: return "and";
- case BO_OR: return "or";
- case BO_XOR: return "xor";
- case BO_SHL: return "shl";
- case BO_SHR_S: return "shr_s";
- case BO_SHR_U: return "shr_u";
+static void replay_block(ReplayCtx* r, u32 b) {
+ Func* f = r->o->f;
+ if (b >= f->nblocks) return;
+ ensure_label_placed(r, b);
+ Block* bl = &f->blocks[b];
+ for (u32 i = 0; i < bl->ninsts; ++i) {
+ replay_inst(r, b, &bl->insts[i]);
}
- return "?binop";
}
-static const char* unop_name(UnOp op) {
- switch (op) {
- case UO_NEG: return "neg";
- case UO_NOT: return "not";
- case UO_BNOT: return "bnot";
- }
- return "?unop";
-}
-
-static const char* cmp_name(CmpOp op) {
- switch (op) {
- case CMP_EQ: return "eq";
- case CMP_NE: return "ne";
- case CMP_LT_S: return "lt_s";
- case CMP_LE_S: return "le_s";
- case CMP_GT_S: return "gt_s";
- case CMP_GE_S: return "ge_s";
- case CMP_LT_U: return "lt_u";
- case CMP_LE_U: return "le_u";
- case CMP_GT_U: return "gt_u";
- case CMP_GE_U: return "ge_u";
- case CMP_LT_F: return "lt_f";
- case CMP_LE_F: return "le_f";
- case CMP_GT_F: return "gt_f";
- case CMP_GE_F: return "ge_f";
- }
- return "?cmp";
-}
+static void replay_func(OptImpl* o) {
+ Func* f = o->f;
+ CGTarget* w = o->target;
-static void print_operand(Writer* w, const Operand* op) {
- switch ((OpKind)op->kind) {
- case OPK_IMM:
- wstr(w, "imm:");
- wint(w, op->v.imm);
- return;
- case OPK_REG:
- wstr(w, "v");
- wint(w, (i64)op->v.reg);
- return;
- case OPK_LOCAL:
- wstr(w, "fs");
- wint(w, (i64)op->v.frame_slot);
- return;
- case OPK_GLOBAL:
- wstr(w, "sym");
- wint(w, (i64)op->v.global.sym);
- if (op->v.global.addend) {
- wstr(w, "+");
- wint(w, op->v.global.addend);
- }
- return;
- case OPK_INDIRECT:
- wstr(w, "[v");
- wint(w, (i64)op->v.ind.base);
- if (op->v.ind.ofs) {
- wstr(w, "+");
- wint(w, op->v.ind.ofs);
- }
- wstr(w, "]");
- return;
+ ReplayCtx r;
+ r.o = o;
+ r.tgt = w;
+ u32 nv = f->nvals ? f->nvals : 1u;
+ r.val_to_reg = arena_zarray(f->arena, Reg, nv);
+ for (u32 i = 0; i < nv; ++i) r.val_to_reg[i] = REG_NONE;
+ r.val_alloced = arena_zarray(f->arena, u8, nv);
+ r.slot_map = arena_zarray(f->arena, FrameSlot, f->nframe_slots + 1u);
+ for (u32 i = 0; i <= f->nframe_slots; ++i) r.slot_map[i] = FRAME_SLOT_NONE;
+ u32 nb = f->nblocks ? f->nblocks : 1u;
+ r.label_map = arena_zarray(f->arena, Label, nb);
+ for (u32 i = 0; i < f->nblocks; ++i) r.label_map[i] = LABEL_NONE;
+ r.scope_map = arena_zarray(f->arena, CGScope, f->nscopes + 1u);
+ for (u32 i = 0; i <= f->nscopes; ++i) r.scope_map[i] = CG_SCOPE_NONE;
+ r.block_label_placed = arena_zarray(f->arena, u8, nb);
+
+ /* Call func_begin with the recorded descriptor. The desc.params[].slot
+ * fields hold wrapper IR slot ids, not target slots; aarch64's func_begin
+ * doesn't dereference them, so they are passed through untranslated. */
+ w->func_begin(w, &f->desc);
+
+ for (u32 i = 0; i < f->nframe_slots; ++i) {
+ IRFrameSlot* s = &f->frame_slots[i];
+ FrameSlotDesc d = {0};
+ d.type = s->type;
+ d.name = s->name;
+ d.loc = s->loc;
+ d.size = s->size;
+ d.align = s->align;
+ d.kind = s->kind;
+ d.flags = s->flags;
+ r.slot_map[s->id] = w->frame_slot(w, &d);
}
- wstr(w, "?op");
-}
-static void print_tape(OptImpl* o, Writer* w) {
- for (u32 i = 0; i < o->ntape; ++i) {
- TapeEntry* e = &o->tape[i];
- if (e->dead) {
- wstr(w, " ; dead\n");
- continue;
- }
- wstr(w, " ");
- switch ((TapeOpKind)e->op) {
- case TOP_FUNC_BEGIN:
- wstr(w, "func_begin sym=");
- wint(w, (i64)e->u.func_begin.desc.sym);
- wstr(w, " nparams=");
- wint(w, (i64)e->u.func_begin.desc.nparams);
- break;
- case TOP_FUNC_END:
- wstr(w, "func_end");
- break;
- case TOP_ALLOC_REG:
- wstr(w, "alloc_reg v");
- wint(w, (i64)e->u.alloc_reg.vreg);
- wstr(w, " cls=");
- wint(w, (i64)e->u.alloc_reg.cls);
- break;
- case TOP_FRAME_SLOT:
- wstr(w, "frame_slot fs");
- wint(w, (i64)e->u.frame_slot.vslot);
- wstr(w, " size=");
- wint(w, (i64)e->u.frame_slot.desc.size);
- wstr(w, " kind=");
- wint(w, (i64)e->u.frame_slot.desc.kind);
- break;
- case TOP_PARAM:
- wstr(w, "param idx=");
- wint(w, (i64)e->u.param.desc.index);
- wstr(w, " fs=");
- wint(w, (i64)e->u.param.desc.slot);
- break;
- case TOP_LABEL_NEW:
- wstr(w, "label_new L");
- wint(w, (i64)e->u.label_new.vlabel);
- break;
- case TOP_LABEL_PLACE:
- wstr(w, "label_place L");
- wint(w, (i64)e->u.label_op.vlabel);
- break;
- case TOP_JUMP:
- wstr(w, "jump L");
- wint(w, (i64)e->u.label_op.vlabel);
- break;
- case TOP_CMP_BRANCH:
- wstr(w, "cmp_branch ");
- wstr(w, cmp_name(e->u.cmp_branch.op));
- wstr(w, " ");
- print_operand(w, &e->u.cmp_branch.a);
- wstr(w, ", ");
- print_operand(w, &e->u.cmp_branch.b);
- wstr(w, " -> L");
- wint(w, (i64)e->u.cmp_branch.vlabel);
- break;
- case TOP_SCOPE_BEGIN:
- wstr(w, "scope_begin S");
- wint(w, (i64)e->u.scope_begin.vscope);
- wstr(w, " kind=");
- wint(w, (i64)e->u.scope_begin.desc.kind);
- break;
- case TOP_SCOPE_ELSE:
- wstr(w, "scope_else S");
- wint(w, (i64)e->u.scope_op.vscope);
- break;
- case TOP_SCOPE_END:
- wstr(w, "scope_end S");
- wint(w, (i64)e->u.scope_op.vscope);
- break;
- case TOP_BREAK_TO:
- wstr(w, "break_to S");
- wint(w, (i64)e->u.scope_op.vscope);
- break;
- case TOP_CONTINUE_TO:
- wstr(w, "continue_to S");
- wint(w, (i64)e->u.scope_op.vscope);
- break;
- case TOP_LOAD_IMM:
- wstr(w, "load_imm ");
- print_operand(w, &e->u.load_imm.dst);
- wstr(w, ", ");
- wint(w, e->u.load_imm.imm);
- break;
- case TOP_LOAD_CONST:
- wstr(w, "load_const ");
- print_operand(w, &e->u.load_const.dst);
- wstr(w, ", <bytes:");
- wint(w, (i64)e->u.load_const.cb.size);
- wstr(w, ">");
- break;
- case TOP_COPY:
- wstr(w, "copy ");
- print_operand(w, &e->u.copy.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.copy.src);
- break;
- case TOP_LOAD:
- wstr(w, "load ");
- print_operand(w, &e->u.load.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.load.addr);
- break;
- case TOP_STORE:
- wstr(w, "store ");
- print_operand(w, &e->u.store.addr);
- wstr(w, ", ");
- print_operand(w, &e->u.store.src);
- break;
- case TOP_ADDR_OF:
- wstr(w, "addr_of ");
- print_operand(w, &e->u.copy.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.copy.src);
- break;
- case TOP_TLS_ADDR_OF:
- wstr(w, "tls_addr_of ");
- print_operand(w, &e->u.tls_addr_of.dst);
- wstr(w, ", sym");
- wint(w, (i64)e->u.tls_addr_of.sym);
- break;
- case TOP_COPY_BYTES:
- wstr(w, "copy_bytes ");
- print_operand(w, &e->u.agg.a);
- wstr(w, ", ");
- print_operand(w, &e->u.agg.b);
- wstr(w, " size=");
- wint(w, (i64)e->u.agg.agg.size);
- break;
- case TOP_SET_BYTES:
- wstr(w, "set_bytes ");
- print_operand(w, &e->u.agg.a);
- wstr(w, ", ");
- print_operand(w, &e->u.agg.b);
- wstr(w, " size=");
- wint(w, (i64)e->u.agg.agg.size);
- break;
- case TOP_BITFIELD_LOAD:
- wstr(w, "bitfield_load ");
- print_operand(w, &e->u.bitfield_load.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.bitfield_load.record);
- break;
- case TOP_BITFIELD_STORE:
- wstr(w, "bitfield_store ");
- print_operand(w, &e->u.bitfield_store.record);
- wstr(w, ", ");
- print_operand(w, &e->u.bitfield_store.src);
- break;
- case TOP_BINOP:
- wstr(w, binop_name(e->u.binop.op));
- wstr(w, " ");
- print_operand(w, &e->u.binop.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.binop.a);
- wstr(w, ", ");
- print_operand(w, &e->u.binop.b);
- break;
- case TOP_UNOP:
- wstr(w, unop_name(e->u.unop.op));
- wstr(w, " ");
- print_operand(w, &e->u.unop.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.unop.a);
- break;
- case TOP_CMP:
- wstr(w, "cmp.");
- wstr(w, cmp_name(e->u.cmp.op));
- wstr(w, " ");
- print_operand(w, &e->u.cmp.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.cmp.a);
- wstr(w, ", ");
- print_operand(w, &e->u.cmp.b);
- break;
- case TOP_CONVERT:
- wstr(w, "convert ");
- print_operand(w, &e->u.convert.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.convert.src);
- wstr(w, " kind=");
- wint(w, (i64)e->u.convert.kind);
- break;
- case TOP_CALL:
- wstr(w, "call ");
- print_operand(w, &e->u.call.desc.callee);
- wstr(w, " nargs=");
- wint(w, (i64)e->u.call.desc.nargs);
- break;
- case TOP_RET:
- wstr(w, "ret");
- if (e->u.ret.present) {
- wstr(w, " ");
- print_operand(w, &e->u.ret.val.storage);
- }
- break;
- case TOP_ALLOCA:
- wstr(w, "alloca ");
- print_operand(w, &e->u.alloca_.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.alloca_.size);
- break;
- case TOP_VA_START:
- wstr(w, "va_start ");
- print_operand(w, &e->u.va_se.ap);
- break;
- case TOP_VA_ARG:
- wstr(w, "va_arg ");
- print_operand(w, &e->u.va_arg_.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.va_arg_.ap);
- break;
- case TOP_VA_END:
- wstr(w, "va_end ");
- print_operand(w, &e->u.va_se.ap);
- break;
- case TOP_VA_COPY:
- wstr(w, "va_copy ");
- print_operand(w, &e->u.copy.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.copy.src);
- break;
- case TOP_SETJMP:
- wstr(w, "setjmp ");
- print_operand(w, &e->u.setjmp_.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.setjmp_.buf);
- break;
- case TOP_LONGJMP:
- wstr(w, "longjmp ");
- print_operand(w, &e->u.longjmp_.buf);
- wstr(w, ", ");
- print_operand(w, &e->u.longjmp_.val);
- break;
- case TOP_ATOMIC_LOAD:
- wstr(w, "atomic_load ");
- print_operand(w, &e->u.atomic_load.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.atomic_load.addr);
- break;
- case TOP_ATOMIC_STORE:
- wstr(w, "atomic_store ");
- print_operand(w, &e->u.atomic_store.addr);
- wstr(w, ", ");
- print_operand(w, &e->u.atomic_store.src);
- break;
- case TOP_ATOMIC_RMW:
- wstr(w, "atomic_rmw op=");
- wint(w, (i64)e->u.atomic_rmw.op);
- wstr(w, " ");
- print_operand(w, &e->u.atomic_rmw.dst);
- wstr(w, ", ");
- print_operand(w, &e->u.atomic_rmw.addr);
- wstr(w, ", ");
- print_operand(w, &e->u.atomic_rmw.val);
- break;
- case TOP_ATOMIC_CAS:
- wstr(w, "atomic_cas prior=");
- print_operand(w, &e->u.atomic_cas.prior);
- wstr(w, " ok=");
- print_operand(w, &e->u.atomic_cas.ok);
- wstr(w, " addr=");
- print_operand(w, &e->u.atomic_cas.addr);
- break;
- case TOP_FENCE:
- wstr(w, "fence mo=");
- wint(w, (i64)e->u.fence.mo);
- break;
- case TOP_INTRINSIC:
- wstr(w, "intrinsic kind=");
- wint(w, (i64)e->u.intrinsic.kind);
- wstr(w, " ndst=");
- wint(w, (i64)e->u.intrinsic.ndst);
- wstr(w, " narg=");
- wint(w, (i64)e->u.intrinsic.narg);
- break;
- case TOP_SET_LOC:
- wstr(w, "set_loc ");
- wint(w, (i64)e->u.set_loc.loc.line);
- wstr(w, ":");
- wint(w, (i64)e->u.set_loc.loc.col);
- break;
- }
- wstr(w, "\n");
+ for (u32 i = 0; i < f->nparams; ++i) {
+ IRParam* p = &f->params[i];
+ CGParamDesc d = {0};
+ d.index = p->index;
+ d.name = p->name;
+ d.type = p->type;
+ d.slot = slot_to_target(&r, p->slot);
+ d.abi = p->abi;
+ d.loc = p->loc;
+ w->param(w, &d);
}
-}
-/* ---- Phase 2 peephole: integer constant folding ----
- *
- * Pattern: LOAD_IMM(V_a, k_a); LOAD_IMM(V_b, k_b); BINOP(op, V_d, V_a, V_b)
- * with op ∈ {IADD, ISUB, IMUL}.
- * After: the BINOP is rewritten to LOAD_IMM(V_d, k_a OP k_b).
- *
- * Both operands must be OPK_REG referencing wrapper vregs whose only
- * recorded definition was a LOAD_IMM. The intermediate LOAD_IMMs are
- * left in place — they may have other uses, and DCE is a Phase 3
- * concern.
- *
- * Folding is done in 64-bit signed arithmetic and truncated by the
- * target's load_imm based on the destination type. This matches C11
- * §6.5/3 ("two's-complement wraparound at the abstract machine level
- * for signed and unsigned integer types alike" per cfree's no-UB
- * stance — see doc/DESIGN.md §9). */
-
-typedef struct ImmInfo {
- i64 val;
- u8 known;
-} ImmInfo;
-
-static void peephole_constfold(OptImpl* o) {
- ImmInfo* imm;
- u32 cap;
-
- if (o->next_vreg <= 1) return;
- cap = o->next_vreg;
- imm = arena_zarray(o->c->tu, ImmInfo, cap);
-
- for (u32 i = 0; i < o->ntape; ++i) {
- TapeEntry* e = &o->tape[i];
- if (e->dead) continue;
- switch ((TapeOpKind)e->op) {
- case TOP_LOAD_IMM:
- if (e->u.load_imm.dst.kind == OPK_REG) {
- Reg r = e->u.load_imm.dst.v.reg;
- if (r < cap) {
- imm[r].val = e->u.load_imm.imm;
- imm[r].known = 1;
- }
- }
- break;
- case TOP_BINOP: {
- Operand a = e->u.binop.a;
- Operand b = e->u.binop.b;
- BinOp op = e->u.binop.op;
- if (a.kind != OPK_REG || b.kind != OPK_REG) break;
- if (a.v.reg >= cap || b.v.reg >= cap) break;
- if (!imm[a.v.reg].known || !imm[b.v.reg].known) break;
- if (op != BO_IADD && op != BO_ISUB && op != BO_IMUL) break;
-
- i64 av = imm[a.v.reg].val;
- i64 bv = imm[b.v.reg].val;
- u64 folded;
- /* Compute in u64 to make wraparound deterministic, then cast
- * back. cfree's no-UB stance forbids signed-overflow-is-UB
- * exploitation (doc/DESIGN.md §9), so this is the right shape. */
- switch (op) {
- case BO_IADD: folded = (u64)av + (u64)bv; break;
- case BO_ISUB: folded = (u64)av - (u64)bv; break;
- case BO_IMUL: folded = (u64)av * (u64)bv; break;
- default: continue;
- }
-
- Operand dst = e->u.binop.dst;
- memset(&e->u, 0, sizeof e->u);
- e->op = (u8)TOP_LOAD_IMM;
- e->u.load_imm.dst = dst;
- e->u.load_imm.imm = (i64)folded;
- if (dst.kind == OPK_REG && dst.v.reg < cap) {
- imm[dst.v.reg].val = (i64)folded;
- imm[dst.v.reg].known = 1;
- }
- break;
- }
- default:
- break;
- }
+ /* Body in emit order — the order CG's emit cursor visited each
+ * block. Block-creation order can differ when label_new precedes a
+ * cmp_branch whose fallthrough block must physically follow. */
+ for (u32 i = 0; i < f->emit_order_n; ++i) {
+ replay_block(&r, f->emit_order[i]);
}
+
+ w->func_end(w);
}
-/* ---- func_end: append TOP_FUNC_END, run peepholes, replay ---- */
+/* ---- func_end: optionally run dry-run passes; replay; reset ---- */
static void w_func_end(CGTarget* t) {
OptImpl* o = impl_of(t);
- tape_append(o, TOP_FUNC_END);
- peephole_constfold(o);
- if (o->dump_writer) print_tape(o, o->dump_writer);
- replay(o);
-}
+ if (!o->f) return;
-/* ---- public API: dump writer ---- */
+ if (o->level >= 2) {
+ opt_build_cfg(o->f);
+ opt_build_ssa(o->f);
+ }
-void opt_set_dump_writer(CGTarget* t, Writer* w) {
- /* Identify our own targets by the func_begin slot. Anything else is
- * a non-opt CGTarget and the call is a silent no-op. */
- if (!t || t->func_begin != w_func_begin) return;
- impl_of(t)->dump_writer = w;
+ replay_func(o);
+ o->f = NULL;
+ o->cur = 0;
}
-/* ---- end-of-TU and destruction ---- */
+/* ---- finalize / destroy ---- */
static void w_finalize(CGTarget* t) {
CGTarget* wr = impl_of(t)->target;
@@ -1779,12 +1180,16 @@ static void w_destroy(CGTarget* t) {
if (wr->destroy) wr->destroy(wr);
}
+/* ---- public dump-writer API ---- */
+
+void opt_set_dump_writer(CGTarget* t, Writer* w) {
+ if (!t || t->func_begin != w_func_begin) return;
+ impl_of(t)->dump_writer = w;
+}
+
/* ---- construction ---- */
CGTarget* opt_cgtarget_new(Compiler* c, CGTarget* target, int level) {
- OptImpl* o;
- CGTarget* t;
-
if (!target) {
SrcLoc loc = {0, 0, 0};
compiler_panic(c, loc, "opt_cgtarget_new: target is NULL");
@@ -1795,13 +1200,13 @@ CGTarget* opt_cgtarget_new(Compiler* c, CGTarget* target, int level) {
level);
}
- o = arena_new(c->tu, OptImpl);
+ OptImpl* o = arena_new(c->tu, OptImpl);
memset(o, 0, sizeof *o);
o->c = c;
o->target = target;
o->level = level;
- t = &o->base;
+ CGTarget* t = &o->base;
t->c = c;
t->obj = target->obj;
t->mc = target->mc;
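The replay set-up in this file's diff fills its translation tables with sentinel values (`FRAME_SLOT_NONE`, `LABEL_NONE`, `CG_SCOPE_NONE`), and the commit message says target Regs are allocated lazily on first def. A minimal sketch of that lazy-translation idea follows. `DemoTarget`, `DemoReplay`, `demo_alloc_reg`, and `demo_reg_for` are hypothetical stand-ins, not the patch's types. The sketch also folds the patch's separate `val_alloced` flag array into a sentinel entry in the map itself, which is an assumption made for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_REG_NONE 0u /* sentinel: no target reg assigned yet */

/* Hypothetical wrapped target: hands out registers 1, 2, 3, ... */
typedef struct DemoTarget {
  uint32_t next_reg;
} DemoTarget;

static uint32_t demo_alloc_reg(DemoTarget* t) { return ++t->next_reg; }

/* Replay state: recorded value id -> target register. */
typedef struct DemoReplay {
  DemoTarget* target;
  uint32_t* reg_map; /* DEMO_REG_NONE until the value is first seen */
  uint32_t nvals;
} DemoReplay;

/* Returns 0 on success, -1 if the map cannot be allocated. */
int demo_replay_init(DemoReplay* r, DemoTarget* t, uint32_t nvals) {
  r->target = t;
  r->nvals = nvals;
  /* calloc zero-fills, and zero is DEMO_REG_NONE. */
  r->reg_map = calloc(nvals ? nvals : 1u, sizeof *r->reg_map);
  return r->reg_map ? 0 : -1;
}

/* Translate a recorded value id, allocating a target reg on first use. */
uint32_t demo_reg_for(DemoReplay* r, uint32_t val) {
  assert(val < r->nvals);
  if (r->reg_map[val] == DEMO_REG_NONE)
    r->reg_map[val] = demo_alloc_reg(r->target);
  return r->reg_map[val];
}

void demo_replay_free(DemoReplay* r) {
  free(r->reg_map);
  r->reg_map = NULL;
}
```

With this shape, the target only sees allocations for values that replay actually touches, and repeated references to the same recorded id resolve to the same target register.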
diff --git a/src/opt/pass_cfg.c b/src/opt/pass_cfg.c
@@ -0,0 +1,116 @@
+/* pass_cfg.c — derive Block.preds and Block.succ/nsucc from each
+ * block's terminator. doc/OPT.md §3 Phase 3.
+ *
+ * Terminator inventory:
+ * IR_BR: 1 succ (succ[0])
+ * IR_CONDBR: 2 succs ([true, false])
+ * IR_CMP_BRANCH: 2 succs ([taken, fallthrough])
+ * IR_RET: 0 succs
+ * IR_LONGJMP: 0 succs
+ * IR_INTRINSIC TRAP/UNREACHABLE: 0 succs
+ * IR_BREAK_TO / IR_CONTINUE_TO: 1 succ (the scope's break/continue
+ * label; the recorder resolves it
+ * to a block id and stores it in
+ * succ[0] at emit time, and this
+ * pass only sets nsucc = 1)
+ *
+ * IR_SETJMP is a control barrier: the recorder splits its block but
+ * IR_SETJMP itself falls through. pass_cfg sees it as a normal inst.
+ *
+ * For scope ops the wrapper's recording assigns succ[] at emit time
+ * (since it owns the vlabel→block_id mapping). pass_cfg trusts succ[]
+ * as recorded and only resets nsucc from the trailing terminator
+ * inst; a block with no trailing terminator gets nsucc = 0. */
+
+#include "opt/ir.h"
+#include "opt/opt.h"
+
+#include <string.h>
+
+#include "core/arena.h"
+#include "core/core.h"
+
+static int is_terminator(const Inst* in) {
+ switch ((IROp)in->op) {
+ case IR_BR:
+ case IR_CONDBR:
+ case IR_CMP_BRANCH:
+ case IR_RET:
+ case IR_LONGJMP:
+ case IR_BREAK_TO:
+ case IR_CONTINUE_TO:
+ return 1;
+ case IR_INTRINSIC:
+ return in->extra.imm == INTRIN_TRAP ||
+ in->extra.imm == INTRIN_UNREACHABLE;
+ default:
+ return 0;
+ }
+}
+
+void opt_build_cfg(Func* f) {
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ f->blocks[b].preds = NULL;
+ f->blocks[b].npreds = 0;
+ }
+
+ /* Trust the recorder's succ[] entries for branching terminators
+ * (IR_BR, IR_CONDBR, IR_CMP_BRANCH, IR_BREAK_TO, IR_CONTINUE_TO): the
+ * inst alone does not name the target blocks. Only nsucc is reset
+ * here, because the op tag alone fixes the successor count. */
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ Block* bl = &f->blocks[b];
+ if (bl->ninsts == 0) {
+ bl->nsucc = 0;
+ continue;
+ }
+ const Inst* last = &bl->insts[bl->ninsts - 1];
+ if (!is_terminator(last)) {
+ bl->nsucc = 0;
+ continue;
+ }
+ switch ((IROp)last->op) {
+ case IR_RET:
+ case IR_LONGJMP:
+ bl->nsucc = 0;
+ break;
+ case IR_INTRINSIC:
+ bl->nsucc = 0;
+ break;
+ case IR_BR:
+ case IR_BREAK_TO:
+ case IR_CONTINUE_TO:
+ bl->nsucc = 1;
+ break;
+ case IR_CONDBR:
+ case IR_CMP_BRANCH:
+ bl->nsucc = 2;
+ break;
+ default:
+ break;
+ }
+ }
+
+ /* Count predecessors. */
+ u32* counts = arena_zarray(f->arena, u32, f->nblocks);
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ Block* bl = &f->blocks[b];
+ for (u32 s = 0; s < bl->nsucc; ++s) {
+ u32 t = bl->succ[s];
+ if (t < f->nblocks) counts[t]++;
+ }
+ }
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ if (counts[b]) {
+ f->blocks[b].preds = arena_array(f->arena, u32, counts[b]);
+ }
+ }
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ Block* bl = &f->blocks[b];
+ for (u32 s = 0; s < bl->nsucc; ++s) {
+ u32 t = bl->succ[s];
+ if (t >= f->nblocks) continue;
+ f->blocks[t].preds[f->blocks[t].npreds++] = b;
+ }
+ }
+}
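The predecessor rebuild in opt_build_cfg is a count, allocate, fill pass over the successor arrays. Below is a self-contained sketch of the same idea, offered as an illustration only. `DemoBlock`, `demo_build_preds`, and `demo_free_preds` are hypothetical names, and the sketch uses malloc where the patch uses the function arena.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for Block; only the CFG fields are kept. */
typedef struct DemoBlock {
  uint32_t succ[2];
  uint32_t nsucc;
  uint32_t* preds;
  uint32_t npreds;
} DemoBlock;

/* Release every preds array and reset the counts. */
void demo_free_preds(DemoBlock* blocks, uint32_t n) {
  for (uint32_t b = 0; b < n; ++b) {
    free(blocks[b].preds); /* free(NULL) is a no-op */
    blocks[b].preds = NULL;
    blocks[b].npreds = 0;
  }
}

/* Rebuild preds for n blocks from succ[]/nsucc. Any preds arrays from
 * an earlier call must be released by the caller first. Returns 0 on
 * success, -1 on allocation failure. */
int demo_build_preds(DemoBlock* blocks, uint32_t n) {
  uint32_t* counts = calloc(n ? n : 1u, sizeof *counts);
  if (!counts) return -1;
  for (uint32_t b = 0; b < n; ++b) {
    blocks[b].preds = NULL;
    blocks[b].npreds = 0;
  }
  /* Pass 1: count incoming edges, ignoring out-of-range block ids. */
  for (uint32_t b = 0; b < n; ++b)
    for (uint32_t s = 0; s < blocks[b].nsucc; ++s)
      if (blocks[b].succ[s] < n) counts[blocks[b].succ[s]]++;
  /* Pass 2: allocate exactly-sized predecessor arrays. */
  for (uint32_t b = 0; b < n; ++b) {
    if (!counts[b]) continue;
    blocks[b].preds = malloc(counts[b] * sizeof *blocks[b].preds);
    if (!blocks[b].preds) {
      demo_free_preds(blocks, n);
      free(counts);
      return -1;
    }
  }
  /* Pass 3: fill, using npreds as the write cursor. */
  for (uint32_t b = 0; b < n; ++b) {
    for (uint32_t s = 0; s < blocks[b].nsucc; ++s) {
      uint32_t t = blocks[b].succ[s];
      if (t >= n) continue;
      blocks[t].preds[blocks[t].npreds++] = b;
    }
  }
  free(counts);
  return 0;
}
```

Because blocks are visited in id order during the fill, each preds array comes out sorted by predecessor id. That is a property of this sketch's traversal order, not something the patch states as a guarantee.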
diff --git a/src/opt/pass_ssa.c b/src/opt/pass_ssa.c
@@ -0,0 +1,427 @@
+/* pass_ssa.c — mem2reg + dominance-frontier SSA construction.
+ *
+ * Goal for Phase 3 (doc/OPT.md): build SSA without consuming it. The
+ * output is discarded before replay, so this pass's job is shape
+ * checking — no panics on the corpus.
+ *
+ * Algorithm (Cooper-Harvey-Kennedy iterative dominators + Cytron et al.
+ * dominance-frontier phi insertion):
+ *
+ * 1. Postorder + reverse-postorder traversal of the CFG.
+ * 2. Compute idom[] iteratively via the two-finger intersect.
+ * 3. Compute DF[] from idom[] in one pass.
+ * 4. For each promotable FrameSlot (FS_LOCAL; no FSF_ADDR_TAKEN or
+ * FSF_VOLATILE), compute the iterated dominance frontier of its
+ * defining blocks; insert IR_PHI at the start of each IDF block.
+ * 5. Rename: DFS the dominator tree, maintain a stack per slot; on
+ * store push the stored Val, on load record (load_def → top) into
+ * a rename map, on each successor fill in this block's slot in
+ * that successor's phis. After processing children, pop.
+ *
+ * The dry-run does not build the rename map or rewrite uses in other
+ * instructions: a load only reads the top of its slot's stack. The
+ * rewrite is the part that mutates the IR for downstream passes, and
+ * Phase 3 discards the IR. The phi-insertion + slot-stack walk is the
+ * part that exercises the IR shape, which is what the dry-run checks. */
+
+#include "opt/ir.h"
+#include "opt/opt.h"
+
+#include <string.h>
+
+#include "core/arena.h"
+#include "core/core.h"
+
+#define BLK_NONE 0xffffffffu
+
+/* ---- postorder ---- */
+
+typedef struct PostorderCtx {
+ Func* f;
+ u32* po; /* po[i] = block id at postorder position i */
+ u32* po_idx; /* po_idx[block] = postorder position of block */
+ u8* visited;
+ u32 count;
+} PostorderCtx;
+
+static void postorder_dfs(PostorderCtx* ctx, u32 b) {
+ if (ctx->visited[b]) return;
+ ctx->visited[b] = 1;
+ Block* bl = &ctx->f->blocks[b];
+ for (u32 s = 0; s < bl->nsucc; ++s) {
+ u32 t = bl->succ[s];
+ if (t < ctx->f->nblocks) postorder_dfs(ctx, t);
+ }
+ ctx->po[ctx->count] = b;
+ ctx->po_idx[b] = ctx->count;
+ ctx->count++;
+}
+
+/* ---- dominators (Cooper-Harvey-Kennedy) ---- */
+
+static u32 dom_intersect(u32 b1, u32 b2, const u32* idom, const u32* po_idx) {
+ while (b1 != b2) {
+ while (po_idx[b1] < po_idx[b2]) b1 = idom[b1];
+ while (po_idx[b2] < po_idx[b1]) b2 = idom[b2];
+ }
+ return b1;
+}
+
+static u32* compute_idom(Func* f, const u32* po, const u32* po_idx, u32 ncount,
+ u32 entry) {
+ u32* idom = arena_array(f->arena, u32, f->nblocks);
+ for (u32 b = 0; b < f->nblocks; ++b) idom[b] = BLK_NONE;
+ idom[entry] = entry;
+
+ int changed = 1;
+ while (changed) {
+ changed = 0;
+ /* Reverse postorder, skip the entry block. */
+ for (i32 i = (i32)ncount - 1; i >= 0; --i) {
+ u32 b = po[i];
+ if (b == entry) continue;
+ Block* bl = &f->blocks[b];
+ u32 new_idom = BLK_NONE;
+ for (u32 p = 0; p < bl->npreds; ++p) {
+ u32 pp = bl->preds[p];
+ if (idom[pp] != BLK_NONE) {
+ new_idom = (new_idom == BLK_NONE)
+ ? pp
+ : dom_intersect(pp, new_idom, idom, po_idx);
+ }
+ }
+ if (new_idom != BLK_NONE && idom[b] != new_idom) {
+ idom[b] = new_idom;
+ changed = 1;
+ }
+ }
+ }
+ return idom;
+}
+
+/* ---- dominance frontier ---- */
+
+typedef struct DfSet {
+ u32* members;
+ u32 n, cap;
+} DfSet;
+
+static void df_add(Arena* a, DfSet* s, u32 b) {
+ for (u32 i = 0; i < s->n; ++i)
+ if (s->members[i] == b) return;
+ if (s->n == s->cap) {
+ u32 ncap = s->cap ? s->cap * 2u : 4u;
+ u32* nb = arena_array(a, u32, ncap);
+ if (s->members) memcpy(nb, s->members, sizeof(u32) * s->n);
+ s->members = nb;
+ s->cap = ncap;
+ }
+ s->members[s->n++] = b;
+}
+
+static DfSet* compute_df(Func* f, const u32* idom) {
+ DfSet* df = arena_zarray(f->arena, DfSet, f->nblocks);
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ Block* bl = &f->blocks[b];
+ if (bl->npreds < 2) continue;
+ if (idom[b] == BLK_NONE) continue;
+ for (u32 p = 0; p < bl->npreds; ++p) {
+ u32 runner = bl->preds[p];
+ while (runner != idom[b] && runner != BLK_NONE) {
+ df_add(f->arena, &df[runner], b);
+ runner = idom[runner];
+ }
+ }
+ }
+ return df;
+}
+
+/* ---- promotable slots ---- */
+
+/* Identify the FrameSlot a load/store address operand refers to.
+ * Returns 0 if the operand is not OPK_LOCAL (i.e., not a direct
+ * frame-slot reference) — those addresses route through computed Val
+ * pointers and cannot be promoted by a slot-keyed pass. */
+static u32 opnd_slot_id(const Operand* op) {
+ if (op->kind != OPK_LOCAL) return 0;
+ return (u32)op->v.frame_slot;
+}
+
+static int slot_promotable(const Func* f, u32 slot_id) {
+ if (slot_id == 0 || slot_id > f->nframe_slots) return 0;
+ const IRFrameSlot* s = &f->frame_slots[slot_id - 1];
+ if (s->flags & FSF_ADDR_TAKEN) return 0;
+ if (s->flags & FSF_VOLATILE) return 0;
+ /* Only locals are promotable for v1; params live in slots too but
+ * promoting them is a separate transform. */
+ if (s->kind != FS_LOCAL) return 0;
+ return 1;
+}
+
+/* ---- phi insertion ---- */
+
+/* Insert n_new phi instructions at the start of block b. Each phi
+ * carries an IRPhiAux (pred blocks + pred values) in extra.aux and
+ * has no operands; its slot id comes from phi_slots[i] and is
+ * written to extra.imm before extra.aux is assigned. */
+static void insert_phis(Func* f, u32 b, u32 n_new, const u32* phi_slots,
+ const u32* phi_blocks_for_slot) {
+ if (!n_new) return;
+ Block* bl = &f->blocks[b];
+ u32 old = bl->ninsts;
+ u32 nnew = old + n_new;
+ Inst* nb = arena_zarray(f->arena, Inst, nnew);
+ /* Phis go first. */
+ for (u32 i = 0; i < n_new; ++i) {
+ Inst* in = &nb[i];
+ u32 slot_id = phi_slots[i];
+ const IRFrameSlot* s = &f->frame_slots[slot_id - 1];
+ in->op = IR_PHI;
+ in->type = s->type;
+ in->extra.imm = (i64)slot_id;
+ /* Allocate IRPhiAux with one slot per pred, initialized to
+ * VAL_NONE. The rename pass fills pred_vals later. */
+ IRPhiAux* aux = arena_znew(f->arena, IRPhiAux);
+ aux->npreds = bl->npreds;
+ if (bl->npreds) {
+ aux->pred_blocks = arena_array(f->arena, u32, bl->npreds);
+ aux->pred_vals = arena_zarray(f->arena, Val, bl->npreds);
+ memcpy(aux->pred_blocks, bl->preds, sizeof(u32) * bl->npreds);
+ }
+ /* Carry the IRPhiAux through the extra union and give the phi no
+ * operands. Stashing the aux pointer in opnds[0] would break the
+ * operand type, and a side table rooted on the inst would mean
+ * altering the Inst layout.
+ *
+ * Caveat: if extra.imm and extra.aux share storage (extra is a
+ * union), the assignment below overwrites the slot id stored in
+ * extra.imm above, and rename_dfs's read of extra.imm no longer
+ * yields the slot. A side table keyed by inst would fix that. */
+ in->extra.aux = aux;
+ in->nopnds = 0;
+ in->opnds = NULL;
+ /* def Val: a real phi defines a fresh Val typed as the slot's
+ * type. val_alloc is static, so it cannot be called from this
+ * file, and the block is mid-rebuild here, so an ir_emit_*-style
+ * append does not fit either. Allocation is deferred.
+ *
+ * For the Phase 3 dry-run, leave def = VAL_NONE on phis. The
+ * rename pass uses extra.aux->pred_vals to inspect phi shape,
+ * and the dry-run output is discarded before downstream passes
+ * need def. */
+ in->def = VAL_NONE;
+ }
+ /* Existing instructions shifted right. */
+ if (old) memcpy(nb + n_new, bl->insts, sizeof(Inst) * old);
+ bl->insts = nb;
+ bl->ninsts = nnew;
+ bl->cap = nnew;
+ /* val_def_inst for any val defined in this block has shifted by
+ * n_new. Walk the val table and update. */
+ for (u32 v = 1; v < f->nvals; ++v) {
+ if (f->val_def_block[v] == b) f->val_def_inst[v] += n_new;
+ }
+ (void)phi_blocks_for_slot;
+}
+
+/* ---- rename ---- */
+
+typedef struct SlotStack {
+ Val* stack;
+ u32 n, cap;
+} SlotStack;
+
+static void slot_push(Arena* a, SlotStack* s, Val v) {
+ if (s->n == s->cap) {
+ u32 ncap = s->cap ? s->cap * 2u : 4u;
+ Val* nb = arena_array(a, Val, ncap);
+ if (s->stack) memcpy(nb, s->stack, sizeof(Val) * s->n);
+ s->stack = nb;
+ s->cap = ncap;
+ }
+ s->stack[s->n++] = v;
+}
+
+static Val slot_top(const SlotStack* s) {
+ return s->n ? s->stack[s->n - 1] : VAL_NONE;
+}
+
+static void rename_dfs(Func* f, u32 b, const u32* idom, SlotStack* slots) {
+ Block* bl = &f->blocks[b];
+ /* Track per-slot push count so we can pop on exit. */
+ u32* pushed = arena_zarray(f->arena, u32, f->nframe_slots + 1);
+
+ /* 1. Process phis in this block: each phi has extra.imm = slot id.
+ * Push a synthetic phi value (VAL_NONE in dry-run; a "fresh Val"
+ * once we wire renaming). */
+ for (u32 i = 0; i < bl->ninsts; ++i) {
+ Inst* in = &bl->insts[i];
+ if (in->op != IR_PHI) break;
+ u32 slot_id = (u32)in->extra.imm;
+ if (slot_id == 0 || slot_id > f->nframe_slots) continue;
+ /* For dry-run, the phi's def is VAL_NONE; we still push so the
+ * stack has the right depth. Downstream passes (Phase 4+) will
+ * allocate a real Val here. */
+ slot_push(f->arena, &slots[slot_id], VAL_NONE);
+ pushed[slot_id]++;
+ }
+
+ /* 2. Process the rest of the block. */
+ for (u32 i = 0; i < bl->ninsts; ++i) {
+ Inst* in = &bl->insts[i];
+ if (in->op == IR_PHI) continue;
+ if (in->op == IR_STORE) {
+ /* IR_STORE opnds: [0] = addr, [1] = src. */
+ if (in->nopnds < 2) continue;
+ u32 sid = opnd_slot_id(&in->opnds[0]);
+ if (sid && slot_promotable(f, sid)) {
+ const Operand* src = &in->opnds[1];
+ Val v = (src->kind == OPK_REG) ? (Val)src->v.reg : VAL_NONE;
+ slot_push(f->arena, &slots[sid], v);
+ pushed[sid]++;
+ }
+ continue;
+ }
+ if (in->op == IR_LOAD) {
+ /* IR_LOAD opnds: [0] = dst REG, [1] = addr. */
+ if (in->nopnds < 2) continue;
+ u32 sid = opnd_slot_id(&in->opnds[1]);
+ if (sid && slot_promotable(f, sid)) {
+ /* Touching slot_top exercises the stack invariant — that's
+ * the shape check we want. The rewrite map proper would walk
+ * uses; we skip that in the dry-run (output discarded). */
+ (void)slot_top(&slots[sid]);
+ }
+ continue;
+ }
+ }
+
+ /* 3. Fill our slot in each successor's phis. */
+ for (u32 s = 0; s < bl->nsucc; ++s) {
+ u32 succ = bl->succ[s];
+ if (succ >= f->nblocks) continue;
+ Block* sb = &f->blocks[succ];
+ /* Find which pred index `b` is in succ. */
+ u32 pred_idx = 0;
+ int found = 0;
+ for (u32 p = 0; p < sb->npreds; ++p) {
+ if (sb->preds[p] == b) {
+ pred_idx = p;
+ found = 1;
+ break;
+ }
+ }
+ if (!found) continue;
+ for (u32 i = 0; i < sb->ninsts; ++i) {
+ Inst* in = &sb->insts[i];
+ if (in->op != IR_PHI) break;
+ u32 slot_id = (u32)in->extra.imm;
+ if (slot_id == 0 || slot_id > f->nframe_slots) continue;
+ IRPhiAux* aux = (IRPhiAux*)in->extra.aux;
+ if (!aux || pred_idx >= aux->npreds) continue;
+ aux->pred_vals[pred_idx] = slot_top(&slots[slot_id]);
+ }
+ }
+
+ /* 4. Recurse into immediate dom children. */
+ for (u32 c = 0; c < f->nblocks; ++c) {
+ if (c == b) continue;
+ if (idom[c] == b) rename_dfs(f, c, idom, slots);
+ }
+
+ /* 5. Pop. */
+ for (u32 sid = 0; sid <= f->nframe_slots; ++sid) {
+ for (u32 k = 0; k < pushed[sid]; ++k) {
+ if (slots[sid].n) slots[sid].n--;
+ }
+ }
+}
+
+/* ---- main entry ---- */
+
+void opt_build_ssa(Func* f) {
+ if (f->nblocks == 0) return;
+
+ /* Postorder traversal from entry. */
+ PostorderCtx pctx;
+ pctx.f = f;
+ pctx.po = arena_array(f->arena, u32, f->nblocks);
+ pctx.po_idx = arena_array(f->arena, u32, f->nblocks);
+ pctx.visited = arena_zarray(f->arena, u8, f->nblocks);
+ pctx.count = 0;
+ for (u32 b = 0; b < f->nblocks; ++b) pctx.po_idx[b] = 0;
+ postorder_dfs(&pctx, f->entry);
+ /* Unreachable blocks: never visited. Skip them in the dom analysis;
+ * the dry-run shouldn't crash on them. */
+
+ u32* idom = compute_idom(f, pctx.po, pctx.po_idx, pctx.count, f->entry);
+ DfSet* df = compute_df(f, idom);
+
+ /* For each promotable slot, find defining blocks → iterated DF. */
+ if (f->nframe_slots == 0) return;
+ u8* needs_phi_storage =
+ arena_zarray(f->arena, u8, (f->nframe_slots + 1) * f->nblocks);
+#define NEEDS_PHI(slot, blk) \
+ needs_phi_storage[((slot) * f->nblocks) + (blk)]
+
+ /* Worklist algo per slot. */
+ u32* worklist = arena_array(f->arena, u32, f->nblocks);
+ u8* on_worklist = arena_zarray(f->arena, u8, f->nblocks);
+
+ for (u32 sid = 1; sid <= f->nframe_slots; ++sid) {
+ if (!slot_promotable(f, sid)) continue;
+ /* Reset per-slot worklist state. */
+ for (u32 i = 0; i < f->nblocks; ++i) on_worklist[i] = 0;
+ u32 wn = 0;
+ /* Seed with blocks containing a store to this slot. */
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ Block* bl = &f->blocks[b];
+ for (u32 i = 0; i < bl->ninsts; ++i) {
+ Inst* in = &bl->insts[i];
+ if (in->op == IR_STORE && in->nopnds >= 1 &&
+ opnd_slot_id(&in->opnds[0]) == sid) {
+ if (!on_worklist[b]) {
+ on_worklist[b] = 1;
+ worklist[wn++] = b;
+ }
+ break;
+ }
+ }
+ }
+ /* Iterated DF. */
+ while (wn) {
+ u32 x = worklist[--wn];
+ DfSet* d = &df[x];
+ for (u32 i = 0; i < d->n; ++i) {
+ u32 y = d->members[i];
+ if (NEEDS_PHI(sid, y)) continue;
+ NEEDS_PHI(sid, y) = 1;
+ if (!on_worklist[y]) {
+ on_worklist[y] = 1;
+ worklist[wn++] = y;
+ }
+ }
+ }
+ }
+
+ /* Insert phis per block. */
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ u32 nphi = 0;
+ for (u32 sid = 1; sid <= f->nframe_slots; ++sid) {
+ if (NEEDS_PHI(sid, b)) nphi++;
+ }
+ if (!nphi) continue;
+ u32* slots_arr = arena_array(f->arena, u32, nphi);
+ u32 k = 0;
+ for (u32 sid = 1; sid <= f->nframe_slots; ++sid) {
+ if (NEEDS_PHI(sid, b)) slots_arr[k++] = sid;
+ }
+ insert_phis(f, b, nphi, slots_arr, NULL);
+ }
+
+ /* Rename phase: DFS from entry over the dominator tree. */
+ SlotStack* slots = arena_zarray(f->arena, SlotStack, f->nframe_slots + 1);
+ rename_dfs(f, f->entry, idom, slots);
+
+#undef NEEDS_PHI
+}
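Steps 1 to 3 of the pass_ssa.c header (postorder, iterative immediate dominators with the two-finger intersect, then dominance frontiers from idom) can be sketched on a fixed-capacity graph. This is an illustrative reconstruction under stated assumptions, not the patch's code. `DemoCfg`, `DEMO_MAX`, and the `demo_*` functions are hypothetical, and the frontier sets are stored as a dense bit matrix rather than the patch's growable `DfSet`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAX 16u          /* hypothetical fixed capacity */
#define DEMO_NONE 0xffffffffu /* "no block", like BLK_NONE */

typedef struct DemoCfg {
  uint32_t n; /* number of blocks, <= DEMO_MAX */
  uint32_t succ[DEMO_MAX][2], nsucc[DEMO_MAX];
  uint32_t preds[DEMO_MAX][DEMO_MAX], npreds[DEMO_MAX];
  uint32_t po[DEMO_MAX];     /* postorder position -> block */
  uint32_t po_idx[DEMO_MAX]; /* block -> postorder position */
  uint32_t npo;
  uint8_t visited[DEMO_MAX];
  uint32_t idom[DEMO_MAX];        /* immediate dominator per block */
  uint8_t df[DEMO_MAX][DEMO_MAX]; /* df[x][y] != 0 iff y is in DF(x) */
} DemoCfg;

/* Record the edge from -> to in both directions. */
void demo_add_edge(DemoCfg* g, uint32_t from, uint32_t to) {
  assert(from < g->n && to < g->n);
  assert(g->nsucc[from] < 2 && g->npreds[to] < DEMO_MAX);
  g->succ[from][g->nsucc[from]++] = to;
  g->preds[to][g->npreds[to]++] = from;
}

static void demo_postorder(DemoCfg* g, uint32_t b) {
  if (g->visited[b]) return;
  g->visited[b] = 1;
  for (uint32_t s = 0; s < g->nsucc[b]; ++s) demo_postorder(g, g->succ[b][s]);
  g->po_idx[b] = g->npo;
  g->po[g->npo++] = b;
}

/* Two-finger intersect: walk the lower-numbered finger up the dom tree. */
static uint32_t demo_intersect(const DemoCfg* g, uint32_t a, uint32_t b) {
  while (a != b) {
    while (g->po_idx[a] < g->po_idx[b]) a = g->idom[a];
    while (g->po_idx[b] < g->po_idx[a]) b = g->idom[b];
  }
  return a;
}

/* Iterate to a fixed point in reverse postorder. Unreachable blocks
 * keep idom == DEMO_NONE. */
void demo_dominators(DemoCfg* g, uint32_t entry) {
  assert(g->n <= DEMO_MAX && entry < g->n);
  for (uint32_t b = 0; b < g->n; ++b) {
    g->idom[b] = DEMO_NONE;
    g->visited[b] = 0;
  }
  g->npo = 0;
  demo_postorder(g, entry);
  g->idom[entry] = entry;
  for (int changed = 1; changed;) {
    changed = 0;
    for (uint32_t i = g->npo; i-- > 0;) {
      uint32_t b = g->po[i];
      if (b == entry) continue;
      uint32_t cand = DEMO_NONE;
      for (uint32_t p = 0; p < g->npreds[b]; ++p) {
        uint32_t pp = g->preds[b][p];
        if (g->idom[pp] == DEMO_NONE) continue; /* not processed yet */
        cand = (cand == DEMO_NONE) ? pp : demo_intersect(g, pp, cand);
      }
      if (cand != DEMO_NONE && g->idom[b] != cand) {
        g->idom[b] = cand;
        changed = 1;
      }
    }
  }
}

/* For each join block b, walk every pred up to idom[b], marking b. */
void demo_frontiers(DemoCfg* g) {
  memset(g->df, 0, sizeof g->df);
  for (uint32_t b = 0; b < g->n; ++b) {
    if (g->npreds[b] < 2 || g->idom[b] == DEMO_NONE) continue;
    for (uint32_t p = 0; p < g->npreds[b]; ++p) {
      uint32_t runner = g->preds[b][p];
      while (runner != DEMO_NONE && runner != g->idom[b]) {
        g->df[runner][b] = 1;
        runner = g->idom[runner];
      }
    }
  }
}
```

A caller zero-initializes a `DemoCfg`, sets `n`, adds edges with `demo_add_edge`, then runs `demo_dominators` followed by `demo_frontiers`. The dense matrix keeps membership tests O(1) at the cost of O(n²) memory, which is only reasonable at sketch scale.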
diff --git a/test/cg/run.sh b/test/cg/run.sh
@@ -51,15 +51,18 @@ ALLOW_SKIP="${CFREE_TEST_ALLOW_SKIP:-0}"
# Filters (env vars or positional args; args win):
# $1 / CFREE_TEST_FILTER — substring match against case name
# $2 / CFREE_TEST_PATHS — subset of "DREJ" (default "DREJ")
-# CFREE_OPT_LEVELS — space-separated opt levels to exercise. Default "0 1"
-# so every case is built twice: directly against the
-# backend (level 0) and through the opt_cgtarget
-# wrapper (level 1). Path W (DWARF) only runs at
-# level 0 — opt-level DWARF equivalence is a later
-# phase concern.
+# CFREE_OPT_LEVELS — space-separated opt levels to exercise. Default
+# "0 1 2" so every case is built three ways:
+# directly against the backend (level 0), and
+# through the opt_cgtarget wrapper at levels 1 and
+# 2. Level 2 currently runs the Phase 3 dry-run
+# passes (build_cfg + build_ssa) and discards their
+# output before replay, so behavior must match level 1.
+# Path W (DWARF) only runs at level 0 — opt-level
+# DWARF equivalence is a later phase concern.
FILTER="${1:-${CFREE_TEST_FILTER:-}}"
PATHS="${2:-${CFREE_TEST_PATHS:-DREJW}}"
-OPT_LEVELS="${CFREE_OPT_LEVELS:-0 1}"
+OPT_LEVELS="${CFREE_OPT_LEVELS:-0 1 2}"
case "$PATHS" in *D*) RUN_D=1;; *) RUN_D=0;; esac
case "$PATHS" in *R*) RUN_R=1;; *) RUN_R=0;; esac
case "$PATHS" in *E*) RUN_E=1;; *) RUN_E=0;; esac
@@ -111,13 +114,13 @@ arch_raw="$(uname -m 2>/dev/null || true)"
READELF_BIN="$(command -v llvm-readelf 2>/dev/null || command -v readelf 2>/dev/null || true)"
-# Shared aarch64 exec helper — see test/lib/exec_aarch64.sh. Path E queues
-# each linked.exe and we drain the queue in a single batched podman run
-# after the case loop, amortizing the per-launch podman overhead across
-# all ~200 cg cases.
-EXEC_AARCH64_MOUNT_ROOT="$BUILD_DIR"
-# shellcheck source=../lib/exec_aarch64.sh
-source "$ROOT/test/lib/exec_aarch64.sh"
+# Shared per-arch exec helper — see test/lib/exec_target.sh. Path E
+# queues each linked.exe and we drain the queue in a single batched
+# podman run per arch after the case loop, amortizing the per-launch
+# podman overhead across all ~200 cg cases.
+EXEC_TARGET_MOUNT_ROOT="$BUILD_DIR"
+# shellcheck source=../lib/exec_target.sh
+source "$ROOT/test/lib/exec_target.sh"
# ---- build harness binaries ------------------------------------------------
@@ -263,6 +266,16 @@ for OPT_LEVEL in $OPT_LEVELS; do
# negative-return cases compare correctly.
expected_byte=$(( expected & 0xff ))
+ # Path E target arch. cg-runner --arches NAME prints the arches a
+ # case can run on (one per line). Today every case is aarch64-only;
+ # multi-arch cases will land alongside x64 codegen in MULTIARCH
+ # phase 3 and broaden this list.
+ case_arches="$("${CG_RUN[@]}" --arches "$name" 2>/dev/null)"
+ case_arches="${case_arches:-aarch64}"
+ # First arch is the canonical one path E targets. (Cases are still
+ # single-arch through phase 2; taking the first line is a placeholder seam.)
+ case_arch="$(printf '%s\n' "$case_arches" | head -n1)"
+
# ---- Path D: in-process JIT (only on aarch64) ------------------------
if [ $RUN_D -eq 1 ]; then
if [ $is_aarch64 -eq 1 ]; then
@@ -322,7 +335,7 @@ for OPT_LEVEL in $OPT_LEVELS; do
>"$work/exec_link.out" 2>"$work/exec_link.err"; then
dt=$(( $(now_ms) - t0 )); T_E=$(( T_E + dt ))
note_fail "$name/E${TAG} (link failed, ${dt}ms)"
- elif [ $have_runner -eq 1 ]; then
+ elif exec_target_supported "$case_arch"; then
link_dt=$(( $(now_ms) - t0 )); T_E=$(( T_E + link_dt ))
E_NAMES+=("$name")
E_WORK+=("$work")
@@ -330,10 +343,11 @@ for OPT_LEVEL in $OPT_LEVELS; do
E_EXPECTED+=("$expected_byte")
# Queue with a level-tagged key so cases at different
# opt levels don't collide in the batched runner.
- exec_aarch64_queue "L${OPT_LEVEL}_${name}" "$exe" \
- "$work/exec.out" "$work/exec.err" "$work/exec.rc"
+ exec_target_queue "$case_arch" "L${OPT_LEVEL}_${name}" \
+ "$exe" "$work/exec.out" "$work/exec.err" \
+ "$work/exec.rc"
else
- note_skip "$name/E${TAG}" "no qemu/podman"
+ note_skip "$name/E${TAG}" "no runner for $case_arch"
fi
else
note_skip "$name/E${TAG}" "no link-exe-runner, aarch64 clang, or start.o"
@@ -385,13 +399,13 @@ for OPT_LEVEL in $OPT_LEVELS; do
done
# ---- batched path-E flush + verification (per level) -------------------
- # Run every queued case in a single podman invocation, then iterate the
- # queue to read each exit code and emit PASS/FAIL.
- if [ "$(exec_aarch64_queue_size)" -gt 0 ]; then
+ # Run every queued case in a single podman invocation per arch, then
+ # iterate the queue to read each exit code and emit PASS/FAIL.
+ if [ "$(exec_target_queue_size)" -gt 0 ]; then
printf 'Running path E%s (%d cases batched)...\n' \
- "$TAG" "$(exec_aarch64_queue_size)"
+ "$TAG" "$(exec_target_queue_size)"
t0=$(now_ms)
- exec_aarch64_flush
+ exec_target_flush
DELTA=$(( $(now_ms) - t0 ))
T_E_BATCH=$(( ${T_E_BATCH:-0} + DELTA )); T_E=$(( T_E + DELTA ))