kit

kit
git clone https://git.ryansepassi.com/git/kit.git
Log | Files | Refs | README

commit 32f88989b99113efd573a10ab332c1e7732b54ff
parent bf1dc146e22105508838aeb887215cd591aeb6fb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 10:24:27 -0700

opt: Phase 3 — IR-native recording, SSA construction, dry-run

Refactor opt_cgtarget to record CGTarget calls directly into the SSA
IR (Func/Block/Inst) instead of into a shadow tape. Each method lands
as exactly one Inst; replay walks the Func and re-issues calls onto
the wrapped target. Levels 1 and 2 share the recording and replay
path; level 2 additionally runs build_cfg + build_ssa as a dry-run
(output discarded) to shake out IR-shape bugs ahead of Phase 4's
lowering pipeline.

- src/opt/ir.h, src/opt/ir.c: extend IR with CGTarget-shape ops
  (IR_COPY, IR_ADDR_OF, IR_TLS_ADDR_OF, IR_CMP_BRANCH,
  IR_SCOPE_BEGIN/ELSE/END, IR_BREAK_TO/CONTINUE_TO,
  IR_BINOP/UNOP/CMP/CONVERT, IR_LOAD_IMM/CONST, etc.); switch
  Inst.opnds from Val* to Operand* (collapsing Reg ↔ Val per
  doc/OPT.md §5.1); add Func.emit_order so replay walks blocks in
  the order CG visited them (cmp_branch fallthrough blocks are
  created after a label_new'd block but must physically follow it).
- src/opt/opt.c: drop the tape; record straight into Func; replay
  allocates target Regs lazily on first def via target->alloc_reg.
- src/opt/pass_cfg.c: rebuild Block.preds from succ[]/nsucc.
- src/opt/pass_ssa.c: Cooper-Harvey-Kennedy iterative dominators,
  dominance frontiers, mem2reg phi insertion at iterated DF, dom-
  tree rename DFS. Promotable slots identified via OPK_LOCAL on
  IR_LOAD/STORE.
- test/cg/run.sh: default CFREE_OPT_LEVELS to "0 1 2".
- doc/OPT.md: revise the level dial — both levels build CFG + SSA
  and (post-Phase 4) emit through SSA → machinize → regalloc →
  emit; the level selects only the optimization schedule. Level 1
  is the minimal/bisection-floor set; level 2 is the full pipeline
  with IPO.

Corpus stays green at all three levels (1742 pass).

Diffstat:
Mdoc/OPT.md | 225+++++++++++++++++++++++++++++++++++++++++++++----------------------------------
Asrc/opt/ir.c | 176+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Msrc/opt/ir.h | 353++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------
Msrc/opt/opt.c | 2325++++++++++++++++++++++++++++++-------------------------------------------------
Asrc/opt/pass_cfg.c | 116+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Asrc/opt/pass_ssa.c | 427+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mtest/cg/run.sh | 60+++++++++++++++++++++++++++++++++++++-----------------------
7 files changed, 1966 insertions(+), 1716 deletions(-)

diff --git a/doc/OPT.md b/doc/OPT.md @@ -19,49 +19,63 @@ and builds out to a real intra+IPO pipeline. ## 1. What working opt must look like -Two distinct replay engines, sharing one recording front-end. - -### 1.1 Level-1 (function-at-a-time, no SSA) +A single shared engine. CG drives `opt_cgtarget`; every CGTarget call +lands as exactly one `Inst` in a per-function flat-CFG IR (§5.1). On +`func_end` (intra-procedural) and `finalize` (inter-procedural), the +wrapper runs an optimization schedule, lowers through machinize → +regalloc → emit, and drives the wrapped target CGTarget to produce +machine code. ``` -parse → cg → opt_cgtarget {record into Func} → on func_end: - optionally rewrite the recorded tape (peephole, constfold) - replay each Inst back as the matching CGTarget call → - wrapped target → MCEmitter → ObjBuilder +parse → cg → opt_cgtarget {record into Func} → + on func_end: build_cfg → build_ssa → <intra schedule> → <lower> + on finalize: <inter schedule> → for each dirty Func: <lower> + → wrapped target CGTarget → MCEmitter → ObjBuilder + + <lower> = make_conventional_ssa → ssa_combine → undo_ssa → + machinize → live_info → coalesce → regalloc → combine → + dce → opt_emit ``` -Replay is 1:1: each recorded `Inst` corresponds to one `target->method(...)` -call. No phis, no register allocation, no IR-level CFG transforms that -would break the correspondence. This preserves `doc/DESIGN.md` §8's -"function-at-a-time" streaming guarantee at -O1. +The level dial selects only the optimization schedule; everything +else is shared. -### 1.2 Level-2 (TU-buffered, SSA, IPO, full lowering) +### 1.1 Level 1 — minimal -``` -parse → cg → opt_cgtarget {record into Func} → on func_end: - store Func; do nothing else. - on cgtarget_finalize: - for each Func: build_cfg, build_ssa (incl. mem2reg), gvn, - copy_prop, dse, ssa_dce, licm, pressure_relief, - make_conventional_ssa, ssa_combine, undo_ssa, jump_opt - opt_inline + opt_cleanup (bounded iterations, -finline-iters=N) - for each Func: machinize, live_info, coalesce, regalloc, - combine, dce, opt_emit → wrapped target → MCEmitter -``` +Intra: `build_cfg`, `build_ssa` (mem2reg), `jump_opt`. No GVN, no +LICM, no DSE. +Inter: none (no inlining; no cleanup iteration). + +Just enough to land code through the SSA pipeline at quality +comparable to direct `CGTarget` lowering. The point of level 1 is +isolation: when a bug reproduces here, it is in IR construction, +SSA, or lowering — not in any of the value-changing passes. It is +the working bisection floor for the rest of the pipeline. -The level-2 path *cannot* preserve the 1:1 record→replay correspondence -once SSA/regalloc has run — the lowering pipeline is the only path -back to the wrapped CGTarget. This split is the single biggest -load-bearing decision in the design. +### 1.2 Level 2 — full + +Intra (per `doc/DESIGN.md` §9.2): `build_cfg`, `build_ssa` (mem2reg), +`gvn` (incl. constprop, redundant-load elim), `copy_prop` (incl. +redundant-extension elim), `dse`, `ssa_dce`, `jump_opt`, +`build_loop_tree` + `licm`, `addr_xform`, `block_cloning`, +`pressure_relief`. +Inter: `opt_inline` + `opt_cleanup` (bounded by `-finline-iters=N`, +default 1). ### 1.3 Equivalence we commit to For every case in `test/cg/CORPUS.md` Groups A–Q, building with -`opt_cgtarget_new(c, target, level)` for `level ∈ {1, 2}` must produce -the same `test_main` exit code as building against the AArch64 target -directly. DWARF (W path) equivalence is weaker — opt at level 2 may -collapse line rows or move locations into loclists — but Group P at -level 0/1 must be byte-identical. +`opt_cgtarget_new(c, target, level)` for `level ∈ {1, 2}` must +produce the same `test_main` exit code as building against the +AArch64 target directly. Both levels emit through the SSA → +machinize → regalloc → emit path, so neither is byte-equivalent to +level 0 — exit-code equivalence is the contract. + +DWARF (W path) equivalence is weaker. Level 1 aims for row-by-row +parity with level 0 in Group P; level 2 may collapse line rows or +move locations into loclists when value-changing passes fire. +Neither is a hard contract until the DWARF cross-level harness lands +(§3 Phase 5+). --- @@ -126,8 +140,10 @@ level 0/1 must be byte-identical. Each phase ends at a green test surface. Phases 0–3 are reversible — strip the wrapper back to identity replay and the system still works. -Phase 4 is the wedge: once SSA/regalloc is in the pipeline, you need -the full lowering path to get bytes back. +Phase 4 is the wedge: once the SSA → lowering pipeline is the only +path to bytes, both levels go through it. Levels 1 and 2 share the +pipeline from Phase 4 onward and diverge only in which optimization +passes the schedule includes (§1.1, §1.2). ### Phase 0 — wrapper skeleton + equivalence harness @@ -152,10 +168,14 @@ Harness: - Add `--opt-level N` flag to `test/cg/harness/cg_runner.c`. When `N > 0`, wrap the constructed `CGTarget*` with `opt_cgtarget_new(c, target, N)` before running the case. -- `test/cg/run.sh`: each case in Groups A–Q runs at `--opt-level 0` - and `--opt-level 1`; exit codes must match. Add `--opt-level 2` - once Phase 4 lands. Group P (DWARF/W path) stays at level 0 only - for now — opt-level equivalence on DWARF is a Phase 5+ concern. +- `test/cg/run.sh`: each case in Groups A–Q runs at `--opt-level 0`, + `1`, and `2`; exit codes must match across all three. Through + Phase 3, levels 1 and 2 share the recorder's 1:1 replay path with + level 2 additionally exercising `build_cfg` + `build_ssa` as a + dry-run (output discarded). Phase 4 promotes both levels to the + shared SSA → lower path. Group P (DWARF/W path) stays at level 0 + only for now — opt-level equivalence on DWARF is a Phase 5+ + concern. Tests: full Groups A–Q D/R/E/J pass at `--opt-level 0` and `--opt-level 1`. Any divergence means the forwarding lost data — fix @@ -222,30 +242,35 @@ disassembly diff against `--opt-level 0`. ### Phase 3 — SSA construction, dry-run Goal: build SSA without consuming it. Catches IR-shape bugs before -they matter. +they matter — Phase 4's lowering pipeline depends on the SSA being +correct, so we shake out construction bugs separately. - `src/opt/pass_cfg.c`: `opt_build_cfg` — derives `Block.preds`/`succ`/`nsucc` from terminators (`IR_BR`, `IR_CONDBR`, - `IR_RET`, `IR_LONGJMP`, `IR_INTRINSIC{TRAP,UNREACHABLE}`). - `IR_SETJMP` is a control barrier — splits its block but is not a - terminator (control falls through). + `IR_CMP_BRANCH`, `IR_RET`, `IR_LONGJMP`, `IR_BREAK_TO`, + `IR_CONTINUE_TO`, `IR_INTRINSIC{TRAP,UNREACHABLE}`). `IR_SETJMP` is + a control barrier — splits its block but is not a terminator + (control falls through). - `src/opt/pass_ssa.c`: `opt_build_ssa` — standard dominance-frontier algorithm; promotes any `FrameSlot` whose `FSF_ADDR_TAKEN` bit was never set (mem2reg folded in per `doc/DESIGN.md` §12). Inserts `IR_PHI` instructions with `IRPhiAux` populated. -- At level 2 these run on `func_end` but their output is *discarded* - before replay. Goal is "no panics on the corpus", not "improved +- These run at both levels on `func_end`, but their output is + *discarded* before replay until Phase 4 lands the lowering path. + Goal at this phase is "no panics on the corpus", not "improved code". -Tests: at level 2, recorder runs `build_cfg` + `build_ssa` then -forces level-1 replay. Corpus stays green; any panic is a real bug -(unhandled IROp, malformed CFG from goto/switch lowering, address- -taken-detection miss). +Tests: at levels 1 and 2, recorder runs `build_cfg` + `build_ssa` +then falls back to the recorder's 1:1 replay. Corpus stays green; +any panic is a real bug (unhandled IROp, malformed CFG from +goto/switch lowering, address-taken-detection miss). ### Phase 4 — lowering pipeline (the wedge) -Goal: real level-2 path: SSA → machinize → regalloc → emit. From -this point, level-2 cannot fall back to level-1 replay. +Goal: SSA → make_conventional_ssa → undo_ssa → machinize → +live_info → coalesce → regalloc → combine → dce → opt_emit, replacing +the recorder's 1:1 replay at *both* levels. From this point, neither +level falls back to record→replay. - `src/opt/pass_machinize.c`: `opt_machinize(Func*, Target)` — ABI lowering (calls into `TargetABI` for argument/return part @@ -259,53 +284,59 @@ this point, level-2 cannot fall back to level-1 replay. target for clobbers and the param/sret physical mapping). - `src/opt/pass_emit.c`: `opt_emit(Compiler*, Func*, CGTarget*)` — walks the lowered IR and drives the wrapped target via the - emit-side CGTarget surface. This is symmetric with `replay.c` from - Phase 1: Inst → target call. Difference is that operands are now - physical Operands (post-RA) and prolog/epilog/spill insertion has - happened. -- Wrapper's `finalize` at level 2 runs the per-Func intra pipeline, - then for each Func runs the lowering pipeline, then calls - `target->finalize(target)`. - -Initial pass set inside the intra pipeline is *empty* — `build_ssa` -followed immediately by `undo_ssa`/`make_conventional_ssa`. The -lowering pipeline does the real work. This isolates the lowering bugs -from the optimization bugs. - -Tests: `--opt-level 2` corpus is green. Equivalence harness extended -to cover level 2. - -Exit criterion: level-2 corpus green, no regressions in the level-1 -or level-0 suites. - -### Phase 5 — intra-procedural passes, one at a time - -Goal: enable real optimizations. Each pass is independently -toggleable for bisection. - -Order (tracking `doc/DESIGN.md` §9.2): - -1. `opt_gvn` (with constprop, redundant-load elim folded in). -2. `opt_copy_prop` (with redundant-extension elim). -3. `opt_dse`. -4. `opt_ssa_dce`. -5. `opt_jump_opt`. -6. `opt_build_loop_tree` + `opt_licm`. -7. `opt_addr_xform`. -8. `opt_block_cloning`. -9. `opt_pressure_relief`. -10. Lowering-time additions: `opt_coalesce`, `opt_combine`, - `opt_dce` (post-RA), live-range splitting in `opt_regalloc`. - -Each pass lands behind a flag so the equivalence harness can run with -just-this-pass to localize bugs. No UB-exploiting transformations -(`doc/DESIGN.md` §9): no signed-overflow-is-unreachable, no -shift-by-≥-width-is-unreachable, no division-by-zero-is-unreachable, -no null-deref-is-unreachable. - -### Phase 6 — inter-procedural - -Goal: cross-function inlining + cleanup iteration. + emit-side CGTarget surface. Inst → target call, but operands are + now physical Operands (post-RA) and prolog/epilog/spill insertion + has happened. +- Wrapper's `func_end` runs `build_cfg` + `build_ssa` + + `make_conventional_ssa` + `undo_ssa` + the lowering pipeline at + both levels. No optimization passes between SSA build and undo at + this phase — the lowering pipeline does all the work, so we + isolate lowering bugs from optimization-pass bugs. Level 1 and + level 2 are functionally identical at the end of Phase 4. + +Tests: `--opt-level 1` and `--opt-level 2` corpora green. The +recorder's old 1:1 replay path is removed; SSA → lower is the only +path to bytes. + +Exit criterion: levels 1 and 2 both green, no regressions in the +level-0 suite. + +### Phase 5 — intra-procedural passes, level-gated + +Goal: populate the optimization schedule. All new passes land in the +*level-2* schedule only; level 1's schedule stays at the Phase 4 set +(`build_cfg`, `build_ssa`, `jump_opt` once it's wired in) so it +remains the bisection floor for IR/SSA/lowering bugs. + +Order (tracking `doc/DESIGN.md` §9.2; each pass lands behind a flag +so the equivalence harness can run with just-this-pass to localize +bugs): + +1. `opt_jump_opt` — moves into the level-1 schedule too (cheap, + high-value, doesn't change values). +2. `opt_gvn` (with constprop, redundant-load elim folded in) — + level 2 only. +3. `opt_copy_prop` (with redundant-extension elim) — level 2 only. +4. `opt_dse` — level 2 only. +5. `opt_ssa_dce` — level 2 only. +6. `opt_build_loop_tree` + `opt_licm` — level 2 only. +7. `opt_addr_xform` — level 2 only. +8. `opt_block_cloning` — level 2 only. +9. `opt_pressure_relief` — level 2 only. +10. Lowering-time additions (run at both levels): `opt_coalesce`, + `opt_combine`, `opt_dce` (post-RA), live-range splitting in + `opt_regalloc`. + +No UB-exploiting transformations (`doc/DESIGN.md` §9): no +signed-overflow-is-unreachable, no shift-by-≥-width-is-unreachable, +no division-by-zero-is-unreachable, no null-deref-is-unreachable. + +### Phase 6 — inter-procedural (level 2 only) + +Goal: cross-function inlining + cleanup iteration. Level 1 stays +intra-procedural — the inliner is the largest source of correctness +risk and the largest divergence from level-0 codegen, so it earns +its own gating. - `opt_inline`: bottom-up call-graph walk. SCCs (mutual recursion) skipped. Heuristic: instruction count + call site count. Inlining diff --git a/src/opt/ir.c b/src/opt/ir.c @@ -0,0 +1,176 @@ +/* ir.c — Func/Block/Inst plumbing for the SSA IR (doc/OPT.md §1). + * + * Each CGTarget call recorded by opt_cgtarget produces exactly one Inst + * (or none, for pure bookkeeping calls like alloc_reg). Storage is per- + * Func arena, allocated against c->tu so the Func survives until + * cgtarget_finalize. + * + * Invariants: + * - VAL_NONE (= 0) is reserved; first allocated Val is 1. + * - val_def_block / val_def_inst / val_type / val_cls are parallel + * arrays indexed by Val. + * - Inst.opnds is Operand[] (not Val[]): Reg/Val are collapsed + * (doc/OPT.md §5.1) and OPK_REG operands' v.reg field IS the Val + * used at this site. Other OpKinds (IMM/LOCAL/GLOBAL/INDIRECT) are + * not Val uses for SSA dataflow. + */ + +#include "opt/ir.h" + +#include <string.h> + +#include "core/arena.h" +#include "core/core.h" + +/* ---- val table ---- */ + +static void val_table_grow(Func* f, u32 needed) { + if (needed <= f->vals_cap) return; + u32 ncap = f->vals_cap ? f->vals_cap : 16u; + while (ncap < needed) ncap *= 2u; + u32* nb_blk = arena_zarray(f->arena, u32, ncap); + u32* nb_ins = arena_zarray(f->arena, u32, ncap); + const Type** nb_ty = arena_zarray(f->arena, const Type*, ncap); + u8* nb_cls = arena_zarray(f->arena, u8, ncap); + if (f->nvals) { + memcpy(nb_blk, f->val_def_block, sizeof(u32) * f->nvals); + memcpy(nb_ins, f->val_def_inst, sizeof(u32) * f->nvals); + memcpy(nb_ty, f->val_type, sizeof(const Type*) * f->nvals); + memcpy(nb_cls, f->val_cls, sizeof(u8) * f->nvals); + } + f->val_def_block = nb_blk; + f->val_def_inst = nb_ins; + f->val_type = nb_ty; + f->val_cls = nb_cls; + f->vals_cap = ncap; +} + +Val ir_alloc_val(Func* f, const Type* t, u8 cls) { + Val v; + if (f->nvals == 0) { + val_table_grow(f, 16); + f->nvals = 1; /* reserve slot 0 for VAL_NONE */ + } + if (f->nvals == f->vals_cap) val_table_grow(f, f->nvals + 1); + v = f->nvals++; + f->val_def_block[v] = 0; + f->val_def_inst[v] = 0; + f->val_type[v] = t; + f->val_cls[v] = cls; + return v; +} + +/* ---- blocks ---- */ + +u32 ir_block_new(Func* f) { + Block* b; + if (f->nblocks == f->blocks_cap) { + u32 ncap = f->blocks_cap ? f->blocks_cap * 2u : 8u; + Block* nb = arena_zarray(f->arena, Block, ncap); + if (f->blocks) memcpy(nb, f->blocks, sizeof(Block) * f->nblocks); + f->blocks = nb; + f->blocks_cap = ncap; + } + b = &f->blocks[f->nblocks]; + memset(b, 0, sizeof *b); + b->id = f->nblocks; + return f->nblocks++; +} + +/* ---- emit order ---- */ + +void ir_note_emit(Func* f, u32 block) { + /* Linear scan: emit_order is small in practice (one entry per + * placed block, dozens at most for the corpus) and we only ever + * append, so a hash table would be overkill. */ + for (u32 i = 0; i < f->emit_order_n; ++i) + if (f->emit_order[i] == block) return; + if (f->emit_order_n == f->emit_order_cap) { + u32 ncap = f->emit_order_cap ? f->emit_order_cap * 2u : 8u; + u32* nb = arena_array(f->arena, u32, ncap); + if (f->emit_order) + memcpy(nb, f->emit_order, sizeof(u32) * f->emit_order_n); + f->emit_order = nb; + f->emit_order_cap = ncap; + } + f->emit_order[f->emit_order_n++] = block; +} + +/* ---- inst append ---- */ + +Inst* ir_emit(Func* f, u32 block, IROp op) { + Block* b = &f->blocks[block]; + Inst* in; + if (b->ninsts == b->cap) { + u32 ncap = b->cap ? b->cap * 2u : 8u; + Inst* nb = arena_zarray(f->arena, Inst, ncap); + if (b->insts) memcpy(nb, b->insts, sizeof(Inst) * b->ninsts); + b->insts = nb; + b->cap = ncap; + } + in = &b->insts[b->ninsts++]; + memset(in, 0, sizeof *in); + in->op = (u16)op; + return in; +} + +/* ---- frame slots / params ---- */ + +FrameSlot ir_frame_slot_new(Func* f, const FrameSlotDesc* d) { + IRFrameSlot* s; + FrameSlot id; + if (f->nframe_slots == f->frame_slots_cap) { + u32 ncap = f->frame_slots_cap ? f->frame_slots_cap * 2u : 8u; + IRFrameSlot* nb = arena_zarray(f->arena, IRFrameSlot, ncap); + if (f->frame_slots) + memcpy(nb, f->frame_slots, sizeof(IRFrameSlot) * f->nframe_slots); + f->frame_slots = nb; + f->frame_slots_cap = ncap; + } + id = (FrameSlot)(f->nframe_slots + 1); + s = &f->frame_slots[f->nframe_slots++]; + s->id = id; + s->type = d->type; + s->name = d->name; + s->loc = d->loc; + s->size = d->size; + s->align = d->align; + s->kind = d->kind; + s->flags = d->flags; + return id; +} + +void ir_param_add(Func* f, const CGParamDesc* d) { + IRParam* p; + if (f->nparams == f->params_cap) { + u32 ncap = f->params_cap ? f->params_cap * 2u : 4u; + IRParam* nb = arena_zarray(f->arena, IRParam, ncap); + if (f->params) memcpy(nb, f->params, sizeof(IRParam) * f->nparams); + f->params = nb; + f->params_cap = ncap; + } + p = &f->params[f->nparams++]; + p->index = d->index; + p->name = d->name; + p->type = d->type; + p->slot = d->slot; + p->abi = d->abi; + p->loc = d->loc; +} + +/* ---- construction ---- */ + +Func* ir_func_new(Compiler* c, const CGFuncDesc* desc) { + Func* f = arena_znew(c->tu, Func); + f->arena = c->tu; + f->desc = *desc; + f->name = desc->sym; + f->type = desc->fn_type; + /* Reserve slot 0 of the val table eagerly so the very first ir_alloc_val + * returns Val=1. */ + val_table_grow(f, 16); + f->nvals = 1; + /* Caller is expected to ir_block_new(f) for entry, then assign + * f->entry. */ + return f; +} diff --git a/src/opt/ir.h b/src/opt/ir.h @@ -6,122 +6,104 @@ #include "core/core.h" #include "type/type.h" +/* SSA value id. Identical to the Reg space (doc/OPT.md §5.1): when an + * Operand has kind OPK_REG, v.reg names the Val it carries. VAL_NONE=0 + * is reserved as a sentinel. */ typedef u32 Val; #define VAL_NONE 0u +/* IROps mirror the CGTarget surface 1:1 plus a handful of SSA-only ops + * (IR_PHI, IR_CONST_I, IR_CONST_BYTES). Each CGTarget method records as + * exactly one Inst, so level-1 replay is a flat walk that re-issues + * each Inst as one wrapped target call. doc/OPT.md §1.1. */ typedef enum IROp { IR_NOP, + /* SSA-only constants (used by const-propagation; recording uses + * load_imm/load_const which become IR_LOAD_IMM/IR_LOAD_CONST). */ IR_CONST_I, IR_CONST_BYTES, - IR_PARAM, - IR_ALLOCA, - IR_LOAD, - IR_STORE, - IR_AGG_COPY, - IR_AGG_SET, - IR_BITFIELD_LOAD, - IR_BITFIELD_STORE, - IR_IADD, - IR_ISUB, - IR_IMUL, - IR_SDIV, - IR_UDIV, - IR_SREM, - IR_UREM, - IR_FADD, - IR_FSUB, - IR_FMUL, - IR_FDIV, - IR_AND, - IR_OR, - IR_XOR, - IR_SHL, - IR_ASHR, - IR_LSHR, - IR_NEG, - IR_BNOT, - IR_CMP_EQ, - IR_CMP_NE, - IR_CMP_SLT, - IR_CMP_SLE, - IR_CMP_ULT, - IR_CMP_ULE, - IR_CMP_FLT, - IR_CMP_FLE, - IR_CMP_FEQ, - IR_CMP_FNE, - IR_SEXT, - IR_ZEXT, - IR_TRUNC, - IR_BITCAST, - IR_SITOFP, - IR_UITOFP, - IR_FPTOSI, - IR_FPTOUI, - IR_FPEXT, - IR_FPTRUNC, - IR_GEP, + + /* Param/frame declarations recorded from CGTarget.param. The frame + * slot id table lives separately on Func; this op records the + * sequence point so replay can re-issue target->param in order. */ + IR_PARAM_DECL, + + /* Address-bearing data movement. */ + IR_LOAD_IMM, /* opnds[0] dst REG; extra.imm = imm */ + IR_LOAD_CONST, /* opnds[0] dst REG; extra.cbytes */ + IR_COPY, /* opnds[0] dst REG, opnds[1] src REG */ + IR_LOAD, /* opnds[0] dst REG, opnds[1] addr; extra.mem */ + IR_STORE, /* opnds[0] addr, opnds[1] src REG|IMM; extra.mem */ + IR_ADDR_OF, /* opnds[0] dst REG, opnds[1] lv (LOCAL|GLOBAL|INDIRECT) */ + IR_TLS_ADDR_OF, /* opnds[0] dst REG; extra.aux = IRTlsAux */ + IR_AGG_COPY, /* opnds[0] dst, opnds[1] src; extra.aux = IRAggAux */ + IR_AGG_SET, /* opnds[0] dst, opnds[1] byte; extra.aux = IRAggAux */ + IR_BITFIELD_LOAD, /* opnds[0] dst REG, opnds[1] record; extra.aux */ + IR_BITFIELD_STORE, /* opnds[0] record, opnds[1] src; extra.aux */ + + /* Arithmetic / cmp / convert. opnds[0] dst REG, [1] a, optionally [2] b. + * extra.imm carries the BinOp/UnOp/CmpOp/ConvKind tag. */ + IR_BINOP, + IR_UNOP, + IR_CMP, + IR_CONVERT, + + /* Calls. extra.aux = IRCallAux (see below). defs = result Vals. */ IR_CALL, + + /* Phis. extra.aux = IRPhiAux. */ IR_PHI, - IR_BR, - IR_CONDBR, - IR_RET, - IR_ATOMIC_LOAD, - IR_ATOMIC_STORE, - IR_ATOMIC_RMW, /* extra.imm encodes (AtomicOp << 8) | MemOrder */ - IR_ATOMIC_CAS, /* extra.imm encodes (success << 8) | failure */ - IR_FENCE, /* extra.imm = MemOrder */ - IR_VA_START, - IR_VA_ARG, - IR_VA_END, - IR_VA_COPY, - IR_SETJMP, /* returns-twice; opt treats as control barrier */ - IR_LONGJMP, /* terminator-like; control does not return */ - IR_ASM_BLOCK, /* opaque to most passes; preserves order, defines outs, - clobbers */ - IR_INTRINSIC, /* extra.imm = IntrinKind; multi-result for *_OVERFLOW */ -} IROp; -typedef struct IRCallAux { - const Type* fn_type; - const ABIFuncInfo* abi; - ObjSymId direct_sym; /* OBJ_SYM_NONE for indirect */ - Val callee; /* VAL_NONE for direct_sym calls */ - u32 nargs; - Val* args; - u32 nresults; - Val* results; /* ABI return parts and multi-result builtins */ - CGABIValue ret_abi; -} IRCallAux; + /* Control flow / scopes. */ + IR_BR, /* unconditional. block.succ[0] = target block id. */ + IR_CONDBR, /* opnds[0] cond REG; succ[0] = true, succ[1] = false. */ + IR_CMP_BRANCH, /* fused. opnds = [a, b]; extra.imm = CmpOp; + succ[0] = taken, succ[1] = fallthrough. */ + IR_RET, /* extra.aux = IRRetAux* (NULL for void). */ + IR_SCOPE_BEGIN, /* extra.aux = IRScopeAux. defs[0] = scope id Val. */ + IR_SCOPE_ELSE, /* extra.imm = scope id (Val). */ + IR_SCOPE_END, /* extra.imm = scope id (Val). */ + IR_BREAK_TO, /* extra.imm = scope id (Val). */ + IR_CONTINUE_TO, /* extra.imm = scope id (Val). */ -typedef struct IRFrameSlot { - FrameSlot id; - const Type* type; - Sym name; - SrcLoc loc; - u32 size; - u32 align; - u8 kind; /* FrameSlotKind */ - u8 pad; - u16 flags; /* FrameSlotFlag */ -} IRFrameSlot; + /* alloca / variadics. */ + IR_ALLOCA, /* opnds = [dst REG, size]; extra.imm = align */ + IR_VA_START, /* opnds = [ap] */ + IR_VA_ARG, /* opnds = [dst REG, ap]; extra.aux = const Type* */ + IR_VA_END, /* opnds = [ap] */ + IR_VA_COPY, /* opnds = [dst, src] */ -typedef struct IRParam { - u32 index; - Sym name; - const Type* type; - FrameSlot slot; - const ABIArgInfo* abi; - SrcLoc loc; -} IRParam; + /* setjmp/longjmp. */ + IR_SETJMP, /* opnds = [dst REG, buf] */ + IR_LONGJMP, /* opnds = [buf, val]; (terminator-like, no fallthrough) */ -typedef struct IRMemAux { - MemAccess mem; -} IRMemAux; + /* Atomics. */ + IR_ATOMIC_LOAD, /* opnds = [dst, addr]; extra.aux = IRAtomicAux */ + IR_ATOMIC_STORE, /* opnds = [addr, src]; extra.aux = IRAtomicAux */ + IR_ATOMIC_RMW, /* opnds = [dst, addr, val]; extra.aux */ + IR_ATOMIC_CAS, /* defs = [prior, ok]; extra.aux = IRCasAux */ + IR_FENCE, /* extra.imm = MemOrder */ + + /* Inline asm. extra.aux = IRAsmAux. */ + IR_ASM_BLOCK, -typedef struct IRAggregateAux { + /* Compiler intrinsics (see arch.h IntrinKind). extra.aux = IRIntrinAux. */ + IR_INTRINSIC, + + /* set_loc is *not* an IR op — SrcLoc is sticky on Inst.loc, applied at + * recording time from the wrapper's pending_loc. */ +} IROp; + +/* ---- per-op aux structs ---- */ + +typedef struct IRTlsAux { + ObjSymId sym; + i64 addend; +} IRTlsAux; + +typedef struct IRAggAux { AggregateAccess access; -} IRAggregateAux; +} IRAggAux; typedef struct IRBitFieldAux { BitFieldAccess access; @@ -137,42 +119,119 @@ typedef struct IRPhiAux { u32 npreds; u32* pred_blocks; Val* pred_vals; + u32 slot_id; /* 0 if not from mem2reg; else 1-based FrameSlot id */ } IRPhiAux; -typedef struct IRAsmAux { - const char* tmpl; - const AsmConstraint* outs; - const AsmConstraint* ins; - const Sym* clobbers; - u32 nout, nin, nclob; -} IRAsmAux; +/* IR_CALL aux. The CGTarget interface is rich enough that we keep the + * full descriptor for replay; SSA passes inspect args/results in their + * Val form via the CGABIValue.storage.v.reg fields where applicable. */ +typedef struct IRCallAux { + CGCallDesc desc; + /* Result Vals (one per ABI-decomposed return part). 0 for void. */ + u32 nresults; + Val* results; + /* For SSA-aware passes: the call may have args that are REG operands + * (carrying Val uses). They live in desc.args[i].storage.v.reg or + * desc.args[i].parts[k].op.v.reg. */ +} IRCallAux; + +typedef struct IRRetAux { + u8 present; /* 0 → void return; ret(NULL) at replay */ + CGABIValue val; +} IRRetAux; + +typedef struct IRScopeAux { + CGScopeDesc desc; + u32 scope_id; /* 1-based; the CGScope handed back to the caller */ + /* For SCOPE_IF: blocks for then-arm, else-arm, and join after end. + * For SCOPE_LOOP/SCOPE_BLOCK: the caller-supplied break/continue + * labels translated to block ids; if the desc passed LABEL_NONE we + * leave 0 (unused — break_to/continue_to is illegal in that case). */ + u32 if_then_block; + u32 if_else_block; + u32 if_end_block; + u32 loop_break_block; + u32 loop_continue_block; + u8 if_has_else; +} IRScopeAux; + +typedef struct IRAtomicAux { + MemAccess mem; + MemOrder mo; + u8 op; /* AtomicOp; valid for IR_ATOMIC_RMW */ +} IRAtomicAux; typedef struct IRCasAux { MemAccess mem; MemOrder success; MemOrder failure; - Val prior; - Val ok; } IRCasAux; +typedef struct IRAsmAux { + const char* tmpl; + AsmConstraint* outs; + AsmConstraint* ins; + Sym* clobbers; + Operand* out_ops; /* nout slots; the wrapped target may fill in REG location */ + u32 nout, nin, nclob; +} IRAsmAux; + +typedef struct IRIntrinAux { + IntrinKind kind; + Operand* dsts; /* ndst */ + Operand* args; /* narg */ + Val* result_vals; /* one per dst that's a REG, parallel to dsts */ + u32 ndst, narg; +} IRIntrinAux; + +typedef struct IRParamDeclAux { + CGParamDesc desc; +} IRParamDeclAux; + +/* ---- frame slots / params ---- */ + +typedef struct IRFrameSlot { + FrameSlot id; + const Type* type; + Sym name; + SrcLoc loc; + u32 size; + u32 align; + u8 kind; /* FrameSlotKind */ + u8 pad; + u16 flags; /* FrameSlotFlag */ +} IRFrameSlot; + +typedef struct IRParam { + u32 index; + Sym name; + const Type* type; + FrameSlot slot; + const ABIArgInfo* abi; + SrcLoc loc; +} IRParam; + +/* ---- Inst / Block / Func ---- */ + typedef struct Inst { u16 op; - u16 flags; /* per-op flags (e.g. nsw/nuw, volatile) */ - SrcLoc loc; /* set from CGTarget.set_loc when this insn was recorded */ + u16 flags; + SrcLoc loc; /* sticky from CGTarget.set_loc at recording */ const Type* type; - Val def; /* this instruction's SSA value, or VAL_NONE */ - u32 ndefs; /* multi-result instructions use defs[0..ndefs) */ - Val* defs; /* arena-allocated; NULL when ndefs <= 1 */ + Val def; /* primary SSA def, or VAL_NONE */ + u32 ndefs; /* multi-result */ + Val* defs; + /* Operands. We use Operand instead of Val so that the original + * CGTarget call shape (IMM / LOCAL / GLOBAL / INDIRECT in addition to + * REG) round-trips at replay. SSA passes treat OPK_REG operands as + * Val uses (via .v.reg). */ u32 nopnds; - Val* opnds; /* arena-allocated */ + Operand* opnds; union { i64 imm; ConstBytes cbytes; - struct { - ObjSymId sym; - } objsym; MemAccess mem; - void* aux; /* one of IR*Aux, arena-owned and typed by op */ + void* aux; } extra; } Inst; @@ -182,44 +241,66 @@ typedef struct Block { u32 ninsts, cap; u32* preds; u32 npreds; - u32 succ[2]; /* condbr: 2; br: 1; ret: 0 */ + u32 succ[2]; u8 nsucc; } Block; typedef struct Func { - /* IR storage. Lives until cgtarget_finalize so inter-procedural passes can - * read every Func in the TU. Per-pass scratch goes in Arena scratch, not - * here. */ Arena* arena; - ObjSymId name; + CGFuncDesc desc; /* preserved for level-1 replay func_begin */ + ObjSymId name; /* alias for desc.sym (kept for older callers) */ const Type* type; Block* blocks; u32 nblocks, blocks_cap; - u32 entry; /* index of entry block */ + u32 entry; IRFrameSlot* frame_slots; u32 nframe_slots, frame_slots_cap; IRParam* params; u32 nparams, params_cap; - /* Value table: for each Val, where it's defined and its type. */ + /* Value table. Index 0 is VAL_NONE; first allocated Val is 1. */ u32* val_def_block; u32* val_def_inst; const Type** val_type; + u8* val_cls; /* RegClass per Val, used by replay to reconstruct REG operands */ u32 nvals, vals_cap; + + /* Scope id table. Indexed by scope_id (1-based). Values map to + * IRScopeAux entries (via the IR_SCOPE_BEGIN inst). Stored as a flat + * pointer table for O(1) lookup during scope_else/end/break/continue + * recording and replay. */ + Inst** scope_aux_inst; + u32 nscopes, scopes_cap; + + /* Emit order: the sequence in which blocks first became `cur` during + * recording. This is the natural order CG's emit cursor visited each + * block, so replay must follow it (block-creation order can differ — + * e.g. label_new(L) precedes cmp_branch but the cmp_branch's + * fallthrough block is what physically follows the cmp_branch in + * code). Blocks not present here are unreachable / unplaced; replay + * skips them. */ + u32* emit_order; + u32 emit_order_n, emit_order_cap; } Func; -Func* ir_func_new(Arena*, ObjSymId, const Type* fn_type); +/* ---- API ---- */ + +Func* ir_func_new(Compiler*, const CGFuncDesc*); + u32 ir_block_new(Func*); -Val ir_emit(Func*, u32 block, IROp, const Type* result, const Val* opnds, - u32 n); -void ir_emit_multi(Func*, u32 block, IROp, const Type** results, Val* defs, - u32 ndefs, const Val* opnds, u32 nopnds); -Val ir_emit_const_i(Func*, u32 block, const Type*, i64); -Val ir_emit_const_bytes(Func*, u32 block, ConstBytes); FrameSlot ir_frame_slot_new(Func*, const FrameSlotDesc*); void ir_param_add(Func*, const CGParamDesc*); -void ir_set_terminator(Func*, u32 block, IROp, u32 succ_a, u32 succ_b, - Val cond); + +Val ir_alloc_val(Func*, const Type*, u8 cls); + +Inst* ir_emit(Func*, u32 block, IROp); + +/* Append `block` to f->emit_order if not already present. Called by + * the wrapper whenever cur transitions to a block. */ +void ir_note_emit(Func*, u32 block); +/* Append Inst to block; caller fills op-specific fields. The Inst is + * arena-resident; its address is stable until the Block.insts array is + * reallocated by phi insertion (which fixes up references). */ #endif diff --git a/src/opt/opt.c b/src/opt/opt.c @@ -1,22 +1,13 @@ -/* opt — CGTarget wrapper that records each function as a tape of - * CGTarget calls, then replays them onto the wrapped target on - * func_end. See doc/OPT.md for the phased plan. +/* opt.c — CGTarget wrapper that records each function as IR (doc/OPT.md + * §1). Each CGTarget call lands as exactly one Inst in the current + * Func's current block; the wrapper's vreg/vlabel/vslot/vscope ids + * coincide with their IR counterparts (Reg ↔ Val, label ↔ block id, + * vslot ↔ IR FrameSlot, vscope ↔ scope_aux index — doc/OPT.md §5.1). * - * Phase 1 (current): record every emit-side call into a per-function - * tape; alloc_reg / frame_slot / label_new / scope_begin hand out - * wrapper-local virtual ids. On func_end the tape is replayed - * linearly: each entry produces exactly one wrapped target call, - * with virtual ids translated to target-side ids on the fly. This - * preserves doc/DESIGN.md §8's "function-at-a-time" streaming - * guarantee at -O1. - * - * Phase 2 (current): a small, safe peephole pass runs over the tape - * between recording and replay. See try_peephole_constfold. - * - * Phase 3+ (deferred): build CFG and SSA from the tape, run - * intra-procedural passes, lower through machinize → regalloc → - * emit. Until that lands, level 2 is functionally identical to - * level 1 (per-function record + replay). + * Phase 3 (current): on func_end the wrapper optionally runs + * opt_build_cfg + opt_build_ssa (level 2; output discarded), then + * replays each Func into the wrapped target. Level 1 skips the SSA + * passes; in both cases replay is the level-1 1:1 walk. * * Methods the wrapper rejects under unbounded virtuals: * - clobbers / spill_reg / reload_reg are CG -O0 register-pressure @@ -31,346 +22,21 @@ #include "core/arena.h" #include "core/core.h" - -/* ---- tape op tags ---- */ - -typedef enum { - TOP_FUNC_BEGIN, - TOP_FUNC_END, - TOP_ALLOC_REG, - TOP_FRAME_SLOT, - TOP_PARAM, - TOP_LABEL_NEW, - TOP_LABEL_PLACE, - TOP_JUMP, - TOP_CMP_BRANCH, - TOP_SCOPE_BEGIN, - TOP_SCOPE_ELSE, - TOP_SCOPE_END, - TOP_BREAK_TO, - TOP_CONTINUE_TO, - TOP_LOAD_IMM, - TOP_LOAD_CONST, - TOP_COPY, - TOP_LOAD, - TOP_STORE, - TOP_ADDR_OF, - TOP_TLS_ADDR_OF, - TOP_COPY_BYTES, - TOP_SET_BYTES, - TOP_BITFIELD_LOAD, - TOP_BITFIELD_STORE, - TOP_BINOP, - TOP_UNOP, - TOP_CMP, - TOP_CONVERT, - TOP_CALL, - TOP_RET, - TOP_ALLOCA, - TOP_VA_START, - TOP_VA_ARG, - TOP_VA_END, - TOP_VA_COPY, - TOP_SETJMP, - TOP_LONGJMP, - TOP_ATOMIC_LOAD, - TOP_ATOMIC_STORE, - TOP_ATOMIC_RMW, - TOP_ATOMIC_CAS, - TOP_FENCE, - TOP_INTRINSIC, - TOP_SET_LOC, -} TapeOpKind; - -/* TapeEntry: one recorded CGTarget call. The tagged union is wide; we - * pay arena bytes for clarity. */ -typedef struct TapeEntry { - u8 op; /* TapeOpKind */ - u8 dead; /* set by peepholes; replay skips dead entries */ - u16 padding; - SrcLoc loc; - union { - /* WOP_FUNC_BEGIN: deep-copied descriptor. The caller's CGFuncDesc - * may be stack-allocated, so we copy by value into our arena. - * params[] is also copied; field shapes inside (Type*, ABIArgInfo*, - * incoming pointer) are TU-lifetime and shared. */ - struct { - CGFuncDesc desc; - CGParamDesc* params; /* arena copy of fd.params */ - } func_begin; - - /* WOP_ALLOC_REG: returns a vreg, indexed into reg_map at replay. */ - struct { - RegClass cls; - const Type* ty; - Reg vreg; - } alloc_reg; - - /* WOP_FRAME_SLOT */ - struct { - FrameSlotDesc desc; - FrameSlot vslot; - } frame_slot; - - /* WOP_PARAM */ - struct { - CGParamDesc desc; - } param; - - /* WOP_LABEL_NEW */ - struct { - Label vlabel; - } label_new; - - /* WOP_LABEL_PLACE / WOP_JUMP */ - struct { - Label vlabel; - } label_op; - - /* WOP_CMP_BRANCH */ - struct { - CmpOp op; - Operand a, b; - Label vlabel; - } cmp_branch; - - /* WOP_SCOPE_BEGIN */ - struct { - CGScopeDesc desc; - CGScope vscope; - } scope_begin; - - /* WOP_SCOPE_ELSE / WOP_SCOPE_END / WOP_BREAK_TO / WOP_CONTINUE_TO */ - struct { - CGScope vscope; - } scope_op; - - /* WOP_LOAD_IMM */ - struct { - Operand dst; - i64 imm; - } load_imm; - - /* WOP_LOAD_CONST */ - struct { - Operand dst; - ConstBytes cb; - } load_const; - - /* WOP_COPY / WOP_ADDR_OF / WOP_VA_COPY */ - struct { - Operand dst; - Operand src; - } copy; - - /* WOP_LOAD */ - struct { - Operand dst; - Operand addr; - MemAccess mem; - } load; - - /* WOP_STORE */ - struct { - Operand addr; - Operand src; - MemAccess mem; - } store; - - /* WOP_TLS_ADDR_OF */ - struct { - Operand dst; - ObjSymId sym; - i64 addend; - } tls_addr_of; - - /* WOP_COPY_BYTES / WOP_SET_BYTES */ - struct { - Operand a; - Operand b; - AggregateAccess agg; - } agg; - - /* WOP_BITFIELD_LOAD */ - struct { - Operand dst; - Operand record; - BitFieldAccess bf; - } bitfield_load; - - /* WOP_BITFIELD_STORE */ - struct { - Operand record; - Operand src; - BitFieldAccess bf; - } bitfield_store; - - /* WOP_BINOP */ - struct { - BinOp op; - Operand dst, a, b; - } binop; - - /* WOP_UNOP */ - struct { - UnOp op; - Operand dst, a; - } unop; - - /* WOP_CMP */ - struct { - CmpOp op; - Operand dst, a, b; - } cmp; - - /* WOP_CONVERT */ - struct { - ConvKind kind; - Operand dst, src; - } convert; - - /* WOP_CALL: deep-copied descriptor and inner arrays. */ - struct { - CGCallDesc desc; - CGABIValue* args; /* len = desc.nargs */ - CGABIPart* ret_parts; /* len = desc.ret.nparts; NULL if 0 */ - CGABIPart** arg_parts; /* per-arg parts arrays; entry is NULL if 0 */ - } call; - - /* WOP_RET: present == 1 means there is a CGABIValue; otherwise a - * void return. parts is deep-copied. */ - struct { - u8 present; - CGABIValue val; - CGABIPart* parts; /* len = val.nparts */ - } ret; - - /* WOP_ALLOCA */ - struct { - Operand dst; - Operand size; - u32 align; - } alloca_; - - /* WOP_VA_START / WOP_VA_END */ - struct { - Operand ap; - } va_se; - - /* WOP_VA_ARG */ - struct { - Operand dst; - Operand ap; - const Type* ty; - } va_arg_; - - /* WOP_SETJMP */ - struct { - Operand dst; - Operand buf; - } setjmp_; - - /* WOP_LONGJMP */ - struct { - Operand buf; - Operand val; - } longjmp_; - - /* WOP_ATOMIC_LOAD */ - struct { - Operand dst; - Operand addr; - MemAccess mem; - MemOrder mo; - } atomic_load; - - /* WOP_ATOMIC_STORE */ - struct { - Operand addr; - Operand src; - MemAccess mem; - MemOrder mo; - } atomic_store; - - /* WOP_ATOMIC_RMW */ - struct { - AtomicOp op; - Operand dst; - Operand addr; - Operand val; - MemAccess mem; - MemOrder mo; - } atomic_rmw; - - /* WOP_ATOMIC_CAS */ - struct { - Operand prior; - Operand ok; - Operand addr; - Operand expected; - Operand desired; - MemAccess mem; - MemOrder success; - MemOrder failure; - } atomic_cas; - - /* WOP_FENCE */ - struct { - MemOrder mo; - } fence; - - /* WOP_INTRINSIC */ - struct { - IntrinKind kind; - Operand* dsts; /* deep-copied */ - u32 ndst; - Operand* args; /* deep-copied */ - u32 narg; - } intrinsic; - - /* WOP_SET_LOC */ - struct { - SrcLoc loc; - } set_loc; - } u; -} TapeEntry; +#include "opt/ir.h" /* ---- wrapper state ---- */ typedef struct OptImpl { CGTarget base; - CGTarget* target; /* wrapped */ + CGTarget* target; int level; Compiler* c; - /* Tape: per-function, reset on func_begin. Allocated from c->tu so - * the buffer survives panic via compiler_defer cleanups. */ - TapeEntry* tape; - u32 ntape, tape_cap; - - /* Wrapper-local virtual id counters. 1-based; 0 reserved as NONE. - * Reset on each func_begin. */ - Reg next_vreg; - FrameSlot next_vslot; - Label next_vlabel; - CGScope next_vscope; - - /* Replay-time translation tables. Index by virtual id; entry 0 is - * the NONE sentinel and never referenced. Allocated lazily on first - * replay so peak size matches the largest function. */ - Reg* reg_map; - u32 reg_map_cap; - FrameSlot* slot_map; - u32 slot_map_cap; - Label* label_map; - u32 label_map_cap; - CGScope* scope_map; - u32 scope_map_cap; - - SrcLoc pending_loc; /* most recent set_loc; stamped onto each entry */ + /* Current function being recorded. NULL between functions. */ + Func* f; + u32 cur; /* current block id */ + SrcLoc pending_loc; /* most recent set_loc; stamped on each Inst */ - /* If non-NULL, dump the tape to this writer on each func_end (before - * replay). Used by cg-runner --dump-tape and ad-hoc debugging. */ Writer* dump_writer; } OptImpl; @@ -378,179 +44,94 @@ static OptImpl* impl_of(CGTarget* t) { return (OptImpl*)t; } static _Noreturn void panic_unsupported(OptImpl* o, const char* what) { SrcLoc loc = {0, 0, 0}; - compiler_panic(o->c, loc, "opt_cgtarget: %s called under unbounded virtuals", - what); + compiler_panic(o->c, loc, + "opt_cgtarget: %s called under unbounded virtuals", what); } -/* ---- tape append ---- */ +/* ---- recording helpers ---- */ -static TapeEntry* tape_append(OptImpl* o, TapeOpKind op) { - TapeEntry* e; - if (o->ntape == o->tape_cap) { - u32 ncap = o->tape_cap ? o->tape_cap * 2u : 64u; - TapeEntry* nb = arena_array(o->c->tu, TapeEntry, ncap); - if (o->tape) memcpy(nb, o->tape, sizeof(TapeEntry) * o->ntape); - o->tape = nb; - o->tape_cap = ncap; - } - e = &o->tape[o->ntape++]; - memset(e, 0, sizeof *e); - e->op = (u8)op; - e->loc = o->pending_loc; - return e; +static Inst* rec(OptImpl* o, IROp op) { + Inst* in = ir_emit(o->f, o->cur, op); + in->loc = o->pending_loc; + return in; } -/* ---- deep-copy helpers ---- */ +static void set_def(Func* f, Inst* in, u32 block, Val v, const Type* t) { + in->def = v; + in->type = t; + if (v != VAL_NONE && v < f->nvals) { + f->val_def_block[v] = block; + f->val_def_inst[v] = f->blocks[block].ninsts - 1u; + } +} -static CGParamDesc* copy_params(Compiler* c, const CGParamDesc* src, u32 n) { - CGParamDesc* dst; +static Operand* dup_opnds(Func* f, const Operand* src, u32 n) { if (!n) return NULL; - dst = arena_array(c->tu, CGParamDesc, n); - memcpy(dst, src, sizeof(CGParamDesc) * n); + Operand* dst = arena_array(f->arena, Operand, n); + memcpy(dst, src, sizeof(Operand) * n); return dst; } -static CGABIPart* copy_parts(Compiler* c, const CGABIPart* src, u32 n) { - CGABIPart* dst; - if (!n) return NULL; - dst = arena_array(c->tu, CGABIPart, n); - memcpy(dst, src, sizeof(CGABIPart) * n); - return dst; +static int cur_terminated(OptImpl* o) { + Block* b = &o->f->blocks[o->cur]; + if (b->nsucc > 0) return 1; + if (b->ninsts == 0) return 0; + IROp last = (IROp)b->insts[b->ninsts - 1].op; + return last == IR_RET || last == IR_LONGJMP; } -static Operand* copy_operands(Compiler* c, const Operand* src, u32 n) { - Operand* dst; - if (!n) return NULL; - dst = arena_array(c->tu, Operand, n); - memcpy(dst, src, sizeof(Operand) * n); - return dst; +static void set_cur(OptImpl* o, u32 b) { + o->cur = b; + ir_note_emit(o->f, b); } -/* ---- map helpers (replay-time) ---- - * The maps are direct-indexed by the 1-based virtual id; entry 0 is - * the NONE sentinel. */ - -static void map_reg_grow(OptImpl* o, u32 needed) { - u32 ncap; - Reg* nb; - if (needed <= o->reg_map_cap) return; - ncap = o->reg_map_cap ? o->reg_map_cap : 16u; - while (ncap < needed) ncap *= 2u; - nb = arena_array(o->c->tu, Reg, ncap); - if (o->reg_map) memcpy(nb, o->reg_map, sizeof(Reg) * o->reg_map_cap); - /* New slots default to REG_NONE (0xffffffff). */ - for (u32 i = o->reg_map_cap; i < ncap; ++i) nb[i] = REG_NONE; - o->reg_map = nb; - o->reg_map_cap = ncap; -} - -static void map_slot_grow(OptImpl* o, u32 needed) { - u32 ncap; - FrameSlot* nb; - if (needed <= o->slot_map_cap) return; - ncap = o->slot_map_cap ? o->slot_map_cap : 16u; - while (ncap < needed) ncap *= 2u; - nb = arena_array(o->c->tu, FrameSlot, ncap); - if (o->slot_map) memcpy(nb, o->slot_map, sizeof(FrameSlot) * o->slot_map_cap); - for (u32 i = o->slot_map_cap; i < ncap; ++i) nb[i] = FRAME_SLOT_NONE; - o->slot_map = nb; - o->slot_map_cap = ncap; -} - -static void map_label_grow(OptImpl* o, u32 needed) { - u32 ncap; - Label* nb; - if (needed <= o->label_map_cap) return; - ncap = o->label_map_cap ? o->label_map_cap : 16u; - while (ncap < needed) ncap *= 2u; - nb = arena_array(o->c->tu, Label, ncap); - if (o->label_map) memcpy(nb, o->label_map, sizeof(Label) * o->label_map_cap); - for (u32 i = o->label_map_cap; i < ncap; ++i) nb[i] = LABEL_NONE; - o->label_map = nb; - o->label_map_cap = ncap; -} - -static void map_scope_grow(OptImpl* o, u32 needed) { - u32 ncap; - CGScope* nb; - if (needed <= o->scope_map_cap) return; - ncap = o->scope_map_cap ? o->scope_map_cap : 8u; - while (ncap < needed) ncap *= 2u; - nb = arena_array(o->c->tu, CGScope, ncap); - if (o->scope_map) memcpy(nb, o->scope_map, sizeof(CGScope) * o->scope_map_cap); - for (u32 i = o->scope_map_cap; i < ncap; ++i) nb[i] = CG_SCOPE_NONE; - o->scope_map = nb; - o->scope_map_cap = ncap; -} - -/* ---- recording: every emit-side method records a tape entry. - * - * Allocator methods (alloc_reg, frame_slot, label_new, scope_begin) - * additionally hand back a wrapper-local virtual id; the underlying - * target is not consulted until replay. */ +/* After emitting a terminator, allocate a fresh block for any + * subsequent (likely unreachable) recording. */ +static void after_terminator(OptImpl* o) { + set_cur(o, ir_block_new(o->f)); +} + +/* ---- function lifecycle ---- */ static void w_func_begin(CGTarget* t, const CGFuncDesc* fd) { OptImpl* o = impl_of(t); - TapeEntry* e; - - /* Reset per-function state. */ - o->tape = NULL; - o->ntape = 0; - o->tape_cap = 0; - o->next_vreg = 1; - o->next_vslot = 1; - o->next_vlabel = 1; - o->next_vscope = 1; + o->f = ir_func_new(o->c, fd); + u32 entry = ir_block_new(o->f); + o->f->entry = entry; + set_cur(o, entry); o->pending_loc = (SrcLoc){0, 0, 0}; - /* Reset translation maps; capacities are kept for amortization. */ - for (u32 i = 0; i < o->reg_map_cap; ++i) o->reg_map[i] = REG_NONE; - for (u32 i = 0; i < o->slot_map_cap; ++i) o->slot_map[i] = FRAME_SLOT_NONE; - for (u32 i = 0; i < o->label_map_cap; ++i) o->label_map[i] = LABEL_NONE; - for (u32 i = 0; i < o->scope_map_cap; ++i) o->scope_map[i] = CG_SCOPE_NONE; - - e = tape_append(o, TOP_FUNC_BEGIN); - /* Shallow-copy the descriptor by value, then deep-copy the params - * array — the harness mutates pds[i].slot AFTER func_begin returns, - * so we can't rely on pointer-shallow-copy for that field. The slots - * we record here are wrapper vslots (allocated by w_frame_slot in the - * subsequent param-setup loop); replay translates them. */ - e->u.func_begin.desc = *fd; - e->u.func_begin.params = copy_params(o->c, fd->params, fd->nparams); - e->u.func_begin.desc.params = e->u.func_begin.params; } static void w_func_end(CGTarget* t); +/* ---- registers and frame slots ---- */ + static Reg w_alloc_reg(CGTarget* t, RegClass cls, const Type* ty) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_ALLOC_REG); - Reg vreg = o->next_vreg++; - e->u.alloc_reg.cls = cls; - e->u.alloc_reg.ty = ty; - e->u.alloc_reg.vreg = vreg; - return vreg; + Val v = ir_alloc_val(o->f, ty, (u8)cls); + return (Reg)v; } static void w_free_reg(CGTarget* t, Reg r) { - /* Hint; opt_cgtarget ignores. The wrapper's vregs are unbounded — - * there is no pool to return to. */ (void)t; (void)r; } static FrameSlot w_frame_slot(CGTarget* t, const FrameSlotDesc* d) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_FRAME_SLOT); - FrameSlot vslot = o->next_vslot++; - e->u.frame_slot.desc = *d; - e->u.frame_slot.vslot = vslot; - return vslot; + return ir_frame_slot_new(o->f, d); } static void w_param(CGTarget* t, const CGParamDesc* d) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_PARAM); - e->u.param.desc = *d; + /* Deep-copy parts so caller-stack memory isn't relied on. */ + CGParamDesc copy = *d; + if (d->nincoming) { + CGABIPart* parts = arena_array(o->f->arena, CGABIPart, d->nincoming); + memcpy(parts, d->incoming, sizeof(CGABIPart) * d->nincoming); + copy.incoming = parts; + } + ir_param_add(o->f, &copy); } static const Reg* w_clobbers(CGTarget* t, RegClass cls, u32* nregs) { @@ -571,317 +152,541 @@ static void w_reload_reg(CGTarget* t, Operand dst, FrameSlot s, MemAccess m) { panic_unsupported(impl_of(t), "reload_reg"); } +/* ---- labels and control flow ---- */ + static Label w_label_new(CGTarget* t) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_LABEL_NEW); - Label v = o->next_vlabel++; - e->u.label_new.vlabel = v; - return v; + return (Label)ir_block_new(o->f); } static void w_label_place(CGTarget* t, Label l) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_LABEL_PLACE); - e->u.label_op.vlabel = l; + u32 target_blk = (u32)l; + if (target_blk >= o->f->nblocks) { + SrcLoc loc = {0, 0, 0}; + compiler_panic(o->c, loc, "opt: label_place(%u) out of range", + (unsigned)l); + } + if (!cur_terminated(o)) { + Block* cb = &o->f->blocks[o->cur]; + rec(o, IR_BR); + cb->succ[0] = target_blk; + cb->nsucc = 1; + } + set_cur(o, target_blk); } + static void w_jump(CGTarget* t, Label l) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_JUMP); - e->u.label_op.vlabel = l; + u32 target_blk = (u32)l; + Block* cb = &o->f->blocks[o->cur]; + rec(o, IR_BR); + cb->succ[0] = target_blk; + cb->nsucc = 1; + after_terminator(o); } + static void w_cmp_branch(CGTarget* t, CmpOp op, Operand a, Operand b, Label l) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_CMP_BRANCH); - e->u.cmp_branch.op = op; - e->u.cmp_branch.a = a; - e->u.cmp_branch.b = b; - e->u.cmp_branch.vlabel = l; + u32 taken = (u32)l; + Inst* in = rec(o, IR_CMP_BRANCH); + Operand ops[2] = {a, b}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + in->extra.imm = (i64)op; + Block* cb = &o->f->blocks[o->cur]; + cb->succ[0] = taken; + u32 ft = ir_block_new(o->f); + cb->succ[1] = ft; + cb->nsucc = 2; + set_cur(o, ft); +} + +/* ---- structured scopes ---- */ + +static u32 scope_register(Func* f, Inst* in) { + if (f->nscopes == f->scopes_cap) { + u32 ncap = f->scopes_cap ? f->scopes_cap * 2u : 4u; + Inst** nb = arena_zarray(f->arena, Inst*, ncap); + if (f->scope_aux_inst) + memcpy(nb, f->scope_aux_inst, sizeof(Inst*) * f->nscopes); + f->scope_aux_inst = nb; + f->scopes_cap = ncap; + } + f->scope_aux_inst[f->nscopes++] = in; + return f->nscopes; +} + +static IRScopeAux* scope_lookup(OptImpl* o, CGScope s) { + if (s == CG_SCOPE_NONE || s > o->f->nscopes) { + SrcLoc loc = {0, 0, 0}; + compiler_panic(o->c, loc, "opt: bad scope id %u", (unsigned)s); + } + return (IRScopeAux*)o->f->scope_aux_inst[s - 1]->extra.aux; } static CGScope w_scope_begin(CGTarget* t, const CGScopeDesc* d) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_SCOPE_BEGIN); - CGScope v = o->next_vscope++; - e->u.scope_begin.desc = *d; - e->u.scope_begin.vscope = v; - return v; + Inst* in = rec(o, IR_SCOPE_BEGIN); + IRScopeAux* aux = arena_znew(o->f->arena, IRScopeAux); + aux->desc = *d; + in->extra.aux = aux; + u32 sid = scope_register(o->f, in); + aux->scope_id = sid; + + if (d->kind == SCOPE_IF) { + aux->if_then_block = ir_block_new(o->f); + aux->if_else_block = ir_block_new(o->f); + aux->if_end_block = ir_block_new(o->f); + Block* cb = &o->f->blocks[o->cur]; + cb->succ[0] = aux->if_then_block; + cb->succ[1] = aux->if_else_block; + cb->nsucc = 2; + set_cur(o, aux->if_then_block); + } else if (d->kind == SCOPE_LOOP || d->kind == SCOPE_BLOCK) { + aux->loop_break_block = + d->break_label != LABEL_NONE ? (u32)d->break_label : 0; + aux->loop_continue_block = + d->continue_label != LABEL_NONE ? (u32)d->continue_label : 0; + } + return (CGScope)sid; } + static void w_scope_else(CGTarget* t, CGScope s) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_SCOPE_ELSE); - e->u.scope_op.vscope = s; + IRScopeAux* aux = scope_lookup(o, s); + if (aux->desc.kind != SCOPE_IF) { + SrcLoc loc = {0, 0, 0}; + compiler_panic(o->c, loc, "opt: scope_else on non-IF scope %u", + (unsigned)s); + } + Inst* in = rec(o, IR_SCOPE_ELSE); + in->extra.imm = (i64)s; + if (!cur_terminated(o)) { + Block* cb = &o->f->blocks[o->cur]; + cb->succ[0] = aux->if_end_block; + cb->nsucc = 1; + } + aux->if_has_else = 1; + set_cur(o, aux->if_else_block); } + static void w_scope_end(CGTarget* t, CGScope s) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_SCOPE_END); - e->u.scope_op.vscope = s; + IRScopeAux* aux = scope_lookup(o, s); + Inst* in = rec(o, IR_SCOPE_END); + in->extra.imm = (i64)s; + if (aux->desc.kind == SCOPE_IF) { + if (!cur_terminated(o)) { + Block* cb = &o->f->blocks[o->cur]; + cb->succ[0] = aux->if_end_block; + cb->nsucc = 1; + } + if (!aux->if_has_else) { + Block* eb = &o->f->blocks[aux->if_else_block]; + if (eb->nsucc == 0) { + eb->succ[0] = aux->if_end_block; + eb->nsucc = 1; + } + /* Else block was never visited as cur, but it has code (the + * fall-through from scope_begin) — record it before end so emit + * order has it. */ + ir_note_emit(o->f, aux->if_else_block); + } + set_cur(o, aux->if_end_block); + } } + static void w_break_to(CGTarget* t, CGScope s) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_BREAK_TO); - e->u.scope_op.vscope = s; + IRScopeAux* aux = scope_lookup(o, s); + Inst* in = rec(o, IR_BREAK_TO); + in->extra.imm = (i64)s; + Block* cb = &o->f->blocks[o->cur]; + cb->succ[0] = aux->loop_break_block; + cb->nsucc = 1; + after_terminator(o); } + static void w_continue_to(CGTarget* t, CGScope s) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_CONTINUE_TO); - e->u.scope_op.vscope = s; + IRScopeAux* aux = scope_lookup(o, s); + Inst* in = rec(o, IR_CONTINUE_TO); + in->extra.imm = (i64)s; + Block* cb = &o->f->blocks[o->cur]; + cb->succ[0] = aux->loop_continue_block; + cb->nsucc = 1; + after_terminator(o); } +/* ---- data movement ---- */ + static void w_load_imm(CGTarget* t, Operand dst, i64 imm) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_LOAD_IMM); - e->u.load_imm.dst = dst; - e->u.load_imm.imm = imm; + Inst* in = rec(o, IR_LOAD_IMM); + Operand ops[1] = {dst}; + in->opnds = dup_opnds(o->f, ops, 1); + in->nopnds = 1; + in->extra.imm = imm; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_load_const(CGTarget* t, Operand dst, ConstBytes cb) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_LOAD_CONST); - e->u.load_const.dst = dst; - e->u.load_const.cb = cb; + Inst* in = rec(o, IR_LOAD_CONST); + Operand ops[1] = {dst}; + in->opnds = dup_opnds(o->f, ops, 1); + in->nopnds = 1; + in->extra.cbytes = cb; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_copy(CGTarget* t, Operand dst, Operand src) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_COPY); - e->u.copy.dst = dst; - e->u.copy.src = src; + Inst* in = rec(o, IR_COPY); + Operand ops[2] = {dst, src}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_load(CGTarget* t, Operand dst, Operand addr, MemAccess m) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_LOAD); - e->u.load.dst = dst; - e->u.load.addr = addr; - e->u.load.mem = m; + Inst* in = rec(o, IR_LOAD); + Operand ops[2] = {dst, addr}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + in->extra.mem = m; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_store(CGTarget* t, Operand addr, Operand src, MemAccess m) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_STORE); - e->u.store.addr = addr; - e->u.store.src = src; - e->u.store.mem = m; + Inst* in = rec(o, IR_STORE); + Operand ops[2] = {addr, src}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + in->extra.mem = m; } + static void w_addr_of(CGTarget* t, Operand dst, Operand lv) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_ADDR_OF); - e->u.copy.dst = dst; - e->u.copy.src = lv; + Inst* in = rec(o, IR_ADDR_OF); + Operand ops[2] = {dst, lv}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_tls_addr_of(CGTarget* t, Operand dst, ObjSymId sym, i64 addend) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_TLS_ADDR_OF); - e->u.tls_addr_of.dst = dst; - e->u.tls_addr_of.sym = sym; - e->u.tls_addr_of.addend = addend; + Inst* in = rec(o, IR_TLS_ADDR_OF); + Operand ops[1] = {dst}; + in->opnds = dup_opnds(o->f, ops, 1); + in->nopnds = 1; + IRTlsAux* aux = arena_znew(o->f->arena, IRTlsAux); + aux->sym = sym; + aux->addend = addend; + in->extra.aux = aux; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_copy_bytes(CGTarget* t, Operand dst, Operand src, AggregateAccess agg) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_COPY_BYTES); - e->u.agg.a = dst; - e->u.agg.b = src; - e->u.agg.agg = agg; + Inst* in = rec(o, IR_AGG_COPY); + Operand ops[2] = {dst, src}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + IRAggAux* aux = arena_znew(o->f->arena, IRAggAux); + aux->access = agg; + in->extra.aux = aux; } + static void w_set_bytes(CGTarget* t, Operand dst, Operand byte, AggregateAccess agg) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_SET_BYTES); - e->u.agg.a = dst; - e->u.agg.b = byte; - e->u.agg.agg = agg; + Inst* in = rec(o, IR_AGG_SET); + Operand ops[2] = {dst, byte}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + IRAggAux* aux = arena_znew(o->f->arena, IRAggAux); + aux->access = agg; + in->extra.aux = aux; } + static void w_bitfield_load(CGTarget* t, Operand dst, Operand record, BitFieldAccess bf) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_BITFIELD_LOAD); - e->u.bitfield_load.dst = dst; - e->u.bitfield_load.record = record; - e->u.bitfield_load.bf = bf; + Inst* in = rec(o, IR_BITFIELD_LOAD); + Operand ops[2] = {dst, record}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + IRBitFieldAux* aux = arena_znew(o->f->arena, IRBitFieldAux); + aux->access = bf; + in->extra.aux = aux; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_bitfield_store(CGTarget* t, Operand record, Operand src, BitFieldAccess bf) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_BITFIELD_STORE); - e->u.bitfield_store.record = record; - e->u.bitfield_store.src = src; - e->u.bitfield_store.bf = bf; + Inst* in = rec(o, IR_BITFIELD_STORE); + Operand ops[2] = {record, src}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + IRBitFieldAux* aux = arena_znew(o->f->arena, IRBitFieldAux); + aux->access = bf; + in->extra.aux = aux; } +/* ---- arithmetic / cmp / convert ---- */ + static void w_binop(CGTarget* t, BinOp op, Operand dst, Operand a, Operand b) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_BINOP); - e->u.binop.op = op; - e->u.binop.dst = dst; - e->u.binop.a = a; - e->u.binop.b = b; + Inst* in = rec(o, IR_BINOP); + Operand ops[3] = {dst, a, b}; + in->opnds = dup_opnds(o->f, ops, 3); + in->nopnds = 3; + in->extra.imm = (i64)op; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_unop(CGTarget* t, UnOp op, Operand dst, Operand a) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_UNOP); - e->u.unop.op = op; - e->u.unop.dst = dst; - e->u.unop.a = a; + Inst* in = rec(o, IR_UNOP); + Operand ops[2] = {dst, a}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + in->extra.imm = (i64)op; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_cmp(CGTarget* t, CmpOp op, Operand dst, Operand a, Operand b) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_CMP); - e->u.cmp.op = op; - e->u.cmp.dst = dst; - e->u.cmp.a = a; - e->u.cmp.b = b; + Inst* in = rec(o, IR_CMP); + Operand ops[3] = {dst, a, b}; + in->opnds = dup_opnds(o->f, ops, 3); + in->nopnds = 3; + in->extra.imm = (i64)op; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_convert(CGTarget* t, ConvKind k, Operand dst, Operand src) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_CONVERT); - e->u.convert.kind = k; - e->u.convert.dst = dst; - e->u.convert.src = src; + Inst* in = rec(o, IR_CONVERT); + Operand ops[2] = {dst, src}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + in->extra.imm = (i64)k; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); +} + +/* ---- calls / return ---- */ + +static CGABIPart* dup_parts(Arena* a, const CGABIPart* src, u32 n) { + if (!n) return NULL; + CGABIPart* dst = arena_array(a, CGABIPart, n); + memcpy(dst, src, sizeof(CGABIPart) * n); + return dst; } static void w_call(CGTarget* t, const CGCallDesc* d) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_CALL); - CGABIValue* args_copy = NULL; - CGABIPart** arg_parts_copy = NULL; - CGABIPart* ret_parts_copy = NULL; - u32 i; - - /* Deep-copy the argv. Caller-owned d may be on the stack, and - * args[i].parts may be too. */ + Inst* in = rec(o, IR_CALL); + IRCallAux* aux = arena_znew(o->f->arena, IRCallAux); + aux->desc = *d; if (d->nargs) { - args_copy = arena_array(o->c->tu, CGABIValue, d->nargs); - arg_parts_copy = arena_array(o->c->tu, CGABIPart*, d->nargs); - for (i = 0; i < d->nargs; ++i) { - args_copy[i] = d->args[i]; - arg_parts_copy[i] = - copy_parts(o->c, d->args[i].parts, d->args[i].nparts); - args_copy[i].parts = arg_parts_copy[i]; + CGABIValue* args = arena_array(o->f->arena, CGABIValue, d->nargs); + for (u32 i = 0; i < d->nargs; ++i) { + args[i] = d->args[i]; + args[i].parts = + dup_parts(o->f->arena, d->args[i].parts, d->args[i].nparts); } + aux->desc.args = args; } - ret_parts_copy = copy_parts(o->c, d->ret.parts, d->ret.nparts); - - e->u.call.desc = *d; - e->u.call.desc.args = args_copy; - e->u.call.desc.ret.parts = ret_parts_copy; - e->u.call.args = args_copy; - e->u.call.arg_parts = arg_parts_copy; - e->u.call.ret_parts = ret_parts_copy; + aux->desc.ret = d->ret; + aux->desc.ret.parts = dup_parts(o->f->arena, d->ret.parts, d->ret.nparts); + in->extra.aux = aux; + in->type = d->fn_type; } static void w_ret(CGTarget* t, const CGABIValue* v) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_RET); - if (!v) { - e->u.ret.present = 0; - return; + Inst* in = rec(o, IR_RET); + IRRetAux* aux = arena_znew(o->f->arena, IRRetAux); + if (v) { + aux->present = 1; + aux->val = *v; + aux->val.parts = dup_parts(o->f->arena, v->parts, v->nparts); } - e->u.ret.present = 1; - e->u.ret.val = *v; - e->u.ret.parts = copy_parts(o->c, v->parts, v->nparts); - e->u.ret.val.parts = e->u.ret.parts; + in->extra.aux = aux; + Block* cb = &o->f->blocks[o->cur]; + cb->nsucc = 0; + after_terminator(o); } +/* ---- alloca / variadics / setjmp / atomics / fence / intrinsic ---- */ + static void w_alloca_(CGTarget* t, Operand dst, Operand size, u32 align) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_ALLOCA); - e->u.alloca_.dst = dst; - e->u.alloca_.size = size; - e->u.alloca_.align = align; + Inst* in = rec(o, IR_ALLOCA); + Operand ops[2] = {dst, size}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + in->extra.imm = (i64)align; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } static void w_va_start_(CGTarget* t, Operand ap) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_VA_START); - e->u.va_se.ap = ap; + Inst* in = rec(o, IR_VA_START); + Operand ops[1] = {ap}; + in->opnds = dup_opnds(o->f, ops, 1); + in->nopnds = 1; } + static void w_va_arg_(CGTarget* t, Operand dst, Operand ap, const Type* ty) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_VA_ARG); - e->u.va_arg_.dst = dst; - e->u.va_arg_.ap = ap; - e->u.va_arg_.ty = ty; + Inst* in = rec(o, IR_VA_ARG); + Operand ops[2] = {dst, ap}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + in->extra.aux = (void*)ty; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_va_end_(CGTarget* t, Operand ap) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_VA_END); - e->u.va_se.ap = ap; + Inst* in = rec(o, IR_VA_END); + Operand ops[1] = {ap}; + in->opnds = dup_opnds(o->f, ops, 1); + in->nopnds = 1; } + static void w_va_copy_(CGTarget* t, Operand dst, Operand src) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_VA_COPY); - e->u.copy.dst = dst; - e->u.copy.src = src; + Inst* in = rec(o, IR_VA_COPY); + Operand ops[2] = {dst, src}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; } static void w_setjmp_(CGTarget* t, Operand dst, Operand buf) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_SETJMP); - e->u.setjmp_.dst = dst; - e->u.setjmp_.buf = buf; + Inst* in = rec(o, IR_SETJMP); + Operand ops[2] = {dst, buf}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_longjmp_(CGTarget* t, Operand buf, Operand val) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_LONGJMP); - e->u.longjmp_.buf = buf; - e->u.longjmp_.val = val; + Inst* in = rec(o, IR_LONGJMP); + Operand ops[2] = {buf, val}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + Block* cb = &o->f->blocks[o->cur]; + cb->nsucc = 0; + after_terminator(o); } static void w_atomic_load(CGTarget* t, Operand dst, Operand addr, MemAccess m, MemOrder mo) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_ATOMIC_LOAD); - e->u.atomic_load.dst = dst; - e->u.atomic_load.addr = addr; - e->u.atomic_load.mem = m; - e->u.atomic_load.mo = mo; + Inst* in = rec(o, IR_ATOMIC_LOAD); + Operand ops[2] = {dst, addr}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + IRAtomicAux* aux = arena_znew(o->f->arena, IRAtomicAux); + aux->mem = m; + aux->mo = mo; + in->extra.aux = aux; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_atomic_store(CGTarget* t, Operand addr, Operand src, MemAccess m, MemOrder mo) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_ATOMIC_STORE); - e->u.atomic_store.addr = addr; - e->u.atomic_store.src = src; - e->u.atomic_store.mem = m; - e->u.atomic_store.mo = mo; + Inst* in = rec(o, IR_ATOMIC_STORE); + Operand ops[2] = {addr, src}; + in->opnds = dup_opnds(o->f, ops, 2); + in->nopnds = 2; + IRAtomicAux* aux = arena_znew(o->f->arena, IRAtomicAux); + aux->mem = m; + aux->mo = mo; + in->extra.aux = aux; } + static void w_atomic_rmw(CGTarget* t, AtomicOp op, Operand dst, Operand addr, Operand val, MemAccess m, MemOrder mo) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_ATOMIC_RMW); - e->u.atomic_rmw.op = op; - e->u.atomic_rmw.dst = dst; - e->u.atomic_rmw.addr = addr; - e->u.atomic_rmw.val = val; - e->u.atomic_rmw.mem = m; - e->u.atomic_rmw.mo = mo; + Inst* in = rec(o, IR_ATOMIC_RMW); + Operand ops[3] = {dst, addr, val}; + in->opnds = dup_opnds(o->f, ops, 3); + in->nopnds = 3; + IRAtomicAux* aux = arena_znew(o->f->arena, IRAtomicAux); + aux->mem = m; + aux->mo = mo; + aux->op = (u8)op; + in->extra.aux = aux; + if (dst.kind == OPK_REG) set_def(o->f, in, o->cur, (Val)dst.v.reg, dst.type); } + static void w_atomic_cas(CGTarget* t, Operand prior, Operand ok, Operand addr, Operand expected, Operand desired, MemAccess m, MemOrder s, MemOrder f) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_ATOMIC_CAS); - e->u.atomic_cas.prior = prior; - e->u.atomic_cas.ok = ok; - e->u.atomic_cas.addr = addr; - e->u.atomic_cas.expected = expected; - e->u.atomic_cas.desired = desired; - e->u.atomic_cas.mem = m; - e->u.atomic_cas.success = s; - e->u.atomic_cas.failure = f; + Inst* in = rec(o, IR_ATOMIC_CAS); + Operand ops[5] = {prior, ok, addr, expected, desired}; + in->opnds = dup_opnds(o->f, ops, 5); + in->nopnds = 5; + IRCasAux* aux = arena_znew(o->f->arena, IRCasAux); + aux->mem = m; + aux->success = s; + aux->failure = f; + in->extra.aux = aux; + if (prior.kind == OPK_REG) + set_def(o->f, in, o->cur, (Val)prior.v.reg, prior.type); } + static void w_fence(CGTarget* t, MemOrder mo) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_FENCE); - e->u.fence.mo = mo; + Inst* in = rec(o, IR_FENCE); + in->extra.imm = (i64)mo; } -static void w_intrinsic(CGTarget* t, IntrinKind k, Operand* dsts, u32 nd, +static void w_intrinsic(CGTarget* t, IntrinKind kind, Operand* dsts, u32 nd, const Operand* args, u32 na) { OptImpl* o = impl_of(t); - TapeEntry* e = tape_append(o, TOP_INTRINSIC); - e->u.intrinsic.kind = k; - e->u.intrinsic.ndst = nd; - e->u.intrinsic.narg = na; - e->u.intrinsic.dsts = copy_operands(o->c, dsts, nd); - e->u.intrinsic.args = copy_operands(o->c, args, na); + Inst* in = rec(o, IR_INTRINSIC); + IRIntrinAux* aux = arena_znew(o->f->arena, IRIntrinAux); + aux->kind = kind; + aux->ndst = nd; + aux->narg = na; + aux->dsts = nd ? arena_array(o->f->arena, Operand, nd) : NULL; + aux->args = na ? arena_array(o->f->arena, Operand, na) : NULL; + if (nd) memcpy(aux->dsts, dsts, sizeof(Operand) * nd); + if (na) memcpy(aux->args, args, sizeof(Operand) * na); + in->extra.aux = aux; + if (nd == 1 && dsts[0].kind == OPK_REG) { + set_def(o->f, in, o->cur, (Val)dsts[0].v.reg, dsts[0].type); + } else if (nd > 1) { + in->ndefs = nd; + in->defs = arena_array(o->f->arena, Val, nd); + for (u32 i = 0; i < nd; ++i) { + in->defs[i] = + (dsts[i].kind == OPK_REG) ? (Val)dsts[i].v.reg : VAL_NONE; + if (in->defs[i] != VAL_NONE && in->defs[i] < o->f->nvals) { + o->f->val_def_block[in->defs[i]] = o->cur; + o->f->val_def_inst[in->defs[i]] = + o->f->blocks[o->cur].ninsts - 1u; + } + } + in->def = in->defs[0]; + in->type = dsts[0].type; + } } static void w_asm_block(CGTarget* t, const char* tmpl, @@ -897,877 +702,473 @@ static void w_asm_block(CGTarget* t, const char* tmpl, (void)in_ops; (void)clobbers; (void)nclob; - /* Group M (inline asm) is deferred in the corpus; the wrapper does - * not yet support it. */ panic_unsupported(impl_of(t), "asm_block"); } static void w_set_loc(CGTarget* t, SrcLoc loc) { OptImpl* o = impl_of(t); - TapeEntry* e; o->pending_loc = loc; - e = tape_append(o, TOP_SET_LOC); - e->u.set_loc.loc = loc; } -/* ---- replay-time translation ---- */ +/* ============================================================ + * Replay: walk the recorded Func and emit to the wrapped target. + * ============================================================ */ -static Reg xlat_reg(OptImpl* o, Reg vreg) { - if (vreg == REG_NONE || vreg == 0u) return vreg; - if (vreg >= o->reg_map_cap || o->reg_map[vreg] == REG_NONE) { - SrcLoc loc = {0, 0, 0}; - compiler_panic(o->c, loc, "opt replay: unmapped vreg %u", (unsigned)vreg); - } - return o->reg_map[vreg]; -} - -static FrameSlot xlat_slot(OptImpl* o, FrameSlot vs) { - if (vs == FRAME_SLOT_NONE) return FRAME_SLOT_NONE; - if (vs >= o->slot_map_cap || o->slot_map[vs] == FRAME_SLOT_NONE) { +typedef struct ReplayCtx { + OptImpl* o; + CGTarget* tgt; + Reg* val_to_reg; + FrameSlot* slot_map; + Label* label_map; + CGScope* scope_map; + u8* val_alloced; + u8* block_label_placed; +} ReplayCtx; + +static Reg val_to_target_reg(ReplayCtx* r, Val v) { + Func* f = r->o->f; + if (v == VAL_NONE) return REG_NONE; + if (v >= f->nvals) { SrcLoc loc = {0, 0, 0}; - compiler_panic(o->c, loc, "opt replay: unmapped vslot %u", (unsigned)vs); + compiler_panic(r->o->c, loc, "opt replay: Val %u out of range", v); } - return o->slot_map[vs]; -} - -static Label xlat_label(OptImpl* o, Label vl) { - if (vl == LABEL_NONE) return LABEL_NONE; - if (vl >= o->label_map_cap || o->label_map[vl] == LABEL_NONE) { - SrcLoc loc = {0, 0, 0}; - compiler_panic(o->c, loc, "opt replay: unmapped vlabel %u", (unsigned)vl); + if (!r->val_alloced[v]) { + r->val_to_reg[v] = + r->tgt->alloc_reg(r->tgt, (RegClass)f->val_cls[v], f->val_type[v]); + r->val_alloced[v] = 1; } - return o->label_map[vl]; + return r->val_to_reg[v]; } -static CGScope xlat_scope(OptImpl* o, CGScope vs) { - if (vs == CG_SCOPE_NONE) return CG_SCOPE_NONE; - if (vs >= o->scope_map_cap || o->scope_map[vs] == CG_SCOPE_NONE) { +static FrameSlot slot_to_target(ReplayCtx* r, FrameSlot vs) { + if (vs == FRAME_SLOT_NONE) return FRAME_SLOT_NONE; + if (vs >= r->o->f->nframe_slots + 1u) { SrcLoc loc = {0, 0, 0}; - compiler_panic(o->c, loc, "opt replay: unmapped vscope %u", (unsigned)vs); + compiler_panic(r->o->c, loc, "opt replay: vslot %u out of range", + (unsigned)vs); } - return o->scope_map[vs]; + return r->slot_map[vs]; } -static Operand xlat_op(OptImpl* o, Operand op) { +static Operand xlat_op(ReplayCtx* r, Operand op) { switch ((OpKind)op.kind) { case OPK_IMM: case OPK_GLOBAL: return op; case OPK_REG: - op.v.reg = xlat_reg(o, op.v.reg); + op.v.reg = val_to_target_reg(r, (Val)op.v.reg); return op; case OPK_LOCAL: - op.v.frame_slot = xlat_slot(o, op.v.frame_slot); + op.v.frame_slot = slot_to_target(r, op.v.frame_slot); return op; case OPK_INDIRECT: - op.v.ind.base = xlat_reg(o, op.v.ind.base); + op.v.ind.base = val_to_target_reg(r, (Val)op.v.ind.base); return op; } - /* unreachable */ return op; } -static CGABIValue xlat_abivalue(OptImpl* o, const CGABIValue* in, +static CGABIValue xlat_abivalue(ReplayCtx* r, const CGABIValue* in, CGABIPart* parts_out) { CGABIValue out = *in; - out.storage = xlat_op(o, in->storage); + out.storage = xlat_op(r, in->storage); if (in->nparts && parts_out) { for (u32 i = 0; i < in->nparts; ++i) { parts_out[i] = in->parts[i]; - parts_out[i].op = xlat_op(o, in->parts[i].op); + parts_out[i].op = xlat_op(r, in->parts[i].op); } out.parts = parts_out; + } else { + out.parts = NULL; } return out; } -/* ---- replay ---- */ - -static void replay(OptImpl* o) { - CGTarget* w = o->target; - - /* Pre-size the maps to the high-water mark for this function. */ - if (o->next_vreg > 1) map_reg_grow(o, o->next_vreg); - if (o->next_vslot > 1) map_slot_grow(o, o->next_vslot); - if (o->next_vlabel > 1) map_label_grow(o, o->next_vlabel); - if (o->next_vscope > 1) map_scope_grow(o, o->next_vscope); - - for (u32 i = 0; i < o->ntape; ++i) { - TapeEntry* e = &o->tape[i]; - if (e->dead) continue; - switch ((TapeOpKind)e->op) { - case TOP_FUNC_BEGIN: { - /* Build a fresh CGFuncDesc with translated param slots. */ - CGFuncDesc fd = e->u.func_begin.desc; - if (fd.nparams) { - CGParamDesc* params = arena_array(o->c->tu, CGParamDesc, fd.nparams); - for (u32 k = 0; k < fd.nparams; ++k) { - params[k] = e->u.func_begin.params[k]; - params[k].slot = xlat_slot(o, e->u.func_begin.params[k].slot); - } - fd.params = params; - } - w->func_begin(w, &fd); - break; - } - case TOP_FUNC_END: - w->func_end(w); - break; - case TOP_ALLOC_REG: { - Reg r = - w->alloc_reg(w, e->u.alloc_reg.cls, e->u.alloc_reg.ty); - Reg v = e->u.alloc_reg.vreg; - if (v >= o->reg_map_cap) map_reg_grow(o, v + 1); - o->reg_map[v] = r; - break; - } - case TOP_FRAME_SLOT: { - FrameSlot s = w->frame_slot(w, &e->u.frame_slot.desc); - FrameSlot v = e->u.frame_slot.vslot; - if (v >= o->slot_map_cap) map_slot_grow(o, v + 1); - o->slot_map[v] = s; - break; - } - case TOP_PARAM: { - CGParamDesc d = e->u.param.desc; - d.slot = xlat_slot(o, d.slot); - w->param(w, &d); - break; - } - case TOP_LABEL_NEW: { - Label l = w->label_new(w); - Label v = e->u.label_new.vlabel; - if (v >= o->label_map_cap) map_label_grow(o, v + 1); - o->label_map[v] = l; - break; - } - case TOP_LABEL_PLACE: - w->label_place(w, xlat_label(o, e->u.label_op.vlabel)); - break; - case TOP_JUMP: - w->jump(w, xlat_label(o, e->u.label_op.vlabel)); - break; - case TOP_CMP_BRANCH: - w->cmp_branch(w, e->u.cmp_branch.op, xlat_op(o, e->u.cmp_branch.a), - xlat_op(o, e->u.cmp_branch.b), - xlat_label(o, e->u.cmp_branch.vlabel)); - break; - case TOP_SCOPE_BEGIN: { - CGScopeDesc d = e->u.scope_begin.desc; - d.cond = xlat_op(o, d.cond); - d.break_label = xlat_label(o, d.break_label); - d.continue_label = xlat_label(o, d.continue_label); - CGScope s = w->scope_begin(w, &d); - CGScope v = e->u.scope_begin.vscope; - if (v >= o->scope_map_cap) map_scope_grow(o, v + 1); - o->scope_map[v] = s; - break; - } - case TOP_SCOPE_ELSE: - w->scope_else(w, xlat_scope(o, e->u.scope_op.vscope)); - break; - case TOP_SCOPE_END: - w->scope_end(w, xlat_scope(o, e->u.scope_op.vscope)); - break; - case TOP_BREAK_TO: - w->break_to(w, xlat_scope(o, e->u.scope_op.vscope)); - break; - case TOP_CONTINUE_TO: - w->continue_to(w, xlat_scope(o, e->u.scope_op.vscope)); - break; - case TOP_LOAD_IMM: - w->load_imm(w, xlat_op(o, e->u.load_imm.dst), e->u.load_imm.imm); - break; - case TOP_LOAD_CONST: - w->load_const(w, xlat_op(o, e->u.load_const.dst), e->u.load_const.cb); - break; - case TOP_COPY: - w->copy(w, xlat_op(o, e->u.copy.dst), xlat_op(o, e->u.copy.src)); - break; - case TOP_LOAD: - w->load(w, xlat_op(o, e->u.load.dst), xlat_op(o, e->u.load.addr), - e->u.load.mem); - break; - case TOP_STORE: - w->store(w, xlat_op(o, e->u.store.addr), xlat_op(o, e->u.store.src), - e->u.store.mem); - break; - case TOP_ADDR_OF: - w->addr_of(w, xlat_op(o, e->u.copy.dst), xlat_op(o, e->u.copy.src)); - break; - case TOP_TLS_ADDR_OF: - w->tls_addr_of(w, xlat_op(o, e->u.tls_addr_of.dst), - e->u.tls_addr_of.sym, e->u.tls_addr_of.addend); - break; - case TOP_COPY_BYTES: - w->copy_bytes(w, xlat_op(o, e->u.agg.a), xlat_op(o, e->u.agg.b), - e->u.agg.agg); - break; - case TOP_SET_BYTES: - w->set_bytes(w, xlat_op(o, e->u.agg.a), xlat_op(o, e->u.agg.b), - e->u.agg.agg); - break; - case TOP_BITFIELD_LOAD: - w->bitfield_load(w, xlat_op(o, e->u.bitfield_load.dst), - xlat_op(o, e->u.bitfield_load.record), - e->u.bitfield_load.bf); - break; - case TOP_BITFIELD_STORE: - w->bitfield_store(w, xlat_op(o, e->u.bitfield_store.record), - xlat_op(o, e->u.bitfield_store.src), - e->u.bitfield_store.bf); - break; - case TOP_BINOP: - w->binop(w, e->u.binop.op, xlat_op(o, e->u.binop.dst), - xlat_op(o, e->u.binop.a), xlat_op(o, e->u.binop.b)); - break; - case TOP_UNOP: - w->unop(w, e->u.unop.op, xlat_op(o, e->u.unop.dst), - xlat_op(o, e->u.unop.a)); - break; - case TOP_CMP: - w->cmp(w, e->u.cmp.op, xlat_op(o, e->u.cmp.dst), - xlat_op(o, e->u.cmp.a), xlat_op(o, e->u.cmp.b)); - break; - case TOP_CONVERT: - w->convert(w, e->u.convert.kind, xlat_op(o, e->u.convert.dst), - xlat_op(o, e->u.convert.src)); - break; - case TOP_CALL: { - CGCallDesc cd = e->u.call.desc; - cd.callee = xlat_op(o, cd.callee); - CGABIValue* args = NULL; - if (cd.nargs) { - args = arena_array(o->c->tu, CGABIValue, cd.nargs); - for (u32 k = 0; k < cd.nargs; ++k) { - CGABIPart* parts = - e->u.call.args[k].nparts - ? arena_array(o->c->tu, CGABIPart, - e->u.call.args[k].nparts) - : NULL; - args[k] = xlat_abivalue(o, &e->u.call.args[k], parts); - } - cd.args = args; - } else { - cd.args = NULL; +static Label ensure_label(ReplayCtx* r, u32 b) { + if (b >= r->o->f->nblocks) return LABEL_NONE; + if (r->label_map[b] == LABEL_NONE) { + r->label_map[b] = r->tgt->label_new(r->tgt); + } + return r->label_map[b]; +} + +static void ensure_label_placed(ReplayCtx* r, u32 b) { + if (r->block_label_placed[b]) return; + r->block_label_placed[b] = 1; + if (b == r->o->f->entry) return; + Label l = ensure_label(r, b); + r->tgt->label_place(r->tgt, l); +} + +static void replay_inst(ReplayCtx* r, u32 b, Inst* in) { + CGTarget* w = r->tgt; + w->set_loc(w, in->loc); + + switch ((IROp)in->op) { + case IR_NOP: + case IR_CONST_I: + case IR_CONST_BYTES: + case IR_PARAM_DECL: + case IR_PHI: + case IR_CONDBR: + case IR_ASM_BLOCK: + break; + case IR_LOAD_IMM: { + Operand dst = xlat_op(r, in->opnds[0]); + w->load_imm(w, dst, in->extra.imm); + break; + } + case IR_LOAD_CONST: { + Operand dst = xlat_op(r, in->opnds[0]); + w->load_const(w, dst, in->extra.cbytes); + break; + } + case IR_COPY: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand src = xlat_op(r, in->opnds[1]); + w->copy(w, dst, src); + break; + } + case IR_LOAD: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand addr = xlat_op(r, in->opnds[1]); + w->load(w, dst, addr, in->extra.mem); + break; + } + case IR_STORE: { + Operand addr = xlat_op(r, in->opnds[0]); + Operand src = xlat_op(r, in->opnds[1]); + w->store(w, addr, src, in->extra.mem); + break; + } + case IR_ADDR_OF: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand lv = xlat_op(r, in->opnds[1]); + w->addr_of(w, dst, lv); + break; + } + case IR_TLS_ADDR_OF: { + Operand dst = xlat_op(r, in->opnds[0]); + IRTlsAux* aux = (IRTlsAux*)in->extra.aux; + w->tls_addr_of(w, dst, aux->sym, aux->addend); + break; + } + case IR_AGG_COPY: { + Operand a = xlat_op(r, in->opnds[0]); + Operand bo = xlat_op(r, in->opnds[1]); + IRAggAux* aux = (IRAggAux*)in->extra.aux; + w->copy_bytes(w, a, bo, aux->access); + break; + } + case IR_AGG_SET: { + Operand a = xlat_op(r, in->opnds[0]); + Operand bo = xlat_op(r, in->opnds[1]); + IRAggAux* aux = (IRAggAux*)in->extra.aux; + w->set_bytes(w, a, bo, aux->access); + break; + } + case IR_BITFIELD_LOAD: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand rec_ = xlat_op(r, in->opnds[1]); + IRBitFieldAux* aux = (IRBitFieldAux*)in->extra.aux; + w->bitfield_load(w, dst, rec_, aux->access); + break; + } + case IR_BITFIELD_STORE: { + Operand rec_ = xlat_op(r, in->opnds[0]); + Operand src = xlat_op(r, in->opnds[1]); + IRBitFieldAux* aux = (IRBitFieldAux*)in->extra.aux; + w->bitfield_store(w, rec_, src, aux->access); + break; + } + case IR_BINOP: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand a = xlat_op(r, in->opnds[1]); + Operand bo = xlat_op(r, in->opnds[2]); + w->binop(w, (BinOp)in->extra.imm, dst, a, bo); + break; + } + case IR_UNOP: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand a = xlat_op(r, in->opnds[1]); + w->unop(w, (UnOp)in->extra.imm, dst, a); + break; + } + case IR_CMP: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand a = xlat_op(r, in->opnds[1]); + Operand bo = xlat_op(r, in->opnds[2]); + w->cmp(w, (CmpOp)in->extra.imm, dst, a, bo); + break; + } + case IR_CONVERT: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand src = xlat_op(r, in->opnds[1]); + w->convert(w, (ConvKind)in->extra.imm, dst, src); + break; + } + case IR_CALL: { + IRCallAux* aux = (IRCallAux*)in->extra.aux; + CGCallDesc cd = aux->desc; + cd.callee = xlat_op(r, cd.callee); + CGABIValue* args = NULL; + if (cd.nargs) { + args = arena_array(r->o->f->arena, CGABIValue, cd.nargs); + for (u32 k = 0; k < cd.nargs; ++k) { + CGABIPart* parts = + aux->desc.args[k].nparts + ? arena_array(r->o->f->arena, CGABIPart, + aux->desc.args[k].nparts) + : NULL; + args[k] = xlat_abivalue(r, &aux->desc.args[k], parts); } - CGABIPart* ret_parts = - cd.ret.nparts - ? arena_array(o->c->tu, CGABIPart, cd.ret.nparts) - : NULL; - cd.ret = xlat_abivalue(o, &e->u.call.desc.ret, ret_parts); - w->call(w, &cd); - break; + cd.args = args; + } else { + cd.args = NULL; } - case TOP_RET: { - if (!e->u.ret.present) { - w->ret(w, NULL); - break; - } + CGABIPart* ret_parts = + cd.ret.nparts + ? arena_array(r->o->f->arena, CGABIPart, cd.ret.nparts) + : NULL; + cd.ret = xlat_abivalue(r, &aux->desc.ret, ret_parts); + w->call(w, &cd); + break; + } + case IR_BR: { + Block* bl = &r->o->f->blocks[b]; + if (bl->nsucc < 1) break; + Label l = ensure_label(r, bl->succ[0]); + w->jump(w, l); + break; + } + case IR_CMP_BRANCH: { + Operand a = xlat_op(r, in->opnds[0]); + Operand bo = xlat_op(r, in->opnds[1]); + Block* bl = &r->o->f->blocks[b]; + Label taken = ensure_label(r, bl->succ[0]); + w->cmp_branch(w, (CmpOp)in->extra.imm, a, bo, taken); + break; + } + case IR_RET: { + IRRetAux* aux = (IRRetAux*)in->extra.aux; + if (!aux || !aux->present) { + w->ret(w, NULL); + } else { CGABIPart* parts = - e->u.ret.val.nparts - ? arena_array(o->c->tu, CGABIPart, e->u.ret.val.nparts) + aux->val.nparts + ? arena_array(r->o->f->arena, CGABIPart, aux->val.nparts) : NULL; - CGABIValue v = xlat_abivalue(o, &e->u.ret.val, parts); + CGABIValue v = xlat_abivalue(r, &aux->val, parts); w->ret(w, &v); - break; } - case TOP_ALLOCA: - w->alloca_(w, xlat_op(o, e->u.alloca_.dst), - xlat_op(o, e->u.alloca_.size), e->u.alloca_.align); - break; - case TOP_VA_START: - w->va_start_(w, xlat_op(o, e->u.va_se.ap)); - break; - case TOP_VA_ARG: - w->va_arg_(w, xlat_op(o, e->u.va_arg_.dst), - xlat_op(o, e->u.va_arg_.ap), e->u.va_arg_.ty); - break; - case TOP_VA_END: - w->va_end_(w, xlat_op(o, e->u.va_se.ap)); - break; - case TOP_VA_COPY: - w->va_copy_(w, xlat_op(o, e->u.copy.dst), xlat_op(o, e->u.copy.src)); - break; - case TOP_SETJMP: - w->setjmp_(w, xlat_op(o, e->u.setjmp_.dst), - xlat_op(o, e->u.setjmp_.buf)); - break; - case TOP_LONGJMP: - w->longjmp_(w, xlat_op(o, e->u.longjmp_.buf), - xlat_op(o, e->u.longjmp_.val)); - break; - case TOP_ATOMIC_LOAD: - w->atomic_load(w, xlat_op(o, e->u.atomic_load.dst), - xlat_op(o, e->u.atomic_load.addr), - e->u.atomic_load.mem, e->u.atomic_load.mo); - break; - case TOP_ATOMIC_STORE: - w->atomic_store(w, xlat_op(o, e->u.atomic_store.addr), - xlat_op(o, e->u.atomic_store.src), - e->u.atomic_store.mem, e->u.atomic_store.mo); - break; - case TOP_ATOMIC_RMW: - w->atomic_rmw(w, e->u.atomic_rmw.op, xlat_op(o, e->u.atomic_rmw.dst), - xlat_op(o, e->u.atomic_rmw.addr), - xlat_op(o, e->u.atomic_rmw.val), e->u.atomic_rmw.mem, - e->u.atomic_rmw.mo); - break; - case TOP_ATOMIC_CAS: - w->atomic_cas(w, xlat_op(o, e->u.atomic_cas.prior), - xlat_op(o, e->u.atomic_cas.ok), - xlat_op(o, e->u.atomic_cas.addr), - xlat_op(o, e->u.atomic_cas.expected), - xlat_op(o, e->u.atomic_cas.desired), - e->u.atomic_cas.mem, e->u.atomic_cas.success, - e->u.atomic_cas.failure); - break; - case TOP_FENCE: - w->fence(w, e->u.fence.mo); - break; - case TOP_INTRINSIC: { - Operand* dsts = NULL; - Operand* args = NULL; - if (e->u.intrinsic.ndst) { - dsts = arena_array(o->c->tu, Operand, e->u.intrinsic.ndst); - for (u32 k = 0; k < e->u.intrinsic.ndst; ++k) { - dsts[k] = xlat_op(o, e->u.intrinsic.dsts[k]); - } - } - if (e->u.intrinsic.narg) { - args = arena_array(o->c->tu, Operand, e->u.intrinsic.narg); - for (u32 k = 0; k < e->u.intrinsic.narg; ++k) { - args[k] = xlat_op(o, e->u.intrinsic.args[k]); - } - } - w->intrinsic(w, e->u.intrinsic.kind, dsts, e->u.intrinsic.ndst, args, - e->u.intrinsic.narg); - break; + break; + } + case IR_SCOPE_BEGIN: { + IRScopeAux* aux = (IRScopeAux*)in->extra.aux; + CGScopeDesc d = aux->desc; + d.cond = xlat_op(r, d.cond); + if (aux->desc.kind == SCOPE_LOOP || aux->desc.kind == SCOPE_BLOCK) { + d.break_label = aux->loop_break_block + ? ensure_label(r, aux->loop_break_block) + : LABEL_NONE; + d.continue_label = aux->loop_continue_block + ? ensure_label(r, aux->loop_continue_block) + : LABEL_NONE; } - case TOP_SET_LOC: - w->set_loc(w, e->u.set_loc.loc); - break; + CGScope cs = w->scope_begin(w, &d); + r->scope_map[aux->scope_id] = cs; + break; + } + case IR_SCOPE_ELSE: + w->scope_else(w, r->scope_map[(u32)in->extra.imm]); + break; + case IR_SCOPE_END: + w->scope_end(w, r->scope_map[(u32)in->extra.imm]); + break; + case IR_BREAK_TO: + w->break_to(w, r->scope_map[(u32)in->extra.imm]); + break; + case IR_CONTINUE_TO: + w->continue_to(w, r->scope_map[(u32)in->extra.imm]); + break; + case IR_ALLOCA: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand size = xlat_op(r, in->opnds[1]); + w->alloca_(w, dst, size, (u32)in->extra.imm); + break; + } + case IR_VA_START: { + Operand ap = xlat_op(r, in->opnds[0]); + w->va_start_(w, ap); + break; + } + case IR_VA_ARG: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand ap = xlat_op(r, in->opnds[1]); + const Type* ty = (const Type*)in->extra.aux; + w->va_arg_(w, dst, ap, ty); + break; + } + case IR_VA_END: { + Operand ap = xlat_op(r, in->opnds[0]); + w->va_end_(w, ap); + break; + } + case IR_VA_COPY: { + Operand a = xlat_op(r, in->opnds[0]); + Operand src = xlat_op(r, in->opnds[1]); + w->va_copy_(w, a, src); + break; + } + case IR_SETJMP: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand buf = xlat_op(r, in->opnds[1]); + w->setjmp_(w, dst, buf); + break; + } + case IR_LONGJMP: { + Operand buf = xlat_op(r, in->opnds[0]); + Operand val = xlat_op(r, in->opnds[1]); + w->longjmp_(w, buf, val); + break; + } + case IR_ATOMIC_LOAD: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand addr = xlat_op(r, in->opnds[1]); + IRAtomicAux* aux = (IRAtomicAux*)in->extra.aux; + w->atomic_load(w, dst, addr, aux->mem, aux->mo); + break; + } + case IR_ATOMIC_STORE: { + Operand addr = xlat_op(r, in->opnds[0]); + Operand src = xlat_op(r, in->opnds[1]); + IRAtomicAux* aux = (IRAtomicAux*)in->extra.aux; + w->atomic_store(w, addr, src, aux->mem, aux->mo); + break; + } + case IR_ATOMIC_RMW: { + Operand dst = xlat_op(r, in->opnds[0]); + Operand addr = xlat_op(r, in->opnds[1]); + Operand val = xlat_op(r, in->opnds[2]); + IRAtomicAux* aux = (IRAtomicAux*)in->extra.aux; + w->atomic_rmw(w, (AtomicOp)aux->op, dst, addr, val, aux->mem, aux->mo); + break; + } + case IR_ATOMIC_CAS: { + Operand prior = xlat_op(r, in->opnds[0]); + Operand ok = xlat_op(r, in->opnds[1]); + Operand addr = xlat_op(r, in->opnds[2]); + Operand expected = xlat_op(r, in->opnds[3]); + Operand desired = xlat_op(r, in->opnds[4]); + IRCasAux* aux = (IRCasAux*)in->extra.aux; + w->atomic_cas(w, prior, ok, addr, expected, desired, aux->mem, + aux->success, aux->failure); + break; + } + case IR_FENCE: + w->fence(w, (MemOrder)in->extra.imm); + break; + case IR_INTRINSIC: { + IRIntrinAux* aux = (IRIntrinAux*)in->extra.aux; + Operand* dsts = + aux->ndst ? arena_array(r->o->f->arena, Operand, aux->ndst) : NULL; + Operand* args = + aux->narg ? arena_array(r->o->f->arena, Operand, aux->narg) : NULL; + for (u32 k = 0; k < aux->ndst; ++k) dsts[k] = xlat_op(r, aux->dsts[k]); + for (u32 k = 0; k < aux->narg; ++k) args[k] = xlat_op(r, aux->args[k]); + w->intrinsic(w, aux->kind, dsts, aux->ndst, args, aux->narg); + break; } } } -/* ---- printer ---- */ - -static void wstr(Writer* w, const char* s) { - size_t n = 0; - while (s[n]) ++n; - if (n) w->write(w, s, n); -} - -/* Minimal i64 → decimal formatter. Writes into a 32-byte buffer (enough - * for INT64_MIN). Returns nothing; the caller hands the buffer to wstr. */ -static void fmt_i64(i64 v, char* out) { - char tmp[32]; - u32 n = 0; - u64 u; - int neg = 0; - if (v < 0) { - neg = 1; - u = (u64)(-(v + 1)) + 1u; /* avoid UB for INT64_MIN */ - } else { - u = (u64)v; - } - do { - tmp[n++] = (char)('0' + (u % 10u)); - u /= 10u; - } while (u); - if (neg) tmp[n++] = '-'; - /* reverse */ - for (u32 i = 0; i < n; ++i) out[i] = tmp[n - 1 - i]; - out[n] = 0; -} - -static void wint(Writer* w, i64 v) { - char buf[32]; - fmt_i64(v, buf); - wstr(w, buf); -} - -static const char* binop_name(BinOp op) { - switch (op) { - case BO_IADD: return "iadd"; - case BO_ISUB: return "isub"; - case BO_IMUL: return "imul"; - case BO_SDIV: return "sdiv"; - case BO_UDIV: return "udiv"; - case BO_SREM: return "srem"; - case BO_UREM: return "urem"; - case BO_FADD: return "fadd"; - case BO_FSUB: return "fsub"; - case BO_FMUL: return "fmul"; - case BO_FDIV: return "fdiv"; - case BO_AND: return "and"; - case BO_OR: return "or"; - case BO_XOR: return "xor"; - case BO_SHL: return "shl"; - case BO_SHR_S: return "shr_s"; - case BO_SHR_U: return "shr_u"; +static void replay_block(ReplayCtx* r, u32 b) { + Func* f = r->o->f; + if (b >= f->nblocks) return; + ensure_label_placed(r, b); + Block* bl = &f->blocks[b]; + for (u32 i = 0; i < bl->ninsts; ++i) { + replay_inst(r, b, &bl->insts[i]); } - return "?binop"; } -static const char* unop_name(UnOp op) { - switch (op) { - case UO_NEG: return "neg"; - case UO_NOT: return "not"; - case UO_BNOT: return "bnot"; - } - return "?unop"; -} - -static const char* cmp_name(CmpOp op) { - switch (op) { - case CMP_EQ: return "eq"; - case CMP_NE: return "ne"; - case CMP_LT_S: return "lt_s"; - case CMP_LE_S: return "le_s"; - case CMP_GT_S: return "gt_s"; - case CMP_GE_S: return "ge_s"; - case CMP_LT_U: return "lt_u"; - case CMP_LE_U: return "le_u"; - case CMP_GT_U: return "gt_u"; - case CMP_GE_U: return "ge_u"; - case CMP_LT_F: return "lt_f"; - case CMP_LE_F: return "le_f"; - case CMP_GT_F: return "gt_f"; - case CMP_GE_F: return "ge_f"; - } - return "?cmp"; -} +static void replay_func(OptImpl* o) { + Func* f = o->f; + CGTarget* w = o->target; -static void print_operand(Writer* w, const Operand* op) { - switch ((OpKind)op->kind) { - case OPK_IMM: - wstr(w, "imm:"); - wint(w, op->v.imm); - return; - case OPK_REG: - wstr(w, "v"); - wint(w, (i64)op->v.reg); - return; - case OPK_LOCAL: - wstr(w, "fs"); - wint(w, (i64)op->v.frame_slot); - return; - case OPK_GLOBAL: - wstr(w, "sym"); - wint(w, (i64)op->v.global.sym); - if (op->v.global.addend) { - wstr(w, "+"); - wint(w, op->v.global.addend); - } - return; - case OPK_INDIRECT: - wstr(w, "[v"); - wint(w, (i64)op->v.ind.base); - if (op->v.ind.ofs) { - wstr(w, "+"); - wint(w, op->v.ind.ofs); - } - wstr(w, "]"); - return; + ReplayCtx r; + r.o = o; + r.tgt = w; + u32 nv = f->nvals ? f->nvals : 1u; + r.val_to_reg = arena_zarray(f->arena, Reg, nv); + for (u32 i = 0; i < nv; ++i) r.val_to_reg[i] = REG_NONE; + r.val_alloced = arena_zarray(f->arena, u8, nv); + r.slot_map = arena_zarray(f->arena, FrameSlot, f->nframe_slots + 1u); + for (u32 i = 0; i <= f->nframe_slots; ++i) r.slot_map[i] = FRAME_SLOT_NONE; + u32 nb = f->nblocks ? f->nblocks : 1u; + r.label_map = arena_zarray(f->arena, Label, nb); + for (u32 i = 0; i < f->nblocks; ++i) r.label_map[i] = LABEL_NONE; + r.scope_map = arena_zarray(f->arena, CGScope, f->nscopes + 1u); + for (u32 i = 0; i <= f->nscopes; ++i) r.scope_map[i] = CG_SCOPE_NONE; + r.block_label_placed = arena_zarray(f->arena, u8, nb); + + /* func_begin with the recorded descriptor. The desc.params[].slot + * fields are wrapper IR slot ids; aarch64's func_begin doesn't + * dereference them so we don't translate. */ + w->func_begin(w, &f->desc); + + for (u32 i = 0; i < f->nframe_slots; ++i) { + IRFrameSlot* s = &f->frame_slots[i]; + FrameSlotDesc d = {0}; + d.type = s->type; + d.name = s->name; + d.loc = s->loc; + d.size = s->size; + d.align = s->align; + d.kind = s->kind; + d.flags = s->flags; + r.slot_map[s->id] = w->frame_slot(w, &d); } - wstr(w, "?op"); -} -static void print_tape(OptImpl* o, Writer* w) { - for (u32 i = 0; i < o->ntape; ++i) { - TapeEntry* e = &o->tape[i]; - if (e->dead) { - wstr(w, " ; dead\n"); - continue; - } - wstr(w, " "); - switch ((TapeOpKind)e->op) { - case TOP_FUNC_BEGIN: - wstr(w, "func_begin sym="); - wint(w, (i64)e->u.func_begin.desc.sym); - wstr(w, " nparams="); - wint(w, (i64)e->u.func_begin.desc.nparams); - break; - case TOP_FUNC_END: - wstr(w, "func_end"); - break; - case TOP_ALLOC_REG: - wstr(w, "alloc_reg v"); - wint(w, (i64)e->u.alloc_reg.vreg); - wstr(w, " cls="); - wint(w, (i64)e->u.alloc_reg.cls); - break; - case TOP_FRAME_SLOT: - wstr(w, "frame_slot fs"); - wint(w, (i64)e->u.frame_slot.vslot); - wstr(w, " size="); - wint(w, (i64)e->u.frame_slot.desc.size); - wstr(w, " kind="); - wint(w, (i64)e->u.frame_slot.desc.kind); - break; - case TOP_PARAM: - wstr(w, "param idx="); - wint(w, (i64)e->u.param.desc.index); - wstr(w, " fs="); - wint(w, (i64)e->u.param.desc.slot); - break; - case TOP_LABEL_NEW: - wstr(w, "label_new L"); - wint(w, (i64)e->u.label_new.vlabel); - break; - case TOP_LABEL_PLACE: - wstr(w, "label_place L"); - wint(w, (i64)e->u.label_op.vlabel); - break; - case TOP_JUMP: - wstr(w, "jump L"); - wint(w, (i64)e->u.label_op.vlabel); - break; - case TOP_CMP_BRANCH: - wstr(w, "cmp_branch "); - wstr(w, cmp_name(e->u.cmp_branch.op)); - wstr(w, " "); - print_operand(w, &e->u.cmp_branch.a); - wstr(w, ", "); - print_operand(w, &e->u.cmp_branch.b); - wstr(w, " -> L"); - wint(w, (i64)e->u.cmp_branch.vlabel); - break; - case TOP_SCOPE_BEGIN: - wstr(w, "scope_begin S"); - wint(w, (i64)e->u.scope_begin.vscope); - wstr(w, " kind="); - wint(w, (i64)e->u.scope_begin.desc.kind); - break; - case TOP_SCOPE_ELSE: - wstr(w, "scope_else S"); - wint(w, (i64)e->u.scope_op.vscope); - break; - case TOP_SCOPE_END: - wstr(w, "scope_end S"); - wint(w, (i64)e->u.scope_op.vscope); - break; - case TOP_BREAK_TO: - wstr(w, "break_to S"); - wint(w, (i64)e->u.scope_op.vscope); - break; - case TOP_CONTINUE_TO: - wstr(w, "continue_to S"); - wint(w, (i64)e->u.scope_op.vscope); - break; - case TOP_LOAD_IMM: - wstr(w, "load_imm "); - print_operand(w, &e->u.load_imm.dst); - wstr(w, ", "); - wint(w, e->u.load_imm.imm); - break; - case TOP_LOAD_CONST: - wstr(w, "load_const "); - print_operand(w, &e->u.load_const.dst); - wstr(w, ", <bytes:"); - wint(w, (i64)e->u.load_const.cb.size); - wstr(w, ">"); - break; - case TOP_COPY: - wstr(w, "copy "); - print_operand(w, &e->u.copy.dst); - wstr(w, ", "); - print_operand(w, &e->u.copy.src); - break; - case TOP_LOAD: - wstr(w, "load "); - print_operand(w, &e->u.load.dst); - wstr(w, ", "); - print_operand(w, &e->u.load.addr); - break; - case TOP_STORE: - wstr(w, "store "); - print_operand(w, &e->u.store.addr); - wstr(w, ", "); - print_operand(w, &e->u.store.src); - break; - case TOP_ADDR_OF: - wstr(w, "addr_of "); - print_operand(w, &e->u.copy.dst); - wstr(w, ", "); - print_operand(w, &e->u.copy.src); - break; - case TOP_TLS_ADDR_OF: - wstr(w, "tls_addr_of "); - print_operand(w, &e->u.tls_addr_of.dst); - wstr(w, ", sym"); - wint(w, (i64)e->u.tls_addr_of.sym); - break; - case TOP_COPY_BYTES: - wstr(w, "copy_bytes "); - print_operand(w, &e->u.agg.a); - wstr(w, ", "); - print_operand(w, &e->u.agg.b); - wstr(w, " size="); - wint(w, (i64)e->u.agg.agg.size); - break; - case TOP_SET_BYTES: - wstr(w, "set_bytes "); - print_operand(w, &e->u.agg.a); - wstr(w, ", "); - print_operand(w, &e->u.agg.b); - wstr(w, " size="); - wint(w, (i64)e->u.agg.agg.size); - break; - case TOP_BITFIELD_LOAD: - wstr(w, "bitfield_load "); - print_operand(w, &e->u.bitfield_load.dst); - wstr(w, ", "); - print_operand(w, &e->u.bitfield_load.record); - break; - case TOP_BITFIELD_STORE: - wstr(w, "bitfield_store "); - print_operand(w, &e->u.bitfield_store.record); - wstr(w, ", "); - print_operand(w, &e->u.bitfield_store.src); - break; - case TOP_BINOP: - wstr(w, binop_name(e->u.binop.op)); - wstr(w, " "); - print_operand(w, &e->u.binop.dst); - wstr(w, ", "); - print_operand(w, &e->u.binop.a); - wstr(w, ", "); - print_operand(w, &e->u.binop.b); - break; - case TOP_UNOP: - wstr(w, unop_name(e->u.unop.op)); - wstr(w, " "); - print_operand(w, &e->u.unop.dst); - wstr(w, ", "); - print_operand(w, &e->u.unop.a); - break; - case TOP_CMP: - wstr(w, "cmp."); - wstr(w, cmp_name(e->u.cmp.op)); - wstr(w, " "); - print_operand(w, &e->u.cmp.dst); - wstr(w, ", "); - print_operand(w, &e->u.cmp.a); - wstr(w, ", "); - print_operand(w, &e->u.cmp.b); - break; - case TOP_CONVERT: - wstr(w, "convert "); - print_operand(w, &e->u.convert.dst); - wstr(w, ", "); - print_operand(w, &e->u.convert.src); - wstr(w, " kind="); - wint(w, (i64)e->u.convert.kind); - break; - case TOP_CALL: - wstr(w, "call "); - print_operand(w, &e->u.call.desc.callee); - wstr(w, " nargs="); - wint(w, (i64)e->u.call.desc.nargs); - break; - case TOP_RET: - wstr(w, "ret"); - if (e->u.ret.present) { - wstr(w, " "); - print_operand(w, &e->u.ret.val.storage); - } - break; - case TOP_ALLOCA: - wstr(w, "alloca "); - print_operand(w, &e->u.alloca_.dst); - wstr(w, ", "); - print_operand(w, &e->u.alloca_.size); - break; - case TOP_VA_START: - wstr(w, "va_start "); - print_operand(w, &e->u.va_se.ap); - break; - case TOP_VA_ARG: - wstr(w, "va_arg "); - print_operand(w, &e->u.va_arg_.dst); - wstr(w, ", "); - print_operand(w, &e->u.va_arg_.ap); - break; - case TOP_VA_END: - wstr(w, "va_end "); - print_operand(w, &e->u.va_se.ap); - break; - case TOP_VA_COPY: - wstr(w, "va_copy "); - print_operand(w, &e->u.copy.dst); - wstr(w, ", "); - print_operand(w, &e->u.copy.src); - break; - case TOP_SETJMP: - wstr(w, "setjmp "); - print_operand(w, &e->u.setjmp_.dst); - wstr(w, ", "); - print_operand(w, &e->u.setjmp_.buf); - break; - case TOP_LONGJMP: - wstr(w, "longjmp "); - print_operand(w, &e->u.longjmp_.buf); - wstr(w, ", "); - print_operand(w, &e->u.longjmp_.val); - break; - case TOP_ATOMIC_LOAD: - wstr(w, "atomic_load "); - print_operand(w, &e->u.atomic_load.dst); - wstr(w, ", "); - print_operand(w, &e->u.atomic_load.addr); - break; - case TOP_ATOMIC_STORE: - wstr(w, "atomic_store "); - print_operand(w, &e->u.atomic_store.addr); - wstr(w, ", "); - print_operand(w, &e->u.atomic_store.src); - break; - case TOP_ATOMIC_RMW: - wstr(w, "atomic_rmw op="); - wint(w, (i64)e->u.atomic_rmw.op); - wstr(w, " "); - print_operand(w, &e->u.atomic_rmw.dst); - wstr(w, ", "); - print_operand(w, &e->u.atomic_rmw.addr); - wstr(w, ", "); - print_operand(w, &e->u.atomic_rmw.val); - break; - case TOP_ATOMIC_CAS: - wstr(w, "atomic_cas prior="); - print_operand(w, &e->u.atomic_cas.prior); - wstr(w, " ok="); - print_operand(w, &e->u.atomic_cas.ok); - wstr(w, " addr="); - print_operand(w, &e->u.atomic_cas.addr); - break; - case TOP_FENCE: - wstr(w, "fence mo="); - wint(w, (i64)e->u.fence.mo); - break; - case TOP_INTRINSIC: - wstr(w, "intrinsic kind="); - wint(w, (i64)e->u.intrinsic.kind); - wstr(w, " ndst="); - wint(w, (i64)e->u.intrinsic.ndst); - wstr(w, " narg="); - wint(w, (i64)e->u.intrinsic.narg); - break; - case TOP_SET_LOC: - wstr(w, "set_loc "); - wint(w, (i64)e->u.set_loc.loc.line); - wstr(w, ":"); - wint(w, (i64)e->u.set_loc.loc.col); - break; - } - wstr(w, "\n"); + for (u32 i = 0; i < f->nparams; ++i) { + IRParam* p = &f->params[i]; + CGParamDesc d = {0}; + d.index = p->index; + d.name = p->name; + d.type = p->type; + d.slot = slot_to_target(&r, p->slot); + d.abi = p->abi; + d.loc = p->loc; + w->param(w, &d); } -} -/* ---- Phase 2 peephole: integer constant folding ---- - * - * Pattern: LOAD_IMM(V_a, k_a); LOAD_IMM(V_b, k_b); BINOP(op, V_d, V_a, V_b) - * with op ∈ {IADD, ISUB, IMUL}. - * After: the BINOP is rewritten to LOAD_IMM(V_d, k_a OP k_b). - * - * Both operands must be OPK_REG referencing wrapper vregs whose only - * recorded definition was a LOAD_IMM. The intermediate LOAD_IMMs are - * left in place — they may have other uses, and DCE is a Phase 3 - * concern. - * - * Folding is done in 64-bit signed arithmetic and truncated by the - * target's load_imm based on the destination type. This matches C11 - * §6.5/3 ("two's-complement wraparound at the abstract machine level - * for signed and unsigned integer types alike" per cfree's no-UB - * stance — see doc/DESIGN.md §9). */ - -typedef struct ImmInfo { - i64 val; - u8 known; -} ImmInfo; - -static void peephole_constfold(OptImpl* o) { - ImmInfo* imm; - u32 cap; - - if (o->next_vreg <= 1) return; - cap = o->next_vreg; - imm = arena_zarray(o->c->tu, ImmInfo, cap); - - for (u32 i = 0; i < o->ntape; ++i) { - TapeEntry* e = &o->tape[i]; - if (e->dead) continue; - switch ((TapeOpKind)e->op) { - case TOP_LOAD_IMM: - if (e->u.load_imm.dst.kind == OPK_REG) { - Reg r = e->u.load_imm.dst.v.reg; - if (r < cap) { - imm[r].val = e->u.load_imm.imm; - imm[r].known = 1; - } - } - break; - case TOP_BINOP: { - Operand a = e->u.binop.a; - Operand b = e->u.binop.b; - BinOp op = e->u.binop.op; - if (a.kind != OPK_REG || b.kind != OPK_REG) break; - if (a.v.reg >= cap || b.v.reg >= cap) break; - if (!imm[a.v.reg].known || !imm[b.v.reg].known) break; - if (op != BO_IADD && op != BO_ISUB && op != BO_IMUL) break; - - i64 av = imm[a.v.reg].val; - i64 bv = imm[b.v.reg].val; - u64 folded; - /* Compute in u64 to make wraparound deterministic, then cast - * back. cfree's no-UB stance forbids signed-overflow-is-UB - * exploitation (doc/DESIGN.md §9), so this is the right shape. */ - switch (op) { - case BO_IADD: folded = (u64)av + (u64)bv; break; - case BO_ISUB: folded = (u64)av - (u64)bv; break; - case BO_IMUL: folded = (u64)av * (u64)bv; break; - default: continue; - } - - Operand dst = e->u.binop.dst; - memset(&e->u, 0, sizeof e->u); - e->op = (u8)TOP_LOAD_IMM; - e->u.load_imm.dst = dst; - e->u.load_imm.imm = (i64)folded; - if (dst.kind == OPK_REG && dst.v.reg < cap) { - imm[dst.v.reg].val = (i64)folded; - imm[dst.v.reg].known = 1; - } - break; - } - default: - break; - } + /* Body in emit order — the order CG's emit cursor visited each + * block. Block-creation order can differ when label_new precedes a + * cmp_branch whose fallthrough block must physically follow. */ + for (u32 i = 0; i < f->emit_order_n; ++i) { + replay_block(&r, f->emit_order[i]); } + + w->func_end(w); } -/* ---- func_end: append TOP_FUNC_END, run peepholes, replay ---- */ +/* ---- func_end: optionally run dry-run passes; replay; reset ---- */ static void w_func_end(CGTarget* t) { OptImpl* o = impl_of(t); - tape_append(o, TOP_FUNC_END); - peephole_constfold(o); - if (o->dump_writer) print_tape(o, o->dump_writer); - replay(o); -} + if (!o->f) return; -/* ---- public API: dump writer ---- */ + if (o->level >= 2) { + opt_build_cfg(o->f); + opt_build_ssa(o->f); + } -void opt_set_dump_writer(CGTarget* t, Writer* w) { - /* Identify our own targets by the func_begin slot. Anything else is - * a non-opt CGTarget and the call is a silent no-op. */ - if (!t || t->func_begin != w_func_begin) return; - impl_of(t)->dump_writer = w; + replay_func(o); + o->f = NULL; + o->cur = 0; } -/* ---- end-of-TU and destruction ---- */ +/* ---- finalize / destroy ---- */ static void w_finalize(CGTarget* t) { CGTarget* wr = impl_of(t)->target; @@ -1779,12 +1180,16 @@ static void w_destroy(CGTarget* t) { if (wr->destroy) wr->destroy(wr); } +/* ---- public dump-writer API ---- */ + +void opt_set_dump_writer(CGTarget* t, Writer* w) { + if (!t || t->func_begin != w_func_begin) return; + impl_of(t)->dump_writer = w; +} + /* ---- construction ---- */ CGTarget* opt_cgtarget_new(Compiler* c, CGTarget* target, int level) { - OptImpl* o; - CGTarget* t; - if (!target) { SrcLoc loc = {0, 0, 0}; compiler_panic(c, loc, "opt_cgtarget_new: target is NULL"); @@ -1795,13 +1200,13 @@ CGTarget* opt_cgtarget_new(Compiler* c, CGTarget* target, int level) { level); } - o = arena_new(c->tu, OptImpl); + OptImpl* o = arena_new(c->tu, OptImpl); memset(o, 0, sizeof *o); o->c = c; o->target = target; o->level = level; - t = &o->base; + CGTarget* t = &o->base; t->c = c; t->obj = target->obj; t->mc = target->mc; diff --git a/src/opt/pass_cfg.c b/src/opt/pass_cfg.c @@ -0,0 +1,116 @@ +/* pass_cfg.c — derive Block.preds and Block.succ/nsucc from each + * block's terminator. doc/OPT.md §3 Phase 3. + * + * Terminator inventory: + * IR_BR — 1 succ (succ[0]) + * IR_CONDBR — 2 succs ([true, false]) + * IR_CMP_BRANCH — 2 succs ([taken, fallthrough]) + * IR_RET — 0 succs + * IR_LONGJMP — 0 succs + * IR_INTRINSIC TRAP/UNREACHABLE — 0 succs + * IR_BREAK_TO / IR_CONTINUE_TO — 0 succs (control transferred to + * the scope's break/continue label, + * which is a successor encoded on + * the IRScopeAux; pass populates + * succ from there) + * + * IR_SETJMP is a control barrier: the recorder splits its block but + * IR_SETJMP itself falls through. pass_cfg sees it as a normal inst. + * + * For scope ops the wrapper's recording assigns succ[] at emit time + * (since it owns the vlabel→block_id mapping). pass_cfg trusts that + * and only repopulates from the trailing terminator inst when one is + * present. */ + +#include "opt/ir.h" +#include "opt/opt.h" + +#include <string.h> + +#include "core/arena.h" +#include "core/core.h" + +static int is_terminator(const Inst* in) { + switch ((IROp)in->op) { + case IR_BR: + case IR_CONDBR: + case IR_CMP_BRANCH: + case IR_RET: + case IR_LONGJMP: + case IR_BREAK_TO: + case IR_CONTINUE_TO: + return 1; + case IR_INTRINSIC: + return in->extra.imm == INTRIN_TRAP || + in->extra.imm == INTRIN_UNREACHABLE; + default: + return 0; + } +} + +void opt_build_cfg(Func* f) { + for (u32 b = 0; b < f->nblocks; ++b) { + f->blocks[b].preds = NULL; + f->blocks[b].npreds = 0; + } + + /* Trust the recorder's succ[] for terminators that don't have a fixed + * succ count from the inst alone (IR_BR, IR_CONDBR, IR_CMP_BRANCH, + * IR_BREAK_TO, IR_CONTINUE_TO). Only fix nsucc for ops where we can + * read it directly from the op tag. */ + for (u32 b = 0; b < f->nblocks; ++b) { + Block* bl = &f->blocks[b]; + if (bl->ninsts == 0) { + bl->nsucc = 0; + continue; + } + const Inst* last = &bl->insts[bl->ninsts - 1]; + if (!is_terminator(last)) { + bl->nsucc = 0; + continue; + } + switch ((IROp)last->op) { + case IR_RET: + case IR_LONGJMP: + bl->nsucc = 0; + break; + case IR_INTRINSIC: + bl->nsucc = 0; + break; + case IR_BR: + case IR_BREAK_TO: + case IR_CONTINUE_TO: + bl->nsucc = 1; + break; + case IR_CONDBR: + case IR_CMP_BRANCH: + bl->nsucc = 2; + break; + default: + break; + } + } + + /* Count predecessors. */ + u32* counts = arena_zarray(f->arena, u32, f->nblocks); + for (u32 b = 0; b < f->nblocks; ++b) { + Block* bl = &f->blocks[b]; + for (u32 s = 0; s < bl->nsucc; ++s) { + u32 t = bl->succ[s]; + if (t < f->nblocks) counts[t]++; + } + } + for (u32 b = 0; b < f->nblocks; ++b) { + if (counts[b]) { + f->blocks[b].preds = arena_array(f->arena, u32, counts[b]); + } + } + for (u32 b = 0; b < f->nblocks; ++b) { + Block* bl = &f->blocks[b]; + for (u32 s = 0; s < bl->nsucc; ++s) { + u32 t = bl->succ[s]; + if (t >= f->nblocks) continue; + f->blocks[t].preds[f->blocks[t].npreds++] = b; + } + } +} diff --git a/src/opt/pass_ssa.c b/src/opt/pass_ssa.c @@ -0,0 +1,427 @@ +/* pass_ssa.c — mem2reg + dominance-frontier SSA construction. + * + * Goal for Phase 3 (doc/OPT.md): build SSA without consuming it. The + * output is discarded before replay, so this pass's job is shape + * checking — no panics on the corpus. + * + * Algorithm (Cooper-Harvey-Kennedy iterative dominators + Cytron et al. + * dominance-frontier phi insertion): + * + * 1. Postorder + reverse-postorder traversal of the CFG. + * 2. Compute idom[] iteratively via the two-finger intersect. + * 3. Compute DF[] from idom[] in one pass. + * 4. For each promotable FrameSlot (no FSF_ADDR_TAKEN), compute the + * iterated dominance frontier of its defining blocks; insert + * IR_PHI at the start of each block in the IDF. + * 5. Rename: DFS the dominator tree, maintain a stack per slot; on + * store push the stored Val, on load record (load_def → top) into + * a rename map, on each successor fill in this block's slot in + * that successor's phis. After processing children, pop. + * + * The rename map is built but its uses are intentionally NOT walked + * across other instructions in the dry-run — that is the part that + * mutates the IR for downstream passes, and Phase 3 discards the IR. + * The phi-insertion + slot-stack walk is the part that exercises the + * IR shape, which is what the dry-run is checking. */ + +#include "opt/ir.h" +#include "opt/opt.h" + +#include <string.h> + +#include "core/arena.h" +#include "core/core.h" + +#define BLK_NONE 0xffffffffu + +/* ---- postorder ---- */ + +typedef struct PostorderCtx { + Func* f; + u32* po; /* po[i] = block id at postorder position i */ + u32* po_idx; /* po_idx[block] = postorder position of block */ + u8* visited; + u32 count; +} PostorderCtx; + +static void postorder_dfs(PostorderCtx* ctx, u32 b) { + if (ctx->visited[b]) return; + ctx->visited[b] = 1; + Block* bl = &ctx->f->blocks[b]; + for (u32 s = 0; s < bl->nsucc; ++s) { + u32 t = bl->succ[s]; + if (t < ctx->f->nblocks) postorder_dfs(ctx, t); + } + ctx->po[ctx->count] = b; + ctx->po_idx[b] = ctx->count; + ctx->count++; +} + +/* ---- dominators (Cooper-Harvey-Kennedy) ---- */ + +static u32 dom_intersect(u32 b1, u32 b2, const u32* idom, const u32* po_idx) { + while (b1 != b2) { + while (po_idx[b1] < po_idx[b2]) b1 = idom[b1]; + while (po_idx[b2] < po_idx[b1]) b2 = idom[b2]; + } + return b1; +} + +static u32* compute_idom(Func* f, const u32* po, const u32* po_idx, u32 ncount, + u32 entry) { + u32* idom = arena_array(f->arena, u32, f->nblocks); + for (u32 b = 0; b < f->nblocks; ++b) idom[b] = BLK_NONE; + idom[entry] = entry; + + int changed = 1; + while (changed) { + changed = 0; + /* Reverse postorder, skip the entry block. */ + for (i32 i = (i32)ncount - 1; i >= 0; --i) { + u32 b = po[i]; + if (b == entry) continue; + Block* bl = &f->blocks[b]; + u32 new_idom = BLK_NONE; + for (u32 p = 0; p < bl->npreds; ++p) { + u32 pp = bl->preds[p]; + if (idom[pp] != BLK_NONE) { + new_idom = (new_idom == BLK_NONE) + ? pp + : dom_intersect(pp, new_idom, idom, po_idx); + } + } + if (new_idom != BLK_NONE && idom[b] != new_idom) { + idom[b] = new_idom; + changed = 1; + } + } + } + return idom; +} + +/* ---- dominance frontier ---- */ + +typedef struct DfSet { + u32* members; + u32 n, cap; +} DfSet; + +static void df_add(Arena* a, DfSet* s, u32 b) { + for (u32 i = 0; i < s->n; ++i) + if (s->members[i] == b) return; + if (s->n == s->cap) { + u32 ncap = s->cap ? s->cap * 2u : 4u; + u32* nb = arena_array(a, u32, ncap); + if (s->members) memcpy(nb, s->members, sizeof(u32) * s->n); + s->members = nb; + s->cap = ncap; + } + s->members[s->n++] = b; +} + +static DfSet* compute_df(Func* f, const u32* idom) { + DfSet* df = arena_zarray(f->arena, DfSet, f->nblocks); + for (u32 b = 0; b < f->nblocks; ++b) { + Block* bl = &f->blocks[b]; + if (bl->npreds < 2) continue; + if (idom[b] == BLK_NONE) continue; + for (u32 p = 0; p < bl->npreds; ++p) { + u32 runner = bl->preds[p]; + while (runner != idom[b] && runner != BLK_NONE) { + df_add(f->arena, &df[runner], b); + runner = idom[runner]; + } + } + } + return df; +} + +/* ---- promotable slots ---- */ + +/* Identify the FrameSlot a load/store address operand refers to. + * Returns 0 if the operand is not OPK_LOCAL (i.e., not a direct + * frame-slot reference) — those addresses route through computed Val + * pointers and cannot be promoted by a slot-keyed pass. */ +static u32 opnd_slot_id(const Operand* op) { + if (op->kind != OPK_LOCAL) return 0; + return (u32)op->v.frame_slot; +} + +static int slot_promotable(const Func* f, u32 slot_id) { + if (slot_id == 0 || slot_id > f->nframe_slots) return 0; + const IRFrameSlot* s = &f->frame_slots[slot_id - 1]; + if (s->flags & FSF_ADDR_TAKEN) return 0; + if (s->flags & FSF_VOLATILE) return 0; + /* Only locals are promotable for v1; params live in slots too but + * promoting them is a separate transform. */ + if (s->kind != FS_LOCAL) return 0; + return 1; +} + +/* ---- phi insertion ---- */ + +/* Insert n_new phi instructions at the start of block b, each tagged + * with its slot via opnds[0] = synthetic "slot tag" Val (we reuse + * IRPhiAux to carry pred info). The stored slot id lives in + * extra.imm. */ +static void insert_phis(Func* f, u32 b, u32 n_new, const u32* phi_slots, + const u32* phi_blocks_for_slot) { + if (!n_new) return; + Block* bl = &f->blocks[b]; + u32 old = bl->ninsts; + u32 nnew = old + n_new; + Inst* nb = arena_zarray(f->arena, Inst, nnew); + /* Phis go first. */ + for (u32 i = 0; i < n_new; ++i) { + Inst* in = &nb[i]; + u32 slot_id = phi_slots[i]; + const IRFrameSlot* s = &f->frame_slots[slot_id - 1]; + in->op = IR_PHI; + in->type = s->type; + in->extra.imm = (i64)slot_id; + /* Allocate IRPhiAux with one slot per pred, initialized to + * VAL_NONE. The rename pass fills pred_vals later. */ + IRPhiAux* aux = arena_znew(f->arena, IRPhiAux); + aux->npreds = bl->npreds; + if (bl->npreds) { + aux->pred_blocks = arena_array(f->arena, u32, bl->npreds); + aux->pred_vals = arena_zarray(f->arena, Val, bl->npreds); + memcpy(aux->pred_blocks, bl->preds, sizeof(u32) * bl->npreds); + } + /* Reuse extra union: imm carries slot, but we also need aux. We + * stash the IRPhiAux in opnds: opnds[0] is a sentinel pointer cast + * — that breaks the Val* type. Instead, we use a parallel side + * table rooted on the inst. To keep this self-contained without + * altering the Inst layout, we point in->opnds at a Val array of + * length 0 and rely on the aux pointer through the extra union. + * + * Layout choice: extra.aux = aux (carry the IRPhiAux pointer); + * imm-as-slot lives in a side table. */ + in->extra.aux = aux; + in->nopnds = 0; + in->opnds = NULL; + /* def Val: allocate a fresh value typed as the slot's type. */ + /* We can't use val_alloc directly (it's static); set up the val + * table manually via an emit-equivalent. Simpler: piggy-back on + * ir_emit_const_i style by appending after current state. But we're + * mid-rebuild of the block. Defer: encode def slot via re-reading + * the val table. */ + /* For Phase 3 dry-run, leave def = VAL_NONE on phis. The rename + * pass uses extra.aux->pred_vals to inspect phi shape; the + * dry-run discards before downstream passes need def. */ + in->def = VAL_NONE; + } + /* Existing instructions shifted right. */ + if (old) memcpy(nb + n_new, bl->insts, sizeof(Inst) * old); + bl->insts = nb; + bl->ninsts = nnew; + bl->cap = nnew; + /* val_def_inst for any val defined in this block has shifted by + * n_new. Walk the val table and update. */ + for (u32 v = 1; v < f->nvals; ++v) { + if (f->val_def_block[v] == b) f->val_def_inst[v] += n_new; + } + (void)phi_blocks_for_slot; +} + +/* ---- rename ---- */ + +typedef struct SlotStack { + Val* stack; + u32 n, cap; +} SlotStack; + +static void slot_push(Arena* a, SlotStack* s, Val v) { + if (s->n == s->cap) { + u32 ncap = s->cap ? s->cap * 2u : 4u; + Val* nb = arena_array(a, Val, ncap); + if (s->stack) memcpy(nb, s->stack, sizeof(Val) * s->n); + s->stack = nb; + s->cap = ncap; + } + s->stack[s->n++] = v; +} + +static Val slot_top(const SlotStack* s) { + return s->n ? s->stack[s->n - 1] : VAL_NONE; +} + +static void rename_dfs(Func* f, u32 b, const u32* idom, SlotStack* slots) { + Block* bl = &f->blocks[b]; + /* Track per-slot push count so we can pop on exit. */ + u32* pushed = arena_zarray(f->arena, u32, f->nframe_slots + 1); + + /* 1. Process phis in this block: each phi has extra.imm = slot id. + * Push a synthetic phi value (VAL_NONE in dry-run; a "fresh Val" + * once we wire renaming). */ + for (u32 i = 0; i < bl->ninsts; ++i) { + Inst* in = &bl->insts[i]; + if (in->op != IR_PHI) break; + u32 slot_id = (u32)in->extra.imm; + if (slot_id == 0 || slot_id > f->nframe_slots) continue; + /* For dry-run, the phi's def is VAL_NONE; we still push so the + * stack has the right depth. Downstream passes (Phase 4+) will + * allocate a real Val here. */ + slot_push(f->arena, &slots[slot_id], VAL_NONE); + pushed[slot_id]++; + } + + /* 2. Process the rest of the block. */ + for (u32 i = 0; i < bl->ninsts; ++i) { + Inst* in = &bl->insts[i]; + if (in->op == IR_PHI) continue; + if (in->op == IR_STORE) { + /* IR_STORE opnds: [0] = addr, [1] = src. */ + if (in->nopnds < 2) continue; + u32 sid = opnd_slot_id(&in->opnds[0]); + if (sid && slot_promotable(f, sid)) { + const Operand* src = &in->opnds[1]; + Val v = (src->kind == OPK_REG) ? (Val)src->v.reg : VAL_NONE; + slot_push(f->arena, &slots[sid], v); + pushed[sid]++; + } + continue; + } + if (in->op == IR_LOAD) { + /* IR_LOAD opnds: [0] = dst REG, [1] = addr. */ + if (in->nopnds < 2) continue; + u32 sid = opnd_slot_id(&in->opnds[1]); + if (sid && slot_promotable(f, sid)) { + /* Touching slot_top exercises the stack invariant — that's + * the shape check we want. The rewrite map proper would walk + * uses; we skip that in the dry-run (output discarded). */ + (void)slot_top(&slots[sid]); + } + continue; + } + } + + /* 3. Fill our slot in each successor's phis. */ + for (u32 s = 0; s < bl->nsucc; ++s) { + u32 succ = bl->succ[s]; + if (succ >= f->nblocks) continue; + Block* sb = &f->blocks[succ]; + /* Find which pred index `b` is in succ. */ + u32 pred_idx = 0; + int found = 0; + for (u32 p = 0; p < sb->npreds; ++p) { + if (sb->preds[p] == b) { + pred_idx = p; + found = 1; + break; + } + } + if (!found) continue; + for (u32 i = 0; i < sb->ninsts; ++i) { + Inst* in = &sb->insts[i]; + if (in->op != IR_PHI) break; + u32 slot_id = (u32)in->extra.imm; + if (slot_id == 0 || slot_id > f->nframe_slots) continue; + IRPhiAux* aux = (IRPhiAux*)in->extra.aux; + if (!aux || pred_idx >= aux->npreds) continue; + aux->pred_vals[pred_idx] = slot_top(&slots[slot_id]); + } + } + + /* 4. Recurse into immediate dom children. */ + for (u32 c = 0; c < f->nblocks; ++c) { + if (c == b) continue; + if (idom[c] == b) rename_dfs(f, c, idom, slots); + } + + /* 5. Pop. */ + for (u32 sid = 0; sid <= f->nframe_slots; ++sid) { + while (pushed[sid]--) { + if (slots[sid].n) slots[sid].n--; + } + } +} + +/* ---- main entry ---- */ + +void opt_build_ssa(Func* f) { + if (f->nblocks == 0) return; + + /* Postorder traversal from entry. */ + PostorderCtx pctx; + pctx.f = f; + pctx.po = arena_array(f->arena, u32, f->nblocks); + pctx.po_idx = arena_array(f->arena, u32, f->nblocks); + pctx.visited = arena_zarray(f->arena, u8, f->nblocks); + pctx.count = 0; + for (u32 b = 0; b < f->nblocks; ++b) pctx.po_idx[b] = 0; + postorder_dfs(&pctx, f->entry); + /* Unreachable blocks: never visited. Skip them in the dom analysis; + * the dry-run shouldn't crash on them. */ + + u32* idom = compute_idom(f, pctx.po, pctx.po_idx, pctx.count, f->entry); + DfSet* df = compute_df(f, idom); + + /* For each promotable slot, find defining blocks → iterated DF. */ + if (f->nframe_slots == 0) return; + u8* needs_phi_storage = + arena_zarray(f->arena, u8, (f->nframe_slots + 1) * f->nblocks); +#define NEEDS_PHI(slot, blk) \ + needs_phi_storage[((slot) * f->nblocks) + (blk)] + + /* Worklist algo per slot. */ + u32* worklist = arena_array(f->arena, u32, f->nblocks); + u8* on_worklist = arena_zarray(f->arena, u8, f->nblocks); + + for (u32 sid = 1; sid <= f->nframe_slots; ++sid) { + if (!slot_promotable(f, sid)) continue; + /* Reset per-slot worklist state. */ + for (u32 i = 0; i < f->nblocks; ++i) on_worklist[i] = 0; + u32 wn = 0; + /* Seed with blocks containing a store to this slot. */ + for (u32 b = 0; b < f->nblocks; ++b) { + Block* bl = &f->blocks[b]; + for (u32 i = 0; i < bl->ninsts; ++i) { + Inst* in = &bl->insts[i]; + if (in->op == IR_STORE && in->nopnds >= 1 && + opnd_slot_id(&in->opnds[0]) == sid) { + if (!on_worklist[b]) { + on_worklist[b] = 1; + worklist[wn++] = b; + } + break; + } + } + } + /* Iterated DF. */ + while (wn) { + u32 x = worklist[--wn]; + DfSet* d = &df[x]; + for (u32 i = 0; i < d->n; ++i) { + u32 y = d->members[i]; + if (NEEDS_PHI(sid, y)) continue; + NEEDS_PHI(sid, y) = 1; + if (!on_worklist[y]) { + on_worklist[y] = 1; + worklist[wn++] = y; + } + } + } + } + + /* Insert phis per block. */ + for (u32 b = 0; b < f->nblocks; ++b) { + u32 nphi = 0; + for (u32 sid = 1; sid <= f->nframe_slots; ++sid) { + if (NEEDS_PHI(sid, b)) nphi++; + } + if (!nphi) continue; + u32* slots_arr = arena_array(f->arena, u32, nphi); + u32 k = 0; + for (u32 sid = 1; sid <= f->nframe_slots; ++sid) { + if (NEEDS_PHI(sid, b)) slots_arr[k++] = sid; + } + insert_phis(f, b, nphi, slots_arr, NULL); + } + + /* Rename phase: DFS from entry over the dominator tree. */ + SlotStack* slots = arena_zarray(f->arena, SlotStack, f->nframe_slots + 1); + rename_dfs(f, f->entry, idom, slots); + +#undef NEEDS_PHI +} diff --git a/test/cg/run.sh b/test/cg/run.sh @@ -51,15 +51,18 @@ ALLOW_SKIP="${CFREE_TEST_ALLOW_SKIP:-0}" # Filters (env vars or positional args; args win): # $1 / CFREE_TEST_FILTER — substring match against case name # $2 / CFREE_TEST_PATHS — subset of "DREJ" (default "DREJ") -# CFREE_OPT_LEVELS — space-separated opt levels to exercise. Default "0 1" -# so every case is built twice: directly against the -# backend (level 0) and through the opt_cgtarget -# wrapper (level 1). Path W (DWARF) only runs at -# level 0 — opt-level DWARF equivalence is a later -# phase concern. +# CFREE_OPT_LEVELS — space-separated opt levels to exercise. Default +# "0 1 2" so every case is built three ways: +# directly against the backend (level 0), and +# through the opt_cgtarget wrapper at levels 1 and +# 2. Level 2 currently runs the Phase 3 dry-run +# passes (build_cfg + build_ssa) and discards +# before replay, so behavior must match level 1. +# Path W (DWARF) only runs at level 0 — opt-level +# DWARF equivalence is a later phase concern. FILTER="${1:-${CFREE_TEST_FILTER:-}}" PATHS="${2:-${CFREE_TEST_PATHS:-DREJW}}" -OPT_LEVELS="${CFREE_OPT_LEVELS:-0 1}" +OPT_LEVELS="${CFREE_OPT_LEVELS:-0 1 2}" case "$PATHS" in *D*) RUN_D=1;; *) RUN_D=0;; esac case "$PATHS" in *R*) RUN_R=1;; *) RUN_R=0;; esac case "$PATHS" in *E*) RUN_E=1;; *) RUN_E=0;; esac @@ -111,13 +114,13 @@ arch_raw="$(uname -m 2>/dev/null || true)" READELF_BIN="$(command -v llvm-readelf 2>/dev/null || command -v readelf 2>/dev/null || true)" -# Shared aarch64 exec helper — see test/lib/exec_aarch64.sh. Path E queues -# each linked.exe and we drain the queue in a single batched podman run -# after the case loop, amortizing the per-launch podman overhead across -# all ~200 cg cases. -EXEC_AARCH64_MOUNT_ROOT="$BUILD_DIR" -# shellcheck source=../lib/exec_aarch64.sh -source "$ROOT/test/lib/exec_aarch64.sh" +# Shared per-arch exec helper — see test/lib/exec_target.sh. Path E +# queues each linked.exe and we drain the queue in a single batched +# podman run per arch after the case loop, amortizing the per-launch +# podman overhead across all ~200 cg cases. +EXEC_TARGET_MOUNT_ROOT="$BUILD_DIR" +# shellcheck source=../lib/exec_target.sh +source "$ROOT/test/lib/exec_target.sh" # ---- build harness binaries ------------------------------------------------ @@ -263,6 +266,16 @@ for OPT_LEVEL in $OPT_LEVELS; do # negative-return cases compare correctly. expected_byte=$(( expected & 0xff )) + # Path E target arch. cg-runner --arches NAME prints the arches a + # case can run on (one per line). Today every case is aarch64-only; + # multi-arch cases will land alongside x64 codegen in MULTIARCH + # phase 3 and broaden this list. + case_arches="$("${CG_RUN[@]}" --arches "$name" 2>/dev/null)" + case_arches="${case_arches:-aarch64}" + # First arch is the canonical one path E targets. (Cases are still + # single-arch through phase 2; the loop is a placeholder seam.) + case_arch="$(printf '%s\n' "$case_arches" | head -n1)" + # ---- Path D: in-process JIT (only on aarch64) ------------------------ if [ $RUN_D -eq 1 ]; then if [ $is_aarch64 -eq 1 ]; then @@ -322,7 +335,7 @@ for OPT_LEVEL in $OPT_LEVELS; do >"$work/exec_link.out" 2>"$work/exec_link.err"; then dt=$(( $(now_ms) - t0 )); T_E=$(( T_E + dt )) note_fail "$name/E${TAG} (link failed, ${dt}ms)" - elif [ $have_runner -eq 1 ]; then + elif exec_target_supported "$case_arch"; then link_dt=$(( $(now_ms) - t0 )); T_E=$(( T_E + link_dt )) E_NAMES+=("$name") E_WORK+=("$work") @@ -330,10 +343,11 @@ for OPT_LEVEL in $OPT_LEVELS; do E_EXPECTED+=("$expected_byte") # Queue with a level-tagged key so cases at different # opt levels don't collide in the batched runner. - exec_aarch64_queue "L${OPT_LEVEL}_${name}" "$exe" \ - "$work/exec.out" "$work/exec.err" "$work/exec.rc" + exec_target_queue "$case_arch" "L${OPT_LEVEL}_${name}" \ + "$exe" "$work/exec.out" "$work/exec.err" \ + "$work/exec.rc" else - note_skip "$name/E${TAG}" "no qemu/podman" + note_skip "$name/E${TAG}" "no runner for $case_arch" fi else note_skip "$name/E${TAG}" "no link-exe-runner, aarch64 clang, or start.o" @@ -385,13 +399,13 @@ for OPT_LEVEL in $OPT_LEVELS; do done # ---- batched path-E flush + verification (per level) ------------------- - # Run every queued case in a single podman invocation, then iterate the - # queue to read each exit code and emit PASS/FAIL. - if [ "$(exec_aarch64_queue_size)" -gt 0 ]; then + # Run every queued case in a single podman invocation per arch, then + # iterate the queue to read each exit code and emit PASS/FAIL. + if [ "$(exec_target_queue_size)" -gt 0 ]; then printf 'Running path E%s (%d cases batched)...\n' \ - "$TAG" "$(exec_aarch64_queue_size)" + "$TAG" "$(exec_target_queue_size)" t0=$(now_ms) - exec_aarch64_flush + exec_target_flush DELTA=$(( $(now_ms) - t0 )) T_E_BATCH=$(( ${T_E_BATCH:-0} + DELTA )); T_E=$(( T_E + DELTA ))