Optimizer (OPT)

This document is the design reference for kit's optimizer: the module that sits between the recording code-generation API and the per-architecture native backends. The optimizer owns a private, mutable IR; it lowers each recorded function into that IR, runs analyses and transforms over it, performs register allocation, builds a physical post-allocation IR (MIR), and finally replays the MIR into a NativeTarget backend. It is also the source of the function shape the bytecode interpreter consumes. The focus here is layering, ownership, representation invariants, and the reasoning behind the boundaries — not API signatures, which live in the headers. Cross-references: DESIGN.md, CODEGEN.md, IR.md, ARCH.md, and INTERPRETER.md.

1. Where the optimizer sits

kit's codegen has two surfaces. The semantic surface is the recording code-generation API (cg/ir.h, the CgTarget interface): frontends call it to describe a function. The physical surface is the per-architecture NativeTarget (arch/native_target.h): it encodes machine code. The optimizer is the bridge.

frontend  --CgTarget calls-->  CgIrRecorder  --records-->  CgIrFunc tape
                                                                 |
                                                opt_func_from_cg_ir (cg_ir_lower.c)
                                                                 v
                                            optimizer IR: Func / Block / Inst / Val / PReg
                                                                 |
                                                  passes (analysis + transform)
                                                                 |
                                                regalloc -> MFunc (physical MIR)
                                                                 v
                                       opt_emit_native --NativeTarget calls--> machine code

At -O0 the optimizer is not installed at all: the driver wires the frontend's CgTarget straight to the backend's NativeDirectTarget, which emits in a single pass with a small register cache (see CODEGEN.md). At opt_level >= 1, opt_cgtarget_new (src/opt/opt.c) installs a CgIrRecorder (cg/ir_recorder.c) as the sink. The recorder captures each function as a CgIrFunc tape, and on completion fires the optimizer's per-function callback.

OptImpl (in src/opt/opt.c) is the wrapper state: the wrapped real target, the resolved NativeTarget*, an optional dump writer, and a per-translation-unit registry of recorded CgIrFuncs (with a parallel lazily-lowered-Func cache) used for streaming tiny-callee inline lookup.

When each function is processed: streaming vs. finalization

Two scheduling regimes exist, chosen by target architecture:

Per-function streaming (x64, rv64). As the recorder completes each function it fires the optimizer's per-function callback, which lowers and fully processes that one function immediately. Functions flow through the pipeline in recording order, one at a time, before the module is finalized.
Finalization-time, reachability-driven (aa64). The per-function callback registers the recorded CgIrFunc but does no lowering. All processing is deferred to module finalization, where a reachability sweep over the call/data-reloc graph computes the set of functions actually referenced from a root, prunes the rest, and only then lowers and processes the survivors.

Both regimes converge on the same lowering path and backend tail; they differ only in when a function is lowered and whether dead local functions are dropped before lowering or left for the linker (Section 3, "O1 native").
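The finalization-time regime can be pictured as a worklist sweep over a reference graph. The sketch below is illustrative only: SketchFunc, sketch_mark_reachable, and the fixed-size arrays are hypothetical names and simplifications, not kit's actual registry or reloc graph.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_FUNCS 16
#define SKETCH_MAX_REFS   4

/* Hypothetical stand-in for a registered, not-yet-lowered function. */
typedef struct SketchFunc {
    int is_root;                 /* externally visible: always kept */
    int nrefs;                   /* number of outgoing call/data references */
    int refs[SKETCH_MAX_REFS];   /* indices of the referenced functions */
} SketchFunc;

/* Marks every function reachable from a root and returns the survivor count.
 * reachable[i] ends up 1 for survivors and 0 for functions that can be pruned. */
static int sketch_mark_reachable(const SketchFunc *funcs, int n,
                                 unsigned char *reachable)
{
    int worklist[SKETCH_MAX_FUNCS];
    int top = 0, kept = 0;

    assert(n >= 0 && n <= SKETCH_MAX_FUNCS);
    memset(reachable, 0, (size_t)n);
    for (int i = 0; i < n; i++) {
        if (funcs[i].is_root) { reachable[i] = 1; worklist[top++] = i; }
    }
    while (top > 0) {
        const SketchFunc *f = &funcs[worklist[--top]];
        kept++;
        for (int r = 0; r < f->nrefs; r++) {
            int callee = f->refs[r];
            /* Mark before pushing, so each function is queued at most once. */
            if (!reachable[callee]) { reachable[callee] = 1; worklist[top++] = callee; }
        }
    }
    return kept;
}
```

Only functions whose reachable flag is set would go on to lowering; in the streaming regime there is no such filter, because each function is processed before the full graph is known.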

The recording/optimizing boundary

The split between recording (cg/ir) and optimizing (opt/ir) is deliberate and is the central design decision of this module:

The recorded CgIrFunc is a faithful, immutable transcript of the frontend's semantic intent. It speaks in CGLocal/Label/CGCallDesc terms and knows nothing about CFGs, dominators, or physical registers. Frontends and ABI lowering own that layer; the optimizer never mutates it. Keeping it immutable is what makes streaming tiny-inline re-lowering cheap and repeatable, and what lets the same recorded tape feed both the native pipeline and the interpreter.
opt_func_from_cg_ir (src/opt/cg_ir_lower.c) translates one CgIrFunc into the optimizer's own mutable Func — a real CFG of Blocks, each holding a linear list of Insts, plus frame slots, a pseudo-register table, a value table, and the params/locals tables. From here on the optimizer works only on Func; the recorded tape is a read-only source.

Lowering also performs the first storage-classification decision. In lower_locals, each semantic CGLocal becomes either register storage (CG_LOCAL_STORAGE_REG, a fresh PReg, operands of kind OPK_REG) or frame storage (CG_LOCAL_STORAGE_FRAME, a FrameSlot, operands of kind OPK_LOCAL). A local is forced to a frame home when it is address-taken / memory-required (local_needs_home), an aggregate, or larger than a machine word. Everything else starts in a pseudo-register. Address-taken locals begin in frame storage; later HIR address-folding and promotion passes recover register storage for those whose address does not actually escape (Section 4). va_list operands are lowered as opaque pointer values, never address-taken, so that all va-layout knowledge stays behind the NativeTarget va hooks.
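The classification rule reads naturally as a small predicate. The following is a minimal sketch under assumptions: SketchLocal, its fields, and sketch_classify_local are invented for illustration and do not mirror the real lower_locals or local_needs_home signatures.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_STORAGE_REG, SK_STORAGE_FRAME } SketchStorage;

/* Hypothetical summary of one semantic local. */
typedef struct SketchLocal {
    size_t size;          /* size in bytes */
    int    is_aggregate;  /* struct, union, or array */
    int    needs_home;    /* address-taken or otherwise memory-required */
} SketchLocal;

/* Frame storage if the local needs a memory home, is an aggregate, or is
 * wider than a machine word; a pseudo-register otherwise. */
static SketchStorage sketch_classify_local(const SketchLocal *l, size_t word_size)
{
    assert(word_size > 0);
    if (l->needs_home || l->is_aggregate || l->size > word_size)
        return SK_STORAGE_FRAME;
    return SK_STORAGE_REG;
}
```

The decision is conservative by construction: an address-taken scalar lands in the frame here and is only recovered later if the address folds show it never escapes.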

2. The optimizer IR and its operand model

The optimizer IR lives in src/opt/ir.h / src/opt/ir.c. Its shape:

Func owns one function: its CFG (blocks, entry, emit_order), frame slots, params, locals, the pseudo-register table, the SSA value table, scope bookkeeping, allocation results, and per-pass scratch.
Block is a basic block: a growable Inst[], explicit preds/succ edges, and a pre-allocated MCLabel for blocks born from cg_label_new.
Inst is one recorded operation. The IROp enum mirrors the CgTarget surface essentially 1:1 (each recorded CgTarget call becomes exactly one Inst), plus a few SSA-only ops (IR_PHI, IR_CONST_I, IR_CONST_BYTES). Rich operations (calls, returns, switches, inline asm, atomics, aggregate memory ops, intrinsics, scopes, phis) carry a structured aux record so the full semantic descriptor round-trips to emission.

Virtual vs physical operands; the mode-on-`Func` invariant

Operand (kind OPK_REG/OPK_IMM/OPK_LOCAL/OPK_GLOBAL/OPK_INDIRECT) is intentionally not a bare value id. Register operands change meaning across the pipeline, but the field never changes — the mode is a flag on Func, never encoded in the numeric id:

During lowering and the whole O1 path, OPK_REG carries a PReg: a mutable pseudo-register id, the persistent storage location of a value.
After opt_build_reg_ssa (O2 only), OPK_REG carries a Val: an SSA single-definition value id. Func.opt_reg_ssa records which namespace is live; shared helpers (opt_reg_count, opt_reg_type, opt_reg_cls in opt_internal.h) consult it rather than guessing from context.
Physical registers never appear in OPK_REG HIR operands. Allocation results go to a separate location table, and physical operands appear only in the MIR (Section 6).

IR_PARAM_DECL is a def-only marker carrying no operands — the param's storage lives in the IRParam table, not in a synthetic self-operand. These invariants (virtual-only HIR operands, single-namespace-at-a-time, def-only param decls) are what the debug verifier (opt_verify, Section 7) checks at phase boundaries, so that a stale physical operand or a wrong-namespace use fails at the nearest checkpoint rather than in the backend encoder.
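To make the mode-on-Func idea concrete, here is a hedged sketch in which the same numeric id is interpreted through a flag on the function. All Sketch* names are hypothetical; the real Operand, Func, and opt_reg_count carry considerably more state.

```c
#include <assert.h>

typedef enum { SK_OPK_REG, SK_OPK_IMM, SK_OPK_LOCAL } SketchOpKind;

typedef struct SketchOperand {
    SketchOpKind kind;
    unsigned     id;      /* PReg id or Val id, depending on the function mode */
} SketchOperand;

typedef struct SketchFuncMode {
    int      reg_ssa;     /* 0: register ids are PRegs; 1: they are SSA Vals */
    unsigned npregs;
    unsigned nvals;
} SketchFuncMode;

/* Size of whichever register namespace is currently live. */
static unsigned sketch_reg_count(const SketchFuncMode *f)
{
    assert(f->reg_ssa == 0 || f->reg_ssa == 1);
    return f->reg_ssa ? f->nvals : f->npregs;
}

/* A register operand is well-formed only if its id is in range for the live
 * namespace; non-register operands are not subject to this check. */
static int sketch_operand_ok(const SketchFuncMode *f, SketchOperand op)
{
    if (op.kind != SK_OPK_REG) return 1;
    return op.id < sketch_reg_count(f);
}
```

Because helpers ask the function which namespace is live, the same operand can be valid in one mode and rejected in the other without the id itself carrying any tag.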

FrameSlot is the frame-storage currency: locals forced to memory, spill slots, ABI parameter slots, alloca regions, and outgoing-argument areas.

Token aliasing: optimizer-local names onto NativeTarget types

src/opt/ir.h deliberately reuses the physical backend's data types as the optimizer's own, via a layer of preprocessor #defines. After including arch/native_target.h it remaps a set of optimizer-local tokens onto the Native* types:

FrameSlot → NativeFrameSlot, FrameSlotKind/FS_* → the NativeFrameSlot* enum, RegClass/RC_* → NativeAllocClass, CGPhysRegInfo → NativePhysRegInfo, the known-frame descriptor, and the CG_REG_* register role flags.
It also re-#defines the now-removed semantic CG spellings — Operand, CGCallDesc, CGFuncDesc, CGParamDesc, CGScopeDesc, CGLocalStorage, FrameSlotDesc — onto the optimizer's own Opt* structs, so optimizer code can keep using the short historical names.

The reason is that the optimizer's frame-slot, register-class, and physical-register vocabulary is the backend's; sharing the structs avoids a translation layer at the emit boundary, where NativeFrame* is exactly what opt_emit_native hands the NativeTarget. The cost is a namespace hazard: a .c file that needs the real semantic cg/ir.h Operand/CG*Desc types (for example because it reads the recorded tape, or it talks to the live NativeTarget in Native* terms) must first #undef the aliased tokens. cg_ir_lower.c, opt.c, and pass_native_emit.c each do exactly this at the top of the file: they straddle the boundary and must escape the optimizer-local remapping to name the other side's types. Files that live entirely inside the optimizer IR (the analysis and transform passes) keep the aliases and never #undef.
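The aliasing hazard is easiest to see in miniature. In this sketch the struct layouts and the sketch_* functions are invented; only the pattern (alias a short token with #define, then #undef it before naming the other side's type) reflects the text above.

```c
/* Stand-in for the optimizer's own operand struct (fields are invented). */
typedef struct OptOperand { int kind; unsigned id; } OptOperand;

/* Optimizer-local alias: from here on, "Operand" means OptOperand. */
#define Operand OptOperand

static unsigned sketch_opt_id(const Operand *op)   /* really an OptOperand* */
{
    return op->id;
}

/* A boundary file escapes the alias before naming the semantic-side type. */
#undef Operand

/* Stand-in for the semantic Operand that a header like cg/ir.h would declare. */
typedef struct Operand { int local; } Operand;

static int sketch_cg_local(const Operand *op)      /* the distinct semantic type */
{
    return op->local;
}
```

Without the #undef, the second typedef would expand to a redefinition of OptOperand and fail to compile, which is the failure mode the boundary files avoid.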

3. One lowering path, three consumers

There is a single lowering path through the optimizer IR. The opt level and the consumer choose how far down it the function travels.

                       opt_func_from_cg_ir
                              |
            +-----------------+------------------+
            |                 |                  |
       O1 native          O2 mid-end        interpreter tap
   opt_run_o1_native    opt_cleanup +      opt_run_o1_interp
                        shared lowering    (stops before machinize)
            |                 |                  |
       machinize          SSA build,             |
       regalloc           value/mem passes,      |
       MIR + emit         conventional SSA,      |
                          undo-SSA, then         |
                          shared lowering        |
            v                 v                  v
       NativeTarget       NativeTarget      interp bytecode

O1 native (`opt_run_o1_native`)

This is the live optimized path for compiled output. opt_run_o1_native (src/opt/opt.c) is the per-function driver; how a function reaches it depends on the scheduling regime of Section 1. On x64/rv64 the per-function callback lowers the recorded function and calls opt_run_o1_native directly as each function is recorded. On aa64 the callback only registers the function; lowering and the call to opt_run_o1_native happen at finalization, once the reachability sweep has selected the function. Either way the function travels the same pipeline, entirely in the PReg namespace (opt_reg_ssa == 0): no SSA construction, no value numbering. In source order:

build_cfg -> jump_cleanup(CFG) -> build_cfg -> simplify_local
try_tiny_inline   (+ cfg/jump_cleanup/cfg if anything inlined)
verify "lowering-cfg"
machinize_native        ABI/call/ret/param constraints + machine clobbers
verify "lowering-machinize"
addr_xform_pregs        fold ADDR_OF(local) into OPK_LOCAL loads/stores
promote_scalar_locals   non-escaped scalar frame slot -> PReg
addr_of_global_cse      hoist duplicate ADDR_OF(global) to entry
build_loop_tree
lower_loop_imm_operands / hoist_loop_consts   loop-invariant imm materialization
live_blocks             per-block PReg liveness (backward dataflow)
dead_def_elim_with_live pre-RA dead-definition elimination
regalloc_locations      PReg -> hard reg / spill slot (no live-range splitting)
verify "post-regalloc"
lower_to_mir            build physical MFunc; insert spill/reload
mir_verify "lower-mir"
mir_combine             post-RA peephole / addressing-mode synthesis
mir_dce                 post-RA dead-code elimination
mir_jump_cleanup(CFG) -> mir_build_cfg -> mir_jump_cleanup(LAYOUT)
emit_native             replay MIR into the NativeTarget

Once a function enters this pipeline it runs every stage — there is no per-op bypass within the pipeline itself. Varargs, inline asm, aggregates/sret/byval are all handled here. Most stages are bracketed by an opt_verify / opt_mir_verify checkpoint with a stage tag, and KIT_DUMP=<tag> dumps the IR at the matching stage (entry before any pass, pre-emit just before emit).
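One way to picture a fixed stage list bracketed by tagged checkpoints is a table-driven driver. This is a sketch, not the structure of opt_run_o1_native: SketchStage, sketch_run_pipeline, and the no-op pass are hypothetical, and only the stage names and verify tags are taken from the listing above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-function state, just enough to observe the driver. */
typedef struct SketchFn {
    int  stages_run;
    int  verified;
    char last_tag[32];
} SketchFn;

typedef void (*SketchPass)(SketchFn *);

typedef struct SketchStage {
    const char *name;
    SketchPass  run;
    const char *verify_tag;   /* NULL: no checkpoint after this stage */
} SketchStage;

static void sketch_pass_noop(SketchFn *f) { f->stages_run++; }

/* Stand-in for a verifier call: records which checkpoint ran last. */
static void sketch_verify(SketchFn *f, const char *tag)
{
    f->verified++;
    strncpy(f->last_tag, tag, sizeof f->last_tag - 1);
    f->last_tag[sizeof f->last_tag - 1] = '\0';
}

/* Runs every stage in order; there is no per-stage bypass. */
static void sketch_run_pipeline(SketchFn *f, const SketchStage *stages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        stages[i].run(f);
        if (stages[i].verify_tag) sketch_verify(f, stages[i].verify_tag);
    }
}

static const SketchStage sketch_o1_prefix[] = {
    { "build_cfg",        sketch_pass_noop, NULL },
    { "simplify_local",   sketch_pass_noop, NULL },
    { "try_tiny_inline",  sketch_pass_noop, "lowering-cfg" },
    { "machinize_native", sketch_pass_noop, "lowering-machinize" },
};
```

Tagging the checkpoint with the stage just completed means a verifier failure names the nearest boundary, which is the localization property Section 7 relies on.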

The reachability decision lives outside this pipeline and is per-architecture (Section 1). At module finalization (opt_on_finalize) file-scope asm blocks captured during recording are replayed on every target. On aa64, finalization additionally runs the reachability sweep that selects which functions are lowered at all, so dead local functions/data are never lowered or emitted; the survivors then each run the full pipeline above. On x64/rv64 every recorded function was already lowered and emitted during streaming, so dead-static elimination is left to the linker rather than performed here.

O2 mid-end (`opt_cleanup` + shared lowering)

The O2 mid-end is the SSA-based optimization schedule defined in opt_cleanup (src/opt/pass_o2.c). It is the intended mid-end architecture and is fully implemented, but it is not on the shipped code path: opt_cgtarget_new normalizes every requested opt_level to 1 (the line o->level = 1 in src/opt/opt.c), so no compilation ever selects O2 and every opt_level >= 1 request runs the O1 native path.

The rationale for this normalization is isolation. Keeping the O2 schedule defined and its passes maintained means the SSA representation and its incremental def-use can stabilize against targeted optimizer tests independently, without an SSA-construction or value-numbering bug affecting shipped output. The schedule is documented here because it is the designed mid-end shape that the O1 path is a deliberately reduced subset of; the section describes the intended architecture, not a live code path. The schedule is:

build_cfg / jump_cleanup(CFG) / build_cfg     canonicalize control flow
build_reg_ssa                                 PReg -> Val (register SSA)
block_cloning                                 bounded clone of small blocks
build_ssa                                     mem2reg: promote frame locals, insert phis
ssa_dce / copy_cleanup
addr_xform                                    fold address pseudos into mem operands
simplify                                      SSA-aware identity/algebraic cleanup
gvn                                           value numbering, constprop, branch fold,
                                              redundant-load reuse
copy_prop                                     copy + redundant-extension elimination
dse                                           dead store elimination
build_loop_tree / licm                        hoist loop invariants
pressure_relief                               sink same-block computes
make_conventional_ssa                         phis -> edge copies (IRF_NO_COALESCE)
ssa_combine
undo_ssa / copy_cleanup                        Val -> PReg, allocation-ready
jump_opt

By design an O2 function then re-enters the same backend tail as O1 (machinize through emit), with the allocator's live-range splitting and move-related coalescing enabled — the variants that the O1 path leaves off. The SSA value/memory passes (opt_gvn, opt_dse, opt_licm, opt_pressure_relief, opt_ssa_combine) live in src/opt/pass_o2.c; SSA construction and phi destruction in src/opt/pass_ssa.c.

Interpreter tap (`opt_run_o1_interp`)

The interpreter consumes the optimizer IR directly rather than machine code. The tap runs the maximal target-independent subset of the O1 pipeline and stops before machinization: build CFG, jump cleanup, simplify_local, the PReg-level address folds and scalar-local promotion, addr_of_global_cse, loop tree, and liveness-driven dead-def elimination. It deliberately stops before opt_machinize_native, register allocation, MIR lowering, and native emit. The result is a Func still in the PReg namespace (opt_reg_ssa == 0, no IR_PHI phis) that src/interp/lower.c lowers into threaded bytecode. The tap runs the folds even though in the native pipeline they sit after machinize, because they depend only on the PReg/frame-slot view, not on physical-register pools — so they are safe and they shrink the interpreter's work. See INTERPRETER.md.

4. Pass catalog by role

The passes are grouped here by responsibility. Each is one Func-in-place transform or analysis; the file paths orient the reader.

SSA mid-end (O2)

Register SSA + mem2reg (src/opt/pass_ssa.c): opt_build_reg_ssa renames multiply-assigned PRegs into SSA Vals; opt_build_ssa promotes eligible frame-backed locals/params to SSA via dominance-frontier phi insertion and rewrites their loads/stores to values. opt_make_conventional_ssa lowers phis to edge copies (marked IRF_NO_COALESCE, because coalescing a phi edge copy can collapse a loop-carried value with its successor and miscompile the loop) and opt_undo_ssa returns to the PReg namespace for allocation.
GVN + DSE orchestration (src/opt/pass_o2.c): opt_gvn does scalar value numbering, constant propagation, branch folding, and memory-aware redundant load / store-to-load reuse gated by alias-root and version rules; opt_dse removes stores proven dead or overwritten while preserving observable memory (volatile, atomic, calls that may clobber, escapes). opt_licm and opt_pressure_relief round out the loop/pressure work, also here.
Peephole combine + addressing-mode synthesis (src/opt/pass_combine.c): opt_combine is a per-block forward-pass-with-fixpoint that propagates copies, folds address-producing computations into a load/store's OPK_INDIRECT base/index/scale/offset where the backend accepts the shape, sinks defs toward their sole use, and folds extension chains. It is used in two roles: directly in the O2 SSA combine (opt_ssa_combine wraps it) and as the post-RA MIR combine (Section 6). When run over physical MIR it gates each rewrite on a live-range safety check (Section 5).
Simplify (src/opt/pass_simplify.c): opt_simplify_local is the no-SSA-required local algebraic/addressing canonicalizer used on every path; opt_simplify is the SSA-aware identity/constant cleanup used in O2.
DCE (src/opt/pass_dce.c): opt_ssa_dce removes unused SSA defs; opt_mir_dce removes post-RA dead physical defs; both preserve side effects, including the subtle case of a value-producing op whose destination is an OPK_LOCAL (a write to an escaped frame-homed local is a memory side effect even when the op is otherwise pure).
Copy cleanup / copy prop (src/opt/pass_copy.c): redundant-copy removal and copy propagation, including redundant extension/convert-chain elimination.
Inlining (src/opt/pass_inline.c): opt_try_tiny_inline is the streaming O1 entry. On the pre-machinize PReg form it resolves each direct IR_CALL to a recorded callee via a lookup callback (OptImpl owns the registry and the lazily re-lowered callee cache), gates on a tiny straightline-cost cap and a whitelist that excludes calls/control-rich constructs, refuses self/recursive callees, and splices the cloned body in. The whole-program inliner machinery (inline_call_site and its gates) also lives here.
Address folding (src/opt/pass_addr_fold.c): the always-on O1 HIR folds — opt_addr_xform_pregs (fold ADDR_OF(local) into direct OPK_LOCAL load/store operands and clear FSF_ADDR_TAKEN when all such defs retire), opt_promote_scalar_locals (promote a non-escaped scalar frame slot to a PReg, turning its stores/loads into copies), opt_addr_of_global_cse (hoist one ADDR_OF(global) compute to the entry block and reuse it), and the loop-invariant constant materialization (opt_hoist_loop_consts / opt_lower_loop_imm_operands). opt_addr_xform is the SSA-namespace counterpart used in O2.
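The side-effect rule in the DCE entry above, including the OPK_LOCAL destination case, can be sketched as a removability predicate. SketchDceInst and both functions are hypothetical simplifications; the real passes reason over full instructions and liveness rather than precomputed flags.

```c
#include <stddef.h>

typedef enum { SK_DST_REG, SK_DST_LOCAL } SketchDstKind;

/* Hypothetical per-instruction summary for dead-code elimination. */
typedef struct SketchDceInst {
    int           has_side_effect;  /* call, volatile access, atomic, ... */
    SketchDstKind dst_kind;         /* register or frame-homed local destination */
    int           dst_used;         /* is the register result read later? */
} SketchDceInst;

/* An instruction may be deleted only if it is pure, writes a register
 * (a write to a frame-homed local is treated as a memory side effect),
 * and that register is never read. */
static int sketch_dce_removable(const SketchDceInst *in)
{
    if (in->has_side_effect) return 0;
    if (in->dst_kind == SK_DST_LOCAL) return 0;
    return !in->dst_used;
}

/* Counts how many instructions in a block the rule would delete. */
static size_t sketch_dce_count(const SketchDceInst *insts, size_t n)
{
    size_t removable = 0;
    for (size_t i = 0; i < n; i++)
        removable += (size_t)sketch_dce_removable(&insts[i]);
    return removable;
}
```

Treating every local-destination write as live is deliberately conservative in this sketch; a sharper rule would consult escape information before keeping the write.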

Shared analyses

CFG (src/opt/pass_cfg.c): opt_build_cfg derives preds/succ from each block's terminator (branches, conditional/fused branches, returns, switches, indirect branches, scope break/continue edges) and validates reciprocity; opt_mir_build_cfg recomputes them over the physical MIR.
Order + dominators + verify (src/opt/pass_analysis.c): postorder / reverse-postorder, reachability, immediate dominators, dominator children, dominance frontiers (OptAnalysis), the coarse analysis-validity bits (OPT_ANALYSIS_CFG/DEF_USE/DOM/LOOP), and the debug verifier opt_verify.
Liveness (src/opt/pass_live.c): opt_live_blocks solves per-block PReg liveness by backward dataflow into elastic 64-bit-word bitsets (OptBitset, grown on demand, trailing-zero-trimmed); opt_live_ranges_build produces the compressed point-indexed live ranges and per-PReg frequency/spill-cost metrics the allocator consumes.
Hard-register liveness (src/opt/pass_hard_live.c): physical-register live-in/out over the post-RA MIR, plus the per-call clobber mask (opt_call_clobber_mask_for). This is what makes post-RA combine/DCE safe: a value in a callee-saved register survives a call, while caller-saved registers are killed by it.
Loop detection (src/opt/pass_loop.c): opt_build_loop_tree computes loop nesting depth from dominators; depth feeds the allocator's spill-cost weighting and LICM.
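The backward dataflow behind per-block liveness is the textbook fixpoint: live_out is the union of the successors' live_in, and live_in = use | (live_out & ~def). The sketch below assumes at most 64 PRegs so that a single uint64_t stands in for the elastic OptBitset; SketchBlock and sketch_live_blocks are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical block summary; bit p represents PReg p. */
typedef struct SketchBlock {
    uint64_t use;        /* PRegs read before any write in the block */
    uint64_t def;        /* PRegs written in the block */
    int      nsucc;
    int      succ[2];
    uint64_t live_in, live_out;
} SketchBlock;

/* Iterates the liveness equations to a fixpoint over all blocks. */
static void sketch_live_blocks(SketchBlock *b, int n)
{
    int changed = 1;

    for (int i = 0; i < n; i++) b[i].live_in = b[i].live_out = 0;
    while (changed) {
        changed = 0;
        /* Visiting blocks in reverse order tends to converge in fewer sweeps. */
        for (int i = n - 1; i >= 0; i--) {
            uint64_t out = 0;
            for (int s = 0; s < b[i].nsucc; s++)
                out |= b[b[i].succ[s]].live_in;
            uint64_t in = b[i].use | (out & ~b[i].def);
            if (in != b[i].live_in || out != b[i].live_out) {
                b[i].live_in = in;
                b[i].live_out = out;
                changed = 1;
            }
        }
    }
}
```

The sets only grow from empty and are bounded, so the loop terminates; a loop back edge simply costs one extra sweep before the values stabilize.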

Backend tail

Type-size lowering (src/opt/pass_lower.c): the type/size machinery that the PReg form needs before MIR. This file also hosts the register allocator and the constraint application described below.
Machinize (src/opt/pass_machinize.c): opt_machinize_native is ABI lowering against the NativeTarget. It annotates calls/returns/params with calling-convention constraints (argument/result registers, the call clobber and return masks, callee-save markers), collects the target's register classes (allocable set, reserved scratch set, caller/callee-saved masks) and checks allocable and scratch sets do not overlap, resolves inline-asm named-register constraint strings into masks, and records per-instruction fixed-register clobbers (Section 5).
MIR view (src/opt/pass_mir.c): the post-allocation physical IR. Rather than duplicate the CFG passes, pass_mir.c builds a transient Func view whose block arrays point at Func.mir, runs the shared opt_combine, opt_dce, opt_build_cfg, and opt_jump_cleanup over that view, and commits it back. The opt_mir_* wrappers are thin shims over this view; the shared passes are written once and reused for both HIR and MIR.
Coalescing / allocation (src/opt/pass_coalesce.c, src/opt/pass_lower.c): opt_regalloc_locations is a point-bitmap linear-scan allocator producing the canonical Func.preg_locs location table (hard reg or spill slot per PReg) without mutating HIR operands. The non-splitting form is the O1 path; when live-range splitting is enabled (the O2 quality path) it invokes move-related coalescing (opt_coalesce_ranges), which builds a bounded conflict matrix and merges only same-class, same-type values with compatible constraints and no range conflict — never an IRF_NO_COALESCE copy.
Jump / layout cleanup (src/opt/pass_jump.c): opt_jump_cleanup in CFG mode drops unreachable blocks and collapses unconditional-jump chains; in LAYOUT mode it reorders blocks for fallthrough, rotates simple single-latch loops, and inverts mis-aligned conditional branches so the per-iteration back-jump disappears.
Native emit (src/opt/pass_native_emit.c): opt_emit_native replays the physical MIR into a NativeTarget, using NativeLoc (register / frame / imm / address) as the operand currency. It reserves exactly the callee-saved registers the allocator used, pre-maps frame slots, drives the backend's minimal-prologue hook when available, routes scalar call results straight to their destination, uses a hardware zero register for stored zeros where the backend advertises one, and legalizes addresses the backend rejects into a reserved scratch register. See ARCH.md for the backend contract.
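Collapsing unconditional-jump chains, as the CFG mode of jump cleanup does, amounts to following "jump-only" blocks to their final destination. This is a sketch under assumptions: the array encoding and sketch_resolve_jump are hypothetical and ignore everything else the real pass handles.

```c
/* jump_only_target[i] >= 0 means block i contains nothing but an unconditional
 * jump to that block; -1 means block i has real contents. Returns the block a
 * branch to `block` should be retargeted to. */
static int sketch_resolve_jump(const int *jump_only_target, int n, int block)
{
    int hops = 0;

    /* The hop bound guards against a cycle of empty blocks (an infinite loop
     * in the source program), which would otherwise never terminate here. */
    while (jump_only_target[block] >= 0 && hops < n) {
        block = jump_only_target[block];
        hops++;
    }
    return block;
}
```

Once every branch is retargeted this way, the intermediate jump-only blocks have no predecessors left and fall out as unreachable.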

5. Machine register-constraint model

Some target instructions pin operands or results to specific physical registers and clobber others as a side effect of their encoding — hardware constraints, not allocator choices (x86-64 idiv/div pinning the dividend to rax and clobbering rdx, variable shifts requiring the count in cl, one-operand mul, cmpxchg, and the va_arg offset scratch). aarch64 and riscv64 have no such instructions — their div/shift/mul are ordinary three-operand forms — so on those targets the constraint hooks are inert.

The optimizer models all fixed-register requirements through two allocator primitives, and the allocator (pass_lower.c) speaks only in physical register numbers here:

Tied hard register (OptPRegInfo.tied_hard_reg): pin a value to a specific physical register. Set for inline-asm operands with a {reg} constraint and for fixed-input/fixed-output machine operands.
Forbidden / clobbered hard registers (OptPRegInfo.forbidden_hard_regs / clobbered_hard_regs): for each register an instruction clobbers, every value live across the instruction (live-after, not a use or def of it) is forbidden from that register. The clobbered subset is recorded separately so the soft return-register placement hint cannot later clear a forbid that came from a real hardware clobber.

Three sources feed these, all unified at allocation time:

Calls — the call plan's clobber_mask (caller-saved by default, or the call-specific mask) drives the live-across-forbid loop; argument/result registers come from the plan.
Inline asm — pass_machinize.c resolves named-register constraint strings and clobber lists into masks and fixed-register indices on the IRAsmAux; pass_lower.c's apply_asm_register_constraints ties the fixed operands and runs the live-across-forbid loop.
Generic machine instructions — a binop or convert has no aux to hang constraints on, so machinization queries the target's machine-clobber hook per instruction and stores the result in a per-function side table keyed by InstId (Func.inst_clobbers, built in machinize_inst_clobbers). At allocation, apply_machine_reg_clobbers looks up the instruction's clobber mask and applies the same live-across-forbid loop. A NULL hook (aa64/rv64) means no entries and zero behavior change.

This is the single place where target ISA register rules enter the allocator, and all three sources reuse one mechanism — tie + forbid — rather than patching assignments after the fact. A value that merely dies at the instruction needs no constraint (the backend stages it into/out of the fixed register itself); only values that survive past the instruction are forbidden.
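The live-across-forbid loop shared by all three sources can be sketched in a few lines. The field names follow OptPRegInfo as described above, but the struct, the 64-PReg bitset, and sketch_apply_clobbers are hypothetical simplifications of what pass_lower.c does.

```c
#include <stdint.h>

/* Reduced stand-in for the per-PReg constraint record. */
typedef struct SketchPRegInfo {
    uint32_t forbidden_hard_regs;   /* registers the allocator must not pick */
    uint32_t clobbered_hard_regs;   /* subset that came from a real clobber */
} SketchPRegInfo;

/* For one instruction: every PReg that is live after it and is neither a use
 * nor a def of it is barred from every register in clobber_mask. */
static void sketch_apply_clobbers(SketchPRegInfo *info, unsigned npregs,
                                  uint64_t live_after, uint64_t uses_defs,
                                  uint32_t clobber_mask)
{
    uint64_t across = live_after & ~uses_defs;

    for (unsigned p = 0; p < npregs && p < 64; p++) {
        if (across & (UINT64_C(1) << p)) {
            info[p].forbidden_hard_regs |= clobber_mask;
            info[p].clobbered_hard_regs |= clobber_mask;
        }
    }
}
```

For example, with an x86-64 division clobbering rdx, a value that is live after the divide but is not one of its operands picks up rdx in both masks, while the divide's own operands are left to the tie mechanism.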

6. Allocation, MIR, and the physical boundary

Allocation does not rewrite HIR. opt_regalloc_locations consumes block liveness and the compressed live ranges and writes one canonical location per PReg into Func.preg_locs (OptLoc: hard register or spill slot). HIR operands stay virtual after allocation — the verifier checks this.

opt_lower_to_mir then builds the physical IR Func.mir (an MFunc): each virtual OPK_REG is translated through its OptLoc into a physical register or a frame access, spilled values get a reload before each use and a store after each def, and call plans are lowered into physical argument/return moves. From this point the IR is physical and non-SSA (registers may be multiply defined). All PReg-to-physical knowledge lives in this one step; after it, the HIR is untouched and the MIR is fully physical. The downstream MIR passes (combine, DCE, jump/layout cleanup) run over the MIR view and rely on physical-register liveness for their safety checks, then opt_emit_native replays the MIR.
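The reload-before-use and store-after-def rule can be illustrated for a single unary operation. Everything here is a hypothetical simplification: SketchLoc only loosely mirrors OptLoc, the MIR instruction shape is invented, and routing spilled values through one reserved scratch register is an assumption of this sketch rather than a statement about opt_lower_to_mir.

```c
#include <assert.h>

typedef enum { SK_LOC_HARD, SK_LOC_SPILL } SketchLocKind;

/* index is a hard register number or a spill slot number, depending on kind. */
typedef struct SketchLoc { SketchLocKind kind; int index; } SketchLoc;

typedef enum { SK_M_RELOAD, SK_M_OP, SK_M_STORE } SketchMKind;

/* RELOAD: a = register, b = slot.  OP: a = dst register, b = src register.
 * STORE: a = register, b = slot. */
typedef struct SketchMInst { SketchMKind kind; int a; int b; } SketchMInst;

/* Lowers "dst = op(src)" through the location table. Writes 1 to 3 physical
 * instructions into out and returns how many were written. */
static int sketch_lower_unop(const SketchLoc *locs, unsigned dst, unsigned src,
                             int scratch, SketchMInst *out)
{
    int n = 0;
    int src_reg = locs[src].index;
    int dst_reg = locs[dst].index;

    if (locs[src].kind == SK_LOC_SPILL) {          /* reload before the use */
        out[n++] = (SketchMInst){ SK_M_RELOAD, scratch, locs[src].index };
        src_reg = scratch;
    }
    if (locs[dst].kind == SK_LOC_SPILL)
        dst_reg = scratch;
    out[n++] = (SketchMInst){ SK_M_OP, dst_reg, src_reg };
    if (locs[dst].kind == SK_LOC_SPILL)            /* store after the def */
        out[n++] = (SketchMInst){ SK_M_STORE, scratch, locs[dst].index };
    assert(n >= 1 && n <= 3);
    return n;
}
```

Note that the virtual operands are only read here; the translation happens through the location table, which matches the principle that allocation never rewrites HIR.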

The reason allocation results are a separate table rather than rewritten operands is the same mode-clarity principle from Section 2: a post-allocation pass can never accidentally treat a physical register as a PReg, replay can never see a stale virtual operand, and the MIR verifier can assert "no PRegs or Vals here" at a single boundary.

7. Verification and observability

The optimizer is checkpoint-verified in debug builds. opt_verify(Func*, stage) checks CFG reciprocity, reachable-block shape, emit-order validity, instruction ids, operand namespaces (no physical registers in HIR; correct PReg-vs-Val namespace for the current mode), phi consistency, and def-use freshness; opt_mir_verify checks the physical boundary (no virtual operands, valid frame slots, fully physical call plans). Each pass tags its checkpoint with the name of the transformation just completed, so a failure localizes to the nearest boundary. Func.opt_valid_analyses tracks coarse invalidation; passes that mutate control flow, operands, or instructions rebuild or invalidate the relevant analysis.
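Of the checks listed, CFG reciprocity is the simplest to show: every successor edge must be mirrored by a predecessor edge and vice versa. The sketch below uses a hypothetical fixed-capacity block type and is not the real opt_verify.

```c
/* Hypothetical block with explicit edge lists, as indices into the block array. */
typedef struct SketchCfgBlock {
    int nsucc, succ[2];
    int npred, pred[4];
} SketchCfgBlock;

static int sketch_contains(const int *v, int n, int x)
{
    for (int i = 0; i < n; i++)
        if (v[i] == x) return 1;
    return 0;
}

/* Returns 1 if every succ edge has a matching pred edge and every pred edge
 * has a matching succ edge; 0 on the first violation or out-of-range index. */
static int sketch_cfg_reciprocal(const SketchCfgBlock *b, int n)
{
    for (int i = 0; i < n; i++) {
        for (int s = 0; s < b[i].nsucc; s++) {
            int t = b[i].succ[s];
            if (t < 0 || t >= n || !sketch_contains(b[t].pred, b[t].npred, i))
                return 0;
        }
        for (int p = 0; p < b[i].npred; p++) {
            int t = b[i].pred[p];
            if (t < 0 || t >= n || !sketch_contains(b[t].succ, b[t].nsucc, i))
                return 0;
        }
    }
    return 1;
}
```

A pass that rewires a branch but forgets to update the target's predecessor list fails this check at the next checkpoint instead of corrupting a later dataflow analysis.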

Observability hooks: KIT_DUMP=<tag> dumps the optimizer IR at a named stage, KIT_DUMPCG=1 dumps the recorded semantic tape before lowering, KIT_DUMP_INTERP dumps the interpreter-tap Func, and the optimizer emits scoped timing/count metrics (visible through kit run --time) for the frontend, each pass scope, allocation, and emit.
