kit

git clone https://git.ryansepassi.com/git/kit.git

commit 5e4d8e66ad3facbda764b97e7df3cd9cef32ede6
parent 89b2feab45e6fd2e256d7f6ea889d6383777149e
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 15 May 2026 13:04:53 -0700

opt: keep scalar locals in virtual registers
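
As a rough illustration of the subject line, the sketch below models the storage decision that the `w_local` hook in this diff makes: a local whose flags include neither address-taken nor memory-required gets a fresh virtual register, and anything else gets a frame slot. The names `SketchFunc`, `SketchStorage`, `sketch_choose_storage`, and the `SK_*` constants are hypothetical simplifications for this note, not cfree's actual types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for CG_LOCAL_ADDR_TAKEN / CG_LOCAL_MEMORY_REQUIRED. */
enum { SK_ADDR_TAKEN = 1u << 0, SK_MEMORY_REQUIRED = 1u << 1 };

typedef enum { SK_STORAGE_FRAME, SK_STORAGE_REG } SketchStorageKind;

typedef struct {
  SketchStorageKind kind;
  unsigned id; /* virtual register id or frame slot id, depending on kind */
} SketchStorage;

typedef struct {
  unsigned next_vreg; /* next unused virtual register id */
  unsigned next_slot; /* next unused frame slot id */
} SketchFunc;

/* Pick register storage only when nothing forces the local into memory. */
static SketchStorage sketch_choose_storage(SketchFunc *f, unsigned flags) {
  SketchStorage st;
  assert(f != NULL);
  if ((flags & (SK_ADDR_TAKEN | SK_MEMORY_REQUIRED)) == 0) {
    st.kind = SK_STORAGE_REG;
    st.id = f->next_vreg++;
  } else {
    st.kind = SK_STORAGE_FRAME;
    st.id = f->next_slot++;
  }
  return st;
}
```

In the diff itself, `api_local_requires_memory` sets the memory-required flag for address-taken locals and for types that are not int, float, or pointer, so those take the frame path.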

Diffstat:
M doc/OPT1.md | 104 +++++++++++++++++++++++++++++++++++++++++++++++++++----------------------------
M src/api/cg.c | 185 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
M src/arch/arch.h | 32 ++++++++++++++++++++++++++++++++
M src/arch/regalloc.c | 5 ++++-
M src/opt/ir.c | 17 +++++++++++++++++
M src/opt/ir.h | 12 ++++++++++++
M src/opt/opt.c | 252 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/api/cg_type_test.c | 133 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/opt/opt_test.c | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/opt/phase0_guardrails.sh | 29 +++++++++++++++++++++++++++++
A test/parse/cases/opt_01_scalar_local.c | 5 +++++
A test/parse/cases/opt_01_scalar_local.expected | 1 +
A test/parse/cases/opt_02_late_addrof_join.c | 10 ++++++++++
A test/parse/cases/opt_02_late_addrof_join.expected | 1 +
M test/parse/run.sh | 53 +++++++++++++++++++++++++++++++++++++++--------------
15 files changed, 941 insertions(+), 78 deletions(-)
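
The late address-taken handling added in `src/opt/opt.c` can be pictured with a small model of what `opt_frame_home_addr_taken_locals` does in the diff below: once a register-backed local has its address taken, every instruction that reads the local's register is preceded by a load from its home slot, and every instruction that defines it is followed by a store back. The `HomeInst` type, the `HM_*` opcodes, and `home_rewrite` are hypothetical simplifications assumed for this sketch; the real pass works on `Inst`/`IRLocal` and handles several locals per block.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { HM_PLAIN, HM_LOAD_HOME, HM_STORE_HOME } HomeOp;

typedef struct {
  HomeOp op;
  int def; /* register defined by the instruction, or -1 */
  int use; /* register read by the instruction, or -1 */
} HomeInst;

/* Rewrite n instructions for one address-taken local held in local_reg.
 * out must have room for 3 * n entries. Returns the number written. */
static size_t home_rewrite(const HomeInst *in, size_t n, int local_reg,
                           HomeInst *out) {
  size_t nout = 0;
  assert(in != NULL && out != NULL);
  for (size_t i = 0; i < n; ++i) {
    if (in[i].use == local_reg) {
      /* Reload first: memory may have changed through the pointer. */
      HomeInst ld = {HM_LOAD_HOME, local_reg, -1};
      out[nout++] = ld;
    }
    out[nout++] = in[i];
    if (in[i].def == local_reg) {
      /* Write back so loads through the pointer see the new value. */
      HomeInst st = {HM_STORE_HOME, -1, local_reg};
      out[nout++] = st;
    }
  }
  return nout;
}
```

This is deliberately conservative: it trades extra loads and stores for correctness on the presumably rare locals whose address is taken after they were given a register.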

diff --git a/doc/OPT1.md b/doc/OPT1.md
@@ -181,52 +181,70 @@ make test-smoke-x64
 
 ## Current Disassembly Observations
 
-A small AArch64 Linux probe with arithmetic, a conditional, a loop, and a call
-shows the current shape:
-
-- O1 is smaller than O0 and removes many post-RA physical moves.
-- The old prologue NOP sleds are gone on O1 for AArch64, x64, and RV64; the
-  known-frame entry path emits exact prologues for those targets.
-- Loads/stores for simple C locals and parameters are still unchanged between
-  O0 and O1 in the probe.
-- Boolean branch lowering still materializes compare results:
-
-```asm
-cmp w19, #0x64
-cset w20, gt
-cmp w20, #0x0
-b.eq ...
-```
-
-The desired local shape is a direct conditional branch:
+A May 2026 probe compiled existing parse corpus cases at `-O0` and `-O1`,
+then disassembled with `llvm-objdump -dr`. The cases covered straight-line
+scalar arithmetic, a direct call, a while loop, and a backward-goto loop:
+
+| case | AArch64 O0 -> O1 | x64 O0 -> O1 | RV64 O0 -> O1 |
+| ---- | ---------------- | ------------ | ------------- |
+| `6_5_17_compound_assign` | 40 -> 18 insns, 160 -> 72 bytes | 93 -> 22 insns, 158 -> 96 bytes | 53 -> 28 insns, 212 -> 112 bytes |
+| `6_5_24_func_call` | 70 -> 26 insns, 276 -> 100 bytes | 194 -> 34 insns, 281 -> 131 bytes | 97 -> 47 insns, 384 -> 184 bytes |
+| `6_8_02_while_sum` | 52 -> 26 insns, 208 -> 104 bytes | 100 -> 29 insns, 210 -> 127 bytes | 67 -> 38 insns, 268 -> 152 bytes |
+| `6_8_11_goto_backward` | 46 -> 20 insns, 184 -> 80 bytes | 93 -> 24 insns, 186 -> 106 bytes | 61 -> 32 insns, 244 -> 128 bytes |
+
+Observed progress:
+
+- O1 is materially smaller across all three probed targets.
+- The old prologue NOP sleds are gone on O1 for AArch64, x64, and RV64.
+- Simple loop locals are now kept in registers in the AArch64, x64, and RV64
+  loop probes instead of being reloaded from stack slots on each iteration.
+- Branch-consuming compares now lower to direct compare branches in the probed
+  AArch64 cases. For example, the while-loop condition is:
 
 ```asm
-cmp w19, #0x64
-b.le ...
+cmp w19, #0xa
+b.ge ...
 ```
 
-The same pattern appears in loop conditions.
+  The larger short-circuit control-flow probe likewise shows `cmp` plus direct
+  conditional branches and no `cset` bridge for branch consumers.
+
+Remaining O1 shape issues visible in the same dumps:
+
+- Cheap branch layout cleanup is still missing. O1 frequently emits an
+  unconditional branch to the immediately following block, including before
+  loop headers and before shared return epilogues.
+- O1 still saves/restores more callee-saved registers than the body appears to
+  use in small functions. For example, the AArch64 while-loop probe saves
+  `x22` even though the body uses `x19-x21`; x64 and RV64 show similar
+  over-preservation.
+- Post-RA copy cleanup still leaves avoidable moves such as `add tmp, ...`
+  followed by `mov live, tmp`. More two-address or destination-selection
+  folding would help on all targets.
+- Parameter and entry-slot promotion is incomplete. The trivial identity
+  function at O1 still stores `w0` to a frame slot and reloads it before
+  returning on AArch64.
+- Very local constant simplification is still absent in O1. The
+  `int x = 40; x += 2; return x;` probe is much smaller than O0, but still
+  emits immediate materialization plus an add instead of returning `42`.
 
 ## Todo: Code Quality
 
 MIR's O1 path suggests these high-value local cleanups that still fit cfree's
 fast tier:
 
-1. Fuse compare-result conditionals into direct compare branches.
-   MIR's C frontend emits direct compare branches for conditional compares, and
-   its generic compare-value-plus-branch fusion lives in the heavier SSA
-   combine path. For cfree O1, recover or preserve `IR_CMP_BRANCH` instead of
-   lowering through `IR_CMP` plus `IR_CONDBR` when the compare result has a
-   single branch use. This should use target `cmp_branch` support and should
-   remove the `cmp; cset; cmp #0; b.cond` pattern on AArch64.
+1. Keep compare-branch fusion covered by tests.
+   The current probes show direct `cmp` plus branch shapes for branch-consuming
+   compares on AArch64. Add focused regression coverage so the old
+   `cmp; cset; cmp #0; b.cond` bridge does not return.
 
-2. Promote simple scalar locals before backend allocation.
+2. Promote remaining scalar entry slots before backend allocation.
    MIR's C frontend represents normal scalar block locals as MIR registers and
    leaves stack slots for aggregates, forced-stack cases, and address-taken
-   values. O1 still stores parameters and scalar locals to frame slots and
-   reloads them in straightforward code. A conservative mem2reg-lite pass
-   should promote locals whose address does not escape, starting with
-   integer/pointer scalars in single-entry structured control flow.
+   values. O1 now keeps simple loop locals in registers in the probe, but still
+   stores and reloads some parameter/entry slots. A conservative mem2reg-lite
+   pass should promote remaining integer/pointer scalars whose address does not
+   escape, starting with parameters and single-entry structured control flow.
 
 3. Clean up local branch layout artifacts.
    MIR's full jump optimizer is O2-only, but its cheap pieces are appropriate
@@ -234,10 +252,22 @@ fast tier:
    branch-to-branch targets, and invert a branch when it removes an
    unconditional jump. Avoid full CFG layout work.
 
-- Continue tightening post-rewrite DCE.
-  Model hard-register call arguments and call clobbers precisely enough to
-  delete dead caller-saved defs before calls without removing required ABI
-  traffic.
+4. Avoid unnecessary callee-save traffic.
+   Reserve and preserve only hard registers that survive final post-rewrite
+   cleanup, and consider caller-saved registers for values that are not live
+   across calls. This would make small leaf functions much closer to expected
+   O1 output without requiring global optimization.
+
+5. Continue tightening post-rewrite DCE and copy cleanup.
+   Model hard-register call arguments and call clobbers precisely enough to
+   delete dead caller-saved defs before calls without removing required ABI
+   traffic. Also fold single-use arithmetic temporaries into their destination
+   when target constraints allow it.
+
+6. Add tiny local constant simplification where it is cheap.
+   O1 should not grow full SSA value optimization, but folding immediate-only
+   straight-line arithmetic before allocation would remove obvious code in
+   small functions without pulling O2 machinery into the fast tier.
 
 - Keep `opt_combine` legality target-aware.
   Existing one-use copy/immediate/convert folds should stay conservative. New
diff --git a/src/api/cg.c b/src/api/cg.c
@@ -990,6 +990,7 @@ typedef enum SResidency {
   RES_INHERENT,
   RES_REG,
   RES_SPILLED,
+  RES_FIXED_REG,
 } SResidency;
 
 typedef enum ApiSValueKind {
@@ -1007,8 +1008,11 @@ typedef struct ApiSValue {
   u8 res;
   u8 pinned;
   u8 lvalue;
-  u8 pad[1];
+  u8 cmp_a_owned;
+  u8 cmp_b_owned;
+  u8 pad[3];
   FrameSlot spill_slot;
+  CfreeCgLocal source_local;
 } ApiSValue;
 
 #define API_CG_STACK_INITIAL 16u
@@ -1036,7 +1040,8 @@ typedef struct ApiSourceLocal {
   CfreeSym name;
   CfreeCgLocalAttrs attrs;
   SrcLoc loc;
-  FrameSlot slot;
+  CGLocalDesc desc;
+  CGLocalStorage storage;
   u32 param_index;
   u8 kind;
   u8 pad[3];
@@ -1256,6 +1261,7 @@ static ApiSValue api_make_sv(Operand op, CfreeCgTypeId ty) {
   sv.type = ty;
   sv.res = api_residency_for(&op);
   sv.spill_slot = FRAME_SLOT_NONE;
+  sv.source_local = CFREE_CG_LOCAL_NONE;
   return sv;
 }
 
@@ -1266,7 +1272,8 @@ static ApiSValue api_make_lv(Operand op, CfreeCgTypeId ty) {
 }
 
 static ApiSValue api_make_cmp(CmpOp op, Operand a, Operand b,
-                              CfreeCgTypeId result_ty) {
+                              CfreeCgTypeId result_ty, int a_owned,
+                              int b_owned) {
   ApiSValue sv;
   memset(&sv, 0, sizeof sv);
   sv.kind = SV_CMP;
@@ -1274,8 +1281,11 @@ static ApiSValue api_make_cmp(CmpOp op, Operand a, Operand b,
   sv.cmp_op = op;
   sv.cmp_a = a;
   sv.cmp_b = b;
+  sv.cmp_a_owned = a_owned ? 1u : 0u;
+  sv.cmp_b_owned = b_owned ? 1u : 0u;
   sv.res = RES_INHERENT;
   sv.spill_slot = FRAME_SLOT_NONE;
+  sv.source_local = CFREE_CG_LOCAL_NONE;
   return sv;
 }
 
@@ -1298,7 +1308,10 @@ static int api_sv_op_is_reg_or_imm(const ApiSValue *sv) {
 }
 
 static int api_is_lvalue_sv(const ApiSValue *sv) {
-  return sv->lvalue && api_operand_can_address(&sv->op);
+  return sv->lvalue &&
+         (api_operand_can_address(&sv->op) ||
+          (sv->source_local != CFREE_CG_LOCAL_NONE &&
+           sv->op.kind == OPK_REG));
 }
 
 static void api_stack_grow(CfreeCg *g, u32 want) {
@@ -1554,29 +1567,41 @@ static void api_release_operand_reg(CfreeCg *g, Operand op) {
   api_free_reg(g, op.v.reg, op.cls);
 }
 
+static int api_sv_owns_operand_reg(const ApiSValue *sv, const Operand *op) {
+  return sv->res == RES_REG && op->kind == OPK_REG && sv->op.kind == OPK_REG &&
+         sv->op.v.reg == op->v.reg && sv->op.cls == op->cls;
+}
+
 static void api_release_cmp(CfreeCg *g, ApiSValue *sv) {
-  api_release_operand_reg(g, sv->cmp_a);
-  if (sv->cmp_b.kind != OPK_REG || sv->cmp_a.kind != OPK_REG ||
-      sv->cmp_b.v.reg != sv->cmp_a.v.reg || sv->cmp_b.cls != sv->cmp_a.cls) {
+  if (sv->cmp_a_owned)
+    api_release_operand_reg(g, sv->cmp_a);
+  if (sv->cmp_b_owned &&
+      (sv->cmp_b.kind != OPK_REG || sv->cmp_a.kind != OPK_REG ||
+       sv->cmp_b.v.reg != sv->cmp_a.v.reg || sv->cmp_b.cls != sv->cmp_a.cls ||
+       !sv->cmp_a_owned)) {
     api_release_operand_reg(g, sv->cmp_b);
   }
   memset(&sv->cmp_a, 0, sizeof sv->cmp_a);
   memset(&sv->cmp_b, 0, sizeof sv->cmp_b);
+  sv->cmp_a_owned = 0;
+  sv->cmp_b_owned = 0;
   sv->kind = SV_OPERAND;
 }
 
 static void api_materialize_cmp_to(CfreeCg *g, ApiSValue *sv, Operand dst) {
   g->target->cmp(g->target, sv->cmp_op, dst, sv->cmp_a, sv->cmp_b);
-  if (sv->cmp_a.kind == OPK_REG &&
+  if (sv->cmp_a_owned && sv->cmp_a.kind == OPK_REG &&
       (sv->cmp_a.v.reg != dst.v.reg || sv->cmp_a.cls != dst.cls)) {
     api_release_operand_reg(g, sv->cmp_a);
   }
-  if (sv->cmp_b.kind == OPK_REG &&
+  if (sv->cmp_b_owned && sv->cmp_b.kind == OPK_REG &&
       (sv->cmp_b.v.reg != dst.v.reg || sv->cmp_b.cls != dst.cls)) {
     api_release_operand_reg(g, sv->cmp_b);
   }
   memset(&sv->cmp_a, 0, sizeof sv->cmp_a);
   memset(&sv->cmp_b, 0, sizeof sv->cmp_b);
+  sv->cmp_a_owned = 0;
+  sv->cmp_b_owned = 0;
   sv->kind = SV_OPERAND;
   sv->op = dst;
   sv->type = dst.type;
@@ -2302,6 +2327,14 @@ static int api_source_flags_addr_taken(u32 flags) {
   return (flags & CFREE_CG_LOCAL_ADDRESS_TAKEN) != 0;
 }
 
+static int api_local_requires_memory(CfreeCg *g, CfreeCgTypeId ty,
+                                     CfreeCgLocalAttrs attrs) {
+  if (api_source_flags_addr_taken(attrs.flags))
+    return 1;
+  return !(cg_type_is_int(g->c, ty) || cg_type_is_float(g->c, ty) ||
+           cg_type_is_ptr(g->c, ty));
+}
+
 static CfreeCgLocal api_local_handle(u32 index) {
   u32 raw = index + 1u;
   if (!raw)
@@ -2343,11 +2376,29 @@ static ApiSourceLocal *api_local_from_handle(CfreeCg *g, CfreeCgLocal local) {
   return &g->locals[index];
 }
 
+static CGLocalStorage api_frame_local_storage(CfreeCg *g,
+                                              const CGLocalDesc *d) {
+  FrameSlotDesc fsd;
+  CGLocalStorage st;
+  memset(&fsd, 0, sizeof fsd);
+  fsd.type = d->type;
+  fsd.name = d->name;
+  fsd.loc = d->loc;
+  fsd.size = d->size;
+  fsd.align = d->align;
+  fsd.kind = FS_LOCAL;
+  if (d->flags & CG_LOCAL_ADDR_TAKEN)
+    fsd.flags |= FSF_ADDR_TAKEN;
+  st.kind = CG_LOCAL_STORAGE_FRAME;
+  st.v.frame_slot = g->target->frame_slot(g->target, &fsd);
+  return st;
+}
+
 CfreeCgLocal cfree_cg_local(CfreeCg *g, CfreeCgTypeId type,
                             CfreeCgLocalAttrs attrs) {
   CfreeCgTypeId ty;
-  FrameSlotDesc fsd;
-  FrameSlot slot;
+  CGLocalDesc desc;
+  CGLocalStorage storage;
   ApiSourceLocal *rec;
   CfreeCgLocal handle;
   if (!g)
@@ -2358,22 +2409,31 @@ CfreeCgLocal cfree_cg_local(CfreeCg *g, CfreeCgTypeId type,
   handle = api_local_handle(g->nlocals);
   if (handle == CFREE_CG_LOCAL_NONE || !api_grow_locals(g, g->nlocals + 1u))
     return CFREE_CG_LOCAL_NONE;
-  memset(&fsd, 0, sizeof fsd);
-  fsd.type = ty;
-  fsd.name = (Sym)attrs.name;
-  fsd.loc = g->cur_loc;
-  fsd.size = abi_cg_sizeof(g->c->abi, type);
-  fsd.align = attrs.align ? attrs.align : abi_cg_alignof(g->c->abi, type);
-  fsd.kind = FS_LOCAL;
+  memset(&desc, 0, sizeof desc);
+  desc.type = ty;
+  desc.name = (Sym)attrs.name;
+  desc.loc = g->cur_loc;
+  desc.size = abi_cg_sizeof(g->c->abi, type);
+  desc.align = attrs.align ? attrs.align : abi_cg_alignof(g->c->abi, type);
   if (api_source_flags_addr_taken(attrs.flags))
-    fsd.flags |= FSF_ADDR_TAKEN;
-  slot = g->target->frame_slot(g->target, &fsd);
+    desc.flags |= CG_LOCAL_ADDR_TAKEN;
+  if (api_local_requires_memory(g, ty, attrs))
+    desc.flags |= CG_LOCAL_MEMORY_REQUIRED;
+  if (g->target->local)
+    storage = g->target->local(g->target, &desc);
+  else
+    storage = api_frame_local_storage(g, &desc);
+  if (storage.kind == CG_LOCAL_STORAGE_REG) {
+    cg_simple_regalloc_reserve(&g->regalloc, (RegClass)api_type_class(ty),
+                               storage.v.reg);
+  }
   rec = &g->locals[g->nlocals++];
   rec->type = ty;
   rec->name = attrs.name;
   rec->attrs = attrs;
   rec->loc = g->cur_loc;
-  rec->slot = slot;
+  rec->desc = desc;
+  rec->storage = storage;
   rec->param_index = 0;
   rec->kind = API_SOURCE_LOCAL_AUTO;
   return handle;
@@ -2425,7 +2485,17 @@ CfreeCgLocal cfree_cg_param(CfreeCg *g, uint32_t index, CfreeCgTypeId type,
   rec->name = attrs.name;
   rec->attrs = attrs;
   rec->loc = g->cur_loc;
-  rec->slot = slot;
+  memset(&rec->desc, 0, sizeof rec->desc);
+  rec->desc.type = ty;
+  rec->desc.name = (Sym)attrs.name;
+  rec->desc.loc = g->cur_loc;
+  rec->desc.size = fsd.size;
+  rec->desc.align = fsd.align;
+  rec->desc.flags = api_source_flags_addr_taken(attrs.flags)
+                        ? CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED
+                        : CG_LOCAL_MEMORY_REQUIRED;
+  rec->storage.kind = CG_LOCAL_STORAGE_FRAME;
+  rec->storage.v.frame_slot = slot;
   rec->param_index = index;
   rec->kind = API_SOURCE_LOCAL_PARAM;
   return handle;
@@ -2530,6 +2600,27 @@ static void api_push_frame_lvalue(CfreeCg *g, FrameSlot slot,
   api_push(g, api_make_lv(api_op_local(slot, type), type));
 }
 
+static void api_push_source_frame_lvalue(CfreeCg *g, CfreeCgLocal local,
+                                         FrameSlot slot, CfreeCgTypeId type) {
+  ApiSValue sv;
+  if (!g)
+    return;
+  sv = api_make_lv(api_op_local(slot, type), type);
+  sv.source_local = local;
+  api_push(g, sv);
+}
+
+static void api_push_source_reg_lvalue(CfreeCg *g, CfreeCgLocal local, Reg reg,
+                                       CfreeCgTypeId type) {
+  ApiSValue sv;
+  if (!g)
+    return;
+  sv = api_make_lv(api_op_reg(reg, type), type);
+  sv.res = RES_FIXED_REG;
+  sv.source_local = local;
+  api_push(g, sv);
+}
+
 void cfree_cg_push_local(CfreeCg *g, CfreeCgLocal local) {
   ApiSourceLocal *rec;
   if (!g)
@@ -2537,7 +2628,15 @@ void cfree_cg_push_local(CfreeCg *g, CfreeCgLocal local) {
   rec = api_local_from_handle(g, local);
   if (!rec)
     return;
-  api_push_frame_lvalue(g, rec->slot, rec->type);
+  if (rec->kind == API_SOURCE_LOCAL_AUTO &&
+      rec->storage.kind == CG_LOCAL_STORAGE_REG) {
+    api_push_source_reg_lvalue(g, local, rec->storage.v.reg, rec->type);
+  } else if (rec->kind == API_SOURCE_LOCAL_AUTO) {
+    api_push_source_frame_lvalue(g, local, rec->storage.v.frame_slot,
+                                 rec->type);
+  } else {
+    api_push_frame_lvalue(g, rec->storage.v.frame_slot, rec->type);
+  }
 }
 
 void cfree_cg_push_local_addr(CfreeCg *g, CfreeCgLocal local) {
@@ -2650,6 +2749,16 @@ void cfree_cg_load(CfreeCg *g, CfreeCgMemAccess access) {
   ty = resolve_type(g->c, access.type);
   if (!ty)
     ty = api_sv_type(&v);
+  if (v.source_local != CFREE_CG_LOCAL_NONE && v.op.kind == OPK_REG) {
+    dst = v.op;
+    dst.type = ty;
+    v.op = dst;
+    v.type = ty;
+    v.lvalue = 0;
+    v.res = RES_FIXED_REG;
+    api_push(g, v);
+    return;
+  }
   dst = api_force_reg(g, &v, ty);
   dst.type = ty;
   api_push(g, api_make_sv(dst, ty));
@@ -2680,6 +2789,7 @@ void cfree_cg_addr(CfreeCg *g) {
   CfreeCgTypeId pty;
   Reg r;
   Operand dst;
+  ApiSourceLocal *rec;
   if (!g)
     return;
   T = g->target;
@@ -2692,7 +2802,13 @@ void cfree_cg_addr(CfreeCg *g) {
   pty = cg_type_ptr_to(g->c, api_sv_type(&v));
   r = api_alloc_reg_or_spill(g, RC_INT, pty);
   dst = api_op_reg(r, pty);
-  T->addr_of(T, dst, v.op);
+  rec = v.source_local != CFREE_CG_LOCAL_NONE
+            ? api_local_from_handle(g, v.source_local)
+            : NULL;
+  if (rec && rec->kind == API_SOURCE_LOCAL_AUTO && T->local_addr)
+    T->local_addr(T, dst, &rec->desc, rec->storage);
+  else
+    T->addr_of(T, dst, v.op);
   api_release(g, &v);
   api_push(g, api_make_sv(dst, pty));
 }
@@ -2722,7 +2838,22 @@ void cfree_cg_store(CfreeCg *g, CfreeCgMemAccess access) {
   } else {
     src = api_force_reg(g, &rv, api_sv_type(&rv));
   }
-  T->store(T, lv.op, src, api_mem_from_access(g, &lv.op, access));
+  if (lv.source_local != CFREE_CG_LOCAL_NONE && lv.op.kind == OPK_REG) {
+    Operand dst = lv.op;
+    dst.type = ty;
+    if (src.kind == OPK_IMM) {
+      T->load_imm(T, dst, src.v.imm);
+    } else if (src.kind == OPK_REG) {
+      if (src.v.reg != dst.v.reg || src.cls != dst.cls)
+        T->copy(T, dst, src);
+    } else {
+      src = api_force_reg(g, &rv, ty);
+      if (src.v.reg != dst.v.reg || src.cls != dst.cls)
+        T->copy(T, dst, src);
+    }
+  } else {
+    T->store(T, lv.op, src, api_mem_from_access(g, &lv.op, access));
+  }
   api_release(g, &lv);
   api_release(g, &rv);
 }
@@ -2857,7 +2988,9 @@ static void api_cg_cmp(CfreeCg *g, CmpOp cop) {
   ra = api_force_reg_unless_imm(g, &a, opty);
   rb = api_force_reg_unless_imm(g, &b, opty);
   if (api_type_class(opty) != RC_FP) {
-    api_push(g, api_make_cmp(cop, ra, rb, i32));
+    api_push(g, api_make_cmp(cop, ra, rb, i32,
+                             api_sv_owns_operand_reg(&a, &ra),
+                             api_sv_owns_operand_reg(&b, &rb)));
     return;
   }
   rr = api_alloc_reg_or_spill(g, RC_INT, i32);
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -187,6 +187,35 @@ typedef struct FrameSlotDesc {
   u16 flags; /* FrameSlotFlag */
 } FrameSlotDesc;
 
+typedef enum CGLocalFlag {
+  CG_LOCAL_NONE = 0,
+  CG_LOCAL_ADDR_TAKEN = 1u << 0,
+  CG_LOCAL_MEMORY_REQUIRED = 1u << 1,
+} CGLocalFlag;
+
+typedef struct CGLocalDesc {
+  CfreeCgTypeId type;
+  Sym name;
+  SrcLoc loc;
+  u32 size;
+  u32 align;
+  u32 flags; /* CGLocalFlag */
+} CGLocalDesc;
+
+typedef enum CGLocalStorageKind {
+  CG_LOCAL_STORAGE_FRAME,
+  CG_LOCAL_STORAGE_REG,
+} CGLocalStorageKind;
+
+typedef struct CGLocalStorage {
+  u8 kind; /* CGLocalStorageKind */
+  u8 pad[3];
+  union {
+    FrameSlot frame_slot;
+    Reg reg;
+  } v;
+} CGLocalStorage;
+
 typedef enum MemFlag {
   MF_NONE = 0,
   MF_VOLATILE = 1u << 0,
@@ -506,6 +535,9 @@ struct CGTarget {
    * regs to the target. Plain machine targets consume hard regs; opt_cgtarget
    * sets virtual_regs and records virtual Reg ids as SSA values. */
   FrameSlot (*frame_slot)(CGTarget*, const FrameSlotDesc*);
+  CGLocalStorage (*local)(CGTarget*, const CGLocalDesc*);
+  void (*local_addr)(CGTarget*, Operand dst, const CGLocalDesc*,
+                     CGLocalStorage);
   void (*param)(CGTarget*, const CGParamDesc*);
   void (*spill_reg)(CGTarget*, Operand src_reg, FrameSlot, MemAccess);
   void (*reload_reg)(CGTarget*, Operand dst_reg, FrameSlot, MemAccess);
diff --git a/src/arch/regalloc.c b/src/arch/regalloc.c
@@ -111,7 +111,10 @@ int cg_simple_regalloc_free(CGSimpleRegAlloc* a, RegClass cls, Reg r) {
 
 void cg_simple_regalloc_reserve(CGSimpleRegAlloc* a, RegClass cls, Reg r) {
   if ((u32)cls >= 3u) return;
-  if (a->virtual_regs) return;
+  if (a->virtual_regs) {
+    if (r != (Reg)REG_NONE && r >= a->next_virtual) a->next_virtual = r + 1u;
+    return;
+  }
   cg_simple_regpool_reserve(&a->pools[cls], r);
 }
 
diff --git a/src/opt/ir.c b/src/opt/ir.c
@@ -177,6 +177,23 @@ void ir_param_add(Func* f, const CGParamDesc* d) {
   p->loc = d->loc;
 }
 
+u32 ir_local_add(Func* f, const CGLocalDesc* d, CGLocalStorage storage) {
+  IRLocal* l;
+  if (f->nlocals == f->locals_cap) {
+    u32 ncap = f->locals_cap ? f->locals_cap * 2u : 8u;
+    IRLocal* nb = arena_zarray(f->arena, IRLocal, ncap);
+    if (f->locals) memcpy(nb, f->locals, sizeof(IRLocal) * f->nlocals);
+    f->locals = nb;
+    f->locals_cap = ncap;
+  }
+  l = &f->locals[f->nlocals];
+  l->id = f->nlocals + 1u;
+  l->desc = *d;
+  l->storage = storage;
+  ++f->nlocals;
+  return l->id;
+}
+
 /* ---- construction ---- */
 
 Func* ir_func_new(Compiler* c, const CGFuncDesc* desc) {
diff --git a/src/opt/ir.h b/src/opt/ir.h
@@ -217,6 +217,15 @@ typedef struct IRParam {
   SrcLoc loc;
 } IRParam;
 
+typedef struct IRLocal {
+  u32 id;
+  CGLocalDesc desc;
+  CGLocalStorage storage;
+  FrameSlot home_slot;
+  u8 address_taken;
+  u8 pad[3];
+} IRLocal;
+
 /* ---- Inst / Block / Func ---- */
 
 typedef struct Inst {
@@ -293,6 +302,8 @@ typedef struct Func {
   u32 nframe_slots, frame_slots_cap;
   IRParam* params;
   u32 nparams, params_cap;
+  IRLocal* locals;
+  u32 nlocals, locals_cap;
 
   /* Value table. Index 0 is VAL_NONE; first allocated Val is 1. */
   u32* val_def_block;
@@ -352,6 +363,7 @@ Func* ir_func_new(Compiler*, const CGFuncDesc*);
 u32 ir_block_new(Func*);
 FrameSlot ir_frame_slot_new(Func*, const FrameSlotDesc*);
 void ir_param_add(Func*, const CGParamDesc*);
+u32 ir_local_add(Func*, const CGLocalDesc*, CGLocalStorage);
 Val ir_alloc_val(Func*, CfreeCgTypeId, u8 cls);
 void ir_ensure_val(Func*, Val, CfreeCgTypeId, u8 cls);
 
diff --git a/src/opt/opt.c b/src/opt/opt.c
@@ -127,6 +127,7 @@ static void w_func_begin(CGTarget* t, const CGFuncDesc* fd) {
 }
 
 static void w_func_end(CGTarget* t);
+static void w_addr_of(CGTarget* t, Operand dst, Operand lv);
 
 /* ---- registers and frame slots ---- */
 
@@ -135,6 +136,254 @@ static FrameSlot w_frame_slot(CGTarget* t, const FrameSlotDesc* d) {
   return ir_frame_slot_new(o->f, d);
 }
 
+static FrameSlot opt_local_frame_slot(Func* f, const CGLocalDesc* d,
+                                      int force_addr_taken) {
+  FrameSlotDesc fsd;
+  memset(&fsd, 0, sizeof fsd);
+  fsd.type = d->type;
+  fsd.name = d->name;
+  fsd.loc = d->loc;
+  fsd.size = d->size;
+  fsd.align = d->align;
+  fsd.kind = FS_LOCAL;
+  if (force_addr_taken || (d->flags & CG_LOCAL_ADDR_TAKEN))
+    fsd.flags |= FSF_ADDR_TAKEN;
+  return ir_frame_slot_new(f, &fsd);
+}
+
+static u8 opt_local_reg_class_for(Compiler* c, CfreeCgTypeId ty) {
+  CfreeCgTypeKind kind = cfree_cg_type_kind((CfreeCompiler*)c, ty);
+  return kind == CFREE_CG_TYPE_FLOAT ? RC_FP : RC_INT;
+}
+
+static u8 opt_local_reg_class(OptImpl* o, CfreeCgTypeId ty) {
+  return opt_local_reg_class_for(o->c, ty);
+}
+
+static CGLocalStorage w_local(CGTarget* t, const CGLocalDesc* d) {
+  OptImpl* o = impl_of(t);
+  CGLocalStorage st;
+  memset(&st, 0, sizeof st);
+  if ((d->flags & (CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED)) == 0) {
+    Val v = ir_alloc_val(o->f, d->type, opt_local_reg_class(o, d->type));
+    st.kind = CG_LOCAL_STORAGE_REG;
+    st.v.reg = (Reg)v;
+  } else {
+    st.kind = CG_LOCAL_STORAGE_FRAME;
+    st.v.frame_slot = opt_local_frame_slot(o->f, d, 0);
+  }
+  ir_local_add(o->f, d, st);
+  return st;
+}
+
+static IRLocal* opt_find_local_by_reg(Func* f, Reg reg) {
+  for (u32 i = 0; i < f->nlocals; ++i) {
+    IRLocal* l = &f->locals[i];
+    if (l->storage.kind == CG_LOCAL_STORAGE_REG && l->storage.v.reg == reg)
+      return l;
+  }
+  return NULL;
+}
+
+static void w_local_addr(CGTarget* t, Operand dst, const CGLocalDesc* d,
+                         CGLocalStorage st) {
+  OptImpl* o = impl_of(t);
+  IRLocal* local = NULL;
+  FrameSlot frame_slot = FRAME_SLOT_NONE;
+  const CGLocalDesc* desc = d;
+  if (st.kind == CG_LOCAL_STORAGE_REG) {
+    local = opt_find_local_by_reg(o->f, st.v.reg);
+    if (!local) {
+      compiler_panic(o->c, d ? d->loc : o->pending_loc,
+                     "opt_cgtarget: unknown register-backed local address");
+    }
+    if (local->home_slot == FRAME_SLOT_NONE)
+      local->home_slot = opt_local_frame_slot(o->f, &local->desc, 1);
+    local->address_taken = 1;
+    local->desc.flags |= CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED;
+    frame_slot = local->home_slot;
+    desc = &local->desc;
+  } else {
+    frame_slot = st.v.frame_slot;
+  }
+  Operand lv;
+  memset(&lv, 0, sizeof lv);
+  lv.kind = OPK_LOCAL;
+  lv.cls = RC_INT;
+  lv.type = desc ? desc->type : dst.type;
+  lv.v.frame_slot = frame_slot;
+  w_addr_of(t, dst, lv);
+}
+
+static Operand opt_local_addr_operand(IRLocal* l) {
+  Operand o;
+  memset(&o, 0, sizeof o);
+  o.kind = OPK_LOCAL;
+  o.cls = RC_INT;
+  o.type = l->desc.type;
+  o.v.frame_slot = l->home_slot;
+  return o;
+}
+
+static MemAccess opt_local_mem(IRLocal* l) {
+  MemAccess m;
+  memset(&m, 0, sizeof m);
+  m.type = l->desc.type;
+  m.size = l->desc.size;
+  m.align = l->desc.align;
+  m.alias.kind = ALIAS_LOCAL;
+  m.alias.v.local_id = (i32)l->home_slot;
+  return m;
+}
+
+static int inst_defines_val(const Inst* in, Val v) {
+  if (!in || v == VAL_NONE) return 0;
+  if (in->def == v) return 1;
+  for (u32 i = 0; i < in->ndefs; ++i)
+    if (in->defs[i] == v) return 1;
+  return 0;
+}
+
+static int op_uses_reg(const Operand* op, Reg reg) {
+  if (!op) return 0;
+  if (op->kind == OPK_REG && op->v.reg == reg) return 1;
+  if (op->kind == OPK_INDIRECT && op->v.ind.base == reg) return 1;
+  return 0;
+}
+
+static int abivalue_uses_reg(const CGABIValue* v, Reg reg) {
+  if (!v) return 0;
+  if (op_uses_reg(&v->storage, reg)) return 1;
+  for (u32 i = 0; i < v->nparts; ++i)
+    if (op_uses_reg(&v->parts[i].op, reg)) return 1;
+  return 0;
+}
+
+static int inst_uses_local_reg(const Inst* in, Reg reg) {
+  if (!in) return 0;
+  for (u32 i = 0; i < in->nopnds; ++i) {
+    int is_def = i == 0 && in->opnds[i].kind == OPK_REG &&
+                 inst_defines_val(in, (Val)in->opnds[i].v.reg);
+    if (!is_def && op_uses_reg(&in->opnds[i], reg)) return 1;
+  }
+  switch ((IROp)in->op) {
+    case IR_CALL: {
+      IRCallAux* aux = (IRCallAux*)in->extra.aux;
+      if (!aux) return 0;
+      if (op_uses_reg(&aux->desc.callee, reg)) return 1;
+      for (u32 i = 0; i < aux->desc.nargs; ++i)
+        if (abivalue_uses_reg(&aux->desc.args[i], reg)) return 1;
+      return 0;
+    }
+    case IR_RET: {
+      IRRetAux* aux = (IRRetAux*)in->extra.aux;
+      return aux && aux->present && abivalue_uses_reg(&aux->val, reg);
+    }
+    case IR_SCOPE_BEGIN: {
+      IRScopeAux* aux = (IRScopeAux*)in->extra.aux;
+      return aux && op_uses_reg(&aux->desc.cond, reg);
+    }
+    case IR_ASM_BLOCK: {
+      IRAsmAux* aux = (IRAsmAux*)in->extra.aux;
+      if (!aux) return 0;
+      for (u32 i = 0; i < aux->nin; ++i)
+        if (op_uses_reg(&aux->in_ops[i], reg)) return 1;
+      return 0;
+    }
+    case IR_INTRINSIC: {
+      IRIntrinAux* aux = (IRIntrinAux*)in->extra.aux;
+      if (!aux) return 0;
+      for (u32 i = 0; i < aux->narg; ++i)
+        if (op_uses_reg(&aux->args[i], reg)) return 1;
+      return 0;
+    }
+    default:
+      return 0;
+  }
+}
+
+static void opt_make_local_load(Func* f, Inst* out, IRLocal* l, SrcLoc loc) {
+  memset(out, 0, sizeof *out);
+  out->op = IR_LOAD;
+  out->loc = loc;
+  out->type = l->desc.type;
+  out->def = (Val)l->storage.v.reg;
+  out->opnds = arena_array(f->arena, Operand, 2);
+  out->opnds[0].kind = OPK_REG;
+  out->opnds[0].cls = opt_local_reg_class_for(f->c, l->desc.type);
+  out->opnds[0].type = l->desc.type;
+  out->opnds[0].v.reg = l->storage.v.reg;
+  out->opnds[1] = opt_local_addr_operand(l);
+  out->nopnds = 2;
+  out->extra.mem = opt_local_mem(l);
+}
+
+static void opt_make_local_store(Func* f, Inst* out, IRLocal* l, SrcLoc loc) {
+  memset(out, 0, sizeof *out);
+  out->op = IR_STORE;
+  out->loc = loc;
+  out->opnds = arena_array(f->arena, Operand, 2);
+  out->opnds[0] = opt_local_addr_operand(l);
+  out->opnds[1].kind = OPK_REG;
+  out->opnds[1].cls = opt_local_reg_class_for(f->c, l->desc.type);
+  out->opnds[1].type = l->desc.type;
+  out->opnds[1].v.reg = l->storage.v.reg;
+  out->nopnds = 2;
+  out->extra.mem = opt_local_mem(l);
+}
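+
+static IRLocal* opt_addr_taken_reg_local_defined_by(Func* f, const Inst* in) {
+  if (!in) return NULL;
+  for (u32 i = 0; i < f->nlocals; ++i) {
+    IRLocal* l = &f->locals[i];
+    if (!l->address_taken || l->home_slot == FRAME_SLOT_NONE) continue;
+    if (l->storage.kind == CG_LOCAL_STORAGE_REG &&
+        inst_defines_val(in, (Val)l->storage.v.reg))
+      return l;
+  }
+  return NULL;
+}
+
+static void opt_frame_home_addr_taken_locals(Func* f) {
+  int any = 0;
+  for (u32 i = 0; i < f->nlocals; ++i) {
+    IRLocal* l = &f->locals[i];
+    if (l->address_taken && l->storage.kind == CG_LOCAL_STORAGE_REG &&
+        l->home_slot != FRAME_SLOT_NONE) {
+      any = 1;
+      break;
+    }
+  }
+  if (!any) return;
+
+  for (u32 b = 0; b < f->nblocks; ++b) {
+    Block* bl = &f->blocks[b];
+    if (!bl->ninsts) continue;
+    u32 out_cap = bl->ninsts * (f->nlocals + 2u);
+    Inst* out = arena_zarray(f->arena, Inst, out_cap ? out_cap : 1u);
+    u32 nout = 0;
+    for (u32 i = 0; i < bl->ninsts; ++i) {
+      Inst in = bl->insts[i];
+      for (u32 j = 0; j < f->nlocals; ++j) {
+        IRLocal* used = &f->locals[j];
+        if (!used->address_taken || used->home_slot == FRAME_SLOT_NONE)
+          continue;
+        if (used->storage.kind != CG_LOCAL_STORAGE_REG)
+          continue;
+        if (inst_uses_local_reg(&in, used->storage.v.reg))
+          opt_make_local_load(f, &out[nout++], used, in.loc);
+      }
+      out[nout++] = in;
+      IRLocal* defined = opt_addr_taken_reg_local_defined_by(f, &in);
+      if (defined)
+        opt_make_local_store(f, &out[nout++], defined, in.loc);
+    }
+    bl->insts = out;
+    bl->ninsts = nout;
+    bl->cap = bl->ninsts;
+  }
+}
+
 static void w_param(CGTarget* t, const CGParamDesc* d) {
   OptImpl* o = impl_of(t);
   /* Deep-copy parts so caller-stack memory isn't relied on. */
@@ -1466,6 +1715,7 @@ static u64 func_spill_store_count(Func* f) {
 static void w_func_end(CGTarget* t) {
   OptImpl* o = impl_of(t);
   if (!o->f) return;
+  opt_frame_home_addr_taken_locals(o->f);
 
   if (o->level == 1) {
     metrics_scope_begin(o->c, "opt.o1.total");
@@ -1603,6 +1853,8 @@ CGTarget* opt_cgtarget_new(Compiler* c, CGTarget* target, int level) {
   t->func_end = w_func_end;
 
   t->frame_slot = w_frame_slot;
+  t->local = w_local;
+  t->local_addr = w_local_addr;
   t->param = w_param;
   t->spill_reg = w_spill_reg;
   t->reload_reg = w_reload_reg;
diff --git a/test/api/cg_type_test.c b/test/api/cg_type_test.c
@@ -130,6 +130,135 @@ static void exercise_cg_handles(CfreeCompiler* c, CfreeCgTypeId i32_ty,
   obj_free((ObjBuilder*)ob);
 }
 
+static void exercise_cg_scalar_local(CfreeCompiler* c, CfreeCgTypeId i32_ty,
+                                     int opt_level) {
+  char name_buf[40];
+  CfreeCompileOptions opts;
+  CfreeObjBuilder* ob;
+  CfreeCg* cg;
+  CfreeCgFuncSig sig;
+  CfreeCgDecl decl;
+  CfreeCgSym sym;
+  CfreeCgLocalAttrs attrs;
+  CfreeCgLocal local;
+  CfreeCgMemAccess mem;
+
+  memset(&opts, 0, sizeof opts);
+  opts.opt_level = opt_level;
+  ob = (CfreeObjBuilder*)obj_new((Compiler*)c);
+  EXPECT(ob != NULL, "obj builder allocation failed");
+  if (!ob) return;
+  cg = cfree_cg_new(c, ob, &opts);
+  EXPECT(cg != NULL, "cg allocation failed");
+  if (!cg) {
+    obj_free((ObjBuilder*)ob);
+    return;
+  }
+
+  memset(&sig, 0, sizeof sig);
+  sig.ret = i32_ty;
+  sig.call_conv = CFREE_CG_CC_TARGET_C;
+
+  snprintf(name_buf, sizeof name_buf, "cg_scalar_local_o%d", opt_level);
+  memset(&decl, 0, sizeof decl);
+  decl.kind = CFREE_CG_DECL_FUNC;
+  decl.linkage_name = cfree_sym_intern(c, name_buf);
+  decl.display_name = decl.linkage_name;
+  decl.type = cfree_cg_type_func(c, sig);
+  decl.sym.bind = CFREE_SB_GLOBAL;
+  decl.sym.visibility = CFREE_CG_VIS_DEFAULT;
+  sym = cfree_cg_decl(cg, decl);
+  EXPECT(sym != CFREE_CG_SYM_NONE, "scalar local decl failed");
+
+  cfree_cg_func_begin(cg, sym);
+  memset(&attrs, 0, sizeof attrs);
+  attrs.name = cfree_sym_intern(c, "x");
+  local = cfree_cg_local(cg, i32_ty, attrs);
+  EXPECT(local != CFREE_CG_LOCAL_NONE, "scalar local handle is none");
+
+  memset(&mem, 0, sizeof mem);
+  mem.type = i32_ty;
+  mem.align = cfree_cg_type_align(c, i32_ty);
+
+  cfree_cg_push_local(cg, local);
+  cfree_cg_push_int(cg, 40, i32_ty);
+  cfree_cg_store(cg, mem);
+  cfree_cg_push_local(cg, local);
+  cfree_cg_load(cg, mem);
+  cfree_cg_push_int(cg, 2, i32_ty);
+  cfree_cg_int_binop(cg, CFREE_CG_INT_ADD, 0);
+  cfree_cg_ret(cg);
+  cfree_cg_func_end(cg);
+
+  cfree_cg_free(cg);
+  obj_free((ObjBuilder*)ob);
+}
+
+static void exercise_cg_late_local_addr(CfreeCompiler* c, CfreeCgTypeId i32_ty,
+                                        int opt_level) {
+  char name_buf[40];
+  CfreeCompileOptions opts;
+  CfreeObjBuilder* ob;
+  CfreeCg* cg;
+  CfreeCgFuncSig sig;
+  CfreeCgDecl decl;
+  CfreeCgSym sym;
+  CfreeCgLocalAttrs attrs;
+  CfreeCgLocal local;
+  CfreeCgMemAccess mem;
+
+  memset(&opts, 0, sizeof opts);
+  opts.opt_level = opt_level;
+  ob = (CfreeObjBuilder*)obj_new((Compiler*)c);
+  EXPECT(ob != NULL, "obj builder allocation failed");
+  if (!ob) return;
+  cg = cfree_cg_new(c, ob, &opts);
+  EXPECT(cg != NULL, "cg allocation failed");
+  if (!cg) {
+    obj_free((ObjBuilder*)ob);
+    return;
+  }
+
+  memset(&sig, 0, sizeof sig);
+  sig.ret = i32_ty;
+  sig.call_conv = CFREE_CG_CC_TARGET_C;
+
+  snprintf(name_buf, sizeof name_buf, "cg_late_local_addr_o%d", opt_level);
+  memset(&decl, 0, sizeof decl);
+  decl.kind = CFREE_CG_DECL_FUNC;
+  decl.linkage_name = cfree_sym_intern(c, name_buf);
+  decl.display_name = decl.linkage_name;
+  decl.type = cfree_cg_type_func(c, sig);
+  decl.sym.bind = CFREE_SB_GLOBAL;
+  decl.sym.visibility = CFREE_CG_VIS_DEFAULT;
+  sym = cfree_cg_decl(cg, decl);
+  EXPECT(sym != CFREE_CG_SYM_NONE, "late local addr decl failed");
+
+  cfree_cg_func_begin(cg, sym);
+  memset(&attrs, 0, sizeof attrs);
+  attrs.name = cfree_sym_intern(c, "x");
+  local = cfree_cg_local(cg, i32_ty, attrs);
+  EXPECT(local != CFREE_CG_LOCAL_NONE, "late addr local handle is none");
+
+  memset(&mem, 0, sizeof mem);
+  mem.type = i32_ty;
+  mem.align = cfree_cg_type_align(c, i32_ty);
+
+  cfree_cg_push_local(cg, local);
+  cfree_cg_push_int(cg, 41, i32_ty);
+  cfree_cg_store(cg, mem);
+  cfree_cg_push_local_addr(cg, local);
+  cfree_cg_indirect(cg);
+  cfree_cg_load(cg, mem);
+  cfree_cg_push_int(cg, 1, i32_ty);
+  cfree_cg_int_binop(cg, CFREE_CG_INT_ADD, 0);
+  cfree_cg_ret(cg);
+  cfree_cg_func_end(cg);
+
+  cfree_cg_free(cg);
+  obj_free((ObjBuilder*)ob);
+}
+
 int main(void) {
   CfreeTarget target;
   CfreeEnv env;
@@ -284,6 +413,10 @@ int main(void) {
 
   exercise_cg_handles(c, i32_ty, 0);
   exercise_cg_handles(c, i32_ty, 1);
+  exercise_cg_scalar_local(c, i32_ty, 0);
+  exercise_cg_scalar_local(c, i32_ty, 1);
+  exercise_cg_late_local_addr(c, i32_ty, 0);
+  exercise_cg_late_local_addr(c, i32_ty, 1);
 
   cfree_compiler_free(c);
   return g_fail ? 1 : 0;
diff --git a/test/opt/opt_test.c b/test/opt/opt_test.c
@@ -132,6 +132,17 @@ static Operand op_local_(FrameSlot fs, CfreeCgTypeId ty) {
   return o;
 }
 
+static Operand op_indirect_(Reg base, CfreeCgTypeId ty) {
+  Operand o;
+  memset(&o, 0, sizeof o);
+  o.kind = OPK_INDIRECT;
+  o.cls = RC_INT;
+  o.type = ty;
+  o.v.ind.base = base;
+  o.v.ind.ofs = 0;
+  return o;
+}
+
 static MemAccess mem_local_(FrameSlot fs, CfreeCgTypeId ty, u32 size,
                             u16 flags) {
   MemAccess m;
@@ -145,6 +156,16 @@ static MemAccess mem_local_(FrameSlot fs, CfreeCgTypeId ty, u32 size,
   return m;
 }
 
+static MemAccess mem_unknown_(CfreeCgTypeId ty, u32 size) {
+  MemAccess m;
+  memset(&m, 0, sizeof m);
+  m.type = ty;
+  m.size = size;
+  m.align = size >= 8 ?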
cfree_sym_intern(c, "x"); + local = cfree_cg_local(cg, i32_ty, attrs); + EXPECT(local != CFREE_CG_LOCAL_NONE, "scalar local handle is none"); + + memset(&mem, 0, sizeof mem); + mem.type = i32_ty; + mem.align = cfree_cg_type_align(c, i32_ty); + + cfree_cg_push_local(cg, local); + cfree_cg_push_int(cg, 40, i32_ty); + cfree_cg_store(cg, mem); + cfree_cg_push_local(cg, local); + cfree_cg_load(cg, mem); + cfree_cg_push_int(cg, 2, i32_ty); + cfree_cg_int_binop(cg, CFREE_CG_INT_ADD, 0); + cfree_cg_ret(cg); + cfree_cg_func_end(cg); + + cfree_cg_free(cg); + obj_free((ObjBuilder*)ob); +} + +static void exercise_cg_late_local_addr(CfreeCompiler* c, CfreeCgTypeId i32_ty, + int opt_level) { + char name_buf[40]; + CfreeCompileOptions opts; + CfreeObjBuilder* ob; + CfreeCg* cg; + CfreeCgFuncSig sig; + CfreeCgDecl decl; + CfreeCgSym sym; + CfreeCgLocalAttrs attrs; + CfreeCgLocal local; + CfreeCgMemAccess mem; + + memset(&opts, 0, sizeof opts); + opts.opt_level = opt_level; + ob = (CfreeObjBuilder*)obj_new((Compiler*)c); + EXPECT(ob != NULL, "obj builder allocation failed"); + if (!ob) return; + cg = cfree_cg_new(c, ob, &opts); + EXPECT(cg != NULL, "cg allocation failed"); + if (!cg) { + obj_free((ObjBuilder*)ob); + return; + } + + memset(&sig, 0, sizeof sig); + sig.ret = i32_ty; + sig.call_conv = CFREE_CG_CC_TARGET_C; + + snprintf(name_buf, sizeof name_buf, "cg_late_local_addr_o%d", opt_level); + memset(&decl, 0, sizeof decl); + decl.kind = CFREE_CG_DECL_FUNC; + decl.linkage_name = cfree_sym_intern(c, name_buf); + decl.display_name = decl.linkage_name; + decl.type = cfree_cg_type_func(c, sig); + decl.sym.bind = CFREE_SB_GLOBAL; + decl.sym.visibility = CFREE_CG_VIS_DEFAULT; + sym = cfree_cg_decl(cg, decl); + EXPECT(sym != CFREE_CG_SYM_NONE, "late local addr decl failed"); + + cfree_cg_func_begin(cg, sym); + memset(&attrs, 0, sizeof attrs); + attrs.name = cfree_sym_intern(c, "x"); + local = cfree_cg_local(cg, i32_ty, attrs); + EXPECT(local != CFREE_CG_LOCAL_NONE, "late addr local 
handle is none"); + + memset(&mem, 0, sizeof mem); + mem.type = i32_ty; + mem.align = cfree_cg_type_align(c, i32_ty); + + cfree_cg_push_local(cg, local); + cfree_cg_push_int(cg, 41, i32_ty); + cfree_cg_store(cg, mem); + cfree_cg_push_local_addr(cg, local); + cfree_cg_indirect(cg); + cfree_cg_load(cg, mem); + cfree_cg_push_int(cg, 1, i32_ty); + cfree_cg_int_binop(cg, CFREE_CG_INT_ADD, 0); + cfree_cg_ret(cg); + cfree_cg_func_end(cg); + + cfree_cg_free(cg); + obj_free((ObjBuilder*)ob); +} + int main(void) { CfreeTarget target; CfreeEnv env; @@ -284,6 +413,10 @@ int main(void) { exercise_cg_handles(c, i32_ty, 0); exercise_cg_handles(c, i32_ty, 1); + exercise_cg_scalar_local(c, i32_ty, 0); + exercise_cg_scalar_local(c, i32_ty, 1); + exercise_cg_late_local_addr(c, i32_ty, 0); + exercise_cg_late_local_addr(c, i32_ty, 1); cfree_compiler_free(c); return g_fail ? 1 : 0; diff --git a/test/opt/opt_test.c b/test/opt/opt_test.c @@ -132,6 +132,17 @@ static Operand op_local_(FrameSlot fs, CfreeCgTypeId ty) { return o; } +static Operand op_indirect_(Reg base, CfreeCgTypeId ty) { + Operand o; + memset(&o, 0, sizeof o); + o.kind = OPK_INDIRECT; + o.cls = RC_INT; + o.type = ty; + o.v.ind.base = base; + o.v.ind.ofs = 0; + return o; +} + static MemAccess mem_local_(FrameSlot fs, CfreeCgTypeId ty, u32 size, u16 flags) { MemAccess m; @@ -145,6 +156,16 @@ static MemAccess mem_local_(FrameSlot fs, CfreeCgTypeId ty, u32 size, return m; } +static MemAccess mem_unknown_(CfreeCgTypeId ty, u32 size) { + MemAccess m; + memset(&m, 0, sizeof m); + m.type = ty; + m.size = size; + m.align = size >= 8 ? 
8 : size; + m.alias.kind = ALIAS_UNKNOWN; + return m; +} + static Func* new_func(TestCtx* tc) { CGFuncDesc fd; CfreeCgFuncSig sig; @@ -375,6 +396,10 @@ typedef struct MockCGTarget { int load_const_calls; u8 last_const_bytes[16]; u32 last_const_size; + int copy_calls; + int load_calls; + int store_calls; + int addr_of_calls; int cmp_branch_calls; } MockCGTarget; @@ -458,6 +483,36 @@ static void mock_load_const(CGTarget* t, Operand dst, ConstBytes cb) { memcpy(m->last_const_bytes, cb.bytes, cb.size); } +static void mock_copy(CGTarget* t, Operand dst, Operand src) { + MockCGTarget* m = (MockCGTarget*)t; + (void)dst; + (void)src; + ++m->copy_calls; +} + +static void mock_load(CGTarget* t, Operand dst, Operand addr, MemAccess macc) { + MockCGTarget* m = (MockCGTarget*)t; + (void)dst; + (void)addr; + (void)macc; + ++m->load_calls; +} + +static void mock_store(CGTarget* t, Operand addr, Operand src, MemAccess macc) { + MockCGTarget* m = (MockCGTarget*)t; + (void)addr; + (void)src; + (void)macc; + ++m->store_calls; +} + +static void mock_addr_of(CGTarget* t, Operand dst, Operand lv) { + MockCGTarget* m = (MockCGTarget*)t; + (void)dst; + (void)lv; + ++m->addr_of_calls; +} + static void mock_ret(CGTarget* t, const CGABIValue* v) { (void)t; (void)v; @@ -511,6 +566,10 @@ static void mock_init(MockCGTarget* m, Compiler* c) { m->base.cmp_branch = mock_cmp_branch; m->base.load_imm = mock_load_imm; m->base.load_const = mock_load_const; + m->base.copy = mock_copy; + m->base.load = mock_load; + m->base.store = mock_store; + m->base.addr_of = mock_addr_of; m->base.ret = mock_ret; m->base.set_loc = mock_set_loc; m->base.get_allocable_regs = mock_get_allocable_regs; @@ -2132,6 +2191,124 @@ static void opt_cmp_branch_keeps_fallthrough_after_block_growth(void) { tc_fini(&tc); } +static void begin_mock_opt_func(TestCtx* tc, CGTarget* opt, + CfreeCgTypeId ret_ty) { + CGFuncDesc fd; + CfreeCgFuncSig sig; + memset(&fd, 0, sizeof fd); + memset(&sig, 0, sizeof sig); + sig.ret = ret_ty; + 
sig.call_conv = CFREE_CG_CC_TARGET_C; + fd.fn_type = cfree_cg_type_func(tc->c, sig); + opt->func_begin(opt, &fd); +} + +static CGLocalDesc local_desc_(CfreeCgTypeId ty, u32 size, u32 align, + u32 flags) { + CGLocalDesc d; + memset(&d, 0, sizeof d); + d.type = ty; + d.size = size; + d.align = align; + d.flags = flags; + return d; +} + +static void opt_local_hook_chooses_register_for_scalar(void) { + TestCtx tc; + tc_init(&tc); + MockCGTarget mock; + mock_init(&mock, tc.c); + + CGTarget* opt = opt_cgtarget_new(tc.c, &mock.base, 1); + begin_mock_opt_func(&tc, opt, tc.i32); + + CGLocalDesc d = local_desc_(tc.i32, 4, 4, 0); + CGLocalStorage st = opt->local(opt, &d); + EXPECT(st.kind == CG_LOCAL_STORAGE_REG, + "non-address-taken scalar local should be register-backed"); + EXPECT(st.v.reg != (Reg)REG_NONE, "register-backed local needs a vreg"); + + opt->destroy(opt); + tc_fini(&tc); +} + +static void opt_local_addr_taken_uses_frame_and_replays_addr_of(void) { + TestCtx tc; + tc_init(&tc); + MockCGTarget mock; + mock_init(&mock, tc.c); + static const Reg pool[] = {19}; + static const Reg scratch[] = {9, 10}; + mock_set_pool(&mock, RC_INT, pool, 1, scratch, 2, 0x4007FFFFu); + + CGTarget* opt = opt_cgtarget_new(tc.c, &mock.base, 1); + CfreeCgTypeId ptr_ty = cfree_cg_type_ptr(tc.c, tc.i32, 0); + begin_mock_opt_func(&tc, opt, ptr_ty); + + CGLocalDesc d = + local_desc_(tc.i32, 4, 4, CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED); + CGLocalStorage st = opt->local(opt, &d); + EXPECT(st.kind == CG_LOCAL_STORAGE_FRAME, + "address-taken local should be frame-backed"); + + Operand addr = op_reg_(1, ptr_ty); + opt->local_addr(opt, addr, &d, st); + CGABIValue retv = {0}; + retv.type = ptr_ty; + retv.storage = addr; + opt->ret(opt, &retv); + opt->func_end(opt); + + EXPECT(mock.addr_of_calls == 1, + "frame-backed local address should replay as addr_of"); + + opt->destroy(opt); + tc_fini(&tc); +} + +static void opt_register_local_addr_frame_homes(void) { + TestCtx tc; + tc_init(&tc); + 
MockCGTarget mock; + mock_init(&mock, tc.c); + static const Reg pool[] = {19, 20}; + static const Reg scratch[] = {9, 10}; + mock_set_pool(&mock, RC_INT, pool, 2, scratch, 2, 0x4007FFFFu); + + CGTarget* opt = opt_cgtarget_new(tc.c, &mock.base, 1); + CfreeCgTypeId ptr_ty = cfree_cg_type_ptr(tc.c, tc.i32, 0); + begin_mock_opt_func(&tc, opt, ptr_ty); + + CGLocalDesc desc = local_desc_(tc.i32, 4, 4, 0); + CGLocalStorage storage = opt->local(opt, &desc); + EXPECT(storage.kind == CG_LOCAL_STORAGE_REG, + "frame-home test needs register-backed local"); + + Operand local = op_reg_(storage.v.reg, tc.i32); + opt->load_imm(opt, local, 42); + Operand addr = op_reg_(2, ptr_ty); + opt->local_addr(opt, addr, &desc, storage); + opt->store(opt, op_indirect_(addr.v.reg, tc.i32), local, + mem_unknown_(tc.i32, 4)); + + CGABIValue retv = {0}; + retv.type = ptr_ty; + retv.storage = addr; + opt->ret(opt, &retv); + opt->func_end(opt); + + EXPECT(mock.addr_of_calls == 1, + "register-backed local addrof should replay as addr_of"); + EXPECT(mock.store_calls >= 1, + "register-backed local addrof should frame-home prior stores"); + EXPECT(mock.load_calls >= 1, + "register-backed local use after addrof should reload from home"); + + opt->destroy(opt); + tc_fini(&tc); +} + static void simple_regalloc_reports_exact_used_regs(void) { CGSimpleRegAlloc a; static const Reg regs[] = {3, 7, 11}; @@ -2192,6 +2369,9 @@ int main(void) { opt_emit_no_virtual_alloc(); opt_records_const_bytes_by_value(); opt_cmp_branch_keeps_fallthrough_after_block_growth(); + opt_local_hook_chooses_register_for_scalar(); + opt_local_addr_taken_uses_frame_and_replays_addr_of(); + opt_register_local_addr_frame_homes(); simple_regalloc_reports_exact_used_regs(); if (g_fails) { fprintf(stderr, "opt tests: %d failed (%d checks)\n", g_fails, g_checks); diff --git a/test/opt/phase0_guardrails.sh b/test/opt/phase0_guardrails.sh @@ -34,6 +34,31 @@ int main() { SRC } +write_scalar_local() { + cat >"$TMP/scalar_local.c" <<'SRC' +int 
main() { + int x = 40; + x = x + 2; + return x == 42 ? 0 : 1; +} +SRC +} + +write_late_addrof_join() { + cat >"$TMP/late_addrof_join.c" <<'SRC' +int main() { + int x = 5; + if (x == 5) + x = 40; + else + x = 1; + int *p = &x; + *p = *p + 2; + return x == 42 ? 0 : 1; +} +SRC +} + write_spills() { { printf 'int main() {\n' @@ -130,12 +155,16 @@ check_metrics() { write_branch_liveness write_call_clobber +write_scalar_local +write_late_addrof_join write_spills write_many_small_functions write_large_straight_line run_case branch_liveness "$TMP/branch_liveness.c" run_case call_clobber "$TMP/call_clobber.c" +run_case scalar_local "$TMP/scalar_local.c" +run_case late_addrof_join "$TMP/late_addrof_join.c" run_case spills "$TMP/spills.c" run_case many_small_functions "$TMP/many_small_functions.c" run_case large_straight_line "$TMP/large_straight_line.c" diff --git a/test/parse/cases/opt_01_scalar_local.c b/test/parse/cases/opt_01_scalar_local.c @@ -0,0 +1,5 @@ +int test_main(void) { + int x = 40; + x = x + 2; + return x; +} diff --git a/test/parse/cases/opt_01_scalar_local.expected b/test/parse/cases/opt_01_scalar_local.expected @@ -0,0 +1 @@ +42 diff --git a/test/parse/cases/opt_02_late_addrof_join.c b/test/parse/cases/opt_02_late_addrof_join.c @@ -0,0 +1,10 @@ +int test_main(void) { + int x = 5; + if (x == 5) + x = 40; + else + x = 1; + int *p = &x; + *p = *p + 2; + return x; +} diff --git a/test/parse/cases/opt_02_late_addrof_join.expected b/test/parse/cases/opt_02_late_addrof_join.expected @@ -0,0 +1 @@ +42 diff --git a/test/parse/run.sh b/test/parse/run.sh @@ -27,6 +27,11 @@ # paths subset of "DREJ" (default "DREJ") # Equivalent env vars: CFREE_TEST_FILTER, CFREE_TEST_PATHS. # +# Optimization levels: +# CFREE_OPT_LEVELS="0 1" whitespace-separated levels to test. +# CFREE_OPT_LEVEL=1 compatibility shorthand for one level. +# Default is "0 1". +# # Parallelism: # default run in parallel with a capped CPU-count default. # CFREE_TEST_JOBS=N run up to N cases concurrently. 
@@ -68,6 +73,19 @@ ALLOW_SKIP="${CFREE_TEST_ALLOW_SKIP:-0}"
 FILTER="${1:-${CFREE_TEST_FILTER:-}}"
 PATHS="${2:-${CFREE_TEST_PATHS:-DREJ}}"
+if [ -n "${CFREE_OPT_LEVELS:-}" ]; then
+  OPT_LEVELS="$CFREE_OPT_LEVELS"
+elif [ -n "${CFREE_OPT_LEVEL:-}" ]; then
+  OPT_LEVELS="$CFREE_OPT_LEVEL"
+else
+  OPT_LEVELS="0 1"
+fi
+for opt in $OPT_LEVELS; do
+  case "$opt" in
+    0|1|2) ;;
+    *) printf 'parse: invalid opt level %s in CFREE_OPT_LEVELS\n' "$opt" >&2; exit 2 ;;
+  esac
+done
 case "$PATHS" in *D*) RUN_D=1;; *) RUN_D=0;; esac
 case "$PATHS" in *R*) RUN_R=1;; *) RUN_R=0;; esac
 case "$PATHS" in *E*) RUN_E=1;; *) RUN_E=0;; esac
@@ -300,7 +318,7 @@ if [ $have_clang_cross -eq 1 ]; then
   fi
 fi
 
-printf 'Running cases (%s jobs)...\n' "$TEST_JOBS"
+printf 'Running cases (%s jobs, opt levels: %s)...\n' "$TEST_JOBS" "$OPT_LEVELS"
 
 # ---- per-case loop ---------------------------------------------------------
 
@@ -320,30 +338,35 @@ FILTERED_CASES=()
 for src in "${CASES[@]}"; do
   name="$(basename "$src" .c)"
   [ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && continue
-  FILTERED_CASES+=("$src")
+  for opt in $OPT_LEVELS; do
+    FILTERED_CASES+=("$opt:$src")
+  done
 done
 
 run_parse_case() {
-  local _idx="$1" src="$2" event="$3"
-  local name work reason expected expected_byte obj t0 dt d_rc r_ok r_msg rt
+  local _idx="$1" item="$2" event="$3"
+  local opt src base_name name work reason expected expected_byte obj t0 dt d_rc r_ok r_msg rt
   local exe link_dt j_rc
   : "$_idx"
-  name="$(basename "$src" .c)"
-  work="$BUILD_DIR/parse/$name"
+  opt="${item%%:*}"
+  src="${item#*:}"
+  base_name="$(basename "$src" .c)"
+  name="$base_name/O$opt"
+  work="$BUILD_DIR/parse/$base_name.O$opt"
   mkdir -p "$work"
 
   # Skip sidecar
-  if [ -e "$TEST_DIR/cases/$name.skip" ]; then
-    reason=$(head -n1 "$TEST_DIR/cases/$name.skip")
+  if [ -e "$TEST_DIR/cases/$base_name.skip" ]; then
+    reason=$(head -n1 "$TEST_DIR/cases/$base_name.skip")
     emit_event "$event" SKIP "$name" "$reason"
     return 0
   fi
 
   # Expected exit code (default 0)
   expected=0
-  if [ -e "$TEST_DIR/cases/$name.expected" ]; then
-    expected=$(head -n1 "$TEST_DIR/cases/$name.expected")
+  if [ -e "$TEST_DIR/cases/$base_name.expected" ]; then
+    expected=$(head -n1 "$TEST_DIR/cases/$base_name.expected")
   fi
   expected_byte=$(( expected & 0xff ))
 
@@ -351,7 +374,8 @@ run_parse_case() {
   if [ $RUN_D -eq 1 ]; then
     if [ $is_native_target -eq 1 ]; then
       t0=$(now_ms)
-      "$PARSE_RUNNER" --jit "$src" >"$work/d.out" 2>"$work/d.err"
+      CFREE_OPT_LEVEL="$opt" "$PARSE_RUNNER" --jit "$src" \
+        >"$work/d.out" 2>"$work/d.err"
       d_rc=$?
       dt=$(( $(now_ms) - t0 ))
       emit_event "$event" TIME D "$dt"
@@ -366,9 +390,10 @@ run_parse_case() {
   fi
 
   # ---- emit (needed by R/E/J) ------------------------------------------
-  obj="$work/$name.o"
+  obj="$work/$base_name.o"
   if [ $RUN_R -eq 1 ] || [ $RUN_E -eq 1 ] || [ $RUN_J -eq 1 ]; then
-    if ! "$PARSE_RUNNER" --emit "$src" "$obj" 2>"$work/emit.err"; then
+    if ! CFREE_OPT_LEVEL="$opt" "$PARSE_RUNNER" --emit "$src" "$obj" \
+      2>"$work/emit.err"; then
       emit_event "$event" FAIL "$name/emit (parse-runner --emit failed; see $work/emit.err)"
       return 0
     fi
@@ -378,7 +403,7 @@ run_parse_case() {
   if [ $RUN_R -eq 1 ]; then
     if [ $have_roundtrip -eq 1 ] && [ $have_readelf -eq 1 ] && [ $have_python3 -eq 1 ]; then
       t0=$(now_ms)
-      rt="$work/$name.rt.o"
+      rt="$work/$base_name.rt.o"
       r_ok=1; r_msg=""
       if ! "$ROUNDTRIP_BIN" "$obj" "$rt" 2>"$work/rt.err"; then
         r_ok=0; r_msg=" (roundtrip failed)"