kit

git clone https://git.ryansepassi.com/git/kit.git

commit e3cc40502d041ed5d803683ac0afc4edf7b641ef
parent c1a6bf61d4e24de878d42dd717f69cc7589de591
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri, 15 May 2026 14:38:51 -0700

Complete O1 constfold cleanup

Diffstat:
M doc/CONSTFOLD.md        |  54 ++++++++++++++++++++++++++++++------------------------
M doc/OPT1.md             | 169 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------
M doc/PERF.md             |  37 +++++++++++++++++++++++++++++++++++++
M src/api/cg.c            | 384 ++++++++++++++++++++++++++++++++++++++++++++-----------------------------------
M test/api/cg_type_test.c |  85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 485 insertions(+), 244 deletions(-)
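
Note for readers of the diff below: the doc changes describe straight-line local constant shadowing with conservative boundary invalidation, now routed through named memory, control, and address-taking helpers. The following is only a minimal sketch of that idea under stated assumptions. Every `Sketch*` and `sketch_*` name is hypothetical, locals are plain array indices, and each boundary simply drops all shadows, which mirrors the helper bodies visible in this commit's `src/api/cg.c` hunk. It is not the code in `src/api/cg.c`.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_LOCALS 8

/* Hypothetical shadow record: is this local's current value a known constant? */
typedef struct SketchLocal {
  int known;
  int64_t value;
} SketchLocal;

typedef struct SketchCg {
  SketchLocal locals[SKETCH_MAX_LOCALS];
} SketchCg;

void sketch_clear(SketchLocal *l) {
  l->known = 0;
  l->value = 0;
}

void sketch_clear_all(SketchCg *g) {
  for (int i = 0; i < SKETCH_MAX_LOCALS; i++)
    sketch_clear(&g->locals[i]);
}

/* Named boundaries. In this sketch each one conservatively drops every shadow. */
void sketch_memory_boundary(SketchCg *g) { sketch_clear_all(g); }
void sketch_control_boundary(SketchCg *g) { sketch_clear_all(g); }
void sketch_address_taken(SketchCg *g, int local) {
  sketch_clear_all(g);
  if (local >= 0 && local < SKETCH_MAX_LOCALS)
    sketch_clear(&g->locals[local]);
}

/* A constant store to a local records the shadow value. */
void sketch_store_const(SketchCg *g, int local, int64_t v) {
  if (local < 0 || local >= SKETCH_MAX_LOCALS)
    return;
  g->locals[local].known = 1;
  g->locals[local].value = v;
}

/* Returns 1 and writes *out when the local's value is a known constant. */
int sketch_load_const(const SketchCg *g, int local, int64_t *out) {
  if (local < 0 || local >= SKETCH_MAX_LOCALS || !g->locals[local].known)
    return 0;
  *out = g->locals[local].value;
  return 1;
}

/* Walks `int x = 40; x += 2; return x;` and returns the folded constant. */
int64_t sketch_demo(void) {
  SketchCg g;
  int64_t v = 0;
  sketch_clear_all(&g);
  sketch_store_const(&g, 0, 40);        /* int x = 40; */
  assert(sketch_load_const(&g, 0, &v)); /* x is a known constant */
  sketch_store_const(&g, 0, v + 2);     /* x += 2, folded at build time */
  assert(sketch_load_const(&g, 0, &v));
  return v;
}
```

Naming the boundaries separately, even while they all clear everything, keeps each call site's intent visible and leaves room to narrow one boundary later without touching the others.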

diff --git a/doc/CONSTFOLD.md b/doc/CONSTFOLD.md @@ -25,12 +25,18 @@ Implemented: - delayed `SV_ARITH` for unary and binary arithmetic; - expression-local arithmetic chain folding; - straight-line local constant shadowing with conservative boundary - invalidation. + invalidation; +- disassembly and metrics refresh for the phase 5 probes; +- `ApiSValue` delayed compare/arithmetic payloads are now unioned; +- local-shadow invalidation is routed through named memory, control, and + address-taking boundary helpers; +- delayed arithmetic has a direct register-pressure materialization regression + test. Remaining: -- disassembly and metrics updates after the remaining phases land; -- cleanup/refactor work listed in "Future Refactors". +- consider a shadow generation counter if repeated `clear_all` scans show up in + future O1 metrics. ## Current Shape @@ -416,8 +422,7 @@ Tests cover: - delayed arithmetic consumed by another binop with immediate folding; - delayed unary arithmetic consumed by another unop with chain folding; - delayed arithmetic forced by store path; -- register pressure path is still best covered indirectly by `test-opt`; add a - direct CG API pressure case if a future regression appears. +- register-pressure materialization when no ordinary spill victim exists. Run: @@ -433,6 +438,9 @@ Done. Added local shadow state to `ApiSourceLocal` and invalidation helpers: ```c static void api_local_const_clear(ApiSourceLocal*); static void api_local_const_clear_all(CfreeCg*); +static void api_local_const_memory_boundary(CfreeCg*); +static void api_local_const_control_boundary(CfreeCg*); +static void api_local_const_address_taken(CfreeCg*, CfreeCgLocal); static int api_local_const_can_track(CfreeCg*, const ApiSourceLocal*, CfreeCgMemAccess); static void api_local_const_store(CfreeCg*, CfreeCgLocal, CfreeCgTypeId, i64); @@ -485,19 +493,18 @@ and an RV64 targeted case if available locally. 
### Future Refactors -These are not required for the phase 3/4 patch, but should be considered before -growing the vstack simplifier further: - -- Move `SV_CMP` and `SV_ARITH` payload fields into a small union once the shape - stabilizes. The current flat struct keeps the first implementation simple but - increases every vstack entry. -- Centralize local-shadow invalidation behind named boundary helpers such as - `api_memory_boundary`, `api_control_boundary`, and - `api_local_address_taken`. The first implementation clears at call sites so - correctness is visible, but the repeated calls are easy to miss when adding - new CG API operations. -- Add an explicit CG API register-pressure test that forces delayed arithmetic - materialization when no ordinary spill victim exists. +Completed cleanup from the original phase 3/4 implementation: + +- `SV_CMP` and `SV_ARITH` payload fields now live in an `ApiSValue` union. +- Local-shadow invalidation now goes through named boundary helpers: + `api_local_const_memory_boundary`, `api_local_const_control_boundary`, and + `api_local_const_address_taken`. +- `test/api/cg_type_test.c` includes an explicit delayed-arithmetic + register-pressure case that forces materialization when no ordinary spill + victim exists. + +Still worth considering before growing the vstack simplifier further: + - Consider a shadow generation counter if whole-function local counts grow enough for repeated `clear_all` scans to show up in O1 metrics. - Consider one small builder helper in `test/api/cg_type_test.c` for creating @@ -506,26 +513,25 @@ growing the vstack simplifier further: ### Phase 5: Disassembly And Metrics -Re-run the probe cases from `doc/OPT1.md`: +Done. Re-ran the probe cases from `doc/OPT1.md`: - `6_5_17_compound_assign`; - a direct literal return case; - a compare/branch literal case; - one local-address-taken case. 
-For each of AArch64, x64, and RV64, compare: +For each of AArch64, x64, and RV64, recorded: - `.text` size; - instruction count; - `mov`/`load_imm` count; - arithmetic instruction count; -- O1 wall time; -- live-range time; -- regalloc time; - spills/reloads; - rewrite inserted instruction count. -Update `doc/PERF.md` only after the implementation lands and numbers are real. +Host JIT `run --time` samples recorded O1 wall time, live-range time, +regalloc time, spills/reloads, and rewrite inserted instruction count. The +current numbers are in `doc/PERF.md` under "Vstack Constfold Probe". ## Expected Results diff --git a/doc/OPT1.md b/doc/OPT1.md @@ -179,66 +179,126 @@ make test-smoke-x64 - rewrite reloads/stores/inserted instructions/live traffic; - link/JIT and compile frontend scopes. -## Current Disassembly Observations +## Current State -A May 2026 probe compiled existing parse corpus cases at `-O0` and `-O1`, -then disassembled with `llvm-objdump -dr`. The cases covered straight-line -scalar arithmetic, a direct call, a while loop, and a backward-goto loop: +Fresh May 2026 scaling probes used the same direct-call and function-table +families tracked in `doc/PERF.md`, with three samples per point and p50 times +below. Both normal-path ladders are now essentially linear through 1024 +generated functions. The old dense conflict matrix is absent from the normal +path (`opt.conflict_bytes=0`), and the live-range and allocator buckets double +with the input. 
-| case | AArch64 O0 -> O1 | x64 O0 -> O1 | RV64 O0 -> O1 |
-| ---- | ---------------- | ------------ | ------------- |
-| `6_5_17_compound_assign` | 40 -> 18 insns, 160 -> 72 bytes | 93 -> 22 insns, 158 -> 96 bytes | 53 -> 28 insns, 212 -> 112 bytes |
-| `6_5_24_func_call` | 70 -> 26 insns, 276 -> 100 bytes | 194 -> 34 insns, 281 -> 131 bytes | 97 -> 47 insns, 384 -> 184 bytes |
-| `6_8_02_while_sum` | 52 -> 26 insns, 208 -> 104 bytes | 100 -> 29 insns, 210 -> 127 bytes | 67 -> 38 insns, 268 -> 152 bytes |
-| `6_8_11_goto_backward` | 46 -> 20 insns, 184 -> 80 bytes | 93 -> 24 insns, 186 -> 106 bytes | 61 -> 32 insns, 244 -> 128 bytes |
+```text
+straight direct-call ladder
+N     run.total  opt.o1.total  live_reg  regalloc
+64       18.155        14.826     3.899     8.056
+128      35.244        29.371     7.727    15.970
+256      70.254        58.873    15.765    32.168
+512     144.874       120.613    30.894    64.842
+1024    283.198       233.653    61.392   126.779
+
+function-table ladder
+N     run.total  opt.o1.total  live_reg  regalloc
+64       17.536        14.571     3.822     7.915
+128      34.270        28.855     7.594    15.685
+256      68.305        57.831    15.183    31.413
+512     135.433       114.115    30.025    62.018
+1024    277.842       230.563    60.167   124.895
+```
+
+A nonconstant wide-local spill probe also scales linearly in spill/reload and
+rewrite counts, though regalloc bends slightly above 2x under heavy spill
+pressure:
+
+```text
+N    opt.o1  live_reg  regalloc  spills  reloads  inserted
+128   1.541     0.398     0.918     120      120       240
+256   3.230     0.778     2.048     248      248       496
+512   6.991     1.474     4.924     504      504      1008
+```
+
+The same spill family exposed a correctness risk in the JIT/run path: some
+large spill-heavy generated functions returned correctly while nearby sizes
+segfaulted during execution (`N=256` and `N=512` in the focused probe). Treat
+that as a codegen/runtime correctness bug before using the spill ladder as a
+pure performance benchmark.
+ +Current code-shape probes compiled `identity_param`, `scalar_add`, +`while_sum`, `simple_branch`, `direct_call`, `const_local`, and +`local_addr_taken` across x64, AArch64, and RV64 with `-O1`, then disassembled +with `llvm-objdump -dr`. Observed progress: -- O1 is materially smaller across all three probed targets. +- O1 is materially smaller than O0 across all three supported native backends + in the older corpus probes. - The old prologue NOP sleds are gone on O1 for AArch64, x64, and RV64. -- Simple loop locals are now kept in registers in the AArch64, x64, and RV64 - loop probes instead of being reloaded from stack slots on each iteration. -- Branch-consuming compares now lower to direct compare branches in the probed - AArch64 cases. For example, the while-loop condition is: +- Simple loop locals are kept in registers in the loop probes instead of being + reloaded from stack slots on each iteration. +- Branch-consuming compares lower to direct compare branches. For example, + AArch64 while-loop conditions use `cmp w19, #0xa` plus `b.ge ...` rather + than a `cmp; cset; cmp #0; b.cond` bridge. +- Very local constant simplification handles the direct O1 probes covered by + `doc/CONSTFOLD.md`: immediate arithmetic, compare literals, delayed + arithmetic chains, and straight-line scalar-local shadows. The + `int x = 40; x += 2; return x;` probe now reduces to immediate return + materialization on AArch64: ```asm -cmp w19, #0xa -b.ge ... +mov w0, #0x2a +b ... ``` - The larger short-circuit control-flow probe likewise shows `cmp` plus direct - conditional branches and no `cset` bridge for branch consumers. +Remaining O1 shape issues visible in the current dumps: -Remaining O1 shape issues visible in the same dumps: +- Cheap branch layout cleanup is still missing. Every quality probe had at + least one unconditional branch to the immediately following block, including + trivial returns and shared return epilogues. 
+- Parameter and entry-slot promotion is incomplete. The trivial AArch64 + identity function still stores the incoming argument to a frame slot, reloads + it into `w19`, then copies it back to `w0`: + +```asm +stur w0, [x29, #-0x4] +ldur w19, [x29, #-0x4] +mov w0, w19 +``` -- Cheap branch layout cleanup is still missing. O1 frequently emits an - unconditional branch to the immediately following block, including before - loop headers and before shared return epilogues. - O1 still saves/restores more callee-saved registers than the body appears to - use in small functions. For example, the AArch64 while-loop probe saves - `x22` even though the body uses `x19-x21`; x64 and RV64 show similar - over-preservation. -- Post-RA copy cleanup still leaves avoidable moves such as `add tmp, ...` - followed by `mov live, tmp`. More two-address or destination-selection - folding would help on all targets. -- Parameter and entry-slot promotion is incomplete. The trivial identity - function at O1 still stores `w0` to a frame slot and reloads it before - returning on AArch64. -- Very local constant simplification is still absent in O1. The - `int x = 40; x += 2; return x;` probe is much smaller than O0, but still - emits immediate materialization plus an add instead of returning `42`. - -## Todo: Code Quality + need in small functions. The AArch64 while-loop probe saves `x19-x22`, and + the x64 direct-call probe saves `rbx/r12/r13/r14` in tiny functions. +- Post-RA copy cleanup still leaves avoidable moves such as: + +```asm +add w21, w20, w19 +mov w20, w21 +add w21, w19, #1 +mov w19, w21 +``` + +- Direct-call tiny functions are still heavy at O1. The x64 `callee(x) + 2` + probe emitted 167 bytes and 47 instructions across two small functions, + mostly frame setup, callee-save traffic, copies, and branch-to-epilogue + artifacts. O1 does not need general inlining, but call-side frame/save + discipline remains expensive. 
+ +## Todo MIR's O1 path suggests these high-value local cleanups that still fit cfree's fast tier: -1. Keep compare-branch fusion covered by tests. - The current probes show direct `cmp` plus branch shapes for branch-consuming - compares on AArch64. Add focused regression coverage so the old - `cmp; cset; cmp #0; b.cond` bridge does not return. +1. Reduce and fix the spill-heavy JIT/runtime crash. + The nonconstant wide-local spill probe returned correctly for many sizes but + segfaulted at nearby large sizes. Isolate this to a small parse or CG API + testcase before doing more spill-pressure perf work. -2. Promote remaining scalar entry slots before backend allocation. +2. Clean up local branch layout artifacts. + MIR's full jump optimizer is O2-only, but its cheap pieces are appropriate + for O1: delete branches to immediate fallthrough blocks, forward + branch-to-branch targets, and invert a branch when it removes an + unconditional jump. Avoid full CFG layout work. + +3. Promote remaining scalar entry slots before backend allocation. MIR's C frontend represents normal scalar block locals as MIR registers and leaves stack slots for aggregates, forced-stack cases, and address-taken values. O1 now keeps simple loop locals in registers in the probe, but still @@ -246,12 +306,6 @@ fast tier: pass should promote remaining integer/pointer scalars whose address does not escape, starting with parameters and single-entry structured control flow. -3. Clean up local branch layout artifacts. - MIR's full jump optimizer is O2-only, but its cheap pieces are appropriate - for O1: delete branches to immediate fallthrough blocks, forward - branch-to-branch targets, and invert a branch when it removes an - unconditional jump. Avoid full CFG layout work. - 4. Avoid unnecessary callee-save traffic. 
Reserve and preserve only hard registers that survive final post-rewrite cleanup, and consider caller-saved registers for values that are not live @@ -264,10 +318,21 @@ fast tier: traffic. Also fold single-use arithmetic temporaries into their destination when target constraints allow it. -6. Add tiny local constant simplification where it is cheap. - O1 should not grow full SSA value optimization, but folding immediate-only - straight-line arithmetic before allocation would remove obvious code in - small functions without pulling O2 machinery into the fast tier. +6. Keep compare-branch fusion covered by tests. + The current probes show direct `cmp` plus branch shapes for branch-consuming + compares on AArch64. Add focused regression coverage so the old + `cmp; cset; cmp #0; b.cond` bridge does not return. + +7. Keep tiny local constant simplification bounded. + The vstack constfold path now removes obvious immediate and straight-line + scalar-local code before allocation. Keep it basic-block-local; broader + propagation belongs in the O2 SSA/value optimizer. + +8. Watch spill-pressure regalloc slope. + Normal-path scaling is linear, but heavy spill pressure still bends slightly + in regalloc time. After the correctness crash is fixed, rerun the + nonconstant wide-local ladder and decide whether interval probing or stack + slot assignment needs another narrow cleanup. - Keep `opt_combine` legality target-aware. Existing one-use copy/immediate/convert folds should stay conservative. New diff --git a/doc/PERF.md b/doc/PERF.md @@ -803,6 +803,43 @@ O1 allocator cleanup item identified by the MIR-shape report; further allocator work should now be justified by generated-code quality, O2 coalescing, or live range splitting rather than by O1 compile-time scaling. +### Vstack Constfold Probe + +After `doc/CONSTFOLD.md` phase 5, the focused O1 probes were re-run with +`build/cfree cc -O1 -target ... -c` and disassembled with `llvm-objdump -dr`. 
+Instruction counts below include function prologue/epilogue traffic; the useful
+shape is that the direct literal, compound-assign, and literal compare/branch
+probes all reduce to immediate return materialization, while the address-taken
+probe still performs the load/add/store/reload sequence.
+
+```text
+probe                   arch  text  insn  mov  arithmetic
+compound_assign         x64     45    11    6           1
+compound_assign         aa64    32     8    2           2
+compound_assign         rv64    72    18    1           3
+literal_return          x64     45    11    6           1
+literal_return          aa64    32     8    2           2
+literal_return          rv64    72    18    1           3
+compare_branch_literal  x64     45    11    6           1
+compare_branch_literal  aa64    32     8    2           2
+compare_branch_literal  rv64    72    18    1           3
+local_addr_taken        x64    107    25   18           1
+local_addr_taken        aa64    88    22    3           5
+local_addr_taken        rv64   128    32    1           4
+```
+
+Host JIT timings were measured with seven `build/cfree run --time -O1 -e
+test_main` samples per probe; the table uses p50 milliseconds for time buckets
+and stable first-sample counters for rewrite activity.
+
+```text
+probe                   opt.o1  live_ranges  regalloc  spills  reloads  inserted
+compound_assign          0.285        0.071     0.153       0        0         0
+literal_return           0.289        0.072     0.155       0        0         0
+compare_branch_literal   0.351        0.088     0.188       0        0         0
+local_addr_taken         0.351        0.088     0.187       0        0         0
+```
+
 ## Performance Priorities
 
 1. Keep O1 on interval occupancy.
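
Note on the `src/api/cg.c` diff that follows: it moves the delayed compare and arithmetic payloads of `ApiSValue` into a union, on the grounds that the flat struct "increases every vstack entry". The sketch below gives a rough picture of why overlapping the two payloads shrinks an entry. It rests on assumptions: `SketchOperand` is a hypothetical stand-in (the real `Operand` layout is not shown in this commit), and the other `Sketch*` types only imitate the before and after shapes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for Operand; only its size matters here. */
typedef struct SketchOperand {
  uint64_t bits;
  uint32_t type;
  uint8_t kind;
  uint8_t cls;
} SketchOperand;

/* Flat shape: both delayed payloads are always present in every entry. */
typedef struct SketchFlat {
  SketchOperand op;
  SketchOperand cmp_a, cmp_b;
  SketchOperand arith_a, arith_b;
  uint8_t kind;
  uint8_t cmp_op;
  uint8_t arith_op;
} SketchFlat;

typedef struct SketchCmp {
  SketchOperand a, b;
  uint8_t op;
} SketchCmp;

typedef struct SketchArith {
  SketchOperand a, b;
  uint8_t op;
  uint8_t kind;
} SketchArith;

/* Unioned shape: `kind` says which member of `delayed` is live. */
typedef struct SketchUnioned {
  SketchOperand op;
  union {
    SketchCmp cmp;
    SketchArith arith;
  } delayed;
  uint8_t kind;
} SketchUnioned;

/* The two payloads overlap, so the unioned entry is strictly smaller. */
static_assert(sizeof(SketchUnioned) < sizeof(SketchFlat),
              "union layout should shrink the entry");

/* Bytes saved per entry in this sketch. */
size_t sketch_bytes_saved(void) {
  return sizeof(SketchFlat) - sizeof(SketchUnioned);
}
```

The actual saving in `ApiSValue` depends on the real `Operand` size and padding, so no specific byte count is claimed here. The cost of the union is that readers must check `kind` before touching `delayed.cmp` or `delayed.arith`.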
diff --git a/src/api/cg.c b/src/api/cg.c @@ -1004,25 +1004,38 @@ typedef enum ApiDelayedArithKind { API_DELAYED_BINOP, } ApiDelayedArithKind; +typedef struct ApiDelayedCmp { + Operand a; + Operand b; + CmpOp op; + u8 a_owned; + u8 b_owned; + u8 pad[2]; +} ApiDelayedCmp; + +typedef struct ApiDelayedArith { + Operand a; + Operand b; + BinOp bin_op; + UnOp un_op; + u8 kind; + u8 a_owned; + u8 b_owned; + u8 pad; +} ApiDelayedArith; + typedef struct ApiSValue { Operand op; - Operand cmp_a; - Operand cmp_b; - Operand arith_a; - Operand arith_b; + union { + ApiDelayedCmp cmp; + ApiDelayedArith arith; + } delayed; CfreeCgTypeId type; - CmpOp cmp_op; - BinOp arith_bin_op; - UnOp arith_un_op; u8 kind; - u8 arith_kind; u8 res; u8 pinned; u8 lvalue; - u8 cmp_a_owned; - u8 cmp_b_owned; - u8 arith_a_owned; - u8 arith_b_owned; + u8 pad; FrameSlot spill_slot; CfreeCgLocal source_local; } ApiSValue; @@ -1292,11 +1305,11 @@ static ApiSValue api_make_cmp(CmpOp op, Operand a, Operand b, memset(&sv, 0, sizeof sv); sv.kind = SV_CMP; sv.type = result_ty; - sv.cmp_op = op; - sv.cmp_a = a; - sv.cmp_b = b; - sv.cmp_a_owned = a_owned ? 1u : 0u; - sv.cmp_b_owned = b_owned ? 1u : 0u; + sv.delayed.cmp.op = op; + sv.delayed.cmp.a = a; + sv.delayed.cmp.b = b; + sv.delayed.cmp.a_owned = a_owned ? 1u : 0u; + sv.delayed.cmp.b_owned = b_owned ? 1u : 0u; sv.res = RES_INHERENT; sv.spill_slot = FRAME_SLOT_NONE; sv.source_local = CFREE_CG_LOCAL_NONE; @@ -1308,11 +1321,11 @@ static ApiSValue api_make_arith_unop(UnOp op, Operand a, CfreeCgTypeId ty, ApiSValue sv; memset(&sv, 0, sizeof sv); sv.kind = SV_ARITH; - sv.arith_kind = API_DELAYED_UNOP; + sv.delayed.arith.kind = API_DELAYED_UNOP; sv.type = ty; - sv.arith_un_op = op; - sv.arith_a = a; - sv.arith_a_owned = a_owned ? 1u : 0u; + sv.delayed.arith.un_op = op; + sv.delayed.arith.a = a; + sv.delayed.arith.a_owned = a_owned ? 
1u : 0u; sv.res = RES_INHERENT; sv.spill_slot = FRAME_SLOT_NONE; sv.source_local = CFREE_CG_LOCAL_NONE; @@ -1325,13 +1338,13 @@ static ApiSValue api_make_arith_binop(BinOp op, Operand a, Operand b, ApiSValue sv; memset(&sv, 0, sizeof sv); sv.kind = SV_ARITH; - sv.arith_kind = API_DELAYED_BINOP; + sv.delayed.arith.kind = API_DELAYED_BINOP; sv.type = ty; - sv.arith_bin_op = op; - sv.arith_a = a; - sv.arith_b = b; - sv.arith_a_owned = a_owned ? 1u : 0u; - sv.arith_b_owned = b_owned ? 1u : 0u; + sv.delayed.arith.bin_op = op; + sv.delayed.arith.a = a; + sv.delayed.arith.b = b; + sv.delayed.arith.a_owned = a_owned ? 1u : 0u; + sv.delayed.arith.b_owned = b_owned ? 1u : 0u; sv.res = RES_INHERENT; sv.spill_slot = FRAME_SLOT_NONE; sv.source_local = CFREE_CG_LOCAL_NONE; @@ -1632,51 +1645,57 @@ static int api_sv_owns_operand_reg(const ApiSValue *sv, const Operand *op) { } static void api_release_cmp(CfreeCg *g, ApiSValue *sv) { - if (sv->cmp_a_owned) - api_release_operand_reg(g, sv->cmp_a); - if (sv->cmp_b_owned && - (sv->cmp_b.kind != OPK_REG || sv->cmp_a.kind != OPK_REG || - sv->cmp_b.v.reg != sv->cmp_a.v.reg || sv->cmp_b.cls != sv->cmp_a.cls || - !sv->cmp_a_owned)) { - api_release_operand_reg(g, sv->cmp_b); - } - memset(&sv->cmp_a, 0, sizeof sv->cmp_a); - memset(&sv->cmp_b, 0, sizeof sv->cmp_b); - sv->cmp_a_owned = 0; - sv->cmp_b_owned = 0; + if (sv->delayed.cmp.a_owned) + api_release_operand_reg(g, sv->delayed.cmp.a); + if (sv->delayed.cmp.b_owned && + (sv->delayed.cmp.b.kind != OPK_REG || sv->delayed.cmp.a.kind != OPK_REG || + sv->delayed.cmp.b.v.reg != sv->delayed.cmp.a.v.reg || + sv->delayed.cmp.b.cls != sv->delayed.cmp.a.cls || + !sv->delayed.cmp.a_owned)) { + api_release_operand_reg(g, sv->delayed.cmp.b); + } + memset(&sv->delayed.cmp.a, 0, sizeof sv->delayed.cmp.a); + memset(&sv->delayed.cmp.b, 0, sizeof sv->delayed.cmp.b); + sv->delayed.cmp.a_owned = 0; + sv->delayed.cmp.b_owned = 0; sv->kind = SV_OPERAND; } static void api_release_arith(CfreeCg *g, ApiSValue *sv) { 
- if (sv->arith_a_owned) - api_release_operand_reg(g, sv->arith_a); - if (sv->arith_b_owned && - (sv->arith_b.kind != OPK_REG || sv->arith_a.kind != OPK_REG || - sv->arith_b.v.reg != sv->arith_a.v.reg || - sv->arith_b.cls != sv->arith_a.cls || !sv->arith_a_owned)) { - api_release_operand_reg(g, sv->arith_b); - } - memset(&sv->arith_a, 0, sizeof sv->arith_a); - memset(&sv->arith_b, 0, sizeof sv->arith_b); - sv->arith_a_owned = 0; - sv->arith_b_owned = 0; + if (sv->delayed.arith.a_owned) + api_release_operand_reg(g, sv->delayed.arith.a); + if (sv->delayed.arith.b_owned && + (sv->delayed.arith.b.kind != OPK_REG || + sv->delayed.arith.a.kind != OPK_REG || + sv->delayed.arith.b.v.reg != sv->delayed.arith.a.v.reg || + sv->delayed.arith.b.cls != sv->delayed.arith.a.cls || + !sv->delayed.arith.a_owned)) { + api_release_operand_reg(g, sv->delayed.arith.b); + } + memset(&sv->delayed.arith.a, 0, sizeof sv->delayed.arith.a); + memset(&sv->delayed.arith.b, 0, sizeof sv->delayed.arith.b); + sv->delayed.arith.a_owned = 0; + sv->delayed.arith.b_owned = 0; sv->kind = SV_OPERAND; } static void api_materialize_cmp_to(CfreeCg *g, ApiSValue *sv, Operand dst) { - g->target->cmp(g->target, sv->cmp_op, dst, sv->cmp_a, sv->cmp_b); - if (sv->cmp_a_owned && sv->cmp_a.kind == OPK_REG && - (sv->cmp_a.v.reg != dst.v.reg || sv->cmp_a.cls != dst.cls)) { - api_release_operand_reg(g, sv->cmp_a); - } - if (sv->cmp_b_owned && sv->cmp_b.kind == OPK_REG && - (sv->cmp_b.v.reg != dst.v.reg || sv->cmp_b.cls != dst.cls)) { - api_release_operand_reg(g, sv->cmp_b); - } - memset(&sv->cmp_a, 0, sizeof sv->cmp_a); - memset(&sv->cmp_b, 0, sizeof sv->cmp_b); - sv->cmp_a_owned = 0; - sv->cmp_b_owned = 0; + g->target->cmp(g->target, sv->delayed.cmp.op, dst, sv->delayed.cmp.a, + sv->delayed.cmp.b); + if (sv->delayed.cmp.a_owned && sv->delayed.cmp.a.kind == OPK_REG && + (sv->delayed.cmp.a.v.reg != dst.v.reg || + sv->delayed.cmp.a.cls != dst.cls)) { + api_release_operand_reg(g, sv->delayed.cmp.a); + } + if 
(sv->delayed.cmp.b_owned && sv->delayed.cmp.b.kind == OPK_REG && + (sv->delayed.cmp.b.v.reg != dst.v.reg || + sv->delayed.cmp.b.cls != dst.cls)) { + api_release_operand_reg(g, sv->delayed.cmp.b); + } + memset(&sv->delayed.cmp.a, 0, sizeof sv->delayed.cmp.a); + memset(&sv->delayed.cmp.b, 0, sizeof sv->delayed.cmp.b); + sv->delayed.cmp.a_owned = 0; + sv->delayed.cmp.b_owned = 0; sv->kind = SV_OPERAND; sv->op = dst; sv->type = dst.type; @@ -1685,24 +1704,28 @@ static void api_materialize_cmp_to(CfreeCg *g, ApiSValue *sv, Operand dst) { } static void api_materialize_arith_to(CfreeCg *g, ApiSValue *sv, Operand dst) { - if (sv->arith_kind == API_DELAYED_UNOP) { - g->target->unop(g->target, sv->arith_un_op, dst, sv->arith_a); + if (sv->delayed.arith.kind == API_DELAYED_UNOP) { + g->target->unop(g->target, sv->delayed.arith.un_op, dst, + sv->delayed.arith.a); } else { - g->target->binop(g->target, sv->arith_bin_op, dst, sv->arith_a, - sv->arith_b); - } - if (sv->arith_a_owned && sv->arith_a.kind == OPK_REG && - (sv->arith_a.v.reg != dst.v.reg || sv->arith_a.cls != dst.cls)) { - api_release_operand_reg(g, sv->arith_a); - } - if (sv->arith_b_owned && sv->arith_b.kind == OPK_REG && - (sv->arith_b.v.reg != dst.v.reg || sv->arith_b.cls != dst.cls)) { - api_release_operand_reg(g, sv->arith_b); - } - memset(&sv->arith_a, 0, sizeof sv->arith_a); - memset(&sv->arith_b, 0, sizeof sv->arith_b); - sv->arith_a_owned = 0; - sv->arith_b_owned = 0; + g->target->binop(g->target, sv->delayed.arith.bin_op, dst, + sv->delayed.arith.a, + sv->delayed.arith.b); + } + if (sv->delayed.arith.a_owned && sv->delayed.arith.a.kind == OPK_REG && + (sv->delayed.arith.a.v.reg != dst.v.reg || + sv->delayed.arith.a.cls != dst.cls)) { + api_release_operand_reg(g, sv->delayed.arith.a); + } + if (sv->delayed.arith.b_owned && sv->delayed.arith.b.kind == OPK_REG && + (sv->delayed.arith.b.v.reg != dst.v.reg || + sv->delayed.arith.b.cls != dst.cls)) { + api_release_operand_reg(g, sv->delayed.arith.b); + } + 
memset(&sv->delayed.arith.a, 0, sizeof sv->delayed.arith.a); + memset(&sv->delayed.arith.b, 0, sizeof sv->delayed.arith.b); + sv->delayed.arith.a_owned = 0; + sv->delayed.arith.b_owned = 0; sv->kind = SV_OPERAND; sv->op = dst; sv->type = dst.type; @@ -1711,9 +1734,9 @@ static void api_materialize_arith_to(CfreeCg *g, ApiSValue *sv, Operand dst) { } static int api_arith_rhs_reusable(const ApiSValue *sv) { - if (sv->arith_kind == API_DELAYED_UNOP) + if (sv->delayed.arith.kind == API_DELAYED_UNOP) return 0; - switch (sv->arith_bin_op) { + switch (sv->delayed.arith.bin_op) { case BO_IADD: case BO_IMUL: case BO_AND: @@ -1733,12 +1756,12 @@ static int api_materialize_cmp_victim(CfreeCg *g, u8 cls) { Operand dst; if (sv->kind != SV_CMP || sv->pinned) continue; - if (sv->cmp_a_owned && sv->cmp_a.kind == OPK_REG && - sv->cmp_a.cls == RC_INT) { - dst = api_op_reg(sv->cmp_a.v.reg, api_sv_type(sv)); - } else if (sv->cmp_b_owned && sv->cmp_b.kind == OPK_REG && - sv->cmp_b.cls == RC_INT) { - dst = api_op_reg(sv->cmp_b.v.reg, api_sv_type(sv)); + if (sv->delayed.cmp.a_owned && sv->delayed.cmp.a.kind == OPK_REG && + sv->delayed.cmp.a.cls == RC_INT) { + dst = api_op_reg(sv->delayed.cmp.a.v.reg, api_sv_type(sv)); + } else if (sv->delayed.cmp.b_owned && sv->delayed.cmp.b.kind == OPK_REG && + sv->delayed.cmp.b.cls == RC_INT) { + dst = api_op_reg(sv->delayed.cmp.b.v.reg, api_sv_type(sv)); } else { continue; } @@ -1756,12 +1779,13 @@ static int api_materialize_arith_victim(CfreeCg *g, u8 cls) { Operand dst; if (sv->kind != SV_ARITH || sv->pinned) continue; - if (sv->arith_a_owned && sv->arith_a.kind == OPK_REG && - sv->arith_a.cls == RC_INT) { - dst = api_op_reg(sv->arith_a.v.reg, api_sv_type(sv)); - } else if (api_arith_rhs_reusable(sv) && sv->arith_b_owned && - sv->arith_b.kind == OPK_REG && sv->arith_b.cls == RC_INT) { - dst = api_op_reg(sv->arith_b.v.reg, api_sv_type(sv)); + if (sv->delayed.arith.a_owned && sv->delayed.arith.a.kind == OPK_REG && + sv->delayed.arith.a.cls == RC_INT) { 
+ dst = api_op_reg(sv->delayed.arith.a.v.reg, api_sv_type(sv)); + } else if (api_arith_rhs_reusable(sv) && sv->delayed.arith.b_owned && + sv->delayed.arith.b.kind == OPK_REG && + sv->delayed.arith.b.cls == RC_INT) { + dst = api_op_reg(sv->delayed.arith.b.v.reg, api_sv_type(sv)); } else { continue; } @@ -1820,12 +1844,12 @@ static void api_ensure_reg(CfreeCg *g, ApiSValue *sv) { if (sv->kind == SV_CMP) { CfreeCgTypeId ty = api_sv_type(sv); Operand dst; - if (sv->cmp_a_owned && sv->cmp_a.kind == OPK_REG && - sv->cmp_a.cls == RC_INT) { - dst = api_op_reg(sv->cmp_a.v.reg, ty); - } else if (sv->cmp_b_owned && sv->cmp_b.kind == OPK_REG && - sv->cmp_b.cls == RC_INT) { - dst = api_op_reg(sv->cmp_b.v.reg, ty); + if (sv->delayed.cmp.a_owned && sv->delayed.cmp.a.kind == OPK_REG && + sv->delayed.cmp.a.cls == RC_INT) { + dst = api_op_reg(sv->delayed.cmp.a.v.reg, ty); + } else if (sv->delayed.cmp.b_owned && sv->delayed.cmp.b.kind == OPK_REG && + sv->delayed.cmp.b.cls == RC_INT) { + dst = api_op_reg(sv->delayed.cmp.b.v.reg, ty); } else { Reg r = api_alloc_reg_or_spill(g, RC_INT, @@ -1838,12 +1862,13 @@ static void api_ensure_reg(CfreeCg *g, ApiSValue *sv) { if (sv->kind == SV_ARITH) { CfreeCgTypeId ty = api_sv_type(sv); Operand dst; - if (sv->arith_a_owned && sv->arith_a.kind == OPK_REG && - sv->arith_a.cls == RC_INT) { - dst = api_op_reg(sv->arith_a.v.reg, ty); - } else if (api_arith_rhs_reusable(sv) && sv->arith_b_owned && - sv->arith_b.kind == OPK_REG && sv->arith_b.cls == RC_INT) { - dst = api_op_reg(sv->arith_b.v.reg, ty); + if (sv->delayed.arith.a_owned && sv->delayed.arith.a.kind == OPK_REG && + sv->delayed.arith.a.cls == RC_INT) { + dst = api_op_reg(sv->delayed.arith.a.v.reg, ty); + } else if (api_arith_rhs_reusable(sv) && sv->delayed.arith.b_owned && + sv->delayed.arith.b.kind == OPK_REG && + sv->delayed.arith.b.cls == RC_INT) { + dst = api_op_reg(sv->delayed.arith.b.v.reg, ty); } else { Reg r = api_alloc_reg_or_spill(g, RC_INT, @@ -2341,6 +2366,19 @@ static void 
api_local_const_clear_all(CfreeCg *g) {
     api_local_const_clear(&g->locals[i]);
 }
 
+static void api_local_const_memory_boundary(CfreeCg *g) {
+  api_local_const_clear_all(g);
+}
+
+static void api_local_const_control_boundary(CfreeCg *g) {
+  api_local_const_clear_all(g);
+}
+
+static void api_local_const_address_taken(CfreeCg *g, CfreeCgLocal local) {
+  api_local_const_clear_all(g);
+  api_local_const_clear(api_local_from_handle(g, local));
+}
+
 static int api_local_const_can_track(CfreeCg *g, const ApiSourceLocal *rec,
                                      CfreeCgMemAccess access) {
   u32 width;
@@ -2458,21 +2496,24 @@ static int api_try_fold_arith_chain(CfreeCg *g, BinOp op, CfreeCgTypeId ty,
                                     ApiSValue *out) {
   i64 folded;
   BinOp result_op;
-  if (a->kind != SV_ARITH || a->arith_kind != API_DELAYED_BINOP ||
-      a->arith_a.kind != OPK_REG || a->arith_b.kind != OPK_IMM ||
+  if (a->kind != SV_ARITH || a->delayed.arith.kind != API_DELAYED_BINOP ||
+      a->delayed.arith.a.kind != OPK_REG ||
+      a->delayed.arith.b.kind != OPK_IMM ||
       b->kind != SV_OPERAND || b->op.kind != OPK_IMM) {
     return 0;
   }
-  result_op = a->arith_bin_op;
-  switch (a->arith_bin_op) {
+  result_op = a->delayed.arith.bin_op;
+  switch (a->delayed.arith.bin_op) {
   case BO_IADD:
     if (op == BO_IADD) {
-      if (!api_try_fold_int_binop(g, BO_IADD, ty, a->arith_b.v.imm, b->op.v.imm,
+      if (!api_try_fold_int_binop(g, BO_IADD, ty,
+                                  a->delayed.arith.b.v.imm, b->op.v.imm,
                                   &folded))
         return 0;
       result_op = BO_IADD;
     } else if (op == BO_ISUB) {
-      if (!api_try_fold_int_binop(g, BO_ISUB, ty, a->arith_b.v.imm, b->op.v.imm,
+      if (!api_try_fold_int_binop(g, BO_ISUB, ty,
+                                  a->delayed.arith.b.v.imm, b->op.v.imm,
                                   &folded))
         return 0;
       result_op = BO_IADD;
@@ -2482,12 +2523,14 @@ static int api_try_fold_arith_chain(CfreeCg *g, BinOp op, CfreeCgTypeId ty,
     break;
   case BO_ISUB:
     if (op == BO_IADD) {
-      if (!api_try_fold_int_binop(g, BO_ISUB, ty, b->op.v.imm, a->arith_b.v.imm,
+      if (!api_try_fold_int_binop(g, BO_ISUB, ty, b->op.v.imm,
+                                  a->delayed.arith.b.v.imm,
                                   &folded))
         return 0;
       result_op = BO_IADD;
     } else if (op == BO_ISUB) {
-      if (!api_try_fold_int_binop(g, BO_IADD, ty, a->arith_b.v.imm, b->op.v.imm,
+      if (!api_try_fold_int_binop(g, BO_IADD, ty,
+                                  a->delayed.arith.b.v.imm, b->op.v.imm,
                                   &folded))
         return 0;
       result_op = BO_ISUB;
@@ -2496,19 +2539,22 @@ static int api_try_fold_arith_chain(CfreeCg *g, BinOp op, CfreeCgTypeId ty,
     }
     break;
   case BO_XOR:
-    if (op != BO_XOR || !api_try_fold_int_binop(g, BO_XOR, ty, a->arith_b.v.imm,
+    if (op != BO_XOR || !api_try_fold_int_binop(g, BO_XOR, ty,
+                                                a->delayed.arith.b.v.imm,
                                                 b->op.v.imm, &folded))
       return 0;
     result_op = BO_XOR;
     break;
   case BO_AND:
-    if (op != BO_AND || !api_try_fold_int_binop(g, BO_AND, ty, a->arith_b.v.imm,
+    if (op != BO_AND || !api_try_fold_int_binop(g, BO_AND, ty,
+                                                a->delayed.arith.b.v.imm,
                                                 b->op.v.imm, &folded))
       return 0;
     result_op = BO_AND;
     break;
   case BO_OR:
-    if (op != BO_OR || !api_try_fold_int_binop(g, BO_OR, ty, a->arith_b.v.imm,
+    if (op != BO_OR || !api_try_fold_int_binop(g, BO_OR, ty,
+                                               a->delayed.arith.b.v.imm,
                                                b->op.v.imm, &folded))
       return 0;
     result_op = BO_OR;
@@ -2517,31 +2563,34 @@ static int api_try_fold_arith_chain(CfreeCg *g, BinOp op, CfreeCgTypeId ty,
     return 0;
   }
   if (api_op_is_int_identity(g, result_op, ty, folded)) {
-    *out = api_make_sv_with_reg_ownership(a->arith_a, ty, a->arith_a_owned);
-    a->arith_a_owned = 0;
-    memset(&a->arith_a, 0, sizeof a->arith_a);
+    *out = api_make_sv_with_reg_ownership(a->delayed.arith.a, ty,
+                                          a->delayed.arith.a_owned);
+    a->delayed.arith.a_owned = 0;
+    memset(&a->delayed.arith.a, 0, sizeof a->delayed.arith.a);
     return 1;
   }
-  a->arith_bin_op = result_op;
-  a->arith_b.v.imm = folded;
+  a->delayed.arith.bin_op = result_op;
+  a->delayed.arith.b.v.imm = folded;
   *out = *a;
-  a->arith_a_owned = 0;
-  a->arith_b_owned = 0;
-  memset(&a->arith_a, 0, sizeof a->arith_a);
-  memset(&a->arith_b, 0, sizeof a->arith_b);
+  a->delayed.arith.a_owned = 0;
+  a->delayed.arith.b_owned = 0;
+  memset(&a->delayed.arith.a, 0, sizeof a->delayed.arith.a);
+  memset(&a->delayed.arith.b, 0, sizeof a->delayed.arith.b);
   return 1;
 }
 
 static int api_try_fold_unary_chain(ApiSValue *a, UnOp op, CfreeCgTypeId ty,
                                     ApiSValue *out) {
   if (op != UO_BNOT || a->kind != SV_ARITH ||
-      a->arith_kind != API_DELAYED_UNOP || a->arith_un_op != UO_BNOT ||
-      a->arith_a.kind != OPK_REG) {
+      a->delayed.arith.kind != API_DELAYED_UNOP ||
+      a->delayed.arith.un_op != UO_BNOT ||
+      a->delayed.arith.a.kind != OPK_REG) {
     return 0;
   }
-  *out = api_make_sv_with_reg_ownership(a->arith_a, ty, a->arith_a_owned);
-  a->arith_a_owned = 0;
-  memset(&a->arith_a, 0, sizeof a->arith_a);
+  *out = api_make_sv_with_reg_ownership(a->delayed.arith.a, ty,
+                                        a->delayed.arith.a_owned);
+  a->delayed.arith.a_owned = 0;
+  memset(&a->delayed.arith.a, 0, sizeof a->delayed.arith.a);
   return 1;
 }
 
@@ -3333,7 +3382,7 @@ void cfree_cg_load(CfreeCg *g, CfreeCgMemAccess access) {
   if (!g)
     return;
   if (access.flags & CFREE_CG_MEM_VOLATILE)
-    api_local_const_clear_all(g);
+    api_local_const_memory_boundary(g);
   v = api_pop(g);
   if (!api_is_lvalue_sv(&v)) {
     api_push(g, v);
@@ -3392,11 +3441,9 @@ void cfree_cg_addr(CfreeCg *g) {
   ApiSourceLocal *rec;
   if (!g)
     return;
-  api_local_const_clear_all(g);
   T = g->target;
   v = api_pop(g);
-  if (v.source_local != CFREE_CG_LOCAL_NONE)
-    api_local_const_clear(api_local_from_handle(g, v.source_local));
+  api_local_const_address_taken(g, v.source_local);
   api_ensure_reg(g, &v);
   if (!api_is_lvalue_sv(&v)) {
     compiler_panic(g->c, g->cur_loc, "CfreeCg: addr operand is not an lvalue");
@@ -3424,7 +3471,7 @@ void cfree_cg_store(CfreeCg *g, CfreeCgMemAccess access) {
   if (!g)
     return;
   if (access.flags & CFREE_CG_MEM_VOLATILE)
-    api_local_const_clear_all(g);
+    api_local_const_memory_boundary(g);
   T = g->target;
   rv = api_pop(g);
   lv = api_pop(g);
@@ -3451,7 +3498,7 @@ void cfree_cg_store(CfreeCg *g, CfreeCgMemAccess access) {
     }
   } else if (lv.op.kind == OPK_INDIRECT || lv.op.kind == OPK_GLOBAL ||
              (access.flags & CFREE_CG_MEM_VOLATILE)) {
-    api_local_const_clear_all(g);
+    api_local_const_memory_boundary(g);
   }
   if (lv.source_local != CFREE_CG_LOCAL_NONE && lv.op.kind == OPK_REG) {
     Operand dst = lv.op;
@@ -4020,7 +4067,7 @@ void cfree_cg_atomic_load(CfreeCg *g, CfreeCgMemAccess access,
   Reg rr;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   ptr = api_pop(g);
   pty = api_sv_type(&ptr);
   val_ty = resolve_type(g->c, access.type);
@@ -4042,7 +4089,7 @@ void cfree_cg_atomic_store(CfreeCg *g, CfreeCgMemAccess access,
   Operand addr, src;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   val = api_pop(g);
   ptr = api_pop(g);
   pty = api_sv_type(&ptr);
@@ -4067,7 +4114,7 @@ void cfree_cg_atomic_rmw(CfreeCg *g, CfreeCgMemAccess access,
   Reg rr;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   val = api_pop(g);
   ptr = api_pop(g);
   pty = api_sv_type(&ptr);
@@ -4095,7 +4142,7 @@ void cfree_cg_atomic_cmpxchg(CfreeCg *g, CfreeCgMemAccess access,
   Reg pr, kr;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   (void)weak;
   desired = api_pop(g);
   expected = api_pop(g);
@@ -4129,7 +4176,7 @@ void cfree_cg_atomic_cmpxchg(CfreeCg *g, CfreeCgMemAccess access,
 void cfree_cg_atomic_fence(CfreeCg *g, CfreeCgMemOrder order) {
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   g->target->fence(g->target, api_map_mem_order(order));
 }
 
@@ -4214,7 +4261,7 @@ void cfree_cg_inline_asm(CfreeCg *g, CfreeCgInlineAsm asm_block) {
   (void)asm_block.clobber_abi_sets;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   T = g->target;
   h = g->c->env->heap;
   fallback_ty = builtin_id(CFREE_CG_BUILTIN_I64);
@@ -4483,14 +4530,14 @@ CfreeCgLabel cfree_cg_label_new(CfreeCg *g) {
 void cfree_cg_label_place(CfreeCg *g, CfreeCgLabel label) {
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   g->target->label_place(g->target, (Label)label);
 }
 
 void cfree_cg_jump(CfreeCg *g, CfreeCgLabel label) {
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   g->target->jump(g->target, (Label)label);
 }
 
@@ -4500,7 +4547,7 @@ static void api_branch_if(CfreeCg *g, ApiSValue *v, int branch_when_true,
   CfreeCgTypeId ty;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   T = g->target;
   ty = v->type ? v->type : builtin_id(CFREE_CG_BUILTIN_I32);
   if (v->op.kind == OPK_IMM && v->kind == SV_OPERAND) {
@@ -4510,8 +4557,9 @@ static void api_branch_if(CfreeCg *g, ApiSValue *v, int branch_when_true,
     return;
   }
   if (v->kind == SV_CMP) {
-    CmpOp op = branch_when_true ? v->cmp_op : api_invert_cmp(v->cmp_op);
-    T->cmp_branch(T, op, v->cmp_a, v->cmp_b, label);
+    CmpOp op =
+        branch_when_true ? v->delayed.cmp.op : api_invert_cmp(v->delayed.cmp.op);
+    T->cmp_branch(T, op, v->delayed.cmp.a, v->delayed.cmp.b, label);
     api_release(g, v);
     return;
   }
@@ -4547,7 +4595,7 @@ void cfree_cg_switch(CfreeCg *g, CfreeCgSwitch sw) {
     return;
   if (g->sp == 0)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   selector = api_pop(g);
   ty = resolve_type(g->c, sw.selector_type);
   if (!ty)
@@ -4581,7 +4629,7 @@ void cfree_cg_computed_goto(CfreeCg *g, const CfreeCgLabel *valid_targets,
   (void)ntargets;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   target = api_pop(g);
   api_release(g, &target);
   compiler_panic(g->c, g->cur_loc,
@@ -4591,7 +4639,7 @@ void cfree_cg_computed_goto(CfreeCg *g, const CfreeCgLabel *valid_targets,
 void cfree_cg_unreachable(CfreeCg *g) {
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   g->target->intrinsic(g->target, INTRIN_UNREACHABLE, NULL, 0, NULL, 0);
 }
 
@@ -4676,7 +4724,7 @@ CfreeCgScope cfree_cg_scope_begin(CfreeCg *g, CfreeCgTypeId result_type) {
     return 0;
   break_lbl = g->target->label_new(g->target);
   cont_lbl = g->target->label_new(g->target);
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   g->target->label_place(g->target, cont_lbl);
 
   if (g->nscopes >= API_CG_MAX_SCOPES) {
@@ -4723,7 +4771,7 @@ void cfree_cg_scope_end(CfreeCg *g, CfreeCgScope scope) {
     ApiSValue result = api_pop(g);
     api_scope_store_result(g, s, &result);
   }
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   g->target->label_place(g->target, s->break_lbl);
   g->target->scope_end(g->target, s->target_scope);
   api_scope_push_result(g, s);
@@ -4739,7 +4787,7 @@ void cfree_cg_break(CfreeCg *g, CfreeCgScope scope) {
     ApiSValue result = api_pop(g);
     api_scope_store_result(g, s, &result);
   }
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   g->target->jump(g->target, s->break_lbl);
 }
 
@@ -4758,7 +4806,7 @@ void cfree_cg_break_true(CfreeCg *g, CfreeCgScope scope) {
     if (cond.kind == SV_OPERAND && cond.op.kind == OPK_IMM) {
       if (cond.op.v.imm != 0) {
         api_scope_store_result(g, s, &result);
-        api_local_const_clear_all(g);
+        api_local_const_control_boundary(g);
         g->target->jump(g->target, s->break_lbl);
       } else {
         api_release(g, &result);
@@ -4768,9 +4816,9 @@ void cfree_cg_break_true(CfreeCg *g, CfreeCgScope scope) {
       Label skip = g->target->label_new(g->target);
       api_branch_if(g, &cond, 0, skip);
       api_scope_store_result(g, s, &result);
-      api_local_const_clear_all(g);
+      api_local_const_control_boundary(g);
       g->target->jump(g->target, s->break_lbl);
-      api_local_const_clear_all(g);
+      api_local_const_control_boundary(g);
       g->target->label_place(g->target, skip);
     }
   } else {
@@ -4793,7 +4841,7 @@ void cfree_cg_break_false(CfreeCg *g, CfreeCgScope scope) {
     if (cond.kind == SV_OPERAND && cond.op.kind == OPK_IMM) {
       if (cond.op.v.imm == 0) {
         api_scope_store_result(g, s, &result);
-        api_local_const_clear_all(g);
+        api_local_const_control_boundary(g);
         g->target->jump(g->target, s->break_lbl);
       } else {
         api_release(g, &result);
@@ -4803,9 +4851,9 @@ void cfree_cg_break_false(CfreeCg *g, CfreeCgScope scope) {
       Label skip = g->target->label_new(g->target);
       api_branch_if(g, &cond, 1, skip);
       api_scope_store_result(g, s, &result);
-      api_local_const_clear_all(g);
+      api_local_const_control_boundary(g);
       g->target->jump(g->target, s->break_lbl);
-      api_local_const_clear_all(g);
+      api_local_const_control_boundary(g);
       g->target->label_place(g->target, skip);
     }
   } else {
@@ -4817,7 +4865,7 @@ void cfree_cg_continue(CfreeCg *g, CfreeCgScope scope) {
   ApiCgScope *s = api_scope_from_handle(g, scope, 0, "CfreeCg: continue");
   if (!s)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_control_boundary(g);
   g->target->jump(g->target, s->continue_lbl);
 }
 
@@ -4949,7 +4997,7 @@ void cfree_cg_memcpy(CfreeCg *g, uint64_t size, CfreeCgMemAccess dst_access,
   Operand dst_op, src_op;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   (void)src_access;
   if (size > UINT32_MAX) {
     compiler_panic(g->c, g->cur_loc, "CfreeCg: memcpy size exceeds CGTarget");
@@ -4974,7 +5022,7 @@ void cfree_cg_memmove(CfreeCg *g, uint64_t size, CfreeCgMemAccess dst_access,
   Operand args[3];
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   (void)dst_access;
   (void)src_access;
   if (size > INT64_MAX) {
@@ -4999,7 +5047,7 @@ void cfree_cg_memset(CfreeCg *g, uint8_t val, uint64_t size,
   Operand dst_op, byte_val;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   if (size > UINT32_MAX) {
     compiler_panic(g->c, g->cur_loc, "CfreeCg: memset size exceeds CGTarget");
     return;
@@ -5168,7 +5216,7 @@ void cfree_cg_call(CfreeCg *g, uint32_t nargs, CfreeCgTypeId fn_type,
   int tail;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   tail = attrs.tail == CFREE_CG_TAIL_ALLOWED ||
          attrs.tail == CFREE_CG_TAIL_MUST;
   T = g->target;
@@ -5289,7 +5337,7 @@ static void api_cg_tail_call(CfreeCg *g, uint32_t nargs,
   ApiSValue callee;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   T = g->target;
   fty = resolve_type(g->c, fn_type);
   if (!fty)
@@ -5348,7 +5396,7 @@ static void api_call_symbol_common(CfreeCg *g, CfreeCgSym sym, uint32_t nargs,
   Operand callee_op;
   if (!g)
     return;
-  api_local_const_clear_all(g);
+  api_local_const_memory_boundary(g);
   int tail = attrs.tail == CFREE_CG_TAIL_ALLOWED ||
              attrs.tail == CFREE_CG_TAIL_MUST;
   T = g->target;
diff --git a/test/api/cg_type_test.c b/test/api/cg_type_test.c
@@ -692,6 +692,87 @@ static uint32_t cg_emit_delayed_store(CfreeCompiler* c, CfreeCgTypeId i32_ty,
   return size;
 }
 
+static uint32_t cg_emit_delayed_pressure(CfreeCompiler* c,
+                                         CfreeCgTypeId i32_ty,
+                                         const char* name) {
+  enum { NPARAMS = 13 };
+  CfreeCompileOptions opts;
+  CfreeObjBuilder* ob;
+  CfreeCg* cg;
+  CfreeCgFuncParam param_desc[NPARAMS];
+  CfreeCgFuncSig sig;
+  CfreeCgDecl decl;
+  CfreeCgSym sym;
+  CfreeCgLocalAttrs attrs;
+  CfreeCgLocal params[NPARAMS];
+  CfreeCgMemAccess mem;
+  uint32_t size;
+
+  memset(&opts, 0, sizeof opts);
+  opts.opt_level = 1;
+  ob = (CfreeObjBuilder*)obj_new((Compiler*)c);
+  EXPECT(ob != NULL, "delayed pressure obj builder allocation failed");
+  if (!ob) return 0;
+  cg = cfree_cg_new(c, ob, &opts);
+  EXPECT(cg != NULL, "delayed pressure cg allocation failed");
+  if (!cg) {
+    obj_free((ObjBuilder*)ob);
+    return 0;
+  }
+
+  memset(param_desc, 0, sizeof param_desc);
+  for (uint32_t i = 0; i < NPARAMS; ++i)
+    param_desc[i].type = i32_ty;
+  memset(&sig, 0, sizeof sig);
+  sig.ret = i32_ty;
+  sig.params = param_desc;
+  sig.nparams = NPARAMS;
+  sig.call_conv = CFREE_CG_CC_TARGET_C;
+
+  memset(&decl, 0, sizeof decl);
+  decl.kind = CFREE_CG_DECL_FUNC;
+  decl.linkage_name = cfree_sym_intern(c, name);
+  decl.display_name = decl.linkage_name;
+  decl.type = cfree_cg_type_func(c, sig);
+  decl.sym.bind = CFREE_SB_GLOBAL;
+  decl.sym.visibility = CFREE_CG_VIS_DEFAULT;
+  sym = cfree_cg_decl(cg, decl);
+  EXPECT(sym != CFREE_CG_SYM_NONE, "delayed pressure decl failed");
+
+  cfree_cg_func_begin(cg, sym);
+  memset(&attrs, 0, sizeof attrs);
+  memset(&mem, 0, sizeof mem);
+  mem.type = i32_ty;
+  mem.align = cfree_cg_type_align(c, i32_ty);
+  for (uint32_t i = 0; i < NPARAMS; ++i) {
+    char pname[8];
+    snprintf(pname, sizeof pname, "p%u", (unsigned)i);
+    attrs.name = cfree_sym_intern(c, pname);
+    params[i] = cfree_cg_param(cg, i, i32_ty, attrs);
+    EXPECT(params[i] != CFREE_CG_LOCAL_NONE, "delayed pressure param failed");
+  }
+
+  for (uint32_t i = 0; i + 1 < NPARAMS; ++i) {
+    cfree_cg_push_local(cg, params[i]);
+    cfree_cg_load(cg, mem);
+    cfree_cg_push_int(cg, 1, i32_ty);
+    cfree_cg_int_binop(cg, CFREE_CG_INT_ADD, 0);
+  }
+  cfree_cg_push_local(cg, params[NPARAMS - 1]);
+  cfree_cg_load(cg, mem);
+  cfree_cg_drop(cg);
+  for (uint32_t i = 0; i + 1 < NPARAMS; ++i)
+    cfree_cg_drop(cg);
+  cfree_cg_push_int(cg, 0, i32_ty);
+  cfree_cg_ret(cg);
+  cfree_cg_func_end(cg);
+
+  cfree_cg_free(cg);
+  size = text_size((ObjBuilder*)ob);
+  obj_free((ObjBuilder*)ob);
+  return size;
+}
+
 typedef enum CgShadowBoundary {
   CG_SHADOW_LABEL,
   CG_SHADOW_BRANCH,
@@ -883,6 +964,8 @@ static void exercise_cg_constfold_phases(CfreeCompiler* c,
       cg_emit_delayed_cmp(c, i32_ty, "cg_delayed_cmp_o1");
   uint32_t delayed_store_size =
       cg_emit_delayed_store(c, i32_ty, "cg_delayed_store_o1");
+  uint32_t pressure_size =
+      cg_emit_delayed_pressure(c, i32_ty, "cg_delayed_pressure_o1");
   uint32_t label_size = cg_emit_local_shadow_boundary(
       c, i32_ty, "cg_shadow_label_o1", CG_SHADOW_LABEL);
   uint32_t branch_size = cg_emit_local_shadow_boundary(
@@ -913,6 +996,8 @@ static void exercise_cg_constfold_phases(CfreeCompiler* c,
   EXPECT(delayed_store_size <= 64,
          "delayed arithmetic forced by store should stay compact, text size=%u",
          delayed_store_size);
+  EXPECT(pressure_size > 0,
+         "delayed arithmetic pressure materialization should emit code");
   EXPECT(label_size > local_size,
          "label should clear local shadow, label=%u folded=%u", label_size,
          local_size);
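The invalidation pattern this commit introduces (named memory, control, and address-taking boundary helpers that all currently funnel into one conservative clear) can be sketched in isolation. The following is a minimal standalone model, not the kit implementation: `ToyCg`, `ToyLocal`, `TOY_NO_LOCAL`, and every `toy_*` function are hypothetical names invented for illustration, and the bounds check in `toy_address_taken` is an assumption of this sketch rather than a statement about how `api_local_from_handle` treats `CFREE_CG_LOCAL_NONE`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-local shadow state; not kit types. */
enum { TOY_MAX_LOCALS = 8 };
#define TOY_NO_LOCAL ((size_t)-1)

typedef struct ToyLocal {
  int has_const;  /* nonzero while `value` is a known constant */
  int64_t value;
} ToyLocal;

typedef struct ToyCg {
  ToyLocal locals[TOY_MAX_LOCALS];
  size_t nlocals;
} ToyCg;

/* Forget the shadowed constant of one local. */
static void toy_const_clear(ToyLocal *l) {
  l->has_const = 0;
  l->value = 0;
}

/* Conservative invalidation: forget every shadowed constant. */
static void toy_const_clear_all(ToyCg *g) {
  for (size_t i = 0; i < g->nlocals; ++i)
    toy_const_clear(&g->locals[i]);
}

/* Named boundaries. Both clear everything today; the names record why a
 * call site invalidates, so each can be narrowed later without touching
 * every caller. */
static void toy_memory_boundary(ToyCg *g) { toy_const_clear_all(g); }
static void toy_control_boundary(ToyCg *g) { toy_const_clear_all(g); }

/* Address-taken boundary. The index guard is this sketch's assumption. */
static void toy_address_taken(ToyCg *g, size_t idx) {
  toy_const_clear_all(g);
  if (idx < g->nlocals)
    toy_const_clear(&g->locals[idx]);
}

/* Record a constant store to a tracked local. */
static void toy_const_store(ToyCg *g, size_t idx, int64_t v) {
  if (idx >= g->nlocals)
    return;
  g->locals[idx].has_const = 1;
  g->locals[idx].value = v;
}

/* Return 1 and write *out if the local still has a known constant. */
static int toy_const_load(const ToyCg *g, size_t idx, int64_t *out) {
  if (idx >= g->nlocals || !g->locals[idx].has_const)
    return 0;
  *out = g->locals[idx].value;
  return 1;
}

/* Walk through a store, a foldable load, and a control boundary. */
static int toy_demo(void) {
  ToyCg g = {0};
  int64_t v = 0;
  g.nlocals = 2;
  toy_const_store(&g, 0, 42);
  assert(toy_const_load(&g, 0, &v) && v == 42); /* straight-line: foldable */
  toy_control_boundary(&g);                     /* e.g. a label or jump */
  assert(!toy_const_load(&g, 0, &v));           /* shadow is gone */
  return 1;
}
```

Keeping the two boundary kinds as separate entry points costs nothing while they share one body, and it leaves room for a cheaper scheme (such as the generation counter mentioned in `doc/CONSTFOLD.md`) to replace the scan behind a single function.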