commit 5e4d8e66ad3facbda764b97e7df3cd9cef32ede6
parent 89b2feab45e6fd2e256d7f6ea889d6383777149e
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Fri, 15 May 2026 13:04:53 -0700
opt: keep scalar locals in virtual registers
Diffstat:
15 files changed, 941 insertions(+), 78 deletions(-)
diff --git a/doc/OPT1.md b/doc/OPT1.md
@@ -181,52 +181,70 @@ make test-smoke-x64
## Current Disassembly Observations
-A small AArch64 Linux probe with arithmetic, a conditional, a loop, and a call
-shows the current shape:
-
-- O1 is smaller than O0 and removes many post-RA physical moves.
-- The old prologue NOP sleds are gone on O1 for AArch64, x64, and RV64; the
- known-frame entry path emits exact prologues for those targets.
-- Loads/stores for simple C locals and parameters are still unchanged between
- O0 and O1 in the probe.
-- Boolean branch lowering still materializes compare results:
-
-```asm
-cmp w19, #0x64
-cset w20, gt
-cmp w20, #0x0
-b.eq ...
-```
-
-The desired local shape is a direct conditional branch:
+A May 2026 probe compiled existing parse corpus cases at `-O0` and `-O1` and
+disassembled the results with `llvm-objdump -dr`. The cases covered straight-line
+scalar arithmetic, a direct call, a while loop, and a backward-goto loop:
+
+| case | AArch64 O0 -> O1 | x64 O0 -> O1 | RV64 O0 -> O1 |
+| ---- | ---------------- | ------------ | ------------- |
+| `6_5_17_compound_assign` | 40 -> 18 insns, 160 -> 72 bytes | 93 -> 22 insns, 158 -> 96 bytes | 53 -> 28 insns, 212 -> 112 bytes |
+| `6_5_24_func_call` | 70 -> 26 insns, 276 -> 100 bytes | 194 -> 34 insns, 281 -> 131 bytes | 97 -> 47 insns, 384 -> 184 bytes |
+| `6_8_02_while_sum` | 52 -> 26 insns, 208 -> 104 bytes | 100 -> 29 insns, 210 -> 127 bytes | 67 -> 38 insns, 268 -> 152 bytes |
+| `6_8_11_goto_backward` | 46 -> 20 insns, 184 -> 80 bytes | 93 -> 24 insns, 186 -> 106 bytes | 61 -> 32 insns, 244 -> 128 bytes |
+
+Observed progress:
+
+- O1 is materially smaller across all three probed targets.
+- The old prologue NOP sleds are gone on O1 for AArch64, x64, and RV64.
+- Simple loop locals are now kept in registers in the AArch64, x64, and RV64
+ loop probes instead of being reloaded from stack slots on each iteration.
+- Branch-consuming compares now lower to direct compare branches in the probed
+ AArch64 cases. For example, the while-loop condition is:
```asm
-cmp w19, #0x64
-b.le ...
+cmp w19, #0xa
+b.ge ...
```
-The same pattern appears in loop conditions.
+ The larger short-circuit control-flow probe likewise shows `cmp` plus direct
+ conditional branches and no `cset` bridge for branch consumers.
+
+Remaining O1 shape issues visible in the same dumps:
+
+- Cheap branch layout cleanup is still missing. O1 frequently emits an
+ unconditional branch to the immediately following block, including before
+ loop headers and before shared return epilogues.
+- O1 still saves/restores more callee-saved registers than the body appears to
+ use in small functions. For example, the AArch64 while-loop probe saves
+ `x22` even though the body uses `x19-x21`; x64 and RV64 show similar
+ over-preservation.
+- Post-RA copy cleanup still leaves avoidable moves such as `add tmp, ...`
+ followed by `mov live, tmp`. More two-address or destination-selection
+ folding would help on all targets.
+- Parameter and entry-slot promotion is incomplete. The trivial identity
+ function at O1 still stores `w0` to a frame slot and reloads it before
+ returning on AArch64.
+- Even trivial local constant folding is still absent in O1. The
+ `int x = 40; x += 2; return x;` probe is much smaller than O0, but it still
+ materializes an immediate and emits an add instead of returning `42`.
## Todo: Code Quality
MIR's O1 path suggests these high-value local cleanups that still fit cfree's
fast tier:
-1. Fuse compare-result conditionals into direct compare branches.
- MIR's C frontend emits direct compare branches for conditional compares, and
- its generic compare-value-plus-branch fusion lives in the heavier SSA
- combine path. For cfree O1, recover or preserve `IR_CMP_BRANCH` instead of
- lowering through `IR_CMP` plus `IR_CONDBR` when the compare result has a
- single branch use. This should use target `cmp_branch` support and should
- remove the `cmp; cset; cmp #0; b.cond` pattern on AArch64.
+1. Keep compare-branch fusion covered by tests.
+ The current probes show direct `cmp` plus branch shapes for branch-consuming
+ compares on AArch64. Add focused regression coverage so the old
+ `cmp; cset; cmp #0; b.cond` bridge does not return.
-2. Promote simple scalar locals before backend allocation.
+2. Promote remaining scalar entry slots before backend allocation.
MIR's C frontend represents normal scalar block locals as MIR registers and
leaves stack slots for aggregates, forced-stack cases, and address-taken
- values. O1 still stores parameters and scalar locals to frame slots and
- reloads them in straightforward code. A conservative mem2reg-lite pass
- should promote locals whose address does not escape, starting with
- integer/pointer scalars in single-entry structured control flow.
+ values. In the probes, O1 now keeps simple loop locals in registers but still
+ stores and reloads some parameter/entry slots. A conservative mem2reg-lite
+ pass should promote the remaining integer/pointer scalars whose address does
+ not escape, starting with parameters and single-entry structured control flow.
3. Clean up local branch layout artifacts.
MIR's full jump optimizer is O2-only, but its cheap pieces are appropriate
@@ -234,10 +252,22 @@ fast tier:
branch-to-branch targets, and invert a branch when it removes an
unconditional jump. Avoid full CFG layout work.
-- Continue tightening post-rewrite DCE.
- Model hard-register call arguments and call clobbers precisely enough to
- delete dead caller-saved defs before calls without removing required ABI
- traffic.
+4. Avoid unnecessary callee-save traffic.
+ Reserve and preserve only hard registers that survive final post-rewrite
+ cleanup, and consider caller-saved registers for values that are not live
+ across calls. This would make small leaf functions much closer to expected
+ O1 output without requiring global optimization.
+
+5. Continue tightening post-rewrite DCE and copy cleanup.
+ Model hard-register call arguments and call clobbers precisely enough to
+ delete dead caller-saved defs before calls without removing required ABI
+ traffic. Also fold single-use arithmetic temporaries into their destination
+ when target constraints allow it.
+
+6. Add tiny local constant simplification where it is cheap.
+ O1 should not grow full SSA value optimization, but folding immediate-only
+ straight-line arithmetic before allocation would remove obvious code in
+ small functions without pulling O2 machinery into the fast tier.
- Keep `opt_combine` legality target-aware.
Existing one-use copy/immediate/convert folds should stay conservative. New
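The cheap branch-layout cleanups in TODO item 3 (drop a jump to the immediately following block, retarget branch-to-branch chains, invert a conditional branch when that removes a jump) can be sketched on a toy block model. This is a minimal illustration, not cfree's IR: `Block`, `TermKind`, `cleanup_branches`, and the assumption that inverse condition codes differ only in bit 0 are all hypothetical.

```c
#include <assert.h>

/* Hypothetical toy terminator model; not cfree's IR. */
typedef enum { T_JMP, T_BR, T_RET } TermKind;

typedef struct {
  TermKind kind;
  int cond;        /* T_BR condition code; inverse codes differ in bit 0 */
  int target;      /* T_JMP target, or T_BR taken target (index in [0, n)) */
  int other;       /* T_BR not-taken target */
  int empty;       /* 1 if the block holds nothing but its terminator */
  int fallthrough; /* output: 1 if the final jump can be omitted */
} Block;

static int invert_cond(int cond) { return cond ^ 1; }

/* Follow empty jump-only blocks (branch-to-branch). The hop limit stops
   the walk on cycles made only of empty blocks. */
static int thread_target(const Block *b, int n, int t) {
  for (int hops = 0; hops < n && b[t].kind == T_JMP && b[t].empty; ++hops)
    t = b[t].target;
  return t;
}

/* Blocks are in layout order: block i falls through to block i + 1. */
static void cleanup_branches(Block *b, int n) {
  assert(n > 0);
  for (int i = 0; i < n; ++i) {
    Block *x = &b[i];
    int next = i + 1;
    x->fallthrough = 0;
    if (x->kind == T_JMP) {
      x->target = thread_target(b, n, x->target);
      x->fallthrough = x->target == next;
    } else if (x->kind == T_BR) {
      x->target = thread_target(b, n, x->target);
      x->other = thread_target(b, n, x->other);
      /* Taken edge goes to the next block: invert so it falls through. */
      if (x->target == next && x->other != next) {
        int t = x->target;
        x->target = x->other;
        x->other = t;
        x->cond = invert_cond(x->cond);
      }
      x->fallthrough = x->other == next;
    }
  }
}
```

A jump marked `fallthrough` emits nothing; a conditional branch with `fallthrough` set emits only `b.cond target`. No blocks are moved, which keeps this within the "avoid full CFG layout work" limit.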
diff --git a/src/api/cg.c b/src/api/cg.c
@@ -990,6 +990,7 @@ typedef enum SResidency {
RES_INHERENT,
RES_REG,
RES_SPILLED,
+ RES_FIXED_REG,
} SResidency;
typedef enum ApiSValueKind {
@@ -1007,8 +1008,11 @@ typedef struct ApiSValue {
u8 res;
u8 pinned;
u8 lvalue;
- u8 pad[1];
+ u8 cmp_a_owned;
+ u8 cmp_b_owned;
+ u8 pad[3];
FrameSlot spill_slot;
+ CfreeCgLocal source_local;
} ApiSValue;
#define API_CG_STACK_INITIAL 16u
@@ -1036,7 +1040,8 @@ typedef struct ApiSourceLocal {
CfreeSym name;
CfreeCgLocalAttrs attrs;
SrcLoc loc;
- FrameSlot slot;
+ CGLocalDesc desc;
+ CGLocalStorage storage;
u32 param_index;
u8 kind;
u8 pad[3];
@@ -1256,6 +1261,7 @@ static ApiSValue api_make_sv(Operand op, CfreeCgTypeId ty) {
sv.type = ty;
sv.res = api_residency_for(&op);
sv.spill_slot = FRAME_SLOT_NONE;
+ sv.source_local = CFREE_CG_LOCAL_NONE;
return sv;
}
@@ -1266,7 +1272,8 @@ static ApiSValue api_make_lv(Operand op, CfreeCgTypeId ty) {
}
static ApiSValue api_make_cmp(CmpOp op, Operand a, Operand b,
- CfreeCgTypeId result_ty) {
+ CfreeCgTypeId result_ty, int a_owned,
+ int b_owned) {
ApiSValue sv;
memset(&sv, 0, sizeof sv);
sv.kind = SV_CMP;
@@ -1274,8 +1281,11 @@ static ApiSValue api_make_cmp(CmpOp op, Operand a, Operand b,
sv.cmp_op = op;
sv.cmp_a = a;
sv.cmp_b = b;
+ sv.cmp_a_owned = a_owned ? 1u : 0u;
+ sv.cmp_b_owned = b_owned ? 1u : 0u;
sv.res = RES_INHERENT;
sv.spill_slot = FRAME_SLOT_NONE;
+ sv.source_local = CFREE_CG_LOCAL_NONE;
return sv;
}
@@ -1298,7 +1308,10 @@ static int api_sv_op_is_reg_or_imm(const ApiSValue *sv) {
}
static int api_is_lvalue_sv(const ApiSValue *sv) {
- return sv->lvalue && api_operand_can_address(&sv->op);
+ return sv->lvalue &&
+ (api_operand_can_address(&sv->op) ||
+ (sv->source_local != CFREE_CG_LOCAL_NONE &&
+ sv->op.kind == OPK_REG));
}
static void api_stack_grow(CfreeCg *g, u32 want) {
@@ -1554,29 +1567,41 @@ static void api_release_operand_reg(CfreeCg *g, Operand op) {
api_free_reg(g, op.v.reg, op.cls);
}
+static int api_sv_owns_operand_reg(const ApiSValue *sv, const Operand *op) {
+ return sv->res == RES_REG && op->kind == OPK_REG && sv->op.kind == OPK_REG &&
+ sv->op.v.reg == op->v.reg && sv->op.cls == op->cls;
+}
+
static void api_release_cmp(CfreeCg *g, ApiSValue *sv) {
- api_release_operand_reg(g, sv->cmp_a);
- if (sv->cmp_b.kind != OPK_REG || sv->cmp_a.kind != OPK_REG ||
- sv->cmp_b.v.reg != sv->cmp_a.v.reg || sv->cmp_b.cls != sv->cmp_a.cls) {
+ if (sv->cmp_a_owned)
+ api_release_operand_reg(g, sv->cmp_a);
+ if (sv->cmp_b_owned &&
+ (sv->cmp_b.kind != OPK_REG || sv->cmp_a.kind != OPK_REG ||
+ sv->cmp_b.v.reg != sv->cmp_a.v.reg || sv->cmp_b.cls != sv->cmp_a.cls ||
+ !sv->cmp_a_owned)) {
api_release_operand_reg(g, sv->cmp_b);
}
memset(&sv->cmp_a, 0, sizeof sv->cmp_a);
memset(&sv->cmp_b, 0, sizeof sv->cmp_b);
+ sv->cmp_a_owned = 0;
+ sv->cmp_b_owned = 0;
sv->kind = SV_OPERAND;
}
static void api_materialize_cmp_to(CfreeCg *g, ApiSValue *sv, Operand dst) {
g->target->cmp(g->target, sv->cmp_op, dst, sv->cmp_a, sv->cmp_b);
- if (sv->cmp_a.kind == OPK_REG &&
+ if (sv->cmp_a_owned && sv->cmp_a.kind == OPK_REG &&
(sv->cmp_a.v.reg != dst.v.reg || sv->cmp_a.cls != dst.cls)) {
api_release_operand_reg(g, sv->cmp_a);
}
- if (sv->cmp_b.kind == OPK_REG &&
+ if (sv->cmp_b_owned && sv->cmp_b.kind == OPK_REG &&
(sv->cmp_b.v.reg != dst.v.reg || sv->cmp_b.cls != dst.cls)) {
api_release_operand_reg(g, sv->cmp_b);
}
memset(&sv->cmp_a, 0, sizeof sv->cmp_a);
memset(&sv->cmp_b, 0, sizeof sv->cmp_b);
+ sv->cmp_a_owned = 0;
+ sv->cmp_b_owned = 0;
sv->kind = SV_OPERAND;
sv->op = dst;
sv->type = dst.type;
@@ -2302,6 +2327,14 @@ static int api_source_flags_addr_taken(u32 flags) {
return (flags & CFREE_CG_LOCAL_ADDRESS_TAKEN) != 0;
}
+static int api_local_requires_memory(CfreeCg *g, CfreeCgTypeId ty,
+ CfreeCgLocalAttrs attrs) {
+ if (api_source_flags_addr_taken(attrs.flags))
+ return 1;
+ return !(cg_type_is_int(g->c, ty) || cg_type_is_float(g->c, ty) ||
+ cg_type_is_ptr(g->c, ty));
+}
+
static CfreeCgLocal api_local_handle(u32 index) {
u32 raw = index + 1u;
if (!raw)
@@ -2343,11 +2376,29 @@ static ApiSourceLocal *api_local_from_handle(CfreeCg *g, CfreeCgLocal local) {
return &g->locals[index];
}
+static CGLocalStorage api_frame_local_storage(CfreeCg *g,
+ const CGLocalDesc *d) {
+ FrameSlotDesc fsd;
+ CGLocalStorage st;
+ memset(&fsd, 0, sizeof fsd);
+ fsd.type = d->type;
+ fsd.name = d->name;
+ fsd.loc = d->loc;
+ fsd.size = d->size;
+ fsd.align = d->align;
+ fsd.kind = FS_LOCAL;
+ if (d->flags & CG_LOCAL_ADDR_TAKEN)
+ fsd.flags |= FSF_ADDR_TAKEN;
+ st.kind = CG_LOCAL_STORAGE_FRAME;
+ st.v.frame_slot = g->target->frame_slot(g->target, &fsd);
+ return st;
+}
+
CfreeCgLocal cfree_cg_local(CfreeCg *g, CfreeCgTypeId type,
CfreeCgLocalAttrs attrs) {
CfreeCgTypeId ty;
- FrameSlotDesc fsd;
- FrameSlot slot;
+ CGLocalDesc desc;
+ CGLocalStorage storage;
ApiSourceLocal *rec;
CfreeCgLocal handle;
if (!g)
@@ -2358,22 +2409,31 @@ CfreeCgLocal cfree_cg_local(CfreeCg *g, CfreeCgTypeId type,
handle = api_local_handle(g->nlocals);
if (handle == CFREE_CG_LOCAL_NONE || !api_grow_locals(g, g->nlocals + 1u))
return CFREE_CG_LOCAL_NONE;
- memset(&fsd, 0, sizeof fsd);
- fsd.type = ty;
- fsd.name = (Sym)attrs.name;
- fsd.loc = g->cur_loc;
- fsd.size = abi_cg_sizeof(g->c->abi, type);
- fsd.align = attrs.align ? attrs.align : abi_cg_alignof(g->c->abi, type);
- fsd.kind = FS_LOCAL;
+ memset(&desc, 0, sizeof desc);
+ desc.type = ty;
+ desc.name = (Sym)attrs.name;
+ desc.loc = g->cur_loc;
+ desc.size = abi_cg_sizeof(g->c->abi, type);
+ desc.align = attrs.align ? attrs.align : abi_cg_alignof(g->c->abi, type);
if (api_source_flags_addr_taken(attrs.flags))
- fsd.flags |= FSF_ADDR_TAKEN;
- slot = g->target->frame_slot(g->target, &fsd);
+ desc.flags |= CG_LOCAL_ADDR_TAKEN;
+ if (api_local_requires_memory(g, ty, attrs))
+ desc.flags |= CG_LOCAL_MEMORY_REQUIRED;
+ if (g->target->local)
+ storage = g->target->local(g->target, &desc);
+ else
+ storage = api_frame_local_storage(g, &desc);
+ if (storage.kind == CG_LOCAL_STORAGE_REG) {
+ cg_simple_regalloc_reserve(&g->regalloc, (RegClass)api_type_class(ty),
+ storage.v.reg);
+ }
rec = &g->locals[g->nlocals++];
rec->type = ty;
rec->name = attrs.name;
rec->attrs = attrs;
rec->loc = g->cur_loc;
- rec->slot = slot;
+ rec->desc = desc;
+ rec->storage = storage;
rec->param_index = 0;
rec->kind = API_SOURCE_LOCAL_AUTO;
return handle;
@@ -2425,7 +2485,17 @@ CfreeCgLocal cfree_cg_param(CfreeCg *g, uint32_t index, CfreeCgTypeId type,
rec->name = attrs.name;
rec->attrs = attrs;
rec->loc = g->cur_loc;
- rec->slot = slot;
+ memset(&rec->desc, 0, sizeof rec->desc);
+ rec->desc.type = ty;
+ rec->desc.name = (Sym)attrs.name;
+ rec->desc.loc = g->cur_loc;
+ rec->desc.size = fsd.size;
+ rec->desc.align = fsd.align;
+ rec->desc.flags = api_source_flags_addr_taken(attrs.flags)
+ ? CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED
+ : CG_LOCAL_MEMORY_REQUIRED;
+ rec->storage.kind = CG_LOCAL_STORAGE_FRAME;
+ rec->storage.v.frame_slot = slot;
rec->param_index = index;
rec->kind = API_SOURCE_LOCAL_PARAM;
return handle;
@@ -2530,6 +2600,27 @@ static void api_push_frame_lvalue(CfreeCg *g, FrameSlot slot,
api_push(g, api_make_lv(api_op_local(slot, type), type));
}
+static void api_push_source_frame_lvalue(CfreeCg *g, CfreeCgLocal local,
+ FrameSlot slot, CfreeCgTypeId type) {
+ ApiSValue sv;
+ if (!g)
+ return;
+ sv = api_make_lv(api_op_local(slot, type), type);
+ sv.source_local = local;
+ api_push(g, sv);
+}
+
+static void api_push_source_reg_lvalue(CfreeCg *g, CfreeCgLocal local, Reg reg,
+ CfreeCgTypeId type) {
+ ApiSValue sv;
+ if (!g)
+ return;
+ sv = api_make_lv(api_op_reg(reg, type), type);
+ sv.res = RES_FIXED_REG;
+ sv.source_local = local;
+ api_push(g, sv);
+}
+
void cfree_cg_push_local(CfreeCg *g, CfreeCgLocal local) {
ApiSourceLocal *rec;
if (!g)
@@ -2537,7 +2628,15 @@ void cfree_cg_push_local(CfreeCg *g, CfreeCgLocal local) {
rec = api_local_from_handle(g, local);
if (!rec)
return;
- api_push_frame_lvalue(g, rec->slot, rec->type);
+ if (rec->kind == API_SOURCE_LOCAL_AUTO &&
+ rec->storage.kind == CG_LOCAL_STORAGE_REG) {
+ api_push_source_reg_lvalue(g, local, rec->storage.v.reg, rec->type);
+ } else if (rec->kind == API_SOURCE_LOCAL_AUTO) {
+ api_push_source_frame_lvalue(g, local, rec->storage.v.frame_slot,
+ rec->type);
+ } else {
+ api_push_frame_lvalue(g, rec->storage.v.frame_slot, rec->type);
+ }
}
void cfree_cg_push_local_addr(CfreeCg *g, CfreeCgLocal local) {
@@ -2650,6 +2749,16 @@ void cfree_cg_load(CfreeCg *g, CfreeCgMemAccess access) {
ty = resolve_type(g->c, access.type);
if (!ty)
ty = api_sv_type(&v);
+ if (v.source_local != CFREE_CG_LOCAL_NONE && v.op.kind == OPK_REG) {
+ dst = v.op;
+ dst.type = ty;
+ v.op = dst;
+ v.type = ty;
+ v.lvalue = 0;
+ v.res = RES_FIXED_REG;
+ api_push(g, v);
+ return;
+ }
dst = api_force_reg(g, &v, ty);
dst.type = ty;
api_push(g, api_make_sv(dst, ty));
@@ -2680,6 +2789,7 @@ void cfree_cg_addr(CfreeCg *g) {
CfreeCgTypeId pty;
Reg r;
Operand dst;
+ ApiSourceLocal *rec;
if (!g)
return;
T = g->target;
@@ -2692,7 +2802,13 @@ void cfree_cg_addr(CfreeCg *g) {
pty = cg_type_ptr_to(g->c, api_sv_type(&v));
r = api_alloc_reg_or_spill(g, RC_INT, pty);
dst = api_op_reg(r, pty);
- T->addr_of(T, dst, v.op);
+ rec = v.source_local != CFREE_CG_LOCAL_NONE
+ ? api_local_from_handle(g, v.source_local)
+ : NULL;
+ if (rec && rec->kind == API_SOURCE_LOCAL_AUTO && T->local_addr)
+ T->local_addr(T, dst, &rec->desc, rec->storage);
+ else
+ T->addr_of(T, dst, v.op);
api_release(g, &v);
api_push(g, api_make_sv(dst, pty));
}
@@ -2722,7 +2838,22 @@ void cfree_cg_store(CfreeCg *g, CfreeCgMemAccess access) {
} else {
src = api_force_reg(g, &rv, api_sv_type(&rv));
}
- T->store(T, lv.op, src, api_mem_from_access(g, &lv.op, access));
+ if (lv.source_local != CFREE_CG_LOCAL_NONE && lv.op.kind == OPK_REG) {
+ Operand dst = lv.op;
+ dst.type = ty;
+ if (src.kind == OPK_IMM) {
+ T->load_imm(T, dst, src.v.imm);
+ } else if (src.kind == OPK_REG) {
+ if (src.v.reg != dst.v.reg || src.cls != dst.cls)
+ T->copy(T, dst, src);
+ } else {
+ src = api_force_reg(g, &rv, ty);
+ if (src.v.reg != dst.v.reg || src.cls != dst.cls)
+ T->copy(T, dst, src);
+ }
+ } else {
+ T->store(T, lv.op, src, api_mem_from_access(g, &lv.op, access));
+ }
api_release(g, &lv);
api_release(g, &rv);
}
@@ -2857,7 +2988,9 @@ static void api_cg_cmp(CfreeCg *g, CmpOp cop) {
ra = api_force_reg_unless_imm(g, &a, opty);
rb = api_force_reg_unless_imm(g, &b, opty);
if (api_type_class(opty) != RC_FP) {
- api_push(g, api_make_cmp(cop, ra, rb, i32));
+ api_push(g, api_make_cmp(cop, ra, rb, i32,
+ api_sv_owns_operand_reg(&a, &ra),
+ api_sv_owns_operand_reg(&b, &rb)));
return;
}
rr = api_alloc_reg_or_spill(g, RC_INT, i32);
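The `cmp_a_owned`/`cmp_b_owned` bookkeeping in the `src/api/cg.c` changes matters because a compare operand can now be a register-backed local that the compare only borrows, so `api_release_cmp` must not free it. Below is a minimal sketch of that release rule. `Opnd`, `Pool`, and the helper names are hypothetical stand-ins for cfree's `Operand` and register pool, and a single register class is assumed.

```c
#include <assert.h>

/* Hypothetical stand-ins; one register class is assumed. */
typedef struct { int is_reg; int reg; } Opnd;
typedef struct { int busy[16]; } Pool;

static void pool_release(Pool *p, Opnd o) {
  if (!o.is_reg) return;
  assert(p->busy[o.reg]); /* a double release would trip this */
  p->busy[o.reg] = 0;
}

static int same_reg(Opnd a, Opnd b) {
  return a.is_reg && b.is_reg && a.reg == b.reg;
}

/* Free only registers the compare owns. If both operands name the same
   owned register, free it once. */
static void release_cmp(Pool *p, Opnd a, int a_owned, Opnd b, int b_owned) {
  if (a_owned) pool_release(p, a);
  if (b_owned && !(a_owned && same_reg(a, b))) pool_release(p, b);
}
```

The second condition is the same as the one in `api_release_cmp`, rewritten with De Morgan's law: `b` is released when it is owned and is not the very register already freed through `a`.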
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -187,6 +187,35 @@ typedef struct FrameSlotDesc {
u16 flags; /* FrameSlotFlag */
} FrameSlotDesc;
+typedef enum CGLocalFlag {
+ CG_LOCAL_NONE = 0,
+ CG_LOCAL_ADDR_TAKEN = 1u << 0,
+ CG_LOCAL_MEMORY_REQUIRED = 1u << 1,
+} CGLocalFlag;
+
+typedef struct CGLocalDesc {
+ CfreeCgTypeId type;
+ Sym name;
+ SrcLoc loc;
+ u32 size;
+ u32 align;
+ u32 flags; /* CGLocalFlag */
+} CGLocalDesc;
+
+typedef enum CGLocalStorageKind {
+ CG_LOCAL_STORAGE_FRAME,
+ CG_LOCAL_STORAGE_REG,
+} CGLocalStorageKind;
+
+typedef struct CGLocalStorage {
+ u8 kind; /* CGLocalStorageKind */
+ u8 pad[3];
+ union {
+ FrameSlot frame_slot;
+ Reg reg;
+ } v;
+} CGLocalStorage;
+
typedef enum MemFlag {
MF_NONE = 0,
MF_VOLATILE = 1u << 0,
@@ -506,6 +535,9 @@ struct CGTarget {
* regs to the target. Plain machine targets consume hard regs; opt_cgtarget
* sets virtual_regs and records virtual Reg ids as SSA values. */
FrameSlot (*frame_slot)(CGTarget*, const FrameSlotDesc*);
+ CGLocalStorage (*local)(CGTarget*, const CGLocalDesc*);
+ void (*local_addr)(CGTarget*, Operand dst, const CGLocalDesc*,
+ CGLocalStorage);
void (*param)(CGTarget*, const CGParamDesc*);
void (*spill_reg)(CGTarget*, Operand src_reg, FrameSlot, MemAccess);
void (*reload_reg)(CGTarget*, Operand dst_reg, FrameSlot, MemAccess);
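Reading `api_local_requires_memory`, the new `CGLocalFlag`/`CGLocalStorage` types, and the opt target's `w_local` together, the placement rule appears to be the following. A local gets a register only when its type is an integer, float, or pointer and its address is not known to be taken; everything else gets a frame slot. When the target has no `local` hook, `cfree_cg_local` falls back to a frame slot. `cfree_cg_param` always sets `CG_LOCAL_MEMORY_REQUIRED`, which matches the note above that parameter promotion is incomplete. The sketch uses hypothetical simplified types (`TyKind`, `LocalStorage`) and plain counters in place of `ir_alloc_val` and `ir_frame_slot_new`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplified mirror of CGLocalFlag / CGLocalStorage. */
enum { LOCAL_ADDR_TAKEN = 1u << 0, LOCAL_MEMORY_REQUIRED = 1u << 1 };
typedef enum { TY_INT, TY_FLOAT, TY_PTR, TY_STRUCT } TyKind;
typedef enum { STORAGE_FRAME, STORAGE_REG } StorageKind;

typedef struct {
  StorageKind kind;
  union { uint32_t frame_slot; uint32_t reg; } v; /* tagged by kind */
} LocalStorage;

/* Address-taken or non-scalar locals must live in memory. */
static uint32_t local_flags(TyKind ty, int addr_taken) {
  uint32_t f = 0;
  if (addr_taken) f |= LOCAL_ADDR_TAKEN;
  if (addr_taken || ty == TY_STRUCT) f |= LOCAL_MEMORY_REQUIRED;
  return f;
}

/* Hand out the next virtual register or the next frame slot. */
static LocalStorage place_local(uint32_t flags, uint32_t *next_vreg,
                                uint32_t *next_slot) {
  LocalStorage st;
  assert(next_vreg && next_slot);
  if ((flags & (LOCAL_ADDR_TAKEN | LOCAL_MEMORY_REQUIRED)) == 0) {
    st.kind = STORAGE_REG;
    st.v.reg = (*next_vreg)++;
  } else {
    st.kind = STORAGE_FRAME;
    st.v.frame_slot = (*next_slot)++;
  }
  return st;
}
```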
diff --git a/src/arch/regalloc.c b/src/arch/regalloc.c
@@ -111,7 +111,10 @@ int cg_simple_regalloc_free(CGSimpleRegAlloc* a, RegClass cls, Reg r) {
void cg_simple_regalloc_reserve(CGSimpleRegAlloc* a, RegClass cls, Reg r) {
if ((u32)cls >= 3u) return;
- if (a->virtual_regs) return;
+ if (a->virtual_regs) {
+ if (r != (Reg)REG_NONE && r >= a->next_virtual) a->next_virtual = r + 1u;
+ return;
+ }
cg_simple_regpool_reserve(&a->pools[cls], r);
}
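In virtual-register mode, `cg_simple_regalloc_reserve` now advances `next_virtual` past a reserved id instead of returning early. The apparent intent is that a virtual id minted for a register-backed local is never handed out again as a temporary. Here is a minimal sketch of that counter logic. `VAlloc`, `alloc_virtual`, `reserve_virtual`, and `SKETCH_REG_NONE` are hypothetical names, and the physical register pools are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_REG_NONE UINT32_MAX /* hypothetical "no register" sentinel */

typedef struct {
  int virtual_regs;      /* nonzero when register ids are SSA value ids */
  uint32_t next_virtual; /* next id that alloc_virtual hands out */
} VAlloc;

static uint32_t alloc_virtual(VAlloc *a) {
  assert(a->virtual_regs);
  return a->next_virtual++;
}

/* Reserving an id minted elsewhere bumps the counter past it, so
   alloc_virtual can never return that id again. */
static void reserve_virtual(VAlloc *a, uint32_t r) {
  if (!a->virtual_regs) return; /* physical pools are not modeled here */
  if (r != SKETCH_REG_NONE && r >= a->next_virtual) a->next_virtual = r + 1u;
}
```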
diff --git a/src/opt/ir.c b/src/opt/ir.c
@@ -177,6 +177,23 @@ void ir_param_add(Func* f, const CGParamDesc* d) {
p->loc = d->loc;
}
+u32 ir_local_add(Func* f, const CGLocalDesc* d, CGLocalStorage storage) {
+ IRLocal* l;
+ if (f->nlocals == f->locals_cap) {
+ u32 ncap = f->locals_cap ? f->locals_cap * 2u : 8u;
+ IRLocal* nb = arena_zarray(f->arena, IRLocal, ncap);
+ if (f->locals) memcpy(nb, f->locals, sizeof(IRLocal) * f->nlocals);
+ f->locals = nb;
+ f->locals_cap = ncap;
+ }
+ l = &f->locals[f->nlocals];
+ l->id = f->nlocals + 1u;
+ l->desc = *d;
+ l->storage = storage;
+ ++f->nlocals;
+ return l->id;
+}
+
/* ---- construction ---- */
Func* ir_func_new(Compiler* c, const CGFuncDesc* desc) {
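`ir_local_add` records each local in an array that doubles when full (starting at 8 entries) and returns a 1-based id, so 0 can plausibly serve as a "none" value. The sketch below keeps the same growth policy but swaps in `realloc` for cfree's arena allocation (`arena_zarray` plus `memcpy`). `Local`, `LocalTable`, and `local_add` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uint32_t id; uint32_t size; } Local; /* trimmed IRLocal */
typedef struct { Local *locals; uint32_t n, cap; } LocalTable;

/* Appends a local and returns its 1-based id, or 0 if allocation fails. */
static uint32_t local_add(LocalTable *t, uint32_t size) {
  assert(t != NULL);
  if (t->n == t->cap) {
    uint32_t ncap = t->cap ? t->cap * 2u : 8u; /* same growth as ir_local_add */
    Local *nb = realloc(t->locals, sizeof *nb * ncap);
    if (!nb) return 0;
    t->locals = nb;
    t->cap = ncap;
  }
  Local *l = &t->locals[t->n];
  l->id = t->n + 1u;
  l->size = size;
  ++t->n;
  return l->id;
}
```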
diff --git a/src/opt/ir.h b/src/opt/ir.h
@@ -217,6 +217,15 @@ typedef struct IRParam {
SrcLoc loc;
} IRParam;
+typedef struct IRLocal {
+ u32 id;
+ CGLocalDesc desc;
+ CGLocalStorage storage;
+ FrameSlot home_slot;
+ u8 address_taken;
+ u8 pad[3];
+} IRLocal;
+
/* ---- Inst / Block / Func ---- */
typedef struct Inst {
@@ -293,6 +302,8 @@ typedef struct Func {
u32 nframe_slots, frame_slots_cap;
IRParam* params;
u32 nparams, params_cap;
+ IRLocal* locals;
+ u32 nlocals, locals_cap;
/* Value table. Index 0 is VAL_NONE; first allocated Val is 1. */
u32* val_def_block;
@@ -352,6 +363,7 @@ Func* ir_func_new(Compiler*, const CGFuncDesc*);
u32 ir_block_new(Func*);
FrameSlot ir_frame_slot_new(Func*, const FrameSlotDesc*);
void ir_param_add(Func*, const CGParamDesc*);
+u32 ir_local_add(Func*, const CGLocalDesc*, CGLocalStorage);
Val ir_alloc_val(Func*, CfreeCgTypeId, u8 cls);
void ir_ensure_val(Func*, Val, CfreeCgTypeId, u8 cls);
diff --git a/src/opt/opt.c b/src/opt/opt.c
@@ -127,6 +127,7 @@ static void w_func_begin(CGTarget* t, const CGFuncDesc* fd) {
}
static void w_func_end(CGTarget* t);
+static void w_addr_of(CGTarget* t, Operand dst, Operand lv);
/* ---- registers and frame slots ---- */
@@ -135,6 +136,254 @@ static FrameSlot w_frame_slot(CGTarget* t, const FrameSlotDesc* d) {
return ir_frame_slot_new(o->f, d);
}
+static FrameSlot opt_local_frame_slot(Func* f, const CGLocalDesc* d,
+ int force_addr_taken) {
+ FrameSlotDesc fsd;
+ memset(&fsd, 0, sizeof fsd);
+ fsd.type = d->type;
+ fsd.name = d->name;
+ fsd.loc = d->loc;
+ fsd.size = d->size;
+ fsd.align = d->align;
+ fsd.kind = FS_LOCAL;
+ if (force_addr_taken || (d->flags & CG_LOCAL_ADDR_TAKEN))
+ fsd.flags |= FSF_ADDR_TAKEN;
+ return ir_frame_slot_new(f, &fsd);
+}
+
+static u8 opt_local_reg_class_for(Compiler* c, CfreeCgTypeId ty) {
+ CfreeCgTypeKind kind = cfree_cg_type_kind((CfreeCompiler*)c, ty);
+ return kind == CFREE_CG_TYPE_FLOAT ? RC_FP : RC_INT;
+}
+
+static u8 opt_local_reg_class(OptImpl* o, CfreeCgTypeId ty) {
+ return opt_local_reg_class_for(o->c, ty);
+}
+
+static CGLocalStorage w_local(CGTarget* t, const CGLocalDesc* d) {
+ OptImpl* o = impl_of(t);
+ CGLocalStorage st;
+ memset(&st, 0, sizeof st);
+ if ((d->flags & (CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED)) == 0) {
+ Val v = ir_alloc_val(o->f, d->type, opt_local_reg_class(o, d->type));
+ st.kind = CG_LOCAL_STORAGE_REG;
+ st.v.reg = (Reg)v;
+ } else {
+ st.kind = CG_LOCAL_STORAGE_FRAME;
+ st.v.frame_slot = opt_local_frame_slot(o->f, d, 0);
+ }
+ ir_local_add(o->f, d, st);
+ return st;
+}
+
+static IRLocal* opt_find_local_by_reg(Func* f, Reg reg) {
+ for (u32 i = 0; i < f->nlocals; ++i) {
+ IRLocal* l = &f->locals[i];
+ if (l->storage.kind == CG_LOCAL_STORAGE_REG && l->storage.v.reg == reg)
+ return l;
+ }
+ return NULL;
+}
+
+static void w_local_addr(CGTarget* t, Operand dst, const CGLocalDesc* d,
+ CGLocalStorage st) {
+ OptImpl* o = impl_of(t);
+ IRLocal* local = NULL;
+ FrameSlot frame_slot = FRAME_SLOT_NONE;
+ const CGLocalDesc* desc = d;
+ if (st.kind == CG_LOCAL_STORAGE_REG) {
+ local = opt_find_local_by_reg(o->f, st.v.reg);
+ if (!local) {
+ compiler_panic(o->c, d ? d->loc : o->pending_loc,
+ "opt_cgtarget: unknown register-backed local address");
+ }
+ if (local->home_slot == FRAME_SLOT_NONE)
+ local->home_slot = opt_local_frame_slot(o->f, &local->desc, 1);
+ local->address_taken = 1;
+ local->desc.flags |= CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED;
+ frame_slot = local->home_slot;
+ desc = &local->desc;
+ } else {
+ frame_slot = st.v.frame_slot;
+ }
+ Operand lv;
+ memset(&lv, 0, sizeof lv);
+ lv.kind = OPK_LOCAL;
+ lv.cls = RC_INT;
+ lv.type = desc ? desc->type : dst.type;
+ lv.v.frame_slot = frame_slot;
+ w_addr_of(t, dst, lv);
+}
+
+static Operand opt_local_addr_operand(IRLocal* l) {
+ Operand o;
+ memset(&o, 0, sizeof o);
+ o.kind = OPK_LOCAL;
+ o.cls = RC_INT;
+ o.type = l->desc.type;
+ o.v.frame_slot = l->home_slot;
+ return o;
+}
+
+static MemAccess opt_local_mem(IRLocal* l) {
+ MemAccess m;
+ memset(&m, 0, sizeof m);
+ m.type = l->desc.type;
+ m.size = l->desc.size;
+ m.align = l->desc.align;
+ m.alias.kind = ALIAS_LOCAL;
+ m.alias.v.local_id = (i32)l->home_slot;
+ return m;
+}
+
+static int inst_defines_val(const Inst* in, Val v) {
+ if (!in || v == VAL_NONE) return 0;
+ if (in->def == v) return 1;
+ for (u32 i = 0; i < in->ndefs; ++i)
+ if (in->defs[i] == v) return 1;
+ return 0;
+}
+
+static int op_uses_reg(const Operand* op, Reg reg) {
+ if (!op) return 0;
+ if (op->kind == OPK_REG && op->v.reg == reg) return 1;
+ if (op->kind == OPK_INDIRECT && op->v.ind.base == reg) return 1;
+ return 0;
+}
+
+static int abivalue_uses_reg(const CGABIValue* v, Reg reg) {
+ if (!v) return 0;
+ if (op_uses_reg(&v->storage, reg)) return 1;
+ for (u32 i = 0; i < v->nparts; ++i)
+ if (op_uses_reg(&v->parts[i].op, reg)) return 1;
+ return 0;
+}
+
+static int inst_uses_local_reg(const Inst* in, Reg reg) {
+ if (!in) return 0;
+ for (u32 i = 0; i < in->nopnds; ++i) {
+ int is_def = i == 0 && in->opnds[i].kind == OPK_REG &&
+ inst_defines_val(in, (Val)in->opnds[i].v.reg);
+ if (!is_def && op_uses_reg(&in->opnds[i], reg)) return 1;
+ }
+ switch ((IROp)in->op) {
+ case IR_CALL: {
+ IRCallAux* aux = (IRCallAux*)in->extra.aux;
+ if (!aux) return 0;
+ if (op_uses_reg(&aux->desc.callee, reg)) return 1;
+ for (u32 i = 0; i < aux->desc.nargs; ++i)
+ if (abivalue_uses_reg(&aux->desc.args[i], reg)) return 1;
+ return 0;
+ }
+ case IR_RET: {
+ IRRetAux* aux = (IRRetAux*)in->extra.aux;
+ return aux && aux->present && abivalue_uses_reg(&aux->val, reg);
+ }
+ case IR_SCOPE_BEGIN: {
+ IRScopeAux* aux = (IRScopeAux*)in->extra.aux;
+ return aux && op_uses_reg(&aux->desc.cond, reg);
+ }
+ case IR_ASM_BLOCK: {
+ IRAsmAux* aux = (IRAsmAux*)in->extra.aux;
+ if (!aux) return 0;
+ for (u32 i = 0; i < aux->nin; ++i)
+ if (op_uses_reg(&aux->in_ops[i], reg)) return 1;
+ return 0;
+ }
+ case IR_INTRINSIC: {
+ IRIntrinAux* aux = (IRIntrinAux*)in->extra.aux;
+ if (!aux) return 0;
+ for (u32 i = 0; i < aux->narg; ++i)
+ if (op_uses_reg(&aux->args[i], reg)) return 1;
+ return 0;
+ }
+ default:
+ return 0;
+ }
+}
+
+static void opt_make_local_load(Func* f, Inst* out, IRLocal* l, SrcLoc loc) {
+ memset(out, 0, sizeof *out);
+ out->op = IR_LOAD;
+ out->loc = loc;
+ out->type = l->desc.type;
+ out->def = (Val)l->storage.v.reg;
+ out->opnds = arena_array(f->arena, Operand, 2);
+ out->opnds[0].kind = OPK_REG;
+ out->opnds[0].cls = opt_local_reg_class_for(f->c, l->desc.type);
+ out->opnds[0].type = l->desc.type;
+ out->opnds[0].v.reg = l->storage.v.reg;
+ out->opnds[1] = opt_local_addr_operand(l);
+ out->nopnds = 2;
+ out->extra.mem = opt_local_mem(l);
+}
+
+static void opt_make_local_store(Func* f, Inst* out, IRLocal* l, SrcLoc loc) {
+ memset(out, 0, sizeof *out);
+ out->op = IR_STORE;
+ out->loc = loc;
+ out->opnds = arena_array(f->arena, Operand, 2);
+ out->opnds[0] = opt_local_addr_operand(l);
+ out->opnds[1].kind = OPK_REG;
+ out->opnds[1].cls = opt_local_reg_class_for(f->c, l->desc.type);
+ out->opnds[1].type = l->desc.type;
+ out->opnds[1].v.reg = l->storage.v.reg;
+ out->nopnds = 2;
+ out->extra.mem = opt_local_mem(l);
+}
+
+static IRLocal* opt_addr_taken_reg_local_defined_by(Func* f, const Inst* in) {
+ if (!in) return NULL;
+ for (u32 i = 0; i < f->nlocals; ++i) {
+ IRLocal* l = &f->locals[i];
+ if (!l->address_taken || l->home_slot == FRAME_SLOT_NONE) continue;
+ if (l->storage.kind == CG_LOCAL_STORAGE_REG &&
+ inst_defines_val(in, (Val)l->storage.v.reg))
+ return l;
+ }
+ return NULL;
+}
+
+static void opt_frame_home_addr_taken_locals(Func* f) {
+ int any = 0;
+ for (u32 i = 0; i < f->nlocals; ++i) {
+ IRLocal* l = &f->locals[i];
+ if (l->address_taken && l->storage.kind == CG_LOCAL_STORAGE_REG &&
+ l->home_slot != FRAME_SLOT_NONE) {
+ any = 1;
+ break;
+ }
+ }
+ if (!any) return;
+
+ for (u32 b = 0; b < f->nblocks; ++b) {
+ Block* bl = &f->blocks[b];
+ if (!bl->ninsts) continue;
+ u32 out_cap = bl->ninsts * (f->nlocals + 2u);
+ Inst* out = arena_zarray(f->arena, Inst, out_cap ? out_cap : 1u);
+ u32 nout = 0;
+ for (u32 i = 0; i < bl->ninsts; ++i) {
+ Inst in = bl->insts[i];
+ for (u32 j = 0; j < f->nlocals; ++j) {
+ IRLocal* used = &f->locals[j];
+ if (!used->address_taken || used->home_slot == FRAME_SLOT_NONE)
+ continue;
+ if (used->storage.kind != CG_LOCAL_STORAGE_REG)
+ continue;
+ if (inst_uses_local_reg(&in, used->storage.v.reg))
+ opt_make_local_load(f, &out[nout++], used, in.loc);
+ }
+ out[nout++] = in;
+ IRLocal* defined = opt_addr_taken_reg_local_defined_by(f, &in);
+ if (defined)
+ opt_make_local_store(f, &out[nout++], defined, in.loc);
+ }
+ bl->insts = out;
+ bl->ninsts = nout;
+ bl->cap = bl->ninsts;
+ }
+}
+
static void w_param(CGTarget* t, const CGParamDesc* d) {
OptImpl* o = impl_of(t);
/* Deep-copy parts so caller-stack memory isn't relied on. */
@@ -1466,6 +1715,7 @@ static u64 func_spill_store_count(Func* f) {
static void w_func_end(CGTarget* t) {
OptImpl* o = impl_of(t);
if (!o->f) return;
+ opt_frame_home_addr_taken_locals(o->f);
if (o->level == 1) {
metrics_scope_begin(o->c, "opt.o1.total");
@@ -1603,6 +1853,8 @@ CGTarget* opt_cgtarget_new(Compiler* c, CGTarget* target, int level) {
t->func_end = w_func_end;
t->frame_slot = w_frame_slot;
+ t->local = w_local;
+ t->local_addr = w_local_addr;
t->param = w_param;
t->spill_reg = w_spill_reg;
t->reload_reg = w_reload_reg;
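`opt_frame_home_addr_taken_locals` covers a local that was placed in a virtual register and only later had its address taken. In that case `w_local_addr` gives it a `home_slot`. At function end the pass rewrites each block so that the register and the home slot agree. It inserts a load from the slot before every instruction that reads the register, and a store to the slot after every instruction that defines it. The `exercise_cg_late_local_addr` test in this commit drives that path. Below is a single-local sketch on a hypothetical three-address `Ins` list. The original instead loops over every register-backed local and sizes its buffer as `ninsts * (nlocals + 2)`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-address form; -1 means "no register". */
typedef enum { OP_SET, OP_ADD, OP_LOAD, OP_STORE, OP_RET } Op;
typedef struct { Op op; int def; int use0; int use1; } Ins;

/* Copies in[0..n) to out, which needs room for 3 * n entries, reloading
   `reg` from its frame home before each use and storing it back after each
   def. OP_LOAD defines reg from the slot; OP_STORE reads reg into the slot. */
static size_t home_local(const Ins *in, size_t n, int reg, Ins *out) {
  size_t k = 0;
  assert(reg >= 0);
  for (size_t i = 0; i < n; ++i) {
    if (in[i].use0 == reg || in[i].use1 == reg)
      out[k++] = (Ins){OP_LOAD, reg, -1, -1};
    out[k++] = in[i];
    if (in[i].def == reg)
      out[k++] = (Ins){OP_STORE, -1, reg, -1};
  }
  return k;
}
```

This is deliberately conservative. Reloading before every use means writes through the escaped pointer are always observed, at the cost of extra memory traffic for those locals.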
diff --git a/test/api/cg_type_test.c b/test/api/cg_type_test.c
@@ -130,6 +130,135 @@ static void exercise_cg_handles(CfreeCompiler* c, CfreeCgTypeId i32_ty,
obj_free((ObjBuilder*)ob);
}
+static void exercise_cg_scalar_local(CfreeCompiler* c, CfreeCgTypeId i32_ty,
+ int opt_level) {
+ char name_buf[40];
+ CfreeCompileOptions opts;
+ CfreeObjBuilder* ob;
+ CfreeCg* cg;
+ CfreeCgFuncSig sig;
+ CfreeCgDecl decl;
+ CfreeCgSym sym;
+ CfreeCgLocalAttrs attrs;
+ CfreeCgLocal local;
+ CfreeCgMemAccess mem;
+
+ memset(&opts, 0, sizeof opts);
+ opts.opt_level = opt_level;
+ ob = (CfreeObjBuilder*)obj_new((Compiler*)c);
+ EXPECT(ob != NULL, "obj builder allocation failed");
+ if (!ob) return;
+ cg = cfree_cg_new(c, ob, &opts);
+ EXPECT(cg != NULL, "cg allocation failed");
+ if (!cg) {
+ obj_free((ObjBuilder*)ob);
+ return;
+ }
+
+ memset(&sig, 0, sizeof sig);
+ sig.ret = i32_ty;
+ sig.call_conv = CFREE_CG_CC_TARGET_C;
+
+ snprintf(name_buf, sizeof name_buf, "cg_scalar_local_o%d", opt_level);
+ memset(&decl, 0, sizeof decl);
+ decl.kind = CFREE_CG_DECL_FUNC;
+ decl.linkage_name = cfree_sym_intern(c, name_buf);
+ decl.display_name = decl.linkage_name;
+ decl.type = cfree_cg_type_func(c, sig);
+ decl.sym.bind = CFREE_SB_GLOBAL;
+ decl.sym.visibility = CFREE_CG_VIS_DEFAULT;
+ sym = cfree_cg_decl(cg, decl);
+ EXPECT(sym != CFREE_CG_SYM_NONE, "scalar local decl failed");
+
+ cfree_cg_func_begin(cg, sym);
+ memset(&attrs, 0, sizeof attrs);
+ attrs.name = cfree_sym_intern(c, "x");
+ local = cfree_cg_local(cg, i32_ty, attrs);
+ EXPECT(local != CFREE_CG_LOCAL_NONE, "scalar local handle is none");
+
+ memset(&mem, 0, sizeof mem);
+ mem.type = i32_ty;
+ mem.align = cfree_cg_type_align(c, i32_ty);
+
+ cfree_cg_push_local(cg, local);
+ cfree_cg_push_int(cg, 40, i32_ty);
+ cfree_cg_store(cg, mem);
+ cfree_cg_push_local(cg, local);
+ cfree_cg_load(cg, mem);
+ cfree_cg_push_int(cg, 2, i32_ty);
+ cfree_cg_int_binop(cg, CFREE_CG_INT_ADD, 0);
+ cfree_cg_ret(cg);
+ cfree_cg_func_end(cg);
+
+ cfree_cg_free(cg);
+ obj_free((ObjBuilder*)ob);
+}
+
+static void exercise_cg_late_local_addr(CfreeCompiler* c, CfreeCgTypeId i32_ty,
+ int opt_level) {
+ char name_buf[40];
+ CfreeCompileOptions opts;
+ CfreeObjBuilder* ob;
+ CfreeCg* cg;
+ CfreeCgFuncSig sig;
+ CfreeCgDecl decl;
+ CfreeCgSym sym;
+ CfreeCgLocalAttrs attrs;
+ CfreeCgLocal local;
+ CfreeCgMemAccess mem;
+
+ memset(&opts, 0, sizeof opts);
+ opts.opt_level = opt_level;
+ ob = (CfreeObjBuilder*)obj_new((Compiler*)c);
+ EXPECT(ob != NULL, "obj builder allocation failed");
+ if (!ob) return;
+ cg = cfree_cg_new(c, ob, &opts);
+ EXPECT(cg != NULL, "cg allocation failed");
+ if (!cg) {
+ obj_free((ObjBuilder*)ob);
+ return;
+ }
+
+ memset(&sig, 0, sizeof sig);
+ sig.ret = i32_ty;
+ sig.call_conv = CFREE_CG_CC_TARGET_C;
+
+ snprintf(name_buf, sizeof name_buf, "cg_late_local_addr_o%d", opt_level);
+ memset(&decl, 0, sizeof decl);
+ decl.kind = CFREE_CG_DECL_FUNC;
+ decl.linkage_name = cfree_sym_intern(c, name_buf);
+ decl.display_name = decl.linkage_name;
+ decl.type = cfree_cg_type_func(c, sig);
+ decl.sym.bind = CFREE_SB_GLOBAL;
+ decl.sym.visibility = CFREE_CG_VIS_DEFAULT;
+ sym = cfree_cg_decl(cg, decl);
+ EXPECT(sym != CFREE_CG_SYM_NONE, "late local addr decl failed");
+
+ cfree_cg_func_begin(cg, sym);
+ memset(&attrs, 0, sizeof attrs);
+ attrs.name = cfree_sym_intern(c, "x");
+ local = cfree_cg_local(cg, i32_ty, attrs);
+ EXPECT(local != CFREE_CG_LOCAL_NONE, "late addr local handle is none");
+
+ memset(&mem, 0, sizeof mem);
+ mem.type = i32_ty;
+ mem.align = cfree_cg_type_align(c, i32_ty);
+
+ cfree_cg_push_local(cg, local);
+ cfree_cg_push_int(cg, 41, i32_ty);
+ cfree_cg_store(cg, mem);
+ cfree_cg_push_local_addr(cg, local);
+ cfree_cg_indirect(cg);
+ cfree_cg_load(cg, mem);
+ cfree_cg_push_int(cg, 1, i32_ty);
+ cfree_cg_int_binop(cg, CFREE_CG_INT_ADD, 0);
+ cfree_cg_ret(cg);
+ cfree_cg_func_end(cg);
+
+ cfree_cg_free(cg);
+ obj_free((ObjBuilder*)ob);
+}
+
int main(void) {
CfreeTarget target;
CfreeEnv env;
@@ -284,6 +413,10 @@ int main(void) {
exercise_cg_handles(c, i32_ty, 0);
exercise_cg_handles(c, i32_ty, 1);
+ exercise_cg_scalar_local(c, i32_ty, 0);
+ exercise_cg_scalar_local(c, i32_ty, 1);
+ exercise_cg_late_local_addr(c, i32_ty, 0);
+ exercise_cg_late_local_addr(c, i32_ty, 1);
cfree_compiler_free(c);
return g_fail ? 1 : 0;
diff --git a/test/opt/opt_test.c b/test/opt/opt_test.c
@@ -132,6 +132,17 @@ static Operand op_local_(FrameSlot fs, CfreeCgTypeId ty) {
return o;
}
+static Operand op_indirect_(Reg base, CfreeCgTypeId ty) {
+ Operand o;
+ memset(&o, 0, sizeof o);
+ o.kind = OPK_INDIRECT;
+ o.cls = RC_INT;
+ o.type = ty;
+ o.v.ind.base = base;
+ o.v.ind.ofs = 0;
+ return o;
+}
+
static MemAccess mem_local_(FrameSlot fs, CfreeCgTypeId ty, u32 size,
u16 flags) {
MemAccess m;
@@ -145,6 +156,16 @@ static MemAccess mem_local_(FrameSlot fs, CfreeCgTypeId ty, u32 size,
return m;
}
+static MemAccess mem_unknown_(CfreeCgTypeId ty, u32 size) {
+ MemAccess m;
+ memset(&m, 0, sizeof m);
+ m.type = ty;
+ m.size = size;
+ m.align = size >= 8 ? 8 : size;
+ m.alias.kind = ALIAS_UNKNOWN;
+ return m;
+}
+
static Func* new_func(TestCtx* tc) {
CGFuncDesc fd;
CfreeCgFuncSig sig;
@@ -375,6 +396,10 @@ typedef struct MockCGTarget {
int load_const_calls;
u8 last_const_bytes[16];
u32 last_const_size;
+ int copy_calls;
+ int load_calls;
+ int store_calls;
+ int addr_of_calls;
int cmp_branch_calls;
} MockCGTarget;
@@ -458,6 +483,36 @@ static void mock_load_const(CGTarget* t, Operand dst, ConstBytes cb) {
memcpy(m->last_const_bytes, cb.bytes, cb.size);
}
+static void mock_copy(CGTarget* t, Operand dst, Operand src) {
+ MockCGTarget* m = (MockCGTarget*)t;
+ (void)dst;
+ (void)src;
+ ++m->copy_calls;
+}
+
+static void mock_load(CGTarget* t, Operand dst, Operand addr, MemAccess macc) {
+ MockCGTarget* m = (MockCGTarget*)t;
+ (void)dst;
+ (void)addr;
+ (void)macc;
+ ++m->load_calls;
+}
+
+static void mock_store(CGTarget* t, Operand addr, Operand src, MemAccess macc) {
+ MockCGTarget* m = (MockCGTarget*)t;
+ (void)addr;
+ (void)src;
+ (void)macc;
+ ++m->store_calls;
+}
+
+static void mock_addr_of(CGTarget* t, Operand dst, Operand lv) {
+ MockCGTarget* m = (MockCGTarget*)t;
+ (void)dst;
+ (void)lv;
+ ++m->addr_of_calls;
+}
+
static void mock_ret(CGTarget* t, const CGABIValue* v) {
(void)t;
(void)v;
@@ -511,6 +566,10 @@ static void mock_init(MockCGTarget* m, Compiler* c) {
m->base.cmp_branch = mock_cmp_branch;
m->base.load_imm = mock_load_imm;
m->base.load_const = mock_load_const;
+ m->base.copy = mock_copy;
+ m->base.load = mock_load;
+ m->base.store = mock_store;
+ m->base.addr_of = mock_addr_of;
m->base.ret = mock_ret;
m->base.set_loc = mock_set_loc;
m->base.get_allocable_regs = mock_get_allocable_regs;
@@ -2132,6 +2191,124 @@ static void opt_cmp_branch_keeps_fallthrough_after_block_growth(void) {
tc_fini(&tc);
}
+static void begin_mock_opt_func(TestCtx* tc, CGTarget* opt,
+ CfreeCgTypeId ret_ty) {
+ CGFuncDesc fd;
+ CfreeCgFuncSig sig;
+ memset(&fd, 0, sizeof fd);
+ memset(&sig, 0, sizeof sig);
+ sig.ret = ret_ty;
+ sig.call_conv = CFREE_CG_CC_TARGET_C;
+ fd.fn_type = cfree_cg_type_func(tc->c, sig);
+ opt->func_begin(opt, &fd);
+}
+
+static CGLocalDesc local_desc_(CfreeCgTypeId ty, u32 size, u32 align,
+ u32 flags) {
+ CGLocalDesc d;
+ memset(&d, 0, sizeof d);
+ d.type = ty;
+ d.size = size;
+ d.align = align;
+ d.flags = flags;
+ return d;
+}
+
+static void opt_local_hook_chooses_register_for_scalar(void) {
+ TestCtx tc;
+ tc_init(&tc);
+ MockCGTarget mock;
+ mock_init(&mock, tc.c);
+
+ CGTarget* opt = opt_cgtarget_new(tc.c, &mock.base, 1);
+ begin_mock_opt_func(&tc, opt, tc.i32);
+
+ CGLocalDesc d = local_desc_(tc.i32, 4, 4, 0);
+ CGLocalStorage st = opt->local(opt, &d);
+ EXPECT(st.kind == CG_LOCAL_STORAGE_REG,
+ "non-address-taken scalar local should be register-backed");
+ EXPECT(st.v.reg != (Reg)REG_NONE, "register-backed local needs a vreg");
+
+ opt->destroy(opt);
+ tc_fini(&tc);
+}
+
+static void opt_local_addr_taken_uses_frame_and_replays_addr_of(void) {
+ TestCtx tc;
+ tc_init(&tc);
+ MockCGTarget mock;
+ mock_init(&mock, tc.c);
+ static const Reg pool[] = {19};
+ static const Reg scratch[] = {9, 10};
+ mock_set_pool(&mock, RC_INT, pool, 1, scratch, 2, 0x4007FFFFu);
+
+ CGTarget* opt = opt_cgtarget_new(tc.c, &mock.base, 1);
+ CfreeCgTypeId ptr_ty = cfree_cg_type_ptr(tc.c, tc.i32, 0);
+ begin_mock_opt_func(&tc, opt, ptr_ty);
+
+ CGLocalDesc d =
+ local_desc_(tc.i32, 4, 4, CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED);
+ CGLocalStorage st = opt->local(opt, &d);
+ EXPECT(st.kind == CG_LOCAL_STORAGE_FRAME,
+ "address-taken local should be frame-backed");
+
+ Operand addr = op_reg_(1, ptr_ty);
+ opt->local_addr(opt, addr, &d, st);
+ CGABIValue retv = {0};
+ retv.type = ptr_ty;
+ retv.storage = addr;
+ opt->ret(opt, &retv);
+ opt->func_end(opt);
+
+ EXPECT(mock.addr_of_calls == 1,
+ "frame-backed local address should replay as addr_of");
+
+ opt->destroy(opt);
+ tc_fini(&tc);
+}
+
+static void opt_register_local_addr_frame_homes(void) {
+ TestCtx tc;
+ tc_init(&tc);
+ MockCGTarget mock;
+ mock_init(&mock, tc.c);
+ static const Reg pool[] = {19, 20};
+ static const Reg scratch[] = {9, 10};
+ mock_set_pool(&mock, RC_INT, pool, 2, scratch, 2, 0x4007FFFFu);
+
+ CGTarget* opt = opt_cgtarget_new(tc.c, &mock.base, 1);
+ CfreeCgTypeId ptr_ty = cfree_cg_type_ptr(tc.c, tc.i32, 0);
+ begin_mock_opt_func(&tc, opt, ptr_ty);
+
+ CGLocalDesc desc = local_desc_(tc.i32, 4, 4, 0);
+ CGLocalStorage storage = opt->local(opt, &desc);
+ EXPECT(storage.kind == CG_LOCAL_STORAGE_REG,
+ "frame-home test needs register-backed local");
+
+ Operand local = op_reg_(storage.v.reg, tc.i32);
+ opt->load_imm(opt, local, 42);
+ Operand addr = op_reg_(2, ptr_ty);
+ opt->local_addr(opt, addr, &desc, storage);
+ opt->store(opt, op_indirect_(addr.v.reg, tc.i32), local,
+ mem_unknown_(tc.i32, 4));
+
+ CGABIValue retv = {0};
+ retv.type = ptr_ty;
+ retv.storage = addr;
+ opt->ret(opt, &retv);
+ opt->func_end(opt);
+
+ EXPECT(mock.addr_of_calls == 1,
+ "register-backed local addrof should replay as addr_of");
+ EXPECT(mock.store_calls >= 1,
+ "register-backed local addrof should frame-home prior stores");
+ EXPECT(mock.load_calls >= 1,
+ "register-backed local use after addrof should reload from home");
+
+ opt->destroy(opt);
+ tc_fini(&tc);
+}
+
static void simple_regalloc_reports_exact_used_regs(void) {
CGSimpleRegAlloc a;
static const Reg regs[] = {3, 7, 11};
@@ -2192,6 +2369,9 @@ int main(void) {
opt_emit_no_virtual_alloc();
opt_records_const_bytes_by_value();
opt_cmp_branch_keeps_fallthrough_after_block_growth();
+ opt_local_hook_chooses_register_for_scalar();
+ opt_local_addr_taken_uses_frame_and_replays_addr_of();
+ opt_register_local_addr_frame_homes();
simple_regalloc_reports_exact_used_regs();
if (g_fails) {
fprintf(stderr, "opt tests: %d failed (%d checks)\n", g_fails, g_checks);
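The three new opt tests fix the storage policy for `opt->local`. A plain 4-byte scalar with no flags must be register-backed (`CG_LOCAL_STORAGE_REG`). One flagged `CG_LOCAL_ADDR_TAKEN | CG_LOCAL_MEMORY_REQUIRED` must be frame-backed (`CG_LOCAL_STORAGE_FRAME`). The sketch below is a hypothetical standalone restatement of that decision, not cfree's implementation. The `Sketch*` / `sketch_*` names are illustrative stand-ins for the real types, and the 8-byte size cutoff for larger locals is an assumption that these tests do not cover.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for CG_LOCAL_ADDR_TAKEN / CG_LOCAL_MEMORY_REQUIRED
 * and CG_LOCAL_STORAGE_{REG,FRAME}; the real definitions live in cfree. */
enum {
  SKETCH_LOCAL_ADDR_TAKEN = 1u << 0,
  SKETCH_LOCAL_MEMORY_REQUIRED = 1u << 1,
};

typedef enum { SKETCH_STORAGE_REG, SKETCH_STORAGE_FRAME } SketchStorageKind;

typedef struct {
  uint32_t size;  /* size of the local in bytes */
  uint32_t align; /* required alignment in bytes */
  uint32_t flags; /* SKETCH_LOCAL_* bits */
} SketchLocalDesc;

/* Assumed policy: a local lives in a virtual register only when nothing
 * requires it to have a memory address and it fits in one 8-byte word.
 * Everything else gets a frame slot, as every local does at O0. */
static SketchStorageKind sketch_choose_storage(const SketchLocalDesc* d) {
  assert(d);
  if (d->flags & (SKETCH_LOCAL_ADDR_TAKEN | SKETCH_LOCAL_MEMORY_REQUIRED))
    return SKETCH_STORAGE_FRAME;
  if (d->size == 0 || d->size > 8)
    return SKETCH_STORAGE_FRAME;
  return SKETCH_STORAGE_REG;
}
```

Only the unflagged and flagged 4-byte cases correspond to what `opt_test.c` checks. This diff does not show how the real hook treats aggregates.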
diff --git a/test/opt/phase0_guardrails.sh b/test/opt/phase0_guardrails.sh
@@ -34,6 +34,31 @@ int main() {
SRC
}
+write_scalar_local() {
+ cat >"$TMP/scalar_local.c" <<'SRC'
+int main() {
+ int x = 40;
+ x = x + 2;
+ return x == 42 ? 0 : 1;
+}
+SRC
+}
+
+write_late_addrof_join() {
+ cat >"$TMP/late_addrof_join.c" <<'SRC'
+int main() {
+ int x = 5;
+ if (x == 5)
+ x = 40;
+ else
+ x = 1;
+ int *p = &x;
+ *p = *p + 2;
+ return x == 42 ? 0 : 1;
+}
+SRC
+}
+
write_spills() {
{
printf 'int main() {\n'
@@ -130,12 +155,16 @@ check_metrics() {
write_branch_liveness
write_call_clobber
+write_scalar_local
+write_late_addrof_join
write_spills
write_many_small_functions
write_large_straight_line
run_case branch_liveness "$TMP/branch_liveness.c"
run_case call_clobber "$TMP/call_clobber.c"
+run_case scalar_local "$TMP/scalar_local.c"
+run_case late_addrof_join "$TMP/late_addrof_join.c"
run_case spills "$TMP/spills.c"
run_case many_small_functions "$TMP/many_small_functions.c"
run_case large_straight_line "$TMP/large_straight_line.c"
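The `late_addrof_join` guardrail (and `opt_02_late_addrof_join.c` below) covers a register-backed local whose address is taken only after a control-flow join. For that case, `opt_register_local_addr_frame_homes` expects the value to be stored to a frame home at the address-of, and later uses to reload from that home. What follows is a hypothetical runtime model of that behavior, not the compiler's implementation. `SketchLocal` and the `sketch_*` functions are illustrative only, and the counters merely stand in for emitted store and load instructions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one scalar local: it starts in a virtual register
 * and moves to a frame "home" the first time its address is taken. */
typedef struct {
  int reg;         /* value while register-backed */
  int home;        /* frame slot contents once homed */
  bool homed;      /* true after the first address-of */
  int store_insns; /* stores emitted to the frame home */
  int load_insns;  /* reloads emitted from the frame home */
} SketchLocal;

static void sketch_write(SketchLocal* l, int v) {
  if (l->homed) {
    l->home = v;
    l->store_insns++;
  } else {
    l->reg = v; /* register move only, no memory traffic */
  }
}

static int sketch_read(SketchLocal* l) {
  if (l->homed) {
    l->load_insns++;
    return l->home;
  }
  return l->reg;
}

/* Taking the address spills the current register value to the home once,
 * then hands out a pointer to the home. */
static int* sketch_addr_of(SketchLocal* l) {
  if (!l->homed) {
    l->home = l->reg;
    l->store_insns++;
    l->homed = true;
  }
  return &l->home;
}

/* Mirrors opt_02_late_addrof_join: x is written on both arms of an if,
 * then its address is taken after the join. */
static int sketch_late_addrof_join(SketchLocal* x) {
  assert(x);
  sketch_write(x, 5);
  if (sketch_read(x) == 5)
    sketch_write(x, 40);
  else
    sketch_write(x, 1);
  int* p = sketch_addr_of(x);
  *p = *p + 2; /* accesses through p are not counted */
  return sketch_read(x);
}
```

Calling `sketch_late_addrof_join` on a zero-initialized `SketchLocal` returns 42, matching the `.expected` file, with one store to the home and one reload. The real pass must make the same decision statically. That is presumably why the test takes the address after a join where both arms assign `x`.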
diff --git a/test/parse/cases/opt_01_scalar_local.c b/test/parse/cases/opt_01_scalar_local.c
@@ -0,0 +1,5 @@
+int test_main(void) {
+ int x = 40;
+ x = x + 2;
+ return x;
+}
diff --git a/test/parse/cases/opt_01_scalar_local.expected b/test/parse/cases/opt_01_scalar_local.expected
@@ -0,0 +1 @@
+42
diff --git a/test/parse/cases/opt_02_late_addrof_join.c b/test/parse/cases/opt_02_late_addrof_join.c
@@ -0,0 +1,10 @@
+int test_main(void) {
+ int x = 5;
+ if (x == 5)
+ x = 40;
+ else
+ x = 1;
+ int *p = &x;
+ *p = *p + 2;
+ return x;
+}
diff --git a/test/parse/cases/opt_02_late_addrof_join.expected b/test/parse/cases/opt_02_late_addrof_join.expected
@@ -0,0 +1 @@
+42
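With this change, `test/parse/run.sh` picks its level list in a fixed order. It uses `CFREE_OPT_LEVELS` if that is set and non-empty, falls back to `CFREE_OPT_LEVEL`, and otherwise defaults to `"0 1"`. It then rejects any whitespace-separated token other than `0`, `1`, or `2`. The C below is a hypothetical restatement of those rules. `sketch_opt_levels` and `sketch_count_valid_levels` are illustrative names, not part of cfree, and the separator set assumes the shell's default `IFS` of space, tab, and newline.

```c
#include <assert.h>
#include <stdlib.h>

/* Level list the script would use: CFREE_OPT_LEVELS if set and non-empty,
 * else CFREE_OPT_LEVEL if set and non-empty, else the default "0 1". */
static const char* sketch_opt_levels(void) {
  const char* v = getenv("CFREE_OPT_LEVELS");
  if (v && *v) return v;
  v = getenv("CFREE_OPT_LEVEL");
  if (v && *v) return v;
  return "0 1";
}

static int sketch_is_sep(char ch) {
  return ch == ' ' || ch == '\t' || ch == '\n';
}

/* Splits on whitespace like an unquoted `for opt in $OPT_LEVELS`.
 * Returns the number of levels, or -1 on the first token that is not
 * exactly "0", "1", or "2" (the script's `case` check). */
static int sketch_count_valid_levels(const char* list) {
  assert(list);
  int n = 0;
  const char* p = list;
  while (*p) {
    while (sketch_is_sep(*p)) p++;
    if (!*p) break;
    const char* start = p;
    while (*p && !sketch_is_sep(*p)) p++;
    size_t len = (size_t)(p - start);
    if (len != 1 || start[0] < '0' || start[0] > '2') return -1;
    n++;
  }
  return n;
}
```

The C version makes one consequence visible. A `CFREE_OPT_LEVELS` value made only of whitespace passes the script's `-n` test and is selected, yet it yields zero levels, so no cases run.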
diff --git a/test/parse/run.sh b/test/parse/run.sh
@@ -27,6 +27,11 @@
# paths subset of "DREJ" (default "DREJ")
# Equivalent env vars: CFREE_TEST_FILTER, CFREE_TEST_PATHS.
#
+# Optimization levels:
+# CFREE_OPT_LEVELS="0 1" whitespace-separated levels to test (each 0, 1, or 2).
+# CFREE_OPT_LEVEL=1 compatibility shorthand for one level; used only when CFREE_OPT_LEVELS is unset or empty.
+# Default is "0 1".
+#
# Parallelism:
# default run in parallel with a capped CPU-count default.
# CFREE_TEST_JOBS=N run up to N cases concurrently.
@@ -68,6 +73,19 @@ ALLOW_SKIP="${CFREE_TEST_ALLOW_SKIP:-0}"
FILTER="${1:-${CFREE_TEST_FILTER:-}}"
PATHS="${2:-${CFREE_TEST_PATHS:-DREJ}}"
+if [ -n "${CFREE_OPT_LEVELS:-}" ]; then
+ OPT_LEVELS="$CFREE_OPT_LEVELS"
+elif [ -n "${CFREE_OPT_LEVEL:-}" ]; then
+ OPT_LEVELS="$CFREE_OPT_LEVEL"
+else
+ OPT_LEVELS="0 1"
+fi
+for opt in $OPT_LEVELS; do
+ case "$opt" in
+ 0|1|2) ;;
+ *) printf 'parse: invalid opt level %s (expected 0, 1, or 2)\n' "$opt" >&2; exit 2 ;;
+ esac
+done
case "$PATHS" in *D*) RUN_D=1;; *) RUN_D=0;; esac
case "$PATHS" in *R*) RUN_R=1;; *) RUN_R=0;; esac
case "$PATHS" in *E*) RUN_E=1;; *) RUN_E=0;; esac
@@ -300,7 +318,7 @@ if [ $have_clang_cross -eq 1 ]; then
fi
fi
-printf 'Running cases (%s jobs)...\n' "$TEST_JOBS"
+printf 'Running cases (%s jobs, opt levels: %s)...\n' "$TEST_JOBS" "$OPT_LEVELS"
# ---- per-case loop ---------------------------------------------------------
@@ -320,30 +338,35 @@ FILTERED_CASES=()
for src in "${CASES[@]}"; do
name="$(basename "$src" .c)"
[ -n "$FILTER" ] && [[ "$name" != *"$FILTER"* ]] && continue
- FILTERED_CASES+=("$src")
+ for opt in $OPT_LEVELS; do
+ FILTERED_CASES+=("$opt:$src")
+ done
done
run_parse_case() {
- local _idx="$1" src="$2" event="$3"
- local name work reason expected expected_byte obj t0 dt d_rc r_ok r_msg rt
+ local _idx="$1" item="$2" event="$3"
+ local opt src base_name name work reason expected expected_byte obj t0 dt d_rc r_ok r_msg rt
local exe link_dt j_rc
: "$_idx"
- name="$(basename "$src" .c)"
- work="$BUILD_DIR/parse/$name"
+ opt="${item%%:*}"
+ src="${item#*:}"
+ base_name="$(basename "$src" .c)"
+ name="$base_name/O$opt"
+ work="$BUILD_DIR/parse/$base_name.O$opt"
mkdir -p "$work"
# Skip sidecar
- if [ -e "$TEST_DIR/cases/$name.skip" ]; then
- reason=$(head -n1 "$TEST_DIR/cases/$name.skip")
+ if [ -e "$TEST_DIR/cases/$base_name.skip" ]; then
+ reason=$(head -n1 "$TEST_DIR/cases/$base_name.skip")
emit_event "$event" SKIP "$name" "$reason"
return 0
fi
# Expected exit code (default 0)
expected=0
- if [ -e "$TEST_DIR/cases/$name.expected" ]; then
- expected=$(head -n1 "$TEST_DIR/cases/$name.expected")
+ if [ -e "$TEST_DIR/cases/$base_name.expected" ]; then
+ expected=$(head -n1 "$TEST_DIR/cases/$base_name.expected")
fi
expected_byte=$(( expected & 0xff ))
@@ -351,7 +374,8 @@ run_parse_case() {
if [ $RUN_D -eq 1 ]; then
if [ $is_native_target -eq 1 ]; then
t0=$(now_ms)
- "$PARSE_RUNNER" --jit "$src" >"$work/d.out" 2>"$work/d.err"
+ CFREE_OPT_LEVEL="$opt" "$PARSE_RUNNER" --jit "$src" \
+ >"$work/d.out" 2>"$work/d.err"
d_rc=$?
dt=$(( $(now_ms) - t0 ))
emit_event "$event" TIME D "$dt"
@@ -366,9 +390,10 @@ run_parse_case() {
fi
# ---- emit (needed by R/E/J) ------------------------------------------
- obj="$work/$name.o"
+ obj="$work/$base_name.o"
if [ $RUN_R -eq 1 ] || [ $RUN_E -eq 1 ] || [ $RUN_J -eq 1 ]; then
- if ! "$PARSE_RUNNER" --emit "$src" "$obj" 2>"$work/emit.err"; then
+ if ! CFREE_OPT_LEVEL="$opt" "$PARSE_RUNNER" --emit "$src" "$obj" \
+ 2>"$work/emit.err"; then
emit_event "$event" FAIL "$name/emit (parse-runner --emit failed; see $work/emit.err)"
return 0
fi
@@ -378,7 +403,7 @@ run_parse_case() {
if [ $RUN_R -eq 1 ]; then
if [ $have_roundtrip -eq 1 ] && [ $have_readelf -eq 1 ] && [ $have_python3 -eq 1 ]; then
t0=$(now_ms)
- rt="$work/$name.rt.o"
+ rt="$work/$base_name.rt.o"
r_ok=1; r_msg=""
if ! "$ROUNDTRIP_BIN" "$obj" "$rt" 2>"$work/rt.err"; then
r_ok=0; r_msg=" (roundtrip failed)"