kit

git clone https://git.ryansepassi.com/git/kit.git

commit 07949547f417b0fc4fdeb839a3e163f28473fae9
parent 3d661011371f95c34738c1b22bda05e0309a96f5
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 31 May 2026 02:00:19 -0700

cg: reference code locations via per-block local symbols, not fixed offsets

Switch jump-table entries and `&&label` computed-goto address-takes were emitted
as layout-dependent fixed offsets: the jump table as `R_ABS64` against the
enclosing function symbol + a baked byte offset, and the address-take as an
in-place-patched displacement with NO object relocation at all (x86-64
`leaq disp(%rip)`, rv64 `auipc/addi`, aarch64 a 16-byte INTRA-label sequence).
These only stay correct if the assembler preserves cfree's exact instruction
layout — so aarch64 survived by being fixed-width, rv64 needed a `.option norvc`
band-aid, and x86-64 needed a disassembler special case (`pcrel_code_target`)
gated to x64 (`arch_needs_symbolic_code_refs`). That gate was a code smell: the
same fragility was latent in every backend.
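To make the failure mode concrete, here is a minimal sketch of why a baked byte offset stops being valid once another assembler picks different instruction lengths. It is not code from this repository: `insn_start`, `is_insn_boundary`, `layout_a`, and `layout_b` are hypothetical names, and the two layouts are an illustrative x86-64 case (a 10-byte `movabs` and 5-byte `jmp rel32` re-encoded as a 7-byte `mov` imm32 and 2-byte `jmp rel8`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset at which instruction `index` starts, given the encoded length
 * of each instruction. Hypothetical helper, for illustration only. */
static uint32_t insn_start(const uint8_t *lens, size_t n, size_t index) {
  uint32_t off = 0;
  size_t i;
  assert(index <= n);
  for (i = 0; i < index; ++i) off += lens[i];
  return off;
}

/* 1 if `off` is the first byte of some instruction in this layout. */
static int is_insn_boundary(const uint8_t *lens, size_t n, uint32_t off) {
  uint32_t cur = 0;
  size_t i;
  for (i = 0; i < n; ++i) {
    if (cur == off) return 1;
    cur += lens[i];
  }
  return 0;
}

/* Layout as the compiler encoded it: movabs imm64 (10 bytes), jmp rel32 (5),
 * then a 3-byte instruction that is the jump-table / `&&label` target. */
static const uint8_t layout_a[] = {10, 5, 3};

/* The same three instructions as a different assembler might encode them:
 * mov imm32 (7 bytes), jmp rel8 (2), and the unchanged 3-byte target. */
static const uint8_t layout_b[] = {7, 2, 3};
```

An offset baked from `layout_a` (`insn_start(layout_a, 3, 2)`, i.e. 15) is not an instruction boundary in `layout_b`, whereas a reference that names the target instruction is recomputed to 9. That recomputation is what a relocation against a symbol gives the assembler.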

Fix it once, uniformly: the MCEmitter lazily mints a per-MCLabel SB_LOCAL symbol
(`mc_label_symbol`, forward-ref safe — created undefined, defined at
label_place), and both constructs relocate against it:
  - jump table: `R_ABS64` vs `.Lcfblk.*` (addend 0)
  - `&&label`:  x86-64 `leaq sym(%rip)` R_PC32; aarch64 `adrp/add`
    ADR_PREL_PG_HI21 + ADD_ABS_LO12_NC; riscv64 `auipc/addi`
    PCREL_HI20 + PCREL_LO12_I (the same forms used to address a global).
The existing arch_reloc_operand symbolizer renders all of them, so cc -S needs
no per-arch code; the object is genuinely relocatable; and cc -c / cc -S emit
the same relocations so the round-trip byte+reloc lanes stay faithful.
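The lazy, forward-reference-safe minting described above can be modelled in isolation. What follows is a simplified sketch under stated assumptions, not the code in `src/arch/mc.c`: `ToySym`, `ToyLabel`, `ToySymtab`, `toy_label_symbol`, and `toy_label_place` are hypothetical stand-ins for the real `MCLabelInfo`, object symbol table, `mc_label_symbol`, and `label_place`, and the fixed-size table exists only to keep the example self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_SYM_NONE 0u /* symbol id 0 means "no symbol minted yet" */
#define TOY_MAX_SYMS 16u

typedef struct {
  char name[24];
  int defined;     /* 0 while the label is still a forward reference */
  uint32_t offset; /* meaningful only once defined */
} ToySym;

typedef struct {
  int placed;
  uint32_t offset;
  uint32_t block_sym; /* TOY_SYM_NONE until first requested */
} ToyLabel;

typedef struct {
  ToySym syms[TOY_MAX_SYMS];
  uint32_t nsyms; /* highest id handed out so far */
} ToySymtab;

/* Return the label's local symbol, creating it on first request. If the label
 * is not placed yet, the symbol is created undefined. */
static uint32_t toy_label_symbol(ToySymtab *st, ToyLabel *l, uint32_t label_id) {
  ToySym *s;
  if (l->block_sym != TOY_SYM_NONE) return l->block_sym;
  assert(st->nsyms + 1u < TOY_MAX_SYMS);
  l->block_sym = ++st->nsyms;
  s = &st->syms[l->block_sym];
  snprintf(s->name, sizeof s->name, ".Lcfblk.%u", (unsigned)label_id);
  s->defined = l->placed;
  s->offset = l->placed ? l->offset : 0u;
  return l->block_sym;
}

/* Place the label; if a symbol was already minted for it, define it now. */
static void toy_label_place(ToySymtab *st, ToyLabel *l, uint32_t offset) {
  l->placed = 1;
  l->offset = offset;
  if (l->block_sym != TOY_SYM_NONE) {
    st->syms[l->block_sym].defined = 1;
    st->syms[l->block_sym].offset = offset;
  }
}
```

In this model a jump-table entry or `&&label` emitted before its target block only needs the symbol id. The offset is filled in when the label is placed, so the relocation never depends on how many bytes sit between the reference and the target.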

Deletes the band-aids: the x64 pcrel_code_target hook + the
arch_needs_symbolic_code_refs gate, the rv64 file_prologue (.option norvc), the
aa64 `adr` is_local_branch case, and the asm_emit.c jump-table/leaq special
cases (collect_code_anchors extras, code_target_label, SecReloc target fields).
Net -182 lines. (The now-unused R_*_INTRA_* fixup kinds are left inert.)

Verified green, no regressions: test-hostas-cross 936/936 both lanes on all 3
arches; hostas-toy 312/0; test-asm-roundtrip 572/0 (L1 byte+reloc compare
confirms cc-c==cc-S); roundtrip-toy 624/0 (JIT + native ld); toy 1338/0; asm
27/0; symmetry clean; diff-llvm agrees; link 122/0; elf 40/0; debug OK;
smoke-x64 2/0; smoke-rv64 2/0.

Diffstat:
M doc/ASM_ROUNDTRIP_TESTING.md |  33 ++++++++++++++++-----------------
M src/api/asm_emit.c           | 168 ++++++++++++-------------------------------------------------------------------
M src/arch/aa64/asm.c          |   1 -
M src/arch/aa64/native.c       |  21 ++++++++++++++++-----
M src/arch/arch.h              |  39 ---------------------------------------
M src/arch/mc.c                |  89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++------------------------
M src/arch/mc.h                |   8 ++++++++
M src/arch/registry.c          |  17 -----------------
M src/arch/rv64/asm.c          |  16 ----------------
M src/arch/rv64/native.c       |  21 ++++++++++++++++++---
M src/arch/x64/asm.c           |  66 ------------------------------------------------------------------
M src/arch/x64/native.c        |  10 +++++++++-
M test/asm/hostas_cross.sh     |  24 ++++++++++--------------
13 files changed, 163 insertions(+), 350 deletions(-)

diff --git a/doc/ASM_ROUNDTRIP_TESTING.md b/doc/ASM_ROUNDTRIP_TESTING.md
@@ -250,23 +250,22 @@ downgrades to SKIP instead of hanging). All three ELF targets are in the gating
 default and pass **both** lanes — **936/936** = 312 cases × {O0,O1} × 3 arches,
 cfree-as **and** clang-as, judged purely by execution (matching exit code):

-- **aarch64-linux**: green end-to-end — podman runs arm64 natively in its VM, so
-  it's fast. Fixed-width encodings preserve cfree's instruction layout, so code
-  references need no special spelling.
-- **x86_64-linux**: cc -S references code locations **symbolically** so clang's
-  encoding choices (movabs vs mov-imm32, `jmp` rel32 vs rel8) can't shift a fixed
-  byte offset onto the wrong instruction: a `&&label` address-take is
-  `leaq Lcf_*(%rip)` (un-relocated PC-relative computes are detected via
-  `ArchAsmOps.pcrel_code_target`, the target gets a synthesized label), and
-  switch jump-table entries are `.quad Lcf_*` rather than `.quad fn+off`
-  (absolute data pointers into an executable section are re-pointed at a
-  synthesized code label by `collect_code_anchors` + `code_target_label`).
-- **riscv64-linux**: cc -S emits `.option norvc`/`.option norelax`
-  (`ArchAsmOps.file_prologue`) to pin cfree's fixed layout against clang's
-  C-extension compression — cfree computes some `&&label`/jump-table targets as
-  fixed offsets, which compression would otherwise shift — plus the
-  `%pcrel_hi`/`%pcrel_lo` AUIPC-anchor pairing and `R_RISCV_CALL` AUIPC+JALR
-  fusion to `call`/`tail`.
+Code locations that an encoding-divergent assembler must be able to recompute —
+switch jump-table entries and `&&label` address-takes — are referenced through a
+**per-basic-block local symbol** the MCEmitter mints (`mc_label_symbol`,
+`src/arch/mc.c`), uniformly on all three arches. The jump table emits
+`.quad .Lcfblk.*` (`R_ABS64` against the block symbol) and the address-take emits
+the arch's standard PC-relative relocation against the same symbol: x86-64
+`leaq .Lcfblk.*(%rip)` (`R_PC32`), aarch64 `adrp`/`add`
+(`ADR_PREL_PG_HI21`+`ADD_ABS_LO12_NC`), riscv64 `auipc`/`addi`
+(`%pcrel_hi`/`%pcrel_lo`). The existing reloc-operand symbolizer renders all of
+them, so the references are genuinely relocatable everywhere and clang's encoding
+choices (movabs vs mov-imm32, `jmp` rel32 vs rel8, RVC compression) can't shift a
+baked offset onto the wrong instruction. `cc -c` and `cc -S` emit the same
+relocations, so the round-trip byte/reloc lanes stay faithful too.
+
+(aarch64-linux runs arm64 natively in the podman VM, so it's the fastest lane;
+x86_64/riscv64 run under qemu-user in their containers.)

 Both lanes are judged by **execution**, never by bytes: cfree and clang emit
 different (execution-equivalent) code, so a byte/text match would be meaningless.
diff --git a/src/api/asm_emit.c b/src/api/asm_emit.c
@@ -529,8 +529,6 @@ typedef struct {
   u16 kind;
   Sym sym;
   i64 addend;
-  ObjSecId target_sec; /* section the reloc's symbol is defined in (or NONE) */
-  u64 target_val;      /* the symbol's value (offset within target_sec) */
 } SecReloc;

 static int cmp_secreloc(const void* va, const void* vb) {
@@ -566,8 +564,6 @@ static SecReloc* collect_relocs(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
     arr[n].kind = r->kind;
     arr[n].sym = s ? s->name : (Sym)0;
     arr[n].addend = r->addend;
-    arr[n].target_sec = s ? s->section_id : OBJ_SEC_NONE;
-    arr[n].target_val = s ? s->value : 0;
     ++n;
   }
   if (n > 1) qsort(arr, n, sizeof(SecReloc), cmp_secreloc);
@@ -929,24 +925,17 @@ static void anchor_add(Compiler* c, u32** arr, u32* n, u32* cap, u32 off) {
   (*arr)[(*n)++] = off;
 }

-/* Pre-scan: offsets in section `sec_id` that need a synthesized Lcf_ label so a
- * layout-dependent reference resolves symbolically through any assembler:
- *   1. targets of un-relocated intra-section local branches (b/jmp/jcc);
- *   2. targets of un-relocated PC-relative code-address-takes (x86-64 `leaq
- *      disp(%rip)` for `&&label`);
- *   3. offsets targeted by an absolute data-pointer relocation living in a
- *      NON-executable section (switch jump-table `.quad fn+off` entries).
- * (2)/(3) are exactly the references that break when the assembler picks
- * different instruction lengths than cfree did, so they are collected only for
- * arches that need symbolic code refs (x86-64); fixed-width (aarch64) or
- * layout-pinned (RISC-V .option norvc) arches keep the compact offset forms. */
-static u32* collect_code_anchors(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
-                                 ArchDisasm* dasm, const SecReloc* relocs,
-                                 u32 nrelocs, const u8* data, u32 total,
-                                 u32* n_out) {
+/* Pre-scan: collect in-section branch targets of un-relocated local branches,
+ * so cc -S synthesizes a label there and the branch re-assembles. Code-location
+ * references that must survive a re-encoding assembler (switch jump-table
+ * entries, `&&label` address-takes) are NOT handled here — codegen emits them as
+ * relocations against per-block local symbols (mc_label_symbol), so the normal
+ * reloc-operand path symbolizes them and the target label is a real symbol. */
+static u32* collect_branch_targets(Compiler* c, ArchDisasm* dasm,
+                                   const SecReloc* relocs, u32 nrelocs,
+                                   const u8* data, u32 total, u32* n_out) {
   u32* arr = NULL;
   u32 n = 0, cap = 0, off = 0;
-  int want_sym = arch_needs_symbolic_code_refs(c);

   *n_out = 0;
   while (off < total) {
@@ -957,44 +946,14 @@ static u32* collect_code_anchors(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
       off += 1;
       continue;
     }
-    if (!reloc_in_range(relocs, nrelocs, off, nb)) {
-      if (arch_is_local_branch(c, insn.mnemonic) &&
-          parse_hex_tail(insn.operands, &tgt) && tgt < total) {
-        anchor_add(c, &arr, &n, &cap, (u32)tgt);
-      } else if (want_sym) {
-        i64 disp;
-        if (arch_pcrel_code_target(c, insn.mnemonic, insn.operands, &disp)) {
-          i64 t = (i64)off + (i64)nb + disp;
-          if (t >= 0 && (u64)t < total)
-            anchor_add(c, &arr, &n, &cap, (u32)t);
-        }
-      }
+    if (!reloc_in_range(relocs, nrelocs, off, nb) &&
+        arch_is_local_branch(c, insn.mnemonic) &&
+        parse_hex_tail(insn.operands, &tgt) && tgt < total) {
+      anchor_add(c, &arr, &n, &cap, (u32)tgt);
     }
     off += nb;
   }

-  if (want_sym) {
-    u32 nr = obj_reloc_total(ob), i;
-    for (i = 0; i < nr; ++i) {
-      const Reloc* r = obj_reloc_at(ob, i);
-      const Section* host;
-      const ObjSym* s;
-      const char* dir;
-      u32 width;
-      int pcrel;
-      i64 t;
-      if (!r || r->removed) continue;
-      host = obj_section_get(ob, r->section_id);
-      if (!host || (host->flags & SF_EXEC)) continue; /* code reloc: skip */
-      if (!data_reloc_directive(r->kind, &dir, &width, &pcrel) || pcrel)
-        continue; /* only absolute data pointers (jump-table entries) */
-      s = obj_symbol_get(ob, r->sym);
-      if (!s || s->section_id != sec_id) continue;
-      t = (i64)s->value + r->addend;
-      if (t >= 0 && (u64)t < total) anchor_add(c, &arr, &n, &cap, (u32)t);
-    }
-  }
-
   if (n > 1) qsort(arr, n, sizeof(u32), cmp_u32);
   *n_out = n;
   return arr;
@@ -1044,65 +1003,20 @@ static CfreeStatus emit_operands(Writer* w, const EmitCtx* x,
       return w_symbolized(w, insn->operands.s, insn->operands.len, name,
                           ARCH_RELOC_SURG_TAIL);
     }
-  } else {
-    /* Un-relocated PC-relative code-address-take (x86-64 `leaq disp(%rip)` for
-     * `&&label`): rewrite the fixed displacement to the synthesized target
-     * label so an encoding-divergent assembler recomputes it. */
-    i64 disp;
-    if (arch_pcrel_code_target(x->c, insn->mnemonic, insn->operands, &disp)) {
-      i64 t = (i64)off + (i64)insn->nbytes + disp;
-      if (t >= 0 && is_btarget(x, (u32)t)) {
-        char name[256];
-        build_label_name(name, sizeof name, x, (u32)t);
-        return w_symbolized(w, insn->operands.s, insn->operands.len, name,
-                            ARCH_RELOC_SURG_RIP);
-      }
-    }
   }
   return cfree_writer_write(w, insn->operands.s, insn->operands.len);
 }

-/* Symbolic name for a code location (target_sec:target_off) referenced from a
- * data directive: an assemblable label defined exactly there if one exists,
- * else the synthesized `Lcf_<sec>_<off>` that collect_code_anchors guarantees
- * is emitted in the target section. Mirrors the synth-vs-real choice the label
- * emitter makes (symbol_at / build_label_name), so both ends agree. */
-static u32 code_target_label(char* buf, u32 cap, Compiler* c, ObjBuilder* ob,
-                             ObjSecId target_sec, u32 target_off) {
-  ObjSymIter* it = obj_symiter_new(ob);
-  if (it) {
-    ObjSymEntry e;
-    while (obj_symiter_next(it, &e)) {
-      const ObjSym* s = e.sym;
-      Slice nm;
-      if (!s || s->removed || !s->name) continue;
-      if (s->section_id != target_sec || (u32)s->value != target_off) continue;
-      if (s->kind == SK_SECTION || s->kind == SK_FILE) continue;
-      nm = pool_slice(c->global, s->name);
-      if (slice_eq_cstr(nm, ".LpcrelHi")) continue;
-      if (sym_is_assemblable(nm)) {
-        u32 p = 0, j;
-        for (j = 0; j < nm.len && p + 1 < cap; ++j) buf[p++] = nm.s[j];
-        buf[p] = '\0';
-        obj_symiter_free(it);
-        return p;
-      }
-    }
-    obj_symiter_free(it);
-  }
-  return fmt_synth_label(buf, cap, (u32)target_sec, target_off);
-}
-
 /* Emit a data range, rendering any covered relocation as a symbolic integer
  * directive (`.quad sym+addend`) so cc -S | as reproduces the data relocation
- * table — switch jump tables (R_ABS64 against the function) and any other
- * relocated rodata/data. A reloc kind with no integer-directive form, or a
- * target the assembler can't spell, falls back to raw `.byte`; the dropped
- * reloc then surfaces in the round-trip's reloc comparison. `relocs` is the
- * section's relocation list, sorted by offset. */
-static CfreeStatus emit_data_range(Writer* w, Compiler* c, ObjBuilder* ob,
-                                   const u8* data, u32 start, u32 end,
-                                   const SecReloc* relocs, u32 nrelocs) {
+ * table — switch jump tables (`.quad .Lcfblk.*` against per-block local
+ * symbols) and any other relocated rodata/data. A reloc kind with no
+ * integer-directive form, or a target the assembler can't spell, falls back to
+ * raw `.byte`; the dropped reloc then surfaces in the round-trip's reloc
+ * comparison. `relocs` is the section's relocation list, sorted by offset. */
+static CfreeStatus emit_data_range(Writer* w, Compiler* c, const u8* data,
+                                   u32 start, u32 end, const SecReloc* relocs,
+                                   u32 nrelocs) {
   u32 off = start;
   while (off < end) {
     const SecReloc* r = NULL;
@@ -1128,33 +1042,6 @@ static CfreeStatus emit_data_range(Writer* w, Compiler* c, ObjBuilder* ob,
        * re-derives R_PC{32,64} instead of an absolute reloc. */
       ArchRelocOperand bare = {ARCH_RELOC_SURG_NONE, "", "", 0, 0, 0};
       if (data_reloc_directive(r->kind, &dir, &width, &pcrel) &&
-          off + width <= end) {
-        const Section* tsec = (r->target_sec != OBJ_SEC_NONE)
-                                  ? obj_section_get(ob, r->target_sec)
-                                  : NULL;
-        /* An absolute pointer into executable code (switch jump-table entry):
-         * spell it as a label that moves with the code rather than `fn+off`.
-         * After an encoding-divergent assembler re-lays-out the function, a
-         * fixed offset would point into the wrong instruction; a label is
-         * recomputed to the correct address. Only for arches that need it. */
-        if (!pcrel && tsec && (tsec->flags & SF_EXEC) &&
-            arch_needs_symbolic_code_refs(c)) {
-          char label[256];
-          u64 toff = r->target_val + (u64)r->addend;
-          CfreeStatus st;
-          code_target_label(label, sizeof label, c, ob, r->target_sec,
-                            (u32)toff);
-          st = w_str(w, dir);
-          if (st != CFREE_OK) return st;
-          st = w_str(w, label);
-          if (st != CFREE_OK) return st;
-          st = w_newline(w);
-          if (st != CFREE_OK) return st;
-          off += width;
-          continue;
-        }
-      }
-      if (data_reloc_directive(r->kind, &dir, &width, &pcrel) &&
           off + width <= end &&
           build_symref(symref, sizeof symref, c, &bare, r->sym,
                        r->addend) >= 0) {
@@ -1316,13 +1203,6 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
   sx.c = c;
   nsec = obj_section_count(ob);

-  /* Arch-specific leading directives (e.g. RISC-V `.option norvc` to pin
-   * cfree's fixed instruction layout against a compressing assembler). */
-  {
-    const char* prologue = arch_asm_file_prologue(c);
-    if (prologue) w_str(w, prologue);
-  }
-
   for (i = 1; i < nsec; ++i) {
     const Section* sec = obj_section_get(ob, (ObjSecId)i);
     SymLabel* labels;
@@ -1368,8 +1248,8 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
         buf_flatten(&sec->bytes, heap_data);
         flat_data = heap_data;
         if (dasm)
-          btargets = collect_code_anchors(c, ob, (ObjSecId)i, dasm, relocs,
-                                          nrelocs, flat_data, total, &nbt);
+          btargets = collect_branch_targets(c, dasm, relocs, nrelocs, flat_data,
+                                            total, &nbt);
       }
     } else if (total > 0 && sec->kind != SEC_BSS) {
       Heap* heap = c->ctx->heap;
@@ -1422,7 +1302,7 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
       } else if ((sec->flags & SF_EXEC) && dasm && flat_data) {
         emit_disasm_range(w, &ctx, dasm, flat_data, off, next);
       } else if (flat_data) {
-        emit_data_range(w, c, ob, flat_data, off, next, relocs, nrelocs);
+        emit_data_range(w, c, flat_data, off, next, relocs, nrelocs);
       }
       off = next;
     }
diff --git a/src/arch/aa64/asm.c b/src/arch/aa64/asm.c
@@ -492,7 +492,6 @@ static int aa64_is_local_branch(CfreeSlice m) {
   if (m.len == 4 && memcmp(m.s, "cbnz", 4) == 0) return 1;
   if (m.len == 3 && memcmp(m.s, "tbz", 3) == 0) return 1;
   if (m.len == 4 && memcmp(m.s, "tbnz", 4) == 0) return 1;
-  if (m.len == 3 && memcmp(m.s, "adr", 3) == 0) return 1;
   return 0;
 }

diff --git a/src/arch/aa64/native.c b/src/arch/aa64/native.c
@@ -1639,11 +1639,22 @@ static void aa_indirect_branch(NativeTarget* t, NativeLoc addr,
 }

 static void aa_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel target) {
-  aa_emit32(t->mc, aa64_adr(loc_reg(dst), 0, 0));
-  aa_emit32(t->mc, aa64_b(3));
-  aa_emit32(t->mc, 0);
-  aa_emit32(t->mc, 0);
-  t->mc->emit_label_ref(t->mc, target, R_AARCH64_INTRA_LABEL_ADDR, 16, 0);
+  /* `&&label` address-take: adrp/add with the ADR_PREL_PG_HI21 + ADD_ABS_LO12_NC
+   * relocation pair against the label's per-block local symbol — the same form
+   * used to address a global — so the reference is genuinely relocatable
+   * (reaches ±4 GiB) and any assembler resolves it from the symbol. Replaces the
+   * old 16-byte INTRA-label sequence with a baked offset. */
+  MCEmitter* mc = t->mc;
+  u32 rd = loc_reg(dst);
+  ObjSymId sym = mc_label_symbol(mc, target);
+  u32 pos = mc->pos(mc);
+  aa_emit32(mc, aa64_adrp(rd, 0, 0));
+  mc->emit_reloc_at(mc, mc->section_id, pos, R_AARCH64_ADR_PREL_PG_HI21, sym, 0,
+                    0, 0);
+  pos = mc->pos(mc);
+  aa_emit32(mc, aa64_add_imm(1, rd, rd, 0, 0));
+  mc->emit_reloc_at(mc, mc->section_id, pos, R_AARCH64_ADD_ABS_LO12_NC, sym, 0,
+                    0, 0);
 }

 static void aa_move(NativeTarget* t, NativeLoc dst, NativeLoc src) {
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -244,28 +244,6 @@ typedef struct ArchAsmOps {
    * pair fusion for the arch. */
   int (*reloc_call_pair)(u16 reloc_kind, CfreeSlice pair_mnemonic,
                          CfreeSlice pair_ops, const char** mnemonic_out);
-  /* Arch-specific leading directives emitted at the very top of a cc -S file,
-   * before any section, returned as a NUL-terminated string the printer writes
-   * verbatim (NULL = none). RISC-V returns "\t.option norvc\n.option norelax\n":
-   * cfree's codegen computes some PC-relative label / jump-table targets as
-   * fixed byte offsets that assume its own uncompressed, un-relaxed instruction
-   * stream, so a third-party assembler (clang) must be told not to compress or
-   * relax, or those offsets shift and the targets break. aarch64/x86-64 have
-   * fixed-width encodings and no such layout dependence -> NULL. */
-  const char* (*file_prologue)(void);
-  /* 1 if (mnemonic, operands) is an un-relocated PC-relative reference to a
-   * code address computed as a fixed displacement — x86-64 `leaq disp(%rip),
-   * reg` emitted for a `&&label` address-take. Sets *disp_out to the signed
-   * byte displacement from the END of the instruction to the target. The
-   * symbolizer then synthesizes a label at (insn_end + disp) and rewrites the
-   * displacement to that label so a re-encoding assembler recomputes it.
-   * Providing this hook ALSO opts the arch into symbolic switch jump-table
-   * entries (.quad fn+off -> .quad <label>): both are needed precisely when the
-   * arch's assembler may pick different instruction lengths than cfree did
-   * (x86-64 movabs/mov-imm32, jmp rel32/rel8). Fixed-width arches (aarch64) and
-   * arches that pin layout another way (RISC-V .option norvc) leave it NULL. */
-  int (*pcrel_code_target)(CfreeSlice mnemonic, CfreeSlice operands,
-                           i64* disp_out);
 } ArchAsmOps;

 typedef struct ArchImpl {
@@ -334,23 +312,6 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind,
                          CfreeSlice pair_mnemonic, CfreeSlice pair_ops,
                          const char** mnemonic_out);

-/* Leading directive string for the top of a cc -S file for the compiler's
- * target arch (e.g. RISC-V `.option norvc`), or NULL when the arch needs none.
- * Thin dispatch over ArchAsmOps.file_prologue. */
-const char* arch_asm_file_prologue(const Compiler* c);
-
-/* 1 if `insn` is an un-relocated PC-relative code-address-take for the target
- * arch, with *disp_out set to the signed displacement from the instruction end
- * to the target. Thin dispatch over ArchAsmOps.pcrel_code_target. */
-int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic,
-                           CfreeSlice operands, i64* disp_out);
-
-/* 1 if the target arch needs code locations referenced symbolically (by label)
- * rather than as fixed byte offsets in cc -S — true exactly for arches that
- * provide pcrel_code_target (x86-64). Drives both `&&label` address-take and
- * switch jump-table symbolization. */
-int arch_needs_symbolic_code_refs(const Compiler* c);
-
 ArchDisasm* arch_disasm_new(Compiler*);
 u32 arch_disasm_decode(ArchDisasm*, const u8* bytes, size_t len, u64 vaddr,
                        CfreeInsn* out);
diff --git a/src/arch/mc.c b/src/arch/mc.c
@@ -32,6 +32,7 @@
 #include "core/buf.h"
 #include "core/heap.h"
 #include "core/pool.h"
+#include "core/strbuf.h"
 #include "debug/dwarf_defs.h"
 #include "obj/obj.h"

@@ -68,6 +69,12 @@ typedef struct MCLabelInfo {
   u32 offset;
   MCFixup* pending;
   MCDataLabelRef* pending_data;
+  /* Lazily-minted SB_LOCAL symbol for this label, for code-location
+   * references that must survive a re-encoding assembler: switch jump-table
+   * entries (.quad <sym>) and `&&label` address-takes (a PC-relative reloc
+   * against <sym>). OBJ_SYM_NONE until first requested via mc_label_symbol;
+   * defined at the label's offset in m_label_place (forward-ref safe). */
+  ObjSymId block_sym;
 } MCLabelInfo;

 /* ---- CFI buffering (.eh_frame producer) ----
@@ -140,39 +147,31 @@ static void labels_grow(MCImpl* mc, u32 want) {
   mc->cap = ncap;
 }

-static void emit_label_data_reloc_now(MCImpl* mc, const MCDataLabelRef* r,
-                                      u32 label_offset) {
-  i64 addend;
+static void emit_label_data_reloc_now(MCImpl* mc, MCLabel label,
+                                      const MCDataLabelRef* r) {
+  /* Reference the label's per-block local symbol (its value IS the label's
+   * offset) rather than the enclosing function symbol + a baked byte offset.
+   * That makes the entry genuinely relocatable: a third-party assembler that
+   * re-encodes the function to different instruction lengths still resolves it
+   * to the right address (a fixed fn+offset would point into the wrong
+   * instruction). */
+  ObjSymId sym = mc_label_symbol(&mc->base, label);
+  i64 addend = r->extra_addend;
   u8 bytes[8];
   u32 i;
-  int big_endian;
-  if (mc->base.cur_func_sym == OBJ_SYM_NONE) {
-    compiler_panic(mc->base.c, mc->base.loc,
-                   "MCEmitter: label-data reloc resolved outside a function");
-  }
-  addend = (i64)label_offset - (i64)mc->base.cur_func_start + r->extra_addend;
-  /* Patch the inline addend into the data bytes. Object formats that
-   * carry the addend in the relocation record (ELF RELA) read both
-   * inline and r->addend; static link adds them. Mach-O R_ABS64
-   * (ARM64_RELOC_UNSIGNED) only reads the inline addend. Write the
-   * computed addend inline and pass 0 in the reloc so both formats
-   * resolve to the same runtime address. */
-  big_endian = mc->base.c->target.big_endian;
+  int big_endian = mc->base.c->target.big_endian;
+  /* Patch the inline addend (Mach-O ARM64_RELOC_UNSIGNED reads only the inline
+   * value) and also pass it in the reloc record (ELF RELA / the JIT linker's
+   * link_reloc_apply, where the inline gets overwritten by S + A). Both paths
+   * converge on sym + addend at runtime. */
   memset(bytes, 0, sizeof bytes);
   for (i = 0; i < r->width && i < sizeof bytes; ++i) {
     u32 shift = big_endian ? (r->width - 1u - i) * 8u : i * 8u;
     bytes[i] = (u8)((u64)addend >> shift);
   }
   obj_patch(mc->base.obj, r->data_sec, r->data_offset, bytes, r->width);
-  /* Pass the addend in BOTH the inline data bytes AND the reloc record:
-   *   - Mach-O ARM64_RELOC_UNSIGNED uses only the inline value (the .o
-   *     emitter drops the record's addend for UNSIGNED).
-   *   - ELF RELA and the JIT linker's link_reloc_apply use the record
-   *     addend (the inline gets overwritten by S + A).
-   * Both paths converge on sym + addend at runtime. */
-  mc->base.emit_reloc_at(&mc->base, r->data_sec, r->data_offset, r->kind,
-                         mc->base.cur_func_sym, addend,
-                         /*explicit_addend=*/1, /*pair=*/0);
+  mc->base.emit_reloc_at(&mc->base, r->data_sec, r->data_offset, r->kind, sym,
+                         addend, /*explicit_addend=*/1, /*pair=*/0);
 }

 static void apply_fixup(MCImpl* mc, const MCFixup* fx, u32 target_offset) {
@@ -199,6 +198,36 @@ static void apply_fixup(MCImpl* mc, const MCFixup* fx, u32 target_offset) {
   }
 }

+/* Lazily mint (and return) a per-label SB_LOCAL symbol defined at the label's
+ * placement, for code-location references an encoding-divergent assembler must
+ * be able to recompute: switch jump-table entries and `&&label` address-takes
+ * relocate against it instead of baking a fixed offset. Created undefined if the
+ * label is not yet placed (a forward reference) and defined in m_label_place;
+ * defined immediately otherwise. The name is per-object-unique (MCLabel ids are
+ * monotonic within a TU). */
+ObjSymId mc_label_symbol(MCEmitter* m, MCLabel id) {
+  MCImpl* mc = impl_of(m);
+  MCLabelInfo* li;
+  char buf[40];
+  StrBuf sb;
+  Sym name;
+  if (id == MC_LABEL_NONE || id >= mc->nlabels) {
+    compiler_panic(m->c, m->loc, "MCEmitter: bad label %u for symbol",
+                   (unsigned)id);
+  }
+  li = &mc->labels[id];
+  if (li->block_sym != OBJ_SYM_NONE) return li->block_sym;
+  strbuf_init(&sb, buf, sizeof buf);
+  strbuf_put_slice(&sb, SLICE_LIT(".Lcfblk."));
+  strbuf_put_u64(&sb, (u64)id);
+  name = pool_intern_slice(m->c->global, strbuf_slice(&sb));
+  li->block_sym =
+      obj_symbol(m->obj, name, SB_LOCAL, SK_NOTYPE,
+                 li->placed ? li->sec_id : OBJ_SEC_NONE,
+                 li->placed ? (u64)li->offset : 0u, 0);
+  return li->block_sym;
+}
+
 /* ---- vtable methods ---- */

 static void m_set_section(MCEmitter* m, u32 section_id) {
@@ -221,6 +250,7 @@ static MCLabel m_label_new(MCEmitter* m) {
   li->offset = 0;
   li->pending = NULL;
   li->pending_data = NULL;
+  li->block_sym = OBJ_SYM_NONE;
   return (MCLabel)id;
 }

@@ -237,6 +267,11 @@ static void m_label_place(MCEmitter* m, MCLabel id) {
   li->placed = 1;
   li->sec_id = m->section_id;
   li->offset = obj_pos(m->obj, m->section_id);
+  /* Define the lazily-minted block symbol (if any) now that the offset is
+   * known — resolves the forward-reference case for jump-table / &&label
+   * relocations emitted before the label was placed. */
+  if (li->block_sym != OBJ_SYM_NONE)
+    obj_symbol_define(m->obj, li->block_sym, li->sec_id, (u64)li->offset, 0);
   /* Apply pending intra-section fixups. */
   for (MCFixup* fx = li->pending; fx; fx = fx->next) {
     apply_fixup(mc, fx, li->offset);
@@ -247,7 +282,7 @@ static void m_label_place(MCEmitter* m, MCLabel id) {
    * body is currently being emitted; the label is always placed inside
    * its owning function's emit, so the active function context matches. */
   for (MCDataLabelRef* r = li->pending_data; r; r = r->next) {
-    emit_label_data_reloc_now(mc, r, li->offset);
+    emit_label_data_reloc_now(mc, id, r);
   }
   li->pending_data = NULL;
 }
@@ -326,7 +361,7 @@ static void m_emit_label_data_reloc(MCEmitter* m, u32 data_sec, u32 data_offset,
     tmp.width = width;
     tmp.extra_addend = extra_addend;
     tmp.next = NULL;
-    emit_label_data_reloc_now(mc, &tmp, li->offset);
+    emit_label_data_reloc_now(mc, id, &tmp);
     return;
   }
   {
diff --git a/src/arch/mc.h b/src/arch/mc.h
@@ -127,6 +127,14 @@ struct MCEmitter {
 MCEmitter* mc_new(Compiler*, ObjBuilder*);
 void mc_free(MCEmitter*);

+/* Lazily mint (and return) a per-label SB_LOCAL symbol defined at `label`'s
+ * placement. Backends use this to reference a code location relocatably —
+ * `&&label` address-takes emit a PC-relative reloc against it instead of baking
+ * a fixed displacement, so a re-encoding assembler (clang) recomputes the right
+ * address. Forward-ref safe: if `label` is not yet placed the symbol is created
+ * undefined and defined in label_place. */
+ObjSymId mc_label_symbol(MCEmitter*, MCLabel label);
+
 /* Per-function context helpers. Backends call mc_begin_function from
  * their CgTarget func_begin (after computing the post-alignment function
  * start) and mc_end_function from func_end. The pair sets / clears
diff --git a/src/arch/registry.c b/src/arch/registry.c
@@ -110,23 +110,6 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind,
                                         mnemonic_out);
 }

-const char* arch_asm_file_prologue(const Compiler* c) {
-  const ArchImpl* a = arch_for_compiler(c);
-  if (!a || !a->asm_ops || !a->asm_ops->file_prologue) return NULL;
-  return a->asm_ops->file_prologue();
-}
-
-int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic,
-                           CfreeSlice operands, i64* disp_out) {
-  const ArchImpl* a = arch_for_compiler(c);
-  if (!a || !a->asm_ops || !a->asm_ops->pcrel_code_target) return 0;
-  return a->asm_ops->pcrel_code_target(mnemonic, operands, disp_out);
-}
-
-int arch_needs_symbolic_code_refs(const Compiler* c) {
-  const ArchImpl* a = arch_for_compiler(c);
-  return a && a->asm_ops && a->asm_ops->pcrel_code_target != NULL;
-}

 const CGBackend* cg_backend_for_session(const Compiler* c,
                                         const CfreeCodeOptions* opts) {
diff --git a/src/arch/rv64/asm.c b/src/arch/rv64/asm.c
@@ -1110,26 +1110,10 @@ static int rv64_reloc_call_pair(u16 kind, CfreeSlice pair_mnemonic,
   return 0;
 }

-/* RISC-V cc -S file prologue. cfree computes a few PC-relative targets as
- * fixed byte offsets baked into the instruction stream rather than as symbolic
- * relocations: a `&&label` address-of (auipc+addi with a hardcoded immediate,
- * no reloc) and switch jump-table entries (`.quad fn+offset`). Both assume
- * cfree's own 4-byte-per-instruction, un-relaxed layout. A standards-conformant
- * assembler such as clang defaults to the C extension and would compress
- * instructions (e.g. `mv`->`c.mv`), shifting every later offset and sending
- * those targets to the wrong place. `.option norvc`/`.option norelax` pin the
- * layout so cfree's offsets stay valid through any assembler — cfree's own
- * codegen never emits compressed/relaxed forms, so this only constrains a
- * third party to match what cfree already does. */
-static const char* rv64_file_prologue(void) {
-  return "\t.option norvc\n\t.option norelax\n";
-}
-
 const ArchAsmOps rv64_asm_ops = {
     .reloc_operand = rv64_reloc_operand,
     .is_local_branch = rv64_is_local_branch,
     .reloc_call_pair = rv64_reloc_call_pair,
-    .file_prologue = rv64_file_prologue,
 };

 ArchAsm* rv64_arch_asm_new(Compiler* c) {
diff --git a/src/arch/rv64/native.c b/src/arch/rv64/native.c
@@ -1177,10 +1177,25 @@ static void rv_indirect_branch(NativeTarget* t, NativeLoc addr,
 }

 static void rv_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel l) {
+  /* `&&label` address-take: auipc/addi with a %pcrel_hi/%pcrel_lo relocation
+   * pair against the label's per-block local symbol — the same form
+   * rv_emit_global_addr uses for a global — so a compressing/re-encoding
+   * assembler recomputes the displacement (a baked offset would break under
+   * the C extension). */
+  MCEmitter* mc = t->mc;
   u32 rd = loc_reg(dst);
-  rv64_emit32(t->mc, rv_auipc(rd, 0));
-  rv64_emit32(t->mc, rv_addi(rd, rd, 0));
-  t->mc->emit_label_ref(t->mc, l, R_RV_INTRA_AUIPC_ADDI, 8, 0);
+  u32 sec = mc->section_id;
+  ObjSymId sym = mc_label_symbol(mc, l);
+  u32 ap = mc->pos(mc);
+  rv64_emit32(mc, rv_auipc(rd, 0));
+  mc->emit_reloc_at(mc, sec, ap, R_RV_PCREL_HI20, sym, 0, 0, 0);
+  {
+    Sym an = pool_intern_slice(t->c->global, SLICE_LIT(".LpcrelHi"));
+    ObjSymId anchor = obj_symbol(t->obj, an, SB_LOCAL, SK_OBJ, sec, (u64)ap, 0);
+    u32 lp = mc->pos(mc);
+    rv64_emit32(mc, rv_addi(rd, rd, 0));
+    mc->emit_reloc_at(mc, sec, lp, R_RV_PCREL_LO12_I, anchor, 0, 0, 0);
+  }
 }

 /* ============================ frame / lifecycle ============================ */
diff --git a/src/arch/x64/asm.c b/src/arch/x64/asm.c
@@ -1637,75 +1637,9 @@ static int x64_is_local_branch(CfreeSlice m) {
   return 0;
 }

-/* Parse a leading signed integer (decimal or 0x-hex) from [s, s+len). Returns
- * chars consumed and sets *out, or 0 if no integer starts here. */
-static u32 x64_parse_leading_int(const char* s, u32 len, i64* out) {
-  u32 i = 0, start;
-  int neg = 0;
-  i64 v = 0;
-  if (i < len && (s[i] == '+' || s[i] == '-')) {
-    neg = (s[i] == '-');
-    ++i;
-  }
-  if (i + 1 < len && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X')) {
-    i += 2;
-    start = i;
-    for (; i < len; ++i) {
-      char c = s[i];
-      if (c >= '0' && c <= '9')
-        v = v * 16 + (c - '0');
-      else if (c >= 'a' && c <= 'f')
-        v = v * 16 + (c - 'a' + 10);
-      else if (c >= 'A' && c <= 'F')
-        v = v * 16 + (c - 'A' + 10);
-      else
-        break;
-    }
-  } else {
-    start = i;
-    for (; i < len; ++i) {
-      char c = s[i];
-      if (c >= '0' && c <= '9')
-        v = v * 10 + (c - '0');
-      else
-        break;
-    }
-  }
-  if (i == start) return 0;
-  *out = neg ? -v : v;
-  return i;
-}
-
-/* x86-64 `&&label` address-take: an un-relocated `leaq <disp>(%rip), %reg`. The
- * disassembler renders the resolved target as a fixed displacement from the
- * next instruction (the %rip base); report it so the symbolizer can swap in a
- * label that an encoding-divergent assembler will recompute correctly. */
-static int x64_pcrel_code_target(CfreeSlice mnemonic, CfreeSlice operands,
-                                 i64* disp_out) {
-  const char* o = operands.s;
-  u32 ol = operands.len, i, n;
-  i64 disp = 0;
-  int has_rip = 0;
-  if (!(mnemonic.len == 4 && memcmp(mnemonic.s, "leaq", 4) == 0) &&
-      !(mnemonic.len == 3 && memcmp(mnemonic.s, "lea", 3) == 0))
-    return 0;
-  for (i = 0; i + 6 <= ol; ++i)
-    if (memcmp(o + i, "(%rip)", 6) == 0) {
-      has_rip = 1;
-      break;
-    }
-  if (!has_rip) return 0;
-  n = x64_parse_leading_int(o, ol, &disp);
-  /* The displacement must sit immediately before `(%rip)`. */
-  if (n == 0 || !(n + 6 <= ol && memcmp(o + n, "(%rip)", 6) == 0)) return 0;
-  *disp_out = disp;
-  return 1;
-}
-
 const ArchAsmOps x64_asm_ops = {
     .reloc_operand = x64_reloc_operand,
     .is_local_branch = x64_is_local_branch,
-    .pcrel_code_target = x64_pcrel_code_target,
 };

 ArchAsm* x64_arch_asm_new(Compiler* c) { return &x64_asm_open(c)->base; }
diff --git a/src/arch/x64/native.c b/src/arch/x64/native.c
@@ -1347,8 +1347,15 @@ static void x64_indirect_branch(NativeTarget* t, NativeLoc addr,
 }

 static void x64_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel l) {
+  /* `&&label` address-take: `leaq sym(%rip), rd` with an R_PC32 relocation
+   * against the label's per-block local symbol — same form as a global
+   * address-take, so a re-encoding assembler recomputes the displacement.
+   * (A baked disp32 with no reloc would break once clang re-lays-out the
+   * function.) */
   MCEmitter* mc = t->mc;
   u32 rd = loc_reg(dst);
+  ObjSymId sym = mc_label_symbol(mc, l);
+  u32 disp_pos;
   emit_rex(mc, 1, rd, 0, 0);
   {
     u8 op = X64_OPC_LEA;
@@ -1358,8 +1365,9 @@ static void x64_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel l) {
     u8 mr = modrm(0u, rd & 7u, 5u); /* [rip + disp32] */
     mc->emit_bytes(mc, &mr, 1);
   }
+  disp_pos = mc->pos(mc);
   emit_u32le(mc, 0);
-  mc->emit_label_ref(mc, l, R_PC32, 4, -4);
+  mc->emit_reloc_at(mc, mc->section_id, disp_pos, R_PC32, sym, -4, 1, 0);
 }

 /* ============================ frame / lifecycle ============================ */
diff --git a/test/asm/hostas_cross.sh b/test/asm/hostas_cross.sh
@@ -27,20 +27,16 @@
 # clang cross-compiler for it, (2) a runner (podman/qemu) per exec_target, (3) a
 # working `cfree cc -S | cfree as` round-trip for that arch, and (4) a bounded
 # exec smoke that returns the oracle. So the harness runs green on whatever the
-# host supports and self-extends as gaps close. All three ELF targets now pass
-# BOTH lanes end-to-end (936/936 = 312 cases x {O0,O1} x 3 arches, ENFORCE_CLANG):
-#   - aarch64-linux: podman runs arm64 natively in its VM; fixed-width encodings
-#                    keep cfree's layout, so code references need no special form.
-#   - x86_64-linux:  cc -S references code locations symbolically — `&&label`
-#                    address-takes (`leaq Lcf_*(%rip)`) and switch jump-table
-#                    entries (`.quad Lcf_*`) — so clang's encoding choices
-#                    (movabs vs mov-imm32, jmp rel32 vs rel8) can't shift a fixed
-#                    offset onto the wrong instruction. (ArchAsmOps.pcrel_code_target
-#                    + collect_code_anchors; see src/api/asm_emit.c.)
-#   - riscv64-linux: cc -S emits `.option norvc`/`.option norelax` to pin cfree's
-#                    fixed instruction layout against clang's C-extension
-#                    compression, plus the %pcrel_hi/%pcrel_lo + AUIPC/JALR call
-#                    symbolizer.
+# host supports and self-extends as gaps close. All three ELF targets pass BOTH
+# lanes end-to-end (936/936 = 312 cases x {O0,O1} x 3 arches, ENFORCE_CLANG).
+# Code locations that an encoding-divergent assembler must recompute — switch
+# jump-table entries and `&&label` address-takes — are referenced via per-block
+# local symbols emitted by codegen (mc_label_symbol): the jump table is
+# `.quad .Lcfblk.*` (R_ABS64) and the address-take a standard PC-relative reloc
+# against the same symbol (x86-64 leaq/R_PC32, aarch64 adrp+add, riscv64
+# auipc+addi/%pcrel). So the references are genuinely relocatable on every arch
+# and clang's encoding choices (movabs vs mov-imm32, jmp rel32 vs rel8, RVC
+# compression) can't shift a baked offset onto the wrong instruction.
 # Execution under qemu-user (x86_64/riscv64 in their podman containers) is the
 # sole judge — cfree and clang emit different code, so a byte/text match would be
 # meaningless. The batched runner caps each case (EXEC_CASE_TIMEOUT) so one