commit 07949547f417b0fc4fdeb839a3e163f28473fae9
parent 3d661011371f95c34738c1b22bda05e0309a96f5
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sun, 31 May 2026 02:00:19 -0700

cg: reference code locations via per-block local symbols, not fixed offsets

Switch jump-table entries and `&&label` computed-goto address-takes were emitted
as layout-dependent fixed offsets: the jump table as `R_ABS64` against the
enclosing function symbol + a baked byte offset, and the address-take as an
in-place-patched displacement with NO object relocation at all (x86-64
`leaq disp(%rip)`, rv64 `auipc/addi`, aarch64 a 16-byte INTRA-label sequence).
These only stay correct if the assembler preserves cfree's exact instruction
layout — so aarch64 survived by being fixed-width, rv64 needed a `.option norvc`
band-aid, and x86-64 needed a disassembler special case (`pcrel_code_target`)
gated to x64 (`arch_needs_symbolic_code_refs`). That gate was a code smell: the
same fragility was latent in every backend.
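To illustrate the fragility described above, here is a minimal sketch (not cfree code; `toy_offset_of`, `toy_insn_at`, and the instruction-length arrays are hypothetical) of how a byte offset baked under one instruction layout lands on a different instruction once an assembler picks other encodings:

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of instruction `idx` in a stream whose i-th instruction is
 * len[i] bytes long. */
size_t toy_offset_of(const unsigned *len, size_t n, size_t idx) {
  size_t off = 0;
  assert(idx < n);
  for (size_t i = 0; i < idx; ++i) off += len[i];
  return off;
}

/* Index of the instruction that starts exactly at `off`, or -1 when `off`
 * falls inside an instruction or past the end of the stream. */
long toy_insn_at(const unsigned *len, size_t n, size_t off) {
  size_t cur = 0;
  for (size_t i = 0; i < n; ++i) {
    if (cur == off) return (long)i;
    cur += len[i];
  }
  return -1;
}
```

With an assumed fixed-width layout {4, 4, 4, 4} and an assumed compressed re-encoding {2, 4, 2, 4} of the same four instructions, a label on instruction 2 sits at offset 8 in the first layout and offset 6 in the second; reusing the baked 8 in the second layout resolves to instruction 3. A reference through a symbol avoids this because the assembler recomputes the symbol's value after choosing its encodings.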
Fix it once, uniformly: the MCEmitter lazily mints a per-MCLabel SB_LOCAL symbol
(`mc_label_symbol`, forward-ref safe — created undefined, defined at
label_place), and both constructs relocate against it:
- jump table: `R_ABS64` vs `.Lcfblk.*` (addend 0)
- `&&label`: x86-64 `leaq sym(%rip)` R_PC32; aarch64 `adrp/add`
ADR_PREL_PG_HI21 + ADD_ABS_LO12_NC; riscv64 `auipc/addi`
PCREL_HI20 + PCREL_LO12_I (the same forms used to address a global).
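The lazy, forward-ref-safe minting can be sketched in isolation as below. This is a toy model under stated assumptions, not the `mc_label_symbol` implementation: the `Toy*` types, the fixed-size tables, and the `toy_*` functions are hypothetical, and only the `.Lcfblk.` name prefix and the create-undefined-then-define-at-placement behaviour are taken from this commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_SYM_NONE 0u
#define TOY_MAX_LABELS 16u

typedef struct {
  int defined;    /* 0 while the label is still a forward reference */
  uint64_t value; /* section offset once defined */
  char name[32];
} ToySym;

typedef struct {
  int placed;
  uint64_t offset;
  uint32_t sym; /* TOY_SYM_NONE until first requested */
} ToyLabel;

typedef struct {
  ToySym syms[TOY_MAX_LABELS + 1]; /* slot 0 is reserved for TOY_SYM_NONE */
  uint32_t nsyms;
  ToyLabel labels[TOY_MAX_LABELS];
} ToyEmitter;

/* Return the label's symbol, minting it on first use. If the label is not
 * placed yet, the symbol is created undefined. */
uint32_t toy_label_symbol(ToyEmitter *e, uint32_t id) {
  ToyLabel *l;
  uint32_t s;
  assert(id < TOY_MAX_LABELS);
  l = &e->labels[id];
  if (l->sym != TOY_SYM_NONE) return l->sym;
  s = ++e->nsyms; /* at most one symbol per label, so s <= TOY_MAX_LABELS */
  snprintf(e->syms[s].name, sizeof e->syms[s].name, ".Lcfblk.%u", (unsigned)id);
  e->syms[s].defined = l->placed;
  e->syms[s].value = l->placed ? l->offset : 0;
  l->sym = s;
  return s;
}

/* Place the label; define its symbol if one was already minted. */
void toy_label_place(ToyEmitter *e, uint32_t id, uint64_t offset) {
  ToyLabel *l;
  assert(id < TOY_MAX_LABELS);
  l = &e->labels[id];
  assert(!l->placed);
  l->placed = 1;
  l->offset = offset;
  if (l->sym != TOY_SYM_NONE) {
    e->syms[l->sym].defined = 1;
    e->syms[l->sym].value = offset;
  }
}
```

Starting from a zero-initialized `ToyEmitter`, a relocation emitted before the label is placed can record the symbol id right away; the value is filled in when `toy_label_place` runs, and labels that are never referenced this way never get a symbol.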
The existing arch_reloc_operand symbolizer renders all of them, so cc -S needs
no per-arch code; the object is genuinely relocatable; and cc -c / cc -S emit
the same relocations so the round-trip byte+reloc lanes stay faithful.
Deletes the band-aids: the x64 pcrel_code_target hook + the
arch_needs_symbolic_code_refs gate, the rv64 file_prologue (.option norvc), the
aa64 `adr` is_local_branch case, and the asm_emit.c jump-table/leaq special
cases (collect_code_anchors extras, code_target_label, SecReloc target fields).
Net -187 lines. (The now-unused R_*_INTRA_* fixup kinds are left inert.)
Verified green, no regressions: test-hostas-cross 936/936 both lanes on all 3
arches; hostas-toy 312/0; test-asm-roundtrip 572/0 (L1 byte+reloc compare
confirms cc-c==cc-S); roundtrip-toy 624/0 (JIT + native ld); toy 1338/0; asm
27/0; symmetry clean; diff-llvm agrees; link 122/0; elf 40/0; debug OK;
smoke-x64 2/0; smoke-rv64 2/0.
Diffstat:
13 files changed, 163 insertions(+), 350 deletions(-)
diff --git a/doc/ASM_ROUNDTRIP_TESTING.md b/doc/ASM_ROUNDTRIP_TESTING.md
@@ -250,23 +250,22 @@ downgrades to SKIP instead of hanging). All three ELF targets are in the gating
default and pass **both** lanes — **936/936** = 312 cases × {O0,O1} × 3 arches,
cfree-as **and** clang-as, judged purely by execution (matching exit code):
-- **aarch64-linux**: green end-to-end — podman runs arm64 natively in its VM, so
- it's fast. Fixed-width encodings preserve cfree's instruction layout, so code
- references need no special spelling.
-- **x86_64-linux**: cc -S references code locations **symbolically** so clang's
- encoding choices (movabs vs mov-imm32, `jmp` rel32 vs rel8) can't shift a fixed
- byte offset onto the wrong instruction: a `&&label` address-take is
- `leaq Lcf_*(%rip)` (un-relocated PC-relative computes are detected via
- `ArchAsmOps.pcrel_code_target`, the target gets a synthesized label), and
- switch jump-table entries are `.quad Lcf_*` rather than `.quad fn+off`
- (absolute data pointers into an executable section are re-pointed at a
- synthesized code label by `collect_code_anchors` + `code_target_label`).
-- **riscv64-linux**: cc -S emits `.option norvc`/`.option norelax`
- (`ArchAsmOps.file_prologue`) to pin cfree's fixed layout against clang's
- C-extension compression — cfree computes some `&&label`/jump-table targets as
- fixed offsets, which compression would otherwise shift — plus the
- `%pcrel_hi`/`%pcrel_lo` AUIPC-anchor pairing and `R_RISCV_CALL` AUIPC+JALR
- fusion to `call`/`tail`.
+Code locations that an encoding-divergent assembler must be able to recompute —
+switch jump-table entries and `&&label` address-takes — are referenced through a
+**per-basic-block local symbol** the MCEmitter mints (`mc_label_symbol`,
+`src/arch/mc.c`), uniformly on all three arches. The jump table emits
+`.quad .Lcfblk.*` (`R_ABS64` against the block symbol) and the address-take emits
+the arch's standard PC-relative relocation against the same symbol: x86-64
+`leaq .Lcfblk.*(%rip)` (`R_PC32`), aarch64 `adrp`/`add`
+(`ADR_PREL_PG_HI21`+`ADD_ABS_LO12_NC`), riscv64 `auipc`/`addi`
+(`%pcrel_hi`/`%pcrel_lo`). The existing reloc-operand symbolizer renders all of
+them, so the references are genuinely relocatable everywhere and clang's encoding
+choices (movabs vs mov-imm32, `jmp` rel32 vs rel8, RVC compression) can't shift a
+baked offset onto the wrong instruction. `cc -c` and `cc -S` emit the same
+relocations, so the round-trip byte/reloc lanes stay faithful too.
+
+(aarch64-linux runs arm64 natively in the podman VM, so it's the fastest lane;
+x86_64/riscv64 run under qemu-user in their containers.)
Both lanes are judged by **execution**, never by bytes: cfree and clang emit
different (execution-equivalent) code, so a byte/text match would be meaningless.
diff --git a/src/api/asm_emit.c b/src/api/asm_emit.c
@@ -529,8 +529,6 @@ typedef struct {
u16 kind;
Sym sym;
i64 addend;
- ObjSecId target_sec; /* section the reloc's symbol is defined in (or NONE) */
- u64 target_val; /* the symbol's value (offset within target_sec) */
} SecReloc;
static int cmp_secreloc(const void* va, const void* vb) {
@@ -566,8 +564,6 @@ static SecReloc* collect_relocs(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
arr[n].kind = r->kind;
arr[n].sym = s ? s->name : (Sym)0;
arr[n].addend = r->addend;
- arr[n].target_sec = s ? s->section_id : OBJ_SEC_NONE;
- arr[n].target_val = s ? s->value : 0;
++n;
}
if (n > 1) qsort(arr, n, sizeof(SecReloc), cmp_secreloc);
@@ -929,24 +925,17 @@ static void anchor_add(Compiler* c, u32** arr, u32* n, u32* cap, u32 off) {
(*arr)[(*n)++] = off;
}
-/* Pre-scan: offsets in section `sec_id` that need a synthesized Lcf_ label so a
- * layout-dependent reference resolves symbolically through any assembler:
- * 1. targets of un-relocated intra-section local branches (b/jmp/jcc);
- * 2. targets of un-relocated PC-relative code-address-takes (x86-64 `leaq
- * disp(%rip)` for `&&label`);
- * 3. offsets targeted by an absolute data-pointer relocation living in a
- * NON-executable section (switch jump-table `.quad fn+off` entries).
- * (2)/(3) are exactly the references that break when the assembler picks
- * different instruction lengths than cfree did, so they are collected only for
- * arches that need symbolic code refs (x86-64); fixed-width (aarch64) or
- * layout-pinned (RISC-V .option norvc) arches keep the compact offset forms. */
-static u32* collect_code_anchors(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
- ArchDisasm* dasm, const SecReloc* relocs,
- u32 nrelocs, const u8* data, u32 total,
- u32* n_out) {
+/* Pre-scan: collect in-section branch targets of un-relocated local branches,
+ * so cc -S synthesizes a label there and the branch re-assembles. Code-location
+ * references that must survive a re-encoding assembler (switch jump-table
+ * entries, `&&label` address-takes) are NOT handled here — codegen emits them as
+ * relocations against per-block local symbols (mc_label_symbol), so the normal
+ * reloc-operand path symbolizes them and the target label is a real symbol. */
+static u32* collect_branch_targets(Compiler* c, ArchDisasm* dasm,
+ const SecReloc* relocs, u32 nrelocs,
+ const u8* data, u32 total, u32* n_out) {
u32* arr = NULL;
u32 n = 0, cap = 0, off = 0;
- int want_sym = arch_needs_symbolic_code_refs(c);
*n_out = 0;
while (off < total) {
@@ -957,44 +946,14 @@ static u32* collect_code_anchors(Compiler* c, ObjBuilder* ob, ObjSecId sec_id,
off += 1;
continue;
}
- if (!reloc_in_range(relocs, nrelocs, off, nb)) {
- if (arch_is_local_branch(c, insn.mnemonic) &&
- parse_hex_tail(insn.operands, &tgt) && tgt < total) {
- anchor_add(c, &arr, &n, &cap, (u32)tgt);
- } else if (want_sym) {
- i64 disp;
- if (arch_pcrel_code_target(c, insn.mnemonic, insn.operands, &disp)) {
- i64 t = (i64)off + (i64)nb + disp;
- if (t >= 0 && (u64)t < total)
- anchor_add(c, &arr, &n, &cap, (u32)t);
- }
- }
+ if (!reloc_in_range(relocs, nrelocs, off, nb) &&
+ arch_is_local_branch(c, insn.mnemonic) &&
+ parse_hex_tail(insn.operands, &tgt) && tgt < total) {
+ anchor_add(c, &arr, &n, &cap, (u32)tgt);
}
off += nb;
}
- if (want_sym) {
- u32 nr = obj_reloc_total(ob), i;
- for (i = 0; i < nr; ++i) {
- const Reloc* r = obj_reloc_at(ob, i);
- const Section* host;
- const ObjSym* s;
- const char* dir;
- u32 width;
- int pcrel;
- i64 t;
- if (!r || r->removed) continue;
- host = obj_section_get(ob, r->section_id);
- if (!host || (host->flags & SF_EXEC)) continue; /* code reloc: skip */
- if (!data_reloc_directive(r->kind, &dir, &width, &pcrel) || pcrel)
- continue; /* only absolute data pointers (jump-table entries) */
- s = obj_symbol_get(ob, r->sym);
- if (!s || s->section_id != sec_id) continue;
- t = (i64)s->value + r->addend;
- if (t >= 0 && (u64)t < total) anchor_add(c, &arr, &n, &cap, (u32)t);
- }
- }
-
if (n > 1) qsort(arr, n, sizeof(u32), cmp_u32);
*n_out = n;
return arr;
@@ -1044,65 +1003,20 @@ static CfreeStatus emit_operands(Writer* w, const EmitCtx* x,
return w_symbolized(w, insn->operands.s, insn->operands.len, name,
ARCH_RELOC_SURG_TAIL);
}
- } else {
- /* Un-relocated PC-relative code-address-take (x86-64 `leaq disp(%rip)` for
- * `&&label`): rewrite the fixed displacement to the synthesized target
- * label so an encoding-divergent assembler recomputes it. */
- i64 disp;
- if (arch_pcrel_code_target(x->c, insn->mnemonic, insn->operands, &disp)) {
- i64 t = (i64)off + (i64)insn->nbytes + disp;
- if (t >= 0 && is_btarget(x, (u32)t)) {
- char name[256];
- build_label_name(name, sizeof name, x, (u32)t);
- return w_symbolized(w, insn->operands.s, insn->operands.len, name,
- ARCH_RELOC_SURG_RIP);
- }
- }
}
return cfree_writer_write(w, insn->operands.s, insn->operands.len);
}
-/* Symbolic name for a code location (target_sec:target_off) referenced from a
- * data directive: an assemblable label defined exactly there if one exists,
- * else the synthesized `Lcf_<sec>_<off>` that collect_code_anchors guarantees
- * is emitted in the target section. Mirrors the synth-vs-real choice the label
- * emitter makes (symbol_at / build_label_name), so both ends agree. */
-static u32 code_target_label(char* buf, u32 cap, Compiler* c, ObjBuilder* ob,
- ObjSecId target_sec, u32 target_off) {
- ObjSymIter* it = obj_symiter_new(ob);
- if (it) {
- ObjSymEntry e;
- while (obj_symiter_next(it, &e)) {
- const ObjSym* s = e.sym;
- Slice nm;
- if (!s || s->removed || !s->name) continue;
- if (s->section_id != target_sec || (u32)s->value != target_off) continue;
- if (s->kind == SK_SECTION || s->kind == SK_FILE) continue;
- nm = pool_slice(c->global, s->name);
- if (slice_eq_cstr(nm, ".LpcrelHi")) continue;
- if (sym_is_assemblable(nm)) {
- u32 p = 0, j;
- for (j = 0; j < nm.len && p + 1 < cap; ++j) buf[p++] = nm.s[j];
- buf[p] = '\0';
- obj_symiter_free(it);
- return p;
- }
- }
- obj_symiter_free(it);
- }
- return fmt_synth_label(buf, cap, (u32)target_sec, target_off);
-}
-
/* Emit a data range, rendering any covered relocation as a symbolic integer
* directive (`.quad sym+addend`) so cc -S | as reproduces the data relocation
- * table — switch jump tables (R_ABS64 against the function) and any other
- * relocated rodata/data. A reloc kind with no integer-directive form, or a
- * target the assembler can't spell, falls back to raw `.byte`; the dropped
- * reloc then surfaces in the round-trip's reloc comparison. `relocs` is the
- * section's relocation list, sorted by offset. */
-static CfreeStatus emit_data_range(Writer* w, Compiler* c, ObjBuilder* ob,
- const u8* data, u32 start, u32 end,
- const SecReloc* relocs, u32 nrelocs) {
+ * table — switch jump tables (`.quad .Lcfblk.*` against per-block local
+ * symbols) and any other relocated rodata/data. A reloc kind with no
+ * integer-directive form, or a target the assembler can't spell, falls back to
+ * raw `.byte`; the dropped reloc then surfaces in the round-trip's reloc
+ * comparison. `relocs` is the section's relocation list, sorted by offset. */
+static CfreeStatus emit_data_range(Writer* w, Compiler* c, const u8* data,
+ u32 start, u32 end, const SecReloc* relocs,
+ u32 nrelocs) {
u32 off = start;
while (off < end) {
const SecReloc* r = NULL;
@@ -1128,33 +1042,6 @@ static CfreeStatus emit_data_range(Writer* w, Compiler* c, ObjBuilder* ob,
* re-derives R_PC{32,64} instead of an absolute reloc. */
ArchRelocOperand bare = {ARCH_RELOC_SURG_NONE, "", "", 0, 0, 0};
if (data_reloc_directive(r->kind, &dir, &width, &pcrel) &&
- off + width <= end) {
- const Section* tsec = (r->target_sec != OBJ_SEC_NONE)
- ? obj_section_get(ob, r->target_sec)
- : NULL;
- /* An absolute pointer into executable code (switch jump-table entry):
- * spell it as a label that moves with the code rather than `fn+off`.
- * After an encoding-divergent assembler re-lays-out the function, a
- * fixed offset would point into the wrong instruction; a label is
- * recomputed to the correct address. Only for arches that need it. */
- if (!pcrel && tsec && (tsec->flags & SF_EXEC) &&
- arch_needs_symbolic_code_refs(c)) {
- char label[256];
- u64 toff = r->target_val + (u64)r->addend;
- CfreeStatus st;
- code_target_label(label, sizeof label, c, ob, r->target_sec,
- (u32)toff);
- st = w_str(w, dir);
- if (st != CFREE_OK) return st;
- st = w_str(w, label);
- if (st != CFREE_OK) return st;
- st = w_newline(w);
- if (st != CFREE_OK) return st;
- off += width;
- continue;
- }
- }
- if (data_reloc_directive(r->kind, &dir, &width, &pcrel) &&
off + width <= end &&
build_symref(symref, sizeof symref, c, &bare, r->sym, r->addend) >=
0) {
@@ -1316,13 +1203,6 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
sx.c = c;
nsec = obj_section_count(ob);
- /* Arch-specific leading directives (e.g. RISC-V `.option norvc` to pin
- * cfree's fixed instruction layout against a compressing assembler). */
- {
- const char* prologue = arch_asm_file_prologue(c);
- if (prologue) w_str(w, prologue);
- }
-
for (i = 1; i < nsec; ++i) {
const Section* sec = obj_section_get(ob, (ObjSecId)i);
SymLabel* labels;
@@ -1368,8 +1248,8 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
buf_flatten(&sec->bytes, heap_data);
flat_data = heap_data;
if (dasm)
- btargets = collect_code_anchors(c, ob, (ObjSecId)i, dasm, relocs,
- nrelocs, flat_data, total, &nbt);
+ btargets = collect_branch_targets(c, dasm, relocs, nrelocs, flat_data,
+ total, &nbt);
}
} else if (total > 0 && sec->kind != SEC_BSS) {
Heap* heap = c->ctx->heap;
@@ -1422,7 +1302,7 @@ CfreeStatus cfree_obj_builder_emit_asm(CfreeObjBuilder* builder,
} else if ((sec->flags & SF_EXEC) && dasm && flat_data) {
emit_disasm_range(w, &ctx, dasm, flat_data, off, next);
} else if (flat_data) {
- emit_data_range(w, c, ob, flat_data, off, next, relocs, nrelocs);
+ emit_data_range(w, c, flat_data, off, next, relocs, nrelocs);
}
off = next;
}
diff --git a/src/arch/aa64/asm.c b/src/arch/aa64/asm.c
@@ -492,7 +492,6 @@ static int aa64_is_local_branch(CfreeSlice m) {
if (m.len == 4 && memcmp(m.s, "cbnz", 4) == 0) return 1;
if (m.len == 3 && memcmp(m.s, "tbz", 3) == 0) return 1;
if (m.len == 4 && memcmp(m.s, "tbnz", 4) == 0) return 1;
- if (m.len == 3 && memcmp(m.s, "adr", 3) == 0) return 1;
return 0;
}
diff --git a/src/arch/aa64/native.c b/src/arch/aa64/native.c
@@ -1639,11 +1639,22 @@ static void aa_indirect_branch(NativeTarget* t, NativeLoc addr,
}
static void aa_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel target) {
- aa_emit32(t->mc, aa64_adr(loc_reg(dst), 0, 0));
- aa_emit32(t->mc, aa64_b(3));
- aa_emit32(t->mc, 0);
- aa_emit32(t->mc, 0);
- t->mc->emit_label_ref(t->mc, target, R_AARCH64_INTRA_LABEL_ADDR, 16, 0);
+ /* `&&label` address-take: adrp/add with the ADR_PREL_PG_HI21 + ADD_ABS_LO12_NC
+ * relocation pair against the label's per-block local symbol — the same form
+ * used to address a global — so the reference is genuinely relocatable
+ * (reaches ±4 GiB) and any assembler resolves it from the symbol. Replaces the
+ * old 16-byte INTRA-label sequence with a baked offset. */
+ MCEmitter* mc = t->mc;
+ u32 rd = loc_reg(dst);
+ ObjSymId sym = mc_label_symbol(mc, target);
+ u32 pos = mc->pos(mc);
+ aa_emit32(mc, aa64_adrp(rd, 0, 0));
+ mc->emit_reloc_at(mc, mc->section_id, pos, R_AARCH64_ADR_PREL_PG_HI21, sym, 0,
+ 0, 0);
+ pos = mc->pos(mc);
+ aa_emit32(mc, aa64_add_imm(1, rd, rd, 0, 0));
+ mc->emit_reloc_at(mc, mc->section_id, pos, R_AARCH64_ADD_ABS_LO12_NC, sym, 0,
+ 0, 0);
}
static void aa_move(NativeTarget* t, NativeLoc dst, NativeLoc src) {
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -244,28 +244,6 @@ typedef struct ArchAsmOps {
* pair fusion for the arch. */
int (*reloc_call_pair)(u16 reloc_kind, CfreeSlice pair_mnemonic,
CfreeSlice pair_ops, const char** mnemonic_out);
- /* Arch-specific leading directives emitted at the very top of a cc -S file,
- * before any section, returned as a NUL-terminated string the printer writes
- * verbatim (NULL = none). RISC-V returns "\t.option norvc\n.option norelax\n":
- * cfree's codegen computes some PC-relative label / jump-table targets as
- * fixed byte offsets that assume its own uncompressed, un-relaxed instruction
- * stream, so a third-party assembler (clang) must be told not to compress or
- * relax, or those offsets shift and the targets break. aarch64/x86-64 have
- * fixed-width encodings and no such layout dependence -> NULL. */
- const char* (*file_prologue)(void);
- /* 1 if (mnemonic, operands) is an un-relocated PC-relative reference to a
- * code address computed as a fixed displacement — x86-64 `leaq disp(%rip),
- * reg` emitted for a `&&label` address-take. Sets *disp_out to the signed
- * byte displacement from the END of the instruction to the target. The
- * symbolizer then synthesizes a label at (insn_end + disp) and rewrites the
- * displacement to that label so a re-encoding assembler recomputes it.
- * Providing this hook ALSO opts the arch into symbolic switch jump-table
- * entries (.quad fn+off -> .quad <label>): both are needed precisely when the
- * arch's assembler may pick different instruction lengths than cfree did
- * (x86-64 movabs/mov-imm32, jmp rel32/rel8). Fixed-width arches (aarch64) and
- * arches that pin layout another way (RISC-V .option norvc) leave it NULL. */
- int (*pcrel_code_target)(CfreeSlice mnemonic, CfreeSlice operands,
- i64* disp_out);
} ArchAsmOps;
typedef struct ArchImpl {
@@ -334,23 +312,6 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind,
CfreeSlice pair_mnemonic, CfreeSlice pair_ops,
const char** mnemonic_out);
-/* Leading directive string for the top of a cc -S file for the compiler's
- * target arch (e.g. RISC-V `.option norvc`), or NULL when the arch needs none.
- * Thin dispatch over ArchAsmOps.file_prologue. */
-const char* arch_asm_file_prologue(const Compiler* c);
-
-/* 1 if `insn` is an un-relocated PC-relative code-address-take for the target
- * arch, with *disp_out set to the signed displacement from the instruction end
- * to the target. Thin dispatch over ArchAsmOps.pcrel_code_target. */
-int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic,
- CfreeSlice operands, i64* disp_out);
-
-/* 1 if the target arch needs code locations referenced symbolically (by label)
- * rather than as fixed byte offsets in cc -S — true exactly for arches that
- * provide pcrel_code_target (x86-64). Drives both `&&label` address-take and
- * switch jump-table symbolization. */
-int arch_needs_symbolic_code_refs(const Compiler* c);
-
ArchDisasm* arch_disasm_new(Compiler*);
u32 arch_disasm_decode(ArchDisasm*, const u8* bytes, size_t len, u64 vaddr,
CfreeInsn* out);
diff --git a/src/arch/mc.c b/src/arch/mc.c
@@ -32,6 +32,7 @@
#include "core/buf.h"
#include "core/heap.h"
#include "core/pool.h"
+#include "core/strbuf.h"
#include "debug/dwarf_defs.h"
#include "obj/obj.h"
@@ -68,6 +69,12 @@ typedef struct MCLabelInfo {
u32 offset;
MCFixup* pending;
MCDataLabelRef* pending_data;
+ /* Lazily-minted SB_LOCAL symbol for this label, for code-location
+ * references that must survive a re-encoding assembler: switch jump-table
+ * entries (.quad <sym>) and `&&label` address-takes (a PC-relative reloc
+ * against <sym>). OBJ_SYM_NONE until first requested via mc_label_symbol;
+ * defined at the label's offset in m_label_place (forward-ref safe). */
+ ObjSymId block_sym;
} MCLabelInfo;
/* ---- CFI buffering (.eh_frame producer) ----
@@ -140,39 +147,31 @@ static void labels_grow(MCImpl* mc, u32 want) {
mc->cap = ncap;
}
-static void emit_label_data_reloc_now(MCImpl* mc, const MCDataLabelRef* r,
- u32 label_offset) {
- i64 addend;
+static void emit_label_data_reloc_now(MCImpl* mc, MCLabel label,
+ const MCDataLabelRef* r) {
+ /* Reference the label's per-block local symbol (its value IS the label's
+ * offset) rather than the enclosing function symbol + a baked byte offset.
+ * That makes the entry genuinely relocatable: a third-party assembler that
+ * re-encodes the function to different instruction lengths still resolves it
+ * to the right address (a fixed fn+offset would point into the wrong
+ * instruction). */
+ ObjSymId sym = mc_label_symbol(&mc->base, label);
+ i64 addend = r->extra_addend;
u8 bytes[8];
u32 i;
- int big_endian;
- if (mc->base.cur_func_sym == OBJ_SYM_NONE) {
- compiler_panic(mc->base.c, mc->base.loc,
- "MCEmitter: label-data reloc resolved outside a function");
- }
- addend = (i64)label_offset - (i64)mc->base.cur_func_start + r->extra_addend;
- /* Patch the inline addend into the data bytes. Object formats that
- * carry the addend in the relocation record (ELF RELA) read both
- * inline and r->addend; static link adds them. Mach-O R_ABS64
- * (ARM64_RELOC_UNSIGNED) only reads the inline addend. Write the
- * computed addend inline and pass 0 in the reloc so both formats
- * resolve to the same runtime address. */
- big_endian = mc->base.c->target.big_endian;
+ int big_endian = mc->base.c->target.big_endian;
+ /* Patch the inline addend (Mach-O ARM64_RELOC_UNSIGNED reads only the inline
+ * value) and also pass it in the reloc record (ELF RELA / the JIT linker's
+ * link_reloc_apply, where the inline gets overwritten by S + A). Both paths
+ * converge on sym + addend at runtime. */
memset(bytes, 0, sizeof bytes);
for (i = 0; i < r->width && i < sizeof bytes; ++i) {
u32 shift = big_endian ? (r->width - 1u - i) * 8u : i * 8u;
bytes[i] = (u8)((u64)addend >> shift);
}
obj_patch(mc->base.obj, r->data_sec, r->data_offset, bytes, r->width);
- /* Pass the addend in BOTH the inline data bytes AND the reloc record:
- * - Mach-O ARM64_RELOC_UNSIGNED uses only the inline value (the .o
- * emitter drops the record's addend for UNSIGNED).
- * - ELF RELA and the JIT linker's link_reloc_apply use the record
- * addend (the inline gets overwritten by S + A).
- * Both paths converge on sym + addend at runtime. */
- mc->base.emit_reloc_at(&mc->base, r->data_sec, r->data_offset, r->kind,
- mc->base.cur_func_sym, addend,
- /*explicit_addend=*/1, /*pair=*/0);
+ mc->base.emit_reloc_at(&mc->base, r->data_sec, r->data_offset, r->kind, sym,
+ addend, /*explicit_addend=*/1, /*pair=*/0);
}
static void apply_fixup(MCImpl* mc, const MCFixup* fx, u32 target_offset) {
@@ -199,6 +198,36 @@ static void apply_fixup(MCImpl* mc, const MCFixup* fx, u32 target_offset) {
}
}
+/* Lazily mint (and return) a per-label SB_LOCAL symbol defined at the label's
+ * placement, for code-location references an encoding-divergent assembler must
+ * be able to recompute: switch jump-table entries and `&&label` address-takes
+ * relocate against it instead of baking a fixed offset. Created undefined if the
+ * label is not yet placed (a forward reference) and defined in m_label_place;
+ * defined immediately otherwise. The name is per-object-unique (MCLabel ids are
+ * monotonic within a TU). */
+ObjSymId mc_label_symbol(MCEmitter* m, MCLabel id) {
+ MCImpl* mc = impl_of(m);
+ MCLabelInfo* li;
+ char buf[40];
+ StrBuf sb;
+ Sym name;
+ if (id == MC_LABEL_NONE || id >= mc->nlabels) {
+ compiler_panic(m->c, m->loc, "MCEmitter: bad label %u for symbol",
+ (unsigned)id);
+ }
+ li = &mc->labels[id];
+ if (li->block_sym != OBJ_SYM_NONE) return li->block_sym;
+ strbuf_init(&sb, buf, sizeof buf);
+ strbuf_put_slice(&sb, SLICE_LIT(".Lcfblk."));
+ strbuf_put_u64(&sb, (u64)id);
+ name = pool_intern_slice(m->c->global, strbuf_slice(&sb));
+ li->block_sym =
+ obj_symbol(m->obj, name, SB_LOCAL, SK_NOTYPE,
+ li->placed ? li->sec_id : OBJ_SEC_NONE,
+ li->placed ? (u64)li->offset : 0u, 0);
+ return li->block_sym;
+}
+
/* ---- vtable methods ---- */
static void m_set_section(MCEmitter* m, u32 section_id) {
@@ -221,6 +250,7 @@ static MCLabel m_label_new(MCEmitter* m) {
li->offset = 0;
li->pending = NULL;
li->pending_data = NULL;
+ li->block_sym = OBJ_SYM_NONE;
return (MCLabel)id;
}
@@ -237,6 +267,11 @@ static void m_label_place(MCEmitter* m, MCLabel id) {
li->placed = 1;
li->sec_id = m->section_id;
li->offset = obj_pos(m->obj, m->section_id);
+ /* Define the lazily-minted block symbol (if any) now that the offset is
+ * known — resolves the forward-reference case for jump-table / &&label
+ * relocations emitted before the label was placed. */
+ if (li->block_sym != OBJ_SYM_NONE)
+ obj_symbol_define(m->obj, li->block_sym, li->sec_id, (u64)li->offset, 0);
/* Apply pending intra-section fixups. */
for (MCFixup* fx = li->pending; fx; fx = fx->next) {
apply_fixup(mc, fx, li->offset);
@@ -247,7 +282,7 @@ static void m_label_place(MCEmitter* m, MCLabel id) {
* body is currently being emitted; the label is always placed inside
* its owning function's emit, so the active function context matches. */
for (MCDataLabelRef* r = li->pending_data; r; r = r->next) {
- emit_label_data_reloc_now(mc, r, li->offset);
+ emit_label_data_reloc_now(mc, id, r);
}
li->pending_data = NULL;
}
@@ -326,7 +361,7 @@ static void m_emit_label_data_reloc(MCEmitter* m, u32 data_sec, u32 data_offset,
tmp.width = width;
tmp.extra_addend = extra_addend;
tmp.next = NULL;
- emit_label_data_reloc_now(mc, &tmp, li->offset);
+ emit_label_data_reloc_now(mc, id, &tmp);
return;
}
{
diff --git a/src/arch/mc.h b/src/arch/mc.h
@@ -127,6 +127,14 @@ struct MCEmitter {
MCEmitter* mc_new(Compiler*, ObjBuilder*);
void mc_free(MCEmitter*);
+/* Lazily mint (and return) a per-label SB_LOCAL symbol defined at `label`'s
+ * placement. Backends use this to reference a code location relocatably —
+ * `&&label` address-takes emit a PC-relative reloc against it instead of baking
+ * a fixed displacement, so a re-encoding assembler (clang) recomputes the right
+ * address. Forward-ref safe: if `label` is not yet placed the symbol is created
+ * undefined and defined in label_place. */
+ObjSymId mc_label_symbol(MCEmitter*, MCLabel label);
+
/* Per-function context helpers. Backends call mc_begin_function from
* their CgTarget func_begin (after computing the post-alignment function
* start) and mc_end_function from func_end. The pair sets / clears
diff --git a/src/arch/registry.c b/src/arch/registry.c
@@ -110,23 +110,6 @@ int arch_reloc_call_pair(const Compiler* c, u16 reloc_kind,
mnemonic_out);
}
-const char* arch_asm_file_prologue(const Compiler* c) {
- const ArchImpl* a = arch_for_compiler(c);
- if (!a || !a->asm_ops || !a->asm_ops->file_prologue) return NULL;
- return a->asm_ops->file_prologue();
-}
-
-int arch_pcrel_code_target(const Compiler* c, CfreeSlice mnemonic,
- CfreeSlice operands, i64* disp_out) {
- const ArchImpl* a = arch_for_compiler(c);
- if (!a || !a->asm_ops || !a->asm_ops->pcrel_code_target) return 0;
- return a->asm_ops->pcrel_code_target(mnemonic, operands, disp_out);
-}
-
-int arch_needs_symbolic_code_refs(const Compiler* c) {
- const ArchImpl* a = arch_for_compiler(c);
- return a && a->asm_ops && a->asm_ops->pcrel_code_target != NULL;
-}
const CGBackend* cg_backend_for_session(const Compiler* c,
const CfreeCodeOptions* opts) {
diff --git a/src/arch/rv64/asm.c b/src/arch/rv64/asm.c
@@ -1110,26 +1110,10 @@ static int rv64_reloc_call_pair(u16 kind, CfreeSlice pair_mnemonic,
return 0;
}
-/* RISC-V cc -S file prologue. cfree computes a few PC-relative targets as
- * fixed byte offsets baked into the instruction stream rather than as symbolic
- * relocations: a `&&label` address-of (auipc+addi with a hardcoded immediate,
- * no reloc) and switch jump-table entries (`.quad fn+offset`). Both assume
- * cfree's own 4-byte-per-instruction, un-relaxed layout. A standards-conformant
- * assembler such as clang defaults to the C extension and would compress
- * instructions (e.g. `mv`->`c.mv`), shifting every later offset and sending
- * those targets to the wrong place. `.option norvc`/`.option norelax` pin the
- * layout so cfree's offsets stay valid through any assembler — cfree's own
- * codegen never emits compressed/relaxed forms, so this only constrains a
- * third party to match what cfree already does. */
-static const char* rv64_file_prologue(void) {
- return "\t.option norvc\n\t.option norelax\n";
-}
-
const ArchAsmOps rv64_asm_ops = {
.reloc_operand = rv64_reloc_operand,
.is_local_branch = rv64_is_local_branch,
.reloc_call_pair = rv64_reloc_call_pair,
- .file_prologue = rv64_file_prologue,
};
ArchAsm* rv64_arch_asm_new(Compiler* c) {
diff --git a/src/arch/rv64/native.c b/src/arch/rv64/native.c
@@ -1177,10 +1177,25 @@ static void rv_indirect_branch(NativeTarget* t, NativeLoc addr,
}
static void rv_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel l) {
+ /* `&&label` address-take: auipc/addi with a %pcrel_hi/%pcrel_lo relocation
+ * pair against the label's per-block local symbol — the same form
+ * rv_emit_global_addr uses for a global — so a compressing/re-encoding
+ * assembler recomputes the displacement (a baked offset would break under
+ * the C extension). */
+ MCEmitter* mc = t->mc;
u32 rd = loc_reg(dst);
- rv64_emit32(t->mc, rv_auipc(rd, 0));
- rv64_emit32(t->mc, rv_addi(rd, rd, 0));
- t->mc->emit_label_ref(t->mc, l, R_RV_INTRA_AUIPC_ADDI, 8, 0);
+ u32 sec = mc->section_id;
+ ObjSymId sym = mc_label_symbol(mc, l);
+ u32 ap = mc->pos(mc);
+ rv64_emit32(mc, rv_auipc(rd, 0));
+ mc->emit_reloc_at(mc, sec, ap, R_RV_PCREL_HI20, sym, 0, 0, 0);
+ {
+ Sym an = pool_intern_slice(t->c->global, SLICE_LIT(".LpcrelHi"));
+ ObjSymId anchor = obj_symbol(t->obj, an, SB_LOCAL, SK_OBJ, sec, (u64)ap, 0);
+ u32 lp = mc->pos(mc);
+ rv64_emit32(mc, rv_addi(rd, rd, 0));
+ mc->emit_reloc_at(mc, sec, lp, R_RV_PCREL_LO12_I, anchor, 0, 0, 0);
+ }
}
/* ============================ frame / lifecycle ============================ */
diff --git a/src/arch/x64/asm.c b/src/arch/x64/asm.c
@@ -1637,75 +1637,9 @@ static int x64_is_local_branch(CfreeSlice m) {
return 0;
}
-/* Parse a leading signed integer (decimal or 0x-hex) from [s, s+len). Returns
- * chars consumed and sets *out, or 0 if no integer starts here. */
-static u32 x64_parse_leading_int(const char* s, u32 len, i64* out) {
- u32 i = 0, start;
- int neg = 0;
- i64 v = 0;
- if (i < len && (s[i] == '+' || s[i] == '-')) {
- neg = (s[i] == '-');
- ++i;
- }
- if (i + 1 < len && s[i] == '0' && (s[i + 1] == 'x' || s[i + 1] == 'X')) {
- i += 2;
- start = i;
- for (; i < len; ++i) {
- char c = s[i];
- if (c >= '0' && c <= '9')
- v = v * 16 + (c - '0');
- else if (c >= 'a' && c <= 'f')
- v = v * 16 + (c - 'a' + 10);
- else if (c >= 'A' && c <= 'F')
- v = v * 16 + (c - 'A' + 10);
- else
- break;
- }
- } else {
- start = i;
- for (; i < len; ++i) {
- char c = s[i];
- if (c >= '0' && c <= '9')
- v = v * 10 + (c - '0');
- else
- break;
- }
- }
- if (i == start) return 0;
- *out = neg ? -v : v;
- return i;
-}
-
-/* x86-64 `&&label` address-take: an un-relocated `leaq <disp>(%rip), %reg`. The
- * disassembler renders the resolved target as a fixed displacement from the
- * next instruction (the %rip base); report it so the symbolizer can swap in a
- * label that an encoding-divergent assembler will recompute correctly. */
-static int x64_pcrel_code_target(CfreeSlice mnemonic, CfreeSlice operands,
- i64* disp_out) {
- const char* o = operands.s;
- u32 ol = operands.len, i, n;
- i64 disp = 0;
- int has_rip = 0;
- if (!(mnemonic.len == 4 && memcmp(mnemonic.s, "leaq", 4) == 0) &&
- !(mnemonic.len == 3 && memcmp(mnemonic.s, "lea", 3) == 0))
- return 0;
- for (i = 0; i + 6 <= ol; ++i)
- if (memcmp(o + i, "(%rip)", 6) == 0) {
- has_rip = 1;
- break;
- }
- if (!has_rip) return 0;
- n = x64_parse_leading_int(o, ol, &disp);
- /* The displacement must sit immediately before `(%rip)`. */
- if (n == 0 || !(n + 6 <= ol && memcmp(o + n, "(%rip)", 6) == 0)) return 0;
- *disp_out = disp;
- return 1;
-}
-
const ArchAsmOps x64_asm_ops = {
.reloc_operand = x64_reloc_operand,
.is_local_branch = x64_is_local_branch,
- .pcrel_code_target = x64_pcrel_code_target,
};
ArchAsm* x64_arch_asm_new(Compiler* c) { return &x64_asm_open(c)->base; }
diff --git a/src/arch/x64/native.c b/src/arch/x64/native.c
@@ -1347,8 +1347,15 @@ static void x64_indirect_branch(NativeTarget* t, NativeLoc addr,
}
static void x64_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel l) {
+ /* `&&label` address-take: `leaq sym(%rip), rd` with an R_PC32 relocation
+ * against the label's per-block local symbol, the same form as a global
+ * address-take, so a re-encoding assembler recomputes the displacement.
+ * (A baked disp32 with no reloc would break once clang lays the function
+ * out differently.) */
MCEmitter* mc = t->mc;
u32 rd = loc_reg(dst);
+ ObjSymId sym = mc_label_symbol(mc, l);
+ u32 disp_pos;
emit_rex(mc, 1, rd, 0, 0);
{
u8 op = X64_OPC_LEA;
@@ -1358,8 +1365,9 @@ static void x64_load_label_addr(NativeTarget* t, NativeLoc dst, MCLabel l) {
u8 mr = modrm(0u, rd & 7u, 5u); /* [rip + disp32] */
mc->emit_bytes(mc, &mr, 1);
}
+ disp_pos = mc->pos(mc);
emit_u32le(mc, 0);
- mc->emit_label_ref(mc, l, R_PC32, 4, -4);
+ mc->emit_reloc_at(mc, mc->section_id, disp_pos, R_PC32, sym, -4, 1, 0);
}
/* ============================ frame / lifecycle ============================ */
diff --git a/test/asm/hostas_cross.sh b/test/asm/hostas_cross.sh
@@ -27,20 +27,16 @@
# clang cross-compiler for it, (2) a runner (podman/qemu) per exec_target, (3) a
# working `cfree cc -S | cfree as` round-trip for that arch, and (4) a bounded
# exec smoke that returns the oracle. So the harness runs green on whatever the
-# host supports and self-extends as gaps close. All three ELF targets now pass
-# BOTH lanes end-to-end (936/936 = 312 cases x {O0,O1} x 3 arches, ENFORCE_CLANG):
-# - aarch64-linux: podman runs arm64 natively in its VM; fixed-width encodings
-# keep cfree's layout, so code references need no special form.
-# - x86_64-linux: cc -S references code locations symbolically — `&&label`
-# address-takes (`leaq Lcf_*(%rip)`) and switch jump-table
-# entries (`.quad Lcf_*`) — so clang's encoding choices
-# (movabs vs mov-imm32, jmp rel32 vs rel8) can't shift a fixed
-# offset onto the wrong instruction. (ArchAsmOps.pcrel_code_target
-# + collect_code_anchors; see src/api/asm_emit.c.)
-# - riscv64-linux: cc -S emits `.option norvc`/`.option norelax` to pin cfree's
-# fixed instruction layout against clang's C-extension
-# compression, plus the %pcrel_hi/%pcrel_lo + AUIPC/JALR call
-# symbolizer.
+# host supports and self-extends as gaps close. All three ELF targets pass BOTH
+# lanes end-to-end (936/936 = 312 cases x {O0,O1} x 3 arches, ENFORCE_CLANG).
+# Code locations that an encoding-divergent assembler must recompute (switch
+# jump-table entries and `&&label` address-takes) are referenced via per-block
+# local symbols minted by codegen (mc_label_symbol): a jump-table entry is
+# `.quad .Lcfblk.*` (R_ABS64) and an address-take is a standard PC-relative
+# reloc against such a symbol (x86-64 leaq/R_PC32, aarch64 adrp+add, riscv64
+# auipc+addi/%pcrel). No offset is baked in, so each reference is genuinely
+# relocatable on every arch, and clang's encoding choices (movabs vs mov-imm32,
+# jmp rel32 vs rel8, RVC compression) can't land it on the wrong instruction.
# Execution under qemu-user (x86_64/riscv64 in their podman containers) is the
# sole judge — cfree and clang emit different code, so a byte/text match would be
# meaningless. The batched runner caps each case (EXEC_CASE_TIMEOUT) so one