commit 9e9e4d219953c1a7c1abdf8e319802a67047eed0
parent e384489fd26ddedf0845e0ceb688cfa9b1f7e26f
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 11 May 2026 11:46:26 -0700
jit: close out reloc-apply gaps from doc/JIT.md
Four changes that together take Mach-O Path J from 92/100 to
100/100 and unblock `cfree run` calling host libc:
1. layout_got now runs on the Mach-O JIT path (gated on
!emit_static_exe), giving cross-TU data loads, weak symbols, and
common-coalesce the same GOT-slot-routed shape the ELF JIT path already uses. The
Mach-O exe path keeps using link_macho.c::collect_imports.
2. cfree_jit_from_image reserves the full image span as a single
contiguous mapping and subdivides it per segment. Inter-segment
displacements are then bounded by the image size, so the ADRP
(±4 GiB) and CALL26 (±128 MiB) range checks no longer depend on
where the OS happens to place each mmap.
3. New layout pass layout_jit_call_stubs synthesizes 12-byte
ADRP+LDR+BR PLT-style stubs, each backed by an 8-byte GOT-like slot,
for every resolver-supplied / weak-undef SK_ABS target hit by
CALL26/JUMP26 on AArch64. Slots are filled by R_ABS64 against
a synthetic resolver-pointer LinkSymbol that preserves the
original (host) vaddr. cfree run can now call libc directly.
4. The IFUNC trio (32_ifunc, 33_ifunc_in_init, 34_ifunc_addr_taken)
gets j_targets files restricting Path J to ELF tuples — IFUNC
is ELF-only at the format level, matching the pre-existing
e_targets exclusion on 33_ifunc_in_init.
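Item 2's range guarantee reduces to one inequality: for any two addresses p and t inside a single reservation of `span` bytes, |t - p| < span. So as long as the span is under 128 MiB (CALL26) or 4 GiB (ADRP), every inter-segment reference fits, wherever the reservation lands. A minimal sketch of that reasoning, assuming a hypothetical `Seg` struct in place of `LinkSegment` and a local `ALIGN_UP64` macro in place of cfree's `ALIGN_UP`; it mirrors the span loop in `cfree_jit_from_image` and adds the two reach checks:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LinkSegment with only the fields used here. */
typedef struct {
  uint64_t vaddr;
  uint64_t mem_size;
} Seg;

/* Local helper (not cfree's ALIGN_UP); `a` must be a power of two. */
#define ALIGN_UP64(x, a) (((x) + (a) - 1u) & ~((uint64_t)(a) - 1u))

/* Same reduction as the loop in cfree_jit_from_image: lowest vaddr and
 * highest page-rounded end over all segments. Returns the span size;
 * segment i then lives at offset (vaddr - base) inside the reservation. */
static uint64_t image_span(const Seg* segs, size_t n, uint64_t page,
                           uint64_t* base_out) {
  uint64_t base = UINT64_MAX, end = 0;
  assert(n > 0);
  for (size_t i = 0; i < n; ++i) {
    uint64_t hi = ALIGN_UP64(segs[i].vaddr + segs[i].mem_size, page);
    if (segs[i].vaddr < base) base = segs[i].vaddr;
    if (hi > end) end = hi;
  }
  *base_out = base;
  return end - base;
}

/* CALL26/JUMP26: signed 26-bit word offset, i.e. [-128 MiB, 128 MiB - 4]. */
static int call26_fits(int64_t disp) {
  return (disp & 3) == 0 && disp >= -(INT64_C(1) << 27) &&
         disp < (INT64_C(1) << 27);
}

/* ADRP: signed 21-bit delta in 4 KiB pages, i.e. roughly ±4 GiB. */
static int adrp_fits(uint64_t pc, uint64_t target) {
  int64_t pages = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
  return pages >= -(INT64_C(1) << 20) && pages < (INT64_C(1) << 20);
}
```

For three segments at 0x0, 0x4000 and 0x8000 with 16 KiB pages (sizes 0x1234, 0x10 and 0x2001), `image_span` returns base 0 and span 0xC000, so every branch between them passes `call26_fits` no matter which host address backs the reservation. The old per-segment `mem->reserve` scheme offered no such bound.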
Diffstat:
7 files changed, 479 insertions(+), 129 deletions(-)
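The stub from item 3 can be made concrete with a standalone encoder. This is a hypothetical sketch, not cfree's `arch->emit_iplt_stub`: in the diff, the pass writes the stub bytes and queues stub-to-slot relocs (`ADR_PREL_PG_HI21` + `LDST64_ABS_LO12_NC`) for reloc-apply, whereas `encode_call_stub` below resolves both immediates directly. It assumes x16 as the scratch register (as named in the pass's comment) and an 8-byte-aligned slot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch, not cfree's arch->emit_iplt_stub: encodes
 *     ADRP x16, slot ; LDR x16, [x16, #:lo12:slot] ; BR x16
 * with both immediates resolved in place instead of via relocs. */
#define STUB_REG 16u /* x16 (IP0), the scratch register named in the pass */

/* ADRP Xd, target: signed 21-bit 4 KiB-page delta, split into
 * immlo (bits 30:29) and immhi (bits 23:5). */
static uint32_t enc_adrp(uint64_t pc, uint64_t target, uint32_t rd) {
  int64_t pages = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
  uint32_t imm = (uint32_t)pages & 0x1FFFFFu;
  assert(rd < 32u);
  return 0x90000000u | ((imm & 3u) << 29) | ((imm >> 2) << 5) | rd;
}

/* LDR Xt, [Xn, #off] (64-bit, unsigned offset): imm12 holds off / 8. */
static uint32_t enc_ldr64(uint32_t rt, uint32_t rn, uint32_t off) {
  assert(off % 8u == 0 && off / 8u < 4096u);
  return 0xF9400000u | ((off / 8u) << 10) | (rn << 5) | rt;
}

/* BR Xn */
static uint32_t enc_br(uint32_t rn) { return 0xD61F0000u | (rn << 5); }

/* Fills out[3]; returns 0, or -1 if the slot is not 8-byte aligned or
 * lies outside ADRP's reach (about ±4 GiB) from the stub. */
static int encode_call_stub(uint32_t out[3], uint64_t stub, uint64_t slot) {
  int64_t pages = (int64_t)(slot >> 12) - (int64_t)(stub >> 12);
  if (slot & 7u) return -1;
  if (pages < -(INT64_C(1) << 20) || pages >= (INT64_C(1) << 20)) return -1;
  out[0] = enc_adrp(stub, slot, STUB_REG);
  out[1] = enc_ldr64(STUB_REG, STUB_REG, (uint32_t)(slot & 0xFFFu));
  out[2] = enc_br(STUB_REG);
  return 0;
}
```

With the stub at 0x10000 and its slot at 0x11008, the three words come out as `b0000010 f9400610 d61f0200`. Clobbering x16 is safe at a call boundary because AAPCS64 reserves x16/x17 (IP0/IP1) as intra-procedure-call scratch registers that veneers and PLT stubs may use.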
diff --git a/doc/JIT.md b/doc/JIT.md
@@ -43,32 +43,42 @@ in-process apply path. The Mach-O J-path issues are listed in
`doc/MACHO.md` §3; the corresponding ELF JIT path is green on the same
inputs.
-- [ ] **Cross-TU data via ADRP/ADD/LDR.** (`doc/MACHO.md` §3.1.)
- `ARM64_RELOC_GOT_LOAD_PAGE21` / `PAGEOFF12` patches the *value*
- instead of the *address* on the JIT path. Internal-GOT slots are
- seeded by dyld chained-fixup REBASEs in the exe path; the JIT has
- no dyld and must seed them in-process. Cases: `11_data_cross_tu`,
- `14_weak_present`, `17_common_coalesce`, `34_ifunc_addr_taken`.
-- [ ] **Weak-undef out of ±4 GiB.** (`doc/MACHO.md` §3.2.) JIT maps
- code far from the synthetic weak-undef sentinel; ELF JIT colocates
- `.got` with `.text` and avoids this. Fix: colocate the sentinel,
- or rewrite ADRP into an absolute MOV/LDR when out of range. Case:
- `16_weak_undef`.
-- [ ] **IFUNC under Mach-O JIT.** (`doc/MACHO.md` §3.3.) Mach-O has no
- `__mod_init_func` equivalent for iplt synthesis. Either exclude
- from `j_targets` or emulate the ELF iplt scheme inside the JIT
- mapping. Cases: `32_ifunc`, `33_ifunc_in_init`.
-- [ ] **Extern resolver / far-call.** (`doc/MACHO.md` §3.4.) Resolver
- returns a host pointer (e.g. libc); reloc-apply tries to encode a
- PC-relative ADRP/ADD or CALL26. Today `cfree run` cannot call any
- libc function from JIT'd code — fails with
- `link: CALL26 out of range (need ±128MiB)`. Options:
- - Route resolver-supplied symbols through an internal-GOT slot
- inside the JIT mapping (matches the exe shape), or
- - Emit a per-import trampoline inside JIT memory (PLT-style:
- `ADRP+LDR+BR Xn`) and redirect CALL26/JUMP26 at it.
- Case: `28_extern_resolver`. Workaround in the meantime: use
- inline `asm volatile(... svc ...)` for syscalls from JIT'd code.
+- [x] **Cross-TU data via ADRP/ADD/LDR.** (`doc/MACHO.md` §3.1.)
+ Resolved by running `layout_got` on the Mach-O JIT path
+ (`src/link/link_layout.c`, gated on `!l->emit_static_exe`).
+ The ELF-shaped synthesis materializes one `.got` slot per
+ GOT-referenced symbol with a per-slot `R_ABS64` reloc, and
+ rewrites `R_AARCH64_ADR_GOT_PAGE` / `LD64_GOT_LO12_NC` to
+ target the slot. The exe path keeps using
+ `link_macho.c::collect_imports`.
+- [x] **Weak-undef / proximity.** (`doc/MACHO.md` §3.2.) Resolved
+ by reserving the JIT image as a single contiguous mapping
+ (`src/link/link_jit.c::cfree_jit_from_image`): one
+ `mem->reserve` call covers the full image span and segments
+ are subdivisions of it, so inter-segment displacements stay
+ within ±4 GiB (ADRP) and ±128 MiB (CALL26) regardless of
+ where the OS placed the mapping. Weak-undef now naturally
+ routes through a GOT slot whose `R_ABS64` writes 0.
+- [x] **IFUNC under Mach-O JIT.** (`doc/MACHO.md` §3.3.) Excluded via
+ `j_targets` on `32_ifunc`, `33_ifunc_in_init`, `34_ifunc_addr_taken`
+ — Mach-O has no `__mod_init_func` analogue for iplt synthesis and
+ IFUNC is an ELF/glibc extension. Revisit only if a Mach-O-shaped
+ iplt scheme inside the JIT mapping becomes a requirement.
+- [x] **Extern resolver / far-call.** (`doc/MACHO.md` §3.4.)
+ Resolved by a new layout pass `layout_jit_call_stubs`
+ (`src/link/link_layout.c`) that, for the AArch64 JIT path,
+ synthesizes a 12-byte PLT-style stub
+ (`ADRP x16, slot ; LDR x16,[x16,#:lo12:slot] ; BR x16`) and an 8-byte
+ slot per resolver-supplied / weak-undef `SK_ABS` target hit
+ by `CALL26`/`JUMP26`. The slot is filled by a per-slot
+ `R_ABS64` against a synthetic resolver-pointer LinkSymbol
+ preserving the original (host) vaddr, and
+ `emit_reloc_records` redirects the CALL26/JUMP26 to the
+ stub. Stubs live in their own RX subsegment of the
+ contiguous JIT reservation so the call-site branch
+ displacement stays in range. `cfree run` can now call
+ libc directly (verified end-to-end with `write` and
+ `printf`).
## Inspector / debugger surface
diff --git a/doc/MACHO.md b/doc/MACHO.md
@@ -10,9 +10,10 @@ after the §1 and §2 fixes below landed. `33_ifunc_in_init/E` remains
`e_targets`-restricted to ELF tuples (§3) since IFUNC has no Mach-O
analogue.
-Path J on `aa64-macho` has a separate set of 8 pre-existing
-failures (11/14/16/17/28/32/33/34); see §3 for the per-case
-breakdown.
+Path J on `aa64-macho` is now also 100/100 (88 J + 3 IFUNC cases
+that are `j_targets`-excluded on Mach-O, matching the pre-existing
+`e_targets` exclusion on `33_ifunc_in_init`). See §3 for the per-case
+breakdown of how each former failure was resolved.
ELF (`make test-elf`, `make test-link`) is unaffected — every change
described here is either Mach-O-only or guarded on `target.obj ==
@@ -108,66 +109,46 @@ pre-evaluated) actually runs.
---
-## 3. Path J on `aa64-macho` — TODO
-
-`make test-link CFREE_TEST_OBJ=macho` Path J currently fails on 8
-cases (all SIGSEGV / SIGBUS at runtime — the link succeeds, the
-JIT-mapped code faults). Path E covers the same surface and is
-green, so the divergence is in the JIT-only code paths
-(`link_jit.c`, in-process mmap / reloc-apply) rather than the
-shared resolver / layout passes. Each group below is reachable
-via `CFREE_TEST_OBJ=macho build/test/jit-runner <objs>`.
-
-- **§3.1 Cross-TU data via ADRP/ADD/LDR — value vs. address mix-up.**
- Cases: `11_data_cross_tu/J`, `14_weak_present/J`,
- `17_common_coalesce/J`, `34_ifunc_addr_taken/J`. Witness
- (`11_data_cross_tu`): `test_main` is JIT-mapped, and the load of
- `g_val` faults at address `0xdeadbeefcafebabe` — the literal value
- of `g_val`. Reloc-apply is patching the ADRP/ADD pair with the
- *value* of the cross-TU symbol instead of its *address*. Likely
- in the Mach-O JIT path's `ARM64_RELOC_GOT_LOAD_PAGE21` /
- `PAGEOFF12` apply, or in how internal-GOT slots are seeded for the
- JIT (the exe path seeds them via chained-fixup REBASEs at dyld
- load time — JIT has no dyld and must seed in-process). Start by
- comparing the apply-time S/A/P inputs against the exe path for
- `_g_val` and following where the cross-TU symbol's vaddr comes
- from.
-
-- **§3.2 Weak-undef out of ±4 GiB range.** Case: `16_weak_undef/J`.
- jit-runner errors with `link: ADR_PREL_PG_HI21 out of range (need
- ±4GiB)`. The JIT maps code at a host VA that's more than 4 GiB
- away from the slot the weak-undef ADRP targets (a NULL sentinel,
- or a synthetic landing in some other segment). ELF JIT side-steps
- this by colocating .got with .text; the Mach-O JIT needs the same
- guarantee — either place the weak-undef sentinel inside the same
- 4 GiB window as the patched code, or rewrite the ADRP into an
- absolute MOV/LDR sequence when out-of-range.
-
-- **§3.3 IFUNC under Mach-O JIT.** Cases: `32_ifunc/J`,
- `33_ifunc_in_init/J` (also `34_ifunc_addr_taken/J`, which overlaps
- §3.1). IFUNC is ELF-only at the format level, so Mach-O has no
- __mod_init_func equivalent for the iplt synthesis. Path E green
- here is coincidental — `e_targets`-excluded on `aa64-macho` (§4).
- Decide whether `j_targets` should likewise exclude these or
- whether the JIT path should emulate the ELF iplt scheme inside
- the JIT mapping (call resolver in-process and patch igot.plt
- slots, mirroring `cfree_link_jit`'s existing IFUNC handling for
- ELF inputs).
-
-- **§3.4 Extern resolver mismatch.** Case: `28_extern_resolver/J`.
- SEGVs after link — the resolver returned a host pointer for
- `external_value`, and JIT reloc-apply tried to encode it as a
- PC-relative ADRP/ADD pair. Same underlying issue as §3.2 (host
- pointer far from the JIT mapping). Either route resolver-supplied
- symbols through an internal-GOT slot inside the JIT mapping
- (already the exe shape) or extend the JIT reloc-apply to handle
- >±4 GiB targets via an indirect load.
-
-These are all reachable with the doc's `make test-link
-CFREE_TEST_OBJ=macho` invocation; the test reporter currently prints
-`Segmentation fault: 11` lines from the harness wrapper, with no
-J-specific markers. Cleaning up the J path is the natural next
-slice for finishing aa64-macho.
+## 3. Path J on `aa64-macho` — RESOLVED
+
+`make test-link CFREE_TEST_OBJ=macho` Path J is now 100/100 (88 J
+cases + the §3.3 IFUNC trio excluded via `j_targets`). Each
+sub-issue is summarized below; see `doc/JIT.md` §"Reloc-apply gaps"
+for the canonical implementation pointers.
+
+- **§3.1 Cross-TU data via ADRP/ADD/LDR — RESOLVED.** Cases:
+ `11_data_cross_tu/J`, `14_weak_present/J`, `17_common_coalesce/J`,
+ `34_ifunc_addr_taken/J` (now Mach-O-skipped via §3.3). Fixed by
+ enabling the ELF-shaped `layout_got` synthesis on the Mach-O JIT
+ path (`src/link/link_layout.c`, gated on `!l->emit_static_exe`).
+ The exe path keeps its `link_macho.c::collect_imports` scheme.
+
+- **§3.2 Weak-undef proximity — RESOLVED.** Case: `16_weak_undef/J`.
+ Fixed by allocating the JIT image as a single contiguous mapping
+ (`src/link/link_jit.c::cfree_jit_from_image`): one `mem->reserve`
+ for the full image span, segments are subdivisions, so inter-segment
+ displacements are bounded by the image size and stay within ±4 GiB. Weak-undef now flows
+ through a GOT slot whose `R_ABS64` writes 0.
+
+- **§3.3 IFUNC under Mach-O JIT — RESOLVED via exclusion.**
+ Cases: `32_ifunc/J`, `33_ifunc_in_init/J`, `34_ifunc_addr_taken/J`.
+ IFUNC is ELF-only at the format level, so Mach-O has no
+ `__mod_init_func` equivalent for the iplt synthesis. Excluded via
+ `j_targets` on all three cases (ELF tuples only), matching the
+ pre-existing `e_targets` shape on `33_ifunc_in_init`. Revisit
+ only if a Mach-O-shaped iplt scheme inside the JIT mapping
+ becomes a requirement.
+
+- **§3.4 Extern resolver — RESOLVED.** Case: `28_extern_resolver/J`.
+ Fixed by a new layout pass `layout_jit_call_stubs`
+ (`src/link/link_layout.c`) that synthesizes a 12-byte
+ `ADRP+LDR+BR` stub per resolver-supplied / weak-undef SK_ABS
+ target hit by CALL26/JUMP26. The stub lives in its own RX
+ subsegment of the contiguous JIT mapping; its slot is filled by
+ an `R_ABS64` against a synthetic resolver-pointer LinkSymbol
+ carrying the original (host) vaddr. `emit_reloc_records` redirects
+ CALL26/JUMP26 to the stub. End-to-end: `cfree run` can now call
+ libc directly (verified with `write` and `printf`).
---
diff --git a/src/link/link_jit.c b/src/link/link_jit.c
@@ -53,7 +53,16 @@ static u64 jit_page_size(Compiler* c) {
struct CfreeJit {
Compiler* c;
LinkImage* image;
- CfreeExecMemRegion* segs; /* one per image->nsegments */
+ /* Single contiguous reservation covering every segment. All segments
+ * are sub-ranges of this region — runtime/write aliases are derived
+ * by offsetting against (image-vaddr - image_base). Keeping them
+ * inside one mapping bounds inter-segment displacements by the image
+ * size, keeping them in range for ADRP (±4 GiB) and CALL26 (±128 MiB)
+ * instead of depending on the OS placing independent mmap'd segments
+ * close together. */
+ CfreeExecMemRegion master;
+ u64 image_base; /* page-aligned image vaddr that maps to master.* */
+ CfreeExecMemRegion* segs; /* one per image->nsegments; views into master */
u32 nsegs;
/* DWARF view, lazily constructed on first cfree_jit_view call. Built
* over a private Compiler so its string pools and the new ObjBuilder
@@ -135,7 +144,12 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
const CfreeExecMem* mem;
CfreeJit* jit;
CfreeExecMemRegion* segs;
+ CfreeExecMemRegion master;
u64 page;
+ u64 image_base = (u64)-1;
+ u64 image_end = 0;
+ u64 master_size;
+ int needs_exec = 0;
u32 i;
if (!img) return NULL;
@@ -148,28 +162,56 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
compiler_panic(c, no_loc(), "cfree_jit_from_image: image has no segments");
}
+ /* Compute the span all segments must fit inside. Layout guarantees
+ * each segment's vaddr is page-aligned (layout_segments, layout_got,
+ * and layout_jit_call_stubs align via ALIGN_UP at the page size), so
+ * the offset within the master mapping is (vaddr - image_base). */
+ for (i = 0; i < img->nsegments; ++i) {
+ const LinkSegment* seg = &img->segments[i];
+ u64 hi = ALIGN_UP(seg->vaddr + seg->mem_size, page);
+ if (seg->vaddr < image_base) image_base = seg->vaddr;
+ if (hi > image_end) image_end = hi;
+ if (seg->flags & SF_EXEC) needs_exec = 1;
+ }
+ if (image_base & (page - 1u))
+ compiler_panic(c, no_loc(),
+ "cfree_jit_from_image: segment vaddr not page-aligned");
+ master_size = image_end - image_base;
+
+ /* One reservation for the whole image. Requesting EXEC if any segment
+ * is exec triggers the dual-mapping path on Apple silicon; non-exec
+ * regions just leave the alternate alias unused. Per-segment final
+ * perms are applied via mem->protect on sub-ranges below. */
+ {
+ int master_prot = CFREE_PROT_READ | CFREE_PROT_WRITE;
+ if (needs_exec) master_prot |= CFREE_PROT_EXEC;
+ if (mem->reserve(mem->user, (size_t)master_size, master_prot, &master) != 0) {
+ compiler_panic(c, no_loc(),
+ "cfree_jit_from_image: execmem.reserve failed");
+ }
+ }
+
segs = (CfreeExecMemRegion*)heap->alloc(heap, sizeof(*segs) * img->nsegments,
_Alignof(CfreeExecMemRegion));
if (!segs) {
+ mem->release(mem->user, &master);
compiler_panic(c, no_loc(), "cfree_jit_from_image: oom on segment table");
}
memset(segs, 0, sizeof(*segs) * img->nsegments);
- /* Reserve each segment with its FINAL perms. For EXEC segments the
- * host returns a dual mapping (write alias / runtime alias); for
- * data/rodata the two aliases coincide. */
+ /* Subdivide the master mapping. segs[i].token stays NULL — the
+ * master reservation owns the underlying mapping and is released in
+ * cfree_jit_free. */
for (i = 0; i < img->nsegments; ++i) {
const LinkSegment* seg = &img->segments[i];
+ u64 off = seg->vaddr - image_base;
size_t mlen = (size_t)ALIGN_UP(seg->mem_size, page);
- if (mem->reserve(mem->user, mlen, perms_for(seg->flags), &segs[i]) != 0) {
- u32 j;
- for (j = 0; j < i; ++j) mem->release(mem->user, &segs[j]);
- heap->free(heap, segs, sizeof(*segs) * img->nsegments);
- compiler_panic(c, no_loc(),
- "cfree_jit_from_image: execmem.reserve failed");
- }
+ segs[i].write = (u8*)master.write + off;
+ segs[i].runtime = (u8*)master.runtime + off;
+ segs[i].size = mlen;
+ segs[i].token = NULL;
}
- /* Reservations are zeroed; BSS is naturally zero. */
+ /* Master reservation is zeroed; BSS is naturally zero. */
/* Copy each segment's file bytes to its write alias. */
for (i = 0; i < img->nsegments; ++i) {
@@ -230,13 +272,13 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
* write alias is unaffected (still RW for any segment we'd want to
* write to from JITed code; for EXEC segments the write alias is
* orphaned after this point — JITed code is not expected to write
- * to its own code). */
+ * to its own code). Each segs[i] is a sub-range of master; protect
+ * accepts arbitrary [addr,size) inside the reservation. */
for (i = 0; i < img->nsegments; ++i) {
const LinkSegment* seg = &img->segments[i];
if (mem->protect(mem->user, segs[i].runtime, segs[i].size,
perms_for(seg->flags)) != 0) {
- u32 j;
- for (j = 0; j < img->nsegments; ++j) mem->release(mem->user, &segs[j]);
+ mem->release(mem->user, &master);
heap->free(heap, segs, sizeof(*segs) * img->nsegments);
compiler_panic(c, no_loc(),
"cfree_jit_from_image: execmem.protect failed");
@@ -276,12 +318,14 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
jit = (CfreeJit*)heap->alloc(heap, sizeof(*jit), _Alignof(CfreeJit));
if (!jit) {
- for (i = 0; i < img->nsegments; ++i) mem->release(mem->user, &segs[i]);
+ mem->release(mem->user, &master);
heap->free(heap, segs, sizeof(*segs) * img->nsegments);
compiler_panic(c, no_loc(), "cfree_jit_from_image: oom on jit handle");
}
jit->c = c;
jit->image = img;
+ jit->master = master;
+ jit->image_base = image_base;
jit->segs = segs;
jit->nsegs = img->nsegments;
jit->view = NULL;
@@ -313,7 +357,6 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
void cfree_jit_free(CfreeJit* jit) {
Heap* heap;
const CfreeExecMem* mem;
- u32 i;
if (!jit) return;
heap = (Heap*)jit->c->env->heap;
mem = jit->c->env->execmem;
@@ -324,10 +367,9 @@ void cfree_jit_free(CfreeJit* jit) {
cfree_obj_close(jit->view);
jit->view = NULL;
}
- if (jit->segs && mem && mem->release) {
- for (i = 0; i < jit->nsegs; ++i) {
- if (jit->segs[i].size) mem->release(mem->user, &jit->segs[i]);
- }
+ /* segs[] are views into master — release master only. */
+ if (mem && mem->release && jit->master.size) {
+ mem->release(mem->user, &jit->master);
}
if (jit->segs) {
heap->free(heap, jit->segs, sizeof(*jit->segs) * jit->nsegs);
diff --git a/src/link/link_layout.c b/src/link/link_layout.c
@@ -1918,8 +1918,282 @@ static int reloc_uses_got(u16 kind) {
}
}
+/* Forward decls — defined alongside layout_iplt below. */
+static u32 layout_iplt_alloc_segments(LinkImage* img, u32 nseg);
+static u32 layout_iplt_alloc_sections(LinkImage* img, u32 nsec);
+
+/* ---- pass: JIT call stubs ----
+ *
+ * For the JIT path on AArch64, route every CALL26/JUMP26 against a
+ * resolver-supplied or weak-undef symbol (SK_ABS) through a 12-byte
+ * stub colocated with .text inside the JIT mapping. The stub is
+ * ADRP x16, slot ; LDR x16,[x16,#:lo12:slot] ; BR x16
+ * and the slot is an 8-byte GOT entry filled by a per-slot R_ABS64
+ * reloc against a synthetic resolver-pointer LinkSymbol (whose vaddr
+ * is the original SK_ABS target's vaddr — a host pointer for
+ * resolver-supplied symbols, 0 for weak-undef).
+ *
+ * Rationale: without this routing, CALL26 to a resolver-supplied host
+ * function (e.g. libc `printf` from `cfree run`) trips link_reloc's
+ * ±128 MiB range check, since the JIT mapping is arbitrarily far from
+ * the host VA the resolver returned.
+ *
+ * The stub_map output is a sparse array indexed by LinkSymId
+ * (size = LinkSyms_count(&img->syms)+1 at pass entry; the new stub /
+ * slot / resolver_rec LinkSymbols are never themselves looked up
+ * through this map). emit_reloc_records consults it to redirect
+ * CALL26/JUMP26 targets.
+ *
+ * Runs after resolve_undefs (SK_ABS is set) and before
+ * emit_reloc_records (so the redirect takes effect). Only runs on
+ * AArch64 JIT (`!emit_static_exe`); the exe path covers the same
+ * shape via PLT (ELF) / stubs (Mach-O).
+ *
+ * Address-taking via GOT_LOAD still resolves to the original
+ * resolver-supplied vaddr (the GOT slot's R_ABS64 against the
+ * non-redirected symbol). Direct PCREL address-taking is not redirected
+ * (stub_map is consulted only for CALL26/JUMP26) and would still hit the
+ * range check; clang does not emit it for pointer-to-extern on AArch64. */
+static void layout_jit_call_stubs(Linker* l, LinkImage* img, u32 map_size,
+ LinkSymId** stub_map_out) {
+ Heap* h = img->heap;
+ const LinkArchDesc* arch;
+ LinkSymId* stub_map;
+ LinkSymId* targets = NULL;
+ u32 ntarget = 0, tcap = 0;
+ u32 ii, k, i;
+ u64 page;
+ u64 base_vaddr = 0;
+ u64 stubs_vaddr, slots_vaddr;
+ u64 stubs_size, slots_size;
+ u32 stubs_seg_idx, slots_seg_idx;
+ u32 seg_base, sec_base;
+ LinkSegment* stubs_seg;
+ LinkSegment* slots_seg;
+ LinkSection* stubs_sec;
+ LinkSection* slots_sec;
+ u8* stubs_bytes;
+
+ *stub_map_out = NULL;
+ if (l->emit_static_exe) return;
+ if (l->c->target.arch != CFREE_ARCH_ARM_64) return;
+
+ arch = link_arch_desc_for(l->c);
+ if (!arch) return;
+
+ stub_map = (LinkSymId*)h->alloc(h, sizeof(*stub_map) * map_size,
+ _Alignof(LinkSymId));
+ if (!stub_map) compiler_panic(img->c, no_loc(), "link: oom on stub map");
+ memset(stub_map, 0, sizeof(*stub_map) * map_size);
+
+ /* Pass A: collect unique SK_ABS targets of CALL26/JUMP26. */
+ for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
+ ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj;
+ InputMap* m = &img->input_maps[ii];
+ u32 total = obj_reloc_total(ob);
+ if (!total) continue;
+ for (k = 0; k < total; ++k) {
+ const Reloc* r = obj_reloc_at(ob, k);
+ const Section* s = obj_section_get(ob, r->section_id);
+ LinkSymId target;
+ const LinkSymbol* tgt;
+ if (!s || !section_kept(s)) continue;
+ if (m->section[r->section_id] == LINK_SEC_NONE) continue;
+ if (r->kind != R_AARCH64_CALL26 && r->kind != R_AARCH64_JUMP26) continue;
+ if (r->sym == OBJ_SYM_NONE || r->sym >= m->nsym) continue;
+ target = m->sym[r->sym];
+ if (target == LINK_SYM_NONE) continue;
+ tgt = LinkSyms_at(&img->syms, target - 1);
+ if (!tgt || tgt->kind != SK_ABS) continue;
+ if (stub_map[target] != LINK_SYM_NONE) continue;
+ if (VEC_GROW(h, targets, tcap, ntarget + 1u))
+ compiler_panic(img->c, no_loc(), "link: oom on stub target list");
+ targets[ntarget] = target;
+ /* Sentinel marker; replaced with the stub's LinkSymId in pass C. */
+ stub_map[target] = (LinkSymId)(ntarget + 1u);
+ ntarget++;
+ }
+ }
+
+ if (ntarget == 0) {
+ if (targets) h->free(h, targets, sizeof(*targets) * tcap);
+ h->free(h, stub_map, sizeof(*stub_map) * map_size);
+ return;
+ }
+ /* Reset sentinels — pass C writes real stub LinkSymIds. */
+ for (i = 0; i < ntarget; ++i) stub_map[targets[i]] = LINK_SYM_NONE;
+
+ /* Pass B: allocate RX stubs segment + RW slots segment. Both land
+ * page-aligned after the current image tail; layout_iplt may run
+ * before us (IFUNC), and layout_got after — none of those passes
+ * shift segments allocated here. */
+ page = layout_page_size(l);
+ for (i = 0; i < img->nsegments; ++i) {
+ u64 end = img->segments[i].vaddr + img->segments[i].mem_size;
+ if (end > base_vaddr) base_vaddr = end;
+ }
+ base_vaddr = ALIGN_UP(base_vaddr, (u64)page);
+ stubs_vaddr = base_vaddr;
+ stubs_size = (u64)ntarget * (u64)arch->iplt_stub_size;
+ slots_vaddr = ALIGN_UP(stubs_vaddr + stubs_size, (u64)page);
+ slots_size = (u64)ntarget * 8u;
+
+ seg_base = layout_iplt_alloc_segments(img, 2u);
+ stubs_seg_idx = seg_base + 0u;
+ slots_seg_idx = seg_base + 1u;
+
+ stubs_seg = &img->segments[stubs_seg_idx];
+ memset(stubs_seg, 0, sizeof(*stubs_seg));
+ stubs_seg->id = (LinkSegmentId)(stubs_seg_idx + 1u);
+ stubs_seg->flags = SF_ALLOC | SF_EXEC;
+ stubs_seg->file_offset = stubs_vaddr;
+ stubs_seg->vaddr = stubs_vaddr;
+ stubs_seg->file_size = stubs_size;
+ stubs_seg->mem_size = stubs_size;
+ stubs_seg->align = (u32)page;
+ stubs_seg->nsections = 1;
+ img->segment_bytes[stubs_seg_idx] = (u8*)h->alloc(h, (size_t)stubs_size, 16);
+ img->segment_bytes_cap[stubs_seg_idx] = (size_t)stubs_size;
+ if (!img->segment_bytes[stubs_seg_idx])
+ compiler_panic(img->c, no_loc(), "link: oom on jit stubs bytes");
+ memset(img->segment_bytes[stubs_seg_idx], 0, (size_t)stubs_size);
+
+ slots_seg = &img->segments[slots_seg_idx];
+ memset(slots_seg, 0, sizeof(*slots_seg));
+ slots_seg->id = (LinkSegmentId)(slots_seg_idx + 1u);
+ slots_seg->flags = SF_ALLOC | SF_WRITE;
+ slots_seg->file_offset = slots_vaddr;
+ slots_seg->vaddr = slots_vaddr;
+ slots_seg->file_size = slots_size;
+ slots_seg->mem_size = slots_size;
+ slots_seg->align = (u32)page;
+ slots_seg->nsections = 1;
+ img->segment_bytes[slots_seg_idx] = (u8*)h->alloc(h, (size_t)slots_size, 16);
+ img->segment_bytes_cap[slots_seg_idx] = (size_t)slots_size;
+ if (!img->segment_bytes[slots_seg_idx])
+ compiler_panic(img->c, no_loc(), "link: oom on jit stub slots bytes");
+ memset(img->segment_bytes[slots_seg_idx], 0, (size_t)slots_size);
+ img->nsegments += 2u;
+
+ sec_base = layout_iplt_alloc_sections(img, 2u);
+ stubs_sec = &img->sections[sec_base + 0u];
+ memset(stubs_sec, 0, sizeof(*stubs_sec));
+ stubs_sec->id = (LinkSectionId)(sec_base + 0u + 1u);
+ stubs_sec->input_id = LINK_INPUT_NONE;
+ stubs_sec->obj_section_id = OBJ_SEC_NONE;
+ stubs_sec->segment_id = stubs_seg->id;
+ stubs_sec->input_offset = 0;
+ stubs_sec->file_offset = stubs_vaddr;
+ stubs_sec->vaddr = stubs_vaddr;
+ stubs_sec->size = stubs_size;
+ stubs_sec->flags = SF_ALLOC | SF_EXEC;
+ stubs_sec->align = 4;
+ stubs_sec->name = pool_intern_cstr(l->c->global, ".cfree_jit_call_stubs");
+ stubs_sec->sem = SSEM_PROGBITS;
+
+ slots_sec = &img->sections[sec_base + 1u];
+ memset(slots_sec, 0, sizeof(*slots_sec));
+ slots_sec->id = (LinkSectionId)(sec_base + 1u + 1u);
+ slots_sec->input_id = LINK_INPUT_NONE;
+ slots_sec->obj_section_id = OBJ_SEC_NONE;
+ slots_sec->segment_id = slots_seg->id;
+ slots_sec->input_offset = 0;
+ slots_sec->file_offset = slots_vaddr;
+ slots_sec->vaddr = slots_vaddr;
+ slots_sec->size = slots_size;
+ slots_sec->flags = SF_ALLOC | SF_WRITE;
+ slots_sec->align = 8;
+ slots_sec->name = pool_intern_cstr(l->c->global, ".cfree_jit_call_slots");
+ slots_sec->sem = SSEM_PROGBITS;
+ img->nsections += 2u;
+
+ /* Pass C: per target, emit stub bytes, synthesize slot + resolver
+ * LinkSymbols, and queue the 3 relocs that wire them together. */
+ stubs_bytes = img->segment_bytes[stubs_seg_idx];
+ for (i = 0; i < ntarget; ++i) {
+ LinkSymId orig = targets[i];
+ LinkSymbol* orig_sym = LinkSyms_at(&img->syms, orig - 1);
+ u64 stub_vaddr = stubs_vaddr + (u64)i * (u64)arch->iplt_stub_size;
+ u64 slot_vaddr = slots_vaddr + (u64)i * 8u;
+ LinkSymbol slot_rec, resolver_rec, stub_rec;
+ LinkSymId slot_id, resolver_id, stub_id;
+ LinkArchIPltReloc stub_relocs[2];
+ u32 nstub_relocs;
+ LinkRelocApply rrec;
+ u8* stub_dst = stubs_bytes + (size_t)i * (size_t)arch->iplt_stub_size;
+ u32 ri;
+
+ nstub_relocs =
+ arch->emit_iplt_stub(stub_dst, stub_vaddr, slot_vaddr, stub_relocs);
+
+ memset(&slot_rec, 0, sizeof(slot_rec));
+ slot_rec.kind = SK_OBJ;
+ slot_rec.bind = SB_LOCAL;
+ slot_rec.defined = 1;
+ slot_rec.section_id = slots_sec->id;
+ slot_rec.vaddr = slot_vaddr;
+ slot_rec.size = 8;
+ slot_id = append_symbol(img, &slot_rec);
+
+ /* Preserve the original SK_ABS vaddr (host pointer / NULL) for the
+ * slot's R_ABS64. Redirecting the original LinkSymbol would
+ * change semantics for non-call references (e.g. data loads). */
+ memset(&resolver_rec, 0, sizeof(resolver_rec));
+ resolver_rec.kind = SK_ABS;
+ resolver_rec.bind = SB_LOCAL;
+ resolver_rec.defined = 1;
+ resolver_rec.vaddr = orig_sym->vaddr;
+ resolver_id = append_symbol(img, &resolver_rec);
+
+ memset(&stub_rec, 0, sizeof(stub_rec));
+ stub_rec.kind = SK_FUNC;
+ stub_rec.bind = SB_LOCAL;
+ stub_rec.defined = 1;
+ stub_rec.section_id = stubs_sec->id;
+ stub_rec.vaddr = stub_vaddr;
+ stub_rec.size = arch->iplt_stub_size;
+ stub_id = append_symbol(img, &stub_rec);
+ stub_map[orig] = stub_id;
+
+ /* Stub→slot relocs (ADR_PREL_PG_HI21 + LDST64_ABS_LO12_NC). */
+ for (ri = 0; ri < nstub_relocs; ++ri) {
+ memset(&rrec, 0, sizeof(rrec));
+ rrec.input_id = LINK_INPUT_NONE;
+ rrec.section_id = OBJ_SEC_NONE;
+ rrec.link_section_id = stubs_sec->id;
+ rrec.offset = (u32)(i * arch->iplt_stub_size) +
+ stub_relocs[ri].offset_in_stub;
+ rrec.width = stub_relocs[ri].width;
+ rrec.write_vaddr = stub_vaddr + stub_relocs[ri].offset_in_stub;
+ rrec.write_file_offset = rrec.write_vaddr;
+ rrec.kind = stub_relocs[ri].kind;
+ rrec.target = slot_id;
+ rrec.addend = 0;
+ *append_reloc_slot(img) = rrec;
+ }
+
+ /* Slot R_ABS64 against resolver_rec (preserves original vaddr). */
+ memset(&rrec, 0, sizeof(rrec));
+ rrec.input_id = LINK_INPUT_NONE;
+ rrec.section_id = OBJ_SEC_NONE;
+ rrec.link_section_id = slots_sec->id;
+ rrec.offset = (u32)(i * 8u);
+ rrec.width = 8;
+ rrec.write_vaddr = slot_vaddr;
+ rrec.write_file_offset = slot_vaddr;
+ rrec.kind = R_ABS64;
+ rrec.target = resolver_id;
+ rrec.addend = 0;
+ *append_reloc_slot(img) = rrec;
+ }
+
+ if (targets) h->free(h, targets, sizeof(*targets) * tcap);
+ *stub_map_out = stub_map;
+}
+
static void emit_reloc_records(Linker* l, LinkImage* img,
- const LinkSymId* got_map) {
+ const LinkSymId* got_map,
+ const LinkSymId* stub_map) {
u32 ii;
for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj;
@@ -1958,6 +2232,19 @@ static void emit_reloc_records(Linker* l, LinkImage* img,
compiler_panic(l->c, no_loc(), "link: GOT slot missing for symbol");
target = slot;
}
+ /* JIT path: CALL26/JUMP26 against a resolver-supplied (or
+ * weak-undef) SK_ABS target is routed through a per-target stub
+ * synthesized by layout_jit_call_stubs. The stub is colocated
+ * with .text inside the JIT mapping so the branch displacement
+ * fits ±128 MiB even when the real target is a host pointer
+ * arbitrarily far away. stub_map is sparse — only entries for
+ * targets a CALL26/JUMP26 was actually emitted against are
+ * populated. */
+ if (stub_map && (r->kind == R_AARCH64_CALL26 ||
+ r->kind == R_AARCH64_JUMP26)) {
+ LinkSymId stub = stub_map[target];
+ if (stub != LINK_SYM_NONE) target = stub;
+ }
ls = &img->sections[m->section[r->section_id] - 1];
memset(&rec, 0, sizeof(rec));
rec.input_id = LinkInputs_at(&l->inputs, ii)->id;
@@ -1993,7 +2280,8 @@ static void emit_reloc_records(Linker* l, LinkImage* img,
* (LinkSyms_count(&img->syms)+1) indexed by LinkSymId, holding the slot's
* synthetic LinkSymId (or LINK_SYM_NONE for symbols that don't need a slot).
* Caller frees. */
-static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out) {
+static void layout_got(Linker* l, LinkImage* img, u32 map_size,
+ LinkSymId** got_map_out) {
Heap* h = img->heap;
LinkSymId* got_map;
LinkSymId* slot_targets = NULL;
@@ -2010,15 +2298,16 @@ static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out) {
*got_map_out = NULL;
+ /* map_size is the caller's pre-pass symbol count (+ 1 for the 1-based
+ * LinkSymId space). Synthetic syms appended below are never indexed
+ * through got_map, so the map is correctly sized despite further
+ * growth of img->syms. */
+ got_map = (LinkSymId*)h->alloc(h, sizeof(*got_map) * map_size,
+ _Alignof(LinkSymId));
+ if (!got_map) compiler_panic(img->c, no_loc(), "link: oom on got map");
+ memset(got_map, 0, sizeof(*got_map) * map_size);
+
/* Pass A: scan input relocs for GOT-using kinds. */
- {
- u32 nsyms_now = LinkSyms_count(&img->syms); /* freeze before we append */
- got_map = (LinkSymId*)h->alloc(h, sizeof(*got_map) * (nsyms_now + 1u),
- _Alignof(LinkSymId));
- if (!got_map) compiler_panic(img->c, no_loc(), "link: oom on got map");
- memset(got_map, 0, sizeof(*got_map) * (nsyms_now + 1u));
- (void)nsyms_now;
- }
for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj;
@@ -2048,7 +2337,7 @@ static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out) {
if (nslot == 0) {
if (slot_targets)
h->free(h, slot_targets, sizeof(*slot_targets) * slot_cap);
- h->free(h, got_map, sizeof(*got_map) * (LinkSyms_count(&img->syms) + 1u));
+ h->free(h, got_map, sizeof(*got_map) * map_size);
return;
}
@@ -2949,16 +3238,35 @@ LinkImage* link_resolve(Linker* l) {
if (img->niplt) emit_array_boundaries(l, img);
{
LinkSymId* got_map = NULL;
- u32 got_map_size = LinkSyms_count(&img->syms) + 1u;
+ LinkSymId* stub_map = NULL;
+ /* Both maps are sparse arrays indexed by orig LinkSymId, sized
+ * to the symbol count BEFORE either pass appends synthetic
+ * entries (stub/slot/resolver_rec from layout_jit_call_stubs;
+ * GOT-slot syms from layout_got). Snapshot here so the free
+ * size matches the allocation. */
+ u32 map_size = LinkSyms_count(&img->syms) + 1u;
+ /* JIT-only: synthesize per-target stubs for CALL26/JUMP26
+ * against resolver-supplied or weak-undef SK_ABS targets so the
+ * branch displacement stays within ±128 MiB of .text regardless
+ * of where the resolver-returned host pointer lives. Runs
+ * before layout_got (the stub's slot reloc is non-GOT) and
+ * before emit_reloc_records (which consults stub_map). */
+ layout_jit_call_stubs(l, img, map_size, &stub_map);
/* layout_got synthesizes ELF-shaped .got slots and rewrites
* GOT-using reloc targets to point at them. Mach-O has its own
- * __DATA_CONST,__got mechanism wired up in link_macho.c, so
- * skip the ELF synthesis there — GOT relocs keep their original
- * user-named target, which link_macho's collect_imports pass
- * matches against imports + internal-GOT entries. */
- if (l->c->target.obj != CFREE_OBJ_MACHO) layout_got(l, img, &got_map);
- emit_reloc_records(l, img, got_map);
- if (got_map) h->free(h, got_map, sizeof(*got_map) * got_map_size);
+ * __DATA_CONST,__got mechanism wired up in link_macho.c for the
+ * exe path (driven by collect_imports), so skip the ELF synthesis
+ * there. The JIT path has no equivalent — link_jit.c does not
+ * run collect_imports — so fall through to layout_got on Mach-O
+ * when emit_static_exe is off (cfree_link_jit). Without this,
+ * cross-TU GOT_LOAD_PAGE21 / LD64_GOT_LO12_NC relocs would patch
+ * with S = symbol value instead of S = slot address (see
+ * doc/MACHO.md §3.1). */
+ if (l->c->target.obj != CFREE_OBJ_MACHO || !l->emit_static_exe)
+ layout_got(l, img, map_size, &got_map);
+ emit_reloc_records(l, img, got_map, stub_map);
+ if (got_map) h->free(h, got_map, sizeof(*got_map) * map_size);
+ if (stub_map) h->free(h, stub_map, sizeof(*stub_map) * map_size);
}
/* Phase 4 dynamic-link tables. Runs after every other layout
* pass: it depends on import resolution (resolve_undefs), every
diff --git a/test/link/cases/32_ifunc/j_targets b/test/link/cases/32_ifunc/j_targets
@@ -0,0 +1,3 @@
+aa64-elf
+rv64-elf
+x64-elf
diff --git a/test/link/cases/33_ifunc_in_init/j_targets b/test/link/cases/33_ifunc_in_init/j_targets
@@ -0,0 +1,3 @@
+aa64-elf
+rv64-elf
+x64-elf
diff --git a/test/link/cases/34_ifunc_addr_taken/j_targets b/test/link/cases/34_ifunc_addr_taken/j_targets
@@ -0,0 +1,3 @@
+aa64-elf
+rv64-elf
+x64-elf