kit

git clone https://git.ryansepassi.com/git/kit.git

commit 9e9e4d219953c1a7c1abdf8e319802a67047eed0
parent e384489fd26ddedf0845e0ceb688cfa9b1f7e26f
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 11 May 2026 11:46:26 -0700

jit: close out reloc-apply gaps from doc/JIT.md

Four changes that together take Mach-O Path J from 92/100 to
100/100 and unblock `cfree run` calling host libc:

1. layout_got now runs on the Mach-O JIT path (gated on
   !emit_static_exe), giving cross-TU data loads, weak-undef, and
   common-coalesce the same slot-routed shape ELF has had.  The
   Mach-O exe path keeps using link_macho.c::collect_imports.

2. cfree_jit_from_image reserves the full image span as a single
   contiguous mapping and subdivides it per segment.  Inter-segment
   displacements stay within ±4 GiB (ADRP) and ±128 MiB (CALL26)
   regardless of where the OS placed the reservation, so the
   ADRP/CALL26 range checks no longer depend on mmap luck.

3. New layout pass layout_jit_call_stubs synthesizes 12-byte
   ADRP+LDR+BR PLT-style stubs (plus an 8-byte GOT-like slot) for
   every resolver-supplied / weak-undef SK_ABS target hit by
   CALL26/JUMP26 on AArch64.  Slots are filled by R_ABS64 against
   a synthetic resolver-pointer LinkSymbol that preserves the
   original (host) vaddr.  cfree run can now call libc directly.

4. The IFUNC trio (32_ifunc, 33_ifunc_in_init, 34_ifunc_addr_taken)
   gets j_targets files restricting Path J to ELF tuples — IFUNC
   is ELF-only at the format level, matching the pre-existing
   e_targets exclusion on 33_ifunc_in_init.
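
A minimal sketch of the span arithmetic behind change 2, assuming page-aligned segment start addresses. The `Seg` and `Span` types and the helper names are hypothetical stand-ins for illustration, not the types in `link_jit.c`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for a linked segment and the master reservation. */
typedef struct { uint64_t vaddr; uint64_t mem_size; } Seg;
typedef struct { uint64_t base; uint64_t size; } Span;

/* Round x up to a multiple of a; a must be a power of two. */
static uint64_t align_up(uint64_t x, uint64_t a) {
  assert(a != 0 && (a & (a - 1)) == 0);
  return (x + a - 1) & ~(a - 1);
}

/* Smallest page-granular span that covers every segment. */
static Span image_span(const Seg* segs, size_t n, uint64_t page) {
  Span s = {0, 0};
  uint64_t lo = UINT64_MAX, hi = 0;
  for (size_t i = 0; i < n; ++i) {
    uint64_t end = align_up(segs[i].vaddr + segs[i].mem_size, page);
    if (segs[i].vaddr < lo) lo = segs[i].vaddr;
    if (end > hi) hi = end;
  }
  if (n == 0) return s;
  s.base = lo;
  s.size = hi - lo;
  return s;
}

/* Offset of a segment inside the single reservation. */
static uint64_t seg_offset(Span s, const Seg* seg) {
  return seg->vaddr - s.base;
}
```

Because every segment address is `reservation + seg_offset(...)`, the distance between any two segments equals the distance between their image vaddrs, wherever the OS places the reservation.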

Diffstat:
M doc/JIT.md                                    |  62 ++++++++++++++++++++++++++++++++++++--------------------------
M doc/MACHO.md                                  | 107 +++++++++++++++++++++++++++++++++----------------------------------------------
M src/link/link_jit.c                           |  84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------
M src/link/link_layout.c                        | 346 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
A test/link/cases/32_ifunc/j_targets            |   3 +++
A test/link/cases/33_ifunc_in_init/j_targets    |   3 +++
A test/link/cases/34_ifunc_addr_taken/j_targets |   3 +++
7 files changed, 479 insertions(+), 129 deletions(-)
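
The 12-byte stub from change 3 of the commit message can be sketched as below. This is an illustration built from the public A64 instruction encodings, not the project's `emit_iplt_stub`; the function names are hypothetical, and the sketch patches the immediates directly rather than emitting relocation records:

```c
#include <assert.h>
#include <stdint.h>

enum { X16 = 16 };

/* ADRP Xd, target: page-relative, reaches +/-4 GiB from pc. */
static uint32_t enc_adrp(unsigned rd, uint64_t pc, uint64_t target) {
  int64_t pages = (int64_t)(target >> 12) - (int64_t)(pc >> 12);
  assert(pages >= -(1 << 20) && pages < (1 << 20));
  uint32_t imm = (uint32_t)pages & 0x1FFFFFu;           /* 21-bit immediate */
  return 0x90000000u | ((imm & 3u) << 29) | ((imm >> 2) << 5) | rd;
}

/* LDR Xt, [Xn, #off]: unsigned offset scaled by 8, off in [0, 4096). */
static uint32_t enc_ldr64(unsigned rt, unsigned rn, uint32_t off) {
  assert((off & 7u) == 0 && off < 4096u);
  return 0xF9400000u | ((off >> 3) << 10) | (rn << 5) | rt;
}

/* BR Xn: indirect branch, no range limit. */
static uint32_t enc_br(unsigned rn) { return 0xD61F0000u | (rn << 5); }

/* ADRP x16, slot ; LDR x16, [x16, #lo12(slot)] ; BR x16 */
static void emit_call_stub(uint32_t out[3], uint64_t stub_vaddr,
                           uint64_t slot_vaddr) {
  out[0] = enc_adrp(X16, stub_vaddr, slot_vaddr);
  out[1] = enc_ldr64(X16, X16, (uint32_t)(slot_vaddr & 0xFFFu));
  out[2] = enc_br(X16);
}

/* CALL26/JUMP26 reach: signed 26-bit word offset, i.e. +/-128 MiB. */
static int call26_in_range(uint64_t pc, uint64_t target) {
  int64_t d = (int64_t)(target - pc);
  return (d & 3) == 0 && d >= -(1ll << 27) && d < (1ll << 27);
}
```

The call site only needs to reach the stub, which sits inside the JIT mapping; the 8-byte slot carries the full 64-bit host address, so the final `BR` can land anywhere.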

diff --git a/doc/JIT.md b/doc/JIT.md
@@ -43,32 +43,42 @@ in-process apply path. The Mach-O J-path issues are listed in
 `doc/MACHO.md` §3; the corresponding ELF JIT path is green on the same
 inputs.
 
-- [ ] **Cross-TU data via ADRP/ADD/LDR.** (`doc/MACHO.md` §3.1.)
-  `ARM64_RELOC_GOT_LOAD_PAGE21` / `PAGEOFF12` patches the *value*
-  instead of the *address* on the JIT path. Internal-GOT slots are
-  seeded by dyld chained-fixup REBASEs in the exe path; the JIT has
-  no dyld and must seed them in-process. Cases: `11_data_cross_tu`,
-  `14_weak_present`, `17_common_coalesce`, `34_ifunc_addr_taken`.
-- [ ] **Weak-undef out of ±4 GiB.** (`doc/MACHO.md` §3.2.) JIT maps
-  code far from the synthetic weak-undef sentinel; ELF JIT colocates
-  `.got` with `.text` and avoids this. Fix: colocate the sentinel,
-  or rewrite ADRP into an absolute MOV/LDR when out of range. Case:
-  `16_weak_undef`.
-- [ ] **IFUNC under Mach-O JIT.** (`doc/MACHO.md` §3.3.) Mach-O has no
-  `__mod_init_func` equivalent for iplt synthesis. Either exclude
-  from `j_targets` or emulate the ELF iplt scheme inside the JIT
-  mapping. Cases: `32_ifunc`, `33_ifunc_in_init`.
-- [ ] **Extern resolver / far-call.** (`doc/MACHO.md` §3.4.) Resolver
-  returns a host pointer (e.g. libc); reloc-apply tries to encode a
-  PC-relative ADRP/ADD or CALL26. Today `cfree run` cannot call any
-  libc function from JIT'd code — fails with
-  `link: CALL26 out of range (need ±128MiB)`. Options:
-    - Route resolver-supplied symbols through an internal-GOT slot
-      inside the JIT mapping (matches the exe shape), or
-    - Emit a per-import trampoline inside JIT memory (PLT-style:
-      `ADRP+LDR+BR Xn`) and redirect CALL26/JUMP26 at it.
-  Case: `28_extern_resolver`. Workaround in the meantime: use
-  inline `asm volatile(... svc ...)` for syscalls from JIT'd code.
+- [x] **Cross-TU data via ADRP/ADD/LDR.** (`doc/MACHO.md` §3.1.)
+  Resolved by running `layout_got` on the Mach-O JIT path
+  (`src/link/link_layout.c`, gated on `!l->emit_static_exe`).
+  The ELF-shaped synthesis materializes one `.got` slot per
+  GOT-referenced symbol with a per-slot `R_ABS64` reloc, and
+  rewrites `R_AARCH64_ADR_GOT_PAGE` / `LD64_GOT_LO12_NC` to
+  target the slot. The exe path keeps using
+  `link_macho.c::collect_imports`.
+- [x] **Weak-undef / proximity.** (`doc/MACHO.md` §3.2.) Resolved
+  by reserving the JIT image as a single contiguous mapping
+  (`src/link/link_jit.c::cfree_jit_from_image`): one
+  `mem->reserve` call covers the full image span and segments
+  are subdivisions of it, so inter-segment displacements stay
+  within ±4 GiB (ADRP) and ±128 MiB (CALL26) regardless of
+  where the OS placed the mapping. Weak-undef now naturally
+  routes through a GOT slot whose `R_ABS64` writes 0.
+- [x] **IFUNC under Mach-O JIT.** (`doc/MACHO.md` §3.3.) Excluded via
+  `j_targets` on `32_ifunc`, `33_ifunc_in_init`, `34_ifunc_addr_taken`
+  — Mach-O has no `__mod_init_func` analogue for iplt synthesis and
+  IFUNC is an ELF/glibc extension. Revisit only if a Mach-O-shaped
+  iplt scheme inside the JIT mapping becomes a requirement.
+- [x] **Extern resolver / far-call.** (`doc/MACHO.md` §3.4.)
+  Resolved by a new layout pass `layout_jit_call_stubs`
+  (`src/link/link_layout.c`) that, for the AArch64 JIT path,
+  synthesizes a 12-byte PLT-style stub
+  (`ADRP x16, slot ; LDR x16,[x16] ; BR x16`) and an 8-byte
+  slot per resolver-supplied / weak-undef `SK_ABS` target hit
+  by `CALL26`/`JUMP26`. The slot is filled by a per-slot
+  `R_ABS64` against a synthetic resolver-pointer LinkSymbol
+  preserving the original (host) vaddr, and
+  `emit_reloc_records` redirects the CALL26/JUMP26 to the
+  stub. Stubs live in their own RX subsegment of the
+  contiguous JIT reservation so the call-site branch
+  displacement stays in range. `cfree run` can now call
+  libc directly (verified end-to-end with `write` and
+  `printf`).
 ## Inspector / debugger surface
 
diff --git a/doc/MACHO.md b/doc/MACHO.md
@@ -10,9 +10,10 @@ after the §1 and §2 fixes below landed. `33_ifunc_in_init/E` remains
 `e_targets`-restricted to ELF tuples (§3) since IFUNC has no Mach-O
 analogue.
 
-Path J on `aa64-macho` has a separate set of 8 pre-existing
-failures (11/14/16/17/28/32/33/34); see §3 for the per-case
-breakdown.
+Path J on `aa64-macho` is now also 100/100 (88 J + 3 IFUNC cases
+that are `j_targets`-excluded on Mach-O alongside their pre-existing
+`e_targets` exclusion). See §3 for the per-case breakdown of how
+each former failure was resolved.
 
 ELF (`make test-elf`, `make test-link`) is unaffected — every change
 described here is either Mach-O-only or guarded on `target.obj ==
@@ -108,66 +109,46 @@ pre-evaluated) actually runs.
 
 ---
 
-## 3. Path J on `aa64-macho` — TODO
-
-`make test-link CFREE_TEST_OBJ=macho` Path J currently fails on 8
-cases (all SIGSEGV / SIGBUS at runtime — the link succeeds, the
-JIT-mapped code faults). Path E covers the same surface and is
-green, so the divergence is in the JIT-only code paths
-(`link_jit.c`, in-process mmap / reloc-apply) rather than the
-shared resolver / layout passes. Each group below is reachable
-via `CFREE_TEST_OBJ=macho build/test/jit-runner <objs>`.
-
-- **§3.1 Cross-TU data via ADRP/ADD/LDR — value vs. address mix-up.**
-  Cases: `11_data_cross_tu/J`, `14_weak_present/J`,
-  `17_common_coalesce/J`, `34_ifunc_addr_taken/J`. Witness
-  (`11_data_cross_tu`): `test_main` is JIT-mapped, and the load of
-  `g_val` faults at address `0xdeadbeefcafebabe` — the literal value
-  of `g_val`. Reloc-apply is patching the ADRP/ADD pair with the
-  *value* of the cross-TU symbol instead of its *address*. Likely
-  in the Mach-O JIT path's `ARM64_RELOC_GOT_LOAD_PAGE21` /
-  `PAGEOFF12` apply, or in how internal-GOT slots are seeded for the
-  JIT (the exe path seeds them via chained-fixup REBASEs at dyld
-  load time — JIT has no dyld and must seed in-process).  Start by
-  comparing the apply-time S/A/P inputs against the exe path for
-  `_g_val` and following where the cross-TU symbol's vaddr comes
-  from.
-
-- **§3.2 Weak-undef out of ±4 GiB range.** Case: `16_weak_undef/J`.
-  jit-runner errors with `link: ADR_PREL_PG_HI21 out of range (need
-  ±4GiB)`. The JIT maps code at a host VA that's more than 4 GiB
-  away from the slot the weak-undef ADRP targets (a NULL sentinel,
-  or a synthetic landing in some other segment). ELF JIT side-steps
-  this by colocating .got with .text; the Mach-O JIT needs the same
-  guarantee — either place the weak-undef sentinel inside the same
-  4 GiB window as the patched code, or rewrite the ADRP into an
-  absolute MOV/LDR sequence when out-of-range.
-
-- **§3.3 IFUNC under Mach-O JIT.** Cases: `32_ifunc/J`,
-  `33_ifunc_in_init/J` (also `34_ifunc_addr_taken/J`, which overlaps
-  §3.1). IFUNC is ELF-only at the format level, so Mach-O has no
-  __mod_init_func equivalent for the iplt synthesis. Path E green
-  here is coincidental — `e_targets`-excluded on `aa64-macho` (§4).
-  Decide whether `j_targets` should likewise exclude these or
-  whether the JIT path should emulate the ELF iplt scheme inside
-  the JIT mapping (call resolver in-process and patch igot.plt
-  slots, mirroring `cfree_link_jit`'s existing IFUNC handling for
-  ELF inputs).
-
-- **§3.4 Extern resolver mismatch.** Case: `28_extern_resolver/J`.
-  SEGVs after link — the resolver returned a host pointer for
-  `external_value`, and JIT reloc-apply tried to encode it as a
-  PC-relative ADRP/ADD pair. Same underlying issue as §3.2 (host
-  pointer far from the JIT mapping). Either route resolver-supplied
-  symbols through an internal-GOT slot inside the JIT mapping
-  (already the exe shape) or extend the JIT reloc-apply to handle
-  >±4 GiB targets via an indirect load.
-
-These are all reachable with the doc's `make test-link
-CFREE_TEST_OBJ=macho` invocation; the test reporter currently prints
-`Segmentation fault: 11` lines from the harness wrapper, with no
-J-specific markers. Cleaning up the J path is the natural next
-slice for finishing aa64-macho.
+## 3. Path J on `aa64-macho` — RESOLVED
+
+`make test-link CFREE_TEST_OBJ=macho` Path J is now 100/100 (88 J
+cases + the §3.3 IFUNC trio excluded via `j_targets`). Each
+sub-issue is summarized below; see `doc/JIT.md` §"Reloc-apply gaps"
+for the canonical implementation pointers.
+
+- **§3.1 Cross-TU data via ADRP/ADD/LDR — RESOLVED.** Cases:
+  `11_data_cross_tu/J`, `14_weak_present/J`, `17_common_coalesce/J`,
+  `34_ifunc_addr_taken/J` (now Mach-O-skipped via §3.3). Fixed by
+  enabling the ELF-shaped `layout_got` synthesis on the Mach-O JIT
+  path (`src/link/link_layout.c`, gated on `!l->emit_static_exe`).
+  The exe path keeps its `link_macho.c::collect_imports` scheme.
+
+- **§3.2 Weak-undef proximity — RESOLVED.** Case: `16_weak_undef/J`.
+  Fixed by allocating the JIT image as a single contiguous mapping
+  (`src/link/link_jit.c::cfree_jit_from_image`): one `mem->reserve`
+  for the full image span, segments are subdivisions, inter-segment
+  displacements are always within ±4 GiB. Weak-undef now flows
+  through a GOT slot whose `R_ABS64` writes 0.
+
+- **§3.3 IFUNC under Mach-O JIT — RESOLVED via exclusion.**
+  Cases: `32_ifunc/J`, `33_ifunc_in_init/J`, `34_ifunc_addr_taken/J`.
+  IFUNC is ELF-only at the format level, so Mach-O has no
+  __mod_init_func equivalent for the iplt synthesis. Excluded via
+  `j_targets` on all three cases (ELF tuples only), matching the
+  pre-existing `e_targets` shape on `33_ifunc_in_init`. Revisit
+  only if a Mach-O-shaped iplt scheme inside the JIT mapping
+  becomes a requirement.
+
+- **§3.4 Extern resolver — RESOLVED.** Case: `28_extern_resolver/J`.
+  Fixed by a new layout pass `layout_jit_call_stubs`
+  (`src/link/link_layout.c`) that synthesizes a 12-byte
+  `ADRP+LDR+BR` stub per resolver-supplied / weak-undef SK_ABS
+  target hit by CALL26/JUMP26. The stub lives in its own RX
+  subsegment of the contiguous JIT mapping; its slot is filled by
+  an `R_ABS64` against a synthetic resolver-pointer LinkSymbol
+  carrying the original (host) vaddr. `emit_reloc_records` redirects
+  CALL26/JUMP26 to the stub. End-to-end: `cfree run` can now call
+  libc directly (verified with `write` and `printf`).
 
 ---
 
diff --git a/src/link/link_jit.c b/src/link/link_jit.c
@@ -53,7 +53,16 @@ static u64 jit_page_size(Compiler* c) {
 struct CfreeJit {
   Compiler* c;
   LinkImage* image;
-  CfreeExecMemRegion* segs; /* one per image->nsegments */
+  /* Single contiguous reservation covering every segment. All segments
+   * are sub-ranges of this region — runtime/write aliases are derived
+   * by offsetting against (image-vaddr - image_base). Keeping them
+   * inside one mapping guarantees inter-segment displacements stay in
+   * range for ADRP (±4 GiB) and CALL26 (±128 MiB), which would
+   * otherwise depend on the OS placing independent mmap'd segments
+   * close together. */
+  CfreeExecMemRegion master;
+  u64 image_base; /* page-aligned image vaddr that maps to master.* */
+  CfreeExecMemRegion* segs; /* one per image->nsegments; views into master */
   u32 nsegs;
   /* DWARF view, lazily constructed on first cfree_jit_view call.  Built
    * over a private Compiler so its string pools and the new ObjBuilder
@@ -135,7 +144,12 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
   const CfreeExecMem* mem;
   CfreeJit* jit;
   CfreeExecMemRegion* segs;
+  CfreeExecMemRegion master;
   u64 page;
+  u64 image_base = (u64)-1;
+  u64 image_end = 0;
+  u64 master_size;
+  int needs_exec = 0;
   u32 i;
 
   if (!img) return NULL;
@@ -148,28 +162,56 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
     compiler_panic(c, no_loc(), "cfree_jit_from_image: image has no segments");
   }
 
+  /* Compute the span all segments must fit inside. Layout guarantees
+   * each segment's vaddr is page-aligned (layout_segments / layout_got
+   * align via ALIGN_UP at the page size), so the offset within the
+   * master mapping is (vaddr - image_base). */
+  for (i = 0; i < img->nsegments; ++i) {
+    const LinkSegment* seg = &img->segments[i];
+    u64 hi = ALIGN_UP(seg->vaddr + seg->mem_size, page);
+    if (seg->vaddr < image_base) image_base = seg->vaddr;
+    if (hi > image_end) image_end = hi;
+    if (seg->flags & SF_EXEC) needs_exec = 1;
+  }
+  if (image_base & (page - 1u))
+    compiler_panic(c, no_loc(),
+                   "cfree_jit_from_image: segment vaddr not page-aligned");
+  master_size = image_end - image_base;
+
+  /* One reservation for the whole image. Requesting EXEC if any segment
+   * is exec triggers the dual-mapping path on Apple silicon; non-exec
+   * regions just leave the alternate alias unused. Per-segment final
+   * perms are applied via mem->protect on sub-ranges below. */
+  {
+    int master_prot = CFREE_PROT_READ | CFREE_PROT_WRITE;
+    if (needs_exec) master_prot |= CFREE_PROT_EXEC;
+    if (mem->reserve(mem->user, (size_t)master_size, master_prot, &master) != 0) {
+      compiler_panic(c, no_loc(),
+                     "cfree_jit_from_image: execmem.reserve failed");
+    }
+  }
+
   segs = (CfreeExecMemRegion*)heap->alloc(heap, sizeof(*segs) * img->nsegments,
                                           _Alignof(CfreeExecMemRegion));
   if (!segs) {
+    mem->release(mem->user, &master);
     compiler_panic(c, no_loc(), "cfree_jit_from_image: oom on segment table");
   }
   memset(segs, 0, sizeof(*segs) * img->nsegments);
 
-  /* Reserve each segment with its FINAL perms. For EXEC segments the
-   * host returns a dual mapping (write alias / runtime alias); for
-   * data/rodata the two aliases coincide. */
+  /* Subdivide the master mapping. segs[i].token stays NULL — the
+   * master reservation owns the underlying mapping and is released in
+   * cfree_jit_free. */
   for (i = 0; i < img->nsegments; ++i) {
     const LinkSegment* seg = &img->segments[i];
+    u64 off = seg->vaddr - image_base;
     size_t mlen = (size_t)ALIGN_UP(seg->mem_size, page);
-    if (mem->reserve(mem->user, mlen, perms_for(seg->flags), &segs[i]) != 0) {
-      u32 j;
-      for (j = 0; j < i; ++j) mem->release(mem->user, &segs[j]);
-      heap->free(heap, segs, sizeof(*segs) * img->nsegments);
-      compiler_panic(c, no_loc(),
-                     "cfree_jit_from_image: execmem.reserve failed");
-    }
+    segs[i].write = (u8*)master.write + off;
+    segs[i].runtime = (u8*)master.runtime + off;
+    segs[i].size = mlen;
+    segs[i].token = NULL;
   }
-  /* Reservations are zeroed; BSS is naturally zero. */
+  /* Master reservation is zeroed; BSS is naturally zero. */
 
   /* Copy each segment's file bytes to its write alias. */
   for (i = 0; i < img->nsegments; ++i) {
@@ -230,13 +272,13 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
    * write alias is unaffected (still RW for any segment we'd want to
    * write to from JITed code; for EXEC segments the write alias is
    * orphaned after this point — JITed code is not expected to write
-   * to its own code). */
+   * to its own code). Each segs[i] is a sub-range of master; protect
+   * accepts arbitrary [addr,size) inside the reservation. */
   for (i = 0; i < img->nsegments; ++i) {
     const LinkSegment* seg = &img->segments[i];
     if (mem->protect(mem->user, segs[i].runtime, segs[i].size,
                      perms_for(seg->flags)) != 0) {
-      u32 j;
-      for (j = 0; j < img->nsegments; ++j) mem->release(mem->user, &segs[j]);
+      mem->release(mem->user, &master);
       heap->free(heap, segs, sizeof(*segs) * img->nsegments);
       compiler_panic(c, no_loc(),
                      "cfree_jit_from_image: execmem.protect failed");
@@ -276,12 +318,14 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
 
   jit = (CfreeJit*)heap->alloc(heap, sizeof(*jit), _Alignof(CfreeJit));
   if (!jit) {
-    for (i = 0; i < img->nsegments; ++i) mem->release(mem->user, &segs[i]);
+    mem->release(mem->user, &master);
     heap->free(heap, segs, sizeof(*segs) * img->nsegments);
     compiler_panic(c, no_loc(), "cfree_jit_from_image: oom on jit handle");
   }
   jit->c = c;
   jit->image = img;
+  jit->master = master;
+  jit->image_base = image_base;
   jit->segs = segs;
   jit->nsegs = img->nsegments;
   jit->view = NULL;
@@ -313,7 +357,6 @@ CfreeJit* cfree_jit_from_image(LinkImage* img) {
 void cfree_jit_free(CfreeJit* jit) {
   Heap* heap;
   const CfreeExecMem* mem;
-  u32 i;
   if (!jit) return;
   heap = (Heap*)jit->c->env->heap;
   mem = jit->c->env->execmem;
@@ -324,10 +367,9 @@ void cfree_jit_free(CfreeJit* jit) {
     cfree_obj_close(jit->view);
     jit->view = NULL;
   }
-  if (jit->segs && mem && mem->release) {
-    for (i = 0; i < jit->nsegs; ++i) {
-      if (jit->segs[i].size) mem->release(mem->user, &jit->segs[i]);
-    }
+  /* segs[] are views into master — release master only. */
+  if (mem && mem->release && jit->master.size) {
+    mem->release(mem->user, &jit->master);
   }
   if (jit->segs) {
     heap->free(heap, jit->segs, sizeof(*jit->segs) * jit->nsegs);
diff --git a/src/link/link_layout.c b/src/link/link_layout.c
@@ -1918,8 +1918,282 @@ static int reloc_uses_got(u16 kind) {
   }
 }
 
+/* Forward decls — defined alongside layout_iplt below. */
+static u32 layout_iplt_alloc_segments(LinkImage* img, u32 nseg);
+static u32 layout_iplt_alloc_sections(LinkImage* img, u32 nsec);
+
+/* ---- pass: JIT call stubs ----
+ *
+ * For the JIT path on AArch64, route every CALL26/JUMP26 against a
+ * resolver-supplied or weak-undef symbol (SK_ABS) through a 12-byte
+ * stub colocated with .text inside the JIT mapping. The stub is
+ *   ADRP x16, slot ; LDR x16,[x16,#:lo12:slot] ; BR x16
+ * and the slot is an 8-byte GOT entry filled by a per-slot R_ABS64
+ * reloc against a synthetic resolver-pointer LinkSymbol (whose vaddr
+ * is the original SK_ABS target's vaddr — a host pointer for
+ * resolver-supplied symbols, 0 for weak-undef).
+ *
+ * Rationale: without this routing, CALL26 to a resolver-supplied host
+ * function (e.g. libc `printf` from `cfree run`) trips link_reloc's
+ * ±128 MiB range check, since the JIT mapping is arbitrarily far from
+ * the host VA the resolver returned.
+ *
+ * The stub_map output is a sparse array indexed by LinkSymId
+ * (size = LinkSyms_count(&img->syms)+1 at pass entry; the new stub /
+ * slot / resolver_rec LinkSymbols are never themselves looked up
+ * through this map). emit_reloc_records consults it to redirect
+ * CALL26/JUMP26 targets.
+ *
+ * Runs after resolve_undefs (SK_ABS is set) and before
+ * emit_reloc_records (so the redirect takes effect). Only runs on
+ * AArch64 JIT (`!emit_static_exe`); the exe path covers the same
+ * shape via PLT (ELF) / stubs (Mach-O).
+ *
+ * Address-taking via GOT_LOAD still resolves to the original
+ * resolver-supplied vaddr (the GOT slot's R_ABS64 against the
+ * non-redirected symbol).  Address-taking via direct PCREL would land
+ * on the stub instead, but clang does not emit non-GOT-routed
+ * pointer-to-extern on AArch64. */
+static void layout_jit_call_stubs(Linker* l, LinkImage* img, u32 map_size,
+                                  LinkSymId** stub_map_out) {
+  Heap* h = img->heap;
+  const LinkArchDesc* arch;
+  LinkSymId* stub_map;
+  LinkSymId* targets = NULL;
+  u32 ntarget = 0, tcap = 0;
+  u32 ii, k, i;
+  u64 page;
+  u64 base_vaddr = 0;
+  u64 stubs_vaddr, slots_vaddr;
+  u64 stubs_size, slots_size;
+  u32 stubs_seg_idx, slots_seg_idx;
+  u32 seg_base, sec_base;
+  LinkSegment* stubs_seg;
+  LinkSegment* slots_seg;
+  LinkSection* stubs_sec;
+  LinkSection* slots_sec;
+  u8* stubs_bytes;
+
+  *stub_map_out = NULL;
+  if (l->emit_static_exe) return;
+  if (l->c->target.arch != CFREE_ARCH_ARM_64) return;
+
+  arch = link_arch_desc_for(l->c);
+  if (!arch) return;
+
+  stub_map = (LinkSymId*)h->alloc(h, sizeof(*stub_map) * map_size,
+                                  _Alignof(LinkSymId));
+  if (!stub_map) compiler_panic(img->c, no_loc(), "link: oom on stub map");
+  memset(stub_map, 0, sizeof(*stub_map) * map_size);
+
+  /* Pass A: collect unique SK_ABS targets of CALL26/JUMP26. */
+  for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
+    ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj;
+    InputMap* m = &img->input_maps[ii];
+    u32 total = obj_reloc_total(ob);
+    if (!total) continue;
+    for (k = 0; k < total; ++k) {
+      const Reloc* r = obj_reloc_at(ob, k);
+      const Section* s = obj_section_get(ob, r->section_id);
+      LinkSymId target;
+      const LinkSymbol* tgt;
+      if (!s || !section_kept(s)) continue;
+      if (m->section[r->section_id] == LINK_SEC_NONE) continue;
+      if (r->kind != R_AARCH64_CALL26 && r->kind != R_AARCH64_JUMP26) continue;
+      if (r->sym == OBJ_SYM_NONE || r->sym >= m->nsym) continue;
+      target = m->sym[r->sym];
+      if (target == LINK_SYM_NONE) continue;
+      tgt = LinkSyms_at(&img->syms, target - 1);
+      if (!tgt || tgt->kind != SK_ABS) continue;
+      if (stub_map[target] != LINK_SYM_NONE) continue;
+      if (VEC_GROW(h, targets, tcap, ntarget + 1u))
+        compiler_panic(img->c, no_loc(), "link: oom on stub target list");
+      targets[ntarget] = target;
+      /* Sentinel marker; replaced with the stub's LinkSymId in pass C. */
+      stub_map[target] = (LinkSymId)(ntarget + 1u);
+      ntarget++;
+    }
+  }
+
+  if (ntarget == 0) {
+    if (targets) h->free(h, targets, sizeof(*targets) * tcap);
+    h->free(h, stub_map, sizeof(*stub_map) * map_size);
+    return;
+  }
+  /* Reset sentinels — pass C writes real stub LinkSymIds. */
+  for (i = 0; i < ntarget; ++i) stub_map[targets[i]] = LINK_SYM_NONE;
+
+  /* Pass B: allocate RX stubs segment + RW slots segment. Both land
+   * page-aligned after the current image tail; layout_iplt may run
+   * before us (IFUNC), and layout_got after — none of those passes
+   * shift segments allocated here. */
+  page = layout_page_size(l);
+  for (i = 0; i < img->nsegments; ++i) {
+    u64 end = img->segments[i].vaddr + img->segments[i].mem_size;
+    if (end > base_vaddr) base_vaddr = end;
+  }
+  base_vaddr = ALIGN_UP(base_vaddr, (u64)page);
+  stubs_vaddr = base_vaddr;
+  stubs_size = (u64)ntarget * (u64)arch->iplt_stub_size;
+  slots_vaddr = ALIGN_UP(stubs_vaddr + stubs_size, (u64)page);
+  slots_size = (u64)ntarget * 8u;
+
+  seg_base = layout_iplt_alloc_segments(img, 2u);
+  stubs_seg_idx = seg_base + 0u;
+  slots_seg_idx = seg_base + 1u;
+
+  stubs_seg = &img->segments[stubs_seg_idx];
+  memset(stubs_seg, 0, sizeof(*stubs_seg));
+  stubs_seg->id = (LinkSegmentId)(stubs_seg_idx + 1u);
+  stubs_seg->flags = SF_ALLOC | SF_EXEC;
+  stubs_seg->file_offset = stubs_vaddr;
+  stubs_seg->vaddr = stubs_vaddr;
+  stubs_seg->file_size = stubs_size;
+  stubs_seg->mem_size = stubs_size;
+  stubs_seg->align = (u32)page;
+  stubs_seg->nsections = 1;
+  img->segment_bytes[stubs_seg_idx] = (u8*)h->alloc(h, (size_t)stubs_size, 16);
+  img->segment_bytes_cap[stubs_seg_idx] = (size_t)stubs_size;
+  if (!img->segment_bytes[stubs_seg_idx])
+    compiler_panic(img->c, no_loc(), "link: oom on jit stubs bytes");
+  memset(img->segment_bytes[stubs_seg_idx], 0, (size_t)stubs_size);
+
+  slots_seg = &img->segments[slots_seg_idx];
+  memset(slots_seg, 0, sizeof(*slots_seg));
+  slots_seg->id = (LinkSegmentId)(slots_seg_idx + 1u);
+  slots_seg->flags = SF_ALLOC | SF_WRITE;
+  slots_seg->file_offset = slots_vaddr;
+  slots_seg->vaddr = slots_vaddr;
+  slots_seg->file_size = slots_size;
+  slots_seg->mem_size = slots_size;
+  slots_seg->align = (u32)page;
+  slots_seg->nsections = 1;
+  img->segment_bytes[slots_seg_idx] = (u8*)h->alloc(h, (size_t)slots_size, 16);
+  img->segment_bytes_cap[slots_seg_idx] = (size_t)slots_size;
+  if (!img->segment_bytes[slots_seg_idx])
+    compiler_panic(img->c, no_loc(), "link: oom on jit stub slots bytes");
+  memset(img->segment_bytes[slots_seg_idx], 0, (size_t)slots_size);
+  img->nsegments += 2u;
+
+  sec_base = layout_iplt_alloc_sections(img, 2u);
+  stubs_sec = &img->sections[sec_base + 0u];
+  memset(stubs_sec, 0, sizeof(*stubs_sec));
+  stubs_sec->id = (LinkSectionId)(sec_base + 0u + 1u);
+  stubs_sec->input_id = LINK_INPUT_NONE;
+  stubs_sec->obj_section_id = OBJ_SEC_NONE;
+  stubs_sec->segment_id = stubs_seg->id;
+  stubs_sec->input_offset = 0;
+  stubs_sec->file_offset = stubs_vaddr;
+  stubs_sec->vaddr = stubs_vaddr;
+  stubs_sec->size = stubs_size;
+  stubs_sec->flags = SF_ALLOC | SF_EXEC;
+  stubs_sec->align = 4;
+  stubs_sec->name = pool_intern_cstr(l->c->global, ".cfree_jit_call_stubs");
+  stubs_sec->sem = SSEM_PROGBITS;
+
+  slots_sec = &img->sections[sec_base + 1u];
+  memset(slots_sec, 0, sizeof(*slots_sec));
+  slots_sec->id = (LinkSectionId)(sec_base + 1u + 1u);
+  slots_sec->input_id = LINK_INPUT_NONE;
+  slots_sec->obj_section_id = OBJ_SEC_NONE;
+  slots_sec->segment_id = slots_seg->id;
+  slots_sec->input_offset = 0;
+  slots_sec->file_offset = slots_vaddr;
+  slots_sec->vaddr = slots_vaddr;
+  slots_sec->size = slots_size;
+  slots_sec->flags = SF_ALLOC | SF_WRITE;
+  slots_sec->align = 8;
+  slots_sec->name = pool_intern_cstr(l->c->global, ".cfree_jit_call_slots");
+  slots_sec->sem = SSEM_PROGBITS;
+  img->nsections += 2u;
+
+  /* Pass C: per target, emit stub bytes, synthesize slot + resolver
+   * LinkSymbols, and queue the 3 relocs that wire them together. */
+  stubs_bytes = img->segment_bytes[stubs_seg_idx];
+  for (i = 0; i < ntarget; ++i) {
+    LinkSymId orig = targets[i];
+    LinkSymbol* orig_sym = LinkSyms_at(&img->syms, orig - 1);
+    u64 stub_vaddr = stubs_vaddr + (u64)i * (u64)arch->iplt_stub_size;
+    u64 slot_vaddr = slots_vaddr + (u64)i * 8u;
+    LinkSymbol slot_rec, resolver_rec, stub_rec;
+    LinkSymId slot_id, resolver_id, stub_id;
+    LinkArchIPltReloc stub_relocs[2];
+    u32 nstub_relocs;
+    LinkRelocApply rrec;
+    u8* stub_dst = stubs_bytes + (size_t)i * (size_t)arch->iplt_stub_size;
+    u32 ri;
+
+    nstub_relocs =
+        arch->emit_iplt_stub(stub_dst, stub_vaddr, slot_vaddr, stub_relocs);
+
+    memset(&slot_rec, 0, sizeof(slot_rec));
+    slot_rec.kind = SK_OBJ;
+    slot_rec.bind = SB_LOCAL;
+    slot_rec.defined = 1;
+    slot_rec.section_id = slots_sec->id;
+    slot_rec.vaddr = slot_vaddr;
+    slot_rec.size = 8;
+    slot_id = append_symbol(img, &slot_rec);
+
+    /* Preserve the original SK_ABS vaddr (host pointer / NULL) for the
+     * slot's R_ABS64. Redirecting the original LinkSymbol would
+     * change semantics for non-call references (e.g. data loads). */
+    memset(&resolver_rec, 0, sizeof(resolver_rec));
+    resolver_rec.kind = SK_ABS;
+    resolver_rec.bind = SB_LOCAL;
+    resolver_rec.defined = 1;
+    resolver_rec.vaddr = orig_sym->vaddr;
+    resolver_id = append_symbol(img, &resolver_rec);
+
+    memset(&stub_rec, 0, sizeof(stub_rec));
+    stub_rec.kind = SK_FUNC;
+    stub_rec.bind = SB_LOCAL;
+    stub_rec.defined = 1;
+    stub_rec.section_id = stubs_sec->id;
+    stub_rec.vaddr = stub_vaddr;
+    stub_rec.size = arch->iplt_stub_size;
+    stub_id = append_symbol(img, &stub_rec);
+    stub_map[orig] = stub_id;
+
+    /* Stub→slot relocs (ADR_PREL_PG_HI21 + LDST64_ABS_LO12_NC). */
+    for (ri = 0; ri < nstub_relocs; ++ri) {
+      memset(&rrec, 0, sizeof(rrec));
+      rrec.input_id = LINK_INPUT_NONE;
+      rrec.section_id = OBJ_SEC_NONE;
+      rrec.link_section_id = stubs_sec->id;
+      rrec.offset = (u32)(i * arch->iplt_stub_size) +
+                    stub_relocs[ri].offset_in_stub;
+      rrec.width = stub_relocs[ri].width;
+      rrec.write_vaddr = stub_vaddr + stub_relocs[ri].offset_in_stub;
+      rrec.write_file_offset = rrec.write_vaddr;
+      rrec.kind = stub_relocs[ri].kind;
+      rrec.target = slot_id;
+      rrec.addend = 0;
+      *append_reloc_slot(img) = rrec;
+    }
+
+    /* Slot R_ABS64 against resolver_rec (preserves original vaddr). */
+    memset(&rrec, 0, sizeof(rrec));
+    rrec.input_id = LINK_INPUT_NONE;
+    rrec.section_id = OBJ_SEC_NONE;
+    rrec.link_section_id = slots_sec->id;
+    rrec.offset = (u32)(i * 8u);
+    rrec.width = 8;
+    rrec.write_vaddr = slot_vaddr;
+    rrec.write_file_offset = slot_vaddr;
+    rrec.kind = R_ABS64;
+    rrec.target = resolver_id;
+    rrec.addend = 0;
+    *append_reloc_slot(img) = rrec;
+  }
+
+  if (targets) h->free(h, targets, sizeof(*targets) * tcap);
+  *stub_map_out = stub_map;
+}
+
 static void emit_reloc_records(Linker* l, LinkImage* img,
-                               const LinkSymId* got_map) {
+                               const LinkSymId* got_map,
+                               const LinkSymId* stub_map) {
   u32 ii;
   for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
     ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj;
@@ -1958,6 +2232,19 @@ static void emit_reloc_records(Linker* l, LinkImage* img,
           compiler_panic(l->c, no_loc(), "link: GOT slot missing for symbol");
         target = slot;
       }
+      /* JIT path: CALL26/JUMP26 against a resolver-supplied (or
+       * weak-undef) SK_ABS target is routed through a per-target stub
+       * synthesized by layout_jit_call_stubs. The stub is colocated
+       * with .text inside the JIT mapping so the branch displacement
+       * fits ±128 MiB even when the real target is a host pointer
+       * arbitrarily far away. stub_map is sparse — only entries for
+       * targets a CALL26/JUMP26 was actually emitted against are
+       * populated. */
+      if (stub_map && (r->kind == R_AARCH64_CALL26 ||
+                       r->kind == R_AARCH64_JUMP26)) {
+        LinkSymId stub = stub_map[target];
+        if (stub != LINK_SYM_NONE) target = stub;
+      }
       ls = &img->sections[m->section[r->section_id] - 1];
       memset(&rec, 0, sizeof(rec));
       rec.input_id = LinkInputs_at(&l->inputs, ii)->id;
@@ -1993,7 +2280,8 @@ static void emit_reloc_records(Linker* l, LinkImage* img,
  * (LinkSyms_count(&img->syms)+1) indexed by LinkSymId, holding the slot's
  * synthetic LinkSymId (or LINK_SYM_NONE for symbols that don't need a slot).
  * Caller frees. */
-static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out) {
+static void layout_got(Linker* l, LinkImage* img, u32 map_size,
+                       LinkSymId** got_map_out) {
   Heap* h = img->heap;
   LinkSymId* got_map;
   LinkSymId* slot_targets = NULL;
@@ -2010,15 +2298,16 @@ static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out) {
 
   *got_map_out = NULL;
 
+  /* map_size is the caller's pre-pass symbol count (+ 1 for the 1-based
+   * LinkSymId space). Synthetic syms appended below are never indexed
+   * through got_map, so the map is correctly sized despite further
+   * growth of img->syms. */
+  got_map = (LinkSymId*)h->alloc(h, sizeof(*got_map) * map_size,
+                                 _Alignof(LinkSymId));
+  if (!got_map) compiler_panic(img->c, no_loc(), "link: oom on got map");
+  memset(got_map, 0, sizeof(*got_map) * map_size);
+
   /* Pass A: scan input relocs for GOT-using kinds. */
-  {
-    u32 nsyms_now = LinkSyms_count(&img->syms); /* freeze before we append */
-    got_map = (LinkSymId*)h->alloc(h, sizeof(*got_map) * (nsyms_now + 1u),
-                                   _Alignof(LinkSymId));
-    if (!got_map) compiler_panic(img->c, no_loc(), "link: oom on got map");
-    memset(got_map, 0, sizeof(*got_map) * (nsyms_now + 1u));
-    (void)nsyms_now;
-  }
 
   for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
     ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj;
@@ -2048,7 +2337,7 @@ static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out) {
   if (nslot == 0) {
     if (slot_targets)
       h->free(h, slot_targets, sizeof(*slot_targets) * slot_cap);
-    h->free(h, got_map, sizeof(*got_map) * (LinkSyms_count(&img->syms) + 1u));
+    h->free(h, got_map, sizeof(*got_map) * map_size);
     return;
   }
 
@@ -2949,16 +3238,35 @@ LinkImage* link_resolve(Linker* l) {
   if (img->niplt) emit_array_boundaries(l, img);
   {
     LinkSymId* got_map = NULL;
-    u32 got_map_size = LinkSyms_count(&img->syms) + 1u;
+    LinkSymId* stub_map = NULL;
+    /* Both maps are sparse arrays indexed by orig LinkSymId, sized
+     * to the symbol count BEFORE either pass appends synthetic
+     * entries (stub/slot/resolver_rec from layout_jit_call_stubs;
+     * GOT-slot syms from layout_got). Snapshot here so the free
+     * size matches the allocation. */
+    u32 map_size = LinkSyms_count(&img->syms) + 1u;
+    /* JIT-only: synthesize per-target stubs for CALL26/JUMP26
+     * against resolver-supplied or weak-undef SK_ABS targets so the
+     * branch displacement stays within ±128 MiB of .text regardless
+     * of where the resolver-returned host pointer lives. Runs
+     * before layout_got (the stub's slot reloc is non-GOT) and
+     * before emit_reloc_records (which consults stub_map). */
+    layout_jit_call_stubs(l, img, map_size, &stub_map);
     /* layout_got synthesizes ELF-shaped .got slots and rewrites
      * GOT-using reloc targets to point at them.  Mach-O has its own
-     * __DATA_CONST,__got mechanism wired up in link_macho.c, so
-     * skip the ELF synthesis there — GOT relocs keep their original
-     * user-named target, which link_macho's collect_imports pass
-     * matches against imports + internal-GOT entries. */
-    if (l->c->target.obj != CFREE_OBJ_MACHO) layout_got(l, img, &got_map);
-    emit_reloc_records(l, img, got_map);
-    if (got_map) h->free(h, got_map, sizeof(*got_map) * got_map_size);
+     * __DATA_CONST,__got mechanism wired up in link_macho.c for the
+     * exe path (driven by collect_imports), so skip the ELF synthesis
+     * there. The JIT path has no equivalent — link_jit.c does not
+     * run collect_imports — so fall through to layout_got on Mach-O
+     * when emit_static_exe is off (cfree_link_jit). Without this,
+     * cross-TU GOT_LOAD_PAGE21 / LD64_GOT_LO12_NC relocs would patch
+     * with S = symbol value instead of S = slot address (see
+     * doc/MACHO.md §3.1). */
+    if (l->c->target.obj != CFREE_OBJ_MACHO || !l->emit_static_exe)
+      layout_got(l, img, map_size, &got_map);
+    emit_reloc_records(l, img, got_map, stub_map);
+    if (got_map) h->free(h, got_map, sizeof(*got_map) * map_size);
+    if (stub_map) h->free(h, stub_map, sizeof(*stub_map) * map_size);
   }
   /* Phase 4 dynamic-link tables. Runs after every other layout
    * pass: it depends on import resolution (resolve_undefs), every
diff --git a/test/link/cases/32_ifunc/j_targets b/test/link/cases/32_ifunc/j_targets
@@ -0,0 +1,3 @@
+aa64-elf
+rv64-elf
+x64-elf
diff --git a/test/link/cases/33_ifunc_in_init/j_targets b/test/link/cases/33_ifunc_in_init/j_targets
@@ -0,0 +1,3 @@
+aa64-elf
+rv64-elf
+x64-elf
diff --git a/test/link/cases/34_ifunc_addr_taken/j_targets b/test/link/cases/34_ifunc_addr_taken/j_targets
@@ -0,0 +1,3 @@
+aa64-elf
+rv64-elf
+x64-elf