kit

git clone https://git.ryansepassi.com/git/kit.git

commit ca82e9d4c2e19e924af115c4a0917accd053d77f
parent db7be7bfe28562663f14b4d20757abbc5bb7b4aa
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat,  9 May 2026 12:43:46 -0700

link: archive ingestion, static GOT, and ar BSD long-name fix

- Archive ingestion: parse members up front via cfree_ar_iter, then
  expand into Linker.inputs at link_resolve time. Whole-archive
  members pulled unconditionally; demand members iterate to a fixed
  point, pulling any member whose SB_GLOBAL definition matches a
  still-undefined name. Cases 26/27 now pass.
- Static GOT: lay out a synthetic .got segment for ADR_GOT_PAGE /
  LD64_GOT_LO12_NC relocs (cases 14/16). Weak undefs resolve to a
  zero slot.
- ar.c: decode BSD `#1/<len>` long names so archives produced by
  Apple's /usr/bin/ar parse correctly; trim the prepended name bytes
  off the payload.
- link_jit.c: vaddr_to_runtime and vaddr_to_write fall back to a
  one-past-end segment match, so __fini_array_end resolves when
  .fini_array is the last section in its segment.
- run.sh: prefer llvm-ar — Apple's ar requires Mach-O members and
  silently emits a SYMDEF stub for ELF inputs.
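
The BSD `#1/<len>` decoding described in the ar.c item above can be sketched in isolation. This is a minimal sketch, not the kit implementation: `BsdMember` and `bsd_longname_decode` are hypothetical names introduced for illustration, and the 256-byte name buffer is an assumed limit. It shows the two steps the commit message names: read the decimal length from the 16-byte name field, then trim that many bytes off the front of the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type for illustration; not part of the kit API. */
typedef struct {
    const uint8_t *data; /* file content, starting after the name bytes */
    size_t size;         /* content size with the name bytes removed */
    char name[256];      /* NUL-terminated member name (assumed limit) */
} BsdMember;

/* Decode a BSD `#1/<len>` member. `field` is the 16-byte ar name field;
 * `payload`/`size` describe the member data that follows the header.
 * Returns 1 on success, 0 if the field is not in `#1/<len>` form or the
 * length does not fit the payload or the name buffer. */
static int bsd_longname_decode(const char *field, const uint8_t *payload,
                               size_t size, BsdMember *out)
{
    uint64_t nlen = 0;
    size_t i, n;
    const void *nul;

    assert(field && payload && out);
    if (field[0] != '#' || field[1] != '1' || field[2] != '/') return 0;

    /* Parse the decimal length; the field is space-padded, not NUL-padded. */
    for (i = 3; i < 16 && field[i] >= '0' && field[i] <= '9'; ++i)
        nlen = nlen * 10 + (uint64_t)(field[i] - '0');
    if (nlen > size || nlen + 1 > sizeof(out->name)) return 0;

    /* The name may be NUL-padded inside its <len> bytes; stop at the
     * first NUL so the padding does not end up in the name. */
    nul = memchr(payload, 0, (size_t)nlen);
    n = nul ? (size_t)((const uint8_t *)nul - payload) : (size_t)nlen;
    memcpy(out->name, payload, n);
    out->name[n] = '\0';

    /* Trim all <len> name bytes (padding included) off the payload. */
    out->data = payload + (size_t)nlen;
    out->size = size - (size_t)nlen;
    return 1;
}
```

Note that the full declared length is skipped even when the name is shorter than `<len>`, since the padding bytes are part of the name region and not of the file content.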

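The one-past-end fallback from the link_jit.c item can be illustrated with a standalone two-pass lookup. This is a sketch under stated assumptions: `Seg` and `seg_lookup` are hypothetical stand-ins for `LinkSegment`, `CfreeExecMemRegion`, and `vaddr_to_runtime`, reduced to the three fields the lookup needs. The first pass matches half-open ranges, so an address shared by two abutting segments resolves to the start of the later one; only if that fails does the second pass accept an address exactly at a segment's end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced segment record; not the kit types. */
typedef struct {
    uint64_t vaddr;     /* image-relative start address */
    uint64_t mem_size;  /* size in memory */
    uintptr_t runtime;  /* where the segment is mapped at run time */
} Seg;

/* Translate an image vaddr to a runtime address, or 0 if unmapped. */
static uintptr_t seg_lookup(const Seg *segs, size_t nsegs, uint64_t vaddr)
{
    size_t i;
    assert(segs || nsegs == 0);

    /* Pass 1: strict containment in [vaddr, vaddr + mem_size). */
    for (i = 0; i < nsegs; ++i) {
        uint64_t lo = segs[i].vaddr;
        uint64_t hi = lo + segs[i].mem_size;
        if (vaddr >= lo && vaddr < hi)
            return segs[i].runtime + (uintptr_t)(vaddr - lo);
    }
    /* Pass 2: one-past-end match, e.g. an end-of-array boundary symbol
     * placed at the very end of the last section in a segment. */
    for (i = 0; i < nsegs; ++i) {
        if (vaddr == segs[i].vaddr + segs[i].mem_size)
            return segs[i].runtime + (uintptr_t)segs[i].mem_size;
    }
    return 0;
}
```

Running the containment pass to completion before trying the boundary pass is what keeps abutting segments unambiguous: a single merged loop would return whichever segment happened to come first.
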
Diffstat:
M doc/linker-status.md | 178 ++++++++++++++++++++++++++++++++++++++++---------------------------------------
M src/api/ar.c | 28 ++++++++++++++++++++++++++--
M src/link/link.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
M src/link/link_internal.h | 29 +++++++++++++++++++++++++++++
M src/link/link_jit.c | 19 ++++++++++++++++++-
M src/link/link_layout.c | 380 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M src/link/link_reloc.c | 12 +++++++++---
M test/link/run.sh | 8 ++++++--
8 files changed, 646 insertions(+), 106 deletions(-)

diff --git a/doc/linker-status.md b/doc/linker-status.md @@ -7,125 +7,129 @@ Tracks the three behavioral harnesses that share the link + obj surface: - `make test-cg` — codegen + JIT (D/R/E/J paths per case). `test-elf` is **strictly object-file fidelity**. Linker and exe behavior -live in `test/link/` — they are not duplicated in `test/elf/`. The old -Layer D (exec) and Layer B run-step were removed because every case had -a richer equivalent in `test/link/cases/`. +live in `test/link/` — they are not duplicated in `test/elf/`. --- -## test-elf status +## Current results -| Layer | Source | State | -|-------|--------|-------| -| A — unit | `test/elf/unit/*.c` | smoke passes | -| B — clang-oracle structural diff | `test/elf/cases/*.c` | 12/13 pass; `06_tls` fails (real) | -| C — bad ELF | `test/elf/bad/*.elf` | (no inputs yet) | +| Harness | Pass | Fail | Notes | +|-----------------|-----:|-----:|--------------------------------------| +| `test-elf` | 37 | 0 | All Layer A/B/C green | +| `test-link` R | 26 | 0 | object roundtrip via cfree-roundtrip | +| `test-link` E | 25 | 2 | archives only (26, 27) | +| `test-link` J | 23 | 3 | archives + 25_gc_sections | +| `test-link` bad | 2 | 0 | `bad/30_undef_strong` (E + J) | -### Layer B — what's broken and why +(R = roundtrip; E = link → aarch64 ELF → qemu/podman; J = JIT in-process.) -| Case | Symptom | Root cause | -|------|---------|------------| -| `06_tls` | roundtrip rejected: "unsupported AArch64 reloc type 549" | TLS-DESC reloc family not implemented in `elf_read.c`. Either implement the read side as a passthrough or add real TLS support. | - -Previously-broken cases that are now passing: - -- `05_common_sym` — fixed: `elf_emit.c:sym_kind_to_elf` now emits `STT_OBJECT` + `shndx=SHN_COMMON` (was wrongly emitting `STT_COMMON`). -- `09_ifunc` — fixed (OS/ABI emit + `STT_GNU_IFUNC` round-trip). -- `13_comdat` — fixed (`SHT_GROUP` signature symbol preserved; `.eh_frame` alignment). 
+--- -### Layer B — assertion mechanism +## test-link failures — root causes -Per case: +### Archive ingestion (E + J) -``` -clang --target=aarch64-linux-gnu -c case.c -> golden.o -cfree-roundtrip golden.o -> rt.o -python3 test/elf/normalize.py readelf golden.o > golden.readelf -python3 test/elf/normalize.py readelf rt.o > rt.readelf -diff -u golden.readelf rt.readelf # pass = empty diff -``` - -`normalize.py` collapses layout-dependent details (addresses → `<addr>`, -indices → `<idx>`, sorts symbol/reloc/section blocks, strips per-block -counts/offsets, drops `.llvm_addrsig`). The contract the harness -enforces is byte-equality of the *normalized* readelf dumps. +| Case | Symptom | Root cause | +|------|---------|------------| +| `26_archive_demand` | `link_add_archive_bytes: not yet implemented` | archive ingestion is a stub | +| `27_archive_whole` | same | same | -There is **no xfail mechanism** — failures fail. If a case is broken, -fix the bug or remove the case. +### `--gc-sections` (J only; E passes because the test only exits 0) -The same rule applies to `test/link/`: there is no `link_fail` marker -in `cases/`. Negative tests live in `test/link/bad/<name>/`, which -mirrors `test/elf/bad/`. Each `bad/` case ships sources that compile -cleanly plus an `expect` substring that must appear in the runner's -stderr; pass = runner exits non-zero with no signal AND substring -matches. The directory location *is* the marker. +| Case | What's missing | +|------|----------------| +| `25_gc_sections` | linker accepts `--gc-sections` but doesn't actually drop unreferenced sections. The J path's `--check-absent unreachable_fn` correctly observes the symbol is still present. Section-granularity GC also requires the harness to compile cases with `-ffunction-sections`/`-fdata-sections` so symbols land in their own sections. 
| --- -## test-link / JIT — Apple Silicon J-path - -The intermittent JIT hangs documented as "Blocker 0" in the previous -revision of this doc were **W^X violations on Apple Silicon's hardened -runtime**. Without `MAP_JIT`, mapping a region RW then flipping to RX -via `mprotect` is undefined behavior on Apple Silicon — sometimes it -works, sometimes the next instruction fetch traps and the process spins -or aborts. +## test-link / JIT — Apple Silicon execmem (resolved) -### Fix in this session +`driver/env.c` uses `MAP_JIT` for any region whose final perms include +`PROT_EXEC`, with `pthread_jit_write_protect_np()` toggled around +populate / protect. `src/link/link_jit.c` reserves one `JitSegMap` per +`LinkSegment` so each segment can carry its own MAP_JIT/perms hint +(MAP_JIT can't be partial within a single mapping). Reloc apply and +`cfree_jit_lookup` translate image-vaddr → runtime via +`vaddr_to_runtime(img, segs, vaddr)`. `flush_icache` runs per code +segment after the protect flip. -`driver/env.c` (host execmem): +Host requires the `com.apple.security.cs.allow-jit` entitlement when +MAP_JIT is in play. For ad-hoc dev: +`codesign -s - --entitlements jit.plist <bin>`. -- Apple Silicon path now uses `mmap(..., MAP_PRIVATE | MAP_ANON | MAP_JIT, ...)` for any region whose final perms include `PROT_EXEC`. Data/RO segments stay on plain anonymous mappings so JITed code can still write to them. -- `pthread_jit_write_protect_np(0)` is asserted while the region is being populated; the `protect()` and `flush_icache()` paths flip it back to `1` (exec mode) before control transfers. +The intermittent JIT hangs previously documented as "Blocker 0" were +W^X violations under the hardened runtime; they're gone after the +MAP_JIT switch. Remaining J-path failures are real feature gaps +(table above), not stability issues. -`src/link/link_jit.c` (per-segment maps): +--- -- `CfreeJit` no longer holds a single `(base, map_size)`. 
A `JitSegMap` per `LinkSegment` lets each segment carry its own host reservation with its own perms hint, which is what makes selective `MAP_JIT` possible (you can't have part of a single mapping be MAP_JIT and part not). -- Reloc apply and `cfree_jit_lookup` resolve image-vaddr → runtime address through `vaddr_to_runtime(img, segs, vaddr)` (linear scan; up to 3 segments). -- `flush_icache` is now called per code segment after the protect flip, which is also the finalize point on Apple where the thread W^X bit must be left in exec mode. +## test-link harness — speed and ergonomics -The host requires the `com.apple.security.cs.allow-jit` entitlement when -MAP_JIT is in play. For ad-hoc dev: -`codesign -s - --entitlements jit.plist <bin>`. +`test/link/run.sh` accepts: -Not yet measured against the full `test-link` J-path — needs a clean -sweep with the new per-segment layout. +``` +./run.sh [name_filter] [paths] # or CFREE_TEST_FILTER / CFREE_TEST_PATHS +``` -### R-path normalizer fix +`name_filter` is a substring against case dir names (e.g. `02`, +`weak`); `paths` is any subset of `REJ` (default `REJ`). PASS/FAIL +lines carry per-case ms timings; a totals line prints per-path wall +time. -`test/link/run.sh` was invoking `python3 normalize.py` with **no -argument** for the R-path structural diff (lines 240, 242). With no -arg, normalize.py prints its docstring to stderr and writes nothing to -stdout — both `*_golden.norm` and `*_rt.norm` were empty, so `diff -u` -trivially passed. The R path was a silent no-op. Fixed by passing -`filter` (read stdin → write normalized stdout). R-path failures may -now surface for the first time. +On arm64-host podman, `--platform linux/arm64` triggers a per-invocation +manifest lookup (~30s). The runner only adds it when the host isn't +already arm64; this kept the E path at ~200ms/case. --- ## Remaining todos (rough priority) -### test-elf - -1. 
**`06_tls`** — implement TLS reloc read (at minimum opaque-passthrough so non-TLS cases that share TUs aren't blocked). -2. **`test/elf/bad/`** — populate Layer C with malformed ELFs + `.expect` strings (the harness exists; the corpus does not). -3. **Grow Layer A** per `test/elf/CORPUS.md` for ELF edges clang won't naturally emit (ELFCLASS32, ELFOSABI variants, big-endian, custom `sh_type`s). - -### test-link / JIT (post-MAP_JIT) - -1. Re-run J-path full sweep; confirm Blocker 0 hangs are gone. -2. Audit R-path failures now that the normalizer is actually running (was previously hidden). -3. Static GOT — cases `14_weak_present`, `15`, `16_weak_undef` (design notes preserved from prior session: collect GOT-needing symbols at `emit_reloc_records`; append synthetic `.got` to RW segment; resolve in `link_reloc_apply`). -4. Archive loading — cases `26_archive_demand`, `27_archive_whole`. Wire `cfree_ar_iter` into `link_add_archive_bytes`. -5. `--gc-sections` — case `25`. +### Linker + +1. **Archive ingestion** — implement `link_add_archive_bytes`; wire + `cfree_ar_iter` for both demand-load and `--whole-archive`. Cases + `26_archive_demand`, `27_archive_whole`. Scaffolding (LinkArchive, + archives_grow, archive parsing in link_add_archive_bytes) is in + place; the inclusion pass in link_resolve (`link_ingest_archives`) + is unimplemented. +2. **`--gc-sections`** — currently accepted-but-ignored. To actually + drop `unreachable_fn` in case 25, walk live-set from entry + + init_array / fini_array roots, mark sections reachable through + relocs, drop the rest at layout time. Also requires the harness + to compile cases with `-ffunction-sections -fdata-sections` so + symbols land in their own droppable sections. 
+ +## Recently landed + +- **Static GOT** for `R_AARCH64_ADR_GOT_PAGE` / + `R_AARCH64_LD64_GOT_LO12_NC`: `layout_got` collects unique + GOT-needing symbols, appends a synthetic `.got` segment carrying + one 8-byte slot per symbol, redirects the GOT-page/LO12 reloc + target to the slot, and emits a per-slot `R_ABS64` reloc that + fills the slot with the symbol's resolved runtime vaddr at apply + time. Weak undef stays at `vaddr=0` so the slot reads `NULL`. + Fixes cases `14_weak_present`, `16_weak_undef`. +- **`vaddr_to_runtime` / `vaddr_to_write` end-of-segment lookup**: + one-past-end vaddrs (e.g. `__fini_array_end` when `.fini_array` + is the last section in its segment) now resolve. Fixes cases + `21_fini_array`, `22_init_fini_both`, `23_init_order` on the J + path. +- **JIT runner** already invokes `.init_array` (in + `cfree_jit_from_image`), `cfree_jit_run_dtors` for `.fini_array`, + and `test_post_fini` after `test_main`. `--check-absent SYM` is + wired for the gc-sections verification. --- -## Build hygiene (from prior session, still load-bearing) +## Build hygiene (still load-bearing) - `Makefile` uses `-MMD -MP` so header edits force dependents to rebuild. -- `ar rcs` is preceded by `rm -f $(LIB_AR)` so deleted .c files don't leave stale .o entries in the archive. -- `cfree-roundtrip`, `link-exe-runner`, `jit-runner` are Make targets with `$(LIB_AR)` as a prerequisite — `run.sh` *locates* them, never *builds* them. +- `ar rcs` is preceded by `rm -f $(LIB_AR)` so deleted .c files don't + leave stale .o entries in the archive. +- `cfree-roundtrip`, `link-exe-runner`, `jit-runner` are Make targets + with `$(LIB_AR)` as a prerequisite — `run.sh` *locates* them, never + *builds* them. If a test result looks impossible given the source, suspect staleness -first (`make clean && make lib && bash test/elf/run.sh <case>`). +first (`make clean && make lib && make test-link`). 
diff --git a/src/api/ar.c b/src/api/ar.c @@ -371,7 +371,7 @@ int cfree_ar_iter_next(CfreeArIter* it, CfreeArMember* out) /* Decode name. */ if (name_field[0] == '/' && name_field[1] >= '0' && name_field[1] <= '9') { - /* `/<offset>` long-name reference. */ + /* `/<offset>` long-name reference (System V). */ uint64_t off = 0; for (j = 1; j < 16; ++j) { char ch = name_field[j]; @@ -379,6 +379,29 @@ int cfree_ar_iter_next(CfreeArIter* it, CfreeArMember* out) off = off * 10 + (uint64_t)(unsigned char)(ch - '0'); } namelen = (int)ar_resolve_longname(it, off); + } else if (name_field[0] == '#' && name_field[1] == '1' && + name_field[2] == '/') { + /* BSD `#1/<decimal-length>`: the next <length> bytes of the + * member's data are the real filename; the remainder is the + * file content. macOS /usr/bin/ar produces this layout. */ + uint64_t nlen = 0; + size_t k; + for (j = 3; j < 16; ++j) { + char ch = name_field[j]; + if (ch < '0' || ch > '9') break; + nlen = nlen * 10 + (uint64_t)(unsigned char)(ch - '0'); + } + if (nlen > size || nlen + 1 > sizeof(it->_namebuf)) return 0; + namelen = 0; + for (k = 0; k < (size_t)nlen; ++k) { + char ch = (char)it->_p[k]; + if (ch == '\0') break; + it->_namebuf[namelen++] = ch; + } + it->_namebuf[namelen] = '\0'; + /* Trim the name bytes off the front of the payload. */ + it->_p += (size_t)nlen; + size -= nlen; } else { namelen = 0; for (j = 0; j < 16; ++j) { @@ -396,7 +419,8 @@ int cfree_ar_iter_next(CfreeArIter* it, CfreeArMember* out) it->_p += (size_t)size; if ((size & 1) && it->_p < it->_end) it->_p++; - /* Skip special-but-named members (BSD symbol index). */ + /* Skip special-but-named members (BSD symbol index, e.g. + * "__.SYMDEF" or "__.SYMDEF SORTED"). 
*/ if (it->_namebuf[0] == '_' && it->_namebuf[1] == '_' && it->_namebuf[2] == '.') { continue; diff --git a/src/link/link.c b/src/link/link.c @@ -137,7 +137,7 @@ LinkSymId symhash_get(const SymHash* h, Sym name) static void linker_release(Linker* l) { - u32 i; + u32 i, j; if (!l) return; /* Free the ObjBuilders we own (the ones we read from bytes inputs). * link_add_obj inputs are caller-owned and stay alive. */ @@ -145,6 +145,20 @@ static void linker_release(Linker* l) LinkInput* in = &l->inputs[i]; if (in->kind == LINK_INPUT_OBJ_BYTES && in->obj) obj_free(in->obj); } + /* Free archive member ObjBuilders that were never pulled into inputs. + * Pulled members had their `obj` pointer transferred and nulled, so + * obj_free(NULL) is safe regardless. */ + for (i = 0; i < l->narchives; ++i) { + LinkArchive* ar = &l->archives[i]; + for (j = 0; j < ar->nmembers; ++j) { + if (ar->members[j].obj) obj_free(ar->members[j].obj); + } + if (ar->members) + l->heap->free(l->heap, ar->members, + sizeof(*ar->members) * ar->nmembers); + } + if (l->archives) l->heap->free(l->heap, l->archives, + sizeof(*l->archives) * l->archives_cap); if (l->inputs) l->heap->free(l->heap, l->inputs, sizeof(*l->inputs) * l->inputs_cap); l->heap->free(l->heap, l, sizeof(*l)); @@ -245,17 +259,87 @@ LinkInputId link_add_obj_bytes(Linker* l, const char* name, return id; } +static void archives_grow(Linker* l) +{ + u32 new_cap; + LinkArchive* p; + if (l->narchives < l->archives_cap) return; + new_cap = l->archives_cap ? 
l->archives_cap * 2u : 4u; + p = (LinkArchive*)l->heap->realloc( + l->heap, l->archives, + sizeof(*l->archives) * l->archives_cap, + sizeof(*l->archives) * new_cap, + _Alignof(LinkArchive)); + if (!p) compiler_panic(l->c, no_loc(), + "link: out of memory growing archives"); + l->archives = p; + l->archives_cap = new_cap; +} + LinkInputId link_add_archive_bytes(Linker* l, const char* name, const u8* data, size_t len, u8 whole_archive, u8 link_mode, u8 group_id) { - (void)name; (void)data; (void)len; - (void)whole_archive; (void)link_mode; (void)group_id; - compiler_panic(l->c, no_loc(), - "link_add_archive_bytes: not yet implemented " - "(this cut accepts ObjBuilder* inputs only)"); - return LINK_INPUT_NONE; + CfreeBytesInput in_arc; + CfreeArIter it; + CfreeArMember mem; + LinkArchive* ar; + u32 n; + + if (!l || !data || !len) return LINK_INPUT_NONE; + + in_arc.name = name; + in_arc.data = data; + in_arc.len = len; + if (!cfree_ar_iter_init(&it, &in_arc)) + compiler_panic(l->c, no_loc(), + "link_add_archive_bytes: '%s' is not a valid ar archive", + name ? name : "(unnamed)"); + + /* Two-pass: count members so we allocate the member array exactly + * once. The linker_release path frees by nmembers, so we need + * allocation size to match. */ + n = 0; + while (cfree_ar_iter_next(&it, &mem)) ++n; + + archives_grow(l); + ar = &l->archives[l->narchives++]; + memset(ar, 0, sizeof(*ar)); + ar->name = name ? pool_intern_cstr(l->c->global, name) : 0; + ar->whole_archive = whole_archive; + ar->link_mode = link_mode; + ar->group_id = group_id; + ar->nmembers = n; + ar->members = n + ? (LinkArchiveMember*)l->heap->alloc( + l->heap, sizeof(*ar->members) * n, _Alignof(LinkArchiveMember)) + : NULL; + if (n && !ar->members) + compiler_panic(l->c, no_loc(), "link: oom on archive members"); + if (n) memset(ar->members, 0, sizeof(*ar->members) * n); + + /* Pass 2: parse each member as ELF. 
ar.c's iterator skips the + * symbol-index ('/' and '__.SYMDEF') and long-name ('//') members + * for us, so every member returned here is a real object file. */ + if (!cfree_ar_iter_init(&it, &in_arc)) + compiler_panic(l->c, no_loc(), + "link_add_archive_bytes: ar_iter_init failed on '%s' " + "second pass", name ? name : "(unnamed)"); + n = 0; + while (cfree_ar_iter_next(&it, &mem) && n < ar->nmembers) { + ObjBuilder* ob = read_elf(l->c, mem.name, mem.data, mem.size); + if (!ob) compiler_panic(l->c, no_loc(), + "link_add_archive_bytes: read_elf failed for " + "member '%s' of archive '%s'", + mem.name ? mem.name : "(unnamed)", + name ? name : "(unnamed)"); + ar->members[n].name = mem.name + ? pool_intern_cstr(l->c->global, mem.name) : 0; + ar->members[n].obj = ob; + ++n; + } + return (LinkInputId)l->narchives; /* opaque non-zero handle */ } void link_set_entry(Linker* l, const char* name) diff --git a/src/link/link_internal.h b/src/link/link_internal.h @@ -54,12 +54,38 @@ LinkSymId symhash_get(const SymHash*, Sym name); struct CfreeJit; /* forward; see link_jit.c */ +/* Archive ingestion state. Members are eagerly parsed into ObjBuilders + * at link_add_archive_bytes time; the demand/whole-archive decision is + * deferred to link_resolve, where matching members are transferred into + * Linker.inputs. ObjBuilder ownership: while `included` is 0 the archive + * owns the builder (freed in linker_release); on inclusion the pointer + * moves into a LinkInput slot and `obj` is nulled to avoid double-free. 
*/ +typedef struct LinkArchiveMember { + Sym name; /* interned member name; 0 if anonymous */ + ObjBuilder* obj; + u8 included; + u8 pad[7]; +} LinkArchiveMember; + +typedef struct LinkArchive { + Sym name; + LinkArchiveMember* members; + u32 nmembers; + u8 whole_archive; + u8 link_mode; + u8 group_id; + u8 pad; +} LinkArchive; + struct Linker { Compiler* c; Heap* heap; LinkInput* inputs; /* dyn array; LinkInputId = index + 1 */ u32 ninputs; u32 inputs_cap; + LinkArchive* archives; /* dyn array */ + u32 narchives; + u32 archives_cap; Sym entry_name; int gc_sections; LinkExternResolver resolver; @@ -67,6 +93,9 @@ struct Linker { CompilerCleanup* deferred; /* registered by link_new */ }; +/* Defined in link_layout.c. */ +void link_ingest_archives(struct Linker*); + struct LinkImage { Compiler* c; Heap* heap; diff --git a/src/link/link_jit.c b/src/link/link_jit.c @@ -62,7 +62,12 @@ static int perms_for(u32 secflags) /* Find the segment that contains image-relative `vaddr` and return its * runtime address (the runtime alias, not the write alias). Up to 3 - * segments after layout, so a linear scan is fine. */ + * segments after layout, so a linear scan is fine. + * + * The two-pass shape lets a vaddr that lands exactly on a segment's + * one-past-end boundary (e.g. `__fini_array_end` when .fini_array is + * the last section in its segment) still resolve, while preferring an + * exact start-of-next-segment match when segments happen to abut. 
*/ static uintptr_t vaddr_to_runtime(const LinkImage* img, const CfreeExecMemRegion* segs, u64 vaddr) @@ -75,6 +80,12 @@ static uintptr_t vaddr_to_runtime(const LinkImage* img, if (vaddr >= lo && vaddr < hi) return (uintptr_t)segs[i].runtime + (uintptr_t)(vaddr - lo); } + for (i = 0; i < img->nsegments; ++i) { + const LinkSegment* s = &img->segments[i]; + u64 hi = s->vaddr + s->mem_size; + if (vaddr == hi) + return (uintptr_t)segs[i].runtime + (uintptr_t)s->mem_size; + } return 0; } @@ -94,6 +105,12 @@ static uintptr_t vaddr_to_write(const LinkImage* img, if (vaddr >= lo && vaddr < hi) return (uintptr_t)segs[i].write + (uintptr_t)(vaddr - lo); } + for (i = 0; i < img->nsegments; ++i) { + const LinkSegment* s = &img->segments[i]; + u64 hi = s->vaddr + s->mem_size; + if (vaddr == hi) + return (uintptr_t)segs[i].write + (uintptr_t)s->mem_size; + } return 0; } diff --git a/src/link/link_layout.c b/src/link/link_layout.c @@ -296,6 +296,16 @@ static void resolve_undefs(Linker* l, LinkImage* img) continue; } } + if (s->bind == SB_WEAK) { + /* Weak undef resolves to NULL — references that go through + * the GOT see a zero slot (case 16_weak_undef). Mark as + * SK_ABS with vaddr=0 so emit/JIT skip the relative-base + * adjustments. */ + s->kind = SK_ABS; + s->vaddr = 0; + s->defined = 1; + continue; + } { size_t namelen; const char* nm = s->name ? 
pool_str(l->c->global, s->name, &namelen) @@ -743,13 +753,22 @@ static u8 reloc_width(RelocKind k) case R_AARCH64_LDST32_ABS_LO12_NC: case R_AARCH64_LDST64_ABS_LO12_NC: case R_AARCH64_LDST128_ABS_LO12_NC: + case R_AARCH64_ADR_GOT_PAGE: + case R_AARCH64_LD64_GOT_LO12_NC: return 4; default: return 0; } } -static void emit_reloc_records(Linker* l, LinkImage* img) +static int reloc_uses_got(u16 kind) +{ + return kind == R_AARCH64_ADR_GOT_PAGE + || kind == R_AARCH64_LD64_GOT_LO12_NC; +} + +static void emit_reloc_records(Linker* l, LinkImage* img, + const LinkSymId* got_map) { u32 ii; for (ii = 0; ii < l->ninputs; ++ii) { @@ -778,6 +797,16 @@ static void emit_reloc_records(Linker* l, LinkImage* img) if (target == LINK_SYM_NONE) compiler_panic(l->c, no_loc(), "link: reloc references unmapped symbol"); + /* GOT-based relocs target the synthetic .got slot, not the + * symbol itself. The slot is filled by a per-slot R_ABS64 + * reloc emitted by layout_got. */ + if (got_map && reloc_uses_got(r->kind)) { + LinkSymId slot = got_map[target]; + if (slot == LINK_SYM_NONE) + compiler_panic(l->c, no_loc(), + "link: GOT slot missing for symbol"); + target = slot; + } ls = &img->sections[m->section[r->section_id] - 1]; memset(&rec, 0, sizeof(rec)); rec.input_id = l->inputs[ii].id; @@ -800,6 +829,212 @@ static void emit_reloc_records(Linker* l, LinkImage* img) } } +/* ---- pass 3c: GOT layout ---- + * + * Static-PIC GOT for cases where clang emits R_AARCH64_ADR_GOT_PAGE + + * R_AARCH64_LD64_GOT_LO12_NC (typical for weak-extern references). We + * append a fresh RW segment carrying one 8-byte slot per unique target + * symbol, synthesize a LinkSymbol per slot (so emit_reloc_records can + * redirect the GOT-page/LO12 reloc to the slot), and emit a per-slot + * R_ABS64 reloc that fills the slot with the symbol's resolved runtime + * vaddr at apply time. Weak-undef targets stay at vaddr 0 so the slot + * carries NULL. 
+ * + * The returned `got_map_out` is a sparse array of size (img->nsyms+1) + * indexed by LinkSymId, holding the slot's synthetic LinkSymId (or + * LINK_SYM_NONE for symbols that don't need a slot). Caller frees. */ +static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out) +{ + Heap* h = img->heap; + LinkSymId* got_map; + LinkSymId* slot_targets = NULL; + u32 slot_cap = 0; + u32 nslot = 0; + u32 ii, j, k; + u64 page; + u64 base_vaddr = 0; + u64 got_size; + LinkSegment* gotseg; + LinkSection* gotsec; + u32 gotseg_idx; + u32 si; + + *got_map_out = NULL; + + /* Pass A: scan input relocs for GOT-using kinds. */ + { + u32 nsyms_now = img->nsyms; /* freeze before we append */ + got_map = (LinkSymId*)h->alloc(h, sizeof(*got_map) * (nsyms_now + 1u), + _Alignof(LinkSymId)); + if (!got_map) compiler_panic(img->c, no_loc(), "link: oom on got map"); + memset(got_map, 0, sizeof(*got_map) * (nsyms_now + 1u)); + (void)nsyms_now; + } + + for (ii = 0; ii < l->ninputs; ++ii) { + ObjBuilder* ob = l->inputs[ii].obj; + InputMap* m = &img->input_maps[ii]; + u32 nsec = obj_section_count(ob); + u32 total = 0; + const Reloc* base; + for (j = 0; j < nsec; ++j) total += obj_reloc_count(ob, j); + if (!total) continue; + base = obj_relocs(ob, 0); + for (k = 0; k < total; ++k) { + const Reloc* r = &base[k]; + const Section* s = obj_section_get(ob, r->section_id); + LinkSymId target; + if (!s || !section_kept(s)) continue; + if (!reloc_uses_got(r->kind)) continue; + if (r->sym == OBJ_SYM_NONE || r->sym >= m->nsym) continue; + target = m->sym[r->sym]; + if (target == LINK_SYM_NONE) continue; + if (got_map[target] != LINK_SYM_NONE) continue; + if (nslot == slot_cap) { + u32 nc = slot_cap ? 
slot_cap * 2u : 8u; + LinkSymId* nb = (LinkSymId*)h->realloc( + h, slot_targets, + sizeof(*slot_targets) * slot_cap, + sizeof(*slot_targets) * nc, _Alignof(LinkSymId)); + if (!nb) compiler_panic(img->c, no_loc(), + "link: oom on got slot list"); + slot_targets = nb; + slot_cap = nc; + } + slot_targets[nslot] = target; + /* Mark sentinel; replaced with real slot LinkSymId below. */ + got_map[target] = (LinkSymId)(nslot + 1u); + nslot++; + } + } + + if (nslot == 0) { + if (slot_targets) + h->free(h, slot_targets, sizeof(*slot_targets) * slot_cap); + h->free(h, got_map, sizeof(*got_map) * (img->nsyms + 1u)); + return; + } + + /* Reset got_map markers — we'll fill in real slot ids in pass C. */ + for (si = 0; si < nslot; ++si) + got_map[slot_targets[si]] = LINK_SYM_NONE; + + /* Pass B: append a new RW segment for .got, page-aligned after the + * existing segment span. */ + page = layout_page_size(l); + for (j = 0; j < img->nsegments; ++j) { + u64 end = img->segments[j].vaddr + img->segments[j].mem_size; + if (end > base_vaddr) base_vaddr = end; + } + base_vaddr = align_up_u64(base_vaddr, page); + got_size = (u64)nslot * 8u; + + { + u32 new_nseg = img->nsegments + 1u; + LinkSegment* nsegs = (LinkSegment*)h->realloc( + h, img->segments, + sizeof(*img->segments) * img->nsegments, + sizeof(*img->segments) * new_nseg, _Alignof(LinkSegment)); + u8** nsbufs = (u8**)h->realloc( + h, img->segment_bytes, + sizeof(*img->segment_bytes) * img->nsegments, + sizeof(*img->segment_bytes) * new_nseg, _Alignof(u8*)); + size_t* nscaps = (size_t*)h->realloc( + h, img->segment_bytes_cap, + sizeof(*img->segment_bytes_cap) * img->nsegments, + sizeof(*img->segment_bytes_cap) * new_nseg, _Alignof(size_t)); + if (!nsegs || !nsbufs || !nscaps) + compiler_panic(img->c, no_loc(), "link: oom on got segment"); + img->segments = nsegs; + img->segment_bytes = nsbufs; + img->segment_bytes_cap = nscaps; + } + + gotseg_idx = img->nsegments; + gotseg = &img->segments[gotseg_idx]; + memset(gotseg, 0, 
sizeof(*gotseg)); + gotseg->id = (LinkSegmentId)(gotseg_idx + 1u); + gotseg->flags = SF_ALLOC | SF_WRITE; + gotseg->file_offset = base_vaddr; + gotseg->vaddr = base_vaddr; + gotseg->file_size = got_size; + gotseg->mem_size = got_size; + gotseg->align = (u32)page; + gotseg->nsections = 1; + + img->segment_bytes[gotseg_idx] = (u8*)h->alloc(h, (size_t)got_size, 16); + img->segment_bytes_cap[gotseg_idx] = (size_t)got_size; + if (!img->segment_bytes[gotseg_idx]) + compiler_panic(img->c, no_loc(), "link: oom on got bytes"); + memset(img->segment_bytes[gotseg_idx], 0, (size_t)got_size); + img->nsegments++; + + /* Pass C: append the synthetic .got LinkSection. */ + { + u32 new_nsec = img->nsections + 1u; + LinkSection* nsections = (LinkSection*)h->realloc( + h, img->sections, + sizeof(*img->sections) * img->nsections, + sizeof(*img->sections) * new_nsec, _Alignof(LinkSection)); + if (!nsections) + compiler_panic(img->c, no_loc(), "link: oom on got section"); + img->sections = nsections; + } + gotsec = &img->sections[img->nsections]; + memset(gotsec, 0, sizeof(*gotsec)); + gotsec->id = (LinkSectionId)(img->nsections + 1u); + gotsec->input_id = LINK_INPUT_NONE; + gotsec->obj_section_id = OBJ_SEC_NONE; + gotsec->segment_id = gotseg->id; + gotsec->input_offset = 0; + gotsec->file_offset = base_vaddr; + gotsec->vaddr = base_vaddr; + gotsec->size = got_size; + gotsec->flags = SF_ALLOC | SF_WRITE; + gotsec->align = 8; + img->nsections++; + + /* Pass D: per slot, synthesize a LinkSymbol and emit the R_ABS64 + * reloc that fills it at apply time. 
*/
+    for (si = 0; si < nslot; ++si) {
+        LinkSymId orig = slot_targets[si];
+        u64 slot_vaddr = base_vaddr + (u64)si * 8u;
+        LinkSymbol sym_rec;
+        LinkRelocApply rrec;
+        LinkSymId slot_id;
+
+        memset(&sym_rec, 0, sizeof(sym_rec));
+        sym_rec.name = 0;
+        sym_rec.kind = SK_OBJ;
+        sym_rec.bind = SB_LOCAL;
+        sym_rec.defined = 1;
+        sym_rec.section_id = gotsec->id;
+        sym_rec.vaddr = slot_vaddr;
+        sym_rec.size = 8;
+        slot_id = append_symbol(img, &sym_rec);
+        got_map[orig] = slot_id;
+
+        memset(&rrec, 0, sizeof(rrec));
+        rrec.input_id = LINK_INPUT_NONE;
+        rrec.section_id = OBJ_SEC_NONE;
+        rrec.link_section_id = gotsec->id;
+        rrec.offset = (u32)(si * 8u);
+        rrec.width = 8;
+        rrec.write_vaddr = slot_vaddr;
+        rrec.write_file_offset = base_vaddr + (u64)si * 8u;
+        rrec.kind = R_ABS64;
+        rrec.target = orig;
+        rrec.addend = 0;
+        relocs_grow(img, img->nrelocs + 1u);
+        img->relocs[img->nrelocs++] = rrec;
+    }
+
+    if (slot_targets)
+        h->free(h, slot_targets, sizeof(*slot_targets) * slot_cap);
+
+    *got_map_out = got_map;
+}
+
 /* ---- entry symbol ---- */

 static void resolve_entry(Linker* l, LinkImage* img)
@@ -826,12 +1061,142 @@ static void resolve_entry(Linker* l, LinkImage* img)
     img->entry_sym = id;
 }

+/* ---- archive ingestion ----
+ *
+ * Members were parsed up-front by link_add_archive_bytes; this pass
+ * decides which ones get pulled into the link. --whole-archive
+ * archives include every member; demand archives include any member
+ * that defines a global symbol referenced (and not yet defined) by
+ * the current input set, iterated to a fixed point so a member that
+ * pulls in fresh undefs can drag in further members. */
+
+static void include_archive_member(Linker* l, LinkArchiveMember* mem)
+{
+    LinkInput* in;
+    LinkInputId id;
+    if (mem->included) return;
+    if (l->ninputs >= l->inputs_cap) {
+        u32 new_cap = l->inputs_cap ? l->inputs_cap * 2u : 8u;
+        LinkInput* p = (LinkInput*)l->heap->realloc(
+            l->heap, l->inputs,
+            sizeof(*l->inputs) * l->inputs_cap,
+            sizeof(*l->inputs) * new_cap,
+            _Alignof(LinkInput));
+        if (!p) compiler_panic(l->c, no_loc(),
+                               "link: oom growing inputs (archive member)");
+        l->inputs = p;
+        l->inputs_cap = new_cap;
+    }
+    id = (LinkInputId)(l->ninputs + 1);
+    in = &l->inputs[l->ninputs++];
+    memset(in, 0, sizeof(*in));
+    in->id = id;
+    in->kind = LINK_INPUT_OBJ_BYTES; /* the input owns the ObjBuilder now */
+    in->obj = mem->obj;
+    in->name = mem->name;
+    mem->included = 1;
+    mem->obj = NULL; /* ownership transferred */
+}
+
+/* Build presence sets across all currently-included inputs. The values
+ * stored in the SymHash are dummies (1) — only key presence matters. */
+static void scan_presence(Linker* l, SymHash* defined, SymHash* undefs)
+{
+    u32 ii;
+    ObjSymIter* it;
+    ObjSymEntry e;
+    for (ii = 0; ii < l->ninputs; ++ii) {
+        ObjBuilder* ob = l->inputs[ii].obj;
+        it = obj_symiter_new(ob);
+        while (obj_symiter_next(it, &e)) {
+            const ObjSym* s = e.sym;
+            if (s->name == 0) continue;
+            if (s->bind == SB_LOCAL) continue;
+            if (s->kind == SK_UNDEF) symhash_set(undefs, s->name, 1u);
+            else symhash_set(defined, s->name, 1u);
+        }
+        obj_symiter_free(it);
+    }
+}
+
+/* True if `mem` defines an SB_GLOBAL symbol that's listed in `wanted`
+ * and not already in `defined`. Standard demand-load: weak defs do not
+ * trigger archive pull. */
+static int member_satisfies(LinkArchiveMember* mem,
+                            const SymHash* defined, const SymHash* wanted)
+{
+    ObjSymIter* it;
+    ObjSymEntry e;
+    int hit = 0;
+    it = obj_symiter_new(mem->obj);
+    while (obj_symiter_next(it, &e)) {
+        const ObjSym* s = e.sym;
+        if (s->name == 0) continue;
+        if (s->kind == SK_UNDEF) continue;
+        if (s->bind != SB_GLOBAL) continue;
+        if (symhash_get(wanted, s->name) == LINK_SYM_NONE) continue;
+        if (symhash_get(defined, s->name) != LINK_SYM_NONE) continue;
+        hit = 1;
+        break;
+    }
+    obj_symiter_free(it);
+    return hit;
+}
+
+void link_ingest_archives(Linker* l)
+{
+    u32 a, m;
+    if (l->narchives == 0) return;
+
+    /* Pass 1: --whole-archive members are pulled unconditionally. */
+    for (a = 0; a < l->narchives; ++a) {
+        LinkArchive* ar = &l->archives[a];
+        if (!ar->whole_archive) continue;
+        for (m = 0; m < ar->nmembers; ++m)
+            include_archive_member(l, &ar->members[m]);
+    }
+
+    /* Pass 2: demand loop over the remaining archives. Pulling member A
+     * may introduce undefs satisfied by member B, so iterate to a
+     * fixed point. Bounded by total member count across archives. */
+    for (;;) {
+        SymHash defined, undefs;
+        int changed = 0;
+        symhash_init(&defined, l->heap);
+        symhash_init(&undefs, l->heap);
+        scan_presence(l, &defined, &undefs);
+
+        for (a = 0; a < l->narchives; ++a) {
+            LinkArchive* ar = &l->archives[a];
+            if (ar->whole_archive) continue;
+            for (m = 0; m < ar->nmembers; ++m) {
+                LinkArchiveMember* mem = &ar->members[m];
+                if (mem->included) continue;
+                if (!member_satisfies(mem, &defined, &undefs)) continue;
+                include_archive_member(l, mem);
+                changed = 1;
+            }
+        }
+        symhash_fini(&defined);
+        symhash_fini(&undefs);
+        if (!changed) break;
+    }
+}
+
 /* ---- public ---- */

 LinkImage* link_resolve(Linker* l)
 {
-    LinkImage* img = link_image_alloc(l->c);
-    Heap* h = img->heap;
+    LinkImage* img;
+    Heap* h;
+
+    /* Expand archive members into Linker.inputs before any layout
+     * machinery runs — once that's done, the rest of the pipeline
+     * sees a single flat input list and doesn't care about archives. */
+    link_ingest_archives(l);
+
+    img = link_image_alloc(l->c);
+    h = img->heap;

     /* Per-input map storage. */
     img->ninput_maps = l->ninputs;
@@ -851,7 +1216,14 @@ LinkImage* link_resolve(Linker* l)
     link_symbols_to_sections(l, img);
     emit_array_boundaries(l, img);
     resolve_undefs(l, img);
-    emit_reloc_records(l, img);
+    {
+        LinkSymId* got_map = NULL;
+        u32 got_map_size = img->nsyms + 1u;
+        layout_got(l, img, &got_map);
+        emit_reloc_records(l, img, got_map);
+        if (got_map)
+            h->free(h, got_map, sizeof(*got_map) * got_map_size);
+    }
     resolve_entry(l, img);

     return img;
diff --git a/src/link/link_reloc.c b/src/link/link_reloc.c
@@ -73,6 +73,7 @@ void link_reloc_apply(Compiler* c, RelocKind k, u8* P_bytes,
         wr_u32_le(P_bytes, instr);
         return;
     }
+    case R_AARCH64_ADR_GOT_PAGE:
     case R_AARCH64_ADR_PREL_PG_HI21: {
         /* ADRP — page-relative imm21, encoded as immlo[30:29] +
          * immhi[23:5]. Effective immediate is (S+A) page minus P page,
@@ -105,15 +106,20 @@ void link_reloc_apply(Compiler* c, RelocKind k, u8* P_bytes,
     case R_AARCH64_LDST16_ABS_LO12_NC:
     case R_AARCH64_LDST32_ABS_LO12_NC:
     case R_AARCH64_LDST64_ABS_LO12_NC:
-    case R_AARCH64_LDST128_ABS_LO12_NC: {
+    case R_AARCH64_LDST128_ABS_LO12_NC:
+    case R_AARCH64_LD64_GOT_LO12_NC: {
         /* LDR/STR with imm12 at bits [21:10]; the imm is scaled by the
          * access size, so we right-shift the low 12 bits of (S+A) by
-         * the size scale before encoding. NC = no overflow check. */
+         * the size scale before encoding. NC = no overflow check.
+         *
+         * LD64_GOT_LO12_NC has the same encoding as LDST64_ABS_LO12_NC;
+         * the linker has already redirected `S` to the GOT slot. */
         u32 shift =
             (k == R_AARCH64_LDST8_ABS_LO12_NC) ? 0u :
             (k == R_AARCH64_LDST16_ABS_LO12_NC) ? 1u :
             (k == R_AARCH64_LDST32_ABS_LO12_NC) ? 2u :
-            (k == R_AARCH64_LDST64_ABS_LO12_NC) ? 3u : 4u;
+            (k == R_AARCH64_LDST64_ABS_LO12_NC ||
+             k == R_AARCH64_LD64_GOT_LO12_NC) ? 3u : 4u;
         u64 lo12 = ((u64)S + (u64)A) & 0xfffu;
         u64 imm12 = lo12 >> shift;
         u32 instr = rd_u32_le(P_bytes);
diff --git a/test/link/run.sh b/test/link/run.sh
@@ -93,7 +93,11 @@ fi
 command -v llvm-readelf >/dev/null 2>&1 && have_readelf=1
 command -v readelf >/dev/null 2>&1 && have_readelf=1
 command -v python3 >/dev/null 2>&1 && have_python3=1
-command -v ar >/dev/null 2>&1 && have_ar=1
+# Prefer llvm-ar for archive creation: Apple's /usr/bin/ar requires
+# Mach-O members and silently drops ELF objects (leaving only a SYMDEF
+# stub), which breaks the cross-target archive cases here.
+AR_BIN="$(command -v llvm-ar 2>/dev/null || command -v ar 2>/dev/null || true)"
+[ -n "$AR_BIN" ] && have_ar=1
 [ -f "$ROUNDTRIP_BIN" ] && have_roundtrip=1

 QEMU_BIN="$(command -v qemu-aarch64-static 2>/dev/null || command -v qemu-aarch64 2>/dev/null || true)"
@@ -237,7 +241,7 @@ for case_dir in "$TEST_DIR/cases"/*/; do
     if [ "$base" = "b" ] && [ "$archive_mode" != "none" ]; then
       if [ $have_ar -eq 1 ]; then
         arc="$work/b.a"
-        ar rcs "$arc" "$o" 2>/dev/null
+        "$AR_BIN" rcs "$arc" "$o" 2>/dev/null
         if [ "$archive_mode" = "whole" ]; then
           link_arc_flags+=(--whole-archive --archive "$arc")
         else