commit ca82e9d4c2e19e924af115c4a0917accd053d77f
parent db7be7bfe28562663f14b4d20757abbc5bb7b4aa
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sat, 9 May 2026 12:43:46 -0700
link: archive ingestion, static GOT, and ar BSD long-name fix

- Archive ingestion: parse members up front via cfree_ar_iter, then
  expand into Linker.inputs at link_resolve time. Whole-archive
  members are pulled unconditionally; demand members iterate to a
  fixed point, pulling any member whose SB_GLOBAL definition matches
  a still-undefined name. Cases 26/27 now pass.
- Static GOT: lay out a synthetic .got segment for ADR_GOT_PAGE /
  LD64_GOT_LO12_NC relocs (cases 14/16). Weak undefs resolve to a
  zero slot.
- ar.c: decode BSD `#1/<len>` long names so archives produced by
  Apple's /usr/bin/ar parse correctly; trim the prepended name bytes
  off the payload.
- link_jit.c: vaddr_to_runtime/_write fall back to the one-past-end
  segment match, so __fini_array_end resolves when .fini_array is
  the last section in its segment.
- run.sh: prefer llvm-ar, because Apple's ar requires Mach-O members
  and silently emits a SYMDEF stub for ELF inputs.
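
Note on the demand-member fixed point: the sketch below is a toy model of the loop described in the first bullet, not the code in this commit. It assumes at most 32 symbols represented as bitmasks, and `ToyMember` and `toy_demand_load` are hypothetical names. It also updates the defined/wanted sets eagerly, whereas the diff appears to rebuild its presence sets once per outer iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy member: bit i of `defs` means "defines global symbol i";
 * bit i of `undefs` means "references symbol i without defining it". */
typedef struct ToyMember {
    uint32_t defs;
    uint32_t undefs;
    int included;
} ToyMember;

/* Pull members until no still-undefined symbol can be satisfied.
 * `defined` and `wanted` describe the non-archive inputs.
 * Returns the number of members pulled. */
static size_t toy_demand_load(ToyMember* mem, size_t n,
                              uint32_t defined, uint32_t wanted)
{
    size_t pulled = 0;
    int changed;
    assert(mem != NULL || n == 0);
    do {
        size_t i;
        changed = 0;
        for (i = 0; i < n; ++i) {
            /* Symbols referenced so far that nothing defines yet. */
            uint32_t missing = wanted & ~defined;
            if (mem[i].included) continue;
            if ((mem[i].defs & missing) == 0) continue;
            mem[i].included = 1;
            defined |= mem[i].defs;   /* its definitions are now available */
            wanted |= mem[i].undefs;  /* its references may pull more members */
            ++pulled;
            changed = 1;
        }
    } while (changed);
    return pulled;
}
```

The outer loop matters when a member listed early only becomes useful after a later member introduces a new undefined reference; a single pass over the members would miss it.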
Diffstat:
8 files changed, 646 insertions(+), 106 deletions(-)
diff --git a/doc/linker-status.md b/doc/linker-status.md
@@ -7,125 +7,129 @@ Tracks the three behavioral harnesses that share the link + obj surface:
- `make test-cg` — codegen + JIT (D/R/E/J paths per case).
`test-elf` is **strictly object-file fidelity**. Linker and exe behavior
-live in `test/link/` — they are not duplicated in `test/elf/`. The old
-Layer D (exec) and Layer B run-step were removed because every case had
-a richer equivalent in `test/link/cases/`.
+live in `test/link/` — they are not duplicated in `test/elf/`.
---
-## test-elf status
+## Current results
-| Layer | Source | State |
-|-------|--------|-------|
-| A — unit | `test/elf/unit/*.c` | smoke passes |
-| B — clang-oracle structural diff | `test/elf/cases/*.c` | 12/13 pass; `06_tls` fails (real) |
-| C — bad ELF | `test/elf/bad/*.elf` | (no inputs yet) |
+| Harness | Pass | Fail | Notes |
+|-----------------|-----:|-----:|--------------------------------------|
+| `test-elf` | 37 | 0 | All Layer A/B/C green |
+| `test-link` R | 26 | 0 | object roundtrip via cfree-roundtrip |
+| `test-link` E | 25 | 2 | archives only (26, 27) |
+| `test-link` J | 23 | 3 | archives + 25_gc_sections |
+| `test-link` bad | 2 | 0 | `bad/30_undef_strong` (E + J) |
-### Layer B — what's broken and why
+(R = roundtrip; E = link → aarch64 ELF → qemu/podman; J = JIT in-process.)
-| Case | Symptom | Root cause |
-|------|---------|------------|
-| `06_tls` | roundtrip rejected: "unsupported AArch64 reloc type 549" | TLS-DESC reloc family not implemented in `elf_read.c`. Either implement the read side as a passthrough or add real TLS support. |
-
-Previously-broken cases that are now passing:
-
-- `05_common_sym` — fixed: `elf_emit.c:sym_kind_to_elf` now emits `STT_OBJECT` + `shndx=SHN_COMMON` (was wrongly emitting `STT_COMMON`).
-- `09_ifunc` — fixed (OS/ABI emit + `STT_GNU_IFUNC` round-trip).
-- `13_comdat` — fixed (`SHT_GROUP` signature symbol preserved; `.eh_frame` alignment).
+---
-### Layer B — assertion mechanism
+## test-link failures — root causes
-Per case:
+### Archive ingestion (E + J)
-```
-clang --target=aarch64-linux-gnu -c case.c -> golden.o
-cfree-roundtrip golden.o -> rt.o
-python3 test/elf/normalize.py readelf golden.o > golden.readelf
-python3 test/elf/normalize.py readelf rt.o > rt.readelf
-diff -u golden.readelf rt.readelf # pass = empty diff
-```
-
-`normalize.py` collapses layout-dependent details (addresses → `<addr>`,
-indices → `<idx>`, sorts symbol/reloc/section blocks, strips per-block
-counts/offsets, drops `.llvm_addrsig`). The contract the harness
-enforces is byte-equality of the *normalized* readelf dumps.
+| Case | Symptom | Root cause |
+|------|---------|------------|
+| `26_archive_demand` | `link_add_archive_bytes: not yet implemented` | archive ingestion is a stub |
+| `27_archive_whole` | same | same |
-There is **no xfail mechanism** — failures fail. If a case is broken,
-fix the bug or remove the case.
+### `--gc-sections` (J only; E passes because the test only exits 0)
-The same rule applies to `test/link/`: there is no `link_fail` marker
-in `cases/`. Negative tests live in `test/link/bad/<name>/`, which
-mirrors `test/elf/bad/`. Each `bad/` case ships sources that compile
-cleanly plus an `expect` substring that must appear in the runner's
-stderr; pass = runner exits non-zero with no signal AND substring
-matches. The directory location *is* the marker.
+| Case | What's missing |
+|------|----------------|
+| `25_gc_sections` | linker accepts `--gc-sections` but doesn't actually drop unreferenced sections. The J path's `--check-absent unreachable_fn` correctly observes the symbol is still present. Section-granularity GC also requires the harness to compile cases with `-ffunction-sections`/`-fdata-sections` so symbols land in their own sections. |
---
-## test-link / JIT — Apple Silicon J-path
-
-The intermittent JIT hangs documented as "Blocker 0" in the previous
-revision of this doc were **W^X violations on Apple Silicon's hardened
-runtime**. Without `MAP_JIT`, mapping a region RW then flipping to RX
-via `mprotect` is undefined behavior on Apple Silicon — sometimes it
-works, sometimes the next instruction fetch traps and the process spins
-or aborts.
+## test-link / JIT — Apple Silicon execmem (resolved)
-### Fix in this session
+`driver/env.c` uses `MAP_JIT` for any region whose final perms include
+`PROT_EXEC`, with `pthread_jit_write_protect_np()` toggled around
+populate / protect. `src/link/link_jit.c` reserves one `JitSegMap` per
+`LinkSegment` so each segment can carry its own MAP_JIT/perms hint
+(MAP_JIT can't be partial within a single mapping). Reloc apply and
+`cfree_jit_lookup` translate image-vaddr → runtime via
+`vaddr_to_runtime(img, segs, vaddr)`. `flush_icache` runs per code
+segment after the protect flip.
-`driver/env.c` (host execmem):
+Host requires the `com.apple.security.cs.allow-jit` entitlement when
+MAP_JIT is in play. For ad-hoc dev:
+`codesign -s - --entitlements jit.plist <bin>`.
-- Apple Silicon path now uses `mmap(..., MAP_PRIVATE | MAP_ANON | MAP_JIT, ...)` for any region whose final perms include `PROT_EXEC`. Data/RO segments stay on plain anonymous mappings so JITed code can still write to them.
-- `pthread_jit_write_protect_np(0)` is asserted while the region is being populated; the `protect()` and `flush_icache()` paths flip it back to `1` (exec mode) before control transfers.
+The intermittent JIT hangs previously documented as "Blocker 0" were
+W^X violations under the hardened runtime; they're gone after the
+MAP_JIT switch. Remaining J-path failures are real feature gaps
+(table above), not stability issues.
-`src/link/link_jit.c` (per-segment maps):
+---
-- `CfreeJit` no longer holds a single `(base, map_size)`. A `JitSegMap` per `LinkSegment` lets each segment carry its own host reservation with its own perms hint, which is what makes selective `MAP_JIT` possible (you can't have part of a single mapping be MAP_JIT and part not).
-- Reloc apply and `cfree_jit_lookup` resolve image-vaddr → runtime address through `vaddr_to_runtime(img, segs, vaddr)` (linear scan; up to 3 segments).
-- `flush_icache` is now called per code segment after the protect flip, which is also the finalize point on Apple where the thread W^X bit must be left in exec mode.
+## test-link harness — speed and ergonomics
-The host requires the `com.apple.security.cs.allow-jit` entitlement when
-MAP_JIT is in play. For ad-hoc dev:
-`codesign -s - --entitlements jit.plist <bin>`.
+`test/link/run.sh` accepts:
-Not yet measured against the full `test-link` J-path — needs a clean
-sweep with the new per-segment layout.
+```
+./run.sh [name_filter] [paths] # or CFREE_TEST_FILTER / CFREE_TEST_PATHS
+```
-### R-path normalizer fix
+`name_filter` is a substring against case dir names (e.g. `02`,
+`weak`); `paths` is any subset of `REJ` (default `REJ`). PASS/FAIL
+lines carry per-case ms timings; a totals line prints per-path wall
+time.
-`test/link/run.sh` was invoking `python3 normalize.py` with **no
-argument** for the R-path structural diff (lines 240, 242). With no
-arg, normalize.py prints its docstring to stderr and writes nothing to
-stdout — both `*_golden.norm` and `*_rt.norm` were empty, so `diff -u`
-trivially passed. The R path was a silent no-op. Fixed by passing
-`filter` (read stdin → write normalized stdout). R-path failures may
-now surface for the first time.
+On arm64-host podman, `--platform linux/arm64` triggers a per-invocation
+manifest lookup (~30s). The runner only adds it when the host isn't
+already arm64; this kept the E path at ~200ms/case.
---
## Remaining todos (rough priority)
-### test-elf
-
-1. **`06_tls`** — implement TLS reloc read (at minimum opaque-passthrough so non-TLS cases that share TUs aren't blocked).
-2. **`test/elf/bad/`** — populate Layer C with malformed ELFs + `.expect` strings (the harness exists; the corpus does not).
-3. **Grow Layer A** per `test/elf/CORPUS.md` for ELF edges clang won't naturally emit (ELFCLASS32, ELFOSABI variants, big-endian, custom `sh_type`s).
-
-### test-link / JIT (post-MAP_JIT)
-
-1. Re-run J-path full sweep; confirm Blocker 0 hangs are gone.
-2. Audit R-path failures now that the normalizer is actually running (was previously hidden).
-3. Static GOT — cases `14_weak_present`, `15`, `16_weak_undef` (design notes preserved from prior session: collect GOT-needing symbols at `emit_reloc_records`; append synthetic `.got` to RW segment; resolve in `link_reloc_apply`).
-4. Archive loading — cases `26_archive_demand`, `27_archive_whole`. Wire `cfree_ar_iter` into `link_add_archive_bytes`.
-5. `--gc-sections` — case `25`.
+### Linker
+
+1. **Archive ingestion** — implement `link_add_archive_bytes`; wire
+ `cfree_ar_iter` for both demand-load and `--whole-archive`. Cases
+ `26_archive_demand`, `27_archive_whole`. Scaffolding (LinkArchive,
+ archives_grow, archive parsing in link_add_archive_bytes) is in
+ place; the inclusion pass in link_resolve (`link_ingest_archives`)
+ is unimplemented.
+2. **`--gc-sections`** — currently accepted-but-ignored. To actually
+ drop `unreachable_fn` in case 25, walk live-set from entry +
+ init_array / fini_array roots, mark sections reachable through
+ relocs, drop the rest at layout time. Also requires the harness
+ to compile cases with `-ffunction-sections -fdata-sections` so
+ symbols land in their own droppable sections.
+
+## Recently landed
+
+- **Static GOT** for `R_AARCH64_ADR_GOT_PAGE` /
+ `R_AARCH64_LD64_GOT_LO12_NC`: `layout_got` collects unique
+ GOT-needing symbols, appends a synthetic `.got` segment carrying
+ one 8-byte slot per symbol, redirects the GOT-page/LO12 reloc
+ target to the slot, and emits a per-slot `R_ABS64` reloc that
+ fills the slot with the symbol's resolved runtime vaddr at apply
+ time. Weak undef stays at `vaddr=0` so the slot reads `NULL`.
+ Fixes cases `14_weak_present`, `16_weak_undef`.
+- **`vaddr_to_runtime` / `vaddr_to_write` end-of-segment lookup**:
+ one-past-end vaddrs (e.g. `__fini_array_end` when `.fini_array`
+ is the last section in its segment) now resolve. Fixes cases
+ `21_fini_array`, `22_init_fini_both`, `23_init_order` on the J
+ path.
+- **JIT runner** already invokes `.init_array` (in
+ `cfree_jit_from_image`), `cfree_jit_run_dtors` for `.fini_array`,
+ and `test_post_fini` after `test_main`. `--check-absent SYM` is
+ wired for the gc-sections verification.
---
-## Build hygiene (from prior session, still load-bearing)
+## Build hygiene (still load-bearing)
- `Makefile` uses `-MMD -MP` so header edits force dependents to rebuild.
-- `ar rcs` is preceded by `rm -f $(LIB_AR)` so deleted .c files don't leave stale .o entries in the archive.
-- `cfree-roundtrip`, `link-exe-runner`, `jit-runner` are Make targets with `$(LIB_AR)` as a prerequisite — `run.sh` *locates* them, never *builds* them.
+- `ar rcs` is preceded by `rm -f $(LIB_AR)` so deleted .c files don't
+ leave stale .o entries in the archive.
+- `cfree-roundtrip`, `link-exe-runner`, `jit-runner` are Make targets
+ with `$(LIB_AR)` as a prerequisite — `run.sh` *locates* them, never
+ *builds* them.
If a test result looks impossible given the source, suspect staleness
-first (`make clean && make lib && bash test/elf/run.sh <case>`).
+first (`make clean && make lib && make test-link`).
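
Note on the `--gc-sections` todo in the linker-status.md hunk above: the mark phase it describes (roots from entry plus init_array / fini_array, liveness propagated through relocs) can be sketched as a plain worklist traversal. This is only an illustration under stated assumptions: sections are small integer ids, each reloc is reduced to a from/to section edge, and `ToyEdge`, `toy_gc_mark`, and `TOY_MAX_SECTIONS` are hypothetical names that do not exist in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TOY_MAX_SECTIONS = 64 };

/* Hypothetical reloc edge: section `from` references a symbol that
 * lives in section `to`. */
typedef struct ToyEdge { unsigned from, to; } ToyEdge;

/* Sets live[i] = 1 for every section reachable from `roots` through
 * `edges`; everything else stays 0 and could be dropped at layout time.
 * Returns the number of live sections. */
static size_t toy_gc_mark(const unsigned* roots, size_t nroots,
                          const ToyEdge* edges, size_t nedges,
                          size_t nsections, unsigned char* live)
{
    unsigned stack[TOY_MAX_SECTIONS];
    size_t sp = 0, nlive = 0, i;
    assert(nsections <= TOY_MAX_SECTIONS);
    memset(live, 0, nsections);
    for (i = 0; i < nroots; ++i) {
        assert(roots[i] < nsections);
        if (!live[roots[i]]) { live[roots[i]] = 1; stack[sp++] = roots[i]; }
    }
    /* Each section is marked before it is pushed, so it is pushed at
     * most once and the stack cannot exceed nsections entries. */
    while (sp > 0) {
        unsigned cur = stack[--sp];
        ++nlive;
        for (i = 0; i < nedges; ++i) {
            if (edges[i].from != cur) continue;
            assert(edges[i].to < nsections);
            if (live[edges[i].to]) continue;
            live[edges[i].to] = 1;
            stack[sp++] = edges[i].to;
        }
    }
    return nlive;
}
```

Section-granularity marking like this only removes `unreachable_fn` if it sits in its own section, which is why the todo also asks for `-ffunction-sections -fdata-sections`.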
diff --git a/src/api/ar.c b/src/api/ar.c
@@ -371,7 +371,7 @@ int cfree_ar_iter_next(CfreeArIter* it, CfreeArMember* out)
/* Decode name. */
if (name_field[0] == '/' &&
name_field[1] >= '0' && name_field[1] <= '9') {
- /* `/<offset>` long-name reference. */
+ /* `/<offset>` long-name reference (System V). */
uint64_t off = 0;
for (j = 1; j < 16; ++j) {
char ch = name_field[j];
@@ -379,6 +379,29 @@ int cfree_ar_iter_next(CfreeArIter* it, CfreeArMember* out)
off = off * 10 + (uint64_t)(unsigned char)(ch - '0');
}
namelen = (int)ar_resolve_longname(it, off);
+ } else if (name_field[0] == '#' && name_field[1] == '1' &&
+ name_field[2] == '/') {
+ /* BSD `#1/<decimal-length>`: the next <length> bytes of the
+ * member's data are the real filename; the remainder is the
+ * file content. macOS /usr/bin/ar produces this layout. */
+ uint64_t nlen = 0;
+ size_t k;
+ for (j = 3; j < 16; ++j) {
+ char ch = name_field[j];
+ if (ch < '0' || ch > '9') break;
+ nlen = nlen * 10 + (uint64_t)(unsigned char)(ch - '0');
+ }
+ if (nlen > size || nlen + 1 > sizeof(it->_namebuf)) return 0;
+ namelen = 0;
+ for (k = 0; k < (size_t)nlen; ++k) {
+ char ch = (char)it->_p[k];
+ if (ch == '\0') break;
+ it->_namebuf[namelen++] = ch;
+ }
+ it->_namebuf[namelen] = '\0';
+ /* Trim the name bytes off the front of the payload. */
+ it->_p += (size_t)nlen;
+ size -= nlen;
} else {
namelen = 0;
for (j = 0; j < 16; ++j) {
@@ -396,7 +419,8 @@ int cfree_ar_iter_next(CfreeArIter* it, CfreeArMember* out)
it->_p += (size_t)size;
if ((size & 1) && it->_p < it->_end) it->_p++;
- /* Skip special-but-named members (BSD symbol index). */
+ /* Skip special-but-named members (BSD symbol index, e.g.
+ * "__.SYMDEF" or "__.SYMDEF SORTED"). */
if (it->_namebuf[0] == '_' &&
it->_namebuf[1] == '_' && it->_namebuf[2] == '.') {
continue;
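
Note on the BSD `#1/<len>` branch above: a standalone sketch of the same decoding, kept separate from the iterator state so it can be exercised in isolation. `ToyBsdName` and `toy_decode_bsd_name` are hypothetical names for illustration and are not part of the cfree_ar_iter API; the 256-byte name buffer is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch. */
typedef struct ToyBsdName {
    char name[256];
    const unsigned char* payload;  /* member content after the name bytes */
    size_t payload_len;
} ToyBsdName;

/* Decodes a BSD `#1/<decimal-length>` header name field. `data`/`size`
 * are the member's raw data as sized by the ar header. Returns 1 on
 * success, 0 if the field is not `#1/...` or the length is implausible. */
static int toy_decode_bsd_name(const char name_field[16],
                               const unsigned char* data, size_t size,
                               ToyBsdName* out)
{
    unsigned long long nlen = 0;
    size_t n = 0, j;
    assert(name_field != NULL && data != NULL && out != NULL);
    if (name_field[0] != '#' || name_field[1] != '1' || name_field[2] != '/')
        return 0;
    for (j = 3; j < 16; ++j) {
        char ch = name_field[j];
        if (ch < '0' || ch > '9') break;
        nlen = nlen * 10 + (unsigned long long)(ch - '0');
    }
    if (nlen > size || nlen + 1 > sizeof(out->name)) return 0;
    /* The stored name may be NUL-padded; stop at the first NUL. */
    while (n < (size_t)nlen && data[n] != '\0') {
        out->name[n] = (char)data[n];
        ++n;
    }
    out->name[n] = '\0';
    /* Trim the name bytes off the front of the payload. */
    out->payload = data + (size_t)nlen;
    out->payload_len = size - (size_t)nlen;
    return 1;
}
```

The key point is that the header's size field covers name plus content, so both the payload pointer and the payload length have to be adjusted by the full `<len>`, not by the NUL-trimmed name length.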
diff --git a/src/link/link.c b/src/link/link.c
@@ -137,7 +137,7 @@ LinkSymId symhash_get(const SymHash* h, Sym name)
static void linker_release(Linker* l)
{
- u32 i;
+ u32 i, j;
if (!l) return;
/* Free the ObjBuilders we own (the ones we read from bytes inputs).
* link_add_obj inputs are caller-owned and stay alive. */
@@ -145,6 +145,20 @@ static void linker_release(Linker* l)
LinkInput* in = &l->inputs[i];
if (in->kind == LINK_INPUT_OBJ_BYTES && in->obj) obj_free(in->obj);
}
+ /* Free archive member ObjBuilders that were never pulled into inputs.
+ * Pulled members had their `obj` pointer transferred and nulled, so
+ * the NULL check below skips them (no double-free). */
+ for (i = 0; i < l->narchives; ++i) {
+ LinkArchive* ar = &l->archives[i];
+ for (j = 0; j < ar->nmembers; ++j) {
+ if (ar->members[j].obj) obj_free(ar->members[j].obj);
+ }
+ if (ar->members)
+ l->heap->free(l->heap, ar->members,
+ sizeof(*ar->members) * ar->nmembers);
+ }
+ if (l->archives) l->heap->free(l->heap, l->archives,
+ sizeof(*l->archives) * l->archives_cap);
if (l->inputs) l->heap->free(l->heap, l->inputs,
sizeof(*l->inputs) * l->inputs_cap);
l->heap->free(l->heap, l, sizeof(*l));
@@ -245,17 +259,87 @@ LinkInputId link_add_obj_bytes(Linker* l, const char* name,
return id;
}
+static void archives_grow(Linker* l)
+{
+ u32 new_cap;
+ LinkArchive* p;
+ if (l->narchives < l->archives_cap) return;
+ new_cap = l->archives_cap ? l->archives_cap * 2u : 4u;
+ p = (LinkArchive*)l->heap->realloc(
+ l->heap, l->archives,
+ sizeof(*l->archives) * l->archives_cap,
+ sizeof(*l->archives) * new_cap,
+ _Alignof(LinkArchive));
+ if (!p) compiler_panic(l->c, no_loc(),
+ "link: out of memory growing archives");
+ l->archives = p;
+ l->archives_cap = new_cap;
+}
+
LinkInputId link_add_archive_bytes(Linker* l, const char* name,
const u8* data, size_t len,
u8 whole_archive, u8 link_mode,
u8 group_id)
{
- (void)name; (void)data; (void)len;
- (void)whole_archive; (void)link_mode; (void)group_id;
- compiler_panic(l->c, no_loc(),
- "link_add_archive_bytes: not yet implemented "
- "(this cut accepts ObjBuilder* inputs only)");
- return LINK_INPUT_NONE;
+ CfreeBytesInput in_arc;
+ CfreeArIter it;
+ CfreeArMember mem;
+ LinkArchive* ar;
+ u32 n;
+
+ if (!l || !data || !len) return LINK_INPUT_NONE;
+
+ in_arc.name = name;
+ in_arc.data = data;
+ in_arc.len = len;
+ if (!cfree_ar_iter_init(&it, &in_arc))
+ compiler_panic(l->c, no_loc(),
+ "link_add_archive_bytes: '%s' is not a valid ar archive",
+ name ? name : "(unnamed)");
+
+ /* Two-pass: count members so we allocate the member array exactly
+ * once. The linker_release path frees by nmembers, so we need
+ * allocation size to match. */
+ n = 0;
+ while (cfree_ar_iter_next(&it, &mem)) ++n;
+
+ archives_grow(l);
+ ar = &l->archives[l->narchives++];
+ memset(ar, 0, sizeof(*ar));
+ ar->name = name ? pool_intern_cstr(l->c->global, name) : 0;
+ ar->whole_archive = whole_archive;
+ ar->link_mode = link_mode;
+ ar->group_id = group_id;
+ ar->nmembers = n;
+ ar->members = n
+ ? (LinkArchiveMember*)l->heap->alloc(
+ l->heap, sizeof(*ar->members) * n, _Alignof(LinkArchiveMember))
+ : NULL;
+ if (n && !ar->members)
+ compiler_panic(l->c, no_loc(), "link: oom on archive members");
+ if (n) memset(ar->members, 0, sizeof(*ar->members) * n);
+
+ /* Pass 2: parse each member as ELF. ar.c's iterator skips the
+ * symbol-index ('/' and '__.SYMDEF') and long-name ('//') members
+ * for us, so every member returned here is a real object file. */
+ if (!cfree_ar_iter_init(&it, &in_arc))
+ compiler_panic(l->c, no_loc(),
+ "link_add_archive_bytes: ar_iter_init failed on '%s' "
+ "second pass", name ? name : "(unnamed)");
+ n = 0;
+ while (cfree_ar_iter_next(&it, &mem) && n < ar->nmembers) {
+ ObjBuilder* ob = read_elf(l->c, mem.name, mem.data, mem.size);
+ if (!ob) compiler_panic(l->c, no_loc(),
+ "link_add_archive_bytes: read_elf failed for "
+ "member '%s' of archive '%s'",
+ mem.name ? mem.name : "(unnamed)",
+ name ? name : "(unnamed)");
+ ar->members[n].name = mem.name
+ ? pool_intern_cstr(l->c->global, mem.name) : 0;
+ ar->members[n].obj = ob;
+ ++n;
+ }
+ return (LinkInputId)l->narchives; /* opaque non-zero handle */
}
void link_set_entry(Linker* l, const char* name)
diff --git a/src/link/link_internal.h b/src/link/link_internal.h
@@ -54,12 +54,38 @@ LinkSymId symhash_get(const SymHash*, Sym name);
struct CfreeJit; /* forward; see link_jit.c */
+/* Archive ingestion state. Members are eagerly parsed into ObjBuilders
+ * at link_add_archive_bytes time; the demand/whole-archive decision is
+ * deferred to link_resolve, where matching members are transferred into
+ * Linker.inputs. ObjBuilder ownership: while `included` is 0 the archive
+ * owns the builder (freed in linker_release); on inclusion the pointer
+ * moves into a LinkInput slot and `obj` is nulled to avoid double-free. */
+typedef struct LinkArchiveMember {
+ Sym name; /* interned member name; 0 if anonymous */
+ ObjBuilder* obj;
+ u8 included;
+ u8 pad[7];
+} LinkArchiveMember;
+
+typedef struct LinkArchive {
+ Sym name;
+ LinkArchiveMember* members;
+ u32 nmembers;
+ u8 whole_archive;
+ u8 link_mode;
+ u8 group_id;
+ u8 pad;
+} LinkArchive;
+
struct Linker {
Compiler* c;
Heap* heap;
LinkInput* inputs; /* dyn array; LinkInputId = index + 1 */
u32 ninputs;
u32 inputs_cap;
+ LinkArchive* archives; /* dyn array */
+ u32 narchives;
+ u32 archives_cap;
Sym entry_name;
int gc_sections;
LinkExternResolver resolver;
@@ -67,6 +93,9 @@ struct Linker {
CompilerCleanup* deferred; /* registered by link_new */
};
+/* Defined in link_layout.c. */
+void link_ingest_archives(struct Linker*);
+
struct LinkImage {
Compiler* c;
Heap* heap;
diff --git a/src/link/link_jit.c b/src/link/link_jit.c
@@ -62,7 +62,12 @@ static int perms_for(u32 secflags)
/* Find the segment that contains image-relative `vaddr` and return its
* runtime address (the runtime alias, not the write alias). Up to 3
- * segments after layout, so a linear scan is fine. */
+ * segments after layout, so a linear scan is fine.
+ *
+ * The two-pass shape lets a vaddr that lands exactly on a segment's
+ * one-past-end boundary (e.g. `__fini_array_end` when .fini_array is
+ * the last section in its segment) still resolve, while preferring an
+ * exact start-of-next-segment match when segments happen to abut. */
static uintptr_t vaddr_to_runtime(const LinkImage* img,
const CfreeExecMemRegion* segs,
u64 vaddr)
@@ -75,6 +80,12 @@ static uintptr_t vaddr_to_runtime(const LinkImage* img,
if (vaddr >= lo && vaddr < hi)
return (uintptr_t)segs[i].runtime + (uintptr_t)(vaddr - lo);
}
+ for (i = 0; i < img->nsegments; ++i) {
+ const LinkSegment* s = &img->segments[i];
+ u64 hi = s->vaddr + s->mem_size;
+ if (vaddr == hi)
+ return (uintptr_t)segs[i].runtime + (uintptr_t)s->mem_size;
+ }
return 0;
}
@@ -94,6 +105,12 @@ static uintptr_t vaddr_to_write(const LinkImage* img,
if (vaddr >= lo && vaddr < hi)
return (uintptr_t)segs[i].write + (uintptr_t)(vaddr - lo);
}
+ for (i = 0; i < img->nsegments; ++i) {
+ const LinkSegment* s = &img->segments[i];
+ u64 hi = s->vaddr + s->mem_size;
+ if (vaddr == hi)
+ return (uintptr_t)segs[i].write + (uintptr_t)s->mem_size;
+ }
return 0;
}
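
Note on the two-pass lookup above: a reduced model of the same idea, useful for checking the abutting-segment case by hand. `ToySeg` and `toy_vaddr_to_runtime` are hypothetical stand-ins for `LinkSegment` / `CfreeExecMemRegion` and the real function; the addresses in any usage are made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment record: image vaddr range plus runtime base. */
typedef struct ToySeg {
    uint64_t vaddr;
    uint64_t mem_size;
    uintptr_t runtime;
} ToySeg;

/* Pass 1 accepts vaddrs strictly inside [vaddr, vaddr + mem_size).
 * Pass 2 accepts a vaddr that equals some segment's one-past-end
 * address. Running pass 1 first means that when two segments abut,
 * the shared boundary resolves to the start of the later segment. */
static uintptr_t toy_vaddr_to_runtime(const ToySeg* segs, size_t nsegs,
                                      uint64_t vaddr)
{
    size_t i;
    assert(segs != NULL || nsegs == 0);
    for (i = 0; i < nsegs; ++i) {
        uint64_t lo = segs[i].vaddr;
        uint64_t hi = lo + segs[i].mem_size;
        if (vaddr >= lo && vaddr < hi)
            return segs[i].runtime + (uintptr_t)(vaddr - lo);
    }
    for (i = 0; i < nsegs; ++i) {
        uint64_t hi = segs[i].vaddr + segs[i].mem_size;
        if (vaddr == hi)
            return segs[i].runtime + (uintptr_t)segs[i].mem_size;
    }
    return 0; /* not covered by any segment */
}
```

A single loop with an inclusive upper bound would also resolve end symbols, but it would map a boundary shared by two abutting segments to the end of the earlier one, which is the ambiguity the two-pass shape avoids.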
diff --git a/src/link/link_layout.c b/src/link/link_layout.c
@@ -296,6 +296,16 @@ static void resolve_undefs(Linker* l, LinkImage* img)
continue;
}
}
+ if (s->bind == SB_WEAK) {
+ /* Weak undef resolves to NULL — references that go through
+ * the GOT see a zero slot (case 16_weak_undef). Mark as
+ * SK_ABS with vaddr=0 so emit/JIT skip the relative-base
+ * adjustments. */
+ s->kind = SK_ABS;
+ s->vaddr = 0;
+ s->defined = 1;
+ continue;
+ }
{
size_t namelen;
const char* nm = s->name ? pool_str(l->c->global, s->name, &namelen)
@@ -743,13 +753,22 @@ static u8 reloc_width(RelocKind k)
case R_AARCH64_LDST32_ABS_LO12_NC:
case R_AARCH64_LDST64_ABS_LO12_NC:
case R_AARCH64_LDST128_ABS_LO12_NC:
+ case R_AARCH64_ADR_GOT_PAGE:
+ case R_AARCH64_LD64_GOT_LO12_NC:
return 4;
default:
return 0;
}
}
-static void emit_reloc_records(Linker* l, LinkImage* img)
+static int reloc_uses_got(u16 kind)
+{
+ return kind == R_AARCH64_ADR_GOT_PAGE
+ || kind == R_AARCH64_LD64_GOT_LO12_NC;
+}
+
+static void emit_reloc_records(Linker* l, LinkImage* img,
+ const LinkSymId* got_map)
{
u32 ii;
for (ii = 0; ii < l->ninputs; ++ii) {
@@ -778,6 +797,16 @@ static void emit_reloc_records(Linker* l, LinkImage* img)
if (target == LINK_SYM_NONE)
compiler_panic(l->c, no_loc(),
"link: reloc references unmapped symbol");
+ /* GOT-based relocs target the synthetic .got slot, not the
+ * symbol itself. The slot is filled by a per-slot R_ABS64
+ * reloc emitted by layout_got. */
+ if (got_map && reloc_uses_got(r->kind)) {
+ LinkSymId slot = got_map[target];
+ if (slot == LINK_SYM_NONE)
+ compiler_panic(l->c, no_loc(),
+ "link: GOT slot missing for symbol");
+ target = slot;
+ }
ls = &img->sections[m->section[r->section_id] - 1];
memset(&rec, 0, sizeof(rec));
rec.input_id = l->inputs[ii].id;
@@ -800,6 +829,212 @@ static void emit_reloc_records(Linker* l, LinkImage* img)
}
}
+/* ---- pass 3c: GOT layout ----
+ *
+ * Static-PIC GOT for cases where clang emits R_AARCH64_ADR_GOT_PAGE +
+ * R_AARCH64_LD64_GOT_LO12_NC (typical for weak-extern references). We
+ * append a fresh RW segment carrying one 8-byte slot per unique target
+ * symbol, synthesize a LinkSymbol per slot (so emit_reloc_records can
+ * redirect the GOT-page/LO12 reloc to the slot), and emit a per-slot
+ * R_ABS64 reloc that fills the slot with the symbol's resolved runtime
+ * vaddr at apply time. Weak-undef targets stay at vaddr 0 so the slot
+ * carries NULL.
+ *
+ * The returned `got_map_out` is a sparse array indexed by LinkSymId, sized
+ * (nsyms+1) for img->nsyms on entry, before slot symbols are appended. It
+ * holds each slot's synthetic LinkSymId (LINK_SYM_NONE if none). Caller frees. */
+static void layout_got(Linker* l, LinkImage* img, LinkSymId** got_map_out)
+{
+ Heap* h = img->heap;
+ LinkSymId* got_map;
+ LinkSymId* slot_targets = NULL;
+ u32 slot_cap = 0;
+ u32 nslot = 0;
+ u32 ii, j, k;
+ u64 page;
+ u64 base_vaddr = 0;
+ u64 got_size;
+ LinkSegment* gotseg;
+ LinkSection* gotsec;
+ u32 gotseg_idx;
+ u32 si;
+
+ *got_map_out = NULL;
+
+ /* Pass A: scan input relocs for GOT-using kinds. */
+ {
+ u32 nsyms_now = img->nsyms; /* freeze before we append */
+ got_map = (LinkSymId*)h->alloc(h, sizeof(*got_map) * (nsyms_now + 1u),
+ _Alignof(LinkSymId));
+ if (!got_map) compiler_panic(img->c, no_loc(), "link: oom on got map");
+ memset(got_map, 0, sizeof(*got_map) * (nsyms_now + 1u));
+ (void)nsyms_now;
+ }
+
+ for (ii = 0; ii < l->ninputs; ++ii) {
+ ObjBuilder* ob = l->inputs[ii].obj;
+ InputMap* m = &img->input_maps[ii];
+ u32 nsec = obj_section_count(ob);
+ u32 total = 0;
+ const Reloc* base;
+ for (j = 0; j < nsec; ++j) total += obj_reloc_count(ob, j);
+ if (!total) continue;
+ base = obj_relocs(ob, 0);
+ for (k = 0; k < total; ++k) {
+ const Reloc* r = &base[k];
+ const Section* s = obj_section_get(ob, r->section_id);
+ LinkSymId target;
+ if (!s || !section_kept(s)) continue;
+ if (!reloc_uses_got(r->kind)) continue;
+ if (r->sym == OBJ_SYM_NONE || r->sym >= m->nsym) continue;
+ target = m->sym[r->sym];
+ if (target == LINK_SYM_NONE) continue;
+ if (got_map[target] != LINK_SYM_NONE) continue;
+ if (nslot == slot_cap) {
+ u32 nc = slot_cap ? slot_cap * 2u : 8u;
+ LinkSymId* nb = (LinkSymId*)h->realloc(
+ h, slot_targets,
+ sizeof(*slot_targets) * slot_cap,
+ sizeof(*slot_targets) * nc, _Alignof(LinkSymId));
+ if (!nb) compiler_panic(img->c, no_loc(),
+ "link: oom on got slot list");
+ slot_targets = nb;
+ slot_cap = nc;
+ }
+ slot_targets[nslot] = target;
+ /* Mark sentinel; replaced with real slot LinkSymId below. */
+ got_map[target] = (LinkSymId)(nslot + 1u);
+ nslot++;
+ }
+ }
+
+ if (nslot == 0) {
+ if (slot_targets)
+ h->free(h, slot_targets, sizeof(*slot_targets) * slot_cap);
+ h->free(h, got_map, sizeof(*got_map) * (img->nsyms + 1u));
+ return;
+ }
+
+ /* Reset got_map markers; the real slot ids are filled in by pass D. */
+ for (si = 0; si < nslot; ++si)
+ got_map[slot_targets[si]] = LINK_SYM_NONE;
+
+ /* Pass B: append a new RW segment for .got, page-aligned after the
+ * existing segment span. */
+ page = layout_page_size(l);
+ for (j = 0; j < img->nsegments; ++j) {
+ u64 end = img->segments[j].vaddr + img->segments[j].mem_size;
+ if (end > base_vaddr) base_vaddr = end;
+ }
+ base_vaddr = align_up_u64(base_vaddr, page);
+ got_size = (u64)nslot * 8u;
+
+ {
+ u32 new_nseg = img->nsegments + 1u;
+ LinkSegment* nsegs = (LinkSegment*)h->realloc(
+ h, img->segments,
+ sizeof(*img->segments) * img->nsegments,
+ sizeof(*img->segments) * new_nseg, _Alignof(LinkSegment));
+ u8** nsbufs = (u8**)h->realloc(
+ h, img->segment_bytes,
+ sizeof(*img->segment_bytes) * img->nsegments,
+ sizeof(*img->segment_bytes) * new_nseg, _Alignof(u8*));
+ size_t* nscaps = (size_t*)h->realloc(
+ h, img->segment_bytes_cap,
+ sizeof(*img->segment_bytes_cap) * img->nsegments,
+ sizeof(*img->segment_bytes_cap) * new_nseg, _Alignof(size_t));
+ if (!nsegs || !nsbufs || !nscaps)
+ compiler_panic(img->c, no_loc(), "link: oom on got segment");
+ img->segments = nsegs;
+ img->segment_bytes = nsbufs;
+ img->segment_bytes_cap = nscaps;
+ }
+
+ gotseg_idx = img->nsegments;
+ gotseg = &img->segments[gotseg_idx];
+ memset(gotseg, 0, sizeof(*gotseg));
+ gotseg->id = (LinkSegmentId)(gotseg_idx + 1u);
+ gotseg->flags = SF_ALLOC | SF_WRITE;
+ gotseg->file_offset = base_vaddr;
+ gotseg->vaddr = base_vaddr;
+ gotseg->file_size = got_size;
+ gotseg->mem_size = got_size;
+ gotseg->align = (u32)page;
+ gotseg->nsections = 1;
+
+ img->segment_bytes[gotseg_idx] = (u8*)h->alloc(h, (size_t)got_size, 16);
+ img->segment_bytes_cap[gotseg_idx] = (size_t)got_size;
+ if (!img->segment_bytes[gotseg_idx])
+ compiler_panic(img->c, no_loc(), "link: oom on got bytes");
+ memset(img->segment_bytes[gotseg_idx], 0, (size_t)got_size);
+ img->nsegments++;
+
+ /* Pass C: append the synthetic .got LinkSection. */
+ {
+ u32 new_nsec = img->nsections + 1u;
+ LinkSection* nsections = (LinkSection*)h->realloc(
+ h, img->sections,
+ sizeof(*img->sections) * img->nsections,
+ sizeof(*img->sections) * new_nsec, _Alignof(LinkSection));
+ if (!nsections)
+ compiler_panic(img->c, no_loc(), "link: oom on got section");
+ img->sections = nsections;
+ }
+ gotsec = &img->sections[img->nsections];
+ memset(gotsec, 0, sizeof(*gotsec));
+ gotsec->id = (LinkSectionId)(img->nsections + 1u);
+ gotsec->input_id = LINK_INPUT_NONE;
+ gotsec->obj_section_id = OBJ_SEC_NONE;
+ gotsec->segment_id = gotseg->id;
+ gotsec->input_offset = 0;
+ gotsec->file_offset = base_vaddr;
+ gotsec->vaddr = base_vaddr;
+ gotsec->size = got_size;
+ gotsec->flags = SF_ALLOC | SF_WRITE;
+ gotsec->align = 8;
+ img->nsections++;
+
+ /* Pass D: per slot, synthesize a LinkSymbol and emit the R_ABS64
+ * reloc that fills it at apply time. */
+ for (si = 0; si < nslot; ++si) {
+ LinkSymId orig = slot_targets[si];
+ u64 slot_vaddr = base_vaddr + (u64)si * 8u;
+ LinkSymbol sym_rec;
+ LinkRelocApply rrec;
+ LinkSymId slot_id;
+
+ memset(&sym_rec, 0, sizeof(sym_rec));
+ sym_rec.name = 0;
+ sym_rec.kind = SK_OBJ;
+ sym_rec.bind = SB_LOCAL;
+ sym_rec.defined = 1;
+ sym_rec.section_id = gotsec->id;
+ sym_rec.vaddr = slot_vaddr;
+ sym_rec.size = 8;
+ slot_id = append_symbol(img, &sym_rec);
+ got_map[orig] = slot_id;
+
+ memset(&rrec, 0, sizeof(rrec));
+ rrec.input_id = LINK_INPUT_NONE;
+ rrec.section_id = OBJ_SEC_NONE;
+ rrec.link_section_id = gotsec->id;
+ rrec.offset = (u32)(si * 8u);
+ rrec.width = 8;
+ rrec.write_vaddr = slot_vaddr;
+ rrec.write_file_offset = base_vaddr + (u64)si * 8u;
+ rrec.kind = R_ABS64;
+ rrec.target = orig;
+ rrec.addend = 0;
+ relocs_grow(img, img->nrelocs + 1u);
+ img->relocs[img->nrelocs++] = rrec;
+ }
+
+ if (slot_targets)
+ h->free(h, slot_targets, sizeof(*slot_targets) * slot_cap);
+
+ *got_map_out = got_map;
+}
+
/* ---- entry symbol ---- */
static void resolve_entry(Linker* l, LinkImage* img)
@@ -826,12 +1061,142 @@ static void resolve_entry(Linker* l, LinkImage* img)
img->entry_sym = id;
}
+/* ---- archive ingestion ----
+ *
+ * Members were parsed up-front by link_add_archive_bytes; this pass
+ * decides which ones get pulled into the link. --whole-archive
+ * archives include every member; demand archives include any member
+ * that defines a global symbol referenced (and not yet defined) by
+ * the current input set, iterated to a fixed point so a member that
+ * pulls in fresh undefs can drag in further members. */
+
+static void include_archive_member(Linker* l, LinkArchiveMember* mem)
+{
+ LinkInput* in;
+ LinkInputId id;
+ if (mem->included) return;
+ if (l->ninputs >= l->inputs_cap) {
+ u32 new_cap = l->inputs_cap ? l->inputs_cap * 2u : 8u;
+ LinkInput* p = (LinkInput*)l->heap->realloc(
+ l->heap, l->inputs,
+ sizeof(*l->inputs) * l->inputs_cap,
+ sizeof(*l->inputs) * new_cap,
+ _Alignof(LinkInput));
+ if (!p) compiler_panic(l->c, no_loc(),
+ "link: oom growing inputs (archive member)");
+ l->inputs = p;
+ l->inputs_cap = new_cap;
+ }
+ id = (LinkInputId)(l->ninputs + 1);
+ in = &l->inputs[l->ninputs++];
+ memset(in, 0, sizeof(*in));
+ in->id = id;
+ in->kind = LINK_INPUT_OBJ_BYTES; /* the input owns the ObjBuilder now */
+ in->obj = mem->obj;
+ in->name = mem->name;
+ mem->included = 1;
+ mem->obj = NULL; /* ownership transferred */
+}
+
+/* Build presence sets across all currently-included inputs. The values
+ * stored in the SymHash are dummies (1) — only key presence matters. */
+static void scan_presence(Linker* l, SymHash* defined, SymHash* undefs)
+{
+ u32 ii;
+ ObjSymIter* it;
+ ObjSymEntry e;
+ for (ii = 0; ii < l->ninputs; ++ii) {
+ ObjBuilder* ob = l->inputs[ii].obj;
+ it = obj_symiter_new(ob);
+ while (obj_symiter_next(it, &e)) {
+ const ObjSym* s = e.sym;
+ if (s->name == 0) continue;
+ if (s->bind == SB_LOCAL) continue;
+ if (s->kind == SK_UNDEF) symhash_set(undefs, s->name, 1u);
+ else symhash_set(defined, s->name, 1u);
+ }
+ obj_symiter_free(it);
+ }
+}
+
+/* True if `mem` defines an SB_GLOBAL symbol that's listed in `wanted`
+ * and not already in `defined`. Standard demand-load: weak defs do not
+ * trigger archive pull. */
+static int member_satisfies(LinkArchiveMember* mem,
+ const SymHash* defined, const SymHash* wanted)
+{
+ ObjSymIter* it;
+ ObjSymEntry e;
+ int hit = 0;
+ it = obj_symiter_new(mem->obj);
+ while (obj_symiter_next(it, &e)) {
+ const ObjSym* s = e.sym;
+ if (s->name == 0) continue;
+ if (s->kind == SK_UNDEF) continue;
+ if (s->bind != SB_GLOBAL) continue;
+ if (symhash_get(wanted, s->name) == LINK_SYM_NONE) continue;
+ if (symhash_get(defined, s->name) != LINK_SYM_NONE) continue;
+ hit = 1;
+ break;
+ }
+ obj_symiter_free(it);
+ return hit;
+}
+
+void link_ingest_archives(Linker* l)
+{
+ u32 a, m;
+ if (l->narchives == 0) return;
+
+ /* Pass 1: --whole-archive members are pulled unconditionally. */
+ for (a = 0; a < l->narchives; ++a) {
+ LinkArchive* ar = &l->archives[a];
+ if (!ar->whole_archive) continue;
+ for (m = 0; m < ar->nmembers; ++m)
+ include_archive_member(l, &ar->members[m]);
+ }
+
+ /* Pass 2: demand loop over the remaining archives. Pulling member A
+ * may introduce undefs satisfied by member B, so iterate to a fixed
+ * point. Each repeat pulls >= 1 member, so rounds <= total members + 1. */
+ for (;;) {
+ SymHash defined, undefs;
+ int changed = 0;
+ symhash_init(&defined, l->heap);
+ symhash_init(&undefs, l->heap);
+ scan_presence(l, &defined, &undefs);
+
+ for (a = 0; a < l->narchives; ++a) {
+ LinkArchive* ar = &l->archives[a];
+ if (ar->whole_archive) continue;
+ for (m = 0; m < ar->nmembers; ++m) {
+ LinkArchiveMember* mem = &ar->members[m];
+ if (mem->included) continue;
+ if (!member_satisfies(mem, &defined, &undefs)) continue;
+ include_archive_member(l, mem);
+ changed = 1;
+ }
+ }
+ symhash_fini(&defined);
+ symhash_fini(&undefs);
+ if (!changed) break;
+ }
+}
+
/* ---- public ---- */
LinkImage* link_resolve(Linker* l)
{
- LinkImage* img = link_image_alloc(l->c);
- Heap* h = img->heap;
+ LinkImage* img;
+ Heap* h;
+
+ /* Expand archive members into Linker.inputs before any layout
+ * machinery runs — once that's done, the rest of the pipeline
+ * sees a single flat input list and doesn't care about archives. */
+ link_ingest_archives(l);
+
+ img = link_image_alloc(l->c);
+ h = img->heap;
/* Per-input map storage. */
img->ninput_maps = l->ninputs;
@@ -851,7 +1216,14 @@ LinkImage* link_resolve(Linker* l)
link_symbols_to_sections(l, img);
emit_array_boundaries(l, img);
resolve_undefs(l, img);
- emit_reloc_records(l, img);
+ {
+ LinkSymId* got_map = NULL;
+ u32 got_map_size = img->nsyms + 1u;
+ layout_got(l, img, &got_map);
+ emit_reloc_records(l, img, got_map);
+ if (got_map)
+ h->free(h, got_map, sizeof(*got_map) * got_map_size);
+ }
resolve_entry(l, img);
return img;
diff --git a/src/link/link_reloc.c b/src/link/link_reloc.c
@@ -73,6 +73,7 @@ void link_reloc_apply(Compiler* c, RelocKind k, u8* P_bytes,
wr_u32_le(P_bytes, instr);
return;
}
+ case R_AARCH64_ADR_GOT_PAGE:
case R_AARCH64_ADR_PREL_PG_HI21: {
/* ADRP — page-relative imm21, encoded as immlo[30:29] +
* immhi[23:5]. Effective immediate is (S+A) page minus P page,
@@ -105,15 +106,20 @@ void link_reloc_apply(Compiler* c, RelocKind k, u8* P_bytes,
case R_AARCH64_LDST16_ABS_LO12_NC:
case R_AARCH64_LDST32_ABS_LO12_NC:
case R_AARCH64_LDST64_ABS_LO12_NC:
- case R_AARCH64_LDST128_ABS_LO12_NC: {
+ case R_AARCH64_LDST128_ABS_LO12_NC:
+ case R_AARCH64_LD64_GOT_LO12_NC: {
/* LDR/STR with imm12 at bits [21:10]; the imm is scaled by the
* access size, so we right-shift the low 12 bits of (S+A) by
- * the size scale before encoding. NC = no overflow check. */
+ * the size scale before encoding. NC = no overflow check.
+ *
+ * LD64_GOT_LO12_NC has the same encoding as LDST64_ABS_LO12_NC;
+ * the linker has already redirected `S` to the GOT slot. */
u32 shift =
(k == R_AARCH64_LDST8_ABS_LO12_NC) ? 0u :
(k == R_AARCH64_LDST16_ABS_LO12_NC) ? 1u :
(k == R_AARCH64_LDST32_ABS_LO12_NC) ? 2u :
- (k == R_AARCH64_LDST64_ABS_LO12_NC) ? 3u : 4u;
+ (k == R_AARCH64_LDST64_ABS_LO12_NC ||
+ k == R_AARCH64_LD64_GOT_LO12_NC) ? 3u : 4u;
u64 lo12 = ((u64)S + (u64)A) & 0xfffu;
u64 imm12 = lo12 >> shift;
u32 instr = rd_u32_le(P_bytes);
diff --git a/test/link/run.sh b/test/link/run.sh
@@ -93,7 +93,11 @@ fi
command -v llvm-readelf >/dev/null 2>&1 && have_readelf=1
command -v readelf >/dev/null 2>&1 && have_readelf=1
command -v python3 >/dev/null 2>&1 && have_python3=1
-command -v ar >/dev/null 2>&1 && have_ar=1
+# Prefer llvm-ar for archive creation: Apple's /usr/bin/ar requires
+# Mach-O members and silently drops ELF objects (leaving only a SYMDEF
+# stub), which breaks the cross-target archive cases here.
+AR_BIN="$(command -v llvm-ar 2>/dev/null || command -v ar 2>/dev/null || true)"
+[ -n "$AR_BIN" ] && have_ar=1
[ -f "$ROUNDTRIP_BIN" ] && have_roundtrip=1
QEMU_BIN="$(command -v qemu-aarch64-static 2>/dev/null || command -v qemu-aarch64 2>/dev/null || true)"
@@ -237,7 +241,7 @@ for case_dir in "$TEST_DIR/cases"/*/; do
if [ "$base" = "b" ] && [ "$archive_mode" != "none" ]; then
if [ $have_ar -eq 1 ]; then
arc="$work/b.a"
- ar rcs "$arc" "$o" 2>/dev/null
+ "$AR_BIN" rcs "$arc" "$o" 2>/dev/null
if [ "$archive_mode" = "whole" ]; then
link_arc_flags+=(--whole-archive --archive "$arc")
else