kit

git clone https://git.ryansepassi.com/git/kit.git

commit 50d5bec75749d30912c9f54abad250f279a0ffa2
parent 9fb1e48ba4c1b8dcd5ab830faa2ccdc9ab6ab285
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat,  9 May 2026 16:22:57 -0700

link: dynamic linking phases 1-3 (DSO read, driver, resolve)

Per doc/DYNLD.md: ELF reader accepts ET_DYN as a DSO input via the
new read_elf_dso path; driver gains -dynamic-linker, .so / .so.N
recognition, and -Bdynamic-aware -l<name>; resolve_undefs searches
DSO inputs by name before panicking. Dynamic harness now reaches
emit instead of being rejected at ELF read; static cases and all
existing test suites still pass.
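
The `.so` / `.so.N` recognition mentioned above can be illustrated with a small standalone sketch. This is not the driver's own function: `is_so_filename` is a hypothetical name, and it uses libc's `strlen`/`memcmp` where the real driver uses its own helpers. It assumes the rule "strip trailing `.<digits>` groups, then require a `.so` ending".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical standalone helper (illustration only): returns 1 when
 * `path` ends in ".so", or in ".so" followed by one or more
 * ".<digits>" groups such as "libc.so.6" or "libz.so.1.2.13". */
static int is_so_filename(const char* path) {
  size_t i;
  assert(path != NULL);
  i = strlen(path);
  /* Peel trailing ".<digits>" groups off the end, one at a time. */
  for (;;) {
    size_t j = i;
    while (j > 0 && path[j - 1] >= '0' && path[j - 1] <= '9') --j;
    /* Stop unless we saw at least one digit preceded by a '.'. */
    if (j == i || j == 0 || path[j - 1] != '.') break;
    i = j - 1;
  }
  /* Whatever remains must end in ".so". */
  return i >= 3 && memcmp(path + i - 3, ".so", 3) == 0;
}
```

Under this rule a versioned name like `libz.so.1.2.13` counts as a shared object, while `foo.so.1a` does not, because the trailing group is not purely numeric.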

Diffstat:
M doc/DYNLD.md | 364 +++++++++++++++++++++++++++++++++++++++++++++++++------------------------------
M driver/cc.c | 5 +++--
M driver/ld.c | 134 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
M driver/lib_resolve.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++++++--------------
M driver/lib_resolve.h | 45 +++++++++++++++++++++++++++++++++------------
M include/cfree.h | 16 ++++++++++++++++
M src/api/pipeline.c | 12 ++++++++++++
M src/link/link.c | 38 +++++++++++++++++++++++++++++++++++++-
M src/link/link.h | 33 ++++++++++++++++++++++++++++++++-
M src/link/link_layout.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
M src/obj/elf.h | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/obj/elf_read.c | 201 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/obj/elf_reloc_aarch64.c | 16 ++++++++++++++++
M src/obj/obj.h | 21 +++++++++++++++++++++
M test/test.mk | 23 ++++++++++++++++-------
15 files changed, 914 insertions(+), 183 deletions(-)
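
The `-Bdynamic`-aware `-l<name>` lookup that the `driver/lib_resolve.c` change introduces searches suffix-first: `.so` is tried across every search directory before `.a`. A minimal sketch of that ordering follows. All names here (`resolve_lib`, `Mode`, `Kind`, `ExistsFn`, `demo_exists`) are hypothetical, the file-existence probe is a stub over a fixed list, and paths are written into a caller-supplied buffer rather than heap-allocated as the real driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { MODE_STATIC_ONLY, MODE_DYNAMIC_PREFER } Mode;
typedef enum { KIND_NONE, KIND_ARCHIVE, KIND_SHARED } Kind;

/* Hypothetical existence probe supplied by the caller. */
typedef int (*ExistsFn)(const char* path);

/* Demo probe: pretends exactly these three files exist. */
static int demo_exists(const char* path) {
  static const char* const present[] = {"/a/libc.a", "/b/libc.so",
                                        "/a/libm.a"};
  size_t i;
  for (i = 0; i < sizeof(present) / sizeof(present[0]); ++i)
    if (strcmp(path, present[i]) == 0) return 1;
  return 0;
}

/* Try "<dir>/lib<name><suffix>" in each dir, in order; 1 on first hit. */
static int try_suffix(const char* name, const char* suffix,
                      const char* const* dirs, size_t ndirs, ExistsFn exists,
                      char* out, size_t out_cap) {
  size_t i;
  for (i = 0; i < ndirs; ++i) {
    int n = snprintf(out, out_cap, "%s/lib%s%s", dirs[i], name, suffix);
    if (n < 0 || (size_t)n >= out_cap) continue; /* too long: skip dir */
    if (exists(out)) return 1;
  }
  return 0;
}

/* Suffix-first: every dir is searched for .so before any dir is
 * searched for .a (unless the mode is static-only). */
static Kind resolve_lib(const char* name, Mode mode, const char* const* dirs,
                        size_t ndirs, ExistsFn exists, char* out,
                        size_t out_cap) {
  assert(name != NULL && exists != NULL && out != NULL && out_cap > 0);
  if (mode != MODE_STATIC_ONLY &&
      try_suffix(name, ".so", dirs, ndirs, exists, out, out_cap))
    return KIND_SHARED;
  if (try_suffix(name, ".a", dirs, ndirs, exists, out, out_cap))
    return KIND_ARCHIVE;
  out[0] = '\0';
  return KIND_NONE;
}
```

With search dirs `/a` then `/b`, a dir-first search would settle on `/a/libc.a`. The suffix-first order in this sketch picks `/b/libc.so` in the dynamic-prefer mode and only falls back to an archive when no `.so` exists anywhere.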

diff --git a/doc/DYNLD.md b/doc/DYNLD.md
@@ -6,9 +6,51 @@ This is the gap exposed by `test/musl/run.sh`'s `dynamic` variant
 (`build/musl/<case>/dynamic/link.err`); see `doc/linker-status.md` row
 "Dynamic linking: PT_DYNAMIC, PT_INTERP, PLT, DT_NEEDED" for context.

-The harness today fails at the first foot of the pipeline (ELF reader
-rejects the `.so`). Behind that failure are the model, layout, emit,
-and driver gaps catalogued below.
+## Status
+
+Phases 1–3 have landed. The dynamic harness now reaches the link's
+final emit stage instead of being rejected at ELF read; failures have
+shifted from `(link)` to runtime crashes (`run rc=139`) on the
+produced binary, which is the expected outcome until Phases 4–6
+(synthetic `.plt`/`.got`/`.dynamic`, PIE emission) are written. Every
+3-test static variant still passes (no regression), and the existing
+`test-link` / `test-cg` / `test-elf` suites are clean.
+
+| Phase | State | Where to look |
+|------:|--------------|-----------------------------------------------------------------|
+| 1 | done | `src/obj/elf_read.c::read_elf_dso`, new RelocKinds in `obj.h` |
+| 2 | done | `driver/ld.c` (`-dynamic-linker`, `.so` argv), `lib_resolve.c` |
+| 3 | done | `link_layout.c::find_dso_export` + `resolve_undefs` extension |
+| 4 | not started | per §3.4 below |
+| 5 | not started | per §3.5 below |
+| 6 | not started | per §3.6 below |
+| 7 | not started | per §3.7 below |
+| 8 | deferred | TLS GD/IE/LD, IRELATIVE — out of scope for v1 |
+
+Notes that drifted from the original plan during 1–3 implementation:
+- The DSO input shares the existing `ObjBuilder` rather than a new
+  `DsoBuilder` (open question §5.1). `read_elf_dso` produces an
+  `ObjBuilder` with only the DSO's exported dynsym entries and no
+  sections; the SONAME lives on `LinkInput.soname`. Symbol-walk
+  passes that touched `InputMap` (`resolve_symbols`,
+  `link_symbols_to_sections`) early-out for `LINK_INPUT_DSO_BYTES`
+  inputs, since DSOs contribute no per-input map. If §4–§6 turn out
+  to want a proper `DsoBuilder`, the migration is local — the
+  reader's call sites and the input enum are the only public
+  surface.
+- `link_add_dso_bytes` falls back to the DSO's filename basename
+  when DT_SONAME is absent (matches GNU ld). PT_DYNAMIC walking from
+  pure `e_phoff` (no `SHT_DYNAMIC`) is stubbed; not exercised by
+  musl `libc.so` since musl does ship `SHT_DYNAMIC`.
+- `-l<name>` resolution under non-`-Bstatic` modes searches `.so`
+  across every `-L` dir before falling back to `.a` — suffix-first
+  rather than dir-first, matching `clang -l`.
+
+The §2 inventory below catalogues the full set of model, layout,
+emit, and driver gaps as they stood pre-Phase 1. The text describes
+the **original** state of each file so the rest of the plan reads in
+context; what's actually changed since is captured in the Status
+table above and noted inline where useful.

 ---

@@ -54,90 +96,102 @@ the shared path.

 ## 2. Investigation: current pipeline state

-### 2.1 Driver — `driver/ld.c`
+### 2.1 Driver — `driver/ld.c` *(addressed in Phase 2)*

-Already has:
+Already had:
 - `-shared`, `-soname`, `-rpath`, `-rpath-link`, `-Bstatic`/`-Bdynamic`,
   `--enable-new-dtags`, `-pie`, `-no-pie`, `-E`/`--export-dynamic`,
   `--whole-archive`, `--start-group`/`--end-group`, `-l<name>` resolution
   (`driver/lib_resolve.c`).
-- `cfree_link_shared` dispatch wired (`driver/ld.c:640-668`); shared
-  options are populated and passed.
-
-Missing:
-- No `-dynamic-linker` / `--dynamic-linker` flag. Unknown flags are
-  rejected (`driver/ld.c:427`), so callers can't even pass it as a
-  forward-compat no-op.
-- No `.so` recognition. Only `.a` is special-cased in argv parse
-  (`driver/ld.c:432`); everything else becomes an "object file" and
-  goes into `obj_bytes`. `driver_lib_resolve` (used by `-l`) does not
-  appear to distinguish `.so` from `.a` either — confirm and extend.
-- `-l<name>` resolution doesn't honor the current `-Bstatic`/`-Bdynamic`
-  link-mode for picking `.so` over `.a`.
-
-### 2.2 ELF reader — `src/obj/elf_read.c`
-
-`read_elf` is the single ingest path used by both `link_add_obj_bytes`
-(`src/link/link.c:128`) and archive members (`src/link/link.c:207`).
-
-What it parses:
-- `e_shoff` / shdrs only. Program headers ignored.
-- `SHT_PROGBITS`/`NOBITS`/`NOTE`/etc. → ObjBuilder sections.
-- Exactly one `SHT_SYMTAB` (the `.symtab`) into ObjSyms.
-- `SHT_RELA` / `SHT_REL` whose `sh_info` points at a kept section.
-- `SHT_GROUP`.
-
-Why it rejects `libc.so`: `elf_read.c:395` enforces
-`sh->sh_info != 0 && sh->sh_info < e_shnum` for every RELA/REL. Shared
-objects' `.rela.dyn` and `.rela.plt` carry `sh_info = 0` (a dynamic
-reloc isn't bound to one specific output section) — which is **valid
-ELF** but hits this guard. Also missing:
-
-- No `e_type` discrimination (silently accepts ET_DYN/ET_EXEC and
-  proceeds; would corrupt the global symbol pool if not for the rela
-  guard tripping first).
-- No `SHT_DYNSYM` / `SHT_DYNAMIC` / `SHT_GNU_HASH` reader.
-- No `PT_DYNAMIC` walk.
-
-### 2.3 Object model — `src/obj/obj.h`
-
-The relocation enum (`src/obj/obj.h:96-127`) has no entries for:
-- `R_AARCH64_GLOB_DAT` / `R_AARCH64_JUMP_SLOT` / `R_AARCH64_RELATIVE`
-  / `R_AARCH64_COPY` / `R_AARCH64_TLSDESC*` — the dynamic-only relocs
-  the loader processes at startup.
-- `R_AARCH64_PLT32` (4-byte PLT-relative) — typically not used on
-  AArch64 (CALL26/JUMP26 carry the PLT semantics) but the mapping
-  table in `src/obj/elf_reloc_aarch64.c` would refuse it if seen.
-
-The mapping in `src/obj/elf_reloc_aarch64.c` returns `(u32)-1` for
-unsupported types, which `read_elf` panics on (`elf_read.c:413`). So
-even with the `sh_info==0` check relaxed, GLOB_DAT in `libc.so`'s
-`.rela.dyn` would trip the next guard.
-
-There's also no notion of an **import** symbol or a **DSO input**.
-Today `LinkInput.kind ∈ {OBJ, OBJ_BYTES, ARCHIVE_BYTES}` (`link.h:10`);
-`LinkSymbol` has `defined`, `kind`, `value`, `vaddr` but no "needs PLT
-slot" / "needs GOT slot" / "lives in DSO N" fields.
-
-### 2.4 Linker resolve — `src/link/link_layout.c`
-
-- Static-only by construction. Comment at `link_layout.c:1646` and the
-  IFUNC ctor logic key off `l->emit_static_exe`; the rest of the layout
-  has no symmetric "build dynamic image" branch.
-- `resolve_undefs` (`link_layout.c:247`) panics on any undef that isn't
-  satisfied by `img->globals` or the in-process resolver
-  (`link_layout.c:300-307`). Dynamic linking needs a third path: undef
-  satisfied by an imported DSO sym, recorded but **kept undefined in
-  the static sense** so it routes through .plt/.got at apply time.
-- `section_kept` (`link_layout.c:53`) drops everything that isn't
-  ALLOC PROGBITS/NOBITS/INIT_ARRAY. Synthesized .dynsym / .dynstr /
-  .dynamic / .got.plt / .plt / .rela.dyn / .rela.plt would need to be
-  added as image-owned synthetic sections (same model as `layout_iplt`
-  uses for `.iplt`/`.igot.plt`/`.iplt.pairs`).
-- `link_image_alloc` and `LinkImage` (`link_internal.h:105-148`) carry
-  no fields for: dynamic strtab, dynsym table, hash table, PLT/GOT
-  slot tables, dynamic-reloc list, PT_INTERP path, soname, DT_NEEDED
-  list, runpath/rpath lists.
+- `cfree_link_shared` dispatch wired (`driver/ld.c`); shared options
+  are populated and passed.
+
+Was missing (now fixed):
+- ~~`-dynamic-linker` / `--dynamic-linker` flag~~ → parsed and plumbed
+  through `CfreeLinkOptions.interp_path`.
+- ~~`.so` / `.so.N` filename recognition~~ → `driver_is_so_filename`
+  routes positional shared inputs into `LdOptions.dsos[]` →
+  `CfreeLinkInputs.dso_bytes`.
+- ~~`-l<name>` honoring `-Bstatic`/`-Bdynamic`~~ → `driver_lib_resolve`
+  takes a `LibResolveMode`; non-`-Bstatic` callers prefer `lib<name>.so`
+  before falling back to `lib<name>.a`. The function reports which
+  suffix matched so the driver routes hits to `dsos[]` vs.
+  `archives[]`.
+
+### 2.2 ELF reader — `src/obj/elf_read.c` *(addressed in Phase 1)*
+
+Pre-Phase-1, `read_elf` was the single ingest path used by both
+`link_add_obj_bytes` and archive members. It parsed `e_shoff` / shdrs
+only (program headers ignored), folded `SHT_PROGBITS`/`NOBITS`/`NOTE`
+into ObjBuilder sections, took at most one `SHT_SYMTAB`, and walked
+`SHT_RELA`/`SHT_REL` whose `sh_info` named a kept section.
+
+It rejected `libc.so` because the guard
+`sh->sh_info != 0 && sh->sh_info < e_shnum` tripped on `.rela.dyn` /
+`.rela.plt` (whose `sh_info = 0` is valid ELF — a dynamic reloc isn't
+bound to one output section). It also accepted ET_DYN silently and
+had no SHT_DYNSYM / SHT_DYNAMIC / PT_DYNAMIC reader.
+
+Post-Phase-1:
+- `read_elf` rejects anything other than ET_REL with a diagnostic;
+  ET_DYN inputs are routed through a separate `read_elf_dso`.
+- `read_elf_dso` parses `SHT_DYNSYM` (skipping `.symtab` if present),
+  walks `SHT_DYNAMIC` for `DT_SONAME` (interned into the compiler's
+  global Sym pool), and explicitly skips `.rela.*` / `SHT_GROUP` —
+  DSO inputs contribute no relocations or sections to the consumer.
+- Defined dynsym entries are appended as `ObjSym`s with
+  `section_id = OBJ_SEC_NONE` and `value = 0`; only the name is
+  load-bearing for the consumer's resolver. `STB_LOCAL` and undefined
+  dynsym entries (the DSO's own imports) are dropped.
+
+### 2.3 Object model — `src/obj/obj.h` *(addressed in Phases 1, 3)*
+
+Pre-Phase-1, `RelocKind` had no entries for the dynamic-only relocs
+(`R_AARCH64_GLOB_DAT` / `R_AARCH64_JUMP_SLOT` / `R_AARCH64_RELATIVE` /
+`R_AARCH64_COPY`); the mapping in `src/obj/elf_reloc_aarch64.c`
+returned `(u32)-1` on unsupported types and the reader panicked.
+There was also no concept of an **import** symbol or a **DSO input**.
+
+Post-Phase-1/3:
+- `RelocKind` carries `R_AARCH64_GLOB_DAT`, `R_AARCH64_JUMP_SLOT`,
+  `R_AARCH64_RELATIVE`, `R_AARCH64_COPY`, with both directions of
+  `elf_aarch64_reloc_{to,from}` wired and the objdump-side
+  `reloc_kind_name` extended. `R_AARCH64_TLSDESC*` and `R_AARCH64_PLT32`
+  are still deferred (Phase 8 / not exercised by the musl harness).
+- `LinkInput.kind` gains `LINK_INPUT_DSO_BYTES`, plus a `Sym soname`
+  field; `link_add_dso_bytes` builds these via `read_elf_dso`.
+- `LinkSymbol` gains `imported`, `dso_input_id`, and a flag set
+  (`needs_plt`, `needs_got`, `needs_copy`). The flags are reserved
+  for Phases 4–5; today only `imported` and `dso_input_id` are set —
+  by `resolve_undefs` when an undef is matched against a DSO export.
+  An imported symbol stays `defined=0` (the static linker has no
+  vaddr for it) but no longer trips the panic.
+
+### 2.4 Linker resolve — `src/link/link_layout.c` *(partially addressed in Phase 3; Phase 4 owns the rest)*
+
+- Static-only by construction. Comment near the IFUNC ctor logic
+  keys off `l->emit_static_exe`; the rest of the layout still has no
+  symmetric "build dynamic image" branch — Phase 4 work.
+- `resolve_undefs` *(post-Phase-3)*: walks DSO inputs via
+  `find_dso_export` before falling through to the resolver / weak-
+  zero path. On a hit, the undef is marked `imported=1`,
+  `dso_input_id=<DSO id>`, and resolution continues. The
+  still-undefined-in-the-static-sense semantics is exactly what
+  Phases 4–5 need: the symbol routes through synthetic .plt/.got at
+  apply time. DSO inputs themselves are skipped by `resolve_symbols`
+  (their exports must not contend with internal definitions in
+  `img->globals`) and by `link_symbols_to_sections` (no per-input
+  `InputMap` is allocated for them).
+- `section_kept` still drops everything that isn't ALLOC
+  PROGBITS/NOBITS/INIT_ARRAY. Synthesized .dynsym / .dynstr /
+  .dynamic / .got.plt / .plt / .rela.dyn / .rela.plt still need to
+  be added as image-owned synthetic sections (Phase 4, same model
+  as `layout_iplt`'s `.iplt`/`.igot.plt`/`.iplt.pairs`).
+- `link_image_alloc` and `LinkImage` still carry no fields for:
+  dynamic strtab, dynsym table, hash table, PLT/GOT slot tables,
+  dynamic-reloc list, PT_INTERP path, DT_NEEDED list, runpath/rpath
+  lists. The DT_NEEDED soname does live on `LinkInput.soname` —
+  collecting the actually-used set is Phase 4.

 ### 2.5 ELF emit — `src/link/link_elf.c`

@@ -292,16 +346,17 @@ is simpler to implement** — initialize all `.got.plt` slots from
 4. Default: when any DSO input or `-pie` is present, output is ET_DYN
    with a default interp; otherwise ET_EXEC (current behavior).

-### 3.8 Public API
+### 3.8 Public API *(items 1–2 done in Phase 2; item 3 is Phase 7)*

-1. `CfreeLinkInputs` gains `dso_bytes` + `ndso_bytes` fields
-   (parallel to `obj_bytes`).
-2. `CfreeLinkOptions` gains `interp_path` and a `pie` flag (or
-   `output_kind ∈ {EXE_STATIC, EXE_PIE, SHARED}`).
-3. `cfree_link_shared` stub at `src/api/pipeline.c:413` becomes a
-   thin wrapper that dispatches into the same layout as `link_exe`
-   but with `output_kind = SHARED` (no PT_INTERP, no entry symbol
-   required, allow_undefined=1).
+1. ~~`CfreeLinkInputs` gains `dso_bytes` + `ndso_bytes` fields~~ —
+   landed.
+2. ~~`CfreeLinkOptions` gains `interp_path` and a `pie` flag~~ —
+   landed (kept as two scalar fields rather than a single
+   `output_kind` enum; Phase 4/6 will decide whether to fold them).
+3. `cfree_link_shared` stub at `src/api/pipeline.c` becomes a thin
+   wrapper that dispatches into the same layout as `link_exe` but
+   with `output_kind = SHARED` (no PT_INTERP, no entry symbol
+   required, allow_undefined=1) — Phase 7.

 ---

@@ -311,47 +366,71 @@ Each phase is independently testable against `test/musl/run.sh`'s
 dynamic variant. Phases (1)-(3) are the ELF-reader cleanup that
 unblocks every later step; (4)-(8) are the actual link work.

-### Phase 1 — ELF reader: accept ET_DYN as a DSO input *(small)*
+### Phase 1 — ELF reader: accept ET_DYN as a DSO input *(done)*

 Files: `src/obj/elf_read.c`, `src/obj/elf_reloc_aarch64.c`,
-`src/obj/obj.h`, `src/link/link.c`.
-
-- Add `read_elf_dso` returning a `DsoBuilder`. Callers in
-  `src/link/link.c` dispatch on `e_type`.
-- `LINK_INPUT_DSO_BYTES` enum + `link_add_dso_bytes` API.
-- New RelocKinds (GLOB_DAT, JUMP_SLOT, RELATIVE, COPY) wired through
-  `elf_aarch64_reloc_{to,from}`.
-- DSO input is *parsed but not laid out* — its dynsym is searchable
-  during `resolve_undefs`, but it contributes no sections to the
-  image.
-
-Test: harness no longer fails at "rela sh_info 0 out of range". Next
-failure surfaces.
-
-### Phase 2 — Driver: `-dynamic-linker`, `.so` inputs *(small)*
-
-Files: `driver/ld.c`, `driver/lib_resolve.c`, `include/cfree.h`.
-
-- Parse `-dynamic-linker`, plumb to `CfreeLinkOptions`.
-- Recognize `.so` / `.so.N` filenames; route to new `dso_bytes` slot.
-- `-l<name>` under `-Bdynamic` finds `lib<name>.so` first.
-
-Test: case can be invoked end-to-end with the same flags GNU ld
-takes; failure is now a missing model field, not a parse error.
-
-### Phase 3 — Resolve: imported-undef path *(medium)*
-
-Files: `src/link/link_layout.c`, `src/link/link_internal.h`,
-`src/link/link.h`.
-
-- `LinkSymbol.imported`, `dso_id`, `needs_{plt,got,copy}` flags.
-- `resolve_undefs` extension: search DSO inputs by name before the
-  panic. On hit, mark imported; record DT_NEEDED.
-- Emit-time decisions deferred — at this phase the imported syms
-  just aren't fatal anymore.
-
-Test: link reaches layout. Failure shifts to "no .plt", "abs reloc
-target has no vaddr", or similar — i.e., real layout work.
+`src/obj/elf.h`, `src/obj/obj.h`, `src/link/link.{h,c}`.
+
+- ~~Add `read_elf_dso`~~. Returns an `ObjBuilder` (not a sibling
+  `DsoBuilder` — see Status notes); the consumer's `LinkInput`
+  carries the soname separately.
+- ~~`LINK_INPUT_DSO_BYTES` enum + `link_add_dso_bytes` API.~~
+- ~~New RelocKinds (GLOB_DAT, JUMP_SLOT, RELATIVE, COPY) wired
+  through `elf_aarch64_reloc_{to,from}`.~~
+- ~~DSO input is *parsed but not laid out*~~ — its exported dynsym
+  entries are searchable during `resolve_undefs` but it contributes
+  no sections to the image. `resolve_symbols` and
+  `link_symbols_to_sections` short-circuit on
+  `LINK_INPUT_DSO_BYTES`.
+
+Test: ✓ harness no longer fails at "rela sh_info 0 out of range";
+ELF read of `libc.so` succeeds.
+
+### Phase 2 — Driver: `-dynamic-linker`, `.so` inputs *(done)*
+
+Files: `driver/ld.c`, `driver/lib_resolve.{h,c}`, `driver/cc.c`
+(call-site update), `include/cfree.h`, `src/api/pipeline.c` (DSO
+input plumbing).
+
+- ~~Parse `-dynamic-linker` / `--dynamic-linker [=]PATH`~~; plumbed
+  through `CfreeLinkOptions.interp_path`.
+- ~~Recognize `.so` / `.so.N` filenames~~ via
+  `driver_is_so_filename`; positional shared inputs route to
+  `LdOptions.dsos[]` → `CfreeLinkInputs.dso_bytes`.
+- ~~`-l<name>` under `-Bdynamic` finds `lib<name>.so` first~~ via
+  `LibResolveMode`. `cc.c` uses `LIB_RESOLVE_STATIC_ONLY` (driver
+  default unchanged); `ld.c` picks the mode from the current
+  `-Bstatic`/`-Bdynamic` state.
+- `CfreeLinkOptions.pie` carries `-pie` through to the linker for
+  Phase 6 to consume.
+
+Test: ✓ harness invokes cfree with `-pie` and `libc.so` end-to-end;
+failure is now in the link's emit stage, not a parse error.
+
+### Phase 3 — Resolve: imported-undef path *(done)*
+
+Files: `src/link/link_layout.c`, `src/link/link.h`.
+
+- ~~`LinkSymbol.imported`, `dso_input_id`,
+  `needs_{plt,got,copy}` flags.~~ Declared in `link.h`; only
+  `imported` and `dso_input_id` are populated today (the `needs_*`
+  flags are reserved for Phase 4–5 decisions).
+- ~~`resolve_undefs` extension~~ via `find_dso_export`: walks DSO
+  inputs in input order before the resolver/weak-zero/panic
+  fallback. On hit, marks the symbol imported and stamps
+  `dso_input_id`. The DSO's soname (already on `LinkInput.soname`)
+  is the eventual DT_NEEDED entry; collecting the actually-used
+  set into the image is Phase 4.
+- ~~Emit-time decisions deferred~~ — imported syms are no longer
+  fatal but still have no vaddr. They'll panic the reloc apply
+  path the moment a CALL26 / ABS64 / ADR_GOT_PAGE targets one,
+  which is the wedge for Phase 4–5.
+
+Test: ✓ link reaches emit. Dynamic harness moved from `(link)`
+failures to `(run rc=139)` runtime crashes — the produced binary
+has no PT_INTERP / PT_DYNAMIC / .plt yet, so the loader can't bind
+it. All 3 static cases still pass; all 756 cg tests, 118 link tests,
+and the elf/ar/lib-deps suites still pass (no regression).

 ### Phase 4 — Synthetic dyn-tables *(medium)*

@@ -462,18 +541,27 @@ near-term surface.

 ## 6. Test plan

-`test/musl/run.sh dynamic` is the integration test. Per-phase
-expected progressions:
+`test/musl/run.sh dynamic` is the integration test, accessible via
+`make test-musl` (the target declares the sysroot, runtime, and
+driver binary as Make prereqs so a fresh checkout boots cleanly).
+Per-phase expected progressions:

 | Phase | `01_syscall_write` | `02_errno_touch` | `03_printf_hello` |
 |------:|--------------------|-------------------|-------------------|
 | pre | link: rela sh_info | link: rela sh_info| link: rela sh_info|
-| 1 | link: …unsupported reloc / model | … | … |
-| 2 | link: model gap | … | … |
-| 3 | link: layout gap | … | … |
-| 4 | mmap ok / segfault | … | … |
-| 5 | run pass | run: GLOB_DAT path| run: PLT call path|
-| 6 | run pass | run pass | run pass |
+| 1 | link: model gap | link: model gap | link: model gap |
+| 2 | link: model gap | link: model gap | link: model gap |
+| **3** | **run rc=139** | **run rc=139** | **run rc=139** |
+| 4 | mmap ok / segfault | … | … |
+| 5 | run pass | run: GLOB_DAT path| run: PLT call path|
+| 6 | run pass | run pass | run pass |
+
+(Bold row = current state.) Phases 1–2 didn't surface as the
+intermediate states predicted in the original plan because the
+implementation landed Phases 1+2+3 in sequence inside a single
+session — there was never a build that exposed the "Phase 1 only"
+or "Phase 2 only" failure shapes. The post-Phase-3 row is the first
+state observable in a finished tree.

 A unit-level harness for the synthetic-section builder (Phase 4) is
 worth adding under `test/link/dyn/` — round-trip the `.dynsym` /
diff --git a/driver/cc.c b/driver/cc.c
@@ -555,8 +555,9 @@ static int cc_resolve_pending_libs(CcOptions* o) {
   for (i = 0; i < o->npending_libs; ++i) {
     char* p;
     size_t sz;
-    if (driver_lib_resolve(o->env, o->pending_libs[i], o->lib_search_paths,
-                           o->nlib_search_paths, &p, &sz) != 0) {
+    if (driver_lib_resolve(o->env, o->pending_libs[i], LIB_RESOLVE_STATIC_ONLY,
+                           o->lib_search_paths, o->nlib_search_paths, &p, &sz,
+                           NULL) != 0) {
       driver_errf(CC_TOOL, "library not found: -l%s", o->pending_libs[i]);
       return 1;
     }
diff --git a/driver/ld.c b/driver/ld.c
@@ -50,6 +50,16 @@ typedef struct LdArchive {
   uint8_t group_id; /* cyclic resolution group id; 0 = single-pass */
 } LdArchive;

+/* Per-DSO ownership info. The DSO bytes are loaded straight off disk
+ * via env->file_io into the CfreeBytesInput passed to libcfree; only
+ * the path itself may need to be free'd if it came from -l<name>
+ * resolution. */
+typedef struct LdDso {
+  const char* path; /* path used for both open and CfreeBytesInput.name */
+  int owned; /* 1 if `path` was alloc'd by lib_resolve */
+  size_t owned_size; /* allocation size (for driver_free) */
+} LdDso;
+
 typedef struct LdOptions {
   DriverEnv* env;
   size_t argv_bound;
@@ -60,6 +70,11 @@ typedef struct LdOptions {
   int output_seen;
   const char* entry; /* -e */
   const char* script_path; /* -T */
+  /* PT_INTERP path. NULL means "let libcfree pick the target default
+   * (e.g. /lib/ld-musl-aarch64.so.1)". Set by -dynamic-linker /
+   * --dynamic-linker. */
+  const char* interp_path;
+  int pie; /* -pie was requested */

   const char** object_files;
   uint32_t nobject_files;
@@ -67,6 +82,12 @@ typedef struct LdOptions {
   LdArchive* archives;
   uint32_t narchives;

+  /* Shared-object inputs (positional .so / .so.N or `-l<name>` under
+   * -Bdynamic). The runtime loader resolves these by SONAME at link
+   * time → DT_NEEDED entries. */
+  LdDso* dsos;
+  uint32_t ndsos;
+
   const char** lib_dirs; /* -L */
   uint32_t nlib_dirs;

@@ -181,11 +202,12 @@ static int ld_alloc_arrays(LdOptions* o, int argc) {
   o->object_files =
       driver_alloc_zeroed(o->env, bound * sizeof(*o->object_files));
   o->archives = driver_alloc_zeroed(o->env, bound * sizeof(*o->archives));
+  o->dsos = driver_alloc_zeroed(o->env, bound * sizeof(*o->dsos));
   o->lib_dirs = driver_alloc_zeroed(o->env, bound * sizeof(*o->lib_dirs));
   o->rpaths = driver_alloc_zeroed(o->env, bound * sizeof(*o->rpaths));
   o->rpath_links = driver_alloc_zeroed(o->env, bound * sizeof(*o->rpath_links));
-  if (!o->object_files || !o->archives || !o->lib_dirs || !o->rpaths ||
-      !o->rpath_links) {
+  if (!o->object_files || !o->archives || !o->dsos || !o->lib_dirs ||
+      !o->rpaths || !o->rpath_links) {
     driver_errf(LD_TOOL, "out of memory");
     return 1;
   }
@@ -206,6 +228,45 @@ static void ld_push_archive(LdOptions* o, const char* path, int owned,
   a->group_id = o->cur_group_id;
 }

+static void ld_push_dso(LdOptions* o, const char* path, int owned,
+                        size_t owned_size) {
+  LdDso* d = &o->dsos[o->ndsos++];
+  d->path = path;
+  d->owned = owned;
+  d->owned_size = owned_size;
+}
+
+/* Filename ends in `.so` (with no further extension) or in `.so.N`
+ * for some run of digits and dots. */
+static int driver_is_so_filename(const char* path) {
+  size_t n = driver_strlen(path);
+  size_t i;
+  /* Walk from the end: trim trailing ".N" / ".N.N" sequences if any,
+   * then check that we land on ".so". */
+  i = n;
+  while (i > 0) {
+    /* Strip a trailing ".<digits>" cluster (e.g. ".1", ".26"). */
+    size_t end = i;
+    size_t j = i;
+    while (j > 0) {
+      char c = path[j - 1];
+      if (c >= '0' && c <= '9') {
+        --j;
+        continue;
+      }
+      break;
+    }
+    if (j < end && j > 0 && path[j - 1] == '.') {
+      i = j - 1;
+      continue;
+    }
+    break;
+  }
+  if (i >= 3 && path[i - 3] == '.' && path[i - 2] == 's' && path[i - 1] == 'o')
+    return 1;
+  return 0;
+}
+
 /* ---------- --build-id parsing ---------- */

 static int hex_nibble(char c) {
@@ -335,16 +396,26 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
       const char* name = a[2] ? a + 2 : (++i < argc ? argv[i] : NULL);
       char* resolved;
       size_t resolved_size;
+      LibResolveKind kind;
+      LibResolveMode mode;
       if (!name) {
         driver_errf(LD_TOOL, "-l requires an argument");
         return 1;
       }
-      if (driver_lib_resolve(o->env, name, o->lib_dirs, o->nlib_dirs, &resolved,
-                             &resolved_size) != 0) {
+      /* -Bstatic forces .a only; everything else (default,
+       * -Bdynamic, --as-needed) prefers .so but falls back to .a. */
+      mode = (o->cur_link_mode == CFREE_LM_STATIC) ? LIB_RESOLVE_STATIC_ONLY
+                                                   : LIB_RESOLVE_DYNAMIC_PREFER;
+      if (driver_lib_resolve(o->env, name, mode, o->lib_dirs, o->nlib_dirs,
+                             &resolved, &resolved_size, &kind) != 0) {
         driver_errf(LD_TOOL, "cannot find -l%s", name);
         return 1;
       }
-      ld_push_archive(o, resolved, 1, resolved_size);
+      if (kind == LIB_RESOLVE_KIND_SHARED) {
+        ld_push_dso(o, resolved, 1, resolved_size);
+      } else {
+        ld_push_archive(o, resolved, 1, resolved_size);
+      }
       continue;
     }

@@ -354,6 +425,20 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
     }
     if (driver_streq(a, "-pie")) {
       o->target.pic = CFREE_PIC_PIE;
+      o->pie = 1;
+      continue;
+    }
+    if (driver_streq(a, "-dynamic-linker") ||
+        driver_streq(a, "--dynamic-linker")) {
+      if (++i >= argc) {
+        driver_errf(LD_TOOL, "-dynamic-linker requires an argument");
+        return 1;
+      }
+      o->interp_path = argv[i];
+      continue;
+    }
+    if ((val = arg_eq_value(a, "--dynamic-linker")) != NULL) {
+      o->interp_path = val;
       continue;
     }
     if (driver_streq(a, "-no-pie")) {
@@ -488,6 +573,8 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
     if (driver_has_suffix(a, ".a")) {
       ld_push_archive(o, a, 0, 0);
+    } else if (driver_is_so_filename(a)) {
+      ld_push_dso(o, a, 0, 0);
     } else {
       o->object_files[o->nobject_files++] = a;
     }
@@ -501,7 +588,7 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
     driver_errf(LD_TOOL, "missing --end-group");
     return 1;
   }
-  if (o->nobject_files == 0 && o->narchives == 0) {
+  if (o->nobject_files == 0 && o->narchives == 0 && o->ndsos == 0) {
     driver_errf(LD_TOOL, "no input files");
     ld_usage();
     return 1;
@@ -520,11 +607,18 @@ static void ld_options_release(LdOptions* o) {
       driver_free(o->env, (void*)a->path, a->owned_size);
     }
   }
+  for (i = 0; i < o->ndsos; ++i) {
+    LdDso* d = &o->dsos[i];
+    if (d->owned && d->path) {
+      driver_free(o->env, (void*)d->path, d->owned_size);
+    }
+  }
   if (o->build_id_bytes) {
     driver_free(o->env, o->build_id_bytes, o->build_id_alloc);
   }
   driver_free(o->env, o->object_files, bound * sizeof(*o->object_files));
   driver_free(o->env, o->archives, bound * sizeof(*o->archives));
+  driver_free(o->env, o->dsos, bound * sizeof(*o->dsos));
   driver_free(o->env, o->lib_dirs, bound * sizeof(*o->lib_dirs));
   driver_free(o->env, o->rpaths, bound * sizeof(*o->rpaths));
   driver_free(o->env, o->rpath_links, bound * sizeof(*o->rpath_links));
@@ -574,9 +668,11 @@ static int ld_run_link(LdOptions* o) {
   CfreeWriter* writer = NULL;
   LoadedFile* obj_lf = NULL;
   LoadedFile* arch_lf = NULL;
+  LoadedFile* dso_lf = NULL;
   LoadedFile script_lf = {0};
   CfreeBytesInput* obj_in = NULL;
   CfreeBytesInputArchive* arch_in = NULL;
+  CfreeBytesInput* dso_in = NULL;
   const CfreeLinkScript* script = NULL;
   CfreeLinkInputs inputs;
   CfreeLinkOptions link_opts;
@@ -606,6 +702,14 @@ static int ld_run_link(LdOptions* o) {
       goto out;
     }
   }
+  if (o->ndsos) {
+    dso_lf = driver_alloc_zeroed(o->env, o->ndsos * sizeof(*dso_lf));
+    dso_in = driver_alloc_zeroed(o->env, o->ndsos * sizeof(*dso_in));
+    if (!dso_lf || !dso_in) {
+      driver_errf(LD_TOOL, "out of memory");
+      goto out;
+    }
+  }

   /* Load object files. */
   for (i = 0; i < o->nobject_files; ++i) {
@@ -632,6 +736,17 @@ static int ld_run_link(LdOptions* o) {
     arch_in[i].link_mode = a->link_mode;
     arch_in[i].group_id = a->group_id;
   }
+  /* Load shared objects. */
+  for (i = 0; i < o->ndsos; ++i) {
+    const LdDso* d = &o->dsos[i];
+    if (load_file(io, d->path, &dso_lf[i]) != 0) {
+      driver_errf(LD_TOOL, "failed to read: %s", d->path);
+      goto out;
+    }
+    dso_in[i].name = d->path;
+    dso_in[i].data = dso_lf[i].data.data;
+    dso_in[i].len = dso_lf[i].data.size;
+  }

   /* Load and parse the linker script (if any). The structured script is
    * arena-owned by the compiler; we free it explicitly before the
@@ -683,6 +798,8 @@ static int ld_run_link(LdOptions* o) {
   inputs.nobj_bytes = o->nobject_files;
   inputs.archives = arch_in;
   inputs.narchives = o->narchives;
+  inputs.dso_bytes = dso_in;
+  inputs.ndso_bytes = o->ndsos;
   inputs.linker_script = script;
   inputs.entry = o->entry;
   inputs.build_id_mode = o->build_id_mode;
@@ -723,6 +840,8 @@ static int ld_run_link(LdOptions* o) {
   link_opts = zero;
   link_opts.inputs = inputs;
   link_opts.gc_sections = o->gc_sections;
+  link_opts.pie = o->pie;
+  link_opts.interp_path = o->interp_path;
   if (o->export_dynamic) {
     /* TODO(#5/exe): once CfreeLinkOptions grows an export_dynamic
      * field (or per-symbol export list for executables), wire it
@@ -749,10 +868,13 @@ out:
   release_file(&script_lf);
   release_all(arch_lf, o->narchives);
   release_all(obj_lf, o->nobject_files);
+  release_all(dso_lf, o->ndsos);
   if (arch_in) driver_free(o->env, arch_in, o->narchives * sizeof(*arch_in));
   if (arch_lf) driver_free(o->env, arch_lf, o->narchives * sizeof(*arch_lf));
   if (obj_in) driver_free(o->env, obj_in, o->nobject_files * sizeof(*obj_in));
   if (obj_lf) driver_free(o->env, obj_lf, o->nobject_files * sizeof(*obj_lf));
+  if (dso_in) driver_free(o->env, dso_in, o->ndsos * sizeof(*dso_in));
+  if (dso_lf) driver_free(o->env, dso_lf, o->ndsos * sizeof(*dso_lf));
   return rc;
 }
diff --git a/driver/lib_resolve.c b/driver/lib_resolve.c
@@ -3,16 +3,19 @@
 #include <stddef.h>
 #include <stdint.h>

-/* Compose `<dir>/lib<name>.a` into a fresh heap buffer. Inserts a separating
- * '/' iff `dir` does not already end in one. Empty `dir` is treated as the
- * current directory: the path is `lib<name>.a`. */
+/* Compose `<dir>/lib<name><suffix>` into a fresh heap buffer. Inserts
+ * a separating '/' iff `dir` does not already end in one. Empty `dir`
+ * is treated as the current directory: the path becomes
+ * `lib<name><suffix>`. `suffix` is e.g. ".a" or ".so" — caller-owned,
+ * NUL-terminated. */
 static char* compose_path(DriverEnv* env, const char* dir, const char* name,
-                          size_t* out_size) {
+                          const char* suffix, size_t* out_size) {
   size_t dlen = driver_strlen(dir);
   size_t nlen = driver_strlen(name);
+  size_t slen = driver_strlen(suffix);
   size_t need_slash = (dlen > 0 && dir[dlen - 1] != '/') ? 1 : 0;
-  /* "<dir>" + "/"? + "lib" + "<name>" + ".a" + NUL */
-  size_t bytes = dlen + need_slash + 3 + nlen + 2 + 1;
+  /* "<dir>" + "/"? + "lib" + "<name>" + "<suffix>" + NUL */
+  size_t bytes = dlen + need_slash + 3 + nlen + slen + 1;
   char* buf = driver_alloc(env, bytes);
   size_t off = 0;
   if (!buf) return NULL;
@@ -29,22 +32,25 @@ static char* compose_path(DriverEnv* env, const char* dir, const char* name,
     driver_memcpy(buf + off, name, nlen);
     off += nlen;
   }
-  driver_memcpy(buf + off, ".a", 2);
-  off += 2;
+  if (slen) {
+    driver_memcpy(buf + off, suffix, slen);
+    off += slen;
+  }
   buf[off] = '\0';
   *out_size = bytes;
   return buf;
 }

-int driver_lib_resolve(DriverEnv* env, const char* name,
-                       const char* const* search_dirs, uint32_t nsearch_dirs,
-                       char** out_path, size_t* out_size) {
+/* Try one (suffix, kind) pair across every search dir; return 0 on
+ * the first hit. Allocations for non-matching candidates are freed
+ * before the next attempt. */
+static int try_suffix(DriverEnv* env, const char* name, const char* suffix,
+                      const char* const* search_dirs, uint32_t nsearch_dirs,
+                      char** out_path, size_t* out_size) {
   uint32_t i;
-  if (!env || !name) return 1;
-
   for (i = 0; i < nsearch_dirs; ++i) {
     size_t bytes;
-    char* cand = compose_path(env, search_dirs[i], name, &bytes);
+    char* cand = compose_path(env, search_dirs[i], name, suffix, &bytes);
     if (!cand) return 1;
     if (driver_path_exists(cand)) {
       *out_path = cand;
@@ -55,3 +61,33 @@ int driver_lib_resolve(DriverEnv* env, const char* name,
   }
   return 1;
 }
+
+int driver_lib_resolve(DriverEnv* env, const char* name, LibResolveMode mode,
+                       const char* const* search_dirs, uint32_t nsearch_dirs,
+                       char** out_path, size_t* out_size,
+                       LibResolveKind* out_kind) {
+  if (!env || !name) return 1;
+
+  /* GNU-ld order: under dynamic mode prefer .so over .a within the
+   * same search dir. In practice that means we still iterate dirs in
+   * order, but for each dir try .so first when applicable. To keep
+   * the implementation simple and match `clang -l` behaviour, we
+   * iterate suffix-first instead — `.so` is searched across every
+   * -L dir before falling back to `.a`. The musl/Alpine layout we
+   * target keeps both side-by-side, so the difference is invisible
+   * for the cases the harness exercises. */
+  if (mode != LIB_RESOLVE_STATIC_ONLY) {
+    if (try_suffix(env, name, ".so", search_dirs, nsearch_dirs, out_path,
+                   out_size) == 0) {
+      if (out_kind) *out_kind = LIB_RESOLVE_KIND_SHARED;
+      return 0;
+    }
+    if (mode == LIB_RESOLVE_DYNAMIC_ONLY) return 1;
+  }
+  if (try_suffix(env, name, ".a", search_dirs, nsearch_dirs, out_path,
+                 out_size) == 0) {
+    if (out_kind) *out_kind = LIB_RESOLVE_KIND_ARCHIVE;
+    return 0;
+  }
+  return 1;
+}
diff --git a/driver/lib_resolve.h b/driver/lib_resolve.h
@@ -3,21 +3,42 @@
 #include "driver.h"

-/* Resolve `-l<name>` against a list of `-L`-style search directories.
+/* Whether driver_lib_resolve should look for shared libraries (.so), + * archives (.a), or both. `LIB_RESOLVE_AUTO` follows the GNU-ld + * positional rule: under -Bdynamic try `lib<name>.so` first then + * `lib<name>.a`; under -Bstatic try `lib<name>.a` only. * - * On success, returns 0 and writes a heap-allocated, NUL-terminated path - * into `*out_path`, with its allocation size in `*out_size`. The caller - * frees the path via driver_free(env, *out_path, *out_size). + * `out_kind` (when non-NULL) reports which suffix actually matched so + * the caller can route the result into the right input slot + * (dso_bytes vs. archives). */ +typedef enum LibResolveMode { + LIB_RESOLVE_STATIC_ONLY, + LIB_RESOLVE_DYNAMIC_PREFER, /* .so first, then .a (default for dynamic + link mode) */ + LIB_RESOLVE_DYNAMIC_ONLY, +} LibResolveMode; + +typedef enum LibResolveKind { + LIB_RESOLVE_KIND_ARCHIVE = 0, + LIB_RESOLVE_KIND_SHARED = 1, +} LibResolveKind; + +/* Resolve `-l<name>` against a list of `-L`-style search directories. * - * On failure, returns nonzero with `*out_path` unchanged. Failure cases: - * - no `lib<name>.a` exists in any of the search directories - * - allocation failure while constructing a candidate path + * On success, returns 0 and writes a heap-allocated, NUL-terminated + * path into `*out_path`, with its allocation size in `*out_size`. The + * caller frees the path via driver_free(env, *out_path, *out_size). + * If `out_kind` is non-NULL, *out_kind tells the caller whether the + * matched file is a `.so` (LIB_RESOLVE_KIND_SHARED) or a `.a` + * (LIB_RESOLVE_KIND_ARCHIVE). * - * v1 only resolves static archives (`lib<name>.a`); shared-library - * resolution (`lib<name>.so` / `.dylib` / `.dll`) waits on shared-output - * support in libcfree. */ -int driver_lib_resolve(DriverEnv* env, const char* name, + * On failure, returns nonzero with `*out_path` unchanged. 
Failure + * cases: + * - no candidate exists in any of the search directories + * - allocation failure while constructing a candidate path */ +int driver_lib_resolve(DriverEnv* env, const char* name, LibResolveMode mode, const char* const* search_dirs, uint32_t nsearch_dirs, - char** out_path, size_t* out_size); + char** out_path, size_t* out_size, + LibResolveKind* out_kind); #endif diff --git a/include/cfree.h b/include/cfree.h @@ -896,6 +896,14 @@ typedef struct CfreeLinkInputs { uint32_t nobj_bytes; const CfreeBytesInputArchive* archives; uint32_t narchives; + /* Shared-object inputs (ELF ET_DYN). Each entry's bytes are parsed + * via the linker's read_elf_dso path; the DSO contributes no + * sections to the output image, but its dynsym is searched during + * undef resolution so references against this DSO bind dynamically. + * The DSO's DT_SONAME (or its filename if missing) is recorded in + * the produced image's DT_NEEDED list. */ + const CfreeBytesInput* dso_bytes; + uint32_t ndso_bytes; /* Structured linker script. NULL means no script (target/format default * layout). Borrowed: must outlive the cfree_link_* call. */ const CfreeLinkScript* linker_script; @@ -918,6 +926,14 @@ typedef struct CfreeLinkInputs { typedef struct CfreeLinkOptions { CfreeLinkInputs inputs; int gc_sections; + /* PIE / dynamic-exe shape. When `pie` is set or any DSO input is + * present the output is ET_DYN; the runtime loader at + * `interp_path` (default `/lib/ld-musl-aarch64.so.1` for + * aarch64-linux when not specified) binds DT_NEEDED dependencies + * before transferring to the entry symbol. NULL `interp_path` with + * `pie==0` and no DSO inputs preserves the static ET_EXEC path. */ + int pie; + const char* interp_path; } CfreeLinkOptions; /* Options for shared-library link. 
diff --git a/src/api/pipeline.c b/src/api/pipeline.c @@ -358,6 +358,10 @@ static Linker* build_linker(Compiler* c, const CfreeLinkInputs* in) { link_add_obj_bytes(linker, in->obj_bytes[i].name, in->obj_bytes[i].data, in->obj_bytes[i].len); } + for (i = 0; i < in->ndso_bytes; ++i) { + link_add_dso_bytes(linker, in->dso_bytes[i].name, in->dso_bytes[i].data, + in->dso_bytes[i].len); + } for (i = 0; i < in->narchives; ++i) { const CfreeBytesInputArchive* a = &in->archives[i]; link_add_archive_bytes(linker, a->input.name, a->input.data, a->input.len, @@ -1021,6 +1025,14 @@ static const char* reloc_kind_name(u16 kind) { return "R_AARCH64_TLSLE_LDST64_TPREL_LO12"; case R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC: return "R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC"; + case R_AARCH64_GLOB_DAT: + return "R_AARCH64_GLOB_DAT"; + case R_AARCH64_JUMP_SLOT: + return "R_AARCH64_JUMP_SLOT"; + case R_AARCH64_RELATIVE: + return "R_AARCH64_RELATIVE"; + case R_AARCH64_COPY: + return "R_AARCH64_COPY"; case R_RV_HI20: return "R_RISCV_HI20"; case R_RV_LO12_I: diff --git a/src/link/link.c b/src/link/link.c @@ -40,7 +40,10 @@ static void linker_release(Linker* l) { * link_add_obj inputs are caller-owned and stay alive. */ for (i = 0; i < LinkInputs_count(&l->inputs); ++i) { LinkInput* in = LinkInputs_at(&l->inputs, i); - if (in->kind == LINK_INPUT_OBJ_BYTES && in->obj) obj_free(in->obj); + if ((in->kind == LINK_INPUT_OBJ_BYTES || + in->kind == LINK_INPUT_DSO_BYTES) && + in->obj) + obj_free(in->obj); } /* Free archive member ObjBuilders that were never pulled into inputs. 
* Pulled members had their `obj` pointer transferred and nulled, so @@ -137,6 +140,39 @@ LinkInputId link_add_obj_bytes(Linker* l, const char* name, const u8* data, return id; } +LinkInputId link_add_dso_bytes(Linker* l, const char* name, const u8* data, + size_t len) { + ObjBuilder* ob; + LinkInput* in; + LinkInputId id; + Sym soname = 0; + if (!l || !data || !len) return LINK_INPUT_NONE; + ob = read_elf_dso(l->c, name, data, len, &soname); + if (!ob) + compiler_panic(l->c, no_loc(), + "link_add_dso_bytes: read_elf_dso returned NULL for '%s'", + name ? name : "(unnamed)"); + in = inputs_push(l, &id); + in->kind = LINK_INPUT_DSO_BYTES; + in->obj = ob; + in->name = name ? pool_intern_cstr(l->c->global, name) : 0; + /* DT_SONAME wins; fall back to the file's basename if the DSO has + * no SONAME (matches GNU ld's behaviour for hand-rolled libraries + * that forgot to set DT_SONAME). */ + if (soname != 0) { + in->soname = soname; + } else if (name) { + const char* base = name; + const char* p; + for (p = name; *p; ++p) + if (*p == '/') base = p + 1; + in->soname = pool_intern_cstr(l->c->global, base); + } else { + in->soname = 0; + } + return id; +} + LinkInputId link_add_archive_bytes(Linker* l, const char* name, const u8* data, size_t len, u8 whole_archive, u8 link_mode, u8 group_id) { diff --git a/src/link/link.h b/src/link/link.h @@ -12,6 +12,11 @@ typedef enum LinkInputKind { LINK_INPUT_OBJ, LINK_INPUT_OBJ_BYTES, LINK_INPUT_ARCHIVE_BYTES, + /* Shared-object input (ET_DYN). Parsed via read_elf_dso into an + * ObjBuilder containing only the DSO's exported (dynsym) symbols. + * Contributes nothing to layout — its symbols are searched by + * resolve_undefs to satisfy imported references. 
*/ + LINK_INPUT_DSO_BYTES, } LinkInputKind; typedef u32 LinkInputId; @@ -32,6 +37,11 @@ typedef struct LinkInput { u8 pad[3]; ObjBuilder* obj; /* for LINK_INPUT_OBJ, otherwise NULL until read */ Sym name; /* diagnostic name for bytes inputs */ + /* DSO-only: SONAME extracted from PT_DYNAMIC.DT_SONAME. 0 if absent. + * Used as the DT_NEEDED entry for the consuming exe / shared lib — + * the runtime loader looks up the dependency by SONAME, not by the + * filesystem path passed at link time. */ + Sym soname; } LinkInput; typedef struct LinkSymbol { @@ -47,7 +57,20 @@ typedef struct LinkSymbol { u8 bind; /* SymBind */ u8 kind; /* SymKind */ u8 defined; - u8 pad; + /* Dynamic-link bookkeeping. `imported` is set when an undef was + * matched against a DSO input's exports — the symbol stays + * structurally undefined (the static linker has no value for it) + * but resolve_undefs no longer panics on it. `dso_input_id` is the + * id of the providing DSO LinkInput; the DSO's SONAME ends up in + * the produced image's DT_NEEDED list. The needs_* flags are set + * during reloc-rewrite (Phase 5) — declared here so the model is + * stable across the dyn-link work. */ + u8 imported; + LinkInputId dso_input_id; + u8 needs_plt; + u8 needs_got; + u8 needs_copy; + u8 pad[5]; } LinkSymbol; typedef struct LinkSegment { @@ -107,6 +130,14 @@ void link_free(Linker*); LinkInputId link_add_obj(Linker*, ObjBuilder*); LinkInputId link_add_obj_bytes(Linker*, const char* name, const u8* data, size_t len); +/* Shared-object input. The bytes are parsed as ET_DYN ELF; only the + * DSO's dynsym (exported symbols) is materialized. The DSO contributes + * no sections to the output image — its presence influences resolution + * (an undef matched by name against this DSO's exports becomes an + * imported symbol) and DT_NEEDED bookkeeping (the DSO's SONAME, or its + * filename if no SONAME, is recorded as a runtime dependency). 
*/ +LinkInputId link_add_dso_bytes(Linker*, const char* name, const u8* data, + size_t len); /* `whole_archive` (nonzero == --whole-archive) and `link_mode` * (CfreeLinkMode: -Bstatic / -Bdynamic / --as-needed positional state) are * orthogonal per-archive flags. `group_id == 0` means linear single-pass; diff --git a/src/link/link_layout.c b/src/link/link_layout.c @@ -145,6 +145,14 @@ static void resolve_symbols(Linker* l, LinkImage* img) { ObjSymIter* it; ObjSymEntry e; + /* DSO inputs do not contribute symbol definitions to the image — + * their exports satisfy undefs through resolve_undefs's + * DSO-search path, which marks the consuming LinkSymbols as + * imported. Skipping here keeps DSO names out of img->globals + * so a static-side defined symbol of the same name doesn't + * collide and a DSO export doesn't accidentally win. */ + if (in->kind == LINK_INPUT_DSO_BYTES) continue; + /* obj.h: ObjSymId 0 is the "none" sentinel; the iterator skips * it. We need an upper bound for the per-input symbol map, * which is the builder's nsymbols (count incl. id-0 sentinel). @@ -243,6 +251,34 @@ static void resolve_symbols(Linker* l, LinkImage* img) { } } +/* Search the DSO inputs for a defined exported symbol matching + * `name`. Returns the LinkInputId of the first DSO that exports + * `name` (with its name interned in the same global pool, so a Sym + * comparison is sufficient), or LINK_INPUT_NONE if no DSO matches. + * Walks DSOs in input order so a leftmost-wins rule applies — same + * behaviour as GNU ld for ambiguous DSO exports. 
*/ +static LinkInputId find_dso_export(Linker* l, Sym name) { + u32 ii; + ObjSymIter* it; + ObjSymEntry e; + if (name == 0) return LINK_INPUT_NONE; + for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) { + LinkInput* in = LinkInputs_at(&l->inputs, ii); + if (in->kind != LINK_INPUT_DSO_BYTES) continue; + it = obj_symiter_new(in->obj); + while (obj_symiter_next(it, &e)) { + const ObjSym* s = e.sym; + if (s->name != name) continue; + if (s->kind == SK_UNDEF) continue; + if (s->bind == SB_LOCAL) continue; + obj_symiter_free(it); + return in->id; + } + obj_symiter_free(it); + } + return LINK_INPUT_NONE; +} + static void resolve_undefs(Linker* l, LinkImage* img) { u32 i; /* For every symbol that's still SK_UNDEF and visible by name, look @@ -271,6 +307,23 @@ static void resolve_undefs(Linker* l, LinkImage* img) { } } } + /* Dynamic-link match: a DSO input exports this name. The symbol + * stays "structurally undefined" — the static linker never + * computes a vaddr for it — but we mark it imported so the panic + * path below leaves it alone, and so later phases (PLT/GOT slot + * synthesis, .rela.dyn emit) know to wire it through dynamic + * relocs. The DSO's input id ends up in DT_NEEDED via the + * input's `soname` field. The actual JUMP_SLOT / GLOB_DAT / + * needs_plt / needs_got decisions land in Phases 4–5 alongside + * the synthetic-section work. 
*/ + if (s->name != 0) { + LinkInputId dso = find_dso_export(l, s->name); + if (dso != LINK_INPUT_NONE) { + s->imported = 1; + s->dso_input_id = dso; + continue; + } + } if (l->resolver && s->name != 0) { size_t namelen; const char* nm = pool_str(l->c->global, s->name, &namelen); @@ -956,10 +1009,16 @@ static void emit_segment_bytes(Linker* l, LinkImage* img) { static void link_symbols_to_sections(Linker* l, LinkImage* img) { u32 ii; for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) { - ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj; + LinkInput* in = LinkInputs_at(&l->inputs, ii); + ObjBuilder* ob = in->obj; InputMap* m = &img->input_maps[ii]; - ObjSymIter* it = obj_symiter_new(ob); + ObjSymIter* it; ObjSymEntry e; + /* DSO inputs were skipped in resolve_symbols — their per-input + * map is unallocated. They contribute no defined LinkSymbols + * either, so there's nothing to map to a section. */ + if (in->kind == LINK_INPUT_DSO_BYTES) continue; + it = obj_symiter_new(ob); while (obj_symiter_next(it, &e)) { LinkSymId lsid = m->sym[e.id]; LinkSymbol* ls; diff --git a/src/obj/elf.h b/src/obj/elf.h @@ -116,11 +116,66 @@ #define ELF64_R_INFO(s, t) ((((u64)(s)) << 32) | ((u64)(t) & 0xffffffffull)) /* ---- program header ---- */ +#define PT_NULL 0 #define PT_LOAD 1 +#define PT_DYNAMIC 2 +#define PT_INTERP 3 +#define PT_NOTE 4 +#define PT_PHDR 6 +#define PT_TLS 7 +#define PT_GNU_EH_FRAME 0x6474e550 +#define PT_GNU_STACK 0x6474e551 +#define PT_GNU_RELRO 0x6474e552 #define PF_X 0x1u #define PF_W 0x2u #define PF_R 0x4u +/* ---- dynamic-table tags (PT_DYNAMIC body) ---- */ +#define DT_NULL 0 +#define DT_NEEDED 1 +#define DT_PLTRELSZ 2 +#define DT_PLTGOT 3 +#define DT_HASH 4 +#define DT_STRTAB 5 +#define DT_SYMTAB 6 +#define DT_RELA 7 +#define DT_RELASZ 8 +#define DT_RELAENT 9 +#define DT_STRSZ 10 +#define DT_SYMENT 11 +#define DT_INIT 12 +#define DT_FINI 13 +#define DT_SONAME 14 +#define DT_RPATH 15 +#define DT_SYMBOLIC 16 +#define DT_REL 17 +#define DT_RELSZ 18 
+#define DT_RELENT 19 +#define DT_PLTREL 20 +#define DT_DEBUG 21 +#define DT_TEXTREL 22 +#define DT_JMPREL 23 +#define DT_BIND_NOW 24 +#define DT_INIT_ARRAY 25 +#define DT_FINI_ARRAY 26 +#define DT_INIT_ARRAYSZ 27 +#define DT_FINI_ARRAYSZ 28 +#define DT_RUNPATH 29 +#define DT_FLAGS 30 +#define DT_PREINIT_ARRAY 32 +#define DT_PREINIT_ARRAYSZ 33 +#define DT_GNU_HASH 0x6ffffef5 +#define DT_FLAGS_1 0x6ffffffb +#define DF_1_NOW 0x00000001 + +/* ---- extra section types we need to recognize in DSO inputs ---- */ +#define SHT_DYNAMIC 6 +#define SHT_DYNSYM 11 +#define SHT_GNU_HASH 0x6ffffff6 +#define SHT_GNU_VERSYM 0x6fffffff +#define SHT_GNU_VERNEED 0x6ffffffe +#define SHT_GNU_VERDEF 0x6ffffffd + /* ---- AArch64 ELF wire-format relocation type codes ---- * Prefixed ELF_ to avoid collision with the cfree-canonical RelocKind * enum values in obj.h (R_AARCH64_*). */ @@ -148,6 +203,13 @@ #define ELF_R_AARCH64_ADR_GOT_PAGE 311 #define ELF_R_AARCH64_LD64_GOT_LO12_NC 312 +/* AArch64 dynamic-only reloc types: generated by the linker into + * .rela.dyn / .rela.plt and processed by the runtime loader. */ +#define ELF_R_AARCH64_COPY 1024 +#define ELF_R_AARCH64_GLOB_DAT 1025 +#define ELF_R_AARCH64_JUMP_SLOT 1026 +#define ELF_R_AARCH64_RELATIVE 1027 + /* AArch64 TLS Local-Exec (static linking model: each TLV is at a fixed * offset from the thread pointer, computed at link time). 
*/ #define ELF_R_AARCH64_TLSLE_ADD_TPREL_HI12 549 diff --git a/src/obj/elf_read.c b/src/obj/elf_read.c @@ -213,6 +213,13 @@ ObjBuilder* read_elf(Compiler* c, const char* name, const u8* data, compiler_panic(c, no_loc(), "read_elf: not ELFDATA2LSB (got %u)", data[EI_DATA]); + u16 e_type = elf_rd_u16(data + 16); + if (e_type != ET_REL) + compiler_panic(c, no_loc(), + "read_elf: only ET_REL inputs are accepted by read_elf " + "(got e_type=%u); use read_elf_dso for ET_DYN shared objects", + (u32)e_type); + u16 e_machine = elf_rd_u16(data + 18); if (e_machine != EM_AARCH64) compiler_panic(c, no_loc(), @@ -470,3 +477,197 @@ ObjBuilder* read_elf(Compiler* c, const char* name, const u8* data, obj_finalize(ob); return ob; } + +/* ---- ET_DYN (shared object) reader ---- + * + * Produces an ObjBuilder containing only the DSO's exported symbols + * (parsed from .dynsym, not .symtab). The DSO's sections, relocations, + * and groups are skipped — DSOs contribute no bytes to the output + * image. The DT_SONAME (if any) is interned and returned via + * `*soname_out` so the caller can record DT_NEEDED at link time. + * + * Symbol shape: each defined dynsym entry produces an ObjSym whose + * (bind, kind, vis) match the source. `section_id` is OBJ_SEC_NONE — + * the symbol's value is its DSO-internal vaddr, not meaningful to the + * consuming linker, so we record `value=0`. The linker layer + * (resolve_undefs) only consults the name and the defined-ness flag. + * + * Undefined dynsym entries (st_shndx==SHN_UNDEF) are imports the DSO + * itself has against other libraries; they're not relevant to a + * consumer that's linking against this DSO and are dropped. 
*/ + +static int parse_phdr(const u8* data, size_t len, u64 e_phoff, u16 e_phentsize, + u16 e_phnum, u32 want_type, u64* out_offset, + u64* out_filesz) { + u32 i; + if (e_phentsize != ELF64_PHDR_SIZE) return 0; + if (e_phoff + (u64)e_phnum * ELF64_PHDR_SIZE > len) return 0; + for (i = 0; i < e_phnum; ++i) { + const u8* p = data + e_phoff + (u64)i * ELF64_PHDR_SIZE; + u32 p_type = elf_rd_u32(p + 0); + if (p_type != want_type) continue; + *out_offset = elf_rd_u64(p + 8); + *out_filesz = elf_rd_u64(p + 32); + return 1; + } + return 0; +} + +ObjBuilder* read_elf_dso(Compiler* c, const char* name, const u8* data, + size_t len, Sym* soname_out) { + (void)name; + if (soname_out) *soname_out = 0; + + if (len < ELF64_EHDR_SIZE) + compiler_panic(c, no_loc(), "read_elf_dso: input shorter than ELF header"); + if (data[EI_MAG0] != ELFMAG0 || data[EI_MAG1] != ELFMAG1 || + data[EI_MAG2] != ELFMAG2 || data[EI_MAG3] != ELFMAG3) + compiler_panic(c, no_loc(), "read_elf_dso: bad ELF magic"); + if (data[EI_CLASS] != ELFCLASS64) + compiler_panic(c, no_loc(), "read_elf_dso: not ELFCLASS64"); + if (data[EI_DATA] != ELFDATA2LSB) + compiler_panic(c, no_loc(), "read_elf_dso: not ELFDATA2LSB"); + + u16 e_type = elf_rd_u16(data + 16); + if (e_type != ET_DYN) + compiler_panic(c, no_loc(), + "read_elf_dso: expected ET_DYN, got e_type=%u", (u32)e_type); + + u16 e_machine = elf_rd_u16(data + 18); + if (e_machine != EM_AARCH64) + compiler_panic(c, no_loc(), + "read_elf_dso: unsupported e_machine 0x%x (only AArch64)", + (u32)e_machine); + + u64 e_phoff = elf_rd_u64(data + 32); + u64 e_shoff = elf_rd_u64(data + 40); + u16 e_phentsize = elf_rd_u16(data + 54); + u16 e_phnum = elf_rd_u16(data + 56); + u16 e_shentsize = elf_rd_u16(data + 58); + u16 e_shnum = elf_rd_u16(data + 60); + u16 e_shstrndx = elf_rd_u16(data + 62); + + if (e_shentsize != ELF64_SHDR_SIZE) + compiler_panic(c, no_loc(), "read_elf_dso: unexpected e_shentsize %u", + (u32)e_shentsize); + if (e_shoff + (u64)e_shnum * ELF64_SHDR_SIZE > 
len) + compiler_panic(c, no_loc(), + "read_elf_dso: section header table out of range"); + if (e_shstrndx >= e_shnum) + compiler_panic(c, no_loc(), "read_elf_dso: e_shstrndx out of range"); + + ShdrRec* shdrs = arena_array(c->scratch, ShdrRec, e_shnum); + for (u32 i = 0; i < e_shnum; ++i) + parse_shdr(data + e_shoff + (u64)i * ELF64_SHDR_SIZE, &shdrs[i]); + + /* Locate .dynsym (preferred over .symtab — a stripped DSO carries + * only .dynsym) and its associated strtab via sh_link. */ + u32 dynsym_idx = 0, dynamic_idx = 0; + for (u32 i = 1; i < e_shnum; ++i) { + if (shdrs[i].sh_type == SHT_DYNSYM && !dynsym_idx) dynsym_idx = i; + if (shdrs[i].sh_type == SHT_DYNAMIC && !dynamic_idx) dynamic_idx = i; + } + + if (!dynsym_idx) + compiler_panic(c, no_loc(), + "read_elf_dso: no SHT_DYNSYM in shared object"); + + /* Parse PT_DYNAMIC for DT_SONAME. The .dynamic section gives us the + * dynstr to resolve the SONAME's offset; if there's no .dynamic + * section we fall back to scanning the PT_DYNAMIC segment. */ + Sym soname = 0; + if (dynamic_idx) { + const ShdrRec* dsh = &shdrs[dynamic_idx]; + if (dsh->sh_link >= e_shnum) + compiler_panic(c, no_loc(), + "read_elf_dso: .dynamic sh_link %u out of range", + dsh->sh_link); + const ShdrRec* str_sh = &shdrs[dsh->sh_link]; + if (str_sh->sh_offset + str_sh->sh_size > len) + compiler_panic(c, no_loc(), + "read_elf_dso: .dynamic strtab out of range"); + const u8* dynstr = data + str_sh->sh_offset; + u64 dynstr_sz = str_sh->sh_size; + + if (dsh->sh_offset + dsh->sh_size > len) + compiler_panic(c, no_loc(), "read_elf_dso: .dynamic body out of range"); + const u8* dynp = data + dsh->sh_offset; + u64 dynsz = dsh->sh_size; + /* DT entries are 16 bytes: (d_tag: u64, d_un: u64). 
*/ + for (u64 off = 0; off + 16 <= dynsz; off += 16) { + u64 tag = elf_rd_u64(dynp + off); + u64 val = elf_rd_u64(dynp + off + 8); + if (tag == DT_NULL) break; + if (tag == DT_SONAME) { + u32 nlen; + const char* nm = strtab_lookup(dynstr, dynstr_sz, (u32)val, &nlen); + if (nlen) soname = pool_intern(c->global, nm, nlen); + break; + } + } + } else if (e_phnum) { + /* Fallback: walk PT_DYNAMIC straight from program headers. We + * only need DT_SONAME, so skip if we can't find a strtab pointer + * inline (DT_STRTAB carries a vaddr, not a file offset — stripped + * DSOs without SHT_DYNAMIC are exceedingly rare in practice). */ + u64 dyn_off, dyn_sz; + (void)parse_phdr(data, len, e_phoff, e_phentsize, e_phnum, PT_DYNAMIC, + &dyn_off, &dyn_sz); + } + if (soname_out) *soname_out = soname; + + /* Now parse .dynsym. */ + const ShdrRec* sh = &shdrs[dynsym_idx]; + if (sh->sh_entsize != ELF64_SYM_SIZE) + compiler_panic(c, no_loc(), "read_elf_dso: .dynsym entsize %llu != %u", + (unsigned long long)sh->sh_entsize, (u32)ELF64_SYM_SIZE); + if (sh->sh_size % ELF64_SYM_SIZE) + compiler_panic(c, no_loc(), + "read_elf_dso: .dynsym size not multiple of entry size"); + if (sh->sh_link >= e_shnum) + compiler_panic(c, no_loc(), "read_elf_dso: .dynsym sh_link out of range"); + const ShdrRec* str_sh = &shdrs[sh->sh_link]; + if (str_sh->sh_offset + str_sh->sh_size > len) + compiler_panic(c, no_loc(), "read_elf_dso: .dynstr out of range"); + const u8* strtab = data + str_sh->sh_offset; + u64 strtab_sz = str_sh->sh_size; + + ObjBuilder* ob = obj_new(c); + if (!ob) compiler_panic(c, no_loc(), "read_elf_dso: obj_new failed"); + + u32 nsyms = (u32)(sh->sh_size / ELF64_SYM_SIZE); + const u8* base = data + sh->sh_offset; + for (u32 i = 1; i < nsyms; ++i) { /* skip index 0 */ + const u8* p = base + (u64)i * ELF64_SYM_SIZE; + u32 st_name = elf_rd_u32(p + 0); + u8 st_info = p[4]; + u8 st_other = p[5]; + u16 st_shndx = elf_rd_u16(p + 6); + + /* Skip the DSO's own undefined imports — they don't satisfy 
any + * undef in our consumer. Locals (STB_LOCAL) likewise aren't + * exported and would only confuse the resolver. */ + if (st_shndx == SHN_UNDEF) continue; + u32 e_bind = ELF64_ST_BIND(st_info); + if (e_bind == STB_LOCAL) continue; + + u32 nlen; + const char* nm = strtab_lookup(strtab, strtab_sz, st_name, &nlen); + if (!nlen) continue; + Sym sn = pool_intern(c->global, nm, nlen); + + u32 e_type_field = ELF64_ST_TYPE(st_info); + u16 bind = elf_bind_to_obj(e_bind); + u16 kind = elf_type_to_kind(e_type_field, st_shndx); + u8 vis = elf_other_to_vis(st_other); + + /* DSO exports land as defined symbols in OBJ_SEC_NONE with + * value=0. The consumer treats them as imports — see + * resolve_undefs in src/link/link_layout.c. */ + obj_symbol_ex(ob, sn, (SymBind)bind, (SymVis)vis, (SymKind)kind, + OBJ_SEC_NONE, 0, 0, 0); + } + + obj_finalize(ob); + return ob; +} diff --git a/src/obj/elf_reloc_aarch64.c b/src/obj/elf_reloc_aarch64.c @@ -85,6 +85,14 @@ u32 elf_aarch64_reloc_to(u32 kind /* RelocKind */) { return ELF_R_AARCH64_TLSLE_LDST64_TPREL_LO12; case R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC: return ELF_R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC; + case R_AARCH64_GLOB_DAT: + return ELF_R_AARCH64_GLOB_DAT; + case R_AARCH64_JUMP_SLOT: + return ELF_R_AARCH64_JUMP_SLOT; + case R_AARCH64_RELATIVE: + return ELF_R_AARCH64_RELATIVE; + case R_AARCH64_COPY: + return ELF_R_AARCH64_COPY; default: return ELF_R_AARCH64_NONE; } @@ -160,6 +168,14 @@ u32 elf_aarch64_reloc_from(u32 elf_type) { return R_AARCH64_TLSLE_LDST64_TPREL_LO12; case ELF_R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC: return R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC; + case ELF_R_AARCH64_GLOB_DAT: + return R_AARCH64_GLOB_DAT; + case ELF_R_AARCH64_JUMP_SLOT: + return R_AARCH64_JUMP_SLOT; + case ELF_R_AARCH64_RELATIVE: + return R_AARCH64_RELATIVE; + case ELF_R_AARCH64_COPY: + return R_AARCH64_COPY; default: return (u32)-1; /* sentinel */ } diff --git a/src/obj/obj.h b/src/obj/obj.h @@ -137,6 +137,16 @@ typedef enum RelocKind { 
R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC, R_AARCH64_TLSLE_LDST64_TPREL_LO12, R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC, + /* Dynamic-only relocs: emitted into .rela.dyn / .rela.plt of an + * ET_DYN/ET_EXEC output and processed by the runtime loader. They + * never appear in ET_REL inputs from a compiler; the linker may + * synthesize them during dynamic-exe / shared-lib emit, and the + * reader recognizes them when it walks an ET_DYN's .rela.* sections + * (currently only used for symbol-name extraction, not applied). */ + R_AARCH64_GLOB_DAT, + R_AARCH64_JUMP_SLOT, + R_AARCH64_RELATIVE, + R_AARCH64_COPY, R_RV_HI20, R_RV_LO12_I, R_RV_LO12_S, @@ -303,6 +313,17 @@ void emit_wasm(Compiler*, ObjBuilder*, Writer*); /* ---- file format readers (for ld and objdump) ---- */ ObjBuilder* read_elf(Compiler*, const char* name, const u8* data, size_t len); +/* ELF ET_DYN reader. Produces an ObjBuilder containing only the DSO's + * exported (dynsym) symbols. Defined dynsym entries land as ObjSyms + * with their original SymBind/SymKind so the linker's symbol-resolution + * pass can match them by name. The DSO's sections, relocations, and + * groups are all skipped — DSOs contribute no bytes to the output. + * + * If `soname_out` is non-NULL, *soname_out receives the DT_SONAME + * interned into the compiler's global Sym pool, or 0 if the DSO has + * no SONAME. */ +ObjBuilder* read_elf_dso(Compiler*, const char* name, const u8* data, + size_t len, Sym* soname_out); ObjBuilder* read_coff(Compiler*, const char* name, const u8* data, size_t len); ObjBuilder* read_macho(Compiler*, const char* name, const u8* data, size_t len); ObjBuilder* read_wasm(Compiler*, const char* name, const u8* data, size_t len); diff --git a/test/test.mk b/test/test.mk @@ -86,14 +86,23 @@ test-link: lib $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER) test-cg: lib $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER) bash test/cg/run.sh -# test-musl: end-to-end static-musl link/run on aarch64. 
Pulls a pinned
-# musl sysroot (test/musl/extract.sh — uses podman against Alpine 3.20),
-# builds rt/build/aarch64-linux/libcfree_rt.a for the soft-float / TF
-# builtins, and runs `cfree ld` against the real musl libc.a. Excluded
-# from the default `test` target because it needs podman and ~30s on
-# first run; opt-in via `make test-musl`.
-test-musl: bin rt-aarch64-linux
+# test-musl: end-to-end static + dynamic musl link/run on aarch64.
+# Pulls a pinned musl sysroot (test/musl/extract.sh — podman against
+# Alpine 3.20), builds rt/build/aarch64-linux/libcfree_rt.a for the
+# soft-float / TF builtins, and runs `cfree ld` against the real musl
+# libc.a (static variant) and libc.so (dynamic variant — see
+# doc/DYNLD.md). Excluded from the default `test` target because it
+# needs podman and ~30s on first run; opt-in via `make test-musl`.
+#
+# The sysroot is treated as a real prerequisite via its PROVENANCE
+# marker so subsequent runs skip extraction and re-extract only when
+# the file is removed (or test/musl/extract.sh -f forces a rebuild).
+MUSL_SYSROOT_MARKER = build/musl-sysroot/PROVENANCE
+
+$(MUSL_SYSROOT_MARKER): test/musl/extract.sh test/musl/Containerfile
 	@bash test/musl/extract.sh
+
+test-musl: bin rt-aarch64-linux $(MUSL_SYSROOT_MARKER)
 	@bash test/musl/run.sh
 
 # Fail if libcfree.a depends on any external symbol not in the allowlist.
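
Illustrative note, not part of the commit: the diff above picks the DT_NEEDED name for a DSO input by scanning its dynamic table for DT_SONAME and, if none is present, falling back to the basename of the path given at link time. The standalone sketch below walks through that rule in isolation. Every name in it (`sketch_find_soname`, `sketch_basename`, `sketch_needed_name`, the `EX_DT_*` constants) is hypothetical and does not exist in the kit tree; it assumes a little-endian ELF64 dynamic table with 16-byte entries, as the reader in the diff does, and it returns plain C strings rather than interned `Sym` values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local copies of the two tags the scan cares about;
 * the numeric values match DT_NULL and DT_SONAME in src/obj/elf.h. */
enum { EX_DT_NULL = 0, EX_DT_SONAME = 14 };

/* Read a little-endian u64 one byte at a time (host-endian independent). */
static uint64_t sketch_rd_u64le(const unsigned char* p) {
  uint64_t v = 0;
  int i;
  for (i = 7; i >= 0; --i) v = (v << 8) | p[i];
  return v;
}

/* Scan a dynamic table (16-byte entries: d_tag, d_val) for DT_SONAME.
 * Returns a pointer to the NUL-terminated name inside `dynstr`, or NULL
 * if there is no DT_SONAME, its offset is out of range, the string is
 * not terminated inside the table, or the name is empty. */
const char* sketch_find_soname(const unsigned char* dyn, size_t dyn_size,
                               const char* dynstr, size_t dynstr_size) {
  size_t off;
  for (off = 0; off + 16 <= dyn_size; off += 16) {
    uint64_t tag = sketch_rd_u64le(dyn + off);
    uint64_t val = sketch_rd_u64le(dyn + off + 8);
    if (tag == EX_DT_NULL) break; /* DT_NULL terminates the table */
    if (tag != EX_DT_SONAME) continue;
    if (val >= dynstr_size) return NULL;
    if (!memchr(dynstr + val, '\0', dynstr_size - (size_t)val)) return NULL;
    if (dynstr[val] == '\0') return NULL; /* empty name counts as absent */
    return dynstr + val;
  }
  return NULL;
}

/* Basename of a '/'-separated path: everything after the last '/'. */
const char* sketch_basename(const char* path) {
  const char* base = path;
  const char* p;
  for (p = path; *p; ++p)
    if (*p == '/') base = p + 1;
  return base;
}

/* The name recorded as a runtime dependency: SONAME when present,
 * otherwise the basename of the link-time path, otherwise NULL. */
const char* sketch_needed_name(const char* soname, const char* path) {
  if (soname) return soname;
  if (path) return sketch_basename(path);
  return NULL;
}
```

Under these assumptions, a table holding a DT_NEEDED entry followed by a DT_SONAME entry whose value is the offset of `libfoo.so.1` in the string table yields `libfoo.so.1`, while a DSO without DT_SONAME linked as `/usr/lib/libbar.so` would be recorded as `libbar.so`. Checking the string-table offset before dereferencing it keeps a malformed DSO from reading past the end of the mapped file.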