commit 50d5bec75749d30912c9f54abad250f279a0ffa2
parent 9fb1e48ba4c1b8dcd5ab830faa2ccdc9ab6ab285
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sat, 9 May 2026 16:22:57 -0700
link: dynamic linking phases 1-3 (DSO read, driver, resolve)
Per doc/DYNLD.md: ELF reader accepts ET_DYN as a DSO input via the
new read_elf_dso path; driver gains -dynamic-linker, .so / .so.N
recognition, and -Bdynamic-aware -l<name>; resolve_undefs searches
DSO inputs by name before panicking. Dynamic harness now reaches
emit instead of being rejected at ELF read; static cases and all
existing test suites still pass.
Diffstat:
15 files changed, 914 insertions(+), 183 deletions(-)
diff --git a/doc/DYNLD.md b/doc/DYNLD.md
@@ -6,9 +6,51 @@ This is the gap exposed by `test/musl/run.sh`'s `dynamic` variant
(`build/musl/<case>/dynamic/link.err`); see `doc/linker-status.md` row
"Dynamic linking: PT_DYNAMIC, PT_INTERP, PLT, DT_NEEDED" for context.
-The harness today fails at the first foot of the pipeline (ELF reader
-rejects the `.so`). Behind that failure are the model, layout, emit,
-and driver gaps catalogued below.
+## Status
+
+Phases 1–3 have landed. The dynamic harness now reaches the link's
+final emit stage instead of being rejected at ELF read; failures have
+shifted from `(link)` to runtime crashes (`run rc=139`) on the
+produced binary, which is the expected outcome until Phases 4–6
+(synthetic `.plt`/`.got`/`.dynamic`, PIE emission) are written. All
+three static-variant tests still pass (no regression), and the existing
+`test-link` / `test-cg` / `test-elf` suites are clean.
+
+| Phase | State | Where to look |
+|------:|--------------|-----------------------------------------------------------------|
+| 1 | done | `src/obj/elf_read.c::read_elf_dso`, new RelocKinds in `obj.h` |
+| 2 | done | `driver/ld.c` (`-dynamic-linker`, `.so` argv), `lib_resolve.c` |
+| 3 | done | `link_layout.c::find_dso_export` + `resolve_undefs` extension |
+| 4 | not started | per §3.4 below |
+| 5 | not started | per §3.5 below |
+| 6 | not started | per §3.6 below |
+| 7 | not started | per §3.7 below |
+| 8 | deferred | TLS GD/IE/LD, IRELATIVE — out of scope for v1 |
+
+Deviations from the original plan during the Phase 1–3 implementation:
+- The DSO input shares the existing `ObjBuilder` rather than a new
+ `DsoBuilder` (open question §5.1). `read_elf_dso` produces an
+ `ObjBuilder` with only the DSO's exported dynsym entries and no
+ sections; the SONAME lives on `LinkInput.soname`. Symbol-walk
+ passes that touched `InputMap` (`resolve_symbols`,
+ `link_symbols_to_sections`) early-out for `LINK_INPUT_DSO_BYTES`
+ inputs, since DSOs contribute no per-input map. If Phases 4–6 turn out
+ to want a proper `DsoBuilder`, the migration is local — the
+ reader's call sites and the input enum are the only public
+ surface.
+- `link_add_dso_bytes` falls back to the DSO's filename basename
+ when DT_SONAME is absent (matches GNU ld). PT_DYNAMIC walking from
+ pure `e_phoff` (no `SHT_DYNAMIC`) is stubbed; not exercised by
+ musl `libc.so` since musl does ship `SHT_DYNAMIC`.
+- `-l<name>` resolution under non-`-Bstatic` modes searches `.so`
+ across every `-L` dir before falling back to `.a` — suffix-first
+ rather than dir-first, matching `clang -l`.
+
+The §2 inventory below catalogues the full set of model, layout,
+emit, and driver gaps as they stood pre-Phase 1. The text describes
+the **original** state of each file so the rest of the plan reads in
+context; what's actually changed since is captured in the Status
+table above and noted inline where useful.
---
@@ -54,90 +96,102 @@ the shared path.
## 2. Investigation: current pipeline state
-### 2.1 Driver — `driver/ld.c`
+### 2.1 Driver — `driver/ld.c` *(addressed in Phase 2)*
-Already has:
+Already had:
- `-shared`, `-soname`, `-rpath`, `-rpath-link`, `-Bstatic`/`-Bdynamic`,
`--enable-new-dtags`, `-pie`, `-no-pie`, `-E`/`--export-dynamic`,
`--whole-archive`, `--start-group`/`--end-group`, `-l<name>` resolution
(`driver/lib_resolve.c`).
-- `cfree_link_shared` dispatch wired (`driver/ld.c:640-668`); shared
- options are populated and passed.
-
-Missing:
-- No `-dynamic-linker` / `--dynamic-linker` flag. Unknown flags are
- rejected (`driver/ld.c:427`), so callers can't even pass it as a
- forward-compat no-op.
-- No `.so` recognition. Only `.a` is special-cased in argv parse
- (`driver/ld.c:432`); everything else becomes an "object file" and
- goes into `obj_bytes`. `driver_lib_resolve` (used by `-l`) does not
- appear to distinguish `.so` from `.a` either — confirm and extend.
-- `-l<name>` resolution doesn't honor the current `-Bstatic`/`-Bdynamic`
- link-mode for picking `.so` over `.a`.
-
-### 2.2 ELF reader — `src/obj/elf_read.c`
-
-`read_elf` is the single ingest path used by both `link_add_obj_bytes`
-(`src/link/link.c:128`) and archive members (`src/link/link.c:207`).
-
-What it parses:
-- `e_shoff` / shdrs only. Program headers ignored.
-- `SHT_PROGBITS`/`NOBITS`/`NOTE`/etc. → ObjBuilder sections.
-- Exactly one `SHT_SYMTAB` (the `.symtab`) into ObjSyms.
-- `SHT_RELA` / `SHT_REL` whose `sh_info` points at a kept section.
-- `SHT_GROUP`.
-
-Why it rejects `libc.so`: `elf_read.c:395` enforces
-`sh->sh_info != 0 && sh->sh_info < e_shnum` for every RELA/REL. Shared
-objects' `.rela.dyn` and `.rela.plt` carry `sh_info = 0` (a dynamic
-reloc isn't bound to one specific output section) — which is **valid
-ELF** but hits this guard. Also missing:
-
-- No `e_type` discrimination (silently accepts ET_DYN/ET_EXEC and
- proceeds; would corrupt the global symbol pool if not for the rela
- guard tripping first).
-- No `SHT_DYNSYM` / `SHT_DYNAMIC` / `SHT_GNU_HASH` reader.
-- No `PT_DYNAMIC` walk.
-
-### 2.3 Object model — `src/obj/obj.h`
-
-The relocation enum (`src/obj/obj.h:96-127`) has no entries for:
-- `R_AARCH64_GLOB_DAT` / `R_AARCH64_JUMP_SLOT` / `R_AARCH64_RELATIVE`
- / `R_AARCH64_COPY` / `R_AARCH64_TLSDESC*` — the dynamic-only relocs
- the loader processes at startup.
-- `R_AARCH64_PLT32` (4-byte PLT-relative) — typically not used on
- AArch64 (CALL26/JUMP26 carry the PLT semantics) but the mapping
- table in `src/obj/elf_reloc_aarch64.c` would refuse it if seen.
-
-The mapping in `src/obj/elf_reloc_aarch64.c` returns `(u32)-1` for
-unsupported types, which `read_elf` panics on (`elf_read.c:413`). So
-even with the `sh_info==0` check relaxed, GLOB_DAT in `libc.so`'s
-`.rela.dyn` would trip the next guard.
-
-There's also no notion of an **import** symbol or a **DSO input**.
-Today `LinkInput.kind ∈ {OBJ, OBJ_BYTES, ARCHIVE_BYTES}` (`link.h:10`);
-`LinkSymbol` has `defined`, `kind`, `value`, `vaddr` but no "needs PLT
-slot" / "needs GOT slot" / "lives in DSO N" fields.
-
-### 2.4 Linker resolve — `src/link/link_layout.c`
-
-- Static-only by construction. Comment at `link_layout.c:1646` and the
- IFUNC ctor logic key off `l->emit_static_exe`; the rest of the layout
- has no symmetric "build dynamic image" branch.
-- `resolve_undefs` (`link_layout.c:247`) panics on any undef that isn't
- satisfied by `img->globals` or the in-process resolver
- (`link_layout.c:300-307`). Dynamic linking needs a third path: undef
- satisfied by an imported DSO sym, recorded but **kept undefined in
- the static sense** so it routes through .plt/.got at apply time.
-- `section_kept` (`link_layout.c:53`) drops everything that isn't
- ALLOC PROGBITS/NOBITS/INIT_ARRAY. Synthesized .dynsym / .dynstr /
- .dynamic / .got.plt / .plt / .rela.dyn / .rela.plt would need to be
- added as image-owned synthetic sections (same model as `layout_iplt`
- uses for `.iplt`/`.igot.plt`/`.iplt.pairs`).
-- `link_image_alloc` and `LinkImage` (`link_internal.h:105-148`) carry
- no fields for: dynamic strtab, dynsym table, hash table, PLT/GOT
- slot tables, dynamic-reloc list, PT_INTERP path, soname, DT_NEEDED
- list, runpath/rpath lists.
+- `cfree_link_shared` dispatch wired (`driver/ld.c`); shared options
+ are populated and passed.
+
+Was missing (now fixed):
+- ~~`-dynamic-linker` / `--dynamic-linker` flag~~ → parsed and plumbed
+ through `CfreeLinkOptions.interp_path`.
+- ~~`.so` / `.so.N` filename recognition~~ → `driver_is_so_filename`
+ routes positional shared inputs into `LdOptions.dsos[]` →
+ `CfreeLinkInputs.dso_bytes`.
+- ~~`-l<name>` honoring `-Bstatic`/`-Bdynamic`~~ → `driver_lib_resolve`
+ takes a `LibResolveMode`; non-`-Bstatic` callers prefer `lib<name>.so`
+ before falling back to `lib<name>.a`. The function reports which
+ suffix matched so the driver routes hits to `dsos[]` vs.
+ `archives[]`.
+
+### 2.2 ELF reader — `src/obj/elf_read.c` *(addressed in Phase 1)*
+
+Pre-Phase-1, `read_elf` was the single ingest path used by both
+`link_add_obj_bytes` and archive members. It parsed `e_shoff` / shdrs
+only (program headers ignored), folded `SHT_PROGBITS`/`NOBITS`/`NOTE`
+into ObjBuilder sections, took at most one `SHT_SYMTAB`, and walked
+`SHT_RELA`/`SHT_REL` whose `sh_info` named a kept section.
+
+It rejected `libc.so` because the guard
+`sh->sh_info != 0 && sh->sh_info < e_shnum` tripped on `.rela.dyn` /
+`.rela.plt` (whose `sh_info = 0` is valid ELF — a dynamic reloc isn't
+bound to one output section). It also accepted ET_DYN silently and
+had no SHT_DYNSYM / SHT_DYNAMIC / PT_DYNAMIC reader.
+
+Post-Phase-1:
+- `read_elf` rejects anything other than ET_REL with a diagnostic;
+ ET_DYN inputs are routed through a separate `read_elf_dso`.
+- `read_elf_dso` parses `SHT_DYNSYM` (skipping `.symtab` if present),
+ walks `SHT_DYNAMIC` for `DT_SONAME` (interned into the compiler's
+ global Sym pool), and explicitly skips `.rela.*` / `SHT_GROUP` —
+ DSO inputs contribute no relocations or sections to the consumer.
+- Defined dynsym entries are appended as `ObjSym`s with
+ `section_id = OBJ_SEC_NONE` and `value = 0`; only the name is
+ load-bearing for the consumer's resolver. `STB_LOCAL` and undefined
+ dynsym entries (the DSO's own imports) are dropped.
+
+### 2.3 Object model — `src/obj/obj.h` *(addressed in Phases 1, 3)*
+
+Pre-Phase-1, `RelocKind` had no entries for the dynamic-only relocs
+(`R_AARCH64_GLOB_DAT` / `R_AARCH64_JUMP_SLOT` / `R_AARCH64_RELATIVE` /
+`R_AARCH64_COPY`); the mapping in `src/obj/elf_reloc_aarch64.c`
+returned `(u32)-1` on unsupported types and the reader panicked.
+There was also no concept of an **import** symbol or a **DSO input**.
+
+Post-Phase-1/3:
+- `RelocKind` carries `R_AARCH64_GLOB_DAT`, `R_AARCH64_JUMP_SLOT`,
+ `R_AARCH64_RELATIVE`, `R_AARCH64_COPY`, with both directions of
+ `elf_aarch64_reloc_{to,from}` wired and the objdump-side
+ `reloc_kind_name` extended. `R_AARCH64_TLSDESC*` and `R_AARCH64_PLT32`
+ are still deferred (Phase 8 / not exercised by the musl harness).
+- `LinkInput.kind` gains `LINK_INPUT_DSO_BYTES`, plus a `Sym soname`
+ field; `link_add_dso_bytes` builds these via `read_elf_dso`.
+- `LinkSymbol` gains `imported`, `dso_input_id`, and a flag set
+ (`needs_plt`, `needs_got`, `needs_copy`). The flags are reserved
+ for Phases 4–5; today only `imported` and `dso_input_id` are set —
+ by `resolve_undefs` when an undef is matched against a DSO export.
+ An imported symbol stays `defined=0` (the static linker has no
+ vaddr for it) but no longer trips the panic.
+
+### 2.4 Linker resolve — `src/link/link_layout.c` *(partially addressed in Phase 3; Phase 4 owns the rest)*
+
+- Static-only by construction. The IFUNC ctor logic and a nearby
+ comment key off `l->emit_static_exe`; the rest of the layout still has no
+ symmetric "build dynamic image" branch — Phase 4 work.
+- `resolve_undefs` *(post-Phase-3)*: walks DSO inputs via
+ `find_dso_export` before falling through to the resolver / weak-
+ zero path. On a hit, the undef is marked `imported=1`,
+ `dso_input_id=<DSO id>`, and resolution continues. The
+ still-undefined-in-the-static-sense semantics is exactly what
+ Phases 4–5 need: the symbol routes through synthetic .plt/.got at
+ apply time. DSO inputs themselves are skipped by `resolve_symbols`
+ (their exports must not contend with internal definitions in
+ `img->globals`) and by `link_symbols_to_sections` (no per-input
+ `InputMap` is allocated for them).
+- `section_kept` still drops everything that isn't ALLOC
+ PROGBITS/NOBITS/INIT_ARRAY. Synthesized .dynsym / .dynstr /
+ .dynamic / .got.plt / .plt / .rela.dyn / .rela.plt still need to
+ be added as image-owned synthetic sections (Phase 4, same model
+ as `layout_iplt`'s `.iplt`/`.igot.plt`/`.iplt.pairs`).
+- `link_image_alloc` and `LinkImage` still carry no fields for:
+ dynamic strtab, dynsym table, hash table, PLT/GOT slot tables,
+ dynamic-reloc list, PT_INTERP path, DT_NEEDED list, runpath/rpath
+ lists. The DT_NEEDED soname does live on `LinkInput.soname` —
+ collecting the actually-used set is Phase 4.
### 2.5 ELF emit — `src/link/link_elf.c`
@@ -292,16 +346,17 @@ is simpler to implement** — initialize all `.got.plt` slots from
4. Default: when any DSO input or `-pie` is present, output is ET_DYN
with a default interp; otherwise ET_EXEC (current behavior).
-### 3.8 Public API
+### 3.8 Public API *(items 1–2 done in Phase 2; item 3 is Phase 7)*
-1. `CfreeLinkInputs` gains `dso_bytes` + `ndso_bytes` fields
- (parallel to `obj_bytes`).
-2. `CfreeLinkOptions` gains `interp_path` and a `pie` flag (or
- `output_kind ∈ {EXE_STATIC, EXE_PIE, SHARED}`).
-3. `cfree_link_shared` stub at `src/api/pipeline.c:413` becomes a
- thin wrapper that dispatches into the same layout as `link_exe`
- but with `output_kind = SHARED` (no PT_INTERP, no entry symbol
- required, allow_undefined=1).
+1. ~~`CfreeLinkInputs` gains `dso_bytes` + `ndso_bytes` fields~~ —
+ landed.
+2. ~~`CfreeLinkOptions` gains `interp_path` and a `pie` flag~~ —
+ landed (kept as two scalar fields rather than a single
+ `output_kind` enum; Phase 4/6 will decide whether to fold them).
+3. `cfree_link_shared` stub at `src/api/pipeline.c` becomes a thin
+ wrapper that dispatches into the same layout as `link_exe` but
+ with `output_kind = SHARED` (no PT_INTERP, no entry symbol
+ required, allow_undefined=1) — Phase 7.
---
@@ -311,47 +366,71 @@ Each phase is independently testable against `test/musl/run.sh`'s
dynamic variant. Phases (1)-(3) are the ELF-reader cleanup that
unblocks every later step; (4)-(8) are the actual link work.
-### Phase 1 — ELF reader: accept ET_DYN as a DSO input *(small)*
+### Phase 1 — ELF reader: accept ET_DYN as a DSO input *(done)*
Files: `src/obj/elf_read.c`, `src/obj/elf_reloc_aarch64.c`,
-`src/obj/obj.h`, `src/link/link.c`.
-
-- Add `read_elf_dso` returning a `DsoBuilder`. Callers in
- `src/link/link.c` dispatch on `e_type`.
-- `LINK_INPUT_DSO_BYTES` enum + `link_add_dso_bytes` API.
-- New RelocKinds (GLOB_DAT, JUMP_SLOT, RELATIVE, COPY) wired through
- `elf_aarch64_reloc_{to,from}`.
-- DSO input is *parsed but not laid out* — its dynsym is searchable
- during `resolve_undefs`, but it contributes no sections to the
- image.
-
-Test: harness no longer fails at "rela sh_info 0 out of range". Next
-failure surfaces.
-
-### Phase 2 — Driver: `-dynamic-linker`, `.so` inputs *(small)*
-
-Files: `driver/ld.c`, `driver/lib_resolve.c`, `include/cfree.h`.
-
-- Parse `-dynamic-linker`, plumb to `CfreeLinkOptions`.
-- Recognize `.so` / `.so.N` filenames; route to new `dso_bytes` slot.
-- `-l<name>` under `-Bdynamic` finds `lib<name>.so` first.
-
-Test: case can be invoked end-to-end with the same flags GNU ld
-takes; failure is now a missing model field, not a parse error.
-
-### Phase 3 — Resolve: imported-undef path *(medium)*
-
-Files: `src/link/link_layout.c`, `src/link/link_internal.h`,
-`src/link/link.h`.
-
-- `LinkSymbol.imported`, `dso_id`, `needs_{plt,got,copy}` flags.
-- `resolve_undefs` extension: search DSO inputs by name before the
- panic. On hit, mark imported; record DT_NEEDED.
-- Emit-time decisions deferred — at this phase the imported syms
- just aren't fatal anymore.
-
-Test: link reaches layout. Failure shifts to "no .plt", "abs reloc
-target has no vaddr", or similar — i.e., real layout work.
+`src/obj/elf.h`, `src/obj/obj.h`, `src/link/link.{h,c}`.
+
+- ~~Add `read_elf_dso`~~. Returns an `ObjBuilder` (not a sibling
+ `DsoBuilder` — see Status notes); the consumer's `LinkInput`
+ carries the soname separately.
+- ~~`LINK_INPUT_DSO_BYTES` enum + `link_add_dso_bytes` API.~~
+- ~~New RelocKinds (GLOB_DAT, JUMP_SLOT, RELATIVE, COPY) wired
+ through `elf_aarch64_reloc_{to,from}`.~~
+- ~~DSO input is *parsed but not laid out*~~ — its exported dynsym
+ entries are searchable during `resolve_undefs` but it contributes
+ no sections to the image. `resolve_symbols` and
+ `link_symbols_to_sections` short-circuit on
+ `LINK_INPUT_DSO_BYTES`.
+
+Test: ✓ harness no longer fails at "rela sh_info 0 out of range";
+ELF read of `libc.so` succeeds.
+
+### Phase 2 — Driver: `-dynamic-linker`, `.so` inputs *(done)*
+
+Files: `driver/ld.c`, `driver/lib_resolve.{h,c}`, `driver/cc.c`
+(call-site update), `include/cfree.h`, `src/api/pipeline.c` (DSO
+input plumbing).
+
+- ~~Parse `-dynamic-linker` / `--dynamic-linker [=]PATH`~~; plumbed
+ through `CfreeLinkOptions.interp_path`.
+- ~~Recognize `.so` / `.so.N` filenames~~ via
+ `driver_is_so_filename`; positional shared inputs route to
+ `LdOptions.dsos[]` → `CfreeLinkInputs.dso_bytes`.
+- ~~`-l<name>` under `-Bdynamic` finds `lib<name>.so` first~~ via
+ `LibResolveMode`. `cc.c` uses `LIB_RESOLVE_STATIC_ONLY` (driver
+ default unchanged); `ld.c` picks the mode from the current
+ `-Bstatic`/`-Bdynamic` state.
+- `CfreeLinkOptions.pie` carries `-pie` through to the linker for
+ Phase 6 to consume.
+
+Test: ✓ harness invokes cfree with `-pie` and `libc.so` end-to-end;
+failure is now in the link's emit stage, not a parse error.
+
+### Phase 3 — Resolve: imported-undef path *(done)*
+
+Files: `src/link/link_layout.c`, `src/link/link.h`.
+
+- ~~`LinkSymbol.imported`, `dso_input_id`,
+ `needs_{plt,got,copy}` flags.~~ Declared in `link.h`; only
+ `imported` and `dso_input_id` are populated today (the `needs_*`
+ flags are reserved for Phase 4–5 decisions).
+- ~~`resolve_undefs` extension~~ via `find_dso_export`: walks DSO
+ inputs in input order before the resolver/weak-zero/panic
+ fallback. On hit, marks the symbol imported and stamps
+ `dso_input_id`. The DSO's soname (already on `LinkInput.soname`)
+ is the eventual DT_NEEDED entry; collecting the actually-used
+ set into the image is Phase 4.
+- ~~Emit-time decisions deferred~~ — imported syms are no longer
+ fatal but still have no vaddr. They'll panic the reloc apply
+ path the moment a CALL26 / ABS64 / ADR_GOT_PAGE targets one,
+ which is the wedge for Phase 4–5.
+
+Test: ✓ link reaches emit. Dynamic harness moved from `(link)`
+failures to `(run rc=139)` runtime crashes — the produced binary
+has no PT_INTERP / PT_DYNAMIC / .plt yet, so the loader can't bind
+it. All 3 static cases still pass; all 756 cg tests, 118 link tests,
+and the elf/ar/lib-deps suites still pass (no regression).
### Phase 4 — Synthetic dyn-tables *(medium)*
@@ -462,18 +541,27 @@ near-term surface.
## 6. Test plan
-`test/musl/run.sh dynamic` is the integration test. Per-phase
-expected progressions:
+`test/musl/run.sh dynamic` is the integration test, accessible via
+`make test-musl` (the target declares the sysroot, runtime, and
+driver binary as Make prereqs so a fresh checkout boots cleanly).
+Per-phase expected progressions:
| Phase | `01_syscall_write` | `02_errno_touch` | `03_printf_hello` |
|------:|--------------------|-------------------|-------------------|
| pre | link: rela sh_info | link: rela sh_info| link: rela sh_info|
-| 1 | link: …unsupported reloc / model | … | … |
-| 2 | link: model gap | … | … |
-| 3 | link: layout gap | … | … |
-| 4 | mmap ok / segfault | … | … |
-| 5 | run pass | run: GLOB_DAT path| run: PLT call path|
-| 6 | run pass | run pass | run pass |
+| 1 | link: model gap | link: model gap | link: model gap |
+| 2 | link: model gap | link: model gap | link: model gap |
+| **3** | **run rc=139** | **run rc=139** | **run rc=139** |
+| 4 | mmap ok / segfault | … | … |
+| 5 | run pass | run: GLOB_DAT path| run: PLT call path|
+| 6 | run pass | run pass | run pass |
+
+(Bold row = current state.) Phases 1–2 didn't surface as the
+intermediate states predicted in the original plan because the
+implementation landed Phases 1+2+3 in sequence inside a single
+session — there was never a build that exposed the "Phase 1 only"
+or "Phase 2 only" failure shapes. The post-Phase-3 row is the first
+state observable in a finished tree.
A unit-level harness for the synthetic-section builder (Phase 4) is
worth adding under `test/link/dyn/` — round-trip the `.dynsym` /
diff --git a/driver/cc.c b/driver/cc.c
@@ -555,8 +555,9 @@ static int cc_resolve_pending_libs(CcOptions* o) {
for (i = 0; i < o->npending_libs; ++i) {
char* p;
size_t sz;
- if (driver_lib_resolve(o->env, o->pending_libs[i], o->lib_search_paths,
- o->nlib_search_paths, &p, &sz) != 0) {
+ if (driver_lib_resolve(o->env, o->pending_libs[i], LIB_RESOLVE_STATIC_ONLY,
+ o->lib_search_paths, o->nlib_search_paths, &p, &sz,
+ NULL) != 0) {
driver_errf(CC_TOOL, "library not found: -l%s", o->pending_libs[i]);
return 1;
}
diff --git a/driver/ld.c b/driver/ld.c
@@ -50,6 +50,16 @@ typedef struct LdArchive {
uint8_t group_id; /* cyclic resolution group id; 0 = single-pass */
} LdArchive;
+/* Per-DSO ownership info. The DSO bytes are loaded straight off disk
+ * via env->file_io into the CfreeBytesInput passed to libcfree; only
+ * the path itself may need to be freed if it came from -l<name>
+ * resolution. */
+typedef struct LdDso {
+ const char* path; /* path used for both open and CfreeBytesInput.name */
+ int owned; /* 1 if `path` was alloc'd by lib_resolve */
+ size_t owned_size; /* allocation size (for driver_free) */
+} LdDso;
+
typedef struct LdOptions {
DriverEnv* env;
size_t argv_bound;
@@ -60,6 +70,11 @@ typedef struct LdOptions {
int output_seen;
const char* entry; /* -e */
const char* script_path; /* -T */
+ /* PT_INTERP path. NULL means "let libcfree pick the target default
+ * (e.g. /lib/ld-musl-aarch64.so.1)". Set by -dynamic-linker /
+ * --dynamic-linker. */
+ const char* interp_path;
+ int pie; /* -pie was requested */
const char** object_files;
uint32_t nobject_files;
@@ -67,6 +82,12 @@ typedef struct LdOptions {
LdArchive* archives;
uint32_t narchives;
+ /* Shared-object inputs (positional .so / .so.N or `-l<name>` under
+ * -Bdynamic). Each one's SONAME is the eventual DT_NEEDED entry,
+ * which the runtime loader resolves at load time. */
+ LdDso* dsos;
+ uint32_t ndsos;
+
const char** lib_dirs; /* -L */
uint32_t nlib_dirs;
@@ -181,11 +202,12 @@ static int ld_alloc_arrays(LdOptions* o, int argc) {
o->object_files =
driver_alloc_zeroed(o->env, bound * sizeof(*o->object_files));
o->archives = driver_alloc_zeroed(o->env, bound * sizeof(*o->archives));
+ o->dsos = driver_alloc_zeroed(o->env, bound * sizeof(*o->dsos));
o->lib_dirs = driver_alloc_zeroed(o->env, bound * sizeof(*o->lib_dirs));
o->rpaths = driver_alloc_zeroed(o->env, bound * sizeof(*o->rpaths));
o->rpath_links = driver_alloc_zeroed(o->env, bound * sizeof(*o->rpath_links));
- if (!o->object_files || !o->archives || !o->lib_dirs || !o->rpaths ||
- !o->rpath_links) {
+ if (!o->object_files || !o->archives || !o->dsos || !o->lib_dirs ||
+ !o->rpaths || !o->rpath_links) {
driver_errf(LD_TOOL, "out of memory");
return 1;
}
@@ -206,6 +228,45 @@ static void ld_push_archive(LdOptions* o, const char* path, int owned,
a->group_id = o->cur_group_id;
}
+static void ld_push_dso(LdOptions* o, const char* path, int owned,
+ size_t owned_size) {
+ LdDso* d = &o->dsos[o->ndsos++];
+ d->path = path;
+ d->owned = owned;
+ d->owned_size = owned_size;
+}
+
+/* Filename ends in `.so` (with no further extension) or in `.so.N`
+ * for some run of digits and dots. */
+static int driver_is_so_filename(const char* path) {
+ size_t n = driver_strlen(path);
+ size_t i;
+ /* Walk from the end: trim trailing ".N" / ".N.N" sequences if any,
+ * then check that we land on ".so". */
+ i = n;
+ while (i > 0) {
+ /* Strip a trailing ".<digits>" cluster (e.g. ".1", ".26"). */
+ size_t end = i;
+ size_t j = i;
+ while (j > 0) {
+ char c = path[j - 1];
+ if (c >= '0' && c <= '9') {
+ --j;
+ continue;
+ }
+ break;
+ }
+ if (j < end && j > 0 && path[j - 1] == '.') {
+ i = j - 1;
+ continue;
+ }
+ break;
+ }
+ if (i >= 3 && path[i - 3] == '.' && path[i - 2] == 's' && path[i - 1] == 'o')
+ return 1;
+ return 0;
+}
+
/* ---------- --build-id parsing ---------- */
static int hex_nibble(char c) {
@@ -335,16 +396,26 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
const char* name = a[2] ? a + 2 : (++i < argc ? argv[i] : NULL);
char* resolved;
size_t resolved_size;
+ LibResolveKind kind;
+ LibResolveMode mode;
if (!name) {
driver_errf(LD_TOOL, "-l requires an argument");
return 1;
}
- if (driver_lib_resolve(o->env, name, o->lib_dirs, o->nlib_dirs, &resolved,
- &resolved_size) != 0) {
+ /* -Bstatic forces .a only; everything else (default,
+ * -Bdynamic, --as-needed) prefers .so but falls back to .a. */
+ mode = (o->cur_link_mode == CFREE_LM_STATIC) ? LIB_RESOLVE_STATIC_ONLY
+ : LIB_RESOLVE_DYNAMIC_PREFER;
+ if (driver_lib_resolve(o->env, name, mode, o->lib_dirs, o->nlib_dirs,
+ &resolved, &resolved_size, &kind) != 0) {
driver_errf(LD_TOOL, "cannot find -l%s", name);
return 1;
}
- ld_push_archive(o, resolved, 1, resolved_size);
+ if (kind == LIB_RESOLVE_KIND_SHARED) {
+ ld_push_dso(o, resolved, 1, resolved_size);
+ } else {
+ ld_push_archive(o, resolved, 1, resolved_size);
+ }
continue;
}
@@ -354,6 +425,20 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
}
if (driver_streq(a, "-pie")) {
o->target.pic = CFREE_PIC_PIE;
+ o->pie = 1;
+ continue;
+ }
+ if (driver_streq(a, "-dynamic-linker") ||
+ driver_streq(a, "--dynamic-linker")) {
+ if (++i >= argc) {
+ driver_errf(LD_TOOL, "-dynamic-linker requires an argument");
+ return 1;
+ }
+ o->interp_path = argv[i];
+ continue;
+ }
+ if ((val = arg_eq_value(a, "--dynamic-linker")) != NULL) {
+ o->interp_path = val;
continue;
}
if (driver_streq(a, "-no-pie")) {
@@ -488,6 +573,8 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
if (driver_has_suffix(a, ".a")) {
ld_push_archive(o, a, 0, 0);
+ } else if (driver_is_so_filename(a)) {
+ ld_push_dso(o, a, 0, 0);
} else {
o->object_files[o->nobject_files++] = a;
}
@@ -501,7 +588,7 @@ static int ld_parse(int argc, char** argv, LdOptions* o) {
driver_errf(LD_TOOL, "missing --end-group");
return 1;
}
- if (o->nobject_files == 0 && o->narchives == 0) {
+ if (o->nobject_files == 0 && o->narchives == 0 && o->ndsos == 0) {
driver_errf(LD_TOOL, "no input files");
ld_usage();
return 1;
@@ -520,11 +607,18 @@ static void ld_options_release(LdOptions* o) {
driver_free(o->env, (void*)a->path, a->owned_size);
}
}
+ for (i = 0; i < o->ndsos; ++i) {
+ LdDso* d = &o->dsos[i];
+ if (d->owned && d->path) {
+ driver_free(o->env, (void*)d->path, d->owned_size);
+ }
+ }
if (o->build_id_bytes) {
driver_free(o->env, o->build_id_bytes, o->build_id_alloc);
}
driver_free(o->env, o->object_files, bound * sizeof(*o->object_files));
driver_free(o->env, o->archives, bound * sizeof(*o->archives));
+ driver_free(o->env, o->dsos, bound * sizeof(*o->dsos));
driver_free(o->env, o->lib_dirs, bound * sizeof(*o->lib_dirs));
driver_free(o->env, o->rpaths, bound * sizeof(*o->rpaths));
driver_free(o->env, o->rpath_links, bound * sizeof(*o->rpath_links));
@@ -574,9 +668,11 @@ static int ld_run_link(LdOptions* o) {
CfreeWriter* writer = NULL;
LoadedFile* obj_lf = NULL;
LoadedFile* arch_lf = NULL;
+ LoadedFile* dso_lf = NULL;
LoadedFile script_lf = {0};
CfreeBytesInput* obj_in = NULL;
CfreeBytesInputArchive* arch_in = NULL;
+ CfreeBytesInput* dso_in = NULL;
const CfreeLinkScript* script = NULL;
CfreeLinkInputs inputs;
CfreeLinkOptions link_opts;
@@ -606,6 +702,14 @@ static int ld_run_link(LdOptions* o) {
goto out;
}
}
+ if (o->ndsos) {
+ dso_lf = driver_alloc_zeroed(o->env, o->ndsos * sizeof(*dso_lf));
+ dso_in = driver_alloc_zeroed(o->env, o->ndsos * sizeof(*dso_in));
+ if (!dso_lf || !dso_in) {
+ driver_errf(LD_TOOL, "out of memory");
+ goto out;
+ }
+ }
/* Load object files. */
for (i = 0; i < o->nobject_files; ++i) {
@@ -632,6 +736,17 @@ static int ld_run_link(LdOptions* o) {
arch_in[i].link_mode = a->link_mode;
arch_in[i].group_id = a->group_id;
}
+ /* Load shared objects. */
+ for (i = 0; i < o->ndsos; ++i) {
+ const LdDso* d = &o->dsos[i];
+ if (load_file(io, d->path, &dso_lf[i]) != 0) {
+ driver_errf(LD_TOOL, "failed to read: %s", d->path);
+ goto out;
+ }
+ dso_in[i].name = d->path;
+ dso_in[i].data = dso_lf[i].data.data;
+ dso_in[i].len = dso_lf[i].data.size;
+ }
/* Load and parse the linker script (if any). The structured script is
* arena-owned by the compiler; we free it explicitly before the
@@ -683,6 +798,8 @@ static int ld_run_link(LdOptions* o) {
inputs.nobj_bytes = o->nobject_files;
inputs.archives = arch_in;
inputs.narchives = o->narchives;
+ inputs.dso_bytes = dso_in;
+ inputs.ndso_bytes = o->ndsos;
inputs.linker_script = script;
inputs.entry = o->entry;
inputs.build_id_mode = o->build_id_mode;
@@ -723,6 +840,8 @@ static int ld_run_link(LdOptions* o) {
link_opts = zero;
link_opts.inputs = inputs;
link_opts.gc_sections = o->gc_sections;
+ link_opts.pie = o->pie;
+ link_opts.interp_path = o->interp_path;
if (o->export_dynamic) {
/* TODO(#5/exe): once CfreeLinkOptions grows an export_dynamic
* field (or per-symbol export list for executables), wire it
@@ -749,10 +868,13 @@ out:
release_file(&script_lf);
release_all(arch_lf, o->narchives);
release_all(obj_lf, o->nobject_files);
+ release_all(dso_lf, o->ndsos);
if (arch_in) driver_free(o->env, arch_in, o->narchives * sizeof(*arch_in));
if (arch_lf) driver_free(o->env, arch_lf, o->narchives * sizeof(*arch_lf));
if (obj_in) driver_free(o->env, obj_in, o->nobject_files * sizeof(*obj_in));
if (obj_lf) driver_free(o->env, obj_lf, o->nobject_files * sizeof(*obj_lf));
+ if (dso_in) driver_free(o->env, dso_in, o->ndsos * sizeof(*dso_in));
+ if (dso_lf) driver_free(o->env, dso_lf, o->ndsos * sizeof(*dso_lf));
return rc;
}
diff --git a/driver/lib_resolve.c b/driver/lib_resolve.c
@@ -3,16 +3,19 @@
#include <stddef.h>
#include <stdint.h>
-/* Compose `<dir>/lib<name>.a` into a fresh heap buffer. Inserts a separating
- * '/' iff `dir` does not already end in one. Empty `dir` is treated as the
- * current directory: the path is `lib<name>.a`. */
+/* Compose `<dir>/lib<name><suffix>` into a fresh heap buffer. Inserts
+ * a separating '/' iff `dir` does not already end in one. Empty `dir`
+ * is treated as the current directory: the path becomes
+ * `lib<name><suffix>`. `suffix` is e.g. ".a" or ".so" — caller-owned,
+ * NUL-terminated. */
static char* compose_path(DriverEnv* env, const char* dir, const char* name,
- size_t* out_size) {
+ const char* suffix, size_t* out_size) {
size_t dlen = driver_strlen(dir);
size_t nlen = driver_strlen(name);
+ size_t slen = driver_strlen(suffix);
size_t need_slash = (dlen > 0 && dir[dlen - 1] != '/') ? 1 : 0;
- /* "<dir>" + "/"? + "lib" + "<name>" + ".a" + NUL */
- size_t bytes = dlen + need_slash + 3 + nlen + 2 + 1;
+ /* "<dir>" + "/"? + "lib" + "<name>" + "<suffix>" + NUL */
+ size_t bytes = dlen + need_slash + 3 + nlen + slen + 1;
char* buf = driver_alloc(env, bytes);
size_t off = 0;
if (!buf) return NULL;
@@ -29,22 +32,25 @@ static char* compose_path(DriverEnv* env, const char* dir, const char* name,
driver_memcpy(buf + off, name, nlen);
off += nlen;
}
- driver_memcpy(buf + off, ".a", 2);
- off += 2;
+ if (slen) {
+ driver_memcpy(buf + off, suffix, slen);
+ off += slen;
+ }
buf[off] = '\0';
*out_size = bytes;
return buf;
}
-int driver_lib_resolve(DriverEnv* env, const char* name,
- const char* const* search_dirs, uint32_t nsearch_dirs,
- char** out_path, size_t* out_size) {
+/* Try one (suffix, kind) pair across every search dir; return 0 on
+ * the first hit. Allocations for non-matching candidates are freed
+ * before the next attempt. */
+static int try_suffix(DriverEnv* env, const char* name, const char* suffix,
+ const char* const* search_dirs, uint32_t nsearch_dirs,
+ char** out_path, size_t* out_size) {
uint32_t i;
- if (!env || !name) return 1;
-
for (i = 0; i < nsearch_dirs; ++i) {
size_t bytes;
- char* cand = compose_path(env, search_dirs[i], name, &bytes);
+ char* cand = compose_path(env, search_dirs[i], name, suffix, &bytes);
if (!cand) return 1;
if (driver_path_exists(cand)) {
*out_path = cand;
@@ -55,3 +61,33 @@ int driver_lib_resolve(DriverEnv* env, const char* name,
}
return 1;
}
+
+int driver_lib_resolve(DriverEnv* env, const char* name, LibResolveMode mode,
+ const char* const* search_dirs, uint32_t nsearch_dirs,
+ char** out_path, size_t* out_size,
+ LibResolveKind* out_kind) {
+ if (!env || !name) return 1;
+
+ /* GNU ld, under dynamic mode, prefers .so over .a within each
+ * search dir: it walks the dirs in order and tries .so before .a
+ * in every one. To keep the implementation simple we iterate
+ * suffix-first instead: `.so` is searched across every -L dir
+ * before falling back to `.a`, so a `.so` in a later dir beats a
+ * `.a` in an earlier one. The musl/Alpine layout we target keeps
+ * both side-by-side, so the difference is invisible for the
+ * cases the harness exercises. */
+ if (mode != LIB_RESOLVE_STATIC_ONLY) {
+ if (try_suffix(env, name, ".so", search_dirs, nsearch_dirs, out_path,
+ out_size) == 0) {
+ if (out_kind) *out_kind = LIB_RESOLVE_KIND_SHARED;
+ return 0;
+ }
+ if (mode == LIB_RESOLVE_DYNAMIC_ONLY) return 1;
+ }
+ if (try_suffix(env, name, ".a", search_dirs, nsearch_dirs, out_path,
+ out_size) == 0) {
+ if (out_kind) *out_kind = LIB_RESOLVE_KIND_ARCHIVE;
+ return 0;
+ }
+ return 1;
+}
diff --git a/driver/lib_resolve.h b/driver/lib_resolve.h
@@ -3,21 +3,42 @@
#include "driver.h"
-/* Resolve `-l<name>` against a list of `-L`-style search directories.
+/* Whether driver_lib_resolve should look for shared libraries (.so),
+ * archives (.a), or both, following the GNU-ld positional rule:
+ * LIB_RESOLVE_DYNAMIC_PREFER (-Bdynamic) tries `lib<name>.so` then
+ * `lib<name>.a`; LIB_RESOLVE_STATIC_ONLY (-Bstatic) tries `.a` only.
*
- * On success, returns 0 and writes a heap-allocated, NUL-terminated path
- * into `*out_path`, with its allocation size in `*out_size`. The caller
- * frees the path via driver_free(env, *out_path, *out_size).
+ * `out_kind` (when non-NULL) reports which suffix actually matched so
+ * the caller can route the result into the right input slot
+ * (dso_bytes vs. archives). */
+typedef enum LibResolveMode {
+ LIB_RESOLVE_STATIC_ONLY,
+ LIB_RESOLVE_DYNAMIC_PREFER, /* .so first, then .a (default for dynamic
+ link mode) */
+ LIB_RESOLVE_DYNAMIC_ONLY,
+} LibResolveMode;
+
+typedef enum LibResolveKind {
+ LIB_RESOLVE_KIND_ARCHIVE = 0,
+ LIB_RESOLVE_KIND_SHARED = 1,
+} LibResolveKind;
+
+/* Resolve `-l<name>` against a list of `-L`-style search directories.
*
- * On failure, returns nonzero with `*out_path` unchanged. Failure cases:
- * - no `lib<name>.a` exists in any of the search directories
- * - allocation failure while constructing a candidate path
+ * On success, returns 0 and writes a heap-allocated, NUL-terminated
+ * path into `*out_path`, with its allocation size in `*out_size`. The
+ * caller frees the path via driver_free(env, *out_path, *out_size).
+ * If `out_kind` is non-NULL, *out_kind tells the caller whether the
+ * matched file is a `.so` (LIB_RESOLVE_KIND_SHARED) or a `.a`
+ * (LIB_RESOLVE_KIND_ARCHIVE).
*
- * v1 only resolves static archives (`lib<name>.a`); shared-library
- * resolution (`lib<name>.so` / `.dylib` / `.dll`) waits on shared-output
- * support in libcfree. */
-int driver_lib_resolve(DriverEnv* env, const char* name,
+ * On failure, returns nonzero with `*out_path` unchanged. Failure
+ * cases:
+ * - no candidate exists in any of the search directories
+ * - allocation failure while constructing a candidate path */
+int driver_lib_resolve(DriverEnv* env, const char* name, LibResolveMode mode,
const char* const* search_dirs, uint32_t nsearch_dirs,
- char** out_path, size_t* out_size);
+ char** out_path, size_t* out_size,
+ LibResolveKind* out_kind);
#endif
diff --git a/include/cfree.h b/include/cfree.h
@@ -896,6 +896,14 @@ typedef struct CfreeLinkInputs {
uint32_t nobj_bytes;
const CfreeBytesInputArchive* archives;
uint32_t narchives;
+ /* Shared-object inputs (ELF ET_DYN). Each entry's bytes are parsed
+ * via the linker's read_elf_dso path; the DSO contributes no
+ * sections to the output image, but its dynsym is searched during
+ * undef resolution so references against this DSO bind dynamically.
+ * The DSO's DT_SONAME (or its filename if missing) is recorded in
+ * the produced image's DT_NEEDED list. */
+ const CfreeBytesInput* dso_bytes;
+ uint32_t ndso_bytes;
/* Structured linker script. NULL means no script (target/format default
* layout). Borrowed: must outlive the cfree_link_* call. */
const CfreeLinkScript* linker_script;
@@ -918,6 +926,14 @@ typedef struct CfreeLinkInputs {
typedef struct CfreeLinkOptions {
CfreeLinkInputs inputs;
int gc_sections;
+ /* PIE / dynamic-exe shape. When `pie` is set or any DSO input is
+ * present the output is ET_DYN; the runtime loader at
+ * `interp_path` (default `/lib/ld-musl-aarch64.so.1` for
+ * aarch64-linux when not specified) binds DT_NEEDED dependencies
+ * before transferring to the entry symbol. NULL `interp_path` with
+ * `pie==0` and no DSO inputs preserves the static ET_EXEC path. */
+ int pie;
+ const char* interp_path;
} CfreeLinkOptions;
/* Options for shared-library link.
diff --git a/src/api/pipeline.c b/src/api/pipeline.c
@@ -358,6 +358,10 @@ static Linker* build_linker(Compiler* c, const CfreeLinkInputs* in) {
link_add_obj_bytes(linker, in->obj_bytes[i].name, in->obj_bytes[i].data,
in->obj_bytes[i].len);
}
+ for (i = 0; i < in->ndso_bytes; ++i) {
+ link_add_dso_bytes(linker, in->dso_bytes[i].name, in->dso_bytes[i].data,
+ in->dso_bytes[i].len);
+ }
for (i = 0; i < in->narchives; ++i) {
const CfreeBytesInputArchive* a = &in->archives[i];
link_add_archive_bytes(linker, a->input.name, a->input.data, a->input.len,
@@ -1021,6 +1025,14 @@ static const char* reloc_kind_name(u16 kind) {
return "R_AARCH64_TLSLE_LDST64_TPREL_LO12";
case R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC:
return "R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC";
+ case R_AARCH64_GLOB_DAT:
+ return "R_AARCH64_GLOB_DAT";
+ case R_AARCH64_JUMP_SLOT:
+ return "R_AARCH64_JUMP_SLOT";
+ case R_AARCH64_RELATIVE:
+ return "R_AARCH64_RELATIVE";
+ case R_AARCH64_COPY:
+ return "R_AARCH64_COPY";
case R_RV_HI20:
return "R_RISCV_HI20";
case R_RV_LO12_I:
diff --git a/src/link/link.c b/src/link/link.c
@@ -40,7 +40,10 @@ static void linker_release(Linker* l) {
* link_add_obj inputs are caller-owned and stay alive. */
for (i = 0; i < LinkInputs_count(&l->inputs); ++i) {
LinkInput* in = LinkInputs_at(&l->inputs, i);
- if (in->kind == LINK_INPUT_OBJ_BYTES && in->obj) obj_free(in->obj);
+ if ((in->kind == LINK_INPUT_OBJ_BYTES ||
+ in->kind == LINK_INPUT_DSO_BYTES) &&
+ in->obj)
+ obj_free(in->obj);
}
/* Free archive member ObjBuilders that were never pulled into inputs.
* Pulled members had their `obj` pointer transferred and nulled, so
@@ -137,6 +140,39 @@ LinkInputId link_add_obj_bytes(Linker* l, const char* name, const u8* data,
return id;
}
+LinkInputId link_add_dso_bytes(Linker* l, const char* name, const u8* data,
+ size_t len) {
+ ObjBuilder* ob;
+ LinkInput* in;
+ LinkInputId id;
+ Sym soname = 0;
+ if (!l || !data || !len) return LINK_INPUT_NONE;
+ ob = read_elf_dso(l->c, name, data, len, &soname);
+ if (!ob)
+ compiler_panic(l->c, no_loc(),
+ "link_add_dso_bytes: read_elf_dso returned NULL for '%s'",
+ name ? name : "(unnamed)");
+ in = inputs_push(l, &id);
+ in->kind = LINK_INPUT_DSO_BYTES;
+ in->obj = ob;
+ in->name = name ? pool_intern_cstr(l->c->global, name) : 0;
+ /* DT_SONAME wins; fall back to the file's basename if the DSO has
+ * no SONAME (close to GNU ld's behaviour for `-l` lookups of
+ * hand-rolled libraries that forgot to set DT_SONAME). */
+ if (soname != 0) {
+ in->soname = soname;
+ } else if (name) {
+ const char* base = name;
+ const char* p;
+ for (p = name; *p; ++p)
+ if (*p == '/') base = p + 1;
+ in->soname = pool_intern_cstr(l->c->global, base);
+ } else {
+ in->soname = 0;
+ }
+ return id;
+}
+
LinkInputId link_add_archive_bytes(Linker* l, const char* name, const u8* data,
size_t len, u8 whole_archive, u8 link_mode,
u8 group_id) {
diff --git a/src/link/link.h b/src/link/link.h
@@ -12,6 +12,11 @@ typedef enum LinkInputKind {
LINK_INPUT_OBJ,
LINK_INPUT_OBJ_BYTES,
LINK_INPUT_ARCHIVE_BYTES,
+ /* Shared-object input (ET_DYN). Parsed via read_elf_dso into an
+ * ObjBuilder containing only the DSO's exported (dynsym) symbols.
+ * Contributes nothing to layout — its symbols are searched by
+ * resolve_undefs to satisfy imported references. */
+ LINK_INPUT_DSO_BYTES,
} LinkInputKind;
typedef u32 LinkInputId;
@@ -32,6 +37,11 @@ typedef struct LinkInput {
u8 pad[3];
ObjBuilder* obj; /* for LINK_INPUT_OBJ, otherwise NULL until read */
Sym name; /* diagnostic name for bytes inputs */
+ /* DSO-only: DT_SONAME from the DSO's dynamic table, else the input
+ * name's basename; 0 only if both are absent. Used as the DT_NEEDED
+ * entry for the consuming exe / shared lib: the runtime loader looks
+ * up the dependency by this name, not by the link-time path. */
+ Sym soname;
} LinkInput;
typedef struct LinkSymbol {
@@ -47,7 +57,20 @@ typedef struct LinkSymbol {
u8 bind; /* SymBind */
u8 kind; /* SymKind */
u8 defined;
- u8 pad;
+ /* Dynamic-link bookkeeping. `imported` is set when an undef was
+ * matched against a DSO input's exports — the symbol stays
+ * structurally undefined (the static linker has no value for it)
+ * but resolve_undefs no longer panics on it. `dso_input_id` is the
+ * id of the providing DSO LinkInput; the DSO's SONAME ends up in
+ * the produced image's DT_NEEDED list. The needs_* flags are set
+ * during reloc-rewrite (Phase 5) — declared here so the model is
+ * stable across the dyn-link work. */
+ u8 imported;
+ LinkInputId dso_input_id;
+ u8 needs_plt;
+ u8 needs_got;
+ u8 needs_copy;
+ u8 pad[5];
} LinkSymbol;
typedef struct LinkSegment {
@@ -107,6 +130,14 @@ void link_free(Linker*);
LinkInputId link_add_obj(Linker*, ObjBuilder*);
LinkInputId link_add_obj_bytes(Linker*, const char* name, const u8* data,
size_t len);
+/* Shared-object input. The bytes are parsed as ET_DYN ELF; only the
+ * DSO's dynsym (exported symbols) is materialized. The DSO contributes
+ * no sections to the output image — its presence influences resolution
+ * (an undef matched by name against this DSO's exports becomes an
+ * imported symbol) and DT_NEEDED bookkeeping (the DSO's SONAME, or its
+ * filename if no SONAME, is recorded as a runtime dependency). */
+LinkInputId link_add_dso_bytes(Linker*, const char* name, const u8* data,
+ size_t len);
/* `whole_archive` (nonzero == --whole-archive) and `link_mode`
* (CfreeLinkMode: -Bstatic / -Bdynamic / --as-needed positional state) are
* orthogonal per-archive flags. `group_id == 0` means linear single-pass;
diff --git a/src/link/link_layout.c b/src/link/link_layout.c
@@ -145,6 +145,14 @@ static void resolve_symbols(Linker* l, LinkImage* img) {
ObjSymIter* it;
ObjSymEntry e;
+ /* DSO inputs do not contribute symbol definitions to the image —
+ * their exports satisfy undefs through resolve_undefs's
+ * DSO-search path, which marks the consuming LinkSymbols as
+ * imported. Skipping here keeps DSO names out of img->globals
+ * so a static-side defined symbol of the same name doesn't
+ * collide and a DSO export doesn't accidentally win. */
+ if (in->kind == LINK_INPUT_DSO_BYTES) continue;
+
/* obj.h: ObjSymId 0 is the "none" sentinel; the iterator skips
* it. We need an upper bound for the per-input symbol map,
* which is the builder's nsymbols (count incl. id-0 sentinel).
@@ -243,6 +251,34 @@ static void resolve_symbols(Linker* l, LinkImage* img) {
}
}
+/* Search the DSO inputs for a defined exported symbol matching
+ * `name`. Returns the LinkInputId of the first DSO that exports
+ * `name` (with its name interned in the same global pool, so a Sym
+ * comparison is sufficient), or LINK_INPUT_NONE if no DSO matches.
+ * Walks DSOs in input order so a leftmost-wins rule applies — same
+ * behaviour as GNU ld for ambiguous DSO exports. */
+static LinkInputId find_dso_export(Linker* l, Sym name) {
+ u32 ii;
+ ObjSymIter* it;
+ ObjSymEntry e;
+ if (name == 0) return LINK_INPUT_NONE;
+ for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
+ LinkInput* in = LinkInputs_at(&l->inputs, ii);
+ if (in->kind != LINK_INPUT_DSO_BYTES) continue;
+ it = obj_symiter_new(in->obj);
+ while (obj_symiter_next(it, &e)) {
+ const ObjSym* s = e.sym;
+ if (s->name != name) continue;
+ if (s->kind == SK_UNDEF) continue;
+ if (s->bind == SB_LOCAL) continue;
+ obj_symiter_free(it);
+ return in->id;
+ }
+ obj_symiter_free(it);
+ }
+ return LINK_INPUT_NONE;
+}
+
static void resolve_undefs(Linker* l, LinkImage* img) {
u32 i;
/* For every symbol that's still SK_UNDEF and visible by name, look
@@ -271,6 +307,23 @@ static void resolve_undefs(Linker* l, LinkImage* img) {
}
}
}
+ /* Dynamic-link match: a DSO input exports this name. The symbol
+ * stays "structurally undefined" — the static linker never
+ * computes a vaddr for it — but we mark it imported so the panic
+ * path below leaves it alone, and so later phases (PLT/GOT slot
+ * synthesis, .rela.dyn emit) know to wire it through dynamic
+ * relocs. We record the providing DSO's input id; that input's
+ * `soname` is what later lands in DT_NEEDED. The JUMP_SLOT /
+ * GLOB_DAT / needs_plt / needs_got decisions land in Phases 4–5
+ * alongside the synthetic-section work. */
+ if (s->name != 0) {
+ LinkInputId dso = find_dso_export(l, s->name);
+ if (dso != LINK_INPUT_NONE) {
+ s->imported = 1;
+ s->dso_input_id = dso;
+ continue;
+ }
+ }
if (l->resolver && s->name != 0) {
size_t namelen;
const char* nm = pool_str(l->c->global, s->name, &namelen);
@@ -956,10 +1009,16 @@ static void emit_segment_bytes(Linker* l, LinkImage* img) {
static void link_symbols_to_sections(Linker* l, LinkImage* img) {
u32 ii;
for (ii = 0; ii < LinkInputs_count(&l->inputs); ++ii) {
- ObjBuilder* ob = LinkInputs_at(&l->inputs, ii)->obj;
+ LinkInput* in = LinkInputs_at(&l->inputs, ii);
+ ObjBuilder* ob = in->obj;
InputMap* m = &img->input_maps[ii];
- ObjSymIter* it = obj_symiter_new(ob);
+ ObjSymIter* it;
ObjSymEntry e;
+ /* DSO inputs were skipped in resolve_symbols — their per-input
+ * map is unallocated. They contribute no defined LinkSymbols
+ * either, so there's nothing to map to a section. */
+ if (in->kind == LINK_INPUT_DSO_BYTES) continue;
+ it = obj_symiter_new(ob);
while (obj_symiter_next(it, &e)) {
LinkSymId lsid = m->sym[e.id];
LinkSymbol* ls;
diff --git a/src/obj/elf.h b/src/obj/elf.h
@@ -116,11 +116,66 @@
#define ELF64_R_INFO(s, t) ((((u64)(s)) << 32) | ((u64)(t) & 0xffffffffull))
/* ---- program header ---- */
+#define PT_NULL 0
#define PT_LOAD 1
+#define PT_DYNAMIC 2
+#define PT_INTERP 3
+#define PT_NOTE 4
+#define PT_PHDR 6
+#define PT_TLS 7
+#define PT_GNU_EH_FRAME 0x6474e550
+#define PT_GNU_STACK 0x6474e551
+#define PT_GNU_RELRO 0x6474e552
#define PF_X 0x1u
#define PF_W 0x2u
#define PF_R 0x4u
+/* ---- dynamic-table tags (PT_DYNAMIC body) ---- */
+#define DT_NULL 0
+#define DT_NEEDED 1
+#define DT_PLTRELSZ 2
+#define DT_PLTGOT 3
+#define DT_HASH 4
+#define DT_STRTAB 5
+#define DT_SYMTAB 6
+#define DT_RELA 7
+#define DT_RELASZ 8
+#define DT_RELAENT 9
+#define DT_STRSZ 10
+#define DT_SYMENT 11
+#define DT_INIT 12
+#define DT_FINI 13
+#define DT_SONAME 14
+#define DT_RPATH 15
+#define DT_SYMBOLIC 16
+#define DT_REL 17
+#define DT_RELSZ 18
+#define DT_RELENT 19
+#define DT_PLTREL 20
+#define DT_DEBUG 21
+#define DT_TEXTREL 22
+#define DT_JMPREL 23
+#define DT_BIND_NOW 24
+#define DT_INIT_ARRAY 25
+#define DT_FINI_ARRAY 26
+#define DT_INIT_ARRAYSZ 27
+#define DT_FINI_ARRAYSZ 28
+#define DT_RUNPATH 29
+#define DT_FLAGS 30
+#define DT_PREINIT_ARRAY 32
+#define DT_PREINIT_ARRAYSZ 33
+#define DT_GNU_HASH 0x6ffffef5
+#define DT_FLAGS_1 0x6ffffffb
+#define DF_1_NOW 0x00000001
+
+/* ---- extra section types we need to recognize in DSO inputs ---- */
+#define SHT_DYNAMIC 6
+#define SHT_DYNSYM 11
+#define SHT_GNU_HASH 0x6ffffff6
+#define SHT_GNU_VERSYM 0x6fffffff
+#define SHT_GNU_VERNEED 0x6ffffffe
+#define SHT_GNU_VERDEF 0x6ffffffd
+
/* ---- AArch64 ELF wire-format relocation type codes ----
* Prefixed ELF_ to avoid collision with the cfree-canonical RelocKind
* enum values in obj.h (R_AARCH64_*). */
@@ -148,6 +203,13 @@
#define ELF_R_AARCH64_ADR_GOT_PAGE 311
#define ELF_R_AARCH64_LD64_GOT_LO12_NC 312
+/* AArch64 dynamic-only reloc types: generated by the linker into
+ * .rela.dyn / .rela.plt and processed by the runtime loader. */
+#define ELF_R_AARCH64_COPY 1024
+#define ELF_R_AARCH64_GLOB_DAT 1025
+#define ELF_R_AARCH64_JUMP_SLOT 1026
+#define ELF_R_AARCH64_RELATIVE 1027
+
/* AArch64 TLS Local-Exec (static linking model: each TLV is at a fixed
* offset from the thread pointer, computed at link time). */
#define ELF_R_AARCH64_TLSLE_ADD_TPREL_HI12 549
diff --git a/src/obj/elf_read.c b/src/obj/elf_read.c
@@ -213,6 +213,13 @@ ObjBuilder* read_elf(Compiler* c, const char* name, const u8* data,
compiler_panic(c, no_loc(), "read_elf: not ELFDATA2LSB (got %u)",
data[EI_DATA]);
+ u16 e_type = elf_rd_u16(data + 16);
+ if (e_type != ET_REL)
+ compiler_panic(c, no_loc(),
+ "read_elf: only ET_REL inputs are accepted by read_elf "
+ "(got e_type=%u); use read_elf_dso for ET_DYN shared objects",
+ (u32)e_type);
+
u16 e_machine = elf_rd_u16(data + 18);
if (e_machine != EM_AARCH64)
compiler_panic(c, no_loc(),
@@ -470,3 +477,197 @@ ObjBuilder* read_elf(Compiler* c, const char* name, const u8* data,
obj_finalize(ob);
return ob;
}
+
+/* ---- ET_DYN (shared object) reader ----
+ *
+ * Produces an ObjBuilder containing only the DSO's exported symbols
+ * (parsed from .dynsym, not .symtab). The DSO's sections, relocations,
+ * and groups are skipped — DSOs contribute no bytes to the output
+ * image. The DT_SONAME (if any) is interned and returned via
+ * `*soname_out` so the caller can record DT_NEEDED at link time.
+ *
+ * Symbol shape: each defined dynsym entry produces an ObjSym whose
+ * (bind, kind, vis) match the source. `section_id` is OBJ_SEC_NONE —
+ * the symbol's value is its DSO-internal vaddr, not meaningful to the
+ * consuming linker, so we record `value=0`. The linker layer
+ * (resolve_undefs) only consults the name and the defined-ness flag.
+ *
+ * Undefined dynsym entries (st_shndx==SHN_UNDEF) are imports the DSO
+ * itself has against other libraries; they're not relevant to a
+ * consumer that's linking against this DSO and are dropped. */
+
+static int parse_phdr(const u8* data, size_t len, u64 e_phoff, u16 e_phentsize,
+ u16 e_phnum, u32 want_type, u64* out_offset,
+ u64* out_filesz) {
+ u32 i;
+ if (e_phentsize != ELF64_PHDR_SIZE) return 0;
+ if (e_phoff + (u64)e_phnum * ELF64_PHDR_SIZE > len) return 0;
+ for (i = 0; i < e_phnum; ++i) {
+ const u8* p = data + e_phoff + (u64)i * ELF64_PHDR_SIZE;
+ u32 p_type = elf_rd_u32(p + 0);
+ if (p_type != want_type) continue;
+ *out_offset = elf_rd_u64(p + 8);
+ *out_filesz = elf_rd_u64(p + 32);
+ return 1;
+ }
+ return 0;
+}
+
+ObjBuilder* read_elf_dso(Compiler* c, const char* name, const u8* data,
+ size_t len, Sym* soname_out) {
+ (void)name;
+ if (soname_out) *soname_out = 0;
+
+ if (len < ELF64_EHDR_SIZE)
+ compiler_panic(c, no_loc(), "read_elf_dso: input shorter than ELF header");
+ if (data[EI_MAG0] != ELFMAG0 || data[EI_MAG1] != ELFMAG1 ||
+ data[EI_MAG2] != ELFMAG2 || data[EI_MAG3] != ELFMAG3)
+ compiler_panic(c, no_loc(), "read_elf_dso: bad ELF magic");
+ if (data[EI_CLASS] != ELFCLASS64)
+ compiler_panic(c, no_loc(), "read_elf_dso: not ELFCLASS64");
+ if (data[EI_DATA] != ELFDATA2LSB)
+ compiler_panic(c, no_loc(), "read_elf_dso: not ELFDATA2LSB");
+
+ u16 e_type = elf_rd_u16(data + 16);
+ if (e_type != ET_DYN)
+ compiler_panic(c, no_loc(),
+ "read_elf_dso: expected ET_DYN, got e_type=%u", (u32)e_type);
+
+ u16 e_machine = elf_rd_u16(data + 18);
+ if (e_machine != EM_AARCH64)
+ compiler_panic(c, no_loc(),
+ "read_elf_dso: unsupported e_machine 0x%x (only AArch64)",
+ (u32)e_machine);
+
+ u64 e_phoff = elf_rd_u64(data + 32);
+ u64 e_shoff = elf_rd_u64(data + 40);
+ u16 e_phentsize = elf_rd_u16(data + 54);
+ u16 e_phnum = elf_rd_u16(data + 56);
+ u16 e_shentsize = elf_rd_u16(data + 58);
+ u16 e_shnum = elf_rd_u16(data + 60);
+ u16 e_shstrndx = elf_rd_u16(data + 62);
+
+ if (e_shentsize != ELF64_SHDR_SIZE)
+ compiler_panic(c, no_loc(), "read_elf_dso: unexpected e_shentsize %u",
+ (u32)e_shentsize);
+ if (e_shoff + (u64)e_shnum * ELF64_SHDR_SIZE > len)
+ compiler_panic(c, no_loc(),
+ "read_elf_dso: section header table out of range");
+ if (e_shstrndx >= e_shnum)
+ compiler_panic(c, no_loc(), "read_elf_dso: e_shstrndx out of range");
+
+ ShdrRec* shdrs = arena_array(c->scratch, ShdrRec, e_shnum);
+ for (u32 i = 0; i < e_shnum; ++i)
+ parse_shdr(data + e_shoff + (u64)i * ELF64_SHDR_SIZE, &shdrs[i]);
+
+ /* Locate .dynsym (preferred over .symtab — a stripped DSO carries
+ * only .dynsym) and its associated strtab via sh_link. */
+ u32 dynsym_idx = 0, dynamic_idx = 0;
+ for (u32 i = 1; i < e_shnum; ++i) {
+ if (shdrs[i].sh_type == SHT_DYNSYM && !dynsym_idx) dynsym_idx = i;
+ if (shdrs[i].sh_type == SHT_DYNAMIC && !dynamic_idx) dynamic_idx = i;
+ }
+
+ if (!dynsym_idx)
+ compiler_panic(c, no_loc(),
+ "read_elf_dso: no SHT_DYNSYM in shared object");
+
+ /* Find DT_SONAME. The .dynamic section's sh_link names the dynstr
+ * that resolves the SONAME's offset. Without a .dynamic section we
+ * only probe for PT_DYNAMIC and leave the SONAME unset (see below). */
+ Sym soname = 0;
+ if (dynamic_idx) {
+ const ShdrRec* dsh = &shdrs[dynamic_idx];
+ if (dsh->sh_link >= e_shnum)
+ compiler_panic(c, no_loc(),
+ "read_elf_dso: .dynamic sh_link %u out of range",
+ dsh->sh_link);
+ const ShdrRec* str_sh = &shdrs[dsh->sh_link];
+ if (str_sh->sh_offset + str_sh->sh_size > len)
+ compiler_panic(c, no_loc(),
+ "read_elf_dso: .dynamic strtab out of range");
+ const u8* dynstr = data + str_sh->sh_offset;
+ u64 dynstr_sz = str_sh->sh_size;
+
+ if (dsh->sh_offset + dsh->sh_size > len)
+ compiler_panic(c, no_loc(), "read_elf_dso: .dynamic body out of range");
+ const u8* dynp = data + dsh->sh_offset;
+ u64 dynsz = dsh->sh_size;
+ /* DT entries are 16 bytes: (d_tag: u64, d_un: u64). */
+ for (u64 off = 0; off + 16 <= dynsz; off += 16) {
+ u64 tag = elf_rd_u64(dynp + off);
+ u64 val = elf_rd_u64(dynp + off + 8);
+ if (tag == DT_NULL) break;
+ if (tag == DT_SONAME) {
+ u32 nlen;
+ const char* nm = strtab_lookup(dynstr, dynstr_sz, (u32)val, &nlen);
+ if (nlen) soname = pool_intern(c->global, nm, nlen);
+ break;
+ }
+ }
+ } else if (e_phnum) {
+ /* Fallback: locate PT_DYNAMIC from the program headers. Resolving
+ * DT_SONAME from it is not implemented: DT_STRTAB carries a vaddr,
+ * not a file offset, and DSOs without SHT_DYNAMIC are exceedingly
+ * rare in practice. The SONAME stays 0 and the caller falls back. */
+ u64 dyn_off, dyn_sz;
+ (void)parse_phdr(data, len, e_phoff, e_phentsize, e_phnum, PT_DYNAMIC,
+ &dyn_off, &dyn_sz);
+ }
+ if (soname_out) *soname_out = soname;
+
+ /* Now parse .dynsym. */
+ const ShdrRec* sh = &shdrs[dynsym_idx];
+ if (sh->sh_entsize != ELF64_SYM_SIZE)
+ compiler_panic(c, no_loc(), "read_elf_dso: .dynsym entsize %llu != %u",
+ (unsigned long long)sh->sh_entsize, (u32)ELF64_SYM_SIZE);
+ if (sh->sh_size % ELF64_SYM_SIZE)
+ compiler_panic(c, no_loc(),
+ "read_elf_dso: .dynsym size not multiple of entry size");
+ if (sh->sh_link >= e_shnum)
+ compiler_panic(c, no_loc(), "read_elf_dso: .dynsym sh_link out of range");
+ const ShdrRec* str_sh = &shdrs[sh->sh_link];
+ if (str_sh->sh_offset + str_sh->sh_size > len)
+ compiler_panic(c, no_loc(), "read_elf_dso: .dynstr out of range");
+ const u8* strtab = data + str_sh->sh_offset;
+ u64 strtab_sz = str_sh->sh_size;
+
+ ObjBuilder* ob = obj_new(c);
+ if (!ob) compiler_panic(c, no_loc(), "read_elf_dso: obj_new failed");
+
+ u32 nsyms = (u32)(sh->sh_size / ELF64_SYM_SIZE);
+ const u8* base = data + sh->sh_offset;
+ for (u32 i = 1; i < nsyms; ++i) { /* skip index 0 */
+ const u8* p = base + (u64)i * ELF64_SYM_SIZE;
+ u32 st_name = elf_rd_u32(p + 0);
+ u8 st_info = p[4];
+ u8 st_other = p[5];
+ u16 st_shndx = elf_rd_u16(p + 6);
+
+ /* Skip the DSO's own undefined imports — they don't satisfy any
+ * undef in our consumer. Locals (STB_LOCAL) likewise aren't
+ * exported and would only confuse the resolver. */
+ if (st_shndx == SHN_UNDEF) continue;
+ u32 e_bind = ELF64_ST_BIND(st_info);
+ if (e_bind == STB_LOCAL) continue;
+
+ u32 nlen;
+ const char* nm = strtab_lookup(strtab, strtab_sz, st_name, &nlen);
+ if (!nlen) continue;
+ Sym sn = pool_intern(c->global, nm, nlen);
+
+ u32 e_type_field = ELF64_ST_TYPE(st_info);
+ u16 bind = elf_bind_to_obj(e_bind);
+ u16 kind = elf_type_to_kind(e_type_field, st_shndx);
+ u8 vis = elf_other_to_vis(st_other);
+
+ /* DSO exports land as defined symbols in OBJ_SEC_NONE with
+ * value=0. The consumer treats them as imports — see
+ * resolve_undefs in src/link/link_layout.c. */
+ obj_symbol_ex(ob, sn, (SymBind)bind, (SymVis)vis, (SymKind)kind,
+ OBJ_SEC_NONE, 0, 0, 0);
+ }
+
+ obj_finalize(ob);
+ return ob;
+}
diff --git a/src/obj/elf_reloc_aarch64.c b/src/obj/elf_reloc_aarch64.c
@@ -85,6 +85,14 @@ u32 elf_aarch64_reloc_to(u32 kind /* RelocKind */) {
return ELF_R_AARCH64_TLSLE_LDST64_TPREL_LO12;
case R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC:
return ELF_R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC;
+ case R_AARCH64_GLOB_DAT:
+ return ELF_R_AARCH64_GLOB_DAT;
+ case R_AARCH64_JUMP_SLOT:
+ return ELF_R_AARCH64_JUMP_SLOT;
+ case R_AARCH64_RELATIVE:
+ return ELF_R_AARCH64_RELATIVE;
+ case R_AARCH64_COPY:
+ return ELF_R_AARCH64_COPY;
default:
return ELF_R_AARCH64_NONE;
}
@@ -160,6 +168,14 @@ u32 elf_aarch64_reloc_from(u32 elf_type) {
return R_AARCH64_TLSLE_LDST64_TPREL_LO12;
case ELF_R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC:
return R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC;
+ case ELF_R_AARCH64_GLOB_DAT:
+ return R_AARCH64_GLOB_DAT;
+ case ELF_R_AARCH64_JUMP_SLOT:
+ return R_AARCH64_JUMP_SLOT;
+ case ELF_R_AARCH64_RELATIVE:
+ return R_AARCH64_RELATIVE;
+ case ELF_R_AARCH64_COPY:
+ return R_AARCH64_COPY;
default:
return (u32)-1; /* sentinel */
}
diff --git a/src/obj/obj.h b/src/obj/obj.h
@@ -137,6 +137,16 @@ typedef enum RelocKind {
R_AARCH64_TLSLE_LDST32_TPREL_LO12_NC,
R_AARCH64_TLSLE_LDST64_TPREL_LO12,
R_AARCH64_TLSLE_LDST64_TPREL_LO12_NC,
+ /* Dynamic-only relocs: emitted into .rela.dyn / .rela.plt of an
+ * ET_DYN/ET_EXEC output and processed by the runtime loader. They
+ * never appear in ET_REL inputs from a compiler; the linker may
+ * synthesize them during dynamic-exe / shared-lib emit. The ELF
+ * type mapping recognizes them, but read_elf_dso currently skips
+ * an ET_DYN's .rela.* sections, so they are never applied. */
+ R_AARCH64_GLOB_DAT,
+ R_AARCH64_JUMP_SLOT,
+ R_AARCH64_RELATIVE,
+ R_AARCH64_COPY,
R_RV_HI20,
R_RV_LO12_I,
R_RV_LO12_S,
@@ -303,6 +313,17 @@ void emit_wasm(Compiler*, ObjBuilder*, Writer*);
/* ---- file format readers (for ld and objdump) ---- */
ObjBuilder* read_elf(Compiler*, const char* name, const u8* data, size_t len);
+/* ELF ET_DYN reader. Produces an ObjBuilder containing only the DSO's
+ * exported (dynsym) symbols. Defined dynsym entries land as ObjSyms
+ * with their original SymBind/SymKind so the linker's symbol-resolution
+ * pass can match them by name. The DSO's sections, relocations, and
+ * groups are all skipped — DSOs contribute no bytes to the output.
+ *
+ * If `soname_out` is non-NULL, *soname_out receives the DT_SONAME
+ * interned into the compiler's global Sym pool, or 0 if the DSO has
+ * no SONAME. */
+ObjBuilder* read_elf_dso(Compiler*, const char* name, const u8* data,
+ size_t len, Sym* soname_out);
ObjBuilder* read_coff(Compiler*, const char* name, const u8* data, size_t len);
ObjBuilder* read_macho(Compiler*, const char* name, const u8* data, size_t len);
ObjBuilder* read_wasm(Compiler*, const char* name, const u8* data, size_t len);
diff --git a/test/test.mk b/test/test.mk
@@ -86,14 +86,23 @@ test-link: lib $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER)
test-cg: lib $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER)
bash test/cg/run.sh
-# test-musl: end-to-end static-musl link/run on aarch64. Pulls a pinned
-# musl sysroot (test/musl/extract.sh — uses podman against Alpine 3.20),
-# builds rt/build/aarch64-linux/libcfree_rt.a for the soft-float / TF
-# builtins, and runs `cfree ld` against the real musl libc.a. Excluded
-# from the default `test` target because it needs podman and ~30s on
-# first run; opt-in via `make test-musl`.
-test-musl: bin rt-aarch64-linux
+# test-musl: end-to-end static + dynamic musl link/run on aarch64.
+# Pulls a pinned musl sysroot (test/musl/extract.sh — podman against
+# Alpine 3.20), builds rt/build/aarch64-linux/libcfree_rt.a for the
+# soft-float / TF builtins, and runs `cfree ld` against the real musl
+# libc.a (static variant) and libc.so (dynamic variant — see
+# doc/DYNLD.md). Excluded from the default `test` target because it
+# needs podman and ~30s on first run; opt-in via `make test-musl`.
+#
+# The sysroot is a real prerequisite via its PROVENANCE marker, so
+# later runs skip extraction unless the marker is missing or older
+# than extract.sh / Containerfile (or extract.sh -f forces a rebuild).
+MUSL_SYSROOT_MARKER = build/musl-sysroot/PROVENANCE
+
+$(MUSL_SYSROOT_MARKER): test/musl/extract.sh test/musl/Containerfile
@bash test/musl/extract.sh
+
+test-musl: bin rt-aarch64-linux $(MUSL_SYSROOT_MARKER)
@bash test/musl/run.sh
# Fail if libcfree.a depends on any external symbol not in the allowlist.