kit

git clone https://git.ryansepassi.com/git/kit.git

commit 0f61a9b9327b44012f4c9ef6059ea07381da2870
parent 1f72786aa4fc1d69eb8c0742efcd5c831b1ce3d9
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 11:44:20 -0700

obj/abi/link: MULTIOBJ Phase 1 — Mach-O seams + plan

Establishes the seams that let a second object format land additively
(see doc/MULTIOBJ.md).  No behavior change for ELF; aarch64-linux test
suites green (test-elf 37/37, test-link 119/119, test-cg 1549/1549).

- link_emit_image_writer dispatches by target.obj; Mach-O / COFF / Wasm
  panic with format-specific "unimplemented" diagnostics.
- ABI vtable selection keys on (arch, os); apple_arm64_vtable aliases
  AAPCS64's compute_func_info as a Phase 1 shim, with va_list overridden
  to char* (Darwin's actual shape — variadics are stack-passed).
- Build-id synthesis lifted out of link_elf.c into link_image_id.c so
  Mach-O LC_UUID will wrap the same 16-byte hash.
- Format-aware obj_secname_* helpers (init_array / fini_array /
  preinit_array / tdata / tbss); ELF returns the historical strings,
  Mach-O panics until the writer lands.
- src/obj/macho.h with MH_*, LC_*, nlist_64, ARM64_RELOC_*; reloc
  translator stubs in macho_reloc_aarch64.c (not reachable yet).
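
The build-id bullet above can be illustrated with a minimal sketch of a
two-stream FNV-1a identity hash. This is not the code in link_image_id.c:
DemoSegment, demo_fnv1a64, demo_mix_u64 and demo_image_id are hypothetical
names, and the sketch assumes each 64-bit field is fed as little-endian
bytes, whereas the code removed from link_elf.c hashes the fields in host
byte order. Only the seeds and the FNV prime are taken from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the linker's per-segment record. */
typedef struct {
  uint64_t vaddr;       /* load address of the segment */
  uint64_t file_size;   /* number of valid bytes behind `bytes` */
  const uint8_t* bytes; /* segment contents; may be NULL */
} DemoSegment;

/* One FNV-1a 64-bit pass over n bytes, continuing from state h. */
static uint64_t demo_fnv1a64(const uint8_t* data, size_t n, uint64_t h) {
  const uint64_t prime = 0x100000001b3ull;
  for (size_t i = 0; i < n; ++i) {
    h ^= (uint64_t)data[i];
    h *= prime;
  }
  return h;
}

/* Feed a 64-bit value as 8 little-endian bytes. Assumption of this
 * sketch: fixing the byte order keeps the digest host-independent. */
static uint64_t demo_mix_u64(uint64_t v, uint64_t h) {
  uint8_t le[8];
  for (int i = 0; i < 8; ++i) le[i] = (uint8_t)(v >> (i * 8));
  return demo_fnv1a64(le, sizeof le, h);
}

/* Two differently seeded streams over the same input give 128 bits.
 * Layout (vaddr, file_size) and content both feed the digest. */
void demo_image_id(const DemoSegment* segs, size_t nsegs, uint8_t out[16]) {
  uint64_t lo = 0xcbf29ce484222325ull; /* FNV-1a 64-bit offset basis */
  uint64_t hi = 0x14650fb0739d0383ull; /* second seed from the diff */
  assert(out != NULL);
  for (size_t i = 0; i < nsegs; ++i) {
    lo = demo_mix_u64(segs[i].vaddr, lo);
    lo = demo_mix_u64(segs[i].file_size, lo);
    hi = demo_mix_u64(segs[i].vaddr, hi);
    hi = demo_mix_u64(segs[i].file_size, hi);
    if (segs[i].bytes && segs[i].file_size) {
      lo = demo_fnv1a64(segs[i].bytes, (size_t)segs[i].file_size, lo);
      hi = demo_fnv1a64(segs[i].bytes, (size_t)segs[i].file_size, hi);
    }
  }
  /* Serialize both halves little-endian into the 16-byte identity. */
  for (int i = 0; i < 8; ++i) out[i] = (uint8_t)(lo >> (i * 8));
  for (int i = 0; i < 8; ++i) out[8 + i] = (uint8_t)(hi >> (i * 8));
}
```

Because the digest has no time or random input, the same image always
yields the same 16 bytes, which is what lets an ELF build-id note and a
Mach-O LC_UUID share one payload.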

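The format-aware section-name helpers listed in the commit message could
look roughly like the sketch below. It is an illustration under stated
assumptions, not the contents of obj_secnames.c: DemoObjFmt, DemoSecRole
and demo_secname are hypothetical names, and the sketch returns NULL for
formats without a writer, whereas the commit message says the real helpers
panic for Mach-O. The ELF strings are the conventional ELF section names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirror of the object-format axis. */
typedef enum {
  DEMO_OBJ_ELF,
  DEMO_OBJ_MACHO,
  DEMO_OBJ_COFF,
  DEMO_OBJ_WASM
} DemoObjFmt;

/* Hypothetical roles covering the helpers named in the commit message. */
typedef enum {
  DEMO_SEC_INIT_ARRAY,
  DEMO_SEC_FINI_ARRAY,
  DEMO_SEC_PREINIT_ARRAY,
  DEMO_SEC_TDATA,
  DEMO_SEC_TBSS,
  DEMO_SEC_COUNT
} DemoSecRole;

/* Returns the section name for `role` under `fmt`, or NULL when the
 * format has no writer yet (the real helpers panic instead). */
const char* demo_secname(DemoObjFmt fmt, DemoSecRole role) {
  static const char* const elf_names[DEMO_SEC_COUNT] = {
    ".init_array", ".fini_array", ".preinit_array", ".tdata", ".tbss",
  };
  assert((int)role >= 0 && (int)role < DEMO_SEC_COUNT);
  switch (fmt) {
    case DEMO_OBJ_ELF:
      return elf_names[role]; /* unchanged historical strings */
    case DEMO_OBJ_MACHO:
    case DEMO_OBJ_COFF:
    case DEMO_OBJ_WASM:
      return NULL; /* no writer yet in this sketch */
  }
  return NULL; /* unknown format value */
}
```

Keeping every name behind one function means a later Mach-O case is a new
branch in the switch and does not touch the call sites.
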
Diffstat:
A doc/MULTIOBJ.md | 519 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/abi/abi.c | 13 ++++++++++++-
M src/abi/abi_aapcs64.c | 4 +++-
A src/abi/abi_apple_arm64.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/abi/abi_internal.h | 3 +++
M src/link/link.c | 23 ++++++++++++++++++++---
M src/link/link_elf.c | 59 ++++++++++++++++-------------------------------------------
A src/link/link_image_id.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/link/link_internal.h | 12 +++++++++++-
M src/link/link_layout.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------
A src/obj/macho.h | 259 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/obj/macho_reloc_aarch64.c | 37 +++++++++++++++++++++++++++++++++++++
M src/obj/obj.h | 16 ++++++++++++++++
A src/obj/obj_secnames.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
14 files changed, 1190 insertions(+), 80 deletions(-)

diff --git a/doc/MULTIOBJ.md b/doc/MULTIOBJ.md @@ -0,0 +1,519 @@ +# MULTIOBJ — plan for adding a second object format (Mach-O) + +Scope: turn cfree from an ELF-only compiler/linker into one that +supports multiple `(arch, os, objfmt)` triples on the **objfmt** axis. +The first new objfmt is Mach-O, targeting macOS on aarch64 and (once +x64 codegen lands) x86_64. PE/COFF for Windows is the next peer; the +seams introduced here are designed so PE/COFF is purely additive on +top — new files, no edits to format-aware ones except the dispatch +tables. + +Companion to `MULTIARCH.md`. That doc was the **arch** axis; this is +the **objfmt** axis. The two are intentionally separate critical +paths: arch and objfmt seams land independently, validated against +each other only at the `(arch, os, objfmt)` cross-product on the test +matrix. + +--- + +## Status + +- [x] **Phase 1** — seams + format-aware bookkeeping + - [x] Linker emit dispatch — per-format panics (§3.3) + - [x] Build-id moved to format-agnostic `link_image_id_compute` (§3.5) + - [x] ABI vtable selection keys on `(arch, os)`; `apple_arm64_vtable` + aliases AAPCS64 (§3.4) + - [x] `obj_secname_*` helpers for init/fini/preinit/tdata/tbss (§3.5) + - [x] `src/obj/macho.h` + `macho_reloc_aarch64.c` stubs (§3.1, §3.2) + - [x] ELF suite green (test-elf 37/37, test-link 119/119, test-cg 1549/1549) +- [ ] **Phase 2** — Mach-O object writer + reader (MH_OBJECT) +- [ ] **Phase 3** — Mach-O linker (`link_emit_macho`) + +--- + +## 1. Today + +What exists for Mach-O already: + +- `ObjExtKind` carries `OBJ_EXT_MACHO` and `OBJ_EXT_COFF` + (`src/obj/obj.h:75-81`). +- `emit_macho` / `read_macho` (and `emit_coff` / `read_coff`) are + declared in `obj.h:344-364` and stubbed in `src/api/stubs.c:73-94` + — they panic with `unimplemented`. +- `cfree_compile_obj_emit` (`src/api/pipeline.c:306-321`) already + dispatches into them by `c->target.obj`. 
+- `cfree_detect_target` recognizes Mach-O magic, reads `cputype`, + and populates `(arch, os=MACOS, obj=MACHO)` + (`src/api/detect.c:180-214`). +- The driver's target parser accepts `darwin`/`macos` + (`driver/target.c:82`); `driver/env.c:754` defaults the host OS + on a Darwin build. +- `RelocKind` is already per-arch in shape (e.g. `R_AARCH64_*`, + `R_X64_*`); the per-arch ELF reloc translator is split into + `obj/elf_reloc_<arch>.c`. Mach-O reloc translators slot in as + peers. + +What is ELF-only and panics on Mach-O today: + +- `emit_macho` / `read_macho` themselves + (`src/api/stubs.c:73,90`). +- `link_emit_image_writer` (`src/link/link.c:422-432`) routes only + `CFREE_OBJ_ELF` to `link_emit_elf`; every other case panics with + "only ELF is implemented". +- `link_emit_elf` writes an ET_EXEC/ET_DYN ELF (`src/link/link_elf.c`, + `src/link/link_dyn.c`). Nothing exists for the Mach-O peer. +- Dynamic-link plumbing (`src/link/link_dyn.c::layout_dyn`) emits + `.interp` / `.dynsym` / `.dynstr` / `.gnu.hash` / `.rela.*` / + `.plt` / `.got.plt` / `.dynamic` — all ELF-shaped. Mach-O has its + own dyld machinery (LC_DYLD_INFO_ONLY / DYLD_CHAINED_FIXUPS / + LC_SYMTAB / LC_DYSYMTAB / `__la_symbol_ptr` / `__stubs`). +- ABI vtable selection (`src/abi/abi.c:286-300`) keys on + `target.arch` only; macOS-arm64 has a distinct ABI from AAPCS64 + (variadic-args-on-stack, `char`/`short` promoted on stack + arguments). +- The test harness (`test/lib/exec_target.sh`) batches via + `podman run` against `linux/<arch>` images — a Linux-only + execution path. Native Mach-O execution on the host is the new + path. + +--- + +## 2. 
Target slice for first Mach-O milestone + +| axis | value | +|-----------|--------------------------------------| +| arch | `CFREE_ARCH_ARM_64` | +| os | `CFREE_OS_MACOS` | +| objfmt | `CFREE_OBJ_MACHO` | +| ABI | Apple ARM64 (Darwin variant) | +| codemodel | `CFREE_CM_SMALL` (default) | + +Why this slice first, not x86_64-darwin or arm64-windows: + +- arm64 codegen is the validated one; x64 is still on the + `MULTIARCH.md` Phase 3 critical path. +- Apple no longer ships x86_64 Macs; x86_64-darwin is interesting + only as a cross target. +- Windows is on the roadmap but is a separate format **and** a + separate ABI; bundling it into the first Mach-O milestone would + blur which seam each defect lives behind. + +The pivot once arm64-darwin lands: + +- **x86_64-darwin** — additive; needs `macho_reloc_x86_64.c`, + Apple-x64 ABI vtable (close to SysV-x64 with quirks). +- **arm64-windows / x86_64-windows** — peer of this plan with + `coff_emit.c` / `coff_read.c` / `coff_reloc_<arch>.c` and the + Microsoft ABI vtable. The seams below are shaped so this is + purely additive. + +Out of scope for v1 of Mach-O: + +- Universal (fat) binaries — not a hard requirement; skip until + someone needs them. The shape is well-defined: a fat header + prepended to per-arch slices. +- Bitcode embedding, codesigning, entitlements, `__LINKEDIT` + beyond what dyld needs, debug `__DWARF` segment (covered later + by the dwarf side, not the object side). +- ObjC metadata sections (`__objc_*`). Not relevant for a C + compiler. + +--- + +## 3. The seams + +### 3.1 Object writer / reader — peers of `elf_emit.c` / `elf_read.c` + +**Decision:** introduce `src/obj/macho_emit.c` and +`src/obj/macho_read.c` as peers of the ELF pair, sitting behind the +existing `emit_macho` / `read_macho` declarations in `obj.h`. No +additional dispatch is needed at this layer — `pipeline.c:306-321` +already routes by `target.obj`. 
+ +Round-trip invariant (matching `DESIGN.md §5.5`): `read_macho` of a +`macho_emit` output must produce an `ObjBuilder` shape-equivalent +to the input, modulo (a) Mach-O's mandatory `(segname, sectname)` +pairing for sections and (b) any synthesized `N_SECT` / `N_OSO` +symbols. + +The neutral `ObjBuilder` model accommodates Mach-O without a +schema break: + +- `Section.name` is already a single `Sym`. Mach-O writers split it + by convention: when a section's name string starts with + `__TEXT,__text` (or any other comma-separated form), the writer + takes the prefix as `segname` and the suffix as `sectname`. When + the name lacks a comma (the common case for ELF-shaped input), + the writer derives `segname` from `SecKind` (`SEC_TEXT` → + `__TEXT`, `SEC_RODATA` → `__TEXT,__const`, `SEC_DATA` → + `__DATA`, `SEC_BSS` → `__DATA,__bss`). +- `SecKind` / `SecFlag` map cleanly onto Mach-O `S_*` section + types and `S_ATTR_*` attributes. The reverse mapping (read side) + uses the existing `Section.ext_type` / `Section.ext_flags` + escape hatch (already present, see `obj.h:209-217`) for any + Mach-O-only types we don't want to lose on round-trip. +- `SymBind` / `SymKind` / `SymVis` cover Mach-O's `N_EXT` / + `N_PEXT` / `N_TYPE` adequately; `Section.ext_kind` is set to + `OBJ_EXT_MACHO` when reading so the writer knows to preserve + format-specific fields. (The same escape hatch will be used by + COFF.) +- Symbol bookkeeping: Mach-O requires symbols to be partitioned + into local / external-defined / external-undefined for + `LC_DYSYMTAB`. The partitioning is computed at write time from + the `ObjSym.bind` / `ObjSym.section_id` fields — no schema + change. + +Header includes: a new `src/obj/macho.h` peer of `obj/elf.h` for +the on-disk structures (`mach_header_64`, `segment_command_64`, +`section_64`, `symtab_command`, `nlist_64`, `relocation_info`). + +### 3.2 Per-arch Mach-O reloc translator + +`obj/elf_reloc_<arch>.c` is the model. 
Add: + +- `src/obj/macho_reloc_aarch64.c` — `RelocKind` ↔ + `ARM64_RELOC_*` (UNSIGNED, BRANCH26, PAGE21, PAGEOFF12, + GOT_LOAD_PAGE21, GOT_LOAD_PAGEOFF12, POINTER_TO_GOT, + TLVP_LOAD_PAGE21, TLVP_LOAD_PAGEOFF12, ADDEND, SUBTRACTOR). +- `src/obj/macho_reloc_x86_64.c` — `RelocKind` ↔ `X86_64_RELOC_*` + (UNSIGNED, SIGNED, BRANCH, GOT, GOT_LOAD, SUBTRACTOR, TLV, + SIGNED_1/2/4). Lands when x64 codegen does (post + `MULTIARCH.md` Phase 3). + +Two Mach-O-specific complications the translator absorbs: + +- **`ARM64_RELOC_ADDEND` pairs.** Mach-O encodes addends out-of- + band by emitting a leading `ARM64_RELOC_ADDEND` reloc carrying + the addend, immediately followed by the real reloc. The existing + `Reloc.pair` byte (`obj.h:226`) already exists for this kind of + paired-reloc shape (it was added with the same semantics in + mind). The translator emits the pair on write and collapses the + pair on read. +- **`ARM64_RELOC_SUBTRACTOR`** is two relocs — a SUBTRACTOR + followed by an UNSIGNED — modeling `B - A` as the resolved + value. Cfree's IR doesn't currently emit DWARF-style + difference relocs (the only consumer is `eh_frame` / + `compact_unwind` — not on the v1 critical path); leave them + out of `cgtarget` and panic in the writer if seen. Reader + recognizes them so a clang-built object round-trips through + `objdump`. + +**Decision:** no new `RelocKind` enum entries are needed for +Mach-O. The kinds are already arch-suffixed and the translator +pattern keeps the format-specific encoding local. We do not split +`R_AARCH64_PAGE21` from a hypothetical `R_MACHO_AARCH64_PAGE21` — +the underlying semantic (the page-relative ADRP fixup) is the +same; the translator picks the right ELF or Mach-O code. + +### 3.3 Linker emit dispatch + +`link_emit_image_writer` (`src/link/link.c:422-432`) is the seam. 
+Replace the single ELF case + panic with a switch: + +``` +case CFREE_OBJ_ELF: link_emit_elf(img, w); return; +case CFREE_OBJ_MACHO: link_emit_macho(img, w); return; +case CFREE_OBJ_COFF: link_emit_coff(img, w); return; /* later */ +case CFREE_OBJ_WASM: /* later */ +``` + +`link_emit_macho` lives in a new `src/link/link_macho.c`. It is +**not** a thin reskin of `link_emit_elf`: the LinkImage model +(segments / sections / symbols / reloc-applies) is largely shared, +but Mach-O's load-command shape is wholly different from ELF +program headers, and dyld bookkeeping is incompatible enough that +trying to share `link_dyn.c::layout_dyn` would be more work than a +peer. + +Concrete shape of `link_macho.c`: + +- Plan load commands: `LC_SEGMENT_64` × N (segments map to + Mach-O segments, one per `LinkSegment`), `LC_SYMTAB`, + `LC_DYSYMTAB`, `LC_BUILD_VERSION`, `LC_DYLD_INFO_ONLY` (or + `LC_DYLD_CHAINED_FIXUPS` for modern dyld; pick one — see + decision below), `LC_LOAD_DYLINKER`, `LC_LOAD_DYLIB` × N for + imported DSOs (peer of ELF DT_NEEDED), `LC_MAIN` (or + `LC_UNIXTHREAD` for static), `LC_FUNCTION_STARTS`, + `LC_DATA_IN_CODE`, `LC_UUID`, `LC_SOURCE_VERSION`. +- Synthesize `__LINKEDIT` segment containing symtab/strtab, + dyld export trie, indirect-symbol table, function-starts + table, code-signature placeholder. +- Synthesize `__TEXT,__stubs` (arm64: 12-byte stubs) and + `__DATA,__la_symbol_ptr` (lazy pointers) for imported + function calls; arm64 BL → stub. Or, on modern macOS, + go straight to chained fixups + non-lazy binding (no + `__stubs`). +- Apply relocations against the chosen image base + (`MH_PIE`-only for v1). + +**Decision: `LC_DYLD_CHAINED_FIXUPS` for v1, not +`LC_DYLD_INFO_ONLY`.** Chained fixups are the modern macOS path +(11+); they are smaller, simpler to emit (no opcode encoder for +the bind-info stream), and Apple has been deprecating the legacy +path. A consequence: cfree-emitted Mach-O exes do not run on +macOS 10.15 or older. 
That is acceptable for v1 — bring it up +only if a user complains. The legacy-bind-info encoder lands later +as an additional path, gated behind a target-min-version check. + +`link_dyn.c` stays ELF-only and is **not** generalized. The +overlap with `link_macho.c` is "we both need a list of imported +DSOs and exported symbols" — which the LinkImage already carries. +Generalizing would force a lowest-common-denominator layer that +serves neither side well. + +### 3.4 ABI vtable — widen selection to `(arch, os)` + +`abi.c::select_vtable` (lines 286-300) keys on `target.arch` alone. +For aarch64 this is wrong on macOS: + +- Apple ARM64 passes variadic arguments on the stack only (not in + x0-x7), and promotes `char`/`short` to `int` for stack args + even when the type is otherwise passed register-only. AAPCS64 + passes variadic args in registers like fixed args. +- `_Bool` is 8 bits on Darwin (matching most other platforms; + AAPCS64 also says 8 bits, so no divergence here — but the + rules diverge enough that the vtable must be distinct). + +**Decision:** widen the switch in `select_vtable` to key on the +`(target.arch, target.os)` pair. Add `apple_arm64_vtable` in a new +`src/abi/abi_apple_arm64.c`. When `(arch, os)` is `(ARM_64, MACOS)`, +install it; otherwise keep AAPCS64. + +Mechanically: + +- Initial implementation can be a thin shim: copy `aapcs64_vtable` + and override only `compute_func_info`'s variadic handling and + the by-value-on-stack promotion rule. Avoid the temptation to + factor a "shared aarch64 base" — two implementations with a + shared static helper for the register classification is enough + abstraction. +- `va_list` on Apple ARM64 is a single `char*` (much simpler than + AAPCS64's struct with five fields). The vtable's `va_list_type` + hook returns the right type per ABI — already structured for + this. +- Apple x86_64 uses SysV-x64 with minor differences (red zone is + the same; varargs use a different `__va_list_tag` layout? — no, + same). 
When x64 lands, `apple_x64_vtable` may be a literal + re-export of `sysv_x64_vtable`; revisit then. +- Microsoft x64 ABI (Windows) is meaningfully different (4 + register args, shadow space, varargs in registers); it gets its + own `ms_x64_vtable` peer when COFF lands. + +### 3.5 Format-aware bookkeeping in the linker layout + +`link_layout.c` emits a few ELF-shaped artifacts that need to be +either generalized or dispatched: + +- **Build-id note** (`.note.gnu.build-id`) is ELF-specific. Mach-O + uses `LC_UUID`. Move the build-id synthesis out of `layout` into + a per-format hook called by the format-specific emitter. + Decision: layout produces a 16-byte image identity (a hash of + the post-shift section bytes), and the format emitter packages + it as a build-id note (ELF) or `LC_UUID` (Mach-O) or a debug + directory (COFF/PE). One source of truth for the bytes. +- **TLS layout.** ELF uses `PT_TLS` + per-arch tpoff relocs; + Mach-O uses `__thread_vars` / `__thread_data` / `__thread_bss` + sections and `tlv_descriptor` records (a function pointer + key + + offset). The TLS lowering in cgtarget is already arch-aware; + the **section name choice** in cg/abi must become format-aware. + Add a `target.obj`-keyed dispatch where TLS section names are + picked. +- **Init/fini.** ELF uses `.init_array` / `.fini_array`; Mach-O + uses `__DATA,__mod_init_func` / `__DATA,__mod_term_func`. Same + format-aware-section-name dispatch. +- **Common symbols.** ELF emits `SK_COMMON` as `SHN_COMMON`; + Mach-O lays them out into `__DATA,__common` at link time. The + read/write paths absorb this — no `ObjBuilder` change. + +### 3.6 Driver: native execution path on Darwin hosts + +`driver/cc.c` and friends already understand `darwin`/`macos` at +the parse layer. The execution path for tests is the only new +thing: an arm64-darwin Mach-O executable runs natively on the +Darwin/arm64 host, no podman, no qemu. 
Detection rule for +`test/lib/exec_target.sh`: + +- A new `exec_target_darwin_native` predicate that returns 0 if + the host is `darwin/<matching-arch>` and the case's target os + is `MACOS`. Bypass the podman / qemu branches entirely; just + `chmod +x` and exec. +- For Linux hosts, executing macOS binaries is not supported. Mark + cases as XFAIL on non-Darwin hosts. +- For Darwin hosts, executing Linux binaries continues to flow + through podman as today — Apple's Virtualization.framework via + Podman Desktop or `podman machine` is already the working path + for the existing aarch64-linux suite when developing on a Mac. + +--- + +## 4. Phasing + +Three phases. Each is independently mergeable; phase 1 lands with +no behavior change, phase 2 lands a working `cfree -c` Mach-O +output, phase 3 lands `cfree` link-to-exe. + +### Phase 1 — seams + format-aware bookkeeping + +Pure refactors. No Mach-O bytes emitted yet; ELF output unchanged +byte-for-byte. + +1. **Linker emit dispatch** (§3.3). `link_emit_image_writer` (the + one site at `link/link.c:422`) gains a switch over `target.obj`; + `CFREE_OBJ_MACHO` and `CFREE_OBJ_COFF` panic with + `unimplemented` (replacing the catch-all "only ELF" panic with + per-format ones). Move the build-id synthesis out of layout + into a format-agnostic image-identity hook (§3.5). +2. **ABI vtable widening** (§3.4). `abi.c::select_vtable` keys on + `(arch, os)`. Add `src/abi/abi_apple_arm64.c` with + `apple_arm64_vtable` initially aliasing AAPCS64. The + variadic-on-stack and small-int-promotion overrides land in + phase 2 alongside the macho writer (so a build-it-and-see-what- + breaks loop on a real macOS toolchain catches divergences). + Same shape for `apple_x64_vtable` (deferred to whenever x64 + lands on Darwin). +3. **Format-aware section-name dispatch** (§3.5). TLS, + init/fini, and common-symbol section naming become a + target-keyed function rather than the hard-coded ELF strings + currently in `cg` / `abi` / `link_layout`. 
ELF behavior is + unchanged; the dispatch is a fall-through to the current + strings until a Mach-O case is added. +4. **Mach-O headers and reloc translator stubs** (§3.1, §3.2). + New `src/obj/macho.h` with the on-disk structures. New + `src/obj/macho_reloc_aarch64.c` with translator stubs (panic + on call). No behavior change — neither is reachable yet. + +Exit criterion: ELF test suite green and byte-for-byte identical +output objects on a representative `test/cg/` set. + +### Phase 2 — Mach-O object writer + reader (MH_OBJECT) + +Now the writer produces real Mach-O bytes. No linker work yet — +we validate by running a clang-built binary that links a +cfree-emitted `.o`. + +1. **`obj/macho_emit.c`** — writes `MH_OBJECT` from a finalized + `ObjBuilder`. Layout: header → load commands + (`LC_SEGMENT_64`-with-everything, `LC_SYMTAB`, + `LC_DYSYMTAB`, `LC_BUILD_VERSION`) → section bytes → reloc + tables → symtab/strtab in `__LINKEDIT`. (`MH_OBJECT` keeps + everything in one `LC_SEGMENT_64`.) +2. **`obj/macho_reloc_aarch64.c`** — fill in the translators + stubbed in phase 1. The `ADDEND`-pair handling on write; the + `SUBTRACTOR`-pair handling on read. +3. **`obj/macho_read.c`** — parses `MH_OBJECT` (and `MH_DYLIB` + for the linker's DSO-input path) into an `ObjBuilder`. +4. **Apple ARM64 ABI deltas.** Variadic-on-stack and small-int + promotion in `abi_apple_arm64.c` (the Phase 1 alias is no + longer correct once anything calls a varargs function). +5. **Native exec helper** (§3.6). `test/lib/exec_target.sh` + gains the Darwin-native branch. +6. **Smoke test.** A `test/cg/` case compiles to `.o` via cfree + targeting `arm64-apple-macos`, links via host `clang`, and + runs natively. Greens the standard pass/fail line. +7. **`objdump` round-trip.** A Mach-O `.o` produced by clang + round-trips through `read_macho` → `emit_macho` and + re-`read_macho` produces an equivalent `ObjBuilder`. This is + the standard cfree round-trip discipline. 
+ +Exit criterion: `cfree -c` for `arm64-apple-macos` produces an +object that links via the host `ld` / `clang` into a runnable +executable, and clang-produced Mach-O round-trips through +cfree's reader/writer. + +### Phase 3 — Mach-O linker (`link_emit_macho`) + +Now cfree links its own Mach-O executable end-to-end, no clang. + +1. **`link/link_macho.c`** — `link_emit_macho(img, w)` peer of + `link_emit_elf`. `MH_EXECUTE` + `MH_PIE`. Modern dyld path: + `LC_DYLD_CHAINED_FIXUPS` (§3.3 decision). `__TEXT,__stubs` + for imported function calls; `__DATA_CONST,__got` for + imported data. +2. **Imported-DSO load commands.** `LC_LOAD_DYLIB` per imported + `.dylib` input (peer of ELF's `DT_NEEDED`). The Phase 1 + linker model for DSO inputs (`LINK_INPUT_DSO_BYTES`) already + carries the soname-equivalent (Mach-O's `install_name`); on + the read side, `read_macho` extracts it from `LC_ID_DYLIB`. +3. **`LC_MAIN` entry.** The entry symbol resolution + (`img->entry_sym`) already happens generically; the format + emitter just packages it as `LC_MAIN`'s `entryoff`. +4. **First end-to-end exe.** A `test/cg/` hello-world targeting + `arm64-apple-macos` compiles + links via cfree and runs on + the host. Exit code threads through the standard pass/fail + line. +5. **libSystem coverage.** `printf` / `errno` / `malloc` — + linking against `libSystem.B.dylib` (the umbrella that + re-exports libc, libm, libdyld, libpthread). Sysroot extraction + for the Darwin SDK lives in a new `test/sdk/macos/` peer of + `test/libc/{musl,glibc}/` — `xcrun --show-sdk-path` is the + source on Darwin hosts; cross-from-Linux is out of scope. +6. **Universal binaries (deferred)** — fat-header wrapper around + per-arch slices. Lands when a user wants it, not earlier. + +Exit criterion: each Mach-O milestone owns a `test/cg/` case +running natively on a Darwin/arm64 host; pass/fail line green. +ELF suite still green. + +--- + +## 5. 
PE/COFF as a peer (forward look) + +The seams in §3 are sized for COFF too: + +- `obj/coff_emit.c` / `obj/coff_read.c` peer `obj/macho_*`. +- `obj/coff_reloc_<arch>.c` peer `obj/macho_reloc_*`. +- `link/link_coff.c::link_emit_coff` peer `link_macho.c`. PE + uses optional headers + data directories instead of Mach-O's + load commands; the LinkImage model is still adequate. +- `abi/abi_ms_x64.c` (Win64 ABI) and a hypothetical + `abi_ms_arm64.c` (Windows on ARM ABI) as ABI vtable peers. +- Windows-on-Linux execution is wine-shaped; on a Windows host + it is native. The exec helper grows a Windows native branch; + on Linux/Darwin hosts, COFF cases default to XFAIL until a wine + branch is added. + +The ABI vtable's `(arch, os)` keying naturally captures Microsoft +ABI vs SysV vs Apple — Windows-arm64 picks `ms_arm64`, not +`apple_arm64`, even though both are arm64. + +PE/COFF gets its own `MULTIOBJ_PE.md` design pass when its +critical path opens; this doc reserves the seams. + +--- + +## 6. Naming conventions + +For the new files and exposed symbols: + +- Mach-O code lives under `src/obj/macho_*` and + `src/link/link_macho.c`. Identifiers use the `macho_` prefix + (`macho_emit`, `macho_reloc_aarch64_to`, + `link_emit_macho`). +- Apple ABIs use the `apple_` prefix (`apple_arm64_vtable`, + `apple_x64_vtable`). The host OS is the discriminator; any + future Apple-only-on-arch prefix (e.g. for an iOS-specific + variant) would extend this — but iOS / tvOS / watchOS share + the same ABIs as macOS for the relevant arches, so no second + prefix is needed. +- Mach-O reloc translators stay arch-suffixed: + `macho_reloc_aarch64.c`, `macho_reloc_x86_64.c` (matches the + ELF translator naming). +- Win64 / Windows-ARM64 (deferred) use the `ms_` prefix + (`ms_x64_vtable`, `ms_arm64_vtable`). COFF code uses the + `coff_` prefix. + +--- + +## 7. 
Validation gates + +A change in this plan is "done" when: + +- **Phase 1**: ELF test suite still green, byte-for-byte identical + output objects on a representative set of `test/cg/` cases. +- **Phase 2**: Mach-O `.o` produced by cfree links via host + `clang` into a runnable arm64-darwin executable; clang-built + `.o` round-trips through cfree's reader/writer; ELF suite still + green. +- **Phase 3**: `cfree -c` + `cfree` linker produces an + arm64-darwin Mach-O exe that runs natively on the Darwin host; + per-milestone `test/cg/` cases green; ELF suite still green. diff --git a/src/abi/abi.c b/src/abi/abi.c @@ -283,10 +283,21 @@ const Type* abi_va_list_type(TargetABI* a, Pool* p) { /* ---- lifecycle ---- */ +/* Vtable selection keys on (arch, os): the Apple variants of the + * ARM64 / x86_64 ABIs differ from AAPCS64 / SysV-x64 in calling + * convention details (variadic-on-stack, va_list shape, stack-arg + * promotion). Microsoft x64 / Windows-on-ARM64 will land here as + * additional (arch, OS_WINDOWS) cases when COFF support arrives. + * See doc/MULTIOBJ.md §3.4. */ static const ABIVtable* select_vtable(Compiler* c) { switch (c->target.arch) { case CFREE_ARCH_ARM_64: - return &aapcs64_vtable; + switch (c->target.os) { + case CFREE_OS_MACOS: + return &apple_arm64_vtable; + default: + return &aapcs64_vtable; + } case CFREE_ARCH_X86_64: return &sysv_x64_vtable; case CFREE_ARCH_RV64: diff --git a/src/abi/abi_aapcs64.c b/src/abi/abi_aapcs64.c @@ -96,7 +96,9 @@ static void classify_one(TargetABI* a, const Type* t, ABIArgInfo* out, } } -static ABIFuncInfo* aapcs64_compute_func_info(TargetABI* a, const Type* fn) { +/* Non-static so apple_arm64_compute_func_info can delegate to it during + * the Phase 1 alias period — see abi_apple_arm64.c. 
*/ +ABIFuncInfo* aapcs64_compute_func_info(TargetABI* a, const Type* fn) { ABIFuncInfo* info = arena_new(a->c->tu, ABIFuncInfo); memset(info, 0, sizeof *info); diff --git a/src/abi/abi_apple_arm64.c b/src/abi/abi_apple_arm64.c @@ -0,0 +1,55 @@ +/* Apple ARM64 (Darwin) ABI dispatch. + * + * Phase 1 of doc/MULTIOBJ.md: vtable selection now keys on + * (target.arch, target.os), and (ARM_64, MACOS) lands here instead of + * AAPCS64. The two ABIs diverge in: + * + * 1. Variadics — Apple ARM64 passes ALL variadic arguments on the + * stack (no v0-v7 / x0-x7 routing for the `...` portion of the + * arglist). AAPCS64 passes them in registers like fixed args. + * Consequence: `va_list` is just `char*`, not the AAPCS64 + * five-field struct. + * + * 2. Stack-arg promotion — small integer arguments passed on the + * stack are promoted to 4 bytes minimum (so `char` and `short` + * occupy 4 stack bytes each, not 1 / 2). + * + * Phase 1 ships a thin shim that aliases AAPCS64's compute_func_info + * — variadics and the stack-arg promotion override land in Phase 2 + * alongside the macho writer, so the build-it-and-see-what-breaks + * loop on a real macOS toolchain catches divergences from the spec. + * The va_list type is overridden now because it is cheap and the + * divergence is unambiguous. + * + * Until macho_emit lands, this vtable is reachable only on the + * (ARM_64, MACOS) target slice — itself unreachable end-to-end — + * so the alias is safe. See abi_aapcs64.c for the AAPCS64 source. */ + +#include "abi/abi_internal.h" +#include "core/core.h" +#include "core/pool.h" +#include "type/type.h" + +extern ABIFuncInfo* aapcs64_compute_func_info(TargetABI*, const Type*); + +static ABIFuncInfo* apple_arm64_compute_func_info(TargetABI* a, + const Type* fn) { + /* Phase 2: spell out the Darwin variadic / stack-arg-promotion + * deltas. For now the AAPCS64 classifier produces ABI-correct + * output for the fixed-args-only programs in the v1 cg suite. 
*/
+  return aapcs64_compute_func_info(a, fn);
+}
+
+static const Type* apple_arm64_va_list_type(TargetABI* a, Pool* p) {
+  /* Apple ARM64: __builtin_va_list is `char*`, full stop. No struct,
+   * no five fields — variadics are stack-passed so the implementation
+   * just walks a byte cursor. (AAPCS64 needs the five-field struct
+   * because variadics may originate in v0-v7 / x0-x7.) */
+  (void)a;
+  return type_ptr(p, type_prim(p, TY_CHAR));
+}
+
+const ABIVtable apple_arm64_vtable = {
+    .compute_func_info = apple_arm64_compute_func_info,
+    .va_list_type = apple_arm64_va_list_type,
+};
diff --git a/src/abi/abi_internal.h b/src/abi/abi_internal.h
@@ -21,6 +21,9 @@ typedef struct ABIVtable {
 extern const ABIVtable aapcs64_vtable;
 extern const ABIVtable sysv_x64_vtable;
 extern const ABIVtable rv64_vtable;
+/* Apple Darwin variants — selected when (arch, os) matches. See
+ * abi.c::select_vtable and doc/MULTIOBJ.md §3.4. */
+extern const ABIVtable apple_arm64_vtable;
 
 /* Shared TargetABI internals. The struct definition is here so each ABI
  * TU can reach into the per-TU caches via TargetABI*. abi.c owns the
diff --git a/src/link/link.c b/src/link/link.c
@@ -417,7 +417,12 @@ void link_resolve_extend(Linker* l, LinkImage* img) {
                  "yet implemented");
 }
 
-/* ---- public emit dispatcher ---- */
+/* ---- public emit dispatcher ----
+ *
+ * Per-format peers of link_emit_elf: link_emit_macho (Phase 3 of
+ * doc/MULTIOBJ.md) and link_emit_coff (deferred) slot in here. Until
+ * those land, the unimplemented cases panic with a format-specific
+ * diagnostic rather than the catch-all. */
 void link_emit_image_writer(LinkImage* img, Writer* w) {
   if (!img || !w) return;
@@ -425,8 +430,20 @@ void link_emit_image_writer(LinkImage* img, Writer* w) {
     case CFREE_OBJ_ELF:
       link_emit_elf(img, w);
       return;
-    default:
+    case CFREE_OBJ_MACHO:
+      compiler_panic(img->c, no_loc(),
+                     "link_emit_image_writer: Mach-O linker emit not yet "
+                     "implemented (see doc/MULTIOBJ.md Phase 3)");
+    case CFREE_OBJ_COFF:
+      compiler_panic(img->c, no_loc(),
+                     "link_emit_image_writer: COFF/PE linker emit not yet "
+                     "implemented");
+    case CFREE_OBJ_WASM:
       compiler_panic(img->c, no_loc(),
-                     "link_emit_image_writer: only ELF is implemented");
+                     "link_emit_image_writer: Wasm linker emit not yet "
+                     "implemented");
   }
+  compiler_panic(img->c, no_loc(),
+                 "link_emit_image_writer: unknown obj format %u",
+                 (u32)img->c->target.obj);
 }
diff --git a/src/link/link_elf.c b/src/link/link_elf.c
@@ -203,12 +203,17 @@ static void shift_image_addresses(LinkImage* img, u64 delta) {
 }
 
 /* AArch64 ELF ABI: the per-thread TLS block starts at TP + 16 bytes
- * (the TCB sits ahead of the TLS image). */
-#define AARCH64_TCB_SIZE 16ull
+ * (the TCB sits ahead of the TLS image). RISC-V psABI normally points
+ * tp at the start of the TLS image; the cfree harness's start.c
+ * places a 16-byte TCB ahead of .tdata and biases tp accordingly, so
+ * the TPREL offset for both arches is (target - tls_vaddr) + 16. */
+#define TLS_TCB_SIZE 16ull
 
 static int reloc_is_tlsle(RelocKind k) {
   return k == R_AARCH64_TLSLE_ADD_TPREL_HI12 ||
-         k == R_AARCH64_TLSLE_ADD_TPREL_LO12_NC;
+         k == R_AARCH64_TLSLE_ADD_TPREL_LO12_NC ||
+         k == R_RV_TPREL_HI20 || k == R_RV_TPREL_LO12_I ||
+         k == R_RV_TPREL_LO12_S;
 }
 
 static int reloc_is_abs(RelocKind k) { return k == R_ABS32 || k == R_ABS64; }
@@ -257,7 +262,7 @@ static void apply_all_relocs(LinkImage* img, u64 img_base) {
        * TLS image start plus the 16-byte TCB. Both vaddrs are
        * in the same (post-shift, image-relative) coordinate
        * system, so img_base cancels out. */
-      S = (tgt->vaddr - img->tls_vaddr) + AARCH64_TCB_SIZE;
+      S = (tgt->vaddr - img->tls_vaddr) + TLS_TCB_SIZE;
     } else {
       S = tgt->vaddr + img_base;
       if (tgt->kind == SK_ABS) S = tgt->vaddr;
@@ -343,43 +348,9 @@ static void apply_all_relocs(LinkImage* img, u64 img_base) {
   }
 }
 
-/* ---- build-id: FNV-1a 64 over segment bytes, mixed to 128 bits ---- */
-
-static u64 fnv1a64(const u8* data, size_t n, u64 seed) {
-  const u64 PRIME = 0x100000001b3ull;
-  u64 h = seed;
-  size_t i;
-  for (i = 0; i < n; ++i) {
-    h ^= (u64)data[i];
-    h *= PRIME;
-  }
-  return h;
-}
-
-static void compute_build_id(LinkImage* img, u8 out[16]) {
-  /* Two FNV-1a streams with different seeds → 128 bits. Mix segment
-   * bytes (post-reloc) plus segment vaddrs so the digest changes if
-   * either content or layout shifts. */
-  const u64 SEED_LO = 0xcbf29ce484222325ull;
-  const u64 SEED_HI = 0x14650fb0739d0383ull;
-  u64 lo = SEED_LO, hi = SEED_HI;
-  u32 i;
-  for (i = 0; i < img->nsegments; ++i) {
-    const LinkSegment* seg = &img->segments[i];
-    u64 vaddr = seg->vaddr;
-    u64 fsz = seg->file_size;
-    lo = fnv1a64((const u8*)&vaddr, sizeof vaddr, lo);
-    lo = fnv1a64((const u8*)&fsz, sizeof fsz, lo);
-    hi = fnv1a64((const u8*)&vaddr, sizeof vaddr, hi);
-    hi = fnv1a64((const u8*)&fsz, sizeof fsz, hi);
-    if (img->segment_bytes[i] && fsz) {
-      lo = fnv1a64(img->segment_bytes[i], (size_t)fsz, lo);
-      hi = fnv1a64(img->segment_bytes[i], (size_t)fsz, hi);
-    }
-  }
-  for (i = 0; i < 8; ++i) out[i] = (u8)(lo >> (i * 8));
-  for (i = 0; i < 8; ++i) out[8 + i] = (u8)(hi >> (i * 8));
-}
+/* The build-id payload is a format-agnostic image identity hash —
+ * see link_image_id_compute in link_image_id.c. Mach-O wraps the
+ * same bytes in LC_UUID; ELF wraps them in a .note.gnu.build-id. */
 
 /* ---- string-table builder ---- */
 
@@ -795,9 +766,11 @@ void link_emit_elf(LinkImage* img, Writer* w) {
     }
   }
 
-  /* ---- compute build-id (post-reloc, deterministic) ---- */
+  /* ---- compute build-id (post-reloc, deterministic) ----
+   *
+   * Format-agnostic — Mach-O LC_UUID will hash the same bytes. */
   u8 build_id[BUILD_ID_DESC_LEN];
-  compute_build_id(img, build_id);
+  link_image_id_compute(img, build_id);
 
   /* ---- plan section headers covering loaded segments ----
    *
diff --git a/src/link/link_image_id.c b/src/link/link_image_id.c
@@ -0,0 +1,52 @@
+/* link_image_id_compute: format-agnostic 16-byte identity hash for a
+ * resolved LinkImage.
+ *
+ * Two FNV-1a 64-bit streams with different seeds produce 128 bits. The
+ * mix covers each segment's vaddr, file_size, and post-shift bytes, so
+ * the digest changes if either content or layout shifts. Determinism
+ * (no time / random component) is intentional — reproducible builds.
+ *
+ * Wrapped per format:
+ *   - ELF .note.gnu.build-id (link_emit_elf)
+ *   - Mach-O LC_UUID payload (link_emit_macho, Phase 3)
+ *   - COFF/PE debug directory (deferred)
+ *
+ * Lived in link_elf.c through Phase 0; lifted out so the Mach-O writer
+ * sees the same bytes (doc/MULTIOBJ.md §3.5). */
+
+#include "core/core.h"
+#include "link/link_internal.h"
+
+static u64 fnv1a64(const u8* data, size_t n, u64 seed) {
+  const u64 PRIME = 0x100000001b3ull;
+  u64 h = seed;
+  size_t i;
+  for (i = 0; i < n; ++i) {
+    h ^= (u64)data[i];
+    h *= PRIME;
+  }
+  return h;
+}
+
+void link_image_id_compute(const LinkImage* img,
+                           u8 out[LINK_IMAGE_ID_BYTES]) {
+  const u64 SEED_LO = 0xcbf29ce484222325ull;
+  const u64 SEED_HI = 0x14650fb0739d0383ull;
+  u64 lo = SEED_LO, hi = SEED_HI;
+  u32 i;
+  for (i = 0; i < img->nsegments; ++i) {
+    const LinkSegment* seg = &img->segments[i];
+    u64 vaddr = seg->vaddr;
+    u64 fsz = seg->file_size;
+    lo = fnv1a64((const u8*)&vaddr, sizeof vaddr, lo);
+    lo = fnv1a64((const u8*)&fsz, sizeof fsz, lo);
+    hi = fnv1a64((const u8*)&vaddr, sizeof vaddr, hi);
+    hi = fnv1a64((const u8*)&fsz, sizeof fsz, hi);
+    if (img->segment_bytes[i] && fsz) {
+      lo = fnv1a64(img->segment_bytes[i], (size_t)fsz, lo);
+      hi = fnv1a64(img->segment_bytes[i], (size_t)fsz, hi);
+    }
+  }
+  for (i = 0; i < 8; ++i) out[i] = (u8)(lo >> (i * 8));
+  for (i = 0; i < 8; ++i) out[8 + i] = (u8)(hi >> (i * 8));
+}
diff --git a/src/link/link_internal.h b/src/link/link_internal.h
@@ -307,7 +307,17 @@ void link_reloc_apply(Compiler*, RelocKind, u8* P_bytes, u64 S, i64 A, u64 P);
 
 /* Public link_emit_image_writer dispatches by Compiler.target.obj. The
  * ELF implementation lives in link_elf.c and dispatches internally on
- * Compiler.target.arch for e_machine and reloc translation. */
+ * Compiler.target.arch for e_machine and reloc translation. The Mach-O
+ * peer (link_macho.c) and COFF peer arrive in later phases of
+ * doc/MULTIOBJ.md. */
 void link_emit_elf(LinkImage*, Writer*);
 
+/* Format-agnostic 16-byte image identity, derived from per-segment
+ * post-shift bytes + vaddrs/sizes. ELF wraps it in a
+ * .note.gnu.build-id; Mach-O will wrap it in LC_UUID; COFF/PE in a
+ * debug directory entry. One source of truth so the bytes match
+ * across formats. */
+#define LINK_IMAGE_ID_BYTES 16u
+void link_image_id_compute(const LinkImage*, u8 out[LINK_IMAGE_ID_BYTES]);
+
 #endif
diff --git a/src/link/link_layout.c b/src/link/link_layout.c
@@ -1746,6 +1746,29 @@ static u8 reloc_width(RelocKind k) {
     case R_AARCH64_TLSLE_ADD_TPREL_HI12:
     case R_AARCH64_TLSLE_ADD_TPREL_LO12_NC:
       return 4;
+    case R_RV_HI20:
+    case R_RV_LO12_I:
+    case R_RV_LO12_S:
+    case R_RV_BRANCH:
+    case R_RV_JAL:
+    case R_RV_PCREL_HI20:
+    case R_RV_PCREL_LO12_I:
+    case R_RV_PCREL_LO12_S:
+    case R_RV_GOT_HI20:
+    case R_RV_TPREL_HI20:
+    case R_RV_TPREL_LO12_I:
+    case R_RV_TPREL_LO12_S:
+      return 4;
+    case R_RV_CALL:
+      return 8;
+    case R_RV_RVC_BRANCH:
+    case R_RV_RVC_JUMP:
+      return 2;
+    /* Marker relocs that don't alter site bytes; width nonzero so the
+     * apply path treats them as recognized. */
+    case R_RV_RELAX:
+    case R_RV_TPREL_ADD:
+      return 4;
     default:
       return 0;
   }
@@ -1773,6 +1796,10 @@ static void emit_reloc_records(Linker* l, LinkImage* img,
       if (!s || !section_kept(s)) continue;
       /* Skip relocs whose containing section was GC'd. */
       if (m->section[r->section_id] == LINK_SEC_NONE) continue;
+      /* RISC-V marker relocs (RELAX, TPREL_ADD) reference no symbol —
+       * they annotate the prior reloc for relaxation or TLS-add folding.
+       * We don't relax, so drop them entirely. */
+      if (r->kind == R_RV_RELAX || r->kind == R_RV_TPREL_ADD) continue;
       if (r->sym == OBJ_SYM_NONE || r->sym >= m->nsym)
         compiler_panic(l->c, no_loc(), "link: reloc references unknown symbol");
       target = m->sym[r->sym];
@@ -2231,7 +2258,7 @@ static void layout_iplt(Linker* l, LinkImage* img) {
   }
 
   pairs_section_name = pool_intern_cstr(l->c->global, ".iplt.pairs");
-  init_section_name = pool_intern_cstr(l->c->global, ".preinit_array");
+  init_section_name = obj_secname_preinit_array(l->c);
 
   iplt_sec = &img->sections[sec_base + 0u];
   memset(iplt_sec, 0, sizeof(*iplt_sec));
@@ -2359,9 +2386,38 @@ static void layout_iplt(Linker* l, LinkImage* img) {
     img->iplt_pairs[2u * slot_idx + 1] = slot_vaddr;
 
     stub_dst = iplt_bytes + (size_t)(slot_idx * 12u);
-    wr_u32_le(stub_dst + 0, 0x90000010u); /* ADRP x16, #0 */
-    wr_u32_le(stub_dst + 4, 0xf9400210u); /* LDR x16, [x16] */
-    wr_u32_le(stub_dst + 8, 0xd61f0200u); /* BR x16 */
+    switch (img->c->target.arch) {
+      case CFREE_ARCH_ARM_64: {
+        /* AArch64: ADRP x16, %page(slot) ; LDR x16, [x16, :lo12:slot]
+         * ; BR x16. The two immediates are zeroed and patched at apply
+         * time by the ADR_PREL_PG_HI21 / LDST64_ABS_LO12_NC relocs
+         * emitted below. */
+        wr_u32_le(stub_dst + 0, 0x90000010u); /* ADRP x16, #0 */
+        wr_u32_le(stub_dst + 4, 0xf9400210u); /* LDR x16, [x16] */
+        wr_u32_le(stub_dst + 8, 0xd61f0200u); /* BR x16 */
+        break;
+      }
+      case CFREE_ARCH_RV64: {
+        /* RV64: AUIPC t1, %hi ; LD t1, %lo(t1) ; JR t1. Both
+         * displacements are PC-relative differences between the
+         * stub and its slot — invariant under the segment-base
+         * shift — so we encode them directly without relocs. */
+        i64 disp = (i64)slot_vaddr - (i64)stub_vaddr;
+        u32 hi20 = (u32)(((u64)(disp + 0x800)) >> 12) & 0xfffffu;
+        u32 lo12 = (u32)((u64)disp & 0xfffu);
+        u32 auipc = 0x00000317u | (hi20 << 12); /* auipc t1, hi */
+        u32 ld_t1 = 0x00033303u | (lo12 << 20); /* ld t1, lo(t1) */
+        u32 jr_t1 = 0x00030067u; /* jalr x0,t1,0 */
+        wr_u32_le(stub_dst + 0, auipc);
+        wr_u32_le(stub_dst + 4, ld_t1);
+        wr_u32_le(stub_dst + 8, jr_t1);
+        break;
+      }
+      default:
+        compiler_panic(img->c, no_loc(),
+                       "link: ifunc iplt stub not implemented for arch %u",
+                       (unsigned)img->c->target.arch);
+    }
 
     /* Synthetic local symbol for the .igot.plt slot. */
     memset(&slot_rec, 0, sizeof(slot_rec));
@@ -2389,33 +2445,36 @@ static void layout_iplt(Linker* l, LinkImage* img) {
     resolver_rec.size = 0;
     resolver_id = append_symbol(img, &resolver_rec);
 
-    /* Reloc on the ADRP at stub+0. */
-    memset(&rrec, 0, sizeof(rrec));
-    rrec.input_id = LINK_INPUT_NONE;
-    rrec.section_id = OBJ_SEC_NONE;
-    rrec.link_section_id = iplt_sec->id;
-    rrec.offset = (u32)(slot_idx * 12u);
-    rrec.width = 4;
-    rrec.write_vaddr = stub_vaddr;
-    rrec.write_file_offset = stub_vaddr;
-    rrec.kind = R_AARCH64_ADR_PREL_PG_HI21;
-    rrec.target = slot_id;
-    rrec.addend = 0;
-    *append_reloc_slot(img) = rrec;
-
-    /* Reloc on the LDR at stub+4. */
-    memset(&rrec, 0, sizeof(rrec));
-    rrec.input_id = LINK_INPUT_NONE;
-    rrec.section_id = OBJ_SEC_NONE;
-    rrec.link_section_id = iplt_sec->id;
-    rrec.offset = (u32)(slot_idx * 12u + 4u);
-    rrec.width = 4;
-    rrec.write_vaddr = stub_vaddr + 4u;
-    rrec.write_file_offset = stub_vaddr + 4u;
-    rrec.kind = R_AARCH64_LDST64_ABS_LO12_NC;
-    rrec.target = slot_id;
-    rrec.addend = 0;
-    *append_reloc_slot(img) = rrec;
+    if (img->c->target.arch == CFREE_ARCH_ARM_64) {
+      /* Reloc on the ADRP at stub+0. RV64's stub is fully encoded
+       * inline above and needs no apply-time fixups. */
+      memset(&rrec, 0, sizeof(rrec));
+      rrec.input_id = LINK_INPUT_NONE;
+      rrec.section_id = OBJ_SEC_NONE;
+      rrec.link_section_id = iplt_sec->id;
+      rrec.offset = (u32)(slot_idx * 12u);
+      rrec.width = 4;
+      rrec.write_vaddr = stub_vaddr;
+      rrec.write_file_offset = stub_vaddr;
+      rrec.kind = R_AARCH64_ADR_PREL_PG_HI21;
+      rrec.target = slot_id;
+      rrec.addend = 0;
+      *append_reloc_slot(img) = rrec;
+
+      /* Reloc on the LDR at stub+4. */
+      memset(&rrec, 0, sizeof(rrec));
+      rrec.input_id = LINK_INPUT_NONE;
+      rrec.section_id = OBJ_SEC_NONE;
+      rrec.link_section_id = iplt_sec->id;
+      rrec.offset = (u32)(slot_idx * 12u + 4u);
+      rrec.width = 4;
+      rrec.write_vaddr = stub_vaddr + 4u;
+      rrec.write_file_offset = stub_vaddr + 4u;
+      rrec.kind = R_AARCH64_LDST64_ABS_LO12_NC;
+      rrec.target = slot_id;
+      rrec.addend = 0;
+      *append_reloc_slot(img) = rrec;
+    }
 
     /* .iplt.pairs[i].resolver = &resolver (R_ABS64) */
     memset(&rrec, 0, sizeof(rrec));
diff --git a/src/obj/macho.h b/src/obj/macho.h
@@ -0,0 +1,259 @@
+/* Mach-O wire-format constants, structs, and per-arch reloc translators
+ * shared between obj/macho_emit.c, obj/macho_read.c, and link/link_macho.c
+ * (none of which exist yet — see doc/MULTIOBJ.md).
+ *
+ * Private to src/. The public ObjBuilder/Linker surface is format-neutral
+ * (obj/obj.h, link/link.h); the Mach-O spelling of those abstractions only
+ * exists inside libcfree.
+ *
+ * Scope: 64-bit little-endian only (MH_MAGIC_64). The per-arch reloc
+ * mapping is split across macho_reloc_<arch>.c, mirroring the ELF
+ * arrangement; emit_macho and the linker dispatch to the right
+ * translator by Compiler.target.arch. */
+
+#ifndef CFREE_OBJ_MACHO_H
+#define CFREE_OBJ_MACHO_H
+
+#include <cfree.h>
+
+#include "core/core.h"
+#include "obj/obj.h"
+
+/* ---- magic ---- */
+#define MH_MAGIC_64 0xfeedfacfu
+#define MH_CIGAM_64 0xcffaedfeu /* byte-swapped (big-endian host reading LE) */
+
+/* ---- cputype / cpusubtype (subset cfree cares about) ---- */
+#define CPU_TYPE_X86 0x00000007
+#define CPU_TYPE_X86_64 0x01000007
+#define CPU_TYPE_ARM 0x0000000C
+#define CPU_TYPE_ARM64 0x0100000C
+
+#define CPU_SUBTYPE_X86_64_ALL 3
+#define CPU_SUBTYPE_ARM64_ALL 0
+
+/* ---- filetype ---- */
+#define MH_OBJECT 0x1 /* relocatable .o (no segments split) */
+#define MH_EXECUTE 0x2 /* main executable */
+#define MH_DYLIB 0x6 /* dynamically bound shared library */
+#define MH_DYLINKER 0x7
+#define MH_BUNDLE 0x8
+
+/* ---- mach_header flags (subset) ---- */
+#define MH_NOUNDEFS 0x00000001u
+#define MH_DYLDLINK 0x00000004u
+#define MH_TWOLEVEL 0x00000080u
+#define MH_PIE 0x00200000u
+
+/* ---- load command IDs (subset cfree will emit / consume) ---- */
+#define LC_REQ_DYLD 0x80000000u
+#define LC_SEGMENT_64 0x19u
+#define LC_SYMTAB 0x02u
+#define LC_DYSYMTAB 0x0bu
+#define LC_LOAD_DYLIB 0x0cu
+#define LC_ID_DYLIB 0x0du
+#define LC_LOAD_DYLINKER 0x0eu
+#define LC_UUID 0x1bu
+#define LC_FUNCTION_STARTS 0x26u
+#define LC_DATA_IN_CODE 0x29u
+#define LC_SOURCE_VERSION 0x2au
+#define LC_BUILD_VERSION 0x32u
+#define LC_DYLD_EXPORTS_TRIE (0x33u | LC_REQ_DYLD)
+#define LC_DYLD_CHAINED_FIXUPS (0x34u | LC_REQ_DYLD)
+#define LC_MAIN (0x28u | LC_REQ_DYLD)
+
+/* ---- header sizes ---- */
+#define MACHO_HDR64_SIZE 32u
+#define MACHO_SEGCMD64_SIZE 72u
+#define MACHO_SECT64_SIZE 80u
+#define MACHO_SYMTAB_CMD_SIZE 24u
+#define MACHO_DYSYMTAB_CMD_SIZE 80u
+#define MACHO_NLIST64_SIZE 16u
+#define MACHO_RELOC_SIZE 8u
+
+/* ---- on-disk structures (LE) ---- */
+
+typedef struct MachHeader64 {
+  u32 magic; /* MH_MAGIC_64 */
+  u32 cputype; /* CPU_TYPE_* */
+  u32 cpusubtype; /* CPU_SUBTYPE_* (low 24 bits) | feature flags */
+  u32 filetype; /* MH_OBJECT / MH_EXECUTE / ... */
+  u32 ncmds; /* number of load commands */
+  u32 sizeofcmds; /* total bytes of load commands */
+  u32 flags; /* MH_* */
+  u32 reserved;
+} MachHeader64;
+
+typedef struct MachLoadCmd {
+  u32 cmd; /* LC_* */
+  u32 cmdsize; /* size of this command including header */
+} MachLoadCmd;
+
+/* LC_SEGMENT_64: one per Mach-O segment. Followed by `nsects`
+ * MachSection64 records inline. */
+typedef struct MachSegmentCmd64 {
+  u32 cmd; /* LC_SEGMENT_64 */
+  u32 cmdsize; /* sizeof(this) + nsects * sizeof(MachSection64) */
+  char segname[16];
+  u64 vmaddr;
+  u64 vmsize;
+  u64 fileoff;
+  u64 filesize;
+  u32 maxprot;
+  u32 initprot;
+  u32 nsects;
+  u32 flags;
+} MachSegmentCmd64;
+
+/* Mach-O section descriptor, embedded inside an LC_SEGMENT_64. */
+typedef struct MachSection64 {
+  char sectname[16];
+  char segname[16];
+  u64 addr;
+  u64 size;
+  u32 offset;
+  u32 align; /* power of 2 (so 3 means 8-byte align) */
+  u32 reloff;
+  u32 nreloc;
+  u32 flags;
+  u32 reserved1;
+  u32 reserved2;
+  u32 reserved3;
+} MachSection64;
+
+typedef struct MachSymtabCmd {
+  u32 cmd; /* LC_SYMTAB */
+  u32 cmdsize;
+  u32 symoff;
+  u32 nsyms;
+  u32 stroff;
+  u32 strsize;
+} MachSymtabCmd;
+
+typedef struct MachDysymtabCmd {
+  u32 cmd; /* LC_DYSYMTAB */
+  u32 cmdsize;
+  u32 ilocalsym;
+  u32 nlocalsym;
+  u32 iextdefsym;
+  u32 nextdefsym;
+  u32 iundefsym;
+  u32 nundefsym;
+  u32 tocoff;
+  u32 ntoc;
+  u32 modtaboff;
+  u32 nmodtab;
+  u32 extrefsymoff;
+  u32 nextrefsyms;
+  u32 indirectsymoff;
+  u32 nindirectsyms;
+  u32 extreloff;
+  u32 nextrel;
+  u32 locreloff;
+  u32 nlocrel;
+} MachDysymtabCmd;
+
+/* nlist_64 entry. n_type packs N_STAB | N_PEXT | N_TYPE | N_EXT. */
+typedef struct MachNlist64 {
+  u32 n_strx;
+  u8 n_type;
+  u8 n_sect; /* 1-based section index, 0 = NO_SECT */
+  u16 n_desc;
+  u64 n_value;
+} MachNlist64;
+
+/* ---- nlist n_type bits ---- */
+#define N_STAB 0xe0u
+#define N_PEXT 0x10u
+#define N_TYPE 0x0eu
+#define N_EXT 0x01u
+
+/* N_TYPE values */
+#define N_UNDF 0x0u
+#define N_ABS 0x2u
+#define N_SECT 0xeu
+#define N_PBUD 0xcu
+#define N_INDR 0xau
+
+#define NO_SECT 0u
+
+/* n_desc bits (subset) */
+#define N_NO_DEAD_STRIP 0x0020u
+#define N_WEAK_REF 0x0040u
+#define N_WEAK_DEF 0x0080u
+#define REFERENCE_FLAG_UNDEFINED_NON_LAZY 0x0u
+#define REFERENCE_FLAG_UNDEFINED_LAZY 0x1u
+
+/* ---- section type / attributes (subset of section.flags) ---- */
+#define SECTION_TYPE 0x000000ffu
+#define SECTION_ATTRIBUTES 0xffffff00u
+
+#define S_REGULAR 0x0u
+#define S_ZEROFILL 0x1u
+#define S_CSTRING_LITERALS 0x2u
+#define S_NON_LAZY_SYMBOL_POINTERS 0x6u
+#define S_LAZY_SYMBOL_POINTERS 0x7u
+#define S_SYMBOL_STUBS 0x8u
+#define S_MOD_INIT_FUNC_POINTERS 0x9u
+#define S_MOD_TERM_FUNC_POINTERS 0xau
+#define S_COALESCED 0xbu
+#define S_INTERPOSING 0xdu
+#define S_THREAD_LOCAL_REGULAR 0x11u
+#define S_THREAD_LOCAL_ZEROFILL 0x12u
+#define S_THREAD_LOCAL_VARIABLES 0x13u
+#define S_THREAD_LOCAL_VARIABLE_POINTERS 0x14u
+#define S_THREAD_LOCAL_INIT_FUNCTION_POINTERS 0x15u
+
+#define S_ATTR_PURE_INSTRUCTIONS 0x80000000u
+#define S_ATTR_SOME_INSTRUCTIONS 0x00000400u
+#define S_ATTR_DEBUG 0x02000000u
+#define S_ATTR_NO_DEAD_STRIP 0x10000000u
+
+/* ---- relocation_info (external/scattered union; cfree emits only the
+ * external form for arm64 / x86_64) ----
+ *
+ * Wire layout (little-endian):
+ *   u32 r_address; offset within the section the reloc patches
+ *   u32 packed; bitfield: r_symbolnum:24, r_pcrel:1, r_length:2,
+ *               r_extern:1, r_type:4
+ *
+ * length encoding: 0=byte, 1=word, 2=long, 3=quad. */
+typedef struct MachRelocInfo {
+  u32 r_address;
+  u32 r_packed;
+} MachRelocInfo;
+
+/* ---- arm64 reloc types (r_type field) ---- */
+#define ARM64_RELOC_UNSIGNED 0u
+#define ARM64_RELOC_SUBTRACTOR 1u
+#define ARM64_RELOC_BRANCH26 2u
+#define ARM64_RELOC_PAGE21 3u
+#define ARM64_RELOC_PAGEOFF12 4u
+#define ARM64_RELOC_GOT_LOAD_PAGE21 5u
+#define ARM64_RELOC_GOT_LOAD_PAGEOFF12 6u
+#define ARM64_RELOC_POINTER_TO_GOT 7u
+#define ARM64_RELOC_TLVP_LOAD_PAGE21 8u
+#define ARM64_RELOC_TLVP_LOAD_PAGEOFF12 9u
+#define ARM64_RELOC_ADDEND 10u
+
+/* ---- x86_64 reloc types (for the translator that lands when x64
+ * codegen does) ---- */
+#define X86_64_RELOC_UNSIGNED 0u
+#define X86_64_RELOC_SIGNED 1u
+#define X86_64_RELOC_BRANCH 2u
+#define X86_64_RELOC_GOT_LOAD 3u
+#define X86_64_RELOC_GOT 4u
+#define X86_64_RELOC_SUBTRACTOR 5u
+#define X86_64_RELOC_SIGNED_1 6u
+#define X86_64_RELOC_SIGNED_2 7u
+#define X86_64_RELOC_SIGNED_4 8u
+#define X86_64_RELOC_TLV 9u
+
+/* Map cfree-canonical RelocKind <-> arm64 Mach-O reloc type. Returns
+ * (u32)-1 on unsupported kinds; the caller (emit_macho / read_macho)
+ * panics with a diagnostic. Stubs in macho_reloc_aarch64.c until the
+ * Phase 2 writer lands (see doc/MULTIOBJ.md). */
+u32 macho_aarch64_reloc_to(u32 kind /* RelocKind */);
+u32 macho_aarch64_reloc_from(u32 macho_type);
+
+#endif
diff --git a/src/obj/macho_reloc_aarch64.c b/src/obj/macho_reloc_aarch64.c
@@ -0,0 +1,37 @@
+/* RelocKind <-> arm64 Mach-O reloc-type mapping. Mirror of
+ * elf_reloc_aarch64.c for Mach-O. Stubbed in Phase 1 of the
+ * MULTIOBJ plan (doc/MULTIOBJ.md): the translator declarations
+ * exist so macho.h compiles, but neither path is reachable yet —
+ * the writer / reader / linker peers (macho_emit.c, macho_read.c,
+ * link_macho.c) are Phase 2/3 work.
+ *
+ * Filling these in is part of Phase 2. Until then, callers panic
+ * via the (u32)-1 sentinel (mirrors the elf_aarch64_reloc_from
+ * convention) — but no caller exists. The compile-time check that
+ * obj/macho.h's declarations match a real definition is the value
+ * this TU provides today. */
+
+#include "core/util.h"
+#include "obj/macho.h"
+
+u32 macho_aarch64_reloc_to(u32 kind /* RelocKind */) {
+  (void)kind;
+  /* Phase 2: full RelocKind <-> ARM64_RELOC_* table, with
+   * R_AARCH64_CALL26 / R_AARCH64_JUMP26 -> ARM64_RELOC_BRANCH26,
+   * R_AARCH64_ADR_PREL_PG_HI21 -> ARM64_RELOC_PAGE21,
+   * R_AARCH64_ADD_ABS_LO12_NC / R_AARCH64_LDST*_ABS_LO12_NC ->
+   * ARM64_RELOC_PAGEOFF12,
+   * R_AARCH64_ADR_GOT_PAGE -> ARM64_RELOC_GOT_LOAD_PAGE21,
+   * R_AARCH64_LD64_GOT_LO12_NC -> ARM64_RELOC_GOT_LOAD_PAGEOFF12,
+   * R_ABS64 -> ARM64_RELOC_UNSIGNED, etc. Non-zero addends emit a
+   * leading ARM64_RELOC_ADDEND pair (see doc/MULTIOBJ.md §3.2). */
+  return (u32)-1;
+}
+
+u32 macho_aarch64_reloc_from(u32 macho_type) {
+  (void)macho_type;
+  /* Phase 2: inverse of macho_aarch64_reloc_to, plus
+   * ARM64_RELOC_SUBTRACTOR pair recognition (read-only;
+   * cgtarget does not emit difference relocs in v1). */
+  return (u32)-1;
+}
diff --git a/src/obj/obj.h b/src/obj/obj.h
@@ -360,6 +360,22 @@ void obj_symiter_free(ObjSymIter*);
  * (see src/core/core.h). The streaming API lives in <cfree.h> as
  * cfree_writer_*. */
 
+/* ---- format-aware canonical section names ----
+ *
+ * For sections the linker synthesizes (init/fini arrays, TLS template
+ * sections), the spelling diverges across object formats: ELF uses
+ * `.init_array` / `.tdata` / etc., Mach-O uses
+ * `__DATA,__mod_init_func` / `__DATA,__thread_data` / etc. These
+ * helpers pick the right name for the active target.obj so the linker
+ * doesn't carry per-format switches at every synthesis site. ELF
+ * returns the historical names; Mach-O / COFF panic until those
+ * writers land (see doc/MULTIOBJ.md §3.5). */
+Sym obj_secname_init_array(Compiler*);
+Sym obj_secname_fini_array(Compiler*);
+Sym obj_secname_preinit_array(Compiler*);
+Sym obj_secname_tdata(Compiler*);
+Sym obj_secname_tbss(Compiler*);
+
 /* ---- file format emitters ---- */
 void emit_elf(Compiler*, ObjBuilder*, Writer*);
 void emit_coff(Compiler*, ObjBuilder*, Writer*);
diff --git a/src/obj/obj_secnames.c b/src/obj/obj_secnames.c
@@ -0,0 +1,97 @@
+/* Format-aware canonical section names.
+ *
+ * The cfree-internal section model (obj/obj.h) is format-neutral: every
+ * Section carries a single Sym name plus a SecKind tag. Most sections
+ * keep ELF-style dot-prefixed names ("`.text`", "`.data`", …) end-to-end
+ * because the per-format writer translates them as it emits headers.
+ *
+ * A handful of *synthetic* sections — built by the linker rather than
+ * the front end — diverge in name across formats. Their names need to
+ * be picked at synthesis time, before any writer sees them, because the
+ * linker uses the name to drive layout, symbol-boundary emission, and
+ * the writer's output-section bucketing. This TU centralizes that
+ * choice so callers don't sprinkle target-format switches through
+ * link_layout.c / link_dyn.c.
+ *
+ * Phase 1 of doc/MULTIOBJ.md: ELF returns the historical name; Mach-O
+ * panics with a "TODO" until the macho writer lands in Phase 2/3. COFF
+ * panics in the same way and is filled in later. */
+
+#include "obj/obj.h"
+#include "core/core.h"
+#include "core/pool.h"
+
+static Sym secname_panic_unimpl(Compiler* c, const char* which) {
+  SrcLoc l = {0, 0, 0};
+  compiler_panic(c, l,
+                 "obj section name '%s' for target obj=%u not yet "
+                 "implemented (see doc/MULTIOBJ.md §3.5)",
+                 which, (unsigned)c->target.obj);
+  return 0;
+}
+
+Sym obj_secname_init_array(Compiler* c) {
+  switch (c->target.obj) {
+    case CFREE_OBJ_ELF:
+      return pool_intern_cstr(c->global, ".init_array");
+    case CFREE_OBJ_MACHO:
+      /* TODO Phase 2: "__DATA,__mod_init_func" with
+       * S_MOD_INIT_FUNC_POINTERS section type. */
+      return secname_panic_unimpl(c, ".init_array");
+    default:
+      return secname_panic_unimpl(c, ".init_array");
+  }
+}
+
+Sym obj_secname_fini_array(Compiler* c) {
+  switch (c->target.obj) {
+    case CFREE_OBJ_ELF:
+      return pool_intern_cstr(c->global, ".fini_array");
+    case CFREE_OBJ_MACHO:
+      /* TODO Phase 2: "__DATA,__mod_term_func" with
+       * S_MOD_TERM_FUNC_POINTERS section type. */
+      return secname_panic_unimpl(c, ".fini_array");
+    default:
+      return secname_panic_unimpl(c, ".fini_array");
+  }
+}
+
+Sym obj_secname_preinit_array(Compiler* c) {
+  switch (c->target.obj) {
+    case CFREE_OBJ_ELF:
+      return pool_intern_cstr(c->global, ".preinit_array");
+    case CFREE_OBJ_MACHO:
+      /* Mach-O has no direct `.preinit_array` analogue — dyld runs
+       * S_MOD_INIT_FUNC_POINTERS only. Phase 3 of the linker will
+       * route the IFUNC ctor through __mod_init_func instead and
+       * this entry point will become a target.obj-specific shim. */
+      return secname_panic_unimpl(c, ".preinit_array");
+    default:
+      return secname_panic_unimpl(c, ".preinit_array");
+  }
+}
+
+Sym obj_secname_tdata(Compiler* c) {
+  switch (c->target.obj) {
+    case CFREE_OBJ_ELF:
+      return pool_intern_cstr(c->global, ".tdata");
+    case CFREE_OBJ_MACHO:
+      /* TODO Phase 2: Mach-O TLS uses __DATA,__thread_data with
+       * S_THREAD_LOCAL_REGULAR plus a tlv_descriptor record. */
+      return secname_panic_unimpl(c, ".tdata");
+    default:
+      return secname_panic_unimpl(c, ".tdata");
+  }
+}
+
+Sym obj_secname_tbss(Compiler* c) {
+  switch (c->target.obj) {
+    case CFREE_OBJ_ELF:
+      return pool_intern_cstr(c->global, ".tbss");
+    case CFREE_OBJ_MACHO:
+      /* TODO Phase 2: __DATA,__thread_bss with S_THREAD_LOCAL_ZEROFILL. */
+      return secname_panic_unimpl(c, ".tbss");
+    default:
+      return secname_panic_unimpl(c, ".tbss");
+  }
+}
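
Note on the RV64 `.iplt` stub in the link_layout.c hunk: it encodes the stub-to-slot displacement as `hi20 = (disp + 0x800) >> 12` and `lo12 = disp & 0xfff`. The `+ 0x800` matters because the LD immediate is sign-extended by the hardware, so whenever bit 11 of the displacement is set the AUIPC part has to be rounded up by one 4 KiB page to compensate. What follows is a minimal standalone sketch of that split and its inverse, not code from kit: the helper names (`rv64_split_pcrel`, `rv64_join_pcrel`, `rv64_encode_auipc_t1`, `rv64_encode_ld_t1`) and the range assertion are assumptions made for illustration; only the arithmetic and the two opcode constants are copied from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration type; kit keeps these as two locals. */
typedef struct Rv64PcrelSplit {
  uint32_t hi20; /* AUIPC U-type immediate (20 bits) */
  uint32_t lo12; /* LD I-type immediate (12 bits, sign-extended by the CPU) */
} Rv64PcrelSplit;

/* Split a PC-relative displacement the same way the diff does. */
static Rv64PcrelSplit rv64_split_pcrel(int64_t disp) {
  Rv64PcrelSplit s;
  /* Assumption added for the sketch: AUIPC + LD reaches about +/-2 GiB. */
  assert(disp + 0x800 >= -0x80000000ll && disp + 0x800 <= 0x7fffffffll);
  s.hi20 = (uint32_t)(((uint64_t)(disp + 0x800)) >> 12) & 0xfffffu;
  s.lo12 = (uint32_t)((uint64_t)disp & 0xfffu);
  return s;
}

/* Recompute the offset that AUIPC + LD produce relative to the AUIPC. */
static int64_t rv64_join_pcrel(Rv64PcrelSplit s) {
  int64_t hi = (int64_t)((uint64_t)s.hi20 << 12);
  int64_t lo = (int64_t)s.lo12;
  if (s.hi20 & 0x80000u) hi -= (int64_t)1 << 32; /* AUIPC sign-extends bit 31 */
  if (s.lo12 & 0x800u) lo -= 0x1000;             /* LD sign-extends bit 11 */
  return hi + lo;
}

/* Opcode constants as they appear in the diff (rd = rs1 = t1 = x6). */
static uint32_t rv64_encode_auipc_t1(uint32_t hi20) {
  return 0x00000317u | (hi20 << 12);
}

static uint32_t rv64_encode_ld_t1(uint32_t lo12) {
  return 0x00033303u | (lo12 << 20);
}
```

For `disp = 0x800` the split gives `hi20 = 1` and `lo12 = 0x800`: AUIPC adds 0x1000 and the sign-extended LD offset subtracts 0x800, which lands on the slot. Without the rounding term, `hi20` would be 0 and the load would address 0x800 bytes before the stub instead.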