commit 049d0f0ae42e920aa7f5a997dbc23fbe53fef7c0
parent 5d0d1ca2b5c27ba40ed16d414737d18431be255c
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sun, 10 May 2026 14:58:13 -0700
doc: rm 7 stale design docs + strip stale references
Removes doc/{cg_testing,DYNLD,linker-status,MULTIARCH,MULTIOBJ,REGALLOC,rv64-status}.md
and prunes the now-dangling "see doc/X.md" pointers scattered through
src/, test/, and doc/ASM.md.
Diffstat:
26 files changed, 28 insertions(+), 2298 deletions(-)
diff --git a/doc/ASM.md b/doc/ASM.md
@@ -2,7 +2,7 @@
Scope: bring up the asm frontend (standalone `.s` and inline `asm("...")`)
and the matching disassembler, starting with aarch64. Companion to
-`DESIGN.md §10` and `MULTIARCH.md`.
+`DESIGN.md §10`.
The asm and disasm sides are designed together so that one description of
each instruction serves both: same field layout, same operand syntax, same
@@ -115,7 +115,7 @@ entries and operand-print/parse drift.
## 4. Module layout
-Reuse `aa64` prefix (`MULTIARCH.md §5`).
+Reuse `aa64` prefix.
```
src/parse/parse_asm.c shared driver: scan tokens, dispatch directives,
@@ -292,8 +292,8 @@ and print).
## 5. Phasing
Each phase ends mergeable. Phase 1 stands up the test harness so every
-later phase gates on real runs from its first commit (mirrors
-`MULTIARCH.md §4` Phase 1). Phase 2 lands the encode/decode pairing as
+later phase gates on real runs from its first commit. Phase 2 lands the
+encode/decode pairing as
a mechanical refactor; phase 3 is the standalone assembler; phase 4 is
inline asm + disasm overlay; phase 5 is the seam-rev for x64/rv64.
diff --git a/doc/DYNLD.md b/doc/DYNLD.md
@@ -1,99 +0,0 @@
-# Dynamic linking — status & remaining work
-
-Scope: producing dynamic-linked aarch64-linux ELF executables (and,
-eventually, shared libs) that run against a real musl or glibc
-libc.so.
-
-## Status
-
-`make test-musl` passes 6/6 (3 static + 3 dynamic) and
-`make test-glibc` passes 3/3 (dynamic-only). Each produces an
-ET_DYN PIE that runs end-to-end against the runtime loader
-(`/lib/ld-musl-aarch64.so.1` or `/lib/ld-linux-aarch64.so.1`) for
-`01_syscall_write`, `02_errno_touch`, `03_printf_hello`.
-
-`.dynamic` lives in a PF_R+W segment (alongside `.got.plt`) because
-glibc's loader patches `DT_*` `d_un.d_ptr` fields in-place at startup
-(`elf_get_dynamic_info` adjusts STRTAB/SYMTAB/etc. by `l_addr`); a
-PF_R-only segment causes `SEGV_ACCERR`. musl's loader doesn't do
-this rewrite, but RW placement is conventional and works for both.
-
-What's wired (don't re-derive — read the code if you need detail):
-
-- DSO ingest (`read_elf_dso`, `LINK_INPUT_DSO_BYTES`, soname tracking).
-- Driver: `-dynamic-linker`, `.so` / `.so.N` positional inputs,
- `-l<name>` honoring `-Bdynamic`/`-Bstatic`.
-- Synthetic dyn tables: `.interp` / `.dynsym` / `.dynstr` / `.gnu.hash`
- / `.rela.dyn` / `.rela.plt` / `.plt` / `.got.plt` / `.dynamic`
- (`src/link/link_dyn.c::layout_dyn`).
-- PIE / ET_DYN emit: `e_type`, `img_base`, PT_PHDR / PT_INTERP /
- PT_DYNAMIC / PT_GNU_STACK, `R_AARCH64_RELATIVE` for internal abs
- fixups (`src/link/link_elf.c`).
-- PLT body emit (PLT0 + per-import 16-byte stubs) and import-reloc
- routing: CALL26/JUMP26 → PLT entry (`sym_plt_vaddr`),
- abs-against-import → GLOB_DAT, GOT-redirected slot fills →
- GLOB_DAT via the existing `layout_got` path. BIND_NOW only
- (`DF_1_NOW`); PLT0 is canonical but unused.
-
-## Remaining work
-
-### Phase 7 — `cfree_link_shared` (small)
-
-Files: `src/api/pipeline.c`, `src/link/`.
-
-Replace the panic at `pipeline.c:413` with a dispatch into the same
-machinery as `link_exe`, with:
-- `output_kind = SHARED` (no PT_INTERP, no entry-symbol requirement,
- `allow_undefined = 1`).
-- DT_SONAME from `opts->soname`.
-- DT_R(UN)PATH from `opts->r(un)paths`.
-- Exports promoted into `.dynsym` from `opts->exports`.
-
-Add a harness case under `test/libc/cases/` (shared by the
-`test/libc/musl/` and `test/libc/glibc/` runners) or a new
-`test/link/dyn/`:
-build `libfoo.so` from a single `.c`, link an exe against it, run.
-
-### Phase 8 — TLS GD/IE/LD, IRELATIVE (deferred)
-
-Required for shared-lib TLS and IFUNCs in dynamic outputs. Out of
-scope for the v1 dynamic exe; the musl harness doesn't exercise them.
-
-### Polish / follow-ups (none blocking)
-
-- **`--export-dynamic`**: promote internal globals into `.dynsym`.
- Mechanical; not exercised by the musl harness.
-- **`.gnu.hash` sort-by-bucket**: current code assumes hashed
- symbols land contiguously in `.dynsym`. Fine for small import
- sets; needs a sort pass before scaling.
-- **`--as-needed`**: today every DSO with a soname gets a DT_NEEDED.
- Plumb the flag through to filter on actual import use.
-- **Linker-script DSO inputs**: Debian ships
- `/usr/lib/aarch64-linux-gnu/libc.so` as a GNU-ld script
- (`GROUP ( libc.so.6 libc_nonshared.a ld-linux-aarch64.so.1 )`).
- `cfree ld` doesn't recognize a script in DSO position, so the
- glibc harness hands `libc.so.6` + `libc_nonshared.a` directly.
- `link_script.c` already parses the kernel.lds subset; extend it
- to handle bare GROUP/INPUT scripts and wire `-l<name>` /
- positional `.so` resolution to fan out the listed inputs.
-- **Versioned symbols** (`.gnu.version` / `.gnu.version_r`): musl
- doesn't use them; glibc does.
-- **Lazy binding**: would need a real `_dl_runtime_resolve` PLT0
- reference. Skip until perf demands it.
-- **Unit-level dyn-table harness** under `test/link/dyn/`: round-trip
- `.dynsym` / `.gnu.hash` / `.rela.{dyn,plt}` / `.plt` body against
- `readelf -d -r --dyn-syms` and `objdump -d --section=.plt`. Faster
- than waiting on a full musl run to catch a malformed `.dynamic`
- or mis-encoded PLT stub.
-
-## Open questions
-
-1. **Versioned symbols.** v1 ignores versions on read (matches GNU
- ld's behavior with unversioned objects against versioned libs —
- the unversioned default version is taken). Write-side versioning
- is a follow-up that's invisible to the musl harness.
-
-2. **`.eh_frame_hdr` interaction.** A near-term gap in
- `linker-status.md` independent of dynamic linking, but it touches
- the same phdr synthesis code. If it sequences in the same window,
- land it alongside Phase 7 — phdr count growth is shared.
diff --git a/doc/MULTIARCH.md b/doc/MULTIARCH.md
@@ -1,286 +0,0 @@
-# MULTIARCH — plan for adding a second architecture
-
-Scope: turn cfree from an aarch64-only compiler into one that supports
-multiple `(arch, os, objfmt)` triples. The first new arch is x86_64;
-the first new platform/objfmt and the asm frontend land later, on the
-seams this work establishes.
-
-Today the codebase has one of each: aarch64 codegen (`src/arch/aarch64.c`),
-AAPCS64 ABI (`src/abi/abi.c`), ELF emission with aarch64 relocs
-(`src/obj/elf_emit.c` + `src/obj/elf_reloc_aarch64.c`), and an aarch64
-emulator-driven test path (`src/emu/`). `cgtarget_new`, `emit_elf`,
-`link_elf`, and the disassembler all panic on non-AArch64 targets.
-
-The goal of the first phase is to introduce the seams that a second
-arch forces — without yet writing x64 codegen. After the seams land,
-x64 bring-up is purely additive: new files, no edits to the
-arch-aware ones except the dispatch tables.
-
----
-
-## 1. Target slice for first x64 milestone
-
-| axis | value |
-|-----------|--------------------------------|
-| arch | `CFREE_ARCH_X86_64` |
-| os | `CFREE_OS_LINUX` |
-| objfmt | `CFREE_OBJ_ELF` |
-| ABI | SysV AMD64 |
-| codemodel | `CFREE_CM_SMALL` (default) |
-
-Mach-O, PE/COFF, Win64 ABI, and macOS-arm64 all wait until the seams
-are validated by a working x86_64-linux-gnu path. This keeps the
-arch-seam work decoupled from the objfmt-seam work — only one is on
-the critical path at a time.
-
----
-
-## 2. The seams
-
-### 2.1 CGTarget construction — dispatch by arch
-
-`src/arch/aarch64.c:2998` is currently the public `cgtarget_new`. Split:
-
-- Rename the AArch64 constructor to `aa64_cgtarget_new`, declared in a
- new `src/arch/aa64.h`.
-- Add `src/arch/x64.h` and `src/arch/x64.c` with `x64_cgtarget_new` and
- the equivalent `XImpl` skeleton (vtable wired up to method stubs).
-- New `src/arch/cgtarget.c` owns the public `cgtarget_new` and switches
- on `c->target.arch`.
-
-Same dispatch shape for `arch_disasm_new` (already factored via the
-`ArchDisasm` hook — just needs a switch).
-
-`mc_new` does not change yet: a single `MCEmitter` impl serves all
-arches.
-
-### 2.2 MCEmitter fixup encodings — extend the switch
-
-`src/arch/mc.c:84-134` has aarch64 BL/B.cond/CONDBR19 bit layouts
-hardcoded in `apply_fixup`. **Decision:** extend the same switch with
-x64 cases (`R_PC32` already works for jumps; add `R_PC8` for short
-jumps). No per-arch fixup vtable — the encoding is one-line
-little-endian patching, and the abstraction would be premature.
-
-Action item: update the file's "target-agnostic" header comment to
-reflect that `mc.c` is the union of all known fixup encodings, not a
-generic library.
-
-### 2.3 ABI classification — TargetABI vtable
-
-**Decision:** promote `TargetABI` to carry function pointers for the
-parts that vary by `(arch, os)`. `abi_init` selects the right impl set
-based on `c->target`. Rationale: even at two ABIs, the SysV AMD64
-classifier is large enough (eight-byte classification, INTEGER/SSE
-classes, x87 corners) that an in-line switch in `abi_func_info` would
-be ugly; and Win64 + macOS-arm64 are visible on the roadmap, so the
-indirection pays off quickly.
-
-Concrete changes:
-
-- Add a vtable to `TargetABI` (function pointers for `func_info`,
- `record_layout`, `va_list_type`, scalar profiles where they vary).
-- Move the AAPCS64 classifier out of `abi.c` into
- `src/abi/abi_aapcs64.c`, exposing an `aapcs64_vtable` symbol.
-- Add `src/abi/abi_sysv_x64.c` exposing `sysv_x64_vtable`. Initial
- classifier returns `ABI_ARG_INDIRECT` for everything (correct, slow,
- unblocks bring-up); fill in the eight-byte rules incrementally.
-- `abi_init` switches on `(target.arch, target.os)` and copies the
- right vtable in.
-
-The public `abi_func_info` / `abi_record_layout` / `abi_*_type` API in
-`src/abi/abi.h` does not change — only the internals dispatch through
-the vtable.
-
-### 2.4 Object format — ELF reloc translator dispatch
-
-`src/obj/elf_reloc_aarch64.c` already exists as a per-arch reloc
-translator; the seam is half there. Finish it:
-
-- Add `src/obj/elf_reloc_x86_64.c` mirroring it for `R_X86_64_*` codes.
-- Extend `RelocKind` in `src/obj/obj.h` with `R_X64_*` entries
- (`R_X64_PC32`, `R_X64_PLT32`, `R_X64_GOTPCREL`, ...).
-- `src/obj/elf_emit.c:246-249` panics on non-aarch64 and hardcodes
- `e_machine`. Replace with a switch that picks `EM_AARCH64` /
- `EM_X86_64` and the right reloc-translator function pointer.
-- `src/link/link_elf.c:575` — same treatment.
-
-Mach-O and PE/COFF emitters slot in as peers of `elf_emit.c` later,
-each with its own per-arch reloc translator file. The reloc-translator
-pattern established here is what makes that cheap.
-
-### 2.5 Header types per (arch, abi)
-
-`abi_size_type`, `abi_ptrdiff_type`, `abi_intptr_type`,
-`abi_uintptr_type`, `abi_va_list_type` are already abstracted. The
-SysV-x64 vtable (§2.3) supplies the right `__va_list_tag` struct
-(`gp_offset`, `fp_offset`, `overflow_arg_area`, `reg_save_area`).
-
-`rt/include/` headers that are arch-conditioned will gate on
-`__x86_64__` / `__aarch64__` predefines in the preprocessor (already
-the convention used elsewhere in `rt/`). No new mechanism needed.
-
----
-
-## 3. Test/run path — execute from day 1
-
-The harness already uses podman+qemu for aarch64 — see
-`test/lib/exec_aarch64.sh`, `test/cg/run.sh:104-118`, and the libc
-sysroot extractors in `test/libc/{musl,glibc}/`. The pattern: detect
-qemu-user or podman, batch all queued exes through one `podman run`
-to amortize launch overhead, fall back to a host serial loop. Test
-selection is per-arch with per-target XFAIL.
-
-x64 inherits this machinery wholesale. There is no new "runner" to
-design — only an x64-shaped peer of `exec_aarch64.sh` and a per-arch
-dispatch where `cg/run.sh` currently sources the aarch64 helper
-unconditionally. The default container image (`alpine:latest`)
-already runs linux/amd64 binaries; the `--platform linux/amd64` flag
-selection mirrors the existing `is_aarch64` gate in
-`exec_aarch64.sh:48`.
-
-Execution tests are gating from the first hello-world. We do not
-build out an x64 lifter inside `src/emu/` — that path is aarch64-only
-and stays so.
-
----
-
-## 4. Phasing
-
-Three phases. Each is independently mergeable; phase 1 and phase 2
-land with no aarch64 behavior change.
-
-### Phase 1 — test harness updates
-
-Stand up the x64 execution path before any compiler code moves, so
-phase 3 milestones can be gated on real runs from their first commit.
-
-1. Generalize `test/lib/exec_aarch64.sh` into a per-arch helper:
- either rename to `exec_target.sh` with arch-keyed function names
- (`exec_target_run aarch64 ...`), or add a peer
- `test/lib/exec_x64.sh` with the same surface. The batched
- `podman run` body is identical — only `--platform`,
- `RUN_*_IMAGE`, and the `is_<arch>` gate vary.
-2. `test/cg/run.sh` (and the other harnesses that source the helper)
- dispatch the executor by the case's target arch. The existing
- single-helper sourcing (`run.sh:118`) becomes a per-target choice.
-3. Sysroot extractors: add `test/libc/{musl,glibc}/` x86_64 variants
- if the libc tests claim x64 coverage. For the cg suite (static,
- no libc), no sysroot is needed — `alpine:latest` is enough.
-4. Smoke test: a hand-rolled or clang-built x86_64 ELF is queued and
- flushed through the new executor and exits cleanly. This proves
- the harness end-to-end before any cfree-emitted x64 bytes exist.
-5. Per-test arch declaration: `test/cg/` cases gain a way to say
- which arches they run on (default: all supported). Per-target
- XFAIL is keyed off the same.
-
-Exit criterion: aarch64 suite green; the smoke test runs an external
-x86_64 binary through the harness and reports pass on the standard
-pass/fail line.
-
-### Phase 2 — code refactors (multi-arch seams + x64 stubs)
-
-Pure refactors. No aarch64 output changes; x64 reachable but every
-codegen call panics with "x64: not implemented".
-
-1. **CGTarget dispatch** (§2.1). Rename `aarch64.c::cgtarget_new` to
- `aa64_cgtarget_new`, declared in `arch/aa64.h`. New
- `arch/cgtarget.c` owns the public `cgtarget_new` and switches on
- `c->target.arch`. Same split for `arch_disasm_new`.
-2. **x64 skeleton.** Add `arch/x64.{h,c}` with the full vtable wired
- up to stub methods that panic on call. `cgtarget_new` dispatches
- to it for `CFREE_ARCH_X86_64`.
-3. **ABI vtable** (§2.3). Promote `TargetABI` to carry function
- pointers for `func_info`, `record_layout`, `va_list_type`, and
- the scalar profiles that vary. `abi_init` switches on
- `(target.arch, target.os)` and installs the right vtable.
-4. **AAPCS64 split.** Move the AAPCS64 classifier from `abi.c` into
- `abi/abi_aapcs64.c`, exposing `aapcs64_vtable`. `abi.c` keeps the
- generic dispatch and the C-standard-driven scalar/record bits.
-5. **SysV-x64 stub.** Add `abi/abi_sysv_x64.c` exposing
- `sysv_x64_vtable` with `ABI_ARG_INDIRECT` for everything. Wired
- up by `abi_init` for `(X86_64, LINUX)` but unreachable until
- phase 3 fills `arch/x64.c`.
-6. **ELF reloc dispatch** (§2.4). Add `R_X64_*` to `RelocKind`. New
- `obj/elf_reloc_x86_64.c` mirrors `elf_reloc_aarch64.c`.
- `obj/elf_emit.c:246-249` and `link/link_elf.c:575` lose their
- AArch64-only panics in favor of a switch on `c->target.arch` that
- picks `e_machine` (`EM_AARCH64` / `EM_X86_64`) and the reloc
- translator.
-7. **Allowlist consolidation.** Other `arch != ARM_64` panics
- (anywhere they remain after the splits above) move into one
- target-validation function, easy to extend.
-
-Exit criterion: aarch64 suite green and byte-for-byte identical
-output objects on a representative `test/cg/` set; an x86_64-linux
-target reaches `arch/x64.c`'s stubs (the panic is the proof the
-dispatch is wired).
-
-### Phase 3 — implementations
-
-Now purely additive. Fill in the stubs from phase 2; each milestone
-gates on the podman/qemu execution path from phase 1.
-
-1. **mc.c x64 fixups.** Add x64 reloc-kind cases to
- `arch/mc.c::apply_fixup` (`R_PC32` already works for jumps;
- `R_PC8` for short jumps; whatever else x64 codegen actually
- emits).
-2. **SysV-x64 ABI classifier.** Replace the `ABI_ARG_INDIRECT` stub
- in `abi/abi_sysv_x64.c` with the eight-byte INTEGER/SSE
- classification. `va_list` returns the SysV `__va_list_tag`
- struct.
-3. **x64 codegen** in `arch/x64.c`, in the rough order the parser
- phases established for aarch64:
- 1. Hello-world (`int main(void) { return 0; }`) — exit via
- syscall through inline asm, or a libc-less return path.
- 2. Integer arithmetic + locals — frame slots, spill/reload on the
- x64 register pool. The CG-driven spill/reload from commit
- `9724439` is reusable; only the physical-register pool and
- load/store encodings change.
- 3. Calls — SysV register passing, basic eight-byte classification
- paired with the ABI work above.
- 4. Loads/stores at every width — `mov` size variants, sign/zero
- extension corners.
- 5. Compare-and-branch, structured control flow.
- 6. Aggregates, bitfields, varargs, atomics, intrinsics.
-4. **x64 disassembler** in `arch/` — peer of `aa64_isa.{c,h}`.
- Required for the textual disasm path used by some `test/cg/`
- gates.
-5. **Asm frontend.** Once x64 codegen lands, the asm frontend
- (`parse_asm`) is the cheapest way to author instruction-sequence
- tests without driving through the C frontend. Lands as a peer of
- `src/parse/` consuming `MCEmitter` directly.
-
-Exit criterion: each x64 milestone owns a `test/cg/` case running
-under both the aarch64 emulator and the podman/qemu x64 path; both
-report green on the standard pass/fail line.
-
----
-
-## 5. Naming conventions
-
-For the new files and exposed symbols:
-
-- aarch64 → `aa64` prefix in code (`aa64_cgtarget_new`, `aa64_vtable`,
- files under `arch/aa64*` and `abi/abi_aapcs64.c`). The existing
- `aarch64.c` and `aa64_isa.{c,h}` already mix the two; keep `aa64` as
- the going-forward convention.
-- x86_64 → `x64` prefix (`x64_cgtarget_new`, files under `arch/x64*`
- and `abi/abi_sysv_x64.c`). Avoid `x86_64` in identifiers — too long.
-- ELF reloc translators stay arch-suffixed:
- `elf_reloc_aarch64.c`, `elf_reloc_x86_64.c` (matches ELF spec
- naming).
-
----
-
-## 6. Validation gates
-
-A change in this plan is "done" when:
-
-- Phase 1 (harness): a hand-rolled x86_64 ELF round-trips through the
- podman runner with the right exit code.
-- Phase 2 (seams): aarch64 test suite still green, byte-for-byte
- identical output objects on a representative set of `test/cg/` cases.
-- Phase 3 (implementations): each milestone owns a `test/cg/` case
- running under both the aarch64 emulator and the podman runner;
- pass/fail line green on both.
diff --git a/doc/MULTIOBJ.md b/doc/MULTIOBJ.md
@@ -1,847 +0,0 @@
-# MULTIOBJ — plan for adding a second object format (Mach-O)
-
-Scope: turn cfree from an ELF-only compiler/linker into one that
-supports multiple `(arch, os, objfmt)` triples on the **objfmt** axis.
-The first new objfmt is Mach-O, targeting macOS on aarch64 and (once
-x64 codegen lands) x86_64. PE/COFF for Windows is the next peer; the
-seams introduced here are designed so PE/COFF is purely additive on
-top — new files, no edits to format-aware ones except the dispatch
-tables.
-
-Companion to `MULTIARCH.md`. That doc was the **arch** axis; this is
-the **objfmt** axis. The two are intentionally separate critical
-paths: arch and objfmt seams land independently, validated against
-each other only at the `(arch, os, objfmt)` cross-product on the test
-matrix.
-
----
-
-## Status
-
-- [x] **Phase 1** — seams + format-aware bookkeeping
- - [x] Linker emit dispatch — per-format panics (§3.3)
- - [x] Build-id moved to format-agnostic `link_image_id_compute` (§3.5)
- - [x] ABI vtable selection keys on `(arch, os)`; `apple_arm64_vtable`
- aliases AAPCS64 (§3.4)
- - [x] `obj_secname_*` helpers for init/fini/preinit/tdata/tbss (§3.5)
- - [x] `src/obj/macho.h` + `macho_reloc_aarch64.c` stubs (§3.1, §3.2)
- - [x] ELF suite green (test-elf 37/37, test-link 119/119, test-cg 1549/1549)
-- [x] **Phase 2** — Mach-O object writer + reader (MH_OBJECT, arm64)
- - [x] `obj/macho_emit.c` — MH_OBJECT writer, leading-`_` C-symbol
- prefix, ARM64_RELOC_ADDEND pair on non-zero addends (§3.1)
- - [x] `obj/macho_reloc_aarch64.c` — full RelocKind ↔ ARM64_RELOC_*
- table with pcrel/length companions (§3.2)
- - [x] `obj/macho_read.c` — MH_OBJECT reader with strip-leading-`_`
- inverse and ADDEND-pair collapsing (§3.1)
- - [x] `obj/obj_secnames.c` — Mach-O section names for init/fini/tls
- (§3.5)
- - [x] `abi/abi_apple_arm64.c` — `va_list` is `char*`; variadic-arg
- on-stack routing wired in `arch/aarch64.c::emit_arg_value`
- when `target.os == MACOS` (§3.4)
- - [x] `test/lib/exec_target.sh` — `<arch>-<os>` tag form, Darwin
- native branch for `aarch64-macos` (§3.6)
- - [x] Smoke test: `cfree cc -target arm64-apple-macos -c …` produces
- Mach-O `.o` that links via host clang and runs natively (exit 42)
- - [x] Self-roundtrip oracle: `test/macho/cfree-roundtrip-macho.c` —
- cfree-emitted `.o` → `read_macho` → `emit_macho` is
- byte-identical
- - [x] ELF suite still green (test-elf 37/37)
- - [x] Testing harness extensions (§7): `CFREE_TEST_OBJ` env in
- `cfree_test_target.h` and `test/{cg,link,elf}/run.sh`,
- per-case `*.targets` applicability (`test/elf/cases/18_bti_note.targets`)
- - [ ] Clang-emitted Mach-O round-trip — needs section-relative reloc
- and `__compact_unwind` handling (deferred)
-- [x] **Phase 2.5** — Linker-side Mach-O read + JIT
- - [x] `link_add_obj_bytes` / `link_add_archive_bytes` dispatch on
- `cfree_detect_fmt`, so a Mach-O `.o` (or `.a` member) parses
- through `read_macho` exactly like an ELF input does (§3.3)
- - [x] C-symbol mangling lives at the linker API boundary
- (`link_intern_c_name`, `cfree_jit_lookup`, the undef diag) —
- Mach-O on-disk names stay byte-for-byte verbatim (round-trip
- intact), and callers see the source-level form across both
- formats: `link_set_entry("test_main")` and
- `cfree_jit_lookup(jit, "test_main")` work uniformly.
- - [x] test-link paths R + J both green on `aa64-macho`
- (35 R / 31 J passing, including bad/30_undef_strong;
- path E is the remaining gap)
- - Four cases ship a `j_targets` file restricting their J path to
- ELF tuples (R + E still run on every tuple). Path J's pass/fail
- criterion in these cases depends on an ELF-specific ABI feature
- with no Mach-O analogue:
- * `21_fini_array` / `22_init_fini_both` — Mach-O destructors flow
- through `__StaticInit` + `__cxa_atexit` registration, not the
- ELF `.fini_array` shape the test and
- `test/link/harness/start.c` walk. Without a `__cxa_atexit`
- runtime in the JIT, the destructor is never invoked.
- * `25a_gc_basic` / `25d_gc_chain` — `clang -ffunction-sections`
- on Mach-O still emits a single `__TEXT,__text` per `.o`
- (`.subsections_via_symbols` is the per-symbol dead-strip
- granularity, not `-ffunction-sections`). `--gc-sections` can
- drop whole sections but not individual functions, so the
- `gc_absent unreachable_fn` check fails.
-- [~] **Phase 3** — Mach-O linker (`link_emit_macho`) — driver path
- working; test-link/E coverage pending (§9 Phase 3.1).
- - [x] `src/link/link_macho.c` — MH_EXECUTE + MH_PIE writer with
- `__PAGEZERO` / `__TEXT` / `__DATA_CONST` / `__DATA` /
- `__LINKEDIT` segments, `__TEXT,__stubs` (12-byte arm64 stubs
- through `__DATA_CONST,__got` slots), `LC_DYLD_CHAINED_FIXUPS`
- for both bind (imports) and rebase (internal abs64) fixups,
- `LC_DYLD_EXPORTS_TRIE` (single-entry minimal trie),
- `LC_SYMTAB` + `LC_DYSYMTAB` + indirect-symbol table,
- `LC_LOAD_DYLINKER` (`/usr/lib/dyld`), `LC_BUILD_VERSION`,
- `LC_UUID`, `LC_MAIN`, empty `LC_FUNCTION_STARTS` /
- `LC_DATA_IN_CODE`.
- - [x] Ad-hoc `LC_CODE_SIGNATURE` (SuperBlob + CodeDirectory v=0x20400
- with sha256 4 KiB-page hashes + execSeg fields) so the kernel
- execs the binary on macOS 11+.
- - [x] `read_macho_dso` for MH_DYLIB inputs and `read_tbd` for
- Apple's text-based-stub `.tbd` files (sniffed via leading
- `---`). `link_add_dso_bytes` dispatches on format. TBD parser
- is a token scanner — emits every `_id` token as an exported
- ObjSym; conservative but correct for the static-link decision
- (the install-name on `LC_LOAD_DYLIB` is the umbrella, and dyld
- walks re-exports at runtime).
- - [x] `driver/lib_resolve.c` — `-lname` resolves `.tbd` first, then
- `.dylib`, then `.so`, then `.a` under
- `LIB_RESOLVE_DYNAMIC_PREFER`. `driver/cc.c` routes the result
- to `dso_bytes[]` or `archives[]` by suffix and plumbs
- `ndso_bytes` through `CfreeLinkInputs`.
- - [x] `LinkImage.linker` back-pointer set by `link_resolve` so
- format-specific emit can walk `LinkInputs` (e.g. resolving an
- imported sym's `dso_input_id` to a DSO `install_name`).
- - [x] `link_set_entry` defaults to `_main` on Mach-O (vs `_start`
- for ELF), matching the LC_MAIN convention where dyld owns C
- startup.
- - [x] `link_layout.c::resolve_symbols` `is_def` widened to require
- backing storage (section / abs / common) — `extern int f();`
- from `decl.c` lands as `kind=SK_FUNC, section_id=0` and would
- otherwise be misclassified as a definition, masking the import
- and breaking CALL26 to libSystem on the in-memory pipeline.
- ELF `.o` reads were unaffected because `read_elf` already
- normalizes undefs to `SK_UNDEF`.
- - [x] `link_layout.c::boundary_name` — Mach-O target prefixes every
- linker-synthesized boundary symbol (`__init_array_start`, etc)
- with `_` so the on-disk name matches what consumer code
- compiles to under the leading-`_` mangling rule.
- - [x] `decl.c` — Mach-O target prepends `_` to every C identifier
- with linkage at obj_symbol creation time (unconditional, even
- when source name already starts with `_`, matching Apple `cc`).
- - [x] `pipeline.c::cfree_link_exe` — skip ELF `layout_dyn` for
- Mach-O targets (their LC_LOAD_DYLIB / chained-fixups machinery
- is synthesized in `link_emit_macho` instead, and ELF-shaped
- `.plt` / `.got.plt` synthetic sections would only confuse the
- Mach-O writer).
- - [x] Smoke test — `cfree cc -target arm64-apple-macos hello.c -o
- hello -lSystem` produces a runnable arm64-darwin Mach-O exe
- that calls `printf` from libSystem and exits with the right
- code (verified manually with `hello.c`, `multi.c` covering
- multiple imports + globals + bss).
- - [x] ELF test suite still green (`test-elf` 37/37).
-
----
-
-## 1. Today
-
-What exists for Mach-O already:
-
-- `ObjExtKind` carries `OBJ_EXT_MACHO` and `OBJ_EXT_COFF`
- (`src/obj/obj.h:75-81`).
-- `emit_macho` / `read_macho` (and `emit_coff` / `read_coff`) are
- declared in `obj.h:344-364` and stubbed in `src/api/stubs.c:73-94`
- — they panic with `unimplemented`.
-- `cfree_compile_obj_emit` (`src/api/pipeline.c:306-321`) already
- dispatches into them by `c->target.obj`.
-- `cfree_detect_target` recognizes Mach-O magic, reads `cputype`,
- and populates `(arch, os=MACOS, obj=MACHO)`
- (`src/api/detect.c:180-214`).
-- The driver's target parser accepts `darwin`/`macos`
- (`driver/target.c:82`); `driver/env.c:754` defaults the host OS
- on a Darwin build.
-- `RelocKind` is already per-arch in shape (e.g. `R_AARCH64_*`,
- `R_X64_*`); the per-arch ELF reloc translator is split into
- `obj/elf_reloc_<arch>.c`. Mach-O reloc translators slot in as
- peers.
-
-What is ELF-only and panics on Mach-O today:
-
-- `emit_macho` / `read_macho` themselves
- (`src/api/stubs.c:73,90`).
-- `link_emit_image_writer` (`src/link/link.c:422-432`) routes only
- `CFREE_OBJ_ELF` to `link_emit_elf`; every other case panics with
- "only ELF is implemented".
-- `link_emit_elf` writes an ET_EXEC/ET_DYN ELF (`src/link/link_elf.c`,
- `src/link/link_dyn.c`). Nothing exists for the Mach-O peer.
-- Dynamic-link plumbing (`src/link/link_dyn.c::layout_dyn`) emits
- `.interp` / `.dynsym` / `.dynstr` / `.gnu.hash` / `.rela.*` /
- `.plt` / `.got.plt` / `.dynamic` — all ELF-shaped. Mach-O has its
- own dyld machinery (LC_DYLD_INFO_ONLY / LC_DYLD_CHAINED_FIXUPS /
- LC_SYMTAB / LC_DYSYMTAB / `__la_symbol_ptr` / `__stubs`).
-- ABI vtable selection (`src/abi/abi.c:286-300`) keys on
- `target.arch` only; macOS-arm64 has a distinct ABI from AAPCS64
- (variadic-args-on-stack, `char`/`short` promoted on stack
- arguments).
-- The test harness (`test/lib/exec_target.sh`) batches via
- `podman run` against `linux/<arch>` images — a Linux-only
- execution path. Native Mach-O execution on the host is the new
- path.
-
----
-
-## 2. Target slice for first Mach-O milestone
-
-| axis | value |
-|-----------|--------------------------------------|
-| arch | `CFREE_ARCH_ARM_64` |
-| os | `CFREE_OS_MACOS` |
-| objfmt | `CFREE_OBJ_MACHO` |
-| ABI | Apple ARM64 (Darwin variant) |
-| codemodel | `CFREE_CM_SMALL` (default) |
-
-Why this slice first, not x86_64-darwin or arm64-windows:
-
-- arm64 codegen is the validated one; x64 is still on the
- `MULTIARCH.md` Phase 3 critical path.
-- Apple no longer ships x86_64 Macs; x86_64-darwin is interesting
- only as a cross target.
-- Windows is on the roadmap but is a separate format **and** a
- separate ABI; bundling it into the first Mach-O milestone would
- blur which seam each defect lives behind.
-
-The pivot once arm64-darwin lands:
-
-- **x86_64-darwin** — additive; needs `macho_reloc_x86_64.c`,
- Apple-x64 ABI vtable (close to SysV-x64 with quirks).
-- **arm64-windows / x86_64-windows** — peer of this plan with
- `coff_emit.c` / `coff_read.c` / `coff_reloc_<arch>.c` and the
- Microsoft ABI vtable. The seams below are shaped so this is
- purely additive.
-
-Out of scope for v1 of Mach-O:
-
-- Universal (fat) binaries — not a hard requirement; skip until
- someone needs them. The shape is well-defined: a fat header
- prepended to per-arch slices.
-- Bitcode embedding, codesigning, entitlements, `__LINKEDIT`
- beyond what dyld needs, debug `__DWARF` segment (covered later
- by the dwarf side, not the object side).
-- ObjC metadata sections (`__objc_*`). Not relevant for a C
- compiler.
-
----
-
-## 3. The seams
-
-### 3.1 Object writer / reader — peers of `elf_emit.c` / `elf_read.c`
-
-**Decision:** introduce `src/obj/macho_emit.c` and
-`src/obj/macho_read.c` as peers of the ELF pair, sitting behind the
-existing `emit_macho` / `read_macho` declarations in `obj.h`. No
-additional dispatch is needed at this layer — `pipeline.c:306-321`
-already routes by `target.obj`.
-
-Round-trip invariant (matching `DESIGN.md §5.5`): `read_macho` of an
-`emit_macho` output must produce an `ObjBuilder` shape-equivalent
-to the input, modulo (a) Mach-O's mandatory `(segname, sectname)`
-pairing for sections and (b) any synthesized `N_SECT` / `N_OSO`
-symbols.
-
-The neutral `ObjBuilder` model accommodates Mach-O without a
-schema break:
-
-- `Section.name` is already a single `Sym`. Mach-O writers split it
- by convention: when the name contains a comma (e.g.
- `__TEXT,__text`), the writer takes the part before the comma as
- `segname` and the part after it as `sectname`. When the name lacks
- a comma (the common case for ELF-shaped input), the writer derives
- both from `SecKind` (`SEC_TEXT` → `__TEXT,__text`, `SEC_RODATA` →
- `__TEXT,__const`, `SEC_DATA` → `__DATA,__data`, `SEC_BSS` →
- `__DATA,__bss`).
-- `SecKind` / `SecFlag` map cleanly onto Mach-O `S_*` section
- types and `S_ATTR_*` attributes. The reverse mapping (read side)
- uses the existing `Section.ext_type` / `Section.ext_flags`
- escape hatch (already present, see `obj.h:209-217`) for any
- Mach-O-only types we don't want to lose on round-trip.
-- `SymBind` / `SymKind` / `SymVis` cover Mach-O's `N_EXT` /
- `N_PEXT` / `N_TYPE` adequately; `Section.ext_kind` is set to
- `OBJ_EXT_MACHO` when reading so the writer knows to preserve
- format-specific fields. (The same escape hatch will be used by
- COFF.)
-- Symbol bookkeeping: Mach-O requires symbols to be partitioned
- into local / external-defined / external-undefined for
- `LC_DYSYMTAB`. The partitioning is computed at write time from
- the `ObjSym.bind` / `ObjSym.section_id` fields — no schema
- change.
-
-Header includes: a new `src/obj/macho.h` peer of `obj/elf.h` for
-the on-disk structures (`mach_header_64`, `segment_command_64`,
-`section_64`, `symtab_command`, `nlist_64`, `relocation_info`).
-
-### 3.2 Per-arch Mach-O reloc translator
-
-`obj/elf_reloc_<arch>.c` is the model. Add:
-
-- `src/obj/macho_reloc_aarch64.c` — `RelocKind` ↔
- `ARM64_RELOC_*` (UNSIGNED, BRANCH26, PAGE21, PAGEOFF12,
- GOT_LOAD_PAGE21, GOT_LOAD_PAGEOFF12, POINTER_TO_GOT,
- TLVP_LOAD_PAGE21, TLVP_LOAD_PAGEOFF12, ADDEND, SUBTRACTOR).
-- `src/obj/macho_reloc_x86_64.c` — `RelocKind` ↔ `X86_64_RELOC_*`
- (UNSIGNED, SIGNED, BRANCH, GOT, GOT_LOAD, SUBTRACTOR, TLV,
- SIGNED_1/2/4). Lands when x64 codegen does (post
- `MULTIARCH.md` Phase 3).
-
-Two Mach-O-specific complications the translator absorbs:
-
-- **`ARM64_RELOC_ADDEND` pairs.** Mach-O encodes addends out-of-
- band by emitting a leading `ARM64_RELOC_ADDEND` reloc carrying
- the addend, immediately followed by the real reloc. The existing
- `Reloc.pair` byte (`obj.h:226`) already exists for this kind of
- paired-reloc shape (it was added with the same semantics in
- mind). The translator emits the pair on write and collapses the
- pair on read.
-- **`ARM64_RELOC_SUBTRACTOR`** is two relocs — a SUBTRACTOR
- followed by an UNSIGNED — modeling `B - A` as the resolved
- value. Cfree's IR doesn't currently emit DWARF-style
- difference relocs (the only consumer is `eh_frame` /
- `compact_unwind` — not on the v1 critical path); leave them
- out of `cgtarget` and panic in the writer if seen. Reader
- recognizes them so a clang-built object round-trips through
- `objdump`.
-
-**Decision:** no new `RelocKind` enum entries are needed for
-Mach-O. The kinds are already arch-suffixed and the translator
-pattern keeps the format-specific encoding local. We do not split
-`R_AARCH64_PAGE21` from a hypothetical `R_MACHO_AARCH64_PAGE21` —
-the underlying semantic (the page-relative ADRP fixup) is the
-same; the translator picks the right ELF or Mach-O code.
-
-### 3.3 Linker emit dispatch
-
-`link_emit_image_writer` (`src/link/link.c:422-432`) is the seam.
-Replace the single ELF case + panic with a switch:
-
-```
-case CFREE_OBJ_ELF: link_emit_elf(img, w); return;
-case CFREE_OBJ_MACHO: link_emit_macho(img, w); return;
-case CFREE_OBJ_COFF: link_emit_coff(img, w); return; /* later */
-case CFREE_OBJ_WASM: /* later */
-```
-
-`link_emit_macho` lives in a new `src/link/link_macho.c`. It is
-**not** a thin reskin of `link_emit_elf`: the LinkImage model
-(segments / sections / symbols / reloc-applies) is largely shared,
-but Mach-O's load-command shape is wholly different from ELF
-program headers, and dyld bookkeeping is incompatible enough that
-trying to share `link_dyn.c::layout_dyn` would be more work than a
-peer.
-
-Concrete shape of `link_macho.c`:
-
-- Plan load commands: `LC_SEGMENT_64` × N (segments map to
- Mach-O segments, one per `LinkSegment`), `LC_SYMTAB`,
- `LC_DYSYMTAB`, `LC_BUILD_VERSION`, `LC_DYLD_INFO_ONLY` (or
- `LC_DYLD_CHAINED_FIXUPS` for modern dyld; pick one — see
- decision below), `LC_LOAD_DYLINKER`, `LC_LOAD_DYLIB` × N for
- imported DSOs (peer of ELF DT_NEEDED), `LC_MAIN` (or
- `LC_UNIXTHREAD` for static), `LC_FUNCTION_STARTS`,
- `LC_DATA_IN_CODE`, `LC_UUID`, `LC_SOURCE_VERSION`.
-- Synthesize `__LINKEDIT` segment containing symtab/strtab,
- dyld export trie, indirect-symbol table, function-starts
- table, code-signature placeholder.
-- Synthesize `__TEXT,__stubs` (arm64: 12-byte stubs) and
- `__DATA,__la_symbol_ptr` (lazy pointers) for imported
- function calls; arm64 BL → stub. Or, on modern macOS,
- go straight to chained fixups + non-lazy binding (no
- `__stubs`).
-- Apply relocations against the chosen image base
- (`MH_PIE`-only for v1).
-
-**Decision: `LC_DYLD_CHAINED_FIXUPS` for v1, not
-`LC_DYLD_INFO_ONLY`.** Chained fixups are the modern macOS path
-(11+); they are smaller, simpler to emit (no opcode encoder for
-the bind-info stream), and Apple has been deprecating the legacy
-path. A consequence: cfree-emitted Mach-O exes do not run on
-macOS 10.15 or older. That is acceptable for v1 — bring it up
-only if a user complains. The legacy-bind-info encoder lands later
-as an additional path, gated behind a target-min-version check.
-
-`link_dyn.c` stays ELF-only and is **not** generalized. The
-overlap with `link_macho.c` is "we both need a list of imported
-DSOs and exported symbols" — which the LinkImage already carries.
-Generalizing would force a lowest-common-denominator layer that
-serves neither side well.
-
-### 3.4 ABI vtable — widen selection to `(arch, os)`
-
-`abi.c::select_vtable` (lines 286-300) keys on `target.arch` alone.
-For aarch64 this is wrong on macOS:
-
-- Apple ARM64 passes variadic arguments on the stack only (not in
- x0-x7), and promotes `char`/`short` to `int` for stack args
- even when the type is otherwise passed register-only. AAPCS64
- passes variadic args in registers like fixed args.
-- `_Bool` is 8 bits on Darwin (matching most other platforms;
- AAPCS64 also says 8 bits, so no divergence here — but the
- rules diverge enough that the vtable must be distinct).
-
-**Decision:** widen the switch in `select_vtable` to key on the
-`(target.arch, target.os)` pair. Add `apple_arm64_vtable` in a new
-`src/abi/abi_apple_arm64.c`. When `(arch, os)` is `(ARM_64, MACOS)`,
-install it; otherwise keep AAPCS64.
-
-Mechanically:
-
-- Initial implementation can be a thin shim: copy `aapcs64_vtable`
- and override only `compute_func_info`'s variadic handling and
- the by-value-on-stack promotion rule. Avoid the temptation to
- factor a "shared aarch64 base" — two implementations with a
- shared static helper for the register classification is enough
- abstraction.
-- `va_list` on Apple ARM64 is a single `char*` (much simpler than
- AAPCS64's struct with five fields). The vtable's `va_list_type`
- hook returns the right type per ABI — already structured for
- this.
-- Apple x86_64 uses SysV-x64 with minor differences (red zone is
- the same; varargs use a different `__va_list_tag` layout? — no,
- same). When x64 lands, `apple_x64_vtable` may be a literal
- re-export of `sysv_x64_vtable`; revisit then.
-- Microsoft x64 ABI (Windows) is meaningfully different (4
- register args, shadow space, varargs in registers); it gets its
- own `ms_x64_vtable` peer when COFF lands.
-
-### 3.5 Format-aware bookkeeping in the linker layout
-
-`link_layout.c` emits a few ELF-shaped artifacts that need to be
-either generalized or dispatched:
-
-- **Build-id note** (`.note.gnu.build-id`) is ELF-specific. Mach-O
- uses `LC_UUID`. Move the build-id synthesis out of `layout` into
- a per-format hook called by the format-specific emitter.
- Decision: layout produces a 16-byte image identity (a hash of
- the post-shift section bytes), and the format emitter packages
- it as a build-id note (ELF) or `LC_UUID` (Mach-O) or a debug
- directory (COFF/PE). One source of truth for the bytes.
-- **TLS layout.** ELF uses `PT_TLS` + per-arch tpoff relocs;
- Mach-O uses `__thread_vars` / `__thread_data` / `__thread_bss`
- sections and `tlv_descriptor` records (a function pointer + key
- + offset). The TLS lowering in cgtarget is already arch-aware;
- the **section name choice** in cg/abi must become format-aware.
- Add a `target.obj`-keyed dispatch where TLS section names are
- picked.
-- **Init/fini.** ELF uses `.init_array` / `.fini_array`; Mach-O
- uses `__DATA,__mod_init_func` / `__DATA,__mod_term_func`. Same
- format-aware-section-name dispatch.
-- **Common symbols.** ELF emits `SK_COMMON` as `SHN_COMMON`;
- Mach-O lays them out into `__DATA,__common` at link time. The
- read/write paths absorb this — no `ObjBuilder` change.
-
-### 3.6 Driver: native execution path on Darwin hosts
-
-`driver/cc.c` and friends already understand `darwin`/`macos` at
-the parse layer. The execution path for tests is the only new
-thing: an arm64-darwin Mach-O executable runs natively on the
-Darwin/arm64 host, no podman, no qemu. Detection rule for
-`test/lib/exec_target.sh`:
-
-- A new `exec_target_darwin_native` predicate that returns 0 if
- the host is `darwin/<matching-arch>` and the case's target os
- is `MACOS`. Bypass the podman / qemu branches entirely; just
- `chmod +x` and exec.
-- For Linux hosts, executing macOS binaries is not supported. Mark
- cases as XFAIL on non-Darwin hosts.
-- For Darwin hosts, executing Linux binaries continues to flow
- through podman as today — Apple's Virtualization.framework via
- Podman Desktop or `podman machine` is already the working path
- for the existing aarch64-linux suite when developing on a Mac.
-
----
-
-## 4. Phasing
-
-Three phases. Each is independently mergeable; phase 1 lands with
-no behavior change, phase 2 lands a working `cfree -c` Mach-O
-output, phase 3 lands `cfree` link-to-exe.
-
-### Phase 1 — seams + format-aware bookkeeping
-
-Pure refactors. No Mach-O bytes emitted yet; ELF output unchanged
-byte-for-byte.
-
-1. **Linker emit dispatch** (§3.3). `link_emit_image_writer` (the
- one site at `link/link.c:422`) gains a switch over `target.obj`;
- `CFREE_OBJ_MACHO` and `CFREE_OBJ_COFF` panic with
- `unimplemented` (replacing the catch-all "only ELF" panic with
- per-format ones). Move the build-id synthesis out of layout
- into a format-agnostic image-identity hook (§3.5).
-2. **ABI vtable widening** (§3.4). `abi.c::select_vtable` keys on
- `(arch, os)`. Add `src/abi/abi_apple_arm64.c` with
- `apple_arm64_vtable` initially aliasing AAPCS64. The
- variadic-on-stack and small-int-promotion overrides land in
- phase 2 alongside the macho writer (so a build-it-and-see-what-
- breaks loop on a real macOS toolchain catches divergences).
- Same shape for `apple_x64_vtable` (deferred to whenever x64
- lands on Darwin).
-3. **Format-aware section-name dispatch** (§3.5). TLS,
- init/fini, and common-symbol section naming become a
- target-keyed function rather than the hard-coded ELF strings
- currently in `cg` / `abi` / `link_layout`. ELF behavior is
- unchanged; the dispatch is a fall-through to the current
- strings until a Mach-O case is added.
-4. **Mach-O headers and reloc translator stubs** (§3.1, §3.2).
- New `src/obj/macho.h` with the on-disk structures. New
- `src/obj/macho_reloc_aarch64.c` with translator stubs (panic
- on call). No behavior change — neither is reachable yet.
-
-Exit criterion: ELF test suite green and byte-for-byte identical
-output objects on a representative `test/cg/` set.
-
-### Phase 2 — Mach-O object writer + reader (MH_OBJECT)
-
-Now the writer produces real Mach-O bytes. No linker work yet —
-we validate by running a clang-built binary that links a
-cfree-emitted `.o`.
-
-1. **`obj/macho_emit.c`** — writes `MH_OBJECT` from a finalized
- `ObjBuilder`. Layout: header → load commands
- (`LC_SEGMENT_64`-with-everything, `LC_SYMTAB`,
- `LC_DYSYMTAB`, `LC_BUILD_VERSION`) → section bytes → reloc
- tables → symtab/strtab in `__LINKEDIT`. (`MH_OBJECT` keeps
- everything in one `LC_SEGMENT_64`.)
-2. **`obj/macho_reloc_aarch64.c`** — fill in the translators
- stubbed in phase 1. The `ADDEND`-pair handling on write; the
- `SUBTRACTOR`-pair handling on read.
-3. **`obj/macho_read.c`** — parses `MH_OBJECT` (and `MH_DYLIB`
- for the linker's DSO-input path) into an `ObjBuilder`.
-4. **Apple ARM64 ABI deltas.** Variadic-on-stack and small-int
- promotion in `abi_apple_arm64.c` (the Phase 1 alias is no
- longer correct once anything calls a varargs function).
-5. **Native exec helper** (§3.6). `test/lib/exec_target.sh`
- gains the Darwin-native branch.
-6. **Smoke test.** A `test/cg/` case compiles to `.o` via cfree
- targeting `arm64-apple-macos`, links via host `clang`, and
- runs natively. Greens the standard pass/fail line.
-7. **`objdump` round-trip.** A Mach-O `.o` produced by clang
- round-trips through `read_macho` → `emit_macho` and
- re-`read_macho` produces an equivalent `ObjBuilder`. This is
- the standard cfree round-trip discipline.
-
-Exit criterion: `cfree -c` for `arm64-apple-macos` produces an
-object that links via the host `ld` / `clang` into a runnable
-executable, and clang-produced Mach-O round-trips through
-cfree's reader/writer.
-
-### Phase 3 — Mach-O linker (`link_emit_macho`)
-
-Now cfree links its own Mach-O executable end-to-end, no clang.
-
-1. **`link/link_macho.c`** — `link_emit_macho(img, w)` peer of
- `link_emit_elf`. `MH_EXECUTE` + `MH_PIE`. Modern dyld path:
- `LC_DYLD_CHAINED_FIXUPS` (§3.3 decision). `__TEXT,__stubs`
- for imported function calls; `__DATA_CONST,__got` for
- imported data.
-2. **Imported-DSO load commands.** `LC_LOAD_DYLIB` per imported
- `.dylib` input (peer of ELF's `DT_NEEDED`). The Phase 1
- linker model for DSO inputs (`LINK_INPUT_DSO_BYTES`) already
- carries the soname-equivalent (Mach-O's `install_name`); on
- the read side, `read_macho` extracts it from `LC_ID_DYLIB`.
-3. **`LC_MAIN` entry.** The entry symbol resolution
- (`img->entry_sym`) already happens generically; the format
- emitter just packages it as `LC_MAIN`'s `entryoff`.
-4. **First end-to-end exe.** A `test/cg/` hello-world targeting
- `arm64-apple-macos` compiles + links via cfree and runs on
- the host. Exit code threads through the standard pass/fail
- line.
-5. **libSystem coverage.** `printf` / `errno` / `malloc` —
- linking against `libSystem.B.dylib` (the umbrella that
- re-exports libc, libm, libdyld, libpthread). Sysroot extraction
- for the Darwin SDK lives in a new `test/sdk/macos/` peer of
- `test/libc/{musl,glibc}/` — `xcrun --show-sdk-path` is the
- source on Darwin hosts; cross-from-Linux is out of scope.
-6. **Universal binaries (deferred)** — fat-header wrapper around
- per-arch slices. Lands when a user wants it, not earlier.
-
-Exit criterion: each Mach-O milestone owns a `test/cg/` case
-running natively on a Darwin/arm64 host; pass/fail line green.
-ELF suite still green.
-
----
-
-## 5. PE/COFF as a peer (forward look)
-
-The seams in §3 are sized for COFF too:
-
-- `obj/coff_emit.c` / `obj/coff_read.c` peer `obj/macho_*`.
-- `obj/coff_reloc_<arch>.c` peer `obj/macho_reloc_*`.
-- `link/link_coff.c::link_emit_coff` peer `link_macho.c`. PE
- uses optional headers + data directories instead of Mach-O's
- load commands; the LinkImage model is still adequate.
-- `abi/abi_ms_x64.c` (Win64 ABI) and a hypothetical
- `abi_ms_arm64.c` (Windows on ARM ABI) as ABI vtable peers.
-- Windows-on-Linux execution is wine-shaped; on a Windows host
- it is native. The exec helper grows a Windows native branch;
- on Linux/Darwin hosts, COFF cases default to XFAIL until a wine
- branch is added.
-
-The ABI vtable's `(arch, os)` keying naturally captures Microsoft
-ABI vs SysV vs Apple — Windows-arm64 picks `ms_arm64`, not
-`apple_arm64`, even though both are arm64.
-
-PE/COFF gets its own `MULTIOBJ_PE.md` design pass when its
-critical path opens; this doc reserves the seams.
-
----
-
-## 6. Naming conventions
-
-For the new files and exposed symbols:
-
-- Mach-O code lives under `src/obj/macho_*` and
- `src/link/link_macho.c`. Identifiers use the `macho_` prefix
- (`macho_emit`, `macho_reloc_aarch64_to`,
- `link_emit_macho`).
-- Apple ABIs use the `apple_` prefix (`apple_arm64_vtable`,
- `apple_x64_vtable`). The host OS is the discriminator; any
- future Apple-only-on-arch prefix (e.g. for an iOS-specific
- variant) would extend this — but iOS / tvOS / watchOS share
- the same ABIs as macOS for the relevant arches, so no second
- prefix is needed.
-- Mach-O reloc translators stay arch-suffixed:
- `macho_reloc_aarch64.c`, `macho_reloc_x86_64.c` (matches the
- ELF translator naming).
-- Win64 / Windows-ARM64 (deferred) use the `ms_` prefix
- (`ms_x64_vtable`, `ms_arm64_vtable`). COFF code uses the
- `coff_` prefix.
-
----
-
-## 7. Testing harness
-
-The existing `test/cg/` and `test/link/` matrices already do everything
-the Mach-O work needs to validate against — round-trip (Path R), exec
-(Path E), JIT (Path J), DWARF check (Path W). We extend that
-infrastructure rather than standing up a parallel `test/macho/` peer
-of `test/elf/`: `test-link`'s Path R covers `clang -c` → cfree-roundtrip
-→ structural diff, and Path E covers run-the-binary. The same machinery
-serves Mach-O once the harness can pick a Mach-O target and exec the
-result.
-
-### 7.1 Target selection
-
-A new `CFREE_TEST_OBJ` env var sits parallel to `CFREE_TEST_ARCH`,
-values `elf` (default) | `macho` (later `coff`). `cfree_test_target.h`
-reads it and sets `t->obj` and `t->os` together (`macho` ⇒ MACOS) so
-both the C runners and the shell drivers stay in lockstep.
-
-`test/cg/run.sh`'s clang-cross detection grows a Mach-O branch:
-
-- `elf` ⇒ `--target=<arch>-linux-gnu` as today.
-- `macho` ⇒ `--target=arm64-apple-macos` (or `x86_64-apple-macos`).
-
-### 7.2 Per-case applicability
-
-`test/cg/cases/*.arches` becomes `*.targets`, listing one
-`<arch>-<obj>` tuple per line (`aarch64-elf`, `arm64-macos`,
-`x86_64-elf`, …). Cases with no file default to "all supported
-tuples"; cases that exercise format-specific features (GNU IFUNC,
-SHT_GNU_RETAIN, ELF linker scripts) name only the tuples they
-support. The Phase-2 Mach-O allowlist starts with a small set —
-hello-world, integer arithmetic, locals, calls, varargs — and grows
-as ABI deltas and reloc-translator coverage land.
-
-### 7.3 Exec dispatch (`test/lib/exec_target.sh`)
-
-The queue tag widens from `<arch>` to `<arch>-<os>`:
-
-- `aarch64-linux` → existing podman/qemu path on a Linux container.
-- `arm64-macos` (new) → on a Darwin/arm64 host, `chmod +x && ./exe`
- natively (no podman, no qemu). On non-Darwin hosts, SKIP cleanly
- with "macOS exec requires Darwin host". Mach-O cannot be loaded
- by the Linux kernel.
-- macOS-on-Linux is unsupported and stays SKIP. Linux-on-macOS
- continues to flow through the podman path (already works on
- Darwin/arm64 via `podman machine`).
-
-### 7.4 Phase-2 Path E (linker delegation)
-
-`link_emit_macho` doesn't exist until Phase 3, so Phase-2 Path E
-delegates to host `clang`: `cfree -c case.c -o case.o` then
-`clang -o case case.o`. A new `test/lib/link_macho_via_clang.sh`
-peer of `link_exe_runner` packages this so `test/cg/run.sh` and
-`test/link/run.sh` route Mach-O cases through it. Phase 3 swaps the
-helper for cfree's own linker; cases don't change.
-
-Clang's invocation of `ld` automatically inserts an ad-hoc code
-signature, so Phase-2 binaries exec on macOS 11+ without extra steps.
-Phase 3 inherits that responsibility — see the codesigning task
-below.
-
-### 7.5 Round-trip diff
-
-Path R already runs `cfree-roundtrip` (read → write) and structural-
-diffs the input vs. the rewritten output. For ELF the diff is
-`readelf -aW | normalize.py`. For Mach-O the equivalent is
-`llvm-objdump --macho --syms --reloc --section-headers | normalize_macho.py`,
-a small new normalizer alongside `test/elf/normalize.py`. This is
-the only new Mach-O-specific test artifact Phase 2 ships — and
-because the diff is structural, it doesn't have to be byte-perfect
-against clang's output (just round-trip-stable through cfree's
-reader/writer).
-
-### 7.6 Sysroot
-
-Layer-B / Path R round-trip needs no sysroot — clang produces the
-`.o` without linking. Path E (Phase 2 via clang) needs the host SDK
-on Darwin: `xcrun --show-sdk-path` is the only sanctioned source.
-Cross-from-Linux is out of scope (Apple SDK isn't redistributable).
-A new `test/sdk/macos/` peer of `test/libc/{musl,glibc}/` handles the
-extraction, only invoked when libc-dependent cases are added (mostly
-a Phase-3 concern; Phase-2 smoke can stay freestanding).
-
-### 7.7 `make` targets
-
-No new top-level harness target. Existing `make test-link` /
-`make test-cg` honor `CFREE_TEST_OBJ` (and `CFREE_TEST_ARCH`); CI
-runs them once per supported `(arch, obj)` tuple. The default
-invocation stays `aarch64-elf` so `make test` behavior is unchanged.
-
----
-
-## 8. Validation gates
-
-A change in this plan is "done" when:
-
-- **Phase 1**: ELF test suite still green, byte-for-byte identical
- output objects on a representative set of `test/cg/` cases.
-- **Phase 2**: Mach-O `.o` produced by cfree links via host
- `clang` into a runnable arm64-darwin executable; clang-built
- `.o` round-trips through cfree's reader/writer (`test-link`
- Path R with `CFREE_TEST_OBJ=macho`); ELF suite still green.
-- **Phase 3**: `cfree -c` + `cfree` linker produces an
- arm64-darwin Mach-O exe that runs natively on the Darwin host
- (ad-hoc codesigned by `link_macho.c` so the kernel will exec
- it); per-milestone `test/cg/` cases green; ELF suite still
- green.
-
----
-
-## 9. Remaining work
-
-### 9.1 Phase 3.1 — `test-link` Path E on `aa64-macho`
-
-Phase 3 lit up the cc-driver path: `cfree cc -target arm64-apple-macos
-src.c -o exe -lSystem` produces a runnable binary. But
-`make test-link CFREE_TEST_OBJ=macho` reports **72 pass / 36 fail / 0
-skip** — the R (round-trip) and J (JIT) lanes are green across all 36
-test cases, while every E (exec) lane fails at link time with
-
- fatal: link: undefined reference to '___fini_array_end'
-
-(`__fini_array_end` shown after the Mach-O leading-`_` strip in the
-diagnostic).
-
-The harness's `start.o` is built by host clang from
-`test/link/harness/start.c`, which references the array-boundary
-symbols (`__init_array_start/end`, `__fini_array_start/end`,
-`__preinit_array_start/end`, `__cfree_ifunc_init`,
-`__start_iplt_pairs`, `__stop_iplt_pairs`). On Mach-O, clang mangles
-those with a leading `_` so the .o carries `___fini_array_end` (3
-underscores).
-
-`link_layout.c::boundary_name` already prefixes every
-linker-synthesized boundary symbol on Mach-O. But when the runner is
-exercised in isolation, those boundary symbols **never get
-synthesized** — `emit_array_boundaries` evidently runs but the
-resulting `LinkSymbol` doesn't satisfy the per-input shadow's
-`defined=0` check. Two suspects, in order:
-
-1. The fan-out in `emit_boundary_sym` matches by `Sym` equality. If
- the start.o's per-input shadow interns
- `___fini_array_end` to a different `Sym` than `boundary_name`
- produces (e.g. one path goes through `pool_intern` and another
- through `pool_intern_cstr` with a length mismatch), they wouldn't
- match. Both call sites use the same global `Pool`, so this should
- be a no-op — but worth confirming with a single byte-level
- comparison instead of the Sym-equality short-circuit.
-2. `link-exe-runner` may panic in `resolve_symbols` (before
- `emit_array_boundaries`) because the start.o's `__cfree_ifunc_init`
- undef hits a code path that doesn't tolerate it. The earlier
- widening of `is_def` (require backing storage) is the same change
- that fixed the in-memory CALL26 case; it might be triggering a
- different early panic now.
-
-Recommended next step: instrument `cfree_link_exe` to print every
-`compiler_panic` site and the LinkSymbol state at the moment of the
-first failure, then walk back from there. Stderr fprintfs from
-`link_layout.c` were observed not to reach the runner's captured
-stderr in one local repro — verify whether the runner's
-`cfree_writer` redirection is intercepting them, or use a
-`compiler_panic`-shaped marker that the runner does propagate.
-
-Other items the E lane will surface once it gets past the start.o
-link:
-
-- `21_fini_array` / `22_init_fini_both` — Mach-O destructors flow
- through `__cxa_atexit`, not the `.fini_array` shape `start.c` walks.
- Same `j_targets`-style restriction the J lane already uses; extend
- to E.
-- `25a_gc_basic` / `25d_gc_chain` — `--gc-sections` granularity is
- per-section, but Apple's clang emits a single `__TEXT,__text` per
- `.o` (subsections-via-symbols is per-symbol). Same restriction.
-- `kernel_image` cases — freestanding ELF kernels with their own
- linker scripts; not portable to Mach-O at all. `targets`
- applicability marker should drop them on `aa64-macho`.
-- `bad/` cases that probe ELF-specific malformations (`shoff_oob`,
- `wrong_class`) need either Mach-O analogues or `targets` exclusion.
-
-### 9.2 Phase 4 — `test-cg` Path E on `aa64-macho`
-
-`make test-cg CFREE_TEST_OBJ=macho` exercises every cg case end-to-end
-via Path E (compile + link + run). Phase 4 prerequisites:
-
-- `test/cg/run.sh` already routes Path E for Mach-O through
- `link_macho_via_clang.sh` (per Phase 2 §7.4); switch to cfree's own
- linker once Phase 3.1 is green.
-- `test/sdk/macos/` shim materializes `xcrun --show-sdk-path` for
- `-isysroot` and `-lSystem` resolution. No-op on a Linux host —
- cases requiring libc stay SKIP there.
-- `*.targets` audit: every cg case that's currently `aarch64-elf`-only
- should either grow `arm64-macos` or document why it's restricted
- (linker scripts, IFUNC, ELF-specific intrinsics).
-
-### 9.3 Phase 5 — x86_64-darwin
-
-Additive on top of Phases 3–4, gated on `MULTIARCH.md` Phase 3 (x64
-codegen) landing. Concrete scope:
-
-- `obj/macho_reloc_x86_64.c` — `RelocKind` ↔ `X86_64_RELOC_*`
- (UNSIGNED, SIGNED, BRANCH, GOT, GOT_LOAD, SUBTRACTOR, TLV,
- SIGNED_1/2/4). Mirror of `macho_reloc_aarch64.c`.
-- `link_emit_macho` arch dispatch — currently arm64-only at the
- cputype/stub-encoding level. Add an x86_64 branch: 5-byte
- `jmpq *got(%rip)` stubs (vs arm64's 12-byte adrp+ldr+br).
-- `apple_x64_vtable` — likely a literal re-export of `sysv_x64_vtable`
- per §3.4 design; revisit if testing reveals a quirk.
-- `CFREE_TEST_ARCH=x64 CFREE_TEST_OBJ=macho` lane in CI on a
- Darwin/x86_64 host (or skipped cleanly on Apple Silicon, since
- Rosetta-emulation of cfree-emitted binaries isn't a goal).
-
-### 9.4 Phase 6 — universal (fat) binaries
-
-Optional. Fat header wrapping per-arch `MH_EXECUTE` slices. Defer
-until a user wants `lipo`-style multi-arch output. Implementation is
-shallow — a fat header prepended to the existing slice writer, plus
-matching multi-arch reader.
-
-### 9.5 Cleanup deferred from Phase 3
-
-- The Phase-2 deferred item (clang-emitted Mach-O round-trip via
- `read_macho` → `emit_macho`) needs section-relative reloc and
- `__compact_unwind` handling. Independent of linker work; lift out
- into its own task.
-- `read_tbd` is a permissive token scanner (every `_id` becomes an
- exported sym). Tighten to filter Obj-C metadata (`_OBJC_CLASS_$_*`)
- and `R<rev>$_*` reverse-export markers if Apple ever adds a symbol
- whose textual form would clash with a real C identifier.
-- `link_macho.c` carries a few oversize cleanup `free()` calls that
- pass `0` for the byte size (the buffers came from `VEC_GROW` which
- doesn't track capacity post-hand-off). Audit — leak-equivalent on
- the panic path, harmless on success.
diff --git a/doc/REGALLOC.md b/doc/REGALLOC.md
@@ -1,346 +0,0 @@
-# REGALLOC — CG-driven spill/reload on a finite physical-register pool
-
-The single-pass codegen produces target machine code by streaming
-`CGTarget` calls. Backends expose a finite scratch-register pool
-(aarch64 today: 10 INT, 16 FP). CG must drive that pool correctly under
-arbitrary register pressure: when the pool is empty and another reg is
-needed, spill the deepest live value on the CG value stack to a frame
-slot, free its register, and proceed. When a spilled value is consumed,
-reload it first.
-
-This document defines the contract between CG and the backend and the
-residency state machine on the value stack. Opt is out of scope —
-`opt_cgtarget` panics on `spill_reg`/`reload_reg` today and that
-remains the case until the deferred Phase 3 RA pass lands.
-
----
-
-## 1. What's broken today
-
-`aa_alloc_reg` panics with "out of INT scratch (no spill yet)" once 10
-integer scratch regs are live. To work around this, CG calls
-`cg_reset_scratch` at every statement boundary
-(`parse/parse.c:1618`), which calls `aa_reset_scratch(g->target)`
-directly (`cg/cg.c:412`). That direct call is a layering violation:
-
-- It hardcodes the aarch64 backend by name.
-- When opt is in the chain, `g->target` is an `OptImpl*`, not an
- `AAImpl*`. `aa_reset_scratch` casts it as the latter and writes
- `used_int=0; used_fp=0;` at AAImpl offsets inside opt's bytes —
- silent memory corruption on every statement boundary.
-- The reset hides the fact that CG never calls `free_reg` on consumed
- SValues. The pool is recycled by fiat, not by tracking ownership.
-
-Both `cg_reset_scratch` and `aa_reset_scratch` are removed by this
-design. `free_reg` becomes load-bearing.
-
----
-
-## 2. CGTarget contract changes
-
-### 2.1 `alloc_reg` returns `REG_NONE` on exhaustion
-
-```c
-Reg (*alloc_reg)(CGTarget*, RegClass, const Type*);
-```
-
-Returns a fresh physical reg, or `REG_NONE` if the class's pool is
-empty. Backends never panic on exhaustion — that decision belongs to
-CG. Other failure modes (unknown `RegClass`, internal invariant
-violation) still panic.
-
-### 2.2 `free_reg` is real
-
-```c
-void (*free_reg)(CGTarget*, Reg);
-```
-
-Returns `r` to the backend's free-list. Idempotent calls and
-double-frees are bugs and may panic. CG owes exactly one `free_reg`
-per `alloc_reg` over the lifetime of every value.
-
-### 2.3 `spill_reg` implies `free_reg`
-
-```c
-void (*spill_reg)(CGTarget*, Operand src_reg, FrameSlot, MemAccess);
-```
-
-`src_reg` must be `OPK_REG`. The call:
-
-1. Stores the register's contents to the frame slot per `MemAccess`.
-2. Returns the register to the backend's free-list.
-
-After `spill_reg`, the caller must not use `src_reg` and must not call
-`free_reg` on it. Coupling these two operations matches every CG
-caller's intent and removes a forget-to-free foot-gun.
-
-### 2.4 `reload_reg` is independent
-
-```c
-void (*reload_reg)(CGTarget*, Operand dst_reg, FrameSlot, MemAccess);
-```
-
-`dst_reg` must already have been allocated by the caller via
-`alloc_reg`. Loads the slot's bytes into the register. The slot is not
-released by this call — CG returns it to its own slot free-list.
-Multiple reloads from the same slot are well-defined; CG does not rely
-on this but the contract permits it.
-
-### 2.5 Backend pool implementation
-
-Each `RegClass` holds a `u32` free-mask: bit `i` set means the i-th
-register in that class's contiguous range is free. The aarch64 backend
-keeps two such masks alongside the per-class base register:
-
-```c
-typedef struct RegPool {
- u32 free; /* bit i set ⇔ regs[base + i] is free */
- u8 base; /* first physical reg in the class */
- u8 nregs; /* count; bits [nregs..32) are always 0 */
-} RegPool;
-```
-
-For aarch64:
-
-- INT pool: `base = 19`, `nregs = 10`, initial `free = 0x000003FFu`
- (x19..x28).
-- FP pool: `base = 8`, `nregs = 16`, initial `free = 0x0000FFFFu`
- (v8..v23, callee-saves first then caller-saved scratch).
-
-The three pool ops are pure bit operations:
-
-```c
-static Reg pool_alloc(RegPool* p) {
- if (p->free == 0) return REG_NONE;
- u32 idx = (u32)__builtin_ctz(p->free);
- p->free &= ~(1u << idx);
- return (Reg)(p->base + idx);
-}
-
-static void pool_free(RegPool* p, Reg r) {
- u32 idx = (u32)r - p->base;
- /* Double-free is a CG bug — bit must currently be 0. */
- if (p->free & (1u << idx))
- compiler_panic(..., "free_reg: %u already free", (unsigned)r);
- p->free |= (1u << idx);
-}
-```
-
-`spill_reg` emits the store via the backend's existing store path,
-then calls `pool_free` to release the bit.
-
-`__builtin_ctz` picks the lowest free bit, so allocation order is
-deterministic and the same physical regs stay hot — matching today's
-sequential allocation order for diff stability across the test corpus.
-The 32-bit mask is sufficient: every architecture cfree targets has
-fewer than 32 scratch regs per class.
-
----
-
-## 3. CG residency model
-
-### 3.1 SValue extension
-
-```c
-typedef enum SResidency {
-    RES_INHERENT,  /* IMM / LOCAL / GLOBAL: no reg owed */
-    RES_REG,       /* op holds a Reg this SValue owns */
-    RES_SPILLED,   /* register contents stored to spill_slot;
-                      op.kind reflects the original value form
-                      (REG or INDIRECT); op.v.reg is REG_NONE */
-} SResidency;
-
-typedef struct SValue {
-    Operand     op;
-    const Type* type;
-    SResidency  res;
-    FrameSlot   spill_slot;  /* valid iff res == RES_SPILLED */
-    u8          pinned;      /* 1 = ineligible spill victim */
-} SValue;
-```
-
-`OPK_INDIRECT` lvalues hold a base register and are tracked with
-`RES_REG`. On spill the base reg is saved; on reload the SValue is
-restored to `OPK_INDIRECT` with the freshly-reloaded base. The
-deferred-load identity of the lvalue is preserved across spill/reload
-— CG does not eagerly materialize an INDIRECT to a plain rvalue just
-because it became a spill victim.
-
-`pinned` is set for the duration of one CG operation. Pinning prevents
-a freshly-reloaded operand from being chosen as the spill victim while
-CG is still arranging the other operand of a binop or call.
-
-### 3.2 Spill-slot pool
-
-CG maintains two free-lists of `FrameSlot`s, one per `RegClass`:
-
-- `RC_INT` slots are 8 bytes, 8-byte aligned.
-- `RC_FP` slots are 16 bytes, 16-byte aligned (covers `double` and
- the spilled portion of `long double`).
-
-A spill takes a slot from the free-list (allocating a fresh
-`FrameSlot` from the backend if the list is empty). A reload returns
-the slot to the free-list. Frame footprint is bounded by peak
-concurrent spills per class, which on a stack machine of typical depth
-is small.
-
-Slots are per-function. `cg_func_end` discards both free-lists.
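-
-Below is a minimal sketch of the per-class slot free-list. It assumes a
-`FrameSlot` is a byte offset and uses a hypothetical bump allocator
-(`new_frame_slot`) in place of the backend's. Returned slots are reused
-LIFO, so the spill area grows only when more slots are live at once
-than at any earlier point.
-
-```c
-#include <assert.h>
-#include <stdint.h>
-
-/* Hypothetical model: a FrameSlot is a byte offset into the spill area. */
-typedef int32_t FrameSlot;
-enum { RC_INT, RC_FP, RC_COUNT };
-#define MAX_SPILL_SLOTS 64
-
-typedef struct SpillPool {
-    FrameSlot free[RC_COUNT][MAX_SPILL_SLOTS];  /* per-class LIFO free-list */
-    unsigned  nfree[RC_COUNT];
-    int32_t   frame_size;                       /* bytes handed out so far */
-} SpillPool;
-
-/* Stand-in for the backend's FrameSlot allocator: align, then bump. */
-static FrameSlot new_frame_slot(SpillPool* sp, int32_t size, int32_t align) {
-    sp->frame_size = (sp->frame_size + align - 1) & ~(align - 1);
-    FrameSlot s = sp->frame_size;
-    sp->frame_size += size;
-    return s;
-}
-
-static FrameSlot take_spill_slot(SpillPool* sp, int cls) {
-    if (sp->nfree[cls] > 0)
-        return sp->free[cls][--sp->nfree[cls]];         /* reuse a returned slot */
-    return cls == RC_INT ? new_frame_slot(sp, 8, 8)     /* 8 bytes, 8-aligned */
-                         : new_frame_slot(sp, 16, 16);  /* 16 bytes, 16-aligned */
-}
-
-static void return_spill_slot(SpillPool* sp, FrameSlot s, int cls) {
-    assert(sp->nfree[cls] < MAX_SPILL_SLOTS);
-    sp->free[cls][sp->nfree[cls]++] = s;
-}
-```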
-
-### 3.3 Invariants
-
-After every public `cg_*` call returns:
-
-1. Every `RES_REG` SValue on the stack owns its register, and the
- number of registers held by `RES_REG` SValues equals the backend's
- allocated-register count.
-2. No SValue is `pinned`.
-3. Every `RES_SPILLED` SValue holds a slot that is *not* on the
- slot free-list.
-
-`cg_func_end` asserts the stack is empty and both free-lists are well-
-formed (every entry distinct, no aliasing with a live `RES_SPILLED`
-slot).
-
----
-
-## 4. Spill / reload algorithm
-
-### 4.1 Allocation with fallback
-
-```
-alloc_reg_or_spill(g, cls, ty) -> Reg:
-    r = target->alloc_reg(cls, ty)
-    if r != REG_NONE:
-        return r
-    victim = pick_victim(g, cls)
-    if victim == NULL:
-        panic("regalloc: no spillable victim")   // pinned-only is a CG bug
-    slot = take_spill_slot(g, cls)
-    target->spill_reg(victim.op, slot, mem_for_spill(victim, cls))
-    victim.spill_slot = slot
-    victim.res = RES_SPILLED
-    victim.op.v.reg = REG_NONE
-    r = target->alloc_reg(cls, ty)
-    assert(r != REG_NONE)
-    return r
-```
-
-`pick_victim` walks the value stack from index 0 upward and returns
-the first SValue with `res == RES_REG`, `pinned == 0`, and matching
-`RegClass`. Scanning bottom-up evicts the deepest (oldest) eligible
-value first, which matches the intuition that values near the top of
-the stack are about to be consumed.
-
-`mem_for_spill` constructs a `MemAccess` from the victim's type with
-`alias.kind = ALIAS_LOCAL` rooted at the spill slot.
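-
-The following toy model of `alloc_reg_or_spill` and `pick_victim`
-assumes a single register class with three registers. Its stand-in
-`target_*` hooks only flip mask bits: no code is emitted and slots are
-never reused. It shows the bottom-up eviction order and that pinned
-values are skipped.
-
-```c
-#include <assert.h>
-#include <stddef.h>
-#include <stdint.h>
-
-/* Toy model (hypothetical): one register class, r0..r2, bitmask pool. */
-#define REG_NONE (-1)
-typedef enum { RES_INHERENT, RES_REG, RES_SPILLED } SResidency;
-
-typedef struct SValue {
-    SResidency res;
-    int reg;         /* valid iff res == RES_REG */
-    int spill_slot;  /* valid iff res == RES_SPILLED */
-    int pinned;      /* 1 = ineligible spill victim */
-} SValue;
-
-typedef struct CG {
-    SValue   stack[16];
-    int      depth;
-    uint32_t free;       /* bit i set <=> reg i is free */
-    int      next_slot;  /* toy slot numbering; no free-list reuse */
-    int      nspills;
-} CG;
-
-static int target_alloc_reg(CG* g) {
-    if (g->free == 0) return REG_NONE;
-    int idx = __builtin_ctz(g->free);
-    g->free &= ~(1u << idx);
-    return idx;
-}
-
-/* Stand-in for the backend store: counts the spill and releases the reg. */
-static void target_spill_reg(CG* g, int reg, int slot) {
-    (void)slot;
-    g->nspills++;
-    g->free |= 1u << reg;
-}
-
-/* Bottom-up scan: the deepest unpinned register-resident value goes first. */
-static SValue* pick_victim(CG* g) {
-    for (int i = 0; i < g->depth; i++) {
-        SValue* sv = &g->stack[i];
-        if (sv->res == RES_REG && !sv->pinned) return sv;
-    }
-    return NULL;
-}
-
-static int alloc_reg_or_spill(CG* g) {
-    int r = target_alloc_reg(g);
-    if (r != REG_NONE) return r;
-    SValue* victim = pick_victim(g);
-    assert(victim != NULL);                /* pinned-only stack is a CG bug */
-    int slot = g->next_slot++;
-    target_spill_reg(g, victim->reg, slot);
-    victim->spill_slot = slot;
-    victim->res = RES_SPILLED;
-    victim->reg = REG_NONE;
-    r = target_alloc_reg(g);
-    assert(r != REG_NONE);                 /* the spill just freed one */
-    return r;
-}
-
-/* Push a fresh register-resident value (e.g. the result of a load). */
-static void push_reg(CG* g) {
-    assert(g->depth < 16);
-    int r = alloc_reg_or_spill(g);
-    g->stack[g->depth++] = (SValue){ RES_REG, r, -1, 0 };
-}
-```
-
-With three registers, the fourth push spills `stack[0]` and reuses r0.
-If `stack[1]` is pinned, the fifth push skips it and evicts `stack[2]`.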
-
-### 4.2 Reload before consumption
-
-```
-ensure_reg(g, sv):
-    if sv.res != RES_SPILLED:
-        return
-    r = alloc_reg_or_spill(g, class_of(sv.type), sv.type)
-    target->reload_reg(op_reg(r, sv.type), sv.spill_slot,
-                       mem_for_spill(sv, class_of(sv.type)))
-    return_spill_slot(g, sv.spill_slot, class_of(sv.type))
-    sv.spill_slot = FRAME_SLOT_NONE
-    if sv.op.kind == OPK_INDIRECT:
-        sv.op.v.ind.base = r
-    else:
-        sv.op = op_reg(r, sv.type)
-    sv.res = RES_REG
-```
-
-`ensure_reg` is called at the start of every operation that consumes
-a register-resident operand. The ensure-then-pin pattern is:
-
-```
-binop(g, op):
-    b = pop(g); ensure_reg(g, &b); b.pinned = 1
-    a = pop(g); ensure_reg(g, &a); a.pinned = 1
-    rd = alloc_reg_or_spill(g, class_of(result_ty), result_ty)
-    target->binop(op, op_reg(rd, result_ty), a.op, b.op)
-    a.pinned = b.pinned = 0
-    release(g, a); release(g, b)     // §4.4; no-op for RES_INHERENT operands
-    push(g, svalue_reg(rd, result_ty))
-```
-
-Pinning is symmetric within one CG call. Nothing leaks across calls.
-
-### 4.3 Termination
-
-Each `alloc_reg_or_spill` call either succeeds on first try or evicts
-exactly one unpinned `RES_REG` SValue. The pinned set is bounded
-(within one CG op, at most a handful: two source operands plus a
-destination-in-progress). With at least `pinned + 1` registers in the
-class, the second `alloc_reg` call in `alloc_reg_or_spill` is
-guaranteed to succeed. Every backend's pool comfortably exceeds this
-bound (aarch64: 10 INT, 16 FP; minimum needed: ~3).
-
-`cg_call` is the one exception to "victims live on the value stack."
-While the pop loop materializes args, popped-but-not-yet-emitted regs
-accumulate in the local `CGABIValue avs[]` array, off the value
-stack and so invisible to `pick_victim`. For a call with more
-register-class args than the pool can hold, `pick_victim` eventually
-returns NULL while genuine victims remain in `avs[]`.
-
-To handle this, `cg_call` publishes its in-flight `avs` array via
-`CG.avs_in_flight` for the duration of the pop+materialize loop.
-`alloc_reg_or_spill` falls back to `spill_avs_victim` when the stack
-victim list is empty: it picks an `OPK_REG` arg storage entry,
-evicts it through `spill_reg`, and rewrites `avs[i].storage` to
-`OPK_LOCAL` so the backend's call lowering loads from the slot. After
-`T->call`, `cg_call` walks `avs` and returns each `OPK_LOCAL` slot to
-the spill-slot free-list. Without this fallback, a 12-arg INT call on
-aarch64 (10 INT pool) would be unsatisfiable.
-
-If a backend is added with a class smaller than the pinned bound for
-some operation outside `cg_call`, that's a wiring bug; CG asserts
-after the second `alloc_reg` call in `alloc_reg_or_spill`.
-
-### 4.4 Free-on-consume
-
-Every site that pops an SValue and is done with it must release any
-register it owned:
-
-```
-release(g, sv):
-    if sv.res == RES_REG:
-        if sv.op.kind == OPK_REG:
-            target->free_reg(sv.op.v.reg)
-        else if sv.op.kind == OPK_INDIRECT:
-            target->free_reg(sv.op.v.ind.base)
-    else if sv.res == RES_SPILLED:
-        return_spill_slot(g, sv.spill_slot, class_of(sv.type))
-    /* RES_INHERENT: nothing owed */
-```
-
-`cg_drop`, the result-discard path of `cg_store`, the operand pops in
-every `cg_binop`/`cg_unop`/`cg_cmp`/`cg_call`/`cg_load`/`cg_addr`/etc.
-all flow through `release`. This audit accounts for most of the diff:
-CG has many `pop` sites, and today they leak.
-
----
-
-## 5. Removed mechanisms
-
-The following are deleted as part of this change:
-
-- `cg_reset_scratch` (`cg/cg.c`) — no replacement; `release` in §4.4
- makes it unnecessary.
-- The `extern void cg_reset_scratch(CG*)` declaration and the five call
- sites in `parse/parse.c`.
-- `aa_reset_scratch` (`arch/aarch64.c`) and its forward declaration in
- `cg/cg.c`.
-- The "(no spill yet)" panics in `aa_alloc_reg` — replaced by
- `return REG_NONE`.
-- The `aa_panic("spill_reg")`/`aa_panic("reload_reg")` stubs —
- replaced by real STR/LDR implementations that route through the
- backend's existing store/load paths.
-
-The `used_int` / `used_fp` fields on `AAImpl` are repurposed. Instead
-of monotonically incrementing allocation cursors (where free was a
-no-op), they now track one past the highest pool index ever allocated
-per class. That is the high-water mark the prologue/epilogue needs to
-size the callee-save area. The actual free lists are the `int_free` / `fp_free`
-bitmasks added alongside.
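-
-Below is a sketch of how the high-water marks could size the
-callee-save area. It assumes one 8-byte save per register up to the
-mark and follows the AAPCS64 rules: x19..x28 are callee-saved, and of
-the FP pool only the low 64 bits of v8..v15 are. `AAUsage`,
-`note_alloc`, and `callee_save_bytes` are hypothetical names, and a real
-prologue may pair saves with `stp`.
-
-```c
-#include <assert.h>
-
-/* Hypothetical slice of AAImpl: per-class high-water marks only. */
-typedef struct AAUsage {
-    unsigned used_int;  /* one past the highest x19-based index ever allocated */
-    unsigned used_fp;   /* one past the highest v8-based index ever allocated */
-} AAUsage;
-
-/* Called on every allocation; frees never lower the mark. */
-static void note_alloc(unsigned* used, unsigned idx) {
-    assert(idx < 32);                /* pool masks are 32 bits wide */
-    if (idx + 1 > *used) *used = idx + 1;
-}
-
-/* Bytes of callee-save area: every INT pool reg up to the mark, but only
- * d8..d15 of the FP pool (v16..v23 are caller-saved scratch). */
-static unsigned callee_save_bytes(const AAUsage* u) {
-    unsigned nint  = u->used_int;
-    unsigned nfp   = u->used_fp < 8 ? u->used_fp : 8;
-    unsigned bytes = 8u * (nint + nfp);
-    return (bytes + 15u) & ~15u;     /* keep sp 16-byte aligned */
-}
-```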
diff --git a/doc/cg_testing.md b/doc/cg_testing.md
@@ -1,308 +0,0 @@
-# cg / CGTarget / MCEmitter test strategy
-
-How we test the codegen stack — `cg`, `CGTarget`, and `MCEmitter` — *before*
-the C parser is ready, and how the suite extends naturally once the parser
-arrives. Companion to `DESIGN.md`. Scope: harness shape, layering rationale,
-test paths, fixture API, and a coverage corpus.
-
-## 1. Goal
-
-Test the meat of the single-pass compiler — recursive-descent parser ↔ `cg`
-↔ `CGTarget` ↔ `MCEmitter` ↔ `ObjBuilder` — with the parser stubbed out.
-The fixtures play the parser's role: each one drives `cg` (and a small
-number drive `CGTarget` or `MCEmitter` directly) to produce a function
-named `test_main` that returns an `int`. The harness then runs that
-function and checks the return value.
-
-This decouples codegen development from parser development: we can build
-out the AArch64 backend, opt's recording wrapper, MCEmitter encoding, and
-CG's value-stack/spill/fusion logic against behavioral oracles, without
-needing a working C front-end.
-
-## 2. Why three layers (and how to test each)
-
-```
-parser → cg → CGTarget → MCEmitter → ObjBuilder
-         |    |          |           |
-         |    |          |           └─ already covered by test/elf, test/link
-         |    |          └─ encoding, fixups, relocs, alignment, CFI
-         |    └─ typed lowering vtable: takes resolved Operands
-         └─ TCC-style value stack: spills, fusion, conversions, frame-residency
-```
-
-- **`MCEmitter`** is the lowest layer. Encoding-table bugs and reloc/fixup
- bugs surface here. Best tested with hand-written byte sequences and a
- one-instruction `test_main` (analogue of `test/elf/unit/smoke.c`).
-- **`CGTarget`** is typed lowering. It receives `Operand`s that are
- already resolved (REG / IMM / LOCAL / GLOBAL / INDIRECT). No value
- stack, no implicit conversions. Best tested with focused unit anchors
- that exhaustively exercise operand-kind combinations and op enums
- (every `BinOp`, every `ConvKind`, every `MemOrder`).
-- **`cg`** owns C-shaped behavior: value stack, spill/reload across
- pressure, `cmp` → `cmp_branch` fusion, frame-residency-by-default for
- locals, implicit MemAccess derivation, address-taken tracking. This is
- what the parser will drive, so the **primary** suite drives `cg`.
-
-The three layers map to three case categories:
-
-| Category | Drives | Where | Volume |
-|---|---|---|---|
-| Primary | `cg.h` | `test/cg/cases/` | grows with C language coverage |
-| Unit | `CGTarget` | `test/cg/unit/cgt_*` | one per method group, write-once |
-| Unit | `MCEmitter` | `test/cg/unit/mc_*` | one per encoding family, write-once |
-
-When a cg-driven case fails mysteriously, the matching unit anchor tells
-you whether the bug is in CGTarget/MCEmitter or in `cg` itself.
-
-## 3. Test paths per fixture
-
-Each fixture is run through several paths. This mirrors the R/E/J path
-matrix in `test/link/run.sh` and reuses its harness binaries.
-
-| Path | Pipeline | Validates | Available on |
-|---|---|---|---|
-| **D** direct-JIT | fixture → `ObjBuilder*` → `link_add_obj` → `cfree_link_jit` → call `test_main` | live ObjBuilder → JIT path; fastest; no file I/O | aarch64 host |
-| **R** roundtrip | fixture → `emit_elf` → bytes → `cfree-roundtrip` → `read_elf` → normalized diff | ELF writer/reader fidelity on synthetic input | always (host-arch agnostic) |
-| **E** exec | fixture → `emit_elf` → bytes → `link-exe-runner` → exe → qemu/podman → exit code | file linker + reloc application + ELF emission | when qemu or podman available |
-| **J** jit-via-file | fixture → `emit_elf` → bytes → `jit-runner` (reads .o) → call `test_main` | full file → JIT pipeline | aarch64 host |
-| **O** opt-wrapped | as D and J, but with `opt_cgtarget` between `cg` and target | IPO + lowering preserve behavior | once opt lands |
-
-Path D is intentionally distinct from J: it catches bugs that the ELF
-emitter writes into a `.o` but the reader silently corrects (or
-vice-versa). The two paths together force the post-finalize ObjBuilder
-shape and the read-back ObjBuilder shape to be behaviorally equivalent.
-
-The R, E, and J paths reuse the existing harness binaries unchanged
-(`cfree-roundtrip`, `link-exe-runner`, `jit-runner`). Only `cg-runner` is
-new.
-
-## 4. Layout
-
-```
-test/cg/
-  CORPUS.md                # coverage matrix (mirrors test/elf/CORPUS.md)
-  run.sh                   # per-fixture: D, R, E, J (and O once opt lands)
-  harness/
-    cg_runner.c            # multi-mode binary
-    cg_test.h / cg_test.c  # fixture API used by every case
-    cases.c                # registry: { name, build_fn, expected, flags }
-    start.c                # symlink/copy of test/link/harness/start.c
-  cases/
-    a01_return_const.c     # cg-driven cases (the primary suite)
-    a02_return_void.c
-    c01_add_const.c
-    c02_arith_mix.c
-    ...
-  unit/
-    mc_smoke.c             # MCEmitter-direct
-    cgt_load_imm.c         # CGTarget-direct anchors
-    cgt_binop_int.c
-    ...
-```
-
-`cg-runner` modes:
-
-```
-cg-runner --list # print every registered case name
-cg-runner --emit NAME OUT.o # build the case's ObjBuilder, emit_elf to OUT.o
-cg-runner --jit NAME # build, link_add_obj, cfree_link_jit, call test_main
-cg-runner --dump NAME # build, emit_elf, run ArchDisasm over .text — no oracle
-```
-
-The `--dump` mode is for debugging only; it has no expected output and is
-not run by default. Snapshot/golden disassembly tests are deferred.
-
-## 5. Fixture API
-
-The fixture API hides the boilerplate that every case shares: Compiler
-init, ObjBuilder allocation, `cgtarget_new`, `mc_new`, the `CGFuncDesc`
-+ `ABIFuncInfo` dance for `test_main`, and the value-stack vs CGABIValue
-plumbing for `ret`.
-
-```c
-/* test/cg/harness/cg_test.h */
-
-typedef struct CgTestCtx CgTestCtx;
-typedef void (*CgCaseFn)(CgTestCtx*);
-
-typedef struct CgCase {
-    const char* name;      /* "a01_return_const" */
-    CgCaseFn    build;
-    int         expected;  /* test_main's return value; 0 if absent */
-    unsigned    flags;     /* CG_CASE_* */
-} CgCase;
-
-extern const CgCase cg_cases[];  /* registry; ends with a name == NULL entry */
-extern unsigned     cg_cases_count;
-
-/* Common interned types; populated by cg_test_init. */
-extern const Type *T_VOID, *T_I8, *T_I16, *T_I32, *T_I64, *T_U32, *T_U64,
- *T_F32, *T_F64, *T_PTR_VOID;
-
-/* Helpers cases use. */
-CG* cgtest_cg(CgTestCtx*);
-CGTarget* cgtest_target(CgTestCtx*);
-Compiler* cgtest_compiler(CgTestCtx*);
-
-/* Begin a function `<ret_ty> test_main(void)`. Allocates the ObjSymId,
- * builds CGFuncDesc, queries TargetABI, calls cgtarget->func_begin. */
-typedef struct CgTestFn CgTestFn;
-CgTestFn* cgtest_begin_main(CgTestCtx*, const Type* ret_ty);
-
-/* Begin a named function with parameters; for cases that need helpers. */
-CgTestFn* cgtest_begin_fn(CgTestCtx*, const char* name,
- const Type* ret_ty,
- const Type* const* param_tys, unsigned nparams);
-
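-/* Return a value sitting on top of cg's value stack; ret_ty must match. */
-void cgtest_ret_value(CgTestFn*);
-void cgtest_ret_void (CgTestFn*);
-
-/* Return a value held in a raw register (CGTarget-direct unit cases). */
-void cgtest_ret_reg(CgTestFn*, Reg r, const Type* ret_ty);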
-
-/* End the function (cgtarget->func_end). */
-void cgtest_end(CgTestFn*);
-
-/* Operand sugar (used by CGTarget-direct unit cases). */
-Operand IMM (i64 v, const Type*);
-Operand REG_OF(Reg r, const Type*);
-Operand LOCAL_(FrameSlot, const Type*);
-Operand GLOBAL_(ObjSymId, i64 addend, const Type*);
-Operand IND (Reg base, i32 ofs, const Type*);
-```
-
-A primary cg-driven case is then ~10 lines:
-
-```c
-/* test/cg/cases/c01_add_const.c — int test_main(void) { return 1 + 2; } */
-#include "cg_test.h"
-
-static void build(CgTestCtx* ctx) {
-    CgTestFn* tf = cgtest_begin_main(ctx, T_I32);
-    CG* g = cgtest_cg(ctx);
-    cg_push_int(g, 1, T_I32);
-    cg_push_int(g, 2, T_I32);
-    cg_binop(g, BO_IADD);
-    cgtest_ret_value(tf);
-    cgtest_end(tf);
-}
-
-CG_CASE_REGISTER(c01_add_const, build, /*expected=*/3);
-```
-
-A CGTarget-direct unit anchor:
-
-```c
-/* test/cg/unit/cgt_load_imm.c */
-#include "cg_test.h"
-
-static void build(CgTestCtx* ctx) {
-    CgTestFn* tf = cgtest_begin_main(ctx, T_I32);
-    CGTarget* a = cgtest_target(ctx);
-    Reg r = a->alloc_reg(a, RC_INT, T_I32);
-    a->load_imm(a, REG_OF(r, T_I32), 0xdeadbeefLL);
-    /* hand-build CGABIValue for ret */
-    cgtest_ret_reg(tf, r, T_I32);
-    cgtest_end(tf);
-}
-
-CG_CASE_REGISTER(cgt_load_imm, build, /*expected=*/(int)0xdeadbeef);
-```
-
-`CG_CASE_REGISTER` is a macro that appends to the registry via a constructor
-section or a manual table in `cases.c` (depending on freestanding constraints).
-Since cfree itself is freestanding-only but the test runner is hosted, we can
-use `__attribute__((constructor))` here; if that's undesirable we maintain
-`cases.c` by hand as a list.
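-
-Here is one way `CG_CASE_REGISTER` could work with the GCC/Clang
-`__attribute__((constructor))` extension. The sketch assumes the
-registry is a mutable table, since the `const` array declared above
-cannot be appended to at run time. `cg_case_table`, `cg_case_add`, and
-`cg_case_find` are hypothetical names.
-
-```c
-#include <assert.h>
-#include <stddef.h>
-#include <string.h>
-
-typedef struct CgTestCtx CgTestCtx;
-typedef void (*CgCaseFn)(CgTestCtx*);
-
-typedef struct CgCase {
-    const char* name;
-    CgCaseFn    build;
-    int         expected;
-    unsigned    flags;
-} CgCase;
-
-/* Mutable table (an assumption): constructors append to it before main(). */
-#define CG_MAX_CASES 1024
-static CgCase   cg_case_table[CG_MAX_CASES];
-static unsigned cg_case_count;
-
-static void cg_case_add(CgCase c) {
-    assert(cg_case_count < CG_MAX_CASES);
-    cg_case_table[cg_case_count++] = c;
-}
-
-/* One constructor per case. The trailing struct declaration absorbs the
- * `;` written after each use. */
-#define CG_CASE_REGISTER(NAME, FN, EXPECTED)                             \
-    __attribute__((constructor)) static void cg_register_##NAME(void) {  \
-        cg_case_add((CgCase){ #NAME, FN, (EXPECTED), 0 });               \
-    }                                                                    \
-    struct cg_case_semicolon_##NAME
-
-static const CgCase* cg_case_find(const char* name) {
-    for (unsigned i = 0; i < cg_case_count; i++)
-        if (strcmp(cg_case_table[i].name, name) == 0) return &cg_case_table[i];
-    return NULL;
-}
-
-/* Example registration with a hypothetical no-op build function. */
-static void build_add(CgTestCtx* ctx) { (void)ctx; }
-CG_CASE_REGISTER(c01_add_const, build_add, /*expected=*/3);
-```
-
-Because each registration is a `static` constructor, case files need no
-shared edits. The trade-off is that registry order follows link order,
-not file order.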
-
-## 6. CORPUS coverage groups
-
-Each group has a one-line description here; CORPUS.md will expand each
-group into explicit cases. **Initial landing covers groups A and C only**;
-that's the spine that proves the harness works end-to-end.
-
-| Group | Surface | Examples |
-|---|---|---|
-| **A** Lifecycle / return | `func_begin/end`, `ret`, return widths, sret | const return; void return; multiple returns; i8/i16/i32/i64/struct return |
-| **B** Frame slots / params | `frame_slot`, `param`, address-taken, byval, sret-param | sum two int params; spill 9 params; address-taken local; struct byval |
-| **C** Arithmetic | `binop` (all), `unop` | IADD/ISUB/IMUL; SDIV/UDIV/SREM/UREM; AND/OR/XOR; SHL/SHR_S/SHR_U; FADD..FDIV |
-| **D** Compare / branch | `cmp`, `cmp_branch`, `scope_*`, fusion | cmp materialize 0/1; cmp_branch eq/ne/lt; structured if; structured if-else |
-| **E** Convert | `convert` (all `ConvKind`) | sext/zext/trunc; itof/ftoi; fext/ftrunc; bitcast |
-| **F** Memory | `load`/`store`/`addr_of`/`copy_bytes`/`set_bytes`/bitfield | load/store i8..i64; global rw; indirect rw; struct copy; memset zero; bitfield_load/store; volatile |
-| **G** Calls | `call`, `ret`, ABI | direct + indirect; sret; byval arg; mixed gp+fp; >reg-count args; varargs call |
-| **H** Control flow | `label`/`jump`/`scope_*`/`break_to`/`continue_to` | flat if; while; for-with-continue lands on inc; do/while; nested loops; goto |
-| **I** alloca | `alloca_` | __builtin_alloca round-trip |
-| **J** Variadics | `va_start_/va_arg_/va_end_/va_copy_` | walk va_list with int args; va_copy |
-| **K** Atomics | `atomic_load/store/rmw/cas/fence`, `MemOrder` | per op × per order matrix (sampled) |
-| **L** Intrinsics | `intrinsic` × `IntrinKind` | popcount/ctz/clz; bswap; expect (no-op); add_overflow multi-result |
-| **M** Inline asm | `asm_block` | "r"/"=r" round-trip; "memory" clobber forces flush |
-| **N** TLS | `tls_addr_of` | thread_local read/write |
-| **O** Sections / globals | `frame_slot` + `DeclTable` | .data init; .bss tentative; .rodata constant pool; -ffunction-sections |
-| **P** set_loc / debug | `set_loc`, MCEmitter line program | line numbers reach .debug_line (oracle TBD) |
-| **Q** Multi-function (1 TU) | multiple func_begin/end | fn calls fn; mutual recursion; forward-declared callee |
-| **R** opt-wrapped equivalence | `opt_cgtarget` | every other group's exit codes preserved through opt |
-
-Group order roughly tracks priority. A is the prerequisite for every
-other group; B + C + D are needed for almost any non-trivial case;
-F + G + H bring us to "real C". K, M, N, P, R can land later without
-blocking the rest.
-
-## 7. Initial landing — Spine (A + C)
-
-Concrete first cases. Each is `int test_main(void) { return ...; }`.
-
-| Case | Body | expected |
-|---|---|---|
-| `a01_return_const` | `return 42;` | 42 |
-| `a02_return_zero` | `return 0;` | 0 |
-| `a03_return_neg` | `return -7;` | -7 (i.e. 249 mod 256 from runner) |
-| `a04_return_i8` | `return (i8)200;` (sign-extending widening to `int`) | -56 (i.e. 200 mod 256 from runner) |
-| `c01_add` | `return 1 + 2;` | 3 |
-| `c02_sub_mul` | `return 7 * 3 - 4;` | 17 |
-| `c03_div_mod` | `return 23 / 4 + 23 % 4;` | 8 |
-| `c04_bitwise` | `return (~3) & 0xff;` | 252 |
-| `c05_shift` | `return (1 << 5) | (16 >> 1);` | 40 |
-
-Plus unit anchors:
-
-| Case | Layer | Notes |
-|---|---|---|
-| `mc_smoke` | MCEmitter | one-insn `test_main`; analogue of `test/elf/unit/smoke.c` |
-| `cgt_load_imm_ret` | CGTarget | `alloc_reg` + `load_imm` + `ret` only, no `cg` |
-| `cgt_binop_iadd` | CGTarget | `binop(BO_IADD, REG, IMM)` only, no `cg` |
-
-These exit-code oracles are enough to drive AArch64 backend bring-up
-through return + integer arith. The R/E/J/D paths from §3 give us
-ELF-roundtrip, link-exe + qemu, JIT, and direct-JIT coverage for free.
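-
-A path that observes a process exit status sees only the low 8 bits of
-`test_main`'s return, which is why `a03_return_neg` expects 249. A
-minimal comparison helper might look like the sketch below. Its names
-are hypothetical, and it assumes the status has already been extracted
-(e.g. via `WEXITSTATUS`).
-
-```c
-#include <assert.h>
-
-/* Exit status as the runner sees it: only the low 8 bits survive exit().
- * Conversion to unsigned is modular in C, so -7 -> 249. */
-static int exit_status_of(int ret) {
-    return (int)((unsigned)ret & 0xFFu);
-}
-
-/* Compare a case's `expected` against an already-extracted exit code. */
-static int exit_status_matches(int expected, int status) {
-    return exit_status_of(expected) == (status & 0xFF);
-}
-```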
-
-## 8. Build integration
-
-Add a `test-cg` target in `test/test.mk`:
-
-```make
-test-cg: lib bin-soft
-	bash test/cg/run.sh
-```
-
-Wire into the `test` aggregate. `cg-runner` builds against `libcfree.a`
-the same way `link-exe-runner` and `jit-runner` do
-(`$(CC) $(DRIVER_CFLAGS) ... $(LIB_AR)`).
-
-`run.sh` walks `cg-runner --list`, then per case runs paths D, R, E, J
-(and later O), reporting one PASS/FAIL/SKIP line per (case, path) pair —
-identical to `test/link/run.sh`. Skip-vs-fail is governed by
-`CFREE_TEST_ALLOW_SKIP`, matching the existing convention.
-
-## 9. Non-goals (this strategy)
-
-- **Encoding/disassembly snapshots.** Deferred. Behavioral exit codes
- only; we'll add encoding lock-ins for specific instruction-selection
- guarantees later, surgically.
-- **Negative tests.** Not in this harness. CG/CGTarget contract
- violations are covered by assertions inside the implementations and by
- parser-level diagnostics tests once the parser exists.
-- **C-source-level tests.** Those arrive once the parser is real; they
- share the R/E/J paths but bypass the cg-runner fixture API. The
- fixtures here continue to exist as bisection anchors.
-- **Cross-arch.** AArch64 only for now. The harness is per-arch (cases
- may be reused if encodings differ but ABI does not); a future x86_64
- pass duplicates the runner with target = x86_64 and the same case set
- modulo arch-specific overrides.
diff --git a/doc/linker-status.md b/doc/linker-status.md
@@ -1,238 +0,0 @@
-# Linker / JIT / ELF status
-
-Tracks the three behavioral harnesses that share the link + obj surface:
-
-- `make test-elf` — ELF object-file fidelity (read / write / roundtrip).
-- `make test-link` — link + JIT (R/E/J paths per case).
-- `make test-cg` — codegen + JIT (D/R/E/J paths per case).
-
-`test-elf` is **strictly object-file fidelity**. Linker and exe behavior
-live in `test/link/` — they are not duplicated in `test/elf/`.
-
----
-
-## Current results
-
-| Harness | Pass | Fail | Notes |
-|--------------------------|-----:|-----:|--------------------------------------|
-| `test-elf` | 37 | 0 | All Layer A/B/C green |
-| `test-link` R (aa64) | 38 | 0 | object roundtrip via cfree-roundtrip |
-| `test-link` E (aa64) | 37 | 0 | qemu/podman aarch64 exec, incl. IFUNC |
-| `test-link` J (aa64) | 38 | 0 | JIT in-process incl. GC subgroup, IFUNC, TLS |
-| `test-link` R (rv64) | 38 | 0 | object roundtrip via cfree-roundtrip |
-| `test-link` R (aa64-macho) | 36 | 0 | Mach-O object roundtrip via cfree-roundtrip-macho (3 cases SKIP-NA: ELF-only) |
-| `test-link` E (rv64) | 38 | 0 | qemu/podman riscv64 exec, incl. IFUNC + TLS |
-| `test-link` bad | 2 | 0 | `bad/30_undef_strong` (E + J) |
-| `test-musl` | 6 | 0 | musl 1.2.5 static + dynamic: syscall, errno, printf |
-| `test-glibc` | 3 | 0 | glibc 2.36 dynamic: syscall, errno, printf |
-
-(R = roundtrip; E = link → ELF executable → qemu/podman, aarch64 or riscv64; J = JIT in-process.)
-
-`test-musl` links real C against pinned musl 1.2.5 in two variants:
-**static** (libc.a + cfree's own `rt/build/aarch64-linux/libcfree_rt.a`,
-classic ET_EXEC) and **dynamic** (libc.so + Scrt1.o, ET_DYN PIE with
-PT_INTERP / PT_DYNAMIC / PLT / .got.plt and BIND_NOW resolution
-against the runtime loader). Both variants run the result under
-qemu/podman. Sysroots are produced by `test/libc/musl/Containerfile`
-(Alpine 3.20 + musl 1.2.5-r3) and `test/libc/glibc/Containerfile`
-(Debian bookworm + glibc 2.36); cases are shared under
-`test/libc/cases/`. Excluded from the default `make test` because
-they need podman.
-
----
-
-## What works today
-
-`cfree ld` links real aarch64-linux executables in both **static**
-ET_EXEC and **dynamic** ET_DYN PIE shapes, including against real
-musl libc.a / libc.so + cfree's own `libcfree_rt.a`. `printf("hello,
-musl")` works end-to-end against the runtime loader
-(`/lib/ld-musl-aarch64.so.1`). Beyond that:
-
-- **Reloc kinds applied (AArch64):** ABS{16,32,64}, PREL{16}, REL32,
- PC32, CONDBR19, TSTBR14, LD_PREL_LO19, ADR_PREL_LO21, JUMP26 /
- CALL26, ADR_PREL_PG_HI21{,_NC}, ADD_ABS_LO12_NC,
- LDST{8,16,32,64,128}_ABS_LO12_NC,
- ADR_GOT_PAGE / LD64_GOT_LO12_NC,
- TLSLE_ADD_TPREL_{HI12,LO12_NC}. Plus a synthetic R_ABS64 emitter
- for GOT slot fill. **Reads every reloc kind in musl 1.2.5 aarch64
- libc.a.** Dynamic emit pass also produces R_AARCH64_RELATIVE,
- R_AARCH64_GLOB_DAT, and R_AARCH64_JUMP_SLOT records (.rela.dyn /
- .rela.plt) for the runtime loader.
-- **Reloc kinds applied (RISC-V LP64):** ABS{32,64}, PC32, HI20,
- LO12_{I,S}, BRANCH, JAL, CALL / CALL_PLT (auipc+jalr pair),
- RVC_BRANCH, RVC_JUMP, TPREL_HI20, TPREL_LO12_{I,S}. Marker relocs
- (RELAX, TPREL_ADD) are accepted as no-ops; cfree does not relax.
- PCREL_HI20 / PCREL_LO12_{I,S} and GOT_HI20 are recognized in widths
- but not yet exercised by the test corpus — slot-PC pairing is
- follow-up work.
-- **Symbol resolution:** STB_GLOBAL/WEAK/LOCAL replacement strength;
- STV_HIDDEN; SHN_COMMON coalesce-to-largest; STT_FILE / STT_SECTION
- pass-through. Weak archive defs satisfy unresolved refs (matches
- GNU ld / lld; required for musl's weak `__init_tls`).
-- **Linker-synthesized symbols:** `__init_array_start/end`,
- `__fini_array_start/end`, `__tdata_start/end` (vaddrs of the .tdata
- template), `__tbss_size` (SK_ABS holding the .tbss byte count), and
- general `__start_<X>`/`__stop_<X>` for any section whose name is a
- valid C identifier.
-- **Section / segment layout:** four-bucket RX / R / RW / TLS
- partition, BSS, init/fini/preinit_array, synthetic `.got`.
- **Same-named input sections merge by first-occurrence** — required
- for `_init`/`_fini` to be contiguous when `.init` / `.fini` come
- from crti.o + crtn.o. `-ffunction-sections` / `-fdata-sections`
- flow through naturally.
-- **TLS local-exec (AArch64 + RV64):**
- `R_AARCH64_TLSLE_ADD_TPREL_{HI12, LO12_NC}` and
- `R_RISCV_TPREL_{HI20,LO12_I,LO12_S}` apply against the per-image
- TLS span; .tdata/.tbss
- sections (SHF_TLS) layout into a dedicated SEG_TLS segment with
- natural alignment preserved on PT_TLS (separate from the
- containing PT_LOAD's page align). The exe writer emits both the
- PT_LOAD (so the kernel maps the .tdata template) and a PT_TLS
- pointing at it; the AArch64 ABI's 16-byte TCB offset is folded
- into S at apply time. The freestanding `_start` (and `jit-runner`)
- build the per-thread block — TCB(16) | .tdata copy | .tbss zero —
- using the synthesized boundary symbols and `msr TPIDR_EL0`. On
- Darwin, libc routinely clobbers TPIDR_EL0, so the harness keeps
- msr → blr back-to-back with no libc calls between.
-- **Inputs:** loose `.o`, `.a` (demand + `--whole-archive`),
- `--start-group` / `--end-group` cyclic resolution.
-- **GC:** `--gc-sections` at section granularity. Roots: entry sym,
- init/fini/preinit_array, `SF_RETAIN` (`SHF_GNU_RETAIN`),
- `__start_/__stop_` referents. Edges follow per-section relocs to
- fixed point.
-- **IFUNC trampoline (JIT and ELF):** every defined `STT_GNU_IFUNC`
- symbol gets a 12-byte stub in a synthetic `.iplt` (RX) section
- and an 8-byte slot in `.igot.plt` (RW). AArch64 stub is
- `adrp x16, slot ; ldr x16,[x16,:lo12:slot] ; br x16`; RV64 stub
- is `auipc t1, hi ; ld t1, lo(t1) ; jalr x0, t1`. The RV64 stub's
- PC-rel displacement to its slot is invariant under the segment
- shift, so the bytes are pre-encoded at layout time without
- apply-time relocs. The IFUNC's vaddr is redirected
- to the stub, and cross-TU undef refs to the same name are
- re-pointed at the stub via a propagation pass at the tail of
- `layout_iplt`. JIT load calls each resolver in-process after
- applying relocs and writes the chosen implementation pointer
- into the slot. ELF emit also materializes a parallel
- `.iplt.pairs` data section (alternating `(resolver_ptr, slot_ptr)`
- u64s, filled via `R_ABS64`) plus boundary symbols
- `__start_iplt_pairs` / `__stop_iplt_pairs`, and synthesizes a
- one-entry `.preinit_array` referencing
- `__cfree_ifunc_init` (provided by `libcfree_rt.a`). Preinit
- runs strictly before any `.init_array` ctor, so user ctors
- that call IFUNCs see filled slots. The rt member is pulled
- via demand-load: `link_ingest_archives` seeds
- `__cfree_ifunc_init` into the archive wanted set whenever an
- input defines an IFUNC and `link_set_emit_static_exe` was set
- (which `cfree_link_exe` does and `cfree_link_jit` does not).
-- **Format fidelity:** ELF read+write byte-stable for the supported
- subset; `EI_OSABI=GNU` flips automatically when GNU extensions are
- present.
-- **Exe section + symbol tables:** the static ET_EXEC writer emits
- `.symtab` / `.strtab` / `.shstrtab` and a section header table.
- Defined symbols carry final absolute addresses (IMAGE_BASE + image
- vaddr); SK_FILE / SK_ABS / SK_COMMON map to SHN_ABS / SHN_COMMON;
- per-input undef-vs-canonical-def shadow records are deduped via
- `img->globals`. Per-name input sections survive into the output as
- one `(segment, name)` shdr — `.text`, `.rodata`, `.data`, `.bss`,
- `.init_array`, `.fini_array`, `.eh_frame`, `.got`, etc., named
- per their input. `nm`, `objdump -t`, `readelf -s` all work.
-- **Build-id:** an allocatable `.note.gnu.build-id` with a 16-byte
- digest goes into the headers PT_LOAD; a PT_NOTE phdr makes it
- discoverable via `dl_iterate_phdr`. The digest is FNV-1a 64 over
- each segment with two seeds, mixed into 128 bits — deterministic
- given the post-relocation segment bytes.
-- **`.eh_frame` flow-through:** input `.eh_frame` survives into the
- output with a properly named PROGBITS+ALLOC shdr at its final
- vaddr. Sufficient for `backtrace()` past the innermost frame on
- toolchains that scan `.eh_frame` linearly; fast lookup via
- `.eh_frame_hdr` + PT_GNU_EH_FRAME is still TODO (binary search
- index over FDEs).
-- **Dynamic linking against `.so` deps:** `cfree ld -pie -o out
- Scrt1.o crti.o user.o libc.so libcfree_rt.a crtn.o` produces an
- ET_DYN PIE that runs against the musl runtime loader. The driver
- parses `-dynamic-linker`, recognizes `.so` / `.so.N` positional
- inputs, and routes `-l<name>` under `-Bdynamic` to `lib<name>.so`
- before `lib<name>.a`. The link image carries a synthetic
- `.interp` / `.dynsym` / `.dynstr` / `.gnu.hash` /
- `.rela.dyn` / `.rela.plt` / `.plt` / `.got.plt` / `.dynamic`
- layout, with PT_PHDR / PT_INTERP / PT_DYNAMIC / PT_GNU_STACK
- phdrs, DT_NEEDED per consumed DSO soname, and `DF_1_NOW`
- (BIND_NOW eager binding). PLT0 + per-import 16-byte stubs are
- emitted; CALL26 / JUMP26 against an imported symbol is rewritten
- to its PLT entry, and abs / GOT-slot references against imports
- emit `R_AARCH64_GLOB_DAT` so the loader patches the resolved
- runtime address before user code runs. PIE internal abs64
- fixups emit `R_AARCH64_RELATIVE`.
-- **Driver:** `cfree ld -static -o out crt1.o crti.o user.o libc.a
- libcfree_rt.a crtn.o` works. Output is chmod 0755 on success.
-- **JIT path** runs the same resolved image in-process; MAP_JIT on
- Apple Silicon.
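-
-The build-id bullet above relies on FNV-1a 64, which is a standard hash
-(offset basis `0xcbf29ce484222325`, prime `0x100000001b3`). The doc does
-not spell out the two seeds or the mixing step. The second seed below is
-therefore an assumption, and the two lanes are simply concatenated into
-the 16-byte digest.
-
-```c
-#include <assert.h>
-#include <stddef.h>
-#include <stdint.h>
-
-#define FNV64_OFFSET 0xcbf29ce484222325ull
-#define FNV64_PRIME  0x00000100000001b3ull
-
-/* FNV-1a 64: xor the byte in, then multiply (mod 2^64). */
-static uint64_t fnv1a64(uint64_t h, const uint8_t* p, size_t n) {
-    for (size_t i = 0; i < n; i++) {
-        h ^= p[i];
-        h *= FNV64_PRIME;
-    }
-    return h;
-}
-
-/* Hypothetical 16-byte digest: two FNV-1a lanes with different seeds run
- * over every segment in order, each lane written little-endian. */
-static void build_id16(const uint8_t* const* segs, const size_t* lens,
-                       size_t nsegs, uint8_t out[16]) {
-    uint64_t a = FNV64_OFFSET;
-    uint64_t b = FNV64_OFFSET ^ 0x9e3779b97f4a7c15ull;  /* assumed 2nd seed */
-    for (size_t s = 0; s < nsegs; s++) {
-        a = fnv1a64(a, segs[s], lens[s]);
-        b = fnv1a64(b, segs[s], lens[s]);
-    }
-    for (int i = 0; i < 8; i++) {
-        out[i]     = (uint8_t)(a >> (8 * i));
-        out[8 + i] = (uint8_t)(b >> (8 * i));
-    }
-}
-```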
-
----
-
-## Gaps before this can replace GNU ld / lld
-
-Each row below would break a typical real-world Linux invocation. Roughly
-ordered by how often the gap actually bites.
-
-| Gap | What breaks | Effort |
-|-----|-------------|--------|
-| **`.eh_frame_hdr` + PT_GNU_EH_FRAME** | `.eh_frame` already flows through with a proper shdr; without `.eh_frame_hdr` libgcc/libunwind fall back to linear FDE scan, and `dl_iterate_phdr` consumers (most modern unwinders) skip the section entirely. Needs FDE parsing + sorted binary-search table emission. | medium |
-| **`.debug_*` in the exe** | No DWARF → `gdb` blind on source lines. cfree's debug pipeline ends at the obj boundary; the linker drops non-`SF_ALLOC` sections. | medium |
-| **TLSGD / TLSIE / TLSLD relocs** | Read but not applied. Needed for `-fpic` TLS or shared-lib TLS. Local-exec works; the dyn-link cut leaves GD/IE/LD as Phase 8. | medium |
-| **`cfree_link_shared` (`-shared` ET_DYN libs)** | Driver and inputs are wired (DSO read, dyn tables) but `cfree_link_shared` still panics with "not yet implemented". The parallel path through `link_exe` would only need `output_kind = SHARED`, `allow_undefined = 1`, no entry sym, DT_SONAME, exports promoted into dynsym. | small (after Phase 5) |
-| **`--export-dynamic` exports in dynsym** | Imports are in `.dynsym`; internal exports the consumer wants visible to dependents (e.g. dlopen plugins, callbacks the loader resolves) are not yet promoted. Not exercised by the musl harness. | small |
-| **Linker scripts** | `link_set_script` panics with "not yet implemented". Parser exists in `cfree_link_script_parse` but isn't wired into `link_resolve`. | medium |
-| **COMDAT-group atomicity in `--gc-sections`** | C++ inline / weak-template instantiations under `SHF_GROUP` could lose group members. C-only inputs don't exercise it. | small |
-| **`crt1.o`/`crti.o`/`crtn.o` auto-link** | Driver doesn't auto-include a C runtime; the user passes `crt1.o crti.o ... crtn.o` explicitly. Cosmetic. | small (driver-only) |
-
-**Bottom line:** for aarch64-linux executables — both static ET_EXEC
-and dynamic ET_DYN PIE against real musl — `cfree ld` is a working
-linker. STT_GNU_IFUNC in ELF output (rt-driven preinit) and BIND_NOW
-dynamic linking against `.so` deps both pass end-to-end. The next
-priorities, roughly in order:
-
-1. **`.eh_frame_hdr` + PT_GNU_EH_FRAME** — `.eh_frame` already flows
- through; building the binary-search index over FDEs unblocks fast
- unwind and `dl_iterate_phdr`-driven consumers (modern libunwind,
- libgcc's `_Unwind_Find_FDE`).
-2. **`.debug_*` in the exe** — DWARF flow-through; the linker
- currently drops non-`SF_ALLOC` sections at `section_kept`.
-3. **`cfree_link_shared`** — the dyn-table machinery is reusable
- from the exe path; producing `libfoo.so` is mostly a dispatch
- wrapper plus exports-into-dynsym.
-
-TLS GD / IE / LD modes remain Phase 8 work; lazy-binding (no
-`DF_1_NOW`) is a follow-up that needs a real `_dl_runtime_resolve`
-PLT0 — eager binding is fine for v1.
-
-The IFUNC iplt stub bytes (`0x90000010 / 0xf9400210 / 0xd61f0200`)
-are still hand-encoded inline in `layout_iplt`; moving them behind
-a per-arch `LinkArch.emit_iplt_stub` hook in `src/arch/<arch>.c`
-is bounded follow-up work — useful when a second arch lands but
-not load-bearing today.
-
----
-
-## test-link harness — speed and ergonomics
-
-`test/link/run.sh` accepts:
-
-```
-./run.sh [name_filter] [paths] # or CFREE_TEST_FILTER / CFREE_TEST_PATHS
-```
-
-`name_filter` is a substring against case dir names (e.g. `02`,
-`weak`); `paths` is any subset of `REJ` (default `REJ`). PASS/FAIL
-lines carry per-case ms timings; a totals line prints per-path wall
-time.
-
----
-
-## Build hygiene (still load-bearing)
-
-- `Makefile` uses `-MMD -MP` so header edits force dependents to rebuild.
-
-If a test result looks impossible given the source, suspect staleness
-first (`make clean && make lib && make test-link`). If that then works,
-investigate the source of staleness and fix the Makefile.
diff --git a/doc/rv64-status.md b/doc/rv64-status.md
@@ -1,140 +0,0 @@
-# RV64 codegen status
-
-Living checklist for the RISC-V (RV64IMFD, LP64D) backend (`src/arch/rv64.c`)
-and ABI (`src/abi/abi_rv64.c`). Behavioral oracles are `test/cg/` and
-`test/parse/`. Phase status:
-
-- ✅ landed
-- 🚧 in progress
-- ⬜ planned
-
----
-
-## Current test-cg results
-
-Run from an aarch64 host; D / J are skipped because they need a native
-rv64 host (same pattern as x64 on aa64). R verifies emit→reader fidelity,
-E links and runs under qemu-riscv64 via podman.
-
-| Path | Pass | Fail | Skip |
-|----------------------------|-----:|-----:|-----:|
-| R (roundtrip) | 386 | 0 | 0 |
-| E (qemu exec) | 387 | 0 | ~ |
-| D / J (native JIT) | 0 | 0 | 772 |
-
-Skips are valid: D and J require host == rv64. With
-`CFREE_TEST_ALLOW_SKIP=1`, the suite reports **773 pass, 0 fail, 768 skip**.
-
----
-
-## Phase 1 — Backend foundation ✅
-
-- ✅ `rv64_isa.h` — R/I/S/B/U/J encoders, FP (F+D), atomic (A), FCVT, FMV
-- ✅ Register pools (s2..s11 int, fs2..fs11 fp); scratches t0..t3, ft0
-- ✅ Frame layout: locals at s0 - off, callee saves below locals,
- saved-s0/ra at the top, outgoing args at sp+0
-- ✅ Prologue placeholder + func_end patch (mirrors aa64's pattern)
-- ✅ Epilogue restores sp from s0 when alloca was used
-- ✅ ABI: replaced indirect-everything stub with real LP64D scalar +
- small-aggregate classification (≤16B → up to 2 INT parts)
-- ✅ `mc.c` apply_fixup handles R_RV_BRANCH and R_RV_JAL
-
-## Phase 2 — Core ops ✅
-
-- ✅ load_imm: ADDI / LUI+ADDIW / multi-step 64-bit
-- ✅ copy, load, store, addr_of (LOCAL / INDIRECT / GLOBAL via AUIPC+PCREL)
-- ✅ binop (all int + FP add/sub/mul/div), unop, cmp, cmp_branch
-- ✅ convert: SEXT/ZEXT/TRUNC/ITOF/FTOI/FEXT/FTRUNC/BITCAST
-- ✅ Structured scopes: IF / LOOP / BLOCK
-- ✅ Calls (direct AUIPC+JALR with R_RV_CALL; indirect JALR)
-- ✅ Sret returns (caller passes dst pointer in a0; callee spills to slot)
-- ✅ alloca (const + runtime size, max_outgoing patch site)
-- ✅ Aggregate copy_bytes / set_bytes / bitfield_load / bitfield_store
-- ✅ Atomics: load/store (with LR/fence sequences), AMO via LR/SC, CAS
-- ✅ Intrinsics: memcpy/memmove/memset, popcount, ctz, clz, bswap16/32/64,
- add/sub/mul_overflow, expect, assume_aligned, prefetch, trap
-
-## Phase 3 — Variadic LP64D ✅
-
-Variadic-args calling convention with **save area contiguous with caller's
-stack args** so a single `void*` walk works for any number of args.
-
-- ✅ va_list = `void*`; va_start / va_arg / va_end / va_copy
-- ✅ va_arg handles RC_INT and RC_FP (bitcast via FMV.X.{W,D})
-- ✅ Variadic FP **args being passed** are bitcast into integer regs
- (RC_FP storage → FMV.X.{W,D} → a-reg)
-- ✅ Save area sits at the very top of the frame, above the saved-s0/ra
- pair, so [save_area, save_area+64, caller's stack] is one contiguous
- byte stream — `save_area[8]` coincides with the caller's first stack arg
-- ✅ Prologue spills only a_{next_param_int}..a7 (named int params already
- landed in their own slots; sret consumes a0 when present)
-- ✅ Stack-arg reads in `rv_param` use `caller_stack_base = 16 + 64` for
- variadic functions to skip past the save area
-
-## Phase 4 — TLS LE ✅
-
-Local-Exec model. The LUI+ADD+ADDI / TPREL_HI20+LO12 sequence and
-`__tbss_size` / TLS-image layout were already correct; the remaining
-failures (`n02_tls_store_le`, `n07_tls_bss_zero_init`) were rejected by
-qemu-riscv64-static before reaching code with the diagnostic
-"PT_LOAD with non-writable bss". Cases with no `.tdata` (only `.tbss`)
-produced an SF_TLS PT_LOAD whose `p_memsz` extended past `p_filesz=0`
-into the `.tbss` span without PF_W.
-
-- ✅ ELF PT_LOAD over an SF_TLS segment now keeps `p_memsz == p_filesz`;
- the `.tbss` extension is described exclusively by PT_TLS, which the
- loader uses to size each thread's TLS block (see `src/link/link_elf.c`
- segment phdr loop). Matches what GNU ld emits — `.tbss` consumes no
- PT_LOAD memory.
-
-## Phase 5 — test-parse on rv64 ⬜
-
-`test-parse` is the file-driven C-parser harness (`test/parse/`); it
-reuses the cg roundtrip/exec runners and exercises the parser through
-the same R/E/J/W paths. Today it runs aa64-only.
-
-- ⬜ Verify the parse runner picks up CFREE_TEST_ARCH=rv64 (likely needs
- no changes — it already uses cfree_test_target_init)
-- ⬜ Run `CFREE_TEST_ARCH=rv64 make test-parse` and triage failures
-- ⬜ Decide on opt-out filtering for arch-specific parse cases (asm
- templates, target-specific builtins). Pattern follows the per-case
- `arches` mask added to test/cg
-- ⬜ Land a phased rv64 entry in `test/parse/CORPUS.md` mirroring this doc
-
-## Phase 6 — Beyond v1 ⬜
-
-- ⬜ `mc_smoke` rv64 sibling (hand-crafted bytes that return 42)
-- ⬜ Compressed (RVC) emission when output is denser
-- ⬜ M-extension overflow detection for `mul_overflow` i64 (today panics)
-- ⬜ Zbb fast paths for popcount/ctz/clz/bswap when ABI permits
-- ⬜ Inline asm (`rv_asm_block` panics today)
-- ⬜ Real address-out-of-imm12 expansion in addr_base (panics on giant
- frames; the test corpus stays well within the imm12 window)
-
----
-
-## Known mismatches with aa64 conventions
-
-- Reg pools are 10 wide (s2..s11) vs. aa64's 10 (x19..x28). s1 (and
- fs0/fs1) are reserved/unused.
-- AUIPC+PCREL_HI20/LO12 anchor symbols (`.LpcrelHi<N>`) are emitted as
- SB_LOCAL into the current section; cfree-ld looks them up by AUIPC
- vaddr (see `src/link/link_elf.c:rv_pcrel_lo12_disp`).
-- The aarch64 backend pairs saved fp+lr via STP; rv64 has no pair-store
- so prologue is two SDs instead.
-
----
-
-## Reproduce
-
-```sh
-# rv64 cg (R-only avoids needing podman):
-make lib
-CFREE_TEST_ARCH=rv64 bash test/cg/run.sh '' R
-
-# rv64 cg (all paths, qemu-riscv64 + podman required):
-CFREE_TEST_ARCH=rv64 CFREE_TEST_ALLOW_SKIP=1 bash test/cg/run.sh '' DREJW
-
-# rv64 parse (planned):
-CFREE_TEST_ARCH=rv64 make test-parse
-```
diff --git a/src/abi/abi.c b/src/abi/abi.c
@@ -287,8 +287,7 @@ const Type* abi_va_list_type(TargetABI* a, Pool* p) {
* ARM64 / x86_64 ABIs differ from AAPCS64 / SysV-x64 in calling
* convention details (variadic-on-stack, va_list shape, stack-arg
* promotion). Microsoft x64 / Windows-on-ARM64 will land here as
- * additional (arch, OS_WINDOWS) cases when COFF support arrives.
- * See doc/MULTIOBJ.md §3.4. */
+ * additional (arch, OS_WINDOWS) cases when COFF support arrives. */
static const ABIVtable* select_vtable(Compiler* c) {
switch (c->target.arch) {
case CFREE_ARCH_ARM_64:
diff --git a/src/abi/abi_apple_arm64.c b/src/abi/abi_apple_arm64.c
@@ -1,7 +1,7 @@
/* Apple ARM64 (Darwin) ABI dispatch.
*
- * Phase 2 of doc/MULTIOBJ.md. Vtable selection keys on
- * (target.arch, target.os); (ARM_64, MACOS) lands here instead of
+ * Vtable selection keys on (target.arch, target.os);
+ * (ARM_64, MACOS) lands here instead of
* AAPCS64. The two ABIs diverge in:
*
* 1. va_list shape — Apple ARM64 `__builtin_va_list` is plain
diff --git a/src/abi/abi_internal.h b/src/abi/abi_internal.h
@@ -22,7 +22,7 @@ extern const ABIVtable aapcs64_vtable;
extern const ABIVtable sysv_x64_vtable;
extern const ABIVtable rv64_vtable;
/* Apple Darwin variants — selected when (arch, os) matches. See
- * abi.c::select_vtable and doc/MULTIOBJ.md §3.4. */
+ * abi.c::select_vtable. */
extern const ABIVtable apple_arm64_vtable;
/* Shared TargetABI internals. The struct definition is here so each ABI
diff --git a/src/arch/aarch64.c b/src/arch/aarch64.c
@@ -1939,8 +1939,7 @@ static void emit_arg_value(CGTarget* t, const CGABIValue* av, u32* next_int,
*
* Apple ARM64 (Darwin) diverges: variadic args go on the stack only.
* Detect the synthesized-vararg case and bump the next-int / next-fp
- * cursors past the register pool so the part below routes to stack.
- * See doc/MULTIOBJ.md §3.4. */
+ * cursors past the register pool so the part below routes to stack. */
ABIArgInfo va_ai;
ABIArgPart va_pt;
const ABIArgInfo* ai = av->abi;
diff --git a/src/arch/x64.c b/src/arch/x64.c
@@ -2,7 +2,7 @@
*
* Phase-2 placeholder: the vtable is wired up but every method panics.
* This proves the cgtarget_new dispatch reaches an x64-shaped target.
- * Phase 3 fills in real codegen — see doc/MULTIARCH.md §4. */
+ * Phase 3 fills in real codegen. */
#include <string.h>
diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -11,7 +11,7 @@
* - OPK_LOCAL / OPK_GLOBAL / OPK_INDIRECT are lvalues. cg_load promotes
* them to OPK_REG via target->load + a fresh scratch register.
*
- * Register pressure & spill (see doc/REGALLOC.md):
+ * Register pressure & spill:
* - Each SValue carries an SResidency tag (INHERENT / REG / SPILLED).
* REG-residing SValues own a physical scratch register that must be
* released back to the pool when the value is consumed; SPILLED
@@ -57,7 +57,7 @@
* (IMM, LOCAL, GLOBAL) carry no register obligation. REG values own a
* physical scratch register. SPILLED values had their register evicted
* to a frame slot under register pressure and must be reloaded before
- * consumption. See doc/REGALLOC.md §3. */
+ * consumption. */
typedef enum SResidency {
RES_INHERENT,
RES_REG,
@@ -102,8 +102,7 @@ struct CG {
/* Per-function spill-slot free-lists, one per RegClass. A spill takes a
* slot from the free-list (allocating fresh from the backend if empty);
* a reload returns the slot. Frame footprint is bounded by the peak
- * concurrent spills per class. Reset at func_end. See doc/REGALLOC.md
- * §3.2. */
+ * concurrent spills per class. Reset at func_end. */
struct {
FrameSlot* free;
u32 n;
@@ -287,7 +286,7 @@ static Operand op_indirect(Reg base, i32 ofs, const Type* ty) {
}
/* ============================================================
- * Register-pool & spill driver — see doc/REGALLOC.md
+ * Register-pool & spill driver
* ============================================================ */
/* Class an SValue's register lives in: RC_FP for float types, RC_INT for
diff --git a/src/link/link_image_id.c b/src/link/link_image_id.c
@@ -12,7 +12,7 @@
* - COFF/PE debug directory (deferred)
*
* Lived in link_elf.c through Phase 0; lifted out so the Mach-O writer
- * sees the same bytes (doc/MULTIOBJ.md §3.5). */
+ * sees the same bytes. */
#include "core/core.h"
#include "link/link_internal.h"
diff --git a/src/link/link_internal.h b/src/link/link_internal.h
@@ -313,8 +313,7 @@ void link_reloc_apply(Compiler*, RelocKind, u8* P_bytes, u64 S, i64 A, u64 P);
/* Public link_emit_image_writer dispatches by Compiler.target.obj. The
* ELF implementation lives in link_elf.c and dispatches internally on
* Compiler.target.arch for e_machine and reloc translation. The Mach-O
- * peer (link_macho.c) and COFF peer arrive in later phases of
- * doc/MULTIOBJ.md. */
+ * peer (link_macho.c) and COFF peer arrive in later phases. */
void link_emit_elf(LinkImage*, Writer*);
void link_emit_macho(LinkImage*, Writer*);
diff --git a/src/obj/macho.h b/src/obj/macho.h
@@ -1,6 +1,6 @@
/* Mach-O wire-format constants, structs, and per-arch reloc translators
* shared between obj/macho_emit.c, obj/macho_read.c, and link/link_macho.c
- * (none of which exist yet — see doc/MULTIOBJ.md).
+ * (none of which exist yet).
*
* Private to src/. The public ObjBuilder/Linker surface is format-neutral
* (obj/obj.h, link/link.h); the Mach-O spelling of those abstractions only
@@ -252,7 +252,7 @@ typedef struct MachRelocInfo {
/* Map cfree-canonical RelocKind <-> arm64 Mach-O reloc type. Returns
* (u32)-1 on unsupported kinds; the caller (emit_macho / read_macho)
* panics with a diagnostic. Stubs in macho_reloc_aarch64.c until the
- * Phase 2 writer lands (see doc/MULTIOBJ.md). */
+ * Phase 2 writer lands. */
u32 macho_aarch64_reloc_to(u32 kind /* RelocKind */);
u32 macho_aarch64_reloc_pcrel(u32 kind /* RelocKind */);
u32 macho_aarch64_reloc_length(u32 kind /* RelocKind */);
diff --git a/src/obj/macho_emit.c b/src/obj/macho_emit.c
@@ -15,7 +15,7 @@
*
* 64-bit little-endian only. Big-endian / 32-bit panics at entry.
*
- * See doc/MULTIOBJ.md §3.1 for the round-trip invariant: read_macho of
+ * Round-trip invariant: read_macho of
* this output must produce an ObjBuilder shape-equivalent to the input,
* modulo (a) Mach-O's mandatory (segname, sectname) pairing and (b)
* any synthesized N_SECT symbols. The (segname,sectname) form chosen
diff --git a/src/obj/obj.h b/src/obj/obj.h
@@ -379,7 +379,7 @@ void obj_symiter_free(ObjSymIter*);
* helpers pick the right name for the active target.obj so the linker
* doesn't carry per-format switches at every synthesis site. ELF
* returns the historical names; Mach-O / COFF panic until those
- * writers land (see doc/MULTIOBJ.md §3.5). */
+ * writers land. */
Sym obj_secname_init_array(Compiler*);
Sym obj_secname_fini_array(Compiler*);
Sym obj_secname_preinit_array(Compiler*);
diff --git a/src/obj/obj_secnames.c b/src/obj/obj_secnames.c
@@ -13,7 +13,7 @@
* choice so callers don't sprinkle target-format switches through
* link_layout.c / link_dyn.c.
*
- * Phase 1 of doc/MULTIOBJ.md: ELF returns the historical name; Mach-O
+ * Phase 1: ELF returns the historical name; Mach-O
* panics with a "TODO" until the macho writer lands in Phase 2/3. COFF
* panics in the same way and is filled in later. */
@@ -25,7 +25,7 @@ static Sym secname_panic_unimpl(Compiler* c, const char* which) {
SrcLoc l = {0, 0, 0};
compiler_panic(c, l,
"obj section name '%s' for target obj=%u not yet "
- "implemented (see doc/MULTIOBJ.md §3.5)",
+ "implemented",
which, (unsigned)c->target.obj);
return 0;
}
diff --git a/test/cg/CORPUS.md b/test/cg/CORPUS.md
@@ -434,5 +434,3 @@ Phase status:
|---|---|
| M | inline asm |
| R | opt-wrapped equivalence |
-
-See `doc/cg_testing.md` for the strategy and group definitions.
diff --git a/test/macho/cfree-roundtrip-macho.c b/test/macho/cfree-roundtrip-macho.c
@@ -1,5 +1,5 @@
/* cfree-roundtrip-macho: read a Mach-O object via libcfree's read_macho,
- * then re-emit via emit_macho. Phase 2 oracle for doc/MULTIOBJ.md §3.1.
+ * then re-emit via emit_macho. Round-trip oracle for the Mach-O writer.
*
* Usage: cfree-roundtrip-macho <in.o> <out.o>
*
diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md
@@ -158,7 +158,7 @@ here for completeness once they're real cases.
| `6_5_30_generic_selection`| ★ | `int x=42; return _Generic((x), int: x, default: 0);` | 42 |
| `6_5_31_subscript_commute`| ★ | `int a[5]={0,0,42,0,0}; return 2[a];` | 42 |
| `6_5_32_string_subscript` | ★ | `return "*"[0];` | 42 |
-| `6_5_33_regalloc_spill` | ★ | 12-arg `sum12(x1+0, ..., x12+0)` — exceeds the 10-INT scratch pool, exercises `spill_reg`/`reload_reg` and the cg_call avs-in-flight fallback (see doc/REGALLOC.md) | 78 |
+| `6_5_33_regalloc_spill` | ★ | 12-arg `sum12(x1+0, ..., x12+0)` — exceeds the 10-INT scratch pool, exercises `spill_reg`/`reload_reg` and the cg_call avs-in-flight fallback | 78 |
| `6_5_36_fp_arith` | ★ | `(a + b) / b * c - 36.0` over `double` — pins parser dispatch to `BO_FADD`/`FSUB`/`FMUL`/`FDIV` | 42 |
| `6_5_37_fp_int_promote` | ★ | `int + double` — usual arithmetic conversion promotes the int side to `double` before BO_FADD | 42 |
| `6_5_38_fp_float_widen` | ★ | `float + double` — float widens to double before BO_FADD | 42 |
diff --git a/test/smoke/rv64.sh b/test/smoke/rv64.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash
# test/smoke/rv64.sh — end-to-end smoke test for the rv64 podman/qemu path.
#
-# Phase-2 of doc/MULTIARCH.md: prove the test/lib/exec_target.sh helper
+# Phase-2 of the multi-arch bring-up: prove the test/lib/exec_target.sh helper
# can build, queue, and run a riscv64-linux ELF before any cfree-emitted
# rv64 bytes exist. Builds a tiny freestanding static executable with
# clang --target=riscv64-linux-gnu and pushes it through
diff --git a/test/smoke/x64.sh b/test/smoke/x64.sh
@@ -1,7 +1,7 @@
#!/usr/bin/env bash
# test/smoke/x64.sh — end-to-end smoke test for the x64 podman/qemu path.
#
-# Phase-1 of doc/MULTIARCH.md: prove the test/lib/exec_target.sh helper
+# Phase-1 of the multi-arch bring-up: prove the test/lib/exec_target.sh helper
# can build, queue, and run an x86_64-linux ELF before any cfree-emitted
# x64 bytes exist. Builds a tiny freestanding static executable with
# clang --target=x86_64-linux-gnu and pushes it through
diff --git a/test/test.mk b/test/test.mk
@@ -137,7 +137,7 @@ test-parse: lib $(PARSE_RUNNER) $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER
test-parse-err: lib $(PARSE_RUNNER)
sh test/parse/run_errors.sh
-# test-smoke-x64: phase-1 sanity check for doc/MULTIARCH.md. Builds a
+# test-smoke-x64: phase-1 sanity check for the multi-arch bring-up. Builds a
# tiny freestanding x86_64 ELF with clang --target=x86_64-linux-gnu and
# runs it through test/lib/exec_target.sh's podman/qemu pipeline,
# proving the harness end-to-end before any cfree-emitted x64 bytes
@@ -165,7 +165,7 @@ test-smoke-rv64:
#
# Both build rt/build/aarch64-linux/libcfree_rt.a for soft-float / TF
# builtins, and run `cfree ld` against the real libc.a (static) and
-# libc.so / libc.so.6 (dynamic — see doc/DYNLD.md). Excluded from the
+# libc.so / libc.so.6 (dynamic). Excluded from the
# default `test` target because they need podman; opt-in via
# `make test-musl` / `make test-glibc`.
#