kit

git clone https://git.ryansepassi.com/git/kit.git

commit 049d0f0ae42e920aa7f5a997dbc23fbe53fef7c0
parent 5d0d1ca2b5c27ba40ed16d414737d18431be255c
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sun, 10 May 2026 14:58:13 -0700

doc: rm 7 stale design docs + strip stale references

Removes doc/{cg_testing,DYNLD,linker-status,MULTIARCH,MULTIOBJ,REGALLOC,rv64-status}.md
and prunes the now-dangling "see doc/X.md" pointers scattered through
src/, test/, and doc/ASM.md.

Diffstat:
M doc/ASM.md | 8 ++++----
D doc/DYNLD.md | 99 -------------------------------------------------------------------------------
D doc/MULTIARCH.md | 286 -------------------------------------------------------------------------------
D doc/MULTIOBJ.md | 847 -------------------------------------------------------------------------------
D doc/REGALLOC.md | 346 -------------------------------------------------------------------------------
D doc/cg_testing.md | 308 -------------------------------------------------------------------------------
D doc/linker-status.md | 238 -------------------------------------------------------------------------------
D doc/rv64-status.md | 140 -------------------------------------------------------------------------------
M src/abi/abi.c | 3 +--
M src/abi/abi_apple_arm64.c | 4 ++--
M src/abi/abi_internal.h | 2 +-
M src/arch/aarch64.c | 3 +--
M src/arch/x64.c | 2 +-
M src/cg/cg.c | 9 ++++-----
M src/link/link_image_id.c | 2 +-
M src/link/link_internal.h | 3 +--
M src/obj/macho.h | 4 ++--
M src/obj/macho_emit.c | 2 +-
M src/obj/obj.h | 2 +-
M src/obj/obj_secnames.c | 4 ++--
M test/cg/CORPUS.md | 2 --
M test/macho/cfree-roundtrip-macho.c | 2 +-
M test/parse/CORPUS.md | 2 +-
M test/smoke/rv64.sh | 2 +-
M test/smoke/x64.sh | 2 +-
M test/test.mk | 4 ++--
26 files changed, 28 insertions(+), 2298 deletions(-)

diff --git a/doc/ASM.md b/doc/ASM.md
@@ -2,7 +2,7 @@
 
 Scope: bring up the asm frontend (standalone `.s` and inline `asm("...")`)
 and the matching disassembler, starting with aarch64. Companion to
-`DESIGN.md §10` and `MULTIARCH.md`.
+`DESIGN.md §10`.
 
 The asm and disasm sides are designed together so that one description of
 each instruction serves both: same field layout, same operand syntax, same
@@ -115,7 +115,7 @@ entries and operand-print/parse drift.
 
 ## 4. Module layout
 
-Reuse `aa64` prefix (`MULTIARCH.md §5`).
+Reuse `aa64` prefix.
 
 ```
 src/parse/parse_asm.c shared driver: scan tokens, dispatch directives,
@@ -292,8 +292,8 @@ and print).
 ## 5. Phasing
 
 Each phase ends mergeable. Phase 1 stands up the test harness so every
-later phase gates on real runs from its first commit (mirrors
-`MULTIARCH.md §4` Phase 1). Phase 2 lands the encode/decode pairing as
+later phase gates on real runs from its first commit. Phase 2 lands the
+encode/decode pairing as
 a mechanical refactor; phase 3 is the standalone assembler; phase 4 is
 inline asm + disasm overlay; phase 5 is the seam-rev for x64/rv64.
 
diff --git a/doc/DYNLD.md b/doc/DYNLD.md
@@ -1,99 +0,0 @@
-# Dynamic linking — status & remaining work
-
-Scope: producing dynamic-linked aarch64-linux ELF executables (and,
-eventually, shared libs) that run against a real musl or glibc
-libc.so.
-
-## Status
-
-`make test-musl` passes 6/6 (3 static + 3 dynamic) and
-`make test-glibc` passes 3/3 (dynamic-only). Each produces an
-ET_DYN PIE that runs end-to-end against the runtime loader
-(`/lib/ld-musl-aarch64.so.1` or `/lib/ld-linux-aarch64.so.1`) for
-`01_syscall_write`, `02_errno_touch`, `03_printf_hello`.
-
-`.dynamic` lives in a PF_R+W segment (alongside `.got.plt`) because
-glibc's loader patches `DT_*` `d_un.d_ptr` fields in-place at startup
-(`elf_get_dynamic_info` adjusts STRTAB/SYMTAB/etc. by `l_addr`); a
-PF_R-only segment causes `SEGV_ACCERR`. musl's loader doesn't do
-this rewrite, but RW placement is conventional and works for both.
-
-What's wired (don't re-derive — read the code if you need detail):
-
-- DSO ingest (`read_elf_dso`, `LINK_INPUT_DSO_BYTES`, soname tracking).
-- Driver: `-dynamic-linker`, `.so` / `.so.N` positional inputs,
-  `-l<name>` honoring `-Bdynamic`/`-Bstatic`.
-- Synthetic dyn tables: `.interp` / `.dynsym` / `.dynstr` / `.gnu.hash`
-  / `.rela.dyn` / `.rela.plt` / `.plt` / `.got.plt` / `.dynamic`
-  (`src/link/link_dyn.c::layout_dyn`).
-- PIE / ET_DYN emit: `e_type`, `img_base`, PT_PHDR / PT_INTERP /
-  PT_DYNAMIC / PT_GNU_STACK, `R_AARCH64_RELATIVE` for internal abs
-  fixups (`src/link/link_elf.c`).
-- PLT body emit (PLT0 + per-import 16-byte stubs) and import-reloc
-  routing: CALL26/JUMP26 → PLT entry (`sym_plt_vaddr`),
-  abs-against-import → GLOB_DAT, GOT-redirected slot fills →
-  GLOB_DAT via the existing `layout_got` path. BIND_NOW only
-  (`DF_1_NOW`); PLT0 is canonical but unused.
-
-## Remaining work
-
-### Phase 7 — `cfree_link_shared` (small)
-
-Files: `src/api/pipeline.c`, `src/link/`.
-
-Replace the panic at `pipeline.c:413` with a dispatch into the same
-machinery as `link_exe`, with:
-- `output_kind = SHARED` (no PT_INTERP, no entry-symbol requirement,
-  `allow_undefined = 1`).
-- DT_SONAME from `opts->soname`.
-- DT_R(UN)PATH from `opts->r(un)paths`.
-- Exports promoted into `.dynsym` from `opts->exports`.
-
-Add a harness case under `test/libc/cases/` (shared by the
-`test/libc/musl/` and `test/libc/glibc/` runners) or a new
-`test/link/dyn/`:
-build `libfoo.so` from a single `.c`, link an exe against it, run.
-
-### Phase 8 — TLS GD/IE/LD, IRELATIVE (deferred)
-
-Required for shared-lib TLS and IFUNCs in dynamic outputs. Out of
-scope for the v1 dynamic exe; the musl harness doesn't exercise them.
-
-### Polish / follow-ups (none blocking)
-
-- **`--export-dynamic`**: promote internal globals into `.dynsym`.
-  Mechanical; not exercised by the musl harness.
-- **`.gnu.hash` sort-by-bucket**: current code assumes hashed
-  symbols land contiguously in `.dynsym`. Fine for small import
-  sets; needs a sort pass before scaling.
-- **`--as-needed`**: today every DSO with a soname gets a DT_NEEDED.
-  Plumb the flag through to filter on actual import use.
-- **Linker-script DSO inputs**: Debian ships
-  `/usr/lib/aarch64-linux-gnu/libc.so` as a GNU-ld script
-  (`GROUP ( libc.so.6 libc_nonshared.a ld-linux-aarch64.so.1 )`).
-  `cfree ld` doesn't recognize a script in DSO position, so the
-  glibc harness hands `libc.so.6` + `libc_nonshared.a` directly.
-  `link_script.c` already parses the kernel.lds subset; extend it
-  to handle bare GROUP/INPUT scripts and wire `-l<name>` /
-  positional `.so` resolution to fan out the listed inputs.
-- **Versioned symbols** (`.gnu.version` / `.gnu.version_r`): musl
-  doesn't use them; glibc does.
-- **Lazy binding**: would need a real `_dl_runtime_resolve` PLT0
-  reference. Skip until perf demands it.
-- **Unit-level dyn-table harness** under `test/link/dyn/`: round-trip
-  `.dynsym` / `.gnu.hash` / `.rela.{dyn,plt}` / `.plt` body against
-  `readelf -d -r --dyn-syms` and `objdump -d --section=.plt`. Faster
-  than waiting on a full musl run to catch a malformed `.dynamic`
-  or mis-encoded PLT stub.
-
-## Open questions
-
-1. **Versioned symbols.** v1 ignores versions on read (matches GNU
-   ld's behavior with unversioned objects against versioned libs —
-   the unversioned default version is taken). Write-side versioning
-   is a follow-up that's invisible to the musl harness.
-
-2. **`.eh_frame_hdr` interaction.** A near-term gap in
-   `linker-status.md` independent of dynamic linking, but it touches
-   the same phdr synthesis code. If it sequences in the same window,
-   land it alongside Phase 7 — phdr count growth is shared.
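Editorial sketch (not part of the commit): the `.gnu.hash` sort-by-bucket follow-up listed in the removed `DYNLD.md` could look roughly like the code below. This is a minimal illustration under stated assumptions, not cfree's code. `HashedSym`, `gnu_hash`, and `sort_for_gnu_hash` are hypothetical names introduced here. The hash itself is the standard GNU symbol hash (start at 5381, multiply by 33, add each byte), and the `.gnu.hash` layout expects the hashed `.dynsym` entries to be grouped by `hash % nbuckets`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record for one hashed .dynsym entry; not a cfree type. */
typedef struct {
    const char *name;   /* symbol name as it appears in .dynstr */
    uint32_t hash;      /* GNU hash of name */
    uint32_t bucket;    /* hash % nbuckets */
    size_t orig_index;  /* input position, used as a stable tie-break */
} HashedSym;

/* Standard GNU symbol hash: h = h * 33 + c, seeded with 5381. */
uint32_t gnu_hash(const char *s) {
    uint32_t h = 5381;
    for (const unsigned char *p = (const unsigned char *)s; *p; p++)
        h = h * 33 + *p;
    return h;
}

/* Order by bucket first, then by original position (qsort is not stable). */
static int by_bucket(const void *a, const void *b) {
    const HashedSym *x = a, *y = b;
    if (x->bucket != y->bucket)
        return x->bucket < y->bucket ? -1 : 1;
    if (x->orig_index != y->orig_index)
        return x->orig_index < y->orig_index ? -1 : 1;
    return 0;
}

/* Fill in hash/bucket for each symbol, then group symbols by bucket so
 * every bucket's chain is one contiguous run. */
void sort_for_gnu_hash(HashedSym *syms, size_t n, uint32_t nbuckets) {
    assert(nbuckets != 0);
    for (size_t i = 0; i < n; i++) {
        syms[i].hash = gnu_hash(syms[i].name);
        syms[i].bucket = syms[i].hash % nbuckets;
        syms[i].orig_index = i;
    }
    qsort(syms, n, sizeof *syms, by_bucket);
}
```

A caller would run `sort_for_gnu_hash` over the hashed portion of the symbol list before assigning `.dynsym` indices, so that the indices written into the bucket and chain arrays match the final order.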
diff --git a/doc/MULTIARCH.md b/doc/MULTIARCH.md
@@ -1,286 +0,0 @@
-# MULTIARCH — plan for adding a second architecture
-
-Scope: turn cfree from an aarch64-only compiler into one that supports
-multiple `(arch, os, objfmt)` triples. The first new arch is x86_64;
-the first new platform/objfmt and the asm frontend land later, on the
-seams this work establishes.
-
-Today the codebase has one of each: aarch64 codegen (`src/arch/aarch64.c`),
-AAPCS64 ABI (`src/abi/abi.c`), ELF emission with aarch64 relocs
-(`src/obj/elf_emit.c` + `src/obj/elf_reloc_aarch64.c`), and an aarch64
-emulator-driven test path (`src/emu/`). `cgtarget_new`, `emit_elf`,
-`link_elf`, and the disassembler all panic on non-AArch64 targets.
-
-The goal of the first phase is to introduce the seams that a second
-arch forces — without yet writing x64 codegen. After the seams land,
-x64 bring-up is purely additive: new files, no edits to the
-arch-aware ones except the dispatch tables.
-
----
-
-## 1. Target slice for first x64 milestone
-
-| axis      | value                          |
-|-----------|--------------------------------|
-| arch      | `CFREE_ARCH_X86_64`            |
-| os        | `CFREE_OS_LINUX`               |
-| objfmt    | `CFREE_OBJ_ELF`                |
-| ABI       | SysV AMD64                     |
-| codemodel | `CFREE_CM_SMALL` (default)     |
-
-Mach-O, PE/COFF, Win64 ABI, and macOS-arm64 all wait until the seams
-are validated by a working x86_64-linux-gnu path. This keeps the
-arch-seam work decoupled from the objfmt-seam work — only one is on
-the critical path at a time.
-
----
-
-## 2. The seams
-
-### 2.1 CGTarget construction — dispatch by arch
-
-`src/arch/aarch64.c:2998` is currently the public `cgtarget_new`. Split:
-
-- Rename the AArch64 constructor to `aa64_cgtarget_new`, declared in a
-  new `src/arch/aa64.h`.
-- Add `src/arch/x64.h` and `src/arch/x64.c` with `x64_cgtarget_new` and
-  the equivalent `XImpl` skeleton (vtable wired up to method stubs).
-- New `src/arch/cgtarget.c` owns the public `cgtarget_new` and switches
-  on `c->target.arch`.
-
-Same dispatch shape for `arch_disasm_new` (already factored via the
-`ArchDisasm` hook — just needs a switch).
-
-`mc_new` does not change yet: a single `MCEmitter` impl serves all
-arches.
-
-### 2.2 MCEmitter fixup encodings — extend the switch
-
-`src/arch/mc.c:84-134` has aarch64 BL/B.cond/CONDBR19 bit layouts
-hardcoded in `apply_fixup`. **Decision:** extend the same switch with
-x64 cases (`R_PC32` already works for jumps; add `R_PC8` for short
-jumps). No per-arch fixup vtable — the encoding is one-line
-little-endian patching, and the abstraction would be premature.
-
-Action item: update the file's "target-agnostic" header comment to
-reflect that `mc.c` is the union of all known fixup encodings, not a
-generic library.
-
-### 2.3 ABI classification — TargetABI vtable
-
-**Decision:** promote `TargetABI` to carry function pointers for the
-parts that vary by `(arch, os)`. `abi_init` selects the right impl set
-based on `c->target`. Rationale: even at two ABIs, the SysV AMD64
-classifier is large enough (eight-byte classification, INTEGER/SSE
-classes, x87 corners) that an in-line switch in `abi_func_info` would
-be ugly; and Win64 + macOS-arm64 are visible on the roadmap, so the
-indirection pays off quickly.
-
-Concrete changes:
-
-- Add a vtable to `TargetABI` (function pointers for `func_info`,
-  `record_layout`, `va_list_type`, scalar profiles where they vary).
-- Move the AAPCS64 classifier out of `abi.c` into
-  `src/abi/abi_aapcs64.c`, exposing an `aapcs64_vtable` symbol.
-- Add `src/abi/abi_sysv_x64.c` exposing `sysv_x64_vtable`. Initial
-  classifier returns `ABI_ARG_INDIRECT` for everything (correct, slow,
-  unblocks bring-up); fill in the eight-byte rules incrementally.
-- `abi_init` switches on `(target.arch, target.os)` and copies the
-  right vtable in.
-
-The public `abi_func_info` / `abi_record_layout` / `abi_*_type` API in
-`src/abi/abi.h` does not change — only the internals dispatch through
-the vtable.
-
-### 2.4 Object format — ELF reloc translator dispatch
-
-`src/obj/elf_reloc_aarch64.c` already exists as a per-arch reloc
-translator; the seam is half there. Finish it:
-
-- Add `src/obj/elf_reloc_x86_64.c` mirroring it for `R_X86_64_*` codes.
-- Extend `RelocKind` in `src/obj/obj.h` with `R_X64_*` entries
-  (`R_X64_PC32`, `R_X64_PLT32`, `R_X64_GOTPCREL`, ...).
-- `src/obj/elf_emit.c:246-249` panics on non-aarch64 and hardcodes
-  `e_machine`. Replace with a switch that picks `EM_AARCH64` /
-  `EM_X86_64` and the right reloc-translator function pointer.
-- `src/link/link_elf.c:575` — same treatment.
-
-Mach-O and PE/COFF emitters slot in as peers of `elf_emit.c` later,
-each with its own per-arch reloc translator file. The reloc-translator
-pattern established here is what makes that cheap.
-
-### 2.5 Header types per (arch, abi)
-
-`abi_size_type`, `abi_ptrdiff_type`, `abi_intptr_type`,
-`abi_uintptr_type`, `abi_va_list_type` are already abstracted. The
-SysV-x64 vtable (§2.3) supplies the right `__va_list_tag` struct
-(`gp_offset`, `fp_offset`, `overflow_arg_area`, `reg_save_area`).
-
-`rt/include/` headers that are arch-conditioned will gate on
-`__x86_64__` / `__aarch64__` predefines in the preprocessor (already
-the convention used elsewhere in `rt/`). No new mechanism needed.
-
----
-
-## 3. Test/run path — execute from day 1
-
-The harness already uses podman+qemu for aarch64 — see
-`test/lib/exec_aarch64.sh`, `test/cg/run.sh:104-118`, and the libc
-sysroot extractors in `test/libc/{musl,glibc}/`. The pattern: detect
-qemu-user or podman, batch all queued exes through one `podman run`
-to amortize launch overhead, fall back to a host serial loop. Test
-selection is per-arch with per-target XFAIL.
-
-x64 inherits this machinery wholesale. There is no new "runner" to
-design — only an x64-shaped peer of `exec_aarch64.sh` and a per-arch
-dispatch where `cg/run.sh` currently sources the aarch64 helper
-unconditionally. The default container image (`alpine:latest`)
-already runs linux/amd64 binaries; the `--platform linux/amd64` flag
-selection mirrors the existing `is_aarch64` gate in
-`exec_aarch64.sh:48`.
-
-Execution tests are gating from the first hello-world. We do not
-build out an x64 lifter inside `src/emu/` — that path is aarch64-only
-and stays so.
-
----
-
-## 4. Phasing
-
-Three phases. Each is independently mergeable; phase 1 and phase 2
-land with no aarch64 behavior change.
-
-### Phase 1 — test harness updates
-
-Stand up the x64 execution path before any compiler code moves, so
-phase 3 milestones can be gated on real runs from their first commit.
-
-1. Generalize `test/lib/exec_aarch64.sh` into a per-arch helper:
-   either rename to `exec_target.sh` with arch-keyed function names
-   (`exec_target_run aarch64 ...`), or add a peer
-   `test/lib/exec_x64.sh` with the same surface. The batched
-   `podman run` body is identical — only `--platform`,
-   `RUN_*_IMAGE`, and the `is_<arch>` gate vary.
-2. `test/cg/run.sh` (and the other harnesses that source the helper)
-   dispatch the executor by the case's target arch. The existing
-   single-helper sourcing (`run.sh:118`) becomes a per-target choice.
-3. Sysroot extractors: add `test/libc/{musl,glibc}/` x86_64 variants
-   if the libc tests claim x64 coverage. For the cg suite (static,
-   no libc), no sysroot is needed — `alpine:latest` is enough.
-4. Smoke test: a hand-rolled or clang-built x86_64 ELF is queued and
-   flushed through the new executor and exits cleanly. This proves
-   the harness end-to-end before any cfree-emitted x64 bytes exist.
-5. Per-test arch declaration: `test/cg/` cases gain a way to say
-   which arches they run on (default: all supported). Per-target
-   XFAIL is keyed off the same.
-
-Exit criterion: aarch64 suite green; the smoke test runs an external
-x86_64 binary through the harness and reports pass on the standard
-pass/fail line.
-
-### Phase 2 — code refactors (multi-arch seams + x64 stubs)
-
-Pure refactors. No aarch64 output changes; x64 reachable but every
-codegen call panics with "x64: not implemented".
-
-1. **CGTarget dispatch** (§2.1). Rename `aarch64.c::cgtarget_new` to
-   `aa64_cgtarget_new`, declared in `arch/aa64.h`. New
-   `arch/cgtarget.c` owns the public `cgtarget_new` and switches on
-   `c->target.arch`. Same split for `arch_disasm_new`.
-2. **x64 skeleton.** Add `arch/x64.{h,c}` with the full vtable wired
-   up to stub methods that panic on call. `cgtarget_new` dispatches
-   to it for `CFREE_ARCH_X86_64`.
-3. **ABI vtable** (§2.3). Promote `TargetABI` to carry function
-   pointers for `func_info`, `record_layout`, `va_list_type`, and
-   the scalar profiles that vary. `abi_init` switches on
-   `(target.arch, target.os)` and installs the right vtable.
-4. **AAPCS64 split.** Move the AAPCS64 classifier from `abi.c` into
-   `abi/abi_aapcs64.c`, exposing `aapcs64_vtable`. `abi.c` keeps the
-   generic dispatch and the C-standard-driven scalar/record bits.
-5. **SysV-x64 stub.** Add `abi/abi_sysv_x64.c` exposing
-   `sysv_x64_vtable` with `ABI_ARG_INDIRECT` for everything. Wired
-   up by `abi_init` for `(X86_64, LINUX)` but unreachable until
-   phase 3 fills `arch/x64.c`.
-6. **ELF reloc dispatch** (§2.4). Add `R_X64_*` to `RelocKind`. New
-   `obj/elf_reloc_x86_64.c` mirrors `elf_reloc_aarch64.c`.
-   `obj/elf_emit.c:246-249` and `link/link_elf.c:575` lose their
-   AArch64-only panics in favor of a switch on `c->target.arch` that
-   picks `e_machine` (`EM_AARCH64` / `EM_X86_64`) and the reloc
-   translator.
-7. **Allowlist consolidation.** Other `arch != ARM_64` panics
-   (anywhere they remain after the splits above) move into one
-   target-validation function, easy to extend.
-
-Exit criterion: aarch64 suite green and byte-for-byte identical
-output objects on a representative `test/cg/` set; an x86_64-linux
-target reaches `arch/x64.c`'s stubs (the panic is the proof the
-dispatch is wired).
-
-### Phase 3 — implementations
-
-Now purely additive. Fill in the stubs from phase 2; each milestone
-gates on the podman/qemu execution path from phase 1.
-
-1. **mc.c x64 fixups.** Add x64 reloc-kind cases to
-   `arch/mc.c::apply_fixup` (`R_PC32` already works for jumps;
-   `R_PC8` for short jumps; whatever else x64 codegen actually
-   emits).
-2. **SysV-x64 ABI classifier.** Replace the `ABI_ARG_INDIRECT` stub
-   in `abi/abi_sysv_x64.c` with the eight-byte INTEGER/SSE
-   classification. `va_list` returns the SysV `__va_list_tag`
-   struct.
-3. **x64 codegen** in `arch/x64.c`, in the rough order the parser
-   phases established for aarch64:
-   1. Hello-world (`int main(void) { return 0; }`) — exit via
-      syscall through inline asm, or a libc-less return path.
-   2. Integer arithmetic + locals — frame slots, spill/reload on the
-      x64 register pool. The CG-driven spill/reload from commit
-      `9724439` is reusable; only the physical-register pool and
-      load/store encodings change.
-   3. Calls — SysV register passing, basic eight-byte classification
-      paired with the ABI work above.
-   4. Loads/stores at every width — `mov` size variants, sign/zero
-      extension corners.
-   5. Compare-and-branch, structured control flow.
-   6. Aggregates, bitfields, varargs, atomics, intrinsics.
-4. **x64 disassembler** in `arch/` — peer of `aa64_isa.{c,h}`.
-   Required for the textual disasm path used by some `test/cg/`
-   gates.
-5. **Asm frontend.** Once x64 codegen lands, the asm frontend
-   (`parse_asm`) is the cheapest way to author instruction-sequence
-   tests without driving through the C frontend. Lands as a peer of
-   `src/parse/` consuming `MCEmitter` directly.
-
-Exit criterion: each x64 milestone owns a `test/cg/` case running
-under both the aarch64 emulator and the podman/qemu x64 path; both
-report green on the standard pass/fail line.
-
----
-
-## 5. Naming conventions
-
-For the new files and exposed symbols:
-
-- aarch64 → `aa64` prefix in code (`aa64_cgtarget_new`, `aa64_vtable`,
-  files under `arch/aa64*` and `abi/abi_aapcs64.c`). The existing
-  `aarch64.c` and `aa64_isa.{c,h}` already mix the two; keep `aa64` as
-  the going-forward convention.
-- x86_64 → `x64` prefix (`x64_cgtarget_new`, files under `arch/x64*`
-  and `abi/abi_sysv_x64.c`). Avoid `x86_64` in identifiers — too long.
-- ELF reloc translators stay arch-suffixed:
-  `elf_reloc_aarch64.c`, `elf_reloc_x86_64.c` (matches ELF spec
-  naming).
-
----
-
-## 6. Validation gates
-
-A change in this plan is "done" when:
-
-- Phase A/B/C: aarch64 test suite still green, byte-for-byte identical
-  output objects on a representative set of `test/cg/` cases.
-- Phase D: a hand-rolled x86_64 ELF round-trips through the podman
-  runner with the right exit code.
-- Phase E: each milestone owns a `test/cg/` case running under both
-  the aarch64 emulator and the podman runner; pass/fail line green
-  on both.
diff --git a/doc/MULTIOBJ.md b/doc/MULTIOBJ.md
@@ -1,847 +0,0 @@
-# MULTIOBJ — plan for adding a second object format (Mach-O)
-
-Scope: turn cfree from an ELF-only compiler/linker into one that
-supports multiple `(arch, os, objfmt)` triples on the **objfmt** axis.
-The first new objfmt is Mach-O, targeting macOS on aarch64 and (once
-x64 codegen lands) x86_64. PE/COFF for Windows is the next peer; the
-seams introduced here are designed so PE/COFF is purely additive on
-top — new files, no edits to format-aware ones except the dispatch
-tables.
-
-Companion to `MULTIARCH.md`. That doc was the **arch** axis; this is
-the **objfmt** axis. The two are intentionally separate critical
-paths: arch and objfmt seams land independently, validated against
-each other only at the `(arch, os, objfmt)` cross-product on the test
-matrix.
-
----
-
-## Status
-
-- [x] **Phase 1** — seams + format-aware bookkeeping
-  - [x] Linker emit dispatch — per-format panics (§3.3)
-  - [x] Build-id moved to format-agnostic `link_image_id_compute` (§3.5)
-  - [x] ABI vtable selection keys on `(arch, os)`; `apple_arm64_vtable`
-        aliases AAPCS64 (§3.4)
-  - [x] `obj_secname_*` helpers for init/fini/preinit/tdata/tbss (§3.5)
-  - [x] `src/obj/macho.h` + `macho_reloc_aarch64.c` stubs (§3.1, §3.2)
-  - [x] ELF suite green (test-elf 37/37, test-link 119/119, test-cg 1549/1549)
-- [x] **Phase 2** — Mach-O object writer + reader (MH_OBJECT, arm64)
-  - [x] `obj/macho_emit.c` — MH_OBJECT writer, leading-`_` C-symbol
-        prefix, ARM64_RELOC_ADDEND pair on non-zero addends (§3.1)
-  - [x] `obj/macho_reloc_aarch64.c` — full RelocKind ↔ ARM64_RELOC_*
-        table with pcrel/length companions (§3.2)
-  - [x] `obj/macho_read.c` — MH_OBJECT reader with strip-leading-`_`
-        inverse and ADDEND-pair collapsing (§3.1)
-  - [x] `obj/obj_secnames.c` — Mach-O section names for init/fini/tls
-        (§3.5)
-  - [x] `abi/abi_apple_arm64.c` — `va_list` is `char*`; variadic-arg
-        on-stack routing wired in `arch/aarch64.c::emit_arg_value`
-        when `target.os == MACOS` (§3.4)
-  - [x] `test/lib/exec_target.sh` — `<arch>-<os>` tag form, Darwin
-        native branch for `aarch64-macos` (§3.6)
-  - [x] Smoke test: `cfree cc -target arm64-apple-macos -c …` produces
-        Mach-O `.o` that links via host clang and runs natively (exit 42)
-  - [x] Self-roundtrip oracle: `test/macho/cfree-roundtrip-macho.c` —
-        cfree-emitted `.o` → `read_macho` → `emit_macho` is
-        byte-identical
-  - [x] ELF suite still green (test-elf 37/37)
-  - [x] Testing harness extensions (§7): `CFREE_TEST_OBJ` env in
-        `cfree_test_target.h` and `test/{cg,link,elf}/run.sh`,
-        per-case `*.targets` applicability (`test/elf/cases/18_bti_note.targets`)
-  - [ ] Clang-emitted Mach-O round-trip — needs section-relative reloc
-        and `__compact_unwind` handling (deferred)
-- [x] **Phase 2.5** — Linker-side Mach-O read + JIT
-  - [x] `link_add_obj_bytes` / `link_add_archive_bytes` dispatch on
-        `cfree_detect_fmt`, so a Mach-O `.o` (or `.a` member) parses
-        through `read_macho` exactly like an ELF input does (§3.3)
-  - [x] C-symbol mangling lives at the linker API boundary
-        (`link_intern_c_name`, `cfree_jit_lookup`, the undef diag) —
-        Mach-O on-disk names stay byte-for-byte verbatim (round-trip
-        intact), and callers see the source-level form across both
-        formats: `link_set_entry("test_main")` and
-        `cfree_jit_lookup(jit, "test_main")` work uniformly.
-  - [x] test-link paths R + J both green on `aa64-macho`
-        (35 R / 31 J passing, including bad/30_undef_strong;
-        path E is the remaining gap)
-  - Four cases ship a `j_targets` file restricting their J path to
-    ELF tuples (R + E still run on every tuple). Path J's pass/fail
-    criterion in these cases depends on an ELF-specific ABI feature
-    with no Mach-O analogue:
-    * `21_fini_array` / `22_init_fini_both` — Mach-O destructors flow
-      through `__StaticInit` + `__cxa_atexit` registration, not the
-      ELF `.fini_array` shape the test and
-      `test/link/harness/start.c` walk. Without a `__cxa_atexit`
-      runtime in the JIT, the destructor is never invoked.
-    * `25a_gc_basic` / `25d_gc_chain` — `clang -ffunction-sections`
-      on Mach-O still emits a single `__TEXT,__text` per `.o`
-      (`.subsections_via_symbols` is the per-symbol dead-strip
-      granularity, not `-ffunction-sections`). `--gc-sections` can
-      drop whole sections but not individual functions, so the
-      `gc_absent unreachable_fn` check fails.
-- [~] **Phase 3** — Mach-O linker (`link_emit_macho`) — driver path
-  working; test-link/E coverage pending (§9 Phase 3.1).
-  - [x] `src/link/link_macho.c` — MH_EXECUTE + MH_PIE writer with
-        `__PAGEZERO` / `__TEXT` / `__DATA_CONST` / `__DATA` /
-        `__LINKEDIT` segments, `__TEXT,__stubs` (12-byte arm64 stubs
-        through `__DATA_CONST,__got` slots), `LC_DYLD_CHAINED_FIXUPS`
-        for both bind (imports) and rebase (internal abs64) fixups,
-        `LC_DYLD_EXPORTS_TRIE` (single-entry minimal trie),
-        `LC_SYMTAB` + `LC_DYSYMTAB` + indirect-symbol table,
-        `LC_LOAD_DYLINKER` (`/usr/lib/dyld`), `LC_BUILD_VERSION`,
-        `LC_UUID`, `LC_MAIN`, empty `LC_FUNCTION_STARTS` /
-        `LC_DATA_IN_CODE`.
-  - [x] Ad-hoc `LC_CODE_SIGNATURE` (SuperBlob + CodeDirectory v=0x20400
-        with sha256 4 KiB-page hashes + execSeg fields) so the kernel
-        execs the binary on macOS 11+.
-  - [x] `read_macho_dso` for MH_DYLIB inputs and `read_tbd` for
-        Apple's text-based-stub `.tbd` files (sniffed via leading
-        `---`). `link_add_dso_bytes` dispatches on format. TBD parser
-        is a token scanner — emits every `_id` token as an exported
-        ObjSym; conservative but correct for the static-link decision
-        (the install-name on `LC_LOAD_DYLIB` is the umbrella, and dyld
-        walks re-exports at runtime).
-  - [x] `driver/lib_resolve.c` — `-lname` resolves `.tbd` first, then
-        `.dylib`, then `.so`, then `.a` under
-        `LIB_RESOLVE_DYNAMIC_PREFER`. `driver/cc.c` routes the result
-        to `dso_bytes[]` or `archives[]` by suffix and plumbs
-        `ndso_bytes` through `CfreeLinkInputs`.
-  - [x] `LinkImage.linker` back-pointer set by `link_resolve` so
-        format-specific emit can walk `LinkInputs` (e.g. resolving an
-        imported sym's `dso_input_id` to a DSO `install_name`).
-  - [x] `link_set_entry` defaults to `_main` on Mach-O (vs `_start`
-        for ELF), matching the LC_MAIN convention where dyld owns C
-        startup.
-  - [x] `link_layout.c::resolve_symbols` `is_def` widened to require
-        backing storage (section / abs / common) — `extern int f();`
-        from `decl.c` lands as `kind=SK_FUNC, section_id=0` and would
-        otherwise be misclassified as a definition, masking the import
-        and breaking CALL26 to libSystem on the in-memory pipeline.
-        ELF `.o` reads were unaffected because `read_elf` already
-        normalizes undefs to `SK_UNDEF`.
-  - [x] `link_layout.c::boundary_name` — Mach-O target prefixes every
-        linker-synthesized boundary symbol (`__init_array_start`, etc)
-        with `_` so the on-disk name matches what consumer code
-        compiles to under the leading-`_` mangling rule.
-  - [x] `decl.c` — Mach-O target prepends `_` to every C identifier
-        with linkage at obj_symbol creation time (unconditional, even
-        when source name already starts with `_`, matching Apple `cc`).
-  - [x] `pipeline.c::cfree_link_exe` — skip ELF `layout_dyn` for
-        Mach-O targets (their LC_LOAD_DYLIB / chained-fixups machinery
-        is synthesized in `link_emit_macho` instead, and ELF-shaped
-        `.plt` / `.got.plt` synthetic sections would only confuse the
-        Mach-O writer).
-  - [x] Smoke test — `cfree cc -target arm64-apple-macos hello.c -o
-        hello -lSystem` produces a runnable arm64-darwin Mach-O exe
-        that calls `printf` from libSystem and exits with the right
-        code (verified manually with `hello.c`, `multi.c` covering
-        multiple imports + globals + bss).
-  - [x] ELF test suite still green (`test-elf` 37/37).
-
----
-
-## 1. Today
-
-What exists for Mach-O already:
-
-- `ObjExtKind` carries `OBJ_EXT_MACHO` and `OBJ_EXT_COFF`
-  (`src/obj/obj.h:75-81`).
-- `emit_macho` / `read_macho` (and `emit_coff` / `read_coff`) are
-  declared in `obj.h:344-364` and stubbed in `src/api/stubs.c:73-94`
-  — they panic with `unimplemented`.
-- `cfree_compile_obj_emit` (`src/api/pipeline.c:306-321`) already
-  dispatches into them by `c->target.obj`.
-- `cfree_detect_target` recognizes Mach-O magic, reads `cputype`,
-  and populates `(arch, os=MACOS, obj=MACHO)`
-  (`src/api/detect.c:180-214`).
-- The driver's target parser accepts `darwin`/`macos`
-  (`driver/target.c:82`); `driver/env.c:754` defaults the host OS
-  on a Darwin build.
-- `RelocKind` is already per-arch in shape (e.g. `R_AARCH64_*`,
-  `R_X64_*`); the per-arch ELF reloc translator is split into
-  `obj/elf_reloc_<arch>.c`. Mach-O reloc translators slot in as
-  peers.
-
-What is ELF-only and panics on Mach-O today:
-
-- `emit_macho` / `read_macho` themselves
-  (`src/api/stubs.c:73,90`).
-- `link_emit_image_writer` (`src/link/link.c:422-432`) routes only
-  `CFREE_OBJ_ELF` to `link_emit_elf`; every other case panics with
-  "only ELF is implemented".
-- `link_emit_elf` writes an ET_EXEC/ET_DYN ELF (`src/link/link_elf.c`,
-  `src/link/link_dyn.c`). Nothing exists for the Mach-O peer.
-- Dynamic-link plumbing (`src/link/link_dyn.c::layout_dyn`) emits
-  `.interp` / `.dynsym` / `.dynstr` / `.gnu.hash` / `.rela.*` /
-  `.plt` / `.got.plt` / `.dynamic` — all ELF-shaped. Mach-O has its
-  own dyld machinery (LC_DYLD_INFO_ONLY / DYLD_CHAINED_FIXUPS /
-  LC_SYMTAB / LC_DYSYMTAB / `__la_symbol_ptr` / `__stubs`).
-- ABI vtable selection (`src/abi/abi.c:286-300`) keys on
-  `target.arch` only; macOS-arm64 has a distinct ABI from AAPCS64
-  (variadic-args-on-stack, `char`/`short` promoted on stack
-  arguments).
-- The test harness (`test/lib/exec_target.sh`) batches via
-  `podman run` against `linux/<arch>` images — a Linux-only
-  execution path. Native Mach-O execution on the host is the new
-  path.
-
----
-
-## 2. Target slice for first Mach-O milestone
-
-| axis      | value                                |
-|-----------|--------------------------------------|
-| arch      | `CFREE_ARCH_ARM_64`                  |
-| os        | `CFREE_OS_MACOS`                     |
-| objfmt    | `CFREE_OBJ_MACHO`                    |
-| ABI       | Apple ARM64 (Darwin variant)         |
-| codemodel | `CFREE_CM_SMALL` (default)           |
-
-Why this slice first, not x86_64-darwin or arm64-windows:
-
-- arm64 codegen is the validated one; x64 is still on the
-  `MULTIARCH.md` Phase 3 critical path.
-- Apple no longer ships x86_64 Macs; x86_64-darwin is interesting
-  only as a cross target.
-- Windows is on the roadmap but is a separate format **and** a
-  separate ABI; bundling it into the first Mach-O milestone would
-  blur which seam each defect lives behind.
-
-The pivot once arm64-darwin lands:
-
-- **x86_64-darwin** — additive; needs `macho_reloc_x86_64.c`,
-  Apple-x64 ABI vtable (close to SysV-x64 with quirks).
-- **arm64-windows / x86_64-windows** — peer of this plan with
-  `coff_emit.c` / `coff_read.c` / `coff_reloc_<arch>.c` and the
-  Microsoft ABI vtable. The seams below are shaped so this is
-  purely additive.
-
-Out of scope for v1 of Mach-O:
-
-- Universal (fat) binaries — not a hard requirement; skip until
-  someone needs them. The shape is well-defined: a fat header
-  prepended to per-arch slices.
-- Bitcode embedding, codesigning, entitlements, `__LINKEDIT`
-  beyond what dyld needs, debug `__DWARF` segment (covered later
-  by the dwarf side, not the object side).
-- ObjC metadata sections (`__objc_*`). Not relevant for a C
-  compiler.
-
----
-
-## 3. The seams
-
-### 3.1 Object writer / reader — peers of `elf_emit.c` / `elf_read.c`
-
-**Decision:** introduce `src/obj/macho_emit.c` and
-`src/obj/macho_read.c` as peers of the ELF pair, sitting behind the
-existing `emit_macho` / `read_macho` declarations in `obj.h`. No
-additional dispatch is needed at this layer — `pipeline.c:306-321`
-already routes by `target.obj`.
-
-Round-trip invariant (matching `DESIGN.md §5.5`): `read_macho` of a
-`macho_emit` output must produce an `ObjBuilder` shape-equivalent
-to the input, modulo (a) Mach-O's mandatory `(segname, sectname)`
-pairing for sections and (b) any synthesized `N_SECT` / `N_OSO`
-symbols.
-
-The neutral `ObjBuilder` model accommodates Mach-O without a
-schema break:
-
-- `Section.name` is already a single `Sym`. Mach-O writers split it
-  by convention: when a section's name string starts with
-  `__TEXT,__text` (or any other comma-separated form), the writer
-  takes the prefix as `segname` and the suffix as `sectname`. When
-  the name lacks a comma (the common case for ELF-shaped input),
-  the writer derives `segname` from `SecKind` (`SEC_TEXT` →
-  `__TEXT`, `SEC_RODATA` → `__TEXT,__const`, `SEC_DATA` →
-  `__DATA`, `SEC_BSS` → `__DATA,__bss`).
-- `SecKind` / `SecFlag` map cleanly onto Mach-O `S_*` section
-  types and `S_ATTR_*` attributes. The reverse mapping (read side)
-  uses the existing `Section.ext_type` / `Section.ext_flags`
-  escape hatch (already present, see `obj.h:209-217`) for any
-  Mach-O-only types we don't want to lose on round-trip.
-- `SymBind` / `SymKind` / `SymVis` cover Mach-O's `N_EXT` /
-  `N_PEXT` / `N_TYPE` adequately; `Section.ext_kind` is set to
-  `OBJ_EXT_MACHO` when reading so the writer knows to preserve
-  format-specific fields. (The same escape hatch will be used by
-  COFF.)
-- Symbol bookkeeping: Mach-O requires symbols to be partitioned
-  into local / external-defined / external-undefined for
-  `LC_DYSYMTAB`. The partitioning is computed at write time from
-  the `ObjSym.bind` / `ObjSym.section_id` fields — no schema
-  change.
-
-Header includes: a new `src/obj/macho.h` peer of `obj/elf.h` for
-the on-disk structures (`mach_header_64`, `segment_command_64`,
-`section_64`, `symtab_command`, `nlist_64`, `relocation_info`).
-
-### 3.2 Per-arch Mach-O reloc translator
-
-`obj/elf_reloc_<arch>.c` is the model. Add:
-
-- `src/obj/macho_reloc_aarch64.c` — `RelocKind` ↔
-  `ARM64_RELOC_*` (UNSIGNED, BRANCH26, PAGE21, PAGEOFF12,
-  GOT_LOAD_PAGE21, GOT_LOAD_PAGEOFF12, POINTER_TO_GOT,
-  TLVP_LOAD_PAGE21, TLVP_LOAD_PAGEOFF12, ADDEND, SUBTRACTOR).
-- `src/obj/macho_reloc_x86_64.c` — `RelocKind` ↔ `X86_64_RELOC_*`
-  (UNSIGNED, SIGNED, BRANCH, GOT, GOT_LOAD, SUBTRACTOR, TLV,
-  SIGNED_1/2/4). Lands when x64 codegen does (post
-  `MULTIARCH.md` Phase 3).
-
-Two Mach-O-specific complications the translator absorbs:
-
-- **`ARM64_RELOC_ADDEND` pairs.** Mach-O encodes addends out-of-
-  band by emitting a leading `ARM64_RELOC_ADDEND` reloc carrying
-  the addend, immediately followed by the real reloc. The existing
-  `Reloc.pair` byte (`obj.h:226`) already exists for this kind of
-  paired-reloc shape (it was added with the same semantics in
-  mind). The translator emits the pair on write and collapses the
-  pair on read.
-- **`ARM64_RELOC_SUBTRACTOR`** is two relocs — a SUBTRACTOR
-  followed by an UNSIGNED — modeling `B - A` as the resolved
-  value. Cfree's IR doesn't currently emit DWARF-style
-  difference relocs (the only consumer is `eh_frame` /
-  `compact_unwind` — not on the v1 critical path); leave them
-  out of `cgtarget` and panic in the writer if seen. Reader
-  recognizes them so a clang-built object round-trips through
-  `objdump`.
-
-**Decision:** no new `RelocKind` enum entries are needed for
-Mach-O. The kinds are already arch-suffixed and the translator
-pattern keeps the format-specific encoding local. We do not split
-`R_AARCH64_PAGE21` from a hypothetical `R_MACHO_AARCH64_PAGE21` —
-the underlying semantic (the page-relative ADRP fixup) is the
-same; the translator picks the right ELF or Mach-O code.
-
-### 3.3 Linker emit dispatch
-
-`link_emit_image_writer` (`src/link/link.c:422-432`) is the seam.
-Replace the single ELF case + panic with a switch: - -``` -case CFREE_OBJ_ELF: link_emit_elf(img, w); return; -case CFREE_OBJ_MACHO: link_emit_macho(img, w); return; -case CFREE_OBJ_COFF: link_emit_coff(img, w); return; /* later */ -case CFREE_OBJ_WASM: /* later */ -``` - -`link_emit_macho` lives in a new `src/link/link_macho.c`. It is -**not** a thin reskin of `link_emit_elf`: the LinkImage model -(segments / sections / symbols / reloc-applies) is largely shared, -but Mach-O's load-command shape is wholly different from ELF -program headers, and dyld bookkeeping is incompatible enough that -trying to share `link_dyn.c::layout_dyn` would be more work than a -peer. - -Concrete shape of `link_macho.c`: - -- Plan load commands: `LC_SEGMENT_64` × N (segments map to - Mach-O segments, one per `LinkSegment`), `LC_SYMTAB`, - `LC_DYSYMTAB`, `LC_BUILD_VERSION`, `LC_DYLD_INFO_ONLY` (or - `LC_DYLD_CHAINED_FIXUPS` for modern dyld; pick one — see - decision below), `LC_LOAD_DYLINKER`, `LC_LOAD_DYLIB` × N for - imported DSOs (peer of ELF DT_NEEDED), `LC_MAIN` (or - `LC_UNIXTHREAD` for static), `LC_FUNCTION_STARTS`, - `LC_DATA_IN_CODE`, `LC_UUID`, `LC_SOURCE_VERSION`. -- Synthesize `__LINKEDIT` segment containing symtab/strtab, - dyld export trie, indirect-symbol table, function-starts - table, code-signature placeholder. -- Synthesize `__TEXT,__stubs` (arm64: 12-byte stubs) and - `__DATA,__la_symbol_ptr` (lazy pointers) for imported - function calls; arm64 BL → stub. Or, on modern macOS, - go straight to chained fixups + non-lazy binding (no - `__stubs`). -- Apply relocations against the chosen image base - (`MH_PIE`-only for v1). - -**Decision: `LC_DYLD_CHAINED_FIXUPS` for v1, not -`LC_DYLD_INFO_ONLY`.** Chained fixups are the modern macOS path -(11+); they are smaller, simpler to emit (no opcode encoder for -the bind-info stream), and Apple has been deprecating the legacy -path. A consequence: cfree-emitted Mach-O exes do not run on -macOS 10.15 or older. 
That is acceptable for v1 — bring it up -only if a user complains. The legacy-bind-info encoder lands later -as an additional path, gated behind a target-min-version check. - -`link_dyn.c` stays ELF-only and is **not** generalized. The -overlap with `link_macho.c` is "we both need a list of imported -DSOs and exported symbols" — which the LinkImage already carries. -Generalizing would force a lowest-common-denominator layer that -serves neither side well. - -### 3.4 ABI vtable — widen selection to `(arch, os)` - -`abi.c::select_vtable` (lines 286-300) keys on `target.arch` alone. -For aarch64 this is wrong on macOS: - -- Apple ARM64 passes variadic arguments on the stack only (not in - x0-x7), and promotes `char`/`short` to `int` for stack args - even when the type is otherwise passed register-only. AAPCS64 - passes variadic args in registers like fixed args. -- `_Bool` is 8 bits on Darwin (matching most other platforms; - AAPCS64 also says 8 bits, so no divergence here — but the - rules diverge enough that the vtable must be distinct). - -**Decision:** widen the switch in `select_vtable` to key on the -`(target.arch, target.os)` pair. Add `apple_arm64_vtable` in a new -`src/abi/abi_apple_arm64.c`. When `(arch, os)` is `(ARM_64, MACOS)`, -install it; otherwise keep AAPCS64. - -Mechanically: - -- Initial implementation can be a thin shim: copy `aapcs64_vtable` - and override only `compute_func_info`'s variadic handling and - the by-value-on-stack promotion rule. Avoid the temptation to - factor a "shared aarch64 base" — two implementations with a - shared static helper for the register classification is enough - abstraction. -- `va_list` on Apple ARM64 is a single `char*` (much simpler than - AAPCS64's struct with five fields). The vtable's `va_list_type` - hook returns the right type per ABI — already structured for - this. -- Apple x86_64 uses SysV-x64 with minor differences (red zone is - the same; varargs use a different `__va_list_tag` layout? — no, - same). 
When x64 lands, `apple_x64_vtable` may be a literal - re-export of `sysv_x64_vtable`; revisit then. -- Microsoft x64 ABI (Windows) is meaningfully different (4 - register args, shadow space, varargs in registers); it gets its - own `ms_x64_vtable` peer when COFF lands. - -### 3.5 Format-aware bookkeeping in the linker layout - -`link_layout.c` emits a few ELF-shaped artifacts that need to be -either generalized or dispatched: - -- **Build-id note** (`.note.gnu.build-id`) is ELF-specific. Mach-O - uses `LC_UUID`. Move the build-id synthesis out of `layout` into - a per-format hook called by the format-specific emitter. - Decision: layout produces a 16-byte image identity (a hash of - the post-shift section bytes), and the format emitter packages - it as a build-id note (ELF) or `LC_UUID` (Mach-O) or a debug - directory (COFF/PE). One source of truth for the bytes. -- **TLS layout.** ELF uses `PT_TLS` + per-arch tpoff relocs; - Mach-O uses `__thread_vars` / `__thread_data` / `__thread_bss` - sections and `tlv_descriptor` records (a function pointer + key - + offset). The TLS lowering in cgtarget is already arch-aware; - the **section name choice** in cg/abi must become format-aware. - Add a `target.obj`-keyed dispatch where TLS section names are - picked. -- **Init/fini.** ELF uses `.init_array` / `.fini_array`; Mach-O - uses `__DATA,__mod_init_func` / `__DATA,__mod_term_func`. Same - format-aware-section-name dispatch. -- **Common symbols.** ELF emits `SK_COMMON` as `SHN_COMMON`; - Mach-O lays them out into `__DATA,__common` at link time. The - read/write paths absorb this — no `ObjBuilder` change. - -### 3.6 Driver: native execution path on Darwin hosts - -`driver/cc.c` and friends already understand `darwin`/`macos` at -the parse layer. The execution path for tests is the only new -thing: an arm64-darwin Mach-O executable runs natively on the -Darwin/arm64 host, no podman, no qemu. 
Detection rule for -`test/lib/exec_target.sh`: - -- A new `exec_target_darwin_native` predicate that returns 0 if - the host is `darwin/<matching-arch>` and the case's target os - is `MACOS`. Bypass the podman / qemu branches entirely; just - `chmod +x` and exec. -- For Linux hosts, executing macOS binaries is not supported. Mark - cases as XFAIL on non-Darwin hosts. -- For Darwin hosts, executing Linux binaries continues to flow - through podman as today — Apple's Virtualization.framework via - Podman Desktop or `podman machine` is already the working path - for the existing aarch64-linux suite when developing on a Mac. - ---- - -## 4. Phasing - -Three phases. Each is independently mergeable; phase 1 lands with -no behavior change, phase 2 lands a working `cfree -c` Mach-O -output, phase 3 lands `cfree` link-to-exe. - -### Phase 1 — seams + format-aware bookkeeping - -Pure refactors. No Mach-O bytes emitted yet; ELF output unchanged -byte-for-byte. - -1. **Linker emit dispatch** (§3.3). `link_emit_image_writer` (the - one site at `link/link.c:422`) gains a switch over `target.obj`; - `CFREE_OBJ_MACHO` and `CFREE_OBJ_COFF` panic with - `unimplemented` (replacing the catch-all "only ELF" panic with - per-format ones). Move the build-id synthesis out of layout - into a format-agnostic image-identity hook (§3.5). -2. **ABI vtable widening** (§3.4). `abi.c::select_vtable` keys on - `(arch, os)`. Add `src/abi/abi_apple_arm64.c` with - `apple_arm64_vtable` initially aliasing AAPCS64. The - variadic-on-stack and small-int-promotion overrides land in - phase 2 alongside the macho writer (so a build-it-and-see-what- - breaks loop on a real macOS toolchain catches divergences). - Same shape for `apple_x64_vtable` (deferred to whenever x64 - lands on Darwin). -3. **Format-aware section-name dispatch** (§3.5). TLS, - init/fini, and common-symbol section naming become a - target-keyed function rather than the hard-coded ELF strings - currently in `cg` / `abi` / `link_layout`. 
ELF behavior is - unchanged; the dispatch is a fall-through to the current - strings until a Mach-O case is added. -4. **Mach-O headers and reloc translator stubs** (§3.1, §3.2). - New `src/obj/macho.h` with the on-disk structures. New - `src/obj/macho_reloc_aarch64.c` with translator stubs (panic - on call). No behavior change — neither is reachable yet. - -Exit criterion: ELF test suite green and byte-for-byte identical -output objects on a representative `test/cg/` set. - -### Phase 2 — Mach-O object writer + reader (MH_OBJECT) - -Now the writer produces real Mach-O bytes. No linker work yet — -we validate by running a clang-built binary that links a -cfree-emitted `.o`. - -1. **`obj/macho_emit.c`** — writes `MH_OBJECT` from a finalized - `ObjBuilder`. Layout: header → load commands - (`LC_SEGMENT_64`-with-everything, `LC_SYMTAB`, - `LC_DYSYMTAB`, `LC_BUILD_VERSION`) → section bytes → reloc - tables → symtab/strtab in `__LINKEDIT`. (`MH_OBJECT` keeps - everything in one `LC_SEGMENT_64`.) -2. **`obj/macho_reloc_aarch64.c`** — fill in the translators - stubbed in phase 1. The `ADDEND`-pair handling on write; the - `SUBTRACTOR`-pair handling on read. -3. **`obj/macho_read.c`** — parses `MH_OBJECT` (and `MH_DYLIB` - for the linker's DSO-input path) into an `ObjBuilder`. -4. **Apple ARM64 ABI deltas.** Variadic-on-stack and small-int - promotion in `abi_apple_arm64.c` (the Phase 1 alias is no - longer correct once anything calls a varargs function). -5. **Native exec helper** (§3.6). `test/lib/exec_target.sh` - gains the Darwin-native branch. -6. **Smoke test.** A `test/cg/` case compiles to `.o` via cfree - targeting `arm64-apple-macos`, links via host `clang`, and - runs natively. Greens the standard pass/fail line. -7. **`objdump` round-trip.** A Mach-O `.o` produced by clang - round-trips through `read_macho` → `emit_macho` and - re-`read_macho` produces an equivalent `ObjBuilder`. This is - the standard cfree round-trip discipline. 
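The `ADDEND`-pair handling on write (Phase 2, item 2) can be sketched as below. This is a minimal illustration under stated assumptions, not cfree's translator: `MachoReloc`, `pack_info`, and `emit_page21_with_addend` are hypothetical names. The bit positions follow the little-endian `relocation_info` layout from `<mach-o/reloc.h>`, and the two type values are the `ARM64_RELOC_*` constants from `<mach-o/arm64/reloc.h>`.

```c
#include <assert.h>
#include <stdint.h>

/* Type values as defined in <mach-o/arm64/reloc.h>. */
enum { ARM64_RELOC_PAGE21 = 3, ARM64_RELOC_ADDEND = 10 };

/* Hypothetical in-memory image of one 8-byte relocation_info entry. */
typedef struct MachoReloc {
  int32_t  r_address; /* offset of the fixup within its section */
  uint32_t r_info;    /* symbolnum:24 | pcrel:1 | length:2 | extern:1 | type:4 */
} MachoReloc;

static uint32_t pack_info(uint32_t symnum, uint32_t pcrel, uint32_t length,
                          uint32_t ext, uint32_t type) {
  return (symnum & 0x00FFFFFFu) | (pcrel << 24) | (length << 25) |
         (ext << 27) | (type << 28);
}

/* Writes one or two entries into out[] and returns the count. A nonzero
 * addend goes out-of-band in a leading ARM64_RELOC_ADDEND entry whose
 * 24-bit r_symbolnum field carries the addend itself. */
static int emit_page21_with_addend(MachoReloc out[2], int32_t offset,
                                   uint32_t symidx, int32_t addend) {
  int n = 0;
  if (addend != 0) {
    assert(addend >= -(1 << 23) && addend < (1 << 23)); /* must fit 24 bits */
    out[n].r_address = offset;
    out[n].r_info = pack_info((uint32_t)addend, 0, 2, 0, ARM64_RELOC_ADDEND);
    n++;
  }
  /* The real reloc follows immediately: pc-relative, 4-byte, extern symbol. */
  out[n].r_address = offset;
  out[n].r_info = pack_info(symidx, 1, 2, 1, ARM64_RELOC_PAGE21);
  n++;
  return n;
}
```

The read side would do the inverse: on seeing `ARM64_RELOC_ADDEND`, stash the addend, consume the next entry, and produce a single neutral reloc carrying both.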
- -Exit criterion: `cfree -c` for `arm64-apple-macos` produces an -object that links via the host `ld` / `clang` into a runnable -executable, and clang-produced Mach-O round-trips through -cfree's reader/writer. - -### Phase 3 — Mach-O linker (`link_emit_macho`) - -Now cfree links its own Mach-O executable end-to-end, no clang. - -1. **`link/link_macho.c`** — `link_emit_macho(img, w)` peer of - `link_emit_elf`. `MH_EXECUTE` + `MH_PIE`. Modern dyld path: - `LC_DYLD_CHAINED_FIXUPS` (§3.3 decision). `__TEXT,__stubs` - for imported function calls; `__DATA_CONST,__got` for - imported data. -2. **Imported-DSO load commands.** `LC_LOAD_DYLIB` per imported - `.dylib` input (peer of ELF's `DT_NEEDED`). The Phase 1 - linker model for DSO inputs (`LINK_INPUT_DSO_BYTES`) already - carries the soname-equivalent (Mach-O's `install_name`); on - the read side, `read_macho` extracts it from `LC_ID_DYLIB`. -3. **`LC_MAIN` entry.** The entry symbol resolution - (`img->entry_sym`) already happens generically; the format - emitter just packages it as `LC_MAIN`'s `entryoff`. -4. **First end-to-end exe.** A `test/cg/` hello-world targeting - `arm64-apple-macos` compiles + links via cfree and runs on - the host. Exit code threads through the standard pass/fail - line. -5. **libSystem coverage.** `printf` / `errno` / `malloc` — - linking against `libSystem.B.dylib` (the umbrella that - re-exports libc, libm, libdyld, libpthread). Sysroot extraction - for the Darwin SDK lives in a new `test/sdk/macos/` peer of - `test/libc/{musl,glibc}/` — `xcrun --show-sdk-path` is the - source on Darwin hosts; cross-from-Linux is out of scope. -6. **Universal binaries (deferred)** — fat-header wrapper around - per-arch slices. Lands when a user wants it, not earlier. - -Exit criterion: each Mach-O milestone owns a `test/cg/` case -running natively on a Darwin/arm64 host; pass/fail line green. -ELF suite still green. - ---- - -## 5. 
PE/COFF as a peer (forward look) - -The seams in §3 are sized for COFF too: - -- `obj/coff_emit.c` / `obj/coff_read.c` peer `obj/macho_*`. -- `obj/coff_reloc_<arch>.c` peer `obj/macho_reloc_*`. -- `link/link_coff.c::link_emit_coff` peer `link_macho.c`. PE - uses optional headers + data directories instead of Mach-O's - load commands; the LinkImage model is still adequate. -- `abi/abi_ms_x64.c` (Win64 ABI) and a hypothetical - `abi_ms_arm64.c` (Windows on ARM ABI) as ABI vtable peers. -- Windows-on-Linux execution is wine-shaped; on a Windows host - it is native. The exec helper grows a Windows native branch; - on Linux/Darwin hosts, COFF cases default to XFAIL until a wine - branch is added. - -The ABI vtable's `(arch, os)` keying naturally captures Microsoft -ABI vs SysV vs Apple — Windows-arm64 picks `ms_arm64`, not -`apple_arm64`, even though both are arm64. - -PE/COFF gets its own `MULTIOBJ_PE.md` design pass when its -critical path opens; this doc reserves the seams. - ---- - -## 6. Naming conventions - -For the new files and exposed symbols: - -- Mach-O code lives under `src/obj/macho_*` and - `src/link/link_macho.c`. Identifiers use the `macho_` prefix - (`macho_emit`, `macho_reloc_aarch64_to`, - `link_emit_macho`). -- Apple ABIs use the `apple_` prefix (`apple_arm64_vtable`, - `apple_x64_vtable`). The host OS is the discriminator; any - future Apple-only-on-arch prefix (e.g. for an iOS-specific - variant) would extend this — but iOS / tvOS / watchOS share - the same ABIs as macOS for the relevant arches, so no second - prefix is needed. -- Mach-O reloc translators stay arch-suffixed: - `macho_reloc_aarch64.c`, `macho_reloc_x86_64.c` (matches the - ELF translator naming). -- Win64 / Windows-ARM64 (deferred) use the `ms_` prefix - (`ms_x64_vtable`, `ms_arm64_vtable`). COFF code uses the - `coff_` prefix. - ---- - -## 7. 
Testing harness - -The existing `test/cg/` and `test/link/` matrices already do everything -the Mach-O work needs to validate against — round-trip (Path R), exec -(Path E), JIT (Path J), DWARF check (Path W). We extend that -infrastructure rather than standing up a parallel `test/macho/` peer -of `test/elf/`: `test-link`'s Path R covers `clang -c` → cfree-roundtrip -→ structural diff, and Path E covers run-the-binary. The same machinery -serves Mach-O once the harness can pick a Mach-O target and exec the -result. - -### 7.1 Target selection - -A new `CFREE_TEST_OBJ` env var sits parallel to `CFREE_TEST_ARCH`, -values `elf` (default) | `macho` (later `coff`). `cfree_test_target.h` -reads it and sets `t->obj` and `t->os` together (`macho` ⇒ MACOS) so -both the C runners and the shell drivers stay in lockstep. - -`test/cg/run.sh`'s clang-cross detection grows a Mach-O branch: - -- `elf` ⇒ `--target=<arch>-linux-gnu` as today. -- `macho` ⇒ `--target=arm64-apple-macos` (or `x86_64-apple-macos`). - -### 7.2 Per-case applicability - -`test/cg/cases/*.arches` becomes `*.targets`, listing one -`<arch>-<obj>` tuple per line (`aarch64-elf`, `arm64-macos`, -`x86_64-elf`, …). Cases with no file default to "all supported -tuples"; cases that exercise format-specific features (GNU IFUNC, -SHT_GNU_RETAIN, ELF linker scripts) name only the tuples they -support. The Phase-2 Mach-O allowlist starts with a small set — -hello-world, integer arithmetic, locals, calls, varargs — and grows -as ABI deltas and reloc-translator coverage land. - -### 7.3 Exec dispatch (`test/lib/exec_target.sh`) - -The queue tag widens from `<arch>` to `<arch>-<os>`: - -- `aarch64-linux` → existing podman/qemu path on a Linux container. -- `arm64-macos` (new) → on a Darwin/arm64 host, `chmod +x && ./exe` - natively (no podman, no qemu). On non-Darwin hosts, SKIP cleanly - with "macOS exec requires Darwin host". Mach-O cannot be loaded - by the Linux kernel. -- macOS-on-Linux is unsupported and stays SKIP. 
Linux-on-macOS - continues to flow through the podman path (already works on - Darwin/arm64 via `podman machine`). - -### 7.4 Phase-2 Path E (linker delegation) - -`link_emit_macho` doesn't exist until Phase 3, so Phase-2 Path E -delegates to host `clang`: `cfree -c case.c -o case.o` then -`clang -o case case.o`. A new `test/lib/link_macho_via_clang.sh` -peer of `link_exe_runner` packages this so `test/cg/run.sh` and -`test/link/run.sh` route Mach-O cases through it. Phase 3 swaps the -helper for cfree's own linker; cases don't change. - -Clang's invocation of `ld` automatically inserts an ad-hoc code -signature, so Phase-2 binaries exec on macOS 11+ without extra steps. -Phase 3 inherits that responsibility — see the codesigning task -below. - -### 7.5 Round-trip diff - -Path R already runs `cfree-roundtrip` (read → write) and structural- -diffs the input vs. the rewritten output. For ELF the diff is -`readelf -aW | normalize.py`. For Mach-O the equivalent is -`llvm-objdump --macho --syms --reloc --section-headers | normalize_macho.py`, -a small new normalizer alongside `test/elf/normalize.py`. This is -the only new Mach-O-specific test artifact Phase 2 ships — and -because the diff is structural, it doesn't have to be byte-perfect -against clang's output (just round-trip-stable through cfree's -reader/writer). - -### 7.6 Sysroot - -Layer-B / Path R round-trip needs no sysroot — clang produces the -`.o` without linking. Path E (Phase 2 via clang) needs the host SDK -on Darwin: `xcrun --show-sdk-path` is the only sanctioned source. -Cross-from-Linux is out of scope (Apple SDK isn't redistributable). -A new `test/sdk/macos/` peer of `test/libc/{musl,glibc}/` handles the -extraction, only invoked when libc-dependent cases are added (mostly -a Phase-3 concern; Phase-2 smoke can stay freestanding). - -### 7.7 `make` targets - -No new top-level harness target. 
Existing `make test-link` / -`make test-cg` honor `CFREE_TEST_OBJ` (and `CFREE_TEST_ARCH`); CI -runs them once per supported `(arch, obj)` tuple. The default -invocation stays `aarch64-elf` so `make test` behavior is unchanged. - ---- - -## 8. Validation gates - -A change in this plan is "done" when: - -- **Phase 1**: ELF test suite still green, byte-for-byte identical - output objects on a representative set of `test/cg/` cases. -- **Phase 2**: Mach-O `.o` produced by cfree links via host - `clang` into a runnable arm64-darwin executable; clang-built - `.o` round-trips through cfree's reader/writer (`test-link` - Path R with `CFREE_TEST_OBJ=macho`); ELF suite still green. -- **Phase 3**: `cfree -c` + `cfree` linker produces an - arm64-darwin Mach-O exe that runs natively on the Darwin host - (ad-hoc codesigned by `link_macho.c` so the kernel will exec - it); per-milestone `test/cg/` cases green; ELF suite still - green. - ---- - -## 9. Remaining work - -### 9.1 Phase 3.1 — `test-link` Path E on `aa64-macho` - -Phase 3 lit up the cc-driver path: `cfree cc -target arm64-apple-macos -src.c -o exe -lSystem` produces a runnable binary. But -`make test-link CFREE_TEST_OBJ=macho` reports **72 pass / 36 fail / 0 -skip** — the R (round-trip) and J (JIT) lanes are green across all 36 -test cases, while every E (exec) lane fails at link time with - - fatal: link: undefined reference to '___fini_array_end' - -(`__fini_array_end` shown after the Mach-O leading-`_` strip in the -diagnostic). - -The harness's `start.o` is built by host clang from -`test/link/harness/start.c`, which references the array-boundary -symbols (`__init_array_start/end`, `__fini_array_start/end`, -`__preinit_array_start/end`, `__cfree_ifunc_init`, -`__start_iplt_pairs`, `__stop_iplt_pairs`). On Mach-O, clang mangles -those with a leading `_` so the .o carries `___fini_array_end` (3 -underscores). - -`link_layout.c::boundary_name` already prefixes every -linker-synthesized boundary symbol on Mach-O. 
But when the runner is -exercised in isolation, those boundary symbols **never get -synthesized** — `emit_array_boundaries` evidently runs but the -resulting `LinkSymbol` doesn't satisfy the per-input shadow's -`defined=0` check. Two suspects, in order: - -1. The fan-out in `emit_boundary_sym` matches by `Sym` equality. If - the start.o's per-input shadow interns - `___fini_array_end` to a different `Sym` than `boundary_name` - produces (e.g. one path goes through `pool_intern` and another - through `pool_intern_cstr` with a length mismatch), they wouldn't - match. Both call sites use the same global `Pool`, so this should - be a no-op — but worth confirming with a single byte-level - comparison instead of the Sym-equality short-circuit. -2. `link-exe-runner` may panic in `resolve_symbols` (before - `emit_array_boundaries`) because the start.o's `__cfree_ifunc_init` - undef hits a code path that doesn't tolerate it. The earlier - widening of `is_def` (require backing storage) is the same change - that fixed the in-memory CALL26 case; it might be triggering a - different early panic now. - -Recommended next step: instrument `cfree_link_exe` to print every -`compiler_panic` site and the LinkSymbol state at the moment of the -first failure, then walk back from there. Stderr fprintfs from -`link_layout.c` were observed not to reach the runner's captured -stderr in one local repro — verify whether the runner's -`cfree_writer` redirection is intercepting them, or use a -`compiler_panic`-shaped marker that the runner does propagate. - -Other items the E lane will surface once it gets past the start.o -link: - -- `21_fini_array` / `22_init_fini_both` — Mach-O destructors flow - through `__cxa_atexit`, not the `.fini_array` shape `start.c` walks. - Same `j_targets`-style restriction the J lane already uses; extend - to E. 
-- `25a_gc_basic` / `25d_gc_chain` — `--gc-sections` granularity is - per-section, but Apple's clang emits a single `__TEXT,__text` per - `.o` (subsections-via-symbols is per-symbol). Same restriction. -- `kernel_image` cases — freestanding ELF kernels with their own - linker scripts; not portable to Mach-O at all. `targets` - applicability marker should drop them on `aa64-macho`. -- `bad/` cases that probe ELF-specific malformations (`shoff_oob`, - `wrong_class`) need either Mach-O analogues or `targets` exclusion. - -### 9.2 Phase 4 — `test-cg` Path E on `aa64-macho` - -`make test-cg CFREE_TEST_OBJ=macho` exercises every cg case end-to-end -via Path E (compile + link + run). Phase 4 prerequisites: - -- `test/cg/run.sh` already routes Path E for Mach-O through - `link_macho_via_clang.sh` (per Phase 2 §7.4); switch to cfree's own - linker once Phase 3.1 is green. -- `test/sdk/macos/` shim materializes `xcrun --show-sdk-path` for - `-isysroot` and `-lSystem` resolution. No-op on a Linux host — - cases requiring libc stay SKIP there. -- `*.targets` audit: every cg case that's currently `aarch64-elf`-only - should either grow `arm64-macos` or document why it's restricted - (linker scripts, IFUNC, ELF-specific intrinsics). - -### 9.3 Phase 5 — x86_64-darwin - -Additive on top of Phases 3–4, gated on `MULTIARCH.md` Phase 3 (x64 -codegen) landing. Concrete scope: - -- `obj/macho_reloc_x86_64.c` — `RelocKind` ↔ `X86_64_RELOC_*` - (UNSIGNED, SIGNED, BRANCH, GOT, GOT_LOAD, SUBTRACTOR, TLV, - SIGNED_1/2/4). Mirror of `macho_reloc_aarch64.c`. -- `link_emit_macho` arch dispatch — currently arm64-only at the - cputype/stub-encoding level. Add an x86_64 branch: 5-byte - `jmpq *got(%rip)` stubs (vs arm64's 12-byte adrp+ldr+br). -- `apple_x64_vtable` — likely a literal re-export of `sysv_x64_vtable` - per §3.4 design; revisit if testing reveals a quirk. 
-- `CFREE_TEST_ARCH=x64 CFREE_TEST_OBJ=macho` lane in CI on a - Darwin/x86_64 host (or skipped cleanly on Apple Silicon, since - Rosetta-emulation of cfree-emitted binaries isn't a goal). - -### 9.4 Phase 6 — universal (fat) binaries - -Optional. Fat header wrapping per-arch `MH_EXECUTE` slices. Defer -until a user wants `lipo`-style multi-arch output. Implementation is -shallow — a fat header prepended to the existing slice writer, plus -matching multi-arch reader. - -### 9.5 Cleanup deferred from Phase 3 - -- The Phase-2 deferred item (clang-emitted Mach-O round-trip via - `read_macho` → `emit_macho`) needs section-relative reloc and - `__compact_unwind` handling. Independent of linker work; lift out - into its own task. -- `read_tbd` is a permissive token scanner (every `_id` becomes an - exported sym). Tighten to filter Obj-C metadata (`_OBJC_CLASS_$_*`) - and `R<rev>$_*` reverse-export markers if Apple ever adds a symbol - whose textual form would clash with a real C identifier. -- `link_macho.c` carries a few oversize cleanup `free()` calls that - pass `0` for the byte size (the buffers came from `VEC_GROW` which - doesn't track capacity post-hand-off). Audit — leak-equivalent on - the panic path, harmless on success. diff --git a/doc/REGALLOC.md b/doc/REGALLOC.md @@ -1,346 +0,0 @@ -# REGALLOC — CG-driven spill/reload on a finite physical-register pool - -The single-pass codegen produces target machine code by streaming -`CGTarget` calls. Backends expose a finite scratch-register pool -(aarch64 today: 10 INT, 16 FP). CG must drive that pool correctly under -arbitrary register pressure: when the pool is empty and another reg is -needed, spill the deepest live value on the CG value stack to a frame -slot, free its register, and proceed. When a spilled value is consumed, -reload it first. - -This document defines the contract between CG and the backend and the -residency state machine on the value stack. 
Opt is out of scope — -`opt_cgtarget` panics on `spill_reg`/`reload_reg` today and that -remains the case until the deferred Phase 3 RA pass lands. - ---- - -## 1. What's broken today - -`aa_alloc_reg` panics with "out of INT scratch (no spill yet)" once 10 -integer scratch regs are live. To work around this, CG calls -`cg_reset_scratch` at every statement boundary -(`parse/parse.c:1618`), which calls `aa_reset_scratch(g->target)` -directly (`cg/cg.c:412`). That direct call is a layering violation: - -- It hardcodes the aarch64 backend by name. -- When opt is in the chain, `g->target` is an `OptImpl*`, not an - `AAImpl*`. `aa_reset_scratch` casts it as the latter and writes - `used_int=0; used_fp=0;` at AAImpl offsets inside opt's bytes — - silent memory corruption on every statement boundary. -- The reset hides the fact that CG never calls `free_reg` on consumed - SValues. The pool is recycled by fiat, not by tracking ownership. - -Both `cg_reset_scratch` and `aa_reset_scratch` are removed by this -design. `free_reg` becomes load-bearing. - ---- - -## 2. CGTarget contract changes - -### 2.1 `alloc_reg` returns `REG_NONE` on exhaustion - -```c -Reg (*alloc_reg)(CGTarget*, RegClass, const Type*); -``` - -Returns a fresh physical reg, or `REG_NONE` if the class's pool is -empty. Backends never panic on exhaustion — that decision belongs to -CG. Other failure modes (unknown `RegClass`, internal invariant -violation) still panic. - -### 2.2 `free_reg` is real - -```c -void (*free_reg)(CGTarget*, Reg); -``` - -Returns `r` to the backend's free-list. Idempotent calls and -double-frees are bugs and may panic. CG owes exactly one `free_reg` -per `alloc_reg` over the lifetime of every value. - -### 2.3 `spill_reg` implies `free_reg` - -```c -void (*spill_reg)(CGTarget*, Operand src_reg, FrameSlot, MemAccess); -``` - -`src_reg` must be `OPK_REG`. The call: - -1. Stores the register's contents to the frame slot per `MemAccess`. -2. 
Returns the register to the backend's free-list. - -After `spill_reg`, the caller must not use `src_reg` and must not call -`free_reg` on it. Coupling these two operations matches every CG -caller's intent and removes a forget-to-free foot-gun. - -### 2.4 `reload_reg` is independent - -```c -void (*reload_reg)(CGTarget*, Operand dst_reg, FrameSlot, MemAccess); -``` - -`dst_reg` must already have been allocated by the caller via -`alloc_reg`. Loads the slot's bytes into the register. The slot is not -released by this call — CG returns it to its own slot free-list. -Multiple reloads from the same slot are well-defined; CG does not rely -on this but the contract permits it. - -### 2.5 Backend pool implementation - -Each `RegClass` holds a `u32` free-mask: bit `i` set means the i-th -register in that class's contiguous range is free. The aarch64 backend -keeps two such masks alongside the per-class base register: - -```c -typedef struct RegPool { - u32 free; /* bit i set ⇔ regs[base + i] is free */ - u8 base; /* first physical reg in the class */ - u8 nregs; /* count; bits [nregs..32) are always 0 */ -} RegPool; -``` - -For aarch64: - -- INT pool: `base = 19`, `nregs = 10`, initial `free = 0x000003FFu` - (x19..x28). -- FP pool: `base = 8`, `nregs = 16`, initial `free = 0x0000FFFFu` - (v8..v23, callee-saves first then caller-saved scratch). - -The three pool ops are pure bit operations: - -```c -static Reg pool_alloc(RegPool* p) { - if (p->free == 0) return REG_NONE; - u32 idx = (u32)__builtin_ctz(p->free); - p->free &= ~(1u << idx); - return (Reg)(p->base + idx); -} - -static void pool_free(RegPool* p, Reg r) { - u32 idx = (u32)r - p->base; - /* Double-free is a CG bug — bit must currently be 0. */ - if (p->free & (1u << idx)) - compiler_panic(..., "free_reg: %u already free", (unsigned)r); - p->free |= (1u << idx); -} -``` - -`spill_reg` emits the store via the backend's existing store path, -then calls `pool_free` to release the bit. 
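The pool ops above can be exercised in isolation. The following is a self-contained sketch, assuming `Reg` is a plain integer, `REG_NONE` is an all-ones sentinel, and `assert` stands in for `compiler_panic`; `pool_init` is a hypothetical helper that derives the initial mask from `nregs`. `__builtin_ctz` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint8_t  u8;
typedef u32      Reg;                  /* assumption: Reg is an integer id */
#define REG_NONE ((Reg)0xFFFFFFFFu)    /* assumption: sentinel value */

typedef struct RegPool {
  u32 free;  /* bit i set <=> reg (base + i) is free */
  u8  base;  /* first physical reg in the class */
  u8  nregs; /* count; bits [nregs..32) stay 0 */
} RegPool;

/* Hypothetical helper: all nregs registers start out free. */
static RegPool pool_init(u8 base, u8 nregs) {
  RegPool p;
  p.free  = (nregs >= 32) ? 0xFFFFFFFFu : ((1u << nregs) - 1u);
  p.base  = base;
  p.nregs = nregs;
  return p;
}

static Reg pool_alloc(RegPool* p) {
  if (p->free == 0) return REG_NONE;      /* exhaustion is not a panic */
  u32 idx = (u32)__builtin_ctz(p->free);  /* lowest free bit */
  p->free &= ~(1u << idx);
  return (Reg)(p->base + idx);
}

static void pool_free(RegPool* p, Reg r) {
  u32 idx = r - p->base;
  assert(idx < p->nregs);                 /* reg belongs to this class */
  assert((p->free & (1u << idx)) == 0);   /* double-free is a CG bug */
  p->free |= 1u << idx;
}
```

With `pool_init(19, 10)` the initial mask is `0x3FF`, matching the aarch64 INT pool described above, and a freed low register is handed out again before any higher one.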
- -`__builtin_ctz` picks the lowest free bit, so allocation order is -deterministic and the same physical regs stay hot — matching today's -sequential allocation order for diff stability across the test corpus. -The 32-bit mask is sufficient: every architecture cfree targets has -fewer than 32 scratch regs per class. - ---- - -## 3. CG residency model - -### 3.1 SValue extension - -```c -typedef enum SResidency { - RES_INHERENT, /* IMM / LOCAL / GLOBAL — no reg owed */ - RES_REG, /* op holds a Reg this SValue owns */ - RES_SPILLED, /* register contents stored to spill_slot; - op.kind reflects the original value form - (REG or INDIRECT); op.v.reg is REG_NONE */ -} SResidency; - -typedef struct SValue { - Operand op; - const Type* type; - SResidency res; - FrameSlot spill_slot; /* valid iff res == RES_SPILLED */ - u8 pinned; /* 1 = ineligible spill victim */ -} SValue; -``` - -`OPK_INDIRECT` lvalues hold a base register and are tracked with -`RES_REG`. On spill the base reg is saved; on reload the SValue is -restored to `OPK_INDIRECT` with the freshly-reloaded base. The -deferred-load identity of the lvalue is preserved across spill/reload -— CG does not eagerly materialize an INDIRECT to a plain rvalue just -because it became a spill victim. - -`pinned` is set for the duration of one CG operation. Pinning prevents -a freshly-reloaded operand from being chosen as the spill victim while -CG is still arranging the other operand of a binop or call. - -### 3.2 Spill-slot pool - -CG maintains two free-lists of `FrameSlot`s, one per `RegClass`: - -- `RC_INT` slots are 8 bytes, 8-byte aligned. -- `RC_FP` slots are 16 bytes, 16-byte aligned (covers `double` and - the spilled portion of `long double`). - -A spill takes a slot from the free-list (allocating a fresh -`FrameSlot` from the backend if the list is empty). A reload returns -the slot to the free-list. Frame footprint is bounded by peak -concurrent spills per class, which on a stack machine of typical depth -is small. 
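A per-class slot free-list of the kind described here might look like the sketch below. It is illustrative only: `SlotPool`, `MAX_SLOTS`, and the `next_fresh` counter (standing in for the backend's frame-slot allocator) are assumptions, and `FrameSlot` is modeled as an opaque integer id. The function names mirror `take_spill_slot` / `return_spill_slot` from §4, but take a `SlotPool*` instead of the CG context.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t FrameSlot;            /* assumption: opaque slot id */
enum { RC_INT, RC_FP, RC_COUNT };
enum { MAX_SLOTS = 64 };               /* assumption: fixed toy capacity */

typedef struct SlotPool {
  FrameSlot free_list[RC_COUNT][MAX_SLOTS]; /* one LIFO list per class */
  uint32_t  nfree[RC_COUNT];
  FrameSlot next_fresh; /* stand-in for asking the backend for a new slot */
} SlotPool;

/* Spill path: reuse a free slot of this class, else allocate a fresh one. */
static FrameSlot take_spill_slot(SlotPool* sp, int cls) {
  if (sp->nfree[cls] > 0) return sp->free_list[cls][--sp->nfree[cls]];
  return sp->next_fresh++;
}

/* Reload path: the slot becomes reusable by the next spill of this class. */
static void return_spill_slot(SlotPool* sp, FrameSlot s, int cls) {
  assert(sp->nfree[cls] < MAX_SLOTS);
  sp->free_list[cls][sp->nfree[cls]++] = s;
}
```

Because a returned slot is reused before a fresh one is requested, `next_fresh` only grows when concurrent spills of a class exceed the previous peak, which is the frame-footprint bound stated above.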
- -Slots are per-function. `cg_func_end` discards both free-lists. - -### 3.3 Invariants - -After every public `cg_*` call returns: - -1. Every `RES_REG` SValue on the stack owns its register; the sum of - `RES_REG` regs equals the backend's allocated-reg count. -2. No SValue is `pinned`. -3. Every `RES_SPILLED` SValue holds a slot that is *not* on the - slot free-list. - -`cg_func_end` asserts the stack is empty and both free-lists are well- -formed (every entry distinct, no aliasing with a live `RES_SPILLED` -slot). - ---- - -## 4. Spill / reload algorithm - -### 4.1 Allocation with fallback - -``` -alloc_reg_or_spill(g, cls, ty) -> Reg: - r = target->alloc_reg(cls, ty) - if r != REG_NONE: - return r - victim = pick_victim(g, cls) - if victim == NULL: - panic("regalloc: no spillable victim") // pinned-only is a CG bug - slot = take_spill_slot(g, cls) - target->spill_reg(victim.op, slot, mem_for_spill(victim, cls)) - victim.spill_slot = slot - victim.res = RES_SPILLED - victim.op.v.reg = REG_NONE - r = target->alloc_reg(cls, ty) - assert(r != REG_NONE) - return r -``` - -`pick_victim` walks the value stack from index 0 upward and returns -the first SValue with `res == RES_REG`, `pinned == 0`, and matching -`RegClass`. This is FIFO from the bottom — equivalent to "deepest -first" — and matches the intuition that the top of the stack is about -to be consumed. - -`mem_for_spill` constructs a `MemAccess` from the victim's type with -`alias.kind = ALIAS_LOCAL` rooted at the spill slot. 
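The allocation-with-fallback path in §4.1 can be modelled in a few lines of C. The sketch below is an illustration under stated assumptions, not the real CG code: it uses a single register class, a hypothetical `Model` struct in place of `CG` and the backend, a bitmask free-list with `__builtin_ctz` (a GCC/Clang builtin), and it assumes that spilling a victim releases its register back to the pool, which the pseudocode implies but does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { RES_INHERENT, RES_REG, RES_SPILLED } SResidency;
enum { REG_NONE = -1, STACK_MAX = 16 };

typedef struct SValue {
    int reg;              /* REG_NONE unless res == RES_REG */
    SResidency res;
    int spill_slot;       /* meaningful only when res == RES_SPILLED */
    unsigned char pinned; /* 1 = not eligible as a spill victim */
} SValue;

typedef struct Model {    /* hypothetical stand-in for CG + backend */
    uint32_t free_mask;   /* bit i set = register i is free */
    SValue stack[STACK_MAX];
    int depth;
    int next_slot;
} Model;

/* Lowest free bit first, so allocation order is deterministic. */
static int alloc_reg(Model *m) {
    if (m->free_mask == 0) return REG_NONE; /* ctz(0) is undefined */
    int r = __builtin_ctz(m->free_mask);
    m->free_mask &= ~(UINT32_C(1) << r);
    return r;
}

/* Walk from the bottom of the value stack: deepest unpinned value wins. */
static SValue *pick_victim(Model *m) {
    for (int i = 0; i < m->depth; i++) {
        SValue *sv = &m->stack[i];
        if (sv->res == RES_REG && !sv->pinned) return sv;
    }
    return NULL;
}

static int alloc_reg_or_spill(Model *m) {
    int r = alloc_reg(m);
    if (r != REG_NONE) return r;
    SValue *victim = pick_victim(m);
    assert(victim != NULL);          /* pinned-only would be a CG bug */
    victim->spill_slot = m->next_slot++;
    victim->res = RES_SPILLED;
    m->free_mask |= UINT32_C(1) << victim->reg; /* assumed: spill frees it */
    victim->reg = REG_NONE;
    r = alloc_reg(m);                /* second attempt must succeed */
    assert(r != REG_NONE);
    return r;
}

static void push_reg_value(Model *m) {
    assert(m->depth < STACK_MAX);
    int r = alloc_reg_or_spill(m);
    m->stack[m->depth++] =
        (SValue){ .reg = r, .res = RES_REG, .spill_slot = -1, .pinned = 0 };
}
```

With a two-register pool, pushing a third value spills the bottom stack entry and hands its register to the new value. Pinning an entry makes `pick_victim` skip it and choose the next unpinned `RES_REG` value above it.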
- -### 4.2 Reload before consumption - -``` -ensure_reg(g, sv): - if sv.res != RES_SPILLED: - return - r = alloc_reg_or_spill(g, class_of(sv.type), sv.type) - target->reload_reg(op_reg(r, sv.type), sv.spill_slot, - mem_for_spill(sv, class_of(sv.type))) - return_spill_slot(g, sv.spill_slot, class_of(sv.type)) - sv.spill_slot = FRAME_SLOT_NONE - if sv.op.kind == OPK_INDIRECT: - sv.op.v.ind.base = r - else: - sv.op = op_reg(r, sv.type) - sv.res = RES_REG -``` - -`ensure_reg` is called at the start of every operation that consumes -a register-resident operand. The ensure-then-pin pattern is: - -``` -binop(g, op): - b = pop(g); ensure_reg(g, &b); b.pinned = 1 - a = pop(g); ensure_reg(g, &a); a.pinned = 1 - rd = alloc_reg_or_spill(g, class_of(result), result_ty) - target->binop(op, op_reg(rd, result_ty), a.op, b.op) - a.pinned = b.pinned = 0 - target->free_reg(reg_of(a)); target->free_reg(reg_of(b)) - push(g, svalue_reg(rd, result_ty)) -``` - -Pinning is symmetric within one CG call. Nothing leaks across calls. - -### 4.3 Termination - -Each `alloc_reg_or_spill` call either succeeds on first try or evicts -exactly one unpinned `RES_REG` SValue. The pinned set is bounded -(within one CG op, at most a handful: two source operands plus a -destination-in-progress). With at least `pinned + 1` registers in the -class, the second `alloc_reg` call in `alloc_reg_or_spill` is -guaranteed to succeed. Every backend's pool comfortably exceeds this -bound (aarch64: 10 INT, 16 FP; minimum needed: ~3). - -`cg_call` is the one exception to "victims live on the value stack." -While the pop loop materializes args, popped-but-not-yet-emitted regs -accumulate in the local `CGABIValue avs[]` array — off the value -stack and so invisible to `pick_victim`. For a call with more reg- -class args than the pool size can hold, `pick_victim` will eventually -return NULL while genuine victims remain in `avs[]`. 
- -To handle this, `cg_call` publishes its in-flight `avs` array via -`CG.avs_in_flight` for the duration of the pop+materialize loop. -`alloc_reg_or_spill` falls back to `spill_avs_victim` when the stack -victim list is empty: it picks an `OPK_REG` arg storage entry, -evicts it through `spill_reg`, and rewrites `avs[i].storage` to -`OPK_LOCAL` so the backend's call lowering loads from the slot. After -`T->call`, `cg_call` walks `avs` and returns each `OPK_LOCAL` slot to -the spill-slot free-list. Without this fallback, a 12-arg INT call on -aarch64 (10 INT pool) would be unsatisfiable. - -If a backend is added with a class smaller than the pinned bound for -some operation outside `cg_call`, that's a wiring bug; CG asserts -after the retried `alloc_reg` call. - -### 4.4 Free-on-consume - -Every site that pops an SValue and is done with it must release any -register it owned: - -``` -release(g, sv): - if sv.res == RES_REG: - if sv.op.kind == OPK_REG: - target->free_reg(sv.op.v.reg) - else if sv.op.kind == OPK_INDIRECT: - target->free_reg(sv.op.v.ind.base) - else if sv.res == RES_SPILLED: - return_spill_slot(g, sv.spill_slot, class_of(sv.type)) - /* RES_INHERENT: nothing owed */ -``` - -`cg_drop`, the result-discard path of `cg_store`, the operand pops in -every `cg_binop`/`cg_unop`/`cg_cmp`/`cg_call`/`cg_load`/`cg_addr`/etc. -all flow through `release`. This is the audit that bulks up the diff: -CG has many `pop` sites and they currently leak. - ---- - -## 5. Removed mechanisms - -The following are deleted as part of this change: - -- `cg_reset_scratch` (`cg/cg.c`): no replacement; `release` in §4.4 - makes it unnecessary. -- The `extern void cg_reset_scratch(CG*)` declaration and the five call - sites in `parse/parse.c`. -- `aa_reset_scratch` (`arch/aarch64.c`) and its forward declaration in - `cg/cg.c`. -- The "(no spill yet)" panics in `aa_alloc_reg`, replaced by - `return REG_NONE`. 
-- The `aa_panic("spill_reg")`/`aa_panic("reload_reg")` stubs — - replaced by real STR/LDR implementations that route through the - backend's existing store/load paths. - -The `used_int` / `used_fp` fields on `AAImpl` are repurposed: instead -of monotonically-incrementing allocation cursors (where free was a -no-op), they now track the highest-index-+1 ever allocated per class — -the high-water mark needed by the prologue/epilogue to size the -callee-save area. The actual free-list is the `int_free` / `fp_free` -bitmasks added alongside. diff --git a/doc/cg_testing.md b/doc/cg_testing.md @@ -1,308 +0,0 @@ -# cg / CGTarget / MCEmitter test strategy - -How we test the codegen stack — `cg`, `CGTarget`, and `MCEmitter` — *before* -the C parser is ready, and how the suite extends naturally once the parser -arrives. Companion to `DESIGN.md`. Scope: harness shape, layering rationale, -test paths, fixture API, and a coverage corpus. - -## 1. Goal - -Test the meat of the single-pass compiler — recursive-descent parser ↔ `cg` -↔ `CGTarget` ↔ `MCEmitter` ↔ `ObjBuilder` — with the parser stubbed out. -The fixtures play the parser's role: each one drives `cg` (and a small -number drive `CGTarget` or `MCEmitter` directly) to produce a function -named `test_main` that returns an `int`. The harness then runs that -function and checks the return value. - -This decouples codegen development from parser development: we can build -out the AArch64 backend, opt's recording wrapper, MCEmitter encoding, and -CG's value-stack/spill/fusion logic against behavioral oracles, without -needing a working C front-end. - -## 2. 
Why three layers (and how to test each) - -``` -parser → cg → CGTarget → MCEmitter → ObjBuilder - | | | | - | | | └─ already covered by test/elf, test/link - | | └─ encoding, fixups, relocs, alignment, CFI - | └─ typed lowering vtable: takes resolved Operands - └─ TCC-style value stack: spills, fusion, conversions, frame-residency -``` - -- **`MCEmitter`** is the lowest layer. Encoding-table bugs and reloc/fixup - bugs surface here. Best tested with hand-written byte sequences and a - one-instruction `test_main` (analogue of `test/elf/unit/smoke.c`). -- **`CGTarget`** is typed lowering. It receives `Operand`s that are - already resolved (REG / IMM / LOCAL / GLOBAL / INDIRECT). No value - stack, no implicit conversions. Best tested with focused unit anchors - that exhaustively exercise operand-kind combinations and op enums - (every `BinOp`, every `ConvKind`, every `MemOrder`). -- **`cg`** owns C-shaped behavior: value stack, spill/reload across - pressure, `cmp` → `cmp_branch` fusion, frame-residency-by-default for - locals, implicit MemAccess derivation, address-taken tracking. This is - what the parser will drive, so the **primary** suite drives `cg`. - -The three layers map to three case categories: - -| Category | Drives | Where | Volume | -|---|---|---|---| -| Primary | `cg.h` | `test/cg/cases/` | grows with C language coverage | -| Unit | `CGTarget` | `test/cg/unit/cgt_*` | one per method group, write-once | -| Unit | `MCEmitter` | `test/cg/unit/mc_*` | one per encoding family, write-once | - -When a cg-driven case fails mysteriously, the matching unit anchor tells -you whether the bug is in CGTarget/MCEmitter or in `cg` itself. - -## 3. Test paths per fixture - -Each fixture is run through several paths. This mirrors the R/E/J path -matrix in `test/link/run.sh` and reuses its harness binaries. 
- -| Path | Pipeline | Validates | Available on | -|---|---|---|---| -| **D** direct-JIT | fixture → `ObjBuilder*` → `link_add_obj` → `cfree_link_jit` → call `test_main` | live ObjBuilder → JIT path; fastest; no file I/O | aarch64 host | -| **R** roundtrip | fixture → `emit_elf` → bytes → `cfree-roundtrip` → `read_elf` → normalized diff | ELF writer/reader fidelity on synthetic input | always (host-arch agnostic) | -| **E** exec | fixture → `emit_elf` → bytes → `link-exe-runner` → exe → qemu/podman → exit code | file linker + reloc application + ELF emission | when qemu or podman available | -| **J** jit-via-file | fixture → `emit_elf` → bytes → `jit-runner` (reads .o) → call `test_main` | full file → JIT pipeline | aarch64 host | -| **O** opt-wrapped | as D and J, but with `opt_cgtarget` between `cg` and target | IPO + lowering preserve behavior | once opt lands | - -Path D is intentionally distinct from J: it catches bugs that the ELF -emitter writes into a `.o` but the reader silently corrects (or -vice-versa). The two paths together force the post-finalize ObjBuilder -shape and the read-back ObjBuilder shape to be behaviorally equivalent. - -The R, E, and J paths reuse the existing harness binaries unchanged -(`cfree-roundtrip`, `link-exe-runner`, `jit-runner`). Only `cg-runner` is -new. - -## 4. Layout - -``` -test/cg/ - CORPUS.md # coverage matrix (mirrors test/elf/CORPUS.md) - run.sh # per-fixture: D, R, E, J (and O once opt lands) - harness/ - cg_runner.c # multi-mode binary - cg_test.h / cg_test.c # fixture API used by every case - cases.c # registry: { name, build_fn, expected, flags } - start.c # symlink/copy of test/link/harness/start.c - cases/ - a01_return_const.c # cg-driven cases (the primary suite) - a02_return_void.c - c01_add_const.c - c02_arith_mix.c - ... - unit/ - mc_smoke.c # MCEmitter-direct - cgt_load_imm.c # CGTarget-direct anchors - cgt_binop_int.c - ... 
-``` - -`cg-runner` modes: - -``` -cg-runner --list # print every registered case name -cg-runner --emit NAME OUT.o # build the case's ObjBuilder, emit_elf to OUT.o -cg-runner --jit NAME # build, link_add_obj, cfree_link_jit, call test_main -cg-runner --dump NAME # build, emit_elf, run ArchDisasm over .text — no oracle -``` - -The `--dump` mode is for debugging only; it has no expected output and is -not run by default. Snapshot/golden disassembly tests are deferred. - -## 5. Fixture API - -The fixture API hides the boilerplate that every case shares: Compiler -init, ObjBuilder allocation, `cgtarget_new`, `mc_new`, the `CGFuncDesc` -+ `ABIFuncInfo` dance for `test_main`, and the value-stack vs CGABIValue -plumbing for `ret`. - -```c -/* test/cg/harness/cg_test.h */ - -typedef struct CgTestCtx CgTestCtx; -typedef void (*CgCaseFn)(CgTestCtx*); - -typedef struct CgCase { - const char* name; /* "a01_return_const" */ - CgCaseFn build; - int expected; /* test_main's return value; 0 if absent */ - unsigned flags; /* CG_CASE_* */ -} CgCase; - -extern const CgCase cg_cases[]; /* registry, NUL-terminated */ -extern unsigned cg_cases_count; - -/* Common interned types; populated by cg_test_init. */ -extern const Type *T_VOID, *T_I8, *T_I16, *T_I32, *T_I64, *T_U32, *T_U64, - *T_F32, *T_F64, *T_PTR_VOID; - -/* Helpers cases use. */ -CG* cgtest_cg(CgTestCtx*); -CGTarget* cgtest_target(CgTestCtx*); -Compiler* cgtest_compiler(CgTestCtx*); - -/* Begin a function `<ret_ty> test_main(void)`. Allocates the ObjSymId, - * builds CGFuncDesc, queries TargetABI, calls cgtarget->func_begin. */ -typedef struct CgTestFn CgTestFn; -CgTestFn* cgtest_begin_main(CgTestCtx*, const Type* ret_ty); - -/* Begin a named function with parameters; for cases that need helpers. */ -CgTestFn* cgtest_begin_fn(CgTestCtx*, const char* name, - const Type* ret_ty, - const Type* const* param_tys, unsigned nparams); - -/* Return a value sitting on top of cg's value stack; ret_ty must match. 
*/ -void cgtest_ret_value(CgTestFn*); -void cgtest_ret_void (CgTestFn*); - -/* End the function (cgtarget->func_end). */ -void cgtest_end(CgTestFn*); - -/* Operand sugar (used by CGTarget-direct unit cases). */ -Operand IMM (i64 v, const Type*); -Operand REG_OF(Reg r, const Type*); -Operand LOCAL_(FrameSlot, const Type*); -Operand GLOBAL_(ObjSymId, i64 addend, const Type*); -Operand IND (Reg base, i32 ofs, const Type*); -``` - -A primary cg-driven case is then ~10 lines: - -```c -/* test/cg/cases/c01_add_const.c — int test_main(void) { return 1 + 2; } */ -#include "cg_test.h" - -static void build(CgTestCtx* ctx) { - CgTestFn* tf = cgtest_begin_main(ctx, T_I32); - CG* g = cgtest_cg(ctx); - cg_push_int(g, 1, T_I32); - cg_push_int(g, 2, T_I32); - cg_binop(g, BO_IADD); - cgtest_ret_value(tf); - cgtest_end(tf); -} - -CG_CASE_REGISTER(c01_add_const, build, /*expected=*/3); -``` - -A CGTarget-direct unit anchor: - -```c -/* test/cg/unit/cgt_load_imm.c */ -#include "cg_test.h" - -static void build(CgTestCtx* ctx) { - CgTestFn* tf = cgtest_begin_main(ctx, T_I32); - CGTarget* a = cgtest_target(ctx); - Reg r = a->alloc_reg(a, RC_INT, T_I32); - a->load_imm(a, REG_OF(r, T_I32), 0xdeadbeefLL); - /* hand-build CGABIValue for ret */ - cgtest_ret_reg(tf, r, T_I32); - cgtest_end(tf); -} - -CG_CASE_REGISTER(cgt_load_imm, build, /*expected=*/(int)0xdeadbeef); -``` - -`CG_CASE_REGISTER` is a macro that appends to the registry via a constructor -section or a manual table in `cases.c` (depending on freestanding constraints). -Since cfree itself is freestanding-only but the test runner is hosted, we can -use `__attribute__((constructor))` here; if that's undesirable we maintain -`cases.c` by hand as a list. - -## 6. CORPUS coverage groups - -Each group has a one-line description here; CORPUS.md will expand into -explicit cases. **Initial landing covers groups A and C only** — that's -the spine that proves the harness works end-to-end. 
- -| Group | Surface | Examples | -|---|---|---| -| **A** Lifecycle / return | `func_begin/end`, `ret`, return widths, sret | const return; void return; multiple returns; i8/i16/i32/i64/struct return | -| **B** Frame slots / params | `frame_slot`, `param`, address-taken, byval, sret-param | sum two int params; spill 9 params; address-taken local; struct byval | -| **C** Arithmetic | `binop` (all), `unop` | IADD/ISUB/IMUL; SDIV/UDIV/SREM/UREM; AND/OR/XOR; SHL/SHR_S/SHR_U; FADD..FDIV | -| **D** Compare / branch | `cmp`, `cmp_branch`, `scope_*`, fusion | cmp materialize 0/1; cmp_branch eq/ne/lt; structured if; structured if-else | -| **E** Convert | `convert` (all `ConvKind`) | sext/zext/trunc; itof/ftoi; fext/ftrunc; bitcast | -| **F** Memory | `load`/`store`/`addr_of`/`copy_bytes`/`set_bytes`/bitfield | load/store i8..i64; global rw; indirect rw; struct copy; memset zero; bitfield_load/store; volatile | -| **G** Calls | `call`, `ret`, ABI | direct + indirect; sret; byval arg; mixed gp+fp; >reg-count args; varargs call | -| **H** Control flow | `label`/`jump`/`scope_*`/`break_to`/`continue_to` | flat if; while; for-with-continue lands on inc; do/while; nested loops; goto | -| **I** alloca | `alloca_` | __builtin_alloca round-trip | -| **J** Variadics | `va_start_/va_arg_/va_end_/va_copy_` | walk va_list with int args; va_copy | -| **K** Atomics | `atomic_load/store/rmw/cas/fence`, `MemOrder` | per op × per order matrix (sampled) | -| **L** Intrinsics | `intrinsic` × `IntrinKind` | popcount/ctz/clz; bswap; expect (no-op); add_overflow multi-result | -| **M** Inline asm | `asm_block` | "r"/"=r" round-trip; "memory" clobber forces flush | -| **N** TLS | `tls_addr_of` | thread_local read/write | -| **O** Sections / globals | `frame_slot` + `DeclTable` | .data init; .bss tentative; .rodata constant pool; -ffunction-sections | -| **P** set_loc / debug | `set_loc`, MCEmitter line program | line numbers reach .debug_line (oracle TBD) | -| **Q** Multi-function (1 TU) | 
multiple func_begin/end | fn calls fn; mutual recursion; forward-declared callee | -| **R** opt-wrapped equivalence | `opt_cgtarget` | every other group's exit codes preserved through opt | - -Group order roughly tracks priority. A is the prerequisite for every -other group; B + C + D are needed for almost any non-trivial case; -F + G + H bring us to "real C". K, M, N, P, R can land later without -blocking the rest. - -## 7. Initial landing — Spine (A + C) - -Concrete first cases. Each is `int test_main(void) { return ...; }`. - -| Case | Body | expected | -|---|---|---| -| `a01_return_const` | `return 42;` | 42 | -| `a02_return_zero` | `return 0;` | 0 | -| `a03_return_neg` | `return -7;` | -7 (i.e. 249 mod 256 from runner) | -| `a04_return_i8` | `int test_main(void){return (i8)200;}` exercising widening | as expected | -| `c01_add` | `return 1 + 2;` | 3 | -| `c02_sub_mul` | `return 7 * 3 - 4;` | 17 | -| `c03_div_mod` | `return 23 / 4 + 23 % 4;` | 8 | -| `c04_bitwise` | `return (~3) & 0xff;` | 252 | -| `c05_shift` | `return (1 << 5) | (16 >> 1);` | 40 | - -Plus unit anchors: - -| Case | Layer | Notes | -|---|---|---| -| `mc_smoke` | MCEmitter | one-insn `test_main`; analogue of `test/elf/unit/smoke.c` | -| `cgt_load_imm_ret` | CGTarget | `alloc_reg` + `load_imm` + `ret` only, no `cg` | -| `cgt_binop_iadd` | CGTarget | `binop(BO_IADD, REG, IMM)` only, no `cg` | - -These exit-code oracles are enough to drive AArch64 backend bring-up -through return + integer arith. The R/E/J/D paths from §3 give us -ELF-roundtrip, link-exe + qemu, JIT, and direct-JIT coverage for free. - -## 8. Build integration - -Add a `test-cg` target in `test/test.mk`: - -```make -test-cg: lib bin-soft - bash test/cg/run.sh -``` - -Wire into the `test` aggregate. `cg-runner` builds against `libcfree.a` -the same way `link-exe-runner` and `jit-runner` do -(`$(CC) $(DRIVER_CFLAGS) ... $(LIB_AR)`). 
- -`run.sh` walks `cg-runner --list`, then per case runs paths D, R, E, J -(and later O), reporting one PASS/FAIL/SKIP line per (case, path) pair — -identical to `test/link/run.sh`. Skip-vs-fail is governed by -`CFREE_TEST_ALLOW_SKIP`, matching the existing convention. - -## 9. Non-goals (this strategy) - -- **Encoding/disassembly snapshots.** Deferred. Behavioral exit codes - only; we'll add encoding lock-ins for specific instruction-selection - guarantees later, surgically. -- **Negative tests.** Not in this harness. CG/CGTarget contract - violations are covered by assertions inside the implementations and by - parser-level diagnostics tests once the parser exists. -- **C-source-level tests.** Those arrive once the parser is real; they - share the R/E/J paths but bypass the cg-runner fixture API. The - fixtures here continue to exist as bisection anchors. -- **Cross-arch.** AArch64 only for now. The harness is per-arch (cases - may be reused if encodings differ but ABI does not); a future x86_64 - pass duplicates the runner with target = x86_64 and the same case set - modulo arch-specific overrides. diff --git a/doc/linker-status.md b/doc/linker-status.md @@ -1,238 +0,0 @@ -# Linker / JIT / ELF status - -Tracks the three behavioral harnesses that share the link + obj surface: - -- `make test-elf` — ELF object-file fidelity (read / write / roundtrip). -- `make test-link` — link + JIT (R/E/J paths per case). -- `make test-cg` — codegen + JIT (D/R/E/J paths per case). - -`test-elf` is **strictly object-file fidelity**. Linker and exe behavior -live in `test/link/` — they are not duplicated in `test/elf/`. - ---- - -## Current results - -| Harness | Pass | Fail | Notes | -|--------------------------|-----:|-----:|--------------------------------------| -| `test-elf` | 37 | 0 | All Layer A/B/C green | -| `test-link` R (aa64) | 38 | 0 | object roundtrip via cfree-roundtrip | -| `test-link` E (aa64) | 37 | 0 | qemu/podman aarch64 exec, incl. 
IFUNC | -| `test-link` J (aa64) | 38 | 0 | JIT in-process incl. GC subgroup, IFUNC, TLS | -| `test-link` R (rv64) | 38 | 0 | object roundtrip via cfree-roundtrip | -| `test-link` R (aa64-macho) | 36 | 0 | Mach-O object roundtrip via cfree-roundtrip-macho (3 cases SKIP-NA: ELF-only) | -| `test-link` E (rv64) | 38 | 0 | qemu/podman riscv64 exec, incl. IFUNC + TLS | -| `test-link` bad | 2 | 0 | `bad/30_undef_strong` (E + J) | -| `test-musl` | 6 | 0 | musl 1.2.5 static + dynamic: syscall, errno, printf | -| `test-glibc` | 3 | 0 | glibc 2.36 dynamic: syscall, errno, printf | - -(R = roundtrip; E = link → aarch64 ELF → qemu/podman; J = JIT in-process.) - -`test-musl` links real C against pinned musl 1.2.5 in two variants: -**static** (libc.a + cfree's own `rt/build/aarch64-linux/libcfree_rt.a`, -classic ET_EXEC) and **dynamic** (libc.so + Scrt1.o, ET_DYN PIE with -PT_INTERP / PT_DYNAMIC / PLT / .got.plt and BIND_NOW resolution -against the runtime loader). Both variants run the result under -qemu/podman. Sysroots are produced by `test/libc/musl/Containerfile` -(Alpine 3.20 + musl 1.2.5-r3) and `test/libc/glibc/Containerfile` -(Debian bookworm + glibc 2.36); cases are shared under -`test/libc/cases/`. Excluded from the default `make test` because -they need podman. - ---- - -## What works today - -`cfree ld` links real aarch64-linux executables in both **static** -ET_EXEC and **dynamic** ET_DYN PIE shapes, including against real -musl libc.a / libc.so + cfree's own `libcfree_rt.a`. `printf("hello, -musl")` works end-to-end against the runtime loader -(`/lib/ld-musl-aarch64.so.1`). Beyond that: - -- **Reloc kinds applied (AArch64):** ABS{16,32,64}, PREL{16}, REL32, - PC32, CONDBR19, TSTBR14, LD_PREL_LO19, ADR_PREL_LO21, JUMP26 / - CALL26, ADR_PREL_PG_HI21{,_NC}, ADD_ABS_LO12_NC, - LDST{8,16,32,64,128}_ABS_LO12_NC, - ADR_GOT_PAGE / LD64_GOT_LO12_NC, - TLSLE_ADD_TPREL_{HI12,LO12_NC}. Plus a synthetic R_ABS64 emitter - for GOT slot fill. 
**Reads every reloc kind in musl 1.2.5 aarch64 - libc.a.** Dynamic emit pass also produces R_AARCH64_RELATIVE, - R_AARCH64_GLOB_DAT, and R_AARCH64_JUMP_SLOT records (.rela.dyn / - .rela.plt) for the runtime loader. -- **Reloc kinds applied (RISC-V LP64):** ABS{32,64}, PC32, HI20, - LO12_{I,S}, BRANCH, JAL, CALL / CALL_PLT (auipc+jalr pair), - RVC_BRANCH, RVC_JUMP, TPREL_HI20, TPREL_LO12_{I,S}. Marker relocs - (RELAX, TPREL_ADD) are accepted as no-ops; cfree does not relax. - PCREL_HI20 / PCREL_LO12_{I,S} and GOT_HI20 are recognized in widths - but not yet exercised by the test corpus — slot-PC pairing is - follow-up work. -- **Symbol resolution:** STB_GLOBAL/WEAK/LOCAL replacement strength; - STV_HIDDEN; SHN_COMMON coalesce-to-largest; STT_FILE / STT_SECTION - pass-through. Weak archive defs satisfy unresolved refs (matches - GNU ld / lld; required for musl's weak `__init_tls`). -- **Linker-synthesized symbols:** `__init_array_start/end`, - `__fini_array_start/end`, `__tdata_start/end` (vaddrs of the .tdata - template), `__tbss_size` (SK_ABS holding the .tbss byte count), and - general `__start_<X>`/`__stop_<X>` for any encoding section. -- **Section / segment layout:** four-bucket RX / R / RW / TLS - partition, BSS, init/fini/preinit_array, synthetic `.got`. - **Same-named input sections merge by first-occurrence** — required - for `_init`/`_fini` to be contiguous when `.init` / `.fini` come - from crti.o + crtn.o. `-ffunction-sections` / `-fdata-sections` - flow through naturally. -- **TLS local-exec (AArch64 + RV64):** - `R_AARCH64_TLSLE_ADD_TPREL_{HI12, LO12_NC}` and - `R_RISCV_TPREL_{HI20,LO12_I,LO12_S}` apply against the per-image - TLS span; .tdata/.tbss - sections (SHF_TLS) layout into a dedicated SEG_TLS segment with - natural alignment preserved on PT_TLS (separate from the - containing PT_LOAD's page align). 
The exe writer emits both the - PT_LOAD (so the kernel maps the .tdata template) and a PT_TLS - pointing at it; the AArch64 ABI's 16-byte TCB offset is folded - into S at apply time. The freestanding `_start` (and `jit-runner`) - build the per-thread block — TCB(16) | .tdata copy | .tbss zero — - using the synthesized boundary symbols and `msr TPIDR_EL0`. On - Darwin libc routinely clobbers TPIDR_EL0, so the harness keeps - msr → blr back-to-back with no libc calls between. -- **Inputs:** loose `.o`, `.a` (demand + `--whole-archive`), - `--start-group` / `--end-group` cyclic resolution. -- **GC:** `--gc-sections` at section granularity. Roots: entry sym, - init/fini/preinit_array, `SF_RETAIN` (`SHF_GNU_RETAIN`), - `__start_/__stop_` referents. Edges follow per-section relocs to - fixed point. -- **IFUNC trampoline (JIT and ELF):** every defined `STT_GNU_IFUNC` - symbol gets a 12-byte stub in a synthetic `.iplt` (RX) section - and an 8-byte slot in `.igot.plt` (RW). AArch64 stub is - `adrp x16, slot ; ldr x16,[x16,:lo12:slot] ; br x16`; RV64 stub - is `auipc t1, hi ; ld t1, lo(t1) ; jalr x0, t1`. The RV64 stub's - PC-rel displacement to its slot is invariant under the segment - shift, so the bytes are pre-encoded at layout time without - apply-time relocs. The IFUNC's vaddr is redirected - to the stub, and cross-TU undef refs to the same name are - re-pointed at the stub via a propagation pass at the tail of - `layout_iplt`. JIT load calls each resolver in-process after - applying relocs and writes the chosen implementation pointer - into the slot. ELF emit also materializes a parallel - `.iplt.pairs` data section (alternating `(resolver_ptr, slot_ptr)` - u64s, filled via `R_ABS64`) plus boundary symbols - `__start_iplt_pairs` / `__stop_iplt_pairs`, and synthesizes a - one-entry `.preinit_array` referencing - `__cfree_ifunc_init` (provided by `libcfree_rt.a`). Preinit - runs strictly before any `.init_array` ctor, so user ctors - that call IFUNCs see filled slots. 
The rt member is pulled - via demand-load: `link_ingest_archives` seeds - `__cfree_ifunc_init` into the archive wanted set whenever an - input defines an IFUNC and `link_set_emit_static_exe` was set - (which `cfree_link_exe` does and `cfree_link_jit` does not). -- **Format fidelity:** ELF read+write byte-stable for the supported - subset; `EI_OSABI=GNU` flips automatically when GNU extensions are - present. -- **Exe section + symbol tables:** the static ET_EXEC writer emits - `.symtab` / `.strtab` / `.shstrtab` and a section header table. - Defined symbols carry final absolute addresses (IMAGE_BASE + image - vaddr); SK_FILE / SK_ABS / SK_COMMON map to SHN_ABS / SHN_COMMON; - per-input undef-vs-canonical-def shadow records are deduped via - `img->globals`. Per-name input sections survive into the output as - one `(segment, name)` shdr — `.text`, `.rodata`, `.data`, `.bss`, - `.init_array`, `.fini_array`, `.eh_frame`, `.got`, etc., named - per their input. `nm`, `objdump -t`, `readelf -s` all work. -- **Build-id:** an allocatable `.note.gnu.build-id` with a 16-byte - digest goes into the headers PT_LOAD; a PT_NOTE phdr makes it - discoverable via `dl_iterate_phdr`. The digest is FNV-1a 64 over - each segment with two seeds, mixed into 128 bits — deterministic - given the post-relocation segment bytes. -- **`.eh_frame` flow-through:** input `.eh_frame` survives into the - output with a properly named PROGBITS+ALLOC shdr at its final - vaddr. Sufficient for `backtrace()` past the innermost frame on - toolchains that scan `.eh_frame` linearly; fast lookup via - `.eh_frame_hdr` + PT_GNU_EH_FRAME is still TODO (binary search - index over FDEs). -- **Dynamic linking against `.so` deps:** `cfree ld -pie -o out - Scrt1.o crti.o user.o libc.so libcfree_rt.a crtn.o` produces an - ET_DYN PIE that runs against the musl runtime loader. 
The driver - parses `-dynamic-linker`, recognizes `.so` / `.so.N` positional - inputs, and routes `-l<name>` under `-Bdynamic` to `lib<name>.so` - before `lib<name>.a`. The link image carries a synthetic - `.interp` / `.dynsym` / `.dynstr` / `.gnu.hash` / - `.rela.dyn` / `.rela.plt` / `.plt` / `.got.plt` / `.dynamic` - layout, with PT_PHDR / PT_INTERP / PT_DYNAMIC / PT_GNU_STACK - phdrs, DT_NEEDED per consumed DSO soname, and `DF_1_NOW` - (BIND_NOW eager binding). PLT0 + per-import 16-byte stubs are - emitted; CALL26 / JUMP26 against an imported symbol is rewritten - to its PLT entry, and abs / GOT-slot references against imports - emit `R_AARCH64_GLOB_DAT` so the loader patches the resolved - runtime address before user code runs. PIE internal abs64 - fixups emit `R_AARCH64_RELATIVE`. -- **Driver:** `cfree ld -static -o out crt1.o crti.o user.o libc.a - libcfree_rt.a crtn.o` works. Output is chmod 0755 on success. -- **JIT path** runs the same resolved image in-process; MAP_JIT on - Apple Silicon. - ---- - -## Gaps before this can replace GNU ld / lld - -Each row below would break a typical real-world Linux invocation. Roughly -ordered by how often the gap actually bites. - -| Gap | What breaks | Effort | -|-----|-------------|--------| -| **`.eh_frame_hdr` + PT_GNU_EH_FRAME** | `.eh_frame` already flows through with a proper shdr; without `.eh_frame_hdr` libgcc/libunwind fall back to linear FDE scan, and `dl_iterate_phdr` consumers (most modern unwinders) skip the section entirely. Needs FDE parsing + sorted binary-search table emission. | medium | -| **`.debug_*` in the exe** | No DWARF → `gdb` blind on source lines. cfree's debug pipeline ends at the obj boundary; the linker drops non-`SF_ALLOC` sections. | medium | -| **TLSGD / TLSIE / TLSLD relocs** | Read but not applied. Needed for `-fpic` TLS or shared-lib TLS. Local-exec works; the dyn-link cut leaves GD/IE/LD as Phase 8. 
| medium | -| **`cfree_link_shared` (`-shared` ET_DYN libs)** | Driver and inputs are wired (DSO read, dyn tables) but `cfree_link_shared` still panics with "not yet implemented". The parallel path through `link_exe` would only need `output_kind = SHARED`, `allow_undefined = 1`, no entry sym, DT_SONAME, exports promoted into dynsym. | small (after Phase 5) | -| **`--export-dynamic` exports in dynsym** | Imports are in `.dynsym`; internal exports the consumer wants visible to dependents (e.g. dlopen plugins, callbacks the loader resolves) are not yet promoted. Not exercised by the musl harness. | small | -| **Linker scripts** | `link_set_script` panics with "not yet implemented". Parser exists in `cfree_link_script_parse` but isn't wired into `link_resolve`. | medium | -| **COMDAT-group atomicity in `--gc-sections`** | C++ inline / weak-template instantiations under `SHF_GROUP` could lose group members. C-only inputs don't exercise it. | small | -| **`crt1.o`/`crti.o`/`crtn.o` auto-link** | Driver doesn't auto-include a C runtime; the user passes `crt1.o crti.o ... crtn.o` explicitly. Cosmetic. | small (driver-only) | - -**Bottom line:** for aarch64-linux executables — both static ET_EXEC -and dynamic ET_DYN PIE against real musl — `cfree ld` is a working -linker. STT_GNU_IFUNC in ELF output (rt-driven preinit) and BIND_NOW -dynamic linking against `.so` deps both pass end-to-end. The next -priorities, roughly in order: - -1. **`.eh_frame_hdr` + PT_GNU_EH_FRAME** — `.eh_frame` already flows - through; building the binary-search index over FDEs unblocks fast - unwind and `dl_iterate_phdr`-driven consumers (modern libunwind, - libgcc's `_Unwind_Find_FDE`). -2. **`.debug_*` in the exe** — DWARF flow-through; the linker - currently drops non-`SF_ALLOC` sections at `section_kept`. -3. **`cfree_link_shared`** — the dyn-table machinery is reusable - from the exe path; producing `libfoo.so` is mostly a dispatch - wrapper plus exports-into-dynsym. 
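As a rough illustration of priority 1, the "binary-search index over FDEs" amounts to a table of (initial location, FDE address) pairs sorted by initial location. The sketch below models that table in memory with absolute 64-bit values. `FdeEntry`, `build_fde_index`, and `find_fde` are hypothetical names, and the real `.eh_frame_hdr` section stores encoded pointers behind a small header rather than raw `uint64_t` pairs, so this shows only the sort-and-lookup shape, not the on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct FdeEntry {
    uint64_t initial_loc; /* first PC covered by the FDE */
    uint64_t fde_addr;    /* where the FDE itself lives */
} FdeEntry;

static int cmp_fde(const void *a, const void *b) {
    const FdeEntry *x = a, *y = b;
    return (x->initial_loc > y->initial_loc) -
           (x->initial_loc < y->initial_loc);
}

/* Linker side: sort the parsed FDEs by initial location. */
static void build_fde_index(FdeEntry *tab, size_t n) {
    qsort(tab, n, sizeof tab[0], cmp_fde);
}

/* Unwinder side: last entry whose initial_loc <= pc, or NULL if pc is
 * below every entry. The caller still has to check the FDE's own PC
 * range, since this only finds the candidate. */
static const FdeEntry *find_fde(const FdeEntry *tab, size_t n, uint64_t pc) {
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].initial_loc <= pc)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo == 0 ? NULL : &tab[lo - 1];
}
```

The lookup is O(log n) in the number of FDEs, compared with the linear scan that unwinders fall back to when only `.eh_frame` is present.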
-
-TLS GD / IE / LD modes remain Phase 8 work; lazy-binding (no
-`DF_1_NOW`) is a follow-up that needs a real `_dl_runtime_resolve`
-PLT0 — eager binding is fine for v1.
-
-The IFUNC iplt stub bytes (`0x90000010 / 0xf9400210 / 0xd61f0200`)
-are still hand-encoded inline in `layout_iplt`; moving them behind
-a per-arch `LinkArch.emit_iplt_stub` hook in `src/arch/<arch>.c`
-is bounded follow-up work — useful when a second arch lands but
-not load-bearing today.
-
----
-
-## test-link harness — speed and ergonomics
-
-`test/link/run.sh` accepts:
-
-```
-./run.sh [name_filter] [paths] # or CFREE_TEST_FILTER / CFREE_TEST_PATHS
-```
-
-`name_filter` is a substring against case dir names (e.g. `02`,
-`weak`); `paths` is any subset of `REJ` (default `REJ`). PASS/FAIL
-lines carry per-case ms timings; a totals line prints per-path wall
-time.
-
----
-
-## Build hygiene (still load-bearing)
-
-- `Makefile` uses `-MMD -MP` so header edits force dependents to rebuild.
-
-If a test result looks impossible given the source, suspect staleness
-first (`make clean && make lib && make test-link`). If that then works,
-investigate the source of staleness and fix the Makefile.
diff --git a/doc/rv64-status.md b/doc/rv64-status.md
@@ -1,140 +0,0 @@
-# RV64 codegen status
-
-Living checklist for the RISC-V (RV64IMFD, LP64D) backend (`src/arch/rv64.c`)
-and ABI (`src/abi/abi_rv64.c`). Behavioral oracles are `test/cg/` and
-`test/parse/`. Phase status:
-
-- ✅ landed
-- 🚧 in progress
-- ⬜ planned
-
----
-
-## Current test-cg results
-
-Run from an aarch64 host; D / J are skipped because they need a native
-rv64 host (same pattern as x64 on aa64). R verifies emit→reader fidelity,
-E links and runs under qemu-riscv64 via podman.
-
-| Path                       | Pass | Fail | Skip |
-|----------------------------|-----:|-----:|-----:|
-| R (roundtrip)              |  386 |    0 |    0 |
-| E (qemu exec)              |  387 |    0 |    ~ |
-| D / J (native JIT)         |    0 |    0 |  772 |
-
-Skips are valid: D and J require host == rv64. With
-`CFREE_TEST_ALLOW_SKIP=1`, the suite reports **773 pass, 0 fail, 768 skip**.
-
----
-
-## Phase 1 — Backend foundation ✅
-
-- ✅ `rv64_isa.h` — R/I/S/B/U/J encoders, FP (F+D), atomic (A), FCVT, FMV
-- ✅ Register pools (s2..s11 int, fs2..fs11 fp); scratches t0..t3, ft0
-- ✅ Frame layout: locals at s0 - off, callee saves below locals,
- saved-s0/ra at the top, outgoing args at sp+0
-- ✅ Prologue placeholder + func_end patch (mirrors aa64's pattern)
-- ✅ Epilogue restores sp from s0 when alloca was used
-- ✅ ABI: replaced indirect-everything stub with real LP64D scalar +
- small-aggregate classification (≤16B → up to 2 INT parts)
-- ✅ `mc.c` apply_fixup handles R_RV_BRANCH and R_RV_JAL
-
-## Phase 2 — Core ops ✅
-
-- ✅ load_imm: ADDI / LUI+ADDIW / multi-step 64-bit
-- ✅ copy, load, store, addr_of (LOCAL / INDIRECT / GLOBAL via AUIPC+PCREL)
-- ✅ binop (all int + FP add/sub/mul/div), unop, cmp, cmp_branch
-- ✅ convert: SEXT/ZEXT/TRUNC/ITOF/FTOI/FEXT/FTRUNC/BITCAST
-- ✅ Structured scopes: IF / LOOP / BLOCK
-- ✅ Calls (direct AUIPC+JALR with R_RV_CALL; indirect JALR)
-- ✅ Sret returns (caller passes dst pointer in a0; callee spills to slot)
-- ✅ alloca (const + runtime size, max_outgoing patch site)
-- ✅ Aggregate copy_bytes / set_bytes / bitfield_load / bitfield_store
-- ✅ Atomics: load/store (with LR/fence sequences), AMO via LR/SC, CAS
-- ✅ Intrinsics: memcpy/memmove/memset, popcount, ctz, clz, bswap16/32/64,
- add/sub/mul_overflow, expect, assume_aligned, prefetch, trap
-
-## Phase 3 — Variadic LP64D ✅
-
-Variadic-args calling convention with **save area contiguous with caller's
-stack args** so a single `void*` walk works for any number of args.
-
-- ✅ va_list = `void*`; va_start / va_arg / va_end / va_copy
-- ✅ va_arg handles RC_INT and RC_FP (bitcast via FMV.X.{W,D})
-- ✅ Variadic FP **args being passed** are bitcast into integer regs
- (RC_FP storage → FMV.X.{W,D} → a-reg)
-- ✅ Save area sits at the very top of the frame, above the saved-s0/ra
- pair, so [save_area, save_area+64, caller's stack] is one contiguous
- byte stream — `save_area[8]` coincides with the caller's first stack arg
-- ✅ Prologue spills only a_{next_param_int}..a7 (named int params already
- landed in their own slots; sret consumes a0 when present)
-- ✅ Stack-arg reads in `rv_param` use `caller_stack_base = 16 + 64` for
- variadic functions to skip past the save area
-
-## Phase 4 — TLS LE ✅
-
-Local-Exec model. The LUI+ADD+ADDI / TPREL_HI20+LO12 sequence and
-`__tbss_size` / TLS-image layout were already correct; the remaining
-failures (`n02_tls_store_le`, `n07_tls_bss_zero_init`) were rejected by
-qemu-riscv64-static before reaching code with the diagnostic
-"PT_LOAD with non-writable bss". Cases with no `.tdata` (only `.tbss`)
-produced an SF_TLS PT_LOAD whose `p_memsz` extended past `p_filesz=0`
-into the `.tbss` span without PF_W.
-
-- ✅ ELF PT_LOAD over an SF_TLS segment now keeps `p_memsz == p_filesz`;
- the `.tbss` extension is described exclusively by PT_TLS, which the
- loader uses to size each thread's TLS block (see `src/link/link_elf.c`
- segment phdr loop). Matches what GNU ld emits — `.tbss` consumes no
- PT_LOAD memory.
-
-## Phase 5 — test-parse on rv64 ⬜
-
-`test-parse` is the file-driven C-parser harness (`test/parse/`); it
-reuses the cg roundtrip/exec runners and exercises the parser through
-the same R/E/J/W paths. Today it runs aa64-only.
-
-- ⬜ Verify the parse runner picks up CFREE_TEST_ARCH=rv64 (likely needs
- no changes — it already uses cfree_test_target_init)
-- ⬜ Run `CFREE_TEST_ARCH=rv64 make test-parse` and triage failures
-- ⬜ Decide on opt-out filtering for arch-specific parse cases (asm
- templates, target-specific builtins). Pattern follows the per-case
- `arches` mask added to test/cg
-- ⬜ Land a phased rv64 entry in `test/parse/CORPUS.md` mirroring this doc
-
-## Phase 6 — Beyond v1 ⬜
-
-- ⬜ `mc_smoke` rv64 sibling (hand-crafted bytes that return 42)
-- ⬜ Compressed (RVC) emission when output is denser
-- ⬜ M-extension overflow detection for `mul_overflow` i64 (today panics)
-- ⬜ Zbb fast paths for popcount/ctz/clz/bswap when ABI permits
-- ⬜ Inline asm (`rv_asm_block` panics today)
-- ⬜ Real address-out-of-imm12 expansion in addr_base (panics on giant
- frames; the test corpus stays well within the imm12 window)
-
----
-
-## Known mismatches with aa64 conventions
-
-- Reg pools are 10 wide (s2..s11) vs. aa64's 10 (x19..x28). s1 (and
- fs0/fs1) are reserved/unused.
-- AUIPC+PCREL_HI20/LO12 anchor symbols (`.LpcrelHi<N>`) are emitted as
- SB_LOCAL into the current section; cfree-ld looks them up by AUIPC
- vaddr (see `src/link/link_elf.c:rv_pcrel_lo12_disp`).
-- The aarch64 backend pairs saved fp+lr via STP; rv64 has no pair-store
- so prologue is two SDs instead.
-
----
-
-## Reproduce
-
-```sh
-# rv64 cg (R-only avoids needing podman):
-make lib
-CFREE_TEST_ARCH=rv64 bash test/cg/run.sh '' R
-
-# rv64 cg (all paths, qemu-riscv64 + podman required):
-CFREE_TEST_ARCH=rv64 CFREE_TEST_ALLOW_SKIP=1 bash test/cg/run.sh '' DREJW
-
-# rv64 parse (planned):
-CFREE_TEST_ARCH=rv64 make test-parse
-```
diff --git a/src/abi/abi.c b/src/abi/abi.c
@@ -287,8 +287,7 @@ const Type* abi_va_list_type(TargetABI* a, Pool* p) {
  * ARM64 / x86_64 ABIs differ from AAPCS64 / SysV-x64 in calling
  * convention details (variadic-on-stack, va_list shape, stack-arg
  * promotion). Microsoft x64 / Windows-on-ARM64 will land here as
- * additional (arch, OS_WINDOWS) cases when COFF support arrives.
- * See doc/MULTIOBJ.md §3.4. */
+ * additional (arch, OS_WINDOWS) cases when COFF support arrives. */
 static const ABIVtable* select_vtable(Compiler* c) {
   switch (c->target.arch) {
     case CFREE_ARCH_ARM_64:
diff --git a/src/abi/abi_apple_arm64.c b/src/abi/abi_apple_arm64.c
@@ -1,7 +1,7 @@
 /* Apple ARM64 (Darwin) ABI dispatch.
  *
- * Phase 2 of doc/MULTIOBJ.md. Vtable selection keys on
- * (target.arch, target.os); (ARM_64, MACOS) lands here instead of
+ * Vtable selection keys on (target.arch, target.os);
+ * (ARM_64, MACOS) lands here instead of
  * AAPCS64. The two ABIs diverge in:
  *
  * 1. va_list shape — Apple ARM64 `__builtin_va_list` is plain
diff --git a/src/abi/abi_internal.h b/src/abi/abi_internal.h
@@ -22,7 +22,7 @@ extern const ABIVtable aapcs64_vtable;
 extern const ABIVtable sysv_x64_vtable;
 extern const ABIVtable rv64_vtable;
 /* Apple Darwin variants — selected when (arch, os) matches. See
- * abi.c::select_vtable and doc/MULTIOBJ.md §3.4. */
+ * abi.c::select_vtable. */
 extern const ABIVtable apple_arm64_vtable;

 /* Shared TargetABI internals. The struct definition is here so each ABI
diff --git a/src/arch/aarch64.c b/src/arch/aarch64.c
@@ -1939,8 +1939,7 @@ static void emit_arg_value(CGTarget* t, const CGABIValue* av, u32* next_int,
    *
    * Apple ARM64 (Darwin) diverges: variadic args go on the stack only.
    * Detect the synthesized-vararg case and bump the next-int / next-fp
-   * cursors past the register pool so the part below routes to stack.
-   * See doc/MULTIOBJ.md §3.4. */
+   * cursors past the register pool so the part below routes to stack. */
   ABIArgInfo va_ai;
   ABIArgPart va_pt;
   const ABIArgInfo* ai = av->abi;
diff --git a/src/arch/x64.c b/src/arch/x64.c
@@ -2,7 +2,7 @@
  *
  * Phase-2 placeholder: the vtable is wired up but every method panics.
  * This proves the cgtarget_new dispatch reaches an x64-shaped target.
- * Phase 3 fills in real codegen — see doc/MULTIARCH.md §4. */
+ * Phase 3 fills in real codegen. */

 #include <string.h>

diff --git a/src/cg/cg.c b/src/cg/cg.c
@@ -11,7 +11,7 @@
  * - OPK_LOCAL / OPK_GLOBAL / OPK_INDIRECT are lvalues. cg_load promotes
  *   them to OPK_REG via target->load + a fresh scratch register.
  *
- * Register pressure & spill (see doc/REGALLOC.md):
+ * Register pressure & spill:
  * - Each SValue carries an SResidency tag (INHERENT / REG / SPILLED).
  *   REG-residing SValues own a physical scratch register that must be
  *   released back to the pool when the value is consumed; SPILLED
@@ -57,7 +57,7 @@
  * (IMM, LOCAL, GLOBAL) carry no register obligation. REG values own a
  * physical scratch register. SPILLED values had their register evicted
  * to a frame slot under register pressure and must be reloaded before
- * consumption. See doc/REGALLOC.md §3. */
+ * consumption. */
 typedef enum SResidency {
   RES_INHERENT,
   RES_REG,
@@ -102,8 +102,7 @@ struct CG {
   /* Per-function spill-slot free-lists, one per RegClass. A spill takes a
    * slot from the free-list (allocating fresh from the backend if empty);
    * a reload returns the slot. Frame footprint is bounded by the peak
-   * concurrent spills per class. Reset at func_end. See doc/REGALLOC.md
-   * §3.2. */
+   * concurrent spills per class. Reset at func_end. */
   struct {
     FrameSlot* free;
     u32 n;
@@ -287,7 +286,7 @@ static Operand op_indirect(Reg base, i32 ofs, const Type* ty) {
 }

 /* ============================================================
- * Register-pool & spill driver — see doc/REGALLOC.md
+ * Register-pool & spill driver
  * ============================================================ */

 /* Class an SValue's register lives in: RC_FP for float types, RC_INT for
diff --git a/src/link/link_image_id.c b/src/link/link_image_id.c
@@ -12,7 +12,7 @@
  * - COFF/PE debug directory (deferred)
  *
  * Lived in link_elf.c through Phase 0; lifted out so the Mach-O writer
- * sees the same bytes (doc/MULTIOBJ.md §3.5). */
+ * sees the same bytes. */

 #include "core/core.h"
 #include "link/link_internal.h"
diff --git a/src/link/link_internal.h b/src/link/link_internal.h
@@ -313,8 +313,7 @@ void link_reloc_apply(Compiler*, RelocKind, u8* P_bytes, u64 S, i64 A, u64 P);
 /* Public link_emit_image_writer dispatches by Compiler.target.obj. The
  * ELF implementation lives in link_elf.c and dispatches internally on
  * Compiler.target.arch for e_machine and reloc translation. The Mach-O
- * peer (link_macho.c) and COFF peer arrive in later phases of
- * doc/MULTIOBJ.md. */
+ * peer (link_macho.c) and COFF peer arrive in later phases. */
 void link_emit_elf(LinkImage*, Writer*);
 void link_emit_macho(LinkImage*, Writer*);

diff --git a/src/obj/macho.h b/src/obj/macho.h
@@ -1,6 +1,6 @@
 /* Mach-O wire-format constants, structs, and per-arch reloc translators
  * shared between obj/macho_emit.c, obj/macho_read.c, and link/link_macho.c
- * (none of which exist yet — see doc/MULTIOBJ.md).
+ * (none of which exist yet).
  *
  * Private to src/. The public ObjBuilder/Linker surface is format-neutral
  * (obj/obj.h, link/link.h); the Mach-O spelling of those abstractions only
@@ -252,7 +252,7 @@ typedef struct MachRelocInfo {
 /* Map cfree-canonical RelocKind <-> arm64 Mach-O reloc type. Returns
  * (u32)-1 on unsupported kinds; the caller (emit_macho / read_macho)
  * panics with a diagnostic. Stubs in macho_reloc_aarch64.c until the
- * Phase 2 writer lands (see doc/MULTIOBJ.md). */
+ * Phase 2 writer lands. */
 u32 macho_aarch64_reloc_to(u32 kind /* RelocKind */);
 u32 macho_aarch64_reloc_pcrel(u32 kind /* RelocKind */);
 u32 macho_aarch64_reloc_length(u32 kind /* RelocKind */);
diff --git a/src/obj/macho_emit.c b/src/obj/macho_emit.c
@@ -15,7 +15,7 @@
  *
  * 64-bit little-endian only. Big-endian / 32-bit panics at entry.
  *
- * See doc/MULTIOBJ.md §3.1 for the round-trip invariant: read_macho of
+ * Round-trip invariant: read_macho of
  * this output must produce an ObjBuilder shape-equivalent to the input,
  * modulo (a) Mach-O's mandatory (segname, sectname) pairing and (b)
  * any synthesized N_SECT symbols. The (segname,sectname) form chosen
diff --git a/src/obj/obj.h b/src/obj/obj.h
@@ -379,7 +379,7 @@ void obj_symiter_free(ObjSymIter*);
  * helpers pick the right name for the active target.obj so the linker
  * doesn't carry per-format switches at every synthesis site. ELF
  * returns the historical names; Mach-O / COFF panic until those
- * writers land (see doc/MULTIOBJ.md §3.5). */
+ * writers land. */
 Sym obj_secname_init_array(Compiler*);
 Sym obj_secname_fini_array(Compiler*);
 Sym obj_secname_preinit_array(Compiler*);
diff --git a/src/obj/obj_secnames.c b/src/obj/obj_secnames.c
@@ -13,7 +13,7 @@
  * choice so callers don't sprinkle target-format switches through
  * link_layout.c / link_dyn.c.
  *
- * Phase 1 of doc/MULTIOBJ.md: ELF returns the historical name; Mach-O
+ * Phase 1: ELF returns the historical name; Mach-O
  * panics with a "TODO" until the macho writer lands in Phase 2/3. COFF
  * panics in the same way and is filled in later. */

@@ -25,7 +25,7 @@ static Sym secname_panic_unimpl(Compiler* c, const char* which) {
   SrcLoc l = {0, 0, 0};
   compiler_panic(c, l,
                  "obj section name '%s' for target obj=%u not yet "
-                 "implemented (see doc/MULTIOBJ.md §3.5)",
+                 "implemented",
                  which, (unsigned)c->target.obj);
   return 0;
 }
diff --git a/test/cg/CORPUS.md b/test/cg/CORPUS.md
@@ -434,5 +434,3 @@ Phase status:
 |---|---|
 | M | inline asm |
 | R | opt-wrapped equivalence |
-
-See `doc/cg_testing.md` for the strategy and group definitions.
diff --git a/test/macho/cfree-roundtrip-macho.c b/test/macho/cfree-roundtrip-macho.c
@@ -1,5 +1,5 @@
 /* cfree-roundtrip-macho: read a Mach-O object via libcfree's read_macho,
- * then re-emit via emit_macho. Phase 2 oracle for doc/MULTIOBJ.md §3.1.
+ * then re-emit via emit_macho. Round-trip oracle for the Mach-O writer.
  *
  * Usage: cfree-roundtrip-macho <in.o> <out.o>
  *
diff --git a/test/parse/CORPUS.md b/test/parse/CORPUS.md
@@ -158,7 +158,7 @@ here for completeness once they're real cases.
 | `6_5_30_generic_selection`| ★ | `int x=42; return _Generic((x), int: x, default: 0);` | 42 |
 | `6_5_31_subscript_commute`| ★ | `int a[5]={0,0,42,0,0}; return 2[a];` | 42 |
 | `6_5_32_string_subscript` | ★ | `return "*"[0];` | 42 |
-| `6_5_33_regalloc_spill` | ★ | 12-arg `sum12(x1+0, ..., x12+0)` — exceeds the 10-INT scratch pool, exercises `spill_reg`/`reload_reg` and the cg_call avs-in-flight fallback (see doc/REGALLOC.md) | 78 |
+| `6_5_33_regalloc_spill` | ★ | 12-arg `sum12(x1+0, ..., x12+0)` — exceeds the 10-INT scratch pool, exercises `spill_reg`/`reload_reg` and the cg_call avs-in-flight fallback | 78 |
 | `6_5_36_fp_arith` | ★ | `(a + b) / b * c - 36.0` over `double` — pins parser dispatch to `BO_FADD`/`FSUB`/`FMUL`/`FDIV` | 42 |
 | `6_5_37_fp_int_promote` | ★ | `int + double` — usual arithmetic conversion promotes the int side to `double` before BO_FADD | 42 |
 | `6_5_38_fp_float_widen` | ★ | `float + double` — float widens to double before BO_FADD | 42 |
diff --git a/test/smoke/rv64.sh b/test/smoke/rv64.sh
@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # test/smoke/rv64.sh — end-to-end smoke test for the rv64 podman/qemu path.
 #
-# Phase-2 of doc/MULTIARCH.md: prove the test/lib/exec_target.sh helper
+# Phase-2 of the multi-arch bring-up: prove the test/lib/exec_target.sh helper
 # can build, queue, and run a riscv64-linux ELF before any cfree-emitted
 # rv64 bytes exist. Builds a tiny freestanding static executable with
 # clang --target=riscv64-linux-gnu and pushes it through
diff --git a/test/smoke/x64.sh b/test/smoke/x64.sh
@@ -1,7 +1,7 @@
 #!/usr/bin/env bash
 # test/smoke/x64.sh — end-to-end smoke test for the x64 podman/qemu path.
 #
-# Phase-1 of doc/MULTIARCH.md: prove the test/lib/exec_target.sh helper
+# Phase-1 of the multi-arch bring-up: prove the test/lib/exec_target.sh helper
 # can build, queue, and run an x86_64-linux ELF before any cfree-emitted
 # x64 bytes exist. Builds a tiny freestanding static executable with
 # clang --target=x86_64-linux-gnu and pushes it through
diff --git a/test/test.mk b/test/test.mk
@@ -137,7 +137,7 @@ test-parse: lib $(PARSE_RUNNER) $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER
 test-parse-err: lib $(PARSE_RUNNER)
 	sh test/parse/run_errors.sh

-# test-smoke-x64: phase-1 sanity check for doc/MULTIARCH.md. Builds a
+# test-smoke-x64: phase-1 sanity check for the multi-arch bring-up. Builds a
 # tiny freestanding x86_64 ELF with clang --target=x86_64-linux-gnu and
 # runs it through test/lib/exec_target.sh's podman/qemu pipeline,
 # proving the harness end-to-end before any cfree-emitted x64 bytes
@@ -165,7 +165,7 @@ test-smoke-rv64:
 #
 # Both build rt/build/aarch64-linux/libcfree_rt.a for soft-float / TF
 # builtins, and run `cfree ld` against the real libc.a (static) and
-# libc.so / libc.so.6 (dynamic — see doc/DYNLD.md). Excluded from the
+# libc.so / libc.so.6 (dynamic). Excluded from the
 # default `test` target because they need podman; opt-in via
 # `make test-musl` / `make test-glibc`.
 #