kit

git clone https://git.ryansepassi.com/git/kit.git

commit 4e590650228592d27104243d8512af3ed0cb0c18
parent 246df83a668460fbe7e2b598a23614ebe68c9c11
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon,  8 Jun 2026 18:22:39 -0700

asm: don't globalize deferred-symbol tombstones in promote_undef_externs

promote_undef_externs (the `as` pass that turns referenced-but-undefined LOCAL
symbols into undefined GLOBALs, matching GNU as) iterated every symbol slot —
including `removed=1` tombstones — and so flipped the binding of deferred
anonymous const-data / jump-table symbols (`.Lkit_ro.N` / `.Lkit_jt.N`) that
were still tombstoned awaiting materialization at opt_whole_module_finalize.
It was the only obj_symiter consumer not honoring the `removed` contract
documented in obj.h.

This bit the aarch64-freebsd `-O1` self-build: FreeBSD's <stdlib.h> injects a
file-scope `__asm__(".symver ...")` (via __sym_compat) whose replay runs this
pass *before* the deferred data is materialized, so the four hosted
driver/env/*.o each surfaced a defined global `.Lkit_ro.0` and the stage2 link
aborted with `duplicate definition of global symbol '.Lkit_ro.0'`. This is not
FreeBSD-codegen-specific: any TU with a file-scope asm plus a deferred
const-data symbol at -O1 reproduces it on any target (e.g. aarch64-linux-musl).
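
Why a flipped binding turns into a link error can be sketched with a toy
duplicate-definition check. This is an illustration only: LinkSym, LinkObj,
first_duplicate_global and four_objects_collide are hypothetical names, not
kit's linker code. It assumes the usual rule that LOCAL definitions are private
to their object while GLOBAL definitions must be unique across the link.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: hypothetical types, not kit's linker structures. */
typedef enum { LB_LOCAL, LB_GLOBAL } LinkBind;

typedef struct {
  const char* name;
  LinkBind bind;
  int defined; /* 1 if the owning object provides a definition */
} LinkSym;

typedef struct {
  const LinkSym* syms;
  size_t nsyms;
} LinkObj;

/* Returns the name of the first GLOBAL symbol defined by more than one
 * object, or NULL if there is none. LOCAL definitions are never compared,
 * because they are private to their object. */
const char* first_duplicate_global(const LinkObj* objs, size_t nobjs) {
  assert(objs != NULL || nobjs == 0);
  for (size_t i = 0; i < nobjs; i++) {
    for (size_t a = 0; a < objs[i].nsyms; a++) {
      const LinkSym* x = &objs[i].syms[a];
      if (!x->defined || x->bind != LB_GLOBAL) continue;
      for (size_t j = i + 1; j < nobjs; j++) {
        for (size_t b = 0; b < objs[j].nsyms; b++) {
          const LinkSym* y = &objs[j].syms[b];
          if (y->defined && y->bind == LB_GLOBAL &&
              strcmp(x->name, y->name) == 0)
            return x->name;
        }
      }
    }
  }
  return NULL;
}

/* Builds four objects that each define `.Lkit_ro.0` with the given binding
 * and returns 1 if the toy check reports a duplicate, 0 otherwise. */
int four_objects_collide(LinkBind bind) {
  LinkSym syms[4];
  LinkObj objs[4];
  for (int i = 0; i < 4; i++) {
    syms[i].name = ".Lkit_ro.0";
    syms[i].bind = bind;
    syms[i].defined = 1;
    objs[i].syms = &syms[i];
    objs[i].nsyms = 1;
  }
  return first_duplicate_global(objs, 4) != NULL;
}
```

With LB_GLOBAL the four definitions collide; with LB_LOCAL the same four
objects pass the check, which is the state the fix restores.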

Fix: skip tombstones in the loop. The four env objects now show zero global
.Lkit symbols and the stage2 link succeeds. test-asm/test-link/test-elf green;
-O0 is provably unaffected (deferral, hence tombstones, only exist at -O1).
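
The effect of the one-line fix can be modelled in isolation. The sketch below
is a simplified stand-in, not the code in src/asm/asm.c: ToySym, promote_undef
and tombstone_binding_after_pass are hypothetical, and the real pass has
further checks (e.g. on symbol kind) that are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins for the symbol fields the pass
 * inspects. */
enum { TOY_SEC_NONE = 0 };
typedef enum { TB_LOCAL, TB_GLOBAL } ToyBind;

typedef struct {
  const char* name;
  ToyBind bind;
  int section_id; /* TOY_SEC_NONE: not defined in any section (yet) */
  int removed;    /* 1 = tombstone; consumers are expected to skip it */
} ToySym;

/* Promotes undefined LOCAL symbols to GLOBAL and returns how many were
 * promoted. With skip_tombstones == 0 this models the pre-fix loop, which
 * also touched tombstoned slots. */
size_t promote_undef(ToySym* syms, size_t n, int skip_tombstones) {
  size_t promoted = 0;
  assert(syms != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    ToySym* s = &syms[i];
    if (skip_tombstones && s->removed) continue;  /* the fix */
    if (s->section_id != TOY_SEC_NONE) continue;  /* defined here */
    if (s->bind != TB_LOCAL) continue;
    s->bind = TB_GLOBAL;
    promoted++;
  }
  return promoted;
}

/* Runs the pass over one genuine undefined extern plus one deferred
 * tombstone and returns the binding the tombstone ends up with. */
ToyBind tombstone_binding_after_pass(int skip_tombstones) {
  ToySym table[2] = {
      {"memcpy", TB_LOCAL, TOY_SEC_NONE, 0},
      {".Lkit_ro.0", TB_LOCAL, TOY_SEC_NONE, 1},
  };
  promote_undef(table, 2, skip_tombstones);
  assert(table[0].bind == TB_GLOBAL); /* real externs are still promoted */
  return table[1].bind;
}
```

Without the skip the tombstone comes back GLOBAL; with it the tombstone stays
LOCAL while the genuine undefined extern is still promoted.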

The freebsd -O1 chain now gets past the stage2 link but hits a separate,
pre-existing blocker — `-Wl,--gc-sections` drops crt `__progname`/`environ`
that libc.so.7 needs — documented in doc/plan/BOOTSTRAP.md as the next fix.
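
The remaining blocker is a rooting question in a mark-style section GC. The
sketch below is a toy illustration of that idea, not kit's
src/link/link_resolve.c and not the planned fix: GcSection, gc_mark and
progname_survives are hypothetical, and each section defines exactly one
symbol to keep the model small.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GC_MAX_SECS 8

/* Toy model only: one symbol per section, edges are section indices. */
typedef struct {
  const char* sym;       /* symbol defined in this section */
  int refs[GC_MAX_SECS]; /* sections referenced from this one */
  int nrefs;
} GcSection;

/* Depth-first mark of everything reachable from section i. */
static void gc_mark_from(const GcSection* secs, int i, int* live) {
  if (live[i]) return;
  live[i] = 1;
  for (int k = 0; k < secs[i].nrefs; k++)
    gc_mark_from(secs, secs[i].refs[k], live);
}

/* Marks sections reachable from `entry`. If root_dso_refs is nonzero, any
 * section defining a symbol that a linked DSO leaves undefined is treated as
 * an additional root. */
void gc_mark(const GcSection* secs, int nsecs, int entry,
             const char* const* dso_undefs, int ndso, int root_dso_refs,
             int* live) {
  assert(nsecs <= GC_MAX_SECS && entry >= 0 && entry < nsecs);
  for (int i = 0; i < nsecs; i++) live[i] = 0;
  gc_mark_from(secs, entry, live);
  if (!root_dso_refs) return;
  for (int i = 0; i < nsecs; i++)
    for (int j = 0; j < ndso; j++)
      if (strcmp(secs[i].sym, dso_undefs[j]) == 0)
        gc_mark_from(secs, i, live);
}

/* Returns 1 if the section defining `__progname` survives the GC. Nothing in
 * the executable references it; only the modelled DSO does. */
int progname_survives(int root_dso_refs) {
  GcSection secs[3] = {
      {"_start", {1}, 1}, /* entry point references main */
      {"main", {0}, 0},
      {"__progname", {0}, 0},
  };
  const char* const dso_undefs[] = {"__progname", "environ"};
  int live[3];
  gc_mark(secs, 3, 0, dso_undefs, 2, root_dso_refs, live);
  return live[2];
}
```

When DSO references are not treated as roots, the `__progname` section is
dropped; rooting them keeps it live.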

Diffstat:
M doc/plan/BOOTSTRAP.md | 42 ++++++++++++++++++++++++++++--------------
M src/asm/asm.c         |  9 +++++++++
2 files changed, 37 insertions(+), 14 deletions(-)

diff --git a/doc/plan/BOOTSTRAP.md b/doc/plan/BOOTSTRAP.md
@@ -84,24 +84,38 @@ Linux target from a non-Linux host" — the FreeBSD VM path is the same shape):
   `.gnu.version_d`/`.gnu.version` and emits a matching `.gnu.version_r` +
   `.gnu.version` (gated on the DSO carrying versions, so musl/static links are
   unchanged; glibc links now also carry correct `GLIBC_*` requirements).
-- **`-O1` (release) chain does NOT yet reach the fixed point.** It is blocked
-  *before* the fixed-point check by a pre-existing, FreeBSD-target-specific
-  codegen bug, unrelated to symbol versioning: the `-O1` *deferred* anonymous
-  const-data path (`api_const_data_can_defer` →
-  `local_static_data_*`) emits the `.Lkit_ro.N` (and sibling `.Lkit_jt.N`)
-  symbols with **GLOBAL** instead of LOCAL binding for the FreeBSD target. The
-  four hosted `driver/env/*.o` adapters then each define a global `.Lkit_ro.0`,
-  and the stage2 `kit` link aborts with `duplicate definition of global symbol
-  '.Lkit_ro.0'`. The identical source at `-O1` emits these symbols LOCAL for
-  aarch64-linux and aarch64-macos, so the bug is in the FreeBSD-target deferred
-  const-data emission, not the link/versioning work. This is the `.Lkit_jt.0`
-  release-bootstrap break tracked elsewhere; gate on `bootstrap-debug` until it
-  is fixed.
+- **`-O1` (release) chain does not yet reach the fixed point**, but the
+  original blocker is fixed and a second, distinct one is now isolated.
+  - **Fixed — deferred-symbol globalization (assembler).** The `-O1` *deferred*
+    anonymous const-data / jump-table symbols (`.Lkit_ro.N` / `.Lkit_jt.N`) are
+    created as LOCAL tombstones (`obj_symbol_defer`, `removed=1`) and only
+    materialized at `opt_whole_module_finalize`. The assembler's
+    `promote_undef_externs` (`src/asm/asm.c`) — which globalizes undefined LOCAL
+    externs — walked every symbol slot *including tombstones*, the only
+    `obj_symiter` consumer that did not honor the `removed` contract (`obj.h`),
+    so it flipped those tombstones to defined GLOBALs. It bit FreeBSD because
+    its `<stdlib.h>` injects a file-scope `__asm__(".symver …")` (via
+    `__sym_compat`) whose replay runs that pass *before* the deferred data is
+    materialized; the four hosted `driver/env/*.o` then each defined a global
+    `.Lkit_ro.0` and the stage2 link aborted with `duplicate definition of
+    global symbol '.Lkit_ro.0'`. Not FreeBSD-codegen-specific — any TU with a
+    file-scope `asm` plus a deferred const-data symbol at `-O1` reproduces it on
+    any target. Fix: skip `removed` tombstones in `promote_undef_externs`.
+  - **Open — `--gc-sections` drops crt symbols a DSO needs.** With that fixed,
+    the stage2 link now succeeds, but the resulting `-O1` `kit` fails to *load*:
+    `ld-elf.so.1: /lib/libc.so.7: Undefined symbol "__progname"`. The release
+    chain links with `-Wl,--gc-sections`, and kit's section-GC liveness pass
+    (`src/link/link_resolve.c`) does not root executable-defined symbols that a
+    linked DSO references — so `__progname` and `environ` (defined in the
+    FreeBSD crt, referenced by `libc.so.7`) get garbage-collected out of the
+    dynsym. The `-O0` chain has no `--gc-sections` and is unaffected; reproduces
+    by cross-linking any hosted FreeBSD exe with `-rdynamic -Wl,--gc-sections`.
+    This is the next thing to fix for the `-O1` fixed point.
 
 This gives three fully self-hosting configurations (aarch64-macos, plus
 aarch64-linux under musl and glibc) and a fourth at `-O0` (aarch64-freebsd). The
 remaining work is breadth: the other native targets, the aarch64-freebsd
-`-O1` const-data binding fix, and guarding the property over time.
+`-O1` `--gc-sections`/DSO-root fix, and guarding the property over time.
 
 ## Open problems and next steps
 
diff --git a/src/asm/asm.c b/src/asm/asm.c
@@ -203,6 +203,15 @@ static void promote_undef_externs(AsmDriver* d) {
   ObjSymIter* it = obj_symiter_new(d->ob);
   ObjSymEntry e;
   while (obj_symiter_next(it, &e)) {
+    /* The iterator visits tombstoned slots too (see obj.h). Deferred
+     * anonymous const-data / jump-table symbols (obj_symbol_defer) sit as
+     * LOCAL/SK_OBJ/no-section tombstones until opt_whole_module_finalize
+     * materializes them — which, when a file-scope `asm` block (e.g. a
+     * FreeBSD header's `.symver`) replays before that finalize step, is
+     * *after* this pass runs. Promoting them here would resurface them as
+     * defined GLOBALs and collide at link (`duplicate definition of global
+     * symbol '.Lkit_ro.0'`), so skip tombstones like every other consumer. */
+    if (e.sym->removed) continue;
     if (e.sym->section_id != OBJ_SEC_NONE) continue; /* defined here */
     if (e.sym->bind != SB_LOCAL) continue;
     if (e.sym->kind == SK_ABS || e.sym->kind == SK_COMMON) continue;