commit 4e590650228592d27104243d8512af3ed0cb0c18
parent 246df83a668460fbe7e2b598a23614ebe68c9c11
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 8 Jun 2026 18:22:39 -0700
asm: don't globalize deferred-symbol tombstones in promote_undef_externs
promote_undef_externs (the `as` pass that turns referenced-but-undefined LOCAL
symbols into undefined GLOBALs, matching GNU as) iterated every symbol slot —
including `removed=1` tombstones — and so flipped the binding of deferred
anonymous const-data / jump-table symbols (`.Lkit_ro.N` / `.Lkit_jt.N`) that
were still tombstoned awaiting materialization at opt_whole_module_finalize.
It was the only obj_symiter consumer not honoring the `removed` contract
documented in obj.h.
This broke the aarch64-freebsd `-O1` self-build: FreeBSD's <stdlib.h> injects a
file-scope `__asm__(".symver ...")` (via __sym_compat) whose replay runs this
pass *before* the deferred data is materialized, so the four hosted
driver/env/*.o each surfaced a defined global `.Lkit_ro.0` and the stage2 link
aborted with `duplicate definition of global symbol '.Lkit_ro.0'`. The bug is
not specific to FreeBSD codegen: any TU with a file-scope asm plus a deferred
const-data symbol at -O1 reproduces it on any target (e.g. aarch64-linux-musl).
Fix: skip tombstones in the loop. The four env objects now show zero global
.Lkit symbols and the stage2 link succeeds. test-asm/test-link/test-elf are
green; -O0 is unaffected, since deferral (and hence tombstones) exists only at
-O1.
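For illustration only, the shape of the fix can be modeled outside the kit tree.
The sketch below is not the real object model: `Sym`, `Bind`, `SEC_NONE`, and
`promote_undef_locals` are hypothetical stand-ins for the kit types and for
promote_undef_externs, reduced to the three checks that matter here. It assumes
a tombstone is just a slot with `removed` set that the iterator still visits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the real kit object-model types. */
typedef enum { BIND_LOCAL, BIND_GLOBAL } Bind;
#define SEC_NONE 0

typedef struct {
    const char *name;
    Bind bind;
    int section_id; /* SEC_NONE: not defined in this object */
    bool removed;   /* tombstone: slot reserved, symbol not live yet */
} Sym;

/* Promote undefined LOCAL symbols to GLOBAL, skipping tombstones.
 * Returns the number of symbols whose binding was changed. */
static size_t promote_undef_locals(Sym *syms, size_t n) {
    assert(syms != NULL || n == 0);
    size_t promoted = 0;
    for (size_t i = 0; i < n; i++) {
        Sym *s = &syms[i];
        if (s->removed) continue;                /* the fix: ignore tombstones */
        if (s->section_id != SEC_NONE) continue; /* defined here */
        if (s->bind != BIND_LOCAL) continue;     /* already non-local */
        s->bind = BIND_GLOBAL;
        promoted++;
    }
    return promoted;
}
```

Without the `removed` check, a deferred `.Lkit_ro.0` slot (LOCAL, no section)
would pass the remaining two tests and be promoted, which is the failure mode
described above.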
The freebsd -O1 chain now gets past the stage2 link but hits a separate,
pre-existing blocker — `-Wl,--gc-sections` drops crt `__progname`/`environ`
that libc.so.7 needs — documented in doc/plan/BOOTSTRAP.md as the next fix.
Diffstat:
2 files changed, 37 insertions(+), 14 deletions(-)
diff --git a/doc/plan/BOOTSTRAP.md b/doc/plan/BOOTSTRAP.md
@@ -84,24 +84,38 @@ Linux target from a non-Linux host" — the FreeBSD VM path is the same shape):
`.gnu.version_d`/`.gnu.version` and emits a matching `.gnu.version_r` +
`.gnu.version` (gated on the DSO carrying versions, so musl/static links are
unchanged; glibc links now also carry correct `GLIBC_*` requirements).
-- **`-O1` (release) chain does NOT yet reach the fixed point.** It is blocked
- *before* the fixed-point check by a pre-existing, FreeBSD-target-specific
- codegen bug, unrelated to symbol versioning: the `-O1` *deferred* anonymous
- const-data path (`api_const_data_can_defer` →
- `local_static_data_*`) emits the `.Lkit_ro.N` (and sibling `.Lkit_jt.N`)
- symbols with **GLOBAL** instead of LOCAL binding for the FreeBSD target. The
- four hosted `driver/env/*.o` adapters then each define a global `.Lkit_ro.0`,
- and the stage2 `kit` link aborts with `duplicate definition of global symbol
- '.Lkit_ro.0'`. The identical source at `-O1` emits these symbols LOCAL for
- aarch64-linux and aarch64-macos, so the bug is in the FreeBSD-target deferred
- const-data emission, not the link/versioning work. This is the `.Lkit_jt.0`
- release-bootstrap break tracked elsewhere; gate on `bootstrap-debug` until it
- is fixed.
+- **`-O1` (release) chain does not yet reach the fixed point**, but the
+ original blocker is fixed and a second, distinct one is now isolated.
+ - **Fixed — deferred-symbol globalization (assembler).** The `-O1` *deferred*
+ anonymous const-data / jump-table symbols (`.Lkit_ro.N` / `.Lkit_jt.N`) are
+ created as LOCAL tombstones (`obj_symbol_defer`, `removed=1`) and only
+ materialized at `opt_whole_module_finalize`. The assembler's
+ `promote_undef_externs` (`src/asm/asm.c`) — which globalizes undefined LOCAL
+ externs — walked every symbol slot *including tombstones*, the only
+ `obj_symiter` consumer that did not honor the `removed` contract (`obj.h`),
+ so it flipped those tombstones to defined GLOBALs. It bit FreeBSD because
+ its `<stdlib.h>` injects a file-scope `__asm__(".symver …")` (via
+ `__sym_compat`) whose replay runs that pass *before* the deferred data is
+ materialized; the four hosted `driver/env/*.o` then each defined a global
+ `.Lkit_ro.0` and the stage2 link aborted with `duplicate definition of
+ global symbol '.Lkit_ro.0'`. Not FreeBSD-codegen-specific — any TU with a
+ file-scope `asm` plus a deferred const-data symbol at `-O1` reproduces it on
+ any target. Fix: skip `removed` tombstones in `promote_undef_externs`.
+ - **Open — `--gc-sections` drops crt symbols a DSO needs.** With that fixed,
+ the stage2 link now succeeds, but the resulting `-O1` `kit` fails to *load*:
+ `ld-elf.so.1: /lib/libc.so.7: Undefined symbol "__progname"`. The release
+ chain links with `-Wl,--gc-sections`, and kit's section-GC liveness pass
+ (`src/link/link_resolve.c`) does not root executable-defined symbols that a
+ linked DSO references — so `__progname` and `environ` (defined in the
+ FreeBSD crt, referenced by `libc.so.7`) get garbage-collected out of the
+ dynsym. The `-O0` chain has no `--gc-sections` and is unaffected; reproduces
+ by cross-linking any hosted FreeBSD exe with `-rdynamic -Wl,--gc-sections`.
+ This is the next thing to fix for the `-O1` fixed point.
This gives three fully self-hosting configurations (aarch64-macos, plus
aarch64-linux under musl and glibc) and a fourth at `-O0` (aarch64-freebsd).
The remaining work is breadth: the other native targets, the aarch64-freebsd
-`-O1` const-data binding fix, and guarding the property over time.
+`-O1` `--gc-sections`/DSO-root fix, and guarding the property over time.
## Open problems and next steps
diff --git a/src/asm/asm.c b/src/asm/asm.c
@@ -203,6 +203,15 @@ static void promote_undef_externs(AsmDriver* d) {
ObjSymIter* it = obj_symiter_new(d->ob);
ObjSymEntry e;
while (obj_symiter_next(it, &e)) {
+ /* The iterator visits tombstoned slots too (see obj.h). Deferred
+ * anonymous const-data / jump-table symbols (obj_symbol_defer) sit as
+ * LOCAL/SK_OBJ/no-section tombstones until opt_whole_module_finalize
+ * materializes them — which, when a file-scope `asm` block (e.g. a
+ * FreeBSD header's `.symver`) replays before that finalize step, is
+ * *after* this pass runs. Promoting them here would resurface them as
+ * defined GLOBALs and collide at link (`duplicate definition of global
+ * symbol '.Lkit_ro.0'`), so skip tombstones like every other consumer. */
+ if (e.sym->removed) continue;
if (e.sym->section_id != OBJ_SEC_NONE) continue; /* defined here */
if (e.sym->bind != SB_LOCAL) continue;
if (e.sym->kind == SK_ABS || e.sym->kind == SK_COMMON) continue;