kit

git clone https://git.ryansepassi.com/git/kit.git

commit 0a2faa92c8f3b6a302e7f621f78a92ba4b4180c7
parent 61a4a4c6f662dced2a9394cb09ee41c1bca529c2
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu,  4 Jun 2026 14:01:26 -0700

Implement whole-program optimization (LTO Phase 0 + Phase 1)

Make a library or executable look like a single translation unit to the
optimizer so inlining, dead-code elimination, and internalization cross TU
boundaries, for invocations that provide all sources up front
(kit cc *.c -flto -o prog and the build-exe/lib/obj verbs). Whole-program
optimization runs whenever the optimizer runs (-O1+). See doc/plan/LTO.md.
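
A minimal sketch of the gating rule in this paragraph, combined with the eager-emit exceptions listed under Phase 0. Deferred emit and the whole-module sweep turn on at -O1 and above for ahead-of-time builds, and cross-TU staging additionally requires -flto. Every name here (CompilePath, EmitPlan, plan_emit) is hypothetical and only illustrates the rule. According to doc/plan/LTO.md, kit's actual switch is `o->whole_program = (level >= 1)` in opt_cgtarget_new. The sketch also assumes that -flto has no effect at -O0.

```c
#include <stdbool.h>

/* Hypothetical names throughout: a sketch of the gating rule, not kit's code. */
typedef enum CompilePath { PATH_AOT, PATH_JIT, PATH_INTERP, PATH_RUN } CompilePath;

typedef struct EmitPlan {
    bool defer_emit;   /* record IR now, emit functions when CG finishes */
    bool whole_module; /* run the module-wide reachability sweep and inliner */
    bool cross_tu;     /* stage every source into one shared CG session */
} EmitPlan;

/* Assumption: -flto changes nothing when the optimizer is off (-O0). */
static EmitPlan plan_emit(int opt_level, CompilePath path, bool flto) {
    EmitPlan p = {false, false, false};
    /* -O0 and the JIT/interp/run paths keep eager per-function emit. */
    if (opt_level < 1 || path != PATH_AOT)
        return p;
    p.defer_emit = true;
    p.whole_module = true; /* mirrors whole_program = (level >= 1) */
    p.cross_tu = flto;     /* cross-TU LTO stays opt-in behind -flto */
    return p;
}
```

Under this sketch -O2 behaves exactly like -O1. This matches the note in doc/plan/LTO.md that -O2 is currently treated as -O1.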

Phase 0 — whole-translation-unit optimization:
- Generalize the ARM64-only finalize sweep to opt_whole_module_finalize for
  every arch; x86-64/riscv64 defer per-function emit to finalize under the
  whole-program path. -O0 and the JIT/interp/run paths stay on eager emit.
- Wire opt_inline over the reachable FuncSet (the previously unreached
  whole-program inliner); weak/interposable callees are kept out-of-line.
- Decide cg/link capabilities via the arch vtable rather than arch identity,
  so src/opt carries no arch == KIT_ARCH_* checks.
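
The "weak/interposable callees are kept out-of-line" rule can be modeled as a predicate that the inliner checks before it fuses a callee. This is a sketch built on assumed types: Callee, Binding, callee_interposable, and may_inline are all hypothetical. The real check in kit is opt_cg_func_interposable, and this commit does not show its exact rules. The shared-library branch encodes the rule from doc/plan/CG_OBJ_LIFECYCLE.md that default-visibility symbols are interposable unless hidden. Because `cc -shared -flto` is currently rejected, that branch is illustrative only.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not kit's data structures. */
typedef enum Binding { BIND_LOCAL, BIND_GLOBAL, BIND_WEAK } Binding;
typedef enum Visibility { VIS_DEFAULT, VIS_HIDDEN } Visibility;
typedef enum OutputKind { OUT_EXECUTABLE, OUT_SHARED } OutputKind;

typedef struct Callee {
    Binding binding;
    Visibility visibility;
    bool has_body; /* a body is available to the optimizer */
} Callee;

/* Interposable: the body the optimizer sees may not be the one called at run
 * time, so inlining it could miscompile a strong-over-weak override. */
static bool callee_interposable(const Callee *c, OutputKind out) {
    if (c->binding == BIND_WEAK)
        return true; /* a strong definition elsewhere may replace it */
    if (out == OUT_SHARED && c->binding == BIND_GLOBAL &&
        c->visibility == VIS_DEFAULT)
        return true; /* the dynamic linker may bind a different definition */
    return false;
}

static bool may_inline(const Callee *c, OutputKind out) {
    return c->has_body && !callee_interposable(c, out);
}
```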

Phase 1 — shared-context, all-sources-up-front LTO:
- Stage N semantic frontends (C, Toy, Wasm) into one borrowed KitCg over a
  caller-owned ObjBuilder via an explicit begin/begin_unit/end_unit/finish/
  detach lifecycle; asm participates as an opaque object. Drivers
  (build_compile_all, cc_run_link_exe) collect inputs and do not own
  definition selection or finalization.
- Extract symresolve (src/obj/symresolve.{h,c}); refactor
  link_resolve_symbols onto it and reuse it for the recording-time merge so
  cross-TU ODR/weak/common/COMDAT resolution matches the linker exactly.
- Give ObjBuilder an O(1) name->id index; skip-intern LOCAL symbols so
  per-TU statics stay distinct in the shared builder.
- Compute the preserved/export set from the assembled link (entry, opaque
  undef refs, intrinsic/IFUNC roots, retain/init-array/address-significant)
  and feed it to kit_cg_finish; executable links internalize the rest before
  the reachability sweep. Relocatable/archive/shared stay conservative and
  cc -shared -flto is rejected until shared output is exercised.
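
Cross-TU resolution matches the linker exactly because the recording-time merge and the linker share one decision function. The sketch below shows the general shape of such a function for strong, weak, and undefined symbols, plus COMDAT. It is a simplified stand-in built on stated assumptions and is not kit's symresolve_merge. It ignores common symbols and COFF-specific COMDAT selection, and every name in it is hypothetical. Only the two-argument (existing, incoming) shape with a COMDAT flag inside the attributes follows the signature described in doc/plan/LTO.md.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; the real SymAttrs and symresolve_merge live in
 * src/obj/symresolve.h and cover more cases (common symbols, COFF). */
typedef enum SymKind { SK_UNDEF, SK_WEAK, SK_STRONG } SymKind;

typedef struct SymAttrsSketch {
    SymKind kind;
    bool in_comdat;
} SymAttrsSketch;

typedef enum MergeResult {
    KEEP_EXISTING,
    TAKE_INCOMING,
    DUPLICATE_DEFINITION /* ODR violation, reported at the incoming definition */
} MergeResult;

static MergeResult merge_sketch(SymAttrsSketch existing, SymAttrsSketch incoming) {
    if (incoming.kind == SK_UNDEF)
        return KEEP_EXISTING; /* a reference never replaces anything */
    if (existing.kind == SK_UNDEF)
        return TAKE_INCOMING; /* any definition beats a bare reference */
    if (existing.kind == SK_STRONG && incoming.kind == SK_STRONG) {
        if (existing.in_comdat && incoming.in_comdat)
            return KEEP_EXISTING; /* COMDAT: keep the first copy */
        return DUPLICATE_DEFINITION;
    }
    if (incoming.kind == SK_STRONG)
        return TAKE_INCOMING; /* strong overrides weak */
    return KEEP_EXISTING;     /* weak vs weak, or weak arriving after strong */
}
```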

Fixes found completing the work:
- opt_set_finish_policy was called unconditionally from kit_cg_finish and
  casts the recorder's user to OptImpl*, but the C-source backend's recorder
  user is a CTarget; the stray write corrupted it and crashed every --emit=c
  run. Guard it behind opt_level > 0, like opt_set_dump_writer.
- The recording-time symresolve_merge fired on same-TU re-emission, so legal
  C tentative definitions (int g; int g;) were rejected as duplicates in
  every path, not just LTO. Track the defining source unit per ObjSymId and
  merge only across units; same-TU re-emission keeps last-writer-wins,
  matching the non-LTO linker. kit stays -fno-common.
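
The tentative-definition fix boils down to two steps: record which source unit defined each symbol, and consult the cross-unit merge only when the units differ. The sketch below uses a fixed-size table indexed by symbol id. DefTable, record_definition, and MAX_SYMS are illustrative names, not kit's. When one unit re-emits a definition, the new body overwrites the old one (last writer wins). When a second unit defines the same symbol, the sketch reports a conflict. This assumes every definition is strong, which holds for tentative definitions under -fno-common.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: one slot per symbol id recording its defining unit. */
#define MAX_SYMS 64
#define NO_UNIT UINT32_MAX

typedef struct DefTable {
    uint32_t def_unit[MAX_SYMS]; /* NO_UNIT until some unit defines the symbol */
    int body[MAX_SYMS];          /* stands in for the recorded function/data body */
} DefTable;

typedef enum DefOutcome { DEF_NEW, DEF_REPLACED_SAME_UNIT, DEF_CONFLICT } DefOutcome;

static void def_table_init(DefTable *t) {
    for (int i = 0; i < MAX_SYMS; i++) {
        t->def_unit[i] = NO_UNIT;
        t->body[i] = 0;
    }
}

/* Caller guarantees sym < MAX_SYMS. */
static DefOutcome record_definition(DefTable *t, uint32_t sym, uint32_t unit, int body) {
    if (t->def_unit[sym] == NO_UNIT) {
        t->def_unit[sym] = unit;
        t->body[sym] = body;
        return DEF_NEW;
    }
    if (t->def_unit[sym] == unit) {
        /* Same-TU re-emission (e.g. `int g; int g;`): last writer wins, no merge. */
        t->body[sym] = body;
        return DEF_REPLACED_SAME_UNIT;
    }
    /* Different units: the cross-unit merge would run here. With every
     * definition assumed strong (-fno-common), that merge reports a duplicate. */
    return DEF_CONFLICT;
}
```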

Tests: test/opt/whole_program_inline.sh and test/opt/lto_phase1.sh cover
cross-TU fusion, internalization, weak/strong, ODR, tentative resolution
fidelity (-flto == linker), multi-frontend staging, and opaque asm.
asm_04_register_callee_saved is skipped on Mach-O hosts: its verbatim
file-scope asm defines bare names that the underscore-prefixed C references
cannot reach (asm_02 has the same limitation).

Diffstat:
M doc/CODEGEN.md | 4 ++--
M doc/DWARF.md | 2 +-
A doc/plan/CG_OBJ_LIFECYCLE.md | 184 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M doc/plan/LTO.md | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------
M doc/plan/RELEASE.md | 8 ++++----
M driver/cmd/build.c | 351 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-----------------
M driver/cmd/cc.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
M driver/lib/compile_engine.c | 182 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M driver/lib/compile_engine.h | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M driver/lib/link_engine.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
M driver/lib/link_engine.h | 7 +++++++
M include/kit/cg.h | 38 +++++++++++++++++++++++++++++++++++---
M include/kit/compile.h | 33 ++++++++++++++++++++++++++++-----
M include/kit/core.h | 5 +++++
M include/kit/link.h | 5 +++++
M lang/c/c.c | 28 ++++++++++------------------
M lang/toy/compile.c | 30 +++++++++++-------------------
M lang/wasm/cg.c | 92 ++++++++++++++++++++++++++++++++++++++++---------------------------------------
M lang/wasm/wasm.c | 33 +++++++++++++++++----------------
M mk/test.mk | 12 +++++++++++-
M src/abi/abi.h | 6 ++++++
M src/abi/abi_aapcs64.c | 4 ++++
M src/abi/abi_rv64.c | 3 +++
M src/abi/abi_sysv_x64.c | 3 +++
M src/abi/abi_win64_x64.c | 3 +++
M src/api/compile.c | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---------------
M src/api/link.c | 238 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/arch/aa64/arch.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/arch/aa64/link.c | 18 ++++++++++++++++++
M src/arch/arch.h | 28 ++++++++++++++++++++++++++++
M src/arch/cgtarget.c | 8 +++++++-
M src/arch/riscv/arch.c | 80 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
M src/arch/wasm/arch.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/arch/x64/arch.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/cg/atomic.c | 52 +++++++++++++++++++++++++++++++---------------------
M src/cg/cgtarget.h | 16 +++++++++++++++-
M src/cg/data.c | 203 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
M src/cg/internal.h | 27 +++++++++++++++++++++++----
M src/cg/ir.h | 3 ++-
M src/cg/session.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
M src/cg/type.c | 109 +++++++++++++------------------------------------------------------------------
M src/emu/emu.c | 14 ++++++++++----
M src/link/link_arch.h | 8 ++++++++
M src/link/link_internal.h | 32 +++++++-------------------
M src/link/link_resolve.c | 142 +++++++++++++++++++++++++++++++++++++++++--------------------------------------
M src/obj/obj.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
M src/obj/obj.h | 1 +
A src/obj/symresolve.c | 37 +++++++++++++++++++++++++++++++++++++
A src/obj/symresolve.h | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/opt/opt.c | 331 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
M src/opt/opt.h | 1 +
M src/opt/pass_lower.c | 11 +++++------
M src/wasm/wasm.h | 1 +
M test/api/cg_fp_cmp_test.c | 11 +++++++----
M test/api/cg_switch_test.c | 2 +-
M test/api/cg_type_test.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------------
M test/arch/inline_public_test.h | 5 +++--
M test/cg/strength_reduce_test.c | 5 +++--
A test/opt/lto_phase1.sh | 461 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A test/opt/whole_program_inline.sh | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/parse/run.sh | 9 ++++++++-
61 files changed, 3663 insertions(+), 583 deletions(-)

diff --git a/doc/CODEGEN.md b/doc/CODEGEN.md @@ -113,7 +113,7 @@ realization. ## CgTarget realizations -`session.c`'s `kit_cg_begin_obj` picks the realization. It asks the arch +`session.c`'s `kit_cg_begin` picks the realization. It asks the arch registry (`cg_backend_for_session`, `src/arch/registry.c`) for a `CGBackend` whose `make` builds the base `CgTarget` for this target arch and output kind, then conditionally wraps it: @@ -122,7 +122,7 @@ then conditionally wraps it: (see below). No IR is recorded; semantic ops emit machine code immediately. - **`-O1`/`-O2` or interpreter:** `session.c` wraps the base target with `opt_cgtarget_new` (`src/opt/opt.c`), which returns a `CgIrRecorder` - (`src/cg/ir_recorder.c`). Recording does not emit; at `finalize` the optimizer + (`src/cg/ir_recorder.c`). Recording does not emit; at `kit_cg_finish` the optimizer replays optimized IR. The recorder still holds the unwrapped native target so the optimizer can drive `NativeTarget` directly after lowering. - **C-source / wasm:** the registry returns a source-like `CgTarget` that diff --git a/doc/DWARF.md b/doc/DWARF.md @@ -69,7 +69,7 @@ is fine. The flow: the CG session before any optimizer wrapper. - The CG session (`src/cg/session.c`) drives function/scope/variable lifecycle from inside the public CG API entry points, and calls `debug_emit` then - `debug_free` at `kit_cg_end_obj`. + `debug_free` at `kit_cg_finish`. - The C-type → DWARF-type adapter lives at `src/cg/debug.c` (`api_debug_type`), not in the language frontend: it lowers a CG type id (`KitCgTypeId`) into a chain of `debug_type_*` calls. Debug itself is language-neutral — it knows diff --git a/doc/plan/CG_OBJ_LIFECYCLE.md b/doc/plan/CG_OBJ_LIFECYCLE.md @@ -0,0 +1,184 @@ +# CG / ObjBuilder Lifecycle + +This is the target lifecycle for semantic code generation and object building. 
+It is motivated by LTO, but it should be true for ordinary one-TU compilation +as well: `ObjBuilder` owns object lifetime, while `KitCg` borrows an object and +finishes codegen into it. + +Status (2026-06-04): the borrowed CG/object lifecycle is implemented as the only +public CG session interface. `kit_cg_free` aborts and detaches without flushing, +lowering, debug-emitting, or finalizing the borrowed object. Shared-library LTO +remains disabled until that output path is exercised. + +## Problem + +Historically `KitCg` had an object-shaped lifecycle: + +```c +cg_begin_object(cg, ob, code_opts); +frontend_compile_cg(..., cg); +cg_end_object(cg); +kit_obj_builder_finalize(ob); +``` + +That was the wrong ownership boundary. `KitCg` does not create, emit, link, or +free the object; the caller does. In the borrowed lifecycle, `kit_cg_finish` +finalizes the CG target and emits debug, while `kit_cg_detach` drops the +borrowed object/target links. `kit_cg_free` follows the abort path and never +finishes a partial object as a side effect of cleanup. + +It also makes LTO harder to finish cleanly. LTO needs to collect multiple source +units into one object, then finish semantic codegen only after the driver/linker +has enough information to provide preserved/export policy. That handoff should +be a `KitCg` finish option, not a driver-owned pseudo-unit abstraction. + +## Ownership Model + +`ObjBuilder` owns object state: + +- symbol identity and the name-to-id index; +- sections, atoms, relocations, data bodies, common symbols, and object metadata; +- object-level finalization and emission; +- object lifetime and cleanup. + +`KitCg` owns a semantic codegen session attached to an object: + +- the current target/recorder/backend; +- codegen options and whole-module optimization state; +- source-unit boundaries and provenance; +- debug/codegen state that is produced by semantic lowering; +- the final codegen flush into the borrowed object. 
+ +The driver or API caller owns orchestration: + +- creating/freeing `ObjBuilder`; +- deciding source order and which inputs are semantic vs opaque; +- passing link-picture policy to codegen finish; +- calling `kit_obj_builder_finalize` and then emitting/linking the object. + +## Target API Shape + +The exact names can change, but the shape should be explicit: + +```c +KitObjBuilder* ob = NULL; +KitCg* cg = NULL; + +kit_obj_builder_new(compiler, &ob); +kit_cg_new(compiler, &cg); + +kit_cg_begin(cg, ob, &code_opts); /* borrow ob, attach backend */ +kit_cg_begin_unit(cg, &unit_opts); /* source contribution */ +frontend_compile_cg(..., cg); +kit_cg_end_unit(cg); +kit_cg_finish(cg, &finish_opts); /* flush/lower/debug into ob */ +kit_cg_detach(cg); /* drop borrowed links */ + +kit_obj_builder_finalize(ob); +``` + +For multi-source LTO, only the unit loop grows: + +```c +kit_obj_builder_new(compiler, &ob); +kit_cg_new(compiler, &cg); +kit_cg_begin(cg, ob, &code_opts); + +for each semantic source: + kit_cg_begin_unit(cg, &unit_opts); + frontend_compile_cg(..., cg); + kit_cg_end_unit(cg); + +kit_cg_finish(cg, &finish_opts); +kit_cg_detach(cg); +kit_obj_builder_finalize(ob); +``` + +Opaque frontends do not attach to `KitCg`; they compile directly into their own +`ObjBuilder` and enter link/archive/relocatable order as ordinary objects. + +## Object vs Unit + +An object is the emitted product. It may contain one source unit or many. + +A unit is one semantic source contribution inside the object. Unit boundaries +are not object boundaries. They exist so codegen can track: + +- source name and source identity; +- ODR/duplicate-definition provenance; +- debug compilation-unit identity; +- file-scope asm and file-scope language state boundaries; +- future per-source codegen options or path-map state; +- contribution tables for "symbol X was defined by unit N". + +## Finish Options + +`kit_cg_finish` is where link-picture-dependent policy enters semantic +optimization. 
For LTO, finish options should eventually carry: + +- preserved symbols: entry, dynamic exports, opaque undefined references, + `used`, init/fini, asm-named/address-significant symbols, IFUNC, etc.; +- output policy: executable, shared library, relocatable, archive member; +- interposition policy: default-visibility shared-library symbols are + interposable unless hidden/version-script/`-Bsymbolic` policy says otherwise; +- debug policy for cross-unit inlining. + +The finish operation may use internal `ObjSymId` sets when the linker/driver has +already resolved names into the shared `ObjBuilder`. A public API can offer a +name-based adapter if needed, but the core should prefer symbol ids once an +object exists. + +`kit_cg_finish` must not call `kit_obj_builder_finalize`. The caller finalizes +the object after CG has finished writing semantic output into it. + +## Failure Model + +Cleanup must not finalize by accident. + +- `kit_cg_finish` is the only operation that flushes/lower/debug-emits CG state. +- `kit_cg_abort` drops current CG-side state and detaches from the borrowed + object without finalizing anything. +- `kit_cg_free` never calls finish implicitly. +- The caller decides whether to finalize or free the `ObjBuilder`. + +This fixes the old wart where freeing an open `KitCg` could finalize a partial +object. + +## Boundary Rules + +Frontends should only see the `KitCg` semantic API or the object-only API they +explicitly implement. A semantic frontend should not own `ObjBuilder` +finalization, and an opaque frontend should not need a fake `KitCg`. + +`ObjBuilder` should remain the single source of truth for object symbol identity +and storage. CG may ask it to declare/define/merge contributions, but CG should +not own object lifetime. + +The driver should not implement symbol merge, semantic finalization, or +internalization policy. 
It should gather sources, gather opaque inputs, compute +or request preserved/export policy, and pass that policy to `kit_cg_finish`. + +## Migration Plan + +1. Introduce borrowed-lifecycle names as the public API: + `kit_cg_begin`, `kit_cg_finish`, `kit_cg_detach`, and `kit_cg_abort`. +2. Make one-TU semantic compilation use the same borrowed lifecycle that LTO + uses: caller creates `ObjBuilder`, CG borrows it, CG finishes, caller + finalizes the object. +3. Add `begin_unit` / `end_unit` bookkeeping and use it in ordinary one-TU and + multi-source LTO paths. +4. Move output-kind and preserved/export input into `kit_cg_finish` options. + The driver now passes output-kind/interposition policy for supported outputs; + preserved-symbol computation, internalization, and shared-library LTO remain + follow-up work, so global roots stay conservative. +5. Move duplicate function/data contribution bookkeeping toward the + `ObjBuilder`/CG contribution boundary so `src/opt` and `src/cg/data.c` do not + each own fragments of LTO symbol-resolution policy. + +## Non-Goals + +- This does not introduce a separate public `LtoUnit` abstraction. +- This does not require serialized IR objects. +- This does not make frontends own object finalization. +- This does not make opaque inputs semantic; asm and prebuilt objects remain + ordinary object participants. diff --git a/doc/plan/LTO.md b/doc/plan/LTO.md @@ -11,8 +11,10 @@ compiled IR objects are a later phase that reuses the same core. The optimizer baseline this builds on — the recording IR, the recording/optimizing boundary, the finalize path, and the pass catalog — is in [../OPT.md](../OPT.md) and [OPTIMIZER.md](OPTIMIZER.md). The link-time symbol -model is in [LINKER.md](LINKER.md). This document treats those as given and -describes only the LTO-specific additions. +model is in [LINKER.md](LINKER.md). The CG/object lifetime boundary used by the +remaining Phase 1 staging work is in +[CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md). 
This document treats those as given +and describes only the LTO-specific additions. The headline finding from investigating the tree: **most of the machinery for whole-program optimization already exists; it is just per-TU, single-arch, and @@ -20,6 +22,126 @@ partly unreached.** LTO here is three concrete refactors plus wiring, not a new subsystem. The largest of the three is factoring the linker's symbol-resolution policy out so it can run at merge time as well as at link time. +## Status (2026-06-04) + +**Phase 0 is complete and shipping; Phase 1's all-sources-up-front LTO path is +implemented in this branch.** The end state is not a C-only shortcut: +every source-building verb routes through one staging engine, and every +in-tree frontend declares either semantic CG staging or opaque-object +participation. The link-picture-driven preserved/export prepass now feeds +`kit_cg_finish`, and executable LTO internalizes non-preserved globals before +the whole-module reachability walk. Where reality diverged from the original +wording below: + +- **The gate is `-O1`, not `-O2`.** Whole-program optimization (deferred emit + + module sweep + inliner) runs whenever the optimizer runs: + `o->whole_program = (level >= 1)` in `opt_cgtarget_new`. `-O2` is treated as + `-O1` for now. References to `-O2`/`-fwhole-program` gating below are superseded. +- **One arch path, no identity checks.** The ARM64-only sweep is now + `opt_whole_module_finalize` for every arch; `src/opt` has zero + `arch == KIT_ARCH_*` checks. The sret arg-slot rule moved off arch identity to + `ABIFuncInfo.sret_consumes_int_arg` (set per ABI impl). Remaining generic-layer + arch identity (`src/cg/type.c`, `src/cg/atomic.c`, `src/link/link_resolve.c`) is + tracked as separate cleanup, not part of LTO. +- **Cross-TU LTO will be opt-in behind `-flto`** (revisit making it the `-O1` + default once proven) — resolves the flag-surface open question. 
+- **Frontend participation is explicit.** C, Toy, and Wasm lower into a + caller-owned open `KitCg`; asm is an opaque LTO participant and continues to + compile as an ordinary object. +- **The lifecycle target is borrowed `KitCg` + caller-owned `ObjBuilder`, not a + separate LTO unit abstraction.** `ObjBuilder` owns object lifetime; `KitCg` + records source units into a borrowed object and finishes semantic codegen with + link-picture policy. See [CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md). +- **`symresolve_merge` signature** as built is `(SymAttrs existing, SymAttrs + incoming)` with `in_comdat` carried inside `SymAttrs`; no separate `coff_target` + parameter (the COMDAT flags carry everything the decision needs). +- **Preserved/export internalization is part of Phase 1.** The LTO CG finish + path receives linker-computed preserved symbols for executable links, and + `cc -shared -flto` remains disabled until shared-library output is exercised. + +### Done + +- [x] **§6.1 Generalize the finalize sweep to all arches** — `opt_whole_module_finalize` + (`src/opt/opt.c`); x64/rv64 defer-to-finalize; `-O0` and the JIT/interp/run paths + unchanged; `opt_maybe_capture_interp` still invoked per reachable func. +- [x] **§6.4 Wire `opt_inline`** over the reachable `FuncSet` — `opt_run_o1_native` + split into `opt_o1_native_prepare` / `opt_o1_native_finish`; the sweep lowers the + live set into one FuncSet, runs the inliner, then finishes each func. +- [x] **Interposition soundness fix** (strengthens §9): weak/interposable callees are + never inlined — `opt_cg_func_interposable` marks them `KIT_CG_INLINE_NEVER`, honored by + both the streaming tiny-inliner and the whole-program inliner. Caught by a + strong-over-weak override case the prior (tiny-inliner) behavior miscompiled. 
+- [x] **§3 `symresolve` extraction** — `src/obj/symresolve.{h,c}`; + `link_resolve_symbols` refactored onto `symresolve_merge`; `link_bind_strength` / + `link_sym_is_def` / `link_sym_is_spurious_undef` are now wrappers. Behavior-preserving + (test-link 122/0, test-macho 80/0, ODR/weak/common/COMDAT all covered). +- [x] **§3 `ObjBuilder` name→id index** — `SymNameIndex` in `src/obj/obj.c`; + `obj_symbol_find` is an authoritative O(1) hash lookup with no linear scan, kept + exact through `obj_symbol_ex` and `obj_symbol_rename`. +- [x] **Tests** — `test/opt/whole_program_inline.sh` (wired `test-opt-whole-program-inline`): + static callee fuses on aa64/x64/rv64, weak callee kept out-of-line (interposition + guard), `opt.inline.inlined` fires at `-O1`, and the kit-native build verbs + (`build-obj`/`build-exe`) fuse too. +- [x] **Build verbs participate.** `build-exe`/`build-lib`/`build-obj` (which replaced + `compile` on `main`) compile each source to an in-memory builder under one + `KitCompiler` via `build_compile_all` (`driver/cmd/build.c`) and route through the + shared `kit_cg` path, so per-TU whole-program optimization applies at `-O1` with no + verb-specific wiring. `build_compile_all` is also the single seam the Phase 1 + cross-TU staging loop will hook (all three verbs at once); `cc` keeps its own + `cc_run_link_exe` → `link_engine` path. + +### Phase 1 source-staging checklist + +- [x] **Architecture lock-in.** Phase 1 is implemented as a frontend staging + and CG/ObjBuilder lifecycle refactor, not a C-driver shortcut. All + source-building verbs (`cc`, `build-exe`, `build-lib`, `build-obj`) route + through the same staging engine. Frontends explicitly declare how they + participate: semantic `kit_cg` staging for frontends that lower through CG, or + opaque-object participation for inputs that cannot expose semantic IR + (notably asm). The change is not complete until every in-tree frontend is + opted into one of those modes. 
+- [x] **§2 Skip-intern locals.** In `kit_cg_decl` (`src/cg/session.c:198`), for + `SB_LOCAL` bindings skip `obj_symbol_find` and always mint a fresh id. Confirm the + per-`Decl` id cache keeps intra-TU static reuse pointing at the cached id, and that + single-TU behavior is unchanged (locals are already unique per name within a TU). +- [x] **§4 Recording-arena lifetime — settle first.** Choose dedicated LTO arena vs + `c->global` for the recorder/`CgIrModule` so accumulated IR outlives each per-TU + frontend run. This is the one structural hazard (§9). +- [x] **§4 Source staging under the current CG API.** Add a deferred-finalize + mode to `kit_cg`: record N TUs into one shared session / `ObjBuilder` / + `CgIrModule` without per-TU finalization, then finish CG and finalize the + object once. Keep per-TU frontend state (Pool/DeclTable/type interning) + independent. +- [x] **§4 CG/ObjBuilder borrowed lifecycle.** Replace the former + object-shaped CG bracket with the lifecycle in + [CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md): caller-owned `ObjBuilder`, + borrowed `KitCg`, explicit unit boundaries, `kit_cg_finish` for semantic + codegen policy, and caller-owned object finalization. One-TU and multi-TU + builds now use the same + ownership model. +- [x] **§3/§4 Recording-time merge.** At the per-TU staging boundary, when a TU + contributes a body for a symbol already defined, call `symresolve_merge` to pick the + winner; drop the loser's `CgIrFunc`/data and keep its decl as a reference; report ODR + at the second definition's `SrcLoc`. +- [x] **§4 Driver loop + `-flto` flag.** Parse `-flto` in `cc` and the build verbs, + thread an LTO flag through `KitCodeOptions`/the driver, and add the staging path: + one shared session, frontend per source, one CG finish/object finalize, single + builder to the link session. Hook it at `build_compile_all` + (`driver/cmd/build.c`) so build-exe/lib/obj get it together, plus + `cc_run_link_exe`. 
(The build verbs already share one `KitCompiler`, so the + seam is in place.) +- [x] **§5 Preserved/export set.** Compute from the assembled link (entry symbol, + dynamic exports, undefs referenced by opaque inputs, `used`/init-fini/asm-named/IFUNC/ + address-significant) and hand it to `kit_cg_finish`. Current Phase 1 behavior + is conservative for relocatable/archive outputs, while executable outputs + internalize non-preserved LTO definitions. Shared-library LTO remains disabled + until shared output is exercised. +- [x] **§6.2 Internalize** non-preserved globals using the preserved set (unlocks + cross-TU DCE and unconstrained inlining), then re-run GC. +- [x] **Tests.** A two-TU `test/smoke` (or `test/link`) case where a cross-TU callee + inlines under `-flto`; a guard that a weak/exported cross-TU symbol is *not* + inlined/internalized; cross-TU ODR reported at the right `SrcLoc`. + ## Baseline (what already exists) A handful of facts about the current code path frame everything below. @@ -202,28 +324,57 @@ extracting it rather than duplicating it. ## 4. The staging lifecycle -`kit_cg_end_obj` finalizes (lowers + emits everything), nulls `g->obj`/ -`g->target`, and resets per-object state including `rodata_counter` -(`src/cg/session.c`). That bracket assumes exactly one TU. LTO needs a staging -mode: - -- **Record each TU into the live session without finalizing**; run a single - `finalize` after the last TU. A new session entry point (a "stage" variant of - `begin_obj`/`end_obj`, or a recorder flag that defers `finalize_recorded`) lets - the recorder accumulate all TUs into the one `CgIrModule`. +The lifecycle target for Phase 1 is documented in +[CG_OBJ_LIFECYCLE.md](CG_OBJ_LIFECYCLE.md). The short version: `ObjBuilder` +owns object lifetime, while `KitCg` borrows an object, records one or more +semantic units, and finishes codegen into that object. `kit_cg_finish` is a CG +flush/lowering/debug operation; it is not object finalization. 
+ +The old object-shaped bracket used to finalize (lowers + emits everything), +null `g->obj`/`g->target`, and reset per-object state including +`rodata_counter` (`src/cg/session.c`). The structural state is now a borrowed +lifecycle: + +- **Record each TU as a unit in one live CG session without object + finalization.** Run a single `kit_cg_finish` after the last semantic source, + then let the caller finalize the `ObjBuilder`. The shared path records N + semantic frontends into one shared `KitCg` / `ObjBuilder` and finalizes once + through the explicit lifecycle: `kit_cg_begin`, `kit_cg_begin_unit`, + `kit_cg_end_unit`, `kit_cg_finish`, and `kit_cg_detach`/`kit_cg_abort`. + Drivers collect sources and opaque inputs; they do not implement definition + selection, IR lifetime, semantic finalization, or object finalization policy. +- **Frontend participation is explicit.** `KitFrontendVTable` has a split + contract: semantic frontends implement `compile_cg`, while opaque frontends + implement `compile_obj`. C, Toy, and Wasm participate by emitting into a + caller-owned open `KitCg` session; one-TU object builds are wrapped at the + compile-session layer by creating an `ObjBuilder`, attaching `KitCg` for one + unit, finishing CG, and then finalizing the object. + Asm has no semantic CG representation, so its LTO participation mode is opaque: + it compiles to an ordinary object and contributes references/definitions to the + link picture but not to the merged optimization module. This keeps all verbs + and all frontends on one declared path while allowing semantic frontend opt-in + one at a time. - **The recording arena must outlive any single TU.** The recorder and module are arena-allocated from `c->tu` today (`opt_cgtarget_new`, `cg_ir_recorder_new`). - For LTO they must come from a cross-TU arena (a dedicated LTO arena, or - `c->global`) so the accumulated IR survives across the per-TU frontend runs. 
- This is the one sharp edge of the lifecycle change and should be settled first. + In the current implementation `c->tu` is already compiler-session lifetime + (not reset between source inputs), so Phase 1 uses it as the cross-TU recorder + arena and documents that lifetime. If `c->tu` later becomes per-source again, + the shared CG path must switch to an explicit cross-source arena; the frontend + staging API must not depend on that allocator choice. - **Each TU keeps its own frontend state.** The per-TU `Pool`, `DeclTable`, and type interning stay independent; only the CG session and `ObjBuilder` are shared. The shared `KitCompiler` already spans sources today, so `c->global` name interning is already consistent across TUs. -The driver/`compile_engine` change is a loop: open one staging session, run the C -frontend once per source against it, finalize once, hand the single resulting -builder to the link session in place of the per-source builders. +The driver change is a shared staging engine: group every LTO-capable source +input in command-line order, stage semantic frontends into the borrowed CG +session and shared object, compile opaque frontends/objects as ordinary inputs, +then finish CG once and substitute the resulting builder at the right place in +the link order. The hook is `build_compile_all` in `driver/cmd/build.c` (shared +by build-exe/build-lib/build-obj) and `cc_run_link_exe` — both already compile +every source under one `KitCompiler`, which is the seam this loop replaces. +(`compile`/`compile_engine` from the original plan were retired in favor of the +build verbs on `main`.) ## 5. The export / preserved set @@ -248,6 +399,13 @@ pull (`scan_presence_before` / `member_satisfies`, `src/link/link_resolve.c:859` opaque inputs and the output-kind/visibility policy. Conservative default: internalize only for executable outputs or provably non-exported symbols. 
+Phase 1 implements this for all-sources-up-front executable LTO: the driver +stages semantic sources, assembles the ordered link session, asks the linker for +preserved LTO symbols, then passes those IDs to `kit_cg_finish` before object +finalization. Relocatable and archive-member outputs remain conservative because +later links may still reference globals by name. Shared-library LTO continues +to reject until shared output policy is exercised. + ## 6. The whole-program optimization core With a merged module and a preserved set, the core is `opt_emit_reachable_aarch64` @@ -282,12 +440,13 @@ exercised on real code. Lowest risk — purely inside the optimizer — and it validates the deferred-emit path that Phase 1's staging lifecycle also relies on. **Phase 1 — Shared-context, all-sources-up-front LTO.** The target case, -`kit cc *.c -O2 -flto -o prog` and `kit compile`. Build on Phase 0 by adding: +`kit cc *.c -flto -o prog` and `kit build-exe -flto` (and `build-lib`/`build-obj`; +`build-obj` replaced the retired `compile`). Build on Phase 0 by adding: (a) the `symresolve` extraction (§3), (b) the `ObjBuilder` name index (§3), -(c) skip-intern for locals (§2), (d) the `kit_cg` staging lifecycle and the -driver loop that records N frontends into one session and finalizes once (§4), -(e) the preserved set fed from the assembled link (§5). No cloner, no -serialization, no archive support yet. +(c) skip-intern for locals (§2), (d) the `KitCg`/`ObjBuilder` borrowed staging +lifecycle and the driver loop that records N frontends into one session and +finishes CG once (§4), (e) the preserved set fed from the assembled link into +`kit_cg_finish` (§5). No cloner, no serialization, no archive support yet. **Phase 2 — Serialized IR objects (`.kit.ir`).** Optional follow-on for separate compilation, archives, and build caches. `kit cc -c -flto a.c` emits a normal @@ -360,10 +519,16 @@ two-TU `test/smoke` case where a cross-TU callee inlines. 
 - **Define-timing for resolution** (§3): confirm the staging-boundary merge is
   the right hook versus an `obj_symbol_define`-time check, given symbols are
   only obj-defined at emit.
-- **Recording arena** (§4): dedicated LTO arena vs `c->global`, and how the
-  per-TU frontend arenas interact with a cross-TU module.
-- **`-flto` flag surface**: accept the GCC/Clang spelling for `cc`; decide the
-  kit-native spelling for `compile` and whether `-fwhole-program` is a distinct,
-  more aggressive internalization mode.
-- **CG API exposure**: whether staging is internal to the driver/`compile_engine`
-  or a public `kit_cg`/`kit_compile` surface for embedders driving multi-TU LTO.
+- **Recording arena follow-through** (§4): Phase 1 relies on `c->tu` having
+  compiler-session lifetime for the cross-TU recorder/module. If frontend reset
+  semantics later make `c->tu` per-source again, move the recorder/module to an
+  explicit cross-source arena without changing the frontend staging API.
+- **`-flto` flag surface** (largely resolved — see Status): `-flto` opt-in on `cc`
+  and the build verbs, decided per the Status section. Still open: whether
+  `-fwhole-program` is a distinct, more aggressive internalization mode, and whether
+  to make cross-TU LTO the `-O1` default later.
+- **CG API exposure**: how much of the borrowed lifecycle
+  (`kit_cg_begin`/`kit_cg_begin_unit`/`kit_cg_finish`/`kit_cg_detach`) remains
+  internal to the driver (`build.c`'s `build_compile_all`, `cc_run_link_exe`)
+  versus becoming a public `kit_cg`/`kit_compile` surface for embedders driving
+  multi-TU LTO.
diff --git a/doc/plan/RELEASE.md b/doc/plan/RELEASE.md
@@ -145,15 +145,15 @@ Verified native/VM run signoff:
   at `-O1`.
 - [ ] Decide the release spelling (`-flto`, plus any rejected aliases) and make
   diagnostics precise.
-- [ ] Finish link-picture preserved/export set computation for LTO:
+- [x] Finish link-picture preserved/export set computation for LTO:
   entry symbol, dynamic imports used by executable links, opaque object/asm
   references, `used`, init/fini, IFUNC, address-significant symbols, and
   visibility.
-- [ ] Internalize non-preserved globals and re-run whole-module reachability.
+- [x] Internalize non-preserved globals and re-run whole-module reachability.
 - [ ] Keep shared-library creation out of scope; make `-shared -flto` reject
   cleanly as part of the general dynamic-library creation policy.
-- [ ] Validate cross-TU inlining and interposition safety on arm64/x64/rv64.
-- [ ] Add LTO tests for `cc`, `build-exe`, `build-lib`, and `build-obj`.
+- [x] Validate cross-TU inlining and interposition safety on arm64/x64/rv64.
+- [x] Add LTO tests for `cc`, `build-exe`, `build-lib`, and `build-obj`.
 - [ ] Refresh O0/O1 benchmark baselines and record LTO impact separately.
 
 ## Build coordinator
diff --git a/driver/cmd/build.c b/driver/cmd/build.c
@@ -37,7 +37,8 @@
  * Per-language frontend flags route through `-X<lang> FLAG` (e.g.
  * `-Xwasm -mfeature=simd128`).
  */
-/* Stand-in for "no -x; resolve language from the path suffix at compile time." */
+/* Stand-in for "no -x; resolve language from the path suffix at compile time."
+ */
 #define BUILD_LANG_AUTO ((KitLanguage)KIT_LANG_COUNT)
 
 typedef enum BuildOutputKind {
@@ -123,7 +124,8 @@ typedef struct BuildGroup {
   uint32_t m_nsys;
   KitDefine* m_def;
   uint32_t m_ndef;
-  uint32_t m_def_cap; /* allocation size; m_ndef <= cap once globals are shadowed */
+  uint32_t
+      m_def_cap; /* allocation size; m_ndef <= cap once globals are shadowed */
   KitSlice* m_und;
   uint32_t m_nund;
 } BuildGroup;
@@ -136,17 +138,18 @@ typedef struct BuildOptions {
   size_t argv_bound;
 
   /* Output / per-output state. */
-  int emit; /* BuildEmit (build-obj) */
+  int emit; /* BuildEmit (build-obj) */
   int syntax_only;
   int opt_level;
   int debug_info;
-  int dynamic; /* -dynamic / -shared */
-  int shared_requested; /* -shared spelling, for build-exe diagnostics */
-  int shared; /* computed: kind==lib && dynamic */
-  int static_link; /* -static */
-  int pie; /* -pie */
-  int function_sections; /* -ffunction-sections */
-  int data_sections; /* -fdata-sections */
+  int dynamic; /* -dynamic / -shared */
+  int shared_requested; /* -shared spelling, for build-exe diagnostics */
+  int shared; /* computed: kind==lib && dynamic */
+  int static_link; /* -static */
+  int pie; /* -pie */
+  int function_sections; /* -ffunction-sections */
+  int data_sections; /* -fdata-sections */
+  int lto; /* -flto/-fno-lto */
   uint8_t default_visibility; /* KitSymVis */
   int warnings_are_errors;
   uint32_t max_errors;
@@ -304,7 +307,8 @@ static void build_release(BuildOptions* o) {
   size_t bound = o->argv_bound;
   for (i = 0; i < o->narchives; ++i)
     if (o->archives[i].owned)
-      driver_free(o->env, (void*)o->archives[i].path, o->archives[i].owned_size);
+      driver_free(o->env, (void*)o->archives[i].path,
+                  o->archives[i].owned_size);
   for (i = 0; i < o->ndsos; ++i)
     if (o->dsos[i].owned)
       driver_free(o->env, (void*)o->dsos[i].path, o->dsos[i].owned_size);
@@ -314,11 +318,13 @@ static void build_release(BuildOptions* o) {
     if (g->fe) driver_free(o->env, g->fe, bound * sizeof(*g->fe));
     if (g->m_inc) driver_free(o->env, g->m_inc, g->m_ninc * sizeof(*g->m_inc));
     if (g->m_sys) driver_free(o->env, g->m_sys, g->m_nsys * sizeof(*g->m_sys));
-    if (g->m_def) driver_free(o->env, g->m_def, g->m_def_cap * sizeof(*g->m_def));
+    if (g->m_def)
+      driver_free(o->env, g->m_def, g->m_def_cap * sizeof(*g->m_def));
     if (g->m_und) driver_free(o->env, g->m_und, g->m_nund * sizeof(*g->m_und));
   }
   if (o->owned_sysroot_lib_dir)
-    driver_free(o->env, o->owned_sysroot_lib_dir, o->owned_sysroot_lib_dir_size);
+    driver_free(o->env, o->owned_sysroot_lib_dir,
+                o->owned_sysroot_lib_dir_size);
   driver_hosted_plan_fini(o->env, &o->hosted);
   driver_link_flags_fini(&o->link);
   driver_target_features_fini(&o->target_features, o->env);
@@ -326,7 +332,8 @@ static void build_release(BuildOptions* o) {
   if (o->groups) driver_free(o->env, o->groups, bound * sizeof(*o->groups));
   if (o->object_files)
     driver_free(o->env, o->object_files, bound * sizeof(*o->object_files));
-  if (o->archives) driver_free(o->env, o->archives, bound * sizeof(*o->archives));
+  if (o->archives)
+    driver_free(o->env, o->archives, bound * sizeof(*o->archives));
   if (o->dsos) driver_free(o->env, o->dsos, bound * sizeof(*o->dsos));
   if (o->lib_search_paths)
     driver_free(o->env, o->lib_search_paths,
@@ -341,7 +348,8 @@ static void build_release(BuildOptions* o) {
 /* link-item bookkeeping (build-exe) */
 /* ===================================================================== */
 
-static void build_push_link_item(BuildOptions* o, uint8_t kind, uint32_t index) {
+static void build_push_link_item(BuildOptions* o, uint8_t kind,
+                                 uint32_t index) {
   BuildLinkItem* it = &o->link_items[o->nlink_items++];
   it->kind = kind;
   it->index = index;
@@ -351,7 +359,8 @@ static void build_insert_link_item(BuildOptions* o, uint32_t pos, uint8_t kind,
                                    uint32_t index) {
   uint32_t i;
   if (pos > o->nlink_items) pos = o->nlink_items;
-  for (i = o->nlink_items; i > pos; --i) o->link_items[i] = o->link_items[i - 1u];
+  for (i = o->nlink_items; i > pos; --i)
+    o->link_items[i] = o->link_items[i - 1u];
   o->link_items[pos].kind = kind;
   o->link_items[pos].index = index;
   o->nlink_items++;
@@ -542,9 +551,11 @@ static int build_is_global_flag(const char* a) {
          driver_streq(a, "-S") || driver_strneq(a, "--emit=", 7) ||
          driver_streq(a, "-fsyntax-only") || driver_strneq(a, "-fPIC", 5) ||
          driver_strneq(a, "-fpic", 5) || driver_strneq(a, "-fPIE", 5) ||
-         driver_strneq(a, "-fpie", 5) || driver_strneq(a, "-fvisibility=", 13) ||
+         driver_strneq(a, "-fpie", 5) ||
+         driver_strneq(a, "-fvisibility=", 13) ||
          driver_streq(a, "-ffunction-sections") ||
-         driver_streq(a, "-fdata-sections") || driver_streq(a, "-static") ||
+         driver_streq(a, "-fdata-sections") || driver_streq(a, "-flto") ||
+         driver_streq(a, "-fno-lto") || driver_streq(a, "-static") ||
          driver_streq(a, "-dynamic") || driver_streq(a, "-shared") ||
          driver_streq(a, "-pie") || driver_streq(a, "-no-pie") ||
          driver_streq(a, "-target") || driver_strneq(a, "--target", 8) ||
@@ -587,7 +598,7 @@ static int build_parse_group(BuildOptions* o, int argc, char** argv, int* i) {
     driver_errf(o->tool, "--group requires `--` before its sources");
     return 1;
   }
-  ++(*i); /* past `--` */
+  ++(*i); /* past `--` */
   o->cur_group = gid; /* subsequent sources belong to this group */
   return 0;
 }
@@ -736,6 +747,14 @@ static int build_parse(int argc, char** argv, BuildOptions* o) {
       o->data_sections = 0;
       continue;
     }
+    if (driver_streq(a, "-flto")) {
+      o->lto = 1;
+      continue;
+    }
+    if (driver_streq(a, "-fno-lto")) {
+      o->lto = 0;
+      continue;
+    }
     if (driver_streq(a, "-ffreestanding")) {
       o->freestanding = 1;
       continue;
@@ -1023,7 +1042,8 @@ static int build_apply_hosted_profile(BuildOptions* o) {
   req.link_inputs = 1;
   if (driver_hosted_resolve(&req, &o->hosted) != 0) return 1;
   for (i = 0; i < o->hosted.nsystem_includes; ++i)
-    o->groups[0].cf.system_include_dirs[o->groups[0].cf.nsystem_include_dirs++] =
+    o->groups[0]
+        .cf.system_include_dirs[o->groups[0].cf.nsystem_include_dirs++] =
         o->hosted.system_includes[i];
   for (i = 0; i < o->hosted.ndefines; ++i)
     o->groups[0].cf.defines[o->groups[0].cf.ndefines++] = o->hosted.defines[i];
@@ -1044,7 +1064,8 @@ static int build_apply_hosted_profile(BuildOptions* o) {
   return 0;
 }
 
-/* Append `<sysroot>/lib` to the search path for Windows targets (mirrors cc). */
+/* Append `<sysroot>/lib` to the search path for Windows targets (mirrors cc).
+ */
 static int build_append_windows_lib_dirs(BuildOptions* o) {
   const char* sysroot = o->sysroot;
   char* joined;
@@ -1107,8 +1128,10 @@ static int build_group_build_pp(BuildOptions* o, uint32_t gi) {
   }
   {
     uint32_t p = 0, j;
-    for (k = 0; k < g->cf.ninclude_dirs; ++k) g->m_inc[p++] = g->cf.include_dirs[k];
-    for (k = 0; k < gl->cf.ninclude_dirs; ++k) g->m_inc[p++] = gl->cf.include_dirs[k];
+    for (k = 0; k < g->cf.ninclude_dirs; ++k)
+      g->m_inc[p++] = g->cf.include_dirs[k];
+    for (k = 0; k < gl->cf.ninclude_dirs; ++k)
+      g->m_inc[p++] = gl->cf.include_dirs[k];
     p = 0;
     for (k = 0; k < g->cf.nsystem_include_dirs; ++k)
       g->m_sys[p++] = g->cf.system_include_dirs[k];
@@ -1228,13 +1251,13 @@ static int build_compile_source(BuildOptions* o, KitCompiler* compiler,
   if (fe_n) {
     if (kit_frontend_parse_options(compiler, lang, (int)fe_n, fe_argv,
                                    &lang_extra) != KIT_OK) {
-      driver_errf(o->tool, "unsupported -X%.*s frontend flag: %.*s",
-                  KIT_SLICE_ARG(kit_slice_cstr(
-                      lang == KIT_LANG_C ? "c"
-                      : lang == KIT_LANG_ASM ? "asm"
-                      : lang == KIT_LANG_TOY ? "toy"
-                      : "wasm")),
-                  KIT_SLICE_ARG(kit_slice_cstr(fe_argv[0])));
+      driver_errf(
+          o->tool, "unsupported -X%.*s frontend flag: %.*s",
+          KIT_SLICE_ARG(kit_slice_cstr(lang == KIT_LANG_C ? "c"
+                                       : lang == KIT_LANG_ASM ? "asm"
+                                       : lang == KIT_LANG_TOY ? "toy"
+                                       : "wasm")),
+          KIT_SLICE_ARG(kit_slice_cstr(fe_argv[0])));
       goto out;
     }
   }
@@ -1260,6 +1283,7 @@ static void build_fill_code(const BuildOptions* o, KitCodeOptions* code) {
   code->default_visibility = o->default_visibility;
   code->function_sections = o->function_sections ? true : false;
   code->data_sections = o->data_sections ? true : false;
+  code->lto = o->lto ? true : false;
   code->epoch = o->epoch;
 }
 
@@ -1341,20 +1365,103 @@ static int build_open_output(const KitContext* ctx, DriverEnv* env,
 static int build_compile_all(BuildOptions* o, KitCompiler* compiler,
                              const KitContext* ctx, const KitCodeOptions* code,
                              const KitDiagnosticOptions* diag,
-                             KitObjBuilder** objs) {
+                             KitObjBuilder** objs, uint32_t* source_obj_index,
+                             uint8_t* source_order_keep,
+                             const DriverCompileBatchOptions* batch,
+                             DriverCompilePendingLto* pending_lto,
+                             uint32_t* nobjs_out) {
+  DriverLoad* loads = NULL;
+  DriverCompileSource* sources = NULL;
+  void** lang_extras = NULL;
+  DriverCompileObjects out;
   uint32_t i;
+  KitStatus st;
+  int rc = 1;
+
+  if (nobjs_out) *nobjs_out = 0;
+  if (o->nsources == 0) return 0;
+
+  loads = driver_alloc_zeroed(o->env, o->nsources * sizeof(*loads));
+  sources = driver_alloc_zeroed(o->env, o->nsources * sizeof(*sources));
+  lang_extras = driver_alloc_zeroed(o->env, o->nsources * sizeof(*lang_extras));
+  if (!loads || !sources || !lang_extras) {
+    driver_errf(o->tool, "out of memory");
+    goto out;
+  }
+
   for (i = 0; i < o->nsources; ++i) {
-    if (build_compile_source(o, compiler, ctx, i, code, diag, NULL,
-                             &objs[i]) != 0)
-      return 1;
+    const char* path = o->sources[i].path;
+    uint32_t gi = o->sources[i].group;
+    KitLanguage lang;
+    char** fe_argv = NULL;
+    uint32_t fe_n = 0;
+
+    if (driver_load_bytes(ctx->file_io, o->tool, path, &loads[i],
+                          &sources[i].bytes) != 0)
+      goto out;
+    lang = build_resolve_lang(o, compiler, i);
+    if (lang == KIT_LANG_UNKNOWN) {
+      driver_errf(o->tool, "cannot determine language for %.*s (use -x LANG)",
+                  KIT_SLICE_ARG(kit_slice_cstr(path)));
+      goto out;
+    }
+    if (build_collect_fe_argv(o, i, lang, &fe_argv, &fe_n) != 0) {
+      if (fe_argv) driver_free(o->env, fe_argv, fe_n * sizeof(*fe_argv));
+      goto out;
+    }
+    if (fe_n) {
+      if (kit_frontend_parse_options(compiler, lang, (int)fe_n, fe_argv,
+                                     &lang_extras[i]) != KIT_OK) {
+        driver_errf(
+            o->tool, "unsupported -X%.*s frontend flag: %.*s",
+            KIT_SLICE_ARG(kit_slice_cstr(lang == KIT_LANG_C ? "c"
+                                         : lang == KIT_LANG_ASM ? "asm"
+                                         : lang == KIT_LANG_TOY ? "toy"
+                                         : "wasm")),
+            KIT_SLICE_ARG(kit_slice_cstr(fe_argv[0])));
+        if (fe_argv) driver_free(o->env, fe_argv, fe_n * sizeof(*fe_argv));
+        goto out;
+      }
+    }
+    if (fe_argv) driver_free(o->env, fe_argv, fe_n * sizeof(*fe_argv));
+    sources[i].lang = lang;
+    sources[i].name = kit_slice_cstr(path);
+    sources[i].pp = &o->groups[gi].pp;
+    sources[i].lang_extra = lang_extras[i];
   }
-  return 0;
+
+  memset(&out, 0, sizeof out);
+  out.objs = objs;
+  out.source_obj_index = source_obj_index;
+  out.source_order_keep = source_order_keep;
+  out.pending_lto = pending_lto;
+  st = driver_compile_sources_run(compiler, code, diag, sources, o->nsources,
+                                  batch, &out);
+  if (nobjs_out) *nobjs_out = out.nobjs;
+  if (st != KIT_OK) goto out;
+  rc = 0;
+
+out:
+  if (lang_extras && sources) {
+    for (i = 0; i < o->nsources; ++i)
+      if (lang_extras[i])
+        kit_frontend_free_options(compiler, sources[i].lang, lang_extras[i]);
+  }
+  if (loads)
+    for (i = 0; i < o->nsources; ++i)
+      driver_release_bytes(ctx->file_io, &loads[i]);
+  if (lang_extras)
+    driver_free(o->env, lang_extras, o->nsources * sizeof(*lang_extras));
+  if (sources) driver_free(o->env, sources, o->nsources * sizeof(*sources));
+  if (loads) driver_free(o->env, loads, o->nsources * sizeof(*loads));
+  return rc;
 }
 
 /* build-exe / shared build-lib: compile sources, load link inputs, link. */
 static int build_run_link(BuildOptions* o, KitCompiler* compiler,
                           const KitContext* ctx, const KitCodeOptions* code,
-                          const KitDiagnosticOptions* diag, uint8_t output_kind) {
+                          const KitDiagnosticOptions* diag,
+                          uint8_t output_kind) {
   DriverEnv* env = o->env;
   const KitFileIO* io = ctx->file_io;
   KitWriter* out_w = NULL;
@@ -1369,14 +1476,24 @@ static int build_run_link(BuildOptions* o, KitCompiler* compiler,
   KitSlice* dso_names = NULL;
   KitLinkInputOrder* order = NULL;
   KitObjBuilder** objs = NULL;
+  DriverCompilePendingLto pending_lto = {0};
+  uint32_t* source_obj_index = NULL;
+  uint8_t* source_order_keep = NULL;
   KitLinkScript* script = NULL;
   KitSlice* rpath_slices = NULL;
+  DriverCompileBatchOptions lto_batch;
+  uint32_t nobjs = 0;
   uint32_t i;
+  uint32_t norder = 0;
   int rc = 1;
 
   if (o->nsources) {
     objs = driver_alloc_zeroed(env, o->nsources * sizeof(*objs));
-    if (!objs) goto oom;
+    source_obj_index =
+        driver_alloc_zeroed(env, o->nsources * sizeof(*source_obj_index));
+    source_order_keep =
+        driver_alloc_zeroed(env, o->nsources * sizeof(*source_order_keep));
+    if (!objs || !source_obj_index || !source_order_keep) goto oom;
   }
   if (o->nobject_files) {
     obj_lf = driver_alloc_zeroed(env, o->nobject_files * sizeof(*obj_lf));
@@ -1434,7 +1551,21 @@ static int build_run_link(BuildOptions* o, KitCompiler* compiler,
     if (kit_link_script_parse(ctx, text, &script) != KIT_OK) goto out;
   }
 
-  if (build_compile_all(o, compiler, ctx, code, diag, objs) != 0) goto out;
+  {
+    memset(&lto_batch, 0, sizeof lto_batch);
+    lto_batch.output_kind = output_kind == KIT_LINK_OUTPUT_SHARED
+                                ? KIT_CG_OUTPUT_SHARED
+                                : KIT_CG_OUTPUT_EXECUTABLE;
+    lto_batch.interposition_policy =
+        output_kind == KIT_LINK_OUTPUT_SHARED
+            ? KIT_CG_INTERPOSITION_DEFAULT_VISIBILITY
+            : KIT_CG_INTERPOSITION_DEFAULT;
+    lto_batch.defer_lto_finish = 1;
+    if (build_compile_all(o, compiler, ctx, code, diag, objs, source_obj_index,
+                          source_order_keep, &lto_batch, &pending_lto,
+                          &nobjs) != 0)
+      goto out;
+  }
 
   if (build_open_output(ctx, env, o->tool, o->output_path, &out_w) != 0)
     goto out;
@@ -1442,26 +1573,32 @@ static int build_run_link(BuildOptions* o, KitCompiler* compiler,
   /* Translate the recorded link order into KitLinkInputOrder. */
   for (i = 0; i < o->nlink_items; ++i) {
     const BuildLinkItem* item = &o->link_items[i];
-    KitLinkInputOrder* ord = &order[i];
+    KitLinkInputOrder* ord;
     switch ((BuildLinkKind)item->kind) {
       case BUILD_LINK_SOURCE:
+        if (!source_order_keep[item->index]) continue;
+        ord = &order[norder++];
         ord->kind = KIT_LINK_INPUT_OBJ;
-        ord->index = item->index;
+        ord->index = source_obj_index[item->index];
         break;
       case BUILD_LINK_OBJECT:
+        ord = &order[norder++];
         ord->kind = KIT_LINK_INPUT_OBJ_BYTES;
         ord->index = item->index;
         break;
       case BUILD_LINK_ARCHIVE:
+        ord = &order[norder++];
         ord->kind = KIT_LINK_INPUT_ARCHIVE;
         ord->index = item->index;
         break;
       case BUILD_LINK_DSO:
+        ord = &order[norder++];
         ord->kind = KIT_LINK_INPUT_DSO;
         ord->index = item->index;
         break;
       case BUILD_LINK_LIB: {
         const BuildPendingLib* pl = &o->pending_libs[item->index];
+        ord = &order[norder++];
         if (pl->resolved_kind == BUILD_LINK_DSO) {
           ord->kind = KIT_LINK_INPUT_DSO;
           ord->index = pl->resolved_index;
@@ -1485,7 +1622,7 @@ static int build_run_link(BuildOptions* o, KitCompiler* compiler,
       goto out;
     memset(&li, 0, sizeof(li));
     li.objs = objs;
-    li.nobjs = o->nsources;
+    li.nobjs = nobjs;
     li.obj_names = obj_names;
     li.obj_bytes = obj_in;
     li.nobj_bytes = o->nobject_files;
@@ -1495,8 +1632,9 @@ static int build_run_link(BuildOptions* o, KitCompiler* compiler,
     li.dso_bytes = dso_in;
     li.ndsos = o->ndsos;
     li.order = order;
-    li.norder = o->nlink_items;
-    st = driver_link_engine_emit(compiler, &lopts, &li, out_w);
+    li.norder = norder;
+    st = driver_link_engine_emit_with_lto(compiler, &lopts, &li, &pending_lto,
+                                          &lto_batch, out_w);
     rc = (st == KIT_OK) ? 0 : 1;
   }
 
@@ -1511,6 +1649,7 @@ out:
     }
   }
   if (script) kit_link_script_free(ctx, script);
+  driver_compile_pending_lto_abort(&pending_lto);
   driver_link_flags_free_rpath_slices(&o->link, rpath_slices);
   driver_release_bytes(io, &script_lf);
   if (arch_lf)
@@ -1532,9 +1671,14 @@ out:
   /* The link session borrows the per-source builders (it frees only its own
    * pointer array), so the caller still owns and must release them. */
   if (objs) {
-    for (i = 0; i < o->nsources; ++i) kit_obj_builder_free(objs[i]);
+    for (i = 0; i < nobjs; ++i) kit_obj_builder_free(objs[i]);
     driver_free(env, objs, o->nsources * sizeof(*objs));
   }
+  if (source_order_keep)
+    driver_free(env, source_order_keep,
+                o->nsources * sizeof(*source_order_keep));
+  if (source_obj_index)
+    driver_free(env, source_obj_index, o->nsources * sizeof(*source_obj_index));
   return rc;
 
 oom:
@@ -1549,21 +1693,42 @@ static int build_run_relocatable(BuildOptions* o, KitCompiler* compiler,
                                  const KitDiagnosticOptions* diag) {
   DriverEnv* env = o->env;
   KitObjBuilder** objs = NULL;
+  uint32_t* source_obj_index = NULL;
+  uint8_t* source_order_keep = NULL;
   KitLinkInputOrder* order = NULL;
+  DriverCompilePendingLto pending_lto = {0};
   KitWriter* out_w = NULL;
+  DriverCompileBatchOptions lto_batch;
+  uint32_t nobjs = 0;
+  uint32_t norder = 0;
   uint32_t i;
   int rc = 1;
 
   objs = driver_alloc_zeroed(env, o->nsources * sizeof(*objs));
+  source_obj_index =
+      driver_alloc_zeroed(env, o->nsources * sizeof(*source_obj_index));
+  source_order_keep =
+      driver_alloc_zeroed(env, o->nsources * sizeof(*source_order_keep));
   order = driver_alloc_zeroed(env, o->nsources * sizeof(*order));
-  if (!objs || !order) {
+  if (!objs || !source_obj_index || !source_order_keep || !order) {
     driver_errf(o->tool, "out of memory");
     goto out;
   }
-  if (build_compile_all(o, compiler, ctx, code, diag, objs) != 0) goto out;
+  {
+    memset(&lto_batch, 0, sizeof lto_batch);
+    lto_batch.output_kind = KIT_CG_OUTPUT_RELOCATABLE;
+    lto_batch.interposition_policy = KIT_CG_INTERPOSITION_DEFAULT;
+    lto_batch.defer_lto_finish = 1;
+    if (build_compile_all(o, compiler, ctx, code, diag, objs, source_obj_index,
+                          source_order_keep, &lto_batch, &pending_lto,
+                          &nobjs) != 0)
+      goto out;
+  }
   for (i = 0; i < o->nsources; ++i) {
-    order[i].kind = KIT_LINK_INPUT_OBJ;
-    order[i].index = i;
+    if (!source_order_keep[i]) continue;
+    order[norder].kind = KIT_LINK_INPUT_OBJ;
+    order[norder].index = source_obj_index[i];
+    ++norder;
   }
   if (build_open_output(ctx, env, o->tool, o->output_path, &out_w) != 0)
     goto out;
@@ -1577,22 +1742,29 @@ static int build_run_relocatable(BuildOptions* o, KitCompiler* compiler,
     lopts.strip_debug = o->link.strip_debug ? true : false;
     memset(&li, 0, sizeof(li));
     li.objs = objs;
-    li.nobjs = o->nsources;
+    li.nobjs = nobjs;
     li.order = order;
-    li.norder = o->nsources;
-    st = driver_link_engine_emit(compiler, &lopts, &li, out_w);
+    li.norder = norder;
+    st = driver_link_engine_emit_with_lto(compiler, &lopts, &li, &pending_lto,
+                                          &lto_batch, out_w);
     rc = (st == KIT_OK) ? 0 : 1;
   }
 
 out:
   if (out_w) kit_writer_close(out_w);
+  driver_compile_pending_lto_abort(&pending_lto);
   if (order) driver_free(env, order, o->nsources * sizeof(*order));
   /* The link session borrows the builders; release them here (see
    * build_run_link). */
   if (objs) {
-    for (i = 0; i < o->nsources; ++i) kit_obj_builder_free(objs[i]);
+    for (i = 0; i < nobjs; ++i) kit_obj_builder_free(objs[i]);
     driver_free(env, objs, o->nsources * sizeof(*objs));
   }
+  if (source_order_keep)
+    driver_free(env, source_order_keep,
+                o->nsources * sizeof(*source_order_keep));
+  if (source_obj_index)
+    driver_free(env, source_obj_index, o->nsources * sizeof(*source_obj_index));
   return rc;
 }
 
@@ -1602,49 +1774,72 @@ static int build_run_archive(BuildOptions* o, KitCompiler* compiler,
                              const KitDiagnosticOptions* diag) {
   DriverEnv* env = o->env;
   KitObjBuilder** objs = NULL;
+  uint32_t* source_obj_index = NULL;
+  uint8_t* source_order_keep = NULL;
   KitSlice* names = NULL;
   char** owned_names = NULL;
   size_t* owned_name_sizes = NULL;
   KitWriter* out_w = NULL;
+  uint32_t nobjs = 0;
   uint32_t i;
   int rc = 1;
 
   objs = driver_alloc_zeroed(env, o->nsources * sizeof(*objs));
+  source_obj_index =
+      driver_alloc_zeroed(env, o->nsources * sizeof(*source_obj_index));
+  source_order_keep =
+      driver_alloc_zeroed(env, o->nsources * sizeof(*source_order_keep));
   names = driver_alloc_zeroed(env, o->nsources * sizeof(*names));
   owned_names = driver_alloc_zeroed(env, o->nsources * sizeof(*owned_names));
   owned_name_sizes =
       driver_alloc_zeroed(env, o->nsources * sizeof(*owned_name_sizes));
-  if (!objs || !names || !owned_names || !owned_name_sizes) {
+  if (!objs || !source_obj_index || !source_order_keep || !names ||
+      !owned_names || !owned_name_sizes) {
     driver_errf(o->tool, "out of memory");
     goto out;
   }
-  if (build_compile_all(o, compiler, ctx, code, diag, objs) != 0) goto out;
+  {
+    DriverCompileBatchOptions batch;
+    memset(&batch, 0, sizeof batch);
+    batch.output_kind = KIT_CG_OUTPUT_ARCHIVE_MEMBER;
+    batch.interposition_policy = KIT_CG_INTERPOSITION_DEFAULT;
+    if (build_compile_all(o, compiler, ctx, code, diag, objs, source_obj_index,
+                          source_order_keep, &batch, NULL, &nobjs) != 0)
+      goto out;
+  }
   /* build-lib always emits objects (validated), so build_default_obj_name
    * yields the right `.o`/`.obj` member name. */
   for (i = 0; i < o->nsources; ++i) {
-    owned_names[i] =
-        build_default_obj_name(env, o, o->sources[i].path, &owned_name_sizes[i]);
-    if (!owned_names[i]) {
+    uint32_t oi;
+    if (!source_order_keep[i]) continue;
+    oi = source_obj_index[i];
+    owned_names[oi] = build_default_obj_name(env, o, o->sources[i].path,
+                                             &owned_name_sizes[oi]);
+    if (!owned_names[oi]) {
       driver_errf(o->tool, "out of memory");
       goto out;
     }
-    names[i] = kit_slice_cstr(owned_names[i]);
+    names[oi] = kit_slice_cstr(owned_names[oi]);
   }
   if (build_open_output(ctx, env, o->tool, o->output_path, &out_w) != 0)
     goto out;
-  rc = driver_archive_emit(env, ctx, o->tool, objs, names, o->nsources,
-                           o->epoch, out_w);
+  rc = driver_archive_emit(env, ctx, o->tool, objs, names, nobjs, o->epoch,
+                           out_w);
 
 out:
   if (out_w) kit_writer_close(out_w);
   if (objs)
-    for (i = 0; i < o->nsources; ++i) kit_obj_builder_free(objs[i]);
+    for (i = 0; i < nobjs; ++i) kit_obj_builder_free(objs[i]);
   if (owned_names) {
     for (i = 0; i < o->nsources; ++i)
-      if (owned_names[i])
-        driver_free(env, owned_names[i], owned_name_sizes[i]);
+      if (owned_names[i]) driver_free(env, owned_names[i], owned_name_sizes[i]);
   }
   if (objs) driver_free(env, objs, o->nsources * sizeof(*objs));
+  if (source_order_keep)
+    driver_free(env, source_order_keep,
+                o->nsources * sizeof(*source_order_keep));
+  if (source_obj_index)
+    driver_free(env, source_obj_index, o->nsources * sizeof(*source_obj_index));
   if (names) driver_free(env, names, o->nsources * sizeof(*names));
   if (owned_names)
     driver_free(env, owned_names, o->nsources * sizeof(*owned_names));
@@ -1669,7 +1864,8 @@ static int build_run_per_source(BuildOptions* o, KitCompiler* compiler,
   for (i = 0; i < o->nsources; ++i) {
     if (o->syntax_only) {
       KitObjBuilder* ob = NULL;
-      int rc = build_compile_source(o, compiler, ctx, i, &code, diag, NULL, &ob);
+      int rc =
+          build_compile_source(o, compiler, ctx, i, &code, diag, NULL, &ob);
       kit_obj_builder_free(ob);
       if (rc != 0) return rc;
       continue;
@@ -1706,8 +1902,8 @@ static int build_run_per_source(BuildOptions* o, KitCompiler* compiler,
 /* ===================================================================== */
 
 static int build_validate(BuildOptions* o) {
-  uint32_t total_link = o->nobject_files + o->narchives + o->ndsos +
-                        o->npending_libs;
+  uint32_t total_link =
+      o->nobject_files + o->narchives + o->ndsos + o->npending_libs;
 
   if (o->nsources == 0 && total_link == 0) {
     driver_errf(o->tool, "no input files");
@@ -1887,8 +2083,8 @@ static int build_main(int argc, char** argv, int kind, const char* tool) {
     if (!o.no_stdlib && !o.no_defaultlibs) {
       DriverRuntimeArchive rt = {0};
       uint32_t insert_pos;
-      if (driver_runtime_prepare_archive(&env, tool, &runtime, o.target, o.epoch,
-                                         &rt) != 0) {
+      if (driver_runtime_prepare_archive(&env, tool, &runtime, o.target,
+                                         o.epoch, &rt) != 0) {
         driver_runtime_archive_fini(&env, &rt);
         rc = 1;
         goto done;
@@ -1971,12 +2167,14 @@ void driver_help_build_exe(void) {
       " -o PATH Output (default a.out / a.exe)\n"
       " -O0 -O1 -O2 -g Optimization / debug info\n"
       " -target TRIPLE Cross-compile target\n"
+      " -flto Link-time optimization for source inputs\n"
       " -static Fully static executable\n"
       " -l NAME -L DIR Link a library / add a search dir\n"
       " -e SYM -T script.ld Entry symbol / linker script\n"
       " -Wl,... Linker pass-through\n"
       " --group [flags] -- sources... Scope compile flags to sources\n"
-      " -X<lang> FLAG Per-language frontend flag (c|asm|toy|wasm)\n"
+      " -X<lang> FLAG Per-language frontend flag "
+      "(c|asm|toy|wasm)\n"
       " -h, --help Show this help\n")));
 }
 
@@ -1990,7 +2188,8 @@ void driver_help_build_lib(void) {
       " kit build-lib -o LIB.a [options] sources...\n"
       "\n"
       "DESCRIPTION\n"
-      " Compiles a polyglot source set in memory and archives the objects\n"
+      " Compiles a polyglot source set in memory and archives the "
+      "objects\n"
       " into a static library (.a) with a symbol index. Dynamic/shared\n"
       " libraries are not yet supported.\n"
       "\n"
@@ -1998,9 +2197,11 @@
       " -o PATH Output archive (required)\n"
       " -fPIC Position-independent code\n"
       " -O0 -O1 -O2 -g Optimization / debug info\n"
+      " -flto Link-time optimization for source inputs\n"
       " -target TRIPLE Cross-compile target\n"
       " --group [flags] -- sources... Scope compile flags to sources\n"
-      " -X<lang> FLAG Per-language frontend flag (c|asm|toy|wasm)\n"
+      " -X<lang> FLAG Per-language frontend flag "
+      "(c|asm|toy|wasm)\n"
       " -h, --help Show this help\n")));
 }
 
@@ -2014,7 +2215,8 @@ void driver_help_build_obj(void) {
       " kit build-obj [options] sources...\n"
       "\n"
       "DESCRIPTION\n"
-      " Compiles each source (C / asm / toy / wasm by suffix or -x) to an\n"
+      " Compiles each source (C / asm / toy / wasm by suffix or -x) to "
+      "an\n"
       " object. Multiple sources with --emit=obj combine into one\n"
       " relocatable object (ld -r). The kit-native replacement for the\n"
       " retired `compile` tool.\n"
@@ -2026,11 +2228,14 @@ void driver_help_build_obj(void) {
       " -S Alias for --emit=asm\n"
       " -fsyntax-only Check only; write no output\n"
       " -O0 -O1 -O2 -g Optimization / debug info\n"
+      " -flto Link-time optimization for multi-source "
+      "obj\n"
       " -target TRIPLE Cross-compile target\n"
       " -I/-isystem/-D/-U Preprocessor flags (C/asm frontends)\n"
       " -x LANG Force language: c | asm | toy | wasm\n"
       " --group [flags] -- sources... Scope compile flags to sources\n"
-      " -X<lang> FLAG Per-language frontend flag (c|asm|toy|wasm)\n"
+      " -X<lang> FLAG Per-language frontend flag "
+      "(c|asm|toy|wasm)\n"
       " -o - Write the emit to stdout\n"
       " -h, --help Show this help\n")));
 }
diff --git a/driver/cmd/cc.c b/driver/cmd/cc.c
@@ -126,6 +126,7 @@ typedef struct CcOptions {
   int debug_info; /* -g */
   int function_sections; /* -ffunction-sections */
   int data_sections; /* -fdata-sections */
+  int lto; /* -flto/-fno-lto */
   int warnings_are_errors; /* -Werror */
   uint32_t max_errors; /* -fmax-errors=N */
   KitTargetSpec target; /* -target / host */
@@ -249,6 +250,8 @@ void driver_help_cc(void) {
       "source\n"
       " --emit=ir -O1 [options] input.c emit semantic IR "
       "dump\n"
+      " -flto link-time "
+      "optimization for all source inputs\n"
       "\n"
       "(see source for the full GCC-subset flag reference)\n")));
 }
@@ -1052,6 +1055,14 @@ static int cc_parse(int argc, char** argv, CcOptions* o) {
       o->data_sections = 0;
       continue;
     }
+    if (driver_streq(a, "-flto")) {
+      o->lto = 1;
+      continue;
+    }
+    if (driver_streq(a, "-fno-lto")) {
+      o->lto = 0;
+      continue;
+    }
     if (driver_streq(a, "-nostdinc")) {
       o->nostdinc = 1;
       continue;
@@ -1487,6 +1498,12 @@ static int cc_parse(int argc, char** argv, CcOptions* o) {
                 "-shared is incompatible with -c/-S/-E/-fsyntax-only");
     return 1;
   }
+  if (o->shared && o->lto) {
+    driver_errf(CC_TOOL,
+                "-shared -flto is not supported yet "
+                "(shared-library LTO output is not exercised)");
+    return 1;
+  }
   if (o->emit_ir && o->opt_level < 1) {
     driver_errf(CC_TOOL,
                 "--emit=ir requires -O1 or higher "
@@ -2028,6 +2045,7 @@ static void cc_fill_c_opts(const CcOptions* o, KitCCompileOptions* copts) {
   copts->code.emit_asm_source = o->emit_asm_source ? true : false;
   copts->code.function_sections = o->function_sections ? true : false;
   copts->code.data_sections = o->data_sections ? true : false;
+  copts->code.lto = o->lto ? true : false;
   copts->code.epoch = o->epoch;
   copts->code.path_map = o->npath_map ? o->path_map : NULL;
   copts->code.npath_map = o->npath_map;
@@ -2298,11 +2316,18 @@ static int cc_run_link_exe(DriverEnv* env, const CcOptions* o,
   KitSlice* dso_names = NULL;
   KitLinkInputOrder* order = NULL;
   KitObjBuilder** objs = NULL;
+  DriverCompileSource* sources = NULL;
+  DriverCompilePendingLto pending_lto = {0};
+  uint32_t* source_obj_index = NULL;
+  uint8_t* source_order_keep = NULL;
   KitLinkScript* script = NULL;
   KitSlice* rpath_slices = NULL;
   KitCCompileOptions copts;
+  DriverCompileBatchOptions lto_batch;
   uint32_t nsrc = o->nsource_files + o->nsource_memory;
   uint32_t i;
+  uint32_t nobjs = 0;
+  uint32_t norder = 0;
   int rc = 1;
 
   if (!io || !io->read_all || !io->open_writer) {
@@ -2320,7 +2345,12 @@ static int cc_run_link_exe(DriverEnv* env, const CcOptions* o,
   }
   if (nsrc) {
     objs = driver_alloc_zeroed(env, nsrc * sizeof(*objs));
-    if (!objs) {
+    sources = driver_alloc_zeroed(env, nsrc * sizeof(*sources));
+    source_obj_index =
+        driver_alloc_zeroed(env, nsrc * sizeof(*source_obj_index));
+    source_order_keep =
+        driver_alloc_zeroed(env, nsrc * sizeof(*source_order_keep));
+    if (!objs || !sources || !source_obj_index || !source_order_keep) {
       driver_errf(CC_TOOL, "out of memory");
       goto out;
     }
@@ -2405,21 +2435,49 @@ static int cc_run_link_exe(DriverEnv* env, const CcOptions* o,
   }
 
   cc_fill_c_opts(o, &copts);
+  memset(&lto_batch, 0, sizeof lto_batch);
+  lto_batch.output_kind =
+      o->shared ? KIT_CG_OUTPUT_SHARED : KIT_CG_OUTPUT_EXECUTABLE;
+  lto_batch.interposition_policy = o->shared
+                                       ? KIT_CG_INTERPOSITION_DEFAULT_VISIBILITY
+                                       : KIT_CG_INTERPOSITION_DEFAULT;
+  lto_batch.defer_lto_finish = 1;
   for (i = 0; i < o->nsource_files; ++i) {
     KitLanguage lang =
         cc_resolve_lang(compiler, o->source_files[i], o->source_langs[i]);
-    KitStatus st;
-    st = cc_compile_source_obj(compiler, lang, &copts, pp,
-                               kit_slice_cstr(o->source_files[i]),
-                               &src_bytes[i], &objs[i]);
-    if (st != KIT_OK) goto out;
+    if (lang == KIT_LANG_UNKNOWN) {
+      driver_errf(CC_TOOL, "cannot determine language for %.*s (use -x LANG)",
+                  KIT_SLICE_ARG(kit_slice_cstr(o->source_files[i])));
+      goto out;
+    }
+    sources[i].lang = lang;
+    sources[i].name = kit_slice_cstr(o->source_files[i]);
+    sources[i].bytes = src_bytes[i];
+    sources[i].pp = pp;
   }
   for (i = 0; i < o->nsource_memory; ++i) {
+    uint32_t si = o->nsource_files + i;
+    if (o->source_memory[i].lang == KIT_LANG_UNKNOWN) {
+      driver_errf(CC_TOOL, "cannot determine language for %.*s (use -x LANG)",
+                  KIT_SLICE_ARG(o->source_memory[i].name));
+      goto out;
+    }
+    sources[si].lang = o->source_memory[i].lang;
+    sources[si].name = o->source_memory[i].name;
+    sources[si].bytes = o->source_memory[i].bytes;
+    sources[si].pp = pp;
+  }
+  if (nsrc) {
+    DriverCompileObjects cout;
     KitStatus st;
-    st = cc_compile_source_obj(compiler, o->source_memory[i].lang, &copts, pp,
-                               o->source_memory[i].name,
-                               &o->source_memory[i].bytes,
-                               &objs[o->nsource_files + i]);
+    memset(&cout, 0, sizeof cout);
+    cout.objs = objs;
+    cout.source_obj_index = source_obj_index;
+    cout.source_order_keep = source_order_keep;
+    cout.pending_lto = &pending_lto;
+    st = driver_compile_sources_run(compiler, &copts.code, &copts.diagnostics,
+                                    sources, nsrc, &lto_batch, &cout);
+    nobjs = cout.nobjs;
     if (st != KIT_OK) goto out;
   }
 
@@ -2447,30 +2505,40 @@ static int cc_run_link_exe(DriverEnv* env, const CcOptions* o,
     DriverLinkInputs li;
     for (oi = 0; oi < o->nlink_items; ++oi) {
       const CcLinkItem* item = &o->link_items[oi];
-      KitLinkInputOrder* ord = &order[oi];
+      KitLinkInputOrder* ord;
       switch ((CcLinkItemKind)item->kind) {
         case CC_LINK_SOURCE_FILE:
+          if (!source_order_keep[item->index]) continue;
+          ord = &order[norder++];
           ord->kind = KIT_LINK_INPUT_OBJ;
-          ord->index = item->index;
+          ord->index = source_obj_index[item->index];
           break;
-        case CC_LINK_SOURCE_MEMORY:
+        case CC_LINK_SOURCE_MEMORY: {
+          uint32_t si = o->nsource_files + item->index;
+          if (!source_order_keep[si]) continue;
+          ord = &order[norder++];
           ord->kind = KIT_LINK_INPUT_OBJ;
-          ord->index = o->nsource_files + item->index;
+          ord->index = source_obj_index[si];
           break;
+        }
         case CC_LINK_OBJECT:
+          ord = &order[norder++];
           ord->kind = KIT_LINK_INPUT_OBJ_BYTES;
           ord->index = item->index;
           break;
         case CC_LINK_ARCHIVE:
+          ord = &order[norder++];
           ord->kind = KIT_LINK_INPUT_ARCHIVE;
           ord->index = item->index;
           break;
         case CC_LINK_DSO:
+          ord = &order[norder++];
           ord->kind = KIT_LINK_INPUT_DSO;
           ord->index = item->index;
           break;
         case CC_LINK_LIB: {
           const CcPendingLib* pl = &o->pending_libs[item->index];
+          ord = &order[norder++];
           if (pl->resolved_kind == CC_LINK_DSO) {
             ord->kind = KIT_LINK_INPUT_DSO;
             ord->index = pl->resolved_index;
@@ -2484,7 +2552,7 @@ static int cc_run_link_exe(DriverEnv* env, const CcOptions* o,
     }
     memset(&li, 0, sizeof(li));
     li.objs = objs;
-    li.nobjs = nsrc;
+    li.nobjs = nobjs;
     li.obj_names = obj_names;
     li.obj_bytes = obj_in;
     li.nobj_bytes = o->nobject_files;
@@ -2494,8 +2562,9 @@ static int cc_run_link_exe(DriverEnv* env, const CcOptions* o,
     li.dso_bytes = dso_in;
     li.ndsos = o->ndsos;
     li.order = order;
-    li.norder = o->nlink_items;
-    st = driver_link_engine_emit(compiler, &lopts, &li, out_w);
+    li.norder = norder;
+    st = driver_link_engine_emit_with_lto(compiler, &lopts, &li, &pending_lto,
+                                          &lto_batch, out_w);
   }
   rc = st == KIT_OK ?
0 : 1; } @@ -2510,6 +2579,7 @@ out: } } if (script) kit_link_script_free(&ctx, script); + driver_compile_pending_lto_abort(&pending_lto); driver_link_flags_free_rpath_slices(&o->link, rpath_slices); if (compiler) driver_compiler_free(compiler); kit_target_free(target); @@ -2540,9 +2610,14 @@ out: if (src_bytes) driver_free(env, src_bytes, o->nsource_files * sizeof(*src_bytes)); if (objs) { - for (i = 0; i < nsrc; ++i) kit_obj_builder_free(objs[i]); + for (i = 0; i < nobjs; ++i) kit_obj_builder_free(objs[i]); driver_free(env, objs, nsrc * sizeof(*objs)); } + if (source_order_keep) + driver_free(env, source_order_keep, nsrc * sizeof(*source_order_keep)); + if (source_obj_index) + driver_free(env, source_obj_index, nsrc * sizeof(*source_obj_index)); + if (sources) driver_free(env, sources, nsrc * sizeof(*sources)); return rc; } diff --git a/driver/lib/compile_engine.c b/driver/lib/compile_engine.c @@ -1,8 +1,38 @@ #include "compile_engine.h" #include <kit/asm_emit.h> +#include <kit/cg.h> #include <string.h> +static KitStatus driver_compile_cg_run(KitCompiler* compiler, + const KitCodeOptions* code, + const KitDiagnosticOptions* diagnostics, + const DriverCompileSource* src, + KitCg* cg) { + KitCompileSessionOptions sopts; + KitCompileSession* session = NULL; + KitSourceInput sin; + KitStatus st; + + if (!compiler || !code || !diagnostics || !src || !cg) return KIT_INVALID; + memset(&sopts, 0, sizeof(sopts)); + sopts.lang = src->lang; + sopts.compile.code = *code; + sopts.compile.diagnostics = *diagnostics; + if (src->pp) sopts.compile.preprocess = *src->pp; + sopts.compile.language_options = src->lang_extra; + + memset(&sin, 0, sizeof(sin)); + sin.name = src->name; + sin.bytes = src->bytes; + sin.lang = src->lang; + + st = kit_compile_session_new(compiler, &sopts, &session); + if (st == KIT_OK) st = kit_compile_session_compile_cg(session, &sin, cg); + kit_compile_session_free(session); + return st; +} + KitStatus driver_compile_run(KitCompiler* compiler, KitLanguage 
lang, const KitCodeOptions* code, const KitDiagnosticOptions* diagnostics, @@ -58,3 +88,155 @@ KitStatus driver_compile_run(KitCompiler* compiler, KitLanguage lang, kit_obj_builder_free(ob); return st; } + +static int driver_compile_lto_enabled(const KitCodeOptions* code) { + return code && code->lto && !code->check_only && !code->emit_c_source && + !code->emit_ir && !code->emit_asm_source; +} + +static KitStatus driver_compile_start_lto(KitCompiler* compiler, + const KitCodeOptions* code, + KitObjBuilder** ob_out, + KitCg** cg_out) { + KitObjBuilder* ob = NULL; + KitCg* cg = NULL; + KitStatus st; + + if (ob_out) *ob_out = NULL; + if (cg_out) *cg_out = NULL; + if (!compiler || !code || !ob_out || !cg_out) return KIT_INVALID; + st = kit_obj_builder_new(compiler, &ob); + if (st == KIT_OK) st = kit_cg_new(compiler, &cg); + if (st == KIT_OK) st = kit_cg_begin(cg, ob, code); + if (st != KIT_OK) { + kit_cg_free(cg); + kit_obj_builder_free(ob); + return st; + } + *ob_out = ob; + *cg_out = cg; + return KIT_OK; +} + +KitStatus driver_compile_pending_lto_finish( + DriverCompilePendingLto* pending, const DriverCompileBatchOptions* batch, + const KitCgSym* preserved_symbols, uint32_t npreserved_symbols) { + KitCgFinishOptions finish; + KitStatus st; + + if (!pending || !pending->active) return KIT_OK; + if (!pending->obj || !pending->cg) { + driver_compile_pending_lto_abort(pending); + return KIT_INVALID; + } + + memset(&finish, 0, sizeof finish); + finish.output_kind = batch ? batch->output_kind : KIT_CG_OUTPUT_RELOCATABLE; + finish.interposition_policy = + batch ? 
batch->interposition_policy : KIT_CG_INTERPOSITION_DEFAULT; + finish.preserved_symbols = preserved_symbols; + finish.npreserved_symbols = npreserved_symbols; + + st = kit_cg_finish(pending->cg, &finish); + if (st == KIT_OK) st = kit_cg_detach(pending->cg); + if (st == KIT_OK) st = kit_obj_builder_finalize(pending->obj); + + kit_cg_free(pending->cg); + pending->cg = NULL; + pending->active = 0; + return st; +} + +void driver_compile_pending_lto_abort(DriverCompilePendingLto* pending) { + if (!pending || !pending->active) return; + kit_cg_free(pending->cg); + pending->cg = NULL; + pending->active = 0; +} + +KitStatus driver_compile_sources_run(KitCompiler* compiler, + const KitCodeOptions* code, + const KitDiagnosticOptions* diagnostics, + const DriverCompileSource* sources, + uint32_t nsources, + const DriverCompileBatchOptions* batch, + DriverCompileObjects* out) { + DriverCompilePendingLto pending_lto; + int lto_order_emitted = 0; + int lto_enabled = driver_compile_lto_enabled(code); + KitStatus st = KIT_OK; + + if (!compiler || !code || !diagnostics || (!sources && nsources) || !out || + !out->objs || !out->source_obj_index || !out->source_order_keep) { + return KIT_INVALID; + } + memset(&pending_lto, 0, sizeof pending_lto); + out->nobjs = 0; + if (out->pending_lto) memset(out->pending_lto, 0, sizeof(*out->pending_lto)); + for (uint32_t i = 0; i < nsources; ++i) { + out->source_obj_index[i] = (uint32_t)-1; + out->source_order_keep[i] = 0; + } + + for (uint32_t i = 0; i < nsources; ++i) { + const DriverCompileSource* src = &sources[i]; + KitFrontendCaps caps; + int stage_cg = 0; + + memset(&caps, 0, sizeof caps); + if (lto_enabled) { + st = kit_frontend_caps(compiler, src->lang, &caps); + if (st != KIT_OK) goto out; + stage_cg = caps.lto_mode == KIT_FRONTEND_LTO_CG; + } + + if (stage_cg) { + if (!pending_lto.active) { + st = driver_compile_start_lto(compiler, code, &pending_lto.obj, + &pending_lto.cg); + if (st != KIT_OK) goto out; + pending_lto.obj_index = 
out->nobjs; + pending_lto.active = 1; + out->objs[out->nobjs++] = pending_lto.obj; + } + out->source_obj_index[i] = pending_lto.obj_index; + if (!lto_order_emitted) { + out->source_order_keep[i] = 1; + lto_order_emitted = 1; + } + st = driver_compile_cg_run(compiler, code, diagnostics, src, + pending_lto.cg); + if (st != KIT_OK) goto out; + continue; + } + + { + KitObjBuilder* ob = NULL; + st = driver_compile_run(compiler, src->lang, code, diagnostics, src->pp, + src->lang_extra, src->name, &src->bytes, NULL, + &ob); + if (st != KIT_OK) goto out; + out->source_obj_index[i] = out->nobjs; + out->source_order_keep[i] = 1; + out->objs[out->nobjs++] = ob; + } + } + + if (pending_lto.active) { + if (batch && batch->defer_lto_finish) { + if (!out->pending_lto) { + st = KIT_INVALID; + goto out; + } + *out->pending_lto = pending_lto; + memset(&pending_lto, 0, sizeof pending_lto); + } else { + st = driver_compile_pending_lto_finish(&pending_lto, batch, NULL, 0); + if (st != KIT_OK) goto out; + } + } + +out: + if (pending_lto.active) driver_compile_pending_lto_abort(&pending_lto); + return st; +} diff --git a/driver/lib/compile_engine.h b/driver/lib/compile_engine.h @@ -1,9 +1,11 @@ #ifndef KIT_DRIVER_COMPILE_ENGINE_H #define KIT_DRIVER_COMPILE_ENGINE_H +#include <kit/cg.h> #include <kit/compile.h> #include <kit/object.h> #include <kit/preprocess.h> +#include <stdint.h> /* Language-neutral "compile one source" step shared by `cc` and `compile`. * @@ -32,4 +34,59 @@ KitStatus driver_compile_run(KitCompiler* compiler, KitLanguage lang, const KitSlice* bytes, KitWriter* emit_out, KitObjBuilder** obj_out); +typedef struct DriverCompileSource { + KitLanguage lang; + KitSlice name; + KitSlice bytes; + const KitPreprocessOptions* pp; + const void* lang_extra; +} DriverCompileSource; + +typedef struct DriverCompileObjects { + /* Caller-allocated capacity for at least nsources objects. Filled compactly. 
+ */ + KitObjBuilder** objs; + uint32_t nobjs; + /* Caller-allocated nsources-entry maps. source_obj_index[i] is the compact + * object index for source i. source_order_keep[i] is true only for the source + * position that should contribute an order/archive member; later semantic LTO + * sources map to the same object and have keep=false. */ + uint32_t* source_obj_index; + uint8_t* source_order_keep; + struct DriverCompilePendingLto* pending_lto; +} DriverCompileObjects; + +typedef struct DriverCompileBatchOptions { + uint8_t output_kind; /* KitCgOutputKind */ + uint8_t interposition_policy; /* KitCgInterpositionPolicy */ + uint8_t defer_lto_finish; + uint8_t pad[1]; +} DriverCompileBatchOptions; + +typedef struct DriverCompilePendingLto { + KitObjBuilder* obj; + KitCg* cg; + uint32_t obj_index; + uint8_t active; + uint8_t pad[3]; +} DriverCompilePendingLto; + +/* Compile a batch of sources for a link/archive/relocatable output. When + * code->lto is set, KIT_FRONTEND_LTO_CG frontends emit into one shared KitCg + * unit and KIT_FRONTEND_LTO_OPAQUE/NONE frontends still compile as ordinary + * per-source objects. 
*/ +KitStatus driver_compile_sources_run(KitCompiler* compiler, + const KitCodeOptions* code, + const KitDiagnosticOptions* diagnostics, + const DriverCompileSource* sources, + uint32_t nsources, + const DriverCompileBatchOptions* batch, + DriverCompileObjects* out); + +KitStatus driver_compile_pending_lto_finish( + DriverCompilePendingLto* pending, const DriverCompileBatchOptions* batch, + const KitCgSym* preserved_symbols, uint32_t npreserved_symbols); + +void driver_compile_pending_lto_abort(DriverCompilePendingLto* pending); + #endif diff --git a/driver/lib/link_engine.c b/driver/lib/link_engine.c @@ -1,13 +1,41 @@ #include "link_engine.h" -KitStatus driver_link_engine_emit(KitCompiler* compiler, - const KitLinkSessionOptions* lopts, - const DriverLinkInputs* in, KitWriter* out) { - KitLinkSession* link = NULL; - KitStatus st; +#include <string.h> + +typedef struct DriverPreservedVec { + KitHeap* heap; + KitCgSym* syms; + uint32_t nsyms; + uint32_t cap; + int oom; +} DriverPreservedVec; + +static void driver_preserved_vec_add(void* user, KitCgSym sym) { + DriverPreservedVec* v = (DriverPreservedVec*)user; + KitCgSym* ns; + uint32_t ncap; + if (!v || v->oom) return; + if (v->nsyms == v->cap) { + ncap = v->cap ? 
v->cap * 2u : 32u; + ns = (KitCgSym*)v->heap->realloc( + v->heap, v->syms, sizeof(*v->syms) * v->cap, sizeof(*v->syms) * ncap, + _Alignof(KitCgSym)); + if (!ns) { + v->oom = 1; + return; + } + v->syms = ns; + v->cap = ncap; + } + v->syms[v->nsyms++] = sym; +} + +static KitStatus driver_link_engine_add_inputs(KitLinkSession* link, + const DriverLinkInputs* in) { + KitStatus st = KIT_OK; uint32_t i; + if (!link || !in) return KIT_INVALID; - st = kit_link_session_new(compiler, lopts, &link); for (i = 0; i < in->norder && st == KIT_OK; ++i) { const KitLinkInputOrder* ord = &in->order[i]; switch ((KitLinkInputOrderKind)ord->kind) { @@ -19,7 +47,8 @@ KitStatus driver_link_engine_emit(KitCompiler* compiler, &in->obj_bytes[ord->index]); break; case KIT_LINK_INPUT_ARCHIVE: - st = kit_link_session_add_archive_bytes(link, &in->archives[ord->index]); + st = + kit_link_session_add_archive_bytes(link, &in->archives[ord->index]); break; case KIT_LINK_INPUT_DSO: st = kit_link_session_add_dso_bytes(link, in->dso_names[ord->index], @@ -27,7 +56,48 @@ KitStatus driver_link_engine_emit(KitCompiler* compiler, break; } } + return st; +} + +KitStatus driver_link_engine_emit_with_lto( + KitCompiler* compiler, const KitLinkSessionOptions* lopts, + const DriverLinkInputs* in, DriverCompilePendingLto* pending_lto, + const DriverCompileBatchOptions* batch, KitWriter* out) { + KitLinkSession* link = NULL; + DriverPreservedVec preserved; + KitStatus st; + + if (!compiler || !lopts || !in || !out) { + if (pending_lto && pending_lto->active) + driver_compile_pending_lto_abort(pending_lto); + return KIT_INVALID; + } + memset(&preserved, 0, sizeof preserved); + preserved.heap = kit_compiler_context(compiler)->heap; + st = kit_link_session_new(compiler, lopts, &link); + if (st == KIT_OK) st = driver_link_engine_add_inputs(link, in); + if (st == KIT_OK && pending_lto && pending_lto->active) { + st = kit_link_session_visit_lto_preserved( + link, pending_lto->obj, pending_lto->cg, 
driver_preserved_vec_add, + &preserved); + if (st == KIT_OK && preserved.oom) st = KIT_NOMEM; + if (st == KIT_OK) { + st = driver_compile_pending_lto_finish(pending_lto, batch, preserved.syms, + preserved.nsyms); + } + } if (st == KIT_OK) st = kit_link_session_emit(link, out); kit_link_session_free(link); + if (preserved.syms) + preserved.heap->free(preserved.heap, preserved.syms, + sizeof(*preserved.syms) * preserved.cap); + if (st != KIT_OK && pending_lto && pending_lto->active) + driver_compile_pending_lto_abort(pending_lto); return st; } + +KitStatus driver_link_engine_emit(KitCompiler* compiler, + const KitLinkSessionOptions* lopts, + const DriverLinkInputs* in, KitWriter* out) { + return driver_link_engine_emit_with_lto(compiler, lopts, in, NULL, NULL, out); +} diff --git a/driver/lib/link_engine.h b/driver/lib/link_engine.h @@ -5,6 +5,8 @@ #include <kit/link.h> #include <kit/object.h> +#include "compile_engine.h" + /* Reusable "build a link session, add inputs in command-line order, and emit" * step shared by `cc` and the `build-*` commands. 
Every input is already * loaded/compiled by the caller; path lookup, option parsing, hosted/runtime @@ -43,4 +45,9 @@ KitStatus driver_link_engine_emit(KitCompiler* compiler, const KitLinkSessionOptions* lopts, const DriverLinkInputs* in, KitWriter* out); +KitStatus driver_link_engine_emit_with_lto( + KitCompiler* compiler, const KitLinkSessionOptions* lopts, + const DriverLinkInputs* in, DriverCompilePendingLto* pending_lto, + const DriverCompileBatchOptions* batch, KitWriter* out); + #endif diff --git a/include/kit/cg.h b/include/kit/cg.h @@ -387,9 +387,41 @@ KIT_API KitSym kit_cg_c_linkage_name(KitCompiler*, KitSym source_name); * ============================================================ */ KIT_API KitStatus kit_cg_new(KitCompiler*, KitCg** cg_out); -KIT_API KitStatus kit_cg_begin_obj(KitCg*, KitObjBuilder* out, - const KitCodeOptions*); -KIT_API KitStatus kit_cg_end_obj(KitCg*); + +typedef struct KitCgUnitOptions { + KitSlice source_name; /* diagnostic/provenance label; may be empty */ + uint32_t source_id; /* 0 means "unspecified" */ + uint32_t flags; /* reserved; must be 0 */ +} KitCgUnitOptions; + +typedef enum KitCgOutputKind { + KIT_CG_OUTPUT_RELOCATABLE = 0, + KIT_CG_OUTPUT_EXECUTABLE = 1, + KIT_CG_OUTPUT_SHARED = 2, + KIT_CG_OUTPUT_ARCHIVE_MEMBER = 3, +} KitCgOutputKind; + +typedef enum KitCgInterpositionPolicy { + KIT_CG_INTERPOSITION_DEFAULT = 0, + KIT_CG_INTERPOSITION_NONE = 1, + KIT_CG_INTERPOSITION_DEFAULT_VISIBILITY = 2, +} KitCgInterpositionPolicy; + +typedef struct KitCgFinishOptions { + uint8_t output_kind; /* KitCgOutputKind */ + uint8_t interposition_policy; /* KitCgInterpositionPolicy */ + uint8_t pad[2]; + const KitCgSym* preserved_symbols; + uint32_t npreserved_symbols; +} KitCgFinishOptions; + +KIT_API KitStatus kit_cg_begin(KitCg*, KitObjBuilder* out, + const KitCodeOptions*); +KIT_API KitStatus kit_cg_begin_unit(KitCg*, const KitCgUnitOptions*); +KIT_API KitStatus kit_cg_end_unit(KitCg*); +KIT_API KitStatus kit_cg_finish(KitCg*, const 
KitCgFinishOptions*); +KIT_API KitStatus kit_cg_detach(KitCg*); +KIT_API KitStatus kit_cg_abort(KitCg*); KIT_API void kit_cg_free(KitCg*); /* Sticky source location. Function, scope, local, param, instruction, and diff --git a/include/kit/compile.h b/include/kit/compile.h @@ -96,12 +96,22 @@ typedef struct KitFrontendCompileOptions { typedef struct KitFrontend KitFrontend; typedef struct KitFrontendState KitFrontendState; +typedef struct KitCg KitCg; + +typedef enum KitFrontendLtoMode { + KIT_FRONTEND_LTO_NONE = 0, + KIT_FRONTEND_LTO_CG = 1, + KIT_FRONTEND_LTO_OPAQUE = 2, +} KitFrontendLtoMode; typedef KitFrontendState* (*KitFrontendNewFn)(KitCompiler*); -typedef KitStatus (*KitFrontendCompileFn)(KitFrontendState*, - const KitFrontendCompileOptions*, - const KitSourceInput*, - KitObjBuilder* out); +typedef KitStatus (*KitFrontendCompileObjFn)(KitFrontendState*, + const KitFrontendCompileOptions*, + const KitSourceInput*, + KitObjBuilder* out); +typedef KitStatus (*KitFrontendCompileCgFn)(KitFrontendState*, + const KitFrontendCompileOptions*, + const KitSourceInput*, KitCg* cg); typedef void (*KitFrontendFreeFn)(KitFrontendState*); /* Transaction hooks for frontends with durable cross-compile state (REPL * declarations). See the `commit`/`abort` fields below. */ @@ -114,6 +124,7 @@ typedef void (*KitFrontendAbortFn)(KitFrontendState*); * (-I/-isystem/-D/-U) is accepted. */ typedef struct KitFrontendCaps { bool preprocessor; /* honors KitFrontendCompileOptions.preprocess */ + uint8_t lto_mode; /* KitFrontendLtoMode */ } KitFrontendCaps; /* Parse the frontend-specific command-line flags a generic driver did not @@ -128,7 +139,12 @@ typedef void (*KitFrontendFreeOptionsFn)(KitCompiler*, void* opts); typedef struct KitFrontendVTable { KitFrontendNewFn new_frontend; - KitFrontendCompileFn compile; + /* Semantic frontends emit into a caller-owned open KitCg. Object-only + * frontends, such as asm, emit directly into a KitObjBuilder. 
Exactly one is + * required by caps.lto_mode: KIT_FRONTEND_LTO_CG requires compile_cg; + * KIT_FRONTEND_LTO_OPAQUE/NONE require compile_obj. */ + KitFrontendCompileCgFn compile_cg; + KitFrontendCompileObjFn compile_obj; KitFrontendFreeFn free_frontend; /* Counted list of lowercase file extensions (no leading dot) that this @@ -201,6 +217,13 @@ KIT_API KitStatus kit_compile_session_new(KitCompiler*, KIT_API KitStatus kit_compile_session_compile(KitCompileSession*, const KitSourceInput*, KitObjBuilder** out); +/* Compile into an already-open semantic codegen session and auto-commit on + * success. Valid only for frontends whose caps.lto_mode is + * KIT_FRONTEND_LTO_CG. The caller owns kit_cg_finish / kit_cg_detach after all + * staged inputs have been compiled, then owns finalizing the ObjBuilder. */ +KIT_API KitStatus kit_compile_session_compile_cg(KitCompileSession*, + const KitSourceInput*, + KitCg* cg); /* Compile but leave the frontend transaction OPEN on success, so the caller * can link and publish the resulting object before deciding the outcome. * Follow a successful stage with exactly one of kit_compile_session_commit diff --git a/include/kit/core.h b/include/kit/core.h @@ -250,6 +250,11 @@ typedef struct KitCodeOptions { * per-symbol section when no explicit frontend section was requested. */ bool function_sections; bool data_sections; + /* Cross-translation-unit LTO. Drivers that have all sources up front use + * this to stage semantic frontends into one KitCg session and finalize once. + * Separate compilation still emits an ordinary object until serialized IR + * objects exist. 
*/ + bool lto; uint64_t epoch; /* reproducible timestamp seed; 0 means no timestamp */ const KitPathPrefixMap* path_map; uint32_t npath_map; diff --git a/include/kit/link.h b/include/kit/link.h @@ -1,6 +1,7 @@ #ifndef KIT_LINK_H #define KIT_LINK_H +#include <kit/cg.h> #include <kit/core.h> #include <kit/object.h> @@ -200,6 +201,10 @@ KIT_API KitStatus kit_link_session_add_archive_bytes(KitLinkSession*, const KitLinkArchiveInput*); KIT_API KitStatus kit_link_session_add_dso_bytes(KitLinkSession*, KitSlice name, const KitSlice*); +typedef void (*KitLinkLtoPreservedCallback)(void* user, KitCgSym sym); +KIT_API KitStatus kit_link_session_visit_lto_preserved( + KitLinkSession*, KitObjBuilder* lto_obj, KitCg* lto_cg, + KitLinkLtoPreservedCallback cb, void* user); KIT_API KitStatus kit_link_session_resolve(KitLinkSession*); KIT_API KitStatus kit_link_session_emit(KitLinkSession*, KitWriter* out); KIT_API KitStatus kit_link_session_jit(KitLinkSession*, KitJit** out_jit); diff --git a/lang/c/c.c b/lang/c/c.c @@ -52,10 +52,9 @@ static KitFrontendState* c_frontend_new(KitCompiler* c) { return (KitFrontendState*)fe; } -static KitStatus c_frontend_compile(KitFrontendState* frontend, - const KitFrontendCompileOptions* fe_opts, - const KitSourceInput* input, - KitObjBuilder* out) { +static KitStatus c_frontend_compile_cg(KitFrontendState* frontend, + const KitFrontendCompileOptions* fe_opts, + const KitSourceInput* input, KitCg* cg) { CFrontend* fe = (CFrontend*)frontend; KitCompiler* c; /* Code, diagnostics, and preprocessor settings all arrive on the common @@ -65,12 +64,10 @@ static KitStatus c_frontend_compile(KitFrontendState* frontend, Lexer* lex; Pp* pp; DeclTable* decls; - KitCg* cg; - KitStatus cg_st; if (!fe || !fe->c) return KIT_INVALID; c = fe->c; - if (!fe_opts || !input) c_bad_options(c, "compile args missing"); + if (!fe_opts || !input || !cg) c_bad_options(c, "compile args missing"); bytes = &input->bytes; kit_frontend_metrics_scope_begin(c, "compile.c.setup"); @@ 
-85,12 +82,7 @@ static KitStatus c_frontend_compile(KitFrontendState* frontend, kit_frontend_metrics_scope_begin(c, "compile.c.pp_new"); pp = pp_new(c); kit_frontend_metrics_scope_end(c, "compile.c.pp_new"); - kit_frontend_metrics_scope_begin(c, "compile.c.cg_new"); - cg = NULL; - cg_st = kit_cg_new(c, &cg); - if (cg_st == KIT_OK) cg_st = kit_cg_begin_obj(cg, out, &fe_opts->code); - kit_frontend_metrics_scope_end(c, "compile.c.cg_new"); - if (!lex || !pp || cg_st != KIT_OK || !cg) + if (!lex || !pp || !cg) compiler_panic(c, c_no_loc(), "C compiler out of memory"); kit_frontend_metrics_scope_begin(c, "compile.c.decl_new"); decls = decl_new(c, pool, cg); @@ -117,7 +109,6 @@ static KitStatus c_frontend_compile(KitFrontendState* frontend, kit_frontend_metrics_scope_end(c, "compile.c.parse_codegen"); kit_frontend_metrics_scope_begin(c, "compile.c.cleanup"); - kit_cg_free(cg); decl_free(decls); pp_free(pp); c_pool_free(pool); @@ -140,14 +131,15 @@ static const KitSlice c_extensions[] = {KIT_SLICE_LIT("c"), KIT_SLICE_LIT("h")}; const KitFrontendVTable kit_c_frontend_vtable = { c_frontend_new, - c_frontend_compile, + c_frontend_compile_cg, + NULL, /* compile_obj: semantic frontends are wrapped by compile session */ c_frontend_free, c_extensions, (uint32_t)(sizeof c_extensions / sizeof c_extensions[0]), /* commit/abort: C has no durable cross-compile state yet */ NULL, NULL, - {true}, /* caps: C honors the common preprocess options (-I/-D/-U/...) 
*/ - NULL, /* parse_options: C has no frontend-specific flags */ - NULL, /* free_options */ + {true, KIT_FRONTEND_LTO_CG}, + NULL, /* parse_options: C has no frontend-specific flags */ + NULL, /* free_options */ }; diff --git a/lang/toy/compile.c b/lang/toy/compile.c @@ -134,21 +134,20 @@ static KitFrontendState* toy_frontend_new(KitCompiler* c) { return (KitFrontendState*)fe; } -static KitStatus toy_frontend_compile(KitFrontendState* frontend, - const KitFrontendCompileOptions* opts, - const KitSourceInput* input, - KitObjBuilder* out) { +static KitStatus toy_frontend_compile_cg(KitFrontendState* frontend, + const KitFrontendCompileOptions* opts, + const KitSourceInput* input, + KitCg* cg) { ToyFrontend* fe = (ToyFrontend*)frontend; KitCompiler* c; ToyParser* p; - KitCg* cg; const uint8_t* source; size_t source_len; - KitStatus st; + KitStatus st = KIT_OK; char* owned_source = NULL; size_t owned_source_cap = 0; - if (!fe || !fe->c || !opts || !input || !out) return KIT_INVALID; + if (!fe || !fe->c || !opts || !input || !cg) return KIT_INVALID; c = fe->c; (void)opts->language_options; /* toy frontend has no per-language options */ @@ -157,10 +156,6 @@ static KitStatus toy_frontend_compile(KitFrontendState* frontend, return KIT_ERR; } - st = kit_cg_new(c, &cg); - if (st == KIT_OK) st = kit_cg_begin_obj(cg, out, &opts->code); - if (st != KIT_OK) goto done_status; - if (!fe->parser_live) { toy_parser_init(&fe->parser, c, cg, &fe->module, source, source_len, input->name.s); @@ -184,24 +179,20 @@ static KitStatus toy_frontend_compile(KitFrontendState* frontend, toy_txn_begin(p); if (opts->input_kind != KIT_FRONTEND_INPUT_TRANSLATION_UNIT && !toy_seed_repl_symbols(p)) { - kit_cg_free(cg); st = KIT_ERR; goto done_status; } if (!toy_parse_program(p) || p->has_error) { - kit_cg_free(cg); st = KIT_ERR; goto done_status; } if (p->cur.kind != TOK_EOF) { toy_error(p, p->cur.loc, "unexpected token after program end"); - kit_cg_free(cg); st = KIT_ERR; goto done_status; } - 
kit_cg_free(cg); st = KIT_OK; done_status: @@ -238,13 +229,14 @@ static const KitSlice toy_extensions[] = {KIT_SLICE_LIT("toy")}; const KitFrontendVTable kit_toy_frontend_vtable = { toy_frontend_new, - toy_frontend_compile, + toy_frontend_compile_cg, + NULL, /* compile_obj: semantic frontends are wrapped by compile session */ toy_frontend_free, toy_extensions, (uint32_t)(sizeof toy_extensions / sizeof toy_extensions[0]), toy_frontend_commit, toy_frontend_abort, - {false}, /* caps: toy has no preprocessor */ - NULL, /* parse_options: no toy-specific flags yet */ - NULL, /* free_options */ + {false, KIT_FRONTEND_LTO_CG}, + NULL, /* parse_options: no toy-specific flags yet */ + NULL, /* free_options */ }; diff --git a/lang/wasm/cg.c b/lang/wasm/cg.c @@ -248,8 +248,8 @@ static uint64_t wasm_cg_field_offset(KitCompiler* c, KitCgTypeId ty, return off; } -static uint32_t wasm_cg_checked_add_u32(KitCompiler* c, uint32_t a, - uint32_t b, KitSrcLoc loc) { +static uint32_t wasm_cg_checked_add_u32(KitCompiler* c, uint32_t a, uint32_t b, + KitSrcLoc loc) { if (UINT32_MAX - a < b) wasm_error(c, loc, "wasm: module layout is too large"); return a + b; @@ -291,14 +291,14 @@ static void wasm_cg_build_runtime(KitCompiler* c, KitCgBuiltinTypes b, wasm_cg_checked_add_u32(c, instance_cap, m->nglobals, wasm_loc(0, 0)); if (m->ntables > UINT32_MAX / 2u) wasm_error(c, wasm_loc(0, 0), "wasm: module layout is too large"); - instance_cap = wasm_cg_checked_add_u32(c, instance_cap, 2u * m->ntables, - wasm_loc(0, 0)); + instance_cap = + wasm_cg_checked_add_u32(c, instance_cap, 2u * m->ntables, wasm_loc(0, 0)); instance_cap = wasm_cg_checked_add_u32(c, instance_cap, m->ndata, wasm_loc(0, 0)); if (m->nelems > UINT32_MAX / 2u) wasm_error(c, wasm_loc(0, 0), "wasm: module layout is too large"); - instance_cap = wasm_cg_checked_add_u32(c, instance_cap, 2u * m->nelems, - wasm_loc(0, 0)); + instance_cap = + wasm_cg_checked_add_u32(c, instance_cap, 2u * m->nelems, wasm_loc(0, 0)); instance_fields = 
instance_cap ? kit_arena_zarray(arena, KitCgField, instance_cap) : NULL;
@@ -1109,8 +1109,7 @@ static void wasm_cg_emit_host_import_metadata(KitCompiler* c, KitCg* cg,
   uint32_t nglobal_imports = 0;
   uint32_t desc_size = 2u * ptr_size + 16u;
   uint32_t type_desc_size =
-      2u * ptr_size + 8u +
-      (ptr_align > 4u ? 2u * (ptr_align - 4u) : 0u);
+      2u * ptr_size + 8u + (ptr_align > 4u ? 2u * (ptr_align - 4u) : 0u);
   KitCgSym nimports_sym, imports_sym;
   KitCgDecl decl;
   KitCgDataDefAttrs data_attrs;
@@ -1175,21 +1174,21 @@ static void wasm_cg_emit_host_import_metadata(KitCompiler* c, KitCg* cg,
   } WasmImportEmit;
   WasmImportEmit* descs = kit_arena_zarray(arena, WasmImportEmit, nimports);
   KitWasmMemoryImportDesc* memory_descs =
-      nmemory_imports ? kit_arena_zarray(arena, KitWasmMemoryImportDesc,
-                                         nmemory_imports)
-                      : NULL;
+      nmemory_imports
+          ? kit_arena_zarray(arena, KitWasmMemoryImportDesc, nmemory_imports)
+          : NULL;
   KitWasmTableImportDesc* table_descs =
       ntable_imports
           ? kit_arena_zarray(arena, KitWasmTableImportDesc, ntable_imports)
           : NULL;
   KitWasmGlobalImportDesc* global_descs =
-      nglobal_imports ? kit_arena_zarray(arena, KitWasmGlobalImportDesc,
-                                         nglobal_imports)
-                      : NULL;
-  uint32_t* type_remap = nfunc_imports
-                             ? kit_arena_zarray(arena, uint32_t,
-                                                m->ntypes ? m->ntypes : 1u)
-                             : NULL;
+      nglobal_imports
+          ? kit_arena_zarray(arena, KitWasmGlobalImportDesc, nglobal_imports)
+          : NULL;
+  uint32_t* type_remap =
+      nfunc_imports
+          ? kit_arena_zarray(arena, uint32_t, m->ntypes ? m->ntypes : 1u)
+          : NULL;
   uint32_t* local_to_module =
       nfunc_imports ? kit_arena_zarray(arena, uint32_t, nfunc_imports) : NULL;
   uint32_t ntypes = 0;
@@ -1235,9 +1234,8 @@ static void wasm_cg_emit_host_import_metadata(KitCompiler* c, KitCg* cg,
     max_pages = mem->has_max ? mem->max_pages : mem->min_pages;
     flags = (mem->shared ? KIT_WASM_MEMORY_SHARED : 0u) |
             (mem->is64 ? KIT_WASM_MEMORY_64 : 0u);
-    descs[d].module_sym =
-        wasm_cg_intern_cstr(cg, b,
-                            mem->import_module ? mem->import_module : "");
+    descs[d].module_sym = wasm_cg_intern_cstr(
+        cg, b, mem->import_module ? mem->import_module : "");
     descs[d].field_sym =
         wasm_cg_intern_cstr(cg, b, mem->import_name ? mem->import_name : "");
     descs[d].kind = KIT_WASM_IMPORT_MEMORY;
@@ -1307,8 +1305,8 @@ static void wasm_cg_emit_host_import_metadata(KitCompiler* c, KitCg* cg,
     KitCgSym types_sym;
     KitCgSym* param_syms = kit_arena_zarray(arena, KitCgSym, ntypes);
     KitCgSym* result_syms = kit_arena_zarray(arena, KitCgSym, ntypes);
-    KitCgTypeId types_array_ty = kit_cg_type_array(
-        c, u8_ty, (uint64_t)type_desc_size * ntypes);
+    KitCgTypeId types_array_ty =
+        kit_cg_type_array(c, u8_ty, (uint64_t)type_desc_size * ntypes);
     for (uint32_t k = 0; k < ntypes; ++k) {
       const WasmFuncType* t = &m->types[local_to_module[k]];
       uint8_t* pbuf =
@@ -1386,8 +1384,7 @@ static void wasm_cg_emit_host_import_metadata(KitCompiler* c, KitCg* cg,
     memset(&decl, 0, sizeof decl);
     decl.kind = KIT_CG_DECL_OBJECT;
     decl.linkage_name = kit_cg_c_linkage_name(
-        c, kit_sym_intern(c,
-                          KIT_SLICE_LIT("__kit_wasm_memory_import_types")));
+        c, kit_sym_intern(c, KIT_SLICE_LIT("__kit_wasm_memory_import_types")));
     decl.display_name = decl.linkage_name;
     decl.type = memory_array_ty;
     decl.sym.bind = KIT_SB_GLOBAL;
@@ -1583,9 +1580,8 @@ static void wasm_cg_emit_runtime_layout_metadata(KitCompiler* c, KitCg* cg,
   for (uint32_t i = 0; i < m->nmemories; ++i) {
     const WasmMemory* mem = &m->memories[i];
     uint64_t max_pages = mem->has_max ? mem->max_pages : mem->min_pages;
-    uint32_t flags =
-        (mem->shared ? KIT_WASM_MEMORY_SHARED : 0u) |
-        (mem->is64 ? KIT_WASM_MEMORY_64 : 0u);
+    uint32_t flags = (mem->shared ? KIT_WASM_MEMORY_SHARED : 0u) |
+                     (mem->is64 ? KIT_WASM_MEMORY_64 : 0u);
     kit_cg_data_align(cg, 8);
     kit_cg_data_int(cg, rt->memory_offset[i], u64_ty);
     kit_cg_data_int(cg, mem->min_pages, u64_ty);
@@ -1977,14 +1973,10 @@ static void wasm_cg_emit_table_copy_loop(
   kit_cg_label_place(cg, done);
 }

-static void wasm_cg_cache_funcref_entry(KitCompiler* c, KitCg* cg,
-                                        KitCgBuiltinTypes b,
-                                        const WasmCgRuntime* rt,
-                                        KitCgLocal ref_local,
-                                        KitCgLocal fn_local,
-                                        KitCgLocal typeidx_local,
-                                        KitCgMemAccess ref_mem,
-                                        KitCgMemAccess i32_mem) {
+static void wasm_cg_cache_funcref_entry(
+    KitCompiler* c, KitCg* cg, KitCgBuiltinTypes b, const WasmCgRuntime* rt,
+    KitCgLocal ref_local, KitCgLocal fn_local, KitCgLocal typeidx_local,
+    KitCgMemAccess ref_mem, KitCgMemAccess i32_mem) {
   KitCgLabel is_null = kit_cg_label_new(cg);
   KitCgLabel done = kit_cg_label_new(cg);
   kit_cg_push_local(cg, ref_local);
@@ -2022,18 +2014,12 @@ static void wasm_cg_cache_funcref_entry(KitCompiler* c, KitCg* cg,
   kit_cg_label_place(cg, done);
 }

-void wasm_emit_cg(KitCompiler* c, const KitCodeOptions* code_opts,
-                  KitObjBuilder* out, const WasmModule* m) {
-  KitCg* cg = NULL;
-  KitStatus cg_st = kit_cg_new(c, &cg);
-  if (cg_st == KIT_OK) cg_st = kit_cg_begin_obj(cg, out, code_opts);
-  if (cg_st != KIT_OK)
-    wasm_error(c, wasm_loc(0, 0), "wasm: failed to initialize codegen");
+void wasm_emit_cg_into(KitCompiler* c, KitCg* cg, const WasmModule* m) {
   KitCgBuiltinTypes b = kit_cg_builtin_types(c);
   WasmCgRuntime rt;
   /* A KitArena owns transient frontend-side codegen state — sym tables, func
    * types, per-function local arrays, instance-record field tables, call
-   * argument arrays. Lives for the duration of wasm_emit_cg; no fixed cap
+   * argument arrays. Lives for the duration of wasm_emit_cg_into; no fixed cap
    * on functions, params, locals, or instance fields. */
   KitArena* arena = NULL;
   KitCgSym init_sym = KIT_CG_SYM_NONE;
@@ -4003,6 +3989,22 @@ void wasm_emit_cg(KitCompiler* c, const KitCodeOptions* code_opts,
     kit_cg_func_end(cg);
     heap->free(heap, control, sizeof(WasmCgControl) * control_cap);
   }
-  kit_cg_free(cg);
   kit_arena_free(arena);
 }
+
+void wasm_emit_cg(KitCompiler* c, const KitCodeOptions* code_opts,
+                  KitObjBuilder* out, const WasmModule* m) {
+  KitCg* cg = NULL;
+  KitCgUnitOptions unit_opts;
+  KitStatus cg_st = kit_cg_new(c, &cg);
+  if (cg_st == KIT_OK) cg_st = kit_cg_begin(cg, out, code_opts);
+  memset(&unit_opts, 0, sizeof unit_opts);
+  if (cg_st == KIT_OK) cg_st = kit_cg_begin_unit(cg, &unit_opts);
+  if (cg_st != KIT_OK || !cg)
+    wasm_error(c, wasm_loc(0, 0), "wasm: failed to initialize codegen");
+  wasm_emit_cg_into(c, cg, m);
+  if (kit_cg_end_unit(cg) != KIT_OK || kit_cg_finish(cg, NULL) != KIT_OK ||
+      kit_cg_detach(cg) != KIT_OK)
+    wasm_error(c, wasm_loc(0, 0), "wasm: failed to finalize codegen");
+  kit_cg_free(cg);
+}
diff --git a/lang/wasm/wasm.c b/lang/wasm/wasm.c
@@ -1,14 +1,14 @@
+#include "wasm/wasm.h"
+
 #include <stdarg.h>
 #include <string.h>

-#include "wasm/wasm.h"
-
 /* Every KitWasmFeature bit — the frontend's default when no -mfeature flags
  * narrow it (and what wasm_module_init seeds, kept in sync). */
-#define WASM_FEATURES_ALL                                                      \
-  (KIT_WASM_FEATURE_THREADS | KIT_WASM_FEATURE_TYPED_FUNC_REFS |               \
-   KIT_WASM_FEATURE_TAIL_CALLS | KIT_WASM_FEATURE_MULTI_MEMORY |               \
-   KIT_WASM_FEATURE_MEMORY64 | KIT_WASM_FEATURE_BULK_MEMORY |                  \
+#define WASM_FEATURES_ALL                                        \
+  (KIT_WASM_FEATURE_THREADS | KIT_WASM_FEATURE_TYPED_FUNC_REFS | \
+   KIT_WASM_FEATURE_TAIL_CALLS | KIT_WASM_FEATURE_MULTI_MEMORY | \
+   KIT_WASM_FEATURE_MEMORY64 | KIT_WASM_FEATURE_BULK_MEMORY |    \
    KIT_WASM_FEATURE_NONTRAPPING_FTOI)

 static void wasm_parse_any(KitCompiler* c, KitSlice name, const KitSlice* input,
@@ -120,15 +120,15 @@ static void wasm_free_options(KitCompiler* c, void* opts) {
   if (opts) h->free(h, opts, sizeof(KitWasmCompileOptions));
 }

-static KitStatus wasm_frontend_compile(KitFrontendState* frontend,
-                                       const KitFrontendCompileOptions* opts,
-                                       const KitSourceInput* input,
-                                       KitObjBuilder* out) {
+static KitStatus wasm_frontend_compile_cg(KitFrontendState* frontend,
+                                          const KitFrontendCompileOptions* opts,
+                                          const KitSourceInput* input,
+                                          KitCg* cg) {
   WasmFrontend* fe = (WasmFrontend*)frontend;
   KitCompiler* c;
   WasmModule m;
   const KitWasmCompileOptions* wopts;
-  if (!fe || !fe->c || !opts || !input || !out) return KIT_INVALID;
+  if (!fe || !fe->c || !opts || !input || !cg) return KIT_INVALID;
   c = fe->c;
   wopts = (const KitWasmCompileOptions*)opts->language_options;
   wasm_module_init(&m, kit_compiler_context(c)->heap);
@@ -136,7 +136,7 @@ static KitStatus wasm_frontend_compile(KitFrontendState* frontend,
    * supplied parsed options. NULL keeps the default (run/dbg/cc paths). */
   if (wopts) m.features = wopts->features;
   wasm_parse_any(c, input->name, &input->bytes, &m);
-  wasm_emit_cg(c, &opts->code, out, &m);
+  wasm_emit_cg_into(c, cg, &m);
   wasm_module_free(&m);
   return KIT_OK;
 }
@@ -154,13 +154,14 @@ static const KitSlice wasm_extensions[] = {KIT_SLICE_LIT("wat"),

 const KitFrontendVTable kit_wasm_frontend_vtable = {
     wasm_frontend_new,
-    wasm_frontend_compile,
+    wasm_frontend_compile_cg,
+    NULL, /* compile_obj: semantic frontends are wrapped by compile session */
     wasm_frontend_free,
     wasm_extensions,
     (uint32_t)(sizeof wasm_extensions / sizeof wasm_extensions[0]),
-    NULL,    /* commit: wasm has no durable cross-compile state */
-    NULL,    /* abort */
-    {false}, /* caps: wasm has no preprocessor */
+    NULL, /* commit: wasm has no durable cross-compile state */
+    NULL, /* abort */
+    {false, KIT_FRONTEND_LTO_CG},
     wasm_parse_options,
     wasm_free_options,
 };
diff --git a/mk/test.mk b/mk/test.mk
@@ -738,7 +738,7 @@ test-macho: lib $(TEST_RT_DEP) $(ROUNDTRIP_BIN_MACHO) $(LINK_EXE_RUNNER) $(JIT_R
 OPT_TEST_BIN = build/test/cg_ir_lower_test
 TINY_INLINE_TEST_BIN = build/test/tiny_inline_test

-test-opt: bin $(OPT_TEST_BIN) test-opt-tiny-inline test-opt-inline test-opt-zero-arg test-opt-static-prune-aa64 test-opt-aa64-tail test-opt-prologue-tier
+test-opt: bin $(OPT_TEST_BIN) test-opt-tiny-inline test-opt-inline test-opt-zero-arg test-opt-static-prune-aa64 test-opt-aa64-tail test-opt-prologue-tier test-opt-whole-program-inline test-opt-lto-phase1
 	$(OPT_TEST_BIN)

@@ -769,6 +769,16 @@ test-opt-aa64-tail: bin
 test-opt-prologue-tier: bin
 	@KIT=$(abspath $(BIN)) bash test/opt/prologue_tier.sh

+# Whole-program (LTO Phase 0) cross-function inlining: a small static callee
+# fuses into its caller at -O1 on every arch, and opt_inline actually fires.
+.PHONY: test-opt-whole-program-inline
+test-opt-whole-program-inline: bin
+	@KIT=$(abspath $(BIN)) bash test/opt/whole_program_inline.sh
+
+.PHONY: test-opt-lto-phase1
+test-opt-lto-phase1: bin
+	@KIT=$(abspath $(BIN)) bash test/opt/lto_phase1.sh
+
 test-parse: test-parse-ok test-parse-err

 test-parse-ok: lib $(TEST_RT_DEP) $(PARSE_RUNNER) $(ROUNDTRIP_BIN) $(LINK_EXE_RUNNER) $(JIT_RUNNER)
diff --git a/src/abi/abi.h b/src/abi/abi.h
@@ -119,6 +119,12 @@ typedef struct ABIFuncInfo {
   u16 nparams;
   u8 variadic;
   u8 has_sret;
+  /* True when the sret (indirect-result) pointer is passed in the first
+   * integer argument register and therefore consumes that arg slot — SysV-x64
+   * (rdi), Win64 (rcx), RISC-V (a0). ABIs that return it in a dedicated
+   * register (AArch64 x8) leave this 0. Lets generic code reason about arg-slot
+   * consumption from the ABI descriptor instead of by arch identity. */
+  u8 sret_consumes_int_arg;
   /* True when the trailing `...` portion of a variadic call must be
    * routed to the stack exclusively, bypassing the GPR/FPR arg pools.
    * Apple ARM64 sets this; AAPCS64 / SysV-x64 leave it 0 (variadics
diff --git a/src/abi/abi_aapcs64.c b/src/abi/abi_aapcs64.c
@@ -125,6 +125,10 @@ ABIFuncInfo* aapcs64_compute_func_info(TargetABI* a, KitCgTypeId fn) {
   classify_one(a, cg_func_ret_type(fnty), &info->ret, /*is_return=*/1);
   info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
+  /* AArch64 returns the sret pointer in the dedicated x8 register, so it never
+   * consumes an x0..x7 argument slot. (memset above already cleared the field;
+   * set explicitly for documentation.) */
+  info->sret_consumes_int_arg = 0;
   info->variadic = fnty->func.abi_variadic;
   info->nparams = (u16)fnty->func.nparams;
diff --git a/src/abi/abi_rv64.c b/src/abi/abi_rv64.c
@@ -305,6 +305,9 @@ static ABIFuncInfo* riscv_compute_func_info(TargetABI* a, KitCgTypeId fn) {
   classify_one(a, cg_func_ret_type(fnty), &info->ret, /*is_return=*/1);
   info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
+  /* RISC-V passes the sret pointer in a0 (the first integer arg register),
+   * consuming that slot. */
+  info->sret_consumes_int_arg = info->has_sret;
   info->variadic = fnty->func.abi_variadic;
   info->nparams = (u16)fnty->func.nparams;
diff --git a/src/abi/abi_sysv_x64.c b/src/abi/abi_sysv_x64.c
@@ -240,6 +240,9 @@ static ABIFuncInfo* sysv_x64_compute_func_info(TargetABI* a, KitCgTypeId fn) {
   classify_one(a, cg_func_ret_type(fnty), &info->ret, /*is_return=*/1);
   info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
+  /* SysV-x64 passes the sret pointer in rdi (the first integer arg register),
+   * consuming that slot. */
+  info->sret_consumes_int_arg = info->has_sret;
   info->variadic = fnty->func.abi_variadic;
   info->nparams = (u16)fnty->func.nparams;
diff --git a/src/abi/abi_win64_x64.c b/src/abi/abi_win64_x64.c
@@ -155,6 +155,9 @@ static ABIFuncInfo* win64_x64_compute_func_info(TargetABI* a, KitCgTypeId fn) {
   classify_one(a, cg_func_ret_type(fnty), &info->ret, /*is_return=*/1);
   info->has_sret = (info->ret.kind == ABI_ARG_INDIRECT) ? 1 : 0;
+  /* Win64 passes the sret pointer in rcx (the first integer arg register),
+   * consuming that slot. */
+  info->sret_consumes_int_arg = info->has_sret;
   info->variadic = fnty->func.abi_variadic;
   info->nparams = (u16)fnty->func.nparams;
diff --git a/src/api/compile.c b/src/api/compile.c
@@ -2,6 +2,7 @@
  * that drive the C, asm, and registered-frontend paths. */

 #include <kit/compile.h>
+#include <kit/cg.h>
 #include <kit/core.h>

 #include <string.h>
@@ -43,15 +44,16 @@ static const KitSlice asm_extensions[] = {KIT_SLICE_LIT("s")};

 const KitFrontendVTable kit_asm_frontend_vtable = {
     asm_frontend_new,
+    NULL, /* compile_cg: asm participates in LTO as an opaque object */
     asm_frontend_compile,
     asm_frontend_free,
     asm_extensions,
     (uint32_t)(sizeof asm_extensions / sizeof asm_extensions[0]),
-    NULL,    /* commit: asm has no durable cross-compile state */
-    NULL,    /* abort */
-    {false}, /* caps: raw asm, no preprocessor (.S cpp is a driver concern) */
-    NULL,    /* parse_options: no asm-specific flags */
-    NULL,    /* free_options */
+    NULL, /* commit: asm has no durable cross-compile state */
+    NULL, /* abort */
+    {false, KIT_FRONTEND_LTO_OPAQUE},
+    NULL, /* parse_options: no asm-specific flags */
+    NULL, /* free_options */
 };

 static _Noreturn void panic_bad_options(Compiler* c, const char* msg) {
@@ -117,9 +119,17 @@ KitStatus kit_register_frontend(KitCompiler* c, KitLanguage lang,
                                 const KitFrontendVTable* vtable) {
   if (!c) return KIT_INVALID;
   if ((unsigned)lang >= KIT_LANG_COUNT) return KIT_INVALID;
-  if (vtable &&
-      (!vtable->new_frontend || !vtable->compile || !vtable->free_frontend)) {
-    return KIT_INVALID;
+  if (vtable) {
+    uint8_t mode = vtable->caps.lto_mode;
+    if (!vtable->new_frontend || !vtable->free_frontend ||
+        mode > KIT_FRONTEND_LTO_OPAQUE) {
+      return KIT_INVALID;
+    }
+    if (mode == KIT_FRONTEND_LTO_CG) {
+      if (!vtable->compile_cg) return KIT_INVALID;
+    } else if (!vtable->compile_obj) {
+      return KIT_INVALID;
+    }
   }
   c->frontends[lang] = vtable;
   return KIT_OK;
@@ -177,8 +187,8 @@ static const KitFrontendVTable* frontend_for_language(Compiler* c,
   return c->frontends[lang];
 }

-static void validate_bytes(Compiler* c, const KitSourceInput* in);
-static KitStatus compile_frontend_state_into(
+static KitStatus compile_obj_finalize(Compiler* c, ObjBuilder* ob);
+static KitStatus compile_frontend_state_obj_into(
     Compiler* c, const KitFrontendVTable* vtable, KitFrontendState* frontend,
     const KitFrontendCompileOptions* opts, const KitSourceInput* input,
     ObjBuilder* ob);
@@ -224,10 +234,10 @@ static void kit_frontend_abort(KitFrontend* frontend) {
   }
 }

-static KitStatus kit_frontend_compile(KitFrontend* frontend,
-                                      const KitFrontendCompileOptions* opts,
-                                      const KitSourceInput* input,
-                                      KitObjBuilder* out) {
+static KitStatus kit_frontend_compile_obj(KitFrontend* frontend,
+                                          const KitFrontendCompileOptions* opts,
+                                          const KitSourceInput* input,
+                                          KitObjBuilder* out) {
   Compiler* c;
   PanicSave saved;
   KitStatus st;
@@ -237,6 +247,7 @@ static KitStatus kit_frontend_compile(KitFrontend* frontend,
     return KIT_INVALID;
   }
   if (input->lang != frontend->lang) return KIT_INVALID;
+  if (!frontend->vtable->compile_obj) return KIT_UNSUPPORTED;
   c = (Compiler*)frontend->c;
   compiler_panic_save(c, &saved);
   if (setjmp(c->panic)) {
@@ -252,8 +263,8 @@ static KitStatus kit_frontend_compile(KitFrontend* frontend,
   validate_bytes(c, input);
   metrics_scope_begin(c, "compile.tu");
   metrics_count(c, "compile.input_bytes", (u64)input->bytes.len);
-  st = compile_frontend_state_into(c, frontend->vtable, frontend->state, opts,
-                                   input, (ObjBuilder*)out);
+  st = compile_frontend_state_obj_into(c, frontend->vtable, frontend->state,
+                                       opts, input, (ObjBuilder*)out);
   metrics_scope_end(c, "compile.tu");
   /* On a soft diagnostic failure, roll back the staged transaction here so the
    * frontend is left exactly as it was before this compile. On success the
@@ -263,6 +274,43 @@ static KitStatus kit_frontend_compile(KitFrontend* frontend,
   return st;
 }

+static KitStatus kit_frontend_compile_cg(KitFrontend* frontend,
+                                         const KitFrontendCompileOptions* opts,
+                                         const KitSourceInput* input,
+                                         KitCg* cg) {
+  Compiler* c;
+  PanicSave saved;
+  KitStatus st;
+
+  if (!frontend || !frontend->c || !frontend->vtable || !frontend->state ||
+      !opts || !input || !cg) {
+    return KIT_INVALID;
+  }
+  if (input->lang != frontend->lang) return KIT_INVALID;
+  if (frontend->vtable->caps.lto_mode != KIT_FRONTEND_LTO_CG ||
+      !frontend->vtable->compile_cg) {
+    return KIT_UNSUPPORTED;
+  }
+  c = (Compiler*)frontend->c;
+  compiler_panic_save(c, &saved);
+  if (setjmp(c->panic)) {
+    compiler_run_cleanups(c);
+    kit_frontend_abort(frontend);
+    compiler_panic_restore(c, &saved);
+    return KIT_ERR;
+  }
+  validate_bytes(c, input);
+  metrics_scope_begin(c, "compile.tu");
+  metrics_count(c, "compile.input_bytes", (u64)input->bytes.len);
+  metrics_scope_begin(c, "compile.frontend");
+  st = frontend->vtable->compile_cg(frontend->state, opts, input, cg);
+  metrics_scope_end(c, "compile.frontend");
+  metrics_scope_end(c, "compile.tu");
+  if (st != KIT_OK) kit_frontend_abort(frontend);
+  compiler_panic_restore(c, &saved);
+  return st;
+}
+
 static void kit_frontend_free(KitFrontend* frontend) {
   Heap* h;
   if (!frontend) return;
@@ -301,10 +349,12 @@ KitStatus kit_compile_session_new(KitCompiler* c,
   return KIT_OK;
 }

-/* Shared compile path. On failure the frontend transaction has already been
- * rolled back by kit_frontend_compile and *out is NULL. On success, when
- * commit_on_success is set (the batch path), the transaction is committed
- * before returning; otherwise it is left open for the caller to resolve. */
+/* Shared object-producing compile path. Opaque frontends compile directly into
+ * the object builder; semantic frontends use the same borrowed KitCg lifecycle
+ * as LTO with a single source unit. On failure the frontend transaction is
+ * rolled back and *out is NULL. On success, when commit_on_success is set (the
+ * batch path), the transaction is committed before returning; otherwise it is
+ * left open for the caller to resolve. */
 static KitStatus compile_session_run(KitCompileSession* s,
                                      const KitSourceInput* input,
                                      KitObjBuilder** out,
@@ -322,8 +372,27 @@ static KitStatus compile_session_run(KitCompileSession* s,
   opts = s->opts;
   opts.input_kind = input->input_kind;
   opts.repl_entry_name = input->repl_entry_name;
-  st = kit_frontend_compile(s->frontend, &opts, input, (KitObjBuilder*)ob);
+  if (s->frontend->vtable->caps.lto_mode == KIT_FRONTEND_LTO_CG) {
+    KitCg* cg = NULL;
+    KitCgUnitOptions uopts;
+    st = kit_cg_new(s->c, &cg);
+    if (st == KIT_OK) st = kit_cg_begin(cg, (KitObjBuilder*)ob, &opts.code);
+    memset(&uopts, 0, sizeof uopts);
+    uopts.source_name = input->name;
+    if (st == KIT_OK) st = kit_cg_begin_unit(cg, &uopts);
+    if (st == KIT_OK)
+      st = kit_frontend_compile_cg(s->frontend, &opts, input, cg);
+    if (st == KIT_OK) st = kit_cg_end_unit(cg);
+    if (st == KIT_OK) st = kit_cg_finish(cg, NULL);
+    if (st == KIT_OK) st = kit_cg_detach(cg);
+    kit_cg_free(cg);
+    if (st == KIT_OK) st = compile_obj_finalize((Compiler*)s->c, ob);
+  } else {
+    st = kit_frontend_compile_obj(s->frontend, &opts, input,
+                                  (KitObjBuilder*)ob);
+  }
   if (st != KIT_OK) {
+    kit_frontend_abort(s->frontend);
     obj_free(ob);
     return st;
   }
@@ -338,6 +407,36 @@ KitStatus kit_compile_session_compile(KitCompileSession* s,
   return compile_session_run(s, input, out, /*commit_on_success=*/1);
 }

+KitStatus kit_compile_session_compile_cg(KitCompileSession* s,
+                                         const KitSourceInput* input,
+                                         KitCg* cg) {
+  KitFrontendCompileOptions opts;
+  KitStatus st;
+  int unit_open = 0;
+
+  if (!s || !s->c || !s->frontend || !input || !cg) return KIT_INVALID;
+  if (input->lang != s->lang) return KIT_INVALID;
+  opts = s->opts;
+  opts.input_kind = input->input_kind;
+  opts.repl_entry_name = input->repl_entry_name;
+  {
+    KitCgUnitOptions uopts;
+    memset(&uopts, 0, sizeof uopts);
+    uopts.source_name = input->name;
+    st = kit_cg_begin_unit(cg, &uopts);
+  }
+  if (st == KIT_OK) unit_open = 1;
+  if (st == KIT_OK) st = kit_frontend_compile_cg(s->frontend, &opts, input, cg);
+  if (st == KIT_OK) st = kit_cg_end_unit(cg);
+  if (st == KIT_OK) {
+    unit_open = 0;
+    kit_frontend_commit(s->frontend);
+  } else if (unit_open) {
+    (void)kit_cg_abort(cg);
+  }
+  return st;
+}
+
 KitStatus kit_compile_session_stage(KitCompileSession* s,
                                     const KitSourceInput* input,
                                     KitObjBuilder** out) {
@@ -360,27 +459,30 @@ void kit_compile_session_free(KitCompileSession* s) {
   h->free(h, s, sizeof(*s));
 }

-static KitStatus compile_frontend_state_into(
+static KitStatus compile_obj_finalize(Compiler* c, ObjBuilder* ob) {
+  metrics_scope_begin(c, "compile.obj_finalize");
+  obj_finalize(ob);
+  metrics_scope_end(c, "compile.obj_finalize");
+  metrics_count(c, "compile.obj_sections", obj_section_count(ob));
+  metrics_count(c, "compile.obj_relocs", obj_reloc_total(ob));
+  return KIT_OK;
+}
+
+static KitStatus compile_frontend_state_obj_into(
     Compiler* c, const KitFrontendVTable* vtable, KitFrontendState* frontend,
     const KitFrontendCompileOptions* opts, const KitSourceInput* input,
     ObjBuilder* ob) {
   KitStatus st;

   metrics_scope_begin(c, "compile.frontend");
-  st = vtable->compile(frontend, opts, input, ob);
+  st = vtable->compile_obj(frontend, opts, input, ob);
   metrics_scope_end(c, "compile.frontend");
   /* Ordinary diagnostic failure: fail softly with the status the frontend
    * already reported. No synthetic fatal, and do not finalize a half-built
-   * object. Genuine internal failures panic from inside vtable->compile and
+   * object. Genuine internal failures panic from inside compile_obj and
    * never reach here. */
   if (st != KIT_OK) return st;
-
-  metrics_scope_begin(c, "compile.obj_finalize");
-  obj_finalize(ob);
-  metrics_scope_end(c, "compile.obj_finalize");
-  metrics_count(c, "compile.obj_sections", obj_section_count(ob));
-  metrics_count(c, "compile.obj_relocs", obj_reloc_total(ob));
-  return KIT_OK;
+  return compile_obj_finalize(c, ob);
 }

 /* ============================================================
diff --git a/src/api/link.c b/src/api/link.c
@@ -19,6 +19,8 @@
 #include <setjmp.h>
 #include <string.h>

+#include "cg/internal.h"
+#include "cg/ir_recorder.h"
 #include "core/core.h"
 #include "link/link_internal.h"

@@ -227,6 +229,242 @@ KitStatus kit_link_session_add_dso_bytes(KitLinkSession* s, KitSlice name,
   return link_session_guard(s, link_session_add_dso_bytes_inner, &arg);
 }

+typedef struct LinkLtoPreserveArg {
+  KitObjBuilder* lto_obj;
+  KitCg* lto_cg;
+  KitLinkLtoPreservedCallback cb;
+  void* user;
+} LinkLtoPreserveArg;
+
+typedef struct LinkLtoRefMark {
+  ObjSymId sym;
+  u8 referenced;
+  u8 pad[3];
+} LinkLtoRefMark;
+
+typedef struct LinkLtoRefMarks {
+  Compiler* c;
+  ObjBuilder* ob;
+  LinkLtoRefMark* marks;
+  u32 nmarks;
+  u32 cap;
+} LinkLtoRefMarks;
+
+static int link_lto_sym_is_logical_undef(const ObjSym* s) {
+  return s && s->section_id == OBJ_SEC_NONE && s->kind != SK_ABS &&
+         s->kind != SK_COMMON;
+}
+
+static int link_lto_sym_is_preservable_def(const ObjSym* s) {
+  return s && !s->removed && s->name != 0 && s->bind != SB_LOCAL &&
+         link_sym_is_def(s);
+}
+
+static void link_lto_preserve_name(LinkLtoPreserveArg* a, Sym name) {
+  ObjSymIter* it;
+  ObjSymEntry e;
+  if (!a || !name) return;
+  it = obj_symiter_new((ObjBuilder*)a->lto_obj);
+  while (it && obj_symiter_next(it, &e)) {
+    const ObjSym* s = e.sym;
+    if (!s || s->name != name) continue;
+    if (link_lto_sym_is_preservable_def(s)) a->cb(a->user, (KitCgSym)e.id);
+  }
+  if (it) obj_symiter_free(it);
+}
+
+static int link_lto_sym_in_preserved_section(ObjBuilder* ob, ObjSymId sym,
+                                             const ObjSym* s) {
+  const Section* sec;
+  const ObjAtom* atom;
+  ObjAtomId aid;
+  if (!ob || !s) return 0;
+  if (s->section_id == OBJ_SEC_NONE) return 0;
+  sec = obj_section_get(ob, s->section_id);
+  if (sec && ((sec->flags & SF_RETAIN) || sec->sem == SSEM_INIT_ARRAY ||
+              sec->sem == SSEM_FINI_ARRAY || sec->sem == SSEM_PREINIT_ARRAY))
+    return 1;
+  aid = obj_atom_find_symbol(ob, sym);
+  atom = obj_atom_get(ob, aid);
+  return atom && (atom->flags & OBJ_ATOM_RETAIN);
+}
+
+static void link_lto_refmarks_add(LinkLtoRefMarks* marks, ObjSymId sym,
+                                  const ObjSym* s) {
+  Heap* h;
+  LinkLtoRefMark* nm;
+  u32 ncap;
+  if (!marks || sym == OBJ_SYM_NONE || !s) return;
+  for (u32 i = 0; i < marks->nmarks; ++i)
+    if (marks->marks[i].sym == sym) return;
+  if (marks->nmarks == marks->cap) {
+    h = marks->c->ctx->heap;
+    ncap = marks->cap ? marks->cap * 2u : 32u;
+    nm = (LinkLtoRefMark*)h->realloc(h, marks->marks,
+                                     sizeof(*marks->marks) * marks->cap,
+                                     sizeof(*marks->marks) * ncap,
+                                     _Alignof(LinkLtoRefMark));
+    if (!nm)
+      compiler_panic(marks->c, SRCLOC_NONE,
+                     "link: oom on LTO semantic-ref marks");
+    marks->marks = nm;
+    marks->cap = ncap;
+  }
+  marks->marks[marks->nmarks].sym = sym;
+  marks->marks[marks->nmarks].referenced = s->referenced ? 1u : 0u;
+  marks->nmarks++;
+}
+
+static void link_lto_mark_refset(ObjBuilder* ob, const ObjSymSet* refs,
+                                 LinkLtoRefMarks* marks) {
+  if (!ob || !refs || !refs->cap) return;
+  for (u32 i = 0; i < refs->cap; ++i) {
+    ObjSymId sym = refs->slots[i].k;
+    const ObjSym* s;
+    if (sym == OBJ_SYM_NONE) continue;
+    s = obj_symbol_get(ob, sym);
+    if (link_lto_sym_is_logical_undef(s)) {
+      link_lto_refmarks_add(marks, sym, s);
+      obj_sym_mark_referenced(ob, sym);
+    }
+  }
+}
+
+static int link_lto_module_has_asm(const CgIrModule* module) {
+  if (!module) return 0;
+  if (module->nfile_scope_asms) return 1;
+  for (u32 i = 0; i < module->nfuncs; ++i) {
+    const CgIrFunc* f = module->funcs[i];
+    if (!f || f->removed) continue;
+    for (u32 k = 0; k < f->ninsts; ++k)
+      if (f->insts[k].op == CG_IR_ASM_BLOCK) return 1;
+  }
+  return 0;
+}
+
+static void link_lto_mark_semantic_refs(LinkLtoPreserveArg* a,
+                                        LinkLtoRefMarks* marks) {
+  ObjBuilder* ob = (ObjBuilder*)a->lto_obj;
+  const CgIrModule* module;
+  if (!a->lto_cg || !a->lto_cg->target) return;
+  module = cg_ir_recorder_module(a->lto_cg->target);
+  if (!module) return;
+  for (u32 i = 0; i < module->nfuncs; ++i) {
+    const CgIrFunc* f = module->funcs[i];
+    if (!f || f->removed) continue;
+    link_lto_mark_refset(ob, &f->call_refs, marks);
+    link_lto_mark_refset(ob, &f->global_refs, marks);
+  }
+}
+
+static void link_lto_refmarks_restore(LinkLtoRefMarks* marks) {
+  if (!marks || !marks->ob) return;
+  for (u32 i = 0; i < marks->nmarks; ++i) {
+    obj_sym_set_referenced(marks->ob, marks->marks[i].sym,
+                           marks->marks[i].referenced);
+  }
+}
+
+static void link_lto_refmarks_fini(LinkLtoRefMarks* marks) {
+  Heap* h;
+  if (!marks || !marks->marks) return;
+  h = marks->c->ctx->heap;
+  h->free(h, marks->marks, sizeof(*marks->marks) * marks->cap);
+  memset(marks, 0, sizeof(*marks));
+}
+
+static void link_lto_preserve_intrinsic_roots(KitLinkSession* s,
+                                              LinkLtoPreserveArg* a) {
+  ObjBuilder* ob = (ObjBuilder*)a->lto_obj;
+  const CgIrModule* module = NULL;
+  int preserve_all_nonlocal = 0;
+  ObjSymIter* it;
+  ObjSymEntry e;
+
+  if (a->lto_cg && a->lto_cg->target)
+    module = cg_ir_recorder_module(a->lto_cg->target);
+
+  preserve_all_nonlocal = s->opts.output_kind != KIT_LINK_OUTPUT_EXE ||
+                          link_lto_module_has_asm(module);
+  if (s->opts.output_kind == KIT_LINK_OUTPUT_SHARED) preserve_all_nonlocal = 1;
+
+  it = obj_symiter_new(ob);
+  while (it && obj_symiter_next(it, &e)) {
+    const ObjSym* os = e.sym;
+    if (!link_lto_sym_is_preservable_def(os)) continue;
+    if (preserve_all_nonlocal || os->bind == SB_WEAK || os->kind == SK_IFUNC ||
+        (os->flags & KIT_CG_SYM_USED) ||
+        link_lto_sym_in_preserved_section(ob, e.id, os)) {
+      a->cb(a->user, (KitCgSym)e.id);
+    }
+  }
+  if (it) obj_symiter_free(it);
+
+  if (s->linker->entry_name) link_lto_preserve_name(a, s->linker->entry_name);
+  for (u32 i = 0; i < s->opts.nexports; ++i) {
+    const KitSlice* ex = &s->opts.exports[i];
+    if (ex->s && ex->len)
+      link_lto_preserve_name(
+          a,
+          pool_intern_slice(s->c->global, (Slice){.s = ex->s, .len = ex->len}));
+  }
+}
+
+static void link_lto_preserve_opaque_undef_refs(KitLinkSession* s,
+                                                LinkLtoPreserveArg* a) {
+  u32 ninputs = LinkInputs_count(&s->linker->inputs);
+  for (u32 ii = 0; ii < ninputs; ++ii) {
+    LinkInput* in = LinkInputs_at(&s->linker->inputs, ii);
+    ObjSymIter* it;
+    ObjSymEntry e;
+    if (!in || !in->obj || in->obj == (ObjBuilder*)a->lto_obj) continue;
+    it = obj_symiter_new(in->obj);
+    while (it && obj_symiter_next(it, &e)) {
+      const ObjSym* os = e.sym;
+      if (!os || os->name == 0 || os->bind == SB_LOCAL) continue;
+      if (link_sym_is_spurious_undef(os)) continue;
+      if (!link_lto_sym_is_logical_undef(os)) continue;
+      link_lto_preserve_name(a, os->name);
+    }
+    if (it) obj_symiter_free(it);
+  }
+}
+
+static void link_session_visit_lto_preserved_inner(KitLinkSession* s,
+                                                   void* arg) {
+  LinkLtoPreserveArg* a = (LinkLtoPreserveArg*)arg;
+  LinkLtoRefMarks marks;
+  memset(&marks, 0, sizeof marks);
+  marks.c = s->c;
+  marks.ob = (ObjBuilder*)a->lto_obj;
+  if (s->opts.output_kind != KIT_LINK_OUTPUT_RELOCATABLE) {
+    /* Archive selection needs pre-finish semantic refs, but those refs may
+     * disappear after LTO internalization/DCE. Borrow ObjSym::referenced only
+     * for archive ingestion, then restore it before CG finish. */
+    link_lto_mark_semantic_refs(a, &marks);
+    link_ingest_archives(s->linker);
+    link_lto_refmarks_restore(&marks);
+    link_lto_refmarks_fini(&marks);
+  }
+  link_lto_preserve_intrinsic_roots(s, a);
+  link_lto_preserve_opaque_undef_refs(s, a);
+}
+
+KitStatus kit_link_session_visit_lto_preserved(KitLinkSession* s,
+                                               KitObjBuilder* lto_obj,
+                                               KitCg* lto_cg,
+                                               KitLinkLtoPreservedCallback cb,
+                                               void* user) {
+  LinkLtoPreserveArg arg;
+  if (!s || !lto_obj || !lto_cg || !cb || s->resolved) return KIT_INVALID;
+  memset(&arg, 0, sizeof arg);
+  arg.lto_obj = lto_obj;
+  arg.lto_cg = lto_cg;
+  arg.cb = cb;
+  arg.user = user;
+  return link_session_guard(s, link_session_visit_lto_preserved_inner, &arg);
+}
+
 static void link_session_resolve_inner(KitLinkSession* s, void* arg) {
   (void)arg;
   if ((KitLinkOutputKind)s->opts.output_kind == KIT_LINK_OUTPUT_RELOCATABLE) {
diff --git a/src/arch/aa64/arch.c b/src/arch/aa64/arch.c
@@ -172,6 +172,69 @@ static KitStatus aa64_target_feature_apply_isa(const Target* target,
   return KIT_UNSUPPORTED;
 }

+/* AArch64 emits AAPCS (and the target-C convention) regardless of OS; it has
+ * no SysV/Win64/WASM variant. */
+static int aa64_supports_call_conv(const Compiler* c, KitCgCallConv cc) {
+  (void)c;
+  switch (cc) {
+    case KIT_CG_CC_TARGET_C:
+    case KIT_CG_CC_AAPCS:
+      return 1;
+    case KIT_CG_CC_SYSV:
+    case KIT_CG_CC_WIN64:
+    case KIT_CG_CC_WASM:
+    case KIT_CG_CC_INTERRUPT:
+      return 0;
+  }
+  return 0;
+}
+
+/* Capability twin of aa_intrinsic (src/arch/aa64/native.c); keep the two in
+ * sync. No default case, so a new KitCgIntrinsic trips -Wswitch here. */
+static int aa64_supports_intrinsic(const Compiler* c, KitCgIntrinsic intrin) {
+  (void)c;
+  switch (intrin) {
+    case KIT_CG_INTRIN_TRAP:
+    case KIT_CG_INTRIN_CLZ:
+    case KIT_CG_INTRIN_CTZ:
+    case KIT_CG_INTRIN_POPCOUNT:
+    case KIT_CG_INTRIN_BSWAP:
+    case KIT_CG_INTRIN_SADD_OVERFLOW:
+    case KIT_CG_INTRIN_UADD_OVERFLOW:
+    case KIT_CG_INTRIN_SSUB_OVERFLOW:
+    case KIT_CG_INTRIN_USUB_OVERFLOW:
+    case KIT_CG_INTRIN_SMUL_OVERFLOW:
+    case KIT_CG_INTRIN_UMUL_OVERFLOW:
+    case KIT_CG_INTRIN_PREFETCH:
+    case KIT_CG_INTRIN_EXPECT:
+    case KIT_CG_INTRIN_ASSUME_ALIGNED:
+    case KIT_CG_INTRIN_CPU_NOP:
+    case KIT_CG_INTRIN_CPU_YIELD:
+    case KIT_CG_INTRIN_ISB:
+    case KIT_CG_INTRIN_DMB:
+    case KIT_CG_INTRIN_DSB:
+    case KIT_CG_INTRIN_WFI:
+    case KIT_CG_INTRIN_WFE:
+    case KIT_CG_INTRIN_SEV:
+    case KIT_CG_INTRIN_IRQ_SAVE:
+    case KIT_CG_INTRIN_IRQ_RESTORE:
+    case KIT_CG_INTRIN_IRQ_ENABLE:
+    case KIT_CG_INTRIN_IRQ_DISABLE:
+      return 1;
+    case KIT_CG_INTRIN_SETJMP:
+    case KIT_CG_INTRIN_LONGJMP:
+    case KIT_CG_INTRIN_FMA:
+    case KIT_CG_INTRIN_SYSCALL:
+    case KIT_CG_INTRIN_DCACHE_CLEAN:
+    case KIT_CG_INTRIN_DCACHE_INVALIDATE:
+    case KIT_CG_INTRIN_DCACHE_CLEAN_INVALIDATE:
+    case KIT_CG_INTRIN_ICACHE_INVALIDATE:
+    case KIT_CG_INTRIN_CORO_SWITCH:
+      return 0;
+  }
+  return 0;
+}
+
 const ArchImpl arch_impl_aa64 = {
     .backend = {.name = "aa64", .make = aa64_backend_make},
     .kind = KIT_ARCH_ARM_64,
@@ -202,4 +265,8 @@ const ArchImpl arch_impl_aa64 = {
     .cfi_data_align_factor = -8,
     .cfi_cfa_init_reg = 31u,
     .cfi_cfa_init_offset = 0,
+    .backend_features = KIT_CG_BACKEND_STRICT_ALIGNMENT,
+    .atomic_lock_free_max = 8u,
+    .supports_call_conv = aa64_supports_call_conv,
+    .supports_intrinsic = aa64_supports_intrinsic,
 };
diff --git a/src/arch/aa64/link.c b/src/arch/aa64/link.c
@@ -201,6 +201,21 @@ static int aa64_is_direct_page_reloc(RelocKind kind) {
   }
 }

+/* AArch64 __chkstk for PE/COFF: probes `x15 * 16` bytes of stack one page at a
+ * time, then returns. Mirrors the LLVM compiler-rt implementation (chkstk.S in
+ * builtins/aarch64). 28 bytes. x64 needs no equivalent — it emits inline stack
+ * probes. link_synth_coff_ctor_dtor_list emits these bytes into a retained
+ * .text$chkstk section for COFF targets that carry them. */
+static const u8 aa64_coff_chkstk[28] = {
+    0xf0, 0xed, 0x7c, 0xd3, /* lsl x16, x15, #4 */
+    0xf1, 0x03, 0x00, 0x91, /* mov x17, sp */
+    0x31, 0x06, 0x40, 0xd1, /* sub x17, x17, #0x1, lsl #12 */
+    0x10, 0x06, 0x40, 0xf1, /* subs x16, x16, #0x1, lsl #12 */
+    0x3f, 0x02, 0x40, 0xf9, /* ldr xzr, [x17] */
+    0xac, 0xff, 0xff, 0x54, /* b.gt #-0x14 */
+    0xc0, 0x03, 0x5f, 0xd6, /* ret */
+};
+
 const LinkArchDesc link_arch_aa64 = {
     .plt0_size = AA64_PLT0_SIZE,
     .plt_entry_size = AA64_PLT_ENTRY_SIZE,
@@ -215,4 +230,7 @@ const LinkArchDesc link_arch_aa64 = {
     .is_tlvp_reloc = aa64_is_tlvp_reloc,
     .is_direct_page_reloc = aa64_is_direct_page_reloc,
     .needs_jit_call_stub = aa64_is_branch_reloc,
+
+    .coff_chkstk_bytes = aa64_coff_chkstk,
+    .coff_chkstk_len = sizeof aa64_coff_chkstk,
 };
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -292,6 +292,34 @@ typedef struct ArchImpl {
   i32 cfi_data_align_factor;
   u32 cfi_cfa_init_reg;
   i32 cfi_cfa_init_offset;
+
+  /* === Generic-layer capability queries =====================================
+   * Let generic (non-backend) code in src/cg and src/link decide by capability
+   * instead of by arch identity (target.arch == KIT_ARCH_*). Each backend
+   * declares its answer here once. */
+
+  /* Backend codegen capability bitmask (KitCgBackendFeatureFlag). Per-arch
+   * constant: the x86 family sets UNALIGNED_MEMORY|RED_ZONE|SIMD, every other
+   * arch sets STRICT_ALIGNMENT. Read via kit_cg_target_backend_features. */
+  u64 backend_features;
+
+  /* Largest power-of-two byte width this arch lowers as a lock-free native
+   * atomic: 8 for aa64/x64/rv64/wasm, 4 for rv32 (no lr.d/sc.d/amo*.d). The
+   * single source of truth for kit_cg_atomic_is_lock_free and the C front-end's
+   * __atomic_always_lock_free. */
+  u32 atomic_lock_free_max;
+
+  /* 1 if call convention `cc` is selectable for this compiler's (arch, os).
+   * KIT_CG_CC_TARGET_C is handled generically (always 1); INTERRUPT is 0. May
+   * read c->target.os (a property, not arch identity). Read via
+   * kit_cg_target_supports_call_conv. */
+  int (*supports_call_conv)(const Compiler* c, KitCgCallConv cc);
+
+  /* 1 if this arch has a legal lowering for `intrin`. Kept in sync with the
+   * backend's IntrinKind lowering switch (x64_intrinsic / aa_intrinsic /
+   * rv_intrinsic / wasm_intrinsic). Read via kit_cg_target_supports_intrinsic.
+   */
+  int (*supports_intrinsic)(const Compiler* c, KitCgIntrinsic intrin);
 } ArchImpl;

 const ArchImpl* arch_lookup(KitArchKind);
diff --git a/src/arch/cgtarget.c b/src/arch/cgtarget.c
@@ -22,13 +22,19 @@ CgTarget* cgtarget_new(Compiler* c, ObjBuilder* o) {
   }
 }

+void cgtarget_set_finish_policy(CgTarget* t, const CgFinishPolicy* policy) {
+  if (!t) return;
+  memset(&t->finish_policy, 0, sizeof(t->finish_policy));
+  if (policy) t->finish_policy = *policy;
+}
+
 void cgtarget_finalize(CgTarget* t) {
   if (t && t->finalize) t->finalize(t);
 }

 void cgtarget_free(CgTarget* t) {
   if (!t) return;
-  /* Arena-backed; nothing to free. */
+  if (t->destroy) t->destroy(t);
 }

 KitStatus cg_mc_debug_new(Compiler* c, ObjBuilder* o,
diff --git a/src/arch/riscv/arch.c b/src/arch/riscv/arch.c
@@ -236,7 +236,8 @@ static KitStatus rv64_target_feature_apply_isa(const Target* target,
   const char* p;
   const char* end;
   const RiscvVariant* v = riscv_variant_for_kind(target->arch);
-  if (isa.len < 5 || memcmp(isa.s, v->isa_prefix, 4) != 0) return KIT_UNSUPPORTED;
+  if (isa.len < 5 || memcmp(isa.s, v->isa_prefix, 4) != 0)
+    return KIT_UNSUPPORTED;
   p = isa.s + 4;
   end = isa.s + isa.len;
   rv64_feature_disable_all(words, nwords);
@@ -309,7 +310,8 @@ static void rv64_target_feature_defaults(const Target* target, u64* words,
   rv64_feature_set(words, nwords, RV64_FEAT_F);
   /* rv32 default profile is rv32imafc_zicsr_zifencei (ilp32f hard-single) —
    * no D. rv64 keeps the full G+C (lp64d) profile including D. */
-  if (target->arch != KIT_ARCH_RV32) rv64_feature_set(words, nwords, RV64_FEAT_D);
+  if (target->arch != KIT_ARCH_RV32)
+    rv64_feature_set(words, nwords, RV64_FEAT_D);
   rv64_feature_set(words, nwords, RV64_FEAT_C);
   rv64_feature_set(words, nwords, RV64_FEAT_ZICSR);
   rv64_feature_set(words, nwords, RV64_FEAT_ZIFENCEI);
@@ -346,6 +348,71 @@ static CgTarget* rv64_semantic_target_new(Compiler* c, ObjBuilder* o,
   return native_direct_target_new(c, o, &cfg);
 }

+/* RISC-V emits only the target-C convention; it has no SysV/Win64/AAPCS/WASM
+ * variant. Shared by rv64 and rv32 (one backend, one answer). */
+static int rv64_supports_call_conv(const Compiler* c, KitCgCallConv cc) {
+  (void)c;
+  switch (cc) {
+    case KIT_CG_CC_TARGET_C:
+      return 1;
+    case KIT_CG_CC_SYSV:
+    case KIT_CG_CC_WIN64:
+    case KIT_CG_CC_AAPCS:
+    case KIT_CG_CC_WASM:
+    case KIT_CG_CC_INTERRUPT:
+      return 0;
+  }
+  return 0;
+}
+
+/* Capability twin of rv_intrinsic (src/arch/riscv/native.c); keep the two in
+ * sync. rv32 and rv64 share one backend, so they share this answer (the old
+ * type.c matrix normalized rv32->rv64 for exactly this reason).
No default + * case, so a new KitCgIntrinsic trips -Wswitch here. */ +static int rv64_supports_intrinsic(const Compiler* c, KitCgIntrinsic intrin) { + (void)c; + switch (intrin) { + case KIT_CG_INTRIN_TRAP: + case KIT_CG_INTRIN_CLZ: + case KIT_CG_INTRIN_CTZ: + case KIT_CG_INTRIN_POPCOUNT: + case KIT_CG_INTRIN_BSWAP: + case KIT_CG_INTRIN_SADD_OVERFLOW: + case KIT_CG_INTRIN_UADD_OVERFLOW: + case KIT_CG_INTRIN_SSUB_OVERFLOW: + case KIT_CG_INTRIN_USUB_OVERFLOW: + case KIT_CG_INTRIN_SMUL_OVERFLOW: + case KIT_CG_INTRIN_UMUL_OVERFLOW: + case KIT_CG_INTRIN_PREFETCH: + case KIT_CG_INTRIN_EXPECT: + case KIT_CG_INTRIN_ASSUME_ALIGNED: + case KIT_CG_INTRIN_CPU_NOP: + case KIT_CG_INTRIN_CPU_YIELD: + case KIT_CG_INTRIN_ISB: + case KIT_CG_INTRIN_DMB: + case KIT_CG_INTRIN_DSB: + case KIT_CG_INTRIN_WFI: + return 1; + case KIT_CG_INTRIN_SETJMP: + case KIT_CG_INTRIN_LONGJMP: + case KIT_CG_INTRIN_FMA: + case KIT_CG_INTRIN_SYSCALL: + case KIT_CG_INTRIN_IRQ_SAVE: + case KIT_CG_INTRIN_IRQ_RESTORE: + case KIT_CG_INTRIN_IRQ_DISABLE: + case KIT_CG_INTRIN_IRQ_ENABLE: + case KIT_CG_INTRIN_WFE: + case KIT_CG_INTRIN_SEV: + case KIT_CG_INTRIN_DCACHE_CLEAN: + case KIT_CG_INTRIN_DCACHE_INVALIDATE: + case KIT_CG_INTRIN_DCACHE_CLEAN_INVALIDATE: + case KIT_CG_INTRIN_ICACHE_INVALIDATE: + case KIT_CG_INTRIN_CORO_SWITCH: + return 0; + } + return 0; +} + const ArchImpl arch_impl_rv64 = { .backend = {.name = "rv64", .make = rv64_backend_make}, .kind = KIT_ARCH_RV64, @@ -380,6 +447,10 @@ const ArchImpl arch_impl_rv64 = { .cfi_data_align_factor = -8, .cfi_cfa_init_reg = 2u, .cfi_cfa_init_offset = 0, + .backend_features = KIT_CG_BACKEND_STRICT_ALIGNMENT, + .atomic_lock_free_max = 8u, + .supports_call_conv = rv64_supports_call_conv, + .supports_intrinsic = rv64_supports_intrinsic, }; /* RV32 shares nearly all of the RISC-V backend with rv64 — the per-XLEN @@ -421,4 +492,9 @@ const ArchImpl arch_impl_rv32 = { .cfi_data_align_factor = -4, .cfi_cfa_init_reg = 2u, .cfi_cfa_init_offset = 0, + .backend_features = 
KIT_CG_BACKEND_STRICT_ALIGNMENT, + /* rv32 has no native 64-bit atomics (no lr.d/sc.d/amo*.d). */ + .atomic_lock_free_max = 4u, + .supports_call_conv = rv64_supports_call_conv, + .supports_intrinsic = rv64_supports_intrinsic, }; diff --git a/src/arch/wasm/arch.c b/src/arch/wasm/arch.c @@ -71,6 +71,71 @@ static CgTarget* wasm_backend_make(Compiler* c, ObjBuilder* o, return wasm_cgtarget_new(c, o, NULL); } +/* wasm32 emits the target-C convention and its own WASM convention; no + * SysV/Win64/AAPCS. */ +static int wasm_supports_call_conv(const Compiler* c, KitCgCallConv cc) { + (void)c; + switch (cc) { + case KIT_CG_CC_TARGET_C: + case KIT_CG_CC_WASM: + return 1; + case KIT_CG_CC_SYSV: + case KIT_CG_CC_WIN64: + case KIT_CG_CC_AAPCS: + case KIT_CG_CC_INTERRUPT: + return 0; + } + return 0; +} + +/* Capability twin of wasm_intrinsic (src/arch/wasm/emit.c); keep the two in + * sync. wasm lowers only the portable intrinsics — the CPU/barrier/baremetal + * forms have no wasm lowering (emit.c panics on them). No default case, so a + * new KitCgIntrinsic trips -Wswitch here.
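The `supports_*` hooks replace `target.arch == KIT_ARCH_*` checks with a per-backend answer. A reduced sketch of the pattern follows; `Intrin`, `ArchCaps`, and the two descriptors are hypothetical stand-ins for `KitCgIntrinsic` and `ArchImpl`, not kit's real types. The switches deliberately omit `default`, so compiling with `-Wall` (which enables `-Wswitch`) flags any enumerator a backend forgets to classify.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-ins for KitCgIntrinsic and ArchImpl. */
typedef enum { INTRIN_TRAP, INTRIN_WFI, INTRIN_CPU_NOP } Intrin;

typedef struct ArchCaps {
  const char* name;
  int (*supports_intrinsic)(Intrin);
} ArchCaps;

/* No default case: adding a member to Intrin makes -Wswitch warn here. */
static int rv_like_supports(Intrin i) {
  switch (i) {
    case INTRIN_TRAP:
    case INTRIN_WFI:
    case INTRIN_CPU_NOP:
      return 1;
  }
  return 0;
}

static int wasm_like_supports(Intrin i) {
  switch (i) {
    case INTRIN_TRAP:
      return 1;
    case INTRIN_WFI:
    case INTRIN_CPU_NOP:
      return 0; /* no wasm lowering */
  }
  return 0;
}

static const ArchCaps rv_like = {"rv64", rv_like_supports};
static const ArchCaps wasm_like = {"wasm", wasm_like_supports};

/* Generic code consults the descriptor instead of comparing arch identity. */
static int caps_supports(const ArchCaps* a, Intrin i) {
  return a && a->supports_intrinsic && a->supports_intrinsic(i);
}
```

Keeping each answer next to its backend means a new target only fills in its own descriptor; no generic file has to learn its name.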
*/ +static int wasm_supports_intrinsic(const Compiler* c, KitCgIntrinsic intrin) { + (void)c; + switch (intrin) { + case KIT_CG_INTRIN_TRAP: + case KIT_CG_INTRIN_CLZ: + case KIT_CG_INTRIN_CTZ: + case KIT_CG_INTRIN_POPCOUNT: + case KIT_CG_INTRIN_BSWAP: + case KIT_CG_INTRIN_SADD_OVERFLOW: + case KIT_CG_INTRIN_UADD_OVERFLOW: + case KIT_CG_INTRIN_SSUB_OVERFLOW: + case KIT_CG_INTRIN_USUB_OVERFLOW: + case KIT_CG_INTRIN_SMUL_OVERFLOW: + case KIT_CG_INTRIN_UMUL_OVERFLOW: + case KIT_CG_INTRIN_PREFETCH: + case KIT_CG_INTRIN_EXPECT: + case KIT_CG_INTRIN_ASSUME_ALIGNED: + return 1; + case KIT_CG_INTRIN_SETJMP: + case KIT_CG_INTRIN_LONGJMP: + case KIT_CG_INTRIN_FMA: + case KIT_CG_INTRIN_SYSCALL: + case KIT_CG_INTRIN_IRQ_SAVE: + case KIT_CG_INTRIN_IRQ_RESTORE: + case KIT_CG_INTRIN_IRQ_DISABLE: + case KIT_CG_INTRIN_IRQ_ENABLE: + case KIT_CG_INTRIN_DMB: + case KIT_CG_INTRIN_DSB: + case KIT_CG_INTRIN_ISB: + case KIT_CG_INTRIN_DCACHE_CLEAN: + case KIT_CG_INTRIN_DCACHE_INVALIDATE: + case KIT_CG_INTRIN_DCACHE_CLEAN_INVALIDATE: + case KIT_CG_INTRIN_ICACHE_INVALIDATE: + case KIT_CG_INTRIN_CPU_NOP: + case KIT_CG_INTRIN_CPU_YIELD: + case KIT_CG_INTRIN_WFI: + case KIT_CG_INTRIN_WFE: + case KIT_CG_INTRIN_SEV: + case KIT_CG_INTRIN_CORO_SWITCH: + return 0; + } + return 0; +} + const ArchImpl arch_impl_wasm = { .backend = {.name = "wasm", .make = wasm_backend_make}, .kind = KIT_ARCH_WASM, @@ -92,4 +157,9 @@ const ArchImpl arch_impl_wasm = { .register_index = NULL, .register_count = NULL, .register_at = NULL, + .backend_features = KIT_CG_BACKEND_STRICT_ALIGNMENT, + /* wasm32 has 4-byte pointers but lowers 8-byte (i64) atomics lock-free. 
*/ + .atomic_lock_free_max = 8u, + .supports_call_conv = wasm_supports_call_conv, + .supports_intrinsic = wasm_supports_intrinsic, }; diff --git a/src/arch/x64/arch.c b/src/arch/x64/arch.c @@ -141,6 +141,70 @@ static CgTarget* x64_semantic_target_new(Compiler* c, ObjBuilder* o, return native_direct_target_new(c, o, &cfg); } +/* Which explicit calling conventions x86-64 can emit. SysV and Win64 split on + * the OS (a property, not arch identity); TARGET_C is always available. */ +static int x64_supports_call_conv(const Compiler* c, KitCgCallConv cc) { + switch (cc) { + case KIT_CG_CC_TARGET_C: + return 1; + case KIT_CG_CC_SYSV: + return c->target.os != KIT_OS_WINDOWS; + case KIT_CG_CC_WIN64: + return c->target.os == KIT_OS_WINDOWS; + case KIT_CG_CC_AAPCS: + case KIT_CG_CC_WASM: + case KIT_CG_CC_INTERRUPT: + return 0; + } + return 0; +} + +/* Capability twin of x64_intrinsic (src/arch/x64/native.c); keep the two in + * sync. No default case, so a new KitCgIntrinsic trips -Wswitch here. */ +static int x64_supports_intrinsic(const Compiler* c, KitCgIntrinsic intrin) { + (void)c; + switch (intrin) { + case KIT_CG_INTRIN_TRAP: + case KIT_CG_INTRIN_CLZ: + case KIT_CG_INTRIN_CTZ: + case KIT_CG_INTRIN_POPCOUNT: + case KIT_CG_INTRIN_BSWAP: + case KIT_CG_INTRIN_SADD_OVERFLOW: + case KIT_CG_INTRIN_UADD_OVERFLOW: + case KIT_CG_INTRIN_SSUB_OVERFLOW: + case KIT_CG_INTRIN_USUB_OVERFLOW: + case KIT_CG_INTRIN_SMUL_OVERFLOW: + case KIT_CG_INTRIN_UMUL_OVERFLOW: + case KIT_CG_INTRIN_PREFETCH: + case KIT_CG_INTRIN_EXPECT: + case KIT_CG_INTRIN_ASSUME_ALIGNED: + case KIT_CG_INTRIN_CPU_NOP: + case KIT_CG_INTRIN_CPU_YIELD: + case KIT_CG_INTRIN_DMB: + case KIT_CG_INTRIN_DSB: + case KIT_CG_INTRIN_IRQ_ENABLE: + case KIT_CG_INTRIN_IRQ_DISABLE: + return 1; + case KIT_CG_INTRIN_SETJMP: + case KIT_CG_INTRIN_LONGJMP: + case KIT_CG_INTRIN_FMA: + case KIT_CG_INTRIN_SYSCALL: + case KIT_CG_INTRIN_IRQ_SAVE: + case KIT_CG_INTRIN_IRQ_RESTORE: + case KIT_CG_INTRIN_ISB: + case KIT_CG_INTRIN_WFI: + case 
KIT_CG_INTRIN_WFE: + case KIT_CG_INTRIN_SEV: + case KIT_CG_INTRIN_DCACHE_CLEAN: + case KIT_CG_INTRIN_DCACHE_INVALIDATE: + case KIT_CG_INTRIN_DCACHE_CLEAN_INVALIDATE: + case KIT_CG_INTRIN_ICACHE_INVALIDATE: + case KIT_CG_INTRIN_CORO_SWITCH: + return 0; + } + return 0; +} + const ArchImpl arch_impl_x64 = { .backend = {.name = "x64", .make = x64_backend_make}, .kind = KIT_ARCH_X86_64, @@ -173,4 +237,9 @@ const ArchImpl arch_impl_x64 = { .cfi_data_align_factor = -8, .cfi_cfa_init_reg = 7u, .cfi_cfa_init_offset = 8, + .backend_features = KIT_CG_BACKEND_UNALIGNED_MEMORY | + KIT_CG_BACKEND_RED_ZONE | KIT_CG_BACKEND_SIMD, + .atomic_lock_free_max = 8u, + .supports_call_conv = x64_supports_call_conv, + .supports_intrinsic = x64_supports_intrinsic, }; diff --git a/src/cg/atomic.c b/src/cg/atomic.c @@ -1,3 +1,4 @@ +#include "arch/arch.h" #include "cg/internal.h" MemAccess api_mem_for_atomic(KitCg* g, KitCgTypeId val_ty) { @@ -17,12 +18,13 @@ MemAccess api_mem_for_atomic(KitCg* g, KitCgTypeId val_ty) { return ma; } -/* Native (lock-free) atomic ceiling for the target. Most targets — aa64, x64, - * rv64, wasm32 — lower 8-byte (i64-width) atomics lock-free. rv32 has no native - * 64-bit atomic instructions (lr.d/sc.d/amo*.d are RV64-only), so 8-byte - * atomics there must go through the libatomic spinlock shim. The distinguishing - * property is a 4-byte general-purpose register / pointer width that is NOT - * wasm32 (wasm32 has 4-byte pointers but 8-byte atomics, so we test the arch). +/* Native (lock-free) atomic ceiling for the target, read from the arch backend + * descriptor (ArchImpl.atomic_lock_free_max). Most targets — aa64, x64, rv64, + * wasm32 — lower 8-byte (i64-width) atomics lock-free. rv32 reports 4: it has + * no native 64-bit atomic instructions (lr.d/sc.d/amo*.d are RV64-only), so + * 8-byte atomics there must go through the libatomic spinlock shim. 
(wasm32 has + * 4-byte pointers but still reports 8 — this is a per-arch capability, not a + * pointer-width test.) * * NOTE: this predicate is the single source of truth shared with the C * front-end's __atomic_always_lock_free / __atomic_is_lock_free builtins (they @@ -32,9 +34,8 @@ MemAccess api_mem_for_atomic(KitCg* g, KitCgTypeId val_ty) { * so the shim takes the spinlock path instead of recursing into an illegal * native 8-byte atomic. */ static u32 cg_atomic_lock_free_max(KitCompiler* c) { - if (c->target.ptr_size == 4 && c->target.arch != KIT_ARCH_WASM) - return 4u; /* rv32 and other 32-bit non-wasm targets */ - return CG_MAX_ATOMIC_SIZE; + const ArchImpl* a = arch_for_compiler(c); + return a ? a->atomic_lock_free_max : CG_MAX_ATOMIC_SIZE; } int kit_cg_atomic_is_legal(KitCompiler* c, KitCgMemAccess access, @@ -63,20 +64,27 @@ int kit_cg_atomic_is_lock_free(KitCompiler* c, KitCgMemAccess access) { * is exactly the 8-byte-on-a-4-byte-target case (rv32). */ static int cg_atomic_needs_libcall(KitCg* g, KitCgTypeId val_ty) { return abi_cg_sizeof(g->c->abi, val_ty) == 8 && - g->c->target.ptr_size == 4 && g->c->target.arch != KIT_ARCH_WASM; + cg_atomic_lock_free_max(g->c) < 8u; } /* Map a KitCgAtomicOp to the libatomic __atomic_fetch_<op>_8 / __atomic_*_8 * entry point. XCHG maps to __atomic_exchange_8. 
*/ static const char* cg_atomic_rmw_libcall_8(KitCgAtomicOp op) { switch (op) { - case KIT_CG_ATOMIC_XCHG: return "__atomic_exchange_8"; - case KIT_CG_ATOMIC_ADD: return "__atomic_fetch_add_8"; - case KIT_CG_ATOMIC_SUB: return "__atomic_fetch_sub_8"; - case KIT_CG_ATOMIC_AND: return "__atomic_fetch_and_8"; - case KIT_CG_ATOMIC_OR: return "__atomic_fetch_or_8"; - case KIT_CG_ATOMIC_XOR: return "__atomic_fetch_xor_8"; - case KIT_CG_ATOMIC_NAND: return "__atomic_fetch_nand_8"; + case KIT_CG_ATOMIC_XCHG: + return "__atomic_exchange_8"; + case KIT_CG_ATOMIC_ADD: + return "__atomic_fetch_add_8"; + case KIT_CG_ATOMIC_SUB: + return "__atomic_fetch_sub_8"; + case KIT_CG_ATOMIC_AND: + return "__atomic_fetch_and_8"; + case KIT_CG_ATOMIC_OR: + return "__atomic_fetch_or_8"; + case KIT_CG_ATOMIC_XOR: + return "__atomic_fetch_xor_8"; + case KIT_CG_ATOMIC_NAND: + return "__atomic_fetch_nand_8"; } return NULL; } @@ -85,8 +93,8 @@ static const char* cg_atomic_rmw_libcall_8(KitCgAtomicOp op) { * api_runtime_helper (wide.c) but without its 3-param ceiling, which the * 5-argument __atomic_compare_exchange_8 needs. 
*/ static KitCgSym cg_atomic_runtime_sym(KitCg* g, const char* name, - KitCgTypeId ret, const KitCgTypeId* params, - u32 nparams) { + KitCgTypeId ret, + const KitCgTypeId* params, u32 nparams) { KitCgFuncParam ps[5]; KitCgFuncResult result; KitCgFuncSig sig; @@ -105,7 +113,8 @@ static KitCgSym cg_atomic_runtime_sym(KitCg* g, const char* name, memset(&decl, 0, sizeof decl); decl.kind = KIT_CG_DECL_FUNC; decl.linkage_name = kit_cg_c_linkage_name( - (KitCompiler*)g->c, pool_intern_slice(g->c->global, slice_from_cstr(name))); + (KitCompiler*)g->c, + pool_intern_slice(g->c->global, slice_from_cstr(name))); decl.display_name = decl.linkage_name; decl.type = kit_cg_type_func((KitCompiler*)g->c, sig); decl.sym.bind = KIT_SB_GLOBAL; @@ -217,7 +226,8 @@ void kit_cg_atomic_rmw(KitCg* g, KitCgMemAccess access, KitCgAtomicOp op, KitCgTypeId ps[3]; ApiSValue args[3]; if (!name) { - compiler_panic(g->c, g->cur_loc, "KitCg: unsupported 8-byte atomic rmw op"); + compiler_panic(g->c, g->cur_loc, + "KitCg: unsupported 8-byte atomic rmw op"); return; } ps[0] = pty; diff --git a/src/cg/cgtarget.h b/src/cg/cgtarget.h @@ -346,8 +346,11 @@ typedef struct CGFuncDesc { SrcLoc loc; u32 flags; /* CGFuncDescFlag */ KitCgInlinePolicy inline_policy; + u16 sym_bind; /* SymBind */ + u16 sym_kind; /* SymKind */ + u8 sym_vis; /* SymVis */ u8 atomize; - u8 pad[3]; + u8 pad[2]; } CGFuncDesc; typedef enum CGCallFlag { @@ -468,6 +471,14 @@ typedef struct CGDebugLoc { * Debug producer without this header depending on debug/debug.h. */ typedef struct Debug Debug; +typedef struct CgFinishPolicy { + u8 output_kind; /* KitCgOutputKind */ + u8 interposition_policy; /* KitCgInterpositionPolicy */ + u8 pad[2]; + const ObjSymId* preserved_symbols; + u32 npreserved_symbols; +} CgFinishPolicy; + typedef struct CgTarget CgTarget; struct CgTarget { /* Typed IR lowering context. Subclasses extend. */ @@ -480,6 +491,8 @@ struct CgTarget { * shares the same object for line-row emission. 
*/ Debug* debug; + CgFinishPolicy finish_policy; + /* ---- function lifecycle ---- */ void (*func_begin)(CgTarget*, const CGFuncDesc*); void (*func_end)(CgTarget*); @@ -776,6 +789,7 @@ struct CgTarget { void cg_lower_switch_default(CgTarget* t, const CGSwitchDesc* desc); CgTarget* cgtarget_new(Compiler*, ObjBuilder*); +void cgtarget_set_finish_policy(CgTarget*, const CgFinishPolicy*); void cgtarget_finalize(CgTarget*); void cgtarget_free(CgTarget*); diff --git a/src/cg/data.c b/src/cg/data.c @@ -1,8 +1,120 @@ #include "cg/internal.h" #include "core/vec.h" +#include "obj/symresolve.h" static void api_data_tls_write_zero(KitCg* g, uint64_t size); +static SymAttrs api_data_sym_attrs(const ObjSym* s) { + SymAttrs a; + memset(&a, 0, sizeof a); + if (!s) return a; + a.bind = s->bind; + a.kind = s->kind; + a.size = s->size; + a.common_align = (s->kind == SK_COMMON) ? (u32)s->common_align : 0u; + a.in_comdat = 0; + return a; +} + +static SymAttrs api_data_decl_attrs(Compiler* c, const KitCgDecl* decl, + uint64_t size, uint32_t common_align) { + SymAttrs a; + memset(&a, 0, sizeof a); + if (!decl) return a; + a.bind = api_map_bind(decl->sym.bind); + a.kind = (decl->as.object.flags & KIT_CG_OBJ_TLS) ? 
SK_TLS : SK_OBJ; + a.size = size; + a.common_align = common_align; + a.in_comdat = 0; + (void)c; + return a; +} + +static void api_data_clear_state(KitCg* g) { + if (!g) return; + g->data_sec = OBJ_SEC_NONE; + g->data_sym = OBJ_SYM_NONE; + g->data_base = 0; + g->data_size = 0; + g->data_atomize = 0; + g->data_retain = 0; + g->data_local_static_target = 0; + g->data_discard = 0; +} + +static void api_data_discard_begin(KitCg* g, ObjSymId sym) { + if (!g) return; + g->data_sec = OBJ_SEC_NONE; + g->data_sym = sym; + g->data_base = 0; + g->data_size = 0; + g->data_atomize = 0; + g->data_retain = 0; + g->data_local_static_target = 0; + g->data_discard = 1; +} + +static int api_data_section_is_isolated(const Section* sec, const ObjSym* sym) { + if (!sec || !sym || sym->section_id == OBJ_SEC_NONE || sym->value != 0) + return 0; + if (sec->kind == SEC_BSS || sec->sem == SSEM_NOBITS) + return sec->bss_size == sym->size; + return sec->bytes.total == sym->size; +} + +static void api_data_remove_existing_if_isolated(KitCg* g, const ObjSym* sym) { + const Section* sec; + if (!g || !sym || sym->section_id == OBJ_SEC_NONE) return; + sec = obj_section_get(g->obj, sym->section_id); + if (api_data_section_is_isolated(sec, sym)) + obj_section_remove(g->obj, sym->section_id); +} + +static void api_data_apply_symbol_attrs(KitCg* g, ObjSymId sym, + const KitCgDecl* decl) { + ObjSym* osym; + if (!g || sym == OBJ_SYM_NONE || !decl) return; + osym = (ObjSym*)obj_symbol_get(g->obj, sym); + if (!osym) return; + osym->bind = api_map_bind(decl->sym.bind); + osym->vis = api_map_vis(decl->sym.visibility); + osym->kind = (decl->as.object.flags & KIT_CG_OBJ_TLS) ? SK_TLS : SK_OBJ; + osym->common_align = 0; +} + +/* A symbol already defined by the *current* source unit is a same-TU + * re-definition — legal C tentative-definition coalescing (`int g; int g;`, + * `int g; int g = 5;`, `int arr[]; int arr[3];`). 
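For the cross-TU case, `symresolve_merge` chooses among replace, keep, common-merge, and ODR-error outcomes; its actual rules live in src/obj/symresolve.c and are not part of this excerpt. The sketch below illustrates conventional ELF-style precedence instead. A strong definition beats a common symbol, and a common symbol beats a weak definition (per the System V gABI). Two commons merge to the larger size and alignment, and two strong definitions are an error. All names are hypothetical, and COMDAT handling is omitted.

```c
#include <assert.h>

/* Hypothetical, simplified symbol attributes (COMDAT omitted). */
typedef enum { BIND_WEAK, BIND_GLOBAL } Bind;
typedef enum { KIND_OBJ, KIND_COMMON } Kind;
typedef struct {
  Bind bind;
  Kind kind;
  unsigned long size;
  unsigned align;
} Attrs;

typedef enum {
  MERGE_REPLACE,
  MERGE_KEEP_EXISTING,
  MERGE_COMMON,
  MERGE_ODR_ERROR
} MergeKind;
typedef struct {
  MergeKind kind;
  unsigned long size;
  unsigned align;
} Merge;

/* Precedence: strong definition (2) > common (1) > weak definition (0). */
static int def_rank(Attrs a) {
  if (a.kind == KIND_COMMON) return 1;
  return a.bind == BIND_GLOBAL ? 2 : 0;
}

static Merge merge_defs(Attrs old, Attrs neu) {
  Merge m = {MERGE_KEEP_EXISTING, old.size, old.align};
  int ro = def_rank(old), rn = def_rank(neu);
  if (ro == 2 && rn == 2) {
    m.kind = MERGE_ODR_ERROR; /* two strong definitions */
  } else if (ro == 1 && rn == 1) {
    m.kind = MERGE_COMMON; /* commons coalesce to the largest size/alignment */
    if (neu.size > m.size) m.size = neu.size;
    if (neu.align > m.align) m.align = neu.align;
  } else if (rn > ro) {
    m.kind = MERGE_REPLACE; /* the incoming definition outranks the old one */
    m.size = neu.size;
    m.align = neu.align;
  }
  return m; /* otherwise keep the existing definition (e.g. weak vs weak) */
}

static const Attrs kStrong = {BIND_GLOBAL, KIND_OBJ, 4, 4};
static const Attrs kWeak = {BIND_WEAK, KIND_OBJ, 4, 4};
static const Attrs kCommon4 = {BIND_GLOBAL, KIND_COMMON, 4, 4};
static const Attrs kCommon8 = {BIND_GLOBAL, KIND_COMMON, 8, 8};
```

Under these rules, a tentative `int g;` emitted as a common symbol (as with `-fcommon`) in one unit yields to `int g = 5;` in another. Two initialized `int g = 1;` definitions in different units are rejected.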
Those re-emit through the + * legacy last-writer-wins path; only a definition contributed by a *different* + * unit (cross-TU LTO staging) is resolved via symresolve_merge. */ +static int api_data_defined_this_unit(const KitCg* g, ObjSymId sym) { + if (!g || g->cur_unit_seq == 0 || sym == OBJ_SYM_NONE) return 0; + if (sym >= g->sym_def_seq_cap) return 0; + return g->sym_def_seq[sym] == g->cur_unit_seq; +} + +static void api_data_mark_defined_unit(KitCg* g, ObjSymId sym) { + Heap* h; + u32* na; + u32 cap; + if (!g || g->cur_unit_seq == 0 || sym == OBJ_SYM_NONE) return; + if (sym >= g->sym_def_seq_cap) { + h = g->c->ctx->heap; + cap = g->sym_def_seq_cap ? g->sym_def_seq_cap : 16u; + while (cap <= sym) cap *= 2u; + na = (u32*)h->alloc(h, sizeof(*na) * cap, _Alignof(u32)); + if (!na) return; + memset(na, 0, sizeof(*na) * cap); + if (g->sym_def_seq) { + memcpy(na, g->sym_def_seq, sizeof(*na) * g->sym_def_seq_cap); + h->free(h, g->sym_def_seq, sizeof(*g->sym_def_seq) * g->sym_def_seq_cap); + } + g->sym_def_seq = na; + g->sym_def_seq_cap = cap; + } + g->sym_def_seq[sym] = g->cur_unit_seq; +} + static void api_data_tls_ensure_materialized(KitCg* g) { if (!g || !g->data_tls_collect || !g->data_tls_zero_fill) return; if (g->data_size) api_data_tls_write_zero(g, g->data_size); @@ -78,6 +190,28 @@ void kit_cg_data_begin(KitCg* g, KitCgSym cg_sym, KitCgDataDefAttrs attrs) { decl_attrs = api_sym_attrs(g, cg_sym); align = attrs.align ? 
attrs.align : (u32)abi_cg_alignof(c->abi, decl_attrs.type); + if (sym != OBJ_SYM_NONE && !api_data_defined_this_unit(g, sym)) { + const ObjSym* existing = obj_symbol_get(ob, sym); + if (symresolve_sym_is_def(existing)) { + SymAttrs old_attrs = api_data_sym_attrs(existing); + SymAttrs new_attrs = + api_data_decl_attrs(c, &decl_attrs, abi_cg_sizeof(c->abi, ty), 0); + SymMergeResult mr = symresolve_merge(old_attrs, new_attrs); + switch (mr.kind) { + case SYM_MERGE_REPLACE: + api_data_remove_existing_if_isolated(g, existing); + obj_symbol_set_bind(ob, sym, (SymBind)new_attrs.bind); + break; + case SYM_MERGE_KEEP_EXISTING: + case SYM_MERGE_COMDAT_DISCARD: + case SYM_MERGE_COMMON: + api_data_discard_begin(g, sym); + return; + case SYM_MERGE_ODR_ERROR: + compiler_panic(c, g->cur_loc, "duplicate definition of symbol"); + } + } + } if ((attrs.flags & KIT_CG_DATADEF_FUNCTION_LOCAL) && g->target && g->target->local_static_data_begin) { @@ -190,8 +324,10 @@ void kit_cg_data_begin(KitCg* g, KitCgSym cg_sym, KitCgDataDefAttrs attrs) { g->data_atomize = atomize ? 1u : 0u; g->data_retain = (attrs.flags & KIT_CG_DATADEF_RETAIN) ? 
1u : 0u; if (sym != OBJ_SYM_NONE) { + api_data_apply_symbol_attrs(g, sym, &decl_attrs); obj_symbol_define(ob, sym, sec, (u64)g->data_base, (u64)abi_cg_sizeof(c->abi, decl_attrs.type)); + api_data_mark_defined_unit(g, sym); } } @@ -205,6 +341,31 @@ void kit_cg_data_common(KitCg* g, KitCgSym cg_sym, uint64_t size, osym = (ObjSym*)obj_symbol_get(g->obj, sym); if (!osym) return; decl_attrs = api_sym_attrs(g, cg_sym); + if (symresolve_sym_is_def(osym) && !api_data_defined_this_unit(g, sym)) { + SymAttrs old_attrs = api_data_sym_attrs(osym); + SymAttrs new_attrs = api_data_decl_attrs(g->c, &decl_attrs, size, align); + SymMergeResult mr; + new_attrs.kind = SK_COMMON; + mr = symresolve_merge(old_attrs, new_attrs); + switch (mr.kind) { + case SYM_MERGE_COMMON: + osym->bind = new_attrs.bind; + osym->vis = api_map_vis(decl_attrs.sym.visibility); + osym->kind = SK_COMMON; + osym->section_id = OBJ_SEC_NONE; + osym->value = 0; + osym->size = size; + osym->common_align = mr.merged_align; + return; + case SYM_MERGE_REPLACE: + break; + case SYM_MERGE_KEEP_EXISTING: + case SYM_MERGE_COMDAT_DISCARD: + return; + case SYM_MERGE_ODR_ERROR: + compiler_panic(g->c, g->cur_loc, "duplicate definition of symbol"); + } + } osym->bind = api_map_bind(decl_attrs.sym.bind); osym->vis = api_map_vis(decl_attrs.sym.visibility); osym->kind = SK_COMMON; @@ -212,9 +373,11 @@ void kit_cg_data_common(KitCg* g, KitCgSym cg_sym, uint64_t size, osym->value = 0; osym->size = size; osym->common_align = align; + api_data_mark_defined_unit(g, sym); } void kit_cg_data_align(KitCg* g, uint32_t align) { + if (g && g->data_discard) return; if (g && g->data_local_static_target) { u32 a = align ? 
align : 1u; u64 base = (g->data_size + (a - 1u)) & ~(u64)(a - 1u); @@ -243,6 +406,7 @@ void kit_cg_data_align(KitCg* g, uint32_t align) { void kit_cg_data_pad(KitCg* g, uint64_t size, uint8_t value) { u8 pad[64]; if (!g || !size) return; + if (g->data_discard) return; if (g->data_local_static_target) { if (value == 0) { kit_cg_data_zero(g, size); @@ -295,6 +459,7 @@ void kit_cg_data_int(KitCg* g, uint64_t value, KitCgTypeId type) { u32 size; u8 bytes[8]; if (!g) return; + if (g->data_discard) return; ty = resolve_type(g->c, type); if (!ty) return; size = (u32)abi_cg_sizeof(g->c->abi, type); @@ -314,6 +479,7 @@ void kit_cg_data_float(KitCg* g, double value, KitCgTypeId type) { u8 b[8]; } u; if (!g) return; + if (g->data_discard) return; ty = resolve_type(g->c, type); if (!ty) return; if (api_is_f128_type(g->c, ty)) { @@ -348,6 +514,7 @@ void kit_cg_data_float(KitCg* g, double value, KitCgTypeId type) { void kit_cg_data_bytes(KitCg* g, const uint8_t* data, size_t len) { if (!g || !len) return; + if (g->data_discard) return; if (g->data_local_static_target) { g->target->local_static_data_write(g->target, data, (u64)len); g->data_size += len; @@ -364,6 +531,7 @@ void kit_cg_data_bytes(KitCg* g, const uint8_t* data, size_t len) { void kit_cg_data_zero(KitCg* g, uint64_t size) { const Section* sec; if (!g || !size) return; + if (g->data_discard) return; if (g->data_local_static_target) { g->target->local_static_data_write(g->target, NULL, size); g->data_size += size; @@ -404,6 +572,7 @@ void api_cg_data_reloc(KitCg* g, KitCgSym target, int64_t addend, RelocKind rk; u8 pad[8]; if (!g || !width || width > sizeof(pad)) return; + if (g->data_discard) return; ob = g->obj; rk = api_data_reloc_kind(pcrel, width); if (rk == R_NONE) return; @@ -429,6 +598,7 @@ void kit_cg_data_addr(KitCg* g, KitCgSym target, int64_t addend, uint32_t width, "relocations are not yet supported by this target"); return; } + if (g && g->data_discard) return; api_cg_data_reloc(g, target, addend, width, 
0); } @@ -439,6 +609,7 @@ void kit_cg_data_label_addr(KitCg* g, KitCgLabel target, int64_t addend, (void)addend; (void)address_space; if (!g) return; + if (g->data_discard) return; if (!width || width > sizeof(pad)) { compiler_panic(g->c, g->cur_loc, "kit_cg_data_label_addr: width must be 1..%u, got %u", @@ -471,6 +642,7 @@ void kit_cg_data_pcrel(KitCg* g, KitCgSym target, int64_t addend, "not yet supported by this target"); return; } + if (g && g->data_discard) return; api_cg_data_reloc(g, target, addend, width, 1); } @@ -482,6 +654,7 @@ void kit_cg_data_symdiff(KitCg* g, KitCgSym lhs, KitCgSym rhs, int64_t addend, const ObjSym* lhs_sym; const ObjSym* rhs_sym; if (!g || width > sizeof(pad)) return; + if (g->data_discard) return; if (g->data_local_static_target) { compiler_panic(g->c, g->cur_loc, "kit_cg_data_symdiff: function-local static symdiff data " @@ -541,18 +714,17 @@ void kit_cg_data_end(KitCg* g) { Heap* h; u8* flat; if (!g) return; + if (g->data_discard) { + api_data_clear_state(g); + return; + } if (g->data_local_static_target) { g->target->local_static_data_end(g->target); - g->data_sec = OBJ_SEC_NONE; - g->data_sym = OBJ_SYM_NONE; - g->data_base = 0; - g->data_size = 0; - g->data_atomize = 0; - g->data_retain = 0; - g->data_local_static_target = 0; + api_data_clear_state(g); return; } if (g->data_tls_collect) { + KitCgDecl decl_attrs = api_sym_attrs(g, (KitCgSym)g->data_sym); h = (Heap*)g->c->ctx->heap; flat = NULL; if (!g->data_tls_zero_fill && g->data_size) { @@ -561,6 +733,7 @@ void kit_cg_data_end(KitCg* g) { compiler_panic(g->c, api_no_loc(), "KitCg: oom on TLS data bytes"); buf_flatten(&g->data_tls_bytes, flat); } + api_data_apply_symbol_attrs(g, g->data_sym, &decl_attrs); obj_define_tls(g->c, g->obj, g->data_sym, g->data_tls_zero_fill ? NULL : flat, (u32)g->data_size, g->data_tls_zero_fill ? 
0 : 1, g->data_tls_align, @@ -576,15 +749,12 @@ void kit_cg_data_end(KitCg* g) { g->data_tls_collect = 0; g->data_tls_zero_fill = 0; g->data_tls_align = 0; - g->data_sec = OBJ_SEC_NONE; - g->data_sym = OBJ_SYM_NONE; - g->data_base = 0; - g->data_size = 0; - g->data_atomize = 0; - g->data_retain = 0; + api_data_clear_state(g); return; } if (g->data_sym != OBJ_SYM_NONE) { + KitCgDecl decl_attrs = api_sym_attrs(g, (KitCgSym)g->data_sym); + api_data_apply_symbol_attrs(g, g->data_sym, &decl_attrs); obj_symbol_define(g->obj, g->data_sym, g->data_sec, g->data_base, g->data_size); } @@ -592,12 +762,7 @@ void kit_cg_data_end(KitCg* g) { obj_atom_define(g->obj, g->data_sec, g->data_base, (u32)g->data_size, g->data_sym, g->data_retain ? OBJ_ATOM_RETAIN : 0u); } - g->data_sec = OBJ_SEC_NONE; - g->data_sym = OBJ_SYM_NONE; - g->data_base = 0; - g->data_size = 0; - g->data_atomize = 0; - g->data_retain = 0; + api_data_clear_state(g); } /* Source targets with a native switch form should override target->switch_. diff --git a/src/cg/internal.h b/src/cg/internal.h @@ -138,6 +138,15 @@ struct KitCg { ObjBuilder* obj; CgTarget* target; Debug* debug; + KitCgUnitOptions cur_unit; + u32 nsource_units; + /* Monotonic, nonzero per source unit (set at kit_cg_begin_unit). Used to tell + * a same-TU re-definition (legal tentative-definition coalescing) apart from a + * genuine cross-TU contribution that must go through symresolve_merge. */ + u32 cur_unit_seq; + u8 unit_active; + u8 finished; + u8 lifecycle_pad[2]; ApiSValue* stack; u32 sp; @@ -159,6 +168,12 @@ struct KitCg { KitCgDecl* sym_attrs; u32 sym_cap; + /* Per-ObjSymId: the cur_unit_seq of the unit that last *defined* this symbol + * (0 = not defined by any unit yet). Distinct from sym_attrs, which is reset + * on every decl; this is written only when a definition is emitted. 
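The per-symbol unit table behaves like a growable array indexed by symbol id, where 0 means "not yet defined by any unit". Here is a minimal sketch, assuming plain `realloc` in place of kit's `Heap` allocator and a hypothetical `DefTable` type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for KitCg's sym_def_seq bookkeeping: slot i holds the
 * sequence number (1, 2, ...) of the unit that last defined symbol i, or 0. */
typedef struct {
  uint32_t* seq;
  uint32_t cap;
  uint32_t cur_unit; /* 0 outside a unit */
} DefTable;

/* Record that the current unit defined `sym`; grows the table by doubling.
 * Returns 1 on success, 0 outside a unit, -1 on allocation failure. */
static int def_mark(DefTable* t, uint32_t sym) {
  if (t->cur_unit == 0) return 0;
  if (sym >= t->cap) {
    uint32_t cap = t->cap ? t->cap : 16u;
    while (cap <= sym) cap *= 2u;
    uint32_t* na = realloc(t->seq, sizeof *na * cap);
    if (!na) return -1;
    memset(na + t->cap, 0, sizeof *na * (cap - t->cap)); /* new slots: undefined */
    t->seq = na;
    t->cap = cap;
  }
  t->seq[sym] = t->cur_unit;
  return 1;
}

/* True for a same-TU redefinition (tentative-definition coalescing); false
 * means the symbol came from another unit and needs a cross-TU merge. */
static int def_in_this_unit(const DefTable* t, uint32_t sym) {
  return t->cur_unit != 0 && sym < t->cap && t->seq[sym] == t->cur_unit;
}
```

Growing with `realloc` and zeroing only the new tail is equivalent to the allocate, zero, copy, free sequence in `api_data_mark_defined_unit`.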
*/ + u32* sym_def_seq; + u32 sym_def_seq_cap; + ApiCgScope scopes[API_CG_MAX_SCOPES]; u32 nscopes; u32 scope_generation; @@ -177,7 +192,7 @@ struct KitCg { u8 data_local_static_target; u8 data_atomize; u8 data_retain; - u8 data_local_static_pad0[1]; + u8 data_discard; u8 data_tls_collect; u8 data_tls_zero_fill; u8 data_tls_pad[2]; @@ -349,9 +364,13 @@ void kit_cg_drop(KitCg* g); int kit_cg_top_const_int(KitCg* g, int64_t* out_value); void kit_cg_rot3(KitCg* g); KitStatus kit_cg_new(KitCompiler* c, KitCg** cg_out); -KitStatus kit_cg_begin_obj(KitCg* g, KitObjBuilder* out, - const KitCodeOptions* opts); -KitStatus kit_cg_end_obj(KitCg* g); +KitStatus kit_cg_begin(KitCg* g, KitObjBuilder* out, + const KitCodeOptions* opts); +KitStatus kit_cg_begin_unit(KitCg* g, const KitCgUnitOptions* opts); +KitStatus kit_cg_end_unit(KitCg* g); +KitStatus kit_cg_finish(KitCg* g, const KitCgFinishOptions* opts); +KitStatus kit_cg_detach(KitCg* g); +KitStatus kit_cg_abort(KitCg* g); void kit_cg_free(KitCg* g); void kit_cg_set_loc(KitCg* g, KitSrcLoc loc); KitCgSym kit_cg_decl(KitCg* g, KitCgDecl decl); diff --git a/src/cg/ir.h b/src/cg/ir.h @@ -249,7 +249,8 @@ typedef struct CgIrFunc { u32 next_inst_id; u8 complete; - u8 pad[3]; + u8 removed; + u8 pad[2]; } CgIrFunc; typedef struct CgIrAlias { diff --git a/src/cg/session.c b/src/cg/session.c @@ -10,6 +10,8 @@ #include "arch/wasm/wasm_imports.h" #endif +#include "obj/symresolve.h" + static void cg_free_obj_state(KitCg* g) { Heap* h; if (!g) return; @@ -30,6 +32,11 @@ static void cg_free_obj_state(KitCg* g) { h->free(h, g->sym_attrs, sizeof(*g->sym_attrs) * g->sym_cap); g->sym_attrs = NULL; } + if (g->sym_def_seq) { + h->free(h, g->sym_def_seq, sizeof(*g->sym_def_seq) * g->sym_def_seq_cap); + g->sym_def_seq = NULL; + g->sym_def_seq_cap = 0; + } if (g->data_tls_collect) { buf_fini(&g->data_tls_bytes); g->data_tls_collect = 0; @@ -60,6 +67,7 @@ static void cg_free_obj_state(KitCg* g) { g->data_base = 0; g->data_size = 0; 
g->data_local_static_target = 0; + g->data_discard = 0; g->data_tls_zero_fill = 0; g->data_tls_align = 0; g->data_tls_nrelocs = 0; @@ -83,14 +91,15 @@ KitStatus kit_cg_new(KitCompiler* c, KitCg** cg_out) { return KIT_OK; } -KitStatus kit_cg_begin_obj(KitCg* g, KitObjBuilder* out, - const KitCodeOptions* opts) { +KitStatus kit_cg_begin(KitCg* g, KitObjBuilder* out, + const KitCodeOptions* opts) { KitCompiler* c; CgTarget* target; const CGBackend* backend; int opt_level = opts ? opts->opt_level : 0; if (!g || !g->c || !out) return KIT_INVALID; - if (g->obj || g->target || g->debug) return KIT_INVALID; + if (g->obj || g->target || g->debug || g->unit_active || g->finished) + return KIT_INVALID; c = (KitCompiler*)g->c; if (opt_level < 0 || opt_level > 2) { compiler_panic((Compiler*)c, api_no_loc(), @@ -143,30 +152,104 @@ KitStatus kit_cg_begin_obj(KitCg* g, KitObjBuilder* out, g->check_only = (opts && opts->check_only) ? 1u : 0u; g->function_sections = (opts && opts->function_sections) ? 1u : 0u; g->data_sections = (opts && opts->data_sections) ? 
1u : 0u; + g->nsource_units = 0; + g->unit_active = 0; + g->finished = 0; + memset(&g->cur_unit, 0, sizeof(g->cur_unit)); + return KIT_OK; +} + +KitStatus kit_cg_begin_unit(KitCg* g, const KitCgUnitOptions* opts) { + KitCgUnitOptions unit; + if (!g || !g->obj || !g->target) return KIT_INVALID; + if (g->finished || g->unit_active) return KIT_INVALID; + memset(&unit, 0, sizeof unit); + if (opts) { + if (opts->flags) return KIT_INVALID; + unit = *opts; + } + if (!unit.source_id) unit.source_id = g->nsource_units + 1u; + g->cur_unit = unit; + g->nsource_units++; + g->cur_unit_seq = g->nsource_units; /* nonzero, unique per unit */ + g->unit_active = 1; return KIT_OK; } -KitStatus kit_cg_end_obj(KitCg* g) { +KitStatus kit_cg_end_unit(KitCg* g) { + if (!g || !g->obj || !g->target) return KIT_INVALID; + if (!g->unit_active) return KIT_INVALID; + g->unit_active = 0; + memset(&g->cur_unit, 0, sizeof(g->cur_unit)); + return KIT_OK; +} + +KitStatus kit_cg_finish(KitCg* g, const KitCgFinishOptions* opts) { + CgFinishPolicy policy; if (!g) return KIT_INVALID; - if (!g->obj) return KIT_INVALID; + if (!g->obj || !g->target) return KIT_INVALID; + if (g->finished || g->unit_active) return KIT_INVALID; + memset(&policy, 0, sizeof policy); + if (opts) { + if (opts->output_kind > KIT_CG_OUTPUT_ARCHIVE_MEMBER) return KIT_INVALID; + if (opts->interposition_policy > KIT_CG_INTERPOSITION_DEFAULT_VISIBILITY) + return KIT_INVALID; + if (opts->npreserved_symbols && !opts->preserved_symbols) + return KIT_INVALID; + policy.output_kind = opts->output_kind; + policy.interposition_policy = opts->interposition_policy; + policy.preserved_symbols = (const ObjSymId*)opts->preserved_symbols; + policy.npreserved_symbols = opts->npreserved_symbols; + } + for (u32 i = 0; i < policy.npreserved_symbols; ++i) { + ObjSymId sym = policy.preserved_symbols[i]; + const ObjSym* os = obj_symbol_get(g->obj, sym); + if (sym == OBJ_SYM_NONE || !os || os->removed) return KIT_INVALID; + } + 
cgtarget_set_finish_policy(g->target, &policy); +#if KIT_OPT_ENABLED + /* opt_set_finish_policy treats the recorder's user as an OptImpl, which is + * only true when the optimizer wrapped the backend (opt_level > 0; see + * kit_cg_begin). At opt_level 0 g->target is the bare backend recorder — e.g. + * the C-source backend, whose user is a CTarget — so calling it there is a + * type confusion that corrupts the backend. Guard it the same way + * opt_set_dump_writer is guarded at its call site. */ + if (g->opt_level > 0) opt_set_finish_policy(g->target, &policy); +#endif cgtarget_finalize(g->target); if (g->debug) { debug_emit(g->debug); debug_free(g->debug); + g->debug = NULL; + } + g->finished = 1; + return KIT_OK; +} + +KitStatus kit_cg_detach(KitCg* g) { + if (!g) return KIT_INVALID; + if (g->debug) { + debug_free(g->debug); + g->debug = NULL; } cgtarget_free(g->target); g->obj = NULL; g->target = NULL; - g->debug = NULL; + g->finished = 0; + g->unit_active = 0; + g->nsource_units = 0; + memset(&g->cur_unit, 0, sizeof(g->cur_unit)); cg_free_obj_state(g); return KIT_OK; } +KitStatus kit_cg_abort(KitCg* g) { return kit_cg_detach(g); } + void kit_cg_free(KitCg* g) { Heap* h; if (!g) return; h = g->c->ctx->heap; - if (g->obj) (void)kit_cg_end_obj(g); + (void)kit_cg_abort(g); h->free(h, g, sizeof *g); } @@ -195,7 +278,9 @@ KitCgSym kit_cg_decl(KitCg* g, KitCgDecl decl) { ob = g->obj; ty = resolve_type(c, decl.type); if (!ty) return KIT_CG_SYM_NONE; - sym = obj_symbol_find(ob, (Sym)decl.linkage_name); + sym = (decl.sym.bind == KIT_SB_LOCAL) + ? OBJ_SYM_NONE + : obj_symbol_find(ob, (Sym)decl.linkage_name); if (sym == OBJ_SYM_NONE) { sym = obj_symbol_ex(ob, (Sym)decl.linkage_name, api_map_bind(decl.sym.bind), api_map_vis(decl.sym.visibility), @@ -204,9 +289,12 @@ KitCgSym kit_cg_decl(KitCg* g, KitCgDecl decl) { /* C permits the `weak` attribute on any declaration of a symbol; a later * weak (re)declaration demotes a previously-strong global to weak. 
Without * this, a plain prototype followed by a `weak` definition would emit a - * strong global and collide with any strong override at link time. */ + * strong global and collide with any strong override at link time. In a + * shared LTO builder, do not let a weak declaration from a later TU demote + * an already-defined strong symbol; merge policy handles that override. */ const ObjSym* s = obj_symbol_get(ob, sym); - if (s && s->bind == SB_GLOBAL) obj_symbol_set_bind(ob, sym, SB_WEAK); + if (s && s->bind == SB_GLOBAL && !symresolve_sym_is_def(s)) + obj_symbol_set_bind(ob, sym, SB_WEAK); } if (decl.sym.flags) { obj_symbol_set_flags(ob, sym, (u16)decl.sym.flags); @@ -305,6 +393,9 @@ void kit_cg_func_begin_attrs(KitCg* g, KitCgSym cg_sym, g->fn_desc.fn_type = fty; g->fn_desc.result_types = g->fn_result_types; g->fn_desc.loc = g->cur_loc; + g->fn_desc.sym_bind = api_map_bind(attrs.sym.bind); + g->fn_desc.sym_kind = SK_FUNC; + g->fn_desc.sym_vis = api_map_vis(attrs.sym.visibility); g->fn_desc.atomize = atomize ? 
1u : 0u; if (attrs.as.func.flags & KIT_CG_FUNC_NORETURN) { g->fn_desc.flags |= CGFD_NORETURN; diff --git a/src/cg/type.c b/src/cg/type.c @@ -1,3 +1,4 @@ +#include "arch/arch.h" #include "cg/internal.h" typedef enum CgApiTypeKind { @@ -940,25 +941,14 @@ KitStatus kit_cg_type_record_field(KitCompiler* c, KitCgTypeId id, } int kit_cg_target_supports_call_conv(KitCompiler* c, KitCgCallConv cc) { + const ArchImpl* a; if (!c) return 0; - switch (cc) { - case KIT_CG_CC_TARGET_C: - return 1; - case KIT_CG_CC_SYSV: - return c->target.arch == KIT_ARCH_X86_64 && - c->target.os != KIT_OS_WINDOWS; - case KIT_CG_CC_WIN64: - return c->target.arch == KIT_ARCH_X86_64 && - c->target.os == KIT_OS_WINDOWS; - case KIT_CG_CC_AAPCS: - return c->target.arch == KIT_ARCH_ARM_32 || - c->target.arch == KIT_ARCH_ARM_64; - case KIT_CG_CC_WASM: - return c->target.arch == KIT_ARCH_WASM; - case KIT_CG_CC_INTERRUPT: - return 0; - } - return 0; + /* TARGET_C is always available, including for arches with no codegen backend + * (x86_32/arm_32, where arch_for_compiler is NULL). */ + if (cc == KIT_CG_CC_TARGET_C) return 1; + a = arch_for_compiler(c); + if (!a || !a->supports_call_conv) return 0; + return a->supports_call_conv(c, cc); } int kit_cg_target_supports_symbol_feature(KitCompiler* c, @@ -985,84 +975,21 @@ int kit_cg_target_supports_symbol_feature(KitCompiler* c, } int kit_cg_target_supports_intrinsic(KitCompiler* c, KitCgIntrinsic intrin) { - KitArchKind arch; + const ArchImpl* a; if (!c) return 0; - arch = c->target.arch; - /* rv32 and rv64 share one RISC-V backend (src/arch/riscv), so the set of - * lowerable intrinsics is identical; decide as rv64 for both. */ - if (arch == KIT_ARCH_RV32) arch = KIT_ARCH_RV64; - switch (intrin) { - /* Portable intrinsics every backend (native + wasm + C-source) lowers. - * The C-source backend runs under the host's native arch, so it is covered - * by the native arches here. 
*/
-    case KIT_CG_INTRIN_TRAP:
-    case KIT_CG_INTRIN_CLZ:
-    case KIT_CG_INTRIN_CTZ:
-    case KIT_CG_INTRIN_POPCOUNT:
-    case KIT_CG_INTRIN_BSWAP:
-    case KIT_CG_INTRIN_SADD_OVERFLOW:
-    case KIT_CG_INTRIN_UADD_OVERFLOW:
-    case KIT_CG_INTRIN_SSUB_OVERFLOW:
-    case KIT_CG_INTRIN_USUB_OVERFLOW:
-    case KIT_CG_INTRIN_SMUL_OVERFLOW:
-    case KIT_CG_INTRIN_UMUL_OVERFLOW:
-    case KIT_CG_INTRIN_PREFETCH:
-    case KIT_CG_INTRIN_EXPECT:
-    case KIT_CG_INTRIN_ASSUME_ALIGNED:
-      return arch == KIT_ARCH_ARM_64 || arch == KIT_ARCH_X86_64 ||
-             arch == KIT_ARCH_RV64 || arch == KIT_ARCH_WASM;
-
-    /* Single-instruction CPU control: NOP / YIELD exist on all three native
-     * arches; the wait/event/barrier/IRQ forms are arch-specific (see the
-     * per-backend nd_intrinsic switch). */
-    case KIT_CG_INTRIN_CPU_NOP:
-    case KIT_CG_INTRIN_CPU_YIELD:
-      return arch == KIT_ARCH_ARM_64 || arch == KIT_ARCH_X86_64 ||
-             arch == KIT_ARCH_RV64;
-    case KIT_CG_INTRIN_ISB:
-      return arch == KIT_ARCH_ARM_64 || arch == KIT_ARCH_RV64;
-    case KIT_CG_INTRIN_DMB:
-    case KIT_CG_INTRIN_DSB:
-      return arch == KIT_ARCH_ARM_64 || arch == KIT_ARCH_X86_64 ||
-             arch == KIT_ARCH_RV64;
-    case KIT_CG_INTRIN_WFI:
-      return arch == KIT_ARCH_ARM_64 || arch == KIT_ARCH_RV64;
-    case KIT_CG_INTRIN_WFE:
-    case KIT_CG_INTRIN_SEV:
-      return arch == KIT_ARCH_ARM_64;
-    case KIT_CG_INTRIN_IRQ_SAVE:
-    case KIT_CG_INTRIN_IRQ_RESTORE:
-      return arch == KIT_ARCH_ARM_64;
-    case KIT_CG_INTRIN_IRQ_ENABLE:
-    case KIT_CG_INTRIN_IRQ_DISABLE:
-      return arch == KIT_ARCH_ARM_64 || arch == KIT_ARCH_X86_64;
-
-    /* Not yet implemented on any native backend.
*/ - case KIT_CG_INTRIN_SETJMP: - case KIT_CG_INTRIN_LONGJMP: - case KIT_CG_INTRIN_FMA: - case KIT_CG_INTRIN_SYSCALL: - case KIT_CG_INTRIN_DCACHE_CLEAN: - case KIT_CG_INTRIN_DCACHE_INVALIDATE: - case KIT_CG_INTRIN_DCACHE_CLEAN_INVALIDATE: - case KIT_CG_INTRIN_ICACHE_INVALIDATE: - case KIT_CG_INTRIN_CORO_SWITCH: - return 0; - } - return 0; + a = arch_for_compiler(c); + if (!a || !a->supports_intrinsic) return 0; + return a->supports_intrinsic(c, intrin); } uint64_t kit_cg_target_backend_features(KitCompiler* c) { - uint64_t out = 0; + const ArchImpl* a; if (!c) return 0; - if (c->target.arch == KIT_ARCH_X86_64 || c->target.arch == KIT_ARCH_X86_32) { - out |= KIT_CG_BACKEND_UNALIGNED_MEMORY; - out |= KIT_CG_BACKEND_RED_ZONE; - out |= KIT_CG_BACKEND_SIMD; - } else { - out |= KIT_CG_BACKEND_STRICT_ALIGNMENT; - } - return out; + a = arch_for_compiler(c); + /* Arches with no registered codegen backend (x86_32) fall back to the + * conservative strict-alignment baseline. */ + if (!a) return KIT_CG_BACKEND_STRICT_ALIGNMENT; + return a->backend_features; } void cg_api_fini(Compiler* c) { diff --git a/src/emu/emu.c b/src/emu/emu.c @@ -424,6 +424,7 @@ static void* translate_block(KitEmu* e, u64 guest_pc) { ObjBuilder* ob; KitCg* cg; KitCodeOptions copts; + KitCgUnitOptions unit_opts; Sym block_name; KitCgDecl block_decl; KitCgSym block_sym; @@ -471,7 +472,10 @@ static void* translate_block(KitEmu* e, u64 guest_pc) { memset(&copts, 0, sizeof(copts)); copts.opt_level = e->opt_level; st = kit_cg_new(e->c, &cg); - if (st == KIT_OK) st = kit_cg_begin_obj(cg, (KitObjBuilder*)ob, &copts); + if (st == KIT_OK) st = kit_cg_begin(cg, (KitObjBuilder*)ob, &copts); + memset(&unit_opts, 0, sizeof unit_opts); + unit_opts.source_name = KIT_SLICE_LIT("<emu-block>"); + if (st == KIT_OK) st = kit_cg_begin_unit(cg, &unit_opts); if (st != KIT_OK || !cg) compiler_panic(e->c, SRCLOC_NONE, "emu: kit_cg_new failed"); @@ -501,9 +505,11 @@ static void* translate_block(KitEmu* e, u64 guest_pc) { insts = 
NULL; if (st != KIT_OK) compiler_panic(e->c, SRCLOC_NONE, "emu: failed to lift block"); - st = kit_cg_end_obj(cg); + st = kit_cg_end_unit(cg); + if (st == KIT_OK) st = kit_cg_finish(cg, NULL); + if (st == KIT_OK) st = kit_cg_detach(cg); if (st != KIT_OK) - compiler_panic(e->c, SRCLOC_NONE, "emu: kit_cg_end_obj failed"); + compiler_panic(e->c, SRCLOC_NONE, "emu: kit_cg_finish failed"); kit_cg_free(cg); obj_finalize(ob); @@ -533,7 +539,7 @@ static void* translate_block(KitEmu* e, u64 guest_pc) { #if KIT_INTERP_ENABLED /* INTERP mode: the JIT image above still resolved the block's helper externs * and validated the lifted IR, but dispatch runs the captured InterpFunc - * (lowered during kit_cg_end_obj, above) instead of the host code. Cache the + * (lowered during kit_cg_finish, above) instead of the host code. Cache the * InterpFunc*; kit_emu_step disambiguates the payload by e->mode. A rejected * block is still captured (ifn->ok == 0) and is reported with its reason when * dispatched, so only a genuine capture miss yields NULL here. */ diff --git a/src/link/link_arch.h b/src/link/link_arch.h @@ -81,6 +81,14 @@ typedef struct LinkArchDesc { int (*is_tlvp_reloc)(RelocKind); int (*is_direct_page_reloc)(RelocKind); int (*needs_jit_call_stub)(RelocKind); + + /* ---- Optional COFF __chkstk stub ---- + * Arches that cannot emit inline stack probes (aarch64) carry the bytes of a + * __chkstk function that link_synth_coff_ctor_dtor_list emits into a retained + * .text$chkstk section for PE/COFF targets. NULL/0 = none: x64 emits inline + * probes, and the RISC-V and wasm arches are not COFF targets. */ + const u8* coff_chkstk_bytes; + u32 coff_chkstk_len; } LinkArchDesc; /* Returns NULL for an unsupported arch. Callers panic with their own diff --git a/src/link/link_internal.h b/src/link/link_internal.h @@ -10,6 +10,7 @@ #include "core/segvec.h" #include "link/link.h" #include "obj/obj.h" +#include "obj/symresolve.h" /* Per-input mapping built during link_resolve. 
ObjSymId / ObjSecId are * scoped to a single ObjBuilder, so the linker maintains an explicit @@ -104,36 +105,17 @@ static inline LinkSectionId link_input_symbol_section(const InputMap* m, * classify input symbols the same way. These predicates are the one * authority; every lane routes through them. */ -/* Defined-symbol replacement policy: a stronger binding wins. Takes u16 - * to match ObjSym.bind. */ +/* Resolution policy now lives in obj/symresolve.h so the LTO staging merge can + * reuse it. These keep the historical link_* spellings (every lane routes + * through them) as thin wrappers over the shared definitions. */ static inline int link_bind_strength(u16 bind) { - switch (bind) { - case SB_GLOBAL: - return 3; - case SB_WEAK: - return 2; - case SB_LOCAL: - return 1; - default: - return 0; - } + return symresolve_bind_strength(bind); } - -/* A symbol that contributes a definition: not SK_UNDEF, and either an - * absolute/common/file pseudo-def or anchored to a real section. */ static inline int link_sym_is_def(const ObjSym* s) { - return s && s->kind != SK_UNDEF && - (s->kind == SK_ABS || s->kind == SK_COMMON || s->kind == SK_FILE || - s->section_id != OBJ_SEC_NONE); + return symresolve_sym_is_def(s); } - -/* An unreferenced global/weak extern declaration: a header artifact, not - * a real demand. The frontend synthesizes one per visible prototype, so - * pruning these keeps unused archive members from being pulled in. 
*/ static inline int link_sym_is_spurious_undef(const ObjSym* s) { - return s && s->section_id == OBJ_SEC_NONE && s->kind != SK_ABS && - s->kind != SK_COMMON && !s->referenced && - (s->bind == SB_GLOBAL || s->bind == SB_WEAK); + return symresolve_sym_is_spurious_undef(s); } /* In-section byte count for an input section: BSS/NOBITS report their diff --git a/src/link/link_resolve.c b/src/link/link_resolve.c @@ -108,7 +108,8 @@ void link_input_map_alloc(LinkImage* img, InputMap* m, ObjBuilder* ob, m->natom = natom; m->atom = (LinkSectionId*)h->alloc(h, sizeof(*m->atom) * (natom ? natom : 1u), _Alignof(LinkSectionId)); - if (!m->atom) compiler_panic(img->c, SRCLOC_NONE, "link: oom on input atom map"); + if (!m->atom) + compiler_panic(img->c, SRCLOC_NONE, "link: oom on input atom map"); memset(m->atom, 0, sizeof(*m->atom) * (natom ? natom : 1u)); m->sym_atom = (ObjAtomId*)h->alloc( h, sizeof(*m->sym_atom) * (nsym ? nsym : 1u), _Alignof(ObjAtomId)); @@ -132,7 +133,8 @@ void link_input_map_alloc(LinkImage* img, InputMap* m, ObjBuilder* ob, h, sizeof(*m->section_atom_count) * (nsection ? nsection : 1u), _Alignof(u32)); if (!m->section_atom_first || !m->section_atom_count) - compiler_panic(img->c, SRCLOC_NONE, "link: oom on input section atom ranges"); + compiler_panic(img->c, SRCLOC_NONE, + "link: oom on input section atom ranges"); memset(m->section_atom_first, 0, sizeof(*m->section_atom_first) * (nsection ? 
nsection : 1u)); memset(m->section_atom_count, 0, @@ -145,7 +147,8 @@ void link_input_map_alloc(LinkImage* img, InputMap* m, ObjBuilder* ob, if (natom > 1u) { atoms = (AtomSortRec*)h->alloc(h, sizeof(*atoms) * natom, _Alignof(AtomSortRec)); - if (!atoms) compiler_panic(img->c, SRCLOC_NONE, "link: oom on atom sort map"); + if (!atoms) + compiler_panic(img->c, SRCLOC_NONE, "link: oom on atom sort map"); for (i = 1; i < natom; ++i) { const ObjAtom* a = obj_atom_get(ob, (ObjAtomId)i); if (!a || a->removed || a->section_id == OBJ_SEC_NONE || @@ -261,56 +264,62 @@ void link_resolve_symbols(Linker* l, LinkImage* img) { if (symhash_insert(&img->globals, s->name, fresh, &existing)) { m->sym[e.id] = link_append_symbol(img, &rec); } else { + /* A second definition of an existing global/weak name: hand the + * binding-precedence decision to the shared policy module. The + * COMDAT lookup (does prev's section carry SF_GROUP?) is the + * caller-side bookkeeping symresolve deliberately leaves out. */ LinkSymbol* prev = LinkSyms_at(&img->syms, existing - 1); - int new_strength = link_bind_strength(s->bind); - int old_strength = link_bind_strength(prev->bind); - if (prev->kind == SK_COMMON && rec.kind == SK_COMMON) { - if (rec.size > prev->size) { - u32 new_align = (rec.common_align > prev->common_align) - ? rec.common_align - : prev->common_align; + ObjBuilder* prev_ob = + (prev->input_id != LINK_INPUT_NONE) + ? LinkInputs_at(&l->inputs, prev->input_id - 1)->obj + : NULL; + const ObjSym* prev_os = + prev_ob ? obj_symbol_get(prev_ob, prev->obj_sym) : NULL; + SymAttrs ex_a = {0}; + SymAttrs inc_a = {0}; + SymMergeResult mr; + ex_a.bind = prev->bind; + ex_a.kind = prev->kind; + ex_a.size = prev->size; + ex_a.common_align = prev->common_align; + ex_a.in_comdat = (prev_ob && prev_os) + ? 
(u8)obj_sym_defined_in_comdat(prev_ob, prev_os) + : 0u; + inc_a.bind = rec.bind; + inc_a.kind = rec.kind; + inc_a.size = rec.size; + inc_a.common_align = rec.common_align; + inc_a.in_comdat = (u8)obj_sym_defined_in_comdat(ob, s); + mr = symresolve_merge(ex_a, inc_a); + switch (mr.kind) { + case SYM_MERGE_REPLACE: rec.id = existing; - rec.common_align = new_align; *prev = rec; - } - m->sym[e.id] = existing; - } else if (rec.kind == SK_COMMON) { - m->sym[e.id] = existing; - } else if (prev->kind == SK_COMMON) { - rec.id = existing; - *prev = rec; - m->sym[e.id] = existing; - } else if (new_strength > old_strength) { - rec.id = existing; - *prev = rec; - m->sym[e.id] = existing; - } else if (new_strength == old_strength && - new_strength == link_bind_strength(SB_GLOBAL)) { - /* COFF SELECTANY: if both defs are in COMDAT sections, - * keep the earlier one and discard the new section. */ - ObjBuilder* prev_ob = - (prev->input_id != LINK_INPUT_NONE) - ? LinkInputs_at(&l->inputs, prev->input_id - 1)->obj - : NULL; - const ObjSym* prev_os = - prev_ob ? 
obj_symbol_get(prev_ob, prev->obj_sym) : NULL; - if (prev_ob && prev_os && - obj_sym_defined_in_comdat(prev_ob, prev_os) && - obj_sym_defined_in_comdat(ob, s)) { + m->sym[e.id] = existing; + break; + case SYM_MERGE_COMMON: + rec.id = existing; + rec.common_align = mr.merged_align; + *prev = rec; + m->sym[e.id] = existing; + break; + case SYM_MERGE_COMDAT_DISCARD: m->sym[e.id] = existing; if (s->section_id < m->nsection) m->comdat_discarded[s->section_id] = 1; - } else { + break; + case SYM_MERGE_ODR_ERROR: { Slice nm_s = pool_slice(l->c->global, s->name); - const char* nm = nm_s.s; - size_t namelen = nm_s.len; compiler_panic(l->c, SRCLOC_NONE, "link: duplicate definition of " "global symbol '%.*s'", - (int)namelen, nm); + (int)nm_s.len, nm_s.s); + break; } - } else { - m->sym[e.id] = existing; + case SYM_MERGE_KEEP_EXISTING: + default: + m->sym[e.id] = existing; + break; } } } else { @@ -531,7 +540,8 @@ void link_gc_live_alloc(GcLive* g, Linker* l, Heap* h) { g->nsec[ii] = nsec; g->natom[ii] = natom; g->marks[ii] = (u8*)h->alloc(h, nsec ? nsec : 1u, 1); - if (!g->marks[ii]) compiler_panic(l->c, SRCLOC_NONE, "link: oom on gc marks"); + if (!g->marks[ii]) + compiler_panic(l->c, SRCLOC_NONE, "link: oom on gc marks"); memset(g->marks[ii], 0, nsec); g->atom_marks[ii] = (u8*)h->alloc(h, natom ? 
natom : 1u, 1); if (!g->atom_marks[ii]) @@ -836,7 +846,8 @@ static void include_archive_member(Linker* l, const LinkArchive* ar, if (mem->included) return; in = LinkInputs_push(&l->inputs, &idx); if (!in) - compiler_panic(l->c, SRCLOC_NONE, "link: oom growing inputs (archive member)"); + compiler_panic(l->c, SRCLOC_NONE, + "link: oom growing inputs (archive member)"); id = (LinkInputId)(idx + 1u); in->id = id; /* PE/COFF short-import shim: read_coff_short_import stashes the @@ -969,18 +980,6 @@ void link_synth_coff_ctor_dtor_list(Linker* l) { ObjBuilder* ob; ObjSecId sid; static const u8 kZeros[16] = {0}; - /* AArch64 __chkstk: probes `x15 * 16` bytes of stack one page at a - * time, then returns. Mirrors the LLVM compiler-rt implementation - * (chkstk.S in builtins/aarch64). 28 bytes. */ - static const u8 kAa64Chkstk[28] = { - 0xf0, 0xed, 0x7c, 0xd3, /* lsl x16, x15, #4 */ - 0xf1, 0x03, 0x00, 0x91, /* mov x17, sp */ - 0x31, 0x06, 0x40, 0xd1, /* sub x17, x17, #0x1, lsl #12 */ - 0x10, 0x06, 0x40, 0xf1, /* subs x16, x16, #0x1, lsl #12 */ - 0x3f, 0x02, 0x40, 0xf9, /* ldr xzr, [x17] */ - 0xac, 0xff, 0xff, 0x54, /* b.gt #-0x14 */ - 0xc0, 0x03, 0x5f, 0xd6, /* ret */ - }; LinkInput* in; u32 idx; if (!l || l->c->target.obj != KIT_OBJ_COFF) return; @@ -998,21 +997,28 @@ void link_synth_coff_ctor_dtor_list(Linker* l) { SB_GLOBAL, SV_DEFAULT, SK_OBJ, sid, 0, 0, 0); obj_symbol_ex(ob, pool_intern_slice(l->c->global, SLICE_LIT("__DTOR_END__")), SB_GLOBAL, SV_DEFAULT, SK_OBJ, sid, 0, 0, 0); - /* __chkstk: only the aa64 variant is synthesized here; x64 codegen - * already emits inline probes (or links libmingwex's __chkstk - * which is a plain object, not an ARM64EC alias). 
*/ - if (l->c->target.arch == KIT_ARCH_ARM_64) { - ObjSecId tsid = obj_section_ex( - ob, pool_intern_slice(l->c->global, SLICE_LIT(".text$chkstk")), - SEC_TEXT, SSEM_PROGBITS, SF_ALLOC | SF_EXEC | SF_RETAIN, 4, 0u, 0u, 0u); - obj_section_replace_bytes(ob, tsid, kAa64Chkstk, sizeof(kAa64Chkstk)); - obj_symbol_ex(ob, pool_intern_slice(l->c->global, SLICE_LIT("__chkstk")), - SB_GLOBAL, SV_DEFAULT, SK_FUNC, tsid, 0, sizeof(kAa64Chkstk), - 0); + /* __chkstk: synthesized only for arches whose link descriptor carries the + * stub bytes (aarch64). x64 needs none — its codegen emits inline probes (or + * links libmingwex's plain-object __chkstk). Driven by the descriptor, so no + * arch identity is consulted here. */ + { + const LinkArchDesc* la = link_arch_desc_for(l->c); + if (la && la->coff_chkstk_bytes && la->coff_chkstk_len) { + ObjSecId tsid = obj_section_ex( + ob, pool_intern_slice(l->c->global, SLICE_LIT(".text$chkstk")), + SEC_TEXT, SSEM_PROGBITS, SF_ALLOC | SF_EXEC | SF_RETAIN, 4, 0u, 0u, + 0u); + obj_section_replace_bytes(ob, tsid, la->coff_chkstk_bytes, + la->coff_chkstk_len); + obj_symbol_ex(ob, pool_intern_slice(l->c->global, SLICE_LIT("__chkstk")), + SB_GLOBAL, SV_DEFAULT, SK_FUNC, tsid, 0, + la->coff_chkstk_len, 0); + } } obj_finalize(ob); in = LinkInputs_push(&l->inputs, &idx); - if (!in) compiler_panic(l->c, SRCLOC_NONE, "link: oom growing inputs (synth)"); + if (!in) + compiler_panic(l->c, SRCLOC_NONE, "link: oom growing inputs (synth)"); in->id = (LinkInputId)(idx + 1u); in->kind = LINK_INPUT_OBJ_BYTES; in->order = l->next_input_order++; diff --git a/src/obj/obj.c b/src/obj/obj.c @@ -11,6 +11,7 @@ #include <string.h> +#include "core/hashmap.h" #include "core/heap.h" #include "core/pool.h" #include "core/segvec.h" @@ -18,6 +19,14 @@ SEGVEC_DEFINE(Sections, Section, 5); /* 32 entries per segment */ SEGVEC_DEFINE(Symbols, ObjSym, 6); /* 64 entries per segment */ + +/* name (interned Sym) -> first defining ObjSymId. 
A validated fast-path index + * for obj_symbol_find: the whole-program LTO builder holds every TU's symbols + * in one builder, so the historical linear scan is O(n^2) at decl time. The + * index stores the first id seen for a name (matching the scan's "first match" + * semantics); obj_symbol_find re-checks the hit's name and falls back to a + * linear scan if it is stale (after obj_symbol_rename), so it is always exact. */ +HASHMAP_DEFINE(SymNameIndex, Sym, ObjSymId, hash_u32); SEGVEC_DEFINE(Relocs, Reloc, 6); /* 64 entries per segment */ SEGVEC_DEFINE(Groups, ObjGroup, 3); /* 8 entries per segment */ SEGVEC_DEFINE(Atoms, ObjAtom, 5); /* 32 entries per segment */ @@ -37,6 +46,7 @@ struct KitObjBuilder { Relocs relocs; /* flat across all sections; filtered on read */ Groups groups; /* index 0 reserved as "none" */ Atoms atoms; /* index 0 reserved as "none" */ + SymNameIndex sym_by_name; /* name -> first ObjSymId; accelerates find */ /* Format-specific ELF e_flags. Set by read_elf to the input's * e_flags (e.g. on RISC-V, EF_RISCV_RVC | EF_RISCV_FLOAT_ABI_DOUBLE); * consumed by emit_elf to round-trip. Zero when not set — emit_elf @@ -80,6 +90,7 @@ ObjBuilder* obj_new(Compiler* c) { Relocs_init(&ob->relocs, h); Groups_init(&ob->groups, h); Atoms_init(&ob->atoms, h); + SymNameIndex_init(&ob->sym_by_name, h); /* Reserve index 0 in each id space as the "none" sentinel. SegVec * pushes are zeroed, so the sentinel slots have all-zero fields. */ @@ -130,6 +141,7 @@ void obj_free(ObjBuilder* ob) { Relocs_fini(&ob->relocs); Groups_fini(&ob->groups); Atoms_fini(&ob->atoms); + SymNameIndex_fini(&ob->sym_by_name); obj_image_free_(ob); ob->heap->free(ob->heap, ob, sizeof(*ob)); } @@ -528,17 +540,23 @@ ObjSymId obj_symbol_ex(ObjBuilder* ob, Sym name, SymBind bind, SymVis vis, s->value = value; s->size = size; s->common_align = common_align; + /* First-wins: record the lowest id for this name so obj_symbol_find returns + * the same symbol the linear scan would. 
Later same-name symbols (legal for + * STB_LOCAL) do not overwrite. */ + if (name && !SymNameIndex_get(&ob->sym_by_name, name)) + (void)SymNameIndex_set(&ob->sym_by_name, name, (ObjSymId)id); return (ObjSymId)id; } ObjSymId obj_symbol_find(ObjBuilder* ob, Sym name) { + /* Authoritative O(1) lookup — never a linear scan. Every symbol is created + * through obj_symbol_ex (the only Symbols_push besides the id-0 sentinel), + * which indexes it, and obj_symbol_rename keeps the index exact, so the map + * always holds the first id for a live name. */ + ObjSymId* hit; if (!ob || !name) return OBJ_SYM_NONE; - u32 n = Symbols_count(&ob->symbols); - for (u32 i = 1; i < n; ++i) { - ObjSym* s = Symbols_at(&ob->symbols, i); - if (s && s->name == name) return (ObjSymId)i; - } - return OBJ_SYM_NONE; + hit = SymNameIndex_get(&ob->sym_by_name, name); + return hit ? *hit : OBJ_SYM_NONE; } void obj_symbol_define(ObjBuilder* ob, ObjSymId id, ObjSecId section_id, @@ -590,6 +608,13 @@ void obj_sym_mark_referenced(ObjBuilder* ob, ObjSymId id) { if (s) s->referenced = 1; } +void obj_sym_set_referenced(ObjBuilder* ob, ObjSymId id, int referenced) { + ObjSym* s; + if (id == OBJ_SYM_NONE) return; + s = Symbols_at(&ob->symbols, id); + if (s) s->referenced = referenced ? 1u : 0u; +} + ObjAtomId obj_atom_define(ObjBuilder* ob, ObjSecId section_id, u32 offset, u32 size, ObjSymId signature, u32 flags) { u32 id; @@ -675,10 +700,46 @@ void obj_section_rename(ObjBuilder* ob, ObjSecId id, Sym new_name) { void obj_symbol_rename(ObjBuilder* ob, ObjSymId id, Sym new_name) { ObjSym* s; + Sym old; + ObjSymId* slot; if (!ob || id == OBJ_SYM_NONE) return; s = Symbols_at(&ob->symbols, id); if (!s) return; + old = s->name; s->name = new_name; + if (old == new_name) return; + /* Keep the name index exact so obj_symbol_find stays a pure hash lookup. 
+ * If this symbol was the indexed entry for its old name, hand the entry to + * the next-lowest symbol still carrying that name (duplicate STB_LOCAL names + * are legal), or drop it. This is the only scan in the symbol-index path and + * it is confined to obj_symbol_rename — a cold objcopy-style operation, never + * the codegen/find hot path. */ + if (old) { + slot = SymNameIndex_get(&ob->sym_by_name, old); + if (slot && *slot == id) { + ObjSymId repl = OBJ_SYM_NONE; + u32 n = Symbols_count(&ob->symbols); + for (u32 i = 1; i < n; ++i) { + ObjSym* t = Symbols_at(&ob->symbols, i); + if (t && (ObjSymId)i != id && t->name == old) { + repl = (ObjSymId)i; + break; + } + } + if (repl != OBJ_SYM_NONE) + (void)SymNameIndex_set(&ob->sym_by_name, old, repl); + else + SymNameIndex_del(&ob->sym_by_name, old); + } + } + /* new_name resolves to the lowest id that carries it (first-match order). A + * rename can give an existing lower-id symbol this name, so lower an existing + * entry when warranted. */ + if (new_name) { + slot = SymNameIndex_get(&ob->sym_by_name, new_name); + if (!slot || *slot > id) + (void)SymNameIndex_set(&ob->sym_by_name, new_name, id); + } } void obj_symbol_set_bind(ObjBuilder* ob, ObjSymId id, SymBind bind) { diff --git a/src/obj/obj.h b/src/obj/obj.h @@ -461,6 +461,7 @@ void obj_reloc_ex(ObjBuilder*, ObjSecId section_id, u32 offset, RelocKind, * ingested symbol so a roundtrip preserves UNDEFs that another tool * emitted into the input. 
*/
 void obj_sym_mark_referenced(ObjBuilder*, ObjSymId);
+void obj_sym_set_referenced(ObjBuilder*, ObjSymId, int referenced);
 
 ObjAtomId obj_atom_define(ObjBuilder*, ObjSecId section_id, u32 offset,
                           u32 size, ObjSymId signature, u32 flags);
diff --git a/src/obj/symresolve.c b/src/obj/symresolve.c
@@ -0,0 +1,37 @@
+#include "obj/symresolve.h"
+
+SymMergeResult symresolve_merge(SymAttrs existing, SymAttrs incoming) {
+  SymMergeResult r;
+  int new_strength = symresolve_bind_strength(incoming.bind);
+  int old_strength = symresolve_bind_strength(existing.bind);
+  r.kind = SYM_MERGE_KEEP_EXISTING;
+  r.merged_align = 0;
+
+  if (existing.kind == SK_COMMON && incoming.kind == SK_COMMON) {
+    /* Tentative-definition merge: the larger reservation wins, alignment is the
+     * max of both. A smaller-or-equal incoming common changes nothing (this
+     * matches the linker's prior behavior, which did not bump alignment when
+     * the size did not grow). */
+    if (incoming.size > existing.size) {
+      r.kind = SYM_MERGE_COMMON;
+      r.merged_align = (incoming.common_align > existing.common_align)
+                           ? incoming.common_align
+                           : existing.common_align;
+    }
+  } else if (incoming.kind == SK_COMMON) {
+    /* A real definition already present beats an incoming common. */
+  } else if (existing.kind == SK_COMMON) {
+    /* A real definition beats a previously-seen common. */
+    r.kind = SYM_MERGE_REPLACE;
+  } else if (new_strength > old_strength) {
+    r.kind = SYM_MERGE_REPLACE;
+  } else if (new_strength == old_strength &&
+             new_strength == symresolve_bind_strength(SB_GLOBAL)) {
+    /* Two strong definitions: legal only as COFF SELECTANY when both sit in
+     * COMDAT sections (keep the first, discard the new); otherwise ODR. */
+    r.kind = (existing.in_comdat && incoming.in_comdat) ? SYM_MERGE_COMDAT_DISCARD
+                                                        : SYM_MERGE_ODR_ERROR;
+  }
+  /* else: incoming is weaker (or weak-vs-weak); keep the first definition. */
+  return r;
+}
diff --git a/src/obj/symresolve.h b/src/obj/symresolve.h
@@ -0,0 +1,78 @@
+#ifndef KIT_OBJ_SYMRESOLVE_H
+#define KIT_OBJ_SYMRESOLVE_H
+
+#include "obj/obj.h"
+
+/* Symbol-resolution policy, factored out of the linker so it has one source of
+ * truth and a second caller can reuse it: link_resolve_symbols runs it at link
+ * time, and the LTO staging coordinator runs it at the per-TU merge boundary
+ * (doc/plan/LTO.md §3). It is a pure decision over symbol attributes — no
+ * linker state, no allocation. The entangled bookkeeping (the globals hash, the
+ * per-input map, COMDAT section discard, DSO iteration) stays in the caller. */
+
+/* Defined-symbol binding precedence: strong (global) beats weak beats local. */
+static inline int symresolve_bind_strength(u16 bind) {
+  switch (bind) {
+    case SB_GLOBAL:
+      return 3;
+    case SB_WEAK:
+      return 2;
+    case SB_LOCAL:
+      return 1;
+    default:
+      return 0;
+  }
+}
+
+/* A symbol that contributes a definition: not SK_UNDEF, and either an
+ * absolute/common/file pseudo-def or anchored to a real section. */
+static inline int symresolve_sym_is_def(const ObjSym* s) {
+  return s && s->kind != SK_UNDEF &&
+         (s->kind == SK_ABS || s->kind == SK_COMMON || s->kind == SK_FILE ||
+          s->section_id != OBJ_SEC_NONE);
+}
+
+/* An unreferenced global/weak extern declaration: a header artifact, not a real
+ * demand. The frontend synthesizes one per visible prototype, so pruning these
+ * keeps unused archive members from being pulled in. */
+static inline int symresolve_sym_is_spurious_undef(const ObjSym* s) {
+  return s && s->section_id == OBJ_SEC_NONE && s->kind != SK_ABS &&
+         s->kind != SK_COMMON && !s->referenced &&
+         (s->bind == SB_GLOBAL || s->bind == SB_WEAK);
+}
+
+/* The decision-relevant attributes of a defining symbol. `in_comdat` is true
+ * when the definition lives in an SF_GROUP (COMDAT/SELECTANY) section — the
+ * caller computes it, since it requires the section table. */
+typedef struct SymAttrs {
+  u16 bind; /* SymBind */
+  u16 kind; /* SymKind */
+  u64 size;
+  u32 common_align; /* SK_COMMON only; 0 otherwise */
+  u8 in_comdat;
+} SymAttrs;
+
+typedef enum SymMergeKind {
+  SYM_MERGE_KEEP_EXISTING,  /* existing definition wins; drop incoming */
+  SYM_MERGE_REPLACE,        /* incoming definition wins; overwrite existing */
+  SYM_MERGE_COMMON,         /* common+common, incoming larger: take it with
+                             * merged_align = max(both) */
+  SYM_MERGE_COMDAT_DISCARD, /* COFF SELECTANY: keep existing, discard the
+                             * incoming COMDAT section */
+  SYM_MERGE_ODR_ERROR,      /* duplicate strong definition */
+} SymMergeKind;
+
+typedef struct SymMergeResult {
+  SymMergeKind kind;
+  u32 merged_align; /* valid only for SYM_MERGE_COMMON */
+} SymMergeResult;
+
+/* Resolve two definitions of the same name. Both `existing` and `incoming` are
+ * definitions (the caller filters undefs and spurious externs first). Mirrors
+ * the linker's historical precedence exactly: common merging takes the larger
+ * (max align), a real definition beats a common, a stronger binding wins,
+ * strong-vs-strong is an ODR error unless both are COMDAT, and weak-vs-weak /
+ * weaker-incoming keeps the first. */
+SymMergeResult symresolve_merge(SymAttrs existing, SymAttrs incoming);
+
+#endif
diff --git a/src/opt/opt.c b/src/opt/opt.c
@@ -15,6 +15,7 @@
 #include "core/slice.h"
 #include "core/strbuf.h"
 #include "debug/debug.h"
+#include "obj/symresolve.h"
 #include "opt/opt_internal.h"
 
 #undef Operand
@@ -23,11 +24,24 @@
 #undef CGParamDesc
 #undef CGScopeDesc
 
+/* Fixpoint bound for the whole-program inliner. opt_inline internally clamps to
+ * 4; an inlined straightline body introduces no new call sites, so a small
+ * bound converges.
*/ +#define OPT_WHOLE_PROGRAM_INLINE_ITERS 4 + typedef struct OptImpl { Compiler* c; CgTarget* target; NativeTarget* native; int level; + /* Whole-program (LTO) mode: defer all per-function emission to finalize so + * the module-wide sweep can GC dead symbols and run cross-function inlining + * over the full reachable set. Enabled whenever the optimizer runs (-O1 and + * above; see opt_cgtarget_new). The ARM64 path already defers + * unconditionally; this generalizes that to every arch and adds the inliner. + */ + int whole_program; + CgFinishPolicy finish_policy; Writer* dump_writer; /* Registry of functions recorded so far, for tiny-inline callee lookup. * `lowered_cache` is parallel to `cg_by_sym`: a lazily re-lowered @@ -41,6 +55,30 @@ typedef struct OptImpl { HASHMAP_DEFINE(OptFuncIndex, ObjSymId, u32, hash_u32); +/* A symbol whose definition can be replaced at link time must not have its body + * inlined — the inlined copy would defeat the override. Weak definitions are + * interposable in every output kind, so they are never safe to inline. (The + * broader default-visibility interposition under -shared is governed by the + * preserved set; see doc/plan/LTO.md §5/§9.) */ +static int opt_cg_func_interposable(OptImpl* o, const CgIrFunc* cg) { + const ObjSym* s; + if (!cg || cg->desc.sym == OBJ_SYM_NONE) return 0; + if (cg->desc.sym_bind == SB_WEAK) return 1; + s = obj_symbol_get(o->target->obj, cg->desc.sym); + return s && s->bind == SB_WEAK; +} + +/* Lower a recorded function to the pre-machinize Func used by the inliners, and + * mark interposable definitions INLINE_NEVER so neither the streaming + * tiny-inliner nor the whole-program inliner fuses their bodies into callers. + * Marking the callee's policy is honored by effective_inline_policy in both. 
*/ +static Func* opt_lower_for_inline(OptImpl* o, const CgIrFunc* cg) { + Func* f = opt_func_from_cg_ir(o->c, cg); + if (f && opt_cg_func_interposable(o, cg)) + f->desc.inline_policy = KIT_CG_INLINE_NEVER; + return f; +} + /* Lazily re-lower (and cache) the pre-machinize Func for a recorded callee * symbol. Returns NULL for forward-defined callees not yet recorded. */ static Func* opt_tiny_callee_lookup(void* ctx, ObjSymId sym) { @@ -48,7 +86,7 @@ static Func* opt_tiny_callee_lookup(void* ctx, ObjSymId sym) { for (u32 i = 0; i < o->ncg; ++i) { if (o->cg_by_sym[i]->desc.sym != sym) continue; if (!o->lowered_cache[i]) - o->lowered_cache[i] = opt_func_from_cg_ir(o->c, o->cg_by_sym[i]); + o->lowered_cache[i] = opt_lower_for_inline(o, o->cg_by_sym[i]); return o->lowered_cache[i]; } return NULL; @@ -84,15 +122,17 @@ static void opt_dbg_dump(OptImpl* o, Func* f, const char* tag) { (const char*)bytes); } -static void opt_run_o1_native(OptImpl* o, Func* f) { - OptLiveInfo live; - OptLiveInfo regalloc_live; +/* CFG-prep prefix shared by the streaming and whole-program pipelines: lower's + * raw blocks -> built CFG -> jump cleanup -> rebuilt CFG -> local simplify. In + * whole-program mode this runs on every reachable function before opt_inline + * sees the FuncSet, so the inliner observes the same block shape the streaming + * path does. */ +static void opt_o1_native_prepare(OptImpl* o, Func* f) { if (!o->native) compiler_panic(o->c, f ? f->desc.loc : (SrcLoc){0, 0, 0}, "O1 optimizer requires a native target"); opt_dbg_dump(o, f, "entry"); - metrics_scope_begin(o->c, "opt.o1.total"); metrics_count(o->c, "opt.funcs", 1); metrics_count(o->c, "opt.blocks", f->nblocks); metrics_count(o->c, "opt.pregs", f->npregs); @@ -109,6 +149,24 @@ static void opt_run_o1_native(OptImpl* o, Func* f) { metrics_scope_begin(o->c, "opt.cfg.simplify_local"); opt_simplify_local(f); metrics_scope_end(o->c, "opt.cfg.simplify_local"); +} + +/* The machinize-through-emit suffix. 
`cfg_dirty` is set by the whole-program + * path: opt_inline mutated the caller's blocks in place and left CFG analysis + * stale, so the CFG must be rebuilt before tiny-inline/verify/machinize. The + * streaming path passes 0 (prepare just built it). */ +static void opt_o1_native_finish(OptImpl* o, Func* f, int cfg_dirty) { + OptLiveInfo live; + OptLiveInfo regalloc_live; + + if (cfg_dirty) { + /* opt_inline maintains succ + emit_order only; rebuild preds/CFG and merge + * the BR-glue chains back into straight-line blocks (build_cfg -> + * jump_cleanup -> build_cfg) before any pass that needs the analysis. */ + opt_build_cfg(f); + opt_jump_cleanup(f, OPT_JUMP_CLEANUP_CFG); + opt_build_cfg(f); + } metrics_scope_begin(o->c, "opt.o1.tiny_inline"); int inlined = opt_try_tiny_inline(f, opt_tiny_callee_lookup, o); @@ -223,6 +281,15 @@ static void opt_run_o1_native(OptImpl* o, Func* f) { if (o->native->mc && o->native->mc->debug) debug_func_end(o->native->mc->debug); metrics_scope_end(o->c, "opt.emit"); +} + +/* Streaming pipeline for one function: prepare + finish, back to back. Used by + * the eager per-function path (x64/rv64 below -O2) and by the ARM64/-O2 sweep + * for functions with no cross-function inlining to do. */ +static void opt_run_o1_native(OptImpl* o, Func* f) { + metrics_scope_begin(o->c, "opt.o1.total"); + opt_o1_native_prepare(o, f); + opt_o1_native_finish(o, f, /*cfg_dirty=*/0); metrics_scope_end(o->c, "opt.o1.total"); } @@ -317,7 +384,11 @@ static void opt_on_func(void* user, CgIrFunc* cg_func) { /* The dump writer renders the semantic CG IR tape — the IR as recorded, * before lowering to the optimizer's CFG form. */ if (o->dump_writer) cg_ir_func_dump(cg_func, o->dump_writer); - if (o->c->target.arch == KIT_ARCH_ARM_64) return; + /* Defer emission to the finalize sweep whenever whole-program mode is on — + * the same path for every arch. The sweep does GC + cross-function inlining + * over the full reachable set. 
The eager emit below is the fallback for a + * (currently unreachable) non-whole-program configuration. */ + if (o->whole_program) return; metrics_scope_begin(o->c, "opt.o1.cg_ir_lower"); f = opt_func_from_cg_ir(o->c, cg_func); metrics_scope_end(o->c, "opt.o1.cg_ir_lower"); @@ -325,9 +396,94 @@ static void opt_on_func(void* user, CgIrFunc* cg_func) { opt_maybe_capture_interp(o, cg_func); } +static int opt_module_has_asm(const CgIrModule* module) { + if (!module) return 0; + if (module->nfile_scope_asms) return 1; + for (u32 i = 0; i < module->nfuncs; ++i) { + const CgIrFunc* f = module->funcs[i]; + if (!f || f->removed) continue; + for (u32 k = 0; k < f->ninsts; ++k) + if (f->insts[k].op == CG_IR_ASM_BLOCK) return 1; + } + return 0; +} + +static int opt_sym_in_preserved_section(OptImpl* o, ObjSymId sym, + const ObjSym* s) { + const Section* sec; + const ObjAtom* atom; + ObjAtomId aid; + if (!o || !s || s->section_id == OBJ_SEC_NONE) return 0; + sec = obj_section_get(o->target->obj, s->section_id); + if (sec && ((sec->flags & SF_RETAIN) || sec->sem == SSEM_INIT_ARRAY || + sec->sem == SSEM_FINI_ARRAY || sec->sem == SSEM_PREINIT_ARRAY)) + return 1; + aid = obj_atom_find_symbol(o->target->obj, sym); + atom = obj_atom_get(o->target->obj, aid); + return atom && (atom->flags & OBJ_ATOM_RETAIN); +} + +static void opt_build_preserved_set(OptImpl* o, ObjSymSet* preserved) { + for (u32 i = 0; i < o->finish_policy.npreserved_symbols; ++i) { + ObjSymId sym = o->finish_policy.preserved_symbols[i]; + if (sym != OBJ_SYM_NONE) (void)ObjSymSet_set(preserved, sym, 1); + } +} + +static int opt_sym_must_stay_external(OptImpl* o, int module_has_asm, + const ObjSymSet* preserved, ObjSymId sym, + const ObjSym* s) { + if (!s || s->removed) return 1; + if (s->bind == SB_LOCAL) return 1; + if (o->finish_policy.output_kind != KIT_CG_OUTPUT_EXECUTABLE) return 1; + if (o->finish_policy.interposition_policy == + KIT_CG_INTERPOSITION_DEFAULT_VISIBILITY) + return 1; + if (ObjSymSet_get(preserved, 
sym)) return 1; + if (s->bind == SB_WEAK) return 1; + if (s->kind == SK_IFUNC) return 1; + if (s->flags & KIT_CG_SYM_USED) return 1; + if (module_has_asm) return 1; + if (opt_sym_in_preserved_section(o, sym, s)) return 1; + return 0; +} + +static int opt_sym_internalizable(const ObjSym* s) { + if (!s || s->removed || s->bind == SB_LOCAL) return 0; + if (!symresolve_sym_is_def(s)) return 0; + switch ((SymKind)s->kind) { + case SK_FUNC: + case SK_OBJ: + case SK_TLS: + return 1; + default: + return 0; + } +} + +static void opt_internalize_non_preserved(OptImpl* o, const CgIrModule* module, + const ObjSymSet* preserved) { + ObjSymIter* it; + ObjSymEntry ent; + int module_has_asm; + if (!o || !o->target || !o->target->obj) return; + if (o->finish_policy.output_kind != KIT_CG_OUTPUT_EXECUTABLE) return; + module_has_asm = opt_module_has_asm(module); + it = obj_symiter_new(o->target->obj); + while (it && obj_symiter_next(it, &ent)) { + const ObjSym* s = ent.sym; + if (!opt_sym_internalizable(s)) continue; + if (opt_sym_must_stay_external(o, module_has_asm, preserved, ent.id, s)) + continue; + obj_symbol_set_bind(o->target->obj, ent.id, SB_LOCAL); + obj_symbol_set_vis(o->target->obj, ent.id, SV_HIDDEN); + } + if (it) obj_symiter_free(it); +} + static int opt_func_is_root(OptImpl* o, const CgIrFunc* f) { const ObjSym* s; - if (!f || f->desc.sym == OBJ_SYM_NONE) return 0; + if (!f || f->removed || f->desc.sym == OBJ_SYM_NONE) return 0; s = obj_symbol_get(o->target->obj, f->desc.sym); if (!s || s->removed) return 0; if (s->bind != SB_LOCAL) return 1; @@ -335,6 +491,62 @@ static int opt_func_is_root(OptImpl* o, const CgIrFunc* f) { return 0; } +static SymAttrs opt_func_sym_attrs(OptImpl* o, const CgIrFunc* f) { + SymAttrs a; + const ObjSym* s; + memset(&a, 0, sizeof a); + if (!f) return a; + s = obj_symbol_get(o->target->obj, f->desc.sym); + a.bind = f->desc.sym_bind ? f->desc.sym_bind : (s ? s->bind : SB_GLOBAL); + a.kind = f->desc.sym_kind ? 
f->desc.sym_kind : SK_FUNC; + a.size = s ? s->size : 0; + a.common_align = 0; + a.in_comdat = 0; + return a; +} + +static void opt_resolve_duplicate_funcs(OptImpl* o, const CgIrModule* module, + OptFuncIndex* index) { + for (u32 i = 0; i < module->nfuncs; ++i) { + CgIrFunc* incoming = module->funcs[i]; + u32* existing_idx; + if (!incoming || incoming->removed || incoming->desc.sym == OBJ_SYM_NONE) + continue; + existing_idx = OptFuncIndex_get(index, incoming->desc.sym); + if (!existing_idx) { + (void)OptFuncIndex_set(index, incoming->desc.sym, i); + continue; + } + { + CgIrFunc* existing = module->funcs[*existing_idx]; + SymMergeResult mr = symresolve_merge(opt_func_sym_attrs(o, existing), + opt_func_sym_attrs(o, incoming)); + switch (mr.kind) { + case SYM_MERGE_REPLACE: + if (existing) existing->removed = 1; + (void)OptFuncIndex_set(index, incoming->desc.sym, i); + if (incoming->desc.sym_bind) + obj_symbol_set_bind(o->target->obj, incoming->desc.sym, + (SymBind)incoming->desc.sym_bind); + break; + case SYM_MERGE_KEEP_EXISTING: + case SYM_MERGE_COMDAT_DISCARD: + incoming->removed = 1; + if (existing && existing->desc.sym_bind) + obj_symbol_set_bind(o->target->obj, incoming->desc.sym, + (SymBind)existing->desc.sym_bind); + break; + case SYM_MERGE_COMMON: + incoming->removed = 1; + break; + case SYM_MERGE_ODR_ERROR: + compiler_panic(o->c, incoming->desc.loc, + "duplicate definition of symbol"); + } + } + } +} + static void opt_mark_func(u8* reachable, u8* queued, u32* queue, u32* qtail, u32 idx) { if (reachable[idx]) return; @@ -492,8 +704,16 @@ static void opt_prune_debug(OptImpl* o) { debug_prune_removed_funcs(o->native->mc->debug); } -static void opt_emit_reachable_aarch64(OptImpl* o, const CgIrModule* module) { +/* Whole-module finalize: seed roots, walk the call/use + data-reloc graph, + * remove unreachable local symbols, then lower + optimize + emit only the live + * set. 
Arch-independent — the ARM64 path has always finalized this way; -O2 now + * routes every arch through here (see opt_on_finalize). When `do_inline` is set + * the live functions are lowered into a FuncSet and run through the + * whole-program inliner before the per-function machinize/emit suffix. */ +static void opt_whole_module_finalize(OptImpl* o, const CgIrModule* module, + int do_inline) { OptFuncIndex index; + ObjSymSet preserved; ObjSymSet data_seen; u8* reachable; u8* queued; @@ -506,6 +726,7 @@ static void opt_emit_reachable_aarch64(OptImpl* o, const CgIrModule* module) { u32 nsym = 1; if (!module || !module->nfuncs) return; OptFuncIndex_init_cap(&index, o->c->ctx->heap, 0); + ObjSymSet_init_cap(&preserved, o->c->ctx->heap, 0); ObjSymSet_init_cap(&data_seen, o->c->ctx->heap, 0); reachable = arena_zarray(o->c->tu, u8, module->nfuncs); queued = arena_zarray(o->c->tu, u8, module->nfuncs); @@ -517,12 +738,11 @@ static void opt_emit_reachable_aarch64(OptImpl* o, const CgIrModule* module) { if (it) obj_symiter_free(it); } data_queue = arena_array(o->c->tu, ObjSymId, nsym); + opt_resolve_duplicate_funcs(o, module, &index); + opt_build_preserved_set(o, &preserved); + opt_internalize_non_preserved(o, module, &preserved); for (u32 i = 0; i < module->nfuncs; ++i) { - CgIrFunc* f = module->funcs[i]; - if (f && f->desc.sym != OBJ_SYM_NONE) - (void)OptFuncIndex_set(&index, f->desc.sym, i); - } - for (u32 i = 0; i < module->nfuncs; ++i) { + if (module->funcs[i] && module->funcs[i]->removed) continue; if (module->nfile_scope_asms || opt_func_is_root(o, module->funcs[i])) opt_mark_func(reachable, queued, queue, &qtail, i); } @@ -541,6 +761,7 @@ static void opt_emit_reachable_aarch64(OptImpl* o, const CgIrModule* module) { } for (u32 i = 0; i < module->nfuncs; ++i) { CgIrFunc* cg_func = module->funcs[i]; + if (cg_func && cg_func->removed) continue; if (reachable[i]) continue; if (cg_func && cg_func->desc.sym != OBJ_SYM_NONE) { const ObjSym* s = obj_symbol_get(o->target->obj, 
cg_func->desc.sym); @@ -549,17 +770,61 @@ static void opt_emit_reachable_aarch64(OptImpl* o, const CgIrModule* module) { } } opt_prune_debug(o); - for (u32 i = 0; i < module->nfuncs; ++i) { - Func* f; - if (!reachable[i]) continue; - metrics_scope_begin(o->c, "opt.o1.cg_ir_lower"); - f = opt_func_from_cg_ir(o->c, module->funcs[i]); - metrics_scope_end(o->c, "opt.o1.cg_ir_lower"); - opt_run_o1_native(o, f); - opt_maybe_capture_interp(o, module->funcs[i]); + if (!do_inline) { + /* Streaming emit: lower and run the full per-function pipeline in place. + * Preserves the historical ARM64 -O1 behavior exactly. */ + for (u32 i = 0; i < module->nfuncs; ++i) { + Func* f; + if (!reachable[i]) continue; + metrics_scope_begin(o->c, "opt.o1.cg_ir_lower"); + f = opt_func_from_cg_ir(o->c, module->funcs[i]); + metrics_scope_end(o->c, "opt.o1.cg_ir_lower"); + opt_run_o1_native(o, f); + opt_maybe_capture_interp(o, module->funcs[i]); + } + } else { + /* Whole-program inline: lower + CFG-prep every live function into one + * FuncSet so the inliner can resolve direct callees by symbol across the + * module, inline under the growth-gated cost model, then run the + * machinize/emit suffix on each (cfg_dirty=1 because opt_inline left the + * caller CFGs stale). Functions and their source CgIrFuncs are tracked in + * parallel so the interp-capture re-lowers the right body. */ + FuncSet fs; + CgIrFunc** cg_srcs; + u32 nlive = 0; + for (u32 i = 0; i < module->nfuncs; ++i) + if (reachable[i] && module->funcs[i] && !module->funcs[i]->removed) + ++nlive; + memset(&fs, 0, sizeof fs); + fs.c = o->c; + fs.arena = o->c->tu; + fs.funcs = arena_array(o->c->tu, Func*, nlive ? nlive : 1u); + fs.cap = nlive; + cg_srcs = arena_array(o->c->tu, CgIrFunc*, nlive ? 
nlive : 1u); + for (u32 i = 0; i < module->nfuncs; ++i) { + Func* f; + if (!reachable[i] || !module->funcs[i] || module->funcs[i]->removed) + continue; + metrics_scope_begin(o->c, "opt.o1.cg_ir_lower"); + f = opt_lower_for_inline(o, module->funcs[i]); + metrics_scope_end(o->c, "opt.o1.cg_ir_lower"); + opt_o1_native_prepare(o, f); + cg_srcs[fs.nfuncs] = module->funcs[i]; + fs.funcs[fs.nfuncs++] = f; + } + metrics_scope_begin(o->c, "opt.inline.total"); + opt_inline(&fs, OPT_WHOLE_PROGRAM_INLINE_ITERS); + metrics_scope_end(o->c, "opt.inline.total"); + for (u32 k = 0; k < fs.nfuncs; ++k) { + metrics_scope_begin(o->c, "opt.o1.total"); + opt_o1_native_finish(o, fs.funcs[k], /*cfg_dirty=*/1); + metrics_scope_end(o->c, "opt.o1.total"); + opt_maybe_capture_interp(o, cg_srcs[k]); + } } opt_refresh_or_prune_aliases(o, module, &index, reachable); ObjSymSet_fini(&data_seen); + ObjSymSet_fini(&preserved); OptFuncIndex_fini(&index); } @@ -573,8 +838,12 @@ static void opt_on_finalize(void* user, const CgIrModule* module) { o->native->file_scope_asm(o->native, module->file_scope_asms[i].src, module->file_scope_asms[i].len); } - if (o->c->target.arch == KIT_ARCH_ARM_64) - opt_emit_reachable_aarch64(o, module); + /* Whole-program mode finalizes through the module sweep — one path for every + * arch: GC + cross-function inlining over the full reachable set. If it were + * off, every arch would have emitted eagerly in opt_on_func and the sweep + * would find nothing, so it is skipped. 
*/ + if (o->whole_program) + opt_whole_module_finalize(o, module, /*do_inline=*/o->whole_program); if (o->native && o->native->finalize) o->native->finalize(o->native); } @@ -634,7 +903,13 @@ CgTarget* opt_cgtarget_new(Compiler* c, CgTarget* target, int level) { o->c = c; o->target = target; o->native = native_direct_target_native(target); - o->level = 1; + o->level = level; + /* Whenever the optimizer is engaged (-O1 and above) we run whole-program + * optimization: deferred emission plus the module-wide reachability sweep and + * cross-function inliner. The optimizer recorder only exists at level >= 1 + * (see kit_cg_begin), so this is effectively "on whenever optimizing". + * -O0 uses the single-pass direct target and never reaches this code. */ + o->whole_program = (level >= 1) ? 1 : 0; CgIrRecorderConfig cfg; memset(&cfg, 0, sizeof cfg); @@ -652,3 +927,11 @@ void opt_set_dump_writer(CgTarget* t, Writer* w) { OptImpl* o = rec ? (OptImpl*)cg_ir_recorder_user(rec) : NULL; if (o) o->dump_writer = w; } + +void opt_set_finish_policy(CgTarget* t, const CgFinishPolicy* policy) { + CgIrRecorder* rec = cg_ir_recorder_from_target(t); + OptImpl* o = rec ? (OptImpl*)cg_ir_recorder_user(rec) : NULL; + if (!o) return; + memset(&o->finish_policy, 0, sizeof(o->finish_policy)); + if (policy) o->finish_policy = *policy; +} diff --git a/src/opt/opt.h b/src/opt/opt.h @@ -11,6 +11,7 @@ * opt_level >= 1 is normalized internally to this O1 path. 
*/ CgTarget* opt_cgtarget_new(Compiler*, CgTarget* target, int level); Func* opt_func_from_cg_ir(Compiler*, const CgIrFunc*); +void opt_set_finish_policy(CgTarget*, const CgFinishPolicy*); /* Interpreter tap: run the maximal target-independent subset of the O1 pipeline * (everything in opt_run_o1_native up to, but excluding, opt_machinize_native / diff --git a/src/opt/pass_lower.c b/src/opt/pass_lower.c @@ -342,12 +342,11 @@ static void set_preg_pref_for_params(Func* f) { * f->desc.abi so this fires on paths where only f->params[i].abi is set. */ u32 next_int = 0; u32 next_fp = 0; - /* sret on non-aa64 targets consumes the first int arg slot. Only consult - * f->desc.abi for this when it's available; aa64 (the only arch where this - * hint targets x0..x7 today) doesn't have the sret-takes-arg0 quirk. */ - if (f->desc.abi && f->desc.abi->has_sret && - f->opt_target.arch != KIT_ARCH_ARM_64) - next_int = 1; + /* An sret pointer passed in the first integer argument register consumes + * that slot (SysV-x64 rdi, Win64 rcx, RISC-V a0); ABIs that return it in a + * dedicated register (AArch64 x8) do not. Driven by the ABI descriptor so no + * arch identity is needed here. */ + if (f->desc.abi && f->desc.abi->sret_consumes_int_arg) next_int = 1; for (u32 i = 0; i < f->nparams; ++i) { IRParam* p = &f->params[i]; const ABIArgInfo* ai = p->abi; diff --git a/src/wasm/wasm.h b/src/wasm/wasm.h @@ -566,6 +566,7 @@ void wasm_validate(WasmModule* m, KitCompiler* c); * Used by wasm_validate and by callers that synthesize scratch functions * (e.g. the wasm-target inline-asm path). 
*/ void wasm_validate_func(KitCompiler* c, WasmModule* m, WasmFunc* f); +void wasm_emit_cg_into(KitCompiler* c, KitCg* cg, const WasmModule* m); void wasm_emit_cg(KitCompiler* c, const KitCodeOptions* code_opts, KitObjBuilder* out, const WasmModule* m); void wasm_encode(KitCompiler* c, const WasmModule* m, KitWriter* out); diff --git a/test/api/cg_fp_cmp_test.c b/test/api/cg_fp_cmp_test.c @@ -200,13 +200,14 @@ static void run_exec(void) { EXPECT(kit_cg_new(c, &cg) == KIT_OK && cg, "%s: cg_new", tag); memset(&opts, 0, sizeof opts); opts.opt_level = 1; /* interp capture requires the optimizer pass */ - kit_cg_begin_obj(cg, ob, &opts); + kit_cg_begin(cg, ob, &opts); for (i = 0; i < NPRED; ++i) { snprintf(nm, sizeof nm, "cmp_%s_%d", tag, i); build_cmp_fn(c, cg, nm, PREDS[i].op, /*use_f128=*/0); } - EXPECT(kit_cg_end_obj(cg) == KIT_OK, "%s: end_obj", tag); + EXPECT(kit_cg_finish(cg, NULL) == KIT_OK, "%s: finish", tag); + EXPECT(kit_cg_detach(cg) == KIT_OK, "%s: detach", tag); for (i = 0; i < NPRED; ++i) { KitInterpFunc* fn; @@ -254,7 +255,7 @@ static void run_emit(KitArchKind arch, KitOSKind os, KitObjFmt fmt, EXPECT(kit_cg_new(c, &cg) == KIT_OK && cg, "%s: cg_new", tag); memset(&opts, 0, sizeof opts); opts.opt_level = opt_level; - kit_cg_begin_obj(cg, ob, &opts); + kit_cg_begin(cg, ob, &opts); for (i = 0; i < NPRED; ++i) { snprintf(nm, sizeof nm, "emit_%s_o%d_f64_%d", tag, opt_level, i); @@ -266,7 +267,9 @@ static void run_emit(KitArchKind arch, KitOSKind os, KitObjFmt fmt, } /* If any backend mishandles a new opcode it panics here (aborting the test); * otherwise the object finalizes cleanly. 
*/ - EXPECT(kit_cg_end_obj(cg) == KIT_OK, "%s/O%d: end_obj failed", tag, + EXPECT(kit_cg_finish(cg, NULL) == KIT_OK, "%s/O%d: finish failed", tag, + opt_level); + EXPECT(kit_cg_detach(cg) == KIT_OK, "%s/O%d: detach failed", tag, opt_level); kit_cg_free(cg); diff --git a/test/api/cg_switch_test.c b/test/api/cg_switch_test.c @@ -88,7 +88,7 @@ static void build_switch_fn(KitCompiler* c, KitCgTypeId i32_ty, if (!ob) return; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) (void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "[%s/O%d] cg_new failed", sh->name, opt_level); if (!cg) { kit_obj_builder_free(ob); diff --git a/test/api/cg_type_test.c b/test/api/cg_type_test.c @@ -50,6 +50,11 @@ static int open_emitted_obj(KitCompiler* c, KitObjBuilder* ob, return 1; } +static void finish_cg(KitCg* cg, const char* tag) { + EXPECT(kit_cg_finish(cg, NULL) == KIT_OK, "%s cg finish failed", tag); + EXPECT(kit_cg_detach(cg) == KIT_OK, "%s cg detach failed", tag); +} + typedef struct PanicRunCtx { void (*fn)(void*); void* arg; @@ -97,7 +102,7 @@ static void exercise_cg_handles(KitCompiler* c, KitCgTypeId i32_ty, if (!ob) return; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) (void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "cg allocation failed"); if (!cg) { kit_obj_builder_free(ob); @@ -179,7 +184,7 @@ static void exercise_cg_scalar_local(KitCompiler* c, KitCgTypeId i32_ty, if (!ob) return; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) (void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "cg allocation failed"); if (!cg) { kit_obj_builder_free(ob); @@ -249,7 +254,7 @@ static void exercise_cg_late_local_addr(KitCompiler* c, KitCgTypeId i32_ty, if (!ob) return; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) (void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "cg allocation failed"); 
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -337,7 +342,7 @@ static void exercise_cg_data_entsize(KitCompiler* c, KitCgTypeId i8_ty) {
   if (!ob) return;
   cg = NULL;
   (void)kit_cg_new(c, &cg);
-  if (cg) (void)kit_cg_begin_obj(cg, ob, &opts);
+  if (cg) (void)kit_cg_begin(cg, ob, &opts);
   EXPECT(cg != NULL, "entsize cg allocation failed");
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -368,6 +373,7 @@ static void exercise_cg_data_entsize(KitCompiler* c, KitCgTypeId i8_ty) {
   kit_cg_data_begin(cg, sym, data_attrs);
   kit_cg_data_bytes(cg, bytes, sizeof bytes);
   kit_cg_data_end(cg);
+  finish_cg(cg, "entsize");
   kit_cg_free(cg);

   {
@@ -438,7 +444,7 @@ static void exercise_cg_literal_folds(KitCompiler* c, KitCgTypeId i32_ty) {
   if (!ob) return;
   cg = NULL;
   (void)kit_cg_new(c, &cg);
-  if (cg) (void)kit_cg_begin_obj(cg, ob, &opts);
+  if (cg) (void)kit_cg_begin(cg, ob, &opts);
   EXPECT(cg != NULL, "literal fold cg allocation failed");
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -481,6 +487,7 @@ static void exercise_cg_literal_folds(KitCompiler* c, KitCgTypeId i32_ty) {
     kit_cg_func_end(cg);
   }

+  finish_cg(cg, "literal fold");
   kit_cg_free(cg);
   EXPECT(text_size(c, ob) <= 128,
          "literal folds should avoid arithmetic materialization, text size=%u",
@@ -509,7 +516,7 @@ static uint32_t cg_emit_delayed_chain(KitCompiler* c, KitCgTypeId i32_ty,
   if (!ob) return 0;
   cg = NULL;
   (void)kit_cg_new(c, &cg);
-  if (cg) (void)kit_cg_begin_obj(cg, ob, &opts);
+  if (cg) (void)kit_cg_begin(cg, ob, &opts);
   EXPECT(cg != NULL, "delayed chain cg allocation failed");
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -555,6 +562,7 @@ static uint32_t cg_emit_delayed_chain(KitCompiler* c, KitCgTypeId i32_ty,
   kit_cg_ret(cg);
   kit_cg_func_end(cg);

+  finish_cg(cg, "delayed chain");
   kit_cg_free(cg);
   size = text_size(c, ob);
   kit_obj_builder_free(ob);
@@ -582,7 +590,7 @@ static uint32_t cg_emit_unary_chain(KitCompiler* c, KitCgTypeId i32_ty,
   if (!ob) return 0;
   cg = NULL;
   (void)kit_cg_new(c, &cg);
-  if (cg) (void)kit_cg_begin_obj(cg, ob, &opts);
+  if (cg) (void)kit_cg_begin(cg, ob, &opts);
   EXPECT(cg != NULL, "unary chain cg allocation failed");
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -626,6 +634,7 @@ static uint32_t cg_emit_unary_chain(KitCompiler* c, KitCgTypeId i32_ty,
   kit_cg_ret(cg);
   kit_cg_func_end(cg);

+  finish_cg(cg, "unary chain");
   kit_cg_free(cg);
   size = text_size(c, ob);
   kit_obj_builder_free(ob);
@@ -652,7 +661,7 @@ static uint32_t cg_emit_local_shadow(KitCompiler* c, KitCgTypeId i32_ty,
   if (!ob) return 0;
   cg = NULL;
   (void)kit_cg_new(c, &cg);
-  if (cg) (void)kit_cg_begin_obj(cg, ob, &opts);
+  if (cg) (void)kit_cg_begin(cg, ob, &opts);
   EXPECT(cg != NULL, "local shadow cg allocation failed");
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -695,6 +704,7 @@ static uint32_t cg_emit_local_shadow(KitCompiler* c, KitCgTypeId i32_ty,
   kit_cg_ret(cg);
   kit_cg_func_end(cg);

+  finish_cg(cg, "local shadow");
   kit_cg_free(cg);
   size = text_size(c, ob);
   kit_obj_builder_free(ob);
@@ -722,7 +732,7 @@ static uint32_t cg_emit_delayed_cmp(KitCompiler* c, KitCgTypeId i32_ty,
   if (!ob) return 0;
   cg = NULL;
   (void)kit_cg_new(c, &cg);
-  if (cg) (void)kit_cg_begin_obj(cg, ob, &opts);
+  if (cg) (void)kit_cg_begin(cg, ob, &opts);
   EXPECT(cg != NULL, "delayed cmp cg allocation failed");
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -770,6 +780,7 @@ static uint32_t cg_emit_delayed_cmp(KitCompiler* c, KitCgTypeId i32_ty,
   kit_cg_ret(cg);
   kit_cg_func_end(cg);

+  finish_cg(cg, "delayed cmp");
   kit_cg_free(cg);
   size = text_size(c, ob);
   kit_obj_builder_free(ob);
@@ -798,7 +809,7 @@ static uint32_t cg_emit_delayed_store(KitCompiler* c, KitCgTypeId i32_ty,
   if (!ob) return 0;
   cg = NULL;
   (void)kit_cg_new(c, &cg);
-  if (cg) (void)kit_cg_begin_obj(cg, ob, &opts);
+  if (cg) (void)kit_cg_begin(cg, ob, &opts);
   EXPECT(cg != NULL, "delayed store cg allocation failed");
   if (!cg) {
     kit_obj_builder_free(ob);
@@ -851,6 +862,7 @@ static uint32_t cg_emit_delayed_store(KitCompiler* c, KitCgTypeId i32_ty,
   kit_cg_ret(cg);
   kit_cg_func_end(cg);

+  finish_cg(cg, "delayed store");
kit_cg_free(cg); size = text_size(c, ob); kit_obj_builder_free(ob); @@ -879,7 +891,7 @@ static uint32_t cg_emit_delayed_pressure(KitCompiler* c, KitCgTypeId i32_ty, if (!ob) return 0; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) (void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "delayed pressure cg allocation failed"); if (!cg) { kit_obj_builder_free(ob); @@ -935,6 +947,7 @@ static uint32_t cg_emit_delayed_pressure(KitCompiler* c, KitCgTypeId i32_ty, kit_cg_ret(cg); kit_cg_func_end(cg); + finish_cg(cg, "delayed pressure"); kit_cg_free(cg); size = text_size(c, ob); kit_obj_builder_free(ob); @@ -971,7 +984,7 @@ static uint32_t cg_emit_local_shadow_boundary(KitCompiler* c, if (!ob) return 0; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) (void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "local shadow boundary cg allocation failed"); if (!cg) { kit_obj_builder_free(ob); @@ -1048,6 +1061,7 @@ static uint32_t cg_emit_local_shadow_boundary(KitCompiler* c, kit_cg_ret(cg); kit_cg_func_end(cg); + finish_cg(cg, "local shadow boundary"); kit_cg_free(cg); size = text_size(c, ob); kit_obj_builder_free(ob); @@ -1077,7 +1091,7 @@ static uint32_t cg_emit_local_shadow_partial_store(KitCompiler* c, if (!ob) return 0; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) (void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "partial shadow cg allocation failed"); if (!cg) { kit_obj_builder_free(ob); @@ -1127,6 +1141,7 @@ static uint32_t cg_emit_local_shadow_partial_store(KitCompiler* c, kit_cg_ret(cg); kit_cg_func_end(cg); + finish_cg(cg, "partial shadow"); kit_cg_free(cg); size = text_size(c, ob); kit_obj_builder_free(ob); @@ -1218,7 +1233,7 @@ static KitCg* cg_begin_bad_store_func(KitCompiler* c, const char* name) { if (!ob) return NULL; cg = NULL; (void)kit_cg_new(c, &cg); - if (cg) (void)kit_cg_begin_obj(cg, ob, &opts); + if (cg) 
(void)kit_cg_begin(cg, ob, &opts); EXPECT(cg != NULL, "bad-store cg allocation failed"); if (!cg) { kit_obj_builder_free(ob); @@ -1339,20 +1354,83 @@ static void exercise_cg_begin_end_two_objects(KitCompiler* c) { EXPECT(ob1 && ob2, "obj builder allocation failed for cg begin/end session"); EXPECT(kit_cg_new(c, &cg) == KIT_OK && cg, "cg session new failed"); if (cg && ob1) { - EXPECT(kit_cg_begin_obj(cg, ob1, &opts) == KIT_OK, + EXPECT(kit_cg_begin(cg, ob1, &opts) == KIT_OK, "cg begin first object failed"); - EXPECT(kit_cg_end_obj(cg) == KIT_OK, "cg end first object failed"); + EXPECT(kit_cg_finish(cg, NULL) == KIT_OK, "cg finish first object failed"); + EXPECT(kit_cg_detach(cg) == KIT_OK, "cg detach first object failed"); } if (cg && ob2) { - EXPECT(kit_cg_begin_obj(cg, ob2, &opts) == KIT_OK, + EXPECT(kit_cg_begin(cg, ob2, &opts) == KIT_OK, "cg begin second object failed"); - EXPECT(kit_cg_end_obj(cg) == KIT_OK, "cg end second object failed"); + EXPECT(kit_cg_finish(cg, NULL) == KIT_OK, + "cg finish second object failed"); + EXPECT(kit_cg_detach(cg) == KIT_OK, "cg detach second object failed"); } kit_cg_free(cg); kit_obj_builder_free(ob1); kit_obj_builder_free(ob2); } +static void exercise_cg_free_does_not_finish(KitCompiler* c, + KitCgTypeId i32_ty) { + KitCodeOptions opts; + KitCgUnitOptions unit_opts; + KitObjBuilder* ob = NULL; + KitCg* cg = NULL; + KitCgFuncSig sig; + KitCgDecl decl; + KitCgSym sym; + + memset(&opts, 0, sizeof opts); + opts.opt_level = 1; + ob = new_obj(c); + EXPECT(ob != NULL, "no-finish obj builder allocation failed"); + if (!ob) return; + EXPECT(kit_cg_new(c, &cg) == KIT_OK && cg, "no-finish cg new failed"); + if (!cg) { + kit_obj_builder_free(ob); + return; + } + EXPECT(kit_cg_begin(cg, ob, &opts) == KIT_OK, "no-finish cg begin failed"); + memset(&unit_opts, 0, sizeof unit_opts); + unit_opts.source_name = KIT_SLICE_LIT("cg_free_does_not_finish.c"); + EXPECT(kit_cg_begin_unit(cg, &unit_opts) == KIT_OK, + "no-finish begin unit failed"); + + 
+  memset(&sig, 0, sizeof sig);
+  {
+    KitCgFuncResult result;
+    memset(&result, 0, sizeof result);
+    result.type = i32_ty;
+    sig.results = &result;
+    sig.nresults = 1;
+    sig.call_conv = KIT_CG_CC_TARGET_C;
+
+    memset(&decl, 0, sizeof decl);
+    decl.kind = KIT_CG_DECL_FUNC;
+    decl.linkage_name =
+        kit_sym_intern(c, KIT_SLICE_LIT("cg_free_does_not_finish"));
+    decl.display_name = decl.linkage_name;
+    decl.type = kit_cg_type_func(c, sig);
+    decl.sym.bind = KIT_SB_GLOBAL;
+    decl.sym.visibility = KIT_CG_VIS_DEFAULT;
+    sym = kit_cg_decl(cg, decl);
+  }
+  EXPECT(sym != KIT_CG_SYM_NONE, "no-finish decl failed");
+  kit_cg_func_begin(cg, sym);
+  kit_cg_push_int(cg, 42, i32_ty);
+  kit_cg_ret(cg);
+  kit_cg_func_end(cg);
+  EXPECT(kit_cg_end_unit(cg) == KIT_OK, "no-finish end unit failed");
+
+  kit_cg_free(cg);
+  EXPECT(kit_obj_builder_finalize(ob) == KIT_OK,
+         "no-finish explicit object finalize failed");
+  EXPECT(text_size(c, ob) == 0,
+         "kit_cg_free must not lower optimized IR into the object");
+  kit_obj_builder_free(ob);
+}
+
 int main(void) {
   KitTargetSpec target;
   KitCompiler* c;
@@ -1531,6 +1609,7 @@ int main(void) {
   exercise_cg_memory_mismatch_diags(c, i32_ty, i64_ty, rec);
   exercise_compile_session_two_deltas(c);
   exercise_cg_begin_end_two_objects(c);
+  exercise_cg_free_does_not_finish(c, i32_ty);
   kit_compiler_free(c);
   kit_unit_summary(&g_u, "cg_api_test");
diff --git a/test/arch/inline_public_test.h b/test/arch/inline_public_test.h
@@ -55,7 +55,7 @@ static inline KitStatus it_emit_func(KitCompiler* c, void* user) {
   if (kit_obj_builder_new(c, &emit->ob) != KIT_OK) return KIT_ERR;
   if (kit_cg_new(c, &cg) != KIT_OK || !cg) return KIT_ERR;
   memset(&opts, 0, sizeof opts);
-  if (kit_cg_begin_obj(cg, emit->ob, &opts) != KIT_OK) return KIT_ERR;
+  if (kit_cg_begin(cg, emit->ob, &opts) != KIT_OK) return KIT_ERR;
   bi = kit_cg_builtin_types(c);
   memset(&sig, 0, sizeof sig);
@@ -75,7 +75,8 @@ static inline KitStatus it_emit_func(KitCompiler* c, void* user) {
   emit->body(c, cg, bi.id[KIT_CG_BUILTIN_I64]);
   kit_cg_ret(cg);
   kit_cg_func_end(cg);
-  if (kit_cg_end_obj(cg) != KIT_OK) return KIT_ERR;
+  if (kit_cg_finish(cg, NULL) != KIT_OK) return KIT_ERR;
+  if (kit_cg_detach(cg) != KIT_OK) return KIT_ERR;
   kit_cg_free(cg);
   return KIT_OK;
 }
diff --git a/test/cg/strength_reduce_test.c b/test/cg/strength_reduce_test.c
@@ -58,7 +58,7 @@ static KitStatus emit_binop_fn(KitCompiler* c, void* user) {
   if (kit_cg_new(c, &cg) != KIT_OK || !cg) return KIT_ERR;
   memset(&opts, 0, sizeof opts);
   opts.opt_level = 0; /* the -O0 peephole is the subject under test */
-  if (kit_cg_begin_obj(cg, ctx->ob, &opts) != KIT_OK) return KIT_ERR;
+  if (kit_cg_begin(cg, ctx->ob, &opts) != KIT_OK) return KIT_ERR;
   bi = kit_cg_builtin_types(c);
   i64_ty = bi.id[KIT_CG_BUILTIN_I64];
@@ -99,7 +99,8 @@ static KitStatus emit_binop_fn(KitCompiler* c, void* user) {
   kit_cg_ret(cg);
   kit_cg_func_end(cg);
-  if (kit_cg_end_obj(cg) != KIT_OK) return KIT_ERR;
+  if (kit_cg_finish(cg, NULL) != KIT_OK) return KIT_ERR;
+  if (kit_cg_detach(cg) != KIT_OK) return KIT_ERR;
   kit_cg_free(cg);
   return KIT_OK;
 }
diff --git a/test/opt/lto_phase1.sh b/test/opt/lto_phase1.sh
@@ -0,0 +1,461 @@
+#!/usr/bin/env bash
+# Cross-TU LTO Phase 1: all source-building verbs route through the shared
+# staging engine, semantic frontends can emit into one open KitCg, and opaque
+# asm remains an ordinary object participant.
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+KIT="${KIT:-$ROOT/build/kit}"
+WORK="$ROOT/build/test/opt/lto_phase1"
+mkdir -p "$WORK"
+
+call_mnemonics='\b(bl|blr|callq?|jalr?)\b'
+
+fail_log() {
+  local label="$1"
+  local log="$2"
+  printf 'lto-phase1 FAILED: %s\n' "$label" >&2
+  if [ -s "$log" ]; then
+    sed 's/^/ | /' "$log" >&2
+  fi
+  exit 1
+}
+
+require_no_calls() {
+  local dis="$1"
+  local fn="$2"
+  local label="$3"
+  local body ncalls
+  body="$(sed -n "/<$fn>:/,/^$/p" "$dis")"
+  if [ -z "$body" ]; then
+    fail_log "$label missing <$fn> in disassembly" "$dis"
+  fi
+  ncalls=$(printf '%s\n' "$body" | grep -cE "$call_mnemonics" || true)
+  if [ "$ncalls" -ne 0 ]; then
+    printf 'lto-phase1 FAILED: %s left %s call(s) in <%s>\n' \
+      "$label" "$ncalls" "$fn" >&2
+    printf '%s\n' "$body" | sed 's/^/ | /' >&2
+    exit 1
+  fi
+}
+
+require_has_calls() {
+  local dis="$1"
+  local fn="$2"
+  local label="$3"
+  local body ncalls
+  body="$(sed -n "/<$fn>:/,/^$/p" "$dis")"
+  if [ -z "$body" ]; then
+    fail_log "$label missing <$fn> in disassembly" "$dis"
+  fi
+  ncalls=$(printf '%s\n' "$body" | grep -cE "$call_mnemonics" || true)
+  if [ "$ncalls" -eq 0 ]; then
+    printf 'lto-phase1 FAILED: %s inlined an interposable weak callee\n' \
+      "$label" >&2
+    printf '%s\n' "$body" | sed 's/^/ | /' >&2
+    exit 1
+  fi
+}
+
+require_symbol_bind() {
+  local symtab="$1"
+  local sym="$2"
+  local bind="$3"
+  local label="$4"
+  if ! awk -v sym="$sym" -v bind="$bind" \
+    '$2 == bind && $NF == sym { found = 1 } END { exit found ? 0 : 1 }' \
+    "$symtab"; then
+    fail_log "$label expected symbol '$sym' with bind '$bind'" "$symtab"
+  fi
+}
+
+cat > "$WORK/callee.c" <<'EOF'
+int add7(int x) { return x + 7; }
+EOF
+cat > "$WORK/caller.c" <<'EOF'
+int add7(int);
+int call_add7(int x) { return add7(x) * 2; }
+EOF
+cat > "$WORK/entry.c" <<'EOF'
+int add7(int);
+int _start(void) { return add7(5); }
+EOF
+
+if ! "$KIT" build-obj -target aarch64-linux-gnu -O1 -ffreestanding -flto \
+  "$WORK/callee.c" "$WORK/caller.c" -o "$WORK/build_obj.o" \
+  > "$WORK/build_obj.out" 2>&1; then
+  fail_log "build-obj -flto two-TU compile failed" "$WORK/build_obj.out"
+fi
+"$KIT" objdump -d "$WORK/build_obj.o" > "$WORK/build_obj.dis" 2>&1
+require_no_calls "$WORK/build_obj.dis" call_add7 "build-obj -flto"
+printf 'lto-phase1 build-obj fused cross-TU call\n'
+
+if ! "$KIT" cc -target aarch64-linux-gnu -O1 -ffreestanding -nostdlib \
+  -e _start -flto "$WORK/callee.c" "$WORK/entry.c" \
+  -o "$WORK/cc_lto.elf" > "$WORK/cc_lto.out" 2>&1; then
+  fail_log "cc -flto link failed" "$WORK/cc_lto.out"
+fi
+"$KIT" objdump -d "$WORK/cc_lto.elf" > "$WORK/cc_lto.dis" 2>&1
+require_no_calls "$WORK/cc_lto.dis" _start "cc -flto"
+printf 'lto-phase1 cc fused cross-TU call\n'
+
+if ! "$KIT" build-exe -target aarch64-linux-gnu -O1 -ffreestanding \
+  -nostdlib -e _start -flto "$WORK/callee.c" "$WORK/entry.c" \
+  -o "$WORK/build_lto.elf" > "$WORK/build_lto.out" 2>&1; then
+  fail_log "build-exe -flto link failed" "$WORK/build_lto.out"
+fi
+"$KIT" objdump -d "$WORK/build_lto.elf" > "$WORK/build_lto.dis" 2>&1
+require_no_calls "$WORK/build_lto.dis" _start "build-exe -flto"
+printf 'lto-phase1 build-exe fused cross-TU call\n'
+
+cat > "$WORK/internal_helper.c" <<'EOF'
+int arch_helper(int x) { return x + 9; }
+EOF
+cat > "$WORK/internal_entry.c" <<'EOF'
+int arch_helper(int);
+int _start(void) { return arch_helper(2); }
+EOF
+for target in aarch64-linux-gnu x86_64-linux-gnu riscv64-linux-gnu; do
+  out="$WORK/internal_$target.elf"
+  if ! "$KIT" cc -target "$target" -O1 -ffreestanding -nostdlib \
+    -e _start -flto "$WORK/internal_helper.c" "$WORK/internal_entry.c" \
+    -o "$out" > "$WORK/internal_$target.out" 2>&1; then
+    fail_log "cc -flto internalization failed for $target" \
+      "$WORK/internal_$target.out"
+  fi
+  "$KIT" objdump -d "$out" > "$WORK/internal_$target.dis" 2>&1
+  "$KIT" objdump -t "$out" > "$WORK/internal_$target.sym" 2>&1
+  require_no_calls "$WORK/internal_$target.dis" _start \
+    "cc -flto internalized helper for $target"
+  require_symbol_bind "$WORK/internal_$target.sym" arch_helper l \
+    "cc -flto internal helper for $target"
+  require_symbol_bind "$WORK/internal_$target.sym" _start g \
+    "cc -flto entry preservation for $target"
+done
+printf 'lto-phase1 internalized non-preserved helpers on aa64/x64/rv64\n'
+
+cat > "$WORK/dead_ref.c" <<'EOF'
+int missing_external(void);
+int dead_global(void) { return missing_external(); }
+int _start(void) { return 0; }
+EOF
+if ! "$KIT" cc -target aarch64-linux-gnu -O1 -ffreestanding -nostdlib \
+  -e _start -flto "$WORK/dead_ref.c" -o "$WORK/dead_ref.elf" \
+  > "$WORK/dead_ref.out" 2>&1; then
+  fail_log "dead LTO semantic ref leaked into final link" "$WORK/dead_ref.out"
+fi
+"$KIT" objdump -t "$WORK/dead_ref.elf" > "$WORK/dead_ref.sym" 2>&1
+if grep -q "missing_external" "$WORK/dead_ref.sym"; then
+  fail_log "dead LTO semantic ref remained in symbol table" \
+    "$WORK/dead_ref.sym"
+fi
+printf 'lto-phase1 dead semantic refs do not leak after prepass\n'
+
+if ! "$KIT" build-lib -target aarch64-linux-gnu -O1 -ffreestanding -flto \
+  "$WORK/callee.c" "$WORK/caller.c" -o "$WORK/liblto.a" \
+  > "$WORK/build_lib.out" 2>&1; then
+  fail_log "build-lib -flto failed" "$WORK/build_lib.out"
+fi
+if ! "$KIT" ar t "$WORK/liblto.a" > "$WORK/ar.out" 2>&1; then
+  fail_log "ar t on LTO archive failed" "$WORK/ar.out"
+fi
+members=$(grep -cE '\.o$' "$WORK/ar.out" || true)
+if [ "$members" -ne 1 ]; then
+  fail_log "build-lib -flto should archive one merged semantic object" \
+    "$WORK/ar.out"
+fi
+printf 'lto-phase1 build-lib archived one merged LTO object\n'
+
+cat > "$WORK/weak_only.c" <<'EOF'
+__attribute__((weak)) int weak_add1(int x) { return x + 1; }
+EOF
+cat > "$WORK/weak_caller.c" <<'EOF'
+int weak_add1(int);
+int weak_call(int x) { return weak_add1(x); }
+EOF
+if ! "$KIT" build-obj -target aarch64-linux-gnu -O1 -ffreestanding -flto \
+  "$WORK/weak_only.c" "$WORK/weak_caller.c" -o "$WORK/weak_lto.o" \
+  > "$WORK/weak_lto.out" 2>&1; then
+  fail_log "weak LTO compile failed" "$WORK/weak_lto.out"
+fi
+"$KIT" objdump -d "$WORK/weak_lto.o" > "$WORK/weak_lto.dis" 2>&1
+require_has_calls "$WORK/weak_lto.dis" weak_call "weak LTO guard"
+printf 'lto-phase1 weak callee stayed out-of-line\n'
+
+cat > "$WORK/weak_entry.c" <<'EOF'
+int weak_add1(int);
+int _start(void) { return weak_add1(1); }
+EOF
+if ! "$KIT" cc -target aarch64-linux-gnu -O1 -ffreestanding -nostdlib \
+  -e _start -flto "$WORK/weak_only.c" "$WORK/weak_entry.c" \
+  -o "$WORK/weak_exe.elf" > "$WORK/weak_exe.out" 2>&1; then
+  fail_log "weak executable LTO link failed" "$WORK/weak_exe.out"
+fi
+"$KIT" objdump -d "$WORK/weak_exe.elf" > "$WORK/weak_exe.dis" 2>&1
+"$KIT" objdump -t "$WORK/weak_exe.elf" > "$WORK/weak_exe.sym" 2>&1
+require_has_calls "$WORK/weak_exe.dis" _start "weak executable LTO guard"
+require_symbol_bind "$WORK/weak_exe.sym" weak_add1 w \
+  "weak executable LTO preservation"
+printf 'lto-phase1 weak executable callee stayed weak and out-of-line\n'
+
+cat > "$WORK/weak_impl.c" <<'EOF'
+__attribute__((weak)) int pick(void) { return 1; }
+EOF
+cat > "$WORK/strong_impl.c" <<'EOF'
+int pick(void) { return 2; }
+EOF
+cat > "$WORK/pick_main.c" <<'EOF'
+int pick(void);
+int main(void) { return pick() == 2 ? 0 : 1; }
+EOF
+if ! "$KIT" cc -O1 -flto "$WORK/weak_impl.c" "$WORK/strong_impl.c" \
+  "$WORK/pick_main.c" -o "$WORK/weakstrong" \
+  > "$WORK/weakstrong.out" 2>&1; then
+  fail_log "strong-over-weak function LTO link failed" "$WORK/weakstrong.out"
+fi
+if ! "$WORK/weakstrong"; then
+  fail_log "strong-over-weak function LTO executable returned nonzero" \
+    "$WORK/weakstrong.out"
+fi
+printf 'lto-phase1 strong function overrides weak definition\n'
+
+cat > "$WORK/weak_data.c" <<'EOF'
+__attribute__((weak)) int lto_data = 1;
+EOF
+cat > "$WORK/strong_data.c" <<'EOF'
+int lto_data = 2;
+EOF
+cat > "$WORK/data_main.c" <<'EOF'
+extern int lto_data;
+int main(void) { return lto_data == 2 ? 0 : 1; }
+EOF
+if ! "$KIT" cc -O1 -flto "$WORK/weak_data.c" "$WORK/strong_data.c" \
+  "$WORK/data_main.c" -o "$WORK/weakdata" \
+  > "$WORK/weakdata.out" 2>&1; then
+  fail_log "strong-over-weak data LTO link failed" "$WORK/weakdata.out"
+fi
+if ! "$WORK/weakdata"; then
+  fail_log "strong-over-weak data LTO executable returned nonzero" \
+    "$WORK/weakdata.out"
+fi
+printf 'lto-phase1 strong data overrides weak definition\n'
+
+cat > "$WORK/odr1.c" <<'EOF'
+int odr_dup(void) { return 1; }
+EOF
+cat > "$WORK/odr2.c" <<'EOF'
+int odr_dup(void) { return 2; }
+EOF
+if bash -c '"$@"; rc=$?; exit $rc' _ "$KIT" build-obj -target aarch64-linux-gnu -O1 \
+  -ffreestanding -flto "$WORK/odr1.c" "$WORK/odr2.c" \
+  -o "$WORK/odr.o" > "$WORK/odr.out" 2>&1; then
+  fail_log "duplicate strong definitions unexpectedly compiled" "$WORK/odr.out"
+fi
+if ! grep -q "duplicate definition of symbol" "$WORK/odr.out"; then
+  fail_log "duplicate strong definitions lacked ODR diagnostic" "$WORK/odr.out"
+fi
+printf 'lto-phase1 duplicate strong definitions are rejected\n'
+
+# Cross-TU tentative definitions. kit is -fno-common: the C frontend lowers a
+# file-scope `int g;` to a strong .bss definition, so two of them in different
+# TUs conflict exactly as the non-LTO linker resolves them. These checks pin the
+# Phase 1 resolution-fidelity invariant — -flto staging merges symbols the same
+# way the linker does — and guard same-TU tentative coalescing inside a -flto
+# build (the legal `int g; int g;` case must not be misread as a redefinition).
+cat > "$WORK/tent_a.c" <<'EOF'
+int tentative_dup;
+EOF
+cat > "$WORK/tent_b.c" <<'EOF'
+int tentative_dup;
+EOF
+cat > "$WORK/tent_entry.c" <<'EOF'
+extern int tentative_dup;
+int _start(void) { return tentative_dup; }
+EOF
+
+ # -flto staging must reject the duplicate tentative defs with the ODR diagnostic.
+if bash -c '"$@"; rc=$?; exit $rc' _ "$KIT" build-obj -target aarch64-linux-gnu -O1 \
+  -ffreestanding -flto "$WORK/tent_a.c" "$WORK/tent_b.c" \
+  -o "$WORK/tent_dup.o" > "$WORK/tent_dup_lto.out" 2>&1; then
+  fail_log "cross-TU duplicate tentative defs compiled under -flto" \
+    "$WORK/tent_dup_lto.out"
+fi
+if ! grep -q "duplicate definition of" "$WORK/tent_dup_lto.out"; then
+  fail_log "cross-TU duplicate tentative defs lacked ODR diagnostic under -flto" \
+    "$WORK/tent_dup_lto.out"
+fi
+
+ # The non-LTO link of the same inputs must reject them too: LTO == linker.
+"$KIT" cc -target aarch64-linux-gnu -O0 -ffreestanding -c "$WORK/tent_a.c" \
+  -o "$WORK/tent_a.o" > "$WORK/tent_a.out" 2>&1 ||
+  fail_log "tentative TU a failed to compile" "$WORK/tent_a.out"
+"$KIT" cc -target aarch64-linux-gnu -O0 -ffreestanding -c "$WORK/tent_b.c" \
+  -o "$WORK/tent_b.o" > "$WORK/tent_b.out" 2>&1 ||
+  fail_log "tentative TU b failed to compile" "$WORK/tent_b.out"
+"$KIT" cc -target aarch64-linux-gnu -O0 -ffreestanding -c "$WORK/tent_entry.c" \
+  -o "$WORK/tent_entry.o" > "$WORK/tent_entry.out" 2>&1 ||
+  fail_log "tentative entry TU failed to compile" "$WORK/tent_entry.out"
+if bash -c '"$@"; rc=$?; exit $rc' _ "$KIT" cc -target aarch64-linux-gnu \
+  -ffreestanding -nostdlib -e _start "$WORK/tent_a.o" "$WORK/tent_b.o" \
+  "$WORK/tent_entry.o" -o "$WORK/tent_dup.elf" \
+  > "$WORK/tent_dup_link.out" 2>&1; then
+  fail_log "cross-TU duplicate tentative defs linked without -flto" \
+    "$WORK/tent_dup_link.out"
+fi
+if ! grep -q "duplicate definition of" "$WORK/tent_dup_link.out"; then
+  fail_log "non-LTO link lacked duplicate-definition diagnostic" \
+    "$WORK/tent_dup_link.out"
+fi
+printf 'lto-phase1 cross-TU duplicate tentative defs rejected by -flto and linker\n'
+
+ # Positive: one definition coalesced from same-TU tentatives, shared across TUs
+ # through extern refs, links and observes shared storage at run time under -flto.
+cat > "$WORK/tent_def.c" <<'EOF'
+int shared_tentative;
+int shared_tentative; /* same-TU tentative coalescing inside an -flto build */
+EOF
+cat > "$WORK/tent_use.c" <<'EOF'
+extern int shared_tentative;
+int read_shared(void) { return shared_tentative; }
+EOF
+cat > "$WORK/tent_shared_main.c" <<'EOF'
+extern int shared_tentative;
+int read_shared(void);
+int main(void) { shared_tentative = 5; return read_shared() == 5 ? 0 : 1; }
+EOF
+if ! "$KIT" cc -O1 -flto "$WORK/tent_def.c" "$WORK/tent_use.c" \
+  "$WORK/tent_shared_main.c" -o "$WORK/tent_shared" \
+  > "$WORK/tent_shared.out" 2>&1; then
+  fail_log "single tentative def shared across TUs failed under -flto" \
+    "$WORK/tent_shared.out"
+fi
+if ! "$WORK/tent_shared"; then
+  fail_log "cross-TU tentative shared storage incorrect under -flto" \
+    "$WORK/tent_shared.out"
+fi
+printf 'lto-phase1 single tentative def shared across TUs under -flto\n'
+
+cat > "$WORK/c_frontend.c" <<'EOF'
+int c_frontend_value(void) { return 5; }
+EOF
+cat > "$WORK/toy_frontend.toy" <<'EOF'
+fn toy_frontend_value(): i64 {
+  return 3;
+}
+EOF
+cat > "$WORK/wasm_frontend.wat" <<'EOF'
+(module
+  (func (export "wasm_frontend_value") (result i32)
+    i32.const 4))
+EOF
+if ! "$KIT" build-obj -O1 -flto "$WORK/c_frontend.c" \
+  "$WORK/toy_frontend.toy" "$WORK/wasm_frontend.wat" \
+  -o "$WORK/semantic_frontends.o" \
+  > "$WORK/semantic_frontends.out" 2>&1; then
+  fail_log "C/Toy/Wasm semantic LTO staging failed" \
+    "$WORK/semantic_frontends.out"
+fi
+printf 'lto-phase1 C/Toy/Wasm semantic frontends staged together\n'
+
+if ! "$KIT" build-obj -O1 "$WORK/c_frontend.c" -o "$WORK/c_onetu.o" \
+  > "$WORK/c_onetu.out" 2>&1; then
+  fail_log "C one-TU compile_cg wrapper failed" "$WORK/c_onetu.out"
+fi
+if ! "$KIT" build-obj -O1 "$WORK/toy_frontend.toy" -o "$WORK/toy_onetu.o" \
+  > "$WORK/toy_onetu.out" 2>&1; then
+  fail_log "Toy one-TU compile_cg wrapper failed" "$WORK/toy_onetu.out"
+fi
+if ! "$KIT" build-obj -O1 "$WORK/wasm_frontend.wat" -o "$WORK/wasm_onetu.o" \
+  > "$WORK/wasm_onetu.out" 2>&1; then
+  fail_log "Wasm one-TU compile_cg wrapper failed" "$WORK/wasm_onetu.out"
+fi
+printf 'lto-phase1 C/Toy/Wasm one-TU builds use compile_cg wrapper\n'
+
+cat > "$WORK/use_asm.c" <<'EOF'
+int asm_add1(int);
+int call_asm(int x) { return asm_add1(x); }
+EOF
+cat > "$WORK/asm_add1.s" <<'EOF'
+.text
+.globl asm_add1
+asm_add1:
+  add x0, x0, #1
+  ret
+EOF
+if ! "$KIT" build-obj -target aarch64-linux-gnu -O1 -ffreestanding -flto \
+  "$WORK/use_asm.c" "$WORK/asm_add1.s" -o "$WORK/opaque_asm.o" \
+  > "$WORK/opaque_asm.out" 2>&1; then
+  fail_log "opaque asm participation under -flto failed" "$WORK/opaque_asm.out"
+fi
+"$KIT" objdump -t "$WORK/opaque_asm.o" > "$WORK/opaque_asm.sym" 2>&1
+if ! grep -q "asm_add1" "$WORK/opaque_asm.sym"; then
+  fail_log "opaque asm symbol missing from relocatable output" \
+    "$WORK/opaque_asm.sym"
+fi
+printf 'lto-phase1 asm participated as opaque object\n'
+
+cat > "$WORK/opaque_keep.c" <<'EOF'
+int keep_me(void) { return 17; }
+int _start(void) { return 0; }
+EOF
+cat > "$WORK/opaque_ref.s" <<'EOF'
+.text
+.globl opaque_ref
+opaque_ref:
+  bl keep_me
+  ret
+EOF
+if ! "$KIT" cc -target aarch64-linux-gnu -O1 -ffreestanding -nostdlib \
+  -e _start -flto "$WORK/opaque_keep.c" "$WORK/opaque_ref.s" \
+  -o "$WORK/opaque_ref.elf" > "$WORK/opaque_ref.out" 2>&1; then
+  fail_log "opaque object reference did not preserve LTO symbol" \
+    "$WORK/opaque_ref.out"
+fi
+"$KIT" objdump -t "$WORK/opaque_ref.elf" > "$WORK/opaque_ref.sym" 2>&1
+require_symbol_bind "$WORK/opaque_ref.sym" keep_me g \
+  "opaque object reference preservation"
+printf 'lto-phase1 opaque object reference preserved LTO definition\n'
+
+cat > "$WORK/archive_lto.c" <<'EOF'
+int archive_func(void);
+int lto_target(void) { return 41; }
+int _start(void) { return archive_func(); }
+EOF
+cat > "$WORK/archive_member.c" <<'EOF'
+int lto_target(void);
+int archive_func(void) { return lto_target() + 1; }
+EOF
+if ! "$KIT" cc -target aarch64-linux-gnu -O0 -ffreestanding -c \
+  "$WORK/archive_member.c" -o "$WORK/archive_member.o" \
+  > "$WORK/archive_member.out" 2>&1; then
+  fail_log "archive member compile failed" "$WORK/archive_member.out"
+fi
+if ! "$KIT" ar rcs "$WORK/libsemantic.a" "$WORK/archive_member.o" \
+  > "$WORK/archive_ar.out" 2>&1; then
+  fail_log "archive creation failed" "$WORK/archive_ar.out"
+fi
+if ! "$KIT" cc -target aarch64-linux-gnu -O1 -ffreestanding -nostdlib \
+  -e _start -flto "$WORK/archive_lto.c" "$WORK/libsemantic.a" \
+  -o "$WORK/archive_lto.elf" > "$WORK/archive_lto.out" 2>&1; then
+  fail_log "archive selected by semantic LTO ref failed to link back" \
+    "$WORK/archive_lto.out"
+fi
+"$KIT" objdump -t "$WORK/archive_lto.elf" > "$WORK/archive_lto.sym" 2>&1
+require_symbol_bind "$WORK/archive_lto.sym" lto_target g \
+  "archive semantic-ref preservation"
+require_symbol_bind "$WORK/archive_lto.sym" archive_func g \
+  "archive semantic-ref selection"
+printf 'lto-phase1 archive semantic ref preserved callback target\n'
+
+if "$KIT" cc -shared -flto -nostdlib "$WORK/callee.c" \
+  -o "$WORK/libbad.so" > "$WORK/shared_lto.out" 2>&1; then
+  fail_log "cc -shared -flto unexpectedly succeeded" "$WORK/shared_lto.out"
+fi
+if ! grep -q "shared-library LTO output is not exercised" \
+  "$WORK/shared_lto.out"; then
+  fail_log "cc -shared -flto rejection missing shared-LTO diagnostic" \
+    "$WORK/shared_lto.out"
+fi
+printf 'lto-phase1 shared-library LTO remains disabled\n'
+
+printf 'lto-phase1: ok\n'
diff --git a/test/opt/whole_program_inline.sh b/test/opt/whole_program_inline.sh
@@ -0,0 +1,138 @@
+#!/usr/bin/env bash
+# Whole-program cross-function inlining (LTO Phase 0).
+#
+# At -O1 the optimizer defers emission to a module-wide finalize sweep that GCs
+# dead symbols and runs the whole-program inliner (opt_inline) over the live
+# FuncSet. This is one path for every arch — no arch special-casing — so the
+# structural checks run identically for aarch64, x86_64, and riscv64.
+#
+# Green: a small static callee fuses into its caller (no call instruction left
+# in the caller, and the `opt.inline.inlined` metric fires). Behavioral: the
+# fused program still returns the right value via the host JIT.
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+KIT="${KIT:-$ROOT/build/kit}"
+WORK="$ROOT/build/test/opt/whole_program_inline"
+mkdir -p "$WORK"
+
+# A caller (`compute`) that reaches two small static helpers. Both should fuse
+# in, leaving `compute` call-free.
+read -r -d '' SRC <<'EOF' || true
+static int add1(int x) { return x + 1; }
+static int twice(int x) { return add1(add1(x)); }
+int compute(int x) { return twice(x) + add1(x); }
+EOF
+
+# Per-arch call mnemonics (aarch64 bl/blr, x86_64 call/callq, riscv jal/jalr).
+# After fusion `compute` must contain none of them.
+call_mnemonics='\b(bl|blr|callq?|jalr?)\b'
+
+check_arch() {
+  local triple=$1
+  local tag=$2
+  local src="$WORK/$tag.c"
+  local obj="$WORK/$tag.o"
+  printf '%s\n' "$SRC" > "$src"
+  "$KIT" cc -target "$triple" -O1 -ffreestanding -std=c11 -c "$src" \
+    -o "$obj" > "$WORK/$tag.cc.out" 2>&1
+  "$KIT" objdump -d "$obj" > "$WORK/$tag.dis" 2>&1
+  # Isolate the `compute` function body and count residual calls.
+  local ncalls
+  ncalls=$(sed -n '/<compute>:/,/^$/p' "$WORK/$tag.dis" \
+    | grep -cE "$call_mnemonics" || true)
+  if [ "$ncalls" -ne 0 ]; then
+    printf 'whole-program-inline FAILED: %s left %s call(s) in compute (callee not fused)\n' \
+      "$tag" "$ncalls" >&2
+    sed -n '/<compute>:/,/^$/p' "$WORK/$tag.dis" | sed 's/^/ | /' >&2
+    exit 1
+  fi
+  printf 'whole-program-inline %-8s fused (compute call-free)\n' "$tag"
+}
+
+check_arch aarch64-linux-gnu aa64
+check_arch x86_64-linux-gnu x64
+check_arch riscv64-linux-gnu rv64
+
+# Interposition guard: a weak callee is link-time replaceable, so inlining its
+# body would defeat a strong override. The caller must keep the call. Check on
+# every arch (one unified inliner path).
+read -r -d '' WEAK_SRC <<'EOF' || true
+__attribute__((weak)) int wcallee(int x) { return x + 1; }
+int wcaller(int x) { return wcallee(x); }
+EOF
+check_weak_not_inlined() {
+  local triple=$1
+  local tag=$2
+  local src="$WORK/weak_$tag.c"
+  local obj="$WORK/weak_$tag.o"
+  printf '%s\n' "$WEAK_SRC" > "$src"
+  "$KIT" cc -target "$triple" -O1 -ffreestanding -std=c11 -c "$src" \
+    -o "$obj" > "$WORK/weak_$tag.cc.out" 2>&1
+  "$KIT" objdump -d "$obj" > "$WORK/weak_$tag.dis" 2>&1
+  local ncalls
+  ncalls=$(sed -n '/<wcaller>:/,/^$/p' "$WORK/weak_$tag.dis" \
+    | grep -cE "$call_mnemonics" || true)
+  if [ "$ncalls" -eq 0 ]; then
+    printf 'whole-program-inline FAILED: %s inlined a WEAK callee (interposition unsound)\n' \
+      "$tag" >&2
+    sed -n '/<wcaller>:/,/^$/p' "$WORK/weak_$tag.dis" | sed 's/^/ | /' >&2
+    exit 1
+  fi
+  printf 'whole-program-inline %-8s weak callee kept out-of-line\n' "$tag"
+}
+check_weak_not_inlined aarch64-linux-gnu aa64
+check_weak_not_inlined x86_64-linux-gnu x64
+check_weak_not_inlined riscv64-linux-gnu rv64
+
+# Metric: the whole-program inliner must actually fire at -O1 (not just the
+# streaming tiny-inliner, which emits opt.tiny_inline.inlined instead).
+read -r -d '' RUN_SRC <<'EOF' || true
+static int add1(int x) { return x + 1; }
+int main(void) { return add1(41) == 42 ? 0 : 1; }
+EOF
+printf '%s\n' "$RUN_SRC" > "$WORK/run.c"
+if ! "$KIT" run --time -O1 "$WORK/run.c" >"$WORK/run.out" 2>"$WORK/run.err"; then
+  printf 'whole-program-inline FAILED: `kit run -O1` did not exit 0\n' >&2
+  sed 's/^/ | /' "$WORK/run.err" >&2
+  exit 1
+fi
+if ! grep -q 'opt.inline.inlined' "$WORK/run.err"; then
+  printf 'whole-program-inline FAILED: opt.inline.inlined metric absent at -O1\n' >&2
+  sed -n '1,80p' "$WORK/run.err" >&2
+  exit 1
+fi
+printf 'whole-program-inline run fired opt.inline.inlined, exit 0\n'
+
+# The kit-native build verbs (build-exe/build-lib/build-obj) compile through the
+# same kit_cg path as cc, so whole-program optimization participates without any
+# build-verb-specific wiring. Guard that: build-obj at -O1 must fuse, and
+# build-exe must produce a correct, fused executable.
+printf '%s\n' "$SRC" > "$WORK/verb.c"
+"$KIT" build-obj -O1 -ffreestanding "$WORK/verb.c" -o "$WORK/verb.o" \
+  > "$WORK/verb.cc.out" 2>&1
+"$KIT" objdump -d "$WORK/verb.o" > "$WORK/verb.dis" 2>&1
+vcalls=$(sed -n '/<compute>:/,/^$/p' "$WORK/verb.dis" \
+  | grep -cE "$call_mnemonics" || true)
+if [ "$vcalls" -ne 0 ]; then
+  printf 'whole-program-inline FAILED: build-obj -O1 did not fuse (LTO bypassed)\n' >&2
+  sed -n '/<compute>:/,/^$/p' "$WORK/verb.dis" | sed 's/^/ | /' >&2
+  exit 1
+fi
+printf 'whole-program-inline build-obj fused (verb participates in LTO)\n'
+
+read -r -d '' VERB_EXE_SRC <<'EOF' || true
+static int add1(int x) { return x + 1; }
+static int twice(int x) { return add1(add1(x)); }
+int main(void) { return (twice(20) + add1(1)) == 24 ? 0 : 1; }
+EOF
+printf '%s\n' "$VERB_EXE_SRC" > "$WORK/verb_exe.c"
+if ! "$KIT" build-exe -O1 "$WORK/verb_exe.c" -o "$WORK/verb_exe" \
+  > "$WORK/verb_exe.cc.out" 2>&1 || ! "$WORK/verb_exe"; then
+  printf 'whole-program-inline FAILED: build-exe -O1 produced wrong result\n' >&2
+  sed 's/^/ | /' "$WORK/verb_exe.cc.out" >&2
+  exit 1
+fi
+printf 'whole-program-inline build-exe correct + fused\n'
+
+printf 'whole-program-inline: ok\n'
diff --git a/test/parse/run.sh b/test/parse/run.sh
@@ -467,7 +467,14 @@ kit_lane_C() {
   # leading-underscore (_global_x), so the link can't resolve — a name-mangling
   # mismatch the C backend can't bridge without parsing the opaque asm. ELF has
   # no such prefix, so the emitted C links and runs there.
-  if [ "$HOST_OBJ_FMT" = "macho" ] && [[ "$KIT_BASE" == asm_02_file_scope ]]; then
+  # asm_04_register_callee_saved hits the same wall: its file-scope asm defines
+  # write_saved_reg/read_saved_reg as bare names, but the C calls reference the
+  # underscored _write_saved_reg/_read_saved_reg on Mach-O, so the link fails.
+  # Verified otherwise-correct: underscoring the asm labels links clean under
+  # -Wall -Wextra -Werror and returns the expected 77.
+  if [ "$HOST_OBJ_FMT" = "macho" ] && \
+     { [[ "$KIT_BASE" == asm_02_file_scope ]] || \
+       [[ "$KIT_BASE" == asm_04_register_callee_saved ]]; }; then
     kit_skip "$KIT_NAME/C" "Mach-O underscores C symbol refs; verbatim file-scope asm defines the bare name"
     return
   fi