commit bde9b5848be4f7d9b2587abaaf7c4ad02417143c
parent 8d3bb285e71aed150d99db5019df5876c1f245a0
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Thu, 4 Jun 2026 06:49:20 -0700
doc/plan: design for build-exe/build-lib/build-obj (replacing compile)
Kit-native build verbs that compile a mixed-language source set in memory
and link/archive in one shot, replacing the no-link 'compile' tool. Captures
the --group flag-scoping grammar, the global-vs-scopable taxonomy, -X<lang>
frontend-flag routing, hybrid naming (kit --emit/-o + Zig -static/-dynamic),
the link_engine/archive_engine extraction plan, migration, and the resolved
design decisions.
Diffstat:
2 files changed, 293 insertions(+), 0 deletions(-)
diff --git a/doc/plan/BUILD_COMMANDS.md b/doc/plan/BUILD_COMMANDS.md
@@ -0,0 +1,292 @@
+# kit build commands
+
+Forward-looking roadmap for the kit-native build verbs — `build-exe`,
+`build-lib`, `build-obj` — that **replace** the `compile` tool. They compile a
+mixed-language set of sources entirely in memory and produce a final artifact
+(executable / static or shared library / object) in one invocation, with full
+control over both the per-source compile and the whole-build link. Once this
+ships, the design will be documented in [../DRIVER.md](../DRIVER.md).
+
+Distinct from [BUILD.md](BUILD.md) (the CAS-backed incremental build
+*coordinator*) and from [../BUILD.md](../BUILD.md) (kit's own Makefile build).
+This is about the driver's single-shot build commands.
+
+## Motivation
+
+Today the kit-native compile path splits awkwardly:
+
+- **`compile`** (driver/cmd/compile.c) resolves exactly one frontend (by `-x`
+ or suffix), forwards frontend-specific flags (e.g. wasm `-mfeature=`), and
+ emits objects / `.s` / portable C / IR — but **never links** and rejects
+ `.o`/`.a` inputs. Linking means writing intermediate objects to disk and
+ invoking `ld`/`cc`/`run` separately.
+- **`cc`** (driver/cmd/cc.c) already compiles a *polyglot* source set
+ (`.c .s .S .toy .wat .wasm`, language resolved per-file) to in-memory
+ `KitObjBuilder*`, links them with byte-loaded `.o`/`.a`/`.so` via a single
+ `KitLinkSession`, and emits an executable or shared library — **with no
+ intermediate files**. But `cc` is deliberately a GCC-compatible C driver: its
+ flag surface is a GCC subset and it does **not** expose frontend-specific
+ flags.
+
+So the in-memory, no-temp-files, polyglot compile+link pipeline already exists
+and is proven inside `cc`'s link path (driver/cmd/cc.c `cc_run_link_exe`). What
+is missing is a **kit-native front door** to that pipeline: one that is polyglot,
+forwards per-language frontend flags, exposes the full link-flag surface, and
+lets the caller scope compile flags to individual sources or groups of sources —
+without pretending to be `gcc`.
+
+The public API already supports every output we need:
+
+| Artifact | API |
+|----------|-----|
+| executable | `KitLinkSession` + `KIT_LINK_OUTPUT_EXE` → `kit_link_session_emit` |
+| shared library | `KitLinkSession` + `KIT_LINK_OUTPUT_SHARED` |
+| combined relocatable object | `KitLinkSession` + `KIT_LINK_OUTPUT_RELOCATABLE` |
+| single object | `kit_obj_builder_emit` (include/kit/object.h) |
+| static archive | `kit_obj_builder_emit` each member, then `kit_ar_write` (include/kit/archive.h) |
+
+This work is therefore **almost entirely a driver-layer reorganization**, not new
+core machinery: lift `cc`'s link path into a shared engine, add a kit-native
+argument grammar on top, and retire `compile`.
+
+## The command set
+
+A Zig-inspired trio. Every command is polyglot, compiles in memory, and writes
+no intermediate files.
+
+| Command | Produces | Backend |
+|---------|----------|---------|
+| `kit build-exe` | executable | link session, `OUTPUT_EXE` |
+| `kit build-lib` | static `.a` (default) or shared library (`-dynamic`) | `kit_ar_write` / `OUTPUT_SHARED` |
+| `kit build-obj` | a single object; or `--emit=asm\|c\|ir`; or `-fsyntax-only` check | one `KitObjBuilder`, or `OUTPUT_RELOCATABLE` for multi-source |
+
+`build-obj` is the full replacement for `compile`: it keeps `--emit=obj|asm|c|ir`,
+`-fsyntax-only`, the single-frontend-or-polyglot source handling, and frontend
+flag forwarding — and it gains the ability to **combine several sources into one
+relocatable object** (`ld -r` style) via `KIT_LINK_OUTPUT_RELOCATABLE`. The
+standalone `kit check` (cc.c `driver_check`) and `cc`'s own `--emit=`/`-S` are
+unaffected and remain available.
+
+The three share ~90% of their code; like `cc`/`check` they live in one file
+(`driver/cmd/build.c`) with three thin entry points (`driver_build_exe`,
+`driver_build_lib`, `driver_build_obj`) over a shared parse+run parameterized by
+output kind.
+
+## Command-line grammar
+
+### Two flag tiers
+
+1. **Global / per-output flags** apply to the whole build and may appear
+ anywhere outside a group. These are everything that must agree across the
+ link, plus the optimization and debug knobs (per the decision below):
+
+ - `-target TRIPLE` / `--target=`, and target-feature flags
+ - `-O0|-O1|-O2`, `-g`
+ - `-fPIC|-fPIE`, `-fvisibility=hidden|default`
+ - `-ffunction-sections`, `-fdata-sections`
+ - all **link** flags: `-l`, `-L`, `-e`, `-T`, `-static`/`-dynamic`,
+ `-pie`/`-no-pie`, `--build-id=`, `-Wl,…`, soname/rpath, subsystem, …
+ - all **output** flags: `-o`, `--emit=`, `-S`, `-fsyntax-only`
+ - `-Werror`, `-fmax-errors=N`
+
+2. **Scopable flags** may appear globally (baseline for every source) *and*
+ inside a `--group` (override for that group's sources only). The scopable set
+ is intentionally small — only what is genuinely per-translation-unit:
+
+ - preprocessor: `-I`, `-isystem`, `-D`, `-U`
+ - language selection: `-x LANG`
+ - frontend-specific: `-X<lang> FLAG` (see below)
+
+Placing a global flag inside a `--group` is a **usage error** with a pointed
+diagnostic (e.g. `-O is a per-output flag; place it before any --group`). This
+keeps the rule a one-liner: *outside a group = whole build; inside a group =
+those sources.*
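+
+The two-tier rule can be sketched as a small classifier plus a position check.
+This is a minimal illustration, not the planned implementation: `FlagTier`,
+`classify_flag`, and `check_flag_position` are hypothetical names, and matching
+by prefix is a simplification that only covers the flags listed above.
+
+```c
+#include <string.h>
+
+/* Hypothetical tiers mirroring the global-vs-scopable split. */
+typedef enum { FLAG_GLOBAL, FLAG_SCOPABLE } FlagTier;
+
+static int has_prefix(const char *s, const char *p) {
+    return strncmp(s, p, strlen(p)) == 0;
+}
+
+/* Scopable: -I, -isystem, -D, -U, -x, -X<lang>. Everything else is
+ * treated as global / per-output (assumption: prefix matching suffices). */
+FlagTier classify_flag(const char *arg) {
+    static const char *const scopable[] = {
+        "-isystem", "-I", "-D", "-U", "-x", "-X"
+    };
+    for (size_t i = 0; i < sizeof scopable / sizeof scopable[0]; i++)
+        if (has_prefix(arg, scopable[i])) return FLAG_SCOPABLE;
+    return FLAG_GLOBAL;
+}
+
+/* Returns 0 if the flag may appear here, -1 for a usage error
+ * (a global flag placed inside a --group). */
+int check_flag_position(const char *arg, int inside_group) {
+    if (inside_group && classify_flag(arg) == FLAG_GLOBAL) return -1;
+    return 0;
+}
+```
+
+With this shape, `check_flag_position("-O2", 1)` reports an error while
+`check_flag_position("-DFAST", 1)` is accepted, which is the behavior the
+diagnostic above describes.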
+
+### Groups
+
+```
+--group [scopable flags…] -- source [source…]
+```
+
+Each `--group` bundles scopable overrides with the sources listed up to the next
+`--group` or the end of arguments. The `--` separates the group's flags from its
+sources. Sources listed **outside** any group ("bare" sources) receive only the
+global flags.
+
+Inheritance and precedence within a group, relative to the global baseline:
+
+- **Include dirs** (`-I`/`-isystem`): group dirs are prepended to the global
+ search path (searched first), global dirs still apply.
+- **Defines** (`-D`/`-U`): additive; a group `-D` of an already-defined name
+ overrides it for that group.
+- **Language** (`-x`): a group `-x` overrides suffix resolution for that group.
+- **Frontend flags** (`-X<lang>`): a group's `-X<lang>` flags apply only to that
+ group's sources of `<lang>`; global `-X<lang>` flags apply to all sources of `<lang>`.
+
+Link order is the left-to-right order of source/object/archive appearance; a
+group contributes its sources at the group's position. Bare inputs (`.o`/`.a`/
+`.so`) keep their command-line position for the linker.
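+
+A single left-to-right pass is enough to assign each input to a group while
+preserving command-line order. The sketch below is hypothetical
+(`ParsedInputs` and `parse_groups` are illustrative names) and simplifies flag
+handling: outside a group every flag is assumed to be a single token starting
+with `-`, so separate-argument forms such as `-o PATH` are not modeled.
+
+```c
+#include <string.h>
+
+enum { MAX_INPUTS = 64 };
+
+/* Hypothetical parse result. The array index is the link order;
+ * group[i] is 0 for a bare input, or n for the nth --group. */
+typedef struct {
+    const char *path[MAX_INPUTS];
+    int group[MAX_INPUTS];
+    int n_inputs;
+    int n_groups;
+} ParsedInputs;
+
+/* Returns 0 on success, -1 on a malformed group or too many inputs. */
+int parse_groups(int argc, const char *const *argv, ParsedInputs *out) {
+    int cur = 0;       /* current group; 0 means outside any group */
+    int in_flags = 0;  /* 1 between --group and its "--" separator */
+    memset(out, 0, sizeof *out);
+    for (int i = 0; i < argc; i++) {
+        const char *a = argv[i];
+        if (strcmp(a, "--group") == 0) {
+            if (in_flags) return -1;   /* previous group never reached "--" */
+            cur = ++out->n_groups;
+            in_flags = 1;
+        } else if (strcmp(a, "--") == 0) {
+            if (!in_flags) return -1;  /* "--" without a pending --group */
+            in_flags = 0;
+        } else if (in_flags || a[0] == '-') {
+            continue;  /* flag or flag argument; handled elsewhere */
+        } else {
+            if (out->n_inputs == MAX_INPUTS) return -1;
+            out->path[out->n_inputs] = a;
+            out->group[out->n_inputs] = cur;
+            out->n_inputs++;
+        }
+    }
+    return in_flags ? -1 : 0;  /* a trailing --group without "--" is an error */
+}
+```
+
+Because inputs are appended in the order they are seen, the resulting array
+doubles as the link order, with each group contributing its sources at the
+group's position.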
+
+### Per-language frontend flags: `-X<lang>`
+
+`compile` could forward leftover flags unambiguously because it resolved exactly
+one frontend. A polyglot build cannot, so frontend flags are explicitly
+language-scoped:
+
+```
+-X<lang> FLAG # e.g. -Xwasm -mfeature=simd128
+```
+
+`-X<lang>` consumes exactly one following token and routes it to that frontend's
+`kit_frontend_parse_options` (the same entry `compile` uses). Repeatable.
+`<lang>` is `c|asm|toy|wasm`. Works both globally and inside a group. (Current
+kit frontend flags are single-token; a multi-token form is a future extension if
+ever needed.)
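+
+The one-token rule keeps the router small. The following sketch assumes
+hypothetical names (`Lang`, `xflag_lang`, `route_xflag`) and stops at
+identifying the language and the forwarded token; handing that token to
+`kit_frontend_parse_options` is left out because its signature is not shown
+here.
+
+```c
+#include <string.h>
+
+/* Hypothetical language tags for the four <lang> values. */
+typedef enum { LANG_C, LANG_ASM, LANG_TOY, LANG_WASM, LANG_NONE } Lang;
+
+/* Map "-X<lang>" to a language; LANG_NONE if it is not a known -X flag. */
+Lang xflag_lang(const char *arg) {
+    static const struct { const char *name; Lang lang; } table[] = {
+        { "c", LANG_C }, { "asm", LANG_ASM },
+        { "toy", LANG_TOY }, { "wasm", LANG_WASM },
+    };
+    if (strncmp(arg, "-X", 2) != 0) return LANG_NONE;
+    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
+        if (strcmp(arg + 2, table[i].name) == 0) return table[i].lang;
+    return LANG_NONE;
+}
+
+/* Consume "-X<lang> FLAG" starting at argv[*i]. On success, stores the
+ * language and the forwarded token, advances *i past both, and returns 0.
+ * Returns -1 on an unknown language or a missing FLAG token. */
+int route_xflag(int argc, const char *const *argv, int *i,
+                Lang *lang, const char **flag) {
+    Lang l = xflag_lang(argv[*i]);
+    if (l == LANG_NONE || *i + 1 >= argc) return -1;
+    *lang = l;
+    *flag = argv[*i + 1];
+    *i += 2;
+    return 0;
+}
+```
+
+Consuming exactly one token means a forwarded flag that itself starts with `-`
+(such as `-mfeature=simd128`) is never mistaken for a driver flag.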
+
+### Naming conventions (hybrid)
+
+Keep kit/`cc`'s established output vocabulary; adopt Zig's clearer link-kind
+selectors:
+
+- **Output form**: `--emit=obj|asm|c|ir` and `-o PATH` (unchanged from `compile`).
+ `-S` is sugar for `--emit=asm`.
+- **Link kind**: `-static` / `-dynamic` instead of `-shared`. `build-lib`
+ defaults to a static `.a`; `-dynamic` makes a shared library. `build-exe`
+ defaults to the target's normal dynamic linking; `-static` produces a fully
+ static executable. `-shared` is accepted on `build-lib` as a **hidden alias**
+ for `-dynamic` (eases `cc`/`gcc` muscle memory) but is omitted from help, which
+ steers to `-dynamic`.
+
+### Output defaults
+
+- `build-exe`: `-o` optional; default `a.out` (`a.exe` on Windows).
+- `build-lib`: `-o` **required** (no single obvious base name across N sources);
+ shared output respects soname/`--version`.
+- `build-obj`: single source → default `<base>.o` (as `compile` does today, via
+ the equivalent of `compile_default_out`); multiple sources → `-o` required and
+ output is one relocatable object; `--emit=c` still requires `-o`; `--emit=ir`
+ still requires `-O1+`.
+
+`-o -` writes the output to stdout for **all** emit forms (obj/asm/c/ir),
+reusing the existing `driver_stdout_writer` that `cc` uses. This suits
+pipelines (e.g. `kit build-obj -O1 --emit=ir -o - kernel.wat | less`). Binary
+objects written to a tty are unusual but harmless and are not specially rejected.
+
+A single `-target` governs the whole build — mixing targets in one invocation is
+an error (one link, one machine).
+
+## Worked examples
+
+```sh
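+# Polyglot executable: C sources and a Wasm module plus a prebuilt object, in memory.
+# Global flags and bare inputs come before the first --group, because a group
+# runs to the next --group or the end of arguments.
+kit build-exe -target aarch64-linux-gnu -O2 -o app -Llib \
+ main.c util.c prebuilt.o -lfoo \
+ --group -DFAST -Iinc/fast -- hot1.c hot2.c \
+ --group -Xwasm -mfeature=simd128 -- kernel.wat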
+
+# Static library from mixed sources (default kind).
+kit build-lib -O2 -o libmix.a a.c b.toy c.s
+
+# Shared library with a soname.
+kit build-lib -dynamic -fPIC -Wl,-soname=libmix.so.1 -o libmix.so.1 a.c b.c
+
+# Combine three TUs into one relocatable object (ld -r).
+kit build-obj -O1 -o combined.o a.c b.c c.c
+
+# Inspect: emit IR for a Wasm module compiled with a frontend feature flag.
+kit build-obj -O1 --emit=ir -Xwasm -mfeature=simd128 -o k.ir kernel.wat
+
+# Check only, no output.
+kit build-obj -fsyntax-only main.c util.c
+```
+
+## Implementation plan
+
+The work is a factor-out + new-grammar exercise. Proposed file moves:
+
+1. **`driver/lib/link_engine.{h,c}`** — lift the body of cc.c `cc_run_link_exe`
+ into a reusable step. Input: a populated link plan (in-memory `KitObjBuilder*`
+ list, byte-loaded objects/archives/DSOs, an ordered `KitLinkInputOrder`
+ list, and a filled `KitLinkSessionOptions`). It opens the writer, builds the
+ `KitLinkSession`, adds inputs in order, and emits. `cc_run_link_exe` becomes a
+ thin caller, so `cc` and `build-*` share one link path (no behavior change to
+ `cc`). The runtime-archive insertion (`libkit_rt.a`), hosted-libc wiring
+ (driver/lib/hosted), and `-l`/`-L` resolution (driver/lib/lib_resolve) are
+ already factored and are reused as-is.
+
+2. **`driver/lib/archive_engine.{h,c}`** (small) — `driver_archive_emit(objs[],
+ names[], n, writer)`: `kit_obj_builder_emit` each member to bytes, then
+ `kit_ar_write`. Used by `build-lib` (static) and reusable by a future `ar`
+ pipeline.
+
+3. **`driver/cmd/build.c`** — the new grammar and the three entry points. Reuses
+ `driver_compile_run` (driver/lib/compile_engine.h) for the per-source compile,
+ `DriverCflags` (driver/lib/cflags) for `-I/-D/-U`, and
+ `driver_target_features_*`. New here: the `--group … --` parser, the
+ global-vs-scoped validation, the `-X<lang>` router, and per-group cflag/
+ frontend-option contexts (one `DriverCflags` baseline plus per-group deltas).
+
+4. **`driver/main.c`** — register `build-exe`/`build-lib`/`build-obj` in
+ `driver_tools[]`, gated by new `KIT_TOOL_BUILD_*_ENABLED` flags
+ (include/kit/config.h); add them to the default install group. Remove the
+ `compile` entry and its `KIT_TOOL_COMPILE_ENABLED` gate.
+
+5. **Remove `driver/cmd/compile.c`** and its help. Its capabilities are fully
+ covered by `build-obj`.
+
+### Per-group compile state
+
+The compile loop already builds one `KitObjBuilder*` per source through a shared
+`KitCompiler`. The only new state is per-group compile options: each source
+carries (a) a `KitPreprocessOptions` derived from global cflags + the group's
+cflag delta, (b) a resolved `KitLanguage` (group `-x` or suffix), and (c) the
+`lang_extra` from that group's `-X<lang>` flags. This mirrors how `compile`
+already calls `kit_frontend_parse_options` per frontend — now keyed per group.
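+
+As one concrete piece of that per-group state, the effective include path for a
+group can be derived by placing the group's directories ahead of the global
+ones, per the precedence rule in the Groups section. This is a sketch under
+assumptions: `merge_include_dirs` is a hypothetical helper, and the real
+`DriverCflags` / `KitPreprocessOptions` layout is not shown here.
+
+```c
+#include <stddef.h>
+
+/* Build the effective include search path for one group: group dirs
+ * first (searched first), then the global baseline. Writes at most cap
+ * entries into out and returns the number written. */
+size_t merge_include_dirs(const char *const *group_dirs, size_t n_group,
+                          const char *const *global_dirs, size_t n_global,
+                          const char **out, size_t cap) {
+    size_t n = 0;
+    for (size_t i = 0; i < n_group && n < cap; i++)
+        out[n++] = group_dirs[i];
+    for (size_t i = 0; i < n_global && n < cap; i++)
+        out[n++] = global_dirs[i];
+    return n;
+}
+```
+
+Bare sources would simply use the global list unchanged, which corresponds to
+calling the helper with `n_group` set to zero.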
+
+## Migration
+
+- **Tests**: `test/toy/run.sh` and any harness invoking `kit compile` move to
+ `kit build-obj` (same flags: `--emit=`, `-x`, `-fsyntax-only`, frontend flags).
+ The toy corpus, which exercises code generation through the toy frontend, moves to `build-obj` as well.
+- **Config/install**: drop `KIT_TOOL_COMPILE_ENABLED`; add
+ `KIT_TOOL_BUILD_EXE_ENABLED` / `_LIB_` / `_OBJ_`. Update the `install` default
+ tool set and the centralized tool table in main.c.
+- **Docs**: update [../DRIVER.md](../DRIVER.md) and the project `CLAUDE.md` code
+ map (the `compile` bullet → the three `build-*` bullets) when this ships.
+- **`cc` unaffected**: it keeps its GCC-compatible surface; it just calls the
+ shared `link_engine` instead of its inlined copy.
+
+## Future work (post-v1)
+
+- **`@file` response files** and **attach-by-name overrides**
+ (`-Con GLOB : FLAGS`) are deferred. Both layer onto the `--group` grammar
+ later without breaking it (`@file` is pure argv preprocessing; attach is
+ additive). Add `@file` first if build-system drivers hit command-line length
+ limits — it is net-new (no existing expander in the driver) but small and
+ standard (gcc/ld/ar).
+- **JIT/`run` reuse** — `KIT_LINK_OUTPUT_JIT` already backs `kit run`; a future
+ `build-exe --run` could share the same `link_engine` plan.
+
+## Verification notes
+
+- **Relocatable-object combine** — `build-obj` multi-source
+ (`KIT_LINK_OUTPUT_RELOCATABLE`) must match `ld -r` for symbol visibility and
+ common symbols. Cover with tests against the existing relocatable-link path
+ before release; this is the one v1 feature whose semantics need confirming
+ rather than just wiring.
+
+## Decisions (2026-06-04)
+
+| Decision | Choice |
+|----------|--------|
+| Replace `compile`? | Yes — trio `build-exe`/`build-lib`/`build-obj`; `build-obj` subsumes `compile`. |
+| Flag scoping syntax | Explicit `--group [flags] -- sources` blocks. Outside = global/per-output, inside = scoped; a group of one = per-source. |
+| Global (per-output) flags | `-O`, `-g`, `-fPIC/-fPIE`, `-fvisibility` are all global (plus `-target`, all link, all output flags). |
+| Scopable-in-group set | `-I/-isystem/-D/-U`, `-x`, `-X<lang>` frontend flags. |
+| Naming conventions | Hybrid: keep kit `--emit=`/`-o`; adopt Zig `-static`/`-dynamic` for link kind. |
+| Inspection / check home | `build-obj` keeps `--emit=asm\|c\|ir`, `-fsyntax-only`, and gains multi-source → relocatable `.o`. |
+| `-shared` on `build-lib` | Accepted as a hidden alias for `-dynamic` (not shown in help). |
+| `build-obj` multi-source | Relocatable combine ships in v1, gated by `ld -r` parity tests. |
+| `-o -` to stdout | Supported for all emit forms (obj/asm/c/ir) via `driver_stdout_writer`. |
+| v1 input ergonomics | `--group` grammar only; `@file` and attach-by-name deferred to post-v1. |
diff --git a/doc/plan/README.md b/doc/plan/README.md
@@ -17,3 +17,4 @@ shrinks to whatever remains open.
| [BOOTSTRAP.md](BOOTSTRAP.md) | The 3-stage self-build reproducibility goal and the open `-O1` issues blocking it. | [../BUILD.md](../BUILD.md) |
| [IMAGE_INSPECT.md](IMAGE_INSPECT.md) | Extending object inspection to executables and shared libraries. | [../OBJ.md](../OBJ.md) |
| [BUILD.md](BUILD.md) | A new content-addressed build coordinator (Bazel/Nix-style incremental builds layered on the CAS) — storage state machine, caching algorithm, recipe protocol. Distinct from `../BUILD.md` (kit's own Makefile build). | — (new subsystem) |
+| [BUILD_COMMANDS.md](BUILD_COMMANDS.md) | The kit-native `build-exe`/`build-lib`/`build-obj` verbs that replace `compile`: polyglot, in-memory compile+link with `--group` flag scoping and full link-flag control. Distinct from `BUILD.md` (the CAS coordinator). | [../DRIVER.md](../DRIVER.md) |