Content-addressed build coordinator

Status: design, not yet built. This is a forward-looking roadmap for a new subsystem layered on the content store in ../DISTRIBUTE.md (<kit/cas.h>). It specifies the configuration model, the on-disk storage state machine, the coordinator's in-memory state and caching algorithm, the command protocol recipes use to talk back, and how builds share work over a network (remote CAS + signed traces).

Naming, to avoid confusion: this is not ../BUILD.md, which documents how kit itself is built (the Makefile, KIT_*_ENABLED component gating, and the 3-stage bootstrap). This doc designs a user-facing, Bazel/Nix-style incremental build engine that compiles arbitrary projects by running recipes and caching their outputs by content. The two are unrelated beyond sharing the word "build."

What it is

A build coordinator is a long-lived process that turns a build request (target T under configuration C) into a materialized output tree on disk, doing the least work necessary. It does this by running recipes — opaque executables, typically shell scripts — that produce output directories, and by caching every recipe result keyed by the exact set of inputs that produced it. A second request whose inputs are unchanged returns the cached output without re-running anything; a request whose direct dependencies' outputs are unchanged (even if some deep source changed) can still skip its own recipe.

The whole design rests on two properties:

Content identity. Every input and every output has a stable id: the BLAKE2b-256 of its bytes (kit_blob_info, kit_cas_*). Sameness is hash equality, never timestamps.
Determinism. A recipe is assumed to be a pure function of its declared inputs: identical declared inputs ⇒ byte-identical output tree. Caching is only as correct as this assumption (see Determinism).

It reuses the existing CAS wholesale: source-file bytes, output directories, config snapshots, and trace bodies are all CAS blobs and trees. The coordinator adds two things the content store does not have — a mutable per-target trace index, and the resolution algorithm that drives it.

Concepts and vocabulary

Term	Meaning
Target `T`	A named unit of work, e.g. `//app:server`. A build definition maps each target name to one recipe + local argv.
Recipe	The executable the coordinator runs to produce a target's output. Identified by `recipe-id` = `BLAKE2b(recipe file bytes)`.
Configuration	A key→value map of tunables (target triple, opt level, feature flags). Two scopes — propagated and local — see below. Identified by `config-id` = `BLAKE2b(canonical map)`.
Output tree	The directory a recipe produces, captured as a CAS tree (`tree-id`). The unit a build request returns.
Base inputs	The leaves of the dependency graph, dynamically requested by recipes: config values, source files, and globs.
Target dep	A dependency of one target on the output tree of another. The graph's interior edges, created by `need`.
Shallow trace	A record of one build: its direct base inputs + its direct target deps (by output `tree-id`) → its output `tree-id`.
Deep trace	A record of one build: the transitive closure of source/glob leaves + the root `config-id` → its output `tree-id`.
Tree cache	On-disk materialized output directories, keyed by `tree-id`, ready to hand back as a filesystem path.

The three base-input kinds are exactly the dynamically-requested leaves: a recipe asks the coordinator for a config value, the bytes of a source file, or the expansion of a glob, and each request is logged as a dependency so a later rebuild knows precisely what to re-check.

Configuration model

Configuration has two scopes, distinguished by visibility:

Propagated configuration flows down the whole subtree. The top-level request supplies it; a recipe reads a value with config-get <key> (logged as a config dep); and a need may overlay it for the sub-build it triggers. A target's effective propagated config = its inherited config with overlays applied along the need path from the root. It is canonicalized to byte-stable text and content-addressed: config-id = BLAKE2b(canonical map), stored as a CAS blob so any recorded config-id resolves back to the actual map.
Local configuration is the target's argv, fixed by the build definition. It is visible only to that one target's recipe (delivered as the process's actual argv) and is not propagated to deps. Treating argv as config keeps the model uniform: everything a recipe sees is configuration, some inherited, some local. Local config is recorded per trace but is not part of config-id. It too is serialized to a canonical CAS blob, argv-id.

Serialized for replay, not just hashed. Both the effective config map and the argv vector are persisted by value as canonical, content-addressed CAS blobs (config-id, argv-id) — never reduced to an opaque one-way hash. A trace references them by id, and they are part of its reachable closure (GC-rooted, and bundled with shared traces). This is what lets the shallow path replay every need request exactly — reconstruct the dep's target name, its full effective config with overlays applied, and its argv — even when driving from an imported shallow trace with no access to the original build definition or workspace. Source files and globs are the opposite case: kept by hash only, because they are verified (did this still hash the same?), never replayed, and their bytes already live in the CAS.

The resolution identity of a build is therefore (target-name, config-id), where config-id is the effective propagated config. Local argv is determined by the target name (via the build definition), so it does not widen the key. Two requests (T, C1) and (T, C2) — the same target under different propagated config — are genuinely distinct builds; both may be in flight at once.

recipe-id covers only the recipe file bytes. Two other things that can change a target's behavior — which recipe it maps to, and its argv — live in the build definition, which the coordinator reads to resolve T. That read is itself a tracked source dependency of T, so editing a target's stanza (repointing its recipe, or changing argv) invalidates its traces through the ordinary source-leaf mechanism.

Why two trace kinds

The deep trace answers "is this exact configuration unchanged and has any source moved?" in one pass over a flat leaf set — no graph walk. When it matches, the output is determined and we skip straight to materialization. It is the inner dev-loop fast path: edit code, rebuild under the same config.

The shallow trace handles everything the deep path defers — a changed source that might not actually move a dep's output, and any configuration change. It rebuilds the direct deps (cheaply, via their own traces, threading the new config down through need), compares their resulting tree-ids against what it recorded, and if they match, the recipe is still skipped. This is the payoff of recording deps by output identity rather than by input identity: a subtree that ignores a changed config key, or absorbs a comment-only edit, produces the same output tree and short-circuits its parent's recipe.

On-disk storage

The build store sits beside (or contains) a CAS. Everything content-addressed is immutable and self-verifying; exactly one class of object is mutable.

<store>/
  cas/                         # the shared content store, per DISTRIBUTE.md
    blob/<pp>/<blob-id>        #   raw file bytes (sources, config maps, argv vectors, trace bodies)
    tree/<pp>/<tree-id>        #   canonical directory manifests (output trees)
    ...                        #   (chunk/, index/ as in DISTRIBUTE.md)
  build/
    trace/<pp>/<trace-id>      # IMMUTABLE canonical trace bodies (deep + shallow)
    target/<pp>/<target-key>   # MUTABLE per-target trace set (the only mutable state)
    cache/<pp>/<tree-id>/      # tree cache: materialized output directories
    tmp/                       # staging for atomic writes + recipe sandboxes

<pp> is the first two lowercase hex chars of the relevant id (the DISTRIBUTE.md convention). trace-id = BLAKE2b of the canonical trace body. target-key = BLAKE2b("kit build target v1" ‖ target-name) — note it depends on the target name only, never on config or recipe-id, so a target's index entry is a stable handle while its inputs churn. Config maps and trace bodies are ordinary CAS blobs; only target/ records and the cache/ materializations are build-specific.

Trace bodies (immutable, content-addressed)

Both kinds are strict, byte-stable, INI-style text in the family of kit-tree 1 / kit-package 3: a version line, ordered scalar fields, then sorted sections. Unknown keys/sections, duplicate rows, and non-canonical ordering are parse errors (a parse error is treated as absent, never as a match — fail safe). All ids are lowercase hex. Config values, source bytes, and glob expansions are stored by hash, never inline, so a body's size is independent of its inputs' sizes.

Shallow trace — direct inputs and direct deps, for fine-grained rechecking:

kit-build-shallow 1
target //app:server
recipe <recipe-id>
output <output-tree-id>
config <config-id>             ; this target's effective propagated config (full map blob)
argv <argv-id>                 ; this target's local config (serialized vector blob)
[config]                       ; CONSUMED propagated key NAMES, sorted (values live in the map)
opt
[source]                       ; sorted by path
src/main.c <blob-id>
build.kit <blob-id>            ; the build definition stanza is a tracked source
[glob]                         ; sorted by pattern
src/*.c <glob-result-hash>
[dep]                          ; sorted by (dep-name, dep-config-id) — each row replays one `need`
//lib:core <dep-config-id> <dep-output-tree-id>

Deep trace — root config-id plus the transitive source/glob closure, no [dep] and no per-key [config] section:

kit-build-deep 1
target //app:server
recipe <recipe-id>
output <output-tree-id>
root-config <config-id>        ; this target's effective propagated config (full map blob)
argv <argv-id>                 ; local config (serialized vector blob)
[source]                       ; union over the whole closure, sorted, deduped
src/main.c <blob-id>
lib/core.c <blob-id>           ; pulled in transitively from //lib:core
[glob]
src/*.c <glob-result-hash>

config <config-id> and argv <argv-id> reference the serialized effective config map and argv vector — canonical CAS blobs, kept by value and bundled with shared traces — so the target's invocation and every need it issues replay exactly, not merely check (see Configuration model).
[config] lists the names of consumed propagated keys (consumed whether set or unset); the values live in the config-id map this trace references. A key absent from that map was consumed while unset, so later adding it busts the trace — no sentinel value needed. Matching compares, for each consumed key, the request's value against the built map's value (absent == unset).
<blob-id> = the source file's CAS blob id (kit_blob_info); source and glob leaves are kept by hash because they are verified, never replayed.
<glob-result-hash> = BLAKE2b of the canonical (path, blob-id) listing the pattern matched (sorted by path). It changes if any matched file is added, removed, or edited — so one glob row covers the existence and content of its whole match set, and files read through a glob need no separate [source] rows.
[dep] rows carry the dep's config-id (and output tree-id); together with the dep's own serialized config/argv this re-resolves each dep under the exact propagated config it used, overlays included.

Why the deep trace needs only root-config + source/glob leaves. If the request's config-id equals root-config and every source/glob leaf still matches the live workspace, then every recipe in the closure has byte-identical inputs — so every need overlay it computes is identical, every effective config downstream is identical, and by determinism the output is identical. Any input that could perturb a downstream overlay (a source a recipe branches on, a config value) is itself either a recorded source/glob leaf or folded into root-config, so nothing escapes the check. Using the whole root-config (rather than just consumed keys) is a deliberate, conservative simplification: an unconsumed config change busts the deep fast path, but the shallow path recovers — it re-runs only the recipes whose output actually changed, so unrelated subtrees re-verify by hash without re-running.

Target record (mutable, the only mutable object)

build/target/<pp>/<target-key> lists this target's candidate traces, newest-first, capped:

kit-build-record 1
target //app:server
deep <trace-id>
deep <trace-id>
shallow <trace-id>
shallow <trace-id>

Each row points at a trace body in build/trace/. On every successful resolution the coordinator prepends the fresh trace, de-duplicates, and truncates each kind to a small cap (KIT_BUILD_RECORD_CAP, e.g. 8) — a bounded MRU window. Older traces age out and become GC-eligible. Multiple candidates exist because the same target may have been built under several distinct input/config combinations recently (e.g. two config values flipped back and forth).

Storage state machine (atomicity and crash safety)

The store must never hand back a wrong output after a crash. The rules:

Content objects (cas/blob, cas/tree, build/trace, config maps, and a materialized build/cache/<tree-id>/) are written into build/tmp/, fsync'd, then atomically renamed to their final content-keyed path. A half-written object only ever exists under tmp/ and is orphaned, never under its content key. Re-deriving the same content re-creates the identical path — writes are idempotent, so a racing or retried producer is harmless.
The target record is the only ordering-sensitive state, updated by read-modify-write into tmp/ then atomic rename. A reader always sees a complete prior or complete next version, never a torn one. A crash loses at most the most recent record update; losing a record entry is safe — the next build re-derives and rewrites it. A corrupt record (failed kit-build-record 1 parse) is treated as empty.
Recipe sandboxes live in build/tmp/run-<n>/. On success the output subdir is ingested into the CAS (kit_cas_add_tree_from_dir) and the sandbox is removed; on failure or crash it is orphaned and swept later.

Invariant. Anything reachable under a content key (blob/, tree/, trace/) is complete and matches its key. A target record may dangle — point at a trace that GC removed — and readers tolerate that by treating the missing candidate as absent. The store degrades toward "rebuild," never toward "wrong answer."

Concurrency. Two coordinators sharing a store are safe for content writes (idempotent atomic renames). Target-record updates are last-writer-wins; both writers wrote valid records, so the only loss is MRU ordering, costing at most a rebuild. An optional per-target-key advisory lock around the read-modify-write removes even that.

Build coordinator state and caching

In-memory state

All state hangs off one KitBuildCoordinator context (no globals — the project rule). For the life of the process it memoizes every base-input probe and every target resolution, so a diamond in the graph is built once and a source file is hashed once:

Field	Contents	Notes
`source_hashes`	`path → blob-id` (+ stat).	Per-process memo.
`glob_results`	`pattern → (sorted paths, glob-result-hash)`.	Per-process memo.
`config_maps`	`config-id → map`, and overlay results.	Immutable; content-addressed.
`targets`	`(target-name, config-id) → future` of `{output-tree-id, deep-leafset, path}`.	Memo and in-flight dedup.
`cas`	Open `KitCas*` handle over `cas/`.	Plus optional remote (below).
`store`	Paths + target-record reader/writer over `build/`.	—
`jobs`	Semaphore bounding concurrent recipe processes.	The global parallelism limit.

"Cached over the life of the coordinator process" means exactly these memo tables: the coordinator assumes the workspace does not change under it mid-build.

The targets table is keyed by the full (target-name, config-id) and holds a future, not a flag — so concurrent needs of the same (T, config-id) await one resolution rather than racing (see Parallelism). A resolved build yields the output tree-id (to satisfy a need), a materialized path, and its deep leafset (the transitive source/glob closure, to fold into a parent's deep trace).

Cycle detection

The targets memo alone cannot catch a cycle — a self-dependent target would simply await its own unresolved future and deadlock. So each resolution carries an explicit build chain: the ordered list of (target-name, config-id) frames from the root request down to here. A need whose (dep, config-id) already appears on the chain is a dependency cycle; the chain is the error message (//a → //b → //a). The chain is per-path and distinct from the cross-path memo: the same (T, config-id) may appear in many chains (a shared dep) but never twice in one.

Resolution algorithm

A request is resolve(T, cfg, chain) where cfg is the effective propagated config (with cfg.id its config-id). The config-dependence is recorded in and verified against the traces; the deep path checks whole-config-id equality, the shallow path checks consumed keys.

resolve(T, cfg, chain):
  key = (T, cfg.id)
  if key in chain:  error cycle(chain + key)
  if key in targets:  return await targets[key]        # cross-path memo / in-flight dedup
  targets[key] = new future ; chain' = chain + key

  # ---- Phase 1: deep fast path (same config, did sources move?) -------
  for D in deep_traces(T), newest-first:
     if D.root_config == cfg.id
        and D.argv     == argv_id(T)
        and refresh(D.source_glob_leaves) all match live:   # memoized rehash / reglob
        return done(materialize(D.output, T, cfg), D.leafset)

  # ---- Phase 2: shallow path (config and/or sources moved) ------------
  for S in shallow_traces(T), newest-first:
     if S.recipe != recipe_id(T):           continue
     if S.argv   != argv_id(T):             continue       # local config
     M = config_by_id(S.config)                            # the built map (serialized blob)
     if any consumed key in S.config-keys differs between cfg and M:  continue
     if any direct source/glob leaf of S changed:                     continue
     ok = true ; child_leafsets = []
     for (dep, dep_cfg_id, recorded_tree) in S.deps:       # may run in parallel
        r = resolve(dep, config_by_id(dep_cfg_id), chain')  # replay the need: map recovered by id
        if r.output != recorded_tree: ok = false; break
        child_leafsets.push(r.leafset)
     if ok:
        leafset = union(direct source/glob leaves of S, child_leafsets)
        write_deep_trace(T, cfg.id, leafset, S.output)      # refresh the deep trace
        return done(materialize(S.output, T, cfg), leafset)

  # ---- Phase 3: run the recipe ----------------------------------------
  return run_recipe(T, cfg, chain')

with the materialization ladder shared by every cache hit:

materialize(tree-id, T, cfg):
  if tree-id in build/cache/  : return that path             # already on disk
  if tree-id in cas/          : restore into build/cache/, return path
  if remote configured        : fetch tree-id (+ blobs) into cas/, verify, restore
  else                        : return run_recipe(T, cfg)    # bytes gone: rebuild

This is the brief's flow, sharpened by config:

Phase 1 is the deep fast path: identical config-id and an all-clear source/glob refresh ⇒ reuse. refresh hits the per-process memo, so after the first target touches a file or glob the rest of the closure is free.
Phase 2 is the shallow path, taken when Phase 1 finds nothing (config changed, or a source moved). It guards on the target's own consumed config and direct leaves, then re-resolves each recorded dep under the dep's recorded config-id and compares outputs. Re-using the recorded dep_cfg_id is correct: if the parent's consumed config were different, the consumed-key guard above would have failed first and the parent would re-run, recomputing its need overlays.
A Phase-2 hit also writes a fresh deep trace from the now-known closure, so the next request gets the Phase-1 fast path back.
Phase 3 runs the recipe only when no trace holds, or when a hit's bytes were evicted and no remote can supply them.

The non-obvious win is Phase 2 succeeding after an input changed. A changed source busts the deep trace (Phase 1 gives up), but if that source only feeds a dep whose recipe maps it to an unchanged output (tree-id identical), the dep compare passes and T's recipe is skipped. The deep trace is the cheap "nothing moved" check; the shallow trace is the "did the churn actually reach me?" check.

Running a recipe

run_recipe(T, cfg, chain):
  acquire jobs                                              # global parallelism limit
  sandbox = build/tmp/run-<n>/ ; out = sandbox/out/
  spawn recipe (argv = argv(T), recovered by argv_id) with env:
     { KIT_BUILD_SOCK, KIT_BUILD_OUT=out, KIT_BUILD_TARGET=T, workspace root }
  service the recipe's protocol requests, logging each as a dep (next section);
     config-get reads cfg; need recurses resolve(...) under cfg ⊕ overrides
  release jobs
  on nonzero exit: propagate failure, write NO trace
  on success:
     output = kit_cas_add_tree_from_dir(out)
     put serialized cfg map (cfg.id) and argv vector (argv_id(T)) into the CAS
     write_shallow_trace(T, recipe_id, argv_id, cfg.id, consumed key names,
                         direct leaves, dep edges with their config-ids, output)
     write_deep_trace   (T, cfg.id, argv_id, transitive source/glob closure, output)
     prepend both to T's target record (dedup, truncate to cap)
     install out in build/cache/<output>/
     return done({output, path}, leafset)

Writing both trace kinds on every real build is what lets a later request take whichever path fits the change it sees.

Parallelism

The coordinator owns all parallelism: it is the single process that launches and manages every recipe, so the overall job limit is configured in it (the jobs semaphore). Independent needs resolve concurrently up to that limit; the targets futures guarantee a shared (T, config-id) is built once even when many parents request it at the same instant. Scheduling is at recipe granularity — a recipe may parallelize internally, but the coordinator counts it as one job. Cycle detection composes with concurrency through the per-path build chain described above; the global memo handles dedup, the chain handles cycles.

Recipe protocol

Recipes communicate back over a transport the driver defines and the library abstracts behind a host vtable — the same "host supplies all side effects" principle the rest of kit follows. The library defines the command set and the length-prefixed request/response framing; the host supplies connect/read/write.

typedef struct KitBuildTransport {
  /* Accept one recipe connection (the recipe dialed the endpoint named in
   * $KIT_BUILD_SOCK); read/write length-prefixed frames; close. */
  int (*accept)(void* user, KitBuildConn** out);
  int (*read_frame)(void* user, KitBuildConn*, uint8_t* buf, size_t cap, size_t* n);
  int (*write_frame)(void* user, KitBuildConn*, const uint8_t* buf, size_t n);
  void (*close)(void* user, KitBuildConn*);
  void* user;
} KitBuildTransport;

Defaults and portability: a unix-domain socket (Linux, macOS, FreeBSD, and Windows 10+, which all support AF_UNIX) named in $KIT_BUILD_SOCK, or an anonymous pipe pair on inherited fds where a socket is undesirable; Windows may instead use a named pipe. An in-process transport (a direct callback table) serves recipes that are kit library calls rather than subprocesses. The command set is identical across all of them.

Every request both returns a value and logs a dependency. The logged dependency is what lands in the shallow trace and (after expansion) the deep trace. The contract that makes caching correct: an input the recipe reads but does not request is invisible to the cache — so every input must flow through a command.

Command	Returns	Dependency logged
`config-get <key>`	the propagated value (or unset)	config dep: key recorded as consumed; its value is the one in the effective `config-id` map (absent = unset)
`source <path>`	`blob-id` + a path to read	source dep: `(path, blob-id)`
`glob <pattern>`	sorted list of matching paths	glob dep: `(pattern, glob-result-hash)`
`need <target> [k=v…]`	each dep's output `tree-id` + a path	target dep edge: `(dep, dep-config-id, output-tree-id)`

config-get reads the target's effective propagated config and records the value's hash, so changing a consumed key invalidates while changing an unconsumed key does not. If a changed consumed key makes the recipe branch and request a new key next time, the old trace already fails to match on the changed key, so the new key-set is discovered on the rebuild — self-correcting, never needing to predict the input set ahead of time. Local config (argv) is not read here; it arrives as the process's argv.
source hands back a path inside the workspace (or, in a hermetic variant, a path staged into the sandbox) plus the blob-id the read is pinned to.
glob records the whole match set's content via glob-result-hash; the recipe then reads the returned paths without further declaration.
need is the dynamic-dependency primitive. The optional k=v pairs overlay propagated config for that sub-build (local argv never propagates). The coordinator resolves (dep, cfg ⊕ overrides) recursively, returns its output tree-id and a readable path, and records the edge with the dep's config-id so the shallow path can re-resolve identically. need is also where cycles are caught.
Outputs need no command: the recipe writes under $KIT_BUILD_OUT and the coordinator snapshots that directory on exit.

Remote CAS and shared traces

Two independent network capabilities, split along the DISTRIBUTE.md line — content is trustless (hash-verified); claims are trusted (signed).

Remote object fetch (trustless, via a fetch recipe)

The coordinator performs no network I/O itself. When materialize (or a blob restore) needs an object that is absent locally and a remote is configured, it invokes a user-provided fetch recipe — an executable or command template, e.g. curl -fsS -o "$KIT_FETCH_OUT" "https://cache.example/{kind}/{pp}/{id}", rendered with {kind} ∈ blob/tree, {pp}, {id} exactly like DISTRIBUTE.md's external-fetch templates. The fetched bytes land in build/tmp/ and are verified against the requested content id before being installed into the local CAS; a corrupt or malicious mirror fails the hash check and is discarded. So the remote and the fetch recipe are untrusted — the existing self-verifying CAS makes that safe. This adds one rung to the materialization ladder (tree cache → local CAS → remote fetch → run recipe) and lets a clean checkout download an output instead of rebuilding it.

Shared traces (trusted, as signed packages)

A trace is a claim: "these inputs ⇒ this output tree." Unlike content, it is not self-verifying — confirming it means re-running the (assumed-deterministic) recipe, which is exactly the work we are trying to avoid. To safely import someone else's trace we must trust the claimant. Traces are therefore shared as signed trace bundles, reusing the DISTRIBUTE.md package + minisign + trust machinery wholesale:

A bundle is a signed manifest (kit-build-traces 1, signed exactly like a kit-package manifest) listing (target-name, kind, trace-id, output-tree-id) claims, carried in a .kpkg (portable tar.gz or native kpkg) alongside the trace bodies and the serialized config-map and argv blobs they reference (required — without them an imported shallow trace cannot replay its needs), plus, optionally, the referenced output trees/blobs.
Trust is the DISTRIBUTE.md model unchanged: verify the minisign signature against the trusted-keys file (-p KEY, the anchor file, or --tofu); the signed trusted comment binds the signature to the manifest hash.
Import: verify signature → anchor key → install the trace bodies into build/trace/ and prepend them to the relevant target records. Output bytes come from the bundle or from the remote CAS — either way hash-verified on use.

The result is a precise security split: a local build can now deep/shallow-hit on a remote builder's trace — obtaining the output without running the recipe — while trusting only the signed claim; the output bytes themselves remain trustless (verified by tree-id/blob-id). Trusting a trace signer is exactly like trusting a package signer in DISTRIBUTE.md: it is trust in their build outputs, gated by the trusted-keys allowlist, and auditable by re-running under verify mode.

Determinism and hermeticity

Cache correctness is exactly the assumption output = f(declared inputs), deterministically.

Declared-inputs-are-complete. Reading an undeclared file is a hermeticity violation. The protocol makes declaration the only intended way to get an input, but enforcement is deferred (see below): for now it is a contract recipes must honor.
Determinism. Timestamps, RNG, and unpinned network fetches break the model — a second build's captured tree differs from the recorded output, and the cache would serve a stale-but-believed-current result. Mitigations: declare such inputs (a "now" config value, a pinned URL+hash), or mark a target no-cache so it always runs Phase 3.
Verify mode. A diagnostic mode re-runs a recipe whose trace says "unchanged" and compares the fresh tree-id to the recorded one; a mismatch flags a nondeterministic or under-declared recipe — the recommended audit before trusting (or signing and sharing) traces.

Worked example

//app:server depends on //lib:core; the recipe globs src/*.c, reads src/main.c, and consumes config opt. The request is under config C0 (so config-id = c0).

Cold build. No record. Phase 3 runs both recipes. //lib:core builds → tree L0. //app:server logs config opt, glob src/*.c, source src/main.c, need //lib:core (under c0 → (//lib:core, c0)); yields tree A0. Both trace kinds written for both targets. build/cache/A0/ materialized, path returned.
No-op rebuild (same C0). Phase 1: //app:server's deep trace has root-config c0 (matches) and refreshes opt via config-id, src/*.c, src/main.c, and the folded lib/core.c — all memoized, all match. A0 is in the tree cache. Returns instantly, nothing re-run, no graph walk.
Comment-only edit to lib/core.c. lib/core.c's blob-id moved → the deep trace fails Phase 1 (a folded source leaf changed). Phase 2: direct leaves (opt, src/*.c, src/main.c) unchanged, so probe the one dep — resolve(//lib:core, c0). Its recipe re-runs (its source moved) but emits a byte-identical L0 (comment stripped). L0 == L0 ⇒ //app:server's recipe is skipped, A0 restored, a refreshed deep trace written so step 2's fast path returns next time.
Flip opt (now C1, config-id = c1). Phase 1 is skipped outright: //app:server's deep trace has root-config c0 ≠ c1. Phase 2 under c1: opt is a consumed key whose value differs between c1 and the trace's built map → no shallow trace holds → Phase 3 re-runs //app:server. Its need //lib:core carries no opt override, so the dep resolves as (//lib:core, c1); but //lib:core never consumes opt, so its shallow trace under c1 matches by consumed-keys (empty) and direct leaves, and L0 is reused without re-running. Only the one recipe that actually depends on opt re-ran.

Decided / deferred / open

Decided.

Transport is abstracted behind a host vtable the driver defines; defaults are a unix socket or anonymous pipe, with a Windows named-pipe fallback and an in-process variant. Supported hosts: Linux, macOS, Windows, FreeBSD.
Parallelism lives in the coordinator (the jobs semaphore); futures dedup shared targets; the per-path chain detects cycles.
Remote CAS delegates fetching to a user-provided fetch recipe and verifies by content id (trustless).
Trace sharing ships traces as signed .kpkg bundles over the DISTRIBUTE.md trust model (trusted), because traces are claims, not self-verifying content.
Public surface is a public header <kit/build.h> plus a kit build driver command, gated by KIT_BUILD_ENABLED (the kit/cas.h + kit cas precedent). A thin CLI parses flags and supplies the transport/host vtables; the coordinator and trace model live in the library.

Deferred.

Hermeticity enforcement. No sandbox-deny or FS-trace at first; the protocol defines the contract, enforcement comes later (deny undeclared reads, or strace/FUSE detection that warns).
GC. Deferred. When built: mark-and-sweep rooted at live target records → reachable trace bodies → referenced trees/blobs/config-maps/argv vectors; sweep unreachable trace/, cas/, and cache/, with LRU size bounds on the (re-derivable) tree cache and CAS.

Open — compact deep traces. The flat deep leafset re-lists every transitive source in every ancestor: O(Σ closure sizes) bytes and O(closure) work to refresh. A structural representation fixes both and is the natural form of the "efficient set union" the closure already needs:

Store each target's deep leafset as a content-addressed object deep-set = { direct source/glob leaves, [child-deep-set-id…] }, with deep-set-id = BLAKE2b(canonical(direct ‖ child-ids)). The fully-flattened set is the recursive union, but it is never materialized — an ancestor stores only its own direct slice plus 32-byte pointers to its children's deep-sets. A leaf shared by many ancestors is stored once (a Merkle DAG, structurally sharing like the CAS itself). This is the union operation: building a parent's deep-set is "concatenate sorted direct leaves with child-set pointers," not "merge N flat arrays."
Faster checking falls out of the same structure. Refresh memoizes deep-set-id → valid? for the whole build: a subtree checked once is reused by every parent that points at it, and an unchanged subtree is recognized by pointer/hash equality at its root without descending to leaves. Change detection becomes O(changed frontier) instead of O(closure): recompute current deep-set-ids bottom-up (memoized, so O(distinct subtrees) = O(nodes)), and where a recomputed id equals the recorded one, the whole subtree is provably unchanged and skipped.
The cost is that refresh walks the DAG rather than streaming one flat list, and the child deep-set objects must be present (another store dependency, GC-rooted like traces). The flat form in this doc is the semantic definition; the structural form is the recommended physical representation once flat traces bloat — adopt it together with the in-memory union it mirrors.

	kit kit
	git clone https://git.ryansepassi.com/git/kit.git
	Log \| Files \| Refs \| README