Distribution as a library subsystem
Status: implemented. The migration below has landed (one commit): the dist subsystem moved to src/dist/ (+ top-level vendor/), exposed through <kit/cas.h> / <kit/package.h> (src/api/{cas,package}.c), gated by KIT_CAS_ENABLED / KIT_PKG_ENABLED; kit cas / kit pkg are thin CLIs over the public API via a KitCasHost vtable (driver/lib/dist_host.c), with operational errors flowing through ctx->diag. Verified green: test-driver-cas (41) + test-driver-pkg (182) under ASan/UBSan.
Deferred: the v2 deletion + 3-suffix rename (Stage 3, below) were not done; the dead v2 code was carried over unchanged. On inspection the deletion is more surgical than the line-ranges below imply: the v2 extern surface (DistManifest / DistArtifact / DistDependency, dist_manifest_*, dist_kpkg2_*, the v2 DistKpkg* structs + dist_kpkg_* v2 codecs) is safely unreferenced, but the v2 and v3 manifest parsers share the static helpers set_err / trim_lead / trim_trail / copy_field / kind_valid in src/dist/manifest.c (only parse_u64, the v2 finalize, and dist_manifest_path_valid are v2-only). The cleanup must keep the shared helpers (verify the same in src/dist/kpkg.c), then recompile and rerun the cas/pkg suites.
Signed, content-addressed distribution (kit cas / kit pkg) is today the
only major capability that lives entirely inside driver/ — its model,
its vendored crypto/compression, and its create/verify/unpack pipelines all sit
under driver/dist/ and driver/cmd/{cas,pkg}.c. Every other capability is a
libkit subsystem behind include/kit/, with the CLI tool a thin
arg-parser on top. This doc captures the plan to bring distribution into the
same shape: move the implementation into the library, expose it through two
public headers, and reduce cas.c/pkg.c to flag-parsing + host wiring. The
design it realizes is in ../DISTRIBUTE.md; the precedent it
follows is the ar subsystem (src/api/archive.c + include/kit/archive.h,
gated by KIT_AR_ENABLED distinct from KIT_TOOL_AR_ENABLED).
Goal
libkit.a gains a content-store API and a signed-package API, gated by their
own subsystem flags so a minimal embedding pays nothing for them. The kit cas and kit pkg tools become thin CLIs that translate flags into public
calls and supply host vtables — exactly like ar, ld, objdump. An embedder
can create, sign, verify, inspect, and unpack packages, and drive a CAS, without
the driver and without linking host crypto/compression.
Why this is the right shape (not CLI-only logic)
Two layers are stacked under driver/dist/, and they have very different
readiness:
- The dist_* byte model (driver/dist/*.c, ~6.4k lines plus ~6.7k vendored) is already written to the public boundary's contract. It includes only <kit/core.h> plus its own headers, with no driver.h / env.h. It sources no entropy and does no I/O except through KitWriter callbacks and a small host vtable (DistCasHost = KitFileIO + mkdir_p + mark_executable). This obeys the "host supplies all side effects" principle verbatim. Moving it is near-mechanical.
- The pkg_* / cas_* orchestration (driver/cmd/pkg.c 2123 lines, cas.c 491 lines) holds the valuable pipelines (pkg_create_targz, pkg_create_kpkg, pkg_verify_portable, pkg_verify_native, blob reconstruction, trust/key resolution) but is entangled with the CLI. The glue to unwind, by call count in pkg.c:
  - driver_errf ×88: stderr error reporting → structured error returns / the KitContext diag sink (the dist_* parsers already take char* err, size_t errcap).
  - driver_mkdir_p, driver_mark_executable_output, driver_walk_regular_files: host filesystem ops beyond KitFileIO.
  - driver_random_bytes ×2: host CSPRNG, only for keygen.
  - driver_getenv ×2: trust-file path defaulting ($KIT_TRUSTED_KEYS / $HOME); env-var policy that stays in the driver.
  - driver_streq / driver_printf / driver_has_suffix: arg parsing and stdout formatting; stay in the driver.
The layering invariant forces the move: driver/ may include only
<kit/*.h>, and src/api may not include driver/ headers — so a public
boundary is impossible while the code sits in driver/dist/. Relocating to
src/ is a precondition, not a cleanup.
Target tree layout
vendor/                    # top-level: pristine third-party trees
  monocypher/              # (moved from driver/dist/vendor/monocypher)
  lz4/                     # (moved from driver/dist/vendor/lz4)
include/kit/cas.h          # content model: blob/tree hashing + CAS store
include/kit/package.h      # package model: manifest, sign/verify, create/unpack
src/api/cas.c              # public handles <-> internal (archive.c precedent)
src/api/package.c
src/dist/                  # moved dist_* subsystem (private headers)
  dist.{c,h} blob.{c,h} tree.{c,h} cas.{c,h}
  manifest.{c,h} kpkg.{c,h} trust.{c,h}
  blake2b.{c,h} ed25519.{c,h} minisig.{c,h} b64.{c,h}
  deflate.{c,h} lz4.{c,h} tar.{c,h}   # kit-maintained shims/extracts
Vendor split, confirmed by inspection: only monocypher and lz4 are
pristine third-party trees pulled in by #include — they move to a repo-root
vendor/. deflate.c is a kit-maintained extract of miniz (already
modified, not pristine), and b64.c / tar.c are self-contained — these stay
in src/dist/. The shim includes that currently read
"vendor/monocypher/..." (e.g. blake2b.h, ed25519.c) get rewritten to the
new top-level path.
Config gating
Add subsystem flags to include/kit/config.h, separate from the tool flags
(mirroring KIT_AR_ENABLED vs KIT_TOOL_AR_ENABLED):
#define KIT_CAS_ENABLED 1 /* content store: src/dist/{blob,tree,cas} + kit/cas.h */
#define KIT_PKG_ENABLED 1 /* signed packages: adds manifest/kpkg/minisig/crypto + kit/package.h */
KIT_PKG_ENABLED implies KIT_CAS_ENABLED (packages are built over the
content model). KIT_TOOL_CAS_ENABLED / KIT_TOOL_PKG_ENABLED stay and
assert their subsystem flag. Off → the units (and the vendored crypto) drop
entirely, so a minimal embedding carries no Ed25519/BLAKE2b/DEFLATE/LZ4. The
Makefile's LIB_SRCS_* gains a dist regime that pulls src/dist/*.c plus the
enabled vendor/ trees.
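The implication and assertion rules above could be enforced with preprocessor checks along the following lines. This is a sketch, not the contents of include/kit/config.h: only the four flag names come from this document, while the #ifndef defaults (including tool flags defaulting to their subsystem flag) and the error wording are assumptions.

```c
/* Sketch of the flag rules. Defaults and error text are assumptions. */
#ifndef KIT_CAS_ENABLED
#define KIT_CAS_ENABLED 1
#endif
#ifndef KIT_PKG_ENABLED
#define KIT_PKG_ENABLED 1
#endif
#ifndef KIT_TOOL_CAS_ENABLED
#define KIT_TOOL_CAS_ENABLED KIT_CAS_ENABLED
#endif
#ifndef KIT_TOOL_PKG_ENABLED
#define KIT_TOOL_PKG_ENABLED KIT_PKG_ENABLED
#endif

/* Packages are built over the content model. */
#if KIT_PKG_ENABLED && !KIT_CAS_ENABLED
#error "KIT_PKG_ENABLED requires KIT_CAS_ENABLED"
#endif

/* A tool flag asserts its subsystem flag. */
#if KIT_TOOL_CAS_ENABLED && !KIT_CAS_ENABLED
#error "KIT_TOOL_CAS_ENABLED requires KIT_CAS_ENABLED"
#endif
#if KIT_TOOL_PKG_ENABLED && !KIT_PKG_ENABLED
#error "KIT_TOOL_PKG_ENABLED requires KIT_PKG_ENABLED"
#endif
```

Building with -DKIT_CAS_ENABLED=0 and the package flag left on would stop at the first #error, which is the build-time failure the flag matrix is meant to guarantee.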
Public API surface
Two headers, mirroring DISTRIBUTE.md's content-model vs signed-package split.
Model structs are exposed as POD (renamed to the Kit* convention); the
vendored primitives and the kpkg wire codecs stay internal.
include/kit/cas.h — content model (self-verifying, no trust)
- POD types: KitTree, KitTreeEntry, KitBlobInfo.
- Pure hashing (no I/O): kit_blob_id, kit_blob_root, kit_blob_info, kit_tree_id, kit_tree_emit, kit_tree_parse, kit_tree_find.
- A KitCas handle over KitContext + a host vtable: kit_cas_open, kit_cas_put_blob / get_blob, kit_cas_put_tree / get_tree, kit_cas_add_tree_from_dir, kit_cas_verify_tree, kit_cas_materialize.
include/kit/package.h — package model (signed)
- POD model: KitPackageManifest with its outputs / artifacts / deps; a public KitPackageEncoding descriptor (region layout, chunk-index summary, external-fetch templates) so inspect --encoding and external-fetch planning are real library features.
- Keys / trust: KitMinisigKeypair; kit_pkg_keygen (entropy injected via the host vtable, never read by the library); pubkey/seckey emit + parse; kit_pkg_sign / kit_pkg_verify_signature. Trust resolution takes explicit trusted-keys bytes: the library reads no env vars and no $HOME; the driver supplies the resolved path/bytes.
- Pipelines as opts-struct calls: kit_pkg_create (format kpkg | tar.gz, native-shape fat | metadata | thin, compression, source = --root dir or cas + tree, external dir), kit_pkg_verify, kit_pkg_unpack, kit_pkg_inspect.
Kept internal (src/dist/ private headers)
All vendored code; the dist_blake2b / dist_ed25519 / dist_minisig /
dist_b64 / dist_gz / dist_lz4 / dist_tar shims; and the kpkg wire
codecs (header / descriptor / index encode-decode). Rationale: raw crypto and
on-wire binary layout are implementation detail — exposing them invites misuse
and an API-stability burden. The logical model and pipelines are the contract.
New host capabilities
The library reaches the host through KitContext.file_io (read/write,
already present) plus one new vtable for the operations KitFileIO doesn't
cover — every one of which the driver already implements:
typedef struct KitDistHost {
  int (*mkdir_p)(void* user, const char* path);
  int (*mark_executable)(void* user, const char* path);
  int (*walk_regular_files)(void* user, const char* root, /* callback */ ...);
  int (*fill_random)(void* user, uint8_t* out, size_t n); /* keygen only */
  void* user;
} KitDistHost;
DistCasHost already models mkdir_p + mark_executable; this generalizes it
and adds the directory walk (driver_walk_regular_files) and CSPRNG
(driver_random_bytes). Naming/placement TBD during Stage 2 (could fold the
CAS-only subset into KitCas and keep fill_random package-side).
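To make the vtable's shape concrete, here is a minimal sketch of a host that fills it with recording stubs, as a unit-test double might. The struct is a local copy of the proposal with walk_regular_files left out, because its callback signature is not fixed above. RecordingHost, the rec_* functions, and the zero-on-success return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the proposed vtable. walk_regular_files is omitted:
 * its callback signature is still open in the proposal. */
typedef struct KitDistHost {
  int (*mkdir_p)(void* user, const char* path);
  int (*mark_executable)(void* user, const char* path);
  int (*fill_random)(void* user, uint8_t* out, size_t n); /* keygen only */
  void* user;
} KitDistHost;

/* Hypothetical test double: records what the library asked the host to do. */
typedef struct RecordingHost {
  char last_path[256];
  int mkdir_calls;
  int chmod_calls;
  uint8_t next_byte; /* deterministic "entropy" for tests only */
} RecordingHost;

/* Remembers the most recent path; assumed convention: nonzero means failure. */
static int rec_note_path(RecordingHost* h, const char* path) {
  size_t n = strlen(path);
  if (n >= sizeof h->last_path) return -1;
  memcpy(h->last_path, path, n + 1);
  return 0;
}

static int rec_mkdir_p(void* user, const char* path) {
  RecordingHost* h = user;
  h->mkdir_calls++;
  return rec_note_path(h, path);
}

static int rec_mark_executable(void* user, const char* path) {
  RecordingHost* h = user;
  h->chmod_calls++;
  return rec_note_path(h, path);
}

/* NOT a CSPRNG: a counter so tests are reproducible. A real host would call
 * its platform CSPRNG here. */
static int rec_fill_random(void* user, uint8_t* out, size_t n) {
  RecordingHost* h = user;
  for (size_t i = 0; i < n; i++) out[i] = h->next_byte++;
  return 0;
}

/* Resets the recorder and returns a vtable wired to it. */
static KitDistHost recording_host(RecordingHost* h) {
  assert(h != NULL);
  memset(h, 0, sizeof *h);
  KitDistHost host = {rec_mkdir_p, rec_mark_executable, rec_fill_random, h};
  return host;
}
```

Because every side effect goes through the vtable, a test can hand the library this recorder and then inspect the counters and last_path instead of touching the filesystem.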
Error reporting (decided)
Public dist calls return KitStatus and emit human-readable detail
through ctx->diag — not through an err-buffer at the boundary. This is the
established convention, not a new pattern:
- KitContext carries KitDiagSink* diag directly (core.h), so the sink is reachable without a KitCompiler, exactly as the pure-byte subsystems (object, archive, dwarf) get it.
- It mirrors the linker: src/link/link_layout.c emits operational errors such as "linker script: undefined symbol …" through the sink and returns a status. Package/CAS errors are the same shape: operational, no source position.
- KitStatus already carries the right categories: KIT_MALFORMED (bad manifest/tree/signature), KIT_NOT_FOUND (missing blob/tree/key), KIT_IO, KIT_INVALID (unsafe path), KIT_UNSUPPORTED (encrypted seckey / scrypt). The status is the machine-readable category; the diag message is the actionable detail ("blob root mismatch for: <path>").
Mechanics:
- No source location. Emit with a zero KitSrcLoc (file_id 0), as the linker does for non-source errors; the host stderr sink already tolerates it.
- The internal dist_* parsers keep their (char* err, size_t errcap) buffer unchanged. The src/api wrapper catches that string and forwards it to ctx->diag, so the byte model barely changes and its detailed parse messages survive intact.
- A small internal api_diagf(ctx, kind, fmt, …) helper over ctx->diag->emit (no-op when diag is NULL) packs varargs for the api layer.
- The 88 driver_errf sites split by ownership. Operational/pipeline errors (create / verify / unpack / resolve) move into src/api/package.c as diag emits; pure arg-parse errors ("unknown option", "-o BASE is required") stay in pkg.c as driver_errf, because argument parsing is driver policy.
- Embedder control. The sink bumps its errors counter and prints. For the CLI that is exactly today's driver_errf behavior. An embedder doing speculative verification supplies its own (or no) sink and reads only the KitStatus, so a failed verify stays quiet.
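A minimal sketch of the wrapper-side mechanics described above: an internal parser fills an err buffer, and the api layer forwards that string through api_diagf and returns a status. Only the names api_diagf, ctx->diag, KitStatus, KIT_MALFORMED, and the (char* err, size_t errcap) convention come from this document. The KitDiagSink layout, the emit signature, the integer kind, and every demo_* function are stand-ins invented for illustration, and the real core.h definitions may differ.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in types: the real definitions live in core.h and may differ. */
typedef enum { KIT_OK = 0, KIT_MALFORMED } KitStatus;
typedef struct KitDiagSink {
  void (*emit)(struct KitDiagSink* sink, int kind, const char* msg);
  int errors;
  char last[160];
} KitDiagSink;
typedef struct KitContext { KitDiagSink* diag; } KitContext;

/* Formats a message and hands it to the sink; no-op when diag is NULL. */
static void api_diagf(KitContext* ctx, int kind, const char* fmt, ...) {
  char buf[160];
  va_list ap;
  assert(fmt != NULL);
  if (!ctx || !ctx->diag || !ctx->diag->emit) return;
  va_start(ap, fmt);
  vsnprintf(buf, sizeof buf, fmt, ap);
  va_end(ap);
  ctx->diag->emit(ctx->diag, kind, buf);
}

/* Hypothetical internal parser in the dist_* style: err buffer, int result. */
static int demo_parse_manifest(const char* text, char* err, size_t errcap) {
  if (strncmp(text, "kit-package 3", 13) != 0) {
    snprintf(err, errcap, "manifest: bad magic line");
    return -1;
  }
  return 0;
}

/* Public-boundary wrapper: the status is the category, the diag the detail. */
static KitStatus demo_pkg_check(KitContext* ctx, const char* text) {
  char err[128] = "";
  if (demo_parse_manifest(text, err, sizeof err) != 0) {
    api_diagf(ctx, /*kind=*/0, "%s", err);
    return KIT_MALFORMED;
  }
  return KIT_OK;
}

/* Example sink: counts errors and keeps the last message. */
static void demo_sink_emit(KitDiagSink* sink, int kind, const char* msg) {
  (void)kind;
  sink->errors++;
  snprintf(sink->last, sizeof sink->last, "%s", msg);
}
```

A CLI would install a sink that prints to stderr. An embedder doing speculative verification can pass a context whose diag is NULL and branch on the returned status alone.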
Versioning: latest-only (decided)
We support only the current on-disk format and drop all back-compat code. This
is verified to be pure deletion with zero behavioral change: every v2 symbol
(dist_manifest_*, the non-3 dist_kpkg_*, dist_kpkg2_*, and the
DistManifest / DistArtifact / DistDependency / DistKpkgHeader /
DistKpkgDescriptor / DistKpkgIndexRecord structs) is referenced only in
its own definition files — never by pkg.c, cas.c, or any test. The driver
already emits and reads v3 exclusively.
Dropping it pays off twice:
- Deletes the dead v2 structs / functions / constants from manifest.c and kpkg.c.
- Lets the survivors shed the 3 suffix as they go public: DistPackageManifest → KitPackageManifest, DistKpkg3Header → KitPackageHeader, internal dist_kpkg3_* → dist_kpkg_*. The versioned naming only existed to coexist with v2.
Precision: drop the v2 parse paths and C identifiers, but keep the on-disk
wire magic at kpkg3\0 / kit-package 3 / kit-encoding 3. "Latest
version" means v3 on disk; renumbering the wire format would itself break
anything already produced. We stop accepting v2 input; we do not renumber.
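As a sketch of "stop accepting v2 input, do not renumber": a reader can compare the leading bytes against the v3 magic and reject everything else, with no legacy parse path behind it. The six bytes follow from the kpkg3\0 magic quoted above. The function name and the assumption that the magic sits at offset 0 are illustrative and not taken from kpkg.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The v3 container magic quoted in this document: 'k','p','k','g','3',NUL. */
static const unsigned char KPKG_MAGIC[6] = {'k', 'p', 'k', 'g', '3', '\0'};

/* Returns 1 only for input that starts with the v3 magic. Anything else,
 * including older versions, is rejected rather than routed to a legacy parser.
 * Assumes the magic is at offset 0 (an illustration, not the real layout). */
static int demo_kpkg_magic_ok(const unsigned char* buf, size_t len) {
  assert(buf != NULL || len == 0);
  if (len < sizeof KPKG_MAGIC) return 0;
  return memcmp(buf, KPKG_MAGIC, sizeof KPKG_MAGIC) == 0;
}
```

Keeping the constant at its v3 value means packages already produced keep verifying after the v2 code is deleted.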
Staged plan (each stage builds green)
1. Vendor move. driver/dist/vendor/{monocypher,lz4} → top-level vendor/{monocypher,lz4}; rewrite the shim #include paths; update the Makefile. Pure relocation, no API change; lands first to isolate path churn.
2. Lift-and-shift the content layer. Move dist.{c,h}, blob, tree, and cas to src/dist/; add src/api/cas.c + include/kit/cas.h wrapping blob/tree/CAS; add KIT_CAS_ENABLED; repoint driver/cmd/cas.c at the public header + a host vtable. Smallest behavioral slice; proves the boundary end to end.
3. Drop v2 first, then extract the package pipelines. Delete the dead v2 code (see Versioning above) and shed the 3 suffix: a self-contained, zero-behavior-change cleanup that shrinks the surface before it moves. Then move manifest, kpkg, minisig, trust, b64, deflate, lz4, tar + crypto shims to src/dist/; lift the pkg_create_* / pkg_verify_* / unpack / key-resolution logic out of driver/cmd/pkg.c into src/api/package.c behind kit_pkg_*, converting operational driver_errf → api_diagf (see Error reporting above) and driver_* fs/random → KitDistHost. pkg.c shrinks to arg parsing + host wiring + trust-path/env policy. This is the bulk of the work and the main risk.
4. Tests. Keep test/cas/run.sh + test/pkg/run.sh as end-to-end CLI tests; optionally add unit tests that call the new public API directly (now possible; coverage was CLI-only before).
5. Docs. Update ../DISTRIBUTE.md paths (the layering diagram's driver/dist/* rows become src/dist/* + the two public headers), the ../DESIGN.md layering box (the driver/dist/ callout moves), and the CLAUDE.md code map (add vendor/, src/dist/, kit/cas.h, kit/package.h).
Risks / watch items
- Error reporting is decided (see above): KitStatus + ctx->diag, no boundary err-buffers. Remaining care is mechanical: route the ~70 operational driver_errf sites to api_diagf while leaving arg-parse errors in pkg.c, and confirm the CLI's stderr output is unchanged by the existing test/pkg/run.sh corpus.
- Trust policy must not leak into the library. $KIT_TRUSTED_KEYS / $HOME defaulting and --tofu write-back are driver policy; the library takes resolved bytes/paths and returns "would-pin this key id" decisions for the driver to act on. Keep getenv driver-side.
- Binary-format stability. Once the manifest/tree/kpkg model is public, the determinism invariants in DISTRIBUTE.md become a public contract. With v2 gone there is only one format to preserve: keep the wire magic at v3 (do not renumber) and lock the bytes with the existing corpus before refactoring.
- Subsystem flag matrix. Verify KIT_PKG_ENABLED && !KIT_CAS_ENABLED is a build-time error, and that both-off drops the vendored crypto so a no-dist embedding stays clean (assert as the other subsystems do).