Code Distribution
kit ships signed, content-addressed software packages with zero host
library dependencies. Everything the package pipeline needs — BLAKE2b and
Ed25519 (via vendored monocypher), the minisign file format, DEFLATE/gzip,
LZ4, base64, and ustar tar — is vendored under src/dist/, so a stock
kit binary can create, sign, verify, inspect, and unpack packages without
linking OpenSSL, zlib, libsodium, or libarchive. The subsystem is a libkit
component, gated by KIT_CAS_ENABLED / KIT_PKG_ENABLED and exposed
through two public headers — <kit/cas.h> (the content store) and
<kit/package.h> (signed packages). Two thin driver tools surface it on the
command line: kit cas and kit pkg. See DRIVER.md for how
these slot into the multitool.
Why this shape
Three design decisions drive the whole subsystem:
- No host crypto/compression. A self-hosting toolchain that depended on
the host's OpenSSL or zlib would not be freestanding. The primitives are
small, audited, and checked into the tree (vendor/), wrapped behind narrow
dist_* shims so the rest of the code never touches a vendor API directly.
- Content identity before trust. The store layer is self-verifying by hash
and carries no signatures. Trust is layered on top by signing a single small
manifest, never by signing bulk content. This keeps the trusted-byte surface
tiny and lets the same blobs be shared, mirrored, and re-bundled without
re-signing.
- Determinism. Manifests are byte-stable canonical text; ids are hashes of those exact bytes. A package, its trees, and its blobs reproduce identically regardless of which representation carried them.
Layering
kit pkg / kit cas driver/cmd/{pkg,cas}.c (thin CLIs)
|
public API include/kit/{cas,package}.h
src/api/{cas,package}.c
|
package model (signed) src/dist/{manifest,kpkg}.c
|
content model (self-verifying) src/dist/{tree,blob,cas}.c
|
vendored primitives src/dist/{blake2b,ed25519,minisig,
deflate,lz4,b64,tar}.c
vendor/{monocypher,lz4}
The content model knows nothing about signatures or package names. The package
model adds a signed claim over content ids. src/api/{cas,package}.c compose
the internal dist_* model into the public kit_cas_* / kit_pkg_* API;
the CLI tools wire that to the host filesystem and CSPRNG through a
KitCasHost vtable (driver/lib/dist_host.c). The library sources no entropy
and does no I/O beyond the host vtable and writer callbacks; operational errors
return a KitStatus and emit detail through ctx->diag, while argument
parsing and trusted-keys path/pin policy stay in the driver.
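The host-vtable boundary can be pictured with a minimal sketch. The struct and member names below (ExampleHost, write_file, random_bytes, MemHost) are hypothetical stand-ins, not the actual KitCasHost definition. The point is only that library-side code reaches files and entropy exclusively through function pointers the driver supplies, so a test can swap in an in-memory host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host vtable; NOT the real KitCasHost layout. */
typedef struct ExampleHost {
    void *user;
    int (*write_file)(void *user, const char *path, const void *data, size_t len);
    int (*random_bytes)(void *user, uint8_t *out, size_t len);
} ExampleHost;

/* Library-side helper: touches no files or entropy except through the host. */
static int example_write_random_file(const ExampleHost *host, const char *path,
                                     size_t len)
{
    uint8_t buf[32];
    if (len > sizeof buf)
        return -1;
    if (host->random_bytes(host->user, buf, len) != 0)
        return -1;                    /* host could not supply entropy */
    return host->write_file(host->user, path, buf, len);
}

/* In-memory host for tests: counter-based "entropy", one captured file. */
typedef struct MemHost {
    uint8_t counter;
    char last_path[64];
    uint8_t last_data[32];
    size_t last_len;
} MemHost;

static int mem_write(void *user, const char *path, const void *data, size_t len)
{
    MemHost *m = user;
    if (len > sizeof m->last_data || strlen(path) >= sizeof m->last_path)
        return -1;
    memcpy(m->last_path, path, strlen(path) + 1);
    memcpy(m->last_data, data, len);
    m->last_len = len;
    return 0;
}

static int mem_random(void *user, uint8_t *out, size_t len)
{
    MemHost *m = user;
    for (size_t i = 0; i < len; i++)
        out[i] = m->counter++;        /* deterministic: never use for real keys */
    return 0;
}
```

Because the helper never calls fopen() or reads /dev/urandom itself, the same code runs unchanged under the CLI's filesystem host and under a deterministic test host.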
Vendored primitives
| Shim (src/dist/) | Backed by | Used for |
|---|---|---|
| blake2b.c | monocypher | content ids, region/merkle roots, minisign checksums |
| ed25519.c | monocypher | minisign signature scheme |
| minisig.c | blake2b + ed25519 + b64 | minisign key/signature file format |
| b64.c | self-contained | minisign key/signature text encoding |
| deflate.c | miniz (public domain) | gzip container for portable packages |
| lz4.c | lz4 reference | optional per-chunk block compression |
| tar.c | self-contained | ustar container framing |
All content hashes are BLAKE2b-256 (DIST_BLAKE2B_LEN = 32). The shims are
deliberately thin: dist_blake2b, dist_ed25519_*, dist_gz_*,
dist_lz4_*, dist_b64_*, dist_tar_*. The vendored monocypher and lz4
trees are pulled in by #include from the shim so kit carries no fork to
maintain (the lz4 tree carries one small, clearly-marked local edit in
xxhash.c: a dead libc-malloc path stubbed out for the freestanding build).
Beyond packaging, the deflate.c/lz4.c codecs are also surfaced standalone
through <kit/compress.h> and the kit compress tool — gzip and the
interoperable LZ4 frame format (.lz4, via the additionally-vendored
lz4frame.c/lz4hc.c/xxhash.c, behind src/dist/lz4frame.c). That frame
container is distinct from the raw LZ4 block compression (dist_lz4_*) used
per-chunk here.
minisign compatibility is exact: keys and signatures use stock minisign's
on-disk byte layout (base64 of "Ed" || keyid || pk, etc.), and signatures
are over minisign's 64-byte BLAKE2b prehash. A passwordless minisign key or
signature can be used interchangeably with kit pkg. Password-encrypted
secret keys (kdf_alg = "Sc") require scrypt, which is not vendored; they are
detected and rejected with a clear error rather than mis-parsed.
Content model
Blobs
A blob is the raw byte content of one regular file. Identity is path-independent:
blob-id = BLAKE2b-256(raw file bytes)
Blobs also carry a chunk merkle root (dist_blob_root, in
src/dist/blob.c) computed over fixed-size chunks (default 64 KiB,
DIST_BLOB_CHUNK_SIZE_DEFAULT). Leaves are domain-separated hashes of
("kit blob leaf v1" || u64le chunk-index || u64le raw-size || bytes);
interior nodes hash ("kit blob node v1" || left || right), pairing
adjacent hashes left-to-right with an odd final hash promoted unchanged (no
padding, no duplicated leaf); the root wraps the top hash under
"kit blob root v1". An empty blob has the fixed root
BLAKE2b-256("kit blob empty v1").
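The pairing rule is easiest to see by rendering the tree shape instead of hashing it. The sketch below, with illustrative names, prints which leaves combine at each level (adjacent pairs left-to-right, an odd last node promoted unchanged) and counts leaves for a blob size; the domain-separated BLAKE2b calls that would replace the parentheses are deliberately omitted.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Leaves for a blob: empty blobs have none (they use the fixed empty root). */
static uint64_t blob_chunk_count(uint64_t size, uint64_t chunk)
{
    return size == 0 ? 0 : (size + chunk - 1) / chunk;
}

/* Render the pairing shape for 1..16 leaves: "(l,r)" marks an interior node. */
static int merkle_shape(unsigned n, char *out, size_t cap)
{
    char level[16][256], tmp[256];
    if (n == 0 || n > 16)
        return -1;                       /* keep the demo buffers bounded */
    for (unsigned i = 0; i < n; i++)
        snprintf(level[i], sizeof level[i], "%u", i);
    while (n > 1) {
        unsigned m = 0;
        for (unsigned i = 0; i + 1 < n; i += 2) {
            snprintf(tmp, sizeof tmp, "(%s,%s)", level[i], level[i + 1]);
            memcpy(level[m++], tmp, sizeof tmp);
        }
        if (n % 2 == 1)                  /* odd tail: promoted, not duplicated */
            memmove(level[m++], level[n - 1], sizeof level[0]);
        n = m;
    }
    snprintf(out, cap, "%s", level[0]);
    return 0;
}
```

For five leaves this yields (((0,1),(2,3)),4): leaf 4 rides up two levels untouched, which is the "no padding, no duplicated leaf" behaviour. A 200000-byte blob at the 64 KiB default has four leaves.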
The two ids serve different jobs: blob-id is the simple CAS key for whole
file bytes; blob-root authenticates the chunk stream so a streaming verifier
can accept chunks as they arrive without holding the whole file.
Trees
A tree is a deterministic manifest for one output directory
(src/dist/tree.c). It is strict, byte-stable INI-style text beginning with
kit-tree 1, one [file] section per regular file, sorted bytewise by path.
Each entry records path, mode (- regular or x executable; directories are
implicit), size, blob id, and root. Unknown keys/sections, duplicate
paths, and non-canonical ordering are errors. Paths are slash-separated
relative paths; absolute paths, empty components, . and .. components,
backslashes, drive colons, and NUL/newline bytes are all rejected
(dist_tree_path_valid).
tree-id = BLAKE2b-256(canonical tree manifest bytes)
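The path rules and canonical ordering can be sketched as two small checks. This is not dist_tree_path_valid itself: tree_path_ok and tree_paths_canonical are illustrative names, and the sketch conservatively rejects every ':' rather than only drive-letter forms, which is an assumption about what "drive colons" covers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Sketch of the documented tree-path rules (not the real implementation). */
static bool tree_path_ok(const char *p, size_t len)
{
    if (len == 0 || p[0] == '/')
        return false;                         /* empty or absolute */
    size_t start = 0;
    for (size_t i = 0; i <= len; i++) {
        if (i < len) {
            char c = p[i];
            if (c == '\0' || c == '\n' || c == '\\' || c == ':')
                return false;                 /* NUL, newline, backslash, colon */
            if (c != '/')
                continue;
        }
        size_t n = i - start;                 /* component is p[start, i) */
        if (n == 0)
            return false;                     /* "a//b" or trailing '/' */
        if ((n == 1 && p[start] == '.') ||
            (n == 2 && p[start] == '.' && p[start + 1] == '.'))
            return false;                     /* "." or ".." component */
        start = i + 1;
    }
    return true;
}

/* Canonical order: strictly increasing bytewise, so duplicates fail too.
 * strcmp compares bytes as unsigned char, which matches bytewise sorting. */
static bool tree_paths_canonical(const char *const *paths, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (strcmp(paths[i - 1], paths[i]) >= 0)
            return false;
    return true;
}
```

Requiring strictly increasing order folds the duplicate-path check into the ordering check, so a verifier needs only one pass over the [file] sections.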
CAS layout
kit cas maintains a shared on-disk store (src/dist/cas.c):
<cas>/
blob/<prefix>/<blob-id>
tree/<prefix>/<tree-id>
index/<prefix>/<index-root>
chunk/<blob-prefix>/<blob-id>/<chunk-index>
<prefix> is the first two lowercase hex chars of the id. Blob and tree
objects are raw/canonical bytes; index and chunk objects hold native-package
chunk data keyed by the signed index that authenticates them. CAS objects are
never signed — they are self-verifying by content identity. kit cas
supports add-blob, add-tree (from a directory walk or an explicit
path/mode/source map file), inspect-tree, verify-tree, and materialize
(which recreates the directory, verifying every blob and applying modes before
writing).
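Mapping an id to its object path is mechanical. A minimal sketch, with the hypothetical helper names id_hex and cas_object_path, covering only the blob/tree/index kinds (the on-disk encoding of <chunk-index> is not specified above, so chunk paths are left out):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

enum { ID_LEN = 32 };   /* BLAKE2b-256 */

/* Lowercase hex of a 32-byte id; out needs 65 bytes. */
static void id_hex(const uint8_t id[ID_LEN], char out[2 * ID_LEN + 1])
{
    static const char digits[] = "0123456789abcdef";
    for (int i = 0; i < ID_LEN; i++) {
        out[2 * i] = digits[id[i] >> 4];
        out[2 * i + 1] = digits[id[i] & 0x0f];
    }
    out[2 * ID_LEN] = '\0';
}

/* <cas>/<kind>/<prefix>/<id>, where prefix is the first two hex chars.
 * Returns 0 on success, -1 if the path did not fit in out. */
static int cas_object_path(char *out, size_t cap, const char *cas,
                           const char *kind, const uint8_t id[ID_LEN])
{
    char hex[2 * ID_LEN + 1];
    id_hex(id, hex);
    int n = snprintf(out, cap, "%s/%s/%.2s/%s", cas, kind, hex, hex);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

The two-character prefix fans objects out over 256 subdirectories, which keeps any single directory small without needing an index.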
Package model
A package is a signed claim over one or more output trees. The signed object is
the package manifest (src/dist/manifest.c), strict byte-stable text
beginning with kit-package 3: top-level name/version/description plus
[output] sections (each naming a numeric id, a human-readable name, a tree
id, an optional target triple, and an optional default flag), [artifact]
overlays (semantic labels — exe, dso, obj, wasm, lib, data,
source — each keyed to an output id and a path), and optional
[dependency] sections (validated for shape but not resolved: there is no
dependency solver or network transport in this format).
An output's numeric id is the join key that [artifact] rows reference; its
name is a human-facing label (e.g. the binary or directory name) that
identifies the output to a user without exposing the tree hash. id, name,
and tree are mandatory on every output; target and default are optional.
package-id = BLAKE2b-256(package manifest bytes)
Artifact overlays carry no hashes of their own; size/hash/root/mode live in the
referenced tree, and verification rejects any artifact whose path is absent
from its output tree. An earlier kit-package 2 manifest form (flat,
artifact-indexed) is still parseable for backward compatibility; the tool
emits v3.
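The overlay rule (every [artifact] must join to an existing output id and name a path present in that output's tree) can be sketched over already-parsed data. The Output and Artifact structs are hypothetical decoded forms, not kit types; since tree paths are stored in bytewise order, bsearch with strcmp is enough for the membership test.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical decoded forms; field names are illustrative. */
typedef struct {
    unsigned id;
    const char *const *paths;   /* tree paths, sorted bytewise */
    size_t npaths;
} Output;

typedef struct {
    unsigned output_id;
    const char *path;
} Artifact;

static int cmp_path(const void *key, const void *elem)
{
    return strcmp((const char *)key, *(const char *const *)elem);
}

static const Output *find_output(const Output *outs, size_t n, unsigned id)
{
    for (size_t i = 0; i < n; i++)
        if (outs[i].id == id)
            return &outs[i];
    return NULL;
}

/* Every artifact must name an existing output and a path in its tree. */
static bool artifacts_resolve(const Output *outs, size_t nouts,
                              const Artifact *arts, size_t narts)
{
    for (size_t i = 0; i < narts; i++) {
        const Output *o = find_output(outs, nouts, arts[i].output_id);
        if (!o || !bsearch(arts[i].path, o->paths, o->npaths,
                           sizeof o->paths[0], cmp_path))
            return false;
    }
    return true;
}
```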
Signing and trust
Signing uses a detached minisign signature over the manifest bytes
(dist_minisig_sign). The signature's trusted comment — which minisign covers
with a second global signature — carries pkgid=<hex package-id>. Verification
recomputes the manifest hash and rejects the package if the trusted comment's
pkgid does not match. This binds the signature to the exact manifest content,
not merely to a name.
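The binding check reduces to comparing one field of the (already signature-verified) trusted comment with a freshly computed id. The sketch assumes the package id has already been hashed and hex-encoded in lowercase, and that pkgid=<hex> appears as a whitespace-delimited token; the exact token layout of kit's trusted comment and the function name are assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Find "pkgid=<64 hex>" as a whitespace-delimited token in the trusted
 * comment and compare it with the recomputed id. The first pkgid token
 * found at a token boundary decides the result. */
static bool trusted_comment_matches(const char *comment, const char *want_hex)
{
    const char *p = comment;
    while ((p = strstr(p, "pkgid=")) != NULL) {
        bool at_token_start = (p == comment || p[-1] == ' ' || p[-1] == '\t');
        const char *v = p + 6;
        size_t n = strcspn(v, " \t\r\n");
        if (at_token_start)
            return n == 64 && n == strlen(want_hex) && memcmp(v, want_hex, n) == 0;
        p = v;                           /* e.g. "xpkgid=": keep searching */
    }
    return false;
}
```

Because minisign's global signature covers the trusted comment, an attacker cannot swap in a different pkgid without invalidating the signature, and swapping the manifest changes the recomputed id.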
Trust anchors live in a trusted-keys file ($KIT_TRUSTED_KEYS, else
$HOME/.config/kit/trusted_keys; src/dist/trust.c) holding one
keyid pubkey label line per key. A .pub bundled inside a package is never
trusted on its own. The verifier picks a key by the signature's key id:
-p PUBKEY verify against an explicitly supplied public key
(default) look the signer's key id up in the trusted-keys file
--tofu trust-on-first-use: pin the bundled key after its key id
matches the signature, then record it in the trusted-keys file
kit pkg trust {path|list|add|remove} manages the anchor file, and
kit pkg keygen produces a passwordless minisign keypair from the host
CSPRNG.
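The default lookup path can be sketched as a scan of the anchor file for the signature's key id. The line shape follows the keyid pubkey label format above; skipping lines that start with '#' is an assumption of this sketch, and trusted_lookup is an illustrative name rather than a dist_* function.

```c
#include <stdbool.h>
#include <string.h>

/* Scan "keyid pubkey label" lines for a key id and copy the pubkey field.
 * Lines starting with '#' are skipped (an assumption, not documented). */
static bool trusted_lookup(const char *text, const char *keyid,
                           char *pk, size_t cap)
{
    size_t idlen = strlen(keyid);
    const char *line = text;
    while (*line) {
        size_t len = strcspn(line, "\n");
        if (line[0] != '#' && len > idlen &&
            memcmp(line, keyid, idlen) == 0 && line[idlen] == ' ') {
            const char *f = line + idlen + 1;
            size_t flen = strcspn(f, " \n");   /* pubkey ends at space or EOL */
            if (flen == 0 || flen >= cap)
                return false;
            memcpy(pk, f, flen);
            pk[flen] = '\0';
            return true;
        }
        line += len;
        if (*line == '\n')
            line++;
    }
    return false;
}
```

Requiring a space right after the id prevents a short id from matching as a prefix of a longer one; under --tofu, a miss here is the point where the bundled key would be checked against the signature's key id and appended as a new line.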
Representations
One package model, two on-disk shapes.
Portable .tar.gz
A gzip-compressed ustar archive carrying the signed manifest plus a CAS-shaped
object bundle (pkg_create_targz):
hello-0.3.1.tar.gz
kit/package.manifest
kit/package.manifest.minisig
kit/package.pub
kit/cas/tree/<prefix>/<tree-id>
kit/cas/blob/<prefix>/<blob-id>
This is meant for ordinary archive tooling and offline transfer — it is not
seek-optimized. Verification (pkg_verify_portable) decompresses, parses the
tar, anchors and verifies the manifest signature, recomputes the package id,
then for every output tree verifies the tree-id, verifies every referenced
blob by blob-id and blob-root, and checks that artifact overlays resolve.
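Parsing the tar starts with validating each 512-byte header. The sketch below follows the generic POSIX ustar layout (size at offset 124, chksum at 148, magic "ustar\0" at 257, version "00" at 263), not kit's tar.c specifically; the helper names are illustrative, and ustar_set_checksum is included only so the check can be exercised.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Parse an octal numeric field terminated by NUL and/or spaces. */
static bool octal_field(const uint8_t *f, size_t n, uint64_t *out)
{
    uint64_t v = 0;
    size_t i = 0, digits = 0;
    while (i < n && f[i] == ' ')
        i++;                                   /* tolerate leading spaces */
    for (; i < n && f[i] >= '0' && f[i] <= '7'; i++, digits++) {
        if (v > (UINT64_MAX >> 3))
            return false;                      /* would overflow */
        v = (v << 3) | (uint64_t)(f[i] - '0');
    }
    if (digits == 0)
        return false;
    for (; i < n; i++)                         /* only terminators may follow */
        if (f[i] != ' ' && f[i] != '\0')
            return false;
    *out = v;
    return true;
}

/* Validate a POSIX ustar header block and return the member size. */
static bool ustar_header_ok(const uint8_t h[512], uint64_t *size)
{
    if (memcmp(h + 257, "ustar\0", 6) != 0 || memcmp(h + 263, "00", 2) != 0)
        return false;
    uint64_t want, sum = 0;
    if (!octal_field(h + 148, 8, &want))
        return false;
    for (size_t i = 0; i < 512; i++)
        sum += (i >= 148 && i < 156) ? ' ' : h[i];   /* chksum counts as spaces */
    return sum == want && octal_field(h + 124, 12, size);
}

/* Writer side: six octal digits, NUL, space (the common encoding). */
static void ustar_set_checksum(uint8_t h[512])
{
    uint32_t sum = 0;
    memset(h + 148, ' ', 8);
    for (size_t i = 0; i < 512; i++)
        sum += h[i];
    for (int i = 5; i >= 0; i--) {
        h[148 + i] = (uint8_t)('0' + (sum & 7));
        sum >>= 3;
    }
    h[154] = '\0';
    h[155] = ' ';
}
```

The header checksum only catches corruption; it is not a security boundary. Trust still comes from the manifest signature and the blob ids checked afterwards.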
Native .kpkg
A signed pack (src/dist/kpkg.c, pkg_create_kpkg) with a fixed
trust-neutral header (kpkg3\0, 96 bytes) that only locates the early signed
metadata: manifest, manifest signature, encoding descriptor, descriptor
signature, and bundled pubkey. It supports three shapes from one format:
- fat — everything embedded (tree manifests, chunk index, chunk content),
- metadata-rich — trees and chunk index embedded, chunk content external,
- thin — only signed metadata in the file; trees, index, and chunks external.
Non-fat shapes write their external objects into a --external DIR laid out
exactly like the shared CAS.
The native physical layout is itself signed, separately from the logical
package, by an encoding descriptor: strict text beginning
kit-encoding 3, signed by the same trusted key as the manifest. The
manifest signs what the release is; the descriptor signs how the bytes are
arranged — region offsets/sizes, embedded-vs-external decisions, chunk size,
alignment, the chunk index root, and per-region authentication roots. A region
root is domain-separated:
region-root = BLAKE2b-256("kit region v1" || kind || BLAKE2b-256(region bytes))
for kind in tree/index/content. Verification confirms the descriptor's
package-id matches the manifest, then recomputes each region root from the
actual bytes and compares — so the descriptor's claims about layout are as
trusted as the manifest's claims about content. (A legacy kit-encoding 2
descriptor and kpkg2\0 header are still parseable; the tool emits v3.)
The chunk index is a sorted (by blob id, then chunk index) array of
fixed-size little-endian records (DIST_KPKG3_INDEX_RECORD_SIZE = 168). Each
record names a blob's chunk and carries stored-size/raw-size, a
compression tag (none or lz4-block-v1), the stored byte offset within the
embedded content region (zero for external chunks), and three hashes:
stored-hash, raw-hash, and the blob leaf-hash. Empty blobs contribute no
records.
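The "sorted and well-formed" index check can be sketched on decoded records. The IndexRecord struct, its field order, and the numeric compression codes are hypothetical (the 168-byte on-disk layout is not spelled out here), and the two shape rules (no zero raw size, stored size equal to raw size when uncompressed) are assumed examples of well-formedness, inferred from empty blobs contributing no records.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical decoded record; only ordering and shape fields shown. */
typedef struct {
    uint8_t  blob_id[32];
    uint64_t chunk_index;
    uint64_t stored_size, raw_size;
    uint64_t stored_offset;   /* 0 for external chunks */
    uint8_t  compression;     /* 0 = none, 1 = lz4-block-v1 (assumed codes) */
} IndexRecord;

/* Order by blob id bytes, then chunk index. */
static int record_cmp(const IndexRecord *a, const IndexRecord *b)
{
    int c = memcmp(a->blob_id, b->blob_id, sizeof a->blob_id);
    if (c != 0)
        return c;
    return (a->chunk_index > b->chunk_index) - (a->chunk_index < b->chunk_index);
}

/* Strictly increasing keys (so no duplicates) plus per-record shape checks. */
static bool index_well_formed(const IndexRecord *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (r[i].compression > 1 || r[i].raw_size == 0)
            return false;
        if (r[i].compression == 0 && r[i].stored_size != r[i].raw_size)
            return false;
        if (i > 0 && record_cmp(&r[i - 1], &r[i]) >= 0)
            return false;
    }
    return true;
}
```

A sorted, duplicate-free index also lets a reader locate a blob's chunks by binary search on the blob id.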
Chunk verification is layered defense-in-depth: fetch stored bytes, check
stored-hash, decompress, check raw-hash, recompute and check the blob
leaf-hash, and finally recompute the whole blob's blob-id and blob-root
against the tree entry. Native verification (pkg_verify_native) runs in this
order: read header, verify manifest signature, verify descriptor signature with
the same key, confirm package-id and that all region ranges sit inside the
file, recompute and match every region root, confirm the index is sorted and
well-formed, then reconstruct, verify, and (for unpack) materialize each blob
of the selected output tree.
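Two of the early gates in that order are simple enough to sketch: the header magic plus minimum length, and the "region range sits inside the file" test. The range check is written so that offset + size cannot wrap around, which matters because the offsets come from attacker-supplied bytes. Names are illustrative, and no header fields beyond the 6-byte magic are parsed, since their layout is not described here.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { KPKG3_HEADER_SIZE = 96 };

/* Overflow-safe form of "off + size <= file_size". */
static bool region_in_file(uint64_t off, uint64_t size, uint64_t file_size)
{
    return off <= file_size && size <= file_size - off;
}

/* First gate before any parsing: magic and minimum length only. */
static bool kpkg3_magic_ok(const uint8_t *file, uint64_t file_size)
{
    return file_size >= KPKG3_HEADER_SIZE && memcmp(file, "kpkg3\0", 6) == 0;
}
```

A naive off + size <= file_size would accept off = 10, size = UINT64_MAX, because the sum wraps to 9.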
External objects
The package tool performs no network fetches. Descriptor url, index-url,
and [chunk-source] template values are untrusted fetch hints (rendered with
{blob-prefix}, {blob}, {chunk}) for external tools — curl, mirrors,
build caches. A consumer fetches the referenced bytes into a CAS-shaped
--external DIR and runs pkg verify/pkg unpack against it. Locators are
treated as untrusted relative paths: the verifier rejects absolute paths and
.., constrains everything under the external dir, and accepts bytes only
after they match the signed descriptor's hashes, ids, and roots.
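An external fetcher and the verifier both need to expand a locator template and then refuse anything that escapes the external dir. A minimal sketch, assuming the chunk index renders as plain decimal and that unknown {...} placeholders are errors (both assumptions); render_locator and locator_relative_ok are hypothetical names.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Expand {blob-prefix}, {blob}, {chunk}; false on overflow or unknown {..}.
 * blob_hex is assumed to be the 64-char lowercase blob id. */
static bool render_locator(const char *tmpl, const char *blob_hex,
                           unsigned long long chunk, char *out, size_t cap)
{
    char chunk_s[24], prefix[3] = { blob_hex[0], blob_hex[1], '\0' };
    size_t o = 0;
    if (cap == 0)
        return false;
    snprintf(chunk_s, sizeof chunk_s, "%llu", chunk);
    for (const char *p = tmpl; *p; ) {
        const char *sub = NULL;
        size_t skip = 0;
        if (strncmp(p, "{blob-prefix}", 13) == 0)  { sub = prefix;   skip = 13; }
        else if (strncmp(p, "{blob}", 6) == 0)     { sub = blob_hex; skip = 6; }
        else if (strncmp(p, "{chunk}", 7) == 0)    { sub = chunk_s;  skip = 7; }
        else if (*p == '{')
            return false;
        if (sub) {
            size_t n = strlen(sub);
            if (o + n >= cap)
                return false;
            memcpy(out + o, sub, n);
            o += n;
            p += skip;
        } else {
            if (o + 1 >= cap)
                return false;
            out[o++] = *p++;
        }
    }
    out[o] = '\0';
    return true;
}

/* Reject absolute paths and ".." components before joining under the
 * external dir; the bytes are still hash-checked after reading. */
static bool locator_relative_ok(const char *loc)
{
    if (loc[0] == '/' || loc[0] == '\0')
        return false;
    for (const char *c = loc; *c; ) {
        size_t n = strcspn(c, "/");
        if (n == 2 && c[0] == '.' && c[1] == '.')
            return false;
        c += n;
        if (*c == '/')
            c++;
    }
    return true;
}
```

The path check only limits where bytes are read from; whether they are accepted still depends entirely on the signed hashes, so a hostile mirror can at worst cause a verification failure.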
CLI
kit cas add-blob/add-tree/inspect-tree/verify-tree/materialize --cas DIR ...
kit pkg keygen -o BASE # writes BASE.pub + BASE.key
kit pkg create --name N --version V [--desc D] -s SECKEY
[--format kpkg|tar.gz] [--compression none|lz4-block-v1]
[--native-shape fat|metadata|thin] [--external DIR]
(--cas DIR --tree TREE_ID | --root DIR) -o OUT
kit pkg verify [-p PUBKEY | --tofu] [--external DIR] FILE
kit pkg unpack [--verify] [-p PUBKEY | --tofu] [--external DIR] FILE -C DIR
kit pkg inspect [--manifest | --encoding] FILE
kit pkg trust {path | list | add PUBKEY [label] | remove KEYID}
create --root DIR first builds a temporary tree from the directory, then
packages it; --cas DIR --tree TREE_ID packages an existing tree. Format is
inferred from the output suffix when --format is omitted. unpack always
verifies before writing files. The native descriptor for thin/metadata-rich
packages can be dumped with inspect --encoding to derive a fetch plan.
Determinism invariants
Emitters and verifiers preserve these identities exactly: package ids are manifest-byte hashes; tree ids are tree-manifest-byte hashes; blob ids are raw-byte hashes; blob roots are path-independent chunk merkle roots; native chunk indexes are blob-indexed (not path-indexed); and portable and native packages verify the same logical package/tree/blob content.