CG / ObjBuilder Lifecycle
This is the target lifecycle for semantic code generation and object building.
It is motivated by LTO, but it should hold for ordinary one-TU compilation
as well: ObjBuilder owns object lifetime, while KitCg borrows an object and
finishes codegen into it.
Status (2026-06-04): the borrowed CG/object lifecycle is implemented as the only
public CG session interface. kit_cg_free aborts and detaches without flushing,
lowering, debug-emitting, or finalizing the borrowed object. Shared-library LTO
remains disabled until that output path is exercised.
Problem
Historically KitCg had an object-shaped lifecycle:
cg_begin_object(cg, ob, code_opts);
frontend_compile_cg(..., cg);
cg_end_object(cg);
kit_obj_builder_finalize(ob);
That was the wrong ownership boundary. KitCg does not create, emit, link, or
free the object; the caller does. In the borrowed lifecycle, kit_cg_finish
finalizes the CG target and emits debug, while kit_cg_detach drops the
borrowed object/target links. kit_cg_free follows the abort path and never
finishes a partial object as a side effect of cleanup.
The object-shaped lifecycle also made LTO harder to finish cleanly. LTO needs to collect multiple source
units into one object, then finish semantic codegen only after the driver/linker
has enough information to provide preserved/export policy. That handoff should
be a KitCg finish option, not a driver-owned pseudo-unit abstraction.
Ownership Model
ObjBuilder owns object state:
- symbol identity and the name-to-id index;
- sections, atoms, relocations, data bodies, common symbols, and object metadata;
- object-level finalization and emission;
- object lifetime and cleanup.
KitCg owns a semantic codegen session attached to an object:
- the current target/recorder/backend;
- codegen options and whole-module optimization state;
- source-unit boundaries and provenance;
- debug/codegen state that is produced by semantic lowering;
- the final codegen flush into the borrowed object.
The driver or API caller owns orchestration:
- creating/freeing ObjBuilder;
- deciding source order and which inputs are semantic vs opaque;
- passing link-picture policy to codegen finish;
- calling kit_obj_builder_finalize and then emitting/linking the object.
Target API Shape
The exact names can change, but the shape should be explicit:
KitObjBuilder* ob = NULL;
KitCg* cg = NULL;
kit_obj_builder_new(compiler, &ob);
kit_cg_new(compiler, &cg);
kit_cg_begin(cg, ob, &code_opts); /* borrow ob, attach backend */
kit_cg_begin_unit(cg, &unit_opts); /* source contribution */
frontend_compile_cg(..., cg);
kit_cg_end_unit(cg);
kit_cg_finish(cg, &finish_opts); /* flush/lower/debug into ob */
kit_cg_detach(cg); /* drop borrowed links */
kit_obj_builder_finalize(ob);
For multi-source LTO, only the unit loop grows:
kit_obj_builder_new(compiler, &ob);
kit_cg_new(compiler, &cg);
kit_cg_begin(cg, ob, &code_opts);
for each semantic source:
    kit_cg_begin_unit(cg, &unit_opts);
    frontend_compile_cg(..., cg);
    kit_cg_end_unit(cg);
kit_cg_finish(cg, &finish_opts);
kit_cg_detach(cg);
kit_obj_builder_finalize(ob);
Opaque frontends do not attach to KitCg; they compile directly into their own
ObjBuilder and enter link/archive/relocatable order as ordinary objects.
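The call order above can be read as a small state machine. The following is a minimal sketch of that order only, not the real KitCg implementation: every `Sk*`/`sk_*` name is hypothetical, and rejecting out-of-order calls (for example finishing while a unit is still open) is an assumption rather than documented behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical session states; the real KitCg may track this differently. */
typedef enum {
    SK_NEW,       /* created, no object borrowed */
    SK_ATTACHED,  /* object borrowed, between units */
    SK_IN_UNIT,   /* between begin_unit and end_unit */
    SK_FINISHED,  /* codegen flushed into the borrowed object */
    SK_DETACHED   /* borrowed links dropped */
} SkState;

typedef struct {
    SkState state;
    int units_seen;  /* completed source units */
} SkSession;

/* Move from `from` to `to`; reject the call if the session is elsewhere. */
static bool sk_step(SkSession *s, SkState from, SkState to) {
    assert(s != NULL);
    if (s->state != from) return false;
    s->state = to;
    return true;
}

void sk_init(SkSession *s) { s->state = SK_NEW; s->units_seen = 0; }
bool sk_begin(SkSession *s) { return sk_step(s, SK_NEW, SK_ATTACHED); }
bool sk_begin_unit(SkSession *s) { return sk_step(s, SK_ATTACHED, SK_IN_UNIT); }
bool sk_end_unit(SkSession *s) {
    if (!sk_step(s, SK_IN_UNIT, SK_ATTACHED)) return false;
    s->units_seen++;
    return true;
}
bool sk_finish(SkSession *s) { return sk_step(s, SK_ATTACHED, SK_FINISHED); }
bool sk_detach(SkSession *s) { return sk_step(s, SK_FINISHED, SK_DETACHED); }

/* Drive one object with `n_units` semantic sources through the full order. */
bool sk_run(SkSession *s, int n_units) {
    sk_init(s);
    if (!sk_begin(s)) return false;
    for (int i = 0; i < n_units; i++) {
        if (!sk_begin_unit(s) || !sk_end_unit(s)) return false;
    }
    return sk_finish(s) && sk_detach(s);
}
```

In this sketch the one-TU case is `sk_run(&s, 1)` and the LTO case is the same call with a larger unit count, which mirrors the point that only the unit loop grows.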
Object vs Unit
An object is the emitted product. It may contain one source unit or many.
A unit is one semantic source contribution inside the object. Unit boundaries are not object boundaries. They exist so codegen can track:
- source name and source identity;
- ODR/duplicate-definition provenance;
- debug compilation-unit identity;
- file-scope asm and file-scope language state boundaries;
- future per-source codegen options or path-map state;
- contribution tables for "symbol X was defined by unit N".
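As an illustration of the last item, a contribution table could be as simple as a map from symbol id to defining unit. This is a sketch under assumptions: the fixed-size array, the `SkContribTable` type, and the `sk_contrib_*` functions are hypothetical and do not describe how ObjBuilder or KitCg store provenance.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_SYMS 64
#define SK_NO_UNIT (-1)

/* Hypothetical table: index is a symbol id, value is the defining unit. */
typedef struct {
    int defined_by[SK_MAX_SYMS];
} SkContribTable;

void sk_contrib_init(SkContribTable *t) {
    for (size_t i = 0; i < SK_MAX_SYMS; i++) t->defined_by[i] = SK_NO_UNIT;
}

/* Record that `unit` defines `sym`.
 * Returns SK_NO_UNIT on success, or the index of the unit that already
 * defined the symbol so the caller can report both sides of the clash. */
int sk_contrib_define(SkContribTable *t, unsigned sym, int unit) {
    assert(sym < SK_MAX_SYMS && unit >= 0);
    int prev = t->defined_by[sym];
    if (prev != SK_NO_UNIT) return prev;
    t->defined_by[sym] = unit;
    return SK_NO_UNIT;
}

/* Look up which unit defined `sym`, or SK_NO_UNIT if none has. */
int sk_contrib_owner(const SkContribTable *t, unsigned sym) {
    assert(sym < SK_MAX_SYMS);
    return t->defined_by[sym];
}
```

Keeping the unit index alongside the symbol id is what lets a duplicate-definition diagnostic name both contributing sources instead of only the symbol.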
Finish Options
kit_cg_finish is where link-picture-dependent policy enters semantic
optimization. For LTO, finish options should eventually carry:
- preserved symbols: entry, dynamic exports, opaque undefined references,
  used, init/fini, asm-named/address-significant symbols, IFUNC, etc.;
- output policy: executable, shared library, relocatable, archive member;
- interposition policy: default-visibility shared-library symbols are
  interposable unless hidden/version-script/-Bsymbolic policy says otherwise;
- debug policy for cross-unit inlining.
The finish operation may use internal ObjSymId sets when the linker/driver has
already resolved names into the shared ObjBuilder. A public API can offer a
name-based adapter if needed, but the core should prefer symbol ids once an
object exists.
kit_cg_finish must not call kit_obj_builder_finalize. The caller finalizes
the object after CG has finished writing semantic output into it.
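One possible shape for id-based finish options is sketched below. It is illustrative only: `SkFinishOpts`, `SkSymId` (a stand-in for ObjSymId, whose real definition is not shown in this document), and the two predicates are hypothetical names, and version-script handling is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned SkSymId;  /* stand-in for ObjSymId; real type not shown here */

typedef enum { SK_OUT_EXEC, SK_OUT_SHARED, SK_OUT_RELOC, SK_OUT_ARCHIVE } SkOutputKind;
typedef enum { SK_VIS_DEFAULT, SK_VIS_HIDDEN } SkVisibility;

/* Hypothetical finish options carrying link-picture policy by symbol id. */
typedef struct {
    SkOutputKind output;
    bool bsymbolic;            /* -Bsymbolic style policy in effect */
    const SkSymId *preserved;  /* borrowed array of symbol ids to keep */
    size_t n_preserved;
} SkFinishOpts;

/* Linear scan is enough for a sketch; a real set would be indexed. */
bool sk_is_preserved(const SkFinishOpts *o, SkSymId id) {
    assert(o != NULL);
    for (size_t i = 0; i < o->n_preserved; i++) {
        if (o->preserved[i] == id) return true;
    }
    return false;
}

/* Default-visibility symbols in a shared library are treated as
 * interposable unless hidden or bound by -Bsymbolic style policy.
 * Version scripts are omitted from this sketch. */
bool sk_is_interposable(const SkFinishOpts *o, SkVisibility vis) {
    assert(o != NULL);
    return o->output == SK_OUT_SHARED && vis == SK_VIS_DEFAULT && !o->bsymbolic;
}
```

Passing ids rather than names means the finish step does not repeat name lookup that the shared ObjBuilder has already resolved; a name-based adapter would translate to ids once and then call the same predicates.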
Failure Model
Cleanup must not finalize by accident.
- kit_cg_finish is the only operation that flushes, lowers, and debug-emits CG state.
- kit_cg_abort drops current CG-side state and detaches from the borrowed object without finalizing anything.
- kit_cg_free never calls finish implicitly.
- The caller decides whether to finalize or free the ObjBuilder.
This fixes the old wart where freeing an open KitCg could finalize a partial
object.
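The failure rules can be checked against a small mock. The sketch below is not the real implementation: `SkObj`, `SkCg`, and the `sk_*` functions are hypothetical stand-ins that model only the observable side effects on a borrowed object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock object builder: only the fields needed to observe side effects. */
typedef struct {
    int flushes;     /* times CG flushed semantic output into the object */
    bool finalized;  /* set only by the caller-side finalize */
} SkObj;

/* Mock CG session holding a borrowed, non-owning object pointer. */
typedef struct {
    SkObj *ob;    /* NULL when detached */
    int pending;  /* CG-side state not yet written to the object */
} SkCg;

void sk_cg_begin(SkCg *cg, SkObj *ob) { cg->ob = ob; cg->pending = 0; }
void sk_cg_emit(SkCg *cg) { assert(cg->ob != NULL); cg->pending++; }

/* The only path that writes CG state into the borrowed object. */
bool sk_cg_finish(SkCg *cg) {
    if (cg->ob == NULL) return false;
    cg->ob->flushes++;
    cg->pending = 0;
    return true;
}

/* Drop CG-side state and the borrowed link; the object is left untouched. */
void sk_cg_abort(SkCg *cg) { cg->pending = 0; cg->ob = NULL; }

/* Cleanup takes the abort path and never finishes implicitly. */
void sk_cg_free(SkCg *cg) { sk_cg_abort(cg); }

/* Caller-side finalization, never reached from any sk_cg_* function. */
void sk_obj_finalize(SkObj *ob) { ob->finalized = true; }
```

Freeing a session that still has pending state leaves `flushes` at zero and `finalized` false in this mock, so the caller can still choose to discard the object or to attach a fresh session to it.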
Boundary Rules
Frontends should only see the KitCg semantic API or the object-only API they
explicitly implement. A semantic frontend should not own ObjBuilder
finalization, and an opaque frontend should not need a fake KitCg.
ObjBuilder should remain the single source of truth for object symbol identity
and storage. CG may ask it to declare/define/merge contributions, but CG should
not own object lifetime.
The driver should not implement symbol merge, semantic finalization, or
internalization policy. It should gather sources, gather opaque inputs, compute
or request preserved/export policy, and pass that policy to kit_cg_finish.
Migration Plan
- Introduce borrowed-lifecycle names as the public API: kit_cg_begin,
  kit_cg_finish, kit_cg_detach, and kit_cg_abort.
- Make one-TU semantic compilation use the same borrowed lifecycle that LTO
  uses: caller creates ObjBuilder, CG borrows it, CG finishes, caller finalizes
  the object.
- Add begin_unit/end_unit bookkeeping and use it in ordinary one-TU and
  multi-source LTO paths.
- Move output-kind and preserved/export input into kit_cg_finish options. The
  driver now passes output-kind/interposition policy for supported outputs;
  preserved-symbol computation, internalization, and shared-library LTO remain
  follow-up work, so global roots stay conservative.
- Move duplicate function/data contribution bookkeeping toward the
  ObjBuilder/CG contribution boundary so src/opt and src/cg/data.c do not each
  own fragments of LTO symbol-resolution policy.
Non-Goals
- This does not introduce a separate public LtoUnit abstraction.
- This does not require serialized IR objects.
- This does not make frontends own object finalization.
- This does not make opaque inputs semantic; asm and prebuilt objects remain ordinary object participants.