kit

git clone https://git.ryansepassi.com/git/kit.git

commit e4cd5e7ca7a0f2f6d733d2b49fc524f8b83bc025
parent e807eee24c6272883531ca5fa1add0b7a4659ef8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Wed, 13 May 2026 20:24:51 -0700

Add neutral CG backend migration plan

Diffstat:
A  doc/cg-neutral-backend-plan.md | 286 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 286 insertions(+), 0 deletions(-)

diff --git a/doc/cg-neutral-backend-plan.md b/doc/cg-neutral-backend-plan.md
@@ -0,0 +1,286 @@

# Neutral CG Backend Migration Plan

This document plans the migration from the existing C-shaped codegen path to a
neutral CG layer based on the public API in `include/cfree/cg.h`. It also
consolidates the lower-layer gap inventory exposed while updating
`src/api/cg.c` for that API.

The central goal is that the C frontend becomes one client of a neutral codegen
interface. C `Type*` should stop being the backend type currency; it should be
translated at the frontend boundary into neutral CG type descriptors. Backends,
ABI classification, and target lowering should consume CG types and CG
operation descriptors.

## Principles

- Reuse public CG semantic enums and flags when they name the exact internal
  concept: calling convention, TLS model, tail policy, memory order, rounding,
  ABI attribute flags, operation flags, asm flags, and similar values.
- Do not pass public API structs directly into lower layers. Public structs use
  API handles, caller-owned arrays, and frontend-facing ownership rules. Lower
  layers should receive resolved internal descriptors with stable storage.
- Move C `Type*` above CG. The C parser/type system may still use `Type*`, but
  it should lower C declarations, expressions, and layout requests into neutral
  CG types before reaching ABI or `CGTarget`.
- Keep `ObjBuilder` mostly type-agnostic. It should model object-format facts:
  symbols, sections, groups, relocations, data expressions, TLS model, sizes,
  alignments, display names, and format-specific extensions. It should not
  become a typed IR layer.
- Make unsupported behavior explicit. If a public CG feature cannot be lowered
  or represented, the target/object layer should answer false through a
  capability query or emit a diagnostic. Metadata should not be silently
  ignored unless the API defines it as a hint.
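
The second principle (resolve public structs into internal descriptors with
stable storage) can be sketched as follows. This is a minimal illustration, not
cfree's API: `PubFuncSig`, `ResolvedSig`, `IdPool`, `TypeTable`, and
`resolve_func_sig` are all hypothetical, and the sketch assumes public type
handles are 1-based indices that map one-to-one onto internal ids. It validates
every handle, copies the caller-owned parameter array into backend-owned
storage, and reports failure instead of forwarding an unvalidated handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical public-facing shape: API handles plus a caller-owned array. */
typedef uint32_t PubTypeId; /* 0 is an invalid handle */
typedef struct PubFuncSig {
  PubTypeId ret;
  const PubTypeId* params; /* caller-owned; may change after the call */
  uint32_t nparams;
} PubFuncSig;

/* Hypothetical internal shape: validated ids in backend-owned storage. */
typedef uint32_t CGTypeId;
typedef struct ResolvedSig {
  CGTypeId ret;
  const CGTypeId* params; /* points into IdPool; stable while the pool lives */
  uint32_t nparams;
} ResolvedSig;

#define POOL_CAP 256u
typedef struct IdPool {
  CGTypeId ids[POOL_CAP]; /* stand-in for a backend arena */
  uint32_t used;
} IdPool;

typedef struct TypeTable {
  uint32_t count; /* valid handles are 1..count */
} TypeTable;

static bool resolve_type(const TypeTable* tt, PubTypeId h, CGTypeId* out) {
  if (h == 0 || h > tt->count) return false;
  *out = (CGTypeId)h; /* identity mapping assumed for this sketch */
  return true;
}

/* Validate every handle and copy the parameter list into the pool.
   pool->used and *out change only if the whole signature resolves. */
static bool resolve_func_sig(const TypeTable* tt, IdPool* pool,
                             const PubFuncSig* in, ResolvedSig* out) {
  if (in->nparams > POOL_CAP - pool->used) return false; /* out of space */
  CGTypeId* params = pool->ids + pool->used;
  for (uint32_t i = 0; i < in->nparams; i++) {
    if (!resolve_type(tt, in->params[i], &params[i])) return false;
  }
  CGTypeId ret;
  if (!resolve_type(tt, in->ret, &ret)) return false;
  pool->used += in->nparams;
  out->ret = ret;
  out->params = params;
  out->nparams = in->nparams;
  return true;
}
```

Copying at the boundary is what lets the caller reuse or free its array as soon
as the call returns, while lower layers keep a pointer that stays valid.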

## Gap Coverage

The public CG API already describes more semantics than the current lower
layers can represent. The migration plan below addresses these gaps by moving
metadata into neutral CG descriptors, object descriptors, or explicit target
capabilities.

`CGTarget` and ABI gaps to close:

- Non-default calling conventions are recorded by the public API but not
  carried into ABI classification or lowering.
- ABI attributes are not consumed by call, return, or parameter lowering:
  signext, zeroext, sret, byval, byref, inreg, noalias, readonly, writeonly,
  nonnull, nest, explicit alignment, and dereferenceable size.
- Function attributes are incomplete below the API: stack alignment, custom
  sections, target feature strings, cold/hot hints, naked functions, interrupt
  functions, no-red-zone requests, ifunc, and full noreturn handling.
- Per-symbol TLS model selection does not reach target lowering.
- Pointer address spaces are only partially represented and do not have full
  target semantics.
- Memory access metadata loses nontemporal, invariant, alias scope, and noalias
  scope information.
- Computed goto, label-address values, and indirect branches over a validated
  target set are unsupported.
- Switch lowering has no target hook and currently ignores jump-table hints.
- Integer operation flags are ignored: nsw, nuw, exact, trapping overflow, and
  saturating arithmetic.
- Floating-point semantics are incomplete: FP remainder, fast-math flags, and
  ordered-vs-unordered comparisons are not preserved.
- Conversion rounding modes are ignored.
- The internal intrinsic set is narrower than the public API's; gaps include
  FMA, syscall, IRQ operations, barriers, cache maintenance, CPU wait/event ops,
  coroutine switch, and signed-vs-unsigned overflow intrinsics.
- Atomic legality and lock-free queries are approximated from size instead of
  target hooks; weak compare-exchange is accepted but not represented.
- Inline asm loses flags and ABI clobber sets.
- Call attributes are incomplete: musttail compatibility is not validated and
  cold-call hints are ignored.

`ObjBuilder` gaps to close:

- Source/display names are not represented for symbols.
- DLL import/export and constructor priority are not modeled as semantic object
  features.
- Data label addresses have no object-level expression path.
- Data relocation address spaces are ignored.
- Symbol-difference expressions rely on available relocation kinds rather than
  a format-neutral expression contract.
- Section merge/string entry size is not fully wired through data definitions.
- Common symbols, weak binding, protected visibility, and COMDAT are only
  partially modeled; they need an explicit object-level contract.

## Type Direction

Introduce an internal neutral CG type model as the canonical backend type
language. The public `CfreeCgTypeId` can be an API handle into this model, while
internal code may use either stable `CGTypeId` handles or `const CGType*`
references after validation.

Surfaces that currently carry `Type*` and should move to neutral CG types
include:

- `Operand.type`
- `MemAccess.type`
- `ConstBytes.type`
- `FrameSlotDesc.type`
- `CGParamDesc.type`
- `CGABIValue.type`
- `CGFuncDesc.fn_type`
- `CGCallDesc.fn_type`
- `AsmConstraint.type`
- ABI record layout and function classification inputs

The C frontend should own the `Type* -> CGTypeId` adapter. Public CG API users
already construct neutral CG types directly, so they should not round-trip
through C types.

## Internal Descriptor Shape

Internal descriptors should be isomorphic to the public CG API where that is
useful, but resolved into backend-owned terms.
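
Under the Type Direction rules above, internal code may hold either a stable
`CGTypeId` or a validated `const CGType*`. The sketch below shows one way those
two forms could relate. It reuses the plan's `CGTypeId`/`CGType` names, but the
field layout, `CGTypeTable`, the `cg_*` helpers, and `ptr_bytes` (a stand-in for
a `TargetABI`-backed layout hook) are hypothetical. It covers only integer,
pointer, and array types and assumes an integer's alignment equals its byte
width.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical neutral type model; not cfree's actual definitions. */
typedef enum { CGT_INT, CGT_PTR, CGT_ARRAY } CGTypeKind;
typedef uint32_t CGTypeId; /* 0 is reserved as "no type" */

typedef struct CGType {
  CGTypeKind kind;
  uint32_t bits;  /* CGT_INT: width in bits */
  CGTypeId elem;  /* CGT_PTR / CGT_ARRAY: pointee or element type */
  uint64_t count; /* CGT_ARRAY: element count */
} CGType;

#define CG_MAX_TYPES 64
typedef struct CGTypeTable {
  CGType types[CG_MAX_TYPES]; /* fixed storage, so pointers stay valid */
  uint32_t n;                 /* next free id; must start at 1 */
  uint32_t ptr_bytes;         /* stand-in for a TargetABI layout hook */
} CGTypeTable;

static CGTypeId cg_add(CGTypeTable* t, CGType ty) {
  if (t->n >= CG_MAX_TYPES) return 0;
  t->types[t->n] = ty;
  return t->n++;
}

static CGTypeId cg_int(CGTypeTable* t, uint32_t bits) {
  return cg_add(t, (CGType){CGT_INT, bits, 0, 0});
}

static CGTypeId cg_ptr(CGTypeTable* t, CGTypeId pointee) {
  return cg_add(t, (CGType){CGT_PTR, 0, pointee, 0});
}

static CGTypeId cg_array(CGTypeTable* t, CGTypeId elem, uint64_t count) {
  return cg_add(t, (CGType){CGT_ARRAY, 0, elem, count});
}

/* Validate a handle once; callers may then hold the const pointer. */
static const CGType* cg_get(const CGTypeTable* t, CGTypeId id) {
  return (id != 0 && id < t->n) ? &t->types[id] : NULL;
}

/* Layout query: returns size in bytes and stores alignment in *align.
   Invalid handles yield size 0 and alignment 0. */
static uint64_t cg_size(const CGTypeTable* t, CGTypeId id, uint64_t* align) {
  const CGType* ty = cg_get(t, id);
  *align = 0;
  if (!ty) return 0;
  switch (ty->kind) {
  case CGT_INT:
    *align = (ty->bits + 7) / 8;
    return *align;
  case CGT_PTR:
    *align = t->ptr_bytes;
    return t->ptr_bytes;
  case CGT_ARRAY:
    return ty->count * cg_size(t, ty->elem, align); /* element alignment */
  }
  return 0;
}
```

The fixed-size table is what makes handing out `const CGType*` safe here; a
table backed by a growable array would have to pass ids across reallocation
points, or allocate types in chunks that never move.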

For example, public input:

```c
CfreeCgFuncSig
```

should normalize into an internal descriptor shaped like:

```c
typedef struct CGAbiAttrs {
  uint32_t flags;                /* ABI attribute bits: signext, zeroext, sret, ... */
  uint32_t align;                /* explicit alignment, if requested */
  uint64_t dereferenceable_size; /* dereferenceable size, if requested */
} CGAbiAttrs;

typedef struct CGParam {
  CGTypeId type;    /* neutral CG type, not a C Type* */
  CGAbiAttrs attrs;
} CGParam;

typedef struct CGFuncSig {
  CGTypeId ret;
  CGAbiAttrs ret_attrs;
  const CGParam* params;     /* backend-owned, stable storage */
  uint32_t nparams;
  CfreeCgCallConv call_conv; /* public enum reused as-is */
  int abi_variadic;          /* nonzero for variadic signatures */
} CGFuncSig;
```

`TargetABI` should classify `CGFuncSig`, not a C function `Type*`. Parser paths
that still start with C `Type*` should synthesize a `CGFuncSig` during lowering.

## Phasing

### 1. Introduce Neutral CG Core Types

Add the internal CG type table and descriptor APIs first, while keeping the old
codegen path working. This phase should define:

- `CGTypeId` / `CGType` and constructors for builtin, pointer, array, function,
  record, enum, and alias types.
- type layout/query hooks backed by `TargetABI`.
- `CGFuncSig`, `CGParam`, `CGAbiAttrs`, and neutral memory/access descriptors.
- a C frontend adapter from `Type*` to `CGTypeId`.

This gives both the public CG API and the C frontend a shared neutral model
instead of treating `include/cfree/cg.h` as a facade over C-shaped internals.

### 2. Move the C Frontend to the New CG Layer

Make the C parser/frontend emit through the new CG API/layer. The old internal
CG path should no longer be a privileged backend path for C.

This is the main semantic forcing function. It should prove that the neutral
type model can express normal C codegen, ABI calls, locals, lvalues, aggregates,
initializers, debug-facing names, and target-specific lowering requests.

Prefer targeted red-green coverage during this phase:

- function calls and returns for scalar, aggregate, variadic, and sret cases.
- object definitions, tentative definitions, TLS, readonly data, and custom
  sections.
- control flow, switches, computed goto once supported, and inline asm.
- atomics and memory access descriptors.

### 3. Keep the Old CG Layer Temporarily

Do not delete `src/cg` immediately after the frontend starts targeting neutral
CG. Keep it as an adapter, comparison point, or dead-but-buildable path until
the new route is proven by the focused test corpus.

The deletion point should be mechanical: no production path and no useful test
harness should depend on the old layer. Any parity tests worth keeping should
move to the new API before deletion.

### 4. Update ObjBuilder to Object Descriptors

Where the new CG API already needs stronger object semantics, update
`ObjBuilder` before any broad `CGTarget` surgery.

`ObjBuilder` should grow descriptor-based write APIs for:

- symbols with linkage name, display name, bind, visibility, kind, `used` flag,
  import/export flags, COMDAT/group membership, common definition, constructor
  priority, and per-symbol TLS model.
- sections with kind, semantic type, flags, alignment, entry size, group, link,
  info, and format extension fields.
- data expressions for absolute symbol addresses, PC-relative symbol
  references, symbol differences, and label-address values.

Label addresses should ideally lower to normal local symbols. `CGTarget` or
`MCEmitter` can create a local notype symbol for an addressable block label; data
tables then use normal symbol relocations instead of a special data-label path.

This phase should keep `ObjBuilder` independent of full CG type semantics. It
needs sizes and alignments at definition time, not a general type graph.

### 5. Update ABI and CGTarget to Consume CG Types

Once the frontend and object layer are speaking the neutral model, update ABI
classification and `CGTarget` signatures to consume CG descriptors directly.

Important changes:

- Replace `abi_func_info(TargetABI*, const Type*)` with classification keyed by
  `CGFuncSig`.
- Preserve ABI attributes in `ABIFuncInfo` / `ABIArgInfo`: signext, zeroext,
  sret, byval, byref, inreg, noalias, readonly, writeonly, nonnull, nest,
  explicit alignment, and dereferenceable size.
- Extend `CGFuncDesc` for complete function attrs: stack alignment, section,
  target feature strings, cold/hot, naked, interrupt, no-red-zone, ifunc, and
  noreturn.
- Extend `CGCallDesc` for tail policy, musttail validation, cold-call hints,
  direct/indirect callee details, and full ABI signature metadata.
- Replace simple op hooks with descriptors preserving integer flags, FP flags,
  ordered/unordered FP comparisons, FP remainder, and conversion rounding.
- Preserve full memory metadata: address space, volatile, nontemporal,
  invariant, alias scope, noalias scope, and atomic flag/order.
- Add target hooks or descriptors for switches, label addresses, indirect
  branches, atomic legality/lock-free queries, weak compare-exchange, expanded
  intrinsics, and inline asm flags/ABI clobber sets.

`opt_cgtarget` and IR replay should mirror the new `CGTarget` surface rather
than reconstructing lost metadata.

### 6. Delete the Old CG Layer

Delete the old CG layer only after:

- the C frontend emits through neutral CG.
- public CG API tests pass through the same path.
- `ObjBuilder`, `TargetABI`, `CGTarget`, and `opt_cgtarget` consume neutral
  descriptors.
- any useful parity tests have been moved.
- no production driver or test harness depends on the old interfaces.

At this point deletion should be mostly removing stale adapters and C-shaped
plumbing, not making new semantic decisions.

## Capability and Diagnostic Contract

Capability queries should answer questions of correctness, not performance. A
target should return support only when it can preserve the requested semantics.

Examples:

- non-default calling conventions must be target-backed.
- musttail requires ABI compatibility validation.
- symbol feature queries should be backed by `ObjBuilder` and object-format
  support, not approximated in `src/api/cg.c`.
- atomic legality and lock-free answers should come from target hooks.
- strict conversion rounding, trapping overflow, saturating arithmetic, FP
  remainder, and runtime/bare-metal intrinsics should emit a diagnostic until
  supported.

Hints such as nontemporal memory access, branch/call hotness, and some
fast-math flags may be ignored only when the public API explicitly permits that
behavior.

## Suggested Test Strategy

Prefer narrow tests while the interfaces are changing:

- `make test-cg` for neutral CG lowering and ABI behavior.
- `make test-elf` for symbol attrs, sections, `entsize`, data expressions, and
  object round-trips.
- `make test-link` for relocation behavior, visibility, TLS, COMDAT, and
  symdiff handling.
- frontend subsets such as `make test-parse test-cg` when migrating C lowering.
- specific arch smoke/codegen cases for features each target claims to support.

Keep unsupported-feature tests explicit: they should assert diagnostics or false
capability answers rather than relying on accidental backend behavior.
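
A test in that style might look like the sketch below. It assumes a target
capability bitmask and a diagnostic sink; `CallConv`, `TargetCaps`, `Diags`,
`target_supports_call_conv`, and `lower_func_call_conv` are hypothetical names,
not code from the cfree tree. The test checks the false capability answer and
the emitted diagnostic, never the machine code a backend happens to produce.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; cfree's real enums and hooks may differ. */
typedef enum { CC_DEFAULT, CC_FASTCALL, CC_INTERRUPT } CallConv;

typedef struct TargetCaps {
  const char* name;
  unsigned call_conv_mask; /* bit n set => CallConv n can be lowered */
} TargetCaps;

typedef struct Diags {
  int count;
  char last[128];
} Diags;

/* Capability query: says yes only when the semantics can be preserved. */
static bool target_supports_call_conv(const TargetCaps* t, CallConv cc) {
  return (t->call_conv_mask >> cc) & 1u;
}

/* Lowering entry point: diagnose rather than silently using the default. */
static bool lower_func_call_conv(const TargetCaps* t, CallConv cc, Diags* d) {
  if (target_supports_call_conv(t, cc)) return true;
  d->count++;
  snprintf(d->last, sizeof d->last,
           "calling convention %d is not supported on %s", (int)cc, t->name);
  return false;
}

/* The explicit unsupported-feature test. */
static void test_unsupported_call_conv_is_diagnosed(void) {
  TargetCaps toy = {"toy", 1u << CC_DEFAULT};
  Diags d = {0, ""};
  assert(!target_supports_call_conv(&toy, CC_FASTCALL)); /* false answer */
  assert(!lower_func_call_conv(&toy, CC_FASTCALL, &d));  /* no fallback */
  assert(d.count == 1 && strstr(d.last, "not supported") != NULL);
}
```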