commit e4cd5e7ca7a0f2f6d733d2b49fc524f8b83bc025
parent e807eee24c6272883531ca5fa1add0b7a4659ef8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Wed, 13 May 2026 20:24:51 -0700
Add neutral CG backend migration plan
Diffstat:
1 file changed, 286 insertions(+), 0 deletions(-)
diff --git a/doc/cg-neutral-backend-plan.md b/doc/cg-neutral-backend-plan.md
@@ -0,0 +1,286 @@
+# Neutral CG Backend Migration Plan
+
+This document plans the migration from the existing C-shaped codegen path to a
+neutral CG layer based on the public API in `include/cfree/cg.h`. It also
+consolidates the lower-layer gap inventory exposed while updating
+`src/api/cg.c` for that API.
+
+The central goal is that the C frontend becomes one client of a neutral codegen
+interface. C `Type*` should stop being the backend type currency; it should be
+translated at the frontend boundary into neutral CG type descriptors. Backends,
+ABI classification, and target lowering should consume CG types and CG
+operation descriptors.
+
+## Principles
+
+- Reuse public CG semantic enums and flags when they name the exact internal
+ concept: calling convention, TLS model, tail policy, memory order, rounding,
+ ABI attribute flags, operation flags, asm flags, and similar values.
+- Do not pass public API structs directly into lower layers. Public structs use
+ API handles, caller-owned arrays, and frontend-facing ownership rules. Lower
+ layers should receive resolved internal descriptors with stable storage.
+- Move C `Type*` above CG. The C parser/type system may still use `Type*`, but
+ it should lower C declarations, expressions, and layout requests into neutral
+ CG types before reaching ABI or `CGTarget`.
+- Keep `ObjBuilder` mostly type-agnostic. It should model object-format facts:
+ symbols, sections, groups, relocations, data expressions, TLS model, sizes,
+ alignments, display names, and format-specific extensions. It should not
+ become a typed IR layer.
+- Make unsupported behavior explicit. If a public CG feature cannot be lowered
+ or represented, the target/object layer should answer false through a
+ capability query or emit a diagnostic. Metadata should not be silently
+ ignored unless the API defines it as a hint.
+
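The second principle above (resolved internal descriptors with stable storage) can be illustrated with a minimal sketch. Everything named here (`PubSig`, `ResolvedSig`, `resolve_sig`, `release_sig`) is hypothetical and not taken from `include/cfree/cg.h`. Heap allocation stands in for whatever arena the backend actually uses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical public-side signature: the params array is caller-owned. */
typedef struct PubSig {
  const uint32_t* param_types; /* caller may reuse or free this after the call */
  uint32_t nparams;
  uint32_t ret_type;
} PubSig;

/* Hypothetical internal descriptor: owns its own copy of the params. */
typedef struct ResolvedSig {
  uint32_t* param_types; /* backend-owned storage */
  uint32_t nparams;
  uint32_t ret_type;
} ResolvedSig;

/* Copy the public struct into backend-owned storage. Returns 0 on success,
 * -1 if the copy could not be allocated. */
int resolve_sig(const PubSig* in, ResolvedSig* out) {
  assert(in && out);
  out->nparams = in->nparams;
  out->ret_type = in->ret_type;
  out->param_types = NULL;
  if (in->nparams == 0) return 0;
  out->param_types = malloc(in->nparams * sizeof *out->param_types);
  if (!out->param_types) return -1;
  memcpy(out->param_types, in->param_types,
         in->nparams * sizeof *out->param_types);
  return 0;
}

/* Release the backend-owned copy. */
void release_sig(ResolvedSig* sig) {
  assert(sig);
  free(sig->param_types);
  sig->param_types = NULL;
  sig->nparams = 0;
}
```

After `resolve_sig` returns, the caller can overwrite or free its array without affecting the resolved descriptor, which is the property lower layers would rely on.
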
+## Gap Coverage
+
+The public CG API already describes more semantics than the current lower
+layers can represent. The migration plan below addresses these gaps by moving
+metadata into neutral CG descriptors, object descriptors, or explicit target
+capabilities.
+
+`CGTarget` and ABI gaps to close:
+
+- Non-default calling conventions are recorded by the public API but not
+ carried into ABI classification or lowering.
+- ABI attributes are not consumed by call, return, or parameter lowering:
+ signext, zeroext, sret, byval, byref, inreg, noalias, readonly, writeonly,
+ nonnull, nest, explicit alignment, and dereferenceable size.
+- Function attributes are incomplete below the API: stack alignment, custom
+ sections, target feature strings, cold/hot hints, naked functions, interrupt
+ functions, no-red-zone requests, ifunc, and full noreturn handling.
+- Per-symbol TLS model selection does not reach target lowering.
+- Pointer address spaces are only partially represented and do not have full
+ target semantics.
+- Memory access metadata loses nontemporal, invariant, alias scope, and noalias
+ scope information.
+- Computed goto, label-address values, and indirect branch over a validated
+ target set are unsupported.
+- Switch lowering has no target hook and currently ignores jump-table hints.
+- Integer operation flags are ignored: nsw, nuw, exact, trapping overflow, and
+ saturating arithmetic.
+- Floating-point semantics are incomplete: FP remainder, fast-math flags, and
+ ordered-vs-unordered comparisons are not preserved.
+- Conversion rounding modes are ignored.
+- The internal intrinsic set is narrower than the public API, including FMA,
+ syscall, IRQ operations, barriers, cache maintenance, CPU wait/event ops,
+ coroutine switch, and signed-vs-unsigned overflow intrinsics.
+- Atomic legality and lock-free queries are approximated from operand size
+ instead of being answered by target hooks; weak compare-exchange is accepted
+ but not represented.
+- Inline asm loses flags and ABI clobber sets.
+- Call attributes are incomplete: musttail compatibility is not validated and
+ cold-call hints are ignored.
+
+`ObjBuilder` gaps to close:
+
+- Source/display names are not represented for symbols.
+- DLL import/export and constructor priority are not modeled as semantic
+ object features.
+- Data label addresses have no object-level expression path.
+- Data relocation address spaces are ignored.
+- Symbol-difference expressions rely on available relocation kinds rather than
+ a format-neutral expression contract.
+- Section merge/string entry size is not fully wired through data definitions.
+- Common symbols, weak binding, protected visibility, and COMDAT are only
+ partially modeled and have no explicit object-level contract.
+
+## Type Direction
+
+Introduce an internal neutral CG type model as the canonical backend type
+language. The public `CfreeCgTypeId` can be an API handle into this model, while
+internal code may use either stable `CGTypeId` handles or `const CGType*`
+references after validation.
+
+Surfaces that currently carry `Type*` and should move to neutral CG types
+include:
+
+- `Operand.type`
+- `MemAccess.type`
+- `ConstBytes.type`
+- `FrameSlotDesc.type`
+- `CGParamDesc.type`
+- `CGABIValue.type`
+- `CGFuncDesc.fn_type`
+- `CGCallDesc.fn_type`
+- `AsmConstraint.type`
+- ABI record layout and function classification inputs
+
+The C frontend should own the `Type* -> CGTypeId` adapter. Public CG API users
+already construct neutral CG types directly, so they should not round-trip
+through C types.
+
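A minimal sketch of what a frontend-owned `Type* -> CGTypeId` adapter could look like, assuming a structurally interned type table. `CType`, `CGTypeNode`, `CGTypeTable`, `cg_intern`, and `c_type_to_cg` are hypothetical stand-ins, and the mapping of C `int` to a 32-bit integer is a target assumption made only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CGTypeId; /* stand-in for the plan's handle type */

/* Hypothetical, heavily simplified C frontend type. */
typedef enum { C_INT, C_CHAR, C_PTR } CKind;
typedef struct CType {
  CKind kind;
  const struct CType* pointee; /* used only for C_PTR */
} CType;

/* Hypothetical neutral type node. */
typedef enum { CG_I8, CG_I32, CG_PTR } CGKind;
typedef struct CGTypeNode {
  CGKind kind;
  CGTypeId pointee; /* meaningful only for CG_PTR */
} CGTypeNode;

#define CG_MAX_TYPES 64
#define CG_INVALID_TYPE 0u

typedef struct CGTypeTable {
  CGTypeNode nodes[CG_MAX_TYPES]; /* slot 0 is reserved as "invalid" */
  uint32_t count;
} CGTypeTable;

void cg_types_init(CGTypeTable* t) { t->count = 1; }

/* Return the existing id for a structurally equal node, or add a new one. */
CGTypeId cg_intern(CGTypeTable* t, CGKind kind, CGTypeId pointee) {
  for (uint32_t i = 1; i < t->count; i++) {
    if (t->nodes[i].kind == kind && t->nodes[i].pointee == pointee) return i;
  }
  if (t->count == CG_MAX_TYPES) return CG_INVALID_TYPE;
  t->nodes[t->count].kind = kind;
  t->nodes[t->count].pointee = pointee;
  return t->count++;
}

/* Frontend-side adapter: lower a C type into the neutral table. */
CGTypeId c_type_to_cg(CGTypeTable* t, const CType* ty) {
  assert(t && ty);
  switch (ty->kind) {
    case C_CHAR: return cg_intern(t, CG_I8, CG_INVALID_TYPE);
    case C_INT: return cg_intern(t, CG_I32, CG_INVALID_TYPE); /* assumes 32-bit int */
    case C_PTR: {
      CGTypeId inner = c_type_to_cg(t, ty->pointee);
      if (inner == CG_INVALID_TYPE) return CG_INVALID_TYPE;
      return cg_intern(t, CG_PTR, inner);
    }
  }
  return CG_INVALID_TYPE;
}
```

Because the table interns structurally, two distinct frontend `CType` objects describing `int*` map to the same `CGTypeId`, so nothing below the adapter needs to compare C types.
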
+## Internal Descriptor Shape
+
+Internal descriptors should be isomorphic to the public CG API where that is
+useful, but resolved into backend-owned terms.
+
+For example, the public `CfreeCgFuncSig` input should normalize into an internal
+descriptor shaped like:
+
+```c
+typedef struct CGAbiAttrs {
+  uint32_t flags; /* ABI attribute flags: signext, zeroext, sret, ... */
+  uint32_t align; /* explicit alignment */
+  uint64_t dereferenceable_size;
+} CGAbiAttrs;
+
+typedef struct CGParam {
+  CGTypeId type;
+  CGAbiAttrs attrs;
+} CGParam;
+
+typedef struct CGFuncSig {
+  CGTypeId ret;
+  CGAbiAttrs ret_attrs;
+  const CGParam* params; /* resolved storage, not the caller-owned array */
+  uint32_t nparams;
+  CfreeCgCallConv call_conv; /* public semantic enum reused as-is */
+  int abi_variadic;
+} CGFuncSig;
+```
+
+`TargetABI` should classify `CGFuncSig`, not a C function `Type*`. Parser paths
+that still start with C `Type*` should synthesize a `CGFuncSig` during lowering.
+
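As a rough illustration of classification that consumes per-parameter ABI attributes instead of a C `Type*`, the sketch below classifies a single `CGParam`. The flag constants (`CG_ATTR_*`), the `ArgClass` enum, and `classify_param` are hypothetical; a real `TargetABI` would also consult type layout and the calling convention.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CGTypeId; /* stand-in handle type */

/* Hypothetical flag values; the real ones would come from the public API. */
enum {
  CG_ATTR_SIGNEXT = 1u << 0,
  CG_ATTR_ZEROEXT = 1u << 1,
  CG_ATTR_BYVAL = 1u << 2
};

/* Local copies of the descriptor shapes shown above. */
typedef struct CGAbiAttrs {
  uint32_t flags;
  uint32_t align;
  uint64_t dereferenceable_size;
} CGAbiAttrs;

typedef struct CGParam {
  CGTypeId type;
  CGAbiAttrs attrs;
} CGParam;

typedef enum ArgClass {
  ARG_DIRECT,        /* pass as-is */
  ARG_SIGN_EXTEND,   /* widen with sign extension */
  ARG_ZERO_EXTEND,   /* widen with zero extension */
  ARG_INDIRECT_COPY, /* pass a pointer to a callee-private copy */
  ARG_INVALID        /* contradictory attributes: diagnose */
} ArgClass;

/* Classify one parameter from its attributes alone. */
ArgClass classify_param(const CGParam* p) {
  assert(p);
  uint32_t f = p->attrs.flags;
  uint32_t ext = f & (CG_ATTR_SIGNEXT | CG_ATTR_ZEROEXT);
  if (ext == (CG_ATTR_SIGNEXT | CG_ATTR_ZEROEXT)) return ARG_INVALID;
  if (f & CG_ATTR_BYVAL) return ext ? ARG_INVALID : ARG_INDIRECT_COPY;
  if (ext == CG_ATTR_SIGNEXT) return ARG_SIGN_EXTEND;
  if (ext == CG_ATTR_ZEROEXT) return ARG_ZERO_EXTEND;
  return ARG_DIRECT;
}
```

Contradictory attribute combinations come back as `ARG_INVALID` rather than being resolved silently, which matches the plan's preference for explicit diagnostics.
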
+## Phasing
+
+### 1. Introduce Neutral CG Core Types
+
+Add the internal CG type table and descriptor APIs first, while keeping the old
+codegen path working. This phase should define:
+
+- `CGTypeId` / `CGType` and constructors for builtin, pointer, array, function,
+ record, enum, and alias types.
+- type layout/query hooks backed by `TargetABI`.
+- `CGFuncSig`, `CGParam`, `CGAbiAttrs`, and neutral memory/access descriptors.
+- a C frontend adapter from `Type*` to `CGTypeId`.
+
+This gives both the public CG API and the C frontend a shared neutral model
+instead of treating `include/cfree/cg.h` as a facade over C-shaped internals.
+
+### 2. Move the C Frontend to the New CG Layer
+
+Make the C parser/frontend emit through the new CG API/layer. The old internal
+CG path should no longer be a privileged backend path for C.
+
+This is the main semantic forcing function. It should prove that the neutral
+type model can express normal C codegen, ABI calls, locals, lvalues, aggregates,
+initializers, debug-facing names, and target-specific lowering requests.
+
+Prefer targeted red-green coverage during this phase:
+
+- function calls and returns for scalar, aggregate, variadic, and sret cases.
+- object definitions, tentative definitions, TLS, readonly data, and custom
+ sections.
+- control flow, switches, computed goto once supported, and inline asm.
+- atomics and memory access descriptors.
+
+### 3. Keep the Old CG Layer Temporarily
+
+Do not delete `src/cg` immediately after the frontend starts targeting neutral
+CG. Keep it as an adapter, comparison point, or dead-but-buildable path until
+the new route is proven by the focused test corpus.
+
+The deletion point should be mechanical: no production path and no useful test
+harness should depend on the old layer. Any parity tests worth keeping should
+move to the new API before deletion.
+
+### 4. Update ObjBuilder to Object Descriptors
+
+Where the new CG API already needs stronger object semantics, update
+`ObjBuilder` before starting broad `CGTarget` surgery.
+
+`ObjBuilder` should grow descriptor-based write APIs for:
+
+- symbols with linkage name, display name, bind, visibility, kind, used,
+ import/export flags, COMDAT/group membership, common definition, constructor
+ priority, and per-symbol TLS model.
+- sections with kind, semantic type, flags, alignment, entry size, group, link,
+ info, and format extension fields.
+- data expressions for absolute symbol addresses, PC-relative symbol
+ references, symbol differences, and label-address values.
+
+Label addresses should ideally lower to normal local symbols. `CGTarget` or
+`MCEmitter` can create a local notype symbol for an addressable block label; data
+tables then use normal symbol relocations instead of a special data-label path.
+
+This phase should keep `ObjBuilder` independent of full CG type semantics. It
+needs sizes and alignments at definition time, not a general type graph.
+
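One possible shape for format-neutral data expressions, sketched under assumptions: `ObjSymId`, `ObjDataExpr`, `obj_expr_label_addr`, and `obj_expr_eval` are hypothetical names. The evaluator is not part of any object writer; it only spells out the value each expression kind is meant to denote once symbol addresses are known.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t ObjSymId; /* hypothetical symbol handle */

typedef enum ObjDataExprKind {
  OBJ_EXPR_ABS,    /* sym + addend */
  OBJ_EXPR_PCREL,  /* sym + addend - place */
  OBJ_EXPR_SYMDIFF /* (sym + addend) - base */
} ObjDataExprKind;

typedef struct ObjDataExpr {
  ObjDataExprKind kind;
  ObjSymId sym;
  ObjSymId base;  /* used only for OBJ_EXPR_SYMDIFF */
  int64_t addend;
  uint8_t size;   /* width of the emitted field in bytes */
} ObjDataExpr;

/* A label address is an ordinary absolute reference to the local symbol
 * created for the block label, so no special data-label path is needed. */
ObjDataExpr obj_expr_label_addr(ObjSymId label_sym, uint8_t ptr_size) {
  ObjDataExpr e = {OBJ_EXPR_ABS, label_sym, 0, 0, ptr_size};
  return e;
}

/* Reference semantics: the value the expression denotes, given final symbol
 * addresses and the address of the field being filled in. */
uint64_t obj_expr_eval(const ObjDataExpr* e, const uint64_t* sym_addr,
                       uint64_t place) {
  assert(e && sym_addr);
  uint64_t s = sym_addr[e->sym] + (uint64_t)e->addend;
  switch (e->kind) {
    case OBJ_EXPR_ABS: return s;
    case OBJ_EXPR_PCREL: return s - place;
    case OBJ_EXPR_SYMDIFF: return s - sym_addr[e->base];
  }
  return 0;
}
```

Each object format would map these kinds onto its own relocations, or report that a kind is unsupported, instead of callers picking relocation kinds directly.
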
+### 5. Update ABI and CGTarget to Consume CG Types
+
+Once the frontend and object layer are speaking the neutral model, update ABI
+classification and `CGTarget` signatures to consume CG descriptors directly.
+
+Important changes:
+
+- Replace `abi_func_info(TargetABI*, const Type*)` with classification keyed by
+ `CGFuncSig`.
+- Preserve ABI attributes in `ABIFuncInfo` / `ABIArgInfo`: signext, zeroext,
+ sret, byval, byref, inreg, noalias, readonly, writeonly, nonnull, nest,
+ explicit alignment, and dereferenceable size.
+- Extend `CGFuncDesc` for complete function attrs: stack alignment, section,
+ target feature strings, cold/hot, naked, interrupt, no-red-zone, ifunc, and
+ noreturn.
+- Extend `CGCallDesc` for tail policy, musttail validation, cold call hints,
+ direct/indirect callee details, and full ABI signature metadata.
+- Replace simple op hooks with descriptors preserving integer flags, FP flags,
+ ordered/unordered FP comparisons, FP remainder, and conversion rounding.
+- Preserve full memory metadata: address space, volatile, nontemporal,
+ invariant, alias scope, noalias scope, and atomic flag/order.
+- Add target hooks or descriptors for switches, label addresses, indirect
+ branches, atomics legality/lock-free queries, weak compare-exchange, expanded
+ intrinsics, and inline asm flags/ABI clobber sets.
+
+`opt_cgtarget` and IR replay should mirror the new `CGTarget` surface rather
+than reconstructing lost metadata.
+
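To show why operation descriptors need to carry integer flags, here is a small sketch of a constant folder for a signed 32-bit add. The flag names (`CG_INT_*`), `CGFoldStatus`, and `cg_fold_add_i32` are hypothetical, and treating an overflowing nsw add as "undefined, do not fold" is an assumption borrowed from common compiler practice rather than something this plan specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation flags. */
enum {
  CG_INT_NSW = 1u << 0,          /* signed overflow is undefined */
  CG_INT_SATURATING = 1u << 1,   /* clamp to the representable range */
  CG_INT_TRAP_OVERFLOW = 1u << 2 /* overflow must trap at run time */
};

typedef enum CGFoldStatus {
  CG_FOLD_VALUE,    /* *out holds the folded result */
  CG_FOLD_TRAPS,    /* the operation would trap; do not fold */
  CG_FOLD_UNDEFINED /* the result is undefined; do not fold */
} CGFoldStatus;

/* Convert a 32-bit pattern to int32_t without relying on
 * implementation-defined narrowing. */
static int32_t wrap_to_i32(uint32_t u) {
  if (u <= (uint32_t)INT32_MAX) return (int32_t)u;
  return (int32_t)(u - 0x80000000u) + INT32_MIN;
}

/* Fold a + b for 32-bit signed operands, honoring the descriptor flags. */
CGFoldStatus cg_fold_add_i32(uint32_t flags, int32_t a, int32_t b,
                             int32_t* out) {
  assert(out);
  int64_t wide = (int64_t)a + (int64_t)b;
  bool overflow = wide > INT32_MAX || wide < INT32_MIN;
  if (!overflow) {
    *out = (int32_t)wide;
    return CG_FOLD_VALUE;
  }
  if (flags & CG_INT_TRAP_OVERFLOW) return CG_FOLD_TRAPS;
  if (flags & CG_INT_SATURATING) {
    *out = wide > 0 ? INT32_MAX : INT32_MIN;
    return CG_FOLD_VALUE;
  }
  if (flags & CG_INT_NSW) return CG_FOLD_UNDEFINED;
  *out = wrap_to_i32((uint32_t)a + (uint32_t)b); /* plain wrapping add */
  return CG_FOLD_VALUE;
}
```

The same operands give four different outcomes depending on the flags, so a hook that only receives "add" cannot preserve the requested semantics.
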
+### 6. Delete the Old CG Layer
+
+Delete the old CG layer only after:
+
+- the C frontend emits through neutral CG.
+- public CG API tests pass through the same path.
+- `ObjBuilder`, `TargetABI`, `CGTarget`, and `opt_cgtarget` consume neutral
+ descriptors.
+- any useful parity tests have been moved.
+- no production driver or test harness depends on the old interfaces.
+
+At this point, deletion should consist mostly of removing stale adapters and
+C-shaped plumbing, not of making new semantic decisions.
+
+## Capability and Diagnostic Contract
+
+Capability queries should answer questions of correctness, not performance. A
+target should report support only when it can preserve the requested semantics.
+
+Examples:
+
+- non-default calling conventions must be target-backed.
+- musttail requires ABI compatibility validation.
+- symbol feature queries should be backed by `ObjBuilder` and object-format
+ support, not approximated in `src/api/cg.c`.
+- atomic legality and lock-free answers should come from target hooks.
+- strict conversion rounding, trapping overflow, saturating arithmetic, FP
+ remainder, and runtime/bare-metal intrinsics should produce diagnostics until
+ they are supported.
+
+Hints such as nontemporal memory access, branch/call hotness, and some fast-math
+flags may be ignored only when the public API explicitly permits that behavior.
+
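The contract above could be sketched as follows, assuming a simple per-target capability bitmask. `CGFeature`, `TargetCaps`, `target_supports`, `feature_is_hint`, and `cg_request_feature` are hypothetical names, and which features count as hints would be decided by the public API, not by this helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature identifiers. */
typedef enum CGFeature {
  CG_FEAT_NONDEFAULT_CALLCONV,
  CG_FEAT_SATURATING_ARITH,
  CG_FEAT_NONTEMPORAL_HINT,
  CG_FEAT_COUNT
} CGFeature;

/* Hypothetical per-target capability set: one bit per feature. */
typedef struct TargetCaps {
  uint32_t supported;
} TargetCaps;

typedef enum CGLowerResult {
  CG_LOWER_OK,           /* semantics preserved */
  CG_LOWER_HINT_DROPPED, /* API-defined hint, safely ignored */
  CG_LOWER_DIAGNOSED     /* semantic feature unsupported: emit a diagnostic */
} CGLowerResult;

/* Correctness query: true only if the target preserves the semantics. */
bool target_supports(const TargetCaps* caps, CGFeature f) {
  assert(caps && f < CG_FEAT_COUNT);
  return ((caps->supported >> f) & 1u) != 0;
}

/* Whether the API defines the feature as an ignorable hint (assumed here). */
bool feature_is_hint(CGFeature f) { return f == CG_FEAT_NONTEMPORAL_HINT; }

/* Decide what happens to a requested feature on a given target. */
CGLowerResult cg_request_feature(const TargetCaps* caps, CGFeature f) {
  if (target_supports(caps, f)) return CG_LOWER_OK;
  return feature_is_hint(f) ? CG_LOWER_HINT_DROPPED : CG_LOWER_DIAGNOSED;
}
```

An unsupported semantic feature never falls through to a silent success; only features the API marks as hints may be dropped.
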
+## Suggested Test Strategy
+
+Prefer narrow tests while the interfaces are changing:
+
+- `make test-cg` for neutral CG lowering and ABI behavior.
+- `make test-elf` for symbol attrs, sections, `entsize`, data expressions, and
+ object round-trips.
+- `make test-link` for relocation behavior, visibility, TLS, COMDAT, and
+ symdiff handling.
+- frontend subsets such as `make test-parse test-cg` when migrating C lowering.
+- specific arch smoke/codegen cases for features each target claims to support.
+
+Keep unsupported-feature tests explicit: they should assert diagnostics or false
+capability answers rather than relying on accidental backend behavior.