commit 9e2ff3eaf2b7f21a6709bcfcb946cac6592372c4
parent fed0b77d70da076c5963cb78b8aa7fc8e45cd3f1
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Wed, 13 May 2026 04:08:58 -0700
docs: clarify CG API status and plan
Diffstat:
| M | doc/cg-api-status.md | | | 418 | ++++++++++++++++++++++++++++++++++++++++++++++++------------------------------- |
1 file changed, 256 insertions(+), 162 deletions(-)
diff --git a/doc/cg-api-status.md b/doc/cg-api-status.md
@@ -1,134 +1,94 @@
-# CG API Implementation & Toy Language — Status
-
-## Completed Implementations (`src/api/cg.c`)
-
-All previously-stubbed public CG API functions now have real implementations:
-
-| Function | Status | Mechanism |
-|---|---|---|
-| `cfree_cg_push_float` | Done | Builds `ConstBytes` union, delegates to `CGTarget.load_const` |
-| `cfree_cg_push_bytes` | Done | Emits to `.rodata` section via `ObjBuilder`, pushes `OPK_GLOBAL` pointer |
-| `cfree_cg_memcpy` | Done | Pops dst/src addrs, forces to regs, calls `CGTarget.copy_bytes` |
-| `cfree_cg_memset` | Done | Pops dst addr, calls `CGTarget.set_bytes` with imm byte value |
-| `cfree_cg_index` | Done | Pops `[base, index]`, computes `base + offset + index * elemsz` |
-| `cfree_cg_field_addr` | Done | Pops base ptr, looks up field offset via `abi_record_layout`, emits add |
-| `cfree_cg_alloca` | Done | Pops size, calls `CGTarget.alloca_` |
-| `cfree_cg_va_start` | Done | Pops `&ap`, forces reg, calls `CGTarget.va_start_` |
-| `cfree_cg_va_arg` | Done | Pops `&ap`, allocs dst reg, calls `CGTarget.va_arg_` |
-| `cfree_cg_va_end` | Done | Pops `&ap`, forces reg, calls `CGTarget.va_end_` |
-| `cfree_cg_va_copy` | Done | Pops `&dst, &src`, forces regs, calls `CGTarget.va_copy_` |
-| `cfree_cg_data_decl` | Done | Creates symbol via `obj_symbol_ex` with visibility mapping |
-| `cfree_cg_data_begin` | Done | Selects section (data/rodata/bss/tdata), aligns, defines symbol |
-| `cfree_cg_data_bytes` | Done | `obj_write` to current section |
-| `cfree_cg_data_zero` | Done | Writes zero-padded bytes to section in 64-byte chunks |
-| `cfree_cg_data_symbol` | Done | Emits reloc via `obj_reloc` (ABS32/ABS64/PC32/PC64) |
-| `cfree_cg_data_end` | Done | Resets data state |
-| `cfree_cg_tail_call` | Done | Same as `cfree_cg_call` but with `CG_CALL_TAIL` flag |
-| `cfree_cg_continue_true` | Done | Pops condition, `cmp_branch(CMP_NE, ...)` to continue label |
-| `cfree_cg_continue_false` | Done | Pops condition, `cmp_branch(CMP_EQ, ...)` to continue label |
-| `cfree_cg_intrinsic` | Done | Maps public intrinsic ids to `CGTarget.intrinsic`; supports scalar result and overflow two-result forms |
-| `cfree_cg_atomic_load/store/rmw/cmpxchg/fence` | Done | Derives pointee memory access, sets `MF_ATOMIC`, maps op/order enums, dispatches to target atomics |
-| `cfree_cg_inline_asm` | Done | Converts public operands to `AsmConstraint`, binds inputs/outputs/clobbers, dispatches to `CGTarget.asm_block` |
-
-Struct additions to `CfreeCg`:
-
-- `rodata_counter` — unique names for anonymous rodata symbols
-- `data_sec`, `data_base`, `data_size` — tracks current data definition
-
-Helper additions:
-
-- `api_map_vis()` — maps `CfreeCgVisibility` to `SymVis`
-- `api_data_sym_kind()` — maps decl attrs to `SK_OBJ` / `SK_TLS`
-
-Bug fix:
-
-- `cfree_cg_func_end` now resets `g->nscopes = 0` (was leaking scope state across functions)
-
-## Toy Language Additions (`lang/toy/toy.c`)
-
-### New tokens
-
-`TOK_VAR`, `TOK_TAIL`, `TOK_PIPE` (`|`), `TOK_CARET` (`^`), `TOK_TILDE` (`~`), `TOK_SHL` (`<<`), `TOK_SHR` (`>>`)
-
-### New expression precedence levels (C-standard order)
-
-- shift: `<<`, `>>` -> `CFREE_CG_SHL`, `CFREE_CG_SHR_S`
-- bitwise AND: `&` -> `CFREE_CG_AND`
-- bitwise XOR: `^` -> `CFREE_CG_XOR`
-- bitwise OR: `|` -> `CFREE_CG_OR`
-- unary `~` -> `CFREE_CG_BNOT`
-
-### Global variables
-
-`let` = immutable, `var` = mutable:
-
-- `ToyGlobal` struct, `ToyGlobal globals[]` in parser
-- `toy_find_global()` lookup
-- Primary expression: globals resolved via `cfree_cg_push_symbol` + `cfree_cg_load`
-- Assignment: mutable globals stored via `cfree_cg_push_symbol` + `cfree_cg_store`
-- Global `let`/`var` declarations emit `cfree_cg_data_begin/zero/end`
-- Constant global initializers now emit bytes directly.
-- Address global initializers (`&name`) emit `cfree_cg_data_symbol`.
-
-### CG API test builtins
-
-Toy now includes small builtins dedicated to public CG API coverage:
-
-- `typecheck()` covers type constructors and queries.
-- `byteconst()` covers `push_bytes + load`.
-- `alloca`, `index`, `memset`, `memcpy` cover dynamic allocation and memory helpers.
-- `atomic_load`, `atomic_store`, `atomic_add`, `atomic_sub`, `atomic_cas_ok`, `fence` cover atomics.
-- `popcount`, `ctz`, `clz`, `bswap`, `expect` cover public intrinsic dispatch.
-- `fieldtest()` covers record construction, layout queries, and `field_addr`.
-- `asmnop()` covers public inline asm dispatch on the AArch64 toy test target.
-- `return tail f(...)` covers `cfree_cg_tail_call`.
-
-### if/else fix
-
-Switched from `cfree_cg_scope_begin/break_false/break/scope_end` to raw
-`cfree_cg_label_new/branch_false/jump/label_place` — the scope model cannot
-express "jump to else" vs "jump to end" with distinct targets.
-
-## Test Results
-
-All passing via `cfree run`:
-
-- Basic arithmetic, function calls, recursion (fib)
-- while loops, break/continue
-- `&&`, `||` short-circuit (via scopes with `break_true`/`break_false`)
-- Bitwise `&`, `|`, `^`, `~`, `<<`, `>>`
-- Pointer deref (`*p`), address-of (`&var`)
-- Multi-function programs with if/else, local vars, params
-- GCD (Euclidean algorithm), sum-to-N, is_even
-
-## Fixed: API stack/register ownership bugs
-
-The AArch64 comparison-as-value failure was fixed without adding another
-record kind; the CG API remains struct-only for records.
-
-Changes made:
-
-- `cfree_cg_store` again follows the public contract: stack is
- `[... lvalue, rvalue] -> [rvalue]`.
-- `cfree_cg_push_local` now remembers the declared slot type from
- `local_slot` / `param_slot`, so loads and stores use the correct ABI size.
-- `cfree_cg_dup` now duplicates register-owned values into a fresh register
- and pins the source during allocation, avoiding double-free/stale-register
- ownership.
-- Spill/reload now uses the type of the owned register, including pointer-sized
- storage for `OPK_INDIRECT` bases and 16-byte FP spill slots.
-- In-flight call argument spill cleanup no longer returns aggregate lvalue
- slots to the scalar spill pool.
-- `cfree_cg_push_bytes` pushes a typed rodata lvalue, matching the intended
- `push_bytes + load` materialization flow.
-- `cfree_cg_field_addr` now pushes a pointer to the selected field type instead
- of preserving the base record-pointer type.
-- `cfree_cg_store` accepts pointer rvalues as store destinations by converting
- them to indirect lvalues.
-- `cfree_cg_tail_call` marks return storage as a void placeholder so targets do
- not try to copy a return value after a tail-call-shaped dispatch.
-
-Validation:
+# CG API Implementation & Toy Language — Status And Plan
+
+This document separates the current implementation state from the target design
+plan. The current-status section describes what exists today, including behavior
+that the plan intentionally changes.
+
+## Current Status
+
+### Public CG API implementation
+
+All previously stubbed public CG API functions in `src/api/cg.c` have concrete
+implementations.
+
+| Function | Current behavior |
+|---|---|
+| `cfree_cg_push_float` | Builds `ConstBytes` union and delegates to `CGTarget.load_const`. |
+| `cfree_cg_push_bytes` | Emits anonymous bytes to `.rodata` and currently pushes a typed rodata lvalue. |
+| `cfree_cg_memcpy` | Pops dst/src addresses, forces them to regs, calls `CGTarget.copy_bytes`. |
+| `cfree_cg_memset` | Pops dst address, calls `CGTarget.set_bytes` with an immediate byte value. |
+| `cfree_cg_index` | Pops `[base, index]`, computes `base + offset + index * elemsz`; the result is currently an address rather than an lvalue. |
+| `cfree_cg_field_addr` | Pops base pointer, looks up field offset via `abi_record_layout`, emits field address. |
+| `cfree_cg_alloca` | Pops size and calls `CGTarget.alloca_`. |
+| `cfree_cg_va_start` | Pops `&ap`, forces reg, calls `CGTarget.va_start_`. |
+| `cfree_cg_va_arg` | Pops `&ap`, allocs dst reg, calls `CGTarget.va_arg_`. |
+| `cfree_cg_va_end` | Pops `&ap`, forces reg, calls `CGTarget.va_end_`. |
+| `cfree_cg_va_copy` | Pops `&dst, &src`, forces regs, calls `CGTarget.va_copy_`. |
+| `cfree_cg_data_decl` | Creates symbol via `obj_symbol_ex` with visibility mapping. |
+| `cfree_cg_data_begin` | Selects section, aligns it, and defines symbol. |
+| `cfree_cg_data_bytes` | Writes bytes with `obj_write`. |
+| `cfree_cg_data_zero` | Writes zero bytes to the section in 64-byte chunks. |
+| `cfree_cg_data_symbol` | Emits ABS32/ABS64/PC32/PC64 relocations with `obj_reloc`. |
+| `cfree_cg_data_end` | Resets data-definition state. |
+| `cfree_cg_tail_call` | Dispatches like `cfree_cg_call` with `CG_CALL_TAIL`, using a void return placeholder. |
+| `cfree_cg_continue_true` | Pops condition, emits `cmp_branch(CMP_NE, ...)` to continue label. |
+| `cfree_cg_continue_false` | Pops condition, emits `cmp_branch(CMP_EQ, ...)` to continue label. |
+| `cfree_cg_intrinsic` | Maps public intrinsic ids to `CGTarget.intrinsic`; supports scalar and overflow results. |
+| `cfree_cg_atomic_load/store/rmw/cmpxchg/fence` | Derives pointee memory access, marks atomic, maps enums, dispatches to target atomics. |
+| `cfree_cg_inline_asm` | Converts public operands to `AsmConstraint`, binds inputs/outputs/clobbers, dispatches to `CGTarget.asm_block`. |
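
As a rough illustration of the address arithmetic listed for `cfree_cg_index`, the sketch below computes `base + offset + index * elemsz` on plain integers. The helper name `elem_addr` is hypothetical and not part of the cfree API, and the meaning of `offset` (a constant byte displacement) is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a cfree function: the byte address of element
 * `index` in an array that starts `offset` bytes past `base`. */
static uintptr_t elem_addr(uintptr_t base, size_t offset, size_t index,
                           size_t elemsz) {
    assert(elemsz > 0); /* zero-sized elements have no meaningful index */
    return base + offset + index * elemsz;
}
```

For a C array `int arr[8]`, `elem_addr((uintptr_t)arr, 0, 3, sizeof(int))` yields the same address as `&arr[3]`.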
+
+Current `CfreeCg` state additions:
+
+- `rodata_counter` for anonymous rodata symbols.
+- `data_sec`, `data_base`, and `data_size` for current data definition.
+
+Current helper additions:
+
+- `api_map_vis()` maps `CfreeCgVisibility` to `SymVis`.
+- `api_data_sym_kind()` maps declaration attrs to `SK_OBJ` / `SK_TLS`.
+
+Current bug fix:
+
+- `cfree_cg_func_end` resets `g->nscopes` to 0, which fixed scope state leaking
+ across functions. In-function scope lifetime cleanup remains open and is
+ covered by the plan.
+
+### Toy frontend status
+
+Toy currently has:
+
+- `let` immutable globals and `var` mutable globals.
+- Local variables, parameters, function calls, recursion, and pointer
+ address/deref syntax.
+- Arithmetic, comparisons, bitwise operators, shifts, unary `~`, `&&`, and
+ `||`.
+- `while`, `break`, `continue`, `if` / `else`, and `return tail f(...)`.
+- Dedicated CG API coverage builtins:
+ - `typecheck()`
+ - `byteconst()`
+ - `alloca`, `index`, `memset`, `memcpy`
+ - `atomic_load`, `atomic_store`, `atomic_add`, `atomic_sub`,
+ `atomic_cas_ok`, `fence`
+ - `popcount`, `ctz`, `clz`, `bswap`, `expect`
+ - `fieldtest()`
+ - `asmnop()`
+
+Current toy lowering has several planned changes:
+
+- Globals currently use `push_symbol + load/store`. The target model will use
+ `push_symbol + indirect + load/store`.
+- `byteconst()` currently covers `push_bytes + load`. The target model will use
+ `push_bytes + indirect + load`.
+- `fieldtest()` currently covers `cfree_cg_field_addr`. The target API will
+ replace that with `cfree_cg_field`.
+- `asmnop()` is AArch64-specific. The target toy surface will replace it with
+ general inline asm plus target-property selection.
+- Toy currently lowers `if` / `else` with raw labels because the nested-scope
+ idiom was not obvious. The target plan keeps scopes valid and adds inline
+ helpers for the common pattern.
+
+### Current validation
+
+Validated current behavior:
- `make lib`
- `make bin`
@@ -136,37 +96,171 @@ Validation:
- `make test-cg-binder`
- `make test-toy` — 36 pass, 0 fail
- `make test-cg` — 1573 pass, 0 fail, 0 skip
-- Toy smoke tests for store and short-circuit `&&` / `||`
-- Toy CG API cases:
- - `15_cg_api_types_bytes_globals`
- - `16_cg_api_alloca_memory_index`
- - `17_cg_api_atomics_intrinsics`
- - `18_cg_api_field_tail`
-- AArch64 object dump for `return 1 != 2` materializes the compare result in a
- real register and returns it through `x0`.
+
+Toy CG API cases currently include:
+
+- `15_cg_api_types_bytes_globals`
+- `16_cg_api_alloca_memory_index`
+- `17_cg_api_atomics_intrinsics`
+- `18_cg_api_field_tail`
+
+The AArch64 comparison-as-value failure was fixed. The value-producing compare
+path now materializes the result in a real register and returns it through `x0`.
Note: branch lowering may still disassemble `cmp xN, #0` as `subs sp, xN, #0`
because register 31 is the architectural zero-register destination for the
-flag-setting compare encoding. The fixed failure was the value-producing
-comparison path losing its destination register.
-
-## Remaining TODOs
-
-1. Add public CG API variadic coverage. The wrappers are implemented, but toy
- still lacks a way to name the target ABI `va_list` type.
-2. Add negative/error tests for public API misuse:
- - stack underflow
- - invalid type ids
- - invalid field indexes / wrong record base
- - unsupported data relocation widths
-3. Run targeted cross-arch validation for the public CG API and toy frontend
- on `aa64`, `x64`, and `rv64`, especially ABI-sensitive paths such as
- variadics, tail calls, atomics, and inline asm.
-4. Tighten and document API semantics:
- - whether `cfree_cg_push_symbol` always pushes an lvalue or can model
- GOT/PLT/TLS reference forms directly
- - expression-scope value semantics, since current target scope support is
- label-oriented rather than phi-like
- - whether qualified type ids should be interned or remain fresh source
- identities
-5. Write comprehensive `demo.toy` that exercises all features.
+flag-setting compare encoding. That is a disassembly alias concern, not the
+comparison-value bug.
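
The alias can be seen directly in the instruction bits. The sketch below assumes the standard A64 `SUBS (immediate)` layout for the 64-bit, unshifted form; the helper names are hypothetical and are not cfree functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode 64-bit SUBS (immediate) with an unshifted
 * imm12. Field layout: sf=1, op=1, S=1, 100010, sh=0, imm12, Rn, Rd. */
static uint32_t a64_subs_imm64(unsigned rd, unsigned rn, unsigned imm12) {
    assert(rd < 32 && rn < 32 && imm12 < 4096);
    return 0xF1000000u | ((uint32_t)imm12 << 10) | ((uint32_t)rn << 5) | rd;
}

/* CMP Xn, #imm is SUBS with Rd = 31. In this flag-setting form, register
 * number 31 in the Rd field names the zero register. */
static uint32_t a64_cmp_imm64(unsigned rn, unsigned imm12) {
    return a64_subs_imm64(31, rn, imm12);
}
```

Here `a64_cmp_imm64(3, 0)` produces `0xF100007F`, the same word as `subs xzr, x3, #0`. A disassembler that prints `sp` for the destination is applying the register-31 convention of the non-flag-setting add/sub immediate forms.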
+
+## Target Design Decisions
+
+### Value categories
+
+The public CG API will use explicit value-category transitions.
+
+- `cfree_cg_push_symbol` is address-producing. It pushes a pointer/address value
+ for the requested symbol reference form. GOT/PLT/TLS forms are
+ address-generation models, not lvalue categories.
+- `cfree_cg_push_bytes` is address-producing. It pushes a pointer/address to the
+ first byte of anonymous immutable bytes.
+- Add `cfree_cg_indirect(CfreeCg*)`. It converts TOS from pointer rvalue `*T`
+ to an lvalue of `T`, rejecting non-pointers and `void *`.
+- Frontends that need a different pointee type should call
+ `cfree_cg_convert` to the desired pointer type before `cfree_cg_indirect`.
+- `cfree_cg_load` converts lvalue to rvalue.
+- `cfree_cg_addr` converts lvalue to pointer rvalue.
+- `cfree_cg_store` becomes statement-like: `[lvalue, value] -> []`.
+ Frontends that need assignment-expression semantics must preserve the value
+ explicitly, usually with `cfree_cg_dup`.
+- `cfree_cg_dup` duplicates TOS while preserving value category. Rvalue
+ duplicates must have independent ownership where needed; lvalue duplicates
+ are two references to the same storage.
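
The transitions above can be modeled as a small state machine over a stack of categories. This is a toy sketch only: it tracks categories, not types, registers, or emitted code, and every name in it (`CatStack`, `model_*`) is hypothetical rather than taken from `src/api/cg.c`.

```c
#include <assert.h>
#include <stdbool.h>

enum { STACK_MAX = 16 };

/* Pointer rvalues are kept distinct so `indirect` can reject non-pointers. */
typedef enum { CAT_RVALUE, CAT_POINTER, CAT_LVALUE } Cat;
typedef struct { Cat items[STACK_MAX]; int n; } CatStack;

static void push_cat(CatStack *s, Cat c) {
    assert(s->n < STACK_MAX);
    s->items[s->n++] = c;
}

/* Symbol and blob materialization are address-producing. */
static void model_push_symbol(CatStack *s) { push_cat(s, CAT_POINTER); }
static void model_push_bytes(CatStack *s) { push_cat(s, CAT_POINTER); }

/* Rewrites TOS from category `from` to `to`; fails without changes otherwise. */
static bool retag_top(CatStack *s, Cat from, Cat to) {
    if (s->n < 1 || s->items[s->n - 1] != from) return false;
    s->items[s->n - 1] = to;
    return true;
}

/* pointer rvalue -> lvalue */
static bool model_indirect(CatStack *s) { return retag_top(s, CAT_POINTER, CAT_LVALUE); }
/* lvalue -> rvalue (the model ignores whether the loaded value is a pointer) */
static bool model_load(CatStack *s) { return retag_top(s, CAT_LVALUE, CAT_RVALUE); }
/* lvalue -> pointer rvalue */
static bool model_addr(CatStack *s) { return retag_top(s, CAT_LVALUE, CAT_POINTER); }

/* [lvalue, value] -> []: statement-like, nothing is pushed back. */
static bool model_store(CatStack *s) {
    if (s->n < 2) return false;
    if (s->items[s->n - 1] == CAT_LVALUE) return false; /* value must be an rvalue */
    if (s->items[s->n - 2] != CAT_LVALUE) return false; /* no implicit pointer destination */
    s->n -= 2;
    return true;
}

/* Duplicates TOS and preserves its category. */
static bool model_dup(CatStack *s) {
    if (s->n < 1 || s->n >= STACK_MAX) return false;
    s->items[s->n] = s->items[s->n - 1];
    s->n++;
    return true;
}
```

In this model a read-modify-write of a global is `push_symbol`, `indirect`, `dup`, `load`, then `store`, which leaves the stack empty. A store whose destination is still a pointer rvalue is rejected.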
+
+### Data selection
+
+Data selectors are lvalue-producing.
+
+- `cfree_cg_index` selects an element lvalue.
+- Replace `cfree_cg_field_addr` with `cfree_cg_field`, which selects a field
+ lvalue.
+- Callers that need a field address use `cfree_cg_field` followed by
+ `cfree_cg_addr`.
+- Raw address-producing primitives are reserved for symbol/blob materialization,
+ explicit `addr`, and explicit pointer arithmetic.
+
+### Calls and control flow
+
+- `cfree_cg_tail_call` is a function terminator. It consumes callee and args,
+ emits a tail dispatch, pushes no result, and terminates the current
+ control-flow path. It is valid only in tail position with a return-compatible
+ callee.
+- Public scopes are stack-disciplined. Every `scope_begin` must pair with
+ `scope_end` in LIFO order, and `scope_end` deactivates/pops the top active
+ scope. `break` / `continue` reject handles that are not currently active.
+- Statement scopes can act as named exits. Breaks may target an outer active
+ scope.
+- Expression-valued scopes stay in the public API. All result-producing exits
+ must provide a value of the declared result type, and the implementation must
+ reconcile exits into a canonical result location using register copies or a
+ stack slot as needed.
+- Add public static inline helpers for the common `if` / `else` pattern. The
+ helpers should compile down to existing scope operations rather than adding a
+ second structured-control mechanism.
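
The LIFO rule for scopes can be sketched with a small handle stack. This is a minimal model under assumed names (`ScopeStack`, `model_scope_*`); the real scope handles and diagnostics in the CG API may look different.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAX_SCOPES = 16 };

/* Active scope ids, innermost last. Ids start at 1, so 0 is never valid. */
typedef struct {
    int active[MAX_SCOPES];
    int depth;
    int next_id;
} ScopeStack;

/* Opens a new scope and returns its handle. */
static int model_scope_begin(ScopeStack *s) {
    assert(s->depth < MAX_SCOPES);
    int id = ++s->next_id;
    s->active[s->depth++] = id;
    return id;
}

/* A handle is active while it is anywhere on the stack. */
static bool model_scope_is_active(const ScopeStack *s, int id) {
    for (int i = 0; i < s->depth; i++) {
        if (s->active[i] == id) return true;
    }
    return false;
}

/* scope_end succeeds only for the top active scope (LIFO order). */
static bool model_scope_end(ScopeStack *s, int id) {
    if (s->depth == 0 || s->active[s->depth - 1] != id) return false;
    s->depth--;
    return true;
}

/* A break may target any active scope, including an outer one. */
static bool model_scope_break(const ScopeStack *s, int id) {
    return model_scope_is_active(s, id);
}
```

With an outer and an inner scope open, breaking to the outer scope is accepted, ending the outer scope first is rejected, and a break to the inner scope after it has ended is rejected as stale.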
+
+### Types
+
+- Expose target ABI `va_list` as a public CG builtin type:
+ `CFREE_CG_BUILTIN_VA_LIST` plus `CfreeCgBuiltinTypes.va_list`, backed by
+ `abi_va_list_type`.
+- Intern semantic structural type constructors, including qualified types.
+- Pointer, array, qualified, and function constructors return stable ids for
+ the same shape.
+- Aliases and nominal record/enum constructors remain source-identity
+ producing, unless a future scoped nominal-type key is introduced.
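
A minimal sketch of the difference between interned and source-identity constructors, assuming a flat table and a linear search. All names here are hypothetical; a real implementation would more likely use a hash table keyed by shape.

```c
#include <assert.h>
#include <stdint.h>

enum { MAX_TYPES = 64 };

typedef enum { K_BUILTIN, K_QUALIFIED, K_RECORD } Kind;
typedef struct { Kind kind; uint32_t base; uint32_t quals; } TypeEntry;
typedef struct { TypeEntry entries[MAX_TYPES]; uint32_t count; } TypeTable;

/* Always appends a new entry and returns its id. */
static uint32_t type_fresh(TypeTable *t, Kind kind, uint32_t base,
                           uint32_t quals) {
    assert(t->count < MAX_TYPES);
    t->entries[t->count] = (TypeEntry){kind, base, quals};
    return t->count++;
}

/* Structural constructor: the same (base, quals) shape returns the same id. */
static uint32_t type_qualified(TypeTable *t, uint32_t base, uint32_t quals) {
    for (uint32_t i = 0; i < t->count; i++) {
        const TypeEntry *e = &t->entries[i];
        if (e->kind == K_QUALIFIED && e->base == base && e->quals == quals) {
            return i;
        }
    }
    return type_fresh(t, K_QUALIFIED, base, quals);
}

/* Nominal constructor: every call yields a distinct source identity. */
static uint32_t type_record(TypeTable *t) {
    return type_fresh(t, K_RECORD, 0, 0);
}
```

Two calls to `type_qualified` with the same base and qualifier bits return one id, while two calls to `type_record` return different ids.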
+
+### Toy language direction
+
+- Teach toy to parse and use `va_list`, then add toy coverage for
+ `cfree_cg_va_start`, `cfree_cg_va_arg`, `cfree_cg_va_end`, and
+ `cfree_cg_va_copy`.
+- Replace `asmnop()` with a general toy inline-asm surface.
+- Add a small compile-time selector/switch mechanism that can branch on target
+ properties such as `builtin.arch`, so toy source can choose target-specific
+ asm templates.
+- Add toy error cases with expected diagnostic-message matching.
+- Keep focused regression cases in `test/toy/cases/*`.
+- Add a comprehensive human-readable demo at `test/toy/demo.toy`.
+
+### Cross-arch validation
+
+- Add toy cross-arch tests that compile with `cfree cc -target` for Linux cross
+ targets, link with `cfree ld`, then execute via podman/qemu.
+- Cover `aa64`, `x64`, and `rv64`.
+- Do not add a cross-arch JIT path.
+- Allow environment-dependent skips.
+
+## Implementation Checklist
+
+1. Update public CG docs and declarations.
+ - Add `cfree_cg_indirect`.
+ - Replace `cfree_cg_field_addr` with `cfree_cg_field`.
+ - Update `push_symbol`, `push_bytes`, `store`, `dup`, `tail_call`, scope,
+ and type-constructor documentation.
+ - Add `CFREE_CG_BUILTIN_VA_LIST` and `CfreeCgBuiltinTypes.va_list`.
+
+2. Implement value-category changes in `src/api/cg.c`.
+ - Make `push_symbol` address-producing.
+ - Make `push_bytes` address-producing.
+ - Implement `cfree_cg_indirect`.
+ - Remove implicit pointer-rvalue store behavior.
+ - Make `store` consume its value without pushing a result.
+ - Preserve and test `dup` category/ownership semantics.
+
+3. Implement selector changes.
+ - Make `cfree_cg_index` lvalue-producing.
+ - Replace `cfree_cg_field_addr` with `cfree_cg_field`.
+ - Update API users and tests to call `addr` when an address is required.
+
+4. Tighten scope and tail-call behavior.
+ - Enforce LIFO scope lifetime and stale-handle diagnostics.
+ - Implement expression-scope canonical result placement.
+ - Add static inline `if` / `else` helpers.
+ - Treat `cfree_cg_tail_call` as a terminator with no result.
+
+5. Implement type changes.
+ - Add the public `va_list` builtin id and resolution.
+ - Intern qualified types by `(base, quals)`.
+ - Keep aliases and nominal record/enum constructors source-identity
+ producing.
+
+6. Add direct API and misuse tests.
+ - Keep return-value checks in `test/api/cg_type_test.c` or a sibling API
+ test.
+ - Add a focused panic-catching CG API misuse harness for stack underflow,
+ stale/non-LIFO scopes, invalid field indexes/base types, invalid
+ `indirect`, and unsupported data relocation widths.
+
+7. Update toy lowering and positive tests.
+ - Use `push_symbol + indirect + load/store` for global loads/stores.
+ - Use `push_bytes + indirect + load` for byte constants.
+ - Use `cfree_cg_field` instead of `field_addr`.
+ - Update assignments for statement-like `store`.
+ - Update `return tail f(...)` for terminator semantics.
+ - Move `if` / `else` lowering to the new inline helpers if practical.
+
+8. Add toy variadics, inline asm, and error tests.
+ - Add `va_list` syntax/usage and variadic coverage.
+ - Replace `asmnop()` with general inline asm plus `builtin.arch` selection.
+ - Extend `test/toy/run.sh` with an error-case mode if needed.
+ - Add diagnostic-matching toy error cases.
+
+9. Add cross-arch toy validation.
+ - Add an opt-in toy cc+ld+exec mode for `aa64`, `x64`, and `rv64`.
+ - Use `cfree cc -target`, `cfree ld`, and existing podman/qemu execution
+ helpers where possible.
+ - Do not add a cross-arch JIT path.
+
+10. Add `test/toy/demo.toy`.
+ - Keep it readable and comprehensive.
+ - Exercise toy syntax, globals, control flow, calls, memory helpers,
+ atomics, variadics, tail calls, inline asm, and public CG API builtins.