kit

git clone https://git.ryansepassi.com/git/kit.git

commit 9e2ff3eaf2b7f21a6709bcfcb946cac6592372c4
parent fed0b77d70da076c5963cb78b8aa7fc8e45cd3f1
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Wed, 13 May 2026 04:08:58 -0700

docs: clarify CG API status and plan

Diffstat:
M doc/cg-api-status.md | 418 ++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------
1 file changed, 256 insertions(+), 162 deletions(-)

diff --git a/doc/cg-api-status.md b/doc/cg-api-status.md
@@ -1,134 +1,94 @@
-# CG API Implementation & Toy Language — Status
-
-## Completed Implementations (`src/api/cg.c`)
-
-All previously-stubbed public CG API functions now have real implementations:
-
-| Function | Status | Mechanism |
-|---|---|---|
-| `cfree_cg_push_float` | Done | Builds `ConstBytes` union, delegates to `CGTarget.load_const` |
-| `cfree_cg_push_bytes` | Done | Emits to `.rodata` section via `ObjBuilder`, pushes `OPK_GLOBAL` pointer |
-| `cfree_cg_memcpy` | Done | Pops dst/src addrs, forces to regs, calls `CGTarget.copy_bytes` |
-| `cfree_cg_memset` | Done | Pops dst addr, calls `CGTarget.set_bytes` with imm byte value |
-| `cfree_cg_index` | Done | Pops `[base, index]`, computes `base + offset + index * elemsz` |
-| `cfree_cg_field_addr` | Done | Pops base ptr, looks up field offset via `abi_record_layout`, emits add |
-| `cfree_cg_alloca` | Done | Pops size, calls `CGTarget.alloca_` |
-| `cfree_cg_va_start` | Done | Pops `&ap`, forces reg, calls `CGTarget.va_start_` |
-| `cfree_cg_va_arg` | Done | Pops `&ap`, allocs dst reg, calls `CGTarget.va_arg_` |
-| `cfree_cg_va_end` | Done | Pops `&ap`, forces reg, calls `CGTarget.va_end_` |
-| `cfree_cg_va_copy` | Done | Pops `&dst, &src`, forces regs, calls `CGTarget.va_copy_` |
-| `cfree_cg_data_decl` | Done | Creates symbol via `obj_symbol_ex` with visibility mapping |
-| `cfree_cg_data_begin` | Done | Selects section (data/rodata/bss/tdata), aligns, defines symbol |
-| `cfree_cg_data_bytes` | Done | `obj_write` to current section |
-| `cfree_cg_data_zero` | Done | Writes zero-padded bytes to section in 64-byte chunks |
-| `cfree_cg_data_symbol` | Done | Emits reloc via `obj_reloc` (ABS32/ABS64/PC32/PC64) |
-| `cfree_cg_data_end` | Done | Resets data state |
-| `cfree_cg_tail_call` | Done | Same as `cfree_cg_call` but with `CG_CALL_TAIL` flag |
-| `cfree_cg_continue_true` | Done | Pops condition, `cmp_branch(CMP_NE, ...)` to continue label |
-| `cfree_cg_continue_false` | Done | Pops condition, `cmp_branch(CMP_EQ, ...)` to continue label |
-| `cfree_cg_intrinsic` | Done | Maps public intrinsic ids to `CGTarget.intrinsic`; supports scalar result and overflow two-result forms |
-| `cfree_cg_atomic_load/store/rmw/cmpxchg/fence` | Done | Derives pointee memory access, sets `MF_ATOMIC`, maps op/order enums, dispatches to target atomics |
-| `cfree_cg_inline_asm` | Done | Converts public operands to `AsmConstraint`, binds inputs/outputs/clobbers, dispatches to `CGTarget.asm_block` |
-
-Struct additions to `CfreeCg`:
-
-- `rodata_counter` — unique names for anonymous rodata symbols
-- `data_sec`, `data_base`, `data_size` — tracks current data definition
-
-Helper additions:
-
-- `api_map_vis()` — maps `CfreeCgVisibility` to `SymVis`
-- `api_data_sym_kind()` — maps decl attrs to `SK_OBJ` / `SK_TLS`
-
-Bug fix:
-
-- `cfree_cg_func_end` now resets `g->nscopes = 0` (was leaking scope state across functions)
-
-## Toy Language Additions (`lang/toy/toy.c`)
-
-### New tokens
-
-`TOK_VAR`, `TOK_TAIL`, `TOK_PIPE` (`|`), `TOK_CARET` (`^`), `TOK_TILDE` (`~`), `TOK_SHL` (`<<`), `TOK_SHR` (`>>`)
-
-### New expression precedence levels (C-standard order)
-
-- shift: `<<`, `>>` -> `CFREE_CG_SHL`, `CFREE_CG_SHR_S`
-- bitwise AND: `&` -> `CFREE_CG_AND`
-- bitwise XOR: `^` -> `CFREE_CG_XOR`
-- bitwise OR: `|` -> `CFREE_CG_OR`
-- unary `~` -> `CFREE_CG_BNOT`
-
-### Global variables
-
-`let` = immutable, `var` = mutable:
-
-- `ToyGlobal` struct, `ToyGlobal globals[]` in parser
-- `toy_find_global()` lookup
-- Primary expression: globals resolved via `cfree_cg_push_symbol` + `cfree_cg_load`
-- Assignment: mutable globals stored via `cfree_cg_push_symbol` + `cfree_cg_store`
-- Global `let`/`var` declarations emit `cfree_cg_data_begin/zero/end`
-- Constant global initializers now emit bytes directly.
-- Address global initializers (`&name`) emit `cfree_cg_data_symbol`.
-
-### CG API test builtins
-
-Toy now includes small builtins dedicated to public CG API coverage:
-
-- `typecheck()` covers type constructors and queries.
-- `byteconst()` covers `push_bytes + load`.
-- `alloca`, `index`, `memset`, `memcpy` cover dynamic allocation and memory helpers.
-- `atomic_load`, `atomic_store`, `atomic_add`, `atomic_sub`, `atomic_cas_ok`, `fence` cover atomics.
-- `popcount`, `ctz`, `clz`, `bswap`, `expect` cover public intrinsic dispatch.
-- `fieldtest()` covers record construction, layout queries, and `field_addr`.
-- `asmnop()` covers public inline asm dispatch on the AArch64 toy test target.
-- `return tail f(...)` covers `cfree_cg_tail_call`.
-
-### if/else fix
-
-Switched from `cfree_cg_scope_begin/break_false/break/scope_end` to raw
-`cfree_cg_label_new/branch_false/jump/label_place` — the scope model cannot
-express "jump to else" vs "jump to end" with distinct targets.
-
-## Test Results
-
-All passing via `cfree run`:
-
-- Basic arithmetic, function calls, recursion (fib)
-- while loops, break/continue
-- `&&`, `||` short-circuit (via scopes with `break_true`/`break_false`)
-- Bitwise `&`, `|`, `^`, `~`, `<<`, `>>`
-- Pointer deref (`*p`), address-of (`&var`)
-- Multi-function programs with if/else, local vars, params
-- GCD (Euclidean algorithm), sum-to-N, is_even
-
-## Fixed: API stack/register ownership bugs
-
-The AArch64 comparison-as-value failure was fixed without adding another
-record kind; the CG API remains struct-only for records.
-
-Changes made:
-
-- `cfree_cg_store` again follows the public contract: stack is
-  `[... lvalue, rvalue] -> [rvalue]`.
-- `cfree_cg_push_local` now remembers the declared slot type from
-  `local_slot` / `param_slot`, so loads and stores use the correct ABI size.
-- `cfree_cg_dup` now duplicates register-owned values into a fresh register
-  and pins the source during allocation, avoiding double-free/stale-register
-  ownership.
-- Spill/reload now uses the type of the owned register, including pointer-sized
-  storage for `OPK_INDIRECT` bases and 16-byte FP spill slots.
-- In-flight call argument spill cleanup no longer returns aggregate lvalue
-  slots to the scalar spill pool.
-- `cfree_cg_push_bytes` pushes a typed rodata lvalue, matching the intended
-  `push_bytes + load` materialization flow.
-- `cfree_cg_field_addr` now pushes a pointer to the selected field type instead
-  of preserving the base record-pointer type.
-- `cfree_cg_store` accepts pointer rvalues as store destinations by converting
-  them to indirect lvalues.
-- `cfree_cg_tail_call` marks return storage as a void placeholder so targets do
-  not try to copy a return value after a tail-call-shaped dispatch.
-
-Validation:
+# CG API Implementation & Toy Language — Status And Plan
+
+This document separates the current implementation state from the target design
+plan. The current-status section describes what exists today, including behavior
+that the plan intentionally changes.
+
+## Current Status
+
+### Public CG API implementation
+
+All previously-stubbed public CG API functions in `src/api/cg.c` have concrete
+implementations.
+
+| Function | Current behavior |
+|---|---|
+| `cfree_cg_push_float` | Builds `ConstBytes` union and delegates to `CGTarget.load_const`. |
+| `cfree_cg_push_bytes` | Emits anonymous bytes to `.rodata` and currently pushes a typed rodata lvalue. |
+| `cfree_cg_memcpy` | Pops dst/src addresses, forces them to regs, calls `CGTarget.copy_bytes`. |
+| `cfree_cg_memset` | Pops dst address, calls `CGTarget.set_bytes` with an immediate byte value. |
+| `cfree_cg_index` | Pops `[base, index]`, computes `base + offset + index * elemsz`, currently address-like. |
+| `cfree_cg_field_addr` | Pops base pointer, looks up field offset via `abi_record_layout`, emits field address. |
+| `cfree_cg_alloca` | Pops size and calls `CGTarget.alloca_`. |
+| `cfree_cg_va_start` | Pops `&ap`, forces reg, calls `CGTarget.va_start_`. |
+| `cfree_cg_va_arg` | Pops `&ap`, allocs dst reg, calls `CGTarget.va_arg_`. |
+| `cfree_cg_va_end` | Pops `&ap`, forces reg, calls `CGTarget.va_end_`. |
+| `cfree_cg_va_copy` | Pops `&dst, &src`, forces regs, calls `CGTarget.va_copy_`. |
+| `cfree_cg_data_decl` | Creates symbol via `obj_symbol_ex` with visibility mapping. |
+| `cfree_cg_data_begin` | Selects section, aligns it, and defines symbol. |
+| `cfree_cg_data_bytes` | Writes bytes with `obj_write`. |
+| `cfree_cg_data_zero` | Writes zero-padded bytes to section in 64-byte chunks. |
+| `cfree_cg_data_symbol` | Emits ABS32/ABS64/PC32/PC64 relocations with `obj_reloc`. |
+| `cfree_cg_data_end` | Resets data-definition state. |
+| `cfree_cg_tail_call` | Dispatches like `cfree_cg_call` with `CG_CALL_TAIL`, using a void return placeholder. |
+| `cfree_cg_continue_true` | Pops condition, emits `cmp_branch(CMP_NE, ...)` to continue label. |
+| `cfree_cg_continue_false` | Pops condition, emits `cmp_branch(CMP_EQ, ...)` to continue label. |
+| `cfree_cg_intrinsic` | Maps public intrinsic ids to `CGTarget.intrinsic`; supports scalar and overflow results. |
+| `cfree_cg_atomic_load/store/rmw/cmpxchg/fence` | Derives pointee memory access, marks atomic, maps enums, dispatches to target atomics. |
+| `cfree_cg_inline_asm` | Converts public operands to `AsmConstraint`, binds inputs/outputs/clobbers, dispatches to `CGTarget.asm_block`. |
+
+Current `CfreeCg` state additions:
+
+- `rodata_counter` for anonymous rodata symbols.
+- `data_sec`, `data_base`, and `data_size` for current data definition.
+
+Current helper additions:
+
+- `api_map_vis()` maps `CfreeCgVisibility` to `SymVis`.
+- `api_data_sym_kind()` maps declaration attrs to `SK_OBJ` / `SK_TLS`.
+
+Current bug fix:
+
+- `cfree_cg_func_end` resets `g->nscopes = 0`, which fixed scope state leaking
+  across functions. In-function scope lifetime still needs cleanup in the plan.
+
+### Toy frontend status
+
+Toy currently has:
+
+- `let` immutable globals and `var` mutable globals.
+- Local variables, parameters, function calls, recursion, and pointer
+  address/deref syntax.
+- Arithmetic, comparisons, bitwise operators, shifts, unary `~`, `&&`, and
+  `||`.
+- `while`, `break`, `continue`, `if` / `else`, and `return tail f(...)`.
+- Dedicated CG API coverage builtins:
+  - `typecheck()`
+  - `byteconst()`
+  - `alloca`, `index`, `memset`, `memcpy`
+  - `atomic_load`, `atomic_store`, `atomic_add`, `atomic_sub`,
+    `atomic_cas_ok`, `fence`
+  - `popcount`, `ctz`, `clz`, `bswap`, `expect`
+  - `fieldtest()`
+  - `asmnop()`
+
+Current toy lowering has several planned changes:
+
+- Globals currently use `push_symbol + load/store`. The target model will use
+  `push_symbol + indirect + load/store`.
+- `byteconst()` currently covers `push_bytes + load`. The target model will use
+  `push_bytes + indirect + load`.
+- `fieldtest()` currently covers `cfree_cg_field_addr`. The target API will
+  replace that with `cfree_cg_field`.
+- `asmnop()` is AArch64-specific. The target toy surface will replace it with
+  general inline asm plus target-property selection.
+- Toy currently lowers `if` / `else` with raw labels because the nested-scope
+  idiom was not obvious. The target plan keeps scopes valid and adds inline
+  helpers for the common pattern.
+
+### Current validation
+
+Validated current behavior:
 
 - `make lib`
 - `make bin`
@@ -136,37 +96,171 @@ Validation:
 - `make test-cg-binder`
 - `make test-toy` — 36 pass, 0 fail
 - `make test-cg` — 1573 pass, 0 fail, 0 skip
-- Toy smoke tests for store and short-circuit `&&` / `||`
-- Toy CG API cases:
-  - `15_cg_api_types_bytes_globals`
-  - `16_cg_api_alloca_memory_index`
-  - `17_cg_api_atomics_intrinsics`
-  - `18_cg_api_field_tail`
-- AArch64 object dump for `return 1 != 2` materializes the compare result in a
-  real register and returns it through `x0`.
+
+Toy CG API cases currently include:
+
+- `15_cg_api_types_bytes_globals`
+- `16_cg_api_alloca_memory_index`
+- `17_cg_api_atomics_intrinsics`
+- `18_cg_api_field_tail`
+
+The AArch64 comparison-as-value failure was fixed. The value-producing compare
+path now materializes the result in a real register and returns it through `x0`.
 
 Note: branch lowering may still disassemble `cmp xN, #0` as `subs sp, xN, #0`
 because register 31 is the architectural zero-register destination for the
-flag-setting compare encoding. The fixed failure was the value-producing
-comparison path losing its destination register.
-
-## Remaining TODOs
-
-1. Add public CG API variadic coverage. The wrappers are implemented, but toy
-   still lacks a way to name the target ABI `va_list` type.
-2. Add negative/error tests for public API misuse:
-   - stack underflow
-   - invalid type ids
-   - invalid field indexes / wrong record base
-   - unsupported data relocation widths
-3. Run targeted cross-arch validation for the public CG API and toy frontend
-   on `aa64`, `x64`, and `rv64`, especially ABI-sensitive paths such as
-   variadics, tail calls, atomics, and inline asm.
-4. Tighten and document API semantics:
-   - whether `cfree_cg_push_symbol` always pushes an lvalue or can model
-     GOT/PLT/TLS reference forms directly
-   - expression-scope value semantics, since current target scope support is
-     label-oriented rather than phi-like
-   - whether qualified type ids should be interned or remain fresh source
-     identities
-5. Write comprehensive `demo.toy` that exercises all features.
+flag-setting compare encoding. That is a disassembly alias concern, not the
+comparison-value bug.
+
+## Target Design Decisions
+
+### Value categories
+
+The public CG API will use explicit value-category transitions.
+
+- `cfree_cg_push_symbol` is address-producing. It pushes a pointer/address value
+  for the requested symbol reference form. GOT/PLT/TLS forms are
+  address-generation models, not lvalue categories.
+- `cfree_cg_push_bytes` is address-producing. It pushes a pointer/address to the
+  first byte of anonymous immutable bytes.
+- Add `cfree_cg_indirect(CfreeCg*)`. It converts TOS from pointer rvalue `*T`
+  to an lvalue of `T`, rejecting non-pointers and `void *`.
+- Frontends that need a different pointee type should call
+  `cfree_cg_convert` to the desired pointer type before `cfree_cg_indirect`.
+- `cfree_cg_load` converts lvalue to rvalue.
+- `cfree_cg_addr` converts lvalue to pointer rvalue.
+- `cfree_cg_store` becomes statement-like: `[lvalue, value] -> []`.
+  Frontends that need assignment-expression semantics must preserve the value
+  explicitly, usually with `cfree_cg_dup`.
+- `cfree_cg_dup` duplicates TOS while preserving value category. Rvalue
+  duplicates must have independent ownership where needed; lvalue duplicates
+  are two references to the same storage.
+
+### Data selection
+
+Data selectors are lvalue-producing.
+
+- `cfree_cg_index` selects an element lvalue.
+- Replace `cfree_cg_field_addr` with `cfree_cg_field`, which selects a field
+  lvalue.
+- Callers that need a field address use `cfree_cg_field` followed by
+  `cfree_cg_addr`.
+- Raw address-producing primitives are reserved for symbol/blob materialization,
+  explicit `addr`, and explicit pointer arithmetic.
+
+### Calls and control flow
+
+- `cfree_cg_tail_call` is a function terminator. It consumes callee and args,
+  emits a tail dispatch, pushes no result, and terminates the current
+  control-flow path. It is valid only in tail position with a return-compatible
+  callee.
+- Public scopes are stack-disciplined. Every `scope_begin` must pair with
+  `scope_end` in LIFO order, and `scope_end` deactivates/pops the top active
+  scope. `break` / `continue` reject inactive or non-active handles.
+- Statement scopes can act as named exits. Breaks may target an outer active
+  scope.
+- Expression-valued scopes stay in the public API. All result-producing exits
+  must provide a value of the declared result type, and the implementation must
+  reconcile exits into a canonical result location using register copies or a
+  stack slot as needed.
+- Add public static inline helpers for the common `if` / `else` pattern. The
+  helpers should compile down to existing scope operations rather than adding a
+  second structured-control mechanism.
+
+### Types
+
+- Expose target ABI `va_list` as a public CG builtin type:
+  `CFREE_CG_BUILTIN_VA_LIST` plus `CfreeCgBuiltinTypes.va_list`, backed by
+  `abi_va_list_type`.
+- Intern semantic structural type constructors, including qualified types.
+- Pointer, array, qualified, and function constructors return stable ids for
+  the same shape.
+- Aliases and nominal record/enum constructors remain source-identity
+  producing, unless a future scoped nominal-type key is introduced.
+
+### Toy language direction
+
+- Teach toy to parse and use `va_list`, then add toy coverage for
+  `cfree_cg_va_start`, `cfree_cg_va_arg`, `cfree_cg_va_end`, and
+  `cfree_cg_va_copy`.
+- Replace `asmnop()` with a general toy inline-asm surface.
+- Add a small compile-time selector/switch mechanism that can branch on target
+  properties such as `builtin.arch`, so toy source can choose target-specific
+  asm templates.
+- Add toy error cases with expected diagnostic-message matching.
+- Keep focused regression cases in `test/toy/cases/*`.
+- Add a comprehensive human-readable demo at `test/toy/demo.toy`.
+
+### Cross-arch validation
+
+- Add toy cross-arch tests that compile with `cfree cc -target` for Linux cross
+  targets, link with `cfree ld`, then execute via podman/qemu.
+- Cover `aa64`, `x64`, and `rv64`.
+- Do not add a cross-arch JIT path.
+- Allow environment-dependent skips.
+
+## Implementation Checklist
+
+1. Update public CG docs and declarations.
+   - Add `cfree_cg_indirect`.
+   - Replace/remove `cfree_cg_field_addr` with `cfree_cg_field`.
+   - Update `push_symbol`, `push_bytes`, `store`, `dup`, `tail_call`, scope,
+     and type-constructor documentation.
+   - Add `CFREE_CG_BUILTIN_VA_LIST` and `CfreeCgBuiltinTypes.va_list`.
+
+2. Implement value-category changes in `src/api/cg.c`.
+   - Make `push_symbol` address-producing.
+   - Make `push_bytes` address-producing.
+   - Implement `cfree_cg_indirect`.
+   - Remove implicit pointer-rvalue store behavior.
+   - Make `store` consume its value without pushing a result.
+   - Preserve and test `dup` category/ownership semantics.
+
+3. Implement selector changes.
+   - Make `cfree_cg_index` lvalue-producing.
+   - Replace `cfree_cg_field_addr` with `cfree_cg_field`.
+   - Update API users and tests to call `addr` when an address is required.
+
+4. Tighten scope and tail-call behavior.
+   - Enforce LIFO scope lifetime and stale-handle diagnostics.
+   - Implement expression-scope canonical result placement.
+   - Add static inline `if` / `else` helpers.
+   - Treat `cfree_cg_tail_call` as a terminator with no result.
+
+5. Implement type changes.
+   - Add the public `va_list` builtin id and resolution.
+   - Intern qualified types by `(base, quals)`.
+   - Keep aliases and nominal record/enum constructors source-identity
+     producing.
+
+6. Add direct API and misuse tests.
+   - Keep return-value checks in `test/api/cg_type_test.c` or a sibling API
+     test.
+   - Add a focused panic-catching CG API misuse harness for stack underflow,
+     stale/non-LIFO scopes, invalid field indexes/base types, invalid
+     `indirect`, and unsupported data relocation widths.
+
+7. Update toy lowering and positive tests.
+   - Use `push_symbol + indirect` for global loads/stores.
+   - Use `push_bytes + indirect + load` for byte constants.
+   - Use `cfree_cg_field` instead of `field_addr`.
+   - Update assignments for statement-like `store`.
+   - Update `return tail f(...)` for terminator semantics.
+   - Move `if` / `else` lowering to the new inline helpers if practical.
+
+8. Add toy variadics, inline asm, and error tests.
+   - Add `va_list` syntax/usage and variadic coverage.
+   - Replace `asmnop()` with general inline asm plus `builtin.arch` selection.
+   - Extend `test/toy/run.sh` with an error-case mode if needed.
+   - Add diagnostic-matching toy error cases.
+
+9. Add cross-arch toy validation.
+   - Add an opt-in toy cc+ld+exec mode for `aa64`, `x64`, and `rv64`.
+   - Use `cfree cc -target`, `cfree ld`, and existing podman/qemu execution
+     helpers where possible.
+   - Do not add a cross-arch JIT path.
+
+10. Add `test/toy/demo.toy`.
+    - Keep it readable and comprehensive.
+    - Exercise toy syntax, globals, control flow, calls, memory helpers,
+      atomics, variadics, tail calls, inline asm, and public CG API builtins.