commit 35868a8761e3923ce2a866dc5bd07d79a44996d2
parent 571865addf5f159357059b659a46110f6388010d
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sat, 23 May 2026 10:06:01 -0700
doc: prune stale checklists and plans
Drop superseded checklists (C11 conformance/long-double, RV64/X64
parity, RT, frontend, locals, stage2, tailcall, toy rewrite, opt regs
plan), the old api-migration / builtins / cg-* design notes, and the
BUGS scratchpad. Live docs (OPT_PERF, OPT design, CG API status doc as
needed) remain.
Diffstat:
| D | doc/BUGS.md | | | 56 | -------------------------------------------------------- |
| D | doc/C11_CONFORMANCE_CHECKLIST.md | | | 297 | ------------------------------------------------------------------------------- |
| D | doc/C11_LONG_DOUBLE_CHECKLIST.md | | | 110 | ------------------------------------------------------------------------------- |
| D | doc/CTOOLCHAIN.md | | | 296 | ------------------------------------------------------------------------------- |
| D | doc/FRONTEND.md | | | 412 | ------------------------------------------------------------------------------- |
| D | doc/LANGS.md | | | 479 | ------------------------------------------------------------------------------- |
| D | doc/LOCALS.md | | | 127 | ------------------------------------------------------------------------------- |
| D | doc/OPT_REGS_CALL_PLAN.md | | | 592 | ------------------------------------------------------------------------------- |
| D | doc/RT_CFREERT_CHECKLIST.md | | | 113 | ------------------------------------------------------------------------------- |
| D | doc/RV64_PARITY_CHECKLIST.md | | | 252 | ------------------------------------------------------------------------------- |
| D | doc/STAGE2.md | | | 272 | ------------------------------------------------------------------------------- |
| D | doc/TAILCALL.md | | | 234 | ------------------------------------------------------------------------------- |
| D | doc/TOY_REWRITE_TASKS.md | | | 275 | ------------------------------------------------------------------------------- |
| D | doc/X64_PARITY_CHECKLIST.md | | | 389 | ------------------------------------------------------------------------------- |
| D | doc/api-migration.md | | | 304 | ------------------------------------------------------------------------------- |
| D | doc/builtins.md | | | 385 | ------------------------------------------------------------------------------- |
| D | doc/cg-api-status.md | | | 104 | ------------------------------------------------------------------------------- |
| D | doc/cg-ext.md | | | 618 | ------------------------------------------------------------------------------- |
| D | doc/cg-neutral-backend-plan.md | | | 286 | ------------------------------------------------------------------------------- |
| D | doc/cg-type-migration-plan.md | | | 157 | ------------------------------------------------------------------------------- |
20 files changed, 0 insertions(+), 5758 deletions(-)
diff --git a/doc/BUGS.md b/doc/BUGS.md
@@ -1,56 +0,0 @@
-Known bugs with red test cases (test-parse)
-
-Format as:
-
-```
-- [ ] <feature description>: <test case name>
-```
-
-- [x] pointer subtraction yields ptrdiff_t (assignable to a wider integer without a cast): `6_5_6_01_ptr_diff_assign_to_long`
-- [x] file-scope array bound with a parenthesized integer constant expression: `6_7_6_18_file_scope_array_bound_paren`
-- [x] parenthesized declarator name (`int (foo)(int)`): `6_7_6_19_paren_declarator_name`
-- [x] function declarator with an inline function-pointer return type (no typedef): `6_7_6_20_func_returning_funcptr_no_typedef`
-- [x] static initializer accepts unary `-` on a floating constant: `6_7_9_30_static_init_neg_float`
-- [x] `#warning` preprocessing directive (non-fatal, parsing continues): `6_10_warning_directive`
-- [x] static initializer accepts a binary constant expression on floating constants (`1.0f/2.2f`): `6_7_9_31_static_init_const_float_expr`
-- [x] conditional operator allows a comma expression in its middle operand (`a ? b, c : d`): `6_5_15_01_conditional_comma_in_middle`
-- [x] subscript accepts a conditional whose constant arm is `0` without treating it as a null pointer: `6_5_2_1_01_subscript_conditional_zero_branch`
-- [x] struct field declarator `RETTY (*(*name)(P))(IP)` (pointer-to-function-returning-function-pointer; sqlite VFS `xDlSym`): `6_7_6_21_field_ptr_to_func_returning_funcptr`
-
-Known bugs caught by other harnesses
-
-- [x] Mach-O `OutSec count drift` when `cfree cc` compiles a source and links it together with a precompiled `.o` input in one step: every `test/libc/cases/*.c` on the `darwin` cell of `test/libc/run.sh` (was 7/7 red, now 7/7 green). Root cause: in-memory ObjBuilders use ELF-style section names (`.text`, `.rodata`) and `.o` inputs use Mach-O comma-form (`__TEXT,__text`); both map to the same Mach-O `(segname, sectname)` in `pick_macho_names` but `link_layout` groups them by raw name, so same-mapped MSecs got interleaved with sections of a different mapped name. Phase B's adjacency-based OutSec coalescing then split the run, mismatching Phase A's distinct-name count. Fixed in `src/link/link_macho.c` by sorting MSecs by `(segname, sectname)` within each segment before vaddr placement.
-
-- [x] clang-emitted Mach-O `.o` rejected by `cfree ld` reader (`read_macho: non-extern reloc not supported`). Root cause: clang emits section-relative relocations (`r_extern == 0`) in `__LD,__compact_unwind` (and DWARF/EH sections); cfree's IR only modelled symbol-relative relocs. Fixed in `src/obj/macho_read.c` by lazily synthesizing one `.Lcfree.macho_secstart.<idx>` local symbol per referenced section and re-expressing the reloc as `target = sec_start_sym, addend = inplace_value - section.addr_in_obj`. The linker then resolves it to `target.vaddr + addend`, matching the original referent. Verified by linking `xcrun clang -c hello.c -o hello.o` output through `cfree ld -lSystem` and running.
-
-- [ ] Mach-O `link_macho: coalesce mismatch on __TEXT,__text (flags/zerofill)` when linking certain cfree-emitted relocatable objects. Reproduces with cfree-compiled `tmp/projects/stb_sprintf.h` (driver `tmp/refresh/use_stb_sprintf.c`) and `tmp/projects/cJSON/cJSON.c`; the trivial `int main(){return 0;}` + hosted shim still links, and `tmp/refresh/use_jsmn.c` links and runs end-to-end. The differentiator looks like a section-flag fan-out where an `__TEXT,__text` MSec gets emitted next to a `zerofill`-flagged MSec under the same `(segname, sectname)`, which trips the Phase A/B mismatch check in `src/link/link_macho.c`. No reduction yet:
-
- ```sh
- SDK="$(xcrun --show-sdk-path)"
- # Compile is fine:
- build/cfree cc -target aarch64-darwin --sysroot="$SDK" -isystem rt/include \
- -c tmp/refresh/use_stb_sprintf.c -o /tmp/stb.o
- # Link fails:
- build/cfree cc -target aarch64-darwin --sysroot="$SDK" -e main \
- -o /tmp/stb.exe /tmp/stb.o -lc
- # → fatal: link_macho: coalesce mismatch on __TEXT,__text (flags/zerofill)
- ```
-
-- [ ] aarch64 call lowering rejects "INDIRECT arg storage kind 3". Reproduces compiling `cJSON_Utils.c:845`, which passes a sized aggregate by value to a function. The AAPCS64 classifier picks INDIRECT but the call emitter has no path for the source-storage shape it sees there. No minimal repro yet:
-
- ```sh
- SDK="$(xcrun --show-sdk-path)"
- build/cfree cc -target aarch64-darwin --sysroot="$SDK" -isystem rt/include \
- -c tmp/projects/cJSON/cJSON_Utils.c -o /tmp/u.o
- # → fatal: aarch64 call: INDIRECT arg storage kind 3 unsupported
- ```
-
-- [ ] silent SIGSEGV with no diagnostic when compiling much of lua-5.4.7. After B3 was fixed, 19 lua TUs now crash cfree (exit 139): `lapi, lcode, ldebug, ldo, ldump, lfunc, lgc, llex, lmem, lobject, lparser, lstate, lstring, ltable, ltm, luac, lundump, lvm, lzio`. The other 14 (`lauxlib, lbaselib, lcorolib, lctype, ldblib, linit, liolib, lmathlib, loadlib, lopcodes, loslib, ltablib, lua, lutf8lib`) compile cleanly. No minimal reduction yet, so no red test:
-
- ```sh
- SDK="$(xcrun --show-sdk-path)"
- build/cfree cc -target aarch64-darwin \
- --sysroot="$SDK" -isystem rt/include \
- -c tmp/projects/lua/src/lparser.c -o /tmp/lparser.o
- # → Segmentation fault: 11 (exit 139, no diagnostic)
- ```
diff --git a/doc/C11_CONFORMANCE_CHECKLIST.md b/doc/C11_CONFORMANCE_CHECKLIST.md
@@ -1,297 +0,0 @@
-# C11 conformance checklist
-
-Status snapshot: 2026-05-19.
-
-Ground truth should be the implementation plus targeted tests, not README.md.
-Keep this checklist red-green: add or unskip the smallest case first, then
-make the implementation pass it.
-
-## Current signal
-
-- [x] `make test-lex` passes: 16/16.
-- [x] `make test-pp test-pp-err` passes: 83/83 and 15/15.
-- [x] `make test-parse-err` passes with expanded C11 constraint coverage:
- currently 57/57 pass.
-- [ ] `make test-parse` passes without skips: currently 2680 pass, 0 fail,
- 2 skip. The remaining skip is `long double`.
-- [x] `make test-cg-api test-opt test-dwarf test-debug` passes.
-- [x] `make rt` builds the default runtime archives.
-- [x] `make test-rt-headers` passes for the default runtime targets:
- AArch64/x86-64/RV64 Linux and AArch64/x86-64 Darwin.
-- [x] `make test-rt-runtime` passes for the default execution targets:
- AArch64/x86-64/RV64 Linux.
-- [x] `make test-lib-deps` passes.
-
-## First conformance gate: required diagnostics
-
-Goal: keep `make test-parse-err` green. These C11 constraint diagnostics now
-have targeted negative coverage; broaden the checks as adjacent semantic rules
-are implemented.
-
-- [x] Reject `sizeof` on incomplete object types.
- Test: `test/parse/cases_err/6_5_sizeof_incomplete.c`.
- Code: `parse_expr.c` `sizeof` / `c_abi_sizeof` call sites.
-- [x] Reject invalid implicit assignment conversions, starting with pointer to
- integer without an explicit cast.
- Test: `test/parse/cases_err/6_5_type_mismatch.c`.
- Code: `parse_assign_expr` in `parse_expr.c`.
-- [x] Reject bit-field widths wider than the declared bit-field type.
- Test: `test/parse/cases_err/6_7_2_1_bitfield_too_wide.c`.
- Code: `parse_member_decls` in `parse_type.c`.
-- [x] Reject multiple storage-class specifiers in one declaration.
- Test: `test/parse/cases_err/6_7_2_storage_class_combo.c`.
- Code: `parse_decl_specs`.
-- [x] Reject redefining a complete struct/union tag in the same scope.
- Test: `test/parse/cases_err/6_7_2_two_struct_defs.c`.
- Code: `parse_struct_or_union`; `complete` is set for newly defined tags,
- not only previously forward-declared tags.
-- [x] Reject assignment to const-qualified lvalues.
- Test: `test/parse/cases_err/6_7_3_const_assign.c`.
- Code: declaration qualifiers are applied to the base type and checked in
- `parse_assign_expr`.
-- [x] Reject duplicate file-scope object definitions with external/internal
- linkage.
- Test: `test/parse/cases_err/6_7_redefinition.c`.
- Code: `parse_external_decl`, symbol `defined` state.
-- [x] Reject duplicate `case` values within one switch after integer constant
- conversion.
- Test: `test/parse/cases_err/6_8_duplicate_case.c`.
- Code: `parse_case_stmt` / `SwitchCtx`.
-- [x] Reject duplicate function definitions while still allowing compatible
- declarations before one definition.
- Test: `test/parse/cases_err/6_9_redefinition_function.c`.
- Code: `parse_external_decl`; `SEK_FUNC` symbols track a `defined` bit.
-- [x] Reject `void` mixed with other function parameters.
- Test: `test/parse/cases_err/6_9_void_param_with_other.c`.
- Code: `parse_param_list`.
-- [x] Reject non-power-of-two positive `aligned(N)` values.
- Test: `test/parse/cases_err/attr_p2_aligned_not_pow2.c`.
- Code: attribute argument parsing in `parse_type.c`.
-- [x] Reject the newly covered expression/type constraint failures now exposed
- by `test/parse/cases_err/6_5_*`.
- Covered cases: address of bit-field, cast struct to scalar, incompatible
- conditional pointer arms, pointer-plus-pointer, incompatible pointer
- relational compare, struct used as scalar condition, and `sizeof` on a
- bit-field.
-- [x] Reject additional initializer/declarator/tag constraints:
- non-constant static initializer, excess scalar initializer, invalid
- array/struct/union designators, wrong-kind tag redeclaration, functions
- returning array/function, and variadic marker not last.
-- [x] Reject non-integer bit-field types.
- Test: `test/parse/cases_err/6_7_2_1_bitfield_bad_type.c`.
-
-Suggested cadence:
-
-```sh
-make test-parse-err > /tmp/cfree_parse_err.log 2>&1 || tail -n 80 /tmp/cfree_parse_err.log
-```
-
-## Positive parse skips and recently unskipped cases
-
-Goal: `make test-parse` is green with `CFREE_TEST_ALLOW_SKIP` unset.
-
-- [ ] Implement `long double` enough for parser/codegen/runtime tests.
- Current skipped case: `test/parse/cases/6_7_2_12_long_double.c`.
- Skip reason: binary128 literal/convert needs `rt/lib/fp_tf` wiring
- through CG.
-- [x] Enable file-scope `asm`.
- Covered case: `test/parse/cases/asm_02_file_scope.c`.
- The parser decodes the file-scope string literal and submits it through
- `cfree_cg_file_scope_asm`, which reuses the standalone asm parser over
- the current object emitter.
-
-Focused run:
-
-```sh
-CFREE_TEST_FILTER=6_7_2_12_long_double make test-parse
-CFREE_TEST_FILTER=asm_02_file_scope make test-parse
-```
-
-## Type system and declarations
-
-- [x] Implement enough structural compatibility for redeclarations and
- composite types beyond pointer identity.
- Covered cases: `6_2_7_01_composite_array_size`,
- `6_2_2_01_extern_in_block_inherits_internal`.
- Code: `type_compatible`, `type_composite`, and
- `c_sem_check_redeclaration`.
-- [x] Track declaration state for ordinary identifiers:
- declaration, tentative definition, definition, function definition,
- linkage, storage duration, and type compatibility.
- Code: parser `SymEntry` state plus `DSTATE_*`.
-- [x] Add same-scope ordinary identifier redefinition checks while preserving
- legal shadowing in nested block scopes.
- Tests: `6_7_same_scope_redefinition`,
- `6_9_duplicate_parameter`.
-- [x] Complete tag state handling for forward declarations, same-scope
- completion, and wrong-kind redeclarations.
- Negative coverage: `6_7_2_tag_wrong_kind`,
- `6_7_2_enum_forward`, `6_7_2_enum_wrong_kind`, and
- `6_7_2_enum_redefinition`.
-- [x] Validate function declarator constraints:
- `void` parameter rules, variadic placement, function returning function,
- function returning array, array/function parameter adjustment.
- Negative coverage: invalid variadic placement, ellipsis without a
- preceding parameter, function returning function, function returning
- array, and array of function. Positive function parameter adjustment is
- covered by `6_7_6_14_func_param_adjust`.
-- [x] Decide and document implementation-defined bit-field behavior:
- plain `int` signedness, allowed extended bit-field types, allocation
- order, straddling, and alignment.
-- [x] Add positive bit-field lowering cases from `test/parse/CORPUS.md`,
- including zero-width bit-fields.
- Positive bit-field, signed bit-field, zero-width, and `_Bool` bit-field
- cases pass; `float` bit-field rejection now passes. Current policy:
- plain `int` bit-fields are signed, integer types accepted by the
- frontend are accepted as bit-field base types, allocation proceeds from
- low to high bit offsets within little-endian storage units, zero-width
- fields force a fresh storage unit aligned as their declared type, and
- fields do not straddle storage units.
-
-## Expressions and conversions
-
-- [x] Make implicit conversions constraint-aware. Do not rely on CG conversion
- success as the semantic check.
- Covered for assignment, compound assignment, initialization, return,
- calls, and redeclaration diagnostics via `lang/c/sem`.
-- [x] Preserve lvalue properties: modifiable, const-qualified, bit-field,
- array, function designator, and incomplete type.
- The parser value stack tracks lvalue, modifiable-lvalue, bit-field, and
- null-pointer-constant state across loads, conversions, member access,
- dereference, and materialization.
-- [x] Implement `sizeof` rules completely:
- no incomplete object type, no function type, no bit-field, VLA operand
- evaluated, non-VLA operand not evaluated.
- Coverage: `sizeof(function)` and `sizeof(bit-field)` are rejected,
- VLA/deref pointer positive cases pass, and `6_5_59_sizeof_no_eval`
- verifies non-VLA operands are not evaluated.
-- [x] Complete conditional operator usual-conversion behavior for arithmetic
- and pointer/null arms.
- Positive arithmetic and pointer/null cases pass; incompatible pointer
- arms are rejected.
-- [x] Complete pointer compound assignment (`p += n`, `p -= n`).
- Positive `p += n` and `p -= n` coverage passes; pointer RHS is rejected.
-- [x] Expand `_Generic` tests for default selection, compatible types, and
- unevaluated controlling expression.
- Default selection, compatible typedef matching, and duplicate-compatible
- association diagnostics pass.
-- [x] Add negative tests for invalid pointer arithmetic, invalid relational
- comparisons, invalid casts, modifying non-lvalues, and scalar-required
- operators.
- Pointer-plus-pointer, incompatible pointer relational compare,
- struct-to-int cast, struct condition, array assignment, bad call
- argument conversion, invalid pointer compound assignment, and floating
- bitwise operands are rejected.
-
-## Constant expressions and initializers
-
-- [x] Replace the previous narrow `i64` integer evaluator with a typed,
- target-width integer constant-expression evaluator.
- Covered cases include integer literal type selection by suffix/base/value,
- integer promotions, usual arithmetic conversions, logical operators,
- conditional expressions, integer casts, immediate floating-constant casts,
- and shift-count diagnostics.
- Tests: `6_6_10_logical_cond_const`,
- `6_6_11_unsigned_const_expr`, and
- `6_6_shift_count_out_of_range`.
- Code: `eval_const_int_typed`, `CConstInt`, and `eval_const_int` in
- `parse_expr.c`.
-- [x] Accept `_Alignof` in integer constant expressions.
- Positive array-bound coverage passes.
-- [x] Generalize constant-expression classification beyond integer ICE call
- sites so arithmetic constants, address constants, null pointer constants,
- and static initializer validation share one semantic evaluator.
- Static scalar, pointer, address, null pointer, and bit-field
- initializers now flow through one `CStaticConst` classifier layered on
- the typed integer constant evaluator.
-- [x] Complete static initializer address constants:
- object address, function address, array plus/minus integer constant,
- and null pointer constants.
- Positive object/function/array-plus-integer address constants pass, and
- null pointer constants now include full integer constant expressions.
-- [x] Implement static-storage union initialization or document a temporary
- nonconformance gate.
- Positive non-first union designated initializer passes.
-- [x] Complete designated initializers:
- nested designators, enum-valued array designators, duplicate designator
- overwrite rules, non-first union member.
- Positive nested, enum-valued, duplicate overwrite, and non-first union
- coverage passes.
-- [x] Add diagnostics for initializer overflow, excess scalar initializers,
- non-constant static initializers, and invalid designators.
- New diagnostics for excess scalar initializers, non-constant static
- initializers, invalid array/struct/union designators, and signed static
- integer initializer overflow pass.
-
-## Preprocessor and translation phases
-
-- [x] Object/function-like macros, stringize, paste, rescan, conditionals,
- includes, line control, unknown pragmas, and `#embed` have passing tests.
-- [ ] Audit remaining C11 translation-phase requirements:
- universal character names, multibyte characters, trigraph policy,
- diagnostics for invalid preprocessing tokens, and line-splice edge cases.
-- [ ] Add conformance tests for implementation-defined preprocessor behavior
- documented in C11 Annex J.3.12.
-- [ ] Decide whether `#embed` is extension-only under strict C11 mode once a
- strict mode exists.
-
-## Freestanding library surface
-
-C11 freestanding requires at least `<float.h>`, `<iso646.h>`, `<limits.h>`,
-`<stdalign.h>`, `<stdarg.h>`, `<stdbool.h>`, `<stddef.h>`, `<stdint.h>`, and
-`<stdnoreturn.h>`. This tree also ships `assert.h` and `stdatomic.h`.
-`setjmp.h` and `cfree/coro.h` are advertised freestanding extensions: they
-depend on target register context, not hosted OS services.
-
-Status: complete for the current freestanding C11 profile. Keep this gate
-green with `make rt`, `make test-rt-headers`, `make test-rt-runtime`, and
-`make test-lib-deps`.
-
-- [x] Add header compile smoke tests for every freestanding header across the
- default runtime targets.
- Test: `make test-rt-headers` / `test/smoke.c`.
-- [x] Add macro/value tests for `limits.h`, `stdint.h`, `stddef.h`, and
- `float.h` against target ABI expectations.
- Test: `make test-rt-headers` / `test/smoke.c`.
-- [x] Add `stdarg.h` runtime tests for AArch64, x86-64, and RV64.
- Test: `make test-rt-runtime`.
-- [x] Get `stdatomic.h` tests passing against both parser builtins and
- `libcfree_rt.a`.
- Test: `make test-rt-runtime`.
-- [x] Fix `make rt` before treating atomics as conforming.
-- [x] Keep `setjmp.h` as an advertised freestanding extension; classify
- `cfree/coro.h` the same way.
-
-## Strict mode and extensions
-
-Today the frontend accepts GNU extensions needed by the project. C11
-conformance needs a mode story.
-
-- [ ] Add a driver/frontend option for strict C11 diagnostics, or document that
- the current mode is GNU-ish C11.
-- [ ] Classify extensions: `__int128`, `asm`, GNU attributes, statement
- expressions if added, binary integer literals, `#embed`, and cfree
- builtins.
-- [ ] In strict mode, diagnose extensions that can invalidate strictly
- conforming programs.
-- [ ] Keep extension tests separate from strict C11 tests.
-
-## Suggested working order
-
-1. Keep `test-parse-err` green while broadening semantic diagnostics beyond
- the first targeted cases.
-2. Add a compact "semantic type checks" helper layer so assignment, return,
- initialization, conditional expressions, and calls share rules.
- Helper coverage now includes assignment, compound assignment,
- redeclaration, calls, and initializer/return use sites.
-3. Fix declaration-state tracking: redeclarations, tentative definitions,
- function definitions, tag completion, and composite types.
- Ordinary identifier redeclarations, tentative/defined state, composite
- object/function types, and tag completion are covered.
-4. Keep bit-field layout/codegen covered while broadening target ABI tests.
-5. Keep the shared static-initializer/address-constant classifier as the only
- static initializer constant path when adding new initializer forms.
-6. Unskip `long double` or explicitly narrow the supported C profile until
- runtime/CG support exists.
-7. Keep the completed freestanding runtime/header gate green while expanding
- target coverage.
diff --git a/doc/C11_LONG_DOUBLE_CHECKLIST.md b/doc/C11_LONG_DOUBLE_CHECKLIST.md
@@ -1,110 +0,0 @@
-# C11 `long double` support checklist
-
-Status snapshot: 2026-05-19.
-
-Goal: make `long double` target-correct instead of aliasing it to `double`.
-Keep this red-green: add the smallest target-scoped case first, then make the
-implementation pass it on the target that owns that format.
-
-## Target profiles
-
-- [x] AArch64 Linux: IEEE binary128 `long double`.
- ABI: passed and returned in SIMD/FP `q` registers when register slots are
- available. Arithmetic and conversions lower to compiler-rt `*tf*`
- helpers.
-- [x] RV64 Linux LP64D: IEEE binary128 `long double`.
- ABI: passed and returned as two integer XLEN eightbytes because FLEN is
- 64. Arithmetic and conversions lower to compiler-rt `*tf*` helpers.
-- [ ] AArch64 Darwin: `long double == double`.
- Keep the current binary64 behavior and predefined macros for this OS.
-- [ ] x86-64 SysV/Darwin: x87 80-bit extended precision in 16-byte storage.
- Defer as a separate backend slice; it needs x87 load/store/arithmetic,
- x87 return handling, and `LDBL_*` macro updates. Do not block the
- binary128 work on this.
-
-## Support target for the binary128 slice
-
-- [x] Complete the 16-byte scalar `__int128` path before treating binary128 as
- green: layout, locals/globals, constants, arithmetic, shifts, compares,
- calls/returns, aggregate fields, unions, and static initialization.
-- [x] Add a target long-double profile query used by both the frontend and CG:
- format, storage size, alignment, macro values, and ABI classification.
-- [x] Add a distinct CG type for binary128 `long double`; `TY_LDOUBLE` must not
- map to `F64` on AArch64/RV64 Linux.
-- [x] Emit target-correct `__LDBL_*` and `__DECIMAL_DIG__` predefined macros
- for binary128 targets.
-- [x] Encode `L` floating constants as binary128 bytes without narrowing their
- storage type to `double`.
-- [x] Support binary128 local/global storage, assignment, struct fields, and
- return values.
-- [x] Lower binary128 arithmetic to runtime helpers:
- `__addtf3`, `__subtf3`, `__multf3`, and `__divtf3`.
-- [x] Lower binary128 comparisons through compiler-rt compare helpers.
-- [x] Lower integer, float, and double conversions through compiler-rt helpers:
- `__float*tf`, `__fix*tf*`, `__extend{s,d}ftf2`, and
- `__trunctf{s,d}f2`.
-- [x] Teach AArch64 codegen to move 16-byte FP values through Q-register
- load/store/copy paths.
-- [x] Teach RV64 ABI movement to pass/return binary128 values as two integer
- parts, backed by memory in CG.
-- [x] Keep runtime linkage using the existing `rt/lib/fp_tf/fp_tf.c` and
- `rt/lib/fp_ti/fp_ti.c` objects for the binary128 runtime variants.
-
-## Red tests
-
-The support-target tests live under `test/parse/cases/i128_*.c` and
-`test/parse/cases/ldbl128_*.c`. Run the `i128` group first; those cases isolate
-the 16-byte integer substrate needed by compiler-rt binary128 helpers and by
-the memory-backed long-double lowering.
-
-```sh
-CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=i128 CFREE_OPT_LEVELS=0 make test-parse
-CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=i128 CFREE_OPT_LEVELS=0 make test-parse
-CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=ldbl128 make test-parse
-CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=ldbl128 make test-parse
-```
-
-The `ldbl128` cases intentionally return success on non-binary128 targets so
-x87 work can land later without hiding the binary128 regression signal.
-
-Coverage intent:
-
-- `i128_01` through `i128_14`: target layout/alignment, literal storage,
- add/sub carry, multiply high-half behavior, div/mod, shifts/bitwise
- operations, signed and unsigned compares, signed shifts/conversions,
- calls/returns, aggregate fields, union lane visibility, global
- initialization, arbitrary signed div/mod, and arbitrary signed/unsigned
- multiplication.
-- `ldbl128_01` through `ldbl128_15`: target macros/layout, literal decoding,
- arithmetic helpers, conversions, comparisons, calls/returns, struct and
- array storage, raw binary128 bits, globals, unary negation, stack
- arguments, mixed arithmetic, aggregate return, and arbitrary binary128
- multiplication.
-
-Known remaining limits:
-
-- The binary128 support target is Linux AArch64/RV64. Darwin `long double`
- target rules and x87 80-bit `long double` are still separate follow-up
- targets.
-- Decimal `L` literal coverage currently exercises representable values and
- raw canonical encodings; it does not yet prove full decimal-to-binary128
- precision for non-representable literals.
-- ABI aggregate classification still covers the implemented scalar and simple
- aggregate paths, not the full AArch64 HFA/HVA or every RV64 aggregate
- flattening edge.
-
-## Done criteria
-
-- [x] `CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=ldbl128 make test-parse` passes
- with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=ldbl128 make test-parse` passes
- with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=i128 make test-parse` passes
- with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=i128 make test-parse` passes
- with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_FILTER=6_7_2_12_long_double make test-parse` passes on
- AArch64 Linux and RV64 Linux without a `.skip` sidecar.
-- [x] `make rt` still builds the default runtime archives.
-- [x] `make test-rt-headers test-rt-runtime` stays green for the default
- runtime targets.
diff --git a/doc/CTOOLCHAIN.md b/doc/CTOOLCHAIN.md
@@ -1,296 +0,0 @@
-# C Toolchain Gap Analysis
-
-What a typical `Makefile` or build system invokes vs. what `cfree` currently
-ships in its driver, and what's missing inside `libcfree` to close those
-gaps. Companion to the toolchain summary in `README.md`.
-
-Snapshot as of 2026-05-20.
-
-## Tool inventory
-
-| Tool | Status | Notes |
-| --------- | ----------------- | -------------------------------------------------- |
-| `cc` | shipped | `driver/cc.c`; broad GCC-subset surface |
-| `cpp` | shipped | `driver/cpp.c`; thin wrapper over `cfree_c_preprocess` |
-| `as` | shipped | `driver/as.c`; GAS-subset, single input |
-| `ld` | shipped | `driver/ld.c` |
-| `ar` | shipped | `driver/ar.c`; r/c/t/x/p + `s` modifier |
-| `ranlib` | shipped | `driver/ranlib.c` |
-| `objdump` | shipped | `driver/objdump.c` |
-| `nm` | missing | symbols only; reuse `cfree_obj_symiter_*` |
-| `size` | missing | section sizes from `cfree_obj_section` |
-| `strings` | missing | trivial; no `libcfree` API needed |
-| `file` | missing | `cfree_detect_fmt` already classifies |
-| `addr2line` | missing | needs DWARF query API surface (already used internally) |
-| `readelf` | partly via objdump | objdump covers most of GNU `readelf -a` |
-| `strip` | blocked | needs builder mutator API; see below |
-| `objcopy` | blocked | needs builder mutator API; see below |
-| `c++filt` | n/a | C only |
-| `gprof` / `gcov` | n/a | no profiling/coverage support today |
-| `ldd`, `ldconfig`, dynamic loader | n/a | host-provided |
-
-`cfree`-specific tools (`run`, `dbg`, `emu`) are out of scope for this
-document.
-
-## Strip / Objcopy
-
-Both are **blocked on a builder-mutator surface** that does not yet exist
-in `libcfree`. The reader produces an already-finalized `CfreeObjBuilder`
-(per `src/obj/obj.h` "lifecycle gates" — `obj_finalize` freezes the
-read-side view, no further writes permitted). Pure roundtrip works (open
-→ emit), but neither tool needs *only* roundtrip — both need to **remove**
-and **rename** existing structure.
-
-### Operations matrix
-
-| Operation | What it needs | Have today? |
-| ---------------------------------------- | -------------------------------------------- | --------------------------------- |
-| `strip --strip-debug` / `objcopy --strip-debug` | drop `CFREE_SEC_DEBUG` sections | reader exposes kind ✓ — emit filter missing |
-| `strip --strip-all` | drop debug + symtab | needs emit-time symbol filter |
-| `strip --strip-unneeded` | keep only relocation-referenced symbols | reader exposes reloc→sym ✓ — needs builder symbol filter |
-| `strip --keep-symbol=N` / `--strip-symbol=N` | symbol predicate | needs builder symbol filter |
-| `objcopy --remove-section=N` | drop section by name | needs builder mutator |
-| `objcopy --only-section=N` | inverse of above | needs builder mutator |
-| `objcopy --rename-section old=new[,flags]` | mutate section name + flags | needs builder mutator |
-| `objcopy --add-section name=file` | add new section from external bytes | already possible via existing builder API |
-| `objcopy --update-section name=file` | replace section contents | needs builder mutator |
-| `objcopy --redefine-sym old=new` | rename symbol | needs builder mutator |
-| `objcopy --globalize-symbol`/`--localize-symbol`/`--weaken-symbol` | mutate `CfreeSymBind` | needs builder mutator |
-| `objcopy --extract-symbol` | emit a symbol's bytes as its own object | needs builder mutator + new emit |
-| `objcopy --only-keep-debug` | keep only `.debug_*` + symtab | needs builder mutator |
-| `objcopy --add-gnu-debuglink=FILE` | append debuglink section + CRC | needs CRC32 helper + add-section |
-| `objcopy -O <bfdname>` (format convert) | ELF ↔ Mach-O ↔ COFF roundtrip | builder is already format-neutral; should work once mutators land |
-| `objcopy --change-section-address=...` | adjust section VMA / LMA | needs builder mutator |
-| `objcopy -I/-O binary`, `srec`, `ihex` | flat-binary / S-record / Intel-hex output | not supported; new emitters |
-
-### What `libcfree` needs
-
-Beyond the builder mutators, a few smaller items:
-
-1. **Section-group reader iterator** (`CfreeObjGroupIter`,
- `cfree_obj_groupiter_new/next/free`, `CfreeObjGroupInfo`). The builder
- has `cfree_obj_builder_group` and `_group_add_section`, but the reader
- exposes no way to enumerate existing groups. Any objcopy that touches
- a COMDAT-bearing object would lose grouping on roundtrip without this.
-
-2. **Builder mutator API.** Minimal MVP that unblocks strip and the
- common objcopy operations:
-
- ```c
- CfreeStatus cfree_obj_builder_remove_section(CfreeObjBuilder *, CfreeObjSection);
- CfreeStatus cfree_obj_builder_remove_symbol(CfreeObjBuilder *, CfreeObjSymbol);
- CfreeStatus cfree_obj_builder_rename_section(CfreeObjBuilder *, CfreeObjSection,
- CfreeSym new_name);
- CfreeStatus cfree_obj_builder_rename_symbol(CfreeObjBuilder *, CfreeObjSymbol,
- CfreeSym new_name);
- CfreeStatus cfree_obj_builder_symbol_set_bind(CfreeObjBuilder *, CfreeObjSymbol,
- CfreeSymBind);
- ```
-
- These would need to lift the post-finalize-frozen invariant — either
- by re-opening the builder for writes, or by adding a parallel
- filtered-emit path that takes a callback predicate. The latter is
- probably less invasive.
-
-3. **DSO / executable inputs are a separate problem.** `cfree_obj_open`
- reads relocatable `.o` cleanly, but stripping a *linked* ELF
- (executable or DSO) means understanding `.dynsym`, `.dynstr`,
- `.hash`/`.gnu.hash`, `.dynamic`, `.got`, `.plt`, `.rela.plt`,
- `PT_NOTE` (build-id), and a `PT_DYNAMIC` segment — most of which are
- linker-managed, not builder-managed. GNU `strip` and `objcopy` can
- operate on these because `bfd` round-trips the full dynamic-linking
- state. We don't model that today. Scope strip/objcopy to `.o` and
- `.a` for the first cut.
-
-### Suggested sequencing
-
-1. Add the section-group reader iterator (small, no mutator concerns).
-2. Add the builder mutator API for sections + symbols.
-3. Implement `strip` (relocatable inputs only) as a driver tool. Factor
- the per-member symbol-collection block from `driver/ar.c` and
- `driver/ranlib.c` into a shared helper while we're touching the area.
-4. Implement `objcopy` (relocatable inputs only). The `--add-section`
- / `--rename-section` / `--redefine-sym` / `--strip-*` subset covers
- the vast majority of build-system use.
-5. Defer DSO/exe strip+objcopy, format-conversion to non-object outputs
- (`binary`, `srec`, `ihex`), and `objcopy --only-keep-debug` /
- `--add-gnu-debuglink` (the split-debuginfo flow).
-
-## Flag-surface gaps
-
-Methodology: each tool's argv parser was compared against the union of
-GCC's `cc` and the corresponding binutils tool. Flags that are
-silently accepted as no-ops (e.g. `-pipe`, `-std=`) are not gaps.
-
-### `cc` — broad surface; the gaps are mostly autotools/CMake probes
-
-- **Pass-through flag families.** `-Wp,...` (preprocessor) and `-Wa,...`
- (assembler) are missing. `-Wl,...` is supported. `-Xpreprocessor` /
- `-Xassembler` similarly missing; `-Xlinker` is supported.
-- **Compiler-information probes** (used by autoconf, CMake's compiler
- detection): `-print-search-dirs`, `-print-file-name=`,
- `-print-prog-name=`, `-print-libgcc-file-name`,
- `-print-multi-os-directory`, `-print-resource-dir`, `-dumpmachine`,
- `-dumpversion`, `-dumpspecs`. Some build systems hard-fail when these
- return nothing.
-- **Linker convenience.** `-rdynamic` (≡ `-Wl,--export-dynamic`) not
- wired through.
-- **Dep emission.** `-Wp,-MD,FILE` form not handled (GNU make's
- auto-dependency idiom).
-- **Response files.** `@file` not supported; long CMake invocations on
- some platforms exceed `ARG_MAX`.
-- **Code-gen tuning.** `-march=`, `-mtune=`, `-mcpu=`, `-mfpu=`,
- `-msse*`, `-mavx*` — none implemented. Currently silently no-op'd
- via the `-W…`/`-f…` catch-all in `cc_parse`.
-- **Other compiler flags accepted as no-ops** (the invocation succeeds, but
- the output may not be ABI-correct): `-fvisibility=`, `-fcommon`/`-fno-common`,
- `-fstack-protector*`, `-fno-omit-frame-pointer`, `-funwind-tables`,
- `-fexceptions`, `-static-libgcc`, `-shared-libgcc`,
- `-fsyntax-only`, `-fdiagnostics-color`, `-save-temps`.
-- **Long forms.** `--output=PATH`, `--include=`, etc.
-- **Includes.** `-iquote`, `-idirafter`, `-include` are currently
- swallowed as no-ops (`driver/cc.c:939`); should land in the cflags
- surface.
-
-### `cpp` — same baseline as `cc -E`
-
-Inherits all of cc's `-I/-isystem/-D/-U` + dep emission. Specific
-gaps that exist equally in `cc -E`:
-
-- `-P` — suppress `#line` markers
-- `-dM` — dump defined macros instead of expanded source
-- `-C`, `-CC` — preserve comments
-- `-traditional-cpp`
-- `-fno-show-column`
-- `-Wp,-MD,FILE` (see above)
-
-### `as` — minimal surface
-
-- **No code-gen target selection.** `-march=`, `-mcpu=`, `-mtune=`,
- `-mabi=` (riscv `lp64d` vs `lp64`), `--32`/`--64`, `-m32`/`-m64`.
-- **No warnings control.** `-W`, `-Z`, `--warn`, `--fatal-warnings`,
- `--no-warn`.
-- **No `-MD <file>`** for assembler-side dependency emission on `.S`.
-- **No assembly listings.** `-a` family (`-al`, `-as`, `-an`, …),
- `--listing-*`, `--statistics`.
-- **No DWARF version selection.** Only blanket `-g`; missing
- `--gdwarf-2/3/4/5`, `--gstabs`, `--gdwarf-sections`.
-- **No PIC / passthrough flags.** `-K`, `-Q`, `-k`.
-- **One input only.** GNU `as` accepts multiple sources and
- concatenates.
-- **No `-defsym SYM=VAL`** for assemble-time constant injection.
-- **No stdin input** (`-`).
-
-### `ld` — strong; gaps are advanced features and `-z` flags
-
-- **`-z` options** (common in distro default link flags): `-z now`, `-z relro`,
- `-z noexecstack`, `-z defs`, `-z origin`, `-z notext`, `-z lazy`,
- `-z combreloc`, `-z text`. These map to ELF dynamic-tag bits and
- segment flags that the linker already emits in some form — wiring
- them up should be small per flag.
-- **Link maps.** `-M` / `--print-map`, `-Map=FILE`,
- `--print-gc-sections`, `--print-memory-usage`.
-- **Symbol-resolution policy.** `--no-undefined`,
- `--allow-shlib-undefined`, `--unresolved-symbols={...}`.
-- **Symbol surgery.** `--wrap=SYMBOL`, `--defsym=SYM=EXPR`,
- `--undefined=SYM`, `--retain-symbols-file`.
-- **Version scripts / dynamic lists.** `--version-script`,
- `--dynamic-list`, `--exclude-libs`.
-- **Hash style.** `--hash-style={sysv,gnu,both}`, `--no-gnu-hash`.
-- **Section placement.** `--section-start=NAME=ADDR`, `-Ttext=`,
- `-Tdata=`, `-Tbss=`.
-- **Cross-reference.** `--cref`.
-- **Identical-code folding.** `--icf={none,safe,all}`.
-- **Init/fini.** `--init`, `--fini` for non-default entry symbols.
-- **Sort/common.** `--sort-section`, `--sort-common`,
- `--no-define-common`.
-- **Endianness / emulation.** `--EB`, `--EL`, `-m EMULATION` (currently
- auto-detected from inputs; the `-m` form is missing).
-- **Strip flags.** `--strip-all`, `--strip-debug`, `-s`, `-S` (would
- pair with the strip work above).
-- **ELF notes.** `--package-metadata=` (a fielded use case in distro
- packaging).
-- **Response files.** `@file`.
-- **Stdin input** (`-`).
-
-### `ar` — POSIX covered; binutils extensions missing
-
-- **Operations.** `d` (delete), `q` (quick append), and standalone
- `s` (now provided by `cfree ranlib`, but `ar s` is also expected).
- `m` (move member). `b NAME` / `a NAME` / `i NAME` (positional
- insertion modifiers paired with `r`/`m`).
-- **Modifiers.** `D`/`U` (deterministic / non-deterministic; `D` is the default
- only in binutils built with `--enable-deterministic-archives`. `SOURCE_DATE_EPOCH` is similar but not equivalent).
- `N <count>` (Nth instance of a duplicated member name). `P` (full
- pathname match). `o` (preserve mtime on extract). `S` (suppress
- symbol index — opposite of `s`). `T` (thin archive — used by LLVM).
-- **MRI script mode** (read commands from stdin). Rarely used; skip.
-
-### `objdump` — biggest gap among the shipped tools
-
-- **Aggregate flags.** `-x` (all headers ≡ `-a -f -h -p -r -t`), `-f`
- (file header), `-p` (program header / private).
-- **Source intermixing.** `-S` (intermix source — DWARF line info
- already available), `-l` (line numbers in disasm / relocs).
-- **Disassembly scope.** `--disassemble=SYM`, `--start-address=`,
- `--stop-address=`.
-- **Disassembly formatting.** `-z` (don't skip zeros), `-w` (wide
- output), `--no-show-raw-insn`, `--prefix-addresses`, `-M ATTR`
- (e.g. `-M intel` for x86 syntax).
-- **Dynamic vs. static.** `-R` (dynamic relocations) vs the existing
- `-r` (static); `-T` (dynamic symbols) vs the existing `-t`
- (static).
-- **DWARF dumping.** `-W` / `--dwarf=...` — cfree emits DWARF and
- exposes reader APIs, so this should be straightforward.
-- **Long forms.** `--syms`, `--section-headers`, `--archive-headers`,
- `--all-headers`, `--file-offsets`.
-- **Override format / arch.** `-b BFDNAME`, `-m ARCH`, `-EB`/`-EL`.
-- **C++ demangling.** `-C`, `--demangle` — N/A for C; can land as a
- silent no-op once it's needed.
-
-## Windows (PE/COFF) target
-
-Cross-compilation to Windows requires the mingw-w64 sysroot for system
-libraries and CRT bits. Set `CFREE_MINGW_SYSROOT` to the
-`<toolchain>/x86_64-w64-mingw32` directory (or pass `-isysroot` /
-`--sysroot`) so the `cc` driver appends `$SYSROOT/lib` to the library
-search path. Both `cc -lFOO` and `ld -lFOO` resolve Windows libraries
-using the suffix list `libFOO.dll.a` → `libFOO.a` → `FOO.lib` →
-`FOO.dll.a` (mingw-canonical first, MSVC-style fallback).
-
-Example invocations:
-
-```sh
-export CFREE_MINGW_SYSROOT=/opt/homebrew/opt/mingw-w64/toolchain-x86_64/x86_64-w64-mingw32
-
-# Compile-only: produces hello.obj (note .obj suffix on Windows targets).
-cfree cc -target x86_64-windows -c hello.c
-
-# Inspect a PE32+ image. -p prints the optional header, data
-# directories, and per-DLL import lists.
-cfree objdump -p hello.exe
-
-# Link via MSVC-style flag surface (opt-in via --ms-link-driver):
-cfree ld --ms-link-driver /OUT:hello.exe /SUBSYSTEM:CONSOLE \
- /DEFAULTLIB:kernel32 hello.obj
-```
-
-Windows predefined macros emitted by `cc -target x86_64-windows`:
-`_WIN32`, `_WIN64`, `WIN32`, `__MINGW32__`, `__MINGW64__`, `_M_X64`,
-`_M_AMD64`. `aarch64-windows` substitutes `_M_ARM64` for the
-x64-specific names. `_MSC_VER` is deliberately not set — cfree targets
-the mingw flavor on Windows (DWARF debug info, mingwex CRT), not MSVC.
-
-## Recommended next moves
-
-1. **Add to `cc` first**: `-rdynamic`, `-print-search-dirs`,
- `-print-file-name`, `-print-prog-name`, `-dumpmachine`,
- `-dumpversion`, `@file`. These unblock most autotools/CMake
- probes for very little code.
-2. **Add to `ld`** the `-z` family (`-z now`, `-z relro`,
- `-z noexecstack` are the high-traffic three) and `-Map=FILE`.
-3. **Add to `objdump`** the `-x` aggregate, `-S`, `-l`, and
- `--dwarf=...`. Most "I want to see what the compiler produced"
- debug sessions need at least one of these.
-4. **Then unblock strip/objcopy** via the builder mutator API and
- ship strip first (smaller surface than objcopy).
diff --git a/doc/FRONTEND.md b/doc/FRONTEND.md
@@ -1,412 +0,0 @@
-# Interactive Frontend REPL
-
-This document tracks the current source-frontend REPL shape and the remaining
-work to make `cfree dbg` a full interactive compile/link/publish environment.
-The immediate API direction is stateful handles at each layer:
-
-- `CfreeCompileSession` owns frontend state for one source language.
-- `CfreeCg` owns reusable codegen metadata and binds one object delta at a time.
-- `CfreeLinkSession` owns linker inputs and resolution state.
-- `CfreeJit` owns the live executable image and publishes resolved deltas.
-
-One-shot public compile/link/JIT append APIs are being removed. Callers should
-create sessions, add inputs, resolve or publish, and then free the sessions.
-
-## Current State
-
-- [x] Registered frontends use the lifecycle vtable
- `new_frontend`, `compile`, `free_frontend`.
-- [x] Public source compilation goes through `CfreeCompileSession`:
- `cfree_compile_session_new`, `cfree_compile_session_compile`,
- `cfree_compile_session_free`.
-- [x] Public one-shot source APIs such as `cfree_compile_c_obj`,
- `cfree_compile_c_emit`, `cfree_compile_asm_obj`, and
- `cfree_compile_asm_emit` have been removed.
-- [x] `CfreeSourceInput` carries per-delta REPL shape:
- `input_kind` and `repl_entry_name`.
-- [x] `CfreeCompileSessionOptions` keeps fixed session options:
- language, code options, diagnostics, and language-specific options.
-- [x] `CfreeCg` is reusable across object deltas with
- `cfree_cg_begin_obj` and `cfree_cg_end_obj`.
-- [x] `CfreeLinkSession` is public and owns link inputs plus resolve/emit/JIT
- operations.
-- [x] Public one-shot link APIs have been removed; drivers and tests create
- `CfreeLinkSession` directly.
-- [x] Public JIT append is `cfree_jit_publish`. The v1 publish mode supports
- append-object batches through a `CfreeLinkSession`.
-- [x] `driver/cc.c`, `driver/as.c`, `driver/ld.c`, `driver/inputs.c`,
- `driver/runtime.c`, `driver/dbg.c`, and the active harnesses have been
- migrated to session APIs.
-- [x] `cfree dbg` can start from an empty JIT image. The default REPL language
- is selected by `-x LANG` / `--language LANG` or changed with `:language`.
-- [x] `driver/dbg.c` caches `CfreeCompileSession*` per language for snippets
- typed during the REPL.
-- [x] `driver/dbg.c` publishes snippets by creating a temporary
- `CfreeLinkSession`, adding the object delta, and calling `cfree_jit_publish`.
-- [x] Toy supports top-level snippets, bare expression wrappers, block wrappers,
- persistent globals, persistent functions, and persistent nominal types across
- REPL snippets.
-- [x] Scripted driver tests cover the Toy REPL append/expression path.
-- [ ] Initial source files passed to `cfree dbg` are not yet used to seed the
- cached per-language `CfreeCompileSession`, so REPL expressions cannot see
- source-level declarations from those initial files.
-- [ ] C and Wasm still need true interactive frontend state. C currently owns
- parser/preprocessor/declaration state per compile; Wasm is still module-shaped.
-- [ ] `CfreeLinkSession` is session-shaped, but incremental watermarks,
- symbol-version policy, and replace/redefine semantics are still skeletal.
-- [ ] `cfree_jit_publish` function replacement is shaped in the API but returns
- `CFREE_UNSUPPORTED`.
-
-## Target UX
-
-Interactive sessions should support these workflows:
-
-```text
-(cfree) :language c
-(cfree) jit { #define SCALE(x) ((x) * 3) }
-(cfree) jit { typedef struct { int x; int y; } Point; Point p = {4, 5}; }
-(cfree) SCALE(p.x + p.y)
-$1 = 27 (0x1b)
-```
-
-```text
-cfree dbg -x toy
-(cfree) jit { type Point = record { x: i64, y: i64 }; let p: Point = .{ .x = 4, .y = 5 }; }
-(cfree) p.x + p.y
-$1 = 9 (0x9)
-```
-
-```wat
-(cfree) :language wat
-(cfree) jit { (module (func (export "add") (param i64 i64) (result i64) local.get 0 local.get 1 i64.add)) }
-(cfree) invoke add 4 5
-$1 = 9 (0x9)
-```
-
-For C and Toy, unrecognized bare input should fall back to the expression/thunk
-path and compile a language-native expression wrapper. The explicit `expr`
-command can remain as an alias. For Wasm, the natural interactive unit is a
-module plus explicit export invocation; WAT expression shortcuts can come later
-as sugar over generated modules.
-
-## Public API Shape
-
-### Source Compile
-
-Current public source compilation:
-
-```c
-typedef enum CfreeFrontendInputKind {
- CFREE_FRONTEND_INPUT_TRANSLATION_UNIT,
- CFREE_FRONTEND_INPUT_REPL_TOPLEVEL,
- CFREE_FRONTEND_INPUT_REPL_EXPR,
- CFREE_FRONTEND_INPUT_REPL_BLOCK,
-} CfreeFrontendInputKind;
-
-typedef struct CfreeSourceInput {
- CfreeBytes bytes;
- CfreeLanguage lang;
- CfreeFrontendInputKind input_kind;
- const char *repl_entry_name;
-} CfreeSourceInput;
-
-typedef struct CfreeFrontendCompileOptions {
- CfreeCodeOptions code;
- CfreeDiagnosticOptions diagnostics;
- const void *language_options;
- CfreeFrontendInputKind input_kind;
- const char *repl_entry_name;
-} CfreeFrontendCompileOptions;
-
-typedef struct CfreeCompileSessionOptions {
- CfreeLanguage lang;
- CfreeFrontendCompileOptions compile;
-} CfreeCompileSessionOptions;
-
-CfreeStatus cfree_compile_session_new(CfreeCompiler *,
- const CfreeCompileSessionOptions *,
- CfreeCompileSession **out);
-CfreeStatus cfree_compile_session_compile(CfreeCompileSession *,
- const CfreeSourceInput *,
- CfreeObjBuilder **out);
-void cfree_compile_session_free(CfreeCompileSession *);
-```
-
-`input_kind` and `repl_entry_name` are copied from `CfreeSourceInput` into the
-frontend compile options for each delta. This lets a debugger keep one
-`CfreeCompileSession` alive while alternating top-level snippets, expression
-thunks, and block thunks.
-
-### Codegen
-
-Current public codegen lifecycle:
-
-```c
-CfreeStatus cfree_cg_new(CfreeCompiler *, CfreeCg **out);
-CfreeStatus cfree_cg_begin_obj(CfreeCg *, CfreeObjBuilder *,
- const CfreeCodeOptions *);
-CfreeStatus cfree_cg_end_obj(CfreeCg *);
-void cfree_cg_free(CfreeCg *);
-```
-
-`CfreeCg` preserves compiler-level metadata across object deltas. Object-bound
-target, MC, and debug state are created by `begin_obj` and finalized by
-`end_obj`.
-
-Remaining codegen work:
-
-- [ ] Define exactly which symbol/type/metadata tables persist across
- `begin_obj`/`end_obj` for each frontend.
-- [ ] Seed each new object builder with external declarations for all known
- frontend symbols that may be referenced by later snippets.
-- [ ] Add a focused test proving two objects emitted by one `CfreeCg` can refer
- to each other after link/JIT resolution.
-- [ ] Audit failed-object cleanup so a failed snippet cannot corrupt the
- persistent frontend or CG metadata.
-
-### Link
-
-Current public link lifecycle:
-
-```c
-CfreeStatus cfree_link_session_new(CfreeCompiler *,
- const CfreeLinkSessionOptions *,
- CfreeLinkSession **out);
-CfreeStatus cfree_link_session_add_obj(CfreeLinkSession *, CfreeObjBuilder *);
-CfreeStatus cfree_link_session_add_obj_bytes(CfreeLinkSession *, CfreeBytes);
-CfreeStatus cfree_link_session_add_archive_bytes(CfreeLinkSession *,
- CfreeBytes);
-CfreeStatus cfree_link_session_add_dso_bytes(CfreeLinkSession *, CfreeBytes);
-CfreeStatus cfree_link_session_resolve(CfreeLinkSession *);
-CfreeStatus cfree_link_session_emit(CfreeLinkSession *, CfreeWriter *out);
-CfreeStatus cfree_link_session_jit(CfreeLinkSession *, CfreeJit **out_jit);
-void cfree_link_session_free(CfreeLinkSession *);
-```
-
-Remaining link work for a full interactive REPL:
-
-- [ ] Keep durable symbol-resolution state for incremental sessions instead of
- treating each publish as a mostly independent batch.
-- [ ] Define duplicate strong symbol policy across REPL generations.
-- [ ] Define weak/common/TLS behavior across appended generations.
-- [ ] Track generation watermarks so diagnostics can say whether a symbol came
- from the initial image or a later snippet.
-- [ ] Add tests where a later object resolves references against prior objects,
- archives, DSOs, and the initial JIT image.
-- [ ] Add negative tests for unresolved symbols, duplicate definitions, and
- unsupported relocation modes in interactive publish.
-
-### JIT Publish
-
-Current public publish lifecycle:
-
-```c
-typedef enum CfreeJitPublishKind {
- CFREE_JIT_PUBLISH_APPEND_OBJECTS,
- CFREE_JIT_PUBLISH_REPLACE_SYMBOLS,
-} CfreeJitPublishKind;
-
-typedef struct CfreeJitPublishOptions {
- uint8_t kind;
- CfreeLinkSession *link;
-} CfreeJitPublishOptions;
-
-typedef struct CfreeJitPublishResult {
- uint64_t generation;
-} CfreeJitPublishResult;
-
-CfreeStatus cfree_jit_publish(CfreeJit *, const CfreeJitPublishOptions *,
- CfreeJitPublishResult *);
-```
-
-Remaining publish work:
-
-- [ ] Implement `CFREE_JIT_PUBLISH_REPLACE_SYMBOLS` or remove it until the
- semantics are fully specified.
-- [ ] Preserve old symbol addresses for append-only generations and test that
- invariant directly.
-- [ ] Decide how the debugger should surface symbols shadowed or replaced by a
- later generation.
-- [ ] Keep DWARF and symbol iteration generation-aware.
-- [ ] Add tests that publish increments the JIT generation and keeps old
- function pointers callable after new snippets are appended.
-
-## Debugger Driver
-
-- [x] Cache `CfreeCompileSession*` per language in `DbgState`.
-- [x] Keep cached compile sessions alive for the whole REPL session.
-- [x] Treat a REPL line beginning with `{` as shorthand for `jit { ... }`.
-- [x] Add `:language c|toy|wat|wasm|asm`.
-- [x] Make `jit`, explicit `expr`, and bare fallback input honor the selected
- language through `CfreeSourceInput.input_kind`.
-- [x] Add `-x LANG` / `--language LANG` to choose the default REPL language
- before any source file exists.
-- [x] Make `:language` with no argument report the current language and whether
- that language has a cached compile session.
-- [x] Allow an empty initial JIT image; `run` resolves the entry lazily so a
- later snippet can define `main`.
-- [x] Remove driver-side language-specific thunk fabrication for Toy. Frontends
- own REPL expression/block wrapping.
-- [ ] Seed cached compile sessions from initial source-file inputs.
-- [ ] Reuse or persist link-session state where that is needed for incremental
- diagnostics and symbol policy.
-- [ ] Keep DWARF/JIT symbol recovery for inspecting external/preexisting code,
- not as the normal path for declarations typed during the current session.
-- [ ] Add command support for Wasm export invocation.
-
-## C Checklist
-
-Persistent C context must include the preprocessor, file-scope identifiers,
-tags, typedefs, declaration table, and CG symbol/type handles.
-
-- [ ] Change `CFrontend` to own a long-lived `Pool`.
-- [ ] Change `CFrontend` to own a long-lived `Pp`.
-- [ ] Apply command-line include paths, predefined macros, `-D`, and `-U` once
- at frontend creation or first compile, with clear behavior if options change.
-- [ ] Keep the file-scope `Scope` alive across snippets.
-- [ ] Keep `DeclTable` alive across snippets.
-- [ ] Keep one persistent `CfreeCg` and bind a new object with
- `cfree_cg_begin_obj` per snippet.
-- [ ] Split parser initialization from translation-unit parsing so a parser can
- reuse file-scope state with a new lexer.
-- [ ] Ensure failed snippets do not corrupt persistent scope or macro state.
- Initial implementation may mark a session frontend poisoned after hard errors.
-- [ ] Implement `CFREE_FRONTEND_INPUT_REPL_TOPLEVEL` for normal declarations and
- definitions.
-- [ ] Implement `CFREE_FRONTEND_INPUT_REPL_EXPR` by wrapping the expression as:
-
-```c
-unsigned long long __cfree_dbg_expr_N(void) {
- return (unsigned long long)(USER_EXPR);
-}
-```
-
-- [ ] Implement `CFREE_FRONTEND_INPUT_REPL_BLOCK` by wrapping a block as:
-
-```c
-unsigned long long __cfree_dbg_expr_N(void) {
- USER_STATEMENTS
-}
-```
-
-- [ ] Decide whether block mode requires an explicit `return` or permits
- expression-final shorthand.
-- [ ] Support macros across snippets:
- `jit { #define N 7 }`, then bare `N + 1`.
-- [ ] Support typedefs/tags across snippets:
- `jit { typedef struct { int x; } S; S s = {41}; }`, then bare `s.x + 1`.
-- [ ] Support function definitions across snippets:
- `jit { int f(int x) { return x + 1; } }`, then bare `f(41)`.
-- [ ] Diagnose strong redefinition cleanly when a later snippet defines the same
- global function/object.
-- [ ] Add targeted tests under a new `test/dbg` or `test/repl` harness.
-
-## Toy Checklist
-
-Toy is the current working interactive target.
-
-- [x] Change `ToyFrontend` to own persistent parser symbol/type storage instead
- of rebuilding all parser state per compile.
-- [x] Refactor `ToyParser` so lexical state is per snippet but declarations,
- record/enum/type tables, globals, and function symbols persist.
-- [x] Implement `CFREE_FRONTEND_INPUT_REPL_TOPLEVEL` for declarations and
- definitions.
-- [x] Implement `CFREE_FRONTEND_INPUT_REPL_EXPR` by generating:
-
-```toy
-fn __cfree_dbg_expr_N(): i64 {
- return USER_EXPR as i64;
-}
-```
-
-- [x] Implement `CFREE_FRONTEND_INPUT_REPL_BLOCK` using Toy block/function
- syntax and require an explicit return in v1.
-- [x] Preserve global variables across snippets:
- `jit { let x: i64 = 41; }`, then bare `x + 1`.
-- [x] Preserve nominal records/enums/type aliases across snippets.
-- [x] Preserve functions across snippets, including calls from later snippets.
-- [x] Add Toy REPL smoke tests for globals, functions, record field access, and
- expression wrapper.
-- [ ] Keep one persistent `CfreeCg` if Toy needs CG-level metadata beyond the
- parser-owned declaration/type tables.
-- [ ] Add diagnostics for duplicate definitions and type mismatches that include
- the snippet input name.
-- [ ] Add Toy REPL smoke tests for enum constants and block wrapper.
-- [ ] Add a focused test for a failed Toy snippet followed by a successful
- snippet, documenting poison-or-recovery semantics.
-
-## Wasm Checklist
-
-Wasm is different from C/Toy: the user normally supplies complete WAT/Wasm
-modules, not declarations in a source namespace. The ergonomic REPL target is a
-module/session model with export invocation and instance-owned runtime state.
-
-- [ ] Decide v1 interaction model:
- module append plus `invoke EXPORT ARGS...`, not arbitrary Wasm expression
- snippets.
-- [ ] Keep `WasmFrontend` lifecycle-shaped but treat most parser/module state as
- per compile unless a specific cross-module context is introduced.
-- [ ] Preserve instance/runtime state for appended modules where supported:
- memories, tables, globals, start/init calls, and import slots.
-- [ ] Add `dbg` command support for invoking Wasm exports by name with typed
- integer/float arguments.
-- [ ] Define how duplicate export names are handled across appended modules:
- reject, shadow by generation, or require module qualification.
-- [ ] Add module qualification in symbol lookup if multiple modules can export
- the same name.
-- [ ] Add clear diagnostics for unsupported interactive cases:
- relocatable Wasm object input, multi-memory gaps, unsupported proposals, WASI
- startup, and wasm64.
-- [ ] Implement optional `CFREE_FRONTEND_INPUT_REPL_EXPR` later as WAT sugar,
- lowering an expression into a generated module/function.
-- [ ] Add Wasm REPL smoke tests:
- WAT module append, export invocation, start function behavior, memory/data
- persistence, imported function call, and duplicate export diagnostics.
-
-## Shared Acceptance Tests
-
-- [x] `make bin`
-- [x] `make test-cg-api`
-- [x] `make test-link`
-- [x] `make test-asm`
-- [x] `make test-parse`
-- [x] `make test-driver`
-- [ ] `make test-toy` stays green after Toy refactors.
-- [ ] `make test-parse-err test-pp test-pp-err` stay green after C REPL
- frontend work.
-- [ ] New scripted REPL tests cover:
- C macro/type/global persistence, Toy type/global/function persistence, Wasm
- module invoke, bare expression fallback, explicit `expr` alias behavior, block
- wrappers, duplicate definitions, and clean diagnostics after failed snippets.
-- [ ] Manual smoke:
-
-```text
-cfree dbg test.c
-(cfree) :language c
-(cfree) jit { typedef struct { int a; int b; } Point; Point p = {1, 2}; }
-(cfree) p.a + p.b
-$1 = 3 (0x3)
-```
-
-```text
-cfree dbg test.toy
-(cfree) :language toy
-(cfree) jit { let x: i64 = 40; fn inc(v: i64): i64 { return v + 1; } }
-(cfree) inc(x) + 1
-$1 = 42 (0x2a)
-```
-
-```text
-cfree dbg --language wat
-(cfree) jit { (module (func (export "answer") (result i64) i64.const 42)) }
-(cfree) invoke answer
-$1 = 42 (0x2a)
-```
-
-## Notes
-
-DWARF recovery remains useful for inspecting preexisting objects and external
-debug info. It should not be the primary mechanism for normal REPL expressions
-typed during the current session. The persistent frontend has better
-source-level context for macros, typedefs, source-language type aliases,
-front-end-only attributes, and frontend-specific syntax.
diff --git a/doc/LANGS.md b/doc/LANGS.md
@@ -1,479 +0,0 @@
-# Language Frontend Architecture Plan
-
-## Overview
-
-libcfree currently hard-codes two source consumers inside `src/api/pipeline.c`:
-
-- **C** (`CFREE_LANG_C`) — preprocessor → C parser → `CG` → `CGTarget` → `MCEmitter`
-- **Assembly** (`CFREE_LANG_ASM`) — lexer → assembler → `MCEmitter`
-
-The C path is privileged because `CG` (`src/cg/cg.h`) is glued to the internal C
-type system (`src/type/type.h`). To enable *alternative* language frontends we
-will introduce a **new public codegen seam** (`include/cfree/cg.h`) that speaks
-in language-neutral layout descriptors rather than C `Type*`. Frontends live
-under `lang/` and consume only public headers (`include/cfree*.h`); the driver
-registers them at startup.
-
-This plan defines Phase 1 (public ObjBuilder registration) and Phase 2 (the new
-`CfreeCg` API), then scopes a **toy language** (`lang/toy/`) to prove the seam
-before touching the C frontend.
-
-## Directory layout
-
-```
-lang/
- toy/
- lex.c — tokenizer: produces `ToyToken` structs with `CfreeSrcLoc` (line, col)
- parse.c — recursive-descent parser that consumes a token iterator → CfreeCg calls
- type.c — toy type system: int, record, array, pointer
- type.h
- toy.h — public frontend entry: `cfree_toy_compile()`
- Makefile — produces `libcfree_toy.a`
-```
-
-`lang/` is a sibling of `driver/` and `src/`. It only includes `<cfree.h>` and
-`<cfree/cg.h>`. No internal `src/` headers.
-
-## Public API: `include/cfree/cg.h`
-
-This is the **language-neutral codegen surface**. It replaces the internal
-`CG` + `CGTarget` vtable with a stable, typed C API. The implementation lives
-in `src/api/cg.c` (or alongside `pipeline.c`) and adapts calls to the existing
-`CGTarget` machinery.
-
-### Handles
-
-```c
-typedef struct CfreeCg CfreeCg;
-typedef struct CfreeCgType CfreeCgType;
-typedef struct CfreeCgValue CfreeCgValue; /* opaque stack/SSA handle */
-typedef uint32_t CfreeCgLabel;
-#define CFREE_CG_LABEL_NONE 0u
-```
-
-### Type factory
-
-Types are **layout descriptors**, not semantic C types. They carry `(size,
-align, scalar_kind)` and, for aggregates, a flat field list. The backend ABI
-classification derives layout from these descriptors without knowing about C
-qualifiers, bitfields, or tag identity.
-
-```c
-CfreeCgType* cfree_cg_type_i32(CfreeCompiler*);
-CfreeCgType* cfree_cg_type_i64(CfreeCompiler*);
-CfreeCgType* cfree_cg_type_u32(CfreeCompiler*);
-CfreeCgType* cfree_cg_type_u64(CfreeCompiler*);
-CfreeCgType* cfree_cg_type_f32(CfreeCompiler*);
-CfreeCgType* cfree_cg_type_f64(CfreeCompiler*);
-
-/* Pointer: element type + count (0 = single pointer or unknown, >0 array) */
-CfreeCgType* cfree_cg_type_ptr(CfreeCompiler*, CfreeCgType* pointee, uint32_t count);
-
-/* Records (structs, tuples, tagged unions). The caller describes fields in
- declaration order; the backend computes offsets/alignment/padding. */
-typedef struct CfreeCgField {
- CfreeSym name; /* may be 0 for anonymous/tuples */
- CfreeCgType* type;
- uint32_t align_override; /* 0 = natural, 1 = packed */
-} CfreeCgField;
-CfreeCgType* cfree_cg_type_record(CfreeCompiler*,
- CfreeSym tag,
- const CfreeCgField* fields,
- uint32_t nfields);
-
-/* Function type for indirect calls and type-checking. */
-CfreeCgType* cfree_cg_type_func(CfreeCompiler*,
- CfreeCgType* ret,
- CfreeCgType** params,
- uint32_t nparams,
- int variadic);
-```
-
-### CG lifecycle
-
-```c
-/* Construct a CG context bound to an ObjBuilder. */
-CfreeCg* cfree_cg_new(CfreeCompiler*, CfreeObjBuilder* out);
-void cfree_cg_free(CfreeCg*);
-
-/* Function boundaries. `name` is the source-level symbol; the backend applies
- the active object format's C-symbol mangling (e.g. leading `_` on Mach-O). */
-void cfree_cg_func_begin(CfreeCg*, const char* name, CfreeCgType* fn_type);
-void cfree_cg_func_end(CfreeCg*);
-
-/* Source location tracking (sticky until next call). */
-void cfree_cg_set_loc(CfreeCg*, CfreeSrcLoc);
-```
-
-### Value stack
-
-`CfreeCg` owns a TCC-style stack. Every push produces a value; every operation
-consumes and produces values. The stack discipline is the frontend's
-responsibility; the backend manages register allocation and spills.
-
-```c
-/* Literal materialization */
-void cfree_cg_push_int(CfreeCg*, int64_t value, CfreeCgType* type);
-void cfree_cg_push_float(CfreeCg*, double value, CfreeCgType* type);
-
-/* String literals → rodata pointer */
-void cfree_cg_push_bytes(CfreeCg*, const uint8_t* str, size_t len);
-
-/* Addressable storage */
-void cfree_cg_push_local(CfreeCg*, uint32_t slot_id, CfreeCgType* type);
-void cfree_cg_push_global(CfreeCg*, CfreeSym name, CfreeCgType* type);
-
-/* Lvalue/rvalue conversion */
-void cfree_cg_load(CfreeCg*); /* lvalue → rvalue */
-void cfree_cg_addr(CfreeCg*); /* lvalue → pointer rvalue */
-void cfree_cg_store(CfreeCg*); /* pop [addr_or_lvalue, rvalue] */
-
-/* Stack manipulation */
-void cfree_cg_dup(CfreeCg*);
-void cfree_cg_swap(CfreeCg*);
-void cfree_cg_drop(CfreeCg*);
-void cfree_cg_rot3(CfreeCg*);
-```
-
-### Arithmetic, compare, convert
-
-```c
-typedef enum CfreeCgBinOp {
- CFREE_CG_ADD, CFREE_CG_SUB, CFREE_CG_MUL,
- CFREE_CG_SDIV, CFREE_CG_UDIV, CFREE_CG_SREM, CFREE_CG_UREM,
- CFREE_CG_AND, CFREE_CG_OR, CFREE_CG_XOR,
- CFREE_CG_SHL, CFREE_CG_SHR_S, CFREE_CG_SHR_U,
-} CfreeCgBinOp;
-
-typedef enum CfreeCgCmpOp {
- CFREE_CG_EQ, CFREE_CG_NE,
- CFREE_CG_LT_S, CFREE_CG_LE_S, CFREE_CG_GT_S, CFREE_CG_GE_S,
- CFREE_CG_LT_U, CFREE_CG_LE_U, CFREE_CG_GT_U, CFREE_CG_GE_U,
-} CfreeCgCmpOp;
-
-void cfree_cg_binop(CfreeCg*, CfreeCgBinOp);
-void cfree_cg_cmp(CfreeCg*, CfreeCgCmpOp);
-void cfree_cg_convert(CfreeCg*, CfreeCgType* dst);
-```
-
-### Control flow
-
-Labels are numeric handles. The backend maps them to per-arch branch targets or
-SSA blocks.
-
-```c
-CfreeCgLabel cfree_cg_label_new(CfreeCg*);
-void cfree_cg_label_place(CfreeCg*, CfreeCgLabel);
-void cfree_cg_jump(CfreeCg*, CfreeCgLabel);
-void cfree_cg_branch_true(CfreeCg*, CfreeCgLabel); /* pop i1 */
-void cfree_cg_branch_false(CfreeCg*, CfreeCgLabel); /* pop i1 */
-
-/* Structured control flow (optional but recommended). Backends that don't
- consume structure directly (all real ISAs except WASM) lower to labels. */
-typedef uint32_t CfreeCgScope;
-CfreeCgScope cfree_cg_scope_begin(CfreeCg*, const CfreeCgType* result);
-void cfree_cg_scope_end(CfreeCg*, CfreeCgScope);
-void cfree_cg_break(CfreeCg*, CfreeCgScope);
-void cfree_cg_continue(CfreeCg*, CfreeCgScope);
-```
-
-### Aggregate and memory operations
-
-```c
-/* Copy `size` bytes from src_addr to dst_addr. Pops [dst, src]. */
-void cfree_cg_memcpy(CfreeCg*, uint32_t size, uint32_t align);
-
-/* Initialize `size` bytes at addr. Pops addr. */
-void cfree_cg_memset(CfreeCg*, uint8_t val, uint32_t size, uint32_t align);
-
-/* Element access for arrays and records.
- Pops base_addr, pushes addr_of_element. */
-void cfree_cg_index(CfreeCg*, uint32_t elem_size, uint32_t index);
-void cfree_cg_field_addr(CfreeCg*, uint32_t offset);
-```
-
-### Calls and returns
-
-```c
-/* `nargs` values must be on the stack (left-to-right or right-to-left
- depending on the frontend's calling convention choice). `fn_type` is the
- callee's function type; the backend uses it for ABI classification.
- The callee value itself must be the deepest value on the stack, below args. */
-void cfree_cg_call(CfreeCg*, uint32_t nargs, CfreeCgType* fn_type);
-void cfree_cg_tail_call(CfreeCg*, uint32_t nargs, CfreeCgType* fn_type);
-void cfree_cg_ret(CfreeCg*, int has_value); /* has_value=0 for void */
-```
-
-### Inline assembly (future)
-
-```c
-/* TBD: define after the toy language proves the seam. Same constraint model
- as the internal AsmConstraint, but using CfreeCgValue handles instead of
- internal SValues. */
-```
-
-### Frame slots (locals and parameters)
-
-Frontends can allocate frame slots explicitly, or let the backend infer them
-from `cfree_cg_push_local` usage. The explicit API is useful when the frontend
-wants deterministic slot IDs (e.g. for debug variable location):
-
-```c
-uint32_t cfree_cg_local_slot(CfreeCg*, CfreeCgType* type, CfreeSym name);
-uint32_t cfree_cg_param_slot(CfreeCg*, uint32_t index, CfreeCgType* type,
- CfreeSym name);
-```
-
-### Debug info hooks
-
-The toy language will skip debug info in v1, but the API surface must reserve
-room so frontends can emit DWARF later without growing the vtable.
-
-```c
-/* TBD: debug_func_begin, debug_local, debug_param, debug_line.
- For v1 the driver passes debug_info=0 and the CG skips it. */
-```
-
-## Toy language (`lang/toy/`)
-
-### Grammar (v1)
-
-```
-decl ::= fn_decl | global_decl | type_decl
-fn_decl ::= "fn" name "(" param_list ")" (":" type)? block
-param_list ::= (name ":" type ("," name ":" type)*)?
-block ::= "{" stmt* "}"
-stmt ::= let_stmt
- | assign_stmt
- | if_stmt
- | while_stmt
- | break_stmt
- | continue_stmt
- | return_stmt
- | expr_stmt
-let_stmt ::= "let" name ":" type ("=" expr)? ";"
-assign_stmt ::= lvalue "=" expr ";"
-if_stmt ::= "if" expr block ("else" block)?
-while_stmt ::= "while" expr block
-break_stmt ::= "break" ";"
-continue_stmt ::= "continue" ";"
-return_stmt ::= "return" expr? ";"
-expr_stmt ::= expr ";"
-lvalue ::= name (lvalue_r)*
-lvalue_r ::= ("[" expr "]")*
- | ("." name)*
- | (".*")*
-
-global_decl ::= "let" name ":" type "=" expr ";"
-
-type_decl ::= "type" name "=" type ";"
-type ::= "int" | "*" type | "[" number "]" type | record_type
-record_type ::= "{" field_decl ("," field_decl)* "}"
-field_decl ::= name ":" type
-
-expr ::= or_expr
-or_expr ::= and_expr ("||" and_expr)*
-and_expr ::= cmp_expr ("&&" cmp_expr)*
-cmp_expr ::= add_expr (("<" | ">" | "<=" | ">=" | "==" | "!=") add_expr)?
-add_expr ::= mul_expr (("+" | "-") mul_expr)*
-mul_expr ::= unary_expr (("*" | "/" | "%") unary_expr)*
-unary_expr ::= ("-" | "!" | "&") unary_expr | primary
-primary ::= number | string | name | lvalue | "(" expr ")"
-```
-
-### Token representation
-
-```c
-typedef enum ToyTokenKind { TOK_EOF, TOK_FN, TOK_LET, TOK_IF, TOK_INT,
- TOK_IDENT, TOK_NUMBER, TOK_STRING, ... } ToyTokenKind;
-
-typedef struct ToyToken {
- ToyTokenKind kind;
- CfreeSrcLoc loc; /* file_id, line, col */
- const uint8_t* text; /* points into source buffer */
- size_t text_len;
- int64_t int_value; /* valid for TOK_NUMBER */
-} ToyToken;
-```
-
-The lexer tracks `cur`, `end`, `bol` (beginning-of-line), and `line` so that
-every emitted token gets an accurate `CfreeSrcLoc`. The parser holds a
-`ToyLexer` as its token iterator and calls `toy_lexer_next()` to advance,
-keeping the current token in `parser->cur`.
-
-### Semantics
-
-- **One integer type**: `int` is a signed integer whose width equals the target
- pointer width (32-bit on ILP32, 64-bit on LP64). The frontend queries
- `cfree_cg_type_int(compiler, cfree_target_ptr_size(compiler)*8, 1)`.
-- **No implicit conversions**: the parser rejects `int + ptr`.
-- **Records** are value types (like C structs). Assignment copies the whole
- record. Parameter passing follows the target ABI (the backend decides
- direct/indirect/split).
-- **Arrays** are fixed-size and decay to pointers only in subscript and field
- contexts. Array assignment is not allowed.
-- **Pointers**: `ptr` is an untyped pointer (like `void*`). Dereference is
- `*expr` (sugar for `expr[0]`). Typed pointers are a future extension.
-- **Functions**: no forward declarations needed; a single-pass parser resolves
- all function names into globals. Recursion is allowed.
-
-### Frontend pipeline
-
-1. **Lex** (`lex.c`) — token iterator (`ToyLexer`) with 1-char lookahead.
- Every token carries its kind, source span (`text`/`text_len`), `CfreeSrcLoc`
- (line/col), and an `int_value` for number literals. The parser calls
- `toy_lexer_next()` to advance the iterator and inspects the current `ToyToken`.
-2. **Type check** (`type.c`) — minimal bidirectional inference:
- - `let` requires an explicit type (or an initializer from which to infer).
- - Every expression node carries a `ToyType*`.
- - Subscript and field access check bounds/field names at parse time.
-3. **Codegen** (`parse.c` → `CfreeCg`) — single-pass lowering:
- - Globals → `cfree_cg_push_global` + `cfree_cg_store`.
- - Locals → `cfree_cg_local_slot` + `cfree_cg_push_local`.
- - Records/arrays → `cfree_cg_memcpy` or `cfree_cg_memset`.
- - Control flow → scopes and `cfree_cg_branch_true` / `cfree_cg_jump`.
- - Function calls → push callee global, push args, `cfree_cg_call`.
-
-### Entry point
-
-```c
-/* lang/toy/toy.h */
-#include <cfree.h>
-
-int cfree_toy_compile(CfreeCompiler*, const CfreeCompileOptions*,
- const CfreeBytesInput* input, CfreeObjBuilder* out);
-```
-
-This matches the signature planned for `cfree_register_frontend`.
-
-## Driver registration (Phase 1.5)
-
-### New public entry
-
-```c
-/* include/cfree.h, near the CfreeLanguage enum */
-typedef int (*CfreeCompileFn)(CfreeCompiler*, const CfreeCompileOptions*,
- const CfreeBytesInput*, CfreeObjBuilder* out);
-
-/* Register a frontend for a language tag. Overwrites any prior registration
- for that tag. Returns 0 on success, 1 on OOM or bad args. */
-int cfree_register_frontend(CfreeCompiler*, CfreeLanguage, CfreeCompileFn);
-```
-
-`CfreeLanguage` grows:
-
-```c
-typedef enum CfreeLanguage {
- CFREE_LANG_C = 0,
- CFREE_LANG_ASM = 1,
- CFREE_LANG_TOY = 2, /* new */
-} CfreeLanguage;
-```
-
-### Driver changes
-
-`driver/main.c` gains a registration hook invoked at tool startup:
-
-```c
-static void driver_register_frontends(CfreeCompiler* c) {
- cfree_register_frontend(c, CFREE_LANG_ASM, internal_compile_asm);
- cfree_register_frontend(c, CFREE_LANG_C, internal_compile_c);
- cfree_register_frontend(c, CFREE_LANG_TOY, cfree_toy_compile);
-}
-```
-
-`cfree_compile_obj` in `src/api/pipeline.c` changes from a hard-coded switch
-to a table dispatch:
-
-```c
-static void compile_into(Compiler* c, const CfreeCompileOptions* opts,
- const CfreeBytesInput* input, ObjBuilder* ob) {
- CfreeCompileFn fn = compiler_get_frontend(c, input->lang);
- if (fn) {
- CfreeObjBuilder* pub_ob = ob;
- int rc = fn(c, opts, input, pub_ob);
- if (rc != 0) panic(...);
- return;
- }
- /* fallback for unknown language */
- panic(...);
-}
-```
-
-`cfree_language_for_path` learns `.toy` → `CFREE_LANG_TOY`.
-
-### Build integration
-
-The Makefile grows a `lang/` target:
-
-```makefile
-# lang/toy/Makefile
-libcfree_toy.a: toy.o lex.o parse.o type.o
- $(AR) rcs $@ $^
-
-toy.o: toy.c $(CFREE_INCLUDES)
- $(CC) $(CFLAGS) -I../../include -c $< -o $@
-```
-
-The top-level `Makefile` adds `lang/toy/libcfree_toy.a` to `LIBCFREE_OBJS` (or
-links it into `libcfree.a` if we decide frontends are part of the core library).
-Initially we keep it as a separate static archive that the driver links against.
-
-## Migration path for C (Phase 3, after toy proves the seam)
-
-Once `lang/toy/` compiles and links end-to-end, we can migrate the C frontend:
-
-1. Move `src/parse/`, `src/pp/`, `src/lex/`, `src/decl/` into `lang/c/`.
-2. Rename internal `parse_c` to `cfree_c_compile` with the `CfreeCompileFn`
- signature.
-3. Build an **internal adapter layer** that translates C `Type*` into
- `CfreeCgType*` before calling the public `CfreeCg` methods. Initially this
- adapter can live in `lang/c/cg_adapter.c`.
-4. The internal `CG` layer (`src/cg/cg.h`) is either:
- - retired and replaced by the public `CfreeCg` implementation, or
- - kept as a private fast-path for the C frontend if the adapter overhead is
- unacceptable.
-5. Assembly remains in core (`src/api/pipeline.c` or a thin `lang/asm/`
- wrapper) because it bypasses CG entirely and talks to `MCEmitter`.
-
-## Inline assembly
-
-Inline assembly stays internal to the C frontend for now. The public `CfreeCg`
-API can grow an `asm_block` method later, but it is not required for the toy
-language. When it arrives, the signature will mirror the internal
-`CGTarget.asm_block` but use `CfreeCgValue` handles and `const char*` strings
-instead of internal `Operand` / `Sym`.
-
-## Open questions / TBD
-
-1. **String interning (`Sym`)**: The public API uses `const char*` everywhere.
- The adapter must map names to internal `Sym` ids efficiently (a temporary
- hash map inside `CfreeCg` is fine for v1).
-2. **Panic boundary**: Every public `cfree_cg_*` call is a thin wrapper that
- saves/restores `c->panic` around the internal work, exactly like
- `cfree_compile_obj` today.
-3. **Optimization wrapper**: How does `opt_level` reach the public CG? The
- `CfreeCompileOptions` already carries `opt_level`; the `CfreeCg` constructor
- can wrap the underlying `CGTarget` with `opt_cgtarget_new` internally.
-4. **Debug info**: Toy v1 skips debug info. When we add it, the public API will
- expose `CfreeDebugBuilder` handles that frontends populate with
- language-neutral type and location records.
-5. **Memory ownership of `CfreeCgType`**: Types should be arena-allocated from
- the compiler's scratch arena and valid until `cfree_cg_free`. The public
- surface should not require the frontend to free them.
-6. **Variadics / atomics**: These stay in the `CfreeCg` API but are
- optional for the toy language. They are needed when C is lifted later.
-7. **ELF symbol visibility**: The toy frontend emits globals as
- `SB_GLOBAL`/`SK_OBJ` by default. No linkage modifiers in v1.
-
-## Acceptance criteria for toy v1
-
-- [ ] `include/cfree/cg.h` exists and compiles.
-- [ ] `src/api/cg.c` implements the public API over existing `CGTarget`.
-- [ ] `lang/toy/` compiles to `libcfree_toy.a` using only public headers.
-- [ ] `driver/cc.c` (or a new `driver/toyc.c`) can compile `.toy` files:
- `cfree toyc -c hello.toy -o hello.o`
-- [ ] `cfree run` can JIT a `.toy` file and execute `main()`.
-- [ ] No changes to the C frontend or internal `src/cg/cg.h` API surface.
-- [ ] Existing test suite (`make test-asm test-lex test-parse test-pp test-cg
- test-link test-elf`) passes unchanged.
diff --git a/doc/LOCALS.md b/doc/LOCALS.md
@@ -1,127 +0,0 @@
-# LocalId Design
-
-## Goal
-
-Make source locals independent from concrete frame slots.
-
-Today many frontend paths treat a local as a `FrameSlot`. That makes every
-local memory-backed even when CG is recording to `opt_cgtarget`, where
-`CGTarget.virtual_regs` already means CG can mint unbounded virtual registers
-and leave physical allocation to opt.
-
-The new model introduces `LocalId` as the source-local identity. A `LocalId` is
-the mutable lvalue for a local variable. Its storage policy is chosen once at
-allocation time from the active `CGTarget`; it does not transition later.
-
-## Storage Policy
-
-Storage policy is a function of `CGTarget.virtual_regs`:
-
-- `virtual_regs != 0`: eligible locals are virtual-register locals.
-- `virtual_regs == 0`: locals are frame-slot locals.
-
-This keeps `-O0` single-pass. Direct machine CG never needs source-local
-liveness, local register pressure handling, or control-flow join repair. It
-continues to use frame slots for locals and the existing expression vstack
-spill path for temporary register pressure.
-
-The opt path records mutable local values in virtual registers. Register
-pressure and frame placement are handled later by opt/lowering/regalloc, where
-CFG and liveness information exist.
-
-## LocalId Shape
-
-Internally, a local needs roughly:
-
-```c
-typedef enum LocalStorageKind {
- LOCAL_STORAGE_FRAME,
- LOCAL_STORAGE_VREG,
-} LocalStorageKind;
-
-typedef struct Local {
- CfreeCgTypeId type;
- CfreeSym name;
- uint32_t flags;
- LocalStorageKind storage;
- FrameSlot slot; /* valid for LOCAL_STORAGE_FRAME */
- Reg vreg; /* valid for LOCAL_STORAGE_VREG */
-} Local;
-```
-
-`LocalId` indexes this table. It is not a `FrameSlot`, and `OPK_LOCAL` should
-remain the concrete frame-memory operand kind. If a local is frame-backed,
-pushing it produces the same lvalue shape that `cfree_cg_push_local` produces
-today. If a local is virtual-register-backed, pushing it produces a mutable
-register local lvalue.
-
-## Lvalue Semantics
-
-`LocalId` is the source lvalue. The backing store decides how load, store, and
-address operations lower:
-
-- frame local load/store: operate on the frame slot.
-- virtual-register local load: read the current virtual register value.
-- virtual-register local store: copy/define the local's virtual register value.
-- address of a frame local: address the frame slot.
-- address of a virtual-register local: unsupported unless the frontend has
- already selected frame storage for that local.
-
-Virtual-register locals are mutable pseudo-locals, not SSA values at the public
-CG layer. If opt later wants SSA, it must build SSA from this mutable local
-stream. `LocalId` should not force direct CG to implement phi insertion.
-
-## Addressable Locals
-
-A local that can require an address must be frame-backed. Since `-O0` must stay
-single-pass and this design has no storage transition, the frontend/API needs a
-declaration-time way to request addressable storage.
-
-Use the existing slot-style attributes as the semantic source:
-
-- `CFREE_CG_SLOT_ADDRESS_TAKEN`: force frame storage.
-- aggregate, VLA, `alloca`-like, volatile, ABI-required memory objects, and
- compiler temporaries that need an address: force frame storage.
-- scalar locals without addressable requirements may use virtual-register
- storage when `CGTarget.virtual_regs` is set.
-
-This may be conservative. Correctness is more important than promoting every
-possible scalar. Later analysis can mark more locals register-eligible before
-creating them, but CG itself should not need to discover that mid-stream.
-
-## Public/API Direction
-
-The API should grow local handles distinct from slot handles:
-
-```c
-CfreeCgLocal cfree_cg_local(CfreeCg*, CfreeCgTypeId type,
- CfreeCgSlotAttrs attrs);
-void cfree_cg_push_local_id(CfreeCg*, CfreeCgLocal local);
-```
-
-Compatibility can keep `cfree_cg_local_slot` and `cfree_cg_push_local` as the
-explicit frame-slot path. C frontend migration should move source automatic
-variables to `LocalId`; explicit stack objects and address-required temporaries
-can keep using slots.
-
-## Non-Goals
-
-- No `REG -> FRAME` local transition in direct CG.
-- No O0 local spilling for register-backed source locals.
-- No phi or join repair in direct CG.
-- No change to `OPK_LOCAL` meaning; it remains concrete frame memory.
-- No guarantee that every scalar local becomes a virtual register. Addressable
- or otherwise memory-required locals stay frame-backed.
-
-## Implementation Order
-
-1. Add the internal `LocalId` table and public/internal handle plumbing.
-2. Route local allocation through the storage policy above.
-3. Teach push/load/store/address paths to handle frame-backed and vreg-backed
- locals.
-4. Update the C parser adapter so normal automatic variables use `LocalId`
- while explicit frame objects remain slots.
-5. Keep O0 behavior equivalent by verifying `virtual_regs == 0` still allocates
- frame-backed locals only.
-6. Add opt-path tests that scalar locals record as virtual-register locals and
- address-required locals still record frame slots.
diff --git a/doc/OPT_REGS_CALL_PLAN.md b/doc/OPT_REGS_CALL_PLAN.md
@@ -1,592 +0,0 @@
-# OPT Register And Call Constraint Plan
-
-This plan expands the O1 register-allocation contract so the optimizer can use
-nearly all target registers safely. It combines two structural changes:
-
-1. targets expose a richer physical register file instead of a small pre-filtered
- allocable pool; and
-2. calls are lowered into opt-visible fixed-register, stack-argument, and clobber
- constraints before register allocation.
-
-The goal is not to move to a full target machine IR immediately. The goal is to
-make the current O1 path honest about target constraints while keeping replay and
-backend emission intact enough to migrate one architecture at a time.
-
-## Current Status
-
-The correctness foundation for register preservation and the first planned-call
-replay path are implemented. Targets now expose descriptive physical-register
-metadata, per-call clobber masks, return-register masks, callee-save masks, and
-call plans. O1 records each call plan during `machinize`, builds its current
-hard-register tables from `CGPhysRegInfo`, uses target save/use costs in
-allocation scoring, and preserves hard-assigned live-across-call values by
-intersecting the assigned register with the planned call's clobber mask.
-Post-RA hard-register liveness uses the same call-specific clobber mask.
-
-For supported call plans, O1 now replays calls by materializing
-arguments with a local parallel-copy resolver, invoking backend stack-argument
-and branch-only call-plan hooks, and extracting non-tail returns from fixed
-return registers. Address-valued call moves cover byval/indirect arguments and
-hidden sret destination pointers. Tail calls use the same setup and planned
-branch path, with no return extraction. The x64, AArch64, and RV64 backends implement
-`store_call_arg` for outgoing stack slots and `emit_call_plan` for the call
-branch.
-
-What this closes:
-
-- the register-preservation correctness issue for values live across calls;
-- target-provided physical-register metadata as the source for O1 register
- tables;
-- call-plan construction for scalar integer/FP/direct/indirect/byval/sret-shaped
- calls in the current descriptor model;
-- conservative allocation scoring that can choose caller-saved registers when
- rewrite can preserve them, while still preferring callee-saved registers for
- call-crossing values.
-
-What remains open:
-
-- call setup/return extraction are represented by call-plan aux data rather
- than separate first-class IR ops;
-- target `get_phys_regs` tables expose broader O1 pools, and incoming
- parameter functions can now allocate ABI argument/return registers with
- opt-side constraints for sequential parameter-copy hazards;
-- direct CG still uses legacy allocation/call hooks;
-- code-shape probes remain to be added.
-
-In phase terms: Phase 1 and Phase 2 are done, Phase 3 is implemented through
-call-plan aux visibility plus planned replay for supported call shapes, Phase 4
-is implemented for register, stack, sret, tail-call, and return moves,
-Phase 5 is implemented for call setup/replay, and Phase 6 remains open.
-
-## Planned Call Replay Boundary
-
-The legacy backend `call` hook is no longer used by O1 replay. Calls that reach
-optimized replay must have a supported plan; unsupported planned shapes fail
-diagnostically instead of falling back to sequential backend lowering. Direct CG
-continues to use the legacy `call` hook while it is migrated separately.
-
-Planned replay is used only when all of the following are true:
-
-- the call has a valid `CGCallPlan`;
-- the backend provides `emit_call_plan`;
-- every stack argument destination has backend `store_call_arg` support;
-- every offset/address-valued argument source has backend `load_call_arg`
- support;
-- every offset aggregate return store has backend `store_call_ret` support;
-- every return destination is a register, local, or indirect operand.
-
-For those calls, O1 owns the setup and extraction sequence:
-
-- source operands are rewritten to hard registers or spill slots;
-- live-across-call hard registers are saved before argument setup;
-- argument moves into ABI registers and outgoing stack slots are resolved as a
- local parallel copy;
-- indirect callees that would be overwritten by argument setup are copied to a
- target-provided scratch register first;
-- the backend emits only required call metadata and the branch through
- `emit_call_plan`;
-- non-tail return registers are copied or stored into their planned destinations;
-- tail calls stop after the planned branch and have no return extraction.
-
-The legacy `call` path is still required for:
-
-- **direct CG**: direct codegen still uses the old backend allocation and call
- hooks while O1 migrates first.
-
-This boundary lets Phase 3/4 tests exercise register argument permutation,
-outgoing stack arguments, sret hidden pointers, indirect-callee clobber hazards,
-call-specific clobber preservation, and return extraction without broadening
-the register file across legacy call lowering.
-
-## Current Problem
-
-The current `CGTarget` contract exposes:
-
-- `get_allocable_regs`;
-- `get_scratch_regs`;
-- `is_caller_saved`;
-- `plan_hard_regs` / `reserve_hard_regs`;
-- `call_stack_size`.
-
-That contract is too coarse for optimizer-driven allocation. A target has to hide
-registers that are perfectly usable in most instructions because they are unsafe
-for some call-lowering or helper-lowering cases.
-
-Examples:
-
-- ABI argument and return registers are useful for short-lived values, but current
- call emitters copy arguments sequentially into those same registers.
-- scratch registers are hidden globally even when only a small subset of target
- operations need them.
-- callee-saved registers are cheap for values live across calls but expensive for
- one-use temporaries in leaf or tiny functions.
-- the allocator can avoid caller-saved registers for call-crossing values, but it
- has no target-provided save/restore cost or call-specific clobber masks.
-
-The result is conservative and correct, but it forces unnecessary prologue and
-epilogue traffic in small O1 functions.
-
-## Design Goals
-
-- Keep O1 fast and range-based.
-- Let each target expose all general allocatable physical registers, excluding
- only permanently reserved registers such as stack pointer, frame pointer when
- fixed, zero registers, platform registers, and architectural non-registers.
-- Make ABI argument, return, and call-clobber effects explicit before liveness and
- allocation.
-- Make call argument moves parallel rather than sequential.
-- Preserve the existing backend ownership of final prologue, epilogue, frame
- layout, and machine-code emission during this migration.
-- Avoid target-specific register knowledge in opt beyond data supplied by the
- target.
-- Keep direct CG usable while opt grows the richer contract.
-
-Non-goals for this plan:
-
-- full machine IR;
-- global coalescing;
-- live-range splitting;
-- instruction scheduling;
-- target-specific peephole rewrites beyond the call boundary.
-
-## New Target Register Contract
-
-Add a register-file description that replaces the allocation-policy meaning of
-`get_allocable_regs`. The old hook can remain as a compatibility wrapper during
-migration.
-
-```c
-typedef enum CGPhysRegFlag {
- CG_REG_ALLOCABLE = 1u << 0,
- CG_REG_CALLER_SAVED = 1u << 1,
- CG_REG_CALLEE_SAVED = 1u << 2,
- CG_REG_ARG = 1u << 3,
- CG_REG_RET = 1u << 4,
- CG_REG_TEMP_PREFERRED = 1u << 5,
- CG_REG_PLATFORM = 1u << 6,
- CG_REG_RESERVED = 1u << 7,
-} CGPhysRegFlag;
-
-typedef struct CGPhysRegInfo {
- Reg reg;
- u8 cls; /* RegClass */
- u8 abi_index; /* arg/ret order when applicable, otherwise 0xff */
- u16 flags; /* CGPhysRegFlag */
- u16 save_cost; /* relative prologue/epilogue cost if callee-saved */
- u16 use_cost; /* relative preference cost for ordinary allocation */
-} CGPhysRegInfo;
-```
-
-New target hooks:
-
-```c
-void (*get_phys_regs)(CGTarget*, RegClass, const CGPhysRegInfo** out,
- u32* nregs);
-u32 (*call_clobber_mask)(CGTarget*, const CGCallDesc*, RegClass);
-u32 (*return_reg_mask)(CGTarget*, const ABIFuncInfo*, RegClass);
-u32 (*callee_save_mask)(CGTarget*, RegClass);
-```
-
-The exact masks may need to grow beyond `u32` if future architectures expose
-larger register files, but `u32` matches the current register numbering model and
-keeps this step consistent with existing code.
-
-Target policy:
-
-- AArch64 should expose normal integer allocation candidates from `x0-x28`,
- excluding `sp`, `x29`, `x30`, and platform-reserved registers as needed. `x16`
- and `x17` can be marked temp-preferred or reserved until helper scratch
- clobbers are modeled.
-- AArch64 FP should expose `v0-v31`, reserving only registers that target helper
- expansion still requires globally.
-- x64 should expose caller-saved and callee-saved GPRs except fixed `rsp/rbp` and
- any helper-reserved registers still hidden during migration. It should expose
- XMM registers with SysV all-caller-saved metadata.
-- RV64 should expose `a*`, `t*`, `s*`, and `f*` equivalents, excluding `sp`,
- fixed `s0` when used as frame pointer, `ra` unless explicitly modeled, `gp`,
- `tp`, and zero.
-
-## Opt Register Policy
-
-`opt_machinize` should build per-class register tables from `CGPhysRegInfo`:
-
-- physical register list;
-- caller-saved mask;
-- callee-saved mask;
-- reserved mask;
-- argument mask;
-- return mask;
-- save/use costs.
-
-The O1 allocator should keep its interval assignment model, but candidate
-register scoring should change from pure target order to a target-informed cost:
-
-```text
-base use cost
-+ callee-save open cost if this function has not already used that reg
-+ caller-save crossing cost if value is live across calls
-+ fixed/tied penalty rules
-+ spill/reload alternative cost
-```
-
-Hard requirements:
-
-- values live across a call may use caller-saved registers only if rewrite can
- preserve them at that call;
-- non-call-crossing values should generally prefer caller-saved registers to
- avoid function-wide callee-save traffic;
-- once a callee-saved register is already used in the function, later allocations
- may treat its save cost as already paid;
-- tied/fixed registers from ABI lowering and inline asm remain mandatory.
-
-This can land without a global coalescer. It gives the current allocator enough
-information to make better choices while preserving its O1 compile-time shape.
-
-## Opt-Visible Call Plan
-
-Add a target hook that converts a `CGCallDesc` into a call plan before liveness
-and allocation:
-
-```c
-typedef enum CGCallPlanLocKind {
- CG_CALL_PLAN_REG,
- CG_CALL_PLAN_STACK,
- CG_CALL_PLAN_IGNORE,
-} CGCallPlanLocKind;
-
-typedef enum CGCallPlanSrcKind {
- CG_CALL_PLAN_SRC_VALUE,
- CG_CALL_PLAN_SRC_ADDR,
-} CGCallPlanSrcKind;
-
-typedef struct CGCallPlanMove {
- Operand src; /* virtual value, local, indirect, imm, or global */
- u8 dst_kind; /* CGCallPlanLocKind */
- u8 src_kind; /* CGCallPlanSrcKind: value vs address materialization */
- u8 cls; /* RegClass for register destinations */
- Reg dst_reg; /* valid for CG_CALL_PLAN_REG */
- u32 src_offset; /* byte offset within aggregate source */
- u32 stack_offset; /* valid for CG_CALL_PLAN_STACK */
- MemAccess mem; /* width/sign for loads/stores */
-} CGCallPlanMove;
-
-typedef struct CGCallPlanRet {
- Operand dst; /* virtual destination in current IR */
- u8 cls;
- Reg src_reg;
- u32 dst_offset; /* byte offset within aggregate destination */
- MemAccess mem;
-} CGCallPlanRet;
-
-typedef struct CGCallPlan {
- CGCallPlanMove* args;
- u32 nargs;
- CGCallPlanRet* rets;
- u32 nrets;
- Operand callee;
- u32 clobber_mask[OPT_REG_CLASSES];
- u32 return_mask[OPT_REG_CLASSES];
- u32 stack_arg_size;
- u8 variadic_fp_count;
- u8 is_variadic;
- u8 has_sret;
-} CGCallPlan;
-```
-
-Target hook:
-
-```c
-void (*plan_call)(CGTarget*, const CGCallDesc*, CGCallPlan* out);
-```
-
-The target remains the authority for ABI classification and stack layout. Opt
-becomes the authority for scheduling the moves and preserving live values around
-the call.
-
-Lowering shape:
-
-```text
-CALL_SETUP_BEGIN
-parallel copies: virtual/local/imm -> ABI arg regs or outgoing stack slots
-CALL target, implicit uses arg regs, implicit defs return regs,
- implicit clobbers call clobber mask
-parallel copies: ABI return regs -> virtual/local destinations
-CALL_SETUP_END
-```
-
-This can be represented either as new IR ops or as expanded existing `IR_COPY`,
-`IR_STORE`, and `IR_CALL` ops with call aux data carrying implicit masks. The new
-IR-op route is clearer and easier to test.
-
-## Parallel Move Resolver
-
-Call argument setup must not use the current sequential backend copy model once
-ABI registers become allocable.
-
-Add a generic opt pass for parallel copies:
-
-- inputs are `(src operand, dst operand)` pairs;
-- destinations may be physical registers or stack argument slots;
-- sources may be virtual/hard registers, locals, indirect operands, immediates,
- or globals;
-- cycles are broken with a target-provided temporary register or spill slot;
-- memory-to-memory copies route through a temporary;
-- stack stores are ordered after any register loads that depend on stack source
- addresses they could overwrite.
-
-For O1, this resolver can be local to call setup and return extraction. It does
-not need to become a general coalescing pass in the first implementation.
-
-## Rewrite And Preservation
-
-The current rewrite inserts stores/loads for hard-assigned caller-saved values
-known to be live across calls. With call plans, this should become:
-
-- for each call, compute values live across the call;
-- intersect their assigned hard registers with the call plan's clobber mask;
-- exclude values defined by the call return;
-- emit save/restore only for that call-specific intersection.
-
-This keeps preservation precise for:
-
-- direct calls;
-- indirect calls;
-- varargs calls;
-- target-specific helper calls if they use a different clobber mask later.
-
-The allocator should still use live-across-call frequency, but correctness should
-come from per-call clobber masks in rewrite.
-
-## Backend Emission Changes
-
-Backends should gain emission hooks for an already-planned call:
-
-```c
-void (*load_call_arg)(CGTarget*, Operand dst, const CGCallPlanMove*);
-void (*store_call_arg)(CGTarget*, const CGCallPlanMove*);
-void (*store_call_ret)(CGTarget*, const CGCallPlanRet*, Operand src);
-void (*emit_call_plan)(CGTarget*, const CGCallPlan*);
-```
-
-For the current transition, these hooks assume register arguments have already
-been materialized by opt and stack arguments are written one planned move at a
-time through `store_call_arg`. `load_call_arg` and `store_call_ret` are the
-offset-aware load/store hooks for aggregate parts and address-valued moves.
-`emit_call_plan` only emits:
-
-- required varargs metadata such as x64 `AL`;
-- direct or indirect call branch;
-- target-specific call relocation.
-
-It emits no sequential argument copies and no return copies.
-
-Direct CG can keep using the existing `call` hook until it is migrated or until a
-thin wrapper builds and emits a call plan internally.
-
-## Migration Phases
-
-### Phase 1 - Register Description Without Behavior Change
-
-Status: done. `CGPhysRegInfo` and `get_phys_regs` exist, x64/AArch64/RV64
-provide current-pool metadata, and `opt_machinize` consumes it with legacy
-fallbacks. Focused opt tests cover metadata consumption.
-
-- Add `CGPhysRegInfo` and `get_phys_regs`.
-- Implement it for x64, AArch64, and RV64 using the current exposed pools first.
-- Build opt's current hard-reg tables from the richer description.
-- Keep `get_allocable_regs`, `get_scratch_regs`, and `is_caller_saved` as
- wrappers.
-- Add tests that inspect target register metadata for each architecture.
-
-Expected result: no codegen behavior change.
-
-### Phase 2 - Call Plan Construction
-
-Status: done. `CGCallPlan`, `plan_call`, call clobber masks, return masks, and
-callee-save masks exist for the three native backends. O1 attaches plans during
-`machinize`, and the opt tests cover plan attachment plus downstream planned
-replay/fallback behavior.
-
-- Add `CGCallPlan` and `plan_call`.
-- Implement call planning for simple direct scalar integer and FP args/returns on
- all three architectures.
-- Keep backend `call` emission unchanged.
-- Add dump tests that verify planned arg regs, return regs, clobber masks, and
- outgoing stack size.
-
-Expected result: opt can see call constraints, but does not allocate differently
-yet.
-
-### Phase 3 - Opt IR Call Constraints
-
-Status: implemented for the current aux-data representation. Calls carry plan
-aux data before liveness/allocation. Liveness, rewrite, hard-register DCE, and
-hard-register liveness inspect plan operands for supported planned calls, while
-rewrite uses the call-specific clobber mask to save live-across-call hard values
-before argument setup. The implementation keeps explicit setup/call/return-copy
-IR ops as a possible later cleanup rather than a prerequisite.
-
-- done: attach plan aux data to `IR_CALL` during `machinize`;
-- done: teach liveness/range building to use planned source and destination
- operands when planned replay is enabled;
-- done: model call clobbers through the call-specific plan mask;
-- done: keep the legacy `call` path behind a fallback for unsupported call-plan
- shapes;
-- still optional: split setup/call/return extraction into separate IR ops if the
- aux-data representation becomes too opaque for later passes.
-
-Expected result: correctness coverage for arg-register hazards before the
-allocator starts using those registers widely.
-
-### Phase 4 - Parallel Copy Resolver
-
-Status: implemented. O1 replay uses a local
-parallel-copy resolver for planned call setup and return extraction, including
-register-register cycles, local/indirect loads, address-valued moves,
-immediates, globals, register and outgoing stack destinations, local/indirect
-return destinations, and indirect callees that occupy a destination argument
-register. Tail-call plans use the same setup and planned branch path, then skip
-return extraction.
-
-- done: implement local parallel move resolution for register call setup and
- return extraction;
-- done: support register-register cycles, local/indirect loads,
- address-valued moves, immediates, globals, outgoing stack stores, and
- local/indirect return stores;
-- done: use target-provided scratch registers to break cycles and preserve
- indirect callees;
-- done: add red-green tests for argument permutation cycles, indirect callees in
- argument registers, stack-argument replay, and address-valued args;
-- done: support `CG_CALL_PLAN_STACK` materialization directly in opt;
-- done: add return-register collision, stack-source hazard, and tail-call replay
- tests.
-
-Expected result: ABI arg and return registers can be made allocable safely.
-
-### Phase 5 - Broaden Register Exposure
-
-Status: implemented for call setup and incoming scalar parameter setup. O1 has
-target-informed scoring and per-call preservation, and the native target
-phys-reg tables now expose broader O1 pools. Known backend helper scratch
-registers remain hidden. ABI arg/return registers are available to O1.
-Functions with incoming parameters keep those registers allocable, and opt
-forbids earlier parameter values from being assigned to later incoming
-ABI registers that the backend still copies
-sequentially.
-
-- done: expand target `get_phys_regs` tables with guarded caller-saved and ABI
- registers for x64, AArch64, and RV64;
-- done: update opt scoring to prefer caller-saved regs for non-call-crossing
- values and callee-saved regs for call-crossing values;
-- done: keep known backend helper scratch registers reserved until their
- clobbers are expressed;
-- done: remove call-driven ABI-reg suppression for stack and sret call plans;
-- done: remove incoming-parameter ABI-reg suppression by modeling parameter
- incoming-register clobber hazards in opt allocation constraints;
-- done: remove the legacy tail-call fallback ABI-reg suppression by replaying
- tail-call setup through call plans;
-- still needed: add code-shape tests for direct-call tiny functions and
- unused-param functions across x64, AArch64, and RV64.
-
-Expected result: fewer callee-save prologue/epilogue pairs without sacrificing
-call correctness.
-
-### Phase 6 - Remove Legacy Pool Semantics
-
-Status: open. Legacy `get_allocable_regs`, `get_scratch_regs`,
-`is_caller_saved`, and `call` remain active for direct CG and fallback replay.
-
-- Convert direct CG to either use `CGPhysRegInfo` or build call plans internally.
-- Remove allocation-policy dependence on `get_allocable_regs`.
-- Restrict `get_scratch_regs` to legacy direct-CG fallback, then remove it once
- backend helper clobbers are modeled.
-- Make `reserve_hard_regs` consume actual replay-visible hard registers as it
- does today, but derive preservation decisions from the richer register metadata.
-
-Expected result: one target register contract serves direct CG, opt, and future
-O2 allocation.
-
-## Test Plan
-
-Focused unit tests:
-
-- done: opt-side target register metadata consumption;
-- done: caller-saved live-across-call preservation using per-call masks;
-- done: planned-call replay through `emit_call_plan` for register-argument
- cycles, stack arguments, address-valued args, sret-shaped plans,
- return-register collisions, stack-argument source hazards, and
- indirect-callee/argument-register hazards;
-- still needed: target register metadata tests per real architecture;
-- done: broader real-architecture call-plan layout for scalar, FP, mixed,
- sret, variadic, and stack-arg calls;
-- still needed: direct call-clobber mask tests per real architecture;
-- still needed: code-shape probes after ABI registers are exposed broadly;
-- still needed: callee-save reservation/code-shape tests after broadened
- allocation.
-
-Code-shape probes:
-
-- `int f(int x) { return 42; }`;
-- `static int callee(int x) { return x + 1; }`
- plus `int caller(int x) { return callee(x) + 2; }`;
-- multiple non-call-crossing locals under pressure;
-- one value live across a call plus several short-lived call-local values;
-- FP argument and return variants.
-
-Targeted runs:
-
-```sh
-make test-opt
-make test-cg-api
-make test-toy
-make test-aa64-inline
-make test-smoke-x64
-make test-smoke-rv64
-```
-
-## Risks And Open Questions
-
-- The current call emitters still contain target-specific scratch assumptions.
- Those assumptions must either become call-plan constraints or stay reserved
- until later.
-- x64 has implicit call metadata for variadic calls (`AL`) and helper scratch use
- around memory copies; both need explicit representation.
-- AArch64 policy for `x16`/`x17` and the platform register differs by OS and
- relocation model. The register metadata must be target-OS aware.
-- RV64 `ra`, `gp`, `tp`, `s0`, and `zero` should remain reserved unless the backend
- grows explicit support for them.
-- Stack argument stores can alias frame or outgoing areas in awkward cases. The
- call-plan stack area should remain target-owned, with opt only scheduling the
- materialization.
-- Debug info and unwind data should continue to be backend-owned. Opt only tells
- the backend which hard registers are actually live in emitted code.
-
-## Recommended First Patch Stack
-
-Completed:
-
-1. Add `CGPhysRegInfo` plus current-pool metadata for all three targets.
-2. Teach `opt_machinize` to consume the new metadata.
-3. Add `CGCallPlan` and plan calls without using it for emission.
-4. Use call-plan clobber masks for rewrite and post-RA hard-register liveness.
-5. Replay call plans in opt, including ABI register setup, outgoing
- stack arguments, address-valued byval/indirect/sret moves, and return
- extraction.
-6. Remove call-driven ABI-reg suppression for stack-argument and sret-shaped
- calls.
-7. Add call-plan layout/dump tests for real x64/AArch64/RV64 scalar, FP, mixed,
- sret, variadic, and stack-arg cases.
-8. Add red-green hazard tests for return-register collisions and stack-argument
- sources.
-9. Remove incoming-parameter ABI-reg suppression with opt-side constraints for
- incoming parameter copy hazards.
-10. Replay tail-call plans in opt and remove O1's legacy backend `call`
- fallback.
-
-Next patch stack:
-
-1. Migrate direct CG or wrap it with internal call planning, then remove legacy
- pool semantics.
-
-This order keeps each step testable and avoids mixing API migration, allocation
-policy, and call move correctness in one change.
diff --git a/doc/RT_CFREERT_CHECKLIST.md b/doc/RT_CFREERT_CHECKLIST.md
@@ -1,113 +0,0 @@
-# Building libcfree_rt.a With cfree
-
-Goal: build the runtime archive with cfree's own `cc`, `as`, and `ar` instead
-of clang/llvm-ar. This is separate from stage-2 self-hosting of the main
-compiler binary.
-
-Current focused probe: `aarch64-apple-darwin`. This variant is useful because
-it covers LP64, int128 declarations, coroutine assembly, Mach-O symbol names,
-and the freestanding runtime headers without requiring a system SDK.
-
-## Probe Command
-
-```sh
-make bin
-rm -rf build/rt/aarch64-apple-darwin
-env CC="$PWD/build/cfree cc" \
- RT_AS="$PWD/build/cfree as" \
- RT_AR="$PWD/build/cfree ar" \
- make -e -k rt-aarch64-apple-darwin \
- RT_COMMON_CFLAGS= \
- RT_AS_COMPILE_FLAGS=
-```
-
-`RT_COMMON_CFLAGS=` drops flags cfree does not accept yet:
-`-ffreestanding -fno-builtin -std=c11 -Wpedantic -Wall -Wextra -Werror`.
-`RT_AS_COMPILE_FLAGS=` drops clang's `-c`, which `cfree as` does not use.
-
-## Current Status
-
-As of the latest probe, cfree builds or assembles:
-
-- [x] `rt/lib/int/int.c`
-- [x] `rt/lib/fp/fp.c`
-- [x] `rt/lib/mem/mem.c`
-- [x] `rt/lib/atomic/atomic_freestanding.c`
-- [x] `rt/lib/cfree/ifunc_init.c`
-- [x] `rt/lib/int64/int64.c`
-- [x] `rt/lib/coro/aarch64.c`
-- [x] `rt/lib/coro/coro.c`
-- [x] `rt/lib/coro/aarch64_macho.s`
-
-That is 9 / 9 compile or assemble steps green for this variant, and
-`cfree ar` produces `build/rt/aarch64-apple-darwin/libcfree_rt.a`.
-
-## Completed
-
-- [x] Split AArch64 file-scope coroutine assembly out of
- `rt/lib/coro/aarch64.c`.
-- [x] Added standalone AArch64 coroutine assembly sources:
- `rt/lib/coro/aarch64_elf.s` and `rt/lib/coro/aarch64_macho.s`.
-- [x] Added runtime Makefile support for assembling `.s` / `.S` sources via
- `RT_AS`, so cfree can use `cfree as` while the default clang build can keep
- using `clang -c`.
-- [x] Switched the runtime target flag from `--target=TRIPLE` to
- `-target TRIPLE`, accepted by both clang and cfree.
-- [x] Wired `__builtin_ctzl` and `__builtin_ctzll` through the existing
- `INTRIN_CTZ` path. `rt/lib/int64/int64.c` now compiles in the focused
- aarch64-apple-darwin cfree probe.
-- [x] Converted `__atomic_*_n` value operands to the atomic object type before
- lowering. This clears the pointer-sized literal mismatch in
- `__atomic_store_n((uintptr_t*)l, 0, __ATOMIC_RELEASE)` and the same class for
- RMW / compare-exchange desired operands.
-- [x] Added target-aware folding for `__atomic_always_lock_free(size, ptr)` and
- constant-size `__atomic_is_lock_free(size, ptr)`. The parser asks
- `cfree_cg_atomic_is_lock_free` for representative scalar types, so results
- follow the active target instead of an aarch64-only table.
-- [x] Exposed a general `cfree_cg_top_const_int` query for compile-time-known
- integer-like values on the CG value stack.
-- [x] Used that query in parser `if`, `&&`, and `||` lowering. Dead arms are
- still parsed for semantic/type-stack effects, but target code emission is
- suppressed, so constant-false 16-byte atomic fast paths no longer trip the
- 8-byte lock-free backend limit.
-- [x] Lowered `__builtin_memcpy`, `__builtin_memmove`, `__builtin_memset`, and
- `__builtin_memcmp` as builtins even when their byte count is runtime-sized.
- The parser now synthesizes the standard libc call directly, instead of
- rewriting the token to an undeclared plain identifier.
-- [x] Added parser support for `__atomic_fetch_nand`, mapped through the
- existing target-independent atomic NAND RMW operation.
-- [x] Fixed the `rt/lib/fp/fp.c` preprocessor crash. Function-like macro
- argument prescan now preserves raw-token hidesets, so self-referential
- suffix-renaming macros such as `rep_t -> _FP_NAME(rep_t)` do not recurse
- until stack exhaustion.
-- [x] Added parser support for `__builtin_isnan`, lowered as a single-evaluation
- floating self-compare.
-- [x] Routed C floating comparisons through FP comparison lowering for aarch64,
- x86-64, and riscv64 instead of integer compare paths.
-- [x] Accepted null pointer constants such as `((void*)0)` in static pointer
- initializers. This clears the `_Thread_local coro_t* __cfree_current = NULL`
- initializer in `rt/lib/coro/coro.c`.
-
-## Remaining Blockers
-
-- [ ] Lift remaining coroutine file-scope assembly.
- Current file-scope asm remains in:
- `rt/lib/coro/x86_64.c`, `rt/lib/coro/x86_64_win.c`,
- `rt/lib/coro/i386.c`, `rt/lib/coro/riscv64.c`,
- `rt/lib/coro/riscv32.c`, `rt/lib/coro/arm32.c`, and
- `rt/lib/coro/arm32_thumb1.c`.
-
-- [ ] Decide whether cfree should eventually accept the standard runtime C
- flags, or whether the runtime Makefile should grow a first-class
- cfree-toolchain mode that drops/translates them.
-
-- [ ] Run the same cfree-toolchain probe for the other default runtime
- variants after `aarch64-apple-darwin` archives cleanly.
-
-## Notes
-
-The ordinary clang `rt-aarch64-apple-darwin` target currently also stops in
-`atomic_freestanding.c` because clang treats several `__atomic_*` library
-entry points as builtins. That is separate from the cfree bootstrap path, but
-it means the default target is not a clean regression signal for the assembly
-split until the atomic source/build flags are addressed.
diff --git a/doc/RV64_PARITY_CHECKLIST.md b/doc/RV64_PARITY_CHECKLIST.md
@@ -1,252 +0,0 @@
-# rv64 parity checklist
-
-Goal: bring `riscv64` / `rv64` to the same practical coverage as `aarch64`
-across standalone asm, disasm, C/toy compilation, object/link output, runtime,
-debug tooling, and executable test paths.
-
-This checklist tracks parity with the aa64 lane, not architectural feature
-completeness for all RISC-V extensions. The baseline target is RV64GC Linux
-ELF with the psABI double-float ABI unless a task says otherwise.
-
-## Asm / disasm
-
-- [x] Wire rv64 into `arch_disasm_new` through `src/arch/rv64/disasm.{h,c}`.
-- [x] Add rv64 `test/asm` smoke coverage for text decode, object listing, hex
- encode, and podman-backed ELF execution.
-- [x] Add arch-scoped asm fixture applicability (`*.targets`) so aa64/x64/rv64
- cases do not fail on unrelated targets.
-- [x] Replace the current hand-written rv64 disassembler with an ISA descriptor
- layer equivalent in role to `src/arch/aa64/isa.{h,c}` so encoding,
- decoding, and printing share one description.
-- [x] Expand standalone rv64 asm parsing beyond the current small subset:
- branches, calls, arithmetic, shifts, compares, loads/stores, AUIPC/LUI,
- relocation-bearing operands, atomics, fences, CSR/system forms, scalar
- FP, and backend-emitted forms.
-- [x] Expand rv64 disasm to decode every instruction emitted by rv64 codegen and
- accepted by standalone asm, including unknown/truncated handling that
- matches the public iterator contract.
-- [x] Add relocation/symbol annotation coverage for rv64 object disassembly.
-- [x] Update `test/asm/regen.sh` or add an rv64 variant for clang/objdump golden
- regeneration.
-- [ ] Make asm round-trip (`S`) meaningful for rv64 codegen output and gate the
- rv64-emitted corpus on it. (Encode/decode tables cover the full RV64GC
- surface; an explicit round-trip gate over codegen output is still TODO.)
-
-## Register API / target surface
-
-- [x] Add rv64 public register-name/index support for psABI names plus `xN` and
- `fN` aliases.
-- [x] Audit all register naming users (`dbg`, asm constraints, disasm printers)
- for consistent DWARF numbering: `x0..x31` as 0..31 and `f0..f31` as
- 32..63.
-- [x] Verify predefined macros, driver triple parsing, target defaults, and
- `cfree_test_target` setup against clang's `riscv64-linux-gnu` behavior.
-- [x] Decide policy for optional extensions (`C`, `A`, `F`, `D`, `Zicsr`,
- `Zifencei`, future vector) and reflect it in target feature queries.
- (Locked: RV64I/M/F/D/A/C + Zicsr-minimal; macros mirror clang.)
-
-## Inline asm
-
-- [x] Implement rv64 inline-asm template rendering parallel to aa64:
- placeholders, symbolic operands, memory operands, width/addr modifiers,
- escaped percent, and statement splitting.
-- [x] Add rv64 constraint support for integer, FP, immediate, memory, matching,
- early-clobber, and read-write operands.
- (Integer constraints + memory + matching done; FP-`"f"`, `"K"`/`"L"`/`"J"`
- immediates, and named-reg `"={a0}"` deferred — require src/cg/ extension.)
-- [x] Verify clobbers, `"memory"`, callee-saved preservation, named registers,
- and fixed-register conflicts on rv64.
-- [x] Add an rv64 inline-asm unit test parallel to
- `test/arch/aa64_inline_test.c`.
-- [x] Add C and toy inline-asm execution cases that run through podman/qemu rv64.
-
-## C / toy codegen
-
-- [x] Prove a targeted rv64 C parse path can compile, link, and execute through
- podman path E.
-- [x] Run and triage the full C parse corpus for rv64 at `-O0`, `-O1`, and
- `-O2`; track failures by missing backend feature rather than broad skips.
- (O0+O1: 1828/0/1830. O2 single-threaded passes; the parallel-runner
- SIGILL flakes are harness infra, not codegen.)
-- [x] Run and triage toy cross-arch path `X` for rv64 alongside aa64 cases.
- (491/0/0 after fixing the INTRA_AUIPC_ADDI width guard.)
-- [x] Match aa64 coverage for scalar integer, pointer, aggregate, varargs,
- atomics, intrinsics, labels, computed goto, switch lowering, tail calls,
- alloca, and dynamic stack adjustment.
-- [x] Close remaining explicit rv64 backend panics in `src/arch/rv64/ops.c`,
- `alloc.c`, and `emit.c`.
- (FP-cmp branching, BITCAST same-class, large fp_pair_off, label-fixup
- width guard. asm_block closed via inline-asm template walker.)
-- [x] Verify optimized rv64 lowering after recent opt pipeline work: liveness,
- register allocation, hard-register constraints, call plans, and spill
- reloads. (Implicitly verified by O1 corpus 1804/0 + toy O0/O1/O2 491/0.)
-- [x] Add targeted rv64 cases for large frames, far branches, far label-address
- materialization, large immediates, and pcrel/GOT materialization.
-- [x] Add targeted rv64 FP conversion, comparison, NaN, and rounding cases.
-- [x] Add targeted rv64 atomic cases for all supported widths and memory orders.
-
-## ABI / platform
-
-- [x] Finish psABI edge-case coverage: aggregate classification, indirect args,
- mixed int/FP aggregates, homogeneous FP shapes where applicable, sret,
- byval, empty/zero-sized fields, and mixed returns.
-- [x] Verify variadic functions: register save area layout, `va_list` shape,
- stack argument traversal, and mixed int/FP varargs.
-- [x] Verify stack alignment, frame pointer conventions, callee-saved integer
- registers `s0..s11`, and callee-saved FP registers `fs0..fs11`.
-- [x] Decide `long double` policy for rv64 (`quad` vs compatibility mode) and
- align C frontend, ABI lowering, libc harnesses, and runtime helpers.
- (Locked to `double`; LDBL128=0 in driver/runtime.c + rt/Makefile.)
-- [x] Audit TLS models for rv64: local-exec, GOT/TLS relocations, static link,
- dynamic link, and emulator/JIT behavior.
- (LE + IE codegen and reloc kinds wired; GD / TLS-Descriptor and the
- linker IE→LE relaxation are deferred — no failing test depends on them.)
-
-## Object / link / driver
-
-- [x] Keep rv64 ELF roundtrip link corpus green for path R.
-- [x] Fix `cfree objdump -d` to choose the disassembler target from the object
- file rather than the host target.
-- [x] Run rv64 link path E broadly under podman and triage execution failures.
- (parse E: 1830 cases; toy X: 491 cases; all green.)
-- [x] Ensure ELF rv64 relocations cover all codegen, asm, TLS, PLT/GOT, ifunc,
- linker-script, archive, and GC cases currently passing for aa64.
- (33 R_RV_* relocs mapped + applied; TLS_GOT_HI20 added Wave 2B. ifunc
- and linker-script details still to verify under load.)
-- [x] Implement or explicitly reject any unsupported rv64 relocation kinds with
- diagnostics that name the relocation and input object.
- (`compiler_panic` at src/link/link_reloc.c:489 names the reloc kind.)
-- [x] Exercise `cfree as`, `cc`, `ld`, `ar`, `objdump`, `strip`, and `objcopy`
- paths with rv64-specific command tests where the tool claims rv64 support.
-- [x] Verify dynamic-linker defaults for musl and glibc rv64 Linux.
- (musl: /lib/ld-musl-riscv64.so.1; glibc: /lib/ld-linux-riscv64-lp64d.so.1.)
-- [x] Add rv64 `objdump` golden tests for sections, symbols, relocs, and
- disassembly annotations.
-
-## Runtime / libc
-
-- [x] Build `libcfree_rt.a` for `riscv64-linux` through cfree, not only host
- clang probes.
-- [x] Bring rv64 coroutine/runtime support through the cfree assembler/compiler
- path. (rt/lib/coro/riscv64.c built via `$(BIN) cc` per rt/Makefile.)
-- [x] Run `test-rt-runtime` with rv64 enabled and triage every runtime helper
- failure. (5/5 cases pass: coro, freestanding_lib, setjmp, stdarg, stdatomic.)
-- [x] Retarget musl and glibc libc harnesses to rv64 sysroots and run the same
- cases currently exercised for aa64. (test-musl-rv64: 9/9 static, 9/9
- dynamic. test-glibc-rv64: 8/9 — the single anomaly is a flaky SIGKILL
- under concurrent load, not a code regression.)
-- [x] Add rv64 smoke cases that use cfree-emitted bytes for startup/runtime
- paths, not only clang-produced harness binaries.
-- [x] Verify compiler-rt-style integer, FP, memory, atomic, and coroutine
- helpers for rv64 ABI correctness.
-
-## Debug / DWARF / JIT
-
-- [x] Add rv64 debugger breakpoint support (`ebreak`) and displaced-step logic.
-- [x] Add rv64 ucontext/register marshalling for supported host OSes.
-- [x] Emit and validate rv64 DWARF CFI/line-info details, including CFA rules,
- frame-pointer conventions, return-address register `ra`, and FP register
- numbering. (Real .eh_frame producer; CFA=s0+frame_size-fp_pair_off;
- ra=x1; s0..s11 + fs0..fs11 callee-saves recorded.)
-- [x] Extend DWARF tests with rv64 producer roundtrips where instruction size
- and register numbering differ from aa64. (test/debug/cfi_unit.c.)
-- [x] Fill rv64 JIT support gaps: executable memory, relocations, symbol calls,
- TLS/TLV behavior, and native-host execution tests where available.
- (link_jit.c handles R_RV_TPREL_HI20/LO12_I/S as TLSLE and resolves
- R_RV_PCREL_LO12_I/S against the paired AUIPC's runtime displacement;
- execmem.flush_icache emits fence.i + __builtin___clear_cache on
- __riscv; test/link/rv64_jit_test.c JIT-loads a tiny rv64 image and
- SKIPs the native call on non-rv64 hosts. TLV thunk is Mach-O-only
- and stays aa64; rv64 uses local-exec TLS via the TPREL path.)
-- [x] Decide debugger scope for non-native rv64 execution; either support it
- through emulation or mark it explicitly out of parity.
- (Linux/riscv64 native only; macOS/BSD rejected via #error.)
-
-## Emulator
-
-- [x] Audit rv64 ELF loader behavior against aa64: program headers, auxv,
- stack setup, argv/envp, TLS, brk/mmap, and dynamic loader handoff.
- (static-linked; dynamic loader deferred)
-- [x] Expand rv64 decode/lift coverage to match all instructions produced by
- cfree rv64 codegen and clang-built harnesses. (decode RV64IMFDA done;
- JIT lift deferred — interpreter is functional)
-- [x] Add rv64 syscall coverage for libc and smoke workloads.
- (minimum set: exit/exit_group/write/read/close/fstat/brk/mmap)
-- [x] Add emulator regression tests for rv64 branches, calls, atomics, FP, TLS,
- and signals/traps. (rv64_smoke_test + rv64_extras_test cover FP+CSR,
- RVC, PT_INTERP, and the new syscall set. Atomics, TLS, and signal
- trampolines remain stubbed in the interpreter — out of smoke scope.)
-
-## Execution infrastructure
-
-- [x] Use podman `--platform linux/riscv64` for rv64 execution when no native or
- qemu-user runner is available.
-- [x] Prove `test-smoke-rv64` direct and batched execution paths.
-- [x] Prove `test/asm` rv64 path E through podman.
-- [x] Prove a targeted `test/parse` rv64 path E through podman.
-- [x] Run larger rv64 E matrices under podman with batching and record stable
- filters for CI-equivalent local runs.
- (test/parse and test/toy run end-to-end through podman/qemu rv64
- with batching; stable filters established.)
-- [ ] Add clear diagnostics for missing podman image/platform support, binfmt,
- qemu-user, or clang rv64 cross support.
-- [x] Decide default images for `RUN_RV64_IMAGE` across musl/glibc tests.
- (musl/Alpine = `alpine:latest`; documented in test/lib/exec_target.sh.)
-
-## Test policy
-
-- [x] Add rv64-targeted filters/goldens for each new feature as it lands.
-- [x] Keep skips explicit and arch-scoped through `*.targets`, not hidden in
- harness defaults.
-- [x] Prefer red/green targeted runs: one failing feature family at a time,
- one arch at a time.
-- [x] Promote stable rv64 lanes into default or CI-equivalent coverage once the
- runner assumptions are reliable.
- (test-rv64-inline and test-emu added to default `make test`;
- test-smoke-rv64 / test-musl-rv64 / test-glibc-rv64 remain opt-in
- because they require podman/qemu.)
-- [x] Keep aa64 lanes green while changing shared asm/disasm/link/test harness
- code.
-
-## RV64 opset status
-
-This section tracks the RV64 asm/disasm ISA families that were historically
-absent from the descriptor table (`src/arch/rv64/isa.c`) plus the remaining
-explicitly unsupported extension families.
-
-**Standard scalar FP (RV32F/D) — complete for scalar RV64GC:**
-- `fmadd.{s,d}`, `fmsub.{s,d}`, `fnmsub.{s,d}`, `fnmadd.{s,d}`, and
- `fclass.{s,d}` are now in the shared asm/disasm descriptor table, with
- targeted encode/decode coverage.
-
-**Atomic ordering suffixes (RV64A) — complete:**
-- `lr.{w,d}.{aq,rl,aqrl}`, `sc.{w,d}.{aq,rl,aqrl}`, and
- `amo*.{w,d}.{aq,rl,aqrl}` are accepted and disassembled with ordering
- suffixes. The bare forms remain present for codegen.
-
-**RV64C compressed — complete for RV64-applicable scalar/FP forms:**
-- Encoder and decoder cover the existing baseline plus `c.fld`, `c.fsd`,
- `c.fldsp`, `c.fsdsp`, `c.subw`, `c.addw`, `c.and`, `c.or`, `c.xor`,
- `c.sub`, `c.andi`, `c.srai`, `c.srli`, `c.slli`, and `c.addiw`.
-- `c.flw/c.fsw/c.flwsp/c.fswsp` remain RV32-only and are intentionally not
- accepted for RV64.
-- Codegen never emits compressed instructions; the backend always picks 32-bit
- forms. Encoder coverage matters only for hand-written `.s` files.
-
-**Privileged ISA (M-mode / S-mode) — out of scope by policy:**
-- `mret`, `sret`, `uret`, `wfi`, `sfence.vma`, `hfence.*`, `mnret`.
-- M-mode/S-mode CSRs (mstatus, mtvec, mepc, mcause, satp, etc.) are reachable
- only via `csrrw`/`csrrs`/`csrrc` with a literal CSR number. The asm
- syntax for named privileged CSRs (e.g., `csrrw t0, mstatus, zero`) is
- not in the table; only the fp/Zicsr CSRs (`fcsr`, `frm`, `fflags`) and
- numeric forms work.
-
-**Extension status:**
-- `Zifencei` is now supported for asm/disasm via `fence.i`.
-- Still out of scope: `V` (vector), `B`/`Zba`/`Zbb`/`Zbc`/`Zbs` (bit manipulation),
- `Zfh`/`Zfhmin` (half-precision FP), `Zicbom`/`Zicboz` (cache
- management), `Zihintpause`, `Smaia`/`Ssaia` — none planned.
-
-**Misc gaps:**
-- `c.unknown` descriptor exists as a sentinel for the disassembler; not a
- real ISA mnemonic.
diff --git a/doc/STAGE2.md b/doc/STAGE2.md
@@ -1,272 +0,0 @@
-# Stage-2 self-host
-
-What's missing to make `make self` produce a stage-2 `cfree` built by stage-1
-cfree itself. Companion to `DESIGN.md`.
-
-Latest snapshot: **105 / 107 files compile clean** (93/93 `src/**/*.c`,
-12/14 `driver/*.c`). The two remaining driver failures (`env.c`, `ld.c`)
-are both blocked by A2 — system-header ingest. Everything in `src/` builds
-under stage 1.
-
-A standalone link probe (`scripts/stage2_link.sh`) drives the full
-sequence end-to-end: cfree-stage1 compiles the 105 clean files, clang
-compiles `env.c` / `ld.c`, and `cfree ld` then attempts to link the
-combined object set against `libSystem.B.tbd`. As of the latest run the
-link reaches the chained-fixup emit pass and trips D2 below.
-
-## Build configuration
-
-Stage 2 currently invokes:
-
-```
-cfree-stage1 cc --sysroot=$SDK -isystem rt/include -Iinclude -Isrc
-```
-
-`--sysroot=$SDK` makes the host SDK's libc/POSIX headers visible.
-`rt/include/` ships the freestanding set on top.
-
-`DEPFLAGS` is empty for stage 2 today; B0 has landed but the recipe has
-not been switched back on.
-
-## Checklist
-
-### Preprocessor / lexer
-
-- [x] **A1.** Quoted `#include "x.h"` now searches the includer's
- directory first per C99 §6.10.2 (commit c9baaf8). Was blocking every
- `driver/*.c` file.
-- [ ] **A2.** System-header ingest. The driver pulls a POSIX/Mach surface
- (`sys/stat.h`, `sys/mman.h`, `sys/syscall.h`, `fcntl.h`, `unistd.h`,
- `signal.h`, `pthread.h`, `dlfcn.h`, `mach/mach.h`, `mach/mach_vm.h`,
- `mach/vm_map.h`) from the host SDK. With `-isystem $SDK/usr/include`
- and the right host predefines, the SDK parses up to a small set of
- constructs cfree doesn't yet handle. Each sub-item below is the
- minimal feature needed.
-
- - [ ] **A2-S1.** Asm-label on function declarators:
- `T fn(args) __asm__("name");`. GCC asm-label rename extension; what
- `__DARWIN_ALIAS` / `__DARWIN_ALIAS_C` / `__DARWIN_INODE64` /
- `__DARWIN_EXTSN` expand to. Blocks `sys/stat.h`, `sys/mman.h`,
- `unistd.h`, `_string.h`, `_stdio.h`.
- - [ ] **A2-S2.** Asm-label on global variables:
- `extern T name __asm__("name");`. Same extension, declarator position
- differs from S1. Blocks `_time.h` (→ `<time.h>`, `<signal.h>`,
- `<pthread.h>`).
- - [ ] **A2-S3.** Unknown `#pragma` accepted as no-op (full semantics
- not required for ingest). Today fatal "expected declaration". Blocks
- `sys/fcntl.h`, `mach/vm_types.h`. Same root cause as R2 below.
- - [ ] **A2-S4.** `__has_include`, `__has_feature`, `__has_extension`
- as preprocessor builtins inside `#if`. (`__has_attribute` already
- works.) Blocks `Availability.h` and the `__enum_decl` feature-detect
- branch.
- - [ ] **A2-S5.** `__uint128_t` declared type. Declare-only is enough
- to parse `mach/arm/_structs.h` (signal.h, ucontext); full codegen
- is a bigger lift.
- - [ ] **A2-S6.** `#warning` accepted as non-fatal. Today cfree errors
- on the directive itself; `sys/cdefs.h`'s
- `#warning "Unsupported compiler"` aborts any SDK ingest unless
- `-D__GNUC__` is also passed.
- - [ ] **A2-S7.** Predefine macOS-host macros (`__APPLE__`, `__MACH__`,
- `__arm64__`/`__aarch64__`, `__LITTLE_ENDIAN__`, `__GNUC__`,
- `__GNUC_MINOR__`) automatically when targeting macOS, so callers
- don't need to hand-pass `-D`.
-
- After S6+S7+S1+S2+S3+S4, both blocked driver files should ingest the
- SDK directly. S5 only needed for signal.h/ucontext paths.
-
-### Driver — dep emission
-
-- [x] **B0.** `cfree_dep_iter_new` / `_next` implemented over
- SourceManager (commit 8919185). Stage 2 can re-enable `-MMD -MP`
- whenever the recipe drops `DEPFLAGS=''`.
-
-### Parser / sema
-
-- [x] **B1.** `__alignof__` aliased to `_Alignof` (type-name form).
-- [x] **B2.** `__builtin_ctz` lowered through `INTRIN_CTZ`.
-- [x] **B3.** `parse_array_bound` already routed `SEK_ENUM_CST` through
- `eval_const_int` — original repro was actually B4. Regression case
- added.
-- [x] **B4.** `try_parse_addr_const` accepts string literals via
- `emit_string_to_rodata`.
-- [x] **B5.** `try_parse_addr_const` admits `SEK_FUNC` identifiers.
-- [x] **B6.** File-scope `T name[] = {...}` now calls
- `complete_incomplete_array` to match the block-scope path.
-- [x] **B7.** `__alignof__` accepts a **unary-expression** operand
- (`__alignof__(*ptr)`), not just a type-name. Required by the
- `VEC_GROW` macro in `src/core/vec.h`; previously blocked 8 files in
- `src/debug/` and `src/link/`.
-- [x] **B8.** `sizeof` accepts the no-parens **unary-expression** form
- in constant-expression contexts (e.g. file-scope initializers). C99
- §6.5.3.4 standard, not an extension. Blocked `src/arch/aa64/isa.c`
- and `src/arch/aa64/regs.c`.
-- [x] **B9.** Block-scope `static T name[] = {...}` now completes the
- incomplete array, mirroring B6's file-scope fix. Was blocking
- `src/pp/pp.c`.
-
-### Codegen — aarch64 backend
-
-- [x] **C1.** `OPK_INDIRECT` source operands handled in INT and FP arg
- paths (commit f2d3e01).
-- [x] **C2.** `OPK_INDIRECT` on the indirect-return path (commit
- f2d3e01).
-- [x] **C0.** Stage-1 regalloc "no spillable victim (class 0)" panic
- fixed — was choking on the complex functions in `src/arch/aa64/arch.c`,
- `src/arch/rv64.c`, `src/cg/cg.c`, and `src/opt/opt.c`. Not a feature
- gap; a regalloc bug surfaced by self-host pressure.
-
-### Codegen — x64 backend
-
-- [ ] **C3.** Mirror C1/C2 on x64
- (`src/arch/x64.c:1761,1798,1817,1827,1904`). Doesn't block aarch64
- self-host; blocks x64 self-host when that's attempted.
-
-### Linker
-
-- [ ] **D1.** Stage 2 currently relies on `$(CC) -o $@ ... $(LIB_AR)`
- for the final link — for stage 2 that's `cfree-stage1 cc`, which in
- turn shells out to the host linker. Once stage 2 builds, verify the
- produced binary is genuinely a stage-1-emitted object linked through
- cfree's own ld path, not falling back to clang/ld silently.
-- [x] **D2-read.** Mach-O reader rejected `ARM64_RELOC_TLVP_LOAD_PAGE21`
- (8) and `ARM64_RELOC_TLVP_LOAD_PAGEOFF12` (9). Clang emits these for
- TLS references in `driver/env.c` (errno-style access); without them
- the standalone link probe couldn't ingest `env.o`. Reader now maps
- both to TLV reloc kinds.
-- [ ] **D2-emit.** Chained-fixup emit doesn't know how to locate the
- byte slot for the new TLV pointer region — `cfree ld` aborts with
- `link_macho: chained-fixup slot for vaddr 0x… not in any segment
- buffer` at `src/link/link_macho.c:1564`. The lookup at
- `link_macho.c:1543` currently routes only segidx 2 (`__DATA_CONST`
- `__got`) and segidx 3 (`__DATA` `__thread_ptrs` / MSec walk); the new
- TLV section/segment added by the TLV ingest work isn't covered.
- Blocks the standalone link probe past compile.
-
-## Runtime — `rt/lib/*` ingest
-
-Separate from stage-2 self-host: can cfree compile `libcfree_rt.a`?
-Probed on the `aarch64-apple-darwin` variant — 8 sources, freestanding,
-no system headers. Result: **6 / 8 clean** today (`fp/fp.c`, `mem/mem.c`,
-`cfree/ifunc_init.c`, `coro/coro.c`, `coro/aarch64.c`, `int/int.c`).
-Flags must drop
-`-std=c11 -Wpedantic -Wall -Wextra -Werror -ffreestanding -fno-builtin` —
-cfree rejects all of these. (`-fno-builtin` is the only one not already
-on the stage-2 drop list.)
-
-- [x] **R1.** Replaced `__inline` with `inline` in rt sources (no
- compiler change; cfree already accepts `inline`).
-- [x] **R2.** Unknown `#pragma` now silently skipped at the parser
- boundary (`pp_next` drops forwarded pragma lines so cpp mode still
- re-emits them via `pp_next_raw`). `atomic_common.inc`'s
- `#pragma redefine_extname` rename was dropped from source; the
- `_c`-suffixed functions were renamed directly to their final library
- names (no clang-builtin collision on the cfree side).
-- [x] **R3.** `__builtin_offsetof(T, m)` now folds inside `cexpr_unary`
- using the existing `offsetof_designator` helper. Unblocks
- `_Static_assert(offsetof(...))`.
-- [x] **R4.** Member-level `_Alignas(N)` now raises the field's
- `align_override`, which the ABI layout already propagates into the
- containing aggregate's alignment (`src/abi/abi.c:195,213,223`).
-- [x] **R5.** `__int128`, `__int128_t`, `__uint128_t` recognized as
- type specifiers (`TY_INT128`/`TY_UINT128`, size 16, align 16).
- Typedef-only use parses; any `cg_load`/`cg_store`/`cg_binop`/
- `cg_unop`/`cg_convert` on int128 panics with a clear
- "`__int128` codegen not implemented" diagnostic. Codegen support is
- out of scope for this milestone.
-- [x] **R6.** Missing rt builtins wired up in the parser.
- - `__builtin_trap`, `__builtin_unreachable` → new `cg_intrinsic_void`,
- `INTRIN_TRAP` / `INTRIN_UNREACHABLE` (already implemented in all
- three backends).
- - `__builtin_clz`, `__builtin_clzl`, `__builtin_clzll` →
- `cg_intrinsic_unary_to_int(INTRIN_CLZ)`; operand type drives width.
- - `__builtin_memcpy`, `__builtin_memmove`, `__builtin_memcmp`,
- `__builtin_memset` → rewritten at `try_parse_builtin_call` to plain
- calls to the libc functions of the same name, so runtime-`n` works.
- Caller must declare the libc prototype (rt's `<string.h>` does).
-- [x] **R7.** `__func__`, `__FUNCTION__`, `__PRETTY_FUNCTION__`
- predefined identifiers (C99 §6.4.2.2). Synthesized lazily in
- `parse_primary` as a NUL-terminated `char[N+1]` literal in `.rodata`,
- using a new `Parser.cur_func_name` field set around
- `parse_function_body`. Outside a function body, a clean diagnostic.
-- [x] **R10.** File-scope `__asm__("...")` declarations
- (a GCC extension, also accepted by clang). `parse_translation_unit`
- recognizes `__asm__` / `asm` at TU scope, decodes the string-literal
- payload, and feeds it through `parse_asm` against the current object
- emitter. The object symbol table now reuses existing symbols by name,
- so C declarations before/after asm labels bind to the same `ObjSymId`.
- For `coro/aarch64.c`, the AArch64 assembler also accepts `stp`/`ldp`
- on `d0..d31`, supports `csinc`, and predefines
- `__USER_LABEL_PREFIX__` as `""` for ELF-style targets and `"_"` for
- Mach-O. Verified with:
- `build/cfree cc -target aarch64-apple-darwin -g -c rt/lib/coro/aarch64.c ...`.
-
-With R1–R7 and R10 done, two blockers remain for the 8-source `aarch64-apple-darwin`
-rt probe:
-
-- [ ] **R8.** `__builtin_ctzl` / `__builtin_ctzll` not wired (only
- `__builtin_ctz` is). Same shape as R6's `clz` wiring; just needs the
- two symbols added to the gate in `try_parse_builtin_call` and
- routed through `INTRIN_CTZ`. Blocks `int64/int64.c:217`.
-- [ ] **R9.** `__atomic_always_lock_free(size, ptr)` and
- `__atomic_is_lock_free(size, ptr)` must fold at compile time when
- `size` is a constant — `atomic_common.inc`'s `IS_LOCK_FREE_n` macros
- expand to these inside `case 1: ... case 16:` arms and rely on the
- fold to elide unreachable branches. Plain runtime calls would still
- link but the macros wrap the result in a switch over `size`, so
- without folding cfree would emit per-size dispatch that the rt
- layout expects to be dead-code-eliminated. Blocks
- `atomic/atomic_freestanding.c:77`.
-
-Additionally listed in the larger SDK ingest plan but not yet seen in
-the 8-source rt probe: `__builtin_*_overflow` (for `int/int.c`'s
-`__addvsi3` family — currently the source uses manual overflow checks,
-not the builtins).
-
-## How to re-run the audits
-
-Stage-2 audit (src + driver):
-
-```sh
-make && cp build/cfree build/cfree-stage1
-BIN=$(pwd)/build/cfree-stage1
-SDK=$(xcrun --show-sdk-path)
-FLAGS="--sysroot=$SDK -isystem rt/include -Iinclude -Isrc"
-DFLAGS="--sysroot=$SDK -isystem rt/include -Iinclude"
-for f in $(find src -name '*.c' | sort); do
- $BIN cc $FLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
-done
-for f in $(find driver -name '*.c' | sort); do
- $BIN cc $DFLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
-done
-```
-
-System-header ingest probe (after A2 work):
-
-```sh
-SDK=$(xcrun --show-sdk-path)
-DEFS="-D__GNUC__=4 -D__GNUC_MINOR__=2 -D__arm64__=1 -D__aarch64__=1 \
- -D__LITTLE_ENDIAN__=1 -D__APPLE__=1 -D__MACH__=1"
-for h in sys/stat.h sys/mman.h sys/syscall.h fcntl.h unistd.h signal.h \
- pthread.h dlfcn.h mach/mach.h mach/mach_vm.h mach/vm_map.h \
- stdio.h stdlib.h string.h; do
- echo "#include <$h>" > /tmp/h.c
- $BIN cc $DEFS -isystem rt/include -isystem "$SDK/usr/include" \
- -c /tmp/h.c -o /tmp/h.o 2>&1 | head -1 | sed "s|^|$h: |"
-done
-```
-
-rt ingest probe (`aarch64-apple-darwin` variant):
-
-```sh
-SRCS="lib/int/int.c lib/fp/fp.c lib/mem/mem.c \
- lib/atomic/atomic_freestanding.c lib/cfree/ifunc_init.c \
- lib/int64/int64.c lib/coro/aarch64.c lib/coro/coro.c"
-FLAGS="-target aarch64-apple-darwin -DHAS_INT128=1 \
- -Irt/lib/include/common -Irt/lib/impl \
- -Irt/lib/include/lp64_le -Irt/include"
-for f in $SRCS; do
- $BIN cc $FLAGS -c "rt/$f" -o /dev/null 2>&1 | head -1 | sed "s|^|rt/$f: |"
-done
-```
-
-Then `make self` to confirm a clean stage-2 build end-to-end.
diff --git a/doc/TAILCALL.md b/doc/TAILCALL.md
@@ -1,234 +0,0 @@
-# Tail Call Support
-
-First-class tail calls from the C frontend through codegen and the aarch64
-backend. x64 and rv64 follow the same pattern; this document focuses on
-aarch64.
-
-## Current state
-
-The groundwork is present but nothing is wired end-to-end:
-
-- `cg_tail_call` (cg.c) — stub that panics `"not in v1 slice"`
-- `CG_CALL_TAIL` in `CGCallDesc.flags` (arch.h) — defined and documented,
- never set
-- `R_AARCH64_JUMP26` (obj.h) — handled identically to CALL26 by the linker;
- only the emitted instruction opcode differs
-- `aa64_b_base()` / `aa64_br()` — defined in the ISA layer, never used for
- tail calls
-- `test/elf/cases/16_tail_call.c` — verifies that cfree's linker handles
- JUMP26 from a clang-compiled object; does not test cfree generating tail calls
-
-## Architecture constraint
-
-The aarch64 backend defers frame layout: frame size, callee-saved register
-counts, and stack offsets are computed in `aa_func_end` after all body code is
-emitted. The prologue uses NOP placeholders that are back-patched by
-`aa_func_end`. The epilogue is a single labeled block; all `aa_ret` paths emit
-a `B` to that label.
-
-For tail calls the frame teardown (callee-saved restores + SP restore) must
-appear **inline at the call site**, before the `B`/`BR` to the callee — not at
-the epilogue. Since frame dimensions are unknown at call-emit time, we use the
-same NOP-placeholder-then-patch approach as the prologue.
-
-## v1 constraints (panic, don't silently miscompile)
-
-| Scenario | Disposition |
-|---|---|
-| Tail call from alloca function | `compiler_panic` in `aa_call` |
-| Tail call with sret return type | `compiler_panic` in `aa_call` |
-| Tail call with stack-passed args | `compiler_panic` in `aa_call` |
-| Tail call from variadic function | `compiler_panic` in `aa_call` |
-| `musttail` not on a return stmt | `perr` in parser |
-| C23 `[[clang::musttail]]` syntax | out of scope |
-
----
-
-## Step 1 — Frontend: recognize `musttail`
-
-**`src/parse/attr.h`**: Add `ATTR_MUSTTAIL` to `AttrKind`.
-
-**`src/parse/parse_type.c`**: Register in the attribute table:
-```c
-{"musttail", ATTR_MUSTTAIL, AS_NONE},
-```
-
-**`src/parse/parse_priv.h`**: Add `u8 in_musttail` to `Parser`.
-
-**`src/parse/parse_stmt.c`** — `parse_stmt`: before the keyword dispatch, check
-`starts_attr(p)`. If the parsed list contains `ATTR_MUSTTAIL`, set
-`p->in_musttail = 1`. The next token must be `return`; any other statement is a
-fatal error.
-
-**`src/parse/parse_stmt.c`** — `parse_return_stmt`: if `p->in_musttail`:
-- call `parse_expr(p)` as usual — the inner call dispatch emits `cg_tail_call`
-- do **not** call `to_rvalue` or `cg_ret`; `cg_tail_call` implicitly terminates
- the function
-- clear `p->in_musttail` before returning
-
-**`src/parse/parse_expr.c`** — postfix call dispatch: if `p->in_musttail` is
-set when a `'('` call is dispatched, emit `cg_tail_call(p->cg, nargs, fn_type)`
-instead of `cg_call`.
-
-## Step 2 — CG layer: implement `cg_tail_call`
-
-Factor the body of `cg_call` into:
-```c
-static void cg_call_impl(CG* g, u32 nargs, const Type* fn_type, u16 flags);
-```
-
-Both `cg_call` and `cg_tail_call` call it, passing `CG_CALL_NONE` or
-`CG_CALL_TAIL` respectively.
-
-When `flags & CG_CALL_TAIL`:
-- Set `desc.flags = CG_CALL_TAIL`
-- Skip result register allocation; pass a void-typed `OPK_IMM` placeholder in
- `desc.ret` so the backend has a typed slot to inspect
-- Skip `push(g, …)` — no result, no continuation
-- Still call `T->free_reg` for the callee register after `T->call` returns
- (the backend has already moved it to x16 by then)
-
-## Step 3 — AArch64 backend
-
-### `src/arch/aa64/internal.h`
-
-Worst-case inline teardown: 5 int-pair LDPs (x19–x28) + 4 fp-pair LDPs
-(d8–d15) + 1 fp/lr LDP + 2 SP-add instructions = 12; use 14 for headroom.
-
-```c
-#define AA_TAIL_EP_WORDS 14u
-#define AA64_TAIL_SCRATCH 16u /* x16 / ip0: caller-saved, not a pool reg */
-```
-
-Add to `AAImpl`:
-```c
-struct { u32 pos; } *tail_sites;
-u32 ntail_sites;
-u32 tail_sites_cap;
-```
-
-Initialize in `aa_func_begin`:
-```c
-a->tail_sites = NULL;
-a->ntail_sites = 0;
-a->tail_sites_cap = 0;
-```
-
-### `src/arch/aa64/ops.c` — `aa_call`
-
-After the `emit_arg_value` loop and `max_outgoing` update, before the existing
-BL/BLR emission:
-
-```c
-if (d->flags & CG_CALL_TAIL) {
- if (a->has_alloca)
- compiler_panic(…, "musttail not supported in alloca function");
- if (d->abi && d->abi->has_sret)
- compiler_panic(…, "musttail not supported with sret return type");
- if (stack_off > 0)
- compiler_panic(…, "musttail with stack-passed arguments not supported");
- if (a->is_variadic)
- compiler_panic(…, "musttail not supported in variadic function");
-
- /* Indirect callees live in x19–x28 (the int pool). Move to ip0 now,
- * before the teardown restores those registers from the stack. */
- if (d->callee.kind == OPK_REG)
- aa64_emit32(mc, aa64_mov_reg(1, AA64_TAIL_SCRATCH, reg_num(d->callee)));
-
- /* NOP placeholder; patched with the frame teardown in aa_func_end. */
- u32 site_pos = mc->pos(mc);
- for (u32 i = 0; i < AA_TAIL_EP_WORDS; ++i) aa64_emit32(mc, AA64_NOP);
-
- /* Tail branch. */
- if (d->callee.kind == OPK_GLOBAL) {
- u32 b_pos = mc->pos(mc);
- aa64_emit32(mc, aa64_b_base());
- mc->emit_reloc_at(mc, mc->section_id, b_pos, R_AARCH64_JUMP26,
- d->callee.v.global.sym, d->callee.v.global.addend, 0, 0);
- } else if (d->callee.kind == OPK_REG) {
- aa64_emit32(mc, aa64_br(AA64_TAIL_SCRATCH));
- } else {
- compiler_panic(…, "aarch64 tail call: callee kind %d unsupported", …);
- }
-
- aa_tail_site_push(a, site_pos);
- return; /* no return-value extraction; no continuation */
-}
-```
-
-`aa_tail_site_push` is a small grow-array helper consistent with the existing
-`add_patches` pattern.
-
-### `src/arch/aa64/emit.c` — `aa_func_end`
-
-After computing `n_int_pairs`, `n_fp_pairs`, `frame_size`, `int_save_off`,
-`fp_save_off`, `fp_lr_off` — before placing the epilogue label — patch each
-tail call site:
-
-```c
-for (u32 ti = 0; ti < a->ntail_sites; ++ti) {
- u32 words[AA_TAIL_EP_WORDS];
- u32 wi = 0;
- for (u32 i = 0; i < AA_TAIL_EP_WORDS; ++i) words[i] = AA64_NOP;
-
- for (i32 i = (i32)n_fp_pairs - 1; i >= 0; --i) {
- u32 r0 = 8u + (u32)i * 2u;
- words[wi++] = aa64_ldp_d(r0, r0+1, 31, (i32)(fp_save_off + (u32)i*16));
- }
- for (i32 i = (i32)n_int_pairs - 1; i >= 0; --i) {
- u32 r0 = 19u + (u32)i * 2u;
- words[wi++] = aa64_ldp_x(r0, r0+1, 31, (i32)(int_save_off + (u32)i*16));
- }
- words[wi++] = aa64_ldp_x(29, 30, 31, (i32)fp_lr_off);
-
- /* SP restore — mirrors emit_sp_add but writes into words[]. */
- if (frame_size <= 0xfff) {
- words[wi++] = aa64_add_imm(1, 31, 31, frame_size, 0);
- } else if ((frame_size & 0xfff) == 0 && (frame_size >> 12) <= 0xfff) {
- words[wi++] = aa64_add_imm(1, 31, 31, frame_size >> 12, 1);
- } else {
- words[wi++] = aa64_add_imm(1, 31, 31, (frame_size >> 12) & 0xfff, 1);
- words[wi++] = aa64_add_imm(1, 31, 31, frame_size & 0xfff, 0);
- }
-
- if (wi > AA_TAIL_EP_WORDS)
- compiler_panic(…, "aarch64: tail epilogue overflow (%u words)", wi);
-
- u32 p0 = a->tail_sites[ti].pos;
- for (u32 i = 0; i < AA_TAIL_EP_WORDS; ++i)
- aa64_patch32(obj, sec, p0 + i * 4u, words[i]);
-}
-```
-
-## Step 4 — Tests (red-green)
-
-Write tests first, then implement.
-
-**`test/parse/`** — attribute parse test: `__attribute__((musttail)) return f(x);`
-parses without error; missing-return error fires on a non-return statement.
-
-**`test/cg/`** — direct tail call: build `int f(int x)` that musttail-calls
-`int g(int x)` via `cg_tail_call`; verify the emitted aarch64 text contains a
-JUMP26 relocation (B) and no CALL26 (BL).
-
-**`test/cg/`** — indirect tail call via function pointer: verify `MOV x16, xN`
-+ `BR x16` and no BLR.
-
-**`test/cg/`** — end-to-end: tail-recursive sum that overflows the stack without TCO;
-compile and run via `cfree run` and verify correctness.
-
-## Files touched
-
-| File | Change |
-|---|---|
-| `src/parse/attr.h` | `+ATTR_MUSTTAIL` |
-| `src/parse/parse_type.c` | `musttail` attribute table entry |
-| `src/parse/parse_priv.h` | `+u8 in_musttail` to `Parser` |
-| `src/parse/parse_stmt.c` | attribute prefix detection; musttail return path |
-| `src/parse/parse_expr.c` | `cg_tail_call` dispatch when `in_musttail` |
-| `src/cg/cg.c` | factor `cg_call_impl`; implement `cg_tail_call` |
-| `src/arch/aa64/internal.h` | constants, `AATailCallSite`, fields in `AAImpl` |
-| `src/arch/aa64/ops.c` | tail-call branch in `aa_call`; `aa_tail_site_push` |
-| `src/arch/aa64/emit.c` | init in `aa_func_begin`; patch loop in `aa_func_end` |
-| `test/parse/` | musttail attribute parse test |
-| `test/cg/` | direct/indirect/e2e tail call tests |
diff --git a/doc/TOY_REWRITE_TASKS.md b/doc/TOY_REWRITE_TASKS.md
@@ -1,275 +0,0 @@
-# Toy Rewrite Task List
-
-This tracks the implementation rewrite toward `doc/TOY.md`. Work proceeds
-red-green: rewrite or add tests first, run focused failures, then implement the
-smallest slice that moves those tests green.
-
-Completion rule: this task list is not a partial-coverage plan. Future agents
-must continue until `lang/toy` is fully aligned with `doc/TOY.md`, including
-internal refactors, representation cleanup, stronger diagnostics, and removal
-of temporary shortcuts. A green `make test-toy` is required after each slice,
-but it is not by itself proof that the language implementation is complete.
-
-## Phase 1: Spec-shaped Existing Coverage
-
-- [x] Rewrite existing runnable toy cases from legacy `int` to explicit scalar
- types, initially using `i64` where old `int` behavior was 64-bit.
-- [x] Replace legacy logical operators `&&` and `||` with `and` and `or`.
-- [x] Replace legacy prefix dereference `*p` with postfix `p.*` in expression
- and assignment contexts.
-- [x] Prefix public CG coverage builtins with `@`, including type-query,
- memory, atomic, vararg, intrinsic, target, and asm helpers.
-- [x] Replace legacy helper names with spec spellings where a direct mapping
- exists:
- `index(p, i)` -> `p[i]`, `sizeof<T>()` -> `@sizeof<T>()`,
- `alignof<T>()` -> `@alignof<T>()`, `offsetof<T>(f)` ->
- `@offsetof<T>(f)`.
-- [x] Keep expected exit codes unchanged unless a test intentionally changes
- semantics.
-
-## Phase 2: Focused Red Tests
-
-- [x] Add a first-class byte string/global data initializer case.
-- [x] Add array literal and indexing coverage.
-- [x] Add pointer-to-array address behavior cases.
-- [x] Add `let name = expr` inference coverage.
-- [x] Add `var name = expr` inference coverage.
-- [x] Add `NULL as *T` pointer literal coverage.
-- [x] Add record declaration, record literal, omitted field zero-fill, and field
- projection coverage.
-- [x] Add tuple record literal and numeric field projection coverage.
-- [x] Add enum declaration and dot-constant typed initializer coverage.
-- [x] Add `pub`, `extern`, and `alias` declaration coverage.
-- [x] Add one error-test harness pass for compile-fail diagnostics before adding
- many negative parser/type cases.
-
-## Phase 3: Frontend Structure
-
-- [x] Split the previous single `toy.c` implementation into explicit frontend
- modules: public compile entry, lexer, parser core/context, symbol/scope
- tables, literal helpers, and parser implementation.
-- [x] Introduce a Toy type layer that can represent aliases, nominal records,
- tuple records, enums, arrays, pointers, function pointers, qualifiers, and
- anonymous records while lowering through public CG API types.
-- [x] Replace fixed-size global/local/function arrays with context-owned
- growable storage; `lang/toy` stays on the public frontend boundary, so
- this uses explicit `ToyParser` ownership rather than internal core vector
- headers.
-- [x] Add lexical scopes for block-local declarations without global state.
-- [x] Keep codegen target and ABI details hanging off `ToyParser`/future Toy
- context structures.
-- [x] Remove legacy compatibility spellings and temporary lowering shortcuts
- once spec-shaped replacements have full coverage.
-- [x] Add focused negative tests for every rejected spec form and unsupported
- backend feature path.
-
-## Phase 4: Spec Features
-
-- [x] Declarations: `pub`, `extern`, attributes, thread-local objects, alias,
- readonly/mutable object definitions, function-local statics.
-- [x] Types: full scalar set, address-space pointers, arrays, function types,
- qualifiers, aliases, records, tuple records, anonymous records, enums.
-- [x] Expressions: `NULL`, byte strings, casts, postfix calls/index/field/deref,
- type-safe lvalues, precedence-island restrictions, aggregate literals.
-- [x] Statements: assignment-only lvalues, expression statements, labels,
- labelled loops/switches, value-bearing `break`, `return tail`.
-- [x] Expression control flow: `if` expressions, result-typed `while<T> else`,
- expression switches.
-- [x] Builtins: varargs, type queries, memory, data relocations, atomics,
- intrinsics, target capability queries, and typed inline assembly.
-- [x] Error tests: syntax, type mismatch, declaration order, unsupported target
- intrinsics, invalid attributes, direct recursive records, invalid tail
- calls, and invalid builtin forms.
-
-## Remaining Work To Reach `doc/TOY.md`
-
-The previous rewrite checklist is complete, but `doc/TOY.md` still describes
-several behaviors that the implementation does not yet provide. These are now
-tracked as red tests first; keep the tests aligned with the spec and make the
-implementation catch up in focused green slices.
-
-- [x] Implement function-local static object initializers containing
- `@labeladdr(label)` while the containing function is open. The positive
- coverage is `test/toy/cases/119_static_labeladdr_data.toy`.
-- [x] Implement `@symdiff(lhs, rhs, addend)` object initializers for every
- object format/target where `doc/TOY.md` exposes the builtin, or narrow the
- spec if the portability contract changes. The positive coverage is
- `test/toy/cases/120_data_symdiff.toy`.
-- [x] Expose dynamic `@memcpy`, `@memmove`, and `@memset` lowering through Toy
- by accepting expression-valued `size` and `align` operands, not only
- numeric literals. The positive coverage starts with
- `test/toy/cases/121_dynamic_memory_builtin.toy`.
-- [x] Propagate `.entsize(N)` from Toy data-definition attributes through
- `CfreeCgDataDefAttrs` into object sections, including merge/string
- sections. The red API assertion lives in `test/api/cg_type_test.c`.
-- [x] Add object-format inspection coverage for Toy `.entsize(N)` once the
- public object inspection surface can assert section entry size directly
- without relying on textual objdump details.
-
-## Current Slice
-
-- [x] Convert legacy runnable tests to spec syntax.
-- [x] Add minimal new syntax tests for `@` builtins, `and`/`or`, postfix
- dereference, inference, and `NULL`.
-- [x] Implement only enough lexer/parser/codegen support to make the converted
- core tests pass.
-- [x] Add object-inspection checks for toy cases and lower declaration/data
- attributes for weak symbols, aliases, object alignment, readonly storage,
- and common definitions.
-- [x] Extend memory-operation parsing for `@memcpy`, `@memmove`, and
- `@memset` access flags, and cover address-of indexed lvalues.
-- [x] Add target capability query builtins for symbol and backend features.
-- [x] Add explicit rounding-mode conversion builtins for int/float edges.
-- [x] Parse anonymous `record { ... }` type literals and use the existing
- aggregate initializer/projection lowering for locals.
-- [x] Parse inline ABI attributes on function parameters and return types.
-- [x] Parse record field attributes for alignment/packed field layout.
-- [x] Allow value blocks for expression control flow to contain preceding
- statements before the final unsuffixed expression.
-- [x] Give statement switches control scopes and support labelled switch
- breaks.
-- [x] Support forward record declarations and unresolved pointer fields using
- erased pointer storage.
-- [x] Emit top-level record data definitions from constant named-field
- initializers.
-- [x] Parse atomic `access(...)` groups for typed load/store/RMW/cmpxchg
- operations.
-- [x] Parse atomic `access(...)` groups for legality and lock-free queries.
-- [x] Accept keyword-shaped dot constants for atomic RMW operations.
-- [x] Use typed local initializer context for `NULL` pointer literals.
-- [x] Use function parameter context to resolve enum dot constants in calls.
-- [x] Resolve enum dot constants in switch arm labels from selector type.
-- [x] Emit floating-point top-level object initializers with
- `cfree_cg_data_float`.
-- [x] Cover `@unreachable()` as a statement-position terminator builtin.
-- [x] Route no-argument low-level intrinsics through CG and cover unsupported
- target intrinsic diagnostics with an error test.
-- [x] Implement implicit dereference for field access on `*Record`.
-- [x] Support field assignment lvalues for records.
-- [x] Lower record `.packed` and `.align(N)` layout attributes through record
- type construction.
-- [x] Support address-of field lvalues.
-- [x] Preserve operand type for `@expect`.
-- [x] Preserve operand type for integer scalar intrinsics.
-- [x] Add negative coverage for direct by-value recursive records.
-- [x] Support labelled result-typed while expressions.
-- [x] Distinguish loop and switch control scopes so `continue` can only target
- loops, including unlabeled `continue` through nested switches.
-- [x] Allow expression switches over enums to omit `default` only when every
- enum value is covered.
-- [x] Enforce precedence-island boundaries while allowing parenthesized mixed
- islands and normal additive/multiplicative precedence.
-- [x] Reject legacy `int`, `&&`, and `||` spellings now that spec-shaped tests
- use explicit scalar types and `and`/`or`.
-- [x] Lower `@labeladdr` and computed `goto` through the public CG label-address
- API, with a CG-level compare-chain fallback for targets without native
- label-address branches.
-- [x] Support `@pad` and `@align` low-level data initializer builtins in
- top-level byte-array object definitions.
-- [x] Reject `restrict` qualifiers on non-pointer types while preserving valid
- pointer-qualified type parsing.
-- [x] Replace accidental direct-recursive-record failures with an explicit
- incomplete-record-by-value diagnostic.
-- [x] Keep function-local statics in lexical local scope by representing them
- as scoped symbol-backed variables instead of global source bindings.
-- [x] Support byte-string initializers for function-local static byte arrays.
-- [x] Structurally split `lang/toy` into `compile.c`, `lexer.c/.h`,
- `internal.h`, `parser_core.c`, `symbols.c`, `literals.c`, and
- `parser.c` while preserving the public `cfree_toy_compile` API.
-- [x] Replace fixed parser-context arrays for locals, functions, globals,
- named types, scopes, and labels with growable `ToyParser`-owned storage
- and explicit parser cleanup.
-- [x] Split type inspection and named-type registration into `types.c`, with a
- `ToyTypeTable` owned by `ToyParser` instead of loose named-type arrays.
-- [x] Split declaration, ABI, field, and record attribute parsing into
- `attrs.c` while preserving focused coverage for attribute-heavy cases.
-- [x] Add `@pcrel` lowering for typed top-level array initializers and keep
- `@symdiff` parsed for supported relocation paths; Mach-O object emission
- for `@symdiff` remains a backend-format follow-up.
-- [x] Remove legacy untyped atomic, `@index`, prefix-deref, and old
- `@va_arg(ap, T)` compatibility spellings, with negative tests for each.
-- [x] Remove legacy `@target()` alias in favor of `@target_arch()`, with
- negative coverage.
-- [x] Track mutability on `ToyVar` and reject assignment to block-local `let`
- storage while preserving mutation through pointer pointees.
-- [x] Accept tuple field indexes in `@offsetof<T>(N)`.
-- [x] Emit top-level tuple record data definitions from positional constant
- initializers.
-- [x] Add negative coverage for invalid tail calls, including variadic callees
- and return-type mismatches.
-- [x] Add declaration-order negative coverage for function and type use before
- declaration.
-- [x] Add invalid declaration, field, and record attribute negative coverage.
-- [x] Support `@pcrel` data initializer builtins in typed top-level record and
- tuple integer fields as well as arrays.
-- [x] Add negative coverage for mutually recursive by-value records through
- forward declarations.
-- [x] Add declaration coverage for `extern let` and extern thread-local object
- attributes.
-- [x] Add data initializer negative coverage for `@pcrel` outside initializer
- context and invalid pcrel slot widths.
-- [x] Add a Toy-owned type metadata layer with `ToyTypeId`, structural type
- entries for builtins/arrays/pointers/functions/anonymous records,
- nominal entries for aliases/records/tuples/enums, qualifier entries, and
- symbol-table links from locals/functions/globals/named types while
- preserving existing public-CG lowering.
-- [x] Move local/static-local/function/global insertion behind `symbols.c`
- helpers and move named-type lookup into `types.c` so parser code stops
- owning those table mutation details directly.
-- [x] Replace computed-goto target fixed scratch storage with parser-owned
- growable scratch space, removing the remaining fixed label-target cap.
-- [x] Start replacing temporary inline-asm helpers with spec-shaped typed
- `@asm<T>` parsing for `outputs(...)`, `inputs(...)`, `clobbers(...)`,
- and `flags(...)`, including a runnable one-output/two-input operand case.
-- [x] Rewrite the `@asm_int` coverage to spec-shaped typed `@asm<i64>` and
- remove the temporary `@asm_int` builtin with negative legacy coverage.
-- [x] Rewrite the `@asm_imm` coverage to typed `@asm<i64>` immediate inputs
- and remove the temporary `@asm_imm` builtin with negative legacy
- coverage.
-- [x] Rewrite the `@asm_memory` and `@asm_clobber` coverage to typed
- `@asm<void>` with `clobbers(...)`, and remove those temporary helpers
- with negative legacy coverage.
-- [x] Extend typed `@asm<T>` operands for memory inputs, inout outputs, and
- early-clobber constraints; rewrite and remove the remaining temporary
- `@asm_mem`, `@asm_inout`, and `@asm_early` helpers with negative legacy
- coverage.
-- [x] Rewrite the remaining legacy non-generic `@asm(...)`/`@asmnop()`
- call sites to `@asm<void>` and remove those temporary parser branches
- with negative legacy coverage.
-- [x] Rewrite the remaining `@typecheck`, `@byteconst`, and `@fieldtest`
- helper call sites to ordinary Toy constants/records and remove those
- temporary builtins with negative legacy coverage.
-- [x] Replace the implicit built-in `Pair` test type with an ordinary record
- declaration and remove the parser/context special case for `Pair`.
-- [x] Remove temporary `@target_os()` by fixing Apple ARM64 vararg `va_list`
- lowering in the backend and running Toy vararg tests unconditionally,
- with negative legacy coverage for `@target_os`.
-- [x] Support returning expression-control-flow values directly from
- `return if`, `return switch`, `return while<T>`, and labelled
- result-typed `return label: while<T>` by reusing the typed
- value-to-local lowering path.
-- [x] Parse typed inline-assembly `clobber_abi(.caller_saved)` groups and
- pass them through to `CfreeCgInlineAsm.clobber_abi_sets`.
-- [x] Support typed inline-assembly record results by validating record fields
- against multiple outputs and materializing those outputs into an
- aggregate value for normal field projection.
-- [x] Split typed inline-assembly parsing and validation into `lang/toy/asm.c`
- so the main parser no longer owns that low-level builtin subgrammar.
-- [x] Add negative coverage for typed inline-assembly missing outputs, result
- type mismatches, record output count/name mismatches, unknown flags, and
- unknown ABI clobber sets.
-- [x] Move type parsing into `lang/toy/types.c` beside the Toy type metadata
- layer, leaving `parser.c` focused on expressions, statements, and
- declarations.
-- [x] Preserve source field types for named records separately from erased CG
- storage, so forward-declared record pointer fields can be initialized,
- assigned, projected, and passed to typed functions after completion.
-- [x] Remove the temporary inline-asm `arch(...)` template/clobber selector in
- favor of ordinary typed asm strings and `@target_arch()`-based language
- tests, with negative coverage for the removed selector.
-- [x] Parse typed inline-assembly named input operands of the form
- `name = in("constraint", expr)`.
-- [x] Split Toy parsing into focused builtin (`builtins.c`), expression/lvalue
- (`expr.c`), declaration (`decls.c`), data initializer (`data.c`), and
- statement/control-flow (`parser.c`) modules, with shared helper
- boundaries declared in `internal.h`.
diff --git a/doc/X64_PARITY_CHECKLIST.md b/doc/X64_PARITY_CHECKLIST.md
@@ -1,389 +0,0 @@
-# x64 parity checklist
-
-Goal: bring `x86_64` to the same practical coverage as `aarch64` across
-standalone asm, disasm, C/toy compilation, object/link output, runtime, and
-debug tooling.
-
-## Status as of 2026-05-21
-
-- Fixed an optimized x64 two-address arithmetic hazard where preparing the
- destination register could clobber the RHS. This covered integer ALU ops,
- scalar FP binops, and variable shifts whose count must be routed through
- `%cl`. The motivating failure was `24_tail_arg_permute` computing `b * 2`
- into `%r8` and then immediately overwriting `%r8` with `a`.
-- Implemented x64 `u64`/FP conversions for `CV_ITOF_U` and `CV_FTOI_U`, closing
- two explicit backend panics in scalar conversion coverage.
-- Made x64 tail-call handling conservative when stack arguments are involved:
- direct emission falls back to a normal call when the current frame cannot
- reuse enough incoming stack argument area, and optimized call planning
- clears `CG_CALL_TAIL` for stack-argument tail calls. Register-only tail calls
- remain eligible.
-- Fixed virtual-register materialization so delayed arithmetic/comparisons use
- fresh SSA values instead of destructively redefining one of their source
- virtual registers. Optimized x64 jump-table lowering is enabled again and the
- `123_spec_demo` x64 O1/O2 switch path now uses the table path correctly.
-- Fixed x64 cross-test runner overhead by passing `--pull=never` to podman.
- `--net=none` blocked container networking, but podman still performed a host
- image pull/manifest check before launch. A one-case x64 toy O0/O1/O2 smoke
- dropped from roughly 30 seconds per container lookup to roughly 2 seconds.
-- Verified after these changes: `make bin`; targeted x64 toy execution for
- `24_tail_arg_permute`, `25_tail_many_stack_args`, `26_tail_live_pressure`,
- `29_tail_cross_arch_stack`, `09_function_params`, `100_record_data_relocation`,
- `120_data_symdiff`, `123_spec_demo`, and `65_rounding_conversions`. A full
- x64 toy cross run is still pending after the podman runner fix.
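
The two-address hazard in the first bullet above can be illustrated with a small register-file model. This is only a sketch, not the backend's code: `Regs`, `demo_sub_naive`, `demo_sub_safe`, and the choice of `r11` as scratch are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 16 integer registers, r11 reserved as scratch. */
enum { DEMO_NREGS = 16, DEMO_SCRATCH = 11 };

typedef struct { int64_t r[DEMO_NREGS]; } Regs;

/* Naive two-address lowering of dst = lhs - rhs:
 *   mov dst, lhs ; sub dst, rhs
 * If dst aliases rhs, the mov destroys rhs before it is read. */
static void demo_sub_naive(Regs *m, int dst, int lhs, int rhs) {
    m->r[dst] = m->r[lhs];   /* mov dst, lhs */
    m->r[dst] -= m->r[rhs];  /* sub dst, rhs */
}

/* Hazard-aware lowering: when dst aliases rhs (but not lhs), copy rhs
 * to the scratch register before preparing the destination. */
static void demo_sub_safe(Regs *m, int dst, int lhs, int rhs) {
    assert(dst != DEMO_SCRATCH && lhs != DEMO_SCRATCH);
    if (dst == rhs && dst != lhs) {
        m->r[DEMO_SCRATCH] = m->r[rhs];  /* mov scratch, rhs */
        rhs = DEMO_SCRATCH;
    }
    m->r[dst] = m->r[lhs];   /* mov dst, lhs */
    m->r[dst] -= m->r[rhs];  /* sub dst, rhs (or scratch) */
}
```

With `lhs` in `r1` and both `dst` and `rhs` in `r8`, the naive sequence computes `lhs - lhs`, while the hazard-aware one preserves the original right-hand side. A real backend might instead swap operands for commutative ops rather than spend a scratch register.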
-
-## Status as of 2026-05-21 (parity push)
-
-Landed across the seven areas; commit `6b82eb5` "Bring x64 to parity with aa64"
-(20 files, +3560 / −688).
-
-- Built the x64 ISA descriptor layer. `src/arch/x64/isa.{h,c}` now holds a
- 75-row `x64_insn_table` plus per-format pack/unpack helpers. Encoder
- (`emit.c`), decoder (`disasm.c`), and assembler (`asm.c`) all consult the
- same table. Adding a new instruction is a one-row change.
-- Encoder migration (phase 2): 19 `emit_*` bodies in `src/arch/x64/emit.c`
- now build a format struct and call `x64_<format>_pack`. Byte-for-byte
- output unchanged (verified by `cmp` against the pre-migration
- `123_spec_demo.O2.o`).
-- Assembler refactor (phase 3): `src/arch/x64/asm.c` mnemonic dispatch goes
- through `find_mnemonic_row` / `parse_and_emit_for_format` instead of the
- hand-coded `sym_eq` cascade.
-- Disassembler rewrite: hand-coded if/else chain replaced by
- `x64_decode_prefixes` → `x64_disasm_find` → `x64_print_operands`. The
- `123_spec_demo` jump-table dispatch now disassembles cleanly with zero
- `.byte` fallback.
-- Codegen bug fix: `emit_extend_rr` was a silent no-op when `src_size >= 32`,
- leaving the destination register undefined. Repaired with a `mov dst, src`
- when needed. Closes the only baseline x64 toy failure
- (`123_spec_demo/X-O2:x64`).
-- Darwin x64 ABI seam: new `src/abi/abi_apple_x64.c` exporting
- `apple_x64_vtable`; `src/arch/x64/arch.c` now branches on
- `CFREE_OS_MACOS`. Previously x86_64-apple-darwin used the Linux SysV
- vtable unconditionally.
-- SysV x64 variadic metadata populated: `vararg_gp_offset` /
- `vararg_fp_offset` derived from fixed-arg consumption using named
- pool-size constants.
-- Linker dynamic relocations: `src/link/link_reloc.c` handles
- `R_X86_64_RELATIVE`, `R_X86_64_GLOB_DAT`, `R_X86_64_JUMP_SLOT`; `R_X86_64_COPY`
- now panics with a clear message instead of falling through to the generic
- "unsupported reloc kind" path.
-- libc test harnesses parameterized: `test/libc/{musl,glibc}/run.sh` honour
- `CFREE_LIBC_ARCHES` (default `aa64`; `x64` available). Per-arch
- sysroot/rt/triple/loader lookup with graceful SKIP when artifacts missing.
- `test/test.mk` wires the per-arch sysroot prerequisites.
- `test/libc/cases/01_syscall_write.c` splits into per-arch syscall ABI
- branches under `#ifdef`.
-- Inline asm: new `test/arch/x64_inline_test.c` with 6 smoke cases driving
- `CGTarget->asm_block` directly; `asm.c` gains the `%b` byte-register
- modifier and a full 8-bit register spelling table.
-- Debugger scaffolding: new `src/arch/x64/dbg.c` with `INT3` sentinel and a
- conservative shim builder that declines on RIP-relative operands.
- `src/dbg/displaced.c` and `src/dbg/step.c` widen arch dispatch — x64 now
- falls back to `CFREE_UNSUPPORTED` gracefully instead of failing in
- `dbg_displaced_prepare`.
-- Verification: `make test` 3616/0/0; x64 toy R/L/X 1286/0/0 (baseline
- 1285/1/0); x64 musl 18/18 (9 static + 9 dynamic); x64 glibc 9/9.
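
As a rough illustration of the SysV variadic metadata bullet above, the initial `va_list` offsets follow from how many argument registers the fixed parameters consume. The sketch below uses the register counts and slot sizes from the System V AMD64 ABI; the struct, constant, and function names are hypothetical and are not the names used by `sysv_x64_compute_func_info`.

```c
#include <assert.h>
#include <stdint.h>

/* Pool sizes from the System V AMD64 ABI register save area. */
enum {
    DEMO_GP_ARG_REGS   = 6,   /* rdi, rsi, rdx, rcx, r8, r9 */
    DEMO_FP_ARG_REGS   = 8,   /* xmm0..xmm7 */
    DEMO_GP_SLOT_BYTES = 8,
    DEMO_FP_SLOT_BYTES = 16,
    DEMO_GP_SAVE_BYTES = DEMO_GP_ARG_REGS * DEMO_GP_SLOT_BYTES  /* 48 */
};

/* Hypothetical container for the two derived offsets. */
typedef struct {
    uint32_t gp_offset;
    uint32_t fp_offset;
} DemoVarargOffsets;

/* Derive the starting offsets from fixed-argument register consumption.
 * Counts beyond the pool size are clamped: those arguments went to the
 * stack and do not move the register-save-area cursor any further. */
static DemoVarargOffsets demo_vararg_offsets(uint32_t fixed_gp, uint32_t fixed_fp) {
    DemoVarargOffsets v;
    if (fixed_gp > DEMO_GP_ARG_REGS) fixed_gp = DEMO_GP_ARG_REGS;
    if (fixed_fp > DEMO_FP_ARG_REGS) fixed_fp = DEMO_FP_ARG_REGS;
    v.gp_offset = fixed_gp * DEMO_GP_SLOT_BYTES;
    v.fp_offset = DEMO_GP_SAVE_BYTES + fixed_fp * DEMO_FP_SLOT_BYTES;
    return v;
}
```

For a `printf`-shaped function with one fixed pointer parameter, this yields a GP offset of 8 and an FP offset of 48, which is where `va_arg` starts reading the saved registers.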
-
-## Status as of 2026-05-21 (x64 runtime-linked push)
-
-- `driver/runtime.c` now auto-builds the x64 runtime archive with the same
- higher-level freestanding members that `rt/Makefile` builds for x64:
- assert, `si_div`, string, stdlib, qsort, printf, cache, atomics, ifunc,
- int/int64, and coroutine sources.
-- Added `cc-auto-builds-and-links-libcfree-rt-x64` to `test/driver/run.sh`.
- The regression builds an x64 executable through `cfree cc --support-dir`,
- forces implicit `build/rt/x86_64-linux/libcfree_rt.a` creation, and checks
- that the auto-built archive contains `printf.c`.
-- Verified clean x64 runtime rebuilds for Linux and Darwin:
- `rm -rf build/rt/x86_64-linux && make rt-x86_64-linux` and
- `rm -rf build/rt/x86_64-apple-darwin && make rt-x86_64-apple-darwin`.
- The x64 coroutine source `rt/lib/coro/x86_64.c` is compiled through
- `build/cfree cc` in both variants.
-- Verified x64 runtime-linked execution on Darwin/arm64 via Podman
- linux/amd64: `CFREE_RT_RUNTIME_ARCHES=x64 bash test/rt/run.sh` passed
- 5/0/0, and an explicit driver-auto-built x64 runtime binary exited 42 under
- `podman run --platform linux/amd64`.
-
-## Asm / disasm
-
-- [x] Expand `src/arch/x64/asm.c` beyond the current small AT&T subset:
- branches, calls, arithmetic, shifts, compares, loads/stores, LEA, atomics,
- SSE scalar FP, and backend-emitted forms.
- - 2026-05-21: rewritten to be table-driven via `x64_insn_table`. Every
- mnemonic the prior dispatch handled flows through the table; new
- mnemonics land as one row + a format parser. Mnemonics outside the
- current corpus are not yet wired (per-format parsers exist only for
- the formats the standalone-asm and inline-asm tests exercise today —
- see phase-3 report for the list).
-- [x] Build an x64 ISA descriptor layer equivalent in role to
- `src/arch/aa64/isa.{h,c}` so encoder, decoder, printer, and tests share
- one instruction description.
- - 2026-05-21: `src/arch/x64/isa.{h,c}` landed; encoder, decoder, and
- assembler all consult `x64_insn_table`.
-- [x] Expand `src/arch/x64/disasm.c` to decode every instruction emitted by
- x64 codegen and every standalone-asm form accepted by the assembler.
- - 2026-05-21: `disasm.c` now drives entirely through `x64_disasm_find`
- + `x64_print_operands`. Cross-checked against `llvm-objdump` on the
- spec_demo binary — operand syntax matches instruction-by-instruction.
-- [x] Add x64 listing tests under `test/asm/listing/`.
- - 2026-05-22: added `x64_symbols` listing coverage for function/local
- labels and x64 PC-relative relocation annotations.
-- [ ] Make asm round-trip (`S`) meaningful for x64 codegen output and gate the
- x64-emitted corpus on it.
-- [x] Update `test/asm/regen.sh` or add an x64 variant for clang/objdump golden
- regeneration.
- - 2026-05-22: `CFREE_TEST_ARCH=x64 test/asm/regen.sh ...` now filters
- by `.targets`, uses the x86_64 clang target, and regenerates x64
- encode/decode/listing goldens.
-
-## Inline asm
-
-- [x] Broaden x64 inline-asm template rendering to cover operand modifiers and
- memory forms expected by GNU-style x86_64 asm.
- - 2026-05-21: `%b` byte-register modifier landed with a full r0..r15
- byte-name table. `%h` (high-byte), `%k` (32-bit alias), and `%z`
- (instruction-size selector) remain unimplemented.
- - 2026-05-21: GNU x86 register modifiers now render on x64:
- `%w` = 16-bit, `%k` = 32-bit, `%h` = high-byte register where legal,
- `%b` handles low byte registers including REX-only byte names, and
- `%z` selects the instruction suffix from operand type. Symbolic
- `%[name]` operands work with the same modifier path.
-- [x] Add an x64 inline-asm unit test parallel to `test/arch/aa64_inline_test.c`.
- - 2026-05-21: `test/arch/x64_inline_test.c` lands with 6 smoke cases
- and a `test-x64-inline` Makefile target wired into `make test`.
-- [ ] Verify register clobbers, `"cc"`, `"memory"`, callee-saved preservation,
- early-clobber, matching constraints, and named operands on x64.
-- [x] Add C and toy inline-asm execution cases that run on an x64 host/runner.
- - 2026-05-21: added `cg_x64_inline_asm_modifiers.c`; verified x64 parse
- R/E paths at O0 and O1 alongside the existing x64 inline asm C smoke.
-
-## C / toy codegen
-
-- [~] Close remaining explicit x64 backend panics in `src/arch/x64/ops.c`
- (`u64`/FP conversions, unsupported bitcasts, non-constant memset byte
- paths, indirect aggregate arg shapes, tail-call/sret gaps, and other
- `unsupported`/`unimpl` paths).
- - 2026-05-21: `u64`/FP conversions are implemented; tail-call stack-arg
- cases are handled conservatively rather than panicking or emitting an
- invalid sibling tail call.
- - 2026-05-21 (parity push): the remaining `unsupported`/`unimpl` paths
- (same-class bitcast, tail+sret, indirect aggregate args, memset
- non-imm byte, alloca align >16, exotic atomic op kinds, x64-unique
- "shift count kind") are *all* mirrored in aa64 — they are shared
- architectural gaps, not x64-specific regressions. Leaving this row
- partially checked until the corresponding aa64 gaps close too.
-- [~] Match aa64 coverage for scalar integer, FP, pointer, aggregate, varargs,
- atomics, intrinsics, labels, computed goto, switch lowering, and alloca.
- - 2026-05-21: scalar optimized integer/FP RHS clobbers, variable shift
- count clobbers, and optimized x64 jump-table virtual-reg materialization
- are fixed.
- - 2026-05-21 (parity push): `emit_extend_rr` 32→64 silent-no-op fixed
- (was leaving destination uninitialized). x64 toy R/L/X 1286/0/0 —
- feature parity with aa64 for the toy corpus is reached.
-- [ ] Prove x64 optimized and unoptimized C parse corpus paths with targeted
- `CFREE_TEST_ARCH=x64` runs.
-- [x] Prove toy cross-arch path `X` for x64 alongside aa64 cases.
- - 2026-05-21: targeted x64 `X` runs pass for the tail-call, conversion,
- data-relocation, and switch regression cases listed above. Full x64 `X`
- run should be repeated after the podman `--pull=never` runner fix.
- - 2026-05-21 (parity push): full x64 R/L/X toy run: 1286/0/0.
-
-## ABI / platform
-
-- [x] Finish SysV x86_64 ABI edge cases: aggregate classification, register save
- area, variadic call metadata (`AL`), sret, byval, and mixed int/FP returns.
- - 2026-05-21: variadic metadata (`vararg_gp_offset`,
- `vararg_fp_offset`) now populated by `sysv_x64_compute_func_info`.
- At that point mixed int/FP aggregate classification was still pending.
- - 2026-05-21: SysV aggregate classification now computes INTEGER/SSE
- per eightbyte for small records, including mixed int/FP records and
- homogeneous float pairs. x64 call planning/direct emission now routes
- direct multi-part args wholly to stack when either register pool lacks
- capacity, preserves indirect sret sources that conflict with `%rdi` or
- `%rax`, and accepts global byval sources. Added ABI metadata coverage
- plus x64 parse execution cases for mixed record params/returns.
-- [x] Decide and implement x86_64 Darwin ABI differences where they diverge from
- Linux/SysV behavior.
- - 2026-05-21: `apple_x64_vtable` seam added (thin delegate to SysV
- today). `x64_abi_vtable` branches on `CFREE_OS_MACOS`. Future
- Darwin-only behaviour can land in `abi_apple_x64.c` without
- re-touching SysV.
-- [ ] Implement x86_64 `long double` semantics (`x87` 80-bit in 16-byte
- storage) or document a staged compatibility mode.
-- [ ] Audit predefined macros, target triples, and driver target selection for
- Linux and Darwin x86_64 parity.
-
-## Object / link / driver
-
-- [x] Ensure ELF x86_64 relocations cover all codegen, asm, TLS, PLT/GOT, ifunc,
- and linker-script cases currently passing for aa64.
- - 2026-05-21: `link_reloc.c` adds the missing `R_X86_64_RELATIVE`,
- `R_X86_64_GLOB_DAT`, `R_X86_64_JUMP_SLOT` cases (previously fell
- through to a generic panic) and gives `R_X86_64_COPY` a descriptive
- error. Static/dynamic ELF link cases pass for x64 musl + glibc.
- - 2026-05-21: object relocation iteration now reports x86_64 ELF
- relocation names, and x64 ELF roundtrip/link paths cover `PLT32`,
- `PC32`, GOTPCREL/GOTPCRELX, TLS local-exec, ifunc, dynamic
- `RELATIVE`/`GLOB_DAT`/`JUMP_SLOT`, and linker-script cases.
-- [ ] Bring Mach-O x86_64 object/link coverage up to the aa64 Mach-O subset.
- - Ignored for this ELF-only pass.
-- [x] Exercise `cfree as`, `cc`, `ld`, `objdump`, `run`, and `emu` paths with
- x64-specific tests where the command is intended to support x64.
- - 2026-05-21: `cfree as` and `cfree objdump` confirmed for x64 via
- round-trip demo. `cfree cc` / `cfree ld` covered by toy R/L/X and
- musl/glibc suites. `emu` remains aa64/rv64-only by current design.
-- [x] Add x64 object disassembly annotation coverage for symbols and relocs.
- - 2026-05-21: `cfree_disasm_iter` now matches relocations anywhere
- inside the decoded instruction byte range, with section filtering so
- same-offset relocs in other text sections do not bleed through.
- `test/elf/unit/x64_disasm_annotations.c` covers symbol labels plus
- `call`, RIP-relative load, and `jmp` reloc annotations.
-
-## Runtime / libc
-
-- [x] Build `libcfree_rt.a` for x86_64 Linux and Darwin through cfree, not just
- host clang probes.
-- [x] Bring x86_64 coroutine/runtime assembly and C sources through the cfree
- assembler/compiler path.
- - 2026-05-21: clean `rt-x86_64-linux` and
- `rt-x86_64-apple-darwin` rebuilds compile the full x64 source set,
- including `rt/lib/coro/x86_64.c`, through `build/cfree cc`. The driver
- auto-build path now includes the same higher-level x64 runtime members
- needed by runtime-linked binaries.
-- [x] Retarget musl/glibc libc harnesses to x64 sysroots and run the same cases
- currently exercised for aa64.
- - 2026-05-21: `test/libc/{musl,glibc}/run.sh` honour
- `CFREE_LIBC_ARCHES` (default `aa64`; `x64` available). x64 musl 18/18,
- x64 glibc 9/9.
-- [x] Add x64 smoke cases that use cfree-emitted bytes, not only clang-produced
- harness binaries.
- - 2026-05-21: `test/driver/run.sh` adds
- `cc-auto-builds-and-links-libcfree-rt-x64`; an explicit
- driver-auto-built x64 runtime binary was run via Podman linux/amd64
- and exited 42.
-
-## Debug / JIT / tooling
-
-- [~] Add x64 displaced-step/debugger support: `INT3`, RIP-relative fixups,
- ucontext register marshalling, and frame walking.
- - 2026-05-21: scaffold landed (`src/arch/x64/dbg.c`); `dbg_x64_int3_byte`
- + a conservative `dbg_x64_build_shim` that declines on RIP-relative.
- `dbg_displaced_prepare` and `dbg_step_resume` dispatch x64 to the new
- path, falling back to `CFREE_UNSUPPORTED` gracefully. Real shim
- generation (ModR/M decoder + RIP-relative re-encoding) is the next
- step.
-- [ ] Emit and validate x64 DWARF CFI/line-info details, including frame-pointer
- conventions and call-frame rows.
-- [ ] Fill x64 JIT support gaps: executable memory, relocations, symbol calls,
- TLV/TLS behavior, and native-host execution tests.
-- [ ] Decide emulator scope for x86_64; either implement it or mark `emu` as
- non-parity for x64.
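
The displaced-step item above notes that the conservative shim builder declines RIP-relative operands. One way such a check could look, once a decoder has located the ModR/M byte, is sketched below. The function name is hypothetical and this is not taken from `src/arch/x64/dbg.c`. It relies only on the x86-64 encoding rule that `mod == 00` with `r/m == 101` selects `RIP + disp32` in 64-bit mode.

```c
#include <assert.h>
#include <stdint.h>

/* Returns nonzero when a ModR/M byte selects RIP-relative addressing in
 * 64-bit mode: mod (bits 7..6) == 00 and r/m (bits 2..0) == 101.
 * REX.B does not change this form, so no prefix state is needed here. */
static int demo_modrm_is_rip_relative(uint8_t modrm) {
    uint8_t mod = (uint8_t)(modrm >> 6);
    uint8_t rm  = (uint8_t)(modrm & 7u);
    return mod == 0u && rm == 5u;
}
```

An instruction that passes this test cannot simply be copied to a scratch buffer and executed there, because its displacement is relative to the original address; that is why a real shim needs the re-encoding step the checklist lists as future work.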
-
-## Known pre-existing issue (aa64)
-
-- aa64/01_syscall_write [dynamic] musl link is killed by SIGKILL inside
- `cfree ld` (deterministic, rc=137). Reproduces with this commit reverted, so
- it is not a regression from the parity push. The trigger appears to be the
- file-scope inline-asm shape combined with the aa64 dynamic-PIE codepath; the
- other 8 aa64 cases in the same suite link and run cleanly, and x64
- dynamic-PIE works on every case. Worth a follow-up investigation in the linker.
-
-## Asm / disasm
-
-- [x] Expand `src/arch/x64/asm.c` beyond the current small AT&T subset:
- branches, calls, arithmetic, shifts, compares, loads/stores, LEA, atomics,
- SSE scalar FP, and backend-emitted forms.
-- [x] Build an x64 ISA descriptor layer equivalent in role to
- `src/arch/aa64/isa.{h,c}` so encoder, decoder, printer, and tests share
- one instruction description.
-- [x] Expand `src/arch/x64/disasm.c` to decode every instruction emitted by
- x64 codegen and every standalone-asm form accepted by the assembler.
-- [x] Add x64 listing tests under `test/asm/listing/`.
-- [ ] Make asm round-trip (`S`) meaningful for x64 codegen output and gate the
- x64-emitted corpus on it.
-- [x] Update `test/asm/regen.sh` or add an x64 variant for clang/objdump golden
- regeneration.
-
-## Inline asm
-
-- [ ] Broaden x64 inline-asm template rendering to cover operand modifiers and
- memory forms expected by GNU-style x86_64 asm.
-- [ ] Add an x64 inline-asm unit test parallel to `test/arch/aa64_inline_test.c`.
-- [ ] Verify register clobbers, `"cc"`, `"memory"`, callee-saved preservation,
- early-clobber, matching constraints, and named operands on x64.
-- [ ] Add C and toy inline-asm execution cases that run on an x64 host/runner.
-
-## C / toy codegen
-
-- [ ] Close remaining explicit x64 backend panics in `src/arch/x64/ops.c`
- (`u64`/FP conversions, unsupported bitcasts, non-constant memset byte
- paths, indirect aggregate arg shapes, tail-call/sret gaps, and other
- `unsupported`/`unimpl` paths).
- - 2026-05-21: `u64`/FP conversions are implemented; tail-call stack-arg
- cases are handled conservatively rather than panicking or emitting an
- invalid sibling tail call.
-- [ ] Match aa64 coverage for scalar integer, FP, pointer, aggregate, varargs,
- atomics, intrinsics, labels, computed goto, switch lowering, and alloca.
- - 2026-05-21: scalar optimized integer/FP RHS clobbers, variable shift
- count clobbers, and optimized x64 jump-table virtual-reg materialization
- are fixed.
-- [ ] Prove x64 optimized and unoptimized C parse corpus paths with targeted
- `CFREE_TEST_ARCH=x64` runs.
-- [ ] Prove toy cross-arch path `X` for x64 alongside aa64 cases.
- - 2026-05-21: targeted x64 `X` runs pass for the tail-call, conversion,
- data-relocation, and switch regression cases listed above. Full x64 `X`
- run should be repeated after the podman `--pull=never` runner fix.
-
-## ABI / platform
-
-- [x] Finish SysV x86_64 ABI edge cases: aggregate classification, register save
- area, variadic call metadata (`AL`), sret, byval, and mixed int/FP returns.
- - 2026-05-21: completed in the parity follow-up above; long double
- remains tracked separately.
-- [ ] Decide and implement x86_64 Darwin ABI differences where they diverge from
- Linux/SysV behavior.
-- [ ] Implement x86_64 `long double` semantics (`x87` 80-bit in 16-byte
- storage) or document a staged compatibility mode.
-- [ ] Audit predefined macros, target triples, and driver target selection for
- Linux and Darwin x86_64 parity.
-
-## Object / link / driver
-
-- [ ] Ensure ELF x86_64 relocations cover all codegen, asm, TLS, PLT/GOT, ifunc,
- and linker-script cases currently passing for aa64.
-- [ ] Bring Mach-O x86_64 object/link coverage up to the aa64 Mach-O subset.
-- [ ] Exercise `cfree as`, `cc`, `ld`, `objdump`, `run`, and `emu` paths with
- x64-specific tests where the command is intended to support x64.
-- [ ] Add x64 object disassembly annotation coverage for symbols and relocs.
-
-## Runtime / libc
-
-- [x] Build `libcfree_rt.a` for x86_64 Linux and Darwin through cfree, not just
- host clang probes.
-- [x] Bring x86_64 coroutine/runtime assembly and C sources through the cfree
- assembler/compiler path.
-- [ ] Retarget musl/glibc libc harnesses to x64 sysroots and run the same cases
- currently exercised for aa64.
-- [x] Add x64 smoke cases that use cfree-emitted bytes, not only clang-produced
- harness binaries.
-
-## Debug / JIT / tooling
-
-- [ ] Add x64 displaced-step/debugger support: `INT3`, RIP-relative fixups,
- ucontext register marshalling, and frame walking.
-- [ ] Emit and validate x64 DWARF CFI/line-info details, including frame-pointer
- conventions and call-frame rows.
-- [ ] Fill x64 JIT support gaps: executable memory, relocations, symbol calls,
- TLV/TLS behavior, and native-host execution tests.
-- [ ] Decide emulator scope for x86_64; either implement it or mark `emu` as
- non-parity for x64.
-
-## Test policy
-
-- [ ] Add x64-targeted filters/goldens for each new feature as it lands.
-- [ ] Keep skips explicit and arch-scoped; do not let x64 cases silently ride
- aa64 defaults.
-- [ ] Promote targeted x64 runs into default or CI-equivalent coverage once they
- are stable on available runners.
-- [x] Prevent podman-backed cross tests from hitting registries during normal
- execution by using `--pull=never`; test images must be prepared explicitly.
diff --git a/doc/api-migration.md b/doc/api-migration.md
@@ -1,304 +0,0 @@
-# Public API migration
-
-The single `<cfree.h>` header has been replaced by a component-split
-public surface under `<cfree/...>`. Headers in `include/` are the new
-contract — internals (`src/`, `lang/`, `driver/`, `test/`) must be
-rewritten to match. **No compat shims.**
-
-## New header layout
-
-| Header | What it covers |
-| --- | --- |
-| `cfree/core.h` | `CfreeCompiler`, `CfreeContext`, `CfreeHeap`, `CfreeDiagSink`, `CfreeWriter`, `CfreeFileIO`, `CfreeMetrics`, `CfreeTarget`, `CfreeStatus`, `CfreeIterResult`, `CfreeSrcLoc`, `CfreeBytes`, `CfreeSymBind/Kind`, `CfreeSym`, lifecycle, `cfree_writer_mem`. |
-| `cfree/source.h` | `cfree_source_add_*`, `CfreeSourceFile`. |
-| `cfree/support/arena.h` | Public arena (used by frontends and link script parser). |
-| `cfree/support/hashmap.h` | Public hashmap. |
-| `cfree/objmodel.h` | Format-neutral object types: `CfreeObjSection/Symbol/Group`, `CfreeObjSecInfo`, `CfreeObjSymInfo`, `CfreeObjReloc`, `CfreeRelocKind`. |
-| `cfree/objbuild.h` | `CfreeObjBuilder` API: `cfree_obj_builder_*`. |
-| `cfree/object.h` | `CfreeObjFile` reader: `cfree_obj_open` etc. |
-| `cfree/compile.h` | `cfree_compile_c_obj{,_emit}`, `_asm_`, `_source_`. `CfreeCCompileOptions`, `CfreeAsmCompileOptions`, `CfreeFrontendCompileOptions`. `CfreeLanguage`, `CfreeSourceInput`, `cfree_register_frontend`, dep iter. |
-| `cfree/link.h` | `CfreeExeLinkOptions`, `CfreeSharedLinkOptions`, `CfreeJitLinkOptions`, `CfreeLinkScript`, parse helper. Takes `CfreeJitHost` for JIT link. |
-| `cfree/jit.h` | `CfreeJit*`, `CfreeJitHost { execmem, tls }`, image inspection. |
-| `cfree/dbg.h` | `CfreeJitSession`, `CfreeDbgHost { os }`, breakpoints/resume. |
-| `cfree/emu.h` | `cfree_emu_run/new/step/lookup/free`. |
-| `cfree/dwarf.h` | `CfreeDebugInfo`, `cfree_dwarf_open/free`, query API. `loc_read` takes a memory-reader callback, not a JIT session. |
-| `cfree/arch.h` | `CfreeArchReg`, `CfreeUnwindFrame`, register name/index helpers. |
-| `cfree/archive.h` | `cfree_ar_*` over `CfreeBytes` + `CfreeContext`. |
-| `cfree/disasm.h` | `cfree_disasm_iter_new` over a `CfreeDisasmContext`, `cfree_disasm_obj`. |
-| `cfree/frontend.h` | Frontend convenience: includes `cg.h`, `source.h`, `support/arena.h`; declares `cfree_frontend_run`, metrics bridge, fatal helpers. |
-| `cfree/cg.h` | Codegen API. Includes `cfree/core.h` + `cfree/objbuild.h`. |
-
-`<cfree.h>` no longer exists. Every TU must include only what it needs.
-
-## Type-level renames and reshapes
-
-### `CfreeEnv` → `CfreeContext`
-
-```c
-typedef struct CfreeContext {
- CfreeHeap *heap;
- const CfreeFileIO *file_io; /* may be NULL */
- CfreeDiagSink *diag;
- const CfreeMetrics *metrics; /* may be NULL */
- int64_t now; /* negative when host has no clock */
-} CfreeContext;
-```
-
-`CfreeEnv.execmem`, `.dbg_os`, `.jit_tls` are **removed**. They are now
-passed as `CfreeJitHost { execmem, tls }` to `cfree_link_jit` and
-`CfreeDbgHost { os }` to `cfree_jit_session_new`.
-
-Internal: `Compiler.env` becomes `Compiler.ctx` (type `const
-CfreeContext*`). `CfreeContext cfree_compiler_context(CfreeCompiler*)`
-is the public accessor that returns a value copy.
-
-### `CfreeBytesInput` → split
-
-The old single shape carried `name + data + len + lang`. It is now two:
-
-```c
-typedef struct CfreeBytes { /* used everywhere except source compile */
- const char *name;
- const uint8_t *data;
- size_t len;
-} CfreeBytes;
-
-typedef struct CfreeSourceInput { /* used by cfree_frontend_compile */
- CfreeBytes bytes;
- CfreeLanguage lang;
-} CfreeSourceInput;
-```
-
-Linker archive input is now `CfreeLinkArchiveInput { bytes + flags... }`,
-not `CfreeBytesInputArchive`.
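
A minimal sketch of how a caller might wrap in-memory text in the split input shape described above. The two structs are copied from the note so the block is self-contained; the `CfreeLanguage` enumerators and the `demo_source_input` helper are hypothetical and are not part of the documented headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical enumerators; the real CfreeLanguage values are not shown
 * in the migration note. */
typedef enum { DEMO_LANG_C, DEMO_LANG_ASM } CfreeLanguage;

typedef struct CfreeBytes {
    const char    *name;
    const uint8_t *data;
    size_t         len;
} CfreeBytes;

typedef struct CfreeSourceInput {
    CfreeBytes    bytes;
    CfreeLanguage lang;
} CfreeSourceInput;

/* Hypothetical helper: view a NUL-terminated string as a source input.
 * The returned struct borrows `name` and `text`; it does not copy them. */
static CfreeSourceInput demo_source_input(const char *name, const char *text,
                                          CfreeLanguage lang) {
    CfreeSourceInput in;
    assert(name != NULL && text != NULL);
    in.bytes.name = name;
    in.bytes.data = (const uint8_t *)text;
    in.bytes.len  = strlen(text);
    in.lang       = lang;
    return in;
}
```

Keeping the language tag outside `CfreeBytes` means readers that only need raw bytes (object files, archives) never see a field that is meaningless to them.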
-
-### `CfreeCompileOptions` → split per language
-
-```c
-typedef struct CfreeCodeOptions {
- int opt_level, debug_info;
- uint64_t epoch;
- const CfreePathPrefixMap *path_map;
- uint32_t npath_map;
-} CfreeCodeOptions;
-
-typedef struct CfreePreprocessOptions {
- const char *const *include_dirs; uint32_t ninclude_dirs;
- const char *const *system_include_dirs; uint32_t nsystem_include_dirs;
- const CfreeDefine *defines; uint32_t ndefines;
- const char *const *undefines; uint32_t nundefines;
-} CfreePreprocessOptions;
-
-typedef struct CfreeDiagnosticOptions {
- int warnings_are_errors;
- uint32_t max_errors;
-} CfreeDiagnosticOptions;
-
-typedef struct CfreeCCompileOptions {
- CfreeCodeOptions code;
- CfreePreprocessOptions preprocess;
- CfreeDiagnosticOptions diagnostics;
-} CfreeCCompileOptions;
-
-typedef struct CfreeAsmCompileOptions {
- CfreeCodeOptions code;
- CfreeDiagnosticOptions diagnostics;
-} CfreeAsmCompileOptions;
-
-typedef struct CfreeFrontendCompileOptions {
- CfreeCodeOptions code;
- CfreeDiagnosticOptions diagnostics;
- const void *language_options;
-} CfreeFrontendCompileOptions;
-```
-
-`cfree_compile_obj` is split into `cfree_compile_c_obj` and
-`cfree_compile_asm_obj`; registered frontends are driven through
-`cfree_frontend_new`, `cfree_frontend_compile`, and `cfree_frontend_free`.
-The old `cfree_compile_source_obj{,_emit}` convenience entrypoints were removed.
-Frontend hook signature is now
-`CfreeStatus (*)(CfreeFrontendState*, const CfreeFrontendCompileOptions*, const CfreeSourceInput*, CfreeObjBuilder*)`.
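
To make the hook shape concrete, here is a sketch of a function that matches the signature quoted above. The opaque struct declarations stand in for the real headers, and the `CfreeStatus` enumerators, the `DemoFrontendHook` typedef, and `demo_hook` are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status values; the real enum lives in cfree/core.h. */
typedef enum { DEMO_STATUS_OK = 0, DEMO_STATUS_INVALID = 1 } CfreeStatus;

/* Opaque stand-ins for types the real headers define. */
typedef struct CfreeFrontendState CfreeFrontendState;
typedef struct CfreeFrontendCompileOptions CfreeFrontendCompileOptions;
typedef struct CfreeObjBuilder CfreeObjBuilder;

typedef struct CfreeBytes {
    const char    *name;
    const uint8_t *data;
    size_t         len;
} CfreeBytes;

typedef struct CfreeSourceInput {
    CfreeBytes bytes;
    int        lang;  /* stand-in for CfreeLanguage */
} CfreeSourceInput;

/* Function-pointer type with the parameter list from the note. */
typedef CfreeStatus (*DemoFrontendHook)(CfreeFrontendState *,
                                        const CfreeFrontendCompileOptions *,
                                        const CfreeSourceInput *,
                                        CfreeObjBuilder *);

/* A do-nothing hook: rejects empty input, otherwise reports success.
 * A real frontend would parse the bytes and drive the object builder. */
static CfreeStatus demo_hook(CfreeFrontendState *state,
                             const CfreeFrontendCompileOptions *opts,
                             const CfreeSourceInput *input,
                             CfreeObjBuilder *out) {
    (void)state; (void)opts; (void)out;
    if (input == NULL || input->bytes.len == 0)
        return DEMO_STATUS_INVALID;
    return DEMO_STATUS_OK;
}
```

The hook receives the split `CfreeSourceInput` rather than the old combined bytes shape, and reports failure through the status return rather than a sentinel value.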
-
-### Status-returning APIs
-
-Every entry point that used to return `int` (0 ok, nonzero error) or a pointer
-(NULL on failure) now returns `CfreeStatus` and writes its result to an out
-parameter. Examples:
-
-| Old | New |
-| --- | --- |
-| `CfreeCompiler* cfree_compiler_new(t,e)` | `CfreeStatus cfree_compiler_new(t, ctx, CfreeCompiler **out)` |
-| `CfreeWriter* cfree_writer_mem(h)` | `CfreeStatus cfree_writer_mem(h, CfreeWriter **out)` |
-| `CfreeArena* cfree_arena_new(h, blk)` | `CfreeStatus cfree_arena_new(h, blk, CfreeArena **out)` |
-| `CfreeObjFile* cfree_obj_open(env, in)` | `CfreeStatus cfree_obj_open(ctx, bytes, CfreeObjFile **out)` |
-| `void cfree_obj_close` | `void cfree_obj_free` |
-| `CfreeObjSecInfo cfree_obj_section(o,i)` | `CfreeStatus cfree_obj_section(o, i, CfreeObjSecInfo *out)` |
-| `const u8* cfree_obj_section_data(o,i,*)` | `CfreeStatus cfree_obj_section_data(o, i, const uint8_t **out, size_t *len_out)` |
-| `CfreeObjSymIter* cfree_obj_symiter_new` | `CfreeStatus cfree_obj_symiter_new(file, CfreeObjSymIter **out)`; iterator next returns `CfreeIterResult`. |
-| `int cfree_obj_symiter_next(it, *out)` | `CfreeIterResult cfree_obj_symiter_next(it, CfreeObjSymInfo *out)` |
-| `CfreeDebugInfo* cfree_dwarf_open(c,o)` | `CfreeStatus cfree_dwarf_open(ctx, obj, CfreeDebugInfo **out)` |
-| `void cfree_dwarf_close` | `void cfree_dwarf_free` |
-| `int cfree_dwarf_*` (queries) | `CfreeStatus cfree_dwarf_*` (queries; semantics carried by status enum) |
-| `CfreeArIter* cfree_ar_iter_init(it,b)` | `CfreeStatus cfree_ar_iter_new(ctx, bytes, CfreeArIter **out)`; iterator next returns `CfreeIterResult`. |
-| `CfreeJit* cfree_link_jit(...)` | `CfreeStatus cfree_link_jit(c, opts, host, CfreeJit **out_jit)` |
-| `CfreeJitSession* cfree_jit_session_new` | `CfreeStatus cfree_jit_session_new(jit, dbghost, CfreeJitSession **out)` |
-| `int cfree_jit_session_*` | `CfreeStatus cfree_jit_session_*` |
-| `CfreeDisasmIter* cfree_disasm_iter_new` | `CfreeStatus cfree_disasm_iter_new(const CfreeDisasmContext*, bytes, len, vaddr, const CfreeObjFile* annot, CfreeDisasmIter **out)`. Iterator next returns `CfreeIterResult`. |
-| `int cfree_obj_disasm` | `CfreeStatus cfree_disasm_obj(ctx, objfile, w)` and `cfree_disasm_obj_bytes(ctx, bytes, w)`. |
-| `int cfree_register_frontend` | `CfreeStatus cfree_register_frontend` |
-| `int cfree_link_script_parse(c, t, l, *)` | `CfreeStatus cfree_link_script_parse(const CfreeContext*, t, l, CfreeLinkScript **out)`; pair-free signature `cfree_link_script_free(const CfreeContext*, CfreeLinkScript*)`. |
-| `u32 cfree_source_add_file(c,p,sys)` | `CfreeStatus cfree_source_add_file(c, p, sys, uint32_t *id_out)` (analogous for memory/builtin/include/file). |
-| `int cfree_arch_register_index/at` | `CfreeStatus` variants. |
-
-`CfreeWriter` vtable now returns `CfreeStatus` on `write` and `seek` and
-exposes `status` (not `error`); the dispatch helpers in
-`cfree/core.h` return `CfreeStatus`.
-
-`CfreeFileIO.read_all` and `.open_writer` now return `CfreeStatus` and
-take an out-parameter (already declared in the new header).
-
-`CfreeDbgOs` vtable methods that used to return `int` now return
-`CfreeStatus` (e.g. `thread_start`; `event_new` now takes `void **event_out`).
-
-### `cfree_pipeline_*` is gone
-
-The driver synthesizes its own thin orchestrator (one `CfreeCompiler` +
-the call sequence). All `cfree_pipeline_*` call sites in `driver/` must
-inline the equivalent composition: build compiler → compile_c/asm → keep
-builder live → link.
-
-### `CfreeJit` linker host
-
-`cfree_link_jit` now requires a `const CfreeJitHost*` and writes the
-result through `CfreeJit **out_jit`. The host bundles `execmem` + `tls`
-that used to live on `CfreeEnv`. Drivers construct one per build.
-
-### Object-builder public API
-
-`cfree/objbuild.h` exposes the format-neutral build API atop the
-internal `obj_*`. The public surface uses `CfreeSym` (interned through
-the compiler) for section/symbol names, `CfreeObjSection`/`Symbol`/`Group`
-opaque-int handles, and `CfreeRelocKind { arch, obj_fmt, code }` for
-relocations. The implementation lives in `src/api/objbuilder.c` and is a
-thin adapter around `src/obj/obj.h`. Section indices wire through 1:1;
-the public API uses `CFREE_SECTION_NONE` = `UINT32_MAX` while the
-internal sentinel is `OBJ_SEC_NONE = 0` — convert at the boundary.
-
-### Object-file public reader
-
-`CfreeObjFile` is the public read handle. It can be:
-
- - opened from bytes via `cfree_obj_open(const CfreeContext*, const CfreeBytes*, CfreeObjFile **out)`,
- - obtained for inspection from a `CfreeJit` via `cfree_jit_view(jit)`
- (the returned `CfreeObjFile *` is non-owned).
-
-Internally the reader keeps a borrowed `ObjBuilder*` (so symbol/reloc
-iteration reuses the existing read path). The public API never exposes
-the internal handle; everything is funnelled through `CfreeObjFmt`,
-`CfreeObjSecInfo`, `CfreeObjSymInfo`, `CfreeObjReloc`.
-
-### Source registration
-
-`source_add_*` internal functions now return `CfreeStatus` and write the
-new file id to an out parameter (the public API requires this; the
-internal callers are easier to update at the same time). The public
-`CfreeSourceFile` is unchanged in shape.
-
-### Arena public type
-
-`cfree_arena_new` returns `CfreeStatus`. Callers receive the arena
-through an out pointer. The macros (`cfree_arena_new_obj`, etc.) are
-unchanged.
-
-### Per-component status codes
-
-The full enum:
-
-```c
-typedef enum CfreeStatus {
- CFREE_OK = 0, CFREE_ERR, CFREE_NOMEM, CFREE_INVALID, CFREE_UNSUPPORTED,
- CFREE_MALFORMED, CFREE_IO, CFREE_NOT_FOUND, CFREE_AMBIGUOUS,
-} CfreeStatus;
-```
-
-Pick the most specific one available. Old return semantics map as follows:
-- bad argument → `CFREE_INVALID`
-- allocation failure → `CFREE_NOMEM`
-- input not found in DWARF / link script / archive → `CFREE_NOT_FOUND`
-- ambiguous DWARF line resolution → `CFREE_AMBIGUOUS`
-- malformed bytes (bad magic, truncated input) → `CFREE_MALFORMED`
-- IO error from a `CfreeWriter` or `CfreeFileIO` → `CFREE_IO`
-- generic compile failure with diagnostics already emitted → `CFREE_ERR`
-- unsupported feature / arch → `CFREE_UNSUPPORTED`
-
-## Translation rubric for call-site updates
-
-1. **Headers** — replace `#include <cfree.h>` with the specific
- `<cfree/X.h>` headers actually needed. Frontends include
- `<cfree/frontend.h>`. Drivers compose what they need. Internals
- include `src/...` and the relevant `cfree/X.h`.
-
-2. **`CfreeEnv` → `CfreeContext`** — drop `execmem/dbg_os/jit_tls`
- fields at construction sites; pass them to the JIT/dbg hosts later.
- Internal `Compiler.env` becomes `Compiler.ctx`.
-
-3. **`CfreeBytesInput` for non-source uses** → `CfreeBytes`. Drop the
- `lang` field. For source compile entries, build a
- `CfreeSourceInput`.
-
-4. **`CfreeCompileOptions` users** — pick the right specialization:
- - C → `CfreeCCompileOptions { .code, .preprocess, .diagnostics }`
- - asm → `CfreeAsmCompileOptions { .code, .diagnostics }`
- - registered frontend → `CfreeFrontendCompileOptions`
-
-5. **Return value rewrites** — for every API marked Status, do
- `CfreeStatus st = cfree_X(..., &out); if (st != CFREE_OK) ...`. Don't
- discard non-OK statuses silently.
-
-6. **Iterators** — `next()` returns `CfreeIterResult` (`CFREE_ITER_ITEM`,
- `_END`, `_ERROR`). Migrate `while (...next(&out))` loops to
- `for (;;) { CfreeIterResult r = next(it, &out); if (r != CFREE_ITER_ITEM) break; ... }`.
-
-7. **JIT/dbg construction** — driver builds:
-
- ```c
- CfreeJitHost jhost = { .execmem = &my_execmem, .tls = &my_tls };
- CfreeJit *jit;
- CfreeStatus st = cfree_link_jit(c, &opts, &jhost, &jit);
-
- CfreeDbgHost dhost = { .os = &my_dbg_os };
- CfreeJitSession *sess;
- st = cfree_jit_session_new(jit, &dhost, &sess);
- ```
-
-8. **DWARF loc read** — replace `cfree_jit_session_*` based reads with
- a small `CfreeDwarfReadMemFn` adapter that calls
- `cfree_jit_session_read_mem` on a captured session. The DWARF API no
- longer pulls in `cfree/dbg.h`.
-
-9. **`cfree_pipeline_*` call sites** — replaced with explicit
- `cfree_compiler_new` + `cfree_compile_c_obj` (etc.) + `cfree_link_*`
- sequences. The driver carries the resulting compiler/builder
- ownership directly.
-
-10. **Linker script** — `cfree_link_script_parse(ctx, txt, len, &out)`;
- free with `cfree_link_script_free(ctx, out)`.
-
-## Internal aliases (src/core/core.h)
-
-`Compiler`, `Heap`, `DiagSink`, `Writer`, `Target`, `ObjBuilder`,
-`ArchKind`, `OSKind`, `ObjFmt` aliases stay. Add `Context` aliasing
-`CfreeContext`. Rename `Compiler.env` → `Compiler.ctx`. Update every
-reader. `compiler_init` takes `const CfreeContext*`.
-
-## Things that **don't** change
-
-- `CfreeSym` is still a `uint32_t`.
-- `CfreeSrcLoc` shape unchanged.
-- Internal `obj_*`, `read_elf*`, `read_macho*`, `read_coff`, `read_wasm`
- signatures don't move — only their public wrappers do.
-- Internal `link_*`, `dwarf_*`, `mc_*`, `cg_*` keep their internal
- shapes.
-- The codegen public API (`cfree/cg.h`) is largely intact; only
- `cfree_cg_new` and `cfree_cg_type_record_field` switch to
- Status-return shapes.
diff --git a/doc/builtins.md b/doc/builtins.md
@@ -1,385 +0,0 @@
-# Compiler builtins used by cfree
-
-cfree's freestanding headers hardcode every value that's invariant under
-its target assumptions, and delegate to compiler builtins for everything
-that genuinely varies across targets. This file is the contract: if a
-target violates an "assumption" below, the headers (and `test/smoke.c`)
-will be wrong.
-
-## Target assumptions (hardcoded)
-
-- `CHAR_BIT == 8`
-- `short == 16` bits, `int == 32` bits, `long long == 64` bits
-- Two's complement integer representation
-- `float` is IEEE 754 binary32
-- `double` is IEEE 754 binary64
-
-## What genuinely varies (delegated)
-
-| Quantity | Why it varies |
-| ------------------------- | -------------------------------------------------- |
-| `char` signedness | ARM defaults unsigned, x86 signed; flippable with `-funsigned-char`. Not changeable from a header. |
-| `long` width | LP64 (Unix 64-bit) makes it 64; LLP64 (Win64) and ILP32 keep it 32 |
-| `long double` format | x86 80-bit, AArch64 binary128 *or* binary64, PowerPC double-double, MSVC binary64 |
-| `FLT_ROUNDS` | Runtime rounding mode (function call required) |
-| `FLT_EVAL_METHOD` | x87 vs SSE vs embedded toolchains differ |
-| `intptr_t` width | 32 vs 64 bits |
-| `size_t`, `ptrdiff_t` | Track pointer width |
-| `wchar_t` | 16-bit on Windows, 32-bit on Unix; signedness varies |
-| `intmax_t` literal type | `long` on LP64, `long long` on LLP64 |
-| `int_fast{N}_t` widths | Each target picks its own "fast" width |
-| `va_list` and varargs ABI | Call convention is target-defined |
-| `max_align_t` | Track widest scalar alignment |
-
----
-
-## Builtins
-
-Every `__builtin_*` or `__*__` we still depend on, grouped by header.
-
-### `<float.h>`
-- `__builtin_flt_rounds()` — runtime rounding mode → `FLT_ROUNDS`
-- `__FLT_EVAL_METHOD__`
-- `__DECIMAL_DIG__`
-- `__LDBL_HAS_DENORM__` → `LDBL_HAS_SUBNORM`
-- `__LDBL_MANT_DIG__`, `__LDBL_DECIMAL_DIG__`, `__LDBL_DIG__`
-- `__LDBL_MIN_EXP__`, `__LDBL_MIN_10_EXP__`
-- `__LDBL_MAX_EXP__`, `__LDBL_MAX_10_EXP__`
-- `__LDBL_MAX__`, `__LDBL_MIN__`, `__LDBL_EPSILON__`
-- `__LDBL_DENORM_MIN__` → `LDBL_TRUE_MIN`
-
-### `<limits.h>`
-- `__LONG_MAX__` → `LONG_MAX` (and derived `LONG_MIN`, `ULONG_MAX`)
-- `__CHAR_UNSIGNED__` — defined ⇔ plain `char` is unsigned
-
-### `<stddef.h>`
-- `__PTRDIFF_TYPE__` → `ptrdiff_t`
-- `__SIZE_TYPE__` → `size_t`
-- `__WCHAR_TYPE__` → `wchar_t` (C only; in C++ it's a keyword)
-- `__builtin_offsetof(t, m)` → `offsetof`
-
-### `<stdint.h>`
-Types (aliases vary by data model even when limits don't):
-- `__INT{8,16,32,64}_TYPE__`, `__UINT{N}_TYPE__`
-- `__INT_LEAST{N}_TYPE__`, `__UINT_LEAST{N}_TYPE__`
-- `__INT_FAST{N}_TYPE__`, `__UINT_FAST{N}_TYPE__`
-- `__INTPTR_TYPE__`, `__UINTPTR_TYPE__`
-- `__INTMAX_TYPE__`, `__UINTMAX_TYPE__`
-
-Limits that are not pinned by the target assumptions:
-- `__INT_FAST{N}_MAX__`, `__UINT_FAST{N}_MAX__`
-- `__INTPTR_MAX__`, `__UINTPTR_MAX__`
-- `__INTMAX_MAX__`, `__UINTMAX_MAX__`
-- `__PTRDIFF_MAX__`
-- `__SIZE_MAX__`
-- `__WCHAR_MAX__`, `__WCHAR_MIN__`
-- `__WINT_MAX__`, `__WINT_MIN__`
-- `__SIG_ATOMIC_MAX__`, `__SIG_ATOMIC_MIN__`
-
-64-bit and intmax constant macros (literal suffix tracks the alias):
-- `__INT64_C(c)`, `__UINT64_C(c)`
-- `__INTMAX_C(c)`, `__UINTMAX_C(c)`
-
-### `<stdarg.h>`
-Entirely compiler-supplied — varargs ABI is target-defined:
-- `__builtin_va_list` (type)
-- `__builtin_va_start`, `__builtin_va_arg`, `__builtin_va_end`, `__builtin_va_copy`
-
-### `<stdatomic.h>`
-Atomic codegen, lock-free shape, and fence semantics are target-defined.
-The `__atomic_*` family must operate transparently on `_Atomic`-qualified
-pointers (no separate variant for atomic-typed args).
-
-Memory-order constants (values for the `memory_order` enum):
-- `__ATOMIC_RELAXED`, `__ATOMIC_CONSUME`, `__ATOMIC_ACQUIRE`,
- `__ATOMIC_RELEASE`, `__ATOMIC_ACQ_REL`, `__ATOMIC_SEQ_CST`
-
-Lock-free shape (per-type, value 0/1/2 per C11 7.17.5):
-- `__ATOMIC_{BOOL,CHAR,CHAR16_T,CHAR32_T,WCHAR_T,SHORT,INT,LONG,LLONG,POINTER}_LOCK_FREE`
-
-Types for the C11 char/wide aliases (also delegated for `<stddef.h>`):
-- `__CHAR16_TYPE__`, `__CHAR32_TYPE__`, `__WCHAR_TYPE__`
-
-Operations (signatures match the GCC `__atomic` builtin family):
-- `__atomic_load_n(ptr, order)`
-- `__atomic_store_n(ptr, val, order)`
-- `__atomic_exchange_n(ptr, val, order)`
-- `__atomic_compare_exchange_n(ptr, expected, desired, weak, succ, fail)`
-- `__atomic_fetch_add`, `__atomic_fetch_sub`, `__atomic_fetch_or`,
- `__atomic_fetch_xor`, `__atomic_fetch_and` — `(ptr, val, order)`
-- `__atomic_thread_fence(order)`, `__atomic_signal_fence(order)`
-- `__atomic_is_lock_free(size, ptr)`
-- `__atomic_test_and_set(ptr, order)`, `__atomic_clear(ptr, order)` — for
- `atomic_flag`
-
-### Syscalls (cfree extension)
-
-Declared in `<cfree/syscall.h>`. Kernel-trap primitive so libc syscall
-stubs can be pure C. Numbers (`SYS_*`) are libc's responsibility —
-cfree only provides the instruction. All args and result are `long`;
-pointers/sizes/fds get cast at the call site.
-
-- `__cfree_syscall0(nr)` … `__cfree_syscall6(nr, a0, a1, a2, a3, a4, a5)`
-
-Semantics:
-- Result is normalized to Linux-style `-errno` on failure, non-negative
- on success, on every target. On BSD/Darwin the lowering inspects the
- carry/C flag and rewrites the result.
-- Modeled as an opaque external call with full memory clobber plus the
- target's syscall-clobber list (so the optimizer cannot move work
- across the trap).
-- Not available on WASM — compile-time error directs callers to WASI
- imports.
-
-Per-target lowering:
-
-| Target | Instr | Nr reg | Args | Result | Error |
-| --------------- | ----------------- | ------ | -------------------------- | ------ | -------- |
-| Linux x86_64 | `syscall` | rax | rdi, rsi, rdx, r10, r8, r9 | rax | rax < 0 |
-| Linux i386 | `int 0x80` | eax | ebx, ecx, edx, esi, edi, ebp | eax | eax < 0 |
-| Linux aarch64 | `svc #0` | x8 | x0..x5 | x0 | x0 < 0 |
-| Linux arm | `svc #0` | r7 | r0..r5 | r0 | r0 < 0 |
-| Linux riscv | `ecall` | a7 | a0..a5 | a0 | a0 < 0 |
-| Darwin x86_64 | `syscall` | rax (class bits already in nr) | rdi, rsi, rdx, r10, r8, r9 | rax | carry → −errno |
-| Darwin aarch64 | `svc #0x80` | x16 | x0..x5 | x0 | C flag → −errno |
-
-i386 6-arg case (`ebp` is the frame pointer): cfree saves/restores
-`ebp` around the trap.
-
-### Bare-metal primitives (cfree extension)
-
-Declared in `<cfree/baremetal.h>`. For freestanding / embedded use, so
-libc and HAL code can stay pure C. All have opaque-call +
-full-memory-clobber semantics so the optimizer cannot reorder loads,
-stores, or other side effects across them.
-
-Interrupt control (the standard save/disable/restore critical-section
-idiom):
-- `unsigned long __cfree_irq_save(void)` — disable IRQs, return previous mask
-- `void __cfree_irq_restore(unsigned long prev)`
-- `void __cfree_irq_disable(void)`, `void __cfree_irq_enable(void)`
-
-Lowerings: x86 `cli`/`sti` + `pushf`/`popf`; Cortex-A/R `cpsid i`/`cpsie i`
-+ CPSR; Cortex-M `cpsid i`/`cpsie i` + PRIMASK (selected by
-`__ARM_ARCH_*` profile macros); aarch64 `msr daifset/daifclr, #2` +
-`mrs daif`; RISC-V `csrr{ci,si} mstatus, 8`.
-
-CPU memory barriers — distinct from `__atomic_thread_fence`. C11 fences
-provide ordering for the C abstract machine; these emit the specific
-CPU barriers required for DMA-coherent device memory, MMU/TLB
-reconfiguration, and self-modifying / freshly-loaded code.
-
-```c
-typedef enum {
- __CFREE_BARRIER_FULL, // sy
- __CFREE_BARRIER_INNER, // ish
- __CFREE_BARRIER_INNER_STORE, // ishst
- __CFREE_BARRIER_OUTER, // osh
- __CFREE_BARRIER_OUTER_STORE, // oshst
- __CFREE_BARRIER_NON_SHARE, // nsh
-} __cfree_barrier_scope;
-
-void __cfree_dmb(__cfree_barrier_scope); // ordering only
-void __cfree_dsb(__cfree_barrier_scope); // ordering + completion
-void __cfree_isb(void); // pipeline flush after sysreg / MMU change
-```
-
-Lowerings: arm/aarch64 `dmb/dsb/isb <scope>`; x86 `mfence`/`lfence`/`sfence`
-(scope ignored — TSO collapses the cases) and `isb` is a no-op (x86
-self-snoops); RISC-V `fence rw,rw` and `fence.i`. WASM: compile-time error.
-
-Cache maintenance (range-based; cfree reads `CTR`/`CTR_EL0` once at
-startup for the line size and emits a loop):
-- `void __cfree_dcache_clean(const void *, unsigned long)` — write-back
-- `void __cfree_dcache_invalidate(void *, unsigned long)`
-- `void __cfree_dcache_clean_invalidate(void *, unsigned long)`
-- `void __cfree_icache_invalidate(const void *, unsigned long)`
-
-Lowerings: aarch64 `dc {cvac,ivac,civac}` + `ic ivau` loops; arm v7+
-equivalents via CP15. x86: no-ops (cache-coherent ICache included).
-RISC-V: Zicbom / Zicboz instructions when those extensions are present,
-otherwise a compile-time error.
-
-Hints:
-- `void __cfree_nop(void)`
-- `void __cfree_yield(void)` — spin-loop hint; arm `yield`, x86 `pause`,
- RISC-V `pause`
-- `void __cfree_wfi(void)` — sleep until next interrupt; arm/aarch64
- `wfi`, x86 `hlt`, RISC-V `wfi`. All three are privileged, which is
- fine for bare-metal. Compile-time error on WASM.
-- `void __cfree_wfe(void)`, `void __cfree_sev(void)` — arm/aarch64
- only; compile-time error elsewhere. The inter-core event-flag
- abstraction (SEV sets, WFE waits, exclusive-monitor release also
- sets) does not generalize: x86 MONITOR/MWAIT is address-watch and
- privileged-extension; RISC-V has no base-ISA equivalent. Use
- `__cfree_yield` + `__cfree_wfi` for portable spin/idle loops.
-
-System-register access (`mrs`/`msr`, `csrr`/`csrw`, `rdmsr`/`wrmsr`,
-MMU/cache config, etc.) is **not** provided as a builtin. Callers use
-extended inline asm directly. Rationale: register names and privilege
-rules vary per ISA generation; the call sites are arch-specific
-already; abstracting adds churn without removing platform code.
-
----
-
-## `libcfree_rt.a` — runtime support library
-
-The codegen emits calls to symbols the user can't reasonably supply. cfree
-ships them all in a single archive: integer/float/atomic helpers *and* the
-`mem*` family the codegen lowers struct copies and aggregate inits to.
-
-Naming follows the libgcc / compiler-rt convention (`{op}{mode}{N}`, where
-mode is `qi/hi/si/di/ti/sf/df/tf` for 1/2/4/8/16-byte int and 32/64/128-bit
-float). All `mem*` are weak so a user libc wins.
-
-### Mem intrinsics (always shipped)
-- `memcpy`, `memmove`, `memset`, `memcmp`
-
-### Integer helpers
-Always:
-- Div/mod 64-bit: `__divdi3`, `__udivdi3`, `__moddi3`, `__umoddi3`, `__divmoddi4`, `__udivmoddi4`
-- Count/bits: `__clzsi2`, `__clzdi2`, `__ctzsi2`, `__ctzdi2`, `__ffsdi2`, `__popcountsi2`, `__popcountdi2`, `__paritysi2`, `__paritydi2`, `__bswapsi2`, `__bswapdi2`
-- Compare: `__cmpdi2`, `__ucmpdi2`
-- Negate/abs: `__negdi2`, `__absvdi2`
-
-64-bit targets only (128-bit `__int128` support):
-- `__divti3`, `__udivti3`, `__modti3`, `__umodti3`, `__divmodti4`, `__udivmodti4`
-- `__ashlti3`, `__lshrti3`, `__ashrti3`, `__multi3`, `__negti2`, `__clzti2`, `__ctzti2`
-
-32-bit targets only (no native 64-bit ops):
-- `__muldi3`, `__ashldi3`, `__lshrdi3`, `__ashrdi3`
-
-### Soft-float (only on FPU-less targets — RV{32,64}I, ARM `-mfloat-abi=soft`, WASM-no-simd)
-- Arithmetic `sf`/`df`/`tf`: `__add`, `__sub`, `__mul`, `__div`, `__neg`
- → e.g. `__addsf3`, `__divdf3`, `__multf3`
-- Int → float: `__float{,un}{si,di,ti}{sf,df,tf}` (e.g. `__floatdisf`, `__floatunsidf`)
-- Float → int: `__fix{,uns}{sf,df,tf}{si,di,ti}` (e.g. `__fixdfdi`, `__fixunssfsi`)
-- Float → float: `__extendsfdf2`, `__extendsftf2`, `__extenddftf2`, `__truncdfsf2`, `__trunctfsf2`, `__trunctfdf2`
-- Compare: `__eq`, `__ne`, `__lt`, `__le`, `__gt`, `__ge`, `__unord` × `sf2`/`df2`/`tf2`
-
-### Nonlocal jumps + stackful coroutines (per-arch, always shipped)
-`<setjmp.h>` and `<cfree/coro.h>` share one per-target context payload
-(256 bytes, 16-byte aligned): callee-saved GPRs + callee-saved FPRs
-+ sp + return address. `jmp_buf` and `coro_ctx` are both opaque
-typedefs over that payload; the runtime reinterprets them as the
-per-arch struct.
-
-- `setjmp`, `longjmp` — `<setjmp.h>` (C11 7.13). cfree extension:
- this header is *not* in the C11 freestanding subset.
-- `coro_init`, `coro_resume`, `coro_yield`, `coro_self` — public
- asymmetric API in `<cfree/coro.h>`. Resume drives a coroutine
- forward; yield suspends back to the most recent resumer; resumes
- nest like function calls. Status (`CORO_INIT` / `RUNNING` /
- `SUSPENDED` / `DEAD`) is tracked on the `coro_t` and propagates
- through `coro_resume`'s result.
-- `__cfree_coro_switch(from, to, value) -> uintptr_t` — the symmetric
- primitive. `coro_resume` / `coro_yield` are built on it; setjmp =
- save+return-0, longjmp = restore+deliver-val. Exposed (with the
- `__cfree_` prefix to signal "compiler-builtin-style") for
- schedulers that don't fit the asymmetric resume-chain model.
-- `__cfree_coro_ctx_init`, `__cfree_coro_trampoline` — internal,
- used only by `lib/coro/coro.c`'s asymmetric layer.
-
-Implementation: one master `.c` per arch under `lib/coro/` (file-scope
-asm + tiny C `__cfree_coro_ctx_init`), plus one arch-agnostic
-`coro/coro.c` for the public asymmetric layer. ARM has two arch
-masters: `arm32.c` (Thumb-2, ARMv7+, may use VFP `d8-d15`) and
-`arm32_thumb1.c` (ARMv6-M, no IT blocks / no VFP / data-processing
-limited to r0-r7). Not provided for: WASM (would need an
-Asyncify-fiber port).
-
-### Atomic fallbacks (only when target lacks native atomics for that width)
-- Generic: `__atomic_load`, `__atomic_store`, `__atomic_exchange`, `__atomic_compare_exchange`
-- Sized N ∈ {1,2,4,8,16}: `__atomic_load_N`, `__atomic_store_N`, `__atomic_exchange_N`, `__atomic_compare_exchange_N`, `__atomic_fetch_{add,sub,and,or,xor,nand}_N`
-
-### Architecture-specific aliases
-
-**ARM AAPCS / AEABI** (32-bit ARM only; these are aliases the AEABI mandates):
-- Int div/mod: `__aeabi_idiv`, `__aeabi_uidiv`, `__aeabi_idivmod`, `__aeabi_uidivmod`, `__aeabi_ldivmod`, `__aeabi_uldivmod`
-- 64-bit shift/mul: `__aeabi_llsl`, `__aeabi_llsr`, `__aeabi_lasr`, `__aeabi_lmul`
-- Soft-float arith: `__aeabi_{f,d}{add,sub,mul,div,neg}`, `__aeabi_{f,d}rsub`
-- Soft-float convert: `__aeabi_f2iz`, `__aeabi_f2uiz`, `__aeabi_f2lz`, `__aeabi_f2ulz`, `__aeabi_d2iz`, `__aeabi_d2uiz`, `__aeabi_d2lz`, `__aeabi_d2ulz`, `__aeabi_i2f`, `__aeabi_ui2f`, `__aeabi_l2f`, `__aeabi_ul2f`, `__aeabi_i2d`, `__aeabi_ui2d`, `__aeabi_l2d`, `__aeabi_ul2d`, `__aeabi_f2d`, `__aeabi_d2f`
-- Soft-float compare: `__aeabi_fcmp{eq,lt,le,gt,ge,un}`, `__aeabi_dcmp{eq,lt,le,gt,ge,un}`
-- Mem variants (size-specialized): `__aeabi_memcpy`, `__aeabi_memcpy{4,8}`, `__aeabi_memmove`, `__aeabi_memmove{4,8}`, `__aeabi_memset`, `__aeabi_memset{4,8}`, `__aeabi_memclr`, `__aeabi_memclr{4,8}`
-
-**RISC-V** (only with `-msave-restore`, used by RV32E/embedded code-size builds):
-- `__riscv_save_{0..12}`, `__riscv_restore_{0..12}`
-
-**x86 / x86_64**: no architecture-specific aliases; uses the generic libgcc names above.
-
-**WASM**: uses generic names; `memcpy`/`memset`/`memmove` may lower to `memory.copy` / `memory.fill` instructions instead of calls.
-
----
-
-## Target-identification macros
-
-cfree predefines a small, stable set of macros so headers and user code
-can branch on architecture, OS, object format, and ABI without parsing
-target triples. Compatible-by-design with the GCC/Clang names — code
-written against `__x86_64__` / `__BYTE_ORDER__` / `__LP64__` works
-unchanged.
-
-### Compiler identification
-- `__cfree__` — defined to `1`
-- `__cfree_major__`, `__cfree_minor__`, `__cfree_patchlevel__`
-- `__STDC__ == 1`, `__STDC_VERSION__ == 201112L`
-- `__STDC_HOSTED__ == 0` (cfree is freestanding-only)
-- `__STDC_NO_COMPLEX__`, `__STDC_NO_THREADS__`, `__STDC_NO_VLA__` defined
-- `__STDC_NO_ATOMICS__` *not* defined (cfree implements `<stdatomic.h>`)
-
-### Architecture (exactly one defined)
-- `__i386__` — 32-bit x86
-- `__x86_64__` (and `__amd64__`) — 64-bit x86
-- `__arm__` — 32-bit ARM
-- `__aarch64__` — 64-bit ARM
-- `__riscv` — RISC-V (any width); paired with `__riscv_xlen` ∈ {32, 64}
-- `__wasm__` — WebAssembly; paired with `__wasm32__` or `__wasm64__`
-
-### Pointer width / data model
-- `__SIZEOF_POINTER__`, `__SIZEOF_LONG__`, `__SIZEOF_SIZE_T__`, `__SIZEOF_PTRDIFF_T__`, `__SIZEOF_WCHAR_T__`, `__SIZEOF_INT__`, `__SIZEOF_LONG_LONG__`, `__SIZEOF_FLOAT__`, `__SIZEOF_DOUBLE__`, `__SIZEOF_LONG_DOUBLE__`
-- One of: `__LP64__` / `_LP64` (Unix 64), `__ILP32__` (32-bit), or neither (LLP64 — Win64)
-
-### Endianness
-- `__BYTE_ORDER__` set to one of `__ORDER_LITTLE_ENDIAN__` / `__ORDER_BIG_ENDIAN__`
-- `__ORDER_LITTLE_ENDIAN__ == 1234`, `__ORDER_BIG_ENDIAN__ == 4321` (values match GCC)
-
-### OS / platform (zero or one defined; freestanding bare-metal defines none)
-- `__linux__` — Linux ABI
-- `__APPLE__` and `__MACH__` — Darwin / macOS
-- `_WIN32` (always on Windows), plus `_WIN64` on 64-bit Windows
-
-### Object format (exactly one defined per output)
-- `__ELF__` — ELF (Linux, *BSD, bare-metal Unix-ish)
-- `__MACH__` — Mach-O (Darwin)
-- `_WIN32` doubles as the PE/COFF marker (matches MSVC/MinGW convention)
-
-### ARM-specific (defined only when `__arm__` or `__aarch64__`)
-- `__ARM_ARCH` — integer arch version (7, 8, …); plus profile-specific `__ARM_ARCH_{7A,7R,7M,8A,…}__`
-- `__ARM_EABI__` — defined on AAPCS/AEABI targets (always, for cfree's ARM32)
-- `__ARM_PCS` (base PCS) or `__ARM_PCS_VFP` (hard-float PCS)
-- `__ARM_FP` — bitmask of supported FP widths (0x4=fp32, 0x8=fp64); undefined on soft-float
-- `__SOFTFP__` — defined ⇔ `-mfloat-abi=soft` (no FPU instructions, soft-float ABI)
-- `__ARM_NEON` — defined ⇔ NEON SIMD available
-
-### RISC-V-specific (defined only when `__riscv`)
-- `__riscv_xlen` ∈ {32, 64}
-- `__riscv_flen` ∈ {0, 32, 64} — widest hardware FP register (0 ⇒ soft-float)
-- Extension flags (defined ⇔ extension is on): `__riscv_mul`, `__riscv_div`, `__riscv_atomic`, `__riscv_compressed`, `__riscv_fdiv`, `__riscv_fsqrt`
-- ABI: `__riscv_float_abi_soft`, `__riscv_float_abi_single`, `__riscv_float_abi_double` (exactly one)
-
-### x86-specific (defined only when `__i386__` or `__x86_64__`)
-- Feature flags follow GCC names, defined ⇔ enabled at the chosen `-march`: `__SSE__`, `__SSE2__`, `__SSE3__`, `__SSSE3__`, `__SSE4_1__`, `__SSE4_2__`, `__AVX__`, `__AVX2__`, `__BMI__`, `__BMI2__`, `__POPCNT__`, `__FMA__`
-
-### WASM-specific (defined only when `__wasm__`)
-- `__wasm_simd128__` — defined ⇔ SIMD proposal enabled
-- `__wasm_bulk_memory__` — defined ⇔ `memory.copy`/`memory.fill` available (gates the lowering noted under mem intrinsics)
-
----
-
-## Discovery
-
-To enumerate what a compiler predefines for the current target:
-
-```sh
-cc -dM -E -x c /dev/null | sort
-```
diff --git a/doc/cg-api-status.md b/doc/cg-api-status.md
@@ -1,104 +0,0 @@
-# CG API And Toy Language Status
-
-## Current Status
-
-The public CG API in `src/api/cg.c` has concrete implementations for the
-planned value, selector, control-flow, type, data, intrinsic, atomic, variadic,
-and inline-asm entry points.
-
-Value categories are explicit:
-
-- `cfree_cg_push_symbol` and `cfree_cg_push_bytes` push pointer/address rvalues.
-- `cfree_cg_indirect` converts a non-void pointer rvalue to a pointee lvalue.
-- `cfree_cg_load` converts lvalues to rvalues.
-- `cfree_cg_addr` converts lvalues to pointer rvalues.
-- `cfree_cg_store` is statement-like: `[lvalue, value] -> []`.
-- `cfree_cg_dup` preserves value category and gives rvalue registers independent
- ownership.
-
-Selectors are lvalue-producing:
-
-- `cfree_cg_index` selects an element lvalue.
-- `cfree_cg_field` selects a record field lvalue.
-- Callers use `cfree_cg_addr` after a selector when they need an address.
-
-Control-flow and calls:
-
-- Public scopes are stack-disciplined. `scope_end` must be LIFO, and stale or
- inactive handles are rejected.
-- Expression-valued scopes reconcile fallthrough and break results through a
- canonical result slot.
-- Public inline helpers cover the common `if` / `else` pattern.
-- `cfree_cg_tail_call` is a terminator and pushes no result.
-
-Types:
-
-- `CFREE_CG_BUILTIN_VA_LIST` and `CfreeCgBuiltinTypes.va_list` expose the
- target ABI `va_list` type.
-- Pointer, array, qualified, and function type constructors intern by shape.
-- Aliases and nominal record/enum constructors remain source-identity
- producing.
-
-Toy currently supports:
-
-- Immutable and mutable globals, locals, parameters, function calls, recursion,
- variadic functions, `va_list`, pointers, address/deref syntax, arithmetic,
- comparisons, bitwise operators, shifts, unary operators, `&&`, `||`, `while`,
- `break`, `continue`, `if` / `else`, and `return tail f(...)`.
-- CG API coverage builtins: `typecheck()`, `byteconst()`, `alloca`, `index`,
- `memset`, `memcpy`, `atomic_load`, `atomic_store`, `atomic_add`,
- `atomic_sub`, `atomic_cas_ok`, `fence`, `popcount`, `ctz`, `clz`, `bswap`,
- `expect`, `fieldtest()`, `target()`, `target_os()`, `va_start`, `va_arg`,
- `va_end`, `va_copy`, `asm(...)`, `asm_int(...)`, `asm_imm(...)`,
- `asm_mem(...)`, `asm_inout(...)`, `asm_early(...)`, `asm_memory(...)`, and
- `asm_clobber(...)`.
-- Lowering uses the explicit value-category API:
- `push_symbol + indirect + load/store`, `push_bytes + indirect + load`,
- `cfree_cg_field`, `cfree_cg_va_*`, `cfree_cg_inline_asm`, statement-like
- `store`, terminator tail calls, and the public inline `if` / `else` helpers.
- `asm(arch("aa64", "x64", "rv64"))` chooses a target-specific template at
- compile time; an empty selected template is a no-op so unsupported inline-asm
- backends can still compile the same toy source.
-
-Toy validation:
-
-- `test/toy/run.sh` supports:
- - `R`: `cfree run case.toy`
- - `L`: `cfree cc -c case.toy`, `cfree ld`, native execution
- - `X`: opt-in Linux cross-target compile/link/execute for `aa64`, `x64`, and
- `rv64` via `cfree cc -target`, `cfree ld`, and `test/lib/exec_target.sh`
-- Cross-arch validation intentionally has no cross-arch JIT path.
-- `test/toy/cases/19_cg_api_variadic_asm.toy` executes variadic API coverage on
- non-macOS targets. On macOS/AArch64 it compiles the same variadic helper but
- avoids executing it because the current AArch64 backend va_arg walker is still
- AAPCS64-shaped while Apple `va_list` is a byte cursor.
-
-Current validation:
-
-- `make lib`
-- `make bin`
-- `make test-cg-api`
-- `make test-cg-binder`
-- `make test-toy` - 38 pass, 0 fail, 0 skip
-- `CFREE_TEST_PATHS=X test/toy/run.sh` - 57 pass, 0 fail, 0 skip
-- `make test-cg` - 1573 pass, 0 fail, 0 skip
-- `test/toy/demo.toy` compiles with `cfree cc -c`
-
-## Plan / TODOs
-
-1. Add direct CG API misuse tests.
- - Keep type return-value checks in `test/api/cg_type_test.c` or a sibling
- API test.
- - Add a focused panic-catching misuse harness for stack underflow,
- stale/non-LIFO scopes, invalid field indexes/base types, invalid
- `indirect`, and unsupported data relocation widths.
-
-2. Add toy error tests.
-   - Extend `test/toy/run.sh` with an error-case mode if needed.
-   - Add expected diagnostic-message matching.
-
-3. Complete `test/toy/demo.toy`.
-   - The demo currently covers toy syntax, globals, control flow, calls, memory
-     helpers, atomics, tail calls, inline asm, and public CG API builtins.
-   - Add a demo variadic path once macOS/AArch64 va_arg execution matches the
-     public ABI shape.
diff --git a/doc/cg-ext.md b/doc/cg-ext.md
@@ -1,618 +0,0 @@
-# Public CG Extension Plan
-
-Scope: extensions needed for `include/cfree/cg.h` to serve as a portable
-direct codegen API for frontends other than C. This is not a plan for a stored
-LLVM-like IR. `CfreeCg` remains an imperative emitter bound to a
-`CfreeObjBuilder`; frontends lower their own AST/HIR/MIR directly into the API.
-
-This API is new enough that compatibility with the current draft is not a
-constraint. Make breaking changes. One clean way of doing things.
-
-The target user is a language frontend with its own parser, type checker, and
-high-level lowering: C, Zig, Rust-like languages, toy languages, emulators, and
-system DSLs. The frontend should not include internal `src/` headers, should
-not know `Type*`, `ObjSymId`, `CGTarget`, or `MCEmitter`, and should be able to
-generate correct code for every backend supported by `CfreeTarget`.
-
-## 1. Goals
-
-- Preserve the direct-emission model: no public module/value/block IR object is
- required.
-- Focus on backend codegen coverage and correctness, not frontend ergonomics.
-- Keep backend decisions in the backend: ABI classification, TLS sequences,
- GOT/PLT/stubs/IAT, branch relaxation, relocation encoding, and section layout.
-- Let frontends state facts that materially affect generated code: calling
- convention, ABI attributes, memory access properties, atomics, volatility,
- linkage, object placement, and source/debug identity.
-- Keep the surface portable but not lowest-common-denominator. Unsupported
- target combinations should be diagnosable from API calls.
-- Keep public handles opaque/integer-sized and context-owned. No global state.
-- Maintain one way to spell each codegen fact.
-
-## 2. Non-goals
-
-- A serialized IR, textual IR, pass manager, verifier over stored functions, or
- reusable use-def graph.
-- Language semantics above the codegen boundary. Borrow checking, comptime,
- monomorphization, generics, trait dispatch, overload resolution, destructor
- insertion, and safety checks belong in the frontend.
-- Arbitrary source-language types. The frontend lowers them to codegen storage,
- ABI, memory, and debug facts before calling CG.
-- Unwind/exception handling beyond the existing setjmp/longjmp intrinsics.
- Panic/throw paths must lower to explicit normal control flow plus calls, or
- to noreturn runtime helpers.
-- Full LTO. Direct CG may still feed the existing optimizer wrapper, but that is
- an implementation detail below this public API.
-
-## 3. Pre-Phase-1 Shape
-
-The pre-Phase-1 public CG API already provided useful pieces:
-
-- Target context through `CfreeCompiler` / `CfreeTarget`.
-- Builtin integer, float, pointer, array, function, record, enum, alias, and
- qualified types.
-- Symbol declarations, visibility, TLS model, object definitions, relocatable
- data expressions, and direct/indirect calls.
-- A value stack with lvalue/rvalue conversion, local/param slots, labels,
- structured scopes, arithmetic, comparisons, conversions, intrinsics, atomics,
- inline asm, and varargs.
-
-The largest limitation was that too many important backend facts were implicit,
-C-shaped, duplicated between type and operation APIs, or unrepresentable.
-
-## 4. Type Model
-
-The type model should describe codegen storage and ABI classification, not
-source-language semantics. A Rust `u32`, C `unsigned int`, Zig `u32`, and an
-emulator's 32-bit guest register can all use the same codegen integer type.
-
-### 4.1 Integers
-
-Use width-only integer storage types. Signedness belongs on operations,
-comparisons, conversions, and ABI extension attributes, not on the integer type.
-
-Recommended integer builtins:
-
-- `i1`/`bool` as the branch and compare-result type.
-- `i8`, `i16`, `i32`, `i64`.
-- `i128`, with helper-lowered arithmetic and ABI handling on targets that lack
- native support.
-- `isize`/`usize` are frontend aliases, not distinct codegen storage types. The
- frontend can choose `i32` or `i64` from the target pointer size.
-
-Consequences:
-
-- Remove separate signed/unsigned integer type constructors or builtins.
-- Keep signed/unsigned operation variants where semantics differ:
- signed/unsigned div/rem/compare, sign/zero extension, arithmetic right shift
- versus logical right shift.
-- Constants are bit patterns interpreted by the operation that consumes them.
-
-### 4.2 Floating-Point
-
-Support only the floating storage types the backend can define and lower
-correctly.
-
-Baseline:
-
-- `f32`
-- `f64`
-
-Later additions should be explicit project choices:
-
-- `f16` / `bf16` if frontend SIMD/platform intrinsics need them.
-- `f80` / `f128` only with target ABI and helper-call support.
-
-Floating arithmetic and comparisons still need operation-level attributes; see
-section 6.
-
-### 4.3 Pointers
-
-Keep pointer types as codegen storage/ABI facts. Pointee types are useful for
-load/store defaults and debug synthesis, but memory access semantics should
-come from `CfreeCgMemAccess`, not from type qualifiers.
-
-Recommended pointer model:
-
-- One thin pointer type constructor: pointee type + address space.
-- Address space 0 is the normal target data address space.
-- No type-level nullability, restrict, readonly, volatile, or mutability.
- Express these at the operation, declaration, or parameter-attribute site.
-- Fat pointers are frontend-lowered aggregates. Capability pointers should wait
- until a real target requires them.
-
-### 4.4 Aggregates and Layout
-
-Keep aggregate support only where the backend needs the aggregate shape for
-correct codegen:
-
-- ABI classification of parameters and returns.
-- Natural target layout for C-like records.
-- Data object sizing/alignment.
-- Debug synthesis when possible.
-
-Frontends can lower many patterns to existing codegen constructs.
-
-The gap to close is not richer source aggregate modeling. The useful backend
-primitive is generic address arithmetic, now part of the Phase 1 contract:
-
-```c
-/* Pops a pointer or lvalue address, pushes address + byte_offset as a pointer
- * or lvalue address with the requested result type. */
-void cfree_cg_addr_offset(CfreeCg*, int64_t byte_offset,
-                          CfreeCgTypeId result_type);
-```
-
-This gives frontends one way to lower non-C layouts without asking CG to
-understand the source aggregate. `cfree_cg_index` remains the typed
-scaled-index form for ordinary pointer/array indexing; `addr_offset` is the
-byte-granular escape hatch for frontend-owned record layouts and packed/custom
-field offsets.
-
-### 4.5 Qualifiers
-
-Remove C-style qualified codegen types as behavior-carrying types.
-
-- `const` is a frontend type-checking fact or an object/read-only declaration
- fact.
-- `volatile` is a memory access fact.
-- `restrict` / `noalias` is a pointer parameter or memory access fact.
-
-If debug info needs source qualifiers, they belong in debug metadata derived
-from declarations, not in backend codegen types.
-
-### 4.6 Type Queries
-
-Keep target-layout queries that frontends need for lowering:
-
-- Type kind.
-- Size and alignment.
-- Integer/float width.
-- Pointer address space and pointee.
-- Array element/count.
-- Record field offset where CG owns natural record layout.
-- Function ABI/calling-convention attributes.
-
-Avoid queries whose only purpose is reconstructing source-language types.
-
-## 5. Memory Access
-
-Memory semantics should have exactly one spelling: a memory access descriptor
-on every operation that touches memory. Do not split behavior between type
-qualifiers, lvalue flags, and special load/store variants.
-
-Recommended descriptor:
-
-```c
-typedef struct CfreeCgMemAccess {
-  CfreeCgTypeId type;     /* value type loaded/stored, or element type */
-  uint32_t align;         /* 0 = natural for type */
-  uint32_t address_space; /* normally inherited from pointer type */
-  uint32_t flags;         /* VOLATILE, NONTEMPORAL, INVARIANT, etc. */
-  uint32_t alias_scope;
-  uint32_t noalias_scope;
-} CfreeCgMemAccess;
-```
-
-Recommended operations:
-
-```c
-void cfree_cg_load(CfreeCg*, CfreeCgMemAccess access);
-void cfree_cg_store(CfreeCg*, CfreeCgMemAccess access);
-void cfree_cg_memcpy(CfreeCg*, uint64_t size,
-                     CfreeCgMemAccess dst, CfreeCgMemAccess src);
-void cfree_cg_memmove(CfreeCg*, uint64_t size,
-                      CfreeCgMemAccess dst, CfreeCgMemAccess src);
-void cfree_cg_memset(CfreeCg*, uint8_t value, uint64_t size,
-                     CfreeCgMemAccess dst);
-```
-
-Consequences:
-
-- Remove type-level volatile behavior.
-- Remove separate fixed-size aggregate memory APIs that take only size/align.
-- Remove implicit load/store type inference when it can be ambiguous. The
- access descriptor is the authority.
-- Keep convenience constructors for common descriptors if desired, but not
- alternate semantic entry points.
-
-Needed access facts:
-
-- Explicit alignment, including known under-alignment.
-- Volatile load/store.
-- Non-temporal/cache hints: streaming accesses unlikely to be reused soon, so
- targets may select non-temporal instructions or ignore the hint.
-- Invariant memory: contents known stable for the relevant program region
- except through this access path. This is stronger than readonly object
- placement and should be set only when the frontend can prove it.
-- Alias scopes and noalias scopes. Rust `&mut`, C `restrict`, Zig `noalias`,
- and frontend escape analysis can all feed this conservatively.
-
-## 6. Operation Semantics
-
-Integer and floating operations need attributes describing language semantics.
-
-### 6.1 Integer Ops
-
-Keep signedness on operations, not on types.
-
-Required operation families:
-
-- Add, sub, mul, bitwise and/or/xor.
-- Signed and unsigned div/rem.
-- Left shift, logical right shift, arithmetic right shift.
-- Signed and unsigned comparisons.
-- Sign extension, zero extension, truncation.
-- Pointer/integer casts where the target permits them.
-
-Add operation flags:
-
-- No signed wrap / no unsigned wrap.
-- Exact division/shift where applicable.
-- Explicit signed and unsigned trap-on-overflow. Generic "overflow" is not
- enough because integer types are width-only.
-- Explicit signed and unsigned saturating arithmetic if a frontend/runtime
- wants direct lowering.
-
-Checked arithmetic uses signed and unsigned intrinsics that return
-`(result, overflow_bool)`. That is a backend-relevant primitive and avoids
-forcing frontends to reproduce target flag idioms manually.
-
-`clz` and `ctz` have defined zero-input behavior: when the operand is zero,
-the result is the operand bit width.
-
-### 6.2 Floating Ops
-
-Add floating arithmetic; the current API can push floats and convert but cannot
-fully lower C, Zig, or Rust arithmetic.
-
-Required:
-
-- Floating add/sub/mul/div/rem/neg.
-- Ordered and unordered comparisons.
-- Conversion between floats and integers with explicit signedness and rounding
- behavior.
-- Fused multiply-add intrinsic or operation.
-
-Attributes:
-
-- Strict default semantics.
-- Optional fast-math flags: reassoc, no-NaNs, no-infs, no-signed-zeros, allow
- reciprocal, approximate functions.
-- Rounding mode and exception behavior only if strict FP support is a goal.
-
-### 6.3 Bitcasts
-
-`convert` should mean semantic conversion. Add a distinct bit-preserving
-operation:
-
-- Scalar bitcast.
-- Aggregate/vector bitcast only when size matches and the backend can lower it
- as a copy/reinterpretation.
-
-## 7. Control Flow and Stack Values
-
-Phase 1 contract:
-
-- `switch` / jump table primitive with target-chosen lowering.
-- Computed goto through first-class function-local label-address values plus an
- indirect local branch. This must support direct-threaded interpreters, where
- label addresses are stored in dispatch tables, indexed by opcode, loaded, and
- jumped through. Label-address data constants must be emitted while the
- defining function is open, after the label handles are created; labels need
- not be placed yet. Data emission is allowed inside an open function, so the
- intended direct-threaded lowering is: declare the dispatch-table symbol, begin
- the function, create labels, define the table contents as data while the
- function remains open, then resume code emission. The value is opaque and
- valid only for equality, storage/loading, table selection, and computed gotos
- in the label's defining function.
-- `unreachable` as a real terminator, not a side-effect intrinsic.
-
-Do not add landing pads, cleanup edges, or exception successors unless the
-project expands beyond setjmp/longjmp.
-
-## 8. Calls, ABI, and Function Attributes
-
-The function type currently carries return type, params, and an ABI variadic
-bit. That is not enough for multi-language direct codegen.
-
-Add:
-
-- Calling convention on function type or call site. The common path is
- backend-selected target C default; explicit SysV, Win64, AAPCS, wasm,
- interrupt, and target-specific conventions are frontend requests for ABI
- interop and must be supported by the selected backend or diagnosed.
-- Per-function attributes: noreturn, cold, hot, naked, interrupt, stack
- alignment, red-zone use, target features.
-- Per-call attributes: tail policy, musttail, notail, cold.
-- Per-parameter and return attributes: sret, byval, byref, inreg, noalias,
- readonly, writeonly, nonnull, dereferenceable, signext, zeroext, align,
- nest/context pointer.
-
-Avoid exception-related attributes such as `nounwind` unless they affect a
-supported backend output. With no unwind model, calls either return normally or
-do not return.
-
-`musttail` is important for languages that depend on tail calls or lower
-coroutines/state machines through helper functions. It should fail
-diagnostically if ABI shapes are incompatible.
-
-## 9. Symbols, Linkage, and Names
-
-The declaration API should not force C symbol mangling. C mangling is one
-frontend policy, not a universal codegen rule.
-
-Use one name model:
-
-- Linkage name: exact linker-visible spelling after the frontend has applied
- its language mangling and any desired object-format C decoration.
-- Optional display/source name for debug info.
-
-Do not keep a separate "C source name" declaration path in the core CG API. If
-the C frontend wants C decoration, it should call a helper before declaring the
-symbol or use a C-frontend wrapper.
-
-Add:
-
-- COMDAT/linkonce/select-any groups.
-- Weak/weak-odr where object formats support it.
-- Section and partition attributes on functions and data.
-- Constructor/destructor arrays with priority.
-- Symbol versioning hooks later for ELF shared libraries.
-
-## 10. Data Definitions and Constants
-
-Keep data emission close to object bytes and relocations. That matches the
-direct-codegen model and avoids a parallel constant IR.
-
-Needed additions:
-
-- Typed null pointer constants.
-- Zero initializer and arbitrary bytes.
-- Function/data address constants with pointer address space.
-- Function-local label-address constants for direct-threaded dispatch tables.
- These are emitted while the defining function is open; ordinary data
- definitions may be interleaved with function emission for block-scope statics
- and dispatch tables.
-- Enum constants are unsigned bit patterns (`uint64_t`) interpreted by the
- enum's width-only integer base type; source signedness is not part of the
- codegen enum type.
-- Relocation expressions already exist; keep target-selected lowering as the
- default. Add explicit policy only when the target needs a frontend-visible
- distinction.
-- Per-object COMDAT, alignment, section, retention, merge/string flags, and TLS
- model.
-
-Do not add structured aggregate constants unless they are needed to avoid
-incorrect backend output. Frontends can lay out aggregate initializers into
-bytes plus relocations.
-
-## 11. Atomics and Memory Model
-
-The current atomics have C-like memory orders. Multi-language support needs a
-few more backend-relevant details:
-
-- Atomic width legality query.
-- Strong versus weak compare-exchange.
-- Memory scope if a supported target exposes scopes beyond system-wide atomics.
-- Volatile atomic distinction for languages that expose both.
-- Fence sync scope if scopes are supported.
-
-Do not add wait/wake or futex-like primitives to core CG. They should remain
-library/runtime calls unless a backend can lower them specially.
-
-Atomic operations should also use `CfreeCgMemAccess` so type, address space,
-alignment, volatility, and alias information have the same spelling as ordinary
-memory operations.
-
-## 12. Inline Assembly
-
-The target constraint string is the operand contract. This is intentionally raw
-because C/Zig-level inline asm needs the full target grammar: register classes,
-explicit registers, immediate classes/ranges, memory/address constraints,
-alternatives, matching/tied operands, earlyclobber, and target-specific
-modifiers. A partial structured vocabulary would be less expressive and would
-create a second spelling for facts the backend already parses from
-constraints.
-
-Phase 1 contract:
-
-- Options: pure, nomem, readonly, preserves_flags, nostack, noreturn.
-- Clobber ABI sets such as "clobber all caller-saved".
-
-Later additions:
-
-- Target feature requirements and target arch guard.
-
-Phase 1 keeps template strings and raw target constraints, wrapped in
-`CfreeCgInlineAsm` so asm-wide options and operand arrays have a single
-descriptor.
-
-## 13. Dynamic Stack Allocation
-
-Rust and Zig generally avoid C VLAs but still need stack temporaries, alignment,
-and sometimes alloca-like lowering.
-
-Phase 1 contract:
-
-- Local slot allocation with explicit alignment and debug/address-taken flags.
-- Parameter slot allocation with the same debug/artificial/temp flags.
-- Dynamic `alloca(size, align)` returning a pointer.
-
-Later addition:
-
-- Stack probing for large frames as a target-selected behavior, with an option
- to require it where platform ABI demands it.
-
-## 14. Debug Information
-
-Debug info should ride alongside ordinary CG usage as much as possible. The
-default path should not require frontends to make a second set of debug-specific
-calls for every function, parameter, local, and type.
-
-Auto-populate debug records from existing CG calls:
-
-- `cfree_cg_decl` carries linkage name, display/source name, declaration attrs,
- type, and current source location. This is enough to create function/global
- DIE skeletons.
-- `cfree_cg_func_begin` / `func_end` define function ranges.
-- `cfree_cg_param_slot` carries parameter index, type, name, and current source
- location. This can create parameter DIEs and initial locations.
-- `cfree_cg_local_slot` carries local type, name, alignment, flags, and current
- source location. This can create local variable DIEs when the name is nonzero.
-- `cfree_cg_set_loc` drives line table rows for subsequent instructions and
- data definitions.
-- Type constructors carry enough layout information for basic debug type DIEs:
- scalars, pointers, arrays, functions, and natural-layout records.
-
-The regular API needs a few debug-oriented fields so this works:
-
-- Source/display name separate from linkage name.
-- Compile-unit language tag and producer string.
-- Public file registration or a documented way for frontends to obtain stable
- `CfreeSrcLoc.file_id` values.
-- Local/param flags: artificial, address-taken, optimized-out, compiler-temp.
-- Optional lexical-scope markers for frontends that want nested scopes. These
- can be ordinary CG control-flow scope calls with debug names/flags rather
- than a separate debug API family.
-
-Limits of auto-population:
-
-- Inlined call-site info needs explicit frontend input because ordinary CG
- locations only describe the current emitted instruction.
-- Optimized variable locations beyond frame slots/registers may need later
- hooks from the optimizer wrapper.
-- Source-language-specific debug types may need optional metadata. That metadata
- should decorate normal CG types/declarations rather than replacing them with a
- separate debug-only API.
-
-## 15. Target Capability Queries
-
-A portable direct CG frontend needs to ask what the selected target can lower
-without guessing from enum values.
-
-Add queries for:
-
-- Legal scalar widths and floating types.
-- Legal atomic widths and lock-free status.
-- Supported calling conventions.
-- Supported inline asm constraint families.
-- Object-format features: COMDAT, weak, protected visibility, TLS models,
- common symbols, merge sections, constructor priorities.
-- Backend feature flags: SIMD extensions, unaligned memory support, strict
- alignment, red zone, pointer authentication, branch protection.
-
-Capability queries should answer "can this target/API lower it correctly", not
-"is this fast".
-
-## 16. Diagnostics and Error Model
-
-Most current CG misuse paths panic. That is acceptable for internal compiler
-bugs, but external frontends benefit from diagnosable unsupported-feature
-failures.
-
-Use this distinction:
-
-- Malformed CG usage that indicates a frontend/compiler bug may panic.
-- Unsupported but well-formed target features should emit diagnostics and fail
- cleanly.
-- Type/call/memory descriptors should be validated early enough that bad input
- does not produce partial object corruption.
-
-## 17. Frontend Registration
-
-The current `CfreeLanguage` enum is fixed. That is enough for built-ins and the
-toy frontend, but not for general external language plugins.
-
-Add:
-
-- Dynamic language registration by name, default suffixes, and compile callback.
-- Per-language option payload passed through `CfreeCompileOptions`, or a generic
- frontend user pointer.
-- A standard way for a frontend to declare whether it needs preprocessing,
- debug info, or target feature strings.
-
-Because this API can break, the fixed enum can be removed from the generic
-frontend path. Builtin C/asm can still have fast internal dispatch.
-
-## 18. Suggested Phasing
-
-### Phase 1: One Clean Codegen Contract
-
-Status: public contract defined in `include/cfree/cg.h`. Implementation and
-call-site migration are intentionally separate work.
-
-Phase 1 makes these breaking API choices:
-
-- Builtin integer types are width-only: `bool`/`i1`, `i8`, `i16`, `i32`, `i64`,
- and `i128`. Signedness exists only on integer operations, comparisons,
- conversions, and ABI extension attributes.
-- Behavior-carrying qualified types are removed. `const` is an object/debug
- fact, `volatile` is a memory-access fact, and `restrict`/`noalias` is an ABI
- or memory-access fact.
-- Pointer types carry pointee type plus address space. Address space 0 is the
- normal target data address space.
-- Generic byte-address offset is included for frontend-owned aggregate layouts.
-- Function types are built from `CfreeCgFuncSig`: return type/attrs,
- parameter type/attrs, calling convention, and ABI variadic bit.
-- Declarations use exact raw linkage names plus optional display/source names.
- CG does not apply C symbol spelling policy.
-- `CfreeCgMemAccess` is the only way to spell memory semantics for loads,
- stores, fixed-size memory ops, and atomics.
-- Integer operations are split from floating operations and accept explicit
- operation flags such as no-wrap, exact, signed/unsigned trap-on-overflow, and
- signed/unsigned saturation.
-- Semantic conversions are explicit: sign extension, zero extension,
- truncation, pointer/integer casts, float extension/truncation, float/integer
- conversions with rounding, and a distinct bitcast operation.
-- Floating arithmetic and ordered/unordered comparisons are first-class API
- operations, with strict defaults and optional fast-math flags.
-- Calls use `CfreeCgCallAttrs` for tail policy and call-site flags. `musttail`
- is represented as a contract the backend must accept or diagnose.
-- Intrinsics include the backend primitives assumed by
- `rt/include/cfree/{syscall,baremetal,coro}.h`.
-- Atomics take `CfreeCgMemAccess`, include strong/weak compare-exchange, and
- expose legality and lock-free capability queries.
-- Target capability queries cover scalar type support, calling conventions, and
- object-format symbol features.
-- Inline assembly uses raw target constraints as the canonical operand contract.
-- Switch/jump-table, computed goto, and unreachable terminator are explicit
- control-flow operations.
-- Dynamic alloca and local/parameter slot attributes are explicit stack-slot
- operations.
-- Inline assembly includes ABI clobber sets.
-- Backend feature flags are queryable.
-- Data address constants carry pointer address space.
-
-### Phase 2: Backend and Object Coverage Gaps
-
-- COMDAT/groups and constructor/destructor arrays.
-- Stack probe requirement/request for large frames.
-- More complete inline asm target-feature guards.
-
-### Phase 3: Debug and Frontend Integration
-
-- Complete auto debug emission from declarations, function ranges, locations,
- params, locals, and type constructors.
-- Compile-unit language/source registration.
-- Optional lexical-scope markers through ordinary CG scopes.
-- Dynamic frontend registration.
-
-## 19. Design Rule
-
-When deciding whether a feature belongs in public CG, use this test:
-
-- If the fact changes ABI, object contents, relocation choice, instruction
- selection, memory ordering, or debug output, CG probably needs to express it.
-- If the fact is source-language-only and can be fully lowered into existing
- storage, calls, memory accesses, and operations, it belongs in the frontend.
-- If the fact exists only to make frontend modeling easier, keep it out unless
- omitting it causes incorrect backend output.
-- If the fact requires whole-function analysis but does not need to be visible
- to direct backends, it may belong in the optimizer wrapper rather than the
- public direct-emission API.
-
-The goal is not to expose every internal compiler concept. The goal is to make
-the direct codegen boundary honest enough that C, Zig, Rust-like languages, and
-machine lifters can all lower to it without depending on internal headers or
-silently losing backend-relevant semantics.
diff --git a/doc/cg-neutral-backend-plan.md b/doc/cg-neutral-backend-plan.md
@@ -1,286 +0,0 @@
-# Neutral CG Backend Migration Plan
-
-This document plans the migration from the existing C-shaped codegen path to a
-neutral CG layer based on the public API in `include/cfree/cg.h`. It also
-consolidates the lower-layer gap inventory exposed while updating
-`src/api/cg.c` for that API.
-
-The central goal is that the C frontend becomes one client of a neutral codegen
-interface. C `Type*` should stop being the backend type currency; it should be
-translated at the frontend boundary into neutral CG type descriptors. Backends,
-ABI classification, and target lowering should consume CG types and CG
-operation descriptors.
-
-## Principles
-
-- Reuse public CG semantic enums and flags when they name the exact internal
- concept: calling convention, TLS model, tail policy, memory order, rounding,
- ABI attribute flags, operation flags, asm flags, and similar values.
-- Do not pass public API structs directly into lower layers. Public structs use
- API handles, caller-owned arrays, and frontend-facing ownership rules. Lower
- layers should receive resolved internal descriptors with stable storage.
-- Move C `Type*` above CG. The C parser/type system may still use `Type*`, but
- it should lower C declarations, expressions, and layout requests into neutral
- CG types before reaching ABI or `CGTarget`.
-- Keep `ObjBuilder` mostly type-agnostic. It should model object-format facts:
- symbols, sections, groups, relocations, data expressions, TLS model, sizes,
- alignments, display names, and format-specific extensions. It should not
- become a typed IR layer.
-- Make unsupported behavior explicit. If a public CG feature cannot be lowered
- or represented, the target/object layer should answer false through a
- capability query or emit a diagnostic. Metadata should not be silently
- ignored unless the API defines it as a hint.
-
-## Gap Coverage
-
-The public CG API already describes more semantics than the current lower
-layers can represent. The migration plan below addresses these gaps by moving
-metadata into neutral CG descriptors, object descriptors, or explicit target
-capabilities.
-
-`CGTarget` and ABI gaps to close:
-
-- Non-default calling conventions are recorded by the public API but not
- carried into ABI classification or lowering.
-- ABI attributes are not consumed by call, return, or parameter lowering:
- signext, zeroext, sret, byval, byref, inreg, noalias, readonly, writeonly,
- nonnull, nest, explicit alignment, and dereferenceable size.
-- Function attributes are incomplete below the API: stack alignment, custom
- sections, target feature strings, cold/hot hints, naked functions, interrupt
- functions, no-red-zone requests, ifunc, and full noreturn handling.
-- Per-symbol TLS model selection does not reach target lowering.
-- Pointer address spaces are only partially represented and do not have full
- target semantics.
-- Memory access metadata loses nontemporal, invariant, alias scope, and noalias
- scope information.
-- Computed goto, label-address values, and indirect branch over a validated
- target set are unsupported.
-- Switch lowering has no target hook and currently ignores jump-table hints.
-- Integer operation flags are ignored: nsw, nuw, exact, trapping overflow, and
- saturating arithmetic.
-- Floating-point semantics are incomplete: FP remainder, fast-math flags, and
- ordered-vs-unordered comparisons are not preserved.
-- Conversion rounding modes are ignored.
-- The internal intrinsic set is narrower than the public API, including FMA,
- syscall, IRQ operations, barriers, cache maintenance, CPU wait/event ops,
- coroutine switch, and signed-vs-unsigned overflow intrinsics.
-- Atomic legality and lock-free queries are approximated from size instead of
- target hooks; weak compare-exchange is accepted but not represented.
-- Inline asm loses flags and ABI clobber sets.
-- Call attributes are incomplete: musttail compatibility is not validated and
- cold-call hints are ignored.
-
-`ObjBuilder` gaps to close:
-
-- Source/display names are not represented for symbols.
-- DLL import/export and constructor priority are not semantic object features.
-- Data label addresses have no object-level expression path.
-- Data relocation address spaces are ignored.
-- Symbol-difference expressions rely on available relocation kinds rather than
- a format-neutral expression contract.
-- Section merge/string entry size is not fully wired through data definitions.
-- Common, weak, protected visibility, and COMDAT are only partially modeled as
- an explicit object-level contract.
-
-## Type Direction
-
-Introduce an internal neutral CG type model as the canonical backend type
-language. The public `CfreeCgTypeId` can be an API handle into this model, while
-internal code may use either stable `CGTypeId` handles or `const CGType*`
-references after validation.
-
-Surfaces that currently carry `Type*` and should move to neutral CG types
-include:
-
-- `Operand.type`
-- `MemAccess.type`
-- `ConstBytes.type`
-- `FrameSlotDesc.type`
-- `CGParamDesc.type`
-- `CGABIValue.type`
-- `CGFuncDesc.fn_type`
-- `CGCallDesc.fn_type`
-- `AsmConstraint.type`
-- ABI record layout and function classification inputs
-
-The C frontend should own the `Type* -> CGTypeId` adapter. Public CG API users
-already construct neutral CG types directly, so they should not round-trip
-through C types.
-
-## Internal Descriptor Shape
-
-Internal descriptors should be isomorphic to the public CG API where that is
-useful, but resolved into backend-owned terms.
-
-For example, public input:
-
-```c
-CfreeCgFuncSig
-```
-
-should normalize into an internal descriptor shaped like:
-
-```c
-typedef struct CGAbiAttrs {
- uint32_t flags;
- uint32_t align;
- uint64_t dereferenceable_size;
-} CGAbiAttrs;
-
-typedef struct CGParam {
- CGTypeId type;
- CGAbiAttrs attrs;
-} CGParam;
-
-typedef struct CGFuncSig {
- CGTypeId ret;
- CGAbiAttrs ret_attrs;
- const CGParam* params;
- uint32_t nparams;
- CfreeCgCallConv call_conv;
- int abi_variadic;
-} CGFuncSig;
-```
-
-`TargetABI` should classify `CGFuncSig`, not a C function `Type*`. Parser paths
-that still start with C `Type*` should synthesize a `CGFuncSig` during lowering.
-
-## Phasing
-
-### 1. Introduce Neutral CG Core Types
-
-Add the internal CG type table and descriptor APIs first, while keeping the old
-codegen path working. This phase should define:
-
-- `CGTypeId` / `CGType` and constructors for builtin, pointer, array, function,
- record, enum, and alias types.
-- type layout/query hooks backed by `TargetABI`.
-- `CGFuncSig`, `CGParam`, `CGAbiAttrs`, and neutral memory/access descriptors.
-- a C frontend adapter from `Type*` to `CGTypeId`.
-
-This gives both the public CG API and the C frontend a shared neutral model
-instead of treating `include/cfree/cg.h` as a facade over C-shaped internals.
-
-### 2. Move the C Frontend to the New CG Layer
-
-Make the C parser/frontend emit through the new CG API/layer. The old internal
-CG path should no longer be a privileged backend path for C.
-
-This is the main semantic forcing function. It should prove that the neutral
-type model can express normal C codegen, ABI calls, locals, lvalues, aggregates,
-initializers, debug-facing names, and target-specific lowering requests.
-
-Prefer targeted red-green coverage during this phase:
-
-- function calls and returns for scalar, aggregate, variadic, and sret cases.
-- object definitions, tentative definitions, TLS, readonly data, and custom
- sections.
-- control flow, switches, computed goto once supported, and inline asm.
-- atomics and memory access descriptors.
-
-### 3. Keep the Old CG Layer Temporarily
-
-Do not delete `src/cg` immediately after the frontend starts targeting neutral
-CG. Keep it as an adapter, comparison point, or dead-but-buildable path until
-the new route is proven by the focused test corpus.
-
-The deletion point should be mechanical: no production path and no useful test
-harness should depend on the old layer. Any parity tests worth keeping should
-move to the new API before deletion.
-
-### 4. Update ObjBuilder to Object Descriptors
-
-Update `ObjBuilder` before broad `CGTarget` surgery where the new CG API already
-needs stronger object semantics.
-
-`ObjBuilder` should grow descriptor-based write APIs for:
-
-- symbols with linkage name, display name, bind, visibility, kind, used,
- import/export flags, COMDAT/group membership, common definition, constructor
- priority, and per-symbol TLS model.
-- sections with kind, semantic type, flags, alignment, entry size, group, link,
- info, and format extension fields.
-- data expressions for absolute symbol addresses, PC-relative symbol
- references, symbol differences, and label-address values.
-
-Label addresses should ideally lower to normal local symbols. `CGTarget` or
-`MCEmitter` can create a local notype symbol for an addressable block label; data
-tables then use normal symbol relocations instead of a special data-label path.
-
-This phase should keep `ObjBuilder` independent of full CG type semantics. It
-needs sizes and alignments at definition time, not a general type graph.
-
-### 5. Update ABI and CGTarget to Consume CG Types
-
-Once the frontend and object layer are speaking the neutral model, update ABI
-classification and `CGTarget` signatures to consume CG descriptors directly.
-
-Important changes:
-
-- Replace `abi_func_info(TargetABI*, const Type*)` with classification keyed by
- `CGFuncSig`.
-- Preserve ABI attributes in `ABIFuncInfo` / `ABIArgInfo`: signext, zeroext,
- sret, byval, byref, inreg, noalias, readonly, writeonly, nonnull, nest,
- explicit alignment, and dereferenceable size.
-- Extend `CGFuncDesc` for complete function attrs: stack alignment, section,
- target feature strings, cold/hot, naked, interrupt, no-red-zone, ifunc, and
- noreturn.
-- Extend `CGCallDesc` for tail policy, musttail validation, cold call hints,
- direct/indirect callee details, and full ABI signature metadata.
-- Replace simple op hooks with descriptors preserving integer flags, FP flags,
- ordered/unordered FP comparisons, FP remainder, and conversion rounding.
-- Preserve full memory metadata: address space, volatile, nontemporal,
- invariant, alias scope, noalias scope, and atomic flag/order.
-- Add target hooks or descriptors for switches, label addresses, indirect
- branches, atomics legality/lock-free queries, weak compare-exchange, expanded
- intrinsics, and inline asm flags/ABI clobber sets.
-
-`opt_cgtarget` and IR replay should mirror the new `CGTarget` surface rather
-than reconstructing lost metadata.
-
-### 6. Delete the Old CG Layer
-
-Delete the old CG layer only after:
-
-- the C frontend emits through neutral CG.
-- public CG API tests pass through the same path.
-- `ObjBuilder`, `TargetABI`, `CGTarget`, and `opt_cgtarget` consume neutral
- descriptors.
-- any useful parity tests have been moved.
-- no production driver or test harness depends on the old interfaces.
-
-At this point deletion should be mostly removing stale adapters and C-shaped
-plumbing, not making new semantic decisions.
-
-## Capability and Diagnostic Contract
-
-Capability queries should answer correctness, not performance. A target should
-return support only when it can preserve the requested semantics.
-
-Examples:
-
-- non-default calling conventions must be target-backed.
-- musttail requires ABI compatibility validation.
-- symbol feature queries should be backed by `ObjBuilder` and object-format
- support, not approximated in `src/api/cg.c`.
-- atomic legality and lock-free answers should come from target hooks.
-- strict conversion rounding, trapping overflow, saturating arithmetic, FP
- remainder, and runtime/bare-metal intrinsics should diagnose until supported.
-
-Hints such as non-temporal memory, branch/call hotness, and some fast-math flags
-may be ignored only when the public API explicitly permits that behavior.
-
-## Suggested Test Strategy
-
-Prefer narrow tests while the interfaces are changing:
-
-- `make test-cg` for neutral CG lowering and ABI behavior.
-- `make test-elf` for symbol attrs, sections, `entsize`, data expressions, and
- object round-trips.
-- `make test-link` for relocation behavior, visibility, TLS, COMDAT, and
- symdiff handling.
-- frontend subsets such as `make test-parse test-cg` when migrating C lowering.
-- specific arch smoke/codegen cases for features each target claims to support.
-
-Keep unsupported-feature tests explicit: they should assert diagnostics or false
-capability answers rather than relying on accidental backend behavior.
diff --git a/doc/cg-type-migration-plan.md b/doc/cg-type-migration-plan.md
@@ -1,157 +0,0 @@
-# Remove C `Type` From `src/`
-
-## Goal
-
-`src/` must be language-neutral. C semantic types stay in `lang/c`; generic
-codegen, ABI, arch lowering, optimizer, debug, object emission, and emu use
-`CfreeCgTypeId`, `CgType`, debug type IDs, or explicit storage facts.
-
-Completion means:
-
-```sh
-rg 'lang/c|type/type\.h|const Type\*|\bTypeKind\b|\bTY_' src include/abi
-```
-
-finds no generic `src` dependency on C semantic types. C-specific files under
-`lang/c` may still use `Type`.
-
-## Current Blockers
-
-These are the remaining dependency clusters to remove.
-
-1. **C compatibility shims in `src/`**
- - `src/type/type.h`
- - `src/decl/decl.h`
- - `src/decl/decl_attrs.h`
- - `src/lex/lex.h`
- - `src/pp/pp.h`
- - `src/parse/cg_public_compat.h`
- - `src/api/pipeline.c -> lang/c/c.h`
-
-2. **ABI still exposes C `Type*` bridge APIs**
- - `include/abi/abi.h` and `src/abi/abi.h` include `type/type.h`.
- - `abi_type_info`, `abi_sizeof`, `abi_alignof`, `abi_record_layout`, and
- `abi_func_info` still take `const Type*`.
- - `abi_size_type`, `abi_ptrdiff_type`, `abi_intptr_type`,
- `abi_uintptr_type`, and `abi_va_list_type` still manufacture C types.
- - `src/abi/abi.c` still has C bridge classification/layout code.
-
-3. **Public CG implementation still stores C `Type*` internally**
- - `src/api/cg.c` keeps `CgApiType.type`, `resolve_type`,
- `cg_api_type_import`, `cg_api_type_resolve`, stack value types, slot type
- tables, symbol type tables, function return types, and bridge helpers.
- - It builds legacy C `Type*` values when public CG type constructors are
- called.
-
-4. **`CGTarget` and arch lowering still use C type identity**
- - `src/arch/arch.h` forward-declares `Type` and uses `const Type*` in
- `FrameSlotDesc`, `MemAccess`, `ConstBytes`, `AggregateAccess`,
- `BitFieldAccess`, `Operand`, `CGABIValue`, `CGParamDesc`, `CGFuncDesc`,
- `CGCallDesc`, `CGScopeDesc`, `AsmConstraint`, `alloc_reg`, and
- `va_arg_`.
- - Arch internals include `type/type.h` and use helpers such as
- `type_is_64`, `type_is_fp_double`, `type_byte_size`, and
- `type_is_signed`.
-
-5. **Optimizer IR stores C `Type*`**
- - `src/opt/ir.h`, `src/opt/ir.c`, `src/opt/opt.c`,
- `src/opt/pass_lower.c`.
- - `Func.val_type`, instruction result types, frame slots, call metadata,
- and `IR_VA_ARG` aux data are still `const Type*`.
-
-6. **Generic debug has the C debug adapter in `src`**
- - `src/debug/c_debug.c` and `src/debug/c_debug.h` walk `Type*`.
- - Generic debug comments and APIs still refer to C `Type*` caches.
-
-7. **Emu stubs still synthesize C `Type*`**
- - `src/emu/emu.h` exposes `emu_cpu_type` and `emu_block_fn_type` as
- `const Type*`.
- - `src/emu/cpu.c` constructs CPU/block types through C type constructors.
-
-8. **Core pool still has a C type hook**
- - `src/core/pool.h` forward-declares `Type`.
- - `pool_type` exists only for the old C type interning shape and should move
- to `lang/c` or disappear.
-
-## Removal Order
-
-Do this in order; each step should keep `make lib`, `make bin`, and
-`make test-cg-api` green. Run parse/link tests when touching frontend or ABI
-behavior.
-
-1. **Make C lowering own the `Type* -> CfreeCgTypeId` cache**
- - Add a cache field or map in `lang/c`.
- - Ensure all C parser/codegen adapters call public CG constructors once per
- C type.
- - Add public CG record forward/begin/complete support before removing the
- recursive-record placeholder bridge.
-
-2. **Finish `src/api/cg.c` migration**
- - Replace all stored `const Type*` with `CfreeCgTypeId` or `CgType` facts.
- - Remove legacy C type construction from public CG constructors.
- - Keep any unavoidable bridge in tiny, named functions until step 8.
-
-3. **Make ABI purely CG-typed**
- - Make the `abi_cg_*` APIs (renamed or replaced as needed) the only ABI
- layout/classification APIs.
- - Delete C `Type*` ABI APIs and C bridge classification/layout code from
- `src/abi`.
- - Replace target library type helpers with CG type IDs or move C spellings
- of `size_t`, `ptrdiff_t`, `intptr_t`, `uintptr_t`, and `va_list` to
- `lang/c`.
- - Remove `type/type.h` from `include/abi/abi.h` and `src/abi/abi.h`.
-
-4. **Make `CGTarget` language-neutral**
- - Change target-facing descriptors in `src/arch/arch.h` from `Type*` to
- `CfreeCgTypeId` or explicit facts: size, align, reg class, integer width,
- float width, pointer/address-space, signedness where operation-specific.
- - Replace arch helper reads of C types with CG helpers or operation flags.
- - Remove `type/type.h` includes from `src/arch/**`.
-
-5. **Move optimizer IR off C types**
- - Replace IR value/frame/instruction type fields with `CfreeCgTypeId` or
- compact derived facts.
- - Replace `IR_VA_ARG` `Type*` aux with a CG type handle.
- - Remove `type/type.h` from `src/opt/**`.
-
-6. **Move C debug lowering out of generic debug**
- - Move `src/debug/c_debug.*` to `lang/c/debug` or another C frontend adapter.
- - Generic debug should consume frontend-provided `DebugTypeId` values, not
- inspect C `Type`.
- - Remove C type cache language from generic `src/debug` docs/comments.
-
-7. **Update emu stubs**
- - Replace `emu_cpu_type` / `emu_block_fn_type` with CG type IDs or explicit
- layout records.
- - Build CPU state and block signatures through public CG constructors.
- - Remove `type/type.h` from `src/emu/**`.
-
-8. **Move pool/type interning ownership to `lang/c`**
- - Delete `pool_type` from `src/core/pool.*` or move the C-specific type
- interning helper under `lang/c/type`.
- - Remove the `Type` forward declaration from `src/core/pool.h`.
-
-9. **Delete compatibility shims and register C like Toy**
- - Delete `src/type`, `src/decl`, `src/lex`, `src/pp`, and
- `src/parse/cg_public_compat.h` once no `src` file includes them.
- - Remove `src/api/pipeline.c`'s direct `lang/c/c.h` include and hardcoded C
- branch.
- - Register C through the frontend mechanism used by Toy.
-
-## Do Not Regress
-
-- Do not put C-only facts into `CgType`.
-- Signedness should live on operations, comparisons, conversions, ABI attrs, or
- explicit lowering metadata, not storage type identity.
-- Object emission must remain byte/section/symbol/reloc based.
-- Keep frontend-specific debug/type lowering outside generic `src`.
-
-## Useful Checks
-
-```sh
-make lib
-make bin
-make test-cg-api
-rg 'lang/c|type/type\.h|const Type\*|\bTypeKind\b|\bTY_' src include/abi
-rg 'cg_api_type_import|cg_api_type_resolve|cfree_cg_internal_.*type' src
-```