kit

git clone https://git.ryansepassi.com/git/kit.git

commit 35868a8761e3923ce2a866dc5bd07d79a44996d2
parent 571865addf5f159357059b659a46110f6388010d
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat, 23 May 2026 10:06:01 -0700

doc: prune stale checklists and plans

Drop superseded checklists (C11 conformance/long-double, RV64/X64
parity, RT, frontend, locals, stage2, tailcall, toy rewrite, opt regs
plan), the old api-migration / builtins / cg-* design notes, and the
BUGS scratchpad. Live docs (OPT_PERF, OPT design, CG API status doc as
needed) remain.

Diffstat:
D doc/BUGS.md                      |  56 --------------------------------------------------------
D doc/C11_CONFORMANCE_CHECKLIST.md | 297 -------------------------------------------------------------------------------
D doc/C11_LONG_DOUBLE_CHECKLIST.md | 110 -------------------------------------------------------------------------------
D doc/CTOOLCHAIN.md                | 296 -------------------------------------------------------------------------------
D doc/FRONTEND.md                  | 412 -------------------------------------------------------------------------------
D doc/LANGS.md                     | 479 -------------------------------------------------------------------------------
D doc/LOCALS.md                    | 127 -------------------------------------------------------------------------------
D doc/OPT_REGS_CALL_PLAN.md        | 592 -------------------------------------------------------------------------------
D doc/RT_CFREERT_CHECKLIST.md      | 113 -------------------------------------------------------------------------------
D doc/RV64_PARITY_CHECKLIST.md     | 252 -------------------------------------------------------------------------------
D doc/STAGE2.md                    | 272 -------------------------------------------------------------------------------
D doc/TAILCALL.md                  | 234 -------------------------------------------------------------------------------
D doc/TOY_REWRITE_TASKS.md         | 275 -------------------------------------------------------------------------------
D doc/X64_PARITY_CHECKLIST.md      | 389 -------------------------------------------------------------------------------
D doc/api-migration.md             | 304 -------------------------------------------------------------------------------
D doc/builtins.md                  | 385 -------------------------------------------------------------------------------
D doc/cg-api-status.md             | 104 -------------------------------------------------------------------------------
D doc/cg-ext.md                    | 618 -------------------------------------------------------------------------------
D doc/cg-neutral-backend-plan.md   | 286 -------------------------------------------------------------------------------
D doc/cg-type-migration-plan.md    | 157 -------------------------------------------------------------------------------
20 files changed, 0 insertions(+), 5758 deletions(-)

diff --git a/doc/BUGS.md b/doc/BUGS.md
@@ -1,56 +0,0 @@
-Known bugs with red test cases (test-parse)
-
-Format as:
-
-```
-- [ ] <feature description>: <test case name>
-```
-
-- [x] pointer subtraction yields ptrdiff_t (assignable to a wider integer without a cast): `6_5_6_01_ptr_diff_assign_to_long`
-- [x] file-scope array bound with a parenthesized integer constant expression: `6_7_6_18_file_scope_array_bound_paren`
-- [x] parenthesized declarator name (`int (foo)(int)`): `6_7_6_19_paren_declarator_name`
-- [x] function declarator with an inline function-pointer return type (no typedef): `6_7_6_20_func_returning_funcptr_no_typedef`
-- [x] static initializer accepts unary `-` on a floating constant: `6_7_9_30_static_init_neg_float`
-- [x] `#warning` preprocessing directive (non-fatal, parsing continues): `6_10_warning_directive`
-- [x] static initializer accepts a binary constant expression on floating constants (`1.0f/2.2f`): `6_7_9_31_static_init_const_float_expr`
-- [x] conditional operator allows a comma expression in its middle operand (`a ? b, c : d`): `6_5_15_01_conditional_comma_in_middle`
-- [x] subscript accepts a conditional whose constant arm is `0` without treating it as a null pointer: `6_5_2_1_01_subscript_conditional_zero_branch`
-- [x] struct field declarator `RETTY (*(*name)(P))(IP)` (pointer-to-function-returning-function-pointer; sqlite VFS `xDlSym`): `6_7_6_21_field_ptr_to_func_returning_funcptr`
-
-Known bugs caught by other harnesses
-
-- [x] Mach-O `OutSec count drift` when `cfree cc` compiles a source and links it together with a precompiled `.o` input in one step: every `test/libc/cases/*.c` on the `darwin` cell of `test/libc/run.sh` (was 7/7 red, now 7/7 green). Root cause: in-memory ObjBuilders use ELF-style section names (`.text`, `.rodata`) and `.o` inputs use Mach-O comma-form (`__TEXT,__text`); both map to the same Mach-O `(segname, sectname)` in `pick_macho_names` but `link_layout` groups them by raw name, so same-mapped MSecs got interleaved with sections of a different mapped name. Phase B's adjacency-based OutSec coalescing then split the run, mismatching Phase A's distinct-name count. Fixed in `src/link/link_macho.c` by sorting MSecs by `(segname, sectname)` within each segment before vaddr placement.
-
-- [x] clang-emitted Mach-O `.o` rejected by `cfree ld` reader (`read_macho: non-extern reloc not supported`). Root cause: clang emits section-relative relocations (`r_extern == 0`) in `__LD,__compact_unwind` (and DWARF/EH sections); cfree's IR only modelled symbol-relative relocs. Fixed in `src/obj/macho_read.c` by lazily synthesizing one `.Lcfree.macho_secstart.<idx>` local symbol per referenced section and re-expressing the reloc as `target = sec_start_sym, addend = inplace_value - section.addr_in_obj`. The linker then resolves it to `target.vaddr + addend`, matching the original referent. Verified by linking `xcrun clang -c hello.c -o hello.o` output through `cfree ld -lSystem` and running.
-
-- [ ] Mach-O `link_macho: coalesce mismatch on __TEXT,__text (flags/zerofill)` when linking certain cfree-emitted relocatable objects. Reproduces with cfree-compiled `tmp/projects/stb_sprintf.h` (driver `tmp/refresh/use_stb_sprintf.c`) and `tmp/projects/cJSON/cJSON.c`; the trivial `int main(){return 0;}` + hosted shim still links, and `tmp/refresh/use_jsmn.c` links and runs end-to-end. The differentiator looks like a section-flag fan-out where an `__TEXT,__text` MSec gets emitted next to a `zerofill`-flagged MSec under the same `(segname, sectname)`, which trips the Phase A/B mismatch check in `src/link/link_macho.c`. No reduction yet:
-
-  ```sh
-  SDK="$(xcrun --show-sdk-path)"
-  # Compile is fine:
-  build/cfree cc -target aarch64-darwin --sysroot="$SDK" -isystem rt/include \
-    -c tmp/refresh/use_stb_sprintf.c -o /tmp/stb.o
-  # Link fails:
-  build/cfree cc -target aarch64-darwin --sysroot="$SDK" -e main \
-    -o /tmp/stb.exe /tmp/stb.o -lc
-  # → fatal: link_macho: coalesce mismatch on __TEXT,__text (flags/zerofill)
-  ```
-
-- [ ] aarch64 call lowering rejects "INDIRECT arg storage kind 3". Reproduces compiling `cJSON_Utils.c:845`, which passes a sized aggregate by value to a function. The AAPCS64 classifier picks INDIRECT but the call emitter has no path for the source-storage shape it sees there. No minimal repro yet:
-
-  ```sh
-  SDK="$(xcrun --show-sdk-path)"
-  build/cfree cc -target aarch64-darwin --sysroot="$SDK" -isystem rt/include \
-    -c tmp/projects/cJSON/cJSON_Utils.c -o /tmp/u.o
-  # → fatal: aarch64 call: INDIRECT arg storage kind 3 unsupported
-  ```
-
-- [ ] silent SIGSEGV with no diagnostic when compiling much of lua-5.4.7. After B3 was fixed, 18 lua TUs now crash cfree (exit 139): `lapi, lcode, ldebug, ldo, ldump, lfunc, lgc, llex, lmem, lobject, lparser, lstate, lstring, ltable, ltm, luac, lundump, lvm, lzio`. The other 14 (`lauxlib, lbaselib, lcorolib, lctype, ldblib, linit, liolib, lmathlib, loadlib, lopcodes, loslib, ltablib, lua, lutf8lib`) compile cleanly. No minimal reduction yet, so no red test:
-
-  ```sh
-  SDK="$(xcrun --show-sdk-path)"
-  build/cfree cc -target aarch64-darwin \
-    --sysroot="$SDK" -isystem rt/include \
-    -c tmp/projects/lua/src/lparser.c -o /tmp/lparser.o
-  # → Segmentation fault: 11 (exit 139, no diagnostic)
-  ```
diff --git a/doc/C11_CONFORMANCE_CHECKLIST.md b/doc/C11_CONFORMANCE_CHECKLIST.md
@@ -1,297 +0,0 @@
-# C11 conformance checklist
-
-Status snapshot: 2026-05-19.
-
-Ground truth should be the implementation plus targeted tests, not README.md.
-Keep this checklist red-green: add or unskip the smallest case first, then
-make the implementation pass it.
-
-## Current signal
-
-- [x] `make test-lex` passes: 16/16.
-- [x] `make test-pp test-pp-err` passes: 83/83 and 15/15.
-- [x] `make test-parse-err` passes with expanded C11 constraint coverage:
-  currently 57/57 pass.
-- [ ] `make test-parse` passes without skips: currently 2680 pass, 0 fail,
-  2 skip. The remaining skip is `long double`.
-- [x] `make test-cg-api test-opt test-dwarf test-debug` passes.
-- [x] `make rt` builds the default runtime archives.
-- [x] `make test-rt-headers` passes for the default runtime targets:
-  AArch64/x86-64/RV64 Linux and AArch64/x86-64 Darwin.
-- [x] `make test-rt-runtime` passes for the default execution targets:
-  AArch64/x86-64/RV64 Linux.
-- [x] `make test-lib-deps` passes.
-
-## First conformance gate: required diagnostics
-
-Goal: keep `make test-parse-err` green. These C11 constraint diagnostics now
-have targeted negative coverage; broaden the checks as adjacent semantic rules
-are implemented.
-
-- [x] Reject `sizeof` on incomplete object types.
-  Test: `test/parse/cases_err/6_5_sizeof_incomplete.c`.
-  Code: `parse_expr.c` `sizeof` / `c_abi_sizeof` call sites.
-- [x] Reject invalid implicit assignment conversions, starting with pointer to
-  integer without an explicit cast.
-  Test: `test/parse/cases_err/6_5_type_mismatch.c`.
-  Code: `parse_assign_expr` in `parse_expr.c`.
-- [x] Reject bit-field widths wider than the declared bit-field type.
-  Test: `test/parse/cases_err/6_7_2_1_bitfield_too_wide.c`.
-  Code: `parse_member_decls` in `parse_type.c`.
-- [x] Reject multiple storage-class specifiers in one declaration.
-  Test: `test/parse/cases_err/6_7_2_storage_class_combo.c`.
-  Code: `parse_decl_specs`.
-- [x] Reject redefining a complete struct/union tag in the same scope.
-  Test: `test/parse/cases_err/6_7_2_two_struct_defs.c`.
-  Code: `parse_struct_or_union`; `complete` is set for newly defined tags,
-  not only previously forward-declared tags.
-- [x] Reject assignment to const-qualified lvalues.
-  Test: `test/parse/cases_err/6_7_3_const_assign.c`.
-  Code: declaration qualifiers are applied to the base type and checked in
-  `parse_assign_expr`.
-- [x] Reject duplicate file-scope object definitions with external/internal
-  linkage.
-  Test: `test/parse/cases_err/6_7_redefinition.c`.
-  Code: `parse_external_decl`, symbol `defined` state.
-- [x] Reject duplicate `case` values within one switch after integer constant
-  conversion.
-  Test: `test/parse/cases_err/6_8_duplicate_case.c`.
-  Code: `parse_case_stmt` / `SwitchCtx`.
-- [x] Reject duplicate function definitions while still allowing compatible
-  declarations before one definition.
-  Test: `test/parse/cases_err/6_9_redefinition_function.c`.
-  Code: `parse_external_decl`; `SEK_FUNC` symbols track a `defined` bit.
-- [x] Reject `void` mixed with other function parameters.
-  Test: `test/parse/cases_err/6_9_void_param_with_other.c`.
-  Code: `parse_param_list`.
-- [x] Reject non-power-of-two positive `aligned(N)` values.
-  Test: `test/parse/cases_err/attr_p2_aligned_not_pow2.c`.
-  Code: attribute argument parsing in `parse_type.c`.
-- [x] Reject the newly covered expression/type constraint failures now exposed
-  by `test/parse/cases_err/6_5_*`.
-  Covered cases: address of bit-field, cast struct to scalar, incompatible
-  conditional pointer arms, pointer-plus-pointer, incompatible pointer
-  relational compare, struct used as scalar condition, and `sizeof` on a
-  bit-field.
-- [x] Reject additional initializer/declarator/tag constraints:
-  non-constant static initializer, excess scalar initializer, invalid
-  array/struct/union designators, wrong-kind tag redeclaration, functions
-  returning array/function, and variadic marker not last.
-- [x] Reject non-integer bit-field types.
-  Test: `test/parse/cases_err/6_7_2_1_bitfield_bad_type.c`.
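Editorial sketch, not part of the deleted file: two of the constraints in the checklist above (`aligned(N)` must be a power of two, and a bit-field width must not exceed its declared type) reduce to small integer predicates. The helpers below are hypothetical names chosen for illustration; they are not cfree functions, and the real checks live in `parse_type.c` according to the checklist. The bit-field predicate assumes a type with no padding bits.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true when n is a positive power of two.
 * A power of two has exactly one bit set, so clearing the lowest
 * set bit with n & (n - 1) must leave zero. */
bool aligned_arg_is_pow2(unsigned long long n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical helper: true when a bit-field of `width` bits fits in
 * a declared type occupying `type_size` bytes (assumes no padding
 * bits in the type). */
bool bitfield_width_fits(unsigned width, size_t type_size) {
    return width <= type_size * CHAR_BIT;
}
```

Under these assumptions, `aligned_arg_is_pow2(12)` is false and `bitfield_width_fits(33, sizeof(int))` is false on a target with 32-bit `int`, which are the shapes the `attr_p2_aligned_not_pow2` and `6_7_2_1_bitfield_too_wide` cases are named after.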
-
-Suggested cadence:
-
-```sh
-make test-parse-err > /tmp/cfree_parse_err.log 2>&1 || tail -n 80 /tmp/cfree_parse_err.log
-```
-
-## Positive parse skips and recently unskipped cases
-
-Goal: `make test-parse` is green with `CFREE_TEST_ALLOW_SKIP` unset.
-
-- [ ] Implement `long double` enough for parser/codegen/runtime tests.
-  Current skipped case: `test/parse/cases/6_7_2_12_long_double.c`.
-  Skip reason: binary128 literal/convert needs `rt/lib/fp_tf` wiring
-  through CG.
-- [x] Enable file-scope `asm`.
-  Covered case: `test/parse/cases/asm_02_file_scope.c`.
-  The parser decodes the file-scope string literal and submits it through
-  `cfree_cg_file_scope_asm`, which reuses the standalone asm parser over
-  the current object emitter.
-
-Focused run:
-
-```sh
-CFREE_TEST_FILTER=6_7_2_12_long_double make test-parse
-CFREE_TEST_FILTER=asm_02_file_scope make test-parse
-```
-
-## Type system and declarations
-
-- [x] Implement enough structural compatibility for redeclarations and
-  composite types beyond pointer identity.
-  Covered cases: `6_2_7_01_composite_array_size`,
-  `6_2_2_01_extern_in_block_inherits_internal`.
-  Code: `type_compatible`, `type_composite`, and
-  `c_sem_check_redeclaration`.
-- [x] Track declaration state for ordinary identifiers:
-  declaration, tentative definition, definition, function definition,
-  linkage, storage duration, and type compatibility.
-  Code: parser `SymEntry` state plus `DSTATE_*`.
-- [x] Add same-scope ordinary identifier redefinition checks while preserving
-  legal shadowing in nested block scopes.
-  Tests: `6_7_same_scope_redefinition`,
-  `6_9_duplicate_parameter`.
-- [x] Complete tag state handling for forward declarations, same-scope
-  completion, and wrong-kind redeclarations.
-  Negative coverage: `6_7_2_tag_wrong_kind`,
-  `6_7_2_enum_forward`, `6_7_2_enum_wrong_kind`, and
-  `6_7_2_enum_redefinition`.
-- [x] Validate function declarator constraints:
-  `void` parameter rules, variadic placement, function returning function,
-  function returning array, array/function parameter adjustment.
-  Negative coverage: invalid variadic placement, ellipsis without a
-  preceding parameter, function returning function, function returning
-  array, and array of function. Positive function parameter adjustment is
-  covered by `6_7_6_14_func_param_adjust`.
-- [x] Decide and document implementation-defined bit-field behavior:
-  plain `int` signedness, allowed extended bit-field types, allocation
-  order, straddling, and alignment.
-- [x] Add positive bit-field lowering cases from `test/parse/CORPUS.md`,
-  including zero-width bit-fields.
-  Positive bit-field, signed bit-field, zero-width, and `_Bool` bit-field
-  cases pass; `float` bit-field rejection now passes. Current policy:
-  plain `int` bit-fields are signed, integer types accepted by the
-  frontend are accepted as bit-field base types, allocation proceeds from
-  low to high bit offsets within little-endian storage units, zero-width
-  fields force a fresh storage unit aligned as their declared type, and
-  fields do not straddle storage units.
-
-## Expressions and conversions
-
-- [x] Make implicit conversions constraint-aware. Do not rely on CG conversion
-  success as the semantic check.
-  Covered for assignment, compound assignment, initialization, return,
-  calls, and redeclaration diagnostics via `lang/c/sem`.
-- [x] Preserve lvalue properties: modifiable, const-qualified, bit-field,
-  array, function designator, and incomplete type.
-  The parser value stack tracks lvalue, modifiable-lvalue, bit-field, and
-  null-pointer-constant state across loads, conversions, member access,
-  dereference, and materialization.
-- [x] Implement `sizeof` rules completely:
-  no incomplete object type, no function type, no bit-field, VLA operand
-  evaluated, non-VLA operand not evaluated.
-  Coverage: `sizeof(function)` and `sizeof(bit-field)` are rejected,
-  VLA/deref pointer positive cases pass, and `6_5_59_sizeof_no_eval`
-  verifies non-VLA operands are not evaluated.
-- [x] Complete conditional operator usual-conversion behavior for arithmetic
-  and pointer/null arms.
-  Positive arithmetic and pointer/null cases pass; incompatible pointer
-  arms are rejected.
-- [x] Complete pointer compound assignment (`p += n`, `p -= n`).
-  Positive `p += n` and `p -= n` coverage passes; pointer RHS is rejected.
-- [x] Expand `_Generic` tests for default selection, compatible types, and
-  unevaluated controlling expression.
-  Default selection, compatible typedef matching, and duplicate-compatible
-  association diagnostics pass.
-- [x] Add negative tests for invalid pointer arithmetic, invalid relational
-  comparisons, invalid casts, modifying non-lvalues, and scalar-required
-  operators.
-  Pointer-plus-pointer, incompatible pointer relational compare,
-  struct-to-int cast, struct condition, array assignment, bad call
-  argument conversion, invalid pointer compound assignment, and floating
-  bitwise operands are rejected.
-
-## Constant expressions and initializers
-
-- [x] Replace the previous narrow `i64` integer evaluator with a typed,
-  target-width integer constant-expression evaluator.
-  Covered cases include integer literal type selection by suffix/base/value,
-  integer promotions, usual arithmetic conversions, logical operators,
-  conditional expressions, integer casts, immediate floating-constant casts,
-  and shift-count diagnostics.
-  Tests: `6_6_10_logical_cond_const`,
-  `6_6_11_unsigned_const_expr`, and
-  `6_6_shift_count_out_of_range`.
-  Code: `eval_const_int_typed`, `CConstInt`, and `eval_const_int` in
-  `parse_expr.c`.
-- [x] Accept `_Alignof` in integer constant expressions.
-  Positive array-bound coverage passes.
-- [x] Generalize constant-expression classification beyond integer ICE call
-  sites so arithmetic constants, address constants, null pointer constants,
-  and static initializer validation share one semantic evaluator.
-  Static scalar, pointer, address, null pointer, and bit-field
-  initializers now flow through one `CStaticConst` classifier layered on
-  the typed integer constant evaluator.
-- [x] Complete static initializer address constants:
-  object address, function address, array plus/minus integer constant,
-  and null pointer constants.
-  Positive object/function/array-plus-integer address constants pass, and
-  null pointer constants now include full integer constant expressions.
-- [x] Implement static-storage union initialization or document a temporary
-  nonconformance gate.
-  Positive non-first union designated initializer passes.
-- [x] Complete designated initializers:
-  nested designators, enum-valued array designators, duplicate designator
-  overwrite rules, non-first union member.
-  Positive nested, enum-valued, duplicate overwrite, and non-first union
-  coverage passes.
-- [x] Add diagnostics for initializer overflow, excess scalar initializers,
-  non-constant static initializers, and invalid designators.
-  New diagnostics for excess scalar initializers, non-constant static
-  initializers, invalid array/struct/union designators, and signed static
-  integer initializer overflow pass.
-
-## Preprocessor and translation phases
-
-- [x] Object/function-like macros, stringize, paste, rescan, conditionals,
-  includes, line control, unknown pragmas, and `#embed` have passing tests.
-- [ ] Audit remaining C11 translation-phase requirements:
-  universal character names, multibyte characters, trigraph policy,
-  diagnostics for invalid preprocessing tokens, and line-splice edge cases.
-- [ ] Add conformance tests for implementation-defined preprocessor behavior
-  documented in C11 Annex J.3.12.
-- [ ] Decide whether `#embed` is extension-only under strict C11 mode once a
-  strict mode exists.
-
-## Freestanding library surface
-
-C11 freestanding requires at least `<float.h>`, `<iso646.h>`, `<limits.h>`,
-`<stdalign.h>`, `<stdarg.h>`, `<stdbool.h>`, `<stddef.h>`, `<stdint.h>`, and
-`<stdnoreturn.h>`. This tree also ships `assert.h` and `stdatomic.h`.
-`setjmp.h` and `cfree/coro.h` are advertised freestanding extensions: they
-depend on target register context, not hosted OS services.
-
-Status: complete for the current freestanding C11 profile. Keep this gate
-green with `make rt`, `make test-rt-headers`, `make test-rt-runtime`, and
-`make test-lib-deps`.
-
-- [x] Add header compile smoke tests for every freestanding header across the
-  default runtime targets.
-  Test: `make test-rt-headers` / `test/smoke.c`.
-- [x] Add macro/value tests for `limits.h`, `stdint.h`, `stddef.h`, and
-  `float.h` against target ABI expectations.
-  Test: `make test-rt-headers` / `test/smoke.c`.
-- [x] Add `stdarg.h` runtime tests for AArch64, x86-64, and RV64.
-  Test: `make test-rt-runtime`.
-- [x] Get `stdatomic.h` tests passing against both parser builtins and
-  `libcfree_rt.a`.
-  Test: `make test-rt-runtime`.
-- [x] Fix `make rt` before treating atomics as conforming.
-- [x] Keep `setjmp.h` as an advertised freestanding extension; classify
-  `cfree/coro.h` the same way.
-
-## Strict mode and extensions
-
-Today the frontend accepts GNU extensions needed by the project. C11
-conformance needs a mode story.
-
-- [ ] Add a driver/frontend option for strict C11 diagnostics, or document that
-  the current mode is GNU-ish C11.
-- [ ] Classify extensions: `__int128`, `asm`, GNU attributes, statement
-  expressions if added, binary integer literals, `#embed`, and cfree
-  builtins.
-- [ ] In strict mode, diagnose extensions that can invalidate strictly
-  conforming programs.
-- [ ] Keep extension tests separate from strict C11 tests.
-
-## Suggested working order
-
-1. Keep `test-parse-err` green while broadening semantic diagnostics beyond
-   the first targeted cases.
-2. Add a compact "semantic type checks" helper layer so assignment, return,
-   initialization, conditional expressions, and calls share rules.
-   Helper coverage now includes assignment, compound assignment,
-   redeclaration, calls, and initializer/return use sites.
-3. Fix declaration-state tracking: redeclarations, tentative definitions,
-   function definitions, tag completion, and composite types.
-   Ordinary identifier redeclarations, tentative/defined state, composite
-   object/function types, and tag completion are covered.
-4. Keep bit-field layout/codegen covered while broadening target ABI tests.
-5. Keep the shared static-initializer/address-constant classifier as the only
-   static initializer constant path when adding new initializer forms.
-6. Unskip `long double` or explicitly narrow the supported C profile until
-   runtime/CG support exists.
-7. Keep the completed freestanding runtime/header gate green while expanding
-   target coverage.
diff --git a/doc/C11_LONG_DOUBLE_CHECKLIST.md b/doc/C11_LONG_DOUBLE_CHECKLIST.md
@@ -1,110 +0,0 @@
-# C11 `long double` support checklist
-
-Status snapshot: 2026-05-19.
-
-Goal: make `long double` target-correct instead of aliasing it to `double`.
-Keep this red-green: add the smallest target-scoped case first, then make the
-implementation pass it on the target that owns that format.
-
-## Target profiles
-
-- [x] AArch64 Linux: IEEE binary128 `long double`.
-  ABI: passed and returned in SIMD/FP `q` registers when register slots are
-  available. Arithmetic and conversions lower to compiler-rt `*tf*`
-  helpers.
-- [x] RV64 Linux LP64D: IEEE binary128 `long double`.
-  ABI: passed and returned as two integer XLEN eightbytes because FLEN is
-  64. Arithmetic and conversions lower to compiler-rt `*tf*` helpers.
-- [ ] AArch64 Darwin: `long double == double`.
-  Keep the current binary64 behavior and predefined macros for this OS.
-- [ ] x86-64 SysV/Darwin: x87 80-bit extended precision in 16-byte storage.
-  Defer as a separate backend slice; it needs x87 load/store/arithmetic,
-  x87 return handling, and `LDBL_*` macro updates. Do not block the
-  binary128 work on this.
-
-## Support target for the binary128 slice
-
-- [x] Complete the 16-byte scalar `__int128` path before treating binary128 as
-  green: layout, locals/globals, constants, arithmetic, shifts, compares,
-  calls/returns, aggregate fields, unions, and static initialization.
-- [x] Add a target long-double profile query used by both the frontend and CG:
-  format, storage size, alignment, macro values, and ABI classification.
-- [x] Add a distinct CG type for binary128 `long double`; `TY_LDOUBLE` must not
-  map to `F64` on AArch64/RV64 Linux.
-- [x] Emit target-correct `__LDBL_*` and `__DECIMAL_DIG__` predefined macros
-  for binary128 targets.
-- [x] Encode `L` floating constants as binary128 bytes without narrowing their
-  storage type to `double`.
-- [x] Support binary128 local/global storage, assignment, struct fields, and
-  return values.
-- [x] Lower binary128 arithmetic to runtime helpers:
-  `__addtf3`, `__subtf3`, `__multf3`, and `__divtf3`.
-- [x] Lower binary128 comparisons through compiler-rt compare helpers.
-- [x] Lower integer, float, and double conversions through compiler-rt helpers:
-  `__float*tf`, `__fix*tf*`, `__extend{s,d}ftf2`, and
-  `__trunctf{s,d}f2`.
-- [x] Teach AArch64 codegen to move 16-byte FP values through Q-register
-  load/store/copy paths.
-- [x] Teach RV64 ABI movement to pass/return binary128 values as two integer
-  parts, backed by memory in CG.
-- [x] Keep runtime linkage using the existing `rt/lib/fp_tf/fp_tf.c` and
-  `rt/lib/fp_ti/fp_ti.c` objects for the binary128 runtime variants.
-
-## Red tests
-
-The support-target tests live under `test/parse/cases/i128_*.c` and
-`test/parse/cases/ldbl128_*.c`. Run the `i128` group first; those cases isolate
-the 16-byte integer substrate needed by compiler-rt binary128 helpers and by
-the memory-backed long-double lowering.
-
-```sh
-CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=i128 CFREE_OPT_LEVELS=0 make test-parse
-CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=i128 CFREE_OPT_LEVELS=0 make test-parse
-CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=ldbl128 make test-parse
-CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=ldbl128 make test-parse
-```
-
-The `ldbl128` cases intentionally return success on non-binary128 targets so
-x87 work can land later without hiding the binary128 regression signal.
-
-Coverage intent:
-
-- `i128_01` through `i128_14`: target layout/alignment, literal storage,
-  add/sub carry, multiply high-half behavior, div/mod, shifts/bitwise
-  operations, signed and unsigned compares, signed shifts/conversions,
-  calls/returns, aggregate fields, union lane visibility, and global
-  initialization, arbitrary signed div/mod, and arbitrary signed/unsigned
-  multiplication.
-- `ldbl128_01` through `ldbl128_15`: target macros/layout, literal decoding,
-  arithmetic helpers, conversions, comparisons, calls/returns, struct and
-  array storage, raw binary128 bits, globals, unary negation, stack
-  arguments, mixed arithmetic, aggregate return, and arbitrary binary128
-  multiplication.
-
-Known remaining limits:
-
-- The binary128 support target is Linux AArch64/RV64. Darwin `long double`
-  target rules and x87 80-bit `long double` are still separate follow-up
-  targets.
-- Decimal `L` literal coverage currently exercises representable values and
-  raw canonical encodings; it does not yet prove full decimal-to-binary128
-  precision for non-representable literals.
-- ABI aggregate classification still covers the implemented scalar and simple
-  aggregate paths, not the full AArch64 HFA/HVA or every RV64 aggregate
-  flattening edge.
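Editorial sketch, not part of the deleted file: the checklist above says the `ldbl128` cases return success on non-binary128 targets. One way such a guard might look is shown below, assuming binary128 is detected through `LDBL_MANT_DIG == 113` from `<float.h>`. The function name `ldbl128_case` is hypothetical, and the actual files under `test/parse/cases/ldbl128_*.c` may be structured differently.

```c
#include <float.h>

/* Hypothetical test body: returns 0 on success and a nonzero code on
 * failure, mirroring a process exit status. */
int ldbl128_case(void) {
#if LDBL_MANT_DIG == 113
    /* binary128 target: long double must occupy 16 bytes, and adding
     * LDBL_EPSILON to 1 must produce a value distinct from 1. */
    volatile long double one = 1.0L;
    volatile long double eps = LDBL_EPSILON;
    if (sizeof(long double) != 16) return 1;
    if (one + eps == one) return 2;
    return 0;
#else
    /* Other formats (binary64, x87 80-bit): report success so this
     * case never fails on a target it does not own. */
    return 0;
#endif
}
```

The `volatile` qualifiers are there only to keep a compiler from folding the comparison at compile time, so the addition goes through the target's runtime path.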
-
-## Done criteria
-
-- [x] `CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=ldbl128 make test-parse` passes
-  with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=ldbl128 make test-parse` passes
-  with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_ARCH=aa64 CFREE_TEST_FILTER=i128 make test-parse` passes
-  with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_ARCH=rv64 CFREE_TEST_FILTER=i128 make test-parse` passes
-  with `CFREE_TEST_ALLOW_SKIP` unset.
-- [x] `CFREE_TEST_FILTER=6_7_2_12_long_double make test-parse` passes on
-  AArch64 Linux and RV64 Linux without a `.skip` sidecar.
-- [x] `make rt` still builds the default runtime archives.
-- [x] `make test-rt-headers test-rt-runtime` stays green for the default
-  runtime targets.
diff --git a/doc/CTOOLCHAIN.md b/doc/CTOOLCHAIN.md
@@ -1,296 +0,0 @@
-# C Toolchain Gap Analysis
-
-What a typical `Makefile` or build-system invokes vs. what `cfree` currently
-ships in its driver, and what's missing inside `libcfree` to close those
-gaps. Companion to the toolchain summary in `README.md`.
-
-Snapshot as of 2026-05-20.
-
-## Tool inventory
-
-| Tool | Status | Notes |
-| --------- | ----------------- | -------------------------------------------------- |
-| `cc` | shipped | `driver/cc.c`; broad GCC-subset surface |
-| `cpp` | shipped | `driver/cpp.c`; thin wrapper over `cfree_c_preprocess` |
-| `as` | shipped | `driver/as.c`; GAS-subset, single input |
-| `ld` | shipped | `driver/ld.c` |
-| `ar` | shipped | `driver/ar.c`; r/c/t/x/p + `s` modifier |
-| `ranlib` | shipped | `driver/ranlib.c` |
-| `objdump` | shipped | `driver/objdump.c` |
-| `nm` | missing | symbols only; reuse `cfree_obj_symiter_*` |
-| `size` | missing | section sizes from `cfree_obj_section` |
-| `strings` | missing | trivial; no `libcfree` API needed |
-| `file` | missing | `cfree_detect_fmt` already classifies |
-| `addr2line` | missing | needs DWARF query API surface (already used internally) |
-| `readelf` | partly via objdump | objdump covers most of GNU `readelf -a` |
-| `strip` | blocked | needs builder mutator API; see below |
-| `objcopy` | blocked | needs builder mutator API; see below |
-| `c++filt` | n/a | C only |
-| `gprof` / `gcov` | n/a | no profiling/coverage support today |
-| `ldd`, `ldconfig`, dynamic loader | n/a | host-provided |
-
-`cfree`-specific tools (`run`, `dbg`, `emu`) are out of scope for this
-document.
-
-## Strip / Objcopy
-
-Both are **blocked on a builder-mutator surface** that does not yet exist
-in `libcfree`. The reader produces an already-finalized `CfreeObjBuilder`
-(per `src/obj/obj.h` "lifecycle gates" — `obj_finalize` freezes the
-read-side view, no further writes permitted). Pure roundtrip works (open
-→ emit), but neither tool needs *only* roundtrip — both need to **remove**
-and **rename** existing structure.
-
-### Operations matrix
-
-| Operation | What it needs | Have today? |
-| ---------------------------------------- | -------------------------------------------- | --------------------------------- |
-| `strip --strip-debug` / `objcopy --strip-debug` | drop `CFREE_SEC_DEBUG` sections | reader exposes kind ✓ — emit filter missing |
-| `strip --strip-all` | drop debug + symtab | needs emit-time symbol filter |
-| `strip --strip-unneeded` | keep only relocation-referenced symbols | reader exposes reloc→sym ✓ — needs builder symbol filter |
-| `strip --keep-symbol=N` / `--strip-symbol=N` | symbol predicate | needs builder symbol filter |
-| `objcopy --remove-section=N` | drop section by name | needs builder mutator |
-| `objcopy --only-section=N` | inverse of above | needs builder mutator |
-| `objcopy --rename-section old=new[,flags]` | mutate section name + flags | needs builder mutator |
-| `objcopy --add-section name=file` | add new section from external bytes | already possible via existing builder API |
-| `objcopy --update-section name=file` | replace section contents | needs builder mutator |
-| `objcopy --redefine-sym old=new` | rename symbol | needs builder mutator |
-| `objcopy --globalize-symbol`/`--localize-symbol`/`--weaken-symbol` | mutate `CfreeSymBind` | needs builder mutator |
-| `objcopy --extract-symbol` | emit a symbol's bytes as its own object | needs builder mutator + new emit |
-| `objcopy --only-keep-debug` | keep only `.debug_*` + symtab | needs builder mutator |
-| `objcopy --add-gnu-debuglink=FILE` | append debuglink section + CRC | needs CRC32 helper + add-section |
-| `objcopy -O <bfdname>` (format convert) | ELF ↔ Mach-O ↔ COFF roundtrip | builder is already format-neutral; should work once mutators land |
-| `objcopy --change-section-address=...` | adjust section VMA / LMA | needs builder mutator |
-| `objcopy -I/-O binary`, `srec`, `ihex` | flat-binary / S-record / Intel-hex output | not supported; new emitters |
-
-### What `libcfree` needs
-
-Beyond the builder mutators, a few smaller items:
-
-1. **Section-group reader iterator** (`CfreeObjGroupIter`,
-   `cfree_obj_groupiter_new/next/free`, `CfreeObjGroupInfo`). The builder
-   has `cfree_obj_builder_group` and `_group_add_section`, but the reader
-   exposes no way to enumerate existing groups. Any objcopy that touches
-   a COMDAT-bearing object would lose grouping on roundtrip without this.
-
-2. **Builder mutator API.** Minimal MVP that unblocks strip and the
-   common objcopy operations:
-
-   ```c
-   CfreeStatus cfree_obj_builder_remove_section(CfreeObjBuilder *, CfreeObjSection);
-   CfreeStatus cfree_obj_builder_remove_symbol(CfreeObjBuilder *, CfreeObjSymbol);
-   CfreeStatus cfree_obj_builder_rename_section(CfreeObjBuilder *, CfreeObjSection,
-                                                CfreeSym new_name);
-   CfreeStatus cfree_obj_builder_rename_symbol(CfreeObjBuilder *, CfreeObjSymbol,
-                                               CfreeSym new_name);
-   CfreeStatus cfree_obj_builder_symbol_set_bind(CfreeObjBuilder *, CfreeObjSymbol,
-                                                 CfreeSymBind);
-   ```
-
-   These would need to lift the post-finalize-frozen invariant — either
-   by re-opening the builder for writes, or by adding a parallel
-   filtered-emit path that takes a callback predicate. The latter is
-   probably less invasive.
-
-3. **DSO / executable inputs are a separate problem.** `cfree_obj_open`
-   reads relocatable `.o` cleanly, but stripping a *linked* ELF
-   (executable or DSO) means understanding `.dynsym`, `.dynstr`,
-   `.hash`/`.gnu.hash`, `.dynamic`, `.got`, `.plt`, `.rela.plt`,
-   `PT_NOTE` (build-id), and a `PT_DYNAMIC` segment — most of which are
-   linker-managed, not builder-managed. GNU `strip` and `objcopy` can
-   operate on these because `bfd` round-trips the full dynamic-linking
-   state. We don't model that today. Scope strip/objcopy to `.o` and
-   `.a` for the first cut.
-
-### Suggested sequencing
-
-1. Add the section-group reader iterator (small, no mutator concerns).
-2. Add the builder mutator API for sections + symbols.
-3. Implement `strip` (relocatable inputs only) as a driver tool. Factor
-   the per-member symbol-collection block from `driver/ar.c` and
-   `driver/ranlib.c` into a shared helper while we're touching the area.
-4. Implement `objcopy` (relocatable inputs only). The `--add-section`
-   / `--rename-section` / `--redefine-sym` / `--strip-*` subset covers
-   the vast majority of build-system use.
-5. Defer DSO/exe strip+objcopy, format-conversion to non-object outputs
-   (`binary`, `srec`, `ihex`), and `objcopy --only-keep-debug` /
-   `--add-gnu-debuglink` (the split-debuginfo flow).
-
-## Flag-surface gaps
-
-Methodology: each tool's argv parser was compared against the union of
-GCC's `cc` and the corresponding binutils tool. Flags that are
-silently accepted as no-ops (e.g. `-pipe`, `-std=`) are not gaps.
-
-### `cc` — broad surface; the gaps are mostly autotools/CMake probes
-
-- **Pass-through flag families.** `-Wp,...` (preprocessor) and `-Wa,...`
-  (assembler) are missing. `-Wl,...` is supported. `-Xpreprocessor` /
-  `-Xassembler` similarly missing; `-Xlinker` is supported.
-- **Compiler-information probes** (used by autoconf, CMake's compiler
-  detection): `-print-search-dirs`, `-print-file-name=`,
-  `-print-prog-name=`, `-print-libgcc-file-name`,
-  `-print-multi-os-directory`, `-print-resource-dir`, `-dumpmachine`,
-  `-dumpversion`, `-dumpspecs`. Some build systems hard-fail when these
-  return nothing.
-- **Linker convenience.** `-rdynamic` (≡ `-Wl,--export-dynamic`) not
-  wired through.
-- **Dep emission.** `-Wp,-MD,FILE` form not handled (GNU make's
-  auto-dependency idiom).
-- **Response files.** `@file` not supported; long CMake invocations on
-  some platforms exceed `ARG_MAX`.
-- **Code-gen tuning.** `-march=`, `-mtune=`, `-mcpu=`, `-mfpu=`,
-  `-msse*`, `-mavx*` — none implemented. Currently silently no-op'd
-  via the `-W…`/`-f…` catch-all in `cc_parse`.
-- **Other compiler flags accepted as no-ops** (call-site behaviour ≠ - ABI-correctness): `-fvisibility=`, `-fcommon`/`-fno-common`, - `-fstack-protector*`, `-fno-omit-frame-pointer`, `-funwind-tables`, - `-fexceptions`, `-static-libgcc`, `-shared-libgcc`, - `-fsyntax-only`, `-fdiagnostics-color`, `-save-temps`. -- **Long forms.** `--output=PATH`, `--include=`, etc. -- **Includes.** `-iquote`, `-idirafter`, `-include` are currently - swallowed as no-ops (`driver/cc.c:939`); should land in the cflags - surface. - -### `cpp` — same baseline as `cc -E` - -Inherits all of cc's `-I/-isystem/-D/-U` + dep emission. Specific -gaps that exist equally in `cc -E`: - -- `-P` — suppress `#line` markers -- `-dM` — dump defined macros instead of expanded source -- `-C`, `-CC` — preserve comments -- `-traditional-cpp` -- `-fno-show-column` -- `-Wp,-MD,FILE` (see above) - -### `as` — minimal surface - -- **No code-gen target selection.** `-march=`, `-mcpu=`, `-mtune=`, - `-mabi=` (riscv `lp64d` vs `lp64`), `--32`/`--64`, `-m32`/`-m64`. -- **No warnings control.** `-W`, `-Z`, `--warn`, `--fatal-warnings`, - `--no-warn`. -- **No `-MD <file>`** for assembler-side dependency emission on `.S`. -- **No assembly listings.** `-a` family (`-al`, `-as`, `-an`, …), - `--listing-*`, `--statistics`. -- **No DWARF version selection.** Only blanket `-g`; missing - `--gdwarf-2/3/4/5`, `--gstabs`, `--gdwarf-sections`. -- **No PIC / passthrough flags.** `-K`, `-Q`, `-k`. -- **One input only.** GNU `as` accepts multiple sources and - concatenates. -- **No `-defsym SYM=VAL`** for assemble-time constant injection. -- **No stdin input** (`-`). - -### `ld` — strong; gaps are advanced features and `-z` flags - -- **`-z` options** (used by every distro): `-z now`, `-z relro`, - `-z noexecstack`, `-z defs`, `-z origin`, `-z notext`, `-z lazy`, - `-z combreloc`, `-z text`. 
These map to ELF dynamic-tag bits and - segment flags that the linker already emits in some form — wiring - them up should be small per flag. -- **Link maps.** `-M` / `--print-map`, `-Map=FILE`, - `--print-gc-sections`, `--print-memory-usage`. -- **Symbol-resolution policy.** `--no-undefined`, - `--allow-shlib-undefined`, `--unresolved-symbols={...}`. -- **Symbol surgery.** `--wrap=SYMBOL`, `--defsym=SYM=EXPR`, - `--undefined=SYM`, `--retain-symbols-file`. -- **Version scripts / dynamic lists.** `--version-script`, - `--dynamic-list`, `--exclude-libs`. -- **Hash style.** `--hash-style={sysv,gnu,both}`, `--no-gnu-hash`. -- **Section placement.** `--section-start=NAME=ADDR`, `-Ttext=`, - `-Tdata=`, `-Tbss=`. -- **Cross-reference.** `--cref`. -- **Identical-code folding.** `--icf={none,safe,all}`. -- **Init/fini.** `--init`, `--fini` for non-default entry symbols. -- **Sort/common.** `--sort-section`, `--sort-common`, - `--no-define-common`. -- **Endianness / emulation.** `--EB`, `--EL`, `-m EMULATION` (currently - auto-detected from inputs; the `-m` form is missing). -- **Strip flags.** `--strip-all`, `--strip-debug`, `-s`, `-S` (would - pair with the strip work above). -- **ELF notes.** `--package-metadata=` (a fielded use case in distro - packaging). -- **Response files.** `@file`. -- **Stdin input** (`-`). - -### `ar` — POSIX covered; binutils extensions missing - -- **Operations.** `d` (delete), `q` (quick append), and standalone - `s` (now provided by `cfree ranlib`, but `ar s` is also expected). - `m` (move member). `b NAME` / `a NAME` / `i NAME` (positional - insertion modifiers paired with `r`/`m`). -- **Modifiers.** `D`/`U` (deterministic / non-deterministic; `D` is - GNU's default. `SOURCE_DATE_EPOCH` is similar but not equivalent). - `N <count>` (Nth instance of a duplicated member name). `P` (full - pathname match). `o` (preserve mtime on extract). `S` (suppress - symbol index — opposite of `s`). `T` (thin archive — used by LLVM). 
-- **MRI script mode** (read commands from stdin). Rarely used; skip. - -### `objdump` — biggest gap among the shipped tools - -- **Aggregate flags.** `-x` (all headers ≡ `-f -p -h -r -t`), `-f` - (file header), `-p` (program header / private). -- **Source intermixing.** `-S` (intermix source — DWARF line info - already available), `-l` (line numbers in disasm / relocs). -- **Disassembly scope.** `--disassemble=SYM`, `--start-address=`, - `--stop-address=`. -- **Disassembly formatting.** `-z` (don't skip zeros), `-w` (wide - output), `--no-show-raw-insn`, `--prefix-addresses`, `-M ATTR` - (e.g. `-M intel` for x86 syntax). -- **Dynamic vs. static.** `-R` (dynamic relocations) vs the existing - `-r` (static); `-T` (dynamic symbols) vs the existing `-t` - (static). -- **DWARF dumping.** `-W` / `--dwarf=...` — cfree emits DWARF and - exposes reader APIs, so this should be straightforward. -- **Long forms.** `--syms`, `--section-headers`, `--archive-headers`, - `--all-headers`, `--file-offsets`. -- **Override format / arch.** `-b BFDNAME`, `-m ARCH`, `-EB`/`-EL`. -- **C++ demangling.** `-C`, `--demangle` — N/A for C; can land as a - silent no-op once it's needed. - -## Windows (PE/COFF) target - -Cross-compilation to Windows requires the mingw-w64 sysroot for system -libraries and CRT bits. Set `CFREE_MINGW_SYSROOT` to the -`<toolchain>/x86_64-w64-mingw32` directory (or pass `-isysroot` / -`--sysroot`) so the `cc` driver appends `$SYSROOT/lib` to the library -search path. Both `cc -lFOO` and `ld -lFOO` resolve Windows libraries -using the suffix list `libFOO.dll.a` → `libFOO.a` → `FOO.lib` → -`FOO.dll.a` (mingw-canonical first, MSVC-style fallback). - -Example invocations: - -```sh -export CFREE_MINGW_SYSROOT=/opt/homebrew/opt/mingw-w64/toolchain-x86_64/x86_64-w64-mingw32 - -# Compile-only: produces hello.obj (note .obj suffix on Windows targets). -cfree cc -target x86_64-windows -c hello.c - -# Inspect a PE32+ image. 
-p prints the optional header, data -# directories, and per-DLL import lists. -cfree objdump -p hello.exe - -# Link via MSVC-style flag surface (opt-in via --ms-link-driver): -cfree ld --ms-link-driver /OUT:hello.exe /SUBSYSTEM:CONSOLE \ - /DEFAULTLIB:kernel32 hello.obj -``` - -Windows predefined macros emitted by `cc -target x86_64-windows`: -`_WIN32`, `_WIN64`, `WIN32`, `__MINGW32__`, `__MINGW64__`, `_M_X64`, -`_M_AMD64`. `aarch64-windows` substitutes `_M_ARM64` for the -x64-specific names. `_MSC_VER` is deliberately not set — cfree targets -the mingw flavor on Windows (DWARF debug info, mingwex CRT), not MSVC. - -## Recommended next moves - -1. **Add to `cc` first**: `-rdynamic`, `-print-search-dirs`, - `-print-file-name`, `-print-prog-name`, `-dumpmachine`, - `-dumpversion`, `@file`. These unblock most autotools/CMake - probes for very little code. -2. **Add to `ld`** the `-z` family (`-z now`, `-z relro`, - `-z noexecstack` are the high-traffic three) and `-Map=FILE`. -3. **Add to `objdump`** the `-x` aggregate, `-S`, `-l`, and - `--dwarf=...`. Most "I want to see what the compiler produced" - debug sessions need at least one of these. -4. **Then unblock strip/objcopy** via the builder mutator API and - ship strip first (smaller surface than objcopy). diff --git a/doc/FRONTEND.md b/doc/FRONTEND.md @@ -1,412 +0,0 @@ -# Interactive Frontend REPL - -This document tracks the current source-frontend REPL shape and the remaining -work to make `cfree dbg` a full interactive compile/link/publish environment. -The immediate API direction is stateful handles at each layer: - -- `CfreeCompileSession` owns frontend state for one source language. -- `CfreeCg` owns reusable codegen metadata and binds one object delta at a time. -- `CfreeLinkSession` owns linker inputs and resolution state. -- `CfreeJit` owns the live executable image and publishes resolved deltas. - -One-shot public compile/link/JIT append APIs are being removed. 
Callers should -create sessions, add inputs, resolve or publish, and then free the sessions. - -## Current State - -- [x] Registered frontends use the lifecycle vtable - `new_frontend`, `compile`, `free_frontend`. -- [x] Public source compilation goes through `CfreeCompileSession`: - `cfree_compile_session_new`, `cfree_compile_session_compile`, - `cfree_compile_session_free`. -- [x] Public one-shot source APIs such as `cfree_compile_c_obj`, - `cfree_compile_c_emit`, `cfree_compile_asm_obj`, and - `cfree_compile_asm_emit` have been removed. -- [x] `CfreeSourceInput` carries per-delta REPL shape: - `input_kind` and `repl_entry_name`. -- [x] `CfreeCompileSessionOptions` keeps fixed session options: - language, code options, diagnostics, and language-specific options. -- [x] `CfreeCg` is reusable across object deltas with - `cfree_cg_begin_obj` and `cfree_cg_end_obj`. -- [x] `CfreeLinkSession` is public and owns link inputs plus resolve/emit/JIT - operations. -- [x] Public one-shot link APIs have been removed; drivers and tests create - `CfreeLinkSession` directly. -- [x] Public JIT append is `cfree_jit_publish`. The v1 publish mode supports - append-object batches through a `CfreeLinkSession`. -- [x] `driver/cc.c`, `driver/as.c`, `driver/ld.c`, `driver/inputs.c`, - `driver/runtime.c`, `driver/dbg.c`, and the active harnesses have been - migrated to session APIs. -- [x] `cfree dbg` can start from an empty JIT image. The default REPL language - is selected by `-x LANG` / `--language LANG` or changed with `:language`. -- [x] `driver/dbg.c` caches `CfreeCompileSession*` per language for snippets - typed during the REPL. -- [x] `driver/dbg.c` publishes snippets by creating a temporary - `CfreeLinkSession`, adding the object delta, and calling `cfree_jit_publish`. -- [x] Toy supports top-level snippets, bare expression wrappers, block wrappers, - persistent globals, persistent functions, and persistent nominal types across - REPL snippets. 
-- [x] Scripted driver tests cover the Toy REPL append/expression path.
-- [ ] Initial source files passed to `cfree dbg` are not yet used to seed the
-  cached per-language `CfreeCompileSession`, so REPL expressions do not have
-  source-frontend declarations from those initial files.
-- [ ] C and Wasm still need true interactive frontend state. C currently owns
-  parser/preprocessor/declaration state per compile; Wasm is still module-shaped.
-- [ ] `CfreeLinkSession` is session-shaped, but incremental watermarks,
-  symbol-version policy, and replace/redefine semantics are still skeletal.
-- [ ] `cfree_jit_publish` function replacement is shaped in the API but returns
-  `CFREE_UNSUPPORTED`.
-
-## Target UX
-
-Interactive sessions should support these workflows:
-
-```c
-(cfree) :language c
-(cfree) jit { #define SCALE(x) ((x) * 3) }
-(cfree) jit { typedef struct { int x; int y; } Point; Point p = {4, 5}; }
-(cfree) SCALE(p.x + p.y)
-$1 = 27 (0x1b)
-```
-
-```text
-cfree dbg -x toy
-(cfree) jit { type Point = record { x: i64, y: i64 }; let p: Point = .{ .x = 4, .y = 5 }; }
-(cfree) p.x + p.y
-$1 = 9 (0x9)
-```
-
-```wat
-(cfree) :language wat
-(cfree) jit { (module (func (export "add") (param i64 i64) (result i64) local.get 0 local.get 1 i64.add)) }
-(cfree) invoke add 4 5
-$1 = 9 (0x9)
-```
-
-For C and Toy, unrecognized bare input should be the expression/thunk fallback
-and should compile a language-native expression wrapper. The explicit `expr`
-command can remain as an alias. For Wasm, the natural interactive unit is a
-module plus explicit export invocation; WAT expression shortcuts can come later
-as sugar over generated modules.
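The dispatch order described in the Target UX text can be sketched as a small line classifier. This is only an illustration under stated assumptions, not the `driver/dbg.c` implementation: `ReplLineKind`, `repl_classify`, and the short list of command words are hypothetical names picked for the example, and the sketch assumes that `:`-commands, `jit` blocks (including the `{ ... }` shorthand the debugger checklist mentions), and named commands are matched before the bare-expression fallback.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of one REPL line; none of these names are
   taken from cfree itself. */
typedef enum ReplLineKind {
  REPL_LINE_EMPTY,
  REPL_LINE_COMMAND,   /* ":language c", "run", "invoke add 4 5" */
  REPL_LINE_JIT_BLOCK, /* "jit { ... }" or the "{ ... }" shorthand */
  REPL_LINE_EXPR       /* explicit "expr ..." or bare fallback input */
} ReplLineKind;

static const char *skip_spaces(const char *s) {
  while (*s && isspace((unsigned char)*s)) s++;
  return s;
}

/* Nonzero when `s` starts with `kw` followed by end of string, whitespace,
   or '{', so that "jitter" is not mistaken for the "jit" keyword. */
static int has_keyword(const char *s, const char *kw) {
  size_t n = strlen(kw);
  if (strncmp(s, kw, n) != 0) return 0;
  return s[n] == '\0' || s[n] == '{' || isspace((unsigned char)s[n]);
}

ReplLineKind repl_classify(const char *line) {
  /* Assumed command words; a real driver would have a longer table. */
  static const char *const commands[] = {"run", "invoke"};
  const char *s = skip_spaces(line);
  size_t i;

  if (*s == '\0') return REPL_LINE_EMPTY;
  if (*s == ':') return REPL_LINE_COMMAND;
  if (*s == '{' || has_keyword(s, "jit")) return REPL_LINE_JIT_BLOCK;
  for (i = 0; i < sizeof commands / sizeof commands[0]; i++)
    if (has_keyword(s, commands[i])) return REPL_LINE_COMMAND;
  /* "expr X" and bare "X" both take the expression path. */
  return REPL_LINE_EXPR;
}
```

With this ordering, `SCALE(p.x + p.y)` and `expr SCALE(p.x + p.y)` both reach the expression path, while `{ ... }` is handled like `jit { ... }`. A variable that happens to be named `jit` or `run` would be misclassified by this sketch; a real driver would need a rule for that case.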
-
-## Public API Shape
-
-### Source Compile
-
-Current public source compilation:
-
-```c
-typedef enum CfreeFrontendInputKind {
-  CFREE_FRONTEND_INPUT_TRANSLATION_UNIT,
-  CFREE_FRONTEND_INPUT_REPL_TOPLEVEL,
-  CFREE_FRONTEND_INPUT_REPL_EXPR,
-  CFREE_FRONTEND_INPUT_REPL_BLOCK,
-} CfreeFrontendInputKind;
-
-typedef struct CfreeSourceInput {
-  CfreeBytes bytes;
-  CfreeLanguage lang;
-  CfreeFrontendInputKind input_kind;
-  const char *repl_entry_name;
-} CfreeSourceInput;
-
-typedef struct CfreeFrontendCompileOptions {
-  CfreeCodeOptions code;
-  CfreeDiagnosticOptions diagnostics;
-  const void *language_options;
-  CfreeFrontendInputKind input_kind;
-  const char *repl_entry_name;
-} CfreeFrontendCompileOptions;
-
-typedef struct CfreeCompileSessionOptions {
-  CfreeLanguage lang;
-  CfreeFrontendCompileOptions compile;
-} CfreeCompileSessionOptions;
-
-CfreeStatus cfree_compile_session_new(CfreeCompiler *,
-                                      const CfreeCompileSessionOptions *,
-                                      CfreeCompileSession **out);
-CfreeStatus cfree_compile_session_compile(CfreeCompileSession *,
-                                          const CfreeSourceInput *,
-                                          CfreeObjBuilder **out);
-void cfree_compile_session_free(CfreeCompileSession *);
-```
-
-`input_kind` and `repl_entry_name` are copied from `CfreeSourceInput` into the
-frontend compile options for each delta. This lets a debugger keep one
-`CfreeCompileSession` alive while alternating top-level snippets, expression
-thunks, and block thunks.
-
-### Codegen
-
-Current public codegen lifecycle:
-
-```c
-CfreeStatus cfree_cg_new(CfreeCompiler *, CfreeCg **out);
-CfreeStatus cfree_cg_begin_obj(CfreeCg *, CfreeObjBuilder *,
-                               const CfreeCodeOptions *);
-CfreeStatus cfree_cg_end_obj(CfreeCg *);
-void cfree_cg_free(CfreeCg *);
-```
-
-`CfreeCg` preserves compiler-level metadata across object deltas. Object-bound
-target, MC, and debug state are created by `begin_obj` and finalized by
-`end_obj`.
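The split between persistent and object-bound state in that lifecycle can be modeled with a few counters. This is a toy model, not libcfree code: `ModelCg` and the `model_cg_*` functions are hypothetical stand-ins, and rejecting a second `begin_obj` while an object is still bound is an assumption drawn from the statement that `CfreeCg` binds one object delta at a time.

```c
#include <stdint.h>

/* Hypothetical stand-in for CfreeCg; the real structure is opaque. */
typedef struct ModelCg {
  uint32_t persistent_decls; /* survives across object deltas */
  uint32_t object_decls;     /* stands in for per-object target/MC/debug state */
  uint32_t objects_finished; /* number of completed begin/end pairs */
  int bound;                 /* nonzero between begin_obj and end_obj */
} ModelCg;

void model_cg_init(ModelCg *cg) {
  cg->persistent_decls = 0;
  cg->object_decls = 0;
  cg->objects_finished = 0;
  cg->bound = 0;
}

/* Bind a new object delta. Assumption: only one object may be bound at a
   time, so a nested begin is reported as an error (-1). */
int model_cg_begin_obj(ModelCg *cg) {
  if (cg->bound) return -1;
  cg->bound = 1;
  cg->object_decls = 0; /* object-bound state starts fresh */
  return 0;
}

/* Record a declaration: visible in the current object and remembered for
   later objects. */
int model_cg_declare(ModelCg *cg) {
  if (!cg->bound) return -1;
  cg->persistent_decls++;
  cg->object_decls++;
  return 0;
}

/* Finish the current object; persistent metadata is left untouched. */
int model_cg_end_obj(ModelCg *cg) {
  if (!cg->bound) return -1;
  cg->bound = 0;
  cg->objects_finished++;
  return 0;
}
```

In this model a second `model_cg_begin_obj` after `model_cg_end_obj` sees `object_decls == 0` while `persistent_decls` still counts the earlier declaration, which is the property a REPL relies on when a later snippet refers to an earlier one.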
-
-Remaining codegen work:
-
-- [ ] Define exactly which symbol/type/metadata tables persist across
-  `begin_obj`/`end_obj` for each frontend.
-- [ ] Seed each new object builder with external declarations for all known
-  frontend symbols that may be referenced by later snippets.
-- [ ] Add a focused test proving two objects emitted by one `CfreeCg` can refer
-  to each other after link/JIT resolution.
-- [ ] Audit failed-object cleanup so a failed snippet cannot corrupt the
-  persistent frontend or CG metadata.
-
-### Link
-
-Current public link lifecycle:
-
-```c
-CfreeStatus cfree_link_session_new(CfreeCompiler *,
-                                   const CfreeLinkSessionOptions *,
-                                   CfreeLinkSession **out);
-CfreeStatus cfree_link_session_add_obj(CfreeLinkSession *, CfreeObjBuilder *);
-CfreeStatus cfree_link_session_add_obj_bytes(CfreeLinkSession *, CfreeBytes);
-CfreeStatus cfree_link_session_add_archive_bytes(CfreeLinkSession *,
-                                                 CfreeBytes);
-CfreeStatus cfree_link_session_add_dso_bytes(CfreeLinkSession *, CfreeBytes);
-CfreeStatus cfree_link_session_resolve(CfreeLinkSession *);
-CfreeStatus cfree_link_session_emit(CfreeLinkSession *, CfreeWriter *out);
-CfreeStatus cfree_link_session_jit(CfreeLinkSession *, CfreeJit **out_jit);
-void cfree_link_session_free(CfreeLinkSession *);
-```
-
-Remaining link work for a full interactive REPL:
-
-- [ ] Keep durable symbol-resolution state for incremental sessions instead of
-  treating each publish as a mostly independent batch.
-- [ ] Define duplicate strong symbol policy across REPL generations.
-- [ ] Define weak/common/TLS behavior across appended generations.
-- [ ] Track generation watermarks so diagnostics can say whether a symbol came
-  from the initial image or a later snippet.
-- [ ] Add tests where a later object resolves references against prior objects,
-  archives, DSOs, and the initial JIT image.
-- [ ] Add negative tests for unresolved symbols, duplicate definitions, and
-  unsupported relocation modes in interactive publish.
-
-### JIT Publish
-
-Current public publish lifecycle:
-
-```c
-typedef enum CfreeJitPublishKind {
-  CFREE_JIT_PUBLISH_APPEND_OBJECTS,
-  CFREE_JIT_PUBLISH_REPLACE_SYMBOLS,
-} CfreeJitPublishKind;
-
-typedef struct CfreeJitPublishOptions {
-  uint8_t kind;
-  CfreeLinkSession *link;
-} CfreeJitPublishOptions;
-
-typedef struct CfreeJitPublishResult {
-  uint64_t generation;
-} CfreeJitPublishResult;
-
-CfreeStatus cfree_jit_publish(CfreeJit *, const CfreeJitPublishOptions *,
-                              CfreeJitPublishResult *);
-```
-
-Remaining publish work:
-
-- [ ] Implement `CFREE_JIT_PUBLISH_REPLACE_SYMBOLS` or remove it until the
-  semantics are fully specified.
-- [ ] Preserve old symbol addresses for append-only generations and test that
-  invariant directly.
-- [ ] Decide how the debugger should surface symbols shadowed or replaced by a
-  later generation.
-- [ ] Keep DWARF and symbol iteration generation-aware.
-- [ ] Add tests that publish increments the JIT generation and keeps old
-  function pointers callable after new snippets are appended.
-
-## Debugger Driver
-
-- [x] Cache `CfreeCompileSession*` per language in `DbgState`.
-- [x] Keep cached compile sessions alive for the whole REPL session.
-- [x] Treat a REPL line beginning with `{` as shorthand for `jit { ... }`.
-- [x] Add `:language c|toy|wat|wasm|asm`.
-- [x] Make `jit`, explicit `expr`, and bare fallback input honor the selected
-  language through `CfreeSourceInput.input_kind`.
-- [x] Add `-x LANG` / `--language LANG` to choose the default REPL language
-  before any source file exists.
-- [x] Make `:language` with no argument report the current language and whether
-  that language has a cached compile session.
-- [x] Allow an empty initial JIT image; `run` resolves the entry lazily so a
-  later snippet can define `main`.
-- [x] Remove driver-side language-specific thunk fabrication for Toy. Frontends
-  own REPL expression/block wrapping.
-- [ ] Seed cached compile sessions from initial source-file inputs.
-- [ ] Reuse or persist link-session state where that is needed for incremental - diagnostics and symbol policy. -- [ ] Keep DWARF/JIT symbol recovery for inspecting external/preexisting code, - not as the normal path for declarations typed during the current session. -- [ ] Add command support for Wasm export invocation. - -## C Checklist - -Persistent C context must include the preprocessor, file-scope identifiers, -tags, typedefs, declaration table, and CG symbol/type handles. - -- [ ] Change `CFrontend` to own a long-lived `Pool`. -- [ ] Change `CFrontend` to own a long-lived `Pp`. -- [ ] Apply command-line include paths, predefined macros, `-D`, and `-U` once - at frontend creation or first compile, with clear behavior if options change. -- [ ] Keep the file-scope `Scope` alive across snippets. -- [ ] Keep `DeclTable` alive across snippets. -- [ ] Keep one persistent `CfreeCg` and bind a new object with - `cfree_cg_begin_obj` per snippet. -- [ ] Split parser initialization from translation-unit parsing so a parser can - reuse file-scope state with a new lexer. -- [ ] Ensure failed snippets do not corrupt persistent scope or macro state. - Initial implementation may mark a session frontend poisoned after hard errors. -- [ ] Implement `CFREE_FRONTEND_INPUT_REPL_TOPLEVEL` for normal declarations and - definitions. -- [ ] Implement `CFREE_FRONTEND_INPUT_REPL_EXPR` by wrapping the expression as: - -```c -unsigned long long __cfree_dbg_expr_N(void) { - return (unsigned long long)(USER_EXPR); -} -``` - -- [ ] Implement `CFREE_FRONTEND_INPUT_REPL_BLOCK` by wrapping a block as: - -```c -unsigned long long __cfree_dbg_expr_N(void) { - USER_STATEMENTS -} -``` - -- [ ] Decide whether block mode requires an explicit `return` or permits - expression-final shorthand. -- [ ] Support macros across snippets: - `jit { #define N 7 }`, then bare `N + 1`. -- [ ] Support typedefs/tags across snippets: - `jit { typedef struct { int x; } S; S s = {41}; }`, then bare `s.x + 1`. 
-- [ ] Support function definitions across snippets: - `jit { int f(int x) { return x + 1; } }`, then bare `f(41)`. -- [ ] Diagnose strong redefinition cleanly when a later snippet defines the same - global function/object. -- [ ] Add targeted tests under a new `test/dbg` or `test/repl` harness. - -## Toy Checklist - -Toy is the current working interactive target. - -- [x] Change `ToyFrontend` to own persistent parser symbol/type storage instead - of rebuilding all parser state per compile. -- [x] Refactor `ToyParser` so lexical state is per snippet but declarations, - record/enum/type tables, globals, and function symbols persist. -- [x] Implement `CFREE_FRONTEND_INPUT_REPL_TOPLEVEL` for declarations and - definitions. -- [x] Implement `CFREE_FRONTEND_INPUT_REPL_EXPR` by generating: - -```toy -fn __cfree_dbg_expr_N(): i64 { - return USER_EXPR as i64; -} -``` - -- [x] Implement `CFREE_FRONTEND_INPUT_REPL_BLOCK` using Toy block/function - syntax and require an explicit return in v1. -- [x] Preserve global variables across snippets: - `jit { let x: i64 = 41; }`, then bare `x + 1`. -- [x] Preserve nominal records/enums/type aliases across snippets. -- [x] Preserve functions across snippets, including calls from later snippets. -- [x] Add Toy REPL smoke tests for globals, functions, record field access, and - expression wrapper. -- [ ] Keep one persistent `CfreeCg` if Toy needs CG-level metadata beyond the - parser-owned declaration/type tables. -- [ ] Add diagnostics for duplicate definitions and type mismatches that include - the snippet input name. -- [ ] Add Toy REPL smoke tests for enum constants and block wrapper. -- [ ] Add a focused test for a failed Toy snippet followed by a successful - snippet, documenting poison-or-recovery semantics. - -## Wasm Checklist - -Wasm is different from C/Toy: the user normally supplies complete WAT/Wasm -modules, not declarations in a source namespace. 
The ergonomic REPL target is a -module/session model with export invocation and instance-owned runtime state. - -- [ ] Decide v1 interaction model: - module append plus `invoke EXPORT ARGS...`, not arbitrary Wasm expression - snippets. -- [ ] Keep `WasmFrontend` lifecycle-shaped but treat most parser/module state as - per compile unless a specific cross-module context is introduced. -- [ ] Preserve instance/runtime state for appended modules where supported: - memories, tables, globals, start/init calls, and import slots. -- [ ] Add `dbg` command support for invoking Wasm exports by name with typed - integer/float arguments. -- [ ] Define how duplicate export names are handled across appended modules: - reject, shadow by generation, or require module qualification. -- [ ] Add module qualification in symbol lookup if multiple modules can export - the same name. -- [ ] Add clear diagnostics for unsupported interactive cases: - relocatable Wasm object input, multi-memory gaps, unsupported proposals, WASI - startup, and wasm64. -- [ ] Implement optional `CFREE_FRONTEND_INPUT_REPL_EXPR` later as WAT sugar, - lowering an expression into a generated module/function. -- [ ] Add Wasm REPL smoke tests: - WAT module append, export invocation, start function behavior, memory/data - persistence, imported function call, and duplicate export diagnostics. - -## Shared Acceptance Tests - -- [x] `make bin` -- [x] `make test-cg-api` -- [x] `make test-link` -- [x] `make test-asm` -- [x] `make test-parse` -- [x] `make test-driver` -- [ ] `make test-toy` stays green after Toy refactors. -- [ ] `make test-parse-err test-pp test-pp-err` stay green after C REPL - frontend work. -- [ ] New scripted REPL tests cover: - C macro/type/global persistence, Toy type/global/function persistence, Wasm - module invoke, bare expression fallback, explicit `expr` alias behavior, block - wrappers, duplicate definitions, and clean diagnostics after failed snippets. 
-- [ ] Manual smoke: - -```text -cfree dbg test.c -(cfree) :language c -(cfree) jit { typedef struct { int a; int b; } Point; Point p = {1, 2}; } -(cfree) p.a + p.b -$1 = 3 (0x3) -``` - -```text -cfree dbg test.toy -(cfree) :language toy -(cfree) jit { let x: i64 = 40; fn inc(v: i64): i64 { return v + 1; } } -(cfree) inc(x) + 1 -$1 = 42 (0x2a) -``` - -```text -cfree dbg --language wat -(cfree) jit { (module (func (export "answer") (result i64) i64.const 42)) } -(cfree) invoke answer -$1 = 42 (0x2a) -``` - -## Notes - -DWARF recovery remains useful for inspecting preexisting objects and external -debug info. It should not be the primary mechanism for normal REPL expressions -typed during the current session. The persistent frontend has better -source-level context for macros, typedefs, source-language type aliases, -front-end-only attributes, and frontend-specific syntax. diff --git a/doc/LANGS.md b/doc/LANGS.md @@ -1,479 +0,0 @@ -# Language Frontend Architecture Plan - -## Overview - -libcfree currently hard-codes two source consumers inside `src/api/pipeline.c`: - -- **C** (`CFREE_LANG_C`) — preprocessor → C parser → `CG` → `CGTarget` → `MCEmitter` -- **Assembly** (`CFREE_LANG_ASM`) — lexer → assembler → `MCEmitter` - -The C path is privileged because `CG` (`src/cg/cg.h`) is glued to the internal C -type system (`src/type/type.h`). To enable *alternative* language frontends we -will introduce a **new public codegen seam** (`include/cfree/cg.h`) that speaks -in language-neutral layout descriptors rather than C `Type*`. Frontends live -under `lang/` and consume only public headers (`include/cfree*.h`); the driver -registers them at startup. - -This plan defines Phase 1 (public ObjBuilder registration) and Phase 2 (the new -`CfreeCg` API), then scopes a **toy language** (`lang/toy/`) to prove the seam -before touching the C frontend. 
- -## Directory layout - -``` -lang/ - toy/ - lex.c — tokenizer: produces `ToyToken` structs with `CfreeSrcLoc` (line, col) - parse.c — recursive-descent parser that consumes a token iterator → CfreeCg calls - type.c — toy type system: int, record, array, pointer - type.h - toy.h — public frontend entry: `cfree_toy_compile()` - Makefile — produces `libcfree_toy.a` -``` - -`lang/` is a sibling of `driver/` and `src/`. It only includes `<cfree.h>` and -`<cfree/cg.h>`. No internal `src/` headers. - -## Public API: `include/cfree/cg.h` - -This is the **language-neutral codegen surface**. It replaces the internal -`CG` + `CGTarget` vtable with a stable, typed C API. The implementation lives -in `src/api/cg.c` (or alongside `pipeline.c`) and adapts calls to the existing -`CGTarget` machinery. - -### Handles - -```c -typedef struct CfreeCg CfreeCg; -typedef struct CfreeCgType CfreeCgType; -typedef struct CfreeCgValue CfreeCgValue; /* opaque stack/SSA handle */ -typedef uint32_t CfreeCgLabel; -#define CFREE_CG_LABEL_NONE 0u -``` - -### Type factory - -Types are **layout descriptors**, not semantic C types. They carry `(size, -align, scalar_kind)` and, for aggregates, a flat field list. The backend ABI -classification derives layout from these descriptors without knowing about C -qualifiers, bitfields, or tag identity. - -```c -CfreeCgType* cfree_cg_type_i32(CfreeCompiler*); -CfreeCgType* cfree_cg_type_i64(CfreeCompiler*); -CfreeCgType* cfree_cg_type_u32(CfreeCompiler*); -CfreeCgType* cfree_cg_type_u64(CfreeCompiler*); -CfreeCgType* cfree_cg_type_f32(CfreeCompiler*); -CfreeCgType* cfree_cg_type_f64(CfreeCompiler*); - -/* Pointer: element type + count (0 = single pointer or unknown, >0 array) */ -CfreeCgType* cfree_cg_type_ptr(CfreeCompiler*, CfreeCgType* pointee, uint32_t count); - -/* Records (structs, tuples, tagged unions). The caller describes fields in - declaration order; the backend computes offsets/alignment/padding. 
*/ -typedef struct CfreeCgField { - CfreeSym name; /* may be 0 for anonymous/tuples */ - CfreeCgType* type; - uint32_t align_override; /* 0 = natural, 1 = packed */ -} CfreeCgField; -CfreeCgType* cfree_cg_type_record(CfreeCompiler*, - CfreeSym tag, - const CfreeCgField* fields, - uint32_t nfields); - -/* Function type for indirect calls and type-checking. */ -CfreeCgType* cfree_cg_type_func(CfreeCompiler*, - CfreeCgType* ret, - CfreeCgType** params, - uint32_t nparams, - int variadic); -``` - -### CG lifecycle - -```c -/* Construct a CG context bound to an ObjBuilder. */ -CfreeCg* cfree_cg_new(CfreeCompiler*, CfreeObjBuilder* out); -void cfree_cg_free(CfreeCg*); - -/* Function boundaries. `name` is the source-level symbol; the backend applies - the active object format's C-symbol mangling (e.g. leading `_` on Mach-O). */ -void cfree_cg_func_begin(CfreeCg*, const char* name, CfreeCgType* fn_type); -void cfree_cg_func_end(CfreeCg*); - -/* Source location tracking (sticky until next call). */ -void cfree_cg_set_loc(CfreeCg*, CfreeSrcLoc); -``` - -### Value stack - -`CfreeCg` owns a TCC-style stack. Every push produces a value; every operation -consumes and produces values. The stack discipline is the frontend's -responsibility; the backend manages register allocation and spills. 
-
-```c
-/* Literal materialization */
-void cfree_cg_push_int(CfreeCg*, int64_t value, CfreeCgType* type);
-void cfree_cg_push_float(CfreeCg*, double value, CfreeCgType* type);
-
-/* String literals → rodata pointer */
-void cfree_cg_push_bytes(CfreeCg*, const uint8_t* str, size_t len);
-
-/* Addressable storage */
-void cfree_cg_push_local(CfreeCg*, uint32_t slot_id, CfreeCgType* type);
-void cfree_cg_push_global(CfreeCg*, CfreeSym name, CfreeCgType* type);
-
-/* Lvalue/rvalue conversion */
-void cfree_cg_load(CfreeCg*);   /* lvalue → rvalue */
-void cfree_cg_addr(CfreeCg*);   /* lvalue → pointer rvalue */
-void cfree_cg_store(CfreeCg*);  /* pop [addr_or_lvalue, rvalue] */
-
-/* Stack manipulation */
-void cfree_cg_dup(CfreeCg*);
-void cfree_cg_swap(CfreeCg*);
-void cfree_cg_drop(CfreeCg*);
-void cfree_cg_rot3(CfreeCg*);
-```
-
-### Arithmetic, compare, convert
-
-```c
-typedef enum CfreeCgBinOp {
-  CFREE_CG_ADD, CFREE_CG_SUB, CFREE_CG_MUL,
-  CFREE_CG_SDIV, CFREE_CG_UDIV, CFREE_CG_SREM, CFREE_CG_UREM,
-  CFREE_CG_AND, CFREE_CG_OR, CFREE_CG_XOR,
-  CFREE_CG_SHL, CFREE_CG_SHR_S, CFREE_CG_SHR_U,
-} CfreeCgBinOp;
-
-typedef enum CfreeCgCmpOp {
-  CFREE_CG_EQ, CFREE_CG_NE,
-  CFREE_CG_LT_S, CFREE_CG_LE_S, CFREE_CG_GT_S, CFREE_CG_GE_S,
-  CFREE_CG_LT_U, CFREE_CG_LE_U, CFREE_CG_GT_U, CFREE_CG_GE_U,
-} CfreeCgCmpOp;
-
-void cfree_cg_binop(CfreeCg*, CfreeCgBinOp);
-void cfree_cg_cmp(CfreeCg*, CfreeCgCmpOp);
-void cfree_cg_convert(CfreeCg*, CfreeCgType* dst);
-```
-
-### Control flow
-
-Labels are numeric handles. The backend maps them to per-arch branch targets or
-SSA blocks.
-
-```c
-CfreeCgLabel cfree_cg_label_new(CfreeCg*);
-void cfree_cg_label_place(CfreeCg*, CfreeCgLabel);
-void cfree_cg_jump(CfreeCg*, CfreeCgLabel);
-void cfree_cg_branch_true(CfreeCg*, CfreeCgLabel);   /* pop i1 */
-void cfree_cg_branch_false(CfreeCg*, CfreeCgLabel);  /* pop i1 */
-
-/* Structured control flow (optional but recommended). Backends that don't
-   consume structure directly (all real ISAs except WASM) lower to labels. */
-typedef uint32_t CfreeCgScope;
-CfreeCgScope cfree_cg_scope_begin(CfreeCg*, const CfreeCgType* result);
-void cfree_cg_scope_end(CfreeCg*, CfreeCgScope);
-void cfree_cg_break(CfreeCg*, CfreeCgScope);
-void cfree_cg_continue(CfreeCg*, CfreeCgScope);
-```
-
-### Aggregate and memory operations
-
-```c
-/* Copy `size` bytes from src_addr to dst_addr. Pops [dst, src]. */
-void cfree_cg_memcpy(CfreeCg*, uint32_t size, uint32_t align);
-
-/* Initialize `size` bytes at addr. Pops addr. */
-void cfree_cg_memset(CfreeCg*, uint8_t val, uint32_t size, uint32_t align);
-
-/* Element access for arrays and records.
-   Pops base_addr, pushes addr_of_element. */
-void cfree_cg_index(CfreeCg*, uint32_t elem_size, uint32_t index);
-void cfree_cg_field_addr(CfreeCg*, uint32_t offset);
-```
-
-### Calls and returns
-
-```c
-/* `nargs` values must be on the stack (left-to-right or right-to-left
-   depending on the frontend's calling convention choice). `fn_type` is the
-   callee's function type; the backend uses it for ABI classification.
-   The callee value itself must be the deepest value on the stack, below args. */
-void cfree_cg_call(CfreeCg*, uint32_t nargs, CfreeCgType* fn_type);
-void cfree_cg_tail_call(CfreeCg*, uint32_t nargs, CfreeCgType* fn_type);
-void cfree_cg_ret(CfreeCg*, int has_value); /* has_value=0 for void */
-```
-
-### Inline assembly (future)
-
-```c
-/* TBD: define after the toy language proves the seam. Same constraint model
-   as the internal AsmConstraint, but using CfreeCgValue handles instead of
-   internal SValues. */
-```
-
-### Frame slots (locals and parameters)
-
-Frontends can allocate frame slots explicitly, or let the backend infer them
-from `cfree_cg_push_local` usage. The explicit API is useful when the frontend
-wants deterministic slot IDs (e.g.
-for debug variable location):
-
-```c
-uint32_t cfree_cg_local_slot(CfreeCg*, CfreeCgType* type, CfreeSym name);
-uint32_t cfree_cg_param_slot(CfreeCg*, uint32_t index, CfreeCgType* type,
-                             CfreeSym name);
-```
-
-### Debug info hooks
-
-The toy language will skip debug info in v1, but the API surface must reserve
-room so frontends can emit DWARF later without growing the vtable.
-
-```c
-/* TBD: debug_func_begin, debug_local, debug_param, debug_line.
-   For v1 the driver passes debug_info=0 and the CG skips it. */
-```
-
-## Toy language (`lang/toy/`)
-
-### Grammar (v1)
-
-```
-decl          ::= fn_decl | global_decl | type_decl
-fn_decl       ::= "fn" name "(" param_list ")" (":" type)? block
-param_list    ::= (name ":" type ("," name ":" type)*)?
-block         ::= "{" stmt* "}"
-stmt          ::= let_stmt
-                | assign_stmt
-                | if_stmt
-                | while_stmt
-                | break_stmt
-                | continue_stmt
-                | return_stmt
-                | expr_stmt
-let_stmt      ::= "let" name ":" type ("=" expr)? ";"
-assign_stmt   ::= lvalue "=" expr ";"
-if_stmt       ::= "if" expr block ("else" block)?
-while_stmt    ::= "while" expr block
-break_stmt    ::= "break" ";"
-continue_stmt ::= "continue" ";"
-return_stmt   ::= "return" expr? ";"
-expr_stmt     ::= expr ";"
-lvalue        ::= name (lvalue_r)*
-lvalue_r      ::= ("[" expr "]")*
-                | ("." name)*
-                | (".*")*
-
-global_decl   ::= "let" name ":" type "=" expr ";"
-
-type_decl     ::= "type" name "=" type ";"
-type          ::= "int" | "*" type | "[" number "]" type | record_type
-record_type   ::= "{" field_decl ("," field_decl)* "}"
-field_decl    ::= name ":" type
-
-expr          ::= or_expr
-or_expr       ::= and_expr ("||" and_expr)*
-and_expr      ::= cmp_expr ("&&" cmp_expr)*
-cmp_expr      ::= add_expr (("<" | ">" | "<=" | ">=" | "==" | "!=") add_expr)?
-add_expr      ::= mul_expr (("+" | "-") mul_expr)*
-mul_expr      ::= unary_expr (("*" | "/" | "%") unary_expr)*
-unary_expr    ::= ("-" | "!" | "&") unary_expr | primary
-primary       ::= number | string | name | lvalue | "(" expr ")"
-```
-
-### Token representation
-
-```c
-typedef enum ToyTokenKind { TOK_EOF, TOK_FN, TOK_LET, TOK_IF, TOK_INT,
-                            TOK_IDENT, TOK_NUMBER, TOK_STRING, ... } ToyTokenKind;
-
-typedef struct ToyToken {
-  ToyTokenKind kind;
-  CfreeSrcLoc loc;      /* file_id, line, col */
-  const uint8_t* text;  /* points into source buffer */
-  size_t text_len;
-  int64_t int_value;    /* valid for TOK_NUMBER */
-} ToyToken;
-```
-
-The lexer tracks `cur`, `end`, `bol` (beginning-of-line), and `line` so that
-every emitted token gets an accurate `CfreeSrcLoc`. The parser holds a
-`ToyLexer` as its token iterator and calls `toy_lexer_next()` to advance,
-keeping the current token in `parser->cur`.
-
-### Semantics
-
-- **One integer type**: `int` is a signed integer whose width equals the target
-  pointer width (32-bit on ILP32, 64-bit on LP64). The frontend queries
-  `cfree_cg_type_int(compiler, cfree_target_ptr_size(compiler)*8, 1)`.
-- **No implicit conversions**: the parser rejects `int + ptr`.
-- **Records** are value types (like C structs). Assignment copies the whole
-  record. Parameter passing follows the target ABI (the backend decides
-  direct/indirect/split).
-- **Arrays** are fixed-size and decay to pointers only in subscript and field
-  contexts. Array assignment is not allowed.
-- **Pointers**: `ptr` is an untyped pointer (like `void*`). Dereference is
-  `*expr` (sugar for `expr[0]`). Typed pointers are a future extension.
-- **Functions**: no forward declarations needed; a single-pass parser resolves
-  all function names into globals. Recursion is allowed.
-
-### Frontend pipeline
-
-1. **Lex** (`lex.c`) — token iterator (`ToyLexer`) with 1-char lookahead.
-   Every token carries its kind, source span (`text`/`text_len`), `CfreeSrcLoc`
-   (line/col), and an `int_value` for number literals. The parser calls
-   `toy_lexer_next()` to advance the iterator and inspects the current `ToyToken`.
-2. **Type check** (`type.c`) — minimal bidirectional inference:
-   - `let` requires an explicit type (or an initializer from which to infer).
-   - Every expression node carries a `ToyType*`.
-   - Subscript and field access check bounds/field names at parse time.
-3. **Codegen** (`parse.c` → `CfreeCg`) — single-pass lowering:
-   - Globals → `cfree_cg_push_global` + `cfree_cg_store`.
-   - Locals → `cfree_cg_local_slot` + `cfree_cg_push_local`.
-   - Records/arrays → `cfree_cg_memcpy` or `cfree_cg_memset`.
-   - Control flow → scopes and `cfree_cg_branch_true` / `cfree_cg_jump`.
-   - Function calls → push callee global, push args, `cfree_cg_call`.
-
-### Entry point
-
-```c
-/* lang/toy/toy.h */
-#include <cfree.h>
-
-int cfree_toy_compile(CfreeCompiler*, const CfreeCompileOptions*,
-                      const CfreeBytesInput* input, CfreeObjBuilder* out);
-```
-
-This matches the signature planned for `cfree_register_frontend`.
-
-## Driver registration (Phase 1.5)
-
-### New public entry
-
-```c
-/* include/cfree.h, near the CfreeLanguage enum */
-typedef int (*CfreeCompileFn)(CfreeCompiler*, const CfreeCompileOptions*,
-                              const CfreeBytesInput*, CfreeObjBuilder* out);
-
-/* Register a frontend for a language tag. Overwrites any prior registration
-   for that tag. Returns 0 on success, 1 on OOM or bad args.
-   */
-int cfree_register_frontend(CfreeCompiler*, CfreeLanguage, CfreeCompileFn);
-```
-
-`CfreeLanguage` grows:
-
-```c
-typedef enum CfreeLanguage {
-  CFREE_LANG_C = 0,
-  CFREE_LANG_ASM = 1,
-  CFREE_LANG_TOY = 2, /* new */
-} CfreeLanguage;
-```
-
-### Driver changes
-
-`driver/main.c` gains a registration hook invoked at tool startup:
-
-```c
-static void driver_register_frontends(CfreeCompiler* c) {
-  cfree_register_frontend(c, CFREE_LANG_ASM, internal_compile_asm);
-  cfree_register_frontend(c, CFREE_LANG_C, internal_compile_c);
-  cfree_register_frontend(c, CFREE_LANG_TOY, cfree_toy_compile);
-}
-```
-
-`cfree_compile_obj` in `src/api/pipeline.c` changes from a hard-coded switch
-to a table dispatch:
-
-```c
-static void compile_into(Compiler* c, const CfreeCompileOptions* opts,
-                         const CfreeBytesInput* input, ObjBuilder* ob) {
-  CfreeCompileFn fn = compiler_get_frontend(c, input->lang);
-  if (fn) {
-    CfreeObjBuilder* pub_ob = ob;
-    int rc = fn(c, opts, input, pub_ob);
-    if (rc != 0) panic(...);
-    return;
-  }
-  /* fallback for unknown language */
-  panic(...);
-}
-```
-
-`cfree_language_for_path` learns `.toy` → `CFREE_LANG_TOY`.
-
-### Build integration
-
-The Makefile grows a `lang/` target:
-
-```makefile
-# lang/toy/Makefile
-libcfree_toy.a: toy.o lex.o parse.o type.o
-	$(AR) rcs $@ $^
-
-toy.o: toy.c $(CFREE_INCLUDES)
-	$(CC) $(CFLAGS) -I../../include -c $< -o $@
-```
-
-The top-level `Makefile` adds `lang/toy/libcfree_toy.a` to `LIBCFREE_OBJS` (or
-links it into `libcfree.a` if we decide frontends are part of the core library).
-Initially we keep it as a separate static archive that the driver links against.
-
-## Migration path for C (Phase 3, after toy proves the seam)
-
-Once `lang/toy/` compiles and links end-to-end, we can migrate the C frontend:
-
-1. Move `src/parse/`, `src/pp/`, `src/lex/`, `src/decl/` into `lang/c/`.
-2. Rename internal `parse_c` to `cfree_c_compile` with the `CfreeCompileFn`
-   signature.
-3. Build an **internal adapter layer** that translates C `Type*` into
-   `CfreeCgType*` before calling the public `CfreeCg` methods. Initially this
-   adapter can live in `lang/c/cg_adapter.c`.
-4. The internal `CG` layer (`src/cg/cg.h`) is either:
-   - retired and replaced by the public `CfreeCg` implementation, or
-   - kept as a private fast-path for the C frontend if the adapter overhead is
-     unacceptable.
-5. Assembly remains in core (`src/api/pipeline.c` or a thin `lang/asm/`
-   wrapper) because it bypasses CG entirely and talks to `MCEmitter`.
-
-## Inline assembly
-
-Inline assembly stays internal to the C frontend for now. The public `CfreeCg`
-API can grow an `asm_block` method later, but it is not required for the toy
-language. When it arrives, the signature will mirror the internal
-`CGTarget.asm_block` but use `CfreeCgValue` handles and `const char*` strings
-instead of internal `Operand` / `Sym`.
-
-## Open questions / TBD
-
-1. **String interning (`Sym`)**: The public API uses `const char*` everywhere.
-   The adapter must map names to internal `Sym` ids efficiently (a temporary
-   hash map inside `CfreeCg` is fine for v1).
-2. **Panic boundary**: Every public `cfree_cg_*` call is a thin wrapper that
-   saves/restores `c->panic` around the internal work, exactly like
-   `cfree_compile_obj` today.
-3. **Optimization wrapper**: How does `opt_level` reach the public CG? The
-   `CfreeCompileOptions` already carries `opt_level`; the `CfreeCg` constructor
-   can wrap the underlying `CGTarget` with `opt_cgtarget_new` internally.
-4. **Debug info**: Toy v1 skips debug info. When we add it, the public API will
-   expose `CfreeDebugBuilder` handles that frontends populate with
-   language-neutral type and location records.
-5. **Memory ownership of `CfreeCgType`**: Types should be arena-allocated from
-   the compiler's scratch arena and valid until `cfree_cg_free`. The public
-   surface should not require the frontend to free them.
-6. **Variadics / atomics**: These stay in the `CfreeCg` API but are
-   optional for the toy language. They are needed when C is lifted later.
-7. **ELF symbol visibility**: The toy frontend emits globals as
-   `SB_GLOBAL`/`SK_OBJ` by default. No linkage modifiers in v1.
-
-## Acceptance criteria for toy v1
-
-- [ ] `include/cfree/cg.h` exists and compiles.
-- [ ] `src/api/cg.c` implements the public API over existing `CGTarget`.
-- [ ] `lang/toy/` compiles to `libcfree_toy.a` using only public headers.
-- [ ] `driver/cc.c` (or a new `driver/toyc.c`) can compile `.toy` files:
-      `cfree toyc -c hello.toy -o hello.o`
-- [ ] `cfree run` can JIT a `.toy` file and execute `main()`.
-- [ ] No changes to the C frontend or internal `src/cg/cg.h` API surface.
-- [ ] Existing test suite (`make test-asm test-lex test-parse test-pp test-cg
-      test-link test-elf`) passes unchanged.
diff --git a/doc/LOCALS.md b/doc/LOCALS.md
@@ -1,127 +0,0 @@
-# LocalId Design
-
-## Goal
-
-Make source locals independent from concrete frame slots.
-
-Today many frontend paths treat a local as a `FrameSlot`. That makes every
-local memory-backed even when CG is recording to `opt_cgtarget`, where
-`CGTarget.virtual_regs` already means CG can mint unbounded virtual registers
-and leave physical allocation to opt.
-
-The new model introduces `LocalId` as the source-local identity. A `LocalId` is
-the mutable lvalue for a local variable. Its storage policy is chosen once at
-allocation time from the active `CGTarget`; it does not transition later.
-
-## Storage Policy
-
-Storage policy is a function of `CGTarget.virtual_regs`:
-
-- `virtual_regs != 0`: eligible locals are virtual-register locals.
-- `virtual_regs == 0`: locals are frame-slot locals.
-
-This keeps `-O0` single-pass. Direct machine CG never needs source-local
-liveness, local register pressure handling, or control-flow join repair.
-It continues to use frame slots for locals and the existing expression vstack
-spill path for temporary register pressure.
-
-The opt path records mutable local values in virtual registers. Register
-pressure and frame placement are handled later by opt/lowering/regalloc, where
-CFG and liveness information exist.
-
-## LocalId Shape
-
-Internally, a local needs roughly:
-
-```c
-typedef enum LocalStorageKind {
-  LOCAL_STORAGE_FRAME,
-  LOCAL_STORAGE_VREG,
-} LocalStorageKind;
-
-typedef struct Local {
-  CfreeCgTypeId type;
-  CfreeSym name;
-  uint32_t flags;
-  LocalStorageKind storage;
-  FrameSlot slot; /* valid for LOCAL_STORAGE_FRAME */
-  Reg vreg;       /* valid for LOCAL_STORAGE_VREG */
-} Local;
-```
-
-`LocalId` indexes this table. It is not a `FrameSlot`, and `OPK_LOCAL` should
-remain the concrete frame-memory operand kind. If a local is frame-backed,
-pushing it produces the same lvalue shape that `cfree_cg_push_local` produces
-today. If a local is virtual-register-backed, pushing it produces a mutable
-register local lvalue.
-
-## Lvalue Semantics
-
-`LocalId` is the source lvalue. The backing store decides how load, store, and
-address operations lower:
-
-- frame local load/store: operate on the frame slot.
-- virtual-register local load: read the current virtual register value.
-- virtual-register local store: copy/define the local's virtual register value.
-- address of a frame local: address the frame slot.
-- address of a virtual-register local: unsupported unless the frontend has
-  already selected frame storage for that local.
-
-Virtual-register locals are mutable pseudo-locals, not SSA values at the public
-CG layer. If opt later wants SSA, it must build SSA from this mutable local
-stream. `LocalId` should not force direct CG to implement phi insertion.
-
-## Addressable Locals
-
-A local that can require an address must be frame-backed.
-Since `-O0` must stay
-single-pass and this design has no storage transition, the frontend/API needs a
-declaration-time way to request addressable storage.
-
-Use the existing slot-style attributes as the semantic source:
-
-- `CFREE_CG_SLOT_ADDRESS_TAKEN`: force frame storage.
-- aggregate, VLA, `alloca`-like, volatile, ABI-required memory objects, and
-  compiler temporaries that need an address: force frame storage.
-- scalar locals without addressable requirements may use virtual-register
-  storage when `CGTarget.virtual_regs` is set.
-
-This may be conservative. Correctness is more important than promoting every
-possible scalar. Later analysis can mark more locals register-eligible before
-creating them, but CG itself should not need to discover that mid-stream.
-
-## Public/API Direction
-
-The API should grow local handles distinct from slot handles:
-
-```c
-CfreeCgLocal cfree_cg_local(CfreeCg*, CfreeCgTypeId type,
-                            CfreeCgSlotAttrs attrs);
-void cfree_cg_push_local_id(CfreeCg*, CfreeCgLocal local);
-```
-
-Compatibility can keep `cfree_cg_local_slot` and `cfree_cg_push_local` as the
-explicit frame-slot path. C frontend migration should move source automatic
-variables to `LocalId`; explicit stack objects and address-required temporaries
-can keep using slots.
-
-## Non-Goals
-
-- No `REG -> FRAME` local transition in direct CG.
-- No O0 local spilling for register-backed source locals.
-- No phi or join repair in direct CG.
-- No change to `OPK_LOCAL` meaning; it remains concrete frame memory.
-- No guarantee that every scalar local becomes a virtual register. Addressable
-  or otherwise memory-required locals stay frame-backed.
-
-## Implementation Order
-
-1. Add the internal `LocalId` table and public/internal handle plumbing.
-2. Route local allocation through the storage policy above.
-3. Teach push/load/store/address paths to handle frame-backed and vreg-backed
-   locals.
-4. Update the C parser adapter so normal automatic variables use `LocalId`
-   while explicit frame objects remain slots.
-5. Keep O0 behavior equivalent by verifying `virtual_regs == 0` still allocates
-   frame-backed locals only.
-6. Add opt-path tests that scalar locals record as virtual-register locals and
-   address-required locals still record frame slots.
diff --git a/doc/OPT_REGS_CALL_PLAN.md b/doc/OPT_REGS_CALL_PLAN.md
@@ -1,592 +0,0 @@
-# OPT Register And Call Constraint Plan
-
-This plan expands the O1 register-allocation contract so the optimizer can use
-nearly all target registers safely. It combines two structural changes:
-
-1. targets expose a richer physical register file instead of a small pre-filtered
-   allocable pool; and
-2. calls are lowered into opt-visible fixed-register, stack-argument, and clobber
-   constraints before register allocation.
-
-The goal is not to move to a full target machine IR immediately. The goal is to
-make the current O1 path honest about target constraints while keeping replay and
-backend emission intact enough to migrate one architecture at a time.
-
-## Current Status
-
-The correctness foundation for register preservation and the first planned-call
-replay path are implemented. Targets now expose descriptive physical-register
-metadata, per-call clobber masks, return-register masks, callee-save masks, and
-call plans. O1 records each call plan during `machinize`, builds its current
-hard-register tables from `CGPhysRegInfo`, uses target save/use costs in
-allocation scoring, and preserves hard-assigned live-across-call values by
-intersecting the assigned register with the planned call's clobber mask.
-Post-RA hard-register liveness uses the same call-specific clobber mask.
-
-For supported call plans, O1 now replays calls by materializing
-arguments with a local parallel-copy resolver, invoking backend stack-argument
-and branch-only call-plan hooks, and extracting non-tail returns from fixed
-return registers.
-Address-valued call moves cover byval/indirect arguments and
-hidden sret destination pointers. Tail calls use the same setup and planned
-branch path, with no return extraction. The x64, AArch64, and RV64 backends implement
-`store_call_arg` for outgoing stack slots and `emit_call_plan` for the call
-branch.
-
-What this closes:
-
-- the register-preservation correctness issue for values live across calls;
-- target-provided physical-register metadata as the source for O1 register
-  tables;
-- call-plan construction for scalar integer/FP/direct/indirect/byval/sret-shaped
-  calls in the current descriptor model;
-- conservative allocation scoring that can choose caller-saved registers when
-  rewrite can preserve them, while still preferring callee-saved registers for
-  call-crossing values.
-
-What remains open:
-
-- call setup/return extraction are represented by call-plan aux data rather
-  than separate first-class IR ops;
-- target `get_phys_regs` tables expose broader O1 pools, and incoming
-  parameter functions can now allocate ABI argument/return registers with
-  opt-side constraints for sequential parameter-copy hazards;
-- direct CG still uses legacy allocation/call hooks;
-- code-shape probes remain to be added.
-
-In phase terms: Phase 1 and Phase 2 are done, Phase 3 is implemented through
-call-plan aux visibility plus planned replay for supported call shapes, Phase 4
-is implemented for register, stack, sret, tail-call, and return moves,
-Phase 5 is implemented for call setup/replay, and Phase 6 remains open.
-
-## Planned Call Replay Boundary
-
-The legacy backend `call` hook is no longer used by O1 replay. Calls that reach
-optimized replay must have a supported plan; unsupported planned shapes fail
-diagnostically instead of falling back to sequential backend lowering. Direct CG
-continues to use the legacy `call` hook while it is migrated separately.
-
-Planned replay is used only when all of the following are true:
-
-- the call has a valid `CGCallPlan`;
-- the backend provides `emit_call_plan`;
-- every stack argument destination has backend `store_call_arg` support;
-- every offset/address-valued argument source has backend `load_call_arg`
-  support;
-- every offset aggregate return store has backend `store_call_ret` support;
-- every return destination is a register, local, or indirect operand.
-
-For those calls, O1 owns the setup and extraction sequence:
-
-- source operands are rewritten to hard registers or spill slots;
-- live-across-call hard registers are saved before argument setup;
-- argument moves into ABI registers and outgoing stack slots are resolved as a
-  local parallel copy;
-- indirect callees that would be overwritten by argument setup are copied to a
-  target-provided scratch register first;
-- the backend emits only required call metadata and the branch through
-  `emit_call_plan`;
-- non-tail return registers are copied or stored into their planned destinations;
-- tail calls stop after the planned branch and have no return extraction.
-
-The legacy `call` path is still required for:
-
-- **direct CG**: direct codegen still uses the old backend allocation and call
-  hooks while O1 migrates first.
-
-This boundary lets Phase 3/4 tests exercise register argument permutation,
-outgoing stack arguments, sret hidden pointers, indirect-callee clobber hazards,
-call-specific clobber preservation, and return extraction without broadening
-the register file across legacy call lowering.
-
-## Current Problem
-
-The current `CGTarget` contract exposes:
-
-- `get_allocable_regs`;
-- `get_scratch_regs`;
-- `is_caller_saved`;
-- `plan_hard_regs` / `reserve_hard_regs`;
-- `call_stack_size`.
-
-That contract is too coarse for optimizer-driven allocation.
-A target has to hide
-registers that are perfectly usable in most instructions because they are unsafe
-for some call-lowering or helper-lowering cases.
-
-Examples:
-
-- ABI argument and return registers are useful for short-lived values, but current
-  call emitters copy arguments sequentially into those same registers.
-- scratch registers are hidden globally even when only a small subset of target
-  operations need them.
-- callee-saved registers are cheap for values live across calls but expensive for
-  one-use temporaries in leaf or tiny functions.
-- the allocator can avoid caller-saved registers for call-crossing values, but it
-  has no target-provided save/restore cost or call-specific clobber masks.
-
-The result is conservative and correct, but it forces unnecessary prologue and
-epilogue traffic in small O1 functions.
-
-## Design Goals
-
-- Keep O1 fast and range-based.
-- Let each target expose all general allocatable physical registers, excluding
-  only permanently reserved registers such as stack pointer, frame pointer when
-  fixed, zero registers, platform registers, and architectural non-registers.
-- Make ABI argument, return, and call-clobber effects explicit before liveness and
-  allocation.
-- Make call argument moves parallel rather than sequential.
-- Preserve the existing backend ownership of final prologue, epilogue, frame
-  layout, and machine-code emission during this migration.
-- Avoid target-specific register knowledge in opt beyond data supplied by the
-  target.
-- Keep direct CG usable while opt grows the richer contract.
-
-Non-goals for this plan:
-
-- full machine IR;
-- global coalescing;
-- live-range splitting;
-- instruction scheduling;
-- target-specific peephole rewrites beyond the call boundary.
-
-## New Target Register Contract
-
-Add a register-file description that replaces the allocation-policy meaning of
-`get_allocable_regs`. The old hook can remain as a compatibility wrapper during
-migration.
-
-```c
-typedef enum CGPhysRegFlag {
-  CG_REG_ALLOCABLE = 1u << 0,
-  CG_REG_CALLER_SAVED = 1u << 1,
-  CG_REG_CALLEE_SAVED = 1u << 2,
-  CG_REG_ARG = 1u << 3,
-  CG_REG_RET = 1u << 4,
-  CG_REG_TEMP_PREFERRED = 1u << 5,
-  CG_REG_PLATFORM = 1u << 6,
-  CG_REG_RESERVED = 1u << 7,
-} CGPhysRegFlag;
-
-typedef struct CGPhysRegInfo {
-  Reg reg;
-  u8 cls;        /* RegClass */
-  u8 abi_index;  /* arg/ret order when applicable, otherwise 0xff */
-  u16 flags;     /* CGPhysRegFlag */
-  u16 save_cost; /* relative prologue/epilogue cost if callee-saved */
-  u16 use_cost;  /* relative preference cost for ordinary allocation */
-} CGPhysRegInfo;
-```
-
-New target hooks:
-
-```c
-void (*get_phys_regs)(CGTarget*, RegClass, const CGPhysRegInfo** out,
-                      u32* nregs);
-u32 (*call_clobber_mask)(CGTarget*, const CGCallDesc*, RegClass);
-u32 (*return_reg_mask)(CGTarget*, const ABIFuncInfo*, RegClass);
-u32 (*callee_save_mask)(CGTarget*, RegClass);
-```
-
-The exact masks may need to grow beyond `u32` if future architectures expose
-larger register files, but `u32` matches the current register numbering model and
-keeps this step consistent with existing code.
-
-Target policy:
-
-- AArch64 should expose normal integer allocation candidates from `x0-x28`,
-  excluding `sp`, `x29`, `x30`, and platform-reserved registers as needed. `x16`
-  and `x17` can be marked temp-preferred or reserved until helper scratch
-  clobbers are modeled.
-- AArch64 FP should expose `v0-v31`, reserving only registers that target helper
-  expansion still requires globally.
-- x64 should expose caller-saved and callee-saved GPRs except fixed `rsp/rbp` and
-  any helper-reserved registers still hidden during migration. It should expose
-  XMM registers with SysV all-caller-saved metadata.
-- RV64 should expose `a*`, `t*`, `s*`, and `f*` equivalents, excluding `sp`,
-  fixed `s0` when used as frame pointer, `ra` unless explicitly modeled, `gp`,
-  `tp`, and zero.
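As a rough illustration of how a consumer might fold such a register table into per-class bitmasks, here is a minimal sketch. The `CGPhysRegFlag` values and `CGPhysRegInfo` fields are copied from the listing above; the integer typedefs, `RegMasks`, `build_reg_masks`, `demo_regs`, and the five-register layout are hypothetical and do not describe any real target.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: these aliases mirror the project's short integer names. */
typedef uint8_t u8;
typedef uint16_t u16;
typedef uint32_t u32;
typedef u32 Reg; /* assumption: a register is a small index usable as a bit position */

typedef enum CGPhysRegFlag {
  CG_REG_ALLOCABLE = 1u << 0,
  CG_REG_CALLER_SAVED = 1u << 1,
  CG_REG_CALLEE_SAVED = 1u << 2,
  CG_REG_ARG = 1u << 3,
  CG_REG_RET = 1u << 4,
  CG_REG_TEMP_PREFERRED = 1u << 5,
  CG_REG_PLATFORM = 1u << 6,
  CG_REG_RESERVED = 1u << 7,
} CGPhysRegFlag;

typedef struct CGPhysRegInfo {
  Reg reg;
  u8 cls;
  u8 abi_index;
  u16 flags;
  u16 save_cost;
  u16 use_cost;
} CGPhysRegInfo;

/* Hypothetical per-class summary built from one register table. */
typedef struct RegMasks {
  u32 allocable, caller_saved, callee_saved, arg, ret, reserved;
} RegMasks;

static RegMasks build_reg_masks(const CGPhysRegInfo* regs, u32 nregs) {
  RegMasks m = {0};
  for (u32 i = 0; i < nregs; i++) {
    const CGPhysRegInfo* r = &regs[i];
    assert(r->reg < 32); /* masks are u32, so only 32 registers per class fit */
    u32 bit = 1u << r->reg;
    if (r->flags & CG_REG_RESERVED) {
      /* Reserved registers never contribute to the allocation masks. */
      m.reserved |= bit;
      continue;
    }
    if (r->flags & CG_REG_ALLOCABLE) m.allocable |= bit;
    if (r->flags & CG_REG_CALLER_SAVED) m.caller_saved |= bit;
    if (r->flags & CG_REG_CALLEE_SAVED) m.callee_saved |= bit;
    if (r->flags & CG_REG_ARG) m.arg |= bit;
    if (r->flags & CG_REG_RET) m.ret |= bit;
  }
  return m;
}

/* Made-up integer-class table: two argument registers, one temp,
   one callee-saved register, one reserved register. */
static const CGPhysRegInfo demo_regs[] = {
  {0, 0, 0,    CG_REG_ALLOCABLE | CG_REG_CALLER_SAVED | CG_REG_ARG | CG_REG_RET, 0, 1},
  {1, 0, 1,    CG_REG_ALLOCABLE | CG_REG_CALLER_SAVED | CG_REG_ARG, 0, 1},
  {2, 0, 0xff, CG_REG_ALLOCABLE | CG_REG_CALLER_SAVED | CG_REG_TEMP_PREFERRED, 0, 0},
  {3, 0, 0xff, CG_REG_ALLOCABLE | CG_REG_CALLEE_SAVED, 2, 1},
  {4, 0, 0xff, CG_REG_RESERVED, 0, 0},
};
```

Keeping the reserved check first means a table entry can carry descriptive flags for a register that is still hidden during migration without it leaking into the allocable set.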
-
-## Opt Register Policy
-
-`opt_machinize` should build per-class register tables from `CGPhysRegInfo`:
-
-- physical register list;
-- caller-saved mask;
-- callee-saved mask;
-- reserved mask;
-- argument mask;
-- return mask;
-- save/use costs.
-
-The O1 allocator should keep its interval assignment model, but candidate
-register scoring should change from pure target order to a target-informed cost:
-
-```text
-base use cost
-+ callee-save open cost if this function has not already used that reg
-+ caller-save crossing cost if value is live across calls
-+ fixed/tied penalty rules
-+ spill/reload alternative cost
-```
-
-Hard requirements:
-
-- values live across a call may use caller-saved registers only if rewrite can
-  preserve them at that call;
-- non-call-crossing values should generally prefer caller-saved registers to
-  avoid function-wide callee-save traffic;
-- once a callee-saved register is already used in the function, later allocations
-  may treat its save cost as already paid;
-- tied/fixed registers from ABI lowering and inline asm remain mandatory.
-
-This can land without a global coalescer. It gives the current allocator enough
-information to make better choices while preserving its O1 compile-time shape.
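The first three terms of the cost formula above could be sketched roughly as follows. This is an illustration only: `RegCandidate`, `ScoreCtx`, `score_candidate`, `pick_cheapest`, and the per-call `crossing_cost` weight are hypothetical names and parameters, and the fixed/tied penalty and spill/reload terms are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one candidate physical register. */
typedef struct RegCandidate {
  uint32_t reg;       /* register index, also used as a bit position */
  int caller_saved;
  int callee_saved;
  uint16_t save_cost; /* prologue/epilogue cost if callee-saved */
  uint16_t use_cost;  /* base preference cost */
} RegCandidate;

/* Hypothetical allocation context for the value being placed. */
typedef struct ScoreCtx {
  uint32_t used_callee_saved; /* callee-saved regs this function already uses */
  uint32_t calls_crossed;     /* number of calls the value is live across */
  uint32_t crossing_cost;     /* assumed save/restore cost per crossed call */
} ScoreCtx;

static uint32_t score_candidate(const RegCandidate* r, const ScoreCtx* ctx) {
  assert(r->reg < 32);
  uint32_t cost = r->use_cost;
  /* The callee-save cost is paid once per function, when the reg is first used. */
  if (r->callee_saved && !(ctx->used_callee_saved & (1u << r->reg)))
    cost += r->save_cost;
  /* A caller-saved reg must be preserved around every call it crosses. */
  if (r->caller_saved)
    cost += ctx->calls_crossed * ctx->crossing_cost;
  return cost;
}

/* Returns the index of the cheapest candidate; ties go to the earliest entry,
   which keeps target order as the tie-breaker. */
static uint32_t pick_cheapest(const RegCandidate* c, uint32_t n,
                              const ScoreCtx* ctx) {
  assert(n > 0);
  uint32_t best = 0;
  uint32_t best_cost = score_candidate(&c[0], ctx);
  for (uint32_t i = 1; i < n; i++) {
    uint32_t cost = score_candidate(&c[i], ctx);
    if (cost < best_cost) {
      best = i;
      best_cost = cost;
    }
  }
  return best;
}
```

With these weights, a value that crosses no calls lands in the caller-saved register, while a value crossing several calls tips over to the callee-saved one, which matches the preference rules listed above.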
-
-## Opt-Visible Call Plan
-
-Add a target hook that converts a `CGCallDesc` into a call plan before liveness
-and allocation:
-
-```c
-typedef enum CGCallPlanLocKind {
-  CG_CALL_PLAN_REG,
-  CG_CALL_PLAN_STACK,
-  CG_CALL_PLAN_IGNORE,
-} CGCallPlanLocKind;
-
-typedef enum CGCallPlanSrcKind {
-  CG_CALL_PLAN_SRC_VALUE,
-  CG_CALL_PLAN_SRC_ADDR,
-} CGCallPlanSrcKind;
-
-typedef struct CGCallPlanMove {
-  Operand src;      /* virtual value, local, indirect, imm, or global */
-  u8 dst_kind;      /* CGCallPlanLocKind */
-  u8 src_kind;      /* CGCallPlanSrcKind: value vs address materialization */
-  u8 cls;           /* RegClass for register destinations */
-  Reg dst_reg;      /* valid for CG_CALL_PLAN_REG */
-  u32 src_offset;   /* byte offset within aggregate source */
-  u32 stack_offset; /* valid for CG_CALL_PLAN_STACK */
-  MemAccess mem;    /* width/sign for loads/stores */
-} CGCallPlanMove;
-
-typedef struct CGCallPlanRet {
-  Operand dst;    /* virtual destination in current IR */
-  u8 cls;
-  Reg src_reg;
-  u32 dst_offset; /* byte offset within aggregate destination */
-  MemAccess mem;
-} CGCallPlanRet;
-
-typedef struct CGCallPlan {
-  CGCallPlanMove* args;
-  u32 nargs;
-  CGCallPlanRet* rets;
-  u32 nrets;
-  Operand callee;
-  u32 clobber_mask[OPT_REG_CLASSES];
-  u32 return_mask[OPT_REG_CLASSES];
-  u32 stack_arg_size;
-  u8 variadic_fp_count;
-  u8 is_variadic;
-  u8 has_sret;
-} CGCallPlan;
-```
-
-Target hook:
-
-```c
-void (*plan_call)(CGTarget*, const CGCallDesc*, CGCallPlan* out);
-```
-
-The target remains the authority for ABI classification and stack layout. Opt
-becomes the authority for scheduling the moves and preserving live values around
-the call.
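To make "scheduling the moves" concrete, the following is a minimal sketch of resolving a set of register-to-register argument moves as one parallel copy, breaking cycles through a scratch register. `Move`, `resolve_parallel_moves`, `apply_moves`, `MAX_MOVES`, and the plain `int` register numbering are assumptions for illustration; the stack slots, immediates, and memory sources that a real call plan carries are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MOVES 16 /* hypothetical bound, enough for a sketch */

typedef struct Move {
  int src;
  int dst;
} Move;

/* Turns a parallel copy into an equivalent sequential one.
   Preconditions: destinations are pairwise distinct, `scratch` is neither a
   source nor a destination, and `out` has room for 2 * n moves. */
static size_t resolve_parallel_moves(const Move* in, size_t n, int scratch,
                                     Move* out) {
  Move pending[MAX_MOVES];
  size_t np = 0, nout = 0;
  assert(n <= MAX_MOVES);
  for (size_t i = 0; i < n; i++) {
    assert(in[i].src != scratch && in[i].dst != scratch);
    if (in[i].src != in[i].dst) pending[np++] = in[i]; /* drop no-op moves */
  }
  while (np > 0) {
    int progressed = 0;
    for (size_t i = 0; i < np; i++) {
      /* A move is safe to emit once no other pending move reads its dst. */
      int blocked = 0;
      for (size_t j = 0; j < np; j++) {
        if (j != i && pending[j].src == pending[i].dst) {
          blocked = 1;
          break;
        }
      }
      if (blocked) continue;
      out[nout++] = pending[i];
      pending[i] = pending[--np];
      progressed = 1;
      break;
    }
    if (progressed) continue;
    /* Every remaining dst is still read, so only cycles remain. Park one
       source in the scratch register and redirect its readers there. */
    int saved = pending[0].src;
    out[nout++] = (Move){saved, scratch};
    for (size_t j = 0; j < np; j++) {
      if (pending[j].src == saved) pending[j].src = scratch;
    }
  }
  return nout;
}

/* Executes a move sequence on a small register file, for checking results. */
static void apply_moves(int* regs, const Move* seq, size_t n) {
  for (size_t i = 0; i < n; i++) regs[seq[i].dst] = regs[seq[i].src];
}
```

The quadratic scan is acceptable for a handful of argument registers per call; a worklist keyed by destination would be the natural replacement if move counts grew.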
-
-Lowering shape:
-
-```text
-CALL_SETUP_BEGIN
-parallel copies: virtual/local/imm -> ABI arg regs or outgoing stack slots
-CALL target, implicit uses arg regs, implicit defs return regs,
-     implicit clobbers call clobber mask
-parallel copies: ABI return regs -> virtual/local destinations
-CALL_SETUP_END
-```
-
-This can be represented either as new IR ops or as expanded existing `IR_COPY`,
-`IR_STORE`, and `IR_CALL` ops with call aux data carrying implicit masks. The new
-IR-op route is clearer and easier to test.
-
-## Parallel Move Resolver
-
-Call argument setup must not use the current sequential backend copy model once
-ABI registers become allocable.
-
-Add a generic opt pass for parallel copies:
-
-- inputs are `(src operand, dst operand)` pairs;
-- destinations may be physical registers or stack argument slots;
-- sources may be virtual/hard registers, locals, indirect operands, immediates,
-  or globals;
-- cycles are broken with a target-provided temporary register or spill slot;
-- memory-to-memory copies route through a temporary;
-- stack stores are ordered after any register loads that depend on stack source
-  addresses they could overwrite.
-
-For O1, this resolver can be local to call setup and return extraction. It does
-not need to become a general coalescing pass in the first implementation.
-
-## Rewrite And Preservation
-
-The current rewrite inserts stores/loads for hard-assigned caller-saved values
-known to be live across calls. With call plans, this should become:
-
-- for each call, compute values live across the call;
-- intersect their assigned hard registers with the call plan's clobber mask;
-- exclude values defined by the call return;
-- emit save/restore only for that call-specific intersection.
-
-This keeps preservation precise for:
-
-- direct calls;
-- indirect calls;
-- varargs calls;
-- target-specific helper calls if they use a different clobber mask later.
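The per-call intersection described in the first list above can be sketched as a small mask computation. `LiveValue` and `call_save_mask` are hypothetical names for illustration; the real rewrite works on intervals and per-class masks rather than a flat array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of one hard-assigned value at a given call site. */
typedef struct LiveValue {
  uint32_t hard_reg;    /* assigned physical register, used as a bit index */
  int live_across_call; /* live both before and after this call */
  int defined_by_call;  /* receives the call's return value */
} LiveValue;

/* Returns the registers that need save/restore around one specific call:
   live-across values whose register the call clobbers, excluding values
   that the call itself defines. */
static uint32_t call_save_mask(const LiveValue* vals, size_t n,
                               uint32_t clobber_mask) {
  uint32_t mask = 0;
  for (size_t i = 0; i < n; i++) {
    assert(vals[i].hard_reg < 32);
    if (!vals[i].live_across_call) continue;
    if (vals[i].defined_by_call) continue;
    mask |= 1u << vals[i].hard_reg;
  }
  return mask & clobber_mask;
}
```

Because the clobber mask is an argument rather than a fixed caller-saved set, the same routine would cover a helper call with a narrower clobber set without extra cases.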
- -The allocator should still use live-across-call frequency, but correctness should -come from per-call clobber masks in rewrite. - -## Backend Emission Changes - -Backends should gain emission hooks for an already-planned call: - -```c -void (*load_call_arg)(CGTarget*, Operand dst, const CGCallPlanMove*); -void (*store_call_arg)(CGTarget*, const CGCallPlanMove*); -void (*store_call_ret)(CGTarget*, const CGCallPlanRet*, Operand src); -void (*emit_call_plan)(CGTarget*, const CGCallPlan*); -``` - -For the current transition, these hooks assume register arguments have already -been materialized by opt and stack arguments are written one planned move at a -time through `store_call_arg`. `load_call_arg` and `store_call_ret` are the -offset-aware load/store hooks for aggregate parts and address-valued moves. -`emit_call_plan` only emits: - -- required varargs metadata such as x64 `AL`; -- direct or indirect call branch; -- target-specific call relocation; -- no sequential argument copies; -- no return copies. - -Direct CG can keep using the existing `call` hook until it is migrated or until a -thin wrapper builds and emits a call plan internally. - -## Migration Phases - -### Phase 1 - Register Description Without Behavior Change - -Status: done. `CGPhysRegInfo` and `get_phys_regs` exist, x64/AArch64/RV64 -provide current-pool metadata, and `opt_machinize` consumes it with legacy -fallbacks. Focused opt tests cover metadata consumption. - -- Add `CGPhysRegInfo` and `get_phys_regs`. -- Implement it for x64, AArch64, and RV64 using the current exposed pools first. -- Build opt's current hard-reg tables from the richer description. -- Keep `get_allocable_regs`, `get_scratch_regs`, and `is_caller_saved` as - wrappers. -- Add tests that inspect target register metadata for each architecture. - -Expected result: no codegen behavior change. - -### Phase 2 - Call Plan Construction - -Status: done. 
`CGCallPlan`, `plan_call`, call clobber masks, return masks, and -callee-save masks exist for the three native backends. O1 attaches plans during -`machinize`, and the opt tests cover plan attachment plus downstream planned -replay/fallback behavior. - -- Add `CGCallPlan` and `plan_call`. -- Implement call planning for simple direct scalar integer and FP args/returns on - all three architectures. -- Keep backend `call` emission unchanged. -- Add dump tests that verify planned arg regs, return regs, clobber masks, and - outgoing stack size. - -Expected result: opt can see call constraints, but does not allocate differently -yet. - -### Phase 3 - Opt IR Call Constraints - -Status: implemented for the current aux-data representation. Calls carry plan -aux data before liveness/allocation. Liveness, rewrite, hard-register DCE, and -hard-register liveness inspect plan operands for supported planned calls, while -rewrite uses the call-specific clobber mask to save live-across-call hard values -before argument setup. The implementation keeps explicit setup/call/return-copy -IR ops as a possible later cleanup rather than a prerequisite. - -- done: attach plan aux data to `IR_CALL` during `machinize`; -- done: teach liveness/range building to use planned source and destination - operands when planned replay is enabled; -- done: model call clobbers through the call-specific plan mask; -- done: keep the legacy `call` path behind a fallback for unsupported call-plan - shapes; -- still optional: split setup/call/return extraction into separate IR ops if the - aux-data representation becomes too opaque for later passes. - -Expected result: correctness coverage for arg-register hazards before the -allocator starts using those registers widely. - -### Phase 4 - Parallel Copy Resolver - -Status: implemented. 
O1 replay uses a local -parallel-copy resolver for planned call setup and return extraction, including -register-register cycles, local/indirect loads, address-valued moves, -immediates, globals, register and outgoing stack destinations, local/indirect -return destinations, and indirect callees that occupy a destination argument -register. Tail-call plans use the same setup and planned branch path, then skip -return extraction. - -- done: implement local parallel move resolution for register call setup and - return extraction; -- done: support register-register cycles, local/indirect loads, - address-valued moves, immediates, globals, outgoing stack stores, and - local/indirect return stores; -- done: use target-provided scratch registers to break cycles and preserve - indirect callees; -- done: add red-green tests for argument permutation cycles, indirect callees in - argument registers, stack-argument replay, and address-valued args; -- done: support `CG_CALL_PLAN_STACK` materialization directly in opt; -- done: add return-register collision, stack-source hazard, and tail-call replay - tests. - -Expected result: ABI arg and return registers can be made allocable safely. - -### Phase 5 - Broaden Register Exposure - -Status: implemented for call setup and incoming scalar parameter setup. O1 has -target-informed scoring and per-call preservation, and the native target -phys-reg tables now expose broader O1 pools. Known backend helper scratch -registers remain hidden. ABI arg/return registers are available to O1. Incoming -parameter functions keep those -registers allocable, with opt forbidding earlier parameter values from being -assigned to later incoming ABI registers that the backend still copies -sequentially. 
- -- done: expand target `get_phys_regs` tables with guarded caller-saved and ABI - registers for x64, AArch64, and RV64; -- done: update opt scoring to prefer caller-saved regs for non-call-crossing - values and callee-saved regs for call-crossing values; -- done: keep known backend helper scratch registers reserved until their - clobbers are expressed; -- done: remove call-driven ABI-reg suppression for stack and sret call plans; -- done: remove incoming-parameter ABI-reg suppression by modeling parameter - incoming-register clobber hazards in opt allocation constraints; -- done: remove the legacy tail-call fallback ABI-reg suppression by replaying - tail-call setup through call plans; -- Add code-shape tests for direct-call tiny functions and unused-param functions - across x64, AArch64, and RV64. - -Expected result: fewer callee-save prologue/epilogue pairs without sacrificing -call correctness. - -### Phase 6 - Remove Legacy Pool Semantics - -Status: open. Legacy `get_allocable_regs`, `get_scratch_regs`, -`is_caller_saved`, and `call` remain active for direct CG and fallback replay. - -- Convert direct CG to either use `CGPhysRegInfo` or build call plans internally. -- Remove allocation-policy dependence on `get_allocable_regs`. -- Restrict `get_scratch_regs` to legacy direct-CG fallback, then remove it once - backend helper clobbers are modeled. -- Make `reserve_hard_regs` consume actual replay-visible hard registers as it - does today, but derive preservation decisions from the richer register metadata. - -Expected result: one target register contract serves direct CG, opt, and future -O2 allocation. 
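Of the phases above, the Phase 4 resolver is the most algorithmic. Its cycle-breaking rule can be sketched as follows for the register-to-register case only. This is an illustrative sketch, not the resolver in opt: `ToyCopy` and `toy_resolve_parallel` are hypothetical names, copies are executed directly on an array standing in for the register file, destinations are assumed distinct, and the scratch register is assumed not to appear in any copy.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct ToyCopy {
  int dst; /* destination register index */
  int src; /* source register index */
} ToyCopy;

/* Performs the copies as if they all happened at once. Returns the number of
 * sequential moves executed. May rewrite copies[].src while breaking cycles. */
static int toy_resolve_parallel(int* regs, ToyCopy* copies, int n, int scratch) {
  bool done[32] = {false};
  int pending = 0;
  int emitted = 0;
  assert(n <= 32);
  for (int i = 0; i < n; i++) {
    assert(copies[i].dst != scratch && copies[i].src != scratch);
    if (copies[i].dst == copies[i].src) done[i] = true; /* self-copy: no-op */
    else pending++;
  }
  while (pending > 0) {
    bool progress = false;
    for (int i = 0; i < n; i++) {
      if (done[i]) continue;
      /* A copy is blocked while another pending copy still reads its dst. */
      bool blocked = false;
      for (int j = 0; j < n; j++)
        if (j != i && !done[j] && copies[j].src == copies[i].dst) blocked = true;
      if (blocked) continue;
      regs[copies[i].dst] = regs[copies[i].src];
      done[i] = true;
      pending--;
      emitted++;
      progress = true;
    }
    if (!progress) {
      /* Only cycles remain: park one source in scratch and redirect readers. */
      int victim = 0;
      while (done[victim]) victim++;
      int parked = copies[victim].src;
      regs[scratch] = regs[parked];
      emitted++;
      for (int i = 0; i < n; i++)
        if (!done[i] && copies[i].src == parked) copies[i].src = scratch;
    }
  }
  return emitted;
}
```

A three-register rotation costs one extra move through the scratch register; acyclic copy sets need no scratch at all. Memory operands, stack destinations, and the ordering constraints on stack stores are deliberately left out of this sketch.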
- -## Test Plan - -Focused unit tests: - -- done: opt-side target register metadata consumption; -- done: caller-saved live-across-call preservation using per-call masks; -- done: planned-call replay through `emit_call_plan` for register-argument - cycles, stack arguments, address-valued args, sret-shaped plans, - return-register collisions, stack-argument source hazards, and - indirect-callee/argument-register hazards; -- still needed: target register metadata tests per real architecture; -- done: broader real-architecture call-plan layout for scalar, FP, mixed, - sret, variadic, and stack-arg calls; -- still needed: direct call-clobber mask tests per real architecture; -- still needed: code-shape probes after ABI registers are exposed broadly; -- still needed: callee-save reservation/code-shape tests after broadened - allocation. - -Code-shape probes: - -- `int f(int x) { return 42; }`; -- `static int callee(int x) { return x + 1; }` - plus `int caller(int x) { return callee(x) + 2; }`; -- multiple non-call-crossing locals under pressure; -- one value live across a call plus several short-lived call-local values; -- FP argument and return variants. - -Targeted runs: - -```sh -make test-opt -make test-cg-api -make test-toy -make test-aa64-inline -make test-smoke-x64 -make test-smoke-rv64 -``` - -## Risks And Open Questions - -- The current call emitters still contain target-specific scratch assumptions. - Those assumptions must either become call-plan constraints or stay reserved - until later. -- x64 has implicit call metadata for variadic calls (`AL`) and helper scratch use - around memory copies; both need explicit representation. -- AArch64 `x16/x17` and platform register policy differs by OS and relocation - model. The register metadata must be target-OS aware. -- RV64 `ra`, `gp`, `tp`, `s0`, and zero should remain reserved unless the backend - grows explicit support for them. -- Stack argument stores can alias frame or outgoing areas in awkward cases. 
The - call-plan stack area should remain target-owned, with opt only scheduling the - materialization. -- Debug info and unwind data should continue to be backend-owned. Opt only tells - the backend which hard registers are actually live in emitted code. - -## Recommended First Patch Stack - -Completed: - -1. Add `CGPhysRegInfo` plus current-pool metadata for all three targets. -2. Teach `opt_machinize` to consume the new metadata. -3. Add `CGCallPlan` and plan calls without using it for emission. -4. Use call-plan clobber masks for rewrite and post-RA hard-register liveness. -5. Replay call plans in opt, including ABI register setup, outgoing - stack arguments, address-valued byval/indirect/sret moves, and return - extraction. -6. Remove call-driven ABI-reg suppression for stack-argument and sret-shaped - calls. -7. Add call-plan layout/dump tests for real x64/AArch64/RV64 scalar, FP, mixed, - sret, variadic, and stack-arg cases. -8. Add red-green hazard tests for return-register collisions and stack-argument - sources. -9. Remove incoming-parameter ABI-reg suppression with opt-side constraints for - incoming parameter copy hazards. -10. Replay tail-call plans in opt and remove O1's legacy backend `call` - fallback. - -Next patch stack: - -1. Migrate direct CG or wrap it with internal call planning, then remove legacy - pool semantics. - -This order keeps each step testable and avoids mixing API migration, allocation -policy, and call move correctness in one change. diff --git a/doc/RT_CFREERT_CHECKLIST.md b/doc/RT_CFREERT_CHECKLIST.md @@ -1,113 +0,0 @@ -# Building libcfree_rt.a With cfree - -Goal: build the runtime archive with cfree's own `cc`, `as`, and `ar` instead -of clang/llvm-ar. This is separate from stage-2 self-hosting of the main -compiler binary. - -Current focused probe: `aarch64-apple-darwin`. 
This variant is useful because -it covers LP64, int128 declarations, coroutine assembly, Mach-O symbol names, -and the freestanding runtime headers without requiring a system SDK. - -## Probe Command - -```sh -make bin -rm -rf build/rt/aarch64-apple-darwin -env CC="$PWD/build/cfree cc" \ - RT_AS="$PWD/build/cfree as" \ - RT_AR="$PWD/build/cfree ar" \ - make -e -k rt-aarch64-apple-darwin \ - RT_COMMON_CFLAGS= \ - RT_AS_COMPILE_FLAGS= -``` - -`RT_COMMON_CFLAGS=` drops flags cfree does not accept yet: -`-ffreestanding -fno-builtin -std=c11 -Wpedantic -Wall -Wextra -Werror`. -`RT_AS_COMPILE_FLAGS=` drops clang's `-c`, which `cfree as` does not use. - -## Current Status - -As of the latest probe, cfree builds or assembles: - -- [x] `rt/lib/int/int.c` -- [x] `rt/lib/fp/fp.c` -- [x] `rt/lib/mem/mem.c` -- [x] `rt/lib/atomic/atomic_freestanding.c` -- [x] `rt/lib/cfree/ifunc_init.c` -- [x] `rt/lib/int64/int64.c` -- [x] `rt/lib/coro/aarch64.c` -- [x] `rt/lib/coro/coro.c` -- [x] `rt/lib/coro/aarch64_macho.s` - -That is 9 / 9 compile or assemble steps green for this variant, and -`cfree ar` produces `build/rt/aarch64-apple-darwin/libcfree_rt.a`. - -## Completed - -- [x] Split AArch64 file-scope coroutine assembly out of - `rt/lib/coro/aarch64.c`. -- [x] Added standalone AArch64 coroutine assembly sources: - `rt/lib/coro/aarch64_elf.s` and `rt/lib/coro/aarch64_macho.s`. -- [x] Added runtime Makefile support for assembling `.s` / `.S` sources via - `RT_AS`, so cfree can use `cfree as` while the default clang build can keep - using `clang -c`. -- [x] Switched the runtime target flag from `--target=TRIPLE` to - `-target TRIPLE`, accepted by both clang and cfree. -- [x] Wired `__builtin_ctzl` and `__builtin_ctzll` through the existing - `INTRIN_CTZ` path. `rt/lib/int64/int64.c` now compiles in the focused - aarch64-apple-darwin cfree probe. -- [x] Converted `__atomic_*_n` value operands to the atomic object type before - lowering. 
This clears the pointer-sized literal mismatch in - `__atomic_store_n((uintptr_t*)l, 0, __ATOMIC_RELEASE)` and the same class for - RMW / compare-exchange desired operands. -- [x] Added target-aware folding for `__atomic_always_lock_free(size, ptr)` and - constant-size `__atomic_is_lock_free(size, ptr)`. The parser asks - `cfree_cg_atomic_is_lock_free` for representative scalar types, so results - follow the active target instead of an aarch64-only table. -- [x] Exposed a general `cfree_cg_top_const_int` query for compile-time-known - integer-like values on the CG value stack. -- [x] Used that query in parser `if`, `&&`, and `||` lowering. Dead arms are - still parsed for semantic/type-stack effects, but target code emission is - suppressed, so constant-false 16-byte atomic fast paths no longer trip the - 8-byte lock-free backend limit. -- [x] Lowered `__builtin_memcpy`, `__builtin_memmove`, `__builtin_memset`, and - `__builtin_memcmp` as builtins even when their byte count is runtime-sized. - The parser now synthesizes the standard libc call directly, instead of - rewriting the token to an undeclared plain identifier. -- [x] Added parser support for `__atomic_fetch_nand`, mapped through the - existing target-independent atomic NAND RMW operation. -- [x] Fixed the `rt/lib/fp/fp.c` preprocessor crash. Function-like macro - argument prescan now preserves raw-token hidesets, so self-referential - suffix-renaming macros such as `rep_t -> _FP_NAME(rep_t)` do not recurse - until stack exhaustion. -- [x] Added parser support for `__builtin_isnan`, lowered as a single-evaluation - floating self-compare. -- [x] Routed C floating comparisons through FP comparison lowering for aarch64, - x86-64, and riscv64 instead of integer compare paths. -- [x] Accepted null pointer constants such as `((void*)0)` in static pointer - initializers. This clears the `_Thread_local coro_t* __cfree_current = NULL` - initializer in `rt/lib/coro/coro.c`. 
- -## Remaining Blockers - -- [ ] Lift remaining coroutine file-scope assembly. - Current file-scope asm remains in: - `rt/lib/coro/x86_64.c`, `rt/lib/coro/x86_64_win.c`, - `rt/lib/coro/i386.c`, `rt/lib/coro/riscv64.c`, - `rt/lib/coro/riscv32.c`, `rt/lib/coro/arm32.c`, and - `rt/lib/coro/arm32_thumb1.c`. - -- [ ] Decide whether cfree should eventually accept the standard runtime C - flags, or whether the runtime Makefile should grow a first-class - cfree-toolchain mode that drops/translates them. - -- [ ] Run the same cfree-toolchain probe for the other default runtime - variants after `aarch64-apple-darwin` archives cleanly. - -## Notes - -The ordinary clang `rt-aarch64-apple-darwin` target currently also stops in -`atomic_freestanding.c` because clang treats several `__atomic_*` library -entry points as builtins. That is separate from the cfree bootstrap path, but -it means the default target is not a clean regression signal for the assembly -split until the atomic source/build flags are addressed. diff --git a/doc/RV64_PARITY_CHECKLIST.md b/doc/RV64_PARITY_CHECKLIST.md @@ -1,252 +0,0 @@ -# rv64 parity checklist - -Goal: bring `riscv64` / `rv64` to the same practical coverage as `aarch64` -across standalone asm, disasm, C/toy compilation, object/link output, runtime, -debug tooling, and executable test paths. - -This checklist tracks parity with the aa64 lane, not architectural feature -completeness for all RISC-V extensions. The baseline target is RV64GC Linux -ELF with the psABI double-float ABI unless a task says otherwise. - -## Asm / disasm - -- [x] Wire rv64 into `arch_disasm_new` through `src/arch/rv64/disasm.{h,c}`. -- [x] Add rv64 `test/asm` smoke coverage for text decode, object listing, hex - encode, and podman-backed ELF execution. -- [x] Add arch-scoped asm fixture applicability (`*.targets`) so aa64/x64/rv64 - cases do not fail on unrelated targets. 
-- [x] Replace the current hand-written rv64 disassembler with an ISA descriptor - layer equivalent in role to `src/arch/aa64/isa.{h,c}` so encoding, - decoding, and printing share one description. -- [x] Expand standalone rv64 asm parsing beyond the current small subset: - branches, calls, arithmetic, shifts, compares, loads/stores, AUIPC/LUI, - relocation-bearing operands, atomics, fences, CSR/system forms, scalar - FP, and backend-emitted forms. -- [x] Expand rv64 disasm to decode every instruction emitted by rv64 codegen and - accepted by standalone asm, including unknown/truncated handling that - matches the public iterator contract. -- [x] Add relocation/symbol annotation coverage for rv64 object disassembly. -- [x] Update `test/asm/regen.sh` or add an rv64 variant for clang/objdump golden - regeneration. -- [ ] Make asm round-trip (`S`) meaningful for rv64 codegen output and gate the - rv64-emitted corpus on it. (Encode/decode tables cover the full RV64GC - surface; an explicit round-trip gate over codegen output still TODO.) - -## Register API / target surface - -- [x] Add rv64 public register-name/index support for psABI names plus `xN` and - `fN` aliases. -- [x] Audit all register naming users (`dbg`, asm constraints, disasm printers) - for consistent DWARF numbering: `x0..x31` as 0..31 and `f0..f31` as - 32..63. -- [x] Verify predefined macros, driver triple parsing, target defaults, and - `cfree_test_target` setup against clang's `riscv64-linux-gnu` behavior. -- [x] Decide policy for optional extensions (`C`, `A`, `F`, `D`, `Zicsr`, - `Zifencei`, future vector) and reflect it in target feature queries. - (Locked: RV64I/M/F/D/A/C + Zicsr-minimal; macros mirror clang.) - -## Inline asm - -- [x] Implement rv64 inline-asm template rendering parallel to aa64: - placeholders, symbolic operands, memory operands, width/addr modifiers, - escaped percent, and statement splitting. 
-- [x] Add rv64 constraint support for integer, FP, immediate, memory, matching, - early-clobber, and read-write operands. - (Integer constraints + memory + matching done; FP-`"f"`, `"K"`/`"L"`/`"J"` - immediates, and named-reg `"={a0}"` deferred — require src/cg/ extension.) -- [x] Verify clobbers, `"memory"`, callee-saved preservation, named registers, - and fixed-register conflicts on rv64. -- [x] Add an rv64 inline-asm unit test parallel to - `test/arch/aa64_inline_test.c`. -- [x] Add C and toy inline-asm execution cases that run through podman/qemu rv64. - -## C / toy codegen - -- [x] Prove a targeted rv64 C parse path can compile, link, and execute through - podman path E. -- [x] Run and triage the full C parse corpus for rv64 at `-O0`, `-O1`, and - `-O2`; track failures by missing backend feature rather than broad skips. - (O0+O1: 1828/0/1830. O2 single-threaded passes; the parallel-runner - SIGILL flakes are harness infra, not codegen.) -- [x] Run and triage toy cross-arch path `X` for rv64 alongside aa64 cases. - (491/0/0 after fixing the INTRA_AUIPC_ADDI width guard.) -- [x] Match aa64 coverage for scalar integer, pointer, aggregate, varargs, - atomics, intrinsics, labels, computed goto, switch lowering, tail calls, - alloca, and dynamic stack adjustment. -- [x] Close remaining explicit rv64 backend panics in `src/arch/rv64/ops.c`, - `alloc.c`, and `emit.c`. - (FP-cmp branching, BITCAST same-class, large fp_pair_off, label-fixup - width guard. asm_block closed via inline-asm template walker.) -- [x] Verify optimized rv64 lowering after recent opt pipeline work: liveness, - register allocation, hard-register constraints, call plans, and spill - reloads. (Implicitly verified by O1 corpus 1804/0 + toy O0/O1/O2 491/0.) -- [x] Add targeted rv64 cases for large frames, far branches, far label-address - materialization, large immediates, and pcrel/GOT materialization. -- [x] Add targeted rv64 FP conversion, comparison, NaN, and rounding cases. 
-- [x] Add targeted rv64 atomic cases for all supported widths and memory orders. - -## ABI / platform - -- [x] Finish psABI edge-case coverage: aggregate classification, indirect args, - mixed int/FP aggregates, homogeneous FP shapes where applicable, sret, - byval, empty/zero-sized fields, and mixed returns. -- [x] Verify variadic functions: register save area layout, `va_list` shape, - stack argument traversal, and mixed int/FP varargs. -- [x] Verify stack alignment, frame pointer conventions, callee-saved integer - registers `s0..s11`, and callee-saved FP registers `fs0..fs11`. -- [x] Decide `long double` policy for rv64 (`quad` vs compatibility mode) and - align C frontend, ABI lowering, libc harnesses, and runtime helpers. - (Locked to `double`; LDBL128=0 in driver/runtime.c + rt/Makefile.) -- [x] Audit TLS models for rv64: local-exec, GOT/TLS relocations, static link, - dynamic link, and emulator/JIT behavior. - (LE + IE codegen and reloc kinds wired; GD / TLS-Descriptor and the - linker IE→LE relaxation are deferred — no failing test depends on them.) - -## Object / link / driver - -- [x] Keep rv64 ELF roundtrip link corpus green for path R. -- [x] Fix `cfree objdump -d` to choose the disassembler target from the object - file rather than the host target. -- [x] Run rv64 link path E broadly under podman and triage execution failures. - (parse E: 1830 cases; toy X: 491 cases; all green.) -- [x] Ensure ELF rv64 relocations cover all codegen, asm, TLS, PLT/GOT, ifunc, - linker-script, archive, and GC cases currently passing for aa64. - (33 R_RV_* relocs mapped + applied; TLS_GOT_HI20 added Wave 2B. ifunc - and linker-script details still to verify under load.) -- [x] Implement or explicitly reject any unsupported rv64 relocation kinds with - diagnostics that name the relocation and input object. - (`compiler_panic` at src/link/link_reloc.c:489 names the reloc kind.) 
-- [x] Exercise `cfree as`, `cc`, `ld`, `ar`, `objdump`, `strip`, and `objcopy` - paths with rv64-specific command tests where the tool claims rv64 support. -- [x] Verify dynamic-linker defaults for musl and glibc rv64 Linux. - (musl: /lib/ld-musl-riscv64.so.1; glibc: /lib/ld-linux-riscv64-lp64d.so.1.) -- [x] Add rv64 `objdump` golden tests for sections, symbols, relocs, and - disassembly annotations. - -## Runtime / libc - -- [x] Build `libcfree_rt.a` for `riscv64-linux` through cfree, not only host - clang probes. -- [x] Bring rv64 coroutine/runtime support through the cfree assembler/compiler - path. (rt/lib/coro/riscv64.c built via `$(BIN) cc` per rt/Makefile.) -- [x] Run `test-rt-runtime` with rv64 enabled and triage every runtime helper - failure. (5/5 cases pass: coro, freestanding_lib, setjmp, stdarg, stdatomic.) -- [x] Retarget musl and glibc libc harnesses to rv64 sysroots and run the same - cases currently exercised for aa64. (test-musl-rv64: 9/9 static, 9/9 - dynamic. test-glibc-rv64: 8/9 — the single anomaly is a flaky SIGKILL - under concurrent load, not a code regression.) -- [x] Add rv64 smoke cases that use cfree-emitted bytes for startup/runtime - paths, not only clang-produced harness binaries. -- [x] Verify compiler-rt-style integer, FP, memory, atomic, and coroutine - helpers for rv64 ABI correctness. - -## Debug / DWARF / JIT - -- [x] Add rv64 debugger breakpoint support (`ebreak`) and displaced-step logic. -- [x] Add rv64 ucontext/register marshalling for supported host OSes. -- [x] Emit and validate rv64 DWARF CFI/line-info details, including CFA rules, - frame-pointer conventions, return-address register `ra`, and FP register - numbering. (Real .eh_frame producer; CFA=s0+frame_size-fp_pair_off; - ra=x1; s0..s11 + fs0..fs11 callee-saves recorded.) -- [x] Extend DWARF tests with rv64 producer roundtrips where instruction size - and register numbering differ from aa64. (test/debug/cfi_unit.c.) 
-- [x] Fill rv64 JIT support gaps: executable memory, relocations, symbol calls, - TLS/TLV behavior, and native-host execution tests where available. - (link_jit.c handles R_RV_TPREL_HI20/LO12_I/S as TLSLE and resolves - R_RV_PCREL_LO12_I/S against the paired AUIPC's runtime displacement; - execmem.flush_icache emits fence.i + __builtin___clear_cache on - __riscv; test/link/rv64_jit_test.c JIT-loads a tiny rv64 image and - SKIPs the native call on non-rv64 hosts. TLV thunk is Mach-O-only - and stays aa64; rv64 uses local-exec TLS via the TPREL path.) -- [x] Decide debugger scope for non-native rv64 execution; either support it - through emulation or mark it explicitly out of parity. - (Linux/riscv64 native only; macOS/BSD rejected via #error.) - -## Emulator - -- [x] Audit rv64 ELF loader behavior against aa64: program headers, auxv, - stack setup, argv/envp, TLS, brk/mmap, and dynamic loader handoff. - (static-linked; dynamic loader deferred) -- [x] Expand rv64 decode/lift coverage to match all instructions produced by - cfree rv64 codegen and clang-built harnesses. (decode RV64IMFDA done; - JIT lift deferred — interpreter is functional) -- [x] Add rv64 syscall coverage for libc and smoke workloads. - (minimum set: exit/exit_group/write/read/close/fstat/brk/mmap) -- [x] Add emulator regression tests for rv64 branches, calls, atomics, FP, TLS, - and signals/traps. (rv64_smoke_test + rv64_extras_test cover FP+CSR, - RVC, PT_INTERP, and the new syscall set. Atomics, TLS, and signal - trampolines remain stubbed in the interpreter — out of smoke scope.) - -## Execution infrastructure - -- [x] Use podman `--platform linux/riscv64` for rv64 execution when no native or - qemu-user runner is available. -- [x] Prove `test-smoke-rv64` direct and batched execution paths. -- [x] Prove `test/asm` rv64 path E through podman. -- [x] Prove a targeted `test/parse` rv64 path E through podman. 
-- [x] Run larger rv64 E matrices under podman with batching and record stable - filters for CI-equivalent local runs. - (test/parse and test/toy run end-to-end through podman/qemu rv64 - with batching; stable filters established.) -- [ ] Add clear diagnostics for missing podman image/platform support, binfmt, - qemu-user, or clang rv64 cross support. -- [x] Decide default images for `RUN_RV64_IMAGE` across musl/glibc tests. - (musl/Alpine = `alpine:latest`; documented in test/lib/exec_target.sh.) - -## Test policy - -- [x] Add rv64-targeted filters/goldens for each new feature as it lands. -- [x] Keep skips explicit and arch-scoped through `*.targets`, not hidden in - harness defaults. -- [x] Prefer red/green targeted runs: one failing feature family at a time, - one arch at a time. -- [x] Promote stable rv64 lanes into default or CI-equivalent coverage once the - runner assumptions are reliable. - (test-rv64-inline and test-emu added to default `make test`; - test-smoke-rv64 / test-musl-rv64 / test-glibc-rv64 remain opt-in - because they require podman/qemu.) -- [x] Keep aa64 lanes green while changing shared asm/disasm/link/test harness - code. - -## RV64 opset status - -This section tracks the RV64 asm/disasm ISA families that were historically -absent from the descriptor table (`src/arch/rv64/isa.c`) plus the remaining -explicitly unsupported extension families. - -**Standard scalar FP (RV32F/D) — complete for scalar RV64GC:** -- `fmadd.{s,d}`, `fmsub.{s,d}`, `fnmsub.{s,d}`, `fnmadd.{s,d}`, and - `fclass.{s,d}` are now in the shared asm/disasm descriptor table, with - targeted encode/decode coverage. - -**Atomic ordering suffixes (RV64A) — complete:** -- `lr.{w,d}.{aq,rl,aqrl}`, `sc.{w,d}.{aq,rl,aqrl}`, and - `amo*.{w,d}.{aq,rl,aqrl}` are accepted and disassembled with ordering - suffixes. The bare forms remain present for codegen. 
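For context on the ordering suffixes, the `.aq` and `.rl` variants differ from the bare forms only in two bits of the AMO instruction word (bit 26 for `aq`, bit 25 for `rl`). The sketch below encodes `lr.w` by hand to show this; the `toy_*` helpers are hypothetical and are not the descriptor-table encoder in `src/arch/rv64/isa.c`.

```c
#include <stdint.h>

/* RV64A AMO word layout: funct5[31:27] aq[26] rl[25] rs2[24:20]
 * rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0] = 0x2F. */
static uint32_t toy_encode_amo(uint32_t funct5, int aq, int rl, uint32_t rs2,
                               uint32_t rs1, uint32_t funct3, uint32_t rd) {
  return (funct5 << 27) | ((uint32_t)(aq != 0) << 26) |
         ((uint32_t)(rl != 0) << 25) | (rs2 << 20) | (rs1 << 15) |
         (funct3 << 12) | (rd << 7) | 0x2Fu;
}

/* lr.w: funct5 = 0b00010, rs2 = 0, funct3 = 0b010 (word width). */
static uint32_t toy_encode_lr_w(int aq, int rl, uint32_t rs1, uint32_t rd) {
  return toy_encode_amo(0x02u, aq, rl, 0u, rs1, 0x2u, rd);
}
```

With `rs1 = 10` (`a0`) and `rd = 11` (`a1`), the bare `lr.w a1, (a0)` encodes as `0x100525AF`, and `lr.w.aq` sets bit 26 to give `0x140525AF`.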
- -**RV64C compressed — complete for RV64-applicable scalar/FP forms:** -- Encoder and decoder cover the existing baseline plus `c.fld`, `c.fsd`, - `c.fldsp`, `c.fsdsp`, `c.subw`, `c.addw`, `c.and`, `c.or`, `c.xor`, - `c.sub`, `c.andi`, `c.srai`, `c.srli`, `c.slli`, and `c.addiw`. -- `c.flw/c.fsw/c.flwsp/c.fswsp` remain RV32-only and are intentionally not - accepted for RV64. -- Codegen never emits compressed regardless; backend always picks 32-bit - forms. Encoder coverage matters only for hand-written `.s` files. - -**Privileged ISA (M-mode / S-mode) — out of scope by policy:** -- `mret`, `sret`, `uret`, `wfi`, `sfence.vma`, `hfence.*`, `mnret`. -- M-mode/S-mode CSRs (mstatus, mtvec, mepc, mcause, satp, etc.) reachable - only via `csrrw`/`csrrs`/`csrrc` with a literal CSR number. The asm - syntax for named privileged CSRs (e.g., `csrrw t0, mstatus, zero`) is - not in the table; only the fp/Zicsr CSRs (`fcsr`, `frm`, `fflags`) and - numeric forms work. - -**Extension status:** -- `Zifencei` is now supported for asm/disasm via `fence.i`. -- Still out of scope: `V` (vector), `B`/`Zba`/`Zbb`/`Zbc`/`Zbs` (bit manipulation), - `Zfh`/`Zfhmin` (half-precision FP), `Zicbom`/`Zicboz` (cache - management), `Zihintpause`, `Smaia`/`Ssaia` — none planned. - -**Misc gaps:** -- `c.unknown` descriptor exists as a sentinel for the disassembler; not a - real ISA mnemonic. diff --git a/doc/STAGE2.md b/doc/STAGE2.md @@ -1,272 +0,0 @@ -# Stage-2 self-host - -What's missing to make `make self` produce a stage-2 `cfree` built by stage-1 -cfree itself. Companion to `DESIGN.md`. - -Latest snapshot: **105 / 107 files compile clean** (93/93 `src/**/*.c`, -12/14 `driver/*.c`). The two remaining driver failures (`env.c`, `ld.c`) -are both blocked by A2 — system-header ingest. Everything in `src/` builds -under stage 1. 
- -A standalone link probe (`scripts/stage2_link.sh`) drives the full -sequence end-to-end: cfree-stage1 compiles the 105 clean files, clang -compiles `env.c` / `ld.c`, and `cfree ld` then attempts to link the -combined object set against `libSystem.B.tbd`. As of the latest run the -link reaches the chained-fixup emit pass and trips D2 below. - -## Build configuration - -Stage 2 currently invokes: - -``` -cfree-stage1 cc --sysroot=$SDK -isystem rt/include -Iinclude -Isrc -``` - -`--sysroot=$SDK` makes the host SDK's libc/POSIX headers visible. -`rt/include/` ships the freestanding set on top. - -`DEPFLAGS` is empty for stage 2 today; B0 has landed but the recipe has -not been switched back on. - -## Checklist - -### Preprocessor / lexer - -- [x] **A1.** Quoted `#include "x.h"` now searches the includer's - directory first per C99 §6.10.2 (commit c9baaf8). Was blocking every - `driver/*.c` file. -- [ ] **A2.** System-header ingest. The driver pulls a POSIX/Mach surface - (`sys/stat.h`, `sys/mman.h`, `sys/syscall.h`, `fcntl.h`, `unistd.h`, - `signal.h`, `pthread.h`, `dlfcn.h`, `mach/mach.h`, `mach/mach_vm.h`, - `mach/vm_map.h`) from the host SDK. With `-isystem $SDK/usr/include` - and the right host predefines, the SDK parses up to a small set of - constructs cfree doesn't yet handle. Each sub-item below is the - minimal feature needed. - - - [ ] **A2-S1.** Asm-label on function declarators: - `T fn(args) __asm__("name");`. GCC asm-label rename extension; what - `__DARWIN_ALIAS` / `__DARWIN_ALIAS_C` / `__DARWIN_INODE64` / - `__DARWIN_EXTSN` expand to. Blocks `sys/stat.h`, `sys/mman.h`, - `unistd.h`, `_string.h`, `_stdio.h`. - - [ ] **A2-S2.** Asm-label on global variables: - `extern T name __asm__("name");`. Same extension, declarator position - differs from S1. Blocks `_time.h` (→ `<time.h>`, `<signal.h>`, - `<pthread.h>`). - - [ ] **A2-S3.** Unknown `#pragma` accepted as no-op (full semantics - not required for ingest). Today fatal "expected declaration". 
Blocks - `sys/fcntl.h`, `mach/vm_types.h`. Same root cause as R2 below. - - [ ] **A2-S4.** `__has_include`, `__has_feature`, `__has_extension` - as preprocessor builtins inside `#if`. (`__has_attribute` already - works.) Blocks `Availability.h` and the `__enum_decl` feature-detect - branch. - - [ ] **A2-S5.** `__uint128_t` declared type. Declare-only is enough - to parse `mach/arm/_structs.h` (signal.h, ucontext); full codegen - is a bigger lift. - - [ ] **A2-S6.** `#warning` accepted as non-fatal. Today cfree errors - on the directive itself; `sys/cdefs.h`'s - `#warning "Unsupported compiler"` aborts any SDK ingest unless - `-D__GNUC__` is also passed. - - [ ] **A2-S7.** Predefine macOS-host macros (`__APPLE__`, `__MACH__`, - `__arm64__`/`__aarch64__`, `__LITTLE_ENDIAN__`, `__GNUC__`, - `__GNUC_MINOR__`) automatically when targeting macOS, so callers - don't need to hand-pass `-D`. - - After S6+S7+S1+S2+S3+S4, both blocked driver files should ingest the - SDK directly. S5 only needed for signal.h/ucontext paths. - -### Driver — dep emission - -- [x] **B0.** `cfree_dep_iter_new` / `_next` implemented over - SourceManager (commit 8919185). Stage 2 can re-enable `-MMD -MP` - whenever the recipe drops `DEPFLAGS=''`. - -### Parser / sema - -- [x] **B1.** `__alignof__` aliased to `_Alignof` (type-name form). -- [x] **B2.** `__builtin_ctz` lowered through `INTRIN_CTZ`. -- [x] **B3.** `parse_array_bound` already routed `SEK_ENUM_CST` through - `eval_const_int` — original repro was actually B4. Regression case - added. -- [x] **B4.** `try_parse_addr_const` accepts string literals via - `emit_string_to_rodata`. -- [x] **B5.** `try_parse_addr_const` admits `SEK_FUNC` identifiers. -- [x] **B6.** File-scope `T name[] = {...}` now calls - `complete_incomplete_array` to match the block-scope path. -- [x] **B7.** `__alignof__` accepts a **unary-expression** operand - (`__alignof__(*ptr)`), not just a type-name. 
Required by the - `VEC_GROW` macro in `src/core/vec.h`; previously blocked 8 files in - `src/debug/` and `src/link/`. -- [x] **B8.** `sizeof` accepts the no-parens **unary-expression** form - in constant-expression contexts (e.g. file-scope initializers). C99 - §6.5.3.4 standard, not an extension. Blocked `src/arch/aa64/isa.c` - and `src/arch/aa64/regs.c`. -- [x] **B9.** Block-scope `static T name[] = {...}` now completes the - incomplete array, mirroring B6's file-scope fix. Was blocking - `src/pp/pp.c`. - -### Codegen — aarch64 backend - -- [x] **C1.** `OPK_INDIRECT` source operands handled in INT and FP arg - paths (commit f2d3e01). -- [x] **C2.** `OPK_INDIRECT` on the indirect-return path (commit - f2d3e01). -- [x] **C0.** Stage-1 regalloc "no spillable victim (class 0)" panic - fixed — was choking on the complex functions in `src/arch/aa64/arch.c`, - `src/arch/rv64.c`, `src/cg/cg.c`, and `src/opt/opt.c`. Not a feature - gap; a regalloc bug surfaced by self-host pressure. - -### Codegen — x64 backend - -- [ ] **C3.** Mirror C1/C2 on x64 - (`src/arch/x64.c:1761,1798,1817,1827,1904`). Doesn't block aarch64 - self-host; blocks x64 self-host when that's attempted. - -### Linker - -- [ ] **D1.** Stage 2 currently relies on `$(CC) -o $@ ... $(LIB_AR)` - for the final link — for stage 2 that's `cfree-stage1 cc`, which in - turn shells out to the host linker. Once stage 2 builds, verify the - produced binary is genuinely a stage-1-emitted object linked through - cfree's own ld path, not falling back to clang/ld silently. -- [x] **D2-read.** Mach-O reader rejected `ARM64_RELOC_TLVP_LOAD_PAGE21` - (8) and `ARM64_RELOC_TLVP_LOAD_PAGEOFF12` (9). Clang emits these for - TLS references in `driver/env.c` (errno-style access); without them - the standalone link probe couldn't ingest `env.o`. Reader now maps - both to TLV reloc kinds. 
-- [ ] **D2-emit.** Chained-fixup emit doesn't know how to locate the - byte slot for the new TLV pointer region — `cfree ld` aborts with - `link_macho: chained-fixup slot for vaddr 0x… not in any segment - buffer` at `src/link/link_macho.c:1564`. The lookup at - `link_macho.c:1543` currently routes only segidx 2 (`__DATA_CONST` - __got) and segidx 3 (`__DATA` __thread_ptrs / MSec walk); the new - TLV section/segment added by the TLV ingest work isn't covered. - Blocks the standalone link probe past compile. - -## Runtime — `rt/lib/*` ingest - -Separate from stage-2 self-host: can cfree compile `libcfree_rt.a`? -Probed on the `aarch64-apple-darwin` variant — 8 sources, freestanding, -no system headers. Result: **6 / 8 clean** today (`fp/fp.c`, `mem/mem.c`, -`cfree/ifunc_init.c`, `coro/coro.c`, `coro/aarch64.c`, `int/int.c`). -Flags must drop -`-std=c11 -Wpedantic -Wall -Wextra -Werror -ffreestanding -fno-builtin` — -cfree rejects all of these. (`-fno-builtin` is the only one not already -on the stage-2 drop list.) - -- [x] **R1.** Replaced `__inline` with `inline` in rt sources (no - compiler change; cfree already accepts `inline`). -- [x] **R2.** Unknown `#pragma` now silently skipped at the parser - boundary (`pp_next` drops forwarded pragma lines so cpp mode still - re-emits them via `pp_next_raw`). `atomic_common.inc`'s - `#pragma redefine_extname` rename was dropped from source; the - `_c`-suffixed functions were renamed directly to their final library - names (no clang-builtin collision on the cfree side). -- [x] **R3.** `__builtin_offsetof(T, m)` now folds inside `cexpr_unary` - using the existing `offsetof_designator` helper. Unblocks - `_Static_assert(offsetof(...))`. -- [x] **R4.** Member-level `_Alignas(N)` now raises the field's - `align_override`, which the ABI layout already propagates into the - containing aggregate's alignment (`src/abi/abi.c:195,213,223`). 
-- [x] **R5.** `__int128`, `__int128_t`, `__uint128_t` recognized as - type specifiers (`TY_INT128`/`TY_UINT128`, size 16, align 16). - Typedef-only use parses; any `cg_load`/`cg_store`/`cg_binop`/ - `cg_unop`/`cg_convert` on int128 panics with a clear - "`__int128` codegen not implemented" diagnostic. Codegen support is - out of scope for this milestone. -- [x] **R6.** Missing rt builtins wired up in the parser. - - `__builtin_trap`, `__builtin_unreachable` → new `cg_intrinsic_void`, - `INTRIN_TRAP` / `INTRIN_UNREACHABLE` (already implemented in all - three backends). - - `__builtin_clz`, `__builtin_clzl`, `__builtin_clzll` → - `cg_intrinsic_unary_to_int(INTRIN_CLZ)`; operand type drives width. - - `__builtin_memcpy`, `__builtin_memmove`, `__builtin_memcmp`, - `__builtin_memset` → rewritten at `try_parse_builtin_call` to plain - calls to the libc functions of the same name, so runtime-`n` works. - Caller must declare the libc prototype (rt's `<string.h>` does). -- [x] **R7.** `__func__`, `__FUNCTION__`, `__PRETTY_FUNCTION__` - predefined identifiers (C99 §6.4.2.2). Synthesized lazily in - `parse_primary` as a NUL-terminated `char[N+1]` literal in `.rodata`, - using a new `Parser.cur_func_name` field set around - `parse_function_body`. Outside a function body, a clean diagnostic. -- [x] **R10.** File-scope `__asm__("...")` declarations - (a GCC extension, also accepted by clang). `parse_translation_unit` - recognizes `__asm__` / `asm` at TU scope, decodes the string-literal - payload, and feeds it through `parse_asm` against the current object - emitter. The object symbol table now reuses existing symbols by name, - so C declarations before/after asm labels bind to the same `ObjSymId`. - For `coro/aarch64.c`, the AArch64 assembler also accepts `stp`/`ldp` - on `d0..d31`, supports `csinc`, and predefines - `__USER_LABEL_PREFIX__` as `""` for ELF-style targets and `"_"` for - Mach-O. 
Verified with: - `build/cfree cc -target aarch64-apple-darwin -g -c rt/lib/coro/aarch64.c ...`. - -After R1–R10, two blockers remain for the 8-source `aarch64-apple-darwin` -rt probe: - -- [ ] **R8.** `__builtin_ctzl` / `__builtin_ctzll` not wired (only - `__builtin_ctz` is). Same shape as R6's `clz` wiring; just needs the - three symbols added to the gate in `try_parse_builtin_call` and - routed through `INTRIN_CTZ`. Blocks `int64/int64.c:217`. -- [ ] **R9.** `__atomic_always_lock_free(size, ptr)` and - `__atomic_is_lock_free(size, ptr)` must fold at compile time when - `size` is a constant — `atomic_common.inc`'s `IS_LOCK_FREE_n` macros - expand to these inside `case 1: ... case 16:` arms and rely on the - fold to elide unreachable branches. Plain runtime calls would still - link but the macros wrap the result in a switch over `size`, so - without folding cfree would emit per-size dispatch that the rt - layout expects to be dead-code-eliminated. Blocks - `atomic/atomic_freestanding.c:77`. - -Additionally listed in the larger SDK ingest plan but not yet seen in -the 8-source rt probe: `__builtin_*_overflow` (for `int/int.c`'s -`__addvsi3` family — currently the source uses manual overflow checks, -not the builtins). 
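
The manual overflow checks mentioned above can be sketched as a range test performed before the add. This is a minimal illustration only, not the rt source: `add_checked_i32` is a hypothetical name, and the real `__addvsi3` family conventionally aborts on overflow rather than returning a flag.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not from rt/lib): stores a + b in *out and returns
 * true when the sum fits in int; returns false without touching *out when
 * the add would overflow. The range test runs before the add, so no
 * undefined signed overflow is ever evaluated. */
static bool add_checked_i32(int a, int b, int *out) {
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
    return false;
  *out = a + b;
  return true;
}
```

A `__builtin_add_overflow`-based version would express the same test in one call, which is why the builtin only matters for ingest once a source actually uses it.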
-
-## How to re-run the audits
-
-Stage-2 audit (src + driver):
-
-```sh
-make && cp build/cfree build/cfree-stage1
-BIN=$(pwd)/build/cfree-stage1
-SDK=$(xcrun --show-sdk-path)
-FLAGS="--sysroot=$SDK -isystem rt/include -Iinclude -Isrc"
-DFLAGS="--sysroot=$SDK -isystem rt/include -Iinclude"
-for f in $(find src -name '*.c' | sort); do
-  $BIN cc $FLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
-done
-for f in $(find driver -name '*.c' | sort); do
-  $BIN cc $DFLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
-done
-```
-
-System-header ingest probe (after A2 work):
-
-```sh
-SDK=$(xcrun --show-sdk-path)
-DEFS="-D__GNUC__=4 -D__GNUC_MINOR__=2 -D__arm64__=1 -D__aarch64__=1 \
-      -D__LITTLE_ENDIAN__=1 -D__APPLE__=1 -D__MACH__=1"
-for h in sys/stat.h sys/mman.h sys/syscall.h fcntl.h unistd.h signal.h \
-         pthread.h dlfcn.h mach/mach.h mach/mach_vm.h mach/vm_map.h \
-         stdio.h stdlib.h string.h; do
-  echo "#include <$h>" > /tmp/h.c
-  $BIN cc $DEFS -isystem rt/include -isystem "$SDK/usr/include" \
-    -c /tmp/h.c -o /tmp/h.o 2>&1 | head -1 | sed "s|^|$h: |"
-done
-```
-
-rt ingest probe (`aarch64-apple-darwin` variant):
-
-```sh
-SRCS="lib/int/int.c lib/fp/fp.c lib/mem/mem.c \
-      lib/atomic/atomic_freestanding.c lib/cfree/ifunc_init.c \
-      lib/int64/int64.c lib/coro/aarch64.c lib/coro/coro.c"
-FLAGS="-target aarch64-apple-darwin -DHAS_INT128=1 \
-       -Irt/lib/include/common -Irt/lib/impl \
-       -Irt/lib/include/lp64_le -Irt/include"
-for f in $SRCS; do
-  $BIN cc $FLAGS -c "rt/$f" -o /dev/null 2>&1 | head -1 | sed "s|^|rt/$f: |"
-done
-```
-
-Then `make self` to confirm a clean stage-2 build end-to-end.
diff --git a/doc/TAILCALL.md b/doc/TAILCALL.md
@@ -1,234 +0,0 @@
-# Tail Call Support
-
-First-class tail calls from the C frontend through codegen and the aarch64
-backend. x64 and rv64 follow the same pattern; this document focuses on
-aarch64.
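
For orientation, the kind of source this plan targets is an accumulator-style function whose recursive call is the last thing it does. The sketch below is an assumption-laden illustration, not a test case from the repository: `sum_to` is a hypothetical name, and the attribute is shown only in a comment because it compiles only on compilers that already accept `musttail`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: sum of 1..n carried in an accumulator. The
 * recursive call is in tail position, so a guaranteed tail call can reuse
 * the caller's frame instead of growing the stack per iteration. */
static uint64_t sum_to(uint64_t n, uint64_t acc) {
  if (n == 0) return acc;
  /* With the attribute this document plans to accept, the line below
   * would read:
   *   __attribute__((musttail)) return sum_to(n - 1, acc + n); */
  return sum_to(n - 1, acc + n);
}
```

Without a guaranteed tail call, the stack depth grows linearly with `n`, so the example is only safe for small inputs unless the compiler happens to optimize the call.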
-
-## Current state
-
-The groundwork is present but nothing is wired end-to-end:
-
-- `cg_tail_call` (cg.c) — stub that panics `"not in v1 slice"`
-- `CG_CALL_TAIL` in `CGCallDesc.flags` (arch.h) — defined and documented,
-  never set
-- `R_AARCH64_JUMP26` (obj.h) — handled identically to CALL26 by the linker;
-  only the emitted instruction opcode differs
-- `aa64_b_base()` / `aa64_br()` — defined in the ISA layer, never used for
-  tail calls
-- `test/elf/cases/16_tail_call.c` — verifies that cfree's linker handles
-  JUMP26 from a clang-compiled object; does not test cfree generating tail calls
-
-## Architecture constraint
-
-The aarch64 backend defers frame layout: frame size, callee-saved register
-counts, and stack offsets are computed in `aa_func_end` after all body code is
-emitted. The prologue uses NOP placeholders that are back-patched by
-`aa_func_end`. The epilogue is a single labeled block; all `aa_ret` paths emit
-a `B` to that label.
-
-For tail calls the frame teardown (callee-saved restores + SP restore) must
-appear **inline at the call site**, before the `B`/`BR` to the callee — not at
-the epilogue. Since frame dimensions are unknown at call-emit time, we use the
-same NOP-placeholder-then-patch approach as the prologue.
-
-## v1 constraints (panic, don't silently miscompile)
-
-| Scenario | Disposition |
-|---|---|
-| Tail call from alloca function | `compiler_panic` in `aa_call` |
-| Tail call with sret return type | `compiler_panic` in `aa_call` |
-| Tail call with stack-passed args | `compiler_panic` in `aa_call` |
-| Tail call from variadic function | `compiler_panic` in `aa_call` |
-| `musttail` not on a return stmt | `perr` in parser |
-| C23 `[[clang::musttail]]` syntax | out of scope |
-
----
-
-## Step 1 — Frontend: recognize `musttail`
-
-**`src/parse/attr.h`**: Add `ATTR_MUSTTAIL` to `AttrKind`.
-
-**`src/parse/parse_type.c`**: Register in the attribute table:
-```c
-{"musttail", ATTR_MUSTTAIL, AS_NONE},
-```
-
-**`src/parse/parse_priv.h`**: Add `u8 in_musttail` to `Parser`.
-
-**`src/parse/parse_stmt.c`** — `parse_stmt`: before the keyword dispatch, check
-`starts_attr(p)`. If the parsed list contains `ATTR_MUSTTAIL`, set
-`p->in_musttail = 1`. The next token must be `return`; any other statement is a
-fatal error.
-
-**`src/parse/parse_stmt.c`** — `parse_return_stmt`: if `p->in_musttail`:
-- call `parse_expr(p)` as usual — the inner call dispatch emits `cg_tail_call`
-- do **not** call `to_rvalue` or `cg_ret`; `cg_tail_call` implicitly terminates
-  the function
-- clear `p->in_musttail` before returning
-
-**`src/parse/parse_expr.c`** — postfix call dispatch: if `p->in_musttail` is
-set when a `'('` call is dispatched, emit `cg_tail_call(p->cg, nargs, fn_type)`
-instead of `cg_call`.
-
-## Step 2 — CG layer: implement `cg_tail_call`
-
-Factor the body of `cg_call` into:
-```c
-static void cg_call_impl(CG* g, u32 nargs, const Type* fn_type, u16 flags);
-```
-
-Both `cg_call` and `cg_tail_call` call it, passing `CG_CALL_NONE` or
-`CG_CALL_TAIL` respectively.
-
-When `flags & CG_CALL_TAIL`:
-- Set `desc.flags = CG_CALL_TAIL`
-- Skip result register allocation; pass a void-typed `OPK_IMM` placeholder in
-  `desc.ret` so the backend has a typed slot to inspect
-- Skip `push(g, …)` — no result, no continuation
-- Still call `T->free_reg` for the callee register after `T->call` returns
-  (the backend has already moved it to x16 by then)
-
-## Step 3 — AArch64 backend
-
-### `src/arch/aa64/internal.h`
-
-Worst-case inline teardown: 5 int-pair LDPs (x19–x28) + 4 fp-pair LDPs
-(d8–d15) + 1 fp/lr LDP + 2 SP-add instructions = 12; use 14 for headroom.
-
-```c
-#define AA_TAIL_EP_WORDS  14u
-#define AA64_TAIL_SCRATCH 16u  /* x16 / ip0: caller-saved, not a pool reg */
-```
-
-Add to `AAImpl`:
-```c
-struct { u32 pos; } *tail_sites;
-u32 ntail_sites;
-u32 tail_sites_cap;
-```
-
-Initialize in `aa_func_begin`:
-```c
-a->tail_sites = NULL;
-a->ntail_sites = 0;
-a->tail_sites_cap = 0;
-```
-
-### `src/arch/aa64/ops.c` — `aa_call`
-
-After the `emit_arg_value` loop and `max_outgoing` update, before the existing
-BL/BLR emission:
-
-```c
-if (d->flags & CG_CALL_TAIL) {
-  if (a->has_alloca)
-    compiler_panic(…, "musttail not supported in alloca function");
-  if (d->abi && d->abi->has_sret)
-    compiler_panic(…, "musttail not supported with sret return type");
-  if (stack_off > 0)
-    compiler_panic(…, "musttail with stack-passed arguments not supported");
-  if (a->is_variadic)
-    compiler_panic(…, "musttail not supported in variadic function");
-
-  /* Indirect callees live in x19–x28 (the int pool). Move to ip0 now,
-   * before the teardown restores those registers from the stack. */
-  if (d->callee.kind == OPK_REG)
-    aa64_emit32(mc, aa64_mov_reg(1, AA64_TAIL_SCRATCH, reg_num(d->callee)));
-
-  /* NOP placeholder; patched with the frame teardown in aa_func_end. */
-  u32 site_pos = mc->pos(mc);
-  for (u32 i = 0; i < AA_TAIL_EP_WORDS; ++i) aa64_emit32(mc, AA64_NOP);
-
-  /* Tail branch. */
-  if (d->callee.kind == OPK_GLOBAL) {
-    u32 b_pos = mc->pos(mc);
-    aa64_emit32(mc, aa64_b_base());
-    mc->emit_reloc_at(mc, mc->section_id, b_pos, R_AARCH64_JUMP26,
-                      d->callee.v.global.sym, d->callee.v.global.addend, 0, 0);
-  } else if (d->callee.kind == OPK_REG) {
-    aa64_emit32(mc, aa64_br(AA64_TAIL_SCRATCH));
-  } else {
-    compiler_panic(…, "aarch64 tail call: callee kind %d unsupported", …);
-  }
-
-  aa_tail_site_push(a, site_pos);
-  return; /* no return-value extraction; no continuation */
-}
-```
-
-`aa_tail_site_push` is a small grow-array helper consistent with the existing
-`add_patches` pattern.
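
A grow-array helper of this shape could look roughly like the sketch below. It is written under stated assumptions: the doubling policy, the initial capacity of 4, the use of plain `realloc`, and the `TailSites` wrapper struct are all hypothetical (the plan keeps these fields directly in `AAImpl`, and the real helper would presumably use the compiler's own allocator and the `add_patches` conventions).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t u32;

/* Hypothetical stand-ins for the three AAImpl fields named in the plan. */
typedef struct { u32 pos; } TailSite;

typedef struct {
  TailSite *tail_sites;
  u32 ntail_sites;
  u32 tail_sites_cap;
} TailSites;

/* Record one NOP-placeholder position, doubling capacity when full. */
static void tail_site_push(TailSites *a, u32 pos) {
  assert(a->ntail_sites <= a->tail_sites_cap);
  if (a->ntail_sites == a->tail_sites_cap) {
    u32 cap = a->tail_sites_cap ? a->tail_sites_cap * 2u : 4u;
    TailSite *p = realloc(a->tail_sites, (size_t)cap * sizeof *p);
    if (!p) abort(); /* assumption: allocation failure is fatal */
    a->tail_sites = p;
    a->tail_sites_cap = cap;
  }
  a->tail_sites[a->ntail_sites++].pos = pos;
}
```

Storing only the byte position keeps each site cheap; everything else the patch loop needs (save offsets, frame size) is function-wide and known only once the body has been emitted.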
-
-### `src/arch/aa64/emit.c` — `aa_func_end`
-
-After computing `n_int_pairs`, `n_fp_pairs`, `frame_size`, `int_save_off`,
-`fp_save_off`, `fp_lr_off` — before placing the epilogue label — patch each
-tail call site:
-
-```c
-for (u32 ti = 0; ti < a->ntail_sites; ++ti) {
-  u32 words[AA_TAIL_EP_WORDS];
-  u32 wi = 0;
-  for (u32 i = 0; i < AA_TAIL_EP_WORDS; ++i) words[i] = AA64_NOP;
-
-  for (i32 i = (i32)n_fp_pairs - 1; i >= 0; --i) {
-    u32 r0 = 8u + (u32)i * 2u;
-    words[wi++] = aa64_ldp_d(r0, r0+1, 31, (i32)(fp_save_off + (u32)i*16));
-  }
-  for (i32 i = (i32)n_int_pairs - 1; i >= 0; --i) {
-    u32 r0 = 19u + (u32)i * 2u;
-    words[wi++] = aa64_ldp_x(r0, r0+1, 31, (i32)(int_save_off + (u32)i*16));
-  }
-  words[wi++] = aa64_ldp_x(29, 30, 31, (i32)fp_lr_off);
-
-  /* SP restore — mirrors emit_sp_add but writes into words[]. */
-  if (frame_size <= 0xfff) {
-    words[wi++] = aa64_add_imm(1, 31, 31, frame_size, 0);
-  } else if ((frame_size & 0xfff) == 0 && (frame_size >> 12) <= 0xfff) {
-    words[wi++] = aa64_add_imm(1, 31, 31, frame_size >> 12, 1);
-  } else {
-    words[wi++] = aa64_add_imm(1, 31, 31, (frame_size >> 12) & 0xfff, 1);
-    words[wi++] = aa64_add_imm(1, 31, 31, frame_size & 0xfff, 0);
-  }
-
-  if (wi > AA_TAIL_EP_WORDS)
-    compiler_panic(…, "aarch64: tail epilogue overflow (%u words)", wi);
-
-  u32 p0 = a->tail_sites[ti].pos;
-  for (u32 i = 0; i < AA_TAIL_EP_WORDS; ++i)
-    aa64_patch32(obj, sec, p0 + i * 4u, words[i]);
-}
-```
-
-## Step 4 — Tests (red-green)
-
-Write tests first, then implement.
-
-**`test/parse/`** — attribute parse test: `__attribute__((musttail)) return f(x);`
-parses without error; missing-return error fires on a non-return statement.
-
-**`test/cg/`** — direct tail call: build `int f(int x)` that musttail-calls
-`int g(int x)` via `cg_tail_call`; verify the emitted aarch64 text contains a
-JUMP26 relocation (B) and no CALL26 (BL).
-
-**`test/cg/`** — indirect tail call via function pointer: verify `MOV x16, xN`
-+ `BR x16` and no BLR.
-
-**`test/cg/`** — end-to-end: tail-recursive sum that overflows without TCO;
-compile and run via `cfree run` and verify correctness.
-
-## Files touched
-
-| File | Change |
-|---|---|
-| `src/parse/attr.h` | `+ATTR_MUSTTAIL` |
-| `src/parse/parse_type.c` | `musttail` attribute table entry |
-| `src/parse/parse_priv.h` | `+u8 in_musttail` to `Parser` |
-| `src/parse/parse_stmt.c` | attribute prefix detection; musttail return path |
-| `src/parse/parse_expr.c` | `cg_tail_call` dispatch when `in_musttail` |
-| `src/cg/cg.c` | factor `cg_call_impl`; implement `cg_tail_call` |
-| `src/arch/aa64/internal.h` | constants, `AATailCallSite`, fields in `AAImpl` |
-| `src/arch/aa64/ops.c` | tail-call branch in `aa_call`; `aa_tail_site_push` |
-| `src/arch/aa64/emit.c` | init in `aa_func_begin`; patch loop in `aa_func_end` |
-| `test/parse/` | musttail attribute parse test |
-| `test/cg/` | direct/indirect/e2e tail call tests |
diff --git a/doc/TOY_REWRITE_TASKS.md b/doc/TOY_REWRITE_TASKS.md
@@ -1,275 +0,0 @@
-# Toy Rewrite Task List
-
-This tracks the implementation rewrite toward `doc/TOY.md`. Work proceeds
-red-green: rewrite or add tests first, run focused failures, then implement the
-smallest slice that moves those tests green.
-
-Completion rule: this task list is not a partial-coverage plan. Future agents
-must continue until `lang/toy` is fully aligned with `doc/TOY.md`, including
-internal refactors, representation cleanup, stronger diagnostics, and removal
-of temporary shortcuts. A green `make test-toy` is required after each slice,
-but it is not by itself proof that the language implementation is complete.
-
-## Phase 1: Spec-shaped Existing Coverage
-
-- [x] Rewrite existing runnable toy cases from legacy `int` to explicit scalar
-  types, initially using `i64` where old `int` behavior was 64-bit.
-- [x] Replace legacy logical operators `&&` and `||` with `and` and `or`.
-- [x] Replace legacy prefix dereference `*p` with postfix `p.*` in expression - and assignment contexts. -- [x] Prefix public CG coverage builtins with `@`, including type-query, - memory, atomic, vararg, intrinsic, target, and asm helpers. -- [x] Replace legacy helper names with spec spellings where a direct mapping - exists: - `index(p, i)` -> `p[i]`, `sizeof<T>()` -> `@sizeof<T>()`, - `alignof<T>()` -> `@alignof<T>()`, `offsetof<T>(f)` -> - `@offsetof<T>(f)`. -- [x] Keep expected exit codes unchanged unless a test intentionally changes - semantics. - -## Phase 2: Focused Red Tests - -- [x] Add a first-class byte string/global data initializer case. -- [x] Add array literal and indexing coverage. -- [x] Add pointer-to-array address behavior cases. -- [x] Add `let name = expr` inference coverage. -- [x] Add `var name = expr` inference coverage. -- [x] Add `NULL as *T` pointer literal coverage. -- [x] Add record declaration, record literal, omitted field zero-fill, and field - projection coverage. -- [x] Add tuple record literal and numeric field projection coverage. -- [x] Add enum declaration and dot-constant typed initializer coverage. -- [x] Add `pub`, `extern`, and `alias` declaration coverage. -- [x] Add one error-test harness pass for compile-fail diagnostics before adding - many negative parser/type cases. - -## Phase 3: Frontend Structure - -- [x] Split the previous single `toy.c` implementation into explicit frontend - modules: public compile entry, lexer, parser core/context, symbol/scope - tables, literal helpers, and parser implementation. -- [x] Introduce a Toy type layer that can represent aliases, nominal records, - tuple records, enums, arrays, pointers, function pointers, qualifiers, and - anonymous records while lowering through public CG API types. 
-- [x] Replace fixed-size global/local/function arrays with context-owned - growable storage; `lang/toy` stays on the public frontend boundary, so - this uses explicit `ToyParser` ownership rather than internal core vector - headers. -- [x] Add lexical scopes for block-local declarations without global state. -- [x] Keep codegen target and ABI details hanging off `ToyParser`/future Toy - context structures. -- [x] Remove legacy compatibility spellings and temporary lowering shortcuts - once spec-shaped replacements have full coverage. -- [x] Add focused negative tests for every rejected spec form and unsupported - backend feature path. - -## Phase 4: Spec Features - -- [x] Declarations: `pub`, `extern`, attributes, thread-local objects, alias, - readonly/mutable object definitions, function-local statics. -- [x] Types: full scalar set, address-space pointers, arrays, function types, - qualifiers, aliases, records, tuple records, anonymous records, enums. -- [x] Expressions: `NULL`, byte strings, casts, postfix calls/index/field/deref, - type-safe lvalues, precedence-island restrictions, aggregate literals. -- [x] Statements: assignment-only lvalues, expression statements, labels, - labelled loops/switches, value-bearing `break`, `return tail`. -- [x] Expression control flow: `if` expressions, result-typed `while<T> else`, - expression switches. -- [x] Builtins: varargs, type queries, memory, data relocations, atomics, - intrinsics, target capability queries, and typed inline assembly. -- [x] Error tests: syntax, type mismatch, declaration order, unsupported target - intrinsics, invalid attributes, direct recursive records, invalid tail - calls, and invalid builtin forms. - -## Remaining Work To Reach `doc/TOY.md` - -The previous rewrite checklist is complete, but `doc/TOY.md` still describes -several behaviors that the implementation does not yet provide. 
These are now -tracked as red tests first; keep the tests aligned with the spec and make the -implementation catch up in focused green slices. - -- [x] Implement function-local static object initializers containing - `@labeladdr(label)` while the containing function is open. The positive - coverage is `test/toy/cases/119_static_labeladdr_data.toy`. -- [x] Implement `@symdiff(lhs, rhs, addend)` object initializers for every - object format/target where `doc/TOY.md` exposes the builtin, or narrow the - spec if the portability contract changes. The positive coverage is - `test/toy/cases/120_data_symdiff.toy`. -- [x] Expose dynamic `@memcpy`, `@memmove`, and `@memset` lowering through Toy - by accepting expression-valued `size` and `align` operands, not only - numeric literals. The positive coverage starts with - `test/toy/cases/121_dynamic_memory_builtin.toy`. -- [x] Propagate `.entsize(N)` from Toy data-definition attributes through - `CfreeCgDataDefAttrs` into object sections, including merge/string - sections. The red API assertion lives in `test/api/cg_type_test.c`. -- [x] Add object-format inspection coverage for Toy `.entsize(N)` once the - public object inspection surface can assert section entry size directly - without relying on textual objdump details. - -## Current Slice - -- [x] Convert legacy runnable tests to spec syntax. -- [x] Add minimal new syntax tests for `@` builtins, `and`/`or`, postfix - dereference, inference, and `NULL`. -- [x] Implement only enough lexer/parser/codegen support to make the converted - core tests pass. -- [x] Add object-inspection checks for toy cases and lower declaration/data - attributes for weak symbols, aliases, object alignment, readonly storage, - and common definitions. -- [x] Extend memory-operation parsing for `@memcpy`, `@memmove`, and - `@memset` access flags, and cover address-of indexed lvalues. -- [x] Add target capability query builtins for symbol and backend features. 
-- [x] Add explicit rounding-mode conversion builtins for int/float edges. -- [x] Parse anonymous `record { ... }` type literals and use the existing - aggregate initializer/projection lowering for locals. -- [x] Parse inline ABI attributes on function parameters and return types. -- [x] Parse record field attributes for alignment/packed field layout. -- [x] Allow value blocks for expression control flow to contain preceding - statements before the final unsuffixed expression. -- [x] Give statement switches control scopes and support labelled switch - breaks. -- [x] Support forward record declarations and unresolved pointer fields using - erased pointer storage. -- [x] Emit top-level record data definitions from constant named-field - initializers. -- [x] Parse atomic `access(...)` groups for typed load/store/RMW/cmpxchg - operations. -- [x] Parse atomic `access(...)` groups for legality and lock-free queries. -- [x] Accept keyword-shaped dot constants for atomic RMW operations. -- [x] Use typed local initializer context for `NULL` pointer literals. -- [x] Use function parameter context to resolve enum dot constants in calls. -- [x] Resolve enum dot constants in switch arm labels from selector type. -- [x] Emit floating-point top-level object initializers with - `cfree_cg_data_float`. -- [x] Cover `@unreachable()` as a statement-position terminator builtin. -- [x] Route no-argument low-level intrinsics through CG and cover unsupported - target intrinsic diagnostics with an error test. -- [x] Implement implicit dereference for field access on `*Record`. -- [x] Support field assignment lvalues for records. -- [x] Lower record `.packed` and `.align(N)` layout attributes through record - type construction. -- [x] Support address-of field lvalues. -- [x] Preserve operand type for `@expect`. -- [x] Preserve operand type for integer scalar intrinsics. -- [x] Add negative coverage for direct by-value recursive records. 
-- [x] Support labelled result-typed while expressions. -- [x] Distinguish loop and switch control scopes so `continue` can only target - loops, including unlabeled `continue` through nested switches. -- [x] Allow expression switches over enums to omit `default` only when every - enum value is covered. -- [x] Enforce precedence-island boundaries while allowing parenthesized mixed - islands and normal additive/multiplicative precedence. -- [x] Reject legacy `int`, `&&`, and `||` spellings now that spec-shaped tests - use explicit scalar types and `and`/`or`. -- [x] Lower `@labeladdr` and computed `goto` through the public CG label-address - API, with a CG-level compare-chain fallback for targets without native - label-address branches. -- [x] Support `@pad` and `@align` low-level data initializer builtins in - top-level byte-array object definitions. -- [x] Reject `restrict` qualifiers on non-pointer types while preserving valid - pointer-qualified type parsing. -- [x] Replace accidental direct-recursive-record failures with an explicit - incomplete-record-by-value diagnostic. -- [x] Keep function-local statics in lexical local scope by representing them - as scoped symbol-backed variables instead of global source bindings. -- [x] Support byte-string initializers for function-local static byte arrays. -- [x] Structurally split `lang/toy` into `compile.c`, `lexer.c/.h`, - `internal.h`, `parser_core.c`, `symbols.c`, `literals.c`, and - `parser.c` while preserving the public `cfree_toy_compile` API. -- [x] Replace fixed parser-context arrays for locals, functions, globals, - named types, scopes, and labels with growable `ToyParser`-owned storage - and explicit parser cleanup. -- [x] Split type inspection and named-type registration into `types.c`, with a - `ToyTypeTable` owned by `ToyParser` instead of loose named-type arrays. -- [x] Split declaration, ABI, field, and record attribute parsing into - `attrs.c` while preserving focused coverage for attribute-heavy cases. 
-- [x] Add `@pcrel` lowering for typed top-level array initializers and keep - `@symdiff` parsed for supported relocation paths; Mach-O object emission - for `@symdiff` remains a backend-format follow-up. -- [x] Remove legacy untyped atomic, `@index`, prefix-deref, and old - `@va_arg(ap, T)` compatibility spellings, with negative tests for each. -- [x] Remove legacy `@target()` alias in favor of `@target_arch()`, with - negative coverage. -- [x] Track mutability on `ToyVar` and reject assignment to block-local `let` - storage while preserving mutation through pointer pointees. -- [x] Accept tuple field indexes in `@offsetof<T>(N)`. -- [x] Emit top-level tuple record data definitions from positional constant - initializers. -- [x] Add negative coverage for invalid tail calls, including variadic callees - and return-type mismatches. -- [x] Add declaration-order negative coverage for function and type use before - declaration. -- [x] Add invalid declaration, field, and record attribute negative coverage. -- [x] Support `@pcrel` data initializer builtins in typed top-level record and - tuple integer fields as well as arrays. -- [x] Add negative coverage for mutually recursive by-value records through - forward declarations. -- [x] Add declaration coverage for `extern let` and extern thread-local object - attributes. -- [x] Add data initializer negative coverage for `@pcrel` outside initializer - context and invalid pcrel slot widths. -- [x] Add a Toy-owned type metadata layer with `ToyTypeId`, structural type - entries for builtins/arrays/pointers/functions/anonymous records, - nominal entries for aliases/records/tuples/enums, qualifier entries, and - symbol-table links from locals/functions/globals/named types while - preserving existing public-CG lowering. -- [x] Move local/static-local/function/global insertion behind `symbols.c` - helpers and move named-type lookup into `types.c` so parser code stops - owning those table mutation details directly. 
-- [x] Replace computed-goto target fixed scratch storage with parser-owned - growable scratch space, removing the remaining fixed label-target cap. -- [x] Start replacing temporary inline-asm helpers with spec-shaped typed - `@asm<T>` parsing for `outputs(...)`, `inputs(...)`, `clobbers(...)`, - and `flags(...)`, including a runnable one-output/two-input operand case. -- [x] Rewrite the `@asm_int` coverage to spec-shaped typed `@asm<i64>` and - remove the temporary `@asm_int` builtin with negative legacy coverage. -- [x] Rewrite the `@asm_imm` coverage to typed `@asm<i64>` immediate inputs - and remove the temporary `@asm_imm` builtin with negative legacy - coverage. -- [x] Rewrite the `@asm_memory` and `@asm_clobber` coverage to typed - `@asm<void>` with `clobbers(...)`, and remove those temporary helpers - with negative legacy coverage. -- [x] Extend typed `@asm<T>` operands for memory inputs, inout outputs, and - early-clobber constraints; rewrite and remove the remaining temporary - `@asm_mem`, `@asm_inout`, and `@asm_early` helpers with negative legacy - coverage. -- [x] Rewrite the remaining legacy non-generic `@asm(...)`/`@asmnop()` - call sites to `@asm<void>` and remove those temporary parser branches - with negative legacy coverage. -- [x] Rewrite the remaining `@typecheck`, `@byteconst`, and `@fieldtest` - helper call sites to ordinary Toy constants/records and remove those - temporary builtins with negative legacy coverage. -- [x] Replace the implicit built-in `Pair` test type with an ordinary record - declaration and remove the parser/context special case for `Pair`. -- [x] Remove temporary `@target_os()` by fixing Apple ARM64 vararg `va_list` - lowering in the backend and running Toy vararg tests unconditionally, - with negative legacy coverage for `@target_os`. 
-- [x] Support returning expression-control-flow values directly from - `return if`, `return switch`, `return while<T>`, and labelled - result-typed `return label: while<T>` by reusing the typed - value-to-local lowering path. -- [x] Parse typed inline-assembly `clobber_abi(.caller_saved)` groups and - pass them through to `CfreeCgInlineAsm.clobber_abi_sets`. -- [x] Support typed inline-assembly record results by validating record fields - against multiple outputs and materializing those outputs into an - aggregate value for normal field projection. -- [x] Split typed inline-assembly parsing and validation into `lang/toy/asm.c` - so the main parser no longer owns that low-level builtin subgrammar. -- [x] Add negative coverage for typed inline-assembly missing outputs, result - type mismatches, record output count/name mismatches, unknown flags, and - unknown ABI clobber sets. -- [x] Move type parsing into `lang/toy/types.c` beside the Toy type metadata - layer, leaving `parser.c` focused on expressions, statements, and - declarations. -- [x] Preserve source field types for named records separately from erased CG - storage, so forward-declared record pointer fields can be initialized, - assigned, projected, and passed to typed functions after completion. -- [x] Remove the temporary inline-asm `arch(...)` template/clobber selector in - favor of ordinary typed asm strings and `@target_arch()`-based language - tests, with negative coverage for the removed selector. -- [x] Parse typed inline-assembly named input operands of the form - `name = in("constraint", expr)`. -- [x] Split Toy parsing into focused builtin (`builtins.c`), expression/lvalue - (`expr.c`), declaration (`decls.c`), data initializer (`data.c`), and - statement/control-flow (`parser.c`) modules, with shared helper - boundaries declared in `internal.h`. 
diff --git a/doc/X64_PARITY_CHECKLIST.md b/doc/X64_PARITY_CHECKLIST.md @@ -1,389 +0,0 @@ -# x64 parity checklist - -Goal: bring `x86_64` to the same practical coverage as `aarch64` across -standalone asm, disasm, C/toy compilation, object/link output, runtime, and -debug tooling. - -## Status as of 2026-05-21 - -- Fixed an optimized x64 two-address arithmetic hazard where preparing the - destination register could clobber the RHS. This covered integer ALU ops, - scalar FP binops, and variable shifts whose count must be routed through - `%cl`. The motivating failure was `24_tail_arg_permute` computing `b * 2` - into `%r8` and then immediately overwriting `%r8` with `a`. -- Implemented x64 `u64`/FP conversions for `CV_ITOF_U` and `CV_FTOI_U`, closing - two explicit backend panics in scalar conversion coverage. -- Made x64 tail-call handling conservative when stack arguments are involved: - direct emission falls back to a normal call when the current frame cannot - reuse enough incoming stack argument area, and optimized call planning - clears `CG_CALL_TAIL` for stack-argument tail calls. Register-only tail calls - remain eligible. -- Fixed virtual-register materialization so delayed arithmetic/comparisons use - fresh SSA values instead of destructively redefining one of their source - virtual registers. Optimized x64 jump-table lowering is enabled again and the - `123_spec_demo` x64 O1/O2 switch path now uses the table path correctly. -- Fixed x64 cross-test runner overhead by passing `--pull=never` to podman. - `--net=none` blocked container networking, but podman still performed a host - image pull/manifest check before launch. A one-case x64 toy O0/O1/O2 smoke - dropped from roughly 30 seconds per container lookup to roughly 2 seconds. 
-- Verified after these changes: `make bin`; targeted x64 toy execution for - `24_tail_arg_permute`, `25_tail_many_stack_args`, `26_tail_live_pressure`, - `29_tail_cross_arch_stack`, `09_function_params`, `100_record_data_relocation`, - `120_data_symdiff`, `123_spec_demo`, and `65_rounding_conversions`. A full - x64 toy cross run is still pending after the podman runner fix. - -## Status as of 2026-05-21 (parity push) - -Landed across the seven areas; commit `6b82eb5` "Bring x64 to parity with aa64" -(20 files, +3560 / −688). - -- Built the x64 ISA descriptor layer. `src/arch/x64/isa.{h,c}` now holds a - 75-row `x64_insn_table` plus per-format pack/unpack helpers. Encoder - (`emit.c`), decoder (`disasm.c`), and assembler (`asm.c`) all consult the - same table. Adding a new instruction is a one-row change. -- Encoder migration (phase 2): 19 `emit_*` bodies in `src/arch/x64/emit.c` - now build a format struct and call `x64_<format>_pack`. Byte-for-byte - output unchanged (verified by `cmp` against the pre-migration - `123_spec_demo.O2.o`). -- Assembler refactor (phase 3): `src/arch/x64/asm.c` mnemonic dispatch goes - through `find_mnemonic_row` / `parse_and_emit_for_format` instead of the - hand-coded `sym_eq` cascade. -- Disassembler rewrite: hand-coded if/else chain replaced by - `x64_decode_prefixes` → `x64_disasm_find` → `x64_print_operands`. The - `123_spec_demo` jump-table dispatch now disassembles cleanly with zero - `.byte` fallback. -- Codegen bug fix: `emit_extend_rr` was a silent no-op when `src_size >= 32`, - leaving the destination register undefined. Repaired with a `mov dst, src` - when needed. Closes the only baseline x64 toy failure - (`123_spec_demo/X-O2:x64`). -- Darwin x64 ABI seam: new `src/abi/abi_apple_x64.c` exporting - `apple_x64_vtable`; `src/arch/x64/arch.c` now branches on - `CFREE_OS_MACOS`. Previously x86_64-apple-darwin used the Linux SysV - vtable unconditionally. 
-- SysV x64 variadic metadata populated: `vararg_gp_offset` / - `vararg_fp_offset` derived from fixed-arg consumption using named - pool-size constants. -- Linker dynamic relocations: `src/link/link_reloc.c` handles - `R_X86_64_RELATIVE`, `R_X86_64_GLOB_DAT`, `R_X86_64_JUMP_SLOT`; `R_X86_64_COPY` - now panics with a clear message instead of falling through to the generic - "unsupported reloc kind" path. -- libc test harnesses parameterized: `test/libc/{musl,glibc}/run.sh` honour - `CFREE_LIBC_ARCHES` (default `aa64`; `x64` available). Per-arch - sysroot/rt/triple/loader lookup with graceful SKIP when artifacts missing. - `test/test.mk` wires the per-arch sysroot prerequisites. - `test/libc/cases/01_syscall_write.c` splits into per-arch syscall ABI - branches under `#ifdef`. -- Inline asm: new `test/arch/x64_inline_test.c` with 6 smoke cases driving - `CGTarget->asm_block` directly; `asm.c` gains the `%b` byte-register - modifier and a full 8-bit register spelling table. -- Debugger scaffolding: new `src/arch/x64/dbg.c` with `INT3` sentinel and a - conservative shim builder that declines on RIP-relative operands. - `src/dbg/displaced.c` and `src/dbg/step.c` widen arch dispatch — x64 now - falls back to `CFREE_UNSUPPORTED` gracefully instead of failing in - `dbg_displaced_prepare`. -- Verification: `make test` 3616/0/0; x64 toy R/L/X 1286/0/0 (baseline - 1285/1/0); x64 musl 18/18 (9 static + 9 dynamic); x64 glibc 9/9. - -## Status as of 2026-05-21 (x64 runtime-linked push) - -- `driver/runtime.c` now auto-builds the x64 runtime archive with the same - higher-level freestanding members that `rt/Makefile` builds for x64: - assert, `si_div`, string, stdlib, qsort, printf, cache, atomics, ifunc, - int/int64, and coroutine sources. -- Added `cc-auto-builds-and-links-libcfree-rt-x64` to `test/driver/run.sh`. 
- The regression builds an x64 executable through `cfree cc --support-dir`, - forces implicit `build/rt/x86_64-linux/libcfree_rt.a` creation, and checks - that the auto-built archive contains `printf.c`. -- Verified clean x64 runtime rebuilds for Linux and Darwin: - `rm -rf build/rt/x86_64-linux && make rt-x86_64-linux` and - `rm -rf build/rt/x86_64-apple-darwin && make rt-x86_64-apple-darwin`. - The x64 coroutine source `rt/lib/coro/x86_64.c` is compiled through - `build/cfree cc` in both variants. -- Verified x64 runtime-linked execution on Darwin/arm64 via Podman - linux/amd64: `CFREE_RT_RUNTIME_ARCHES=x64 bash test/rt/run.sh` passed - 5/0/0, and an explicit driver-auto-built x64 runtime binary exited 42 under - `podman run --platform linux/amd64`. - -## Asm / disasm - -- [x] Expand `src/arch/x64/asm.c` beyond the current small AT&T subset: - branches, calls, arithmetic, shifts, compares, loads/stores, LEA, atomics, - SSE scalar FP, and backend-emitted forms. - - 2026-05-21: rewritten to be table-driven via `x64_insn_table`. Every - mnemonic the prior dispatch handled flows through the table; new - mnemonics land as one row + a format parser. Mnemonics outside the - current corpus are not yet wired (per-format parsers exist only for - the formats the standalone-asm and inline-asm tests exercise today — - see phase-3 report for the list). -- [x] Build an x64 ISA descriptor layer equivalent in role to - `src/arch/aa64/isa.{h,c}` so encoder, decoder, printer, and tests share - one instruction description. - - 2026-05-21: `src/arch/x64/isa.{h,c}` landed; encoder, decoder, and - assembler all consult `x64_insn_table`. -- [x] Expand `src/arch/x64/disasm.c` to decode every instruction emitted by - x64 codegen and every standalone-asm form accepted by the assembler. - - 2026-05-21: `disasm.c` now drives entirely through `x64_disasm_find` - + `x64_print_operands`. 
Cross-checked against `llvm-objdump` on the - spec_demo binary — operand syntax matches instruction-by-instruction. -- [x] Add x64 listing tests under `test/asm/listing/`. - - 2026-05-22: added `x64_symbols` listing coverage for function/local - labels and x64 PC-relative relocation annotations. -- [ ] Make asm round-trip (`S`) meaningful for x64 codegen output and gate the - x64-emitted corpus on it. -- [x] Update `test/asm/regen.sh` or add an x64 variant for clang/objdump golden - regeneration. - - 2026-05-22: `CFREE_TEST_ARCH=x64 test/asm/regen.sh ...` now filters - by `.targets`, uses the x86_64 clang target, and regenerates x64 - encode/decode/listing goldens. - -## Inline asm - -- [x] Broaden x64 inline-asm template rendering to cover operand modifiers and - memory forms expected by GNU-style x86_64 asm. - - 2026-05-21: `%b` byte-register modifier landed with a full r0..r15 - byte-name table. `%h` (high-byte), `%k` (32-bit alias), and `%z` - (instruction-size selector) remain unimplemented. - - 2026-05-21: GNU x86 register modifiers now render on x64: - `%w` = 16-bit, `%k` = 32-bit, `%h` = high-byte register where legal, - `%b` handles low byte registers including REX-only byte names, and - `%z` selects the instruction suffix from operand type. Symbolic - `%[name]` operands work with the same modifier path. -- [x] Add an x64 inline-asm unit test parallel to `test/arch/aa64_inline_test.c`. - - 2026-05-21: `test/arch/x64_inline_test.c` lands with 6 smoke cases - and a `test-x64-inline` Makefile target wired into `make test`. -- [ ] Verify register clobbers, `"cc"`, `"memory"`, callee-saved preservation, - early-clobber, matching constraints, and named operands on x64. -- [x] Add C and toy inline-asm execution cases that run on an x64 host/runner. - - 2026-05-21: added `cg_x64_inline_asm_modifiers.c`; verified x64 parse - R/E paths at O0 and O1 alongside the existing x64 inline asm C smoke. 
- -## C / toy codegen - -- [~] Close remaining explicit x64 backend panics in `src/arch/x64/ops.c` - (`u64`/FP conversions, unsupported bitcasts, non-constant memset byte - paths, indirect aggregate arg shapes, tail-call/sret gaps, and other - `unsupported`/`unimpl` paths). - - 2026-05-21: `u64`/FP conversions are implemented; tail-call stack-arg - cases are handled conservatively rather than panicking or emitting an - invalid sibling tail call. - - 2026-05-21 (parity push): the remaining `unsupported`/`unimpl` paths - (same-class bitcast, tail+sret, indirect aggregate args, memset - non-imm byte, alloca align >16, exotic atomic op kinds, x64-unique - "shift count kind") are *all* mirrored in aa64 — they are shared - architectural gaps, not x64-specific regressions. Leaving this row - partially checked until the corresponding aa64 gaps close too. -- [~] Match aa64 coverage for scalar integer, FP, pointer, aggregate, varargs, - atomics, intrinsics, labels, computed goto, switch lowering, and alloca. - - 2026-05-21: scalar optimized integer/FP RHS clobbers, variable shift - count clobbers, and optimized x64 jump-table virtual-reg materialization - are fixed. - - 2026-05-21 (parity push): `emit_extend_rr` 32→64 silent-no-op fixed - (was leaving destination uninitialized). x64 toy R/L/X 1286/0/0 — - feature parity with aa64 for the toy corpus is reached. -- [ ] Prove x64 optimized and unoptimized C parse corpus paths with targeted - `CFREE_TEST_ARCH=x64` runs. -- [x] Prove toy cross-arch path `X` for x64 alongside aa64 cases. - - 2026-05-21: targeted x64 `X` runs pass for the tail-call, conversion, - data-relocation, and switch regression cases listed above. Full x64 `X` - run should be repeated after the podman `--pull=never` runner fix. - - 2026-05-21 (parity push): full x64 R/L/X toy run: 1286/0/0. 
- -## ABI / platform - -- [x] Finish SysV x86_64 ABI edge cases: aggregate classification, register save - area, variadic call metadata (`AL`), sret, byval, and mixed int/FP returns. - - 2026-05-21: variadic metadata (`vararg_gp_offset`, - `vararg_fp_offset`) now populated by `sysv_x64_compute_func_info`. - At that point mixed int/FP aggregate classification was still pending. - - 2026-05-21: SysV aggregate classification now computes INTEGER/SSE - per eightbyte for small records, including mixed int/FP records and - homogeneous float pairs. x64 call planning/direct emission now routes - direct multi-part args wholly to stack when either register pool lacks - capacity, preserves indirect sret sources that conflict with `%rdi` or - `%rax`, and accepts global byval sources. Added ABI metadata coverage - plus x64 parse execution cases for mixed record params/returns. -- [x] Decide and implement x86_64 Darwin ABI differences where they diverge from - Linux/SysV behavior. - - 2026-05-21: `apple_x64_vtable` seam added (thin delegate to SysV - today). `x64_abi_vtable` branches on `CFREE_OS_MACOS`. Future - Darwin-only behaviour can land in `abi_apple_x64.c` without - re-touching SysV. -- [ ] Implement x86_64 `long double` semantics (`x87` 80-bit in 16-byte - storage) or document a staged compatibility mode. -- [ ] Audit predefined macros, target triples, and driver target selection for - Linux and Darwin x86_64 parity. - -## Object / link / driver - -- [x] Ensure ELF x86_64 relocations cover all codegen, asm, TLS, PLT/GOT, ifunc, - and linker-script cases currently passing for aa64. - - 2026-05-21: `link_reloc.c` adds the missing `R_X86_64_RELATIVE`, - `R_X86_64_GLOB_DAT`, `R_X86_64_JUMP_SLOT` cases (previously fell - through to a generic panic) and gives `R_X86_64_COPY` a descriptive - error. Static/dynamic ELF link cases pass for x64 musl + glibc. 
- - 2026-05-21: object relocation iteration now reports x86_64 ELF - relocation names, and x64 ELF roundtrip/link paths cover `PLT32`, - `PC32`, GOTPCREL/GOTPCRELX, TLS local-exec, ifunc, dynamic - `RELATIVE`/`GLOB_DAT`/`JUMP_SLOT`, and linker-script cases. -- [ ] Bring Mach-O x86_64 object/link coverage up to the aa64 Mach-O subset. - - Ignored for this ELF-only pass. -- [x] Exercise `cfree as`, `cc`, `ld`, `objdump`, `run`, and `emu` paths with - x64-specific tests where the command is intended to support x64. - - 2026-05-21: `cfree as` and `cfree objdump` confirmed for x64 via - round-trip demo. `cfree cc` / `cfree ld` covered by toy R/L/X and - musl/glibc suites. `emu` remains aa64/rv64-only by current design. -- [x] Add x64 object disassembly annotation coverage for symbols and relocs. - - 2026-05-21: `cfree_disasm_iter` now matches relocations anywhere - inside the decoded instruction byte range, with section filtering so - same-offset relocs in other text sections do not bleed through. - `test/elf/unit/x64_disasm_annotations.c` covers symbol labels plus - `call`, RIP-relative load, and `jmp` reloc annotations. - -## Runtime / libc - -- [x] Build `libcfree_rt.a` for x86_64 Linux and Darwin through cfree, not just - host clang probes. -- [x] Bring x86_64 coroutine/runtime assembly and C sources through the cfree - assembler/compiler path. - - 2026-05-21: clean `rt-x86_64-linux` and - `rt-x86_64-apple-darwin` rebuilds compile the full x64 source set, - including `rt/lib/coro/x86_64.c`, through `build/cfree cc`. The driver - auto-build path now includes the same higher-level x64 runtime members - needed by runtime-linked binaries. -- [x] Retarget musl/glibc libc harnesses to x64 sysroots and run the same cases - currently exercised for aa64. - - 2026-05-21: `test/libc/{musl,glibc}/run.sh` honour - `CFREE_LIBC_ARCHES` (default `aa64`; `x64` available). x64 musl 18/18, - x64 glibc 9/9. 
-- [x] Add x64 smoke cases that use cfree-emitted bytes, not only clang-produced - harness binaries. - - 2026-05-21: `test/driver/run.sh` adds - `cc-auto-builds-and-links-libcfree-rt-x64`; an explicit - driver-auto-built x64 runtime binary was run via Podman linux/amd64 - and exited 42. - -## Debug / JIT / tooling - -- [~] Add x64 displaced-step/debugger support: `INT3`, RIP-relative fixups, - ucontext register marshalling, and frame walking. - - 2026-05-21: scaffold landed (`src/arch/x64/dbg.c`); `dbg_x64_int3_byte` - + a conservative `dbg_x64_build_shim` that declines on RIP-relative. - `dbg_displaced_prepare` and `dbg_step_resume` dispatch x64 to the new - path, falling back to `CFREE_UNSUPPORTED` gracefully. Real shim - generation (ModR/M decoder + RIP-relative re-encoding) is the next - step. -- [ ] Emit and validate x64 DWARF CFI/line-info details, including frame-pointer - conventions and call-frame rows. -- [ ] Fill x64 JIT support gaps: executable memory, relocations, symbol calls, - TLV/TLS behavior, and native-host execution tests. -- [ ] Decide emulator scope for x86_64; either implement it or mark `emu` as - non-parity for x64. - -## Known pre-existing x64 issue - -- aa64/01_syscall_write [dynamic] musl link is killed by SIGKILL inside - `cfree ld` (deterministic, rc=137). Reproduces with this commit reverted — - not a regression from the parity push. The trigger appears to be the file- - scope inline-asm shape × aa64 dynamic-PIE codepath; other 8 aa64 cases in - the same suite link and run cleanly, and x64 dynamic-PIE works on every - case. Worth a follow-up investigation in the linker. - -## Asm / disasm - -- [x] Expand `src/arch/x64/asm.c` beyond the current small AT&T subset: - branches, calls, arithmetic, shifts, compares, loads/stores, LEA, atomics, - SSE scalar FP, and backend-emitted forms. 
-- [x] Build an x64 ISA descriptor layer equivalent in role to - `src/arch/aa64/isa.{h,c}` so encoder, decoder, printer, and tests share - one instruction description. -- [x] Expand `src/arch/x64/disasm.c` to decode every instruction emitted by - x64 codegen and every standalone-asm form accepted by the assembler. -- [x] Add x64 listing tests under `test/asm/listing/`. -- [ ] Make asm round-trip (`S`) meaningful for x64 codegen output and gate the - x64-emitted corpus on it. -- [x] Update `test/asm/regen.sh` or add an x64 variant for clang/objdump golden - regeneration. - -## Inline asm - -- [ ] Broaden x64 inline-asm template rendering to cover operand modifiers and - memory forms expected by GNU-style x86_64 asm. -- [ ] Add an x64 inline-asm unit test parallel to `test/arch/aa64_inline_test.c`. -- [ ] Verify register clobbers, `"cc"`, `"memory"`, callee-saved preservation, - early-clobber, matching constraints, and named operands on x64. -- [ ] Add C and toy inline-asm execution cases that run on an x64 host/runner. - -## C / toy codegen - -- [ ] Close remaining explicit x64 backend panics in `src/arch/x64/ops.c` - (`u64`/FP conversions, unsupported bitcasts, non-constant memset byte - paths, indirect aggregate arg shapes, tail-call/sret gaps, and other - `unsupported`/`unimpl` paths). - - 2026-05-21: `u64`/FP conversions are implemented; tail-call stack-arg - cases are handled conservatively rather than panicking or emitting an - invalid sibling tail call. -- [ ] Match aa64 coverage for scalar integer, FP, pointer, aggregate, varargs, - atomics, intrinsics, labels, computed goto, switch lowering, and alloca. - - 2026-05-21: scalar optimized integer/FP RHS clobbers, variable shift - count clobbers, and optimized x64 jump-table virtual-reg materialization - are fixed. -- [ ] Prove x64 optimized and unoptimized C parse corpus paths with targeted - `CFREE_TEST_ARCH=x64` runs. -- [ ] Prove toy cross-arch path `X` for x64 alongside aa64 cases. 
- - 2026-05-21: targeted x64 `X` runs pass for the tail-call, conversion, - data-relocation, and switch regression cases listed above. Full x64 `X` - run should be repeated after the podman `--pull=never` runner fix. - -## ABI / platform - -- [x] Finish SysV x86_64 ABI edge cases: aggregate classification, register save - area, variadic call metadata (`AL`), sret, byval, and mixed int/FP returns. - - 2026-05-21: completed in the parity follow-up above; long double - remains tracked separately. -- [ ] Decide and implement x86_64 Darwin ABI differences where they diverge from - Linux/SysV behavior. -- [ ] Implement x86_64 `long double` semantics (`x87` 80-bit in 16-byte - storage) or document a staged compatibility mode. -- [ ] Audit predefined macros, target triples, and driver target selection for - Linux and Darwin x86_64 parity. - -## Object / link / driver - -- [ ] Ensure ELF x86_64 relocations cover all codegen, asm, TLS, PLT/GOT, ifunc, - and linker-script cases currently passing for aa64. -- [ ] Bring Mach-O x86_64 object/link coverage up to the aa64 Mach-O subset. -- [ ] Exercise `cfree as`, `cc`, `ld`, `objdump`, `run`, and `emu` paths with - x64-specific tests where the command is intended to support x64. -- [ ] Add x64 object disassembly annotation coverage for symbols and relocs. - -## Runtime / libc - -- [x] Build `libcfree_rt.a` for x86_64 Linux and Darwin through cfree, not just - host clang probes. -- [x] Bring x86_64 coroutine/runtime assembly and C sources through the cfree - assembler/compiler path. -- [ ] Retarget musl/glibc libc harnesses to x64 sysroots and run the same cases - currently exercised for aa64. -- [x] Add x64 smoke cases that use cfree-emitted bytes, not only clang-produced - harness binaries. - -## Debug / JIT / tooling - -- [ ] Add x64 displaced-step/debugger support: `INT3`, RIP-relative fixups, - ucontext register marshalling, and frame walking. 
-- [ ] Emit and validate x64 DWARF CFI/line-info details, including frame-pointer
-  conventions and call-frame rows.
-- [ ] Fill x64 JIT support gaps: executable memory, relocations, symbol calls,
-  TLV/TLS behavior, and native-host execution tests.
-- [ ] Decide emulator scope for x86_64; either implement it or mark `emu` as
-  non-parity for x64.
-
-## Test policy
-
-- [ ] Add x64-targeted filters/goldens for each new feature as it lands.
-- [ ] Keep skips explicit and arch-scoped; do not let x64 cases silently ride
-  aa64 defaults.
-- [ ] Promote targeted x64 runs into default or CI-equivalent coverage once they
-  are stable on available runners.
-- [x] Prevent podman-backed cross tests from hitting registries during normal
-  execution by using `--pull=never`; test images must be prepared explicitly.
diff --git a/doc/api-migration.md b/doc/api-migration.md
@@ -1,304 +0,0 @@
-# Public API migration
-
-The single `<cfree.h>` header has been replaced by a component-split
-public surface under `<cfree/...>`. Headers in `include/` are the new
-contract — internals (`src/`, `lang/`, `driver/`, `test/`) must be
-rewritten to match. **No compat shims.**
-
-## New header layout
-
-| Header | What it covers |
-| --- | --- |
-| `cfree/core.h` | `CfreeCompiler`, `CfreeContext`, `CfreeHeap`, `CfreeDiagSink`, `CfreeWriter`, `CfreeFileIO`, `CfreeMetrics`, `CfreeTarget`, `CfreeStatus`, `CfreeIterResult`, `CfreeSrcLoc`, `CfreeBytes`, `CfreeSymBind/Kind`, `CfreeSym`, lifecycle, `cfree_writer_mem`. |
-| `cfree/source.h` | `cfree_source_add_*`, `CfreeSourceFile`. |
-| `cfree/support/arena.h` | Public arena (used by frontends and link script parser). |
-| `cfree/support/hashmap.h` | Public hashmap. |
-| `cfree/objmodel.h` | Format-neutral object types: `CfreeObjSection/Symbol/Group`, `CfreeObjSecInfo`, `CfreeObjSymInfo`, `CfreeObjReloc`, `CfreeRelocKind`. |
-| `cfree/objbuild.h` | `CfreeObjBuilder` API: `cfree_obj_builder_*`. |
-| `cfree/object.h` | `CfreeObjFile` reader: `cfree_obj_open` etc. |
-| `cfree/compile.h` | `cfree_compile_c_obj{,_emit}`, `_asm_`, `_source_`. `CfreeCCompileOptions`, `CfreeAsmCompileOptions`, `CfreeFrontendCompileOptions`. `CfreeLanguage`, `CfreeSourceInput`, `cfree_register_frontend`, dep iter. |
-| `cfree/link.h` | `CfreeExeLinkOptions`, `CfreeSharedLinkOptions`, `CfreeJitLinkOptions`, `CfreeLinkScript`, parse helper. Takes `CfreeJitHost` for JIT link. |
-| `cfree/jit.h` | `CfreeJit*`, `CfreeJitHost { execmem, tls }`, image inspection. |
-| `cfree/dbg.h` | `CfreeJitSession`, `CfreeDbgHost { os }`, breakpoints/resume. |
-| `cfree/emu.h` | `cfree_emu_run/new/step/lookup/free`. |
-| `cfree/dwarf.h` | `CfreeDebugInfo`, `cfree_dwarf_open/free`, query API. `loc_read` takes a memory-reader callback, not a JIT session. |
-| `cfree/arch.h` | `CfreeArchReg`, `CfreeUnwindFrame`, register name/index helpers. |
-| `cfree/archive.h` | `cfree_ar_*` over `CfreeBytes` + `CfreeContext`. |
-| `cfree/disasm.h` | `cfree_disasm_iter_new` over a `CfreeDisasmContext`, `cfree_disasm_obj`. |
-| `cfree/frontend.h` | Frontend convenience: includes `cg.h`, `source.h`, `support/arena.h`; declares `cfree_frontend_run`, metrics bridge, fatal helpers. |
-| `cfree/cg.h` | Codegen API. Includes `cfree/core.h` + `cfree/objbuild.h`. |
-
-`<cfree.h>` no longer exists. Every TU must include only what it needs.
-
-## Type-level renames and reshapes
-
-### `CfreeEnv` → `CfreeContext`
-
-```c
-typedef struct CfreeContext {
-  CfreeHeap *heap;
-  const CfreeFileIO *file_io; /* may be NULL */
-  CfreeDiagSink *diag;
-  const CfreeMetrics *metrics; /* may be NULL */
-  int64_t now; /* negative when host has no clock */
-} CfreeContext;
-```
-
-`CfreeEnv.execmem`, `.dbg_os`, `.jit_tls` are **removed**. They are now
-passed as `CfreeJitHost { execmem, tls }` to `cfree_link_jit` and
-`CfreeDbgHost { os }` to `cfree_jit_session_new`.
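As a rough illustration of the `CfreeEnv` → `CfreeContext` split described in the deleted note, the sketch below models how the three removed fields might be redistributed into a JIT host and a debugger host. It is not taken from the repository: `OldEnvModel` and `split_old_env` are hypothetical names, the `void *` member types are placeholders because the note does not spell them out, and the assumption that the remaining `CfreeEnv` fields matched `CfreeContext` one-to-one is unverified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opaque stand-ins; the real definitions live in the <cfree/...> headers. */
typedef struct CfreeHeap CfreeHeap;
typedef struct CfreeFileIO CfreeFileIO;
typedef struct CfreeDiagSink CfreeDiagSink;
typedef struct CfreeMetrics CfreeMetrics;

/* Field list as given in the migration note. */
typedef struct CfreeContext {
  CfreeHeap *heap;
  const CfreeFileIO *file_io;  /* may be NULL */
  CfreeDiagSink *diag;
  const CfreeMetrics *metrics; /* may be NULL */
  int64_t now;                 /* negative when host has no clock */
} CfreeContext;

/* ASSUMPTION: member types are not given in the note; void* is a placeholder. */
typedef struct CfreeJitHost { void *execmem; void *tls; } CfreeJitHost;
typedef struct CfreeDbgHost { void *os; } CfreeDbgHost;

/* HYPOTHETICAL model of the old CfreeEnv layout, for illustration only. */
typedef struct OldEnvModel {
  CfreeContext common; /* fields assumed to survive into CfreeContext */
  void *execmem;
  void *dbg_os;
  void *jit_tls;
} OldEnvModel;

/* HYPOTHETICAL helper: split one old-style env into the three new pieces. */
static void split_old_env(const OldEnvModel *env, CfreeContext *ctx,
                          CfreeJitHost *jit_host, CfreeDbgHost *dbg_host) {
  assert(env && ctx && jit_host && dbg_host);
  *ctx = env->common;               /* context keeps only the shared fields */
  jit_host->execmem = env->execmem; /* later handed to the JIT link step */
  jit_host->tls = env->jit_tls;
  dbg_host->os = env->dbg_os;       /* later handed to the debug session */
}
```

In this model the context stays free of JIT and debugger state, so callers that only compile or read objects would never need to construct either host.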
-
-Internal: `Compiler.env` becomes `Compiler.ctx` (type `const
-CfreeContext*`). `CfreeContext cfree_compiler_context(CfreeCompiler*)`
-is the public accessor that returns a value copy.
-
-### `CfreeBytesInput` → split
-
-The old single shape carried `name + data + len + lang`. It is now two:
-
-```c
-typedef struct CfreeBytes { /* used everywhere except source compile */
-  const char *name;
-  const uint8_t *data;
-  size_t len;
-} CfreeBytes;
-
-typedef struct CfreeSourceInput { /* used by cfree_frontend_compile */
-  CfreeBytes bytes;
-  CfreeLanguage lang;
-} CfreeSourceInput;
-```
-
-Linker archive input is now `CfreeLinkArchiveInput { bytes + flags... }`,
-not `CfreeBytesInputArchive`.
-
-### `CfreeCompileOptions` → split per language
-
-```c
-typedef struct CfreeCodeOptions {
-  int opt_level, debug_info;
-  uint64_t epoch;
-  const CfreePathPrefixMap *path_map;
-  uint32_t npath_map;
-} CfreeCodeOptions;
-
-typedef struct CfreePreprocessOptions {
-  const char *const *include_dirs; uint32_t ninclude_dirs;
-  const char *const *system_include_dirs; uint32_t nsystem_include_dirs;
-  const CfreeDefine *defines; uint32_t ndefines;
-  const char *const *undefines; uint32_t nundefines;
-} CfreePreprocessOptions;
-
-typedef struct CfreeDiagnosticOptions {
-  int warnings_are_errors;
-  uint32_t max_errors;
-} CfreeDiagnosticOptions;
-
-typedef struct CfreeCCompileOptions {
-  CfreeCodeOptions code;
-  CfreePreprocessOptions preprocess;
-  CfreeDiagnosticOptions diagnostics;
-} CfreeCCompileOptions;
-
-typedef struct CfreeAsmCompileOptions {
-  CfreeCodeOptions code;
-  CfreeDiagnosticOptions diagnostics;
-} CfreeAsmCompileOptions;
-
-typedef struct CfreeFrontendCompileOptions {
-  CfreeCodeOptions code;
-  CfreeDiagnosticOptions diagnostics;
-  const void *language_options;
-} CfreeFrontendCompileOptions;
-```
-
-`cfree_compile_obj` → `cfree_compile_c_obj`, `cfree_compile_asm_obj`,
-registered frontends are driven through `cfree_frontend_new`,
-`cfree_frontend_compile`, and `cfree_frontend_free`. The old
-`cfree_compile_source_obj{,_emit}` convenience entrypoints were removed.
-Frontend hook signature is now
-`CfreeStatus (*)(CfreeFrontendState*, const CfreeFrontendCompileOptions*, const CfreeSourceInput*, CfreeObjBuilder*)`.
-
-### Status-returning APIs
-
-Every entry that used to return `int` (0 ok, nonzero error) or a pointer
-(NULL on failure) returns `CfreeStatus` and writes the result to an out
-parameter. Examples:
-
-| Old | New |
-| --- | --- |
-| `CfreeCompiler* cfree_compiler_new(t,e)` | `CfreeStatus cfree_compiler_new(t, ctx, CfreeCompiler **out)` |
-| `CfreeWriter* cfree_writer_mem(h)` | `CfreeStatus cfree_writer_mem(h, CfreeWriter **out)` |
-| `CfreeArena* cfree_arena_new(h, blk)` | `CfreeStatus cfree_arena_new(h, blk, CfreeArena **out)` |
-| `CfreeObjFile* cfree_obj_open(env, in)` | `CfreeStatus cfree_obj_open(ctx, bytes, CfreeObjFile **out)` |
-| `void cfree_obj_close` | `void cfree_obj_free` |
-| `CfreeObjSecInfo cfree_obj_section(o,i)` | `CfreeStatus cfree_obj_section(o, i, CfreeObjSecInfo *out)` |
-| `const u8* cfree_obj_section_data(o,i,*)` | `CfreeStatus cfree_obj_section_data(o, i, const uint8_t **out, size_t *len_out)` |
-| `CfreeObjSymIter* cfree_obj_symiter_new` | `CfreeStatus cfree_obj_symiter_new(file, CfreeObjSymIter **out)`; iterator next returns `CfreeIterResult`. |
-| `int cfree_obj_symiter_next(it, *out)` | `CfreeIterResult cfree_obj_symiter_next(it, CfreeObjSymInfo *out)` |
-| `CfreeDebugInfo* cfree_dwarf_open(c,o)` | `CfreeStatus cfree_dwarf_open(ctx, obj, CfreeDebugInfo **out)` |
-| `void cfree_dwarf_close` | `void cfree_dwarf_free` |
-| `int cfree_dwarf_*` (queries) | `CfreeStatus cfree_dwarf_*` (queries; semantics carried by status enum) |
-| `CfreeArIter* cfree_ar_iter_init(it,b)` | `CfreeStatus cfree_ar_iter_new(ctx, bytes, CfreeArIter **out)`; iterator next returns `CfreeIterResult`. |
-| `CfreeJit* cfree_link_jit(...)` | `CfreeStatus cfree_link_jit(c, opts, host, CfreeJit **out_jit)` |
-| `CfreeJitSession* cfree_jit_session_new` | `CfreeStatus cfree_jit_session_new(jit, dbghost, CfreeJitSession **out)` |
-| `int cfree_jit_session_*` | `CfreeStatus cfree_jit_session_*` |
-| `CfreeDisasmIter* cfree_disasm_iter_new` | `CfreeStatus cfree_disasm_iter_new(const CfreeDisasmContext*, bytes, len, vaddr, const CfreeObjFile* annot, CfreeDisasmIter **out)`. Iterator next returns `CfreeIterResult`. |
-| `int cfree_obj_disasm` | `CfreeStatus cfree_disasm_obj(ctx, objfile, w)` and `cfree_disasm_obj_bytes(ctx, bytes, w)`. |
-| `int cfree_register_frontend` | `CfreeStatus cfree_register_frontend` |
-| `int cfree_link_script_parse(c, t, l, *)` | `CfreeStatus cfree_link_script_parse(const CfreeContext*, t, l, CfreeLinkScript **out)`; pair-free signature `cfree_link_script_free(const CfreeContext*, CfreeLinkScript*)`. |
-| `u32 cfree_source_add_file(c,p,sys)` | `CfreeStatus cfree_source_add_file(c, p, sys, uint32_t *id_out)` (analogous for memory/builtin/include/file). |
-| `int cfree_arch_register_index/at` | `CfreeStatus` variants. |
-
-`CfreeWriter` vtable now returns `CfreeStatus` on `write` and `seek`,
-exposes `status` (not `error`); the dispatch helpers in
-`cfree/core.h` return `CfreeStatus`.
-
-`CfreeFileIO.read_all` and `.open_writer` now return `CfreeStatus` and
-take an out-parameter (already declared in the new header).
-
-`CfreeDbgOs` vtable methods that used to return `int` now return
-`CfreeStatus` (e.g. `thread_start`, `event_new` takes `void **event_out`).
-
-### `cfree_pipeline_*` is gone
-
-The driver synthesizes its own thin orchestrator (one `CfreeCompiler` +
-the call sequence). All `cfree_pipeline_*` call sites in `driver/` must
-inline the equivalent compose: build compiler → compile_c/asm → keep
-builder live → link.
- -### `CfreeJit` linker host - -`cfree_link_jit` now requires a `const CfreeJitHost*` and writes the -result through `CfreeJit **out_jit`. The host bundles `execmem` + `tls` -that used to live on `CfreeEnv`. Drivers construct one per build. - -### Object-builder public API - -`cfree/objbuild.h` exposes the format-neutral build API atop the -internal `obj_*`. The public surface uses `CfreeSym` (interned through -the compiler) for section/symbol names, `CfreeObjSection`/`Symbol`/`Group` -opaque-int handles, and `CfreeRelocKind { arch, obj_fmt, code }` for -relocations. The implementation lives in `src/api/objbuilder.c` and is a -thin adapter around `src/obj/obj.h`. Section indices wire through 1:1; -the public API uses `CFREE_SECTION_NONE` = `UINT32_MAX` while the -internal sentinel is `OBJ_SEC_NONE = 0` — convert at the boundary. - -### Object-file public reader - -`CfreeObjFile` is the public read handle. It can be: - - - opened from bytes via `cfree_obj_open(const CfreeContext*, const CfreeBytes*, CfreeObjFile **out)`, - - obtained for inspection from a `CfreeJit` via `cfree_jit_view(jit)` - (the returned `CfreeObjFile *` is non-owned). - -Internally the reader keeps a borrowed `ObjBuilder*` (so symbol/reloc -iteration reuses the existing read path). The public API never exposes -the internal handle; everything is funnelled through `CfreeObjFmt`, -`CfreeObjSecInfo`, `CfreeObjSymInfo`, `CfreeObjReloc`. - -### Source registration - -`source_add_*` internal functions now return `CfreeStatus` and write the -new file id to an out parameter (the public API requires this; the -internal callers are easier to update at the same time). The public -`CfreeSourceFile` is unchanged in shape. - -### Arena public type - -`cfree_arena_new` returns `CfreeStatus`. Callers receive the arena -through an out pointer. The macros (`cfree_arena_new_obj`, etc.) are -unchanged. 
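The status-plus-out-parameter shape that `cfree_arena_new` adopts can be sketched as follows. This is an illustration under stated assumptions: `DemoArena`, `DemoStatus`, and `demo_arena_new` are hypothetical and only mirror the documented pattern (status return, result written through an out pointer), not the real arena implementation.

```c
#include <stdlib.h>

/* Hypothetical mirror of a few status codes; not the real enum. */
typedef enum { DEMO_OK = 0, DEMO_NOMEM, DEMO_INVALID } DemoStatus;

typedef struct DemoArena {
  size_t block_size;
} DemoArena;

/* New shape: the status says why creation failed, the out pointer
 * carries the result. The out pointer is cleared on every failure path. */
static DemoStatus demo_arena_new(size_t block_size, DemoArena **out) {
  DemoArena *a;
  if (out == NULL) return DEMO_INVALID;
  *out = NULL;
  if (block_size == 0) return DEMO_INVALID;
  a = malloc(sizeof *a);
  if (a == NULL) return DEMO_NOMEM;
  a->block_size = block_size;
  *out = a;
  return DEMO_OK;
}

static void demo_arena_free(DemoArena *a) { free(a); }
```

A caller then writes `DemoStatus st = demo_arena_new(4096, &arena); if (st != DEMO_OK) ...` instead of testing a returned pointer against `NULL`.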
- -### Per-component status codes - -The full enum: - -```c -typedef enum CfreeStatus { - CFREE_OK = 0, CFREE_ERR, CFREE_NOMEM, CFREE_INVALID, CFREE_UNSUPPORTED, - CFREE_MALFORMED, CFREE_IO, CFREE_NOT_FOUND, CFREE_AMBIGUOUS, -} CfreeStatus; -``` - -Pick the most specific one available. Old return semantics map as: -- bad argument → `CFREE_INVALID` -- allocation failure → `CFREE_NOMEM` -- input not found in DWARF / link script / archive → `CFREE_NOT_FOUND` -- ambiguous DWARF line resolution → `CFREE_AMBIGUOUS` -- malformed bytes (bad magic, truncated input) → `CFREE_MALFORMED` -- IO error from a `CfreeWriter` or `CfreeFileIO` → `CFREE_IO` -- generic compile failure with diagnostics already emitted → `CFREE_ERR` -- unsupported feature / arch → `CFREE_UNSUPPORTED` - -## Translation rubric for call-site updates - -1. **Headers** — replace `#include <cfree.h>` with the specific - `<cfree/X.h>` headers actually needed. Frontends include - `<cfree/frontend.h>`. Drivers compose what they need. Internals - include `src/...` and the relevant `cfree/X.h`. - -2. **`CfreeEnv` → `CfreeContext`** — drop `execmem/dbg_os/jit_tls` - fields at construction sites; pass them to the JIT/dbg hosts later. - Internal `Compiler.env` becomes `Compiler.ctx`. - -3. **`CfreeBytesInput` for non-source uses** → `CfreeBytes`. Drop the - `lang` field. For source compile entries, build a - `CfreeSourceInput`. - -4. **`CfreeCompileOptions` users** — pick the right specialization: - - C → `CfreeCCompileOptions { .code, .preprocess, .diagnostics }` - - asm → `CfreeAsmCompileOptions { .code, .diagnostics }` - - registered frontend → `CfreeFrontendCompileOptions` - -5. **Return value rewrites** — for every API marked Status, do - `CfreeStatus st = cfree_X(... &out); if (st != CFREE_OK) ...`. Don't - discard non-OK statuses silently. - -6. **Iterators** — `next()` returns `CfreeIterResult` (`CFREE_ITER_ITEM`, - `_END`, `_ERROR`). 
Migrate `while (...next(&out))` loops to - `for (;;) { CfreeIterResult r = next(it, &out); if (r != CFREE_ITER_ITEM) break; ... }`. - -7. **JIT/dbg construction** — driver builds: - - ```c - CfreeJitHost jhost = { .execmem = &my_execmem, .tls = &my_tls }; - CfreeJit *jit; - CfreeStatus st = cfree_link_jit(c, &opts, &jhost, &jit); - - CfreeDbgHost dhost = { .os = &my_dbg_os }; - CfreeJitSession *sess; - st = cfree_jit_session_new(jit, &dhost, &sess); - ``` - -8. **DWARF loc read** — replace `cfree_jit_session_*` based reads with - a small `CfreeDwarfReadMemFn` adapter that calls - `cfree_jit_session_read_mem` on a captured session. The DWARF API no - longer pulls in `cfree/dbg.h`. - -9. **`cfree_pipeline_*` call sites** — replaced with explicit - `cfree_compiler_new` + `cfree_compile_c_obj` (etc.) + `cfree_link_*` - sequences. The driver carries the resulting compiler/builder - ownership directly. - -10. **Linker script** — `cfree_link_script_parse(ctx, txt, len, &out)`; - free with `cfree_link_script_free(ctx, out)`. - -## Internal aliases (src/core/core.h) - -`Compiler`, `Heap`, `DiagSink`, `Writer`, `Target`, `ObjBuilder`, -`ArchKind`, `OSKind`, `ObjFmt` aliases stay. Add `Context` aliasing -`CfreeContext`. Rename `Compiler.env` → `Compiler.ctx`. Update every -reader. `compiler_init` takes `const CfreeContext*`. - -## Things that **don't** change - -- `CfreeSym` is still a `uint32_t`. -- `CfreeSrcLoc` shape unchanged. -- Internal `obj_*`, `read_elf*`, `read_macho*`, `read_coff`, `read_wasm` - signatures don't move — only their public wrappers do. -- Internal `link_*`, `dwarf_*`, `mc_*`, `cg_*` keep their internal - shapes. -- The codegen public API (`cfree/cg.h`) is largely intact; only - `cfree_cg_new` and `cfree_cg_type_record_field` switch to - Status-return shapes. 
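Rubric item 6 (iterator migration) can be sketched with a stand-in iterator. `DemoIter`, `DemoIterResult`, and `demo_iter_next` are hypothetical; only the three-way result and the `for (;;)` loop shape come from the rubric. The sketch goes one step further than the rubric's single `break` and reports the error case separately, which is an assumption about what callers usually want.

```c
#include <stddef.h>

/* Hypothetical mirror of the ITEM / END / ERROR result. */
typedef enum { DEMO_ITER_ITEM, DEMO_ITER_END, DEMO_ITER_ERROR } DemoIterResult;

typedef struct {
  const int *items;
  size_t len;
  size_t pos;
  int fail_at; /* position at which the stub reports an error, or -1 */
} DemoIter;

static DemoIterResult demo_iter_next(DemoIter *it, int *out) {
  if (it->fail_at >= 0 && it->pos == (size_t)it->fail_at)
    return DEMO_ITER_ERROR;
  if (it->pos == it->len) return DEMO_ITER_END;
  *out = it->items[it->pos++];
  return DEMO_ITER_ITEM;
}

/* Migrated loop: returns 0 on a clean end, -1 if the iterator failed.
 * *sum_out is written only on success. */
static int demo_sum(DemoIter *it, long *sum_out) {
  long sum = 0;
  for (;;) {
    int v;
    DemoIterResult r = demo_iter_next(it, &v);
    if (r == DEMO_ITER_ERROR) return -1; /* do not treat an error as END */
    if (r != DEMO_ITER_ITEM) break;
    sum += v;
  }
  *sum_out = sum;
  return 0;
}
```

An old `while (next(&out))` loop could not tell a truncated archive from a finished one; the explicit result makes that distinction visible at the call site.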
diff --git a/doc/builtins.md b/doc/builtins.md @@ -1,385 +0,0 @@ -# Compiler builtins used by cfree - -cfree's freestanding headers hardcode every value that's invariant under -its target assumptions, and delegate to compiler builtins for everything -that genuinely varies across targets. This file is the contract: if a -target violates an "assumption" below, the headers (and `test/smoke.c`) -will be wrong. - -## Target assumptions (hardcoded) - -- `CHAR_BIT == 8` -- `short == 16` bits, `int == 32` bits, `long long == 64` bits -- Two's complement integer representation -- `float` is IEEE 754 binary32 -- `double` is IEEE 754 binary64 - -## What genuinely varies (delegated) - -| Quantity | Why it varies | -| ------------------------- | -------------------------------------------------- | -| `char` signedness | ARM defaults unsigned, x86 signed; flippable with `-funsigned-char`. Not changeable from a header. | -| `long` width | LP64 (Unix 64-bit) makes it 64; LLP64 (Win64) and ILP32 keep it 32 | -| `long double` format | x86 80-bit, AArch64 binary128 *or* binary64, PowerPC double-double, MSVC binary64 | -| `FLT_ROUNDS` | Runtime rounding mode (function call required) | -| `FLT_EVAL_METHOD` | x87 vs SSE vs embedded toolchains differ | -| `intptr_t` width | 32 vs 64 bits | -| `size_t`, `ptrdiff_t` | Track pointer width | -| `wchar_t` | 16-bit on Windows, 32-bit on Unix; signedness varies | -| `intmax_t` literal type | `long` on LP64, `long long` on LLP64 | -| `int_fast{N}_t` widths | Each target picks its own "fast" width | -| `va_list` and varargs ABI | Call convention is target-defined | -| `max_align_t` | Track widest scalar alignment | - ---- - -## Builtins - -Grouped by header, every `__builtin_*` or `__*__` we still depend on. 
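The hardcoded target assumptions listed earlier can be checked at compile time. The following is a sketch using only standard C11 `<limits.h>` and `<float.h>`; it is not the contents of `test/smoke.c`, and `demo_assumptions_hold` is a hypothetical name.

```c
#include <float.h>
#include <limits.h>

/* sizeof * CHAR_BIT measures storage size; it matches the value width
 * only on targets without padding bits, which is assumed here. */
_Static_assert(CHAR_BIT == 8, "CHAR_BIT must be 8");
_Static_assert(sizeof(short) * CHAR_BIT == 16, "short must be 16 bits");
_Static_assert(sizeof(int) * CHAR_BIT == 32, "int must be 32 bits");
_Static_assert(sizeof(long long) * CHAR_BIT == 64,
               "long long must be 64 bits");

/* In two's complement, -1 has all value bits set. */
_Static_assert((-1 & 3) == 3, "integers must be two's complement");

/* binary32 / binary64 shape: radix 2, 24/53-bit significand. */
_Static_assert(FLT_RADIX == 2, "binary floating point expected");
_Static_assert(FLT_MANT_DIG == 24 && FLT_MAX_EXP == 128,
               "float must be IEEE 754 binary32");
_Static_assert(DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
               "double must be IEEE 754 binary64");

/* Reaching this function at all means every assertion above passed. */
static int demo_assumptions_hold(void) { return 1; }
```

A target that violates any assumption fails to compile this file, which is earlier and louder than a wrong constant in a header.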
- -### `<float.h>` -- `__builtin_flt_rounds()` — runtime rounding mode → `FLT_ROUNDS` -- `__FLT_EVAL_METHOD__` -- `__DECIMAL_DIG__` -- `__LDBL_HAS_DENORM__` → `LDBL_HAS_SUBNORM` -- `__LDBL_MANT_DIG__`, `__LDBL_DECIMAL_DIG__`, `__LDBL_DIG__` -- `__LDBL_MIN_EXP__`, `__LDBL_MIN_10_EXP__` -- `__LDBL_MAX_EXP__`, `__LDBL_MAX_10_EXP__` -- `__LDBL_MAX__`, `__LDBL_MIN__`, `__LDBL_EPSILON__` -- `__LDBL_DENORM_MIN__` → `LDBL_TRUE_MIN` - -### `<limits.h>` -- `__LONG_MAX__` → `LONG_MAX` (and derived `LONG_MIN`, `ULONG_MAX`) -- `__CHAR_UNSIGNED__` — defined ⇔ plain `char` is unsigned - -### `<stddef.h>` -- `__PTRDIFF_TYPE__` → `ptrdiff_t` -- `__SIZE_TYPE__` → `size_t` -- `__WCHAR_TYPE__` → `wchar_t` (C only; in C++ it's a keyword) -- `__builtin_offsetof(t, m)` → `offsetof` - -### `<stdint.h>` -Types (aliases vary by data model even when limits don't): -- `__INT{8,16,32,64}_TYPE__`, `__UINT{N}_TYPE__` -- `__INT_LEAST{N}_TYPE__`, `__UINT_LEAST{N}_TYPE__` -- `__INT_FAST{N}_TYPE__`, `__UINT_FAST{N}_TYPE__` -- `__INTPTR_TYPE__`, `__UINTPTR_TYPE__` -- `__INTMAX_TYPE__`, `__UINTMAX_TYPE__` - -Limits that are not pinned by the target assumptions: -- `__INT_FAST{N}_MAX__`, `__UINT_FAST{N}_MAX__` -- `__INTPTR_MAX__`, `__UINTPTR_MAX__` -- `__INTMAX_MAX__`, `__UINTMAX_MAX__` -- `__PTRDIFF_MAX__` -- `__SIZE_MAX__` -- `__WCHAR_MAX__`, `__WCHAR_MIN__` -- `__WINT_MAX__`, `__WINT_MIN__` -- `__SIG_ATOMIC_MAX__`, `__SIG_ATOMIC_MIN__` - -64-bit and intmax constant macros (literal suffix tracks the alias): -- `__INT64_C(c)`, `__UINT64_C(c)` -- `__INTMAX_C(c)`, `__UINTMAX_C(c)` - -### `<stdarg.h>` -Entirely compiler-supplied — varargs ABI is target-defined: -- `__builtin_va_list` (type) -- `__builtin_va_start`, `__builtin_va_arg`, `__builtin_va_end`, `__builtin_va_copy` - -### `<stdatomic.h>` -Atomic codegen, lock-free shape, and fence semantics are target-defined. -The `__atomic_*` family must operate transparently on `_Atomic`-qualified -pointers (no separate variant for atomic-typed args). 
- -Memory-order constants (values for the `memory_order` enum): -- `__ATOMIC_RELAXED`, `__ATOMIC_CONSUME`, `__ATOMIC_ACQUIRE`, - `__ATOMIC_RELEASE`, `__ATOMIC_ACQ_REL`, `__ATOMIC_SEQ_CST` - -Lock-free shape (per-type, value 0/1/2 per C11 7.17.5): -- `__ATOMIC_{BOOL,CHAR,CHAR16_T,CHAR32_T,WCHAR_T,SHORT,INT,LONG,LLONG,POINTER}_LOCK_FREE` - -Types for the C11 char/wide aliases (also delegated for `<stddef.h>`): -- `__CHAR16_TYPE__`, `__CHAR32_TYPE__`, `__WCHAR_TYPE__` - -Operations (signatures match the GCC `__atomic` builtin family): -- `__atomic_load_n(ptr, order)` -- `__atomic_store_n(ptr, val, order)` -- `__atomic_exchange_n(ptr, val, order)` -- `__atomic_compare_exchange_n(ptr, expected, desired, weak, succ, fail)` -- `__atomic_fetch_add`, `__atomic_fetch_sub`, `__atomic_fetch_or`, - `__atomic_fetch_xor`, `__atomic_fetch_and` — `(ptr, val, order)` -- `__atomic_thread_fence(order)`, `__atomic_signal_fence(order)` -- `__atomic_is_lock_free(size, ptr)` -- `__atomic_test_and_set(ptr, order)`, `__atomic_clear(ptr, order)` — for - `atomic_flag` - -### Syscalls (cfree extension) - -Declared in `<cfree/syscall.h>`. Kernel-trap primitive so libc syscall -stubs can be pure C. Numbers (`SYS_*`) are libc's responsibility — -cfree only provides the instruction. All args and result are `long`; -pointers/sizes/fds get cast at the call site. - -- `__cfree_syscall0(nr)` … `__cfree_syscall6(nr, a0, a1, a2, a3, a4, a5)` - -Semantics: -- Result is normalized to Linux-style `-errno` on failure, non-negative - on success, on every target. On BSD/Darwin the lowering inspects the - carry/C flag and rewrites the result. -- Modeled as an opaque external call with full memory clobber plus the - target's syscall-clobber list (so the optimizer cannot move work - across the trap). -- Not available on WASM — compile-time error directs callers to WASI - imports. 
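Because the result is normalized to `-errno` on every target, a libc stub can convert it in plain C. The sketch below replaces the real trap with a hypothetical stub, `demo_syscall3`, so it runs anywhere; the syscall number is a placeholder, and the `[-4095, -1]` error window is the common Linux libc convention, assumed here rather than taken from the text above.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the three-argument trap: reports -EBADF for
 * a negative fd, otherwise pretends all a2 bytes were written. */
static long demo_syscall3(long nr, long a0, long a1, long a2) {
  (void)nr;
  (void)a1;
  return a0 < 0 ? -(long)EBADF : a2;
}

/* Convert a raw -errno result into the usual libc contract:
 * set errno and return -1 on failure, pass the value through otherwise. */
static long demo_syscall_ret(long r) {
  if (r < 0 && r >= -4095) {
    errno = (int)-r;
    return -1;
  }
  return r;
}

/* All arguments are widened to long at the call site. */
static long demo_write(int fd, const void *buf, unsigned long len) {
  long r = demo_syscall3(1 /* placeholder number */, (long)fd,
                         (long)(intptr_t)buf, (long)len);
  return demo_syscall_ret(r);
}
```

With the normalization done in the lowering, the same `demo_syscall_ret` helper would serve Linux and Darwin stubs alike.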
- -Per-target lowering: - -| Target | Instr | Nr reg | Args | Result | Error | -| --------------- | ----------------- | ------ | -------------------------- | ------ | -------- | -| Linux x86_64 | `syscall` | rax | rdi, rsi, rdx, r10, r8, r9 | rax | rax < 0 | -| Linux i386 | `int 0x80` | eax | ebx, ecx, edx, esi, edi, ebp | eax | eax < 0 | -| Linux aarch64 | `svc #0` | x8 | x0..x5 | x0 | x0 < 0 | -| Linux arm | `svc #0` | r7 | r0..r5 | r0 | r0 < 0 | -| Linux riscv | `ecall` | a7 | a0..a5 | a0 | a0 < 0 | -| Darwin x86_64 | `syscall` | rax (class bits already in nr) | rdi, rsi, rdx, r10, r8, r9 | rax | carry → −errno | -| Darwin aarch64 | `svc #0x80` | x16 | x0..x5 | x0 | C flag → −errno | - -i386 6-arg case (`ebp` is the frame pointer): cfree saves/restores -`ebp` around the trap. - -### Bare-metal primitives (cfree extension) - -Declared in `<cfree/baremetal.h>`. For freestanding / embedded use, so -libc and HAL code can stay pure C. All have opaque-call + -full-memory-clobber semantics so the optimizer cannot reorder loads, -stores, or other side effects across them. - -Interrupt control (the standard save/disable/restore critical-section -idiom): -- `unsigned long __cfree_irq_save(void)` — disable IRQs, return previous mask -- `void __cfree_irq_restore(unsigned long prev)` -- `void __cfree_irq_disable(void)`, `void __cfree_irq_enable(void)` - -Lowerings: x86 `cli`/`sti` + `pushf`/`popf`; Cortex-A/R `cpsid i`/`cpsie i` -+ CPSR; Cortex-M `cpsid i`/`cpsie i` + PRIMASK (selected by -`__ARM_ARCH_*` profile macros); aarch64 `msr daifset/daifclr, #2` + -`mrs daif`; RISC-V `csrr{ci,si} mstatus, 8`. - -CPU memory barriers — distinct from `__atomic_thread_fence`. C11 fences -provide ordering for the C abstract machine; these emit the specific -CPU barriers required for DMA-coherent device memory, MMU/TLB -reconfiguration, and self-modifying / freshly-loaded code. 
- -```c -typedef enum { - __CFREE_BARRIER_FULL, // sy - __CFREE_BARRIER_INNER, // ish - __CFREE_BARRIER_INNER_STORE, // ishst - __CFREE_BARRIER_OUTER, // osh - __CFREE_BARRIER_OUTER_STORE, // oshst - __CFREE_BARRIER_NON_SHARE, // nsh -} __cfree_barrier_scope; - -void __cfree_dmb(__cfree_barrier_scope); // ordering only -void __cfree_dsb(__cfree_barrier_scope); // ordering + completion -void __cfree_isb(void); // pipeline flush after sysreg / MMU change -``` - -Lowerings: arm/aarch64 `dmb/dsb/isb <scope>`; x86 `mfence`/`lfence`/`sfence` -(scope ignored — TSO collapses the cases) and `isb` is a no-op (x86 -self-snoops); RISC-V `fence rw,rw` and `fence.i`. WASM: compile-time error. - -Cache maintenance (range-based; cfree reads `CTR`/`CTR_EL0` once at -startup for the line size and emits a loop): -- `void __cfree_dcache_clean(const void *, unsigned long)` — write-back -- `void __cfree_dcache_invalidate(void *, unsigned long)` -- `void __cfree_dcache_clean_invalidate(void *, unsigned long)` -- `void __cfree_icache_invalidate(const void *, unsigned long)` - -Lowerings: aarch64 `dc {cvac,ivac,civac}` + `ic ivau` loops; arm v7+ -equivalents via CP15. x86: no-ops (cache-coherent ICache included). -RISC-V: Zicbom / Zicboz instructions when those extensions are present, -otherwise a compile-time error. - -Hints: -- `void __cfree_nop(void)` -- `void __cfree_yield(void)` — spin-loop hint; arm `yield`, x86 `pause`, - RISC-V `pause` -- `void __cfree_wfi(void)` — sleep until next interrupt; arm/aarch64 - `wfi`, x86 `hlt`, RISC-V `wfi`. All three are privileged, which is - fine for bare-metal. Compile-time error on WASM. -- `void __cfree_wfe(void)`, `void __cfree_sev(void)` — arm/aarch64 - only; compile-time error elsewhere. The inter-core event-flag - abstraction (SEV sets, WFE waits, exclusive-monitor release also - sets) does not generalize: x86 MONITOR/MWAIT is address-watch and - privileged-extension; RISC-V has no base-ISA equivalent. 
Use - `__cfree_yield` + `__cfree_wfi` for portable spin/idle loops. - -System-register access (`mrs`/`msr`, `csrr`/`csrw`, `rdmsr`/`wrmsr`, -MMU/cache config, etc.) is **not** provided as a builtin. Callers use -extended inline asm directly. Rationale: register names and privilege -rules vary per ISA generation; the call sites are arch-specific -already; abstracting adds churn without removing platform code. - ---- - -## `libcfree_rt.a` — runtime support library - -The codegen emits calls to symbols the user can't reasonably supply. cfree -ships them all in a single archive: integer/float/atomic helpers *and* the -`mem*` family the codegen lowers struct copies and aggregate inits to. - -Naming follows the libgcc / compiler-rt convention (`{op}{mode}{N}`, where -mode is `qi/hi/si/di/ti/sf/df/tf` for 1/2/4/8/16-byte int and 32/64/128-bit -float). All `mem*` are weak so a user libc wins. - -### Mem intrinsics (always shipped) -- `memcpy`, `memmove`, `memset`, `memcmp` - -### Integer helpers -Always: -- Div/mod 64-bit: `__divdi3`, `__udivdi3`, `__moddi3`, `__umoddi3`, `__divmoddi4`, `__udivmoddi4` -- Count/bits: `__clzsi2`, `__clzdi2`, `__ctzsi2`, `__ctzdi2`, `__ffsdi2`, `__popcountsi2`, `__popcountdi2`, `__paritysi2`, `__paritydi2`, `__bswapsi2`, `__bswapdi2` -- Compare: `__cmpdi2`, `__ucmpdi2` -- Negate/abs: `__negdi2`, `__absvdi2` - -64-bit targets only (128-bit `__int128` support): -- `__divti3`, `__udivti3`, `__modti3`, `__umodti3`, `__divmodti4`, `__udivmodti4` -- `__ashlti3`, `__lshrti3`, `__ashrti3`, `__multi3`, `__negti2`, `__clzti2`, `__ctzti2` - -32-bit targets only (no native 64-bit ops): -- `__muldi3`, `__ashldi3`, `__lshrdi3`, `__ashrdi3` - -### Soft-float (only on FPU-less targets — RV{32,64}I, ARM `-mfloat-abi=soft`, WASM-no-simd) -- Arithmetic `sf`/`df`/`tf`: `__add`, `__sub`, `__mul`, `__div`, `__neg` - → e.g. `__addsf3`, `__divdf3`, `__multf3` -- Int → float: `__float{,un}{si,di,ti}{sf,df,tf}` (e.g. 
`__floatdisf`, `__floatunsidf`) -- Float → int: `__fix{,uns}{sf,df,tf}{si,di,ti}` (e.g. `__fixdfdi`, `__fixunssfsi`) -- Float → float: `__extendsfdf2`, `__extendsftf2`, `__extenddftf2`, `__truncdfsf2`, `__trunctfsf2`, `__trunctfdf2` -- Compare: `__eq`, `__ne`, `__lt`, `__le`, `__gt`, `__ge`, `__unord` × `sf2`/`df2`/`tf2` - -### Nonlocal jumps + stackful coroutines (per-arch, always shipped) -`<setjmp.h>` and `<cfree/coro.h>` share one per-target context payload -(256 bytes, 16-byte aligned): callee-saved GPRs + callee-saved FPRs -+ sp + return address. `jmp_buf` and `coro_ctx` are both opaque -typedefs over that payload; the runtime reinterprets them as the -per-arch struct. - -- `setjmp`, `longjmp` — `<setjmp.h>` (C11 7.13). cfree extension: - this header is *not* in the C11 freestanding subset. -- `coro_init`, `coro_resume`, `coro_yield`, `coro_self` — public - asymmetric API in `<cfree/coro.h>`. Resume drives a coroutine - forward; yield suspends back to the most recent resumer; resumes - nest like function calls. Status (`CORO_INIT` / `RUNNING` / - `SUSPENDED` / `DEAD`) is tracked on the `coro_t` and propagates - through `coro_resume`'s result. -- `__cfree_coro_switch(from, to, value) -> uintptr_t` — the symmetric - primitive. `coro_resume` / `coro_yield` are built on it; setjmp = - save+return-0, longjmp = restore+deliver-val. Exposed (with the - `__cfree_` prefix to signal "compiler-builtin-style") for - schedulers that don't fit the asymmetric resume-chain model. -- `__cfree_coro_ctx_init`, `__cfree_coro_trampoline` — internal, - used only by `lib/coro/coro.c`'s asymmetric layer. - -Implementation: one master `.c` per arch under `lib/coro/` (file-scope -asm + tiny C `__cfree_coro_ctx_init`), plus one arch-agnostic -`coro/coro.c` for the public asymmetric layer. ARM has two arch -masters: `arm32.c` (Thumb-2, ARMv7+, may use VFP `d8-d15`) and -`arm32_thumb1.c` (ARMv6-M, no IT blocks / no VFP / data-processing -limited to r0-r7). 
Not provided for: WASM (would need an -Asyncify-fiber port). - -### Atomic fallbacks (only when target lacks native atomics for that width) -- Generic: `__atomic_load`, `__atomic_store`, `__atomic_exchange`, `__atomic_compare_exchange` -- Sized N ∈ {1,2,4,8,16}: `__atomic_load_N`, `__atomic_store_N`, `__atomic_exchange_N`, `__atomic_compare_exchange_N`, `__atomic_fetch_{add,sub,and,or,xor,nand}_N` - -### Architecture-specific aliases - -**ARM AAPCS / AEABI** (32-bit ARM only — these are aliases the AEABI ABI mandates): -- Int div/mod: `__aeabi_idiv`, `__aeabi_uidiv`, `__aeabi_idivmod`, `__aeabi_uidivmod`, `__aeabi_ldivmod`, `__aeabi_uldivmod` -- 64-bit shift/mul: `__aeabi_llsl`, `__aeabi_llsr`, `__aeabi_lasr`, `__aeabi_lmul` -- Soft-float arith: `__aeabi_{f,d}{add,sub,mul,div,neg}`, `__aeabi_{f,d}rsub` -- Soft-float convert: `__aeabi_f2iz`, `__aeabi_f2uiz`, `__aeabi_f2lz`, `__aeabi_f2ulz`, `__aeabi_d2iz`, `__aeabi_d2uiz`, `__aeabi_d2lz`, `__aeabi_d2ulz`, `__aeabi_i2f`, `__aeabi_ui2f`, `__aeabi_l2f`, `__aeabi_ul2f`, `__aeabi_i2d`, `__aeabi_ui2d`, `__aeabi_l2d`, `__aeabi_ul2d`, `__aeabi_f2d`, `__aeabi_d2f` -- Soft-float compare: `__aeabi_fcmp{eq,lt,le,gt,ge,un}`, `__aeabi_dcmp{eq,lt,le,gt,ge,un}` -- Mem variants (size-specialized): `__aeabi_memcpy`, `__aeabi_memcpy{4,8}`, `__aeabi_memmove`, `__aeabi_memmove{4,8}`, `__aeabi_memset`, `__aeabi_memset{4,8}`, `__aeabi_memclr`, `__aeabi_memclr{4,8}` - -**RISC-V** (only with `-msave-restore`, used by RV32E/embedded code-size builds): -- `__riscv_save_{0..12}`, `__riscv_restore_{0..12}` - -**x86 / x86_64**: no architecture-specific aliases; uses the generic libgcc names above. - -**WASM**: uses generic names; `memcpy`/`memset`/`memmove` may lower to `memory.copy` / `memory.fill` instructions instead of calls. - ---- - -## Target-identification macros - -cfree predefines a small, stable set of macros so headers and user code -can branch on architecture, OS, object format, and ABI without parsing -target triples. 
Compatible-by-design with the GCC/Clang names — code -written against `__x86_64__` / `__BYTE_ORDER__` / `__LP64__` works -unchanged. - -### Compiler identification -- `__cfree__` — defined to `1` -- `__cfree_major__`, `__cfree_minor__`, `__cfree_patchlevel__` -- `__STDC__ == 1`, `__STDC_VERSION__ == 201112L` -- `__STDC_HOSTED__ == 0` (cfree is freestanding-only) -- `__STDC_NO_COMPLEX__`, `__STDC_NO_THREADS__`, `__STDC_NO_VLA__` defined -- `__STDC_NO_ATOMICS__` *not* defined (cfree implements `<stdatomic.h>`) - -### Architecture (exactly one defined) -- `__i386__` — 32-bit x86 -- `__x86_64__` (and `__amd64__`) — 64-bit x86 -- `__arm__` — 32-bit ARM -- `__aarch64__` — 64-bit ARM -- `__riscv` — RISC-V (any width); paired with `__riscv_xlen` ∈ {32, 64} -- `__wasm__` — WebAssembly; paired with `__wasm32__` or `__wasm64__` - -### Pointer width / data model -- `__SIZEOF_POINTER__`, `__SIZEOF_LONG__`, `__SIZEOF_SIZE_T__`, `__SIZEOF_PTRDIFF_T__`, `__SIZEOF_WCHAR_T__`, `__SIZEOF_INT__`, `__SIZEOF_LONG_LONG__`, `__SIZEOF_FLOAT__`, `__SIZEOF_DOUBLE__`, `__SIZEOF_LONG_DOUBLE__` -- One of: `__LP64__` / `_LP64` (Unix 64), `__ILP32__` (32-bit), or neither (LLP64 — Win64) - -### Endianness -- `__BYTE_ORDER__` set to one of `__ORDER_LITTLE_ENDIAN__` / `__ORDER_BIG_ENDIAN__` -- `__ORDER_LITTLE_ENDIAN__ == 1234`, `__ORDER_BIG_ENDIAN__ == 4321` (values match GCC) - -### OS / platform (zero or one defined; freestanding bare-metal defines none) -- `__linux__` — Linux ABI -- `__APPLE__` and `__MACH__` — Darwin / macOS -- `_WIN32` (always on Windows), plus `_WIN64` on 64-bit Windows - -### Object format (exactly one defined per output) -- `__ELF__` — ELF (Linux, *BSD, bare-metal Unix-ish) -- `__MACH__` — Mach-O (Darwin) -- `_WIN32` doubles as the PE/COFF marker (matches MSVC/MinGW convention) - -### ARM-specific (defined only when `__arm__` or `__aarch64__`) -- `__ARM_ARCH` — integer arch version (7, 8, …); plus profile-specific `__ARM_ARCH_{7A,7R,7M,8A,…}__` -- `__ARM_EABI__` — defined on 
AAPCS/AEABI targets (always, for cfree's ARM32) -- `__ARM_PCS` (base PCS) or `__ARM_PCS_VFP` (hard-float PCS) -- `__ARM_FP` — bitmask of supported FP widths (0x4=fp32, 0x8=fp64); undefined on soft-float -- `__SOFTFP__` — defined ⇔ `-mfloat-abi=soft` (no FPU instructions, soft-float ABI) -- `__ARM_NEON` — defined ⇔ NEON SIMD available - -### RISC-V-specific (defined only when `__riscv`) -- `__riscv_xlen` ∈ {32, 64} -- `__riscv_flen` ∈ {0, 32, 64} — widest hardware FP register (0 ⇒ soft-float) -- Extension flags (defined ⇔ extension is on): `__riscv_mul`, `__riscv_div`, `__riscv_atomic`, `__riscv_compressed`, `__riscv_fdiv`, `__riscv_fsqrt` -- ABI: `__riscv_float_abi_soft`, `__riscv_float_abi_single`, `__riscv_float_abi_double` (exactly one) - -### x86-specific (defined only when `__i386__` or `__x86_64__`) -- Feature flags follow GCC names, defined ⇔ enabled at the chosen `-march`: `__SSE__`, `__SSE2__`, `__SSE3__`, `__SSSE3__`, `__SSE4_1__`, `__SSE4_2__`, `__AVX__`, `__AVX2__`, `__BMI__`, `__BMI2__`, `__POPCNT__`, `__FMA__` - -### WASM-specific (defined only when `__wasm__`) -- `__wasm_simd128__` — defined ⇔ SIMD proposal enabled -- `__wasm_bulk_memory__` — defined ⇔ `memory.copy`/`memory.fill` available (gates the lowering noted under mem intrinsics) - ---- - -## Discovery - -To enumerate what a compiler predefines for the current target: - -```sh -cc -dM -E -x c /dev/null | sort -``` diff --git a/doc/cg-api-status.md b/doc/cg-api-status.md @@ -1,104 +0,0 @@ -# CG API And Toy Language Status - -## Current Status - -The public CG API in `src/api/cg.c` has concrete implementations for the -planned value, selector, control-flow, type, data, intrinsic, atomic, variadic, -and inline-asm entry points. - -Value categories are explicit: - -- `cfree_cg_push_symbol` and `cfree_cg_push_bytes` push pointer/address rvalues. -- `cfree_cg_indirect` converts a non-void pointer rvalue to a pointee lvalue. -- `cfree_cg_load` converts lvalues to rvalues. 
-- `cfree_cg_addr` converts lvalues to pointer rvalues. -- `cfree_cg_store` is statement-like: `[lvalue, value] -> []`. -- `cfree_cg_dup` preserves value category and gives rvalue registers independent - ownership. - -Selectors are lvalue-producing: - -- `cfree_cg_index` selects an element lvalue. -- `cfree_cg_field` selects a record field lvalue. -- Callers use `cfree_cg_addr` after a selector when they need an address. - -Control-flow and calls: - -- Public scopes are stack-disciplined. `scope_end` must be LIFO and stale or - inactive handles are rejected. -- Expression-valued scopes reconcile fallthrough and break results through a - canonical result slot. -- Public inline helpers cover the common `if` / `else` pattern. -- `cfree_cg_tail_call` is a terminator and pushes no result. - -Types: - -- `CFREE_CG_BUILTIN_VA_LIST` and `CfreeCgBuiltinTypes.va_list` expose the - target ABI `va_list` type. -- Pointer, array, qualified, and function type constructors intern by shape. -- Aliases and nominal record/enum constructors remain source-identity - producing. - -Toy currently supports: - -- Immutable and mutable globals, locals, parameters, function calls, recursion, - variadic functions, `va_list`, pointers, address/deref syntax, arithmetic, - comparisons, bitwise operators, shifts, unary operators, `&&`, `||`, `while`, - `break`, `continue`, `if` / `else`, and `return tail f(...)`. -- CG API coverage builtins: `typecheck()`, `byteconst()`, `alloca`, `index`, - `memset`, `memcpy`, `atomic_load`, `atomic_store`, `atomic_add`, - `atomic_sub`, `atomic_cas_ok`, `fence`, `popcount`, `ctz`, `clz`, `bswap`, - `expect`, `fieldtest()`, `target()`, `target_os()`, `va_start`, `va_arg`, - `va_end`, `va_copy`, `asm(...)`, `asm_int(...)`, `asm_imm(...)`, - `asm_mem(...)`, `asm_inout(...)`, `asm_early(...)`, `asm_memory(...)`, and - `asm_clobber(...)`. 
-- Lowering uses the explicit value-category API: - `push_symbol + indirect + load/store`, `push_bytes + indirect + load`, - `cfree_cg_field`, `cfree_cg_va_*`, `cfree_cg_inline_asm`, statement-like - `store`, terminator tail calls, and the public inline `if` / `else` helpers. - `asm(arch("aa64", "x64", "rv64"))` chooses a target-specific template at - compile time; an empty selected template is a no-op so unsupported inline-asm - backends can still compile the same toy source. - -Toy validation: - -- `test/toy/run.sh` supports: - - `R`: `cfree run case.toy` - - `L`: `cfree cc -c case.toy`, `cfree ld`, native execution - - `X`: opt-in Linux cross-target compile/link/execute for `aa64`, `x64`, and - `rv64` via `cfree cc -target`, `cfree ld`, and `test/lib/exec_target.sh` -- Cross-arch validation intentionally has no cross-arch JIT path. -- `test/toy/cases/19_cg_api_variadic_asm.toy` executes variadic API coverage on - non-macOS targets. On macOS/AArch64 it compiles the same variadic helper but - avoids executing it because the current AArch64 backend va_arg walker is still - AAPCS64-shaped while Apple `va_list` is a byte cursor. - -Current validation: - -- `make lib` -- `make bin` -- `make test-cg-api` -- `make test-cg-binder` -- `make test-toy` - 38 pass, 0 fail, 0 skip -- `CFREE_TEST_PATHS=X test/toy/run.sh` - 57 pass, 0 fail, 0 skip -- `make test-cg` - 1573 pass, 0 fail, 0 skip -- `test/toy/demo.toy` compiles with `cfree cc -c` - -## Plan / TODOs - -1. Add direct CG API misuse tests. - - Keep type return-value checks in `test/api/cg_type_test.c` or a sibling - API test. - - Add a focused panic-catching misuse harness for stack underflow, - stale/non-LIFO scopes, invalid field indexes/base types, invalid - `indirect`, and unsupported data relocation widths. - -2. Add toy error tests. - - Extend `test/toy/run.sh` with an error-case mode if needed. - - Add expected diagnostic-message matching. - -3. Complete `test/toy/demo.toy`. 
- - The demo currently covers toy syntax, globals, control flow, calls, memory - helpers, atomics, tail calls, inline asm, and public CG API builtins. - - Add a demo variadic path once macOS/AArch64 va_arg execution matches the - public ABI shape. diff --git a/doc/cg-ext.md b/doc/cg-ext.md @@ -1,618 +0,0 @@ -# Public CG Extension Plan - -Scope: extensions needed for `include/cfree/cg.h` to serve as a portable -direct codegen API for frontends other than C. This is not a plan for a stored -LLVM-like IR. `CfreeCg` remains an imperative emitter bound to a -`CfreeObjBuilder`; frontends lower their own AST/HIR/MIR directly into the API. - -This API is new enough that compatibility with the current draft is not a -constraint. Make breaking changes. One clean way of doing things. - -The target user is a language frontend with its own parser, type checker, and -high-level lowering: C, Zig, Rust-like languages, toy languages, emulators, and -system DSLs. The frontend should not include internal `src/` headers, should -not know `Type*`, `ObjSymId`, `CGTarget`, or `MCEmitter`, and should be able to -generate correct code for every backend supported by `CfreeTarget`. - -## 1. Goals - -- Preserve the direct-emission model: no public module/value/block IR object is - required. -- Focus on backend codegen coverage and correctness, not frontend ergonomics. -- Keep backend decisions in the backend: ABI classification, TLS sequences, - GOT/PLT/stubs/IAT, branch relaxation, relocation encoding, and section layout. -- Let frontends state facts that materially affect generated code: calling - convention, ABI attributes, memory access properties, atomics, volatility, - linkage, object placement, and source/debug identity. -- Keep the surface portable but not lowest-common-denominator. Unsupported - target combinations should be diagnosable from API calls. -- Keep public handles opaque/integer-sized and context-owned. No global state. -- Maintain one way to spell each codegen fact. 
- -## 2. Non-goals - -- A serialized IR, textual IR, pass manager, verifier over stored functions, or - reusable use-def graph. -- Language semantics above the codegen boundary. Borrow checking, comptime, - monomorphization, generics, trait dispatch, overload resolution, destructor - insertion, and safety checks belong in the frontend. -- Arbitrary source-language types. The frontend lowers them to codegen storage, - ABI, memory, and debug facts before calling CG. -- Unwind/exception handling beyond the existing setjmp/longjmp intrinsics. - Panic/throw paths must lower to explicit normal control flow plus calls, or - to noreturn runtime helpers. -- Full LTO. Direct CG may still feed the existing optimizer wrapper, but that is - an implementation detail below this public API. - -## 3. Pre-Phase-1 Shape - -The pre-Phase-1 public CG API already provided useful pieces: - -- Target context through `CfreeCompiler` / `CfreeTarget`. -- Builtin integer, float, pointer, array, function, record, enum, alias, and - qualified types. -- Symbol declarations, visibility, TLS model, object definitions, relocatable - data expressions, and direct/indirect calls. -- A value stack with lvalue/rvalue conversion, local/param slots, labels, - structured scopes, arithmetic, comparisons, conversions, intrinsics, atomics, - inline asm, and varargs. - -The largest limitation was that too many important backend facts were implicit, -C-shaped, duplicated between type and operation APIs, or unrepresentable. - -## 4. Type Model - -The type model should describe codegen storage and ABI classification, not -source-language semantics. A Rust `u32`, C `unsigned int`, Zig `u32`, and an -emulator's 32-bit guest register can all use the same codegen integer type. - -### 4.1 Integers - -Use width-only integer storage types. Signedness belongs on operations, -comparisons, conversions, and ABI extension attributes, not on the integer type. 
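To make the width-only model concrete, the sketch below stores a 32-bit value as a plain bit pattern and lets each operation choose the interpretation. The `demo_*` names are hypothetical and are not part of `cfree/cg.h`; the sketch ignores division by zero and the `INT32_MIN / -1` overflow case.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the stored bits as a signed value without changing them. */
static int32_t demo_as_signed(uint32_t bits) {
  int32_t v;
  memcpy(&v, &bits, sizeof v);
  return v;
}

/* Same storage type, different operation semantics. */
static uint32_t demo_udiv(uint32_t a, uint32_t b) { return a / b; }

static uint32_t demo_sdiv(uint32_t a, uint32_t b) {
  return (uint32_t)(demo_as_signed(a) / demo_as_signed(b));
}

static int demo_ult(uint32_t a, uint32_t b) { return a < b; }

static int demo_slt(uint32_t a, uint32_t b) {
  return demo_as_signed(a) < demo_as_signed(b);
}

/* Logical shift fills with zeros; n must be in 0..31. */
static uint32_t demo_lshr(uint32_t a, unsigned n) { return a >> n; }

/* Arithmetic shift fills with copies of the sign bit; n must be in 0..31. */
static uint32_t demo_ashr(uint32_t a, unsigned n) {
  uint32_t fill = (a & 0x80000000u) ? ~(UINT32_MAX >> n) : 0u;
  return (a >> n) | fill;
}
```

For the bit pattern `0xFFFFFFF8`, unsigned division by 2 yields `0x7FFFFFFC` while signed division yields `0xFFFFFFFC`; the storage type never changed, only the operation did.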
- -Recommended integer builtins: - -- `i1`/`bool` as the branch and compare-result type. -- `i8`, `i16`, `i32`, `i64`. -- `i128` with helper-lowered arithmetic and ABI handling for targets that lack - native support. -- `isize`/`usize` are frontend aliases, not distinct codegen storage types. The - frontend can choose `i32` or `i64` from the target pointer size. - -Consequences: - -- Remove separate signed/unsigned integer type constructors or builtins. -- Keep signed/unsigned operation variants where semantics differ: - signed/unsigned div/rem/compare, sign/zero extension, arithmetic right shift - versus logical right shift. -- Constants are bit patterns interpreted by the operation that consumes them. - -### 4.2 Floating-Point - -Support only the floating storage types the backend can define and lower -correctly. - -Baseline: - -- `f32` -- `f64` - -Later additions should be explicit project choices: - -- `f16` / `bf16` if frontend SIMD/platform intrinsics need them. -- `f80` / `f128` only with target ABI and helper-call support. - -Floating arithmetic and comparisons still need operation-level attributes; see -section 6. - -### 4.3 Pointers - -Keep pointer types as codegen storage/ABI facts. Pointee types are useful for -load/store defaults and debug synthesis, but memory access semantics should -come from `CfreeCgMemAccess`, not from type qualifiers. - -Recommended pointer model: - -- One thin pointer type constructor: pointee type + address space. -- Address space 0 is the normal target data address space. -- No type-level nullability, restrict, readonly, volatile, or mutability. - Express these at the operation, declaration, or parameter-attribute site. -- Fat pointers are frontend-lowered aggregates. Capability pointers should wait - until a real target requires them. - -### 4.4 Aggregates and Layout - -Keep aggregate support only where the backend needs the aggregate shape for -correct codegen: - -- ABI classification of parameters and returns. 
-- Natural target layout for C-like records. -- Data object sizing/alignment. -- Debug synthesis when possible. - -Frontends can lower many patterns to existing codegen constructs. - -The gap to close is not richer source aggregate modeling. The useful backend -primitive is generic address arithmetic, now part of the Phase 1 contract: - -```c -/* Pops a pointer or lvalue address, pushes address + byte_offset as a pointer - * or lvalue address with the requested result type. */ -void cfree_cg_addr_offset(CfreeCg*, int64_t byte_offset, - CfreeCgTypeId result_type); -``` - -This gives frontends one way to lower non-C layouts without asking CG to -understand the source aggregate. `cfree_cg_index` remains the typed -scaled-index form for ordinary pointer/array indexing; `addr_offset` is the -byte-granular escape hatch for frontend-owned record layouts and packed/custom -field offsets. - -### 4.5 Qualifiers - -Remove C-style qualified codegen types as behavior-carrying types. - -- `const` is a frontend type-checking fact or an object/read-only declaration - fact. -- `volatile` is a memory access fact. -- `restrict` / `noalias` is a pointer parameter or memory access fact. - -If debug info needs source qualifiers, they belong in debug metadata derived -from declarations, not in backend codegen types. - -### 4.6 Type Queries - -Keep target-layout queries that frontends need for lowering: - -- Type kind. -- Size and alignment. -- Integer/float width. -- Pointer address space and pointee. -- Array element/count. -- Record field offset where CG owns natural record layout. -- Function ABI/calling-convention attributes. - -Avoid queries whose only purpose is reconstructing source-language types. - -## 5. Memory Access - -Memory semantics should have exactly one spelling: a memory access descriptor -on every operation that touches memory. Do not split behavior between type -qualifiers, lvalue flags, and special load/store variants. 
- -Recommended descriptor: - -```c -typedef struct CfreeCgMemAccess { - CfreeCgTypeId type; /* value type loaded/stored, or element type */ - uint32_t align; /* 0 = natural for type */ - uint32_t address_space; /* normally inherited from pointer type */ - uint32_t flags; /* VOLATILE, NONTEMPORAL, INVARIANT, etc. */ - uint32_t alias_scope; - uint32_t noalias_scope; -} CfreeCgMemAccess; -``` - -Recommended operations: - -```c -void cfree_cg_load(CfreeCg*, CfreeCgMemAccess access); -void cfree_cg_store(CfreeCg*, CfreeCgMemAccess access); -void cfree_cg_memcpy(CfreeCg*, uint64_t size, - CfreeCgMemAccess dst, CfreeCgMemAccess src); -void cfree_cg_memmove(CfreeCg*, uint64_t size, - CfreeCgMemAccess dst, CfreeCgMemAccess src); -void cfree_cg_memset(CfreeCg*, uint8_t value, uint64_t size, - CfreeCgMemAccess dst); -``` - -Consequences: - -- Remove type-level volatile behavior. -- Remove separate fixed-size aggregate memory APIs that take only size/align. -- Remove implicit load/store type inference when it can be ambiguous. The - access descriptor is the authority. -- Keep convenience constructors for common descriptors if desired, but not - alternate semantic entry points. - -Needed access facts: - -- Explicit alignment, including known under-alignment. -- Volatile load/store. -- Non-temporal/cache hints: streaming accesses unlikely to be reused soon, so - targets may select non-temporal instructions or ignore the hint. -- Invariant memory: contents known stable for the relevant program region - except through this access path. This is stronger than readonly object - placement and should be set only when the frontend can prove it. -- Alias scopes and noalias scopes. Rust `&mut`, C `restrict`, Zig `noalias`, - and frontend escape analysis can all feed this conservatively. - -## 6. Operation Semantics - -Integer and floating operations need attributes describing language semantics. - -### 6.1 Integer Ops - -Keep signedness on operations, not on types. 
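Overflow detection is one place where operation-level signedness is easy to see: the same add over the same 32-bit patterns overflows under one interpretation and not under the other. The sketch below is plain portable C, not the CG API; `Checked32`, `add_checked_u32`, and `add_checked_s32` are hypothetical names chosen for illustration, returning a wrapped result plus an overflow flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result pair, not cfree API: the wrapped 32-bit sum plus a
 * flag saying whether the chosen interpretation overflowed. */
typedef struct Checked32 {
    uint32_t result;
    bool overflow;
} Checked32;

/* Unsigned add: overflow means a carry out of bit 31, which shows up as the
 * wrapped sum being smaller than an operand. */
Checked32 add_checked_u32(uint32_t a, uint32_t b) {
    Checked32 r;
    r.result = a + b;
    r.overflow = r.result < a;
    return r;
}

/* Signed add over the same storage type: overflow means both operands share
 * a sign bit that differs from the sign bit of the wrapped sum. Computed
 * with unsigned arithmetic only, so no signed overflow occurs in C. */
Checked32 add_checked_s32(uint32_t a, uint32_t b) {
    Checked32 r;
    r.result = a + b;
    r.overflow = (((a ^ r.result) & (b ^ r.result)) >> 31) != 0;
    return r;
}
```

For example, `0x7FFFFFFF + 1` overflows as a signed add but not as an unsigned one, while `0xFFFFFFFF + 1` overflows as an unsigned add but not as a signed one (it is -1 + 1). A width-only type system cannot pick between these on its own, so the operation has to say which check it wants.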
- -Required operation families: - -- Add, sub, mul, bitwise and/or/xor. -- Signed and unsigned div/rem. -- Left shift, logical right shift, arithmetic right shift. -- Signed and unsigned comparisons. -- Sign extension, zero extension, truncation. -- Pointer/integer casts where the target permits them. - -Add operation flags: - -- No signed wrap / no unsigned wrap. -- Exact division/shift where applicable. -- Explicit signed and unsigned trap-on-overflow. Generic "overflow" is not - enough because integer types are width-only. -- Explicit signed and unsigned saturating arithmetic if a frontend/runtime - wants direct lowering. - -Checked arithmetic uses signed and unsigned intrinsics that return -`(result, overflow_bool)`. That is a backend-relevant primitive and avoids -forcing frontends to reproduce target flag idioms manually. - -`clz` and `ctz` have defined zero-input behavior: when the operand is zero, -the result is the operand bit width. - -### 6.2 Floating Ops - -Add floating arithmetic; the current API can push floats and convert but cannot -fully lower C, Zig, or Rust arithmetic. - -Required: - -- Floating add/sub/mul/div/rem/neg. -- Ordered and unordered comparisons. -- Conversion between floats and integers with explicit signedness and rounding - behavior. -- Fused multiply-add intrinsic or operation. - -Attributes: - -- Strict default semantics. -- Optional fast-math flags: reassoc, no-NaNs, no-infs, no-signed-zeros, allow - reciprocal, approximate functions. -- Rounding mode and exception behavior only if strict FP support is a goal. - -### 6.3 Bitcasts - -`convert` should mean semantic conversion. Add a distinct bit-preserving -operation: - -- Scalar bitcast. -- Aggregate/vector bitcast only when size matches and the backend can lower it - as a copy/reinterpretation. - -## 7. Control Flow and Stack Values - -Phase 1 contract: - -- `switch` / jump table primitive with target-chosen lowering. 
-- Computed goto through first-class function-local label-address values plus an - indirect local branch. This must support direct-threaded interpreters, where - label addresses are stored in dispatch tables, indexed by opcode, loaded, and - jumped through. Label-address data constants must be emitted while the - defining function is open, after the label handles are created; labels need - not be placed yet. Data emission is allowed inside an open function, so the - intended direct-threaded lowering is: declare the dispatch-table symbol, begin - the function, create labels, define the table contents as data while the - function remains open, then resume code emission. The value is opaque and - valid only for equality, storage/loading, table selection, and computed gotos - in the label's defining function. -- `unreachable` as a real terminator, not a side-effect intrinsic. - -Do not add landing pads, cleanup edges, or exception successors unless the -project expands beyond setjmp/longjmp. - -## 8. Calls, ABI, and Function Attributes - -The function type currently carries return type, params, and ABI variadic. That -is not enough for multi-language direct codegen. - -Add: - -- Calling convention on function type or call site. The common path is - backend-selected target C default; explicit SysV, Win64, AAPCS, wasm, - interrupt, and target-specific conventions are frontend requests for ABI - interop and must be supported by the selected backend or diagnosed. -- Per-function attributes: noreturn, cold, hot, naked, interrupt, stack - alignment, red-zone use, target features. -- Per-call attributes: tail policy, musttail, notail, cold. -- Per-parameter and return attributes: sret, byval, byref, inreg, noalias, - readonly, writeonly, nonnull, dereferenceable, signext, zeroext, align, - nest/context pointer. - -Avoid exception-related attributes such as `nounwind` unless they affect a -supported backend output. 
With no unwind model, calls either return normally or -do not return. - -`musttail` is important for languages that depend on tail calls or lower -coroutines/state machines through helper functions. It should fail -diagnostically if ABI shapes are incompatible. - -## 9. Symbols, Linkage, and Names - -The declaration API should not force C symbol mangling. C mangling is one -frontend policy, not a universal codegen rule. - -Use one name model: - -- Linkage name: exact linker-visible spelling after the frontend has applied - its language mangling and any desired object-format C decoration. -- Optional display/source name for debug info. - -Do not keep a separate "C source name" declaration path in the core CG API. If -the C frontend wants C decoration, it should call a helper before declaring the -symbol or use a C-frontend wrapper. - -Add: - -- COMDAT/linkonce/select-any groups. -- Weak/weak-odr where object formats support it. -- Section and partition attributes on functions and data. -- Constructor/destructor arrays with priority. -- Symbol versioning hooks later for ELF shared libraries. - -## 10. Data Definitions and Constants - -Keep data emission close to object bytes and relocations. That matches the -direct-codegen model and avoids a parallel constant IR. - -Needed additions: - -- Typed null pointer constants. -- Zero initializer and arbitrary bytes. -- Function/data address constants with pointer address space. -- Function-local label-address constants for direct-threaded dispatch tables. - These are emitted while the defining function is open; ordinary data - definitions may be interleaved with function emission for block-scope statics - and dispatch tables. -- Enum constants are unsigned bit patterns (`uint64_t`) interpreted by the - enum's width-only integer base type; source signedness is not part of the - codegen enum type. -- Relocation expressions already exist; keep target-selected lowering as the - default. 
Add explicit policy only when the target needs a frontend-visible - distinction. -- Per-object COMDAT, alignment, section, retention, merge/string flags, and TLS - model. - -Do not add structured aggregate constants unless they are needed to avoid -incorrect backend output. Frontends can lay out aggregate initializers into -bytes plus relocations. - -## 11. Atomics and Memory Model - -The current atomics have C-like memory orders. Multi-language support needs a -few more backend-relevant details: - -- Atomic width legality query. -- Strong versus weak compare-exchange. -- Memory scope if a supported target exposes scopes beyond system-wide atomics. -- Volatile atomic distinction for languages that expose both. -- Fence sync scope if scopes are supported. - -Do not add wait/wake or futex-like primitives to core CG. They should remain -library/runtime calls unless a backend can lower them specially. - -Atomic operations should also use `CfreeCgMemAccess` so type, address space, -alignment, volatility, and alias information have the same spelling as ordinary -memory operations. - -## 12. Inline Assembly - -The target constraint string is the operand contract. This is intentionally raw -because C/Zig-level inline asm needs the full target grammar: register classes, -explicit registers, immediate classes/ranges, memory/address constraints, -alternatives, matching/tied operands, earlyclobber, and target-specific -modifiers. A partial structured vocabulary would be less expressive and would -create a second spelling for facts the backend already parses from -constraints. - -Phase 1 contract: - -- Options: pure, nomem, readonly, preserves_flags, nostack, noreturn. -- Clobber ABI sets such as "clobber all caller-saved". - -Later additions: - -- Target feature requirements and target arch guard. - -Phase 1 keeps template strings and raw target constraints, wrapped in -`CfreeCgInlineAsm` so asm-wide options and operand arrays have a single -descriptor. - -## 13. 
Dynamic Stack Allocation - -Rust and Zig generally avoid C VLAs but still need stack temporaries, alignment, -and sometimes alloca-like lowering. - -Phase 1 contract: - -- Local slot allocation with explicit alignment and debug/address-taken flags. -- Parameter slot allocation with the same debug/artificial/temp flags. -- Dynamic `alloca(size, align)` returning a pointer. - -Later addition: - -- Stack probing for large frames as a target-selected behavior, with an option - to require it where platform ABI demands it. - -## 14. Debug Information - -Debug info should ride alongside ordinary CG usage as much as possible. The -default path should not require frontends to make a second set of debug-specific -calls for every function, parameter, local, and type. - -Auto-populate debug records from existing CG calls: - -- `cfree_cg_decl` carries linkage name, display/source name, declaration attrs, - type, and current source location. This is enough to create function/global - DIE skeletons. -- `cfree_cg_func_begin` / `func_end` define function ranges. -- `cfree_cg_param_slot` carries parameter index, type, name, and current source - location. This can create parameter DIEs and initial locations. -- `cfree_cg_local_slot` carries local type, name, alignment, flags, and current - source location. This can create local variable DIEs when the name is nonzero. -- `cfree_cg_set_loc` drives line table rows for subsequent instructions and - data definitions. -- Type constructors carry enough layout information for basic debug type DIEs: - scalars, pointers, arrays, functions, and natural-layout records. - -The regular API needs a few debug-oriented fields so this works: - -- Source/display name separate from linkage name. -- Compile-unit language tag and producer string. -- Public file registration or a documented way for frontends to obtain stable - `CfreeSrcLoc.file_id` values. -- Local/param flags: artificial, address-taken, optimized-out, compiler-temp. 
-- Optional lexical-scope markers for frontends that want nested scopes. These - can be ordinary CG control-flow scope calls with debug names/flags rather - than a separate debug API family. - -Limits of auto-population: - -- Inlined call-site info needs explicit frontend input because ordinary CG - locations only describe the current emitted instruction. -- Optimized variable locations beyond frame slots/registers may need later - hooks from the optimizer wrapper. -- Source-language-specific debug types may need optional metadata. That metadata - should decorate normal CG types/declarations rather than replacing them with a - separate debug-only API. - -## 15. Target Capability Queries - -A portable direct CG frontend needs to ask what the selected target can lower -without guessing from enum values. - -Add queries for: - -- Legal scalar widths and floating types. -- Legal atomic widths and lock-free status. -- Supported calling conventions. -- Supported inline asm constraint families. -- Object-format features: COMDAT, weak, protected visibility, TLS models, - common symbols, merge sections, constructor priorities. -- Backend feature flags: SIMD extensions, unaligned memory support, strict - alignment, red zone, pointer authentication, branch protection. - -Capability queries should answer "can this target/API lower it correctly", not -"is this fast". - -## 16. Diagnostics and Error Model - -Most current CG misuse paths panic. That is acceptable for internal compiler -bugs, but external frontends benefit from diagnosable unsupported-feature -failures. - -Use this distinction: - -- Malformed CG usage that indicates a frontend/compiler bug may panic. -- Unsupported but well-formed target features should emit diagnostics and fail - cleanly. -- Type/call/memory descriptors should be validated early enough that bad input - does not produce partial object corruption. - -## 17. Frontend Registration - -The current `CfreeLanguage` enum is fixed. 
That is enough for built-ins and the -toy frontend, but not for general external language plugins. - -Add: - -- Dynamic language registration by name, default suffixes, and compile callback. -- Per-language option payload passed through `CfreeCompileOptions`, or a generic - frontend user pointer. -- A standard way for a frontend to declare whether it needs preprocessing, - debug info, or target feature strings. - -Because this API can break, the fixed enum can be removed from the generic -frontend path. Builtin C/asm can still have fast internal dispatch. - -## 18. Suggested Phasing - -### Phase 1: One Clean Codegen Contract - -Status: public contract defined in `include/cfree/cg.h`. Implementation and -call-site migration are intentionally separate work. - -Phase 1 makes these breaking API choices: - -- Builtin integer types are width-only: `bool`/`i1`, `i8`, `i16`, `i32`, `i64`, - and `i128`. Signedness exists only on integer operations, comparisons, - conversions, and ABI extension attributes. -- Behavior-carrying qualified types are removed. `const` is an object/debug - fact, `volatile` is a memory-access fact, and `restrict`/`noalias` is an ABI - or memory-access fact. -- Pointer types carry pointee type plus address space. Address space 0 is the - normal target data address space. -- Generic byte-address offset is included for frontend-owned aggregate layouts. -- Function types are built from `CfreeCgFuncSig`: return type/attrs, - parameter type/attrs, calling convention, and ABI variadic bit. -- Declarations use exact raw linkage names plus optional display/source names. - CG does not apply C symbol spelling policy. -- `CfreeCgMemAccess` is the only way to spell memory semantics for loads, - stores, fixed-size memory ops, and atomics. -- Integer operations are split from floating operations and accept explicit - operation flags such as no-wrap, exact, signed/unsigned trap-on-overflow, and - signed/unsigned saturation. 
-- Semantic conversions are explicit: sign extension, zero extension, - truncation, pointer/integer casts, float extension/truncation, float/integer - conversions with rounding, and a distinct bitcast operation. -- Floating arithmetic and ordered/unordered comparisons are first-class API - operations, with strict defaults and optional fast-math flags. -- Calls use `CfreeCgCallAttrs` for tail policy and call-site flags. `musttail` - is represented as a contract the backend must accept or diagnose. -- Intrinsics include the backend primitives assumed by - `rt/include/cfree/{syscall,baremetal,coro}.h`. -- Atomics take `CfreeCgMemAccess`, include strong/weak compare-exchange, and - expose legality and lock-free capability queries. -- Target capability queries cover scalar type support, calling conventions, and - object-format symbol features. -- Inline assembly uses raw target constraints as the canonical operand contract. -- Switch/jump-table, computed goto, and unreachable terminator are explicit - control-flow operations. -- Dynamic alloca and local/parameter slot attributes are explicit stack-slot - operations. -- Inline assembly includes ABI clobber sets. -- Backend feature flags are queryable. -- Data address constants carry pointer address space. - -### Phase 2: Backend and Object Coverage Gaps - -- COMDAT/groups and constructor/destructor arrays. -- Stack probe requirement/request for large frames. -- More complete inline asm target-feature guards. - -### Phase 3: Debug and Frontend Integration - -- Complete auto debug emission from declarations, function ranges, locations, - params, locals, and type constructors. -- Compile-unit language/source registration. -- Optional lexical-scope markers through ordinary CG scopes. -- Dynamic frontend registration. - -## 19. 
Design Rule - -When deciding whether a feature belongs in public CG, use this test: - -- If the fact changes ABI, object contents, relocation choice, instruction - selection, memory ordering, or debug output, CG probably needs to express it. -- If the fact is source-language-only and can be fully lowered into existing - storage, calls, memory accesses, and operations, it belongs in the frontend. -- If the fact exists only to make frontend modeling easier, keep it out unless - omitting it causes incorrect backend output. -- If the fact requires whole-function analysis but does not need to be visible - to direct backends, it may belong in the optimizer wrapper rather than the - public direct-emission API. - -The goal is not to expose every internal compiler concept. The goal is to make -the direct codegen boundary honest enough that C, Zig, Rust-like languages, and -machine lifters can all lower to it without depending on internal headers or -silently losing backend-relevant semantics. diff --git a/doc/cg-neutral-backend-plan.md b/doc/cg-neutral-backend-plan.md @@ -1,286 +0,0 @@ -# Neutral CG Backend Migration Plan - -This document plans the migration from the existing C-shaped codegen path to a -neutral CG layer based on the public API in `include/cfree/cg.h`. It also -consolidates the lower-layer gap inventory exposed while updating -`src/api/cg.c` for that API. - -The central goal is that the C frontend becomes one client of a neutral codegen -interface. C `Type*` should stop being the backend type currency; it should be -translated at the frontend boundary into neutral CG type descriptors. Backends, -ABI classification, and target lowering should consume CG types and CG -operation descriptors. - -## Principles - -- Reuse public CG semantic enums and flags when they name the exact internal - concept: calling convention, TLS model, tail policy, memory order, rounding, - ABI attribute flags, operation flags, asm flags, and similar values. 
-- Do not pass public API structs directly into lower layers. Public structs use - API handles, caller-owned arrays, and frontend-facing ownership rules. Lower - layers should receive resolved internal descriptors with stable storage. -- Move C `Type*` above CG. The C parser/type system may still use `Type*`, but - it should lower C declarations, expressions, and layout requests into neutral - CG types before reaching ABI or `CGTarget`. -- Keep `ObjBuilder` mostly type-agnostic. It should model object-format facts: - symbols, sections, groups, relocations, data expressions, TLS model, sizes, - alignments, display names, and format-specific extensions. It should not - become a typed IR layer. -- Make unsupported behavior explicit. If a public CG feature cannot be lowered - or represented, the target/object layer should answer false through a - capability query or emit a diagnostic. Metadata should not be silently - ignored unless the API defines it as a hint. - -## Gap Coverage - -The public CG API already describes more semantics than the current lower -layers can represent. The migration plan below addresses these gaps by moving -metadata into neutral CG descriptors, object descriptors, or explicit target -capabilities. - -`CGTarget` and ABI gaps to close: - -- Non-default calling conventions are recorded by the public API but not - carried into ABI classification or lowering. -- ABI attributes are not consumed by call, return, or parameter lowering: - signext, zeroext, sret, byval, byref, inreg, noalias, readonly, writeonly, - nonnull, nest, explicit alignment, and dereferenceable size. -- Function attributes are incomplete below the API: stack alignment, custom - sections, target feature strings, cold/hot hints, naked functions, interrupt - functions, no-red-zone requests, ifunc, and full noreturn handling. -- Per-symbol TLS model selection does not reach target lowering. 
-- Pointer address spaces are only partially represented and do not have full - target semantics. -- Memory access metadata loses nontemporal, invariant, alias scope, and noalias - scope information. -- Computed goto, label-address values, and indirect branch over a validated - target set are unsupported. -- Switch lowering has no target hook and currently ignores jump-table hints. -- Integer operation flags are ignored: nsw, nuw, exact, trapping overflow, and - saturating arithmetic. -- Floating-point semantics are incomplete: FP remainder, fast-math flags, and - ordered-vs-unordered comparisons are not preserved. -- Conversion rounding modes are ignored. -- The internal intrinsic set is narrower than the public API, including FMA, - syscall, IRQ operations, barriers, cache maintenance, CPU wait/event ops, - coroutine switch, and signed-vs-unsigned overflow intrinsics. -- Atomic legality and lock-free queries are approximated from size instead of - target hooks; weak compare-exchange is accepted but not represented. -- Inline asm loses flags and ABI clobber sets. -- Call attributes are incomplete: musttail compatibility is not validated and - cold-call hints are ignored. - -`ObjBuilder` gaps to close: - -- Source/display names are not represented for symbols. -- DLL import/export and constructor priority are not semantic object features. -- Data label addresses have no object-level expression path. -- Data relocation address spaces are ignored. -- Symbol-difference expressions rely on available relocation kinds rather than - a format-neutral expression contract. -- Section merge/string entry size is not fully wired through data definitions. -- Common, weak, protected visibility, and COMDAT are only partially modeled as - an explicit object-level contract. - -## Type Direction - -Introduce an internal neutral CG type model as the canonical backend type -language. 
The public `CfreeCgTypeId` can be an API handle into this model, while -internal code may use either stable `CGTypeId` handles or `const CGType*` -references after validation. - -Surfaces that currently carry `Type*` and should move to neutral CG types -include: - -- `Operand.type` -- `MemAccess.type` -- `ConstBytes.type` -- `FrameSlotDesc.type` -- `CGParamDesc.type` -- `CGABIValue.type` -- `CGFuncDesc.fn_type` -- `CGCallDesc.fn_type` -- `AsmConstraint.type` -- ABI record layout and function classification inputs - -The C frontend should own the `Type* -> CGTypeId` adapter. Public CG API users -already construct neutral CG types directly, so they should not round-trip -through C types. - -## Internal Descriptor Shape - -Internal descriptors should be isomorphic to the public CG API where that is -useful, but resolved into backend-owned terms. - -For example, public input: - -```c -CfreeCgFuncSig -``` - -should normalize into an internal descriptor shaped like: - -```c -typedef struct CGAbiAttrs { - uint32_t flags; - uint32_t align; - uint64_t dereferenceable_size; -} CGAbiAttrs; - -typedef struct CGParam { - CGTypeId type; - CGAbiAttrs attrs; -} CGParam; - -typedef struct CGFuncSig { - CGTypeId ret; - CGAbiAttrs ret_attrs; - const CGParam* params; - uint32_t nparams; - CfreeCgCallConv call_conv; - int abi_variadic; -} CGFuncSig; -``` - -`TargetABI` should classify `CGFuncSig`, not a C function `Type*`. Parser paths -that still start with C `Type*` should synthesize a `CGFuncSig` during lowering. - -## Phasing - -### 1. Introduce Neutral CG Core Types - -Add the internal CG type table and descriptor APIs first, while keeping the old -codegen path working. This phase should define: - -- `CGTypeId` / `CGType` and constructors for builtin, pointer, array, function, - record, enum, and alias types. -- type layout/query hooks backed by `TargetABI`. -- `CGFuncSig`, `CGParam`, `CGAbiAttrs`, and neutral memory/access descriptors. 
-- a C frontend adapter from `Type*` to `CGTypeId`. - -This gives both the public CG API and the C frontend a shared neutral model -instead of treating `include/cfree/cg.h` as a facade over C-shaped internals. - -### 2. Move the C Frontend to the New CG Layer - -Make the C parser/frontend emit through the new CG API/layer. The old internal -CG path should no longer be a privileged backend path for C. - -This is the main semantic forcing function. It should prove that the neutral -type model can express normal C codegen, ABI calls, locals, lvalues, aggregates, -initializers, debug-facing names, and target-specific lowering requests. - -Prefer targeted red-green coverage during this phase: - -- function calls and returns for scalar, aggregate, variadic, and sret cases. -- object definitions, tentative definitions, TLS, readonly data, and custom - sections. -- control flow, switches, computed goto once supported, and inline asm. -- atomics and memory access descriptors. - -### 3. Keep the Old CG Layer Temporarily - -Do not delete `src/cg` immediately after the frontend starts targeting neutral -CG. Keep it as an adapter, comparison point, or dead-but-buildable path until -the new route is proven by the focused test corpus. - -The deletion point should be mechanical: no production path and no useful test -harness should depend on the old layer. Any parity tests worth keeping should -move to the new API before deletion. - -### 4. Update ObjBuilder to Object Descriptors - -Update `ObjBuilder` before broad `CGTarget` surgery where the new CG API already -needs stronger object semantics. - -`ObjBuilder` should grow descriptor-based write APIs for: - -- symbols with linkage name, display name, bind, visibility, kind, used, - import/export flags, COMDAT/group membership, common definition, constructor - priority, and per-symbol TLS model. -- sections with kind, semantic type, flags, alignment, entry size, group, link, - info, and format extension fields. 
-- data expressions for absolute symbol addresses, PC-relative symbol
-  references, symbol differences, and label-address values.
-
-Label addresses should ideally lower to normal local symbols. `CGTarget` or
-`MCEmitter` can create a local notype symbol for an addressable block label; data
-tables then use normal symbol relocations instead of a special data-label path.
-
-This phase should keep `ObjBuilder` independent of full CG type semantics. It
-needs sizes and alignments at definition time, not a general type graph.
-
-### 5. Update ABI and CGTarget to Consume CG Types
-
-Once the frontend and object layer are speaking the neutral model, update ABI
-classification and `CGTarget` signatures to consume CG descriptors directly.
-
-Important changes:
-
-- Replace `abi_func_info(TargetABI*, const Type*)` with classification keyed by
-  `CGFuncSig`.
-- Preserve ABI attributes in `ABIFuncInfo` / `ABIArgInfo`: signext, zeroext,
-  sret, byval, byref, inreg, noalias, readonly, writeonly, nonnull, nest,
-  explicit alignment, and dereferenceable size.
-- Extend `CGFuncDesc` for complete function attrs: stack alignment, section,
-  target feature strings, cold/hot, naked, interrupt, no-red-zone, ifunc, and
-  noreturn.
-- Extend `CGCallDesc` for tail policy, musttail validation, cold call hints,
-  direct/indirect callee details, and full ABI signature metadata.
-- Replace simple op hooks with descriptors preserving integer flags, FP flags,
-  ordered/unordered FP comparisons, FP remainder, and conversion rounding.
-- Preserve full memory metadata: address space, volatile, nontemporal,
-  invariant, alias scope, noalias scope, and atomic flag/order.
-- Add target hooks or descriptors for switches, label addresses, indirect
-  branches, atomics legality/lock-free queries, weak compare-exchange, expanded
-  intrinsics, and inline asm flags/ABI clobber sets.
-
-`opt_cgtarget` and IR replay should mirror the new `CGTarget` surface rather
-than reconstructing lost metadata.
-
-### 6. Delete the Old CG Layer
-
-Delete the old CG layer only after:
-
-- the C frontend emits through neutral CG.
-- public CG API tests pass through the same path.
-- `ObjBuilder`, `TargetABI`, `CGTarget`, and `opt_cgtarget` consume neutral
-  descriptors.
-- any useful parity tests have been moved.
-- no production driver or test harness depends on the old interfaces.
-
-At this point deletion should be mostly removing stale adapters and C-shaped
-plumbing, not making new semantic decisions.
-
-## Capability and Diagnostic Contract
-
-Capability queries should answer correctness, not performance. A target should
-return support only when it can preserve the requested semantics.
-
-Examples:
-
-- non-default calling conventions must be target-backed.
-- musttail requires ABI compatibility validation.
-- symbol feature queries should be backed by `ObjBuilder` and object-format
-  support, not approximated in `src/api/cg.c`.
-- atomic legality and lock-free answers should come from target hooks.
-- strict conversion rounding, trapping overflow, saturating arithmetic, FP
-  remainder, and runtime/bare-metal intrinsics should diagnose until supported.
-
-Hints such as non-temporal memory, branch/call hotness, and some fast-math flags
-may be ignored only when the public API explicitly permits that behavior.
-
-## Suggested Test Strategy
-
-Prefer narrow tests while the interfaces are changing:
-
-- `make test-cg` for neutral CG lowering and ABI behavior.
-- `make test-elf` for symbol attrs, sections, `entsize`, data expressions, and
-  object round-trips.
-- `make test-link` for relocation behavior, visibility, TLS, COMDAT, and
-  symdiff handling.
-- frontend subsets such as `make test-parse test-cg` when migrating C lowering.
-- specific arch smoke/codegen cases for features each target claims to support.
-
-Keep unsupported-feature tests explicit: they should assert diagnostics or false
-capability answers rather than relying on accidental backend behavior.
diff --git a/doc/cg-type-migration-plan.md b/doc/cg-type-migration-plan.md
@@ -1,157 +0,0 @@
-# Remove C `Type` From `src/`
-
-## Goal
-
-`src/` must be language-neutral. C semantic types stay in `lang/c`; generic
-codegen, ABI, arch lowering, optimizer, debug, object emission, and emu use
-`CfreeCgTypeId`, `CgType`, debug type IDs, or explicit storage facts.
-
-Completion means:
-
-```sh
-rg 'lang/c|type/type\.h|const Type\*|\bTypeKind\b|\bTY_' src include/abi
-```
-
-finds no generic `src` dependency on C semantic types. C-specific files under
-`lang/c` may still use `Type`.
-
-## Current Blockers
-
-These are the remaining dependency clusters to remove.
-
-1. **C compatibility shims in `src/`**
-   - `src/type/type.h`
-   - `src/decl/decl.h`
-   - `src/decl/decl_attrs.h`
-   - `src/lex/lex.h`
-   - `src/pp/pp.h`
-   - `src/parse/cg_public_compat.h`
-   - `src/api/pipeline.c -> lang/c/c.h`
-
-2. **ABI still exposes C `Type*` bridge APIs**
-   - `include/abi/abi.h` and `src/abi/abi.h` include `type/type.h`.
-   - `abi_type_info`, `abi_sizeof`, `abi_alignof`, `abi_record_layout`, and
-     `abi_func_info` still take `const Type*`.
-   - `abi_size_type`, `abi_ptrdiff_type`, `abi_intptr_type`,
-     `abi_uintptr_type`, and `abi_va_list_type` still manufacture C types.
-   - `src/abi/abi.c` still has C bridge classification/layout code.
-
-3. **Public CG implementation still stores C `Type*` internally**
-   - `src/api/cg.c` keeps `CgApiType.type`, `resolve_type`,
-     `cg_api_type_import`, `cg_api_type_resolve`, stack value types, slot type
-     tables, symbol type tables, function return types, and bridge helpers.
-   - It builds legacy C `Type*` values when public CG type constructors are
-     called.
-
-4. **`CGTarget` and arch lowering still use C type identity**
-   - `src/arch/arch.h` forward-declares `Type` and uses `const Type*` in
-     `FrameSlotDesc`, `MemAccess`, `ConstBytes`, `AggregateAccess`,
-     `BitFieldAccess`, `Operand`, `CGABIValue`, `CGParamDesc`, `CGFuncDesc`,
-     `CGCallDesc`, `CGScopeDesc`, `AsmConstraint`, `alloc_reg`, and
-     `va_arg_`.
-   - Arch internals include `type/type.h` and use helpers such as
-     `type_is_64`, `type_is_fp_double`, `type_byte_size`, and
-     `type_is_signed`.
-
-5. **Optimizer IR stores C `Type*`**
-   - `src/opt/ir.h`, `src/opt/ir.c`, `src/opt/opt.c`,
-     `src/opt/pass_lower.c`.
-   - `Func.val_type`, instruction result types, frame slots, call metadata,
-     and `IR_VA_ARG` aux data are still `const Type*`.
-
-6. **Generic debug has the C debug adapter in `src`**
-   - `src/debug/c_debug.c` and `src/debug/c_debug.h` walk `Type*`.
-   - Generic debug comments and APIs still refer to C `Type*` caches.
-
-7. **Emu stubs still synthesize C `Type*`**
-   - `src/emu/emu.h` exposes `emu_cpu_type` and `emu_block_fn_type` as
-     `const Type*`.
-   - `src/emu/cpu.c` constructs CPU/block types through C type constructors.
-
-8. **Core pool still has a C type hook**
-   - `src/core/pool.h` forward-declares `Type`.
-   - `pool_type` exists only for the old C type interning shape and should move
-     to `lang/c` or disappear.
-
-## Removal Order
-
-Do this in order; each step should keep `make lib`, `make bin`, and
-`make test-cg-api` green. Run parse/link tests when touching frontend or ABI
-behavior.
-
-1. **Make C lowering own the `Type* -> CfreeCgTypeId` cache**
-   - Add a cache field or map in `lang/c`.
-   - Ensure all C parser/codegen adapters call public CG constructors once per
-     C type.
-   - Add public CG record forward/begin/complete support before removing the
-     recursive-record placeholder bridge.
-
-2. **Finish `src/api/cg.c` migration**
-   - Replace all stored `const Type*` with `CfreeCgTypeId` or `CgType` facts.
-   - Remove legacy C type construction from public CG constructors.
-   - Keep any unavoidable bridge in tiny, named functions until step 8.
-
-3. **Make ABI purely CG-typed**
-   - Rename or replace the `abi_cg_*` APIs as the only ABI layout/classification
-     APIs.
-   - Delete C `Type*` ABI APIs and C bridge classification/layout code from
-     `src/abi`.
-   - Replace target library type helpers with CG type IDs or move C spellings
-     of `size_t`, `ptrdiff_t`, `intptr_t`, `uintptr_t`, and `va_list` to
-     `lang/c`.
-   - Remove `type/type.h` from `include/abi/abi.h` and `src/abi/abi.h`.
-
-4. **Make `CGTarget` language-neutral**
-   - Change target-facing descriptors in `src/arch/arch.h` from `Type*` to
-     `CfreeCgTypeId` or explicit facts: size, align, reg class, integer width,
-     float width, pointer/address-space, signedness where operation-specific.
-   - Replace arch helper reads of C types with CG helpers or operation flags.
-   - Remove `type/type.h` includes from `src/arch/**`.
-
-5. **Move optimizer IR off C types**
-   - Replace IR value/frame/instruction type fields with `CfreeCgTypeId` or
-     compact derived facts.
-   - Replace `IR_VA_ARG` `Type*` aux with a CG type handle.
-   - Remove `type/type.h` from `src/opt/**`.
-
-6. **Move C debug lowering out of generic debug**
-   - Move `src/debug/c_debug.*` to `lang/c/debug` or another C frontend adapter.
-   - Generic debug should consume frontend-provided `DebugTypeId` values, not
-     inspect C `Type`.
-   - Remove C type cache language from generic `src/debug` docs/comments.
-
-7. **Update emu stubs**
-   - Replace `emu_cpu_type` / `emu_block_fn_type` with CG type IDs or explicit
-     layout records.
-   - Build CPU state and block signatures through public CG constructors.
-   - Remove `type/type.h` from `src/emu/**`.
-
-8. **Move pool/type interning ownership to `lang/c`**
-   - Delete `pool_type` from `src/core/pool.*` or move the C-specific type
-     interning helper under `lang/c/type`.
-   - Remove the `Type` forward declaration from `src/core/pool.h`.
-
-9. **Delete compatibility shims and register C like Toy**
-   - Delete `src/type`, `src/decl`, `src/lex`, `src/pp`, and
-     `src/parse/cg_public_compat.h` once no `src` file includes them.
-   - Remove `src/api/pipeline.c`'s direct `lang/c/c.h` include and hardcoded C
-     branch.
-   - Register C through the frontend mechanism used by Toy.
-
-## Do Not Regress
-
-- Do not put C-only facts into `CgType`.
-- Signedness should live on operations, comparisons, conversions, ABI attrs, or
-  explicit lowering metadata, not storage type identity.
-- Object emission must remain byte/section/symbol/reloc based.
-- Keep frontend-specific debug/type lowering outside generic `src`.
-
-## Useful Checks
-
-```sh
-make lib
-make bin
-make test-cg-api
-rg 'lang/c|type/type\.h|const Type\*|\bTypeKind\b|\bTY_' src include/abi
-rg 'cg_api_type_import|cg_api_type_resolve|cfree_cg_internal_.*type' src
-```