kit

git clone https://git.ryansepassi.com/git/kit.git

commit cbeab9e6eaea110d2dc9f1a3263f8db8587a46d2
parent 0dbab6ba4f57c50e1ff6cd5a36e59dc21bd2dfdb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 14 May 2026 14:47:56 -0700

Rewrite design document

Diffstat:
M doc/DESIGN.md | 1123 ++++++++++++++++-------------------------------------------------------------
1 file changed, 222 insertions(+), 901 deletions(-)

diff --git a/doc/DESIGN.md b/doc/DESIGN.md @@ -1,931 +1,252 @@ -# cfree design +# cfree Design + +This document describes the current implementation structure of cfree. It is +not a roadmap and it does not describe target surfaces that are not wired into +the tree today. + +cfree is organized around a small public `libcfree` API, with CLI tools and +language frontends as API consumers. The library owns compilation, object +construction, linking, JIT mapping, debugging support, and emulation internals. +The driver owns command-line policy and host I/O. + +## Public Boundary + +The public headers are: + +- `include/cfree.h`: compiler lifecycle, targets, compile/link/JIT/debug/emu + APIs, host vtables, object inspection, archive and disassembly helpers. +- `include/cfree/cg.h`: the public code-generation API used by language + frontends. +- `include/cfree/frontend.h`: frontend support APIs that do not expose + `CfreeCompiler` internals, such as arenas, source registration, symbols, and + frontend panic boundaries. +- `include/cfree/hashmap.h`: public helper used by frontends. + +`driver/` is built against the public include tree only. It must not include +private `src/` headers. Its job is to parse tool options, load and release +files, provide host vtables in `CfreeEnv`, register frontends on each compiler, +and call public `cfree_*` entry points. + +`lang/` is also outside `src/` and is an API consumer. `lang/c` and `lang/toy` +use `<cfree.h>`, `<cfree/cg.h>`, and frontend support headers, plus their own +private headers under `lang/...`. They do not reach into `src/` implementation +headers. + +`src/` is `libcfree` implementation. Internal modules may share private +headers, but their public surface is exposed only through `include/`. + +## Layering + +From outside to inside: + +1. **Driver (`driver/`)** + Implements the multi-call `cfree` binary: `cc`, `as`, `ld`, `ar`, + `objdump`, `run`, `dbg`, and `emu`. 
It translates command-line flags into + public API options, supplies heap/diagnostic/file/executable-memory vtables, + and resolves path-shaped inputs into byte buffers and writers. + +2. **Language frontends (`lang/`)** + Registered per `CfreeCompiler` with `cfree_register_frontend`. + `lang/c` preprocesses, parses, type-checks, manages C declarations, and + drives the public CG API. `lang/toy` is a small frontend used to exercise the + same CG API. Frontends produce object contents through `CfreeCg`; they do + not own object formats or linker policy. + +3. **Public API glue (`src/api/`)** + Implements `CfreeCompiler`/`CfreePipeline` lifecycle, compile and link entry + points, writer helpers, object inspection, archive APIs, disassembly APIs, + frontend support, and the public CG API. This layer is the composition point + between public handles and internal subsystems. + +4. **Core services (`src/core/`)** + Provides allocation helpers, arenas, vectors, buffers, string buffers, + interned symbols, source-file tracking, diagnostics, hashing, and common + utilities. Most state is rooted in a `CfreeCompiler` or in explicit + subsystem contexts passed through the call graph. + +5. **Frontend-neutral compilation internals** + `src/abi/` owns target ABI layout and call classification. `src/arch/` + owns target registration, internal `CGTarget` and `MCEmitter` creation, + instruction emission, register files, register allocation support, + disassembly, and per-architecture fixups. `src/asm/` owns the standalone + assembler and shared assembler helpers. + +6. **Optimizer (`src/opt/`)** + Implements an internal `CGTarget` wrapper. At `-O0`, code generation drives + the real target directly. At `-O1`, the wrapper records functions as IR, + runs the implemented lowering/register-allocation/combine/DCE path, and + replays into the wrapped target. `-O2` is reserved in API shape and comments + but is not currently implemented as a full optimization pipeline. + +7. 
**Object, debug, link, JIT, dbg, and emu** + `src/obj/` owns the in-memory object model plus ELF and Mach-O reading and + writing paths. `src/debug/` owns DWARF production and reading. `src/link/` + resolves symbols, lays out images, applies relocations, emits executables, + and builds JIT images. The public shared-library entry point is present, but + shared-library codegen is not yet supported. `src/dbg/` layers breakpoints, + stepping, memory access, and displaced execution over JIT sessions. + `src/emu/` loads guest ELF images, decodes/lifts guest blocks, and runs them + through the JIT-backed runtime. + +8. **Runtime (`rt/`)** + Provides freestanding headers and compiler-rt/libc-style support code used + by generated programs and self-hosting configurations. It is separate from + the compiler implementation library. + +## Compile Data Flow + +### C source to object -Architecture of the cfree compiler, assembler, and linker. Companion to -`README.md`. Scope: how the modules fit together and what their contracts are. -Not a tutorial; not implementation notes. - -## 1. Goals - -- Conforming C11 freestanding compiler, written in C11. -- Single multi-call binary: `cc`, `cpp`, `as`, `ld`, `ar`, `objdump`, `dbg`. -- Targets: x86 (32/64), ARM (32/64), RISC-V (32/64), WASM. -- Output: object files (ELF, COFF, Mach-O, WASM) and executables. -- In-memory JIT path sharing the entire pipeline with the file path. -- Lightweight optimizer at roughly 70% of GCC/Clang `-O2` on integer code. -- Self-hosting. Bootstraps from a hex0-seed. -- Streaming wherever feasible. Direct lowering is function-at-a-time; `-O2` - may retain per-TU IR for inter-procedural optimization. 
+``` +driver cc + -> CfreeEnv + CfreeCompiler + -> cfree_compile_obj*() + -> registered C frontend + -> lexer -> preprocessor -> parser/type/declaration logic + -> public CfreeCg API + -> internal CGTarget + -> MCEmitter + -> ObjBuilder + -> object writer or linker input +``` -This design keeps the full project goals visible, but the interface contracts -below are currently tightened around the compiler, object emission, linker, -and JIT path. Standalone tool-specific surfaces (`ar`, `objdump`, `dbg`, -packaging, bootstrap) are allowed for by the shared model but are not the focus -of this pass. +The driver loads the source bytes and chooses `CfreeCompileOptions`. The +pipeline entry in `src/api/pipeline.c` creates an `ObjBuilder` and dispatches +to the frontend registered for `input->lang`. -## 2. Non-goals (v1) +The C frontend registers source files and include edges through frontend +support APIs, then preprocesses and parses the token stream. It records C +declaration semantics in its own `lang/c` tables and emits functions and data +through `CfreeCg`. -- C++, Objective-C. -- C11 variable-length arrays and variably-modified types (`__STDC_NO_VLA__`). -- Cross-TU LTO, PGO, autovectorization beyond peephole-level idiom recognition. -- Thread-safe parallel compile inside one process. -- Sanitizers, coverage instrumentation. -- `_Generic` corner cases that require multi-pass disambiguation are best-effort. +`CfreeCg` maps public CG types, symbols, stack operations, calls, branches, +data definitions, and source locations onto internal target operations. The +selected target either receives those operations directly (`-O0`) or through +the optimizer wrapper (`-O1`). Final machine bytes, labels, relocations, and +section contents are written into `ObjBuilder` through `MCEmitter`. -## 3. 
Layout +### Assembly source to object ``` -include/ public C11 headers shipped with the compiler (the runtime) -lib/ compiler-rt (the runtime) -src/ - core/ allocators, intern pool, source manager, diagnostics, buffers, target - lex/ shared tokenizer (C and asm) - pp/ C preprocessor - type/ target-neutral C type interning and compatibility - abi/ target ABI type layout and call classification - decl/ C declaration, linkage, storage-duration, and initializer model - parse/ C11 parser, asm parser - cg/ single-pass value-stack code generator - arch/ CGTarget + MCEmitter interfaces and per-arch backends - opt/ lightweight SSA IR + passes; presents itself as a CGTarget - obj/ in-memory object model + per-format file writers and readers - debug/ DWARF info collection + emission - link/ symbol resolution, relocation, exe writer, JIT linker - driver/ multi-call dispatch and command-line front-ends -test/ -doc/ +driver as or cfree_compile_obj*(CFREE_LANG_ASM) + -> asm lexer/parser + -> MCEmitter + -> ObjBuilder + -> object writer or linker input ``` -The compiler source lives in `src/`. `include/` and `lib/` are the runtime that -ships *with* the compiler (the freestanding stdlib and compiler-rt) and are -not built by the compiler-development tree. +Assembly bypasses `CfreeCg` because it is already target-level syntax. The +assembler uses the same object builder and machine emitter path as compiled C. -## 4. Dataflow +### Toy source to object ``` -.c → lex → pp → parse_c → decl + cg → CGTarget → MCEmitter → ObjBuilder ──┬──→ emit_{elf|coff|macho|wasm} → .o / exe -.s → lex → parse_asm ───────────────→ MCEmitter ──────────────────┤ - ├──→ link_file (.o + archives → exe) - └──→ link_jit (mmap + exec) +driver cc input.toy + -> registered toy frontend + -> public CfreeCg API + -> normal backend/object path ``` -Reading order, left to right: - -1. `lex` produces a stream of raw tokens (idents, numbers, punctuators, - strings). 
Tokens preserve exact spelling; literals carry deferred `LitId` - handles rather than host-decoded numeric values. -2. For C: `pp` consumes tokens, expands macros, and emits a stream of - preprocessed tokens. For asm: tokens go straight to `parse_asm`. -3. `parse_c` is recursive-descent over preprocessed tokens. It records C - declaration semantics in `DeclTable` and drives `cg` for executable code. - There is no explicit AST. -4. `cg` maintains a value stack à la TCC. Each parser action manipulates that - stack: pushes, loads, stores, aggregate copies, conversions, calls. At - `-O0`, CG owns live value lifetimes, spills, reloads, and preservation - across calls/asm; the target provides scratch registers and spill/reload - mechanics. -5. `CGTarget` is the typed C/IR lowering vtable. Concrete targets lower those - operations into machine emission; the optimizer also implements `CGTarget` - by recording the call sequence as IR per function, running - intra-procedural passes on `func_end`, and on `cgtarget_finalize` running - cross-function passes before replaying into the wrapped target `CGTarget`. -6. `MCEmitter` is the machine/object emission vtable. It owns section position, - bytes, alignment/fill, relocations at explicit offsets, machine-label - references, and source locations for debug line emission. -7. `ObjBuilder` is the single in-memory object representation. It accepts - sections, bytes, symbols, and relocations on the write side, and exposes - read accessors for file writers, the linker (file and JIT), and objdump. - -`parse_asm` bypasses `cg` and writes directly into `MCEmitter`; inline asm -is a typed `CGTarget.asm_block` operation that lowers through the target's asm -machinery. See §10. - -## 5. Key interfaces - -### 5.0 `SourceManager` (`src/core/core.h`) — source identity - -`SourceManager` is owned by `Compiler` and is the authority for `SrcLoc.file_id`. 
-It registers real files, memory inputs, builtins, and macro-expansion pseudo -files; maps file ids back to normalized paths and diagnostic spellings; records -include edges; and exposes dependency iteration for `-M*` output. Lexer and -preprocessor create source ids through it. Diagnostics, DWARF, dependency -generation, and reproducible-build path handling read from it rather than -inventing their own file tables. - -Macro-expanded tokens keep both spelling and expansion locations. Consumers -that need user-facing diagnostics can ask for spelling locations; consumers -that need execution/profiling/debug line attribution can ask for expansion -locations. `Debug.debug_file` takes a source file id, not a raw path. - -### 5.1 `CGTarget` (`src/arch/arch.h`) — typed lowering - -`CGTarget` is a vtable representing "something that can accept typed C/IR -operations for one function at a time". `cg` calls `CGTarget` after it has -resolved an operation's operands to concrete `Operand` values (immediate, -register, frame-relative, object-symbol-relative, indirect). Direct target -implementations lower these operations into their `MCEmitter`; `opt` wraps a -target `CGTarget` and records the same operations as IR before replaying them -later. - -Method groups: - -- **Function lifecycle.** `func_begin(CGFuncDesc)`, `func_end`. - `CGFuncDesc` carries the function `ObjSymId`, `fn_type`, inspectable - `ABIFuncInfo`, parameter descriptors, and declaration location. -- **Frame slots, parameters, and value lifetimes.** `frame_slot(FrameSlotDesc)` - creates stable frame-resident storage for locals, parameters, spills, sret, - and dynamic-allocation bookkeeping. `param(CGParamDesc)` binds a source - parameter index to its stable slot and ABI incoming parts. `alloc_reg(class, - type)` returns a - physical scratch register for real targets and a fresh virtual for - `opt_cgtarget`. 
CG, not the target, owns the `-O0` value stack: it uses - `clobbers`, `spill_reg`, and `reload_reg` to preserve live values across - register pressure, calls, and inline asm. `free_reg` releases a value-stack - claim; `opt_cgtarget` treats it as a hint. -- **Control flow.** `label_new`, `label_place`, `jump`, `cmp_branch` (fused - compare-and-branch; the only conditional-branch primitive — for arbitrary - i1 values cg synthesizes `cmp_branch(CMP_NE, val, IMM_ZERO, label)`). -- **Structured control flow.** `scope_begin(CGScopeDesc)`, `scope_else`, - `scope_end`, `break_to`, `continue_to`. `CGScopeDesc` carries explicit break - and continue labels, so C `for` continues land on the increment expression - instead of assuming the loop header. Real backends shim these onto - `label_new`/`label_place`/`jump` (no code-size cost). The WASM backend - consumes them natively to emit block/loop/if with structurally-bounded `br` - targets. `goto`, computed-goto, and `switch` fallthrough still go through - the flat label API. opt's IR is flat-CFG; at -O2 the WASM lowering pass - reconstructs structure from the flat IR. -- **Data movement and aggregates.** `load_imm`, `load_const`, `copy`, `load`, - `store`, `addr_of`, `copy_bytes`, `set_bytes`, `bitfield_load`, and - `bitfield_store`. Scalar memory operations carry `MemAccess`; aggregate and - bitfield operations carry ABI-sized metadata so struct assignment, block - zeroing, byval copies, and bitfield accesses remain visible to opt and - direct backends. -- **Arithmetic / compare / convert.** `binop` uses explicit integer and - floating-point op families (`BO_I*`, `BO_F*`) rather than inferring behavior - from operand type. `cmp` materializes 0/1; use `cmp_branch` when the result - feeds a branch. `convert` is explicit by `ConvKind`. -- **Calls / return.** `call(CGCallDesc)` and `ret(CGABIValue*)`. 
The parser - type-checks `fn_type`; CG asks `TargetABI` for `ABIFuncInfo`, materializes - `CGABIValue`/`CGABIPart` arrays for direct, indirect/byval, sret, split, and - multi-register values, and passes that structured call/return shape to the - target. `callee.kind == OPK_GLOBAL` is a direct call; any other kind is - indirect. On WASM, `fn_type` selects the `call_indirect` type index — - interned `Type*` identity is the index source of truth (§12). -- **alloca.** `alloca(dst, size, align)` — dynamic stack allocation. Reachable - only via `__builtin_alloca` since v1 does not parse VLAs (§2). Backend grows - the linear-memory or native shadow stack; result pointer in `dst`. -- **Variadics.** `va_start`, `va_arg`, `va_end`, `va_copy`. `<stdarg.h>` macros - expand to compiler builtins which CG forwards here. Per-arch ABI: SysV - x86-64 manages the register-save area; arm64 manages its split gp/fp areas; - WASM walks the spilled-args memory. -- **setjmp / longjmp.** Optional methods. Real backends leave them NULL: the - parser lowers `<setjmp.h>`'s `setjmp` to a normal call to `__cfree_setjmp` - (a hand-written .S in `lib/`) and opt recognizes the symbol by name as - returns-twice (no inlining across; values defined before the call are not - GVN-merged with values defined after). The WASM backend implements - `setjmp_`/`longjmp_` via the exception-handling proposal — there is no - saveable native SP, so a library-only implementation is impossible. -- **Atomics.** `atomic_load`, `atomic_store`, `atomic_rmw`, `atomic_cas`, - `fence`. Atomic memory operations carry both `MemAccess` and `MemOrder`. - Backends route oversized atomics to compiler-rt; small atomics are inline. -- **Inline asm.** `asm_block(tmpl, outs, ins, clobbers)` — per-arch - constraint binding plus template assembly, packaged as one operation. The - asm parser is reused as a template walker inside this call, but final bytes - and relocations are emitted through `MCEmitter`. 
-- **Source location.** `set_loc(SrcLoc)` — sticky; subsequent emit-side - calls inherit it. `opt_cgtarget` stamps it onto each `Inst.loc`; target - backends forward it to `MCEmitter` for `Debug.line`. -- **End-of-TU.** `finalize`. - -Implementations: - -- Real CGTargets per arch under `src/arch/`. Their `finalize` is a no-op. -- `opt` (`src/opt/opt.h`) returns a wrapper CGTarget that records into IR. - Its `finalize` runs cross-function passes and lowers all buffered IR into a - wrapped target CGTarget. - -### 5.2 `MCEmitter` (`src/arch/arch.h`) — machine/object emission - -`MCEmitter` is the low-level emission vtable shared by target backends and -assembler input. It owns the current section, byte position, machine-label -creation/placement, raw byte output, fill/alignment, relocations against -`ObjSymId` at explicit offsets, label references/fixups, and sticky source -locations used by the debug line program. - -`CGTarget` implementations may hide instruction selection, register -allocation, prolog/epilog emission, and instruction encoding behind their -typed methods, but when they finally write object contents they go through -`MCEmitter`. `parse_asm` uses the same emitter directly because assembler -input is already machine-level syntax. - -### 5.3 Symbol identity — object-first - -`Sym` is only an interned spelling. It is used for identifiers, section names, -debug names, and lookup keys, but it is not a symbol table entry. - -`ObjSymId` is the authoritative symbol handle during compilation, assembly, -object reading, relocation emission, debug collection, and link input. It is -scoped to one `ObjBuilder`, so two objects can both contain a local `static -int x` without colliding, and an object reader can preserve local labels, -section symbols, file symbols, unnamed temporary symbols, and external -references faithfully. 
Parser declaration binding creates or reuses -`ObjSymId`s in the current builder; `cg`, `CGTarget`, `MCEmitter`, `Debug`, and `ObjBuilder` -traffic in those handles. - -The linker has its own resolved-symbol table built from each input object's -`ObjSymId`s. Externally visible definitions are matched by `Sym` name and -binding during resolution. JIT lookup and explicit entry selection are -therefore name-based (`Sym`), not handle-based: object symbol handles are not -portable across builders. - -### 5.3.1 `DeclTable` (`src/decl/decl.h`) — C declarations - -`DeclTable` is the C-language declaration layer above `ObjBuilder`. The parser -uses it for storage class, linkage, visibility, TLS, inline/weak attributes, -tentative definitions, static locals, explicit sections, and global -initializers. It returns `DeclId`s for parser and CG bookkeeping and owns the -mapping from a C declaration to its object-scoped `ObjSymId`. - -Global initialization is a list of `InitItem`s: zero ranges, exact -`ConstBytes`, relocatable symbol references, and fills. `DeclTable` applies C -rules such as tentative-definition coalescing and default section selection, -then writes concrete sections, bytes, symbols, and relocations into -`ObjBuilder`. `ObjBuilder` remains object-format canonical storage and does not -learn C storage-duration rules. - -### 5.4 `TargetABI` (`src/abi/abi.h`) — target layout authority - -`Type` is structural and target-neutral: kind, qualifiers, element/parameter -types, immutable record fields, array counts, scoped tag ids, tag spellings, -and bitfield flags/widths. -Records are built through a mutable `TypeRecordBuilder` and committed to an -interned immutable `Type*`. Field flags distinguish normal fields, anonymous -fields, flexible array members, bitfields, and zero-width bitfields. `Type` -does not own target-dependent facts such as scalar widths, record size, field -offsets, bitfield packing, aggregate alignment, or calling-convention -classification. 
- -Record and enum tags carry a `TagId` in addition to their `Sym` spelling. -`Sym` is only the diagnostic/debug spelling; `TagId` is scoped declaration -identity. This prevents two unrelated `struct S` declarations in different C -scopes from collapsing under global type interning. - -`TargetABI` is the one authority for those facts. It is initialized from -`Compiler.target` and is available as `Compiler.abi`. Its responsibilities: - -- Builtin scalar profiles: width/alignment/signedness of C scalar types, - pointer size/alignment, `long double`, enum representation policy, and - target-defined library types (`size_t`, `ptrdiff_t`, `intptr_t`, - `uintptr_t`, `va_list`). -- `sizeof`/`_Alignof` for every complete type. -- Record layout: field byte offsets, bitfield storage units, bit offsets, - final size, final alignment, and incomplete-type diagnostics. -- Calling convention classification: direct/indirect/split aggregate - arguments, return values, hidden sret pointers, byval copies, variadic - register-save/spill behavior, stack slot alignment, and inspectable - per-part placement data. - -Consumers must ask `TargetABI` rather than reading layout facts from `Type`. -Parser/type checking use it for `sizeof`, `_Alignof`, field access, enum -constant typing, and diagnostics. `cg` uses it before creating frame slots, -before emitting aggregate/bitfield operations, and when selecting conversions. -Calls use a hybrid model: `TargetABI` returns rich `ABIFuncInfo` data; CG turns -that into `CGABIValue`/`CGABIPart` operands; target hooks handle only final -instruction/OS-specific mechanics. `Debug` uses ABI data for DIE sizes, member -locations, parameter locations, and sret/byval facts. - -### 5.5 `ObjBuilder` (`src/obj/obj.h`) — concrete - -The single in-memory object representation. There is no second implementation, -so it is a concrete type rather than a vtable. 
Object, section, group, and -symbol handles are explicit (`OBJ_SEC_NONE`, `OBJ_GROUP_NONE`, -`OBJ_SYM_NONE`). The write API -(`obj_section`/`obj_write`/`obj_reserve_bss`/`obj_symbol`/`obj_reloc`/ -`obj_finalize`) is what MCEmitter, CGTarget, and `.o` readers use; the read API -(`obj_section_get`/`obj_relocs`/`obj_symbol_get`, symbol iteration with ids) is -what file emitters, the linker, JIT, and future objdump use. - -`ObjBuilder` is a canonical superset model, not merely "bytes plus names". -Sections carry both coarse compiler kind (`SEC_TEXT`, `SEC_DATA`, ...) and -object semantics (`SSEM_PROGBITS`, `SSEM_RELA`, `SSEM_GROUP`, ...), flags, -alignment, entry size, link/info references, and group membership. Symbols -carry binding, kind, visibility, absolute/common/TLS state, common alignment, -and object-scoped identity. Relocations record kind, explicit-addend versus -in-place addend, pairing, target symbol, and addend. COMDAT/group membership -is represented explicitly. `Writer` is a real byte sink with write, seek, tell, -error, and close operations so file emitters do not depend on a hidden I/O -side channel. - -Format-specific metadata is admitted only through typed enum fields -(`ObjExtKind`, semantic kinds, flags) and narrowly-scoped extension values -where a real format has no shared equivalent. Avoid opaque `void*` sidecars: -linker, JIT, emitters, readers, and objdump must be able to inspect the -canonical model without knowing which reader produced it. - -The invariant: the post-finalize state of an `ObjBuilder` is the same shape -as what you'd get from reading a `.o` back in. So `read_elf` of a freshly -emitted file produces an `ObjBuilder` indistinguishable from the one used to -emit it, modulo permitted canonicalization of section ordering and string-table -layout. Consumers (linker, objdump) don't care which path produced it. 
- -### 5.5.1 `LinkImage` (`src/link/link.h`) — resolved program image - -`Linker` accepts explicit inputs (`LinkInputId`) for fresh objects, object -files, and archives. Resolution produces a `LinkImage`: a shared file/JIT data -model containing resolved symbols (`LinkSymId`), final symbol addresses, -segments, laid-out section placements (`LinkSectionId`), segment bytes, and -relocation applications with concrete write locations. Undefined, duplicate, -unsupported-relocation, and layout failures are fatal diagnostics through -`Compiler.panic`. - -Executable emission and JIT mapping consume the same `LinkImage`. File writers -(`link_emit_image_writer`) read segment bytes, section placements, final -addresses, and relocation records from the image and write to a caller-owned -`Writer*`. JIT (`cfree_jit_from_image`) maps fresh writable memory, copies the -same segment bytes, applies relocation records at their `write_vaddr` -locations, resolves allowed external symbols through `LinkExternResolver`, -changes final permissions, and looks up exported/entry symbols by resolved -`Sym` name. Object-local `ObjSymId` values never escape as JIT lookup handles. -`CfreeJit` is the public owning handle; it takes ownership of the `LinkImage` -on construction and releases both on `cfree_jit_free`. - -`link_resolve` registers the returned `LinkImage` with `compiler_defer`, so a -panic between resolve and consumer (file emit or JIT mapping) reaps the -image. Successful consumers either call `link_image_free` (which undefers -and frees) or transfer ownership via `cfree_jit_from_image` (which undefers -and keeps the image alive for the JIT's lifetime). - -Linker inputs are byte buffers (`link_add_obj_bytes`, `link_add_archive_bytes`) -or already-built `ObjBuilder*` (`link_add_obj`). Path-shaped inputs are a -driver-level concern: the driver calls `c->env->file_io->read_all`, then feeds -the bytes APIs. 
- -**Incremental-linking forward compat.** The single-shot `link_resolve` -implementation must not destroy or consume input-side state that a future -incremental re-resolve would need. `LinkRelocApply` records stay as data -(they are not burned into segment bytes destructively without preserving the -originals); `LinkInputId → ObjBuilder*` mappings stay stable for the -lifetime of the `Linker`; resolution is a function from inputs to a fresh -`LinkImage`, not in-place mutation of the `Linker`. Incremental linking is -the single most likely future addition, and the existing surface -(`LinkInputId` stable handles, separable `LinkImage`, byte/`ObjBuilder` -inputs) is already amenable — this discipline keeps it amenable without -adding a speculative API. - -### 5.6 `MemAccess` — explicit memory semantics - -`MemAccess` is attached to every typed memory operation (`load`, `store`, -atomics, and IR memory instructions). It contains: - -- `type`: the semantic C object type being accessed. -- `size`: ABI byte width of the access. -- `align`: known byte alignment; `0` means unknown. -- `flags`: volatility, atomicity, restrict-derived noalias facts, readonly / - writeonly knowledge, and explicit unaligned accesses. -- `addr_space`: target address space / memory index (`0` for ordinary C - memory; WASM may use this for multiple memories later). -- `alias`: an alias root, one of unknown, local, global `ObjSymId`, parameter, - heap, or string literal. - -`cg` derives `MemAccess` when it turns an lvalue into a memory operation: -qualifiers supply `volatile` and `_Atomic`, `TargetABI` supplies size and -minimum alignment, declaration binding supplies local/global/parameter roots, -string literals supply string roots, and pointer arithmetic preserves the -best known root until it escapes. Casts that lose provenance downgrade the -root to `ALIAS_UNKNOWN`; `restrict` pointers create parameter roots with the -restrict flag. 
-
-Optimization rules:
-
-- Volatile memory operations are side effects. They may not be deleted,
-  merged, reordered with other volatile operations, or moved across calls or
-  inline asm with a memory clobber.
-- Atomic operations use both `MemAccess` and `MemOrder`; memory-order rules
-  dominate ordinary alias reasoning.
-- Nonvolatile accesses with disjoint known alias roots may be reordered or
-  used for redundant-load and dead-store elimination.
-- Unknown alias roots conservatively may alias any ordinary memory.
-- The metadata is a permission to optimize, not a UB oracle: opt still may
-  not assume invalid programs are unreachable (§9).
-
-### 5.7 `ConstBytes` — exact literal materialization
-
-`ConstBytes` is the representation for constants whose exact target bits
-matter. It carries the semantic `Type*`, ABI representation bytes, size, and
-alignment. The bytes are produced by literal parsing plus `TargetABI`, never
-by trusting host floating-point layout. This matters for hex floats,
-rounding, `float` versus `double`, target-specific `long double`, endian
-order, and future vector constants.
-
-`CGTarget.load_imm(dst, i64)` remains a convenience for small integer
-constants. `CGTarget.load_const(dst, ConstBytes)` is the general path. Target
-backends may encode the constant as an immediate, synthesize it with
-instructions, or place it in a constant pool / `.rodata` and emit a load.
-`cg_push_const` pushes an exact constant. `cg_push_float(double, type)` exists
-only as a convenience for parser paths that have already accepted host-double
-precision loss as harmless; conforming literal parsing should prefer
-`cg_push_const`.
-
-### 5.8 Tokens and literals — spelling first, decoding later
-
-`Tok` preserves exact token spelling for diagnostics, macro stringification,
-token pasting, dependency output, and faithful preprocessing. Numeric,
-character, and string literals carry a `LitId` into the lexer's/preprocessor's
-literal table. A literal record stores kind, encoding, suffix/encoding flags,
-the exact spelling, and decoded bytes/code units only when decoding is already
-target-independent.
-
-The lexer does not choose final C literal types and does not round floating
-literals through host `double`. The parser, with `TargetABI`, performs integer
-literal type selection, floating parsing/rounding, character literal value
-selection, string literal concatenation, and construction of exact
-`ConstBytes`. The preprocessor uses spelling and `LitId` to implement `#`,
-`##`, `__LINE__`/`__FILE__`, include handling, and macro expansion without
-discarding information the parser later needs.
-
-Bad literals remain tokens with `TF_LITERAL_BAD` plus spelling and source
-location so diagnostics can point at the exact source text and recovery can
-continue.
-
-## 6. Allocators and lifetimes
-
-cfree uses explicit allocators rather than a single global heap. Allocators are
-fields of `Compiler` (`src/core/core.h`) and are passed down to subsystems.
-
-| Allocator | Lifetime | Owns |
-|--------------|------------------------|--------------------------------------------------------|
-| `Pool global`| Process | Interned strings and interned types. |
-| `env.heap` | Output object/exe | Section chunks, reloc tables (survive into linker), JIT bookkeeping. |
-| `Arena tu` | One TU compile | Local symbols, parser scratch, SourceManager tables, ABI caches. |
-| `Arena scratch` | Reset per function | Value-stack scratch, fixup lists, lookahead buffers. |
-
-Rules:
-
-- A struct never owns its own heap implicitly. If it allocates, an allocator
-  reference is part of its API.
-- Arena resets are an explicit operation on the arena. Subsystems holding
-  pointers into a scratch arena must either copy them out before reset, or
-  treat them as invalidated.
-- Long-lived data (anything that outlives a TU) goes through `Pool global` or
-  `Heap output`. Don't copy from arenas into one of those — interning is the
-  only path in.
-- Source identities live in `Compiler.sources`. They are stable for the
-  compile/link invocation and are read by diagnostics, dependency output, and
-  DWARF emission.
-
-`env.heap` is a normal heap (typically `heap_libc`). The JIT does not
-compile directly into executable memory: `cfree_jit_from_image` consumes a
-resolved `LinkImage`, mmaps a fresh region, copies laid-out segments in,
-applies relocations in-place, and `mprotect`s final permissions. The `Heap`
-vtable still exists so the JIT can swap allocators for the *destination*
-mapping and so tests can substitute fakes.
-
-## 7. Error handling
-
-A single `Compiler` carries a `jmp_buf` and references a host-supplied
-`DiagSink` through `CfreeEnv`. Fatal errors call `compiler_panic`, which emits
-a diagnostic and `longjmp`s out of the entire parse/CG pipeline. Drivers
-establish the `setjmp` boundary at TU or pipeline granularity.
-
-Layered driver functions (`cfree_compile_obj`, `cfree_link_*`, `cfree_run`)
-each install their own boundary. To remain composable, every such function
-saves `c->panic` via `compiler_panic_save` on entry and restores it via
-`compiler_panic_restore` on every exit path (panic-return after
-`compiler_run_cleanups`, and success). Without save/restore, an inner
-`setjmp` clobbers an outer one and any subsequent `compiler_panic` in the
-outer caller longjmps into the inner's already-returned stack frame.
-
-This means almost no function in `parse`, `cg`, or `arch` returns an error. The
-happy path is the only path. Arena scratch is reset rather than unwound
-one-by-one.
-
-Subsystem objects with non-arena resources (file handles, mmaps, child
-allocators) self-register a cleanup with `compiler_defer` in their `_new`
-and call `compiler_undefer` from their `_free`. The pipeline-level
-`setjmp` handler runs `compiler_run_cleanups`, which walks the LIFO stack
-and releases everything still registered. This keeps `compiler_panic`
-correct even when failure happens deep inside a composition that has
-allocated several subsystems.
-
-What is *not* fatal: warnings, recoverable parse errors that have a sensible
-recovery point (skip-to-`;`, skip-to-`}`). The parser uses limited internal
-recovery for these and only escalates to `compiler_panic` when continued
-parsing would produce cascading garbage.
-
-## 8. Streaming
-
-Streams cleanly on direct lowering (`-O0` and targets that do not wrap with
-`opt_cgtarget`):
-
-- Lexer → preprocessor token stream.
-- Preprocessor → parser token stream.
-- Parser → CG → CGTarget calls within a function.
-- CGTarget → MCEmitter → ObjBuilder section bytes, appended via chunked buffers.
-
-Buffers per function (bounded, not per TU):
-
-- CG's value stack and label fixup tables.
-- Per-target register/frame state.
-- Optimizer's IR for the function being optimized, when only intra-procedural
-  passes are enabled.
-
-Buffers per TU:
-
-- Symbol tables — relocations cannot be resolved until all definitions are
-  seen. Final patching is deferred to ObjBuilder finalize / linker.
-- Debug info — DWARF tables reference final section layout.
-- `-O2` optimizer IR — cross-function inlining keeps all candidate function IR
-  and call graph metadata until `cgtarget_finalize`.
-
-So the streaming guarantee is tiered:
-
-- `-O0` direct target: source and codegen are function-at-a-time.
-- `-O1` target-local optimization: function-at-a-time unless a target opts
-  into specific buffering.
-- `-O2`: source is still read once, but optimized function IR may be retained
-  per TU for IPO. This is intentional and bounded by the TU, not the whole
-  program.
-
-## 9. Optimizer
-
-`opt` (`src/opt/opt.h`, `src/opt/ir.h`) implements `CGTarget`. The pass set and
-ordering are modelled on MIR (`mir-gen.c`) — that pipeline is proven, well
-understood, and a good fit for the "70% of -O2" target. The one cfree
-addition is cross-function inlining, which MIR does not have.
-
-IR shape: block-based SSA. Functions are lists of basic blocks; blocks have
-`Phi`s at the top; instructions reference values by SSA id. `Func` also owns
-first-class frame-slot and parameter tables so `-O0` frame residency,
-parameter ingress, mem2reg promotion, and debug locations all refer to the
-same objects. The op set is small (integer constants, exact byte constants,
-mem ops, aggregate ops, bitfield ops, explicit integer and floating-point
-arith, compares, conversions, GEP, calls, terminators, an opaque `ASM_BLOCK`,
-plus `IR_VA_*` and `IR_SETJMP`/`IR_LONGJMP`). `Inst` stays compact; ordinary
-instructions define one `Val`, while multi-result instructions carry
-`defs[0..ndefs)`. Complex per-op facts live in arena-owned typed aux structs
-(`IRCallAux`, `IRAggregateAux`, `IRBitFieldAux`, `IRGepAux`, `IRAsmAux`,
-`IRPhiAux`, `IRCasAux`). This keeps calls, aggregate copies, asm, CAS
-multi-results, and ABI metadata inspectable by passes without turning every
-instruction into a large union.
-
-The IR is flat-CFG: structured-scope ops on `CGTarget` (§5.1) are flattened by
-`opt_cgtarget`'s recorder into ordinary labels, branches, and basic blocks. WASM
-lowering at -O2 therefore needs to reconstruct structure (relooper) before
-emitting. At -O0/-O1 there is no `opt_cgtarget` wrapper and CG drives the WASM
-backend directly, producing structured output by construction.
-
-`IR_SETJMP` is a control barrier: opt does not inline across it, does not
-hoist through it, and does not GVN-merge values defined on either side.
-`IR_LONGJMP` has no successors (control does not return). The library setjmp
-symbol used on real arches is recognized by name and gets the same treatment
-when it appears as the callee of an `IR_CALL`.
-
-**No UB-exploiting passes.** Rules in opt may not assume that a UB-triggering
-operation (signed overflow, shift-by-≥-width, division by zero, null deref)
-is unreachable. WASM traps deterministically on the first three and faults on
-the fourth — the program terminates rather than time-traveling. Real-target
-behavior is also more predictable this way. The "70% of -O2" goal is
-achievable without these rules. `Inst.flags` is general-purpose; no specific
-bit allocations are reserved. If a non-UB-exploiting pass that benefits from
-operation-level annotations arrives later, the path is to thread a flags
-argument through `CGTarget.binop` and into `IR_*` then — not before.
-
-### 9.1 Lifecycle
-
-- `func_begin` allocates a fresh `Func` IR container in the per-TU IR arena.
-- `alloc_reg(class, type)` returns a fresh virtual `Reg` whose mapping to a
-  `Val` is recorded; `free_reg` is a hint and ignored.
-- `frame_slot` and `param` populate `Func.frame_slots` and `Func.params`.
-  Parameter ABI incoming parts are visible to later promotion, debug, and
-  replay.
-- Every other emit call appends one SSA `Inst` to the current basic block.
-  Each `Inst` carries the `SrcLoc` set by the most recent `CGTarget.set_loc`.
-  `call(CGCallDesc)`, `atomic_cas`, and ABI split returns use the multi-result
-  `defs` convention.
-- `func_end` runs the **intra-procedural** pipeline (§9.2) and stores the
-  optimized `Func`. **No lowering yet.**
-- `cgtarget_finalize` runs the **inter-procedural** pipeline (§9.3) over all
-  buffered functions, then for each function runs the **lowering** pipeline
-  (§9.4) which drives the wrapped target CGTarget via `CGTarget.set_loc` +
-  emit-side calls.
-
-The driver therefore looks like:
-
-```c
-parse_c(c, pp, decls, cg);
-cgtarget_finalize(target); /* no-op for plain CGTarget; runs IPO+lower for opt */
-emit_elf(c, ob, w);
-```
+The toy frontend exists to exercise and test the public CG API independently of
+C language semantics.
-At `-O0` the wrapper is not used and the target CGTarget is driven directly -during parse, with no function IR retention. `-O1` may use only local -lowering/target peepholes and remains function-at-a-time. `-O2` uses -`opt_cgtarget` and may retain IR for all functions in the TU. +## Link and Run Data Flow -Memory cost at `-O2`: the IR for every function in a TU is held in the per-TU -IR arena until `cgtarget_finalize`. Per-pass scratch lives in `Arena scratch`, -not in the IR arena. - -### 9.2 Intra-procedural pipeline (per `Func`, on `func_end` at `-O2`) +### File link ``` -build_cfg -block_cloning (hot path duplication; skipped if it would block addr_xform) -build_ssa (incl. promotion of non-address-taken FrameSlots — - mem2reg is folded in, not a separate pass) -addr_xform (fold GEP-equivalent address insns into uses) -gvn (incl. constprop, redundant-load elimination) -copy_prop (incl. redundant-extension elimination) -dse (dead store elimination) -ssa_dce -build_loop_tree + licm -pressure_relief -make_conventional_ssa + ssa_combine + undo_ssa -jump_opt +objects / object bytes / archives / DSO stubs + -> cfree_link_exe() + -> Linker + -> object/archive readers + -> symbol resolution + -> layout + -> relocation + -> executable writer ``` -### 9.3 Inter-procedural pipeline (over all `Func`s, on `cgtarget_finalize`) +The linker accepts already-built `CfreeObjBuilder` values, encoded object +bytes, archives, and dynamic library inputs described by public API options. +It owns archive member selection, symbol resolution, section and segment +layout, relocation, build-id/image-id handling, and final image emission. +`cfree_link_shared()` has a public option surface, but currently reports that +shared-library codegen is not supported. 
-Inlining doesn't pay off without a follow-up: the new opportunities (callee -arguments that are now constants, branches in the callee that are now dead, -redundant ops shared across the caller/callee boundary, callee bodies that -landed inside a caller loop) only get realised by re-running intra-procedural -passes on the modified caller. +### JIT run and debug ``` -opt_inline (call-graph bottom-up; SCCs skipped for v1) -for each dirty caller: - opt_cleanup (subset re-run: gvn, copy_prop, ssa_dce, jump_opt, - licm if loops, addr_xform if uses remain) +source/object inputs + -> compile/link to LinkImage + -> cfree_link_jit() + -> executable-memory host vtable + -> CfreeJit / CfreeJitSession + -> run or dbg ``` -Iteration (`inline → cleanup → inline → ...`) is bounded by `-finline-iters=N` -(default 1, hard cap enforced by opt_cgtarget). Tuning is benchmark-driven. +The JIT path shares the same compile, object, symbol, and relocation machinery +as file output. Mapping executable memory is delegated to the host through +`CfreeEnv`; libcfree enforces the image layout and relocation model. -### 9.4 Lowering pipeline (per `Func`, after IPO, drives target CGTarget) +`driver/run.c` invokes an entry point in-process. `driver/dbg.c` builds on JIT +sessions and `src/dbg/` for breakpoints, stepping, register display, and memory +inspection. -``` -machinize (target ABI lowering, 2-op forms, call lowering) -build_loop_tree (-O1+, used by RA) -coalesce (-O2, move-related) -live_info -regalloc (linear scan; live-range splitting at -O2) -combine (-O1+, code selection: merge dependent insns) -dce (-O1+, post-RA) -opt_emit (prolog/epilog; insn split; drive target CGTarget) -``` +### Emulation -### 9.5 Inline asm - -`ASM_BLOCK` is opaque: passes treat it as reading its input operands, writing -its output operands and clobbers, and not commuting with surrounding memory -ops. Inline asm is therefore safe across optimization without per-asm -modelling. - -## 10. 
-
-### 9.5 Inline asm
-
-`ASM_BLOCK` is opaque: passes treat it as reading its input operands, writing
-its output operands and clobbers, and not commuting with surrounding memory
-ops. Inline asm is therefore safe across optimization without per-asm
-modelling.
-
-## 10. Inline asm
-
-Two callers exercise the asm machinery:
-
-- Standalone `.s`: tokens → `parse_asm` → `MCEmitter.emit_bytes`/
-  `emit_reloc_at`/`emit_label_ref` → `ObjBuilder`. Bypasses cg entirely;
-  operands are literal registers, immediates, labels, and symbols from the asm
-  syntax itself.
-  Standalone `.s` does not go through `opt_cgtarget`.
-- Inline `asm("...": outs : ins : clobbers)` inside C: invoked via
-  `cg_inline_asm`. Flow:
-
-  1. Parser parses constraint list and template; evaluates each input/output
-     expression so inputs are `SValue`s on the CG stack and each output binds
-     an lvalue.
-  2. cg pops inputs (in declaration order), packs them into an `Operand[]`,
-     and calls `CGTarget.asm_block(tmpl, outs, ins, clobbers)`.
-  3. The arch implementation does **constraint binding** (`r`, `m`, `i`,
-     `=&r`, matching constraints, ...), then walks the template and assembles
-     each instruction. Under `opt_cgtarget` this is recorded as one `IR_ASM_BLOCK`
-     and replayed on the target arch at lowering time, after RA has assigned
-     the bound virtuals to physicals.
-  4. arch fills `out_ops[]` with the location holding each result; cg pushes
-     those back as new SValues.
-
-The asm parser is shared between the standalone path (writing directly to
-`MCEmitter`) and the inline path (used as a template walker inside
-`CGTarget.asm_block`). Constraint binding is per-arch.
-
-`"memory"` clobber is conservative: cg flushes all live stack-resident values
-to memory before the block and reloads after. This is suboptimal but
-correct.
-
-Asm syntax (decided, single supported flavour per arch):
-
-- x86 (32 + 64): AT&T. Same parser serves both inline asm and standalone
-  `.s`. Matches GCC inline-asm convention.
-- ARM (32 + 64): GNU `as` ("unified") syntax.
-- RISC-V (32 + 64): GNU `as` syntax.
-- WASM: WAT (text format).
-
-Open: full GCC-syntax constraint coverage (early-clobber, matching `0`,
-multi-alternative). v1 covers `r`, `m`, `i`, `a`, `=r`, `+r`, `=m`, `=&r`,
-matching constraints. The remainder is deferred.
-
-## 11. DWARF debug info
-
-Debug info lives in `src/debug/` and is owned by a single `Debug` object that
-collects events during compilation and emits `.debug_*` sections at the end
-of the TU.
-
-**Inputs (called during compilation):**
-
-| Producer | Calls |
-|---|---|
-| Driver | `debug_file(source_file_id)` to populate the DWARF file table from `SourceManager`. |
-| CG | `debug_func_begin/end`, `debug_scope_begin/end`, `debug_param`, `debug_local`. cg holds an optional `Debug*` (NULL when `-g` is off). |
-| MCEmitter (or opt's lowering pass) | `debug_line` per emitted instruction, sourced from the `SrcLoc` set by `CGTarget.set_loc`/`MCEmitter.set_loc`; `debug_func_pc_range` after each function is laid out. |
-| opt at `-O2` | `debug_loclist_*` when a variable's location changes across the function. The `SrcLoc` propagates through opt because every recorded `Inst` carries it. |
-
-**Outputs:** `.debug_info`, `.debug_abbrev`, `.debug_line`, `.debug_str`,
-`.debug_aranges`, `.debug_rnglists`, `.debug_loclists` — written into the
-same `ObjBuilder` when `debug_emit` is called. `debug_emit` runs after all
-code sections are finalized but before file emitters consume the builder.
-
-**Variable locations:** at `-O0`, all locals live at stable frame offsets and
-`DebugVarLoc` is `DVL_FRAME`; this gives full debuggability for free. With
-`opt`, the lowering pass produces `DVL_LOCLIST` entries describing where a
-variable lives across PC ranges. v1 may downgrade opt'd debug info to
-function-level only (start/end PC, no locals); refining to per-variable
-location lists is a follow-up but the interface already accommodates it.
-
-**Type DIEs:** generated on demand from the `Type*` reaching `debug_local` /
-`debug_param`, with sizes, alignments, and member offsets supplied by
-`TargetABI`. Interned by `Type*` identity (which is already pointer-equal for
-equal types thanks to `Pool global`).
-
-## 12. Cross-cutting decisions
-
-- **Interning is global**, in `Pool global`. `Sym` (32-bit string id) is the
-  currency for spellings and lookup keys, not symbol identity. Symbol table
-  identity is object-scoped (`ObjSymId`, §5.3) until the linker resolves
-  definitions. C tag identity is scoped `TagId`, not `Sym`, so equal tag
-  spellings in different scopes remain distinct. Equal types are pointer-equal
-  after `pool_type` (same applies to strings: pool_intern returns the canonical
-  id). On WASM, this `Type*` identity is also the source of truth for
-  `call_indirect` type-index assignment.
-- **Source identity is centralized.** `SrcLoc.file_id` belongs to
-  `SourceManager`, not to the lexer, preprocessor, diagnostics, or debug
-  emitter. Macro expansion and include edges are recorded once and reused by
-  diagnostics, DWARF, and dependency generation.
-- **Locals and parameters always start frame-resident.** `cg_local` and
-  `cg_param` allocate stable `FrameSlot`s through `CGTarget.frame_slot` and
-  `CGTarget.param`. Promotion to virtual registers (and to WASM-locals on
-  that target) happens *inside* SSA construction: `build_ssa` (§9.2) promotes
-  any slot whose `FrameSlotFlag` never had `FSF_ADDR_TAKEN` set. Address-
-  taken slots remain as memory ops and are reasoned about through `MemAccess`
-  alias roots. There is no separate mem2reg pass — SSA construction already
-  has to decide which `FrameSlot` accesses become Phi chains vs which stay
-  loads/stores, and a second pass would re-walk the same decisions. At -O0
-  every slot stays on the frame, which is the same shape `Debug` wants for
-  `DVL_FRAME` (§11) — full debuggability for free, no parser pre-scan needed.
-- **Function-pointer ABI is a linker concern.** A function symbol's address
-  taken via `&f` lowers to a normal `ObjSymId`-relative `Operand`.
-  ELF/COFF/Mach-O resolve this directly. WASM file emitters and the JIT linker
-  walk function-address relocations (`R_WASM_FUNCIDX` / `R_WASM_TABLEIDX`) while
-  building the shared `LinkImage` and assign indirect-function-table slots; the
-  slot index is the pointer's bit pattern. CG and `CGTarget` are unaware.
-- **Sections are chunked.** A `Section.bytes` is a linked list of fixed-size
-  chunks. Append is O(1). Backward patching uses a 32-bit flat offset
-  computed at finalize time, so forward fixups don't depend on chunk
-  boundaries.
-- **Error model is `setjmp`/`longjmp`.** See §7.
-- **Single-pass parser+CG.** No separate AST. The optimizer reconstructs an
-  IR by recording CGTarget calls; this is technically two-pass *within a function*
-  but the source is read once.
-- **Self-hosting constraint.** Anything in `src/` must be writable in C11
-  freestanding (with the runtime in `include/`/`lib/`). No GNU extensions, no
-  libc beyond what cfree itself ships. Bootstrap is hex0-seed → small subset
-  → full cfree; details TBD.
-
-## 13. Build composition
-
-The driver-facing API is layered (`src/driver/pipeline.h`). Most consumers
-should not hand-compose the pipeline; they should call one of:
-
-- `cfree_compile_obj(c, opts, input, &ob)` — one TU → in-memory `ObjBuilder*`
-  for chaining into the linker.
-- `cfree_compile_obj_emit(c, opts, input, writer)` — one TU → encoded `.o`
-  bytes via the caller's `Writer*` (cc -c).
-- `cfree_link_exe(c, link_opts, writer)` — link → executable bytes.
-- `cfree_link_jit(c, link_opts, &jit)` — link → owning `CfreeJit*`.
-- `cfree_run(opts)` — convenience composition for the multi-input case.
-
-Data contracts at each boundary:
-
-- `compile_obj → link`: `ObjBuilder*` is the cross-API currency. The
-  returned builder is finalized; do not write further. Lifetime is tied to
-  the `Compiler`; it must remain alive until link is done.
-- `compile_obj_emit → file`: `Writer*`. The `ObjBuilder` is consumed and
-  released inside the call.
On nonzero return the Writer may contain - partial output and should not be consumed. -- `link → exe`: `Writer*`. No path appears in the core API. Same partial- - output caveat on nonzero return. -- `link → jit`: `CfreeJit*` owns its `LinkImage` and mapped pages; lookups - are by `Sym` (interned name) — `ObjSymId` never escapes. - -Each layered function (`cfree_compile_obj`, `cfree_compile_obj_emit`, -`cfree_link_exe`, `cfree_link_jit`) saves and restores `Compiler.panic` -around its own `setjmp`, so they are safely callable from inside another -active panic boundary (for example from `cfree_run`). Library resolution -(`-lfoo` against `-L` paths) is the CLI driver's job; archives reaching -`CfreeOptions` must already be concrete paths. - -Path-shaped helpers (`cfree -c file.c -o file.o`, `ld a.o b.o`, etc.) live -in driver-level adapters. They call `c->env->file_io->read_all` to obtain -byte buffers, then feed the byte/Writer APIs above. The freestanding core -never takes paths. - -The internal one-TU sequence used by `cfree_compile_obj` looks like: - -```c -ObjBuilder* ob = obj_new(c); -Pp* pp = pp_new(c); /* reads c->env->file_io */ -DeclTable* decls = decl_new(c, ob); -MCEmitter* mc = mc_new(c, ob); -CGTarget* a = cgtarget_new(c, ob, mc); -if (opt_level >= 1) a = opt_cgtarget_new(c, a, opt_level); -Debug* d = dbg ? debug_new(c, ob) : NULL; -CG* g = cg_new(c, a, d); - -pp_push_input(pp, lex_open_mem(c, name, src, len)); /* borrows src */ -parse_c(c, pp, decls, g); - -cgtarget_finalize(a); /* IPO + lowering at -O2; no-op otherwise */ -if (d) debug_emit(d); -obj_finalize(ob); +``` +guest ELF bytes + -> emu ELF loader + -> decode/lift guest basic blocks + -> CGTarget or opt_cgtarget + -> JIT image + -> emu runtime ``` -Order is load-bearing: `cgtarget_finalize` flushes lowered code, `debug_emit` -appends `.debug_*` sections, `obj_finalize` freezes the read-side view, and -only then may file emitters or the linker consume the builder. 
-
-Each subsystem `_new` registers a cleanup with `compiler_defer` and the
-matching `_free` pops it via `compiler_undefer` (§7), so a panic anywhere
-in the sequence above unwinds correctly through `compiler_run_cleanups`.
-
-## 14. Open questions
-
-- WASM is structurally different from the register-shaped CGTarget (stack VM,
-  no ELF-style relocations). The `Operand`-driven CGTarget will lower verbosely
-  (every `binop` becomes `local.get; local.get; iN.add; local.set`); a
-  follow-up peephole pass for stack-shape lowering will reclaim most of the
-  bloat. Worth prototyping early to validate the abstractions.
-- Bootstrap subset definition: which features must the seed compiler accept?
-- Debug-info quality at `-O2`: minimum acceptable v1 is function-level
-  (low_pc/high_pc + parameter list at entry); per-variable location lists
-  for opt'd locals are a follow-up but the `Debug` interface admits them.
-- WASM relooper at -O2: choosing between Stackifier-style (preserve flat CFG
-  with relooped wrappers) and Relooper-style (reconstruct nested scopes).
-  Affects code size and opt's freedom to introduce irreducible CFGs.
-- Full VLA support beyond `__builtin_alloca`: deferred for v1
-  (`__STDC_NO_VLA__=1`). The `IR_ALLOCA`/`CGTarget.alloca_` interface accommodates
-  it when the parser is extended.
-
-## 15. Safety model (WASM target)
-
-cfree's WASM backend inherits the WebAssembly sandbox; the goal here is to be
-explicit about what that does and does not buy.
-
-**Checked at runtime:**
-
-- **Linear-memory bounds.** Every load and store traps on out-of-bounds.
-- **Control-flow integrity for direct branches.** Structured `block`/`loop`/
-  `if` mean a `br N` can only target a lexically enclosing scope. The
-  structured `CGTarget` ops (§5.1) are the source of this — flat goto and
-  `switch` fallthrough route through the relooper at -O2 and through the
-  WASM CGTarget's structural fallback at -O0/-O1.
-- **CFI for indirect calls.** `call_indirect` traps on signature mismatch.
-  The WASM type index is keyed off interned `Type*` identity (§12), so equal
-  C function types produce a single WASM type id and a real (not vacuous)
-  type check.
-- **No native code injection.** WASM has no `mprotect`/JIT-into-data path
-  exposed to the program; cfree's own JIT linker uses host APIs outside the
-  sandbox.
-- **`setjmp`/`longjmp`** lower to WASM exception handling; a `longjmp` cannot
-  smash the host stack or skip past a structured-control-flow boundary it
-  did not originate inside.
-
-**NOT checked:**
-
-- **Pointer provenance.** Pointers are `i32` indices into linear memory.
-  `(int*)0xdeadbeef` is a valid bit pattern; the only guard is the bounds
-  check on the eventual access. Use-after-free, type confusion, and
-  intra-heap buffer overflow that stays inside linear memory all remain
-  exploitable — exactly as on a real target.
-- **Integer/UB traps as a safety net.** Signed overflow, shift-by-≥-width,
-  and division-by-zero trap *deterministically* on WASM, but `opt` is not
-  permitted to assume they're unreachable (§9). They terminate the program;
-  they are not a substitute for input validation.
-- **Stack exhaustion** beyond the configured WASM stack limit: traps, but
-  recovery requires host-side restart.
-
-In short: WASM gives cfree-compiled programs **memory-isolation** safety
-(can't escape linear memory) and **control-flow-integrity** safety (can't
-forge a return address or call a wrong-typed function), but not
-**type-system** safety on pointers within linear memory. The compiler does
-not pretend otherwise.
+The emulator is a user-mode ELF runner. It translates guest basic blocks into
+the same backend/JIT infrastructure used by native JIT compilation. The public
+`CfreeEmuOptions.optimize` API currently reserves level `2`; implemented use is
+through the available direct or optimizer-backed translation paths described in
+`include/cfree.h` and `doc/EMU.md`.
+
+## Object and Symbol Model
+
+`CfreeSym` is an interned spelling. It is suitable for identifiers, section
+names, symbol names, and lookup keys, but it is not itself a definition.
+
+`CfreeCgSym` is the public CG handle for a symbol inside one generated object.
+Internally, object builders use object-scoped symbol ids so local symbols from
+different objects do not collide. Linker resolution builds a separate
+resolved-symbol table over all input objects and matches externally visible
+definitions by name, binding, and object-format rules.
+
+`ObjBuilder` is the canonical in-memory object representation during
+compilation and assembly. Object writers, the linker, object inspection, debug
+emission, and JIT image construction consume this model rather than duplicating
+section/symbol/relocation storage.
+
+## State and Ownership
+
+The host supplies storage and side effects through `CfreeEnv`: heap,
+diagnostics, file I/O, executable memory, debugger OS hooks, JIT TLS hooks, and
+time. Public APIs receive explicit options and handles; internal subsystems
+hang state off `CfreeCompiler`, `CfreePipeline`, `CfreeCg`, `ObjBuilder`,
+`Linker`, `CfreeJit`, `CfreeJitSession`, `CfreeEmu`, or frontend-owned context
+structures.
+
+Compile inputs are byte buffers owned by the caller and must outlive the call.
+Writers are host-owned. Builders returned by `cfree_compile_obj` are owned by
+the compiler and must remain alive until consumers finish with them. Encoded
+object bytes, archive bytes, and DSO bytes are borrowed by link calls for the
+duration of the call unless a specific API says otherwise.
+
+## Current Optimization Contract
+
+- `opt_level == 0`: direct code generation into the selected backend.
+- `opt_level == 1`: implemented optimizer-backed path. It records CGTarget
+  operations as IR, performs the implemented backend-prep and local cleanup
+  pipeline, allocates registers, combines, removes dead code, and emits through
+  the wrapped real target.
+- `opt_level == 2`: not yet implemented as a full optimization level. Public
+  option fields and some internal pass declarations reserve this level, but the
+  current design should treat `-O2` as future work rather than a dependable
+  behavior contract.