commit cbeab9e6eaea110d2dc9f1a3263f8db8587a46d2
parent 0dbab6ba4f57c50e1ff6cd5a36e59dc21bd2dfdb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Thu, 14 May 2026 14:47:56 -0700
Rewrite design document
Diffstat:
M doc/DESIGN.md | 1123 ++++++++++++++++--------------------------------------------------------------
1 file changed, 222 insertions(+), 901 deletions(-)
diff --git a/doc/DESIGN.md b/doc/DESIGN.md
@@ -1,931 +1,252 @@
-# cfree design
+# cfree Design
+
+This document describes the current implementation structure of cfree. It is
+not a roadmap and it does not describe target surfaces that are not wired into
+the tree today.
+
+cfree is organized around a small public `libcfree` API, with CLI tools and
+language frontends as API consumers. The library owns compilation, object
+construction, linking, JIT mapping, debugging support, and emulation internals.
+The driver owns command-line policy and host I/O.
+
+## Public Boundary
+
+The public headers are:
+
+- `include/cfree.h`: compiler lifecycle, targets, compile/link/JIT/debug/emu
+ APIs, host vtables, object inspection, archive and disassembly helpers.
+- `include/cfree/cg.h`: the public code-generation API used by language
+ frontends.
+- `include/cfree/frontend.h`: frontend support APIs that do not expose
+ `CfreeCompiler` internals, such as arenas, source registration, symbols, and
+ frontend panic boundaries.
+- `include/cfree/hashmap.h`: public helper used by frontends.
+
+`driver/` is built against the public include tree only. It must not include
+private `src/` headers. Its job is to parse tool options, load and release
+files, provide host vtables in `CfreeEnv`, register frontends on each compiler,
+and call public `cfree_*` entry points.
+
+`lang/` is also outside `src/` and is an API consumer. `lang/c` and `lang/toy`
+use `<cfree.h>`, `<cfree/cg.h>`, and frontend support headers, plus their own
+private headers under `lang/...`. They do not reach into `src/` implementation
+headers.
+
+`src/` is `libcfree` implementation. Internal modules may share private
+headers, but their public surface is exposed only through `include/`.
+
+## Layering
+
+From outside to inside:
+
+1. **Driver (`driver/`)**
+ Implements the multi-call `cfree` binary: `cc`, `as`, `ld`, `ar`,
+ `objdump`, `run`, `dbg`, and `emu`. It translates command-line flags into
+ public API options, supplies heap/diagnostic/file/executable-memory vtables,
+ and resolves path-shaped inputs into byte buffers and writers.
+
+2. **Language frontends (`lang/`)**
+ Registered per `CfreeCompiler` with `cfree_register_frontend`.
+ `lang/c` preprocesses, parses, type-checks, manages C declarations, and
+ drives the public CG API. `lang/toy` is a small frontend used to exercise the
+ same CG API. Frontends produce object contents through `CfreeCg`; they do
+ not own object formats or linker policy.
+
+3. **Public API glue (`src/api/`)**
+ Implements `CfreeCompiler`/`CfreePipeline` lifecycle, compile and link entry
+ points, writer helpers, object inspection, archive APIs, disassembly APIs,
+ frontend support, and the public CG API. This layer is the composition point
+ between public handles and internal subsystems.
+
+4. **Core services (`src/core/`)**
+ Provides allocation helpers, arenas, vectors, buffers, string buffers,
+ interned symbols, source-file tracking, diagnostics, hashing, and common
+ utilities. Most state is rooted in a `CfreeCompiler` or in explicit
+ subsystem contexts passed through the call graph.
+
+5. **Frontend-neutral compilation internals**
+ `src/abi/` owns target ABI layout and call classification. `src/arch/`
+ owns target registration, internal `CGTarget` and `MCEmitter` creation,
+ instruction emission, register files, register allocation support,
+ disassembly, and per-architecture fixups. `src/asm/` owns the standalone
+ assembler and shared assembler helpers.
+
+6. **Optimizer (`src/opt/`)**
+ Implements an internal `CGTarget` wrapper. At `-O0`, code generation drives
+ the real target directly. At `-O1`, the wrapper records functions as IR,
+ runs the implemented lowering/register-allocation/combine/DCE path, and
+ replays into the wrapped target. `-O2` is reserved in API shape and comments
+ but is not currently implemented as a full optimization pipeline.
+
+7. **Object, debug, link, JIT, dbg, and emu**
+ `src/obj/` owns the in-memory object model plus ELF and Mach-O reading and
+ writing paths. `src/debug/` owns DWARF production and reading. `src/link/`
+ resolves symbols, lays out images, applies relocations, emits executables,
+ and builds JIT images. The public shared-library entry point is present, but
+ shared-library codegen is not yet supported. `src/dbg/` layers breakpoints,
+ stepping, memory access, and displaced execution over JIT sessions.
+ `src/emu/` loads guest ELF images, decodes/lifts guest blocks, and runs them
+ through the JIT-backed runtime.
+
+8. **Runtime (`rt/`)**
+ Provides freestanding headers and compiler-rt/libc-style support code used
+ by generated programs and self-hosting configurations. It is separate from
+ the compiler implementation library.
+
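The optimizer layer above describes a wrapper that either steps aside (`-O0`) or records operations and replays them into the wrapped target (`-O1`). A minimal sketch of that record-and-replay shape follows. Every name here (`Target`, `RealTarget`, `OptTarget`, `select_target`, `OP_NOP`) is hypothetical, and the single `emit_op` hook stands in for the much larger internal `CGTarget` vtable. Dropping no-ops at `func_end` is only a placeholder for the real lowering/register-allocation/combine/DCE path.

```c
#include <assert.h>
#include <stddef.h>

enum { OP_NOP = 0, MAX_OPS = 16 };

/* Hypothetical, heavily reduced stand-in for an internal target vtable. */
typedef struct Target Target;
struct Target {
    void (*func_begin)(Target *self);
    void (*emit_op)(Target *self, int op);
    void (*func_end)(Target *self);
};

/* "Real" target: just remembers the ops it was asked to emit. */
typedef struct {
    Target base;            /* must stay first so Target* casts back */
    int emitted[MAX_OPS];
    size_t n;
} RealTarget;

static void real_begin(Target *t) { (void)t; }
static void real_end(Target *t) { (void)t; }
static void real_op(Target *t, int op) {
    RealTarget *r = (RealTarget *)t;
    assert(r->n < MAX_OPS);
    r->emitted[r->n++] = op;
}

void real_target_init(RealTarget *r) {
    r->base.func_begin = real_begin;
    r->base.emit_op = real_op;
    r->base.func_end = real_end;
    r->n = 0;
}

/* Wrapper target: records ops per function, then replays at func_end. */
typedef struct {
    Target base;            /* must stay first so Target* casts back */
    Target *wrapped;
    int ir[MAX_OPS];
    size_t n;
} OptTarget;

static void opt_begin(Target *t) { ((OptTarget *)t)->n = 0; }
static void opt_op(Target *t, int op) {
    OptTarget *o = (OptTarget *)t;
    assert(o->n < MAX_OPS);
    o->ir[o->n++] = op;     /* record only; nothing reaches the real target yet */
}
static void opt_end(Target *t) {
    OptTarget *o = (OptTarget *)t;
    o->wrapped->func_begin(o->wrapped);
    for (size_t i = 0; i < o->n; i++)
        if (o->ir[i] != OP_NOP)          /* placeholder "pass": drop no-ops */
            o->wrapped->emit_op(o->wrapped, o->ir[i]);
    o->wrapped->func_end(o->wrapped);
}

/* Level 0 drives the real target directly; any other level goes through the wrapper. */
Target *select_target(RealTarget *real, OptTarget *opt, int opt_level) {
    if (opt_level == 0)
        return &real->base;
    opt->base.func_begin = opt_begin;
    opt->base.emit_op = opt_op;
    opt->base.func_end = opt_end;
    opt->wrapped = &real->base;
    opt->n = 0;
    return &opt->base;
}
```

Because both paths expose the same vtable, the caller that drives code generation does not need to know which one it was handed; only the point at which operations reach the real target differs.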
+## Compile Data Flow
+
+### C source to object
-Architecture of the cfree compiler, assembler, and linker. Companion to
-`README.md`. Scope: how the modules fit together and what their contracts are.
-Not a tutorial; not implementation notes.
-
-## 1. Goals
-
-- Conforming C11 freestanding compiler, written in C11.
-- Single multi-call binary: `cc`, `cpp`, `as`, `ld`, `ar`, `objdump`, `dbg`.
-- Targets: x86 (32/64), ARM (32/64), RISC-V (32/64), WASM.
-- Output: object files (ELF, COFF, Mach-O, WASM) and executables.
-- In-memory JIT path sharing the entire pipeline with the file path.
-- Lightweight optimizer at roughly 70% of GCC/Clang `-O2` on integer code.
-- Self-hosting. Bootstraps from a hex0-seed.
-- Streaming wherever feasible. Direct lowering is function-at-a-time; `-O2`
- may retain per-TU IR for inter-procedural optimization.
+```
+driver cc
+ -> CfreeEnv + CfreeCompiler
+ -> cfree_compile_obj*()
+ -> registered C frontend
+ -> lexer -> preprocessor -> parser/type/declaration logic
+ -> public CfreeCg API
+ -> internal CGTarget
+ -> MCEmitter
+ -> ObjBuilder
+ -> object writer or linker input
+```
-This design keeps the full project goals visible, but the interface contracts
-below are currently tightened around the compiler, object emission, linker,
-and JIT path. Standalone tool-specific surfaces (`ar`, `objdump`, `dbg`,
-packaging, bootstrap) are allowed for by the shared model but are not the focus
-of this pass.
+The driver loads the source bytes and chooses `CfreeCompileOptions`. The
+pipeline entry in `src/api/pipeline.c` creates an `ObjBuilder` and dispatches
+to the frontend registered for `input->lang`.
-## 2. Non-goals (v1)
+The C frontend registers source files and include edges through frontend
+support APIs, then preprocesses and parses the token stream. It records C
+declaration semantics in its own `lang/c` tables and emits functions and data
+through `CfreeCg`.
-- C++, Objective-C.
-- C11 variable-length arrays and variably-modified types (`__STDC_NO_VLA__`).
-- Cross-TU LTO, PGO, autovectorization beyond peephole-level idiom recognition.
-- Thread-safe parallel compile inside one process.
-- Sanitizers, coverage instrumentation.
-- `_Generic` corner cases that require multi-pass disambiguation are best-effort.
+`CfreeCg` maps public CG types, symbols, stack operations, calls, branches,
+data definitions, and source locations onto internal target operations. The
+selected target either receives those operations directly (`-O0`) or through
+the optimizer wrapper (`-O1`). Final machine bytes, labels, relocations, and
+section contents are written into `ObjBuilder` through `MCEmitter`.
-## 3. Layout
+### Assembly source to object
```
-include/ public C11 headers shipped with the compiler (the runtime)
-lib/ compiler-rt (the runtime)
-src/
- core/ allocators, intern pool, source manager, diagnostics, buffers, target
- lex/ shared tokenizer (C and asm)
- pp/ C preprocessor
- type/ target-neutral C type interning and compatibility
- abi/ target ABI type layout and call classification
- decl/ C declaration, linkage, storage-duration, and initializer model
- parse/ C11 parser, asm parser
- cg/ single-pass value-stack code generator
- arch/ CGTarget + MCEmitter interfaces and per-arch backends
- opt/ lightweight SSA IR + passes; presents itself as a CGTarget
- obj/ in-memory object model + per-format file writers and readers
- debug/ DWARF info collection + emission
- link/ symbol resolution, relocation, exe writer, JIT linker
- driver/ multi-call dispatch and command-line front-ends
-test/
-doc/
+driver as or cfree_compile_obj*(CFREE_LANG_ASM)
+ -> asm lexer/parser
+ -> MCEmitter
+ -> ObjBuilder
+ -> object writer or linker input
```
-The compiler source lives in `src/`. `include/` and `lib/` are the runtime that
-ships *with* the compiler (the freestanding stdlib and compiler-rt) and are
-not built by the compiler-development tree.
+Assembly bypasses `CfreeCg` because it is already target-level syntax. The
+assembler uses the same object builder and machine emitter path as compiled C.
-## 4. Dataflow
+### Toy source to object
```
-.c → lex → pp → parse_c → decl + cg → CGTarget → MCEmitter → ObjBuilder ──┬──→ emit_{elf|coff|macho|wasm} → .o / exe
-.s → lex → parse_asm ───────────────→ MCEmitter ──────────────────┤
- ├──→ link_file (.o + archives → exe)
- └──→ link_jit (mmap + exec)
+driver cc input.toy
+ -> registered toy frontend
+ -> public CfreeCg API
+ -> normal backend/object path
```
-Reading order, left to right:
-
-1. `lex` produces a stream of raw tokens (idents, numbers, punctuators,
- strings). Tokens preserve exact spelling; literals carry deferred `LitId`
- handles rather than host-decoded numeric values.
-2. For C: `pp` consumes tokens, expands macros, and emits a stream of
- preprocessed tokens. For asm: tokens go straight to `parse_asm`.
-3. `parse_c` is recursive-descent over preprocessed tokens. It records C
- declaration semantics in `DeclTable` and drives `cg` for executable code.
- There is no explicit AST.
-4. `cg` maintains a value stack à la TCC. Each parser action manipulates that
- stack: pushes, loads, stores, aggregate copies, conversions, calls. At
- `-O0`, CG owns live value lifetimes, spills, reloads, and preservation
- across calls/asm; the target provides scratch registers and spill/reload
- mechanics.
-5. `CGTarget` is the typed C/IR lowering vtable. Concrete targets lower those
- operations into machine emission; the optimizer also implements `CGTarget`
- by recording the call sequence as IR per function, running
- intra-procedural passes on `func_end`, and on `cgtarget_finalize` running
- cross-function passes before replaying into the wrapped target `CGTarget`.
-6. `MCEmitter` is the machine/object emission vtable. It owns section position,
- bytes, alignment/fill, relocations at explicit offsets, machine-label
- references, and source locations for debug line emission.
-7. `ObjBuilder` is the single in-memory object representation. It accepts
- sections, bytes, symbols, and relocations on the write side, and exposes
- read accessors for file writers, the linker (file and JIT), and objdump.
-
-`parse_asm` bypasses `cg` and writes directly into `MCEmitter`; inline asm
-is a typed `CGTarget.asm_block` operation that lowers through the target's asm
-machinery. See §10.
-
-## 5. Key interfaces
-
-### 5.0 `SourceManager` (`src/core/core.h`) — source identity
-
-`SourceManager` is owned by `Compiler` and is the authority for `SrcLoc.file_id`.
-It registers real files, memory inputs, builtins, and macro-expansion pseudo
-files; maps file ids back to normalized paths and diagnostic spellings; records
-include edges; and exposes dependency iteration for `-M*` output. Lexer and
-preprocessor create source ids through it. Diagnostics, DWARF, dependency
-generation, and reproducible-build path handling read from it rather than
-inventing their own file tables.
-
-Macro-expanded tokens keep both spelling and expansion locations. Consumers
-that need user-facing diagnostics can ask for spelling locations; consumers
-that need execution/profiling/debug line attribution can ask for expansion
-locations. `Debug.debug_file` takes a source file id, not a raw path.
-
-### 5.1 `CGTarget` (`src/arch/arch.h`) — typed lowering
-
-`CGTarget` is a vtable representing "something that can accept typed C/IR
-operations for one function at a time". `cg` calls `CGTarget` after it has
-resolved an operation's operands to concrete `Operand` values (immediate,
-register, frame-relative, object-symbol-relative, indirect). Direct target
-implementations lower these operations into their `MCEmitter`; `opt` wraps a
-target `CGTarget` and records the same operations as IR before replaying them
-later.
-
-Method groups:
-
-- **Function lifecycle.** `func_begin(CGFuncDesc)`, `func_end`.
- `CGFuncDesc` carries the function `ObjSymId`, `fn_type`, inspectable
- `ABIFuncInfo`, parameter descriptors, and declaration location.
-- **Frame slots, parameters, and value lifetimes.** `frame_slot(FrameSlotDesc)`
- creates stable frame-resident storage for locals, parameters, spills, sret,
- and dynamic-allocation bookkeeping. `param(CGParamDesc)` binds a source
- parameter index to its stable slot and ABI incoming parts. `alloc_reg(class,
- type)` returns a
- physical scratch register for real targets and a fresh virtual for
- `opt_cgtarget`. CG, not the target, owns the `-O0` value stack: it uses
- `clobbers`, `spill_reg`, and `reload_reg` to preserve live values across
- register pressure, calls, and inline asm. `free_reg` releases a value-stack
- claim; `opt_cgtarget` treats it as a hint.
-- **Control flow.** `label_new`, `label_place`, `jump`, `cmp_branch` (fused
- compare-and-branch; the only conditional-branch primitive — for arbitrary
- i1 values cg synthesizes `cmp_branch(CMP_NE, val, IMM_ZERO, label)`).
-- **Structured control flow.** `scope_begin(CGScopeDesc)`, `scope_else`,
- `scope_end`, `break_to`, `continue_to`. `CGScopeDesc` carries explicit break
- and continue labels, so C `for` continues land on the increment expression
- instead of assuming the loop header. Real backends shim these onto
- `label_new`/`label_place`/`jump` (no code-size cost). The WASM backend
- consumes them natively to emit block/loop/if with structurally-bounded `br`
- targets. `goto`, computed-goto, and `switch` fallthrough still go through
- the flat label API. opt's IR is flat-CFG; at -O2 the WASM lowering pass
- reconstructs structure from the flat IR.
-- **Data movement and aggregates.** `load_imm`, `load_const`, `copy`, `load`,
- `store`, `addr_of`, `copy_bytes`, `set_bytes`, `bitfield_load`, and
- `bitfield_store`. Scalar memory operations carry `MemAccess`; aggregate and
- bitfield operations carry ABI-sized metadata so struct assignment, block
- zeroing, byval copies, and bitfield accesses remain visible to opt and
- direct backends.
-- **Arithmetic / compare / convert.** `binop` uses explicit integer and
- floating-point op families (`BO_I*`, `BO_F*`) rather than inferring behavior
- from operand type. `cmp` materializes 0/1; use `cmp_branch` when the result
- feeds a branch. `convert` is explicit by `ConvKind`.
-- **Calls / return.** `call(CGCallDesc)` and `ret(CGABIValue*)`. The parser
- type-checks `fn_type`; CG asks `TargetABI` for `ABIFuncInfo`, materializes
- `CGABIValue`/`CGABIPart` arrays for direct, indirect/byval, sret, split, and
- multi-register values, and passes that structured call/return shape to the
- target. `callee.kind == OPK_GLOBAL` is a direct call; any other kind is
- indirect. On WASM, `fn_type` selects the `call_indirect` type index —
- interned `Type*` identity is the index source of truth (§12).
-- **alloca.** `alloca(dst, size, align)` — dynamic stack allocation. Reachable
- only via `__builtin_alloca` since v1 does not parse VLAs (§2). Backend grows
- the linear-memory or native shadow stack; result pointer in `dst`.
-- **Variadics.** `va_start`, `va_arg`, `va_end`, `va_copy`. `<stdarg.h>` macros
- expand to compiler builtins which CG forwards here. Per-arch ABI: SysV
- x86-64 manages the register-save area; arm64 manages its split gp/fp areas;
- WASM walks the spilled-args memory.
-- **setjmp / longjmp.** Optional methods. Real backends leave them NULL: the
- parser lowers `<setjmp.h>`'s `setjmp` to a normal call to `__cfree_setjmp`
- (a hand-written .S in `lib/`) and opt recognizes the symbol by name as
- returns-twice (no inlining across; values defined before the call are not
- GVN-merged with values defined after). The WASM backend implements
- `setjmp_`/`longjmp_` via the exception-handling proposal — there is no
- saveable native SP, so a library-only implementation is impossible.
-- **Atomics.** `atomic_load`, `atomic_store`, `atomic_rmw`, `atomic_cas`,
- `fence`. Atomic memory operations carry both `MemAccess` and `MemOrder`.
- Backends route oversized atomics to compiler-rt; small atomics are inline.
-- **Inline asm.** `asm_block(tmpl, outs, ins, clobbers)` — per-arch
- constraint binding plus template assembly, packaged as one operation. The
- asm parser is reused as a template walker inside this call, but final bytes
- and relocations are emitted through `MCEmitter`.
-- **Source location.** `set_loc(SrcLoc)` — sticky; subsequent emit-side
- calls inherit it. `opt_cgtarget` stamps it onto each `Inst.loc`; target
- backends forward it to `MCEmitter` for `Debug.line`.
-- **End-of-TU.** `finalize`.
-
-Implementations:
-
-- Real CGTargets per arch under `src/arch/`. Their `finalize` is a no-op.
-- `opt` (`src/opt/opt.h`) returns a wrapper CGTarget that records into IR.
- Its `finalize` runs cross-function passes and lowers all buffered IR into a
- wrapped target CGTarget.
-
-### 5.2 `MCEmitter` (`src/arch/arch.h`) — machine/object emission
-
-`MCEmitter` is the low-level emission vtable shared by target backends and
-assembler input. It owns the current section, byte position, machine-label
-creation/placement, raw byte output, fill/alignment, relocations against
-`ObjSymId` at explicit offsets, label references/fixups, and sticky source
-locations used by the debug line program.
-
-`CGTarget` implementations may hide instruction selection, register
-allocation, prolog/epilog emission, and instruction encoding behind their
-typed methods, but when they finally write object contents they go through
-`MCEmitter`. `parse_asm` uses the same emitter directly because assembler
-input is already machine-level syntax.
-
-### 5.3 Symbol identity — object-first
-
-`Sym` is only an interned spelling. It is used for identifiers, section names,
-debug names, and lookup keys, but it is not a symbol table entry.
-
-`ObjSymId` is the authoritative symbol handle during compilation, assembly,
-object reading, relocation emission, debug collection, and link input. It is
-scoped to one `ObjBuilder`, so two objects can both contain a local `static
-int x` without colliding, and an object reader can preserve local labels,
-section symbols, file symbols, unnamed temporary symbols, and external
-references faithfully. Parser declaration binding creates or reuses
-`ObjSymId`s in the current builder; `cg`, `CGTarget`, `MCEmitter`, `Debug`, and `ObjBuilder`
-traffic in those handles.
-
-The linker has its own resolved-symbol table built from each input object's
-`ObjSymId`s. Externally visible definitions are matched by `Sym` name and
-binding during resolution. JIT lookup and explicit entry selection are
-therefore name-based (`Sym`), not handle-based: object symbol handles are not
-portable across builders.
-
-### 5.3.1 `DeclTable` (`src/decl/decl.h`) — C declarations
-
-`DeclTable` is the C-language declaration layer above `ObjBuilder`. The parser
-uses it for storage class, linkage, visibility, TLS, inline/weak attributes,
-tentative definitions, static locals, explicit sections, and global
-initializers. It returns `DeclId`s for parser and CG bookkeeping and owns the
-mapping from a C declaration to its object-scoped `ObjSymId`.
-
-Global initialization is a list of `InitItem`s: zero ranges, exact
-`ConstBytes`, relocatable symbol references, and fills. `DeclTable` applies C
-rules such as tentative-definition coalescing and default section selection,
-then writes concrete sections, bytes, symbols, and relocations into
-`ObjBuilder`. `ObjBuilder` remains object-format canonical storage and does not
-learn C storage-duration rules.
-
-### 5.4 `TargetABI` (`src/abi/abi.h`) — target layout authority
-
-`Type` is structural and target-neutral: kind, qualifiers, element/parameter
-types, immutable record fields, array counts, scoped tag ids, tag spellings,
-and bitfield flags/widths.
-Records are built through a mutable `TypeRecordBuilder` and committed to an
-interned immutable `Type*`. Field flags distinguish normal fields, anonymous
-fields, flexible array members, bitfields, and zero-width bitfields. `Type`
-does not own target-dependent facts such as scalar widths, record size, field
-offsets, bitfield packing, aggregate alignment, or calling-convention
-classification.
-
-Record and enum tags carry a `TagId` in addition to their `Sym` spelling.
-`Sym` is only the diagnostic/debug spelling; `TagId` is scoped declaration
-identity. This prevents two unrelated `struct S` declarations in different C
-scopes from collapsing under global type interning.
-
-`TargetABI` is the one authority for those facts. It is initialized from
-`Compiler.target` and is available as `Compiler.abi`. Its responsibilities:
-
-- Builtin scalar profiles: width/alignment/signedness of C scalar types,
- pointer size/alignment, `long double`, enum representation policy, and
- target-defined library types (`size_t`, `ptrdiff_t`, `intptr_t`,
- `uintptr_t`, `va_list`).
-- `sizeof`/`_Alignof` for every complete type.
-- Record layout: field byte offsets, bitfield storage units, bit offsets,
- final size, final alignment, and incomplete-type diagnostics.
-- Calling convention classification: direct/indirect/split aggregate
- arguments, return values, hidden sret pointers, byval copies, variadic
- register-save/spill behavior, stack slot alignment, and inspectable
- per-part placement data.
-
-Consumers must ask `TargetABI` rather than reading layout facts from `Type`.
-Parser/type checking use it for `sizeof`, `_Alignof`, field access, enum
-constant typing, and diagnostics. `cg` uses it before creating frame slots,
-before emitting aggregate/bitfield operations, and when selecting conversions.
-Calls use a hybrid model: `TargetABI` returns rich `ABIFuncInfo` data; CG turns
-that into `CGABIValue`/`CGABIPart` operands; target hooks handle only final
-instruction/OS-specific mechanics. `Debug` uses ABI data for DIE sizes, member
-locations, parameter locations, and sret/byval facts.
-
-### 5.5 `ObjBuilder` (`src/obj/obj.h`) — concrete
-
-The single in-memory object representation. There is no second implementation,
-so it is a concrete type rather than a vtable. Object, section, group, and
-symbol handles are explicit (`OBJ_SEC_NONE`, `OBJ_GROUP_NONE`,
-`OBJ_SYM_NONE`). The write API
-(`obj_section`/`obj_write`/`obj_reserve_bss`/`obj_symbol`/`obj_reloc`/
-`obj_finalize`) is what MCEmitter, CGTarget, and `.o` readers use; the read API
-(`obj_section_get`/`obj_relocs`/`obj_symbol_get`, symbol iteration with ids) is
-what file emitters, the linker, JIT, and future objdump use.
-
-`ObjBuilder` is a canonical superset model, not merely "bytes plus names".
-Sections carry both coarse compiler kind (`SEC_TEXT`, `SEC_DATA`, ...) and
-object semantics (`SSEM_PROGBITS`, `SSEM_RELA`, `SSEM_GROUP`, ...), flags,
-alignment, entry size, link/info references, and group membership. Symbols
-carry binding, kind, visibility, absolute/common/TLS state, common alignment,
-and object-scoped identity. Relocations record kind, explicit-addend versus
-in-place addend, pairing, target symbol, and addend. COMDAT/group membership
-is represented explicitly. `Writer` is a real byte sink with write, seek, tell,
-error, and close operations so file emitters do not depend on a hidden I/O
-side channel.
-
-Format-specific metadata is admitted only through typed enum fields
-(`ObjExtKind`, semantic kinds, flags) and narrowly-scoped extension values
-where a real format has no shared equivalent. Avoid opaque `void*` sidecars:
-linker, JIT, emitters, readers, and objdump must be able to inspect the
-canonical model without knowing which reader produced it.
-
-The invariant: the post-finalize state of an `ObjBuilder` is the same shape
-as what you'd get from reading a `.o` back in. So `read_elf` of a freshly
-emitted file produces an `ObjBuilder` indistinguishable from the one used to
-emit it, modulo permitted canonicalization of section ordering and string-table
-layout. Consumers (linker, objdump) don't care which path produced it.
-
-### 5.5.1 `LinkImage` (`src/link/link.h`) — resolved program image
-
-`Linker` accepts explicit inputs (`LinkInputId`) for fresh objects, object
-files, and archives. Resolution produces a `LinkImage`: a shared file/JIT data
-model containing resolved symbols (`LinkSymId`), final symbol addresses,
-segments, laid-out section placements (`LinkSectionId`), segment bytes, and
-relocation applications with concrete write locations. Undefined, duplicate,
-unsupported-relocation, and layout failures are fatal diagnostics through
-`Compiler.panic`.
-
-Executable emission and JIT mapping consume the same `LinkImage`. File writers
-(`link_emit_image_writer`) read segment bytes, section placements, final
-addresses, and relocation records from the image and write to a caller-owned
-`Writer*`. JIT (`cfree_jit_from_image`) maps fresh writable memory, copies the
-same segment bytes, applies relocation records at their `write_vaddr`
-locations, resolves allowed external symbols through `LinkExternResolver`,
-changes final permissions, and looks up exported/entry symbols by resolved
-`Sym` name. Object-local `ObjSymId` values never escape as JIT lookup handles.
-`CfreeJit` is the public owning handle; it takes ownership of the `LinkImage`
-on construction and releases both on `cfree_jit_free`.
-
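The copy-then-patch step described above can be sketched over a plain byte buffer. This is an illustration under stated assumptions, not the `LinkImage` layout: `Segment`, `RelocApply`, and `image_load` are hypothetical names, each relocation is assumed to be an already-resolved 64-bit absolute value written little-endian at its `write_vaddr`, and the real memory mapping, external-symbol resolution, and permission changes are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced records; the real image carries far more data. */
typedef struct { uint64_t vaddr; const uint8_t *bytes; size_t size; } Segment;
typedef struct { uint64_t write_vaddr; uint64_t value; } RelocApply;

/* True when [vaddr, vaddr + len) lies inside the destination region. */
static int in_range(uint64_t base, size_t dst_size, uint64_t vaddr, size_t len) {
    if (vaddr < base)
        return 0;
    uint64_t off = vaddr - base;
    return off <= dst_size && len <= dst_size - off;
}

/* Copy segment bytes into dst, then patch each relocation site.
   Returns 0 on success, -1 if anything falls outside the destination. */
int image_load(uint8_t *dst, size_t dst_size, uint64_t base,
               const Segment *segs, size_t nsegs,
               const RelocApply *relocs, size_t nrelocs) {
    for (size_t i = 0; i < nsegs; i++) {
        if (!in_range(base, dst_size, segs[i].vaddr, segs[i].size))
            return -1;
        memcpy(dst + (size_t)(segs[i].vaddr - base), segs[i].bytes, segs[i].size);
    }
    for (size_t i = 0; i < nrelocs; i++) {
        if (!in_range(base, dst_size, relocs[i].write_vaddr, 8))
            return -1;
        uint8_t *p = dst + (size_t)(relocs[i].write_vaddr - base);
        for (int b = 0; b < 8; b++)      /* explicit little-endian store */
            p[b] = (uint8_t)(relocs[i].value >> (8 * b));
    }
    return 0;
}
```

Keeping relocation records as data and applying them to a copy means the source segment bytes are never modified, so the same inputs could feed both a file writer and a JIT-style loader.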
-`link_resolve` registers the returned `LinkImage` with `compiler_defer`, so a
-panic between resolve and consumer (file emit or JIT mapping) reaps the
-image. Successful consumers either call `link_image_free` (which undefers
-and frees) or transfer ownership via `cfree_jit_from_image` (which undefers
-and keeps the image alive for the JIT's lifetime).
-
-Linker inputs are byte buffers (`link_add_obj_bytes`, `link_add_archive_bytes`)
-or already-built `ObjBuilder*` (`link_add_obj`). Path-shaped inputs are a
-driver-level concern: the driver calls `c->env->file_io->read_all`, then feeds
-the bytes APIs.
-
-**Incremental-linking forward compat.** The single-shot `link_resolve`
-implementation must not destroy or consume input-side state that a future
-incremental re-resolve would need. `LinkRelocApply` records stay as data
-(they are not burned into segment bytes destructively without preserving the
-originals); `LinkInputId → ObjBuilder*` mappings stay stable for the
-lifetime of the `Linker`; resolution is a function from inputs to a fresh
-`LinkImage`, not in-place mutation of the `Linker`. Incremental linking is
-the single most likely future addition, and the existing surface
-(`LinkInputId` stable handles, separable `LinkImage`, byte/`ObjBuilder`
-inputs) is already amenable — this discipline keeps it amenable without
-adding a speculative API.
-
-### 5.6 `MemAccess` — explicit memory semantics
-
-`MemAccess` is attached to every typed memory operation (`load`, `store`,
-atomics, and IR memory instructions). It contains:
-
-- `type`: the semantic C object type being accessed.
-- `size`: ABI byte width of the access.
-- `align`: known byte alignment; `0` means unknown.
-- `flags`: volatility, atomicity, restrict-derived noalias facts, readonly /
- writeonly knowledge, and explicit unaligned accesses.
-- `addr_space`: target address space / memory index (`0` for ordinary C
- memory; WASM may use this for multiple memories later).
-- `alias`: an alias root, one of unknown, local, global `ObjSymId`, parameter,
- heap, or string literal.
-
-`cg` derives `MemAccess` when it turns an lvalue into a memory operation:
-qualifiers supply `volatile` and `_Atomic`, `TargetABI` supplies size and
-minimum alignment, declaration binding supplies local/global/parameter roots,
-string literals supply string roots, and pointer arithmetic preserves the
-best known root until it escapes. Casts that lose provenance downgrade the
-root to `ALIAS_UNKNOWN`; `restrict` pointers create parameter roots with the
-restrict flag.
-
-Optimization rules:
-
-- Volatile memory operations are side effects. They may not be deleted,
- merged, reordered with other volatile operations, or moved across calls or
- inline asm with a memory clobber.
-- Atomic operations use both `MemAccess` and `MemOrder`; memory-order rules
- dominate ordinary alias reasoning.
-- Nonvolatile accesses with disjoint known alias roots may be reordered or
- used for redundant-load and dead-store elimination.
-- Unknown alias roots conservatively may alias any ordinary memory.
-- The metadata is a permission to optimize, not a UB oracle: opt still may
- not assume invalid programs are unreachable (§9).
-
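The reordering rules above can be read as a small predicate. The sketch below is a deliberately conservative reduction: only `ALIAS_UNKNOWN` is a name taken from the text, while `ALIAS_LOCAL`, `ALIAS_GLOBAL`, `MEM_VOLATILE`, the struct layouts, and both function names are hypothetical. Parameter, heap, and string roots, atomics, and memory ordering are omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Reduced alias-root kinds; only ALIAS_UNKNOWN is named in the design text. */
typedef enum { ALIAS_UNKNOWN, ALIAS_LOCAL, ALIAS_GLOBAL } AliasKind;

typedef struct {
    AliasKind kind;
    uint32_t id;        /* which local slot or global symbol (hypothetical) */
} AliasRoot;

enum { MEM_VOLATILE = 1 };

typedef struct {
    uint32_t size;      /* ABI byte width of the access */
    uint32_t align;     /* known alignment, 0 = unknown */
    uint32_t flags;     /* MEM_VOLATILE or 0 in this sketch */
    AliasRoot alias;
} MemAccess;

/* Unknown roots may alias anything; known roots alias only themselves. */
bool mem_may_alias(const MemAccess *a, const MemAccess *b) {
    if (a->alias.kind == ALIAS_UNKNOWN || b->alias.kind == ALIAS_UNKNOWN)
        return true;
    return a->alias.kind == b->alias.kind && a->alias.id == b->alias.id;
}

/* Conservative: refuse whenever either access is volatile, which is
   stricter than the volatile-versus-volatile rule in the text. */
bool mem_may_reorder(const MemAccess *a, const MemAccess *b) {
    if ((a->flags | b->flags) & MEM_VOLATILE)
        return false;
    return !mem_may_alias(a, b);
}
```

The predicate only ever grants permission when both roots are known and distinct, which matches the idea that the metadata is a permission to optimize rather than a source of new assumptions.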
-### 5.7 `ConstBytes` — exact literal materialization
-
-`ConstBytes` is the representation for constants whose exact target bits
-matter. It carries the semantic `Type*`, ABI representation bytes, size, and
-alignment. The bytes are produced by literal parsing plus `TargetABI`, never
-by trusting host floating-point layout. This matters for hex floats,
-rounding, `float` versus `double`, target-specific `long double`, endian
-order, and future vector constants.
-
-`CGTarget.load_imm(dst, i64)` remains a convenience for small integer
-constants. `CGTarget.load_const(dst, ConstBytes)` is the general path. Target
-backends may encode the constant as an immediate, synthesize it with
-instructions, or place it in a constant pool / `.rodata` and emit a load.
-`cg_push_const` pushes an exact constant. `cg_push_float(double, type)` exists
-only as a convenience for parser paths that have already accepted host-double
-precision loss as harmless; conforming literal parsing should prefer
-`cg_push_const`.
-
-### 5.8 Tokens and literals — spelling first, decoding later
-
-`Tok` preserves exact token spelling for diagnostics, macro stringification,
-token pasting, dependency output, and faithful preprocessing. Numeric,
-character, and string literals carry a `LitId` into the lexer's/preprocessor's
-literal table. A literal record stores kind, encoding, suffix/encoding flags,
-the exact spelling, and decoded bytes/code units only when decoding is already
-target-independent.
-
-The lexer does not choose final C literal types and does not round floating
-literals through host `double`. The parser, with `TargetABI`, performs integer
-literal type selection, floating parsing/rounding, character literal value
-selection, string literal concatenation, and construction of exact
-`ConstBytes`. The preprocessor uses spelling and `LitId` to implement `#`,
-`##`, `__LINE__`/`__FILE__`, include handling, and macro expansion without
-discarding information the parser later needs.
-
-Bad literals remain tokens with `TF_LITERAL_BAD` plus spelling and source
-location so diagnostics can point at the exact source text and recovery can
-continue.
-
-## 6. Allocators and lifetimes
-
-cfree uses explicit allocators rather than a single global heap. Allocators are
-fields of `Compiler` (`src/core/core.h`) and are passed down to subsystems.
-
-| Allocator | Lifetime | Owns |
-|--------------|------------------------|--------------------------------------------------------|
-| `Pool global`| Process | Interned strings and interned types. |
-| `env.heap` | Output object/exe | Section chunks, reloc tables (survive into linker), JIT bookkeeping. |
-| `Arena tu` | One TU compile | Local symbols, parser scratch, SourceManager tables, ABI caches. |
-| `Arena scratch` | Reset per function | Value-stack scratch, fixup lists, lookahead buffers. |
-
-Rules:
-
-- A struct never owns its own heap implicitly. If it allocates, an allocator
- reference is part of its API.
-- Arena resets are an explicit operation on the arena. Subsystems holding
- pointers into a scratch arena must either copy them out before reset, or
- treat them as invalidated.
-- Long-lived data (anything that outlives a TU) goes through `Pool global` or
- `Heap output`. Don't copy from arenas into one of those — interning is the
- only path in.
-- Source identities live in `Compiler.sources`. They are stable for the
- compile/link invocation and are read by diagnostics, dependency output, and
- DWARF emission.
-
-`env.heap` is a normal heap (typically `heap_libc`). The JIT does not
-compile directly into executable memory: `cfree_jit_from_image` consumes a
-resolved `LinkImage`, mmaps a fresh region, copies laid-out segments in,
-applies relocations in-place, and `mprotect`s final permissions. The `Heap`
-vtable still exists so the JIT can swap allocators for the *destination*
-mapping and so tests can substitute fakes.
-
-## 7. Error handling
-
-A single `Compiler` carries a `jmp_buf` and references a host-supplied
-`DiagSink` through `CfreeEnv`. Fatal errors call `compiler_panic`, which emits
-a diagnostic and `longjmp`s out of the entire parse/CG pipeline. Drivers
-establish the `setjmp` boundary at TU or pipeline granularity.
-
-Layered driver functions (`cfree_compile_obj`, `cfree_link_*`, `cfree_run`)
-each install their own boundary. To remain composable, every such function
-saves `c->panic` via `compiler_panic_save` on entry and restores it via
-`compiler_panic_restore` on every exit path (panic-return after
-`compiler_run_cleanups`, and success). Without save/restore, an inner
-`setjmp` clobbers an outer one and any subsequent `compiler_panic` in the
-outer caller longjmps into the inner's already-returned stack frame.
-
-This means almost no function in `parse`, `cg`, or `arch` returns an error. The
-happy path is the only path. Arena scratch is reset rather than unwound
-one-by-one.
-
-Subsystem objects with non-arena resources (file handles, mmaps, child
-allocators) self-register a cleanup with `compiler_defer` in their `_new`
-and call `compiler_undefer` from their `_free`. The pipeline-level
-`setjmp` handler runs `compiler_run_cleanups`, which walks the LIFO stack
-and releases everything still registered. This keeps `compiler_panic`
-correct even when failure happens deep inside a composition that has
-allocated several subsystems.
-
-What is *not* fatal: warnings, recoverable parse errors that have a sensible
-recovery point (skip-to-`;`, skip-to-`}`). The parser uses limited internal
-recovery for these and only escalates to `compiler_panic` when continued
-parsing would produce cascading garbage.
-
-## 8. Streaming
-
-Streams cleanly on direct lowering (`-O0` and targets that do not wrap with
-`opt_cgtarget`):
-
-- Lexer → preprocessor token stream.
-- Preprocessor → parser token stream.
-- Parser → CG → CGTarget calls within a function.
-- CGTarget → MCEmitter → ObjBuilder section bytes, appended via chunked buffers.
-
-Buffers per function (bounded, not per TU):
-
-- CG's value stack and label fixup tables.
-- Per-target register/frame state.
-- Optimizer's IR for the function being optimized, when only intra-procedural
- passes are enabled.
-
-Buffers per TU:
-
-- Symbol tables — relocations cannot be resolved until all definitions are
- seen. Final patching is deferred to ObjBuilder finalize / linker.
-- Debug info — DWARF tables reference final section layout.
-- `-O2` optimizer IR — cross-function inlining keeps all candidate function IR
- and call graph metadata until `cgtarget_finalize`.
-
-So the streaming guarantee is tiered:
-
-- `-O0` direct target: source and codegen are function-at-a-time.
-- `-O1` target-local optimization: function-at-a-time unless a target opts
- into specific buffering.
-- `-O2`: source is still read once, but optimized function IR may be retained
- per TU for IPO. This is intentional and bounded by the TU, not the whole
- program.
-
-## 9. Optimizer
-
-`opt` (`src/opt/opt.h`, `src/opt/ir.h`) implements `CGTarget`. The pass set and
-ordering are modelled on MIR (`mir-gen.c`) — that pipeline is proven, well
-understood, and a good fit for the "70% of -O2" target. The one cfree
-addition is cross-function inlining, which MIR does not have.
-
-IR shape: block-based SSA. Functions are lists of basic blocks; blocks have
-`Phi`s at the top; instructions reference values by SSA id. `Func` also owns
-first-class frame-slot and parameter tables so `-O0` frame residency,
-parameter ingress, mem2reg promotion, and debug locations all refer to the
-same objects. The op set is small (integer constants, exact byte constants,
-mem ops, aggregate ops, bitfield ops, explicit integer and floating-point
-arith, compares, conversions, GEP, calls, terminators, an opaque `ASM_BLOCK`,
-plus `IR_VA_*` and `IR_SETJMP`/`IR_LONGJMP`). `Inst` stays compact; ordinary
-instructions define one `Val`, while multi-result instructions carry
-`defs[0..ndefs)`. Complex per-op facts live in arena-owned typed aux structs
-(`IRCallAux`, `IRAggregateAux`, `IRBitFieldAux`, `IRGepAux`, `IRAsmAux`,
-`IRPhiAux`, `IRCasAux`). This keeps calls, aggregate copies, asm, CAS
-multi-results, and ABI metadata inspectable by passes without turning every
-instruction into a large union.
-
-The IR is flat-CFG: structured-scope ops on `CGTarget` (§5.1) are flattened by
-`opt_cgtarget`'s recorder into ordinary labels, branches, and basic blocks. WASM
-lowering at -O2 therefore needs to reconstruct structure (relooper) before
-emitting. At -O0/-O1 there is no `opt_cgtarget` wrapper and CG drives the WASM
-backend directly, producing structured output by construction.
-
-`IR_SETJMP` is a control barrier: opt does not inline across it, does not
-hoist through it, and does not GVN-merge values defined on either side.
-`IR_LONGJMP` has no successors (control does not return). The library setjmp
-symbol used on real arches is recognized by name and gets the same treatment
-when it appears as the callee of an `IR_CALL`.
-
-**No UB-exploiting passes.** Rules in opt may not assume that a UB-triggering
-operation (signed overflow, shift-by-≥-width, division by zero, null deref)
-is unreachable. WASM traps deterministically on the first three and faults on
-the fourth — the program terminates rather than time-traveling. Real-target
-behavior is also more predictable this way. The "70% of -O2" goal is
-achievable without these rules. `Inst.flags` is general-purpose; no specific
-bit allocations are reserved. If a non-UB-exploiting pass that benefits from
-operation-level annotations arrives later, the path is to thread a flags
-argument through `CGTarget.binop` and into `IR_*` then — not before.
-
-### 9.1 Lifecycle
-
-- `func_begin` allocates a fresh `Func` IR container in the per-TU IR arena.
-- `alloc_reg(class, type)` returns a fresh virtual `Reg` whose mapping to a
- `Val` is recorded; `free_reg` is a hint and ignored.
-- `frame_slot` and `param` populate `Func.frame_slots` and `Func.params`.
- Parameter ABI incoming parts are visible to later promotion, debug, and
- replay.
-- Every other emit call appends one SSA `Inst` to the current basic block.
- Each `Inst` carries the `SrcLoc` set by the most recent `CGTarget.set_loc`.
- `call(CGCallDesc)`, `atomic_cas`, and ABI split returns use the multi-result
- `defs` convention.
-- `func_end` runs the **intra-procedural** pipeline (§9.2) and stores the
- optimized `Func`. **No lowering yet.**
-- `cgtarget_finalize` runs the **inter-procedural** pipeline (§9.3) over all
- buffered functions, then for each function runs the **lowering** pipeline
- (§9.4) which drives the wrapped target CGTarget via `CGTarget.set_loc` +
- emit-side calls.
-
-The driver therefore looks like:
-
-```c
-parse_c(c, pp, decls, cg);
-cgtarget_finalize(target); /* no-op for plain CGTarget; runs IPO+lower for opt */
-emit_elf(c, ob, w);
-```
+The toy frontend exists to exercise and test the public CG API independently of
+C language semantics.
-At `-O0` the wrapper is not used and the target CGTarget is driven directly
-during parse, with no function IR retention. `-O1` may use only local
-lowering/target peepholes and remains function-at-a-time. `-O2` uses
-`opt_cgtarget` and may retain IR for all functions in the TU.
+## Link and Run Data Flow
-Memory cost at `-O2`: the IR for every function in a TU is held in the per-TU
-IR arena until `cgtarget_finalize`. Per-pass scratch lives in `Arena scratch`,
-not in the IR arena.
-
-### 9.2 Intra-procedural pipeline (per `Func`, on `func_end` at `-O2`)
+### File link
```
-build_cfg
-block_cloning (hot path duplication; skipped if it would block addr_xform)
-build_ssa (incl. promotion of non-address-taken FrameSlots —
- mem2reg is folded in, not a separate pass)
-addr_xform (fold GEP-equivalent address insns into uses)
-gvn (incl. constprop, redundant-load elimination)
-copy_prop (incl. redundant-extension elimination)
-dse (dead store elimination)
-ssa_dce
-build_loop_tree + licm
-pressure_relief
-make_conventional_ssa + ssa_combine + undo_ssa
-jump_opt
+objects / object bytes / archives / DSO stubs
+ -> cfree_link_exe()
+ -> Linker
+ -> object/archive readers
+ -> symbol resolution
+ -> layout
+ -> relocation
+ -> executable writer
```
-### 9.3 Inter-procedural pipeline (over all `Func`s, on `cgtarget_finalize`)
+The linker accepts already-built `CfreeObjBuilder` values, encoded object
+bytes, archives, and dynamic library inputs described by public API options.
+It owns archive member selection, symbol resolution, section and segment
+layout, relocation, build-id/image-id handling, and final image emission.
+`cfree_link_shared()` has a public option surface, but currently reports that
+shared-library codegen is not supported.
-Inlining doesn't pay off without a follow-up: the new opportunities (callee
-arguments that are now constants, branches in the callee that are now dead,
-redundant ops shared across the caller/callee boundary, callee bodies that
-landed inside a caller loop) only get realised by re-running intra-procedural
-passes on the modified caller.
+### JIT run and debug
```
-opt_inline (call-graph bottom-up; SCCs skipped for v1)
-for each dirty caller:
- opt_cleanup (subset re-run: gvn, copy_prop, ssa_dce, jump_opt,
- licm if loops, addr_xform if uses remain)
+source/object inputs
+ -> compile/link to LinkImage
+ -> cfree_link_jit()
+ -> executable-memory host vtable
+ -> CfreeJit / CfreeJitSession
+ -> run or dbg
```
-Iteration (`inline → cleanup → inline → ...`) is bounded by `-finline-iters=N`
-(default 1, hard cap enforced by opt_cgtarget). Tuning is benchmark-driven.
+The JIT path shares the same compile, object, symbol, and relocation machinery
+as file output. Mapping executable memory is delegated to the host through
+`CfreeEnv`; libcfree enforces the image layout and relocation model.
-### 9.4 Lowering pipeline (per `Func`, after IPO, drives target CGTarget)
+`driver/run.c` invokes an entry point in-process. `driver/dbg.c` builds on JIT
+sessions and `src/dbg/` for breakpoints, stepping, register display, and memory
+inspection.
-```
-machinize (target ABI lowering, 2-op forms, call lowering)
-build_loop_tree (-O1+, used by RA)
-coalesce (-O2, move-related)
-live_info
-regalloc (linear scan; live-range splitting at -O2)
-combine (-O1+, code selection: merge dependent insns)
-dce (-O1+, post-RA)
-opt_emit (prolog/epilog; insn split; drive target CGTarget)
-```
+### Emulation
-### 9.5 Inline asm
-
-`ASM_BLOCK` is opaque: passes treat it as reading its input operands, writing
-its output operands and clobbers, and not commuting with surrounding memory
-ops. Inline asm is therefore safe across optimization without per-asm
-modelling.
-
-## 10. Inline asm
-
-Two callers exercise the asm machinery:
-
-- Standalone `.s`: tokens → `parse_asm` → `MCEmitter.emit_bytes`/
- `emit_reloc_at`/`emit_label_ref` → `ObjBuilder`. Bypasses cg entirely;
- operands are literal registers, immediates, labels, and symbols from the asm
- syntax itself.
- Standalone `.s` does not go through `opt_cgtarget`.
-- Inline `asm("...": outs : ins : clobbers)` inside C: invoked via
- `cg_inline_asm`. Flow:
-
- 1. Parser parses constraint list and template; evaluates each input/output
- expression so inputs are `SValue`s on the CG stack and each output binds
- an lvalue.
- 2. cg pops inputs (in declaration order), packs them into an `Operand[]`,
- and calls `CGTarget.asm_block(tmpl, outs, ins, clobbers)`.
- 3. The arch implementation does **constraint binding** (`r`, `m`, `i`,
- `=&r`, matching constraints, ...), then walks the template and assembles
- each instruction. Under `opt_cgtarget` this is recorded as one `IR_ASM_BLOCK`
- and replayed on the target arch at lowering time, after RA has assigned
- the bound virtuals to physicals.
- 4. arch fills `out_ops[]` with the location holding each result; cg pushes
- those back as new SValues.
-
-The asm parser is shared between the standalone path (writing directly to
-`MCEmitter`) and the inline path (used as a template walker inside
-`CGTarget.asm_block`). Constraint binding is per-arch.
-
-`"memory"` clobber is conservative: cg flushes all live stack-resident values
-to memory before the block and reloads after. This is suboptimal but
-correct.
-
-Asm syntax (decided, single supported flavour per arch):
-
-- x86 (32 + 64): AT&T. Same parser serves both inline asm and standalone
- `.s`. Matches GCC inline-asm convention.
-- ARM (32 + 64): GNU `as` ("unified") syntax.
-- RISC-V (32 + 64): GNU `as` syntax.
-- WASM: WAT (text format).
-
-Open: full GCC-syntax constraint coverage (early-clobber, matching `0`,
-multi-alternative). v1 covers `r`, `m`, `i`, `a`, `=r`, `+r`, `=m`, `=&r`,
-matching constraints. The remainder is deferred.
-
-## 11. DWARF debug info
-
-Debug info lives in `src/debug/` and is owned by a single `Debug` object that
-collects events during compilation and emits `.debug_*` sections at the end
-of the TU.
-
-**Inputs (called during compilation):**
-
-| Producer | Calls |
-|---|---|
-| Driver | `debug_file(source_file_id)` to populate the DWARF file table from `SourceManager`. |
-| CG | `debug_func_begin/end`, `debug_scope_begin/end`, `debug_param`, `debug_local`. cg holds an optional `Debug*` (NULL when `-g` is off). |
-| MCEmitter (or opt's lowering pass) | `debug_line` per emitted instruction, sourced from the `SrcLoc` set by `CGTarget.set_loc`/`MCEmitter.set_loc`; `debug_func_pc_range` after each function is laid out. |
-| opt at `-O2` | `debug_loclist_*` when a variable's location changes across the function. The `SrcLoc` propagates through opt because every recorded `Inst` carries it. |
-
-**Outputs:** `.debug_info`, `.debug_abbrev`, `.debug_line`, `.debug_str`,
-`.debug_aranges`, `.debug_rnglists`, `.debug_loclists` — written into the
-same `ObjBuilder` when `debug_emit` is called. `debug_emit` runs after all
-code sections are finalized but before file emitters consume the builder.
-
-**Variable locations:** at `-O0`, all locals live at stable frame offsets and
-`DebugVarLoc` is `DVL_FRAME`; this gives full debuggability for free. With
-`opt`, the lowering pass produces `DVL_LOCLIST` entries describing where a
-variable lives across PC ranges. v1 may downgrade opt'd debug info to
-function-level only (start/end PC, no locals); refining to per-variable
-location lists is a follow-up but the interface already accommodates it.
-
-**Type DIEs:** generated on demand from the `Type*` reaching `debug_local` /
-`debug_param`, with sizes, alignments, and member offsets supplied by
-`TargetABI`. Interned by `Type*` identity (which is already pointer-equal for
-equal types thanks to `Pool global`).
-
-## 12. Cross-cutting decisions
-
-- **Interning is global**, in `Pool global`. `Sym` (32-bit string id) is the
- currency for spellings and lookup keys, not symbol identity. Symbol table
- identity is object-scoped (`ObjSymId`, §5.3) until the linker resolves
- definitions. C tag identity is scoped `TagId`, not `Sym`, so equal tag
- spellings in different scopes remain distinct. Equal types are pointer-equal
- after `pool_type` (same applies to strings: pool_intern returns the canonical
- id). On WASM, this `Type*` identity is also the source of truth for
- `call_indirect` type-index assignment.
-- **Source identity is centralized.** `SrcLoc.file_id` belongs to
- `SourceManager`, not to the lexer, preprocessor, diagnostics, or debug
- emitter. Macro expansion and include edges are recorded once and reused by
- diagnostics, DWARF, and dependency generation.
-- **Locals and parameters always start frame-resident.** `cg_local` and
- `cg_param` allocate stable `FrameSlot`s through `CGTarget.frame_slot` and
- `CGTarget.param`. Promotion to virtual registers (and to WASM-locals on
- that target) happens *inside* SSA construction: `build_ssa` (§9.2) promotes
- any slot whose `FrameSlotFlag` never had `FSF_ADDR_TAKEN` set. Address-
- taken slots remain as memory ops and are reasoned about through `MemAccess`
- alias roots. There is no separate mem2reg pass — SSA construction already
- has to decide which `FrameSlot` accesses become Phi chains vs which stay
- loads/stores, and a second pass would re-walk the same decisions. At -O0
- every slot stays on the frame, which is the same shape `Debug` wants for
- `DVL_FRAME` (§11) — full debuggability for free, no parser pre-scan needed.
-- **Function-pointer ABI is a linker concern.** A function symbol's address
- taken via `&f` lowers to a normal `ObjSymId`-relative `Operand`.
- ELF/COFF/Mach-O resolve this directly. WASM file emitters and the JIT linker
- walk function-address relocations (`R_WASM_FUNCIDX` / `R_WASM_TABLEIDX`) while
- building the shared `LinkImage` and assign indirect-function-table slots; the
- slot index is the pointer's bit pattern. CG and `CGTarget` are unaware.
-- **Sections are chunked.** A `Section.bytes` is a linked list of fixed-size
- chunks. Append is O(1). Backward patching uses a 32-bit flat offset
- computed at finalize time, so forward fixups don't depend on chunk
- boundaries.
-- **Error model is `setjmp`/`longjmp`.** See §7.
-- **Single-pass parser+CG.** No separate AST. The optimizer reconstructs an
- IR by recording CGTarget calls; this is technically two-pass *within a function*
- but the source is read once.
-- **Self-hosting constraint.** Anything in `src/` must be writable in C11
- freestanding (with the runtime in `include/`/`lib/`). No GNU extensions, no
- libc beyond what cfree itself ships. Bootstrap is hex0-seed → small subset
- → full cfree; details TBD.
-
-## 13. Build composition
-
-The driver-facing API is layered (`src/driver/pipeline.h`). Most consumers
-should not hand-compose the pipeline; they should call one of:
-
-- `cfree_compile_obj(c, opts, input, &ob)` — one TU → in-memory `ObjBuilder*`
- for chaining into the linker.
-- `cfree_compile_obj_emit(c, opts, input, writer)` — one TU → encoded `.o`
- bytes via the caller's `Writer*` (cc -c).
-- `cfree_link_exe(c, link_opts, writer)` — link → executable bytes.
-- `cfree_link_jit(c, link_opts, &jit)` — link → owning `CfreeJit*`.
-- `cfree_run(opts)` — convenience composition for the multi-input case.
-
-Data contracts at each boundary:
-
-- `compile_obj → link`: `ObjBuilder*` is the cross-API currency. The
- returned builder is finalized; do not write further. Lifetime is tied to
- the `Compiler`; it must remain alive until link is done.
-- `compile_obj_emit → file`: `Writer*`. The `ObjBuilder` is consumed and
- released inside the call. On nonzero return the Writer may contain
- partial output and should not be consumed.
-- `link → exe`: `Writer*`. No path appears in the core API. Same partial-
- output caveat on nonzero return.
-- `link → jit`: `CfreeJit*` owns its `LinkImage` and mapped pages; lookups
- are by `Sym` (interned name) — `ObjSymId` never escapes.
-
-Each layered function (`cfree_compile_obj`, `cfree_compile_obj_emit`,
-`cfree_link_exe`, `cfree_link_jit`) saves and restores `Compiler.panic`
-around its own `setjmp`, so they are safely callable from inside another
-active panic boundary (for example from `cfree_run`). Library resolution
-(`-lfoo` against `-L` paths) is the CLI driver's job; archives reaching
-`CfreeOptions` must already be concrete paths.
-
-Path-shaped helpers (`cfree -c file.c -o file.o`, `ld a.o b.o`, etc.) live
-in driver-level adapters. They call `c->env->file_io->read_all` to obtain
-byte buffers, then feed the byte/Writer APIs above. The freestanding core
-never takes paths.
-
-The internal one-TU sequence used by `cfree_compile_obj` looks like:
-
-```c
-ObjBuilder* ob = obj_new(c);
-Pp* pp = pp_new(c); /* reads c->env->file_io */
-DeclTable* decls = decl_new(c, ob);
-MCEmitter* mc = mc_new(c, ob);
-CGTarget* a = cgtarget_new(c, ob, mc);
-if (opt_level >= 1) a = opt_cgtarget_new(c, a, opt_level);
-Debug* d = dbg ? debug_new(c, ob) : NULL;
-CG* g = cg_new(c, a, d);
-
-pp_push_input(pp, lex_open_mem(c, name, src, len)); /* borrows src */
-parse_c(c, pp, decls, g);
-
-cgtarget_finalize(a); /* IPO + lowering at -O2; no-op otherwise */
-if (d) debug_emit(d);
-obj_finalize(ob);
+```
+guest ELF bytes
+ -> emu ELF loader
+ -> decode/lift guest basic blocks
+ -> CGTarget or opt_cgtarget
+ -> JIT image
+ -> emu runtime
```
-Order is load-bearing: `cgtarget_finalize` flushes lowered code, `debug_emit`
-appends `.debug_*` sections, `obj_finalize` freezes the read-side view, and
-only then may file emitters or the linker consume the builder.
-
-Each subsystem `_new` registers a cleanup with `compiler_defer` and the
-matching `_free` pops it via `compiler_undefer` (§7), so a panic anywhere
-in the sequence above unwinds correctly through `compiler_run_cleanups`.
-
-## 14. Open questions
-
-- WASM is structurally different from the register-shaped CGTarget (stack VM,
- no ELF-style relocations). The `Operand`-driven CGTarget will lower verbosely
- (every `binop` becomes `local.get; local.get; iN.add; local.set`); a
- follow-up peephole pass for stack-shape lowering will reclaim most of the
- bloat. Worth prototyping early to validate the abstractions.
-- Bootstrap subset definition: which features must the seed compiler accept?
-- Debug-info quality at `-O2`: minimum acceptable v1 is function-level
- (low_pc/high_pc + parameter list at entry); per-variable location lists
- for opt'd locals are a follow-up but the `Debug` interface admits them.
-- WASM relooper at -O2: choosing between Stackifier-style (preserve flat CFG
- with relooped wrappers) and Relooper-style (reconstruct nested scopes).
- Affects code size and opt's freedom to introduce irreducible CFGs.
-- Full VLA support beyond `__builtin_alloca`: deferred for v1
- (`__STDC_NO_VLA__=1`). The `IR_ALLOCA`/`CGTarget.alloca_` interface accommodates
- it when the parser is extended.
-
-## 15. Safety model (WASM target)
-
-cfree's WASM backend inherits the WebAssembly sandbox; the goal here is to be
-explicit about what that does and does not buy.
-
-**Checked at runtime:**
-
-- **Linear-memory bounds.** Every load and store traps on out-of-bounds.
-- **Control-flow integrity for direct branches.** Structured `block`/`loop`/
- `if` mean a `br N` can only target a lexically enclosing scope. The
- structured `CGTarget` ops (§5.1) are the source of this — flat goto and
- `switch` fallthrough route through the relooper at -O2 and through the
- WASM CGTarget's structural fallback at -O0/-O1.
-- **CFI for indirect calls.** `call_indirect` traps on signature mismatch.
- The WASM type index is keyed off interned `Type*` identity (§12), so equal
- C function types produce a single WASM type id and a real (not vacuous)
- type check.
-- **No native code injection.** WASM has no `mprotect`/JIT-into-data path
- exposed to the program; cfree's own JIT linker uses host APIs outside the
- sandbox.
-- **`setjmp`/`longjmp`** lower to WASM exception handling; a `longjmp` cannot
- smash the host stack or skip past a structured-control-flow boundary it
- did not originate inside.
-
-**NOT checked:**
-
-- **Pointer provenance.** Pointers are `i32` indices into linear memory.
- `(int*)0xdeadbeef` is a valid bit pattern; the only guard is the bounds
- check on the eventual access. Use-after-free, type confusion, and
- intra-heap buffer overflow that stays inside linear memory all remain
- exploitable — exactly as on a real target.
-- **Integer/UB traps as a safety net.** Signed overflow, shift-by-≥-width,
- and division-by-zero trap *deterministically* on WASM, but `opt` is not
- permitted to assume they're unreachable (§9). They terminate the program;
- they are not a substitute for input validation.
-- **Stack exhaustion** beyond the configured WASM stack limit: traps, but
- recovery requires host-side restart.
-
-In short: WASM gives cfree-compiled programs **memory-isolation** safety
-(can't escape linear memory) and **control-flow-integrity** safety (can't
-forge a return address or call a wrong-typed function), but not
-**type-system** safety on pointers within linear memory. The compiler does
-not pretend otherwise.
+The emulator is a user-mode ELF runner. It translates guest basic blocks
+through the same backend/JIT infrastructure used by native JIT compilation. The
+public `CfreeEmuOptions.optimize` option currently reserves level `2`;
+implemented translation uses the direct or optimizer-backed paths described in
+`include/cfree.h` and `doc/EMU.md`.
+
+## Object and Symbol Model
+
+`CfreeSym` is an interned spelling. It is suitable for identifiers, section
+names, symbol names, and lookup keys, but it is not itself a definition.
+
+`CfreeCgSym` is the public CG handle for a symbol inside one generated object.
+Internally, object builders use object-scoped symbol ids so local symbols from
+different objects do not collide. Linker resolution builds a separate
+resolved-symbol table over all input objects and matches externally visible
+definitions by name, binding, and object-format rules.
+
+`ObjBuilder` is the canonical in-memory object representation during
+compilation and assembly. Object writers, the linker, object inspection, debug
+emission, and JIT image construction consume this model rather than duplicating
+section/symbol/relocation storage.
+
+## State and Ownership
+
+The host supplies storage and side effects through `CfreeEnv`: heap,
+diagnostics, file I/O, executable memory, debugger OS hooks, JIT TLS hooks, and
+time. Public APIs receive explicit options and handles; internal subsystems
+hang state off `CfreeCompiler`, `CfreePipeline`, `CfreeCg`, `ObjBuilder`,
+`Linker`, `CfreeJit`, `CfreeJitSession`, `CfreeEmu`, or frontend-owned context
+structures.
+
+Compile inputs are byte buffers owned by the caller and must outlive the call.
+Writers are host-owned. Builders returned by `cfree_compile_obj` are owned by
+the compiler, so the compiler must remain alive until consumers finish with
+them. Encoded object bytes, archive bytes, and DSO bytes are borrowed by link
+calls for the duration of the call unless a specific API says otherwise.
+
+## Current Optimization Contract
+
+- `opt_level == 0`: direct code generation into the selected backend.
+- `opt_level == 1`: implemented optimizer-backed path. It records CGTarget
+ operations as IR, performs the implemented backend-prep and local cleanup
+ pipeline, allocates registers, combines, removes dead code, and emits through
+ the wrapped real target.
+- `opt_level == 2`: not yet implemented as a full optimization level. Public
+ option fields and some internal pass declarations reserve this level, but
+ `-O2` should be treated as future work rather than a dependable behavior
+ contract.