commit 6c3ea14efbae7eed1d31df74207ab01753448462
parent 9c1a093280492ba7866eb3cebbb445c716da8ecb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Thu, 7 May 2026 14:52:20 -0700
DESIGN.md and interfaces
Diffstat:
34 files changed, 3106 insertions(+), 157 deletions(-)
diff --git a/README.md b/README.md
@@ -23,3 +23,13 @@ It features:
- Reproducible builds
- A build and packaging system
- Bootstrap from hex0-seed
+
+cfree also provides these headers beyond the freestanding set:
+- stdatomic.h
+- assert.h
+- setjmp.h
+
+And cfree-specific extensions:
+- cfree/syscall.h
+- cfree/baremetal.h
+- cfree/coro.h
diff --git a/doc/DESIGN.md b/doc/DESIGN.md
@@ -0,0 +1,860 @@
+# cfree design
+
+Architecture of the cfree compiler, assembler, and linker. Companion to
+`README.md`. Scope: how the modules fit together and what their contracts are.
+Not a tutorial; not implementation notes.
+
+## 1. Goals
+
+- Conforming C11 freestanding compiler, written in C11.
+- Single multi-call binary: `cc`, `cpp`, `as`, `ld`, `ar`, `objdump`, `dbg`.
+- Targets: x86 (32/64), ARM (32/64), RISC-V (32/64), WASM.
+- Output: object files (ELF, COFF, Mach-O, WASM) and executables.
+- In-memory JIT path sharing the entire pipeline with the file path.
+- Lightweight optimizer targeting roughly 70% of GCC/Clang `-O2` performance
+ on integer code.
+- Self-hosting. Bootstraps from a hex0-seed.
+- Streaming wherever feasible. Direct lowering is function-at-a-time; `-O2`
+ may retain per-TU IR for inter-procedural optimization.
+
+This document keeps the full project goals visible, but the interface contracts
+below are currently specified in detail only for the compiler, object
+emission, linker, and JIT path. The shared model leaves room for standalone
+tool surfaces (`ar`, `objdump`, `dbg`, packaging, bootstrap), but they are not
+the focus of this pass.
+
+## 2. Non-goals (v1)
+
+- C++, Objective-C.
+- C11 variable-length arrays and variably-modified types (`__STDC_NO_VLA__`).
+- Cross-TU LTO, PGO, autovectorization beyond peephole-level idiom recognition.
+- Thread-safe parallel compile inside one process.
+- Sanitizers, coverage instrumentation.
+- `_Generic` corner cases that require multi-pass disambiguation are best-effort.
+
+## 3. Layout
+
+```
+include/     public C11 headers shipped with the compiler (the runtime)
+lib/         compiler-rt (the runtime)
+src/
+  core/      allocators, intern pool, source manager, diagnostics, buffers, target
+  lex/       shared tokenizer (C and asm)
+  pp/        C preprocessor
+  type/      target-neutral C type interning and compatibility
+  abi/       target ABI type layout and call classification
+  decl/      C declaration, linkage, storage-duration, and initializer model
+  parse/     C11 parser, asm parser
+  cg/        single-pass value-stack code generator
+  arch/      CGTarget + MCEmitter interfaces and per-arch backends
+  opt/       lightweight SSA IR + passes; presents itself as a CGTarget
+  obj/       in-memory object model + per-format file writers and readers
+  debug/     DWARF info collection + emission
+  link/      symbol resolution, relocation, exe writer, JIT linker
+  driver/    multi-call dispatch and command-line front-ends
+test/
+doc/
+```
+
+The compiler source lives in `src/`. `include/` and `lib/` are the runtime that
+ships *with* the compiler (the freestanding stdlib and compiler-rt) and are
+not built by the compiler-development tree.
+
+## 4. Dataflow
+
+```
+.c → lex → pp → parse_c → decl + cg → CGTarget → MCEmitter → ObjBuilder ──┬──→ emit_{elf|coff|macho|wasm} → .o / exe
+.s → lex → parse_asm ──────────────────────────→ MCEmitter ───────────────┤
+                                                                          ├──→ link_file (.o + archives → exe)
+                                                                          └──→ link_jit (mmap + exec)
+```
+
+Reading order, left to right:
+
+1. `lex` produces a stream of raw tokens (idents, numbers, punctuators,
+ strings). Tokens preserve exact spelling; literals carry deferred `LitId`
+ handles rather than host-decoded numeric values.
+2. For C: `pp` consumes tokens, expands macros, and emits a stream of
+ preprocessed tokens. For asm: tokens go straight to `parse_asm`.
+3. `parse_c` is recursive-descent over preprocessed tokens. It records C
+ declaration semantics in `DeclTable` and drives `cg` for executable code.
+ There is no explicit AST.
+4. `cg` maintains a value stack à la TCC. Each parser action manipulates that
+ stack: pushes, loads, stores, aggregate copies, conversions, calls. At
+ `-O0`, CG owns live value lifetimes, spills, reloads, and preservation
+ across calls/asm; the target provides scratch registers and spill/reload
+ mechanics.
+5. `CGTarget` is the typed C/IR lowering vtable. Concrete targets lower those
+ operations into machine emission. The optimizer also implements `CGTarget`:
+ it records the call sequence as IR per function, runs intra-procedural
+ passes on `func_end`, and on `cgtarget_finalize` runs cross-function passes
+ before replaying into the wrapped target `CGTarget`.
+6. `MCEmitter` is the machine/object emission vtable. It owns section position,
+ bytes, alignment/fill, relocations at explicit offsets, machine-label
+ references, and source locations for debug line emission.
+7. `ObjBuilder` is the single in-memory object representation. It accepts
+ sections, bytes, symbols, and relocations on the write side, and exposes
+ read accessors for file writers, the linker (file and JIT), and objdump.
+
+`parse_asm` bypasses `cg` and writes directly into `MCEmitter`; inline asm
+is a typed `CGTarget.asm_block` operation that lowers through the target's asm
+machinery. See §10.
+
+## 5. Key interfaces
+
+### 5.0 `SourceManager` (`src/core/core.h`) — source identity
+
+`SourceManager` is owned by `Compiler` and is the authority for `SrcLoc.file_id`.
+It registers real files, memory inputs, builtins, and macro-expansion pseudo
+files; maps file ids back to normalized paths and diagnostic spellings; records
+include edges; and exposes dependency iteration for `-M*` output. Lexer and
+preprocessor create source ids through it. Diagnostics, DWARF, dependency
+generation, and reproducible-build path handling read from it rather than
+inventing their own file tables.
+
+Macro-expanded tokens keep both spelling and expansion locations. Consumers
+that need user-facing diagnostics can ask for spelling locations; consumers
+that need execution/profiling/debug line attribution can ask for expansion
+locations. `Debug.debug_file` takes a source file id, not a raw path.
+
+### 5.1 `CGTarget` (`src/arch/arch.h`) — typed lowering
+
+`CGTarget` is a vtable representing "something that can accept typed C/IR
+operations for one function at a time". `cg` calls `CGTarget` after it has
+resolved an operation's operands to concrete `Operand` values (immediate,
+register, frame-relative, object-symbol-relative, indirect). Direct target
+implementations lower these operations into their `MCEmitter`; `opt` wraps a
+target `CGTarget` and records the same operations as IR before replaying them
+later.
+
+Method groups:
+
+- **Function lifecycle.** `func_begin(CGFuncDesc)`, `func_end`.
+ `CGFuncDesc` carries the function `ObjSymId`, `fn_type`, inspectable
+ `ABIFuncInfo`, parameter descriptors, and declaration location.
+- **Frame slots, parameters, and value lifetimes.** `frame_slot(FrameSlotDesc)`
+ creates stable frame-resident storage for locals, parameters, spills, sret,
+ and dynamic-allocation bookkeeping. `param(CGParamDesc)` binds a source
+ parameter index to its stable slot and ABI incoming parts.
+ `alloc_reg(class, type)` returns a physical scratch register for real
+ targets and a fresh virtual register for `opt_cgtarget`. CG, not the target,
+ owns the `-O0` value stack: it uses
+ `clobbers`, `spill_reg`, and `reload_reg` to preserve live values across
+ register pressure, calls, and inline asm. `free_reg` releases a value-stack
+ claim; `opt_cgtarget` treats it as a hint.
+- **Control flow.** `label_new`, `label_place`, `jump`, `cmp_branch` (fused
+ compare-and-branch; the only conditional-branch primitive — for arbitrary
+ i1 values cg synthesizes `cmp_branch(CMP_NE, val, IMM_ZERO, label)`).
+- **Structured control flow.** `scope_begin(CGScopeDesc)`, `scope_else`,
+ `scope_end`, `break_to`, `continue_to`. `CGScopeDesc` carries explicit break
+ and continue labels, so C `for` continues land on the increment expression
+ instead of assuming the loop header. Real backends shim these onto
+ `label_new`/`label_place`/`jump` (no code-size cost). The WASM backend
+ consumes them natively to emit block/loop/if with structurally-bounded `br`
+ targets. `goto`, computed-goto, and `switch` fallthrough still go through
+ the flat label API. opt's IR is flat-CFG; at -O2 the WASM lowering pass
+ reconstructs structure from the flat IR.
+- **Data movement and aggregates.** `load_imm`, `load_const`, `copy`, `load`,
+ `store`, `addr_of`, `copy_bytes`, `set_bytes`, `bitfield_load`, and
+ `bitfield_store`. Scalar memory operations carry `MemAccess`; aggregate and
+ bitfield operations carry ABI-sized metadata so struct assignment, block
+ zeroing, byval copies, and bitfield accesses remain visible to opt and
+ direct backends.
+- **Arithmetic / compare / convert.** `binop` uses explicit integer and
+ floating-point op families (`BO_I*`, `BO_F*`) rather than inferring behavior
+ from operand type. `cmp` materializes 0/1; use `cmp_branch` when the result
+ feeds a branch. `convert` is explicit by `ConvKind`.
+- **Calls / return.** `call(CGCallDesc)` and `ret(CGABIValue*)`. The parser
+ type-checks `fn_type`; CG asks `TargetABI` for `ABIFuncInfo`, materializes
+ `CGABIValue`/`CGABIPart` arrays for direct, indirect/byval, sret, split, and
+ multi-register values, and passes that structured call/return shape to the
+ target. `callee.kind == OPK_GLOBAL` is a direct call; any other kind is
+ indirect. On WASM, `fn_type` selects the `call_indirect` type index —
+ interned `Type*` identity is the index source of truth (§12).
+- **alloca.** `alloca(dst, size, align)` — dynamic stack allocation. Reachable
+ only via `__builtin_alloca` since v1 does not parse VLAs (§2). The backend
+ grows the native stack or, on WASM, the linear-memory shadow stack, and
+ returns the result pointer in `dst`.
+- **Variadics.** `va_start`, `va_arg`, `va_end`, `va_copy`. `<stdarg.h>` macros
+ expand to compiler builtins which CG forwards here. Per-arch ABI: SysV
+ x86-64 manages the register-save area; arm64 manages its split gp/fp areas;
+ WASM walks the spilled-args memory.
+- **setjmp / longjmp.** Optional methods. Real backends leave them NULL: the
+ parser lowers `<setjmp.h>`'s `setjmp` to a normal call to `__cfree_setjmp`
+ (a hand-written .S in `lib/`) and opt recognizes the symbol by name as
+ returns-twice (no inlining across; values defined before the call are not
+ GVN-merged with values defined after). The WASM backend implements
+ `setjmp_`/`longjmp_` via the exception-handling proposal — there is no
+ saveable native SP, so a library-only implementation is impossible.
+- **Atomics.** `atomic_load`, `atomic_store`, `atomic_rmw`, `atomic_cas`,
+ `fence`. Atomic memory operations carry both `MemAccess` and `MemOrder`.
+ Backends route oversized atomics to compiler-rt; small atomics are inline.
+- **Inline asm.** `asm_block(tmpl, outs, ins, clobbers)` — per-arch
+ constraint binding plus template assembly, packaged as one operation. The
+ asm parser is reused as a template walker inside this call, but final bytes
+ and relocations are emitted through `MCEmitter`.
+- **Source location.** `set_loc(SrcLoc)` — sticky; subsequent emit-side
+ calls inherit it. `opt_cgtarget` stamps it onto each `Inst.loc`; target
+ backends forward it to `MCEmitter` for `Debug.line`.
+- **End-of-TU.** `finalize`.
+
+Implementations:
+
+- Real CGTargets per arch under `src/arch/`. Their `finalize` is a no-op.
+- `opt` (`src/opt/opt.h`) returns a wrapper CGTarget that records into IR.
+ Its `finalize` runs cross-function passes and lowers all buffered IR into a
+ wrapped target CGTarget.
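+
+A minimal sketch of the direct-versus-wrapper split, not the actual `arch.h`
+interface: a two-method vtable, a direct target that emits immediately, and a
+recording wrapper that buffers operations and replays them on `finalize`. All
+names here (`MiniTarget`, `DirectTarget`, `Recorder`, and the helper functions)
+are hypothetical, and operations are reduced to plain integers.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/* Hypothetical two-method stand-in for the CGTarget vtable. */
+typedef struct MiniTarget MiniTarget;
+struct MiniTarget {
+    void (*emit)(MiniTarget *self, int op);
+    void (*finalize)(MiniTarget *self);
+};
+
+/* Direct target: lowers each operation as soon as it arrives. */
+typedef struct {
+    MiniTarget base;   /* first member, so MiniTarget* converts back */
+    int out[16];
+    size_t n;
+} DirectTarget;
+
+static void direct_emit(MiniTarget *self, int op) {
+    DirectTarget *d = (DirectTarget *)self;
+    assert(d->n < 16);
+    d->out[d->n++] = op;
+}
+static void direct_finalize(MiniTarget *self) { (void)self; /* no-op */ }
+
+/* Recording wrapper: buffers operations, replays them on finalize. */
+typedef struct {
+    MiniTarget base;
+    MiniTarget *wrapped;
+    int ops[16];
+    size_t n;
+} Recorder;
+
+static void rec_emit(MiniTarget *self, int op) {
+    Recorder *r = (Recorder *)self;
+    assert(r->n < 16);
+    r->ops[r->n++] = op;
+}
+static void rec_finalize(MiniTarget *self) {
+    Recorder *r = (Recorder *)self;
+    /* A real optimizer would transform the buffered operations here. */
+    for (size_t i = 0; i < r->n; i++) r->wrapped->emit(r->wrapped, r->ops[i]);
+    r->wrapped->finalize(r->wrapped);
+    r->n = 0;
+}
+
+static void direct_init(DirectTarget *d) {
+    d->base.emit = direct_emit;
+    d->base.finalize = direct_finalize;
+    d->n = 0;
+}
+static void recorder_init(Recorder *r, MiniTarget *wrapped) {
+    r->base.emit = rec_emit;
+    r->base.finalize = rec_finalize;
+    r->wrapped = wrapped;
+    r->n = 0;
+}
+```
+
+The caller drives either object through the same `MiniTarget *`, so choosing
+between direct lowering and the recording path is a construction-time decision.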
+
+### 5.2 `MCEmitter` (`src/arch/arch.h`) — machine/object emission
+
+`MCEmitter` is the low-level emission vtable shared by target backends and
+assembler input. It owns the current section, byte position, machine-label
+creation/placement, raw byte output, fill/alignment, relocations against
+`ObjSymId` at explicit offsets, label references/fixups, and sticky source
+locations used by the debug line program.
+
+`CGTarget` implementations may hide instruction selection, register
+allocation, prolog/epilog emission, and instruction encoding behind their
+typed methods, but when they finally write object contents they go through
+`MCEmitter`. `parse_asm` uses the same emitter directly because assembler
+input is already machine-level syntax.
+
+### 5.3 Symbol identity — object-first
+
+`Sym` is only an interned spelling. It is used for identifiers, section names,
+debug names, and lookup keys, but it is not a symbol table entry.
+
+`ObjSymId` is the authoritative symbol handle during compilation, assembly,
+object reading, relocation emission, debug collection, and link input. It is
+scoped to one `ObjBuilder`, so two objects can both contain a local `static
+int x` without colliding, and an object reader can preserve local labels,
+section symbols, file symbols, unnamed temporary symbols, and external
+references faithfully. Parser declaration binding creates or reuses
+`ObjSymId`s in the current builder; `cg`, `CGTarget`, `MCEmitter`, `Debug`, and `ObjBuilder`
+traffic in those handles.
+
+The linker has its own resolved-symbol table built from each input object's
+`ObjSymId`s. Externally visible definitions are matched by `Sym` name and
+binding during resolution. JIT lookup and explicit entry selection are
+therefore name-based (`Sym`), not handle-based: object symbol handles are not
+portable across builders.
+
+### 5.3.1 `DeclTable` (`src/decl/decl.h`) — C declarations
+
+`DeclTable` is the C-language declaration layer above `ObjBuilder`. The parser
+uses it for storage class, linkage, visibility, TLS, inline/weak attributes,
+tentative definitions, static locals, explicit sections, and global
+initializers. It returns `DeclId`s for parser and CG bookkeeping and owns the
+mapping from a C declaration to its object-scoped `ObjSymId`.
+
+Global initialization is a list of `InitItem`s: zero ranges, exact
+`ConstBytes`, relocatable symbol references, and fills. `DeclTable` applies C
+rules such as tentative-definition coalescing and default section selection,
+then writes concrete sections, bytes, symbols, and relocations into
+`ObjBuilder`. `ObjBuilder` remains object-format canonical storage and does not
+learn C storage-duration rules.
+
+### 5.4 `TargetABI` (`src/abi/abi.h`) — target layout authority
+
+`Type` is structural and target-neutral: kind, qualifiers, element/parameter
+types, immutable record fields, array counts, scoped tag ids, tag spellings,
+and bitfield flags/widths.
+Records are built through a mutable `TypeRecordBuilder` and committed to an
+interned immutable `Type*`. Field flags distinguish normal fields, anonymous
+fields, flexible array members, bitfields, and zero-width bitfields. `Type`
+does not own target-dependent facts such as scalar widths, record size, field
+offsets, bitfield packing, aggregate alignment, or calling-convention
+classification.
+
+Record and enum tags carry a `TagId` in addition to their `Sym` spelling.
+`Sym` is only the diagnostic/debug spelling; `TagId` is scoped declaration
+identity. This prevents two unrelated `struct S` declarations in different C
+scopes from collapsing under global type interning.
+
+`TargetABI` is the one authority for the target-dependent facts that `Type`
+does not own. It is initialized from `Compiler.target` and is available as
+`Compiler.abi`. Its responsibilities:
+
+- Builtin scalar profiles: width/alignment/signedness of C scalar types,
+ pointer size/alignment, `long double`, enum representation policy, and
+ target-defined library types (`size_t`, `ptrdiff_t`, `intptr_t`,
+ `uintptr_t`, `va_list`).
+- `sizeof`/`_Alignof` for every complete type.
+- Record layout: field byte offsets, bitfield storage units, bit offsets,
+ final size, final alignment, and incomplete-type diagnostics.
+- Calling convention classification: direct/indirect/split aggregate
+ arguments, return values, hidden sret pointers, byval copies, variadic
+ register-save/spill behavior, stack slot alignment, and inspectable
+ per-part placement data.
+
+Consumers must ask `TargetABI` rather than reading layout facts from `Type`.
+Parser/type checking use it for `sizeof`, `_Alignof`, field access, enum
+constant typing, and diagnostics. `cg` uses it before creating frame slots,
+before emitting aggregate/bitfield operations, and when selecting conversions.
+Calls use a hybrid model: `TargetABI` returns rich `ABIFuncInfo` data; CG turns
+that into `CGABIValue`/`CGABIPart` operands; target hooks handle only final
+instruction/OS-specific mechanics. `Debug` uses ABI data for DIE sizes, member
+locations, parameter locations, and sret/byval facts.
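+
+As an illustration of the record-layout responsibility, the sketch below
+computes field offsets, final size, and final alignment for non-bitfield
+fields under the common natural-alignment rule. `FieldDesc`, `RecordLayout`,
+and `layout_record` are hypothetical names, not the `abi.h` API; bitfields,
+flexible array members, and packing attributes are out of scope.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/* Hypothetical inputs: per-field size and alignment as the ABI reports them. */
+typedef struct { size_t size, align; } FieldDesc;
+typedef struct { size_t size, align; } RecordLayout;
+
+/* Rounds n up to a multiple of a; a must be a power of two. */
+static size_t align_up(size_t n, size_t a) {
+    assert(a != 0 && (a & (a - 1)) == 0);
+    return (n + a - 1) & ~(a - 1);
+}
+
+/* Writes each field's byte offset into offsets[] and returns the record's
+   final size (tail padding included) and alignment. */
+static RecordLayout layout_record(const FieldDesc *f, size_t n, size_t *offsets) {
+    RecordLayout r = { 0, 1 };
+    for (size_t i = 0; i < n; i++) {
+        r.size = align_up(r.size, f[i].align);   /* pad before the field */
+        offsets[i] = r.size;
+        r.size += f[i].size;
+        if (f[i].align > r.align) r.align = f[i].align;
+    }
+    r.size = align_up(r.size, r.align);          /* tail padding */
+    return r;
+}
+```
+
+For `struct { char c; int i; char d; }` with a 4-byte, 4-aligned `int`, this
+yields offsets 0, 4, and 8, size 12, and alignment 4.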
+
+### 5.5 `ObjBuilder` (`src/obj/obj.h`) — concrete
+
+The single in-memory object representation. There is no second implementation,
+so it is a concrete type rather than a vtable. Object, section, group, and
+symbol handles are explicit (`OBJ_SEC_NONE`, `OBJ_GROUP_NONE`,
+`OBJ_SYM_NONE`). The write API
+(`obj_section`/`obj_write`/`obj_reserve_bss`/`obj_symbol`/`obj_reloc`/
+`obj_finalize`) is what MCEmitter, CGTarget, and `.o` readers use; the read API
+(`obj_section_get`/`obj_relocs`/`obj_symbol_get`, symbol iteration with ids) is
+what file emitters, the linker, JIT, and future objdump use.
+
+`ObjBuilder` is a canonical superset model, not merely "bytes plus names".
+Sections carry both coarse compiler kind (`SEC_TEXT`, `SEC_DATA`, ...) and
+object semantics (`SSEM_PROGBITS`, `SSEM_RELA`, `SSEM_GROUP`, ...), flags,
+alignment, entry size, link/info references, and group membership. Symbols
+carry binding, kind, visibility, absolute/common/TLS state, common alignment,
+and object-scoped identity. Relocations record kind, explicit-addend versus
+in-place addend, pairing, target symbol, and addend. COMDAT/group membership
+is represented explicitly. `Writer` is a real byte sink with write, seek, tell,
+error, and close operations so file emitters do not depend on a hidden I/O
+side channel.
+
+Format-specific metadata is admitted only through typed enum fields
+(`ObjExtKind`, semantic kinds, flags) and narrowly-scoped extension values
+where a real format has no shared equivalent. Avoid opaque `void*` sidecars:
+linker, JIT, emitters, readers, and objdump must be able to inspect the
+canonical model without knowing which reader produced it.
+
+The invariant: the post-finalize state of an `ObjBuilder` is the same shape
+as what you'd get from reading a `.o` back in. So `read_elf` of a freshly
+emitted file produces an `ObjBuilder` indistinguishable from the one used to
+emit it, modulo permitted canonicalization of section ordering and string-table
+layout. Consumers (linker, objdump) don't care which path produced it.
+
+### 5.5.1 `LinkImage` (`src/link/link.h`) — resolved program image
+
+`Linker` accepts explicit inputs (`LinkInputId`) for fresh objects, object
+files, and archives. Resolution produces a `LinkImage`: a shared file/JIT data
+model containing resolved symbols (`LinkSymId`), final symbol addresses,
+segments, laid-out section placements (`LinkSectionId`), segment bytes, and
+relocation applications with concrete write locations. Undefined, duplicate,
+unsupported-relocation, and layout failures are fatal diagnostics through
+`Compiler.panic`.
+
+Executable emission and JIT mapping consume the same `LinkImage`. File writers
+read segment bytes, section placements, final addresses, and relocation records
+from the image. JIT maps fresh writable memory, copies the same segment bytes,
+applies relocation records at their `write_vaddr` locations, resolves allowed
+external symbols through `LinkExternResolver`, changes final permissions, and
+looks up exported/entry symbols by resolved `Sym` name. Object-local
+`ObjSymId` values never escape as JIT lookup handles. `JitImage` owns the mapped
+memory; the caller owns the `LinkImage` unless an API explicitly documents a
+transfer.
+
+### 5.6 `MemAccess` — explicit memory semantics
+
+`MemAccess` is attached to every typed memory operation (`load`, `store`,
+atomics, and IR memory instructions). It contains:
+
+- `type`: the semantic C object type being accessed.
+- `size`: ABI byte width of the access.
+- `align`: known byte alignment; `0` means unknown.
+- `flags`: volatility, atomicity, restrict-derived noalias facts, readonly /
+ writeonly knowledge, and explicit unaligned accesses.
+- `addr_space`: target address space / memory index (`0` for ordinary C
+ memory; WASM may use this for multiple memories later).
+- `alias`: an alias root, one of unknown, local, global `ObjSymId`, parameter,
+ heap, or string literal.
+
+`cg` derives `MemAccess` when it turns an lvalue into a memory operation:
+qualifiers supply `volatile` and `_Atomic`, `TargetABI` supplies size and
+minimum alignment, declaration binding supplies local/global/parameter roots,
+string literals supply string roots, and pointer arithmetic preserves the
+best known root until it escapes. Casts that lose provenance downgrade the
+root to `ALIAS_UNKNOWN`; `restrict` pointers create parameter roots with the
+restrict flag.
+
+Optimization rules:
+
+- Volatile memory operations are side effects. They may not be deleted,
+ merged, reordered with other volatile operations, or moved across calls or
+ inline asm with a memory clobber.
+- Atomic operations use both `MemAccess` and `MemOrder`; memory-order rules
+ dominate ordinary alias reasoning.
+- Nonvolatile accesses with disjoint known alias roots may be reordered or
+ used for redundant-load and dead-store elimination.
+- Unknown alias roots conservatively may alias any ordinary memory.
+- The metadata is a permission to optimize, not a UB oracle: opt still may
+ not assume invalid programs are unreachable (§9).
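+
+The reordering rule for nonvolatile accesses can be sketched as a predicate.
+This is a minimal illustration under assumed names (`AliasKind`, `MiniAccess`,
+`may_reorder`); the real `MemAccess` carries more fields, atomics and memory
+ordering are ignored, and the sketch is deliberately conservative in treating
+any volatile access as unmovable.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+
+/* Hypothetical subset of the alias-root kinds listed above. */
+typedef enum { ALIAS_UNKNOWN, ALIAS_LOCAL, ALIAS_GLOBAL, ALIAS_STRING } AliasKind;
+
+typedef struct {
+    AliasKind kind;
+    unsigned  id;          /* which local/global/string; unused for unknown */
+    bool      is_volatile;
+} MiniAccess;
+
+/* Assumption: two known roots are disjoint when their kind or id differs. */
+static bool roots_disjoint(const MiniAccess *a, const MiniAccess *b) {
+    if (a->kind == ALIAS_UNKNOWN || b->kind == ALIAS_UNKNOWN) return false;
+    return a->kind != b->kind || a->id != b->id;
+}
+
+/* True when a pass may swap the two accesses. */
+static bool may_reorder(const MiniAccess *a, const MiniAccess *b) {
+    assert(a != NULL && b != NULL);
+    if (a->is_volatile || b->is_volatile) return false;  /* side effects */
+    return roots_disjoint(a, b);
+}
+```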
+
+### 5.7 `ConstBytes` — exact literal materialization
+
+`ConstBytes` is the representation for constants whose exact target bits
+matter. It carries the semantic `Type*`, ABI representation bytes, size, and
+alignment. The bytes are produced by literal parsing plus `TargetABI`, never
+by trusting host floating-point layout. This matters for hex floats,
+rounding, `float` versus `double`, target-specific `long double`, endian
+order, and future vector constants.
+
+`CGTarget.load_imm(dst, i64)` remains a convenience for small integer
+constants. `CGTarget.load_const(dst, ConstBytes)` is the general path. Target
+backends may encode the constant as an immediate, synthesize it with
+instructions, or place it in a constant pool / `.rodata` and emit a load.
+`cg_push_const` pushes an exact constant. `cg_push_float(double, type)` exists
+only as a convenience for parser paths that have already accepted host-double
+precision loss as harmless; conforming literal parsing should prefer
+`cg_push_const`.
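+
+A hedged sketch of the byte-order half of that contract: given a bit pattern
+the parser has already rounded exactly (that step is not shown), produce the
+target's representation bytes without consulting host layout. `MiniConst` and
+`const_from_bits` are hypothetical names.
+
+```c
+#include <assert.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <stdint.h>
+
+/* Hypothetical reduction of ConstBytes: representation bytes plus size. */
+typedef struct {
+    unsigned char bytes[8];
+    size_t        size;
+} MiniConst;
+
+/* Encodes the low `size` bytes of v in the target's byte order, independent
+   of the host's endianness. */
+static MiniConst const_from_bits(uint64_t v, size_t size, bool target_big_endian) {
+    MiniConst c = { {0}, size };
+    assert(size >= 1 && size <= 8);
+    for (size_t i = 0; i < size; i++) {
+        unsigned char byte = (unsigned char)(v >> (8 * i));  /* i-th low byte */
+        size_t pos = target_big_endian ? size - 1 - i : i;
+        c.bytes[pos] = byte;
+    }
+    return c;
+}
+```
+
+For example, `0x3FC00000` is the IEEE 754 binary32 pattern for `1.5f`; a
+little-endian target stores it as `00 00 C0 3F` and a big-endian target as
+`3F C0 00 00`.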
+
+### 5.8 Tokens and literals — spelling first, decoding later
+
+`Tok` preserves exact token spelling for diagnostics, macro stringification,
+token pasting, dependency output, and faithful preprocessing. Numeric,
+character, and string literals carry a `LitId` into the lexer's/preprocessor's
+literal table. A literal record stores kind, encoding, suffix/encoding flags,
+the exact spelling, and decoded bytes/code units only when decoding is already
+target-independent.
+
+The lexer does not choose final C literal types and does not round floating
+literals through host `double`. The parser, with `TargetABI`, performs integer
+literal type selection, floating parsing/rounding, character literal value
+selection, string literal concatenation, and construction of exact
+`ConstBytes`. The preprocessor uses spelling and `LitId` to implement `#`,
+`##`, `__LINE__`/`__FILE__`, include handling, and macro expansion without
+discarding information the parser later needs.
+
+Bad literals remain tokens with `TF_LITERAL_BAD` plus spelling and source
+location so diagnostics can point at the exact source text and recovery can
+continue.
+
+## 6. Allocators and lifetimes
+
+cfree uses explicit allocators rather than a single global heap. Allocators are
+fields of `Compiler` (`src/core/core.h`) and are passed down to subsystems.
+
+| Allocator       | Lifetime           | Owns                                                             |
+|-----------------|--------------------|------------------------------------------------------------------|
+| `Pool global`   | Process            | Interned strings and interned types.                             |
+| `Heap output`   | Output object/exe  | Section chunks, reloc tables (survive into linker).              |
+| `Arena tu`      | One TU compile     | Local symbols, parser scratch, SourceManager tables, ABI caches. |
+| `Arena scratch` | Reset per function | Value-stack scratch, fixup lists, lookahead buffers.             |
+
+Rules:
+
+- A struct never owns its own heap implicitly. If it allocates, an allocator
+ reference is part of its API.
+- Arena resets are an explicit operation on the arena. Subsystems holding
+ pointers into a scratch arena must either copy them out before reset, or
+ treat them as invalidated.
+- Long-lived data (anything that outlives a TU) goes through `Pool global` or
+ `Heap output`. Don't copy from arenas into one of those — interning is the
+ only path in.
+- Source identities live in `Compiler.sources`. They are stable for the
+ compile/link invocation and are read by diagnostics, dependency output, and
+ DWARF emission.
+
+`Heap output` is a normal heap (typically `heap_libc`). The JIT does not
+compile directly into executable memory: `link_jit_image` consumes a resolved
+`LinkImage`, mmaps a fresh region, copies laid-out segments in, applies
+relocations in-place, and `mprotect`s final permissions. The `Heap` vtable
+still exists so the JIT can swap allocators for the *destination* mapping and
+so tests can substitute fakes.
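+
+A minimal sketch of the explicit-reset arena discipline, assuming a
+fixed-capacity bump allocator over caller-provided storage. `MiniArena`,
+`arena_init`, `arena_alloc`, and `arena_reset` are hypothetical names; the
+allocators in `core.h` may differ in every detail.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/* Hypothetical fixed-capacity bump arena over caller-provided storage. */
+typedef struct {
+    unsigned char *base;
+    size_t cap, used;
+} MiniArena;
+
+static void arena_init(MiniArena *a, void *buf, size_t cap) {
+    a->base = buf;
+    a->cap = cap;
+    a->used = 0;
+}
+
+/* Returns size bytes at an offset aligned to align (a power of two), or NULL
+   when the arena is full. Alignment is relative to base, so buf itself must
+   be suitably aligned for the strictest request. */
+static void *arena_alloc(MiniArena *a, size_t size, size_t align) {
+    assert(align != 0 && (align & (align - 1)) == 0);
+    size_t start = (a->used + align - 1) & ~(align - 1);
+    if (start > a->cap || size > a->cap - start) return NULL;
+    a->used = start + size;
+    return a->base + start;
+}
+
+/* Explicit reset: every pointer handed out earlier is now invalid. */
+static void arena_reset(MiniArena *a) { a->used = 0; }
+```
+
+Because the arena is a parameter rather than hidden state, a subsystem that
+allocates has to name it in its API, which is the first rule above.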
+
+## 7. Error handling
+
+A single `Compiler` carries a `jmp_buf` and a `DiagSink`. Fatal errors call
+`compiler_panic`, which emits a diagnostic and `longjmp`s out of the entire
+parse/CG pipeline. Drivers establish the `setjmp` boundary at TU granularity.
+
+This means almost no function in `parse`, `cg`, or `arch` returns an error. The
+happy path is the only path. Subsystems clean up via arena reset, not by
+unwinding allocations one-by-one.
+
+What is *not* fatal: warnings and recoverable parse errors that have a sensible
+recovery point (skip-to-`;`, skip-to-`}`). The parser uses limited internal
+recovery for these and only escalates to `compiler_panic` when continued
+parsing would produce cascading garbage.
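+
+The `setjmp` boundary can be sketched in portable C as follows.
+`MiniCompiler`, `mini_panic`, `parse_and_codegen`, and `compile_tu` are
+hypothetical stand-ins, and the diagnostic sink is reduced to a stored message.
+
+```c
+#include <assert.h>
+#include <setjmp.h>
+#include <stddef.h>
+
+/* Hypothetical reduction of Compiler: a jump buffer plus the last diagnostic. */
+typedef struct {
+    jmp_buf     on_fatal;
+    const char *last_diag;
+} MiniCompiler;
+
+/* Records the diagnostic, then unwinds to the driver's setjmp. Never returns. */
+static void mini_panic(MiniCompiler *c, const char *msg) {
+    assert(msg != NULL);
+    c->last_diag = msg;
+    longjmp(c->on_fatal, 1);
+}
+
+/* Stand-in for the parse/CG pipeline: no error return, only the happy path.
+   For demonstration, any input starting with '!' hits a fatal error. */
+static void parse_and_codegen(MiniCompiler *c, const char *src) {
+    if (src[0] == '!') mini_panic(c, "fatal error in translation unit");
+}
+
+/* Driver: one setjmp boundary per TU. Returns 0 on success, 1 on fatal error. */
+static int compile_tu(MiniCompiler *c, const char *src) {
+    c->last_diag = NULL;
+    if (setjmp(c->on_fatal) != 0) {
+        /* Arrived via longjmp; a real driver would reset the TU arena here. */
+        return 1;
+    }
+    parse_and_codegen(c, src);
+    return 0;
+}
+```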
+
+## 8. Streaming
+
+The following stages stream cleanly under direct lowering (`-O0` and targets
+that do not wrap with `opt_cgtarget`):
+
+- Lexer → preprocessor token stream.
+- Preprocessor → parser token stream.
+- Parser → CG → CGTarget calls within a function.
+- CGTarget → MCEmitter → ObjBuilder section bytes, appended via chunked buffers.
+
+Buffers per function (bounded, not per TU):
+
+- CG's value stack and label fixup tables.
+- Per-target register/frame state.
+- Optimizer's IR for the function being optimized, when only intra-procedural
+ passes are enabled.
+
+Buffers per TU:
+
+- Symbol tables — relocations cannot be resolved until all definitions are
+ seen. Final patching is deferred to ObjBuilder finalize / linker.
+- Debug info — DWARF tables reference final section layout.
+- `-O2` optimizer IR — cross-function inlining keeps all candidate function IR
+ and call graph metadata until `cgtarget_finalize`.
+
+So the streaming guarantee is tiered:
+
+- `-O0` direct target: source and codegen are function-at-a-time.
+- `-O1` target-local optimization: function-at-a-time unless a target opts
+ into specific buffering.
+- `-O2`: source is still read once, but optimized function IR may be retained
+ per TU for IPO. This is intentional and bounded by the TU, not the whole
+ program.
+
+## 9. Optimizer
+
+`opt` (`src/opt/opt.h`, `src/opt/ir.h`) implements `CGTarget`. The pass set and
+ordering are modelled on MIR (`mir-gen.c`) — that pipeline is proven, well
+understood, and a good fit for the "70% of -O2" target. The one cfree
+addition is cross-function inlining, which MIR does not have.
+
+IR shape: block-based SSA. Functions are lists of basic blocks; blocks have
+`Phi`s at the top; instructions reference values by SSA id. `Func` also owns
+first-class frame-slot and parameter tables so `-O0` frame residency,
+parameter ingress, mem2reg promotion, and debug locations all refer to the
+same objects. The op set is small (integer constants, exact byte constants,
+mem ops, aggregate ops, bitfield ops, explicit integer and floating-point
+arith, compares, conversions, GEP, calls, terminators, an opaque `ASM_BLOCK`,
+plus `IR_VA_*` and `IR_SETJMP`/`IR_LONGJMP`). `Inst` stays compact; ordinary
+instructions define one `Val`, while multi-result instructions carry
+`defs[0..ndefs)`. Complex per-op facts live in arena-owned typed aux structs
+(`IRCallAux`, `IRAggregateAux`, `IRBitFieldAux`, `IRGepAux`, `IRAsmAux`,
+`IRPhiAux`, `IRCasAux`). This keeps calls, aggregate copies, asm, CAS
+multi-results, and ABI metadata inspectable by passes without turning every
+instruction into a large union.
+
+The IR is flat-CFG: structured-scope ops on `CGTarget` (§5.1) are flattened by
+`opt_cgtarget`'s recorder into ordinary labels, branches, and basic blocks. WASM
+lowering at -O2 therefore needs to reconstruct structure (relooper) before
+emitting. At -O0/-O1 there is no `opt_cgtarget` wrapper and CG drives the WASM
+backend directly, producing structured output by construction.
+
+`IR_SETJMP` is a control barrier: opt does not inline across it, does not
+hoist through it, and does not GVN-merge values defined on either side.
+`IR_LONGJMP` has no successors (control does not return). The library setjmp
+symbol used on real arches is recognized by name and gets the same treatment
+when it appears as the callee of an `IR_CALL`.
+
+**No UB-exploiting passes.** Rules in opt may not assume that a UB-triggering
+operation (signed overflow, shift-by-≥-width, division by zero, null deref)
+is unreachable. WASM gives each of these a defined result at the instruction
+level: integer arithmetic wraps, shift counts are taken modulo the operand
+width, integer division by zero traps, and an access at address 0 is an
+ordinary linear-memory access. The program computes a value or traps; it never
+time-travels. Real-target behavior is also more predictable this way. The
+"70% of -O2" goal is achievable without these rules; reserved bits in
+`Inst.flags` can host `nsw`/`nuw`-style annotations later if a specific
+non-UB-exploiting pass needs them.
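+
+One consequence for constant folding: a fold of signed addition cannot treat
+overflow as impossible, and the folder itself must not invoke host UB while
+computing the result. A hedged sketch, with `fold_add_i32` as a hypothetical
+helper and two's-complement wraparound assumed as the target's add behavior:
+
+```c
+#include <assert.h>
+#include <stdint.h>
+
+/* Folds a + b for a 32-bit signed add, wrapping modulo 2^32.
+   Computed in uint32_t so the compiler's own arithmetic never overflows. */
+static int32_t fold_add_i32(int32_t a, int32_t b) {
+    uint32_t sum = (uint32_t)a + (uint32_t)b;
+    /* Convert back without an out-of-range signed conversion. */
+    if (sum <= (uint32_t)INT32_MAX) return (int32_t)sum;
+    return (int32_t)(sum - (uint32_t)INT32_MAX - 1u) + INT32_MIN;
+}
+
+/* Usage check: INT32_MAX + 1 folds to INT32_MIN rather than being assumed
+   unreachable. */
+static void fold_add_i32_selfcheck(void) {
+    assert(fold_add_i32(INT32_MAX, 1) == INT32_MIN);
+    assert(fold_add_i32(-1, 1) == 0);
+}
+```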
+
+### 9.1 Lifecycle
+
+- `func_begin` allocates a fresh `Func` IR container in the per-TU IR arena.
+- `alloc_reg(class, type)` returns a fresh virtual `Reg` whose mapping to a
+ `Val` is recorded; `free_reg` is a hint and ignored.
+- `frame_slot` and `param` populate `Func.frame_slots` and `Func.params`.
+ Parameter ABI incoming parts are visible to later promotion, debug, and
+ replay.
+- Every other emit call appends one SSA `Inst` to the current basic block.
+ Each `Inst` carries the `SrcLoc` set by the most recent `CGTarget.set_loc`.
+ `call(CGCallDesc)`, `atomic_cas`, and ABI split returns use the multi-result
+ `defs` convention.
+- `func_end` runs the **intra-procedural** pipeline (§9.2) and stores the
+ optimized `Func`. **No lowering yet.**
+- `cgtarget_finalize` runs the **inter-procedural** pipeline (§9.3) over all
+ buffered functions, then for each function runs the **lowering** pipeline
+ (§9.4) which drives the wrapped target CGTarget via `CGTarget.set_loc` +
+ emit-side calls.
+
+The driver therefore looks like:
+
+```c
+parse_c(c, pp, decls, cg);
+cgtarget_finalize(target); /* no-op for plain CGTarget; runs IPO+lower for opt */
+emit_elf(c, ob, w);
+```
+
+At `-O0` the wrapper is not used and the target CGTarget is driven directly
+during parse, with no function IR retention. `-O1` may use only local
+lowering/target peepholes and remains function-at-a-time. `-O2` uses
+`opt_cgtarget` and may retain IR for all functions in the TU.
+
+Memory cost at `-O2`: the IR for every function in a TU is held in the per-TU
+IR arena until `cgtarget_finalize`. Per-pass scratch lives in `Arena scratch`,
+not in the IR arena.
+
+### 9.2 Intra-procedural pipeline (per `Func`, on `func_end` at `-O2`)
+
+```
+build_cfg
+block_cloning (hot path duplication; skipped if it would block addr_xform)
+build_ssa
+addr_xform (fold GEP-equivalent address insns into uses)
+gvn (incl. constprop, redundant-load elimination)
+copy_prop (incl. redundant-extension elimination)
+dse (dead store elimination)
+ssa_dce
+build_loop_tree + licm
+pressure_relief
+make_conventional_ssa + ssa_combine + undo_ssa
+jump_opt
+```
+
+### 9.3 Inter-procedural pipeline (over all `Func`s, on `cgtarget_finalize`)
+
+Inlining doesn't pay off without a follow-up: the new opportunities (callee
+arguments that are now constants, branches in the callee that are now dead,
+redundant ops shared across the caller/callee boundary, callee bodies that
+landed inside a caller loop) only get realised by re-running intra-procedural
+passes on the modified caller.
+
+```
+opt_inline (call-graph bottom-up; SCCs skipped for v1)
+for each dirty caller:
+ opt_cleanup (subset re-run: gvn, copy_prop, ssa_dce, jump_opt,
+ licm if loops, addr_xform if uses remain)
+```
+
+Iteration (`inline → cleanup → inline → ...`) is bounded by `-finline-iters=N`
+(default 1, hard cap enforced by `opt_cgtarget`). Tuning is benchmark-driven.
+
+### 9.4 Lowering pipeline (per `Func`, after IPO, drives target CGTarget)
+
+```
+machinize (target ABI lowering, 2-op forms, call lowering)
+build_loop_tree (-O1+, used by RA)
+coalesce (-O2, move-related)
+live_info
+regalloc (linear scan; live-range splitting at -O2)
+combine (-O1+, code selection: merge dependent insns)
+dce (-O1+, post-RA)
+opt_emit (prolog/epilog; insn split; drive target CGTarget)
+```
+
+### 9.5 Inline asm
+
+`ASM_BLOCK` is opaque: passes treat it as reading its input operands, writing
+its output operands and clobbers, and not commuting with surrounding memory
+ops. Inline asm is therefore safe across optimization without per-asm
+modelling.
+
+## 10. Inline asm
+
+Two callers exercise the asm machinery:
+
+- Standalone `.s`: tokens → `parse_asm` → `MCEmitter.emit_bytes`/
+ `emit_reloc_at`/`emit_label_ref` → `ObjBuilder`. Bypasses cg entirely;
+ operands are literal registers, immediates, labels, and symbols from the asm
+ syntax itself.
+ Standalone `.s` does not go through `opt_cgtarget`.
+- Inline `asm("...": outs : ins : clobbers)` inside C: invoked via
+ `cg_inline_asm`. Flow:
+
+ 1. Parser parses constraint list and template; evaluates each input/output
+ expression so inputs are `SValue`s on the CG stack and each output binds
+ an lvalue.
+ 2. cg pops inputs (in declaration order), packs them into an `Operand[]`,
+ and calls `CGTarget.asm_block(tmpl, outs, ins, clobbers)`.
+ 3. The arch implementation does **constraint binding** (`r`, `m`, `i`,
+ `=&r`, matching constraints, ...), then walks the template and assembles
+ each instruction. Under `opt_cgtarget` this is recorded as one `IR_ASM_BLOCK`
+ and replayed on the target arch at lowering time, after RA has assigned
+ the bound virtuals to physicals.
+ 4. arch fills `out_ops[]` with the location holding each result; cg pushes
+ those back as new SValues.
+
+The asm parser is shared between the standalone path (writing directly to
+`MCEmitter`) and the inline path (used as a template walker inside
+`CGTarget.asm_block`). Constraint binding is per-arch.
+
+`"memory"` clobber is conservative: cg flushes all live stack-resident values
+to memory before the block and reloads after. This is suboptimal but
+correct.
+
+Asm syntax (decided, single supported flavour per arch):
+
+- x86 (32 + 64): AT&T. Same parser serves both inline asm and standalone
+ `.s`. Matches GCC inline-asm convention.
+- ARM (32 + 64): GNU `as` ("unified") syntax.
+- RISC-V (32 + 64): GNU `as` syntax.
+- WASM: WAT (text format).
+
+Open: full GCC-syntax constraint coverage. v1 covers `r`, `m`, `i`, `a`,
+`=r`, `+r`, `=m`, early-clobber `=&r`, and matching constraints (`0`, ...).
+The remainder, including multi-alternative constraints, is deferred.
+
+## 11. DWARF debug info
+
+Debug info lives in `src/debug/` and is owned by a single `Debug` object that
+collects events during compilation and emits `.debug_*` sections at the end
+of the TU.
+
+**Inputs (called during compilation):**
+
+| Producer | Calls |
+|---|---|
+| Driver | `debug_file(source_file_id)` to populate the DWARF file table from `SourceManager`. |
+| CG | `debug_func_begin/end`, `debug_scope_begin/end`, `debug_param`, `debug_local`. cg holds an optional `Debug*` (NULL when `-g` is off). |
+| MCEmitter (or opt's lowering pass) | `debug_line` per emitted instruction, sourced from the `SrcLoc` set by `CGTarget.set_loc`/`MCEmitter.set_loc`; `debug_func_pc_range` after each function is laid out. |
+| opt at `-O2` | `debug_loclist_*` when a variable's location changes across the function. The `SrcLoc` propagates through opt because every recorded `Inst` carries it. |
+
+**Outputs:** `.debug_info`, `.debug_abbrev`, `.debug_line`, `.debug_str`,
+`.debug_aranges`, `.debug_rnglists`, `.debug_loclists` — written into the
+same `ObjBuilder` when `debug_emit` is called. `debug_emit` runs after all
+code sections are finalized but before file emitters consume the builder.
+
+**Variable locations:** at `-O0`, all locals live at stable frame offsets and
+`DebugVarLoc` is `DVL_FRAME`; this gives full debuggability for free. With
+`opt`, the lowering pass produces `DVL_LOCLIST` entries describing where a
+variable lives across PC ranges. v1 may downgrade opt'd debug info to
+function-level only (start/end PC, no locals); refining to per-variable
+location lists is a follow-up but the interface already accommodates it.
+
+**Type DIEs:** generated on demand from the `Type*` reaching `debug_local` /
+`debug_param`, with sizes, alignments, and member offsets supplied by
+`TargetABI`. Interned by `Type*` identity (which is already pointer-equal for
+equal types thanks to `Pool global`).
+
+## 12. Cross-cutting decisions
+
+- **Interning is global**, in `Pool global`. `Sym` (32-bit string id) is the
+ currency for spellings and lookup keys, not symbol identity. Symbol table
+ identity is object-scoped (`ObjSymId`, §5.3) until the linker resolves
+ definitions. C tag identity is scoped `TagId`, not `Sym`, so equal tag
+ spellings in different scopes remain distinct. Equal types are pointer-equal
+ after `pool_type` (likewise for strings: `pool_intern` returns the canonical
+ id). On WASM, this `Type*` identity is also the source of truth for
+ `call_indirect` type-index assignment.
+- **Source identity is centralized.** `SrcLoc.file_id` belongs to
+ `SourceManager`, not to the lexer, preprocessor, diagnostics, or debug
+ emitter. Macro expansion and include edges are recorded once and reused by
+ diagnostics, DWARF, and dependency generation.
+- **Locals and parameters always start frame-resident.** `cg_local` and
+ `cg_param` allocate stable `FrameSlot`s through `CGTarget.frame_slot` and
+ `CGTarget.param`. A mem2reg-style pass during opt's lowering pipeline
+ promotes non-address-taken slots to virtual registers (and to WASM-locals on
+ that target). At -O0 every slot stays on the frame, which is the same shape
+ `Debug` wants for `DVL_FRAME` (§11) — full debuggability for free, no parser
+ pre-scan needed.
+- **Function-pointer ABI is a linker concern.** A function symbol's address
+ taken via `&f` lowers to a normal `ObjSymId`-relative `Operand`.
+ ELF/COFF/Mach-O resolve this directly. WASM file emitters and the JIT linker
+ walk function-address relocations (`R_WASM_FUNCIDX` / `R_WASM_TABLEIDX`) while
+ building the shared `LinkImage` and assign indirect-function-table slots; the
+ slot index is the pointer's bit pattern. CG and `CGTarget` are unaware.
+- **Sections are chunked.** A `Section.bytes` is a linked list of fixed-size
+ chunks. Append is O(1). Backward patching uses a 32-bit flat offset
+ computed at finalize time, so forward fixups don't depend on chunk
+ boundaries.
+- **Error model is `setjmp`/`longjmp`.** See §7.
+- **Single-pass parser+CG.** No separate AST. The optimizer reconstructs an
+ IR by recording CGTarget calls; this is technically two-pass *within a function*
+ but the source is read once.
+- **Self-hosting constraint.** Anything in `src/` must be writable in C11
+ freestanding (with the runtime in `include/`/`lib/`). No GNU extensions, no
+ libc beyond what cfree itself ships. Bootstrap is hex0-seed → small subset
+ → full cfree; details TBD.
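+
+The chunked-section decision above can be sketched as follows. This is a
+minimal illustration, not cfree's implementation: `Chunk`, `SectionBytes`,
+`sb_append`, `sb_flatten`, `patch_u32le`, and the 4096-byte `CHUNK_SIZE` are
+hypothetical names and values, and flattening into one buffer is only one way
+to obtain a flat offset at finalize time.
+
+```c
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+enum { CHUNK_SIZE = 4096 };            /* hypothetical fixed chunk size */
+
+typedef struct Chunk {
+    struct Chunk *next;
+    uint32_t used;                     /* bytes filled in this chunk */
+    uint8_t data[CHUNK_SIZE];
+} Chunk;
+
+typedef struct {
+    Chunk *head, *tail;
+    uint32_t size;                     /* flat size of the section so far */
+} SectionBytes;
+
+/* Append n bytes, spilling into new chunks as needed.
+ * Returns the flat offset of the first byte written. */
+static uint32_t sb_append(SectionBytes *s, const void *src, uint32_t n) {
+    const uint8_t *p = src;
+    uint32_t start = s->size;
+    while (n) {
+        if (!s->tail || s->tail->used == CHUNK_SIZE) {
+            Chunk *c = calloc(1, sizeof *c);
+            if (!c) abort();
+            if (s->tail) s->tail->next = c; else s->head = c;
+            s->tail = c;
+        }
+        uint32_t room = CHUNK_SIZE - s->tail->used;
+        uint32_t k = n < room ? n : room;
+        memcpy(s->tail->data + s->tail->used, p, k);
+        s->tail->used += k; s->size += k; p += k; n -= k;
+    }
+    return start;
+}
+
+/* Finalize: copy the chunks into one flat buffer (caller frees it). */
+static uint8_t *sb_flatten(const SectionBytes *s) {
+    uint8_t *flat = malloc(s->size ? s->size : 1);
+    uint32_t off = 0;
+    if (!flat) abort();
+    for (const Chunk *c = s->head; c; c = c->next) {
+        memcpy(flat + off, c->data, c->used);
+        off += c->used;
+    }
+    return flat;
+}
+
+/* Patch a little-endian 32-bit value at a flat offset. */
+static void patch_u32le(uint8_t *flat, uint32_t off, uint32_t v) {
+    for (int i = 0; i < 4; i++) flat[off + i] = (uint8_t)(v >> (8 * i));
+}
+```
+
+Because the patch is addressed by flat offset, it does not matter where chunk
+boundaries fall: a 4-byte placeholder appended at offset 4094 straddles two
+chunks, yet `patch_u32le(flat, 4094, v)` rewrites it in one step.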
+
+## 13. Build composition
+
+A typical `cc` invocation composes the pipeline like this:
+
+```c
+Compiler c_store;
+Compiler* c = &c_store;
+compiler_init(c, target); /* creates SourceManager, ABI, allocators */
+Pp* pp = pp_new(c);
+ObjBuilder* ob = obj_new(c);
+DeclTable* decls = decl_new(c, ob);
+MCEmitter* mc = mc_new(c, ob);
+CGTarget* a = cgtarget_new(c, ob, mc);
+if (opt_level >= 1) a = opt_cgtarget_new(c, a, opt_level);
+Debug* d = dbg ? debug_new(c, ob) : NULL;
+CG* g = cg_new(c, a, d);
+
+pp_push_input(pp, lex_open(c, input_path));
+parse_c(c, pp, decls, g);
+
+cgtarget_finalize(a); /* IPO + lowering at -O2; no-op otherwise */
+if (d) debug_emit(d);
+obj_finalize(ob);
+Writer* w = writer_file(output_path);
+emit_elf(c, ob, w);
+writer_close(w);
+```
+
+Order is load-bearing: `cgtarget_finalize` flushes lowered code, `debug_emit`
+appends `.debug_*` sections, `obj_finalize` freezes the read-side view, and
+only then may file emitters consume the builder.
+
+JIT swaps the final emit for:
+
+```c
+Linker* l = link_new(c);
+link_add_obj(l, ob);
+LinkImage* img = link_resolve(l);
+JitImage* jit = link_jit_image(img);
+entry = jit_image_lookup(jit, entry_sym);
+```
+
+## 14. Open questions
+
+- WASM is structurally different from the register-shaped CGTarget (stack VM,
+ no ELF-style relocations). The `Operand`-driven CGTarget will lower verbosely
+ (every `binop` becomes `local.get; local.get; iN.add; local.set`); a
+ follow-up peephole pass for stack-shape lowering will reclaim most of the
+ bloat. Worth prototyping early to validate the abstractions.
+- Bootstrap subset definition: which features must the seed compiler accept?
+- Debug-info quality at `-O2`: minimum acceptable v1 is function-level
+ (low_pc/high_pc + parameter list at entry); per-variable location lists
+ for opt'd locals are a follow-up but the `Debug` interface admits them.
+- WASM relooper at -O2: choosing between Stackifier-style (preserve flat CFG
+ with relooped wrappers) and Relooper-style (reconstruct nested scopes).
+ Affects code size and opt's freedom to introduce irreducible CFGs.
+- Full VLA support beyond `__builtin_alloca`: deferred for v1
+ (`__STDC_NO_VLA__=1`). The `IR_ALLOCA`/`CGTarget.alloca_` interface accommodates
+ it when the parser is extended.
+
+## 15. Safety model (WASM target)
+
+cfree's WASM backend inherits the WebAssembly sandbox; the goal here is to be
+explicit about what that does and does not buy.
+
+**Checked at runtime:**
+
+- **Linear-memory bounds.** Every load and store traps on out-of-bounds.
+- **Control-flow integrity for direct branches.** Structured `block`/`loop`/
+ `if` mean a `br N` can only target a lexically enclosing scope. The
+ structured `CGTarget` ops (§5.1) are the source of this — flat goto and
+ `switch` fallthrough route through the relooper at -O2 and through the
+ WASM CGTarget's structural fallback at -O0/-O1.
+- **CFI for indirect calls.** `call_indirect` traps on signature mismatch.
+ The WASM type index is keyed off interned `Type*` identity (§12), so equal
+ C function types produce a single WASM type id and a real (not vacuous)
+ type check.
+- **No native code injection.** WASM has no `mprotect`/JIT-into-data path
+ exposed to the program; cfree's own JIT linker uses host APIs outside the
+ sandbox.
+- **`setjmp`/`longjmp`** lower to WASM exception handling; a `longjmp` cannot
+ smash the host stack or skip past a structured-control-flow boundary it
+ did not originate inside.
+
+**NOT checked:**
+
+- **Pointer provenance.** Pointers are `i32` indices into linear memory.
+ `(int*)0xdeadbeef` is a valid bit pattern; the only guard is the bounds
+ check on the eventual access. Use-after-free, type confusion, and
+ intra-heap buffer overflow that stays inside linear memory all remain
+ exploitable — exactly as on a real target.
+- **Integer/UB traps as a safety net.** Signed overflow, shift-by-≥-width,
+ and division-by-zero trap *deterministically* on WASM, but `opt` is not
+ permitted to assume they're unreachable (§9). They terminate the program;
+ they are not a substitute for input validation.
+- **Stack exhaustion** beyond the configured WASM stack limit: traps, but
+ recovery requires host-side restart.
+
+In short: WASM gives cfree-compiled programs **memory-isolation** safety
+(can't escape linear memory) and **control-flow-integrity** safety (can't
+forge a return address or call a wrong-typed function), but not
+**type-system** safety on pointers within linear memory. The compiler does
+not pretend otherwise.
diff --git a/doc/builtins.md b/doc/builtins.md
@@ -112,6 +112,114 @@ Operations (signatures match the GCC `__atomic` builtin family):
- `__atomic_test_and_set(ptr, order)`, `__atomic_clear(ptr, order)` — for
`atomic_flag`
+### Syscalls (cfree extension)
+
+Declared in `<cfree/syscall.h>`. Kernel-trap primitive so libc syscall
+stubs can be pure C. Numbers (`SYS_*`) are libc's responsibility —
+cfree only provides the instruction. All args and result are `long`;
+pointers/sizes/fds get cast at the call site.
+
+- `__cfree_syscall0(nr)` … `__cfree_syscall6(nr, a0, a1, a2, a3, a4, a5)`
+
+Semantics:
+- Result is normalized to Linux-style `-errno` on failure, non-negative
+ on success, on every target. On BSD/Darwin the lowering inspects the
+ carry/C flag and rewrites the result.
+- Modeled as an opaque external call with full memory clobber plus the
+ target's syscall-clobber list (so the optimizer cannot move work
+ across the trap).
+- Not available on WASM — compile-time error directs callers to WASI
+ imports.
+
+Per-target lowering:
+
+| Target | Instr | Nr reg | Args | Result | Error |
+| --------------- | ----------------- | ------ | -------------------------- | ------ | -------- |
+| Linux x86_64 | `syscall` | rax | rdi, rsi, rdx, r10, r8, r9 | rax | rax < 0 |
+| Linux i386 | `int 0x80` | eax | ebx, ecx, edx, esi, edi, ebp | eax | eax < 0 |
+| Linux aarch64 | `svc #0` | x8 | x0..x5 | x0 | x0 < 0 |
+| Linux arm | `svc #0` | r7 | r0..r5 | r0 | r0 < 0 |
+| Linux riscv | `ecall` | a7 | a0..a5 | a0 | a0 < 0 |
+| Darwin x86_64 | `syscall` | rax (class bits already in nr) | rdi, rsi, rdx, r10, r8, r9 | rax | carry → −errno |
+| Darwin aarch64 | `svc #0x80` | x16 | x0..x5 | x0 | C flag → −errno |
+
+i386 6-arg case (`ebp` is the frame pointer): cfree saves/restores
+`ebp` around the trap.
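+
+A minimal sketch of the result convention, assuming a hypothetical
+`normalize_flag_result` that models what the BSD/Darwin lowering does with
+the carry/C flag, and a hypothetical libc-side `syscall_ret` that turns the
+normalized value into the usual `errno` plus `-1` shape. Neither name is part
+of cfree; the `[-4095, -1]` error window is the common Linux libc idiom and
+is an assumption here, not something `<cfree/syscall.h>` specifies.
+
+```c
+#include <errno.h>
+
+/* Models the BSD/Darwin rewrite: on failure the kernel sets the carry flag
+ * and returns a positive errno; the caller should see -errno instead. */
+static long normalize_flag_result(long raw, int carry_set) {
+    return carry_set ? -raw : raw;
+}
+
+/* Libc-side tail of a syscall stub: -errno becomes errno + a -1 return. */
+static long syscall_ret(long r) {
+    if ((unsigned long)r >= (unsigned long)-4095L) {
+        errno = (int)-r;
+        return -1;
+    }
+    return r;
+}
+```
+
+A stub such as `write` would then reduce to something like
+`return syscall_ret(__cfree_syscall3(nr, fd, (long)buf, (long)len));`, with
+`nr` supplied by libc.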
+
+### Bare-metal primitives (cfree extension)
+
+Declared in `<cfree/baremetal.h>`. For freestanding / embedded use, so
+libc and HAL code can stay pure C. All have opaque-call +
+full-memory-clobber semantics so the optimizer cannot reorder loads,
+stores, or other side effects across them.
+
+Interrupt control (the standard save/disable/restore critical-section
+idiom):
+- `unsigned long __cfree_irq_save(void)` — disable IRQs, return previous mask
+- `void __cfree_irq_restore(unsigned long prev)`
+- `void __cfree_irq_disable(void)`, `void __cfree_irq_enable(void)`
+
+Lowerings: x86 `cli`/`sti` + `pushf`/`popf`; Cortex-A/R `cpsid i`/`cpsie i`
++ CPSR; Cortex-M `cpsid i`/`cpsie i` + PRIMASK (selected by
+`__ARM_ARCH_*` profile macros); aarch64 `msr daifset/daifclr, #2` +
+`mrs daif`; RISC-V `csrr{ci,si} mstatus, 8`.
+
+CPU memory barriers — distinct from `__atomic_thread_fence`. C11 fences
+provide ordering for the C abstract machine; these emit the specific
+CPU barriers required for DMA-coherent device memory, MMU/TLB
+reconfiguration, and self-modifying / freshly-loaded code.
+
+```c
+typedef enum {
+ __CFREE_BARRIER_FULL, // sy
+ __CFREE_BARRIER_INNER, // ish
+ __CFREE_BARRIER_INNER_STORE, // ishst
+ __CFREE_BARRIER_OUTER, // osh
+ __CFREE_BARRIER_OUTER_STORE, // oshst
+ __CFREE_BARRIER_NON_SHARE, // nsh
+} __cfree_barrier_scope;
+
+void __cfree_dmb(__cfree_barrier_scope); // ordering only
+void __cfree_dsb(__cfree_barrier_scope); // ordering + completion
+void __cfree_isb(void); // pipeline flush after sysreg / MMU change
+```
+
+Lowerings: arm/aarch64 `dmb <scope>`/`dsb <scope>`/`isb`; x86
+`mfence`/`lfence`/`sfence` (scope ignored, since TSO collapses the cases),
+with `__cfree_isb` a no-op (x86 self-snoops); RISC-V `fence rw,rw` and
+`fence.i`. WASM: compile-time error.
+
+Cache maintenance (range-based; cfree reads `CTR`/`CTR_EL0` once at
+startup for the line size and emits a loop):
+- `void __cfree_dcache_clean(const void *, unsigned long)` — write-back
+- `void __cfree_dcache_invalidate(void *, unsigned long)`
+- `void __cfree_dcache_clean_invalidate(void *, unsigned long)`
+- `void __cfree_icache_invalidate(const void *, unsigned long)`
+
+Lowerings: aarch64 `dc {cvac,ivac,civac}` + `ic ivau` loops; arm v7+
+equivalents via CP15. x86: no-ops (cache-coherent, ICache included).
+RISC-V: Zicbom / Zicboz instructions when those extensions are present,
+otherwise a compile-time error.
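+
+The range loop can be sketched as below, assuming the line size is a power
+of two and that the range is rounded outward to line boundaries (as the
+`<cfree/baremetal.h>` comment describes). `line_range`, `widen_to_lines`, and
+`for_each_line` are hypothetical names; the callback stands in for the
+per-line maintenance instruction.
+
+```c
+#include <stdint.h>
+
+typedef struct { uintptr_t first, end; } line_range;
+
+/* Round [p, p + n) outward to multiples of `line` (a power of two). */
+static line_range widen_to_lines(uintptr_t p, unsigned long n, uintptr_t line) {
+    line_range r;
+    r.first = p & ~(line - 1);
+    r.end = n ? (p + n + line - 1) & ~(line - 1) : r.first;
+    return r;
+}
+
+/* Visit each covered line once; returns how many lines were visited. */
+static unsigned long for_each_line(line_range r, uintptr_t line,
+                                   void (*op)(uintptr_t addr)) {
+    unsigned long count = 0;
+    for (uintptr_t a = r.first; a < r.end; a += line) {
+        if (op) op(a);
+        count++;
+    }
+    return count;
+}
+```
+
+For the range `[0x103C, 0x1044)` with 64-byte lines this visits two lines,
+`0x1000` and `0x1040`, even though only 8 bytes were requested.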
+
+Hints:
+- `void __cfree_nop(void)`
+- `void __cfree_yield(void)` — spin-loop hint; arm `yield`, x86 `pause`,
+ RISC-V `pause`
+- `void __cfree_wfi(void)` — sleep until next interrupt; arm/aarch64
+ `wfi`, x86 `hlt`, RISC-V `wfi`. All three are privileged, which is
+ fine for bare-metal. Compile-time error on WASM.
+- `void __cfree_wfe(void)`, `void __cfree_sev(void)` — arm/aarch64
+ only; compile-time error elsewhere. The inter-core event-flag
+ abstraction (SEV sets, WFE waits, exclusive-monitor release also
+ sets) does not generalize: x86 MONITOR/MWAIT is address-watch and
+ privileged-extension; RISC-V has no base-ISA equivalent. Use
+ `__cfree_yield` + `__cfree_wfi` for portable spin/idle loops.
+
+System-register access (`mrs`/`msr`, `csrr`/`csrw`, `rdmsr`/`wrmsr`,
+MMU/cache config, etc.) is **not** provided as a builtin. Callers use
+extended inline asm directly. Rationale: register names and privilege
+rules vary per ISA generation; the call sites are arch-specific
+already; abstracting adds churn without removing platform code.
+
---
## `libcfree_rt.a` — runtime support library
@@ -150,7 +258,7 @@ Always:
- Compare: `__eq`, `__ne`, `__lt`, `__le`, `__gt`, `__ge`, `__unord` × `sf2`/`df2`/`tf2`
### Nonlocal jumps + stackful coroutines (per-arch, always shipped)
-`<setjmp.h>` and `<stdcoro.h>` share one per-target context payload
+`<setjmp.h>` and `<cfree/coro.h>` share one per-target context payload
(256 bytes, 16-byte aligned): callee-saved GPRs + callee-saved FPRs
+ sp + return address. `jmp_buf` and `coro_ctx` are both opaque
typedefs over that payload; the runtime reinterprets them as the
@@ -159,7 +267,7 @@ per-arch struct.
- `setjmp`, `longjmp` — `<setjmp.h>` (C11 7.13). cfree extension:
this header is *not* in the C11 freestanding subset.
- `coro_init`, `coro_resume`, `coro_yield`, `coro_self` — public
- asymmetric API in `<stdcoro.h>`. Resume drives a coroutine
+ asymmetric API in `<cfree/coro.h>`. Resume drives a coroutine
forward; yield suspends back to the most recent resumer; resumes
nest like function calls. Status (`CORO_INIT` / `RUNNING` /
`SUSPENDED` / `DEAD`) is tracked on the `coro_t` and propagates
diff --git a/include/cfree/baremetal.h b/include/cfree/baremetal.h
@@ -0,0 +1,125 @@
+/* cfree/baremetal.h -- cfree extension -- bare-metal / freestanding
+ * embedded primitives
+ *
+ * cfree/baremetal.h is non-standard: C11 has no notion of interrupt
+ * masking, CPU memory barriers (distinct from C11 fences), or cache
+ * maintenance. cfree exposes them so HAL and libc-substrate code
+ * targeting bare metal can stay pure C without resorting to inline
+ * asm for these recurring idioms.
+ *
+ * Optimizer view: every primitive in this header is opaque to the
+ * optimizer with full memory clobber. Loads, stores, and other side
+ * effects on either side of a call are not reordered across it. This
+ * is what makes the IRQ save/restore idiom and the DMA-coherent
+ * barrier idioms correct without per-call inline-asm clobbers.
+ *
+ * Per-target lowering: see doc/builtins.md. Targets where a primitive
+ * has no meaningful lowering (e.g. WFE on x86, DMB on WASM) raise a
+ * compile-time error rather than silently becoming a no-op.
+ *
+ * What is *not* in this header. System-register access (mrs/msr,
+ * csrr/csrw, rdmsr/wrmsr, MMU/cache config writes, ...) stays in
+ * extended inline asm at the call site. Register names and privilege
+ * rules vary too much per ISA generation to wrap usefully, and call
+ * sites are arch-specific anyway.
+ */
+#ifndef CFREE_BAREMETAL_H
+#define CFREE_BAREMETAL_H
+
+/* ====================================================================
+ * Interrupt control.
+ *
+ * The standard save/disable/restore critical-section idiom:
+ *
+ * unsigned long prev = __cfree_irq_save();
+ * // ... critical section ...
+ * __cfree_irq_restore(prev);
+ *
+ * Save/restore nests safely. The standalone disable/enable forms are
+ * for code that owns the interrupt-enable bit unconditionally (boot,
+ * panic paths).
+ *
+ * Lowerings: x86 cli/sti + pushf/popf; Cortex-A/R cpsid i/cpsie i +
+ * CPSR; Cortex-M cpsid i/cpsie i + PRIMASK (selected by __ARM_ARCH_*
+ * profile macros); aarch64 msr daifset/daifclr + mrs daif; RISC-V
+ * csrr{ci,si} mstatus, 8.
+ * ==================================================================== */
+unsigned long __cfree_irq_save(void);
+void __cfree_irq_restore(unsigned long prev);
+void __cfree_irq_disable(void);
+void __cfree_irq_enable(void);
+
+/* ====================================================================
+ * CPU memory barriers.
+ *
+ * Distinct from <stdatomic.h>'s __atomic_thread_fence: C11 fences
+ * provide ordering for the C abstract machine and assume a
+ * cache-coherent multiprocessor. These primitives emit the specific
+ * CPU barriers required for DMA-coherent device memory, MMU/TLB
+ * reconfiguration, and self-modifying / freshly-loaded code -- where
+ * the C abstract machine is not the right model.
+ *
+ * Scope selects the shareability domain on arm/aarch64; targets with
+ * no such concept (x86 TSO collapses every case) ignore it.
+ *
+ * Lowerings: arm/aarch64 dmb/dsb <scope> and isb; x86 mfence/lfence/sfence
+ * (scope ignored), with __cfree_isb a no-op (x86 self-snoops); RISC-V
+ * fence rw,rw and fence.i. WASM: compile-time error.
+ * ==================================================================== */
+typedef enum {
+ __CFREE_BARRIER_FULL, /* sy */
+ __CFREE_BARRIER_INNER, /* ish */
+ __CFREE_BARRIER_INNER_STORE, /* ishst */
+ __CFREE_BARRIER_OUTER, /* osh */
+ __CFREE_BARRIER_OUTER_STORE, /* oshst */
+ __CFREE_BARRIER_NON_SHARE, /* nsh */
+} __cfree_barrier_scope;
+
+void __cfree_dmb(__cfree_barrier_scope); /* ordering only */
+void __cfree_dsb(__cfree_barrier_scope); /* ordering + completion */
+void __cfree_isb(void); /* pipeline flush after sysreg/MMU */
+
+/* ====================================================================
+ * Cache maintenance (range-based).
+ *
+ * cfree reads CTR / CTR_EL0 once at startup to learn the data and
+ * instruction cache line sizes and emits a loop over [p, p + n).
+ * Callers do not have to align p or n; the runtime widens to line
+ * boundaries.
+ *
+ * Lowerings: aarch64 dc {cvac,ivac,civac} + ic ivau loops; arm v7+
+ * equivalents via CP15. x86: no-ops (cache-coherent, ICache included).
+ * RISC-V: Zicbom / Zicboz instructions when those extensions are
+ * present, otherwise a compile-time error.
+ * ==================================================================== */
+void __cfree_dcache_clean(const void *p, unsigned long n);
+void __cfree_dcache_invalidate(void *p, unsigned long n);
+void __cfree_dcache_clean_invalidate(void *p, unsigned long n);
+void __cfree_icache_invalidate(const void *p, unsigned long n);
+
+/* ====================================================================
+ * Hints.
+ *
+ * __cfree_yield is the spin-loop hint (arm yield, x86 pause,
+ * RISC-V pause).
+ *
+ * __cfree_wfi sleeps until the next interrupt -- the universal "idle
+ * loop" primitive. Lowers to arm/aarch64 wfi, x86 hlt, RISC-V wfi.
+ * All three are privileged (ring 0 / EL1+ / M-or-S mode); bare-metal
+ * code is privileged by construction. WASM: compile-time error.
+ *
+ * The wfe/sev pair is arm/aarch64-only: the inter-core "event flag"
+ * abstraction (SEV signals the flag, WFE sleeps on it; the exclusive
+ * monitor's release also signals) does not generalize. x86 MONITOR/
+ * MWAIT is address-watch rather than flag-based and not in the base
+ * ISA; RISC-V has no base-ISA equivalent. Invoking them outside arm/
+ * aarch64 is a compile-time error -- write the spin-and-back-off loop
+ * with __cfree_yield + __cfree_wfi instead.
+ * ==================================================================== */
+void __cfree_nop(void);
+void __cfree_yield(void);
+void __cfree_wfi(void);
+void __cfree_wfe(void); /* arm/aarch64 only */
+void __cfree_sev(void); /* arm/aarch64 only */
+
+#endif
diff --git a/include/cfree/coro.h b/include/cfree/coro.h
@@ -0,0 +1,130 @@
+/* cfree/coro.h -- cfree extension -- stackful asymmetric coroutines
+ *
+ * cfree/coro.h is non-standard: C11 has no stackful-coroutine facility.
+ * cfree ships it as a native counterpart to <setjmp.h>: the underlying
+ * per-target context payload is literally shared with setjmp/longjmp
+ * (256 bytes, see doc/builtins.md), and the runtime is target-specific
+ * assembly in libcfree_rt.a.
+ *
+ * Two layers in this header:
+ *
+ * coro_ctx Raw register-context buffer used by the symmetric
+ * primitive __cfree_coro_switch. Most code does not
+ * touch it -- it is exposed for advanced schedulers
+ * (M:N, custom dispatch) that want the bare switch.
+ *
+ * coro_t Asymmetric coroutine handle. Resume drives forward,
+ * yield suspends back to the most recent resumer.
+ * Resumes nest like function calls. status is
+ * publicly readable; the rest is private storage.
+ *
+ * Programming model (asymmetric):
+ * 1. Allocate a coro_t and a stack region.
+ * 2. coro_init(&c, fn, stack_base, stack_len).
+ * 3. coro_resume(&c, value) drives c forward.
+ * 4. From inside fn, coro_yield(value) suspends back to the resumer.
+ * 5. fn's return value becomes the final coro_resume payload, with
+ * status CORO_DEAD; the runtime cleans up automatically.
+ *
+ * Threading. The runtime's "current coroutine" pointer and "main"
+ * register save slot are _Thread_local, so each thread has its own
+ * resume chain. A coroutine itself is still tied to the thread that
+ * drives it: errno, _Thread_local user state, and thread-affine OS
+ * handles silently rebind if a coroutine is resumed on a different
+ * thread, so don't migrate a suspended coroutine across threads.
+ * cfree's contract defines __STDC_NO_THREADS__ (no <threads.h>) --
+ * _Thread_local is a separate C11 language feature and works
+ * independently.
+ */
+#ifndef CFREE_CORO_H
+#define CFREE_CORO_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+/* Stack alignment required at function-call boundaries on every cfree
+ target (16 on x86_64/aarch64/riscv; weaker on i386 and arm32 AAPCS,
+ but 16 covers them). Caller-supplied stacks must be aligned to this. */
+#define CORO_STACK_ALIGN 16
+
+/* Raw register-context buffer. 256 bytes, alignof 16. The runtime
+ reinterprets this as a per-target struct of callee-saved GPRs +
+ callee-saved FPRs + sp + return address. Exposed only because the
+ __cfree_coro_switch primitive at the bottom of this header needs it
+ as an argument type. coro_t below embeds one of these at the start
+ of its private storage. */
+typedef struct coro_ctx {
+ _Alignas(16) unsigned char __cfree_storage[256];
+} coro_ctx;
+
+/* ====================================================================
+ * Asymmetric coroutine API.
+ * ==================================================================== */
+
+typedef enum {
+ CORO_INIT, /* never resumed */
+ CORO_RUNNING, /* on the live resume chain */
+ CORO_SUSPENDED, /* yielded; resumable */
+ CORO_DEAD, /* entry returned */
+} coro_status_t;
+
+typedef struct {
+ uintptr_t value;
+ coro_status_t status;
+} coro_result_t;
+
+/* Coroutine entry point. The first coro_resume's value is passed as
+ `arg`. The return value is delivered as the final coro_resume's
+ payload, with status CORO_DEAD. */
+typedef uintptr_t (*coro_fn)(uintptr_t arg);
+
+/* Coroutine handle. status is publicly readable; the private blob
+ carries the register context (256 B), a resumer pointer, and the
+ user-supplied entry fn. 288 B is comfortable headroom on both LP64
+ and ILP32 (lib/coro/coro.c verifies the fit with a _Static_assert). */
+typedef struct coro {
+ coro_status_t status;
+ _Alignas(16) unsigned char __cfree_priv[288];
+} coro_t;
+
+/* Initialize *c to run fn on [stack_base, stack_base + stack_len).
+ stack_base must be CORO_STACK_ALIGN-aligned. status becomes
+ CORO_INIT. The first coro_resume delivers its value as fn's arg. */
+void coro_init(coro_t *c, coro_fn fn, void *stack_base, size_t stack_len);
+
+/* Drive c forward. If c is INIT, calls fn(value) on c's stack. If
+ SUSPENDED, c's matching coro_yield call returns value. coro_resume
+ itself returns when c yields or its fn returns; the result carries
+ c's new status (SUSPENDED or DEAD) and the value c delivered.
+ UB if c is RUNNING or DEAD. */
+coro_result_t coro_resume(coro_t *c, uintptr_t value);
+
+/* Suspend the current coroutine, returning value to its resumer (the
+ matching coro_resume call returns this value). coro_yield itself
+ returns the value the next resumer passes. UB outside a coroutine. */
+uintptr_t coro_yield(uintptr_t value);
+
+/* The currently running coroutine, or NULL if not in one. */
+coro_t *coro_self(void);
+
+static inline coro_status_t coro_status(const coro_t *c) { return c->status; }
+
+/* ====================================================================
+ * Symmetric primitive (compiler-builtin-style; for advanced schedulers).
+ *
+ * Saves callee-saved state into *from, restores it from *to, and
+ * delivers `value` to *to as the return of its prior switch (or as
+ * the first-arg register of *to's trampoline on a fresh context).
+ * Returns the value passed by the next switch back to *from.
+ *
+ * coro_resume / coro_yield are built on this. Most code should not
+ * call it directly; it is exposed for schedulers that don't fit the
+ * asymmetric resume-chain model (M:N runtimes, work-stealing, etc.).
+ *
+ * Bypassing the asymmetric layer means losing coro_self / status
+ * tracking / DEAD propagation -- the symmetric primitive is purely
+ * a register-shuffle and knows nothing about coro_t.
+ * ==================================================================== */
+uintptr_t __cfree_coro_switch(coro_ctx *from, coro_ctx *to, uintptr_t value);
+
+#endif
diff --git a/include/cfree/syscall.h b/include/cfree/syscall.h
@@ -0,0 +1,44 @@
+/* cfree/syscall.h -- cfree extension -- kernel-trap primitive
+ *
+ * cfree/syscall.h is non-standard: C11 has no notion of a kernel
+ * trap. cfree exposes the bare instruction so libc syscall stubs and
+ * other low-level code can stay pure C without resorting to inline
+ * asm.
+ *
+ * Numbering is the caller's responsibility -- cfree provides no
+ * SYS_* table. Pass the platform-specific number (see Linux
+ * <asm/unistd.h>, Darwin <sys/syscall.h>, etc.) in nr; pointers,
+ * sizes, and file descriptors are cast to long at the call site.
+ *
+ * Result convention: normalized to Linux-style "non-negative on
+ * success, -errno on failure" on every supported target. On
+ * BSD/Darwin, where the kernel signals failure via the carry/C
+ * flag and returns the positive errno in the result register, the
+ * lowering inspects the flag and rewrites the value -- callers
+ * see the Linux convention regardless of host kernel.
+ *
+ * Optimizer view: each call is opaque, with full memory clobber
+ * plus the target's syscall-clobber list. The optimizer cannot
+ * reorder loads, stores, or other side effects across the trap.
+ *
+ * Per-target lowering (see doc/builtins.md for the table): cfree
+ * emits the appropriate trap instruction (`syscall`, `int 0x80`,
+ * `svc`, `ecall`) inline; there is no library call.
+ *
+ * Not available on WASM: invoking any of these on __wasm__ is a
+ * compile-time error. WASM programs reach the host via WASI
+ * imports, not a syscall instruction.
+ */
+#ifndef CFREE_SYSCALL_H
+#define CFREE_SYSCALL_H
+
+long __cfree_syscall0(long nr);
+long __cfree_syscall1(long nr, long a0);
+long __cfree_syscall2(long nr, long a0, long a1);
+long __cfree_syscall3(long nr, long a0, long a1, long a2);
+long __cfree_syscall4(long nr, long a0, long a1, long a2, long a3);
+long __cfree_syscall5(long nr, long a0, long a1, long a2, long a3, long a4);
+long __cfree_syscall6(long nr, long a0, long a1, long a2, long a3, long a4,
+ long a5);
+
+#endif
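Reviewer sketch for the result convention described in cfree/syscall.h above. It shows how a libc-style stub might turn the normalized "-errno on failure" value into the usual -1 plus errno pair. Everything here is an assumption for illustration: `fake_syscall1` is a hosted stand-in for `__cfree_syscall1`, `my_errno`, `syscall_ret`, and `my_close` are hypothetical names, and the [-4095, -1] error window is the common Linux libc habit rather than something the header states.

```c
#include <assert.h>

/* Hypothetical errno slot; a freestanding libc would provide its own. */
static int my_errno;

/* Hosted stand-in for __cfree_syscall1 so the sketch builds anywhere.
 * It mimics the normalized convention: 0 on success, -errno on failure. */
static long fake_syscall1(long nr, long a0)
{
    assert(nr >= 0);              /* syscall numbers are non-negative */
    return a0 < 0 ? -9L : 0L;     /* 9 is EBADF on Linux */
}

/* Decode the normalized result. Treating only [-4095, -1] as errors is an
 * assumption borrowed from Linux libcs; it keeps large pointer-like
 * successes (for example from mmap) from being misread as failures. */
static long syscall_ret(long raw)
{
    if (raw < 0 && raw >= -4095L) {
        my_errno = (int)-raw;
        return -1;
    }
    return raw;
}

/* Hypothetical close() stub; 3 is SYS_close on x86-64 Linux only. */
static int my_close(int fd)
{
    return (int)syscall_ret(fake_syscall1(3, (long)fd));
}
```

With the real header, the stub body would presumably call `__cfree_syscall1` in place of the stand-in; the decoding step stays the same on every target because the lowering already normalizes BSD/Darwin carry-flag results.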
diff --git a/include/setjmp.h b/include/setjmp.h
@@ -10,7 +10,7 @@
* such struct across cfree targets -- 256 bytes (x86_64 Windows: 12
* GPR slots + xmm6-15). C11 explicitly excludes the FP status flags
* and open-file state, so no signal-mask slot is reserved. The same
- * 256-byte payload is shared with <stdcoro.h>'s coro_ctx so the
+ * 256-byte payload is shared with <cfree/coro.h>'s coro_ctx so the
* underlying save/restore halves are reused across all three
* primitives. */
#ifndef CFREE_SETJMP_H
diff --git a/include/stdcoro.h b/include/stdcoro.h
@@ -1,130 +0,0 @@
-/* stdcoro.h -- cfree extension -- stackful asymmetric coroutines
- *
- * stdcoro.h is non-standard: C11 has no stackful-coroutine facility.
- * cfree ships it as a native counterpart to <setjmp.h>: the underlying
- * per-target context payload is literally shared with setjmp/longjmp
- * (256 bytes, see doc/builtins.md), and the runtime is target-specific
- * assembly in libcfree_rt.a.
- *
- * Two layers in this header:
- *
- * coro_ctx Raw register-context buffer used by the symmetric
- * primitive __cfree_coro_switch. Most code does not
- * touch it -- it is exposed for advanced schedulers
- * (M:N, custom dispatch) that want the bare switch.
- *
- * coro_t Asymmetric coroutine handle. Resume drives forward,
- * yield suspends back to the most recent resumer.
- * Resumes nest like function calls. status is
- * publicly readable; the rest is private storage.
- *
- * Programming model (asymmetric):
- * 1. Allocate a coro_t and a stack region.
- * 2. coro_init(&c, fn, stack_base, stack_len).
- * 3. coro_resume(&c, value) drives c forward.
- * 4. From inside fn, coro_yield(value) suspends back to the resumer.
- * 5. fn's return value becomes the final coro_resume payload, with
- * status CORO_DEAD; the runtime cleans up automatically.
- *
- * Threading. The runtime's "current coroutine" pointer and "main"
- * register save slot are _Thread_local, so each thread has its own
- * resume chain. A coroutine itself is still tied to the thread that
- * drives it: errno, _Thread_local user state, and thread-affine OS
- * handles silently rebind if a coroutine is resumed on a different
- * thread, so don't migrate a suspended coroutine across threads.
- * cfree's contract defines __STDC_NO_THREADS__ (no <threads.h>) --
- * _Thread_local is a separate C11 language feature and works
- * independently.
- */
-#ifndef CFREE_STDCORO_H
-#define CFREE_STDCORO_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Stack alignment required at function-call boundaries on every cfree
- target (16 on x86_64/aarch64/arm32-AAPCS-VFP/riscv; weaker on i386
- but 16 covers it). Caller stacks must be aligned to this. */
-#define CORO_STACK_ALIGN 16
-
-/* Raw register-context buffer. 256 bytes, alignof 16. The runtime
- reinterprets this as a per-target struct of callee-saved GPRs +
- callee-saved FPRs + sp + return address. Exposed only because the
- internal __cfree_coro_switch primitive at the bottom of this header
- needs it as an argument type. coro_t below embeds one of these as
- the first word of its private storage. */
-typedef struct coro_ctx {
- _Alignas(16) unsigned char __cfree_storage[256];
-} coro_ctx;
-
-/* ====================================================================
- * Asymmetric coroutine API.
- * ==================================================================== */
-
-typedef enum {
- CORO_INIT, /* never resumed */
- CORO_RUNNING, /* on the live resume chain */
- CORO_SUSPENDED, /* yielded; resumable */
- CORO_DEAD, /* entry returned */
-} coro_status_t;
-
-typedef struct {
- uintptr_t value;
- coro_status_t status;
-} coro_result_t;
-
-/* Coroutine entry point. The first coro_resume's value is passed as
- `arg`. The return value is delivered as the final coro_resume's
- payload, with status CORO_DEAD. */
-typedef uintptr_t (*coro_fn)(uintptr_t arg);
-
-/* Coroutine handle. status is publicly readable; the private blob
- carries the register context (256 B), a resumer pointer, and the
- user-supplied entry fn. 288 B is comfortable headroom on both LP64
- and ILP32 (lib/coro/coro.c verifies the fit with a _Static_assert). */
-typedef struct coro {
- coro_status_t status;
- _Alignas(16) unsigned char __cfree_priv[288];
-} coro_t;
-
-/* Initialize *c to run fn on [stack_base, stack_base + stack_len).
- stack_base must be CORO_STACK_ALIGN-aligned. status becomes
- CORO_INIT. The first coro_resume delivers its value as fn's arg. */
-void coro_init(coro_t *c, coro_fn fn, void *stack_base, size_t stack_len);
-
-/* Drive c forward. If c is INIT, calls fn(value) on c's stack. If
- SUSPENDED, c's matching coro_yield call returns value. coro_resume
- itself returns when c yields or its fn returns; the result carries
- c's new status (SUSPENDED or DEAD) and the value c delivered.
- UB if c is RUNNING or DEAD. */
-coro_result_t coro_resume(coro_t *c, uintptr_t value);
-
-/* Suspend the current coroutine, returning value to its resumer (the
- matching coro_resume call returns this value). coro_yield itself
- returns the value the next resumer passes. UB outside a coroutine. */
-uintptr_t coro_yield(uintptr_t value);
-
-/* The currently running coroutine, or NULL if not in one. */
-coro_t *coro_self(void);
-
-static inline coro_status_t coro_status(const coro_t *c) { return c->status; }
-
-/* ====================================================================
- * Symmetric primitive (compiler-builtin-style; for advanced schedulers).
- *
- * Saves callee-saved state into *from, restores it from *to, and
- * delivers `value` to *to as the return of its prior switch (or as
- * the first-arg register of *to's trampoline on a fresh context).
- * Returns the value passed by the next switch back to *from.
- *
- * coro_resume / coro_yield are built on this. Most code should not
- * call it directly; it is exposed for schedulers that don't fit the
- * asymmetric resume-chain model (M:N runtimes, work-stealing, etc.).
- *
- * Bypassing the asymmetric layer means losing coro_self / status
- * tracking / DEAD propagation -- the symmetric primitive is purely
- * a register-shuffle and knows nothing about coro_t.
- * ==================================================================== */
-uintptr_t __cfree_coro_switch(coro_ctx *from, coro_ctx *to, uintptr_t value);
-
-#endif
diff --git a/lib/README.md b/lib/README.md
@@ -33,8 +33,8 @@ hand-written `mem/mem.c` is 0BSD; relicense as desired.
| `riscv/rv64.S` | `__riscv_save_*` + `__riscv_restore_*` (rv64) | RISC-V rv64 with `-msave-restore` |
| `mem/mem.c` | `memcpy` / `memmove` / `memset` / `memcmp` (weak) | All; user libc overrides |
| `atomic/atomic_freestanding.c` | `__atomic_*` fallback shim | All |
-| `coro/<arch>.c` | Per-arch primitives: `setjmp` / `longjmp` (`<setjmp.h>`) + `__cfree_coro_ctx_init` / `__cfree_coro_switch` / `__cfree_coro_trampoline` (internal; the public `<stdcoro.h>` API sits on top via `coro/coro.c`) | One of `aarch64`, `arm32`, `arm32_thumb1`, `i386`, `riscv32`, `riscv64`, `x86_64`, `x86_64_win`. Not built for `wasm32`. |
-| `coro/coro.c` | Arch-agnostic asymmetric layer: `coro_init` / `coro_resume` / `coro_yield` / `coro_self` (`<stdcoro.h>`) | All variants that ship a `coro/<arch>.c`. |
+| `coro/<arch>.c` | Per-arch primitives: `setjmp` / `longjmp` (`<setjmp.h>`) + `__cfree_coro_ctx_init` / `__cfree_coro_switch` / `__cfree_coro_trampoline` (internal; the public `<cfree/coro.h>` API sits on top via `coro/coro.c`) | One of `aarch64`, `arm32`, `arm32_thumb1`, `i386`, `riscv32`, `riscv64`, `x86_64`, `x86_64_win`. Not built for `wasm32`. |
+| `coro/coro.c` | Arch-agnostic asymmetric layer: `coro_init` / `coro_resume` / `coro_yield` / `coro_self` (`<cfree/coro.h>`) | All variants that ship a `coro/<arch>.c`. |
### Build-time include dirs (consumed by the masters; nothing here lands in `libcfree_rt.a`)
@@ -153,7 +153,7 @@ Provides:
- `setjmp` / `longjmp` (public, `<setjmp.h>`).
- `__cfree_coro_switch(from, to, value)` — symmetric register switch,
- exposed in `<stdcoro.h>` as a compiler-builtin-style primitive for
+ exposed in `<cfree/coro.h>` as a compiler-builtin-style primitive for
advanced schedulers; the asymmetric layer below also uses it.
- `__cfree_coro_ctx_init` / `__cfree_coro_trampoline` — internal.
diff --git a/lib/build.sh b/lib/build.sh
@@ -120,7 +120,7 @@ echo
# ---- LP64 little-endian ------------------------------------------------------
LP64_BASE="$INT_C $INT64_C $FP_C $MEM_C $ATOMIC_C"
-# Coro impl needs cfree's own headers (setjmp.h, stdcoro.h).
+# Coro impl needs cfree's own headers (setjmp.h, cfree/coro.h).
CORO_INC="-I../include"
build_variant x86_64-linux \
diff --git a/lib/coro/aarch64.c b/lib/coro/aarch64.c
@@ -1,7 +1,7 @@
/*
* lib/coro/aarch64.c -- AArch64 (AAPCS) implementations of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* All three primitives sit on one per-target context layout:
*
@@ -24,7 +24,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/arm32.c b/lib/coro/arm32.c
@@ -1,7 +1,7 @@
/*
* lib/coro/arm32.c -- ARM32 Thumb-2 (AAPCS) implementations of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* All three primitives sit on one per-target context layout:
*
@@ -31,7 +31,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/arm32_thumb1.c b/lib/coro/arm32_thumb1.c
@@ -1,7 +1,7 @@
/*
* lib/coro/arm32_thumb1.c -- ARMv6-M (Cortex-M0 / M0+, Thumb-1) impls of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* Thumb-1 / ARMv6-M is a strict subset of the Thumb-2 ISA used by the
* sibling arm32.c, and several conveniences disappear:
@@ -27,7 +27,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/coro.c b/lib/coro/coro.c
@@ -1,5 +1,5 @@
/*
- * lib/coro/coro.c -- asymmetric coroutine layer for <stdcoro.h>.
+ * lib/coro/coro.c -- asymmetric coroutine layer for <cfree/coro.h>.
*
* Sits on top of the per-arch __cfree_coro_switch / __cfree_coro_ctx_init
* primitives (one of lib/coro/<arch>.c) and supplies the public
@@ -37,7 +37,7 @@
* any coroutine's lifecycle.
*/
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/i386.c b/lib/coro/i386.c
@@ -1,7 +1,7 @@
/*
* lib/coro/i386.c -- i386 System V (cdecl, ILP32) implementations of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* cdecl callee-saved set: ebx, esi, edi, ebp, esp. Args are pushed
* right-to-left on the stack: at function entry, 4(%esp)=arg0,
@@ -33,7 +33,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/riscv32.c b/lib/coro/riscv32.c
@@ -1,7 +1,7 @@
/*
* lib/coro/riscv32.c -- RISC-V 32-bit (ILP32/ILP32F/ILP32D) implementations of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* Per-target context layout (matches xOS rv32 tick_coro_ctx):
*
@@ -26,7 +26,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/riscv64.c b/lib/coro/riscv64.c
@@ -1,7 +1,7 @@
/*
* lib/coro/riscv64.c -- RISC-V 64-bit (LP64D) implementations of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* RISC-V LP64D callee-saved set:
* ra (x1) -- saved manually so longjmp/__cfree_coro_switch can
@@ -39,7 +39,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/x86_64.c b/lib/coro/x86_64.c
@@ -1,7 +1,7 @@
/*
* lib/coro/x86_64.c -- x86_64 System V ABI implementations of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* Callee-saved set on SysV: rbx, rbp, r12-r15. (No callee-saved xmm
* regs -- those are MS-ABI specific; see x86_64_win.c.)
@@ -24,7 +24,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/lib/coro/x86_64_win.c b/lib/coro/x86_64_win.c
@@ -1,7 +1,7 @@
/*
* lib/coro/x86_64_win.c -- x86_64 Windows (MS x64 ABI) implementations of
* setjmp / longjmp (<setjmp.h>)
- * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ * __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
*
* MS x64 callee-saved set: rbx, rbp, rdi, rsi, r12-r15, xmm6-xmm15.
* (Compare with x86_64.c -- SysV doesn't preserve rdi/rsi or any xmm.)
@@ -29,7 +29,7 @@
*/
#include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
diff --git a/src/abi/abi.h b/src/abi/abi.h
@@ -0,0 +1,128 @@
+#ifndef CFREE_ABI_H
+#define CFREE_ABI_H
+
+#include "../core/core.h"
+#include "../type/type.h"
+
+/* TargetABI is the single authority for target-dependent C layout and calling
+ * convention decisions. Type remains structural and ABI-neutral; all sizes,
+ * alignments, field offsets, bitfield packing, scalar widths, and
+ * argument/return classifications are derived here from Compiler.target. */
+typedef struct TargetABI TargetABI;
+
+typedef enum ABIScalarKind {
+ ABI_SC_VOID,
+ ABI_SC_BOOL,
+ ABI_SC_INT,
+ ABI_SC_FLOAT,
+ ABI_SC_PTR,
+} ABIScalarKind;
+
+typedef struct ABITypeInfo {
+ u32 size;
+ u32 align;
+ u8 scalar_kind; /* ABIScalarKind; ABI_SC_VOID for aggregates/void */
+ u8 signed_;
+ u8 atomic;
+ u8 pad;
+} ABITypeInfo;
+
+typedef struct ABIFieldLayout {
+ u32 offset; /* byte offset from record base */
+ u16 bit_offset; /* bit offset within storage unit for bitfields */
+ u16 bit_width; /* 0 for non-bitfield */
+ u32 storage_size; /* bytes in the bitfield storage unit; 0 otherwise */
+} ABIFieldLayout;
+
+typedef struct ABIRecordLayout {
+ u32 size;
+ u32 align;
+ u32 nfields;
+ const ABIFieldLayout* fields;
+} ABIRecordLayout;
+
+typedef enum ABIArgKind {
+ ABI_ARG_IGNORE,
+ ABI_ARG_DIRECT, /* one or more inspectable parts */
+ ABI_ARG_INDIRECT, /* caller passes address */
+ ABI_ARG_EXPAND, /* aggregate split into parts below */
+} ABIArgKind;
+
+typedef enum ABIArgClass {
+ ABI_CLASS_NONE,
+ ABI_CLASS_INT,
+ ABI_CLASS_FP,
+ ABI_CLASS_VEC,
+ ABI_CLASS_MEM,
+} ABIArgClass;
+
+typedef enum ABIArgLoc {
+ ABI_LOC_NONE,
+ ABI_LOC_REG,
+ ABI_LOC_STACK,
+ ABI_LOC_EITHER,
+} ABIArgLoc;
+
+typedef enum ABIArgFlag {
+ ABI_AF_NONE = 0,
+ ABI_AF_SRET = 1u << 0, /* hidden structure-return pointer */
+ ABI_AF_BYVAL = 1u << 1, /* caller passes an address to a copy */
+ ABI_AF_SIGN_EXT = 1u << 2,
+ ABI_AF_ZERO_EXT = 1u << 3,
+ ABI_AF_VARARG = 1u << 4, /* placement affected by variadic rules */
+ ABI_AF_SPLIT = 1u << 5, /* source value is split across parts */
+} ABIArgFlag;
+
+typedef struct ABIArgPart {
+ u8 cls; /* ABIArgClass */
+ u8 loc; /* ABIArgLoc preference */
+ u16 flags; /* ABIArgFlag */
+ u32 src_offset; /* byte offset within source object */
+ u32 size; /* bytes carried by this part */
+ u32 align; /* part alignment */
+ u32 stack_align; /* required stack alignment if stack-passed */
+} ABIArgPart;
+
+typedef struct ABIArgInfo {
+ u8 kind; /* ABIArgKind */
+ u8 flags; /* ABIArgFlag applying to the whole argument */
+ u16 nparts;
+ u32 indirect_align; /* required alignment for ABI_ARG_INDIRECT/byval copy */
+ const ABIArgPart* parts;
+} ABIArgInfo;
+
+typedef struct ABIFuncInfo {
+ ABIArgInfo ret;
+ const ABIArgInfo* params;
+ u16 nparams;
+ u8 variadic;
+ u8 has_sret;
+ u32 vararg_gp_offset;
+ u32 vararg_fp_offset;
+ u32 vararg_overflow_offset;
+} ABIFuncInfo;
+
+void abi_init(TargetABI*, Compiler*);
+void abi_fini(TargetABI*);
+
+/* Builtin scalar profiles and general type layout. */
+ABITypeInfo abi_type_info(TargetABI*, const Type*);
+u32 abi_sizeof (TargetABI*, const Type*);
+u32 abi_alignof (TargetABI*, const Type*);
+
+/* Record layout is cached by Type* identity inside TargetABI and is stable for
+ * the lifetime of the ABI object. Incomplete records are fatal diagnostics. */
+const ABIRecordLayout* abi_record_layout(TargetABI*, const Type*);
+
+/* Calling convention classification. The returned object is owned by the ABI
+ * cache and remains valid until abi_fini. */
+const ABIFuncInfo* abi_func_info(TargetABI*, const Type* fn_type);
+
+/* Target-defined library types used by headers and builtins. */
+const Type* abi_size_type (TargetABI*, Pool*);
+const Type* abi_ptrdiff_type (TargetABI*, Pool*);
+const Type* abi_intptr_type (TargetABI*, Pool*);
+const Type* abi_uintptr_type (TargetABI*, Pool*);
+const Type* abi_va_list_type (TargetABI*, Pool*);
+
+#endif
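Reviewer sketch for the record-layout contract in abi.h above. It illustrates the textbook rule that `ABIRecordLayout` and `ABIFieldLayout.offset` would typically encode for non-bitfield members: round the running size up to each member's alignment, then pad the total to the record alignment. This is not cfree's implementation; `FieldSpec`, `RecordSize`, `round_up`, and `layout_struct` are hypothetical names, and bitfields, unions, and packed records are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t size, align; } FieldSpec;   /* hypothetical input  */
typedef struct { uint32_t size, align; } RecordSize;  /* hypothetical output */

/* Round n up to the next multiple of align (align must be a power of two). */
static uint32_t round_up(uint32_t n, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Fill offsets[i] for each member in declaration order and return the
 * padded record size together with the record alignment. */
static RecordSize layout_struct(const FieldSpec *f, uint32_t n,
                                uint32_t *offsets)
{
    RecordSize r = { 0, 1 };
    for (uint32_t i = 0; i < n; i++) {
        r.size = round_up(r.size, f[i].align);  /* insert leading padding */
        offsets[i] = r.size;
        r.size += f[i].size;
        if (f[i].align > r.align)
            r.align = f[i].align;               /* record align = max member align */
    }
    r.size = round_up(r.size, r.align);         /* tail padding */
    return r;
}
```

For a record shaped like `struct { char c; int i; short s; }` on a target with 4-byte `int` and 2-byte `short`, the loop places the members at offsets 0, 4, and 8 and pads the record to 12 bytes.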
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -0,0 +1,439 @@
+#ifndef CFREE_ARCH_H
+#define CFREE_ARCH_H
+
+#include "../core/core.h"
+#include "../type/type.h"
+#include "../abi/abi.h"
+#include "../obj/obj.h"
+
+/* Reg is wide enough for opt_cgtarget to hand out unbounded virtual registers
+ * (one per defined value). Target backends use only a small subset. */
+typedef u32 Reg;
+#define REG_NONE 0xffffffffu
+
+typedef enum RegClass {
+ RC_INT,
+ RC_FP,
+ RC_VEC,
+} RegClass;
+
+typedef enum BinOp {
+ BO_IADD, BO_ISUB, BO_IMUL,
+ BO_SDIV, BO_UDIV, BO_SREM, BO_UREM,
+ BO_FADD, BO_FSUB, BO_FMUL, BO_FDIV,
+ BO_AND, BO_OR, BO_XOR,
+ BO_SHL, BO_SHR_S, BO_SHR_U,
+} BinOp;
+
+typedef enum UnOp {
+ UO_NEG,
+ UO_NOT, /* logical: 0/1 */
+ UO_BNOT, /* bitwise ~ */
+} UnOp;
+
+typedef enum CmpOp {
+ CMP_EQ, CMP_NE,
+ CMP_LT_S, CMP_LE_S, CMP_GT_S, CMP_GE_S,
+ CMP_LT_U, CMP_LE_U, CMP_GT_U, CMP_GE_U,
+ CMP_LT_F, CMP_LE_F, CMP_GT_F, CMP_GE_F,
+} CmpOp;
+
+typedef enum ConvKind {
+ CV_SEXT, CV_ZEXT, CV_TRUNC,
+ CV_ITOF_S, CV_ITOF_U, CV_FTOI_S, CV_FTOI_U,
+ CV_FEXT, CV_FTRUNC,
+ CV_BITCAST,
+} ConvKind;
+
+typedef enum AtomicOp {
+ AO_XCHG,
+ AO_ADD, AO_SUB,
+ AO_AND, AO_OR, AO_XOR, AO_NAND,
+} AtomicOp;
+
+typedef enum MemOrder {
+ MO_RELAXED,
+ MO_CONSUME,
+ MO_ACQUIRE,
+ MO_RELEASE,
+ MO_ACQ_REL,
+ MO_SEQ_CST,
+} MemOrder;
+
+typedef enum OpKind {
+ OPK_IMM,
+ OPK_REG,
+ OPK_LOCAL, /* frame-relative; v.frame_slot identifies the slot */
+ OPK_GLOBAL, /* address: symbol+addend, not a load */
+ OPK_INDIRECT, /* [reg + ofs] */
+} OpKind;
+
+typedef u32 FrameSlot;
+#define FRAME_SLOT_NONE 0u
+
+typedef enum FrameSlotKind {
+ FS_LOCAL,
+ FS_PARAM,
+ FS_SPILL,
+ FS_SRET,
+ FS_ALLOCA,
+} FrameSlotKind;
+
+typedef enum FrameSlotFlag {
+ FSF_NONE = 0,
+ FSF_ADDR_TAKEN = 1u << 0,
+ FSF_VOLATILE = 1u << 1,
+} FrameSlotFlag;
+
+typedef struct FrameSlotDesc {
+ const Type* type;
+ Sym name;
+ SrcLoc loc;
+ u32 size;
+ u32 align;
+ u8 kind; /* FrameSlotKind */
+ u8 pad;
+ u16 flags; /* FrameSlotFlag */
+} FrameSlotDesc;
+
+typedef enum MemFlag {
+ MF_NONE = 0,
+ MF_VOLATILE = 1u << 0,
+ MF_ATOMIC = 1u << 1,
+ MF_RESTRICT = 1u << 2,
+ MF_READONLY = 1u << 3,
+ MF_WRITEONLY = 1u << 4,
+ MF_UNALIGNED = 1u << 5,
+} MemFlag;
+
+typedef enum AliasKind {
+ ALIAS_UNKNOWN,
+ ALIAS_LOCAL,
+ ALIAS_GLOBAL,
+ ALIAS_PARAM,
+ ALIAS_HEAP,
+ ALIAS_STRING,
+} AliasKind;
+
+typedef struct AliasRoot {
+ u8 kind; /* AliasKind */
+ u8 pad[3];
+ union {
+ i32 local_id;
+ ObjSymId global;
+ u32 param_idx;
+ Sym string_id;
+ } v;
+} AliasRoot;
+
+typedef struct MemAccess {
+ const Type* type; /* semantic C object type accessed */
+ u32 size; /* ABI byte size of this access */
+ u32 align; /* known byte alignment; 0 means unknown */
+ u16 flags; /* MemFlag */
+ u16 addr_space;
+ AliasRoot alias;
+} MemAccess;
+
+typedef struct ConstBytes {
+ const Type* type;
+ const u8* bytes; /* ABI representation, little/big endian per target */
+ u32 size;
+ u32 align;
+} ConstBytes;
+
+typedef struct AggregateAccess {
+ const Type* type;
+ u32 size;
+ u32 align;
+ MemAccess mem;
+} AggregateAccess;
+
+typedef struct BitFieldAccess {
+ const Type* field_type;
+ MemAccess storage;
+ u32 storage_offset; /* byte offset from record base */
+ u16 bit_offset; /* target-endian bit offset within storage unit */
+ u16 bit_width; /* may be 0 for zero-width layout barriers */
+ u8 signed_;
+ u8 pad[3];
+} BitFieldAccess;
+
+typedef struct Operand {
+ u8 kind;
+ u8 cls; /* RegClass */
+ u16 pad;
+ const Type* type;
+ union {
+ i64 imm;
+ Reg reg;
+ FrameSlot frame_slot;
+ struct { ObjSymId sym; i64 addend; } global;
+ struct { Reg base; i32 ofs; } ind;
+ } v;
+} Operand;
+
+typedef enum CGABIPartFlag {
+ CG_ABI_PART_NONE = 0,
+ CG_ABI_PART_SRET = 1u << 0,
+ CG_ABI_PART_BYVAL = 1u << 1,
+ CG_ABI_PART_INDIRECT = 1u << 2,
+} CGABIPartFlag;
+
+typedef struct CGABIPart {
+ const ABIArgPart* abi_part;
+ Operand op;
+ u32 src_offset;
+ u32 size;
+ u16 flags; /* CGABIPartFlag */
+ u16 pad;
+} CGABIPart;
+
+typedef struct CGABIValue {
+ const Type* type;
+ const ABIArgInfo* abi;
+ Operand storage; /* address for indirect/byval/sret, REG/IMM for simple values */
+ const CGABIPart* parts;
+ u32 nparts;
+} CGABIValue;
+
+typedef struct CGParamDesc {
+ u32 index;
+ Sym name;
+ const Type* type;
+ FrameSlot slot;
+ const ABIArgInfo* abi;
+ const CGABIPart* incoming;
+ u32 nincoming;
+ SrcLoc loc;
+} CGParamDesc;
+
+typedef struct CGFuncDesc {
+ ObjSymId sym;
+ const Type* fn_type;
+ const ABIFuncInfo* abi;
+ const CGParamDesc* params;
+ u32 nparams;
+ SrcLoc loc;
+} CGFuncDesc;
+
+typedef struct CGCallDesc {
+ const Type* fn_type;
+ const ABIFuncInfo* abi;
+ Operand callee;
+ const CGABIValue* args;
+ u32 nargs;
+ CGABIValue ret;
+} CGCallDesc;
+
+typedef u32 Label;
+#define LABEL_NONE 0u
+
+typedef enum ScopeKind {
+ SCOPE_BLOCK, /* break exits forward */
+ SCOPE_LOOP, /* break exits forward; continue uses explicit target */
+ SCOPE_IF, /* cond consumed at scope_begin */
+} ScopeKind;
+
+typedef u32 CGScope;
+#define CG_SCOPE_NONE 0u
+
+typedef struct CGScopeDesc {
+ u8 kind; /* ScopeKind */
+ u8 pad[3];
+ Label break_label; /* explicit target for break; LABEL_NONE => target creates one */
+ Label continue_label; /* explicit target for continue; LABEL_NONE for non-loops */
+ Operand cond; /* SCOPE_IF condition; ignored otherwise */
+ const Type* result_type; /* reserved for structured expression results */
+} CGScopeDesc;
+
+typedef enum AsmDir { ASM_IN, ASM_OUT, ASM_INOUT } AsmDir;
+
+typedef struct AsmConstraint {
+ const char* str; /* GCC-style: "r", "=&r", "+m", "i", "0" ... */
+ u8 dir; /* AsmDir */
+ u8 pad[3];
+} AsmConstraint;
+
+typedef u32 MCLabel;
+#define MC_LABEL_NONE 0u
+
+typedef struct MCEmitter MCEmitter;
+struct MCEmitter {
+ /* Machine/object emission context. Subclasses extend. */
+ Compiler* c;
+ ObjBuilder* obj;
+ u32 section_id;
+
+ void (*set_section)(MCEmitter*, u32 section_id);
+ u32 (*pos) (MCEmitter*);
+
+ MCLabel (*label_new) (MCEmitter*);
+ void (*label_place)(MCEmitter*, MCLabel);
+
+ void (*emit_bytes)(MCEmitter*, const u8*, size_t);
+ void (*emit_fill) (MCEmitter*, size_t n, u8 byte);
+ void (*emit_align)(MCEmitter*, u32 align, u8 fill);
+ void (*emit_reloc)(MCEmitter*, RelocKind, ObjSymId, i64 addend);
+ void (*emit_reloc_at)(MCEmitter*, u32 section_id, u32 offset, RelocKind,
+ ObjSymId, i64 addend, int explicit_addend, int pair);
+ void (*emit_label_ref)(MCEmitter*, MCLabel, RelocKind, u32 width, i64 addend);
+ void (*set_loc) (MCEmitter*, SrcLoc);
+ void (*destroy) (MCEmitter*);
+};
+
+typedef struct CGTarget CGTarget;
+struct CGTarget {
+ /* Typed C/IR lowering context. Subclasses extend. */
+ Compiler* c;
+ ObjBuilder* obj;
+ MCEmitter* mc;
+ u32 text_section_id;
+
+ /* ---- function lifecycle ---- */
+ void (*func_begin)(CGTarget*, const CGFuncDesc*);
+ void (*func_end)(CGTarget*);
+
+ /* ---- registers and frame slots ----
+ * At -O0 CG is TCC-style and owns the value stack: it decides which live
+ * values must be spilled/reloaded across register pressure, calls, and asm.
+ * Real targets return physical scratch registers and implement spill/reload
+ * mechanics; opt_cgtarget returns fresh virtual regs and ignores spills. */
+ Reg (*alloc_reg) (CGTarget*, RegClass, const Type*);
+ void (*free_reg) (CGTarget*, Reg); /* hint; opt_cgtarget ignores */
+ i32 (*alloc_local)(CGTarget*, u32 size, u32 align);
+ FrameSlot (*frame_slot)(CGTarget*, const FrameSlotDesc*);
+ void (*param) (CGTarget*, const CGParamDesc*);
+ const Reg* (*clobbers)(CGTarget*, RegClass, u32* nregs);
+ void (*spill_reg) (CGTarget*, Operand src_reg, FrameSlot, MemAccess);
+ void (*reload_reg) (CGTarget*, Operand dst_reg, FrameSlot, MemAccess);
+
+ /* ---- labels and control flow ---- */
+ Label (*label_new) (CGTarget*);
+ void (*label_place)(CGTarget*, Label);
+ void (*jump) (CGTarget*, Label);
+ /* Fused compare-and-branch. cg's preferred form: avoids materializing 0/1
+ * for a normal `if (a < b)`. For an arbitrary i1 in a register, callers
+ * synthesize cmp_branch(CMP_NE, val, IMM_ZERO, label). */
+ void (*cmp_branch)(CGTarget*, CmpOp, Operand a, Operand b, Label);
+
+ /* ---- structured control flow ----
+ * Mirrors CG's scope ops. CG passes explicit break/continue targets so C
+ * `for` continues can land on the increment expression rather than the loop
+ * header. Real backends shim these onto label_new/label_place/jump.
+ * The WASM backend consumes them natively to emit block/loop/if with
+ * structurally-bounded br targets, which is what gives WASM its CFI.
+ *
+ * For SCOPE_IF, `cond` is the i1 operand; ignored for BLOCK/LOOP.
+ * `result_type` is reserved for if-as-expression on WASM (NULL for the
+ * statement case used by C); other backends ignore it. */
+ CGScope (*scope_begin)(CGTarget*, const CGScopeDesc*);
+ void (*scope_else) (CGTarget*, CGScope);
+ void (*scope_end) (CGTarget*, CGScope);
+ void (*break_to) (CGTarget*, CGScope);
+ void (*continue_to)(CGTarget*, CGScope);
+
+ /* ---- data movement (split, no overloading) ---- */
+ void (*load_imm)(CGTarget*, Operand dst /*REG*/, i64 imm);
+ void (*load_const)(CGTarget*, Operand dst /*REG*/, ConstBytes);
+ void (*copy) (CGTarget*, Operand dst /*REG*/, Operand src /*REG*/);
+ void (*load) (CGTarget*, Operand dst /*REG*/, Operand addr /*LOCAL|GLOBAL|INDIRECT*/, MemAccess);
+ void (*store) (CGTarget*, Operand addr /*LOCAL|GLOBAL|INDIRECT*/, Operand src /*REG|IMM*/, MemAccess);
+ void (*addr_of) (CGTarget*, Operand dst /*REG*/, Operand lv /*LOCAL|GLOBAL|INDIRECT*/);
+ void (*copy_bytes)(CGTarget*, Operand dst_addr, Operand src_addr, AggregateAccess);
+ void (*set_bytes) (CGTarget*, Operand dst_addr, Operand byte_value, AggregateAccess);
+ void (*bitfield_load) (CGTarget*, Operand dst /*REG*/, Operand record_addr, BitFieldAccess);
+ void (*bitfield_store)(CGTarget*, Operand record_addr, Operand src /*REG|IMM*/, BitFieldAccess);
+
+ /* ---- arithmetic, compare, convert ---- */
+ void (*binop) (CGTarget*, BinOp, Operand dst, Operand a, Operand b);
+ void (*unop) (CGTarget*, UnOp, Operand dst, Operand a);
+ void (*cmp) (CGTarget*, CmpOp, Operand dst, Operand a, Operand b); /* materialize 0/1 */
+ void (*convert)(CGTarget*, ConvKind, Operand dst, Operand src);
+
+ /* ---- calls / return ----
+ * CGCallDesc carries the type-checked signature, inspectable ABI
+ * classification, source operands, and the already-materialized ABI parts
+ * for direct, indirect/byval, sret, split, and multi-register values.
+ * `callee.kind == OPK_GLOBAL` is direct; any other kind is indirect. */
+ void (*call)(CGTarget*, const CGCallDesc*);
+ void (*ret) (CGTarget*, const CGABIValue* val_or_null);
+
+ /* ---- alloca ----
+ * Dynamic stack allocation. `size` is i64 bytes; `align` is the required
+ * alignment of the returned pointer. Backend grows the (linear-memory or
+ * native) shadow stack, returns the pointer in `dst`. v1 only emits this
+ * via __builtin_alloca; C VLAs are not parsed (__STDC_NO_VLA__). */
+ void (*alloca_)(CGTarget*, Operand dst /*REG*/, Operand size, u32 align);
+
+ /* ---- variadics ----
+ * va_list type is per-arch (defined in <stdarg.h>); these methods
+ * implement the four C macros after builtin substitution. ap is always
+ * passed as &ap; on SysV x86-64 the backend manages the register-save
+ * area, on WASM the backend walks the spilled-args memory. */
+ void (*va_start_)(CGTarget*, Operand ap_addr);
+ void (*va_arg_) (CGTarget*, Operand dst /*REG*/, Operand ap_addr, const Type* t);
+ void (*va_end_) (CGTarget*, Operand ap_addr);
+ void (*va_copy_) (CGTarget*, Operand dst_ap_addr, Operand src_ap_addr);
+
+ /* ---- setjmp / longjmp ----
+ * Optional. Real backends leave these NULL: the parser lowers <setjmp.h>'s
+ * setjmp to a normal call to __cfree_setjmp and opt recognizes the symbol
+ * by name as returns-twice. The WASM backend implements them via the
+ * exception-handling proposal so that a longjmp can unwind across WASM
+ * frames (which lack a saveable native SP).
+ *
+ * setjmp pops &buf, returns i32 in `dst` (0 on direct return, nonzero on
+ * longjmp). longjmp pops &buf and val; control does not return. */
+ void (*setjmp_) (CGTarget*, Operand dst /*REG, i32*/, Operand buf_addr);
+ void (*longjmp_)(CGTarget*, Operand buf_addr, Operand val);
+
+ /* ---- atomics ---- */
+ void (*atomic_load) (CGTarget*, Operand dst /*REG*/, Operand addr, MemAccess, MemOrder);
+ void (*atomic_store)(CGTarget*, Operand addr, Operand src, MemAccess, MemOrder);
+ void (*atomic_rmw) (CGTarget*, AtomicOp, Operand dst /*REG: prior value*/,
+ Operand addr, Operand val, MemAccess, MemOrder);
+ void (*atomic_cas) (CGTarget*, Operand prior /*REG*/, Operand ok /*REG, i1*/,
+ Operand addr, Operand expected, Operand desired,
+ MemAccess, MemOrder success, MemOrder failure);
+ void (*fence) (CGTarget*, MemOrder);
+
+ /* ---- inline asm ----
+ * Per-arch constraint binding + template assembly, packaged as one block.
+ * ins[i] are pre-evaluated input operands.
+ * out_ops[i] is filled by the arch with the location holding the result
+ * for outs[i]; the caller (cg) reads them out after the call.
+ * "=&r" early-clobber outputs must be allocated disjoint from any input.
+ * opt_cgtarget records this as a single IR_ASM_BLOCK; the wrapped target
+ * receives the same call at lowering time with materialized operands. */
+ void (*asm_block)(CGTarget*,
+ const char* tmpl,
+ const AsmConstraint* outs, u32 nout, Operand* out_ops,
+ const AsmConstraint* ins, u32 nin, const Operand* in_ops,
+ const Sym* clobbers, u32 nclob);
+
+ /* ---- source-location tracking ----
+ * Sets the SrcLoc inherited by subsequent emit-side calls (binop/load/...).
+ * opt_cgtarget stamps it on every recorded Inst; target CGTargets forward it
+ * to MCEmitter for Debug line emission. Sticky until the next set_loc. */
+ void (*set_loc)(CGTarget*, SrcLoc);
+
+ /* ---- end-of-TU hook ----
+ * No-op for plain target CGTargets. opt_cgtarget runs cross-function passes
+ * (inlining + cleanup) and lowers all buffered IR functions into the
+ * wrapped target CGTarget. Drivers must call this after the last func_end and
+ * before reading from `obj` or calling debug_emit. */
+ void (*finalize)(CGTarget*);
+
+ void (*destroy)(CGTarget*);
+};
+
+/* Construct the right target/emitter pair for c->target. */
+MCEmitter* mc_new(Compiler*, ObjBuilder*);
+void mc_free(MCEmitter*);
+
+CGTarget* cgtarget_new(Compiler*, ObjBuilder*, MCEmitter*);
+void cgtarget_finalize(CGTarget*);
+void cgtarget_free(CGTarget*);
+
+#endif
diff --git a/src/cg/cg.h b/src/cg/cg.h
@@ -0,0 +1,163 @@
+#ifndef CFREE_CG_H
+#define CFREE_CG_H
+
+#include "../arch/arch.h"
+#include "../decl/decl.h"
+#include "../type/type.h"
+
+typedef struct CG CG;
+typedef struct Debug Debug;
+
+/* Debug is optional; pass NULL when -g is off. */
+CG* cg_new(Compiler*, CGTarget*, Debug*);
+void cg_free(CG*);
+
+/* ----- functions ----- */
+void cg_func_begin(CG*, const CGFuncDesc*);
+void cg_func_end (CG*);
+
+/* ----- locals & params ----- */
+FrameSlot cg_local(CG*, const FrameSlotDesc*); /* returns frame slot; pushes nothing */
+void cg_param(CG*, const CGParamDesc*);
+
+/* ----- value-stack pushes ----- */
+void cg_push_int (CG*, i64, const Type*);
+void cg_push_const (CG*, ConstBytes); /* exact ABI bytes */
+void cg_push_float (CG*, double, const Type*); /* convenience for simple parser paths */
+void cg_push_str (CG*, Sym str_id, const Type*); /* into rodata; pushes pointer */
+void cg_push_local (CG*, FrameSlot); /* lvalue */
+void cg_push_global(CG*, ObjSymId, const Type*); /* lvalue */
+
+/* ----- value-stack manipulation ----- */
+void cg_load (CG*); /* lvalue → rvalue; derives MemAccess */
+void cg_addr (CG*); /* lvalue → ptr rvalue */
+void cg_store(CG*); /* [..., lv, rv] → []; derives MemAccess */
+void cg_dup (CG*);
+void cg_swap (CG*);
+void cg_drop (CG*);
+
+/* Aggregate and bitfield operations keep C object semantics visible to direct
+ * targets and opt. Addresses are lvalues or pointer rvalues on the value stack;
+ * sizes, offsets, storage units, and alignments come from TargetABI. */
+void cg_copy_aggregate(CG*, AggregateAccess); /* [..., dst_addr, src_addr] → [] */
+void cg_set_aggregate (CG*, AggregateAccess); /* [..., dst_addr, byte] → [] */
+void cg_bitfield_load (CG*, BitFieldAccess); /* [..., record_addr] → value */
+void cg_bitfield_store(CG*, BitFieldAccess); /* [..., record_addr, value] → [] */
+
+void cg_binop (CG*, BinOp);
+void cg_unop (CG*, UnOp);
+void cg_cmp (CG*, CmpOp);
+void cg_convert(CG*, const Type* dst); /* picks ConvKind from src/dst */
+
+/* Direct vs indirect: callee on the stack distinguishes itself by SValue/operand
+ * kind. CG obtains ABIFuncInfo from Compiler.abi, materializes CGABIValue
+ * argument/return parts, then calls CGTarget.call with a CGCallDesc. On WASM,
+ * fn_type selects the call_indirect type index (interned Type* identity is the
+ * index source of truth). */
+void cg_call(CG*, u32 nargs, const Type* fn_type); /* stack: [..., callee, arg0..argN-1]
+ → result (if non-void) */
+void cg_ret (CG*, int has_value);
+
+/* ----- C declarations and global initializers -----
+ * Parser records C declaration semantics through DeclTable. CG consumes DeclIds
+ * only when a declaration becomes executable code or an addressable object. */
+void cg_bind_decl(CG*, DeclId);
+
+/* ----- alloca -----
+ * Dynamic stack allocation. Pops `size_bytes` (i64), pushes `void*` aligned to
+ * max_align_t. v1 does not parse C99/C11 VLAs (predefines __STDC_NO_VLA__);
+ * cg_alloca is reachable only via the __builtin_alloca path. */
+void cg_alloca(CG*);
+
+/* ----- variadics -----
+ * va_list type is per-arch (defined in <stdarg.h>). The four ops match the C
+ * macros after builtin substitution. cg_va_arg pops &ap and pushes the next
+ * arg of `t`. cg_va_start/end/copy pop the va_list addresses and push nothing. */
+/* The trailing underscores avoid colliding with <stdarg.h> macros — cfree
+ * sources include stdarg.h for compiler_panicv (see core.h). */
+void cg_va_start_(CG*); /* pop &ap */
+void cg_va_arg_ (CG*, const Type* t); /* pop &ap; push value */
+void cg_va_end_ (CG*); /* pop &ap */
+void cg_va_copy_ (CG*); /* pop &dst, &src */
+
+/* ----- setjmp / longjmp -----
+ * On real arches these are NOT emitted: the parser lowers <setjmp.h>'s setjmp
+ * to a normal extern call to __cfree_setjmp; opt recognizes the symbol by name
+ * as returns-twice (no inlining across; values defined before the call are not
+ * GVN-merged with values defined after). On WASM the parser instead emits
+ * cg_setjmp/cg_longjmp, which forward to CGTarget.setjmp/CGTarget.longjmp; the WASM
+ * backend lowers via the exception-handling proposal.
+ *
+ * cg_setjmp pops &buf and pushes i32 (0 on direct return, nonzero on longjmp).
+ * cg_longjmp pops &buf and val; does not return. */
+void cg_setjmp (CG*);
+void cg_longjmp(CG*);
+
+/* ----- atomics -----
+ * Pointer operands are typed `_Atomic T*`. cg derives MemAccess from the
+ * pointee type, qualifiers, alignment facts, and alias root; the pointee type
+ * drives width and tells the backend whether the op fits inline or routes to
+ * compiler-rt. */
+void cg_atomic_load (CG*, MemOrder); /* pops ptr; pushes value */
+void cg_atomic_store(CG*, MemOrder); /* pops ptr, value */
+void cg_atomic_rmw (CG*, AtomicOp, MemOrder); /* pops ptr, val; pushes prior */
+void cg_atomic_cas (CG*, MemOrder success, MemOrder failure);
+ /* pops ptr, expected, desired;
+ * pushes (prior, ok_i1) */
+void cg_fence (CG*, MemOrder);
+
+/* ----- control flow (CG-level labels) -----
+ * cg_branch_true fuses with a preceding cg_cmp into a single CGTarget.cmp_branch
+ * when the i1 on top of stack is the unconsumed result of that cmp. For a
+ * non-cmp i1, it emits cmp_branch(CMP_NE, val, IMM_ZERO, label). */
+typedef u32 CGLabel;
+CGLabel cg_label_new(CG*);
+void cg_label_place(CG*, CGLabel);
+void cg_jump(CG*, CGLabel);
+void cg_branch_true (CG*, CGLabel); /* pops i1 */
+void cg_branch_false(CG*, CGLabel);
+
+/* ----- structured control flow -----
+ * Used for if / while / for / do — the cases where the parser already knows
+ * the structure. Nests like a stack: every scope_begin must pair with one
+ * scope_end at the same nesting depth. Break and continue targets are explicit
+ * so C `for` continue jumps to the increment expression, not necessarily the
+ * loop header.
+ *
+ * Real backends implement these as a thin shim over label_place/jump (no code
+ * size cost). The WASM backend consumes them directly to emit block/loop/if
+ * with structurally-bounded br targets — that's the source of CFI on WASM
+ * without invoking the relooper.
+ *
+ * goto, computed-goto, and switch fallthrough still go through the flat label
+ * API above. opt's IR is flat-CFG; at -O2 the relooper in the WASM lowering
+ * pass reconstructs structure from the flat IR. At -O0/-O1 (no opt wrapper),
+ * CG drives the WASM CGTarget directly with scope ops and no relooper runs. */
+/* ScopeKind is shared with CGTarget (see arch.h). */
+typedef u32 CGScope;
+typedef struct CGScopeConfig {
+ ScopeKind kind;
+ CGLabel break_label;
+ CGLabel continue_label;
+ const Type* result_type;
+} CGScopeConfig;
+CGScope cg_scope_begin(CG*, CGScopeConfig); /* IF: pops i1 */
+void cg_scope_else (CG*, CGScope); /* IF only */
+void cg_scope_end (CG*, CGScope);
+void cg_break (CG*, CGScope);
+void cg_continue (CG*, CGScope); /* LOOP only */
+
+/* ----- source location ----- */
+void cg_set_loc(CG*, SrcLoc); /* propagates to CGTarget and Debug */
+
+/* ----- inline asm -----
+ * Inputs are popped from the CG stack in declaration order before outputs are
+ * pushed back as fresh SValues. Constraints are GCC-style strings; binding
+ * is per-arch and happens inside CGTarget.asm_block. */
+void cg_inline_asm(CG*,
+ const char* tmpl,
+ const AsmConstraint* outs, u32 nout,
+ const AsmConstraint* ins, u32 nin,
+ const Sym* clobbers, u32 nclob);
+
+#endif
diff --git a/src/debug/debug.h b/src/debug/debug.h
@@ -0,0 +1,72 @@
+#ifndef CFREE_DEBUG_H
+#define CFREE_DEBUG_H
+
+#include "../core/core.h"
+#include "../type/type.h"
+#include "../arch/arch.h"
+
+/* DWARF debug info. The producer side (CG, CGTarget/MCEmitter, opt) feeds
+ * events here as compilation runs; the consumer side writes .debug_* sections
+ * into the same ObjBuilder when debug_emit is called.
+ *
+ * Producer responsibilities:
+ * - Parser: nothing directly; types are looked up on demand from those that
+ * reach debug_local / debug_param.
+ * - CG: function and scope lifecycle, parameter and local declarations.
+ * - MCEmitter (or the lowering pass inside opt at -O2): the line program, and
+ * pc-range bounds for functions.
+ * - opt at -O2: location-list entries when a variable's location changes
+ * across the optimized function. */
+
+typedef struct Debug Debug;
+
+Debug* debug_new(Compiler*, ObjBuilder*);
+void debug_free(Debug*);
+
+/* file table — SourceManager owns paths; returns DWARF file index */
+u32 debug_file(Debug*, u32 source_file_id);
+
+/* function lifecycle */
+void debug_func_begin (Debug*, ObjSymId, const Type* fn_type, SrcLoc decl);
+void debug_func_pc_range(Debug*, ObjSecId text_section_id, u32 begin_ofs, u32 end_ofs);
+void debug_func_end (Debug*);
+
+/* lexical scopes (nested between func_begin/end) */
+void debug_scope_begin(Debug*, SrcLoc);
+void debug_scope_end (Debug*, SrcLoc);
+
+/* variable location */
+typedef enum DebugVarLocKind {
+ DVL_FRAME,
+ DVL_REG,
+ DVL_GLOBAL,
+ DVL_LOCLIST, /* time-varying location, see debug_loclist_* */
+} DebugVarLocKind;
+
+typedef struct DebugVarLoc {
+ u8 kind;
+ u8 pad[3];
+ union {
+ i32 frame_ofs;
+ Reg reg;
+ ObjSymId global;
+ u32 loclist_id;
+ } v;
+} DebugVarLoc;
+
+void debug_param(Debug*, Sym name, const Type*, SrcLoc, u32 idx, DebugVarLoc);
+void debug_local(Debug*, Sym name, const Type*, SrcLoc, DebugVarLoc);
+
+/* line program */
+void debug_line(Debug*, ObjSecId text_section_id, u32 text_offset, SrcLoc, int is_stmt);
+
+/* location lists — for opt'd code where a variable moves between locations */
+u32 debug_loclist_new(Debug*);
+void debug_loclist_add(Debug*, u32 id, u32 begin_pc, u32 end_pc, DebugVarLoc);
+
+/* Emit the accumulated debug info as DWARF sections into the ObjBuilder.
+ * Must be called after all code sections are finalized but before the
+ * file emitters run. */
+void debug_emit(Debug*);
+
+#endif
diff --git a/src/decl/decl.h b/src/decl/decl.h
@@ -0,0 +1,91 @@
+#ifndef CFREE_DECL_H
+#define CFREE_DECL_H
+
+#include "../arch/arch.h"
+
+/* C declaration semantics. This layer is deliberately above ObjBuilder:
+ * ObjBuilder stores object-format facts, while DeclTable owns C linkage,
+ * storage duration, tentative-definition, static-local, and initializer rules. */
+typedef struct DeclTable DeclTable;
+
+typedef u32 DeclId;
+#define DECL_NONE 0u
+
+typedef enum DeclStorage {
+ DS_EXTERN,
+ DS_STATIC,
+ DS_AUTO,
+ DS_REGISTER,
+ DS_TYPEDEF,
+} DeclStorage;
+
+typedef enum DeclLinkage {
+ DL_NONE,
+ DL_INTERNAL,
+ DL_EXTERNAL,
+} DeclLinkage;
+
+typedef enum DeclFlag {
+ DF_NONE = 0,
+ DF_THREAD = 1u << 0,
+ DF_INLINE = 1u << 1,
+ DF_TENTATIVE = 1u << 2,
+ DF_USED = 1u << 3,
+ DF_WEAK = 1u << 4,
+ DF_STATIC_LOCAL = 1u << 5,
+} DeclFlag;
+
+typedef struct Decl {
+ DeclId id;
+ Sym name;
+ const Type* type;
+ ObjSymId obj_sym;
+ ObjSecId section_id; /* optional explicit section; OBJ_SEC_NONE => default */
+ SrcLoc loc;
+ u8 storage; /* DeclStorage */
+ u8 linkage; /* DeclLinkage */
+ u8 visibility; /* SymVis */
+ u8 pad;
+ u32 flags; /* DeclFlag */
+} Decl;
+
+typedef enum InitKind {
+ INIT_ZERO,
+ INIT_BYTES,
+ INIT_RELOC,
+ INIT_FILL,
+} InitKind;
+
+typedef struct InitReloc {
+ RelocKind kind;
+ ObjSymId target;
+ i64 addend;
+ u32 width;
+} InitReloc;
+
+typedef struct InitItem {
+ u32 offset; /* byte offset inside the initialized object */
+ u32 size;
+ u8 kind; /* InitKind */
+ u8 pad[3];
+ union {
+ ConstBytes bytes;
+ InitReloc reloc;
+ struct { u8 byte; } fill;
+ } v;
+} InitItem;
+
+DeclTable* decl_new(Compiler*, ObjBuilder*);
+void decl_free(DeclTable*);
+
+DeclId decl_declare(DeclTable*, const Decl*);
+const Decl* decl_get(const DeclTable*, DeclId);
+ObjSymId decl_obj_sym(const DeclTable*, DeclId);
+
+void decl_define_function(DeclTable*, DeclId, ObjSecId text_section_id,
+ u64 value, u64 size);
+void decl_define_object (DeclTable*, DeclId, u64 size, u32 align,
+ const InitItem* init, u32 ninit);
+void decl_define_tentative(DeclTable*, DeclId, u64 size, u32 align);
+
+#endif
diff --git a/src/driver/driver.h b/src/driver/driver.h
@@ -0,0 +1,23 @@
+#ifndef CFREE_DRIVER_H
+#define CFREE_DRIVER_H
+
+#include "../core/core.h"
+
+typedef enum Tool {
+ TOOL_CC,
+ TOOL_CPP,
+ TOOL_AS,
+ TOOL_LD,
+ TOOL_AR,
+ TOOL_OBJDUMP,
+ TOOL_DBG,
+} Tool;
+
+/* Multi-call entry: dispatches by argv[0] basename, falling back to argv[1]
+ * (e.g. `cfree cc ...`). */
+int driver_main(int argc, char** argv);
+
+/* Direct entry per tool. */
+int driver_run(Tool, int argc, char** argv);
+
+#endif
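Note on driver.h: the multi-call dispatch described for driver_main (argv[0] basename first, then argv[1]) could look roughly like the sketch below. This is an illustration, not the cfree implementation: the name table, `basename_of`, `tool_from_name`, and `select_tool` are hypothetical helpers, and only POSIX-style `/` separators are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum Tool {
    TOOL_CC, TOOL_CPP, TOOL_AS, TOOL_LD, TOOL_AR, TOOL_OBJDUMP, TOOL_DBG,
} Tool;

/* Hypothetical name table; entry order matches the Tool enum above. */
static const char* const tool_names[] = {
    "cc", "cpp", "as", "ld", "ar", "objdump", "dbg",
};

/* Strip any directory prefix, e.g. "/usr/bin/cc" -> "cc". */
static const char* basename_of(const char* path) {
    const char* slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* Returns 1 and sets *out if `name` is a known tool name, else 0. */
static int tool_from_name(const char* name, Tool* out) {
    for (size_t i = 0; i < sizeof tool_names / sizeof tool_names[0]; i++) {
        if (strcmp(name, tool_names[i]) == 0) {
            *out = (Tool)i;
            return 1;
        }
    }
    return 0;
}

/* Try the basename of argv[0], then argv[1]. *skip receives the number of
 * extra leading arguments consumed while selecting the tool (0 or 1).
 * Returns 0 if neither position names a tool. */
static int select_tool(int argc, char** argv, Tool* out, int* skip) {
    assert(out != NULL && skip != NULL);
    if (argc >= 1 && tool_from_name(basename_of(argv[0]), out)) {
        *skip = 0;
        return 1;
    }
    if (argc >= 2 && tool_from_name(argv[1], out)) {
        *skip = 1;
        return 1;
    }
    return 0;
}
```

Under these assumptions a symlink named `objdump` selects TOOL_OBJDUMP directly, while `cfree cc -c x.c` selects TOOL_CC and tells the caller to skip one extra argument before handing the rest to driver_run.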
diff --git a/src/lex/lex.h b/src/lex/lex.h
@@ -0,0 +1,102 @@
+#ifndef CFREE_LEX_H
+#define CFREE_LEX_H
+
+#include "../core/core.h"
+
+typedef enum TokKind {
+ TOK_EOF = 0,
+ TOK_IDENT, /* v.ident */
+ TOK_NUM, /* lit */
+ TOK_FLT, /* lit */
+ TOK_STR, /* lit; v.str is decoded bytes if target-independent */
+ TOK_CHR, /* lit */
+ TOK_PUNCT, /* v.punct */
+ TOK_PP_HASH, /* # */
+ TOK_PP_PASTE, /* ## */
+ TOK_NEWLINE, /* visible to PP only */
+ TOK_KW_FIRST,
+ /* C11 keywords are inserted into this range by parse_c via pool */
+ TOK_KW_LAST = 0x1000,
+} TokKind;
+
+typedef enum TokFlag {
+ TF_AT_BOL = 1u << 0,
+ TF_HAS_SPACE = 1u << 1,
+ TF_NO_EXPAND = 1u << 2,
+ TF_INT_U = 1u << 3,
+ TF_INT_L = 1u << 4,
+ TF_INT_LL = 1u << 5,
+ TF_FLT_F = 1u << 6,
+ TF_FLT_L = 1u << 7,
+ TF_STR_WIDE = 1u << 8,
+ TF_STR_U8 = 1u << 9,
+ TF_STR_U16 = 1u << 10,
+ TF_STR_U32 = 1u << 11,
+ TF_LITERAL_BAD = 1u << 12,
+} TokFlag;
+
+typedef enum Punct {
+ P_NONE = 0,
+ /* Single-char punctuators reuse their ASCII codepoint here. */
+ P_ARROW = 256, P_INC, P_DEC,
+ P_SHL, P_SHR,
+ P_LE, P_GE, P_EQ, P_NE,
+ P_AND, P_OR,
+ P_ADD_ASSIGN, P_SUB_ASSIGN, P_MUL_ASSIGN, P_DIV_ASSIGN, P_MOD_ASSIGN,
+ P_AND_ASSIGN, P_OR_ASSIGN, P_XOR_ASSIGN, P_SHL_ASSIGN, P_SHR_ASSIGN,
+ P_ELLIPSIS,
+ P_HASH_HASH,
+} Punct;
+
+typedef u32 LitId;
+#define LIT_NONE 0u
+
+typedef enum LitKind {
+ LIT_INT,
+ LIT_FLOAT,
+ LIT_STRING,
+ LIT_CHAR,
+} LitKind;
+
+typedef enum LitEnc {
+ LENC_ORDINARY,
+ LENC_UTF8,
+ LENC_WIDE,
+ LENC_UTF16,
+ LENC_UTF32,
+} LitEnc;
+
+typedef struct LitInfo {
+ u8 kind; /* LitKind */
+ u8 enc; /* LitEnc for strings/chars */
+ u16 flags; /* TokFlag suffix/encoding bits */
+ Sym spelling; /* exact source spelling */
+ Sym bytes; /* decoded bytes/code units, if already decoded */
+} LitInfo;
+
+typedef struct Tok {
+ u16 kind;
+ u16 flags;
+ SrcLoc loc;
+ Sym spelling; /* exact token spelling for diagnostics/#/## */
+ LitId lit; /* literal-table handle; LIT_NONE otherwise */
+ union {
+ Sym ident;
+ Sym str;
+ u32 punct;
+ } v;
+} Tok;
+
+typedef struct Lexer Lexer;
+
+Lexer* lex_open(Compiler*, const char* path);
+Lexer* lex_open_mem(Compiler*, const char* name, const char* src, size_t len);
+void lex_close(Lexer*);
+
+/* Streaming. Returns TOK_EOF repeatedly at end of input. */
+Tok lex_next(Lexer*);
+SrcLoc lex_loc(const Lexer*);
+u32 lex_file_id(const Lexer*);
+const LitInfo* lex_lit(const Lexer*, LitId);
+
+#endif
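Note on lex.h: the Punct comment says single-character punctuators reuse their ASCII codepoint, and the named multi-character values start at 256. A minimal sketch of what that implies for consumers follows. The enum is abbreviated from the header, and `punct_from_char` and `punct_is_single_char` are hypothetical helpers, not part of the declared API.

```c
#include <assert.h>

/* Abbreviated copy of the Punct enum: only the first few named values. */
typedef enum Punct {
    P_NONE = 0,
    P_ARROW = 256, P_INC, P_DEC,
    P_SHL, P_SHR,
} Punct;

/* Hypothetical helper: a single-char punctuator is stored as its codepoint. */
static unsigned punct_from_char(char c) {
    return (unsigned)(unsigned char)c;
}

/* Hypothetical helper: values 1..255 are single-char punctuators, 0 is
 * P_NONE, and anything from 256 upward is a named multi-char punctuator. */
static int punct_is_single_char(unsigned p) {
    return p > 0u && p < 256u;
}
```

Because the two ranges cannot collide, a parser can switch on `v.punct` with plain character constants such as `'+'` alongside the `P_*` names.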
diff --git a/src/link/link.h b/src/link/link.h
@@ -0,0 +1,132 @@
+#ifndef CFREE_LINK_H
+#define CFREE_LINK_H
+
+#include "../obj/obj.h"
+
+typedef struct Linker Linker;
+typedef struct LinkImage LinkImage;
+
+typedef enum LinkInputKind {
+ LINK_INPUT_OBJ,
+ LINK_INPUT_OBJ_FILE,
+ LINK_INPUT_ARCHIVE,
+} LinkInputKind;
+
+typedef u32 LinkInputId;
+#define LINK_INPUT_NONE 0u
+
+typedef u32 LinkSymId;
+#define LINK_SYM_NONE 0u
+
+typedef u32 LinkSegmentId;
+#define LINK_SEG_NONE 0u
+
+typedef u32 LinkSectionId;
+#define LINK_SEC_NONE 0u
+
+typedef struct LinkInput {
+ LinkInputId id;
+ u8 kind; /* LinkInputKind */
+ u8 pad[3];
+ ObjBuilder* obj; /* for LINK_INPUT_OBJ, otherwise NULL until read */
+ Sym path; /* for file/archive inputs */
+} LinkInput;
+
+typedef struct LinkSymbol {
+ LinkSymId id;
+ Sym name;
+ LinkInputId input_id;
+ ObjSymId obj_sym;
+ ObjSecId section_id;
+ u64 value;
+ u64 vaddr; /* final linked address, 0 for unresolved undef */
+ u64 size;
+ u8 bind; /* SymBind */
+ u8 kind; /* SymKind */
+ u8 defined;
+ u8 pad;
+} LinkSymbol;
+
+typedef struct LinkSegment {
+ LinkSegmentId id;
+ u32 flags; /* SecFlag-like permissions after layout */
+ u64 file_offset;
+ u64 vaddr;
+ u64 mem_size;
+ u64 file_size;
+ u32 align;
+ u32 nsections;
+} LinkSegment;
+
+typedef struct LinkSection {
+ LinkSectionId id;
+ LinkInputId input_id;
+ ObjSecId obj_section_id;
+ LinkSegmentId segment_id;
+ u64 input_offset;
+ u64 file_offset;
+ u64 vaddr;
+ u64 size;
+ u32 flags;
+ u32 align;
+} LinkSection;
+
+typedef struct LinkRelocApply {
+ LinkInputId input_id;
+ ObjSecId section_id;
+ LinkSectionId link_section_id;
+ u32 offset;
+ u32 width;
+ u64 write_vaddr;
+ u64 write_file_offset;
+ RelocKind kind;
+ LinkSymId target;
+ i64 addend;
+} LinkRelocApply;
+
+typedef void* (*LinkExternResolver)(void* user, Sym name);
+
+typedef struct JitImage {
+ LinkImage* image;
+ void* base;
+ size_t size;
+} JitImage;
+
+Linker* link_new(Compiler*);
+void link_free(Linker*);
+
+LinkInputId link_add_obj(Linker*, ObjBuilder*); /* fresh-compiled */
+LinkInputId link_add_obj_file(Linker*, const char* path); /* read .o from disk */
+LinkInputId link_add_archive(Linker*, const char* path); /* .a / static archive */
+void link_add_lib_search_path(Linker*, const char* dir);
+void link_set_entry(Linker*, Sym name);
+void link_set_script(Linker*, const char* path);
+void link_set_extern_resolver(Linker*, LinkExternResolver, void* user);
+
+/* Symbol resolution and layout are explicit so file linking and JIT share the
+ * same resolved image. Fatal diagnostics use Compiler.panic. */
+LinkImage* link_resolve(Linker*);
+void link_image_free(LinkImage*);
+const LinkSymbol* link_symbol(LinkImage*, LinkSymId);
+LinkSymId link_symbol_lookup(LinkImage*, Sym name);
+u32 link_segment_count(LinkImage*);
+const LinkSegment* link_segment_get(LinkImage*, u32 id);
+const u8* link_segment_bytes(LinkImage*, LinkSegmentId, size_t* size_out);
+u32 link_section_count(LinkImage*);
+const LinkSection* link_section_get(LinkImage*, LinkSectionId id);
+u32 link_reloc_apply_count(LinkImage*);
+const LinkRelocApply* link_reloc_apply_get(LinkImage*, u32 id);
+
+/* Writes an executable in the format implied by Compiler.target. */
+void link_emit_exe(Linker*, const char* out_path);
+void link_emit_image(LinkImage*, const char* out_path);
+
+/* JIT: maps sections into memory, applies relocations, returns the address of
+ * the entry symbol (or any named symbol via link_jit_lookup). */
+void* link_jit(Linker*);
+JitImage* link_jit_image(LinkImage*);
+void* link_jit_lookup(Linker*, Sym name);
+void* jit_image_lookup(JitImage*, Sym name);
+void jit_image_free(JitImage*);
+
+#endif
diff --git a/src/obj/obj.h b/src/obj/obj.h
@@ -0,0 +1,239 @@
+#ifndef CFREE_OBJ_H
+#define CFREE_OBJ_H
+
+#include "../core/core.h"
+#include "../core/buf.h"
+
+typedef enum SecKind {
+ SEC_TEXT,
+ SEC_RODATA,
+ SEC_DATA,
+ SEC_BSS,
+ SEC_DEBUG,
+ SEC_OTHER,
+} SecKind;
+
+typedef enum SecFlag {
+ SF_EXEC = 1u << 0,
+ SF_WRITE = 1u << 1,
+ SF_ALLOC = 1u << 2,
+ SF_TLS = 1u << 3,
+ SF_MERGE = 1u << 4,
+ SF_STRINGS = 1u << 5,
+ SF_GROUP = 1u << 6,
+ SF_LINK_ORDER= 1u << 7,
+} SecFlag;
+
+typedef enum SecSem {
+ SSEM_PROGBITS,
+ SSEM_NOBITS,
+ SSEM_SYMTAB,
+ SSEM_STRTAB,
+ SSEM_RELA,
+ SSEM_REL,
+ SSEM_NOTE,
+ SSEM_INIT_ARRAY,
+ SSEM_FINI_ARRAY,
+ SSEM_PREINIT_ARRAY,
+ SSEM_GROUP,
+ SSEM_WASM_CUSTOM,
+} SecSem;
+
+typedef enum SymBind {
+ SB_LOCAL,
+ SB_GLOBAL,
+ SB_WEAK,
+} SymBind;
+
+typedef enum SymVis {
+ SV_DEFAULT,
+ SV_HIDDEN,
+ SV_PROTECTED,
+ SV_INTERNAL,
+} SymVis;
+
+typedef enum SymKind {
+ SK_UNDEF,
+ SK_FUNC,
+ SK_OBJ,
+ SK_SECTION,
+ SK_FILE,
+ SK_COMMON,
+ SK_TLS,
+ SK_ABS,
+} SymKind;
+
+typedef enum ObjExtKind {
+ OBJ_EXT_NONE,
+ OBJ_EXT_ELF,
+ OBJ_EXT_COFF,
+ OBJ_EXT_MACHO,
+ OBJ_EXT_WASM,
+} ObjExtKind;
+
+typedef u32 ObjSecId;
+#define OBJ_SEC_NONE 0u
+
+typedef u32 ObjGroupId;
+#define OBJ_GROUP_NONE 0u
+
+/* Per-ObjBuilder symbol handle. Object files own their symbol namespace:
+ * local/static symbols, section symbols, file symbols, unnamed labels, common
+ * definitions, and external references are all represented by ObjSymId values
+ * scoped to one builder. 0 is reserved as "none". */
+typedef u32 ObjSymId;
+#define OBJ_SYM_NONE 0u
+
+typedef enum RelocKind {
+ R_NONE = 0,
+ R_ABS32, R_ABS64,
+ R_REL32, R_REL64,
+ R_PC32, R_PC64,
+ R_GOT32, R_PLT32,
+ R_ARM_CALL, R_ARM_MOVW, R_ARM_MOVT, R_ARM_B26,
+ R_AARCH64_CALL26, R_AARCH64_ADR_PREL_PG_HI21, R_AARCH64_ADD_ABS_LO12_NC,
+ R_RV_HI20, R_RV_LO12_I, R_RV_LO12_S, R_RV_BRANCH, R_RV_JAL, R_RV_CALL,
+ R_WASM_FUNCIDX, R_WASM_TABLEIDX, R_WASM_MEMOFS, R_WASM_TYPEIDX,
+} RelocKind;
+
+typedef struct Section {
+ Sym name;
+ u16 kind;
+ u16 flags;
+ u16 sem; /* SecSem */
+ u16 ext_kind; /* ObjExtKind */
+ u32 align;
+ u32 entsize;
+ ObjSecId link; /* section index or OBJ_SEC_NONE */
+ u32 info; /* section-format dependent, typed by sem/ext_kind */
+ ObjGroupId group_id; /* OBJ_GROUP_NONE if not in a COMDAT/group */
+ u32 bss_size; /* nonzero only for SEC_BSS */
+ Buf bytes;
+} Section;
+
+typedef struct Reloc {
+ ObjSecId section_id;
+ u32 offset;
+ u16 kind;
+ u8 has_explicit_addend;
+ u8 pair; /* paired/following relocation, format-specific */
+ ObjSymId sym;
+ i64 addend;
+} Reloc;
+
+typedef struct ObjSym {
+ Sym name;
+ u16 bind;
+ u16 kind;
+ u8 vis;
+ u8 ext_kind;
+ u16 flags;
+ ObjSecId section_id; /* OBJ_SEC_NONE if undef */
+ u64 value; /* offset within section, or absolute */
+ u64 size;
+ u64 common_align; /* nonzero for SK_COMMON */
+} ObjSym;
+
+typedef struct ObjGroup {
+ Sym name;
+ ObjSymId signature;
+ ObjSecId* sections;
+ u32 nsections;
+ u32 flags;
+} ObjGroup;
+
+/* The single concrete in-memory object representation.
+ * Written by MCEmitter/CGTarget (during compile) or by an .o reader (during link).
+ * Read by file emitters, the linker (file and JIT), and objdump.
+ *
+ * Invariant: post-finalize state is identical in shape to what an .o reader
+ * would produce from a written-out object — so consumers don't care which
+ * path produced it.
+ *
+ * Lifecycle gates:
+ * 1. MCEmitter/CGTarget (or a .o reader) issues writes.
+ * 2. cgtarget_finalize must be called before any debug_emit or read access on
+ * the builder. At -O2 it flushes lowered code into sections.
+ * 3. debug_emit (if -g) writes .debug_* sections.
+ * 4. obj_finalize closes the builder: computes flat section offsets, applies
+ * pending fixups within sections, and freezes the read-side view.
+ * No further writes are permitted afterward.
+ * 5. File emitters and the linker consume via the read API. */
+typedef struct ObjBuilder ObjBuilder;
+
+ObjBuilder* obj_new(Compiler*);
+void obj_free(ObjBuilder*);
+
+/* ---- write side (MCEmitter/CGTarget and .o readers) ---- */
+ObjSecId obj_section(ObjBuilder*, Sym name, SecKind, u16 flags, u32 align);
+ObjSecId obj_section_ex(ObjBuilder*, Sym name, SecKind, SecSem, u16 flags,
+ u32 align, u32 entsize, u32 link, u32 info);
+void obj_section_set_flags(ObjBuilder*, ObjSecId, u16 flags);
+void obj_section_set_align(ObjBuilder*, ObjSecId, u32 align);
+void obj_section_set_group(ObjBuilder*, ObjSecId, ObjGroupId);
+void obj_write (ObjBuilder*, ObjSecId section_id, const void* data, size_t n);
+u8* obj_reserve(ObjBuilder*, ObjSecId section_id, size_t n);
+void obj_reserve_bss(ObjBuilder*, ObjSecId section_id, u32 size, u32 align);
+u32 obj_pos (ObjBuilder*, ObjSecId section_id);
+void obj_patch (ObjBuilder*, ObjSecId section_id, u32 ofs, const void* data, size_t n);
+
+ObjSymId obj_symbol(ObjBuilder*, Sym name, SymBind, SymKind,
+ ObjSecId section_id, u64 value, u64 size);
+ObjSymId obj_symbol_ex(ObjBuilder*, Sym name, SymBind, SymVis, SymKind,
+ ObjSecId section_id, u64 value, u64 size, u64 common_align);
+void obj_symbol_define(ObjBuilder*, ObjSymId, ObjSecId section_id, u64 value, u64 size);
+
+void obj_reloc(ObjBuilder*, ObjSecId section_id, u32 offset,
+ RelocKind, ObjSymId sym, i64 addend);
+void obj_reloc_ex(ObjBuilder*, ObjSecId section_id, u32 offset, RelocKind,
+ ObjSymId sym, i64 addend, int explicit_addend, int pair);
+
+ObjGroupId obj_group(ObjBuilder*, Sym name, ObjSymId signature, u32 flags);
+void obj_group_add_section(ObjBuilder*, ObjGroupId group_id, ObjSecId section_id);
+
+void obj_finalize(ObjBuilder*);
+
+/* ---- read side (linker, file emitters, objdump) ---- */
+u32 obj_section_count(const ObjBuilder*);
+const Section* obj_section_get (const ObjBuilder*, ObjSecId id);
+u32 obj_reloc_count (const ObjBuilder*, ObjSecId section_id);
+const Reloc* obj_relocs (const ObjBuilder*, ObjSecId section_id);
+const ObjSym* obj_symbol_get (const ObjBuilder*, ObjSymId);
+u32 obj_group_count (const ObjBuilder*);
+const ObjGroup* obj_group_get (const ObjBuilder*, ObjGroupId id);
+
+/* Symbol iteration: ObjSymId is scoped to this builder, but callers should not
+ * assume dense contiguous ids or direct indexing. The builder may store symbols
+ * in segments internally; use the cursor. */
+typedef struct ObjSymIter ObjSymIter;
+typedef struct ObjSymEntry {
+ ObjSymId id;
+ const ObjSym* sym;
+} ObjSymEntry;
+ObjSymIter* obj_symiter_new (const ObjBuilder*);
+int obj_symiter_next(ObjSymIter*, ObjSymEntry* out); /* returns 0 at end */
+void obj_symiter_free(ObjSymIter*);
+
+/* ---- streaming output sink (for file emitters) ---- */
+typedef struct Writer Writer;
+Writer* writer_file(const char* path);
+Writer* writer_mem (Heap*);
+void writer_write(Writer*, const void* data, size_t n);
+void writer_seek (Writer*, u64 offset);
+u64 writer_tell (Writer*);
+int writer_error(Writer*);
+void writer_close(Writer*);
+
+/* ---- file format emitters ---- */
+void emit_elf (Compiler*, ObjBuilder*, Writer*);
+void emit_coff (Compiler*, ObjBuilder*, Writer*);
+void emit_macho(Compiler*, ObjBuilder*, Writer*);
+void emit_wasm (Compiler*, ObjBuilder*, Writer*);
+
+/* ---- file format readers (for ld and objdump) ---- */
+ObjBuilder* read_elf (Compiler*, const char* path);
+ObjBuilder* read_coff (Compiler*, const char* path);
+ObjBuilder* read_macho(Compiler*, const char* path);
+ObjBuilder* read_wasm (Compiler*, const char* path);
+
+#endif
diff --git a/src/opt/ir.h b/src/opt/ir.h
@@ -0,0 +1,181 @@
+#ifndef CFREE_IR_H
+#define CFREE_IR_H
+
+#include "../core/core.h"
+#include "../core/arena.h"
+#include "../arch/arch.h"
+#include "../type/type.h"
+
+typedef u32 Val;
+#define VAL_NONE 0u
+
+typedef enum IROp {
+ IR_NOP,
+ IR_CONST_I, IR_CONST_BYTES,
+ IR_PARAM,
+ IR_ALLOCA,
+ IR_LOAD, IR_STORE,
+ IR_AGG_COPY, IR_AGG_SET,
+ IR_BITFIELD_LOAD, IR_BITFIELD_STORE,
+ IR_IADD, IR_ISUB, IR_IMUL,
+ IR_SDIV, IR_UDIV, IR_SREM, IR_UREM,
+ IR_FADD, IR_FSUB, IR_FMUL, IR_FDIV,
+ IR_AND, IR_OR, IR_XOR,
+ IR_SHL, IR_ASHR, IR_LSHR,
+ IR_NEG, IR_BNOT,
+ IR_CMP_EQ, IR_CMP_NE,
+ IR_CMP_SLT, IR_CMP_SLE, IR_CMP_ULT, IR_CMP_ULE,
+ IR_CMP_FLT, IR_CMP_FLE, IR_CMP_FEQ, IR_CMP_FNE,
+ IR_SEXT, IR_ZEXT, IR_TRUNC, IR_BITCAST,
+ IR_SITOFP, IR_UITOFP, IR_FPTOSI, IR_FPTOUI, IR_FPEXT, IR_FPTRUNC,
+ IR_GEP,
+ IR_CALL,
+ IR_PHI,
+ IR_BR, IR_CONDBR, IR_RET,
+ IR_ATOMIC_LOAD, IR_ATOMIC_STORE,
+ IR_ATOMIC_RMW, /* extra.imm encodes (AtomicOp << 8) | MemOrder */
+ IR_ATOMIC_CAS, /* extra.imm encodes (success << 8) | failure */
+ IR_FENCE, /* extra.imm = MemOrder */
+ IR_VA_START, IR_VA_ARG, IR_VA_END, IR_VA_COPY,
+ IR_SETJMP, /* returns-twice; opt treats as control barrier */
+ IR_LONGJMP, /* terminator-like; control does not return */
+ IR_ASM_BLOCK, /* opaque to most passes; preserves order, defines outs, clobbers */
+} IROp;
+
+typedef struct IRCallAux {
+ const Type* fn_type;
+ const ABIFuncInfo* abi;
+ ObjSymId direct_sym; /* OBJ_SYM_NONE for indirect */
+ Val callee; /* VAL_NONE for direct_sym calls */
+ u32 nargs;
+ Val* args;
+ u32 nresults;
+ Val* results; /* ABI return parts and multi-result builtins */
+ CGABIValue ret_abi;
+} IRCallAux;
+
+typedef struct IRFrameSlot {
+ FrameSlot id;
+ const Type* type;
+ Sym name;
+ SrcLoc loc;
+ u32 size;
+ u32 align;
+ u8 kind; /* FrameSlotKind */
+ u8 pad;
+ u16 flags; /* FrameSlotFlag */
+} IRFrameSlot;
+
+typedef struct IRParam {
+ u32 index;
+ Sym name;
+ const Type* type;
+ FrameSlot slot;
+ const ABIArgInfo* abi;
+ SrcLoc loc;
+} IRParam;
+
+typedef struct IRMemAux {
+ MemAccess mem;
+} IRMemAux;
+
+typedef struct IRAggregateAux {
+ AggregateAccess access;
+} IRAggregateAux;
+
+typedef struct IRBitFieldAux {
+ BitFieldAccess access;
+} IRBitFieldAux;
+
+typedef struct IRGepAux {
+ const Type* base_type;
+ u32 nindices;
+ i64* indices;
+} IRGepAux;
+
+typedef struct IRPhiAux {
+ u32 npreds;
+ u32* pred_blocks;
+ Val* pred_vals;
+} IRPhiAux;
+
+typedef struct IRAsmAux {
+ const char* tmpl;
+ const AsmConstraint* outs;
+ const AsmConstraint* ins;
+ const Sym* clobbers;
+ u32 nout, nin, nclob;
+} IRAsmAux;
+
+typedef struct IRCasAux {
+ MemAccess mem;
+ MemOrder success;
+ MemOrder failure;
+ Val prior;
+ Val ok;
+} IRCasAux;
+
+typedef struct Inst {
+ u16 op;
+ u16 flags; /* per-op flags (e.g. nsw/nuw, volatile) */
+ SrcLoc loc; /* set from CGTarget.set_loc when this insn was recorded */
+ const Type* type;
+ Val def; /* this instruction's SSA value, or VAL_NONE */
+ u32 ndefs; /* multi-result instructions use defs[0..ndefs) */
+ Val* defs; /* arena-allocated; NULL when ndefs <= 1 */
+ u32 nopnds;
+ Val* opnds; /* arena-allocated */
+ union {
+ i64 imm;
+ ConstBytes cbytes;
+ struct { ObjSymId sym; } objsym;
+ MemAccess mem;
+ void* aux; /* one of IR*Aux, arena-owned and typed by op */
+ } extra;
+} Inst;
+
+typedef struct Block {
+ u32 id;
+ Inst* insts;
+ u32 ninsts, cap;
+ u32* preds;
+ u32 npreds;
+ u32 succ[2]; /* condbr: 2; br: 1; ret: 0 */
+ u8 nsucc;
+} Block;
+
+typedef struct Func {
+ /* IR storage. Lives until cgtarget_finalize so inter-procedural passes can
+ * read every Func in the TU. Per-pass scratch goes in Arena scratch, not
+ * here. */
+ Arena* arena;
+ ObjSymId name;
+ const Type* type;
+ Block* blocks;
+ u32 nblocks, blocks_cap;
+ u32 entry; /* index of entry block */
+
+ IRFrameSlot* frame_slots;
+ u32 nframe_slots, frame_slots_cap;
+ IRParam* params;
+ u32 nparams, params_cap;
+
+ /* Value table: for each Val, where it's defined and its type. */
+ u32* val_def_block;
+ u32* val_def_inst;
+ const Type** val_type;
+ u32 nvals, vals_cap;
+} Func;
+
+Func* ir_func_new(Arena*, ObjSymId, const Type* fn_type);
+u32 ir_block_new(Func*);
+Val ir_emit(Func*, u32 block, IROp, const Type* result, const Val* opnds, u32 n);
+void ir_emit_multi(Func*, u32 block, IROp, const Type** results, Val* defs,
+ u32 ndefs, const Val* opnds, u32 nopnds);
+Val ir_emit_const_i(Func*, u32 block, const Type*, i64);
+Val ir_emit_const_bytes(Func*, u32 block, ConstBytes);
+FrameSlot ir_frame_slot_new(Func*, const FrameSlotDesc*);
+void ir_param_add(Func*, const CGParamDesc*);
+void ir_set_terminator(Func*, u32 block, IROp, u32 succ_a, u32 succ_b, Val cond);
+
+#endif
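Note on ir.h: IR_ATOMIC_RMW and IR_ATOMIC_CAS pack two small enums into extra.imm as (hi << 8) | lo. The sketch below shows one way to pack and unpack such an immediate. The Demo* enums are hypothetical stand-ins (the real AtomicOp and MemOrder are declared in arch.h, which is not shown here), and the helpers assume the low field fits in 8 bits, as the encoding implies.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for MemOrder and AtomicOp from arch.h. */
typedef enum DemoMemOrder {
    DEMO_ORDER_RELAXED, DEMO_ORDER_ACQUIRE, DEMO_ORDER_RELEASE, DEMO_ORDER_SEQ_CST,
} DemoMemOrder;

typedef enum DemoAtomicOp {
    DEMO_ATOMIC_ADD, DEMO_ATOMIC_SUB, DEMO_ATOMIC_XCHG,
} DemoAtomicOp;

/* Pack (hi << 8) | lo. For RMW, hi is the op and lo the order; for CAS,
 * hi is the success order and lo the failure order. */
static int64_t imm_pack(unsigned hi, unsigned lo) {
    assert(lo <= 0xffu); /* low field must fit in 8 bits */
    return (int64_t)(((uint64_t)hi << 8) | (uint64_t)lo);
}

/* Recover the high field (AtomicOp, or success order for CAS). */
static unsigned imm_hi(int64_t imm) {
    return (unsigned)((uint64_t)imm >> 8);
}

/* Recover the low field (MemOrder, or failure order for CAS). */
static unsigned imm_lo(int64_t imm) {
    return (unsigned)((uint64_t)imm & 0xffu);
}
```

A pass that only needs the ordering, for example to decide whether an op acts as a barrier, can call `imm_lo` without decoding the operation.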
diff --git a/src/opt/opt.h b/src/opt/opt.h
@@ -0,0 +1,78 @@
+#ifndef CFREE_OPT_H
+#define CFREE_OPT_H
+
+#include "../arch/arch.h"
+#include "ir.h"
+
+/* opt_cgtarget: a CGTarget wrapper that records each function as IR.
+ *
+ * - alloc_reg returns a fresh virtual reg per call (typed). The Reg space is
+ * unbounded for opt_cgtarget; free_reg is treated as a hint and ignored.
+ * - Every other emit-side call is recorded into the current block as one
+ * SSA Inst (with the current SrcLoc from set_loc).
+ * - On CGTarget.func_end it runs the intra-procedural pipeline (down through
+ * jump_opt) and stores the optimized Func in a per-TU set.
+ * - On CGTarget.finalize it runs inter-procedural passes (inlining + cleanup),
+ * then for each Func runs machinize → live_info → coalesce → regalloc →
+ * combine → dce → emit (prolog/epilog, translate), driving the wrapped CGTarget.
+ *
+ * No machine code is in `obj` until the driver calls cgtarget_finalize.
+ * Drivers must call it before reading `obj` or invoking debug_emit.
+ *
+ * Owns `target` and frees it via cgtarget_free(target) when it is itself destroyed.
+ *
+ * level:
+ * 0 — caller should not use opt_cgtarget at all (drive target directly).
+ * 1 — minimal: combine + DCE during lowering. No SSA passes. No inlining.
+ * 2 — full pipeline below. Inlining enabled. */
+CGTarget* opt_cgtarget_new(Compiler*, CGTarget* target, int level);
+
+/* ----- intra-procedural passes (run per Func at func_end on -O2) ----- */
+void opt_build_cfg (Func*);
+void opt_block_cloning (Func*);
+void opt_build_ssa (Func*);
+void opt_addr_xform (Func*);
+void opt_gvn (Func*); /* incl. constprop, redundant-load elim */
+void opt_copy_prop (Func*); /* incl. redundant-extension elim */
+void opt_dse (Func*); /* dead store elimination */
+void opt_ssa_dce (Func*);
+void opt_licm (Func*); /* requires loop tree built */
+void opt_pressure_relief (Func*);
+void opt_make_conventional_ssa(Func*);
+void opt_ssa_combine (Func*);
+void opt_undo_ssa (Func*);
+void opt_jump_opt (Func*);
+
+/* ----- inter-procedural passes (run on the whole Func set at finalize) ----- */
+typedef struct FuncSet FuncSet;
+
+/* Walks the call graph bottom-up. For each caller, inlines callees that fit
+ * the size/heuristic budget, marks the caller dirty, and queues it for
+ * opt_cleanup. SCCs (mutual recursion) are skipped for v1.
+ *
+ * Iteration count is bounded by `max_iters` (driver knob `-finline-iters=N`,
+ * default 1; cap is enforced by opt_cgtarget). */
+void opt_inline(FuncSet*, int max_iters);
+
+/* Cheap re-run of the intra-procedural pipeline tailored to "what inlining
+ * exposes": constfold, copy_prop, gvn, ssa_dce, jump_opt, licm if the
+ * function has loops, addr_xform if any GEP-equivalents reach uses. Run on
+ * each Func that opt_inline marked dirty. */
+void opt_cleanup(Func*);
+
+/* ----- lowering / backend prep (per Func, run before driving target CGTarget) ----- */
+/* Machine-dependent ABI lowering, 2-op insns, etc. Implemented per-arch and
+ * per-OS, so it takes the full Target. */
+void opt_machinize (Func*, Target);
+void opt_live_info (Func*);
+void opt_coalesce (Func*);
+void opt_regalloc (Func*, int allow_live_range_split);
+void opt_combine (Func*); /* code selection: merge dependent insns */
+void opt_dce (Func*); /* post-RA DCE */
+
+/* Walks the lowered IR and drives a target CGTarget to emit machine code into its
+ * ObjBuilder. Inserts prolog/epilog. Splits long insns where the target requires it.
+ * Stamps each emitted insn's SrcLoc onto target via CGTarget.set_loc. */
+void opt_emit (Compiler*, Func*, CGTarget* target);
+
+#endif
diff --git a/src/parse/parse.h b/src/parse/parse.h
@@ -0,0 +1,16 @@
+#ifndef CFREE_PARSE_H
+#define CFREE_PARSE_H
+
+#include "../pp/pp.h"
+#include "../decl/decl.h"
+#include "../cg/cg.h"
+#include "../arch/arch.h"
+
+/* C11 frontend. Reads tokens from `pp`, records C declarations in DeclTable,
+ * and drives `cg` for executable code. */
+void parse_c(Compiler*, Pp*, DeclTable*, CG*);
+
+/* Standalone assembler. Reads tokens directly from a Lexer; emits via MCEmitter. */
+void parse_asm(Compiler*, Lexer*, MCEmitter*);
+
+#endif
diff --git a/src/pp/pp.h b/src/pp/pp.h
@@ -0,0 +1,23 @@
+#ifndef CFREE_PP_H
+#define CFREE_PP_H
+
+#include "../lex/lex.h"
+
+typedef struct Pp Pp;
+
+Pp* pp_new(Compiler*);
+void pp_free(Pp*);
+
+void pp_add_include_dir(Pp*, const char* dir, int system);
+void pp_define(Pp*, const char* name, const char* body); /* -D */
+void pp_undef(Pp*, const char* name); /* -U */
+
+void pp_push_input(Pp*, Lexer*);
+void pp_add_include_edge(Pp*, u32 includer_file_id, u32 included_file_id,
+ SrcLoc include_loc, int system);
+
+/* Streaming. Yields preprocessed tokens (macro-expanded, directives consumed). */
+Tok pp_next(Pp*);
+const LitInfo* pp_lit(const Pp*, LitId);
+
+#endif
diff --git a/src/type/type.h b/src/type/type.h
@@ -0,0 +1,115 @@
+#ifndef CFREE_TYPE_H
+#define CFREE_TYPE_H
+
+#include "../core/core.h"
+#include "../core/pool.h"
+
+typedef enum TypeKind {
+ TY_VOID,
+ TY_BOOL,
+ TY_CHAR, TY_SCHAR, TY_UCHAR,
+ TY_SHORT, TY_USHORT,
+ TY_INT, TY_UINT,
+ TY_LONG, TY_ULONG,
+ TY_LLONG, TY_ULLONG,
+ TY_FLOAT, TY_DOUBLE, TY_LDOUBLE,
+ TY_PTR,
+ TY_ARRAY,
+ TY_FUNC,
+ TY_STRUCT,
+ TY_UNION,
+ TY_ENUM,
+} TypeKind;
+
+/* C tag identity is scoped declaration identity, not the spelling. `Sym tag`
+ * remains the diagnostic/debug name; TagId prevents two scoped `struct S`
+ * declarations from collapsing under global Type interning. */
+typedef u32 TagId;
+#define TAG_NONE 0u
+
+typedef enum TagDeclKind {
+ TAG_STRUCT,
+ TAG_UNION,
+ TAG_ENUM,
+} TagDeclKind;
+
+typedef struct TagDecl {
+ TagId id;
+ Sym spelling;
+ SrcLoc loc;
+ u8 kind; /* TagDeclKind */
+ u8 complete;
+ u16 pad;
+} TagDecl;
+
+typedef enum TypeQual {
+ Q_CONST = 1u << 0,
+ Q_VOLATILE = 1u << 1,
+ Q_RESTRICT = 1u << 2,
+ Q_ATOMIC = 1u << 3,
+} TypeQual;
+
+typedef enum FieldFlag {
+ FIELD_NONE = 0,
+ FIELD_BITFIELD = 1u << 0,
+ FIELD_ZERO_WIDTH = 1u << 1,
+ FIELD_ANON = 1u << 2,
+ FIELD_FLEXIBLE_ARRAY = 1u << 3,
+} FieldFlag;
+
+typedef struct Field {
+ Sym name;
+ const Type* type;
+ u16 bitfield_width; /* valid when FIELD_BITFIELD is set; may be 0 */
+ u16 flags; /* FieldFlag */
+} Field;
+
+struct Type {
+ u16 kind;
+ u16 qual;
+ union {
+ struct { const Type* pointee; } ptr;
+ struct { const Type* elem; u32 count; u8 incomplete; } arr;
+ struct {
+ const Type* ret;
+ const Type** params;
+ u16 nparams;
+ u8 variadic;
+ } fn;
+ struct {
+ TagId tag_id;
+ Sym tag;
+ const Field* fields;
+ u16 nfields;
+ u8 incomplete;
+ } rec; /* struct / union */
+ struct { TagId tag_id; Sym tag; const Type* base; } enm;
+ };
+};
+
+const Type* type_void(Pool*);
+const Type* type_prim(Pool*, TypeKind);
+const Type* type_ptr(Pool*, const Type*);
+const Type* type_array(Pool*, const Type* elem, u32 count, int incomplete);
+const Type* type_func(Pool*, const Type* ret, const Type** params, u16 n, int variadic);
+const Type* type_qualified(Pool*, const Type*, u16 qual);
+
+/* Aggregate construction is mutable only through TypeRecordBuilder. The
+ * committed Type is immutable and interned; field offsets, record
+ * size/alignment, and bitfield storage are target ABI facts, not Type fields. */
+typedef struct TypeRecordBuilder TypeRecordBuilder;
+TagId type_tag_new(Pool*, TagDeclKind, Sym spelling, SrcLoc);
+const TagDecl* type_tag_get(Pool*, TagId);
+TypeRecordBuilder* type_record_begin(Pool*, TypeKind kind, TagId, Sym tag); /* TY_STRUCT or TY_UNION */
+void type_record_field(TypeRecordBuilder*, Field);
+const Type* type_record_end(Pool*, TypeRecordBuilder*);
+const Type* type_enum(Pool*, TagId, Sym tag, const Type* base);
+
+const Type* type_unqual(Pool*, const Type*);
+const Type* type_promoted(Pool*, const Type*);
+int type_compatible(const Type*, const Type*);
+int type_is_arith(const Type*);
+int type_is_int(const Type*);
+int type_is_ptr(const Type*);
+
+#endif
diff --git a/test/smoke.c b/test/smoke.c
@@ -34,7 +34,7 @@
#include <stdarg.h>
#include <stdatomic.h>
#include <stdbool.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
#include <stddef.h>
#include <stdint.h>
#include <stdnoreturn.h>
@@ -145,7 +145,7 @@ static int cfree_setjmp_compiles(int x) {
return 0;
}
-/* stdcoro: coro_ctx and coro_t storage exists; the asymmetric API
+/* cfree/coro: coro_ctx and coro_t storage exists; the asymmetric API
surface compiles and resolves. Compile-only -- smoke.c never links
against a libcfree_rt. */
_Static_assert(sizeof(coro_ctx) >= 64, "coro_ctx room for regs");