kit

git clone https://git.ryansepassi.com/git/kit.git

commit 6c3ea14efbae7eed1d31df74207ab01753448462
parent 9c1a093280492ba7866eb3cebbb445c716da8ecb
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu,  7 May 2026 14:52:20 -0700

DESIGN.md and interfaces

Diffstat:
M README.md | 10 ++++++++++
A doc/DESIGN.md | 860 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M doc/builtins.md | 112 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
A include/cfree/baremetal.h | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A include/cfree/coro.h | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A include/cfree/syscall.h | 44 ++++++++++++++++++++++++++++++++++++++++++++
M include/setjmp.h | 2 +-
D include/stdcoro.h | 130 -------------------------------------------------------------------------------
M lib/README.md | 6 +++---
M lib/build.sh | 2 +-
M lib/coro/aarch64.c | 4 ++--
M lib/coro/arm32.c | 4 ++--
M lib/coro/arm32_thumb1.c | 4 ++--
M lib/coro/coro.c | 4 ++--
M lib/coro/i386.c | 4 ++--
M lib/coro/riscv32.c | 4 ++--
M lib/coro/riscv64.c | 4 ++--
M lib/coro/x86_64.c | 4 ++--
M lib/coro/x86_64_win.c | 4 ++--
A src/abi/abi.h | 128 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/arch/arch.h | 439 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/cg/cg.h | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/debug/debug.h | 72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/decl/decl.h | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/driver/driver.h | 23 +++++++++++++++++++++++
A src/lex/lex.h | 102 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/link/link.h | 132 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/obj/obj.h | 239 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/opt/ir.h | 181 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/opt/opt.h | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/parse/parse.h | 16 ++++++++++++++++
A src/pp/pp.h | 23 +++++++++++++++++++++++
A src/type/type.h | 115 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/smoke.c | 4 ++--
34 files changed, 3106 insertions(+), 157 deletions(-)

diff --git a/README.md b/README.md
@@ -23,3 +23,13 @@ It features:
 - Reproducible builds
 - A build and packaging system
 - Bootstrap from hex0-seed
+
+cfree also provides these headers beyond the freestanding set:
+- stdatomic.h
+- assert.h
+- setjmp.h
+
+And cfree-specific extensions:
+- cfree/syscall.h
+- cfree/baremetal.h
+- cfree/coro.h
diff --git a/doc/DESIGN.md b/doc/DESIGN.md
@@ -0,0 +1,860 @@
+# cfree design
+
+Architecture of the cfree compiler, assembler, and linker. Companion to
+`README.md`. Scope: how the modules fit together and what their contracts are.
+Not a tutorial; not implementation notes.
+
+## 1. Goals
+
+- Conforming C11 freestanding compiler, written in C11.
+- Single multi-call binary: `cc`, `cpp`, `as`, `ld`, `ar`, `objdump`, `dbg`.
+- Targets: x86 (32/64), ARM (32/64), RISC-V (32/64), WASM.
+- Output: object files (ELF, COFF, Mach-O, WASM) and executables.
+- In-memory JIT path sharing the entire pipeline with the file path.
+- Lightweight optimizer at roughly 70% of GCC/Clang `-O2` on integer code.
+- Self-hosting. Bootstraps from a hex0-seed.
+- Streaming wherever feasible. Direct lowering is function-at-a-time; `-O2`
+  may retain per-TU IR for inter-procedural optimization.
+
+This design keeps the full project goals visible, but the interface contracts
+below are currently tightened around the compiler, object emission, linker,
+and JIT path. Standalone tool-specific surfaces (`ar`, `objdump`, `dbg`,
+packaging, bootstrap) are allowed for by the shared model but are not the focus
+of this pass.
+
+## 2. Non-goals (v1)
+
+- C++, Objective-C.
+- C11 variable-length arrays and variably-modified types (`__STDC_NO_VLA__`).
+- Cross-TU LTO, PGO, autovectorization beyond peephole-level idiom recognition.
+- Thread-safe parallel compile inside one process.
+- Sanitizers, coverage instrumentation.
+- `_Generic` corner cases that require multi-pass disambiguation are best-effort.
+
+## 3. Layout
+
+```
+include/   public C11 headers shipped with the compiler (the runtime)
+lib/       compiler-rt (the runtime)
+src/
+  core/    allocators, intern pool, source manager, diagnostics, buffers, target
+  lex/     shared tokenizer (C and asm)
+  pp/      C preprocessor
+  type/    target-neutral C type interning and compatibility
+  abi/     target ABI type layout and call classification
+  decl/    C declaration, linkage, storage-duration, and initializer model
+  parse/   C11 parser, asm parser
+  cg/      single-pass value-stack code generator
+  arch/    CGTarget + MCEmitter interfaces and per-arch backends
+  opt/     lightweight SSA IR + passes; presents itself as a CGTarget
+  obj/     in-memory object model + per-format file writers and readers
+  debug/   DWARF info collection + emission
+  link/    symbol resolution, relocation, exe writer, JIT linker
+  driver/  multi-call dispatch and command-line front-ends
+test/
+doc/
+```
+
+The compiler source lives in `src/`. `include/` and `lib/` are the runtime that
+ships *with* the compiler (the freestanding stdlib and compiler-rt) and are
+not built by the compiler-development tree.
+
+## 4. Dataflow
+
+```
+.c → lex → pp → parse_c → decl + cg → CGTarget → MCEmitter → ObjBuilder ──┬──→ emit_{elf|coff|macho|wasm} → .o / exe
+.s → lex → parse_asm ───────────────→ MCEmitter ──────────────────┤
+                                                                  ├──→ link_file (.o + archives → exe)
+                                                                  └──→ link_jit (mmap + exec)
+```
+
+Reading order, left to right:
+
+1. `lex` produces a stream of raw tokens (idents, numbers, punctuators,
+   strings). Tokens preserve exact spelling; literals carry deferred `LitId`
+   handles rather than host-decoded numeric values.
+2. For C: `pp` consumes tokens, expands macros, and emits a stream of
+   preprocessed tokens. For asm: tokens go straight to `parse_asm`.
+3. `parse_c` is recursive-descent over preprocessed tokens. It records C
+   declaration semantics in `DeclTable` and drives `cg` for executable code.
+   There is no explicit AST.
+4. `cg` maintains a value stack à la TCC. Each parser action manipulates that
+   stack: pushes, loads, stores, aggregate copies, conversions, calls. At
+   `-O0`, CG owns live value lifetimes, spills, reloads, and preservation
+   across calls/asm; the target provides scratch registers and spill/reload
+   mechanics.
+5. `CGTarget` is the typed C/IR lowering vtable. Concrete targets lower those
+   operations into machine emission; the optimizer also implements `CGTarget`
+   by recording the call sequence as IR per function, running
+   intra-procedural passes on `func_end`, and on `cgtarget_finalize` running
+   cross-function passes before replaying into the wrapped target `CGTarget`.
+6. `MCEmitter` is the machine/object emission vtable. It owns section position,
+   bytes, alignment/fill, relocations at explicit offsets, machine-label
+   references, and source locations for debug line emission.
+7. `ObjBuilder` is the single in-memory object representation. It accepts
+   sections, bytes, symbols, and relocations on the write side, and exposes
+   read accessors for file writers, the linker (file and JIT), and objdump.
+
+`parse_asm` bypasses `cg` and writes directly into `MCEmitter`; inline asm
+is a typed `CGTarget.asm_block` operation that lowers through the target's asm
+machinery. See §10.
+
+## 5. Key interfaces
+
+### 5.0 `SourceManager` (`src/core/core.h`) — source identity
+
+`SourceManager` is owned by `Compiler` and is the authority for `SrcLoc.file_id`.
+It registers real files, memory inputs, builtins, and macro-expansion pseudo
+files; maps file ids back to normalized paths and diagnostic spellings; records
+include edges; and exposes dependency iteration for `-M*` output. Lexer and
+preprocessor create source ids through it. Diagnostics, DWARF, dependency
+generation, and reproducible-build path handling read from it rather than
+inventing their own file tables.
+
+Macro-expanded tokens keep both spelling and expansion locations. Consumers
+that need user-facing diagnostics can ask for spelling locations; consumers
+that need execution/profiling/debug line attribution can ask for expansion
+locations. `Debug.debug_file` takes a source file id, not a raw path.
+
+### 5.1 `CGTarget` (`src/arch/arch.h`) — typed lowering
+
+`CGTarget` is a vtable representing "something that can accept typed C/IR
+operations for one function at a time". `cg` calls `CGTarget` after it has
+resolved an operation's operands to concrete `Operand` values (immediate,
+register, frame-relative, object-symbol-relative, indirect). Direct target
+implementations lower these operations into their `MCEmitter`; `opt` wraps a
+target `CGTarget` and records the same operations as IR before replaying them
+later.
+
+Method groups:
+
+- **Function lifecycle.** `func_begin(CGFuncDesc)`, `func_end`.
+  `CGFuncDesc` carries the function `ObjSymId`, `fn_type`, inspectable
+  `ABIFuncInfo`, parameter descriptors, and declaration location.
+- **Frame slots, parameters, and value lifetimes.** `frame_slot(FrameSlotDesc)`
+  creates stable frame-resident storage for locals, parameters, spills, sret,
+  and dynamic-allocation bookkeeping. `param(CGParamDesc)` binds a source
+  parameter index to its stable slot and ABI incoming parts. `alloc_reg(class,
+  type)` returns a
+  physical scratch register for real targets and a fresh virtual for
+  `opt_cgtarget`. CG, not the target, owns the `-O0` value stack: it uses
+  `clobbers`, `spill_reg`, and `reload_reg` to preserve live values across
+  register pressure, calls, and inline asm. `free_reg` releases a value-stack
+  claim; `opt_cgtarget` treats it as a hint.
+- **Control flow.** `label_new`, `label_place`, `jump`, `cmp_branch` (fused
+  compare-and-branch; the only conditional-branch primitive — for arbitrary
+  i1 values cg synthesizes `cmp_branch(CMP_NE, val, IMM_ZERO, label)`).
+- **Structured control flow.** `scope_begin(CGScopeDesc)`, `scope_else`,
+  `scope_end`, `break_to`, `continue_to`. `CGScopeDesc` carries explicit break
+  and continue labels, so C `for` continues land on the increment expression
+  instead of assuming the loop header. Real backends shim these onto
+  `label_new`/`label_place`/`jump` (no code-size cost). The WASM backend
+  consumes them natively to emit block/loop/if with structurally-bounded `br`
+  targets. `goto`, computed-goto, and `switch` fallthrough still go through
+  the flat label API. opt's IR is flat-CFG; at -O2 the WASM lowering pass
+  reconstructs structure from the flat IR.
+- **Data movement and aggregates.** `load_imm`, `load_const`, `copy`, `load`,
+  `store`, `addr_of`, `copy_bytes`, `set_bytes`, `bitfield_load`, and
+  `bitfield_store`. Scalar memory operations carry `MemAccess`; aggregate and
+  bitfield operations carry ABI-sized metadata so struct assignment, block
+  zeroing, byval copies, and bitfield accesses remain visible to opt and
+  direct backends.
+- **Arithmetic / compare / convert.** `binop` uses explicit integer and
+  floating-point op families (`BO_I*`, `BO_F*`) rather than inferring behavior
+  from operand type. `cmp` materializes 0/1; use `cmp_branch` when the result
+  feeds a branch. `convert` is explicit by `ConvKind`.
+- **Calls / return.** `call(CGCallDesc)` and `ret(CGABIValue*)`. The parser
+  type-checks `fn_type`; CG asks `TargetABI` for `ABIFuncInfo`, materializes
+  `CGABIValue`/`CGABIPart` arrays for direct, indirect/byval, sret, split, and
+  multi-register values, and passes that structured call/return shape to the
+  target. `callee.kind == OPK_GLOBAL` is a direct call; any other kind is
+  indirect. On WASM, `fn_type` selects the `call_indirect` type index —
+  interned `Type*` identity is the index source of truth (§12).
+- **alloca.** `alloca(dst, size, align)` — dynamic stack allocation. Reachable
+  only via `__builtin_alloca` since v1 does not parse VLAs (§2). Backend grows
+  the linear-memory or native shadow stack; result pointer in `dst`.
+- **Variadics.** `va_start`, `va_arg`, `va_end`, `va_copy`. `<stdarg.h>` macros
+  expand to compiler builtins which CG forwards here. Per-arch ABI: SysV
+  x86-64 manages the register-save area; arm64 manages its split gp/fp areas;
+  WASM walks the spilled-args memory.
+- **setjmp / longjmp.** Optional methods. Real backends leave them NULL: the
+  parser lowers `<setjmp.h>`'s `setjmp` to a normal call to `__cfree_setjmp`
+  (a hand-written .S in `lib/`) and opt recognizes the symbol by name as
+  returns-twice (no inlining across; values defined before the call are not
+  GVN-merged with values defined after). The WASM backend implements
+  `setjmp_`/`longjmp_` via the exception-handling proposal — there is no
+  saveable native SP, so a library-only implementation is impossible.
+- **Atomics.** `atomic_load`, `atomic_store`, `atomic_rmw`, `atomic_cas`,
+  `fence`. Atomic memory operations carry both `MemAccess` and `MemOrder`.
+  Backends route oversized atomics to compiler-rt; small atomics are inline.
+- **Inline asm.** `asm_block(tmpl, outs, ins, clobbers)` — per-arch
+  constraint binding plus template assembly, packaged as one operation. The
+  asm parser is reused as a template walker inside this call, but final bytes
+  and relocations are emitted through `MCEmitter`.
+- **Source location.** `set_loc(SrcLoc)` — sticky; subsequent emit-side
+  calls inherit it. `opt_cgtarget` stamps it onto each `Inst.loc`; target
+  backends forward it to `MCEmitter` for `Debug.line`.
+- **End-of-TU.** `finalize`.
+
+Implementations:
+
+- Real CGTargets per arch under `src/arch/`. Their `finalize` is a no-op.
+- `opt` (`src/opt/opt.h`) returns a wrapper CGTarget that records into IR.
+  Its `finalize` runs cross-function passes and lowers all buffered IR into a
+  wrapped target CGTarget.
+
+### 5.2 `MCEmitter` (`src/arch/arch.h`) — machine/object emission
+
+`MCEmitter` is the low-level emission vtable shared by target backends and
+assembler input. It owns the current section, byte position, machine-label
+creation/placement, raw byte output, fill/alignment, relocations against
+`ObjSymId` at explicit offsets, label references/fixups, and sticky source
+locations used by the debug line program.
+
+`CGTarget` implementations may hide instruction selection, register
+allocation, prolog/epilog emission, and instruction encoding behind their
+typed methods, but when they finally write object contents they go through
+`MCEmitter`. `parse_asm` uses the same emitter directly because assembler
+input is already machine-level syntax.
+
+### 5.3 Symbol identity — object-first
+
+`Sym` is only an interned spelling. It is used for identifiers, section names,
+debug names, and lookup keys, but it is not a symbol table entry.
+
+`ObjSymId` is the authoritative symbol handle during compilation, assembly,
+object reading, relocation emission, debug collection, and link input. It is
+scoped to one `ObjBuilder`, so two objects can both contain a local `static
+int x` without colliding, and an object reader can preserve local labels,
+section symbols, file symbols, unnamed temporary symbols, and external
+references faithfully. Parser declaration binding creates or reuses
+`ObjSymId`s in the current builder; `cg`, `CGTarget`, `MCEmitter`, `Debug`, and `ObjBuilder`
+traffic in those handles.
+
+The linker has its own resolved-symbol table built from each input object's
+`ObjSymId`s. Externally visible definitions are matched by `Sym` name and
+binding during resolution. JIT lookup and explicit entry selection are
+therefore name-based (`Sym`), not handle-based: object symbol handles are not
+portable across builders.
+
+### 5.3.1 `DeclTable` (`src/decl/decl.h`) — C declarations
+
+`DeclTable` is the C-language declaration layer above `ObjBuilder`. The parser
+uses it for storage class, linkage, visibility, TLS, inline/weak attributes,
+tentative definitions, static locals, explicit sections, and global
+initializers. It returns `DeclId`s for parser and CG bookkeeping and owns the
+mapping from a C declaration to its object-scoped `ObjSymId`.
+
+Global initialization is a list of `InitItem`s: zero ranges, exact
+`ConstBytes`, relocatable symbol references, and fills. `DeclTable` applies C
+rules such as tentative-definition coalescing and default section selection,
+then writes concrete sections, bytes, symbols, and relocations into
+`ObjBuilder`. `ObjBuilder` remains object-format canonical storage and does not
+learn C storage-duration rules.
+
+### 5.4 `TargetABI` (`src/abi/abi.h`) — target layout authority
+
+`Type` is structural and target-neutral: kind, qualifiers, element/parameter
+types, immutable record fields, array counts, scoped tag ids, tag spellings,
+and bitfield flags/widths.
+Records are built through a mutable `TypeRecordBuilder` and committed to an
+interned immutable `Type*`. Field flags distinguish normal fields, anonymous
+fields, flexible array members, bitfields, and zero-width bitfields. `Type`
+does not own target-dependent facts such as scalar widths, record size, field
+offsets, bitfield packing, aggregate alignment, or calling-convention
+classification.
+
+Record and enum tags carry a `TagId` in addition to their `Sym` spelling.
+`Sym` is only the diagnostic/debug spelling; `TagId` is scoped declaration
+identity. This prevents two unrelated `struct S` declarations in different C
+scopes from collapsing under global type interning.
+
+`TargetABI` is the one authority for those facts. It is initialized from
+`Compiler.target` and is available as `Compiler.abi`. Its responsibilities:
+
+- Builtin scalar profiles: width/alignment/signedness of C scalar types,
+  pointer size/alignment, `long double`, enum representation policy, and
+  target-defined library types (`size_t`, `ptrdiff_t`, `intptr_t`,
+  `uintptr_t`, `va_list`).
+- `sizeof`/`_Alignof` for every complete type.
+- Record layout: field byte offsets, bitfield storage units, bit offsets,
+  final size, final alignment, and incomplete-type diagnostics.
+- Calling convention classification: direct/indirect/split aggregate
+  arguments, return values, hidden sret pointers, byval copies, variadic
+  register-save/spill behavior, stack slot alignment, and inspectable
+  per-part placement data.
+
+Consumers must ask `TargetABI` rather than reading layout facts from `Type`.
+Parser/type checking use it for `sizeof`, `_Alignof`, field access, enum
+constant typing, and diagnostics. `cg` uses it before creating frame slots,
+before emitting aggregate/bitfield operations, and when selecting conversions.
+Calls use a hybrid model: `TargetABI` returns rich `ABIFuncInfo` data; CG turns
+that into `CGABIValue`/`CGABIPart` operands; target hooks handle only final
+instruction/OS-specific mechanics. `Debug` uses ABI data for DIE sizes, member
+locations, parameter locations, and sret/byval facts.
+
+### 5.5 `ObjBuilder` (`src/obj/obj.h`) — concrete
+
+The single in-memory object representation. There is no second implementation,
+so it is a concrete type rather than a vtable. Object, section, group, and
+symbol handles are explicit (`OBJ_SEC_NONE`, `OBJ_GROUP_NONE`,
+`OBJ_SYM_NONE`). The write API
+(`obj_section`/`obj_write`/`obj_reserve_bss`/`obj_symbol`/`obj_reloc`/
+`obj_finalize`) is what MCEmitter, CGTarget, and `.o` readers use; the read API
+(`obj_section_get`/`obj_relocs`/`obj_symbol_get`, symbol iteration with ids) is
+what file emitters, the linker, JIT, and future objdump use.
+
+`ObjBuilder` is a canonical superset model, not merely "bytes plus names".
+Sections carry both coarse compiler kind (`SEC_TEXT`, `SEC_DATA`, ...) and
+object semantics (`SSEM_PROGBITS`, `SSEM_RELA`, `SSEM_GROUP`, ...), flags,
+alignment, entry size, link/info references, and group membership. Symbols
+carry binding, kind, visibility, absolute/common/TLS state, common alignment,
+and object-scoped identity. Relocations record kind, explicit-addend versus
+in-place addend, pairing, target symbol, and addend. COMDAT/group membership
+is represented explicitly. `Writer` is a real byte sink with write, seek, tell,
+error, and close operations so file emitters do not depend on a hidden I/O
+side channel.
+
+Format-specific metadata is admitted only through typed enum fields
+(`ObjExtKind`, semantic kinds, flags) and narrowly-scoped extension values
+where a real format has no shared equivalent. Avoid opaque `void*` sidecars:
+linker, JIT, emitters, readers, and objdump must be able to inspect the
+canonical model without knowing which reader produced it.
+
+The invariant: the post-finalize state of an `ObjBuilder` is the same shape
+as what you'd get from reading a `.o` back in. So `read_elf` of a freshly
+emitted file produces an `ObjBuilder` indistinguishable from the one used to
+emit it, modulo permitted canonicalization of section ordering and string-table
+layout. Consumers (linker, objdump) don't care which path produced it.
+
+### 5.5.1 `LinkImage` (`src/link/link.h`) — resolved program image
+
+`Linker` accepts explicit inputs (`LinkInputId`) for fresh objects, object
+files, and archives. Resolution produces a `LinkImage`: a shared file/JIT data
+model containing resolved symbols (`LinkSymId`), final symbol addresses,
+segments, laid-out section placements (`LinkSectionId`), segment bytes, and
+relocation applications with concrete write locations. Undefined, duplicate,
+unsupported-relocation, and layout failures are fatal diagnostics through
+`Compiler.panic`.
+
+Executable emission and JIT mapping consume the same `LinkImage`. File writers
+read segment bytes, section placements, final addresses, and relocation records
+from the image. JIT maps fresh writable memory, copies the same segment bytes,
+applies relocation records at their `write_vaddr` locations, resolves allowed
+external symbols through `LinkExternResolver`, changes final permissions, and
+looks up exported/entry symbols by resolved `Sym` name. Object-local
+`ObjSymId` values never escape as JIT lookup handles. `JitImage` owns the mapped
+memory; the caller owns the `LinkImage` unless an API explicitly documents a
+transfer.
+
+### 5.6 `MemAccess` — explicit memory semantics
+
+`MemAccess` is attached to every typed memory operation (`load`, `store`,
+atomics, and IR memory instructions). It contains:
+
+- `type`: the semantic C object type being accessed.
+- `size`: ABI byte width of the access.
+- `align`: known byte alignment; `0` means unknown.
+- `flags`: volatility, atomicity, restrict-derived noalias facts, readonly /
+  writeonly knowledge, and explicit unaligned accesses.
+- `addr_space`: target address space / memory index (`0` for ordinary C
+  memory; WASM may use this for multiple memories later).
+- `alias`: an alias root, one of unknown, local, global `ObjSymId`, parameter,
+  heap, or string literal.
+
+`cg` derives `MemAccess` when it turns an lvalue into a memory operation:
+qualifiers supply `volatile` and `_Atomic`, `TargetABI` supplies size and
+minimum alignment, declaration binding supplies local/global/parameter roots,
+string literals supply string roots, and pointer arithmetic preserves the
+best known root until it escapes. Casts that lose provenance downgrade the
+root to `ALIAS_UNKNOWN`; `restrict` pointers create parameter roots with the
+restrict flag.
+
+Optimization rules:
+
+- Volatile memory operations are side effects. They may not be deleted,
+  merged, reordered with other volatile operations, or moved across calls or
+  inline asm with a memory clobber.
+- Atomic operations use both `MemAccess` and `MemOrder`; memory-order rules
+  dominate ordinary alias reasoning.
+- Nonvolatile accesses with disjoint known alias roots may be reordered or
+  used for redundant-load and dead-store elimination.
+- Unknown alias roots conservatively may alias any ordinary memory.
+- The metadata is a permission to optimize, not a UB oracle: opt still may
+  not assume invalid programs are unreachable (§9).
+
+### 5.7 `ConstBytes` — exact literal materialization
+
+`ConstBytes` is the representation for constants whose exact target bits
+matter. It carries the semantic `Type*`, ABI representation bytes, size, and
+alignment. The bytes are produced by literal parsing plus `TargetABI`, never
+by trusting host floating-point layout. This matters for hex floats,
+rounding, `float` versus `double`, target-specific `long double`, endian
+order, and future vector constants.
+
+`CGTarget.load_imm(dst, i64)` remains a convenience for small integer
+constants. `CGTarget.load_const(dst, ConstBytes)` is the general path. Target
+backends may encode the constant as an immediate, synthesize it with
+instructions, or place it in a constant pool / `.rodata` and emit a load.
+`cg_push_const` pushes an exact constant. `cg_push_float(double, type)` exists
+only as a convenience for parser paths that have already accepted host-double
+precision loss as harmless; conforming literal parsing should prefer
+`cg_push_const`.
+
+### 5.8 Tokens and literals — spelling first, decoding later
+
+`Tok` preserves exact token spelling for diagnostics, macro stringification,
+token pasting, dependency output, and faithful preprocessing. Numeric,
+character, and string literals carry a `LitId` into the lexer's/preprocessor's
+literal table. A literal record stores kind, encoding, suffix/encoding flags,
+the exact spelling, and decoded bytes/code units only when decoding is already
+target-independent.
+
+The lexer does not choose final C literal types and does not round floating
+literals through host `double`. The parser, with `TargetABI`, performs integer
+literal type selection, floating parsing/rounding, character literal value
+selection, string literal concatenation, and construction of exact
+`ConstBytes`. The preprocessor uses spelling and `LitId` to implement `#`,
+`##`, `__LINE__`/`__FILE__`, include handling, and macro expansion without
+discarding information the parser later needs.
+
+Bad literals remain tokens with `TF_LITERAL_BAD` plus spelling and source
+location so diagnostics can point at the exact source text and recovery can
+continue.
+
+## 6. Allocators and lifetimes
+
+cfree uses explicit allocators rather than a single global heap. Allocators are
+fields of `Compiler` (`src/core/core.h`) and are passed down to subsystems.
+
+| Allocator       | Lifetime           | Owns                                                             |
+|-----------------|--------------------|------------------------------------------------------------------|
+| `Pool global`   | Process            | Interned strings and interned types.                             |
+| `Heap output`   | Output object/exe  | Section chunks, reloc tables (survive into linker).              |
+| `Arena tu`      | One TU compile     | Local symbols, parser scratch, SourceManager tables, ABI caches. |
+| `Arena scratch` | Reset per function | Value-stack scratch, fixup lists, lookahead buffers.             |
+
+Rules:
+
+- A struct never owns its own heap implicitly. If it allocates, an allocator
+  reference is part of its API.
+- Arena resets are an explicit operation on the arena. Subsystems holding
+  pointers into a scratch arena must either copy them out before reset, or
+  treat them as invalidated.
+- Long-lived data (anything that outlives a TU) goes through `Pool global` or
+  `Heap output`. Don't copy from arenas into one of those — interning is the
+  only path in.
+- Source identities live in `Compiler.sources`. They are stable for the
+  compile/link invocation and are read by diagnostics, dependency output, and
+  DWARF emission.
+
+`Heap output` is a normal heap (typically `heap_libc`). The JIT does not
+compile directly into executable memory: `link_jit_image` consumes a resolved
+`LinkImage`, mmaps a fresh region, copies laid-out segments in, applies
+relocations in-place, and `mprotect`s final permissions. The `Heap` vtable
+still exists so the JIT can swap allocators for the *destination* mapping and
+so tests can substitute fakes.
+
+## 7. Error handling
+
+A single `Compiler` carries a `jmp_buf` and a `DiagSink`. Fatal errors call
+`compiler_panic`, which emits a diagnostic and `longjmp`s out of the entire
+parse/CG pipeline. Drivers establish the `setjmp` boundary at TU granularity.
+
+This means almost no function in `parse`, `cg`, or `arch` returns an error. The
+happy path is the only path. Subsystems clean up via arena reset, not by
+unwinding allocations one-by-one.
+
+What is *not* fatal: warnings, recoverable parse errors that have a sensible
+recovery point (skip-to-`;`, skip-to-`}`). The parser uses limited internal
+recovery for these and only escalates to `compiler_panic` when continued
+parsing would produce cascading garbage.
+
+## 8. Streaming
+
+Streams cleanly on direct lowering (`-O0` and targets that do not wrap with
+`opt_cgtarget`):
+
+- Lexer → preprocessor token stream.
+- Preprocessor → parser token stream.
+- Parser → CG → CGTarget calls within a function.
+- CGTarget → MCEmitter → ObjBuilder section bytes, appended via chunked buffers.
+
+Buffers per function (bounded, not per TU):
+
+- CG's value stack and label fixup tables.
+- Per-target register/frame state.
+- Optimizer's IR for the function being optimized, when only intra-procedural
+  passes are enabled.
+
+Buffers per TU:
+
+- Symbol tables — relocations cannot be resolved until all definitions are
+  seen. Final patching is deferred to ObjBuilder finalize / linker.
+- Debug info — DWARF tables reference final section layout.
+- `-O2` optimizer IR — cross-function inlining keeps all candidate function IR
+  and call graph metadata until `cgtarget_finalize`.
+
+So the streaming guarantee is tiered:
+
+- `-O0` direct target: source and codegen are function-at-a-time.
+- `-O1` target-local optimization: function-at-a-time unless a target opts
+  into specific buffering.
+- `-O2`: source is still read once, but optimized function IR may be retained
+  per TU for IPO. This is intentional and bounded by the TU, not the whole
+  program.
+
+## 9. Optimizer
+
+`opt` (`src/opt/opt.h`, `src/opt/ir.h`) implements `CGTarget`. The pass set and
+ordering are modelled on MIR (`mir-gen.c`) — that pipeline is proven, well
+understood, and a good fit for the "70% of -O2" target. The one cfree
+addition is cross-function inlining, which MIR does not have.
+
+IR shape: block-based SSA. Functions are lists of basic blocks; blocks have
+`Phi`s at the top; instructions reference values by SSA id. `Func` also owns
+first-class frame-slot and parameter tables so `-O0` frame residency,
+parameter ingress, mem2reg promotion, and debug locations all refer to the
+same objects. The op set is small (integer constants, exact byte constants,
+mem ops, aggregate ops, bitfield ops, explicit integer and floating-point
+arith, compares, conversions, GEP, calls, terminators, an opaque `ASM_BLOCK`,
+plus `IR_VA_*` and `IR_SETJMP`/`IR_LONGJMP`). `Inst` stays compact; ordinary
+instructions define one `Val`, while multi-result instructions carry
+`defs[0..ndefs)`. Complex per-op facts live in arena-owned typed aux structs
+(`IRCallAux`, `IRAggregateAux`, `IRBitFieldAux`, `IRGepAux`, `IRAsmAux`,
+`IRPhiAux`, `IRCasAux`). This keeps calls, aggregate copies, asm, CAS
+multi-results, and ABI metadata inspectable by passes without turning every
+instruction into a large union.
+
+The IR is flat-CFG: structured-scope ops on `CGTarget` (§5.1) are flattened by
+`opt_cgtarget`'s recorder into ordinary labels, branches, and basic blocks. WASM
+lowering at -O2 therefore needs to reconstruct structure (relooper) before
+emitting. At -O0/-O1 there is no `opt_cgtarget` wrapper and CG drives the WASM
+backend directly, producing structured output by construction.
+
+`IR_SETJMP` is a control barrier: opt does not inline across it, does not
+hoist through it, and does not GVN-merge values defined on either side.
+`IR_LONGJMP` has no successors (control does not return). The library setjmp
+symbol used on real arches is recognized by name and gets the same treatment
+when it appears as the callee of an `IR_CALL`.
+
+**No UB-exploiting passes.** Rules in opt may not assume that a UB-triggering
+operation (signed overflow, shift-by-≥-width, division by zero, null deref)
+is unreachable. WASM traps deterministically on the first three and faults on
+the fourth — the program terminates rather than time-traveling. Real-target
+behavior is also more predictable this way. The "70% of -O2" goal is
+achievable without these rules; reserved bits in `Inst.flags` can host
+`nsw`/`nuw`-style annotations later if a specific non-UB-exploiting pass
+needs them.
+
+### 9.1 Lifecycle
+
+- `func_begin` allocates a fresh `Func` IR container in the per-TU IR arena.
+- `alloc_reg(class, type)` returns a fresh virtual `Reg` whose mapping to a
+  `Val` is recorded; `free_reg` is a hint and ignored.
+- `frame_slot` and `param` populate `Func.frame_slots` and `Func.params`.
+  Parameter ABI incoming parts are visible to later promotion, debug, and
+  replay.
+- Every other emit call appends one SSA `Inst` to the current basic block.
+  Each `Inst` carries the `SrcLoc` set by the most recent `CGTarget.set_loc`.
+  `call(CGCallDesc)`, `atomic_cas`, and ABI split returns use the multi-result
+  `defs` convention.
+- `func_end` runs the **intra-procedural** pipeline (§9.2) and stores the
+  optimized `Func`. **No lowering yet.**
+- `cgtarget_finalize` runs the **inter-procedural** pipeline (§9.3) over all
+  buffered functions, then for each function runs the **lowering** pipeline
+  (§9.4) which drives the wrapped target CGTarget via `CGTarget.set_loc` +
+  emit-side calls.
+
+The driver therefore looks like:
+
+```c
+parse_c(c, pp, decls, cg);
+cgtarget_finalize(target); /* no-op for plain CGTarget; runs IPO+lower for opt */
+emit_elf(c, ob, w);
+```
+
+At `-O0` the wrapper is not used and the target CGTarget is driven directly
+during parse, with no function IR retention. `-O1` may use only local
+lowering/target peepholes and remains function-at-a-time. `-O2` uses
+`opt_cgtarget` and may retain IR for all functions in the TU.
+
+Memory cost at `-O2`: the IR for every function in a TU is held in the per-TU
+IR arena until `cgtarget_finalize`. Per-pass scratch lives in `Arena scratch`,
+not in the IR arena.
+
+### 9.2 Intra-procedural pipeline (per `Func`, on `func_end` at `-O2`)
+
+```
+build_cfg
+block_cloning   (hot path duplication; skipped if it would block addr_xform)
+build_ssa
+addr_xform      (fold GEP-equivalent address insns into uses)
+gvn             (incl. constprop, redundant-load elimination)
+copy_prop       (incl. redundant-extension elimination)
+dse             (dead store elimination)
+ssa_dce
+build_loop_tree + licm
+pressure_relief
+make_conventional_ssa + ssa_combine + undo_ssa
+jump_opt
+```
+
+### 9.3 Inter-procedural pipeline (over all `Func`s, on `cgtarget_finalize`)
+
+Inlining doesn't pay off without a follow-up: the new opportunities (callee
+arguments that are now constants, branches in the callee that are now dead,
+redundant ops shared across the caller/callee boundary, callee bodies that
+landed inside a caller loop) only get realised by re-running intra-procedural
+passes on the modified caller.
+
+```
+opt_inline       (call-graph bottom-up; SCCs skipped for v1)
+for each dirty caller:
+  opt_cleanup    (subset re-run: gvn, copy_prop, ssa_dce, jump_opt,
+                  licm if loops, addr_xform if uses remain)
+```
+
+Iteration (`inline → cleanup → inline → ...`) is bounded by `-finline-iters=N`
+(default 1, hard cap enforced by opt_cgtarget). Tuning is benchmark-driven.
+
+### 9.4 Lowering pipeline (per `Func`, after IPO, drives target CGTarget)
+
+```
+machinize        (target ABI lowering, 2-op forms, call lowering)
+build_loop_tree  (-O1+, used by RA)
+coalesce         (-O2, move-related)
+live_info
+regalloc         (linear scan; live-range splitting at -O2)
+combine          (-O1+, code selection: merge dependent insns)
+dce              (-O1+, post-RA)
+opt_emit         (prolog/epilog; insn split; drive target CGTarget)
+```
+
+### 9.5 Inline asm
+
+`ASM_BLOCK` is opaque: passes treat it as reading its input operands, writing
+its output operands and clobbers, and not commuting with surrounding memory
+ops. Inline asm is therefore safe across optimization without per-asm
+modelling.
+
+## 10. Inline asm
+
+Two callers exercise the asm machinery:
+
+- Standalone `.s`: tokens → `parse_asm` → `MCEmitter.emit_bytes`/
+  `emit_reloc_at`/`emit_label_ref` → `ObjBuilder`. Bypasses cg entirely;
+  operands are literal registers, immediates, labels, and symbols from the asm
+  syntax itself.
+  Standalone `.s` does not go through `opt_cgtarget`.
+- Inline `asm("...": outs : ins : clobbers)` inside C: invoked via
+  `cg_inline_asm`. Flow:
+
+  1. Parser parses constraint list and template; evaluates each input/output
+     expression so inputs are `SValue`s on the CG stack and each output binds
+     an lvalue.
+  2. cg pops inputs (in declaration order), packs them into an `Operand[]`,
+     and calls `CGTarget.asm_block(tmpl, outs, ins, clobbers)`.
+  3. The arch implementation does **constraint binding** (`r`, `m`, `i`,
+     `=&r`, matching constraints, ...), then walks the template and assembles
+     each instruction. Under `opt_cgtarget` this is recorded as one `IR_ASM_BLOCK`
+     and replayed on the target arch at lowering time, after RA has assigned
+     the bound virtuals to physicals.
+  4. arch fills `out_ops[]` with the location holding each result; cg pushes
+     those back as new SValues.
+
+The asm parser is shared between the standalone path (writing directly to
+`MCEmitter`) and the inline path (used as a template walker inside
+`CGTarget.asm_block`). Constraint binding is per-arch.
+
+`"memory"` clobber is conservative: cg flushes all live stack-resident values
+to memory before the block and reloads after. This is suboptimal but
+correct.
+
+Asm syntax (decided, single supported flavour per arch):
+
+- x86 (32 + 64): AT&T. Same parser serves both inline asm and standalone
+  `.s`. Matches GCC inline-asm convention.
+- ARM (32 + 64): GNU `as` ("unified") syntax.
+- RISC-V (32 + 64): GNU `as` syntax.
+- WASM: WAT (text format).
+
+Open: full GCC-syntax constraint coverage (early-clobber, matching `0`,
+multi-alternative). v1 covers `r`, `m`, `i`, `a`, `=r`, `+r`, `=m`, `=&r`,
+matching constraints. The remainder is deferred.
+
+## 11. DWARF debug info
+
+Debug info lives in `src/debug/` and is owned by a single `Debug` object that
+collects events during compilation and emits `.debug_*` sections at the end
+of the TU.
+
+**Inputs (called during compilation):**
+
+| Producer | Calls |
+|---|---|
+| Driver | `debug_file(source_file_id)` to populate the DWARF file table from `SourceManager`. |
+| CG | `debug_func_begin/end`, `debug_scope_begin/end`, `debug_param`, `debug_local`. cg holds an optional `Debug*` (NULL when `-g` is off). |
+| MCEmitter (or opt's lowering pass) | `debug_line` per emitted instruction, sourced from the `SrcLoc` set by `CGTarget.set_loc`/`MCEmitter.set_loc`; `debug_func_pc_range` after each function is laid out. |
+| opt at `-O2` | `debug_loclist_*` when a variable's location changes across the function. The `SrcLoc` propagates through opt because every recorded `Inst` carries it. |
+
+**Outputs:** `.debug_info`, `.debug_abbrev`, `.debug_line`, `.debug_str`,
+`.debug_aranges`, `.debug_rnglists`, `.debug_loclists` — written into the
+same `ObjBuilder` when `debug_emit` is called. `debug_emit` runs after all
+code sections are finalized but before file emitters consume the builder.
+
+**Variable locations:** at `-O0`, all locals live at stable frame offsets and
+`DebugVarLoc` is `DVL_FRAME`; this gives full debuggability for free. With
+`opt`, the lowering pass produces `DVL_LOCLIST` entries describing where a
+variable lives across PC ranges. v1 may downgrade opt'd debug info to
+function-level only (start/end PC, no locals); refining to per-variable
+location lists is a follow-up but the interface already accommodates it.
+
+**Type DIEs:** generated on demand from the `Type*` reaching `debug_local` /
+`debug_param`, with sizes, alignments, and member offsets supplied by
+`TargetABI`. Interned by `Type*` identity (which is already pointer-equal for
+equal types thanks to `Pool global`).
+
+## 12. Cross-cutting decisions
+
+- **Interning is global**, in `Pool global`. `Sym` (32-bit string id) is the
+  currency for spellings and lookup keys, not symbol identity. Symbol table
+  identity is object-scoped (`ObjSymId`, §5.3) until the linker resolves
+  definitions. C tag identity is scoped `TagId`, not `Sym`, so equal tag
+  spellings in different scopes remain distinct. Equal types are pointer-equal
+  after `pool_type` (same applies to strings: pool_intern returns the canonical
+  id). On WASM, this `Type*` identity is also the source of truth for
+  `call_indirect` type-index assignment.
+- **Source identity is centralized.** `SrcLoc.file_id` belongs to
+  `SourceManager`, not to the lexer, preprocessor, diagnostics, or debug
+  emitter. Macro expansion and include edges are recorded once and reused by
+  diagnostics, DWARF, and dependency generation.
+- **Locals and parameters always start frame-resident.** `cg_local` and
+  `cg_param` allocate stable `FrameSlot`s through `CGTarget.frame_slot` and
+  `CGTarget.param`. A mem2reg-style pass during opt's lowering pipeline
+  promotes non-address-taken slots to virtual registers (and to WASM-locals on
+  that target). At -O0 every slot stays on the frame, which is the same shape
+  `Debug` wants for `DVL_FRAME` (§11) — full debuggability for free, no parser
+  pre-scan needed.
+- **Function-pointer ABI is a linker concern.** A function symbol's address
+  taken via `&f` lowers to a normal `ObjSymId`-relative `Operand`.
+  ELF/COFF/Mach-O resolve this directly. WASM file emitters and the JIT linker
+  walk function-address relocations (`R_WASM_FUNCIDX` / `R_WASM_TABLEIDX`) while
+  building the shared `LinkImage` and assign indirect-function-table slots; the
+  slot index is the pointer's bit pattern. CG and `CGTarget` are unaware.
+- **Sections are chunked.** A `Section.bytes` is a linked list of fixed-size
+  chunks. Append is O(1). Backward patching uses a 32-bit flat offset
+  computed at finalize time, so forward fixups don't depend on chunk
+  boundaries.
+- **Error model is `setjmp`/`longjmp`.** See §7.
+- **Single-pass parser+CG.** No separate AST. The optimizer reconstructs an
+  IR by recording CGTarget calls; this is technically two-pass *within a function*
+  but the source is read once.
+- **Self-hosting constraint.** Anything in `src/` must be writable in C11
+  freestanding (with the runtime in `include/`/`lib/`). No GNU extensions, no
+  libc beyond what cfree itself ships. Bootstrap is hex0-seed → small subset
+  → full cfree; details TBD.
+
+## 13. Build composition
+
+A typical `cc` invocation composes the pipeline like this:
+
+```c
+Compiler c_store;
+Compiler* c = &c_store;
+compiler_init(c, target);  /* creates SourceManager, ABI, allocators */
+Pp* pp = pp_new(c);
+ObjBuilder* ob = obj_new(c);
+DeclTable* decls = decl_new(c, ob);
+MCEmitter* mc = mc_new(c, ob);
+CGTarget* a = cgtarget_new(c, ob, mc);
+if (opt_level >= 1) a = opt_cgtarget_new(c, a, opt_level);
+Debug* d = dbg ? debug_new(c, ob) : NULL;
+CG* g = cg_new(c, a, d);
+
+pp_push_input(pp, lex_open(c, input_path));
+parse_c(c, pp, decls, g);
+
+cgtarget_finalize(a);  /* IPO + lowering at -O2; no-op otherwise */
+if (d) debug_emit(d);
+obj_finalize(ob);
+Writer* w = writer_file(output_path);
+emit_elf(c, ob, w);
+writer_close(w);
+```
+
+Order is load-bearing: `cgtarget_finalize` flushes lowered code, `debug_emit`
+appends `.debug_*` sections, `obj_finalize` freezes the read-side view, and
+only then may file emitters consume the builder.
+
+JIT swaps the final emit for:
+
+```c
+Linker* l = link_new(c);
+link_add_obj(l, ob);
+LinkImage* img = link_resolve(l);
+JitImage* jit = link_jit_image(img);
+entry = jit_image_lookup(jit, entry_sym);
+```
+
+## 14. Open questions
+
+- WASM is structurally different from the register-shaped CGTarget (stack VM,
+  no ELF-style relocations). The `Operand`-driven CGTarget will lower verbosely
+  (every `binop` becomes `local.get; local.get; iN.add; local.set`); a
+  follow-up peephole pass for stack-shape lowering will reclaim most of the
+  bloat. Worth prototyping early to validate the abstractions.
+- Bootstrap subset definition: which features must the seed compiler accept?
+- Debug-info quality at `-O2`: minimum acceptable v1 is function-level
+  (low_pc/high_pc + parameter list at entry); per-variable location lists
+  for opt'd locals are a follow-up but the `Debug` interface admits them.
+- WASM relooper at -O2: choosing between Stackifier-style (preserve flat CFG
+  with relooped wrappers) and Relooper-style (reconstruct nested scopes).
+  Affects code size and opt's freedom to introduce irreducible CFGs.
+- Full VLA support beyond `__builtin_alloca`: deferred for v1
+  (`__STDC_NO_VLA__=1`). The `IR_ALLOCA`/`CGTarget.alloca_` interface accommodates
+  it when the parser is extended.
+
+## 15. Safety model (WASM target)
+
+cfree's WASM backend inherits the WebAssembly sandbox; the goal here is to be
+explicit about what that does and does not buy.
+
+**Checked at runtime:**
+
+- **Linear-memory bounds.** Every load and store traps on out-of-bounds.
+- **Control-flow integrity for direct branches.** Structured `block`/`loop`/
+  `if` mean a `br N` can only target a lexically enclosing scope. The
+  structured `CGTarget` ops (§5.1) are the source of this — flat goto and
+  `switch` fallthrough route through the relooper at -O2 and through the
+  WASM CGTarget's structural fallback at -O0/-O1.
+- **CFI for indirect calls.** `call_indirect` traps on signature mismatch.
+  The WASM type index is keyed off interned `Type*` identity (§12), so equal
+  C function types produce a single WASM type id and a real (not vacuous)
+  type check.
+- **No native code injection.** WASM has no `mprotect`/JIT-into-data path
+  exposed to the program; cfree's own JIT linker uses host APIs outside the
+  sandbox.
+- **`setjmp`/`longjmp`** lower to WASM exception handling; a `longjmp` cannot
+  smash the host stack or skip past a structured-control-flow boundary it
+  did not originate inside.
+
+**NOT checked:**
+
+- **Pointer provenance.** Pointers are `i32` indices into linear memory.
+  `(int*)0xdeadbeef` is a valid bit pattern; the only guard is the bounds
+  check on the eventual access. Use-after-free, type confusion, and
+  intra-heap buffer overflow that stays inside linear memory all remain
+  exploitable — exactly as on a real target.
+- **Integer/UB traps as a safety net.** Signed overflow, shift-by-≥-width,
+  and division-by-zero trap *deterministically* on WASM, but `opt` is not
+  permitted to assume they're unreachable (§9). They terminate the program;
+  they are not a substitute for input validation.
+- **Stack exhaustion** beyond the configured WASM stack limit: traps, but
+  recovery requires host-side restart.
+
+In short: WASM gives cfree-compiled programs **memory-isolation** safety
+(can't escape linear memory) and **control-flow-integrity** safety (can't
+forge a return address or call a wrong-typed function), but not
+**type-system** safety on pointers within linear memory. The compiler does
+not pretend otherwise.
diff --git a/doc/builtins.md b/doc/builtins.md
@@ -112,6 +112,114 @@ Operations (signatures match the GCC `__atomic` builtin family):
 - `__atomic_test_and_set(ptr, order)`, `__atomic_clear(ptr, order)` — for
   `atomic_flag`
 
+### Syscalls (cfree extension)
+
+Declared in `<cfree/syscall.h>`. Kernel-trap primitive so libc syscall
+stubs can be pure C. Numbers (`SYS_*`) are libc's responsibility —
+cfree only provides the instruction. All args and result are `long`;
+pointers/sizes/fds get cast at the call site.
+
+- `__cfree_syscall0(nr)` … `__cfree_syscall6(nr, a0, a1, a2, a3, a4, a5)`
+
+Semantics:
+- Result is normalized to Linux-style `-errno` on failure, non-negative
+  on success, on every target. On BSD/Darwin the lowering inspects the
+  carry/C flag and rewrites the result.
+- Modeled as an opaque external call with full memory clobber plus the
+  target's syscall-clobber list (so the optimizer cannot move work
+  across the trap).
+- Not available on WASM — compile-time error directs callers to WASI
+  imports.
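Editorial sketch (not part of the commit): one way a libc stub might consume the normalized result described in the Semantics list above. `ret_to_errno` and `my_write` are hypothetical names, and the sketch assumes every negative result is `-errno`, as that list states. The `__cfree_syscall3` call appears only in a comment so the block builds with any C11 compiler.

```c
#include <errno.h>

/* Hypothetical helper (not part of cfree): map the normalized syscall
 * result onto the usual libc convention of returning -1 and setting
 * errno. Assumes any negative result is -errno. */
static long ret_to_errno(long r) {
    if (r < 0) {
        errno = (int)-r;
        return -1;
    }
    return r;
}

/* On a cfree target, a write-style stub could then be written roughly as
 * follows (SYS_write would come from libc, not from cfree):
 *
 *   long my_write(int fd, const void *buf, unsigned long n) {
 *       return ret_to_errno(
 *           __cfree_syscall3(SYS_write, (long)fd, (long)buf, (long)n));
 *   }
 */
```

Keeping the errno translation in one helper means each stub stays a single expression, and the same helper works on every target because the lowering has already normalized the result.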
+
+Per-target lowering:
+
+| Target | Instr | Nr reg | Args | Result | Error |
+| --------------- | ----------------- | ------ | -------------------------- | ------ | -------- |
+| Linux x86_64 | `syscall` | rax | rdi, rsi, rdx, r10, r8, r9 | rax | rax < 0 |
+| Linux i386 | `int 0x80` | eax | ebx, ecx, edx, esi, edi, ebp | eax | eax < 0 |
+| Linux aarch64 | `svc #0` | x8 | x0..x5 | x0 | x0 < 0 |
+| Linux arm | `svc #0` | r7 | r0..r5 | r0 | r0 < 0 |
+| Linux riscv | `ecall` | a7 | a0..a5 | a0 | a0 < 0 |
+| Darwin x86_64 | `syscall` | rax (class bits already in nr) | rdi, rsi, rdx, r10, r8, r9 | rax | carry → −errno |
+| Darwin aarch64 | `svc #0x80` | x16 | x0..x5 | x0 | C flag → −errno |
+
+i386 6-arg case (`ebp` is the frame pointer): cfree saves/restores
+`ebp` around the trap.
+
+### Bare-metal primitives (cfree extension)
+
+Declared in `<cfree/baremetal.h>`. For freestanding / embedded use, so
+libc and HAL code can stay pure C. All have opaque-call +
+full-memory-clobber semantics so the optimizer cannot reorder loads,
+stores, or other side effects across them.
+
+Interrupt control (the standard save/disable/restore critical-section
+idiom):
+- `unsigned long __cfree_irq_save(void)` — disable IRQs, return previous mask
+- `void __cfree_irq_restore(unsigned long prev)`
+- `void __cfree_irq_disable(void)`, `void __cfree_irq_enable(void)`
+
+Lowerings: x86 `cli`/`sti` + `pushf`/`popf`; Cortex-A/R `cpsid i`/`cpsie i`
++ CPSR; Cortex-M `cpsid i`/`cpsie i` + PRIMASK (selected by
+`__ARM_ARCH_*` profile macros); aarch64 `msr daifset/daifclr, #2` +
+`mrs daif`; RISC-V `csrr{ci,si} mstatus, 8`.
+
+CPU memory barriers — distinct from `__atomic_thread_fence`. C11 fences
+provide ordering for the C abstract machine; these emit the specific
+CPU barriers required for DMA-coherent device memory, MMU/TLB
+reconfiguration, and self-modifying / freshly-loaded code.
+
+```c
+typedef enum {
+    __CFREE_BARRIER_FULL,         // sy
+    __CFREE_BARRIER_INNER,        // ish
+    __CFREE_BARRIER_INNER_STORE,  // ishst
+    __CFREE_BARRIER_OUTER,        // osh
+    __CFREE_BARRIER_OUTER_STORE,  // oshst
+    __CFREE_BARRIER_NON_SHARE,    // nsh
+} __cfree_barrier_scope;
+
+void __cfree_dmb(__cfree_barrier_scope);  // ordering only
+void __cfree_dsb(__cfree_barrier_scope);  // ordering + completion
+void __cfree_isb(void);                   // pipeline flush after sysreg / MMU change
+```
+
+Lowerings: arm/aarch64 `dmb/dsb/isb <scope>`; x86 `mfence`/`lfence`/`sfence`
+(scope ignored — TSO collapses the cases) and `isb` is a no-op (x86
+self-snoops); RISC-V `fence rw,rw` and `fence.i`. WASM: compile-time error.
+
+Cache maintenance (range-based; cfree reads `CTR`/`CTR_EL0` once at
+startup for the line size and emits a loop):
+- `void __cfree_dcache_clean(const void *, unsigned long)` — write-back
+- `void __cfree_dcache_invalidate(void *, unsigned long)`
+- `void __cfree_dcache_clean_invalidate(void *, unsigned long)`
+- `void __cfree_icache_invalidate(const void *, unsigned long)`
+
+Lowerings: aarch64 `dc {cvac,ivac,civac}` + `ic ivau` loops; arm v7+
+equivalents via CP15. x86: no-ops (cache-coherent, ICache included).
+RISC-V: Zicbom / Zicboz instructions when those extensions are present,
+otherwise a compile-time error.
+
+Hints:
+- `void __cfree_nop(void)`
+- `void __cfree_yield(void)` — spin-loop hint; arm `yield`, x86 `pause`,
+  RISC-V `pause`
+- `void __cfree_wfi(void)` — sleep until next interrupt; arm/aarch64
+  `wfi`, x86 `hlt`, RISC-V `wfi`. All three are privileged, which is
+  fine for bare-metal. Compile-time error on WASM.
+- `void __cfree_wfe(void)`, `void __cfree_sev(void)` — arm/aarch64
+  only; compile-time error elsewhere. The inter-core event-flag
+  abstraction (SEV sets, WFE waits, exclusive-monitor release also
+  sets) does not generalize: x86 MONITOR/MWAIT is address-watch and
+  privileged-extension; RISC-V has no base-ISA equivalent. Use
+  `__cfree_yield` + `__cfree_wfi` for portable spin/idle loops.
+
+System-register access (`mrs`/`msr`, `csrr`/`csrw`, `rdmsr`/`wrmsr`,
+MMU/cache config, etc.) is **not** provided as a builtin. Callers use
+extended inline asm directly. Rationale: register names and privilege
+rules vary per ISA generation; the call sites are arch-specific
+already; abstracting adds churn without removing platform code.
+
 ---
 
 ## `libcfree_rt.a` — runtime support library
@@ -150,7 +258,7 @@ Always:
 - Compare: `__eq`, `__ne`, `__lt`, `__le`, `__gt`, `__ge`, `__unord` × `sf2`/`df2`/`tf2`
 
 ### Nonlocal jumps + stackful coroutines (per-arch, always shipped)
-`<setjmp.h>` and `<stdcoro.h>` share one per-target context payload
+`<setjmp.h>` and `<cfree/coro.h>` share one per-target context payload
 (256 bytes, 16-byte aligned): callee-saved GPRs + callee-saved FPRs +
 sp + return address. `jmp_buf` and `coro_ctx` are both opaque
 typedefs over that payload; the runtime reinterprets them as the
@@ -159,7 +267,7 @@ per-arch struct.
 - `setjmp`, `longjmp` — `<setjmp.h>` (C11 7.13). cfree extension: this
   header is *not* in the C11 freestanding subset.
 - `coro_init`, `coro_resume`, `coro_yield`, `coro_self` — public
-  asymmetric API in `<stdcoro.h>`. Resume drives a coroutine
+  asymmetric API in `<cfree/coro.h>`. Resume drives a coroutine
   forward; yield suspends back to the most recent resumer; resumes
   nest like function calls. Status (`CORO_INIT` / `RUNNING` /
   `SUSPENDED` / `DEAD`) is tracked on the `coro_t` and propagates
diff --git a/include/cfree/baremetal.h b/include/cfree/baremetal.h
@@ -0,0 +1,125 @@
+/* cfree/baremetal.h -- cfree extension -- bare-metal / freestanding
+ * embedded primitives
+ *
+ * cfree/baremetal.h is non-standard: C11 has no notion of interrupt
+ * masking, CPU memory barriers (distinct from C11 fences), or cache
+ * maintenance. cfree exposes them so HAL and libc-substrate code
+ * targeting bare metal can stay pure C without resorting to inline
+ * asm for these recurring idioms.
+ *
+ * Optimizer view: every primitive in this header is opaque to the
+ * optimizer with full memory clobber. Loads, stores, and other side
+ * effects on either side of a call are not reordered across it. This
+ * is what makes the IRQ save/restore idiom and the DMA-coherent
+ * barrier idioms correct without per-call inline-asm clobbers.
+ *
+ * Per-target lowering: see doc/builtins.md. Targets where a primitive
+ * has no meaningful lowering (e.g. WFE on x86, DMB on WASM) raise a
+ * compile-time error rather than silently no-op.
+ *
+ * What is *not* in this header. System-register access (mrs/msr,
+ * csrr/csrw, rdmsr/wrmsr, MMU/cache config writes, ...) stays in
+ * extended inline asm at the call site. Register names and privilege
+ * rules vary too much per ISA generation to wrap usefully, and call
+ * sites are arch-specific anyway.
+ */
+#ifndef CFREE_BAREMETAL_H
+#define CFREE_BAREMETAL_H
+
+/* ====================================================================
+ * Interrupt control.
+ *
+ * The standard save/disable/restore critical-section idiom:
+ *
+ *     unsigned long prev = __cfree_irq_save();
+ *     // ... critical section ...
+ *     __cfree_irq_restore(prev);
+ *
+ * Save/restore nests safely. The standalone disable/enable forms are
+ * for code that owns the interrupt-enable bit unconditionally (boot,
+ * panic paths).
+ *
+ * Lowerings: x86 cli/sti + pushf/popf; Cortex-A/R cpsid i/cpsie i +
+ * CPSR; Cortex-M cpsid i/cpsie i + PRIMASK (selected by __ARM_ARCH_*
+ * profile macros); aarch64 msr daifset/daifclr + mrs daif; RISC-V
+ * csrr{ci,si} mstatus, 8.
+ * ==================================================================== */
+unsigned long __cfree_irq_save(void);
+void __cfree_irq_restore(unsigned long prev);
+void __cfree_irq_disable(void);
+void __cfree_irq_enable(void);
+
+/* ====================================================================
+ * CPU memory barriers.
+ *
+ * Distinct from <stdatomic.h>'s __atomic_thread_fence: C11 fences
+ * provide ordering for the C abstract machine and assume a
+ * cache-coherent multiprocessor. These primitives emit the specific
+ * CPU barriers required for DMA-coherent device memory, MMU/TLB
+ * reconfiguration, and self-modifying / freshly-loaded code -- where
+ * the C abstract machine is not the right model.
+ *
+ * Scope selects the shareability domain on arm/aarch64; targets with
+ * no such concept (x86 TSO collapses every case) ignore it.
+ *
+ * Lowerings: arm/aarch64 dmb/dsb/isb <scope>; x86 mfence/lfence/sfence
+ * (scope ignored) and isb is a no-op (x86 self-snoops); RISC-V
+ * fence rw,rw and fence.i. WASM: compile-time error.
+ * ==================================================================== */
+typedef enum {
+    __CFREE_BARRIER_FULL,         /* sy */
+    __CFREE_BARRIER_INNER,        /* ish */
+    __CFREE_BARRIER_INNER_STORE,  /* ishst */
+    __CFREE_BARRIER_OUTER,        /* osh */
+    __CFREE_BARRIER_OUTER_STORE,  /* oshst */
+    __CFREE_BARRIER_NON_SHARE,    /* nsh */
+} __cfree_barrier_scope;
+
+void __cfree_dmb(__cfree_barrier_scope);  /* ordering only */
+void __cfree_dsb(__cfree_barrier_scope);  /* ordering + completion */
+void __cfree_isb(void);                   /* pipeline flush after sysreg/MMU */
+
+/* ====================================================================
+ * Cache maintenance (range-based).
+ *
+ * cfree reads CTR / CTR_EL0 once at startup to learn the data and
+ * instruction cache line sizes and emits a loop over [p, p + n).
+ * Callers do not have to align p or n; the runtime widens to line
+ * boundaries.
+ *
+ * Lowerings: aarch64 dc {cvac,ivac,civac} + ic ivau loops; arm v7+
+ * equivalents via CP15. x86: no-ops (cache-coherent, ICache included).
+ * RISC-V: Zicbom / Zicboz instructions when those extensions are
+ * present, otherwise a compile-time error.
+ * ==================================================================== */
+void __cfree_dcache_clean(const void *p, unsigned long n);
+void __cfree_dcache_invalidate(void *p, unsigned long n);
+void __cfree_dcache_clean_invalidate(void *p, unsigned long n);
+void __cfree_icache_invalidate(const void *p, unsigned long n);
+
+/* ====================================================================
+ * Hints.
+ *
+ * __cfree_yield is the spin-loop hint (arm yield, x86 pause,
+ * RISC-V pause).
+ *
+ * __cfree_wfi sleeps until the next interrupt -- the universal "idle
+ * loop" primitive. Lowers to arm/aarch64 wfi, x86 hlt, RISC-V wfi.
+ * All three are privileged (ring 0 / EL1+ / M-or-S mode); bare-metal
+ * code is privileged by construction. WASM: compile-time error.
+ *
+ * The wfe/sev pair is arm/aarch64-only: the inter-core "event flag"
+ * abstraction (SEV signals the flag, WFE sleeps on it; the exclusive
+ * monitor's release also signals) does not generalize. x86 MONITOR/
+ * MWAIT is address-watch rather than flag-based and not in the base
+ * ISA; RISC-V has no base-ISA equivalent. Invoking them outside arm/
+ * aarch64 is a compile-time error -- write the spin-and-back-off loop
+ * with __cfree_yield + __cfree_wfi instead.
+ * ==================================================================== */
+void __cfree_nop(void);
+void __cfree_yield(void);
+void __cfree_wfi(void);
+void __cfree_wfe(void);  /* arm/aarch64 only */
+void __cfree_sev(void);  /* arm/aarch64 only */
+
+#endif
diff --git a/include/cfree/coro.h b/include/cfree/coro.h
@@ -0,0 +1,130 @@
+/* cfree/coro.h -- cfree extension -- stackful asymmetric coroutines
+ *
+ * cfree/coro.h is non-standard: C11 has no stackful-coroutine facility.
+ * cfree ships it as a native counterpart to <setjmp.h>: the underlying
+ * per-target context payload is literally shared with setjmp/longjmp
+ * (256 bytes, see doc/builtins.md), and the runtime is target-specific
+ * assembly in libcfree_rt.a.
+ *
+ * Two layers in this header:
+ *
+ *   coro_ctx   Raw register-context buffer used by the symmetric
+ *              primitive __cfree_coro_switch. Most code does not
+ *              touch it -- it is exposed for advanced schedulers
+ *              (M:N, custom dispatch) that want the bare switch.
+ *
+ *   coro_t     Asymmetric coroutine handle. Resume drives forward,
+ *              yield suspends back to the most recent resumer.
+ *              Resumes nest like function calls. status is
+ *              publicly readable; the rest is private storage.
+ *
+ * Programming model (asymmetric):
+ *   1. Allocate a coro_t and a stack region.
+ *   2. coro_init(&c, fn, stack_base, stack_len).
+ *   3. coro_resume(&c, value) drives c forward.
+ *   4. From inside fn, coro_yield(value) suspends back to the resumer.
+ *   5. fn's return value becomes the final coro_resume payload, with
+ *      status CORO_DEAD; the runtime cleans up automatically.
+ *
+ * Threading. The runtime's "current coroutine" pointer and "main"
+ * register save slot are _Thread_local, so each thread has its own
+ * resume chain. A coroutine itself is still tied to the thread that
+ * drives it: errno, _Thread_local user state, and thread-affine OS
+ * handles silently rebind if a coroutine is resumed on a different
+ * thread, so don't migrate a suspended coroutine across threads.
+ * cfree's contract defines __STDC_NO_THREADS__ (no <threads.h>) --
+ * _Thread_local is a separate C11 language feature and works
+ * independently.
+ */
+#ifndef CFREE_CORO_H
+#define CFREE_CORO_H
+
+#include <stddef.h>
+#include <stdint.h>
+
+/* Stack alignment required at function-call boundaries on every cfree
+   target (16 on x86_64/aarch64/arm32-AAPCS-VFP/riscv; weaker on i386
+   but 16 covers it). Caller stacks must be aligned to this. */
+#define CORO_STACK_ALIGN 16
+
+/* Raw register-context buffer. 256 bytes, alignof 16. The runtime
+   reinterprets this as a per-target struct of callee-saved GPRs +
+   callee-saved FPRs + sp + return address. Exposed only because the
+   internal __cfree_coro_switch primitive at the bottom of this header
+   needs it as an argument type. coro_t below embeds one of these as
+   the first word of its private storage. */
+typedef struct coro_ctx {
+    _Alignas(16) unsigned char __cfree_storage[256];
+} coro_ctx;
+
+/* ====================================================================
+ * Asymmetric coroutine API.
+ * ==================================================================== */
+
+typedef enum {
+    CORO_INIT,       /* never resumed */
+    CORO_RUNNING,    /* on the live resume chain */
+    CORO_SUSPENDED,  /* yielded; resumable */
+    CORO_DEAD,       /* entry returned */
+} coro_status_t;
+
+typedef struct {
+    uintptr_t value;
+    coro_status_t status;
+} coro_result_t;
+
+/* Coroutine entry point. The first coro_resume's value is passed as
+   `arg`. The return value is delivered as the final coro_resume's
+   payload, with status CORO_DEAD. */
+typedef uintptr_t (*coro_fn)(uintptr_t arg);
+
+/* Coroutine handle. status is publicly readable; the private blob
+   carries the register context (256 B), a resumer pointer, and the
+   user-supplied entry fn. 288 B is comfortable headroom on both LP64
+   and ILP32 (lib/coro/coro.c verifies the fit with a _Static_assert). */
+typedef struct coro {
+    coro_status_t status;
+    _Alignas(16) unsigned char __cfree_priv[288];
+} coro_t;
+
+/* Initialize *c to run fn on [stack_base, stack_base + stack_len).
+   stack_base must be CORO_STACK_ALIGN-aligned. status becomes
+   CORO_INIT. The first coro_resume delivers its value as fn's arg. */
+void coro_init(coro_t *c, coro_fn fn, void *stack_base, size_t stack_len);
+
+/* Drive c forward. If c is INIT, calls fn(value) on c's stack. If
+   SUSPENDED, c's matching coro_yield call returns value. coro_resume
+   itself returns when c yields or its fn returns; the result carries
+   c's new status (SUSPENDED or DEAD) and the value c delivered.
+   UB if c is RUNNING or DEAD. */
+coro_result_t coro_resume(coro_t *c, uintptr_t value);
+
+/* Suspend the current coroutine, returning value to its resumer (the
+   matching coro_resume call returns this value). coro_yield itself
+   returns the value the next resumer passes. UB outside a coroutine. */
+uintptr_t coro_yield(uintptr_t value);
+
+/* The currently running coroutine, or NULL if not in one. */
+coro_t *coro_self(void);
+
+static inline coro_status_t coro_status(const coro_t *c) { return c->status; }
+
+/* ====================================================================
+ * Symmetric primitive (compiler-builtin-style; for advanced schedulers).
+ *
+ * Saves callee-saved state into *from, restores it from *to, and
+ * delivers `value` to *to as the return of its prior switch (or as
+ * the first-arg register of *to's trampoline on a fresh context).
+ * Returns the value passed by the next switch back to *from.
+ *
+ * coro_resume / coro_yield are built on this. Most code should not
+ * call it directly; it is exposed for schedulers that don't fit the
+ * asymmetric resume-chain model (M:N runtimes, work-stealing, etc.).
+ *
+ * Bypassing the asymmetric layer means losing coro_self / status
+ * tracking / DEAD propagation -- the symmetric primitive is purely
+ * a register-shuffle and knows nothing about coro_t.
+ * ==================================================================== */
+uintptr_t __cfree_coro_switch(coro_ctx *from, coro_ctx *to, uintptr_t value);
+
+#endif
diff --git a/include/cfree/syscall.h b/include/cfree/syscall.h
@@ -0,0 +1,44 @@
+/* cfree/syscall.h -- cfree extension -- kernel-trap primitive
+ *
+ * cfree/syscall.h is non-standard: C11 has no notion of a kernel
+ * trap. cfree exposes the bare instruction so libc syscall stubs and
+ * other low-level code can stay pure C without resorting to inline
+ * asm.
+ *
+ * Numbering is the caller's responsibility -- cfree provides no
+ * SYS_* table. Pass the platform-specific number (see Linux
+ * <asm/unistd.h>, Darwin <sys/syscall.h>, etc.) in nr; pointers,
+ * sizes, and file descriptors are cast to long at the call site.
+ *
+ * Result convention: normalized to Linux-style "non-negative on
+ * success, -errno on failure" on every supported target. On
+ * BSD/Darwin, where the kernel signals failure via the carry/C
+ * flag and returns the positive errno in the result register, the
+ * lowering inspects the flag and rewrites the value -- callers
+ * see the Linux convention regardless of host kernel.
+ *
+ * Optimizer view: each call is opaque, with full memory clobber
+ * plus the target's syscall-clobber list. The optimizer cannot
+ * reorder loads, stores, or other side effects across the trap.
+ *
+ * Per-target lowering (see doc/builtins.md for the table): cfree
+ * emits the appropriate trap instruction (`syscall`, `int 0x80`,
+ * `svc`, `ecall`) inline; there is no library call.
+ *
+ * Not available on WASM: invoking any of these on __wasm__ is a
+ * compile-time error. WASM programs reach the host via WASI
+ * imports, not a syscall instruction.
+ */
+#ifndef CFREE_SYSCALL_H
+#define CFREE_SYSCALL_H
+
+long __cfree_syscall0(long nr);
+long __cfree_syscall1(long nr, long a0);
+long __cfree_syscall2(long nr, long a0, long a1);
+long __cfree_syscall3(long nr, long a0, long a1, long a2);
+long __cfree_syscall4(long nr, long a0, long a1, long a2, long a3);
+long __cfree_syscall5(long nr, long a0, long a1, long a2, long a3, long a4);
+long __cfree_syscall6(long nr, long a0, long a1, long a2, long a3, long a4,
+                      long a5);
+
+#endif
diff --git a/include/setjmp.h b/include/setjmp.h
@@ -10,7 +10,7 @@
  * such struct across cfree targets -- 256 bytes (x86_64 Windows: 12
  * GPR slots + xmm6-15). C11 explicitly excludes the FP status flags
  * and open-file state, so no signal-mask slot is reserved. The same
- * 256-byte payload is shared with <stdcoro.h>'s coro_ctx so the
+ * 256-byte payload is shared with <cfree/coro.h>'s coro_ctx so the
  * underlying save/restore halves are reused across all three
  * primitives. */
 #ifndef CFREE_SETJMP_H
diff --git a/include/stdcoro.h b/include/stdcoro.h
@@ -1,130 +0,0 @@
-/* stdcoro.h -- cfree extension -- stackful asymmetric coroutines
- *
- * stdcoro.h is non-standard: C11 has no stackful-coroutine facility.
- * cfree ships it as a native counterpart to <setjmp.h>: the underlying
- * per-target context payload is literally shared with setjmp/longjmp
- * (256 bytes, see doc/builtins.md), and the runtime is target-specific
- * assembly in libcfree_rt.a.
- *
- * Two layers in this header:
- *
- *   coro_ctx   Raw register-context buffer used by the symmetric
- *              primitive __cfree_coro_switch. Most code does not
- *              touch it -- it is exposed for advanced schedulers
- *              (M:N, custom dispatch) that want the bare switch.
- *
- *   coro_t     Asymmetric coroutine handle. Resume drives forward,
- *              yield suspends back to the most recent resumer.
- *              Resumes nest like function calls. status is
- *              publicly readable; the rest is private storage.
- *
- * Programming model (asymmetric):
- *   1. Allocate a coro_t and a stack region.
- *   2. coro_init(&c, fn, stack_base, stack_len).
- *   3. coro_resume(&c, value) drives c forward.
- *   4. From inside fn, coro_yield(value) suspends back to the resumer.
- *   5. fn's return value becomes the final coro_resume payload, with
- *      status CORO_DEAD; the runtime cleans up automatically.
- *
- * Threading. The runtime's "current coroutine" pointer and "main"
- * register save slot are _Thread_local, so each thread has its own
- * resume chain. A coroutine itself is still tied to the thread that
- * drives it: errno, _Thread_local user state, and thread-affine OS
- * handles silently rebind if a coroutine is resumed on a different
- * thread, so don't migrate a suspended coroutine across threads.
- * cfree's contract defines __STDC_NO_THREADS__ (no <threads.h>) --
- * _Thread_local is a separate C11 language feature and works
- * independently.
- */
-#ifndef CFREE_STDCORO_H
-#define CFREE_STDCORO_H
-
-#include <stddef.h>
-#include <stdint.h>
-
-/* Stack alignment required at function-call boundaries on every cfree
-   target (16 on x86_64/aarch64/arm32-AAPCS-VFP/riscv; weaker on i386
-   but 16 covers it). Caller stacks must be aligned to this. */
-#define CORO_STACK_ALIGN 16
-
-/* Raw register-context buffer. 256 bytes, alignof 16. The runtime
-   reinterprets this as a per-target struct of callee-saved GPRs +
-   callee-saved FPRs + sp + return address. Exposed only because the
-   internal __cfree_coro_switch primitive at the bottom of this header
-   needs it as an argument type. coro_t below embeds one of these as
-   the first word of its private storage. */
-typedef struct coro_ctx {
-  _Alignas(16) unsigned char __cfree_storage[256];
-} coro_ctx;
-
-/* ====================================================================
- * Asymmetric coroutine API.
- * ==================================================================== */
-
-typedef enum {
-  CORO_INIT, /* never resumed */
-  CORO_RUNNING, /* on the live resume chain */
-  CORO_SUSPENDED, /* yielded; resumable */
-  CORO_DEAD, /* entry returned */
-} coro_status_t;
-
-typedef struct {
-  uintptr_t value;
-  coro_status_t status;
-} coro_result_t;
-
-/* Coroutine entry point. The first coro_resume's value is passed as
-   `arg`. The return value is delivered as the final coro_resume's
-   payload, with status CORO_DEAD. */
-typedef uintptr_t (*coro_fn)(uintptr_t arg);
-
-/* Coroutine handle. status is publicly readable; the private blob
-   carries the register context (256 B), a resumer pointer, and the
-   user-supplied entry fn. 288 B is comfortable headroom on both LP64
-   and ILP32 (lib/coro/coro.c verifies the fit with a _Static_assert). */
-typedef struct coro {
-  coro_status_t status;
-  _Alignas(16) unsigned char __cfree_priv[288];
-} coro_t;
-
-/* Initialize *c to run fn on [stack_base, stack_base + stack_len).
-   stack_base must be CORO_STACK_ALIGN-aligned. status becomes
-   CORO_INIT. The first coro_resume delivers its value as fn's arg. */
-void coro_init(coro_t *c, coro_fn fn, void *stack_base, size_t stack_len);
-
-/* Drive c forward. If c is INIT, calls fn(value) on c's stack. If
-   SUSPENDED, c's matching coro_yield call returns value. coro_resume
-   itself returns when c yields or its fn returns; the result carries
-   c's new status (SUSPENDED or DEAD) and the value c delivered.
-   UB if c is RUNNING or DEAD. */
-coro_result_t coro_resume(coro_t *c, uintptr_t value);
-
-/* Suspend the current coroutine, returning value to its resumer (the
-   matching coro_resume call returns this value). coro_yield itself
-   returns the value the next resumer passes. UB outside a coroutine. */
-uintptr_t coro_yield(uintptr_t value);
-
-/* The currently running coroutine, or NULL if not in one. */
-coro_t *coro_self(void);
-
-static inline coro_status_t coro_status(const coro_t *c) { return c->status; }
-
-/* ====================================================================
- * Symmetric primitive (compiler-builtin-style; for advanced schedulers).
- *
- * Saves callee-saved state into *from, restores it from *to, and
- * delivers `value` to *to as the return of its prior switch (or as
- * the first-arg register of *to's trampoline on a fresh context).
- * Returns the value passed by the next switch back to *from.
- *
- * coro_resume / coro_yield are built on this. Most code should not
- * call it directly; it is exposed for schedulers that don't fit the
- * asymmetric resume-chain model (M:N runtimes, work-stealing, etc.).
- *
- * Bypassing the asymmetric layer means losing coro_self / status
- * tracking / DEAD propagation -- the symmetric primitive is purely
- * a register-shuffle and knows nothing about coro_t.
- * ==================================================================== */
-uintptr_t __cfree_coro_switch(coro_ctx *from, coro_ctx *to, uintptr_t value);
-
-#endif
diff --git a/lib/README.md b/lib/README.md
@@ -33,8 +33,8 @@ hand-written `mem/mem.c` is 0BSD; relicense as desired.
 | `riscv/rv64.S` | `__riscv_save_*` + `__riscv_restore_*` (rv64) | RISC-V rv64 with `-msave-restore` |
 | `mem/mem.c` | `memcpy` / `memmove` / `memset` / `memcmp` (weak) | All; user libc overrides |
 | `atomic/atomic_freestanding.c` | `__atomic_*` fallback shim | All |
-| `coro/<arch>.c` | Per-arch primitives: `setjmp` / `longjmp` (`<setjmp.h>`) + `__cfree_coro_ctx_init` / `__cfree_coro_switch` / `__cfree_coro_trampoline` (internal; the public `<stdcoro.h>` API sits on top via `coro/coro.c`) | One of `aarch64`, `arm32`, `arm32_thumb1`, `i386`, `riscv32`, `riscv64`, `x86_64`, `x86_64_win`. Not built for `wasm32`. |
-| `coro/coro.c` | Arch-agnostic asymmetric layer: `coro_init` / `coro_resume` / `coro_yield` / `coro_self` (`<stdcoro.h>`) | All variants that ship a `coro/<arch>.c`. |
+| `coro/<arch>.c` | Per-arch primitives: `setjmp` / `longjmp` (`<setjmp.h>`) + `__cfree_coro_ctx_init` / `__cfree_coro_switch` / `__cfree_coro_trampoline` (internal; the public `<cfree/coro.h>` API sits on top via `coro/coro.c`) | One of `aarch64`, `arm32`, `arm32_thumb1`, `i386`, `riscv32`, `riscv64`, `x86_64`, `x86_64_win`. Not built for `wasm32`. |
+| `coro/coro.c` | Arch-agnostic asymmetric layer: `coro_init` / `coro_resume` / `coro_yield` / `coro_self` (`<cfree/coro.h>`) | All variants that ship a `coro/<arch>.c`. |
 ### Build-time include dirs (consumed by the masters; nothing here lands in `libcfree_rt.a`)
@@ -153,7 +153,7 @@ Provides:
 - `setjmp` / `longjmp` (public, `<setjmp.h>`).
 - `__cfree_coro_switch(from, to, value)` — symmetric register switch,
-  exposed in `<stdcoro.h>` as a compiler-builtin-style primitive for
+  exposed in `<cfree/coro.h>` as a compiler-builtin-style primitive for
   advanced schedulers; the asymmetric layer below also uses it.
 - `__cfree_coro_ctx_init` / `__cfree_coro_trampoline` — internal.
diff --git a/lib/build.sh b/lib/build.sh
@@ -120,7 +120,7 @@ echo
 # ---- LP64 little-endian ------------------------------------------------------
 LP64_BASE="$INT_C $INT64_C $FP_C $MEM_C $ATOMIC_C"
-# Coro impl needs cfree's own headers (setjmp.h, stdcoro.h).
+# Coro impl needs cfree's own headers (setjmp.h, cfree/coro.h).
 CORO_INC="-I../include"
 build_variant x86_64-linux \
diff --git a/lib/coro/aarch64.c b/lib/coro/aarch64.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/aarch64.c -- AArch64 (AAPCS) implementations of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * All three primitives sit on one per-target context layout:
  *
@@ -24,7 +24,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/arm32.c b/lib/coro/arm32.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/arm32.c -- ARM32 Thumb-2 (AAPCS) implementations of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * All three primitives sit on one per-target context layout:
  *
@@ -31,7 +31,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/arm32_thumb1.c b/lib/coro/arm32_thumb1.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/arm32_thumb1.c -- ARMv6-M (Cortex-M0 / M0+, Thumb-1) impls of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * Thumb-1 / ARMv6-M is a strict subset of the Thumb-2 ISA used by the
  * sibling arm32.c, and several conveniences disappear:
@@ -27,7 +27,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/coro.c b/lib/coro/coro.c
@@ -1,5 +1,5 @@
 /*
- * lib/coro/coro.c -- asymmetric coroutine layer for <stdcoro.h>.
+ * lib/coro/coro.c -- asymmetric coroutine layer for <cfree/coro.h>.
 * Sits on top of the per-arch __cfree_coro_switch / __cfree_coro_ctx_init
 * primitives (one of lib/coro/<arch>.c) and supplies the public
@@ -37,7 +37,7 @@
  * any coroutine's lifecycle.
  */
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/i386.c b/lib/coro/i386.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/i386.c -- i386 System V (cdecl, ILP32) implementations of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * cdecl callee-saved set: ebx, esi, edi, ebp, esp. Args are pushed
  * right-to-left on the stack: at function entry, 4(%esp)=arg0,
@@ -33,7 +33,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/riscv32.c b/lib/coro/riscv32.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/riscv32.c -- RISC-V 32-bit (ILP32/ILP32F/ILP32D) implementations of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * Per-target context layout (matches xOS rv32 tick_coro_ctx):
  *
@@ -26,7 +26,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/riscv64.c b/lib/coro/riscv64.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/riscv64.c -- RISC-V 64-bit (LP64D) implementations of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * RISC-V LP64D callee-saved set:
  *   ra (x1) -- saved manually so longjmp/__cfree_coro_switch can
@@ -39,7 +39,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/x86_64.c b/lib/coro/x86_64.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/x86_64.c -- x86_64 System V ABI implementations of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * Callee-saved set on SysV: rbx, rbp, r12-r15. (No callee-saved xmm
  * regs -- those are MS-ABI specific; see x86_64_win.c.)
@@ -24,7 +24,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/lib/coro/x86_64_win.c b/lib/coro/x86_64_win.c
@@ -1,7 +1,7 @@
 /*
  * lib/coro/x86_64_win.c -- x86_64 Windows (MS x64 ABI) implementations of
  *   setjmp / longjmp (<setjmp.h>)
- *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<stdcoro.h>)
+ *   __cfree_coro_ctx_init / __cfree_coro_switch / trampoline (<cfree/coro.h>)
  *
  * MS x64 callee-saved set: rbx, rbp, rdi, rsi, r12-r15, xmm6-xmm15.
  * (Compare with x86_64.c -- SysV doesn't preserve rdi/rsi or any xmm.)
@@ -29,7 +29,7 @@
  */
 #include <setjmp.h>
-#include <stdcoro.h>
+#include <cfree/coro.h>
 #include <stddef.h>
 #include <stdint.h>
diff --git a/src/abi/abi.h b/src/abi/abi.h
@@ -0,0 +1,128 @@
+#ifndef CFREE_ABI_H
+#define CFREE_ABI_H
+
+#include "../core/core.h"
+#include "../type/type.h"
+
+/* TargetABI is the single authority for target-dependent C layout and calling
+ * convention decisions. Type remains structural and ABI-neutral; all sizes,
+ * alignments, field offsets, bitfield packing, scalar widths, and
+ * argument/return classifications are derived here from Compiler.target. */
+typedef struct TargetABI TargetABI;
+
+typedef enum ABIScalarKind {
+  ABI_SC_VOID,
+  ABI_SC_BOOL,
+  ABI_SC_INT,
+  ABI_SC_FLOAT,
+  ABI_SC_PTR,
+} ABIScalarKind;
+
+typedef struct ABITypeInfo {
+  u32 size;
+  u32 align;
+  u8 scalar_kind; /* ABIScalarKind; ABI_SC_VOID for aggregates/void */
+  u8 signed_;
+  u8 atomic;
+  u8 pad;
+} ABITypeInfo;
+
+typedef struct ABIFieldLayout {
+  u32 offset; /* byte offset from record base */
+  u16 bit_offset; /* bit offset within storage unit for bitfields */
+  u16 bit_width; /* 0 for non-bitfield */
+  u32 storage_size; /* bytes in the bitfield storage unit; 0 otherwise */
+} ABIFieldLayout;
+
+typedef struct ABIRecordLayout {
+  u32 size;
+  u32 align;
+  u32 nfields;
+  const ABIFieldLayout* fields;
+} ABIRecordLayout;
+
+typedef enum ABIArgKind {
+  ABI_ARG_IGNORE,
+  ABI_ARG_DIRECT, /* one or more inspectable parts */
+  ABI_ARG_INDIRECT, /* caller passes address */
+  ABI_ARG_EXPAND, /* aggregate split into parts below */
+} ABIArgKind;
+
+typedef enum ABIArgClass {
+  ABI_CLASS_NONE,
+  ABI_CLASS_INT,
+  ABI_CLASS_FP,
+  ABI_CLASS_VEC,
+  ABI_CLASS_MEM,
+} ABIArgClass;
+
+typedef enum ABIArgLoc {
+  ABI_LOC_NONE,
+  ABI_LOC_REG,
+  ABI_LOC_STACK,
+  ABI_LOC_EITHER,
+} ABIArgLoc;
+
+typedef enum ABIArgFlag {
+  ABI_AF_NONE = 0,
+  ABI_AF_SRET = 1u << 0, /* hidden structure-return pointer */
+  ABI_AF_BYVAL = 1u << 1, /* caller passes an address to a copy */
+  ABI_AF_SIGN_EXT = 1u << 2,
+  ABI_AF_ZERO_EXT = 1u << 3,
+  ABI_AF_VARARG = 1u << 4, /* placement affected by variadic rules */
+  ABI_AF_SPLIT = 1u << 5, /* source value is split across parts */
+} ABIArgFlag;
+
+typedef struct ABIArgPart {
+  u8 cls; /* ABIArgClass */
+  u8 loc; /* ABIArgLoc preference */
+  u16 flags; /* ABIArgFlag */
+  u32 src_offset; /* byte offset within source object */
+  u32 size; /* bytes carried by this part */
+  u32 align; /* part alignment */
+  u32 stack_align; /* required stack alignment if stack-passed */
+} ABIArgPart;
+
+typedef struct ABIArgInfo {
+  u8 kind; /* ABIArgKind */
+  u8 flags; /* ABIArgFlag applying to the whole argument */
+  u16 nparts;
+  u32 indirect_align; /* required alignment for ABI_ARG_INDIRECT/byval copy */
+  const ABIArgPart* parts;
+} ABIArgInfo;
+
+typedef struct ABIFuncInfo {
+  ABIArgInfo ret;
+  const ABIArgInfo* params;
+  u16 nparams;
+  u8 variadic;
+  u8 has_sret;
+  u32 vararg_gp_offset;
+  u32 vararg_fp_offset;
+  u32 vararg_overflow_offset;
+} ABIFuncInfo;
+
+void abi_init(TargetABI*, Compiler*);
+void abi_fini(TargetABI*);
+
+/* Builtin scalar profiles and general type layout. */
+ABITypeInfo abi_type_info(TargetABI*, const Type*);
+u32 abi_sizeof (TargetABI*, const Type*);
+u32 abi_alignof (TargetABI*, const Type*);
+
+/* Record layout is cached by Type* identity inside TargetABI and is stable for
+ * the lifetime of the ABI object. Incomplete records are fatal diagnostics. */
+const ABIRecordLayout* abi_record_layout(TargetABI*, const Type*);
+
+/* Calling convention classification. The returned object is owned by the ABI
+ * cache and remains valid until abi_fini. */
+const ABIFuncInfo* abi_func_info(TargetABI*, const Type* fn_type);
+
+/* Target-defined library types used by headers and builtins. */
+const Type* abi_size_type (TargetABI*, Pool*);
+const Type* abi_ptrdiff_type (TargetABI*, Pool*);
+const Type* abi_intptr_type (TargetABI*, Pool*);
+const Type* abi_uintptr_type (TargetABI*, Pool*);
+const Type* abi_va_list_type (TargetABI*, Pool*);
+
+#endif
diff --git a/src/arch/arch.h b/src/arch/arch.h
@@ -0,0 +1,439 @@
+#ifndef CFREE_ARCH_H
+#define CFREE_ARCH_H
+
+#include "../core/core.h"
+#include "../type/type.h"
+#include "../abi/abi.h"
+#include "../obj/obj.h"
+
+/* Reg is wide enough for opt_cgtarget to hand out unbounded virtual registers
+ * (one per defined value). Target backends use only a small subset. */
+typedef u32 Reg;
+#define REG_NONE 0xffffffffu
+
+typedef enum RegClass {
+  RC_INT,
+  RC_FP,
+  RC_VEC,
+} RegClass;
+
+typedef enum BinOp {
+  BO_IADD, BO_ISUB, BO_IMUL,
+  BO_SDIV, BO_UDIV, BO_SREM, BO_UREM,
+  BO_FADD, BO_FSUB, BO_FMUL, BO_FDIV,
+  BO_AND, BO_OR, BO_XOR,
+  BO_SHL, BO_SHR_S, BO_SHR_U,
+} BinOp;
+
+typedef enum UnOp {
+  UO_NEG,
+  UO_NOT, /* logical: 0/1 */
+  UO_BNOT, /* bitwise ~ */
+} UnOp;
+
+typedef enum CmpOp {
+  CMP_EQ, CMP_NE,
+  CMP_LT_S, CMP_LE_S, CMP_GT_S, CMP_GE_S,
+  CMP_LT_U, CMP_LE_U, CMP_GT_U, CMP_GE_U,
+  CMP_LT_F, CMP_LE_F, CMP_GT_F, CMP_GE_F,
+} CmpOp;
+
+typedef enum ConvKind {
+  CV_SEXT, CV_ZEXT, CV_TRUNC,
+  CV_ITOF_S, CV_ITOF_U, CV_FTOI_S, CV_FTOI_U,
+  CV_FEXT, CV_FTRUNC,
+  CV_BITCAST,
+} ConvKind;
+
+typedef enum AtomicOp {
+  AO_XCHG,
+  AO_ADD, AO_SUB,
+  AO_AND, AO_OR, AO_XOR, AO_NAND,
+} AtomicOp;
+
+typedef enum MemOrder {
+  MO_RELAXED,
+  MO_CONSUME,
+  MO_ACQUIRE,
+  MO_RELEASE,
+  MO_ACQ_REL,
+  MO_SEQ_CST,
+} MemOrder;
+
+typedef enum OpKind {
+  OPK_IMM,
+  OPK_REG,
+  OPK_LOCAL, /* frame-relative; v.frame_slot identifies the slot */
+  OPK_GLOBAL, /* address: symbol+addend, not a load */
+  OPK_INDIRECT, /* [reg + ofs] */
+} OpKind;
+
+typedef u32 FrameSlot;
+#define FRAME_SLOT_NONE 0u
+
+typedef enum FrameSlotKind {
+  FS_LOCAL,
+  FS_PARAM,
+  FS_SPILL,
+  FS_SRET,
+  FS_ALLOCA,
+} FrameSlotKind;
+
+typedef enum FrameSlotFlag {
+  FSF_NONE = 0,
+  FSF_ADDR_TAKEN = 1u << 0,
+  FSF_VOLATILE = 1u << 1,
+} FrameSlotFlag;
+
+typedef struct FrameSlotDesc {
+  const Type* type;
+  Sym name;
+  SrcLoc loc;
+  u32 size;
+  u32 align;
+  u8 kind; /* FrameSlotKind */
+  u8 pad;
+  u16 flags; /* FrameSlotFlag */
+} FrameSlotDesc;
+
+typedef enum MemFlag {
+  MF_NONE = 0,
+  MF_VOLATILE = 1u << 0,
+  MF_ATOMIC = 1u << 1,
+  MF_RESTRICT = 1u << 2,
+  MF_READONLY = 1u << 3,
+  MF_WRITEONLY = 1u << 4,
+  MF_UNALIGNED = 1u << 5,
+} MemFlag;
+
+typedef enum AliasKind {
+  ALIAS_UNKNOWN,
+  ALIAS_LOCAL,
+  ALIAS_GLOBAL,
+  ALIAS_PARAM,
+  ALIAS_HEAP,
+  ALIAS_STRING,
+} AliasKind;
+
+typedef struct AliasRoot {
+  u8 kind; /* AliasKind */
+  u8 pad[3];
+  union {
+    i32 local_id;
+    ObjSymId global;
+    u32 param_idx;
+    Sym string_id;
+  } v;
+} AliasRoot;
+
+typedef struct MemAccess {
+  const Type* type; /* semantic C object type accessed */
+  u32 size; /* ABI byte size of this access */
+  u32 align; /* known byte alignment; 0 means unknown */
+  u16 flags; /* MemFlag */
+  u16 addr_space;
+  AliasRoot alias;
+} MemAccess;
+
+typedef struct ConstBytes {
+  const Type* type;
+  const u8* bytes; /* ABI representation, little/big endian per target */
+  u32 size;
+  u32 align;
+} ConstBytes;
+
+typedef struct AggregateAccess {
+  const Type* type;
+  u32 size;
+  u32 align;
+  MemAccess mem;
+} AggregateAccess;
+
+typedef struct BitFieldAccess {
+  const Type* field_type;
+  MemAccess storage;
+  u32 storage_offset; /* byte offset from record base */
+  u16 bit_offset; /* target-endian bit offset within storage unit */
+  u16 bit_width; /* may be 0 for zero-width layout barriers */
+  u8 signed_;
+  u8 pad[3];
+} BitFieldAccess;
+
+typedef struct Operand {
+  u8 kind;
+  u8 cls; /* RegClass */
+  u16 pad;
+  const Type* type;
+  union {
+    i64 imm;
+    Reg reg;
+    FrameSlot frame_slot;
+    struct { ObjSymId sym; i64 addend; } global;
+    struct { Reg base; i32 ofs; } ind;
+  } v;
+} Operand;
+
+typedef enum CGABIPartFlag {
+  CG_ABI_PART_NONE = 0,
+  CG_ABI_PART_SRET = 1u << 0,
+  CG_ABI_PART_BYVAL = 1u << 1,
+  CG_ABI_PART_INDIRECT = 1u << 2,
+} CGABIPartFlag;
+
+typedef struct CGABIPart {
+  const ABIArgPart* abi_part;
+  Operand op;
+  u32 src_offset;
+  u32 size;
+  u16 flags; /* CGABIPartFlag */
+  u16 pad;
+} CGABIPart;
+
+typedef struct CGABIValue {
+  const Type* type;
+  const ABIArgInfo* abi;
+  Operand storage; /* address for indirect/byval/sret, REG/IMM for simple values */
+  const CGABIPart* parts;
+  u32 nparts;
+} CGABIValue;
+
+typedef struct CGParamDesc {
+  u32 index;
+  Sym name;
+  const Type* type;
+  FrameSlot slot;
+  const ABIArgInfo* abi;
+  const CGABIPart* incoming;
+ u32 nincoming; + SrcLoc loc; +} CGParamDesc; + +typedef struct CGFuncDesc { + ObjSymId sym; + const Type* fn_type; + const ABIFuncInfo* abi; + const CGParamDesc* params; + u32 nparams; + SrcLoc loc; +} CGFuncDesc; + +typedef struct CGCallDesc { + const Type* fn_type; + const ABIFuncInfo* abi; + Operand callee; + const CGABIValue* args; + u32 nargs; + CGABIValue ret; +} CGCallDesc; + +typedef u32 Label; +#define LABEL_NONE 0 + +typedef enum ScopeKind { + SCOPE_BLOCK, /* break exits forward */ + SCOPE_LOOP, /* break exits forward; continue uses explicit target */ + SCOPE_IF, /* cond consumed at scope_begin */ +} ScopeKind; + +typedef u32 CGScope; +#define CG_SCOPE_NONE 0u + +typedef struct CGScopeDesc { + u8 kind; /* ScopeKind */ + u8 pad[3]; + Label break_label; /* explicit target for break; LABEL_NONE => target creates one */ + Label continue_label; /* explicit target for continue; LABEL_NONE for non-loops */ + Operand cond; /* SCOPE_IF condition; ignored otherwise */ + const Type* result_type; /* reserved for structured expression results */ +} CGScopeDesc; + +typedef enum AsmDir { ASM_IN, ASM_OUT, ASM_INOUT } AsmDir; + +typedef struct AsmConstraint { + const char* str; /* GCC-style: "r", "=&r", "+m", "i", "0" ... */ + u8 dir; /* AsmDir */ + u8 pad[3]; +} AsmConstraint; + +typedef u32 MCLabel; +#define MC_LABEL_NONE 0u + +typedef struct MCEmitter MCEmitter; +struct MCEmitter { + /* Machine/object emission context. Subclasses extend. 
*/ + Compiler* c; + ObjBuilder* obj; + u32 section_id; + + void (*set_section)(MCEmitter*, u32 section_id); + u32 (*pos) (MCEmitter*); + + MCLabel (*label_new) (MCEmitter*); + void (*label_place)(MCEmitter*, MCLabel); + + void (*emit_bytes)(MCEmitter*, const u8*, size_t); + void (*emit_fill) (MCEmitter*, size_t n, u8 byte); + void (*emit_align)(MCEmitter*, u32 align, u8 fill); + void (*emit_reloc)(MCEmitter*, RelocKind, ObjSymId, i64 addend); + void (*emit_reloc_at)(MCEmitter*, u32 section_id, u32 offset, RelocKind, + ObjSymId, i64 addend, int explicit_addend, int pair); + void (*emit_label_ref)(MCEmitter*, MCLabel, RelocKind, u32 width, i64 addend); + void (*set_loc) (MCEmitter*, SrcLoc); + void (*destroy) (MCEmitter*); +}; + +typedef struct CGTarget CGTarget; +struct CGTarget { + /* Typed C/IR lowering context. Subclasses extend. */ + Compiler* c; + ObjBuilder* obj; + MCEmitter* mc; + u32 text_section_id; + + /* ---- function lifecycle ---- */ + void (*func_begin)(CGTarget*, const CGFuncDesc*); + void (*func_end)(CGTarget*); + + /* ---- registers and frame slots ---- + * At -O0 CG is TCC-style and owns the value stack: it decides which live + * values must be spilled/reloaded across register pressure, calls, and asm. + * Real targets return physical scratch registers and implement spill/reload + * mechanics; opt_cgtarget returns fresh virtual regs and ignores spills. 
*/ + Reg (*alloc_reg) (CGTarget*, RegClass, const Type*); + void (*free_reg) (CGTarget*, Reg); /* hint; opt_cgtarget ignores */ + i32 (*alloc_local)(CGTarget*, u32 size, u32 align); + FrameSlot (*frame_slot)(CGTarget*, const FrameSlotDesc*); + void (*param) (CGTarget*, const CGParamDesc*); + const Reg* (*clobbers)(CGTarget*, RegClass, u32* nregs); + void (*spill_reg) (CGTarget*, Operand src_reg, FrameSlot, MemAccess); + void (*reload_reg) (CGTarget*, Operand dst_reg, FrameSlot, MemAccess); + + /* ---- labels and control flow ---- */ + Label (*label_new) (CGTarget*); + void (*label_place)(CGTarget*, Label); + void (*jump) (CGTarget*, Label); + /* Fused compare-and-branch. cg's preferred form: avoids materializing 0/1 + * for a normal `if (a < b)`. For an arbitrary i1 in a register, callers + * synthesize cmp_branch(CMP_NE, val, IMM_ZERO, label). */ + void (*cmp_branch)(CGTarget*, CmpOp, Operand a, Operand b, Label); + + /* ---- structured control flow ---- + * Mirrors CG's scope ops. CG passes explicit break/continue targets so C + * `for` continues can land on the increment expression rather than the loop + * header. Real backends shim these onto label_new/label_place/jump. + * The WASM backend consumes them natively to emit block/loop/if with + * structurally-bounded br targets, which is what gives WASM its CFI. + * + * For SCOPE_IF, `cond` is the i1 operand; ignored for BLOCK/LOOP. + * `result_type` is reserved for if-as-expression on WASM (NULL for the + * statement case used by C); other backends ignore it. 
*/ + CGScope (*scope_begin)(CGTarget*, const CGScopeDesc*); + void (*scope_else) (CGTarget*, CGScope); + void (*scope_end) (CGTarget*, CGScope); + void (*break_to) (CGTarget*, CGScope); + void (*continue_to)(CGTarget*, CGScope); + + /* ---- data movement (split, no overloading) ---- */ + void (*load_imm)(CGTarget*, Operand dst /*REG*/, i64 imm); + void (*load_const)(CGTarget*, Operand dst /*REG*/, ConstBytes); + void (*copy) (CGTarget*, Operand dst /*REG*/, Operand src /*REG*/); + void (*load) (CGTarget*, Operand dst /*REG*/, Operand addr /*LOCAL|GLOBAL|INDIRECT*/, MemAccess); + void (*store) (CGTarget*, Operand addr /*LOCAL|GLOBAL|INDIRECT*/, Operand src /*REG|IMM*/, MemAccess); + void (*addr_of) (CGTarget*, Operand dst /*REG*/, Operand lv /*LOCAL|GLOBAL|INDIRECT*/); + void (*copy_bytes)(CGTarget*, Operand dst_addr, Operand src_addr, AggregateAccess); + void (*set_bytes) (CGTarget*, Operand dst_addr, Operand byte_value, AggregateAccess); + void (*bitfield_load) (CGTarget*, Operand dst /*REG*/, Operand record_addr, BitFieldAccess); + void (*bitfield_store)(CGTarget*, Operand record_addr, Operand src /*REG|IMM*/, BitFieldAccess); + + /* ---- arithmetic, compare, convert ---- */ + void (*binop) (CGTarget*, BinOp, Operand dst, Operand a, Operand b); + void (*unop) (CGTarget*, UnOp, Operand dst, Operand a); + void (*cmp) (CGTarget*, CmpOp, Operand dst, Operand a, Operand b); /* materialize 0/1 */ + void (*convert)(CGTarget*, ConvKind, Operand dst, Operand src); + + /* ---- calls / return ---- + * CGCallDesc carries the type-checked signature, inspectable ABI + * classification, source operands, and the already-materialized ABI parts + * for direct, indirect/byval, sret, split, and multi-register values. + * `callee.kind == OPK_GLOBAL` is direct; any other kind is indirect. */ + void (*call)(CGTarget*, const CGCallDesc*); + void (*ret) (CGTarget*, const CGABIValue* val_or_null); + + /* ---- alloca ---- + * Dynamic stack allocation. 
`size` is i64 bytes; `align` is the required + * alignment of the returned pointer. Backend grows the (linear-memory or + * native) shadow stack, returns the pointer in `dst`. v1 only emits this + * via __builtin_alloca; C VLAs are not parsed (__STDC_NO_VLA__). */ + void (*alloca_)(CGTarget*, Operand dst /*REG*/, Operand size, u32 align); + + /* ---- variadics ---- + * va_list type is per-arch (defined in <stdarg.h>); these methods + * implement the four C macros after builtin substitution. ap is always + * passed as &ap; on SysV x86-64 the backend manages the register-save + * area, on WASM the backend walks the spilled-args memory. */ + void (*va_start_)(CGTarget*, Operand ap_addr); + void (*va_arg_) (CGTarget*, Operand dst /*REG*/, Operand ap_addr, const Type* t); + void (*va_end_) (CGTarget*, Operand ap_addr); + void (*va_copy_) (CGTarget*, Operand dst_ap_addr, Operand src_ap_addr); + + /* ---- setjmp / longjmp ---- + * Optional. Real backends leave these NULL: the parser lowers <setjmp.h>'s + * setjmp to a normal call to __cfree_setjmp and opt recognizes the symbol + * by name as returns-twice. The WASM backend implements them via the + * exception-handling proposal so that a longjmp can unwind across WASM + * frames (which lack a saveable native SP). + * + * setjmp pops &buf, returns i32 in `dst` (0 on direct return, nonzero on + * longjmp). longjmp pops &buf and val; control does not return. 
*/ + void (*setjmp_) (CGTarget*, Operand dst /*REG, i32*/, Operand buf_addr); + void (*longjmp_)(CGTarget*, Operand buf_addr, Operand val); + + /* ---- atomics ---- */ + void (*atomic_load) (CGTarget*, Operand dst /*REG*/, Operand addr, MemAccess, MemOrder); + void (*atomic_store)(CGTarget*, Operand addr, Operand src, MemAccess, MemOrder); + void (*atomic_rmw) (CGTarget*, AtomicOp, Operand dst /*REG: prior value*/, + Operand addr, Operand val, MemAccess, MemOrder); + void (*atomic_cas) (CGTarget*, Operand prior /*REG*/, Operand ok /*REG, i1*/, + Operand addr, Operand expected, Operand desired, + MemAccess, MemOrder success, MemOrder failure); + void (*fence) (CGTarget*, MemOrder); + + /* ---- inline asm ---- + * Per-arch constraint binding + template assembly, packaged as one block. + * ins[i] are pre-evaluated input operands. + * out_ops[i] is filled by the arch with the location holding the result + * for outs[i]; the caller (cg) reads them out after the call. + * "=&r" early-clobber outputs must be allocated disjoint from any input. + * opt_cgtarget records this as a single IR_ASM_BLOCK; the wrapped target + * receives the same call at lowering time with materialized operands. */ + void (*asm_block)(CGTarget*, + const char* tmpl, + const AsmConstraint* outs, u32 nout, Operand* out_ops, + const AsmConstraint* ins, u32 nin, const Operand* in_ops, + const Sym* clobbers, u32 nclob); + + /* ---- source-location tracking ---- + * Sets the SrcLoc inherited by subsequent emit-side calls (binop/load/...). + * opt_cgtarget stamps it on every recorded Inst; target CGTargets forward it + * to MCEmitter for Debug line emission. Sticky until the next set_loc. */ + void (*set_loc)(CGTarget*, SrcLoc); + + /* ---- end-of-TU hook ---- + * No-op for plain target CGTargets. opt_cgtarget runs cross-function passes + * (inlining + cleanup) and lowers all buffered IR functions into the + * wrapped target CGTarget. 
Drivers must call this after the last func_end and
+   * before reading from `obj` or calling debug_emit. */
+  void (*finalize)(CGTarget*);
+
+  void (*destroy)(CGTarget*);
+};
+
+/* Construct the right target/emitter pair for c->target. */
+MCEmitter* mc_new(Compiler*, ObjBuilder*);
+void mc_free(MCEmitter*);
+
+CGTarget* cgtarget_new(Compiler*, ObjBuilder*, MCEmitter*);
+void cgtarget_finalize(CGTarget*);
+void cgtarget_free(CGTarget*);
+
+#endif
diff --git a/src/cg/cg.h b/src/cg/cg.h
@@ -0,0 +1,163 @@
+#ifndef CFREE_CG_H
+#define CFREE_CG_H
+
+#include "../arch/arch.h"
+#include "../decl/decl.h"
+#include "../type/type.h"
+
+typedef struct CG CG;
+typedef struct Debug Debug;
+
+/* Debug is optional; pass NULL when -g is off. */
+CG* cg_new(Compiler*, CGTarget*, Debug*);
+void cg_free(CG*);
+
+/* ----- functions ----- */
+void cg_func_begin(CG*, const CGFuncDesc*);
+void cg_func_end (CG*);
+
+/* ----- locals & params ----- */
+FrameSlot cg_local(CG*, const FrameSlotDesc*); /* returns frame slot; pushes nothing */
+void cg_param(CG*, const CGParamDesc*);
+
+/* ----- value-stack pushes ----- */
+void cg_push_int (CG*, i64, const Type*);
+void cg_push_const (CG*, ConstBytes); /* exact ABI bytes */
+void cg_push_float (CG*, double, const Type*); /* convenience for simple parser paths */
+void cg_push_str (CG*, Sym str_id, const Type*); /* into rodata; pushes pointer */
+void cg_push_local (CG*, FrameSlot); /* lvalue */
+void cg_push_global(CG*, ObjSymId, const Type*); /* lvalue */
+
+/* ----- value-stack manipulation ----- */
+void cg_load (CG*); /* lvalue → rvalue; derives MemAccess */
+void cg_addr (CG*); /* lvalue → ptr rvalue */
+void cg_store(CG*); /* [..., lv, rv] → []; derives MemAccess */
+void cg_dup (CG*);
+void cg_swap (CG*);
+void cg_drop (CG*);
+
+/* Aggregate and bitfield operations keep C object semantics visible to direct
+ * targets and opt.
Addresses are lvalues or pointer rvalues on the value stack;
+ * sizes, offsets, storage units, and alignments come from TargetABI. */
+void cg_copy_aggregate(CG*, AggregateAccess); /* [..., dst_addr, src_addr] → [] */
+void cg_set_aggregate (CG*, AggregateAccess); /* [..., dst_addr, byte] → [] */
+void cg_bitfield_load (CG*, BitFieldAccess); /* [..., record_addr] → value */
+void cg_bitfield_store(CG*, BitFieldAccess); /* [..., record_addr, value] → [] */
+
+void cg_binop (CG*, BinOp);
+void cg_unop (CG*, UnOp);
+void cg_cmp (CG*, CmpOp);
+void cg_convert(CG*, const Type* dst); /* picks ConvKind from src/dst */
+
+/* Direct vs indirect: callee on the stack distinguishes itself by SValue/operand
+ * kind. CG obtains ABIFuncInfo from Compiler.abi, materializes CGABIValue
+ * argument/return parts, then calls CGTarget.call with a CGCallDesc. On WASM,
+ * fn_type selects the call_indirect type index (interned Type* identity is the
+ * index source of truth). */
+void cg_call(CG*, u32 nargs, const Type* fn_type); /* stack: [..., callee, arg0..argN-1]
+                                                      → result (if non-void) */
+void cg_ret (CG*, int has_value);
+
+/* ----- C declarations and global initializers -----
+ * Parser records C declaration semantics through DeclTable. CG consumes DeclIds
+ * only when a declaration becomes executable code or an addressable object. */
+void cg_bind_decl(CG*, DeclId);
+
+/* ----- alloca -----
+ * Dynamic stack allocation. Pops `size_bytes` (i64), pushes `void*` aligned to
+ * max_align_t. v1 does not parse C99/C11 VLAs (predefines __STDC_NO_VLA__);
+ * cg_alloca is reachable only via the __builtin_alloca path. */
+void cg_alloca(CG*);
+
+/* ----- variadics -----
+ * va_list type is per-arch (defined in <stdarg.h>). The four ops match the C
+ * macros after builtin substitution. cg_va_arg pops &ap and pushes the next
+ * arg of `t`. cg_va_start/end/copy pop the va_list addresses and push nothing.
*/
+/* The trailing underscores avoid colliding with <stdarg.h> macros — cfree
+ * sources include stdarg.h for compiler_panicv (see core.h). */
+void cg_va_start_(CG*); /* pop &ap */
+void cg_va_arg_ (CG*, const Type* t); /* pop &ap; push value */
+void cg_va_end_ (CG*); /* pop &ap */
+void cg_va_copy_ (CG*); /* pop &dst, &src */
+
+/* ----- setjmp / longjmp -----
+ * On real arches these are NOT emitted: the parser lowers <setjmp.h>'s setjmp
+ * to a normal extern call to __cfree_setjmp; opt recognizes the symbol by name
+ * as returns-twice (no inlining across; values defined before the call are not
+ * GVN-merged with values defined after). On WASM the parser instead emits
+ * cg_setjmp/cg_longjmp, which forward to CGTarget.setjmp/CGTarget.longjmp; the WASM
+ * backend lowers via the exception-handling proposal.
+ *
+ * cg_setjmp pops &buf and pushes i32 (0 on direct return, nonzero on longjmp).
+ * cg_longjmp pops &buf and val; does not return. */
+void cg_setjmp (CG*);
+void cg_longjmp(CG*);
+
+/* ----- atomics -----
+ * Pointer operands are typed `_Atomic T*`. cg derives MemAccess from the
+ * pointee type, qualifiers, alignment facts, and alias root; the pointee type
+ * drives width and tells the backend whether the op fits inline or routes to
+ * compiler-rt. */
+void cg_atomic_load (CG*, MemOrder); /* pops ptr; pushes value */
+void cg_atomic_store(CG*, MemOrder); /* pops ptr, value */
+void cg_atomic_rmw (CG*, AtomicOp, MemOrder); /* pops ptr, val; pushes prior */
+void cg_atomic_cas (CG*, MemOrder success, MemOrder failure);
+                    /* pops ptr, expected, desired;
+                     * pushes (prior, ok_i1) */
+void cg_fence (CG*, MemOrder);
+
+/* ----- control flow (CG-level labels) -----
+ * cg_branch_true fuses with a preceding cg_cmp into a single CGTarget.cmp_branch
+ * when the i1 on top of stack is the unconsumed result of that cmp. For a
+ * non-cmp i1, it emits cmp_branch(CMP_NE, val, IMM_ZERO, label).
*/
+typedef u32 CGLabel;
+CGLabel cg_label_new(CG*);
+void cg_label_place(CG*, CGLabel);
+void cg_jump(CG*, CGLabel);
+void cg_branch_true (CG*, CGLabel); /* pops i1 */
+void cg_branch_false(CG*, CGLabel);
+
+/* ----- structured control flow -----
+ * Used for if / while / for / do — the cases where the parser already knows
+ * the structure. Nests like a stack: every scope_begin must pair with one
+ * scope_end at the same nesting depth. Break and continue targets are explicit
+ * so C `for` continue jumps to the increment expression, not necessarily the
+ * loop header.
+ *
+ * Real backends implement these as a thin shim over label_place/jump (no code
+ * size cost). The WASM backend consumes them directly to emit block/loop/if
+ * with structurally-bounded br targets — that's the source of CFI on WASM
+ * without invoking the relooper.
+ *
+ * goto, computed-goto, and switch fallthrough still go through the flat label
+ * API above. opt's IR is flat-CFG; at -O2 the WASM lowering pass relooper
+ * reconstructs structure from the flat IR. At -O0/-O1 (no opt wrapper),
+ * CG drives the WASM CGTarget directly with scope ops and no relooper runs. */
+/* ScopeKind is shared with CGTarget (see arch.h). */
+typedef u32 CGScope;
+typedef struct CGScopeConfig {
+  ScopeKind kind;
+  CGLabel break_label;
+  CGLabel continue_label;
+  const Type* result_type;
+} CGScopeConfig;
+CGScope cg_scope_begin(CG*, CGScopeConfig); /* IF: pops i1 */
+void cg_scope_else (CG*, CGScope); /* IF only */
+void cg_scope_end (CG*, CGScope);
+void cg_break (CG*, CGScope);
+void cg_continue (CG*, CGScope); /* LOOP only */
+
+/* ----- source location ----- */
+void cg_set_loc(CG*, SrcLoc); /* propagates to CGTarget and Debug */
+
+/* ----- inline asm -----
+ * Inputs are popped from the CG stack in declaration order before outputs are
+ * pushed back as fresh SValues. Constraints are GCC-style strings; binding
+ * is per-arch and happens inside CGTarget.asm_block.
*/
+void cg_inline_asm(CG*,
+    const char* tmpl,
+    const AsmConstraint* outs, u32 nout,
+    const AsmConstraint* ins, u32 nin,
+    const Sym* clobbers, u32 nclob);
+
+#endif
diff --git a/src/debug/debug.h b/src/debug/debug.h
@@ -0,0 +1,72 @@
+#ifndef CFREE_DEBUG_H
+#define CFREE_DEBUG_H
+
+#include "../core/core.h"
+#include "../type/type.h"
+#include "../arch/arch.h"
+
+/* DWARF debug info. The producer side (CG, CGTarget/MCEmitter, opt) feeds events here as
+ * compilation runs; the consumer side writes .debug_* sections into the same
+ * ObjBuilder when debug_emit is called.
+ *
+ * Producer responsibilities:
+ * - Parser: nothing directly; types are looked up on demand from those that
+ *   reach debug_local / debug_param.
+ * - CG: function and scope lifecycle, parameter and local declarations.
+ * - MCEmitter (or the lowering pass inside opt at -O2): the line program, and
+ *   pc-range bounds for functions.
+ * - opt at -O2: location-list entries when a variable's location changes
+ *   across the optimized function.
*/
+
+typedef struct Debug Debug;
+
+Debug* debug_new(Compiler*, ObjBuilder*);
+void debug_free(Debug*);
+
+/* file table — SourceManager owns paths; returns DWARF file index */
+u32 debug_file(Debug*, u32 source_file_id);
+
+/* function lifecycle */
+void debug_func_begin (Debug*, ObjSymId, const Type* fn_type, SrcLoc decl);
+void debug_func_pc_range(Debug*, ObjSecId text_section_id, u32 begin_ofs, u32 end_ofs);
+void debug_func_end (Debug*);
+
+/* lexical scopes (nested between func_begin/end) */
+void debug_scope_begin(Debug*, SrcLoc);
+void debug_scope_end (Debug*, SrcLoc);
+
+/* variable location */
+typedef enum DebugVarLocKind {
+  DVL_FRAME,
+  DVL_REG,
+  DVL_GLOBAL,
+  DVL_LOCLIST, /* time-varying location, see debug_loclist_* */
+} DebugVarLocKind;
+
+typedef struct DebugVarLoc {
+  u8 kind;
+  u8 pad[3];
+  union {
+    i32 frame_ofs;
+    Reg reg;
+    ObjSymId global;
+    u32 loclist_id;
+  } v;
+} DebugVarLoc;
+
+void debug_param(Debug*, Sym name, const Type*, SrcLoc, u32 idx, DebugVarLoc);
+void debug_local(Debug*, Sym name, const Type*, SrcLoc, DebugVarLoc);
+
+/* line program */
+void debug_line(Debug*, ObjSecId text_section_id, u32 text_offset, SrcLoc, int is_stmt);
+
+/* location lists — for opt'd code where a variable moves between locations */
+u32 debug_loclist_new(Debug*);
+void debug_loclist_add(Debug*, u32 id, u32 begin_pc, u32 end_pc, DebugVarLoc);
+
+/* Emit the accumulated debug info as DWARF sections into the ObjBuilder.
+ * Must be called after all code sections are finalized but before the
+ * file emitters run. */
+void debug_emit(Debug*);
+
+#endif
diff --git a/src/decl/decl.h b/src/decl/decl.h
@@ -0,0 +1,91 @@
+#ifndef CFREE_DECL_H
+#define CFREE_DECL_H
+
+#include "../arch/arch.h"
+
+/* C declaration semantics. This layer is deliberately above ObjBuilder:
+ * ObjBuilder stores object-format facts, while DeclTable owns C linkage,
+ * storage duration, tentative-definition, static-local, and initializer rules.
*/
+typedef struct DeclTable DeclTable;
+
+typedef u32 DeclId;
+#define DECL_NONE 0u
+
+typedef enum DeclStorage {
+  DS_EXTERN,
+  DS_STATIC,
+  DS_AUTO,
+  DS_REGISTER,
+  DS_TYPEDEF,
+} DeclStorage;
+
+typedef enum DeclLinkage {
+  DL_NONE,
+  DL_INTERNAL,
+  DL_EXTERNAL,
+} DeclLinkage;
+
+typedef enum DeclFlag {
+  DF_NONE = 0,
+  DF_THREAD = 1u << 0,
+  DF_INLINE = 1u << 1,
+  DF_TENTATIVE = 1u << 2,
+  DF_USED = 1u << 3,
+  DF_WEAK = 1u << 4,
+  DF_STATIC_LOCAL = 1u << 5,
+} DeclFlag;
+
+typedef struct Decl {
+  DeclId id;
+  Sym name;
+  const Type* type;
+  ObjSymId obj_sym;
+  ObjSecId section_id; /* optional explicit section; OBJ_SEC_NONE => default */
+  SrcLoc loc;
+  u8 storage; /* DeclStorage */
+  u8 linkage; /* DeclLinkage */
+  u8 visibility; /* SymVis */
+  u8 pad;
+  u32 flags; /* DeclFlag */
+} Decl;
+
+typedef enum InitKind {
+  INIT_ZERO,
+  INIT_BYTES,
+  INIT_RELOC,
+  INIT_FILL,
+} InitKind;
+
+typedef struct InitReloc {
+  RelocKind kind;
+  ObjSymId target;
+  i64 addend;
+  u32 width;
+} InitReloc;
+
+typedef struct InitItem {
+  u32 offset; /* byte offset inside the initialized object */
+  u32 size;
+  u8 kind; /* InitKind */
+  u8 pad[3];
+  union {
+    ConstBytes bytes;
+    InitReloc reloc;
+    struct { u8 byte; } fill;
+  } v;
+} InitItem;
+
+DeclTable* decl_new(Compiler*, ObjBuilder*);
+void decl_free(DeclTable*);
+
+DeclId decl_declare(DeclTable*, const Decl*);
+const Decl* decl_get(const DeclTable*, DeclId);
+ObjSymId decl_obj_sym(const DeclTable*, DeclId);
+
+void decl_define_function(DeclTable*, DeclId, ObjSecId text_section_id,
+    u64 value, u64 size);
+void decl_define_object (DeclTable*, DeclId, u64 size, u32 align,
+    const InitItem* init, u32 ninit);
+void decl_define_tentative(DeclTable*, DeclId, u64 size, u32 align);
+
+#endif
diff --git a/src/driver/driver.h b/src/driver/driver.h
@@ -0,0 +1,23 @@
+#ifndef CFREE_DRIVER_H
+#define CFREE_DRIVER_H
+
+#include "../core/core.h"
+
+typedef enum Tool {
+  TOOL_CC,
+  TOOL_CPP,
+  TOOL_AS,
+  TOOL_LD,
+  TOOL_AR,
+  TOOL_OBJDUMP,
+  TOOL_DBG,
+} Tool;
+
+/* Multi-call entry: dispatches by argv[0] basename, falling back to argv[1]
+ * (e.g. `cfree cc ...`). */
+int driver_main(int argc, char** argv);
+
+/* Direct entry per tool. */
+int driver_run(Tool, int argc, char** argv);
+
+#endif
diff --git a/src/lex/lex.h b/src/lex/lex.h
@@ -0,0 +1,102 @@
+#ifndef CFREE_LEX_H
+#define CFREE_LEX_H
+
+#include "../core/core.h"
+
+typedef enum TokKind {
+  TOK_EOF = 0,
+  TOK_IDENT, /* v.ident */
+  TOK_NUM, /* lit */
+  TOK_FLT, /* lit */
+  TOK_STR, /* lit; v.str is decoded bytes if target-independent */
+  TOK_CHR, /* lit */
+  TOK_PUNCT, /* v.punct */
+  TOK_PP_HASH, /* # */
+  TOK_PP_PASTE, /* ## */
+  TOK_NEWLINE, /* visible to PP only */
+  TOK_KW_FIRST,
+  /* C11 keywords are inserted into this range by parse_c via pool */
+  TOK_KW_LAST = 0x1000,
+} TokKind;
+
+typedef enum TokFlag {
+  TF_AT_BOL = 1u << 0,
+  TF_HAS_SPACE = 1u << 1,
+  TF_NO_EXPAND = 1u << 2,
+  TF_INT_U = 1u << 3,
+  TF_INT_L = 1u << 4,
+  TF_INT_LL = 1u << 5,
+  TF_FLT_F = 1u << 6,
+  TF_FLT_L = 1u << 7,
+  TF_STR_WIDE = 1u << 8,
+  TF_STR_U8 = 1u << 9,
+  TF_STR_U16 = 1u << 10,
+  TF_STR_U32 = 1u << 11,
+  TF_LITERAL_BAD = 1u << 12,
+} TokFlag;
+
+typedef enum Punct {
+  P_NONE = 0,
+  /* Single-char punctuators reuse their ASCII codepoint here.
*/
+  P_ARROW = 256, P_INC, P_DEC,
+  P_SHL, P_SHR,
+  P_LE, P_GE, P_EQ, P_NE,
+  P_AND, P_OR,
+  P_ADD_ASSIGN, P_SUB_ASSIGN, P_MUL_ASSIGN, P_DIV_ASSIGN, P_MOD_ASSIGN,
+  P_AND_ASSIGN, P_OR_ASSIGN, P_XOR_ASSIGN, P_SHL_ASSIGN, P_SHR_ASSIGN,
+  P_ELLIPSIS,
+  P_HASH_HASH,
+} Punct;
+
+typedef u32 LitId;
+#define LIT_NONE 0u
+
+typedef enum LitKind {
+  LIT_INT,
+  LIT_FLOAT,
+  LIT_STRING,
+  LIT_CHAR,
+} LitKind;
+
+typedef enum LitEnc {
+  LENC_ORDINARY,
+  LENC_UTF8,
+  LENC_WIDE,
+  LENC_UTF16,
+  LENC_UTF32,
+} LitEnc;
+
+typedef struct LitInfo {
+  u8 kind; /* LitKind */
+  u8 enc; /* LitEnc for strings/chars */
+  u16 flags; /* TokFlag suffix/encoding bits */
+  Sym spelling; /* exact source spelling */
+  Sym bytes; /* decoded bytes/code units, if already decoded */
+} LitInfo;
+
+typedef struct Tok {
+  u16 kind;
+  u16 flags;
+  SrcLoc loc;
+  Sym spelling; /* exact token spelling for diagnostics/#/## */
+  LitId lit; /* literal-table handle; LIT_NONE otherwise */
+  union {
+    Sym ident;
+    Sym str;
+    u32 punct;
+  } v;
+} Tok;
+
+typedef struct Lexer Lexer;
+
+Lexer* lex_open(Compiler*, const char* path);
+Lexer* lex_open_mem(Compiler*, const char* name, const char* src, size_t len);
+void lex_close(Lexer*);
+
+/* Streaming. Returns TOK_EOF repeatedly at end of input.
*/
+Tok lex_next(Lexer*);
+SrcLoc lex_loc(const Lexer*);
+u32 lex_file_id(const Lexer*);
+const LitInfo* lex_lit(const Lexer*, LitId);
+
+#endif
diff --git a/src/link/link.h b/src/link/link.h
@@ -0,0 +1,132 @@
+#ifndef CFREE_LINK_H
+#define CFREE_LINK_H
+
+#include "../obj/obj.h"
+
+typedef struct Linker Linker;
+typedef struct LinkImage LinkImage;
+
+typedef enum LinkInputKind {
+  LINK_INPUT_OBJ,
+  LINK_INPUT_OBJ_FILE,
+  LINK_INPUT_ARCHIVE,
+} LinkInputKind;
+
+typedef u32 LinkInputId;
+#define LINK_INPUT_NONE 0u
+
+typedef u32 LinkSymId;
+#define LINK_SYM_NONE 0u
+
+typedef u32 LinkSegmentId;
+#define LINK_SEG_NONE 0u
+
+typedef u32 LinkSectionId;
+#define LINK_SEC_NONE 0u
+
+typedef struct LinkInput {
+  LinkInputId id;
+  u8 kind; /* LinkInputKind */
+  u8 pad[3];
+  ObjBuilder* obj; /* for LINK_INPUT_OBJ, otherwise NULL until read */
+  Sym path; /* for file/archive inputs */
+} LinkInput;
+
+typedef struct LinkSymbol {
+  LinkSymId id;
+  Sym name;
+  LinkInputId input_id;
+  ObjSymId obj_sym;
+  ObjSecId section_id;
+  u64 value;
+  u64 vaddr; /* final linked address, 0 for unresolved undef */
+  u64 size;
+  u8 bind; /* SymBind */
+  u8 kind; /* SymKind */
+  u8 defined;
+  u8 pad;
+} LinkSymbol;
+
+typedef struct LinkSegment {
+  LinkSegmentId id;
+  u32 flags; /* SecFlag-like permissions after layout */
+  u64 file_offset;
+  u64 vaddr;
+  u64 mem_size;
+  u64 file_size;
+  u32 align;
+  u32 nsections;
+} LinkSegment;
+
+typedef struct LinkSection {
+  LinkSectionId id;
+  LinkInputId input_id;
+  ObjSecId obj_section_id;
+  LinkSegmentId segment_id;
+  u64 input_offset;
+  u64 file_offset;
+  u64 vaddr;
+  u64 size;
+  u32 flags;
+  u32 align;
+} LinkSection;
+
+typedef struct LinkRelocApply {
+  LinkInputId input_id;
+  ObjSecId section_id;
+  LinkSectionId link_section_id;
+  u32 offset;
+  u32 width;
+  u64 write_vaddr;
+  u64 write_file_offset;
+  RelocKind kind;
+  LinkSymId target;
+  i64 addend;
+} LinkRelocApply;
+
+typedef void* (*LinkExternResolver)(void* user, Sym name);
+
+typedef struct JitImage {
+  LinkImage* image;
+  void* base;
+  size_t size;
+} JitImage;
+
+Linker* link_new(Compiler*);
+void link_free(Linker*);
+
+LinkInputId link_add_obj(Linker*, ObjBuilder*); /* fresh-compiled */
+LinkInputId link_add_obj_file(Linker*, const char* path); /* read .o from disk */
+LinkInputId link_add_archive(Linker*, const char* path); /* .a / static archive */
+void link_add_lib_search_path(Linker*, const char* dir);
+void link_set_entry(Linker*, Sym name);
+void link_set_script(Linker*, const char* path);
+void link_set_extern_resolver(Linker*, LinkExternResolver, void* user);
+
+/* Symbol resolution and layout are explicit so file linking and JIT share the
+ * same resolved image. Fatal diagnostics use Compiler.panic. */
+LinkImage* link_resolve(Linker*);
+void link_image_free(LinkImage*);
+const LinkSymbol* link_symbol(LinkImage*, LinkSymId);
+LinkSymId link_symbol_lookup(LinkImage*, Sym name);
+u32 link_segment_count(LinkImage*);
+const LinkSegment* link_segment_get(LinkImage*, u32 id);
+const u8* link_segment_bytes(LinkImage*, LinkSegmentId, size_t* size_out);
+u32 link_section_count(LinkImage*);
+const LinkSection* link_section_get(LinkImage*, LinkSectionId id);
+u32 link_reloc_apply_count(LinkImage*);
+const LinkRelocApply* link_reloc_apply_get(LinkImage*, u32 id);
+
+/* Writes an executable in the format implied by Compiler.target. */
+void link_emit_exe(Linker*, const char* out_path);
+void link_emit_image(LinkImage*, const char* out_path);
+
+/* JIT: maps sections into memory, applies relocations, returns the address of
+ * the entry symbol (or any named symbol via link_jit_lookup).
*/
+void* link_jit(Linker*);
+JitImage* link_jit_image(LinkImage*);
+void* link_jit_lookup(Linker*, Sym name);
+void* jit_image_lookup(JitImage*, Sym name);
+void jit_image_free(JitImage*);
+
+#endif
diff --git a/src/obj/obj.h b/src/obj/obj.h
@@ -0,0 +1,239 @@
+#ifndef CFREE_OBJ_H
+#define CFREE_OBJ_H
+
+#include "../core/core.h"
+#include "../core/buf.h"
+
+typedef enum SecKind {
+  SEC_TEXT,
+  SEC_RODATA,
+  SEC_DATA,
+  SEC_BSS,
+  SEC_DEBUG,
+  SEC_OTHER,
+} SecKind;
+
+typedef enum SecFlag {
+  SF_EXEC = 1u << 0,
+  SF_WRITE = 1u << 1,
+  SF_ALLOC = 1u << 2,
+  SF_TLS = 1u << 3,
+  SF_MERGE = 1u << 4,
+  SF_STRINGS = 1u << 5,
+  SF_GROUP = 1u << 6,
+  SF_LINK_ORDER= 1u << 7,
+} SecFlag;
+
+typedef enum SecSem {
+  SSEM_PROGBITS,
+  SSEM_NOBITS,
+  SSEM_SYMTAB,
+  SSEM_STRTAB,
+  SSEM_RELA,
+  SSEM_REL,
+  SSEM_NOTE,
+  SSEM_INIT_ARRAY,
+  SSEM_FINI_ARRAY,
+  SSEM_PREINIT_ARRAY,
+  SSEM_GROUP,
+  SSEM_WASM_CUSTOM,
+} SecSem;
+
+typedef enum SymBind {
+  SB_LOCAL,
+  SB_GLOBAL,
+  SB_WEAK,
+} SymBind;
+
+typedef enum SymVis {
+  SV_DEFAULT,
+  SV_HIDDEN,
+  SV_PROTECTED,
+  SV_INTERNAL,
+} SymVis;
+
+typedef enum SymKind {
+  SK_UNDEF,
+  SK_FUNC,
+  SK_OBJ,
+  SK_SECTION,
+  SK_FILE,
+  SK_COMMON,
+  SK_TLS,
+  SK_ABS,
+} SymKind;
+
+typedef enum ObjExtKind {
+  OBJ_EXT_NONE,
+  OBJ_EXT_ELF,
+  OBJ_EXT_COFF,
+  OBJ_EXT_MACHO,
+  OBJ_EXT_WASM,
+} ObjExtKind;
+
+typedef u32 ObjSecId;
+#define OBJ_SEC_NONE 0u
+
+typedef u32 ObjGroupId;
+#define OBJ_GROUP_NONE 0u
+
+/* Per-ObjBuilder symbol handle. Object files own their symbol namespace:
+ * local/static symbols, section symbols, file symbols, unnamed labels, common
+ * definitions, and external references are all represented by ObjSymId values
+ * scoped to one builder. 0 is reserved as "none".
*/
+typedef u32 ObjSymId;
+#define OBJ_SYM_NONE 0u
+
+typedef enum RelocKind {
+  R_NONE = 0,
+  R_ABS32, R_ABS64,
+  R_REL32, R_REL64,
+  R_PC32, R_PC64,
+  R_GOT32, R_PLT32,
+  R_ARM_CALL, R_ARM_MOVW, R_ARM_MOVT, R_ARM_B26,
+  R_AARCH64_CALL26, R_AARCH64_ADR_PREL_PG_HI21, R_AARCH64_ADD_ABS_LO12_NC,
+  R_RV_HI20, R_RV_LO12_I, R_RV_LO12_S, R_RV_BRANCH, R_RV_JAL, R_RV_CALL,
+  R_WASM_FUNCIDX, R_WASM_TABLEIDX, R_WASM_MEMOFS, R_WASM_TYPEIDX,
+} RelocKind;
+
+typedef struct Section {
+  Sym name;
+  u16 kind;
+  u16 flags;
+  u16 sem; /* SecSem */
+  u16 ext_kind; /* ObjExtKind */
+  u32 align;
+  u32 entsize;
+  ObjSecId link; /* section index or OBJ_SEC_NONE */
+  u32 info; /* section-format dependent, typed by sem/ext_kind */
+  ObjGroupId group_id; /* OBJ_GROUP_NONE if not in a COMDAT/group */
+  u32 bss_size; /* nonzero only for SEC_BSS */
+  Buf bytes;
+} Section;
+
+typedef struct Reloc {
+  ObjSecId section_id;
+  u32 offset;
+  u16 kind;
+  u8 has_explicit_addend;
+  u8 pair; /* paired/following relocation, format-specific */
+  ObjSymId sym;
+  i64 addend;
+} Reloc;
+
+typedef struct ObjSym {
+  Sym name;
+  u16 bind;
+  u16 kind;
+  u8 vis;
+  u8 ext_kind;
+  u16 flags;
+  ObjSecId section_id; /* OBJ_SEC_NONE if undef */
+  u64 value; /* offset within section, or absolute */
+  u64 size;
+  u64 common_align; /* nonzero for SK_COMMON */
+} ObjSym;
+
+typedef struct ObjGroup {
+  Sym name;
+  ObjSymId signature;
+  ObjSecId* sections;
+  u32 nsections;
+  u32 flags;
+} ObjGroup;
+
+/* The single concrete in-memory object representation.
+ * Written by MCEmitter/CGTarget (during compile) or by an .o reader (during link).
+ * Read by file emitters, the linker (file and JIT), and objdump.
+ *
+ * Invariant: post-finalize state is identical in shape to what an .o reader
+ * would produce from a written-out object — so consumers don't care which
+ * path produced it.
+ *
+ * Lifecycle gates:
+ * 1. MCEmitter/CGTarget (or a .o reader) issues writes.
+ * 2.
cgtarget_finalize must be called before any debug_emit or read access on
+ *    the builder. At -O2 it flushes lowered code into sections.
+ * 3. debug_emit (if -g) writes .debug_* sections.
+ * 4. obj_finalize closes the builder: computes flat section offsets, applies
+ *    pending fixups within sections, and freezes the read-side view.
+ *    No further writes are permitted afterward.
+ * 5. File emitters and the linker consume via the read API. */
+typedef struct ObjBuilder ObjBuilder;
+
+ObjBuilder* obj_new(Compiler*);
+void obj_free(ObjBuilder*);
+
+/* ---- write side (MCEmitter/CGTarget and .o readers) ---- */
+ObjSecId obj_section(ObjBuilder*, Sym name, SecKind, u16 flags, u32 align);
+ObjSecId obj_section_ex(ObjBuilder*, Sym name, SecKind, SecSem, u16 flags,
+    u32 align, u32 entsize, u32 link, u32 info);
+void obj_section_set_flags(ObjBuilder*, ObjSecId, u16 flags);
+void obj_section_set_align(ObjBuilder*, ObjSecId, u32 align);
+void obj_section_set_group(ObjBuilder*, ObjSecId, ObjGroupId);
+void obj_write (ObjBuilder*, ObjSecId section_id, const void* data, size_t n);
+u8* obj_reserve(ObjBuilder*, ObjSecId section_id, size_t n);
+void obj_reserve_bss(ObjBuilder*, ObjSecId section_id, u32 size, u32 align);
+u32 obj_pos (ObjBuilder*, ObjSecId section_id);
+void obj_patch (ObjBuilder*, ObjSecId section_id, u32 ofs, const void* data, size_t n);
+
+ObjSymId obj_symbol(ObjBuilder*, Sym name, SymBind, SymKind,
+    ObjSecId section_id, u64 value, u64 size);
+ObjSymId obj_symbol_ex(ObjBuilder*, Sym name, SymBind, SymVis, SymKind,
+    ObjSecId section_id, u64 value, u64 size, u64 common_align);
+void obj_symbol_define(ObjBuilder*, ObjSymId, ObjSecId section_id, u64 value, u64 size);
+
+void obj_reloc(ObjBuilder*, ObjSecId section_id, u32 offset,
+    RelocKind, ObjSymId sym, i64 addend);
+void obj_reloc_ex(ObjBuilder*, ObjSecId section_id, u32 offset, RelocKind,
+    ObjSymId sym, i64 addend, int explicit_addend, int pair);
+
+ObjGroupId obj_group(ObjBuilder*, Sym name, ObjSymId signature, u32 flags);
+void obj_group_add_section(ObjBuilder*, ObjGroupId group_id, ObjSecId section_id);
+
+void obj_finalize(ObjBuilder*);
+
+/* ---- read side (linker, file emitters, objdump) ---- */
+u32 obj_section_count(const ObjBuilder*);
+const Section* obj_section_get (const ObjBuilder*, ObjSecId id);
+u32 obj_reloc_count (const ObjBuilder*, ObjSecId section_id);
+const Reloc* obj_relocs (const ObjBuilder*, ObjSecId section_id);
+const ObjSym* obj_symbol_get (const ObjBuilder*, ObjSymId);
+u32 obj_group_count (const ObjBuilder*);
+const ObjGroup* obj_group_get (const ObjBuilder*, ObjGroupId id);
+
+/* Symbol iteration: ObjSymId is scoped to this builder, but callers should not
+ * assume dense contiguous ids or direct indexing. The builder may store symbols
+ * in segments internally; use the cursor. */
+typedef struct ObjSymIter ObjSymIter;
+typedef struct ObjSymEntry {
+  ObjSymId id;
+  const ObjSym* sym;
+} ObjSymEntry;
+ObjSymIter* obj_symiter_new (const ObjBuilder*);
+int obj_symiter_next(ObjSymIter*, ObjSymEntry* out); /* returns 0 at end */
+void obj_symiter_free(ObjSymIter*);
+
+/* ---- streaming output sink (for file emitters) ---- */
+typedef struct Writer Writer;
+Writer* writer_file(const char* path);
+Writer* writer_mem (Heap*);
+void writer_write(Writer*, const void* data, size_t n);
+void writer_seek (Writer*, u64 offset);
+u64 writer_tell (Writer*);
+int writer_error(Writer*);
+void writer_close(Writer*);
+
+/* ---- file format emitters ---- */
+void emit_elf (Compiler*, ObjBuilder*, Writer*);
+void emit_coff (Compiler*, ObjBuilder*, Writer*);
+void emit_macho(Compiler*, ObjBuilder*, Writer*);
+void emit_wasm (Compiler*, ObjBuilder*, Writer*);
+
+/* ---- file format readers (for ld and objdump) ---- */
+ObjBuilder* read_elf (Compiler*, const char* path);
+ObjBuilder* read_coff (Compiler*, const char* path);
+ObjBuilder* read_macho(Compiler*, const char* path);
+ObjBuilder* read_wasm (Compiler*, const char* path);
+
+#endif
diff --git a/src/opt/ir.h b/src/opt/ir.h
@@ -0,0 +1,181 @@
+#ifndef CFREE_IR_H
+#define CFREE_IR_H
+
+#include "../core/core.h"
+#include "../core/arena.h"
+#include "../arch/arch.h"
+#include "../type/type.h"
+
+typedef u32 Val;
+#define VAL_NONE 0u
+
+typedef enum IROp {
+  IR_NOP,
+  IR_CONST_I, IR_CONST_BYTES,
+  IR_PARAM,
+  IR_ALLOCA,
+  IR_LOAD, IR_STORE,
+  IR_AGG_COPY, IR_AGG_SET,
+  IR_BITFIELD_LOAD, IR_BITFIELD_STORE,
+  IR_IADD, IR_ISUB, IR_IMUL,
+  IR_SDIV, IR_UDIV, IR_SREM, IR_UREM,
+  IR_FADD, IR_FSUB, IR_FMUL, IR_FDIV,
+  IR_AND, IR_OR, IR_XOR,
+  IR_SHL, IR_ASHR, IR_LSHR,
+  IR_NEG, IR_BNOT,
+  IR_CMP_EQ, IR_CMP_NE,
+  IR_CMP_SLT, IR_CMP_SLE, IR_CMP_ULT, IR_CMP_ULE,
+  IR_CMP_FLT, IR_CMP_FLE, IR_CMP_FEQ, IR_CMP_FNE,
+  IR_SEXT, IR_ZEXT, IR_TRUNC, IR_BITCAST,
+  IR_SITOFP, IR_UITOFP, IR_FPTOSI, IR_FPTOUI, IR_FPEXT, IR_FPTRUNC,
+  IR_GEP,
+  IR_CALL,
+  IR_PHI,
+  IR_BR, IR_CONDBR, IR_RET,
+  IR_ATOMIC_LOAD, IR_ATOMIC_STORE,
+  IR_ATOMIC_RMW, /* extra.imm encodes (AtomicOp << 8) | MemOrder */
+  IR_ATOMIC_CAS, /* extra.imm encodes (success << 8) | failure */
+  IR_FENCE, /* extra.imm = MemOrder */
+  IR_VA_START, IR_VA_ARG, IR_VA_END, IR_VA_COPY,
+  IR_SETJMP, /* returns-twice; opt treats as control barrier */
+  IR_LONGJMP, /* terminator-like; control does not return */
+  IR_ASM_BLOCK, /* opaque to most passes; preserves order, defines outs, clobbers */
+} IROp;
+
+typedef struct IRCallAux {
+  const Type* fn_type;
+  const ABIFuncInfo* abi;
+  ObjSymId direct_sym; /* OBJ_SYM_NONE for indirect */
+  Val callee; /* VAL_NONE for direct_sym calls */
+  u32 nargs;
+  Val* args;
+  u32 nresults;
+  Val* results; /* ABI return parts and multi-result builtins */
+  CGABIValue ret_abi;
+} IRCallAux;
+
+typedef struct IRFrameSlot {
+  FrameSlot id;
+  const Type* type;
+  Sym name;
+  SrcLoc loc;
+  u32 size;
+  u32 align;
+  u8 kind; /* FrameSlotKind */
+  u8 pad;
+  u16 flags; /* FrameSlotFlag */
+} IRFrameSlot;
+
+typedef struct IRParam {
+  u32 index;
+  Sym name;
+  const Type* type;
+  FrameSlot slot;
+  const ABIArgInfo* abi;
+  SrcLoc loc;
+} IRParam;
+
+typedef struct IRMemAux {
+  MemAccess mem;
+} IRMemAux;
+
+typedef struct IRAggregateAux {
+  AggregateAccess access;
+} IRAggregateAux;
+
+typedef struct IRBitFieldAux {
+  BitFieldAccess access;
+} IRBitFieldAux;
+
+typedef struct IRGepAux {
+  const Type* base_type;
+  u32 nindices;
+  i64* indices;
+} IRGepAux;
+
+typedef struct IRPhiAux {
+  u32 npreds;
+  u32* pred_blocks;
+  Val* pred_vals;
+} IRPhiAux;
+
+typedef struct IRAsmAux {
+  const char* tmpl;
+  const AsmConstraint* outs;
+  const AsmConstraint* ins;
+  const Sym* clobbers;
+  u32 nout, nin, nclob;
+} IRAsmAux;
+
+typedef struct IRCasAux {
+  MemAccess mem;
+  MemOrder success;
+  MemOrder failure;
+  Val prior;
+  Val ok;
+} IRCasAux;
+
+typedef struct Inst {
+  u16 op;
+  u16 flags; /* per-op flags (e.g. nsw/nuw, volatile) */
+  SrcLoc loc; /* set from CGTarget.set_loc when this insn was recorded */
+  const Type* type;
+  Val def; /* this instruction's SSA value, or VAL_NONE */
+  u32 ndefs; /* multi-result instructions use defs[0..ndefs) */
+  Val* defs; /* arena-allocated; NULL when ndefs <= 1 */
+  u32 nopnds;
+  Val* opnds; /* arena-allocated */
+  union {
+    i64 imm;
+    ConstBytes cbytes;
+    struct { ObjSymId sym; } objsym;
+    MemAccess mem;
+    void* aux; /* one of IR*Aux, arena-owned and typed by op */
+  } extra;
+} Inst;
+
+typedef struct Block {
+  u32 id;
+  Inst* insts;
+  u32 ninsts, cap;
+  u32* preds;
+  u32 npreds;
+  u32 succ[2]; /* condbr: 2; br: 1; ret: 0 */
+  u8 nsucc;
+} Block;
+
+typedef struct Func {
+  /* IR storage. Lives until cgtarget_finalize so inter-procedural passes can
+   * read every Func in the TU. Per-pass scratch goes in Arena scratch, not
+   * here.
*/
+  Arena* arena;
+  ObjSymId name;
+  const Type* type;
+  Block* blocks;
+  u32 nblocks, blocks_cap;
+  u32 entry; /* index of entry block */
+
+  IRFrameSlot* frame_slots;
+  u32 nframe_slots, frame_slots_cap;
+  IRParam* params;
+  u32 nparams, params_cap;
+
+  /* Value table: for each Val, where it's defined and its type. */
+  u32* val_def_block;
+  u32* val_def_inst;
+  const Type** val_type;
+  u32 nvals, vals_cap;
+} Func;
+
+Func* ir_func_new(Arena*, ObjSymId, const Type* fn_type);
+u32 ir_block_new(Func*);
+Val ir_emit(Func*, u32 block, IROp, const Type* result, const Val* opnds, u32 n);
+void ir_emit_multi(Func*, u32 block, IROp, const Type** results, Val* defs,
+    u32 ndefs, const Val* opnds, u32 nopnds);
+Val ir_emit_const_i(Func*, u32 block, const Type*, i64);
+Val ir_emit_const_bytes(Func*, u32 block, ConstBytes);
+FrameSlot ir_frame_slot_new(Func*, const FrameSlotDesc*);
+void ir_param_add(Func*, const CGParamDesc*);
+void ir_set_terminator(Func*, u32 block, IROp, u32 succ_a, u32 succ_b, Val cond);
+
+#endif
diff --git a/src/opt/opt.h b/src/opt/opt.h
@@ -0,0 +1,78 @@
+#ifndef CFREE_OPT_H
+#define CFREE_OPT_H
+
+#include "../arch/arch.h"
+#include "ir.h"
+
+/* opt_cgtarget: a CGTarget wrapper that records each function as IR.
+ *
+ * - alloc_reg returns a fresh virtual reg per call (typed). The Reg space is
+ *   unbounded for opt_cgtarget; free_reg is treated as a hint and ignored.
+ * - Every other emit-side call is recorded into the current block as one
+ *   SSA Inst (with the current SrcLoc from set_loc).
+ * - On CGTarget.func_end it runs the intra-procedural pipeline (down through
+ *   jump_opt) and stores the optimized Func in a per-TU set.
+ * - On CGTarget.finalize it runs inter-procedural passes (inlining + cleanup),
+ *   then for each Func runs machinize → live → coalesce → RA → combine →
+ *   DCE → prolog/epilog → translate, driving the wrapped target CGTarget.
+ *
+ * No machine code is in `obj` until the driver calls cgtarget_finalize.
+ * Drivers must call it before reading `obj` or invoking debug_emit. + * + * Owns `target` and frees it via cgtarget_free(target) on its own destroy. + * + * level: + * 0 — caller should not use opt_cgtarget at all (drive target directly). + * 1 — minimal: combine + DCE during lowering. No SSA passes. No inlining. + * 2 — full pipeline below. Inlining enabled. */ +CGTarget* opt_cgtarget_new(Compiler*, CGTarget* target, int level); + +/* ----- intra-procedural passes (run per Func at func_end on -O2) ----- */ +void opt_build_cfg (Func*); +void opt_block_cloning (Func*); +void opt_build_ssa (Func*); +void opt_addr_xform (Func*); +void opt_gvn (Func*); /* incl. constprop, redundant-load elim */ +void opt_copy_prop (Func*); /* incl. redundant-extension elim */ +void opt_dse (Func*); /* dead store elimination */ +void opt_ssa_dce (Func*); +void opt_licm (Func*); /* requires loop tree built */ +void opt_pressure_relief (Func*); +void opt_make_conventional_ssa(Func*); +void opt_ssa_combine (Func*); +void opt_undo_ssa (Func*); +void opt_jump_opt (Func*); + +/* ----- inter-procedural passes (run on the whole Func set at finalize) ----- */ +typedef struct FuncSet FuncSet; + +/* Walks the call graph bottom-up. For each caller, inlines callees that fit + * the size/heuristic budget, marks the caller dirty, and queues it for + * opt_cleanup. SCCs (mutual recursion) are skipped for v1. + * + * Iteration count is bounded by `max_iters` (driver knob `-finline-iters=N`, + * default 1; cap is enforced by opt_cgtarget). */ +void opt_inline(FuncSet*, int max_iters); + +/* Cheap re-run of the intra-procedural pipeline tailored to "what inlining + * exposes": constfold, copy_prop, gvn, ssa_dce, jump_opt, licm if the + * function has loops, addr_xform if any GEP-equivalents reach uses. Run on + * each Func that opt_inline marked dirty. 
*/ +void opt_cleanup(Func*); + +/* ----- lowering / backend prep (per Func, run before driving target CGTarget) ----- */ +/* Machine-dependent ABI lowering, 2-op insns, etc. Implemented per-arch and + * per-OS, so it takes the full Target. */ +void opt_machinize (Func*, Target); +void opt_live_info (Func*); +void opt_coalesce (Func*); +void opt_regalloc (Func*, int allow_live_range_split); +void opt_combine (Func*); /* code selection: merge dependent insns */ +void opt_dce (Func*); /* post-RA DCE */ + +/* Walks the lowered IR and drives a target CGTarget to emit machine code into its + * ObjBuilder. Inserts prolog/epilog. Splits long insns where the target needs. + * Stamps each emitted insn's SrcLoc onto target via CGTarget.set_loc. */ +void opt_emit (Compiler*, Func*, CGTarget* target); + +#endif diff --git a/src/parse/parse.h b/src/parse/parse.h @@ -0,0 +1,16 @@ +#ifndef CFREE_PARSE_H +#define CFREE_PARSE_H + +#include "../pp/pp.h" +#include "../decl/decl.h" +#include "../cg/cg.h" +#include "../arch/arch.h" + +/* C11 frontend. Reads tokens from `pp`, records C declarations in DeclTable, + * and drives `cg` for executable code. */ +void parse_c(Compiler*, Pp*, DeclTable*, CG*); + +/* Standalone assembler. Reads tokens directly from a Lexer; emits via MCEmitter. */ +void parse_asm(Compiler*, Lexer*, MCEmitter*); + +#endif diff --git a/src/pp/pp.h b/src/pp/pp.h @@ -0,0 +1,23 @@ +#ifndef CFREE_PP_H +#define CFREE_PP_H + +#include "../lex/lex.h" + +typedef struct Pp Pp; + +Pp* pp_new(Compiler*); +void pp_free(Pp*); + +void pp_add_include_dir(Pp*, const char* dir, int system); +void pp_define(Pp*, const char* name, const char* body); /* -D */ +void pp_undef(Pp*, const char* name); /* -U */ + +void pp_push_input(Pp*, Lexer*); +void pp_add_include_edge(Pp*, u32 includer_file_id, u32 included_file_id, + SrcLoc include_loc, int system); + +/* Streaming. Yields preprocessed tokens (macro-expanded, directives consumed). 
*/ +Tok pp_next(Pp*); +const LitInfo* pp_lit(const Pp*, LitId); + +#endif diff --git a/src/type/type.h b/src/type/type.h @@ -0,0 +1,115 @@ +#ifndef CFREE_TYPE_H +#define CFREE_TYPE_H + +#include "../core/core.h" +#include "../core/pool.h" + +typedef enum TypeKind { + TY_VOID, + TY_BOOL, + TY_CHAR, TY_SCHAR, TY_UCHAR, + TY_SHORT, TY_USHORT, + TY_INT, TY_UINT, + TY_LONG, TY_ULONG, + TY_LLONG, TY_ULLONG, + TY_FLOAT, TY_DOUBLE, TY_LDOUBLE, + TY_PTR, + TY_ARRAY, + TY_FUNC, + TY_STRUCT, + TY_UNION, + TY_ENUM, +} TypeKind; + +/* C tag identity is scoped declaration identity, not the spelling. `Sym tag` + * remains the diagnostic/debug name; TagId prevents two scoped `struct S` + * declarations from collapsing under global Type interning. */ +typedef u32 TagId; +#define TAG_NONE 0u + +typedef enum TagDeclKind { + TAG_STRUCT, + TAG_UNION, + TAG_ENUM, +} TagDeclKind; + +typedef struct TagDecl { + TagId id; + Sym spelling; + SrcLoc loc; + u8 kind; /* TagDeclKind */ + u8 complete; + u16 pad; +} TagDecl; + +typedef enum TypeQual { + Q_CONST = 1u << 0, + Q_VOLATILE = 1u << 1, + Q_RESTRICT = 1u << 2, + Q_ATOMIC = 1u << 3, +} TypeQual; + +typedef enum FieldFlag { + FIELD_NONE = 0, + FIELD_BITFIELD = 1u << 0, + FIELD_ZERO_WIDTH = 1u << 1, + FIELD_ANON = 1u << 2, + FIELD_FLEXIBLE_ARRAY = 1u << 3, +} FieldFlag; + +typedef struct Field { + Sym name; + const Type* type; + u16 bitfield_width; /* valid when FIELD_BITFIELD is set; may be 0 */ + u16 flags; /* FieldFlag */ +} Field; + +struct Type { + u16 kind; + u16 qual; + union { + struct { const Type* pointee; } ptr; + struct { const Type* elem; u32 count; u8 incomplete; } arr; + struct { + const Type* ret; + const Type** params; + u16 nparams; + u8 variadic; + } fn; + struct { + TagId tag_id; + Sym tag; + const Field* fields; + u16 nfields; + u8 incomplete; + } rec; /* struct / union */ + struct { TagId tag_id; Sym tag; const Type* base; } enm; + }; +}; + +const Type* type_void(Pool*); +const Type* type_prim(Pool*, TypeKind); +const 
Type* type_ptr(Pool*, const Type*); +const Type* type_array(Pool*, const Type* elem, u32 count, int incomplete); +const Type* type_func(Pool*, const Type* ret, const Type** params, u16 n, int variadic); +const Type* type_qualified(Pool*, const Type*, u16 qual); + +/* Aggregate construction is mutable only through TypeRecordBuilder. The + * committed Type is immutable and interned; field offsets, record + * size/alignment, and bitfield storage are target ABI facts. */ +typedef struct TypeRecordBuilder TypeRecordBuilder; +TagId type_tag_new(Pool*, TagDeclKind, Sym spelling, SrcLoc); +const TagDecl* type_tag_get(Pool*, TagId); +TypeRecordBuilder* type_record_begin(Pool*, TypeKind kind, TagId, Sym tag); /* TY_STRUCT or TY_UNION */ +void type_record_field(TypeRecordBuilder*, Field); +const Type* type_record_end(Pool*, TypeRecordBuilder*); +const Type* type_enum(Pool*, TagId, Sym tag, const Type* base); + +const Type* type_unqual(Pool*, const Type*); +const Type* type_promoted(Pool*, const Type*); +int type_compatible(const Type*, const Type*); +int type_is_arith(const Type*); +int type_is_int(const Type*); +int type_is_ptr(const Type*); + +#endif diff --git a/test/smoke.c b/test/smoke.c @@ -34,7 +34,7 @@ #include <stdarg.h> #include <stdatomic.h> #include <stdbool.h> -#include <stdcoro.h> +#include <cfree/coro.h> #include <stddef.h> #include <stdint.h> #include <stdnoreturn.h> @@ -145,7 +145,7 @@ static int cfree_setjmp_compiles(int x) { return 0; } -/* stdcoro: coro_ctx and coro_t storage exists; the asymmetric API +/* cfree/coro: coro_ctx and coro_t storage exists; the asymmetric API surface compiles and resolves. Compile-only -- smoke.c never links against a libcfree_rt. */ _Static_assert(sizeof(coro_ctx) >= 64, "coro_ctx room for regs");
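The compile-only check in the smoke.c hunk above can be illustrated with a small standalone sketch. Everything named `demo_*` here is hypothetical and stands in for the real `coro_ctx` API, whose layout and functions are not shown in this diff. The sketch assumes a context of eight 64-bit register slots, and uses `sizeof` on a call expression so the call is type-checked without being evaluated or needing a definition at link time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a saved-register context; not the real coro_ctx. */
typedef struct demo_ctx {
  uint64_t regs[8]; /* assumed layout: eight 64-bit slots */
} demo_ctx;

/* Declared but deliberately never defined, mirroring a header whose
 * runtime library is not linked into the smoke test. */
int demo_switch(demo_ctx *from, demo_ctx *to);

/* Storage check, in the style of the assertion in smoke.c. */
_Static_assert(sizeof(demo_ctx) >= 64, "demo_ctx room for regs");

/* sizeof does not evaluate its operand, so the call below is type-checked
 * against the prototype but emits no reference to demo_switch. */
static size_t demo_api_compiles(void) {
  return sizeof(demo_switch((demo_ctx *)0, (demo_ctx *)0));
}

/* Runtime wrapper so the check can also be exercised from a test. */
static int demo_smoke(void) {
  assert(demo_api_compiles() == sizeof(int));
  return 0;
}
```

Because `demo_switch` only appears inside `sizeof`, this translation unit should link even though the function has no definition, which is the property a compile-only smoke test relies on.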