kit

git clone https://git.ryansepassi.com/git/kit.git

commit 6e1392b32a02086c654ad381990a08cd3205f96f
parent dd8f6b240bea8e165edc47804ba5177f0938abbd
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Sat,  9 May 2026 10:53:50 -0700

emu: implement cfree_emu_run/new/step/lookup/free with stubbed deps

Lifts the emu surface from panic-stubs into a real translate/dispatch
loop in src/emu/. cfree_emu_lookup runs the full cold-miss pipeline
(decode -> lift -> CG -> MC -> link_resolve_extend -> commit RX -> cache),
cfree_emu_step runs translated blocks under a panic boundary, and
cfree_emu_run is the one-shot wrapper. Per-ISA decode/lift, CPUState
type synthesis, and ELF loading are staged as their own files behind
emu.h and stubbed for now; the runtime's code cache and reserved-VA
region are real. doc/EMU.md (already drafted in-tree) documents the
design this implementation follows.
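
The translate/dispatch loop described above can be sketched as a toy model. This is a minimal illustration, not the code in src/emu/: every `toy_*` / `Toy*` name is hypothetical, and the "translation" step merely selects a precompiled C function where the real pipeline would decode, lift, and JIT guest bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these are the kit/cfree types. */
typedef struct ToyCpu { uint64_t pc; int done; int exit_code; } ToyCpu;
typedef uint64_t (*ToyBlockFn)(ToyCpu*); /* block ABI: returns next guest PC */

/* Two fake "translated blocks". A real emulator would JIT these. */
static uint64_t toy_block_a(ToyCpu* cpu) { (void)cpu; return 0x2000; }
static uint64_t toy_block_b(ToyCpu* cpu)
{
    cpu->done = 1;       /* models a guest exit trap */
    cpu->exit_code = 7;
    return 0;
}

#define TOY_CACHE_SLOTS 16u
typedef struct ToyCache {
    uint64_t   pc[TOY_CACHE_SLOTS];
    ToyBlockFn fn[TOY_CACHE_SLOTS];
    unsigned   n;
} ToyCache;

/* Cache hit path: linear scan is enough for a sketch. */
static ToyBlockFn toy_cache_lookup(const ToyCache* c, uint64_t pc)
{
    for (unsigned i = 0; i < c->n; ++i)
        if (c->pc[i] == pc) return c->fn[i];
    return NULL;
}

/* Cold-miss path: stands in for decode -> lift -> codegen -> link -> cache. */
static ToyBlockFn toy_translate(ToyCache* c, uint64_t pc)
{
    ToyBlockFn fn = NULL;
    if (pc == 0x1000) fn = toy_block_a;
    else if (pc == 0x2000) fn = toy_block_b;
    if (!fn || c->n == TOY_CACHE_SLOTS) return NULL;
    c->pc[c->n] = pc;
    c->fn[c->n] = fn;
    c->n++;
    return fn;
}

/* Translate-if-cold lookup. */
ToyBlockFn toy_lookup(ToyCache* c, uint64_t pc)
{
    ToyBlockFn fn = toy_cache_lookup(c, pc);
    return fn ? fn : toy_translate(c, pc);
}

/* Run at most nblocks blocks; 0 on success, 1 if a block cannot be translated. */
int toy_step(ToyCpu* cpu, ToyCache* c, uint32_t nblocks)
{
    assert(cpu && c);
    for (uint32_t i = 0; i < nblocks && !cpu->done; ++i) {
        ToyBlockFn fn = toy_lookup(c, cpu->pc);
        if (!fn) return 1;
        cpu->pc = fn(cpu);
    }
    return 0;
}

/* One-shot wrapper: step until the guest signals exit. */
int toy_run(uint64_t entry_pc, int* out_exit_code)
{
    ToyCpu cpu = { entry_pc, 0, 0 };
    ToyCache cache = { {0}, {0}, 0 };
    while (!cpu.done)
        if (toy_step(&cpu, &cache, 1024) != 0) return 1;
    if (out_exit_code) *out_exit_code = cpu.exit_code;
    return 0;
}
```

Calling `toy_run(0x1000, &code)` walks both fake blocks and reports the exit code set by the second one; an entry PC with no block behind it makes `toy_run` return 1, loosely mirroring the "failed to translate block" path.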

Adds link_resolve_at / link_resolve_extend to link/ as the single
linker extension the per-block JIT needs (doc/EMU.md §6); both panic
until the incremental layout pass lands.
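
The placement rule the two entry points are meant to follow (append at the next free offset, never move what is already placed) can be modelled with plain address arithmetic. This is a hedged sketch only: `ToyImage`, `toy_resolve_at`, and `toy_resolve_extend` are hypothetical names, no memory is mapped, and the real functions still panic as noted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SECTIONS 8u

/* Hypothetical model of an image laid out inside a reserved VA range. */
typedef struct ToyImage {
    uintptr_t base;                    /* start of the reserved range */
    size_t    reserved;                /* bytes reserved */
    size_t    used;                    /* bump pointer, offset from base */
    uintptr_t vaddr[TOY_MAX_SECTIONS]; /* addresses of placed sections */
    unsigned  nsections;
} ToyImage;

/* First call: start an empty layout at a caller-chosen base address. */
void toy_resolve_at(ToyImage* img, uintptr_t base, size_t reserved)
{
    img->base = base;
    img->reserved = reserved;
    img->used = 0;
    img->nsections = 0;
}

/* Later calls: place one new section at the next aligned free offset.
 * Returns its address, or 0 if the region or the section table is full.
 * align must be a power of two; base is assumed to be aligned already.
 * Earlier entries in vaddr[] are never modified. */
uintptr_t toy_resolve_extend(ToyImage* img, size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    size_t start = (img->used + (align - 1)) & ~(align - 1);
    if (img->nsections == TOY_MAX_SECTIONS) return 0;
    if (start > img->reserved || size > img->reserved - start) return 0;
    img->vaddr[img->nsections++] = img->base + start;
    img->used = start + size;
    return img->base + start;
}
```

Because a failed extend returns before touching `vaddr[]` or `used`, addresses handed out earlier stay valid, which is the property that patched jumps between blocks would rely on.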

Build fixes carried alongside:
- driver/objdump.c: handle CFREE_SK_NOTYPE in sym_kind_char (-Werror).
- src/emu/cpu.c: emu_cpu_type / emu_block_fn_type return NULL until
  the type subsystem ships type_void / type_func.

Diffstat:
A doc/EMU.md         | 355 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M driver/objdump.c   |   1 +
M src/api/stubs.c    |  19 ++++---------------
A src/emu/cpu.c      |  89 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/emu/decode.c   |  25 +++++++++++++++++++++++++
A src/emu/elf_load.c |  47 +++++++++++++++++++++++++++++++++++++++++++++++
A src/emu/emu.c      | 371 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/emu/emu.h      | 191 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/emu/lift.c     |  19 +++++++++++++++++++
A src/emu/runtime.c  | 317 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/link/link.c    |  28 ++++++++++++++++++++++++++++
M src/link/link.h    |  15 +++++++++++++++
12 files changed, 1462 insertions(+), 15 deletions(-)

diff --git a/doc/EMU.md b/doc/EMU.md @@ -0,0 +1,355 @@ +# cfree emu design + +Architecture of `cfree emu`, the guest-ISA emulator. Companion to +`DESIGN.md`. Scope: how the emulator slots into the existing pipeline and +what its contracts are. Not a tutorial; not implementation notes. + +## 1. Goals + +- `emu` multi-call subcommand: load and execute a guest ELF on the host, + user-mode only (Linux/macOS userland; no full-system emulation). +- Targets v1: aarch64, riscv64. 32-bit variants follow each 64-bit lift. + x86_64 deferred (flag-heavy ISA — exercise lazy flags on simpler arches + first). +- Per-basic-block JIT translation: decode guest bytes → lift to `CG` → opt + (optional) → MCEmitter → ObjBuilder → incremental link into a single + growing `LinkImage` → execute. +- Block chaining for hot paths; cold blocks may run direct-from-CG (no opt + wrapper) for translation throughput. +- Source-level stepping through `dbg` when the guest ELF carries DWARF. +- Self-hosting: `src/emu/` is C11 freestanding like the rest of `src/`. + +The lifter is a sibling to `parse_c`: both are frontends that consume input +bytes and drive `CG`. Everything below `CG` (`opt`, `arch`, `obj`) is +unchanged. `link/` requires one extension — incremental resolve +(`link_resolve_extend`, §6) — landed *before* emu work begins, so emu has a +single lifecycle model end-to-end and never carries a "per-block fresh +`LinkImage`" interim shape. + +## 2. Non-goals (v1) + +- Full-system emulation (privileged ISA, MMU, devices). +- SIMD/vector ISA extensions (SSE/AVX/NEON/RVV) — blocked on `CGTarget` + lacking vector ops (§DESIGN 5.7). Programs using them either trap to a + scalarizer or fail to lift. +- x86 in v1. +- Self-modifying code (refuse to lift on observed write to a translated + page; full support is future work). +- Per-instruction precise exceptions / signal redirection. +- Foreign-OS syscalls — only the host OS's syscalls are forwarded. + +## 3. 
Layout + +``` +src/ + emu/ + emu.h driver-facing API + decode/ per-ISA structured decoder (shared tables with objdump) + aarch64.c + riscv64.c + lift/ per-ISA lifter; drives CG + aarch64.c + riscv64.c + cpu.h per-arch CPUState struct synthesis + runtime.c dispatcher, code cache, block chaining + syscall/ per-host-OS syscall forwarders (linux.c, darwin.c) +test/ + emu/ guest binary corpus + behavioral oracles +``` + +`src/emu/` is a sibling to `src/parse/`. The runtime helpers live in the +cfree tool itself — the JIT's `LinkExternResolver` (§DESIGN 5.5.1) returns +the host address of `emu_load64`, `emu_syscall`, etc. directly; no +separate runtime object. + +## 4. Dataflow + +``` +guest.elf (bytes) ─► obj reader ─► LinkImage* (mapped into guest AS) + │ + guest_pc ─► Decoder ─► EmuInst* ─► Lifter ─► CG ─┐ + │ + CGTarget ◄─────┘ + │ + (opt?) + │ + MCEmitter ─► ObjBuilder + │ + link_jit ─► host code + │ + dispatcher(guest_pc) ──┘ +``` + +1. **Load.** `obj/` readers parse the guest ELF. The runtime maps loadable + segments into a *guest address space* (an mmap'd region inside the + host process); guest virtual addresses are the addresses inside that + region. +2. **Decode.** When the dispatcher hits an untranslated `guest_pc`, the + per-ISA decoder reads guest bytes and produces `EmuInst`s up to the + next basic-block terminator. Decode tables are shared with the + disassembler (objdump): same bit patterns, two output shapes. +3. **Lift.** The per-ISA lifter walks the `EmuInst` stream and emits one + synthesized C function per guest basic block: signature + `next_pc_t block(CPUState*)`. Lifter calls `cg_*` ops only. +4. **Codegen + JIT.** Standard cfree pipeline. At -O0 the emu drives a + target `CGTarget` directly (fast translation, slow execution); at -O2 + it wraps with `opt_cgtarget` (slow translation, fast execution). Both + end at `link_jit` mapping executable pages. +5. **Execute.** `runtime.c` calls the host code with the current + `CPUState*`. 
The block returns the next guest PC. The dispatcher looks + up the next block, translating on miss. + +## 5. Key interfaces + +### 5.1 `Decoder` (`src/emu/decode/decode.h`) + +Per-ISA structured decoder. Output is `EmuInst` — a tagged union of +ISA-specific shapes with operand fields, *not* text. + +```c +typedef struct EmuInst { + EmuOp op; /* per-ISA enum */ + u64 guest_pc; + u32 guest_bytes; /* instruction width */ + EmuOperand operands[EMU_MAX_OPERANDS]; + u32 nop; + u32 flags; /* TERMINATOR | MEM | SETS_FLAGS | ... */ +} EmuInst; + +u32 emu_decode_block(EmuArch, const u8* bytes, u64 guest_pc, + EmuInst* out, u32 max); +``` + +The same decode tables back the disassembler (textual format) and the +lifter (structured). One source of truth per ISA. + +### 5.2 `Lifter` (`src/emu/lift/lift.h`) + +Per-ISA lifter. Consumes `EmuInst*`, drives `CG*`, produces one CG function +per guest basic block. + +```c +void emu_lift_block(EmuArch, CG* cg, + const EmuInst* insts, u32 n, + EmuLiftCtx* ctx); +``` + +`EmuLiftCtx` carries: the CPUState `Type*`, the synthesized block function +type (`next_pc_t (*)(CPUState*)`), the `ObjSymId` for the block, runtime +helper symbols (memory load/store, syscall trampoline, dispatcher +tail-call), and per-block lazy-flag state (§5.5). + +The lifter targets `CG` exclusively (`src/cg/cg.h`) — never `CGTarget` +directly. 
It uses roughly this subset: + +- `cg_push_global(cpu_state_sym)`, `cg_push_int`, `cg_push_const` +- `cg_load`, `cg_store`, plus a small `lift_field(ctx, off, T)` helper + layered on `cg_addr` + offset arithmetic +- `cg_binop`, `cg_unop`, `cg_cmp`, `cg_convert` +- `cg_atomic_*` for guest atomics +- `cg_label_new` / `cg_label_place` / `cg_jump` / `cg_branch_true` +- `cg_call` for runtime helpers (memory access, syscalls, dispatcher + tail-calls — `cg_call` materializes ABI parts from `fn_type`, which is + the main reason CG and not CGTarget) +- `cg_set_loc` carrying the *guest* PC encoded as a `SrcLoc` against a + synthetic `SourceManager` file id (§DESIGN 5.0) + +It does not use: aggregates, bitfields, variadics, setjmp, structured +scopes, inline asm. Those C-shaped surfaces remain available to the C +front-end at zero cost. + +### 5.3 `CPUState` (`src/emu/cpu.h`) + +Per-arch C struct synthesized once per emu invocation as an interned +`Type*`. Fields: + +- General-purpose register file (`u64 x[N]`). +- Lazy-flag fields (§5.5): last op kind, last operands, materialized + flags cache. +- Pointer to the guest memory base (host pointer). +- Pointer to the dispatcher entry / code-cache lookup function. +- Trap reason / exit code slots written before returning to the + dispatcher. + +Lifters reference fields by stable offset constants generated alongside +the `Type*`. The runtime allocates one `CPUState` per guest thread and +exposes its address as an `ObjSymId` resolved externally by the JIT +linker. + +### 5.4 `Runtime` (`src/emu/runtime.h`) + +In-process runtime, linked into the cfree binary, callable from JITted +guest blocks via the JIT's external resolver. Responsibilities: + +- **Dispatcher.** `emu_run(CPUState*)`: loop { lookup guest_pc → + translate-if-cold → call block }, exits on trap. +- **Code cache.** `guest_pc → host entry` map; translation happens on + miss. Eviction deferred (cache grows unbounded in v1). 
+- **Reserved code region.** One up-front `PROT_NONE` mmap (~128 MB) + whose base address feeds `link_resolve_at` (§6). Per-block + `link_resolve_extend` bump-allocates within it; the runtime commits + pages and `mprotect`s RX as new blocks land. +- **Block chaining.** When a block's terminator targets an + already-translated block, patch the tail to jump directly, bypassing + the dispatcher. Patching is a runtime concern — CG/opt see only the + pre-patch tail-call. +- **Memory helpers.** `emu_load{8,16,32,64}` / `emu_store_*`: bounds-check + the guest address against the mapped guest AS, trap on miss. Lifter + emits a `cg_call` to these for every guest memory op in v1; an inline + fastpath is a follow-up (§9). +- **Syscall trampoline.** `emu_syscall(CPUState*)` reads the guest + syscall number/args from CPUState, forwards via the per-OS table in + `src/emu/syscall/`, writes the return into the guest return register. + +### 5.5 Flag policy — lazy flags + +Most ISAs (aarch64 NZCV, x86 EFLAGS) compute condition flags as a side +effect of arithmetic. CG/CGTarget have no flag primitives. The lifter +implements lazy flags entirely above CG: + +- Each flag-setting guest op writes (op_kind, lhs, rhs) into CPUState + fields. Flags are *not* recomputed eagerly. +- Each flag-reading guest op (conditional branch, `cset`, …) recomputes + the specific flag bit it needs from the recorded inputs. +- opt's GVN/DCE eliminates redundant flag computations within a block; + cross-block redundancy is recovered after inlining via block chaining. + +No CGTarget extension. Adding an ISA's flag set is a per-arch lifter +table, not a pipeline change. + +### 5.6 Memory model — guest address space + +Guest loads/stores all go through a base pointer of unknown C provenance, +so CG's default `MemAccess.alias` derivation (§DESIGN 5.6) collapses to +`ALIAS_UNKNOWN`. To recover useful aliasing, the lifter sets +`MemAccess.addr_space = EMU_GUEST_AS` on every guest memory op. 
opt's +alias rules treat guest-AS accesses as: + +- May alias each other. +- Do not alias C memory (CPUState fields, runtime state). + +This is a one-field convention on the existing `MemAccess` shape; no API +change. CPUState field accesses use the default `addr_space = 0` and +remain promotable / GVN'able as ordinary C memory. + +## 6. Lifecycle — per-block JIT on a single growing LinkImage + +The compiler pipeline is TU-shaped (§DESIGN 5.5.1, 9.1). The emu wants +per-block translate-on-demand. The model is a *single* `LinkImage` that +grows as cold blocks land — never a fresh image per block — so chaining, +the code cache, and the host VA region all reference one stable artifact +across the emu session. + +This requires `link/` to expose incremental resolution before any emu code +lands: + +```c +LinkImage* link_resolve_at(Linker*, uintptr_t base_va); /* first call */ +void link_resolve_extend(Linker*, LinkImage*); /* later calls */ +``` + +`link_resolve_at` reserves layout starting at a caller-specified base VA +(the runtime hands out from a pre-reserved `PROT_NONE` region, typically +~128 MB). `link_resolve_extend` appends new inputs: it places new sections +at the next free offset within the reserved region, resolves new symbols +against the existing image's symbol table plus the `LinkExternResolver`, +and applies new relocations into the live image. **It must not change +host addresses of previously placed sections** — chaining has already +patched live host code with those addresses. §DESIGN 5.5.1's existing +discipline (stable `LinkInputId`s, separable `LinkRelocApply`, +non-destructive resolution) is exactly what makes this safe. + +The per-block flow is then: + +1. Decode guest bytes for the block. +2. Lift into a fresh `ObjBuilder` containing one function. +3. `link_add_obj` against the session's `Linker`. +4. 
`link_resolve_extend` to place the new section in the reserved VA + region, resolve symbols (helpers via the resolver; cross-block + references always resolve to the dispatcher — see below), apply + relocations. +5. Commit the newly used pages and `mprotect` to RX. +6. Insert `guest_pc → host_entry` into the code cache. + +Cross-block calls during step 4 always resolve to the dispatcher, even if +the target block is already translated. The linker never learns about +guest PCs or sibling blocks. Cross-block direct jumps are installed +*later* by chaining, which is a runtime mprotect-and-patch operation +outside the linker entirely. Keeping these two mechanisms separate avoids +the duplication that "let the linker resolve sibling blocks" would +introduce. + +What this model deliberately does *not* support in v1: + +- **Symbol removal / re-resolution.** Code-cache eviction would need it; + v1 lets the cache grow unbounded (§5.4) so the question is moot. When + eviction lands it pairs with a runtime-side "invalidate chain patches + pointing into the evicted block" side-table, not a linker mutation. +- **Address relocation of previously placed sections.** The reserved-VA + bump-allocator never compacts; the chaining invariant depends on this. + +## 7. Driver API + +Mirrors §DESIGN 13. + +```c +typedef struct CfreeEmuOptions { + EmuArch guest_arch; + const u8* guest_elf_bytes; + size_t guest_elf_len; + u32 optimize; /* 0 = direct CGTarget, 2 = opt_cgtarget */ + EmuTraceFlags trace; /* PC trace, decoded-inst trace */ + /* argv, envp, fd map come through CfreeEnv */ +} CfreeEmuOptions; + +int cfree_emu_run (Compiler*, CfreeEmuOptions, int* out_exit_code); + +/* Lower-level surface for dbg integration. 
*/ +typedef struct CfreeEmu CfreeEmu; +CfreeEmu* cfree_emu_new (Compiler*, CfreeEmuOptions); +int cfree_emu_step (CfreeEmu*, u32 nblocks); +void* cfree_emu_lookup(CfreeEmu*, u64 guest_pc); /* translate-if-cold */ +void cfree_emu_free (CfreeEmu*); +``` + +Path-shaped helpers (`cfree emu prog.elf`) live in the driver layer and +read bytes via `c->env->file_io->read_all`. The freestanding core never +takes paths. + +## 8. Debug info + +When the guest ELF carries DWARF, source-level stepping comes for free: + +- A guest-DWARF reader (extension of `src/debug/`) maps guest PC → + guest source line. +- `dbg` interrogates `cfree_emu_lookup` plus the guest DWARF for + per-line breakpoints and stepping. +- Lifter feeds `cg_set_loc` a `SrcLoc` whose `file_id` is a synthetic + `SourceManager` file representing the guest binary, with line numbers + encoding guest PC. opt's `Inst.loc` and host-side DWARF then point + back at guest PCs, useful for `objdump` of JITted host code. + +No new debug-info pipeline. + +## 9. Open questions + +- **CPUState promotion.** CPUState is passed by pointer, so its fields + are address-taken from CG's view and `build_ssa` won't promote them + (§DESIGN 12). Hot guest registers will spill/reload across every + arithmetic op, which kills perf. Likely fix: a per-block convention + that imports CPUState fields into virtuals at entry and exports at + exit, treating mid-block accesses as the SSA values. This is the + single largest perf question and worth prototyping before the lifter + interface freezes. +- **Inline memory fastpath.** A `cg_call` per guest memory op is a real + function call. Inlining a bounds check + direct host load is a known + emu speedup; needs either a cross-runtime/JIT inliner or a CG-level + helper. Defer until measured. +- **Vector ISA support.** Blocked by CGTarget lacking vector ops + (§DESIGN 5.7). Lands the same day vectors land for the C front-end. 
+- **x86 flag policy.** EFLAGS has more bits and more cross-instruction + dependencies than aarch64 NZCV. Lazy flags work but the recorded + payload is larger; verify on aarch64 before committing to x86. +- **SMC detection.** v1 refuses to lift on observed write to a + translated page (write-protect translated guest pages, trap writes). + Full SMC support — invalidate-and-retranslate — is future work. diff --git a/driver/objdump.c b/driver/objdump.c @@ -72,6 +72,7 @@ static char sym_kind_char(CfreeSymKind k) case CFREE_SK_ABS: return 'A'; case CFREE_SK_COMMON: return 'C'; case CFREE_SK_UNDEF: return 'U'; + case CFREE_SK_NOTYPE: return 'n'; } return ' '; } diff --git a/src/api/stubs.c b/src/api/stubs.c @@ -216,18 +216,7 @@ int cfree_dwarf_param_iter_next(CfreeDwarfParamIter* it, CfreeD { (void)it; (void)o; return 0; } void cfree_dwarf_param_iter_free(CfreeDwarfParamIter* it) { (void)it; } -/* ============================================================ - * Emulator (cfree emu) - * ============================================================ */ -struct CfreeEmu { int _; }; - -int cfree_emu_run(CfreeCompiler* c, const CfreeEmuOptions* opts, int* out_exit_code) -{ - (void)c; (void)opts; - if (out_exit_code) *out_exit_code = 0; - return 1; -} -CfreeEmu* cfree_emu_new (CfreeCompiler* c, const CfreeEmuOptions* o) { (void)c; (void)o; return 0; } -int cfree_emu_step (CfreeEmu* e, uint32_t n) { (void)e; (void)n; return 1; } -void* cfree_emu_lookup(CfreeEmu* e, uint64_t pc) { (void)e; (void)pc; return 0; } -void cfree_emu_free (CfreeEmu* e) { (void)e; } +/* Emulator (cfree emu) lives under src/emu/ — cfree_emu_run / new / + * step / lookup / free are real implementations there, with the + * per-ISA decode/lift, CPUState, and runtime helper layers stubbed + * one level down. */ diff --git a/src/emu/cpu.c b/src/emu/cpu.c @@ -0,0 +1,89 @@ +/* CPUState: per-thread guest register/lazy-flag/memory-base record, + * synthesized once per emu invocation as an interned C `Type*`. 
The
+ * lifter references fields through a stable offset table generated
+ * alongside the type; the runtime owns the storage and exposes its
+ * address to the JIT linker via the extern resolver (EMU_SYM_CPU_STATE).
+ *
+ * Per-arch fields land with the per-ISA lifter. v1 stub keeps the
+ * lifecycle real (alloc, free, PC/SP getters, trap reason) so emu.c
+ * does not need to know anything about per-arch register files. */
+
+#include "emu/emu.h"
+
+#include "core/heap.h"
+
+#include <cfree.h>
+
+#include <string.h>
+
+struct EmuCPUState {
+    Compiler* c;
+    CfreeEmuArch arch;
+    u64 pc;
+    u64 sp;
+    EmuTrapReason trap;
+    int exit_code;
+    /* Per-arch register / lazy-flag fields land alongside the synthesized
+     * Type*; the runtime helpers (emu_mem_*, emu_syscall) reach them
+     * through the canonical offsets. */
+};
+
+EmuCPUState* emu_cpu_new(Compiler* c, CfreeEmuArch arch,
+                         u64 initial_pc, u64 initial_sp)
+{
+    Heap* h;
+    EmuCPUState* s;
+    if (!c) return NULL;
+    h = (Heap*)c->env->heap;
+    s = (EmuCPUState*)h->alloc(h, sizeof(*s), _Alignof(EmuCPUState));
+    if (!s) return NULL;
+    memset(s, 0, sizeof(*s));
+    s->c = c;
+    s->arch = arch;
+    s->pc = initial_pc;
+    s->sp = initial_sp;
+    s->trap = EMU_TRAP_NONE;
+    return s;
+}
+
+void emu_cpu_free(EmuCPUState* s)
+{
+    Heap* h;
+    if (!s) return;
+    h = (Heap*)s->c->env->heap;
+    h->free(h, s, sizeof(*s));
+}
+
+u64 emu_cpu_pc(const EmuCPUState* s) { return s ? s->pc : 0; }
+
+void emu_cpu_set_pc(EmuCPUState* s, u64 pc)
+{
+    if (s) s->pc = pc;
+}
+
+EmuTrapReason emu_cpu_trap_reason(const EmuCPUState* s)
+{
+    return s ? s->trap : EMU_TRAP_NONE;
+}
+
+int emu_cpu_exit_code(const EmuCPUState* s)
+{
+    return s ? s->exit_code : 0;
+}
+
+const Type* emu_cpu_type(Compiler* c, CfreeEmuArch arch)
+{
+    /* Per-arch struct layout lands with the per-ISA lifter. The lifter
+     * is a stub for now; translate_block panics before any consumer
+     * dereferences this, so a NULL placeholder is safe.
 */
+    (void)c; (void)arch;
+    return NULL;
+}
+
+const Type* emu_block_fn_type(Compiler* c, CfreeEmuArch arch)
+{
+    /* Block ABI: u64 entry(EmuCPUState*). Materialized once the type
+     * subsystem and per-arch CPUState type land together. */
+    (void)c; (void)arch;
+    return NULL;
+}
diff --git a/src/emu/decode.c b/src/emu/decode.c
@@ -0,0 +1,25 @@
+/* Per-ISA structured decoder. The lifter (src/emu/lift.c) walks the
+ * EmuInst stream produced here; the same decode tables back the
+ * disassembler (textual format) so there's one source of truth per
+ * ISA. v1 targets aarch64 and riscv64; backends land separately. */
+
+#include "emu/emu.h"
+
+#include "core/core.h"
+
+#include <cfree.h>
+
+u32 emu_decode_block(CfreeEmuArch arch, const u8* bytes, u64 guest_pc,
+                     EmuInst* out, u32 max)
+{
+    /* Per-ISA decode tables not yet landed. Returning 0 routes the
+     * caller through translate_block's failure path, which surfaces
+     * a "failed to translate block" panic with the offending PC. */
+    (void)arch; (void)bytes; (void)guest_pc; (void)out; (void)max;
+    return 0;
+}
+
+void emu_trace_insn(Compiler* c, u64 guest_pc, const EmuInst* insn)
+{
+    (void)c; (void)guest_pc; (void)insn;
+}
diff --git a/src/emu/elf_load.c b/src/emu/elf_load.c
@@ -0,0 +1,47 @@
+/* Guest ELF loader: parses the ELF via the existing obj reader
+ * (read_elf in src/obj/elf_read.c), maps a guest address space,
+ * places loadable sections, and pushes argv/envp/auxv onto the
+ * guest stack at initial_sp.
+ *
+ * The reader gives us sections + symbols; the loader walks the
+ * SF_ALLOC sections, mmaps a contiguous host range covering the
+ * guest VA span, and copies the section bytes in. The entry PC
+ * resolves through the symbol named by the ELF e_entry header
+ * (typically `_start`). v1 executes statically-linked guest ELFs
+ * — dynamic-loader work is deferred (see doc/EMU.md §2).
 */
+
+#include "emu/emu.h"
+
+#include "core/heap.h"
+#include "obj/obj.h"
+
+#include <cfree.h>
+
+#include <string.h>
+
+int emu_load_elf(Compiler* c, CfreeEmuArch arch,
+                 const u8* bytes, size_t len,
+                 const char* const* argv, const char* const* envp,
+                 EmuLoadedImage* out)
+{
+    /* Per the design: parse via read_elf (an ELF -> ObjBuilder
+     * reader that already exists), walk allocatable sections to
+     * compute the guest VA span, mmap the guest AS, copy section
+     * bytes into the AS, lay out argv/envp/auxv at the top of the
+     * stack, and emit entry_pc / initial_sp.
+     *
+     * Stub returns nonzero so cfree_emu_new short-circuits before
+     * any consumer touches an uninitialized EmuLoadedImage. */
+    (void)c; (void)arch; (void)bytes; (void)len;
+    (void)argv; (void)envp;
+    if (out) memset(out, 0, sizeof(*out));
+    return 1;
+}
+
+void emu_unload_image(Compiler* c, EmuLoadedImage* img)
+{
+    (void)c;
+    if (!img) return;
+    /* munmap the guest AS region once the loader is real. */
+    memset(img, 0, sizeof(*img));
+}
diff --git a/src/emu/emu.c b/src/emu/emu.c
@@ -0,0 +1,371 @@
+/* libcfree's guest-ISA emulator: load a guest ELF, translate one
+ * basic block at a time into host code via the existing CG/MC/link
+ * pipeline, dispatch through a code cache. See doc/EMU.md for design
+ * and §6 for the incremental-link discipline.
+ *
+ * This file owns CfreeEmu lifecycle and the translate/dispatch loop.
+ * Per-ISA decoders/lifters, CPUState synthesis, the code cache and
+ * reserved-VA region, and the runtime helper trampolines each live
+ * behind APIs declared in src/emu/emu.h.
*/ + +#include "emu/emu.h" + +#include "arch/arch.h" +#include "cg/cg.h" +#include "core/heap.h" +#include "core/pool.h" +#include "link/link.h" +#include "obj/obj.h" +#include "opt/opt.h" + +#include <cfree.h> + +#include <setjmp.h> +#include <string.h> + +/* ---- Lifecycle ---- */ + +struct CfreeEmu { + Compiler* c; + CfreeEmuArch guest_arch; + int opt_level; + CfreeEmuTraceFlags trace; + + EmuLoadedImage guest; + EmuCPUState* cpu; + + Linker* linker; + LinkImage* image; + EmuCodeRegion* code_region; + EmuCodeCache* cache; + + int done; + int exit_code; +}; + +static SrcLoc no_loc(void) +{ + SrcLoc l; + l.file_id = 0; + l.line = 0; + l.col = 0; + return l; +} + +static int arch_supported(CfreeEmuArch a) +{ + return a == CFREE_EMU_ARCH_AARCH64 || a == CFREE_EMU_ARCH_RISCV64; +} + +/* The block function call ABI: u64 entry(EmuCPUState*). Cast through + * a typedef so the call site reads cleanly in the dispatcher. */ +typedef u64 (*EmuBlockFn)(EmuCPUState*); + +CfreeEmu* cfree_emu_new(CfreeCompiler* c, const CfreeEmuOptions* opts) +{ + PanicSave saved; + Heap* heap; + CfreeEmu* e; + + if (!c || !opts || !opts->guest_elf_bytes || opts->guest_elf_len == 0) + return NULL; + if (!arch_supported(opts->guest_arch)) return NULL; + + compiler_panic_save(c, &saved); + if (setjmp(c->panic)) { + compiler_run_cleanups(c); + compiler_panic_restore(c, &saved); + return NULL; + } + + heap = (Heap*)c->env->heap; + e = (CfreeEmu*)heap->alloc(heap, sizeof(*e), _Alignof(CfreeEmu)); + if (!e) compiler_panic(c, no_loc(), "emu: out of memory"); + memset(e, 0, sizeof(*e)); + e->c = c; + e->guest_arch = opts->guest_arch; + e->opt_level = opts->optimize; + e->trace = opts->trace; + + /* 1. Load the guest ELF: mmap a guest AS and place PT_LOAD segments, + * push argv/envp/auxv onto the guest stack. 
*/ + if (emu_load_elf(c, opts->guest_arch, + opts->guest_elf_bytes, opts->guest_elf_len, + opts->argv, opts->envp, &e->guest) != 0) { + compiler_panic(c, no_loc(), "emu: failed to load guest ELF"); + } + + /* 2. Allocate per-thread CPU state and seed PC/SP. */ + e->cpu = emu_cpu_new(c, opts->guest_arch, + e->guest.entry_pc, e->guest.initial_sp); + + /* 3. Reserve a fixed-VA code region for translated host blocks. */ + e->code_region = emu_code_region_new(c, EMU_CODE_REGION_SIZE); + + /* 4. Stand up the session linker. The extern resolver maps each + * EMU_SYM_* helper name to the host address of its trampoline / + * the running CfreeEmu's CPU state. */ + e->linker = link_new(c); + if (!e->linker) compiler_panic(c, no_loc(), "emu: link_new failed"); + link_set_extern_resolver(e->linker, emu_runtime_extern_resolver, e); + + /* 5. Seed the initial empty image at the code region's base VA. + * Subsequent cold blocks land via link_resolve_extend, which + * must keep already-placed sections at stable host addresses + * (block chaining patches them). */ + e->image = link_resolve_at(e->linker, + emu_code_region_base(e->code_region)); + if (!e->image) compiler_panic(c, no_loc(), + "emu: link_resolve_at failed"); + + /* 6. Code cache: guest_pc -> host entry. Grows unbounded in v1. 
*/ + e->cache = emu_cache_new(c); + + compiler_panic_restore(c, &saved); + return e; +} + +void cfree_emu_free(CfreeEmu* e) +{ + Heap* heap; + if (!e) return; + heap = (Heap*)e->c->env->heap; + + if (e->cache) emu_cache_free(e->cache); + if (e->image) link_image_free(e->image); + if (e->linker) link_free(e->linker); + if (e->code_region) emu_code_region_free(e->code_region); + if (e->cpu) emu_cpu_free(e->cpu); + emu_unload_image(e->c, &e->guest); + + heap->free(heap, e, sizeof(*e)); +} + +/* ---- Translation (cold-miss path) ---- */ + +static void* translate_block(CfreeEmu* e, u64 guest_pc) +{ + EmuInst insts[EMU_MAX_INSTS_PER_BLOCK]; + u32 ninsts; + ObjBuilder* ob; + MCEmitter* mc; + CGTarget* target; + CG* cg; + Sym block_name; + ObjSymId block_sym; + EmuLiftCtx ctx; + LinkSymId sym_id; + const LinkSymbol* sym; + void* entry; + + if (e->trace & CFREE_EMU_TRACE_BLOCK) emu_trace_block(e->c, guest_pc); + + /* Bounds check: guest_pc must lie inside the mapped guest AS. + * The loader maps the guest AS so guest VAs are valid host + * pointers (1:1); reading bytes through the cast is safe. */ + { + uintptr_t base = (uintptr_t)e->guest.guest_base; + if ((uintptr_t)guest_pc < base || + (uintptr_t)guest_pc >= base + e->guest.guest_size) { + return NULL; + } + } + + ninsts = emu_decode_block(e->guest_arch, + (const u8*)(uintptr_t)guest_pc, guest_pc, + insts, EMU_MAX_INSTS_PER_BLOCK); + if (ninsts == 0) return NULL; + + if (e->trace & CFREE_EMU_TRACE_INSN) { + u32 j; + for (j = 0; j < ninsts; ++j) + emu_trace_insn(e->c, guest_pc, &insts[j]); + } + + /* Per-block ObjBuilder + MC + CGTarget pipeline. The block lands + * as a single host function. 
*/ + ob = obj_new(e->c); + mc = mc_new(e->c, ob); + target = cgtarget_new(e->c, ob, mc); + if (e->opt_level > 0) target = opt_cgtarget_new(e->c, target, e->opt_level); + cg = cg_new(e->c, target, /*Debug*/ NULL); + + block_name = emu_block_sym_name(e->c, guest_pc); + /* Forward-declare the block's symbol so the lifter can refer to it + * via cg_func_begin. obj_symbol_define fills in (section, value, size) + * once the function is emitted. */ + block_sym = obj_symbol(ob, block_name, SB_GLOBAL, SK_FUNC, + OBJ_SEC_NONE, 0, 0); + + memset(&ctx, 0, sizeof(ctx)); + ctx.arch = e->guest_arch; + ctx.cpu_state_type = emu_cpu_type (e->c, e->guest_arch); + ctx.block_fn_type = emu_block_fn_type(e->c, e->guest_arch); + ctx.block_sym = block_sym; + ctx.guest_pc = guest_pc; + + emu_lift_block(e->guest_arch, cg, insts, ninsts, &ctx); + + cgtarget_finalize(target); + obj_finalize(ob); + + cg_free(cg); + cgtarget_free(target); /* opt_cgtarget cascades to wrapped target */ + mc_free(mc); + + /* Add the block's object to the session linker and extend the + * image. link_resolve_extend places the new section at the next + * free offset within the reserved VA region (must not change host + * addresses of already-placed sections — chaining depends on it), + * resolves the block's runtime-helper externs via the resolver, + * and applies new relocations into the live image. */ + link_add_obj(e->linker, ob); + link_resolve_extend(e->linker, e->image); + + /* Commit and mprotect RX up to the new high-water of the image. */ + { + uintptr_t end = emu_code_region_base(e->code_region); + u32 i; + for (i = 0; i < link_segment_count(e->image); ++i) { + const LinkSegment* seg = link_segment_get(e->image, i + 1u); + uintptr_t segend = (uintptr_t)seg->vaddr + (uintptr_t)seg->mem_size; + if (segend > end) end = segend; + } + emu_code_region_commit_rx_to(e->code_region, end); + } + + /* Resolve the freshly placed block to its host entry. 
*/ + sym_id = link_symbol_lookup(e->image, block_name); + if (sym_id == LINK_SYM_NONE) return NULL; + sym = link_symbol(e->image, sym_id); + if (!sym || !sym->defined) return NULL; + entry = (void*)(uintptr_t)sym->vaddr; + + emu_cache_insert(e->cache, guest_pc, entry); + return entry; +} + +void* cfree_emu_lookup(CfreeEmu* e, uint64_t guest_pc) +{ + PanicSave saved; + void* entry; + + if (!e) return NULL; + + /* Cache hit short-circuits the panic boundary. */ + entry = emu_cache_lookup(e->cache, guest_pc); + if (entry) return entry; + + compiler_panic_save(e->c, &saved); + if (setjmp(e->c->panic)) { + compiler_run_cleanups(e->c); + compiler_panic_restore(e->c, &saved); + return NULL; + } + + entry = translate_block(e, guest_pc); + + compiler_panic_restore(e->c, &saved); + return entry; +} + +/* ---- Dispatcher ---- */ + +int cfree_emu_step(CfreeEmu* e, uint32_t nblocks) +{ + PanicSave saved; + uint32_t i; + + if (!e) return 1; + if (e->done) return 0; + + compiler_panic_save(e->c, &saved); + if (setjmp(e->c->panic)) { + compiler_run_cleanups(e->c); + compiler_panic_restore(e->c, &saved); + return 1; + } + + for (i = 0; i < nblocks && !e->done; ++i) { + u64 pc = emu_cpu_pc(e->cpu); + void* entry; + EmuBlockFn fn; + u64 next_pc; + EmuTrapReason trap; + + if (e->trace & CFREE_EMU_TRACE_PC) emu_trace_pc(e->c, pc); + + entry = cfree_emu_lookup(e, pc); + if (!entry) { + compiler_panic(e->c, no_loc(), + "emu: failed to translate block at guest_pc=0x%llx", + (unsigned long long)pc); + } + + fn = (EmuBlockFn)entry; + next_pc = fn(e->cpu); + emu_cpu_set_pc(e->cpu, next_pc); + + trap = emu_cpu_trap_reason(e->cpu); + if (trap == EMU_TRAP_EXIT) { + e->done = 1; + e->exit_code = emu_cpu_exit_code(e->cpu); + } else if (trap == EMU_TRAP_FAULT) { + compiler_panic(e->c, no_loc(), + "emu: guest faulted at pc=0x%llx", + (unsigned long long)next_pc); + } + } + + compiler_panic_restore(e->c, &saved); + return 0; +} + +int cfree_emu_run(CfreeCompiler* c, const CfreeEmuOptions* opts, + 
int* out_exit_code) +{ + CfreeEmu* e; + int rc = 0; + + if (out_exit_code) *out_exit_code = 0; + if (!c || !opts) return 1; + + e = cfree_emu_new(c, opts); + if (!e) return 1; + + while (!e->done) { + if (cfree_emu_step(e, 1024) != 0) { rc = 1; break; } + } + + if (rc == 0 && out_exit_code) *out_exit_code = e->exit_code; + cfree_emu_free(e); + return rc; +} + +/* Runtime accessor for the resolver — exposes the running emu's + * CPUState pointer without baking the CfreeEmu layout into runtime.c. + * Used by emu_runtime_extern_resolver for EMU_SYM_CPU_STATE. */ +EmuCPUState* emu_internal_cpu(CfreeEmu* e) +{ + return e ? e->cpu : NULL; +} + +/* ---- Block symbol naming ---- + * "emu_block_<16-hex-pc>" — fixed-width hex so the linker's hash + * lookup never collides between two blocks at distinct guest PCs. + * Interned in the compiler's global pool; the Sym is stable for the + * Compiler's lifetime, which is what the linker assumes. */ +Sym emu_block_sym_name(Compiler* c, u64 guest_pc) +{ + char buf[32]; + static const char hex[] = "0123456789abcdef"; + int i; + /* "emu_block_" + 16 hex digits + NUL = 27 chars, fits in 32. */ + memcpy(buf, "emu_block_", 10); + for (i = 0; i < 16; ++i) { + buf[10 + 15 - i] = hex[guest_pc & 0xfu]; + guest_pc >>= 4; + } + buf[26] = '\0'; + return pool_intern_cstr(c->global, buf); +} diff --git a/src/emu/emu.h b/src/emu/emu.h @@ -0,0 +1,191 @@ +#ifndef CFREE_EMU_H +#define CFREE_EMU_H + +/* Internal API for libcfree's guest-ISA emulator. Public surface is + * cfree_emu_* in <cfree.h>; the implementation in src/emu/emu.c + * composes the pieces declared here. See doc/EMU.md for design. + * + * Layering: emu.c owns CfreeEmu lifecycle and the translate/dispatch + * loop; per-ISA decoders/lifters, CPUState synthesis, the JIT code + * cache and reserved-VA region, and the runtime helper trampolines + * each live behind one of the surfaces below so the top-level driver + * never reaches into ISA-specific code. 
*/ + +#include <cfree.h> + +#include "core/core.h" +#include "obj/obj.h" +#include "type/type.h" + +typedef struct CG CG; +typedef struct LinkImage LinkImage; +typedef struct Linker Linker; + +/* ---- Configuration knobs ---------------------------------------- */ + +/* Bounded so the translator can stack-allocate the EmuInst buffer. */ +#define EMU_MAX_INSTS_PER_BLOCK 64u + +/* Reserved JIT code region. emu_runtime mmap's PROT_NONE up front and + * commits pages as cold blocks land. Sized for v1 — chaining and the + * code cache assume host VAs of placed sections never move, so this + * region also never grows. */ +#define EMU_CODE_REGION_SIZE (128ull * 1024ull * 1024ull) + +/* ---- Guest ELF loader ------------------------------------------- */ + +typedef struct EmuLoadedImage { + void* guest_base; /* host pointer to the mapped guest AS */ + size_t guest_size; /* bytes reserved for the guest AS */ + u64 entry_pc; /* guest VA of the program entry point */ + u64 initial_sp; /* guest VA of the initial stack pointer */ +} EmuLoadedImage; + +/* Parse the guest ELF, mmap the guest AS, copy PT_LOAD segments, + * push argv/envp/auxv onto the guest stack at initial_sp. Returns 0 + * on success and writes *out; returns nonzero on parse failure. 
*/ +int emu_load_elf (Compiler*, CfreeEmuArch, + const u8* bytes, size_t len, + const char* const* argv, const char* const* envp, + EmuLoadedImage* out); +void emu_unload_image(Compiler*, EmuLoadedImage*); + +/* ---- CPU state -------------------------------------------------- */ + +typedef struct EmuCPUState EmuCPUState; + +typedef enum EmuTrapReason { + EMU_TRAP_NONE = 0, + EMU_TRAP_EXIT, /* guest exit syscall; exit_code valid */ + EMU_TRAP_FAULT, /* unmapped access / decode failure */ +} EmuTrapReason; + +EmuCPUState* emu_cpu_new(Compiler*, CfreeEmuArch, + u64 initial_pc, u64 initial_sp); +void emu_cpu_free(EmuCPUState*); +u64 emu_cpu_pc(const EmuCPUState*); +void emu_cpu_set_pc(EmuCPUState*, u64); +EmuTrapReason emu_cpu_trap_reason(const EmuCPUState*); +int emu_cpu_exit_code(const EmuCPUState*); + +/* The interned C struct type representing CPUState for `arch`. The + * lifter references fields through an ObjSymId resolved by the + * runtime extern resolver to &CfreeEmu->cpu storage. */ +const Type* emu_cpu_type(Compiler*, CfreeEmuArch); + +/* The function type `u64 (CPUState*)` used for every lifted block. + * Returned interned. */ +const Type* emu_block_fn_type(Compiler*, CfreeEmuArch); + +/* ---- Decoder ---------------------------------------------------- */ +/* Concrete shape lives here (rather than as a per-ISA opaque) so the + * translator can stack-allocate a fixed-size buffer in + * cfree_emu_lookup. Per-ISA decoders/lifters interpret the operand + * payload through their own enums; the carrier is shared. */ +typedef struct EmuInst { + u32 op; /* per-ISA enum */ + u32 flags; /* TERMINATOR | MEM | SETS_FLAGS | ... */ + u64 guest_pc; + u32 guest_bytes; /* instruction width in guest bytes */ + u32 nop; + u64 operands[6]; /* per-ISA payload */ +} EmuInst; + +/* Decode up to the next basic-block terminator or `max` instructions, + * whichever comes first. Returns the count written to `out`. 
Zero + * means decode failed at `guest_pc` (undecodable / out-of-bounds). */ +u32 emu_decode_block(CfreeEmuArch, const u8* bytes, u64 guest_pc, + EmuInst* out, u32 max); + +/* ---- Lifter ----------------------------------------------------- */ + +typedef struct EmuLiftCtx { + CfreeEmuArch arch; + const Type* cpu_state_type; /* from emu_cpu_type */ + const Type* block_fn_type; /* from emu_block_fn_type */ + ObjSymId block_sym; /* function symbol for this block */ + u64 guest_pc; /* PC of first instruction in the block */ +} EmuLiftCtx; + +/* Walk `insts` and emit one CG function (signature next_pc_t(CPUState*)) + * for the block. Calls cg_func_begin/end exactly once. */ +void emu_lift_block(CfreeEmuArch, CG*, const EmuInst* insts, u32 n, + const EmuLiftCtx*); + +/* ---- Code cache ------------------------------------------------- */ + +typedef struct EmuCodeCache EmuCodeCache; + +EmuCodeCache* emu_cache_new(Compiler*); +void emu_cache_free(EmuCodeCache*); +void emu_cache_insert(EmuCodeCache*, u64 guest_pc, void* host_entry); +void* emu_cache_lookup(const EmuCodeCache*, u64 guest_pc); + +/* ---- Code region (reserved VA) ---------------------------------- */ +/* PROT_NONE mmap that backs the linker's bump-allocated VA range. + * Pages are committed and flipped to RX after each link_resolve_extend + * lands new sections. The base address is fed to link_resolve_at as the + * image's runtime VA. */ +typedef struct EmuCodeRegion EmuCodeRegion; + +EmuCodeRegion* emu_code_region_new (Compiler*, size_t reserve_size); +void emu_code_region_free(EmuCodeRegion*); +uintptr_t emu_code_region_base(const EmuCodeRegion*); +size_t emu_code_region_size(const EmuCodeRegion*); + +/* Commits and mprotects RX every page covering [base, end). `end` must + * lie inside the reserved range and must be monotonically non-decreasing + * across calls — the chaining invariant depends on previously committed + * pages remaining RX. 
*/ +void emu_code_region_commit_rx_to(EmuCodeRegion*, uintptr_t end); + +/* ---- Runtime helpers -------------------------------------------- */ + +/* Names of the runtime helper symbols the lifter emits as undefined + * externs. The extern resolver maps each one to the host address of + * the matching helper. Kept centralized so decode/lift/runtime agree. */ +#define EMU_SYM_CPU_STATE "__emu_cpu_state" +#define EMU_SYM_LOAD8 "__emu_load8" +#define EMU_SYM_LOAD16 "__emu_load16" +#define EMU_SYM_LOAD32 "__emu_load32" +#define EMU_SYM_LOAD64 "__emu_load64" +#define EMU_SYM_STORE8 "__emu_store8" +#define EMU_SYM_STORE16 "__emu_store16" +#define EMU_SYM_STORE32 "__emu_store32" +#define EMU_SYM_STORE64 "__emu_store64" +#define EMU_SYM_SYSCALL "__emu_syscall" +#define EMU_SYM_DISPATCH "__emu_dispatch" + +/* The block-symbol name format: emu_block_<hex_pc>. Kept short; the + * linker globals table only has to find it once per cold miss. */ +Sym emu_block_sym_name(Compiler*, u64 guest_pc); + +/* External resolver passed to link_set_extern_resolver. `user` is + * the CfreeEmu*. Returns NULL for unrecognized names — the linker + * promotes that to a fatal undefined-symbol diagnostic. */ +void* emu_runtime_extern_resolver(void* user, const char* name); + +/* Memory helpers; called from JITted blocks. The host process owns + * the guest AS, so loads/stores bounds-check against the EmuCPUState's + * mapped guest range and trap on miss (writing EMU_TRAP_FAULT into the + * CPU state and falling back to the dispatcher). 
*/ +u8 emu_mem_load8 (EmuCPUState*, u64 addr); +u16 emu_mem_load16(EmuCPUState*, u64 addr); +u32 emu_mem_load32(EmuCPUState*, u64 addr); +u64 emu_mem_load64(EmuCPUState*, u64 addr); +void emu_mem_store8 (EmuCPUState*, u64 addr, u8); +void emu_mem_store16(EmuCPUState*, u64 addr, u16); +void emu_mem_store32(EmuCPUState*, u64 addr, u32); +void emu_mem_store64(EmuCPUState*, u64 addr, u64); + +/* Reads syscall number / args from the guest registers, forwards to + * the host OS, and writes the return into the guest return register. */ +void emu_syscall(EmuCPUState*); + +/* ---- Tracing ---------------------------------------------------- */ + +void emu_trace_pc (Compiler*, u64 guest_pc); +void emu_trace_block(Compiler*, u64 guest_pc); +void emu_trace_insn (Compiler*, u64 guest_pc, const EmuInst*); + +#endif diff --git a/src/emu/lift.c b/src/emu/lift.c @@ -0,0 +1,19 @@ +/* Per-ISA lifter. Consumes EmuInsts and drives CG to emit one host + * function per guest basic block (signature u64(EmuCPUState*)). + * Lifters target CG exclusively — never CGTarget directly — so the + * pipeline below CG is unchanged from the C front-end. */ + +#include "emu/emu.h" + +#include "cg/cg.h" + +#include <cfree.h> + +void emu_lift_block(CfreeEmuArch arch, CG* cg, const EmuInst* insts, u32 n, + const EmuLiftCtx* ctx) +{ + /* Per-ISA lifter tables not yet landed. translate_block panics + * before it would finalize an empty block, so this stub never + * silently produces an executable host function. */ + (void)arch; (void)cg; (void)insts; (void)n; (void)ctx; +} diff --git a/src/emu/runtime.c b/src/emu/runtime.c @@ -0,0 +1,317 @@ +/* Emulator runtime: code cache, reserved JIT VA region, runtime + * helper trampolines, and the extern resolver that wires lifted + * blocks to host helper addresses. The runtime is in-process — no + * separate runtime object — so the JIT linker just hands back the + * helper addresses through emu_runtime_extern_resolver. 
+ * + * Block chaining lives here too (a runtime mprotect-and-patch pass + * outside the linker) but lands with the per-ISA lifter; see + * doc/EMU.md §6 for why it sits outside link/. */ + +#include "emu/emu.h" + +#include "core/heap.h" + +#include <cfree.h> + +#include <string.h> + +/* ============================================================ + * Reserved code region + * ============================================================ + * One up-front PROT_NONE reservation through env->execmem. The base + * address is fed to link_resolve_at as the image's runtime VA; per- + * block link_resolve_extend bump-allocates within. Pages are committed + * (protect to RX) lazily as blocks land — the runtime flips them + * after the linker writes the section bytes and applies relocations. + */ + +static SrcLoc no_loc(void) { SrcLoc l = {0,0,0}; return l; } + +static const CfreeExecMem* require_execmem(Compiler* c) +{ + const CfreeExecMem* m = c->env ? c->env->execmem : NULL; + if (!m || !m->reserve || !m->protect || !m->release) { + compiler_panic(c, no_loc(), + "emu: env->execmem is required for the code region"); + } + return m; +} + +static u64 page_size_bytes(const CfreeExecMem* m) +{ + return m->page_size ? 
(u64)m->page_size : 0x4000u; +} + +static u64 align_up_u64(u64 v, u64 a) +{ + return (v + (a - 1u)) & ~(a - 1u); +} + +struct EmuCodeRegion { + Compiler* c; + void* base; + size_t size; + uintptr_t rx_end; /* high-water of pages currently RX */ +}; + +EmuCodeRegion* emu_code_region_new(Compiler* c, size_t reserve_size) +{ + Heap* h; + const CfreeExecMem* mem; + EmuCodeRegion* r; + void* p; + size_t aligned; + + if (!c) return NULL; + h = (Heap*)c->env->heap; + mem = require_execmem(c); + aligned = (size_t)align_up_u64((u64)reserve_size, page_size_bytes(mem)); + + p = mem->reserve(mem->user, aligned, CFREE_PROT_NONE); + if (!p) return NULL; + + r = (EmuCodeRegion*)h->alloc(h, sizeof(*r), _Alignof(EmuCodeRegion)); + if (!r) { mem->release(mem->user, p, aligned); return NULL; } + r->c = c; + r->base = p; + r->size = aligned; + r->rx_end = (uintptr_t)p; + return r; +} + +void emu_code_region_free(EmuCodeRegion* r) +{ + Heap* h; + const CfreeExecMem* mem; + if (!r) return; + h = (Heap*)r->c->env->heap; + mem = r->c->env->execmem; + if (r->base && r->size && mem && mem->release) { + mem->release(mem->user, r->base, r->size); + } + h->free(h, r, sizeof(*r)); +} + +uintptr_t emu_code_region_base(const EmuCodeRegion* r) +{ + return r ? (uintptr_t)r->base : 0; +} + +size_t emu_code_region_size(const EmuCodeRegion* r) +{ + return r ? r->size : 0; +} + +void emu_code_region_commit_rx_to(EmuCodeRegion* r, uintptr_t end) +{ + uintptr_t base, page_end; + size_t len; + if (!r) return; + base = (uintptr_t)r->base; + page_end = (uintptr_t)align_up_u64((u64)end, page_size_bytes(r->c->env->execmem)); + /* Monotonic: never lower the high-water; chaining patches + * already-committed code and depends on it staying RX. 
*/ + if (page_end <= r->rx_end) return; + if (page_end > base + r->size) page_end = base + r->size; + if (page_end <= r->rx_end) return; + + len = (size_t)(page_end - r->rx_end); + /* Linker has already written + relocated the section bytes via + * the original PROT_NONE mapping (which is technically a fault + * unless the mapping was promoted to RW). The actual write path + * is owned by link_resolve_extend; in v1 we expect the linker + * to use mprotect-RW prior to writing. RX flip happens here. */ + if (mprotect((void*)r->rx_end, len, PROT_READ | PROT_EXEC) == 0) { +#ifdef __aarch64__ + /* Flush data caches and invalidate icache so the CPU sees + * the freshly written instructions. */ + __builtin___clear_cache((char*)r->rx_end, (char*)page_end); +#endif + r->rx_end = page_end; + } +} + +/* ============================================================ + * Code cache (guest_pc -> host entry) + * ============================================================ + * Open-addressed linear-probe hash on the guest PC. Capacity grows + * by doubling; v1 never evicts. 
*/ + +typedef struct EmuCacheEntry { + u64 guest_pc; /* 0 means empty slot */ + void* host_entry; +} EmuCacheEntry; + +struct EmuCodeCache { + Compiler* c; + EmuCacheEntry* slots; + u32 cap; + u32 used; +}; + +static u64 mix_pc(u64 x) +{ + x ^= x >> 33; x *= 0xff51afd7ed558ccdull; + x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ull; + x ^= x >> 33; + return x; +} + +static void cache_resize(EmuCodeCache* c, u32 new_cap) +{ + Heap* h = (Heap*)c->c->env->heap; + EmuCacheEntry* fresh; + u32 i, mask; + fresh = (EmuCacheEntry*)h->alloc(h, sizeof(*fresh) * new_cap, + _Alignof(EmuCacheEntry)); + if (!fresh) return; + memset(fresh, 0, sizeof(*fresh) * new_cap); + mask = new_cap - 1u; + for (i = 0; i < c->cap; ++i) { + u64 pc = c->slots[i].guest_pc; + u32 j; + if (pc == 0) continue; + j = (u32)mix_pc(pc) & mask; + while (fresh[j].guest_pc != 0) j = (j + 1u) & mask; + fresh[j] = c->slots[i]; + } + if (c->slots) h->free(h, c->slots, sizeof(*c->slots) * c->cap); + c->slots = fresh; + c->cap = new_cap; +} + +EmuCodeCache* emu_cache_new(Compiler* c) +{ + Heap* h; + EmuCodeCache* k; + if (!c) return NULL; + h = (Heap*)c->env->heap; + k = (EmuCodeCache*)h->alloc(h, sizeof(*k), _Alignof(EmuCodeCache)); + if (!k) return NULL; + memset(k, 0, sizeof(*k)); + k->c = c; + cache_resize(k, 64u); + return k; +} + +void emu_cache_free(EmuCodeCache* c) +{ + Heap* h; + if (!c) return; + h = (Heap*)c->c->env->heap; + if (c->slots) h->free(h, c->slots, sizeof(*c->slots) * c->cap); + h->free(h, c, sizeof(*c)); +} + +void emu_cache_insert(EmuCodeCache* c, u64 guest_pc, void* host_entry) +{ + u32 mask, j; + if (!c || guest_pc == 0) return; + if (c->used * 4u >= c->cap * 3u) cache_resize(c, c->cap * 2u); + mask = c->cap - 1u; + j = (u32)mix_pc(guest_pc) & mask; + while (c->slots[j].guest_pc != 0) { + if (c->slots[j].guest_pc == guest_pc) { + c->slots[j].host_entry = host_entry; + return; + } + j = (j + 1u) & mask; + } + c->slots[j].guest_pc = guest_pc; + c->slots[j].host_entry = host_entry; + c->used++; +} + 
+void* emu_cache_lookup(const EmuCodeCache* c, u64 guest_pc) +{ + u32 mask, j; + if (!c || c->cap == 0 || guest_pc == 0) return NULL; + mask = c->cap - 1u; + j = (u32)mix_pc(guest_pc) & mask; + while (c->slots[j].guest_pc != 0) { + if (c->slots[j].guest_pc == guest_pc) return c->slots[j].host_entry; + j = (j + 1u) & mask; + } + return NULL; +} + +/* ============================================================ + * Runtime helper trampolines + * ============================================================ + * Lifted blocks call into these through extern symbols whose names + * are EMU_SYM_*. The resolver below maps each name to the address + * of the matching function (or, for EMU_SYM_CPU_STATE, the address + * of the running emu's CPUState). */ + +/* Forward-declare the host-private CfreeEmu shape so the resolver + * can pull the CPUState pointer without dragging emu.c's struct + * definition into this TU's contract. */ +struct CfreeEmu; +EmuCPUState* emu_internal_cpu(struct CfreeEmu*); + +/* Memory helpers. Per EMU.md §5.4 these bounds-check the guest + * address against the mapped guest AS and trap on miss. v1 stubs + * write a fault into the CPU state and return zero; the dispatcher + * picks up the trap on return from the block. 
*/ + +u8 emu_mem_load8 (EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; } +u16 emu_mem_load16(EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; } +u32 emu_mem_load32(EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; } +u64 emu_mem_load64(EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; } + +void emu_mem_store8 (EmuCPUState* s, u64 addr, u8 v) { (void)s; (void)addr; (void)v; } +void emu_mem_store16(EmuCPUState* s, u64 addr, u16 v) { (void)s; (void)addr; (void)v; } +void emu_mem_store32(EmuCPUState* s, u64 addr, u32 v) { (void)s; (void)addr; (void)v; } +void emu_mem_store64(EmuCPUState* s, u64 addr, u64 v) { (void)s; (void)addr; (void)v; } + +void emu_syscall(EmuCPUState* s) { (void)s; } + +/* ============================================================ + * Extern resolver + * ============================================================ + * Called by the linker for any undefined symbol the per-block + * ObjBuilder references. Returns the host VA of the named helper + * (or the running emu's CPUState). Returning NULL surfaces as a + * fatal "undefined reference" diagnostic from link_resolve_extend. 
*/ + +static int streq(const char* a, const char* b) +{ + while (*a && *a == *b) { ++a; ++b; } + return *a == 0 && *b == 0; +} + +void* emu_runtime_extern_resolver(void* user, const char* name) +{ + if (!name) return NULL; + + if (streq(name, EMU_SYM_CPU_STATE)) { + struct CfreeEmu* e = (struct CfreeEmu*)user; + return (void*)emu_internal_cpu(e); + } + + if (streq(name, EMU_SYM_LOAD8)) return (void*)emu_mem_load8; + if (streq(name, EMU_SYM_LOAD16)) return (void*)emu_mem_load16; + if (streq(name, EMU_SYM_LOAD32)) return (void*)emu_mem_load32; + if (streq(name, EMU_SYM_LOAD64)) return (void*)emu_mem_load64; + if (streq(name, EMU_SYM_STORE8)) return (void*)emu_mem_store8; + if (streq(name, EMU_SYM_STORE16)) return (void*)emu_mem_store16; + if (streq(name, EMU_SYM_STORE32)) return (void*)emu_mem_store32; + if (streq(name, EMU_SYM_STORE64)) return (void*)emu_mem_store64; + if (streq(name, EMU_SYM_SYSCALL)) return (void*)emu_syscall; + + /* EMU_SYM_DISPATCH is the cross-block tail-call helper; it shares + * the host address of the dispatcher entry. The dispatcher loop + * lives inside cfree_emu_step, so the lifter can also synthesize + * a return-of-next_pc instead of a real call here. v1 returns + * NULL — lifters that don't yet emit DISPATCH calls are fine. */ + + return NULL; +} + +/* Tracing. v1 emits to the env's diag sink at CFREE_DIAG_NOTE. The + * full implementation lands with the lifter so it can format guest + * PCs and decoded instruction text consistently. */ + +void emu_trace_pc (Compiler* c, u64 pc) { (void)c; (void)pc; } +void emu_trace_block(Compiler* c, u64 pc) { (void)c; (void)pc; } diff --git a/src/link/link.c b/src/link/link.c @@ -397,6 +397,34 @@ void link_image_free(LinkImage* img) link_image_release(img); } +/* ---- Incremental resolution (stubs) ---- + * Per-block JIT translation in src/emu/ wants to grow a single + * LinkImage as cold blocks land (doc/EMU.md §6). 
The single-shot + * link_resolve discipline (link.h header comment) is set up to + * support this — inputs are non-destructively consumed, ObjBuilder* + * mappings are stable, resolution is functional. The two entries + * below are the surface; the implementation lands alongside the + * emu lifter cut. */ + +LinkImage* link_resolve_at(Linker* l, uintptr_t base_va) +{ + (void)base_va; + if (!l) return NULL; + compiler_panic(l->c, no_loc(), + "link_resolve_at: incremental resolution not yet " + "implemented"); + return NULL; +} + +void link_resolve_extend(Linker* l, LinkImage* img) +{ + (void)img; + if (!l) return; + compiler_panic(l->c, no_loc(), + "link_resolve_extend: incremental resolution not " + "yet implemented"); +} + /* ---- public emit dispatcher ---- */ void link_emit_image_writer(LinkImage* img, Writer* w) diff --git a/src/link/link.h b/src/link/link.h @@ -148,6 +148,21 @@ void link_set_gc_sections(Linker*, int enable); * comment locks in the implementation discipline that keeps the existing * surface amenable, with no speculative API. */ LinkImage* link_resolve(Linker*); + +/* Incremental resolution (per doc/EMU.md §6). link_resolve_at reserves + * the image's layout starting at the caller-specified base VA — used + * by the emu so the JIT image's host addresses are stable for the + * session (chaining patches live host code with section addresses). + * link_resolve_extend appends new inputs to an existing image: places + * new sections at the next free offset within the reserved region, + * resolves new symbols against the existing image's globals plus the + * registered LinkExternResolver, and applies new relocations. It + * MUST NOT change host addresses of previously placed sections — + * chaining and the code cache depend on it. The image must have been + * produced by a prior link_resolve_at call on the same Linker. 
*/ +LinkImage* link_resolve_at (Linker*, uintptr_t base_va); +void link_resolve_extend(Linker*, LinkImage*); + void link_image_free(LinkImage*); const LinkSymbol* link_symbol(LinkImage*, LinkSymId); LinkSymId link_symbol_lookup(LinkImage*, Sym name);
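
Sketch: the guest_pc -> host entry cache in isolation. The code cache in src/emu/runtime.c is an open-addressed, linear-probe table keyed on guest PC, with 0 as the empty-slot marker, growth by doubling at 75% load, and no eviction. The standalone version below mirrors that shape so it can be compiled without the rest of the tree. Every Demo*/demo_* name is hypothetical, and it uses calloc/free where the diff uses the Compiler's Heap; treat it as an illustration under those assumptions, not as the kit implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for EmuCacheEntry / EmuCodeCache. */
typedef struct DemoEntry { uint64_t guest_pc; void* host_entry; } DemoEntry;
typedef struct DemoCache { DemoEntry* slots; uint32_t cap; uint32_t used; } DemoCache;

/* Same mixing steps and constants as mix_pc in the diff. */
static uint64_t demo_mix(uint64_t x)
{
    x ^= x >> 33; x *= 0xff51afd7ed558ccdull;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ull;
    x ^= x >> 33;
    return x;
}

/* Rehash every live entry into a fresh table. Returns 0 on success. */
static int demo_resize(DemoCache* c, uint32_t new_cap)
{
    DemoEntry* fresh;
    uint32_t i, mask;
    /* Masking with (cap - 1) only works for power-of-two capacities. */
    assert(new_cap != 0 && (new_cap & (new_cap - 1u)) == 0);
    fresh = calloc(new_cap, sizeof(*fresh));
    if (!fresh) return -1;
    mask = new_cap - 1u;
    for (i = 0; i < c->cap; ++i) {
        uint32_t j;
        if (c->slots[i].guest_pc == 0) continue;
        j = (uint32_t)demo_mix(c->slots[i].guest_pc) & mask;
        while (fresh[j].guest_pc != 0) j = (j + 1u) & mask;
        fresh[j] = c->slots[i];
    }
    free(c->slots);
    c->slots = fresh;
    c->cap = new_cap;
    return 0;
}

int demo_init(DemoCache* c)
{
    c->slots = NULL; c->cap = 0; c->used = 0;
    return demo_resize(c, 64u);
}

void demo_free(DemoCache* c)
{
    free(c->slots);
    c->slots = NULL; c->cap = 0; c->used = 0;
}

/* Insert or overwrite. Returns -1 for key 0, an uninitialized table,
 * or a failed growth attempt; 0 otherwise. */
int demo_insert(DemoCache* c, uint64_t pc, void* entry)
{
    uint32_t mask, j;
    if (pc == 0 || c->cap == 0) return -1;
    if (c->used * 4u >= c->cap * 3u && demo_resize(c, c->cap * 2u) != 0)
        return -1;
    mask = c->cap - 1u;
    j = (uint32_t)demo_mix(pc) & mask;
    while (c->slots[j].guest_pc != 0) {
        if (c->slots[j].guest_pc == pc) { c->slots[j].host_entry = entry; return 0; }
        j = (j + 1u) & mask;
    }
    c->slots[j].guest_pc = pc;
    c->slots[j].host_entry = entry;
    c->used++;
    return 0;
}

/* Probe from the hashed slot until the key or an empty slot is found. */
void* demo_lookup(const DemoCache* c, uint64_t pc)
{
    uint32_t mask, j;
    if (c->cap == 0 || pc == 0) return NULL;
    mask = c->cap - 1u;
    j = (uint32_t)demo_mix(pc) & mask;
    while (c->slots[j].guest_pc != 0) {
        if (c->slots[j].guest_pc == pc) return c->slots[j].host_entry;
        j = (j + 1u) & mask;
    }
    return NULL;
}
```

One deliberate difference from the diff: demo_insert reports a failed growth attempt instead of continuing to insert. With the load factor capped at 75% there is always an empty slot to terminate a probe; if growth were to fail silently and inserts continued, a completely full table would make the probe loops spin. As in the diff, guest PC 0 cannot be cached because it doubles as the empty marker.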
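
Sketch: the step/run dispatch shape with hand-written blocks. cfree_emu_step looks up the block for the current guest PC, calls it as a function that takes the CPU state and returns the next PC, stores that PC back, and then inspects the trap reason; cfree_emu_run wraps it in a loop until the guest exits. The version below reproduces only that control flow. The Demo*/demo_* names, the two fake "translated" blocks, and the switch-based lookup are all hypothetical stand-ins for the JIT pipeline, and the panic boundary (setjmp / compiler_panic) is replaced by plain error returns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { DEMO_TRAP_NONE = 0, DEMO_TRAP_EXIT, DEMO_TRAP_FAULT } DemoTrap;

/* Hypothetical CPU state; `counter` stands in for guest registers. */
typedef struct DemoCpu {
    uint64_t pc;
    DemoTrap trap;
    int      exit_code;
    uint64_t counter;
} DemoCpu;

/* Same shape as the lifted-block signature: u64 (CPUState*). */
typedef uint64_t (*DemoBlockFn)(DemoCpu*);

/* "Block" at 0x1000: bump the counter, loop until it reaches 3. */
static uint64_t block_1000(DemoCpu* s)
{
    s->counter++;
    return s->counter < 3 ? 0x1000u : 0x2000u;
}

/* "Block" at 0x2000: request exit with the counter as exit code. */
static uint64_t block_2000(DemoCpu* s)
{
    s->trap = DEMO_TRAP_EXIT;
    s->exit_code = (int)s->counter;
    return 0x2000u;
}

/* Stand-in for cache lookup plus cold-miss translation. */
static DemoBlockFn demo_lookup_block(uint64_t pc)
{
    switch (pc) {
    case 0x1000u: return block_1000;
    case 0x2000u: return block_2000;
    default:      return NULL;   /* "translation failed" */
    }
}

/* Run at most nblocks blocks. Returns 0 on progress or clean exit,
 * 1 on a failed lookup or a guest fault. */
int demo_step(DemoCpu* s, uint32_t nblocks, int* done)
{
    uint32_t i;
    assert(s && done);
    for (i = 0; i < nblocks && !*done; ++i) {
        DemoBlockFn fn = demo_lookup_block(s->pc);
        if (!fn) return 1;
        s->pc = fn(s);
        if (s->trap == DEMO_TRAP_EXIT) *done = 1;
        else if (s->trap == DEMO_TRAP_FAULT) return 1;
    }
    return 0;
}

/* One-shot wrapper: step in small batches until the guest exits. */
int demo_run(uint64_t entry_pc, int* out_exit_code)
{
    DemoCpu cpu = { 0 };
    int done = 0;
    cpu.pc = entry_pc;
    if (out_exit_code) *out_exit_code = 0;
    while (!done) {
        if (demo_step(&cpu, 2, &done) != 0) return 1;
    }
    if (out_exit_code) *out_exit_code = cpu.exit_code;
    return 0;
}
```

Starting at 0x1000, the first block runs three times before handing off to 0x2000, so the run crosses a batch boundary (batch size 2 here, 1024 in cfree_emu_run) and still resumes from the stored PC. Starting at an address with no block makes demo_run return 1, which corresponds to the "failed to translate block" panic path in the diff.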