commit 6e1392b32a02086c654ad381990a08cd3205f96f
parent dd8f6b240bea8e165edc47804ba5177f0938abbd
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sat, 9 May 2026 10:53:50 -0700
emu: implement cfree_emu_run/new/step/lookup/free with stubbed deps
Lifts the emu surface from panic-stubs into a real translate/dispatch
loop in src/emu/. cfree_emu_lookup runs the full cold-miss pipeline
(decode -> lift -> CG -> MC -> link_resolve_extend -> commit RX -> cache),
cfree_emu_step runs translated blocks under a panic boundary, and
cfree_emu_run is the one-shot wrapper. Per-ISA decode/lift, CPUState
type synthesis, and ELF loading are staged as their own files behind
emu.h and stubbed for now; the runtime's code cache and reserved-VA
region are real. doc/EMU.md (already drafted in-tree) documents the
design this implementation follows.
Adds link_resolve_at / link_resolve_extend to link/ as the single
linker extension the per-block JIT needs (doc/EMU.md §6); both panic
until the incremental layout pass lands.
Build fixes carried alongside:
- driver/objdump.c: handle CFREE_SK_NOTYPE in sym_kind_char (-Werror).
- src/emu/cpu.c: emu_cpu_type / emu_block_fn_type return NULL until
the type subsystem ships type_void / type_func.
Diffstat:
12 files changed, 1462 insertions(+), 15 deletions(-)
diff --git a/doc/EMU.md b/doc/EMU.md
@@ -0,0 +1,355 @@
+# cfree emu design
+
+Architecture of `cfree emu`, the guest-ISA emulator. Companion to
+`DESIGN.md`. Scope: how the emulator slots into the existing pipeline and
+what its contracts are. Not a tutorial; not implementation notes.
+
+## 1. Goals
+
+- `emu` multi-call subcommand: load and execute a guest ELF on the host,
+ user-mode only (Linux/macOS userland; no full-system emulation).
+- Targets v1: aarch64, riscv64. 32-bit variants follow each 64-bit lift.
+ x86_64 deferred (flag-heavy ISA — exercise lazy flags on simpler arches
+ first).
+- Per-basic-block JIT translation: decode guest bytes → lift to `CG` → opt
+ (optional) → MCEmitter → ObjBuilder → incremental link into a single
+ growing `LinkImage` → execute.
+- Block chaining for hot paths; cold blocks may run direct-from-CG (no opt
+ wrapper) for translation throughput.
+- Source-level stepping through `dbg` when the guest ELF carries DWARF.
+- Self-hosting: `src/emu/` is C11 freestanding like the rest of `src/`.
+
+The lifter is a sibling to `parse_c`: both are frontends that consume input
+bytes and drive `CG`. Everything below `CG` (`opt`, `arch`, `obj`) is
+unchanged. `link/` requires one extension — incremental resolve
+(`link_resolve_extend`, §6) — landed *before* emu work begins, so emu has a
+single lifecycle model end-to-end and never carries a "per-block fresh
+`LinkImage`" interim shape.
+
+## 2. Non-goals (v1)
+
+- Full-system emulation (privileged ISA, MMU, devices).
+- SIMD/vector ISA extensions (SSE/AVX/NEON/RVV) — blocked on `CGTarget`
+ lacking vector ops (§DESIGN 5.7). Programs using them either trap to a
+ scalarizer or fail to lift.
+- x86 in v1.
+- Self-modifying code (refuse to lift on observed write to a translated
+ page; full support is future work).
+- Per-instruction precise exceptions / signal redirection.
+- Foreign-OS syscalls — only the host OS's syscalls are forwarded.
+
+## 3. Layout
+
+```
+src/
+ emu/
+ emu.h driver-facing API
+ decode/ per-ISA structured decoder (shared tables with objdump)
+ aarch64.c
+ riscv64.c
+ lift/ per-ISA lifter; drives CG
+ aarch64.c
+ riscv64.c
+ cpu.h per-arch CPUState struct synthesis
+ runtime.c dispatcher, code cache, block chaining
+ syscall/ per-host-OS syscall forwarders (linux.c, darwin.c)
+test/
+ emu/ guest binary corpus + behavioral oracles
+```
+
+`src/emu/` is a sibling to `src/parse/`. The runtime helpers live in the
+cfree tool itself — the JIT's `LinkExternResolver` (§DESIGN 5.5.1) returns
+the host address of `emu_load64`, `emu_syscall`, etc. directly; no
+separate runtime object.
+
+## 4. Dataflow
+
+```
+guest.elf (bytes) ─► obj reader ─► LinkImage* (mapped into guest AS)
+ │
+ guest_pc ─► Decoder ─► EmuInst* ─► Lifter ─► CG ─┐
+ │
+ CGTarget ◄─────┘
+ │
+ (opt?)
+ │
+ MCEmitter ─► ObjBuilder
+ │
+ link_jit ─► host code
+ │
+ dispatcher(guest_pc) ──┘
+```
+
+1. **Load.** `obj/` readers parse the guest ELF. The runtime maps loadable
+ segments into a *guest address space* (an mmap'd region inside the
+ host process); guest virtual addresses are the addresses inside that
+ region.
+2. **Decode.** When the dispatcher hits an untranslated `guest_pc`, the
+ per-ISA decoder reads guest bytes and produces `EmuInst`s up to the
+ next basic-block terminator. Decode tables are shared with the
+ disassembler (objdump): same bit patterns, two output shapes.
+3. **Lift.** The per-ISA lifter walks the `EmuInst` stream and emits one
+ synthesized C function per guest basic block: signature
+ `next_pc_t block(CPUState*)`. Lifter calls `cg_*` ops only.
+4. **Codegen + JIT.** Standard cfree pipeline. At -O0 the emu drives a
+ target `CGTarget` directly (fast translation, slow execution); at -O2
+ it wraps with `opt_cgtarget` (slow translation, fast execution). Both
+ end at `link_jit` mapping executable pages.
+5. **Execute.** `runtime.c` calls the host code with the current
+ `CPUState*`. The block returns the next guest PC. The dispatcher looks
+ up the next block, translating on miss.
+
+## 5. Key interfaces
+
+### 5.1 `Decoder` (`src/emu/decode/decode.h`)
+
+Per-ISA structured decoder. Output is `EmuInst` — a tagged union of
+ISA-specific shapes with operand fields, *not* text.
+
+```c
+typedef struct EmuInst {
+ EmuOp op; /* per-ISA enum */
+ u64 guest_pc;
+ u32 guest_bytes; /* instruction width */
+ EmuOperand operands[EMU_MAX_OPERANDS];
+ u32 nop;
+ u32 flags; /* TERMINATOR | MEM | SETS_FLAGS | ... */
+} EmuInst;
+
+u32 emu_decode_block(EmuArch, const u8* bytes, u64 guest_pc,
+ EmuInst* out, u32 max);
+```
+
+The same decode tables back the disassembler (textual format) and the
+lifter (structured). One source of truth per ISA.
+
+### 5.2 `Lifter` (`src/emu/lift/lift.h`)
+
+Per-ISA lifter. Consumes `EmuInst*`, drives `CG*`, produces one CG function
+per guest basic block.
+
+```c
+void emu_lift_block(EmuArch, CG* cg,
+ const EmuInst* insts, u32 n,
+ EmuLiftCtx* ctx);
+```
+
+`EmuLiftCtx` carries: the CPUState `Type*`, the synthesized block function
+type (`next_pc_t (*)(CPUState*)`), the `ObjSymId` for the block, runtime
+helper symbols (memory load/store, syscall trampoline, dispatcher
+tail-call), and per-block lazy-flag state (§5.5).
+
+The lifter targets `CG` exclusively (`src/cg/cg.h`) — never `CGTarget`
+directly. It uses roughly this subset:
+
+- `cg_push_global(cpu_state_sym)`, `cg_push_int`, `cg_push_const`
+- `cg_load`, `cg_store`, plus a small `lift_field(ctx, off, T)` helper
+ layered on `cg_addr` + offset arithmetic
+- `cg_binop`, `cg_unop`, `cg_cmp`, `cg_convert`
+- `cg_atomic_*` for guest atomics
+- `cg_label_new` / `cg_label_place` / `cg_jump` / `cg_branch_true`
+- `cg_call` for runtime helpers (memory access, syscalls, dispatcher
+ tail-calls — `cg_call` materializes ABI parts from `fn_type`, which is
+ the main reason CG and not CGTarget)
+- `cg_set_loc` carrying the *guest* PC encoded as a `SrcLoc` against a
+ synthetic `SourceManager` file id (§DESIGN 5.0)
+
+It does not use: aggregates, bitfields, variadics, setjmp, structured
+scopes, inline asm. Those C-shaped surfaces remain available to the C
+front-end at zero cost.
+
+### 5.3 `CPUState` (`src/emu/cpu.h`)
+
+Per-arch C struct synthesized once per emu invocation as an interned
+`Type*`. Fields:
+
+- General-purpose register file (`u64 x[N]`).
+- Lazy-flag fields (§5.5): last op kind, last operands, materialized
+ flags cache.
+- Pointer to the guest memory base (host pointer).
+- Pointer to the dispatcher entry / code-cache lookup function.
+- Trap reason / exit code slots written before returning to the
+ dispatcher.
+
+Lifters reference fields by stable offset constants generated alongside
+the `Type*`. The runtime allocates one `CPUState` per guest thread and
+exposes its address as an `ObjSymId` resolved externally by the JIT
+linker.
+
+### 5.4 `Runtime` (`src/emu/runtime.h`)
+
+In-process runtime, linked into the cfree binary, callable from JITted
+guest blocks via the JIT's external resolver. Responsibilities:
+
+- **Dispatcher.** `emu_run(CPUState*)`: loop { lookup guest_pc →
+ translate-if-cold → call block }, exits on trap.
+- **Code cache.** `guest_pc → host entry` map; translation happens on
+ miss. Eviction deferred (cache grows unbounded in v1).
+- **Reserved code region.** One up-front `PROT_NONE` mmap (~128 MB)
+ whose base address feeds `link_resolve_at` (§6). Per-block
+ `link_resolve_extend` bump-allocates within it; the runtime commits
+ pages and `mprotect`s RX as new blocks land.
+- **Block chaining.** When a block's terminator targets an
+ already-translated block, patch the tail to jump directly, bypassing
+ the dispatcher. Patching is a runtime concern — CG/opt see only the
+ pre-patch tail-call.
+- **Memory helpers.** `emu_load{8,16,32,64}` / `emu_store_*`: bounds-check
+ the guest address against the mapped guest AS, trap on miss. Lifter
+ emits a `cg_call` to these for every guest memory op in v1; an inline
+ fastpath is a follow-up (§9).
+- **Syscall trampoline.** `emu_syscall(CPUState*)` reads the guest
+ syscall number/args from CPUState, forwards via the per-OS table in
+ `src/emu/syscall/`, writes the return into the guest return register.
+
+### 5.5 Flag policy — lazy flags
+
+Many ISAs (aarch64 NZCV, x86 EFLAGS) compute condition flags as a side
+effect of arithmetic; riscv64 has no flags register and needs none of
+this. CG/CGTarget have no flag primitives. The lifter implements lazy
+flags entirely above CG:
+
+- Each flag-setting guest op writes (op_kind, lhs, rhs) into CPUState
+ fields. Flags are *not* recomputed eagerly.
+- Each flag-reading guest op (conditional branch, `cset`, …) recomputes
+ the specific flag bit it needs from the recorded inputs.
+- opt's GVN/DCE eliminates redundant flag computations within a block;
+ cross-block redundancy is recovered after inlining via block chaining.
+
+No CGTarget extension. Adding an ISA's flag set is a per-arch lifter
+table, not a pipeline change.
+
+### 5.6 Memory model — guest address space
+
+Guest loads/stores all go through a base pointer of unknown C provenance,
+so CG's default `MemAccess.alias` derivation (§DESIGN 5.6) collapses to
+`ALIAS_UNKNOWN`. To recover useful aliasing, the lifter sets
+`MemAccess.addr_space = EMU_GUEST_AS` on every guest memory op. opt's
+alias rules treat guest-AS accesses as:
+
+- May alias each other.
+- Do not alias C memory (CPUState fields, runtime state).
+
+This is a one-field convention on the existing `MemAccess` shape; no API
+change. CPUState field accesses use the default `addr_space = 0` and
+remain promotable / GVN'able as ordinary C memory.
+
+## 6. Lifecycle — per-block JIT on a single growing LinkImage
+
+The compiler pipeline is TU-shaped (§DESIGN 5.5.1, 9.1). The emu wants
+per-block translate-on-demand. The model is a *single* `LinkImage` that
+grows as cold blocks land — never a fresh image per block — so chaining,
+the code cache, and the host VA region all reference one stable artifact
+across the emu session.
+
+This requires `link/` to expose incremental resolution before any emu code
+lands:
+
+```c
+LinkImage* link_resolve_at(Linker*, uintptr_t base_va); /* first call */
+void link_resolve_extend(Linker*, LinkImage*); /* later calls */
+```
+
+`link_resolve_at` reserves layout starting at a caller-specified base VA
+(the runtime hands out from a pre-reserved `PROT_NONE` region, typically
+~128 MB). `link_resolve_extend` appends new inputs: it places new sections
+at the next free offset within the reserved region, resolves new symbols
+against the existing image's symbol table plus the `LinkExternResolver`,
+and applies new relocations into the live image. **It must not change
+host addresses of previously placed sections** — chaining has already
+patched live host code with those addresses. §DESIGN 5.5.1's existing
+discipline (stable `LinkInputId`s, separable `LinkRelocApply`,
+non-destructive resolution) is exactly what makes this safe.
+
+The per-block flow is then:
+
+1. Decode guest bytes for the block.
+2. Lift into a fresh `ObjBuilder` containing one function.
+3. `link_add_obj` against the session's `Linker`.
+4. `link_resolve_extend` to place the new section in the reserved VA
+ region, resolve symbols (helpers via the resolver; cross-block
+ references always resolve to the dispatcher — see below), apply
+ relocations.
+5. Commit the newly used pages and `mprotect` to RX.
+6. Insert `guest_pc → host_entry` into the code cache.
+
+Cross-block calls during step 4 always resolve to the dispatcher, even if
+the target block is already translated. The linker never learns about
+guest PCs or sibling blocks. Cross-block direct jumps are installed
+*later* by chaining, which is a runtime mprotect-and-patch operation
+outside the linker entirely. Keeping these two mechanisms separate avoids
+the duplication that "let the linker resolve sibling blocks" would
+introduce.
+
+What this model deliberately does *not* support in v1:
+
+- **Symbol removal / re-resolution.** Code-cache eviction would need it;
+ v1 lets the cache grow unbounded (§5.4) so the question is moot. When
+ eviction lands it pairs with a runtime-side "invalidate chain patches
+ pointing into the evicted block" side-table, not a linker mutation.
+- **Address relocation of previously placed sections.** The reserved-VA
+ bump-allocator never compacts; the chaining invariant depends on this.
+
+## 7. Driver API
+
+Mirrors §DESIGN 13.
+
+```c
+typedef struct CfreeEmuOptions {
+ EmuArch guest_arch;
+ const u8* guest_elf_bytes;
+ size_t guest_elf_len;
+ u32 optimize; /* 0 = direct CGTarget, 2 = opt_cgtarget */
+ EmuTraceFlags trace; /* PC trace, decoded-inst trace */
+ /* argv, envp, fd map come through CfreeEnv */
+} CfreeEmuOptions;
+
+int cfree_emu_run (CfreeCompiler*, const CfreeEmuOptions*, int* out_exit_code);
+
+/* Lower-level surface for dbg integration. */
+typedef struct CfreeEmu CfreeEmu;
+CfreeEmu* cfree_emu_new (CfreeCompiler*, const CfreeEmuOptions*);
+int cfree_emu_step (CfreeEmu*, u32 nblocks);
+void* cfree_emu_lookup(CfreeEmu*, u64 guest_pc); /* translate-if-cold */
+void cfree_emu_free (CfreeEmu*);
+```
+
+Path-shaped helpers (`cfree emu prog.elf`) live in the driver layer and
+read bytes via `c->env->file_io->read_all`. The freestanding core never
+takes paths.
+
+## 8. Debug info
+
+When the guest ELF carries DWARF, source-level stepping comes for free:
+
+- A guest-DWARF reader (extension of `src/debug/`) maps guest PC →
+ guest source line.
+- `dbg` interrogates `cfree_emu_lookup` plus the guest DWARF for
+ per-line breakpoints and stepping.
+- Lifter feeds `cg_set_loc` a `SrcLoc` whose `file_id` is a synthetic
+ `SourceManager` file representing the guest binary, with line numbers
+ encoding guest PC. opt's `Inst.loc` and host-side DWARF then point
+ back at guest PCs, useful for `objdump` of JITted host code.
+
+No new debug-info pipeline.
+
+## 9. Open questions
+
+- **CPUState promotion.** CPUState is passed by pointer, so its fields
+ are address-taken from CG's view and `build_ssa` won't promote them
+ (§DESIGN 12). Hot guest registers will spill/reload across every
+ arithmetic op, which kills perf. Likely fix: a per-block convention
+ that imports CPUState fields into virtuals at entry and exports at
+ exit, treating mid-block accesses as the SSA values. This is the
+ single largest perf question and worth prototyping before the lifter
+ interface freezes.
+- **Inline memory fastpath.** A `cg_call` per guest memory op is a real
+ function call. Inlining a bounds check + direct host load is a known
+ emu speedup; needs either a cross-runtime/JIT inliner or a CG-level
+ helper. Defer until measured.
+- **Vector ISA support.** Blocked by CGTarget lacking vector ops
+ (§DESIGN 5.7). Lands the same day vectors land for the C front-end.
+- **x86 flag policy.** EFLAGS has more bits and more cross-instruction
+ dependencies than aarch64 NZCV. Lazy flags work but the recorded
+ payload is larger; verify on aarch64 before committing to x86.
+- **SMC detection.** v1 refuses to lift on observed write to a
+ translated page (write-protect translated guest pages, trap writes).
+ Full SMC support — invalidate-and-retranslate — is future work.
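A minimal sketch of the lazy-flag policy from §5.5 of doc/EMU.md above, written as plain C rather than as `cg_*` calls. It assumes 64-bit add/subtract with aarch64 NZCV semantics; the `LazyFlags` layout and every `lazy_*` name are hypothetical, since the patch does not define the CPUState flag fields yet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the design only says (op_kind, lhs, rhs) are
 * recorded in CPUState; these names are assumptions for illustration. */
typedef enum { LAZY_ADD, LAZY_SUB } LazyOp;

typedef struct {
    LazyOp   op;
    uint64_t lhs;
    uint64_t rhs;
} LazyFlags;

/* Flag-setting guest op: record the inputs only, compute nothing. */
void lazy_record(LazyFlags* f, LazyOp op, uint64_t lhs, uint64_t rhs)
{
    assert(f);
    f->op = op;
    f->lhs = lhs;
    f->rhs = rhs;
}

uint64_t lazy_result(const LazyFlags* f)
{
    return f->op == LAZY_ADD ? f->lhs + f->rhs : f->lhs - f->rhs;
}

/* Flag-reading guest ops recompute just the bit they need. */
int lazy_n(const LazyFlags* f) { return (int)(lazy_result(f) >> 63); }
int lazy_z(const LazyFlags* f) { return lazy_result(f) == 0; }

int lazy_c(const LazyFlags* f)
{
    if (f->op == LAZY_ADD) return lazy_result(f) < f->lhs; /* carry out */
    return f->lhs >= f->rhs;            /* aarch64 SUBS: C = no borrow */
}

int lazy_v(const LazyFlags* f)
{
    uint64_t r = lazy_result(f);
    uint64_t differ = f->lhs ^ f->rhs;
    if (f->op == LAZY_ADD) differ = ~differ; /* ADD overflows on equal signs */
    return (int)((differ & (f->lhs ^ r)) >> 63);
}
```

A conditional branch such as `b.eq` would call only `lazy_z`, so the other three bits are never computed unless a later instruction asks for them.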
diff --git a/driver/objdump.c b/driver/objdump.c
@@ -72,6 +72,7 @@ static char sym_kind_char(CfreeSymKind k)
case CFREE_SK_ABS: return 'A';
case CFREE_SK_COMMON: return 'C';
case CFREE_SK_UNDEF: return 'U';
+ case CFREE_SK_NOTYPE: return 'n';
}
return ' ';
}
diff --git a/src/api/stubs.c b/src/api/stubs.c
@@ -216,18 +216,7 @@ int cfree_dwarf_param_iter_next(CfreeDwarfParamIter* it, CfreeD
{ (void)it; (void)o; return 0; }
void cfree_dwarf_param_iter_free(CfreeDwarfParamIter* it) { (void)it; }
-/* ============================================================
- * Emulator (cfree emu)
- * ============================================================ */
-struct CfreeEmu { int _; };
-
-int cfree_emu_run(CfreeCompiler* c, const CfreeEmuOptions* opts, int* out_exit_code)
-{
- (void)c; (void)opts;
- if (out_exit_code) *out_exit_code = 0;
- return 1;
-}
-CfreeEmu* cfree_emu_new (CfreeCompiler* c, const CfreeEmuOptions* o) { (void)c; (void)o; return 0; }
-int cfree_emu_step (CfreeEmu* e, uint32_t n) { (void)e; (void)n; return 1; }
-void* cfree_emu_lookup(CfreeEmu* e, uint64_t pc) { (void)e; (void)pc; return 0; }
-void cfree_emu_free (CfreeEmu* e) { (void)e; }
+/* Emulator (cfree emu) lives under src/emu/ — cfree_emu_run / new /
+ * step / lookup / free are real implementations there, with the
+ * per-ISA decode/lift, CPUState, and runtime helper layers stubbed
+ * one level down. */
diff --git a/src/emu/cpu.c b/src/emu/cpu.c
@@ -0,0 +1,89 @@
+/* CPUState: per-thread guest register/lazy-flag/memory-base record,
+ * synthesized once per emu invocation as an interned C `Type*`. The
+ * lifter references fields through a stable offset table generated
+ * alongside the type; the runtime owns the storage and exposes its
+ * address to the JIT linker via the extern resolver (EMU_SYM_CPU_STATE).
+ *
+ * Per-arch fields land with the per-ISA lifter. v1 stub keeps the
+ * lifecycle real (alloc, free, PC/SP getters, trap reason) so emu.c
+ * does not need to know anything about per-arch register files. */
+
+#include "emu/emu.h"
+
+#include "core/heap.h"
+
+#include <cfree.h>
+
+#include <string.h>
+
+struct EmuCPUState {
+ Compiler* c;
+ CfreeEmuArch arch;
+ u64 pc;
+ u64 sp;
+ EmuTrapReason trap;
+ int exit_code;
+ /* Per-arch register / lazy-flag fields land alongside the synthesized
+ * Type*; the runtime helpers (emu_mem_*, emu_syscall) reach them
+ * through the canonical offsets. */
+};
+
+EmuCPUState* emu_cpu_new(Compiler* c, CfreeEmuArch arch,
+ u64 initial_pc, u64 initial_sp)
+{
+ Heap* h;
+ EmuCPUState* s;
+ if (!c) return NULL;
+ h = (Heap*)c->env->heap;
+ s = (EmuCPUState*)h->alloc(h, sizeof(*s), _Alignof(EmuCPUState));
+ if (!s) return NULL;
+ memset(s, 0, sizeof(*s));
+ s->c = c;
+ s->arch = arch;
+ s->pc = initial_pc;
+ s->sp = initial_sp;
+ s->trap = EMU_TRAP_NONE;
+ return s;
+}
+
+void emu_cpu_free(EmuCPUState* s)
+{
+ Heap* h;
+ if (!s) return;
+ h = (Heap*)s->c->env->heap;
+ h->free(h, s, sizeof(*s));
+}
+
+u64 emu_cpu_pc(const EmuCPUState* s) { return s ? s->pc : 0; }
+
+void emu_cpu_set_pc(EmuCPUState* s, u64 pc)
+{
+ if (s) s->pc = pc;
+}
+
+EmuTrapReason emu_cpu_trap_reason(const EmuCPUState* s)
+{
+ return s ? s->trap : EMU_TRAP_NONE;
+}
+
+int emu_cpu_exit_code(const EmuCPUState* s)
+{
+ return s ? s->exit_code : 0;
+}
+
+const Type* emu_cpu_type(Compiler* c, CfreeEmuArch arch)
+{
+ /* Per-arch struct layout lands with the per-ISA lifter. The lifter
+ * is a stub for now; translate_block panics before any consumer
+ * dereferences this, so a NULL placeholder is safe. */
+ (void)c; (void)arch;
+ return NULL;
+}
+
+const Type* emu_block_fn_type(Compiler* c, CfreeEmuArch arch)
+{
+ /* Block ABI: u64 entry(EmuCPUState*). Materialized once the type
+ * subsystem and per-arch CPUState type land together. */
+ (void)c; (void)arch;
+ return NULL;
+}
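The header comment of src/emu/cpu.c above says the lifter reaches CPUState fields through a stable offset table generated alongside the type. One way that could look is sketched below. The `DemoCpuState` layout and all `demo_*` / `DEMO_OFF_*` names are hypothetical; the real per-arch fields are not part of this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical aarch64-flavoured layout, for illustration only. */
typedef struct {
    uint64_t x[31];
    uint64_t sp;
    uint64_t pc;
    uint32_t trap;
    int32_t  exit_code;
} DemoCpuState;

/* Offsets defined next to the struct so lifter and runtime agree. */
enum {
    DEMO_OFF_X0   = offsetof(DemoCpuState, x),
    DEMO_OFF_SP   = offsetof(DemoCpuState, sp),
    DEMO_OFF_PC   = offsetof(DemoCpuState, pc),
    DEMO_OFF_TRAP = offsetof(DemoCpuState, trap)
};

size_t demo_off_x(unsigned n)
{
    assert(n < 31);
    return (size_t)DEMO_OFF_X0 + (size_t)n * sizeof(uint64_t);
}

/* What a lifted register access boils down to: base pointer plus a
 * constant offset. memcpy keeps the access alias-safe in plain C. */
uint64_t demo_load_u64(const void* state, size_t off)
{
    uint64_t v;
    memcpy(&v, (const unsigned char*)state + off, sizeof v);
    return v;
}

void demo_store_u64(void* state, size_t off, uint64_t v)
{
    memcpy((unsigned char*)state + off, &v, sizeof v);
}
```

Deriving the constants with `offsetof` means a field reorder changes the table and the struct together, which is the property the "stable offset" wording seems to be after.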
diff --git a/src/emu/decode.c b/src/emu/decode.c
@@ -0,0 +1,25 @@
+/* Per-ISA structured decoder. The lifter (src/emu/lift.c) walks the
+ * EmuInst stream produced here; the same decode tables back the
+ * disassembler (textual format) so there's one source of truth per
+ * ISA. v1 targets aarch64 and riscv64; backends land separately. */
+
+#include "emu/emu.h"
+
+#include "core/core.h"
+
+#include <cfree.h>
+
+u32 emu_decode_block(CfreeEmuArch arch, const u8* bytes, u64 guest_pc,
+ EmuInst* out, u32 max)
+{
+ /* Per-ISA decode tables not yet landed. Returning 0 routes the
+ * caller through translate_block's failure path, which surfaces
+ * a "failed to translate block" panic with the offending PC. */
+ (void)arch; (void)bytes; (void)guest_pc; (void)out; (void)max;
+ return 0;
+}
+
+void emu_trace_insn(Compiler* c, u64 guest_pc, const EmuInst* insn)
+{
+ (void)c; (void)guest_pc; (void)insn;
+}
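The decoder stub above is specified to produce instructions "up to the next basic-block terminator". As a rough illustration of that scan for riscv64, the sketch below counts 32-bit instructions until it sees a branch, jump, or SYSTEM opcode. It is not the planned table-driven decoder: `demo_rv_block_len` and the `RV_OP_*` names are hypothetical, compressed encodings are not handled, and treating every SYSTEM instruction as a terminator is a conservative assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RISC-V base opcodes (low 7 bits) that end a basic block. */
enum {
    RV_OP_BRANCH = 0x63,
    RV_OP_JALR   = 0x67,
    RV_OP_JAL    = 0x6f,
    RV_OP_SYSTEM = 0x73   /* ecall / ebreak (and CSR ops, conservatively) */
};

static int rv_is_terminator(uint32_t insn)
{
    switch (insn & 0x7fu) {
    case RV_OP_BRANCH: case RV_OP_JALR: case RV_OP_JAL: case RV_OP_SYSTEM:
        return 1;
    default:
        return 0;
    }
}

/* Counts instructions from `bytes` up to and including the first
 * terminator. Stops early at `max`, at the end of the buffer, or at a
 * compressed (16-bit) encoding, which this sketch does not decode. */
size_t demo_rv_block_len(const uint8_t* bytes, size_t len, size_t max)
{
    size_t n = 0, off = 0;
    assert(bytes || len == 0);
    while (n < max && len - off >= 4) {
        /* RISC-V instruction fetch is little-endian. */
        uint32_t insn = (uint32_t)bytes[off]
                      | (uint32_t)bytes[off + 1] << 8
                      | (uint32_t)bytes[off + 2] << 16
                      | (uint32_t)bytes[off + 3] << 24;
        if ((insn & 0x3u) != 0x3u) break;   /* compressed: not handled */
        ++n;
        off += 4;
        if (rv_is_terminator(insn)) break;
    }
    return n;
}
```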
diff --git a/src/emu/elf_load.c b/src/emu/elf_load.c
@@ -0,0 +1,47 @@
+/* Guest ELF loader: parses the ELF via the existing obj reader
+ * (read_elf in src/obj/elf_read.c), maps a guest address space,
+ * places loadable sections, and pushes argv/envp/auxv onto the
+ * guest stack at initial_sp.
+ *
+ * The reader gives us sections + symbols; the loader walks the
+ * SF_ALLOC sections, mmaps a contiguous host range covering the
+ * guest VA span, and copies the section bytes in. The entry PC
+ * is the address in the ELF header's e_entry field (typically
+ * that of `_start`). v1 executes statically-linked guest ELFs
+ * — dynamic-loader work is deferred (see doc/EMU.md §2). */
+
+#include "emu/emu.h"
+
+#include "core/heap.h"
+#include "obj/obj.h"
+
+#include <cfree.h>
+
+#include <string.h>
+
+int emu_load_elf(Compiler* c, CfreeEmuArch arch,
+ const u8* bytes, size_t len,
+ const char* const* argv, const char* const* envp,
+ EmuLoadedImage* out)
+{
+ /* Per the design: parse via read_elf (an ELF -> ObjBuilder
+ * reader that already exists), walk allocatable sections to
+ * compute the guest VA span, mmap the guest AS, copy section
+ * bytes into the AS, lay out argv/envp/auxv at the top of the
+ * stack, and emit entry_pc / initial_sp.
+ *
+ * Stub returns nonzero so cfree_emu_new short-circuits before
+ * any consumer touches an uninitialized EmuLoadedImage. */
+ (void)c; (void)arch; (void)bytes; (void)len;
+ (void)argv; (void)envp;
+ if (out) memset(out, 0, sizeof(*out));
+ return 1;
+}
+
+void emu_unload_image(Compiler* c, EmuLoadedImage* img)
+{
+ (void)c;
+ if (!img) return;
+ /* munmap the guest AS region once the loader is real. */
+ memset(img, 0, sizeof(*img));
+}
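The loader comment above describes walking the allocatable sections to compute the guest VA span before mapping one contiguous host range. A minimal sketch of that span computation follows, assuming page-granular mapping. `DemoSeg`, `DemoSpan`, and `demo_va_span` are hypothetical names; the real loader would take its inputs from the obj reader instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t vaddr; uint64_t mem_size; } DemoSeg;
typedef struct { uint64_t lo; uint64_t hi; } DemoSpan;   /* [lo, hi) */

/* Returns 0 on success, 1 if there is nothing to map or the address
 * arithmetic would overflow. `page` must be a power of two. */
int demo_va_span(const DemoSeg* segs, size_t n, uint64_t page, DemoSpan* out)
{
    uint64_t lo = UINT64_MAX, hi = 0, mask;
    size_t i;
    assert(out && page && (page & (page - 1)) == 0);
    mask = page - 1;
    for (i = 0; i < n; ++i) {
        uint64_t start, end;
        if (segs[i].mem_size == 0) continue;          /* occupies no memory */
        if (segs[i].vaddr > UINT64_MAX - segs[i].mem_size) return 1;
        end = segs[i].vaddr + segs[i].mem_size;
        if (end > UINT64_MAX - mask) return 1;
        end = (end + mask) & ~mask;                   /* round up to a page */
        start = segs[i].vaddr & ~mask;                /* round down to a page */
        if (start < lo) lo = start;
        if (end > hi) hi = end;
    }
    if (hi == 0) return 1;
    out->lo = lo;
    out->hi = hi;
    return 0;
}
```

The resulting `hi - lo` would be the size of the single host mapping, and `lo` the guest address that corresponds to its first byte.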
diff --git a/src/emu/emu.c b/src/emu/emu.c
@@ -0,0 +1,371 @@
+/* libcfree's guest-ISA emulator: load a guest ELF, translate one
+ * basic block at a time into host code via the existing CG/MC/link
+ * pipeline, dispatch through a code cache. See doc/EMU.md for design
+ * and §6 for the incremental-link discipline.
+ *
+ * This file owns CfreeEmu lifecycle and the translate/dispatch loop.
+ * Per-ISA decoders/lifters, CPUState synthesis, the code cache and
+ * reserved-VA region, and the runtime helper trampolines each live
+ * behind APIs declared in src/emu/emu.h. */
+
+#include "emu/emu.h"
+
+#include "arch/arch.h"
+#include "cg/cg.h"
+#include "core/heap.h"
+#include "core/pool.h"
+#include "link/link.h"
+#include "obj/obj.h"
+#include "opt/opt.h"
+
+#include <cfree.h>
+
+#include <setjmp.h>
+#include <string.h>
+
+/* ---- Lifecycle ---- */
+
+struct CfreeEmu {
+ Compiler* c;
+ CfreeEmuArch guest_arch;
+ int opt_level;
+ CfreeEmuTraceFlags trace;
+
+ EmuLoadedImage guest;
+ EmuCPUState* cpu;
+
+ Linker* linker;
+ LinkImage* image;
+ EmuCodeRegion* code_region;
+ EmuCodeCache* cache;
+
+ int done;
+ int exit_code;
+};
+
+static SrcLoc no_loc(void)
+{
+ SrcLoc l;
+ l.file_id = 0;
+ l.line = 0;
+ l.col = 0;
+ return l;
+}
+
+static int arch_supported(CfreeEmuArch a)
+{
+ return a == CFREE_EMU_ARCH_AARCH64 || a == CFREE_EMU_ARCH_RISCV64;
+}
+
+/* The block function call ABI: u64 entry(EmuCPUState*). Cast through
+ * a typedef so the call site reads cleanly in the dispatcher. */
+typedef u64 (*EmuBlockFn)(EmuCPUState*);
+
+CfreeEmu* cfree_emu_new(CfreeCompiler* c, const CfreeEmuOptions* opts)
+{
+ PanicSave saved;
+ Heap* heap;
+ CfreeEmu* e;
+
+ if (!c || !opts || !opts->guest_elf_bytes || opts->guest_elf_len == 0)
+ return NULL;
+ if (!arch_supported(opts->guest_arch)) return NULL;
+
+ compiler_panic_save(c, &saved);
+ if (setjmp(c->panic)) {
+ compiler_run_cleanups(c);
+ compiler_panic_restore(c, &saved);
+ return NULL;
+ }
+
+ heap = (Heap*)c->env->heap;
+ e = (CfreeEmu*)heap->alloc(heap, sizeof(*e), _Alignof(CfreeEmu));
+ if (!e) compiler_panic(c, no_loc(), "emu: out of memory");
+ memset(e, 0, sizeof(*e));
+ e->c = c;
+ e->guest_arch = opts->guest_arch;
+ e->opt_level = opts->optimize;
+ e->trace = opts->trace;
+
+ /* 1. Load the guest ELF: mmap a guest AS and place PT_LOAD segments,
+ * push argv/envp/auxv onto the guest stack. */
+ if (emu_load_elf(c, opts->guest_arch,
+ opts->guest_elf_bytes, opts->guest_elf_len,
+ opts->argv, opts->envp, &e->guest) != 0) {
+ compiler_panic(c, no_loc(), "emu: failed to load guest ELF");
+ }
+
+ /* 2. Allocate per-thread CPU state and seed PC/SP. */
+ e->cpu = emu_cpu_new(c, opts->guest_arch,
+ e->guest.entry_pc, e->guest.initial_sp);
+ if (!e->cpu) compiler_panic(c, no_loc(), "emu: out of memory");
+
+ /* 3. Reserve a fixed-VA code region for translated host blocks. */
+ e->code_region = emu_code_region_new(c, EMU_CODE_REGION_SIZE);
+
+ /* 4. Stand up the session linker. The extern resolver maps each
+ * EMU_SYM_* helper name to the host address of its trampoline /
+ * the running CfreeEmu's CPU state. */
+ e->linker = link_new(c);
+ if (!e->linker) compiler_panic(c, no_loc(), "emu: link_new failed");
+ link_set_extern_resolver(e->linker, emu_runtime_extern_resolver, e);
+
+ /* 5. Seed the initial empty image at the code region's base VA.
+ * Subsequent cold blocks land via link_resolve_extend, which
+ * must keep already-placed sections at stable host addresses
+ * (block chaining patches them). */
+ e->image = link_resolve_at(e->linker,
+ emu_code_region_base(e->code_region));
+ if (!e->image) compiler_panic(c, no_loc(),
+ "emu: link_resolve_at failed");
+
+ /* 6. Code cache: guest_pc -> host entry. Grows unbounded in v1. */
+ e->cache = emu_cache_new(c);
+
+ compiler_panic_restore(c, &saved);
+ return e;
+}
+
+void cfree_emu_free(CfreeEmu* e)
+{
+ Heap* heap;
+ if (!e) return;
+ heap = (Heap*)e->c->env->heap;
+
+ if (e->cache) emu_cache_free(e->cache);
+ if (e->image) link_image_free(e->image);
+ if (e->linker) link_free(e->linker);
+ if (e->code_region) emu_code_region_free(e->code_region);
+ if (e->cpu) emu_cpu_free(e->cpu);
+ emu_unload_image(e->c, &e->guest);
+
+ heap->free(heap, e, sizeof(*e));
+}
+
+/* ---- Translation (cold-miss path) ---- */
+
+static void* translate_block(CfreeEmu* e, u64 guest_pc)
+{
+ EmuInst insts[EMU_MAX_INSTS_PER_BLOCK];
+ u32 ninsts;
+ ObjBuilder* ob;
+ MCEmitter* mc;
+ CGTarget* target;
+ CG* cg;
+ Sym block_name;
+ ObjSymId block_sym;
+ EmuLiftCtx ctx;
+ LinkSymId sym_id;
+ const LinkSymbol* sym;
+ void* entry;
+
+ if (e->trace & CFREE_EMU_TRACE_BLOCK) emu_trace_block(e->c, guest_pc);
+
+ /* Bounds check: guest_pc must lie inside the mapped guest AS.
+ * The loader maps the guest AS so guest VAs are valid host
+ * pointers (1:1); reading bytes through the cast is safe. */
+ {
+ uintptr_t base = (uintptr_t)e->guest.guest_base;
+ if ((uintptr_t)guest_pc < base ||
+ (uintptr_t)guest_pc >= base + e->guest.guest_size) {
+ return NULL;
+ }
+ }
+
+ ninsts = emu_decode_block(e->guest_arch,
+ (const u8*)(uintptr_t)guest_pc, guest_pc,
+ insts, EMU_MAX_INSTS_PER_BLOCK);
+ if (ninsts == 0) return NULL;
+
+ if (e->trace & CFREE_EMU_TRACE_INSN) {
+ u32 j;
+ for (j = 0; j < ninsts; ++j)
+ emu_trace_insn(e->c, insts[j].guest_pc, &insts[j]);
+ }
+
+ /* Per-block ObjBuilder + MC + CGTarget pipeline. The block lands
+ * as a single host function. */
+ ob = obj_new(e->c);
+ mc = mc_new(e->c, ob);
+ target = cgtarget_new(e->c, ob, mc);
+ if (e->opt_level > 0) target = opt_cgtarget_new(e->c, target, e->opt_level);
+ cg = cg_new(e->c, target, /*Debug*/ NULL);
+
+ block_name = emu_block_sym_name(e->c, guest_pc);
+ /* Forward-declare the block's symbol so the lifter can refer to it
+ * via cg_func_begin. obj_symbol_define fills in (section, value, size)
+ * once the function is emitted. */
+ block_sym = obj_symbol(ob, block_name, SB_GLOBAL, SK_FUNC,
+ OBJ_SEC_NONE, 0, 0);
+
+ memset(&ctx, 0, sizeof(ctx));
+ ctx.arch = e->guest_arch;
+ ctx.cpu_state_type = emu_cpu_type (e->c, e->guest_arch);
+ ctx.block_fn_type = emu_block_fn_type(e->c, e->guest_arch);
+ ctx.block_sym = block_sym;
+ ctx.guest_pc = guest_pc;
+
+ emu_lift_block(e->guest_arch, cg, insts, ninsts, &ctx);
+
+ cgtarget_finalize(target);
+ obj_finalize(ob);
+
+ cg_free(cg);
+ cgtarget_free(target); /* opt_cgtarget cascades to wrapped target */
+ mc_free(mc);
+
+ /* Add the block's object to the session linker and extend the
+ * image. link_resolve_extend places the new section at the next
+ * free offset within the reserved VA region (must not change host
+ * addresses of already-placed sections — chaining depends on it),
+ * resolves the block's runtime-helper externs via the resolver,
+ * and applies new relocations into the live image. */
+ link_add_obj(e->linker, ob);
+ link_resolve_extend(e->linker, e->image);
+
+ /* Commit and mprotect RX up to the new high-water of the image. */
+ {
+ uintptr_t end = emu_code_region_base(e->code_region);
+ u32 i;
+ for (i = 0; i < link_segment_count(e->image); ++i) {
+ const LinkSegment* seg = link_segment_get(e->image, i + 1u);
+ uintptr_t segend = (uintptr_t)seg->vaddr + (uintptr_t)seg->mem_size;
+ if (segend > end) end = segend;
+ }
+ emu_code_region_commit_rx_to(e->code_region, end);
+ }
+
+ /* Resolve the freshly placed block to its host entry. */
+ sym_id = link_symbol_lookup(e->image, block_name);
+ if (sym_id == LINK_SYM_NONE) return NULL;
+ sym = link_symbol(e->image, sym_id);
+ if (!sym || !sym->defined) return NULL;
+ entry = (void*)(uintptr_t)sym->vaddr;
+
+ emu_cache_insert(e->cache, guest_pc, entry);
+ return entry;
+}
+
+void* cfree_emu_lookup(CfreeEmu* e, uint64_t guest_pc)
+{
+ PanicSave saved;
+ void* entry;
+
+ if (!e) return NULL;
+
+ /* Cache hit short-circuits the panic boundary. */
+ entry = emu_cache_lookup(e->cache, guest_pc);
+ if (entry) return entry;
+
+ compiler_panic_save(e->c, &saved);
+ if (setjmp(e->c->panic)) {
+ compiler_run_cleanups(e->c);
+ compiler_panic_restore(e->c, &saved);
+ return NULL;
+ }
+
+ entry = translate_block(e, guest_pc);
+
+ compiler_panic_restore(e->c, &saved);
+ return entry;
+}
+
+/* ---- Dispatcher ---- */
+
+int cfree_emu_step(CfreeEmu* e, uint32_t nblocks)
+{
+ PanicSave saved;
+ uint32_t i;
+
+ if (!e) return 1;
+ if (e->done) return 0;
+
+ compiler_panic_save(e->c, &saved);
+ if (setjmp(e->c->panic)) {
+ compiler_run_cleanups(e->c);
+ compiler_panic_restore(e->c, &saved);
+ return 1;
+ }
+
+ for (i = 0; i < nblocks && !e->done; ++i) {
+ u64 pc = emu_cpu_pc(e->cpu);
+ void* entry;
+ EmuBlockFn fn;
+ u64 next_pc;
+ EmuTrapReason trap;
+
+ if (e->trace & CFREE_EMU_TRACE_PC) emu_trace_pc(e->c, pc);
+
+ entry = cfree_emu_lookup(e, pc);
+ if (!entry) {
+ compiler_panic(e->c, no_loc(),
+ "emu: failed to translate block at guest_pc=0x%llx",
+ (unsigned long long)pc);
+ }
+
+ fn = (EmuBlockFn)entry;
+ next_pc = fn(e->cpu);
+ emu_cpu_set_pc(e->cpu, next_pc);
+
+ trap = emu_cpu_trap_reason(e->cpu);
+ if (trap == EMU_TRAP_EXIT) {
+ e->done = 1;
+ e->exit_code = emu_cpu_exit_code(e->cpu);
+ } else if (trap == EMU_TRAP_FAULT) {
+ compiler_panic(e->c, no_loc(),
+ "emu: guest faulted at pc=0x%llx",
+ (unsigned long long)next_pc);
+ }
+ }
+
+ compiler_panic_restore(e->c, &saved);
+ return 0;
+}
+
+int cfree_emu_run(CfreeCompiler* c, const CfreeEmuOptions* opts,
+ int* out_exit_code)
+{
+ CfreeEmu* e;
+ int rc = 0;
+
+ if (out_exit_code) *out_exit_code = 0;
+ if (!c || !opts) return 1;
+
+ e = cfree_emu_new(c, opts);
+ if (!e) return 1;
+
+ while (!e->done) {
+ if (cfree_emu_step(e, 1024) != 0) { rc = 1; break; }
+ }
+
+ if (rc == 0 && out_exit_code) *out_exit_code = e->exit_code;
+ cfree_emu_free(e);
+ return rc;
+}
+
+/* Runtime accessor for the resolver — exposes the running emu's
+ * CPUState pointer without baking the CfreeEmu layout into runtime.c.
+ * Used by emu_runtime_extern_resolver for EMU_SYM_CPU_STATE. */
+EmuCPUState* emu_internal_cpu(CfreeEmu* e)
+{
+ return e ? e->cpu : NULL;
+}
+
+/* ---- Block symbol naming ----
+ * "emu_block_<16-hex-pc>" — fixed-width hex so the linker's hash
+ * lookup never collides between two blocks at distinct guest PCs.
+ * Interned in the compiler's global pool; the Sym is stable for the
+ * Compiler's lifetime, which is what the linker assumes. */
+Sym emu_block_sym_name(Compiler* c, u64 guest_pc)
+{
+ char buf[32];
+ static const char hex[] = "0123456789abcdef";
+ int i;
+ /* "emu_block_" + 16 hex digits + NUL = 27 chars, fits in 32. */
+ memcpy(buf, "emu_block_", 10);
+ for (i = 0; i < 16; ++i) {
+ buf[10 + 15 - i] = hex[guest_pc & 0xfu];
+ guest_pc >>= 4;
+ }
+ buf[26] = '\0';
+ return pool_intern_cstr(c->global, buf);
+}
diff --git a/src/emu/emu.h b/src/emu/emu.h
@@ -0,0 +1,191 @@
+#ifndef CFREE_EMU_H
+#define CFREE_EMU_H
+
+/* Internal API for libcfree's guest-ISA emulator. Public surface is
+ * cfree_emu_* in <cfree.h>; the implementation in src/emu/emu.c
+ * composes the pieces declared here. See doc/EMU.md for design.
+ *
+ * Layering: emu.c owns CfreeEmu lifecycle and the translate/dispatch
+ * loop; per-ISA decoders/lifters, CPUState synthesis, the JIT code
+ * cache and reserved-VA region, and the runtime helper trampolines
+ * each live behind one of the surfaces below so the top-level driver
+ * never reaches into ISA-specific code. */
+
+#include <cfree.h>
+
+#include "core/core.h"
+#include "obj/obj.h"
+#include "type/type.h"
+
+typedef struct CG CG;
+typedef struct LinkImage LinkImage;
+typedef struct Linker Linker;
+
+/* ---- Configuration knobs ---------------------------------------- */
+
+/* Bounded so the translator can stack-allocate the EmuInst buffer. */
+#define EMU_MAX_INSTS_PER_BLOCK 64u
+
+/* Reserved JIT code region. emu_runtime reserves it PROT_NONE up front
+ * and commits pages as cold blocks land. Sized for v1: chaining and the
+ * code cache assume host VAs of placed sections never move, so this
+ * region also never grows. */
+#define EMU_CODE_REGION_SIZE (128ull * 1024ull * 1024ull)
+
+/* ---- Guest ELF loader ------------------------------------------- */
+
+typedef struct EmuLoadedImage {
+ void* guest_base; /* host pointer to the mapped guest AS */
+ size_t guest_size; /* bytes reserved for the guest AS */
+ u64 entry_pc; /* guest VA of the program entry point */
+ u64 initial_sp; /* guest VA of the initial stack pointer */
+} EmuLoadedImage;
+
+/* Parse the guest ELF, mmap the guest AS, copy PT_LOAD segments,
+ * push argv/envp/auxv onto the guest stack at initial_sp. Returns 0
+ * on success and writes *out; returns nonzero on parse failure. */
+int emu_load_elf (Compiler*, CfreeEmuArch,
+ const u8* bytes, size_t len,
+ const char* const* argv, const char* const* envp,
+ EmuLoadedImage* out);
+void emu_unload_image(Compiler*, EmuLoadedImage*);
+
+/* ---- CPU state -------------------------------------------------- */
+
+typedef struct EmuCPUState EmuCPUState;
+
+typedef enum EmuTrapReason {
+ EMU_TRAP_NONE = 0,
+ EMU_TRAP_EXIT, /* guest exit syscall; exit_code valid */
+ EMU_TRAP_FAULT, /* unmapped access / decode failure */
+} EmuTrapReason;
+
+EmuCPUState* emu_cpu_new(Compiler*, CfreeEmuArch,
+ u64 initial_pc, u64 initial_sp);
+void emu_cpu_free(EmuCPUState*);
+u64 emu_cpu_pc(const EmuCPUState*);
+void emu_cpu_set_pc(EmuCPUState*, u64);
+EmuTrapReason emu_cpu_trap_reason(const EmuCPUState*);
+int emu_cpu_exit_code(const EmuCPUState*);
+
+/* The interned C struct type representing CPUState for `arch`. The
+ * lifter references fields through an ObjSymId resolved by the
+ * runtime extern resolver to &CfreeEmu->cpu storage. */
+const Type* emu_cpu_type(Compiler*, CfreeEmuArch);
+
+/* The function type `u64 (CPUState*)` used for every lifted block.
+ * Returned interned. */
+const Type* emu_block_fn_type(Compiler*, CfreeEmuArch);
+
+/* ---- Decoder ---------------------------------------------------- */
+/* Concrete shape lives here (rather than as a per-ISA opaque) so the
+ * translator can stack-allocate a fixed-size buffer in
+ * cfree_emu_lookup. Per-ISA decoders/lifters interpret the operand
+ * payload through their own enums; the carrier is shared. */
+typedef struct EmuInst {
+ u32 op; /* per-ISA enum */
+ u32 flags; /* TERMINATOR | MEM | SETS_FLAGS | ... */
+ u64 guest_pc;
+ u32 guest_bytes; /* instruction width in guest bytes */
+ u32 nop;
+ u64 operands[6]; /* per-ISA payload */
+} EmuInst;
+
+/* Decode up to the next basic-block terminator or `max` instructions,
+ * whichever comes first. Returns the count written to `out`. Zero
+ * means decode failed at `guest_pc` (undecodable / out-of-bounds). */
+u32 emu_decode_block(CfreeEmuArch, const u8* bytes, u64 guest_pc,
+ EmuInst* out, u32 max);
+
+/* ---- Lifter ----------------------------------------------------- */
+
+typedef struct EmuLiftCtx {
+ CfreeEmuArch arch;
+ const Type* cpu_state_type; /* from emu_cpu_type */
+ const Type* block_fn_type; /* from emu_block_fn_type */
+ ObjSymId block_sym; /* function symbol for this block */
+ u64 guest_pc; /* PC of first instruction in the block */
+} EmuLiftCtx;
+
+/* Walk `insts` and emit one CG function for the block, with signature
+ * u64(EmuCPUState*). Calls cg_func_begin/end exactly once. */
+void emu_lift_block(CfreeEmuArch, CG*, const EmuInst* insts, u32 n,
+ const EmuLiftCtx*);
+
+/* ---- Code cache ------------------------------------------------- */
+
+typedef struct EmuCodeCache EmuCodeCache;
+
+EmuCodeCache* emu_cache_new(Compiler*);
+void emu_cache_free(EmuCodeCache*);
+void emu_cache_insert(EmuCodeCache*, u64 guest_pc, void* host_entry);
+void* emu_cache_lookup(const EmuCodeCache*, u64 guest_pc);
+
+/* ---- Code region (reserved VA) ---------------------------------- */
+/* PROT_NONE mmap that backs the linker's bump-allocated VA range.
+ * Pages are committed and flipped to RX after each link_resolve_extend
+ * lands new sections. The base address is fed to link_resolve_at as the
+ * image's runtime VA. */
+typedef struct EmuCodeRegion EmuCodeRegion;
+
+EmuCodeRegion* emu_code_region_new (Compiler*, size_t reserve_size);
+void emu_code_region_free(EmuCodeRegion*);
+uintptr_t emu_code_region_base(const EmuCodeRegion*);
+size_t emu_code_region_size(const EmuCodeRegion*);
+
+/* Commits and mprotects RX every page covering [base, end). `end` must
+ * lie inside the reserved range and must be monotonically non-decreasing
+ * across calls — the chaining invariant depends on previously committed
+ * pages remaining RX. */
+void emu_code_region_commit_rx_to(EmuCodeRegion*, uintptr_t end);
+
+/* ---- Runtime helpers -------------------------------------------- */
+
+/* Names of the runtime helper symbols the lifter emits as undefined
+ * externs. The extern resolver maps each one to the host address of
+ * the matching helper. Kept centralized so decode/lift/runtime agree. */
+#define EMU_SYM_CPU_STATE "__emu_cpu_state"
+#define EMU_SYM_LOAD8 "__emu_load8"
+#define EMU_SYM_LOAD16 "__emu_load16"
+#define EMU_SYM_LOAD32 "__emu_load32"
+#define EMU_SYM_LOAD64 "__emu_load64"
+#define EMU_SYM_STORE8 "__emu_store8"
+#define EMU_SYM_STORE16 "__emu_store16"
+#define EMU_SYM_STORE32 "__emu_store32"
+#define EMU_SYM_STORE64 "__emu_store64"
+#define EMU_SYM_SYSCALL "__emu_syscall"
+#define EMU_SYM_DISPATCH "__emu_dispatch"
+
+/* The block-symbol name format: emu_block_<hex_pc>. Kept short; the
+ * linker globals table only has to find it once per cold miss. */
+Sym emu_block_sym_name(Compiler*, u64 guest_pc);
+
+/* External resolver passed to link_set_extern_resolver. `user` is
+ * the CfreeEmu*. Returns NULL for unrecognized names — the linker
+ * promotes that to a fatal undefined-symbol diagnostic. */
+void* emu_runtime_extern_resolver(void* user, const char* name);
+
+/* Memory helpers; called from JITted blocks. The host process owns
+ * the guest AS, so loads/stores bounds-check against the EmuCPUState's
+ * mapped guest range and trap on miss (writing EMU_TRAP_FAULT into the
+ * CPU state and falling back to the dispatcher). */
+u8 emu_mem_load8 (EmuCPUState*, u64 addr);
+u16 emu_mem_load16(EmuCPUState*, u64 addr);
+u32 emu_mem_load32(EmuCPUState*, u64 addr);
+u64 emu_mem_load64(EmuCPUState*, u64 addr);
+void emu_mem_store8 (EmuCPUState*, u64 addr, u8);
+void emu_mem_store16(EmuCPUState*, u64 addr, u16);
+void emu_mem_store32(EmuCPUState*, u64 addr, u32);
+void emu_mem_store64(EmuCPUState*, u64 addr, u64);
+
+/* Reads syscall number / args from the guest registers, forwards to
+ * the host OS, and writes the return into the guest return register. */
+void emu_syscall(EmuCPUState*);
+
+/* ---- Tracing ---------------------------------------------------- */
+
+void emu_trace_pc (Compiler*, u64 guest_pc);
+void emu_trace_block(Compiler*, u64 guest_pc);
+void emu_trace_insn (Compiler*, u64 guest_pc, const EmuInst*);
+
+#endif
diff --git a/src/emu/lift.c b/src/emu/lift.c
@@ -0,0 +1,19 @@
+/* Per-ISA lifter. Consumes EmuInsts and drives CG to emit one host
+ * function per guest basic block (signature u64(EmuCPUState*)).
+ * Lifters target CG exclusively — never CGTarget directly — so the
+ * pipeline below CG is unchanged from the C front-end. */
+
+#include "emu/emu.h"
+
+#include "cg/cg.h"
+
+#include <cfree.h>
+
+void emu_lift_block(CfreeEmuArch arch, CG* cg, const EmuInst* insts, u32 n,
+ const EmuLiftCtx* ctx)
+{
+ /* Per-ISA lifter tables not yet landed. translate_block panics
+ * before it would finalize an empty block, so this stub never
+ * silently produces an executable host function. */
+ (void)arch; (void)cg; (void)insts; (void)n; (void)ctx;
+}
diff --git a/src/emu/runtime.c b/src/emu/runtime.c
@@ -0,0 +1,317 @@
+/* Emulator runtime: code cache, reserved JIT VA region, runtime
+ * helper trampolines, and the extern resolver that wires lifted
+ * blocks to host helper addresses. The runtime is in-process — no
+ * separate runtime object — so the JIT linker just hands back the
+ * helper addresses through emu_runtime_extern_resolver.
+ *
+ * Block chaining lives here too (a runtime mprotect-and-patch pass
+ * outside the linker) but lands with the per-ISA lifter; see
+ * doc/EMU.md §6 for why it sits outside link/. */
+
+#include "emu/emu.h"
+
+#include "core/heap.h"
+
+#include <cfree.h>
+
+#include <string.h>
+
+/* ============================================================
+ * Reserved code region
+ * ============================================================
+ * One up-front PROT_NONE reservation through env->execmem. The base
+ * address is fed to link_resolve_at as the image's runtime VA; per-
+ * block link_resolve_extend bump-allocates within. Pages are committed
+ * (protected RX) lazily as blocks land; the runtime flips them
+ * after the linker writes the section bytes and applies relocations.
+ */
+
+static SrcLoc no_loc(void) { SrcLoc l = {0,0,0}; return l; }
+
+static const CfreeExecMem* require_execmem(Compiler* c)
+{
+ const CfreeExecMem* m = c->env ? c->env->execmem : NULL;
+ if (!m || !m->reserve || !m->protect || !m->release) {
+ compiler_panic(c, no_loc(),
+ "emu: env->execmem is required for the code region");
+ }
+ return m;
+}
+
+static u64 page_size_bytes(const CfreeExecMem* m)
+{
+ return m->page_size ? (u64)m->page_size : 0x4000u;
+}
+
+static u64 align_up_u64(u64 v, u64 a)
+{
+ return (v + (a - 1u)) & ~(a - 1u);
+}
+
+struct EmuCodeRegion {
+ Compiler* c;
+ void* base;
+ size_t size;
+ uintptr_t rx_end; /* high-water of pages currently RX */
+};
+
+EmuCodeRegion* emu_code_region_new(Compiler* c, size_t reserve_size)
+{
+ Heap* h;
+ const CfreeExecMem* mem;
+ EmuCodeRegion* r;
+ void* p;
+ size_t aligned;
+
+ if (!c) return NULL;
+ mem = require_execmem(c);
+ h = (Heap*)c->env->heap;
+ aligned = (size_t)align_up_u64((u64)reserve_size, page_size_bytes(mem));
+
+ p = mem->reserve(mem->user, aligned, CFREE_PROT_NONE);
+ if (!p) return NULL;
+
+ r = (EmuCodeRegion*)h->alloc(h, sizeof(*r), _Alignof(EmuCodeRegion));
+ if (!r) { mem->release(mem->user, p, aligned); return NULL; }
+ r->c = c;
+ r->base = p;
+ r->size = aligned;
+ r->rx_end = (uintptr_t)p;
+ return r;
+}
+
+void emu_code_region_free(EmuCodeRegion* r)
+{
+ Heap* h;
+ const CfreeExecMem* mem;
+ if (!r) return;
+ h = (Heap*)r->c->env->heap;
+ mem = r->c->env->execmem;
+ if (r->base && r->size && mem && mem->release) {
+ mem->release(mem->user, r->base, r->size);
+ }
+ h->free(h, r, sizeof(*r));
+}
+
+uintptr_t emu_code_region_base(const EmuCodeRegion* r)
+{
+ return r ? (uintptr_t)r->base : 0;
+}
+
+size_t emu_code_region_size(const EmuCodeRegion* r)
+{
+ return r ? r->size : 0;
+}
+
+void emu_code_region_commit_rx_to(EmuCodeRegion* r, uintptr_t end)
+{
+ uintptr_t base, page_end;
+ size_t len;
+ if (!r) return;
+ base = (uintptr_t)r->base;
+ page_end = (uintptr_t)align_up_u64((u64)end, page_size_bytes(require_execmem(r->c)));
+ /* Monotonic: never lower the high-water; chaining patches
+ * already-committed code and depends on it staying RX. */
+ if (page_end <= r->rx_end) return;
+ if (page_end > base + r->size) page_end = base + r->size;
+ if (page_end <= r->rx_end) return;
+
+ len = (size_t)(page_end - r->rx_end);
+ /* The linker has already written and relocated the section bytes.
+ * Writing through the original PROT_NONE mapping would fault, so
+ * the pages must have been promoted to RW first. That write path
+ * is owned by link_resolve_extend; in v1 we expect the linker to
+ * mprotect the pages RW before writing. The RX flip happens here. */
+ if (mprotect((void*)r->rx_end, len, PROT_READ | PROT_EXEC) == 0) {
+#ifdef __aarch64__
+ /* Flush data caches and invalidate icache so the CPU sees
+ * the freshly written instructions. */
+ __builtin___clear_cache((char*)r->rx_end, (char*)page_end);
+#endif
+ r->rx_end = page_end;
+ }
+}
+
+/* ============================================================
+ * Code cache (guest_pc -> host entry)
+ * ============================================================
+ * Open-addressed linear-probe hash on the guest PC. Capacity grows
+ * by doubling; v1 never evicts. */
+
+typedef struct EmuCacheEntry {
+ u64 guest_pc; /* 0 means empty slot */
+ void* host_entry;
+} EmuCacheEntry;
+
+struct EmuCodeCache {
+ Compiler* c;
+ EmuCacheEntry* slots;
+ u32 cap;
+ u32 used;
+};
+
+static u64 mix_pc(u64 x)
+{
+ x ^= x >> 33; x *= 0xff51afd7ed558ccdull;
+ x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ull;
+ x ^= x >> 33;
+ return x;
+}
+
+static void cache_resize(EmuCodeCache* c, u32 new_cap)
+{
+ Heap* h = (Heap*)c->c->env->heap;
+ EmuCacheEntry* fresh;
+ u32 i, mask;
+ fresh = (EmuCacheEntry*)h->alloc(h, sizeof(*fresh) * new_cap,
+ _Alignof(EmuCacheEntry));
+ if (!fresh) return;
+ memset(fresh, 0, sizeof(*fresh) * new_cap);
+ mask = new_cap - 1u;
+ for (i = 0; i < c->cap; ++i) {
+ u64 pc = c->slots[i].guest_pc;
+ u32 j;
+ if (pc == 0) continue;
+ j = (u32)mix_pc(pc) & mask;
+ while (fresh[j].guest_pc != 0) j = (j + 1u) & mask;
+ fresh[j] = c->slots[i];
+ }
+ if (c->slots) h->free(h, c->slots, sizeof(*c->slots) * c->cap);
+ c->slots = fresh;
+ c->cap = new_cap;
+}
+
+EmuCodeCache* emu_cache_new(Compiler* c)
+{
+ Heap* h;
+ EmuCodeCache* k;
+ if (!c) return NULL;
+ h = (Heap*)c->env->heap;
+ k = (EmuCodeCache*)h->alloc(h, sizeof(*k), _Alignof(EmuCodeCache));
+ if (!k) return NULL;
+ memset(k, 0, sizeof(*k));
+ k->c = c;
+ cache_resize(k, 64u);
+ return k;
+}
+
+void emu_cache_free(EmuCodeCache* c)
+{
+ Heap* h;
+ if (!c) return;
+ h = (Heap*)c->c->env->heap;
+ if (c->slots) h->free(h, c->slots, sizeof(*c->slots) * c->cap);
+ h->free(h, c, sizeof(*c));
+}
+
+void emu_cache_insert(EmuCodeCache* c, u64 guest_pc, void* host_entry)
+{
+ u32 mask, j;
+ if (!c || guest_pc == 0) return;
+ if (c->used * 4u >= c->cap * 3u) cache_resize(c, c->cap * 2u);
+ mask = c->cap - 1u;
+ j = (u32)mix_pc(guest_pc) & mask;
+ while (c->slots[j].guest_pc != 0) {
+ if (c->slots[j].guest_pc == guest_pc) {
+ c->slots[j].host_entry = host_entry;
+ return;
+ }
+ j = (j + 1u) & mask;
+ }
+ c->slots[j].guest_pc = guest_pc;
+ c->slots[j].host_entry = host_entry;
+ c->used++;
+}
+
+void* emu_cache_lookup(const EmuCodeCache* c, u64 guest_pc)
+{
+ u32 mask, j;
+ if (!c || c->cap == 0 || guest_pc == 0) return NULL;
+ mask = c->cap - 1u;
+ j = (u32)mix_pc(guest_pc) & mask;
+ while (c->slots[j].guest_pc != 0) {
+ if (c->slots[j].guest_pc == guest_pc) return c->slots[j].host_entry;
+ j = (j + 1u) & mask;
+ }
+ return NULL;
+}
+
+/* ============================================================
+ * Runtime helper trampolines
+ * ============================================================
+ * Lifted blocks call into these through extern symbols whose names
+ * are EMU_SYM_*. The resolver below maps each name to the address
+ * of the matching function (or, for EMU_SYM_CPU_STATE, the address
+ * of the running emu's CPUState). */
+
+/* Forward-declare the host-private CfreeEmu shape so the resolver
+ * can pull the CPUState pointer without dragging emu.c's struct
+ * definition into this TU's contract. */
+struct CfreeEmu;
+EmuCPUState* emu_internal_cpu(struct CfreeEmu*);
+
+/* Memory helpers. Per EMU.md §5.4 these bounds-check the guest
+ * address against the mapped guest AS and trap on miss. The v1
+ * stubs below do neither: loads return zero, stores are no-ops, and
+ * no fault is recorded in the CPU state for the dispatcher to see. */
+
+u8 emu_mem_load8 (EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; }
+u16 emu_mem_load16(EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; }
+u32 emu_mem_load32(EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; }
+u64 emu_mem_load64(EmuCPUState* s, u64 addr) { (void)s; (void)addr; return 0; }
+
+void emu_mem_store8 (EmuCPUState* s, u64 addr, u8 v) { (void)s; (void)addr; (void)v; }
+void emu_mem_store16(EmuCPUState* s, u64 addr, u16 v) { (void)s; (void)addr; (void)v; }
+void emu_mem_store32(EmuCPUState* s, u64 addr, u32 v) { (void)s; (void)addr; (void)v; }
+void emu_mem_store64(EmuCPUState* s, u64 addr, u64 v) { (void)s; (void)addr; (void)v; }
+
+void emu_syscall(EmuCPUState* s) { (void)s; }
+
+/* ============================================================
+ * Extern resolver
+ * ============================================================
+ * Called by the linker for any undefined symbol the per-block
+ * ObjBuilder references. Returns the host VA of the named helper
+ * (or the running emu's CPUState). Returning NULL surfaces as a
+ * fatal "undefined reference" diagnostic from link_resolve_extend. */
+
+static int streq(const char* a, const char* b)
+{
+ while (*a && *a == *b) { ++a; ++b; }
+ return *a == 0 && *b == 0;
+}
+
+void* emu_runtime_extern_resolver(void* user, const char* name)
+{
+ if (!name) return NULL;
+
+ if (streq(name, EMU_SYM_CPU_STATE)) {
+ struct CfreeEmu* e = (struct CfreeEmu*)user;
+ return (void*)emu_internal_cpu(e);
+ }
+
+ if (streq(name, EMU_SYM_LOAD8)) return (void*)emu_mem_load8;
+ if (streq(name, EMU_SYM_LOAD16)) return (void*)emu_mem_load16;
+ if (streq(name, EMU_SYM_LOAD32)) return (void*)emu_mem_load32;
+ if (streq(name, EMU_SYM_LOAD64)) return (void*)emu_mem_load64;
+ if (streq(name, EMU_SYM_STORE8)) return (void*)emu_mem_store8;
+ if (streq(name, EMU_SYM_STORE16)) return (void*)emu_mem_store16;
+ if (streq(name, EMU_SYM_STORE32)) return (void*)emu_mem_store32;
+ if (streq(name, EMU_SYM_STORE64)) return (void*)emu_mem_store64;
+ if (streq(name, EMU_SYM_SYSCALL)) return (void*)emu_syscall;
+
+ /* EMU_SYM_DISPATCH is the cross-block tail-call helper; it shares
+ * the host address of the dispatcher entry. The dispatcher loop
+ * lives inside cfree_emu_step, so the lifter can also synthesize
+ * a return-of-next_pc instead of a real call here. v1 returns
+ * NULL — lifters that don't yet emit DISPATCH calls are fine. */
+
+ return NULL;
+}
+
+/* Tracing. v1 stubs are no-ops. The full implementation emits to the
+ * env's diag sink at CFREE_DIAG_NOTE and lands with the lifter so it
+ * can format guest PCs and decoded instruction text consistently. */
+
+void emu_trace_pc (Compiler* c, u64 pc) { (void)c; (void)pc; }
+void emu_trace_block(Compiler* c, u64 pc) { (void)c; (void)pc; }
diff --git a/src/link/link.c b/src/link/link.c
@@ -397,6 +397,34 @@ void link_image_free(LinkImage* img)
link_image_release(img);
}
+/* ---- Incremental resolution (stubs) ----
+ * Per-block JIT translation in src/emu/ wants to grow a single
+ * LinkImage as cold blocks land (doc/EMU.md §6). The single-shot
+ * link_resolve discipline (link.h header comment) is set up to
+ * support this — inputs are non-destructively consumed, ObjBuilder*
+ * mappings are stable, resolution is functional. The two entries
+ * below are the surface; the implementation lands alongside the
+ * emu lifter cut. */
+
+LinkImage* link_resolve_at(Linker* l, uintptr_t base_va)
+{
+ (void)base_va;
+ if (!l) return NULL;
+ compiler_panic(l->c, no_loc(),
+ "link_resolve_at: incremental resolution not yet "
+ "implemented");
+ return NULL;
+}
+
+void link_resolve_extend(Linker* l, LinkImage* img)
+{
+ (void)img;
+ if (!l) return;
+ compiler_panic(l->c, no_loc(),
+ "link_resolve_extend: incremental resolution not "
+ "yet implemented");
+}
+
/* ---- public emit dispatcher ---- */
void link_emit_image_writer(LinkImage* img, Writer* w)
diff --git a/src/link/link.h b/src/link/link.h
@@ -148,6 +148,21 @@ void link_set_gc_sections(Linker*, int enable);
* comment locks in the implementation discipline that keeps the existing
* surface amenable, with no speculative API. */
LinkImage* link_resolve(Linker*);
+
+/* Incremental resolution (per doc/EMU.md §6). link_resolve_at reserves
+ * the image's layout starting at the caller-specified base VA — used
+ * by the emu so the JIT image's host addresses are stable for the
+ * session (chaining patches live host code with section addresses).
+ * link_resolve_extend appends new inputs to an existing image: places
+ * new sections at the next free offset within the reserved region,
+ * resolves new symbols against the existing image's globals plus the
+ * registered LinkExternResolver, and applies new relocations. It
+ * MUST NOT change host addresses of previously placed sections —
+ * chaining and the code cache depend on it. The image must have been
+ * produced by a prior link_resolve_at call on the same Linker. */
+LinkImage* link_resolve_at (Linker*, uintptr_t base_va);
+void link_resolve_extend(Linker*, LinkImage*);
+
void link_image_free(LinkImage*);
const LinkSymbol* link_symbol(LinkImage*, LinkSymId);
LinkSymId link_symbol_lookup(LinkImage*, Sym name);