kit

git clone https://git.ryansepassi.com/git/kit.git

commit b89d452878b5db44a453092e6589a72c0bc5876f
parent 5b42e9230e1edf716949e3e618fe329bca04ed13
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 11 May 2026 10:19:34 -0700

dbg: JIT debugger — session, displaced single-step, REPL wire-up

Lights up `cfree dbg` end-to-end on aarch64 (macOS Apple silicon + Linux):
compile + JIT-link a C source, set software breakpoints, run, continue,
single-step, examine memory, dump registers, list JIT symbols, quit
cleanly. Source-level features (bt, typed `p NAME`, n/sl/finish) are
wired but gated on cfree_jit_view, which remains a stub.

Public API (include/cfree.h):
- CfreeDbgOs + CfreeDbgSignalOps vtables; hung off CfreeEnv.dbg_os.
- cfree_jit_session_attach_dwarf for source-level resume modes.
- cfree_arch_reg_iter_* replaced by cfree_arch_register_count / _at
  (stateless, allocation-free; dense iteration indices unrelated to the
  sparse DWARF numbering).
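
The count / _at shape can be sketched roughly as follows. This is an
illustration only, not the cfree implementation: `DemoReg`,
`demo_register_count`, `demo_register_at`, and the four-entry table are
hypothetical stand-ins. The point it shows is that the dense index walks a
static table while each entry carries its own sparse DWARF number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register descriptor; the real CfreeArchReg may differ. */
typedef struct DemoReg {
  const char* name;
  uint32_t dwarf_idx; /* sparse DWARF register number */
} DemoReg;

/* Illustrative table: dense positions 0..3, DWARF numbers with gaps. */
static const DemoReg k_demo_regs[] = {
    {"x0", 0}, {"x1", 1}, {"sp", 31}, {"v0", 64}};

/* Number of entries; valid dense indices are [0, count). */
static uint32_t demo_register_count(void) {
  return (uint32_t)(sizeof(k_demo_regs) / sizeof(k_demo_regs[0]));
}

/* Returns 0 and fills *out on success, nonzero if idx is out of range.
 * No iterator object, no allocation, no state between calls. */
static int demo_register_at(uint32_t idx, DemoReg* out) {
  assert(out != NULL);
  if (idx >= demo_register_count()) return 1;
  *out = k_demo_regs[idx];
  return 0;
}
```

A caller loops `for (i = 0; i < demo_register_count(); ++i)` and skips any
entry whose `dwarf_idx` falls outside the range it can display, which is the
same pattern the `info reg` change in driver/dbg.c uses.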

Library — new src/dbg/:
- session.c: worker thread + park/unpark; on_fault classifier.
- bp.c: refcounted BRK patch table, idempotent set/clear, read overlay
  that substitutes saved bytes for patched ranges.
- displaced.c + arch_aa64.c: out-of-line single-step. aa64 shim covers
  B/BL/B.cond (near + far via trampoline), CBZ/CBNZ, TBZ/TBNZ, ADR,
  ADRP, LDR-literal (W/X/SW), BR/BLR/RET, plus verbatim copy for
  non-PC-relative insns.
- step.c: STEP_LINE / NEXT_LINE / STEP_OUT state machines on top of the
  displaced primitive.
- mem.c: read/write via dbg_os->guarded_copy.
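
A minimal sketch of the refcounted patch table and read overlay, under
stated assumptions: every `Demo*` / `demo_*` name is hypothetical, the code
region is an ordinary byte buffer, and the code-write window and icache
flush that the real bp.c goes through are left out. Only the `BRK #0`
encoding (0xD4200000) is taken from the architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BRK0 0xD4200000u /* aarch64 BRK #0 */
#define DEMO_MAX_BP 8

typedef struct DemoBp {
  size_t off;        /* byte offset of the patched insn */
  uint32_t saved;    /* original instruction word */
  unsigned refcount; /* 0 means the slot is free */
} DemoBp;

typedef struct DemoBpTable {
  uint8_t* code; /* stand-in for the JIT code region */
  size_t code_len;
  DemoBp bps[DEMO_MAX_BP];
} DemoBpTable;

static DemoBp* demo_bp_find(DemoBpTable* t, size_t off) {
  for (int i = 0; i < DEMO_MAX_BP; ++i)
    if (t->bps[i].refcount && t->bps[i].off == off) return &t->bps[i];
  return NULL;
}

/* First reference patches the code; later ones only bump the count. */
static int demo_bp_set(DemoBpTable* t, size_t off) {
  assert(off % 4 == 0 && off + 4 <= t->code_len);
  DemoBp* bp = demo_bp_find(t, off);
  if (bp) { bp->refcount++; return 0; }
  for (int i = 0; i < DEMO_MAX_BP; ++i) {
    if (t->bps[i].refcount) continue;
    uint32_t brk = DEMO_BRK0;
    bp = &t->bps[i];
    bp->off = off;
    bp->refcount = 1;
    memcpy(&bp->saved, t->code + off, 4); /* save original word */
    memcpy(t->code + off, &brk, 4);       /* lay down the trap */
    return 0;
  }
  return 1; /* table full */
}

/* Last reference restores the original word. */
static int demo_bp_clear(DemoBpTable* t, size_t off) {
  DemoBp* bp = demo_bp_find(t, off);
  if (!bp) return 1;
  if (--bp->refcount == 0) memcpy(t->code + off, &bp->saved, 4);
  return 0;
}

/* Read overlay: copy code bytes, then substitute saved bytes wherever a
 * live patch overlaps the requested range. */
static void demo_bp_read(DemoBpTable* t, size_t off, void* dst, size_t n) {
  assert(off + n <= t->code_len);
  uint8_t* out = (uint8_t*)dst;
  memcpy(out, t->code + off, n);
  for (int i = 0; i < DEMO_MAX_BP; ++i) {
    const DemoBp* bp = &t->bps[i];
    if (!bp->refcount) continue;
    for (size_t k = 0; k < 4; ++k) {
      size_t pos = bp->off + k;
      if (pos >= off && pos < off + n)
        out[pos - off] = ((const uint8_t*)&bp->saved)[k];
    }
  }
}
```

The refcount is what lets a temporary stepping breakpoint and a user
breakpoint share one address: clearing either one leaves the trap in place
until the other is cleared too.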

Library — linker bridge (src/link/link_jit.c):
- cfree_jit_image_contains / _arch / _compiler for src/dbg/.
- cfree_jit_sym_iter_* + cfree_jit_addr_to_sym walking LinkImage->syms;
  names demangled via obj_format_demangle_c.
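
For orientation, an address-to-symbol walk of this kind might look like the
sketch below. It is an assumption-laden illustration: `DemoSym`,
`demo_addr_to_sym`, and `demo_display_name` are hypothetical, the symbol
list is a flat array rather than `LinkImage->syms`, and the name cleanup
only strips one leading underscore, which may be less than
`obj_format_demangle_c` actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical symbol record with a runtime address range. */
typedef struct DemoSym {
  const char* name;
  uint64_t addr;
  uint64_t size;
} DemoSym;

/* Linear walk: return the symbol whose [addr, addr + size) range covers
 * `addr`, and store the offset into it. NULL if nothing matches. */
static const DemoSym* demo_addr_to_sym(const DemoSym* syms, size_t n,
                                       uint64_t addr, uint64_t* off_out) {
  assert(off_out != NULL);
  for (size_t i = 0; i < n; ++i) {
    if (addr >= syms[i].addr && addr - syms[i].addr < syms[i].size) {
      *off_out = addr - syms[i].addr;
      return &syms[i];
    }
  }
  return NULL;
}

/* Strip a single leading underscore, as Mach-O C symbols carry one. */
static const char* demo_display_name(const char* name) {
  return (name[0] == '_') ? name + 1 : name;
}
```

The returned offset is what lets a debugger print a stop location in the
`sym+off` form.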

Library — stubs gone (src/api/stubs.c):
- All cfree_jit_session_*, CfreeArchRegIter, the sym-iter stubs.
- cfree_jit_session_get_regs (was declared without a body) now real.

Host adapter (driver/env.c):
- POSIX g_dbg_os_posix singleton: pthread_create/join/kill thread shim,
  condvar-pair event shim, SA_SIGINFO handlers for SIGTRAP/SEGV/BUS/
  ILL/FPE/SIGUSR2 with aarch64 ucontext marshalling, dual-mapping
  registry for the code-write window, sys_icache_invalidate /
  __builtin___clear_cache, TLS sigsetjmp guarded copy.
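
The TLS sigsetjmp guarded copy can be sketched as below. This is a
simplified, self-contained POSIX illustration, not the code in
driver/env.c: the `demo_*` names are hypothetical, the handler is a plain
`sa_handler` rather than the SA_SIGINFO handler the commit installs, and a
fault outside an armed copy simply falls back to the default action.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* sigaction, sigsetjmp */
#endif

#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-thread landing slot; armed only while a guarded copy is running. */
static _Thread_local sigjmp_buf demo_guard_buf;
static _Thread_local volatile sig_atomic_t demo_guard_armed;

static void demo_fault_handler(int signo) {
  if (demo_guard_armed) {
    demo_guard_armed = 0;
    siglongjmp(demo_guard_buf, 1); /* back into demo_guarded_copy */
  }
  /* Not a guarded access: restore the default action and re-raise. */
  signal(signo, SIG_DFL);
  raise(signo);
}

/* Install the handler for SIGSEGV and SIGBUS. Returns 0 on success. */
static int demo_install_guard(void) {
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sa.sa_handler = demo_fault_handler;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, NULL) != 0) return 1;
  if (sigaction(SIGBUS, &sa, NULL) != 0) return 1;
  return 0;
}

/* Copy n bytes; returns 0 on success, 1 if the access faulted. The
 * volatile byte loop keeps the compiler from assuming the source is
 * valid. sigsetjmp(.., 1) saves the signal mask so the faulting signal
 * is unblocked again after the jump out of the handler. */
static int demo_guarded_copy(void* dst, const void* src, size_t n) {
  volatile unsigned char* d = (volatile unsigned char*)dst;
  const volatile unsigned char* s = (const volatile unsigned char*)src;
  assert(n == 0 || dst != NULL);
  if (sigsetjmp(demo_guard_buf, 1) != 0) return 1; /* landed from handler */
  demo_guard_armed = 1;
  for (size_t i = 0; i < n; ++i) d[i] = s[i];
  demo_guard_armed = 0;
  return 0;
}
```

A memory-examine command built on this reports "cannot access memory" when
the call returns nonzero and carries on, instead of taking the whole
process down on a bad pointer.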

Driver (driver/dbg.c):
- attach_dwarf right after cfree_dwarf_open.
- info reg loops cfree_arch_register_count + _at.

See doc/DBG.md for the full design and a checklist of what's landed
vs. open.

Diffstat:
A doc/DBG.md | 443 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M driver/dbg.c | 15 +++++++++------
M driver/driver.h | 3 ++-
M driver/env.c | 467 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
M include/cfree.h | 100 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
M src/api/arch_regs.c | 29 +++++++++++++++++++++--------
M src/api/stubs.c | 89 +++++--------------------------------------------------------------------------
A src/dbg/arch_aa64.c | 235 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/dbg/bp.c | 224 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/dbg/dbg.h | 183 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/dbg/displaced.c | 121 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/dbg/mem.c | 27 +++++++++++++++++++++++++++
A src/dbg/session.c | 399 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
A src/dbg/step.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M src/link/link_jit.c | 109 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
15 files changed, 2528 insertions(+), 116 deletions(-)

diff --git a/doc/DBG.md b/doc/DBG.md @@ -0,0 +1,443 @@ +# cfree dbg design + +Architecture of `cfree dbg`, the interactive JIT debugger. Companion to +`DESIGN.md`. Scope: how the REPL drives the JIT'd code under controlled +execution, and where the OS-specific machinery is isolated. Not a tutorial; +not implementation notes. + +## 1. Goals + +- `dbg` multi-call subcommand: compile C sources with `-g`, JIT-link, and + drop into a REPL that controls one worker running the JIT'd entry. +- Source-level operations: breakpoints by `file:line`, `sym[+off]`, or + `0xADDR`; step-insn / step-line / next-line / finish; backtrace using + DWARF CFI; print and write named variables; raw memory examine. +- All OS interaction — threads, signals, ucontext, page protection flips, + icache flushing — funnels through a single env vtable + (`CfreeDbgOs`, §5) populated by `driver/env.c`. `src/dbg/` (the + library-side session) is C11 freestanding like the rest of `src/`. +- v1 target: aarch64 on macOS and Linux. Other arches/hosts follow once the + contracts in §5 and §8 are stable. + +## 2. Non-goals (v1) + +- Multi-threaded guests. Single worker thread; concurrent guest threads are + future work and will require widening `CfreeDbgOs` (thread enumeration, + per-tid stop). +- Out-of-process / remote debugging. The worker runs in the same address + space as the REPL — read/write memory is a guarded `memcpy`, not + `ptrace`/`mach_vm_*`. +- Hardware breakpoints, watchpoints, conditional-breakpoint JIT codegen. + All breakpoints are software (BRK patch); the `condition` callback in + `CfreeBreakpointSpec` is host-side C. +- Stepping through optimized code with reconstructed values. `-g` is + forced on, but `-O0` is the contract for usable source-level stepping. +- Forked-child / `exec`-into-guest. The JIT entry is called directly. +- x86_64 / rv64 single-step. Wiring exists at the vtable layer; per-arch + displaced-step lifters land after aa64 is solid. + +## 3. 
Layout + +``` +include/ + cfree.h CfreeJitSession + CfreeDbgOs + CfreeStopInfo (public) + +src/ + dbg/ library-side session (NEW) + session.c worker thread, park/unpark, stop dispatch + bp.c breakpoint patch table (addr -> saved bytes, refcount) + step.c resume-mode state machine (insn / line / next / out) + displaced.c arch-neutral plumbing for out-of-line execution + arch_aa64.c aa64 BRK encoding + PC-relative fixups for displaced + arch_x64.c (later) + arch_rv64.c (later) + mem.c read/write_mem with sigsetjmp bad-address guard + dbg.h internal contracts + +driver/ + dbg.c REPL (already exists; see §4) + env.c CfreeDbgOs POSIX impl (NEW section in this file) +``` + +## 4. Dataflow + +``` +stdin → dbg REPL → CfreeJitSession ──► worker thread runs JIT entry + │ │ + │ ▼ + │ SIGTRAP / SIGSEGV / ... + │ │ + ▼ ▼ + CfreeStopInfo ◄────── ucontext → CfreeUnwindFrame + │ + ▼ + DWARF (line/CFI/var) → user-visible output +``` + +The REPL is already wired (`driver/dbg.c`). It owns the breakpoint id +namespace presented to the user, the DWARF consumer, and SIGINT +forwarding at the prompt. Everything *behind* `cfree_jit_session_*` — +threading, signals, memory protection — is what this doc covers. + +## 5. CfreeDbgOs vtable + +The session never calls `pthread_*`, `sigaction`, `mprotect`, or +`pthread_jit_write_protect_np` directly. All host primitives go through +a single vtable supplied via `CfreeEnv`. `driver/env.c` is the only TU +in the tree that includes `<pthread.h>`, `<signal.h>`, or `<sys/mman.h>` +for debugger use; `src/dbg/` stays freestanding. + +```c +typedef struct CfreeDbgOs { + /* --- threading ------------------------------------------------- */ + /* Spawn `fn(arg)` on a new thread. *out is an opaque handle the + * session passes back to join/cancel. Returns 0 on success. */ + int (*thread_start)(void* user, void (*fn)(void*), void* arg, + void** thread_out); + /* Block until the worker exits; releases the handle. 
*/ + void (*thread_join)(void* user, void* thread); + /* Async-signal-safe: deliver the debugger's interrupt signal to the + * worker thread. Called from session_interrupt; must use + * pthread_kill / equivalent. */ + int (*thread_signal_self_worker)(void* user); + + /* --- park/unpark ----------------------------------------------- */ + /* One-shot binary handoffs. The session uses two: one for "worker + * has stopped, REPL may inspect" and one for "REPL has issued + * resume, worker may continue". `event_new` allocates; `wait` + * blocks until `signal`; `reset` rearms. */ + void* (*event_new)(void* user); + void (*event_free)(void* user, void* ev); + void (*event_wait)(void* user, void* ev); + void (*event_signal)(void* user, void* ev); + void (*event_reset)(void* user, void* ev); + + /* --- signal plumbing ------------------------------------------- */ + /* Install process-wide handlers for SIGTRAP, SIGSEGV, SIGBUS, + * SIGILL, SIGFPE, and one user-chosen signal for INTERRUPT (the + * implementation reserves SIGUSR2 on POSIX). The handler must: + * 1. confirm the faulting thread is the worker (pthread_self == + * the worker tid captured at thread_start); + * 2. snapshot the ucontext into the CfreeUnwindFrame buffer the + * session pre-registered via `register_stop_slot`; + * 3. classify the cause (which signal, was the PC patched, was an + * interrupt pending) and store the CfreeStopKind; + * 4. event_signal the stop event; + * 5. event_wait the resume event; + * 6. on resume, write any mutated regs back into the ucontext and + * return so the kernel restarts the worker. + * The session supplies the snapshot/classify/wait callbacks; the + * OS impl only owns sigaction + the ucontext <-> CfreeUnwindFrame + * marshalling for the host arch. */ + int (*signals_install)(void* user, const CfreeDbgSignalOps* ops, + void* session); + void (*signals_uninstall)(void* user); + + /* Slot the handler reads/writes. 
The session owns the memory; this + * is just a pointer the OS layer caches so it can be reached from + * async-signal context without indirecting through the session. */ + void (*register_stop_slot)(void* user, CfreeUnwindFrame* regs, + CfreeStopKind* kind, int* signal_out); + + /* --- memory protection (W^X dance for the BRK patch) ---------- */ + /* CfreeExecMem already provides reserve/protect/release. The dbg + * extension is per-page write-window: on Apple silicon the JIT + * pages live in the dual-mapping (write alias is RW), so the patch + * goes through `write` and the BRK is observed at `runtime`. On + * Linux a transient mprotect RW->RX flip is required. The session + * asks the OS layer to "open" and "close" a write window over an + * address range that lies inside an existing JIT reservation. */ + int (*code_write_begin)(void* user, void* runtime_addr, size_t n, + void** write_addr_out); + void (*code_write_end)(void* user, void* runtime_addr, size_t n); + void (*flush_icache)(void* user, void* runtime_addr, size_t n); + + /* --- fault-guarded memory copy -------------------------------- */ + /* Read/write `n` bytes between guest (in-process) memory and a + * caller buffer, returning nonzero on SIGSEGV/SIGBUS. Implemented + * with sigsetjmp + a SIGSEGV handler scoped to a TLS landing slot + * the dbg OS owns; the standard fault handlers above defer to this + * landing slot when set. */ + int (*guarded_copy)(void* user, void* dst, const void* src, size_t n); + + void* user; +} CfreeDbgOs; +``` + +`CfreeEnv` gains one new field: + +```c +const CfreeDbgOs* dbg_os; /* NULL ok unless dbg paths run */ +``` + +The session looks at `env->dbg_os` once, in `cfree_jit_session_new`, +and returns NULL if absent — exactly the contract `dbg.c:1862-1868` +already expects. + +## 6. 
Session lifecycle + +``` +session_new + ├── allocate state, stop slot, two events + ├── dbg_os->signals_install + └── dbg_os->thread_start(worker_main) + │ + ▼ (worker) ┌── (REPL) + wait resume_event │ + call entry │ session_call(entry, argv) + │ │ install entry args + │ │ event_signal(resume) + │ trap │ event_wait(stop) + handler: snapshot, │ + classify, signal stop, │ + wait resume │ + │ │ session_resume(MODE) + │ │ prepare per-mode trampoline / + │ │ one-shot bps (§7,§8) + │ │ event_reset(stop) + │ │ event_signal(resume) + │ │ event_wait(stop) + entry returns │ + final stop = EXIT ─────┘ session_call returns CfreeStopInfo +session_free + ├── dbg_os->thread_join + └── dbg_os->signals_uninstall +``` + +Invariants: + +- Exactly one worker per session. `session_call` is rejected if the + worker is already running. +- The REPL thread never reads `CfreeStopInfo` while the worker is + unparked. `event_wait(stop)` is the only synchronization point. +- The worker never writes to session state except through the + preregistered stop slot from async-signal context. + +## 7. Software breakpoints + +aa64-specific encoding lives in `src/dbg/arch_aa64.c`; everything else +in `src/dbg/bp.c` is arch-neutral. + +- Patch instruction: `BRK #0` (4 bytes on aa64; `0xCC` on x64 later). +- Per-address entry: `{addr, saved_bytes[8], refcount, enabled}`. The + refcount lets the line/next-line state machine drop temporary + breakpoints without disturbing user breakpoints at the same PC. +- Install: `code_write_begin(addr, 4)` → `memcpy` original out → + write `BRK` → `code_write_end` → `flush_icache(addr, 4)`. +- Clear: reverse, refcount-gated. +- On a SIGTRAP, the handler looks up the faulting PC in the table. + Hit → stop kind BREAKPOINT, bp_id from the table. Miss → stop kind + SIGNAL with `signal=SIGTRAP` (program-emitted BRK passes through). 
+- The trap byte is *never* visible to a `read_mem` against the patched + range: the bp table is consulted first and the saved byte is + substituted. `info b` and disassembly stay honest. + +## 8. Displaced single-step + +User-mode aarch64 has no architectural single-step (MDSCR.SS is EL1). +`RESUME_STEP_INSN`, and "resume past a breakpoint", both use an +out-of-line copy. + +- Each session reserves a small executable scratch page (one slot per + worker; 64 bytes is enough for one fixed-up insn + a B back). +- `displaced_prepare(insn_addr)`: + 1. Read 4 bytes of original (from the bp table, not the patched + memory). + 2. Fix up PC-relative operands so the insn behaves at the scratch + address: `B`, `BL`, `B.cond`, `CBZ/CBNZ`, `TBZ/TBNZ`, `ADR`, + `ADRP`, `LDR (literal)`. Branch targets become absolute via a + post-insn `B`/`MOV+BR`; ADR/ADRP get the original PC + substituted; LDR-literal gets converted to a synthesized load + from a fixed-up immediate. + 3. Append a one-shot internal breakpoint at `scratch + brk_offset` + where `brk_offset` is set by the arch fixup (4 for verbatim-copy + forms, larger for multi-insn trampolines that resolve out-of-range + CBZ/TBZ/B.cond, ADR/ADRP literal-loads, and LDR-literal). The + trampoline shape is `cond-branch +8 ; BRK ; LDR x16,[pc+N] ; BR x16 ; + <8-byte literal pool>`. +- Set worker PC to scratch, resume. On the return-slot BRK the handler + restores PC to `insn_addr + 4` and reports the user-visible stop kind + for the mode (STEP_INSN → MODE_DONE; STEP_LINE → continue to next + line-table entry; etc.). +- For indirect branches (`BR`, `BLR`, `RET`) the original insn is copied + verbatim — the trailing BRK never fires because control leaves the + scratch slot. `displaced_finalize` is idempotent; the next + `displaced_prepare` clears any lingering internal bp before laying + down the new shim. 
+ +`STEP_LINE` / `NEXT_LINE` / `STEP_OUT` are state machines on top of +this primitive (`src/dbg/step.c`): + +- STEP_LINE: after each insn step, check `cfree_dwarf_addr_to_line`; + stop when the line index changes and the PC stays inside the + current subprogram. +- NEXT_LINE: same, but if the next insn is a `BL` to a sub at lower + call-depth than the current frame, drop a one-shot bp at the return + address (from `cfree_dwarf_unwind_step` on the current frame) and + CONTINUE instead of stepping. +- STEP_OUT: one-shot bp at the unwound return address; CONTINUE. + +## 9. Memory and register access + +- `session_read_mem`/`session_write_mem` route through + `dbg_os->guarded_copy`, which sets a TLS sigsetjmp landing slot + before the copy and tears it down after. The standard SIGSEGV + handler (installed at session_new) checks the slot first and longjmps + if set; otherwise it falls through to the normal "worker took a + fault" path. This keeps `p` and `x` from killing the worker on a + bad pointer. +- Register access reads/writes the `CfreeUnwindFrame` snapshot the + handler captured. `set_regs` mutates the same slot; the handler + writes it back into the ucontext on resume. +- `cfree_jit_session_get_regs` (currently declared but not even + stubbed in `src/api/`) lands in `src/dbg/session.c` as a simple + copy from the slot. + +## 10. DWARF integration + +The DWARF consumer entries exist and are used by the driver: + +- `cfree_dwarf_line_to_addr` supplies `b file:line` targets. +- `cfree_dwarf_unwind_step` powers `bt` and is reused by `STEP_OUT` / + `NEXT_LINE`. +- `cfree_dwarf_var_at` + `cfree_dwarf_loc_read` drive `p name` and + `set name = ...`; the loc reader calls `session_read_mem`, which is + live. + +The session takes its DWARF handle through +`cfree_jit_session_attach_dwarf(session, debug_info)` — the binding is +optional; source-level resume modes return an error if it's absent. 
+ +**Current gap (Task #4 in §12).** `cfree_jit_view` is still a stub +returning NULL (`src/link/link_jit.c`), so the driver's +`cfree_dwarf_open` call yields NULL and `attach_dwarf` is never called. +Every DWARF-dependent feature in the REPL (`bt`, source-level steps, +typed `p name`, `info locals/args`, `b file:line`) is therefore offline +even though the lib-side wiring is complete. Fix is independent and +self-contained; see §12. + +## 11. Driver-side changes + +Only `driver/env.c` and `driver/dbg.c` change in the driver: + +- `driver/env.c` carries the `g_dbg_os_posix` singleton populated by + `driver_env_init` and exposed through `driver_env_to_cfree`. Sections: + thread shim (`pthread_create` / `pthread_join` / `pthread_kill`), + event shim (`pthread_mutex_t`+`pthread_cond_t` pair + signaled flag + per event), signal install (one `sigaction` per signo with the cohort + blocked in `sa_mask`), code-write window (a per-process registry + `g_jit_dual_map` records each dual-mapped exec reservation as + `{write_base, runtime_base, size}` so `code_write_begin` can return + the write alias for any runtime address inside it; on hosts without + a dual mapping it falls back to a transient `mprotect` RW↔RX flip + on the page span), icache flush (`sys_icache_invalidate` on Apple, + `__builtin___clear_cache` elsewhere), guarded copy (TLS + `sigjmp_buf` + armed flag; the SEGV/BUS handler checks this slot + before delegating to `on_fault`). +- `driver/dbg.c` calls `cfree_jit_session_attach_dwarf(session, + debug_info)` right after `cfree_dwarf_open` succeeds, so source-level + resume modes light up the moment `cfree_jit_view` returns non-NULL. + The degraded-mode warning at `dbg.c:1862-1868` is left in place; it + now only fires on non-aarch64 hosts. + +## 12. Checklist + +Single source of truth for what's done and what's open. Items grouped +by lane; ordered top-to-bottom by priority within each group. 
Add new +work here as a new `[ ]` line; never delete completed lines, just flip +the box. + +### Library — `src/dbg/` + +- [x] `session.c` — worker thread, park/unpark, on_fault classifier +- [x] `bp.c` — refcounted patch table, idempotent set/clear, read overlay +- [x] `mem.c` — guarded read/write via `dbg_os->guarded_copy` +- [x] `displaced.c` — scratch page + per-insn shim primitive +- [x] `arch_aa64.c` — verbatim copy + B / BL / B.cond / CBZ / CBNZ / + TBZ / TBNZ / ADR / ADRP / LDR-lit (W/X/SW) / BR / BLR / RET +- [x] `step.c` — `STEP_LINE` / `NEXT_LINE` / `STEP_OUT` state machines +- [ ] `arch_aa64.c`: LDR-literal vector forms (S/D/Q register dest); + currently decline. Common in optimized builds. +- [ ] `arch_x64.c`: INT3 + RIP-relative fixups for the same insn family +- [ ] `arch_rv64.c`: EBREAK + AUIPC/JAL/branch fixups + +### Public API — `include/cfree.h` + +- [x] `CfreeDbgOs` + `CfreeDbgSignalOps` vtables +- [x] `CfreeEnv.dbg_os` field +- [x] `cfree_jit_session_attach_dwarf(session, debug_info)` +- [x] `cfree_jit_session_get_regs` (was declared without a body) +- [x] All `cfree_jit_session_*` stub bodies deleted from `src/api/stubs.c` + +### Linker bridge — `src/link/link_jit.c` + +- [x] `cfree_jit_image_contains(jit, runtime_addr)` +- [x] `cfree_jit_image_arch(jit)` +- [x] `cfree_jit_compiler(jit)` +- [x] `cfree_jit_sym_iter_*` and `cfree_jit_addr_to_sym` — walk + `LinkImage->syms`, surface FUNC / OBJ / COMMON / TLS / IFUNC / + ABS; names go through `obj_format_demangle_c` so Mach-O's + leading `_` is stripped +- [ ] `cfree_jit_view(jit)` — **blocks all DWARF-dependent REPL + features** (`bt`, typed `p NAME`, `b file:line`, `info + locals/args`, source-level resume modes). Either retain input + `CfreeObjBuilder`(s) on `CfreeJit` with post-link debug-section + relocations applied, or emit a fresh `CfreeBytesInput` after + linking and re-open. PC translation from image-relative to + runtime addresses needed on the DWARF side too. 
+ +### Public arch-register API + +- [x] `cfree_arch_register_count(arch)` + `cfree_arch_register_at(arch, + idx, out)` replace the stubbed `cfree_arch_reg_iter_*` surface. + Stateless and allocation-free; dense indices in `[0, count)`, + unrelated to the sparse DWARF numbering. The previous iter API + is removed from `cfree.h` and `src/api/stubs.c`. + +### Host adapter — `driver/env.c` + +- [x] POSIX `g_dbg_os_posix` singleton wired in `driver_env_init` +- [x] aarch64 ucontext marshalling for macOS (Apple silicon) and Linux +- [x] Dual-mapping registry `g_jit_dual_map` for code-write window +- [x] Single-mapping `mprotect` RW↔RX fallback +- [x] TLS sigsetjmp guarded-copy + SEGV/BUS handler check +- [ ] Windows host: vectored exception handlers + `SetThreadContext` + instead of POSIX signals + +### Driver — `driver/dbg.c` + +- [x] `cfree_jit_session_attach_dwarf` call right after `cfree_dwarf_open` +- [x] Degraded-mode warning still present but only fires on non-aarch64 +- [ ] Remove the warning entirely once x64 / rv64 sessions are real + +### Tests (none landed yet — all verification to date is by-hand REPL) + +- [ ] `test/smoke/dbg_hello`: scripted REPL against a JIT'd C source, + golden-transcript diff. Exercise `b sym`, `r`, `c`, `s`, `x ADDR`, + `p NAME`, `q`. aarch64 hosts only. 
+- [ ] `test/dbg/bp_patch_roundtrip`: install/clear at one address, + verify byte restore, refcount, and `dbg_bp_unpatch_read` overlay +- [ ] `test/dbg/displaced_aa64`: one canned encoding from every + PC-relative family; assert shim bytes + literal pool layout +- [ ] `test/dbg/guarded_copy_segv`: `read_mem` from NULL returns + nonzero, worker survives the next resume +- [ ] `test/dbg/source_step` (gated on `cfree_jit_view`): scripted + REPL drives `n` / `sl` / `finish`, assert reported source line + at each stop + +### Bigger follow-ons + +- [ ] Watchpoints once `CGTarget` can express them without an + ISA-specific debug-register API +- [ ] Multi-thread guests; widen `CfreeDbgOs` with thread enumeration + and per-tid event slots + +### Design note (not a checklist item) + +`cfree_jit_session_free` deliberately leaks the worker thread when +torn down with `state == DBG_STATE_STOPPED`. There is no async-safe +way to unwind a worker parked inside the signal handler without +re-running the program to completion. The session is only freed at +process exit, so the OS reaps the worker; events, signal handlers, +and session memory are left untouched until `_exit`. Keeps +`q`-while-stopped immediate. 
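
As a companion to §8 of doc/DBG.md above, the simplest PC-relative fixup
(an unconditional `B` or `BL`) can be sketched as follows. The helper names
are hypothetical and this is not the code in src/dbg/arch_aa64.c; only the
A64 encoding is assumed (opcode in bits 31..26, signed 26-bit word offset
below it). The idea is to decode the absolute target at the original PC,
then re-encode the branch from the scratch address when the target is
still within the ±128 MiB reach, leaving the out-of-range case to a
trampoline.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the absolute target if insn is B or BL, else 0. */
static int demo_aa64_branch_target(uint32_t insn, uint64_t pc,
                                   uint64_t* target_out) {
  assert(target_out != NULL);
  uint32_t op = insn & 0xFC000000u;
  if (op != 0x14000000u && op != 0x94000000u) return 0;
  int64_t imm26 = (int64_t)(insn & 0x03FFFFFFu);
  if (imm26 & 0x02000000) imm26 -= 0x04000000; /* sign-extend 26 bits */
  *target_out = pc + (uint64_t)(imm26 * 4);    /* offset is in words */
  return 1;
}

/* A B/BL placed at `from` reaches `target` only within +/-128 MiB. */
static int demo_aa64_branch_in_range(uint64_t from, uint64_t target) {
  int64_t delta = (int64_t)(target - from);
  return delta >= -(INT64_C(1) << 27) && delta < (INT64_C(1) << 27);
}

/* Re-encode the same B/BL so it hits `target` when executed at `from`.
 * The caller is expected to have checked the range first. */
static uint32_t demo_aa64_branch_reencode(uint32_t insn, uint64_t from,
                                          uint64_t target) {
  uint64_t delta = target - from; /* two's complement difference */
  return (insn & 0xFC000000u) | ((uint32_t)(delta >> 2) & 0x03FFFFFFu);
}
```

When `demo_aa64_branch_in_range(scratch, target)` is false, a re-encoded
branch cannot reach the target, which is the situation the far trampoline
with a literal pool is described as covering.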
diff --git a/driver/dbg.c b/driver/dbg.c @@ -1253,27 +1253,27 @@ static void dbg_cmd_info_vars(DbgState* s, uint32_t mask, const char* label) { static void dbg_cmd_info_reg(DbgState* s) { CfreeArchKind arch = driver_host_target().arch; - CfreeArchRegIter* it; - CfreeArchReg r; + uint32_t n = cfree_arch_register_count(arch); + uint32_t i; if (!s->has_stop) { driver_errf(DBG_TOOL, "no program is stopped"); return; } - it = cfree_arch_reg_iter_new(arch); - if (!it) { + if (n == 0) { driver_errf(DBG_TOOL, "no register table for this arch"); return; } driver_printf("pc 0x%016llx\n", (unsigned long long)s->last_stop.regs.pc); driver_printf("cfa 0x%016llx\n", (unsigned long long)s->last_stop.regs.cfa); - while (cfree_arch_reg_iter_next(it, &r)) { + for (i = 0; i < n; ++i) { + CfreeArchReg r; + if (cfree_arch_register_at(arch, i, &r) != 0) continue; if (r.dwarf_idx >= 32) continue; /* outside CfreeUnwindFrame.regs */ driver_printf("%-6s 0x%016llx\n", r.name, (unsigned long long)s->last_stop.regs.regs[r.dwarf_idx]); } - cfree_arch_reg_iter_free(it); } /* ============================================================ @@ -1871,6 +1871,9 @@ int driver_dbg(int argc, char** argv) { st.view = cfree_jit_view(jit); if (st.view) { st.dwarf = cfree_dwarf_open(cfree_pipeline_compiler(pipe), st.view); + if (st.dwarf && st.session) { + cfree_jit_session_attach_dwarf(st.session, st.dwarf); + } } dbg_repl(&st); diff --git a/driver/driver.h b/driver/driver.h @@ -73,7 +73,8 @@ typedef struct DriverEnv { CfreeDiagSink* diag; CfreeFileIO file_io; const CfreeExecMem* execmem; - int64_t now; /* unix seconds; -1 = unknown */ + const CfreeDbgOs* dbg_os; /* NULL unless `cfree dbg` paths run */ + int64_t now; /* unix seconds; -1 = unknown */ } DriverEnv; void driver_env_init(DriverEnv*); diff --git a/driver/env.c b/driver/env.c @@ -1,6 +1,24 @@ +/* ucontext.h is technically deprecated by POSIX but signal handlers + * cannot snapshot register state without it. 
macOS gates the header on + * _XOPEN_SOURCE; pair it with _DARWIN_C_SOURCE so MAP_ANON / RTLD_DEFAULT + * (which the SUS feature macros would otherwise strip) stay visible. */ +#if defined(__APPLE__) +#ifndef _XOPEN_SOURCE +#define _XOPEN_SOURCE 600 +#endif +#ifndef _DARWIN_C_SOURCE +#define _DARWIN_C_SOURCE 1 +#endif +#endif +#if defined(__linux__) && !defined(_GNU_SOURCE) +#define _GNU_SOURCE 1 +#endif + #include <dlfcn.h> #include <errno.h> #include <fcntl.h> +#include <pthread.h> +#include <setjmp.h> #include <signal.h> #include <stdarg.h> #include <stdint.h> @@ -10,8 +28,13 @@ #include <sys/mman.h> #include <sys/stat.h> #include <time.h> +#include <ucontext.h> #include <unistd.h> +#if defined(__APPLE__) +#include <libkern/OSCacheControl.h> +#endif + #include "driver.h" /* Dual-mapping back-ends for strict W^X. Picks per-platform: @@ -186,6 +209,66 @@ typedef struct ExecMemToken { size_t size; } ExecMemToken; +/* Registry of EXEC reservations with distinct write/runtime aliases. The + * dbg_os code_write_begin path uses this to translate a runtime address + * into the corresponding write alias on dual-mapping hosts. Single-mapping + * reservations (write == runtime) are not registered. JITs typically hold + * 1-2 reservations live so a linked list keeps the lookup trivial. 
*/ +typedef struct ExecDualNode { + void* write_base; + void* runtime_base; + size_t size; + struct ExecDualNode* next; +} ExecDualNode; + +static ExecDualNode* g_jit_dual_map; +static pthread_mutex_t g_jit_dual_map_mu = PTHREAD_MUTEX_INITIALIZER; + +static void exec_dual_register(void* write_base, void* runtime_base, + size_t size) { + ExecDualNode* n; + if (write_base == runtime_base) return; + n = (ExecDualNode*)malloc(sizeof(*n)); + if (!n) return; /* registry is best-effort; lookup will fail open */ + n->write_base = write_base; + n->runtime_base = runtime_base; + n->size = size; + pthread_mutex_lock(&g_jit_dual_map_mu); + n->next = g_jit_dual_map; + g_jit_dual_map = n; + pthread_mutex_unlock(&g_jit_dual_map_mu); +} + +static void exec_dual_unregister(void* runtime_base) { + ExecDualNode** pp; + pthread_mutex_lock(&g_jit_dual_map_mu); + for (pp = &g_jit_dual_map; *pp; pp = &(*pp)->next) { + if ((*pp)->runtime_base == runtime_base) { + ExecDualNode* dead = *pp; + *pp = dead->next; + free(dead); + break; + } + } + pthread_mutex_unlock(&g_jit_dual_map_mu); +} + +static int exec_dual_lookup(void* runtime_addr, size_t n, void** write_out) { + ExecDualNode* cur; + uintptr_t a = (uintptr_t)runtime_addr; + pthread_mutex_lock(&g_jit_dual_map_mu); + for (cur = g_jit_dual_map; cur; cur = cur->next) { + uintptr_t base = (uintptr_t)cur->runtime_base; + if (a >= base && a + n <= base + cur->size) { + *write_out = (void*)((uintptr_t)cur->write_base + (a - base)); + pthread_mutex_unlock(&g_jit_dual_map_mu); + return 0; + } + } + pthread_mutex_unlock(&g_jit_dual_map_mu); + return 1; +} + static int execmem_reserve_single(size_t size, CfreeExecMemRegion* out) { void* p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0); @@ -239,6 +322,8 @@ static int execmem_reserve_dual_apple(size_t size, CfreeExecMemRegion* out) { tok->runtime_addr = (void*)(uintptr_t)r_addr; tok->size = size; + exec_dual_register(w, (void*)(uintptr_t)r_addr, size); + out->write = w; 
out->runtime = (void*)(uintptr_t)r_addr; out->size = size; @@ -285,6 +370,8 @@ static int execmem_reserve_dual_linux(size_t size, CfreeExecMemRegion* out) { tok->runtime_addr = r; tok->size = size; + exec_dual_register(w, r, size); + out->write = w; out->runtime = r; out->size = size; @@ -321,8 +408,10 @@ static void execmem_release(void* user, CfreeExecMemRegion* region) { if (!region || !region->size) return; if (region->token) { ExecMemToken* tok = (ExecMemToken*)region->token; - if (tok->runtime_addr && tok->runtime_addr != tok->write_addr) + if (tok->runtime_addr && tok->runtime_addr != tok->write_addr) { + exec_dual_unregister(tok->runtime_addr); munmap(tok->runtime_addr, tok->size); + } if (tok->write_addr) munmap(tok->write_addr, tok->size); free(tok); } else if (region->write) { @@ -351,6 +440,363 @@ static size_t driver_host_page_size(void) { static CfreeExecMem g_execmem_posix; /* page_size set in driver_env_init */ +/* ---------------- dbg os (POSIX) ---------------- */ +/* Implements CfreeDbgOs for the `cfree dbg` JIT debugger. v1 supports + * aarch64 on macOS (Apple silicon) and Linux only — the ucontext shape + * and W^X dance are arch/OS specific and a non-aarch64 build would link + * but fail at runtime. */ + +#define DBG_INTERRUPT_SIGNO SIGUSR2 + +/* Single-session process model (one debug target at a time). The signal + * handler reads these from async-signal context; both writes happen in + * signals_install before any signal can arrive, both clears happen in + * signals_uninstall after restoring SIG_DFL. */ +static const CfreeDbgSignalOps* g_dbg_ops; +static void* g_dbg_session; +static pthread_t g_dbg_worker_tid; +static int g_dbg_worker_tid_valid; + +/* Previous dispositions, restored by signals_uninstall. 
*/ +static const int g_dbg_signos[] = {SIGTRAP, SIGSEGV, SIGBUS, + SIGILL, SIGFPE, DBG_INTERRUPT_SIGNO}; +#define DBG_NSIGS ((int)(sizeof(g_dbg_signos) / sizeof(g_dbg_signos[0]))) +static struct sigaction g_dbg_prev_sa[DBG_NSIGS]; +static int g_dbg_installed; + +/* TLS landing slot for guarded_copy. The SEGV/BUS handler checks + * g_guard_armed first; if set it siglongjmps back into guarded_copy + * without touching on_fault. */ +static __thread sigjmp_buf g_guard_buf; +static __thread int g_guard_armed; + +/* --- thread shim --- */ + +typedef struct DbgThread { + pthread_t tid; + void (*fn)(void*); + void* arg; +} DbgThread; + +static void* dbg_thread_trampoline(void* p) { + DbgThread* t = (DbgThread*)p; + t->fn(t->arg); + return NULL; +} + +static int dbg_thread_start(void* user, void (*fn)(void*), void* arg, + void** thread_out) { + DbgThread* t; + (void)user; + t = (DbgThread*)malloc(sizeof(*t)); + if (!t) return 1; + t->fn = fn; + t->arg = arg; + if (pthread_create(&t->tid, NULL, dbg_thread_trampoline, t) != 0) { + free(t); + return 1; + } + g_dbg_worker_tid = t->tid; + g_dbg_worker_tid_valid = 1; + *thread_out = t; + return 0; +} + +static void dbg_thread_join(void* user, void* thread) { + DbgThread* t = (DbgThread*)thread; + (void)user; + if (!t) return; + pthread_join(t->tid, NULL); + g_dbg_worker_tid_valid = 0; + free(t); +} + +static int dbg_thread_interrupt(void* user, void* thread) { + DbgThread* t = (DbgThread*)thread; + (void)user; + if (!t) return 1; + return pthread_kill(t->tid, DBG_INTERRUPT_SIGNO) == 0 ? 
0 : 1; +} + +/* --- event shim --- */ + +typedef struct DbgEvent { + pthread_mutex_t mu; + pthread_cond_t cv; + int signaled; +} DbgEvent; + +static void* dbg_event_new(void* user) { + DbgEvent* e; + (void)user; + e = (DbgEvent*)malloc(sizeof(*e)); + if (!e) return NULL; + if (pthread_mutex_init(&e->mu, NULL) != 0) { + free(e); + return NULL; + } + if (pthread_cond_init(&e->cv, NULL) != 0) { + pthread_mutex_destroy(&e->mu); + free(e); + return NULL; + } + e->signaled = 0; + return e; +} + +static void dbg_event_free(void* user, void* ev) { + DbgEvent* e = (DbgEvent*)ev; + (void)user; + if (!e) return; + pthread_cond_destroy(&e->cv); + pthread_mutex_destroy(&e->mu); + free(e); +} + +static void dbg_event_wait(void* user, void* ev) { + DbgEvent* e = (DbgEvent*)ev; + (void)user; + pthread_mutex_lock(&e->mu); + while (!e->signaled) pthread_cond_wait(&e->cv, &e->mu); + e->signaled = 0; + pthread_mutex_unlock(&e->mu); +} + +/* pthread_cond_broadcast is not formally async-signal-safe per POSIX, but + * it is in practice on glibc and Apple libc when callers manage the + * signal mask carefully. LLDB and rr rely on the same pattern. */ +static void dbg_event_signal(void* user, void* ev) { + DbgEvent* e = (DbgEvent*)ev; + (void)user; + pthread_mutex_lock(&e->mu); + e->signaled = 1; + pthread_cond_broadcast(&e->cv); + pthread_mutex_unlock(&e->mu); +} + +static void dbg_event_reset(void* user, void* ev) { + DbgEvent* e = (DbgEvent*)ev; + (void)user; + pthread_mutex_lock(&e->mu); + e->signaled = 0; + pthread_mutex_unlock(&e->mu); +} + +/* --- signal install + ucontext marshalling --- */ + +/* Marshal ucontext_t <-> CfreeUnwindFrame. aarch64 only in v1; pc lives + * in CfreeUnwindFrame.pc and sp lives in regs[31]. The DWARF + * register numbering on aarch64 puts x0..x30 at 0..30, sp at 31. 
*/ +#if defined(__aarch64__) && defined(__APPLE__) +static void dbg_ucontext_to_frame(const ucontext_t* uc, CfreeUnwindFrame* f) { + const struct __darwin_arm_thread_state64* ss = &uc->uc_mcontext->__ss; + int i; + for (i = 0; i < 29; ++i) f->regs[i] = ss->__x[i]; + f->regs[29] = (uint64_t)ss->__fp; + f->regs[30] = (uint64_t)ss->__lr; + f->regs[31] = (uint64_t)ss->__sp; + f->pc = (uint64_t)ss->__pc; + f->cfa = (uint64_t)ss->__fp; /* DWARF CFI refines this in the session */ +} +static void dbg_frame_to_ucontext(const CfreeUnwindFrame* f, ucontext_t* uc) { + struct __darwin_arm_thread_state64* ss = &uc->uc_mcontext->__ss; + int i; + for (i = 0; i < 29; ++i) ss->__x[i] = f->regs[i]; + ss->__fp = f->regs[29]; + ss->__lr = f->regs[30]; + ss->__sp = f->regs[31]; + ss->__pc = f->pc; +} +#elif defined(__aarch64__) && defined(__linux__) +static void dbg_ucontext_to_frame(const ucontext_t* uc, CfreeUnwindFrame* f) { + const mcontext_t* mc = &uc->uc_mcontext; + int i; + for (i = 0; i < 31; ++i) f->regs[i] = mc->regs[i]; + f->regs[31] = mc->sp; + f->pc = mc->pc; + f->cfa = mc->regs[29]; /* fp; CFI refines */ +} +static void dbg_frame_to_ucontext(const CfreeUnwindFrame* f, ucontext_t* uc) { + mcontext_t* mc = &uc->uc_mcontext; + int i; + for (i = 0; i < 31; ++i) mc->regs[i] = f->regs[i]; + mc->sp = f->regs[31]; + mc->pc = f->pc; +} +#else +#error "cfree dbg v1 supports only aarch64 on macOS or Linux" +#endif + +static void dbg_signal_handler(int signo, siginfo_t* si, void* ucv) { + ucontext_t* uc = (ucontext_t*)ucv; + CfreeUnwindFrame frame; + int rc; + (void)si; + + /* SIGSEGV/SIGBUS during an armed guarded_copy: bail out to the + * sigsetjmp landing slot before the session ever sees the fault. */ + if ((signo == SIGSEGV || signo == SIGBUS) && g_guard_armed) { + g_guard_armed = 0; + siglongjmp(g_guard_buf, 1); + } + + /* Only the registered worker thread participates in stop-the-world. + * Faults on other threads (e.g. the REPL) fall through to the default. 
*/ + if (!g_dbg_worker_tid_valid || + !pthread_equal(pthread_self(), g_dbg_worker_tid) || !g_dbg_ops || + !g_dbg_ops->on_fault) { + int i; + for (i = 0; i < DBG_NSIGS; ++i) { + if (g_dbg_signos[i] == signo) { + sigaction(signo, &g_dbg_prev_sa[i], NULL); + break; + } + } + raise(signo); + return; + } + + dbg_ucontext_to_frame(uc, &frame); + rc = g_dbg_ops->on_fault(g_dbg_session, signo, &frame); + if (rc != 0) { + /* Session declined to handle: restore default and re-raise so the + * host produces a core dump for the original cause. */ + int i; + for (i = 0; i < DBG_NSIGS; ++i) { + if (g_dbg_signos[i] == signo) { + sigaction(signo, &g_dbg_prev_sa[i], NULL); + break; + } + } + raise(signo); + return; + } + dbg_frame_to_ucontext(&frame, uc); +} + +static int dbg_signals_install(void* user, const CfreeDbgSignalOps* ops, + void* session) { + struct sigaction sa; + int i; + (void)user; + if (g_dbg_installed) return 1; + g_dbg_ops = ops; + g_dbg_session = session; + + memset(&sa, 0, sizeof(sa)); + sa.sa_sigaction = dbg_signal_handler; + sa.sa_flags = SA_SIGINFO | SA_RESTART; + sigemptyset(&sa.sa_mask); + /* Block our signal cohort during the handler so nested faults from + * the cond-wait critical region don't recurse. */ + for (i = 0; i < DBG_NSIGS; ++i) sigaddset(&sa.sa_mask, g_dbg_signos[i]); + + for (i = 0; i < DBG_NSIGS; ++i) { + if (sigaction(g_dbg_signos[i], &sa, &g_dbg_prev_sa[i]) != 0) { + /* Roll back what we installed. 
*/ + int j; + for (j = 0; j < i; ++j) sigaction(g_dbg_signos[j], &g_dbg_prev_sa[j], NULL); + g_dbg_ops = NULL; + g_dbg_session = NULL; + return 1; + } + } + g_dbg_installed = 1; + return 0; +} + +static void dbg_signals_uninstall(void* user) { + int i; + (void)user; + if (!g_dbg_installed) return; + for (i = 0; i < DBG_NSIGS; ++i) + sigaction(g_dbg_signos[i], &g_dbg_prev_sa[i], NULL); + g_dbg_installed = 0; + g_dbg_ops = NULL; + g_dbg_session = NULL; +} + +/* --- code write window (W^X dance) --- */ + +#if defined(__linux__) +static size_t dbg_page_floor(size_t v, size_t pg) { return v & ~(pg - 1); } +static size_t dbg_page_ceil(size_t v, size_t pg) { + return (v + pg - 1) & ~(pg - 1); +} +#endif + +static int dbg_code_write_begin(void* user, void* runtime_addr, size_t n, + void** write_out) { + (void)user; + if (!runtime_addr || !n || !write_out) return 1; +#if defined(__APPLE__) + /* Dual-mapped reservation (mach_vm_remap): the write alias is a + * separate VA already RW. Translate via the registry; no protect flip + * is required, so code_write_end is a no-op. */ + return exec_dual_lookup(runtime_addr, n, write_out); +#elif defined(__linux__) + { + size_t pg = driver_host_page_size(); + uintptr_t a = (uintptr_t)runtime_addr; + uintptr_t base = dbg_page_floor(a, pg); + size_t span = dbg_page_ceil((a - base) + n, pg); + /* Linux dual-mapping uses memfd: write alias and runtime alias have + * distinct VAs. Prefer the alias lookup; fall back to a transient + * mprotect of the runtime alias for single-mapping reservations. 
*/ + if (exec_dual_lookup(runtime_addr, n, write_out) == 0) return 0; + if (mprotect((void*)base, span, PROT_READ | PROT_WRITE | PROT_EXEC) != 0) + return 1; + *write_out = runtime_addr; + return 0; + } +#else +#error "cfree dbg v1 supports only macOS and Linux" +#endif +} + +static void dbg_code_write_end(void* user, void* runtime_addr, size_t n) { + (void)user; +#if defined(__APPLE__) + (void)runtime_addr; + (void)n; +#elif defined(__linux__) + { + void* w; + size_t pg = driver_host_page_size(); + uintptr_t a = (uintptr_t)runtime_addr; + uintptr_t base = dbg_page_floor(a, pg); + size_t span = dbg_page_ceil((a - base) + n, pg); + if (exec_dual_lookup(runtime_addr, n, &w) == 0) return; /* dual: nothing to flip back */ + mprotect((void*)base, span, PROT_READ | PROT_EXEC); + } +#endif +} + +static void dbg_flush_icache(void* user, void* runtime_addr, size_t n) { + (void)user; +#if defined(__APPLE__) && defined(__aarch64__) + sys_icache_invalidate(runtime_addr, n); +#else + __builtin___clear_cache((char*)runtime_addr, (char*)runtime_addr + n); +#endif +} + +/* --- guarded copy --- */ + +static int dbg_guarded_copy(void* user, void* dst, const void* src, size_t n) { + (void)user; + if (sigsetjmp(g_guard_buf, 1) != 0) { + g_guard_armed = 0; + return 1; /* SIGSEGV/SIGBUS during the copy */ + } + g_guard_armed = 1; + memcpy(dst, src, n); + g_guard_armed = 0; + return 0; +} + +static CfreeDbgOs g_dbg_os_posix; + /* ---------------- writer (fd-backed) ---------------- */ typedef struct DriverFdWriter { @@ -531,6 +977,24 @@ void driver_env_init(DriverEnv* e) { g_execmem_posix.user = NULL; e->execmem = &g_execmem_posix; + g_dbg_os_posix.thread_start = dbg_thread_start; + g_dbg_os_posix.thread_join = dbg_thread_join; + g_dbg_os_posix.thread_interrupt = dbg_thread_interrupt; + g_dbg_os_posix.event_new = dbg_event_new; + g_dbg_os_posix.event_free = dbg_event_free; + g_dbg_os_posix.event_wait = dbg_event_wait; + g_dbg_os_posix.event_signal = dbg_event_signal; + 
g_dbg_os_posix.event_reset = dbg_event_reset; + g_dbg_os_posix.signals_install = dbg_signals_install; + g_dbg_os_posix.signals_uninstall = dbg_signals_uninstall; + g_dbg_os_posix.interrupt_signo = DBG_INTERRUPT_SIGNO; + g_dbg_os_posix.code_write_begin = dbg_code_write_begin; + g_dbg_os_posix.code_write_end = dbg_code_write_end; + g_dbg_os_posix.flush_icache = dbg_flush_icache; + g_dbg_os_posix.guarded_copy = dbg_guarded_copy; + g_dbg_os_posix.user = NULL; + e->dbg_os = &g_dbg_os_posix; + /* Reproducible-build precedent: SOURCE_DATE_EPOCH wins over wall clock. * If neither is set or the env value doesn't parse, advertise -1 ("no * clock") and pp falls back to C11 placeholders. */ @@ -558,6 +1022,7 @@ CfreeEnv driver_env_to_cfree(const DriverEnv* e) { ce.file_io = &e->file_io; ce.diag = e->diag; ce.execmem = e->execmem; + ce.dbg_os = e->dbg_os; ce.now = e->now; return ce; } diff --git a/include/cfree.h b/include/cfree.h @@ -187,11 +187,13 @@ typedef enum CfreeSymKind { * canonical assembler name (e.g. "rax", "x0", "a0") so dbg can render * `info registers` and accept `set $rax = ...` syntax. * - * Stateless and allocation-free in the lookup paths — name strings are static - * library data. `cfree_arch_register_name` returns NULL for an unmapped - * index; `cfree_arch_register_index` returns 0 on a known name and 1 if the - * name is unknown. The iterator yields named registers in DWARF index order. */ -typedef struct CfreeArchRegIter CfreeArchRegIter; + * Stateless and allocation-free — name strings are static library data. + * `cfree_arch_register_name` returns NULL for an unmapped DWARF index; + * `cfree_arch_register_index` returns 0 on a known name and 1 if the name + * is unknown. To enumerate every register defined for an arch, loop + * 0..cfree_arch_register_count(arch) calling cfree_arch_register_at; the + * iteration indices are dense in `[0, count)` and are unrelated to the + * DWARF indices, which are sparse (e.g. 32..63 are unused on aarch64). 
*/ typedef struct CfreeArchReg { uint32_t dwarf_idx; const char* name; @@ -201,9 +203,8 @@ const char* cfree_arch_register_name(CfreeArchKind, uint32_t dwarf_idx); int cfree_arch_register_index(CfreeArchKind, const char* name, uint32_t* idx_out); -CfreeArchRegIter* cfree_arch_reg_iter_new(CfreeArchKind); -int cfree_arch_reg_iter_next(CfreeArchRegIter*, CfreeArchReg* out); -void cfree_arch_reg_iter_free(CfreeArchRegIter*); +uint32_t cfree_arch_register_count(CfreeArchKind); +int cfree_arch_register_at(CfreeArchKind, uint32_t idx, CfreeArchReg* out); /* ============================================================ * Host environment @@ -279,11 +280,87 @@ typedef struct CfreeExecMem { void* user; } CfreeExecMem; +/* Debugger OS vtable. Required by the JIT session (cfree_jit_session_new) so + * libcfree never includes <pthread.h>, <signal.h>, or platform headers for + * ucontext / W^X flips. May be NULL for hosts that never enter `dbg`. + * + * Threading model: a single worker thread is spawned per session; the REPL + * thread and worker hand off through two events (stop, resume). Signal + * handlers run on the worker thread, snapshot the host ucontext into a + * CfreeUnwindFrame, and call back into the session through on_fault. + * + * thread_start / _join — spawn worker, join on session teardown. + * thread_interrupt — async-signal-safe: deliver `interrupt_signo` + * to the worker thread (used by session_interrupt). + * event_* — one-shot binary events. The session creates two + * per worker; signal/wait must be safe to call + * from the worker's signal-handler context. + * signals_install — install handlers for SIGTRAP/SEGV/BUS/ILL/FPE + * plus `interrupt_signo`. Each handler: + * 1. snapshots ucontext into a CfreeUnwindFrame; + * 2. invokes ops->on_fault(session, signo, &regs); + * 3. on return, writes mutated regs back into + * ucontext before returning to the kernel. 
+ * If on_fault returns nonzero the OS layer + * re-raises the signal to the host default + * (a fault the session declined to handle). + * signals_uninstall — restore prior dispositions on session teardown. + * interrupt_signo — host signal number reserved for STOP_INTERRUPT + * (e.g. SIGUSR2 on POSIX). + * code_write_begin/_end — open a write window over [runtime_addr, + * runtime_addr+n) inside an existing + * CfreeExecMem reservation. *write_out is the + * address through which the session writes the + * BRK / restore bytes. On dual-mapping hosts + * (Apple silicon) it is the write alias; on + * Linux it equals runtime_addr and the OS layer + * mprotect-flips RW<->RX around the window. + * flush_icache — make freshly patched code visible to the CPU + * at the runtime alias. Required on aarch64. + * guarded_copy — read/write `n` bytes between in-process + * addresses with a TLS sigsetjmp landing slot + * so SIGSEGV/SIGBUS during the copy returns + * nonzero instead of stopping the worker. The + * SEGV/BUS handlers in signals_install check + * this landing slot before delegating to + * on_fault. 
*/ +typedef struct CfreeDbgSignalOps { + int (*on_fault)(void* session, int signo, CfreeUnwindFrame* regs); +} CfreeDbgSignalOps; + +typedef struct CfreeDbgOs { + int (*thread_start)(void* user, void (*fn)(void*), void* arg, + void** thread_out); + void (*thread_join)(void* user, void* thread); + int (*thread_interrupt)(void* user, void* thread); + + void* (*event_new)(void* user); + void (*event_free)(void* user, void* ev); + void (*event_wait)(void* user, void* ev); + void (*event_signal)(void* user, void* ev); + void (*event_reset)(void* user, void* ev); + + int (*signals_install)(void* user, const CfreeDbgSignalOps* ops, + void* session); + void (*signals_uninstall)(void* user); + int interrupt_signo; + + int (*code_write_begin)(void* user, void* runtime_addr, size_t n, + void** write_out); + void (*code_write_end)(void* user, void* runtime_addr, size_t n); + void (*flush_icache)(void* user, void* runtime_addr, size_t n); + + int (*guarded_copy)(void* user, void* dst, const void* src, size_t n); + + void* user; +} CfreeDbgOs; + typedef struct CfreeEnv { CfreeHeap* heap; const CfreeFileIO* file_io; /* may be NULL for purely in-memory pipelines */ CfreeDiagSink* diag; const CfreeExecMem* execmem; /* NULL ok unless JIT/emu paths run */ + const CfreeDbgOs* dbg_os; /* NULL ok unless `cfree dbg` paths run */ /* Unix seconds since 1970-01-01 UTC, or negative for "no clock". Used * by the preprocessor for __DATE__ / __TIME__ (negative → C11 §6.10.8.1 * placeholders). The host decides the policy (SOURCE_DATE_EPOCH, @@ -433,6 +510,13 @@ typedef enum CfreeEntryKind { CfreeJitSession* cfree_jit_session_new(CfreeJit*); void cfree_jit_session_free(CfreeJitSession*); +/* Bind a DWARF consumer to the session. Required for the source-level + * resume modes (STEP_LINE, NEXT_LINE, STEP_OUT). The CfreeDebugInfo must + * outlive every session_resume that uses those modes; the session does + * not take ownership and will not free it. Passing NULL detaches. + * Returns 0 on success. 
*/ +int cfree_jit_session_attach_dwarf(CfreeJitSession*, CfreeDebugInfo*); + /* Begin executing `entry` with `argv`. Blocks until the worker stops. * `entry` must be a pointer returned by cfree_jit_lookup (or otherwise * within the JIT image). Returns 0 on success (including an EXIT stop), diff --git a/src/api/arch_regs.c b/src/api/arch_regs.c @@ -1,14 +1,7 @@ /* Public arch register name API. * * Stateless dispatch onto the per-arch register table. v1 wires aarch64 - * only; other arches return NULL / unknown. - * - * The iterator surface uses an opaque handle but the public API doesn't - * supply a heap, and src/ is -ffreestanding (no malloc). v1 keeps the - * iterator as the existing NULL-returning stub and exposes the - * stateless name ↔ index queries for the disassembler and unwinder - * paths. Iterator support can land later by making the iter API take an - * env/heap. */ + * only; other arches return zero / unknown. */ #include <cfree.h> @@ -34,3 +27,23 @@ int cfree_arch_register_index(CfreeArchKind arch, const char* name, return 1; } } + +uint32_t cfree_arch_register_count(CfreeArchKind arch) { + switch (arch) { + case CFREE_ARCH_ARM_64: + return aa64_register_iter_size(); + default: + return 0; + } +} + +int cfree_arch_register_at(CfreeArchKind arch, uint32_t idx, + CfreeArchReg* out) { + if (!out) return 1; + switch (arch) { + case CFREE_ARCH_ARM_64: + return aa64_register_iter_get(idx, &out->dwarf_idx, &out->name); + default: + return 1; + } +} diff --git a/src/api/stubs.c b/src/api/stubs.c @@ -104,96 +104,17 @@ int cfree_dep_iter_next(CfreeDepIter* it, CfreeDepEdge* o) { void cfree_dep_iter_free(CfreeDepIter* it) { (void)it; } /* Disassembler is real (src/api/disasm.c, src/arch/disasm.c, - * src/arch/aa64_disasm.c). Per-arch register name lookups are real - * (src/api/arch_regs.c + src/arch/aa64_regs.c). The reg-name iterator - * still has no heap supply via the public API, so its stub remains. 
*/ - -struct CfreeArchRegIter { - int _; -}; -CfreeArchRegIter* cfree_arch_reg_iter_new(CfreeArchKind a) { - (void)a; - return 0; -} -int cfree_arch_reg_iter_next(CfreeArchRegIter* it, CfreeArchReg* o) { - (void)it; - (void)o; - return 0; -} -void cfree_arch_reg_iter_free(CfreeArchRegIter* it) { (void)it; } + * src/arch/aa64_disasm.c). Per-arch register name lookups and the + * indexed enumeration (cfree_arch_register_count / _at) are real + * (src/api/arch_regs.c + src/arch/aa64_regs.c). */ /* Linker script parsing lives in src/link/link_script.c. */ /* JIT lookup, view, addr_to_sym, and the symbol iterator live in * src/link/link_jit.c. */ -CfreeJitSession* cfree_jit_session_new(CfreeJit* j) { - (void)j; - return 0; -} -void cfree_jit_session_free(CfreeJitSession* s) { (void)s; } -int cfree_jit_session_call(CfreeJitSession* s, void* e, CfreeEntryKind k, - int ac, char** av, CfreeStopInfo* o) { - (void)s; - (void)e; - (void)k; - (void)ac; - (void)av; - (void)o; - return 1; -} -int cfree_jit_session_resume(CfreeJitSession* s, CfreeResumeMode m, - CfreeStopInfo* o) { - (void)s; - (void)m; - (void)o; - return 1; -} -int cfree_jit_session_interrupt(CfreeJitSession* s) { - (void)s; - return 1; -} -int cfree_jit_session_read_mem(CfreeJitSession* s, uint64_t a, void* d, - size_t n) { - (void)s; - (void)a; - (void)d; - (void)n; - return 1; -} -int cfree_jit_session_write_mem(CfreeJitSession* s, uint64_t a, const void* d, - size_t n) { - (void)s; - (void)a; - (void)d; - (void)n; - return 1; -} -int cfree_jit_session_set_regs(CfreeJitSession* s, const CfreeUnwindFrame* f) { - (void)s; - (void)f; - return 1; -} -int cfree_jit_session_breakpoint_set(CfreeJitSession* s, uint64_t a, - uint32_t* o) { - (void)s; - (void)a; - (void)o; - return 1; -} -int cfree_jit_session_breakpoint_clear(CfreeJitSession* s, uint32_t id) { - (void)s; - (void)id; - return 1; -} -int cfree_jit_session_breakpoint_set_spec(CfreeJitSession* s, - const CfreeBreakpointSpec* sp, - uint32_t* o) { - 
(void)s; - (void)sp; - (void)o; - return 1; -} +/* JIT session implementation lives in src/dbg/ (session.c, bp.c, step.c, + * displaced.c, arch_aa64.c, mem.c). */ /* DWARF consumer: the cfree_dwarf_* implementations live in src/dwarf/. * Their stubs were removed when src/dwarf/dwarf_*.c took ownership of diff --git a/src/dbg/arch_aa64.c b/src/dbg/arch_aa64.c @@ -0,0 +1,235 @@ +/* AArch64 lifter for the displaced-step shim. + * + * Lays out a fixed-up copy of one insn in the session scratch slot + * (DBG_DISPLACED_SLOT_BYTES bytes), followed by a BRK sentinel the + * session arms an internal bp on. + * + * Supported families: + * - any insn with no PC-relative operand (copied verbatim); + * - B / BL / B.cond — re-encode the immediate; + * - CBZ / CBNZ / TBZ / TBNZ — always emit a trampoline: + * slot[0] cond-branch +2 words (taken → slot+8) + * slot[4] BRK (not-taken fallthrough) + * slot[8] LDR x16, =target + * slot[12] BR x16 + * slot[16] literal pool (8 bytes, absolute target) + * - ADR / ADRP — replace with LDR Xd, =target: + * slot[0] LDR Xd, =target + * slot[4] BRK + * slot[8] literal pool (8 bytes) + * - LDR (literal), integer/LDRSW — synthesize indirect load: + * slot[0] LDR x16, =literal_addr + * slot[4] LDR Xt/Wt/LDRSW Xt, [x16] + * slot[8] BRK + * slot[12] literal pool (8 bytes, absolute literal addr) + * - BR / BLR / RET — copied verbatim; the BRK after never + * fires because the indirect branch transfers control. The session's + * stale internal_bp is cleared by the next prepare; finalize gates on + * PC == return_pc so it stays a no-op when control left the slot. 
*/ + +#include "dbg/dbg.h" + +#include <string.h> + +#include "arch/aa64_isa.h" + +#define SHIM_X16 16u /* IP0; safe to clobber inside a shim */ + +uint32_t dbg_aa64_brk_word(void) { + return aa64_brk(0); +} + +static int fits_signed(int64_t v, int bits) { + int64_t lim = (int64_t)1 << (bits - 1); + return v >= -lim && v < lim; +} + +/* LDR (literal) for integer Xt: opc=01, V=0, fixed bits 011_0_00. + * 01 011 0 00 imm19 Rt → 0x58000000 | (imm19<<5) | Rt + * imm19 is the signed word offset from the LDR's own PC. */ +static uint32_t enc_ldr_lit_x(uint32_t Rt, int32_t imm19) { + return 0x58000000u | (((uint32_t)imm19 & 0x7ffffu) << 5) | (Rt & 0x1fu); +} +/* LDR Xt, [Xn, #0] / LDR Wt, [Xn, #0] / LDRSW Xt, [Xn, #0]. */ +static uint32_t enc_ldr64_reg(uint32_t Rt, uint32_t Rn) { + return aa64_ldr64_uimm12(Rt, Rn, 0); +} +static uint32_t enc_ldr32_reg(uint32_t Rt, uint32_t Rn) { + return aa64_ldst_uimm_pack((AA64LdStUimm){ + .size = 2, .V = 0, .opc = AA64_LDST_OPC_LDR, .imm12 = 0, .Rn = Rn, + .Rt = Rt}); +} +static uint32_t enc_ldrsw_reg(uint32_t Rt, uint32_t Rn) { + return aa64_ldst_uimm_pack((AA64LdStUimm){ + .size = 2, .V = 0, .opc = 2, .imm12 = 0, .Rn = Rn, .Rt = Rt}); +} + +static void put_u32(uint8_t* w, uint32_t off, uint32_t v) { + memcpy(w + off, &v, sizeof(v)); +} +static void put_u64(uint8_t* w, uint32_t off, uint64_t v) { + memcpy(w + off, &v, sizeof(v)); +} + +/* Sign-extend a `bits`-wide field whose raw value is `v`. 
*/ +static int64_t sign_extend(uint64_t v, int bits) { + uint64_t m = 1ull << (bits - 1); + return (int64_t)((v ^ m) - m); +} + +int dbg_aa64_build_shim(uint32_t orig_insn, uint64_t orig_pc, + void* scratch_write, uint64_t scratch_runtime, + u32* shim_len) { + uint8_t* w = (uint8_t*)scratch_write; + uint32_t brk = aa64_brk(0); + int64_t pc_delta; + if (!shim_len) return 1; + *shim_len = 0; + pc_delta = (int64_t)orig_pc - (int64_t)scratch_runtime; + + /* ---- B / BL (imm26) ------------------------------------------------ */ + if ((orig_insn & 0x7C000000u) == 0x14000000u) { + AA64BrImm f = aa64_brimm_unpack(orig_insn); + int64_t imm = sign_extend(f.imm26, 26); + int64_t new_off = imm * 4 + pc_delta; + if ((new_off & 3) || !fits_signed(new_off / 4, 26)) { + /* Out of B/BL range from scratch: falling back to an LDR x30/PC + * trick is messy for BL (need to preserve LR). Decline. */ + return 1; + } + f.imm26 = (uint32_t)((new_off / 4) & 0x3ffffffu); + put_u32(w, 0, aa64_brimm_pack(f)); + put_u32(w, 4, brk); + *shim_len = 4; + return 0; + } + + /* ---- B.cond (imm19) ------------------------------------------------ */ + if ((orig_insn & 0xFF000010u) == 0x54000000u) { + AA64BrCond f = aa64_brcond_unpack(orig_insn); + int64_t imm = sign_extend(f.imm19, 19); + int64_t new_off = imm * 4 + pc_delta; + if ((new_off & 3) || !fits_signed(new_off / 4, 19)) { + /* Synthesize: B.cond +8 (skip BRK) ; BRK ; LDR x16,=tgt ; BR x16 ; + * literal. The "taken" path branches to slot+8, the "not-taken" + * path falls through to BRK at slot+4. 
*/ + uint64_t target = orig_pc + (uint64_t)(imm * 4); + AA64BrCond nf; + nf.cond = f.cond; + nf.imm19 = 2u; /* +8 bytes from slot[0] → slot[8] */ + put_u32(w, 0, aa64_brcond_pack(nf)); + put_u32(w, 4, brk); + put_u32(w, 8, enc_ldr_lit_x(SHIM_X16, 2)); /* LDR x16, [pc+8] = slot[16] */ + put_u32(w, 12, aa64_br(SHIM_X16)); + put_u64(w, 16, target); + *shim_len = 4; + return 0; + } + f.imm19 = (uint32_t)((new_off / 4) & 0x7ffffu); + put_u32(w, 0, aa64_brcond_pack(f)); + put_u32(w, 4, brk); + *shim_len = 4; + return 0; + } + + /* ---- CBZ / CBNZ (imm19) — always trampoline form ------------------- */ + if ((orig_insn & 0x7E000000u) == 0x34000000u) { + AA64CB f = aa64_cb_unpack(orig_insn); + int64_t imm = sign_extend(f.imm19, 19); + uint64_t target = orig_pc + (uint64_t)(imm * 4); + AA64CB nf = f; + nf.imm19 = 2u; /* +8 → slot[8] */ + put_u32(w, 0, aa64_cb_pack(nf)); + put_u32(w, 4, brk); + put_u32(w, 8, enc_ldr_lit_x(SHIM_X16, 2)); + put_u32(w, 12, aa64_br(SHIM_X16)); + put_u64(w, 16, target); + *shim_len = 4; + return 0; + } + + /* ---- TBZ / TBNZ (imm14) — always trampoline ------------------------ + * Field layout: + * b5(31) 011011(30..25) op(24) b40(23..19) imm14(18..5) Rt(4..0). 
*/ + if ((orig_insn & 0x7E000000u) == 0x36000000u) { + uint32_t b5 = (orig_insn >> 31) & 1u; + uint32_t op = (orig_insn >> 24) & 1u; + uint32_t b40 = (orig_insn >> 19) & 0x1fu; + uint32_t Rt = orig_insn & 0x1fu; + uint32_t imm14_raw = (orig_insn >> 5) & 0x3fffu; + int64_t imm = sign_extend(imm14_raw, 14); + uint64_t target = orig_pc + (uint64_t)(imm * 4); + uint32_t new_imm14 = 2u; /* +8 → slot[8] */ + uint32_t new_word = + (b5 << 31) | 0x36000000u | (op << 24) | (b40 << 19) | + ((new_imm14 & 0x3fffu) << 5) | (Rt & 0x1fu); + put_u32(w, 0, new_word); + put_u32(w, 4, brk); + put_u32(w, 8, enc_ldr_lit_x(SHIM_X16, 2)); + put_u32(w, 12, aa64_br(SHIM_X16)); + put_u64(w, 16, target); + *shim_len = 4; + return 0; + } + + /* ---- ADR / ADRP ---------------------------------------------------- */ + if ((orig_insn & 0x1F000000u) == 0x10000000u) { + AA64PCRelAdr f = aa64_pcrel_adr_unpack(orig_insn); + uint64_t imm_raw = ((uint64_t)f.immhi << 2) | (uint64_t)f.immlo; + int64_t imm21 = sign_extend(imm_raw, 21); + uint64_t target; + if (f.op == AA64_ADR_OP_ADRP) { + target = (orig_pc & ~(uint64_t)0xFFF) + ((uint64_t)imm21 << 12); + } else { + target = orig_pc + (uint64_t)imm21; + } + /* LDR Xd, [pc + 8] — the literal sits at slot[8]. 
*/ + put_u32(w, 0, enc_ldr_lit_x(f.Rd, 2)); + put_u32(w, 4, brk); + put_u64(w, 8, target); + *shim_len = 4; + return 0; + } + + /* ---- LDR (literal) — integer & LDRSW only -------------------------- */ + if ((orig_insn & 0x3B000000u) == 0x18000000u) { + uint32_t opc = (orig_insn >> 30) & 3u; + uint32_t V = (orig_insn >> 26) & 1u; + uint32_t Rt = orig_insn & 0x1fu; + uint32_t imm19_raw = (orig_insn >> 5) & 0x7ffffu; + int64_t imm19 = sign_extend(imm19_raw, 19); + uint64_t literal_addr = orig_pc + (uint64_t)(imm19 * 4); + uint32_t load_insn; + if (V) return 1; /* vector forms (S/D/Q): not supported in v1 */ + switch (opc) { + case 0: load_insn = enc_ldr32_reg(Rt, SHIM_X16); break; /* LDR Wt */ + case 1: load_insn = enc_ldr64_reg(Rt, SHIM_X16); break; /* LDR Xt */ + case 2: load_insn = enc_ldrsw_reg(Rt, SHIM_X16); break; /* LDRSW */ + default: return 1; /* PRFM (literal): not meaningful here */ + } + /* LDR x16, [pc + 12] — literal at slot[12]. */ + put_u32(w, 0, enc_ldr_lit_x(SHIM_X16, 3)); + put_u32(w, 4, load_insn); + put_u32(w, 8, brk); + put_u64(w, 12, literal_addr); + *shim_len = 8; + return 0; + } + + /* ---- BR / BLR / RET (indirect) ------------------------------------- */ + if ((orig_insn & 0xFE1FFC1Fu) == AA64_BR_REG_FAMILY_MATCH) { + /* Copy verbatim; the BRK after will not fire because control + * transfers to the register target. The session clears the stale + * internal bp on the next prepare. */ + put_u32(w, 0, orig_insn); + put_u32(w, 4, brk); + *shim_len = 4; + return 0; + } + + /* ---- default: no PC-relative operand — copy verbatim --------------- */ + put_u32(w, 0, orig_insn); + put_u32(w, 4, brk); + *shim_len = 4; + return 0; +} diff --git a/src/dbg/bp.c b/src/dbg/bp.c @@ -0,0 +1,224 @@ +/* Breakpoint table for the JIT debugger session. + * + * Keyed by runtime address. 
Each slot owns the bytes overwritten by the BRK + * patch, a refcount (so step.c can drop temporaries without disturbing user + * bps at the same PC), and a monotonic user-visible id. Reads of the + * patched range substitute the saved bytes so `x` and disasm stay honest. */ + +#include "dbg/dbg.h" + +#include <string.h> + +#include "core/heap.h" + +static u32 bp_find_slot(DbgBpTable* t, uint64_t addr) { + u32 i; + for (i = 0; i < t->cap; ++i) { + if (t->slots[i].user_id != 0 && t->slots[i].addr == addr) return i + 1; + } + return 0; +} + +static u32 bp_alloc_slot(CfreeJitSession* s) { + DbgBpTable* t = &s->bps; + u32 i; + for (i = 0; i < t->cap; ++i) { + if (t->slots[i].user_id == 0) return i; + } + /* grow */ + { + u32 ncap = t->cap ? t->cap * 2 : 16; + DbgBp* nslots = (DbgBp*)s->heap->alloc(s->heap, sizeof(DbgBp) * ncap, + _Alignof(DbgBp)); + if (!nslots) return (u32)-1; + memset(nslots, 0, sizeof(DbgBp) * ncap); + if (t->slots) { + memcpy(nslots, t->slots, sizeof(DbgBp) * t->cap); + s->heap->free(s->heap, t->slots, sizeof(DbgBp) * t->cap); + } + t->slots = nslots; + i = t->cap; + t->cap = ncap; + return i; + } +} + +void dbg_bp_init(CfreeJitSession* s) { + memset(&s->bps, 0, sizeof(s->bps)); + s->bps.next_user_id = 1; + s->bps.next_internal_id = DBG_BP_ID_INTERNAL_BASE; +} + +void dbg_bp_fini(CfreeJitSession* s) { + DbgBpTable* t = &s->bps; + u32 i; + if (t->slots) { + /* Restore any still-armed patches so the JIT image is left clean. 
*/ + for (i = 0; i < t->cap; ++i) { + DbgBp* b = &t->slots[i]; + if (b->user_id != 0 && b->enabled && b->saved_len) { + void* write_addr = NULL; + if (s->os->code_write_begin(s->os->user, (void*)(uintptr_t)b->addr, + b->saved_len, &write_addr) == 0 && + write_addr) { + memcpy(write_addr, b->saved, b->saved_len); + s->os->code_write_end(s->os->user, (void*)(uintptr_t)b->addr, + b->saved_len); + if (s->os->flush_icache) + s->os->flush_icache(s->os->user, (void*)(uintptr_t)b->addr, + b->saved_len); + } + } + } + s->heap->free(s->heap, t->slots, sizeof(DbgBp) * t->cap); + t->slots = NULL; + t->cap = 0; + } +} + +static int bp_install_patch(CfreeJitSession* s, DbgBp* b) { + void* write_addr = NULL; + uint32_t brk; + if (s->arch != CFREE_ARCH_ARM_64) return 1; + brk = dbg_aa64_brk_word(); + b->saved_len = DBG_AA64_INSN_LEN; + if (s->os->code_write_begin(s->os->user, (void*)(uintptr_t)b->addr, + b->saved_len, &write_addr) != 0 || + !write_addr) { + return 1; + } + memcpy(b->saved, write_addr, b->saved_len); + memcpy(write_addr, &brk, sizeof(brk)); + s->os->code_write_end(s->os->user, (void*)(uintptr_t)b->addr, b->saved_len); + if (s->os->flush_icache) + s->os->flush_icache(s->os->user, (void*)(uintptr_t)b->addr, b->saved_len); + b->enabled = 1; + return 0; +} + +static void bp_remove_patch(CfreeJitSession* s, DbgBp* b) { + void* write_addr = NULL; + if (!b->enabled || !b->saved_len) return; + if (s->os->code_write_begin(s->os->user, (void*)(uintptr_t)b->addr, + b->saved_len, &write_addr) != 0 || + !write_addr) { + return; + } + memcpy(write_addr, b->saved, b->saved_len); + s->os->code_write_end(s->os->user, (void*)(uintptr_t)b->addr, b->saved_len); + if (s->os->flush_icache) + s->os->flush_icache(s->os->user, (void*)(uintptr_t)b->addr, b->saved_len); + b->enabled = 0; +} + +static int bp_set_common(CfreeJitSession* s, const CfreeBreakpointSpec* spec, + int internal, u32* id_out) { + uint64_t addr = spec->addr; + u32 slot; + DbgBp* b; + /* Internal bps may live in the 
displaced-step scratch page, which is
+   * outside the JIT image; only user bps are constrained to image range. */
+  if (!internal && !cfree_jit_image_contains(s->jit, addr)) return 1;
+
+  /* idempotent: existing slot at this address bumps refcount and returns
+   * its already-issued id. */
+  slot = bp_find_slot(&s->bps, addr);
+  if (slot) {
+    b = &s->bps.slots[slot - 1];
+    b->refcount++;
+    if (id_out) *id_out = b->user_id;
+    return 0;
+  }
+
+  {
+    u32 i = bp_alloc_slot(s);
+    if (i == (u32)-1) return 1;
+    b = &s->bps.slots[i];
+    memset(b, 0, sizeof(*b));
+    b->addr = addr;
+    b->refcount = 1;
+    b->skip_count = spec->skip_count;
+    b->max_hits = spec->max_hits;
+    b->condition = spec->condition;
+    b->condition_user = spec->condition_user;
+    b->internal = (u8)(internal ? 1 : 0);
+    if (internal) {
+      b->user_id = s->bps.next_internal_id++;
+    } else {
+      b->user_id = s->bps.next_user_id++;
+    }
+    if (bp_install_patch(s, b) != 0) {
+      memset(b, 0, sizeof(*b));
+      return 1;
+    }
+    s->bps.count++;
+    if (id_out) *id_out = b->user_id;
+  }
+  return 0;
+}
+
+int dbg_bp_set(CfreeJitSession* s, uint64_t addr, u32* id_out) {
+  CfreeBreakpointSpec spec;
+  memset(&spec, 0, sizeof(spec));
+  spec.addr = addr;
+  return bp_set_common(s, &spec, 0, id_out);
+}
+
+int dbg_bp_set_spec(CfreeJitSession* s, const CfreeBreakpointSpec* spec,
+                    u32* id_out) {
+  if (!spec) return 1;
+  return bp_set_common(s, spec, 0, id_out);
+}
+
+int dbg_bp_set_internal(CfreeJitSession* s, uint64_t addr, u32* id_out) {
+  CfreeBreakpointSpec spec;
+  memset(&spec, 0, sizeof(spec));
+  spec.addr = addr;
+  return bp_set_common(s, &spec, 1, id_out);
+}
+
+int dbg_bp_clear(CfreeJitSession* s, u32 id) {
+  u32 i;
+  if (id == 0) return 0;
+  for (i = 0; i < s->bps.cap; ++i) {
+    DbgBp* b = &s->bps.slots[i];
+    if (b->user_id != id) continue;
+    if (b->refcount > 1) {
+      b->refcount--;
+      return 0;
+    }
+    bp_remove_patch(s, b);
+    memset(b, 0, sizeof(*b));
+    s->bps.count--;
+    return 0;
+  }
+  return 0; /* silent on unknown id, per contract */
+}
+
+u32 dbg_bp_lookup_index(CfreeJitSession* s, uint64_t addr) {
+  return bp_find_slot(&s->bps, addr);
+}
+
+DbgBp* dbg_bp_at_index(CfreeJitSession* s, u32 idx) {
+  if (idx == 0 || idx > s->bps.cap) return NULL;
+  return &s->bps.slots[idx - 1];
+}
+
+void dbg_bp_unpatch_read(CfreeJitSession* s, uint64_t addr, void* buf,
+                         size_t n) {
+  u32 i;
+  u8* out = (u8*)buf;
+  uint64_t end = addr + n;
+  for (i = 0; i < s->bps.cap; ++i) {
+    DbgBp* b = &s->bps.slots[i];
+    uint64_t bp_end;
+    uint64_t lo;
+    uint64_t hi;
+    if (b->user_id == 0 || !b->enabled || !b->saved_len) continue;
+    bp_end = b->addr + b->saved_len;
+    if (bp_end <= addr || b->addr >= end) continue;
+    lo = b->addr > addr ? b->addr : addr;
+    hi = bp_end < end ? bp_end : end;
+    memcpy(out + (lo - addr), b->saved + (lo - b->addr), (size_t)(hi - lo));
+  }
+}
diff --git a/src/dbg/dbg.h b/src/dbg/dbg.h
@@ -0,0 +1,183 @@
+#ifndef CFREE_DBG_INTERNAL_H
+#define CFREE_DBG_INTERNAL_H
+
+/* Internal contracts for src/dbg/. The public CfreeJitSession entries are
+ * defined in session.c on top of these primitives; bp.c, step.c, mem.c,
+ * displaced.c, and arch_aa64.c each own one slice. */
+
+#include <cfree.h>
+
+#include "core/core.h"
+
+#define DBG_SCRATCH_PAGE_SIZE 4096u
+#define DBG_BP_MAX_INSN_LEN 8u
+#define DBG_BP_ID_INTERNAL_BASE 0x80000000u
+#define DBG_AA64_INSN_LEN 4u
+#define DBG_DISPLACED_SLOT_BYTES 64u
+
+/* Bridge into link_jit.c so the session can validate addresses and pick the
+ * arch lifter without dragging LinkImage internals into src/dbg/.
*/
+int cfree_jit_image_contains(CfreeJit*, uint64_t runtime_addr);
+CfreeArchKind cfree_jit_image_arch(CfreeJit*);
+Compiler* cfree_jit_compiler(CfreeJit*);
+
+/* ---- breakpoint table ------------------------------------------------ */
+
+typedef struct DbgBp {
+  uint64_t addr;
+  u8 saved[DBG_BP_MAX_INSN_LEN];
+  u32 saved_len;
+  u32 refcount;
+  u32 user_id; /* public handle returned to caller; 0 = unused slot */
+  u8 enabled;
+  u8 internal; /* set when the entry was armed by step.c */
+  u16 pad;
+  uint64_t hit_count;
+  uint64_t skip_count;
+  uint64_t max_hits;
+  int (*condition)(void*, const CfreeUnwindFrame*);
+  void* condition_user;
+} DbgBp;
+
+typedef struct DbgBpTable {
+  DbgBp* slots;
+  u32 cap;
+  u32 count;
+  u32 next_user_id; /* monotonic, starts at 1 */
+  u32 next_internal_id; /* monotonic, starts at DBG_BP_ID_INTERNAL_BASE */
+} DbgBpTable;
+
+struct CfreeJitSession; /* fwd */
+
+void dbg_bp_init(struct CfreeJitSession*);
+void dbg_bp_fini(struct CfreeJitSession*);
+
+/* set/clear with the user-facing handle space. The internal variants are
+ * used by step.c for one-shot temporaries. */
+int dbg_bp_set(struct CfreeJitSession*, uint64_t addr, u32* id_out);
+int dbg_bp_set_spec(struct CfreeJitSession*, const CfreeBreakpointSpec*,
+                    u32* id_out);
+int dbg_bp_set_internal(struct CfreeJitSession*, uint64_t addr, u32* id_out);
+int dbg_bp_clear(struct CfreeJitSession*, u32 id);
+
+/* Lookup at a PC. Returns the slot index + 1 (so 0 means "not patched");
+ * the caller uses dbg_bp_at_index to fetch the entry. */
+u32 dbg_bp_lookup_index(struct CfreeJitSession*, uint64_t addr);
+DbgBp* dbg_bp_at_index(struct CfreeJitSession*, u32 idx);
+
+/* Memory-read fixup: if [addr, addr+n) overlaps any patched bp, write the
+ * original bytes back into `buf` at the right offsets.
*/
+void dbg_bp_unpatch_read(struct CfreeJitSession*, uint64_t addr, void* buf,
+                         size_t n);
+
+/* ---- memory --------------------------------------------------------- */
+int dbg_mem_read(struct CfreeJitSession*, uint64_t addr, void* dst, size_t n);
+int dbg_mem_write(struct CfreeJitSession*, uint64_t addr, const void* src,
+                  size_t n);
+
+/* ---- displaced step ------------------------------------------------- */
+/* The session owns a single executable scratch region. arch_aa64.c writes
+ * a fixed-up copy of the original insn plus a return-shim into it; the
+ * worker is then resumed with PC pointing at the scratch entry. The shim
+ * ends with a BRK that the fault classifier recognizes (via the bp table)
+ * and translates back into MODE_DONE. */
+typedef struct DbgDisplaced {
+  CfreeExecMemRegion region;
+  int valid;
+  uint64_t orig_pc; /* original user PC of the insn being stepped */
+  uint64_t return_pc; /* PC the shim's BRK lives at (= scratch + N) */
+  uint32_t internal_bp; /* id of the one-shot bp at return_pc */
+} DbgDisplaced;
+
+int dbg_displaced_init(struct CfreeJitSession*);
+void dbg_displaced_fini(struct CfreeJitSession*);
+
+/* Prepare an out-of-line single-step at `insn_pc`. Sets *new_pc to the
+ * scratch entry the worker should branch to; arms an internal bp on the
+ * shim's BRK. Returns 0 on success, 1 if the insn family is not supported. */
+int dbg_displaced_prepare(struct CfreeJitSession*, uint64_t insn_pc,
+                          uint64_t* new_pc);
+/* After the shim BRK fires, finalize: clear the internal bp, restore the
+ * user-visible PC to insn_pc + 4 (or branch target captured by the shim). */
+void dbg_displaced_finalize(struct CfreeJitSession*);
+
+/* ---- arch-aa64 ------------------------------------------------------ */
+uint32_t dbg_aa64_brk_word(void);
+/* Lay down a displaced-step shim for the 4-byte AArch64 insn `orig_insn`
+ * (originally at `orig_pc`) into the scratch buffer beginning at
+ * `scratch_runtime`.
Writes bytes through `scratch_write` (write alias).
+ * On success returns 0 and sets *brk_offset to the byte offset of the BRK
+ * sentinel from `scratch_runtime`; the caller arms an internal bp at
+ * `scratch_runtime + *brk_offset` and flushes the whole slot. Returns 1
+ * for unsupported instruction families. */
+int dbg_aa64_build_shim(uint32_t orig_insn, uint64_t orig_pc,
+                        void* scratch_write, uint64_t scratch_runtime,
+                        u32* brk_offset);
+
+/* ---- step state machine --------------------------------------------- */
+int dbg_step_resume(struct CfreeJitSession*, CfreeResumeMode mode);
+
+/* ---- session state -------------------------------------------------- */
+typedef enum DbgSessionState {
+  DBG_STATE_IDLE = 0, /* no worker call in flight */
+  DBG_STATE_RUNNING = 1, /* worker has been signaled to run */
+  DBG_STATE_STOPPED = 2, /* worker is parked, REPL may inspect */
+  DBG_STATE_EXITED = 3, /* worker entry returned */
+} DbgSessionState;
+
+struct CfreeJitSession {
+  CfreeJit* jit;
+  Compiler* c;
+  Heap* heap;
+  const CfreeDbgOs* os;
+  CfreeArchKind arch;
+
+  /* worker thread + event handshake */
+  void* worker;
+  void* ev_resume;
+  void* ev_stop;
+  DbgSessionState state;
+  u8 interrupt_pending;
+  u8 worker_alive;
+  u8 worker_should_exit;
+  u8 pad0;
+
+  /* entry args set by _call */
+  void* entry;
+  CfreeEntryKind entry_kind;
+  int entry_argc;
+  char** entry_argv;
+  int entry_ret;
+
+  /* current stop slot (filled by the fault handler / worker exit path) */
+  CfreeStopInfo stop;
+  CfreeUnwindFrame regs_scratch; /* used by signal handler */
+
+  /* pending resume directive (set by REPL before signaling ev_resume) */
+  CfreeResumeMode pending_mode;
+  uint64_t pending_pc_override; /* nonzero → write before resume */
+  u8 pending_has_pc;
+  u8 pending_step_pending; /* MODE_STEP_INSN in progress via displaced */
+  u8 pad1[2];
+
+  /* breakpoint table */
+  DbgBpTable bps;
+
+  /* displaced-step scratch */
+  DbgDisplaced displaced;
+
+  /* optional DWARF binding
(caller-owned; needed for source-level steps) */
+  CfreeDebugInfo* dwarf;
+
+  /* set by dbg_step_resume when it has already driven the worker through
+   * its own signal/wait cycles; tells cfree_jit_session_resume not to
+   * issue another resume. */
+  u8 pending_done;
+  u8 pad2[3];
+};
+
+/* internal helpers shared between session.c and step.c */
+int dbg_session_wait_stop(struct CfreeJitSession*);
+int dbg_session_signal_resume(struct CfreeJitSession*);
+
+#endif
diff --git a/src/dbg/displaced.c b/src/dbg/displaced.c
@@ -0,0 +1,121 @@
+/* Displaced single-step plumbing.
+ *
+ * Reserves a single executable page (W^X dual-mapped via env->execmem)
+ * the first time STEP_INSN is requested. The per-arch lifter copies a
+ * fixed-up version of the instruction at insn_pc into that page, followed
+ * by a BRK sentinel; the session arms an internal breakpoint on the
+ * sentinel and resumes with PC = scratch_runtime. On the BRK fault, the
+ * fault classifier sees the internal bp and the session uses
+ * dbg_displaced_finalize to restore the user-visible PC. */
+
+#include "dbg/dbg.h"
+
+#include <string.h>
+
+int dbg_displaced_init(CfreeJitSession* s) {
+  const CfreeExecMem* mem;
+  if (s->displaced.valid) return 0;
+  mem = s->c->env ? s->c->env->execmem : NULL;
+  if (!mem || !mem->reserve || !mem->protect) return 1;
+  memset(&s->displaced.region, 0, sizeof(s->displaced.region));
+  if (mem->reserve(mem->user, DBG_SCRATCH_PAGE_SIZE,
+                   CFREE_PROT_READ | CFREE_PROT_EXEC,
+                   &s->displaced.region) != 0) {
+    return 1;
+  }
+  if (mem->protect(mem->user, s->displaced.region.runtime,
+                   s->displaced.region.size,
+                   CFREE_PROT_READ | CFREE_PROT_EXEC) != 0) {
+    mem->release(mem->user, &s->displaced.region);
+    memset(&s->displaced.region, 0, sizeof(s->displaced.region));
+    return 1;
+  }
+  s->displaced.valid = 1;
+  return 0;
+}
+
+void dbg_displaced_fini(CfreeJitSession* s) {
+  const CfreeExecMem* mem = s->c->env ?
s->c->env->execmem : NULL;
+  if (!s->displaced.valid) return;
+  if (mem && mem->release) mem->release(mem->user, &s->displaced.region);
+  memset(&s->displaced, 0, sizeof(s->displaced));
+}
+
+int dbg_displaced_prepare(CfreeJitSession* s, uint64_t insn_pc,
+                          uint64_t* new_pc) {
+  uint32_t orig_word = 0;
+  u32 brk_off = 0;
+  uint64_t scratch_runtime;
+  uint8_t* scratch_write;
+  u32 bp_id = 0;
+
+  if (s->arch != CFREE_ARCH_ARM_64) return 1;
+  if (dbg_displaced_init(s) != 0) return 1;
+
+  /* A previous step whose shim transferred control (indirect branch or
+   * a CBZ/TBZ trampoline that took) never ran finalize, leaving a stale
+   * internal bp at the old return_pc. Drop it before we lay down the
+   * new shim — bp.c is idempotent, but the refcount would climb. */
+  if (s->displaced.internal_bp != 0) {
+    dbg_bp_clear(s, s->displaced.internal_bp);
+    s->displaced.internal_bp = 0;
+    s->displaced.return_pc = 0;
+    s->displaced.orig_pc = 0;
+  }
+
+  /* Read the original 4 bytes via the bp table (so we get the saved
+   * insn, not the BRK if one is patched here). */
+  {
+    u32 idx = dbg_bp_lookup_index(s, insn_pc);
+    if (idx) {
+      DbgBp* b = dbg_bp_at_index(s, idx);
+      memcpy(&orig_word, b->saved, sizeof(orig_word));
+    } else {
+      if (s->os->guarded_copy(s->os->user, &orig_word,
+                              (const void*)(uintptr_t)insn_pc,
+                              sizeof(orig_word)) != 0) {
+        return 1;
+      }
+    }
+  }
+
+  scratch_runtime = (uint64_t)(uintptr_t)s->displaced.region.runtime;
+  scratch_write = (uint8_t*)s->displaced.region.write;
+  if (dbg_aa64_build_shim(orig_word, insn_pc, scratch_write, scratch_runtime,
+                          &brk_off) != 0) {
+    return 1;
+  }
+  /* Flush the entire slot — trampoline forms write up to 24 bytes plus a
+   * literal pool; arch_aa64.c returns the BRK *offset*, not the length.
*/
+  if (s->c->env->execmem->flush_icache) {
+    s->c->env->execmem->flush_icache(s->c->env->execmem->user,
+                                     s->displaced.region.runtime,
+                                     DBG_DISPLACED_SLOT_BYTES);
+  }
+
+  /* Arm an internal breakpoint on the shim's BRK sentinel so the fault
+   * classifier identifies it as a displaced-step completion. */
+  if (dbg_bp_set_internal(s, scratch_runtime + brk_off, &bp_id) != 0) {
+    return 1;
+  }
+  s->displaced.orig_pc = insn_pc;
+  s->displaced.return_pc = scratch_runtime + brk_off;
+  s->displaced.internal_bp = bp_id;
+  if (new_pc) *new_pc = scratch_runtime;
+  return 0;
+}
+
+void dbg_displaced_finalize(CfreeJitSession* s) {
+  if (s->displaced.internal_bp != 0) {
+    dbg_bp_clear(s, s->displaced.internal_bp);
+    s->displaced.internal_bp = 0;
+  }
+  /* Restore PC to the instruction following the original, unless the
+   * fixed-up branch took (in which case PC will already be elsewhere
+   * and we leave it alone). */
+  if (s->stop.regs.pc == s->displaced.return_pc) {
+    s->stop.regs.pc = s->displaced.orig_pc + DBG_AA64_INSN_LEN;
+  }
+  s->displaced.orig_pc = 0;
+  s->displaced.return_pc = 0;
+}
diff --git a/src/dbg/mem.c b/src/dbg/mem.c
@@ -0,0 +1,27 @@
+/* Guarded read/write of guest memory plus bp-byte fixup. The actual
+ * SIGSEGV catch lives in the host's CfreeDbgOs.guarded_copy; this TU
+ * just delegates and then overlays original bytes from the bp table
+ * over any patched ranges in read results.
*/
+
+#include "dbg/dbg.h"
+
+#include <string.h>
+
+int dbg_mem_read(CfreeJitSession* s, uint64_t addr, void* dst, size_t n) {
+  if (!s || !dst || n == 0) return 1;
+  if (s->os->guarded_copy(s->os->user, dst, (const void*)(uintptr_t)addr, n) !=
+      0) {
+    return 1;
+  }
+  dbg_bp_unpatch_read(s, addr, dst, n);
+  return 0;
+}
+
+int dbg_mem_write(CfreeJitSession* s, uint64_t addr, const void* src,
+                  size_t n) {
+  if (!s || !src || n == 0) return 1;
+  if (s->os->guarded_copy(s->os->user, (void*)(uintptr_t)addr, src, n) != 0) {
+    return 1;
+  }
+  return 0;
+}
diff --git a/src/dbg/session.c b/src/dbg/session.c
@@ -0,0 +1,399 @@
+/* CfreeJitSession lifecycle, worker handshake, and fault classification.
+ *
+ * The session owns a single worker thread that runs the JIT'd entry. The
+ * REPL thread and worker thread coordinate through two events (resume,
+ * stop) and one shared CfreeStopInfo slot. Every fault on the worker
+ * (BRK / SIGSEGV / SIGBUS / SIGILL / SIGFPE / interrupt_signo) drops into
+ * on_fault here; this TU is also the only place that touches the public
+ * CfreeJitSession entries. */
+
+#include "dbg/dbg.h"
+
+#include <string.h>
+
+#include "core/heap.h"
+
+/* ---- fault classification ------------------------------------------- */
+
+static int signo_is_trap(int signo) {
+  /* Trap = software breakpoint (BRK on aa64). The host typically maps
+   * this to SIGTRAP; we don't include the platform header to assert the
+   * value but BRK-induced faults always carry the trap signo for the
+   * host that installed the handler. The POSIX impl in driver/env.c
+   * passes the host signo straight through. We treat any signo whose
+   * fault PC is patched as a trap, so the actual signo here is mostly
+   * advisory. */
+  (void)signo;
+  return 1;
+}
+
+static int on_fault(void* session_v, int signo, CfreeUnwindFrame* regs) {
+  CfreeJitSession* s = (CfreeJitSession*)session_v;
+  u32 idx;
+  DbgBp* bp;
+
+  if (!s) return 1;
+
+  /* Snapshot the regs into our stop slot up-front.
*/
+  memcpy(&s->stop.regs, regs, sizeof(*regs));
+  s->stop.signal = signo;
+  s->stop.exit_code = 0;
+  s->stop.bp_id = 0;
+
+  /* Interrupt — host requested via thread_interrupt. */
+  if (s->os->interrupt_signo != 0 && signo == s->os->interrupt_signo) {
+    s->stop.kind = CFREE_STOP_INTERRUPT;
+    goto park;
+  }
+
+  /* Breakpoint? */
+  idx = dbg_bp_lookup_index(s, regs->pc);
+  if (idx) {
+    bp = dbg_bp_at_index(s, idx);
+
+    /* Displaced-step sentinel: complete the step and either resume
+     * silently (auto_continue path inside dbg_step_resume) or surface
+     * a generic stop for STEP_INSN. */
+    if (bp && bp->internal && s->displaced.return_pc == regs->pc) {
+      dbg_displaced_finalize(s);
+      /* Sync the on-stack regs with the corrected PC so the OS layer
+       * writes it back into ucontext on return. */
+      regs->pc = s->stop.regs.pc;
+      if (s->pending_step_pending) {
+        /* CONTINUE-over-bp: do not park; just resume. */
+        s->pending_step_pending = 0;
+        return 0;
+      }
+      s->stop.kind = CFREE_STOP_BREAKPOINT;
+      s->stop.bp_id = 0;
+      goto park;
+    }
+
+    /* Plain internal bp (e.g. fallback one-shot at PC+4): treat like
+     * a step completion. */
+    if (bp && bp->internal) {
+      /* Clear it so it doesn't linger across resumes. */
+      dbg_bp_clear(s, bp->user_id);
+      if (s->pending_step_pending) {
+        s->pending_step_pending = 0;
+        return 0;
+      }
+      s->stop.kind = CFREE_STOP_BREAKPOINT;
+      s->stop.bp_id = 0;
+      goto park;
+    }
+
+    /* User-visible bp. Apply skip_count / condition / max_hits. */
+    if (bp) {
+      bp->hit_count++;
+      if (bp->hit_count <= bp->skip_count) {
+        /* Silent skip: re-step over the patch so the original insn
+         * runs and execution continues without notifying the REPL.
+         * Implemented by arming a displaced step then deferring the
+         * "do not park" decision via pending_step_pending. */
+        s->pending_step_pending = 1;
+        if (dbg_step_resume(s, CFREE_RESUME_CONTINUE) != 0) {
+          /* Couldn't arm a step; surface the stop after all.
*/
+          s->pending_step_pending = 0;
+          s->stop.kind = CFREE_STOP_BREAKPOINT;
+          s->stop.bp_id = bp->user_id;
+          goto park;
+        }
+        /* Apply any PC override the prepare-step path set. */
+        if (s->pending_has_pc) {
+          regs->pc = s->pending_pc_override;
+          s->pending_has_pc = 0;
+        }
+        return 0;
+      }
+      if (bp->condition && bp->condition(bp->condition_user, regs) == 0) {
+        /* Condition rejected: same silent-resume path. */
+        s->pending_step_pending = 1;
+        if (dbg_step_resume(s, CFREE_RESUME_CONTINUE) != 0) {
+          s->pending_step_pending = 0;
+          s->stop.kind = CFREE_STOP_BREAKPOINT;
+          s->stop.bp_id = bp->user_id;
+          goto park;
+        }
+        if (s->pending_has_pc) {
+          regs->pc = s->pending_pc_override;
+          s->pending_has_pc = 0;
+        }
+        return 0;
+      }
+      s->stop.kind = CFREE_STOP_BREAKPOINT;
+      s->stop.bp_id = bp->user_id;
+      if (bp->max_hits != 0 && bp->hit_count >= bp->max_hits + bp->skip_count) {
+        /* Auto-clear after surfacing. Defer to post-park so the bp_id
+         * is still valid when the driver inspects. */
+      }
+      goto park;
+    }
+  }
+
+  /* Not a patched address — pass through as SIGNAL (covers SEGV, BUS,
+   * ILL, FPE, and any SIGTRAP from a program-emitted BRK). */
+  (void)signo_is_trap;
+  s->stop.kind = CFREE_STOP_SIGNAL;
+
+park:
+  s->state = DBG_STATE_STOPPED;
+  s->os->event_signal(s->os->user, s->ev_stop);
+  s->os->event_wait(s->os->user, s->ev_resume);
+  s->os->event_reset(s->os->user, s->ev_resume);
+  /* Apply pending PC override (set by step_resume) before returning. */
+  if (s->pending_has_pc) {
+    regs->pc = s->pending_pc_override;
+    s->stop.regs.pc = s->pending_pc_override;
+    s->pending_has_pc = 0;
+  } else {
+    /* Allow REPL set_regs to mutate any field.
*/
+    memcpy(regs, &s->stop.regs, sizeof(*regs));
+  }
+  return 0;
+}
+
+/* ---- worker thread -------------------------------------------------- */
+
+static void worker_main(void* arg) {
+  CfreeJitSession* s = (CfreeJitSession*)arg;
+  for (;;) {
+    s->os->event_wait(s->os->user, s->ev_resume);
+    s->os->event_reset(s->os->user, s->ev_resume);
+    if (s->worker_should_exit) return;
+    if (s->state == DBG_STATE_RUNNING && s->entry) {
+      typedef int (*EntryIntArgv)(int, char**);
+      int ret = 0;
+      switch (s->entry_kind) {
+        case CFREE_ENTRY_INT_ARGV:
+          ret = ((EntryIntArgv)s->entry)(s->entry_argc, s->entry_argv);
+          break;
+      }
+      memset(&s->stop, 0, sizeof(s->stop));
+      s->stop.kind = CFREE_STOP_EXIT;
+      s->stop.exit_code = ret;
+      s->state = DBG_STATE_EXITED;
+      s->entry = NULL;
+      s->os->event_signal(s->os->user, s->ev_stop);
+    }
+  }
+}
+
+/* ---- public entries ------------------------------------------------- */
+
+CfreeJitSession* cfree_jit_session_new(CfreeJit* jit) {
+  CfreeJitSession* s;
+  Compiler* c;
+  Heap* heap;
+  const CfreeDbgOs* os;
+  CfreeDbgSignalOps ops;
+
+  if (!jit) return NULL;
+  c = cfree_jit_compiler(jit);
+  if (!c || !c->env) return NULL;
+  os = c->env->dbg_os;
+  if (!os) return NULL;
+  if (!os->thread_start || !os->thread_join || !os->event_new ||
+      !os->event_wait || !os->event_signal || !os->event_reset ||
+      !os->event_free || !os->signals_install || !os->signals_uninstall ||
+      !os->code_write_begin || !os->code_write_end || !os->guarded_copy) {
+    return NULL;
+  }
+  /* v1 only supports aarch64 lifters; refuse other targets early so we
+   * don't end up with patched bytes we can't roll back.
*/
+  if (cfree_jit_image_arch(jit) != CFREE_ARCH_ARM_64) return NULL;
+
+  heap = (Heap*)c->env->heap;
+  s = (CfreeJitSession*)heap->alloc(heap, sizeof(*s), _Alignof(CfreeJitSession));
+  if (!s) return NULL;
+  memset(s, 0, sizeof(*s));
+  s->jit = jit;
+  s->c = c;
+  s->heap = heap;
+  s->os = os;
+  s->arch = cfree_jit_image_arch(jit);
+  s->state = DBG_STATE_IDLE;
+
+  s->ev_resume = os->event_new(os->user);
+  s->ev_stop = os->event_new(os->user);
+  if (!s->ev_resume || !s->ev_stop) {
+    if (s->ev_resume) os->event_free(os->user, s->ev_resume);
+    if (s->ev_stop) os->event_free(os->user, s->ev_stop);
+    heap->free(heap, s, sizeof(*s));
+    return NULL;
+  }
+
+  dbg_bp_init(s);
+
+  ops.on_fault = on_fault;
+  if (os->signals_install(os->user, &ops, s) != 0) {
+    os->event_free(os->user, s->ev_resume);
+    os->event_free(os->user, s->ev_stop);
+    heap->free(heap, s, sizeof(*s));
+    return NULL;
+  }
+
+  if (os->thread_start(os->user, worker_main, s, &s->worker) != 0) {
+    os->signals_uninstall(os->user);
+    os->event_free(os->user, s->ev_resume);
+    os->event_free(os->user, s->ev_stop);
+    heap->free(heap, s, sizeof(*s));
+    return NULL;
+  }
+  s->worker_alive = 1;
+  return s;
+}
+
+int cfree_jit_session_attach_dwarf(CfreeJitSession* s, CfreeDebugInfo* dw) {
+  if (!s) return 1;
+  s->dwarf = dw;
+  return 0;
+}
+
+int dbg_session_signal_resume(CfreeJitSession* s) {
+  if (!s) return 1;
+  s->state = DBG_STATE_RUNNING;
+  s->os->event_signal(s->os->user, s->ev_resume);
+  return 0;
+}
+
+int dbg_session_wait_stop(CfreeJitSession* s) {
+  if (!s) return 1;
+  s->os->event_wait(s->os->user, s->ev_stop);
+  s->os->event_reset(s->os->user, s->ev_stop);
+  return 0;
+}
+
+void cfree_jit_session_free(CfreeJitSession* s) {
+  if (!s) return;
+  /* If the worker is parked inside the signal handler (STOPPED), there is
+   * no clean way to unwind it without re-running the user's program to
+   * completion: the kernel will restart it at the trap PC and re-trap.
+   * The session is only torn down at process exit, so leak the worker and
+   * let the OS reap it. We skip event/signal/heap teardown for the same
+   * reason — the worker may still touch them before _exit. */
+  if (s->worker_alive && s->state == DBG_STATE_STOPPED) return;
+
+  s->worker_should_exit = 1;
+  if (s->worker_alive) {
+    s->os->event_signal(s->os->user, s->ev_resume);
+    s->os->thread_join(s->os->user, s->worker);
+    s->worker_alive = 0;
+  }
+  s->os->signals_uninstall(s->os->user);
+  dbg_displaced_fini(s);
+  dbg_bp_fini(s);
+  if (s->ev_resume) s->os->event_free(s->os->user, s->ev_resume);
+  if (s->ev_stop) s->os->event_free(s->os->user, s->ev_stop);
+  s->heap->free(s->heap, s, sizeof(*s));
+}
+
+int cfree_jit_session_call(CfreeJitSession* s, void* entry, CfreeEntryKind kind,
+                           int argc, char** argv, CfreeStopInfo* stop_out) {
+  if (!s || !entry) return 1;
+  if (s->state == DBG_STATE_RUNNING) return 1;
+  s->entry = entry;
+  s->entry_kind = kind;
+  s->entry_argc = argc;
+  s->entry_argv = argv;
+  s->state = DBG_STATE_RUNNING;
+  s->pending_mode = CFREE_RESUME_CONTINUE;
+  s->pending_has_pc = 0;
+  s->pending_step_pending = 0;
+  s->os->event_reset(s->os->user, s->ev_stop);
+  s->os->event_signal(s->os->user, s->ev_resume);
+  s->os->event_wait(s->os->user, s->ev_stop);
+  s->os->event_reset(s->os->user, s->ev_stop);
+  if (stop_out) *stop_out = s->stop;
+  return 0;
+}
+
+int cfree_jit_session_resume(CfreeJitSession* s, CfreeResumeMode mode,
+                             CfreeStopInfo* stop_out) {
+  if (!s) return 1;
+  if (s->state == DBG_STATE_EXITED) return 1;
+  if (s->state != DBG_STATE_STOPPED) return 1;
+
+  s->pending_mode = mode;
+  s->pending_has_pc = 0;
+  s->pending_step_pending = 0;
+  s->pending_done = 0;
+
+  /* For CONTINUE-over-bp we use displaced step to skip the patched insn
+   * and rely on the on_fault handler's pending_step_pending fast-path to
+   * not surface that step's BRK to the REPL.
*/
+  if (mode == CFREE_RESUME_CONTINUE &&
+      dbg_bp_lookup_index(s, s->stop.regs.pc) != 0) {
+    s->pending_step_pending = 1;
+    if (dbg_step_resume(s, CFREE_RESUME_STEP_INSN) != 0) {
+      s->pending_step_pending = 0;
+      return 1;
+    }
+  } else if (mode != CFREE_RESUME_CONTINUE) {
+    if (dbg_step_resume(s, mode) != 0) return 1;
+  }
+
+  if (!s->pending_done) {
+    s->state = DBG_STATE_RUNNING;
+    s->os->event_signal(s->os->user, s->ev_resume);
+    s->os->event_wait(s->os->user, s->ev_stop);
+    s->os->event_reset(s->os->user, s->ev_stop);
+  }
+  s->pending_done = 0;
+  if (stop_out) *stop_out = s->stop;
+  return 0;
+}
+
+int cfree_jit_session_interrupt(CfreeJitSession* s) {
+  if (!s) return 1;
+  if (s->state != DBG_STATE_RUNNING) return 1;
+  if (!s->os->thread_interrupt) return 1;
+  return s->os->thread_interrupt(s->os->user, s->worker);
+}
+
+int cfree_jit_session_read_mem(CfreeJitSession* s, uint64_t addr, void* dst,
+                               size_t n) {
+  if (!s) return 1;
+  if (s->state != DBG_STATE_STOPPED && s->state != DBG_STATE_EXITED) return 1;
+  return dbg_mem_read(s, addr, dst, n);
+}
+
+int cfree_jit_session_write_mem(CfreeJitSession* s, uint64_t addr,
+                                const void* src, size_t n) {
+  if (!s) return 1;
+  if (s->state != DBG_STATE_STOPPED && s->state != DBG_STATE_EXITED) return 1;
+  return dbg_mem_write(s, addr, src, n);
+}
+
+int cfree_jit_session_get_regs(CfreeJitSession* s, CfreeUnwindFrame* out) {
+  if (!s || !out) return 1;
+  if (s->state != DBG_STATE_STOPPED) return 1;
+  *out = s->stop.regs;
+  return 0;
+}
+
+int cfree_jit_session_set_regs(CfreeJitSession* s, const CfreeUnwindFrame* in) {
+  if (!s || !in) return 1;
+  if (s->state != DBG_STATE_STOPPED) return 1;
+  if (!cfree_jit_image_contains(s->jit, in->pc)) return 1;
+  s->stop.regs = *in;
+  return 0;
+}
+
+int cfree_jit_session_breakpoint_set(CfreeJitSession* s, uint64_t addr,
+                                     uint32_t* bp_id_out) {
+  if (!s) return 1;
+  return dbg_bp_set(s, addr, bp_id_out);
+}
+
+int cfree_jit_session_breakpoint_clear(CfreeJitSession* s, uint32_t
bp_id) {
+  if (!s) return 1;
+  return dbg_bp_clear(s, bp_id);
+}
+
+int cfree_jit_session_breakpoint_set_spec(CfreeJitSession* s,
+                                          const CfreeBreakpointSpec* spec,
+                                          uint32_t* bp_id_out) {
+  if (!s || !spec) return 1;
+  return dbg_bp_set_spec(s, spec, bp_id_out);
+}
diff --git a/src/dbg/step.c b/src/dbg/step.c
@@ -0,0 +1,200 @@
+/* Resume-mode state machine.
+ *
+ * STEP_INSN uses the displaced-step primitive directly via pending_pc_override
+ * and the outer session_resume's signal/wait. STEP_LINE / NEXT_LINE / STEP_OUT
+ * drive their own signal/wait cycles in-loop and set pending_done so the outer
+ * resume short-circuits. */
+
+#include "dbg/dbg.h"
+
+#include <string.h>
+
+#define DBG_STEP_LINE_INSN_CAP 1024u
+#define DBG_AA64_BL_MASK 0xFC000000u
+#define DBG_AA64_BL_OP 0x94000000u
+
+static int prepare_step_insn(CfreeJitSession* s) {
+  uint64_t pc = s->stop.regs.pc;
+  uint64_t scratch_entry = 0;
+
+  if (dbg_displaced_prepare(s, pc, &scratch_entry) != 0) {
+    return 1;
+  }
+  s->pending_has_pc = 1;
+  s->pending_pc_override = scratch_entry;
+  return 0;
+}
+
+/* Drive a single displaced-step cycle synchronously. Returns 0 on success;
+ * the session is parked again at the post-step PC. Returns 1 on prepare
+ * failure (worker not advanced).
*/
+static int do_one_displaced(CfreeJitSession* s) {
+  if (prepare_step_insn(s) != 0) return 1;
+  if (dbg_session_signal_resume(s) != 0) return 1;
+  if (dbg_session_wait_stop(s) != 0) return 1;
+  return 0;
+}
+
+static int stop_is_internal_completion(const CfreeJitSession* s) {
+  return s->stop.kind == CFREE_STOP_BREAKPOINT && s->stop.bp_id == 0;
+}
+
+static int dwarf_line_for(CfreeJitSession* s, uint64_t pc, const char** file,
+                          uint32_t* line) {
+  uint32_t col = 0;
+  *file = NULL;
+  *line = 0;
+  return cfree_dwarf_addr_to_line(s->dwarf, pc, file, line, &col);
+}
+
+static int dwarf_sub_for(CfreeJitSession* s, uint64_t pc,
+                         CfreeDwarfSubprogram* out) {
+  memset(out, 0, sizeof(*out));
+  return cfree_dwarf_subprogram_at(s->dwarf, pc, out);
+}
+
+static int line_changed(const char* base_file, uint32_t base_line,
+                        const char* new_file, uint32_t new_line) {
+  if (new_line == 0) return 0;
+  if (base_line == 0) return 1;
+  if (new_line != base_line) return 1;
+  if (base_file != new_file) {
+    if (!base_file || !new_file) return 1;
+    if (strcmp(base_file, new_file) != 0) return 1;
+  }
+  return 0;
+}
+
+static int run_step_line_loop(CfreeJitSession* s) {
+  const char* base_file = NULL;
+  uint32_t base_line = 0;
+  CfreeDwarfSubprogram base_sub;
+  int have_sub;
+  u32 i;
+
+  (void)dwarf_line_for(s, s->stop.regs.pc, &base_file, &base_line);
+  have_sub = (dwarf_sub_for(s, s->stop.regs.pc, &base_sub) == 0);
+
+  for (i = 0; i < DBG_STEP_LINE_INSN_CAP; ++i) {
+    const char* new_file = NULL;
+    uint32_t new_line = 0;
+    CfreeDwarfSubprogram new_sub;
+    int have_new_sub;
+
+    if (do_one_displaced(s) != 0) {
+      /* Lifter declined; surface whatever the current stop is. */
+      return 0;
+    }
+    if (!stop_is_internal_completion(s)) {
+      /* User breakpoint, signal, or exit — surface.
*/
+      return 0;
+    }
+
+    (void)dwarf_line_for(s, s->stop.regs.pc, &new_file, &new_line);
+    have_new_sub = (dwarf_sub_for(s, s->stop.regs.pc, &new_sub) == 0);
+
+    if (have_sub && have_new_sub) {
+      if (new_sub.low_pc != base_sub.low_pc) {
+        /* Left the original subprogram — STEP_LINE follows in. */
+        return 0;
+      }
+    } else if (have_sub != have_new_sub) {
+      return 0;
+    }
+
+    if (line_changed(base_file, base_line, new_file, new_line)) {
+      return 0;
+    }
+  }
+  return 0;
+}
+
+static int read_insn_word(CfreeJitSession* s, uint64_t pc, uint32_t* out) {
+  uint8_t buf[4];
+  if (dbg_mem_read(s, pc, buf, 4) != 0) return 1;
+  *out = ((uint32_t)buf[0]) | ((uint32_t)buf[1] << 8) |
+         ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
+  return 0;
+}
+
+static int aa64_is_bl(uint32_t insn) {
+  return (insn & DBG_AA64_BL_MASK) == DBG_AA64_BL_OP;
+}
+
+static int run_step_out(CfreeJitSession* s) {
+  CfreeUnwindFrame frame;
+  u32 bp_id = 0;
+  frame = s->stop.regs;
+  if (cfree_dwarf_unwind_step(s->dwarf, &frame) != 0) return 1;
+  if (frame.pc == 0) return 1;
+  if (dbg_bp_set_internal(s, frame.pc, &bp_id) != 0) return 1;
+  if (dbg_session_signal_resume(s) != 0) return 1;
+  if (dbg_session_wait_stop(s) != 0) return 1;
+  return 0;
+}
+
+static int run_next_line(CfreeJitSession* s) {
+  uint32_t insn = 0;
+  if (s->arch != CFREE_ARCH_ARM_64) return 1;
+
+  if (read_insn_word(s, s->stop.regs.pc, &insn) != 0) {
+    return run_step_line_loop(s);
+  }
+  if (!aa64_is_bl(insn)) {
+    return run_step_line_loop(s);
+  }
+
+  /* Step OVER the call by setting an internal bp at the unwound return PC
+   * and CONTINUE-ing. After the bp fires, fall into the STEP_LINE loop to
+   * keep advancing until the source line actually changes. */
+  {
+    CfreeUnwindFrame frame = s->stop.regs;
+    u32 bp_id = 0;
+    if (cfree_dwarf_unwind_step(s->dwarf, &frame) != 0 || frame.pc == 0) {
+      /* Fall back to stepping into the call.
*/ + return run_step_line_loop(s); + } + if (dbg_bp_set_internal(s, frame.pc, &bp_id) != 0) { + return run_step_line_loop(s); + } + if (dbg_session_signal_resume(s) != 0) return 1; + if (dbg_session_wait_stop(s) != 0) return 1; + if (stop_is_internal_completion(s)) { + return run_step_line_loop(s); + } + return 0; + } +} + +int dbg_step_resume(CfreeJitSession* s, CfreeResumeMode mode) { + switch (mode) { + case CFREE_RESUME_CONTINUE: + if (dbg_bp_lookup_index(s, s->stop.regs.pc) != 0) { + return prepare_step_insn(s); + } + return 0; + + case CFREE_RESUME_STEP_INSN: + return prepare_step_insn(s); + + case CFREE_RESUME_STEP_LINE: + if (!s->dwarf) return 1; + if (run_step_line_loop(s) != 0) return 1; + s->pending_done = 1; + return 0; + + case CFREE_RESUME_NEXT_LINE: + if (!s->dwarf) return 1; + if (s->arch != CFREE_ARCH_ARM_64) return 1; + if (run_next_line(s) != 0) return 1; + s->pending_done = 1; + return 0; + + case CFREE_RESUME_STEP_OUT: + if (!s->dwarf) return 1; + if (run_step_out(s) != 0) return 1; + s->pending_done = 1; + return 0; + } + return 1; +} diff --git a/src/link/link_jit.c b/src/link/link_jit.c @@ -332,34 +332,127 @@ void* cfree_jit_lookup(CfreeJit* jit, const char* name) { return (void*)vaddr_to_runtime(jit->image, jit->segs, s->vaddr); } -/* ---- inspector entries (stubs; out of scope for this cut) ---- */ +/* ---- inspector entries ---- */ const CfreeObjFile* cfree_jit_view(CfreeJit* jit) { (void)jit; return NULL; } +/* True for symbol kinds the user-facing JIT inspector surfaces. Mapping + * symbols (SK_NOTYPE — aarch64 $x/$d), section/file/undef symbols are + * skipped. */ +static int jit_sym_kind_visible(u8 k) { + return k == SK_FUNC || k == SK_OBJ || k == SK_COMMON || k == SK_TLS || + k == SK_IFUNC || k == SK_ABS; +} + +/* Resolve a LinkSymbol's interned name to a stable C string, stripping + * the target format's C-mangling prefix (Mach-O's leading `_`) so the + * user sees the source-level name. 
*/ +static const char* jit_sym_name_cstr(CfreeJit* jit, const LinkSymbol* s) { + size_t len = 0; + const char* nm = pool_str(jit->c->global, s->name, &len); + if (!nm) return NULL; + obj_format_demangle_c(jit->c, &nm, &len); + return nm; +} + +static uint64_t jit_sym_runtime_addr(CfreeJit* jit, const LinkSymbol* s) { + if (s->kind == SK_ABS) return s->vaddr; + return (uint64_t)vaddr_to_runtime(jit->image, jit->segs, s->vaddr); +} + int cfree_jit_addr_to_sym(CfreeJit* jit, uint64_t addr, const char** name_out, uint64_t* off_out) { - (void)jit; - (void)addr; + u32 n; + u32 i; if (name_out) *name_out = NULL; if (off_out) *off_out = 0; + if (!jit) return 1; + n = LinkSyms_count(&jit->image->syms); + for (i = 0; i < n; ++i) { + LinkSymbol* s = LinkSyms_at(&jit->image->syms, i); + uint64_t base; + if (!s || !s->defined) continue; + if (!jit_sym_kind_visible(s->kind)) continue; + base = jit_sym_runtime_addr(jit, s); + if (!base) continue; + if (addr >= base && (s->size == 0 || addr < base + s->size)) { + if (name_out) *name_out = jit_sym_name_cstr(jit, s); + if (off_out) *off_out = addr - base; + return 0; + } + } return 1; } +struct CfreeJitSymIter { + CfreeJit* jit; + u32 next; +}; + CfreeJitSymIter* cfree_jit_sym_iter_new(CfreeJit* jit) { - (void)jit; - return NULL; + Heap* h; + CfreeJitSymIter* it; + if (!jit) return NULL; + h = (Heap*)jit->c->env->heap; + it = (CfreeJitSymIter*)h->alloc(h, sizeof(*it), _Alignof(CfreeJitSymIter)); + if (!it) return NULL; + it->jit = jit; + it->next = 0; + return it; } int cfree_jit_sym_iter_next(CfreeJitSymIter* it, CfreeJitSym* out) { - (void)it; - (void)out; + u32 n; + if (!it || !out) return 0; + n = LinkSyms_count(&it->jit->image->syms); + while (it->next < n) { + LinkSymbol* s = LinkSyms_at(&it->jit->image->syms, it->next++); + if (!s || !s->defined) continue; + if (!jit_sym_kind_visible(s->kind)) continue; + out->name = jit_sym_name_cstr(it->jit, s); + out->addr = jit_sym_runtime_addr(it->jit, s); + out->size = s->size; + 
out->kind = (CfreeSymKind)s->kind; + return 1; + } return 0; } -void cfree_jit_sym_iter_free(CfreeJitSymIter* it) { (void)it; } +void cfree_jit_sym_iter_free(CfreeJitSymIter* it) { + Heap* h; + if (!it) return; + h = (Heap*)it->jit->c->env->heap; + h->free(h, it, sizeof(*it)); +} + +/* ---- accessors for src/dbg/ ---- + * + * The CfreeJit struct is private to this TU. The debugger session needs a + * way to validate that an address lies inside the JIT image (so it can + * reject breakpoint and read/write requests pointing outside the code + * region) and a way to read the image's target arch. These accessors give + * it just that, without exporting the segment table or the LinkImage. */ +int cfree_jit_image_contains(CfreeJit* jit, uint64_t runtime_addr) { + u32 i; + uintptr_t a; + if (!jit || !jit->segs) return 0; + a = (uintptr_t)runtime_addr; + for (i = 0; i < jit->nsegs; ++i) { + uintptr_t lo = (uintptr_t)jit->segs[i].runtime; + uintptr_t hi = lo + (uintptr_t)jit->segs[i].size; + if (a >= lo && a < hi) return 1; + } + return 0; +} + +CfreeArchKind cfree_jit_image_arch(CfreeJit* jit) { + return jit->c->target.arch; +} + +Compiler* cfree_jit_compiler(CfreeJit* jit) { return jit->c; } void cfree_jit_run_dtors(CfreeJit* jit) { typedef void (*VoidFn)(void);