kit

git clone https://git.ryansepassi.com/git/kit.git

Debugger (kit dbg)

kit dbg is an interactive, in-process source-level debugger that drives a JIT-linked image under controlled execution. It owns one worker thread running the JIT'd entry, catches its faults, and lets a REPL inspect and steer the program: breakpoints by file:line / sym[+off] / 0xADDR, instruction and source-line stepping (into / over / out), backtraces, register and named-variable read/write, and raw memory examine. The library half is src/dbg/ (freestanding C11, like all of src/); the host primitives and the REPL live in the driver. See JIT.md for how the image is produced and DWARF.md for the line/CFI/variable tables the source-level features consume.

Layering

driver/cmd/dbg.c        REPL: command parsing, stop rendering, DWARF queries,
                        driver-local breakpoint table, SIGINT forwarding
driver/env/*            KitDbgOs host adapter (threads, signals, W^X, fault copy)
        │  (public API: kit_jit_session_*, KitDbgOs vtable)
        ▼
src/dbg/                library-side session
  session.c             worker thread, event handshake, fault classification
  step.c                resume-mode state machine (insn / line / next / out)
  displaced.c           arch-neutral out-of-line single-step plumbing
  bp.c                  address-keyed breakpoint patch table
  mem.c                 guarded guest-memory read/write + bp-byte overlay
  dbg.h                 internal session contracts
        │  (per-arch hooks: ArchImpl.dbg → ArchDbgOps)
        ▼
src/arch/{aa64,x64,rv64}/dbg.c   trap encoding, insn decode, displaced shims

The session never calls pthread_*, sigaction, mprotect, or an icache flush directly. Every host primitive funnels through one vtable, KitDbgOs (include/kit/dbg.h), supplied at kit_jit_session_new through a KitDbgHost. Everything architecture-specific — the trap byte sequence, instruction decoding, the displaced-step shim — reaches the session through ArchImpl.dbg (an ArchDbgOps, src/arch/arch.h). The session itself is pure coordination logic: it knows about events, a stop slot, a breakpoint table, and a single scratch page, and nothing about the host OS or the target ISA.

This separation is the central design decision. It keeps the debugger's control logic testable and portable, isolates all signal/thread/memory-protection hazards to a handful of driver TUs, and lets a new target gain a debugger by implementing only the ArchDbgOps hooks.
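The following sketch shows the shape of that boundary. It assumes a cut-down host table with three hooks. `HostOps`, `session_write_code`, and the fake host are hypothetical names, not the real KitDbgOs layout. Session-side code writes to code memory only through the table, so a test host can replace the W^X and icache machinery with plain memory and call counters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical host table in the spirit of KitDbgOs: an opaque context
   plus function pointers. The session calls only these. */
typedef struct HostOps {
    void *ctx;
    void *(*code_write_begin)(void *ctx, void *addr, size_t len); /* writable alias or NULL */
    void (*code_write_end)(void *ctx, void *addr, size_t len);
    void (*icache_flush)(void *ctx, void *addr, size_t len);
} HostOps;

/* Session side: knows nothing about mprotect or dual mappings. */
static int session_write_code(const HostOps *os, void *addr,
                              const void *bytes, size_t len) {
    void *alias = os->code_write_begin(os->ctx, addr, len);
    if (!alias)
        return -1;
    memcpy(alias, bytes, len);
    os->code_write_end(os->ctx, addr, len);
    os->icache_flush(os->ctx, addr, len);
    return 0;
}

/* A trivial test host: plain writable memory, counts calls. */
typedef struct { int begins, ends, flushes; } FakeHost;

static void *fake_begin(void *ctx, void *addr, size_t len) {
    (void)len;
    ((FakeHost *)ctx)->begins++;
    return addr; /* the "alias" is the address itself */
}
static void fake_end(void *ctx, void *addr, size_t len) {
    (void)addr; (void)len;
    ((FakeHost *)ctx)->ends++;
}
static void fake_flush(void *ctx, void *addr, size_t len) {
    (void)addr; (void)len;
    ((FakeHost *)ctx)->flushes++;
}

static HostOps fake_host_ops(FakeHost *fh) {
    HostOps os = { fh, fake_begin, fake_end, fake_flush };
    return os;
}
```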

Worker thread and the fault handshake

A session owns exactly one worker thread (worker_main in src/dbg/session.c). The worker and the REPL thread share two one-shot events, ev_resume and ev_stop, and a single KitStopInfo slot on the session. The events are the only synchronization points. The REPL reads the stop slot only while the worker is parked (or after it has exited). The worker mutates session state only from the fault handler, apart from recording the synthetic exit stop when the entry returns.

REPL thread                         worker thread
-----------                         -------------
session_new ── thread_start ──────► wait(ev_resume)
                                       │
session_call(entry) ──────────────────┤
  state = RUNNING                      │
  signal(ev_resume) ───────────────►  run entry (call_with_catch)
  wait(ev_stop)                        │
       │                          (a) fault: SIGTRAP/SEGV/BUS/ILL/FPE/interrupt
       │                              on_fault: snapshot regs, classify,
       │                              state = STOPPED,
       │             ◄──────────────── signal(ev_stop); wait(ev_resume)
  inspect stop slot                    (parked inside the signal handler)
  session_resume(MODE) ──────────────► (b) entry returns normally:
       │                              state = EXITED, stop.kind = EXIT,
       │             ◄──────────────── signal(ev_stop)
  read KitStopInfo

The worker loops over ev_resume. On the run-from-entry path it invokes worker_run_entry, which dispatches the entry as either an int(int,char**) main or a 0–8 argument uint64_t(...) thunk (the latter backs the REPL's expr function calls). The entry runs inside call_with_catch, an optional host hook that establishes a sigsetjmp landing pad so a RESUME_ABORT can longjmp the worker out of a parked fault back to the loop. When the entry returns the worker records a synthetic KIT_STOP_EXIT and signals ev_stop.

A fault on the worker enters the host signal handler, which marshals the ucontext into a KitUnwindFrame and calls back into the session's on_fault. This is the heart of the design: the worker parks itself inside the signal handler. on_fault snapshots the registers into the stop slot, classifies the cause, sets state = STOPPED, signals ev_stop, then blocks on ev_resume. When the REPL later resumes, on_fault writes any mutated registers (a PC override or REPL set_regs edits) back into the live KitUnwindFrame and returns, so the kernel restarts the worker exactly where the debugger wants it. There is no separate "continue" mechanism — resume is just unblocking the parked handler.
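The "park inside the handler" flow can be sketched single-threaded. The sketch makes one substitution: the blocking signal(ev_stop)/wait(ev_resume) pair becomes a callback that plays the REPL's turn. `Frame`, `Session`, `park`, and `regs_dirty` are hypothetical stand-ins for KitUnwindFrame and the real session fields. The point is the order of operations: snapshot, stop, park, then write edits back to the live frame before returning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down register frame standing in for KitUnwindFrame. */
typedef struct { uint64_t pc, sp, x[4]; } Frame;

typedef enum { ST_RUNNING, ST_STOPPED, ST_EXITED } State;

typedef struct Session {
    State state;
    Frame stop_regs;   /* snapshot the REPL reads and edits */
    bool regs_dirty;   /* REPL changed stop_regs (PC override or set_regs) */
    /* Blocks until the REPL resumes. In the real design this is
       signal(ev_stop) then wait(ev_resume); a callback keeps this
       sketch single-threaded. */
    void (*park)(struct Session *s);
} Session;

/* Called by the host signal handler with the live, marshalled frame. */
static void on_fault(Session *s, Frame *live) {
    s->stop_regs = *live;          /* snapshot */
    s->regs_dirty = false;
    s->state = ST_STOPPED;         /* classification omitted here */
    s->park(s);                    /* parked inside the handler */
    if (s->regs_dirty)
        *live = s->stop_regs;      /* write edits back before returning */
    /* Returning from the real handler lets the kernel restart the
       worker at live->pc. */
}

/* Example REPL turn: redirect the PC, then resume. Setting RUNNING here
   mirrors the diagram, where the REPL sets RUNNING before signal(ev_resume). */
static void repl_jump_to_0x2000(Session *s) {
    assert(s->state == ST_STOPPED);
    s->stop_regs.pc = 0x2000;
    s->regs_dirty = true;
    s->state = ST_RUNNING;
}
```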

Fault classification

on_fault decides what kind of stop a fault represents and records it in the stop slot as a KitStopKind plus a KitStopReason.
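Here is a minimal sketch of how a classifier of this kind might map a signal and trap address to a stop category. The enum values, `Classifier`, and the use of SIGUSR2 as the interrupt signal in the test are all hypothetical. It also assumes the trap PC has already been adjusted to the trap instruction's address, and that internal (step) breakpoints are checked before user ones; the real precedence is not specified here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stop categories (the real KitStopKind/KitStopReason differ). */
typedef enum { STOP_USER_BP, STOP_INTERNAL_BP, STOP_INTERRUPT, STOP_FAULT } StopKind;

typedef struct {
    int interrupt_signo;                              /* host's reserved interrupt signal */
    const uint64_t *user_bps;     size_t n_user;
    const uint64_t *internal_bps; size_t n_internal;  /* incl. the step sentinel */
} Classifier;

static bool in_set(const uint64_t *v, size_t n, uint64_t pc) {
    for (size_t i = 0; i < n; i++)
        if (v[i] == pc)
            return true;
    return false;
}

/* trap_pc: address of the trap instruction. On x86-64 the reported PC is
   one byte past INT3 and would need adjusting before this call. */
static StopKind classify(const Classifier *c, int signo, uint64_t trap_pc) {
    if (signo == c->interrupt_signo)
        return STOP_INTERRUPT;
    if (signo == SIGTRAP) {
        if (in_set(c->internal_bps, c->n_internal, trap_pc))
            return STOP_INTERNAL_BP;  /* checked first: an assumption */
        if (in_set(c->user_bps, c->n_user, trap_pc))
            return STOP_USER_BP;
    }
    return STOP_FAULT;                /* SEGV/BUS/ILL/FPE, or a stray trap */
}
```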

Invariants the handshake relies on: one worker per session; session_call is rejected while RUNNING or STOPPED; the REPL touches the stop slot only after wait(ev_stop) returns; the worker writes the stop slot only from the async-signal context inside on_fault, except for the synthetic KIT_STOP_EXIT recorded after the entry returns normally.

Teardown while parked

kit_jit_session_free deliberately leaks the worker when it is torn down in the STOPPED state. A worker parked inside the signal handler cannot be cleanly unwound without re-running the program to completion, and the session is only freed at process exit, so the OS reaps the thread and the event/signal/heap teardown is skipped. This keeps quitting from a stopped prompt immediate.

Resume-mode state machine

kit_jit_session_resume takes a KitResumeMode and produces the next stop. Plain CONTINUE simply unblocks the parked handler — unless the current PC sits on a breakpoint patch, in which case the original instruction must execute before control continues. Everything more elaborate is built in src/dbg/step.c on two primitives: the displaced single-step (one instruction out of line) and one-shot internal breakpoints.

The state machine has two execution styles. STEP_INSN and continue-over-bp set a pending PC override and let the outer session_resume drive the single signal/wait cycle. The source-level modes (STEP_LINE, NEXT_LINE, STEP_OUT) drive their own signal/wait cycles in a loop inside step.c and set pending_done so the outer resume short-circuits. They require an attached DWARF binding and return an error without one.

A bounded instruction cap guards the STEP_LINE loop against runaway stepping when the line table is sparse or absent. All PCs cross a runtime↔image translation boundary before any DWARF query (see below).
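The capped STEP_LINE loop can be sketched against a toy image. In the model, a PC is an instruction index, each single-step is a table lookup, and line 0 means "no line info". `FakeImage`, `step_line`, and the result codes are hypothetical. The loop stops as soon as it reaches an instruction with a different, known line, or when the cap runs out. The cap keeps a tight loop on one line, or a line-less region, from stepping forever.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint32_t *next_pc;  /* successor of each instruction index */
    const uint32_t *line;     /* source line per instruction; 0 = none */
    size_t n;
} FakeImage;

typedef enum { STEP_DONE_NEW_LINE, STEP_DONE_CAP, STEP_DONE_OFF_END } StepResult;

static StepResult step_line(const FakeImage *img, uint32_t *pc, unsigned cap) {
    uint32_t start = img->line[*pc];
    for (unsigned i = 0; i < cap; i++) {
        *pc = img->next_pc[*pc];            /* one (displaced) single-step */
        if (*pc >= img->n)
            return STEP_DONE_OFF_END;
        uint32_t l = img->line[*pc];
        if (l != 0 && l != start)           /* reached a different known line */
            return STEP_DONE_NEW_LINE;
    }
    return STEP_DONE_CAP;                   /* runaway guard tripped */
}
```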

Displaced single-step

User-mode aarch64 has no architectural single-step (the single-step bit lives at EL1). The debugger therefore executes one instruction out of line: it copies a fixed-up version of the target instruction into a scratch page, appends a trap sentinel, sets the worker PC to the scratch entry, and resumes. This primitive also underlies "resume past a breakpoint": the patched byte cannot execute in place, so the original instruction runs from scratch instead. src/dbg/displaced.c is the arch-neutral plumbing; the per-arch lifter is ArchDbgOps.build_displaced_shim.

session reserves one RX scratch page (from the JIT's execmem pool)

displaced_prepare(insn_pc):
  decode the original instruction (read via dbg_mem_read, so the saved
    byte — not the trap patch — is decoded)
  build_displaced_shim → writes a fixed-up copy + a trap sentinel into the
    scratch page, returns the sentinel offset and the fallthrough PC
  flush icache over the whole slot
  arm an internal bp on the sentinel (so the fault classifier recognizes it)
  new_pc = scratch entry

  worker resumes at scratch ─► runs the fixed-up insn ─► hits the sentinel

displaced_finalize:
  clear the internal bp
  if the worker stopped at the sentinel, set PC = fallthrough_pc;
    if a fixed-up branch took, PC is already elsewhere — leave it
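For a position-independent aarch64 instruction, the shim-building and finalize steps above can be sketched as follows. The sketch copies the word verbatim and follows it with a BRK #0 sentinel; the choice of BRK #0 as the sentinel encoding is an assumption. `ShimInfo`, `build_simple_shim`, and `displaced_finalize` are hypothetical names. PC-relative instructions would need a fixed-up sequence instead of the verbatim copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define A64_BRK0 0xD4200000u   /* BRK #0 */

typedef struct {
    size_t sentinel_off;      /* byte offset of the trap sentinel in the slot */
    uint64_t fallthrough_pc;  /* where execution continues if no branch was taken */
} ShimInfo;

/* Copy a position-independent instruction and append the sentinel. */
static ShimInfo build_simple_shim(uint32_t *slot, uint32_t insn, uint64_t insn_pc) {
    slot[0] = insn;
    slot[1] = A64_BRK0;
    ShimInfo info = { sizeof(uint32_t), insn_pc + 4 };
    return info;
}

/* After the worker stops again: at the sentinel means "continue at the
   original fallthrough"; anywhere else (a taken fixed-up branch) is kept. */
static uint64_t displaced_finalize(const ShimInfo *info, uint64_t scratch_base,
                                   uint64_t stop_pc) {
    if (stop_pc == scratch_base + info->sentinel_off)
        return info->fallthrough_pc;
    return stop_pc;
}
```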

Fixup is needed for any PC-relative operand so that the instruction behaves correctly at the scratch address rather than at its original one. The aarch64 lifter (src/arch/aa64/dbg.c) handles the full core family of PC-relative instructions.
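Take ADR as one concrete member of that family. The sketch decodes the 21-bit signed offset and computes the absolute target from the *original* PC. It then emits a MOVZ/MOVK sequence that loads that constant into the same register, so the copy yields the same result when run from the scratch page. This is an illustration only: the function names are hypothetical, and the real lifter may materialize addresses differently (for example, from a literal in the shim).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ADR Xd, label: op=0 [31], immlo [30:29], 10000 [28:24], immhi [23:5], Rd [4:0]. */
static bool a64_is_adr(uint32_t insn) {
    return (insn & 0x9F000000u) == 0x10000000u;
}

static uint64_t a64_adr_target(uint32_t insn, uint64_t pc) {
    int64_t imm = (int64_t)((((insn >> 5) & 0x7FFFFu) << 2) | ((insn >> 29) & 3u));
    if (imm & (1 << 20))
        imm -= (int64_t)1 << 21;           /* sign-extend the 21-bit offset */
    return pc + (uint64_t)imm;
}

/* Replace "ADR Xd, label" (valid only at orig_pc) with MOVZ/MOVK that load
   the absolute target into Xd. Returns the number of words written. */
static int a64_lift_adr(uint32_t insn, uint64_t orig_pc, uint32_t out[4]) {
    uint32_t rd = insn & 0x1Fu;
    uint64_t target = a64_adr_target(insn, orig_pc);
    for (uint32_t hw = 0; hw < 4; hw++) {
        uint32_t imm16 = (uint32_t)(target >> (16 * hw)) & 0xFFFFu;
        uint32_t base = hw == 0 ? 0xD2800000u  /* MOVZ Xd, #imm16, LSL #0 */
                                : 0xF2800000u; /* MOVK Xd, #imm16, LSL #16*hw */
        out[hw] = base | (hw << 21) | (imm16 << 5) | rd;
    }
    return 4;
}
```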

A single scratch slot per session suffices because exactly one displaced step is ever in flight. The scratch page is allocated lazily from the JIT image's own execmem pool (kit_jit_image_execmem); if the JIT was built without one, single-step paths return KIT_UNSUPPORTED.

Breakpoint table

src/dbg/bp.c is an arch-neutral table keyed by runtime address. Each slot holds the address, the bytes the trap patch overwrote, a refcount, a monotonic user-visible id, and the per-breakpoint policy (skip_count, condition, max_hits). Internal (one-shot) ids are drawn from a separate high id space so they never collide with user ids; this lets the step machinery drop temporaries without disturbing a user breakpoint at the same PC. The refcount makes set/clear idempotent: a second set at an occupied address bumps the count and reuses the existing id.
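The sketch below models the refcount and separate-id-space behavior, with patching omitted. One design choice is an assumption: each address slot keeps separate user and internal refcounts. That way a temporary can be dropped without touching a user breakpoint at the same PC, while the address still has a single patch. `BpTable`, `bp_set`, `bp_clear`, and the `0x80000000` internal base are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BP_MAX 16
#define BP_INTERNAL_BASE 0x80000000u   /* hypothetical high id space */

typedef struct {
    uint64_t addr;
    unsigned user_refs, internal_refs; /* patch stays armed while either > 0 */
    uint32_t user_id, internal_id;
} BpSlot;

typedef struct {
    BpSlot slot[BP_MAX];
    uint32_t next_user, next_internal; /* monotonic id counters */
} BpTable;

static void bp_init(BpTable *t) {
    *t = (BpTable){0};
    t->next_user = 1;
    t->next_internal = BP_INTERNAL_BASE;
}

static BpSlot *bp_find(BpTable *t, uint64_t addr) {
    for (size_t i = 0; i < BP_MAX; i++) {
        BpSlot *s = &t->slot[i];
        if ((s->user_refs || s->internal_refs) && s->addr == addr)
            return s;
    }
    return NULL;
}

/* Returns the breakpoint id, or 0 if the table is full. */
static uint32_t bp_set(BpTable *t, uint64_t addr, bool internal) {
    BpSlot *s = bp_find(t, addr);
    if (!s) {
        for (size_t i = 0; i < BP_MAX && !s; i++)
            if (!t->slot[i].user_refs && !t->slot[i].internal_refs)
                s = &t->slot[i];
        if (!s)
            return 0;
        *s = (BpSlot){ .addr = addr };  /* the trap patch would be armed here */
    }
    if (internal) {
        if (s->internal_refs++ == 0)
            s->internal_id = t->next_internal++;
        return s->internal_id;
    }
    if (s->user_refs++ == 0)
        s->user_id = t->next_user++;
    return s->user_id;                  /* a second set reuses the id */
}

/* Returns true once the address has no breakpoints left (patch removed). */
static bool bp_clear(BpTable *t, uint64_t addr, bool internal) {
    BpSlot *s = bp_find(t, addr);
    if (!s)
        return false;
    unsigned *refs = internal ? &s->internal_refs : &s->user_refs;
    if (*refs)
        (*refs)--;
    return s->user_refs == 0 && s->internal_refs == 0;
}
```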

Patching goes through the host W^X window. code_write_begin returns a writable alias for the runtime address: the write side of a dual mapping on hosts that have one, otherwise the address itself after a transient mprotect RW↔RX flip. The original bytes are saved, and the arch's trap sequence (ArchDbgOps.breakpoint_patch, e.g. BRK on aarch64) is written through that alias. Then code_write_end closes the window and the icache is flushed. User breakpoints are constrained to the JIT image address range; internal breakpoints may also target the scratch page, which lies outside it. dbg_bp_fini restores every armed patch so the image is left clean.
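The fallback (no dual mapping) path of that sequence can be sketched with POSIX mprotect. The steps are: flip the covering page(s) to RW, save the original bytes, write the trap, flip back to RX, then flush the icache with the GCC/Clang builtin. `patch_code` is a hypothetical name. The sketch assumes a host that permits an RX mapping of anonymous memory, which some hardened environments do not.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Overwrite len bytes at addr (inside RX code) with trap, saving the
   originals into saved. Returns 0 on success, -1 if mprotect fails. */
static int patch_code(void *addr, const void *trap, size_t len, void *saved) {
    long page = sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(uintptr_t)(page - 1);
    size_t span = (uintptr_t)addr + len - start;   /* mprotect rounds up */
    if (mprotect((void *)start, span, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(saved, addr, len);   /* what the trap overwrites */
    memcpy(addr, trap, len);
    if (mprotect((void *)start, span, PROT_READ | PROT_EXEC) != 0)
        return -1;
    __builtin___clear_cache((char *)addr, (char *)addr + len); /* GCC/Clang */
    return 0;
}
```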

The table is also a read overlay: dbg_bp_unpatch_read substitutes the saved bytes back into any memory read that overlaps a patched range, so x, disassembly, and the displaced-step decoder never see the trap byte.
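The overlay is an interval intersection. For each patch that overlaps the range just read, the sketch copies the saved bytes back over the affected part of the caller's buffer, so a read that covers only half a patch gets only that half restored. `Patch` and `unpatch_read` are hypothetical names, and the fixed 4-byte `orig` array is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t addr;          /* runtime address of the patch */
    unsigned char orig[4];  /* bytes the trap overwrote */
    size_t len;
} Patch;

/* buf holds a raw copy of [addr, addr+n); restore original bytes for
   every patch that overlaps that range. */
static void unpatch_read(const Patch *p, size_t np, uint64_t addr,
                         unsigned char *buf, size_t n) {
    for (size_t i = 0; i < np; i++) {
        uint64_t lo = p[i].addr > addr ? p[i].addr : addr;
        uint64_t pend = p[i].addr + p[i].len, rend = addr + n;
        uint64_t hi = pend < rend ? pend : rend;
        for (uint64_t a = lo; a < hi; a++)   /* empty if no overlap */
            buf[a - addr] = p[i].orig[a - p[i].addr];
    }
}
```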

Guest memory and registers

Because the worker shares the REPL's address space, reading and writing guest memory is a guarded memcpy, not a ptrace/mach_vm round trip. src/dbg/mem.c delegates to the host's guarded_copy, which arms a thread-local sigsetjmp landing slot before the copy; the host SEGV/BUS handler checks that slot first and longjmps back on a bad address before the session's on_fault ever sees it. So a p *badptr returns an error instead of killing the worker. Read results then pass through the breakpoint read overlay.
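The guarded-copy pattern can be sketched with a thread-local sigjmp_buf pointer that the SEGV/BUS handler checks before anything else. The names (`guard_landing`, `guard_install`, `guarded_copy`) are hypothetical. A real host would forward unguarded faults to on_fault rather than falling back to the default action as this sketch does. The copy loop uses volatile accesses so the compiler cannot drop the probing loads.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Thread-local landing slot: non-NULL only while a guarded copy runs. */
static _Thread_local sigjmp_buf *guard_landing;

static void guard_handler(int sig) {
    if (guard_landing)
        siglongjmp(*guard_landing, 1);  /* bad address inside a guarded copy */
    /* Not ours: restore the default action; returning re-runs the faulting
       access, which then terminates the process. */
    signal(sig, SIG_DFL);
}

static void guard_install(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = guard_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
}

/* Copies n bytes; returns 0, or -1 if the source or destination faulted. */
static int guarded_copy(void *dst, const void *src, size_t n) {
    sigjmp_buf env;
    if (sigsetjmp(env, 1)) {            /* 1: also restore the signal mask */
        guard_landing = NULL;
        return -1;
    }
    guard_landing = &env;
    const volatile unsigned char *s = src;
    volatile unsigned char *d = dst;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    guard_landing = NULL;
    return 0;
}
```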

Register access reads and writes the KitUnwindFrame snapshot captured in the stop slot; set_regs mutates that slot, and the parked on_fault writes it back into the ucontext on resume. The valid states differ by operation. Memory read and write are accepted while STOPPED or EXITED, whereas register get and set require STOPPED (there is no live register snapshot once the program has exited). A PC written through set_regs must lie inside the JIT image.
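Those state rules reduce to a small predicate. In the sketch, `S_IDLE`, the `DbgOp` enum, and `set_pc_ok` are hypothetical. The image range is taken as a half-open `[lo, hi)` interval by assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { S_IDLE, S_RUNNING, S_STOPPED, S_EXITED } SessState;
typedef enum { OP_MEM_READ, OP_MEM_WRITE, OP_GET_REGS, OP_SET_REGS } DbgOp;

/* Which session states accept each inspection operation. */
static bool op_allowed(DbgOp op, SessState st) {
    switch (op) {
    case OP_MEM_READ:
    case OP_MEM_WRITE:
        return st == S_STOPPED || st == S_EXITED;
    case OP_GET_REGS:
    case OP_SET_REGS:
        return st == S_STOPPED;  /* no register snapshot after exit */
    }
    return false;
}

/* A PC written through set_regs must stay inside the JIT image [lo, hi). */
static bool set_pc_ok(uint64_t pc, uint64_t image_lo, uint64_t image_hi) {
    return pc >= image_lo && pc < image_hi;
}
```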

REPL (driver/cmd/dbg.c)

The driver TU mirrors kit run for compile flags and argv shape (with -g forced on), turns the input list into a JIT image, opens a DWARF view over it (kit_jit_view → kit_dwarf_open → kit_jit_session_attach_dwarf), then reads commands from stdin and dispatches them. Inputs follow the shared DriverInputs shape used by kit run (C sources, objects, archives, and stdin), so a session can mix pipeline-compiled -g sources with prebuilt .o/.a. With no inputs the REPL starts empty and code is appended interactively with jit/expr.

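Responsibilities that stay in the driver include command parsing, stop rendering, DWARF queries, the driver-local breakpoint table, and SIGINT forwarding.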
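As one example of the command parsing, here is a sketch of a parser for the three breakpoint location forms listed in the introduction (file:line, sym[+off], 0xADDR). `BpLoc`, `parse_bp_loc`, and the 128-byte name limit are hypothetical. The precedence is also an assumption: a leading "0x" means an address, otherwise the last ':' splits file from line, otherwise a '+' introduces an offset.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { LOC_ADDR, LOC_FILE_LINE, LOC_SYM_OFF, LOC_BAD } LocKind;

typedef struct {
    LocKind kind;
    char name[128];      /* file or symbol name */
    unsigned long line;
    uint64_t value;      /* address, or offset from the symbol */
} BpLoc;

static BpLoc parse_bp_loc(const char *s) {
    BpLoc loc = { .kind = LOC_BAD };
    char *end;
    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        loc.value = strtoull(s, &end, 16);          /* base 16 accepts "0x" */
        loc.kind = *end == '\0' ? LOC_ADDR : LOC_BAD;
        return loc;
    }
    const char *sep = strrchr(s, ':');
    const char *plus = sep ? NULL : strchr(s, '+');
    const char *name_end = sep ? sep : (plus ? plus : s + strlen(s));
    size_t n = (size_t)(name_end - s);
    if (n == 0 || n >= sizeof loc.name)
        return loc;
    memcpy(loc.name, s, n);
    loc.name[n] = '\0';
    if (sep) {
        loc.line = strtoul(sep + 1, &end, 10);
        if (end != sep + 1 && *end == '\0' && loc.line > 0)
            loc.kind = LOC_FILE_LINE;
    } else if (plus) {
        loc.value = strtoull(plus + 1, &end, 0);    /* "+16" or "+0x10" */
        if (end != plus + 1 && *end == '\0')
            loc.kind = LOC_SYM_OFF;
    } else {
        loc.kind = LOC_SYM_OFF;                     /* bare symbol, offset 0 */
    }
    return loc;
}
```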

Runtime-vs-image PC translation

DWARF line, CFI, and variable tables are authored in image-relative vaddrs, while stop PCs, breakpoint addresses, and return addresses pushed by calls live in the JIT runtime address space. Both the session (step.c) and the REPL translate at the boundary: every DWARF call consumes an image vaddr and every DWARF result that names a code address is image-relative, mapped with kit_jit_runtime_to_image / kit_jit_image_to_runtime. The fallback is pass-through, so a PC outside the image (e.g. a future multi-input stop in foreign code) degrades gracefully instead of resolving to zero. Register and CFA values stay in their runtime form because the DWARF location evaluator reads them as live host values.
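The translation and its pass-through fallback amount to a range-checked offset. `ImageMap` and both helpers are hypothetical stand-ins for kit_jit_runtime_to_image / kit_jit_image_to_runtime. The sketch uses one unsigned subtraction per range check, which also rejects values below the base because they wrap to a large number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of where the image landed at runtime. */
typedef struct {
    uint64_t image_base;    /* vaddr the DWARF tables are authored against */
    uint64_t runtime_base;  /* where the JIT actually mapped it */
    uint64_t size;
} ImageMap;

/* Runtime PC -> image vaddr; PCs outside the image pass through unchanged. */
static uint64_t runtime_to_image(const ImageMap *m, uint64_t pc) {
    if (pc - m->runtime_base < m->size)
        return pc - m->runtime_base + m->image_base;
    return pc;
}

/* Image vaddr -> runtime address, with the same pass-through fallback. */
static uint64_t image_to_runtime(const ImageMap *m, uint64_t va) {
    if (va - m->image_base < m->size)
        return va - m->image_base + m->runtime_base;
    return va;
}
```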

Host adapter

The POSIX adapter (driver/env/posix_dbg.c, plus per-OS code in driver/env/{macos,linux,freebsd}.c) is the only place in the tree that includes <pthread.h>, <signal.h>, and <sys/mman.h> for debugger use. It provides: pthreads for the worker and for the event objects (a mutex + condvar + signaled flag per event); sigaction-based handlers for the trap/fault cohort plus a reserved interrupt signal, which confirm the faulting thread is the worker before marshalling the ucontext and calling on_fault; the W^X code-write window, which returns a dual-mapping write alias when the host has one (mach_vm_remap on macOS, memfd_create on Linux, both recorded in a shared registry) and otherwise flips protection transiently; icache flush; the sigsetjmp-guarded memory copy; and the call_with_catch/thread_abort pair backing RESUME_ABORT.
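Here is a sketch of the event objects described above, each a mutex plus a condition variable plus a signaled flag. Waiting consumes the flag, so each signal releases exactly one wait. `Event`, `Handshake`, and `demo_worker` are hypothetical names. The demo reproduces the resume/stop ping-pong from the handshake diagram with an ordinary thread and no signal handlers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool signaled;
} Event;

static void event_init(Event *e) {
    pthread_mutex_init(&e->mu, NULL);
    pthread_cond_init(&e->cv, NULL);
    e->signaled = false;
}

static void event_signal(Event *e) {
    pthread_mutex_lock(&e->mu);
    e->signaled = true;
    pthread_cond_signal(&e->cv);
    pthread_mutex_unlock(&e->mu);
}

static void event_wait(Event *e) {
    pthread_mutex_lock(&e->mu);
    while (!e->signaled)                  /* guards against spurious wakeups */
        pthread_cond_wait(&e->cv, &e->mu);
    e->signaled = false;                  /* consume: the next wait blocks again */
    pthread_mutex_unlock(&e->mu);
}

/* Demo worker: wait for resume, "run", report a stop; three rounds. */
typedef struct { Event resume, stop; int runs; } Handshake;

static void *demo_worker(void *arg) {
    Handshake *h = arg;
    for (int i = 0; i < 3; i++) {
        event_wait(&h->resume);
        h->runs++;                        /* published to the REPL by event_signal */
        event_signal(&h->stop);
    }
    return NULL;
}
```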

Scope and assumptions

Single worker thread; no concurrent guest threads. In-process only — read/write memory is a guarded copy, not remote debugging. All breakpoints are software (a trap-byte patch); the condition callback is host-side C, not JIT-compiled. Source-level stepping assumes -O0 and -g for usable line/variable mapping.