Debugger (kit dbg)
kit dbg is an interactive, in-process source-level debugger that drives a
JIT-linked image under controlled execution. It owns one worker thread running
the JIT'd entry, catches its faults, and lets a REPL inspect and steer the
program: breakpoints by file:line / sym[+off] / 0xADDR, instruction and
source-line stepping (into / over / out), backtraces, register and named-variable
read/write, and raw memory examine. The library half is src/dbg/ (freestanding
C11, like all of src/); the host primitives and the REPL live in the driver.
See JIT.md for how the image is produced and DWARF.md for
the line/CFI/variable tables the source-level features consume.
Layering
driver/cmd/dbg.c                  REPL: command parsing, stop rendering, DWARF queries,
                                  driver-local breakpoint table, SIGINT forwarding
driver/env/*                      KitDbgOs host adapter (threads, signals, W^X, fault copy)
      │   (public API: kit_jit_session_*, KitDbgOs vtable)
      ▼
src/dbg/                          library-side session
  session.c                       worker thread, event handshake, fault classification
  step.c                          resume-mode state machine (insn / line / next / out)
  displaced.c                     arch-neutral out-of-line single-step plumbing
  bp.c                            address-keyed breakpoint patch table
  mem.c                           guarded guest-memory read/write + bp-byte overlay
  dbg.h                           internal session contracts
      │   (per-arch hooks: ArchImpl.dbg → ArchDbgOps)
      ▼
src/arch/{aa64,x64,rv64}/dbg.c    trap encoding, insn decode, displaced shims
The session never calls pthread_*, sigaction, mprotect, or an icache flush
directly. Every host primitive funnels through one vtable, KitDbgOs
(include/kit/dbg.h), supplied at kit_jit_session_new through a
KitDbgHost. Everything architecture-specific — the trap byte sequence,
instruction decoding, the displaced-step shim — reaches the session through
ArchImpl.dbg (an ArchDbgOps, src/arch/arch.h). The session itself is pure
coordination logic: it knows about events, a stop slot, a breakpoint table, and a
single scratch page, and nothing about the host OS or the target ISA.
This separation is the central design decision. It keeps the debugger's control
logic testable and portable, isolates all signal/thread/memory-protection hazards
to a handful of driver TUs, and lets a new target gain a debugger by implementing
only the ArchDbgOps hooks.
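As a rough illustration of the "everything funnels through one vtable" idea, the sketch below shows a session-side helper that writes code bytes purely through function pointers. The struct `SketchHostOps`, its field signatures, and the fake host are assumptions made up for this example; the real KitDbgOs layout is not shown in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a host vtable in the spirit of
 * KitDbgOs. Field names and signatures are assumptions for illustration. */
typedef struct SketchHostOps {
    void *ctx; /* host-private state */
    int  (*code_write_begin)(void *ctx, uintptr_t addr, size_t len, void **alias);
    void (*code_write_end)(void *ctx, uintptr_t addr, size_t len);
    void (*icache_flush)(void *ctx, uintptr_t addr, size_t len);
} SketchHostOps;

/* Session-side helper: patches code using only the vtable, never the OS. */
static int sketch_patch_code(const SketchHostOps *os, uintptr_t addr,
                             const uint8_t *bytes, size_t len)
{
    void *alias = NULL;
    assert(os != NULL && bytes != NULL);
    if (os->code_write_begin(os->ctx, addr, len, &alias) != 0 || alias == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        ((uint8_t *)alias)[i] = bytes[i];
    os->code_write_end(os->ctx, addr, len);
    os->icache_flush(os->ctx, addr, len);
    return 0;
}

/* Fake host for tests: "code" is an ordinary buffer; hooks count calls. */
typedef struct SketchFakeHost { int begins, ends, flushes; } SketchFakeHost;

static int fake_begin(void *ctx, uintptr_t addr, size_t len, void **alias)
{
    (void)len;
    ((SketchFakeHost *)ctx)->begins++;
    *alias = (void *)addr; /* no dual mapping: the alias is the address itself */
    return 0;
}

static void fake_end(void *ctx, uintptr_t addr, size_t len)
{
    (void)addr; (void)len;
    ((SketchFakeHost *)ctx)->ends++;
}

static void fake_flush(void *ctx, uintptr_t addr, size_t len)
{
    (void)addr; (void)len;
    ((SketchFakeHost *)ctx)->flushes++;
}
```

Because the helper only sees function pointers, a test can swap in the fake host and check both the bytes written and the order-independent call counts without touching mprotect or signals.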
Worker thread and the fault handshake
A session owns exactly one worker thread (worker_main in src/dbg/session.c).
The worker and the REPL thread share two one-shot events — ev_resume and
ev_stop — and a single KitStopInfo slot on the session. The events are the
only synchronization point: the REPL reads the stop slot only after a
wait(ev_stop) has returned, and the worker mutates session state only from the
fault handler or, when the entry returns normally, just before it signals ev_stop.
REPL thread                               worker thread
-----------                               -------------
session_new ── thread_start ──────►       wait(ev_resume)
                                            │
session_call(entry) ──────────────────      ┤
  state = RUNNING                           │
  signal(ev_resume) ───────────────►      run entry (call_with_catch)
  wait(ev_stop)                             │
      │                                   (a) fault: SIGTRAP/SEGV/BUS/ILL/FPE/interrupt
      │                                       on_fault: snapshot regs, classify,
      │                                                 state = STOPPED,
      │ ◄────────────────                     signal(ev_stop); wait(ev_resume)
  inspect stop slot                           (parked inside the signal handler)
  session_resume(MODE) ──────────────►    (b) entry returns normally:
      │                                       state = EXITED, stop.kind = EXIT,
      │ ◄────────────────                     signal(ev_stop)
  read KitStopInfo
The worker loops over ev_resume. On the run-from-entry path it invokes
worker_run_entry, which dispatches the entry as either an int(int,char**)
main or a 0–8 argument uint64_t(...) thunk (the latter backs the REPL's expr
function calls). The entry runs inside call_with_catch, an optional host hook
that establishes a sigsetjmp landing pad so a RESUME_ABORT can longjmp the
worker out of a parked fault back to the loop. When the entry returns the worker
records a synthetic KIT_STOP_EXIT and signals ev_stop.
A fault on the worker enters the host signal handler, which marshals the
ucontext into a KitUnwindFrame and calls back into the session's on_fault.
This is the heart of the design: the worker parks itself inside the signal
handler. on_fault snapshots the registers into the stop slot, classifies the
cause, sets state = STOPPED, signals ev_stop, then blocks on ev_resume.
When the REPL later resumes, on_fault writes any mutated registers (a PC
override or REPL set_regs edits) back into the live KitUnwindFrame and
returns, so the kernel restarts the worker exactly where the debugger wants it.
There is no separate "continue" mechanism — resume is just unblocking the parked
handler.
Fault classification
on_fault decides what kind of stop a fault represents (KitStopKind +
KitStopReason):
- Interrupt — the signal equals the host's interrupt_signo (delivered by kit_jit_session_interrupt via pthread_kill). Reported as KIT_STOP_INTERRUPT.
- Breakpoint — the faulting PC, normalized by the arch hook breakpoint_addr_from_fault_pc (x86 INT3 reports the PC after the trap byte; aarch64 BRK reports at the byte), hits the breakpoint table. The handler then branches on the entry kind:
  - Displaced-step sentinel (an internal bp at the active displaced.return_pc) → finalize the displaced step, restore the user PC, and either resume silently (the continue-over-breakpoint fast path) or surface a step completion.
  - Plain internal bp (a one-shot dropped by step.c) → clear it and report a step completion.
  - User bp → apply skip_count, the host-side condition callback, and max_hits; a skip or rejected condition silently re-steps over the patched instruction (see displaced step) without notifying the REPL; otherwise report KIT_STOP_BREAKPOINT with the user's bp id.
- Signal — any unpatched fault: SEGV, BUS, ILL, FPE, or a SIGTRAP the program emitted itself. Reported as KIT_STOP_SIGNAL, distinguishing a genuine trap (trap_signo) from other signals.
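The decision order above can be sketched as a small pure function. All names here (`SkClassifier`, `sk_classify`, the `SK_STOP_*` values) are hypothetical, and the sketch assumes that only the trap signal consults the breakpoint table and that PC normalization is a fixed byte adjustment; the real on_fault also handles the displaced-step sentinel and per-breakpoint policy, which are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    SK_STOP_INTERRUPT,
    SK_STOP_BREAKPOINT, /* user breakpoint hit */
    SK_STOP_STEP,       /* internal one-shot bp: a step completed */
    SK_STOP_SIGNAL
} SkStopKind;

typedef struct { uintptr_t addr; int internal; } SkBp;

typedef struct {
    int interrupt_signo;   /* reserved signal used to interrupt the worker */
    int trap_signo;        /* signal raised by the trap instruction */
    size_t trap_pc_adjust; /* 1 when the PC is reported after the trap byte, else 0 */
    const SkBp *bps;
    size_t nbps;
} SkClassifier;

static const SkBp *sk_find(const SkClassifier *c, uintptr_t addr)
{
    for (size_t i = 0; i < c->nbps; i++)
        if (c->bps[i].addr == addr)
            return &c->bps[i];
    return NULL;
}

/* Interrupt first, then a table hit at the normalized PC, else a plain signal. */
static SkStopKind sk_classify(const SkClassifier *c, int signo, uintptr_t fault_pc)
{
    assert(c != NULL);
    if (signo == c->interrupt_signo)
        return SK_STOP_INTERRUPT;
    if (signo == c->trap_signo) {
        const SkBp *bp = sk_find(c, fault_pc - c->trap_pc_adjust);
        if (bp != NULL)
            return bp->internal ? SK_STOP_STEP : SK_STOP_BREAKPOINT;
    }
    return SK_STOP_SIGNAL; /* includes a trap the program emitted itself */
}
```

Checking the interrupt signal before the table means an interrupt that happens to land on a patched address is still reported as an interrupt.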
Invariants the handshake relies on: one worker per session; session_call is
rejected while RUNNING or STOPPED; the REPL touches the stop slot only after
wait(ev_stop) returns; the worker writes the stop slot only from the
async-signal context inside on_fault, or once on a normal return to record
KIT_STOP_EXIT before it signals ev_stop.
Teardown while parked
kit_jit_session_free deliberately leaks the worker when the session is torn down
in the STOPPED state. A worker parked inside the signal handler cannot be cleanly
unwound without resuming the program and letting it run to completion, and the
session is only freed at process exit, so the OS reaps the thread and the
event/signal/heap teardown is skipped. This keeps quitting from a stopped prompt
immediate.
Resume-mode state machine
kit_jit_session_resume takes a KitResumeMode and produces the next stop.
Plain CONTINUE simply unblocks the parked handler — unless the current PC sits
on a breakpoint patch, in which case the original instruction must execute before
control continues. Everything more elaborate is built in src/dbg/step.c on two
primitives: the displaced single-step (one instruction out of line) and one-shot
internal breakpoints.
The state machine has two execution styles. STEP_INSN and continue-over-bp set
a pending PC override and let the outer session_resume drive the single
signal/wait cycle. The source-level modes (STEP_LINE, NEXT_LINE, STEP_OUT)
drive their own signal/wait cycles in a loop inside step.c and set
pending_done so the outer resume short-circuits. They require an attached DWARF
binding and return an error without one.
- STEP_INSN — prepare a displaced step at the current PC; resume; the sentinel trap reports the completion.
- STEP_LINE (step into) — record the current source line and subprogram from DWARF, then advance one instruction at a time. At each instruction: if it is a direct call, drop a one-shot bp at the callee entry and continue into it; if it is a direct jump, follow it the same way; otherwise displaced-step over it. Stop when the source line changes or control leaves the original subprogram.
- NEXT_LINE (step over) — if the current instruction is a call, set a one-shot bp at the unwound return address (via DWARF CFI) and continue over it, then fall into the STEP_LINE loop to keep advancing until the line actually changes. A non-call falls straight into the STEP_LINE loop.
- STEP_OUT (finish) — unwind one frame with kit_dwarf_unwind_step (or an arch link-register fallback), set a one-shot bp at the caller's return address, and continue.
A bounded instruction cap guards the STEP_LINE loop against runaway stepping when the line table is sparse or absent. All PCs cross a runtime↔image translation boundary before any DWARF query (see below).
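The stop conditions and the instruction cap of the STEP_LINE loop can be illustrated on a toy model. Everything here is hypothetical: the "program" is a straight-line array of (line, subprogram) pairs, advancing the index stands in for one displaced single-step, and calls and jumps are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* One toy instruction: the source line and subprogram it belongs to. */
typedef struct { int line; int func; } SkInsn;

typedef enum {
    SK_STEP_LINE_CHANGED, /* reached a different source line */
    SK_STEP_LEFT_FUNC,    /* control left the original subprogram */
    SK_STEP_CAP_HIT,      /* gave up after `cap` instructions */
    SK_STEP_END           /* ran off the end of the toy program */
} SkStepResult;

/* Advance *pc one instruction at a time until a stop condition holds. */
static SkStepResult sk_step_line(const SkInsn *prog, size_t n, size_t *pc, size_t cap)
{
    assert(prog != NULL && pc != NULL && *pc < n);
    const int line0 = prog[*pc].line;
    const int func0 = prog[*pc].func;
    for (size_t steps = 0; steps < cap; steps++) {
        if (*pc + 1 >= n)
            return SK_STEP_END;
        (*pc)++; /* stands in for one displaced single-step */
        if (prog[*pc].func != func0)
            return SK_STEP_LEFT_FUNC;
        if (prog[*pc].line != line0)
            return SK_STEP_LINE_CHANGED;
    }
    return SK_STEP_CAP_HIT;
}
```

With a cap of 2 and three instructions on the same line, the loop reports `SK_STEP_CAP_HIT` instead of spinning, which is the behavior the bounded cap is meant to guarantee when line information is sparse.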
Displaced single-step
User-mode aarch64 has no architectural single-step (the single-step bit lives at
EL1). The debugger therefore executes one instruction out of line: it copies a
fixed-up version of the target instruction into a scratch page, appends a trap
sentinel, sets the worker PC to the scratch entry, and resumes. This primitive
also underlies "resume past a breakpoint": the patched byte cannot execute in
place, so the original instruction runs from scratch instead. src/dbg/displaced.c
is the arch-neutral plumbing; the per-arch lifter is
ArchDbgOps.build_displaced_shim.
session reserves one RX scratch page (from the JIT's execmem pool)
displaced_prepare(insn_pc):
    decode the original instruction (read via dbg_mem_read, so the saved
        byte — not the trap patch — is decoded)
    build_displaced_shim → writes a fixed-up copy + a trap sentinel into the
        scratch page, returns the sentinel offset and the fallthrough PC
    flush icache over the whole slot
    arm an internal bp on the sentinel (so the fault classifier recognizes it)
    new_pc = scratch entry
worker resumes at scratch ─► runs the fixed-up insn ─► hits the sentinel
displaced_finalize:
    clear the internal bp
    if the worker stopped at the sentinel, set PC = fallthrough_pc;
    if a fixed-up branch took, PC is already elsewhere — leave it
Fixup is needed for any PC-relative operand so the instruction behaves correctly
at the scratch address rather than its original one. The aarch64 lifter
(src/arch/aa64/dbg.c) handles the full core family:
- No PC-relative operand → copied verbatim, sentinel immediately after.
- B/BL/B.cond → re-encode the immediate (or, when out of range, a literal-load + indirect-branch trampoline).
- CBZ/CBNZ/TBZ/TBNZ → always a trampoline: the not-taken path falls through to a sentinel, the taken path loads the absolute target from a literal pool and branches to it.
- ADR/ADRP → replaced with a load of the original (absolute) result from a literal pool.
- LDR (literal), integer and LDRSW → synthesized as a two-step indirect load through the fixed-up literal address.
- BR/BLR/RET → copied verbatim; the trailing sentinel never fires because control has already left the scratch slot, so prepare is idempotent and clears any lingering internal bp before laying down the next shim.
A single scratch slot per session suffices because exactly one displaced step is
ever in flight. The scratch page is allocated lazily from the JIT image's own
execmem pool (kit_jit_image_execmem); if the JIT was built without one,
single-step paths return KIT_UNSUPPORTED.
Breakpoint table
src/dbg/bp.c is an arch-neutral table keyed by runtime address. Each slot holds
the address, the bytes the trap patch overwrote, a refcount, a monotonic
user-visible id, and the per-breakpoint policy (skip_count, condition,
max_hits). Internal (one-shot) ids are drawn from a separate high id space so
they never collide with user ids; this lets the step machinery drop temporaries
without disturbing a user breakpoint at the same PC. The refcount makes
set/clear idempotent: a second set at an occupied address bumps the count
and reuses the existing id.
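The refcount behavior can be sketched with a tiny fixed-size table. The types and functions below (`SkBpTable`, `sk_bp_set`, `sk_bp_clear`) are hypothetical and omit the saved bytes, the policy fields, and the separate internal id space; they only show how a second set at an occupied address reuses the slot and id.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_BP_MAX 8

typedef struct { uintptr_t addr; unsigned refs; unsigned id; } SkBpSlot;
typedef struct { SkBpSlot slots[SK_BP_MAX]; unsigned next_id; } SkBpTable;

static void sk_bp_init(SkBpTable *t)
{
    memset(t, 0, sizeof *t);
    t->next_id = 1; /* id 0 is reserved to mean "table full" */
}

static SkBpSlot *sk_bp_find(SkBpTable *t, uintptr_t addr)
{
    for (size_t i = 0; i < SK_BP_MAX; i++)
        if (t->slots[i].refs != 0 && t->slots[i].addr == addr)
            return &t->slots[i];
    return NULL;
}

/* Returns the breakpoint id; a repeated set bumps the refcount and reuses it. */
static unsigned sk_bp_set(SkBpTable *t, uintptr_t addr)
{
    SkBpSlot *s = sk_bp_find(t, addr);
    if (s != NULL) {
        s->refs++;
        return s->id;
    }
    for (size_t i = 0; i < SK_BP_MAX; i++) {
        if (t->slots[i].refs == 0) {
            t->slots[i].addr = addr;
            t->slots[i].refs = 1;
            t->slots[i].id = t->next_id++; /* monotonic: ids are never recycled */
            return t->slots[i].id;
        }
    }
    return 0;
}

/* Returns the remaining refcount, or -1 if nothing is armed at addr.
 * A real table would restore the saved bytes when the count reaches zero. */
static int sk_bp_clear(SkBpTable *t, uintptr_t addr)
{
    SkBpSlot *s = sk_bp_find(t, addr);
    if (s == NULL)
        return -1;
    s->refs--;
    return (int)s->refs;
}
```

In this sketch a slot is considered free once its count drops to zero, and a later set at the same address gets a fresh id rather than the old one.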
Patching goes through the host W^X window: code_write_begin returns a writable
alias for the runtime address (the write side of a dual mapping on hosts that have
one, otherwise a transient mprotect RW↔RX flip), the original bytes are saved
and the arch's trap sequence (ArchDbgOps.breakpoint_patch, e.g. BRK on
aarch64) is written, then code_write_end and an icache flush. User breakpoints
are constrained to the JIT image address range; internal breakpoints are also
allowed in the scratch page, which lies outside it. dbg_bp_fini restores every
armed patch so the image is left clean.
The table is also a read overlay: dbg_bp_unpatch_read substitutes the saved
bytes back into any memory read that overlaps a patched range, so x,
disassembly, and the displaced-step decoder never see the trap byte.
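The overlay amounts to an interval intersection between the read window and each patched range. The sketch below is an assumption about how such a substitution could look, with a hypothetical `SkPatch` record holding up to four saved bytes; it is not the signature of dbg_bp_unpatch_read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One armed patch: where it is, how long, and the bytes it overwrote. */
typedef struct { uintptr_t addr; size_t len; uint8_t saved[4]; } SkPatch;

/* After a raw read of [read_addr, read_addr + buf_len) into buf, put the
 * saved original bytes back wherever the read overlaps a patched range. */
static void sk_unpatch_read(const SkPatch *patches, size_t npatches,
                            uintptr_t read_addr, uint8_t *buf, size_t buf_len)
{
    assert(buf != NULL || buf_len == 0);
    const uintptr_t read_end = read_addr + buf_len;
    for (size_t i = 0; i < npatches; i++) {
        uintptr_t lo = patches[i].addr;
        uintptr_t hi = lo + patches[i].len;
        if (lo < read_addr)
            lo = read_addr; /* clamp the patch range to the read window */
        if (hi > read_end)
            hi = read_end;
        for (uintptr_t a = lo; a < hi; a++)
            buf[a - read_addr] = patches[i].saved[a - patches[i].addr];
    }
}
```

Clamping both ends handles reads that start or end in the middle of a multi-byte trap sequence, which matters on fixed-width ISAs where a patch is four bytes long.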
Guest memory and registers
Because the worker shares the REPL's address space, reading and writing guest
memory is a guarded memcpy, not a ptrace/mach_vm round trip. src/dbg/mem.c
delegates to the host's guarded_copy, which arms a thread-local sigsetjmp
landing slot before the copy; the host SEGV/BUS handler checks that slot first and
longjmps back on a bad address before the session's on_fault ever sees it. So a
p *badptr returns an error instead of killing the worker. Read results then pass
through the breakpoint read overlay.
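A guarded copy of this kind can be sketched with standard POSIX calls (sigaction, sigsetjmp, siglongjmp). The names `sk_guard_install` and `sk_guarded_copy` are hypothetical, and the sketch simplifies the described design: the handler here only serves the guard slot and otherwise re-raises with the default action, where the real host handler would go on to call on_fault.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-thread landing slot: non-NULL only while a guarded copy is running.
 * Assumes thread-local storage is usable from the handler, which holds for
 * variables defined in the main executable on common platforms. */
static _Thread_local sigjmp_buf *volatile sk_guard;

static void sk_fault_handler(int signo)
{
    if (sk_guard != NULL)
        siglongjmp(*sk_guard, 1); /* bad address inside a guarded copy */
    signal(signo, SIG_DFL);       /* not ours: fall back to the default action */
    raise(signo);
}

static int sk_guard_install(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sk_fault_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return -1;
    return sigaction(SIGBUS, &sa, NULL);
}

/* Copies n bytes; returns 0 on success or -1 if either range faulted. */
static int sk_guarded_copy(void *dst, const void *src, size_t n)
{
    sigjmp_buf env;
    volatile unsigned char *d = dst;
    const volatile unsigned char *s = src;
    assert(dst != NULL);
    if (sigsetjmp(env, 1) != 0) { /* savemask=1 so the jump unblocks the signal */
        sk_guard = NULL;
        return -1;
    }
    sk_guard = &env;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i]; /* volatile byte loop: each access really touches memory */
    sk_guard = NULL;
    return 0;
}
```

Passing 1 as the second argument to sigsetjmp matters: jumping out of a handler without restoring the signal mask would leave SIGSEGV blocked, and the next bad address would kill the process instead of returning an error.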
Register access reads and writes the KitUnwindFrame snapshot captured in the
stop slot; set_regs mutates that slot and the parked on_fault writes it back
into the ucontext on resume. The valid states differ by operation: memory
reads and writes are accepted while STOPPED or EXITED, whereas register get and
set require STOPPED (there is no live register snapshot once the program has
exited). A set_regs PC must lie inside the JIT image.
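The state rules above fit in a small predicate. The enum and function names are hypothetical, and `SK_IDLE` is an assumed placeholder for "no call made yet"; the document itself only names RUNNING, STOPPED, and EXITED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SK_IDLE, SK_RUNNING, SK_STOPPED, SK_EXITED } SkState;
typedef enum { SK_OP_MEM_READ, SK_OP_MEM_WRITE, SK_OP_REG_GET, SK_OP_REG_SET } SkOp;

/* Memory access needs a quiescent worker; registers need a live snapshot. */
static bool sk_op_allowed(SkState st, SkOp op)
{
    switch (op) {
    case SK_OP_MEM_READ:
    case SK_OP_MEM_WRITE:
        return st == SK_STOPPED || st == SK_EXITED;
    case SK_OP_REG_GET:
    case SK_OP_REG_SET:
        return st == SK_STOPPED;
    }
    return false;
}

/* A new PC is acceptable only inside [base, base + size). The unsigned
 * subtraction wraps for pc < base, so a single compare covers both bounds. */
static bool sk_pc_in_image(uintptr_t pc, uintptr_t base, size_t size)
{
    return pc - base < size;
}
```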
REPL (driver/cmd/dbg.c)
The driver TU mirrors kit run for compile flags and argv shape (with -g
forced on), turns the input list into a JIT image, opens a DWARF view over it
(kit_jit_view → kit_dwarf_open → kit_jit_session_attach_dwarf), then
reads commands from stdin and dispatches them. Inputs follow the shared
DriverInputs shape used by kit run — C sources, objects, archives, and stdin
— so a session can mix pipeline-compiled -g sources with prebuilt .o/.a. With
no inputs the REPL starts empty and code is appended interactively with
jit/expr.
Responsibilities that stay in the driver:
- Command engine — a flat token dispatch covering r/c, the four stepping commands, b/info b/delete/ignore, bt, p/set, x/disasm/list, info reg/locals/args, jit/edit/expr, and language switching.
- Driver-local breakpoint table — the user-facing id namespace, the spec text, and the enabled flag, keyed off the session-side handles. Location strings (file:line, sym[+off], 0xADDR) are resolved with kit_dwarf_line_to_addr and kit_jit_lookup.
- Stop rendering — dbg_render_stop turns a KitStopInfo into a source-level message (breakpoint hit, step completion, signal, interrupt, exit) and a compact source-context listing.
- DWARF queries — bt via kit_dwarf_unwind_step; p name/set via kit_dwarf_var_at + kit_dwarf_loc_read; locals/args enumeration; list.
- SIGINT forwarding — while a session call is in flight the driver installs a SIGINT handler that calls kit_jit_session_interrupt; at the prompt it restores default behavior so Ctrl-C terminates.
Runtime-vs-image PC translation
DWARF line, CFI, and variable tables are authored in image-relative vaddrs, while
stop PCs, breakpoint addresses, and return addresses pushed by calls live in the
JIT runtime address space. Both the session (step.c) and the REPL translate at
the boundary: every DWARF call consumes an image vaddr and every DWARF result that
names a code address is image-relative, mapped with kit_jit_runtime_to_image /
kit_jit_image_to_runtime. The fallback is pass-through, so a PC outside the
image (e.g. a future multi-input stop in foreign code) degrades gracefully instead
of resolving to zero. Register and CFA values stay in their runtime form because
the DWARF location evaluator reads them as live host values.
Host adapter
The POSIX adapter (driver/env/posix_dbg.c, plus per-OS code in
driver/env/{macos,linux,freebsd}.c) is the only place in the tree that includes
<pthread.h>, <signal.h>, and <sys/mman.h> for debugger use. It provides:
- pthreads for the worker and for the event objects (a mutex + condvar + signaled flag per event);
- sigaction-based handlers for the trap/fault cohort plus a reserved interrupt signal, which confirm the faulting thread is the worker before marshalling the ucontext and calling on_fault;
- the W^X code-write window, which returns a dual-mapping write alias when the host has one (mach_vm_remap on macOS, memfd_create on Linux, both recorded in a shared registry) and otherwise flips protection transiently;
- icache flush;
- the sigsetjmp-guarded memory copy;
- the call_with_catch/thread_abort pair backing RESUME_ABORT.
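An event built from a mutex, a condition variable, and a signaled flag can be sketched with plain pthreads. The `SkEvent` type and its functions are hypothetical, and the auto-reset in `sk_event_wait` (the waiter consumes the signal) is an assumption about what "one-shot" means here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool signaled; /* set by signal, consumed by wait */
} SkEvent;

static int sk_event_init(SkEvent *e)
{
    assert(e != NULL);
    e->signaled = false;
    if (pthread_mutex_init(&e->mu, NULL) != 0)
        return -1;
    if (pthread_cond_init(&e->cv, NULL) != 0) {
        pthread_mutex_destroy(&e->mu);
        return -1;
    }
    return 0;
}

/* Mark the event and wake one waiter. Signaling before anyone waits is
 * fine: the flag remembers it, so the wakeup cannot be lost. */
static void sk_event_signal(SkEvent *e)
{
    pthread_mutex_lock(&e->mu);
    e->signaled = true;
    pthread_cond_signal(&e->cv);
    pthread_mutex_unlock(&e->mu);
}

/* Block until signaled, then reset so the next wait blocks again. */
static void sk_event_wait(SkEvent *e)
{
    pthread_mutex_lock(&e->mu);
    while (!e->signaled) /* loop guards against spurious wakeups */
        pthread_cond_wait(&e->cv, &e->mu);
    e->signaled = false;
    pthread_mutex_unlock(&e->mu);
}
```

The flag is what makes the REPL/worker handshake order-independent: whichever side arrives first, a signal issued before the matching wait is still observed.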
Scope and assumptions
Single worker thread; no concurrent guest threads. In-process only — read/write
memory is a guarded copy, not remote debugging. All breakpoints are software (a
trap-byte patch); the condition callback is host-side C, not JIT-compiled.
Source-level stepping assumes -O0 and -g for usable line/variable mapping.