commit 1981a616987cc76bc3f02afbf58616014c8ce2a4
parent 50d5bec75749d30912c9f54abad250f279a0ffa2
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Sat, 9 May 2026 16:23:06 -0700
DWARF plan
Diffstat:
 doc/DWARF.md | 689 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 689 insertions(+), 0 deletions(-)
diff --git a/doc/DWARF.md b/doc/DWARF.md
@@ -0,0 +1,689 @@
+# DWARF — implementation plan
+
+Scope: what it takes for cfree to produce a DWARF-bearing object file
+and to read DWARF back out of one. The producer side is `Debug`
+(`src/debug/debug.h`) + the MCEmitter line program; the consumer side
+is the `cfree_dwarf_*` family (`include/cfree.h:1224-1450`). Both sides
+share `ObjBuilder` as the carrier — debug bytes are sections, abbrev
+codes are interned, and DIE references are section-relative
+relocations.
+
+Today the headers are real, the implementations are stubs, and the
+W path in `test/cg/run.sh` is staged and waiting for them. The first
+case it asserts (`p01_line_one_inst`) is one `set_loc` + one
+instruction — the smallest demand surface that exercises the full
+producer→consumer round trip. This plan starts there and builds out.
+
+---
+
+## 1. What working DWARF must look like
+
+Two artifacts — the same one viewed from each end.
+
+### 1.1 Producer output (object file shape)
+
+A non-`-O` aarch64-elf64 object compiled with `-g` should carry, at a
+minimum:
+
+```
+.debug_abbrev DWARF 5 abbreviation table
+.debug_info one CU; subprogram + scope + variable DIEs
+.debug_line line program; rows for every set_loc transition
+.debug_line_str file & dir strings for the line program (DW5)
+.debug_str strings for .debug_info
+.debug_str_offsets DW5 indirection table for .debug_str
+.debug_aranges CU pc-range index (kept for gdb fast path)
+.debug_loclists location lists for opt'd code (Phase 5)
+.debug_rnglists range lists for noncontiguous scopes (Phase 3+)
+.eh_frame CFI for unwind (Phase 4)
+```
+
+DIE shape we commit to in Phase 1–3:
+
+```
+DW_TAG_compile_unit (root)
+ DW_AT_producer "cfree <semver>"
+ DW_AT_language DW_LANG_C11 (or C17/C23 from CompileOptions)
+ DW_AT_name TU path (post path-remap)
+ DW_AT_comp_dir cwd at invocation (post path-remap)
+ DW_AT_stmt_list .debug_line offset
+ DW_AT_low_pc 0
+ DW_AT_ranges .debug_rnglists offset
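+ DW_AT_str_offsets_base .debug_str_offsets offset (required by the strx forms)
+ DW_AT_addr_base .debug_addr offset (only once addrx forms are used; until then addrs are inline DW_FORM_addr)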
+ DW_TAG_base_type, DW_TAG_pointer_type, DW_TAG_array_type,
+ DW_TAG_const_type, DW_TAG_volatile_type, DW_TAG_restrict_type,
+ DW_TAG_typedef, DW_TAG_subroutine_type,
+ DW_TAG_structure_type/DW_TAG_union_type with DW_TAG_member,
+ DW_TAG_enumeration_type with DW_TAG_enumerator
+ DW_TAG_subprogram
+ DW_AT_name, DW_AT_type, DW_AT_decl_file, DW_AT_decl_line,
+ DW_AT_low_pc, DW_AT_high_pc (offset form), DW_AT_frame_base
+ DW_TAG_formal_parameter (per param: name, type, decl_loc, location)
+ DW_TAG_lexical_block (per scope_begin/end pair)
+ DW_TAG_variable (per local)
+```
+
+What we do not emit (out of scope, at least until called for):
+
+- `DW_TAG_inlined_subroutine` — opt won't synthesize inlines yet.
+- `DW_TAG_namespace`, anything C++.
+- Split DWARF (`.dwo`, `DW_AT_GNU_dwo_*`).
+- `.debug_pubnames` / `.debug_pubtypes` — DW5 deprecated, gdb-index
+ builds its own.
+- `.debug_macro` — pp doesn't feed it yet.
+
+### 1.2 Consumer surface
+
+`cfree_dwarf_open(CfreeCompiler*, const CfreeObjFile*) → CfreeDebugInfo*`
+must answer the queries declared in `include/cfree.h:1252-1450`. The
+test/cg W path exercises a strict subset to begin with:
+
+- `cfree_dwarf_addr_to_line` / `cfree_dwarf_line_to_addr`
+- `cfree_dwarf_subprogram_at` (and its thin wrapper `_func_at`)
+- `cfree_dwarf_var_at` + `cfree_dwarf_loc_read`
+- `cfree_dwarf_type_info` + field/enum iters
+- `cfree_dwarf_unwind_step` (Phase 4)
+
+`emu/dbg` (`doc/EMU.md` §8) consumes the same API; nothing in the
+debugger should reach past `cfree_dwarf_*`.
+
+---
+
+## 2. Current state inventory
+
+### 2.1 Headers — real
+
+- `src/debug/debug.h` — full producer API (Debug, type DIE builders,
+ func/scope/var lifecycle, line program, loclist, `debug_emit`).
+- `src/debug/c_debug.h` — `c_debug_type(Debug*, TargetABI*, const Type*)`
+ adapter, with the documented "intern by Type* pointer" contract.
+- `include/cfree.h:1224-1450` — full consumer API.
+
+### 2.2 Implementations — stubs
+
+- `src/api/stubs.c:93-98` — `debug_new` panics, `debug_emit` and
+ `debug_free` no-op. No `Debug` ever exists today.
+- `src/api/pipeline.c:230-238` — pipeline already calls
+ `debug_new(c, ob)` when `opts->debug_info` is set, and
+ `debug_emit(debug)` after codegen. The driver is wired; only the
+ module behind it is missing.
+- `src/api/stubs.c:319-440` — every `cfree_dwarf_*` returns "no DWARF"
+ / NULL.
+
+### 2.3 Producer-side wiring — partial
+
+- `cg_set_loc(CG*, SrcLoc)` — declared in `src/cg/cg.h:159` ("propagates
+ to CGTarget and Debug") but `cg.c` propagates only to `CGTarget`
+ today; the Debug fanout is dead until `Debug` exists.
+- `CGTarget::set_loc` — every backend stamps the loc on the impl, but
+ no backend calls into `Debug` yet (correct: that's `cg_set_loc`'s
+ responsibility, not the backend's).
+- MCEmitter — has no notion of `(text_section, offset, SrcLoc)` rows.
+ Needs a per-emitter `LineProgram` accumulator that flushes into
+ `Debug` on demand (or that `Debug` polls on `debug_func_pc_range`).
+- Aarch64 backend — emits sized text sections (`func_end` calls
+ `obj_symbol_define` with the function size), so `debug_func_pc_range`
+ has the bounds it needs.
+- `c_debug_type` — declared, not implemented. Needs to walk the
+ `Type*` chain producing `debug_type_*` calls, with a per-Debug
+ `Type* → DebugTypeId` cache for interning.
+
+### 2.4 Test surface
+
+- `test/cg/run.sh:336-359` — W path runs `cg-runner --dwarf-checks NAME
+ | cg_check_dwarf OBJ`; cases with no directives are silently
+ skipped, cases with directives are graded by the `line` and
+ `subprogram` directives implemented in
+ `test/cg/harness/cg_check_dwarf.c:124-167`.
+- `test/cg/harness/cases_p.c:26-38` — `build_p01_line_one_inst` is the
+ only registered Group P case. Comment block calls out the dependency
+ chain: Debug, MCEmitter line program, `cfree_dwarf_open`.
+- `cg_check_dwarf` uses **only** the public consumer API. It is the
+ consumer's first real client.
+
+---
+
+## 3. Producer pipeline
+
+### 3.1 Who drives Debug
+
+Producer events split into three classes by who has the information.
+Each class has a different driver — there is no single "the producer"
+that calls Debug.
+
+| Event class | Driver | Source-level info | Storage info | Text-offset info |
+|---|---|---|---|---|
+| `func_begin/end`, decl loc, types | parser | parser (`Type*`, decl `SrcLoc`) | — | — |
+| `param`, `local`, `scope_begin/end` | parser | parser (name, type, scope) | CG (Reg / frame_ofs, returned to parser) | — |
+| `set_loc` line rows | parser + backend | parser (`SrcLoc`) | — | backend (`obj_pos`) |
+| `func_pc_range` | CG | — | — | backend (`obj_pos` at `func_end`) |
+
+The parser holds `(cg, debug)` as peers. Anything the parser knows on
+its own — declarations, scopes, types — goes from parser straight to
+Debug; it does not transit CG. CG and the backend drive Debug only
+for events whose information they own.
+
+**Class 1 — declarations, types, scopes.** The parser is the only
+thing that has lexical scopes, decl `SrcLoc`s, and the C `Type*`
+chain. CG sees individual ops, not `for`-loop bodies; the backend
+sees instructions, not declarations. The parser calls Debug directly:
+
+```c
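+/* Parser, Sym, Expr and dvl_from_reg are illustrative names. */
+void parser_decl_local(Parser* p, Sym name, const Type* type,
+                       Expr* init, SrcLoc loc) {
+  Reg r = cg_alloc_local(p->cg, type); /* CG returns storage */
+  if (p->debug) {
+    DebugTypeId tid = c_debug_type(p->debug, p->abi, type);
+    debug_local(p->debug, name, tid, loc, dvl_from_reg(r));
+  }
+  /* ...then lower `init` through CG as usual... */
+}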
+```
+
+`debug_func_begin` is the same shape: parser resolves the function's
+`Type*` through `c_debug_type` and calls `debug_func_begin` itself
+(in addition to whatever it tells CG to do).
+
+**Class 2 — line rows.** The parser knows `SrcLoc`; the backend knows
+`obj_pos`. Neither alone can produce a row, so this is the one place
+Debug receives events from two sides:
+
+1. Parser calls `cg_set_loc(cg, loc)` before each statement-level IR
+ op. CG forwards to `target->set_loc` (already wired) and stashes
+ the loc on `debug` via `debug_set_pending_loc(debug, loc)`.
+2. Each backend instruction emit calls
+ `debug_emit_row(debug, section_id, offset, pending_loc)` after
+ writing bytes. Debug appends; rows arrive in text order, no sort
+ pass needed.
+
+Granularity is per-instruction, not per-CG-op. A multi-instruction CG
+op (e.g. a 64-bit immediate via `MOVZ; MOVK; MOVK; MOVK`) produces
+four rows pointing at the same loc. This is correct DWARF: only the
+first row sets `is_stmt`, the rest are continuation rows. Debug
+deduplicates a row whose `(section, offset, loc)` matches the
+previous row, so a repeated emit for the same instruction costs
+nothing. The backend's Debug dependency stays minimal: a single
+one-line call against an already-needed `obj_pos`.
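
The row accumulator described above can be sketched as follows. This is a minimal sketch. `SrcLoc`, `LineRow`, and `emit_row` are hypothetical stand-ins for cfree's types and for `debug_emit_row`. A fixed-capacity array replaces `Vec<LineRow>`. The sketch drops exact `(section, offset, loc)` duplicates and sets `is_stmt` only on the first row of each new loc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for cfree's SrcLoc and LineRow. */
typedef struct { uint32_t file, line, col; } SrcLoc;

typedef struct {
  uint32_t section; /* text section id */
  uint32_t offset;  /* byte offset of the instruction in that section */
  SrcLoc loc;
  bool is_stmt;     /* true only on the first row of a new loc */
} LineRow;

enum { MAX_ROWS = 256 };

typedef struct {
  LineRow rows[MAX_ROWS]; /* fixed capacity keeps the sketch allocation-free */
  size_t n;
} LineRows;

static bool loc_eq(SrcLoc a, SrcLoc b) {
  return a.file == b.file && a.line == b.line && a.col == b.col;
}

/* Called by the backend after writing one instruction's bytes. */
static void emit_row(LineRows* lr, uint32_t section, uint32_t offset, SrcLoc loc) {
  const LineRow* prev = lr->n ? &lr->rows[lr->n - 1] : NULL;
  if (prev && prev->section == section && prev->offset == offset &&
      loc_eq(prev->loc, loc))
    return; /* exact duplicate of the previous row: drop it */
  if (lr->n == MAX_ROWS)
    return; /* sketch only: a real Vec would grow */
  bool continuation = prev && prev->section == section && loc_eq(prev->loc, loc);
  lr->rows[lr->n++] = (LineRow){section, offset, loc, !continuation};
}
```

Consider four `emit_row` calls for `MOVZ; MOVK; MOVK; MOVK` at offsets 0, 4, 8, 12, all with the same loc. They yield four rows, and only the first has `is_stmt` set.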
+
+The harness in `test/cg/harness/cases_p.c:33` calls
+`target->set_loc` directly. That stays — it's the parser-side half of
+Class 2, and the harness is the parser stand-in for these tests.
+
+**Class 3 — `func_pc_range`.** Function bounds are only known at
+`func_end`, after the backend has finalized the function size. CG
+holds those bounds and calls `debug_func_pc_range` from inside
+`cg_func_end`:
+
+```c
+void cg_func_end(CG* cg) {
+  cg->target->func_end(cg->target);                /* backend finalizes the function */
+  u32 end_ofs = obj_pos(cg->ob, cg->cur_text_sec); /* bounds are final only now */
+  if (cg->debug)
+    debug_func_pc_range(cg->debug, cg->cur_text_sec,
+                        cg->func_begin_ofs, end_ofs);
+}
+```
+
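+This is the only class where CG drives Debug with information it owns
+(its `debug_set_pending_loc` call in Class 2 only relays the parser's
+loc). The parser doesn't have the bounds, and the backend's sole Debug
+call is `debug_emit_row`.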
+
+**Emu lifter.** `src/emu/` (see `doc/EMU.md` §8) is a parser-shaped
+client for guest code: it calls Debug directly with synthetic
+`file_id`s encoding guest PC. Same Class-1 / Class-2 split — the
+lifter is the parser, the host backend is the backend.
+
+### 3.2 What this means for module dependencies
+
+- `src/debug/` does not include `src/cg/` or `src/arch/`. Debug is
+ driven *into*, never *out of*.
+- The backend (`src/arch/*`) gets one new dependency: a single Debug
+ forward declaration plus `debug_emit_row`. No type DIE, no
+ declaration API, nothing else.
+- CG (`src/cg/cg.c`) calls Debug only for `set_pending_loc` (in
+ `cg_set_loc`) and `func_pc_range` (in `cg_func_end`).
+- Everything else — type construction, params, locals, scopes,
+ func_begin — is parser → Debug, with CG out of the path.
+
+### 3.3 Module shape
+
+```
+src/debug/
+ debug.h (existing)
+ c_debug.h (existing)
+ debug.c NEW: state, type DIEs, func/scope/var, line program
+ debug_emit.c NEW: linearize to .debug_* sections in ObjBuilder
+ debug_abbrev.c NEW: abbrev pool, dedup, encode
+ debug_form.c NEW: form encoders (LEB128, strx, addrx, sec_offset)
+ debug_eh.c NEW: .eh_frame CIE+FDE assembler (Phase 4)
+ c_debug.c NEW: c_debug_type adapter + Type* → DebugTypeId cache
+```
+
+State held by `Debug`:
+
+```
+ Compiler* c;
+ ObjBuilder* ob;
+
+ /* file table — DWARF file index ←→ SourceManager file_id */
+ Vec<u32> file_to_src; /* dwarf_idx → src file_id */
+ Map<u32,u32> src_to_file; /* src file_id → dwarf_idx */
+
+ /* type DIE pool */
+ Vec<DebugType> types; /* indexed by DebugTypeId-1 */
+
+ /* function lifecycle stack (one entry per open func_begin) */
+ Vec<DebugFunc> funcs;
+
+ /* line program rows, in (section, offset) order */
+ Vec<LineRow> lines;
+ SrcLoc pending_loc; /* set by debug_set_pending_loc */
+
+ /* loclists keyed by debug_loclist_new id */
+ Vec<LocList> loclists;
+```
+
+`DebugType` is a tagged record carrying everything the abbrev encoder
+needs: kind, name (interned `Sym`), inner ids, byte size, member
+list, encoding, etc.
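
The file table above maps DWARF file indices to SourceManager file ids. Here is a minimal sketch of it. A fixed-capacity array and linear search replace `Vec<u32>` and `Map<u32,u32>`. `FileTable` and `file_index_for` are hypothetical names for what `debug_file` has to do. The sketch follows the convention in §3.4 and §7: index 0 is the CU's primary file, and later indices are dense from 1.

```c
#include <stdint.h>

enum { MAX_FILES = 64 };

/* Hypothetical file table: DWARF file index -> SourceManager file_id. */
typedef struct {
  uint32_t file_to_src[MAX_FILES]; /* dwarf_idx -> src file_id */
  uint32_t n;                      /* entries in use, including file 0 */
} FileTable;

/* File 0 is always the CU's primary file (the DW_AT_name file). */
static void file_table_init(FileTable* ft, uint32_t primary_src_id) {
  ft->file_to_src[0] = primary_src_id;
  ft->n = 1;
}

/* Returns the DWARF index for src_id, allocating the next dense index on
 * first use. Returns UINT32_MAX when the table is full. Linear search stands
 * in for the src_to_file map. */
static uint32_t file_index_for(FileTable* ft, uint32_t src_id) {
  for (uint32_t i = 0; i < ft->n; i++)
    if (ft->file_to_src[i] == src_id) return i;
  if (ft->n == MAX_FILES) return UINT32_MAX;
  ft->file_to_src[ft->n] = src_id;
  return ft->n++;
}
```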
+
+### 3.4 Line program
+
+DWARF 5 line program. Most of the complications are in the header:
+
+- File 0 is the CU's primary file (the `DW_AT_name` value).
+ Subsequent file numbers are dense, allocated as `debug_file` is
+ called.
+- `directory_entry_format` is `{DW_LNCT_path, DW_FORM_line_strp}`;
+  `file_name_entry_format` is `{DW_LNCT_path, DW_FORM_line_strp}`
+  followed by `{DW_LNCT_directory_index, DW_FORM_udata}`. Strings live
+  in `.debug_line_str` (separate from `.debug_str`).
+- `minimum_instruction_length = 4`, `maximum_operations_per_instruction
+ = 1` for aarch64.
+- Standard and special opcodes, plus the two required extended
+  opcodes `DW_LNE_set_address` and `DW_LNE_end_sequence`; no vendor
+  extensions. Standard opcodes used: `DW_LNS_copy`, `set_file`,
+  `set_column`, `negate_stmt`, `advance_pc`, `advance_line`,
+  `const_add_pc`; special opcodes cover compact (address, line) deltas.
+- Each sequence starts with `DW_LNE_set_address`, whose 8-byte operand
+  carries a `DW_RELOC_ABS64` relocation against the function symbol, so
+  the linker patches PCs at link time; later address advances are
+  deltas. We do **not** emit `.debug_addr` indirection in Phase 1;
+  switch to it in Phase 5 if the size cost matters.
+
+Emit order, per CU:
+
+```
+header → file_names + dir_names assembled from `file_to_src`
+opcodes → emit one DW_LNE_set_address per function entry
+ (sym = function ObjSymId, addend = 0)
+ → walk `lines` for that function, advancing PC and line
+ → DW_LNE_end_sequence at function end
+```
+
+A canonical output (one func, one line, one inst):
+
+```
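+[header ...]
+DW_LNE_set_address  &test_main
+DW_LNS_set_file     0    ; the state machine starts at file 1, even in DW5
+DW_LNS_advance_line +9   ; from default 1 → 10
+DW_LNS_copy              ; emits row (file=0, line=10, addr=&test_main)
+DW_LNS_advance_pc   1    ; operation advance × min_inst_length 4 = 4 bytes
+DW_LNE_end_sequence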
+```
+
+This is exactly what `p01_line_one_inst` should produce.
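
The sketch below encodes that canonical sequence to bytes, using the header values above (`minimum_instruction_length = 4`). `Buf` and `emit_one_row_sequence` are hypothetical. The opcode values and the initial state-machine registers (`file = 1`, `line = 1`) come from the DWARF 5 specification. The address is a zero placeholder, and its offset is reported so the caller can attach the relocation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Line-program opcode values from the DWARF 5 specification. */
enum {
  DW_LNS_copy = 0x01,
  DW_LNS_advance_pc = 0x02,
  DW_LNS_advance_line = 0x03,
  DW_LNS_set_file = 0x04,
};
enum { DW_LNE_end_sequence = 0x01, DW_LNE_set_address = 0x02 };

typedef struct { uint8_t b[64]; size_t n; } Buf; /* sketch: fixed capacity */

static void put(Buf* o, uint8_t v) {
  if (o->n < sizeof o->b) o->b[o->n++] = v;
}

static void put_uleb(Buf* o, uint64_t v) {
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    put(o, byte | (v ? 0x80 : 0));
  } while (v);
}

static void put_sleb(Buf* o, int64_t v) {
  for (;;) {
    uint8_t byte = v & 0x7f;
    v >>= 7; /* arithmetic shift, as on gcc/clang */
    bool done = (v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40));
    put(o, byte | (done ? 0 : 0x80));
    if (done) return;
  }
}

/* One sequence holding a single row at (file, line) that covers n_insts
 * instructions. *reloc_at receives the offset of the 8-byte address slot
 * that needs the ABS64 relocation against the function symbol. */
static void emit_one_row_sequence(Buf* o, uint32_t file, uint32_t line,
                                  uint32_t n_insts, size_t* reloc_at) {
  put(o, 0);                      /* extended-opcode introducer */
  put_uleb(o, 9);                 /* length: sub-opcode + 8-byte address */
  put(o, DW_LNE_set_address);
  *reloc_at = o->n;
  for (int i = 0; i < 8; i++) put(o, 0);
  if (file != 1) {                /* the state machine starts with file = 1 */
    put(o, DW_LNS_set_file);
    put_uleb(o, file);
  }
  put(o, DW_LNS_advance_line);
  put_sleb(o, (int64_t)line - 1); /* the state machine starts with line = 1 */
  put(o, DW_LNS_copy);
  put(o, DW_LNS_advance_pc);
  put_uleb(o, n_insts);           /* scaled by minimum_instruction_length */
  put(o, 0);
  put_uleb(o, 1);
  put(o, DW_LNE_end_sequence);
}
```

For `p01` (`file = 0`, `line = 10`, one instruction) this produces 21 bytes, with the relocation slot at offset 3.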
+
+### 3.5 .debug_info / .debug_abbrev
+
+Two-pass:
+
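+1. Walk the in-memory DIE tree to assign abbrev codes (dedup by
+   `(tag, has_children, attr_list)` tuple, where the attr list includes
+   each attribute's form). Build the abbrev section in the order codes
+   were assigned. Once forms are fixed, each DIE's encoded size is
+   known, so the same walk records every DIE's offset.
+2. Encode the DIE tree against the abbrev table. Forward references
+   (e.g. `DW_AT_type` to a type DIE that hasn't been emitted yet) are
+   resolved from the offset table recorded in pass 1. This relies on
+   fixed-width reference forms such as `DW_FORM_ref4`.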
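
Pass 1's abbrev dedup and the `.debug_abbrev` encoding can be sketched as below. `AbbrevPool`, `abbrev_intern`, and `abbrev_encode` are hypothetical names. Linear search stands in for whatever hash the real pool uses. The byte layout follows the DWARF 5 spec: code, tag, children flag, (attr, form) pairs, a (0, 0) terminator, and a final 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { MAX_ATTRS = 16, MAX_ABBREVS = 128 };

/* Hypothetical abbrev pool; the dedup key is (tag, has_children, [(attr, form)]). */
typedef struct { uint16_t attr, form; } AttrSpec;
typedef struct {
  uint16_t tag;
  bool has_children;
  uint8_t n_attrs;
  AttrSpec attrs[MAX_ATTRS];
} AbbrevDecl;
typedef struct { AbbrevDecl decls[MAX_ABBREVS]; uint32_t n; } AbbrevPool;

static bool abbrev_eq(const AbbrevDecl* a, const AbbrevDecl* b) {
  return a->tag == b->tag && a->has_children == b->has_children &&
         a->n_attrs == b->n_attrs &&
         memcmp(a->attrs, b->attrs, a->n_attrs * sizeof a->attrs[0]) == 0;
}

/* Returns the code for d, interning it on first sight. Codes start at 1
 * because 0 terminates the table; returns 0 when the pool is full. */
static uint32_t abbrev_intern(AbbrevPool* p, const AbbrevDecl* d) {
  for (uint32_t i = 0; i < p->n; i++)
    if (abbrev_eq(&p->decls[i], d)) return i + 1;
  if (p->n == MAX_ABBREVS) return 0;
  p->decls[p->n] = *d;
  return ++p->n;
}

static size_t put_uleb(uint8_t* out, size_t n, uint64_t v) {
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    out[n++] = byte | (v ? 0x80 : 0);
  } while (v);
  return n;
}

/* Serializes .debug_abbrev in code order. The caller sizes `out`
 * generously; this sketch does no bounds checking. */
static size_t abbrev_encode(const AbbrevPool* p, uint8_t* out) {
  size_t n = 0;
  for (uint32_t i = 0; i < p->n; i++) {
    const AbbrevDecl* d = &p->decls[i];
    n = put_uleb(out, n, i + 1);
    n = put_uleb(out, n, d->tag);
    out[n++] = d->has_children ? 1 : 0; /* DW_CHILDREN_yes / DW_CHILDREN_no */
    for (uint8_t k = 0; k < d->n_attrs; k++) {
      n = put_uleb(out, n, d->attrs[k].attr);
      n = put_uleb(out, n, d->attrs[k].form);
    }
    out[n++] = 0; /* (0, 0) ends the attribute list */
    out[n++] = 0;
  }
  out[n++] = 0;   /* a zero code ends the table */
  return n;
}
```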
+
+Forms we commit to:
+
+- `DW_FORM_strx1` for string indices below 256, `DW_FORM_strx4` for
+  the rest. Strings are interned in `.debug_str` via a hash table;
+  offsets are written into `.debug_str_offsets`, indexed from
+  `DW_AT_str_offsets_base`.
+- `DW_FORM_sec_offset` for everything pointing at another debug
+ section (line, loclists, rnglists).
+- `DW_FORM_addr` for `DW_AT_low_pc`, written as a `R_*_ABS64` reloc
+ against the function symbol, addend = 0. `DW_AT_high_pc` uses
+ `DW_FORM_data4` and stores `func_size` (an offset from `DW_AT_low_pc`).
+- `DW_FORM_exprloc` for `DW_AT_location` and `DW_AT_frame_base`.
+- `DW_FORM_data1/2/4/udata/sdata` per attribute; pick the smallest
+ fixed form that holds the value.
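
Choosing the smallest fixed form is a range check. The string-index policy above is the same check with a single threshold. Here is a sketch with hypothetical helper names; the form codes are the DWARF 5 values.

```c
#include <stdint.h>

/* Form codes from the DWARF 5 specification. */
enum {
  DW_FORM_data2 = 0x05,
  DW_FORM_data4 = 0x06,
  DW_FORM_data8 = 0x07,
  DW_FORM_data1 = 0x0b,
  DW_FORM_strx1 = 0x25,
  DW_FORM_strx4 = 0x28,
};

/* Unsigned constants only; negative values would use DW_FORM_sdata. */
static uint16_t smallest_data_form(uint64_t v) {
  if (v <= UINT8_MAX) return DW_FORM_data1;
  if (v <= UINT16_MAX) return DW_FORM_data2;
  if (v <= UINT32_MAX) return DW_FORM_data4;
  return DW_FORM_data8;
}

/* The strx1-or-strx4 policy: strx2/strx3 are deliberately unused. */
static uint16_t strx_form(uint32_t index) {
  return index <= UINT8_MAX ? DW_FORM_strx1 : DW_FORM_strx4;
}
```

The form is part of the abbrev declaration. So two otherwise identical DIEs whose values need different widths get different abbrev codes, and the pass-1 dedup keeps them apart.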
+
+### 3.6 Variable locations
+
+`DebugVarLoc → DW_AT_location` mapping:
+
+| `DebugVarLocKind` | `DW_AT_location` encoding |
+|---|---|
+| `DVL_REG` (reg `n`) | exprloc `DW_OP_reg<n>` for n<32, else `DW_OP_regx <n>` |
+| `DVL_FRAME` (`ofs`) | exprloc `DW_OP_fbreg <sleb128 ofs>` |
+| `DVL_GLOBAL` (sym) | exprloc `DW_OP_addr <reloc against sym>` |
+| `DVL_LOCLIST` (id) | `DW_FORM_loclistx <idx>` (an index into the CU's loclist offsets, not an exprloc) |
+
+`DW_AT_frame_base` for every subprogram is `DW_OP_call_frame_cfa`;
+the CFI machine then defines what the CFA is. This is the cleanest
+encoding and matches what gcc emits (clang instead names a register,
+typically the frame pointer).
+
+`Reg` numbering must be the architecture's DWARF register number. Use
+`cfree_arch_register_*` (`include/cfree.h:196-206`) as the canonical
+mapping; the Debug module asks the `Compiler`'s arch for its kind
+once and caches the table.
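
The table above can be sketched as an exprloc byte encoder. `VarLoc` and `encode_var_loc` are hypothetical mirrors of `DebugVarLoc`. `DVL_LOCLIST` is left out because it becomes a `DW_FORM_loclistx` attribute value, not expression bytes. The op codes are the DWARF 5 values.

```c
#include <stddef.h>
#include <stdint.h>

/* Location-expression opcodes from the DWARF 5 specification. */
enum {
  DW_OP_addr = 0x03,
  DW_OP_reg0 = 0x50,
  DW_OP_regx = 0x90,
  DW_OP_fbreg = 0x91,
  DW_OP_call_frame_cfa = 0x9c,
};

/* Hypothetical mirror of DebugVarLoc, minus DVL_LOCLIST. */
typedef enum { DVL_REG, DVL_FRAME, DVL_GLOBAL } DvlKind;
typedef struct { DvlKind kind; uint32_t reg; int64_t ofs; } VarLoc;

static size_t put_uleb(uint8_t* out, size_t n, uint64_t v) {
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    out[n++] = byte | (v ? 0x80 : 0);
  } while (v);
  return n;
}

static size_t put_sleb(uint8_t* out, size_t n, int64_t v) {
  for (;;) {
    uint8_t byte = v & 0x7f;
    v >>= 7; /* arithmetic shift, as on gcc/clang */
    int done = (v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40));
    out[n++] = byte | (done ? 0 : 0x80);
    if (done) return n;
  }
}

/* Writes the expression bytes (the exprloc length prefix is the caller's job).
 * For DVL_GLOBAL, *reloc_at receives the offset of the 8-byte address slot
 * that needs an ABS64 relocation against the variable's symbol. */
static size_t encode_var_loc(const VarLoc* l, uint8_t* out, size_t* reloc_at) {
  size_t n = 0;
  switch (l->kind) {
  case DVL_REG:
    if (l->reg < 32) {
      out[n++] = (uint8_t)(DW_OP_reg0 + l->reg);
    } else {
      out[n++] = DW_OP_regx;
      n = put_uleb(out, n, l->reg);
    }
    break;
  case DVL_FRAME:
    out[n++] = DW_OP_fbreg;
    n = put_sleb(out, n, l->ofs);
    break;
  case DVL_GLOBAL:
    out[n++] = DW_OP_addr;
    *reloc_at = n;
    for (int i = 0; i < 8; i++) out[n++] = 0;
    break;
  }
  return n;
}

/* DW_AT_frame_base is the same single byte for every subprogram. */
static size_t encode_frame_base(uint8_t* out) {
  out[0] = DW_OP_call_frame_cfa;
  return 1;
}
```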
+
+### 3.7 .eh_frame (Phase 4)
+
+One CIE per CU, fixed augmentation:
+
+```
+CIE
+ version 1
+ augmentation "zR" ; FDE pointer encoding present
+ code_align 4 ; aarch64
+ data_align -8 ; aarch64
+ return_register r30 ; LR
+ augmentation_data [DW_EH_PE_pcrel|DW_EH_PE_sdata4]
+ initial instructions: DW_CFA_def_cfa r31, 0 ; sp-based, 0 offset
+```
+
+One FDE per function. Backend-emitted CFI directives (we'll need a
+`CGTarget.cfi_*` surface, or piggyback on existing prologue/epilogue
+hooks) drive `DW_CFA_advance_loc`, `DW_CFA_def_cfa_offset`,
+`DW_CFA_offset`. The aarch64 backend's prologue is small and uniform;
+the FDE bytes can be templated for `stp x29, x30, [sp, #-N]!; mov x29, sp`
+forms initially and only generalized when the prologue diversifies.
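
A templated FDE program for that prologue can be sketched as below. It assumes the CIE above: code_align 4, data_align -8, and initial CFA = sp + 0. It also assumes the frame size is a nonzero multiple of 16. `fde_prologue_cfi` is a hypothetical name. The opcode values come from the DWARF CFI specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Call-frame opcodes from the DWARF specification. */
enum {
  DW_CFA_advance_loc = 0x40,      /* delta in the low 6 bits */
  DW_CFA_offset = 0x80,           /* register in the low 6 bits */
  DW_CFA_def_cfa_register = 0x0d,
  DW_CFA_def_cfa_offset = 0x0e,
};

static size_t put_uleb(uint8_t* out, size_t n, uint64_t v) {
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    out[n++] = byte | (v ? 0x80 : 0);
  } while (v);
  return n;
}

/* FDE instructions for
 *     stp x29, x30, [sp, #-frame]!   (4 bytes)
 *     mov x29, sp                    (4 bytes)
 * Returns the number of bytes written. */
static size_t fde_prologue_cfi(uint32_t frame, uint8_t* out) {
  size_t n = 0;
  out[n++] = DW_CFA_advance_loc | 1;   /* past stp: 1 x code_align = 4 bytes */
  out[n++] = DW_CFA_def_cfa_offset;    /* CFA = sp + frame */
  n = put_uleb(out, n, frame);
  out[n++] = DW_CFA_offset | 29;       /* x29 saved at CFA - frame */
  n = put_uleb(out, n, frame / 8);     /* factored: (frame / 8) x -8 */
  out[n++] = DW_CFA_offset | 30;       /* x30 saved at CFA - frame + 8 */
  n = put_uleb(out, n, (frame - 8) / 8);
  out[n++] = DW_CFA_advance_loc | 1;   /* past mov */
  out[n++] = DW_CFA_def_cfa_register;  /* CFA = x29 + frame */
  n = put_uleb(out, n, 29);
  return n;
}
```

For `frame = 16` this yields the ten bytes `41 0e 10 9d 02 9e 01 41 0d 1d`.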
+
+---
+
+## 4. Consumer pipeline
+
+### 4.1 Open
+
+`cfree_dwarf_open` reads sections by name from the `CfreeObjBuilder`
+(`cfree_obj_builder()` from the file). Mandatory: `.debug_abbrev`,
+`.debug_info`, `.debug_line`, `.debug_str`, `.debug_str_offsets`,
+`.debug_line_str`. If any of these is absent, return NULL.
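+The DWARF reader does *not* re-decode object format. It treats the
+already-parsed `ObjBuilder` (which holds the raw section bytes via
+`obj_section_get`) as its substrate. Cross-section references resolve
+by section name + offset. In a relocatable object, fields the producer
+relocates (`DW_AT_low_pc`, `DW_LNE_set_address`) hold only zero
+placeholders in the raw bytes, so the reader must apply the relocations
+recorded against each debug section before trusting those values.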
+
+State:
+
+```
+struct CfreeDebugInfo {
+ CfreeCompiler* c;
+ const CfreeObjFile* obj; /* not owned */
+
+ /* abbrev cache: per-CU, abbrev_code → AbbrevDecl */
+ Vec<AbbrevTable> abbrevs;
+
+ /* lazy: built on first query that needs it */
+ LineTable* lines; /* (section_idx, offset) → row */
+ Vec<Subprogram> subs; /* sorted by low_pc */
+ TypeCache types; /* DIE offset → CfreeDwarfType* */
+ EhFrame* eh; /* CIE list + FDE index */
+};
+```
+
+### 4.2 Line program decoder
+
+Walks `.debug_line`, materializing the row matrix once and indexing it
+two ways:
+
+- by PC range (sorted `(low_pc, high_pc, file, line, col)` tuples) for
+ `addr_to_line`.
+- by `(file_norm, line)` → first-matching-PC for `line_to_addr`.
+
+`file_norm` is the post-path-remap absolute path. Comparison is
+byte-equal; the producer is responsible for emitting a single
+canonical form.
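
The PC index can be sketched as follows. Decoded rows become half-open `[lo, hi)` ranges: each row runs up to the next row's address, and an end-sequence row only closes the previous range. The ranges are then sorted and searched with a binary search. `Row`, `PcRange`, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical decoded state-machine row; end_seq marks DW_LNE_end_sequence. */
typedef struct { uint64_t addr; uint32_t file, line; int end_seq; } Row;

/* [lo, hi) maps to (file, line). */
typedef struct { uint64_t lo, hi; uint32_t file, line; } PcRange;

/* Row i covers [addr_i, addr_{i+1}); empty ranges are skipped.
 * Appends to out starting at index k and returns the new count. */
static size_t rows_to_ranges(const Row* rows, size_t n, PcRange* out, size_t k) {
  for (size_t i = 0; i + 1 < n; i++) {
    if (rows[i].end_seq || rows[i + 1].addr == rows[i].addr) continue;
    out[k++] = (PcRange){rows[i].addr, rows[i + 1].addr, rows[i].file, rows[i].line};
  }
  return k;
}

static int cmp_lo(const void* a, const void* b) {
  const PcRange* x = a;
  const PcRange* y = b;
  return (x->lo > y->lo) - (x->lo < y->lo);
}

/* Sequences may arrive in any order; sort once after decoding. */
static void ranges_sort(PcRange* r, size_t n) { qsort(r, n, sizeof *r, cmp_lo); }

/* Assumes sorted, non-overlapping ranges. Returns the covering range or NULL. */
static const PcRange* addr_to_line(const PcRange* r, size_t n, uint64_t pc) {
  size_t lo = 0, hi = n;
  while (lo < hi) { /* find the first range whose hi > pc */
    size_t mid = lo + (hi - lo) / 2;
    if (r[mid].hi <= pc) lo = mid + 1;
    else hi = mid;
  }
  return (lo < n && r[lo].lo <= pc) ? &r[lo] : NULL;
}
```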
+
+### 4.3 DIE walker
+
+A streaming walker over `.debug_info` keyed off the abbrev table. The
+public surface only needs:
+
+- root CU traversal,
+- `DW_TAG_subprogram` collection (for `subprogram_at` / `func_at`),
+- `DW_TAG_lexical_block` traversal (for `var_at` scope resolution),
+- `DW_TAG_variable` / `DW_TAG_formal_parameter` resolution,
+- type DIE following.
+
+We don't need a general "iterate every DIE" API outside the module.
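
A streaming walker skips every attribute it doesn't need, so its core is knowing how many bytes each form occupies. The sketch below covers the DWARF 5 forms and assumes 32-bit DWARF with 8-byte addresses. `form_size` is a hypothetical name, and `SIZE_MAX` marks forms it does not handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the byte length of the LEB128 at p; stores its unsigned value in *val. */
static size_t leb_len(const uint8_t* p, uint64_t* val) {
  uint64_t v = 0;
  unsigned shift = 0;
  size_t i = 0;
  uint8_t b;
  do {
    b = p[i++];
    if (shift < 64) v |= (uint64_t)(b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  if (val) *val = v;
  return i;
}

/* Bytes occupied by an attribute value of `form` starting at p. */
static size_t form_size(uint16_t form, const uint8_t* p) {
  uint64_t len;
  size_t n;
  switch (form) {
  case 0x19: case 0x21:                        /* flag_present, implicit_const */
    return 0;
  case 0x0b: case 0x0c: case 0x11: case 0x25: case 0x29:
    return 1;                                  /* data1 flag ref1 strx1 addrx1 */
  case 0x05: case 0x12: case 0x26: case 0x2a:
    return 2;                                  /* data2 ref2 strx2 addrx2 */
  case 0x27: case 0x2b:
    return 3;                                  /* strx3 addrx3 */
  case 0x06: case 0x0e: case 0x10: case 0x13:
  case 0x17: case 0x1f: case 0x28: case 0x2c:
    return 4; /* data4 strp ref_addr ref4 sec_offset line_strp strx4 addrx4 */
  case 0x01: case 0x07: case 0x14: case 0x20:
    return 8;                                  /* addr data8 ref8 ref_sig8 */
  case 0x1e:
    return 16;                                 /* data16 */
  case 0x0d: case 0x0f: case 0x15: case 0x1a:
  case 0x1b: case 0x22: case 0x23:
    return leb_len(p, NULL); /* sdata udata ref_udata strx addrx loclistx rnglistx */
  case 0x09: case 0x18:                        /* block, exprloc: ULEB length + bytes */
    n = leb_len(p, &len);
    return n + (size_t)len;
  case 0x08:                                   /* string: NUL-terminated */
    return strlen((const char*)p) + 1;
  default:
    return SIZE_MAX;
  }
}
```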
+
+### 4.4 Loc-expr evaluator
+
+A small DWARF stack machine, supporting just the ops the producer
+emits in §3.6: `DW_OP_reg0..31`, `DW_OP_regx`, `DW_OP_fbreg`,
+`DW_OP_addr`, `DW_OP_call_frame_cfa`, plus the arithmetic ops needed
+for any future composite locations (`DW_OP_plus_uconst`,
+`DW_OP_breg*`, `DW_OP_consts`, `DW_OP_and`, `DW_OP_shr`).
+
+Composite locations (`DW_OP_piece`) are not in scope until opt
+generates them.
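
The evaluator can be sketched as a tiny stack machine over the ops listed above. It assumes well-formed input and `DW_AT_frame_base = DW_OP_call_frame_cfa`, so `DW_OP_fbreg` is taken relative to the CFA. `FrameCtx` is a hypothetical stand-in for what `CfreeUnwindFrame` supplies, and `eval_loc` is a hypothetical name. The op codes are the DWARF 5 values.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Where a single-location expression says the value lives. */
typedef struct {
  bool in_reg;   /* true: value is in register `reg` */
  uint32_t reg;
  uint64_t addr; /* valid when !in_reg: value is in memory at `addr` */
} LocResult;

/* Hypothetical caller-supplied frame context. */
typedef struct { uint64_t cfa; uint64_t regs[32]; } FrameCtx;

static size_t rd_leb(const uint8_t* p, uint64_t* v, bool is_signed) {
  uint64_t r = 0;
  unsigned shift = 0;
  size_t i = 0;
  uint8_t b;
  do {
    b = p[i++];
    if (shift < 64) r |= (uint64_t)(b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  if (is_signed && shift < 64 && (b & 0x40)) r |= ~(uint64_t)0 << shift; /* sign-extend */
  *v = r;
  return i;
}

static bool push(uint64_t* st, size_t* sp, uint64_t v) {
  if (*sp == 16) return false;
  st[(*sp)++] = v;
  return true;
}

/* Returns false on an unsupported op or stack misuse. */
static bool eval_loc(const uint8_t* e, size_t n, const FrameCtx* f, LocResult* out) {
  uint64_t st[16], u;
  size_t sp = 0, i = 0;
  while (i < n) {
    uint8_t op = e[i++];
    if (op >= 0x50 && op <= 0x6f) {            /* DW_OP_reg0..31 */
      *out = (LocResult){true, op - 0x50u, 0};
      return i == n;
    }
    if (op >= 0x70 && op <= 0x8f) {            /* DW_OP_breg0..31 */
      i += rd_leb(e + i, &u, true);
      if (!push(st, &sp, f->regs[op - 0x70] + u)) return false;
      continue;
    }
    switch (op) {
    case 0x90:                                 /* DW_OP_regx */
      i += rd_leb(e + i, &u, false);
      *out = (LocResult){true, (uint32_t)u, 0};
      return i == n;
    case 0x91:                                 /* DW_OP_fbreg: frame base = CFA */
      i += rd_leb(e + i, &u, true);
      if (!push(st, &sp, f->cfa + u)) return false;
      break;
    case 0x9c:                                 /* DW_OP_call_frame_cfa */
      if (!push(st, &sp, f->cfa)) return false;
      break;
    case 0x03:                                 /* DW_OP_addr: 8 bytes, little-endian */
      u = 0;
      for (int k = 0; k < 8; k++) u |= (uint64_t)e[i + k] << (8 * k);
      i += 8;
      if (!push(st, &sp, u)) return false;
      break;
    case 0x11:                                 /* DW_OP_consts */
      i += rd_leb(e + i, &u, true);
      if (!push(st, &sp, u)) return false;
      break;
    case 0x23:                                 /* DW_OP_plus_uconst */
      i += rd_leb(e + i, &u, false);
      if (sp == 0) return false;
      st[sp - 1] += u;
      break;
    default:
      return false;
    }
  }
  if (sp == 0) return false;
  *out = (LocResult){false, 0, st[sp - 1]};     /* memory location: top of stack */
  return true;
}
```

Signed operands are added as two's-complement `uint64_t`, so a negative `fbreg` offset wraps correctly.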
+
+### 4.5 CFI machine (Phase 4)
+
+`cfree_dwarf_unwind_step` scans `.eh_frame` (CIE and FDE records in
+sequence, each FDE pointing back at its CIE) for the FDE whose
+`(initial_location, address_range)` covers `frame->pc`. It then runs
+the CIE's initial instructions followed by the FDE program up to
+`frame->pc`. State is the standard CFI table; output mutates
+`frame->pc`, `frame->cfa`, and callee-saved register slots. Returns 0
+on a step, 1 at stack bottom (no caller information, return address
+register is `0`), and a value other than 0 or 1 on decode error.
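
The CFI machine and one unwind step can be sketched for the subset of ops the templated aarch64 prologue uses. `CfiRow`, `StackMem`, `cfi_run`, and `unwind_step` are hypothetical. A flat stack snapshot stands in for reads through a JIT session. The rules applied come from the DWARF CFI specification: offsets are factored by data_align, the caller's sp is the CFA, and the return address comes from x30.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One CFI table row, restricted to DWARF regs 0..31 (x0..x30, sp). */
typedef struct {
  uint32_t cfa_reg;
  int64_t cfa_ofs;
  bool saved[32];        /* register r is saved at CFA + saved_ofs[r] */
  int64_t saved_ofs[32];
} CfiRow;

static size_t rd_uleb(const uint8_t* p, uint64_t* v) {
  uint64_t r = 0;
  unsigned shift = 0;
  size_t i = 0;
  uint8_t b;
  do {
    b = p[i++];
    if (shift < 64) r |= (uint64_t)(b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  *v = r;
  return i;
}

/* Runs CFI ops from `loc` until the location passes target_pc. Call it for
 * the CIE's initial instructions, then for the FDE program, on one row. */
static bool cfi_run(const uint8_t* p, size_t n, uint64_t loc, uint64_t target_pc,
                    unsigned code_align, int data_align, CfiRow* row) {
  size_t i = 0;
  uint64_t a, b;
  while (i < n) {
    uint8_t op = p[i++];
    if ((op & 0xc0) == 0x40) {                 /* DW_CFA_advance_loc */
      loc += (uint64_t)(op & 0x3f) * code_align;
      if (loc > target_pc) return true;        /* later rows don't cover target_pc */
      continue;
    }
    if ((op & 0xc0) == 0x80) {                 /* DW_CFA_offset */
      unsigned r = op & 0x3f;
      i += rd_uleb(p + i, &a);
      if (r >= 32) return false;
      row->saved[r] = true;
      row->saved_ofs[r] = (int64_t)a * data_align;
      continue;
    }
    switch (op) {
    case 0x00: break;                          /* DW_CFA_nop */
    case 0x0c:                                 /* DW_CFA_def_cfa */
      i += rd_uleb(p + i, &a);
      i += rd_uleb(p + i, &b);
      row->cfa_reg = (uint32_t)a;
      row->cfa_ofs = (int64_t)b;
      break;
    case 0x0d: i += rd_uleb(p + i, &a); row->cfa_reg = (uint32_t)a; break; /* def_cfa_register */
    case 0x0e: i += rd_uleb(p + i, &a); row->cfa_ofs = (int64_t)a; break;  /* def_cfa_offset */
    default: return false;                     /* not handled by this sketch */
    }
  }
  return true;
}

/* Stack snapshot: words[k] holds the 8 bytes at base + 8*k. */
typedef struct { uint64_t base; const uint64_t* words; size_t n; } StackMem;

static bool mem_rd(const StackMem* m, uint64_t addr, uint64_t* out) {
  if (addr < m->base || (addr - m->base) % 8 || (addr - m->base) / 8 >= m->n) return false;
  *out = m->words[(addr - m->base) / 8];
  return true;
}

/* Computes the CFA, reloads saved registers, and returns through x30. */
static bool unwind_step(const CfiRow* row, uint64_t regs[32], uint64_t* pc,
                        const StackMem* m) {
  if (row->cfa_reg >= 32) return false;
  uint64_t cfa = regs[row->cfa_reg] + (uint64_t)row->cfa_ofs;
  for (int r = 0; r < 31; r++)
    if (row->saved[r] && !mem_rd(m, cfa + (uint64_t)row->saved_ofs[r], &regs[r]))
      return false;
  *pc = regs[30];
  regs[31] = cfa; /* the caller's sp is the CFA */
  return true;
}
```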
+
+---
+
+## 5. Test plan
+
+### 5.1 Group P / W path
+
+The CORPUS sketch from `test/cg/CORPUS.md` is the producer→consumer
+end-to-end test. Each case registers directives that
+`cg_check_dwarf` runs through the public consumer API. Failure of any
+directive fails the W run for that case.
+
+Existing today:
+- `p01_line_one_inst` — `line p01.c 10` + `subprogram test_main`.
+
+To register as Phase 1+2 land:
+- `p02_line_monotone` — three lines, three directives.
+- `p03_line_repeat` — same line on two PCs, one directive (the round
+ trip is enough).
+- `p05_func_pc_range` — `subprogram` directive carries
+ no inclusive bounds today; add a `pc_range FILE LINE LOW HIGH`
+ directive once `subprogram_at` returns ranges that we can predict.
+- `p07_local_loc` — needs a new directive `var PC NAME EXPECT_KIND
+ EXPECT_VALUE` that drives `cfree_dwarf_var_at` + `loc_read`.
+
+The directive grammar in `cg_check_dwarf.c:169-205` is intentionally
+small. Extend it as cases require — each new directive is one switch
+arm calling one consumer entry.
+
+### 5.2 Self-roundtrip unit
+
+A small unit test under `test/debug/` that:
+
+1. spins up an in-memory `ObjBuilder`,
+2. drives `Debug` directly (no CG),
+3. calls `debug_emit`,
+4. opens the result with `cfree_dwarf_open`,
+5. asserts every input row, type, and subprogram round-trips.
+
+This is what catches abbrev/encoding bugs that the W path would
+attribute to "the backend emitted nothing for set_loc".
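
The same round-trip discipline applies at the smallest scale, to the LEB128 form encoders planned for `debug_form.c`. In the sketch below, `uleb_put`, `sleb_put`, and their decoders are hypothetical stand-ins. The known-answer vectors (624485 → `e5 8e 26`, -123456 → `c0 bb 78`) are the standard LEB128 examples.

```c
#include <stddef.h>
#include <stdint.h>

static size_t uleb_put(uint8_t* out, uint64_t v) {
  size_t n = 0;
  do {
    uint8_t byte = v & 0x7f;
    v >>= 7;
    out[n++] = byte | (v ? 0x80 : 0);
  } while (v);
  return n;
}

static size_t sleb_put(uint8_t* out, int64_t v) {
  size_t n = 0;
  for (;;) {
    uint8_t byte = v & 0x7f;
    v >>= 7; /* arithmetic shift, as on gcc/clang */
    int done = (v == 0 && !(byte & 0x40)) || (v == -1 && (byte & 0x40));
    out[n++] = byte | (done ? 0 : 0x80);
    if (done) return n;
  }
}

/* Decodes a LEB128; sign-extends when is_signed. Returns bytes consumed. */
static size_t leb_get(const uint8_t* p, uint64_t* v, int is_signed) {
  uint64_t r = 0;
  unsigned shift = 0;
  size_t i = 0;
  uint8_t b;
  do {
    b = p[i++];
    if (shift < 64) r |= (uint64_t)(b & 0x7f) << shift;
    shift += 7;
  } while (b & 0x80);
  if (is_signed && shift < 64 && (b & 0x40)) r |= ~(uint64_t)0 << shift;
  *v = r;
  return i;
}

/* Encode, decode, and compare both the value and the byte count. */
static int roundtrip_u(uint64_t v) {
  uint8_t buf[10];
  uint64_t back;
  size_t w = uleb_put(buf, v);
  return leb_get(buf, &back, 0) == w && back == v;
}

static int roundtrip_s(int64_t v) {
  uint8_t buf[10];
  uint64_t back;
  size_t w = sleb_put(buf, v);
  return leb_get(buf, &back, 1) == w && (int64_t)back == v;
}
```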
+
+### 5.3 External validators
+
+Run two third-party DWARF readers against the same Phase-1 obj as
+sanity for the wire format itself:
+
+- `llvm-dwarfdump --verify` — fails on malformed sections, ambiguous
+ abbrevs, dangling DIE refs.
+- `readelf --debug-dump=info,line,abbrev,aranges` — reference
+ rendering; hand-diff once per phase.
+
+Both follow the `test/musl/` convention: optional, gated by tool
+availability (`command -v llvm-dwarfdump`), skipped otherwise. They
+are **not** the oracle for any case; the W path is. They exist to
+catch wire-format errors that our own consumer would also miss.
+
+---
+
+## 6. Phasing
+
+Order is chosen so each phase produces a green test the next can rely
+on. From Phase 2 on, each phase ends with a runnable W-path case
+green; Phases 0 and 1 end on the checks listed in their end states.
+
+### Phase 0 — wiring (≈300 LOC, no DWARF bytes)
+
+- Remove `unimplemented` from `debug_new`, return a real `Debug`.
+- `cg_set_loc` fanout to Debug (`debug_set_pending_loc`).
+- Backend op-end fanout: when `cg->debug != NULL`, after each emitted
+ instruction, call `debug_emit_row(debug, text_section, offset,
+ pending_loc)`.
+- `debug_emit` writes nothing; it just frees state.
+
+End state: `-g` builds run without panicking; no `.debug_*` sections
+yet; W-path stays red.
+
+### Phase 1 — minimal producer
+
+- `.debug_abbrev`, `.debug_info` with one `DW_TAG_compile_unit` and
+ one `DW_TAG_subprogram` per `debug_func_begin`,
+- `.debug_line` with the line program assembled in §3.4,
+- `.debug_str`, `.debug_line_str`, `.debug_str_offsets`,
+- `.debug_aranges` (one entry per subprogram),
+- relocations against function symbols for low_pc.
+
+End state: `readelf --debug-dump=line` and `--debug-dump=info` show
+sane output; `llvm-dwarfdump --verify` clean.
+
+### Phase 2 — minimal consumer
+
+- `cfree_dwarf_open` (real),
+- `cfree_dwarf_addr_to_line`, `cfree_dwarf_line_to_addr`,
+- `cfree_dwarf_subprogram_at`, `cfree_dwarf_func_at`.
+
+End state: `p01_line_one_inst/W` green. Add `p02`, `p03`, `p05`.
+
+### Phase 3 — types, locals, params
+
+Producer:
+- `c_debug_type` adapter (full Type chain → DIE tree),
+- `debug_param`, `debug_local`, `debug_scope_begin/end` write
+ `DW_TAG_formal_parameter`, `DW_TAG_variable`, `DW_TAG_lexical_block`.
+
+Consumer:
+- `cfree_dwarf_var_at`, `cfree_dwarf_vars_at_*`,
+ `cfree_dwarf_param_iter_*`,
+- `cfree_dwarf_type_info`, field/enum iters,
+- `cfree_dwarf_loc_read` against a `CfreeJitSession` (regs from
+ `CfreeUnwindFrame`, frame memory through the JIT session's read).
+
+End state: `p06`/`p07` directives extended; opt-off dbg can render
+`info locals` for an aarch64 binary.
+
+### Phase 4 — CFI / unwind
+
+- `.eh_frame` producer (templated FDEs from the aarch64 prologue).
+- Consumer CFI machine + `cfree_dwarf_unwind_step`.
+- `dbg` backtrace works on a self-built obj.
+
+### Phase 5 — opt path (loclists)
+
+- Producer `debug_loclist_new/add` realize as `.debug_loclists`.
+- `DW_FORM_loclistx` references on `DW_AT_location`, with
+  `DW_AT_loclists_base` on the CU.
+- Consumer loc-expr evaluator already handles single-location
+ exprlocs; loclists are an outer wrapper.
+
+End state: `-O2` builds keep variables debuggable.
+
+### Deferred (no phase)
+
+- `.debug_macro` (preprocessor macros). Cheap to add once pp records
+ edges; nothing depends on it.
+- Inlined subprograms. Wait until opt synthesizes inlines.
+- Split DWARF, `.debug_pubnames`, GNU index. No client.
+- `LSDA` / `.gcc_except_table`. C has no exceptions.
+
+---
+
+## 7. Risks and decisions
+
+### DWARF 4 vs 5
+
+Pick **5**. gdb ≥ 10 and lldb ≥ 9 read it; gcc ≥ 11 and clang ≥ 14
+emit it by default on Linux; the format is cleaner (`.debug_line_str`, `loclistx`,
+`rnglistx`). The only consumer-side cost is implementing the indirect
+form encodings, which we'd want anyway for any non-trivial CU. Don't
+support 4; if we ever need to, it's an emit option, not a different
+codepath in the consumer.
+
+### Path remap
+
+The producer must apply `CfreeCompileOptions.path_map`
+(`include/cfree.h:589-592, 604-605`) before any path enters
+`.debug_line_str` / `.debug_str`. The remap is "first match wins". The
+consumer does *not* apply remaps — paths come back exactly as the
+producer wrote them. The W path checks paths byte-equal, so test
+cases must register synthetic file ids whose names are stable across
+runs (already the case with `source_add_memory`).
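
"First match wins" can be sketched as below. `PathMapEntry` and `path_remap` are hypothetical; the real `CfreeCompileOptions.path_map` shape may differ. This sketch uses plain byte-prefix matching. Whether cfree also requires the match to end on a path-component boundary is not specified here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of one path_map entry. */
typedef struct { const char* from; const char* to; } PathMapEntry;

/* Rewrites path with the first entry whose `from` is a byte prefix of it;
 * unmatched paths are copied through. Returns false if out is too small. */
static bool path_remap(const PathMapEntry* map, size_t n, const char* path,
                       char* out, size_t cap) {
  for (size_t i = 0; i < n; i++) {
    size_t k = strlen(map[i].from);
    if (strncmp(path, map[i].from, k) == 0) {
      int w = snprintf(out, cap, "%s%s", map[i].to, path + k);
      return w >= 0 && (size_t)w < cap;
    }
  }
  int w = snprintf(out, cap, "%s", path);
  return w >= 0 && (size_t)w < cap;
}
```

Because the first match wins, entry order matters. A broad prefix listed before a narrower one shadows it.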
+
+### Reproducibility
+
+`CfreeCompileOptions.epoch` already gates timestamps elsewhere
+(`include/cfree.h:599-603`). DWARF has no required timestamp; the
+producer's `DW_AT_producer` should not embed a build time. With
+`epoch == 0` (the default) we additionally avoid file-mtime metadata
+in any future `.debug_macro` emission.
+
+### File 0 vs file 1
+
+DW5 makes file 0 valid (the CU's primary file). DW4 reserved 0. We
+emit 5, so we use file 0 for the CU primary and start
+`debug_file`-allocated indices at 1. The line program header's
+file/dir entry counts include file 0.
+
+### .debug_aranges duplication
+
+Rangelists in `.debug_info` (`DW_AT_ranges`) supersede aranges, but
+gdb's fast attach path still uses `.debug_aranges`. Cost is one
+section with `(low_pc, length)` per subprogram (Phase 1). Keep it.
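
The section's layout for one CU can be sketched in 32-bit DWARF with 8-byte addresses. `aranges_build` is hypothetical. The rest comes from the DWARF 5 spec: the header fields, version 2 (unchanged in DWARF 5), the rule that tuples start at a multiple of the tuple size (so 12 header bytes pad to 16), and the `(0, 0)` terminator. Address slots are zero placeholders for the per-function relocations.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put_le(uint8_t* p, uint64_t v, int n) {
  for (int i = 0; i < n; i++) p[i] = (uint8_t)(v >> (8 * i));
}

/* One aranges set: header, one (address, length) tuple per function, and a
 * (0, 0) terminator. Returns total bytes, or 0 if cap is too small. */
static size_t aranges_build(const uint64_t* lengths, size_t n_funcs,
                            uint32_t info_ofs, uint8_t* out, size_t cap) {
  size_t total = 16 + 16 * (n_funcs + 1);
  if (cap < total) return 0;
  memset(out, 0, total);        /* padding, address slots, and terminator stay 0 */
  put_le(out, total - 4, 4);    /* unit_length excludes itself */
  put_le(out + 4, 2, 2);        /* version */
  put_le(out + 6, info_ofs, 4); /* debug_info_offset of the CU */
  out[10] = 8;                  /* address_size */
  out[11] = 0;                  /* segment_selector_size */
  for (size_t i = 0; i < n_funcs; i++)
    put_le(out + 16 + 16 * i + 8, lengths[i], 8); /* length; address is relocated */
  return total;
}
```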
+
+### Backend coupling
+
+The Debug module must not include any backend headers
+(`src/cg/cg.h`, `src/arch/*.h`). It depends on `core` + `obj` + the
+arch register-name mapping from `include/cfree.h`. The reverse
+direction — CG using Debug — is fine and already in `cg.h`.
+
+### Consumer / producer separation
+
+`src/debug/` and the consumer must not share state types behind the
+public API. The consumer reads bytes; the producer writes bytes; the
+public DWARF wire format is the only contract between them. Concretely:
+no `#include "debug/debug.h"` from the consumer module, and no
+`#include "debug/consumer.h"` from `debug.c`. This is what lets
+`test/debug/` self-roundtrip catch encoder bugs.
+
+---
+
+## 8. Pointers
+
+- W path runner: `test/cg/run.sh:336-359`
+- Consumer probe: `test/cg/harness/cg_check_dwarf.c`
+- Directive registry: `test/cg/harness/cases.c:504-510`
+- First case body: `test/cg/harness/cases_p.c`
+- Producer header: `src/debug/debug.h`
+- C-type adapter header: `src/debug/c_debug.h`
+- Consumer header: `include/cfree.h:1224-1450`
+- Stubs to delete: `src/api/stubs.c:93-98` (producer),
+ `src/api/stubs.c:319-440` (consumer)
+- Pipeline integration: `src/api/pipeline.c:230-238`
+- Group P CORPUS entries: `test/cg/CORPUS.md` Group P table
+- Emu-side use: `doc/EMU.md` §8 (guest-DWARF reader extends `src/debug/`)