PE/COFF Test Corpus — Target Coverage

What the test/coff/ corpus should cover for full PE/COFF object-file support, independent of kit's current implementation state. Mirrors the section layout of test/elf/CORPUS.md.

Conventions:

The driver lives in kit-roundtrip-coff.c. Each U case is a self-contained static void test_*(void) that builds an ObjBuilder, emits to a kit_writer_mem, reads back via read_coff, asserts shape equivalence, then re-emits and asserts byte equality between the two emits.
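The emit, read back, re-emit pattern described above can be sketched with a toy format. Everything in this block (`toy_obj`, `toy_emit`, `toy_read`, `toy_roundtrip_stable`) is hypothetical and stands in for kit's ObjBuilder, kit_writer_mem, and read_coff, whose signatures are not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an object model: a short byte payload (not kit's ObjBuilder). */
typedef struct {
    uint8_t bytes[16];
    size_t len;
} toy_obj;

/* Toy wire format: one length byte followed by the payload. Returns bytes written. */
size_t toy_emit(const toy_obj *o, uint8_t *out) {
    out[0] = (uint8_t)o->len;
    memcpy(out + 1, o->bytes, o->len);
    return o->len + 1;
}

/* Parse the toy wire format back into a toy_obj. Returns 0 on success, -1 on bad input. */
int toy_read(const uint8_t *in, size_t n, toy_obj *o) {
    if (n < 1 || in[0] > 16 || (size_t)in[0] + 1 != n) return -1;
    o->len = in[0];
    memcpy(o->bytes, in + 1, o->len);
    return 0;
}

/* Emit, read back, check shape, re-emit; returns 1 when both emits are byte-identical. */
int toy_roundtrip_stable(const toy_obj *o) {
    uint8_t a[32], b[32];
    toy_obj back;
    size_t na = toy_emit(o, a);
    if (toy_read(a, na, &back) != 0) return 0;
    if (back.len != o->len) return 0; /* shape equivalence (toy version) */
    size_t nb = toy_emit(&back, b);
    return na == nb && memcmp(a, b, na) == 0;
}
```

The point of comparing the two emits, rather than the input model against the readback, is that the check stays valid even when the emitter adds synthesized records on the first pass.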


1. File header / target identification

| Case | Layer | Shape |
|---|---|---|
| IMAGE_FILE_MACHINE_AMD64 round-trip | U | minimal .text with two bytes, no symbols (test_header_minimal_x64) |
| IMAGE_FILE_MACHINE_ARM64 round-trip | U | minimal .text with ret, no symbols (test_header_minimal_aa64) |
| Reproducible TimeDateStamp == 0 | U | implicitly checked by byte-stable round-trip |
| Per-arch reloc machine dispatch | U | covered by reloc tests below |
| SizeOfOptionalHeader == 0 for .obj | U | implicitly: every U case is a .obj, not a PE image |
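As an illustration of the header fields listed above, here is a minimal sketch that serializes and parses the 20-byte COFF file header in little-endian order, forcing TimeDateStamp and SizeOfOptionalHeader to zero. The field layout and machine constants follow the PE/COFF specification; the names `coff_hdr`, `coff_hdr_write`, and `coff_hdr_read` are hypothetical and are not kit's API.

```c
#include <stdint.h>

#define COFF_MACHINE_AMD64 0x8664u /* IMAGE_FILE_MACHINE_AMD64 */
#define COFF_MACHINE_ARM64 0xAA64u /* IMAGE_FILE_MACHINE_ARM64 */
#define COFF_HDR_SIZE 20u

/* Hypothetical in-memory view of the COFF file header (not kit's type). */
typedef struct {
    uint16_t machine;
    uint16_t nsections;
    uint32_t timestamp;       /* TimeDateStamp */
    uint32_t symtab_off;      /* PointerToSymbolTable */
    uint32_t nsyms;           /* NumberOfSymbols */
    uint16_t opt_hdr_size;    /* SizeOfOptionalHeader */
    uint16_t characteristics;
} coff_hdr;

static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)v; p[1] = (uint8_t)(v >> 8); }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)v); put16(p + 2, (uint16_t)(v >> 16)); }
static uint16_t get16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get32(const uint8_t *p) { return (uint32_t)get16(p) | ((uint32_t)get16(p + 2) << 16); }

/* Serialize an object-file header; timestamp, optional-header size, and
   characteristics are written as zero in this sketch. */
void coff_hdr_write(uint8_t out[COFF_HDR_SIZE], uint16_t machine,
                    uint16_t nsections, uint32_t symtab_off, uint32_t nsyms) {
    put16(out + 0, machine);
    put16(out + 2, nsections);
    put32(out + 4, 0);           /* TimeDateStamp == 0 for reproducibility */
    put32(out + 8, symtab_off);
    put32(out + 12, nsyms);
    put16(out + 16, 0);          /* SizeOfOptionalHeader == 0 for a .obj */
    put16(out + 18, 0);
}

/* Parse the 20-byte header back into the struct. */
coff_hdr coff_hdr_read(const uint8_t in[COFF_HDR_SIZE]) {
    coff_hdr h;
    h.machine = get16(in + 0);
    h.nsections = get16(in + 2);
    h.timestamp = get32(in + 4);
    h.symtab_off = get32(in + 8);
    h.nsyms = get32(in + 12);
    h.opt_hdr_size = get16(in + 16);
    h.characteristics = get16(in + 18);
    return h;
}
```

Writing a header, reading it, and writing it again from the parsed fields should give identical bytes, which is the property the "implicitly checked" rows rely on.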

2. Section types

| Case | Layer | Shape |
|---|---|---|
| .text (IMAGE_SCN_CNT_CODE \| MEM_EXECUTE \| MEM_READ) | U | test_text_only_x64 / test_text_only_aa64 |
| .rdata (CNT_INITIALIZED_DATA \| MEM_READ) | U | test_rodata |
| .data (CNT_INITIALIZED_DATA \| MEM_READ \| MEM_WRITE) | U | test_data_with_reloc_abs64_x64 |
| .bss (CNT_UNINITIALIZED_DATA) | U | test_bss |
| .tls$ (TLS template section, name-detected) | U | test_tls_section |
| .debug_* (DWARF passthrough) | C | deferred |
| .CRT$X[CIP]* (init/fini) | C | deferred |
| .xdata / .pdata (SEH unwind) | C | deferred; see doc/WINDOWS.md §3.5 |
| Multiple text sections (.text$mn, etc.) | U | covered via test_comdat_group |

3. Section characteristics flags

| Flag | Coverage |
|---|---|
| CNT_CODE / INITIALIZED_DATA / UNINITIALIZED_DATA | U: kind matrix above |
| MEM_EXECUTE / MEM_READ / MEM_WRITE | U: kind matrix above |
| IMAGE_SCN_LNK_COMDAT | U: test_comdat_group |
| IMAGE_SCN_ALIGN_* nibble (1, 4, 8, 16, 4096) | U: test_align_nibble |
| LNK_INFO / LNK_REMOVE / MEM_DISCARDABLE | C: preserved via OBJ_EXT_COFF, not yet exercised by a U case |
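The alignment nibble occupies bits 20 to 23 of the section Characteristics field, where a stored value n in 1..14 means an alignment of 2^(n-1) bytes. A minimal sketch of the conversion in both directions follows; the function and macro names are hypothetical, while the bit positions come from the PE/COFF specification.

```c
#include <stdint.h>

#define SCN_ALIGN_MASK  0x00F00000u /* bits 20..23 of Characteristics */
#define SCN_ALIGN_SHIFT 20

/* Decode the alignment nibble. Returns the alignment in bytes, or 0 when the
   nibble is unset (0) or holds the reserved value 15. */
uint32_t scn_align_bytes(uint32_t characteristics) {
    uint32_t n = (characteristics & SCN_ALIGN_MASK) >> SCN_ALIGN_SHIFT;
    if (n == 0 || n > 14) return 0;
    return 1u << (n - 1);
}

/* Encode a power-of-two alignment in 1..8192 as Characteristics bits.
   Returns 0 for values the nibble cannot represent. */
uint32_t scn_align_bits(uint32_t align) {
    uint32_t n = 1;
    if (align == 0 || (align & (align - 1)) != 0 || align > 8192) return 0;
    while ((1u << (n - 1)) != align) n++;
    return n << SCN_ALIGN_SHIFT;
}
```

For the values in the table, 1, 4, 8, 16, and 4096 bytes encode as nibbles 1, 3, 4, 5, and 13 respectively.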

4. Symbol coverage

Storage classes: EXTERNAL, STATIC, WEAK_EXTERNAL, FILE, SECTION (synthesized).

Section number specials: ordinary 1-based index, UNDEFINED (0), ABSOLUTE (-1), DEBUG (-2).

| Case | Layer | Shape |
|---|---|---|
| Plain global function (EXTERNAL, SK_FUNC) | U | test_text_only_x64 |
| Static (file-local, STATIC, SB_LOCAL) | U | test_static_local_symbol |
| Common (UNDEFINED + Value>0) | U | test_common_symbol |
| Weak external (WEAK_EXTERNAL + aux) | U | test_weak_global |
| Section symbol synthesis (SK_SECTION round-trip) | U | test_section_symbol_synthesis |
| Long symbol name (>8 chars; strtab spillover) | U | test_long_symbol_name |
| Long section name (/N form) | U | test_long_section_name |
| File symbol (.file + aux records) | C | deferred (kit's emit_coff handles it; no U case yet) |
| Hidden / protected visibility | n/a | COFF has no visibility model |
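The two long-name rows rest on different encodings of the same 8-byte name field. A symbol name longer than 8 bytes is stored as four zero bytes followed by a little-endian string-table offset, while a long section name is stored as "/" followed by the offset in ASCII decimal. The sketch below shows both; the function names are hypothetical, and the caller is assumed to have already placed the name in the string table at `strtab_off`.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Encode a symbol name into the 8-byte Name field.
   Returns 0 when stored inline, 1 when the string-table form is used. */
int coff_sym_name_encode(uint8_t field[8], const char *name, uint32_t strtab_off) {
    size_t len = strlen(name);
    memset(field, 0, 8);
    if (len <= 8) {
        memcpy(field, name, len); /* NUL-padded; no terminator when len == 8 */
        return 0;
    }
    /* First four bytes stay zero; next four hold the offset, little-endian. */
    field[4] = (uint8_t)strtab_off;
    field[5] = (uint8_t)(strtab_off >> 8);
    field[6] = (uint8_t)(strtab_off >> 16);
    field[7] = (uint8_t)(strtab_off >> 24);
    return 1;
}

/* Encode a section name into the 8-byte Name field.
   Returns 0 when inline, 1 for the "/<decimal>" form, -1 when the offset
   needs more than 7 digits (not handled in this sketch). */
int coff_sec_name_encode(uint8_t field[8], const char *name, uint32_t strtab_off) {
    char tmp[16];
    size_t len = strlen(name);
    int n;
    memset(field, 0, 8);
    if (len <= 8) {
        memcpy(field, name, len);
        return 0;
    }
    if (strtab_off > 9999999u) return -1;
    n = snprintf(tmp, sizeof tmp, "/%u", (unsigned)strtab_off);
    memcpy(field, tmp, (size_t)n); /* at most 8 bytes: '/' plus 7 digits */
    return 1;
}
```

String-table offsets count from the start of the table, whose first four bytes hold its total size, so the first name lands at offset 4.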

5. Relocation coverage

x86_64 (IMAGE_REL_AMD64_*)

| Wire kind | kit RelocKind | Layer | Shape |
|---|---|---|---|
| ABSOLUTE (0) | R_NONE | | implicit |
| ADDR64 (1) | R_ABS64 | U | test_data_with_reloc_abs64_x64 |
| ADDR32 (2) | R_ABS32 | U | covered alongside REL32 (same harness) |
| ADDR32NB (3) | R_X64_32S | C | not yet exercised |
| REL32 (4) | R_PC32 / R_REL32 / R_PLT32 / R_X64_GOTPCREL* | U | test_data_with_reloc_rel32_x64 |
| REL32_1..5 (5..9) | R_PC32 + explicit addend on read | C | reader-only path; no U yet |
| SECREL / SECTION | (not modeled in v1) | | deferred; see doc/WINDOWS.md §3.1 |
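To make the REL32_1..5 row concrete: in the PE/COFF specification, REL32 is relative to the byte just past the 4-byte field, and REL32_N moves that reference point N bytes further. The sketch below computes that distance and the value a linker would store. How kit folds the distance into its own addend is not shown in this document, so the sign convention is left to the caller; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    REL_AMD64_REL32   = 4, /* IMAGE_REL_AMD64_REL32 */
    REL_AMD64_REL32_5 = 9  /* IMAGE_REL_AMD64_REL32_5 */
};

/* Extra bytes between the end of the 4-byte field and the reference point:
   0 for REL32, N for REL32_N. Returns -1 for any other wire kind. */
int amd64_rel32_distance(uint16_t type) {
    if (type < REL_AMD64_REL32 || type > REL_AMD64_REL32_5) return -1;
    return (int)type - REL_AMD64_REL32;
}

/* Value stored in the field for symbol address S, field address P, and an
   addend already present in the field: S + A - (P + 4 + N). */
int32_t amd64_rel32_value(uint64_t sym_addr, uint64_t field_addr,
                          uint16_t type, int32_t inline_addend) {
    int d = amd64_rel32_distance(type);
    assert(d >= 0); /* caller must pass a REL32-family kind */
    uint64_t ref = field_addr + 4u + (uint64_t)d;
    return (int32_t)(uint32_t)(sym_addr + (uint64_t)(int64_t)inline_addend - ref);
}
```

A U case for this row would need an instruction with trailing immediate bytes after the displacement, which is the situation REL32_N exists for.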

aarch64 (IMAGE_REL_ARM64_*)

| Wire kind | kit RelocKind | Layer | Shape |
|---|---|---|---|
| ABSOLUTE (0) | R_NONE | | implicit |
| ADDR32 (1) | R_ABS32 | C | not yet exercised |
| ADDR32NB (2) | R_ABS32 | C | not yet exercised |
| BRANCH26 (3) | R_AARCH64_CALL26 / R_AARCH64_JUMP26 | U | test_aa64_branch26 |
| PAGEBASE_REL21 (4) | R_AARCH64_ADR_PREL_PG_HI21 | U | test_aa64_pagebase_pageoffset |
| REL21 (5) | R_AARCH64_ADR_PREL_LO21 | C | not yet exercised |
| PAGEOFFSET_12A (6) | R_AARCH64_ADD_ABS_LO12_NC | U | test_aa64_pagebase_pageoffset |
| PAGEOFFSET_12L (7) | R_AARCH64_LDST64_ABS_LO12_NC | C | not yet exercised |
| BRANCH19 (15) | R_AARCH64_CONDBR19 | C | not yet exercised |
| BRANCH14 (16) | R_AARCH64_TSTBR14 | C | not yet exercised |
| ADDR64 (14) | R_ABS64 | U | test_data_with_reloc_abs64_aa64 |
| SECREL family | (not modeled in v1) | | deferred |
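For reference, BRANCH26 targets the 26-bit immediate of an A64 B or BL instruction, which holds a signed byte displacement divided by 4. The following is a minimal sketch of patching that field; `aa64_patch_branch26` is a hypothetical name, and the sketch overwrites the immediate rather than adding to any addend already encoded in the instruction.

```c
#include <stdint.h>

/* Patch the imm26 field of a B/BL instruction with a byte displacement.
   disp must be a multiple of 4 within [-2^27, 2^27).
   Returns 0 on success and writes the patched word to *out; -1 otherwise. */
int aa64_patch_branch26(uint32_t insn, int64_t disp, uint32_t *out) {
    if (disp % 4 != 0) return -1;
    if (disp < -(INT64_C(1) << 27) || disp >= (INT64_C(1) << 27)) return -1;
    uint32_t imm26 = (uint32_t)(disp / 4) & 0x03FFFFFFu;
    *out = (insn & 0xFC000000u) | imm26; /* keep the opcode bits 31..26 */
    return 0;
}
```

Because both CALL26 and JUMP26 map to the same wire kind, only the opcode bits distinguish BL from B after the relocation is applied.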

6. COMDAT / groups

| Case | Layer | Shape |
|---|---|---|
| COMDAT group with SELECT_ANY | U | test_comdat_group |
| SELECT_NODUPLICATES | C | not yet exercised |
| SELECT_SAME_SIZE / EXACT_MATCH | C | not yet exercised |
| SELECT_ASSOCIATIVE (paired sections) | C | reader handles; no U yet |
| SELECT_LARGEST / NEWEST | C | not yet exercised |
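The selection values in this table live in the 18-byte section-definition aux record that follows a COMDAT section's symbol. As a sketch of where a reader finds them, the helpers below pull the Selection byte (offset 14) and the Number field (offset 12, the associated section index for SELECT_ASSOCIATIVE) out of a raw record. The helper names are hypothetical; the offsets and enum values follow the PE/COFF specification and winnt.h.

```c
#include <stdint.h>

enum {
    COMDAT_SELECT_NODUPLICATES = 1,
    COMDAT_SELECT_ANY          = 2,
    COMDAT_SELECT_SAME_SIZE    = 3,
    COMDAT_SELECT_EXACT_MATCH  = 4,
    COMDAT_SELECT_ASSOCIATIVE  = 5,
    COMDAT_SELECT_LARGEST      = 6,
    COMDAT_SELECT_NEWEST       = 7
};

#define AUX_SECDEF_SIZE 18

/* Selection byte of a section-definition aux record. */
uint8_t aux_secdef_selection(const uint8_t aux[AUX_SECDEF_SIZE]) {
    return aux[14];
}

/* Number field (little-endian u16): for SELECT_ASSOCIATIVE, the 1-based
   index of the section this one is tied to. */
uint16_t aux_secdef_number(const uint8_t aux[AUX_SECDEF_SIZE]) {
    return (uint16_t)(aux[12] | (aux[13] << 8));
}

/* Short name for diagnostics; "UNKNOWN" for values outside 1..7. */
const char *comdat_select_name(uint8_t sel) {
    switch (sel) {
    case COMDAT_SELECT_NODUPLICATES: return "NODUPLICATES";
    case COMDAT_SELECT_ANY:          return "ANY";
    case COMDAT_SELECT_SAME_SIZE:    return "SAME_SIZE";
    case COMDAT_SELECT_EXACT_MATCH:  return "EXACT_MATCH";
    case COMDAT_SELECT_ASSOCIATIVE:  return "ASSOCIATIVE";
    case COMDAT_SELECT_LARGEST:      return "LARGEST";
    case COMDAT_SELECT_NEWEST:       return "NEWEST";
    default:                         return "UNKNOWN";
    }
}
```

A U case for SELECT_ASSOCIATIVE would need to check both fields, since the Number field only carries meaning for that selection.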

7. TLS / special sections

| Case | Layer | Shape |
|---|---|---|
| .tls$ data section | U | test_tls_section |
| .tls$ZZZ BSS-tail | C | |
| _tls_index / _tls_used directory | E | Phase 3 |
| .CRT$XCU constructors | C | deferred |

8. Layout / structure edges

| Case | Layer | Shape |
|---|---|---|
| Empty .obj (no sections, no symbols) | U | test_empty_obj |
| Long section name (/<decimal> form) | U | test_long_section_name |
| Long symbol name (LongName form) | U | test_long_symbol_name |
| Section alignment 1 / 4 / 8 / 16 / 4096 | U | test_align_nibble |
| > 65535 relocations in one section | n/a | emitter panics; not legal in v1 |

9. Negative inputs (bad/)

Deferred — no bad/ corpus in Phase 1. Layer E will cover:

10. Known limitations (round-trip asymmetries)

  1. Section-definition aux records. emit_coff always emits a STATIC section symbol + section-definition aux for every kept section, even if the input ObjBuilder did not name one. The reader maps those aux records onto SK_SECTION symbols. After one round-trip the readback carries an SK_SECTION symbol per section; the second emit reproduces the exact same wire bytes (byte-stable from step 2 onward).

  2. Symbol ordering. Section symbols come first (one per kept section), then .file symbols (if any), then user-defined symbols in iteration order. A user-supplied ObjBuilder that mints user symbols before section symbols still round-trips, but the symbol-table index ordering differs after the first emit. The harness compares by name, not index.

  3. TimeDateStamp. Always zero (reproducible builds), so byte stability holds even across re-emits with different now values.

  4. COMDAT selection flag-vs-enum. obj_group(..., flags) takes a flag bitfield (KIT_OBJ_GROUP_COMDAT = 1). The COFF selection (e.g. IMAGE_COMDAT_SELECT_ANY = 2) is a small int enum stored as flags on the group when read back from COFF. Round-trip stability holds as long as callers consistently use one or the other model — see test_comdat_group.

Stratification

When picking what to land next:

  1. Reloc-kind matrix per arch (U) — every kind in the per-arch translator table needs a U case. Currently covered: R_ABS64, R_PC32 on both arches; R_AARCH64_CALL26, R_AARCH64_ADR_PREL_PG_HI21 + R_AARCH64_ADD_ABS_LO12_NC on aa64.
  2. Symbol storage-class matrix (U) — covered: EXTERNAL, STATIC, WEAK_EXTERNAL, SECTION; common symbols.
  3. Section characteristics matrix (U) — kind × flags matrix covered for .text / .rdata / .data / .bss / .tls$.
  4. mingw fixtures (C) — gated on toolchain availability.
  5. Negative inputs (Layer E) — defer until reader's diagnostic surface is exercised by Phase 3 link tests.
  6. SEH / unwind-info round-trip — Phase 2.7.

A "complete" corpus has one U cell for each row in groups 1–3 and at least one C row for groups 4–6.