PE/COFF Test Corpus — Target Coverage
What the `test/coff/` corpus should cover for full PE/COFF object-file
support, independent of kit's current implementation state. Mirrors
the section layout of `test/elf/CORPUS.md`.
Conventions:
- U = unit (hand-built `ObjBuilder` round-trip; what this harness ships today)
- C = `cases/` (`x86_64-w64-mingw32-gcc -c` / `aarch64-w64-mingw32-gcc -c` fixtures — deferred, no mingw toolchain wired in yet)
- E = `exec/` (`link_emit_coff` + Wine — deferred until Phase 3 of `doc/WINDOWS.md` lands)
The driver lives in `kit-roundtrip-coff.c`. Each U case is a
self-contained `static void test_*(void)` that builds an
`ObjBuilder`, emits to a `kit_writer_mem`, reads back via
`read_coff`, asserts shape equivalence, then re-emits and asserts
byte equality between the two emits.
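
The emit, read back, re-emit pattern described above can be sketched on the 20-byte COFF file header alone. This is a minimal illustration, not the harness code: `CoffHeader`, `header_emit`, `header_read`, and `header_round_trips` are hypothetical stand-ins, since the signatures of `ObjBuilder`, `kit_writer_mem`, and `read_coff` are not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_FILE_MACHINE_AMD64 0x8664u
#define IMAGE_FILE_MACHINE_ARM64 0xAA64u
enum { COFF_HEADER_SIZE = 20 };

/* Hypothetical stand-in for the object model; kit's real type is ObjBuilder. */
typedef struct {
    uint16_t machine;
    uint16_t nsections;
    uint32_t timestamp;       /* TimeDateStamp, zero for reproducible output */
    uint32_t symtab_off;      /* PointerToSymbolTable */
    uint32_t nsyms;           /* NumberOfSymbols */
    uint16_t opt_hdr_size;    /* SizeOfOptionalHeader, zero for a .obj */
    uint16_t characteristics;
} CoffHeader;

/* COFF fields are little-endian on the wire. */
static void put16(uint8_t *p, uint16_t v) { p[0] = (uint8_t)(v & 0xFF); p[1] = (uint8_t)(v >> 8); }
static void put32(uint8_t *p, uint32_t v) { put16(p, (uint16_t)(v & 0xFFFF)); put16(p + 2, (uint16_t)(v >> 16)); }
static uint16_t get16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t get32(const uint8_t *p) { return (uint32_t)get16(p) | ((uint32_t)get16(p + 2) << 16); }

static void header_emit(const CoffHeader *h, uint8_t *out) {
    put16(out + 0, h->machine);
    put16(out + 2, h->nsections);
    put32(out + 4, h->timestamp);
    put32(out + 8, h->symtab_off);
    put32(out + 12, h->nsyms);
    put16(out + 16, h->opt_hdr_size);
    put16(out + 18, h->characteristics);
}

static CoffHeader header_read(const uint8_t *in) {
    CoffHeader h;
    h.machine = get16(in + 0);
    h.nsections = get16(in + 2);
    h.timestamp = get32(in + 4);
    h.symtab_off = get32(in + 8);
    h.nsyms = get32(in + 12);
    h.opt_hdr_size = get16(in + 16);
    h.characteristics = get16(in + 18);
    return h;
}

/* Returns 1 when emit -> read -> re-emit produces identical bytes. */
static int header_round_trips(const CoffHeader *h) {
    uint8_t first[COFF_HEADER_SIZE], second[COFF_HEADER_SIZE];
    assert(h != NULL);
    header_emit(h, first);
    CoffHeader back = header_read(first);
    header_emit(&back, second);
    return memcmp(first, second, COFF_HEADER_SIZE) == 0;
}
```

Comparing the two emits byte for byte, rather than comparing the parsed structures, also catches fields the reader silently drops or normalizes.
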
1. File header / target identification
| Case | Layer | Shape |
|---|---|---|
| `IMAGE_FILE_MACHINE_AMD64` round-trip | U | minimal `.text` with two bytes, no symbols (`test_header_minimal_x64`) |
| `IMAGE_FILE_MACHINE_ARM64` round-trip | U | minimal `.text` with `ret`, no symbols (`test_header_minimal_aa64`) |
| Reproducible TimeDateStamp == 0 | U | implicitly checked by byte-stable round-trip |
| Per-arch reloc machine dispatch | U | covered by reloc tests below |
| `SizeOfOptionalHeader == 0` for `.obj` | U | implicitly: every U case is a `.obj`, not a PE image |
2. Section types
| Case | Layer | Shape |
|---|---|---|
| `.text` (`IMAGE_SCN_CNT_CODE \| MEM_EXECUTE \| MEM_READ`) | U | `test_text_only_x64` / `test_text_only_aa64` |
| `.rdata` (`CNT_INITIALIZED_DATA \| MEM_READ`) | U | `test_rodata` |
| `.data` (`CNT_INITIALIZED_DATA \| MEM_READ \| MEM_WRITE`) | U | `test_data_with_reloc_abs64_x64` |
| `.bss` (`CNT_UNINITIALIZED_DATA`) | U | `test_bss` |
| `.tls$` (TLS template section, name-detected) | U | `test_tls_section` |
| `.debug_*` (DWARF passthrough) | C | deferred |
| `.CRT$X[CIP]*` (init/fini) | C | deferred |
| `.xdata` / `.pdata` (SEH unwind) | C | deferred — doc/WINDOWS.md §3.5 |
| Multiple text sections (`.text$mn`, etc.) | U | covered via `test_comdat_group` |
3. Section characteristics flags
| Flag | Coverage |
|---|---|
| `CNT_CODE` / `INITIALIZED_DATA` / `UNINITIALIZED_DATA` | U — kind matrix above |
| `MEM_EXECUTE` / `MEM_READ` / `MEM_WRITE` | U — kind matrix above |
| `IMAGE_SCN_LNK_COMDAT` | U — `test_comdat_group` |
| `IMAGE_SCN_ALIGN_*` nibble (1, 4, 8, 16, 4096) | U — `test_align_nibble` |
| `LNK_INFO` / `LNK_REMOVE` / `MEM_DISCARDABLE` | C — preserved via `OBJ_EXT_COFF`, not yet exercised by a U case |
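
The `IMAGE_SCN_ALIGN_*` nibble occupies bits 20..23 of the section `Characteristics` field and stores log2(alignment) + 1, so 1 byte encodes as 1 and 4096 bytes as 13. A minimal sketch of both directions follows; the function names `align_to_flags` and `flags_to_align` are hypothetical and are not kit's API.

```c
#include <assert.h>
#include <stdint.h>

#define IMAGE_SCN_ALIGN_MASK  0x00F00000u
#define IMAGE_SCN_ALIGN_SHIFT 20

/* Encode a power-of-two alignment in 1..8192 as Characteristics bits.
 * Returns 0 when the alignment is not representable. */
static uint32_t align_to_flags(uint32_t align) {
    if (align == 0 || (align & (align - 1)) != 0 || align > 8192) {
        return 0;
    }
    uint32_t shift = 0;
    while ((1u << shift) < align) {
        shift++;
    }
    return (shift + 1) << IMAGE_SCN_ALIGN_SHIFT;
}

/* Decode the alignment from a full Characteristics value.
 * Returns 0 when the nibble is 0 (unspecified) or 15 (reserved). */
static uint32_t flags_to_align(uint32_t characteristics) {
    uint32_t nibble = (characteristics & IMAGE_SCN_ALIGN_MASK) >> IMAGE_SCN_ALIGN_SHIFT;
    if (nibble == 0 || nibble > 14) {
        return 0;
    }
    return 1u << (nibble - 1);
}
```

For example, `align_to_flags(16)` yields `0x00500000`, which matches `IMAGE_SCN_ALIGN_16BYTES`.
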
4. Symbol coverage
Storage classes: `EXTERNAL`, `STATIC`, `WEAK_EXTERNAL`, `FILE`,
`SECTION` (synthesized).
Section number specials: ordinary 1-based index, `UNDEFINED` (0),
`ABSOLUTE` (-1), `DEBUG` (-2).
| Case | Layer | Shape |
|---|---|---|
| Plain global function (`EXTERNAL`, `SK_FUNC`) | U | `test_text_only_x64` |
| Static (file-local, `STATIC`, `SB_LOCAL`) | U | `test_static_local_symbol` |
| Common (`UNDEFINED` + `Value>0`) | U | `test_common_symbol` |
| Weak external (`WEAK_EXTERNAL` + aux) | U | `test_weak_global` |
| Section symbol synthesis (`SK_SECTION` round-trip) | U | `test_section_symbol_synthesis` |
| Long symbol name (>8 chars; strtab spillover) | U | `test_long_symbol_name` |
| Long section name (`/N` form) | U | `test_long_section_name` |
| File symbol (`.file` + aux records) | C | deferred (kit's `emit_coff` handles it; no U case yet) |
| Hidden / protected visibility | n/a | COFF has no visibility model |
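
The strtab spillover for long symbol names works as follows: a name of up to 8 bytes is stored inline in the 8-byte `Name` field, zero-padded; a longer name sets the first 4 bytes to zero and stores a little-endian string-table offset in the last 4. A minimal sketch, where `encode_symbol_name` is a hypothetical helper and the caller is assumed to have already placed the name in the string table:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fill the 8-byte symbol Name field.
 * strtab_offset is only used for names longer than 8 bytes. The string
 * table begins with its own 4-byte size, so valid offsets are >= 4.
 * Returns 1 if the name spilled to the string table, 0 if stored inline. */
static int encode_symbol_name(const char *name, uint32_t strtab_offset, uint8_t field[8]) {
    size_t len = strlen(name);
    memset(field, 0, 8);
    if (len <= 8) {
        /* Inline form: no NUL terminator when the name is exactly 8 bytes. */
        memcpy(field, name, len);
        return 0;
    }
    assert(strtab_offset >= 4);
    /* Long form: bytes 0..3 stay zero, bytes 4..7 hold the offset. */
    field[4] = (uint8_t)(strtab_offset & 0xFF);
    field[5] = (uint8_t)((strtab_offset >> 8) & 0xFF);
    field[6] = (uint8_t)((strtab_offset >> 16) & 0xFF);
    field[7] = (uint8_t)((strtab_offset >> 24) & 0xFF);
    return 1;
}
```

The 8-character boundary is worth its own test input, because an exactly-8-byte name is inline but not NUL-terminated.
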
5. Relocation coverage
x86_64 (IMAGE_REL_AMD64_*)
| Wire kind | kit `RelocKind` | Layer | Shape |
|---|---|---|---|
| `ABSOLUTE` (0) | `R_NONE` | implicit | — |
| `ADDR64` (1) | `R_ABS64` | U | `test_data_with_reloc_abs64_x64` |
| `ADDR32` (2) | `R_ABS32` | U | covered alongside REL32 (same harness) |
| `ADDR32NB` (3) | `R_X64_32S` | C | not yet exercised |
| `REL32` (4) | `R_PC32` / `R_REL32` / `R_PLT32` / `R_X64_GOTPCREL*` | U | `test_data_with_reloc_rel32_x64` |
| `REL32_1..5` (5..9) | `R_PC32` + explicit addend on read | C | reader-only path; no U yet |
| `SECREL` / `SECTION` | (not modeled in v1) | — | deferred — doc/WINDOWS.md §3.1 |
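
For the `REL32_1..5` row, the wire kind itself encodes how many bytes sit between the end of the 4-byte relocated field and the end of the instruction: `REL32_N` means N trailing bytes, and plain `REL32` means none. The sketch below shows that arithmetic only. The helper names are hypothetical, and how kit folds N into the addend of its `R_PC32` is an assumption not confirmed by this document.

```c
#include <assert.h>
#include <stdint.h>

enum {
    IMAGE_REL_AMD64_REL32   = 4,
    IMAGE_REL_AMD64_REL32_1 = 5,
    IMAGE_REL_AMD64_REL32_5 = 9
};

/* Trailing byte count implied by the wire kind: 0 for REL32,
 * N for REL32_N, and -1 for any other relocation type. */
static int rel32_trailing_bytes(uint16_t type) {
    if (type < IMAGE_REL_AMD64_REL32 || type > IMAGE_REL_AMD64_REL32_5) {
        return -1;
    }
    return (int)type - IMAGE_REL_AMD64_REL32;
}

/* Value stored in the 32-bit field: S + A - (P + 4 + N), where S is the
 * symbol address, A the addend, P the field address, N the trailing bytes.
 * The result is assumed to fit in 32 bits; no overflow check is done. */
static int32_t rel32_value(uint64_t sym_addr, int64_t addend,
                           uint64_t field_addr, int trailing) {
    assert(trailing >= 0 && trailing <= 5);
    int64_t v = (int64_t)sym_addr + addend - (int64_t)field_addr - 4 - trailing;
    return (int32_t)v;
}
```

A reader that normalizes `REL32_N` to a single PC-relative kind has to carry N somewhere, which is why the table describes it as an explicit addend on read.
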
aarch64 (IMAGE_REL_ARM64_*)
| Wire kind | kit `RelocKind` | Layer | Shape |
|---|---|---|---|
| `ABSOLUTE` (0) | `R_NONE` | implicit | — |
| `ADDR32` (1) | `R_ABS32` | C | not yet exercised |
| `ADDR32NB` (2) | `R_ABS32` | C | not yet exercised |
| `BRANCH26` (3) | `R_AARCH64_CALL26` / `R_AARCH64_JUMP26` | U | `test_aa64_branch26` |
| `PAGEBASE_REL21` (4) | `R_AARCH64_ADR_PREL_PG_HI21` | U | `test_aa64_pagebase_pageoffset` |
| `REL21` (5) | `R_AARCH64_ADR_PREL_LO21` | C | not yet exercised |
| `PAGEOFFSET_12A` (6) | `R_AARCH64_ADD_ABS_LO12_NC` | U | `test_aa64_pagebase_pageoffset` |
| `PAGEOFFSET_12L` (7) | `R_AARCH64_LDST64_ABS_LO12_NC` | C | not yet exercised |
| `BRANCH19` (15) | `R_AARCH64_CONDBR19` | C | not yet exercised |
| `BRANCH14` (16) | `R_AARCH64_TSTBR14` | C | not yet exercised |
| `ADDR64` (14) | `R_ABS64` | U | `test_data_with_reloc_abs64_aa64` |
| `SECREL` family | (not modeled in v1) | — | deferred |
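
As an illustration of what a `BRANCH26` fixup does to the instruction word, the sketch below patches the 26-bit immediate of an AArch64 `B` or `BL`. The helper `patch_branch26` is hypothetical; it shows the encoding only and is not taken from kit's linker or reader.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Patch the imm26 field of a B/BL instruction.
 * delta is the byte distance from the instruction to the target.
 * Returns 1 on success, 0 if delta is misaligned or outside +/-128 MiB
 * (in which case the instruction is left untouched). */
static int patch_branch26(uint32_t *insn, int64_t delta) {
    assert(insn != NULL);
    if (delta % 4 != 0) {
        return 0; /* branch targets are 4-byte aligned */
    }
    if (delta < -(INT64_C(1) << 27) || delta >= (INT64_C(1) << 27)) {
        return 0; /* does not fit in a signed 26-bit word offset */
    }
    uint32_t imm26 = (uint32_t)(delta / 4) & 0x03FFFFFFu;
    *insn = (*insn & 0xFC000000u) | imm26;
    return 1;
}
```

Patching `0x94000000` (`bl` with a zero offset) with a delta of 8 gives `0x94000002`, and a delta of -4 gives `0x97FFFFFF`.
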
6. COMDAT / groups
| Case | Layer | Shape |
|---|---|---|
| COMDAT group with `SELECT_ANY` | U | `test_comdat_group` |
| `SELECT_NODUPLICATES` | C | not yet exercised |
| `SELECT_SAME_SIZE` / `EXACT_MATCH` | C | not yet exercised |
| `SELECT_ASSOCIATIVE` (paired sections) | C | reader handles; no U yet |
| `SELECT_LARGEST` / `NEWEST` | C | not yet exercised |
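
For reference, the selection values in the section-definition aux record are small integers, and only `SELECT_ASSOCIATIVE` gives meaning to the aux `Number` field (a one-based index of the associated section). The sketch below is a hypothetical validity check, not kit's reader logic.

```c
#include <assert.h>
#include <stdint.h>

enum {
    IMAGE_COMDAT_SELECT_NODUPLICATES = 1,
    IMAGE_COMDAT_SELECT_ANY          = 2,
    IMAGE_COMDAT_SELECT_SAME_SIZE    = 3,
    IMAGE_COMDAT_SELECT_EXACT_MATCH  = 4,
    IMAGE_COMDAT_SELECT_ASSOCIATIVE  = 5,
    IMAGE_COMDAT_SELECT_LARGEST      = 6,
    IMAGE_COMDAT_SELECT_NEWEST       = 7
};

/* Returns 1 when the selection is a known value and, for ASSOCIATIVE,
 * when number names an existing section (one-based, 1..nsections). */
static int comdat_aux_is_valid(uint8_t selection, uint16_t number, uint16_t nsections) {
    if (selection < IMAGE_COMDAT_SELECT_NODUPLICATES ||
        selection > IMAGE_COMDAT_SELECT_NEWEST) {
        return 0;
    }
    if (selection == IMAGE_COMDAT_SELECT_ASSOCIATIVE) {
        return number >= 1 && number <= nsections;
    }
    return 1;
}
```
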
7. TLS / special sections
| Case | Layer | Shape |
|---|---|---|
| `.tls$` data section | U | `test_tls_section` |
| `.tls$ZZZ` BSS-tail | C | — |
| `_tls_index` / `_tls_used` directory | E | Phase 3 |
| `.CRT$XCU` constructors | C | deferred |
8. Layout / structure edges
| Case | Layer | Shape |
|---|---|---|
| Empty `.obj` (no sections, no symbols) | U | `test_empty_obj` |
| Long section name (`/<decimal>` form) | U | `test_long_section_name` |
| Long symbol name (LongName form) | U | `test_long_symbol_name` |
| Section alignment 1 / 4 / 8 / 16 / 4096 | U | `test_align_nibble` |
| > 65535 relocations in one section | n/a | emitter panics; not legal in v1 |
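
The `/<decimal>` long-section-name form stores a slash followed by the ASCII decimal string-table offset in the 8-byte section `Name` field. A minimal parsing sketch follows; `parse_long_section_name` is a hypothetical helper, and it handles only the decimal form described in this document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse a "/<decimal>" section Name field.
 * Returns 1 and stores the string-table offset in *offset when the field
 * uses the long-name form; returns 0 for inline names or malformed input. */
static int parse_long_section_name(const uint8_t field[8], uint32_t *offset) {
    assert(offset != NULL);
    if (field[0] != '/') {
        return 0; /* inline name such as ".text" */
    }
    uint32_t value = 0;
    int digits = 0;
    /* At most 7 digits fit after the slash, so value cannot overflow. */
    for (int i = 1; i < 8 && field[i] != 0; i++) {
        if (field[i] < '0' || field[i] > '9') {
            return 0;
        }
        value = value * 10 + (uint32_t)(field[i] - '0');
        digits++;
    }
    if (digits == 0) {
        return 0; /* a lone "/" carries no offset */
    }
    *offset = value;
    return 1;
}
```
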
9. Negative inputs (bad/)
Deferred — no bad/ corpus in Phase 1. Layer E will cover:
- Truncated file header (< 20 bytes)
- Non-zero `SizeOfOptionalHeader` (i.e. PE image fed to `.obj` reader)
- Unsupported `Machine` (e.g. `IMAGE_FILE_MACHINE_I386`)
- `PointerToRawData + SizeOfRawData > file_size`
- `PointerToSymbolTable + NumberOfSymbols * 18` overflows
- Strtab size field < 4 / strtab body extending past file
- Reloc `SymbolTableIndex` past symbol table
- COMDAT aux with `Selection == ASSOCIATIVE` and `Number` out of range
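
Several of the header-level cases above reduce to a handful of bounds checks. The sketch below shows one way to order them, doing the symbol-table arithmetic in 64 bits so the multiplication cannot wrap. `CoffCheck` and `check_obj_header` are hypothetical names, and the check order and error granularity are assumptions rather than the behavior of kit's `read_coff`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    COFF_CHECK_OK,
    COFF_CHECK_TRUNCATED,     /* fewer than 20 header bytes */
    COFF_CHECK_IS_IMAGE,      /* non-zero SizeOfOptionalHeader */
    COFF_CHECK_BAD_MACHINE,   /* neither AMD64 nor ARM64 */
    COFF_CHECK_SYMTAB_RANGE   /* symbol table runs past end of file */
} CoffCheck;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) { return (uint32_t)rd16(p) | ((uint32_t)rd16(p + 2) << 16); }

static CoffCheck check_obj_header(const uint8_t *data, size_t size) {
    if (size < 20) {
        return COFF_CHECK_TRUNCATED;
    }
    assert(data != NULL);
    if (rd16(data + 16) != 0) {
        return COFF_CHECK_IS_IMAGE;
    }
    uint16_t machine = rd16(data + 0);
    if (machine != 0x8664 && machine != 0xAA64) {
        return COFF_CHECK_BAD_MACHINE;
    }
    /* 18 bytes per symbol record; widen before multiplying. */
    uint64_t symtab_end = (uint64_t)rd32(data + 8) + (uint64_t)rd32(data + 12) * 18u;
    if (symtab_end > (uint64_t)size) {
        return COFF_CHECK_SYMTAB_RANGE;
    }
    return COFF_CHECK_OK;
}
```
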
10. Known limitations (round-trip asymmetries)
Section-definition aux records. `emit_coff` always emits a STATIC section symbol + section-definition aux for every kept section, even if the input `ObjBuilder` did not name one. The reader maps those aux records onto `SK_SECTION` symbols. After one round-trip the readback carries an `SK_SECTION` symbol per section; the second emit reproduces the exact same wire bytes (byte-stable from step 2 onward).

Symbol ordering. Section symbols come first (one per kept section), then `.file` symbols (if any), then user-defined symbols in iteration order. A user-supplied `ObjBuilder` that mints user symbols before section symbols still round-trips, but the symbol-table index ordering differs after the first emit. The harness compares by name, not index.

TimeDateStamp. Always zero (reproducible builds), so byte stability holds even across re-emits with different `now` values.

COMDAT selection flag-vs-enum. `obj_group(..., flags)` takes a flag bitfield (`KIT_OBJ_GROUP_COMDAT = 1`). The COFF selection (e.g. `IMAGE_COMDAT_SELECT_ANY = 2`) is a small int enum stored as `flags` on the group when read back from COFF. Round-trip stability holds as long as callers consistently use one or the other model — see `test_comdat_group`.
Stratification
When picking what to land next:
- ★ Reloc-kind matrix per arch (U) — every kind in the per-arch translator table needs a U case. Currently covered: `R_ABS64`, `R_PC32` on both arches; `R_AARCH64_CALL26`, `R_AARCH64_ADR_PREL_PG_HI21` + `R_AARCH64_ADD_ABS_LO12_NC` on aa64.
- ★ Symbol storage-class matrix (U) — covered: `EXTERNAL`, `STATIC`, `WEAK_EXTERNAL`, `SECTION`; common symbols.
- ★ Section characteristics matrix (U) — kind × flags matrix covered for `.text` / `.rdata` / `.data` / `.bss` / `.tls$`.
- mingw fixtures (C) — gated on toolchain availability.
- Negative inputs (Layer E) — defer until reader's diagnostic surface is exercised by Phase 3 link tests.
- SEH / unwind-info round-trip — Phase 2.7.
A "complete" corpus has one U cell for each row in groups 1–3 and at least one C row for groups 4–6.