kit

git clone https://git.ryansepassi.com/git/kit.git

ELF Test Corpus — Target Coverage

What the test/elf/ corpus should cover for full ELF object-file support, independent of kit's current implementation state. Each row is a distinct case worth a discrete test (unit/, cases/, exec/, or bad/); groups starred (★) are highest-leverage and should land first.

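Conventions: the Layer (or Test) column names the test directories listed above by initial: U = unit/, C = cases/, E = exec/, B = bad/.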


1. ELF header / target identification ★

| Case | Layer | Notes |
|---|---|---|
| e_machine per supported arch | C, U | aarch64, x86_64, riscv64, riscv32, arm32 (when supported) |
| ELFCLASS32 32-bit ELF | C, U | independent matrix from class |
| ELFDATA2MSB big-endian | C, U | aarch64-be, mips, etc. |
| ELFOSABI_* variations | U | NONE / LINUX / FREEBSD; emitter must round-trip whatever it reads |
| e_flags per-arch | C | RISC-V RVE/RVC, ARM EABI version |
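
As a rough illustration of what a unit-layer check for this group might read, the sketch below pulls class, data encoding, OS ABI, and e_machine out of a raw header. The constant values come from the ELF gABI; the struct `elf_ident` and the function `read_elf_ident` are hypothetical names for this example and are not taken from kit.

```c
#include <stddef.h>
#include <stdint.h>

/* Offsets and values as defined by the ELF gABI (same names as <elf.h>). */
enum { EI_CLASS = 4, EI_DATA = 5, EI_OSABI = 7, EI_NIDENT = 16 };
enum { ELFCLASS32 = 1, ELFCLASS64 = 2 };
enum { ELFDATA2LSB = 1, ELFDATA2MSB = 2 };
enum { EM_ARM = 40, EM_X86_64 = 62, EM_AARCH64 = 183, EM_RISCV = 243 };

/* Hypothetical result type for this sketch. */
struct elf_ident {
    uint8_t  elf_class;
    uint8_t  data;
    uint8_t  osabi;
    uint16_t machine;
};

/* Returns 0 on success, -1 if the buffer is too short or the
 * magic/class/data bytes are not recognised. */
int read_elf_ident(const uint8_t *buf, size_t len, struct elf_ident *out)
{
    /* e_ident (16 bytes) + e_type (2) + e_machine (2) */
    if (len < EI_NIDENT + 4)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;
    if (buf[EI_CLASS] != ELFCLASS32 && buf[EI_CLASS] != ELFCLASS64)
        return -1;
    if (buf[EI_DATA] != ELFDATA2LSB && buf[EI_DATA] != ELFDATA2MSB)
        return -1;

    out->elf_class = buf[EI_CLASS];
    out->data      = buf[EI_DATA];
    out->osabi     = buf[EI_OSABI];

    /* e_machine sits at offset 18 in both ELF32 and ELF64; only the
     * byte order depends on EI_DATA. */
    if (out->data == ELFDATA2LSB)
        out->machine = (uint16_t)(buf[18] | (buf[19] << 8));
    else
        out->machine = (uint16_t)((buf[18] << 8) | buf[19]);
    return 0;
}
```

Keeping class and data encoding as separate fields mirrors the table: each is an independent axis, so a test can vary one without touching the other.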

2. Section types ★

| sh_type | Layer | Specific case |
|---|---|---|
| PROGBITS (text/data/rodata) | C, U | covered by 01_return42 etc. |
| NOBITS | C, U | .bss of various sizes; sh_addralign 1/8/64/4096 |
| SYMTAB / STRTAB | U | round-trip preserves logical content (not byte layout) |
| RELA / REL | C | both encodings; sh_flags & SHF_INFO_LINK |
| NOTE | C | .note.gnu.build-id, .note.ABI-tag, .note.gnu.property |
| INIT_ARRAY/FINI_ARRAY/PREINIT_ARRAY | C, E | constructor ordering across TUs |
| GROUP (COMDAT) | C | C++-style inline funcs across two TUs (09_comdat_inline.c) |
| LLVM_ADDRSIG and other custom | C | unknown sh_type must round-trip via raw-type preservation |
| GNU_HASH / HASH | C | dynamic objects (post-shared-lib support) |
| DYNSYM / DYNAMIC | C | shared objects |
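
A possible source for the NOBITS row is sketched below: zero-initialized globals with explicit alignment, which compilers normally place in .bss with a matching sh_addralign. The variable and function names are hypothetical, and the exact section placement depends on the compiler and flags (for example -fcommon), so treat the comments as expectations rather than guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Zero-initialized objects: normally emitted into .bss (SHT_NOBITS).
 * The alignments match two of the sh_addralign values in the table. */
_Alignas(64)   unsigned char bss_cacheline[64];
_Alignas(4096) unsigned char bss_page[8192];

/* Behavioral check: the loader honoured the requested alignment. */
int bss_alignment_ok(void)
{
    return ((uintptr_t)bss_cacheline % 64 == 0) &&
           ((uintptr_t)bss_page % 4096 == 0);
}

/* Behavioral check: NOBITS memory reads back as zero at startup. */
int bss_is_zeroed(void)
{
    for (size_t i = 0; i < sizeof bss_page; i++)
        if (bss_page[i] != 0)
            return 0;
    return 1;
}
```

An exec/ case could return a non-zero exit code when either check fails, while the cases/ layer compares the emitted section header against the reference toolchain.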

3. Section flags

| Flag | Coverage |
|---|---|
| SHF_ALLOC / WRITE / EXECINSTR | implicit in every case |
| SHF_TLS | .tdata / .tbss |
| SHF_MERGE + SHF_STRINGS | .rodata.str1.1 / .debug_str |
| SHF_MERGE + fixed sh_entsize | .rodata.cst{4,8,16} constant pools |
| SHF_GROUP | every section inside a COMDAT |
| SHF_LINK_ORDER | -flto outputs, .gcc_except_table |
| SHF_INFO_LINK | every .rela.* |
| SHF_EXCLUDE | .llvm_addrsig; linker drop hint |
| SHF_COMPRESSED | zlib/zstd-compressed .debug_* |

4. Symbol coverage ★

Bindings: STB_LOCAL, STB_GLOBAL, STB_WEAK. (STB_GNU_UNIQUE if kit ever needs it.)

Types: STT_NOTYPE, STT_FUNC, STT_OBJECT, STT_SECTION, STT_FILE, STT_COMMON, STT_TLS, STT_GNU_IFUNC.

Visibility: STV_DEFAULT, STV_HIDDEN, STV_PROTECTED, STV_INTERNAL.

shndx values: ordinary index, SHN_UNDEF, SHN_ABS, SHN_COMMON, SHN_XINDEX (extended for >65279 sections).
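
The four symbol attributes above are packed into st_info, st_other, and st_shndx. The sketch below decodes them and resolves SHN_XINDEX through an already-decoded SHT_SYMTAB_SHNDX table. The bit layout and reserved index values follow the ELF gABI; the enum `shndx_kind` and the function names are hypothetical and only illustrate what a unit test might assert.

```c
#include <stddef.h>
#include <stdint.h>

/* Reserved section indices from the ELF gABI. */
enum { SHN_UNDEF = 0, SHN_LORESERVE = 0xff00, SHN_ABS = 0xfff1,
       SHN_COMMON = 0xfff2, SHN_XINDEX = 0xffff };

/* st_info packs binding in the high nibble and type in the low nibble;
 * visibility is the low two bits of st_other. */
static inline unsigned sym_bind(uint8_t st_info)        { return st_info >> 4; }
static inline unsigned sym_type(uint8_t st_info)        { return st_info & 0xf; }
static inline unsigned sym_visibility(uint8_t st_other) { return st_other & 0x3; }

/* Hypothetical classification used by this sketch. */
enum shndx_kind { SHNDX_SECTION, SHNDX_UNDEF, SHNDX_ABS, SHNDX_COMMON, SHNDX_BAD };

/* xindex is the SHT_SYMTAB_SHNDX contents as host-order words, one per
 * symbol, or NULL when the file has no such section. */
enum shndx_kind resolve_shndx(uint16_t st_shndx,
                              const uint32_t *xindex, size_t xindex_count,
                              size_t sym_index, uint32_t *section_out)
{
    switch (st_shndx) {
    case SHN_UNDEF:  return SHNDX_UNDEF;
    case SHN_ABS:    return SHNDX_ABS;
    case SHN_COMMON: return SHNDX_COMMON;
    case SHN_XINDEX:
        /* The real index lives in the side table. */
        if (xindex == NULL || sym_index >= xindex_count)
            return SHNDX_BAD;
        *section_out = xindex[sym_index];
        return SHNDX_SECTION;
    default:
        /* Other reserved values are treated as malformed in this sketch. */
        if (st_shndx >= SHN_LORESERVE)
            return SHNDX_BAD;
        *section_out = st_shndx;
        return SHNDX_SECTION;
    }
}
```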

Cases:

| Case | Layer |
|---|---|
| Plain global function definition | C |
| Static (file-local) function | C |
| Tentative definition (common) | C |
| __attribute__((weak)) defined and undefined | C |
| __attribute__((visibility("hidden"))) | C |
| TLS variable (__thread) | C, E |
| IFUNC (__attribute__((ifunc("resolver")))) | C, E |
| Aliased symbols (multiple names, same address) | C |
| Section symbols as relocation targets | C |
| File symbol (STT_FILE) round-trip | C |
| AArch64 mapping symbols $x / $d (STT_NOTYPE on defined sym) | C |
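
Several of these rows can be produced from one small translation unit. The sketch below assumes a GCC- or Clang-compatible compiler targeting ELF. All identifiers are hypothetical, and the symbol properties named in the comments are what such compilers typically emit, not something this document verifies.

```c
/* Plain global definition: typically STB_GLOBAL, STT_FUNC, STV_DEFAULT. */
int global_fn(void) { return 1; }

/* File-local function: STB_LOCAL. noinline keeps the symbol from
 * being optimised away. */
__attribute__((noinline)) static int local_fn(void) { return 2; }
int call_local(void) { return local_fn(); }

/* Tentative definition: SHN_COMMON only under -fcommon; with
 * -fno-common it is an ordinary .bss object. */
int tentative_def;

/* Weak definition: STB_WEAK with a real section index. */
__attribute__((weak)) int weak_defined(void) { return 3; }

/* Weak undefined reference: STB_WEAK + SHN_UNDEF; the address is null
 * when nothing in the link defines it. */
extern int weak_missing(void) __attribute__((weak));
int call_weak_missing(void) { return weak_missing ? weak_missing() : -1; }

/* Hidden visibility: STV_HIDDEN in st_other. */
__attribute__((visibility("hidden"))) int hidden_fn(void) { return 4; }
int call_hidden(void) { return hidden_fn(); }

/* Alias: two symbol table entries with the same value and section. */
int alias_target(void) { return 5; }
int alias_name(void) __attribute__((alias("alias_target")));

/* TLS variable: STT_TLS, placed in .tdata because it is initialised. */
__thread int tls_counter = 6;
int read_tls(void) { return tls_counter; }
```

The IFUNC, section-symbol, STT_FILE, and mapping-symbol rows are left out here because they depend on libc support or on what the assembler emits rather than on source-level attributes.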

5. Relocation coverage ★

For each supported arch, every reloc kind kit's RelocKind enum maps must have a unit test (round-trip) AND a behavioral test (linked + run gives the right value).

AArch64

| Reloc | Test | Notes |
|---|---|---|
| R_AARCH64_NONE | U | sentinel |
| R_AARCH64_ABS64 / ABS32 | C, E | data pointers, absolute jump tables |
| R_AARCH64_PREL64 / PREL32 | C, E | .eh_frame FDE pointers |
| R_AARCH64_CALL26 / JUMP26 | E | direct calls, tail calls |
| R_AARCH64_ADR_PREL_PG_HI21 + ADD_ABS_LO12_NC | E | small-model PIC addressing |
| R_AARCH64_LDST{8,16,32,64,128}_ABS_LO12_NC | E | LDR/STR offset materialization |
| R_AARCH64_GOT_* family | C, E | shared-lib path |
| R_AARCH64_TLSGD_* / TLSIE_* / TLSLE_* / TLSDESC_* | C, E | TLS access models |
| R_AARCH64_PLT32 | C, E | PIE/shared call through PLT |
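
A behavioral reloc test needs source whose result is only correct if the relocation was applied. The sketch below shows three such patterns. The relocation names in the comments are what an AArch64 compiler would usually emit in the small code model, which is an assumption that depends on compiler, optimisation level, and PIC settings; the identifiers are hypothetical.

```c
/* Target object for the relocations below. */
int reloc_value = 42;

/* Pointer initialised with a symbol address: needs a data relocation
 * (typically R_AARCH64_ABS64 against reloc_value). */
int *reloc_ptr = &reloc_value;

/* noinline so the call below stays a real branch. */
__attribute__((noinline)) int reloc_callee(int x) { return x + 1; }

/* Direct call: typically R_AARCH64_CALL26 on the BL instruction. */
int reloc_direct_call(void) { return reloc_callee(41); }

/* Address materialization: typically ADRP + ADD
 * (ADR_PREL_PG_HI21 + ADD_ABS_LO12_NC), or a GOT load under PIC. */
int *reloc_address_of(void) { return &reloc_value; }

/* Load through the relocated pointer: wrong if ABS64 was misapplied. */
int reloc_load_via_pointer(void) { return *reloc_ptr; }
```

An exec/ case could return `reloc_load_via_pointer()` from main and assert on exit code 42, with the unit layer separately confirming that each expected reloc kind appears in the object.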

x86_64 (when added)

R_X86_64_64, _32, _PC32, _PC64, _PLT32, _GOTPCREL, _GOTPCRELX, _REX_GOTPCRELX, _TLSGD, _GOTTPOFF, _TPOFF32/64, _DTPOFF32/64.

RISC-V (when added)

R_RISCV_HI20/_LO12_I/_LO12_S, _BRANCH, _JAL, _CALL_PLT, _PCREL_HI20/_PCREL_LO12_*, _RELAX, _TLS_GD_HI20, etc.

Reloc edge cases (any arch)

6. Special sections

| Section | Coverage |
|---|---|
| .text.<fnname> (function sections) | C: -ffunction-sections |
| .data.<varname> | C: -fdata-sections |
| .data.rel.ro | C: relocatable read-only data |
| .init_array.NNN / .fini_array.NNN | E: priority ctors/dtors |
| .tdata / .tbss | C, E: TLS |
| .gcc_except_table + .eh_frame | C: exception tables |
| .note.gnu.build-id | C: reproducible-build identity |
| .note.gnu.property | C: CET/BTI/PAC markers (AArch64-BTI) |
| .ARM.attributes / .riscv.attributes / .note.ABI-tag | C |
| .gnu.linkonce.t.<sym> (legacy COMDAT) | C |
| .debug_* (DWARF) | C: opaque preservation; semantic equivalence later |
| .eh_frame_hdr | C: when shared/exe path emits it |
| .got / .got.plt / .plt | E: shared-lib link path |
| .dynamic / .dynstr / .dynsym | E: shared-object output |

7. Layout / structure edge cases

8. Archive (.a) ★

| Case | Layer |
|---|---|
| Empty archive | B |
| Single .o member | C-like (separate ar harness) |
| Multiple members, dependency on later member | E |
| BSD vs SysV format | C |
| Symbol index (/ in SysV, __.SYMDEF in BSD) present and absent | C, E |
| Long filenames (// extended name table) | C |
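
The special members in this table are identified by the 16-byte, space-padded name field of each ar header. The sketch below classifies that field for both the SysV and BSD conventions. The enum and function names are hypothetical, and variants such as the 64-bit /SYM64/ index or a BSD symbol index stored under a #1/ long name are deliberately not handled.

```c
#include <stddef.h>
#include <string.h>

#define AR_MAGIC     "!<arch>\n"
#define AR_MAGIC_LEN 8
#define AR_NAME_LEN  16   /* space-padded name field of a member header */

/* Hypothetical classification for this sketch. */
enum ar_member_kind {
    AR_REGULAR,           /* ordinary member, e.g. "foo.o/" (SysV) */
    AR_SYMTAB_SYSV,       /* "/"  : SysV symbol index */
    AR_LONGNAMES_SYSV,    /* "//" : SysV extended name table */
    AR_LONGNAME_REF_SYSV, /* "/123" : offset into the "//" table */
    AR_SYMTAB_BSD,        /* "__.SYMDEF" or "__.SYMDEF SORTED" */
    AR_LONGNAME_BSD       /* "#1/<len>" : name follows the header */
};

int ar_has_magic(const unsigned char *buf, size_t len)
{
    return len >= AR_MAGIC_LEN && memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) == 0;
}

enum ar_member_kind ar_classify_name(const char name[AR_NAME_LEN])
{
    /* Strip the trailing space padding. */
    size_t n = AR_NAME_LEN;
    while (n > 0 && name[n - 1] == ' ')
        n--;

    if (n == 1 && name[0] == '/')
        return AR_SYMTAB_SYSV;
    if (n == 2 && name[0] == '/' && name[1] == '/')
        return AR_LONGNAMES_SYSV;
    if (n >= 9 && memcmp(name, "__.SYMDEF", 9) == 0)
        return AR_SYMTAB_BSD;
    if (n >= 3 && memcmp(name, "#1/", 3) == 0)
        return AR_LONGNAME_BSD;
    if (n >= 2 && name[0] == '/' && name[1] >= '0' && name[1] <= '9')
        return AR_LONGNAME_REF_SYSV;
    return AR_REGULAR;
}
```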

9. Negative inputs (bad/)

Each blob has a .expect substring; harness asserts compiler_panic exits cleanly (no segfault).

| Blob | Trigger |
|---|---|
| truncated_ehdr.elf | < 64 bytes |
| bad_magic.elf | first 4 bytes wrong |
| e_machine_x86.elf | machine mismatch (when arch-validated) |
| wrong_class.elf | 64-bit machine tagged ELFCLASS32 (class/arch mismatch) |
| wrong_endian.elf | ELFDATA2MSB in an LSB pipeline |
| sh_offset_oob.elf | sh_offset + sh_size > file_size |
| sh_link_oob.elf | sh_link >= e_shnum |
| e_shstrndx_oob.elf | bogus shstrndx |
| symtab_entsize_bad.elf | sh_entsize != sizeof(Elf64_Sym) |
| rela_entsize_bad.elf | sh_entsize != 24 |
| r_info_sym_oob.elf | reloc sym index past symtab |
| group_cycle.elf | SHT_GROUP referencing itself |
| nobits_with_data.elf | SHT_NOBITS with non-zero sh_offset body |
| huge_size.elf | sh_size = u64::max |
| string_no_nul.elf | strtab without trailing \0 |
| unknown_machine.elf | accepted as opaque or rejected by policy |
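
Several of these triggers reduce to small range checks, and huge_size.elf is the reason the bounds test must not compute sh_offset + sh_size directly: that sum wraps for sizes near the 64-bit maximum. The sketch below shows overflow-safe forms of four checks. The function names are hypothetical and say nothing about how kit's reader is organised.

```c
#include <stddef.h>
#include <stdint.h>

enum { SHT_NOBITS = 8 };   /* value from the ELF gABI */

/* sh_offset_oob / huge_size: subtract instead of add so that a huge
 * sh_size cannot wrap around and pass the check. */
int section_in_file(uint32_t sh_type, uint64_t sh_offset,
                    uint64_t sh_size, uint64_t file_size)
{
    if (sh_type == SHT_NOBITS)
        return 1;                      /* occupies no file bytes */
    if (sh_offset > file_size)
        return 0;
    return sh_size <= file_size - sh_offset;
}

/* symtab_entsize_bad / rela_entsize_bad: Elf64_Sym and Elf64_Rela are
 * both 24 bytes, and the table must hold a whole number of entries. */
int table_entsize_ok(uint64_t sh_size, uint64_t sh_entsize, uint64_t expected)
{
    return expected != 0 && sh_entsize == expected && sh_size % expected == 0;
}

/* sh_link_oob / e_shstrndx_oob / r_info_sym_oob: plain index bound. */
int index_in_range(uint32_t index, uint32_t count)
{
    return index < count;
}

/* string_no_nul: a non-empty string table must end in NUL so that no
 * name can run past the section. An empty table is accepted here. */
int strtab_is_terminated(const uint8_t *data, size_t size)
{
    return size == 0 || data[size - 1] == '\0';
}
```

Each check returns a plain boolean so the caller can map a failure to the diagnostic text that the blob's .expect file looks for.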

10. Cross-tool agreement

For every cases/*.c, the structural diff oracle should pass against:

Any case that diverges across these is a bug in either kit or the normalizer — not allowed to silently .xfail.

11. Behavioral / runtime

exec/ already covers: exit code, in-section call, ADRP+ADD load, .rodata load, .data load, BSS, two-TU link. Extend with:

| Case | Exercises |
|---|---|
| Static initializer order across TUs | INIT_ARRAY priority |
| Weak symbol replaced by strong | resolution rule |
| Common symbol coalescing | tentative-def merging |
| Inline function shared via COMDAT | group dedup |
| TLS variable read/written from two TUs | .tdata + TLS relocs end-to-end |
| dlopen-style runtime relocation (when shared lands) | dynamic relocs |
| setjmp/longjmp across compilation unit | unwind interaction |
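
The initializer-order row can be checked with prioritised constructors that log the order in which they ran. The sketch below keeps both constructors in one file for brevity; a real cross-TU case would put them in separate sources. It assumes a GCC- or Clang-compatible compiler, where a lower priority number runs earlier, and all identifiers are hypothetical.

```c
/* Zero-initialized log, filled in before main() runs. */
int ctor_log[4];
int ctor_count;

static void ctor_record(int id)
{
    if (ctor_count < 4)
        ctor_log[ctor_count++] = id;
}

/* Declared in the "wrong" order on purpose: the priority, not the
 * source order, should decide which runs first. Priorities up to 100
 * are reserved for the implementation. */
__attribute__((constructor(102))) static void ctor_second(void) { ctor_record(2); }
__attribute__((constructor(101))) static void ctor_first(void)  { ctor_record(1); }

/* Returns 1 when priority 101 ran before priority 102. */
int ctor_order_ok(void)
{
    return ctor_count == 2 && ctor_log[0] == 1 && ctor_log[1] == 2;
}
```

An exec/ case could return `ctor_order_ok() ? 0 : 1` from main, so a linker that drops or misorders the .init_array.NNN entries shows up as a non-zero exit code.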

Stratification

When picking what to land next, the prioritization is:

  1. ★ Reloc-kind matrix per arch — every kind kit claims to support needs unit + behavioral coverage. This is the single highest-leverage gap.
  2. ★ Symbol kind/visibility matrix — every STT_* × STB_* × STV_* combo we emit must round-trip.
  3. ★ Section type matrix — every sh_type we admit, especially NOBITS, GROUP, INIT_ARRAY.
  4. Special sections with semantic flags (SHF_TLS, SHF_MERGE, etc.).
  5. Negative inputs (bad/).
  6. Layout edge cases (large/extended).
  7. Cross-tool agreement (clang vs gcc, lld vs ld.bfd).
  8. Archive support.
  9. DWARF semantic equivalence (deferred until the consumer side cares).

A "complete" corpus has a row for every cell in groups 1–4 and at least one representative for every cell in 5–7.