Image Inspection (planned work)
kit can read relocatable objects through kit_obj_open, and it has been
extended to also inspect linked images -- executables and shared objects --
through the same neutral API. ELF and Mach-O reading have landed; the
remaining work is the COFF/PE image reader plus a handful of follow-ups. This
doc captures the goal, what is already in the baseline, and what is left, so the PE
work and any later refinements parallelize against a settled contract. The
matching design lives in ../OBJ.md; see also
LINKER.md for the linker side and ../DBG.md /
../DWARF.md for the debug-info flow that rides on it.
Goal
One kit_obj_open call inspects any of: a relocatable object, an
executable, or a shared object, for ELF, Mach-O, and COFF/PE. Sections and
symbols keep working where the format still carries them; linked images
additionally expose segments, an entry point and image base, an interpreter /
SONAME, library dependencies and rpaths, and a dynamic symbol/relocation
table. objdump and the inherited tools (nm, size, addr2line) operate on
images the same way they operate on objects, with no per-format
special-casing in the driver.
Why this is a real extension, not a flag
The original reader was shaped around relocatable objects. Its path was
kit_obj_open -> kit_detect_target -> impl->read, and the ELF backend rejected
anything but ET_REL. There were DSO readers (read_elf_dso, read_coff_dso, a
Mach-O dylib stub) but they were wired only into the linker's input path,
not the public impl->read / kit_obj_open surface; ET_EXEC had no
reader at all. The in-memory model (ObjBuilder) was section / symbol /
reloc oriented with no notion of a segment (PT_LOAD), the dynamic
table (DT_NEEDED / SONAME / RPATH), an entry point, image base,
imports, or data directories -- which is exactly what image
inspection is about. The fix is a new image dimension on the model plus
neutral iterators, not a flag.
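A minimal sketch of that image dimension, assuming simplified layouts. Only the names ObjBuilder, ObjImage, obj_image_ensure, and obj_free come from this doc; every field, the OBJ_KIND_* values, obj_image_add_segment, and the return conventions are illustrative. The image record is allocated on first use, so a pure relocatable keeps a NULL pointer and its section/symbol path never touches it.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical kinds; the real OBJ_KIND_* values may differ. */
typedef enum { OBJ_KIND_REL, OBJ_KIND_EXEC, OBJ_KIND_DYN } ObjKind;

typedef struct {
    uint64_t vaddr, vsize, file_off, file_size, align;
    unsigned perms; /* format-neutral R/W/X bit set */
} ObjSegment;

typedef struct {
    ObjKind kind;
    uint64_t entry, image_base;
    ObjSegment *segs;
    size_t nsegs, segcap;
} ObjImage;

typedef struct {
    /* ... sections, symbols, relocations (elided) ... */
    ObjImage *image; /* NULL for pure relocatable objects */
} ObjBuilder;

/* Create the image record on first call; later calls return the same one
 * and keep the kind recorded by the first call. */
static ObjImage *obj_image_ensure(ObjBuilder *ob, ObjKind kind) {
    if (!ob->image) {
        ob->image = calloc(1, sizeof *ob->image);
        if (!ob->image)
            return NULL;
        ob->image->kind = kind;
    }
    return ob->image;
}

/* Hypothetical appender: grows the segment array geometrically.
 * Returns 0 on success, -1 on allocation failure. */
static int obj_image_add_segment(ObjImage *im, ObjSegment seg) {
    if (im->nsegs == im->segcap) {
        size_t cap = im->segcap ? im->segcap * 2 : 4;
        ObjSegment *p = realloc(im->segs, cap * sizeof *p);
        if (!p)
            return -1;
        im->segs = p;
        im->segcap = cap;
    }
    im->segs[im->nsegs++] = seg;
    return 0;
}

/* Simplified obj_free: only the image part is shown. */
static void obj_free(ObjBuilder *ob) {
    if (ob->image) {
        free(ob->image->segs);
        free(ob->image);
        ob->image = NULL;
    }
}
```

Under this shape, glue code can treat a NULL image as "all image iterators are empty", which is how relocatable inputs keep their existing behavior without a flag.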
Baseline (done)
The neutral API and internal model are in place, and ELF + Mach-O image reading work end to end:
- Neutral API (include/kit/object.h): KitObjKind + kit_obj_kind;
  KitObjImageInfo + kit_obj_image_info (entry, image base, interp, soname);
  segment iterator (kit_obj_segiter_* over KitObjSegInfo with
  KIT_SEG_R/W/X); dependency iterator (kit_obj_depiter_*, carrying imported
  names for PE/Mach-O); rpath iterator (kit_obj_rpathiter_*); dynamic
  symbols and relocations reusing the KitObjSymIter/KitObjRelocIter shapes
  via kit_obj_dynsymiter_new/kit_obj_dynreliter_new.
- Internal model (src/obj/obj.h, src/obj/obj.c): an ObjImage hung off
  ObjBuilder, NULL on pure relocatables. Readers call
  obj_image_ensure(ob, OBJ_KIND_*) then setters/appenders for entry, base,
  interp, soname, segments, deps, rpaths, dynsyms, and dynrelocs; obj_free
  releases it.
- Glue (src/api/object_file.c): maps ObjImage to the public iterators;
  relocatable inputs report KIT_OBJ_KIND_REL with empty image iterators and
  the section/symbol path unchanged.
- ELF reader (src/obj/elf/read.c): read_elf accepts ET_EXEC/ET_DYN, sharing
  one path -- the old e_type != ET_REL guard is now a kind switch.
  read_elf_image walks program headers for segments + PT_INTERP + image
  base, and parses .dynamic for DT_NEEDED / DT_SONAME / DT_RPATH /
  DT_RUNPATH plus the dynsym/dynstr/reloc pointers. A zeroed section-header
  table is accepted for images (empty section view; the segment view
  carries the load picture).
- Mach-O reader (src/obj/macho/read.c): accepts MH_EXECUTE/MH_DYLIB;
  read_macho_image re-walks load commands for segments (LC_SEGMENT_64,
  __TEXT base, VM_PROT -> OBJ_SEG perms), interp (LC_LOAD_DYLINKER), soname
  (LC_ID_DYLIB), deps (LC_LOAD_DYLIB and weak/reexport variants), rpaths
  (LC_RPATH), entry (LC_MAIN/LC_UNIXTHREAD), dynamic symbols from the
  external LC_SYMTAB nlist entries, and LC_DYLD_CHAINED_FIXUPS
  binds/rebases (DYLD_CHAINED_PTR_64). Classic LC_DYLD_INFO and the exports
  trie are intentionally not read; non-64-bit chained pointer formats are
  skipped leniently.
- objdump (driver/cmd/objdump.c): grew -p/--private-headers
  (program/dynamic headers, format-neutral via the image API),
  -T/--dynamic-syms, and -R/--dynamic-reloc; -f reports the image type flags
  and real entry point; -h/-t/-d work on executables. -d falls back to
  disassembling X-perm PT_LOAD segments by vaddr when the section walk is
  empty (stripped images), with no ELF special-casing.
- Inherited tools: nm, size, and addr2line open images via kit_obj_open. nm
  grew -D (.dynsym); KitObjSecInfo.addr carries the load vaddr (0 for
  relocatables) so SysV size -A reports real layout. (strings is
  intentionally format-agnostic -- it scans raw bytes and does not call
  kit_obj_open.)
- Debug-info retention in the linker: .debug_* sections are carried through
  to linked images as file-only sections with relocations resolved in
  place, so addr2line / dbg resolve file:line on kit-linked executables
  (single- and multi-input, ELF and Mach-O). See ../LINK.md and
  ../DWARF.md.
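The .dynamic part of read_elf_image amounts to one pass over the dynamic array up to DT_NULL, resolving string-valued tags against the dynamic string table. A minimal sketch, assuming a 64-bit image whose .dynamic and .dynstr are already located and in host byte order. The DT_* values are the standard ELF ones; Elf64Dyn (a local stand-in with Elf64_Dyn's layout), DynInfo, and parse_dynamic are hypothetical, and the fixed-size arrays replace the real ObjImage appenders.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard ELF dynamic tags (System V gABI). */
enum { DT_NULL = 0, DT_NEEDED = 1, DT_SONAME = 14, DT_RPATH = 15, DT_RUNPATH = 29 };

typedef struct { int64_t d_tag; uint64_t d_val; } Elf64Dyn; /* Elf64_Dyn layout */

#define MAX_NEEDED 16
#define MAX_RPATHS 8

typedef struct {
    const char *soname;              /* DT_SONAME, NULL if absent */
    const char *needed[MAX_NEEDED];  /* DT_NEEDED, in table order */
    size_t nneeded;
    const char *rpaths[MAX_RPATHS];  /* DT_RPATH and DT_RUNPATH, in table order */
    size_t nrpaths;
} DynInfo;

/* Bounds-checked string lookup: NULL if off is out of range or unterminated. */
static const char *dynstr_at(const char *dynstr, size_t size, uint64_t off) {
    if (off >= size || !memchr(dynstr + off, '\0', size - off))
        return NULL;
    return dynstr + off;
}

static void parse_dynamic(const Elf64Dyn *dyn, size_t count,
                          const char *dynstr, size_t dynstr_size, DynInfo *out) {
    memset(out, 0, sizeof *out);
    for (size_t i = 0; i < count && dyn[i].d_tag != DT_NULL; i++) {
        const char *s;
        switch (dyn[i].d_tag) {
        case DT_NEEDED:
            s = dynstr_at(dynstr, dynstr_size, dyn[i].d_val);
            if (s && out->nneeded < MAX_NEEDED)
                out->needed[out->nneeded++] = s;
            break;
        case DT_SONAME:
            s = dynstr_at(dynstr, dynstr_size, dyn[i].d_val);
            if (s)
                out->soname = s;
            break;
        case DT_RPATH:
        case DT_RUNPATH:
            s = dynstr_at(dynstr, dynstr_size, dyn[i].d_val);
            if (s && out->nrpaths < MAX_RPATHS)
                out->rpaths[out->nrpaths++] = s;
            break;
        default:
            break; /* pointer tags (DT_SYMTAB, DT_STRTAB, DT_RELA, ...) */
        }
    }
}
```

RPATH/RUNPATH values are colon-separated search lists; this sketch keeps each raw string and leaves splitting to the consumer.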
Remaining work
COFF/PE image reader (primary gap)
PE is the one format whose linked images do not yet open through
kit_obj_open. The COFF backend's read does not populate ObjImage, and
read_coff_dso is still wired only into the linker. As a result objdump
keeps a hand-rolled pe_parse_image raw-byte walker
(driver/cmd/objdump.c:392) behind a KIT_BIN_PE special-case
(driver/cmd/objdump.c:1831) that serves -f / -h / -p and soft-errors
-t / -d / -r / -s. The plan:
- Give the COFF backend a real image reader: DOS / NT headers, optional
  header (entry point + image base + subsystem), data directories, the
  section table, the import and export directories, and the base-relocation
  table. Reuse or fold in read_coff_dso's machinery so EXEC and DLL share
  one path the way ELF EXEC/DYN do.
- Populate ObjImage: segments from sections + image base, deps from the
  import directory (each DLL's imported names go into the per-dep imports
  list), exports from the export directory into the dynamic symbol table,
  and base relocations into the dynamic relocation view.
- Extend objdump -p to render the PE optional header + data directories,
  -T for exports, and -R for base relocations, all through the neutral
  image API.
- Delete pe_parse_image and collapse the KIT_BIN_PE branch in
  driver_objdump into the normal dump_obj path once PE images open via
  kit_obj_open.
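The first step of that reader, locating the optional header and reading the entry point, image base, and subsystem, might look like the sketch below. It assumes the whole file is already in memory; PeImageInfo and pe_read_headers are hypothetical names, while the offsets follow the PE/COFF specification: e_lfanew at 0x3C, a 20-byte COFF file header after the "PE\0\0" signature, and optional-header magic 0x10B (PE32, ImageBase at offset 28) or 0x20B (PE32+, ImageBase at offset 24).

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t entry_rva;       /* AddressOfEntryPoint, relative to image base */
    uint64_t image_base;      /* preferred load address */
    uint16_t subsystem;       /* e.g. 2 = Windows GUI, 3 = Windows console */
    uint16_t characteristics; /* COFF file header flags; 0x2000 marks a DLL */
    int is_pe32plus;
} PeImageInfo;

/* Little-endian readers; PE headers are always little-endian. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 |
           (uint32_t)p[3] << 24;
}
static uint64_t rd64(const uint8_t *p) { return rd32(p) | (uint64_t)rd32(p + 4) << 32; }

/* Returns 0 on success, -1 if buf does not hold a well-formed PE header. */
static int pe_read_headers(const uint8_t *buf, size_t len, PeImageInfo *out) {
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z')
        return -1;
    uint32_t pe_off = rd32(buf + 0x3C);        /* e_lfanew */
    if (pe_off > len || len - pe_off < 24)     /* signature + COFF header */
        return -1;
    const uint8_t *nt = buf + pe_off;
    if (nt[0] != 'P' || nt[1] != 'E' || nt[2] || nt[3])
        return -1;
    const uint8_t *coff = nt + 4;
    uint16_t opt_size = rd16(coff + 16);       /* SizeOfOptionalHeader */
    out->characteristics = rd16(coff + 18);
    const uint8_t *opt = coff + 20;
    /* Need at least through Subsystem (offset 68, 2 bytes). */
    if (opt_size < 70 || (size_t)(opt - buf) + opt_size > len)
        return -1;
    uint16_t magic = rd16(opt);
    if (magic == 0x20B) {
        out->is_pe32plus = 1;
        out->image_base = rd64(opt + 24);
    } else if (magic == 0x10B) {
        out->is_pe32plus = 0;
        out->image_base = rd32(opt + 28);
    } else {
        return -1;
    }
    out->entry_rva = rd32(opt + 16);
    out->subsystem = rd16(opt + 68);
    return 0;
}
```

Data directories, the section table, and the import/export/base-relocation walks would build on the same bounds-checked reads; they are omitted here.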
Escape hatch for format-specific raw fields
Some inspection needs format-specific values that do not fit the neutral
model: raw DT_* tag values, raw Mach-O load commands, PE data-directory
entries. Surface these through a per-format escape hatch in the spirit of the
existing kit_obj_section_format_flags, keeping the neutral API clean
rather than widening it per format.
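One possible shape for that escape hatch, sketched with hypothetical names throughout (RawField, ObjRawView, and obj_raw_field are not existing kit API). Each reader fills only its own raw table, and a query for another format's field reports "not present" instead of requiring a format switch in the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw-field selectors, one family per format. */
typedef enum {
    RAW_ELF_DYN_TAG,    /* index = position in .dynamic */
    RAW_MACHO_LC_CMD,   /* index = position in the load-command list */
    RAW_PE_DATADIR_RVA, /* index = IMAGE_DIRECTORY_ENTRY_* */
} RawField;

/* Filled only by the matching reader; NULL/0 for every other format. */
typedef struct {
    const int64_t *elf_dyn_tags;    size_t n_elf_dyn_tags;
    const uint32_t *macho_lc_cmds;  size_t n_macho_lc_cmds;
    const uint32_t *pe_datadir_rva; size_t n_pe_datadir;
} ObjRawView;

/* Returns false if the field is absent for this image (wrong format or
 * index out of range); *out is left untouched in that case. */
static bool obj_raw_field(const ObjRawView *v, RawField f, size_t index, uint64_t *out) {
    switch (f) {
    case RAW_ELF_DYN_TAG:
        if (index >= v->n_elf_dyn_tags) return false;
        *out = (uint64_t)v->elf_dyn_tags[index];
        return true;
    case RAW_MACHO_LC_CMD:
        if (index >= v->n_macho_lc_cmds) return false;
        *out = v->macho_lc_cmds[index];
        return true;
    case RAW_PE_DATADIR_RVA:
        if (index >= v->n_pe_datadir) return false;
        *out = v->pe_datadir_rva[index];
        return true;
    }
    return false;
}
```

The neutral KitObj* structs gain no fields here; the raw tables sit beside them. The PE index follows the standard IMAGE_DIRECTORY_ENTRY_* numbering, in which 1 is the import directory.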
Mach-O classic-format breadth (deferred)
The Mach-O reader deliberately supports only the modern fixup path
(LC_DYLD_CHAINED_FIXUPS, with the symbol table as the authoritative
dynamic-symbol source). Classic LC_DYLD_INFO opcode/trie reading and the
standalone exports trie (LC_DYLD_EXPORTS_TRIE) remain out of scope; reading
older dylibs is a separate, lower-priority effort. Revisit only if a real
input demands it.
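By contrast, the chained-fixup path the reader does support is compact to decode. For DYLD_CHAINED_PTR_64, each 8-byte slot holds either a rebase (36-bit target plus a high byte) or a bind (24-bit import ordinal plus an 8-bit addend). Bit 63 selects which, and a 12-bit next field gives the distance to the following fixup in 4-byte strides. A minimal decoding sketch, assuming a little-endian page buffer and the bit layout from Apple's <mach-o/fixup-chains.h>; ChainedFixup, decode_chained_ptr_64, and walk_chain_64 are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool is_bind;
    uint64_t target;  /* rebase: vmaddr with high8 restored to the top byte */
    uint32_t ordinal; /* bind: index into the chained-fixups imports table */
    uint8_t addend;   /* bind: small inline addend */
    uint32_t next;    /* strides to the next fixup; 0 ends the chain */
} ChainedFixup;

static ChainedFixup decode_chained_ptr_64(uint64_t raw) {
    ChainedFixup f = {0};
    f.is_bind = (raw >> 63) & 1;
    f.next = (uint32_t)((raw >> 51) & 0xFFF);
    if (f.is_bind) {
        f.ordinal = (uint32_t)(raw & 0xFFFFFF);      /* bits 0-23 */
        f.addend = (uint8_t)((raw >> 24) & 0xFF);    /* bits 24-31 */
    } else {
        uint64_t target = raw & 0xFFFFFFFFFULL;      /* bits 0-35 */
        uint64_t high8 = (raw >> 36) & 0xFF;         /* bits 36-43 */
        f.target = (high8 << 56) | target;
    }
    return f;
}

/* Decode one chain starting at byte offset `off` within a page, storing up
 * to `cap` fixups. Stride is 4 bytes for DYLD_CHAINED_PTR_64. */
static size_t walk_chain_64(const uint8_t *page, size_t page_size, size_t off,
                            ChainedFixup *out, size_t cap) {
    size_t n = 0;
    while (n < cap && off + 8 <= page_size) {
        uint64_t raw = 0;
        for (int i = 7; i >= 0; i--)
            raw = raw << 8 | page[off + i]; /* little-endian load */
        ChainedFixup f = decode_chained_ptr_64(raw);
        out[n++] = f;
        if (f.next == 0)
            break;
        off += (size_t)f.next * 4;
    }
    return n;
}
```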
Out of scope
- Core files (ET_CORE, Mach-O MH_CORE): KIT_OBJ_KIND_CORE stays defined but
  unimplemented; detect and reject cleanly. Note / register-state parsing
  is a separate feature.
- Synthesizing pseudo-sections from segments on stripped ELF. Omitting them
  matches GNU objdump / llvm-objdump, which are section-header-driven and
  report "no sections" when the table is absent. The segment view (and -d
  over X-perm segments) covers the disassembly case.
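Detecting and rejecting cores cleanly comes down to a kind switch over the header's type field, mirroring the kind switch that replaced the old ET_REL-only guard. A minimal sketch for ELF and little-endian 64-bit Mach-O: the ET_* and MH_* values are the standard ones, while KindResult and classify are hypothetical stand-ins for KIT_OBJ_KIND_* and the real detection in kit_detect_target.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { KIND_REL, KIND_EXEC, KIND_DYN, KIND_CORE_REJECTED, KIND_UNKNOWN } KindResult;

enum { ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, ET_CORE = 4 };         /* ELF e_type */
enum { MH_OBJECT = 1, MH_EXECUTE = 2, MH_CORE = 4, MH_DYLIB = 6 }; /* Mach-O filetype */

static KindResult classify(const uint8_t *b, size_t len) {
    if (len >= 18 && b[0] == 0x7F && b[1] == 'E' && b[2] == 'L' && b[3] == 'F') {
        /* e_ident[EI_DATA] (byte 5): 1 = little-endian, 2 = big-endian.
         * e_type sits at offset 16 for both ELF32 and ELF64. */
        uint16_t et = b[5] == 2 ? (uint16_t)(b[16] << 8 | b[17])
                                : (uint16_t)(b[16] | b[17] << 8);
        switch (et) {
        case ET_REL:  return KIND_REL;
        case ET_EXEC: return KIND_EXEC;
        case ET_DYN:  return KIND_DYN;
        case ET_CORE: return KIND_CORE_REJECTED;
        default:      return KIND_UNKNOWN;
        }
    }
    /* MH_MAGIC_64 (0xfeedfacf) stored little-endian; filetype at offset 12. */
    if (len >= 16 && b[0] == 0xCF && b[1] == 0xFA && b[2] == 0xED && b[3] == 0xFE) {
        uint32_t ft = (uint32_t)b[12] | (uint32_t)b[13] << 8 |
                      (uint32_t)b[14] << 16 | (uint32_t)b[15] << 24;
        switch (ft) {
        case MH_OBJECT:  return KIND_REL;
        case MH_EXECUTE: return KIND_EXEC;
        case MH_DYLIB:   return KIND_DYN;
        case MH_CORE:    return KIND_CORE_REJECTED;
        default:         return KIND_UNKNOWN;
        }
    }
    return KIND_UNKNOWN;
}
```

ET_DYN also covers position-independent executables, so a DYN kind alone does not mean the file is a shared library.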
Design notes carried forward
- One open call, two views. kit_obj_open detects kind (reusing
  kit_detect_target + e_type / filetype / PE characteristics) and routes to
  the backend, which fills ObjBuilder (sections/symbols where present) and,
  for EXEC/DYN, ObjImage. Tools that already use kit_obj_open inherit image
  support for free.
- Segment is the load-layout unit. Each segment is { vaddr, vsize,
  file_off, file_size, perms, align, name }, populated from PT_LOAD /
  LC_SEGMENT_64 / PE sections. Sections continue to map through the
  existing ObjBuilder view where the format retains them; the segment view
  carries the load picture when section headers are absent.
- Dynamic syms/relocs reuse object shapes. The dynamic symbol and
  relocation iterators reuse KitObjSymInfo/KitObjReloc rather than
  introducing parallel types, so consumers written for objects work on
  images.
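Filling that one segment record from three formats is mostly a translation of permission encodings, and ELF and Mach-O even place the execute bit differently (PF_X is 0x1, VM_PROT_EXECUTE is 0x4). A minimal sketch, assuming hypothetical SEG_R/W/X bit values in place of the real KIT_SEG_R/W/X from include/kit/object.h, and assuming the Mach-O side reads the segment's initprot. The PF_*, VM_PROT_*, and IMAGE_SCN_MEM_* values are the standard ones.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical neutral bits; the real KIT_SEG_R/W/X values may differ. */
enum { SEG_R = 1u << 0, SEG_W = 1u << 1, SEG_X = 1u << 2 };

enum { PF_X = 0x1, PF_W = 0x2, PF_R = 0x4 };                            /* ELF p_flags */
enum { VM_PROT_READ = 0x1, VM_PROT_WRITE = 0x2, VM_PROT_EXECUTE = 0x4 }; /* Mach-O */
/* PE section characteristics; 0x80000000 exceeds INT_MAX, so use macros. */
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u
#define IMAGE_SCN_MEM_READ    0x40000000u
#define IMAGE_SCN_MEM_WRITE   0x80000000u

static unsigned perms_from_elf(uint32_t p_flags) {
    return ((p_flags & PF_R) ? SEG_R : 0u) | ((p_flags & PF_W) ? SEG_W : 0u) |
           ((p_flags & PF_X) ? SEG_X : 0u);
}

static unsigned perms_from_macho(uint32_t initprot) {
    return ((initprot & VM_PROT_READ) ? SEG_R : 0u) |
           ((initprot & VM_PROT_WRITE) ? SEG_W : 0u) |
           ((initprot & VM_PROT_EXECUTE) ? SEG_X : 0u);
}

static unsigned perms_from_pe(uint32_t characteristics) {
    return ((characteristics & IMAGE_SCN_MEM_READ) ? SEG_R : 0u) |
           ((characteristics & IMAGE_SCN_MEM_WRITE) ? SEG_W : 0u) |
           ((characteristics & IMAGE_SCN_MEM_EXECUTE) ? SEG_X : 0u);
}

/* Render neutral perms as a compact flag string such as "r-x". */
static void perms_str(unsigned perms, char out[4]) {
    out[0] = (perms & SEG_R) ? 'r' : '-';
    out[1] = (perms & SEG_W) ? 'w' : '-';
    out[2] = (perms & SEG_X) ? 'x' : '-';
    out[3] = '\0';
}
```

Once every reader funnels through helpers like these, a tool that prints or filters segments (for example, choosing X-perm segments for the -d fallback) never needs to know which format produced them.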
Test strategy
The compiler links its own ELF / Mach-O / PE images, so tests round-trip:
link a small program, open it via kit_obj_open, and assert
kind/entry/segments/deps/dynsyms against what the linker emitted, cross-checked
against host readelf / objdump in smoke tests where available. ELF and
Mach-O goldens live under test/objdump/; PE corpora land under
test/{coff,pe}/ with the reader. The dynamic NEEDED/SONAME/dynsym paths will
be fully exercised once -shared / dynamic linking emits populated tables; the
empty-table rendering is already covered. See ../TESTING.md.