kit

git clone https://git.ryansepassi.com/git/kit.git

commit 34beffa90bc8871bb7749008fbe1a89d3ee88cf9
parent b89d452878b5db44a453092e6589a72c0bc5876f
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Mon, 11 May 2026 10:20:56 -0700

STAGE2.md plan

Diffstat:
A doc/STAGE2.md | 154 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 154 insertions(+), 0 deletions(-)

diff --git a/doc/STAGE2.md b/doc/STAGE2.md
@@ -0,0 +1,154 @@
+# Stage-2 self-host
+
+What's missing to make `make self` produce a stage-2 `cfree` built by stage-1
+cfree itself. Companion to `DESIGN.md`. Snapshot taken by compiling every
+`src/**/*.c` and `driver/*.c` individually with stage-1 cfree-cc.
+
+Result at snapshot: **60 of 104 files compile clean. 44 fail.** The failures
+collapse into ~10 root causes, listed below in roughly the order a fix would
+unblock the most files.
+
+## Build configuration
+
+Stage 2 currently invokes:
+
+```
+cfree-stage1 cc -isystem rt/include -isystem rt/include/libc -Iinclude -Isrc
+```
+
+`-isystem rt/include/libc` is required so the hosted libc headers are
+visible (top-level `rt/include/` only ships the freestanding set). The SDK
+include path is deliberately *not* on the search path — stage 2 should
+resolve everything through `rt/include` + `rt/include/libc`.
+
+`DEPFLAGS` is empty for stage 2 until B0 lands.
+
+## Checklist
+
+### Preprocessor / lexer
+
+- [ ] **A1.** `#include "x.h"` doesn't search the source file's directory.
+  C99 §6.10.2 requires quoted includes to look in the including file's
+  directory first, then fall back to the bracketed-include search list. Today
+  cfree's pp jumps straight to the search list. Repro:
+  `echo '#include "foo.h"' > /tmp/dir/use.c && cfree cc -c /tmp/dir/use.c`
+  fails to find `/tmp/dir/foo.h`. _Blocks all 13 `driver/*.c` files._
+- [ ] **A2.** Expand `rt/include/libc/` to cover the POSIX/Mach surface the
+  driver uses. Missing today: `sys/stat.h`, `sys/mman.h`, `sys/syscall.h`,
+  `fcntl.h`, `unistd.h`, `signal.h`, `pthread.h`, `dlfcn.h`, `mach/mach.h`,
+  `mach/mach_vm.h`, `mach/vm_map.h`. Scope question (vs. dropping the
+  dependency in the driver source); not a compiler bug. _Blocks most of
+  `driver/`._
+
+### Driver — dep emission
+
+- [ ] **B0.** Implement `cfree_dep_iter_new` / `_next` (today both stubs in
+  `src/api/stubs.c:106-115`). PP needs to record header-include edges so the
+  iterator can drain them. Until then, stage 2 strips `-MMD -MP` via
+  `DEPFLAGS=''`. Also: change the failure path in `driver/cc.c:1264` so a
+  NULL iter doesn't surface as `"out of memory"` — that error message hid
+  this for the whole first investigation. _Quality-of-life; stage 2 builds
+  fine without dep files for now._
+
+### Parser / sema
+
+- [ ] **B1.** Recognize `__alignof__` as an alias for `_Alignof`. `_Alignof`
+  already works. _8 files: `src/debug/{c_debug,debug,debug_abbrev,debug_emit}.c`,
+  `src/link/{link_dyn,link_elf,link_layout,link_macho}.c`._
+- [ ] **B2.** Implement `__builtin_ctz` (count trailing zeros, unsigned int).
+  Used by 3 backends. Either lower to the ISA's CTZ/CLZ pair or expand to a
+  portable C fallback at sema time. _`src/arch/{aarch64,rv64,x64}.c`._
+- [ ] **B3.** Treat enum constants as constant expressions in array-bound
+  positions at file scope. Designated-init contexts already accept enum
+  constants (verified), so the gap is specifically the array-bound path.
+  Repro:
+  ```c
+  typedef enum { A, B, N } E;
+  static const char* names[N] = {"a", "b"};
+  ```
+  → `expected constant expression`. _`src/parse/parse.c:117`._
+- [ ] **B4.** Treat the address of a string literal as a constant expression
+  in static / file-scope initializers for pointer slots. Repro:
+  ```c
+  typedef struct { const char* s; } S;
+  const S g = { .s = "hi" };
+  ```
+  → `expected constant expression`. Same issue when initializing
+  `const char* arr[N]` with string literals. _`src/arch/aa64_disasm.c`,
+  `src/link/link_arch_{aa64,rv64,x64}.c`._
+- [ ] **B5.** Treat the address of a function (including `static` functions)
+  as a constant expression in static initializers — functions always have
+  static storage duration. The current error message "static initializer
+  requires object with static storage" is wrong on its face. Repro:
+  ```c
+  static int helper(void) { return 0; }
+  typedef int (*FN)(void);
+  const FN g = helper;
+  ```
+  _4 `src/abi/abi_*.c` vtable files._
+- [ ] **B6.** Aggregate-initializer brace-tracking misfires for *array-of-struct*
+  where each struct has a trailing fixed-size array field initialized with a
+  brace-list. Single-instance form works; the arrayed form trips. Repro:
+  ```c
+  typedef struct { unsigned a; unsigned char p[2]; } S;
+  static const S t[] = { {1u, {0,0}}, {2u, {0,0}} }; /* "too many initializers for array" */
+  ```
+  _`src/arch/aa64_{isa,asm,regs}.c`._
+
+### Codegen — aarch64 backend
+
+- [ ] **C1.** Argument lowering: handle `OPK_INDIRECT` source operands in
+  both the INT and FP paths at `src/arch/aarch64.c:2073-2129`. Today only
+  `OPK_IMM` / `OPK_REG` / `OPK_LOCAL` are wired; an indirect source (e.g.,
+  passing `ptr->field` by value, or `arr[i].field` where the addressing was
+  lowered to a base+offset load) panics with
+  `aarch64 call: arg storage kind 4 unsupported`. The fix mirrors the
+  existing `OPK_LOCAL` case but loads from `[base + part->src_offset]`
+  instead of `[fp - slot_off + src_offset]`. _6 files: `src/arch/mc.c`,
+  `src/cg/cg.c`, `src/decl/{decl,decl_attrs}.c`, `src/opt/opt.c`,
+  `src/pp/pp.c`._
+- [ ] **C2.** Same `OPK_INDIRECT` gap in the indirect-return path
+  (separate panic string: `aarch64 ret indirect: storage kind 4
+  unsupported`). _`src/api/pipeline.c`, `src/parse/parse_asm.c`._
+
+### Codegen — x64 backend
+
+- [ ] **C3.** Mirror C1/C2 on x64. The same panics exist at
+  `src/arch/x64.c:1761,1798,1817,1827,1904`. Doesn't block aarch64
+  self-host but blocks x64 self-host once that's attempted.
+
+### Linker
+
+- [ ] **D1.** Stage 2 currently relies on `$(CC) -o $@ ... $(LIB_AR)` to do
+  the final link — for stage 2 that's `cfree-stage1 cc`, which in turn
+  shells out to the host linker. Once stage 2 builds, the `$(BIN)` recipe
+  should be reviewed to confirm the produced binary is genuinely a
+  stage-1-emitted object linked through cfree's own ld path, not falling
+  back to clang/ld silently.
+
+### Hosted libc shim
+
+- [ ] **E1.** Today `libcfree_hosted_macos.a` is built but not threaded into
+  the `$(BIN)` link. For a "self-host on rt libc" milestone (separate from
+  this checklist's primary goal of "stage 2 builds at all"), the `$(BIN)`
+  rule on macOS should consume the hosted shim and route libc calls through
+  it instead of clang's default `-lSystem` glue.
+
+## How to re-run the audit
+
+After landing any fix, regenerate the failure list with:
+
+```sh
+make && cp build/cfree build/cfree-stage1
+BIN=$(pwd)/build/cfree-stage1
+FLAGS="-isystem rt/include -isystem rt/include/libc -Iinclude -Isrc"
+DFLAGS="-isystem rt/include -isystem rt/include/libc -Iinclude"
+for f in $(find src -name '*.c' | sort); do
+  $BIN cc $FLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
+done
+for f in $(find driver -name '*.c' | sort); do
+  $BIN cc $DFLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
+done
+```
+
+Then `make self` to confirm a clean stage-2 build end-to-end.
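
Editorial note on item B2 (not part of the commit): the checklist mentions expanding `__builtin_ctz` to a portable C fallback. A minimal sketch of what such a fallback might look like follows. The name `portable_ctz` is hypothetical and is not a cfree identifier. The assert on zero input is an assumption made here, chosen because GCC and Clang document `__builtin_ctz(0)` as undefined.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the cfree source tree.
 * Returns the number of trailing zero bits in x, matching the
 * int return type of __builtin_ctz. The input must be nonzero;
 * a zero input is rejected by the assert below. */
static int portable_ctz(unsigned x) {
  int n = 0;
  assert(x != 0);
  /* Shift right until the lowest bit is set, counting the shifts. */
  while ((x & 1u) == 0) {
    x >>= 1;
    n++;
  }
  return n;
}
```

The loop takes at most one iteration per bit of `unsigned`, which may be acceptable for a bootstrap compiler. A backend that lowers directly to an ISA instruction would avoid the loop.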