commit 34beffa90bc8871bb7749008fbe1a89d3ee88cf9
parent b89d452878b5db44a453092e6589a72c0bc5876f
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 11 May 2026 10:20:56 -0700
STAGE2.md plan
Diffstat:
| A | doc/STAGE2.md | | | 154 | +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ |
1 file changed, 154 insertions(+), 0 deletions(-)
diff --git a/doc/STAGE2.md b/doc/STAGE2.md
@@ -0,0 +1,154 @@
+# Stage-2 self-host
+
+What's missing to make `make self` produce a stage-2 `cfree` built by stage-1
+cfree itself. Companion to `DESIGN.md`. Snapshot taken by compiling every
+`src/**/*.c` and `driver/*.c` individually with stage-1 cfree-cc.
+
+Result at snapshot: **60 of 104 files compile clean. 44 fail.** The failures
+collapse into ~10 root causes, listed below in roughly the order a fix would
+unblock the most files.
+
+## Build configuration
+
+Stage 2 currently invokes:
+
+```
+cfree-stage1 cc -isystem rt/include -isystem rt/include/libc -Iinclude -Isrc
+```
+
+`-isystem rt/include/libc` is required so the hosted libc headers are
+visible (top-level `rt/include/` only ships the freestanding set). The SDK
+include path is deliberately *not* on the search path — stage 2 should
+resolve everything through `rt/include` + `rt/include/libc`.
+
+`DEPFLAGS` is empty for stage 2 until B0 lands.
+
+## Checklist
+
+### Preprocessor / lexer
+
+- [ ] **A1.** `#include "x.h"` doesn't search the source file's directory.
+ C99 §6.10.2 leaves the quoted-include search implementation-defined, but
+ GCC and Clang look in the including file's directory first, then fall
+ back to the bracketed-include search list. Today cfree's pp jumps straight
+ to the search list. Repro:
+ `mkdir -p /tmp/dir && touch /tmp/dir/foo.h && echo '#include "foo.h"' > /tmp/dir/use.c && cfree cc -c /tmp/dir/use.c`
+ fails to find `/tmp/dir/foo.h`. _Blocks all 13 `driver/*.c` files._
+- [ ] **A2.** Expand `rt/include/libc/` to cover the POSIX/Mach surface the
+ driver uses. Missing today: `sys/stat.h`, `sys/mman.h`, `sys/syscall.h`,
+ `fcntl.h`, `unistd.h`, `signal.h`, `pthread.h`, `dlfcn.h`, `mach/mach.h`,
+ `mach/mach_vm.h`, `mach/vm_map.h`. This is a scope question (the
+ alternative is dropping the dependency in the driver source), not a
+ compiler bug. _Blocks most of `driver/`._
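The lookup order A1 asks for can be sketched as below. This is a minimal illustration, not cfree's pp code: `resolve_quoted_include` and `file_readable` are hypothetical names, the sketch assumes `/`-separated paths, and it probes for files with `fopen` rather than whatever file abstraction the preprocessor actually uses.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `path` can be opened for reading, 0 otherwise. */
int file_readable(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    fclose(f);
    return 1;
}

/*
 * Resolve `#include "name"` seen inside the file `includer`.
 * Step 1: try the directory that contains `includer`.
 * Step 2: fall back to each entry of `search_dirs`, in order.
 * On success the resolved path is written to `out` and 1 is returned;
 * on failure `out` is set to the empty string and 0 is returned.
 */
int resolve_quoted_include(const char *includer, const char *name,
                           const char *const *search_dirs, size_t ndirs,
                           char *out, size_t outsz)
{
    assert(includer && name && out && outsz > 0);

    /* Directory prefix of the including file, trailing '/' included. */
    const char *slash = strrchr(includer, '/');
    int dirlen = slash ? (int)(slash - includer) + 1 : 0;

    int n = snprintf(out, outsz, "%.*s%s", dirlen, includer, name);
    if (n >= 0 && (size_t)n < outsz && file_readable(out))
        return 1;

    for (size_t i = 0; i < ndirs; i++) {
        n = snprintf(out, outsz, "%s/%s", search_dirs[i], name);
        if (n >= 0 && (size_t)n < outsz && file_readable(out))
            return 1;
    }

    out[0] = '\0';
    return 0;
}
```

With this order, `/tmp/dir/use.c` including `"foo.h"` would resolve to `/tmp/dir/foo.h` before any `-I` or `-isystem` directory is consulted.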
+
+### Driver — dep emission
+
+- [ ] **B0.** Implement `cfree_dep_iter_new` / `_next` (today both are stubs in
+ `src/api/stubs.c:106-115`). PP needs to record header-include edges so the
+ iterator can drain them. Until then, stage 2 strips `-MMD -MP` via
+ `DEPFLAGS=''`. Also: change the failure path in `driver/cc.c:1264` so a
+ NULL iter doesn't surface as `"out of memory"` — that error message hid
+ this for the whole first investigation. _Quality-of-life; stage 2 builds
+ fine without dep files for now._
+
+### Parser / sema
+
+- [ ] **B1.** Recognize `__alignof__` as an alias for `_Alignof`. `_Alignof`
+ already works. _8 files: `src/debug/{c_debug,debug,debug_abbrev,debug_emit}.c`,
+ `src/link/{link_dyn,link_elf,link_layout,link_macho}.c`._
+- [ ] **B2.** Implement `__builtin_ctz` (count trailing zeros, unsigned int;
+ the result is undefined for a zero argument). Used by 3 backends. Either
+ lower to the ISA's native sequence (e.g. `rbit` + `clz` on aarch64,
+ `bsf`/`tzcnt` on x64) or expand to a portable C fallback at sema time.
+ _`src/arch/{aarch64,rv64,x64}.c`._
+- [ ] **B3.** Treat enum constants as constant expressions in array-bound
+ positions at file scope. Designated-init contexts already accept enum
+ constants (verified), so the gap is specifically the array-bound path.
+ Repro:
+ ```c
+ typedef enum { A, B, N } E;
+ static const char* names[N] = {"a", "b"};
+ ```
+ → `expected constant expression`. _`src/parse/parse.c:117`._
+- [ ] **B4.** Treat the address of a string literal as a constant expression
+ in static / file-scope initializers for pointer slots. Repro:
+ ```c
+ typedef struct { const char* s; } S;
+ const S g = { .s = "hi" };
+ ```
+ → `expected constant expression`. Same issue when initializing
+ `const char* arr[N]` with string literals. _`src/arch/aa64_disasm.c`,
+ `src/link/link_arch_{aa64,rv64,x64}.c`._
+- [ ] **B5.** Treat the address of a function (including `static` functions)
+ as a constant expression in static initializers — functions always have
+ static storage duration. The current error message "static initializer
+ requires object with static storage" is wrong on its face. Repro:
+ ```c
+ static int helper(void) { return 0; }
+ typedef int (*FN)(void);
+ const FN g = helper;
+ ```
+ _4 `src/abi/abi_*.c` vtable files._
+- [ ] **B6.** Aggregate-initializer brace-tracking misfires for *array-of-struct*
+ where each struct has a trailing fixed-size array field initialized with a
+ brace-list. Single-instance form works; the arrayed form trips. Repro:
+ ```c
+ typedef struct { unsigned a; unsigned char p[2]; } S;
+ static const S t[] = { {1u, {0,0}}, {2u, {0,0}} }; /* "too many initializers for array" */
+ ```
+ _`src/arch/aa64_{isa,asm,regs}.c`._
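For B2, the portable-fallback option could look roughly like the sketch below. `ctz32_fallback` is a hypothetical name, not an existing cfree helper, and the sketch assumes a 32-bit `unsigned int`. Since `__builtin_ctz` is undefined for zero, the sketch rejects that input with an `assert` rather than picking a result.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count trailing zero bits of a non-zero 32-bit value.
 * Binary search: at each step, if the low half of the remaining window
 * is all zeros, add its width to the count and shift it away.
 */
unsigned ctz32_fallback(uint32_t x)
{
    assert(x != 0); /* matches __builtin_ctz: zero input is not supported */

    unsigned n = 0;
    if ((x & 0x0000FFFFu) == 0) { n += 16; x >>= 16; }
    if ((x & 0x000000FFu) == 0) { n += 8;  x >>= 8;  }
    if ((x & 0x0000000Fu) == 0) { n += 4;  x >>= 4;  }
    if ((x & 0x00000003u) == 0) { n += 2;  x >>= 2;  }
    if ((x & 0x00000001u) == 0) { n += 1; }
    return n;
}
```

The branch count is fixed at five regardless of input, which keeps the expansion small if it is inlined at every call site. A native lowering would still be preferable where the target ISA offers one.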
+
+### Codegen — aarch64 backend
+
+- [ ] **C1.** Argument lowering: handle `OPK_INDIRECT` source operands in
+ both the INT and FP paths at `src/arch/aarch64.c:2073-2129`. Today only
+ `OPK_IMM` / `OPK_REG` / `OPK_LOCAL` are wired; an indirect source (e.g.,
+ passing `ptr->field` by value, or `arr[i].field` where the addressing was
+ lowered to a base+offset load) panics with
+ `aarch64 call: arg storage kind 4 unsupported`. The fix mirrors the
+ existing `OPK_LOCAL` case but loads from `[base + part->src_offset]`
+ instead of `[fp - slot_off + src_offset]`. _6 files: `src/arch/mc.c`,
+ `src/cg/cg.c`, `src/decl/{decl,decl_attrs}.c`, `src/opt/opt.c`,
+ `src/pp/pp.c`._
+- [ ] **C2.** Same `OPK_INDIRECT` gap in the indirect-return path
+ (separate panic string: `aarch64 ret indirect: storage kind 4
+ unsupported`). _`src/api/pipeline.c`, `src/parse/parse_asm.c`._
+
+### Codegen — x64 backend
+
+- [ ] **C3.** Mirror C1/C2 on x64. The same panics exist at
+ `src/arch/x64.c:1761,1798,1817,1827,1904`. Doesn't block aarch64
+ self-host but blocks x64 self-host once that's attempted.
+
+### Linker
+
+- [ ] **D1.** Stage 2 currently relies on `$(CC) -o $@ ... $(LIB_AR)` to do
+ the final link. For stage 2 that's `cfree-stage1 cc`, which in turn
+ shells out to the host linker. Once stage 2 builds, the `$(BIN)` recipe
+ should be reviewed to confirm the produced binary is genuinely built from
+ stage-1-emitted objects linked through cfree's own ld path, rather than
+ silently falling back to clang/ld.
+
+### Hosted libc shim
+
+- [ ] **E1.** Today `libcfree_hosted_macos.a` is built but not threaded into
+ the `$(BIN)` link. For a "self-host on rt libc" milestone (separate from
+ this checklist's primary goal of "stage 2 builds at all"), the `$(BIN)`
+ rule on macOS should consume the hosted shim and route libc calls through
+ it instead of clang's default `-lSystem` glue.
+
+## How to re-run the audit
+
+After landing any fix, regenerate the failure list with:
+
+```sh
+make && cp build/cfree build/cfree-stage1
+BIN=$(pwd)/build/cfree-stage1
+FLAGS="-isystem rt/include -isystem rt/include/libc -Iinclude -Isrc"
+DFLAGS="-isystem rt/include -isystem rt/include/libc -Iinclude"
+for f in $(find src -name '*.c' | sort); do
+ $BIN cc $FLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
+done
+for f in $(find driver -name '*.c' | sort); do
+ $BIN cc $DFLAGS -c "$f" -o /dev/null 2>&1 | head -1 | sed "s|^|$f: |"
+done
+```
+
+Then `make self` to confirm a clean stage-2 build end-to-end.