CC parallel work plan

Coordination doc for the next batch of tests/cc/ features. Defines four parallel work streams, the design seams between them, and the sequencing required where streams aren't fully independent.

Companion to CC-PUNCHLIST.md (the per-item checklist) and CC-INTERNALS.md (module layering and the cg-fixture-first workflow). ABI choices below cite P1.md §Arguments and return values.

Tests in scope

Currently failing on aarch64:

Test	Stream	Failure
cc/082-union-basic	D	exit 1 — union field offsets bumped like struct
cc/087-sizeof-noeval	D	exit 2 — sizeof(x++) actually increments x
cc/111-struct-ret-1word	A1	compile-fail — return-by-value struct unhandled
cc/112-struct-ret-2word	A1	compile-fail — no two-word return path
cc/113-struct-ret-3word	A2	compile-fail — no indirect-result path
cc/114-struct-ret-many-args	A3	exit 2 — silently truncates
cc/115-struct-ret-3word-many-args	A3	exit 2 — silently truncates
cc/116-struct-ret-vararg	A3	exit 2 — silently truncates
cc/117-compound-literal	B	compile-fail — `(T){…}` unparsed
cc/118-const-expr	C	compile-fail — const-expr surface incomplete

Stream layout

                         ┌─ A1: 111 + 112  (one-word + two-word direct)
Stream A (sequential) ───┤
struct-return ABI        └─ A2: 113        (indirect-result)
                              │
                              ▼
                         ┌─ A3a: 114 (2-word + stack-staged args) ┐
                         ├─ A3b: 115 (sret + 3-word + many args)  │ parallel
                         └─ A3c: 116 (sret + variadic save area)  ┘

Stream B  (independent, t=0):  117 compound literals
Stream C  (independent, t=0):  118 const-expr evaluator
Stream D  (independent, t=0):  082 union offsets + 087 sizeof no-emit

t=0: A1, B, C, D fan out together. A2 starts when A1 lands. A3a/b/c fan out when A2 lands. Stream A's critical path is A1 + A2 + max(A3a, A3b, A3c), so total wall time ≈ max(that path, B, C, D).

Stream A — Struct-return ABI

Implements the three result conventions from P1.md §Arguments:

Width	Convention	a0	a1	Args 0..3
≤ 8B	one-word direct	result word 0	(caller arg 1)	a0..a3
9–16B	two-word direct	result word 0	result word 1	a0..a3
> 16B	indirect-result	result-buffer ptr	(callee arg 0)	a1..a3 (shifted)

In the indirect convention, incoming stack-arg slot 0 corresponds to explicit arg word 3 (not 4). LDARG indexing in the variadic save area must respect this shift.
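The table and the stack-slot shift can be modelled in a few lines of C. This is a sketch for reasoning about indexing only, not code from cc.scm: the names `classify_return` and `place_arg` are hypothetical, and it assumes one arg word per argument and four argument registers a0..a3, as in the table above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the three result conventions. */
enum ret_conv { RET_ONE_WORD, RET_TWO_WORD, RET_INDIRECT };

struct arg_loc {
    int in_reg; /* 1: index is a register number (a0 = 0); 0: index is a stack slot */
    int index;
};

/* Pick the convention from the return type's size in bytes. */
static enum ret_conv classify_return(size_t size) {
    if (size <= 8)  return RET_ONE_WORD;
    if (size <= 16) return RET_TWO_WORD;
    return RET_INDIRECT;
}

/* Place explicit arg word i. Indirect-result reserves a0 for the
   result-buffer pointer, so only a1..a3 remain for explicit args. */
static struct arg_loc place_arg(enum ret_conv conv, int i) {
    int first_reg = (conv == RET_INDIRECT) ? 1 : 0;
    int nregs = 4 - first_reg;
    struct arg_loc loc;
    if (i < nregs) {
        loc.in_reg = 1;
        loc.index = first_reg + i;
    } else {
        loc.in_reg = 0;
        loc.index = i - nregs;
    }
    return loc;
}

void check_conventions(void) {
    assert(classify_return(8) == RET_ONE_WORD);
    assert(classify_return(9) == RET_TWO_WORD);
    assert(classify_return(16) == RET_TWO_WORD);
    assert(classify_return(17) == RET_INDIRECT);

    /* Direct: arg word 4 is the first stack-staged arg. */
    assert(!place_arg(RET_TWO_WORD, 4).in_reg);
    assert(place_arg(RET_TWO_WORD, 4).index == 0);

    /* Indirect: arg word 0 moves to a1, arg word 3 lands in stack slot 0. */
    assert(place_arg(RET_INDIRECT, 0).in_reg);
    assert(place_arg(RET_INDIRECT, 0).index == 1);
    assert(!place_arg(RET_INDIRECT, 3).in_reg);
    assert(place_arg(RET_INDIRECT, 3).index == 0);
}
```

The only difference between the direct and indirect cases is `first_reg`; everything downstream (register assignment, stack-slot numbering) follows from it.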

A1 — one-word and two-word direct

cg primitives: extend cg-fn-end to load the function's return slot into a0 (≤8B) or a0/a1 (9–16B) per the return ctype's size. Extend cg-call's receive side to allocate a fresh frame slot via cg-alloc-slot (sized to the return ctype) and store back from a0[/a1] before pushing a frame-lval.
Uniform path: ≤8B and 9–16B both go through a frame slot. No register-only fast path. Simpler cg, identical parser surface.
Receive-area lifetime: fresh slot per call site, allocated by cg-call. Required so chained make_triple(...).c (cc/113:32) and ret1(99).x (cc/111:26) don't alias across consecutive calls.
Parser: parse-return-stmt accepts a struct-typed expression and emits a per-byte copy from the source lval into the function's return slot. parse-postfix-rest accepts struct-typed call results as lvals so .field and & chain naturally.
CC-CONTRACTS update: §3.2 currently asserts a single 8-byte return slot. Replace with a pointer to P1.md §Arguments and a note that the cg lowers all three conventions.
Fixtures: tests/cc-cg/70-struct-ret-1word.scm, 71-struct-ret-2word.scm first; then unblock tests/cc/111, 112.
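For orientation, the kind of C that A1 has to compile looks roughly like the following. This is an illustrative sketch, not the contents of cc/111 or cc/112; `ret1` is named in the text above, while `struct one`, `struct two`, `ret2`, and `chained_sum` are made up here. Sizes assume an LP64 target such as aarch64.

```c
#include <assert.h>

struct one { long x; };         /* 8 bytes on LP64: one-word direct */
struct two { long x; long y; }; /* 16 bytes on LP64: two-word direct */

static struct one ret1(long v) {
    struct one r;
    r.x = v;
    return r; /* return-by-value of a struct-typed expression */
}

static struct two ret2(long a, long b) {
    struct two r;
    r.x = a;
    r.y = b;
    return r;
}

/* Three call results are live in one expression. If the call sites
   shared a single receive slot, a later call could overwrite an
   earlier result before its field is read. */
long chained_sum(void) {
    return ret1(99).x + ret1(1).x + ret2(3, 4).y;
}

void check_direct_returns(void) {
    struct two t = ret2(7, 8);
    assert(t.x == 7 && t.y == 8);
    assert(chained_sum() == 104);
}
```

`chained_sum` is the case the per-call-site receive slot exists for: a correct compiler must produce 99 + 1 + 4 regardless of evaluation order.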

A2 — indirect-result

cg primitives: when return ctype size > 16, cg-fn-begin/v treats arg slot 0 as the sret pointer, and parameter slots shift by one register. cg-fn-end does no a0 store (callee already wrote through *a0 during the return-stmt copy); the convention's "a0 holds the same pointer on return" is automatic since a0 hasn't been clobbered.
Caller side: cg-call detects sret-eligible return type; before emitting the call, materializes the receive-slot's address into a0, shifts ordinary args from a0..a3 → a1..a3 + stack, with stack slot 0 now holding arg word 3.
Variadic interaction is deferred to A3c. A2 itself targets only cc/113 (no varargs, ≤4 named args).
Fixtures: tests/cc-cg/72-struct-ret-3word.scm; then tests/cc/113.
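The indirect-result lowering is easiest to see by writing the hidden pointer out by hand. The sketch below is illustrative only: `make_triple` is named in A1 above, but its signature here, `struct triple`, and `make_triple_sret` are assumptions, and the second function merely imitates in C what the cg is described as doing.

```c
#include <assert.h>

struct triple { long a, b, c; }; /* 24 bytes on LP64: indirect-result */

/* Source-level form: what the test program writes. */
static struct triple make_triple(long a, long b, long c) {
    struct triple t;
    t.a = a;
    t.b = b;
    t.c = c;
    return t;
}

/* Hand-lowered form (hypothetical): the result-buffer pointer becomes
   an explicit first parameter, the callee writes through it, and the
   same pointer is handed back to the caller. */
static struct triple *make_triple_sret(struct triple *out,
                                       long a, long b, long c) {
    out->a = a;
    out->b = b;
    out->c = c;
    return out;
}

long use_both(void) {
    struct triple slot; /* caller-owned receive slot */
    long direct = make_triple(1, 2, 3).c;
    long lowered = make_triple_sret(&slot, 1, 2, 3)->c;
    return direct * 10 + lowered;
}
```

Both forms read the same field, so `use_both` returns 33. In the lowered form the three explicit arguments sit one position later than in the source form, which is the register shift described above.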

A3 — sret compositions (parallel)

Each agent picks up one test, mostly composing infrastructure that A1 and A2 have already shipped:

A3a — cc/114 (two-word return + 8 args): the two-word return doesn't need sret, but the 8 args do exercise stack staging (P1.md §Incoming stack-argument area): args 0–3 travel in a0–a3 and args 4–7 stage to stack slots 0–3. Validates that A1's two-word receive composes with the existing stack-stage path.
A3b — cc/115 (3-word sret + 8 args): with sret in a0, args 0–2 live in a1–a3 and args 3–7 stage to stack slots 0–4. Pure indexing test for A2's shift.
A3c — cc/116 (sret + variadic save area): cg-fn-begin/v's 16-slot save window indexes from incoming arg word 0. When the fn uses indirect-result, slot 0 is arg word 3 (not 4) — A3c adjusts the windowing accordingly. __builtin_va_start must skip the sret pointer; in practice that's automatic if the named-arg count threading is correct, since the sret pointer occupies a0 and the named args start at a1.
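A rough C picture of what the A3b and A3c compositions exercise is given below. It is a sketch under assumptions, not the real cc/115 or cc/116 sources: `pick8` and `sum3` are hypothetical, and the register and stack-slot comments describe the P1 convention from this document, not whatever ABI a host compiler happens to use.

```c
#include <assert.h>
#include <stdarg.h>

struct triple { long a, b, c; }; /* > 16 bytes on LP64: indirect-result */

/* Eight named args plus an sret pointer. Under the document's
   convention p0..p2 arrive in a1..a3 and p3..p7 in stack slots 0..4. */
static struct triple pick8(long p0, long p1, long p2, long p3,
                           long p4, long p5, long p6, long p7) {
    struct triple t;
    t.a = p0 + p1 + p2;      /* register args */
    t.b = p3;                /* first stack-staged arg */
    t.c = p4 + p5 + p6 + p7; /* remaining stack-staged args */
    return t;
}

/* Variadic with indirect-result: va_start must begin after the one
   named arg n, not at the sret pointer. */
static struct triple sum3(int n, ...) {
    struct triple t = {0, 0, 0};
    va_list ap;
    va_start(ap, n);
    for (int i = 0; i < n; i++) {
        long v = va_arg(ap, long);
        if (i % 3 == 0)      t.a += v;
        else if (i % 3 == 1) t.b += v;
        else                 t.c += v;
    }
    va_end(ap);
    return t;
}

void check_sret_compositions(void) {
    struct triple t = pick8(1, 2, 3, 4, 5, 6, 7, 8);
    assert(t.a == 6 && t.b == 4 && t.c == 26);

    struct triple v = sum3(8, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L);
    assert(v.a == 12 && v.b == 15 && v.c == 9);
}
```

An off-by-one in the shift shows up immediately in `pick8`: `t.b` would read 3 or 5 instead of 4. In `sum3`, a window that starts at the wrong arg word skews every bucket.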

Stream B — Compound literals (117)

C99 §6.5.2.5: (T){ init-list } as a postfix expression. Block-scope only; file-scope literals die explicitly.

Parser: detect (T){ lookahead in parse-cast-or-unary. Parse the typename via parse-decl-spec + parse-declarator, then call the existing parse-init-local-aggregate against a fresh slot from cg-alloc-slot. Push a frame-lval typed as T (or T[N]).
Lvalue contract: the literal is an lvalue, so &literal, literal.field, and literal[i] all work via existing cg-take-addr / cg-push-field / cg-decay-array paths.
Lifetime: frame slot ⇒ enclosing block, automatic. Matches C99.
Reuse: no new cg primitives. The fixture surface (positional, designated, partial-init zero-fill, trailing comma, array decay, byval struct arg) is all already covered by Stream E in CC-PUNCHLIST.
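The fixture surface listed above corresponds to C99 forms like the following. This is a sketch of the language feature, not the text of cc/117; `struct pt`, `sum_pt`, and `compound_literal_demo` are illustrative names.

```c
#include <assert.h>

struct pt { int x, y, z; };

static int sum_pt(struct pt p) { return p.x + p.y + p.z; }

int compound_literal_demo(void) {
    int total = 0;

    total += ((struct pt){1, 2, 3}).y;     /* positional: 2 */
    total += ((struct pt){ .z = 7 }).x;    /* designated, x zero-filled: 0 */
    total += ((struct pt){ .z = 7, }).z;   /* trailing comma: 7 */

    int *p = (int[]){10, 20, 30};          /* array literal decays to int* */
    total += p[2];                         /* 30 */

    struct pt *q = &(struct pt){4, 5};     /* literal is an lvalue; z is 0 */
    total += q->x + q->z;                  /* 4 */

    total += sum_pt((struct pt){1, 1, 1}); /* by-value struct arg: 3 */

    return total;
}
```

`p` and `q` stay valid until the end of the enclosing block, which is the lifetime a frame slot provides. The function returns 2 + 0 + 7 + 30 + 4 + 3 = 46.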

Stream C — Const-expr evaluator (118)

Adds parse-const-expr ps → (value . ctype), a self-contained walker that never touches cg.

Operand surface: integer/character literal, enum constant, sizeof(typename), unary + - ~ !, binary + - * / % << >> & | ^, compare < <= > >= == !=, logical && || (short-circuit), ternary ?:, cast to integer type, parenthesization. Anything else dies.
Width-aware return: the (int)(unsigned char)257 == 1 case requires the cast to truncate at u8 width. Bare fixnum loses this; the (value . ctype) tuple keeps it.
Sizeof arm: for now, only sizeof(TYPENAME) — the only form exercised by 118. If a value-expression form surfaces later, grow a scope-lookup arm; still no cg interaction.
Wiring (replace existing parse-const-int call sites): parse-enum-spec, parse-decl-suf-cont's [] arm, parse-init-global's scalar branch, parse-switch-stmt's case label, local array bound in parse-stmt.
No interaction with Stream D. See "Sizeof split" below.
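The operand surface and wiring sites above map onto C like this. The snippet is a sketch, not the contents of cc/118; every identifier in it is illustrative. Each initializer, array bound, and case label must be folded at parse time.

```c
#include <assert.h>

enum {
    K_BASE   = 3,
    K_SHIFT  = 1 << K_BASE,              /* enum constant as operand: 8 */
    K_TRUNC  = (int)(unsigned char)257,  /* cast truncates at u8 width: 1 */
    K_TERN   = (K_SHIFT > 4) ? 10 : 20,  /* compare + ternary: 10 */
    K_LOGIC  = (0 && 5) || !0,           /* logical operators: 1 */
    K_SIZEOF = sizeof(int) * 2,          /* sizeof(typename) */
    K_MIX    = -(~0) + 7 % 4 * 2         /* unary and binary arithmetic: 7 */
};

/* Global array bound is a const-expr: 9 elements. */
static int table[K_SHIFT + K_TRUNC];

int table_len(void) {
    return (int)(sizeof(table) / sizeof(table[0]));
}

/* Case labels are const-exprs, including character literals. */
int classify(int v) {
    switch (v) {
    case K_TERN - 10: return 100; /* case 0 */
    case 'A' + 1:     return 200; /* case 'B' */
    default:          return 0;
    }
}
```

`K_TRUNC` is the width-sensitive case: 257 truncated to 8 bits is 1, and only an evaluator that tracks the operand's ctype gets that right.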

Stream D — Surgical fixes (082 + 087)

Single agent — both ~30-line changes.

082 — union field offsets

parse-struct-fields (cc.scm:3634) advances offset after every field regardless of kind. For unions all fields must stay at offset 0.

Thread kind from parse-aggregate-spec (cc.scm:3611, where it's already in scope) through to parse-struct-fields.
Gate the (+ oa (max sz 0)) bump on (eq? kind 'struct). Unions pass through with oa unchanged.
complete-agg! already sizes unions correctly; no change there.
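The fix can be sketched as a toy layout loop in C alongside the property it has to establish. This is a simplified model, not a translation of parse-struct-fields: `layout_fields` and `agg_kind` are hypothetical names, and alignment padding is deliberately ignored.

```c
#include <assert.h>
#include <stddef.h>

enum agg_kind { AGG_STRUCT, AGG_UNION };

/* Toy model: assign an offset to each field, bumping the running
   offset only for structs. Alignment is ignored here. */
static void layout_fields(enum agg_kind kind, const size_t *sizes,
                          size_t n, size_t *offsets) {
    size_t off = 0;
    for (size_t i = 0; i < n; i++) {
        offsets[i] = off;
        if (kind == AGG_STRUCT)
            off += sizes[i]; /* unions leave off unchanged */
    }
}

union u  { char c; int i; long l; };
struct s { char c; int i; long l; };

void check_union_layout(void) {
    const size_t sizes[3] = {1, 4, 8};
    size_t off[3];

    layout_fields(AGG_STRUCT, sizes, 3, off);
    assert(off[0] == 0 && off[1] == 1 && off[2] == 5);

    layout_fields(AGG_UNION, sizes, 3, off);
    assert(off[0] == 0 && off[1] == 0 && off[2] == 0);

    /* What the compiled program must observe. */
    assert(offsetof(union u, c) == 0);
    assert(offsetof(union u, i) == 0);
    assert(offsetof(union u, l) == 0);
    assert(offsetof(struct s, i) > 0);

    union u v;
    assert((void *)&v.c == (void *)&v.i);
    assert((void *)&v.i == (void *)&v.l);
}
```

The `offsetof` and address checks are the observable half: with the bug, `&v.i` and `&v.l` land past `&v.c`, as they would in `struct s`.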

087 — sizeof no-emit

The current sizeof arm at cc.scm:4898 calls parse-unary / parse-expr which emit code for the operand. sizeof(x++) therefore actually increments x.
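A small C program makes the required behavior concrete. It is a sketch of the property, not the source of cc/087; `bump` and `sizeof_leaves_operand_alone` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>

static int calls;

/* Side-effecting function: counts how often it actually runs. */
int bump(void) { return ++calls; }

/* Returns 1 when neither sizeof operand was evaluated. */
int sizeof_leaves_operand_alone(void) {
    int x = 5;
    size_t n1 = sizeof(x++);    /* must not increment x */
    size_t n2 = sizeof(bump()); /* must not call bump */
    return x == 5 && calls == 0 &&
           n1 == sizeof(int) && n2 == sizeof(int);
}
```

A compiler with the bug described above leaves `x == 6` after the first `sizeof`, so the function returns 0.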

Add cg primitives cg-snapshot cg → tag and cg-rewind cg tag. Snapshot captures vstack depth and fn-buf chunk count; rewind restores both. Internal-only.
In the sizeof arm: snapshot before parse-unary, read (opnd-type (cg-top)), rewind, then push the result with cg-push-imm %t-u64 size.
Fixture lock-in: 087 covers sizeof(x++). Add a cg-fixture if cg primitives need direct validation.
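The snapshot/rewind protocol can be modelled in C to show the invariant it maintains. Everything here is hypothetical: the real cg is Scheme, and `cg_model`, `emit_operand`, and `sizeof_arm` are stand-ins that track only the two quantities the text says a snapshot captures.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for cg state: only what a snapshot is said to capture. */
struct cg_model {
    int    vstack_depth;
    size_t chunk_count; /* number of chunks in the function buffer */
};

struct cg_tag {
    int    vstack_depth;
    size_t chunk_count;
};

static struct cg_tag cg_snapshot(const struct cg_model *cg) {
    struct cg_tag tag = { cg->vstack_depth, cg->chunk_count };
    return tag;
}

static void cg_rewind(struct cg_model *cg, struct cg_tag tag) {
    cg->vstack_depth = tag.vstack_depth;
    cg->chunk_count  = tag.chunk_count;
}

/* Pretend operand parse: emits some code and pushes one operand. */
static void emit_operand(struct cg_model *cg) {
    cg->chunk_count  += 3;
    cg->vstack_depth += 1;
}

/* Model of the sizeof arm: parse the operand for its type only,
   discard what it emitted, then push the size as an immediate. */
size_t sizeof_arm(struct cg_model *cg, size_t operand_size) {
    struct cg_tag tag = cg_snapshot(cg);
    emit_operand(cg);      /* the operand type would be read here */
    cg_rewind(cg, tag);    /* drop emitted code and the operand */
    cg->vstack_depth += 1; /* model of pushing the immediate */
    return operand_size;
}
```

After `sizeof_arm` returns, the model's chunk count matches its value before the call and the vstack has grown by exactly one entry, which is the property a cg-fixture for these primitives would check.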

Cross-stream contracts

Sizeof split (C and D stay independent)

Two distinct callers, two independent mechanisms:

Outside const-expr (Stream D — 087): operand can be anything (x++, calls, side-effects). Result lives at runtime. Use cg-snapshot / cg-rewind.
Inside const-expr (Stream C — 118): operand grammar restricted; result is a parse-time (value . ctype) pair. Const-expr evaluator handles sizeof(TYPENAME) directly via parse-decl-spec + parse-declarator + ctype-size. Never calls cg.

The two paths share the concept (don't evaluate the operand) but not the implementation. They can land in either order.

A1 ↔ B/C/D

Independent. A1 only touches cg-fn-end, cg-call, parse-return-stmt, and parse-postfix-rest. B touches parse-cast-or-unary. C adds a new walker plus changes to the parse-const-int call sites listed under Wiring, none of which overlap with A1. D touches parse-aggregate-spec, parse-struct-fields, and the sizeof arm, and adds two new cg primitives.

A2 ↔ A3

A3 depends on A2's indirect-result implementation. A3a/b/c are independent of each other once A2 has landed.

Workflow per stream

Per CC-INTERNALS §Feature workflow:

cc-cg fixture (red) — drive the cg API directly.
Implement cg primitives until cc-cg green.
cc fixture (red) — full driver.
Implement parser changes until cc green.

Pick the next free <n> per suite. cc-cg currently goes up through 69; cc goes up through 118 (with gaps).

Acceptance

Per stream: make test SUITE=cc ARCH=aarch64 shows the stream's target tests as PASS, and no prior tests regress. Final acceptance: all ten currently-failing tests green on aarch64.
