tcc-boot2 Current TODO

Current tracker for the scheme1-hosted cc.scm path that builds tcc.flat.c into tcc-boot2.

Companion docs:

TCC.md describes the surrounding tcc pipeline.
CC.md describes the C subset and validation milestones.
LIBC.md describes the libc side used to link tcc-boot2.

Current State

cc.scm compiles the flattened tcc translation unit, the P1pp output assembles and links via the M1pp + hex2++ chain, and the resulting tcc-boot2 is at full parity with the gcc-built control on the tcc-cc acceptance suite (see Latest Result below).

Useful smoke checks:

make tcc-boot2 ARCH=aarch64

build/aarch64/tcc-boot2/tcc-boot2 -v
build/aarch64/tcc-boot2/tcc-boot2 -E smoke.c
build/aarch64/tcc-boot2/tcc-boot2 -c smoke.c -o smoke.o

For native generated-program testing, use the ARM64-targeted build via the tcc-cc suite:

make test SUITE=tcc-cc

`tcc-cc` Suite

tcc-cc runs the plain tests/cc fixtures through tcc-tcc (second-stage tcc) instead of through cc.scm directly. The Makefile chain is cc.scm → tcc-boot2 → tcc-tcc; the runner then does:

build/aarch64/tcc-tcc/tcc-tcc \
    -nostdlib build/aarch64/tcc-cc/start.o build/aarch64/tcc-cc/mem.o \
    tests/cc/NAME.c -o build/aarch64/tests/tcc-cc/NAME

./build/aarch64/tests/tcc-cc/NAME

Routing fixtures through tcc-tcc (rather than tcc-boot2 directly) turns every fixture into a self-host check: a regression in cc.scm's emitted code surfaces first when tcc-boot2 builds tcc-tcc, then again when tcc-tcc compiles the fixtures and the resulting binaries run.

mem.o is the compiler-builtin mem* runtime — memcpy/memmove/memset that tcc emits direct calls to for struct copies and bulk init, plus memcmp for fixtures that reach it via bare extern int memcmp(...). The tcc-gcc sibling supplies the equivalent four symbols by compiling mes-libc's string/{memcpy,memmove,memset,memcmp}.c into its runtime archive.
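As a rough illustration of what such a mem* runtime has to provide, here is a minimal byte-at-a-time sketch. It is not the contents of tcc-cc/mem.c: the `sketch_` prefix and the `copy_demo` helper are assumptions made here so the code can sit next to a host libc without symbol clashes, whereas the real runtime exports the plain names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: the sketch_ prefix only avoids clashing with the
 * host libc; the runtime described above exports the plain names. */
void *sketch_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--) *d++ = *s++;
    return dst;
}

/* Overlap-safe copy: forward when dst is below src, backward otherwise. */
void *sketch_memmove(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (d < s) {
        while (n--) *d++ = *s++;
    } else if (d > s) {
        d += n;
        s += n;
        while (n--) *--d = *--s;
    }
    return dst;
}

void *sketch_memset(void *dst, int c, size_t n) {
    unsigned char *d = dst;
    while (n--) *d++ = (unsigned char)c;
    return dst;
}

int sketch_memcmp(const void *a, const void *b, size_t n) {
    const unsigned char *x = a, *y = b;
    for (; n--; x++, y++)
        if (*x != *y) return *x < *y ? -1 : 1;
    return 0;
}

struct triple { long a, b, c; };

/* A struct copy is the kind of operation the text says tcc lowers to a
 * memcpy call; here the call is written out by hand. Returns 0 on success. */
int copy_demo(void) {
    struct triple src = { 1, 2, 3 }, dst;
    sketch_memcpy(&dst, &src, sizeof dst);
    assert(dst.b == 2);
    return sketch_memcmp(&dst, &src, sizeof dst);
}
```

The backward branch in `sketch_memmove` is what distinguishes it from `sketch_memcpy` when the destination overlaps the tail of the source.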

The result is compared against the same .expected and .expected-exit files used by the regular cc suite. The suite is aarch64-only today because it needs generated binaries to run natively inside the aarch64 container.

Run a subset with NAMES:

NAMES='002-arith 007-call-with-args' make test SUITE=tcc-cc

Latest Result

make test SUITE=tcc-cc       tcc-tcc on tests/cc:    181 passed, 0 failed
scripts/run-gcc-libc-flat-tcc.sh  tcc-gcc baseline:   181 passed, 0 failed

Exact parity, suite fully green on both paths.

The path from earlier results to here:

Result	Delta
148/30	baseline before mem-runtime
163/15	added `tcc-cc/mem.c` runtime; cleared the `mem*` undefined-symbol cluster
175/3	cc.scm migration to M1pp + hex2++ pipeline (dotted local labels, `.scope`/`.endscope`, `.align` directives, bare-hex string emission) cleared the entire `assert fail: 0@12051` cluster (14 fixtures) plus a hex2pp.P1 BSS-overlap fix that unblocked the tcc-boot2 link itself for inputs >1 MiB
176/2	ternary-arms common-type fix in `cg-ifelse-merge` cleared `220-const-promote` (was: arm 1's type leaked through as the result type, truncating wider arm 2 to 32-bit; tcc's `gen_opic` sign-extension idiom hit this)
178/1	reframed mem* as compiler builtins supplied by the build process: renamed libp1pp's `libp1pp__memcpy` / `_memcmp` / `_memset` to plain `memcpy` / `memcmp` / `memset` and added `memmove`; dropped mes-libc's `string/memcpy.c` / `memmove.c` / `memset.c` / `memcmp.c` from `unified-libc.c` so the symbols are not duplicated; added `memcmp` to `tcc-cc/mem.c` and linked it into the gcc-built tcc-gcc binary; updated and renamed the regression fixture (`129-extern-libp1pp` → `129-extern-mem-builtins`) to extern the plain names. Cleared the fixture on every path (cc, cc-libc, tcc-cc, tcc-gcc).
179/1	added `tcc-tcc` (second-stage tcc) and routed `tcc-cc` / `tcc-libc` through it. cc.scm's `cg-load` was 8-byte-spilling struct lvalues, so anything with `sizeof > 8` got truncated when used in expression context (e.g. as a ternary arm). Fixed `cg-load` to leave aggregates as lvalues and updated `cg-ifelse-merge` to memcpy aggregate arms into a struct-sized merge slot; without this, tcc-boot2 (cc.scm-built) self-corrupted whenever it had to compile `type = bt1 == 6 ? type1 : type2;`. Regression locked by `tests/cc/336-struct-assign-ternary`.
181/0	fixed cc.scm's struct-by-value parameter ABI — both `cg-call` and `cg-fn-begin/v` now split 9..16-byte aggregates across two consecutive ABI slots. Locked by `tests/cc/337-struct-by-value-arg`.
181/0	added `simple-patches/tcc-0.9.26/lex-char-unsigned` so tcc reads single-byte character constants through `uint8_t`, not `int8_t`; clears `200-lex-char-type` on both `tcc-cc` (tcc-tcc-driven) and `tcc-gcc` (gcc-built control). C99 §6.4.4.4¶10 leaves `char` signedness implementation-defined and aarch64 AAPCS picks unsigned, so `'\xFF'` must be 255, not -1.
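The ternary-related rows above (220-const-promote, 336-struct-assign-ternary, 337-struct-by-value-arg) all come down to C semantics that are easy to state in a few lines. The sketch below is not a copy of those fixtures; `struct ctype`, `widen_arm`, `pick`, and `struct_demo` are hypothetical names chosen here, and the struct is only loosely modelled on the `type1` / `type2` operands quoted in the table.

```c
#include <assert.h>

/* Hypothetical aggregate: 16 bytes on LP64 targets, so it exceeds one
 * 8-byte slot and falls in the 9..16-byte by-value range. */
struct ctype { int t; void *ref; };

/* Arithmetic arms: the result type is the common type of both arms
 * (long long here), not the type of the first arm. */
long long widen_arm(int cond) {
    int narrow = -1;
    long long wide = 0x100000000LL;
    return cond ? narrow : wide;
}

/* Aggregate arms, passed and returned by value: every byte of the
 * selected arm must reach the result, not only the first 8. */
struct ctype pick(int bt1, struct ctype type1, struct ctype type2) {
    struct ctype type;
    type = bt1 == 6 ? type1 : type2;
    return type;
}

void struct_demo(void) {
    static int marker;
    struct ctype a = { 1, &marker }, b = { 2, 0 };
    struct ctype r = pick(6, a, b);
    assert(r.t == 1 && r.ref == &marker);   /* second field survived */
    r = pick(5, a, b);
    assert(r.t == 2 && r.ref == 0);
    assert(widen_arm(0) == 0x100000000LL);  /* wide arm not truncated */
    assert(widen_arm(1) == -1);             /* narrow arm sign-extended */
}
```

A compiler that spills only 8 bytes of an aggregate arm would lose `ref` in `pick`, and one that takes the first arm's type as the result type would return 0 from `widen_arm(0)`.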

Host Baseline

The tests/cc fixtures are coherent under a host compiler. The temporary host harness compiled, ran, and compared every fixture with plain host cc:

build/aarch64/.work/tests/tcc-cc/run-host-cc.sh

Recorded baseline:

HOST_CC=cc
HOST_CFLAGS=-std=gnu11 -w
153 passed, 0 failed

The gcc-built flattened-tcc control runs in the Alpine gcc image:

make tcc-gcc TCC_TARGET=ARM64
podman run --rm --pull=never --platform linux/arm64 \
    -v "$PWD":/work -w /work boot2-alpine-gcc:aarch64 \
    sh scripts/run-gcc-libc-flat-tcc.sh

This is the canonical sanity reference for "tcc-built-from-our-source" fixture coverage; cc.scm-built tcc-boot2 is now at exact parity with it.

Patches

scripts/simple-patches/tcc-0.9.26/ carries fixes applied during stage1-flatten so any tcc rebuilt from this tree picks them up:

aarch64-stdarg-array.{before,after} — swaps the bundled va_list for __va_list_struct[1] (matches glibc/musl/x86_64 ABI).
arm64-va-{pointer-operand,arg-pointer}.{before,after} — teaches gen_va_start/gen_va_arg to skip gaddrof() when the operand is already a pointer (the array-decayed/pointer-parameter case). Without this, va_list forwarding into a non-variadic helper (the vfprintf shape, e.g. 131-vararg-mixed) hit assert fail: 0 in arm64-gen.c.
const-divzero-shortcircuit-int.{before,after} — gates gen_opic's "division by zero in constant" error on !nocode_wanted so the unevaluated arm of &&/||/?: in constant expressions (C11 §6.6¶3) does not abort.
lex-char-unsigned.{before,after} — reads single-byte character constants through uint8_t instead of int8_t so '\xFF' produces 255, not -1, matching aarch64 AAPCS's plain-char signedness (C99 §6.4.4.4¶10 leaves it implementation-defined).
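The effect of the lex-char-unsigned patch can be modelled in isolation. The two helpers below are hypothetical and stand in only for the final conversion of an already-decoded byte; they are not tcc's lexer code. Converting 0xFF to `int8_t` is implementation-defined before C23, but yields -1 on the two's-complement hosts assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modelling the last step of reading '\xFF'. */
int char_const_via_int8(unsigned byte)  { return (int8_t)byte; }
int char_const_via_uint8(unsigned byte) { return (uint8_t)byte; }

void lex_char_demo(void) {
    assert(char_const_via_int8(0xFF) == -1);    /* signed-char reading   */
    assert(char_const_via_uint8(0xFF) == 255);  /* unsigned-char reading */
    /* Bytes below 0x80 are unaffected by the choice. */
    assert(char_const_via_int8('A') == char_const_via_uint8('A'));
}
```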

Fixture cleanups

Two small fixtures were rewritten to drop assumptions the regular cc suite shouldn't depend on:

tests/cc/125-anon-union.c explicitly initializes its local struct before probing anonymous-union aliasing. Tests must not depend on implicit zeroing of automatic locals.
tests/cc/132-tentative-bss-sizing.c returns distinct numeric exit codes instead of calling sys_write/strlen. Plain tests/cc fixtures must not need stdio/libc helpers.
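Both rules can be seen together in one fixture-shaped sketch. This is not the text of either fixture; `struct tagged`, `fixture_main`, and `host_check` are illustrative names, and `host_check` exists only so the sketch can be exercised on a host with `assert`.

```c
#include <assert.h>

struct tagged {
    int kind;
    union {               /* C11 anonymous union: members share storage */
        int      as_int;
        unsigned as_uint;
    };
};

/* Fixture-style entry point: no stdio, one distinct code per failure. */
int fixture_main(void) {
    struct tagged v;
    v.kind = 7;           /* explicit initialization: nothing relies on */
    v.as_int = -1;        /* automatic storage happening to be zero     */

    if (v.kind != 7) return 1;
    if (v.as_uint != (unsigned)-1) return 2;  /* alias sees the same bits */
    v.as_uint = 5u;
    if (v.as_int != 5) return 3;
    return 0;
}

/* Host-side check only; the fixture body itself needs nothing from libc. */
void host_check(void) { assert(fixture_main() == 0); }
```

Distinct return values let a failing run be diagnosed from the exit status alone, which is what the .expected-exit comparison relies on.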

Known limitations

riscv64: u32 narrowing leaves dirty upper bits

tests/cc/335-ternary-merge-arith-conv fails on riscv64 in both tcc-cc[stage2] and tcc-cc[stage3] (identical behavior — the fixed-point property holds, the bug is in tcc's RISC-V codegen, not in cc.scm or the P1 pipeline). aarch64 and amd64 are green.

The proximate trigger is in riscv64-gen.c::load():

func3 = size == 1 ? 0 : size == 2 ? 1 : size == 4 ? 2 : 3;
if (size < 4 && !is_float(sv->type.t) && (sv->type.t & VT_UNSIGNED))
    func3 |= 4;          // promotes lb→lbu, lh→lhu, but skips lw→lwu

The func3 |= 4 promotion is gated on size < 4, so it never reaches LWU: a 4-byte unsigned load uses LW (sign-extending) instead of LWU (zero-extending). gen_cast to VT_INT|VT_UNSIGNED from a wider source emits no narrowing; it relies on the use-time load to truncate, but LW copies bit 31 into bits 32–63 instead of clearing them. (u32)x, where x is a u64 with bit 31 set, therefore evaluates with the upper 32 bits all ones (0xFFFFFFFF80000000 for a low word of 0x80000000) rather than zero. The same bug is present in upstream tcc mob.
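The difference between the two loads can be modelled on any host. In the sketch below, `load_func3` restates the quoted selection logic without the `is_float` test, and the `gate_includes_word` parameter is an assumption added here to compare the stock gate with a widened one; `model_lw` and `model_lwu` are hypothetical stand-ins for what each instruction leaves in a 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* func3 selection as quoted above, minus the is_float test.
 * gate_includes_word = 0 models the stock `size < 4` gate. */
int load_func3(int size, int is_unsigned, int gate_includes_word) {
    int func3 = size == 1 ? 0 : size == 2 ? 1 : size == 4 ? 2 : 3;
    if ((size < 4 || (gate_includes_word && size == 4)) && is_unsigned)
        func3 |= 4;
    return func3;
}

/* LW: bit 31 of the loaded word is copied into bits 32..63. */
uint64_t model_lw(uint32_t word) {
    return (word & 0x80000000u) ? (0xFFFFFFFF00000000ull | word) : word;
}

/* LWU: bits 32..63 are cleared. */
uint64_t model_lwu(uint32_t word) { return word; }

void narrowing_demo(void) {
    uint64_t x = 0x0000000180000000ull;   /* u64 with bit 31 set */
    uint32_t low = (uint32_t)x;           /* what a 4-byte load reads */

    assert(load_func3(4, 1, 0) == 2);     /* stock gate selects LW     */
    assert(load_func3(4, 1, 1) == 6);     /* widened gate selects LWU  */
    assert(load_func3(2, 1, 0) == 5);     /* LHU was already promoted  */

    assert(model_lwu(low) == 0x80000000ull);          /* correct (u32)x */
    assert(model_lw(low)  == 0xFFFFFFFF80000000ull);  /* dirty upper half */
    assert(model_lw(0x7FFFFFFFu) == model_lwu(0x7FFFFFFFu));
}
```

The last assertion shows why the problem stays hidden for most values: the two loads agree whenever bit 31 is clear.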

Why the one-line patch isn't enough. Widening the gate to size <= 4 (so 4-byte unsigned loads use LWU) regresses 017-int-arith and 128-cast-signedness. They were passing because two compensating bugs canceled out: stock tcc on riscv64 also sign-extends unsigned 32-bit immediate constants (LUI/ADDI with a bit-31-set value), so a comparison between an unsigned int variable (loaded with sign-extending LW) and an unsigned int constant (loaded with sign-extending LUI/ADDI) had matching dirty upper bits and BEQ saw them as equal. Fixing only the load breaks that join, because the compare path also lies — BEQ is a 64-bit instruction but C semantics require 32-bit width for unsigned int == unsigned int.
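The cancellation described above can be reproduced with a small register model. This is a sketch under the stated assumptions only: the helper names are hypothetical, `beq` models a full 64-bit register comparison, and the three cases correspond to stock behavior, the load-only patch, and load plus immediate fixed together.

```c
#include <assert.h>
#include <stdint.h>

uint64_t as_sign_extended(uint32_t v) {
    return (v & 0x80000000u) ? (0xFFFFFFFF00000000ull | v) : v;
}
uint64_t as_zero_extended(uint32_t v) { return v; }

/* BEQ compares the full 64-bit registers. */
int beq(uint64_t rs1, uint64_t rs2) { return rs1 == rs2; }

void compensating_bugs_demo(void) {
    uint32_t var = 0x80000000u;   /* unsigned int variable in memory   */
    uint32_t imm = 0x80000000u;   /* unsigned int constant, bit 31 set */

    /* Stock: LW and LUI/ADDI both sign-extend, so BEQ sees equal values. */
    assert(beq(as_sign_extended(var), as_sign_extended(imm)));

    /* Load-only fix: LWU against a sign-extended constant now differs. */
    assert(!beq(as_zero_extended(var), as_sign_extended(imm)));

    /* Load and immediate fixed together: the comparison agrees again. */
    assert(beq(as_zero_extended(var), as_zero_extended(imm)));
}
```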

Full fix shape. Three coupled pieces: (1) load: emit LWU for unsigned 4-byte loads; (2) immediate: clear bits 32–63 when materializing an unsigned 32-bit constant with bit 31 set; (3) compare: eagerly canonicalize 32-bit-typed values into zero-extended or sign-extended form (per VT_UNSIGNED) after every op that can leave the upper half dirty. Pieces 1 and 3 overlap: if values are canonicalized at every produce site, the load fix becomes one of many sites that need to do it. This is what gcc/clang's RISC-V backends do, and it is beyond the scope of the literal-block simple-patches mechanism; file upstream or write a real canonicalization pass.
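The canonicalization in piece 3 amounts to one small operation applied after any op that can dirty the upper half. The function below is a host-side model of that operation, not a proposed patch to riscv64-gen.c; `canon32` and its `is_unsigned` flag (standing in for VT_UNSIGNED) are names assumed here.

```c
#include <assert.h>
#include <stdint.h>

/* Bring a 64-bit register holding a 32-bit-typed value into canonical
 * form: zero-extended if unsigned, sign-extended otherwise. */
uint64_t canon32(uint64_t reg, int is_unsigned) {
    uint64_t low = reg & 0xFFFFFFFFull;
    if (!is_unsigned && (low & 0x80000000ull))
        low |= 0xFFFFFFFF00000000ull;
    return low;
}

void canon_demo(void) {
    /* A 64-bit ADD of two u32 values can carry into bit 32. */
    uint64_t sum = 0xFFFFFFFFull + 1ull;
    assert(sum == 0x100000000ull);
    assert(canon32(sum, 1) == 0);             /* u32 wraparound restored */

    /* A dirty sign-extended load result, cleaned for an unsigned type. */
    assert(canon32(0xFFFFFFFF80000000ull, 1) == 0x80000000ull);

    /* The signed form keeps bit 31 replicated upward. */
    assert(canon32(0x80000000ull, 0) == 0xFFFFFFFF80000000ull);

    /* Idempotent: canonical values pass through unchanged. */
    assert(canon32(canon32(sum, 1), 1) == canon32(sum, 1));
}
```

Because the operation is idempotent, applying it at every produce site is safe even where a value is already clean; the cost is extra instructions, not correctness.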

For now: known limitation, document, move on. The scalar codegen elsewhere on riscv64 is fine — only u32 narrowing of a wider source trips it.

tcc0 → tcc1 is not a fixed point on riscv64 (cc.scm behavioral bug)

boot3.sh + boot4.sh produce four staged compilers:

tcc0 = tcc-source compiled by cc.scm (boot3 output)
tcc1 = tcc-source compiled by tcc0 (boot4)
tcc2 = tcc-source compiled by tcc1 (boot4)
tcc3 = tcc-source compiled by tcc2 (boot4)

The fixed-point check is tcc2 == tcc3 (asserted at the end of boot4.sh, verified on aarch64, amd64, riscv64). On riscv64 the stronger tcc1 == tcc2 does not hold: tcc0(tcc.flat.c) produces a 616100-byte .o while tcc1(tcc.flat.c) and tcc2(tcc.flat.c) produce a byte-identical 615892-byte .o. The tcc0 output is 208 bytes larger (200 in .text + 8 ripple in symtab/reloc offsets). amd64 and aarch64 satisfy tcc1 == tcc2; only riscv64 diverges.

This is a bug to investigate, not just a "fatter code" observation. cc.scm should be a faithful (semantics-preserving) compiler — slower or larger output is acceptable, but tcc0 and tcc1 must produce byte-identical output when run on the same source. That they don't on riscv64 means cc.scm's translation of tcc.flat.c into tcc0 changed what tcc0 does at runtime, not just how it's encoded. We don't care about peephole optimizations being missed; we do care that tcc0 makes different codegen decisions than tcc1 makes.

What's known

The visible symptom: tcc0 emits four RISC-V codegen patterns differently than tcc1 does:

Source pattern	tcc0 emits	tcc1 emits	Δ
`x = x - imm` (i32)	`addiw t,zero,imm; addw rd,rs,t`	`addiw rd,rs,imm`	+4 B
`x = x & imm`	`addiw t,zero,imm; and rd,rs,t`	`andi rd,rs,imm`	+4 B
zero-ext after `sext.w`	`sext.w r,r; slli r,r,0x20; srli r,r,0x20`	`sext.w r,r`	+8 B
`x == 0xFFFFFFFF` (i32)	`addiw t,zero,-1; slli/srli; beq x,t,L`	`addi x,x,1; beqz x,L`	+8 B

These are decision points in riscv64-gen.c (immediate-folding, zero-ext elision). Same source code, same input C, but the running tcc0 takes the slow branch where the running tcc1 takes the fast one — even though both are compiled from the same tcc.flat.c.

Hypothesis to test

cc.scm likely miscompiles an integer comparison or bit-test inside the immediate-fits-in-instruction guard in riscv64-gen.c. Most of the missed patterns share the shape if (small_int_fits) { fold } else { materialize }. If cc.scm gets the predicate wrong (e.g. signed vs. unsigned compare, or wrong branch on a particular bit pattern), tcc0 falls into the materialize path on inputs where tcc1 takes the fold path.
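To make the hypothesis concrete, here is one way such a guard could go wrong. Everything in this sketch is an assumption: the unsigned-wraparound form of the range test is a common idiom but has not been confirmed here as the form riscv64-gen.c uses, and the "no wrap" variant is a hypothetical miscompilation, not an observed one. The only fact relied on is that RISC-V I-type immediates are 12-bit signed (-2048..2047).

```c
#include <assert.h>
#include <stdint.h>

/* Reference predicate: does v fit a 12-bit signed immediate? */
int fits_imm12(int32_t v) { return v >= -2048 && v <= 2047; }

/* Same test via unsigned wraparound, evaluated correctly in 32 bits. */
int guard_u32(int32_t v) {
    return (((uint32_t)v + 2048u) >> 12) == 0;
}

/* Hypothetical miscompile: the add is carried out in 64 bits and never
 * truncated back to 32, so the wraparound the idiom needs is lost. */
int guard_no_wrap(int32_t v) {
    return (((uint64_t)(uint32_t)v + 2048u) >> 12) == 0;
}

void guard_demo(void) {
    for (int32_t v = -4096; v <= 4096; v++) {
        assert(guard_u32(v) == fits_imm12(v));
        if (v >= 0)
            assert(guard_no_wrap(v) == fits_imm12(v));  /* still right */
        else
            assert(!guard_no_wrap(v));  /* every negative is rejected */
    }
}
```

Under this model the faulty guard only misfires on negative immediates, which would send a compiler down the materialize path for cases such as a subtract-by-constant or a compare against -1 while leaving positive immediates folded. Whether that matches the actual divergence is exactly what the repro below is meant to establish.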

Repro / starting point

# In the riscv64 container with boot3+boot4 outputs present:
$TCC0 -nostdlib -c -o /tmp/flat-tcc0.o tcc.flat.c
$TCC1 -nostdlib -c -o /tmp/flat-tcc1.o tcc.flat.c
# wc -c /tmp/flat-tcc0.o /tmp/flat-tcc1.o   →  616100 vs 615892
# objdump -d both, normalize addresses, diff to find divergent functions

The first divergent function in disassembly is tal_free_impl — a small refcount-decrement that hits the "x = x - 1" pattern. Good starting point because the function is short and the source path is narrow.

Until this is fixed, tcc1 is the "shake-out" stage and tcc2 is the canonical compiler.

Standalone `bootN.sh`: remaining host deps

scripts/{boot0,boot1,boot2}.sh are pure scratch + busybox — no host compiler, no alpine-gcc image, just podman + the pinned busybox:musl digest. boot3.sh is also pure scratch + busybox (it's just scheme1 + M1pp + hex2pp on .flat.c inputs flattened by host cc -E). boot4.sh previously had one host-tooling dep on aarch64 only: cross-asm of tcc-libc/aarch64/{start,sys_stubs}.S to .o via $HOST_CC -target aarch64-linux-gnu. tcc 0.9.26's aarch64 backend has no assembler (no arm64-asm.c) and no inline-asm support, so .S inputs historically needed pre-compilation host-side; the patched arm64-asm.c now removes that requirement (see docs/TCC-ARM64-ASM.md).

amd64 and riscv64 backends both ship CONFIG_TCC_ASM and assemble .S in-container via tcc-boot2 itself (stages C+D in boot4.sh). The riscv64 .S files are macroed behind #ifdef __TINYC__ because tcc's riscv64 asm parser uses 3-operand load/store syntax (ld rd, base, off, sd base, src, off — base first for stores) instead of GAS's ld rd, off(base) / sd src, off(base); the GAS path stays usable for the Makefile's alpine-gcc fallback. The boot2-alpine-gcc:riscv64 image is no longer used by boot3.sh / boot4.sh.

Replacing the aarch64 .S pair with .P1pp (or any in-container-buildable) equivalents drops the host-cc dep entirely. After that, every bootN.sh is podman + scratch + busybox only.

Out of scope for this TODO (already accepted as host-side): stage1-flatten.sh and libc-flatten.sh use the host cc -E preprocessor to produce tcc.flat.c and libc.flat.c. The unpacked tcc-0.9.26/lib/{lib-arm64.c, va_list.c} helpers compile cleanly under tcc-boot2 inside the container — no host cc on those, just source deps.

Next steps

The cc.scm path is at full parity with the gcc-built control: every fixture in tcc-cc and tcc-libc passes on both, modulo the riscv64 limitations noted above. Further bug-hunting work is open-ended: surface a misbehavior, write a tests/cc fixture that locks it, then fix it.
