Building tcc-0.9.26 from this repo

Describes the host-side flatten step that produces tcc.flat.c — the single-file C bytestream that the boot pipeline (boot3.sh → boot4.sh) feeds to cc.scm to bootstrap tcc-0.9.26.

bootprep/stage1-flatten.sh runs on the host (macOS or Linux). It flattens upstream tcc.c into a single tcc.flat.c bytestream using only the host C preprocessor — no M2-Planet, Mes Scheme, or MesCC anywhere.

This is the upstream half of the CC.md story: once cc.scm ingests tcc.flat.c, the rest of the pipeline (boot3 → boot4) is in-tree.

Inputs

Path	Contents
`../lb-work/distfiles/tcc-0.9.26.tar.gz`	tcc-0.9.26-1147-gee75a10c source (janneke's bootstrap-friendly fork; same artifact live-bootstrap consumes)
`../lb-work/distfiles/mes-0.27.1.tar.gz`	GNU Mes 0.27.1 — used only for its bundled minimal libc sources and headers, not for any Scheme runtime
`../live-bootstrap/steps/tcc-0.9.26/simple-patches/`	Two file-open reorder patches applied before the flatten step
`../mes/include/`	Same Mes headers as the tarball — used at flatten time so we don't pull in host glibc/musl

The three scripts sit on top of these inputs; they require nothing else from the host besides tar, awk, a host cc, and podman.

Pipeline overview

tcc-0.9.26-1147-gee75a10c.tar.gz                   live-bootstrap source
        │
        │   stage1-flatten.sh                      (host)
        │     • unpack
        │     • apply 2 simple-patches
        │     • host cc -E -nostdinc with mes headers + tcc-mes defines
        ▼
build/$ARCH/vendor/tcc/tcc.flat.c          ~600 KB single-file C

tcc.flat.c is a portable artifact; downstream the boot pipeline (boot3.sh → boot4.sh) feeds it to cc.scm to produce a working tcc-0.9.26.

Stage 1 — flatten tcc.c into tcc.flat.c

bootprep/stage1-flatten.sh --arch X86_64

Mirrors the live-bootstrap tcc-mes invocation (steps/tcc-0.9.26/pass1.kaem:60–87) minus the actual compile. The host preprocessor expands every #include in tcc.c (which uses ONE_SOURCE=1 to fold libtcc.c, tcctools.c, and the per-arch backends in via #include) and inlines all the Mes-bundled standard headers.

Stage 1 deliberately stays on the host — it is just text manipulation and the tcc.flat.c it produces is consumed identically downstream regardless of where stage 1 ran.

Sub-steps

Unpack tcc-0.9.26.tar.gz into build/amd64/vendor/tcc/.
Apply simple-patches: remove-fileopen.before/.after then addback-fileopen.before/.after against tcctools.c. Implemented as an awk literal-block replacer (live-bootstrap's simple-patch is a trivial before/after substitution).
Empty config.h shims: live-bootstrap creates two empty config.h files via catm. We do the same: one at $TCC_PKG/config.h, and one at mes-overlay/mes/config.h, which the Mes stdio.h pulls in as <mes/config.h>.

Host preprocess: cc -E -nostdinc with the Mes headers as the sole -I set, plus the same -D set live-bootstrap passes:

-D BOOTSTRAP=1
-D HAVE_LONG_LONG=1
-D inline=
-D ONE_SOURCE=1
-D TCC_TARGET_X86_64=1
-D __linux__=1
-D __x86_64__=1            # mescc would inject these; we mirror them
-D CONFIG_TCC_*="..."      # exactly the live-bootstrap paths

Output: a single ~600 KB C bytestream, no remaining #includes, no preprocessor directives at all.

(Optional, --verify) host cc -c tcc.flat.c -> tcc.flat.o. On macOS this produces a Mach-O .o; the verify is purely a "does the source compile" check. Failure here means the flatten step is wrong.
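The simple-patch sub-step above is a literal before/after block substitution. As a rough illustration of that idea only, it can be sketched in C; the script itself uses awk, and `replace_block` is a hypothetical name. The sketch assumes that the first occurrence is the one replaced and that a missing `before` block is an error, which may not match the awk replacer exactly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Replace the first literal occurrence of `before` in `text` with `after`.
   Returns a newly malloc'd string that the caller must free, or NULL when
   `before` is empty, is not found, or allocation fails.
   Hypothetical helper: not part of stage1-flatten.sh. */
char *replace_block(const char *text, const char *before, const char *after)
{
    assert(text != NULL && before != NULL && after != NULL);

    size_t blen = strlen(before);
    if (blen == 0)
        return NULL;

    const char *hit = strstr(text, before);
    if (hit == NULL)
        return NULL;                      /* patch does not apply */

    size_t head = (size_t)(hit - text);   /* bytes before the match */
    size_t alen = strlen(after);
    size_t tail = strlen(hit + blen);     /* bytes after the match */

    char *out = malloc(head + alen + tail + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, text, head);
    memcpy(out + head, after, alen);
    memcpy(out + head + alen, hit + blen, tail + 1);  /* copies the NUL too */
    return out;
}
```

A caller would read tcctools.c and a .before/.after pair into memory, call `replace_block` once per patch, and write the result back; a NULL return signals that the patch did not apply.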

What this unlocks for the scheme1 cc

The interface of the slot that the Scheme cc fills:

Input: tcc.flat.c produced by stage 1.
Output: a working ELF capable of compiling the same tests/cc fixtures the regular cc suite covers.

make tcc-boot2 ARCH=aarch64 runs that path end-to-end: cc.scm + tcc.flat.c → tcc-boot2, linking against a cc.scm-built libc.flat.c instead of mes libc. The tcc-cc acceptance suite (make test SUITE=tcc-cc) shows full parity with the gcc-built control on aarch64 and amd64.

Reproducibility

bootprep/stage1-flatten.sh --arch X86_64

Artifacts in build/$ARCH/vendor/tcc/:

File	Size	Built by	What it is
`tcc.flat.c`	~600KB	host cc	flattened single-source tcc-0.9.26

build/ is in .gitignore; nothing tracked outside the scripts themselves.

Issues / bugs

tcc 0.9.26 SEGV on large concatenated TU

When ~22+ mes libc files are catm'd into one TU and the chain hits a file with non-trivial inline asm (the trigger we found was linux/x86_64-mes-gcc/_exit.c), tcc-0.9.26 crashes mid-compile. Below that threshold all combinations work. Each individual file compiles fine.

The likely cause is accumulator state inside tcc (symbol table, hash chain, or similar) that overflows or becomes corrupted once the TU grows large enough.

Workaround: compile each .c file separately, then archive the objects with ar. The boot pipeline does this for the mes libc and musl per-file sweeps. As a bonus, it avoids redundant header re-parses and is faster overall.

Confirmed in canonical live-bootstrap. The tools/diag-livebootstrap-qemu.sh diagnostic runs upstream live-bootstrap's amd64 pass1 chain inside the same busybox + linux/amd64 QEMU we use, and its mescc-built tcc-mes SEGVs at exactly this step (tcc-mes -c unified-libc.c with assert fail: 0 then SIGSEGV). The per-file workaround is load-bearing for any tcc-0.9.26-on-QEMU build, not specific to our path.

riscv64 integer-width codegen

RV64 keeps 32-bit values in sign-extended register form, including unsigned int; an unsigned value is zero-extended only when it is widened to a 64-bit C type. LW for an unsigned 32-bit lvalue is therefore intentional. Changing it to LWU breaks comparisons against the backend's sign-extended 32-bit constants.

The widening direction is handled by riscv64-cvt-int-zext and riscv64-gen-cvt-sxtw: signed int widens with ADDIW, while unsigned int widens with SLLI 32; SRLI 32.
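The register convention described above can be modeled on plain 64-bit integers. The following is a minimal sketch, not backend code: the `rv_*` names are hypothetical, and each function only imitates the documented effect of the named instruction (LW and ADDIW sign-extend the low word; LWU and the SLLI 32; SRLI 32 pair zero-extend it; SLTU compares all 64 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Canonical RV64 form of a 32-bit value: low word sign-extended to 64 bits,
   as LW or ADDIW rd,rs,0 would leave it. Hypothetical model function. */
uint64_t rv_sext32(uint64_t r)
{
    uint64_t lo = r & UINT64_C(0xffffffff);
    /* Portable sign extension using wrapping unsigned arithmetic. */
    return (lo ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* Model of LWU, or of SLLI 32; SRLI 32: zero-extend the low word. */
uint64_t rv_zext32(uint64_t r)
{
    return (r << 32) >> 32;
}

/* Model of SLTU: unsigned compare over all 64 bits. */
int rv_sltu(uint64_t a, uint64_t b)
{
    return a < b;
}

void rv_widen_selfcheck(void)
{
    /* Two sign-extended u32 values keep their unsigned order under SLTU. */
    assert(rv_sltu(rv_sext32(0x7fffffffu), rv_sext32(0x80000000u)) == 1);

    /* Mixing an LWU-style load with a sign-extended constant does not:
       0x80000000 is wrongly reported as less than itself. */
    assert(rv_sltu(rv_zext32(0x80000000u), rv_sext32(0x80000000u)) == 1);

    /* Widening unsigned int to a 64-bit type is where zero-extension belongs. */
    assert(rv_zext32(rv_sext32(0xffffffffu)) == UINT64_C(0xffffffff));
}
```

The second assertion is the failure mode the text warns about: once constants are materialized in sign-extended form, every 32-bit operand has to use the same form or full-width comparisons disagree.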

The narrowing direction needs an explicit operation too. Upstream tcc only relabels an unspilled 64-bit register when casting it to int or unsigned int. The stale upper half can then affect an RV64 relational comparison, whose SLT/SLTU operates on all 64 bits. This is the minimal failing shape:

typedef unsigned int u32;
typedef unsigned long u64;
static u64 ident(u64 x) { return x; }

int main(void)
{
    /* low 32 bits are zero, so the comparison must be true */
    return (u32)ident(0x1234567800000000UL) < 1U ? 0 : 1;
}

Stock codegen compares the unchanged 0x1234567800000000 register and returns false. The riscv64-cvt-int-narrow patch makes gen_cast() emit ADDIW r,r,0 at a 64-to-32-bit boundary, producing the canonical sign-extended low word before any comparison. The regression is tests/cc/341-riscv64-u32-register-narrow.
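The difference between the stock and patched comparison can be reproduced with a small integer model. This is a sketch under the same assumptions as the text (SLTU sees the full register, ADDIW r,r,0 yields the sign-extended low word); `addiw_zero`, `lt_one_stock`, and `lt_one_patched` are hypothetical names, not functions in tcc.

```c
#include <assert.h>
#include <stdint.h>

/* ADDIW rd, rs, 0: keep the low 32 bits and sign-extend them to 64. */
uint64_t addiw_zero(uint64_t r)
{
    uint64_t lo = r & UINT64_C(0xffffffff);
    return (lo ^ UINT64_C(0x80000000)) - UINT64_C(0x80000000);
}

/* (u32)x < 1U as stock codegen evaluates it: the cast only relabels the
   register, so the 64-bit unsigned compare sees the stale upper half. */
int lt_one_stock(uint64_t reg)
{
    return reg < addiw_zero(1);
}

/* The same comparison with the narrowing ADDIW emitted at the cast. */
int lt_one_patched(uint64_t reg)
{
    return addiw_zero(reg) < addiw_zero(1);
}

void narrow_selfcheck(void)
{
    uint64_t reg = UINT64_C(0x1234567800000000);  /* value from the example */
    assert(lt_one_stock(reg) == 0);    /* wrong: low word is 0, 0 < 1 */
    assert(lt_one_patched(reg) == 1);  /* right after narrowing */
}
```

Narrowing with ADDIW rather than a zero-extending pair keeps the result in the same sign-extended form used for every other 32-bit value, so later comparisons against sign-extended constants stay consistent.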

The self-host chain reaches a clean fixed point: tcc0 and tcc1 emit byte-identical tcc.flat.c objects, and boot4 verifies tcc1 == tcc2.

AT-series patches (post-bootstrap uniformity)

These patches go beyond the bootstrap-stub patches in vendor/tcc/patches/ and exist to remove per-arch workarounds in seed-kernel and the build pipeline. They live in the same patch directory and are listed in bootprep/stage1-flatten.sh's apply_our_patch block.

AT.2 — native PT_NOTE for PVH boot

Stock tcc 0.9.26 tags every assembler-created section as SHT_PROGBITS and emits only PT_LOAD phdrs for static EXEs. QEMU's PVH -kernel path on amd64 scans PT_NOTE phdrs for the Xen ELF note of type 18 (XEN_ELFNOTE_PHYS32_ENTRY), which names the 32-bit entry point; without one it errors out ("Error loading uncompressed kernel without PVH ELF Note"). The old workaround was a post-link host tool, elf-pvh-note.c, that rewrote the ELF after tcc finished. AT.2 replaces it with three patches in tccelf.c:

note-section-sht-note — find_section() creates .note* sections as SHT_NOTE instead of SHT_PROGBITS.
pt-note-phdr — elf_output_file() bumps phnum by one when at least one SHT_NOTE+SHF_ALLOC section exists, then fills the reserved phdr slot with a PT_NOTE covering [min(sh_offset), max(sh_offset+sh_size)) over those sections. The bump is gated on actual presence so arches with no note sections (aarch64, riscv64) keep the same phnum and produce byte-identical output.
load-obj-accept-sht-note — tcc_load_object_file()'s accepted-section-type list adds SHT_NOTE. Without this, .o files emitted by the patched find_section (which now produces SHT_NOTE for .note*) get silently skipped during the link; the subsequent .rela.note.* merge then derefs a NULL sm_table[].s for the (skipped) note section. Strict pair with note-section-sht-note.
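The section typing and the PT_NOTE extent described by these patches can be sketched outside tcc. The code below is an illustration under stated assumptions, not the patch itself: `struct sk_section` is a hypothetical, trimmed-down record rather than tcc's Section, the `sk_*` names are invented for the sketch, and the ELF constants are defined locally with their specification values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values from the ELF specification, defined locally so the sketch does
   not depend on a host <elf.h>. */
#define SK_SHT_PROGBITS 1u
#define SK_SHT_NOTE     7u
#define SK_SHF_ALLOC    0x2u

/* Hypothetical section record holding only the fields the sketch needs. */
struct sk_section {
    const char *name;
    uint32_t    sh_type;
    uint64_t    sh_flags;
    uint64_t    sh_offset;
    uint64_t    sh_size;
};

/* Idea behind note-section-sht-note: names starting with ".note" are
   typed SHT_NOTE, everything else keeps SHT_PROGBITS. */
uint32_t sk_type_for_name(const char *name)
{
    return strncmp(name, ".note", 5) == 0 ? SK_SHT_NOTE : SK_SHT_PROGBITS;
}

/* Idea behind pt-note-phdr: returns 1 and stores the half-open file range
   [*lo, *hi) covering every SHT_NOTE+SHF_ALLOC section; returns 0 when no
   such section exists, in which case no extra phdr is reserved. */
int sk_note_range(const struct sk_section *s, size_t n,
                  uint64_t *lo, uint64_t *hi)
{
    int found = 0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i].sh_type != SK_SHT_NOTE || !(s[i].sh_flags & SK_SHF_ALLOC))
            continue;
        uint64_t end = s[i].sh_offset + s[i].sh_size;
        if (!found || s[i].sh_offset < *lo)
            *lo = s[i].sh_offset;
        if (!found || end > *hi)
            *hi = end;
        found = 1;
    }
    return found;
}
```

Returning 0 when nothing matches mirrors the gating described above: a target without note sections never grows its program header table, so its output is unchanged.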

After AT.2 the post-link elf-pvh-note.c tool, the amd64-only branch in boot6.sh, and the amd64 fixup block in boot6-gen-runscm.sh are all gone. The amd64 kernel is emitted ready-to-boot by tcc2.
