commit e21919d9ee8fa9e95220ff98f3fb11f49f35cee3
parent b797882f164a5440762926256347aed75997d2ef
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 4 May 2026 14:12:53 -0700

Fix LP64 constants for boot4 musl

Diffstat:
5 files changed, 136 insertions(+), 303 deletions(-)
diff --git a/docs/MUSL.md b/docs/MUSL.md
@@ -1,319 +1,115 @@
-# boot4 — building musl with the boot3 tcc
+# boot4 musl spec
-Working doc. boot3 produces a self-host fixed-point tcc (`tcc2 == tcc3`).
-boot4 takes that compiler and uses it to build [musl
-1.2.5](https://musl.libc.org/) from upstream source plus a small set of
-tcc-compatibility patches, then links and runs a static hello world.
-The harness is wired for amd64, aarch64, and riscv64; **amd64** and
-**riscv64** are verified end-to-end (aarch64 still blocks at link due
-to tcc 0.9.26 codegen bugs). Modeled on the in-image build in
-[/Users/ryan/tmp/musltcc](file:///Users/ryan/tmp/musltcc) but driven
-from this repo's bootstrap and constrained to scratch+busybox.
+`scripts/boot4.sh <arch>` builds a static musl 1.2.5 libc with the
+verified boot3 tcc for the same architecture, then links and runs a
+static hello-world smoke binary. Supported architectures are `amd64`,
+`aarch64`, and `riscv64`; all three are verified end-to-end.
-## Pipeline
-
-```
-build/$ARCH/boot3/tcc3 (verified by boot3)
- │
- │ scripts/boot4.sh amd64
- │ • libtcc1.a: tcc compiles libtcc1.c, alloca86_64.{S,bt.S},
- │ va_list.c into a tcc -ar archive
- │ • patch: apply musl-1.2.5-tcc.patch + scrub deleted
- │ src/complex/, src/{fenv,signal}/x86_64/,
- │ src/math/x86_64/*.c
- │ • configure: CC=tcc AR=true sh ./configure
- │ --target=x86_64-linux-musl --disable-shared
- │ • headers: sed mkalltypes.sed; build syscall.h, version.h
- │ • compile: tcc -c every src/<dir>/*.{c,s,S}, skip-on-fail
- │ • crt: Scrt1.o, crt1.o, rcrt1.o, crti.o, crtn.o
- │ • libc.a: tcc -ar rcs all .o files
- │ • hello: tcc -static crt1.o hello.c -lc -ltcc1 -lc
- │
- ▼
-build/$ARCH/boot4/{libtcc1.a, libc.a, crt1.o, crti.o, crtn.o, hello}
-```
-
-The container is the same `boot2-scratch:$ARCH` boot3 uses (FROM scratch
-+ busybox, no libc, no /etc).
-
-## Multi-arch status
-
-| arch | dispatch | musl patch | va_list shim | end-to-end |
-|---------|:--:|:--:|:--:|:--:|
-| amd64 | ✓ | ✓ | ✓ | ✓ verified |
-| aarch64 | ✓ | aarch64-targeted patches landed (syscall trampoline, atomic CAS via single-fn LL/SC, get_tp helper, replacement crt_arch.h, replacement __set_thread_area, deletion sweep) + arm64-gen.c VT_CONST\|VT_LVAL store/load fix. Compile reaches **1263/1271**; 8 skips. | mirrors tcc 0.9.26 AAPCS register-save struct, gets past the `va_list` typename | hello starts, runs first printf, then segfaults — open mystery (see below) |
-| riscv64 | ✓ | riscv64-targeted patches landed (mirrors aarch64: syscall trampoline, atomic externs, get_tp helper, replacement crt_arch.h, patched __set_thread_area, deletion sweep). Compile reaches **1268/1271**; 3 skips. | mirrors tcc 0.9.26 riscv64 stdarg.h: `__builtin_va_list = char *`, va_arg as the lp64 pointer-arithmetic macro | ✓ verified |
-
-### aarch64 status after patch round 1 + tcc fix
-
-The first round of musl patches dropped the skip count from 153 to 20.
-Of those 20, 17 were tccgen-level `store(0, (1011, 5130, 0)) / assert
-fail: 0` failures across `__libc_start_main`, `__init_tls`, `abort`,
-the entire `mallocng/*`, `oldmalloc/malloc.c`, `pthread_join`,
-`asctime_r`, etc. — the load-bearing files hello needs to link.
-
-Root cause **found and fixed in tcc**:
-[`scripts/simple-patches/tcc-0.9.26/arm64-{store,load}-const-lvalue.{before,after}`](../scripts/simple-patches/tcc-0.9.26/).
-`arm64-gen.c`'s `store` and `load` handle `VT_CONST | VT_LVAL | VT_SYM`
-(store/load via symbol address) but never plain `VT_CONST | VT_LVAL`
-(via integer address). Trips on `*(volatile T *)addr = v;` patterns
-that fall out of musl's weak-hidden-extern code paths after constant
-folding. x86_64 routes through generic `gen_modrm`, riscv64 has an
-explicit `fr == VT_CONST` branch, arm64 just ran into `printf + assert`.
-The patch mirrors the existing `|VT_SYM` case but materializes the
-address with `arm64_movimm` instead of `arm64_sym`. Regression test:
-`tests/cc/338-literal-addr-deref.c`. After this fix the skip count
-drops from 20 to 8.
-
-### aarch64 status after patch round 2 (current)
-
-Patches added in round 2 to push past the residual asm-shaped issues:
-
-- **`arch/aarch64/atomic_arch.h` redesigned** to expose only `a_cas` /
- `a_cas_p` as externs (plus `a_barrier` / `a_ctz_64` / `a_clz_64`).
- An earlier attempt with extern `a_ll` / `a_sc` deadlooped: the
- function-call boundary between `ldaxr` and `stlxr` clears the
- exclusive monitor on real hardware and on QEMU/Apple Silicon, so
- the LL/SC retry loop never made progress. musl's
- `src/internal/atomic.h` derives `a_swap` / `a_fetch_add` / `a_or` /
- `a_and` / `a_inc` / `a_dec` / `a_store` from `a_cas`.
-- **`src/internal/aarch64/atomic.s`** holds the entire LL/SC pair
- inside one call. Two arm64-asm.c phase-2 quirks shape the layout:
- - **forward `b.cond` / `cbz` / `cbnz` to a same-file label** errors
- with `"CONDBR19 reloc unsupported"`,
- - **forward unconditional `b` to a same-file label** silently
- assembles as `b +0` (branch-to-self) — no error, but the function
- becomes an infinite loop.
- Backward branches resolve correctly; branches to extern symbols
- (CALL26/JUMP26) work in either direction. So each function defines
- its exit block BEFORE the function entry, making every conditional
- branch backward.
-- **`src/thread/aarch64/__set_thread_area.s` restored** as a
- replacement (not deletion). Stock musl uses `msr tpidr_el0, x0`;
- arm64-asm.c phase 1+2 doesn't recognize the `msr` mnemonic, so the
- encoding is emitted as a raw `.long`. Without this file,
- `__init_tls` calls undefined `__set_thread_area` whose static-link
- reference gets silently resolved by tcc, then jumps to garbage
- before main runs.
-- **`arch/aarch64/crt_arch.h` simplified** to just `mov x0, sp; b _start_c`
- — drops the `adrp`/`:lo12:_DYNAMIC` sequence (unused for static
- builds), the `and sp, x0, #-16` alignment (`bic` rejects bitmask-
- immediate; Linux/AAPCS already 16-byte-aligns sp at process entry),
- and the `mov x29, #0` / `mov x30, #0` register zeroing (arm64-asm.c
- encodes `mov xN, #imm` as the 32-bit `MOVZ wN, #imm` form, leaving
- upper 32 bits unset — kernel zeroes GPRs at process entry anyway).
-- Two compounding `.word` → `.long` fixes: tcc's `.word` is **2 bytes**
- (gas-style for x86), not 4. Every raw-encoding line in `atomic.s`,
- `get_tp.s`, and `__set_thread_area.s` would have emitted only half
- the instruction, misaligning subsequent function symbols and tripping
- `R_AARCH64_(JUMP|CALL)26 relocation failed (val=…, addr=…)` at
- link.
-
-Result: **1263/1271 compile, 8 skips, libc.a archives at ~2.95 MB,
-hello links at 87 KB.** The 8 skips are the same long-double
-constant-folding files as amd64 (`__cosl.c`, `__sinl.c`, `__tanl.c`,
-`exp2l.c`, `fmaf.c`, `j1f.c`, `pow_data.c`) plus
-`src/thread/__unmapself.c` (inline asm with output operand —
-phase-3-blocked).
-
-### Hello segfault — open mystery
-
-Hello starts up, prints `hello from boot4 (tcc-built musl); argc=4`,
-then segfaults before the second printf. Isolating shows
-`malloc(8)` returns NULL deterministically in some link closures and
-succeeds in others. So far:
-
-- Direct `__syscall(SYS_brk, 0)` works (returns valid break).
-- Direct `__syscall(SYS_mmap, ...)` works (returns valid page).
-- `int main(void) { malloc(8); }` — NULL.
-- `int main(void) { malloc(8); printf(...); }` — still NULL.
-- Same with `extern int *__errno_location(void);` declared — still NULL.
-- Same with `putchar` before malloc — still NULL.
-- Same with extern `__syscall` reference — still NULL.
-- BUT: same after explicitly calling `__syscall(SYS_brk, 0)` once
- before malloc — succeeds. Repeated mallocs after that all succeed.
-
-The malloc trampoline path is right (verified in isolation); the asm
-primitives are right (verified). The trigger isn't an unresolved
-weak-alias `___errno_location` either — adding strong references to
-it doesn't change the behavior. Looks like an actual bug in mallocng's
-first-call init that depends on something subtle about call ordering
-or the kernel's brk-state-on-first-call semantics under QEMU emulation.
-
-**Pursuing root cause; not papering over with a `brk(0)` warm-up call.**
-
-### aarch64 skip taxonomy (pre-patch snapshot — 153 skipped sources)
-
-Compiling each skipped file in isolation and bucketing the first error:
-
-| count | bucket | category |
-|------:|--------|----------|
-| **79** | `pthread_arch.h:4` / `atomic_arch.h:5,73` "ARM64 inline asm operands not implemented yet" | arm64-asm.c phase 3 not started — input/output operand constraint plumbing (`subst_asm_operand`, `asm_compute_constraints`). Same root cause as `pthread_arch.h`'s `__get_tp` (uses `"=r"(tp)`), atomic primitives (`a_ll`/`a_sc`/`a_cas`), and the entire `arch/aarch64/syscall_arch.h` surface (every `__syscallN`). |
-| **30** | `arch/aarch64/<math>.c` "invalid operand reference after %" | parser doesn't accept the `%w0` width-modifier form for 32-bit views of x-registers; phase-3-adjacent. |
-| **17** | `atomic_arch.h:20` "dsb/dmb/isb: expected #imm option" | mnemonic is recognized but the parser wants `#imm`; musl writes `dmb ish` (the named option form). |
-| **17** | `store(0, (1011, 5130, 0))` and `assert fail: f == VT_FLOAT \|\| ...` / `assert fail: 0` | tcc internal codegen / assertion failures on aarch64 — NOT asm-related. Files: TLS init, abort, malloc, locale, errno-via-TLS. Pre-existing tcc 0.9.26 aarch64 codegen bugs. |
-| **5** | "known instruction expected" — scattered: `crt_arch.h`, `clone.s`, `__set_thread_area.s`, `memset.S`, `fenv.s` | mnemonics outside phase 1+2. `crt_arch.h:15` is the load-bearing one (uses `adrp` + `:lo12:` reloc). |
-| **3** | `CONDBR19`/`TSTBR14` reloc unsupported (`b.cond`/`cbz`/`cbnz`/`tbz` to extern targets) | per `docs/TCC-ARM64-ASM.md` phase 2: in-section only; extern targets need entries in `arm64-link.c`. |
-| **2** | `ldp/stp: expected register` (setjmp/longjmp) | parsing of pre/post-indexed forms not yet covered. |
-| **1** | `pow_data.c` "initializer element is not constant" | same long-double constant-folding bug as on amd64. |
-| **1** | crt step: `arch/aarch64/crt_arch.h:15: known instruction expected` | (separate from the 153 — kills the `crt1.o` build.) |
-
-### riscv64 status after patch round 1
+The build runs in `boot2-scratch:$ARCH` (scratch + busybox, no libc, no
+`/etc`) and produces only static artifacts. Dynamic linking and `ldso/`
+are intentionally out of scope.
-riscv64 mirrors the aarch64 strategy: tcc 0.9.26's `riscv64-asm.c`
-has a real upstream assembler for the base ISA, but
-`subst_asm_operand` is a stub (`tcc_error("RISCV64 asm not
-implemented.")`), so every musl inline-asm site with output operands
-fails. The lr/sc atomics, the named `fence rw,rw` form, and a handful
-of pseudo-instructions (`tail`, `j`, `ret`) are also absent.
+## Usage
-Patches added (mirrors the aarch64 set):
-
-- `arch/riscv64/syscall_arch.h` — static `__inline` wrappers calling
- one variadic `__syscall` trampoline
-- `src/internal/riscv64/syscall.s` — C-ABI → kernel-ABI shuffle, `ecall`
-- `arch/riscv64/pthread_arch.h` — `__get_tp` extern
-- `src/internal/riscv64/get_tp.s` — `mv a0, tp; jalr x0, x1, 0`
-- `arch/riscv64/atomic_arch.h` — `a_barrier` / `a_cas` / `a_cas_p` as externs
-- `src/internal/riscv64/atomic.s` — `lr.w/d.aqrl`, `sc.w/d.aqrl`,
- `bne ±12`, `fence rw, rw` as raw `.word` encodings; control flow as
- plain mnemonics
-- `arch/riscv64/crt_arch.h` — drop `tail` / `.option push/norelax/pop`
- / `lla gp, __global_pointer$` (static-only, GP relaxation
- unnecessary); pass `_DYNAMIC = NULL`; tail-call via `jal x0, _start_c`
-- `src/thread/riscv64/__set_thread_area.s` — replace the `ret`
- pseudo with `jalr x0, x1, 0` (other two instructions are stock).
- This file is on the `__init_tp` startup path; without it the
- generic C fallback returns -ENOSYS (no `SYS_set_thread_area` on
- riscv64) and `a_crash()` fires before main runs.
-
-Plus the per-arch va_list shim. tcc 0.9.26's stock riscv64 `stdarg.h`
-spells `__builtin_va_list` as `char *` and implements `va_arg` as the
-lp64 pointer-arithmetic macro (no helper-call required, unlike the
-amd64 path). The shim mirrors that exactly so musl's `<stdarg.h>` /
-`bits/alltypes.h` typedefs and macros resolve under `-nostdinc`.
-
-Plus `boot4.sh` deletion sweep: `src/math/riscv64/*.c` (FPU inline asm
-with `"=f"` constraints — portable C in `src/math/` takes over),
-`src/fenv/riscv64/*`, `src/setjmp/riscv64/*.S`, `src/signal/riscv64/*.s`,
-the remaining `src/thread/riscv64/*.s` (clone, syscall_cp,
-__unmapself), `src/process/riscv64/vfork.s` — all use displacement
-load/store syntax (`sd rs, off(rd)`), `csr*` mnemonics, or the
-missing pseudos (`j`, `ret`). libc.a will lack clone, syscall_cp,
-setjmp/longjmp/sigsetjmp, vfork, fenv, restore — fine for hello.
-
-Result: **1268/1271 sources compile**, libc.a 2.77 MB, hello 69 KB,
-runs in scratch+busybox container. The 3 remaining skips are:
-
-- `src/math/log.c`, `src/math/pow_data.c` — long-double constant-
- initializer folding (same tcc 0.9.26 bug that skips 11 files on
- amd64).
-- `src/thread/__unmapself.c` — `arch/riscv64/reloc.h`'s `CRTJMP`
- macro is `__asm__("mv sp, %1 ; jr %0" : : "r"(pc), "r"(sp) :
- "memory")`, which needs `subst_asm_operand`. Not on the hello
- path; called only when threads exit.
-
-Cleaner residual than amd64 (3 vs 11) and dramatically cleaner than
-aarch64 (3 vs 20) — riscv64 has no equivalent of aarch64's tcc
-codegen-bug bucket, so all asm-shaped failures cleared with the
-patch round.
+```sh
+scripts/boot3.sh <amd64|aarch64|riscv64>
+scripts/boot4.sh <amd64|aarch64|riscv64>
+```
 ## Inputs
-| Path | Contents |
-|------|----------|
-| `build/amd64/boot3/tcc3` | boot3's verified self-host tcc |
-| `build/tcc/X86_64/tcc-0.9.26-1147-gee75a10c/{lib,include}` | tcc lib + headers, staged by `stage1-flatten.sh` |
-| `vendor/upstream/musl-1.2.5.tar.gz` | pristine upstream musl tarball |
-| `vendor/upstream/musl-1.2.5-tcc.patch` | tcc-compat patch (3145 lines, 93 files) |
-| `scripts/boot4-musl-shim.h` | `__builtin_va_list` shim (see below) |
-
-## The musl patch
-
-The patch is the unified diff between upstream musl-1.2.5 and the
-pre-modified tree under `/Users/ryan/tmp/musltcc/musl-1.2.5/`. Most of
-it is deletions; the meaningful additions/modifications are:
+| Path | Purpose |
+|------|---------|
+| `build/$ARCH/boot3/tcc3` | fixed-point self-host tcc from boot3 |
+| `build/tcc/$TCC_TARGET/tcc-0.9.26-1147-gee75a10c/{include,lib}` | staged tcc headers and libtcc1 sources |
+| `vendor/upstream/musl-1.2.5.tar.gz` | pristine upstream musl source |
+| `vendor/upstream/musl-1.2.5-tcc.patch` | tcc-compat musl patch |
+| `scripts/boot4-musl-shim-$ARCH.h` | per-arch `__builtin_va_list` bridge |
-| File | Change |
-|------|--------|
-| `arch/x86_64/syscall_arch.h` | replace inline-asm syscalls with calls to a pure-asm trampoline (tcc lacks GCC's register-asm-variable extension) |
-| `src/internal/x86_64/syscall.s` | new SysV-ABI → kernel-ABI shim called by the new syscall_arch.h |
-| `src/include/features.h` | redefine `weak_alias()` as `.weak`/`.set` directives + an extern decl, since tcc ignores `__attribute__((alias(...)))` |
-| `src/internal/syscall.h`, `src/network/lookup*.{h,c}` | drop C99 `[static N]` array-parameter qualifiers (tcc 0.9.26 doesn't parse them) |
-| `include/complex.h` | stub out — tcc has no `_Complex` |
-| `src/complex/*` (deleted) | empty header makes them irrelevant |
-| `src/{fenv,signal}/x86_64/*.s`, `src/math/x86_64/*.c` (deleted) | drop x86_64 inline-asm overrides — tcc rejects SSE/x87 constraints, `stmxcsr`, x87 tbyte ops; the portable C fallbacks take over |
+Architecture mapping:
-## boot4-musl-shim.h
-
-Pre-included on every musl `.c` translation unit. musl's `stdarg.h` and
-generated `bits/alltypes.h` spell varargs the GCC way:
-
-```c
-typedef __builtin_va_list va_list;
-#define va_start(v,l) __builtin_va_start(v,l)
-```
+| `ARCH` | container platform | `TCC_TARGET` | musl target |
+|--------|--------------------|--------------|-------------|
+| `amd64` | `linux/amd64` | `X86_64` | `x86_64-linux-musl` |
+| `aarch64` | `linux/arm64` | `ARM64` | `aarch64-linux-musl` |
+| `riscv64` | `linux/riscv64` | `RISCV64` | `riscv64-linux-musl` |
-tcc 0.9.26 has no `__builtin_va_list` typename; its own `<stdarg.h>`
-spells the same shape `__va_list_struct[1]`. The shim aliases
-`__builtin_va_list` to that array type and routes the four
-`__builtin_va_*` macros to tcc's intrinsics (`__va_start`, `__va_arg`,
-`__builtin_frame_address`, `__builtin_va_arg_types`). libtcc1's
-`va_list.c` provides `__va_start` and `__va_arg` at link time.
-
-## Two tcc 0.9.26 traps
-
-1. **`-include` corrupts assembler input.** tcc 0.9.26 prepends the
- contents of `-include` files to `.s`/`.S` inputs as well as `.c`
- inputs, choking the assembler. It still emits a 620-byte ELF with
- no defined symbols and no error — the build looks green and the link
- fails with "undefined symbol memset". boot4 splits CFLAGS into
- `CFLAGS_C` (with `-include`) and `CFLAGS_ASM` (without).
-2. **No `__builtin_va_list`.** Solved by the shim above. Without it,
- the first musl source to pull in `<stdio.h>` errors with `';'
- expected (got "va_list")`.
-
-## Skipped sources
+## Outputs
-11 files are skipped (compile-on-fail in the loop, never reached at
-link time by hello). All of them lean on long-double constant folding
-that tcc 0.9.26 can't do — e.g. `static const long double toint =
-1.5/LDBL_EPSILON;` in `__rem_pio2l.c`:
+`scripts/boot4.sh` writes final artifacts to `build/$ARCH/boot4/`:
-```
-src/math/__rem_pio2l.c src/math/__sinl.c src/math/__tanl.c
-src/math/erfl.c src/math/lgammal.c src/math/modfl.c
-src/math/pow_data.c src/math/powl.c src/math/rintl.c
-src/math/roundl.c src/math/tgammal.c
-```
+| File | Purpose |
+|------|---------|
+| `libtcc1.a` | tcc runtime archive used when linking musl-built programs |
+| `libc.a` | static musl libc archive |
+| `crt1.o`, `crti.o`, `crtn.o` | static startup and init/fini CRT objects |
+| `hello` | static smoke-test ELF linked by boot4 |
-Anything that calls `sinl`, `cosl`, `tanl`, `erfl`, `lgammal`, `powl`,
-or other long-double trig/special functions will fail to link. hello.c
-doesn't use any of them. The musltcc demo (which uses tcc-mob 0.9.28rc)
-does not skip these.
+The staging copy under `build/$ARCH/.boot4-stage/` is disposable.
-## Outputs
+## Pipeline
+1. Copy boot3 `tcc3`, tcc headers, tcc runtime sources, musl tarball,
+ musl patch, and the per-arch shim into `build/$ARCH/.boot4-stage/in`.
+2. Build `libtcc1.a` with the boot3 tcc.
+3. Extract musl, apply `musl-1.2.5-tcc.patch`, and remove unsupported
+ arch-specific override files so portable C fallbacks are selected
+ where possible.
+4. Configure musl with `CC=$TCC AR=true RANLIB=true`, `--disable-shared`,
+ and `--disable-wrapper`.
+5. Generate `bits/alltypes.h`, `bits/syscall.h`, and `version.h`.
+6. Compile all selected musl sources. Sources that fail to compile are
+ skipped and reported; boot4 requires the remaining closure to archive,
+ link, and run hello.
+7. Build CRT objects, archive `libc.a`, link static `hello`, and execute
+ it inside the target container.
+
+Assembler inputs must not receive the va-list shim. tcc 0.9.26 applies
+`-include` to `.s`/`.S` as well as `.c`, so boot4 keeps separate
+`CFLAGS_C` and `CFLAGS_ASM`.
+
+## Compatibility Surface
+
+The musl patch keeps upstream musl mostly intact and replaces only the
+surfaces tcc 0.9.26 cannot compile:
+
+| Area | Rule |
+|------|------|
+| syscalls | replace GCC register-asm-variable wrappers with per-arch asm trampolines |
+| atomics / thread pointer | replace inline asm operands with extern asm helpers on aarch64/riscv64 |
+| weak aliases | implement `weak_alias` via assembler `.weak`/`.set` directives |
+| C99 array parameters | remove `[static N]` qualifiers tcc does not parse |
+| `_Complex` | stub `complex.h` and remove complex sources |
+| arch asm overrides | delete unsupported fenv, signal, setjmp, thread, string, math overrides as needed |
+| varargs | pre-include `scripts/boot4-musl-shim-$ARCH.h` for C translation units |
+
+Required tcc fixes live under `scripts/simple-patches/tcc-0.9.26/`.
+The musl build depends on the aarch64 literal-address load/store fixes
+and the LP64 `L`-suffix constant fix.
+
+## Status
+
+| arch | result | skipped sources |
+|------|--------|-----------------|
+| `amd64` | verified | 11 |
+| `aarch64` | verified | 8 |
+| `riscv64` | verified | 3 |
+
+Skipped sources are outside the boot4 hello closure. They fall into two
+categories:
+
+- long-double constant-folding files that tcc 0.9.26 cannot compile;
+- thread exit / low-level asm files needing inline-asm operand support.
+
+Anything that references a skipped function may fail to link. The boot4
+contract is a static libc sufficient to link and run the included hello
+smoke program, not full musl conformance.
+
+## Smoke Output
+
+Successful boot4 ends by running:
+
+```text
+hello from boot4 (tcc-built musl); argc=4
+strdup: works, strlen: 5
```
-libtcc1.a ~7 KB tcc runtime: libtcc1.o + alloca86_64.{o,bt.o} + va_list.o
-libc.a ~2.4 MB static musl libc, 1258 .o members
-crt1.o ~1.2 KB static-link entry stub
-crti.o ~830 B _init/_fini head
-crtn.o ~770 B _init/_fini tail
-hello ~55 KB static ELF, runs in container, prints argc + strdup demo
-```
-
-## Caveats / not done
-
-- musl is built **static-only**: `ldso/` is excluded from libc.a — it's
- for the dynamic linker and defines `__init_array_start` which collides
- with what tcc's internal linker synthesizes for `-static` binaries.
-- `compat/time32` skipped — 32-bit time_t aliases, irrelevant on
- x86_64 and produces duplicate-symbol errors.
-- aarch64: blocks at link — see "aarch64 status after patch round 1"
- for the remaining tcc 0.9.26 codegen bugs.
-- riscv64: end-to-end works after patch round 1 (see "riscv64 status
- after patch round 1"). 3 residual skips, none on the hello path.
-- No `make`, `busybox`, or further userland — boot4 stops at hello.
- The musltcc demo continues to GNU make 4.4.1 and busybox 1.36.1; that
- pipeline could plug in here once the libc is solid.
diff --git a/scripts/simple-patches/tcc-0.9.26/lp64-long-constant.after b/scripts/simple-patches/tcc-0.9.26/lp64-long-constant.after
@@ -0,0 +1,6 @@
+ lcount++;
+#if (!defined TCC_TARGET_X86_64 && !defined TCC_TARGET_ARM64 && !defined TCC_TARGET_RISCV64) || defined TCC_TARGET_PE
+ if (lcount == 2)
+#endif
+ must_64bit = 1;
+ ch = *p++;
diff --git a/scripts/simple-patches/tcc-0.9.26/lp64-long-constant.before b/scripts/simple-patches/tcc-0.9.26/lp64-long-constant.before
@@ -0,0 +1,6 @@
+ lcount++;
+#if !defined TCC_TARGET_X86_64 || defined TCC_TARGET_PE
+ if (lcount == 2)
+#endif
+ must_64bit = 1;
+ ch = *p++;
diff --git a/scripts/stage1-flatten.sh b/scripts/stage1-flatten.sh
@@ -140,6 +140,12 @@ apply_our_patch getcwd-stub "$SRC/tccgen.c"
 apply_our_patch ldexp-stub "$SRC/tccpp.c"
 apply_our_patch date-time-stub "$SRC/tccpp.c"
 apply_our_patch lex-char-unsigned "$SRC/tccpp.c"
+
+# LP64 constants: upstream's parser treats one `L` suffix as 64-bit
+# only on x86_64. ARM64/RISCV64 are LP64 too; without this, `-4096UL`
+# is zero-extended from 32 bits and musl's __syscall_ret rejects valid
+# high mmap addresses as errors.
+apply_our_patch lp64-long-constant "$SRC/tccpp.c"
 apply_our_patch elfinterp-stub "$SRC/tccelf.c"
 # x86_64 static-link PLT32 collapse: under BOOTSTRAP we force
diff --git a/tests/cc/339-lp64-unsigned-long-constant.c b/tests/cc/339-lp64-unsigned-long-constant.c
@@ -0,0 +1,19 @@
+/* LP64 integer suffix regression.
+ *
+ * tcc 0.9.26 treated one `L` suffix as 64-bit only for x86_64. On
+ * aarch64/riscv64 that made `-4096UL` become 0x00000000fffff000
+ * instead of 0xfffffffffffff000, which broke musl's __syscall_ret:
+ * valid high mmap addresses were classified as syscall errors.
+ */
+
+int main(void)
+{
+    unsigned long u = -12UL;
+    unsigned long threshold = -4096UL;
+    unsigned long high_user_addr = 0x0000ffff00000000UL;
+
+    if ((long)u != -12L) return 1;
+    if ((long)threshold != -4096L) return 2;
+    if (high_user_addr > threshold) return 3;
+    return 0;
+}
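
Note on the failure mode: the comments in this commit describe `-4096UL` being evaluated in 32 bits and then zero-extended. The sketch below is a minimal model of that arithmetic using fixed-width types, so it behaves the same on any compiler. The names `threshold_buggy`, `threshold_fixed`, `looks_like_error`, and `model_selfcheck` are hypothetical and exist only for illustration. The comparison has the same shape as the `r > -4096UL` range check in musl's `__syscall_ret`, and the sample address is the one used in `tests/cc/339-lp64-unsigned-long-constant.c`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the pre-fix behavior: the `UL` literal is typed
 * as 32-bit unsigned, the negation wraps in 32 bits, and the result is
 * zero-extended when widened to 64 bits. */
uint64_t threshold_buggy(void)
{
    uint32_t narrow = (uint32_t)(0u - 4096u);   /* 0xfffff000 */
    return (uint64_t)narrow;                    /* 0x00000000fffff000 */
}

/* Hypothetical model of the post-fix behavior: the literal is 64-bit
 * unsigned on an LP64 target, so the negation wraps in 64 bits. */
uint64_t threshold_fixed(void)
{
    return (uint64_t)0 - (uint64_t)4096;        /* 0xfffffffffffff000 */
}

/* Same shape as the range check in musl's __syscall_ret: a raw syscall
 * return value above the threshold is interpreted as -errno. */
int looks_like_error(uint64_t r, uint64_t threshold)
{
    return r > threshold;
}

/* Assumed scenario: mmap returns a high but valid user address. */
void model_selfcheck(void)
{
    const uint64_t high_user_addr = 0x0000ffff00000000ULL;

    /* With the truncated threshold the valid address is misclassified. */
    assert(looks_like_error(high_user_addr, threshold_buggy()));

    /* With the 64-bit threshold it is accepted, */
    assert(!looks_like_error(high_user_addr, threshold_fixed()));

    /* while a genuine -errno value such as -12 is still rejected. */
    assert(looks_like_error((uint64_t)-12, threshold_fixed()));
}
```

Calling `model_selfcheck()` from a `main` should complete without tripping an assertion. The regression test in this commit cannot be written with fixed-width casts like these, because the bug lies in how the compiler types the literal itself; that is presumably why it spells `-12UL` and `-4096UL` directly.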