commit 930ba4a2e8118f8b48f5d222312a01d0e011b05a
parent 7bb17ff74249cc529f25e34b5b31b0e5cb3f520e
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 4 May 2026 08:10:14 -0700
docs: document riscv64 u32-narrowing limitation in TCC-TODO
tests/cc/335-ternary-merge-arith-conv fails on riscv64 through both
tcc-cc[stage2] and tcc-cc[stage3] — identical behavior, so the bug
is in tcc's RISC-V codegen rather than in our pipeline. The proximate
load-instruction fix (LW → LWU for unsigned 4-byte loads in
riscv64-gen.c::load()) is real, but applying it alone regresses
017-int-arith and 128-cast-signedness because stock tcc has
compensating sign-extension on the immediate-load and 64-bit-compare
paths. Documents the three coupled pieces a real fix needs and why
this is out of scope for simple-patches/.
Diffstat:
M docs/TCC-TODO.md | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 58 insertions(+), 4 deletions(-)
diff --git a/docs/TCC-TODO.md b/docs/TCC-TODO.md
@@ -157,9 +157,63 @@ suite shouldn't depend on:
codes instead of calling `sys_write`/`strlen`. Plain `tests/cc`
fixtures must not need stdio/libc helpers.
+## Known limitations
+
+### riscv64: u32 narrowing leaves dirty upper bits
+
+`tests/cc/335-ternary-merge-arith-conv` fails on riscv64 in both
+`tcc-cc[stage2]` and `tcc-cc[stage3]`. The behavior is identical, so
+the fixed-point property holds and the bug is in tcc's RISC-V codegen,
+not in cc.scm or the P1 pipeline. aarch64 and amd64 are green.
+
+The proximate trigger is in `riscv64-gen.c::load()`:
+
+```c
+func3 = size == 1 ? 0 : size == 2 ? 1 : size == 4 ? 2 : 3;
+if (size < 4 && !is_float(sv->type.t) && (sv->type.t & VT_UNSIGNED))
+ func3 |= 4; // promotes lb→lbu, lh→lhu, but skips lw→lwu
+```
+
+The `func3 |= 4` promotion to the unsigned load variants is gated on
+`size < 4`, so a 4-byte unsigned load uses LW (sign-extending) instead
+of LWU (zero-extending). `gen_cast` to `VT_INT|VT_UNSIGNED` from a
+wider source emits no narrowing; it relies on the use-time load to
+truncate, but LW fills bits 32–63 with copies of bit 31 rather than
+zeros, so the upper half stays dirty. `(u32)x`, where `x` is a `u64`
+with bit 31 set, then evaluates with bits 32–63 all set (for example
+`0xFFFFFFFFFFFFFFFF` when the low word of `x` is `0xFFFFFFFF`) instead
+of being zero-extended. The same bug is present in upstream tcc mob.
+
+**Why the one-line patch isn't enough.** Widening the gate to
+`size <= 4` (so 4-byte unsigned loads use LWU) regresses
+`017-int-arith` and `128-cast-signedness`. They were passing only
+because two bugs canceled out: stock tcc on riscv64 also sign-extends
+unsigned 32-bit immediate constants (`LUI`/`ADDI` with a
+bit-31-set value). A comparison between an `unsigned int` variable
+(loaded with sign-extending LW) and an `unsigned int` constant
+(materialized with sign-extending LUI/ADDI) therefore had matching
+dirty upper bits, and `BEQ` saw them as equal. Fixing only the load
+breaks that match: the variable becomes zero-extended while the
+constant stays sign-extended. The compare path is wrong too, because
+`BEQ` compares full 64-bit registers but C semantics require a 32-bit
+comparison for `unsigned int == unsigned int`.
+
+**Full fix shape.** Three coupled pieces: (1) load: emit LWU for
+unsigned 4-byte loads; (2) immediate: clear bits 32–63 when
+materializing an unsigned 32-bit constant with bit 31 set; (3)
+compare: eagerly canonicalize 32-bit-typed values into zero-extended
+or sign-extended form (per `VT_UNSIGNED`) after every op that can
+leave the upper half dirty. The pieces overlap: if values are
+canonicalized at every produce site, the load and immediate fixes
+become two of the many sites that need to do it. This is what
+gcc/clang's RISC-V backends do, and it's beyond the scope of the
+literal-block `simple-patches` mechanism. The options are to file
+upstream or to write a real canonicalization pass.
+
+For now: known limitation, document, move on. The scalar codegen
+elsewhere on riscv64 is fine — only u32 narrowing of a wider source
+trips it.
+
## Next steps
-The cc.scm path is at full parity with the gcc-built control: every
-fixture in `tcc-cc` and `tcc-libc` passes on both. Further bug-hunting
-work is open-ended — surface a misbehavior, write a `tests/cc` fixture
-that locks it, fix.
+The cc.scm path is at full parity with the gcc-built control: every
+fixture in `tcc-cc` and `tcc-libc` passes on both, except for the
+riscv64 limitation noted above. Further bug-hunting work is
+open-ended: surface a misbehavior, write a `tests/cc` fixture that
+locks it, fix.
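
The riscv64 limitation documented in the hunk above can be sketched as a host-side model in portable C. This is not tcc code: the helper names (`sext32`, `model_lw`, `model_lwu`, `model_li_sext`, `model_beq`, `run_checks`) are hypothetical and only mimic the extension behaviour the text attributes to LW, LWU, a sign-extending LUI/ADDI constant, and a 64-bit BEQ.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a 32-bit pattern to 64 bits without relying on
 * implementation-defined signed conversions. */
static uint64_t sext32(uint32_t v) {
    return (v & 0x80000000u) ? ((uint64_t)v | 0xFFFFFFFF00000000ull)
                             : (uint64_t)v;
}

/* Model of RV64 LW: 32-bit load, register holds the sign-extended value. */
static uint64_t model_lw(const uint32_t *p) { return sext32(*p); }

/* Model of RV64 LWU: 32-bit load, register holds the zero-extended value. */
static uint64_t model_lwu(const uint32_t *p) { return (uint64_t)*p; }

/* Assumed model of an LUI/ADDI-materialized 32-bit constant:
 * sign-extended from bit 31, as the TODO entry describes. */
static uint64_t model_li_sext(uint32_t imm) { return sext32(imm); }

/* Model of BEQ: compares the full 64-bit register contents. */
static int model_beq(uint64_t a, uint64_t b) { return a == b; }

/* Walks through the three situations from the text; aborts via assert
 * if the model disagrees with any of them, otherwise returns 1. */
static int run_checks(void) {
    /* (u32)x with a u64 source whose bit 31 is set. */
    uint64_t x = 0x00000001FFFFFFFFull;
    uint32_t slot = (uint32_t)x;      /* 4-byte slot the use-time load reads */
    uint64_t want = 0xFFFFFFFFull;    /* value C semantics require */

    assert(model_lw(&slot) == 0xFFFFFFFFFFFFFFFFull); /* LW: dirty upper bits */
    assert(model_lw(&slot) != want);
    assert(model_lwu(&slot) == want);                 /* LWU: correct */

    /* unsigned int variable compared against an unsigned int constant. */
    uint32_t k = 0x80000000u;
    uint32_t var = k;

    /* Stock behaviour: both sides sign-extended, so BEQ agrees. */
    assert(model_beq(model_lw(&var), model_li_sext(k)));
    /* Load-only fix: variable zero-extended, constant not, BEQ disagrees. */
    assert(!model_beq(model_lwu(&var), model_li_sext(k)));
    /* Load and immediate both zero-extended: BEQ agrees again. */
    assert(model_beq(model_lwu(&var), (uint64_t)k));
    return 1;
}
```

Calling `run_checks()` from a `main` exercises all three cases. The middle assertion in the comparison group corresponds to the regression the TODO entry reports when only the load is patched.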