commit afba887f0c618419dd2bd013b6326c27d03b9bd9
parent 9e1803cb83d77b9d2501e034d8be1631a25020b0
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Wed, 6 May 2026 11:03:09 -0700
seed-kernel: riscv64 DRIVER=seed end-to-end fixes

Three blockers between boot0 and boot6 under DRIVER=seed riscv64:
- boot0 stage 4 (hex2) panicked because brk_base = g_user_image_end
(16-byte aligned past end-of-image) put the heap on top of hex2's
PC-relative scratch buffer at 0x6007e8: write_string's 8-byte sd-zero
terminator at the buffer tail overwrote the first heap node's str
pointer, and a later lookup then dereferenced the resulting null
pointer. Round brk_base up to the next 4 KiB page in kmain and
sys_spawn (Linux's convention).
- boot4 stage D faulted in mem_cpy storing to 0x10000 because tcc
0.9.26's riscv64-link.c defaults ELF_START_ADDR=0x10000, below
USER_VA_LO=0x200000. Add simple-patch riscv64-elf-start-addr to move
the default to 0x600000, matching the rest of the chain
(M0/hex2pp -B, scheme1, hex2, ...). Belt-and-suspenders:
boot4-gen-runscm.sh and boot5-gen-runscm.sh also pass
-Wl,-Ttext=0x600000 on riscv64 link lines so the chain is robust
even if a future tcc.flat.c regen lands without the patch.
- boot5 (-m 3072M) couldn't read the DTB: OpenSBI placed it at PA
0x13fe00000, beyond the kernel's L2 direct map. Extend
arch/riscv64/mmu.c's kernel direct map to L2 slots 2-7
(PA 0x80000000..0x1ffffffff, 6 GiB), and relocate the device alias
to L2 slot 8 (ARCH_DEVICE_ALIAS_BASE = 1<<33). arch_mmio_ptr in
kernel.S shifts by 33 to match.
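
The slot arithmetic behind the third fix can be sketched in C. This is
only an illustration, assuming the Sv39 layout implied by the satp mode
value 8 in mmu.c, where each root-table ("L2") entry covers 1 GiB. The
helper names l2_slot, in_kernel_direct_map, mmio_alias_va and
check_examples are hypothetical and do not appear in the tree; the
constants are the ones quoted in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the commit message; the macro names are illustrative. */
#define GIB_SHIFT          30                    /* one L2 slot spans 1 GiB */
#define DIRECT_MAP_FIRST   2u                    /* PA 0x80000000           */
#define DIRECT_MAP_LAST    7u                    /* up to PA 0x1ffffffff    */
#define DEVICE_ALIAS_BASE  (UINT64_C(1) << 33)   /* L2 slot 8               */

/* Root-table index for an address: bits 38:30 under Sv39. */
static unsigned l2_slot(uint64_t addr) {
    return (unsigned)((addr >> GIB_SHIFT) & 0x1ffu);
}

/* True if a physical address is reachable through the VA == PA kernel map. */
static int in_kernel_direct_map(uint64_t pa) {
    unsigned slot = l2_slot(pa);
    return slot >= DIRECT_MAP_FIRST && slot <= DIRECT_MAP_LAST;
}

/* Same addition arch_mmio_ptr performs: device PA plus the alias base. */
static uint64_t mmio_alias_va(uint64_t pa) {
    return pa + DEVICE_ALIAS_BASE;
}

/* Hypothetical self-check using the addresses named in this commit. */
static void check_examples(void) {
    assert(l2_slot(UINT64_C(0x13fe00000)) == 4);          /* DTB, -m 3072M */
    assert(in_kernel_direct_map(UINT64_C(0x13fe00000)));
    assert(mmio_alias_va(UINT64_C(0x10000000)) == UINT64_C(0x210000000));
    assert(l2_slot(mmio_alias_va(UINT64_C(0x10000000))) == 8);
}
```

Under these assumptions the DTB address lands in slot 4, which the
previous layout used for the device alias rather than kernel DRAM, and
the UART's aliased VA lands in slot 8.
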
Per-stage validation: boot0..boot6 each pass under DRIVER=seed.
The end-to-end scripts/boot.sh riscv64 run has not yet been repeated
after the fixes.

Drop the SEED-RISCV64-TODO.md working doc; the riscv64 path is at
parity with aarch64 at the per-stage level.
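
For reference, the brk rounding from the first fix amounts to the usual
power-of-two round-up. A minimal sketch, assuming a 4 KiB page; the
names SKETCH_PAGE_SIZE, page_round_up and check_round_up are
hypothetical, and 0x6007f0 is the initial brk address recorded in the
removed TODO doc.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(0x1000)   /* 4 KiB, as in the fix */

/* Round addr up to the next page boundary; aligned inputs are unchanged.
 * Mirrors (brk_base + 0xfffUL) & ~0xfffUL from kernel.c. */
static uint64_t page_round_up(uint64_t addr) {
    return (addr + (SKETCH_PAGE_SIZE - 1)) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical self-check with the addresses from the hex2 panic. */
static void check_round_up(void) {
    /* Old brk_base sat 16-byte aligned just past the image. */
    assert(page_round_up(UINT64_C(0x6007f0)) == UINT64_C(0x601000));
    /* An already page-aligned end-of-image stays where it is. */
    assert(page_round_up(UINT64_C(0x601000)) == UINT64_C(0x601000));
}
```

With the rounded value the heap starts at 0x601000, a different page
from the scratch buffer at 0x6007e8.
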
Diffstat:
10 files changed, 74 insertions(+), 209 deletions(-)
diff --git a/docs/SEED-RISCV64-TODO.md b/docs/SEED-RISCV64-TODO.md
@@ -1,196 +0,0 @@
-# riscv64 seed-kernel TODO
-
-Working doc. Captures the open work needed to get
-`DRIVER=seed ./scripts/boot.sh riscv64` to a clean exit, mirroring the
-aarch64 path. Pairs with `docs/OS.md` (kernel contract) and
-`docs/TCC.md` (compiler).
-
-## Goal
-
-`DRIVER=seed ./scripts/boot.sh riscv64` should run the full
-boot0→boot6 chain entirely *inside* the tcc-built riscv64 seed kernel
-(the kernel is its own build driver, with podman only used to mint
-the very first kernel image). This is the non-negotiable end-to-end
-validation: it exercises every kernel path the chain depends on
-(ELF load, MMU, syscalls, virtio-blk DMA, fork/exec, exit) under
-real workloads — boot4 alone runs ~5000 user-mode tcc invocations
-inside the kernel.
-
-The aarch64 path already passes; this work brings riscv64 to parity.
-
-## What works (May 2026)
-
-- `boot6 riscv64` builds a clean ELF kernel from tcc3, located at
- `build/riscv64/boot6/kernel.elf`. Loads under OpenSBI on
- `qemu-system-riscv64 -machine virt`.
-- Kernel reaches `kmain`, parses DTB (mem 0x80000000), brings up
- virtio-blk, parses cpio, lists tmpfs, loads ELF, erets to user.
-- DRIVER=seed wiring is in place: `boot.sh`, `boot[0-5].sh`,
- `boot6.sh`, `lib-runscm.sh`, `lib-pipeline.sh` all dispatch to
- `qemu-system-riscv64` and the right kernel filename
- (`kernel.elf` vs aarch64's `Image`) when `SEED_ARCH=riscv64`.
-- Boot0 stages 1, 2, 3 run cleanly under DRIVER=seed: the kernel
- runs `hex0-seed`, `hex0`, `hex1` in user mode and SEEDFS-extracts
- the correct outputs. This exercises ELF load, eret_to_user,
- openat/read/write/close/lseek/exit syscalls, and virtio-blk DMA.
-
-## Blocker: boot0 stage 4 user-mode panic
-
-Stage 4 runs `hex2 in/catm.hex2 out/catm` inside the seed kernel.
-Hex2 boots, processes the file, and panics partway through with:
-
-```
-PANIC: user sync, ESR=0x000000000000000d ELR=0x0000000000600730 FAR=0x0000000000000000
-```
-
-— a user-mode load page fault on a null pointer. Same `hex2` binary
-runs correctly under podman/Linux on the same input, so the bug is
-on the seed-kernel side, not in the assembled hex2.
-
-The fault is at `lbu t4, 0(t2)` after `ld t2, 16(t0)`; t2=0 means the
-ld read a zero from memory at `t0+16`. Disassembly says t0 should
-hold s1 (set by `addi t0, s1, 0` four instructions earlier), but the
-trapframe dump on the panic path shows t0=0x6007f0 (initial brk
-address), not s1=0x600905, with no instruction in between that
-modifies t0.
-
-### Investigation so far
-
-One real bug found and fixed: `trap_entry` was clobbering t0/t1
-(using them to read sscratch and to reach `saved_user_sp`) **before**
-saving them to the trapframe. Linux's RISC-V syscall ABI preserves
-all GPRs except a0; user code that holds state in t0/t1 across an
-ecall would otherwise see kernel garbage on return. The fix in
-`seed-kernel/arch/riscv64/kernel.S` reorders the saves: x5 and x6
-are now stashed before any kernel scratch use.
-
-After the fix the dump shows t1=0x6007e8 correctly (user value at
-trap time), but t0 still reads back as 0x6007f0 — so something else
-is going on. Candidates, in order of decreasing likelihood:
-
-1. **More tcc-riscv64 codegen bugs in the kernel itself.** We already
- landed three: `abtol-long-accumulator` (mes-libc), `riscv64-cvt-int-zext`
- + `riscv64-gen-cvt-sxtw` (u32→u64 didn't zero-extend), and
- `riscv64-load-ptr-zext` (lui sign-extending pointer constants).
- The kernel's trap entry/exit asm is the most exercised path; if
- tcc miscompiles any C in trap_sync that touches the trapframe,
- the dump reads garbage.
-2. **Hex2 internal layout I'm misreading without source.** Possible
- but doesn't explain why podman/Linux works on the same bytes.
-3. **A trap_entry recursion I haven't identified** — e.g., a fault
- inside trap_entry's saves that triggers a second pass through and
- overwrites the original trapframe before C sees it.
-
-The dumps to confirm or rule out (1) are checked in but commented
-out; the trap_sync `[sc] nr=…` syscall trace also stays out by
-default to keep boot transcripts short.
-
-## How to repro
-
-```sh
-# One-time prereq (10–15 min):
-DRIVER=podman ./scripts/boot.sh riscv64
-
-# Stash the kernel so the wipe in DRIVER=seed below preserves it:
-mkdir -p build/.seed-bootstrap/riscv64
-cp build/riscv64/boot6/kernel.elf build/.seed-bootstrap/riscv64/
-
-# Reproduce the panic (~3 min into the run, in boot0 stage 4):
-DRIVER=seed ./scripts/boot.sh riscv64
-```
-
-The full per-stage QEMU transcripts land in
-`build/riscv64/.boot0-stage/s04/transcript.txt`.
-
-## Smaller-scope reproducer
-
-Once the chain has been built once, the failing stage can be replayed
-without re-running boot0 stages 1–3:
-
-```sh
-mkdir -p build/.qtest/s04/in
-cp build/riscv64/boot0/hex2 build/.qtest/s04/init
-cp vendor/seed/riscv64/catm.hex2 build/.qtest/s04/in/catm.hex2
-chmod +x build/.qtest/s04/init
-( cd build/.qtest/s04 && { echo init; find in -type f; } | sort -u | \
- cpio -o -H newc 2>/dev/null ) > build/.qtest/s04/in.img
-truncate -s 256M build/.qtest/s04/out.img
-
-qemu-system-riscv64 -machine virt -m 2048M -nographic -no-reboot \
- -global virtio-mmio.force-legacy=false \
- -kernel build/riscv64/boot6/kernel.elf \
- -drive file=build/.qtest/s04/in.img,if=none,format=raw,id=hd0,readonly=on \
- -device virtio-blk-device,drive=hd0 \
- -drive file=build/.qtest/s04/out.img,if=none,format=raw,id=hd1 \
- -device virtio-blk-device,drive=hd1 \
- -append "hex2 in/catm.hex2 out/catm"
-```
-
-To turn on the per-syscall trace and panic-time register dump that
-nailed down the trap_entry bug, restore the prints around
-`trap_sync()` in `seed-kernel/kernel.c` (see git history for the
-exact diagnostics — they were removed before commit to keep
-transcripts clean).
-
-## Rough work plan
-
-1. **Re-add the diagnostic prints under a compile-time flag** so
- they aren't free-text deletes and don't pollute the boot logs by
- default.
-2. **Identify the t0 mismatch.** Most likely path: write a tiny
- trap_entry self-test that has the kernel do a deliberate ecall in
- a known register state and assert tf->x[5] reads back the value
- it was set to before the ecall. If it fails, the bug is in
- trap_entry itself (asm-level); if it passes, the bug is somewhere
- between hex2's PC 0x600720 and 0x600730.
-3. **Walk forward from there.** Each subsequent stage of the chain
- may surface new tcc-riscv64 codegen issues — boot1, boot2's
- scheme1, boot3's cc.scm-built tcc0, boot4's tcc1/2/3 self-host —
- so expect this to be N rounds of *kernel runs → fault → tcc patch
- or kernel asm fix → boot4 rebuild*. The TCC_BOOTSTRAP_RELAX_FIXEDPOINT
- knob in `boot4.sh` is there exactly for this loop: each
- codegen-altering tcc patch needs one extra bootstrap pass before
- tcc2 == tcc3 settles.
-
-## Patches and source changes already landed
-
-- `vendor/mes-libc/patches/abtol-long-accumulator.{before,after}` —
- `int i` → `long i` so `strtoull("0x80200000", …, 16)` returns
- `0x80200000` instead of sign-extending to `0xffffffff80200000`.
- Without this, tcc3 mishandles `-Wl,-Ttext=0x80200000` on the
- riscv64 link line and the resulting ELF is unloadable.
-- `scripts/simple-patches/tcc-0.9.26/riscv64-cvt-int-zext.{before,after}`
- + `riscv64-gen-cvt-sxtw.{before,after}` — make `gen_cvt_sxtw` emit
- `addiw` for signed and `slli;srli` for unsigned, and remove the
- call-site gate that skipped the unsigned case. Without this,
- `(u64)be32(p)` in the seed kernel's DTB parser sign-extends
- cells whose top bit is set, so `mem_start = 0x80000000` reads
- back as `0xffffffff80000000`.
-- `scripts/simple-patches/tcc-0.9.26/riscv64-load-ptr-zext.{before,after}`
- — widens the existing `bt == VT_LLONG` zext check at constant load
- time to also cover `VT_PTR` and `VT_FUNC`. Without this,
- `(u8 *)0x8b000000UL` (kheap_end constant) loads as `0xffffffff8b000000`
- because `lui` always sign-extends bits 63:32.
-- `seed-kernel/arch/riscv64/kernel.S`:
- - Macros now use `.long` (32-bit) not `.word` — tcc 0.9.26's `.word`
- is 16-bit, so the encoded CSR-op constants would be truncated.
- - `SD`/`SW` macros emit base-first (`sd base, src, off`), since
- tcc's riscv64 assembler parses three-comma stores as
- `<rs1>, <rs2>, <imm>` rather than GAS's `<src>, <imm>(<base>)`.
- - `bgeu` offset in the bss-zero loop changed from 12 to 16
- (off-by-one: 12 lands on the `J(1b)` instruction, not the next-stage
- label).
- - `trap_entry` saves x5 (t0) and x6 (t1) **before** any kernel
- scratch use, instead of reading sscratch into t0 first.
-- `scripts/boot4.sh` gains a `TCC_BOOTSTRAP_RELAX_FIXEDPOINT=1`
- escape: codegen-altering tcc patches need a second bootstrap pass
- before `cmp tcc2 tcc3` agrees. The next boot4 run (started from
- the relaxed run's tcc3) settles back to a real fixed point.
-- `scripts/boot6.sh` and `scripts/boot6-gen-runscm.sh` extended
- for amd64 + riscv64; emit the right link base address and ELF
- format per arch.
-- `scripts/lib-runscm.sh` and `scripts/lib-pipeline.sh` dispatch to
- `qemu-system-riscv64` (TCG only — no hvf for riscv on Apple
- Silicon, hence ~10× slower per stage than aarch64) when
- `SEED_ARCH=riscv64`. All the per-stage `boot[0-5].sh` scripts
- pick the correct kernel filename for the active arch.
diff --git a/scripts/boot4-gen-runscm.sh b/scripts/boot4-gen-runscm.sh
@@ -31,6 +31,18 @@ case "$ARCH" in
*) echo "boot4-gen: unknown arch $ARCH" >&2; exit 2 ;;
esac
+# Per-arch link base for user binaries. tcc 0.9.26's riscv64-link.c
+# defaults to ELF_START_ADDR=0x10000, which lives below the seed
+# kernel's USER_VA_LO (0x200000). amd64 (0x400000) and aarch64
+# (0x400000) defaults already sit inside the user window, so we leave
+# them alone. Everywhere else in the chain (M0/hex2pp -B, boot6
+# -Wl,-Ttext) we link riscv64 user binaries at 0x600000; do the same
+# here so tcc-built outputs are loadable inside the seed kernel.
+case "$ARCH" in
+ riscv64) LINK_TTEXT='"-Wl,-Ttext=0x600000"' ;;
+ *) LINK_TTEXT= ;;
+esac
+
# emit_helpers — cc reads .S/.c sources from in/, writes .o to out/.
# cc_path is the cwd-relative path to the spawned compiler binary (in/tcc0
# for round B; out/tcc1, out/tcc2 in later rounds).
@@ -71,7 +83,7 @@ emit_archive() {
emit_link_tcc() {
cc_path=$1; tag=$2; pfx=$3; out=$4
- echo "(must (run \"$cc_path\" \"-nostdlib\" \"out/${pfx}crt1.o\" \"in/tcc.flat.c\" \"out/${pfx}libc.a\" \"out/${pfx}libtcc1.a\" \"out/${pfx}libc.a\" \"-o\" \"out/$out\") \"$tag -> $out\")"
+ echo "(must (run \"$cc_path\" \"-nostdlib\" $LINK_TTEXT \"out/${pfx}crt1.o\" \"in/tcc.flat.c\" \"out/${pfx}libc.a\" \"out/${pfx}libtcc1.a\" \"out/${pfx}libc.a\" \"-o\" \"out/$out\") \"$tag -> $out\")"
}
{
@@ -100,7 +112,7 @@ emit_helpers in/tcc0 tcc0
cat <<EOF
(write-string stdout "boot4: stage C (tcc0 -> tcc1)\n")
-(must (run "in/tcc0" "-nostdlib" "out/start.o" "out/sys_stubs.o" "out/mem.o" "out/libc.o" "out/$LIB_HELPER_OBJ" "in/tcc.flat.c" "-o" "out/tcc1") "tcc0 -> tcc1")
+(must (run "in/tcc0" "-nostdlib" $LINK_TTEXT "out/start.o" "out/sys_stubs.o" "out/mem.o" "out/libc.o" "out/$LIB_HELPER_OBJ" "in/tcc.flat.c" "-o" "out/tcc1") "tcc0 -> tcc1")
(write-string stdout "boot4: stage D (tcc1 -> tcc2)\n")
EOF
@@ -120,11 +132,11 @@ emit_helpers out/tcc2 tcc2
emit_archive out/tcc2 tcc2 "s3-"
emit_link_tcc out/tcc2 tcc2 "s3-" tcc3
-cat <<'EPILOGUE'
+cat <<EOF
(write-string stdout "boot4: linking hello\n")
-(must (run "out/tcc2" "-nostdlib" "out/s3-crt1.o" "in/hello.c" "out/s3-libc.a" "out/s3-libtcc1.a" "out/s3-libc.a" "-o" "out/hello") "tcc2 -> hello")
+(must (run "out/tcc2" "-nostdlib" $LINK_TTEXT "out/s3-crt1.o" "in/hello.c" "out/s3-libc.a" "out/s3-libtcc1.a" "out/s3-libc.a" "-o" "out/hello") "tcc2 -> hello")
(write-string stdout "boot4: ALL-OK\n")
(exit 0)
-EPILOGUE
+EOF
} > "$OUT"
diff --git a/scripts/boot5-gen-runscm.sh b/scripts/boot5-gen-runscm.sh
@@ -47,6 +47,15 @@ CFLAGS_ASM_QUOTED="$CFLAGS_BASE_QUOTED"
CRTFLAGS_C_QUOTED="$CFLAGS_C_QUOTED \"-fno-stack-protector\" \"-DCRT\""
CRTFLAGS_ASM_QUOTED="$CFLAGS_ASM_QUOTED \"-fno-stack-protector\" \"-DCRT\""
+# tcc 0.9.26's riscv64-link.c default ELF_START_ADDR=0x10000 sits below
+# the seed kernel's USER_VA_LO (0x200000); land riscv64 user binaries
+# in the same 0x600000 window the rest of the chain uses. amd64
+# (0x400000) and aarch64 (0x400000) defaults already fit the window.
+case "$MUSL_ARCH" in
+ riscv64) LINK_TTEXT='"-Wl,-Ttext=0x600000"' ;;
+ *) LINK_TTEXT= ;;
+esac
+
{
cat <<'PROLOGUE'
;; boot5 run.scm — drive musl-1.2.5 (~500 TUs) + hello.
@@ -128,7 +137,7 @@ cat <<EOF
(write-string stdout "boot5: stage D (link hello)\n")
;; -Lout pulls libc.a (just built); -Lin pulls libtcc1.a (input).
-(must (run "in/tcc" "-static" "-nostdinc" "-nostdlib" "-include" "in/tcc-stdarg-bridge.h"
+(must (run "in/tcc" "-static" "-nostdinc" "-nostdlib" "-include" "in/tcc-stdarg-bridge.h" $LINK_TTEXT
"-I$CIN/include" "-I$CIN/arch/$MUSL_ARCH" "-I$CIN/arch/generic" "-I$CIN/obj/include"
"out/crt1.o" "in/hello.c" "-Lout" "-lc" "-Lin" "-ltcc1" "-Lout" "-lc" "-o" "out/hello") "link hello")
diff --git a/scripts/simple-patches/tcc-0.9.26/riscv64-elf-start-addr.after b/scripts/simple-patches/tcc-0.9.26/riscv64-elf-start-addr.after
@@ -0,0 +1,8 @@
+/* Stock tcc 0.9.26 defaults statically-linked riscv64 binaries to
+ `addr = 0x00010000`, which sits below the seed kernel's user VA
+ window (USER_VA_LO = 0x00200000). Land tcc-emitted ELFs at the
+ same 0x600000 the rest of the chain (M0/hex2pp -B, scheme1, hex2,
+ ...) uses, so sys_spawn's load_elf can copy segments into the
+ user pool without falling through to the unmapped low-PA range.
+ amd64 (0x400000) and aarch64 (0x400000) defaults already fit. */
+#define ELF_START_ADDR 0x00600000
diff --git a/scripts/simple-patches/tcc-0.9.26/riscv64-elf-start-addr.before b/scripts/simple-patches/tcc-0.9.26/riscv64-elf-start-addr.before
@@ -0,0 +1 @@
+#define ELF_START_ADDR 0x00010000
diff --git a/scripts/stage1-flatten.sh b/scripts/stage1-flatten.sh
@@ -258,6 +258,13 @@ apply_our_patch riscv64-cvt-int-zext "$SRC/tccgen.c"
apply_our_patch riscv64-gen-cvt-sxtw "$SRC/riscv64-gen.c"
apply_our_patch riscv64-load-ptr-zext "$SRC/riscv64-gen.c"
+# riscv64 ELF default load address — stock tcc lands binaries at
+# 0x10000, below the seed kernel's USER_VA_LO=0x200000. Move the
+# default to 0x600000 so tcc-emitted ELFs slot into the user pool
+# without per-link `-Wl,-Ttext=` overrides. Patch is gated by the
+# stock literal in the before-block, so it no-ops elsewhere.
+apply_our_patch riscv64-elf-start-addr "$SRC/riscv64-link.c"
+
# riscv64 stdarg.h order fix — the upstream `#elif __riscv` branch
# uses `__builtin_va_list` before it's typedef'd. Stock tcc treats
# `__builtin_va_list` as a built-in keyword and forgives the forward
diff --git a/seed-kernel/arch/riscv64/arch.h b/seed-kernel/arch/riscv64/arch.h
@@ -8,7 +8,7 @@
#define ARCH_ELF_MACHINE 0xf3
#define ARCH_ELF_MACHINE_NAME "riscv64"
-#define ARCH_DEVICE_ALIAS_BASE 0x100000000UL
+#define ARCH_DEVICE_ALIAS_BASE 0x200000000UL
#define ARCH_UART0_PA 0x10000000UL
#define ARCH_KERNEL_HEAP_END 0x8b000000UL
diff --git a/seed-kernel/arch/riscv64/kernel.S b/seed-kernel/arch/riscv64/kernel.S
@@ -268,8 +268,10 @@ arch_idle_forever:
.globl arch_mmio_ptr
arch_mmio_ptr:
+ /* Device alias offset = ARCH_DEVICE_ALIAS_BASE = 1 << 33.
+ * Must match the L2 slot picked in arch/riscv64/mmu.c. */
li t0, 1
- slli t0, t0, 32
+ slli t0, t0, 33
add a0, a0, t0
RET
diff --git a/seed-kernel/arch/riscv64/mmu.c b/seed-kernel/arch/riscv64/mmu.c
@@ -48,10 +48,23 @@ void arch_setup_mmu(void) {
fill_user_l1(0);
l2_root[0] = pte((u64)l1_user, PTE_V);
- l2_root[1] = pte(0x40000000UL, DFLAGS);
- l2_root[2] = pte(0x80000000UL, KFLAGS);
- l2_root[3] = pte(0xc0000000UL, KFLAGS);
- l2_root[4] = pte(0x00000000UL, DFLAGS);
+ l2_root[1] = pte(0x40000000UL, DFLAGS);
+ /* VA == PA identity map for kernel-side DRAM, slot per 1 GiB.
+ * Covers PA 0x80000000 .. 0x1ffffffff (6 GiB), enough for QEMU
+ * virt configurations up to ~6 GiB of RAM. The DTB lives near
+ * the top of physical RAM (e.g. 0x13fe00000 with -m 3072M); we
+ * read it via VA = dtb_phys, so it must fall inside this map. */
+ l2_root[2] = pte(0x80000000UL, KFLAGS);
+ l2_root[3] = pte(0xc0000000UL, KFLAGS);
+ l2_root[4] = pte(0x100000000UL, KFLAGS);
+ l2_root[5] = pte(0x140000000UL, KFLAGS);
+ l2_root[6] = pte(0x180000000UL, KFLAGS);
+ l2_root[7] = pte(0x1c0000000UL, KFLAGS);
+ /* Device alias: VA ARCH_DEVICE_ALIAS_BASE + PA → PA, used by
+ * arch_mmio_ptr / arch_console_putc to reach UART + virtio-mmio
+ * regs whose low-PA addresses overlap the user pool L1 slots.
+ * Lives in L2 slot 8, directly above the direct-map kernel window. */
+ l2_root[8] = pte(0x00000000UL, DFLAGS);
riscv_set_sum();
riscv_write_satp(((u64)8 << 60) | ((u64)l2_root >> 12));
diff --git a/seed-kernel/kernel.c b/seed-kernel/kernel.c
@@ -1123,8 +1123,15 @@ static i64 sys_spawn(struct trapframe *tf, const char *path, char **argv) {
return -ENOEXEC;
}
- /* Reset brk above the new image's end-of-bss. */
+ /* Reset brk above the new image's end-of-bss, page-aligned up.
+ * Some seed binaries (e.g. riscv64 hex2) embed PC-relative scratch
+ * buffers in the bytes just past their loaded image and assume brk
+ * lives a full page beyond — Linux rounds brk to PAGE_SIZE. If we
+ * placed the heap immediately after the image (16-byte aligned), a
+ * write through the in-binary scratch overlaps the first heap node
+ * and silently corrupts the user's data structures. */
brk_base = g_user_image_end ? g_user_image_end : USER_VA_LO;
+ brk_base = (brk_base + 0xfffUL) & ~0xfffUL;
brk_cur = brk_base;
/* Build new user stack at top of user VA window. */
@@ -1505,7 +1512,9 @@ void kmain(u64 dtb_phys) {
* end-of-bss (g_user_image_end, set by load_elf). 16 MB reserved at
* the top for the user stack. */
u64 ustack_top = USER_VA_HI;
+ /* See sys_spawn for why brk_base is page-rounded above end-of-image. */
brk_base = g_user_image_end ? g_user_image_end : USER_VA_LO;
+ brk_base = (brk_base + 0xfffUL) & ~0xfffUL;
brk_cur = brk_base;
brk_max = USER_VA_HI - 0x01000000UL;