Generalize VM execution into the shared exec seam + a hosted test suite - kit

commit 30367860c674c288dc894fe10c8595aa40519149
parent 040f0c2838b6b40acd1c3f38b981cc7efedadd2d
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Fri,  5 Jun 2026 09:35:39 -0700

Generalize VM execution into the shared exec seam + a hosted test suite

Lift hosted-VM execution out of test/toy/vm.sh into the shared runner
test/lib/exec_target.sh, and build a unified cross-OS hosted test suite on top
of it (test/hosted), green across the full support set.

Shared exec seam (test/lib/exec_target.sh + new test/lib/exec_vm.sh):
- Add <arch>-freebsd / <arch>-windows tags whose runner is a VM. A VM is
  expensive + stateful, so add lifecycle hooks the stateless podman/qemu runners
  don't need: exec_target_setup boots the VM lazily and once (one Windows VM
  serves both arches), keeps it warm across flushes, and exec_target_teardown_all
  (trap ... EXIT) stops only VMs we booted.
- Add <arch>-linux-glibc tags routing to per-arch Debian images, so glibc and
  musl (<arch>-linux -> alpine) both run through one flush; _exec_target_arch
  now takes the first field of 3-part tags, and the image-present check is
  memoized per image.
- scripts/{freebsd,windows}_vm.sh run-batch gains a results contract: ship a
  staging dir in, run a generated run-remote.{sh,ps1}, bring each binary's
  rc/out/err back (FreeBSD bidirectional tar; Windows scp-back + Defender
  exclusion + Start-Process capture). Windows rc masked to 8 bits.
- Repoint test/toy/vm.sh at the seam (compile + queue only; ~140 lines lighter).
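
A minimal sketch of the VM lifecycle rule above, under stated assumptions: the
VM table is simulated, and vm_setup, vm_teardown_all, vm_is_up, vm_boot, vm_stop
and vm_for_tag are hypothetical stand-ins, not the functions in
test/lib/exec_vm.sh. The boot/stop calls only record state; they do not drive
scripts/{freebsd,windows}_vm.sh. It shows the two invariants: a VM boots at most
once (both Windows tags share one VM), and teardown stops only what this process
booted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of "boot lazily and once, stop only what we booted".
# vm_is_up/vm_boot/vm_stop simulate the real VM scripts by editing RUNNING.
set -u

RUNNING=" "      # space-delimited set of VMs currently up (simulated)
BOOTED_BY_US=""  # VMs this process booted, in boot order

vm_is_up() { case "$RUNNING" in *" $1 "*) return 0 ;; esac; return 1; }
vm_boot()  { RUNNING="$RUNNING$1 "; }
vm_stop()  { RUNNING="${RUNNING/ $1 / }"; }

# Map an exec tag to the VM serving it: one Windows VM serves both arches,
# FreeBSD has one VM per arch.
vm_for_tag() {
  case "$1" in
    *-windows) echo windows ;;
    *-freebsd) echo "freebsd-${1%-freebsd}" ;;
  esac
}

# Boot lazily and at most once. A VM that was already running is reused
# but never recorded as ours, so teardown leaves it alone.
vm_setup() {
  local vm; vm="$(vm_for_tag "$1")"
  vm_is_up "$vm" && return 0
  vm_boot "$vm"
  BOOTED_BY_US="$BOOTED_BY_US $vm"
}

# EXIT-trap hook: stop only the VMs this process booted.
vm_teardown_all() {
  local vm
  for vm in $BOOTED_BY_US; do vm_stop "$vm"; done
  BOOTED_BY_US=""
}
```

In a harness this would pair with `trap vm_teardown_all EXIT`, mirroring the
seam's `trap exec_target_teardown_all EXIT`. Recording ownership at boot time,
rather than checking at exit, is what lets a VM the user already had running
survive the run.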

Hosted suite (scripts/hosted.sh + test/hosted):
- scripts/hosted.sh: one front-end to provision sysroots and cross-compile+link
  across the support set (<os>[-<libc>]-<arch>): prepare/path/triple/tag/cc/doctor,
  wrapping freebsd_sysroot.sh / llvm_mingw_sysroot.sh / the libc extract.sh; macOS
  uses the Xcode SDK via -isysroot.
- test/hosted: seed case hello.c built via hosted.sh and run via exec_target,
  checking exit code + stdout. make test-hosted (Linux+macOS) / test-hosted-vm
  (adds FreeBSD+Windows). 15-config matrix verified green (30 pass/0 fail):
  Linux {aa64,x64,rv64} x {musl-static,musl-dynamic,glibc} + macos-arm64 +
  windows {x64,aarch64} + freebsd {amd64,aarch64,riscv64}.
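
The :run verdict reduces to comparing one binary's result files against the case
oracle (<name>.expected for the exit code, <name>.stdout for stdout). A sketch
assuming the run-batch results layout (<id>.rc/.out/.err flattened into one
results dir); check_run and mask_rc are hypothetical helpers, not code from
test/hosted/run.sh. The mask is redundant with the Windows runner's own 8-bit
masking and is included only to make the arithmetic concrete (0xC0000005,
3221225477, masks to 5).

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn one binary's results into a PASS/FAIL line.
set -u

# Windows exit codes are 32-bit; keeping the low 8 bits yields a
# POSIX-style status comparable across every tag.
mask_rc() { echo $(( $1 & 255 )); }

# check_run RESDIR ID EXPECTED_RC EXPECTED_STDOUT
# Prints "PASS <id>" and returns 0, or prints a FAIL reason and returns 1.
check_run() {
  local res="$1" id="$2" want_rc="$3" want_out="$4" rc out
  [ -f "$res/$id.rc" ] || { echo "FAIL $id: no result"; return 1; }
  rc="$(mask_rc "$(cat "$res/$id.rc")")"
  # $(cat ...) strips trailing newlines on both sides of the comparison.
  out="$(cat "$res/$id.out" 2>/dev/null)"
  if [ "$rc" != "$want_rc" ]; then
    echo "FAIL $id: rc=$rc want $want_rc"; return 1
  fi
  if [ "$out" != "$want_out" ]; then
    echo "FAIL $id: stdout mismatch"; return 1
  fi
  echo "PASS $id"
}
```

Because both sides go through $(cat ...), the single line in hello.stdout
matches printf's newline-terminated output; that mirrors how test/hosted/run.sh
reads the .stdout oracle.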

C frontend (lang/c/parse): recognize __restrict / __restrict__ as keyword
aliases of `restrict`. glibc headers use them as bare GCC keywords and #undef
the fallback macro under __GNUC__, so the preprocessor-macro approach was not
enough; this unblocks compiling against glibc headers on all arches.

No regression: existing exec_target consumers (test-parse 3840/0) still pass;
test/toy/run.sh and the corpus engine are untouched.

Diffstat:
M doc/TESTING.md  | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
M lang/c/parse/parse.c  | 6 ++++++
M lang/c/parse/parse_priv.h  | 8 ++++++++
M mk/test.mk  | 34 +++++++++++++++++++++++++++++++++-
M scripts/freebsd_vm.sh  | 30 +++++++++++++++++-------------
A scripts/hosted.sh  | 196 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M scripts/windows_vm.sh  | 42 +++++++++++++++++++++++-------------------
A test/hosted/cases/hello.c  | 6 ++++++
A test/hosted/cases/hello.expected  | 1 +
A test/hosted/cases/hello.stdout  | 1 +
A test/hosted/run.sh  | 143 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/lib/exec_target.sh  | 60 +++++++++++++++++++++++++++++++++++++++++++++++++-----------
A test/lib/exec_vm.sh  | 264 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M test/toy/vm.sh  | 376 ++++++++++++++++++++++---------------------------------------------------------

14 files changed, 904 insertions(+), 322 deletions(-)
diff --git a/doc/TESTING.md b/doc/TESTING.md
@@ -273,6 +273,19 @@ qemu-user, then a batched podman container, and reports "no runner" so callers
 SKIP cleanly. It is shared by the Toy cross lane, the hostas-cross lane, and the
 link/smoke/libc harnesses, so cross-exec policy lives in exactly one file.
 
+The same tag interface also covers `<arch>-freebsd` and `<arch>-windows`, whose
+runner is a **VM** (`test/lib/exec_vm.sh`). A VM is expensive to boot and
+stateful, so unlike the stateless podman/qemu runners it has a lifecycle:
+`exec_target_setup` (or the flush itself) boots the VM **lazily and once**
+(one Windows VM serves both arches) and keeps it warm across flushes; `exec_target_teardown_all`
+(installed via `trap … EXIT`) shuts down only VMs we booted — never a VM the user
+already had running. The transport is `scripts/{freebsd,windows}_vm.sh run-batch`,
+which ships a staging dir in, runs a generated entry script, and brings each
+binary's `rc`/`out`/`err` back; Windows exit codes are masked to 8 bits so `.rc`
+is a uniform POSIX-style status across every tag. So a harness gains real
+FreeBSD/Windows execution just by selecting the tag — `test/toy/vm.sh` and the
+hosted suite both do.
+
 ## The Toy corpus as CG-API coverage
 
 The Toy frontend (`lang/toy/`, see [FRONTENDS.md](FRONTENDS.md)) is a small
@@ -303,11 +316,12 @@ corpus + oracle, the hosted counterparts to the freestanding-Linux X lane: it
 links each case against a real OS sysroot (FreeBSD `base.txz` extract via
 `scripts/freebsd_sysroot.sh`, or the llvm-mingw UCRT sysroot) and runs the
 binary on the genuine OS in a VM, so the full hosted path — ABI, CRT startup,
-the platform loader, syscalls/Win32 — is exercised. Per (os, arch) it stages
-every applicable case × opt × link-mode into one dir and drains it in a single
-VM session via the VM scripts' `run-batch` subcommand (`<id> <rc>` per binary,
-joined back to the oracle). FreeBSD covers amd64/aarch64/riscv64 (static +
-dynamic); Windows covers x64 (Prism emulation) + aarch64 on one ARM64 VM. These
+the platform loader, syscalls/Win32 — is exercised. It compiles every applicable
+case × opt × link-mode (so a codegen/link bug is caught even with no VM) and
+executes them through the shared seam (the `<arch>-freebsd`/`<arch>-windows`
+tags + `exec_vm.sh` lifecycle above), joining each exit code back to the oracle.
+FreeBSD covers amd64/aarch64/riscv64 (static + dynamic); Windows covers x64
+(Prism emulation) + aarch64 on one ARM64 VM. These
 are opt-in (`make test-toy-freebsd-vm` / `test-toy-windows-vm` / `test-toy-vm`),
 not in the default set, since they need provisioned VMs + cross sysroots and
 amd64/riscv64 FreeBSD run under slow TCG. Inapplicable cases SKIP (a committed
@@ -389,6 +403,41 @@ A related guard, `test-lib-deps`, asserts `libkit.a`'s set of external
 and that a relocatable link of the library exposes no non-public symbols. This
 keeps the freestanding library's dependency surface from drifting silently.
 
+## Hosted suite (cross-OS build + run)
+
+`test/hosted/` is a unified hosted-execution suite: each C case in
+`test/hosted/cases/*.c` is built for **every (target, link-mode) config in the
+support set** and run on it, checked against an exit-code + stdout oracle. It's
+the principled counterpart to the per-libc lanes above — one corpus, one runner
+seam, many OSes. The seed case is `hello.c`. Two verdicts per (case, config):
+`:build` (compile + link succeeded) and `:run` (correct exit code + stdout). The
+full matrix is 15 configs: Linux {aa64,x64,rv64} × {musl-static, musl-dynamic,
+glibc} + macOS-arm64 + Windows {x64,aarch64} + FreeBSD {amd64,aarch64,riscv64}.
+The libc rides in the exec tag (`<arch>-linux` = musl/alpine,
+`<arch>-linux-glibc` = glibc/debian), so a single flush routes every config to
+its runner (alpine/debian container, native, or VM).
+
+The two pieces it composes are reusable on their own:
+
+- **`scripts/hosted.sh`** — one front-end over every sysroot provisioner. Targets
+  are `<os>[-<libc>]-<arch>` (`linux-{glibc,musl}-{aa64,x64,rv64}`,
+  `freebsd-{amd64,aarch64,riscv64}`, `windows-{x64,aarch64}`, `macos-aarch64`).
+  `prepare` provisions the sysroot (wrapping `freebsd_sysroot.sh` /
+  `llvm_mingw_sysroot.sh` / the libc container `extract.sh`); `cc` compiles+links
+  for a target (adding the right `-target`/`--sysroot`/`-mconsole`/`-isysroot`);
+  `path`/`triple`/`tag`/`doctor` round it out.
+- **`test/lib/exec_target.sh`** runs the result (native / qemu-user / podman /
+  VM), picking the runner from the tag.
+
+Default config set is Linux (all three libc/link shapes) + macOS;
+`KIT_HOSTED_VM=1` (or `make test-hosted-vm`) adds the FreeBSD + Windows VM
+configs. On macOS, Linux binaries run under podman (no qemu-user on a Darwin
+host): musl in the pinned alpine images, glibc in per-arch Debian images
+(`make test-hosted` pulls both). Compiling against glibc headers needs kit's
+`__restrict`/`__restrict__` keyword aliases (glibc uses them as bare GCC
+keywords and `#undef`s the fallback macro). Opt-in (`make test-hosted`), not in
+the default set; a missing sysroot/runner SKIPs, never fails.
+
 ## Bootstrap reproducibility (stage2 == stage3)
 
 The strongest end-to-end correctness signal is that kit can compile itself to a
diff --git a/lang/c/parse/parse.c b/lang/c/parse/parse.c
@@ -1586,6 +1586,12 @@ void parse_c(Compiler* c, Pool* pool, Pp* pp, DeclTable* decls, CG* cg,
   p.sym_asm_alias = kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__asm"));
   p.sym_inline_alias = kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__inline"));
   p.sym_inline_alias2 = kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__inline__"));
+  /* __restrict / __restrict__: GCC keyword spellings of `restrict`. glibc
+   * headers use these as real keywords (and #undef any fallback macro under
+   * __GNUC__), so recognize them in the parser, not just via a pp macro. */
+  p.sym_restrict_alias = kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__restrict"));
+  p.sym_restrict_alias2 =
+      kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__restrict__"));
   p.sym_thread_alias = kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__thread"));
   p.sym_int128 = kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__int128"));
   p.sym_int128_t = kit_sym_intern(p.pool->c, KIT_SLICE_LIT("__int128_t"));
diff --git a/lang/c/parse/parse_priv.h b/lang/c/parse/parse_priv.h
@@ -284,6 +284,8 @@ typedef struct Parser {
   Sym sym_asm_alias;
   Sym sym_inline_alias;
   Sym sym_inline_alias2;
+  Sym sym_restrict_alias;
+  Sym sym_restrict_alias2;
   Sym sym_thread_alias;
   Sym sym_int128;    /* __int128 */
   Sym sym_int128_t;  /* __int128_t */
@@ -425,6 +427,8 @@ static inline CKw ident_kw_inline(const Parser* p, Sym name) {
   if (name == p->sym_asm_alias) return KW_BUILTIN_ASM;
   if (name == p->sym_inline_alias || name == p->sym_inline_alias2)
     return KW_INLINE;
+  if (name == p->sym_restrict_alias || name == p->sym_restrict_alias2)
+    return KW_RESTRICT;
   if (name == p->sym_thread_alias) return KW_THREAD_LOCAL;
   return KW_NONE;
 }
@@ -437,6 +441,10 @@ static inline int is_kw(const Parser* p, const Tok* t, CKw k) {
   if (k == KW_INLINE &&
       (t->v.ident == p->sym_inline_alias || t->v.ident == p->sym_inline_alias2))
     return 1;
+  if (k == KW_RESTRICT &&
+      (t->v.ident == p->sym_restrict_alias ||
+       t->v.ident == p->sym_restrict_alias2))
+    return 1;
   if (k == KW_THREAD_LOCAL && t->v.ident == p->sym_thread_alias) return 1;
   return 0;
 }
diff --git a/mk/test.mk b/mk/test.mk
@@ -171,7 +171,8 @@ DEFAULT_TEST_TARGETS = \
 		test-bootstrap-toy
 
 .PHONY: test $(TEST_TARGETS) windows-ucrt-sysroots \
-    test-toy-vm test-toy-freebsd-vm test-toy-windows-vm
+    test-toy-vm test-toy-freebsd-vm test-toy-windows-vm \
+    test-hosted test-hosted-vm hosted-glibc-images
 
 test: $(DEFAULT_TEST_TARGETS)
 
@@ -1083,6 +1084,37 @@ test-libc-musl-rv64:
 test-libc-glibc-rv64:
 	@$(MAKE) test-libc-glibc KIT_LIBC_ARCHES=rv64
 
+# Hosted test suite (test/hosted/run.sh): build each C case for every
+# (target, link-mode) config in the support set via scripts/hosted.sh and run it
+# through the shared exec seam, checking exit code + stdout. Opt-in (not in the
+# default set). The default config set is Linux (musl static+dynamic + glibc,
+# all 3 arches) + macOS; it provisions the linux sysroots + rt + the per-arch
+# run images (alpine for musl, debian for glibc) here. test-hosted-vm adds the
+# FreeBSD + Windows VM configs.
+HOSTED_LINUX_DEPS = \
+    $(MUSL_SYSROOT_MARKER) $(MUSL_SYSROOT_X64_MARKER) $(MUSL_SYSROOT_RV64_MARKER) \
+    $(GLIBC_SYSROOT_MARKER) $(GLIBC_SYSROOT_X64_MARKER) $(GLIBC_SYSROOT_RV64_MARKER) \
+    rt-aarch64-linux rt-x86_64-linux rt-riscv64-linux
+
+# Per-arch glibc (Debian) run images for the linux-glibc configs. musl uses the
+# pinned alpine images from `make test-images`. Best-effort pull (idempotent).
+HOSTED_GLIBC_IMAGES = \
+    docker.io/arm64v8/debian:bookworm-slim \
+    docker.io/amd64/debian:bookworm-slim \
+    docker.io/riscv64/debian:trixie-slim
+
+hosted-glibc-images:
+	@for img in $(HOSTED_GLIBC_IMAGES); do \
+	  podman image exists "$$img" 2>/dev/null || podman pull "$$img" || \
+	    echo "warn: could not pull $$img (linux-glibc run lanes will SKIP)"; \
+	done
+
+test-hosted: bin $(HOSTED_LINUX_DEPS) test-images hosted-glibc-images
+	@KIT=$(abspath $(BIN)) bash test/hosted/run.sh
+
+test-hosted-vm: bin $(HOSTED_LINUX_DEPS) test-images hosted-glibc-images rt-x86_64-pc-windows rt-aarch64-windows windows-ucrt-sysroots
+	@KIT=$(abspath $(BIN)) KIT_HOSTED_VM=1 bash test/hosted/run.sh
+
 # Fail if libkit.a depends on any external symbol not in the allowlist, or
 # if a relocatable link exposes non-public global definitions.
 # External dependency drift in either direction (new dep, or stale entry) is a
diff --git a/scripts/freebsd_vm.sh b/scripts/freebsd_vm.sh
@@ -39,8 +39,10 @@ commands:
   run <arch> [qemu...]   run VM in foreground with host-only SSH forwarding
   wait-ssh <arch>        wait for SSH and print uname
   ssh <arch> [cmd...]    SSH into a running VM
-  run-batch <arch> <dir> ship a staging dir into the VM, run its run-remote.sh,
-                         and relay its "<id> <rc>" output (used by test-toy)
+  run-batch <arch> <dir> <res>
+                         ship a staging dir into the VM, run its run-remote.sh,
+                         and bring per-binary res/<id>.{rc,out,err} back into
+                         <res> (used by test/lib/exec_vm.sh)
 
 arches:
   amd64 | x64 | aarch64 | arm64 | rv64 | riscv64
@@ -629,24 +631,26 @@ ssh_arch() {
   exec ssh "${args[@]}" "$SSH_USER@127.0.0.1" "$@"
 }
 
-# run_batch ARCH STAGEDIR
-#   Drain a whole directory of staged executables in a single SSH session: tar
-#   the stagedir into a guest tempdir and run its run-remote.sh entry script,
-#   which executes each binary and prints one "<id> <rc>" line. We only forward
-#   that stdout (and the remote rc); the caller joins the lines with its local
-#   manifest. One VM round-trip per (arch) keeps the test-toy VM lane viable
-#   even at hundreds of binaries per arch. The VM must already be reachable
-#   (the caller boots it and waits for SSH).
+# run_batch ARCH STAGEDIR RESULTSDIR
+#   Run a whole staging dir in a single SSH session and bring per-binary results
+#   back. Pure transport: tar the stagedir in on stdin, the guest extracts it and
+#   runs its run-remote.sh entry script (authored by the caller; it writes
+#   res/<id>.{rc,out,err} per binary), then tars res/ back on stdout, which we
+#   extract flat into RESULTSDIR. One VM round-trip per arch. The VM must already
+#   be reachable (the caller boots it and waits for SSH).
 run_batch_arch() {
-  local arch="$1" stagedir="$2" args
+  local arch="$1" stagedir="$2" resultsdir="$3" args
   arch="$(canon_arch "$arch")"
   [ -d "$stagedir" ] || die "run-batch: no such staging dir: $stagedir"
   [ -f "$stagedir/run-remote.sh" ] || die "run-batch: missing $stagedir/run-remote.sh"
+  [ -n "$resultsdir" ] || die "run-batch: results dir required"
   command -v tar >/dev/null 2>&1 || die "run-batch: tar not found on host"
+  mkdir -p "$resultsdir"
   # shellcheck disable=SC2207
   args=($(ssh_args "$arch"))
   tar -C "$stagedir" -cf - . | ssh "${args[@]}" "$SSH_USER@127.0.0.1" \
-    'd=$(mktemp -d /tmp/kit-toy.XXXXXX) && tar -C "$d" -xf - && sh "$d/run-remote.sh"; rc=$?; rm -rf "$d"; exit $rc'
+    'd=$(mktemp -d /tmp/kit-vm.XXXXXX) || exit 1; tar -C "$d" -xf - || exit 1; ( cd "$d" && sh run-remote.sh ) </dev/null >/dev/null 2>&1; tar -C "$d/res" -cf - . 2>/dev/null; rm -rf "$d"' \
+    | tar -C "$resultsdir" -xf - 2>/dev/null || true
 }
 
 doctor() {
@@ -681,7 +685,7 @@ case "$cmd" in
   run) [ $# -ge 2 ] || { usage; exit 2; }; arch="$2"; shift 2; run_arch "$arch" "$@" ;;
   wait-ssh) [ $# -eq 2 ] || { usage; exit 2; }; wait_ssh "$2" ;;
   ssh) [ $# -ge 2 ] || { usage; exit 2; }; arch="$2"; shift 2; ssh_arch "$arch" "$@" ;;
-  run-batch) [ $# -eq 3 ] || { usage; exit 2; }; run_batch_arch "$2" "$3" ;;
+  run-batch) [ $# -eq 4 ] || { usage; exit 2; }; run_batch_arch "$2" "$3" "$4" ;;
   -h|--help|help|"") usage ;;
   *) usage; exit 2 ;;
 esac
diff --git a/scripts/hosted.sh b/scripts/hosted.sh
@@ -0,0 +1,196 @@
+#!/usr/bin/env bash
+# scripts/hosted.sh — one front-end for provisioning sysroots and
+# cross-compiling+linking against them, across kit's hosted support set. Wraps
+# the per-OS provisioners (it does not replace them):
+#   FreeBSD  -> scripts/freebsd_sysroot.sh        (base.txz extract)
+#   Windows  -> scripts/llvm_mingw_sysroot.sh     (llvm-mingw UCRT)
+#   Linux    -> test/libc/{glibc,musl}/extract.sh (podman container extract)
+#   macOS    -> host SDK (native; no cross sysroot)
+#
+# Targets: <os>[-<libc>]-<arch>
+#   linux-glibc-{aa64,x64,rv64}  linux-musl-{aa64,x64,rv64}
+#   freebsd-{amd64,aarch64,riscv64}  windows-{x64,aarch64}  macos-aarch64
+#
+# Commands:
+#   doctor                     support set + what is provisioned here
+#   prepare <target>|all       provision the sysroot for a target
+#   path <target>              print the sysroot dir ("" for macOS native)
+#   triple <target>            print the kit -target triple
+#   tag <target>               print the test/lib/exec_target.sh exec tag
+#   cc <target> [kit-cc-args]  compile+link: kit cc -target T --sysroot S [...]
+#
+# env: KIT (default build/kit), plus the per-OS provisioners' own env.
+
+set -u
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+KIT="${KIT:-$ROOT/build/kit}"
+
+SUPPORT_SET="
+linux-glibc-aa64 linux-glibc-x64 linux-glibc-rv64
+linux-musl-aa64 linux-musl-x64 linux-musl-rv64
+freebsd-amd64 freebsd-aarch64 freebsd-riscv64
+windows-x64 windows-aarch64
+macos-aarch64
+"
+
+die() { printf 'hosted: %s\n' "$*" >&2; exit 1; }
+
+# ---- target parsing --------------------------------------------------------
+# Sets globals: T_OS, T_LIBC (linux only), T_ARCH (canonical per-OS token).
+parse_target() {
+  local t="$1" os rest a
+  os="${t%%-*}"; rest="${t#*-}"
+  T_OS="$os"; T_LIBC=""; T_ARCH=""
+  case "$os" in
+    linux)
+      T_LIBC="${rest%%-*}"; a="${rest#*-}"
+      case "$T_LIBC" in glibc|musl) ;; *) die "bad linux libc in '$t' (want glibc|musl)";; esac
+      case "$a" in
+        aa64|aarch64|arm64) T_ARCH=aarch64 ;;
+        x64|x86_64|amd64)   T_ARCH=x64 ;;
+        rv64|riscv64)       T_ARCH=rv64 ;;
+        *) die "bad linux arch '$a' in '$t'" ;;
+      esac ;;
+    freebsd)
+      case "$rest" in
+        amd64|x64|x86_64)   T_ARCH=amd64 ;;
+        aarch64|arm64|aa64) T_ARCH=aarch64 ;;
+        riscv64|rv64)       T_ARCH=riscv64 ;;
+        *) die "bad freebsd arch '$rest' in '$t'" ;;
+      esac ;;
+    windows)
+      case "$rest" in
+        x64|x86_64|amd64)   T_ARCH=x64 ;;
+        aarch64|arm64|aa64) T_ARCH=aarch64 ;;
+        *) die "bad windows arch '$rest' in '$t'" ;;
+      esac ;;
+    macos)
+      case "$rest" in
+        aarch64|arm64|aa64) T_ARCH=aarch64 ;;
+        *) die "bad macos arch '$rest' in '$t' (arm64 only)" ;;
+      esac ;;
+    *) die "unknown os in target '$t'" ;;
+  esac
+}
+
+triple_of() {
+  parse_target "$1"
+  case "$T_OS" in
+    linux)
+      local suf=gnu; [ "$T_LIBC" = musl ] && suf=musl
+      case "$T_ARCH" in
+        aarch64) echo "aarch64-linux-$suf" ;;
+        x64)     echo "x86_64-linux-$suf" ;;
+        rv64)    echo "riscv64-linux-$suf" ;;
+      esac ;;
+    freebsd)
+      case "$T_ARCH" in
+        amd64)   echo x86_64-freebsd ;;
+        aarch64) echo aarch64-freebsd ;;
+        riscv64) echo riscv64-freebsd ;;
+      esac ;;
+    windows)
+      case "$T_ARCH" in
+        x64)     echo x86_64-windows ;;
+        aarch64) echo aarch64-windows ;;
+      esac ;;
+    macos) echo aarch64-apple-darwin ;;
+  esac
+}
+
+# exec_target tag (test/lib/exec_target.sh). Linux/macOS use aa64/x64/rv64;
+# FreeBSD uses amd64/aarch64/riscv64; Windows uses x64/aarch64.
+tag_of() {
+  parse_target "$1"
+  local ea
+  case "$T_ARCH" in aarch64) ea=aa64 ;; x64) ea=x64 ;; rv64) ea=rv64 ;; *) ea="$T_ARCH" ;; esac
+  case "$T_OS" in
+    linux)   [ "$T_LIBC" = glibc ] && echo "$ea-linux-glibc" || echo "$ea-linux" ;;
+    macos)   echo "$ea-macos" ;;
+    freebsd) echo "$T_ARCH-freebsd" ;;
+    windows) echo "$T_ARCH-windows" ;;
+  esac
+}
+
+_linux_sysroot_dir() { # libc arch
+  local libc="$1" arch="$2" suf=""
+  case "$arch" in x64) suf=-x64 ;; rv64) suf=-rv64 ;; esac
+  printf '%s/build/%s-sysroot%s' "$ROOT" "$libc" "$suf"
+}
+
+path_of() {
+  parse_target "$1"
+  case "$T_OS" in
+    linux)   _linux_sysroot_dir "$T_LIBC" "$T_ARCH" ;;
+    freebsd) "$ROOT/scripts/freebsd_sysroot.sh" path "$T_ARCH" 2>/dev/null ;;
+    windows) "$ROOT/scripts/llvm_mingw_sysroot.sh" path "$T_ARCH" 2>/dev/null ;;
+    macos)   echo "" ;;
+  esac
+}
+
+# ---- prepare ---------------------------------------------------------------
+prepare_one() {
+  parse_target "$1"
+  case "$T_OS" in
+    linux)
+      command -v podman >/dev/null 2>&1 || die "podman required for linux sysroots"
+      bash "$ROOT/test/libc/$T_LIBC/extract.sh" -a "$T_ARCH" ;;
+    freebsd) "$ROOT/scripts/freebsd_sysroot.sh" "$T_ARCH" ;;
+    windows) "$ROOT/scripts/llvm_mingw_sysroot.sh" prepare "$T_ARCH" ;;
+    macos)   printf 'macos: native host SDK, nothing to prepare\n' ;;
+  esac
+}
+
+# ---- cc --------------------------------------------------------------------
+# cc <target> [extra kit cc args...]  — adds -target, --sysroot, and any
+# OS-mandatory flags; everything else (sources, -o, -O, -static, ...) passes
+# through. macOS compiles native (no -target/--sysroot).
+cc_target() {
+  local target="$1"; shift
+  parse_target "$target"
+  [ -x "$KIT" ] || die "kit not found at $KIT (run 'make bin')"
+  local triple sysroot; triple="$(triple_of "$target")"; sysroot="$(path_of "$target")"
+  local args=()
+  if [ "$T_OS" = macos ]; then
+    # Native host compile against the Xcode SDK.
+    local sdk; sdk="$(xcrun --show-sdk-path 2>/dev/null)"
+    [ -n "$sdk" ] && args+=(-isysroot "$sdk")
+  else
+    args+=(-target "$triple")
+    [ -n "$sysroot" ] && args+=(--sysroot "$sysroot")
+    [ "$T_OS" = windows ] && args+=(-mconsole)   # console exit-code semantics
+  fi
+  exec "$KIT" cc ${args[@]+"${args[@]}"} "$@"   # empty-array-safe under set -u (bash 3.2)
+}
+
+# ---- doctor ----------------------------------------------------------------
+doctor() {
+  printf 'host: %s/%s\n' "$(uname -s 2>/dev/null)" "$(uname -m 2>/dev/null)"
+  printf 'kit:  %s%s\n' "$KIT" "$([ -x "$KIT" ] && echo '' || echo ' (missing)')"
+  printf 'support set (sysroot provisioned?):\n'
+  local t sr ok
+  for t in $SUPPORT_SET; do
+    sr="$(path_of "$t")"
+    if [ "$(printf '%s' "$t" | cut -d- -f1)" = macos ]; then ok='native'
+    elif [ -n "$sr" ] && [ -d "$sr" ]; then ok='yes'
+    else ok='no'; fi
+    printf '  %-20s triple=%-22s tag=%-12s sysroot=%s\n' \
+      "$t" "$(triple_of "$t")" "$(tag_of "$t")" "$ok"
+  done
+}
+
+cmd="${1:-}"
+case "$cmd" in
+  doctor) doctor ;;
+  prepare)
+    [ $# -ge 2 ] || die "usage: $0 prepare <target>|all"
+    if [ "$2" = all ]; then for t in $SUPPORT_SET; do prepare_one "$t"; done
+    else prepare_one "$2"; fi ;;
+  path)   [ $# -eq 2 ] || die "usage: $0 path <target>"; path_of "$2" ;;
+  triple) [ $# -eq 2 ] || die "usage: $0 triple <target>"; triple_of "$2" ;;
+  tag)    [ $# -eq 2 ] || die "usage: $0 tag <target>"; tag_of "$2" ;;
+  cc)     [ $# -ge 2 ] || die "usage: $0 cc <target> [kit-cc-args]"; shift; cc_target "$@" ;;
+  -h|--help|help|"") sed -n '2,22p' "$0" | sed 's/^# \{0,1\}//' ;;
+  *) die "unknown command '$cmd' (try: doctor prepare path triple tag cc)" ;;
+esac
diff --git a/scripts/windows_vm.sh b/scripts/windows_vm.sh
@@ -90,8 +90,10 @@ provisioning (single Windows 11 ARM64 VM, serves both arches):
 execution (used by the COFF/PE smoke tests):
   smoke <arch>           run a small probe in the VM
   run <arch> exe [args]  upload exe to the VM, run it, then remove it
-  run-batch <arch> <dir> upload a staging dir's *.exe + run-remote.ps1, run it,
-                         and relay its "<id> <rc>" output (used by test-toy)
+  run-batch <arch> <dir> <res>
+                         upload a staging dir's *.exe + run-remote.ps1, run it,
+                         and bring per-binary res\<id>.{rc,out,err} back into
+                         <res> (used by test/lib/exec_vm.sh)
 
 arches:  x64 | x86_64 | amd64 | aarch64 | arm64 | aa64
 
@@ -825,19 +827,20 @@ run_exe() {
   return "$rc"
 }
 
-# run-batch ARCH STAGEDIR
-#   Drain a whole staging dir in one VM session: upload its *.exe plus the
-#   run-remote.ps1 entry script (authored by the caller), run the script, and
-#   relay its "<id> <rc>" stdout (one line per binary). The caller joins those
-#   lines with its host-side manifest. One upload+run per arch keeps the
-#   test-toy VM lane viable across hundreds of binaries. Mirrors run_exe's
-#   ssh/scp plumbing. The VM must already be reachable.
+# run-batch ARCH STAGEDIR RESULTSDIR
+#   Run a whole staging dir in one VM session and bring per-binary results back.
+#   Pure transport: upload the *.exe + run-remote.ps1 entry script (authored by
+#   the caller; it writes res\<id>.{rc,out,err} per binary), run it, then bring
+#   the res\ dir back, flattened into RESULTSDIR. One upload+run per arch.
+#   Mirrors run_exe's ssh/scp plumbing. The VM must already be reachable.
 run_batch() {
-  local arch="$1" stage="$2" destdir dest_fwd run_ps rc
+  local arch="$1" stage="$2" resultsdir="$3" destdir dest_fwd run_ps tmp
   [ -d "$stage" ] || die "run-batch: no such staging dir: $stage"
   [ -f "$stage/run-remote.ps1" ] || die "run-batch: missing $stage/run-remote.ps1"
+  [ -n "$resultsdir" ] || die "run-batch: results dir required"
   command -v ssh >/dev/null 2>&1 || die "ssh not found"
   command -v scp >/dev/null 2>&1 || die "scp not found"
+  mkdir -p "$resultsdir"
   ssh_setup "$arch"
   destdir="$(remote_mkdir)"
   dest_fwd="${destdir//\\//}"
@@ -848,21 +851,22 @@ run_batch() {
   # an admin session + Defender, but works even with Tamper Protection on
   # (path exclusions are still honored, unlike disabling real-time monitoring).
   remote_ps "try { Add-MpPreference -ExclusionPath \$env:TEMP -ErrorAction Stop } catch {}" >/dev/null 2>&1 || true
-  # Upload only the executables + runner; the host-side manifest/logs stay home.
+  # Upload only the executables + runner; the host-side bookkeeping stays home.
   if ! scp "${SSH_ARGS[@]}" "$stage"/*.exe "$stage/run-remote.ps1" \
         "$SSH_DEST:$dest_fwd/" >/dev/null 2>&1; then
     remote_cleanup "$destdir"
     die "scp upload failed -> $dest_fwd"
   fi
-  # Invoke the runner from the upload dir (it Set-Location's to $PSScriptRoot
-  # and prints "<id> <rc>" lines to stdout, which remote_ps relays verbatim).
+  # Run the runner from the upload dir (it Set-Location's to $PSScriptRoot and
+  # writes res\<id>.{rc,out,err} per binary).
   run_ps="\$ErrorActionPreference='Continue'; & (Join-Path '$(ps_sq "$destdir")' 'run-remote.ps1')"
-  set +e
-  remote_ps "$run_ps"
-  rc=$?
-  set -e
+  remote_ps "$run_ps" >/dev/null 2>&1 || true
+  # Bring res\ back and flatten it into RESULTSDIR.
+  tmp="$(mktemp -d)"
+  scp -r "${SSH_ARGS[@]}" "$SSH_DEST:$dest_fwd/res" "$tmp/" >/dev/null 2>&1 || true
+  [ -d "$tmp/res" ] && cp "$tmp/res/"* "$resultsdir/" 2>/dev/null || true
+  rm -rf "$tmp"
   remote_cleanup "$destdir"
-  return "$rc"
 }
 
 smoke_arch() {
@@ -966,7 +970,7 @@ case "$cmd" in
   stop) powerdown ;;
   smoke) [ $# -eq 2 ] || { usage; exit 2; }; smoke_arch "$2" ;;
   run) [ $# -ge 3 ] || { usage; exit 2; }; arch="$2"; exe="$3"; shift 3; run_exe "$arch" "$exe" "$@" ;;
-  run-batch) [ $# -eq 3 ] || { usage; exit 2; }; run_batch "$2" "$3" ;;
+  run-batch) [ $# -eq 4 ] || { usage; exit 2; }; run_batch "$2" "$3" "$4" ;;
   -h|--help|help|"") usage ;;
   *) usage; exit 2 ;;
 esac
diff --git a/test/hosted/cases/hello.c b/test/hosted/cases/hello.c
@@ -0,0 +1,6 @@
+#include <stdio.h>
+
+int main(void) {
+  printf("hello, world\n");
+  return 0;
+}
diff --git a/test/hosted/cases/hello.expected b/test/hosted/cases/hello.expected
@@ -0,0 +1 @@
+0
diff --git a/test/hosted/cases/hello.stdout b/test/hosted/cases/hello.stdout
@@ -0,0 +1 @@
+hello, world
diff --git a/test/hosted/run.sh b/test/hosted/run.sh
@@ -0,0 +1,143 @@
+#!/usr/bin/env bash
+# test/hosted/run.sh — the hosted test suite: build each C case for every
+# (target, link-mode) config in the support set with scripts/hosted.sh, run it
+# through the shared seam test/lib/exec_target.sh, and check exit code + stdout
+# against the oracle (<name>.expected / <name>.stdout). The first principled
+# cross-OS hosted-exec suite; seed case is cases/hello.c.
+#
+# Full matrix (15 configs):
+#   linux {aa64,x64,rv64} x {musl-static, musl-dynamic, glibc}   (podman: alpine
+#       for musl, debian for glibc; each routed by its exec tag)
+#   macos-aarch64                                                (native)
+#   windows {x64,aarch64}                                        (VM)
+#   freebsd {amd64,aarch64,riscv64}                              (VM)
+#
+# Two verdicts per (case, config): ":build" (cc+link ok) and ":run" (right exit
+# code + stdout). A target whose sysroot is absent SKIPs build; a config with no
+# runner here SKIPs run. The tag carries the libc, so one flush routes every
+# config to its runner.
+#
+# Default config set: Linux + macOS (fast). FreeBSD + Windows (VMs) are added by
+# KIT_HOSTED_VM=1. env: KIT, HOSTED_CONFIGS (override the list), KIT_HOSTED_VM,
+# EXEC_VM_KEEP_UP.
+
+set -u
+
+ROOT="$(cd "$(dirname "$0")/../.." && pwd)"
+KIT="${KIT:-$ROOT/build/kit}"
+HOSTED="$ROOT/scripts/hosted.sh"
+CASES="$ROOT/test/hosted/cases"
+BUILD_DIR="$ROOT/build/test/hosted"
+
+# shellcheck source=../lib/kit_sh_report.sh
+. "$ROOT/test/lib/kit_sh_report.sh"
+# shellcheck source=../lib/exec_target.sh
+. "$ROOT/test/lib/exec_target.sh"
+kit_report_init
+trap exec_target_teardown_all EXIT
+
+[ -x "$KIT" ] || { echo "hosted: kit not found at $KIT (run 'make bin')" >&2; exit 2; }
+
+# exec_target's caller contract for the linux/macos runners (VM tags ignore these).
+have_podman=0; command -v podman >/dev/null 2>&1 && have_podman=1
+QEMU_BIN="${QEMU_BIN:-$(command -v qemu-aarch64 2>/dev/null || true)}"
+QEMU_RV64_BIN="${QEMU_RV64_BIN:-$(command -v qemu-riscv64 2>/dev/null || true)}"
+have_qemu=0; [ -n "$QEMU_BIN" ] && have_qemu=1
+case "$(uname -m 2>/dev/null)" in aarch64|arm64) is_aarch64=1 ;; *) is_aarch64=0 ;; esac
+export have_podman QEMU_BIN QEMU_RV64_BIN have_qemu is_aarch64
+mkdir -p "$BUILD_DIR"
+EXEC_TARGET_MOUNT_ROOT="$BUILD_DIR"; export EXEC_TARGET_MOUNT_ROOT
+
+# ---- config list -----------------------------------------------------------
+# A config is "<target>[:<mode>]". mode (static|dynamic) only varies for musl;
+# glibc is dynamic-only, freebsd is static, windows/macos have one shape.
+LINUX_CONFIGS=""
+for a in aa64 x64 rv64; do
+  LINUX_CONFIGS="$LINUX_CONFIGS linux-musl-$a:static linux-musl-$a:dynamic linux-glibc-$a"
+done
+DEFAULT_CONFIGS="$LINUX_CONFIGS macos-aarch64"
+VM_CONFIGS="freebsd-amd64 freebsd-aarch64 freebsd-riscv64 windows-x64 windows-aarch64"
+if [ -n "${HOSTED_CONFIGS:-}" ]; then
+  CONFIGS="$HOSTED_CONFIGS"
+else
+  CONFIGS="$DEFAULT_CONFIGS"
+  [ "${KIT_HOSTED_VM:-0}" = 1 ] && CONFIGS="$CONFIGS $VM_CONFIGS"
+fi
+
+# Link flags for a (target, mode): musl honors the mode; freebsd is self-contained
+# static; glibc/windows/macos use their default shape (glibc is dynamic-only).
+link_flags_for() {
+  local target="$1" mode="$2"
+  case "$target" in
+    linux-musl-*) [ "$mode" = static ] && echo -static ;;
+    freebsd-*)    echo -static ;;
+    *)            echo ;;
+  esac
+}
+
+# ---- build + queue ---------------------------------------------------------
+H_NAME=(); H_OUT=(); H_RC=(); H_EXP=(); H_STDOUT=()
+
+printf 'hosted: configs=%s\n' "$CONFIGS"
+
+for case_src in "$CASES"/*.c; do
+  cbase="$(basename "${case_src%.c}")"
+  exp=0; [ -f "${case_src%.c}.expected" ] && exp="$(cat "${case_src%.c}.expected")"
+  want_out=""; [ -f "${case_src%.c}.stdout" ] && want_out="$(cat "${case_src%.c}.stdout")"
+  for config in $CONFIGS; do
+    target="${config%%:*}"; mode="${config#*:}"; [ "$mode" = "$config" ] && mode=""
+    os="${target%%-*}"
+    label="$cbase/$config"
+    sr="$("$HOSTED" path "$target" 2>/dev/null)"
+    if [ "$os" != macos ] && { [ -z "$sr" ] || [ ! -d "$sr" ]; }; then
+      kit_skip "$label:build" "missing sysroot (scripts/hosted.sh prepare $target)"
+      continue
+    fi
+    cdir="$BUILD_DIR/$(printf '%s' "$config" | tr ':/' '__')"; mkdir -p "$cdir"
+    ext=""; [ "$os" = windows ] && ext=".exe"
+    exe="$cdir/$cbase$ext"
+    # shellcheck disable=SC2046
+    if ! "$HOSTED" cc "$target" $(link_flags_for "$target" "$mode") "$case_src" -o "$exe" \
+          > "$cdir/$cbase.cc.out" 2> "$cdir/$cbase.cc.err"; then
+      kit_fail "$label:build" "hosted.sh cc failed"
+      sed 's/^/    | /' "$cdir/$cbase.cc.err" | head -20
+      continue
+    fi
+    kit_pass "$label:build"
+
+    tag="$("$HOSTED" tag "$target")"
+    if ! exec_target_supported "$tag"; then
+      kit_skip "$label:run" "no runner for $tag"
+      continue
+    fi
+    exec_target_queue "$tag" "$label" "$exe" \
+      "$cdir/$cbase.out" "$cdir/$cbase.err" "$cdir/$cbase.rc"
+    H_NAME+=("$label:run"); H_OUT+=("$cdir/$cbase.out")
+    H_RC+=("$cdir/$cbase.rc"); H_EXP+=("$exp"); H_STDOUT+=("$want_out")
+  done
+done
+
+# ---- execute + check -------------------------------------------------------
+exec_target_flush
+
+i=0; n="${#H_NAME[@]}"
+while [ "$i" -lt "$n" ]; do
+  name="${H_NAME[$i]}"; exp=$(( ${H_EXP[$i]} & 255 ))
+  rc="$(cat "${H_RC[$i]}" 2>/dev/null || echo 127)"
+  got_out="$(cat "${H_OUT[$i]}" 2>/dev/null || true)"
+  if ! case "$rc" in ''|*[!0-9-]*) false ;; *) true ;; esac; then
+    kit_fail "$name" "did not run (rc=$rc)"
+  elif [ "$(( rc & 255 ))" -ne "$exp" ]; then
+    kit_fail "$name" "expected rc $exp, got $rc"
+  elif [ -n "${H_STDOUT[$i]}" ] && [ "$got_out" != "${H_STDOUT[$i]}" ]; then
+    kit_fail "$name" "stdout mismatch"
+    printf '    want: %s\n    got:  %s\n' "${H_STDOUT[$i]}" "$got_out"
+  else
+    kit_pass "$name"
+  fi
+  i=$((i + 1))
+done
+
+KIT_SKIP_IS_FAILURE=0
+kit_summary test-hosted
+kit_exit
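The hosted suite makes its per-config choices with plain parameter expansion: `${config%%:*}` and `${config#*:}` split `<target>[:<mode>]`, and `link_flags_for` picks the link shape. The sketch below lets you check those rules without a sysroot. It is a standalone copy of that logic, not part of the commit. `split_config` is a hypothetical wrapper around the loop's expansions, and `link_flags_for` is copied as-is.

```bash
#!/usr/bin/env bash
# Standalone copy of the hosted suite's per-config decisions, for checking
# without a sysroot. split_config is a hypothetical helper wrapping the build
# loop's parameter expansions; link_flags_for is copied from the script.

# "<target>[:<mode>]" -> "target|mode" (mode empty when the config has no ':').
split_config() {
  local config="$1" target mode
  target="${config%%:*}"              # strip ":<mode>"
  mode="${config#*:}"                 # strip "<target>:"
  [ "$mode" = "$config" ] && mode=""  # no ':' -> expansion was a no-op
  printf '%s|%s\n' "$target" "$mode"
}

link_flags_for() {
  local target="$1" mode="$2"
  case "$target" in
    linux-musl-*) [ "$mode" = static ] && echo -static ;;
    freebsd-*)    echo -static ;;
    *)            echo ;;
  esac
}

for config in linux-musl-x64:static linux-musl-x64:dynamic linux-glibc-x64 freebsd-amd64; do
  IFS='|' read -r target mode <<< "$(split_config "$config")"
  printf '%s -> target=%s mode=%s flags=[%s]\n' \
    "$config" "$target" "$mode" "$(link_flags_for "$target" "$mode")"
done
```

The "no ':' means no mode" check matters because `${config#*:}` returns the whole string unchanged when there is nothing to strip. Without that check, a glibc config would carry its own target name as its mode.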
diff --git a/test/lib/exec_target.sh b/test/lib/exec_target.sh
@@ -50,6 +50,11 @@
 # kit_test_image_for_arch). Sourced relative to this file's location.
 . "$(dirname "${BASH_SOURCE[0]}")/test_images.sh"
 
+# VM execution backend: the `<arch>-freebsd` / `<arch>-windows` runner plus the
+# boot/teardown lifecycle. The stateless linux/macos runners live in this file;
+# the stateful VM runner (needed because a VM is expensive to boot) is in exec_vm.sh.
+. "$(dirname "${BASH_SOURCE[0]}")/exec_vm.sh"
+
 # Internal queue arrays. Each entry's tag is recorded alongside the
 # rest so flush can split into per-target batched runs.
 EXEC_TARGET_TAGS=()
@@ -70,7 +75,7 @@ EXEC_TARGET_RCS=()
 
 _exec_target_arch() {
     case "$1" in
-        *-*) printf '%s' "${1%-*}" ;;
+        *-*) printf '%s' "${1%%-*}" ;;   # first field (handles <arch>-<os>[-<libc>])
         *)   printf '%s' "$1" ;;
     esac
 }
@@ -98,7 +103,20 @@ _exec_target_platform() {
 # RUN_<ARCH>_IMAGE overrides them (e.g. for a glibc base). Distinct digests per
 # arch mean local storage can never confuse one arch's rootfs for another's.
 _exec_target_image() {
-    local img; img="$(kit_test_image_for_arch "$(_exec_target_arch "$1")")"
+    local arch; arch="$(_exec_target_arch "$1")"
+    # glibc targets run in a glibc (Debian) image; musl/default in alpine. The
+    # arch-qualified names disambiguate the manifest so podman never confuses one
+    # arch's rootfs for another's (the alpine pins use digests for the same end).
+    if [ "$(_exec_target_os "$1")" = "linux-glibc" ]; then
+        case "$arch" in
+            aa64|aarch64) printf '%s' "${RUN_GLIBC_AARCH64_IMAGE:-docker.io/arm64v8/debian:bookworm-slim}" ;;
+            x64)          printf '%s' "${RUN_GLIBC_X64_IMAGE:-docker.io/amd64/debian:bookworm-slim}" ;;
+            rv64)         printf '%s' "${RUN_GLIBC_RV64_IMAGE:-docker.io/riscv64/debian:trixie-slim}" ;;
+            *)            printf 'debian:bookworm-slim' ;;
+        esac
+        return
+    fi
+    local img; img="$(kit_test_image_for_arch "$arch")"
     [ -n "$img" ] && printf '%s' "$img" || printf 'alpine:latest'
 }
 
@@ -107,16 +125,14 @@ _exec_target_image() {
 # unavailable until `make test-images` provisions it. Cached per arch so
 # exec_target_supported stays a constant cost across hundreds of cases.
 _exec_target_image_present() {
-    local arch var cached
-    arch="$(_exec_target_arch "$1")"
-    var="_EXEC_TARGET_IMG_${arch}"
+    local img var cached
+    img="$(_exec_target_image "$1")"
+    # Memoize per IMAGE (not per arch): a given arch can map to either the
+    # alpine (musl) or the debian (glibc) image, and those presence answers differ.
+    var="_EXEC_TARGET_IMG_$(printf '%s' "$img" | tr -c 'A-Za-z0-9' _)"
     cached="${!var:-}"
     if [ -z "$cached" ]; then
-        if podman image exists "$(_exec_target_image "$1")" 2>/dev/null; then
-            cached=yes
-        else
-            cached=no
-        fi
+        if podman image exists "$img" 2>/dev/null; then cached=yes; else cached=no; fi
         printf -v "$var" '%s' "$cached"
     fi
     [ "$cached" = yes ]
@@ -137,7 +153,7 @@ _exec_target_native() {
     host_kernel="$(uname -s 2>/dev/null)"
     host_arch="$(uname -m 2>/dev/null)"
     case "$os" in
-        linux)
+        linux|linux-glibc)
             [ "$host_kernel" = "Linux" ] || return 1
             _exec_target_arch_matches_host "$arch" "$host_arch"
             ;;
@@ -195,6 +211,8 @@ _exec_target_qemu() {
 exec_target_supported() {
     local tag="$1" os
     os="$(_exec_target_os "$tag")"
+    # VM-backed OSes: qemu + a provisioned/reachable VM (see exec_vm.sh).
+    case "$os" in freebsd|windows) exec_vm_supported "$tag"; return $? ;; esac
     # macOS has no podman/qemu fallback — Mach-O exec requires a Darwin
     # host with matching arch. Cross-OS exec (macOS-on-Linux) is not
     # supported.
@@ -216,6 +234,7 @@ exec_target_run() {
     local tag="$1" exe="$2" out="$3" err="$4"
     local os qemu
     os="$(_exec_target_os "$tag")"
+    case "$os" in freebsd|windows) exec_vm_run "$tag" "$exe" "$out" "$err"; return ;; esac
     if _exec_target_native "$tag"; then
         "$exe" >"$out" 2>"$err"; RUN_RC=$?; return
     fi
@@ -263,6 +282,18 @@ exec_target_queue() {
 
 exec_target_queue_size() { echo "${#EXEC_TARGET_EXES[@]}"; }
 
+# Lifecycle hooks for stateful runners. The stateless runners (native/qemu/
+# podman) need neither; for VM tags these boot the VM lazily (idempotent) and
+# tear down every VM we booted at suite end. A consumer that may run VM tags
+# should `trap exec_target_teardown_all EXIT` once. No-ops for linux/macos.
+exec_target_setup() {
+    case "$(_exec_target_os "$1")" in
+        freebsd|windows) exec_vm_setup "$1" ;;
+        *) return 0 ;;
+    esac
+}
+exec_target_teardown_all() { exec_vm_teardown_all; }
+
 # Internal: drain every entry whose tag matches $1, using qemu (if
 # available for that arch), podman batched run, or the no-runner stub.
 _exec_target_flush_tag() {
@@ -276,6 +307,13 @@ _exec_target_flush_tag() {
     done
     [ "${#idx[@]}" -eq 0 ] && return 0
 
+    # VM-backed OSes: one batched VM session (boots lazily, stays warm). See
+    # exec_vm.sh. Must branch before the native/qemu/podman logic, which would
+    # otherwise try to run a FreeBSD/Windows binary under a Linux runner.
+    case "$os" in
+        freebsd|windows) exec_vm_flush_tag "$tag" "${idx[@]}"; return $? ;;
+    esac
+
     local k
     # Native exec (Linux-on-Linux, Darwin-on-Darwin) — same loop.
     if _exec_target_native "$tag"; then
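`_exec_target_image_present` now memoizes per image rather than per arch. It does this by turning the image reference into a variable name, then reading and writing that variable through `${!var}` indirect expansion and `printf -v`. Below is a minimal standalone sketch of the same pattern. `probe_image` is a hypothetical stand-in for `podman image exists`, and the `_IMG_` prefix is illustrative.

```bash
#!/usr/bin/env bash
# Per-image memoization in the style of _exec_target_image_present.
# probe_image is a hypothetical stand-in for `podman image exists`; it counts
# calls so the caching is observable.
PROBE_CALLS=0
probe_image() {
  PROBE_CALLS=$((PROBE_CALLS + 1))
  case "$1" in *debian*) return 0 ;; *) return 1 ;; esac
}

image_present() {
  local img="$1" var cached
  # Any character that is not legal in a shell name becomes '_', e.g.
  # "alpine:latest" -> _IMG_alpine_latest.
  var="_IMG_$(printf '%s' "$img" | tr -c 'A-Za-z0-9' _)"
  cached="${!var:-}"                      # indirect read of the cache slot
  if [ -z "$cached" ]; then
    if probe_image "$img"; then cached=yes; else cached=no; fi
    printf -v "$var" '%s' "$cached"       # global write; $var is not local
  fi
  [ "$cached" = yes ]
}

image_present docker.io/amd64/debian:bookworm-slim && echo present
image_present docker.io/amd64/debian:bookworm-slim && echo present
echo "probes: $PROBE_CALLS"   # probes: 1
```

The cache only persists if the helper runs in the current shell. A call wrapped in `$(...)` would fill the slot in a subshell and then throw it away. That is why the predicate returns its answer as an exit status rather than printing it.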
diff --git a/test/lib/exec_vm.sh b/test/lib/exec_vm.sh
@@ -0,0 +1,264 @@
+# test/lib/exec_vm.sh — VM execution backend for test/lib/exec_target.sh.
+#
+# Sourced by exec_target.sh. Provides the `<arch>-freebsd` / `<arch>-windows`
+# runner: stateful VMs (expensive to boot) rather than the stateless
+# podman/qemu-user runners. exec_target's stateless flush would reboot a VM on
+# every call, so this file adds a lifecycle:
+#
+#   exec_vm_setup TAG        boot the VM for TAG if it is not already reachable,
+#                            remembering whether WE booted it (idempotent; a VM
+#                            booted for one tag is reused by sibling tags — the
+#                            single Windows VM serves both x64 and aarch64).
+#   exec_vm_flush_tag TAG K… run the queued exes (indices K into exec_target's
+#                            arrays) in ONE VM session and write each case's
+#                            .rc/.out/.err. Lazily calls exec_vm_setup, so the VM
+#                            stays warm across flushes.
+#   exec_vm_supported TAG    true if a runner (qemu + a provisioned/reachable
+#                            VM) exists for TAG on this host.
+#   exec_vm_teardown_all     shut down every VM WE booted (install at suite end).
+#
+# Transport is scripts/{freebsd,windows}_vm.sh `run-batch <arch> <stage> <res>`:
+# ship a staging dir in, run its run-remote.{sh,ps1} entry script (authored
+# here), and bring each binary's <id>.{rc,out,err} back into <res>. The VM scripts
+# stay pure transport; this file owns the protocol + lifecycle.
+
+EXEC_VM_SCRIPTS="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../scripts" && pwd)"
+EXEC_VM_ROOT="$(cd "$EXEC_VM_SCRIPTS/.." && pwd)"
+EXEC_VM_WORK="${EXEC_VM_WORK:-${TMPDIR:-/tmp}/kit-exec-vm}"
+# Space-joined list of VM ids WE booted (so teardown stops only those, never a
+# VM the user already had running).
+EXEC_VM_STARTED=""
+
+# ---- tag parsing -----------------------------------------------------------
+# Tags are "<arch>-<os>"; os is the last '-' field. arch accepts aliases:
+# x64/amd64/x86_64, aa64/aarch64/arm64, rv64/riscv64 (see _exec_vm_vmarch).
+_exec_vm_os()   { printf '%s' "${1##*-}"; }
+_exec_vm_arch() { printf '%s' "${1%-*}"; }
+
+# Map an exec arch token to the arch name the VM scripts expect.
+#   freebsd: amd64 / aarch64 / riscv64   windows: x64 / aarch64
+_exec_vm_vmarch() {
+    local os="$1" arch="$2"
+    case "$os" in
+        freebsd)
+            case "$arch" in
+                x64|amd64|x86_64) echo amd64 ;;
+                aa64|aarch64|arm64) echo aarch64 ;;
+                rv64|riscv64) echo riscv64 ;;
+                *) return 1 ;;
+            esac ;;
+        windows)
+            case "$arch" in
+                x64|amd64|x86_64) echo x64 ;;
+                aa64|aarch64|arm64) echo aarch64 ;;
+                *) return 1 ;;
+            esac ;;
+        *) return 1 ;;
+    esac
+}
+
+# VM identity for a tag: one Windows VM serves both arches; FreeBSD is per-arch.
+_exec_vm_id() {
+    local os arch vmarch
+    os="$(_exec_vm_os "$1")"; arch="$(_exec_vm_arch "$1")"
+    if [ "$os" = windows ]; then echo windows; return; fi
+    vmarch="$(_exec_vm_vmarch "$os" "$arch")" || return 1
+    echo "freebsd-$vmarch"
+}
+
+_exec_vm_fbsd_qemu() {
+    case "$1" in amd64) echo qemu-system-x86_64 ;; *) echo "qemu-system-$1" ;; esac
+}
+
+# True if the VM for (os, vmarch) already answers SSH (Windows: the one shared VM).
+_exec_vm_reachable() {
+    local os="$1" vmarch="$2"
+    if [ "$os" = freebsd ]; then
+        "$EXEC_VM_SCRIPTS/freebsd_vm.sh" ssh "$vmarch" true >/dev/null 2>&1
+    else
+        "$EXEC_VM_SCRIPTS/windows_vm.sh" ssh aarch64 ver >/dev/null 2>&1
+    fi
+}
+
+_exec_vm_win_marker() {
+    local cache="${KIT_WINDOWS_VM_CACHE:-${KIT_WINDOWS_CACHE_DIR:-${XDG_CACHE_HOME:-$HOME/.cache}/kit}/windows-vm}"
+    printf '%s/win11-arm64.provisioned' "$cache"
+}
+
+# ---- supported -------------------------------------------------------------
+exec_vm_supported() {
+    local os arch vmarch
+    os="$(_exec_vm_os "$1")"; arch="$(_exec_vm_arch "$1")"
+    vmarch="$(_exec_vm_vmarch "$os" "$arch")" || return 1
+    if [ "$os" = freebsd ]; then
+        command -v "$(_exec_vm_fbsd_qemu "$vmarch")" >/dev/null 2>&1 || return 1
+        [ -f "$EXEC_VM_ROOT/build/freebsd-vm/images/freebsd-$vmarch.provisioned" ] && return 0
+        _exec_vm_reachable freebsd "$vmarch" && return 0
+        return 1
+    fi
+    command -v "${KIT_WINDOWS_QEMU:-qemu-system-aarch64}" >/dev/null 2>&1 || return 1
+    [ -f "$(_exec_vm_win_marker)" ] && return 0
+    _exec_vm_reachable windows aarch64 && return 0
+    return 1
+}
+
+# ---- lifecycle -------------------------------------------------------------
+exec_vm_setup() {
+    local os arch vmarch vmid
+    os="$(_exec_vm_os "$1")"; arch="$(_exec_vm_arch "$1")"
+    vmarch="$(_exec_vm_vmarch "$os" "$arch")" || return 1
+    vmid="$(_exec_vm_id "$1")" || return 1
+    case " $EXEC_VM_STARTED " in *" $vmid "*) return 0 ;; esac
+    _exec_vm_reachable "$os" "$vmarch" && return 0
+    mkdir -p "$EXEC_VM_WORK"
+    if [ "$os" = freebsd ]; then
+        "$EXEC_VM_SCRIPTS/freebsd_vm.sh" run "$vmarch" \
+            > "$EXEC_VM_WORK/$vmid.log" 2>&1 &
+        echo "$!" > "$EXEC_VM_WORK/$vmid.pid"
+        "$EXEC_VM_SCRIPTS/freebsd_vm.sh" wait-ssh "$vmarch" \
+            > "$EXEC_VM_WORK/$vmid.wait" 2>&1 || return 1
+    else
+        "$EXEC_VM_SCRIPTS/windows_vm.sh" boot \
+            > "$EXEC_VM_WORK/$vmid.log" 2>&1 || return 1
+        "$EXEC_VM_SCRIPTS/windows_vm.sh" wait-ssh 900 \
+            > "$EXEC_VM_WORK/$vmid.wait" 2>&1 || return 1
+    fi
+    EXEC_VM_STARTED="$EXEC_VM_STARTED $vmid"
+    return 0
+}
+
+exec_vm_teardown_all() {
+    [ "${EXEC_VM_KEEP_UP:-0}" = 1 ] && return 0
+    local vmid vmarch pid
+    for vmid in $EXEC_VM_STARTED; do
+        case "$vmid" in
+            windows)
+                "$EXEC_VM_SCRIPTS/windows_vm.sh" stop >/dev/null 2>&1 || true ;;
+            freebsd-*)
+                vmarch="${vmid#freebsd-}"
+                "$EXEC_VM_SCRIPTS/freebsd_vm.sh" ssh "$vmarch" \
+                    'sync; shutdown -p now' >/dev/null 2>&1 || true
+                if [ -f "$EXEC_VM_WORK/$vmid.pid" ]; then
+                    pid="$(cat "$EXEC_VM_WORK/$vmid.pid")"
+                    for _ in $(seq 1 30); do
+                        kill -0 "$pid" 2>/dev/null || break; sleep 1
+                    done
+                    kill -0 "$pid" 2>/dev/null && kill "$pid" 2>/dev/null || true
+                    sleep 1
+                    kill -0 "$pid" 2>/dev/null && kill -9 "$pid" 2>/dev/null || true
+                    wait "$pid" 2>/dev/null || true
+                fi ;;
+        esac
+    done
+    EXEC_VM_STARTED=""
+}
+
+# ---- guest entry scripts (authored here; run-batch just executes them) ------
+_exec_vm_write_sh() {
+    local stage="$1" ids="$2"
+    {
+        echo '#!/bin/sh'
+        echo 'cd "$(dirname "$0")" || exit 99'
+        echo 'mkdir -p res'
+        printf 'for id in%s; do\n' "$ids"
+        echo '  chmod +x "./$id" 2>/dev/null'
+        echo '  "./$id" > "res/$id.out" 2> "res/$id.err"'
+        echo '  echo $? > "res/$id.rc"'
+        echo 'done'
+        echo 'exit 0'
+    } > "$stage/run-remote.sh"
+}
+
+# Capture via Start-Process -PassThru .ExitCode: a launch that Windows blocks
+# (e.g. a Defender PUA false-positive) does NOT update $LASTEXITCODE, so the
+# bare-`&` form would report the previous binary's code. Start-Process throws on
+# a blocked launch; we record rc 126 plus a note so the case fails on its own merits.
+_exec_vm_write_ps1() {
+    local stage="$1" ids="$2" id
+    {
+        echo '$ErrorActionPreference = "Continue"'
+        echo 'Set-Location -LiteralPath $PSScriptRoot'
+        echo '$res = Join-Path $PSScriptRoot "res"'
+        echo 'New-Item -ItemType Directory -Force -Path $res | Out-Null'
+        for id in $ids; do
+            printf 'try { $p = Start-Process -FilePath ".\\%s.exe" -Wait -PassThru -WindowStyle Hidden -RedirectStandardOutput "$res\\%s.out" -RedirectStandardError "$res\\%s.err"; $p.ExitCode | Out-File -Encoding ascii "$res\\%s.rc" } catch { "126" | Out-File -Encoding ascii "$res\\%s.rc"; "launch blocked (Defender?)" | Out-File -Encoding ascii "$res\\%s.err" }\n' \
+                "$id" "$id" "$id" "$id" "$id" "$id"
+        done
+        echo 'exit 0'
+    } > "$stage/run-remote.ps1"
+}
+
+# ---- flush -----------------------------------------------------------------
+# exec_vm_flush_tag TAG K…  : K are indices into exec_target's EXEC_TARGET_*
+# arrays. Stage those exes, run them in the VM, write each case's rc/out/err.
+exec_vm_flush_tag() {
+    local tag="$1"; shift
+    local os arch vmarch ext stage res ids="" i k id raw
+    os="$(_exec_vm_os "$tag")"; arch="$(_exec_vm_arch "$tag")"
+    vmarch="$(_exec_vm_vmarch "$os" "$arch")" || { _exec_vm_mark_all 127 "$@"; return 0; }
+    ext=""; [ "$os" = windows ] && ext=".exe"
+
+    mkdir -p "$EXEC_VM_WORK"
+    stage="$(mktemp -d "$EXEC_VM_WORK/stage.XXXXXX")"
+    res="$stage.res"; mkdir -p "$res"
+
+    i=0
+    for k in "$@"; do
+        cp "${EXEC_TARGET_EXES[$k]}" "$stage/$i$ext"
+        ids="$ids $i"
+        i=$((i+1))
+    done
+    if [ "$os" = windows ]; then _exec_vm_write_ps1 "$stage" "$ids"
+    else _exec_vm_write_sh "$stage" "$ids"; fi
+
+    if ! exec_vm_setup "$tag"; then
+        _exec_vm_mark_all 127 "$@"; rm -rf "$stage" "$res"; return 0
+    fi
+
+    "$EXEC_VM_SCRIPTS/${os}_vm.sh" run-batch "$vmarch" "$stage" "$res" \
+        > "$stage.batch.out" 2> "$stage.batch.err" || true
+
+    i=0
+    for k in "$@"; do
+        if [ -f "$res/$i.rc" ]; then
+            raw="$(tr -d '\r' < "$res/$i.rc" | head -n1)"
+            case "$raw" in
+                ''|*[!0-9-]*) raw=126 ;;
+                # Windows preserves the full 32-bit exit code; mask to 8 bits so
+                # .rc is a uniform POSIX-style status across every exec tag.
+                *) [ "$os" = windows ] && raw=$(( raw & 255 )) ;;
+            esac
+            echo "$raw" > "${EXEC_TARGET_RCS[$k]}"
+        else
+            echo 127 > "${EXEC_TARGET_RCS[$k]}"
+        fi
+        if [ -f "$res/$i.out" ]; then tr -d '\r' < "$res/$i.out" > "${EXEC_TARGET_OUTS[$k]}"
+        else : > "${EXEC_TARGET_OUTS[$k]}"; fi
+        if [ -f "$res/$i.err" ]; then tr -d '\r' < "$res/$i.err" > "${EXEC_TARGET_ERRS[$k]}"
+        else : > "${EXEC_TARGET_ERRS[$k]}"; fi
+        i=$((i+1))
+    done
+    rm -rf "$stage" "$res"
+}
+
+# Mark every queued case (indices in $2..) with rc $1 and empty out/err.
+_exec_vm_mark_all() {
+    local rc="$1"; shift
+    local k
+    for k in "$@"; do
+        echo "$rc" > "${EXEC_TARGET_RCS[$k]}"
+        : > "${EXEC_TARGET_OUTS[$k]}"
+        : > "${EXEC_TARGET_ERRS[$k]}"
+    done
+}
+
+# Synchronous one-shot for a VM tag (mirrors exec_target_run): sets RUN_RC.
+exec_vm_run() {
+    local tag="$1" exe="$2" out="$3" err="$4"
+    EXEC_TARGET_EXES=("$exe"); EXEC_TARGET_OUTS=("$out")
+    EXEC_TARGET_ERRS=("$err"); EXEC_TARGET_RCS=("$EXEC_VM_WORK/.run.rc")
+    mkdir -p "$EXEC_VM_WORK"
+    exec_vm_flush_tag "$tag" 0
+    RUN_RC="$(cat "$EXEC_VM_WORK/.run.rc" 2>/dev/null || echo 127)"
+    EXEC_TARGET_EXES=(); EXEC_TARGET_OUTS=(); EXEC_TARGET_ERRS=(); EXEC_TARGET_RCS=()
+}
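`exec_vm_flush_tag` folds every guest-reported status into a single `.rc` value. It strips the CR from Windows output, treats anything non-numeric as 126 ("did not run"), and masks Windows' 32-bit exit code down to 8 bits. The sketch below pulls that `case` into a hypothetical `normalize_rc` helper, not part of the commit, so you can see what a given guest value turns into. It assumes the value arrives the way the PowerShell script writes it: `$p.ExitCode`, a signed 32-bit integer, as ASCII text.

```bash
#!/usr/bin/env bash
# normalize_rc RAW OS -> prints the status exec_vm_flush_tag would record.
# Hypothetical standalone helper mirroring the case statement there.
normalize_rc() {
  local raw os="$2"
  raw="$(printf '%s' "$1" | tr -d '\r' | head -n1)"      # CRLF -> first clean line
  case "$raw" in
    ''|*[!0-9-]*) raw=126 ;;                            # unparsable -> did not run
    *) [ "$os" = windows ] && raw=$(( raw & 255 )) ;;   # keep only the low byte
  esac
  printf '%s\n' "$raw"
}

normalize_rc $'300\r\n' windows     # prints 44
normalize_rc -1073741819 windows    # prints 5 (0xC0000005 -> low byte 0x05)
normalize_rc '' windows             # prints 126
```

Bash arithmetic is 64-bit two's complement, so a negative NTSTATUS-style code masks to the same low byte as its unsigned form. For example, an access violation reported as -1073741819 lands on 5, not on a negative rc.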
diff --git a/test/toy/vm.sh b/test/toy/vm.sh
@@ -2,37 +2,29 @@
 # test/toy/vm.sh — run the .toy corpus as real *hosted* programs inside the
 # FreeBSD and Windows VMs, asserting each case's `.expected` exit code.
 #
-# This is the VM counterpart to test/toy/run.sh's X lane: X cross-compiles a
+# The VM counterpart to test/toy/run.sh's X lane: X cross-compiles a
 # freestanding ELF and runs it under qemu-user/podman with a kit-built _start
-# stub; here we link a full hosted binary against a real OS sysroot
-# (FreeBSD base.txz extract, or the llvm-mingw UCRT sysroot) and execute it on
-# the genuine OS in a VM. So this exercises the whole hosted path — ABI, CRT
-# startup, the platform loader, syscalls/Win32 — that the Linux lanes cannot.
+# stub; here we link a full hosted binary against a real OS sysroot (FreeBSD
+# base.txz extract, or the llvm-mingw UCRT sysroot) and execute it on the genuine
+# OS in a VM — exercising the whole hosted path (ABI, CRT startup, the platform
+# loader, syscalls/Win32) the Linux lanes cannot.
 #
-#   usage: test/toy/vm.sh <os> [name_filter]
-#     os          freebsd | windows
-#     name_filter substring of the case basename to restrict the run (TDD)
+# Execution goes through the shared seam test/lib/exec_target.sh (the
+# `<arch>-freebsd` / `<arch>-windows` tags): this script only compiles each case
+# and queues it; exec_target/exec_vm own VM boot/reuse, the batched run-batch
+# transport, and shutdown-at-exit. Compilation happens for every applicable case
+# regardless of VM availability (so a codegen/link bug is caught even with no VM);
+# only execution is gated on a runner.
 #
-# env:
-#   KIT                     kit driver (default build/kit)
-#   KIT_TOY_FREEBSD_ARCHES  default "amd64 aarch64 riscv64"
-#   KIT_TOY_WINDOWS_ARCHES  default "x64 aarch64"
-#   KIT_OPT_LEVELS          default "0 1"
-#   KIT_FREEBSD_LINK        static | dynamic | both   (default both)
-#   KIT_TEST_FILTER         same as the positional name_filter
-#   KIT_TOY_VM_KEEP_UP      if 1, do not shut a VM we started (debugging)
+#   usage: test/toy/vm.sh <os> [name_filter]      os: freebsd | windows
 #
-# Execution model: per (os, arch) we compile every applicable case at every
-# (opt, link-mode) into ONE staging dir, ship that dir into the VM, and run all
-# the binaries in a single SSH session (scripts/<os>_vm.sh run-batch), which
-# emits "<id> <rc>" per binary. We then join those exit codes with a host-side
-# manifest and report pass/fail. One VM round-trip per arch keeps the corpus
-# (hundreds of binaries per arch at O0+O1) viable.
+# env: KIT, KIT_TOY_FREEBSD_ARCHES (amd64 aarch64 riscv64),
+#   KIT_TOY_WINDOWS_ARCHES (x64 aarch64), KIT_OPT_LEVELS (0 1),
+#   KIT_FREEBSD_LINK (static|dynamic|both), KIT_TEST_FILTER, KIT_TOY_VM_KEEP_UP.
 #
-# Like test/toy/run.sh, skips (missing sysroot/VM, arch-inapplicable cases) are
-# non-fatal: only a real FAIL gates the exit. Cases that legitimately do not
-# lower yet for an OS/arch are left RED on purpose — opt them out only with a
-# committed <name>.<os>.skip sidecar when the case is genuinely inapplicable.
+# Skips (missing sysroot/VM, arch-inapplicable cases) are non-fatal — only a real
+# FAIL gates the exit. Genuine codegen gaps are left RED on purpose; opt a case
+# out only with a committed <name>.<os>.skip sidecar when truly inapplicable.
 
 set -u
 
@@ -43,8 +35,15 @@ BUILD_DIR="$ROOT/build/test/toy-vm"
 
 # shellcheck source=../lib/kit_sh_report.sh
 . "$ROOT/test/lib/kit_sh_report.sh"
+# shellcheck source=../lib/exec_target.sh
+. "$ROOT/test/lib/exec_target.sh"
 kit_report_init
 
+# Honor the legacy keep-up knob through exec_vm's teardown gate, and shut down
+# any VM we booted at exit.
+[ "${KIT_TOY_VM_KEEP_UP:-0}" = 1 ] && export EXEC_VM_KEEP_UP=1
+trap exec_target_teardown_all EXIT
+
 OS="${1:-}"
 FILTER="${2:-${KIT_TEST_FILTER:-}}"
 case "$OS" in
@@ -65,8 +64,6 @@ fi
 shopt -s nullglob
 
 # ---- target / sysroot resolution -------------------------------------------
-
-# triple OS ARCH -> kit -target triple
 triple_for() {
   case "$1/$2" in
     freebsd/amd64)   echo x86_64-freebsd ;;
@@ -78,7 +75,6 @@ triple_for() {
   esac
 }
 
-# sysroot OS ARCH -> sysroot dir (may be empty / nonexistent)
 sysroot_for() {
   case "$1" in
     freebsd) "$ROOT/scripts/freebsd_sysroot.sh" path "$2" 2>/dev/null ;;
@@ -87,8 +83,7 @@ sysroot_for() {
 }
 
 # The basename suffix this (os,arch) "owns" — arch-only cases named *_x64 /
-# *_aa64 / *_rv64 use intrinsics that only lower on that arch, so they apply to
-# exactly one arch (cf. test/toy/run.sh:cross_one).
+# *_aa64 / *_rv64 use intrinsics that only lower on that arch (cf. run.sh).
 arch_suffix_for() {
   case "$1/$2" in
     freebsd/amd64|windows/x64) echo _x64 ;;
@@ -101,12 +96,8 @@ arch_suffix_for() {
 # case_skip_reason OS ARCH NAME SRC -> echoes a skip reason, or "" if applicable.
 case_skip_reason() {
   local os="$1" arch="$2" name="$3" src="$4" own
-  if [ -e "${src%.toy}.link.skip" ]; then
-    head -n1 "${src%.toy}.link.skip"; return; fi
-  if [ -e "${src%.toy}.$os.skip" ]; then
-    head -n1 "${src%.toy}.$os.skip"; return; fi
-  # asmnop is an aa64-only construct (no x64/rv64 lowering before toy asm
-  # selectors), matching the X lane's blanket skip.
+  if [ -e "${src%.toy}.link.skip" ]; then head -n1 "${src%.toy}.link.skip"; return; fi
+  if [ -e "${src%.toy}.$os.skip" ]; then head -n1 "${src%.toy}.$os.skip"; return; fi
   if [ "$arch" != aarch64 ] && grep -q 'asmnop' "$src" 2>/dev/null; then
     echo "asmnop is aa64-only"; return; fi
   own="$(arch_suffix_for "$os" "$arch")"
@@ -118,8 +109,6 @@ case_skip_reason() {
   echo ""
 }
 
-# link modes for a given OS (FreeBSD honors KIT_FREEBSD_LINK; Windows is the
-# single dynamic UCRT console mode).
 modes_for() {
   if [ "$1" = windows ]; then echo ucrt; return; fi
   case "$LINK" in
@@ -129,7 +118,6 @@ modes_for() {
   esac
 }
 
-# cc_extra_flags OS MODE -> extra kit cc flags for this OS/link mode
 cc_extra_flags() {
   case "$1/$2" in
     freebsd/static)  echo -static ;;
@@ -138,279 +126,121 @@ cc_extra_flags() {
   esac
 }
 
-# ---- staging ---------------------------------------------------------------
-# stage_arch OS ARCH TRIPLE SYSROOT STAGEDIR
-#   Compile every applicable case×opt×mode into STAGEDIR (binaries named by a
-#   running id), write the host-side manifest ($STAGEDIR/manifest: "id expected
-#   label") and the guest entry script (run-remote.sh / run-remote.ps1 listing
-#   the ids). Compile failures are reported as FAIL immediately. Returns the
-#   number of staged binaries via the global STAGED_N.
+# ---- compile + record (no exec here) ---------------------------------------
+# Parallel arrays of staged cases awaiting execution.
+TF_LABEL=(); TF_EXP=(); TF_EXE=(); TF_OUT=(); TF_ERR=(); TF_RC=(); TF_TAG=()
+
 STAGED_N=0
 stage_arch() {
   local os="$1" arch="$2" triple="$3" sysroot="$4" stage="$5"
-  local ext="" id=0 ids="" name base reason opt mode label out cc_err
+  local ext="" id=0 name base reason opt mode label out cc_err expected
   [ "$os" = windows ] && ext=".exe"
   rm -rf "$stage"; mkdir -p "$stage"
-  : > "$stage/manifest"
 
   for src in "$TEST_DIR"/cases/*.toy; do
     base="$(basename "${src%.toy}")"
     [ -n "$FILTER" ] && case "$base" in *"$FILTER"*) ;; *) continue ;; esac
     reason="$(case_skip_reason "$os" "$arch" "$base" "$src")"
-    if [ -n "$reason" ]; then
-      kit_skip "$base/$os-$arch" "$reason"
-      continue
-    fi
-    local expected=0
+    if [ -n "$reason" ]; then kit_skip "$base/$os-$arch" "$reason"; continue; fi
+    expected=0
     [ -f "${src%.toy}.expected" ] && expected="$(cat "${src%.toy}.expected")"
     for opt in $OPT_LEVELS; do
       for mode in $(modes_for "$os"); do
         label="$base/$os-$arch-$mode-O$opt"
-        out="$stage/$id$ext"
-        cc_err="$stage/$id.cc.err"
+        out="$stage/$id$ext"; cc_err="$stage/$id.cc.err"
         # shellcheck disable=SC2046
         if ! "$KIT" cc "-O$opt" -target "$triple" --sysroot "$sysroot" \
               $(cc_extra_flags "$os" "$mode") "$src" -o "$out" \
               > "$stage/$id.cc.out" 2> "$cc_err"; then
           kit_fail "$label" "kit cc -target $triple failed"
-          sed 's/^/    | /' "$cc_err"
-          continue
+          sed 's/^/    | /' "$cc_err"; continue
         fi
         if [ -s "$cc_err" ]; then
           kit_fail "$label" "kit cc -target $triple wrote stderr"
-          sed 's/^/    | /' "$cc_err"
-          continue
+          sed 's/^/    | /' "$cc_err"; continue
         fi
-        printf '%s %s %s\n' "$id" "$expected" "$label" >> "$stage/manifest"
-        ids="$ids $id"
+        TF_LABEL+=("$label"); TF_EXP+=("$expected"); TF_EXE+=("$out")
+        TF_OUT+=("$stage/$id.out"); TF_ERR+=("$stage/$id.err")
+        TF_RC+=("$stage/$id.rc"); TF_TAG+=("$arch-$os")
         id=$((id + 1))
       done
     done
   done
-
   STAGED_N=$id
-  [ "$id" -eq 0 ] && return 0
-  if [ "$os" = windows ]; then
-    write_remote_ps1 "$stage" "$ids"
-  else
-    write_remote_sh "$stage" "$ids"
-  fi
-}
-
-# The guest entry scripts. They run each staged binary with output suppressed
-# and print "<id> <rc>"; a crash reports its own rc and the loop continues.
-write_remote_sh() {
-  local stage="$1" ids="$2"
-  {
-    echo '#!/bin/sh'
-    echo 'cd "$(dirname "$0")" || exit 99'
-    printf 'for id in%s; do\n' "$ids"
-    echo '  chmod +x "./$id" 2>/dev/null'
-    echo '  "./$id" >/dev/null 2>&1'
-    echo '  echo "$id $?"'
-    echo 'done'
-    echo 'exit 0'
-  } > "$stage/run-remote.sh"
-}
-
-# Capture each exit code via Start-Process -PassThru .ExitCode rather than
-# `& exe; $LASTEXITCODE`: a launch that Windows blocks (e.g. a Defender PUA
-# false-positive on a kit-produced exe) does NOT update $LASTEXITCODE, so the
-# bare-`&` form silently reports the PREVIOUS binary's code. Start-Process throws
-# on a blocked launch instead, which we turn into a distinct LAUNCHFAIL token so
-# report_results flags it rather than mis-scoring a neighbor's exit code.
-write_remote_ps1() {
-  local stage="$1" ids="$2" id
-  {
-    echo '$ErrorActionPreference = "Continue"'
-    echo 'Set-Location -LiteralPath $PSScriptRoot'
-    echo '$o = Join-Path $env:TEMP "kit_toy_o.txt"'
-    echo '$e = Join-Path $env:TEMP "kit_toy_e.txt"'
-    for id in $ids; do
-      printf 'try { $p = Start-Process -FilePath ".\\%s.exe" -Wait -PassThru -WindowStyle Hidden -RedirectStandardOutput $o -RedirectStandardError $e; "%s $($p.ExitCode)" } catch { "%s LAUNCHFAIL" }\n' "$id" "$id" "$id"
-    done
-    echo 'exit 0'
-  } > "$stage/run-remote.ps1"
-}
-
-# ---- result join -----------------------------------------------------------
-# report_results MANIFEST BATCHOUT : join the VM's "<id> <rc>" lines with the
-# host manifest and emit pass/fail per case. expected is masked to 8 bits to
-# match POSIX exit semantics (Windows codes are small, so identical).
-report_results() {
-  local manifest="$1" batchout="$2" joined="$2.joined"
-  # The Windows VM returns CRLF lines; strip CR so the rc field stays numeric.
-  tr -d '\r' < "$batchout" > "$batchout.clean"
-  awk 'NR==FNR{rc[$1]=$2;next}{print $1, $2, $3, (($1) in rc ? rc[$1] : "?")}' \
-    "$batchout.clean" "$manifest" > "$joined"
-  local id expected label rc exp got
-  while read -r id expected label rc; do
-    exp=$((expected & 255))
-    # "?" = the VM emitted no line for this id; "LAUNCHFAIL" = the binary could
-    # not be started (e.g. a Defender block). Either way it did not run — FAIL,
-    # never silently scored against a stale/neighboring exit code.
-    case "$rc" in
-      ''|*[!0-9-]*)
-        kit_fail "$label" "binary did not run in VM (rc=$rc)"
-        continue ;;
-    esac
-    # The .toy oracle is a POSIX 8-bit exit status: cases may return values >255
-    # and rely on the kernel truncating to the low byte. FreeBSD/Linux already
-    # truncate, but Windows preserves the full 32-bit exit code, so mask both
-    # sides to compare on the same 8-bit oracle every other lane uses.
-    got=$(( rc & 255 ))
-    if [ "$got" -eq "$exp" ] 2>/dev/null; then
-      kit_pass "$label"
-    else
-      kit_fail "$label" "expected rc $exp, got $rc"
-    fi
-  done < "$joined"
-}
-
-# ---- FreeBSD lane ----------------------------------------------------------
-fbsd_vm() { "$ROOT/scripts/freebsd_vm.sh" "$@"; }
-
-fbsd_qemu_for() {
-  case "$1" in amd64) echo qemu-system-x86_64 ;; *) echo "qemu-system-$1" ;; esac
-}
-
-run_freebsd_arch() {
-  local arch="$1" triple sysroot stage qemu started=0 batch
-  triple="$(triple_for freebsd "$arch")"
-  if [ -z "$triple" ]; then kit_skip_na "toy/freebsd-$arch" "unknown arch"; return; fi
-  sysroot="$(sysroot_for freebsd "$arch")"
-  if [ -z "$sysroot" ] || [ ! -d "$sysroot/usr/include" ]; then
-    kit_skip "toy/freebsd-$arch" "missing sysroot (scripts/freebsd_sysroot.sh $arch)"; return; fi
-  case "$LINK" in
-    static|both) [ -f "$sysroot/usr/lib/libc.a" ] || {
-      kit_skip "toy/freebsd-$arch" "missing $sysroot/usr/lib/libc.a"; return; } ;;
-  esac
-  case "$LINK" in
-    dynamic|both) [ -f "$sysroot/lib/libc.so.7" ] || {
-      kit_skip "toy/freebsd-$arch" "missing $sysroot/lib/libc.so.7"; return; } ;;
-  esac
-
-  stage="$BUILD_DIR/freebsd/$arch"
-  stage_arch freebsd "$arch" "$triple" "$sysroot" "$stage"
-  [ "$STAGED_N" -eq 0 ] && return
-
-  qemu="$(fbsd_qemu_for "$arch")"
-  if ! command -v "$qemu" >/dev/null 2>&1; then
-    kit_skip "toy/freebsd-$arch" "$qemu missing (staged $STAGED_N binaries, not run)"; return; fi
-  if [ ! -f "$ROOT/build/freebsd-vm/images/freebsd-$arch.provisioned" ] &&
-     ! fbsd_vm ssh "$arch" true >/dev/null 2>&1; then
-    kit_skip "toy/freebsd-$arch" "VM not provisioned (scripts/freebsd_vm.sh prepare $arch)"; return; fi
-
-  if fbsd_vm ssh "$arch" true >/dev/null 2>&1; then
-    : # reuse a guest that is already up
-  else
-    fbsd_vm run "$arch" > "$stage/vm.log" 2>&1 &
-    echo "$!" > "$stage/vm.pid"; started=1
-  fi
-  if ! fbsd_vm wait-ssh "$arch" > "$stage/wait.log" 2>&1; then
-    kit_fail "toy/freebsd-$arch" "VM did not become reachable"
-    sed 's/^/    | /' "$stage/wait.log" | head -20
-    [ "$started" = 1 ] && fbsd_stop "$arch" "$stage"
-    return
-  fi
-
-  batch="$stage/batch.out"
-  if ! fbsd_vm run-batch "$arch" "$stage" > "$batch" 2> "$stage/batch.err"; then
-    kit_fail "toy/freebsd-$arch" "run-batch failed"
-    sed 's/^/    | /' "$stage/batch.err" | head -20
-    [ "$started" = 1 ] && fbsd_stop "$arch" "$stage"
-    return
-  fi
-  report_results "$stage/manifest" "$batch"
-  [ "$started" = 1 ] && fbsd_stop "$arch" "$stage"
 }
 
-fbsd_stop() {
-  local arch="$1" stage="$2" pid
-  [ "${KIT_TOY_VM_KEEP_UP:-0}" = 1 ] && return 0
-  fbsd_vm ssh "$arch" 'sync; shutdown -p now' >/dev/null 2>&1 || true
-  [ -f "$stage/vm.pid" ] || return 0
-  pid="$(cat "$stage/vm.pid")"
-  for _ in $(seq 1 30); do kill -0 "$pid" 2>/dev/null || return 0; sleep 1; done
-  kill "$pid" 2>/dev/null || true; sleep 1
-  kill -0 "$pid" 2>/dev/null && kill -9 "$pid" 2>/dev/null || true
-  wait "$pid" 2>/dev/null || true
-}
+# ---- drive -----------------------------------------------------------------
+mkdir -p "$BUILD_DIR/$OS"
 
-run_freebsd() {
-  local arch
+# Phase 1: compile every applicable case for every arch (catches codegen/link
+# bugs even with no VM).
+if [ "$OS" = freebsd ]; then
   printf 'toy-vm freebsd: arches="%s" opts="%s" link=%s\n' \
     "$FREEBSD_ARCHES" "$OPT_LEVELS" "$LINK"
-  for arch in $FREEBSD_ARCHES; do run_freebsd_arch "$arch"; done
-}
-
-# ---- Windows lane ----------------------------------------------------------
-win_vm() { "$ROOT/scripts/windows_vm.sh" "$@"; }
-
-run_windows() {
-  local arch triple sysroot stage started=0 any=0 batch
+  ARCHES="$FREEBSD_ARCHES"
+else
   printf 'toy-vm windows: arches="%s" opts="%s"\n' "$WINDOWS_ARCHES" "$OPT_LEVELS"
+  ARCHES="$WINDOWS_ARCHES"
+fi
 
-  # Stage every arch first (host-only work); only boot the VM if something was
-  # actually staged. One ARM64 VM serves both arches (x64 via Prism emulation).
-  for arch in $WINDOWS_ARCHES; do
-    triple="$(triple_for windows "$arch")"
-    if [ -z "$triple" ]; then kit_skip_na "toy/windows-$arch" "unknown arch"; continue; fi
-    sysroot="$(sysroot_for windows "$arch")"
+for arch in $ARCHES; do
+  triple="$(triple_for "$OS" "$arch")"
+  if [ -z "$triple" ]; then kit_skip_na "toy/$OS-$arch" "unknown arch"; continue; fi
+  sysroot="$(sysroot_for "$OS" "$arch")"
+  if [ "$OS" = freebsd ]; then
+    if [ -z "$sysroot" ] || [ ! -d "$sysroot/usr/include" ]; then
+      kit_skip "toy/$OS-$arch" "missing sysroot (scripts/freebsd_sysroot.sh $arch)"; continue; fi
+    case "$LINK" in static|both) [ -f "$sysroot/usr/lib/libc.a" ] || {
+      kit_skip "toy/$OS-$arch" "missing $sysroot/usr/lib/libc.a"; continue; } ;; esac
+    case "$LINK" in dynamic|both) [ -f "$sysroot/lib/libc.so.7" ] || {
+      kit_skip "toy/$OS-$arch" "missing $sysroot/lib/libc.so.7"; continue; } ;; esac
+  else
     if [ -z "$sysroot" ] || [ ! -r "$sysroot/include/windows.h" ] ||
        [ ! -r "$sysroot/lib/libucrt.a" ]; then
-      kit_skip "toy/windows-$arch" "missing UCRT sysroot (scripts/llvm_mingw_sysroot.sh prepare $arch)"; continue; fi
-    stage="$BUILD_DIR/windows/$arch"
-    stage_arch windows "$arch" "$triple" "$sysroot" "$stage"
-    [ "$STAGED_N" -gt 0 ] && any=1
-  done
-  [ "$any" -eq 0 ] && return
-
-  if ! command -v "${KIT_WINDOWS_QEMU:-qemu-system-aarch64}" >/dev/null 2>&1; then
-    kit_skip "toy/windows" "qemu-system-aarch64 missing (staged, not run)"; return; fi
-
-  if win_vm ssh aarch64 ver >/dev/null 2>&1; then
-    : # reuse a running VM
-  else
-    if ! win_vm boot > "$BUILD_DIR/windows/boot.log" 2>&1; then
-      kit_skip "toy/windows" "VM unavailable ($(tail -n1 "$BUILD_DIR/windows/boot.log" 2>/dev/null))"; return; fi
-    started=1
+      kit_skip "toy/$OS-$arch" "missing UCRT sysroot (scripts/llvm_mingw_sysroot.sh prepare $arch)"; continue; fi
   fi
-  if ! win_vm wait-ssh 900 > "$BUILD_DIR/windows/wait.log" 2>&1; then
-    kit_fail "toy/windows" "VM did not become reachable"
-    sed 's/^/    | /' "$BUILD_DIR/windows/wait.log" | head -20
-    [ "$started" = 1 ] && win_stop
-    return
+  stage_arch "$OS" "$arch" "$triple" "$sysroot" "$BUILD_DIR/$OS/$arch"
+done
+
+# Phase 2: execute via the shared seam. Queue every staged case whose tag has a
+# runner; SKIP (do not FAIL) cases for a tag with no VM. One flush drains all
+# tags, booting each VM lazily and reusing it across tags (one Windows VM serves
+# both arches); the EXIT trap tears the VMs down.
+n="${#TF_LABEL[@]}"
+i=0
+while [ "$i" -lt "$n" ]; do
+  tag="${TF_TAG[$i]}"
+  if exec_target_supported "$tag"; then
+    exec_target_queue "$tag" "${TF_LABEL[$i]}" "${TF_EXE[$i]}" \
+      "${TF_OUT[$i]}" "${TF_ERR[$i]}" "${TF_RC[$i]}"
+  else
+    kit_skip "${TF_LABEL[$i]}" "no runner for $tag"
+    TF_RC[$i]=""   # mark "not executed" so the result loop skips it
   fi
+  i=$((i + 1))
+done
+
+exec_target_flush
+
+# Compare each executed case's exit code to its oracle (8-bit mask: the .toy
+# oracle is a POSIX exit status; exec_vm already masks Windows codes when writing
+# .rc, so this is a no-op there but keeps FreeBSD >255 returns honest).
+i=0
+while [ "$i" -lt "$n" ]; do
+  rcfile="${TF_RC[$i]}"
+  if [ -z "$rcfile" ]; then i=$((i + 1)); continue; fi   # skipped above
+  label="${TF_LABEL[$i]}"; exp=$(( ${TF_EXP[$i]} & 255 ))
+  rc="$(cat "$rcfile" 2>/dev/null || echo 127)"
+  case "$rc" in
+    ''|*[!0-9-]*)
+      kit_fail "$label" "binary did not run in VM (rc=$rc)" ;;
+    *)
+      if [ "$(( rc & 255 ))" -eq "$exp" ] 2>/dev/null; then kit_pass "$label"
+      else kit_fail "$label" "expected rc $exp, got $rc"; fi ;;
+  esac
+  i=$((i + 1))
+done
 
-  for arch in $WINDOWS_ARCHES; do
-    stage="$BUILD_DIR/windows/$arch"
-    [ -f "$stage/manifest" ] && [ -s "$stage/manifest" ] || continue
-    batch="$stage/batch.out"
-    if ! win_vm run-batch "$arch" "$stage" > "$batch" 2> "$stage/batch.err"; then
-      kit_fail "toy/windows-$arch" "run-batch failed"
-      sed 's/^/    | /' "$stage/batch.err" | head -20
-      continue
-    fi
-    report_results "$stage/manifest" "$batch"
-  done
-  [ "$started" = 1 ] && win_stop
-}
-
-win_stop() {
-  [ "${KIT_TOY_VM_KEEP_UP:-0}" = 1 ] && return 0
-  win_vm stop >/dev/null 2>&1 || true
-}
-
-# ---- drive -----------------------------------------------------------------
-mkdir -p "$BUILD_DIR/freebsd" "$BUILD_DIR/windows"
-case "$OS" in
-  freebsd) run_freebsd ;;
-  windows) run_windows ;;
-esac
-
-# Skips (missing sysroot/VM, inapplicable cases) are non-fatal, matching
-# test/toy/run.sh — only a real FAIL gates the exit.
 KIT_SKIP_IS_FAILURE=0
 kit_summary "toy-vm-$OS"
 kit_exit
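
After the flush, the new result loop scores each executed case on an 8-bit oracle. Both the case's expected value and the reported rc are masked with `& 255`. An empty or non-numeric rc (for example the `LAUNCHFAIL` marker from the removed PowerShell runner) is reported as "did not run" rather than being compared. Below is a standalone sketch of that decision, assuming only bash. `classify_rc` is a hypothetical name, and it echoes a verdict where the script calls `kit_pass`/`kit_fail`:

```bash
#!/usr/bin/env bash
# classify_rc EXPECTED RC: hypothetical helper mirroring the result loop in
# test/toy/vm.sh. Prints a verdict instead of calling kit_pass/kit_fail.
classify_rc() {
  local expected="$1" rc="$2" exp
  exp=$(( expected & 255 ))            # the .toy oracle is an 8-bit status
  case "$rc" in
    ''|*[!0-9-]*)                      # empty or LAUNCHFAIL: never ran
      echo "FAIL (did not run: rc=$rc)"
      return ;;
  esac
  if [ $(( rc & 255 )) -eq "$exp" ]; then
    echo PASS
  else
    echo "FAIL (expected $exp, got $rc)"
  fi
}

classify_rc 300 44          # kernel already truncated 300 to 44: PASS
classify_rc 300 300         # Windows kept the full code; masked to 44: PASS
classify_rc 0 LAUNCHFAIL    # non-numeric: FAIL (did not run: rc=LAUNCHFAIL)
```

Masking both sides is what makes the lanes agree. FreeBSD and Linux truncate a status above 255 in the kernel, while Windows reports the full 32-bit code, so comparing raw values would pass on one OS and fail on another.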

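Phase 2 of the new vm.sh works in three steps. It queues every staged case whose tag has a runner and skips the rest by blanking their `TF_RC` entry. It then drains the queue with a single `exec_target_flush`. Finally, it scores only the entries that still have an rc path. The sketch below reproduces that flow without a VM. It uses hypothetical local stand-ins for `exec_target_supported`, `exec_target_queue` and `exec_target_flush`. They take the same argument order as the calls in the diff, but they run small `sh` scripts on the host instead of driving the real seam. `add_case` is also hypothetical:

```bash
#!/usr/bin/env bash
# Queue -> flush -> score, as in Phase 2 of test/toy/vm.sh. The exec_target_*
# functions here are HYPOTHETICAL local stand-ins, not the real seam.
work="$(mktemp -d)"; trap 'rm -rf "$work"' EXIT

Q=()                                              # queued "exe|out|err|rc"
exec_target_supported() { [ "$1" = x86_64-freebsd ]; }  # one runner only
exec_target_queue() { Q+=("$3|$4|$5|$6"); }       # TAG LABEL EXE OUT ERR RC
exec_target_flush() {                             # run every queued case once
  local e exe out err rc
  for e in "${Q[@]}"; do
    IFS='|' read -r exe out err rc <<< "$e"
    sh "$exe" > "$out" 2> "$err"; echo "$?" > "$rc"
  done
}

TF_LABEL=(); TF_TAG=(); TF_EXE=(); TF_EXP=(); TF_OUT=(); TF_ERR=(); TF_RC=()
add_case() {   # add_case LABEL TAG EXPECTED EXIT (hypothetical case builder)
  printf 'exit %s\n' "$4" > "$work/$1.sh"
  TF_LABEL+=("$1"); TF_TAG+=("$2"); TF_EXE+=("$work/$1.sh"); TF_EXP+=("$3")
  TF_OUT+=("$work/$1.out"); TF_ERR+=("$work/$1.err"); TF_RC+=("$work/$1.rc")
}
add_case ret42   x86_64-freebsd  42 42
add_case ret300  x86_64-freebsd 300 44   # oracle 300; kernel reports 44
add_case winonly x86_64-windows   0  0   # no runner in this sketch: SKIP

n="${#TF_LABEL[@]}"
for ((i = 0; i < n; i++)); do
  if exec_target_supported "${TF_TAG[$i]}"; then
    exec_target_queue "${TF_TAG[$i]}" "${TF_LABEL[$i]}" "${TF_EXE[$i]}" \
      "${TF_OUT[$i]}" "${TF_ERR[$i]}" "${TF_RC[$i]}"
  else
    echo "SKIP ${TF_LABEL[$i]}"; TF_RC[$i]=""    # empty = not executed
  fi
done

exec_target_flush                                 # one drain for all tags

for ((i = 0; i < n; i++)); do
  [ -n "${TF_RC[$i]}" ] || continue               # skipped above
  rc="$(cat "${TF_RC[$i]}")"
  if [ $(( rc & 255 )) -eq $(( TF_EXP[i] & 255 )) ]; then
    echo "PASS ${TF_LABEL[$i]}"
  else
    echo "FAIL ${TF_LABEL[$i]}"
  fi
done
```

Blanking the rc path, rather than keeping a separate skip list, lets the scoring loop use one test (an empty `TF_RC` entry) for "not executed". This is the convention the diff adopts.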
	git clone https://git.ryansepassi.com/git/kit.git

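The commit message describes the lifecycle the seam adds for VM-backed tags:

- boot lazily and at most once per VM, with one Windows VM serving both arches;
- keep each VM warm across flushes;
- on exit, stop only the VMs this run booted.

Below is a minimal sketch of that bookkeeping, written to stay bash 3.2 compatible (no associative arrays, since macOS ships bash 3.2). `lazy_setup`, `teardown_all`, `vm_key`, `vm_is_up`, `vm_boot` and `vm_stop` are hypothetical names that only print. They are not the API of test/lib/exec_vm.sh:

```bash
#!/usr/bin/env bash
# Lazy, once-only VM setup with "stop only what we booted" teardown.
# All helpers are HYPOTHETICAL stand-ins that print instead of driving QEMU.
READY=" "     # space-delimited set of VM keys already set up
BOOTED=()     # VM keys this process booted, so it owns their shutdown

vm_key() {    # map a tag to its VM: both Windows arches share one VM
  case "$1" in *-windows) echo windows ;; *) echo "$1" ;; esac
}
vm_is_up() { [ "$1" = "${ALREADY_UP:-}" ]; }   # pretend one guest is running
vm_boot()  { echo "boot $1"; }
vm_stop()  { echo "stop $1"; }

lazy_setup() {                     # idempotent: boot at most once per key
  local key; key="$(vm_key "$1")"
  case "$READY" in *" $key "*) return 0 ;; esac   # warm: reuse as-is
  if ! vm_is_up "$key"; then       # otherwise reuse a guest already running
    vm_boot "$key"
    BOOTED+=("$key")
  fi
  READY="$READY$key "
}

teardown_all() {                   # never stop a VM we merely reused
  local key
  for key in "${BOOTED[@]}"; do vm_stop "$key"; done
}
trap teardown_all EXIT

ALREADY_UP=x86_64-freebsd
for tag in aarch64-windows x86_64-windows x86_64-freebsd aarch64-windows; do
  lazy_setup "$tag"
done
# Prints "boot windows" once; the EXIT trap then prints "stop windows".
```

Tracking `BOOTED` separately from `READY` mirrors the `started=1` flag in the removed per-OS lanes. A guest that was already up is reused but never shut down by the trap.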
M	doc/TESTING.md	|	59	++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
M	lang/c/parse/parse.c	|	6	++++++
M	lang/c/parse/parse_priv.h	|	8	++++++++
M	mk/test.mk	|	34	+++++++++++++++++++++++++++++++++-
M	scripts/freebsd_vm.sh	|	30	+++++++++++++++++-------------
A	scripts/hosted.sh	|	196	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M	scripts/windows_vm.sh	|	42	+++++++++++++++++++++++-------------------
A	test/hosted/cases/hello.c	|	6	++++++
A	test/hosted/cases/hello.expected	|	1	+
A	test/hosted/cases/hello.stdout	|	1	+
A	test/hosted/run.sh	|	143	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M	test/lib/exec_target.sh	|	60	+++++++++++++++++++++++++++++++++++++++++++++++++-----------
A	test/lib/exec_vm.sh	|	264	+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
M	test/toy/vm.sh	|	376	++++++++++++++++++++++---------------------------------------------------------
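
The removed `fbsd_stop` helper shut a guest down in escalating steps:

1. ask the guest to power off;
2. poll the QEMU process with `kill -0` for up to 30 seconds;
3. send SIGTERM, then SIGKILL;
4. `wait` to reap the process.

Below is the same escalation as a standalone bash function. `stop_with_grace` and its TIMEOUT parameter are hypothetical, and the guest-side shutdown request (the `shutdown -p now` over ssh) is left to the caller:

```bash
#!/usr/bin/env bash
# stop_with_grace PID [TIMEOUT]: give a child process TIMEOUT seconds to exit
# on its own, then SIGTERM, then SIGKILL, then reap it. PID must be a child
# of this shell for `wait` to reap it.
stop_with_grace() {
  local pid="$1" timeout="${2:-30}" i
  for i in $(seq 1 "$timeout"); do
    if ! kill -0 "$pid" 2>/dev/null; then      # already gone: done
      wait "$pid" 2>/dev/null || true
      return 0
    fi
    sleep 1
  done
  kill "$pid" 2>/dev/null || true              # polite: SIGTERM
  sleep 1
  if kill -0 "$pid" 2>/dev/null; then
    kill -9 "$pid" 2>/dev/null || true         # forced: SIGKILL
  fi
  wait "$pid" 2>/dev/null || true
}

# Usage: a stand-in "VM" that ignores SIGTERM, so only SIGKILL stops it.
( trap '' TERM; exec sleep 60 ) > /dev/null 2>&1 &
child=$!
stop_with_grace "$child" 2
```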