commit ba84296701ef636f54258c4a3b2e5850f63254e7
parent e9d50b023fe017da1965ab93090e14c169f589b8
Author: Ryan Sepassi <rsepassi@gmail.com>
Date: Mon, 4 May 2026 11:00:58 -0700
OS.md update
Diffstat:
M docs/OS.md | 189 ++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------
1 file changed, 114 insertions(+), 75 deletions(-)
diff --git a/docs/OS.md b/docs/OS.md
@@ -41,14 +41,43 @@ These are the native Linux ABIs; the per-arch shims in
~520–930) marshal P1 registers into them. Any kernel that implements
these three ABIs verbatim can host the chain.
-Syscall numbers are the standard Linux-on-`uname-m` numbers used by
-those macros (e.g. `read=63` on aarch64, `read=0` on amd64). A
-fresh-write OS is free to renumber, but only at the cost of also
-rewriting the per-arch `p1_sys_*` macros.
+## Platform layers
-## Process image
+A compliant platform owes the chain four things:
-### ELF
+1. **ISA execution** — a CPU (or emulator) that runs the target
+ user-mode instruction stream `M0`/`hex2` emit.
+2. **Image loader** — reads a static ELF, maps `PT_LOAD` segments,
+ lays out the initial stack, transfers control to `e_entry`.
+3. **Address space and syscall trap** — a per-process virtual memory
+ with a movable program break, plus a trap handler that decodes the
+ per-arch syscall ABI from §Targets and dispatches.
+4. **Syscall implementations** — the 8 Tier-1 / +3 Tier-2 behaviors,
+ backed by a byte-addressable persistent store for the file-related
+ ones.
+
+The remaining sections specify each layer. "Implementing the
+contract" means all four; readers chasing only the syscall tables
+will miss layers 1–3.
+
+## Layer 1 — ISA execution
+
+The chain emits **integer-only, user-mode** code for the chosen arch:
+
+- Integer arithmetic, load/store, branches/calls.
+- The syscall trap instruction from §Targets.
+- **No FPU.** `HAVE_FLOAT` is off through libc; `cc.scm` rejects
+ `0.0` literals. The kernel needs no FP save/restore beyond what
+ the platform demands (single-process here, so moot).
+- **No SIMD, no atomics.** Single-threaded; no shared memory.
+- **One arch per image.** No multi-arch fat ELFs.
+
+A platform that can run static integer-only Linux user binaries on
+the named arch already satisfies this layer.
+
+## Layer 2 — Image loader
+
+### ELF format
- **ET_EXEC, static.** No `PT_INTERP`, no dynamic linker. tcc-boot2's
output and every host artefact are statically linked.
@@ -57,12 +86,11 @@ rewriting the per-arch `p1_sys_*` macros.
- **Entry at `e_entry`.** No `_start` indirection required from the
kernel; the loader's job is to transfer control to `e_entry` with
the stack laid out below and to return execution to userspace.
-- **Single arch per image.** No multi-arch fat ELFs.
The `ELF.hex2` file in this repo emits exactly this shape (one
`PT_LOAD`, `e_entry` set, no PHDR self-reference).
-### Stack at entry
+### Initial stack
Standard Linux SysV layout. The kernel must place at the initial
stack pointer, low to high:
@@ -83,7 +111,9 @@ sp + 8 argv[0] (pointer)
NULL to find `environ`. **auxv is not required** — nothing in the
chain reads it.
-### Address space
+## Layer 3 — Address space and syscall trap
+
+### Memory model
- **One contiguous heap, grown via `brk`.** The kernel exposes a
per-process program break; `sys_brk(0)` returns it, `sys_brk(addr)`
@@ -95,55 +125,24 @@ chain reads it.
request.** No W^X enforcement complications: tcc-boot2 doesn't JIT;
every page is either RX (text) or RW (data/bss/stack/heap).
-## Process lifecycle
-
-- **Image swap via `execve`** (Tier 2). Replaces the calling process's
- memory map; on success, control returns at the new image's
- `e_entry`.
-- **Spawn via `clone`** with `fork()` semantics (Tier 2): new
- address space (no `CLONE_VM`), new fd table, parent/child return
- distinguished by return value (0 in child, child-pid in parent).
- The scheme1 prelude calls `(sys-clone)` with no arguments — the
- P1pp wrapper supplies `SIGCHLD` as the only flag. The `fork()`
- syscall itself is not required.
-- **Reap via `waitid`** (Tier 2). Only `WEXITED` (=4) is used. Job
- control flags are not needed.
-- **Termination via `exit_group`.** Exit status is the low byte of
- the argument. No `atexit`, no destructors.
-
-No signal-handler installation is required. Default actions
-(SIGSEGV → terminate, SIGPIPE → terminate, etc.) are sufficient. The
-chain installs zero handlers; `boot2-syscall.c` stubs `raise` to
-ENOSYS.
-
-## Filesystem
-
-A flat, byte-addressable file abstraction with POSIX read/write
-semantics. Concretely:
-
-- Regular files have a length and an in-file byte offset per fd.
-- `O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_TRUNC | O_APPEND` flags
- honored; no `O_NONBLOCK`, no `O_DIRECT`.
-- Mode bits on `openat(O_CREAT)`: only the user-rwx bits need
- honoring; group/other and setuid bits can be ignored.
-- `lseek` whences: `SEEK_SET=0`, `SEEK_CUR=1`, `SEEK_END=2`.
-- `unlinkat(AT_FDCWD, path, 0)` removes a regular file.
+### Syscall ABI
-**Not required:**
+Trap instruction, argument registers, syscall-number register, and
+return register are listed per arch in §Targets. Syscall numbers
+default to the standard Linux-on-`uname -m` values used by the per-arch
+P1 macros (e.g. `read=63` on aarch64, `read=0` on amd64). A
+fresh-write OS may renumber, but only at the cost of also rewriting
+the per-arch `p1_sys_*` macros in `P1/P1-{aarch64,amd64,riscv64}.M1pp`.
-- `stat`, `fstat`, directory iteration, symlinks, hard links, file
- modes beyond a usable subset, mtime, ownership.
-- A hierarchical filesystem in any rich sense; flat directory plus
- `/` separators is enough. tcc-boot2 reads files by literal path
- strings the build emits.
+Error returns follow the standard Linux convention: a non-negative
+result on success or a negative errno value in the return register.
+See [§Error convention](#error-convention).
-The chain opens 3 fd kinds: source files (read), output files
-(write+create+trunc), and the inherited stdin/stdout/stderr (0/1/2).
-Pipes appear only at Tier 2.
+## Layer 4 — Syscalls
-## Tier 1 — toolchain syscalls
+### Tier 1 — toolchain (8 calls)
-Eight calls. Wired in `P1/P1pp.P1pp:986-1055`.
+Wired in `P1/P1pp.P1pp:986-1055`.
| name | linux nr (aa64 / amd64 / riscv64) | semantics |
|-----------|-----------------------------------|------------------------------------------------------|
@@ -156,21 +155,48 @@ Eight calls. Wired in `P1/P1pp.P1pp:986-1055`.
| unlinkat | 35 / 263 / 35 | called as `unlinkat(AT_FDCWD=-100, path, 0)` |
| exit_group| 93 / 60 / 93 | `void exit(status)`; never returns |
-Errors are returned as negative errno (`-EBADF`, `-ENOENT`, …) in the
-result register, per the standard Linux convention. The libc errno
-layer (`vendor/mes-libc/boot2-syscall.c`) negates and stores into a
-single global `errno` int.
-
Everything in `docs/LIBC.txt`'s "syscall-using" column reduces to
exactly these eight (`fopen → openat`, `fseek → lseek`, `malloc/
realloc/free → brk`, `__assert_fail / abort / exit → exit_group`,
etc.).
-## Tier 2 — driver syscalls
+#### Filesystem semantics
+
+A flat, byte-addressable file abstraction with POSIX read/write
+semantics:
+
+- Regular files have a length and an in-file byte offset per fd.
+- `O_RDONLY | O_WRONLY | O_RDWR | O_CREAT | O_TRUNC | O_APPEND` flags
+ honored; no `O_NONBLOCK`, no `O_DIRECT`.
+- Mode bits on `openat(O_CREAT)`: only the user-rwx bits need
+ honoring; group/other and setuid bits can be ignored.
+- `lseek` whences: `SEEK_SET=0`, `SEEK_CUR=1`, `SEEK_END=2`.
+- `unlinkat(AT_FDCWD, path, 0)` removes a regular file.
+
+Not required: `stat`, `fstat`, directory iteration, symlinks, hard
+links, file modes beyond a usable subset, mtime, ownership. A
+hierarchical filesystem in any rich sense is not required either —
+flat directory plus `/` separators is enough; tcc-boot2 reads files
+by literal path strings the build emits.
+
+The chain opens 3 fd kinds: source files (read), output files
+(write+create+trunc), and the inherited stdin/stdout/stderr (0/1/2).
+No pipes are used at any tier.
+
+#### Termination
+
+- **`exit_group`.** Exit status is the low byte of the argument. No
+ `atexit`, no destructors.
+- **No signal-handler installation required.** Default actions
+ (SIGSEGV → terminate, SIGPIPE → terminate, etc.) are sufficient.
+ The chain installs zero handlers; `boot2-syscall.c` stubs `raise`
+ to ENOSYS.
+
+### Tier 2 — driver (+3 calls)
-Adds three. Per-arch macros already exist in `P1/P1-*.M1pp`. The
-scheme1 prelude's `spawn` / `run` / `wait` / `exit` are built
-directly on these (`scheme1/prelude.scm:520-537`).
+Per-arch macros already exist in `P1/P1-*.M1pp`. The scheme1 prelude's
+`spawn` / `run` / `wait` / `exit` are built directly on these
+(`scheme1/prelude.scm:520-537`).
| name | linux nr (aa64 / amd64 / riscv64) | driver role |
|---------|-----------------------------------|-------------------------------------------|
@@ -178,7 +204,20 @@ directly on these (`scheme1/prelude.scm:520-537`).
| execve | 221 / 59 / 221 | image swap; takes `(prog, argv)` — no envp arg in the prelude wrapper, so the kernel-side execve must accept a NULL/empty envp without erroring |
| waitid | 95 / 247 / 95 | reap child; called as `waitid(P_PID=1, pid, info, WEXITED=4)` — info[8]=si_code, info[24]=si_status (`scheme1/prelude.scm:497-506`) |
-**Notably not required:**
+#### Process lifecycle
+
+- **Image swap via `execve`.** Replaces the calling process's memory
+ map; on success, control returns at the new image's `e_entry`.
+- **Spawn via `clone`** with `fork()` semantics: new address space
+ (no `CLONE_VM`), new fd table, parent/child return distinguished by
+ return value (0 in child, child-pid in parent). The scheme1 prelude
+ calls `(sys-clone)` with no arguments — the P1pp wrapper supplies
+ `SIGCHLD` as the only flag. The `fork()` syscall itself is not
+ required.
+- **Reap via `waitid`.** Only `WEXITED` (=4) is used. Job control
+ flags are not needed.
+
+Notably **not** required at Tier 2:
- `dup3` / `dup2`, `pipe` / `pipe2` — no fd plumbing between
processes. Children inherit stdin/stdout/stderr (0/1/2) from the
@@ -190,18 +229,18 @@ directly on these (`scheme1/prelude.scm:520-537`).
If a future driver needs redirection (say, capturing tcc-boot2's
stderr into a file), the right move is to grow the prelude to use
-`dup3` and add the syscall here; until then it's not in the
-contract.
-
-## Errors
-
-- **Convention:** every syscall returns either a non-negative result
- or a negative errno value in the result register. No errno TLS
- variable in the kernel/userspace contract — the value lives in the
- return register.
-- **Errno numbers:** standard Linux constants (`EBADF=9`,
- `ENOENT=2`, `EFAULT=14`, …). The libc layer maps them through
- `strerror` lookup tables vendored from mes.
+`dup3` and add the syscall here; until then it's not in the contract.
+
+### Error convention
+
+- Every syscall returns either a non-negative result or a negative
+ errno value in the return register. No errno TLS variable in the
+ kernel/userspace contract — the value lives in the return register.
+ The libc errno layer (`vendor/mes-libc/boot2-syscall.c`) negates
+ and stores into a single global `errno` int.
+- Errno numbers: standard Linux constants (`EBADF=9`, `ENOENT=2`,
+ `EFAULT=14`, …). The libc layer maps them through `strerror` lookup
+ tables vendored from mes.
## Out of scope