boot2

Playing with the bootstrap
git clone https://git.ryansepassi.com/git/boot2.git

commit 4ea652af1d9ea5e7dc9869386da837c95ff76f49
parent c24e386d1f9da71804e861ccfa961b6db0f6d924
Author: Ryan Sepassi <rsepassi@gmail.com>
Date:   Thu, 23 Apr 2026 06:54:48 -0700

Document m1macro P1 port plan

Diffstat:
A docs/M1M-P1-PORT.md | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 194 insertions(+), 0 deletions(-)

diff --git a/docs/M1M-P1-PORT.md b/docs/M1M-P1-PORT.md
@@ -0,0 +1,194 @@
+# m1macro to P1 Port Plan
+
+## Goal
+
+Replace `src/m1macro.c` with a real P1 implementation in `src/m1m.M1`.
+`src/m1m.M1` must be pure portable P1 source. The final `m1m` binary must
+expand M1M input without shelling out to awk, C, Python, libc, or any host
+macro processor.
+
+Contract:
+
+```
+m1m input.M1 output.M1
+```
+
+Behavior should match `src/m1macro.c` byte-for-byte for valid inputs, except
+where an implementation limit is explicitly documented.
+
+Architecture-specific code is not allowed in `src/m1m.M1`. The only
+architecture-specific layer is the generated P1 `DEFINE` file that `catm`
+prepends before `src/m1m.M1` during assembly. If the port needs additional
+P1 op/register/immediate combinations, add them to the generator and
+regenerate the arch-specific define tables.
+
+## Scope
+
+Implement the full current macro language:
+
+- `%macro NAME(a, b)` / `%endm`
+- `%NAME(x, y)` function-like expansion with recursive rescanning
+- `##` token paste
+- `@local`, `:@local`, `&@local` per-expansion local rewriting
+- `:param` / `&param` prefixed single-token parameter substitution
+- `%le32(expr)` / `%le64(expr)`
+- `%select(cond, then, else)`
+- Lisp-shaped integer expressions used by the builtins
+
+Preserve the C tokenizer model: whitespace is normalized, strings are single
+tokens, `#` and `;` comments are skipped, and output is emitted as tokens plus
+newlines rather than preserving original formatting.
+
+## Static Data Model
+
+Use fixed BSS arenas, mirroring the C implementation:
+
+- Input buffer: raw file contents plus NUL sentinel.
+- Output buffer: emitted text.
+- Text buffer: copied token text and generated text.
+- Source token array: token records for the original input.
+- Macro table: name, params, and body token records.
+- Expansion pool: temporary tokens produced by macro calls and `%select`.
+- Stream stack: active token streams for recursive rescanning.
+
+Token record layout should be compact and uniform:
+
+```
+kind 8 bytes
+text_ptr 8 bytes
+len 8 bytes
+line 8 bytes
+```
+
+Macro records should store offsets/pointers into the text arena and token
+arena, not inline strings. Prefer power-of-two record sizes so address math
+stays simple in P1.
+
+## Implementation Milestones
+
+1. **Runtime shell**
+
+   Keep the existing P1 argv, open/read, write, and fatal-error paths. Remove
+   any external backend or `execve` shortcut.
+
+2. **Text and token primitives**
+
+   Add helpers for `append_text_len`, `push_token`, token equality,
+   span equality, output token emission, and output newline emission.
+   Keep error handling simple: set an error message pointer and branch to
+   `fatal`.
+
+3. **Lexer**
+
+   Port `lex_source` directly. It should fill `source_tokens` from
+   `m1m_input_buf`, copying all token text into `text_buf`.
+
+4. **Stream processor skeleton**
+
+   Implement push/pop stream and the main `process_tokens` loop. Initially
+   support pass-through tokens and `%macro` skipping, then expand toward full
+   behavior.
+
+5. **Macro definitions**
+
+   Port `define_macro`: parse header, params, body tokens, duplicate-name
+   checks, body limit checks, and line-start `%endm` recognition.
+
+6. **Macro call expansion**
+
+   Port `parse_args`, parameter substitution, prefixed substitution, local
+   rewriting, token paste, and expansion-stream pushback.
+
+7. **Expression evaluator**
+
+   Port integer atom parsing and S-expression evaluation. Implement arithmetic,
+   comparisons, shifts, and bitwise ops over 64-bit signed values as far as P1
+   can represent them. Document any temporary 32-bit limitation if unavoidable,
+   but the target is C-compatible 64-bit behavior.
+
+8. **Builtins**
+
+   Implement `%le32`, `%le64`, and `%select` on top of the expression evaluator
+   and stream pushback.
+
+9. **Cleanup and limits**
+
+   Replace generic “not implemented” errors with precise failures for buffer
+   overflow, malformed macro headers, arg-count mismatch, bad expressions, and
+   bad paste operands.
+
+## Portability Rule
+
+`src/m1m.M1` must use only P1 tokens plus labels/data. Do not hand-code
+aarch64, amd64, or riscv64 instructions in this file. Do not introduce
+per-arch branches, per-arch data layouts, or per-arch syscall sequences in the
+implementation.
+
+Allowed architecture-specific work:
+
+- Extend `src/p1_gen.py` when `m1m.M1` needs a P1 operation tuple that is not
+  currently generated.
+- Regenerate `build/<arch>/p1_<arch>.M1`.
+- Keep the existing build shape where the arch-specific define file is
+  prepended with `catm` before the portable P1 source.
+
+All algorithmic behavior, buffer layout, parsing, expansion, expression
+evaluation, and error handling belongs in portable P1.
+
+## P1 Support Needed
+
+The current build may stage `PROG=m1m` on aarch64 first, but the source must
+remain portable P1 from the start. Staging on one arch is a build milestone,
+not permission to add arch-specific source.
+
+Likely generator/table updates:
+
+- More `ADDI` immediates for record-size and arena-limit arithmetic.
+- More `LD/ST/LB/SB` offsets for token, macro, and stream record fields.
+- Additional RRR register triples used by parser loops and address math.
+- Possibly a small set of helpers/macros for 32-byte record addressing.
+
+Do not hide core behavior behind host tools. If a P1 operation is missing,
+extend the generated P1 definitions or rewrite the algorithm in available P1.
+
+## Acceptance Tests
+
+Use `src/m1macro.c` as the oracle during development.
+
+Minimum checks:
+
+1. Build `m1m`:
+
+   ```
+   make PROG=m1m ARCH=aarch64 build/aarch64/m1m
+   ```
+
+2. Compare representative inputs against the C implementation:
+
+   ```
+   src/m1macro.c oracle: p1/aarch64.M1M
+   src/m1macro.c oracle: p1/P1.M1M
+   custom fixture: paste, locals, prefixed args, %le32/%le64, %select
+   malformed fixtures: duplicate macro, bad paste, wrong arg count
+   ```
+
+3. Require byte-identical output for valid fixtures.
+
+4. Require non-zero exit for invalid fixtures.
+
+5. Once stable, use `m1m` to expand the P1 M1M front-end and assemble a small
+   program through the normal stage0 toolchain.
+
+## Non-Goals
+
+- No dependency on awk, shell scripts, Python, libc, or the host C compiler at
+  runtime.
+- No new macro language features.
+- No formatting preservation beyond the current C expander behavior.
+- No recursive macro cycle detection unless added after parity.
+
+## Done Definition
+
+`src/m1m.M1` contains the expander core, the generated `m1m` binary runs in the
+target Alpine container, and all acceptance tests match `src/m1macro.c` without
+executing any external macro-expansion program.
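
The plan's preference for power-of-two record sizes can be illustrated with a minimal C sketch of the 32-byte token record. The field names come from the layout in the plan. The field types, the `Token`, `token_at`, and `TOKEN_SHIFT` names, the tiny `MAX_TOKENS` limit, and the `-1` overflow return are assumptions made for illustration and are not taken from `src/m1macro.c`. The plan does name a `push_token` helper, but the signature below is a guess.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the plan's token record: four 8-byte fields.
   Treating text_ptr as an arena offset is an assumption. */
typedef struct {
    uint64_t kind;      /* token kind tag */
    uint64_t text_ptr;  /* location of the token text in the text arena */
    uint64_t len;       /* token length in bytes */
    uint64_t line;      /* source line number */
} Token;

_Static_assert(sizeof(Token) == 32, "token record must stay 32 bytes");

/* 32 == 1 << 5, so a record address is base + (index << 5).
   MAX_TOKENS is deliberately small and purely illustrative. */
enum { TOKEN_SHIFT = 5, MAX_TOKENS = 4 };

static Token token_arena[MAX_TOKENS];  /* fixed arena, no heap allocation */
static size_t token_count;

/* Shift-based addressing: no multiply is needed to index the arena. */
static Token *token_at(size_t i) {
    assert(i < MAX_TOKENS);
    return (Token *)((unsigned char *)token_arena + (i << TOKEN_SHIFT));
}

/* Append one record; returns -1 instead of writing past the arena limit. */
static int push_token(uint64_t kind, uint64_t text_ptr,
                      uint64_t len, uint64_t line) {
    if (token_count >= MAX_TOKENS) {
        return -1;  /* a real port would set an error message and branch to fatal */
    }
    Token *t = token_at(token_count++);
    t->kind = kind;
    t->text_ptr = text_ptr;
    t->len = len;
    t->line = line;
    return 0;
}
```

With a 32-byte record, indexing reduces to one shift and one add, which is the kind of address math the plan expects to need extra `ADDI` immediates and `LD/ST` field offsets for (0, 8, 16, and 24 in this sketch).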