scheme1 garbage collector

scheme1 uses a non-moving, stop-the-world mark-and-sweep collector. All pairs, headered Scheme objects, and raw byte buffers share one 256 MiB managed heap. Object addresses never change, so eq?, mutation, record identity, and unsafe address-inspection behavior remain stable across a collection.

Managed block layout

Every allocation has a 16-byte collector header immediately before its unchanged payload:

header + 0: (total block bytes << 8) | kind | mark
header + 8: intrusive link
payload:    existing PAIR, HEAP, or raw-byte layout

The allocation kinds are PAIR, HEAP, RAW, and FREE. Within the low byte, the mark bit is 0x80 and the kind occupies the low three bits. The total size includes the collector header and any alignment padding. While unmarked, an allocated block stores its own header address in the second word, which lets the collector validate object-start candidates. That word is reused as the mark worklist link while collecting and as the free-list link after sweeping.
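
The first header word can be modeled in a few lines of Python. This is a sketch, not the runtime's code: the numeric kind codes and the 16-byte alignment are assumptions, since the text names the kinds and the bit positions but not those values.

```python
# Hypothetical kind codes: the text names the kinds but not their values.
KIND_FREE, KIND_PAIR, KIND_HEAP, KIND_RAW = 0, 1, 2, 3

KIND_MASK = 0x07    # kind occupies the low three bits
MARK_BIT = 0x80     # mark bit within the low byte
SIZE_SHIFT = 8      # total block bytes sit above the low byte
HEADER_BYTES = 16   # collector header that precedes the payload


def pack_header(total_bytes, kind, marked=False):
    """Build header word 0 as (total block bytes << 8) | kind | mark."""
    if kind & ~KIND_MASK:
        raise ValueError("kind must fit in three bits")
    return (total_bytes << SIZE_SHIFT) | kind | (MARK_BIT if marked else 0)


def unpack_header(word):
    """Decode header word 0 into (total_bytes, kind, marked)."""
    return word >> SIZE_SHIFT, word & KIND_MASK, bool(word & MARK_BIT)


def block_total_bytes(payload_bytes, align=16):
    """Header plus payload, rounded up to an assumed 16-byte alignment."""
    raw = HEADER_BYTES + payload_bytes
    return (raw + align - 1) // align * align
```

Under these assumptions a two-word pair payload occupies a 32-byte block, and clearing the mark is a single mask of the low byte.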

Scheme-level layouts and tags are unchanged. A pair payload is still two words and receives TAG.PAIR; a headered object still starts with its HDR word and receives TAG.HEAP; bytevector data and symbol-name copies are untagged RAW payload pointers.

Allocation

cons, alloc_hdr, and alloc_bytes all use the managed allocator. It:

Searches the address-ordered free list using first fit.
Splits a free block when at least a header and one aligned payload word remain.
Otherwise allocates from the unused heap tail.
On failure, performs one collection and retries.
Reports scheme1: heap exhausted if the retry still finds no contiguous block that is large enough.
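
The steps above can be sketched as a small Python model. The class name HeapModel, the collect callback, and the 32-byte split threshold (a 16-byte header plus one payload word rounded up to an assumed 16-byte alignment) are illustrative assumptions, not names or constants from scheme1.

```python
MIN_SPLIT = 32  # assumed: 16-byte header plus one aligned payload word


class HeapModel:
    """Toy model: block sizes already include the collector header."""

    def __init__(self, limit):
        self.free = []      # address-ordered list of [addr, size] free blocks
        self.tail = 0       # first unused byte of the heap tail
        self.limit = limit  # total heap bytes

    def _try_alloc(self, need):
        # Step 1: first fit over the address-ordered free list.
        for i, (addr, size) in enumerate(self.free):
            if size < need:
                continue
            if size - need >= MIN_SPLIT:
                # Step 2: split and keep the remainder on the free list.
                self.free[i] = [addr + need, size - need]
            else:
                # Remainder too small to stand alone: hand out the whole block.
                del self.free[i]
            return addr
        # Step 3: otherwise carve from the unused heap tail.
        if self.tail + need <= self.limit:
            addr = self.tail
            self.tail += need
            return addr
        return None

    def alloc(self, need, collect=lambda: None):
        addr = self._try_alloc(need)
        if addr is None:
            collect()                      # Step 4: one collection, then retry.
            addr = self._try_alloc(need)
        if addr is None:
            raise MemoryError("scheme1: heap exhausted")  # Step 5
        return addr
```

In this model the collect callback stands in for the mark and sweep passes; it is expected to repopulate self.free or lower self.tail before the retry.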

PAIR and HEAP payloads are cleared before publication so an object under construction is safe to trace if a later allocation triggers collection. RAW payloads are opaque and callers initialize the bytes they require.

heap-usage returns the currently allocated physical bytes, including all collector headers. (collect-garbage) requests a synchronous collection and returns the unspecified value.

Exact roots

The native P1 stack is never scanned conservatively. Allocation-capable runtime functions use bounded shadow-root frames. Each descriptor records:

the native-local base address;
a bitmap of slots holding tagged Scheme references;
a bitmap of slots holding managed RAW pointers.

Root slots are zero-initialized on frame entry. GC-aware return and tail-call macros pop the descriptor on every exit path. Overflow of the 8192-frame descriptor area terminates deterministically with scheme1: shadow root stack overflow.
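
A rough Python model of the descriptor stack follows. The class and method names are hypothetical, slots are assumed to be 8-byte words laid out upward from the base address, and a RuntimeError stands in for the deterministic termination described above.

```python
MAX_FRAMES = 8192  # descriptor capacity stated in the text


class ShadowRootStack:
    def __init__(self):
        self.frames = []  # each entry: (base, tagged_bitmap, raw_bitmap)

    def push(self, base, tagged_bitmap, raw_bitmap):
        if len(self.frames) >= MAX_FRAMES:
            raise RuntimeError("scheme1: shadow root stack overflow")
        self.frames.append((base, tagged_bitmap, raw_bitmap))

    def pop(self):
        # The GC-aware return and tail-call macros would do this on every exit.
        self.frames.pop()

    def roots(self, word_bytes=8):
        """Yield (slot_address, is_raw) for every slot a descriptor names."""
        for base, tagged, raw in self.frames:
            bits = tagged | raw
            slot = 0
            while bits:
                if bits & 1:
                    yield base + slot * word_bytes, bool((raw >> slot) & 1)
                bits >>= 1
                slot += 1
```

For example, a frame pushed with tagged bitmap 0b101 and raw bitmap 0b010 exposes three root slots, the middle one as a RAW pointer. Because slots start out zeroed, a collection that runs before a slot is assigned sees no stale reference there.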

The complete root set is:

active slots described by the shadow-root stack;
both arguments saved by cons while its allocator may collect;
every symbol table entry's stable RAW name and global Scheme binding.

Pointers obtained through unsafe inspection primitives are not roots.

Marking and tracing

Marking does not allocate. A newly marked block is pushed onto an intrusive worklist through its collector link word. Each tagged reference is dispatched by its tag to the allocation kind that tag implies (PAIR for TAG.PAIR, HEAP for TAG.HEAP); raw roots must refer to RAW blocks.

The tracer follows:

Allocation	Outgoing managed references
Pair	`car`, `cdr`
Bytevector	RAW data payload
Closure	parameters, body, environment
Primitive	parameter data (type descriptor or fixnum)
Type descriptor	field-name list
Record	type descriptor and every field
Multiple-values pack	every value slot
RAW	none

Symbols, fixnums, and immediates contain no managed pointer. Type-descriptor names are symbols, so they need no additional traversal.
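
The marking loop can be sketched in Python as below. The Block class and its refs list are a simplification: in the real heap the outgoing references come from the per-kind layouts in the table above, and the link field is the second header word. The point of the sketch is that the worklist is threaded through the blocks themselves, so no side stack is needed.

```python
class Block:
    def __init__(self, kind, refs=()):
        self.kind = kind        # "PAIR", "HEAP", or "RAW"
        self.refs = list(refs)  # outgoing managed references (Block objects)
        self.marked = False
        self.link = None        # models the collector link word


def mark_from(roots):
    """Mark every block reachable from roots using an intrusive worklist."""
    head = None
    for block in roots:
        if not block.marked:
            block.marked = True
            block.link = head   # push through the block's own link word
            head = block
    while head is not None:
        block, head = head, head.link   # pop
        block.link = None
        for child in block.refs:        # RAW blocks have no outgoing refs
            if not child.marked:
                child.marked = True
                child.link = head
                head = child
```

Setting the mark before pushing guarantees each block enters the worklist at most once, so cycles terminate.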

Sweeping

Sweep walks the physical block chain from heap_base to heap_tail. Marked allocations survive and have their mark cleared. Unmarked allocations and prior free blocks become address-ordered free runs; adjacent runs are coalesced immediately. A final free run is trimmed from heap_tail instead of being retained on the free list. The pass also recomputes heap-usage.
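
As an illustration, the sweep pass can be modeled in Python over a physically ordered list of blocks. The dictionary fields and the convention that heap_base is address 0 are assumptions made for the sketch.

```python
def sweep(blocks):
    """Sweep a physically ordered list of dicts with 'size', 'kind', 'marked'.

    Returns (blocks_after_sweep, free_list, heap_tail, heap_usage), where
    free_list holds (addr, size) pairs in address order.
    """
    out = []
    addr = 0      # assumed heap_base
    usage = 0
    for b in blocks:
        if b["kind"] != "FREE" and b["marked"]:
            # Survivor: keep it and clear its mark for the next cycle.
            out.append({"addr": addr, "size": b["size"],
                        "kind": b["kind"], "marked": False})
            usage += b["size"]
        elif out and out[-1]["kind"] == "FREE":
            # Dead or already free, adjacent to a free run: coalesce now.
            out[-1]["size"] += b["size"]
        else:
            out.append({"addr": addr, "size": b["size"],
                        "kind": "FREE", "marked": False})
        addr += b["size"]
    if out and out[-1]["kind"] == "FREE":
        out.pop()  # trim the final free run instead of listing it
    heap_tail = out[-1]["addr"] + out[-1]["size"] if out else 0
    free_list = [(b["addr"], b["size"]) for b in out if b["kind"] == "FREE"]
    return out, free_list, heap_tail, usage
```

Because the walk is in address order, the free list comes out address-ordered without sorting, and a free run is never followed directly by another free run.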

Collection is synchronous and occurs only after allocation failure or an explicit collect-garbage call. Fragmentation can still exhaust the heap when total free space is sufficient but no single run satisfies a request.

Compiler integration

cc.scm builds translation units directly in the managed heap. Persistent compiler data stays reachable through the parser/world/codegen graph; discarded lexer, preprocessor, parser, and evaluator objects are reclaimed automatically. There is no main/scratch heap selection, promotion cycle, or arena rewind API. Generic deep-copy remains available as an ordinary identity-preserving graph clone.
