Gram Import Plan

This is the import plan for folding /Users/ryan/code/ll1 into libkit and exposing it through the kit driver. The goal is not to make ll1 a language frontend. It should land as a reusable parser/lexer-generator subsystem: EBNF in, parse and lexer tables out, with allocation-free push runtimes for generated parsers and lexers.

The standalone project already has the right broad shape for libkit: explicit allocation, no hidden CLI dependency in the generator API, immutable generated tables, and caller-owned runtime state. The import work is mostly namespace, host-boundary, build-gating, and file-layout discipline.

Decision Summary

Driver command: gram.
Public generator API header: <kit/gram.h>.
Public parser runtime header: <kit/gram_parse.h>.
Public lexer runtime header: <kit/gram_lex.h>.
Generated-code support headers: <kit/support/gram_parse_tables.h> and <kit/support/gram_lex_tables.h>.
Library subsystem gate: KIT_GRAM_ENABLED.
Driver tool gate: KIT_TOOL_GRAM_ENABLED.
Implementation directory: src/gram/.
Tests: test/gram/ plus a driver smoke lane.

ll1 remains a source repository/project name only. It should not appear in the libkit API, installed command name, or user-facing help.

Boundaries

gram is a library subsystem, not a lang/ frontend. It does not register a KitFrontendVTable, does not emit KitCg, and does not participate in source-to-object compilation unless a future frontend chooses to use it internally. Its public surface is consumed by embedders, the driver command, and generated C files.

The driver remains the only hosted layer. File reads, directory creation, stdout, stderr, and CLI allocation policy live in driver/cmd/gram.c and the hosted driver environment. The library accepts input as KitSlice, writes output to KitWriter, allocates through KitContext.heap, and reports diagnostics through KitContext.diag.

Generated grammar tables are immutable static data and are fine as globals. Mutable parser, lexer, and generator state must hang off caller-owned runtime objects or explicit libkit handles.
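
The split between immutable tables and caller-owned state can be sketched as follows. This is a minimal illustration, not the ll1 layout: DemoTable, DemoRunner, and the parity automaton are hypothetical stand-ins chosen only to show that one const table can back any number of independent runtime objects.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a generated table: const data, safe to share. */
typedef struct DemoTable {
  const uint8_t* next_state; /* indexed as next_state[state * 2 + bit] */
  size_t state_count;
} DemoTable;

/* Two-state parity automaton: a 1 bit flips the state, a 0 bit keeps it. */
static const uint8_t demo_next[] = {
  /* state 0 */ 0, 1,
  /* state 1 */ 1, 0,
};
static const DemoTable demo_table = { demo_next, 2 };

/* All mutable state lives in an object the caller owns. */
typedef struct DemoRunner {
  const DemoTable* table;
  uint8_t state;
} DemoRunner;

static void demo_runner_init(DemoRunner* r, const DemoTable* t) {
  assert(r != NULL && t != NULL);
  r->table = t;
  r->state = 0;
}

static void demo_runner_push(DemoRunner* r, int bit) {
  assert(r->state < r->table->state_count);
  r->state = r->table->next_state[r->state * 2 + (bit ? 1 : 0)];
}
```

Two DemoRunner values initialized from the same demo_table advance independently, which is the property the global-table rule relies on.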

File Moves

Import the C implementation, generated meta tables, runtime, and tests. Do not make the Python generator part of the normal libkit build.

Current file	Kit destination	Notes
`include/llgen.h`	`include/kit/gram.h`	Public generator API, renamed to Kit types and functions.
`include/llparse.h`	`include/kit/gram_parse.h`	Public parser runtime API.
`include/lllex.h`	`include/kit/gram_lex.h`	Public lexer runtime API.
`include/llparse_tables.h`	`include/kit/support/gram_parse_tables.h`	Public generated-code support, not ordinary embedder API.
`include/lllex_tables.h`	`include/kit/support/gram_lex_tables.h`	Public generated-code support, not ordinary embedder API.
`include/llunicode.h`	`src/gram/unicode.h`	Private helper; expose later as `<kit/unicode.h>` only if there is a broader API need.
`include/llunicode_props.h`	`src/gram/unicode_props.h`	Private generated Unicode property resolver.
`runtime/llparse.c`	`src/gram/parse_runtime.c`	Implements `<kit/gram_parse.h>`.
`runtime/lllex.c`	`src/gram/lex_runtime.c`	Implements `<kit/gram_lex.h>`.
`runtime/llunicode.c`	`src/gram/unicode.c`	Private Unicode helpers for generator and lexer runtime.
`runtime/llunicode_props.c`	`src/gram/unicode_props.c`	Checked-in generated property tables.
`gen/llgen.c`	`src/gram/generator.c`	Public API implementation and C emitter after Kit context rewrite.
`gen/llgen_ll1.c`	`src/gram/ll1.c`	LL(1), Pratt, FIRST/FOLLOW, and validation.
`gen/llgen_lex_byte.c`	`src/gram/lex_byte.c`	Byte-mode lexer compiler.
`gen/llgen_lex_unicode.c`	`src/gram/lex_unicode.c`	Unicode-mode lexer compiler.
`gen/llgen_internal.h`	`src/gram/internal.h`	Private to `src/gram/*.c`.
`gen/meta.ebnf`	`src/gram/meta.ebnf`	Source grammar for the generator's own parser; not consumed by normal builds.
generated `meta` `.c/.h`	`src/gram/meta_tables.c`, `src/gram/meta_tables.h`	Check in generated tables so libkit does not need Python or a previous `gram` to build.
`gen/llgen_cli.c`	`driver/cmd/gram.c`	Rewrite as a Kit driver command using public `<kit/gram.h>`.
`tools/gen_unicode_props.py`	`scripts/gen_gram_unicode_props.py`	Regeneration helper only; not in the build.
`data/ucd/17.0.0/*`	`data/ucd/17.0.0/` or `test/gram/ucd/17.0.0/`	Keep if we want reproducible Unicode-table regeneration in-tree.
`test/*.c`, `test/*.ebnf`	`test/gram/*`	Library tests and fixtures.
`test/errors/*`	`test/gram/errors/*`	Error fixtures.

gen/llgen.py should not be imported into libkit. If parity against the Python reference is still useful during the transition, keep it temporarily as scripts/gram_ref.py and exclude it from release/build dependencies. Delete it once the imported C generator is trusted.

Public Renames

The standalone ll_* and llgen_* names are too short for libkit and would violate the public symbol discipline. Every public definition in the imported subsystem must carry a KitGram, kit_gram_, or KIT_GRAM_ prefix.

Generator API

Standalone name	Kit name
`llgen_options`	`KitGramOptions`
`llgen_compiled`	`KitGramCompiled`
`llgen_codegen`	Remove or narrow; prefer `KitWriter` outputs.
`llgen_compile_text`	`kit_gram_compile_text`
`llgen_compiled_free`	`kit_gram_free`
`llgen_dump_sexpr_text`	`kit_gram_dump_sexpr`
`llgen_generate_c`	`kit_gram_emit_c`
`llgen_parser_grammar`	`kit_gram_parser_grammar`
`llgen_lexer_grammar`	`kit_gram_lexer_grammar`
`llgen_token_count`	`kit_gram_token_count`
`llgen_token_name`	`kit_gram_token_name`
`llgen_token_display`	`kit_gram_token_display`
`llgen_find_token`	`kit_gram_find_token`
`llgen_rule_count`	`kit_gram_rule_count`
`llgen_rule_name`	`kit_gram_rule_name`
`llgen_find_rule`	`kit_gram_find_rule`

The generator API should take a const KitContext* and KitSlice inputs. It should not expose llgen_allocator; the allocator becomes ctx->heap. C output should stream to caller-provided KitWriters. Callers that want owned in-memory text can use kit_writer_mem.

Proposed core shape:

typedef struct KitGramCompiled KitGramCompiled;

typedef struct KitGramOptions {
  KitSlice name; /* optional grammar name override */
} KitGramOptions;

typedef struct KitGramEmitOptions {
  KitSlice header_path;
  KitSlice source_path;
  KitSlice prefix;
} KitGramEmitOptions;

KIT_API KitStatus kit_gram_compile_text(const KitContext* ctx,
                                        KitSlice text, KitSlice path,
                                        const KitGramOptions* opts,
                                        KitGramCompiled** out);
KIT_API void kit_gram_free(KitGramCompiled*);

KIT_API KitStatus kit_gram_emit_c(const KitGramCompiled*,
                                  const KitGramEmitOptions* opts,
                                  KitWriter* header, KitWriter* source);
KIT_API KitStatus kit_gram_dump_sexpr(const KitContext* ctx, KitSlice text,
                                      KitSlice path,
                                      const KitGramOptions* opts,
                                      KitWriter* out);
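
The writer-based output style can be sketched without libkit. Everything below is an assumption made for illustration: DemoWriter, DemoMemSink, and demo_emit_c are hypothetical stand-ins, not the real KitWriter or kit_writer_mem interfaces, whose layouts this plan does not specify. The sketch only shows the shape: the emitter streams to two caller-provided sinks and returns a status, and a caller that wants owned text plugs in a memory sink.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum DemoStatus { DEMO_OK = 0, DEMO_ERR_IO = 1 } DemoStatus;

/* Hypothetical writer: a write callback plus opaque user data. */
typedef struct DemoWriter {
  DemoStatus (*write)(void* user, const char* data, size_t len);
  void* user;
} DemoWriter;

/* Fixed-capacity memory sink standing in for an in-memory writer. */
typedef struct DemoMemSink {
  char buf[128];
  size_t len;
} DemoMemSink;

static DemoStatus demo_mem_write(void* user, const char* data, size_t len) {
  DemoMemSink* s = (DemoMemSink*)user;
  if (len > sizeof s->buf - 1 - s->len) return DEMO_ERR_IO; /* would overflow */
  memcpy(s->buf + s->len, data, len);
  s->len += len;
  s->buf[s->len] = '\0';
  return DEMO_OK;
}

static DemoStatus demo_puts(DemoWriter* w, const char* text) {
  assert(w != NULL && w->write != NULL);
  return w->write(w->user, text, strlen(text));
}

/* Emitter in the style of the proposed emit call: two writers in, status out. */
static DemoStatus demo_emit_c(const char* prefix, DemoWriter* header,
                              DemoWriter* source) {
  DemoStatus st;
  if ((st = demo_puts(header, "/* declarations for ")) != DEMO_OK) return st;
  if ((st = demo_puts(header, prefix)) != DEMO_OK) return st;
  if ((st = demo_puts(header, " */\n")) != DEMO_OK) return st;
  if ((st = demo_puts(source, "/* tables for ")) != DEMO_OK) return st;
  if ((st = demo_puts(source, prefix)) != DEMO_OK) return st;
  return demo_puts(source, " */\n");
}
```

The emitter never learns whether a sink is a file, a pipe, or memory, which is what keeps file I/O in the driver layer.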

Parser Runtime API

Standalone name	Kit name
`ll_tok_kind`	`KitGramTokenKind`
`ll_rule_id`	`KitGramRuleId`
`ll_sem`	`KitGramSem`
`ll_token`	`KitGramToken`
`ll_error`	`KitGramParseError`
`ll_err_action`	`KitGramErrorAction`
`LL_ABORT`	`KIT_GRAM_ERROR_ABORT`
`LL_SKIP`	`KIT_GRAM_ERROR_SKIP`
`LL_RESYNC`	`KIT_GRAM_ERROR_RESYNC`
`ll_actions`	`KitGramActions`
`ll_slot`	`KitGramSlot`
`LL_SLOT_SIZE`	`KIT_GRAM_SLOT_SIZE`
`ll_config`	`KitGramParserConfig`
`ll_grammar`	`KitGramGrammar`
`ll_parser`	`KitGramParser`
`LL_PARSER_SIZE`	`KIT_GRAM_PARSER_SIZE`
`ll_status`	`KitGramParseStatus`
`LL_NEED_MORE`	`KIT_GRAM_PARSE_NEED_MORE`
`LL_PARSE_ACCEPT`	`KIT_GRAM_PARSE_ACCEPT`
`LL_PARSE_ERROR`	`KIT_GRAM_PARSE_ERROR`
`ll_parser_init`	`kit_gram_parser_init`
`ll_parser_push`	`kit_gram_parser_push`
`ll_parser_finish`	`kit_gram_parser_finish`
`ll_parser_result`	`kit_gram_parser_result`
`ll_stack_bounds`	`kit_gram_stack_bounds`
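
The init/push/finish shape and the NEED_MORE/ACCEPT/ERROR statuses can be illustrated with a toy push parser. This is a sketch under stated assumptions: the Demo names are hypothetical, the grammar is just balanced ( ) and [ ] pairs, and the real kit_gram_parser_* signatures may differ. What it does show is the allocation-free contract: the caller supplies the stack storage and the parser fails cleanly when that bound is exceeded.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoParseStatus {
  DEMO_PARSE_NEED_MORE,
  DEMO_PARSE_ACCEPT,
  DEMO_PARSE_ERROR
} DemoParseStatus;

typedef struct DemoParser {
  char* stack;  /* caller-provided storage holding expected closers */
  size_t cap;
  size_t depth;
  int failed;   /* sticky: once set, every later call reports an error */
} DemoParser;

static void demo_parser_init(DemoParser* p, char* storage, size_t cap) {
  assert(p != NULL && (storage != NULL || cap == 0));
  p->stack = storage;
  p->cap = cap;
  p->depth = 0;
  p->failed = 0;
}

static DemoParseStatus demo_fail(DemoParser* p) {
  p->failed = 1;
  return DEMO_PARSE_ERROR;
}

/* Push one token; the parser never allocates and never reads ahead. */
static DemoParseStatus demo_parser_push(DemoParser* p, char tok) {
  if (p->failed) return DEMO_PARSE_ERROR;
  if (tok == '(' || tok == '[') {
    if (p->depth == p->cap) return demo_fail(p); /* stack bound exceeded */
    p->stack[p->depth++] = (tok == '(') ? ')' : ']';
    return DEMO_PARSE_NEED_MORE;
  }
  /* Anything else must match the closer on top of the stack. */
  if (p->depth == 0 || p->stack[p->depth - 1] != tok) return demo_fail(p);
  p->depth--;
  return DEMO_PARSE_NEED_MORE;
}

/* End of input: accept only if nothing is left open. */
static DemoParseStatus demo_parser_finish(DemoParser* p) {
  if (p->failed || p->depth != 0) return DEMO_PARSE_ERROR;
  return DEMO_PARSE_ACCEPT;
}
```

Because acceptance is decided only in the finish call, push can keep returning NEED_MORE for any valid prefix, which is what lets the caller feed tokens as they arrive.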

Lexer Runtime API

Standalone name	Kit name
`ll_lex_grammar`	`KitGramLexGrammar`
`ll_lexer`	`KitGramLexer`
`LL_LEXER_SIZE`	`KIT_GRAM_LEXER_SIZE`
`ll_lex_config`	`KitGramLexConfig`
`ll_lex_status`	`KitGramLexStatus`
`LL_LEX_TOKEN`	`KIT_GRAM_LEX_TOKEN`
`LL_LEX_NEED_MORE`	`KIT_GRAM_LEX_NEED_MORE`
`LL_LEX_EOF`	`KIT_GRAM_LEX_EOF`
`LL_LEX_ERROR`	`KIT_GRAM_LEX_ERROR`
`ll_lex_error`	`KitGramLexError`
`ll_lexer_init`	`kit_gram_lexer_init`
`ll_lexer_push`	`kit_gram_lexer_push`
`ll_lexer_finish`	`kit_gram_lexer_finish`
`ll_lexer_next`	`kit_gram_lexer_next`
`ll_lexer_error`	`kit_gram_lexer_error`
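
The four lexer statuses make the most sense when a token straddles two input chunks. The sketch below is hypothetical throughout: the Demo names, the integer-and-plus token set, and the call order are assumptions for illustration, and the real runtime's buffering rules may differ. It carries a partially scanned integer in lexer state instead of buffering text, so no allocation is needed.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoLexStatus {
  DEMO_LEX_TOKEN, DEMO_LEX_NEED_MORE, DEMO_LEX_EOF, DEMO_LEX_ERROR
} DemoLexStatus;
typedef enum DemoTokKind { DEMO_TOK_INT, DEMO_TOK_PLUS } DemoTokKind;
typedef struct DemoToken { DemoTokKind kind; unsigned long value; } DemoToken;

typedef struct DemoLexer {
  const char* data;  /* current chunk, borrowed from the caller */
  size_t len, pos;
  int finished;      /* set once the caller signals end of input */
  int in_int;        /* an integer is partially scanned */
  unsigned long acc; /* value of that integer so far */
} DemoLexer;

static void demo_lexer_init(DemoLexer* lx) {
  lx->data = NULL; lx->len = 0; lx->pos = 0;
  lx->finished = 0; lx->in_int = 0; lx->acc = 0;
}

static void demo_lexer_push(DemoLexer* lx, const char* data, size_t len) {
  assert(lx->pos == lx->len); /* previous chunk must be fully consumed */
  lx->data = data; lx->len = len; lx->pos = 0;
}

static void demo_lexer_finish(DemoLexer* lx) { lx->finished = 1; }

static DemoLexStatus demo_emit_int(DemoLexer* lx, DemoToken* out) {
  out->kind = DEMO_TOK_INT;
  out->value = lx->acc;
  lx->in_int = 0;
  lx->acc = 0;
  return DEMO_LEX_TOKEN;
}

static DemoLexStatus demo_lexer_next(DemoLexer* lx, DemoToken* out) {
  assert(out != NULL);
  while (lx->pos < lx->len) {
    char c = lx->data[lx->pos];
    if (c >= '0' && c <= '9') {
      lx->acc = lx->acc * 10 + (unsigned long)(c - '0'); /* overflow unchecked */
      lx->in_int = 1;
      lx->pos++;
      continue;
    }
    if (lx->in_int) return demo_emit_int(lx, out); /* c stays for the next call */
    lx->pos++;
    if (c == '+') { out->kind = DEMO_TOK_PLUS; out->value = 0; return DEMO_LEX_TOKEN; }
    if (c != ' ' && c != '\n') return DEMO_LEX_ERROR;
  }
  if (!lx->finished) return DEMO_LEX_NEED_MORE; /* token may continue */
  if (lx->in_int) return demo_emit_int(lx, out);
  return DEMO_LEX_EOF;
}
```

Pushing "12" and then "3+4" yields the single integer 123 followed by a plus token; the trailing 4 is only emitted after the finish call, because until then more digits could still arrive.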

Generated-Code Support API

The generated-code support headers should use Kit names too:

Standalone name	Kit name
`ll_sym_kind`	`KitGramSymKind`
`LL_S_TERM`	`KIT_GRAM_SYM_TERM`
`LL_S_RULE`	`KIT_GRAM_SYM_RULE`
`LL_S_REP`	`KIT_GRAM_SYM_REP`
`LL_S_OPT`	`KIT_GRAM_SYM_OPT`
`ll_sym`	`KitGramSym`
`ll_prod`	`KitGramProd`
`ll_pratt_op`	`KitGramPrattOp`
`ll_pratt`	`KitGramPratt`
`ll_rule`	`KitGramRule`
`LL_TERM`	`KIT_GRAM_TERM`
`LL_RULE`	`KIT_GRAM_RULE`
`ll_lex_accept`	`KitGramLexAccept`
`LL_LEX_DEAD`	`KIT_GRAM_LEX_DEAD`
`LL_LEX_ACCEPT_NONE`	`KIT_GRAM_LEX_ACCEPT_NONE`

Generated grammar-specific token/rule enums (TOK_*, R_*, and <prefix>_*) are user artifacts. They may keep their current shape because they are controlled by the generated prefix and are not libkit exports.

Include Rewrites

The generator should emit installed-style includes:

/* generated header */
#include <kit/gram_parse.h>
#include <kit/gram_lex.h>       /* only when a generated lexer exists */

/* generated source */
#include "generated_name.h"
#include <kit/support/gram_parse_tables.h>
#include <kit/support/gram_lex_tables.h> /* only when needed */

Private src/gram/*.c files include internal.h and the public headers they implement. They must not include driver headers.
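
The conditional include rules above can be sketched as a small emitter. DemoBuf and the demo_emit_* functions are hypothetical; the real emitter would write through KitWriter, and a fixed buffer is used here only to keep the sketch self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size text buffer standing in for a writer. */
typedef struct DemoBuf { char data[256]; size_t len; } DemoBuf;

static void demo_append(DemoBuf* b, const char* s) {
  size_t n = strlen(s);
  assert(n < sizeof b->data - b->len); /* room for text plus terminator */
  memcpy(b->data + b->len, s, n + 1);
  b->len += n;
}

/* Generated header: parser runtime always, lexer runtime only if needed. */
static void demo_emit_header_includes(DemoBuf* b, int has_lexer) {
  demo_append(b, "#include <kit/gram_parse.h>\n");
  if (has_lexer) demo_append(b, "#include <kit/gram_lex.h>\n");
}

/* Generated source: its own header first, then the table-layout headers. */
static void demo_emit_source_includes(DemoBuf* b, const char* header_name,
                                      int has_lexer) {
  demo_append(b, "#include \"");
  demo_append(b, header_name);
  demo_append(b, "\"\n");
  demo_append(b, "#include <kit/support/gram_parse_tables.h>\n");
  if (has_lexer) demo_append(b, "#include <kit/support/gram_lex_tables.h>\n");
}
```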

Driver Command

The user-facing command is:

kit gram [--dump-sexpr] [--prefix PREFIX] [-o OUT.c] [--header OUT.h] grammar.ebnf

The first import should preserve existing behavior:

Default .c output path: replace the input suffix with .c.
Default .h output path: replace the input suffix with .h.
Default prefix: derive from the input basename and append _.
--dump-sexpr: write the meta-grammar dump to stdout.
Exit code 0: success.
Exit code 1: compile, diagnostic, or I/O failure.
Exit code 2: bad command-line usage.
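
The default-path and default-prefix rules can be sketched as two string helpers. The function names are hypothetical, and the exact ll1 behavior is an assumption here: in particular, this sketch does not sanitize basename characters that are invalid in C identifiers, which the existing CLI may or may not do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the start of the final path component. */
static const char* demo_basename(const char* path) {
  const char* slash = strrchr(path, '/');
  return slash ? slash + 1 : path;
}

/* Length of path up to, but excluding, the suffix of its basename.
   A leading dot (".hidden") is not treated as a suffix. */
static size_t demo_stem_len(const char* path) {
  const char* base = demo_basename(path);
  const char* dot = strrchr(base, '.');
  return (dot && dot != base) ? (size_t)(dot - path) : strlen(path);
}

/* Writes path with its suffix replaced by new_suffix; -1 if out is too small. */
static int demo_replace_suffix(const char* path, const char* new_suffix,
                               char* out, size_t cap) {
  size_t stem = demo_stem_len(path);
  size_t n = strlen(new_suffix);
  assert(out != NULL);
  if (stem + n + 1 > cap) return -1;
  memcpy(out, path, stem);
  memcpy(out + stem, new_suffix, n + 1);
  return 0;
}

/* Writes the basename without its suffix, followed by '_'. */
static int demo_default_prefix(const char* path, char* out, size_t cap) {
  const char* base = demo_basename(path);
  size_t stem = demo_stem_len(base);
  assert(out != NULL);
  if (stem + 2 > cap) return -1;
  memcpy(out, base, stem);
  out[stem] = '_';
  out[stem + 1] = '\0';
  return 0;
}
```

Under these assumptions, grammars/calc.ebnf maps to grammars/calc.c, grammars/calc.h, and the prefix calc_.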

The driver implementation should use DriverEnv for KitContext, file reads, writer opening, diagnostics, and memory. The command should not call malloc, free, fopen, fprintf, or exit directly.

Driver integration points:

Add KIT_TOOL_GRAM_ENABLED to include/kit/config.h.
Add driver/cmd/gram.c.
Add driver_gram and driver_help_gram to driver/driver.h.
Add the gram row to driver/main.c, gated by KIT_TOOL_GRAM_ENABLED.
Add $(call tool-cmd,GRAM,gram) to mk/driver_srcs.mk.
Keep it in DRIVER_GROUP_OTHER for now. It is a developer tool, not a default drop-in binutils/toolchain symlink.
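
A gated command-table row might look like the sketch below. The table layout, DemoCmd, and demo_driver_gram are hypothetical; the real driver/main.c table and the driver_gram signature are not described in this plan. Only the KIT_TOOL_GRAM_ENABLED macro name comes from the plan, and its default of 1 here is an assumption for the sketch.

```c
#include <stddef.h>
#include <string.h>

#ifndef KIT_TOOL_GRAM_ENABLED
#define KIT_TOOL_GRAM_ENABLED 1 /* assumed default for this sketch */
#endif

typedef int (*DemoCmdFn)(int argc, char** argv);

typedef struct DemoCmd {
  const char* name;
  DemoCmdFn run;
  const char* help;
} DemoCmd;

#if KIT_TOOL_GRAM_ENABLED
/* Placeholder command body: only checks that a grammar path was given. */
static int demo_driver_gram(int argc, char** argv) {
  (void)argv;
  return argc < 2 ? 2 : 0; /* exit code 2: bad command-line usage */
}
#endif

static const DemoCmd demo_cmds[] = {
#if KIT_TOOL_GRAM_ENABLED
  { "gram", demo_driver_gram, "generate parser and lexer tables from EBNF" },
#endif
  { NULL, NULL, NULL } /* sentinel */
};

/* Linear lookup; a gated-out command is simply absent from the table. */
static const DemoCmd* demo_find_cmd(const char* name) {
  for (const DemoCmd* c = demo_cmds; c->name != NULL; c++) {
    if (strcmp(c->name, name) == 0) return c;
  }
  return NULL;
}
```

With the gate set to 0, the row and the command body both disappear, so the tool adds nothing to a driver built without it.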

Build Integration

Library integration points:

Add KIT_GRAM_ENABLED to include/kit/config.h as an optional library subsystem.
Add LIB_SRCS_GRAM := $(shell find src/gram -name '*.c' ...) to mk/lib_srcs.mk, and include it only when KIT_GRAM_ENABLED is 1.
Add weak public stubs in src/api/config_stubs.c for gated-out generator API entry points. Runtime stubs may be omitted if the runtime is considered part of the generated-code ABI and the whole subsystem is always enabled in the default build; if it is gated, public runtime symbols need stubs too.
Keep src/gram/meta_tables.c checked in. Normal make lib must not require Python, network access, UCD regeneration, or a bootstrap gram binary.
Add regeneration-only maintenance targets later, for example make regen-gram-meta and make regen-gram-unicode-props.
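
A weak stub for a gated-out entry point could be sketched as follows. The names, the simplified signature, and DEMO_ERR_UNSUPPORTED are hypothetical; the real stubs would mirror the kit_gram_* prototypes and whatever status libkit already uses for disabled subsystems. The weak attribute is a GCC/Clang extension, and the gate defaults to 0 here only so the stub path is the one compiled.

```c
#include <stddef.h>

#ifndef KIT_GRAM_ENABLED
#define KIT_GRAM_ENABLED 0 /* sketch assumption: subsystem gated out */
#endif

typedef enum DemoStatus { DEMO_OK = 0, DEMO_ERR_UNSUPPORTED = 1 } DemoStatus;
typedef struct DemoCompiled DemoCompiled; /* opaque, never defined here */

#if defined(__GNUC__) || defined(__clang__)
#define DEMO_WEAK __attribute__((weak))
#else
#define DEMO_WEAK /* no weak linkage available; plain definition */
#endif

#if !KIT_GRAM_ENABLED
/* Stub: keeps the symbol linkable but reports that the feature is absent. */
DEMO_WEAK DemoStatus demo_gram_compile_text(const char* text, size_t len,
                                            DemoCompiled** out) {
  (void)text;
  (void)len;
  if (out != NULL) *out = NULL; /* never leave the out-pointer dangling */
  return DEMO_ERR_UNSUPPORTED;
}

/* Freeing is a harmless no-op since no object can have been created. */
DEMO_WEAK void demo_gram_free(DemoCompiled* compiled) { (void)compiled; }
#endif
```

Weak linkage lets a real implementation override the stub without a duplicate-symbol error when the subsystem is built.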

The first import can gate runtime and generator together under KIT_GRAM_ENABLED. If size-sensitive embeddings need generated parser runtime without the generator compiler, split later into:

KIT_GRAM_RUNTIME_ENABLED: parser/lexer runtime plus support headers.
KIT_GRAM_ENABLED: generator compiler, depending on runtime.

Do not introduce that split until there is a real embedding that benefits from it; it adds config and stub surface.

Implementation Phases

Phase 0: Freeze Standalone Behavior

Run the current ll1 test suite and save the passing command set in the import notes.
Generate and check in meta_tables.c / meta_tables.h from meta.ebnf.
Confirm generated output for representative grammars is stable.

Phase 1: Mechanical Import, Private Names Still Allowed Internally

Move files into the destinations above.
Keep behavior unchanged while fixing include paths.
Add test/gram fixtures and a make test-gram target, initially allowed to fail until the namespace rewrite lands.

Phase 2: Public Namespace Rewrite

Rename every public ll_*, llgen_*, and LL_* symbol to KitGram, kit_gram_, or KIT_GRAM_ spelling.
Update emitted C and generated meta tables to use the new names.
Run make test-lib-deps to catch leaked public symbols that fall outside the Kit, kit_, or KIT_ prefixes.

Phase 3: Kit Context Rewrite

Replace llgen_allocator with KitContext.heap.
Replace generated text return structs with KitWriter outputs.
Replace direct diagnostics with KitContext.diag.
Remove hosted libc calls from imported library code.
Keep setjmp/longjmp only if the existing frontend panic pattern accepts it for this subsystem; otherwise convert OOM and validation aborts to explicit KitStatus unwinding.
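
If the longjmp path is dropped, explicit status unwinding could look like the sketch below. All names are hypothetical: DemoStatus stands in for KitStatus, and the int-slot bump arena stands in for KitContext.heap. The point is only the control flow: each failing step returns its status, and the outermost entry point rolls back partial allocations.

```c
#include <assert.h>
#include <stddef.h>

typedef enum DemoStatus {
  DEMO_OK = 0, DEMO_ERR_NOMEM, DEMO_ERR_INVALID
} DemoStatus;

/* Bump arena over caller-provided int slots; stands in for a heap. */
typedef struct DemoArena { int* base; size_t cap; size_t used; } DemoArena;

static DemoStatus demo_alloc(DemoArena* a, size_t n, int** out) {
  assert(a->used <= a->cap);
  if (n > a->cap - a->used) return DEMO_ERR_NOMEM;
  *out = a->base + a->used;
  a->used += n;
  return DEMO_OK;
}

/* Propagate the first failure to the caller instead of jumping. */
#define DEMO_TRY(expr) \
  do { DemoStatus st_ = (expr); if (st_ != DEMO_OK) return st_; } while (0)

static DemoStatus demo_build(DemoArena* a, size_t rule_count) {
  int* rules = NULL;
  int* first_sets = NULL;
  if (rule_count == 0) return DEMO_ERR_INVALID;     /* validation abort */
  DEMO_TRY(demo_alloc(a, rule_count, &rules));
  DEMO_TRY(demo_alloc(a, rule_count, &first_sets)); /* may report OOM */
  for (size_t i = 0; i < rule_count; i++) {
    rules[i] = (int)i;
    first_sets[i] = 0;
  }
  return DEMO_OK;
}

/* Entry point: on any failure, release whatever demo_build allocated. */
static DemoStatus demo_compile(DemoArena* a, size_t rule_count) {
  size_t mark = a->used;
  DemoStatus st = demo_build(a, rule_count);
  if (st != DEMO_OK) a->used = mark;
  return st;
}
```

The cost relative to longjmp is a status check at every call site; the benefit is that cleanup stays local and visible.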

Phase 4: Driver Command

Port gen/llgen_cli.c to driver/cmd/gram.c.
Use the public generator API only.
Add help text consistent with other driver commands.
Add a focused driver test: invoke kit gram on a small grammar, compile the generated C against libkit, and run the parser.

Phase 5: Cleanup and Documentation

Document the stable runtime API in the public headers.
Add a durable design doc under doc/ only after the subsystem ships.
Add gram to README.md and doc/DESIGN.md capability lists after the command works.
Remove temporary compatibility shims and any retained Python parity path.

Tests

Targeted tests should land with the import:

make test-gram: direct API compile, table introspection, generated parser, generated lexer, Pratt grammar, UTF-8 lexer, and error fixtures.
make test-driver-gram: CLI generation and generated-code compile/run smoke.
make test-lib-deps: symbol discipline and no accidental hosted dependencies.

Map standalone tests as follows:

Standalone test	Kit test
`test/test_llgen_api.c`	`test/gram/api_test.c`
`test/test_calc.c`	`test/gram/calc_test.c`
`test/test_features.c`	`test/gram/features_test.c`
`test/test_lexer.c`	`test/gram/lexer_test.c`
`test/test_pratt_calc.c`	`test/gram/pratt_test.c`
`test/test_unicode_support.c`	`test/gram/unicode_test.c`
`test/test_unicode_lexer.c`	`test/gram/unicode_lexer_test.c`
`test/test_utf8_runtime.c`	`test/gram/utf8_runtime_test.c`
`test/errors/*.ebnf`	`test/gram/errors/*.ebnf`

Prefer red-green import steps:

Add the test target and fixtures first.
Import the runtime until hand-written/generated tables parse again.
Import the generator until direct API tests pass.
Add the driver command and smoke test last.

Open Questions

Should generated table-layout headers be documented as stable ABI, or merely stable enough for C emitted by the same libkit version? The first import should promise only same-version compatibility.
Should Unicode UCD source data live in-tree permanently, or should only the generated property tables be checked in? Keeping the data improves reproducible regeneration but increases repository size.
Should the runtime/generator gate be split immediately? The plan says no until an embedding needs runtime-only size savings.
Should gram eventually support non-C output modes? The API should not bake in more than emit_c today, but the command can grow --emit= later.
