Deterministic Generation
Claudux treats documentation structure as source-owned state. The model can propose wording changes, but the repo owns the page tree, required sections, pinned headings, deletion policy, and source-to-doc mapping.
Why Large Repos Need a Manifest
Large repos rarely fail because the model cannot write a page. They fail because the model rewrites the tree around whatever it noticed last. A checkout route changes and the docs tool edits a pricing page, drops an E2E harness section, or reorders navigation because the prompt called the structure a preference.
docs-structure.json turns structure into a checked-in contract:
- Page IDs and paths are stable.
- Navigation order is explicit.
- Source-owned pages declare the files that make them stale.
- Required sections declare the headings that must survive.
- Pinned sections are not deleted or reparented by a model-only run.
- Deletion policy is reviewed as a manifest diff, not inferred from prose.
The manifest is intentionally separate from claudux.md. claudux.md describes site taste. docs-structure.json is operational state. If a legacy docs-map.md also exists, claudux treats it as supplemental guidance; the manifest remains the binding authority.
Pipeline
The deterministic pipeline is:
- Validate docs-structure.json before generation.
- Build .claudux/index/static-analysis.json from tracked source files, docs files, package scripts, markdown headings, and manifest ownership.
- Capture a guard snapshot for pinned heading order, pinned/read-only section body hashes, and protected skip-marker blocks.
- Add the static index summary to the model prompt as authoritative facts.
- Use .claudux-state.json to find changed files since the previous run.
- Resolve changed files through manifest source_patterns to the impacted page or section set and write .claudux/index/impacted-docs.json.
- Ask the model for section patch JSON instead of direct documentation writes.
- Apply patches only to manifest-owned generated sections inside the impacted allowlist during incremental runs.
- Validate docs-structure.json again after generation.
- Validate the pre-generation guard snapshot and internal links.
- Rebuild the deterministic static index, impacted-doc allowlist, and guard snapshot against the final patched docs before saving the checkpoint.
Section Patch Application
When docs-structure.json exists, claudux removes direct documentation write authority from the model. The backend must return one marker-delimited JSON payload, and claudux extracts, validates, and applies that payload locally.
The extracted payload is staged at .claudux/index/section-patches.json by default. CLAUDUX_SECTION_PATCH_FILE can relocate it for tests, harnesses, or alternate scratch layouts.
format_section_patch_contract() prints the manifest-derived allowlist of patchable page_id#section_id targets and the separate read-only list before backend invocation. The manifest is the addressing surface. A generated section can be source-owned and still writable; read-only status comes from pinned: true or generated: false.
Extractor behavior
- The extractor scans raw output plus nested JSONL string fields named text, content, result, and message.
- Plain final responses and backend event streams use the same marker contract.
- Identical repeated payloads are deduplicated, including repeated marker pairs and echoed agent/result events.
- Conflicting repeated payloads, orphaned markers, end-before-start ordering, and invalid JSON fail the run.
- Fenced JSON is accepted, and a bare array is normalized to an object with a patches array.
- Turn-summary fields are ignored so truncated recap text cannot satisfy the contract.
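The marker rules above can be sketched in a few lines of Python. This is an illustration only, not claudux's extractor: the marker strings START and END are hypothetical (the real markers are not shown here), and the scan over nested JSONL fields is left out so the sketch covers only deduplication, conflict detection, orphaned or misordered markers, fenced JSON, and bare-array normalization.

```python
import json

# Hypothetical marker strings; the document does not show the real ones.
START = "<<<PATCHES_BEGIN>>>"
END = "<<<PATCHES_END>>>"
FENCE = "`" * 3  # built programmatically so this example stays fence-safe


def strip_fence(text: str) -> str:
    """Drop a surrounding Markdown code fence if the payload was fenced."""
    lines = text.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return "\n".join(lines)


def extract_patch_payload(raw: str) -> dict:
    """Return the one payload between markers, or raise ValueError."""
    bodies = []
    pos = 0
    while True:
        start = raw.find(START, pos)
        end = raw.find(END, pos)
        if start == -1 and end == -1:
            break
        if start == -1 or (end != -1 and end < start):
            raise ValueError("end marker without a preceding start marker")
        end = raw.find(END, start + len(START))
        if end == -1:
            raise ValueError("start marker without a matching end marker")
        bodies.append(raw[start + len(START):end])
        pos = end + len(END)
    if not bodies:
        raise ValueError("no marker-delimited payload found")

    parsed = []
    for body in bodies:
        value = json.loads(strip_fence(body))  # invalid JSON raises ValueError
        if isinstance(value, list):
            value = {"patches": value}  # normalize a bare array
        parsed.append(value)

    # Identical repeats collapse to one payload; any difference is a conflict.
    if any(other != parsed[0] for other in parsed[1:]):
        raise ValueError("conflicting repeated payloads")
    return parsed[0]
```

Comparing parsed values rather than raw text means two payloads that differ only in whitespace still count as identical, which is one reasonable reading of "identical repeated payloads".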
Patch application rules
- Every patch must resolve to one manifest page and one manifest section, and a batch cannot target the same section twice.
- body_markdown is the canonical body field. markdown and content remain compatibility aliases when reading older payloads.
- If the body repeats the section heading, claudux strips that heading before writing.
- A body can contain deeper subheadings and fenced examples, but same-level or higher headings outside fences are rejected.
- Missing on-disk headings fail unless create_if_missing: true is set.
- Pinned sections and generated: false sections require both CLAUDUX_UNLOCK_PINNED_SECTIONS=1 and per-patch unlock_pinned: true.
- Incremental runs enforce the impacted-doc allowlist; full scans can touch any non-pinned generated section in the manifest.
- Transient cache provenance is rejected from patch bodies. The prompt can use run-specific facts for scope, but committed prose must describe stable behavior.
- Validation is all-or-nothing. One invalid, duplicate, read-only, out-of-scope, provenance-leaking, or boundary-escaping patch leaves every file unchanged.
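The heading rules in that list (strip a repeated section heading, allow deeper subheadings and fenced examples, reject same-level or higher headings outside fences) can be sketched as follows. The function name normalize_body and its parameters are hypothetical, and the sketch assumes ATX-style "#" headings and backtick fences only.

```python
import re

FENCE = "`" * 3  # built programmatically so this example stays fence-safe
HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")


def normalize_body(body: str, level: int, heading_text: str) -> str:
    """Strip a repeated section heading, then reject boundary-escaping headings."""
    lines = body.strip("\n").splitlines()

    # Drop the section's own heading if the body repeats it on the first line.
    if lines:
        first = HEADING.match(lines[0])
        if first and len(first.group(1)) == level and first.group(2) == heading_text:
            lines = lines[1:]

    in_fence = False
    for line in lines:
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # every fence line toggles the state
            continue
        match = HEADING.match(line)
        # Fewer or equal "#" characters means a same-level or higher heading.
        if match and not in_fence and len(match.group(1)) <= level:
            raise ValueError(f"heading escapes the section boundary: {line!r}")
    return "\n".join(lines).strip("\n")
```

A caller would run this check for every patch in the batch before writing anything, so a single rejected body leaves all files unchanged.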
After patches land, update() runs post-generation manifest validation, validates the pre-generation guard snapshot, runs link validation, refreshes deterministic caches, and only then saves the checkpoint. If extraction or application fails, retain_generation_debug_log() keeps the backend JSONL log available for inspection while the docs tree stays untouched.
Patch mode constrains filesystem authority, not provider compatibility. Claude is limited to Read. Codex keeps approval_policy set to never and defaults to a read-only sandbox in section-patch mode unless CODEX_SANDBOX_MODE overrides it.
Static Analysis Index
The static index is deterministic cache state written to .claudux/index/static-analysis.json by default. CLAUDUX_INDEX_DIR or CLAUDUX_STATIC_INDEX_FILE can relocate it, and prompt construction reads the resolved path.
build_static_analysis_index() rebuilds the index from tracked files on every deterministic run. Markdown under docs/ is indexed as documentation. Tracked non-doc files outside docs/, .claudux/, and nested node_modules/ paths are indexed as sources.
Recorded facts
The index stores structured facts for deterministic scoping:
- Manifest path, manifest hash, page ownership, and section ownership metadata.
- Package scripts from package.json.
- CLI tokens parsed from Bash case labels in bin/claudux.
- Exported shell functions and tracked test files.
- Dependency edges from shell source and . statements, REQUIRED_LIBS, the conditional Codex adapter source, and repo-file references inside package scripts.
- Source and docs hashes plus markdown heading inventories.
- Internal markdown docs links.
- Protected skip-marker blocks with marker text, line spans, and hashes.
- Manifest page and section source ownership.
These are cache records, not documentation copy. The prompt can use them to scope a generation pass, but committed prose should avoid run-specific cache values.
Prompt summary
format_static_analysis_index_context() projects the index into a compact authoritative prompt summary before model output. That summary tells the model which scripts, command tokens, tests, ownership mappings, and preservation rules are current for the run.
Manifest mode still requires bounded section patch JSON. The static index can narrow what the model should consider, but it does not grant direct write access and it does not override the manifest allowlist.
Byte-stable caches
static-analysis.json, docs-guard-snapshot.json, and impacted-docs.json omit wall-clock timestamps and are written from reproducible inputs where the source graph permits it.
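As an illustration of what byte-stable means in practice, the sketch below serializes a cache record with sorted keys and no timestamp, so the same inputs always yield the same bytes and hash. It is a generic Python example under that assumption, not claudux's writer, and the record fields are made up.

```python
import hashlib
import json


def stable_json(data: dict) -> str:
    """Serialize with sorted keys and fixed formatting so output is reproducible."""
    return json.dumps(data, sort_keys=True, indent=2, ensure_ascii=False) + "\n"


def content_hash(data: dict) -> str:
    """Hash the stable serialization; equal inputs give equal digests."""
    return hashlib.sha256(stable_json(data).encode("utf-8")).hexdigest()


# Two records with the same facts in a different key order (illustrative fields).
first = {"manifest_hash": "abc", "pages": ["guide", "technical"]}
second = {"pages": ["guide", "technical"], "manifest_hash": "abc"}
print(content_hash(first) == content_hash(second))
```

Adding a wall-clock field such as a generation time to either record would change the digest on every run, which is why that kind of value belongs in the checkpoint rather than in the cache files.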
After section patches land and post-generation checks run, update() calls refresh_deterministic_generation_caches() before save_claudux_state(). That refresh rebuilds the static index against the final docs tree, recomputes the impact allowlist when the run has an incremental changed-file list, and captures a fresh guard snapshot.
Boundary of the index
The static index is authoritative for source ownership, command existence, dependency expansion, prompt scoping, docs-link inventories, and protected-block inventories. It is not a provider compatibility check and it is not the VitePress route validator. Backend model availability is checked at runtime, and nav/sidebar targets are validated by lib/validate-links.sh.
docs-structure.json Manifest
docs-structure.json is the default checked-in manifest, but claudux resolves the active manifest through docs_structure_path(). Advanced runs and tests can override the path with CLAUDUX_DOCS_STRUCTURE or DOCS_STRUCTURE_FILE without changing the repo default.
The manifest is the operational contract for docs structure. claudux.md can influence taste, but the manifest owns patch addresses, navigation targets, required headings, source ownership, and deletion authority. When both docs-structure.json and docs-map.md exist, prompt construction treats the manifest as primary and keeps the docs map as supplemental legacy guidance.
Manifest schema rules
Preflight validation enforces the structural contract before any backend runs:
- Root deletion_policy must be manifest_pages_require_manifest_change.
- Root generated_sections_default must be bounded_patch.
- Each page deletion_policy must be never_delete_without_manifest_change.
- Page paths must be repo-relative markdown paths under docs/.
- Page IDs, page paths, page order values, navigation IDs, and navigation order values must be unique.
- Navigation titles must be non-empty, and navigation links must be root-relative docs links that resolve to manifest pages.
- Page IDs, section IDs, navigation IDs, and page nav_group values must match [a-z0-9][a-z0-9._-]*.
- Section IDs must be unique within a page, and a page cannot declare the same heading level plus heading text twice.
- source_patterns must be repo-root relative; absolute paths, drive prefixes, empty strings, non-string entries, and parent traversal are rejected before impact mapping.
- Authority fields such as pinned, generated, and required must be real JSON booleans.
- A section is required by default unless it explicitly sets required: false.
- generated: false marks a section read-only even when it is not pinned.
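A few of these rules are easy to picture as code. The sketch below checks one section entry for a valid ID, real boolean authority fields, and acceptable source_patterns. It is a simplified illustration: the "id" key name and the error strings are assumptions, and the actual preflight covers more than is shown here.

```python
import re
from typing import List, Optional

ID_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*")
DRIVE_PREFIX = re.compile(r"[A-Za-z]:")


def check_source_pattern(pattern: object) -> Optional[str]:
    """Return a problem description, or None when the pattern is acceptable."""
    if not isinstance(pattern, str):
        return "non-string entry"
    if pattern == "":
        return "empty string"
    if pattern.startswith(("/", "\\")) or DRIVE_PREFIX.match(pattern):
        return "absolute path or drive prefix"
    if ".." in pattern.replace("\\", "/").split("/"):
        return "parent traversal"
    return None


def check_section(section: dict) -> List[str]:
    """Collect schema errors for one section entry ("id" key is an assumption)."""
    errors = []
    if not ID_PATTERN.fullmatch(str(section.get("id", ""))):
        errors.append("invalid section id")
    for field in ("pinned", "generated", "required"):
        # The string "true" or the number 1 is not a real JSON boolean.
        if field in section and not isinstance(section[field], bool):
            errors.append(f"{field} must be a JSON boolean")
    for pattern in section.get("source_patterns", []):
        problem = check_source_pattern(pattern)
        if problem:
            errors.append(f"source_patterns: {problem}")
    return errors


def is_required(section: dict) -> bool:
    """Sections are required unless they explicitly set required to false."""
    return section.get("required", True) is not False
```

Rejecting string and numeric stand-ins for booleans matters because a truthy "false" string would otherwise flip an authority field without anyone noticing in review.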
Post-generation rules
Post-generation validation adds on-disk checks:
- Every manifest page must exist.
- Required headings must exist in their declared page.
- Manifest-declared heading anchors must not be duplicated on disk.
- The manifest must declare pinned doctrine so the guard snapshot has read-only content to preserve.
These rules keep structure changes reviewable as manifest diffs instead of letting a model invent patch keys, nav targets, order values, deletion behavior, or ambiguous section addresses from prose.
Pinned Pages and Sections
Pinned is the write barrier. Required is the existence barrier.
During patch application:
- Ordinary generated sections can be rewritten when they are inside the current impact allowlist.
- Sections with pinned: true are read-only by default.
- Sections with generated: false are read-only by the same guard, even if they are not pinned.
- Section-level source_patterns affect incremental ownership and allowlist scope, but they do not make a section read-only.
- A generated section can be source-owned and still patchable.
During guard validation, claudux tracks every pinned section plus every section that is still required:
- Pinned and required headings must still exist on disk after generation.
- The captured sequence must stay in manifest order within the page.
- Only read-only section bodies are hash-locked; editable generated sections can change as long as they stay inside their declared boundary.
- required: false opts a non-pinned section out of the existence and order guard, but it does not make a generated: false section writable.
- Manifest-owned pages themselves must remain present on disk.
An intentional pinned rewrite needs two signals in the same run: CLAUDUX_UNLOCK_PINNED_SECTIONS=1 in the environment and unlock_pinned: true on the individual patch. That keeps model-only runs from silently editing doctrine.
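The two-signal rule reduces to a small predicate. The sketch below is a hypothetical Python rendering (the function names are not from claudux); it takes the environment as a parameter so the behaviour is easy to test.

```python
import os
from typing import Mapping, Optional


def is_read_only(section: dict) -> bool:
    """Read-only status comes from pinned: true or generated: false."""
    return section.get("pinned") is True or section.get("generated") is False


def may_write(section: dict, patch: dict, env: Optional[Mapping[str, str]] = None) -> bool:
    """Allow a read-only rewrite only when both unlock signals are present."""
    env = os.environ if env is None else env
    if not is_read_only(section):
        return True
    env_unlocked = env.get("CLAUDUX_UNLOCK_PINNED_SECTIONS") == "1"
    patch_unlocked = patch.get("unlock_pinned") is True
    return env_unlocked and patch_unlocked
```

Requiring both signals means a model cannot unlock doctrine by adding unlock_pinned to its own patch, and an operator who sets the environment variable still has to opt in section by section.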
Page deletion is guarded separately from section editing. With a manifest present, the internal cleanup helper refuses manifest-owned deletion unless CLAUDUX_ALLOW_MANIFEST_CLEANUP=1 is set, and claudux recreate refuses deletion unless CLAUDUX_ALLOW_MANIFEST_RECREATE=1 is set. The public CLI exposes recreate, not a standalone cleanup subcommand.
recreate checks the deletion guard before backend validation. That ordering keeps a missing backend, auth failure, or unsupported model from masking the more important fact that a manifest-owned docs tree would be deleted.
Content Protection Markers
lib/content-protection.sh chooses literal marker pairs by file extension, and the deterministic helpers in lib/docs-manifest.sh mirror the same pairs when they index and guard protected blocks:
- Markdown, HTML, XML, and Vue use <!-- skip --> and <!-- /skip -->.
- JavaScript, TypeScript, Swift, Java, C-family, Rust, and Go use // skip and // /skip.
- Python, shell, Ruby, and Perl use # skip and # /skip.
- CSS-family files use /* skip */ and /* /skip */.
- SQL uses -- skip and -- /skip.
- Unknown extensions fall back to the hash-comment form.
Matching is trimmed, line-based, and literal. Indented markers still count, and regex-looking markers such as the CSS pair are treated as exact text rather than patterns.
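To make the matching rule concrete, here is a minimal Python sketch that picks a marker pair by extension and records protected blocks with line spans and hashes. The extension table is an illustrative subset, the record keys are hypothetical, and hashing the block including its marker lines is an assumption rather than documented behaviour.

```python
import hashlib
from pathlib import PurePosixPath

MARKERS = {
    "html": ("<!-- skip -->", "<!-- /skip -->"),
    "slash": ("// skip", "// /skip"),
    "hash": ("# skip", "# /skip"),
    "css": ("/* skip */", "/* /skip */"),
    "sql": ("-- skip", "-- /skip"),
}
# Illustrative subset of extensions; the real table may differ.
EXTENSIONS = {
    ".md": "html", ".html": "html", ".xml": "html", ".vue": "html",
    ".js": "slash", ".ts": "slash", ".go": "slash", ".rs": "slash",
    ".py": "hash", ".sh": "hash", ".rb": "hash", ".pl": "hash",
    ".css": "css", ".sql": "sql",
}


def marker_pair(path: str) -> tuple:
    """Unknown extensions fall back to the hash-comment form."""
    suffix = PurePosixPath(path).suffix.lower()
    return MARKERS[EXTENSIONS.get(suffix, "hash")]


def protected_blocks(path: str, text: str) -> list:
    """Find marker-delimited blocks using trimmed, literal, line-based matching."""
    start, end = marker_pair(path)
    lines = text.splitlines()
    blocks, open_line = [], None
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()  # indented markers still count
        if stripped == start and open_line is None:
            open_line = number
        elif stripped == end and open_line is not None:
            body = "\n".join(lines[open_line - 1:number])
            digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
            blocks.append({"start_line": open_line, "end_line": number, "hash": digest})
            open_line = None
    return blocks
```

Because the comparison is plain string equality on the trimmed line, the CSS pair needs no escaping even though it contains characters that would be special in a regular expression.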
The deterministic path uses those boundaries in two enforced places:
- build_static_analysis_index() records protected blocks across tracked project files with marker text, line spans, and hashes.
- The guard snapshot captures recorded protected blocks and later rejects runs that remove a block or change a recorded block hash.
strip_protected_content() is still shipped as a utility helper and is covered by tests/test-content-protection.sh, but the manifest pipeline preservation guarantee comes from indexed block facts plus guard validation, not from a pre-prompt stripping pass.
Protected-block preservation is not limited to markdown docs. Any tracked file with a recognized marker pair can participate in the guard, which keeps protected code snippets, fixture notes, and top-level project files stable during deterministic runs.
Dependency-Aware Scope
Incremental mode starts from claudux_diff_since_last(). That function unions the committed diff from last_sha..HEAD with dirty documentation and configuration files reported by claudux_docs_worktree_changes().
Dirty docs and config files
Dirty freshness signals are limited to files that can affect generated documentation state before they are committed:
- docs/
- docs-structure.json
- docs-map.md
- .ai-docs-style.md
- docs-site-plan.json
For those pathspecs, claudux includes unstaged changes, staged changes, and untracked files. This closes the dogfood gap where a section patch updates tracked docs while HEAD still matches the saved checkpoint. claudux diff shows that dirty docs/config state, and claudux status warns about it even when the checkpoint commit is otherwise current.
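The union can be pictured as a pure function over two path lists. In this sketch the helper names are hypothetical and both lists are assumed to have been collected from Git already; it only shows how dirty paths are filtered to the documentation and configuration pathspecs before being merged with the committed diff.

```python
from typing import Iterable, List

# Root-level files that count as dirty freshness signals, per the list above.
DIRTY_FILES = {
    "docs-structure.json",
    "docs-map.md",
    ".ai-docs-style.md",
    "docs-site-plan.json",
}


def is_freshness_path(path: str) -> bool:
    """True for files under docs/ or one of the root-level config files."""
    return path.startswith("docs/") or path in DIRTY_FILES


def changed_since_checkpoint(committed: Iterable[str], worktree: Iterable[str]) -> List[str]:
    """Union the committed diff with dirty docs/config paths, sorted for stability."""
    dirty = {path for path in worktree if is_freshness_path(path)}
    return sorted(set(committed) | dirty)
```

Dirty source files outside those pathspecs are deliberately ignored here: they become visible to the incremental run once they are committed.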
Incremental allowlist
After the changed-file list is built, resolve_impacted_docs_from_changed_files() expands scope through manifest ownership and reverse dependency edges from the static index. The expansion is intentionally upstream: if a sourced library changes, the router that sources it is pulled into scope. That matters for pages that own the router directly but not every library it loads.
Dependency edges come from more than shell source statements:
- Shell-like files contribute source and . relationships.
- bin/claudux contributes explicit edges for files in REQUIRED_LIBS plus the conditional Codex adapter source.
- package.json scripts contribute edges when they reference repo files under bin/, lib/, tests/, or scripts/.
resolve_impacted_docs_from_changed_files() writes .claudux/index/impacted-docs.json by default, or the path from CLAUDUX_IMPACT_ALLOWLIST_FILE. The allowlist records the changed files, dependency-expanded files, dependency notes, impacted pages, and impacted sections. Patch mode then uses that file as the incremental write boundary:
- A section with its own source_patterns must be directly impacted to be patchable in an incremental run.
- A generated section without its own ownership can be patched when its page is impacted.
- Full scans skip the allowlist and can touch any non-pinned generated section in the manifest.
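Those three rules fit in one predicate. In the sketch below the allowlist keys impacted_pages and impacted_sections, the "id" key on a section, and the page_id#section_id target format inside the allowlist are assumptions for illustration; the pinned unlock path is left out.

```python
def patchable(page_id: str, section: dict, allowlist: dict, incremental: bool) -> bool:
    """Decide whether a section may be patched in this run (unlock path omitted)."""
    if section.get("pinned") is True or section.get("generated") is False:
        return False  # read-only sections are never patchable without an unlock
    if not incremental:
        return True  # full scans skip the allowlist
    if section.get("source_patterns"):
        # A section with its own ownership must be directly impacted.
        return f"{page_id}#{section['id']}" in allowlist["impacted_sections"]
    # A section without its own ownership follows its page.
    return page_id in allowlist["impacted_pages"]
```

The asymmetry is deliberate: a section that declares its own sources should not be rewritten just because a sibling section on the same page went stale.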
For incremental section-patch runs, refresh_deterministic_generation_caches() reruns impact resolution with the same changed-file list and allowlist path after patches and validation. The refreshed allowlist is cache state for the final run, not a stale pre-patch artifact.
Validators
Validation is layered rather than one big pass. claudux update validates the manifest before model invocation, builds the static index, captures the guard snapshot before generation, and then applies model output only through the manifest section-patch contract. After patches land, it re-runs post-generation manifest checks, validates the pre-generation guard snapshot, runs link validation, refreshes deterministic caches, and only then saves the checkpoint.
The guard snapshot lives at .claudux/index/docs-guard-snapshot.json by default. CLAUDUX_GUARD_SNAPSHOT_FILE can relocate it for test harnesses or alternate scratch layouts.
Manifest and guard validation
Manifest validation covers contract correctness:
- JSON shape, unique page IDs, unique page paths, unique deterministic order values, and docs/*.md page paths.
- Stable manifest keys for navigation IDs, page IDs, section IDs, and nav_group.
- Strict enums for deletion policy and generated-section defaults.
- Non-empty navigation titles, root-relative docs links, and navigation targets that resolve to manifest pages.
- Repo-root-relative source_patterns and real boolean values for pinned, generated, and required.
- Unique section IDs plus unambiguous heading-level and heading-text pairs within each page.
- Post-generation checks that manifest pages exist on disk, required headings still exist, and declared heading anchors are not duplicated on disk.
- Post-generation runs also require pinned doctrine so the guard snapshot has read-only content to preserve.
The guard snapshot enforces preservation rules that schema validation cannot prove:
- Captured pinned and required headings must stay in manifest order.
- Pinned or otherwise read-only section bodies must keep the same hash unless pinned unlock is explicitly enabled.
- Files that carried recorded protected blocks must still exist on disk.
- Recorded skip-marker blocks must keep at least the captured block count, and each captured block must keep the same content hash in order across docs and source files.
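The protected-block rule can be sketched as a comparison of two hash lists for one file. This is one possible reading, stated as an assumption: captured blocks are compared position by position, and blocks appended after the captured ones are tolerated. The function name and messages are hypothetical.

```python
from typing import List, Optional


def check_protected_blocks(captured: List[str], current: List[str]) -> Optional[str]:
    """Compare captured block hashes with the hashes found after generation."""
    if len(current) < len(captured):
        return "protected block removed"
    for index, expected in enumerate(captured):
        if current[index] != expected:
            return f"protected block {index + 1} changed"
    return None  # every captured block is intact and in order
```

A file that has disappeared entirely would be caught earlier by the file-existence rule, so this check only runs against files that are still on disk.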
The destructive recreate path uses the same manifest deletion posture but checks it before backend validation. A manifest-owned docs tree is refused before Codex or Claude availability is consulted.
VitePress proof
The checked-in VitePress config follows the project preferences:
- base is process.env.DOCS_BASE || '/', cleanUrls is enabled, and the outline uses levels two and three with the label On this page.
- The top nav order is Guide, Features, Technical, and API, matching the manifest navigation order.
- The sidebar defines a root / entry so the sidebar appears on the homepage and provides the site-wide fallback.
- Section-specific sidebar entries exist for Guide, Features, and Technical.
- Current internal nav/sidebar targets resolve to checked-in docs pages: Guide, Installation, Commands, Configuration, Features, Two-Phase Generation, Audit Snapshots, Smart Cleanup, Content Protection, Technical, Templates, Deterministic Generation, Examples, Vidux Team Agents, API, and Troubleshooting.
- Social links are absolute GitHub and npm URLs.
lib/validate-links.sh proves config targets by extracting link: entries from the VitePress config, resolving / to docs/index.md, /path/ to docs/path/index.md, and /path to docs/path.md. Before route checking it also rejects duplicate explicit markdown {#id} anchors. Hash fragments are stripped for file existence checks, so the validator proves route targets and explicit anchor uniqueness rather than arbitrary heading text.
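The route mapping described above is small enough to show directly. The validator itself is a shell script; the Python below only illustrates the same three resolution rules plus the duplicate {#id} check, with hypothetical function names and an assumed character set for anchor IDs.

```python
import re
from collections import Counter
from typing import List

# Assumed anchor syntax: {#id} with letters, digits, underscores, and hyphens.
EXPLICIT_ANCHOR = re.compile(r"\{#([A-Za-z0-9_-]+)\}")


def route_to_file(link: str) -> str:
    """Map a root-relative config link to the markdown file it should resolve to."""
    path = link.split("#", 1)[0]  # hash fragments are ignored for file checks
    if path == "/":
        return "docs/index.md"
    if path.endswith("/"):
        return f"docs/{path.strip('/')}/index.md"
    return f"docs/{path.lstrip('/')}.md"


def duplicate_anchors(markdown: str) -> List[str]:
    """Return explicit {#id} anchors that appear more than once in one file."""
    counts = Counter(EXPLICIT_ANCHOR.findall(markdown))
    return sorted(anchor for anchor, count in counts.items() if count > 1)


print(route_to_file("/guide/"))
print(route_to_file("/technical/deterministic-generation#pipeline"))
```

Because the fragment is dropped before the existence check, a link to a heading that was renamed still passes as long as the page file exists.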
Link validation behavior
Link validation adds docs-site checks on top of the manifest contract:
- On the green path, lib/validate-links.sh prints a single successful internal-link message, then the shared UI layer adds its own success prefix.
- The failure path may re-run lib/validate-links.sh --output <tmp> to collect a machine-readable missing-file list for one auto-fix pass.
- --strict turns any remaining broken links into a hard error.
- tests/run-tests.sh includes the regression guard that claudux validate must not emit a doubled success prefix.
Backend-aware verification boundary
Verification intentionally distinguishes between configuration echo, backend preflight, and true generation failure:
- show_header and claudux check report the active backend plus selected Codex settings, but they do not prove that the installed Codex CLI supports the selected model.
- Commands that invoke a model go through check_generation_backend(). On the Codex path, check_codex() must find the CLI and verify auth before generation starts.
- Modern Codex CLI builds use codex login status for a zero-token auth probe; older builds fall back to an exec probe.
- claudux recreate is the exception to eager backend preflight because it reaches recreate_docs() first so the manifest deletion guard can refuse protected docs before backend checks run.
- If a backend or patch-mode run fails after launch, update() retains the raw JSONL log and prints backend-specific recovery steps instead of checkpointing a misleading success.
Pinned Harness Example
A mature web app often has a local E2E harness whose rules are more specific than generic "run the tests" advice.
For example, a local database harness might define the allowed start, reset, and stop commands for migrations and auth testing. A wrapper script such as scripts/run-local-harness.mjs can start services when needed, optionally reset fixtures, forward harness arguments, generate ephemeral test credentials when missing, and stop only the stack it started.
The browser seed path is even more structure-sensitive. A fixture such as e2e/fixtures/staging-seed.ts should be idempotent: resolve or create the canonical user, insert only missing domain records, and refuse to seed production resources. A global setup file can treat missing env as an opted-out no-op while still propagating the production guard.
That doctrine should not be rewritten as generic testing prose. In the application repo itself, it needs source-owned sections:
- Local service lifecycle owned by scripts/run-local-harness.mjs.
- Idempotent seed semantics owned by e2e/fixtures/staging-seed.ts.
- Production guard semantics pinned as a required section.
- Browser setup behavior owned by e2e/fixtures/global-setup.ts and playwright.config.ts.
When those files change, the docs should update those sections. When unrelated UI files change, the harness doctrine should survive untouched.
Claudux's own docs-structure.json keeps this section pinned as doctrine, but it does not use cross-repo paths as source_patterns. External example files are evidence in prose, not worktree-relative incremental ownership keys.
Checkpoint Contract
.claudux-state.json is the local freshness checkpoint that powers claudux diff and claudux status. It is developer-local, ignored by git, and separate from deterministic cache artifacts under .claudux/index/.
Saved fields
A successful save writes:
- last_sha: the Git HEAD recorded at checkpoint time, or unknown outside a usable Git history.
- last_run: the wall-clock timestamp for the successful save.
- backend: the active backend, such as claude or codex.
- files_documented: tracked docs files present at save time.
- deterministic: metadata derived from the static analysis index.
The nested deterministic block includes:
- prompt_version.
- The manifest hash.
- Source hashes for tracked non-doc files.
- Section hashes for manifest sections currently found on disk.
- Source-to-section coverage built from page and section source_patterns.
That nested block is intentionally best-effort. If Node is unavailable or the static index cannot be read, build_deterministic_state_metadata_json() returns a fallback object with nullable index and manifest metadata plus empty coverage arrays, so a successful docs run can still checkpoint freshness instead of failing after docs already updated.
The checkpoint records the backend but not the selected model or reasoning effort. A run can retry with different model settings while the persisted freshness state still answers the narrower question of which backend produced the docs.
Failed runs do not advance the checkpoint. save_claudux_state() only runs on the success path after generation, patch application, post-generation validation, link-validation handling, deterministic cache refresh, and change analysis. Backend rejection, section-patch extraction failure, strict link-validation failure, or cache-refresh failure keeps the previous checkpoint intact.
Diff and status
claudux diff compares last_sha..HEAD, then unions in uncommitted documentation/config changes under docs/, docs-structure.json, docs-map.md, .ai-docs-style.md, and docs-site-plan.json. That dirty-doc scan includes unstaged changes, staged changes, and untracked files for those pathspecs.
claudux status uses the same checkpoint to report generation time, backend, documented-file inventory size, and whether the saved commit is behind the current head when the saved commit still exists. It also reports dirty documentation/config files even when the checkpoint otherwise matches the current commit, with a prompt to run claudux diff for exact paths.
This makes the freshness model two-dimensional:
- Source commits after the saved commit mean the docs may be stale relative to code history.
- Dirty docs/config files mean the worktree may contain generated or structural documentation changes that have not been committed or re-checkpointed.
tests/test-diff-calculation.sh covers dirty tracked docs, staged docs, and untracked docs. tests/test-integration.sh covers the status warning when the checkpoint is otherwise fresh.
The split is intentional: last_run is wall-clock state, while deterministic metadata and deterministic cache files should stay stable when repo inputs and manifest ownership have not changed.