
Enrichers

Enrichers are a new architectural layer in maat that sits between collectors and rules. Their sole purpose is to derive higher-level facts from lower-level facts.

An enricher consumes facts and produces new facts. Unlike a collector, it does not read the filesystem or network. Unlike a rule, it does not produce findings. It exists to enable semantic interpretation that static analysis cannot provide.

Why enrichers exist

Some architectural patterns are invisible to AST-based analysis:

  • Two functions implement the same business rule in different syntax (Connascence of Algorithm, semantic).
  • Architecture drift between intended design and actual code.
  • Semantic code smells that require interpreting meaning, not just parsing syntax.

These require a probabilistic model — typically an LLM — to interpret meaning. Enrichers provide a controlled path for this interpretation without violating the determinism of the rule layer.

The execution pipeline

Collectors (I/O, deterministic)
        ↓
Enrichers (probabilistic)
        ↓
Rules (pure, deterministic)
        ↓
Findings

The kernel runs this in three phases:

  1. Collectors run in parallel — gather facts from the filesystem and codebase.
  2. Enrichers run in parallel — transform and augment facts. All enrichers receive the same snapshot of collected facts; they cannot depend on facts produced by other enrichers.
  3. Rules run in parallel — consume facts (both raw and enriched) and produce findings.
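
The three phases can be sketched roughly as follows. This is a minimal illustration, not maat's kernel: `Facts`, `CollectorLike`, `EnricherLike`, `RuleLike`, and `runPipeline` are hypothetical names that only model the ordering and the shared-snapshot behaviour described above.

```typescript
// Hypothetical shapes for illustration only; maat's real contracts may differ.
type Facts = Record<string, unknown[]>
type Finding = { ruleId: string; message: string }

type CollectorLike = { collect(): Promise<Facts> }
type EnricherLike = { enrich(snapshot: Readonly<Facts>): Promise<Facts> }
type RuleLike = { id: string; evaluate(facts: Readonly<Facts>): Finding[] }

async function runPipeline(
  collectors: CollectorLike[],
  enrichers: EnricherLike[],
  rules: RuleLike[],
): Promise<Finding[]> {
  // Phase 1: collectors run in parallel; their facts are merged.
  const collectedParts = await Promise.all(collectors.map((c) => c.collect()))
  const collected: Facts = Object.assign({}, ...collectedParts)

  // Phase 2: every enricher receives the same frozen snapshot of collected
  // facts, so no enricher can observe another enricher's output.
  const snapshot: Readonly<Facts> = Object.freeze({ ...collected })
  const enrichedParts = await Promise.all(enrichers.map((e) => e.enrich(snapshot)))

  // Phase 3: rules see raw and enriched facts together. Rules are synchronous
  // and independent of each other, so they are simply evaluated in turn here.
  const allFacts: Facts = Object.assign({}, collected, ...enrichedParts)
  const findings: Finding[] = []
  for (const rule of rules) findings.push(...rule.evaluate(allFacts))
  return findings
}
```

Freezing the snapshot before phase 2 is one way to make the "no enricher depends on another enricher" constraint hard to break by accident; it is a design choice of this sketch, not a documented kernel behaviour.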

Writing an enricher

An enricher is a package with a default export created by defineEnricher() from @maat-tools/contracts.

ts
import { type Enricher, defineEnricher } from '@maat-tools/contracts'

export type SimilarFunctionPair = {
  functionA: string
  functionB: string
  similarityReason: string
}

export type SemanticSimilarityEnricherOptions = {
  threshold: number
}

export class SemanticSimilarityEnricher
  implements Enricher<'acme.ts.functions', 'acme.semantic.similarity'>
{
  readonly id = 'acme.semantic-similarity'
  readonly needFacts = ['acme.ts.functions'] as const
  readonly provideFacts = ['acme.semantic.similarity'] as const

  constructor(private readonly options: SemanticSimilarityEnricherOptions) {}

  async enrich(facts: { 'acme.ts.functions': unknown[] }): Promise<{
    facts: { 'acme.semantic.similarity': SimilarFunctionPair[] }
    usedTokens?: number
    cost?: number
  }> {
    // Use an LLM or other probabilistic model to analyze functions
    // and identify semantic similarity.
    const pairs: SimilarFunctionPair[] = []

    // ... analysis logic ...

    return {
      facts: { 'acme.semantic.similarity': pairs },
    }
  }
}

// Extend maat's registries for TypeScript autocomplete
declare module '@maat-tools/contracts' {
  interface FactRegistry {
    'acme.semantic.similarity': SimilarFunctionPair[]
  }

  interface EnricherRegistry {
    '@acme/maat-enricher-semantic-similarity': SemanticSimilarityEnricherOptions
  }
}

export default defineEnricher(
  (options: SemanticSimilarityEnricherOptions) => new SemanticSimilarityEnricher(options),
)

Using enrichers in config

ts
import { defineConfig } from '@maat-tools/core'

// Import plugin packages so their declaration merging is visible
import '@acme/maat-enricher-semantic-similarity'

export default defineConfig({
  check: { strict: true },
  collectors: [['@acme/maat-collector-ts-functions', { root: './src' }]],
  enrichers: [['@acme/maat-enricher-semantic-similarity', { threshold: 0.85 }]],
  rules: [['@acme/maat-rule-connascence-algorithm', {}]],
})

Probabilistic contamination

All enrichers are probabilistic by definition. They interpret, synthesize, or infer. There is no such thing as a "deterministic enricher" — if a transformation is deterministic, it belongs in a collector, a rule, or an insight.

When a rule consumes any fact produced by an enricher, the resulting finding is contaminated with uncertainty. The kernel marks it with requiresVerification: true. This is probabilistic contamination: the finding carries the uncertainty of its source.

The rule itself remains pure and deterministic. The finding is explicitly flagged as needing human review. The system does not pretend the finding is trustworthy.
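
One way to picture the marking step is the sketch below. It assumes, hypothetically, that each rule declares the fact keys it consumes (`needFacts`, mirroring the enricher field of the same name) and that the kernel knows which keys were provided by enrichers. `markContaminated`, `RuleInfo`, and `MarkedFinding` are illustrative names; only `requiresVerification` comes from the text above.

```typescript
// Hypothetical shapes; only `requiresVerification` is named in the documentation.
type MarkedFinding = { ruleId: string; message: string; requiresVerification?: boolean }
type RuleInfo = { id: string; needFacts: readonly string[] }

// Flags every finding of a rule that consumed at least one enricher-provided fact.
// The rule itself is untouched: contamination is tracked outside the rule.
function markContaminated(
  rule: RuleInfo,
  findings: MarkedFinding[],
  enrichedFactKeys: ReadonlySet<string>,
): MarkedFinding[] {
  const contaminated = rule.needFacts.some((key) => enrichedFactKeys.has(key))
  return contaminated
    ? findings.map((finding) => ({ ...finding, requiresVerification: true }))
    : findings
}
```

A rule that mixes collector facts with a single enriched fact is still contaminated under this sketch, which matches the "directly or mixed" wording used for probabilistic findings.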

Consequences for findings

| Property | Deterministic finding | Probabilistic finding |
| --- | --- | --- |
| Source | Facts from collectors only | Facts from enrichers (directly or mixed) |
| Badge | None | [Verify] in CLI output |
| requiresVerification | false / absent | true |
| Breaks strict build? | Yes | Never |
| Goes to ledger? | Yes | Yes (as finding.unverified via maat check --ledger) |
| Can be baselined? | Yes | Yes: maat baseline baselines every non-baselined ledger record, including finding.unverified ones |

Human-in-the-loop verification

Findings with requiresVerification: true are presented with a [Verify] badge. They never break CI builds. When maat check --ledger is used, they are written to the ledger as finding.unverified.

A human can verify a finding after reviewing it:

bash
maat verify --fingerprint <fp>

This promotes the finding in the ledger from finding.unverified to finding.observed. On subsequent runs, when the kernel produces the same finding, the CLI reconciles it against the ledger: if its fingerprint is in the finding.observed state, the finding loses requiresVerification and its [Verify] badge. From then on it is treated as a normal, deterministic finding: it can be persisted, baselined, and can break builds.
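
The reconciliation step might look like the following sketch. The ledger is modelled here as an in-memory map from fingerprint to state, which is an assumption made for illustration; `reconcile`, `breaksStrictBuild`, and `ReportedFinding` are hypothetical names, and only the two state names and `requiresVerification` are taken from the text above.

```typescript
// Hypothetical ledger and finding shapes; the state names come from the documentation.
type LedgerState = 'finding.unverified' | 'finding.observed'
type ReportedFinding = { fingerprint: string; message: string; requiresVerification?: boolean }

// Drops the verification flag from findings that a human has already promoted.
function reconcile(
  findings: ReportedFinding[],
  ledger: ReadonlyMap<string, LedgerState>,
): ReportedFinding[] {
  return findings.map((finding) => {
    if (!finding.requiresVerification) return finding
    if (ledger.get(finding.fingerprint) !== 'finding.observed') return finding
    // Verified: return a copy without the flag, so it behaves like a normal finding.
    return { fingerprint: finding.fingerprint, message: finding.message }
  })
}

// Only findings without the flag are allowed to fail a strict build.
function breaksStrictBuild(findings: ReportedFinding[]): boolean {
  return findings.some((finding) => !finding.requiresVerification)
}
```

Under these assumptions, a finding whose fingerprint is still finding.unverified keeps its flag and cannot fail the build, while the same finding after `maat verify` can.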

If a finding is a false positive, it can be dismissed:

bash
maat verify --fingerprint <fp> --revoke

Caching

For the official @maat-tools/enricher-llm package, every LLM response is cached by default. Before any LLM call, each item is looked up in .maat/enricher-cache/ by a key derived from the item's content, the prompt instructions, and the provider/model pair. If nothing changed — same code, same prompt, same model — the cached result is used and the LLM is never called again for that item. Only changed items trigger new calls; entries are per-item, so one changed function re-asks about one function, not the whole batch. Entries for items that no longer exist are pruned automatically.
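
The per-item lookup can be sketched as below. This is not the package's implementation: the actual key format and on-disk layout are not documented here, so the sketch assumes a SHA-256 hash over the item content, prompt instructions, provider, and model, and uses an in-memory `Map` in place of `.maat/enricher-cache/`. `cacheKey`, `enrichItems`, and `callModel` are hypothetical names.

```typescript
import { createHash } from 'node:crypto'

// Hypothetical key inputs, mirroring the three ingredients named in the text.
type CacheKeyInput = {
  itemContent: string
  promptInstructions: string
  provider: string
  model: string
}

// JSON-encoding the parts as an array keeps their boundaries unambiguous.
function cacheKey(input: CacheKeyInput): string {
  const parts = [input.itemContent, input.promptInstructions, input.provider, input.model]
  return createHash('sha256').update(JSON.stringify(parts)).digest('hex')
}

async function enrichItems(
  items: string[],
  settings: { promptInstructions: string; provider: string; model: string },
  cache: Map<string, string>,
  callModel: (item: string) => Promise<string>,
): Promise<string[]> {
  const liveKeys = new Set<string>()
  const results: string[] = []
  for (const item of items) {
    const key = cacheKey({ itemContent: item, ...settings })
    liveKeys.add(key)
    let result = cache.get(key)
    if (result === undefined) {
      // Cache miss: only this item is sent to the model.
      result = await callModel(item)
      cache.set(key, result)
    }
    results.push(result)
  }
  // Prune entries whose items no longer exist.
  for (const key of Array.from(cache.keys())) {
    if (!liveKeys.has(key)) cache.delete(key)
  }
  return results
}
```

Because the model and prompt are part of the key in this sketch, switching either one invalidates every entry, while editing one function invalidates only that function's entry.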

Commit .maat/enricher-cache/ with your repository. This makes enriched runs reproducible across machines and CI — same facts, no network access, no repeated cost — and it means LLM cost scales with how much code changed, not with how often maat check runs. The cache location can be overridden with the MAAT_ENRICHER_CACHE_DIR environment variable.

To bypass the cache for a single run — for example, when you suspect cached enriched facts are stale or want to reproduce an issue without committing the cache — pass --no-cache to maat check:

bash
maat check --no-cache

This forces all configured enrichers to re-run. It has no effect when no enrichers are configured.

Tradeoffs and design decisions

Enrichers vs. deterministic alternatives

Tradeoff: Enrichers introduce non-determinism into the fact pipeline. Every finding that depends on an enriched fact requires human verification before it can be treated as actionable.

| Approach | Pros | Cons |
| --- | --- | --- |
| Deterministic collector + rule | Fully reproducible, no human bottleneck | Cannot detect semantic patterns |
| Enricher + rule | Can detect semantic patterns | Requires human verification, adds latency to CI feedback |
| LLM inside rule | Same output as enricher | Hard violation of ADR-006. Breaks determinism contract. Not supported. |
| LLM inside collector with cache | Deterministic facts, reproducible | Collector becomes more complex; cache must be committed |

Performance and cost

Tradeoff: LLM-backed enrichers add latency and cost when code changes.

  • Enrichers run in parallel. Total latency is bounded by the slowest enricher, not the sum of all. They all receive the same snapshot of collected facts.
  • The cache, which is on by default, means unchanged items never trigger LLM calls: a run over unchanged code makes zero calls unless --no-cache is passed.

Verification fatigue

Tradeoff: Teams may accumulate many probabilistic findings that all require manual verification.

  • Only use enrichers for patterns that truly cannot be detected deterministically.
  • Prefer deterministic rules for structural checks (imports, layers, boundaries).
  • Use enrichers for semantic checks that justify the human-in-the-loop cost.

Determinism is preserved at the rule layer

This is the most important point to understand. The existence of enrichers does not mean maat rules are no longer deterministic. Rules remain pure functions:

  • Rule.evaluate(facts) is still synchronous and deterministic.
  • Rules do not call LLMs or make network requests.
  • The kernel does not invoke LLMs.
  • The non-determinism is explicitly bounded to the fact layer.

The maat philosophy is not contradicted: rules are still deterministic. The probabilistic nature is contained in the facts, and the system marks the findings that depend on them.
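
As an illustration, a rule that consumes an enriched fact can still be an ordinary pure function. The sketch below is hypothetical: the rule id, `RuleFacts`, and `RuleFinding` are invented for the example, and only the synchronous `evaluate(facts)` contract and the `acme.semantic.similarity` fact shape come from this page.

```typescript
// Hypothetical rule shape; only the pure, synchronous evaluate(facts) contract
// is taken from the documentation.
type SimilarFunctionPair = { functionA: string; functionB: string; similarityReason: string }
type RuleFacts = { 'acme.semantic.similarity': SimilarFunctionPair[] }
type RuleFinding = { ruleId: string; message: string }

const connascenceOfAlgorithmRule = {
  id: 'acme.connascence-algorithm',
  // No I/O, no LLM call, no clock, no randomness: output depends only on `facts`.
  evaluate(facts: RuleFacts): RuleFinding[] {
    return facts['acme.semantic.similarity'].map((pair) => ({
      ruleId: 'acme.connascence-algorithm',
      message: `${pair.functionA} and ${pair.functionB} may duplicate logic: ${pair.similarityReason}`,
    }))
  },
}
```

Given the same enriched facts, this rule always returns the same findings; any uncertainty lives in how the `acme.semantic.similarity` pairs were produced, not in the rule.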

Does this mean maat is no longer deterministic?

No. maat's determinism guarantee applies to the rule layer, not the entire pipeline. Here is the exact boundary:

| Layer | Deterministic? | Why |
| --- | --- | --- |
| Collectors | Yes (for same filesystem state) | Produce deterministic facts |
| Enrichers | No (by design) | Interpret, synthesize, infer |
| Rules | Yes (guaranteed) | Pure function: same facts → same findings |
| Findings from deterministic facts | Yes | Fully reproducible |
| Findings from enriched facts | Flagged for verification | Explicitly marked as uncertain |

A finding that comes from deterministic facts is fully deterministic. A finding that comes from enriched facts is explicitly flagged and restricted until a human verifies it. Once verified, it is treated as deterministic.

This is architectural separation with explicit contamination tracking, not a breakdown of determinism.

Enricher package structure

The @maat-tools/enricher-llm package provides shared types and utilities for LLM-backed enrichers:

ts
import type { EnricherLLMInput } from '@maat-tools/enricher-llm'

| Type | Purpose |
| --- | --- |
| EnricherLLMInput | LLM configuration for the supported provider/model combinations (provider, model, optional extra and timeoutMs). Currently Google Vertex AI with Gemini models; see LLM models. |