Composable Tools: Integration Without Coupling

The first two posts in this series described how we project a single library into multiple consumption contexts and choose which projection fits which caller. Both posts focused on individual tools — how one capability reaches different consumers.

This post is about what happens when multiple tools are present in the same environment. How does Biff know Vox is installed? How does PR/FAQ find Quarry? How does a shell hook — a bash script with no imports — participate in the same integration model as a Python library?

We’ve been working on an integration standard that answers these questions. The design has six layers, four composition axes, and one rule that holds the whole thing together: every tool must work alone.

The degradation rule

This is the constraint that shapes everything else. If Vox is installed, Biff speaks your messages aloud. If Vox isn’t installed, Biff still works — you just don’t hear anything. If Quarry is installed, the PR/FAQ researcher searches your knowledge base for evidence. If Quarry isn’t installed, the researcher proceeds with web research alone.

No building block may require another building block. Integration is enrichment, not dependency. This sounds obvious, but it constrains the design in useful ways — it means every integration point must be guarded by a check, and every check must have a graceful fallback.

Six layers, one discovery pattern

We found it helpful to name the integration layers explicitly, from loosest to tightest coupling:

Layer	Name	Mechanism
L0	Presence	Does the peer’s state directory exist?
L1	Discovery	Is the peer’s CLI on PATH? Are its MCP tools available?
L2	Events	Can I watch for the peer’s tool calls via hook matchers?
L3	State	Can I read the peer’s configuration or session state?
L4	Library	Can I import shared Python helpers?
L5	Orchestration	Can I shape Claude’s behavior through prompt coordination?

The first four layers (L0–L3) are what we call the universal tier. Any tool — ours or external — can integrate at these layers using filesystem conventions and shell scripts. No shared code, no Python imports, no npm packages. A bash script can check for a directory, run command -v, and parse YAML frontmatter with grep and sed.

L4 and L5 are the enhanced tier, reserved for tighter integration between our own tools. These layers provide Python helpers and prompt-level orchestration. The degradation rule applies here too — every L4/L5 feature must fall back to L0–L3 behavior when the shared library or peer plugin is absent.

Every integration follows the same check sequence:

L0: Is the peer's state directory present?
L1: Is the peer's CLI available?
L3: What is the peer's current state?
 ✓: Act on state, or skip if disabled.

If any step fails, the tool proceeds without the peer. No error, no warning, no degraded mode indicator — just the tool working as if the peer doesn’t exist.
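The sequence above can be sketched as a single guard function. This is a minimal illustration, not the published standard: the function name `peer_state`, its arguments, and the example peer paths are all hypothetical, and the state read assumes a flat `key: value` line.

```shell
#!/bin/sh
# peer_state DIR CLI STATE_FILE KEY   (hypothetical helper, not part of any tool)
# Runs the L0 -> L1 -> L3 checks in order. Prints the value of KEY and returns 0
# only if every check passes; otherwise prints nothing and returns 1.
peer_state() {
    dir=$1 cli=$2 state_file=$3 key=$4
    [ -d "$dir" ] || return 1                       # L0: state directory present?
    command -v "$cli" >/dev/null 2>&1 || return 1   # L1: CLI on PATH?
    [ -r "$state_file" ] || return 1                # L3: state file readable?
    value=$(grep "^$key:" "$state_file" | head -n 1 | sed "s/^$key:[[:space:]]*//")
    [ -n "$value" ] || return 1                     # key absent: treat as no peer
    printf '%s\n' "$value"
}

# Caller: act on the state, or carry on silently as if the peer did not exist.
# The peer name and paths below are placeholders.
if mode=$(peer_state "$HOME/.example-peer" example-peer "$HOME/.example-peer/state.md" mode); then
    echo "peer mode: $mode"
fi
```

The function never writes to stderr and never aborts the caller, which is the point of the rule: a failed check is indistinguishable from the peer not being there.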

Four composition axes

Building blocks compose along four axes. Not every tool participates in every axis.

Domain (WHAT) — Subject knowledge. Z Spec, PR/FAQ, Use Cases, LangLearn, Dungeon. These are the tools that know about a specific discipline.

Persona (WHO) — Character and teaching style. When a domain tool enters tutor mode, Persona adapts Claude’s tone and approach. A formal methods mentor for Z Spec. A product strategist for PR/FAQ.

Audio (HOW — spoken delivery) — Vox. Speaks output, notifications, and summaries. Any tool that produces output the user reads should deliver it through Vox when speaking is enabled.

Visual (HOW — diagrams and illustrations) — Lux (coming). The visual counterpart to Vox — generates diagrams, charts, and illustrations from text descriptions.

Any subset of axes works. The richest experience uses all four — a Z Spec tutor session with a formal methods mentor persona, spoken explanations, and generated diagrams. But each axis degrades independently. Remove Vox: text only. Remove Persona: default tone. Remove Lux: no diagrams. Remove the domain tool: no integration at all, which is fine — the other tools keep working.
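One way to picture independent degradation is a loop that checks each axis on its own, so a missing peer removes only its own feature. This is a rough sketch from the point of view of a domain tool. Only the `.punt-labs/vox` directory appears in this post; the `persona` and `lux` directory names and the `active_axes` function are assumptions made for illustration.

```shell
#!/bin/sh
# active_axes ROOT   (hypothetical helper)
# Prints the axes available to a domain tool running under ROOT.
# Each peer is checked independently; none of the checks can fail the caller.
# NOTE: the persona and lux directory names are assumed, not documented.
active_axes() {
    root=$1
    axes="domain"
    for peer in persona vox lux; do
        if [ -d "$root/.punt-labs/$peer" ]; then
            axes="$axes $peer"
        fi
    done
    printf '%s\n' "$axes"
}
```

A tool could call `active_axes "$PWD"` once at startup and enable only the features whose axis is listed.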

Shell hooks as first-class integration

This is the part we didn’t expect. Shell hooks — bash scripts triggered by Claude Code events — turned out to be natural participants in the integration model.

A PostToolUse hook runs a shell script when a tool call completes. That script can check for a peer’s state directory (L0), read a peer’s config with grep (L3), and call a peer’s CLI (L1). It enters the same integration stack from an event-driven starting point, using the same filesystem conventions that Python code uses.

The Biff → Vox integration is a concrete example. When a /wall broadcast arrives, Biff’s notification hook:

1. Checks if .punt-labs/vox/ exists (L0)
2. Reads speak from Vox’s state file (L3)
3. Calls vox synthesize with the message text (L1)

The entire integration is a shell script. No Python, no imports, ~110ms end to end [1]. The integration standard was designed for this — YAML frontmatter that grep can parse, sentinel directories that test -d can check, and CLIs that command -v can discover.
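A hook along these lines might look like the sketch below. It is not Biff's actual script. The state file name (`state.md`), the value that means "enabled" (`true`), and the assumption that `vox synthesize` accepts the text as a single argument are all guesses made for illustration; only the directory, the `speak` key, and the `vox synthesize` command name come from the steps above.

```shell
#!/bin/sh
# Hypothetical sketch of a notification hook that speaks a message via Vox.
# ASSUMPTIONS: state file name, "true" as the enabled value, and the
# argument form of `vox synthesize` are not confirmed by the post.

VOX_DIR="${VOX_DIR:-.punt-labs/vox}"
VOX_STATE="${VOX_STATE:-$VOX_DIR/state.md}"   # hypothetical file name

speak_if_enabled() {
    message=$1
    [ -n "$message" ] || return 0                 # nothing to say
    [ -d "$VOX_DIR" ] || return 0                 # L0: Vox state directory absent
    speak=$(grep '^speak:' "$VOX_STATE" 2>/dev/null |
        head -n 1 |
        sed 's/^speak:[[:space:]]*//')
    [ "$speak" = "true" ] || return 0             # L3: speaking disabled or unknown
    command -v vox >/dev/null 2>&1 || return 0    # L1: CLI not on PATH
    vox synthesize "$message" >/dev/null 2>&1 || true   # never fail the hook
}
```

Every early exit returns 0, so the hook reports success whether or not Vox is present. A real hook would call `speak_if_enabled` with the broadcast text once it has extracted it from the event.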

What we’ve learned so far

The integration standard is new. Most integrations are already implemented, but the standard itself was codified only this week. A few observations:

Naming layers helped. Before we had explicit layer names, integrations were ad hoc. “Check if biff is installed” and “read biff’s config” felt like the same thing. Giving them distinct names (L0 Presence, L3 State) made it easier to reason about what each integration actually needs and where it can degrade.

The universal tier is more useful than the enhanced tier. Most of our integrations live at L0–L3 — filesystem checks, CLI calls, state file reads. We haven’t needed L4 (shared Python helpers) much yet because the shell-level primitives are sufficient. This may change as integrations get more complex.

YAML frontmatter was the right state format. Every plugin stores state in YAML frontmatter inside markdown files. This is parseable by shell scripts (grep + sed), readable by Python (yaml.safe_load), and human-inspectable with cat. The markdown body below the frontmatter is available for freeform content — but most plugins only use the frontmatter today.
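To make the grep + sed claim concrete, here is a small sketch of a reader that only looks inside the frontmatter, so a `key:` line in the markdown body is never mistaken for state. The function name `frontmatter_get` and the sample file contents (including the `voice` and `draft` keys) are made up for this example, and the parser handles flat scalar keys only; nested YAML would need a real parser.

```shell
#!/bin/sh
# frontmatter_get FILE KEY   (hypothetical helper)
# Prints the value of a top-level "KEY: value" line from the frontmatter,
# meaning the lines from the opening '---' on line 1 to the next '---'.
# Prints nothing if the key is absent; returns 1 if there is no frontmatter.
frontmatter_get() {
    [ -r "$1" ] || return 1
    [ "$(head -n 1 "$1")" = "---" ] || return 1
    sed -n '1,/^---$/p' "$1" |
        grep "^$2:" |
        head -n 1 |
        sed -e "s/^$2:[[:space:]]*//" \
            -e 's/[[:space:]]*$//' \
            -e 's/^"\(.*\)"$/\1/'      # drop surrounding double quotes
}

# Sample state file; the keys and values are illustrative only.
state_file=$(mktemp)
cat > "$state_file" <<'EOF'
---
speak: true
voice: "narrator"
---
Freeform notes live below the frontmatter.
draft: this line is body text, not state
EOF

frontmatter_get "$state_file" speak   # prints: true
frontmatter_get "$state_file" voice   # prints: narrator
frontmatter_get "$state_file" draft   # prints nothing (body is ignored)
rm -f "$state_file"
```

Restricting the `sed` range to the first delimiter pair is the design choice that matters here: it keeps the freeform body safe to use without risking a false match.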

We haven’t tested this with external tools yet. The universal tier (L0–L3) is designed to be open — any tool can participate. We haven’t validated that claim with tools outside our ecosystem. The design should work, but “should work” and “works” are different things [2].

What’s next

The integration standard is published in punt-kit. The integration architecture page on this site has the full layer model, composition axes, and concrete examples. Two upcoming building blocks, Persona (the WHO axis) and Lux (the visual HOW axis), will be the first test of whether the standard holds up as the tool count grows.

References

[1] See DES-017 in the Vox DESIGN.md for call path performance benchmarks: MCP ~3.2s, Bash ~4.6s, Hook → CLI ~110ms.
[2] We plan to work with external tool authors to validate L0–L3 integration. If you build Claude Code plugins and want to try integrating with our tools, we’d like to hear from you.