When an LLM needs to show a user something — a table, a chart, a form — there are two extremes. It can generate raw UI code and run it. Or it can call a deterministic API that renders a fixed layout. The first is flexible but unverifiable. The second is safe but rigid.
We’ve been looking for the line between the two. Lux is our attempt: a display server built at Level 1 (fully deterministic) that agents compose at Level 4 (LLM orchestrator + deterministic tools). This post describes where we drew the boundary, how the layers work, and why that boundary placement matters for verification.
This builds on the five verification levels we described earlier. The short version: L1 code has zero verification gap (every path hand-coded, exact oracle). L4 code has an orchestration gap (tools are verifiable, but the sequence of calls is not). The strategy we proposed there — push behavior down the spectrum and maximize L1 surface area — is what Lux tries to put into practice for visual output.
The Architecture
Lux has three layers, separated by two boundaries:
```
Layer 3: LLM (Claude Code)              L4 — agentic composition
    │ MCP tool call (JSON)
    │ ─── boundary 1: the protocol ───
Layer 2: MCP Server (lux serve)         thin adapter
    │ Unix socket (JSON frames)
    │ ─── boundary 2: the socket ───
Layer 1: Display Server (lux display)   L1 — deterministic rendering
    │ ImGui render loop (60fps)
    ▼
Window on screen
```
The display server is the L1 core. It renders an element tree — tables, plots, sliders, text, buttons, windows — at 60 frames per second using Dear ImGui [1] and OpenGL. All interactive behavior is built in: table filtering, row selection, pagination, keyboard navigation. The LLM never touches any of it.
The MCP server is a thin adapter. It translates six tool calls (show, update, set_menu, clear, ping, recv) into protocol messages sent over a Unix domain socket [2]. It adds nothing to the behavior — it’s Cockburn’s ports and adapters pattern [3] applied to the agent-display boundary.
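Boundary 2 is just bytes over a socket. The post doesn't specify the wire format, so here is a minimal sketch assuming newline-delimited JSON frames; `encode_frame` and `decode_frames` are illustrative names, not Lux's actual API:

```python
import json

# Assumption: each protocol message is one JSON object terminated by a
# newline. The real Lux frame format may differ.
def encode_frame(msg: dict) -> bytes:
    """Serialize one protocol message as a newline-delimited JSON frame."""
    return json.dumps(msg, separators=(",", ":")).encode() + b"\n"

def decode_frames(buf: bytes) -> list[dict]:
    """Split a received byte buffer into complete JSON messages."""
    return [json.loads(line) for line in buf.splitlines() if line]

# A show() call as it might cross the socket (element schema is illustrative):
frame = encode_frame({"tool": "show",
                      "elements": [{"kind": "text", "value": "hello"}]})
```

Because the adapter only translates tool calls into frames like these, it adds no behavior of its own; any bug in rendering lives below the socket, in testable L1 code.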
The LLM composes scenes by describing element trees as JSON and calling show(). After that, it’s idle. All user interaction — filtering a table, clicking a button, navigating pages — happens in the display server at render time. The LLM only re-engages when the user asks for a mutation (e.g., close a bead, update a record), at which point it runs the command and sends an update() with fresh data.
What the LLM Actually Writes
Here’s where it gets concrete. We have a beads issue tracker that stores data as JSONL. A Lux skill teaches the LLM how to display it as a filterable board. The LLM’s job reduces to a data pipeline:
- Read `.beads/issues.jsonl` and parse each line
- Map bead fields to table columns (ID, Title, Status, Priority, Type)
- Map bead fields to a detail panel (metadata grid + description body)
- Construct a `TableElement` with filters and detail configuration
- Call `show()` once
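A sketch of that pipeline follows. Every name here, from the bead field names to the element schema and the `show()` call itself, is an assumption for illustration rather than Lux's actual API:

```python
import json
from pathlib import Path

def load_beads(path: str) -> list[dict]:
    """Parse one bead per JSONL line (field names are illustrative)."""
    return [json.loads(line)
            for line in Path(path).read_text().splitlines() if line.strip()]

def beads_to_table(beads: list[dict]) -> dict:
    """Map bead fields onto a declarative table element (schema assumed)."""
    return {
        "kind": "table",
        "columns": ["ID", "Title", "Status", "Priority", "Type"],
        "rows": [[b["id"], b["title"], b["status"], b["priority"], b["type"]]
                 for b in beads],
        "filters": {"search": True, "dropdowns": ["Status", "Type"]},
        "detail": {"metadata": ["id", "status", "priority", "type"],
                   "body": "description"},
    }

# show(beads_to_table(load_beads(".beads/issues.jsonl")))  # one call, then idle
```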
That’s about 40 lines of generated Python. The skill that teaches it is about 100 lines of markdown — no code, just instructions.
Everything else — the search filter, the status and type dropdown filters, the 10-row pagination with Prev/Next, the keyboard arrow navigation, the detail panel that syncs with row selection, the auto-select-first-row behavior on filter change — is built into the display server’s TableElement. Two thousand lines of deterministic, tested rendering code that the LLM never generates, never modifies, and never needs to understand.
The ratio matters. 40 lines of agentic data mapping. 2,000 lines of deterministic rendering. The verification gap exists only in the 40 lines — and those lines do nothing more than read JSON and populate a data structure.
Why the Boundary Sits Where It Does
We didn’t start here. Early prototypes had the LLM generating ImGui calls directly — writing Python render functions that ran inside the display loop. That worked, but it meant the LLM was writing L4 UI code: non-deterministic, hard to test, different every time. A table that filters correctly on one generation might not on the next.
Moving filtering, pagination, and selection into the display server was a deliberate decision to push behavior down the spectrum. The LLM doesn’t need to implement filtering — it needs to declare what’s filterable. That’s a much smaller surface for error.
Three properties fall out of this boundary placement:
Performance without round trips. Filtering a 500-row table at 60fps requires no LLM call. The display server applies the filter predicate locally. If the LLM had to re-render on every keystroke, latency would make the UI unusable.
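To make the cost concrete: a filter pass of this shape is just a per-frame list scan, cheap enough to run on every keystroke. The predicate below is a guess at the behavior, not Lux's implementation:

```python
def apply_filters(rows, search="", status=None):
    """Deterministic filter the display server could apply each frame:
    case-insensitive substring match across all cells, plus an optional
    exact status match (column layout is illustrative)."""
    needle = search.lower()
    return [
        r for r in rows
        if (not needle or any(needle in str(c).lower() for c in r))
        and (status is None or r[2] == status)
    ]
```

Even at 500 rows this is microseconds of work, which is why no LLM round trip is needed between keystrokes.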
Safety without sandboxing. The LLM sends JSON element descriptions, not executable code. There is nothing to sandbox because there is nothing to execute. The protocol is the constraint — if you can describe it as JSON, Lux renders it; if you can’t, it doesn’t happen.
Flexibility without fragility. The same TableElement works for beads, git logs, API responses, CSV files — any tabular data. The LLM adapts the data mapping; the display server provides consistent interactive behavior. A new data source doesn’t require new UI code, just a new 40-line pipeline.
When You Need to Cross the Boundary
Sometimes declarative elements aren’t enough. A custom visualization, a game canvas, a live animation — these require code that runs inside the render loop. Lux supports this with a render_function element that accepts Python source code.
This is an explicit L1-to-L4 boundary crossing, and Lux treats it as one. When the LLM sends a render function:
- The display server shows a consent dialog with the full source code
- An AST scanner flags suspicious patterns (filesystem access, network calls, subprocess spawns) as warnings — not a security boundary, just a signal
- The user clicks Allow or Deny
- If allowed, the code compiles and runs inside the frame loop with a `RenderContext` providing state, timing, and canvas dimensions
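A scanner of this kind can be a few dozen lines of `ast` walking. The patterns below are illustrative guesses at what counts as suspicious, not Lux's actual rule set:

```python
import ast

# Assumption: "suspicious" means imports or calls that could touch the
# filesystem, network, or subprocesses. A warning signal, not a sandbox.
SUSPECT_MODULES = {"os", "subprocess", "socket", "shutil", "urllib", "requests"}
SUSPECT_CALLS = {"open", "exec", "eval", "__import__"}

def scan_warnings(source: str) -> list[str]:
    """Return human-readable warnings for the consent dialog."""
    warnings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = {a.name.split(".")[0] for a in node.names}
            warnings += [f"imports {n}" for n in sorted(names & SUSPECT_MODULES)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            root = node.module.split(".")[0]
            if root in SUSPECT_MODULES:
                warnings.append(f"imports {root}")
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in SUSPECT_CALLS:
                warnings.append(f"calls {node.func.id}()")
    return warnings
```

Because this is advisory rather than a security boundary, false negatives are acceptable; the user's Allow/Deny click remains the actual gate.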
The consent gate is the boundary made visible. Everything below it is L1. Everything above it, once approved, is L4 code running with the user’s explicit permission. If denied, the element renders as a one-line “[Code execution denied]” message and the rest of the scene continues normally.
We haven’t needed render functions for most use cases. The 22 built-in element kinds — text, tables, plots, sliders, trees, draw canvases — cover the common patterns. The escape hatch exists, but the design pressure is to avoid it.
What We Don’t Know Yet
Lux is alpha software. We’ve tested it on our own tools — beads boards, dashboards, data explorers — but we don’t know how the pattern holds for use cases we haven’t tried. The 22 element kinds might not be enough. The skill-based teaching approach (markdown instructions, no code) might not scale to complex compositions. The consent gate for render functions hasn’t been stress-tested against adversarial inputs.
The Smalltalk Morphic model [4] — live, composable, inspectable objects — is the long-term inspiration, but we’re far from it. Right now Lux is a display server with a protocol. Whether it becomes something closer to a live programming environment depends on how the element tree and interaction model evolve.
What we can say is that the boundary placement — deterministic rendering below, agentic composition above — has held up for everything we’ve built so far. The LLM writes data mappings, not UI code. The verification gap stays narrow. And the 60fps render loop doesn’t care who composed the scene.
References
1. Cornut, O. (2014–present). “Dear ImGui: Bloat-free Graphical User Interface for C++ with Minimal Dependencies.” github.com/ocornut/imgui
2. Anthropic. (2024–present). “Model Context Protocol.” modelcontextprotocol.io
3. Cockburn, A. (2005). “Hexagonal Architecture.” alistair.cockburn.us
4. Maloney, J. H. & Smith, R. B. (1995). “Directness and Liveness in the Morphic User Interface Construction Environment.” UIST ’95. acm.org