When engineering teams start working alongside AI agents, the tooling conversation tends to collapse into “which agent framework should we use?” But framework choice is one problem out of at least six, and the six don’t live in the same layer.
We’ve been mapping this landscape while building our own tools, and what we’ve found is that the problems separate cleanly by timescale and by who they serve. No single tool spans more than one layer well. The players are often building adjacent layers without realizing it — which means they’re complements, not competitors.
Here’s the map as we understand it today.
1. Intent & Trust — “Why was this built?”
Timescale: Per-commit. Backward-looking.
When an agent generates 300 lines from a one-line prompt, the reviewer has no access to the reasoning chain, the constraints considered, or the alternatives rejected. The code is decoupled from the intent that produced it.
Entire.io, founded by Thomas Dohmke, the former CEO of GitHub, is building what they call AI-native version control — a semantic reasoning layer alongside Git [1]. Their Checkpoints tool captures the prompts, AI reasoning, and transcripts that produced each change, linked to commits. Where Git tracks what changed, Entire tracks why. They call the gap they’re closing the “provenance gap,” and they’ve described the shift as moving from “engineering as craft to engineering as specification and intent.”
This layer is about audit and trust — making the output of AI-assisted development reviewable and accountable.
2. Context & Memory — “What was decided?”
Timescale: Cross-session. Forward-looking.
Agent sessions are isolated. Each starts from zero. Prior architectural decisions, team conventions, and design rationale are forgotten. For a solo developer, this is friction. For a team, it’s architectural entropy — agents working on the same project contradict each other because they don’t share memory.
SageOx, founded by Ajit Banerjee (ex-Hugging Face), Milkana Brace (founder of Jargon, acquired by Remitly), and Ryan Snodgrass (15 years at Amazon), is building agentic context infrastructure [2]. Their system captures team discussions and coding sessions, structures them into durable history, and primes every new agent session with relevant context via their Ox CLI. Their key insight: “Agent sessions don’t start from zero. They start from shared memory.”
Meanwhile, the AI coding tools themselves are converging on simpler forms of persistent context. Anthropic’s CLAUDE.md files [3], Cursor’s .cursorrules, Windsurf’s rules files, and the emerging AGENTS.md convention [4] all address the same need — making project knowledge survive session boundaries.
We use Michael Nygard’s ADRs [5] adapted for AI co-authors (see our post on design decision logs). It’s deliberately low-tech — a DESIGN.md file that every agent reads on startup. As purpose-built context tools like SageOx mature, we expect the manual approach to become an input to those systems rather than a standalone practice.
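To make the low-tech approach concrete, here is what one entry in such a file might look like. This is an illustrative fragment, not our actual DESIGN.md — the ADR number, decision, and co-author names are invented for the example:

```markdown
# DESIGN.md — read by every agent session on startup

## ADR-007: Repo-scoped messaging over a shared relay
Status: Accepted (2025-06-12)
Context: Agents on different machines need to exchange messages;
  polling a shared database was rejected as too slow.
Decision: One relay subject per repository, scoped by repo slug.
Consequences: Presence data is ephemeral; durable state lives in git.
Co-authors: jane@, claude-code (session 41c2)
```

The format follows Nygard’s ADR template (context, decision, consequences), with a co-author line added so future sessions can see which decisions were made with agent involvement.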
Entire vs. SageOx: Close neighbors, different granularity. SageOx captures team-level decisions to inform future sessions (forward-looking). Entire captures commit-level reasoning to build trust in past output (backward-looking). A team could use both.
3. Communication & Presence — “Who is here right now?”
Timescale: Real-time. Present tense.
Engineers and agents work in the same repo but can’t see each other. There’s no way to ask “who is active on this project?” and get an answer that includes both humans and agents. There’s no way to send a directed message to a specific person or agent without leaving the terminal.
This is the layer Biff occupies — MCP-native commands for team communication inside Claude Code sessions. BSD Unix vocabulary (/who, /finger, /plan, /write, /read), NATS relay for cross-machine messaging, repo-scoped by default.
The distinctive design choice is that humans and agents are co-equal participants. Both have presence, plans, and mailboxes. This matters because many agent coordination frameworks — CrewAI [6], LangGraph [7], Google’s A2A [8] — primarily treat humans as operators who configure the system and approve outputs, rather than as participants alongside agents in a shared workspace. (LangGraph does support human-in-the-loop nodes, but the default interaction model is still orchestrator-driven.)
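The co-equal-participant idea can be sketched as a data model. To be clear, this is our illustration, not Biff’s actual schema — the `Participant` type and `who` helper are invented here to show the shape of the design:

```python
from dataclasses import dataclass, field
from enum import Enum

class Kind(Enum):
    HUMAN = "human"
    AGENT = "agent"

@dataclass
class Participant:
    # Humans and agents share one type: both get presence, a plan,
    # and a mailbox -- there is no operator/worker split.
    name: str
    kind: Kind
    plan: str = ""                                     # what /plan would show
    mailbox: list[str] = field(default_factory=list)   # what /read would drain

def who(roster: list[Participant]) -> list[str]:
    """Answer '/who' with humans and agents in one shared listing."""
    return [f"{p.name} ({p.kind.value}): {p.plan or 'no plan set'}"
            for p in roster]

roster = [
    Participant("jane", Kind.HUMAN, plan="reviewing auth PR"),
    Participant("claude-1", Kind.AGENT, plan="refactoring session store"),
]
```

The contrast with orchestrator-driven frameworks falls out of the types: there is no separate `Operator` class with approval powers, just one roster that `/who` enumerates.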
4. Work Tracking — “What needs doing?”
Timescale: Multi-session. Strategic.
As agents and humans produce work faster, the bottleneck shifts from execution to discovery and sequencing. What work exists? What depends on what? Who claimed what? What’s blocked?
We use Beads (bd) — git-native issue tracking where issues travel with the repository in a .beads/ directory. It’s not a project management tool (no sprints, no boards). It’s a work discovery tool that answers “what should I work on next?” accounting for dependencies and priority.
Anthropic’s agent teams [9] feature includes a shared task list, but it’s session-scoped and ephemeral — how do 4 agents divide a single feature right now? Beads operates at a different timescale — across 50 open issues, which 3 should I work on today? Different problems, different tools.
Traditional issue trackers (Linear, GitHub Issues, Jira) serve this layer for humans but aren’t agent-aware. The gap is work tracking that agents can read, claim, and update as first-class participants.
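The core query this layer answers — “what should I work on next, accounting for dependencies and priority?” — can be sketched in a few lines. This is a generic illustration of dependency-aware work discovery, not Beads’ actual implementation; the `Issue` type and `ready_work` function are our invention:

```python
from dataclasses import dataclass, field

@dataclass
class Issue:
    id: str
    priority: int                                   # lower = more urgent
    closed: bool = False
    deps: list[str] = field(default_factory=list)   # ids this issue is blocked by

def ready_work(issues: dict[str, Issue], limit: int = 3) -> list[str]:
    """Open issues whose blockers are all closed, most urgent first."""
    ready = [
        i for i in issues.values()
        if not i.closed and all(issues[d].closed for d in i.deps)
    ]
    return [i.id for i in sorted(ready, key=lambda i: i.priority)[:limit]]

issues = {
    "bd-1": Issue("bd-1", priority=1, closed=True),
    "bd-2": Issue("bd-2", priority=0, deps=["bd-1"]),  # unblocked: bd-1 is done
    "bd-3": Issue("bd-3", priority=2, deps=["bd-4"]),  # blocked by open bd-4
    "bd-4": Issue("bd-4", priority=1),
}
```

An agent-aware tracker is essentially this query plus claim/update operations that both humans and agents can invoke, with the state living in the repository rather than a SaaS backend.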
5. Session Coordination — “How do agents divide this task?”
Timescale: Within a session. Tactical.
A single task — refactoring auth, reviewing a PR, debugging a race condition — may benefit from multiple agents working in parallel. They need to divide files, share findings, avoid conflicts, and converge on a result.
Anthropic’s agent teams [9] is the most direct solution we’ve seen: one lead session coordinating N teammates, each with its own context window, sharing a task list and a mailbox. It’s experimental and has known limitations (no session resume, no persistent identity), but the architecture is sound — shared task list with dependency tracking, direct inter-agent messaging, and hooks for quality gates.
Claude Code’s subagents [10] offer a lighter alternative: focused workers that report results back to the caller without inter-agent communication. Lower overhead, less coordination.
Other frameworks occupy this layer with different tradeoffs: CrewAI [6] (role-based, hierarchical), LangGraph [7] (graph-based, shared immutable state), claude-flow [11] (MCP-based, hive-mind topology).
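Whatever the framework, the within-session pattern is the same: partition the task, fan out to workers with independent contexts, and converge on a shared result. A minimal sketch of that shape, using threads as stand-ins for agent sessions (not any framework’s actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def review_file(path: str) -> tuple[str, list[str]]:
    """Stand-in for one teammate reviewing a file; a real worker
    would be an agent session with its own context window."""
    findings = [f"{path}: TODO left in code"] if path.endswith("auth.py") else []
    return path, findings

def coordinate(files: list[str], n_workers: int = 4) -> dict[str, list[str]]:
    """Lead-session pattern: divide files so workers don't conflict,
    run them in parallel, merge findings into one result."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return dict(pool.map(review_file, files))

results = coordinate(["auth.py", "db.py", "api.py"])
```

The frameworks differ in what replaces the thread pool — a lead agent with a mailbox, a role hierarchy, a state graph — but the divide/execute/converge skeleton is shared.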
6. Agent Interoperability — “How do opaque agents talk to each other?”
Timescale: Infrastructure. Persistent.
Different organizations build agents using different frameworks, different models, different protocols. How does Agent A (built on CrewAI, running GPT-4) collaborate with Agent B (built on LangGraph, running Claude)?
Google’s A2A protocol (Agent2Agent) is the clearest answer: JSON-RPC 2.0 over HTTPS, Agent Cards for discovery, 22k+ GitHub stars, 23 open pull requests, Linux Foundation governance [8]. A2A explicitly preserves agent opacity — agents collaborate without sharing internal memory.
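Discovery in A2A works by fetching an Agent Card — a JSON document describing what an agent can do without exposing how it does it. A sketch of one, with illustrative values and the field set abbreviated (consult the A2A specification for the full schema):

```json
{
  "name": "code-review-agent",
  "description": "Reviews pull requests for style and security issues",
  "url": "https://agents.example.com/a2a",
  "version": "1.0.0",
  "capabilities": { "streaming": true },
  "skills": [
    {
      "id": "review-pr",
      "name": "PR review",
      "description": "Reviews a pull request and returns findings"
    }
  ]
}
```

Note what the card omits: no model name, no framework, no internal state. That omission is the opacity guarantee — Agent A learns what Agent B offers, never how B produces it.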
Microsoft’s Agent Framework (merging AutoGen and Semantic Kernel) adds native MCP, A2A, and OpenAPI support, targeting the same interop problem from within the enterprise stack [12].
Steve Yegge has described MCP as potentially becoming the integration standard of the AI era [13], which suggests this interoperability layer may consolidate faster than the coordination layers above it.
This is the infrastructure layer — standards that outlive individual sessions, organizations, and frameworks.
The Layer Cake
| # | Layer | Question | Timescale | Who’s Building |
|---|---|---|---|---|
| 1 | Intent & Trust | Why was this built? | Per-commit | Entire.io |
| 2 | Context & Memory | What was decided? | Cross-session | SageOx, CLAUDE.md, ADRs |
| 3 | Communication & Presence | Who is here? | Real-time | Biff |
| 4 | Work Tracking | What needs doing? | Multi-session | Beads, Linear, GitHub Issues |
| 5 | Session Coordination | How do agents divide work? | Within-session | Agent Teams, CrewAI, LangGraph |
| 6 | Agent Interop | How do opaque agents talk? | Infrastructure | A2A, Microsoft Agent Framework |
The key observation: no tool spans more than one layer well. Entire doesn’t do presence. SageOx doesn’t do messaging. Agent Teams doesn’t persist across sessions. Biff doesn’t track tasks. This isn’t fragmentation — it’s healthy specialization. The layers have different timescales, different state models, and different user models.
What We Don’t Know
This map reflects what we can see from our position — a small team building tools in this space. We’re practitioners, not analysts. The landscape is moving fast, and there are almost certainly layers or players we’re missing. Academic research on hybrid human-agent teams is emerging [14], but the field is young — most of what we know about multi-layer agent coordination comes from practitioners building in production, not controlled studies.
Some open questions:
- Will Agent Teams go cross-machine? If Anthropic adds persistent identity and network relay, the overlap between layers 3 and 5 grows significantly.
- Will context tools add real-time communication? SageOx’s design is explicitly async today. If they add a real-time layer, the boundary between layers 2 and 3 shifts.
- Is six the right number? We’ve been working with this decomposition for a few weeks. It may turn out that some layers merge or that we’re missing one.
We’re sharing the map because it’s been useful to us for understanding what we’re building and — just as importantly — what we’re not building. If it’s useful to others working in this space, that’s the point.
References
- [1] Entire.io. “AI-Native Version Control.” 2025–present. entire.io
- [2] SageOx. “Agentic Context Infrastructure.” 2025–present. sageox.ai
- [3] Anthropic. “Claude Code Memory (CLAUDE.md).” 2024–present. docs.anthropic.com
- [4] “AGENTS.md.” 2025. pnote.eu
- [5] Nygard, M. “Documenting Architecture Decisions.” Cognitect Blog, 2011. cognitect.com
- [6] CrewAI. “AI Agent Framework.” 2024–present. github.com/crewAIInc
- [7] LangChain. “LangGraph.” 2024–present. langchain.com
- [8] Google. “Agent2Agent Protocol (A2A).” 2025. a2a-protocol.org
- [9] Anthropic. “Agent Teams.” 2025–present. docs.anthropic.com
- [10] Anthropic. “Claude Code Sub-Agents.” 2025–present. docs.anthropic.com
- [11] ruvnet. “claude-flow.” 2025. github.com/ruvnet
- [12] Microsoft. “Agent Framework.” 2025–present. learn.microsoft.com
- [13] Yegge, S. “MCP Is the New HTTP.” Sourcegraph Blog, 2025. sourcegraph.com
- [14] Hopf, K. et al. “Hybrid Human-Agent Teams in Information Systems Projects.” Journal of Strategic Information Systems, 2025. journals.sagepub.com