Agent harnesses like Claude Code run in the terminal. That's their strength. Text in, text out. No opinions about layout, no competing for screen real estate with your editor.
But every so often, the agent needs something the terminal can't give it. You need to pick between three visual options. Annotate a screenshot. Compare layouts side by side. Some interactions only make sense with pixels, and no amount of terminal formatting fixes that.
The terminal doesn't need replacing. It needs an escape hatch.
The Escape Hatch Already Exists
Claude Code's skill and hook system lets the agent run arbitrary processes. A process can open any UI surface: a browser tab, a native window, anything that puts pixels on screen.
The pattern is always the same. The agent opens a UI surface. The user interacts with it. The UI writes structured text to stdout, the process exits, and the agent reads the output like any other tool result. The session never breaks. You just borrow a better input surface for a moment.
Skills let the agent invoke UI on demand. But sometimes you don't want the agent deciding whether to show a UI. You want it to happen every time. That's what hooks are for. A post-tool hook on file edits could open a visual diff for review. A pre-commit hook could spawn an approval screen. Deterministic, not a judgement call.
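As a sketch, a post-tool hook wired to file edits might live in .claude/settings.json like this. The script path is hypothetical, and the exact schema is worth checking against the current hooks reference:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/review-diff.sh" }
        ]
      }
    ]
  }
}
```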
A Working Example
Linemark is a diff review UI that integrates with Claude Code as a skill. You type /linemark. A browser tab opens, you leave line-level comments, and feedback flows back to the agent as structured markdown.
The implementation is two files. A Node.js server and a single HTML file. No dependencies. No build step. The browser doesn't talk to Claude; it talks to the local server, which formats feedback as markdown and writes it to stdout. The agent reads it like any other tool result.
Linemark is hand-crafted for one job. But it proves the contract works. And it raises the obvious question: why stop at hand-built tools?
From Hand-Built to Generated
The agent already knows how to write UI code. It does it all day for you. The missing piece was a reason to write it for itself, and a channel to collect the response.
You're picking a colour palette for a new project. The agent generates swatches from your brand colours, you click the combination you want. Try doing that with hex values in a terminal.
You're choosing between three visual directions for a landing page. It renders all three, you click one.
You're annotating a screenshot to show the agent what's wrong. It renders the image, you circle the problem areas, it gets back coordinates and notes.
Each of these UIs exists for 30 seconds. It doesn't need to be maintained, tested, or versioned. It just needs to collect one decision and get out of the way.
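To make the colour-palette case concrete, here is one shape the generated artifact could take: a single disposable page. The /choose endpoint and the markup are assumptions for the sketch, not any real tool:

```javascript
// Build a throwaway swatch-picker page from brand colours. Each
// swatch POSTs its hex value to the local server; nothing is saved,
// versioned, or reused afterwards.
function swatchPage(colours) {
  const swatches = colours
    .map(
      (hex) =>
        `<button style="background:${hex};width:6em;height:6em" ` +
        `onclick="fetch('/choose',{method:'POST',body:'${hex}'})">${hex}</button>`
    )
    .join('\n');
  return `<!doctype html><body>${swatches}</body>`;
}
```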
And it doesn't have to be HTML. On macOS, the agent could generate a SwiftUI view and run it natively. On Linux, a GTK app. On any platform, a single HTML file. Whatever renders fastest on the machine you're using.
Subagents as UI Brokers
The problem with agents generating raw UI code is that HTML is unbounded. Broken layouts, unexpected elements, one-offs with no shared component language. Frameworks like json-render fix this by constraining the agent to a component catalogue. Instead of writing HTML, the agent outputs JSON conforming to a schema. The renderer validates it and produces real components. The agent isn't freestyling UI. It's making a tool call.
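The catalogue idea fits in a few lines. The component names and catalogue shape here are illustrative, not json-render's actual API:

```javascript
// The renderer only accepts components the catalogue knows about.
// The agent emits JSON; anything outside the schema is rejected.
const catalogue = {
  heading: (p) => `<h2>${p.text}</h2>`,
  choice: (p) =>
    p.options
      .map((o) => `<button name="pick" value="${o}">${o}</button>`)
      .join(''),
};

function render(spec) {
  return spec
    .map((node) => {
      const component = catalogue[node.type];
      if (!component) throw new Error(`unknown component: ${node.type}`);
      return component(node.props);
    })
    .join('\n');
}
```

A `script` component, or anything else the catalogue doesn't name, simply can't be rendered. The schema decides, not the model.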
Claude Code already delegates work to subagents. Research subagents read files. Testing subagents run suites. Same pattern for UI. The parent agent decides it needs visual input and spawns a subagent whose entire job is to generate the right components, serve them, collect the user's response, and return it as text.
The subagent is ephemeral too. It exists to broker one interaction. When it's done, it's gone.
The Constraints That Make It Work
Minimal overhead. Sub-second to visible, or people won't use it. No build steps, no package managers, no project scaffolding.
Ephemeral by default. No state persists beyond the interaction. If you need it again, the agent generates it again. Maybe differently, because the context changed.
Structured output. The UI must produce text the agent can parse. The contract is stdout. Not a WebSocket. Not a callback URL. Text written to a stream, then the process exits.
Safe. Raw HTML means trusting the model. A component catalogue means the schema decides what's possible, not the model.
The Takeaway
The terminal stays the conversation layer. But the agent gains the ability to spawn interaction surfaces whenever the conversation needs them. Not a fixed IDE with panels and tabs. The exact UI this moment requires, in whatever technology this machine runs best, then gone.
The best agent interface is the one that isn't there until you need it.
