essay · landscape

The 2026 landscape, in four shapes.

"AI in the browser" was an idea in 2024. By 2026 it's a category — and it has consolidated in a way the open web hasn't seen in a long time. Here's where everyone landed, and what it closes off.

Raffi Krikorian May 2026 ~12 min read

1. Vendor agent browsers

The first shape is the vertically-integrated agent browser. OpenAI's Atlas, Perplexity's Comet, Google's Gemini-in-Chrome. Each one is a fork of Chromium (or, in Google's case, Chromium itself) wrapped around a single vendor's model and a single vendor's agent. The agent reads your screen, moves your cursor, types in your fields. It is the platform.

The pitch is real: a competent agent that drives the browser is genuinely useful. Booking, research, comparison shopping. The trade is also real: you take the whole stack, or you take none of it. Your model is theirs. Your context is theirs. Your cross-site memory is stitched on their servers.

A vendor agent browser is not a browser with AI in it. It is an AI with a browser around it.

The DOM-driving approach also has a security shape we should name: the agent sees everything every page renders, including content placed there by other origins, including content placed there to manipulate the agent. Prompt injection isn't an edge case in this design — it's the architecture.

The calmer cousin: Brave Leo, Firefox's AI Window, the side-panel offerings in Edge and Arc. These add an AI surface next to the page rather than over it. They can summarize, answer, occasionally search.

They have a real virtue: the user knows where the AI starts and the page ends. They have a real limit: they can't do much. They can read the page, but acting on the page means going back to a DOM agent — and most of them don't.

The model is still the vendor's. The context still mostly leaves your machine. Sidebars are politer than agent browsers, but the trust model is the same.

3. Prompt API + WebMCP, alone

The standards-track answer is two complementary drafts. Chrome's Prompt API gives the page a window.ai-shaped object backed by a browser-bundled model. W3C's WebMCP draft lets the page declare tools an agent can call: navigator.modelContext.addTool.

Each is good. Each is partial.

They need a seam. Something that lets a page say "run an agent against these tools, with the user's model" without the page picking the model or holding the keys. That seam is the missing piece.

4. The fourth shape

The fourth shape is the one we're proposing. The browser as the broker. The page asks; the browser decides. The user picks the model — local, hosted, self-run — and the browser routes window.ai to it. The page declares its tools, or the user attaches their own MCP servers, and the browser brokers what each agent run is allowed to see and do.

Crucially, none of this requires a new browser. It's an extension today. It's a proposed addition to the Prompt API and WebMCP work for tomorrow. The proposal page has the short version.


Why the architecture matters more than the demo

It is tempting to compare these four shapes by what their demos look like. The demos converge: an agent does a task, on a page, in a browser. They look the same in a tweet.

The architectures don't converge. They diverge on every question that matters once you're not the demo: whose model decides what gets done; whose machine the context lives on; whose servers see your reading; what the revoke button looks like; what happens when the model is wrong.

The web answered these questions, once, for cookies and cameras and payments. The browser sat in the middle. The user could see what was happening. There was a button to say no.

The argument of this proposal is that the same architecture answers the same questions for AI. Not because the browser is special — because the browser is the only place those questions have ever been answered well.


Continue: read the Web Agent API spec →
Or jump to: the comparison table →