a proposal · open spec · working extension
Bring your AI to every website. Your model, your credentials, your context — mediated by the browser, not a vendor.
01 — the 2026 landscape
In 2024, "AI in the browser" was an idea. By 2026 it's everywhere — and it has consolidated in a way the open web hasn't seen in a long time.
OpenAI ships Atlas. Perplexity ships Comet. Google rebuilt Chrome around Gemini. Each of them is a vertically-integrated stack: their browser, their agent, their model, their servers. The agent reads your screen, moves your cursor, types in your fields. To use any of it, you adopt the whole thing.
Sidebar assistants are the other shape: Brave Leo, Firefox's AI Window. Calmer, but bolted onto the side of the page. They can read the page; they can't do things on it without screen-driving DOM agents underneath.
Both shapes assume the same thing: that the agent is the platform's, not yours. The web is sliding from a place you visit into a place an agent visits on your behalf — and the agent answers to whoever shipped the browser.
02 — the proposal
The browser is the right place to put the user's agency. It already mediates cookies, geolocation, the camera, payments. It already brokers trust between you and millions of sites you've never met. AI is the same kind of capability — and it should be wired in the same way.
A.
You pick the provider. Local, OpenAI, Anthropic, a self-hosted endpoint. Sites talk to a stable browser API; the browser talks to whichever model you chose.
B.
Your bookmarks, your tabs, your history, your MCP servers — kept on your device, surfaced to the agent only with your permission, per origin.
C.
The page asks; the browser decides what the agent can see and do. Permissions are scoped, revocable, visible. Same posture as the camera.
03 — how it works
The proposed Web Agent API is small on purpose. A page can prompt the user's model, run an agent against tools the user has approved, or register a tool of its own.
// The page never sees an API key. The browser does the routing.
const session = await window.ai.createTextSession();
const reply = await session.prompt("Summarize this page");
for await (const ev of window.agent.run({ task: "find recent press on quantum chips" })) {
  if (ev.type === "tool_call") console.log("→", ev.tool);
  if (ev.type === "final") console.log(ev.output);
}
navigator.modelContext.addTool({
  name: "search_archive",
  description: "Search our 20-year archive",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
  handler: ({ query }) => searchArchive(query),
});
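Harbor also brokers the user's remote MCP servers, and on the wire those calls are JSON-RPC 2.0. A sketch of what one such request might look like — the `tools/call` method and the `name`/`arguments` params come from the MCP spec; the specific tool, id, and query are illustrative:

```javascript
// An MCP tool invocation is a JSON-RPC 2.0 request. The browser, not the
// page, puts this on the wire — the page never learns where it went.
const toolCall = {
  jsonrpc: "2.0",
  id: 1, // illustrative request id
  method: "tools/call",
  params: {
    name: "search_archive",               // the tool being invoked
    arguments: { query: "quantum chips" }, // validated against inputSchema
  },
};

console.log(JSON.stringify(toolCall, null, 2));
```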
04 — where harbor sits
Four shapes are in play in 2026. They look similar from the user's seat; underneath, the trust model is very different.
| | Vendor agent browsers (Atlas · Comet · Gemini in Chrome) | Sidebar AI assistants (Brave Leo · Firefox AI Window) | Prompt API + WebMCP alone (Chrome built-in AI · proposed standards) | Harbor + Web Agent API (extension · open spec · this proposal) |
|---|---|---|---|---|
| Whose model | vendor's, fixed | vendor's, fixed | browser-bundled (Gemini Nano) | user's choice — local, hosted, or self-run |
| Whose browser | theirs (a fork) | shipped one, locked to vendor | Chrome, today | any browser — Firefox + Chrome today |
| How agents act on pages | screen-drives the DOM | reads page, can't really act | page tools only | page tools + remote MCP, all user-scoped |
| Permissions surface | implicit, trust the vendor | per-feature toggle | page-level, opaque | typed actions + capability tokens, per-origin, with mode (plan/execute/watch) and an audit feed |
| Where context lives | vendor's servers | partly local, partly vendor | local, but only what the page sees | your machine, surfaced by the browser |
| Threat surface | full DOM, prompt-injectable | page text, summarization-shaped | narrow but Chrome-only | scoped tools — by design, smaller surface |
| Cross-site memory | vendor stitches it | none, mostly | none | user-stitched, via your MCP servers |
| Open spec | no | no | partial (Prompt API draft, WebMCP draft) | yes — and this proposal extends them |
| Who answers to you | the vendor | the vendor | the browser, narrowly | the browser, fully |
05 — standards convergence
Three pieces of standards work overlap with what we're proposing. Harbor is the fourth — the seam that connects them.
Chrome Prompt API
Google · WICG draft
A built-in language model exposed to web pages. window.ai in spirit. Browser-bundled, Chrome-only, today.
WebMCP
W3C · early draft
Pages declare tools; agents call them. navigator.modelContext.addTool(). The verb-half of the puzzle.
MCP
Anthropic · de-facto protocol
The wire format for tools and context. Already what your AI talks to your servers in. A web-shaped MCP is the natural next step.
Web Agent API
this proposal
The seam. Lets a page run a real agent — model, tools, context — without the page choosing the model or holding the keys.
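To make the seam concrete: here is a minimal simulation of the event stream `window.agent.run()` yields in the section 03 snippet, written as a plain async generator. The engine behavior is a stand-in; only the `tool_call`/`final` event shapes come from the proposal.

```javascript
// Simulated agent loop: a browser-side engine would interleave model turns
// and user-approved tool calls, yielding progress events to the page.
async function* run({ task }) {
  // Stand-in for: the user's model decides to call a tool.
  yield { type: "tool_call", tool: "search_archive", args: { query: task } };
  // Stand-in for: the model produces its final answer.
  yield { type: "final", output: `results for "${task}"` };
}

// Consume the stream exactly as the section 03 snippet does.
async function collect(task) {
  const lines = [];
  for await (const ev of run({ task })) {
    if (ev.type === "tool_call") lines.push(`→ ${ev.tool}`);
    if (ev.type === "final") lines.push(ev.output);
  }
  return lines;
}

collect("quantum chips").then((lines) => console.log(lines.join("\n")));
```

The page only ever sees events; which model ran, and with what credentials, stays on the browser's side of the seam.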
06 — one sketch · harbor
An extension for Firefox and Chrome. Implements the Web Agent API as proposed. Routes window.ai to whichever model you've configured. Brokers tools through MCP. Shows you, per origin, what the agent is allowed to do.
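A sketch of the routing idea, under an assumed provider-config shape — the endpoints, field names, and key below are illustrative, not Harbor's actual configuration:

```javascript
// Hypothetical provider registry: the user configures this once in the
// extension. Pages never see it.
const providers = {
  local:  { url: "http://localhost:11434/api/generate", key: null },
  hosted: { url: "https://api.example.com/v1/complete", key: "sk-example-key" },
};

// The extension turns a page's prompt into a request for the chosen
// provider. The API key stays on this side of the boundary.
function buildRequest(providerName, prompt) {
  const p = providers[providerName];
  if (!p) throw new Error(`unknown provider: ${providerName}`);
  return {
    url: p.url,
    headers: p.key ? { Authorization: `Bearer ${p.key}` } : {},
    body: JSON.stringify({ prompt }),
  };
}

console.log(buildRequest("local", "Summarize this page").url);
```

Swapping providers changes this table, and nothing in any page's code.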
07 — what you could build
The interesting demos aren't "an AI search box." They're the boring things that happen when every site can talk to your model, your tools, your context — without each one re-inventing the stack.
01
A 20-year news archive registers search_archive as a tool. Your AI uses it the way it would use any retrieval tool. The archive doesn't ship a model.
navigator.modelContext.addTool({ name: "search_archive", … })
02
One window.ai call summarizes whatever the user is reading, in their model — no API key, no server bill, no analytics.
await (await window.ai.createTextSession()).prompt(article)
03
The site exposes find_flights and hold_seat as tools. The user's agent fills, picks, holds. The site runs no agent of its own.
window.agent.run({ task: "PDX → SFO Friday after 5pm, aisle" })
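Demo 03 end-to-end, simulated: the site's two tools plus a toy planner standing in for the user's model. The tool names come from the demo; everything else — the flight data, the picking logic — is illustrative.

```javascript
// Site-declared tools (demo 03). In the real flow these are registered via
// navigator.modelContext.addTool and gated by the browser's permissions.
const tools = {
  find_flights: ({ from, to }) => [
    { id: "F12", from, to, departs: "17:40" },
    { id: "F19", from, to, departs: "19:05" },
  ],
  hold_seat: ({ flightId, seat }) => ({ flightId, seat, held: true }),
};

// Toy planner standing in for the user's model: search, take the first
// after-5pm result, hold an aisle seat. The site runs no agent of its own.
function runTask() {
  const flights = tools.find_flights({ from: "PDX", to: "SFO" });
  const pick = flights[0];
  return tools.hold_seat({ flightId: pick.id, seat: "aisle" });
}

console.log(runTask()); // { flightId: "F12", seat: "aisle", held: true }
```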
04
Pages declare semantic tools, not just ARIA. Screen-reader users — and everyone else — get an agent that actually understands the page's verbs.
addTool({ name: "open_chapter", inputSchema: { … } })
05
An extension hooks any input. Your AI rewrites, in your style, with your context. Same shape on every site, because the API is the browser's.
window.ai.createTextSession({ system: userStyleProfile })
08 — security & permissions
We're a draft proposal, not a hardened product. But the shape itself is the argument: the agent only sees the verbs the page declared, and only with the permissions the user granted, per origin.
Data is labeled by sensitivity (credentials, payments, identity); the engine refuses to forward labeled data into actions whose acceptsLabels excludes that label, no matter how generous the user's policy is. Every grant carries a mode — plan (read-only), execute (act), watch (preview every write) — narrowable at any time, never widenable without re-grant. All of this is the surface of a single chokepoint, the PolicyEngine, which walks a tiered ladder (Ambient → Managed Deny → Sensitivity Gate → Information-Flow Check → Watchdog → Capability Token → Policy Allow → Policy Ask → Per-Origin Grant → Default-for-Effect) on every API call. Tiers 1–5 are a safety floor a user policy can't override. The full design lives in docs/PERMISSIONS.md ↗.
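A toy rendering of the chokepoint idea — not Harbor's PolicyEngine, just the control flow the paragraph describes: checks run in a fixed order, and the safety-floor tiers run before (and regardless of) user policy. The specific checks and request shape are illustrative.

```javascript
// Toy policy chokepoint. Safety-floor tiers run first; user policy is only
// consulted if every floor check passes.
const SAFETY_FLOOR = [
  // Sensitivity gate: labeled data never flows into an action that doesn't
  // accept that label.
  (req) => req.labels.every((l) => req.action.acceptsLabels.includes(l)),
  // Mode check: a write requires an "execute" or "watch" grant.
  (req) => !req.isWrite || ["execute", "watch"].includes(req.grant.mode),
];

function decide(req, userPolicy) {
  for (const check of SAFETY_FLOOR) {
    if (!check(req)) return "deny"; // user policy cannot rescue this
  }
  return userPolicy(req); // "allow" | "ask" | "deny"
}

const req = {
  labels: ["payments"],
  action: { name: "hold_seat", acceptsLabels: [] },
  isWrite: true,
  grant: { mode: "execute" },
};
console.log(decide(req, () => "allow")); // "deny": payments label blocked
```

Even an allow-everything user policy never sees the request: the floor denies it first. That ordering is the whole point of a single chokepoint.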
We use words like by design and smaller surface. We don't say safe. A draft is not a guarantee. Threat model →
We are not announcing a product. We are publishing a sketch, asking for arguments, and looking for the people who want the open web to keep being a place where your agent works for you.
09 — get started
01 — for standards folks
The Web Agent API explainer, the WebIDL, the threat model, the open questions.
02 — for everyone else
The reference extension. Firefox and Chrome. Bring your own model, your own MCP servers.
03 — for builders
We want sites that demo the API, not the API itself. Tell us what you'd build.