Harbor — AI as a browser capability

01 — the 2026 landscape

AI in the browser is everywhere. None of it is yours.

In 2024, "AI in the browser" was an idea. By 2026 it's everywhere, and it has consolidated in a way the open web hasn't seen in a long time.

OpenAI ships Atlas. Perplexity ships Comet. Google rebuilt Chrome around Gemini. Each of them is a vertically integrated stack: their browser, their agent, their model, their servers. The agent reads your screen, moves your cursor, types in your fields. To use any of it, you adopt the whole thing.

Sidebar assistants are the other shape: Brave Leo, Firefox's AI Window. Calmer, but bolted onto the side of the page. They can read the page, but they can't act on it without a screen-driving DOM agent underneath.

Both shapes assume the same thing: that the agent is the platform's, not yours. The web is sliding from a place you visit into a place an agent visits on your behalf — and the agent answers to whoever shipped the browser.

Read the full landscape essay →

02 — the proposal

What if AI were a browser capability, like `fetch()` is for networking?

The browser is the right place to put the user's agency. It already mediates cookies, geolocation, the camera, payments. It already brokers trust between you and millions of sites you've never met. AI is the same kind of capability — and it should be wired in the same way.

A. Your model.

You pick the provider: a local model, OpenAI, Anthropic, or a self-hosted endpoint. Sites talk to a stable browser API; the browser talks to whichever model you chose.

B. Your context.

Your bookmarks, your tabs, your history, your MCP servers — kept on your device, surfaced to the agent only with your permission, per origin.

C. Mediated by the browser.

The page asks; the browser decides what the agent can see and do. Permissions are scoped, revocable, and visible: the same posture the browser already takes with the camera.
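The three points above fit together as one broker sitting between the page and the model. The sketch below is a minimal in-memory illustration under stated assumptions, not Harbor's code: the `ModelBroker` class, the `"model:prompt"` capability string, and the stub provider are hypothetical. The page-facing call stays the same whichever provider the user picked, and the browser checks a per-origin, revocable grant before routing anything.

```javascript
// Hypothetical sketch (not Harbor's implementation): a browser-side broker
// that routes page prompts to the user's chosen provider, gated by
// per-origin, revocable grants.
class ModelBroker {
  constructor() {
    this.providers = new Map(); // provider name -> async (text) => string
    this.activeProvider = null; // chosen by the user, never by the page
    this.grants = new Map();    // origin -> Set of granted capabilities
  }

  registerProvider(name, promptFn) {
    this.providers.set(name, promptFn);
  }

  // User action: pick which model every site talks to.
  useProvider(name) {
    if (!this.providers.has(name)) throw new Error(`unknown provider: ${name}`);
    this.activeProvider = name;
  }

  // User actions: grant or revoke one capability for one origin.
  grant(origin, capability) {
    if (!this.grants.has(origin)) this.grants.set(origin, new Set());
    this.grants.get(origin).add(capability);
  }

  revoke(origin, capability) {
    this.grants.get(origin)?.delete(capability);
  }

  isGranted(origin, capability) {
    return this.grants.get(origin)?.has(capability) ?? false;
  }

  // Page-facing entry point: the page supplies text, never a key or a model name.
  async prompt(origin, text) {
    if (!this.isGranted(origin, "model:prompt")) {
      throw new Error(`${origin} is not permitted to prompt the model`);
    }
    const send = this.providers.get(this.activeProvider);
    if (!send) throw new Error("no model configured");
    return send(text);
  }
}
```

Swapping models is a single `useProvider` call made by the user; the page's `prompt` call does not change, which is the reason to put the routing in the browser rather than in each site.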

03 — how it works

Three primitives. The rest is policy and plumbing.

The proposed Web Agent API is small on purpose. A page can prompt the user's model, run an agent against tools the user has approved, or register a tool of its own.

1 · use the user's model · window.ai

// The page never sees an API key. The browser does the routing.
const session = await window.ai.createTextSession();
const reply   = await session.prompt("Summarize this page");

2 · run an agent that can call tools the user has approved · window.agent

for await (const ev of window.agent.run({ task: "find recent press on quantum chips" })) {
  if (ev.type === "tool_call") console.log("→", ev.tool);
  if (ev.type === "final")     console.log(ev.output);
}

3 · a page declares its own tool · W3C WebMCP · navigator.modelContext

navigator.modelContext.addTool({
  name:        "search_archive",
  description: "Search our 20-year archive",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
  handler:     ({ query }) => searchArchive(query), // the site's own search function, not shown
});
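To make the division of labor concrete, here is a self-contained stand-in that runs outside a browser. It is a sketch under assumptions, not the Web Agent API or WebMCP implementation: `ToolRegistry`, `checkInput`, and `runScriptedAgent` are hypothetical, the schema check only handles flat objects with primitive-typed properties, and a fixed list of steps stands in for the model's decisions. It shows the contract the snippets above rely on: the page registers a typed tool, the browser validates arguments against `inputSchema` before calling the handler, and the agent reports progress as `tool_call` and `final` events.

```javascript
// Hypothetical stand-in for the page-tool and agent-event contract.
// Not the real navigator.modelContext or window.agent.

// Minimal validator for flat schemas: { type: "object", properties, required? }.
// Only primitive property types ("string", "number", "boolean") are checked.
function checkInput(schema, input) {
  if (typeof input !== "object" || input === null) return "input must be an object";
  for (const key of schema.required ?? []) {
    if (!(key in input)) return `missing required field: ${key}`;
  }
  for (const [key, value] of Object.entries(input)) {
    const prop = schema.properties?.[key];
    if (!prop) return `unexpected field: ${key}`;
    if (typeof value !== prop.type) return `${key} must be a ${prop.type}`;
  }
  return null; // valid
}

class ToolRegistry {
  constructor() {
    this.tools = new Map();
  }

  addTool({ name, description, inputSchema, handler }) {
    if (this.tools.has(name)) throw new Error(`tool already registered: ${name}`);
    this.tools.set(name, { description, inputSchema, handler });
  }

  // The browser-side gate: bad arguments never reach the page's handler.
  async call(name, input) {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    const problem = checkInput(tool.inputSchema, input);
    if (problem) throw new Error(`${name}: ${problem}`);
    return tool.handler(input);
  }
}

// A scripted "agent": a real one would let the model choose each step.
async function* runScriptedAgent(registry, steps) {
  const results = [];
  for (const { tool, input } of steps) {
    yield { type: "tool_call", tool, input };
    results.push(await registry.call(tool, input));
  }
  yield { type: "final", output: results };
}

const registry = new ToolRegistry();
registry.addTool({
  name: "search_archive",
  description: "Search our 20-year archive",
  inputSchema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
  handler: async ({ query }) => [`result for ${query}`], // stub search
});

(async () => {
  const steps = [{ tool: "search_archive", input: { query: "quantum chips" } }];
  for await (const ev of runScriptedAgent(registry, steps)) {
    if (ev.type === "tool_call") console.log("→", ev.tool);
    if (ev.type === "final") console.log(ev.output);
  }
})();
```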

04 — where harbor sits

The differentiation is in the architecture, not the chrome.

Four shapes are in play in 2026. They look similar from the user's seat; underneath, the trust model is very different.

	Vendor agent browsers (Atlas · Comet · Gemini in Chrome)	Sidebar AI assistants (Brave Leo · Firefox AI Window)	Prompt API + WebMCP alone (Chrome built-in AI · proposed standards)	Harbor + Web Agent API (extension · open spec · this proposal)
Whose model	vendor's, fixed	vendor's, fixed	browser-bundled (Gemini Nano)	user's choice: local, hosted, or self-run
Whose browser	theirs (a fork)	the one it ships in, locked to the vendor	Chrome, today	any browser (Firefox + Chrome today)
How agents act on pages	screen-drives the DOM	reads the page, can't really act	page tools only	page tools + remote MCP, all user-scoped
Permissions surface	implicit; trust the vendor	per-feature toggle	page-level, opaque	typed actions + capability tokens, per-origin, with mode (plan/execute/watch) and an audit feed
Where context lives	vendor's servers	partly local, partly vendor	local, but only what the page sees	your machine, surfaced by the browser
Threat surface	full DOM, prompt-injectable	page text, summarization-shaped	narrow, but Chrome-only	scoped tools: a smaller surface, by design
Cross-site memory	vendor stitches it	none, mostly	none	user-stitched, via your MCP servers
Open spec	no	no	partial (Prompt API draft, WebMCP draft)	yes, and extends the Prompt API and WebMCP drafts
Agent answers to	the vendor	the vendor	the browser, narrowly	the browser, fully


05 — standards convergence

We're not the only ones thinking about this. We're trying to fit in.

Three pieces of standards work overlap with what we're proposing. The Web Agent API is the fourth: the seam that connects them.

Chrome Prompt API

Google · WICG draft

A built-in language model exposed to web pages: window.ai in spirit. Browser-bundled and, today, Chrome-only.

WebMCP

W3C · early draft

Pages declare tools; agents call them. navigator.modelContext.addTool(). The verb-half of the puzzle.

MCP

Anthropic · de-facto protocol

The wire format for tools and context, and already the protocol your AI uses to talk to your servers. A web-shaped MCP is the natural next step.

Web Agent API

this proposal

The seam. Lets a page run a real agent — model, tools, context — without the page choosing the model or holding the keys.
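For readers who haven't looked at MCP's wire format: messages are JSON-RPC 2.0, and a tool invocation is a `tools/call` request carrying the tool's `name` and `arguments`; the result holds a `content` array plus an `isError` flag. The sketch below builds such a request and reads the text parts of a response. It is illustrative only: the `search_archive` tool, the request id, and the sample response are made up, and a real client would first complete MCP's `initialize` handshake over some transport.

```javascript
// Build an MCP "tools/call" request (JSON-RPC 2.0).
function makeToolCall(id, name, args) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Join the text parts of a tools/call result; surface protocol- and tool-level errors.
function resultText(response) {
  if (response.error) {
    throw new Error(`JSON-RPC error ${response.error.code}: ${response.error.message}`);
  }
  const { content = [], isError = false } = response.result;
  const text = content
    .filter((part) => part.type === "text")
    .map((part) => part.text)
    .join("\n");
  if (isError) throw new Error(`tool reported an error: ${text}`);
  return text;
}

const request = makeToolCall(1, "search_archive", { query: "quantum chips" });
console.log(JSON.stringify(request));

// A made-up response, shaped like a successful tools/call result.
const response = {
  jsonrpc: "2.0",
  id: 1,
  result: { content: [{ type: "text", text: "3 articles found" }], isError: false },
};
console.log(resultText(response));
```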

06 — one sketch · harbor

Harbor is the working sketch of the proposal.

An extension for Firefox and Chrome. Implements the Web Agent API as proposed. Routes window.ai to whichever model you've configured. Brokers tools through MCP. Shows you, per origin, what the agent is allowed to do.

Read the spec →
Build with it →

fig. 1 · the trust line

07 — what you could build

Boring things that are suddenly easy.

The interesting demos aren't "an AI search box." They're the boring things that happen when every site can talk to your model, your tools, and your context, without each one reinventing the stack.

01

An archive that searches itself with your words.

A 20-year news archive registers search_archive as a tool. Your AI uses it the way it would use any retrieval tool. The archive doesn't ship a model.

navigator.modelContext.addTool({ name: "search_archive", … })

02

A docs site that reads itself for you.

One window.ai call summarizes whatever the user is reading, in their model — no API key, no server bill, no analytics.

await window.ai.createTextSession().then(s => s.prompt(article))

03

A booking flow you can finish without using the booking flow.

The site exposes find_flights and hold_seat as tools. The user's agent fills, picks, holds. The site runs no agent of its own.

window.agent.run({ task: "PDX → SFO Friday after 5pm, aisle" })

04

An accessibility shortcut for everyone.

Pages declare semantic tools, not just ARIA. Screen-reader users — and everyone else — get an agent that actually understands the page's verbs.

addTool({ name: "open_chapter", inputSchema: { … } })

05

Your own writing, in your own model, anywhere there's a textarea.

An extension hooks any input. Your AI rewrites, in your style, with your context. Same shape on every site, because the API is the browser's.

window.ai.createTextSession({ system: userStyleProfile })

08 — security & permissions

A smaller surface, by design.

We're a draft proposal, not a hardened product. But the shape itself is the argument: the agent only sees the verbs the page declared, and only with the permissions the user granted, per origin.

What the model is shielded from

raw DOM and screen pixels — unless the user explicitly grants page-read
cookies, storage, and credentials of other origins
tools the user did not approve for this origin
cross-origin context smuggled in by prompt injection — tools are typed and scoped
data with sensitive labels (credentials, payments, identity) — the engine refuses to forward labeled data into actions whose acceptsLabels excludes that label, no matter how generous the user's policy is
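The last rule is an information-flow check: labels travel with data, and an action only receives data whose labels it explicitly accepts. Here is a minimal sketch of that check, assuming labels are plain strings and each action declares an `acceptsLabels` list; `canForward` and the sample actions are hypothetical, not the engine's code. The user's policy is not an input at all, which is what "no matter how generous the user's policy is" amounts to.

```javascript
// Hypothetical sensitivity gate (illustrative, not the PolicyEngine's code):
// data may flow into an action only if every label on the data appears in
// that action's acceptsLabels. The user's policy is never consulted here.
function canForward(dataLabels, action) {
  const accepted = new Set(action.acceptsLabels ?? []);
  const blocked = dataLabels.filter((label) => !accepted.has(label));
  return { allowed: blocked.length === 0, blocked };
}

// Hypothetical actions a page might declare.
const postComment = { name: "post_comment", acceptsLabels: [] };
const fillPayment = { name: "fill_payment_form", acceptsLabels: ["payments"] };

const cardIntoComment = canForward(["payments"], postComment);
// returns { allowed: false, blocked: ["payments"] }
const cardIntoPayment = canForward(["payments"], fillPayment);
// returns { allowed: true, blocked: [] }
const passwordAndCard = canForward(["payments", "credentials"], fillPayment);
// returns { allowed: false, blocked: ["credentials"] }
console.log(cardIntoComment.allowed, cardIntoPayment.allowed, passwordAndCard.blocked);
```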

What the user controls, per origin

which model the page may talk to, and at what cost ceiling
which tools — page-declared and remote MCP — are visible
which context (history, bookmarks, files) is permitted
session mode: plan (read-only), execute (act), watch (preview every write) — narrowable at any time, never widenable without re-grant
a visible audit trail of every engine decision, with one-click revoke and a "Why?" / "What if?" simulator
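One way to read "narrowable at any time, never widenable without re-grant" is as an ordering over modes from most to least restrictive: plan, then watch, then execute. The sketch below assumes that ordering (the list above does not rank the modes explicitly) and uses a hypothetical `Session` class, with a fresh user grant modeled as a boolean flag. Moves toward plan apply immediately; moves toward execute are refused without the flag.

```javascript
// Hypothetical session-mode guard. The ranking below (most to least
// restrictive) is an assumption, not taken from the spec.
const MODE_RANK = new Map([
  ["plan", 0],    // read-only
  ["watch", 1],   // writes allowed, each one previewed first
  ["execute", 2], // writes allowed
]);

function rankOf(mode) {
  if (!MODE_RANK.has(mode)) throw new Error(`unknown mode: ${mode}`);
  return MODE_RANK.get(mode);
}

class Session {
  constructor(origin, grantedMode) {
    rankOf(grantedMode); // reject unknown modes up front
    this.origin = origin;
    this.mode = grantedMode;
  }

  // Narrowing always succeeds; widening needs a fresh user grant.
  setMode(nextMode, { freshGrant = false } = {}) {
    if (rankOf(nextMode) > rankOf(this.mode) && !freshGrant) {
      throw new Error(`widening ${this.mode} to ${nextMode} requires a re-grant`);
    }
    this.mode = nextMode;
  }

  // How a write action is handled in the current mode.
  writeDisposition() {
    switch (this.mode) {
      case "plan":
        return "deny";
      case "watch":
        return "preview";
      default:
        return "allow";
    }
  }
}
```

Under this reading, a session granted execute can drop to watch mid-task with no prompt, while returning to execute would surface a new permission request.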

All of this is the surface of a single chokepoint, the PolicyEngine, which walks a tiered ladder (Ambient → Managed Deny → Sensitivity Gate → Information-Flow Check → Watchdog → Capability Token → Policy Allow → Policy Ask → Per-Origin Grant → Default-for-Effect) on every API call. Tiers 1–5 are a safety floor that a user policy can't override. The full design lives in docs/PERMISSIONS.md ↗.
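The ladder's key property is ordering: tiers are consulted top to bottom, and the first one that reaches a decision wins, so nothing placed below the safety floor can override it. The sketch below shows only that control flow, plus the audit entry each decision produces. The tier names come from the ladder, but the checks inside them are placeholders invented for illustration, most tiers are omitted, and `evaluate`, the request fields, and the audit-entry shape are hypothetical.

```javascript
// Hypothetical ladder runner: each tier returns "allow", "deny", "ask", or
// null (no opinion). Tiers run in order; the first decision wins and is
// appended to the audit log, so no later tier can override an earlier one.
function evaluate(tiers, request, auditLog) {
  for (const tier of tiers) {
    const decision = tier.check(request);
    if (decision !== null) {
      auditLog.push({ origin: request.origin, action: request.action, tier: tier.name, decision });
      return decision;
    }
  }
  throw new Error("ladder fell through: the last tier must always decide");
}

// Placeholder tiers with made-up checks; only the names come from the ladder.
const tiers = [
  { name: "Managed Deny", check: (r) => (r.action === "delete_account" ? "deny" : null) },
  { name: "Sensitivity Gate", check: (r) => (r.labels.includes("credentials") ? "deny" : null) },
  { name: "Policy Allow", check: (r) => (r.userAllows ? "allow" : null) },
  { name: "Default-for-Effect", check: (r) => (r.effect === "read" ? "allow" : "ask") },
];

const audit = [];
const decision = evaluate(
  tiers,
  { origin: "https://shop.example", action: "submit_form", effect: "write", labels: ["credentials"], userAllows: true },
  audit,
);
// decision is "deny": the Sensitivity Gate decides before Policy Allow is reached.
console.log(decision, audit);
```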

We use words like "by design" and "smaller surface." We don't say "safe." A draft is not a guarantee. Threat model →

We are not announcing a product. We are publishing a sketch, asking for arguments, and looking for the people who want the open web to keep being a place where your agent works for you.

Disagree with the architecture? File an issue.
Have a use case we're missing? Tell us.
Implementing something nearby? Let's compare notes.

raffi@mozilla.org
raffi.krikorian@gmail.com

09 — get started

Three doors in.

01 — for standards folks

Read the spec.

The Web Agent API explainer, the WebIDL, the threat model, the open questions.

spec.html →

02 — for everyone else

Install Harbor.

The reference extension. Firefox and Chrome. Bring your own model, your own MCP servers.

install →

03 — for builders

Talk to us.

We want sites that demo the API, not the API itself. Tell us what you'd build.

raffi@mozilla.org →

AI as a browser capability.