a proposal · open spec · working extension

AI as a browser capability.

Bring your AI to every website. Your model, your credentials, your context — mediated by the browser, not a vendor.

Status: a working sketch of what user-controlled AI on the open web could look like. Open spec, working extension, Firefox + Chrome.

01 — the 2026 landscape

AI in the browser is everywhere. None of it is yours.

In 2024, "AI in the browser" was an idea. By 2026 it's everywhere, and it has consolidated in a way the open web hasn't seen in a long time.

OpenAI ships Atlas. Perplexity ships Comet. Google rebuilt Chrome around Gemini. Each of them is a vertically integrated stack: their browser, their agent, their model, their servers. The agent reads your screen, moves your cursor, types in your fields. To use any of it, you adopt the whole thing.

Sidebar assistants are the other shape: Brave Leo, Firefox's AI Window. Calmer, but bolted onto the side of the page. They can read the page; they can't do things on it without screen-driving DOM agents underneath.

Both shapes assume the same thing: that the agent is the platform's, not yours. The web is sliding from a place you visit into a place an agent visits on your behalf — and the agent answers to whoever shipped the browser.

Read the full landscape essay →

02 — the proposal

What if AI were a browser capability, like fetch() is for networking?

The browser is the right place to put the user's agency. It already mediates cookies, geolocation, the camera, payments. It already brokers trust between you and millions of sites you've never met. AI is the same kind of capability — and it should be wired in the same way.

A.

Your model.

You pick the provider. Local, OpenAI, Anthropic, a self-hosted endpoint. Sites talk to a stable browser API; the browser talks to whichever model you chose.

B.

Your context.

Your bookmarks, your tabs, your history, your MCP servers — kept on your device, surfaced to the agent only with your permission, per origin.

C.

Mediated by the browser.

The page asks; the browser decides what the agent can see and do. Permissions are scoped, revocable, visible. Same posture as the camera.
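A minimal sketch of the mediation described in A and C. Everything here is hypothetical: userConfig, providers, and mediatePrompt are illustrative names, not part of the proposed API, and the two providers are stubs standing in for a local runtime and a hosted endpoint. The point is the shape: the page supplies only its origin and some text, and the browser's own configuration decides whether the call proceeds and which model receives it.

```javascript
// Hypothetical sketch of browser-side mediation. None of these names come from
// the spec; the providers are stubs for a local runtime and a hosted endpoint.

// The user's configuration lives in the browser, never in the page.
const userConfig = {
  provider: "local", // which backend the user picked
  grants: new Map([
    // origin -> capabilities the user granted to that origin
    ["https://docs.example", new Set(["model"])],
  ]),
};

// Stub providers. A real one would call a model using credentials only the browser holds.
const providers = {
  local: async (text) => `[local] ${text.length} chars received`,
  hosted: async (text) => `[hosted] ${text.length} chars received`,
};

// Roughly what a page's session.prompt() could reduce to inside the browser.
async function mediatePrompt(origin, text) {
  const granted = userConfig.grants.get(origin);
  if (!granted || !granted.has("model")) {
    throw new Error(`${origin} has no model permission`); // the browser decides
  }
  const send = providers[userConfig.provider]; // the user chose the backend
  return send(text); // no API key ever reaches the page
}
```

Switching userConfig.provider to "hosted" would change the backend for every site at once, with no change to any page's code.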

03 — how it works

Three primitives. The rest is policy and plumbing.

The proposed Web Agent API is small on purpose. A page can prompt the user's model, run an agent against tools the user has approved, or register a tool of its own.

1 · use the user's model · window.ai
// The page never sees an API key. The browser does the routing.
const session = await window.ai.createTextSession();
const reply   = await session.prompt("Summarize this page");
2 · run an agent that can call tools the user has approved · window.agent
// run() yields events as the agent works, such as tool calls and the final answer.
for await (const ev of window.agent.run({ task: "find recent press on quantum chips" })) {
  if (ev.type === "tool_call") console.log("→", ev.tool);
  if (ev.type === "final")     console.log(ev.output);
}
3 · a page declares its own tool · W3C WebMCP · navigator.modelContext
// searchArchive() stands in for the site's existing search code.
navigator.modelContext.addTool({
  name:        "search_archive",
  description: "Search our 20-year archive",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
  handler:     ({ query }) => searchArchive(query),
});
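One way the browser side of primitive 3 could work, sketched under our own assumptions: createToolRegistry, validate, and callTool are hypothetical names, the usage adds a required list to the schema, and the schema check is deliberately shallow (object schemas with simply typed properties only). A real implementation would use a full JSON Schema validator. What it shows is that the agent never calls a page handler directly; arguments are checked against the declared inputSchema first.

```javascript
// Hypothetical browser-side registry for page-declared tools. The schema check is
// shallow on purpose: object schemas whose properties have simple JSON types.

function createToolRegistry() {
  const tools = new Map();

  function addTool(tool) {
    if (tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
    tools.set(tool.name, tool);
  }

  // Required keys must be present, every key must be declared, and each value
  // must match its declared type ("integer" is checked separately from "number").
  function validate(schema, args) {
    if (schema.type !== "object" || typeof args !== "object" || args === null) return false;
    const props = schema.properties ?? {};
    for (const key of schema.required ?? []) {
      if (!(key in args)) return false;
    }
    for (const [key, value] of Object.entries(args)) {
      const prop = props[key];
      if (!prop) return false;
      const actual = Array.isArray(value) ? "array" : typeof value;
      const ok = prop.type === "integer" ? Number.isInteger(value) : actual === prop.type;
      if (!ok) return false;
    }
    return true;
  }

  // The agent goes through here; it never holds a reference to the page's handler.
  async function callTool(name, args) {
    const tool = tools.get(name);
    if (!tool) throw new Error(`unknown tool: ${name}`);
    if (!validate(tool.inputSchema, args)) throw new Error(`bad arguments for ${name}`);
    return tool.handler(args);
  }

  return { addTool, callTool };
}

// Usage, mirroring the search_archive example (with a stub handler).
const registry = createToolRegistry();
registry.addTool({
  name: "search_archive",
  description: "Search our 20-year archive",
  inputSchema: { type: "object", properties: { query: { type: "string" } }, required: ["query"] },
  handler: async ({ query }) => [`result for ${query}`],
});
```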

04 — where harbor sits

The differentiation is in the architecture, not the chrome.

Four shapes are in play in 2026. They look similar from the user's seat; underneath, the trust model is very different.

| | Vendor agent browsers (Atlas · Comet · Gemini in Chrome) | Sidebar AI assistants (Brave Leo · Firefox AI Window) | Prompt API + WebMCP alone (Chrome built-in AI · proposed standards) | Harbor + Web Agent API (extension · open spec · this proposal) |
|---|---|---|---|---|
| Whose model | vendor's, fixed | vendor's, fixed | browser-bundled (Gemini Nano) | user's choice: local, hosted, or self-run |
| Whose browser | theirs (a fork) | shipped one, locked to vendor | Chrome, today | any browser: Firefox + Chrome today |
| How agents act on pages | screen-drives the DOM | reads page, can't really act | page tools only | page tools + remote MCP, all user-scoped |
| Permissions surface | implicit, trust the vendor | per-feature toggle | page-level, opaque | typed actions + capability tokens, per-origin, with mode (plan/execute/watch) and an audit feed |
| Where context lives | vendor's servers | partly local, partly vendor | local, but only what the page sees | your machine, surfaced by the browser |
| Threat surface | full DOM, prompt-injectable | page text, summarization-shaped | narrow, but Chrome-only | scoped tools: by design, a smaller surface |
| Cross-site memory | vendor stitches it | none, mostly | none | user-stitched, via your MCP servers |
| Open spec | no | no | partial (Prompt API draft, WebMCP draft) | yes, and this proposal extends them |
| Who answers to you | the vendor | the vendor | the browser, narrowly | the browser, fully |


05 — standards convergence

We're not the only ones thinking about this. We're trying to fit in.

Three existing efforts overlap with what we're proposing. The Web Agent API is the fourth piece, the seam that connects them, and Harbor implements it.

Chrome Prompt API

Google · WICG draft

A built-in language model exposed to web pages. window.ai in spirit. Browser-bundled, Chrome-only, today.

WebMCP

W3C · early draft

Pages declare tools; agents call them. navigator.modelContext.addTool(). The verb-half of the puzzle.

MCP

Anthropic · de-facto protocol

The wire format for tools and context, and already the protocol your AI uses to talk to your servers. A web-shaped MCP is the natural next step.

Web Agent API

this proposal

The seam. Lets a page run a real agent — model, tools, context — without the page choosing the model or holding the keys.
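To make the seam concrete, here is a hedged sketch of one piece of it: presenting a page's declared tools to an agent in MCP's message shapes. The tools/list result, the tools/call request, and the content-array result follow MCP's JSON-RPC 2.0 conventions; the helper names, the stub tool, and the choice to serialize results as JSON text are our assumptions.

```javascript
// Hypothetical bridge: page-declared tools rendered as MCP messages.
// Function names are illustrative; message shapes follow MCP's JSON-RPC 2.0 usage.

let nextId = 1;

// An MCP "tools/list" result for the tools one page declared (handlers stay private).
function toolsListResult(pageTools) {
  return {
    tools: pageTools.map(({ name, description, inputSchema }) => ({ name, description, inputSchema })),
  };
}

// An MCP "tools/call" request, as the agent side would send it.
function toolsCallRequest(name, args) {
  return { jsonrpc: "2.0", id: nextId++, method: "tools/call", params: { name, arguments: args } };
}

// Wrap a page handler's return value as an MCP tool result with text content.
function toolsCallResult(id, value) {
  return {
    jsonrpc: "2.0",
    id,
    result: { content: [{ type: "text", text: JSON.stringify(value) }], isError: false },
  };
}

// Serve one request against the page's tools; the handler never sees the model or keys.
async function handle(pageTools, request) {
  const tool = pageTools.find((t) => t.name === request.params.name);
  if (!tool) {
    return { jsonrpc: "2.0", id: request.id, error: { code: -32602, message: "Unknown tool" } };
  }
  const value = await tool.handler(request.params.arguments);
  return toolsCallResult(request.id, value);
}
```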

06 — one sketch · harbor

Harbor is the working sketch of the proposal.

An extension for Firefox and Chrome. Implements the Web Agent API as proposed. Routes window.ai to whichever model you've configured. Brokers tools through MCP. Shows you, per origin, what the agent is allowed to do.

Read the spec →
Build with it →

[Figure: the page (your-favorite-site.example) calls window.ai and window.agent; the browser, where Harbor lives, mediates through per-origin user permissions (audit, revoke, scope); below it sit the user's model (local, hosted, or self-run) and the user's MCP servers (page tools, remote tools). Boxes are marked as page-controlled or user-controlled; the trust line moves up by one box.]
fig. 1 · the trust line

07 — what you could build

Boring things that are suddenly easy.

The interesting demos aren't "an AI search box." They're the boring things that happen when every site can talk to your model, your tools, your context — without each one re-inventing the stack.

01

An archive that searches itself with your words.

A 20-year news archive registers search_archive as a tool. Your AI uses it the way it would use any retrieval tool. The archive doesn't ship a model.

navigator.modelContext.addTool({ name: "search_archive", … })

02

A docs site that reads itself for you.

One window.ai call summarizes whatever the user is reading, in their model — no API key, no server bill, no analytics.

await window.ai.createTextSession().then(s => s.prompt("Summarize:\n" + article))

03

A booking flow you can finish without using the booking flow.

The site exposes find_flights and hold_seat as tools. The user's agent fills, picks, holds. The site runs no agent of its own.

window.agent.run({ task: "PDX → SFO Friday after 5pm, aisle" })
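The booking flow maps onto the event stream from primitive 2. Below is a sketch with the user's model replaced by a scripted planner so that it runs on its own. find_flights and hold_seat come from the example above; their stub data, the plan function, runAgent, and the args field on events are hypothetical.

```javascript
// Hypothetical sketch of the event stream behind window.agent.run(), with the
// model replaced by a scripted planner. Flight data and planner are illustrative.

const siteTools = {
  find_flights: async ({ from, to }) => [
    { id: "F1", from, to, departs: "17:40", seat: "aisle" },
    { id: "F2", from, to, departs: "14:10", seat: "aisle" },
  ],
  hold_seat: async ({ flightId }) => ({ held: true, flightId }),
};

// Stand-in for the user's model: given the results so far, choose the next step.
function plan(history) {
  if (history.length === 0) return { tool: "find_flights", args: { from: "PDX", to: "SFO" } };
  if (history.length === 1) {
    const late = history[0].result.find((f) => f.departs >= "17:00"); // "after 5pm"
    if (!late) return { done: true, output: "No flight after 17:00" };
    return { tool: "hold_seat", args: { flightId: late.id } };
  }
  return { done: true, output: `Held ${history[1].result.flightId}` };
}

// Yields the same event types the page consumes: "tool_call" and "final".
async function* runAgent(tools) {
  const history = [];
  for (;;) {
    const step = plan(history);
    if (step.done) {
      yield { type: "final", output: step.output };
      return;
    }
    yield { type: "tool_call", tool: step.tool, args: step.args };
    const result = await tools[step.tool](step.args);
    history.push({ tool: step.tool, result });
  }
}
```

The site only supplies the two tools; the loop that chooses between them belongs to the user's agent.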

04

An accessibility shortcut for everyone.

Pages declare semantic tools, not just ARIA. Screen-reader users — and everyone else — get an agent that actually understands the page's verbs.

addTool({ name: "open_chapter", inputSchema: { … } })

05

Your own writing, in your own model, anywhere there's a textarea.

An extension hooks any input. Your AI rewrites, in your style, with your context. Same shape on every site, because the API is the browser's.

window.ai.createTextSession({ system: userStyleProfile })

08 — security & permissions

A smaller surface, by design.

We're a draft proposal, not a hardened product. But the shape itself is the argument: the agent only sees the verbs the page declared, and only with the permissions the user granted, per origin.

What the model is shielded from

  • raw DOM and screen pixels — unless the user explicitly grants page-read
  • cookies, storage, and credentials of other origins
  • tools the user did not approve for this origin
  • cross-origin context smuggled in by prompt injection — tools are typed and scoped
  • data with sensitive labels (credentials, payments, identity) — the engine refuses to forward labeled data into actions whose acceptsLabels excludes that label, no matter how generous the user's policy is
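The sensitivity rule in the last bullet fits in a few lines. This is a sketch, not Harbor's code: labels and acceptsLabels follow the wording above, while SENSITIVE, blockedLabels, checkFlow, and the sample actions are illustrative. The gate runs before user policy is consulted, which is what keeps a generous policy from reopening it.

```javascript
// Hypothetical sketch of the sensitivity gate. Only `labels` and `acceptsLabels`
// come from the text; the rest is illustrative.

const SENSITIVE = new Set(["credentials", "payments", "identity"]);

// Sensitive labels on the data that the target action does not accept.
function blockedLabels(data, action) {
  const accepted = new Set(action.acceptsLabels ?? []);
  return data.labels.filter((label) => SENSITIVE.has(label) && !accepted.has(label));
}

function checkFlow(data, action, userPolicy) {
  const blocked = blockedLabels(data, action);
  // Checked before userPolicy is ever called, so no policy can override it.
  if (blocked.length > 0) return { decision: "deny", reason: `labels not accepted: ${blocked.join(", ")}` };
  return { decision: userPolicy(action) ? "allow" : "ask", reason: "user policy" };
}

const allowEverything = () => true; // the most generous policy a user could write
const card = { labels: ["payments"] };
const postComment = { name: "post_comment", acceptsLabels: [] };
const payInvoice = { name: "pay_invoice", acceptsLabels: ["payments"] };

console.log(checkFlow(card, postComment, allowEverything).decision); // deny
console.log(checkFlow(card, payInvoice, allowEverything).decision); // allow
```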

What the user controls, per origin

  • which model the page may talk to, and at what cost ceiling
  • which tools — page-declared and remote MCP — are visible
  • which context (history, bookmarks, files) is permitted
  • session mode: plan (read-only), execute (act), watch (preview every write) — narrowable at any time, never widenable without re-grant
  • a visible audit trail of every engine decision, with one-click revoke and a "Why?" / "What if?" simulator
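A sketch of the session-mode rule, assuming an ordering the text implies but does not state: plan (read-only) is narrowest, watch permits writes only with a preview, and execute acts directly. createSession, writeDisposition, charge, and the cost ceiling expressed in integer cents are hypothetical.

```javascript
// Hypothetical per-origin session. The mode ordering is our reading of the text.
const MODE_RANK = { plan: 0, watch: 1, execute: 2 };

function createSession(origin, grant) {
  // `grant` is what the user approved, e.g. { mode: "execute", costCeilingCents: 50 }
  let mode = grant.mode;
  let spentCents = 0;

  return {
    get mode() {
      return mode;
    },

    // Narrowing is always allowed; any widening needs a fresh grant from the user.
    setMode(next) {
      if (!(next in MODE_RANK)) throw new Error(`unknown mode: ${next}`);
      if (MODE_RANK[next] > MODE_RANK[mode]) throw new Error(`widening ${origin} to ${next} requires re-grant`);
      mode = next;
    },

    // What a write would do under the current mode.
    writeDisposition() {
      if (mode === "plan") return "refuse";
      return mode === "watch" ? "preview" : "perform";
    },

    // Refuse model calls that would cross the user's cost ceiling.
    charge(costCents) {
      if (spentCents + costCents > grant.costCeilingCents) throw new Error("cost ceiling reached");
      spentCents += costCents;
    },
  };
}
```

Costs are kept in integer cents so the ceiling comparison is exact.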

All of this is the surface of a single chokepoint, the PolicyEngine, which walks a 10-tier ladder (Ambient → Managed Deny → Sensitivity Gate → Information-Flow Check → Watchdog → Capability Token → Policy Allow → Policy Ask → Per-Origin Grant → Default-for-Effect) on every API call. Tiers 1–5 are a safety floor that a user policy can't override. The full design lives in docs/PERMISSIONS.md ↗.
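The ladder is a first-decisive-answer walk: tiers are checked in order, and the first one with an opinion decides. The sketch below fills in only six of the tiers, with placeholder checks; the call fields and the decide function are hypothetical. Because the floor tiers sit above the user-policy tiers, a permissive user policy is never reached when a floor tier denies, and returning the deciding tier's name is the kind of record a "Why?" view could show.

```javascript
// Hypothetical sketch of a tiered policy ladder. Tier names follow the text; the
// checks are placeholders, and only a subset of tiers is shown. Each check returns
// "allow", "deny", "ask", or null (no opinion, fall through to the next tier).

const ladder = [
  { name: "Managed Deny", check: (c) => (c.managedDeny.has(c.api) ? "deny" : null) },
  { name: "Sensitivity Gate", check: (c) => (c.leaksSensitiveData ? "deny" : null) },
  {
    name: "Capability Token",
    check: (c) => ((c.token && c.token.api === c.api && c.token.origin === c.origin) ? "allow" : null),
  },
  { name: "Policy Allow", check: (c) => (c.userAllows.has(c.api) ? "allow" : null) },
  { name: "Per-Origin Grant", check: (c) => (c.grants.has(`${c.origin} ${c.api}`) ? "allow" : null) },
  { name: "Default-for-Effect", check: (c) => (c.effect === "read" ? "allow" : "ask") },
];

// Walk top to bottom; floor tiers come first, so user-policy tiers cannot override them.
function decide(call) {
  for (const tier of ladder) {
    const verdict = tier.check(call);
    if (verdict !== null) return { verdict, tier: tier.name };
  }
  return { verdict: "deny", tier: "(no tier decided)" }; // defensive default
}
```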

We use words like "by design" and "smaller surface." We don't say "safe." A draft is not a guarantee. Threat model →

We are not announcing a product. We are publishing a sketch, asking for arguments, and looking for the people who want the open web to keep being a place where your agent works for you.

Disagree with the architecture? File an issue.
Have a use case we're missing? Tell us.
Implementing something nearby? Let's compare notes.

raffi@mozilla.org
raffi.krikorian@gmail.com

09 — get started

Three doors in.

01 — for standards folks

Read the spec.

The Web Agent API explainer, the WebIDL, the threat model, the open questions.

spec.html →

02 — for everyone else

Install Harbor.

The reference extension. Firefox and Chrome. Bring your own model, your own MCP servers.

install →

03 — for builders

Talk to us.

We want sites that demo the API, not the API itself. Tell us what you'd build.

raffi@mozilla.org →