spec · v1.4 · draft
Web Agent API
A small surface for browsers to broker AI capabilities to web pages —
window.ai for text generation, window.agent for tool use,
navigator.modelContext (W3C WebMCP) for page-declared tools.
Compatible with the Chrome Prompt API. Designed to be implementable natively.
Goals
- Give developers a stable browser API for AI without choosing the model, holding the keys, or running the inference.
- Let users keep their model, credentials, and context — and decide, per origin, what each website can use.
- Be compatible with Chrome's Prompt API and W3C WebMCP. Polyfillable today, native tomorrow.
- Make the threat surface small by construction: pages get verbs, not the DOM.
Non-goals
- Replace cloud AI providers. The user picks the provider; the API just brokers.
- Define training, fine-tuning, or evaluation. Inference only.
- Specify model behaviour. Capability and quality depend on the user's configured backend.
- Handle billing or quota. Cost ceilings are user-side concerns.
API surface
window.ai — text generation
A page asks the browser for a session against the user's chosen model. The page never sees an API key.
const session = await window.ai.createTextSession({
  systemPrompt: "Be concise.",
});
const reply = await session.prompt("Summarize this page.");
Surface-compatible with the Chrome Prompt API
(window.LanguageModel.create()) where it exists; falls back to
the user's configured provider where it doesn't.
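The fallback can be sketched as a small feature-detection helper. The `pickBackend` name and the bare-object argument are illustrative assumptions, not part of the spec:

```javascript
// Hypothetical helper: prefer Chrome's native Prompt API when present,
// otherwise fall back to the brokered window.ai surface.
function pickBackend(g = globalThis) {
  if (typeof g.LanguageModel?.create === "function") return "chrome-prompt-api";
  if (typeof g.ai?.createTextSession === "function") return "window-ai";
  return null; // no text-generation backend available
}
```

A polyfill would run a check like this once at load time and route session creation accordingly.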
window.agent — tools and autonomous runs
A page can request scopes, list available tools, call them, or run an autonomous loop the user has authorised.
await window.agent.requestPermissions({
  scopes: ["model:tools", "mcp:tools.list", "mcp:tools.call"],
  reason: "Research assistant needs search.",
});
// Note: agent.run is gated behind the toolCalling feature flag in the
// Web Agent API sidebar. It yields typed events.
for await (const ev of window.agent.run({
  task: "find recent press on quantum chips",
})) {
  if (ev.type === "tool_call") console.log("→", ev.tool);
  if (ev.type === "final") console.log(ev.output);
}
navigator.modelContext — page-declared tools (W3C WebMCP)
The page exposes its own JavaScript-backed tools. Any agent — Harbor today, Chrome's Auto Browse tomorrow — can call them.
navigator.modelContext.addTool({
  name: "search_archive",
  description: "Search our 20-year archive",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
  handler: ({ query }) => searchArchive(query),
});
Permission model
Every API call goes through a single chokepoint — the PolicyEngine — which evaluates the call against a 9-tier ladder of checks. The full design lives in docs/PERMISSIONS.md; the short version is below.
Typed actions, not opaque scopes
Every API call is described by a typed action of the form
verb.noun.qualifier. The engine reasons in this vocabulary, and
so do user policies. The set is small and effect-classified:
- model.prompt.local / model.prompt.remote.firstParty / model.prompt.remote.thirdParty — LLM access, distinguishing on-device vs cloud egress
- model.list — list configured providers/models (metadata only)
- tool.list / tool.call — list and call MCP tools
- mcp.server.register — register a website-provided MCP server (BYOC)
- browser.read.activeTab / read.element / read.screenshot / read.tabs — page reads
- browser.write.interact / write.navigate / write.tabsCreate — page mutations
- network.egress.same_origin / network.egress.cross_origin — fetch through the proxy
- agent.register / discover / invoke / delegate.* — multi-agent surface
Each action carries metadata (effect tier, locality, reversibility, default data labels) the engine uses to pick a default disposition. Reads default to session-bound prompts; writes default to preview-then-confirm; destructive actions are always confirmed and cannot be auto-allowed.
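A minimal sketch of the metadata one such action might carry, and of the default-disposition lookup. The field names and the string values here are assumptions based on the description above, not normative:

```javascript
// Assumed shape of the per-action metadata the engine consults
// when no explicit user policy matches.
const actionMeta = {
  "browser.read.activeTab": {
    effect: "read",
    locality: "local",
    reversible: true,
    defaultDisposition: "session-prompt",       // reads: session-bound prompt
  },
  "browser.write.navigate": {
    effect: "write",
    locality: "local",
    reversible: false,
    defaultDisposition: "preview-then-confirm", // writes: preview first
  },
};

// Unknown actions fail closed rather than defaulting open.
function defaultDisposition(action) {
  const meta = actionMeta[action];
  return meta ? meta.defaultDisposition : "deny";
}
```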
Sessions and capability tokens
An agent operates inside a session that is bound to an origin and to a capability token — the unit of authority the engine checks on every gated action. Tokens carry an allowedActions set, an acceptedLabels set, explicit budgets (tool calls, wall clock, etc.), a TTL, and a mode:
- plan — reads and prompts only; the token's allowedActions excludes every write action.
- execute — normal evaluation per the ladder.
- watch — like execute, but writes return a preview and pause for explicit confirmation.
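Re-minting on a mode change might look like the following sketch. The `mintToken` name and the token fields are illustrative assumptions:

```javascript
// Sketch: a mode switch produces a fresh, frozen token; plan mode
// strips every write action rather than flagging the old token.
function mintToken(base, mode) {
  const isWrite = (a) => a.startsWith("browser.write.");
  const allowedActions =
    mode === "plan"
      ? base.allowedActions.filter((a) => !isWrite(a))
      : [...base.allowedActions];
  return Object.freeze({ ...base, mode, allowedActions });
}
```

Because the token is a new frozen object, any code still holding the old token holds only the old (narrower or stale) authority.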
Mode transitions don't just flip a flag — they re-mint the token. Cached
authority can't be widened after the fact. When a session delegates to a
subagent, the engine enforces strict OCAP-style attenuation: child
allowedActions ⊆ parent, child acceptedLabels ⊆
parent, child budgets ≤ parent, child TTL ≤ parent.
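The attenuation rule is mechanical enough to sketch directly. The `attenuate` helper and the budget field names are assumptions:

```javascript
// Sketch: mint a child token strictly narrower than its parent.
// Widening requests throw rather than being silently clamped, so a
// bug in the caller is visible instead of a quiet policy downgrade.
function attenuate(parent, request) {
  const subset = (a, b) => a.every((x) => b.includes(x));
  if (!subset(request.allowedActions, parent.allowedActions))
    throw new Error("child allowedActions must be a subset of parent");
  if (!subset(request.acceptedLabels, parent.acceptedLabels))
    throw new Error("child acceptedLabels must be a subset of parent");
  return {
    allowedActions: [...request.allowedActions],
    acceptedLabels: [...request.acceptedLabels],
    budgets: {
      toolCalls: Math.min(request.budgets.toolCalls, parent.budgets.toolCalls),
      wallClockMs: Math.min(request.budgets.wallClockMs, parent.budgets.wallClockMs),
    },
    ttlMs: Math.min(request.ttlMs, parent.ttlMs),
  };
}
```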
Information-flow labels
Reads attach DataLabels to the data they produce
(credentials, payments, identity,
regulated, confidential). The engine propagates
labels through prompts and tool calls, and any attempt to send labeled
data to an action whose acceptsLabels excludes that label
fails closed at Tier 3 with ERR_LABEL_FLOW_BLOCKED — no
matter how generous the user's policy is.
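At its core the fail-closed check is a set comparison. This sketch assumes a `checkLabelFlow` helper and plain string labels:

```javascript
// Sketch: block any flow that would carry a label the target action
// does not accept, regardless of how permissive the user policy is.
function checkLabelFlow(dataLabels, action) {
  const blocked = dataLabels.filter((l) => !action.acceptsLabels.includes(l));
  return blocked.length
    ? { ok: false, error: "ERR_LABEL_FLOW_BLOCKED", blocked }
    : { ok: true };
}
```

Data tagged credentials would, for example, fail against a cross-origin egress action whose acceptsLabels is empty, while unlabeled data flows through.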
Origin-scoped, revocable, auditable
Permissions remain origin-scoped: example.com's grant for
tool.call does not transfer to other.com. Tool
allow-lists still let the user pick which tools each origin may
call. Every engine decision is recorded in an audit log with the
ladder tier that fired, the matched rule (if any), and the labels on
the input — surfaced as a "Why?" / "What if?" affordance in the
sidebar so silent allows are never invisible.
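One audit entry might carry the fields described above; this shape and the `logDecision` helper are illustrative assumptions, not the spec's record format:

```javascript
// Sketch: append one immutable audit entry per engine decision.
function logDecision(log, decision) {
  log.push(Object.freeze({
    ts: decision.ts,                           // when the decision fired
    origin: decision.origin,                   // e.g. "https://example.com"
    action: decision.action,                   // typed action, e.g. "tool.call"
    tier: decision.tier,                       // ladder tier that fired
    matchedRule: decision.matchedRule ?? null, // null: default disposition applied
    labels: decision.labels ?? [],             // labels on the input
    outcome: decision.outcome,                 // "allow" | "deny" | "confirm"
  }));
  return log;
}
```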
Higher-risk action classes remain gated by extension-level feature flags so users can disable whole capability classes (browser write, tool calling, multi-agent) globally and out-of-band of any per-origin grant.
Threat model, in one paragraph
The agent reads what the page hands it through the API — not the live DOM, not other origins, not stored credentials. Tools are typed and scoped. Indirect prompt injection is bounded by which tools the origin holds and which tool inputs reach the model. We assume the model will be fooled and design the surface so the blast radius is small. By design, this is a smaller surface than free-roaming DOM agents expose. We don't say safe.
The full threat model lives with the spec, including specific mitigations against the Comet/Atlas-class attacks documented in 2025–26.
Read the full spec
The complete explainer (Web IDL, error model, security & privacy analysis, open questions) lives in the repo:
- Full Web Agent API explainer (GitHub) ↗
- Security & privacy threat model ↗
- Working code examples ↗
- Positioning & differentiation (May 2026) ↗
License
Harbor is released under the MIT license. The spec is intended to be implementable by anyone — and we'd happily move it to a community group if there's appetite.
Disagree with the architecture? File an issue ↗
Want to build with it? Build with Harbor →