spec · v1.4 · draft

Web Agent API.

A small surface for browsers to broker AI capabilities to web pages — window.ai for text generation, window.agent for tool use, navigator.modelContext (W3C WebMCP) for page-declared tools. Compatible with the Chrome Prompt API. Designed to be implementable natively.

Status: draft · Version: v1.4 · Updated: May 2026

Goals

Non-goals

API surface

window.ai — text generation

A page asks the browser for a session against the user's chosen model. The page never sees an API key.

const session = await window.ai.createTextSession({
  systemPrompt: "Be concise.",
});

const reply = await session.prompt("Summarize this page.");

Surface-compatible with the Chrome Prompt API (window.LanguageModel.create()) where it exists; falls through to the user's configured provider where it doesn't.
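A page that has to run on both surfaces can feature-detect. The helper below is a hypothetical sketch (`openTextSession` is not defined by this spec). It takes the global object as a parameter so it can be exercised with stubs, prefers `window.ai.createTextSession()`, and otherwise falls back to Chrome's `LanguageModel.create()`. It assumes that API's `initialPrompts` option is how the system prompt gets passed.

```javascript
// Hypothetical helper, not part of the spec: open a text session on
// whichever surface exists. `g` is the global object (window in a page).
async function openTextSession(g, systemPrompt) {
  // Prefer this spec's surface when the browser or extension provides it.
  if (g.ai && typeof g.ai.createTextSession === "function") {
    return g.ai.createTextSession({ systemPrompt });
  }
  // Otherwise fall back to Chrome's built-in Prompt API, passing the
  // system prompt as an initial system message (assumed option shape).
  if (g.LanguageModel && typeof g.LanguageModel.create === "function") {
    return g.LanguageModel.create({
      initialPrompts: [{ role: "system", content: systemPrompt }],
    });
  }
  throw new Error("No text-generation API is available on this page");
}

// In a page:
// const session = await openTextSession(window, "Be concise.");
// const reply = await session.prompt("Summarize this page.");
```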

window.agent — tools and autonomous runs

A page can request scopes, list available tools, call them, or run an autonomous loop the user has authorised.

await window.agent.requestPermissions({
  scopes: ["model:tools", "mcp:tools.list", "mcp:tools.call"],
  reason: "Research assistant needs search.",
});

// Note: agent.run is gated behind the toolCalling feature flag in the
// Web Agents API sidebar. It yields typed events.
for await (const ev of window.agent.run({
  task: "find recent press on quantum chips",
})) {
  if (ev.type === "tool_call") console.log("→", ev.tool);
  if (ev.type === "final")     console.log(ev.output);
}
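Because `agent.run()` yields typed events, a page can fold the stream into whatever state its UI needs. This is a minimal sketch that assumes only the two event types shown above (`tool_call` carrying `tool`, `final` carrying `output`). `collectRun` and the stub generator `fakeRun` are hypothetical, and the tool names are invented. They exist so the snippet runs outside a browser.

```javascript
// Hypothetical helper: drain an agent.run()-style async iterable into a
// summary. Event types other than tool_call and final are ignored here.
async function collectRun(events) {
  const summary = { toolsCalled: [], output: undefined };
  for await (const ev of events) {
    if (ev.type === "tool_call") summary.toolsCalled.push(ev.tool);
    if (ev.type === "final") summary.output = ev.output;
  }
  return summary;
}

// Stub standing in for window.agent.run({ task }) outside a browser.
async function* fakeRun() {
  yield { type: "tool_call", tool: "web_search" };
  yield { type: "tool_call", tool: "fetch_page" };
  yield { type: "final", output: "Three recent articles found." };
}

// In a page: const summary = await collectRun(window.agent.run({ task }));
```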

navigator.modelContext — page-declared tools (W3C WebMCP)

The page exposes its own JavaScript-backed tools. Any agent — Harbor today, Chrome's Auto Browse tomorrow — can call them.

// searchArchive() is the page's own search function; it is not part of the API.
navigator.modelContext.addTool({
  name:        "search_archive",
  description: "Search our 20-year archive",
  inputSchema: {
    type:       "object",
    properties: { query: { type: "string" } },
    required:   ["query"],
  },
  handler:     ({ query }) => searchArchive(query),
});
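From the calling agent's side, a page tool is a name, a schema, and a handler. The sketch below is a hypothetical in-memory registry that mimics the `addTool` shape above, so the contract can be exercised without a browser. `createToolRegistry` and `callTool` are illustrative names. The input check covers only `required` keys and top-level `string` properties. It is not full JSON Schema validation.

```javascript
// Hypothetical stand-in for navigator.modelContext, for illustration only.
function createToolRegistry() {
  const tools = new Map();
  return {
    addTool(tool) {
      if (tools.has(tool.name)) throw new Error(`duplicate tool: ${tool.name}`);
      tools.set(tool.name, tool);
    },
    // What an agent would do: look up the tool, check the input, run it.
    async callTool(name, input = {}) {
      const tool = tools.get(name);
      if (!tool) throw new Error(`unknown tool: ${name}`);
      const { properties = {}, required = [] } = tool.inputSchema ?? {};
      for (const key of required) {
        if (!(key in input)) throw new Error(`missing input: ${key}`);
      }
      for (const [key, spec] of Object.entries(properties)) {
        const wrongType = spec.type === "string" && typeof input[key] !== "string";
        if (key in input && wrongType) throw new Error(`input ${key} must be a string`);
      }
      return tool.handler(input);
    },
  };
}
```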

Permission model

Every API call goes through a single chokepoint — the PolicyEngine — which evaluates the call against a 9-tier ladder of checks. The full design lives in docs/PERMISSIONS.md; the short version is below.
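Because every call passes through one engine, evaluation can be modelled as an ordered, first-match walk down the ladder. The nine real tiers are defined in docs/PERMISSIONS.md and are not reproduced here. In the sketch below, `evaluate` and both example tiers are hypothetical placeholders. Denying when no tier fires (failing closed) is also an assumption.

```javascript
// Sketch of a first-match policy ladder. Each tier is { tier, name, check },
// where check(call) returns "allow" | "deny" | "prompt" to decide, or null
// to pass the call on to the next tier.
function evaluate(ladder, call) {
  for (const { tier, name, check } of ladder) {
    const decision = check(call);
    if (decision !== null) return { decision, tier, name };
  }
  // Assumption: if no tier fires, deny (fail closed).
  return { decision: "deny", tier: null, name: "default" };
}

// Placeholder ladder for illustration; these tier names are invented.
const exampleLadder = [
  { tier: 1, name: "placeholder-hard-deny", check: (c) => (c.forbidden ? "deny" : null) },
  { tier: 2, name: "placeholder-user-rule", check: (c) => c.userRule ?? null },
];
```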

Typed actions, not opaque scopes

Every API call is described by a typed action of the form verb.noun.qualifier. The engine reasons in this vocabulary, and so do user policies. The set of actions is small, and each action is classified by its effect.
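Since a typed action is a dotted string, policies can match on its parts rather than on an opaque scope name. This is a minimal sketch under three assumptions: the qualifier segment may be omitted, a policy pattern may use `*` for a whole segment, and action names like `read.page.selection` are valid. All three are hypothetical and not specified by this page. `parseAction` and `matches` are illustrative names.

```javascript
// Parse "verb.noun.qualifier" into its parts. Treating the qualifier as
// optional is an assumption, not something the spec states.
function parseAction(action) {
  const parts = action.split(".");
  if (parts.length < 2 || parts.length > 3 || parts.some((p) => p === "")) {
    throw new Error(`malformed action: ${action}`);
  }
  const [verb, noun, qualifier = null] = parts;
  return { verb, noun, qualifier };
}

// Hypothetical policy matching: "*" in a pattern matches any one segment.
function matches(pattern, action) {
  const p = parseAction(pattern);
  const a = parseAction(action);
  return ["verb", "noun", "qualifier"].every((k) => p[k] === "*" || p[k] === a[k]);
}
```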

Each action carries metadata (effect tier, locality, reversibility, default data labels) the engine uses to pick a default disposition. Reads default to session-bound prompts; writes default to preview-then-confirm; destructive actions are always confirmed and cannot be auto-allowed.
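The default-disposition rule described here fits in a few lines. This sketch assumes effect tiers are the strings `read`, `write`, and `destructive`, and that dispositions are `prompt-session`, `preview-confirm`, `confirm`, `allow`, and `deny`. All of these names are hypothetical. The property it demonstrates is that a user policy may relax reads and writes, or tighten anything to `deny`, but can never turn a destructive action into an auto-allow.

```javascript
// Hypothetical names for effect tiers and their default dispositions.
const DEFAULTS = {
  read: "prompt-session",    // ask once, remembered for the session
  write: "preview-confirm",  // show a preview, then ask to confirm
  destructive: "confirm",    // always ask
};

// userChoice is an optional disposition from a user policy, or undefined.
function disposition(effect, userChoice) {
  if (!Object.prototype.hasOwnProperty.call(DEFAULTS, effect)) {
    throw new Error(`unknown effect tier: ${effect}`);
  }
  // Destructive actions: the user may forbid them outright, but any other
  // choice (including "allow") still results in an explicit confirmation.
  if (effect === "destructive") {
    return userChoice === "deny" ? "deny" : "confirm";
  }
  return userChoice ?? DEFAULTS[effect];
}
```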

Sessions and capability tokens

An agent operates inside a session that is bound to an origin and to a capability token, the unit of authority the engine checks on every gated action. Tokens carry an allowedActions set, an acceptedLabels set, explicit budgets (tool calls, wall clock, etc.), a TTL, and a mode.
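A per-action check against such a token might look like the sketch below. It assumes a plain-object token using the field names above. The TTL is stored as an absolute `expiresAt` in milliseconds, and budgets are numeric limits keyed by name (for example `toolCalls`). The shapes, the reason strings, and the function names `mintToken` and `checkToken` are all hypothetical. A real engine would mint and hold tokens itself rather than hand them to the page.

```javascript
// Hypothetical token constructor; converts a relative TTL to an absolute expiry.
function mintToken({ allowedActions, acceptedLabels, budgets, ttlMs, mode }, now) {
  return {
    allowedActions: new Set(allowedActions),
    acceptedLabels: new Set(acceptedLabels),
    budgets: { ...budgets },
    expiresAt: now + ttlMs,
    mode,
  };
}

// Check one gated action. `used` holds what the session has consumed so far
// per budget key; the caller would increment it after an allowed call.
function checkToken(token, action, used, now) {
  if (now >= token.expiresAt) return { ok: false, reason: "expired" };
  if (!token.allowedActions.has(action)) return { ok: false, reason: "action-not-allowed" };
  for (const [key, limit] of Object.entries(token.budgets)) {
    if ((used[key] ?? 0) >= limit) return { ok: false, reason: `budget:${key}` };
  }
  return { ok: true };
}
```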

Mode transitions don't just flip a flag — they re-mint the token. Cached authority can't be widened after the fact. When a session delegates to a subagent, the engine enforces strict OCAP-style attenuation: child allowedActions ⊆ parent, child acceptedLabels ⊆ parent, child budgets ≤ parent, child TTL ≤ parent.
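The attenuation rule is a pure subset-and-bound check, so it can be enforced at the moment of delegation. This sketch assumes tokens hold `allowedActions` and `acceptedLabels` as `Set`s, `budgets` as an object of numeric limits, and the TTL as an absolute `expiresAt` time. Comparing absolute expiry, so the child can never outlive its parent, is an assumption about how "child TTL ≤ parent" is meant. `isAttenuation` is a hypothetical name.

```javascript
// True if every element of `child` is also in `parent`.
const isSubset = (child, parent) => [...child].every((x) => parent.has(x));

// Hypothetical check run when a session delegates to a subagent:
// the child token may only narrow what the parent holds.
function isAttenuation(parent, child) {
  if (!isSubset(child.allowedActions, parent.allowedActions)) return false;
  if (!isSubset(child.acceptedLabels, parent.acceptedLabels)) return false;
  // Every parent budget must also bound the child, at or below the parent's
  // limit; a child missing a budget key would be unbounded in that dimension.
  for (const [key, limit] of Object.entries(parent.budgets)) {
    if (!(key in child.budgets) || child.budgets[key] > limit) return false;
  }
  // TTL compared as absolute expiry: the child must not outlive the parent.
  return child.expiresAt <= parent.expiresAt;
}
```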

Information-flow labels

Reads attach DataLabels to the data they produce (credentials, payments, identity, regulated, confidential). The engine propagates labels through prompts and tool calls, and any attempt to send labeled data to an action whose acceptsLabels excludes that label fails closed at Tier 3 with ERR_LABEL_FLOW_BLOCKED — no matter how generous the user's policy is.
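Label propagation is a union: anything derived from labeled inputs carries all of their labels. The sink check then compares that union against what the target action accepts. This is a minimal sketch. It assumes labels are plain strings from the list above and that a blocked flow surfaces as an `Error` whose `code` property is `ERR_LABEL_FLOW_BLOCKED` (the property name is an assumption). `labeled`, `derive`, and `checkFlow` are hypothetical names.

```javascript
// A labeled value: the data plus the set of DataLabels it carries.
const labeled = (value, labels = []) => ({ value, labels: new Set(labels) });

// Anything computed from labeled inputs inherits the union of their labels.
function derive(value, ...inputs) {
  const labels = new Set();
  for (const input of inputs) for (const l of input.labels) labels.add(l);
  return { value, labels };
}

// Fail closed if the data carries any label the target action does not accept.
function checkFlow(data, acceptsLabels) {
  const blocked = [...data.labels].filter((l) => !acceptsLabels.has(l));
  if (blocked.length > 0) {
    const err = new Error(`label flow blocked: ${blocked.join(", ")}`);
    err.code = "ERR_LABEL_FLOW_BLOCKED";
    throw err;
  }
}
```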

Origin-scoped, revocable, auditable

Permissions remain origin-scoped: example.com's grant for tool.call does not transfer to other.com. Tool allow-lists still let the user pick which tools each origin may call. Every engine decision is recorded in an audit log with the ladder tier that fired, the matched rule (if any), and the labels on the input — surfaced as a "Why?" / "What if?" affordance in the sidebar so silent allows are never invisible.
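Origin scoping, revocation, and the audit trail can be sketched together. Grants are keyed by origin, every lookup appends an audit entry, and a grant for one origin never matches another. `GrantStore` is hypothetical. Origins are normalized with the standard `URL` constructor. The audit entry records the decision, the matched rule, and the input labels. Its `tier` field is left as a placeholder, because which ladder tier handles origin grants is not stated here.

```javascript
// Hypothetical per-origin tool allow-list with an append-only audit log.
class GrantStore {
  constructor() {
    this.grants = new Map(); // origin -> Set of tool names
    this.audit = [];
  }

  // Normalize so "https://example.com/a" and "https://example.com/b" share a grant.
  static originOf(url) {
    return new URL(url).origin;
  }

  grant(url, tool) {
    const origin = GrantStore.originOf(url);
    if (!this.grants.has(origin)) this.grants.set(origin, new Set());
    this.grants.get(origin).add(tool);
  }

  revoke(url, tool) {
    this.grants.get(GrantStore.originOf(url))?.delete(tool);
  }

  mayCall(url, tool, labels = []) {
    const origin = GrantStore.originOf(url);
    const allowed = this.grants.get(origin)?.has(tool) ?? false;
    // Every decision, allow or deny, is recorded so it can be explained later.
    this.audit.push({
      origin,
      tool,
      decision: allowed ? "allow" : "deny",
      rule: allowed ? `grant:${origin}:${tool}` : null,
      labels: [...labels],
      tier: null, // placeholder: the firing tier is engine-specific
    });
    return allowed;
  }
}
```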

Higher-risk action classes remain gated by extension-level feature flags so users can disable whole capability classes (browser write, tool calling, multi-agent) globally and out-of-band of any per-origin grant.
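Because the feature flags sit outside any per-origin grant, they can be checked first: a disabled capability class denies the call regardless of what the origin holds. This sketch assumes a flags object with one boolean per class named above. The keys `browserWrite`, `toolCalling`, and `multiAgent` are hypothetical, and so is the `gate` function. The caller is assumed to know which class an action belongs to.

```javascript
// Hypothetical global kill switches, one per capability class.
const exampleFlags = { browserWrite: false, toolCalling: true, multiAgent: false };

// capabilityClass is the class the action belongs to, or null if ungated.
// The flag check runs before, and independently of, any per-origin grant;
// an unknown class counts as disabled (fail closed).
function gate(flags, capabilityClass, originHasGrant) {
  if (capabilityClass !== null && flags[capabilityClass] !== true) {
    return { allowed: false, reason: `feature disabled: ${capabilityClass}` };
  }
  return originHasGrant
    ? { allowed: true, reason: "origin grant" }
    : { allowed: false, reason: "no origin grant" };
}
```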

Threat model, in one paragraph

The agent reads what the page hands it through the API — not the live DOM, not other origins, not stored credentials. Tools are typed and scoped. Indirect prompt injection is bounded by which tools the origin holds and which tool inputs reach the model. We assume the model will be fooled and design the surface so the blast radius is small. By design, a smaller surface than free-roaming DOM agents. We don't say safe.

The full threat model lives with the spec, including specific mitigations against the Comet/Atlas-class attacks documented in 2025–26.


Read the full spec

The complete explainer (Web IDL, error model, security & privacy analysis, open questions) lives in the repo:

License

Harbor is released under the MIT license. The spec is intended to be implementable by anyone — and we'd happily move it to a community group if there's appetite.


Disagree with the architecture? File an issue ↗
Want to build with it? Build with Harbor →