
Build with Harbor.

Install the reference extension. Bring your own model. Write three lines. The whole loop — Firefox, Ollama, your code — runs in about ten minutes.

Firefox: primary · Chrome: supported · Safari: experimental

1. Install Harbor

Harbor is two browser extensions and a small native bridge (Rust). Both extensions need to be loaded for the Web Agent API to work end-to-end.

Prerequisites

Node.js 18+ · nodejs.org
Rust · curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Ollama · ollama.com or brew install ollama
Firefox 109+ or Chrome 120+ — Firefox recommended

Clone and build

git clone --recurse-submodules https://github.com/r/harbor.git
cd harbor

# Build the Harbor extension
cd extension && npm install && npm run build && cd ..

# Build the Web Agents API extension
cd web-agents-api && npm install && npm run build && cd ..

# Build and install the native bridge
cd bridge-rs && cargo build --release && ./install.sh && cd ..

Start a model

ollama serve &
ollama pull llama3.2
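
Before loading the extensions, it can help to confirm that Ollama is serving and that the model was pulled. The following is a minimal sketch, assuming Ollama's default address (http://localhost:11434) and its GET /api/tags endpoint, which lists locally available models. The helper names hasModel and checkOllama are illustrative and not part of Harbor. It runs on Node.js 18+, which ships a built-in fetch.

```javascript
// Assumption: Ollama listens on its default port, 11434.
const OLLAMA_URL = "http://localhost:11434";

// Pure helper: does the /api/tags payload contain the model we pulled?
// Ollama reports names with a tag suffix, e.g. "llama3.2:latest".
function hasModel(tagsPayload, wanted) {
  const models = (tagsPayload && tagsPayload.models) || [];
  return models.some(
    (m) => m.name === wanted || m.name.startsWith(wanted + ":")
  );
}

// Queries the local Ollama server and reports whether the model is available.
// fetch rejects if nothing is listening at baseUrl.
async function checkOllama(model = "llama3.2", baseUrl = OLLAMA_URL) {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama responded with HTTP ${res.status}`);
  return hasModel(await res.json(), model);
}
```

Calling checkOllama().then((ok) => console.log(ok ? "model ready" : "model missing")) gives a quick yes/no; a rejected promise usually means ollama serve is not running.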

Load both extensions in Firefox

Open about:debugging#/runtime/this-firefox
"Load Temporary Add-on…" → extension/dist-firefox/manifest.json
"Load Temporary Add-on…" again → web-agents-api/dist-firefox/manifest.json
Open the Harbor sidebar (Cmd + B). Verify Bridge: Connected and LLM: Ollama.

Chrome works the same way: open chrome://extensions, enable Developer mode, click Load unpacked, and select each of the two dist-chrome/ directories. One extra step for Chrome's native messaging is documented in the repo.
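
Because both extensions must be loaded, a page may want to check which API surfaces are present before using them. This is a hedged sketch: it assumes the globals used later in this guide (window.ai, window.agent, navigator.modelContext) are injected before the page script runs, and the function name detectWebAgentApi is hypothetical.

```javascript
// Reports which of the surfaces used in this guide are present on a
// window-like object. Passing the object in keeps the check testable.
function detectWebAgentApi(win) {
  const nav = win.navigator || {};
  return {
    ai: typeof win.ai?.createTextSession === "function",
    agent: typeof win.agent?.requestPermissions === "function",
    modelContext: typeof nav.modelContext?.addTool === "function",
  };
}
```

In a page, console.table(detectWebAgentApi(window)) shows at a glance which pieces are missing, which is a reasonable first thing to check if a later example fails.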

2. Hello, browser AI

The minimum viable use of the API: ask permission, prompt the model, render the answer.

<button id="ask">Ask AI</button>
<div id="out"></div>
<script>
document.getElementById("ask").onclick = async () => {
  await window.agent.requestPermissions({
    scopes: ["model:prompt"],
    reason: "To answer your question",
  });

  const session = await window.ai.createTextSession();
  const reply   = await session.prompt("What is 2 + 2?");
  document.getElementById("out").textContent = reply;
};
</script>
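
The snippet above has no failure path. Below is a sketch of the same flow with error handling, written so it can be exercised with stand-in objects. It assumes that a declined permission or an unavailable model surfaces as a rejected promise; this guide does not state how denial is reported, so adjust if the API returns a status value instead. The askModel name and its result shape are illustrative.

```javascript
// agentApi and aiApi stand in for window.agent and window.ai.
// Assumption: failures (denied permission, no model) reject the promise.
async function askModel(agentApi, aiApi, question) {
  try {
    await agentApi.requestPermissions({
      scopes: ["model:prompt"],
      reason: "To answer your question",
    });
    const session = await aiApi.createTextSession();
    return { ok: true, text: await session.prompt(question) };
  } catch (err) {
    // Fall back to a readable message instead of leaving the UI empty.
    const detail = String(err?.message ?? err);
    return { ok: false, text: `Could not get an answer: ${detail}` };
  }
}
```

In the click handler, this would be used as const result = await askModel(window.agent, window.ai, "What is 2 + 2?"), followed by writing result.text into the output element in both the success and failure cases.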

3. Run an agent

Real agents create an explicit session in plan mode, propose what they want to do, and ask the user to upgrade to execute only when they have something to apply. The session carries a capability token that bounds everything the agent can do; when you upgrade, a new token is minted with the new actions.

agent.run remains gated by the toolCalling feature flag in the Web Agents API sidebar; flip it on for development.

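// Run this inside an async function or a <script type="module">;
// top-level await is not valid in a classic script.
const session = await agent.requestCapabilities({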
  name: "Research assistant",
  reason: "Research assistant needs search and memory.",
  mode: "plan",
  require: [
    { action: "model.prompt.local" },
    { action: "tool.call", server: "brave-search", toolNames: ["search"] },
    { action: "tool.call", server: "memory", toolNames: ["save"] },
  ],
  budget: { maxToolCalls: 30, ttlMinutes: 15 },
});

for await (const ev of agent.run({
  sessionId: session.id,
  task: "Find recent press on AI safety",
  maxToolCalls: 10,
})) {
  if (ev.type === "thinking")  console.log(ev.content);
  if (ev.type === "tool_call") console.log("→", ev.tool);
  if (ev.type === "final")     console.log(ev.output);
}

// Capability tokens are one-way: upgradeSession can narrow but not
// widen. To gain write authority later, mint a fresh session with a
// scoped, reason-tagged prompt the user can grant in one click:
const writeSession = await agent.requestCapabilities({
  name:    "Apply suggested edit",
  reason:  "Apply the changes you approved",
  mode:    "execute",
  require: [{ action: "browser.write.interact" }],
});
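
Logging each event is fine for a demo, but most pages will want to collect the stream into a result. The sketch below consumes any async iterable of events and handles only the three event types shown above (thinking, tool_call, final). The summarizeRun helper and the fakeRun generator are hypothetical; fakeRun only stands in for agent.run so the helper can be tried without the extension.

```javascript
// Consumes an async iterable of run events and returns a summary.
// Event types other than the three handled here are ignored.
async function summarizeRun(events) {
  const summary = { thoughts: [], toolsCalled: [], output: null };
  for await (const ev of events) {
    if (ev.type === "thinking") summary.thoughts.push(ev.content);
    else if (ev.type === "tool_call") summary.toolsCalled.push(ev.tool);
    else if (ev.type === "final") summary.output = ev.output;
  }
  return summary;
}

// Hypothetical stand-in for agent.run, for trying the helper offline.
async function* fakeRun() {
  yield { type: "thinking", content: "Planning a search" };
  yield { type: "tool_call", tool: "search" };
  yield { type: "final", output: "Three recent articles found" };
}
```

With the real API, the call would look like summarizeRun(agent.run({ sessionId: session.id, task: "..." })), assuming agent.run returns an async iterable as the for await loop above implies.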

4. Expose your own tools (W3C WebMCP)

Pages can declare JavaScript-backed tools the user's agent can call. No server, no auth — the function lives in the page.

navigator.modelContext.addTool({
  name:        "search_archive",
  description: "Search our 20-year archive",
  inputSchema: { type: "object", properties: { query: { type: "string" } } },
  handler:     ({ query }) => searchArchive(query),
});
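
The example above leaves searchArchive undefined and passes model-supplied arguments straight through. Here is a hedged sketch of a handler that validates its input first. The in-memory ARCHIVE, this searchArchive implementation, and the { results } / { error } return shape are all assumptions made for illustration; what a handler should return is not specified in this guide.

```javascript
// Hypothetical in-memory archive standing in for a real search backend.
const ARCHIVE = [
  { title: "Launch announcement", year: 2006 },
  { title: "Ten years of the archive", year: 2016 },
];

// Hypothetical implementation of the searchArchive helper used above:
// a case-insensitive substring match on titles.
function searchArchive(query) {
  const needle = query.toLowerCase();
  return ARCHIVE.filter((doc) => doc.title.toLowerCase().includes(needle));
}

// Validates arguments before searching, since a model may send
// missing or malformed input.
function searchArchiveHandler(args) {
  const query = typeof args?.query === "string" ? args.query.trim() : "";
  if (query === "") return { error: "query must be a non-empty string" };
  return { results: searchArchive(query) };
}
```

Passing searchArchiveHandler as the handler keeps the tool declaration unchanged while giving the agent a readable error instead of an exception when the query is missing.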

Going deeper

The full developer documentation lives in the repo. The starter guide is designed to drop into any AI coding assistant's context (Cursor, Claude Code, Copilot) so it can write Web Agent API code with you.

Tell us what you build

We want to see sites that demo the API, not just the API itself. If you build something, even a half-broken sketch, let us know. The architecture lives or dies on the use cases that show up.

raffi@mozilla.org · GitHub Discussions ↗