-->

Runtime for real-time, multimodal AI.

The shared workspace where full-duplex AI sees, hears, speaks, remembers, and acts alongside humans. One multiplayer surface, one permission model, one event stream. In production today, carrying paid customer workloads.

50 ms event propagation, p99
99.99%+ availability, measured
< 2 sec space cold-start, p99
Fig. 00 · The Spacebar canvas: humans, embedded applications, and AI participants working on a single synchronized surface, under one permission model.
01

The models are ready. They can reason, see, hear, and act. What they lack is a shared, stateful room to work alongside people.

A frontier model with no real surface to act on is an F1 driver in a rental car. The talent is not the constraint. The car is. A great car turns an average driver into a fast one — and an exceptional driver into a world champion.

Collaboration tools are everywhere; most now include AI. What they lack is a persistent, programmable surface where a human and an AI share context, see the same state, and act under the same rules. Notion holds the document; its AI is a sidebar. Zoom carries the conversation; its AI is a notetaker. The AI is always adjacent to the work, never intrinsic to it.

Spacebar is one synchronized space where every signal, every action, and every participant — human or model — flows into a single event stream. It is the F1 car.

Fig. 01 · One shared event stream is the single source of truth for the space. Every signal observable; every action addressable; humans and agents share context.
02

What the canvas is made of.

01

Spatial state

An infinite canvas that remembers exactly where everything is. Lay out documents, applications, browsers, whiteboard sketches, and video tiles where they belong, and they stay there. Close a space, come back a week later, and every object is exactly where it was left.

Fig. 02 · Persistent spatial state: objects persist between sessions.
02

Live presence

Cursors, voice, video, and screen share sit on the same surface as the work itself. Video tiles are objects in the space, not a frame layered over it. Nothing here is a meeting tool bolted on.

Workspace presence indicators showing Marilyn, Travis, and Rebecca online with video and mic icons plus two labeled cursors.
Fig. 03 · Presence lives on the work surface, not in chrome around it.
03

Embedded software

Open web apps, desktop applications, or virtual machines directly inside a space. Server-provisioned VMs streamed over WebRTC, no install. Any number of users sharing control of one live instance.

Fig. 04 · A server-hosted application, shared by default: one live instance, many participants.
04

Programmable runtime

Every object, event, and stream on the canvas is observable and addressable through one SDK. space.objects.add(…) to write, space.events.subscribe(…) to read. Any service or agent joins the same way a person does.

room.objects.add(…)
// every event observable
room.speak(stream)
// every action addressable
Fig. 05 · An SDK call, an addressable event, a canvas object. Actor: agent, permission: delegated.
05

Shared browser

A server-provisioned browser tab that the human and the agent operate simultaneously — not a remote desktop, not a Live View onto an agent-controlled session. Both parties click, scroll, and type on the same live instance. The agent receives a continuous video feed of the human's interaction, not discrete screenshots. That signal — mouse trajectories, scroll pauses, abandoned inputs — is the telemetry that screenshot-based agents never see.

Who drives
Spacebar: Human and agent simultaneously
Browserbase / Browser Use: Agent drives; human intervenes

What the agent sees
Spacebar: Continuous video + live DOM
Browserbase / Browser Use: Discrete screenshots or a11y tree

What the human feels
Spacebar: Native browsing
Browserbase / Browser Use: Remote control

The architecture difference sits above the headless-browser primitive, not within it. Spacebar uses infrastructure like that under the hood; the distinction is in the observation and control layer built on top. Full comparison with Browserbase →

Everything a real-time agent needs.

What a real-time multimodal AI needs | Where Spacebar already has it
01

To work multimodally (conversation, sight, action) on a single surface

Voice, video, cursor, gesture, screen share, every embedded browser, every canvas object — all on one surface, through one coherent interface. Not a stack of SDKs glued together.

Perceive
02

To see what humans are looking at, in real time

Live, structured access to every canvas object, cursor position, embedded browser, and video feed — all observable through one consistent interface.

03

To hear humans, including overlapping speech, while filtering out noise

Per-participant, server-side audio streams. Full-duplex. No “release the mic” turn-taking. Noise suppression and acoustic filtering applied per stream. STT provider configurable per session, not locked to one vendor.

04

To read embodied and social signals

Client-side perception streams: 478 face landmarks, 21 keypoints per hand, 20+ recognized gestures, attention score, engagement signals — all derived locally, in the browser. Only the results leave the device and are made available to the agent; the video stream follows standard WebRTC routing, as in any video call.

05

To read drawings and sketches as data, not just as images

Every stroke on the canvas is stored as a vector object — coordinates, shape, path — directly addressable through the same SDK as any other canvas element. Drawings are structured, addressable data from the moment of creation, not pixels that need interpretation.

Act
06

To act on the same surface as humans, through the same interface

A symmetric API: every canvas action a human takes is also an API call, through the same surface and the same permission model. Same interface, same rules, for any actor.

07

To act with bounded authority

An agent takes on the full persona of the user it represents. The same role assignments, space access, and device controls that bind a person bind it.

Persist
08

To remember — minutes, weeks, or months

Persistent space state, full journaled history, lossless replay: every event captured in sequence, used by the substrate to reconstruct context for the agent's memory. Short-term and long-term memory layers, active in the voice + vision agent.

Infrastructure
09

To run on any model, from any vendor

Pluggable STT, LLM, and TTS providers, selected per session. No single vendor is a hard dependency; when one degrades, new sessions route to another provider.

10

To respond fast enough that it feels like collaboration

50 ms event propagation, p99. The latency budget belongs to the model; the canvas adds effectively none.
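Item 05 above says strokes are stored as vector objects rather than pixels. As a rough illustration of what that buys an agent, the sketch below derives a stroke's bounding box directly from its path. The `Stroke` shape and its field names are assumptions made for this example, not Spacebar's actual object schema.

```typescript
// Hypothetical stroke shape; field names are illustrative, not Spacebar's schema.
interface Point { x: number; y: number; }
interface Stroke { id: string; kind: "stroke"; path: Point[]; }
interface BoundingBox { minX: number; minY: number; maxX: number; maxY: number; }

// Derive the bounding box directly from the stored path: no image analysis needed.
function boundingBox(stroke: Stroke): BoundingBox | null {
  if (stroke.path.length === 0) return null;
  let minX = Infinity, minY = Infinity, maxX = -Infinity, maxY = -Infinity;
  for (const p of stroke.path) {
    minX = Math.min(minX, p.x);
    minY = Math.min(minY, p.y);
    maxX = Math.max(maxX, p.x);
    maxY = Math.max(maxY, p.y);
  }
  return { minX, minY, maxX, maxY };
}

// Usage: a three-point stroke.
const exampleStroke: Stroke = {
  id: "stroke_1",
  kind: "stroke",
  path: [{ x: 1, y: 5 }, { x: 4, y: 2 }, { x: 3, y: 9 }],
};
console.log(boundingBox(exampleStroke)); // { minX: 1, minY: 2, maxX: 4, maxY: 9 }
```

The same idea extends to hit-testing or grouping nearby strokes, all without a vision model in the loop.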

03

Two modes. One substrate.

Spacebar supports two distinct agent modes — not as features, but as first-class deployment patterns. Real-time agents join live sessions and act in the moment. Always-on agents run between sessions, on triggers, and return to a Space rather than sending a notification. Both run on the same substrate. Both are in production today.

01 / Voice + vision agent

A multimodal assistant that joins live rooms.

The agent joins a Spacebar space over WebRTC as a first-class participant, on the same surface as everyone in the space. It runs a streaming voice-activity detector against the live per-participant audio streams, transcribes each participant’s speech in real time through a pluggable provider, and watches the canvas as a live visual stream. Board images are sampled at the cadence of speaking activity, so the multimodal frame budget follows the conversation rather than being exhausted during silence. It speaks back through a pluggable streaming text-to-speech provider into the shared audio mix, and retains session memory in two tiers.

Real-time audio: Per-participant full-duplex media server, managed
Vision stream: Board images + canvas snapshots, sampled per frame
Reasoning: Pluggable LLM provider
Memory: Short-term + long-term per user
Fig. 06 · What the agent sees: per-participant audio, sampled board images, LLM context, and memory tiers.
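The speech-paced sampling described above (board images captured at the cadence of speaking activity, none during silence) could be sketched roughly as follows. The class name, the minimum-interval parameter, and the boolean voice-activity input are all assumptions for illustration; the source does not describe the actual sampling policy.

```typescript
// A minimal sketch: spend frames only while someone is speaking,
// and never more often than `minIntervalMs`. All names are hypothetical.
class SpeechPacedSampler {
  private lastSampleAt = -Infinity;
  private minIntervalMs: number;

  constructor(minIntervalMs: number) {
    this.minIntervalMs = minIntervalMs;
  }

  // Returns true when a board image should be captured at `nowMs`.
  shouldSample(nowMs: number, someoneSpeaking: boolean): boolean {
    if (!someoneSpeaking) return false; // silence: keep the frame budget
    if (nowMs - this.lastSampleAt < this.minIntervalMs) return false;
    this.lastSampleAt = nowMs;
    return true;
  }
}

// Usage: at most one frame per second, only during speech.
const sampler = new SpeechPacedSampler(1000);
console.log(sampler.shouldSample(0, false));   // false (silence)
console.log(sampler.shouldSample(100, true));  // true  (first frame during speech)
console.log(sampler.shouldSample(500, true));  // false (inside the 1 s interval)
console.log(sampler.shouldSample(1100, true)); // true
```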
02 / Canvas-native assistant

24 Structured tools

Runs as a sidebar inside the canvas itself. The board image is piped into every iteration, so the model always has full visual context. Every tool routes through the same authorization layer a human action would pass through; the assistant cannot do what the user it represents is not permitted to do. It runs multi-step loops, observing the board, calling a tool, and receiving the result — until the requested change is complete or the loop depth is exceeded.

Boards: Create, duplicate, and organize boards. Set backgrounds, review and annotate content, and add generated material directly onto the canvas.
Discovery: Browse all boards and folders, build folder structures, and navigate to the current active board.
Rooms: List available rooms, create new ones, and assign participants to the right space.
Users: List all users in a space and see exactly who is currently active.
Device control: Force microphone and camera state for any participant or the entire space. Mute everyone but one participant. Or unmute an entire space. One call.
Embedded apps: Insert timers, stopwatches, and interactive widgets directly onto the canvas.
Generation & vision: Generate structured content onto the board and export board images to feed back into the model’s vision context.
Fig. 07 · The action loop: observe the board, call a structured tool through the shared authorization layer, write the result, observe again.
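The observe, call, receive cycle described for the canvas-native assistant can be sketched as a bounded loop. This is an illustration under stated assumptions: the `LoopDeps` interface, the tool-call shape, and the return values are hypothetical, and the real assistant would be asynchronous and model-driven rather than the synchronous stub shown here.

```typescript
// Hypothetical types; Spacebar's real tool interface is not shown in the source.
type ToolCall = { tool: string; args: Record<string, unknown> };
type Decision = { done: true } | { done: false; call: ToolCall };

interface LoopDeps {
  observe: () => string;                                                 // e.g. a board image reference
  decide: (observation: string, lastResult: string | null) => Decision; // the model
  authorize: (call: ToolCall) => boolean;                                // shared authorization layer
  execute: (call: ToolCall) => string;                                   // runs the tool, returns its result
}

type LoopOutcome = "complete" | "denied" | "depth_exceeded";

// Observe, decide, authorize, execute; stop when done or when depth runs out.
function runActionLoop(deps: LoopDeps, maxDepth: number): LoopOutcome {
  let lastResult: string | null = null;
  for (let depth = 0; depth < maxDepth; depth++) {
    const decision = deps.decide(deps.observe(), lastResult);
    if (decision.done) return "complete";
    if (!deps.authorize(decision.call)) return "denied"; // same gate a human action passes
    lastResult = deps.execute(decision.call);
  }
  return "depth_exceeded";
}
```

Every tool call passes through `authorize` before it runs, which is the point of the design: the loop has no path around the permission check.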

Both are production systems, not demos: evidence that a real-time multimodal agent can participate reliably in a live, multi-participant space. We built the substrate, scaled it under real customer load, and proved the cost structure holds. The numbers are measured, not modeled.

Most proactive agent products are headless monitoring loops with notification surfaces. The agent finishes and fires off a Slack message. That model works for tasks with a single output. It breaks down the moment the task is ongoing, ambiguous, or needs a human to pick up and continue it.

Spacebar is different: when your agent finishes, it has been working in a Space. You walk back into the room it was working in, with the canvas already laid out, the sources already open, and the draft already there. You don't read a summary of what happened. You see it.

Time trigger

Every morning at 7am, scan our RFP inbox and draft responses.

The agent opens the inbox, reads each RFP, checks the canvas for relevant past proposals, drafts a response, and leaves it pinned to the board — ready for your first review when you arrive.

Event trigger

Customer NPS drops below threshold — start the retention playbook.

The agent opens a Space, pulls the account history from the CRM, drafts a personalized outreach sequence, and flags the three most at-risk accounts with recommended next actions. No prompt required.

Layered

Monitored 12 supplier pages overnight. Flagged 3 changes. Drafted outreach for the one that matters.

You asked it to watch. It watched. When it found something real — a 7% price increase on a critical component — it logged it on the canvas, cross-referenced your contract terms, and drafted a response for your approval.

The handoff. When the agent comes back to you, it is not a notification. It is a Space with the work already laid out — sources open, canvas annotated, next steps visible. Every other proactive agent platform sends you a summary. Spacebar puts you back in the room it was working in.

04

Where the time goes.

Fig. 08 · where the time goes

Turn-based AI in a browser tab, prompt to response, p50: ~2,500 ms
Model inference and network, typical, p50: ~600 ms
Spacebar event propagation, p99: 50 ms

The first two are p50 estimates, for context. The 50 ms is Spacebar’s measured p99: even the worst case clears the others’ typical case by more than an order of magnitude (600 / 50 = 12×; 2,500 / 50 = 50×). Whatever a user waits on, almost none of it is the substrate.

05

Built for an agent to live in.

Building a system that holds live video, an embedded browser, a shared document, and a collaborative whiteboard in one synchronized space — all of it surviving a dropped connection — took five years. It is the infrastructure a real-time agent needs to see, hear, act, and remember alongside people, without adding latency, context loss, or broken permissions.

Native

CRDT engine

A custom CRDT engine on the hot path, computing minimum binary deltas from a client-supplied state vector. Runs as a compiled native service with multi-threaded execution, process-isolated from socket I/O to keep the hot path fast under load.
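To make "deltas from a client-supplied state vector" concrete, here is a minimal sketch of the general technique, assuming an operation-log CRDT where each op carries a per-client clock. The types and function names are hypothetical, and the sketch returns plain objects rather than the binary encoding the engine produces.

```typescript
type ClientId = string;
interface Op { client: ClientId; clock: number; payload: string; }

// For each client, how many of that client's ops the requester already holds
// (equivalently, the next clock value it expects from that client).
type StateVector = Map<ClientId, number>;

// Return only the ops the requester is missing.
function delta(log: readonly Op[], remote: StateVector): Op[] {
  return log.filter(op => op.clock >= (remote.get(op.client) ?? 0));
}

// Summarize a log as a state vector.
function stateVectorOf(log: readonly Op[]): StateVector {
  const sv: StateVector = new Map();
  for (const op of log) {
    sv.set(op.client, Math.max(sv.get(op.client) ?? 0, op.clock + 1));
  }
  return sv;
}

// Usage: the server holds three ops; the client has only seen a's first op.
const serverLog: Op[] = [
  { client: "a", clock: 0, payload: "add note" },
  { client: "a", clock: 1, payload: "move note" },
  { client: "b", clock: 0, payload: "draw stroke" },
];
const clientVector: StateVector = new Map([["a", 1]]);
console.log(delta(serverLog, clientVector).map(op => op.payload)); // [ 'move note', 'draw stroke' ]
```

A client that is fully caught up sends a vector equal to `stateVectorOf(serverLog)` and receives an empty delta.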

Dedicated

Worker pool

CRDT apply and encode operations run on a dedicated worker pool, sized to leave headroom on the main loop for socket I/O.

Layered

Cache

Hot in-memory state, backed by a shared cache, backed by versioned durable storage. Versioning invalidates the hot tier whenever a snapshot lands: warm-start by design.
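One way to picture the three tiers and the version-driven invalidation is the sketch below. It is an assumption-laden illustration: plain in-process `Map`s stand in for the shared cache and durable storage, and the class and method names are hypothetical.

```typescript
interface Versioned<T> { version: number; value: T; }

class LayeredStore<T> {
  private hot = new Map<string, Versioned<T>>();     // per-process memory
  private shared = new Map<string, Versioned<T>>();  // stand-in for a shared cache
  private durable = new Map<string, Versioned<T>>(); // stand-in for versioned storage

  // Read through hot, then shared, then durable, warming the tiers above a hit.
  get(key: string): T | undefined {
    const hotHit = this.hot.get(key);
    if (hotHit) return hotHit.value;
    const hit = this.shared.get(key) ?? this.durable.get(key);
    if (!hit) return undefined;
    this.shared.set(key, hit);
    this.hot.set(key, hit);
    return hit.value;
  }

  // A snapshot lands: bump the version, persist it, and drop the stale hot entry.
  landSnapshot(key: string, value: T): number {
    const version = (this.durable.get(key)?.version ?? 0) + 1;
    const entry: Versioned<T> = { version, value };
    this.durable.set(key, entry);
    this.shared.set(key, entry);
    this.hot.delete(key);
    return version;
  }

  isHot(key: string): boolean {
    return this.hot.has(key);
  }
}
```

After `landSnapshot`, the next `get` misses the hot tier, reads the new version from the shared tier, and re-warms: the warm-start behavior the card describes.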

Async

Compaction

Snapshot compaction runs out-of-band with cooldown to prevent cascading recompactions. The hot path never blocks on snapshot work. That is how the 50 ms p99 holds under load.
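The cooldown idea can be sketched as a small per-space gate: a compaction may start only if enough time has passed since the last one for that space. The scheduler class and the cooldown value are assumptions for illustration; the source does not give the actual policy or timings.

```typescript
// A minimal sketch of a per-space compaction cooldown. Names are hypothetical.
class CompactionScheduler {
  private lastRunAt = new Map<string, number>();
  private cooldownMs: number;

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs;
  }

  // Returns true (and records the run) if compaction for `spaceId` may start at `nowMs`.
  tryStart(spaceId: string, nowMs: number): boolean {
    const last = this.lastRunAt.get(spaceId);
    if (last !== undefined && nowMs - last < this.cooldownMs) return false;
    this.lastRunAt.set(spaceId, nowMs);
    return true;
  }
}

// Usage: a burst of triggers inside the cooldown collapses into one compaction.
const scheduler = new CompactionScheduler(30_000);
console.log(scheduler.tryStart("space_a", 0));      // true
console.log(scheduler.tryStart("space_a", 5_000));  // false (cooling down)
console.log(scheduler.tryStart("space_b", 5_000));  // true  (independent space)
console.log(scheduler.tryStart("space_a", 30_000)); // true
```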

Distributed

Ownership lock

Per-space ownership locks ensure one server compacts a given space at a time; ownership tracking enables failover detection. The mechanism behind reliable sharded sessions.
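A lease with an expiry is one common way to get both properties named here: single ownership and failover detection. The sketch below assumes that approach; it is not confirmed by the source. It also keeps the lease table in process memory, whereas a real deployment would need a shared store with atomic compare-and-set.

```typescript
interface Lease { owner: string; expiresAt: number; }

// A minimal, in-process sketch of per-space ownership leases. Names are hypothetical.
class OwnershipTable {
  private leases = new Map<string, Lease>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Acquire or renew: succeeds if unowned, expired, or already held by `serverId`.
  acquire(spaceId: string, serverId: string, nowMs: number): boolean {
    const lease = this.leases.get(spaceId);
    if (lease && lease.owner !== serverId && lease.expiresAt > nowMs) return false;
    this.leases.set(spaceId, { owner: serverId, expiresAt: nowMs + this.ttlMs });
    return true;
  }

  // Current owner, or null if the lease lapsed (the signal used to detect failover).
  ownerOf(spaceId: string, nowMs: number): string | null {
    const lease = this.leases.get(spaceId);
    return lease && lease.expiresAt > nowMs ? lease.owner : null;
  }
}
```

If the owning server stops renewing, `ownerOf` returns null once the lease expires and another server's `acquire` succeeds.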

Self-healing

Desync detection

Client and server each maintain expectations about the next updates; deviation is detected within milliseconds and the system recovers without losing a single state update. An immutable mutation log captures every create, update, and delete.
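A common way to implement "expectations about the next update" is a sequence number per update, with gaps repaired by replaying from the log. The sketch below assumes that scheme; the source does not spell out the actual mechanism, and all names are hypothetical.

```typescript
interface Update { seq: number; data: string; }
type ReceiveResult = "applied" | "duplicate" | "gap";

// A minimal sketch of gap detection and log-based recovery.
class SequencedReceiver {
  private nextSeq = 0;
  readonly applied: string[] = [];

  // Apply an update only if it is exactly the one expected next.
  receive(update: Update): ReceiveResult {
    if (update.seq < this.nextSeq) return "duplicate";
    if (update.seq > this.nextSeq) return "gap"; // desync detected
    this.applied.push(update.data);
    this.nextSeq++;
    return "applied";
  }

  // Recover by replaying the mutation log in order; duplicates are skipped.
  recover(log: readonly Update[]): void {
    const ordered = [...log].sort((a, b) => a.seq - b.seq);
    for (const update of ordered) this.receive(update);
  }
}

// Usage: update 1 is lost in transit, detected when update 2 arrives.
const mutationLog: Update[] = [
  { seq: 0, data: "create" },
  { seq: 1, data: "update" },
  { seq: 2, data: "delete" },
];
const receiver = new SequencedReceiver();
console.log(receiver.receive({ seq: 0, data: "create" })); // applied
console.log(receiver.receive({ seq: 2, data: "delete" })); // gap
receiver.recover(mutationLog);
console.log(receiver.applied); // [ 'create', 'update', 'delete' ]
```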

06

The model layer is converging.
The substrate isn't.

Model capability is converging: real-time APIs from Google and OpenAI, computer-control surfaces from Anthropic, tool-using agents everywhere. The substrate beneath them is not.

A real-time agent needs a persistent, permissioned, multiplayer surface to see, hear, act, and remember alongside people. Building one is a multi-year systems project. The opportunity cost is real: every month spent building this layer is a month not spent on your actual product. Spacebar is that surface, in production. The question is whether you build it or build on it.

What you need | What Spacebar provides | What you'd need to build
Multiplayer persistent canvas: CRDT state, snapshots, replay | Available now | 12-24 mo
Per-participant full-duplex audio + video over WebRTC | Available now | 6-12 mo
Server-hosted shared browser, doc, and app surface inside the canvas | Available now | 12+ mo
Symmetric agent action API: shared surface, shared permission model, per-actor scoping | Available now | 6+ mo
Short-term + long-term memory tiers, per participant | Available now | 3-6 mo
Scoped agency: agents inherit human permissions, no privilege escalation | Available now | 3+ mo
Enterprise compliance controls: SOC 2, HIPAA, GDPR | Available now | 12+ mo
Real-user operating data: per-session cost, failure modes, latency, and verified throughput (10,000+/sec across spaces) | Available now | Unavailable
Estimates assume production-grade implementation: hardened, redundant, and operationally stable under real load.

You will need most of this. Even with a large team working in parallel, building it is at least two years of work. Build on Spacebar, and free that time for your product.

The hard problems in applied AI have moved. The model is rarely the bottleneck now — it's everything around it: state, permissions, the surface a human and an agent actually share. Spacebar is built for exactly this.

[Placeholder Name][Title], [Company]

Most “agentic” demos fall apart the moment they meet a real, multi-party session. What's notable here is that it is already running in production, with the operating data to show for it.

[Placeholder Name][Title], [Company]

We evaluated building the real-time layer ourselves and stopped counting at eighteen months. One SDK, one permission model, one event stream — that is the part nobody should be rebuilding.

[Placeholder Name][Title], [Company]

Putting a human and an agent on the same surface, under the same rules, sounds obvious until you try to build it. This is the first substrate I have seen that actually treats them as equals.

[Placeholder Name][Title], [Company]
07

A space any application, service, or agent can join.

Every object, event, and stream on the Spacebar canvas is observable, addressable, and actionable through one coherent SDK. Build an app, an integration, a service, or an autonomous agent — each joins the canvas the same way a person would. The protocol is symmetric: human and agent reach the same surface through the same API. Any permission a human holds, an agent can hold. The system draws no distinction between what is available to a person and what is available to a program.

// Make any web component multiplayer in a Spacebar space.
// @pncl/mario — Spacebar's real-time SDK
import { MarioClient } from "@pncl/mario";

const space = await MarioClient.join("space_8KXq...");
space.bind(myComponent);

// every state change now syncs to everyone in the space.
// presence, CRDT conflict resolution, cursors, undo/redo,
// version history, and snapshot recovery: free.

// Observe everything happening in a space, in real time.
import { MarioClient } from "@pncl/mario";

const space = await MarioClient.join("space_8KXq...");

space.events.subscribe(event => {
  // event.kind: "cursor" | "draw" | "speak" | "type"
  //             | "object.add" | "object.move" | ...
  // event.actor, event.timestamp, event.payload
});

// Drive the canvas the same way a human would.
// Reuses the `space` handle returned by MarioClient.join(...) above.
await space.objects.add({
  kind: "stickyNote",
  x: 320, y: 480,
  text: "Try this approach instead."
});

await space.objects.move("obj_a91...", { x: 600, y: 480 });
await space.speak({ stream: ttsStream });
await space.browser.type("doc_b14...", "Hello.");

An agent built against this SDK is not integrated into the canvas. It is a participant in it, with the same reach and the same limits as the person sitting next to it.

08

Connect anything. In minutes, not months.

Most real-time AI platforms are rigid: the surface they ship is the surface you get. Spacebar is designed to be extended. Everything in a space — presence, state, events, audio, browser, memory — is observable and writable through the same SDK your agent uses. Adding a connector is adding a participant that speaks a specific protocol. That is all it is.

A model deployed on Spacebar reaches the systems your customers use. External agents connect through the protocol they already speak. Wherever the work goes — desktop, mobile, any browser — the substrate follows.

MCP server

native

Any agent that speaks MCP can speak Spacebar out of the box. The server exposes the full operational context (users, spaces, sessions, recordings, transcripts, presence, audit logs, billing, and scheduling availability) through standard MCP. No custom integration code required.

SDK + REST API

extensible

Every connector on this list was built using the same public SDK and REST API available to you. If a connector does not exist yet, building one takes hours, not weeks. The event model is uniform: subscribe to anything, write to anything, from any runtime.

Webhooks

HMAC-signed

HMAC-signed outbound events on session, recording, and presence state. Drop a URL and start receiving structured payloads immediately — no polling, no SDK required on the receiving end.
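On the receiving end, verifying an HMAC-signed payload usually looks something like the sketch below. The algorithm (HMAC-SHA256), the hex encoding, and the idea that the signature arrives alongside the raw body are assumptions; the source does not specify them, so check the actual webhook documentation before relying on any of these details.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Assumptions: HMAC-SHA256 over the raw request body, hex-encoded signature.
// The source does not state the algorithm, encoding, or header name.
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  const received = Buffer.from(signatureHex, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}

// Usage: sign a payload the way a sender would, then verify it.
const examplePayload = '{"event":"session.started"}';
const exampleSignature = createHmac("sha256", "example-secret")
  .update(examplePayload, "utf8")
  .digest("hex");
console.log(verifyWebhook(examplePayload, exampleSignature, "example-secret")); // true
console.log(verifyWebhook(examplePayload + " ", exampleSignature, "example-secret")); // false
```

Verify against the raw bytes of the request body, before any JSON parsing, since re-serialization can change the bytes and invalidate the signature.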

Meeting adapters

bots

Adapter bots join the meeting tool your team already uses — Zoom, Meet, Teams — bringing the canvas and the agent surface with them. Your AI joins the call as a participant, not a sidebar.

Browser extensions

Chrome · Firefox

Surface a Spacebar sidebar on any webpage. Useful for building context-aware agents that work alongside users in their existing browser workflows, without redirecting them to a new URL.

Calendar

Google · Outlook · Office 365

Bi-directional session sync. Sessions appear on the user's calendar; calendar events can trigger space creation. Two lines of configuration, not two weeks of integration.

Mobile

iOS · Android

iPhone and iPad with native screen capture over WebRTC. Full canvas on Android through Chrome or any modern browser. Browser-based delivery is intentional: browsers are automatable, which matters for agent integrations running on mobile surfaces.

09

What it costs, measured in production.

Pencil Spaces, our own platform built entirely on Spacebar, has run paid customer workloads for over four years. The numbers are measured, not estimated: per-session cost, failure modes, cost structure, and uptime. Across that four-year window, availability has held above 99.99%. Status page →

Technical evaluators: reach out to partnerships@spacebar.ai for the full data package.

Spacebar monetizes as infrastructure: usage-based pricing for builders, enterprise licensing for deployers, and trajectory data licensing for frontier labs. Working with frontier labs on training data? See spacebar.ai/labs →

99.99%+ Measured availability
50 ms p99 Event propagation
200,000,000+ Minutes served, to date
Compliance
SOC 2 · HIPAA · GDPR

Scoped agency. An agent takes on the full persona of the user it represents and nothing more. The same role assignments, space access, and device controls that bind a person bind it. The authorization model prevents privilege escalation: an agent cannot exceed its delegated permissions.
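The no-escalation rule can be illustrated with a small check: an agent may perform an action only if it was delegated that action and the user it represents holds it. The types, the permission strings, and the intersection rule itself are assumptions for this sketch, not Spacebar's actual authorization model.

```typescript
// Hypothetical shapes; permission names are made up for illustration.
interface User { id: string; permissions: ReadonlySet<string>; }
interface Agent { id: string; actsFor: User; delegated: ReadonlySet<string>; }

// An agent's reach is bounded by both its delegation and its user's own permissions.
function canPerform(agent: Agent, action: string): boolean {
  return agent.delegated.has(action) && agent.actsFor.permissions.has(action);
}

// Usage: the user cannot mute others, so neither can the agent, even if delegated.
const exampleUser: User = { id: "user_1", permissions: new Set(["objects.add", "objects.move"]) };
const exampleAgent: Agent = {
  id: "agent_1",
  actsFor: exampleUser,
  delegated: new Set(["objects.add", "device.mute"]),
};
console.log(canPerform(exampleAgent, "objects.add"));  // true
console.log(canPerform(exampleAgent, "device.mute"));  // false (user lacks it)
console.log(canPerform(exampleAgent, "objects.move")); // false (not delegated)
```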

Tenant isolation

Terraform-managed regional infrastructure: GKE, object storage, and cache. JWT-scoped real-time channels, validated at every message, not just at connection time. Perception models run in-browser: face landmarks and gesture signals are derived locally and only the results leave the device. The video stream follows standard WebRTC routing.

10

Built to your specification.

Not every use case fits the standard platform. When yours does not, we build to specification.

For example — demos

A purpose-built space for demonstrating your AI.

A controlled environment for showing your system to investors, customers, or conference audiences. Custom canvas layout, branded, silent by default — only what you want visible, nothing you do not. Live in front of anyone.

Custom layout · Branded · Audience-ready
For example — consults

A structured session environment for high-stakes conversations.

An AI agent participates in a real consultation alongside a professional — seeing the same documents, hearing the same conversation, assisting in real time. Role-scoped, fully auditable, HIPAA-ready on request. Built for the constraints of regulated industries.

Role-scoped · Auditable · HIPAA-ready

Every custom build starts with a conversation. Tell us what you need →

11

Frequently asked questions

What exactly is Spacebar?
Who is Spacebar built for?
How is this different from just building on top of an LLM API?
Is Spacebar a product or infrastructure?
What does "full-duplex" mean in this context?
What is the event stream?
How does the permission model work?
What is the latency of the event stream?
Can agents run without a human present?
What triggers can activate a proactive agent?
How does Spacebar compare to Dust?
What infrastructure does Spacebar run on?
What compliance certifications does Spacebar hold?
How do I get started building on Spacebar?
What does custom development look like?
12

Get in touch.

Tell us what you’re building and we’ll route you to the right person. Or reach out directly. Whichever is easiest for you. If your use case calls for something beyond the standard platform, we build to specification.

We’ll get back to you quickly.

Prefer a different channel? Every contact option reaches a real person.

Call or text: (650) 550-9341
Book a time: Find a slot →
Organizations
sales@spacebar.ai · Deployment, procurement, support
Developers
engineering@spacebar.ai · Runtime, SDK, MCP server
Labs & partners
partnerships@spacebar.ai · Research and infrastructure partnerships
13

The people who built it

Spacebar was built by the same team behind Pencil Spaces — four years of production at scale, carrying real customer workloads. The substrate was not designed in theory.

Co-founder & CEO

Ayush Agarwal

Head of Product, Map Ads at Google. Head of Enterprise Products at Meta. Venture investing at Madrona. McKinsey.

LinkedIn
Co-founder & CTO

Amogh Asgekar

Senior Staff Software Engineer at Google, leading systems handling hundreds of thousands of queries per second at low latency. IIT Bombay MTech.

LinkedIn
Head of Product

Imran Ahmed

McKinsey & Company, QuantumBlack. MEng with distinction, University of Cambridge. Cambridge–MIT Exchange in Computer Science.

LinkedIn
COO

Swati Khandelwal

Operations and growth across the full Pencil Learning Technologies portfolio. Four years scaling Spacebar and Pencil Spaces from zero to production.

LinkedIn