Spacebar for AI Labs

The RL gym where agents and humans actually work together.

Every collaborative session in a Spacebar Space is, by construction, a labeled multi-turn trajectory: voice, canvas, browser interaction, tool use, and natural recovery — captured as a byproduct of real work. The bottleneck for frontier model training isn't compute. It's exactly this kind of data.

The data gap

What the papers say — and what doesn't exist yet.

The Salesforce APIGen-MT paper opens by stating directly that high-quality data capturing realistic human-agent dynamics is scarce and expensive to collect manually. PC Agent-E identifies computer-use trajectories as a critical bottleneck. Fireworks's multi-turn RL guide is blunt: supervised fine-tuning on golden trajectories breaks down because the model never sees recovery paths from failure.

That recovery data exists only in sessions where humans and agents work together on real tasks, in a real multimodal environment, with the freedom to make mistakes and correct them. Spacebar generates exactly that as a byproduct of collaboration.

What's undertrained

Long-horizon, multimodal, with natural recovery paths.

Sessions span voice, canvas, browser, and chat. They include ambiguity, clarification turns, mistakes, and corrections — exactly the failure modes current frontier models undertrain.

APIGen-MT · tau-bench · BFCL

What synthetic can't provide

The high-realism anchor distribution that synthetic pipelines need.

Synthetic data gets you a long way. Spacebar doesn't replace it — it provides the real distribution that synthetic pipelines use as a seed and validation set.

Fireworks multi-turn RL guide · PC Agent-E

Market signal

The data-labeling market is shifting toward specialist interaction data.

The move is toward high-realism, multimodal, human-agent interaction data. Spacebar generates it as a byproduct of real work — not in a lab, not synthetically.

Trajectory schema

What a Spacebar trajectory actually contains.

Every session is replayable frame by frame and exportable in formats compatible with standard RL training pipelines. Here is the field-level schema.

Field (type) and description

voice_turns (audio[])

Timestamped voice turns per participant: raw audio (WebM/Opus), ASR transcript, speaker identity (role-scoped), turn boundaries, and VAD confidence. Overlapping speech captured separately per stream, not mixed.

canvas_state (events[])

Full CRDT event log: every object create, update, delete, and move with precise timestamps and actor identity. Vector-level fidelity: strokes, shapes, and annotations are structured data, not raster images. Replayable to any point in session history.

browser_interaction (video)

Continuous video of the human's browser session, not screenshots and not accessibility tree snapshots. Mouse trajectories, scroll behavior, hover states, click positions, and abandoned inputs visible. Synchronized with canvas and voice streams by wall-clock timestamp.

tool_invocations (calls[])

Every tool call: name, input parameters, execution result, latency, and whether the result was accepted, modified, or retried. Full loop structure (observe → call → result → observe again) preserved at call-graph level.

session_metadata (object)

Session ID, participant roles (not PII), duration, space configuration, model provider, memory tier states at session start and end, consent tier, and licensing classification.

perception_events (signals[])

In-browser perception stream results: face landmark positions (478 points), hand keypoints (21 per hand), gesture classifications, attention score, and engagement signal. These are derived locally and only the results are transmitted. Synchronized with all other streams.
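To make the schema more concrete, here is a rough sketch of how three of the streams might be modeled and aligned on a shared wall-clock timeline. Every class name, attribute name, and helper here (`Trajectory`, `merge_timeline`, `replay_canvas`) is an assumption made for illustration and not Spacebar's actual export format. The replay logic is also simplified: it orders events by timestamp alone, whereas a real CRDT log would need its own merge rules.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class VoiceTurn:
    t_start: float            # wall-clock seconds (assumed unit)
    t_end: float
    speaker_role: str         # role-scoped identity, not a user ID
    transcript: str
    vad_confidence: float


@dataclass
class CanvasEvent:
    t: float
    actor_role: str
    op: str                   # "create" | "update" | "delete" | "move"
    object_id: str
    payload: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ToolInvocation:
    t: float
    name: str
    params: Dict[str, Any]
    result: Any
    latency_ms: float
    outcome: str              # "accepted" | "modified" | "retried"


@dataclass
class Trajectory:
    session_id: str
    voice_turns: List[VoiceTurn] = field(default_factory=list)
    canvas_state: List[CanvasEvent] = field(default_factory=list)
    tool_invocations: List[ToolInvocation] = field(default_factory=list)


def merge_timeline(traj: Trajectory) -> List[Tuple[float, str, Any]]:
    """Interleave the streams into (timestamp, stream_name, record) tuples."""
    items = (
        [(v.t_start, "voice_turns", v) for v in traj.voice_turns]
        + [(e.t, "canvas_state", e) for e in traj.canvas_state]
        + [(c.t, "tool_invocations", c) for c in traj.tool_invocations]
    )
    # Sort on the timestamp only; records themselves are not comparable.
    return sorted(items, key=lambda item: item[0])


def replay_canvas(events: List[CanvasEvent], until: float) -> Dict[str, Dict[str, Any]]:
    """Rebuild the live canvas objects as of time `until` from the event log."""
    objects: Dict[str, Dict[str, Any]] = {}
    for e in sorted(events, key=lambda ev: ev.t):
        if e.t > until:
            break
        if e.op == "delete":
            objects.pop(e.object_id, None)
        else:
            # create, update, and move all merge their payload into the object
            objects.setdefault(e.object_id, {}).update(e.payload)
    return objects


traj = Trajectory(
    session_id="demo",
    voice_turns=[VoiceTurn(0.0, 2.5, "human", "Move the box to the left.", 0.97)],
    canvas_state=[
        CanvasEvent(1.0, "human", "create", "box-1", {"x": 100, "y": 40}),
        CanvasEvent(3.2, "agent", "move", "box-1", {"x": 20}),
        CanvasEvent(5.0, "human", "delete", "box-1"),
    ],
    tool_invocations=[
        ToolInvocation(3.0, "canvas.move", {"id": "box-1"}, "ok", 120.0, "accepted")
    ],
)
timeline = merge_timeline(traj)
```

Calling `replay_canvas(traj.canvas_state, until=4.0)` on the demo session returns the box at its moved position, and replaying past the delete event returns an empty canvas.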

Competitive landscape

How this compares to other data pipelines.

Frontier labs currently have three options for human-agent interaction data. Here is what each actually provides.

Human behavior

Spacebar: Natural; humans are using a real product, not performing for a system
Scale / Surge / Mercor: Structured but performed; annotators following task scripts
Synthetic (APIGen-MT style): Plausible but not real; model-generated approximations of human behavior

Modalities

Spacebar: Voice + canvas + browser video + tool calls + perception signals, all synchronized
Scale / Surge / Mercor: Typically text or structured annotation; some include screen recordings
Synthetic (APIGen-MT style): Text-only or text + tool call; no video, no voice, no canvas

Recovery paths

Spacebar: Naturally occurring; mistakes, corrections, and abandoned attempts captured as they happen
Scale / Surge / Mercor: Can be scripted but expensive; rarely includes genuine error recovery
Synthetic (APIGen-MT style): Structurally absent; synthetic pipelines optimize for golden paths, not failure modes

Session length

Spacebar: Long-horizon; full work sessions, not isolated task clips
Scale / Surge / Mercor: Typically task-scoped; multi-turn but bounded
Synthetic (APIGen-MT style): Bounded by prompt length and generation cost

Collection mechanism

Spacebar: Byproduct of real product usage; no separate collection pipeline
Scale / Surge / Mercor: Dedicated annotation workforce; separate ingestion pipeline
Synthetic (APIGen-MT style): Automated generation; no human labor after initial design

Strategic use

Spacebar: High-realism anchor and validation set; seed for synthetic pipelines
Scale / Surge / Mercor: Large-volume training data; good coverage but limited behavioral depth
Synthetic (APIGen-MT style): Scalable volume at low cost; known distribution bias

Data governance

Consent, licensing, and what we will and won't do.

Frontier labs will not touch data whose consent and licensing status is unclear. Here is where we stand on each.

Consent model

Every trajectory is collected under explicit user consent at the session level. The consent flow specifies exactly what data is captured, how it will be used, and the licensing tier. Users can mark sessions as private (never used for training), internal (available for their organization's own fine-tuning), or licensed (available to approved third parties under signed agreements). No session data is used for training without affirmative consent.

Licensing tiers

Session-level licensing with three tiers: private, internal, and licensed. Licensed sessions are available to approved third parties under executed data licensing agreements. Agreements specify permitted use, territory, sublicensing rights, and audit access. We do not license data to parties we cannot audit.
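The three tiers map naturally onto an access check. The following is a minimal sketch, assuming a hypothetical `Session` record with `owner_org` and `consent_tier` fields and a hypothetical set of approved licensee names. None of these names come from Spacebar's actual system. The sketch also assumes that a requester using a licensed session must appear in the approved set, including the owner organization.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Tier(Enum):
    PRIVATE = "private"      # never used for training
    INTERNAL = "internal"    # the owner organization's own fine-tuning only
    LICENSED = "licensed"    # approved third parties under signed agreements


@dataclass(frozen=True)
class Session:
    session_id: str
    owner_org: str
    consent_tier: Optional[Tier]   # None means no affirmative consent on record


def may_train_on(session: Session, requester_org: str,
                 approved_licensees: FrozenSet[str]) -> bool:
    """Return True only if the session's tier permits training use by requester_org."""
    tier = session.consent_tier
    if tier is None or tier is Tier.PRIVATE:
        return False                                   # default is always "no"
    if tier is Tier.INTERNAL:
        return requester_org == session.owner_org
    return requester_org in approved_licensees         # Tier.LICENSED


approved = frozenset({"lab-a"})
licensed = Session("s1", "acme", Tier.LICENSED)
internal = Session("s2", "acme", Tier.INTERNAL)
private = Session("s3", "acme", Tier.PRIVATE)
unconsented = Session("s4", "acme", None)
```

The design choice worth noting is that the function denies by default: a session with no recorded consent is treated the same as a private one.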

PII handling

Voice transcripts are run through PII detection before delivery. Canvas state is delivered with user identity replaced by role-scoped identifiers. Raw audio and video are never transmitted without explicit per-session consent. Perception signals (face landmarks, gesture data) are derived in-browser and only the computed results are transmitted — raw video never leaves the device unless a session is explicitly licensed for video delivery.
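One way to picture role-scoped identifiers: each user ID is swapped for a label built from the participant's role plus a per-session counter. The sketch below illustrates that idea only. The field names (`actor_id`, `role`, `actor`) and the label format are assumptions, not a description of Spacebar's implementation.

```python
from typing import Dict, Iterable, List


def to_role_scoped(events: Iterable[dict]) -> List[dict]:
    """Replace each event's `actor_id` with a role-scoped label such as 'human-1'."""
    labels: Dict[str, str] = {}      # real actor_id -> role-scoped label
    counters: Dict[str, int] = {}    # role -> number of distinct actors seen so far
    out: List[dict] = []
    for ev in events:
        actor = ev["actor_id"]
        if actor not in labels:
            role = ev["role"]
            counters[role] = counters.get(role, 0) + 1
            labels[actor] = f"{role}-{counters[role]}"
        # Copy the event without the real ID, then attach the scoped label.
        scrubbed = {k: v for k, v in ev.items() if k != "actor_id"}
        scrubbed["actor"] = labels[actor]
        out.append(scrubbed)
    return out


events = [
    {"actor_id": "u_8431", "role": "human", "op": "create"},
    {"actor_id": "agent_7", "role": "agent", "op": "move"},
    {"actor_id": "u_8431", "role": "human", "op": "delete"},
    {"actor_id": "u_1200", "role": "human", "op": "update"},
]
scoped = to_role_scoped(events)
```

In this sketch the mapping is rebuilt on every call, so a label stays stable within one session's event list but carries no meaning across sessions.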

Purpose limitation

Licensed trajectories may be used only for the purposes specified in the agreement. Repurposing, sublicensing without consent, and use for identity inference are prohibited. Labs receive structured trajectory files, not access to our systems. We maintain a complete audit log of all data deliveries.

Redaction pipeline

Before any trajectory is made available for licensing, it passes through a redaction pipeline: PII scrubbing on transcripts, replacement of identifying metadata, removal of any canvas objects flagged as sensitive by the session owner, and quality review. Labs can request additional redaction passes under agreement.
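The automated stages described above compose as a chain of functions over a trajectory record. The sketch below is illustrative only. The two regular expressions catch just simple email addresses and phone-like digit runs and stand in for a real PII detector. The dictionary keys (`voice_turns`, `transcript`, `metadata`, `canvas_objects`, `sensitive`) and the `IDENTIFYING_KEYS` set are assumed for the example. Quality review is a human step and is not modeled.

```python
import re
from typing import Callable, List

# Deliberately simple patterns; a production PII detector would be far broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

# Hypothetical metadata keys treated as identifying.
IDENTIFYING_KEYS = {"owner_email", "owner_name"}


def scrub_transcripts(traj: dict) -> dict:
    """Mask emails and phone-like numbers in every voice transcript."""
    turns = [
        dict(t, transcript=PHONE.sub("[PHONE]", EMAIL.sub("[EMAIL]", t["transcript"])))
        for t in traj["voice_turns"]
    ]
    return {**traj, "voice_turns": turns}


def replace_identifying_metadata(traj: dict) -> dict:
    """Swap identifying metadata values for a placeholder."""
    meta = {
        k: ("[REDACTED]" if k in IDENTIFYING_KEYS else v)
        for k, v in traj["metadata"].items()
    }
    return {**traj, "metadata": meta}


def drop_flagged_canvas_objects(traj: dict) -> dict:
    """Remove canvas objects the session owner flagged as sensitive."""
    kept = [o for o in traj["canvas_objects"] if not o.get("sensitive", False)]
    return {**traj, "canvas_objects": kept}


PIPELINE: List[Callable[[dict], dict]] = [
    scrub_transcripts,
    replace_identifying_metadata,
    drop_flagged_canvas_objects,
]


def redact(traj: dict) -> dict:
    """Run every stage in order; each stage returns a new record."""
    for stage in PIPELINE:
        traj = stage(traj)
    return traj


raw = {
    "voice_turns": [
        {"speaker": "human-1",
         "transcript": "Email me at jo@example.com or call 415-555-0100."}
    ],
    "metadata": {"session_id": "s1", "owner_email": "jo@example.com", "duration_s": 900},
    "canvas_objects": [{"id": "a", "sensitive": True}, {"id": "b"}],
}
clean = redact(raw)
```

Because each stage returns a new record instead of mutating its input, an additional redaction pass requested under an agreement could be appended to `PIPELINE` without touching the existing stages.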

Prescribed scenario programs

For labs that need high-volume, prescribed-scenario data — specific task types, specific tool combinations, specific error-recovery sequences — we support structured data collection programs with scenario templates, facilitator training, quality review, and delivery pipelines. This is a separate enterprise engagement. Contact partnerships@spacebar.ai.

Questions

What labs typically ask first.

What volume of trajectories is available now, and what's the growth trajectory?
Can we get a sample trajectory before signing anything?
What export formats do you support?
How does this compare to Scale or Surge?
Is the browser interaction genuinely continuous video, or is it a series of screenshots?
What's the latency on a new data licensing agreement?

If you're building or fine-tuning frontier models, let's talk.