Inspiration

AI agents are powerful, but interaction is still tool-centric: prompts, configs, dashboards.

We built an interface where delegation is spatial and direct — like walking up to a coworker and stating the outcome.


What It Does

BossRoom is a real-time multiplayer 3D office where AI agents are persistent coworkers.

Real Execution (Not Chat)

Agents execute real work across 900+ enterprise integrations:

  • Messaging & Communication: Slack, Microsoft Teams, Discord, WhatsApp Business
  • Email & Calendar: Gmail, Outlook, Google Calendar
  • Docs & Knowledge: Notion, Confluence, Google Docs/Drive, Airtable
  • Whiteboarding & Design: Miro, Figma
  • Project & Issue Tracking: Linear, Jira, GitHub, GitLab
  • CRM & Sales: Salesforce, HubSpot
  • DevOps & Cloud: AWS, GCP, Azure
  • Databases & Infra: PostgreSQL, MongoDB, Supabase
  • Commerce & Payments: Stripe, Visa Intelligent Commerce (MCP)
  • Search & Data: SERP APIs, live product search, external data sources

All actions execute in real user-scoped accounts via secure OAuth — no API keys, no configuration.

Dynamic Multi-Agent Teams

  • A Receptionist receives a goal (e.g., “research competitors + write report”).
  • It dynamically spawns 3–12 agents via tool calls.
  • The LLM decides roles, skills, and leadership.
  • A lead agent delegates subtasks to workers.
  • Workers post updates to a shared scratchpad feed.
  • The lead compiles and finalizes output.
  • All agents and workspace state persist in PostgreSQL.

No hardcoded bots. Every workspace builds itself from the goal.
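The spawn step above can be sketched as a tool handler. Everything here is illustrative — the types, helper names, and defaults are hypothetical, not the actual API; the real system persists agents to PostgreSQL and lets the LLM choose roles and a lead:

```typescript
// Illustrative sketch of the Receptionist's spawn tool handler.
// Types and names are hypothetical, not the project's actual schema.

type SpawnRequest = {
  goal: string;
  roles: string[];   // LLM-chosen roles, e.g. ["Researcher", "Writer"]
  leadIndex: number; // which role leads the team
};

type Agent = { id: string; role: string; isLead: boolean };

function spawnTeam(req: SpawnRequest): Agent[] {
  // Clamp team size to the 3–12 range the workspace supports.
  const count = Math.max(3, Math.min(12, req.roles.length));
  const roles = req.roles.slice(0, count);
  while (roles.length < count) roles.push("Generalist");

  const lead = Math.max(0, Math.min(req.leadIndex, count - 1));
  return roles.map((role, i) => ({
    id: `agent-${i}`,
    role,
    isLead: i === lead,
  }));
}
```

Clamping the LLM-proposed team size server-side keeps a malformed tool call from spawning an unbounded number of agents.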

Interaction Layer

  • Push-to-talk voice input
  • Real-time transcription → execution → spoken response
  • Agents have visible states (listening, thinking, working, done, error)
  • Proximity-based player voice chat (WebRTC spatial audio)
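The visible agent states above imply a small state machine. A minimal sketch, assuming an extra `idle` resting state and an illustrative transition table (the actual transitions are not specified in this writeup):

```typescript
// Hypothetical state machine for the agent's visible status indicator.
// The "idle" state and the allowed transitions are assumptions.

type AgentState = "idle" | "listening" | "thinking" | "working" | "done" | "error";

const transitions: Record<AgentState, AgentState[]> = {
  idle: ["listening"],
  listening: ["thinking", "error"],
  thinking: ["working", "done", "error"],
  working: ["done", "error"],
  done: ["idle"],
  error: ["idle"],
};

// Reject impossible jumps so the 3D indicator never shows a stale state.
function advance(current: AgentState, next: AgentState): AgentState {
  if (!transitions[current].includes(next)) {
    throw new Error(`invalid transition ${current} -> ${next}`);
  }
  return next;
}
```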

World Layer

  • Procedurally generated chunked terrain (simplex noise, LOD)
  • Physics-based controls
  • Multiple avatar models
  • In-world 3D speech bubbles + visual state indicators
  • In-world integrated views from your favorite apps
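The chunked terrain with LOD can be sketched as follows. This is a rough illustration: a hash-based pseudo-noise stands in for the simplex noise the project actually uses, and the chunk size and LOD radii are made-up parameters:

```typescript
// Illustrative chunked-terrain sketch. CHUNK_SIZE and the LOD radii are
// example values; the noise below is a stand-in for real simplex noise.

const CHUNK_SIZE = 32;

// Deterministic pseudo-noise in [0, 1) from 2D coordinates.
function noise2D(x: number, y: number): number {
  const n = Math.sin(x * 127.1 + y * 311.7) * 43758.5453;
  return n - Math.floor(n);
}

// World position -> owning chunk coordinates.
function chunkOf(x: number, z: number): [number, number] {
  return [Math.floor(x / CHUNK_SIZE), Math.floor(z / CHUNK_SIZE)];
}

// Pick a mesh resolution per chunk from its Chebyshev distance to the
// player's chunk: near chunks get full detail, far chunks get coarser grids.
function lodFor(cx: number, cz: number, playerX: number, playerZ: number): number {
  const [px, pz] = chunkOf(playerX, playerZ);
  const dist = Math.max(Math.abs(cx - px), Math.abs(cz - pz));
  if (dist <= 1) return 0; // full resolution
  if (dist <= 3) return 1; // half resolution
  return 2;                // quarter resolution
}
```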

Architecture

Frontend

  • Next.js 16 + React 19 + TypeScript + Tailwind v4
  • React Three Fiber + drei
  • Rapier physics
  • Zustand (14 stores) synced via WebSocket
  • 46 typed WebSocket message types, validated with Zod
  • shadcn/ui for panels, scratchpad, product cards
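The typed message layer can be sketched as a discriminated union on a `type` tag. The real system defines 46 such variants and validates them at runtime with Zod; the two variants and field names here are hypothetical, shown in plain TypeScript to keep the sketch dependency-free:

```typescript
// Sketch of the multiplexed WebSocket envelope as a discriminated union.
// Variant names and fields are illustrative, not the actual schema.

type ServerMessage =
  | { type: "player:move"; playerId: string; x: number; y: number; z: number }
  | { type: "agent:state"; agentId: string; state: "listening" | "thinking" | "working" | "done" | "error" };

// Narrowing on the `type` tag routes each frame to the right store.
function routeMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "player:move":
      return `move ${msg.playerId}`;                 // -> players store
    case "agent:state":
      return `agent ${msg.agentId} is ${msg.state}`; // -> agents store
  }
}
```

With Zod, the same union would be a `z.discriminatedUnion("type", [...])`, giving runtime validation on top of these compile-time types.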

Backend

  • Node.js WebSocket game server (domain-driven modules)
  • PostgreSQL 15 + Drizzle ORM (7 tables)
  • Dynamic agent creation + runtime skill system
  • Vercel AI SDK (streamText, multi-step tool calls)
  • Vercel AI Gateway (Gemini / Claude / GPT-4o swappable)
  • Composio OAuth for Gmail, Calendar, Linear, Stripe, etc.
  • MCP support for external tool servers (Visa Intelligent Commerce)
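The multi-step tool calling can be sketched as a bounded loop: the model either requests a tool or finishes, and tool results feed back into the next step. This is an analogy for what `streamText` does under a step limit, not the Vercel AI SDK itself; the model stub and tool table are stand-ins, and the 25-step cap mirrors the limit mentioned later in this writeup:

```typescript
// Bounded multi-step tool loop, analogous to streamText with a step cap.
// The model function and tool table are illustrative stand-ins.

type ToolCall = { tool: string; args: unknown } | { done: string };

function runTurn(
  model: (history: string[]) => ToolCall,          // stand-in for the LLM
  tools: Record<string, (args: unknown) => string>,
  maxSteps = 25,
): string {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(history);
    if ("done" in next) return next.done;          // final answer
    const result = tools[next.tool](next.args);    // execute the tool
    history.push(result);                          // feed result back in
  }
  return "step limit reached";
}
```

The hard cap is what keeps a looping model from blocking the turn forever, which matters when tool results stream back into shared game state.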

Voice

Two independent spatial pipelines (shared AudioContext):

  1. Agent voice loop

    • Mic → WebSocket → Deepgram (STT)
    • LLM execution
    • Inworld TTS → HRTF spatial playback
  2. Player voice

    • PeerJS WebRTC
    • HRTF panner per remote player
    • Distance-based rolloff
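The distance-based rolloff above follows the Web Audio `PannerNode` "inverse" distance model. Since the panner itself needs a browser `AudioContext`, here is just the gain curve as a pure function; the `refDistance` and `rolloffFactor` defaults are example values, not the project's actual settings:

```typescript
// Per-player gain from the Web Audio "inverse" distance model.
// refDistance and rolloffFactor defaults are illustrative.

function rolloffGain(
  distance: number,
  refDistance = 1,
  rolloffFactor = 1,
): number {
  const d = Math.max(distance, refDistance); // no boost inside refDistance
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}
```

In the real pipeline this curve is applied by the `PannerNode` itself (with `panningModel: "HRTF"`), so remote voices fade naturally as players walk apart.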

Infrastructure

  • Terraform-managed infrastructure
  • Google Cloud Run (WebSocket server, 3600s timeout)
  • Cloud SQL (Postgres)
  • Cloudflare Pages (frontend)
  • Firebase Auth (Google Sign-In)
  • Cloud Build → Artifact Registry → Docker deploy

Fully deployed. Not localhost.


Challenges

  • Building a physics-based 3D world with responsive third-person controls
  • Real-time multiplayer state sync (positions, agent state, scratchpad, products) over a single multiplexed WebSocket connection
  • Designing and validating 46 typed WebSocket message types (end-to-end Zod schema enforcement)
  • Dynamic agent spawning (3–12 agents per workspace) with persistent storage and zero race conditions during streaming tool calls
  • Multi-step LLM tool orchestration (up to 25 steps/turn) without blocking or state corruption
  • Maintaining per-agent memory, role separation, and runtime skill creation
  • Dual spatial audio pipelines (agent TTS + WebRTC player voice) sharing one AudioContext without interference
  • Real-time STT → LLM → TTS voice loop with spatial playback tied to 3D coordinates
  • OAuth scoping per user across 900+ integrations (secure isolation per Firebase UID)
  • MCP tool server integration (Visa Intelligent Commerce) with fallback payment rails
  • Cloud Run WebSocket deployment (HTTP/1.1, 3600s timeout, SQL proxy sidecar, keepalive strategy)
  • Streaming AI responses while preserving deterministic game-state updates
  • Procedural chunked terrain generation with LOD and performance constraints
  • Shipping production infra (Terraform, Cloud Build, Docker, Cloud SQL, Cloudflare Pages) during a 36-hour hackathon

Accomplishments

  • Turned “agent workflows” into a game loop: walk up → ask → watch progress → get the outcome.
  • Made non-technical users effective on day one — no prompt craft, no dashboards, no setup rituals.
  • Converted messy, multi-step execution into a single clear interaction: users state intent, the system handles planning + delegation + tool actions.
  • Made agent work observable: you can see who’s doing what and hear responses spatially, instead of guessing in a black box.
  • Built a collaborative feel (multiplayer + proximity voice) so delegating to AI feels like working in a room, not using a tool.
  • Shipped real-world execution end-to-end (emails, tickets, meetings, payments) inside a fully deployed product in 36 hours.

~179 commits
6,000+ lines of TypeScript
46 WebSocket message types
14 Zustand stores
7 database tables
Full infrastructure-as-code deployment

This is a working system, not just a prototype.


What We Learned

The interface layer matters as much as the model.

Dynamic team creation — letting the LLM design the org structure per task — was the key architectural unlock.

Coordination becomes intuitive when agents are embodied, stateful, and observable.


What’s Next

  • Automatic model routing per task type
  • Visible in-world agent-to-agent collaboration
  • Cross-workspace skill marketplace
  • Expanded MCP integrations
  • Deeper in-world commerce flows

Every team will manage fleets of AI agents.

BossRoom is the interface layer.
