Inspiration

AI agents are powerful, but interaction is still tool-centric: prompts, configs, dashboards.

We built an interface where delegation is spatial and direct — like walking up to a coworker and stating the outcome.


What It Does

BossRoom is a real-time multiplayer 3D office where AI agents are persistent coworkers.

Real Execution (Not Chat)

Agents execute real work across 900+ enterprise integrations:

  • Messaging & Communication: Slack, Microsoft Teams, Discord, WhatsApp Business
  • Email & Calendar: Gmail, Outlook, Google Calendar
  • Docs & Knowledge: Notion, Confluence, Google Docs/Drive, Airtable
  • Whiteboarding & Design: Miro, Figma
  • Project & Issue Tracking: Linear, Jira, GitHub, GitLab
  • CRM & Sales: Salesforce, HubSpot
  • DevOps & Cloud: AWS, GCP, Azure
  • Databases & Infra: PostgreSQL, MongoDB, Supabase
  • Commerce & Payments: Stripe, Visa Intelligent Commerce (MCP)
  • Search & Data: SERP APIs, live product search, external data sources

All actions execute in real user-scoped accounts via secure OAuth — no API keys, no configuration.

Dynamic Multi-Agent Teams

  • A Receptionist receives a goal (e.g., “research competitors + write report”).
  • It dynamically spawns 3–12 agents via tool calls.
  • The LLM decides roles, skills, and leadership.
  • A lead agent delegates subtasks to workers.
  • Workers post updates to a shared scratchpad feed.
  • The lead compiles and finalizes output.
  • All agents and workspace state persist in PostgreSQL.

No hardcoded bots. Every workspace builds itself from the goal.
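The spawn step above can be sketched as a tool handler. Everything here is illustrative — the types, helper names, and defaults are hypothetical, not the actual API; the real system persists agents to PostgreSQL and lets the LLM choose roles and a lead:

```typescript
// Illustrative sketch of the Receptionist's spawn tool handler.
// Types and names are hypothetical, not the project's actual schema.

type SpawnRequest = {
  goal: string;
  roles: string[];   // LLM-chosen roles, e.g. ["Researcher", "Writer"]
  leadIndex: number; // which role leads the team
};

type Agent = { id: string; role: string; isLead: boolean };

function spawnTeam(req: SpawnRequest): Agent[] {
  // Clamp team size to the 3–12 range the workspace supports.
  const count = Math.max(3, Math.min(12, req.roles.length));
  const roles = req.roles.slice(0, count);
  while (roles.length < count) roles.push("Generalist");

  const lead = Math.max(0, Math.min(req.leadIndex, count - 1));
  return roles.map((role, i) => ({
    id: `agent-${i}`,
    role,
    isLead: i === lead,
  }));
}
```

Clamping the LLM-proposed team size server-side keeps a malformed tool call from spawning an unbounded number of agents.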

Interaction Layer

  • Push-to-talk voice input
  • Real-time transcription → execution → spoken response
  • Agents have visible states (listening, thinking, working, done, error)
  • Proximity-based player voice chat (WebRTC spatial audio)
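The visible agent states above imply a small state machine. A minimal sketch, assuming an extra `idle` resting state and an illustrative transition table (the actual transitions are not specified in this writeup):

```typescript
// Hypothetical state machine for the agent's visible status indicator.
// The "idle" state and the allowed transitions are assumptions.

type AgentState = "idle" | "listening" | "thinking" | "working" | "done" | "error";

const transitions: Record<AgentState, AgentState[]> = {
  idle: ["listening"],
  listening: ["thinking", "error"],
  thinking: ["working", "done", "error"],
  working: ["done", "error"],
  done: ["idle"],
  error: ["idle"],
};

// Reject impossible jumps so the 3D indicator never shows a stale state.
function advance(current: AgentState, next: AgentState): AgentState {
  if (!transitions[current].includes(next)) {
    throw new Error(`invalid transition ${current} -> ${next}`);
  }
  return next;
}
```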

World Layer

  • Procedurally generated chunked terrain (simplex noise, LOD)
  • Physics-based controls
  • Multiple avatar models
  • In-world 3D speech bubbles + visual state indicators
  • In-world integrated views from your favorite apps
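The chunked terrain with LOD can be sketched as follows. This is a rough illustration: a hash-based pseudo-noise stands in for the simplex noise the project actually uses, and the chunk size and LOD radii are made-up parameters:

```typescript
// Illustrative chunked-terrain sketch. CHUNK_SIZE and the LOD radii are
// example values; the noise below is a stand-in for real simplex noise.

const CHUNK_SIZE = 32;

// Deterministic pseudo-noise in [0, 1) from 2D coordinates.
function noise2D(x: number, y: number): number {
  const n = Math.sin(x * 127.1 + y * 311.7) * 43758.5453;
  return n - Math.floor(n);
}

// World position -> owning chunk coordinates.
function chunkOf(x: number, z: number): [number, number] {
  return [Math.floor(x / CHUNK_SIZE), Math.floor(z / CHUNK_SIZE)];
}

// Pick a mesh resolution per chunk from its Chebyshev distance to the
// player's chunk: near chunks get full detail, far chunks get coarser grids.
function lodFor(cx: number, cz: number, playerX: number, playerZ: number): number {
  const [px, pz] = chunkOf(playerX, playerZ);
  const dist = Math.max(Math.abs(cx - px), Math.abs(cz - pz));
  if (dist <= 1) return 0; // full resolution
  if (dist <= 3) return 1; // half resolution
  return 2;                // quarter resolution
}
```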

Architecture

Frontend

  • Next.js 16 + React 19 + TypeScript + Tailwind v4
  • React Three Fiber + drei
  • Rapier physics
  • Zustand (14 stores) synced via WebSocket
  • 46 typed WebSocket message types, validated with Zod
  • shadcn/ui for panels, scratchpad, product cards
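The typed message layer can be sketched as a discriminated union on a `type` tag. The real system defines 46 such variants and validates them at runtime with Zod; the two variants and field names here are hypothetical, shown in plain TypeScript to keep the sketch dependency-free:

```typescript
// Sketch of the multiplexed WebSocket envelope as a discriminated union.
// Variant names and fields are illustrative, not the actual schema.

type ServerMessage =
  | { type: "player:move"; playerId: string; x: number; y: number; z: number }
  | { type: "agent:state"; agentId: string; state: "listening" | "thinking" | "working" | "done" | "error" };

// Narrowing on the `type` tag routes each frame to the right store.
function routeMessage(msg: ServerMessage): string {
  switch (msg.type) {
    case "player:move":
      return `move ${msg.playerId}`;                 // -> players store
    case "agent:state":
      return `agent ${msg.agentId} is ${msg.state}`; // -> agents store
  }
}
```

With Zod, the same union would be a `z.discriminatedUnion("type", [...])`, giving runtime validation on top of these compile-time types.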

Backend

  • Node.js WebSocket game server (domain-driven modules)
  • PostgreSQL 15 + Drizzle ORM (7 tables)
  • Dynamic agent creation + runtime skill system
  • Vercel AI SDK (streamText, multi-step tool calls)
  • Vercel AI Gateway (Gemini / Claude / GPT-4o swappable)
  • Composio OAuth for Gmail, Calendar, Linear, Stripe, etc.
  • MCP support for external tool servers (Visa Intelligent Commerce)
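The multi-step tool calling can be sketched as a bounded loop: the model either requests a tool or finishes, and tool results feed back into the next step. This is an analogy for what `streamText` does under a step limit, not the Vercel AI SDK itself; the model stub and tool table are stand-ins, and the 25-step cap mirrors the limit mentioned later in this writeup:

```typescript
// Bounded multi-step tool loop, analogous to streamText with a step cap.
// The model function and tool table are illustrative stand-ins.

type ToolCall = { tool: string; args: unknown } | { done: string };

function runTurn(
  model: (history: string[]) => ToolCall,          // stand-in for the LLM
  tools: Record<string, (args: unknown) => string>,
  maxSteps = 25,
): string {
  const history: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const next = model(history);
    if ("done" in next) return next.done;          // final answer
    const result = tools[next.tool](next.args);    // execute the tool
    history.push(result);                          // feed result back in
  }
  return "step limit reached";
}
```

The hard cap is what keeps a looping model from blocking the turn forever, which matters when tool results stream back into shared game state.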

Voice

Two independent spatial pipelines (shared AudioContext):

  1. Agent voice loop

    • Mic → WebSocket → Deepgram (STT)
    • LLM execution
    • Inworld TTS → HRTF spatial playback
  2. Player voice

    • PeerJS WebRTC
    • HRTF panner per remote player
    • Distance-based rolloff
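The distance-based rolloff above follows the Web Audio `PannerNode` "inverse" distance model. Since the panner itself needs a browser `AudioContext`, here is just the gain curve as a pure function; the `refDistance` and `rolloffFactor` defaults are example values, not the project's actual settings:

```typescript
// Per-player gain from the Web Audio "inverse" distance model.
// refDistance and rolloffFactor defaults are illustrative.

function rolloffGain(
  distance: number,
  refDistance = 1,
  rolloffFactor = 1,
): number {
  const d = Math.max(distance, refDistance); // no boost inside refDistance
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}
```

In the real pipeline this curve is applied by the `PannerNode` itself (with `panningModel: "HRTF"`), so remote voices fade naturally as players walk apart.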

Infrastructure

  • Terraform-managed infrastructure
  • Google Cloud Run (WebSocket server, 3600s timeout)
  • Cloud SQL (Postgres)
  • Cloudflare Pages (frontend)
  • Firebase Auth (Google Sign-In)
  • Cloud Build → Artifact Registry → Docker deploy

Fully deployed. Not localhost.


Challenges

  • Building a physics-based 3D world with responsive third-person controls
  • Real-time multiplayer state sync (positions, agent state, scratchpad, products) over a single multiplexed WebSocket connection
  • Designing and validating 46 typed WebSocket message types (end-to-end Zod schema enforcement)
  • Dynamic agent spawning (3–12 agents per workspace) with persistent storage and zero race conditions during streaming tool calls
  • Multi-step LLM tool orchestration (up to 25 steps/turn) without blocking or state corruption
  • Maintaining per-agent memory, role separation, and runtime skill creation
  • Dual spatial audio pipelines (agent TTS + WebRTC player voice) sharing one AudioContext without interference
  • Real-time STT → LLM → TTS voice loop with spatial playback tied to 3D coordinates
  • OAuth scoping per user across 900+ integrations (secure isolation per Firebase UID)
  • MCP tool server integration (Visa Intelligent Commerce) with fallback payment rails
  • Cloud Run WebSocket deployment (HTTP/1.1, 3600s timeout, SQL proxy sidecar, keepalive strategy)
  • Streaming AI responses while preserving deterministic game-state updates
  • Procedural chunked terrain generation with LOD and performance constraints
  • Shipping production infra (Terraform, Cloud Build, Docker, Cloud SQL, Cloudflare Pages) during a 36-hour hackathon

Accomplishments

  • Turned “agent workflows” into a game loop: walk up → ask → watch progress → get the outcome.
  • Made non-technical users effective on day one — no prompt craft, no dashboards, no setup rituals.
  • Converted messy, multi-step execution into a single clear interaction: users state intent, the system handles planning + delegation + tool actions.
  • Made agent work observable: you can see who’s doing what and hear responses spatially, instead of guessing in a black box.
  • Built a collaborative feel (multiplayer + proximity voice) so delegating to AI feels like working in a room, not using a tool.
  • Shipped real-world execution end-to-end (emails, tickets, meetings, payments) inside a fully deployed product in 36 hours.

~179 commits
6,000+ lines of TypeScript
46 WebSocket message types
14 Zustand stores
7 database tables
Full infrastructure-as-code deployment

This is a working system, not just a prototype.


What We Learned

The interface layer matters as much as the model.

Dynamic team creation — letting the LLM design the org structure per task — was the key architectural unlock.

Coordination becomes intuitive when agents are embodied, stateful, and observable.


What’s Next

  • Automatic model routing per task type
  • Visible in-world agent-to-agent collaboration
  • Cross-workspace skill marketplace
  • Expanded MCP integrations
  • Deeper in-world commerce flows

Every team will manage fleets of AI agents.

BossRoom is the interface layer.
