CueBee

Inspiration

We've all been in conversations where we wished we had an expert in our ear — a networking event, a client meeting, a difficult negotiation. You know the answer exists somewhere, but there's no time to Google it without breaking the flow.

CueBee started as a simple question: what if your earbuds could think with you? Not after the fact, not on a screen you have to glance at — but a voice that listens, understands context, and whispers the right thing at the right moment.


What It Does

CueBee is a real-time AI co-pilot for live conversations. It runs in the background through your earbuds, continuously listening to the conversation around you. When it detects a moment where context, facts, or a suggested response would help — a sharp question you weren't expecting, a statistic you can't quite recall, an awkward silence that needs filling, or a curveball that could derail the whole conversation — it proactively speaks to you, privately, without interrupting anyone else.

Key features:

  • Proactive voice assistance — surfaces relevant information without you having to ask
  • Natural language delivery — responses are structured and refined by the LLM in real time, so what you hear isn't raw AI output — it's language you can speak naturally and confidently
  • Persistent conversation memory — remembers context across sessions, so it gets smarter the more you use it
  • Secure user accounts — personalized experience with authentication
  • Cloud-deployed backend — low-latency responses from anywhere

How We Built It

We built CueBee's audio processing pipeline to run entirely on-device — fully local on both iOS and Android. This pipeline performs real-time speech-to-text and speaker diarization simultaneously with millisecond-level latency, and it's the cornerstone of our entire product. Every feature we've built sits on top of this reliable, low-latency foundation.

For the on-device runtime, we used Sherpa-ONNX, an open-source inference runtime optimized for running audio models on mobile hardware. This allowed us to avoid sending raw audio to the cloud, which is critical for both latency and user privacy.

On the backend, we use LangGraph to orchestrate our multi-agent system. As live transcriptions stream in from the client, our dispatcher routes context to specialized agents — each equipped with a different tool set built for a specific conversational need. For example:

  • A Fact-Check Agent quietly verifies claims made during the conversation in real time, so you're never caught nodding along to something inaccurate.
  • A News Agent pulls in the latest relevant stories as potential conversational material, so you always have something informed to add.
  • A Research Agent searches academic papers on the fly — useful when you're in a meeting or conversation with someone from academia and want to engage at their level.
  • More agents can be dynamically added as new conversational contexts are discovered.

The core insight here is that different conversations demand different tools. A casual networking chat needs different assistance than a client negotiation or a research discussion. By modularizing intelligence into specialized agents rather than one generic assistant, CueBee can provide deeply contextual, situation-aware support.
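To make the routing idea concrete, here's a framework-free sketch of a dispatcher. The agent names and trigger keywords are illustrative placeholders, not our production rules; in the real system the routing decision is made by an LLM inside the LangGraph graph rather than by keyword matching:

```python
# Toy dispatcher: route a chunk of live transcript to specialized agents.
# Agent names and trigger keywords below are illustrative placeholders only.

AGENT_TRIGGERS = {
    "fact_check": ["actually", "in fact", "statistic", "percent"],
    "news": ["yesterday", "this week", "announced", "headline"],
    "research": ["paper", "study", "peer-reviewed", "dataset"],
}

def dispatch(transcript_chunk: str) -> list[str]:
    """Return every agent whose trigger rules match this transcript chunk."""
    text = transcript_chunk.lower()
    return [
        agent
        for agent, keywords in AGENT_TRIGGERS.items()
        if any(kw in text for kw in keywords)
    ]
```

The shape is the same as the production graph: one dispatch step fanning out to whichever specialists the moment calls for, with zero, one, or several agents activated per chunk.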

Layered on top are gateway agents that continuously monitor the live transcript and proactively push the right agent's insights to the user — creating the feeling that CueBee is always one step ahead.

To make suggestions feel natural and personalized, we integrated Backboard's API as a core component of our memory layer. Its built-in RAG system lets us store and retrieve user-specific context — background, communication style, phrasing preferences — so that every live suggestion CueBee surfaces sounds like something the user would actually say, not a generic AI response.
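The retrieval side of that memory layer can be illustrated with a toy example. To be clear, this is not Backboard's actual API; it's a keyword-overlap stand-in for the idea of ranking stored user context against the live conversation:

```python
# Toy stand-in for RAG-style retrieval over stored user context.
# NOT Backboard's API: a keyword-overlap ranker used purely for illustration.

def retrieve(memories: list[str], query: str, k: int = 2) -> list[str]:
    """Rank stored user-context snippets by word overlap with the live query."""
    q_words = set(query.lower().split())
    scored = sorted(
        memories,
        key=lambda m: len(q_words & set(m.lower().split())),
        reverse=True,
    )
    return scored[:k]
```

The real system replaces the overlap score with semantic retrieval, but the flow is the same: the top-ranked snippets about the user's style and background get injected into the prompt before a suggestion is voiced.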

Underpinning the entire backend is a Go-based API gateway that handles request routing, load distribution, and authentication before anything reaches the agent layer. This keeps the core agentic logic clean and ensures the system scales without bottlenecks at the entry point.

For tool connectivity, we implemented OAuth-based integrations that let users securely connect their existing accounts from third-party services — currently LinkedIn and X (Twitter) — which are then exposed as live tools available to the agents. This means an agent can pull your LinkedIn connections during a networking conversation, or surface relevant posts from X as real-time conversation context. The tool wrapper architecture is designed to be extensible — new integrations can be added without touching the core agent logic, giving us a clear path to supporting any service a user's professional life depends on.
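The extensibility claim boils down to a registry pattern: integrations register themselves, and agents call them through one uniform interface. A minimal sketch (the names and signatures here are ours for illustration, not an actual API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolIntegration:
    """One third-party service exposed to agents as a live tool."""
    name: str
    fetch: Callable[[str, str], dict]  # (user_oauth_token, query) -> result

_REGISTRY: dict[str, ToolIntegration] = {}

def register(tool: ToolIntegration) -> None:
    """Adding a new service touches only this registry, never agent logic."""
    _REGISTRY[tool.name] = tool

def call_tool(name: str, user_token: str, query: str) -> dict:
    """Invoke a registered integration with the caller's own OAuth token."""
    return _REGISTRY[name].fetch(user_token, query)

# Illustrative registration; the real wrapper makes authenticated API calls.
register(ToolIntegration("linkedin", lambda tok, q: {"service": "linkedin", "query": q}))
```

Because agents only ever see `call_tool`, swapping in a new service (or a mock during testing) is a one-line registration.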


Challenges We Ran Into

1. Real-Time On-Device Audio Pipeline

Achieving low-latency, simultaneous STT and speaker diarization on-device was harder than it sounds — these are two separate models that run at different speeds and produce outputs on different cadences. We had to design a custom synchronization algorithm to merge the two output streams in real time, correctly attributing transcribed speech to the right speaker without introducing perceptible lag. Getting this to feel seamless on both iOS and Android, across varying hardware, was one of our toughest early engineering problems.
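A simplified, batch version of the attribution step looks like this (the real pipeline is streaming and must handle partial hypotheses and revisions, which this sketch omits):

```python
def attribute_speakers(words, segments):
    """Attach a speaker label to each transcribed word by midpoint lookup.

    words:    [(start, end, text), ...] from the STT stream
    segments: [(start, end, speaker), ...] from the diarization stream
    Both lists are assumed sorted by start time (a simplification).
    """
    out = []
    for w_start, w_end, text in words:
        mid = (w_start + w_end) / 2
        speaker = "unknown"
        for s_start, s_end, label in segments:
            if s_start <= mid < s_end:
                speaker = label
                break
        out.append((speaker, text))
    return out
```

The hard engineering lives outside this sketch: the two models emit on different cadences, so the real merger buffers one stream while waiting on the other and re-attributes words when a diarization boundary shifts.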

2. Architecting a Truly Proactive Agent System

Building an assistant that volunteers the right information at the right moment — rather than waiting to be asked — required rethinking how agents are organized. We had to carefully design our gateway agents to continuously analyze live conversation without overwhelming the pipeline, while simultaneously orchestrating sub-agents to go fetch relevant information (facts, news, papers, social context) in the background. The hard part was making all of this feel instantaneous and coherent to the user, not like a system scrambling to keep up.

3. Making MCP Servers User-Aware

MCP servers weren't designed with multi-user environments in mind — by default, they have no concept of per-user credentials or auth tokens. But for CueBee, every user needs to connect their own accounts to tools like LinkedIn or their personal calendar. We had to fork and modify existing MCP servers (such as the LinkedIn MCP and Google Calendar MCP) to support per-user authentication, so each user's agent session operates with their own isolated credentials and data access. This unlocked the personalized, account-aware tool use that makes CueBee's integrations actually useful.
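Conceptually, the fork adds a per-user credential lookup in front of every tool call, so two users hitting the same MCP server never share tokens. A sketch of the idea (the bearer-header scheme here is an assumption for illustration; a forked server can define whatever auth scheme it likes):

```python
from dataclasses import dataclass, field

@dataclass
class UserSession:
    """Per-user credential store; real tokens would live in a secure vault."""
    user_id: str
    tokens: dict[str, str] = field(default_factory=dict)

def build_mcp_headers(session: UserSession, service: str) -> dict[str, str]:
    """Scope an MCP tool call to one user's own OAuth token.

    The Authorization header format is an illustrative assumption, not the
    actual protocol of any specific MCP server.
    """
    token = session.tokens.get(service)
    if token is None:
        raise KeyError(f"{session.user_id} has not connected {service}")
    return {"Authorization": f"Bearer {token}"}
```

The key property is isolation: every agent session resolves credentials through its own `UserSession`, so one user's LinkedIn or calendar access can never leak into another's.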


Accomplishments That We're Proud Of

Getting the full end-to-end pipeline running was the milestone that made everything feel real. Hearing CueBee respond — naturally, quickly, in context — through earbuds during a live conversation demo was the moment the whole team knew this was something worth building.

We're also proud of the multi-API integration: five sponsor technologies working together as a coherent system, not bolted-on checkboxes.


What We Learned

Real-time audio is unforgiving. Buffering strategies, streaming APIs, and latency budgets forced us to think about system design in ways that typical request-response apps don't require.

We also learned that the hardest part of building a voice AI product isn't the AI — it's the UX of silence. Knowing when to speak, and when to stay quiet, is what separates a useful co-pilot from an annoying interruption.


What's Next for CueBee

The goal is to take what we validated this weekend and turn it into a product people actually use every day. Next steps include deeper personalization (learning your communication style and knowledge gaps over time), integration with calendar and meeting context for pre-loaded briefings, and bringing the experience to desktop.

The long-term vision: an AI layer that lives in your ear and makes every conversation you have the best version of itself.

Built With

  • auth0
  • backboard
  • elevenlabs
  • fastapi
  • google-gemini-api
  • python
  • swift
  • vultr