Inspiration

Most AI assistants today focus on productivity — writing emails, summarizing documents, or answering questions. But human communication is much more complex than that. Conversations involve emotions, social context, negotiation, and subtle decision-making.

We wondered: What if AI could exist inside conversations instead of outside them?

Instead of being another chatbot you talk to, what if your AI could quietly assist you while you talk with other people — helping you think, respond, and navigate social situations in real time?

That idea became Spices AI.

Just like spices enhance food without becoming the main dish, Spices enhances human conversations without replacing them.

What it does

Spices AI is a real-time conversational co-pilot.

It lives alongside your messaging conversations and helps you interact more intelligently.

Spices can:

• Analyze incoming messages and provide context-aware suggestions
• Help draft replies that match your tone and intent
• Answer questions about the conversation in real time
• Provide private AI advice that only you can see
• Interact via voice using Gemini Live so users can ask for help naturally

The key idea is that AI becomes a participant beside you, not the person you're talking to.

For example:

You can ask your AI:

“Does this reply sound too aggressive?”

Or:

“Help me say this more politely.”

Spices will respond instantly and even suggest improved responses — while keeping the conversation flowing naturally.

How we built it

Spices AI is built as a cloud-native multimodal agent powered by Google's AI stack.

Core architecture:

Frontend
• Messaging-style interface for human conversations
• Voice interaction layer for live AI conversations

Backend
• Google Cloud Run hosts the main application server
• Handles agent logic, conversation processing, and streaming responses

Database
• Firestore stores conversation context, user state, and AI memory

AI Layer
• Gemini Live API for real-time voice interaction and interruptible conversations
• Gemini models for conversation understanding and response generation
• Gemini image generation to support multimodal creative features

The system processes messages in real time, sends contextual prompts to Gemini, and returns suggestions or voice responses to the user.
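As a rough sketch of that flow (the function, field names, and prompt wording below are our own illustration, not the exact production code), the backend can assemble a contextual prompt from the most recent turns before calling Gemini:

```python
from dataclasses import dataclass

@dataclass
class Message:
    sender: str  # "me", "them", or "ai"
    text: str

def build_suggestion_prompt(history: list[Message], user_request: str) -> str:
    """Assemble a context-aware prompt for a Gemini call.

    Only the most recent turns are included, so the prompt stays small
    and the model focuses on the live conversation.
    """
    recent = history[-6:]  # window size is an illustrative choice
    transcript = "\n".join(f"{m.sender}: {m.text}" for m in recent)
    return (
        "You are a private assistant helping one participant in a chat.\n"
        "Conversation so far:\n"
        f"{transcript}\n\n"
        f"The user asks: {user_request}\n"
        "Reply with a short, actionable suggestion only the user will see."
    )
```

The resulting string would then be passed as the `contents` of a Gemini text-generation request, with the model's reply streamed back to the client.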

Challenges we ran into

Building a real-time conversational agent introduced several technical challenges:

Low-latency interaction

For AI to feel like a natural conversational partner, responses must be extremely fast. We optimized backend flows and streaming responses to reduce perceived latency.
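A minimal sketch of the streaming idea (the chunk iterator stands in for a Gemini streaming response, and `send` for whatever transport the server uses): forward partial text to the client as soon as it arrives instead of waiting for the full reply, which is what makes the assistant feel responsive.

```python
from typing import Callable, Iterator

def stream_to_client(model_chunks: Iterator[str], send: Callable[[str], None]) -> str:
    """Forward each model chunk to the client immediately; return the full text."""
    parts = []
    for chunk in model_chunks:
        send(chunk)          # e.g. one SSE or WebSocket frame per chunk
        parts.append(chunk)
    return "".join(parts)
```

The user starts reading the first words while the rest of the response is still being generated, so perceived latency is the time to the first chunk rather than to the full answer.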

Conversation context

Maintaining the right context across messages while avoiding hallucinations required careful prompt design and structured conversation state management.
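One hedged sketch of that structured state (class and field names are illustrative, not the stored Firestore schema): keep a rolling window of recent turns plus a running summary of evicted ones, so prompts stay bounded without losing long-range context.

```python
from dataclasses import dataclass, field

@dataclass
class ConversationState:
    """Per-conversation state, e.g. persisted as a Firestore document."""
    turns: list[tuple[str, str]] = field(default_factory=list)  # (sender, text)
    summary: str = ""   # running summary of older, evicted turns
    max_turns: int = 8  # rolling window size (illustrative)

    def add_turn(self, sender: str, text: str) -> None:
        self.turns.append((sender, text))
        while len(self.turns) > self.max_turns:
            sender_old, text_old = self.turns.pop(0)
            # In production the summary would be refreshed by the model;
            # here we just record the evicted turn.
            self.summary += f"{sender_old} said: {text_old[:40]} | "
```

Bounding what the model sees to a fixed window, rather than replaying the entire history, also helps limit hallucinations about things that were never said.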

Voice interaction

Integrating Gemini Live required handling audio streaming, interruptions, and conversational turn-taking to create a natural speaking experience.

Human-centric UX

The biggest challenge was designing an interface where AI helps without becoming intrusive. Spices needed to feel like a subtle assistant, not a replacement for human interaction.

Accomplishments that we're proud of

Built a multimodal conversational agent combining text, voice, and AI reasoning

Integrated Gemini Live for real-time voice interaction

Deployed a fully working backend on Google Cloud Run

Designed a UX where AI enhances conversations instead of replacing them

Created a concept of AI that lives inside conversations

What we learned

This project taught us that the future of AI interaction is not just about better answers — it's about better presence.

AI doesn't always need to be the main character. Sometimes the most powerful role is being the invisible co-pilot that helps humans communicate more effectively.

We also learned how powerful Google's AI ecosystem can be when combining:

• Gemini models
• Gemini Live real-time interaction
• Google Cloud infrastructure

Together they enable a new generation of live, multimodal AI agents.

What's next for Spices AI

We believe Spices represents a new paradigm: personal AI companions embedded inside human communication.

Future directions include:

• Personalized AI personas that reflect each user's style
• Deeper multimodal interaction (voice, images, context signals)
• AI-to-AI negotiation and collaborative agents
• Integration with more communication platforms

Our long-term vision is simple:

Your AI shouldn't replace your conversations. It should empower them.
