Inspiration
Most AI assistants today focus on productivity — writing emails, summarizing documents, or answering questions. But human communication is much more complex than that. Conversations involve emotions, social context, negotiation, and subtle decision-making.
We wondered: What if AI could exist inside conversations instead of outside them?
Instead of being another chatbot you talk to, what if your AI could quietly assist you while you talk with other people — helping you think, respond, and navigate social situations in real time?
That idea became Spices AI.
Just like spices enhance food without becoming the main dish, Spices enhances human conversations without replacing them.
What it does
Spices AI is a real-time conversational co-pilot.
It lives alongside your messaging conversations and helps you interact more intelligently.
Spices can:
• Analyze incoming messages and provide context-aware suggestions
• Help draft replies that match your tone and intent
• Answer questions about the conversation in real time
• Provide private AI advice that only you can see
• Interact via voice using Gemini Live so users can ask for help naturally
The key idea is that AI becomes a participant beside you, not the person you're talking to.
For example, you can ask your AI:
“Does this reply sound too aggressive?”
Or:
“Help me say this more politely.”
Spices will respond instantly and even suggest improved responses — while keeping the conversation flowing naturally.
How we built it
Spices AI is built as a cloud-native multimodal agent powered by Google's AI stack.
Core architecture:
Frontend
Messaging-style interface for human conversations
Voice interaction layer for live AI conversations
Backend
Google Cloud Run hosts the main application server
Handles agent logic, conversation processing, and streaming responses
Database
Firestore stores conversation context, user state, and AI memory
AI Layer
Gemini Live API for real-time voice interaction and interruptible conversations
Gemini models for conversation understanding and response generation
Gemini image generation to support multimodal creative features
The system processes messages in real time, sends contextual prompts to Gemini, and returns suggestions or voice responses to the user.
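As a rough illustration of that flow, here is a minimal Python sketch of turning recent messages plus a private user request into a contextual prompt. It is not our production code: `Message`, `build_prompt`, `suggest`, and `fake_generate` are hypothetical names, the prompt wording is invented for the example, and the model call is a stand-in function so the sketch runs without any Gemini credentials.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    sender: str
    text: str


def build_prompt(history: List[Message], user_request: str, max_messages: int = 10) -> str:
    """Combine the most recent messages and the user's private request into one prompt."""
    recent = history[-max_messages:]
    transcript = "\n".join(f"{m.sender}: {m.text}" for m in recent)
    return (
        "You are a private assistant. Only the user can see your advice.\n"
        "Conversation so far:\n"
        f"{transcript}\n"
        f"User request: {user_request}\n"
        "Reply with a short suggestion."
    )


def suggest(history: List[Message], user_request: str, generate: Callable[[str], str]) -> str:
    """Build the prompt and pass it to whatever model call the caller supplies."""
    return generate(build_prompt(history, user_request)).strip()


def fake_generate(prompt: str) -> str:
    """Stand-in for a real model call (assumption: any str -> str function fits here)."""
    return " Try: 'Could we revisit the deadline?' "
```

Passing the model call in as a parameter keeps the prompt logic testable on its own; a real deployment would swap `fake_generate` for an actual Gemini request.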
Challenges we ran into
Building a real-time conversational agent introduced several technical challenges:
Low-latency interaction
For AI to feel like a natural conversational partner, responses must be extremely fast. We optimized backend flows and streaming responses to reduce perceived latency.
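The idea behind streaming can be sketched in a few lines of Python: forward each chunk as soon as it arrives instead of waiting for the full reply, so the time to the first visible output is much shorter than the total generation time. The names `fake_model_stream` and `relay` are hypothetical, and the model is simulated with `time.sleep`, so the timings only illustrate the principle.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def fake_model_stream(text: str, delay_s: float = 0.01) -> Iterator[str]:
    """Stand-in for a streaming model response: yields one word at a time."""
    for word in text.split():
        time.sleep(delay_s)  # simulated generation time per chunk
        yield word + " "


def relay(chunks: Iterable[str], send: Callable[[str], None]) -> Tuple[float, float]:
    """Forward each chunk to `send` as soon as it arrives.

    Returns (seconds until the first chunk, seconds until the last chunk).
    """
    start = time.perf_counter()
    first = None
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start
        send(chunk)
    total = time.perf_counter() - start
    return (first if first is not None else total), total
```

In use, `send` would write to the client connection; here any callable works, for example `relay(fake_model_stream("Sounds good to me"), print)`.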
Conversation context
Maintaining the right context across messages while avoiding hallucinations required careful prompt design and structured conversation state management.
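One simple form of structured state is a bounded window of recent turns that serializes to a plain dictionary, the kind of shape a document store such as Firestore can hold. The sketch below assumes this design for illustration only; `ConversationState`, its field names, and the 20-turn default are hypothetical and not taken from our actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ConversationState:
    conversation_id: str
    max_turns: int = 20  # assumed window size, chosen only for the example
    turns: List[Dict[str, str]] = field(default_factory=list)
    dropped_turns: int = 0  # how many older turns have been trimmed

    def add_turn(self, sender: str, text: str) -> None:
        """Append a turn and trim the oldest ones beyond the window."""
        self.turns.append({"sender": sender, "text": text})
        overflow = len(self.turns) - self.max_turns
        if overflow > 0:
            del self.turns[:overflow]
            self.dropped_turns += overflow

    def to_document(self) -> dict:
        """Return a plain dict suitable for storing as a single document."""
        return asdict(self)
```

Keeping the window bounded limits how much text goes into each prompt, and tracking `dropped_turns` lets the prompt state that earlier context was omitted rather than leaving the model to guess.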
Voice interaction
Integrating Gemini Live required handling audio streaming, interruptions, and conversational turn-taking to create a natural speaking experience.
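Interruption handling can be illustrated with plain `asyncio`, independent of any audio API: the assistant's speech runs as a task, and a user interruption cancels that task so the turn passes back to the user. This is a simplified sketch under that assumption; `speak` and `run_turn` are hypothetical names, and playback is simulated with `asyncio.sleep` rather than real audio or the Gemini Live API.

```python
import asyncio
from typing import List, Optional


async def speak(chunks: List[str], played: List[str], chunk_seconds: float = 0.05) -> None:
    """Simulate playing audio chunks one after another."""
    for chunk in chunks:
        await asyncio.sleep(chunk_seconds)  # simulated playback time
        played.append(chunk)


async def run_turn(chunks: List[str], played: List[str],
                   interrupt_after: Optional[float] = None) -> str:
    """Play the assistant's turn; cancel it if the user interrupts."""
    task = asyncio.create_task(speak(chunks, played))
    if interrupt_after is not None:
        await asyncio.sleep(interrupt_after)  # simulated moment the user starts talking
        task.cancel()  # no effect if playback has already finished
    try:
        await task
        return "completed"
    except asyncio.CancelledError:
        return "interrupted"
```

For example, `asyncio.run(run_turn(["Hi", "there", "friend"], [], interrupt_after=0.12))` stops playback partway through, while omitting `interrupt_after` lets the turn finish.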
Human-centric UX
The biggest challenge was designing an interface where AI helps without becoming intrusive. Spices needed to feel like a subtle assistant, not a replacement for human interaction.
Accomplishments that we're proud of
Built a multimodal conversational agent combining text, voice, and AI reasoning
Integrated Gemini Live for real-time voice interaction
Deployed a fully working backend on Google Cloud Run
Designed a UX where AI enhances conversations instead of replacing them
Developed the concept of an AI that lives inside conversations
What we learned
This project taught us that the future of AI interaction is not just about better answers — it's about better presence.
AI doesn't always need to be the main character. Sometimes the most powerful role is being the invisible co-pilot that helps humans communicate more effectively.
We also learned how powerful Google's AI ecosystem can be when you combine:
Gemini models
Gemini Live real-time interaction
Google Cloud infrastructure
Together they enable a new generation of live, multimodal AI agents.
What's next for Spices AI
We believe Spices represents a new paradigm: personal AI companions embedded inside human communication.
Future directions include:
Personalized AI personas that reflect each user's style
Deeper multimodal interaction (voice, images, context signals)
AI-to-AI negotiation and collaborative agents
Integration with more communication platforms
Our long-term vision is simple:
Your AI shouldn't replace your conversations. It should empower them.
Built With
- antigravity
- cloud
- gemini
- javascript
- python