Inspiration
The inspiration for ReScene came from a clear gap in AI photography today. On one end, basic filter apps are easy to use but lack real creativity. On the other end, professional AI tools require complex prompt engineering, which most users find intimidating.
We asked a simple question: what if everyone had a world-class Photography Director in their pocket?
With ReScene, you simply upload a photo. The AI Agent analyzes the scene, identifies the location, and recommends the season, weather, or time of day that would make it look its best. If you want something different, you can just chat with it.
What it does
ReScene is an AI Photography Director Agent. It understands the scene, imagines the best version of it, proposes creative options, and executes the transformation. Here's how it works:
• Upload a photo: ReScene analyzes the scene and identifies the location.
• Discover the best moment: it recommends the season, weather, or lighting that would make the photo look its best.
• Transform instantly: generate a cinematic version of the scene with one tap.
• Refine by chatting: describe the vibe you want, and the Agent will adjust the scene for you.
How we built it
We designed ReScene using a “Left Brain / Right Brain” architecture, combining reasoning and creativity.
- Frontend (iOS / SwiftUI): The app is built natively with Swift and SwiftUI to deliver a fast, immersive experience. We implemented dynamic UI components such as real-time loading states, conversational chat bubbles, and an interactive Before/After slider for seamless visual comparison.
- Backend (Node.js / Fastify): Our backend runs on a stateless, serverless architecture deployed on Google Cloud Run. The system is modular and decoupled, using dependency injection so we can easily swap and upgrade AI models as the ecosystem evolves.
- The Left Brain - AI Reasoning: Powered by Gemini, the “Left Brain” handles understanding and planning. It analyzes user intent, maintains conversational context, and decides when to chat with the user versus when to generate a structured rendering plan for the image transformation.
- The Right Brain - Image Generation: The “Right Brain” executes the visual transformation. It takes the AI-generated blueprint and performs high-fidelity image generation and editing to produce the final cinematic result.
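The Left Brain's routing decision can be sketched as a small discriminated union: the model either replies conversationally or emits a structured rendering plan for the Right Brain to execute. This is a minimal, hypothetical sketch; the type and field names are illustrative, not ReScene's actual schema.

```typescript
// Hypothetical "Left Brain" router: a model response is either plain chat
// text or a JSON rendering plan (enforced via structured outputs in the
// real system). All names here are illustrative.
type AgentReply =
  | { kind: "chat"; message: string }
  | { kind: "plan"; plan: { season?: string; weather?: string; timeOfDay?: string } };

function routeReply(raw: string): AgentReply {
  try {
    const parsed = JSON.parse(raw);
    // A structured response carries a "plan" field; anything else is chat.
    if (parsed && typeof parsed === "object" && "plan" in parsed) {
      return { kind: "plan", plan: parsed.plan };
    }
  } catch {
    // Not JSON: fall through and treat it as conversational text.
  }
  return { kind: "chat", message: raw };
}
```

Keeping this branch point explicit is what lets one agent both converse and hand a deterministic blueprint to the image-generation stage.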
Challenges we ran into
Serverless State Management & Memory Constraints (OOM)

Our biggest hurdle was handling large images across a serverless architecture. Initially, we sent Base64-encoded images directly in JSON payloads between the iOS client, the Cloud Run backend, and the Vertex AI API. Base64 inflates a payload by roughly 33%:

$$S_{base64} = 4 \times \left\lceil \frac{S_{binary}}{3} \right\rceil$$

This inflation caused severe network latency and risked crashing stateless Cloud Run instances under concurrent requests.
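The formula above is easy to verify numerically; a tiny helper makes the ~33% overhead concrete (the 6 MB photo size is just an example):

```typescript
// Exact Base64-encoded size (with padding) of a binary payload of n bytes:
// S_base64 = 4 * ceil(n / 3)
function base64Size(binaryBytes: number): number {
  return 4 * Math.ceil(binaryBytes / 3);
}

// Example: a 6 MB photo inflates to 8 MB once Base64-encoded.
const photo = 6 * 1024 * 1024; // 6291456 bytes
console.log(base64Size(photo)); // 8388608 bytes (~33% larger)
```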
The Fix: We re-architected the flow to leverage Google Cloud Storage (GCS). The app now uploads binary images to a temporary GCS bucket and passes a lightweight gs:// URI between services, so inter-service payloads stay at a small constant size regardless of image resolution, enabling fast, multi-turn agent interactions without repeated uploads. We also set a 1-day lifecycle deletion policy on the bucket for fully automated cleanup.
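The handoff can be sketched as follows: only a short URI string, never the image bytes, travels between the client, Cloud Run, and Vertex AI. The bucket name and path layout below are assumptions for illustration, not ReScene's actual configuration.

```typescript
// Hypothetical GCS handoff: the client uploads the binary image once, and
// downstream services receive only a gs:// URI. Bucket/prefix names are
// illustrative; a 1-day lifecycle rule on the bucket handles cleanup.
import { randomUUID } from "node:crypto";

const TEMP_BUCKET = "rescene-temp-uploads"; // assumed bucket name

// The URI that gets passed between Cloud Run and Vertex AI.
function makeGcsUri(bucket: string, objectName: string): string {
  return `gs://${bucket}/${objectName}`;
}

// Mint a unique object path for a fresh upload.
function newUploadUri(): string {
  return makeGcsUri(TEMP_BUCKET, `uploads/${randomUUID()}.jpg`);
}

console.log(newUploadUri());
```

Whatever the image size, the payload exchanged between services is just this short string, which is what removes the OOM and latency pressure from the stateless instances.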
Accomplishments that we're proud of
- Dual-Mode Agent Routing: Engineered Gemini to seamlessly switch between conversational text and structured JSON “Proposal Cards” using prompt design and Vertex AI’s structured outputs.
- Zero-Friction UX: The native SwiftUI Before/After slider, enhanced with tactile haptic feedback, delivers an instantly satisfying “Aha!” moment when users see the AI render.
- Production-Ready Architecture: Implemented a full CI/CD pipeline with GitHub and Google Cloud Run, supported by a robust, stateless backend built for scale and reliability.
What we learned
- Prompt Engineering is a Backend Skill: Crafting a user-facing prompt is straightforward, but designing a meta-prompt that instructs an Agent to generate prompts for another model is a complex engineering challenge.
- The Power of Serverless + Storage: By leveraging cloud storage buckets for inter-API handoffs, we can elegantly bypass the limitations of stateless compute and avoid memory bottlenecks.
- Structured Outputs are Essential: Relying on raw text from LLMs for app logic leads to instability. Enforcing JSON schemas is the only reliable way to build robust AI pipelines.
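To make the last point concrete, here is what a schema for a "Proposal Card" might look like, written in the JSON-schema style that structured-output APIs such as Vertex AI's accept. The field names are hypothetical, chosen to match the features described above.

```typescript
// Hypothetical response schema for a "Proposal Card" (illustrative fields).
// Enforcing a schema like this keeps the app logic free of brittle
// free-text parsing.
const proposalSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    season: { type: "string", enum: ["spring", "summer", "autumn", "winter"] },
    weather: { type: "string" },
    timeOfDay: { type: "string" },
    renderPrompt: { type: "string" }, // prompt handed to the image model
  },
  required: ["title", "renderPrompt"],
};
```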
What's next for ReScene
We plan to fully integrate the Gemini Live Multimodal Voice API so users can talk to their AI Director while pointing their camera at a scene, receiving real-time environmental remastering proposals on the fly.
Built With
- fastify
- gemini
- google-cloud
- ios
- swift
