Inspiration

We wanted to build an ad tool that feels less like forced placement and more like a creative editing assistant. A lot of digital ads interrupt the experience they are trying to monetize, so we asked a different question: what if a system could understand the context of a scene first, then generate an ad moment that actually fits?

That idea became CAFAI: Context-Aware Fused Ad Insertion. The goal was to make product placement feel native instead of awkward, and to prove that the same idea could work across both video and website media.

What it does

CAFAI is a creative generation workflow with two lanes:

1. Video ad insertion

A user uploads a source video and a product; the system analyzes the video for candidate insertion windows, ranks the best moment, generates a short branded bridge clip, and stitches that clip back into the original footage as a previewable final cut.
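The "rank the best moment" step can be sketched as a simple ordering over candidate windows. This is a minimal illustration, not CAFAI's actual code: the `Window` type, its fields, and the scoring are all assumptions standing in for whatever the scene analysis and ranking stages really produce.

```go
package main

import (
	"fmt"
	"sort"
)

// Window is a hypothetical candidate insertion window produced by
// scene analysis. Field names are illustrative assumptions.
type Window struct {
	StartSec, EndSec float64
	SceneScore       float64 // how well the scene suits the product
}

// rankWindows orders candidates best-first; ties break toward the
// earlier window so the ad lands as soon as a good moment exists.
func rankWindows(ws []Window) []Window {
	out := append([]Window(nil), ws...) // don't mutate the caller's slice
	sort.Slice(out, func(i, j int) bool {
		if out[i].SceneScore != out[j].SceneScore {
			return out[i].SceneScore > out[j].SceneScore
		}
		return out[i].StartSec < out[j].StartSec
	})
	return out
}

func main() {
	candidates := []Window{
		{StartSec: 3, EndSec: 6, SceneScore: 0.41},
		{StartSec: 12, EndSec: 15, SceneScore: 0.88},
		{StartSec: 27, EndSec: 30, SceneScore: 0.88},
	}
	best := rankWindows(candidates)[0]
	fmt.Printf("insert at %.0f-%.0fs\n", best.StartSec, best.EndSec)
}
```

In the real pipeline the score would come from the analysis and LLM-ranking stages rather than a single number, but the shape of the step is the same: many candidates in, one chosen window out.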

2. Website ad generation

A user provides product info plus article context, and the system generates a matching banner ad and vertical ad designed to fit the tone and topic of the page. Those assets can then be previewed inside example website layouts.
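The website lane boils down to turning one brief into one prompt per placement. The sketch below is purely illustrative: `AdBrief`, the field names, and the prompt wording are assumptions, and the real system would send these prompts to an image model rather than print them.

```go
package main

import "fmt"

// AdBrief is an assumed input shape for the website ad lane,
// combining product info with the article context it should match.
type AdBrief struct {
	Product string
	Tagline string
	Article string
}

// Size pairs a named placement with pixel dimensions.
type Size struct {
	Name          string
	Width, Height int
}

// prompts builds one image-generation prompt per placement, tying
// the product to the article so the ad fits the page it sits on.
func prompts(b AdBrief, sizes []Size) []string {
	out := make([]string, 0, len(sizes))
	for _, s := range sizes {
		out = append(out, fmt.Sprintf(
			"%dx%d %s ad for %s (%q), matching the tone of an article about: %s",
			s.Width, s.Height, s.Name, b.Product, b.Tagline, b.Article))
	}
	return out
}

func main() {
	b := AdBrief{Product: "TrailBrew Coffee", Tagline: "Fuel the climb", Article: "alpine hiking routes"}
	for _, p := range prompts(b, []Size{{"banner", 728, 90}, {"vertical", 300, 600}}) {
		fmt.Println(p)
	}
}
```

The two sizes shown (728x90 and 300x600) are standard IAB-style placements used here as stand-ins for whatever dimensions CAFAI actually targets.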

In short, CAFAI helps turn product promotion into something more contextual, more visual, and more watchable.

How we built it

We built CAFAI as a full-stack hackathon project with a custom frontend and backend pipeline.

Frontend

We built the user-facing product in React, with a playful pink voxel-inspired interface. The frontend includes:

  • a homepage and proof wall
  • a gallery for processed outputs
  • an upload flow for both video ads and website ads
  • a simple About page
  • review pages for generated results

Backend

We used Go for the control plane and API layer. The backend manages:

  • job creation and workflow stages
  • analysis and slot selection
  • generation requests
  • preview rendering
  • website ad asset generation and delivery
  • metadata storage through SQLite
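The workflow stages above can be modeled as a small state machine, which is roughly how a Go control plane would track a job between steps. This is a hedged sketch: the stage names and happy-path order are assumptions, not CAFAI's actual labels.

```go
package main

import "fmt"

// Stage is one step of the insertion workflow. The names below are
// illustrative; the real pipeline's stages may differ.
type Stage string

const (
	StageQueued     Stage = "queued"
	StageAnalyzing  Stage = "analyzing"
	StageRanking    Stage = "ranking"
	StageGenerating Stage = "generating"
	StageStitching  Stage = "stitching"
	StageDone       Stage = "done"
	StageFailed     Stage = "failed"
)

// next defines the happy-path order of stages.
var next = map[Stage]Stage{
	StageQueued:     StageAnalyzing,
	StageAnalyzing:  StageRanking,
	StageRanking:    StageGenerating,
	StageGenerating: StageStitching,
	StageStitching:  StageDone,
}

// advance moves a job forward one stage, or marks it failed when a
// step reports an error. Terminal stages stay where they are.
func advance(s Stage, ok bool) Stage {
	if !ok {
		return StageFailed
	}
	if n, found := next[s]; found {
		return n
	}
	return s
}

func main() {
	s := StageQueued
	for s != StageDone {
		s = advance(s, true)
		fmt.Println(s) // each transition would be persisted to SQLite
	}
}
```

Persisting the current stage per job (in CAFAI's case, to SQLite) is what lets the frontend show understandable progress even while the backend orchestrates slow external services.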

AI / media services

We connected multiple services depending on the task:

  • Azure Video Indexer for scene analysis
  • Azure OpenAI for ranking insertion slots and generating creative prompts
  • Higgsfield Kling as the primary video generation path
  • Azure ML as a fallback generation path
  • Hugging Face / Stable Diffusion XL for website ad image generation

We also used local file storage for previews and generated assets, plus optional Notion logging for job audit history.

Challenges we ran into

One of the biggest challenges was making the project feel like one product instead of a pile of disconnected AI features. Video insertion, static ad generation, previews, galleries, and job tracking all had different technical needs, but the experience still had to feel coherent.

We also ran into the practical challenge of building a pipeline that depends on multiple external services. Different providers have different inputs, speeds, and failure cases, so a lot of work went into keeping the workflow understandable even when the backend was doing complex orchestration.

Another challenge was proving that the generated result was actually believable. It was not enough to say “AI made a clip.” We needed a proof-oriented UI that showed the original scene, the selected insert window, the generated bridge, and the final stitched output.

Accomplishments that we're proud of

We are proud that CAFAI is not just a concept mockup. It is a working multi-step system with a real frontend, backend routes, stored assets, and demo outputs.

Some highlights we are especially proud of:

  • building both a video ad lane and a website ad lane in one project
  • creating a polished frontend with a strong visual identity
  • showing proof assets instead of just final outputs
  • supporting real generated website ads through the backend
  • stitching branded video moments back into source footage
  • making the whole project feel playful on the surface while still technically serious underneath

What we learned

We learned a lot about designing AI workflows as products, not just demos. The most important lesson was that orchestration matters as much as generation. Picking the right moment, structuring the pipeline, showing evidence, and handling state between steps are what make the experience feel useful.

We also learned how much presentation affects trust. A cleaner interface, clearer job states, and visible proof artifacts make users more willing to believe the output. On the engineering side, we got deeper experience with React, Go APIs, provider integration, media handling, and building around imperfect AI outputs.

What's next for CAFAI

Next, we want to make CAFAI more robust and more automatic.

Our next steps would be:

  • asynchronous website ad jobs with better progress tracking
  • stronger failover between providers
  • richer article ingestion from URLs instead of pasted text
  • more export options and packaged deliverables
  • better automation around review and approval
  • more examples, more products, and stronger production-readiness

Long term, we want CAFAI to become a real creative system for context-aware advertising across multiple formats, not just a one-off demo.

Built With

  • azure-blob-storage
  • azure-ml
  • azure-openai
  • azure-video-indexer
  • css
  • go
  • higgsfield-kling
  • hugging-face-inference-api
  • logging
  • mcp
  • notion
  • react
  • sqlite
  • stable-diffusion-xl
  • typescript
  • vite