Inspiration

Editing videos is powerful but painful. Traditional editors require scrubbing through timelines, hunting clips, and stacking effects. We asked: what if editing were as easy as telling your computer what to do?

That’s how this project was born — inspired by Cursor, but reimagined for video editing. Instead of cut() and drag-and-drop, you just say:

“Trim the part where the guy speaking is silent.”

…and it happens.


How We Built It

  • Backend (FastAPI + Twelve Labs + FFmpeg): Handles video/audio processing, trims clips, adds effects, and generates previews.
  • NLP (Cohere): Parses natural language into structured editing commands.
  • Video Search (Twelve Labs): Finds key moments (like “LeBron silent” or “when the person dies”).
  • Frontend: A minimal Cursor-like UI with chat-driven commands, video preview, and instant feedback.
  • Dev Environment (Windsurf): Used Windsurf’s AI-first IDE to rapidly prototype, refactor, and debug the entire stack under hackathon time pressure. It cut down iteration time massively.
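The NLP step above boils natural language down to a structured editing command. A minimal sketch of what that intermediate format and its validation could look like, with field names ("action", "query", "effect") that are our own illustrative assumptions rather than the project's actual schema:

```python
# Hypothetical schema for the structured commands the NLP step emits.
from dataclasses import dataclass
from typing import Optional

VALID_ACTIONS = {"trim", "cut", "add_effect", "overlay_audio"}

@dataclass
class EditCommand:
    action: str                   # e.g. "trim"
    query: str                    # moment description passed to video search
    effect: Optional[str] = None  # effect name when action == "add_effect"

def parse_intent(raw: dict) -> EditCommand:
    """Validate a parsed-intent dict (e.g. JSON from the LLM) into an EditCommand."""
    if raw.get("action") not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {raw.get('action')!r}")
    return EditCommand(
        action=raw["action"],
        query=raw.get("query", ""),
        effect=raw.get("effect"),
    )

cmd = parse_intent({"action": "trim", "query": "the part where the speaker is silent"})
```

Validating the LLM's output against a fixed set of actions is also a cheap guard against hallucinated commands reaching the executor.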

The workflow looks like this:

User Command → Cohere (parse intent) → Twelve Labs (find moment) 
→ Executor (FFmpeg) → Preview video
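The stages above can be wired together in a few lines. This is a minimal sketch with the external services stubbed out; the function names (parse_intent, find_moment, run_ffmpeg_trim) are our own placeholders, not the project's real API:

```python
# End-to-end sketch of the command pipeline, with each external stage stubbed.

def parse_intent(command: str) -> dict:
    # Real system: call Cohere to parse the command. Here: a canned response.
    return {"action": "trim", "query": "speaker is silent"}

def find_moment(query: str) -> tuple[float, float]:
    # Real system: query the Twelve Labs search index. Here: a fixed segment.
    return (12.0, 18.5)

def run_ffmpeg_trim(src: str, start: float, end: float, dst: str) -> str:
    # Real system: shell out to ffmpeg. Here: just report the planned command.
    return f"ffmpeg -ss {start} -to {end} -i {src} -c copy {dst}"

def handle(command: str, src: str) -> str:
    intent = parse_intent(command)
    start, end = find_moment(intent["query"])
    return run_ffmpeg_trim(src, start, end, "preview.mp4")

plan = handle("Trim the part where the guy speaking is silent", "talk.mp4")
```

Keeping each stage behind its own function also made it easy to swap in mocks when the live APIs were too slow for the demo.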

What We Learned

  • How much AI-powered development environments like Windsurf accelerate building — almost like pair-programming with a senior engineer on demand.
  • How multimodal AI (language + video) can completely reshape creative tools.
  • The trade-offs between real-time performance and hackathon prototyping speed.
  • How important it is to scope ruthlessly: better to demo 3 magical features than 10 broken ones.
  • That even for creative tools, structured pipelines (NLP → search → execution) make everything easier.

Challenges We Faced

  • Latency: Chaining Cohere, our server, and Twelve Labs gets heavy. We worked around it with mocks and pre-indexed demo videos.
  • Parsing ambiguity: Natural language is messy. We had to prompt Cohere carefully and fall back to heuristics.
  • Media handling: Combining audio overlays with video timelines isn't trivial; FFmpeg saved us here.
  • Time pressure: 30 hours forced us to keep scope razor-thin.
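For the media-handling point, the audio-overlay step can be expressed as a single ffmpeg invocation built from Python. The filter graph below delays the overlay track and mixes it into the original audio; the flags are standard ffmpeg, but the exact graph the project used is our assumption:

```python
# Build an ffmpeg command that mixes an overlay audio track into a video
# at a given timestamp, copying the video stream untouched.
import subprocess

def overlay_audio_cmd(video: str, overlay: str, out: str, at_seconds: float) -> list[str]:
    delay_ms = int(at_seconds * 1000)  # adelay takes milliseconds, per channel
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-i", overlay,
        "-filter_complex",
        f"[1:a]adelay={delay_ms}|{delay_ms}[ov];"
        f"[0:a][ov]amix=inputs=2:duration=first[a]",
        "-map", "0:v", "-map", "[a]",
        "-c:v", "copy",
        out,
    ]

cmd = overlay_audio_cmd("clip.mp4", "whoosh.mp3", "out.mp4", 3.0)
# subprocess.run(cmd, check=True)  # requires ffmpeg on PATH
```

`duration=first` keeps the output as long as the original video's audio, so a long overlay doesn't stretch the timeline.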

What’s Next

  • More robust effect libraries (glitch, transitions, auto-cuts).
  • Real-time collaboration — multiple users editing the same video via chat.
  • Scaling up beyond demos: distributed video rendering for fast results.

Final Thought

We set out to answer one question:

What if video editing was as simple as talking to your video?

This project is our first step toward that future.
