Project Story

Inspiration

Modern organizations face a fragmented security landscape — HTTP traffic anomalies are monitored in one tool, dependency vulnerabilities in another, cloud infrastructure in yet another, and code security in a separate pipeline entirely. Security teams are forced to context-switch between dashboards, losing precious time during incidents. We asked: what if a single AI-driven dashboard could unify all of these into one real-time command center?

CyberLens was born from the frustration of juggling disconnected security tools and the realization that large language models have become powerful enough to perform nuanced threat analysis, not just pattern matching.

What it does

CyberLens is a full-stack security operations platform with three core capabilities:

  1. Live Threat Monitoring — Watches Nginx access logs in real time, batches HTTP requests, and uses Google Gemini (via ADK) to classify threats with structured AI analysis. Alerts surface instantly through WebSocket push.
  2. Supply Chain & Code Scanning — Scans GitHub repos or local projects for dependency vulnerabilities via the OSV database, then runs a multi-stage AI code analysis pipeline (inventory → chunking → seven risk passes → evidence verification → synthesis) to find security issues in the source code itself.
  3. GCP Security SOC — A dark-themed "war room" dashboard that monitors Cloud Run services, aggregates security events from Cloud Armor, IAM Audit, and IAP logs, clusters them into incidents with a rule engine, and visualizes threats on a geo attack map with real-time timeseries metrics.
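
The batch-then-analyze behavior in feature 1 can be sketched as a small buffer that flushes when it fills up or when a timeout elapses. Class and field names here are illustrative, not the actual CyberLens code:

```python
import time
from typing import Callable, List, Optional


class LogBatcher:
    """Collects parsed log entries and flushes them in batches.

    Flushes when the batch reaches `max_size` entries or when `max_wait`
    seconds have passed since the first entry arrived, whichever comes first.
    """

    def __init__(self, flush: Callable[[List[dict]], None],
                 max_size: int = 15, max_wait: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.flush = flush        # callback that hands a full batch to the AI analyzer
        self.max_size = max_size
        self.max_wait = max_wait
        self.clock = clock        # injectable clock, which makes the class testable
        self.batch: List[dict] = []
        self.first_at: Optional[float] = None

    def add(self, entry: dict) -> None:
        if not self.batch:
            self.first_at = self.clock()
        self.batch.append(entry)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically so a quiet log still flushes on timeout."""
        self._maybe_flush()

    def _maybe_flush(self) -> None:
        if not self.batch:
            return
        full = len(self.batch) >= self.max_size
        stale = self.clock() - self.first_at >= self.max_wait
        if full or stale:
            self.flush(self.batch)
            self.batch, self.first_at = [], None
```

The size/timeout pair keeps AI calls cheap under load (one call per 15 requests) while bounding worst-case alert latency at the timeout.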

Everything connects through Redis pub/sub → Socket.IO, so the frontend updates the instant something happens.
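
The publish side of that path can be sketched as a thin helper on the backend. The channel name and envelope fields below are illustrative assumptions, not CyberLens's actual wire format:

```python
import json
from datetime import datetime, timezone


def publish_event(redis_client, channel: str, event_type: str, payload: dict) -> None:
    """Publish a security event on a Redis pub/sub channel.

    The Socket.IO bridge subscribes to the same channel and forwards each
    message to connected browsers, so the envelope carries an event type
    the frontend can route on.
    """
    envelope = {
        "type": event_type,
        "emitted_at": datetime.now(timezone.utc).isoformat(),
        "data": payload,
    }
    redis_client.publish(channel, json.dumps(envelope))
```

Because publishers never hold connection state, any Django or Celery worker can emit events and the Socket.IO layer alone worries about which browsers are listening.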

How we built it

The backend is Django 5.1 with Celery workers handling all async AI and scanning tasks. We chose Django REST Framework for clean API serialization and used Google ADK (InMemoryRunner) with Pydantic schemas to get structured, typed output from Gemini — no more parsing freeform LLM text.
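
The structured-output idea looks roughly like this — sketched with stdlib dataclasses so it runs anywhere, whereas the real pipeline hands a Pydantic model to ADK as the agent's output schema. Field names below are hypothetical:

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ThreatAssessment:
    """Illustrative stand-in for a Pydantic schema given to the ADK agent.

    The point: the LLM's reply is parsed into typed, validated fields
    instead of being treated as freeform text.
    """
    verdict: str        # e.g. "malicious" | "suspicious" | "benign"
    confidence: float   # 0.0 - 1.0
    rationale: str

    @classmethod
    def from_llm_json(cls, raw: str) -> "ThreatAssessment":
        data = json.loads(raw)
        expected = {f.name for f in fields(cls)}
        missing = expected - data.keys()
        if missing:
            raise ValueError(f"LLM output missing fields: {sorted(missing)}")
        return cls(**{k: data[k] for k in expected})
```

With a schema enforced at the boundary, downstream code (alert routing, severity scoring) can rely on the shape of every assessment rather than defensively parsing text.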

The frontend is a React 18 + TypeScript SPA built with Vite. The GCP SOC page uses a Material You (M3) dark theme with custom SOC color tokens designed for extended monitoring sessions. The dependency visualization uses D3.js for interactive tree graphs.

A dedicated Node.js Socket.IO server bridges Redis pub/sub to the browser, keeping the backend stateless and horizontally scalable. The entire stack runs in Docker Compose with PostgreSQL, Redis, and Nginx.

For code scanning, we built a multi-stage ADK pipeline: file inventory → intelligent chunking → AI summarization → seven specialized risk analysis passes → candidate generation → evidence expansion → verification → final repository synthesis. This approach avoids the "dump everything into one prompt" anti-pattern and produces far more accurate findings.
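
The stage chaining can be sketched as a plain function pipeline, where each stage enriches a shared context dict. The stage names mirror the list above, but the bodies are toy stand-ins for the real ADK-backed stages:

```python
from typing import Callable, Dict, List

# Each stage takes the accumulated context and returns it enriched.
Stage = Callable[[Dict], Dict]


def run_pipeline(stages: List[Stage], context: Dict) -> Dict:
    """Run scan stages in order, each building on prior results."""
    for stage in stages:
        context = stage(context)
    return context


# Minimal illustrative stages (the real ones invoke AI agents per chunk).
def inventory(ctx: Dict) -> Dict:
    ctx["files"] = [f for f in ctx["repo"] if f.endswith(".py")]
    return ctx

def chunk(ctx: Dict) -> Dict:
    ctx["chunks"] = [ctx["files"][i:i + 2] for i in range(0, len(ctx["files"]), 2)]
    return ctx

def synthesize(ctx: Dict) -> Dict:
    ctx["report"] = {"chunks_scanned": len(ctx["chunks"])}
    return ctx
```

Keeping each pass a separate stage is what lets the pipeline stay inside token limits: every AI call sees only one chunk plus the summaries accumulated in the context, never the whole repository.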

Challenges we faced

  • Structured AI output — Getting Gemini to reliably return valid JSON matching our Pydantic schemas required careful prompt engineering and a clean_json_response() utility to strip markdown fencing from LLM output.
  • Real-time at scale — Batching log entries (15 requests or 5-second timeout) was critical to avoid overwhelming the AI analyzer while keeping latency low.
  • Code scan pipeline design — A naive "scan everything at once" approach either hit token limits or produced shallow results. The multi-stage chunked pipeline with seven specialized risk passes solved this but required careful orchestration.
  • GCP log normalization — Logs from Cloud Run, Cloud Armor, IAM Audit, and IAP have wildly different schemas. Building a unified event parser with pattern-based attack classification was one of the more tedious but essential pieces.
  • Session-based auth across three services — Coordinating Django sessions across the backend, frontend proxy, and Socket.IO realtime server required the realtime service to verify sessions against the backend API.
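
A minimal version of the fence-stripping utility mentioned in the first challenge might look like this (the production clean_json_response() likely handles more edge cases, such as leading prose around the JSON):

```python
import json
import re

# Matches an opening ```json / ``` fence at line start, or a closing ``` fence.
_FENCE_RE = re.compile(r"^```(?:json)?\s*|\s*```$", re.MULTILINE)


def clean_json_response(text: str) -> dict:
    """Strip the markdown code fences LLMs often wrap around JSON, then parse.

    Returns the parsed object; raises json.JSONDecodeError if what remains
    still is not valid JSON, so callers can retry the prompt.
    """
    cleaned = _FENCE_RE.sub("", text).strip()
    return json.loads(cleaned)
```

Pairing a utility like this with schema validation gives two layers of defense: the fence stripper fixes cosmetic wrapping, and the schema rejects structurally wrong replies.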

What we learned

  • Google ADK with Pydantic models is a game-changer for structured AI output — it turns LLMs from "text generators" into reliable components in a data pipeline.
  • Redis pub/sub as the sole communication layer between backend and frontend (via a thin Socket.IO bridge) keeps the architecture clean and each service independently deployable.
  • Multi-stage AI pipelines with specialized passes dramatically outperform single-prompt approaches for complex analysis tasks like code security scanning.
