YuroLabs | Devpost

YuroLabs: From Prompt to Production, Automating the Entire ML Pipeline

YuroLabs is an AI platform that lets anyone — technical or non-technical — create, train, and deploy custom machine learning models simply by describing them in natural language.

Behind the scenes, YuroLabs automates dataset discovery, synthetic data generation, architecture search, training, fine-tuning, and deployment, returning a ready-to-use PyTorch model file or a one-click deployable endpoint.

Inspiration

Training modern AI models is incredibly hard. Finding the right architecture, dataset, and training configuration — and integrating them into a coherent pipeline — typically requires weeks of manual work and deep ML expertise.

We wanted to simplify that entire process. With YuroLabs, all you have to do is prompt:

"Build me a vision model that can detect different dog breeds." …and our system handles everything else — from dataset sourcing to cloud GPU training to model deployment.

How We Built It

YuroLabs is built as an autonomous AI agent orchestration system that can interpret prompts, search the web, allocate compute, and build full ML pipelines dynamically.

Prompt-Driven Model Generation

A conversational interface (chatbot) interprets natural-language model descriptions. The agent classifies the request as a vision, language, multimodal, or LLM task. Based on the task type, it constructs a structured ML plan: dataset requirements, training objectives, model family, and architecture candidates.
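To make the "structured ML plan" concrete, here is a minimal sketch of how a prompt could be classified and turned into a plan object. It is an illustration only: the keyword table, the `MLPlan` fields, and the `classify_task` and `build_plan` helpers are hypothetical names, and a real agent would more likely use an LLM call than keyword matching.

```python
from dataclasses import dataclass, field

# Hypothetical keyword table (an assumption for this sketch).
# Order matters: more specific task types are checked first.
TASK_KEYWORDS = {
    "multimodal": ("caption", "image and text", "visual question"),
    "vision": ("image", "photo", "detect", "vision", "segment"),
    "llm": ("chatbot", "assistant", "instruction"),
    "language": ("text", "sentiment", "translate", "summarize"),
}

# Architecture candidates per task type, taken from the families named in this write-up.
CANDIDATES = {
    "vision": ["CNN", "ViT"],
    "language": ["Transformer"],
    "llm": ["LLM + LoRA adapter"],
    "multimodal": ["Vision-language Transformer"],
}


@dataclass
class MLPlan:
    task_type: str
    objective: str
    dataset_requirements: list = field(default_factory=list)
    architecture_candidates: list = field(default_factory=list)


def classify_task(prompt: str) -> str:
    """Return the first task type whose keywords appear in the prompt."""
    text = prompt.lower()
    for task, keywords in TASK_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return task
    return "language"  # fallback when nothing matches


def build_plan(prompt: str) -> MLPlan:
    """Turn a natural-language request into a structured plan."""
    task = classify_task(prompt)
    objective = prompt.strip()
    return MLPlan(
        task_type=task,
        objective=objective,
        dataset_requirements=[f"labeled {task} data matching: {objective}"],
        architecture_candidates=list(CANDIDATES[task]),
    )


plan = build_plan("Build me a vision model that can detect different dog breeds.")
print(plan.task_type, plan.architecture_candidates)
```

Keeping the plan as a plain data object means every later stage (dataset search, architecture selection, training) can read the same fields instead of re-parsing the prompt.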

Data Discovery & Curation

The system’s MCP (Model Context Protocol) agent has access to the web and to the Kaggle and Hugging Face APIs. It autonomously searches for and aggregates the most relevant datasets for the task. Before training, datasets are validated for quality, label consistency, and input/output schema compatibility.
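The validation step could look roughly like the sketch below. It assumes each dataset has already been loaded into a list of dictionaries with `input` and `label` keys; that record layout and the `validate_dataset` helper are assumptions made for illustration, not the project's actual schema.

```python
def validate_dataset(records, allowed_labels=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    seen_labels = set()
    for i, record in enumerate(records):
        # Schema check: both fields must be present and non-empty.
        missing = [key for key in ("input", "label") if record.get(key) in (None, "")]
        if missing:
            problems.append(f"record {i}: missing {missing}")
            continue
        # Label consistency: compare labels case-insensitively.
        label = str(record["label"]).strip().lower()
        if allowed_labels is not None and label not in allowed_labels:
            problems.append(f"record {i}: unexpected label {record['label']!r}")
        seen_labels.add(label)
    # Coverage check: every expected label should appear at least once.
    if allowed_labels is not None:
        for label in sorted(set(allowed_labels) - seen_labels):
            problems.append(f"no examples for label {label!r}")
    return problems


records = [
    {"input": "a.jpg", "label": "Beagle"},
    {"input": "b.jpg", "label": "cat"},
    {"input": "c.jpg"},
]
for problem in validate_dataset(records, allowed_labels={"beagle", "pug"}):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets the agent decide whether a source is repairable or should be dropped.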

Synthetic Data Generation (Stable Diffusion + SAM + CLIP)

When dataset gaps are detected (missing classes, low variety, or domain imbalance), we fill them using synthetic data generation. Our process:

Extract objects from frames using SAM (Segment Anything Model).

Perform feature extraction using CLIP-Segment to identify visual attributes.

Use Stable Diffusion to regenerate the object over 10–15 synthetic backgrounds, creating diverse samples for robust training.

Apply data augmentation (rotations, color jitter, scaling) to further expand coverage.

This allows users to train models even with limited data while maintaining high accuracy and generalization.
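Gap detection can be sketched as a simple count over class labels. The thresholds below (`min_per_class`, `max_imbalance`) and both helper names are hypothetical; only the 10 to 15 backgrounds per object comes from the process described above.

```python
import math
from collections import Counter


def find_gaps(labels, expected_classes, min_per_class=50, max_imbalance=3.0):
    """Map each under-represented class to the number of extra samples it needs."""
    counts = Counter(labels)
    largest = max(counts.values(), default=0)
    # A class is short if it is below the absolute floor, or if the largest
    # class is more than max_imbalance times its size.
    target = max(min_per_class, math.ceil(largest / max_imbalance))
    return {
        cls: target - counts[cls]
        for cls in expected_classes
        if counts[cls] < target
    }


def objects_needed(deficit, backgrounds_per_object=10):
    """Extracted objects required if each is regenerated over N backgrounds."""
    return math.ceil(deficit / backgrounds_per_object)


labels = ["beagle"] * 300 + ["pug"] * 40
gaps = find_gaps(labels, ["beagle", "pug", "husky"])
print(gaps)                         # {'pug': 60, 'husky': 100}
print(objects_needed(gaps["pug"]))  # 6
```

A missing class ("husky" here) shows up with a count of zero, so it is handled by the same rule as an imbalanced one.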

Model Architecture & Training

We use meta-learning heuristics to select the optimal architecture for each use case — from CNNs and ViTs for vision tasks to Transformers and LLM adapters for text. The agent provisions GPU instances on the cloud for training and fine-tuning. LoRA (Low-Rank Adaptation) is used for efficient fine-tuning of large models on limited compute. The final trained model is exported as a .pt PyTorch file, ready for downstream inference or deployment.
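The reason LoRA helps on limited compute is easy to see with a parameter count. LoRA freezes a weight matrix W of shape (d_out, d_in) and trains only a low-rank update B·A, where B has shape (d_out, r) and A has shape (r, d_in). The sketch below compares the two counts; the 4096 × 4096 layer size and rank 8 are illustrative assumptions, not values from our training runs.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int):
    """Return (full fine-tuning params, LoRA trainable params) for one weight matrix."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_out + d_in)   # B is d_out x r, A is r x d_in
    return full, lora


full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(full)         # 16777216
print(lora)         # 65536
print(lora / full)  # 0.00390625
```

For this layer, LoRA trains about 0.4% of the parameters that full fine-tuning would update, which is what makes large-model adaptation feasible on a single rented GPU.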

Deployment & CLI Integration

We built a fully functional CLI + pip package (pip install yuro-labs) for technical users.

It enables:

pip install yurolabs

yuro --help (lists the available commands)

yuro models (shows all the models you have trained, with their indices)

yuro deploy 1 (or the index of any other trained model)

Features:

One-command deployment to cloud GPU endpoints.

Automatic model versioning and inference endpoint creation.

Integration with Hugging Face, AWS, and Vercel AI SDKs for inference APIs.
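For readers curious how a CLI with this shape can be structured, here is a minimal sketch using Python's standard `argparse` subcommands. It is not the source of the published package: the `MODELS` list is a placeholder for a backend lookup, the 1-based indexing is an assumption based on the `yuro deploy 1` example, and `run` only returns the lines it would print.

```python
import argparse

# Placeholder registry (an assumption); a real CLI would query a backend.
MODELS = ["dog-breed-classifier", "sentiment-model"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="yuro", description="Manage trained models.")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("models", help="list trained models with their indices")
    deploy = sub.add_parser("deploy", help="deploy a trained model by index")
    deploy.add_argument("index", type=int, help="index shown by 'yuro models'")
    return parser


def run(argv):
    """Parse argv and return the output lines for the chosen subcommand."""
    args = build_parser().parse_args(argv)
    if args.command == "models":
        return [f"{i}: {name}" for i, name in enumerate(MODELS, start=1)]
    # Only "deploy" remains, because the subcommand is required.
    if not 1 <= args.index <= len(MODELS):
        return [f"error: no model with index {args.index}"]
    return [f"deploying {MODELS[args.index - 1]}"]


print("\n".join(run(["models"])))
print("\n".join(run(["deploy", "1"])))
```

`--help` comes for free from `argparse`, both at the top level and per subcommand.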

Non-technical users can use our lovable web UI, which offers:

Drag-and-drop dataset upload.

Live training visualization.

One-click “Prompt → Model → Deploy” flow.

Architecture Overview

User Prompt
↓
Prompt Parser / MCP Agent
↓
Dataset Retriever (Kaggle + Web + Hugging Face)
↓
Synthetic Data Generator (SAM + CLIP + Stable Diffusion)
↓
Model Architecture Selector
↓
Training Orchestrator (PyTorch + LoRA + Cloud GPU)
↓
Fine-tuned Model (.pt)
↓
Deployment Engine (CLI + API)
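One simple way to express a linear flow like this in code is a list of named stages that each take and return a shared context dictionary. The sketch below uses stub stages with made-up outputs purely to show the control flow; the stage functions and context keys are hypothetical and do not call any real service.

```python
def run_pipeline(prompt, stages):
    """Run each (name, function) stage in order, threading a context dict through."""
    context = {"prompt": prompt, "completed": []}
    for name, stage in stages:
        context = stage(context)
        context["completed"].append(name)
    return context


# Stub stages: each returns the context with one extra key (values are placeholders).
STAGES = [
    ("parse_prompt", lambda ctx: {**ctx, "task": "vision"}),
    ("retrieve_datasets", lambda ctx: {**ctx, "datasets": ["placeholder-dataset"]}),
    ("generate_synthetic_data", lambda ctx: {**ctx, "synthetic_samples": 0}),
    ("select_architecture", lambda ctx: {**ctx, "architecture": "ViT"}),
    ("train", lambda ctx: {**ctx, "artifact": "model.pt"}),
    ("deploy", lambda ctx: {**ctx, "deployed": True}),
]

result = run_pipeline("Build me a vision model that can detect different dog breeds.", STAGES)
print(result["completed"])
print(result["artifact"])
```

Because every stage has the same signature, a stage can be skipped, retried, or swapped (for example, bypassing synthetic generation when no gaps are found) without touching the others.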

Challenges We Ran Into

Dynamic orchestration: Allowing the agent to autonomously decide between multiple model families and data sources.

Dataset validation: Automating data quality and consistency checks across heterogeneous sources.

Synthetic data realism: Ensuring Stable Diffusion outputs stayed faithful to the features captured in the extracted CLIP embeddings.

Scalable compute: Managing GPU allocation and containerized training environments across multiple cloud providers.

Accomplishments We’re Proud Of

Fully autonomous agent capable of going from prompt → dataset → model → deployment.

Seamless integration of SAM, CLIP, and Stable Diffusion for data generation and augmentation.

A working PyTorch training orchestration pipeline running on real GPUs.

Published a pip-installable CLI for production-level usage.

Created one of the first frameworks that unifies data sourcing, model creation, synthetic generation, and deployment in a single interface.

For Lava, we routed all of our API calls through the Lava API. We are in talks with the founder about also integrating Modal, where we run our training, so that payment processing can go through Lava and be handled on a single page.

What We Learned

Prompt-driven ML can dramatically lower barriers to AI innovation.

Synthetic data can bridge the gap between scarce real data and high-accuracy models.

Building generalist AI agents requires careful control over tool access and autonomy boundaries.

Automation in ML requires as much engineering orchestration as algorithmic design.

What’s Next for YuroLabs

Expand to multi-modal model composition (e.g., vision + language + audio).

Introduce federated training orchestration for privacy-preserving model creation.

Integrate live evaluation metrics and explainability tools directly into the UI.

Launch a collaborative research hub where users can publish and remix generated models.

Optimize our cloud agent runtime for faster model compilation and lower-cost GPU scaling.

In Summary

YuroLabs transforms model creation into a conversational experience — combining the intelligence of autonomous AI agents with the power of modern ML infrastructure.

From prompt → dataset → model → deployment, all in one line of text.