Quick Start Guide
Get your first chute deployed in under 10 minutes! This guide will walk you through creating, building, and deploying a simple AI application.
Prerequisites
Make sure you've completed the Installation & Setup guide first.
Step 1: Create Your First Chute
Let's build a simple text generation chute using a pre-built template.
Create a new file called :
from chutes.chute import NodeSelector
from chutes.chute.template.vllm import build_vllm_chute
# Build a chute using the VLLM template
chute = build_vllm_chute(
username="your-username", # Replace with your Chutes username
readme="## Meta Llama 3.2 1B Instruct\n### Hello.",
model_name="unsloth/Llama-3.2-1B-Instruct",
node_selector=NodeSelector(
gpu_count=1,
),
concurrency=4,
readme="""
# My First Chute
A simple conversational AI powered by Llama 3.2.
## Usage
Send a POST request to `/v1/chat/completions` with your message.
"""
)That's it! You've just defined a complete AI application with:
- ✅ A pre-configured VLLM server
- ✅ Automatic model downloading
- ✅ OpenAI-compatible API endpoints
- ✅ GPU resource requirements
- ✅ Auto-scaling configuration
Step 2: Build Your Image
Build the Docker image for your chute:
chutes build my_first_chute:chute --waitThis will:
- 📦 Create a Docker image with all dependencies
- 🔧 Install VLLM and required libraries
- ⬇️ Pre-download your model
- ✅ Validate the configuration
The flag streams the build logs to your terminal so you can monitor progress.
Step 3: Deploy Your Chute
Deploy your chute to the Chutes platform:
chutes deploy my_first_chute:chuteAfter deployment, you'll see output like:
✅ Chute deployed successfully!
🌐 Public API: https://your-username-my-first-chute.chutes.ai
📋 Chute ID: 12345678-1234-5678-9abc-123456789012Step 4: Test Your Chute
Your chute is now live! Test it with a simple chat completion:
Option 1: Using curl
curl -X POST https://your-username-my-first-chute.chutes.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "unsloth/Llama-3.2-1B-Instruct",
"messages": [
{"role": "user", "content": "Hello! How are you today?"}
],
"max_tokens": 100,
"temperature": 0.7
}'Option 2: Using Python
import asyncio
import aiohttp
import json
async def chat_with_chute():
url = "https://your-username-my-first-chute.chutes.ai/v1/chat/completions"
payload = {
"model": "unsloth/Llama-3.2-1B-Instruct",
"messages": [
{"role": "user", "content": "Hello! How are you today?"}
],
"max_tokens": 100,
"temperature": 0.7
}
async with aiohttp.ClientSession() as session:
async with session.post(url, json=payload) as response:
result = await response.json()
print(json.dumps(result, indent=2))
# Run the test
asyncio.run(chat_with_chute())Option 3: Test Locally
You can also test your chute locally before deploying using the CLI:
# Run your chute locally
chutes run my_first_chute:chute --dev
# Then in another terminal, test with curl
curl -X POST http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "unsloth/Llama-3.2-1B-Instruct",
"messages": [
{"role": "user", "content": "Hello! How are you today?"}
],
"max_tokens": 100,
"temperature": 0.7
}'Step 5: Monitor and Manage
View Your Chutes
chutes chutes listGet Detailed Information
chutes chutes get my-first-chuteCheck Logs
Visit the Chutes Dashboard to view real-time logs and metrics.
Deleting Resources
When you're done with a chute, it's good practice to clean up your resources.
- Note: You must remove a chute before you can delete its image. Images tied to running chutes cannot be deleted.
# 1. Delete the chute
chutes chutes delete <chute_id>
# 2. Delete the image (after chute is removed)
chutes images delete <image_id>What Just Happened?
Congratulations! You just:
- 🎯 Defined an AI application with just a few lines of Python
- 🏗️ Built a production-ready Docker image
- 🚀 Deployed to GPU-accelerated infrastructure
- 🌐 Exposed OpenAI-compatible API endpoints
- 💰 Pay-per-use - only charged when your chute receives requests
Next Steps
Now that you have a working chute, explore more advanced features:
🎨 Try Different Models
Replace with:
- (requires more VRAM)
🔧 Customize Hardware
Adjust your :
NodeSelector(
gpu_count=1, # Use 1 GPU
min_vram_gb_per_gpu=24, # Require 24GB VRAM per GPU
include=["a100", "h100"], # Prefer specific GPU types
exclude=["k80"] # Avoid older GPUs
)🎛️ Tune Performance
Modify engine arguments:
chute = build_vllm_chute(
# ... other parameters ...
engine_args={
"max_model_len": 4096,
"gpu_memory_utilization": 0.9,
"max_num_seqs": 32
}
)📚 Learn Core Concepts
- Understanding Chutes - Deep dive into the Chute class
- Security Architecture - Learn about our TEE and hardware attestation security
- Cords (API Endpoints) - Custom API endpoints
- Custom Images - Build your own Docker images
🏗️ Build Custom Applications
- Your First Custom Chute - Build from scratch
- Custom Image Building - Advanced Docker setups
- Input/Output Schemas - Type-safe APIs
🔗 Integrations
- Vercel AI SDK - Use Chutes with the Vercel AI SDK for streaming, tool calling, and more
Common Questions
Q: How much does this cost? A: You only pay for GPU time when your chute is processing requests. Idle time is free!
Q: Can I use my own models? A: Yes! Upload models to HuggingFace or use the custom image building features.
Q: What about scaling? A: Chutes automatically scales based on demand. Configure to control how many requests each instance handles.
Q: How do I debug issues? A: Check the logs in the Chutes Dashboard or use the CLI:
Troubleshooting
Build failed?
- Check that your model name is correct
- Try with a smaller model first
Deployment failed?
- Verify your image built successfully
- Check your username and chute name are valid
- Ensure you have proper permissions
Can't access your chute?
- Wait a few minutes for DNS propagation
- Check the exact URL from
- Verify the chute is in "running" status
Get Help
- 📖 Detailed Guides: Continue with Your First Custom Chute
- 💬 Community: Join our Discord
- 🐛 Issues: GitHub Issues
- 📧 Support:
Ready to build something more advanced? Check out Your First Custom Chute to learn how to build completely custom applications!