Get started in minutes
Start fast with Serverless
Use popular models instantly with pay-per-token pricing. Perfect for prototyping and quick quality checks.
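A minimal sketch of a pay-per-token serverless call over an OpenAI-compatible REST API. The endpoint URL and model id below are assumptions for illustration; use the values shown in your account.

```python
# Minimal serverless chat completion over the OpenAI-compatible REST API.
# Endpoint URL and model id are placeholders -- substitute your own values.
import requests

resp = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",  # assumed endpoint
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",  # example model id
        "messages": [{"role": "user", "content": "Give me one prototyping tip."}],
    },
    timeout=30,
)
print(resp.json()["choices"][0]["message"]["content"])
```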
Deploy models & autoscale on dedicated GPUs
Serve models on dedicated GPUs with fast autoscaling and minimal cold starts. Optimize deployments for speed and throughput.
Fine-tune models for best quality
Boost model quality with supervised and reinforcement fine-tuning of models up to 1T+ parameters. Start training in minutes, deploy immediately.
Not sure where to start? First, pick the right model for your use case with our model selection guide. Then choose Serverless to prototype quickly, move to Deployments to optimize and run production workloads, or use Fine-tuning to improve quality.

Need help optimizing deployments, fine-tuning models, or setting up production infrastructure? Talk to our team - we’ll help you get the best performance and reliability.
What you can build
100+ Supported Models
Text, vision, audio, image, and embeddings
Migrate from OpenAI
Drop-in replacement - just change the base URL
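A sketch of the migration path: keep your existing OpenAI SDK code and change only the base URL and API key. The base URL and model id shown are assumptions; check your dashboard for the exact values.

```python
# Migration sketch: the same OpenAI SDK, pointed at a new base URL.
from openai import OpenAI

# Before: client = OpenAI(api_key="OPENAI_API_KEY")
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",  # assumed OpenAI-compatible endpoint
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```

The rest of your request and response handling code stays unchanged.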
Function Calling
Connect models to tools and APIs
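A sketch of function calling via the OpenAI-compatible `tools` parameter. The `get_weather` tool is hypothetical, and the base URL and model id are assumptions.

```python
# Function calling sketch: declare a tool schema, then read the model's tool call.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",  # placeholder model id
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```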
Structured Outputs
Reliable JSON responses for agentic workflows
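A sketch of requesting structured output through the OpenAI-compatible `response_format` parameter. This only shows generic JSON mode; schema-enforcement options vary by model, and the base URL and model id are assumptions.

```python
# Structured output sketch: ask for JSON mode and parse the result.
import json
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="accounts/fireworks/models/llama-v3p1-70b-instruct",  # placeholder model id
    messages=[{
        "role": "user",
        "content": "Return a JSON object with keys 'title' and 'priority' for this task: fix login bug.",
    }],
    response_format={"type": "json_object"},
)
print(json.loads(resp.choices[0].message.content))
```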
Vision Models
Analyze images and documents
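A sketch of a vision request using the OpenAI-compatible multimodal message format (text plus `image_url` parts). The vision model id, base URL, and image URL are placeholders.

```python
# Vision sketch: send text and an image URL in one message.
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="YOUR_API_KEY")

resp = client.chat.completions.create(
    model="accounts/fireworks/models/qwen2p5-vl-32b-instruct",  # placeholder vision model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this document in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/invoice.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```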
Speech to Text
Real-time or batch audio transcription
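A sketch of batch transcription, assuming an OpenAI-compatible `/audio/transcriptions` endpoint is exposed; the base URL, transcription model id, and file name are all assumptions.

```python
# Transcription sketch: upload an audio file and print the transcript text.
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="YOUR_API_KEY")

with open("meeting.wav", "rb") as audio_file:  # placeholder audio file
    transcript = client.audio.transcriptions.create(
        model="whisper-v3",  # placeholder transcription model id
        file=audio_file,
    )
print(transcript.text)
```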
Embeddings & Reranking
Generate embeddings and rerank results for search and context retrieval
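A sketch of generating embeddings for retrieval via the OpenAI-compatible embeddings API. The embedding model id and base URL are assumptions.

```python
# Embeddings sketch: embed a few documents and inspect the vector dimensions.
from openai import OpenAI

client = OpenAI(base_url="https://api.fireworks.ai/inference/v1", api_key="YOUR_API_KEY")

docs = ["How do I reset my password?", "Billing happens monthly."]
resp = client.embeddings.create(
    model="nomic-ai/nomic-embed-text-v1.5",  # placeholder embedding model id
    input=docs,
)
vectors = [item.embedding for item in resp.data]
print(len(vectors), len(vectors[0]))
```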
Batch Inference
Run async inference jobs at scale, faster and cheaper
Resources & help
Which model should I use?
Find the best model for your use case
Cookbook
Code examples and tutorials
API Reference
Complete API documentation
Discord Community
Ask questions and get help from developers
Security & Compliance
SOC 2, HIPAA, and audit reports
System Status
Check service uptime
Talk to Sales
Talk to our team