Serverless AI is a Nebius AI Cloud service for running containerized AI workloads without creating or operating virtual machines or clusters. To run a workload in Serverless AI, you choose how to deploy it (as an interactive endpoint or as a non-interactive job), specify the path to your container, and select the compute and storage resources that the workload requires. Serverless AI handles resource provisioning, the workload lifecycle (endpoints and jobs run on Compute containers over VMs), and usage-based, per-second billing, so you can focus on interacting with the workload and getting results from it. To catch and handle errors or unexpected outcomes, you can use the observability and debugging tools that Serverless AI provides.
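To illustrate what per-second billing means in practice, here is a minimal Python sketch that computes the charge for a run from its duration. The function name `per_second_cost` and the hourly rate are hypothetical and chosen only for illustration; they are not part of any Nebius API or price list.

```python
from decimal import Decimal, ROUND_HALF_UP


def per_second_cost(runtime_seconds: int, hourly_rate: Decimal) -> Decimal:
    """Return the charge for a run billed per second at the given hourly rate."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    # Convert the hourly rate to a per-second charge for the actual runtime.
    cost = hourly_rate * Decimal(runtime_seconds) / Decimal(3600)
    # Round to two decimal places for display.
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical rate, for illustration only; not an actual Nebius price.
EXAMPLE_HOURLY_RATE = Decimal("3.00")

# A run that finishes in 10 minutes is charged for 600 seconds, not a full hour.
print(per_second_cost(600, EXAMPLE_HOURLY_RATE))
```

Under these assumptions, a 10-minute run costs one sixth of the hourly rate rather than a full hour.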
Endpoints and jobs
You can deploy your workload as an endpoint that listens for requests and returns results immediately, or as a job that runs in the background and exits after completing its task. Here is a comparison of endpoints and jobs at a glance:

| | Endpoint | Job |
|---|---|---|
| Workflow | Interactive, listens for requests until you terminate it | Non-interactive, terminates upon task completion or timeout |
| Stop/start | Yes | No |
| Public URL for requests | Yes | No |
| Typical lifetime | Hours to days | Minutes to days |
| Use cases | Persistent workloads: serving and A/B-testing models, real-time inference | Batch workloads: pre-processing data, training and fine-tuning models, batch inference and model evaluation, scientific simulations |
| Guides | Getting started with endpoints | Getting started with jobs |
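
The comparison above can be read as a simple decision rule. The following Python sketch encodes it under stated assumptions: the names `DeploymentType`, `WorkloadNeeds`, and `choose_deployment` are hypothetical helpers for illustration and are not part of the Serverless AI API or CLI.

```python
from dataclasses import dataclass
from enum import Enum


class DeploymentType(Enum):
    ENDPOINT = "endpoint"
    JOB = "job"


@dataclass(frozen=True)
class WorkloadNeeds:
    """Hypothetical summary of what a workload requires."""
    serves_requests: bool    # must listen for requests and respond interactively
    needs_public_url: bool   # must be reachable through a public URL
    needs_stop_start: bool   # must be stoppable and restartable


def choose_deployment(needs: WorkloadNeeds) -> DeploymentType:
    """Pick a deployment type using the rows of the comparison table."""
    # Per the table, only endpoints are interactive, expose a public URL,
    # and support stop/start.
    if needs.serves_requests or needs.needs_public_url or needs.needs_stop_start:
        return DeploymentType.ENDPOINT
    # Everything else runs to completion in the background as a job.
    return DeploymentType.JOB


# Real-time inference behind a public URL maps to an endpoint.
inference = WorkloadNeeds(serves_requests=True, needs_public_url=True, needs_stop_start=False)
# A fine-tuning run that exits when done maps to a job.
fine_tuning = WorkloadNeeds(serves_requests=False, needs_public_url=False, needs_stop_start=False)

print(choose_deployment(inference).value)
print(choose_deployment(fine_tuning).value)
```

This is only a reading aid for the table; real workloads may involve considerations, such as expected lifetime, that the sketch does not model.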