
Site Navigation

  • Get Started

  • Use Cases

  • Example Gallery

  • Library

    • Ray Core: Scale general Python applications

    • Ray Data: Scale data ingest and preprocessing

    • Ray Train: Scale machine learning training

    • Ray Tune: Scale hyperparameter tuning

    • Ray Serve: Scale model serving

    • Ray RLlib: Scale reinforcement learning

  • Docs

  • Resources

    • Discussion Forum: Get your Ray questions answered

    • Training: Hands-on learning

    • Blog: Updates, best practices, user stories

    • Events: Webinars, meetups, office hours

    • Success Stories: Real-world workload examples

    • Ecosystem: Libraries integrated with Ray

    • Community: Connect with us


  • Overview
  • Getting Started
  • Installation
  • Use Cases
    • Ray for ML Infrastructure
  • Examples
    • Multi-modal AI pipeline
      • Batch inference
      • Distributed training
      • Online serving
    • LLM training and inference
    • Audio batch inference
    • Distributed XGBoost pipeline
      • Distributed training of an XGBoost model
      • Model validation using offline batch inference
      • Scalable online XGBoost inference with Ray Serve
    • Time-series forecasting
      • Distributed training of a DLinear time-series model
      • DLinear model validation using offline batch inference
      • Online serving for DLinear model using Ray Serve
    • Scalable video processing
      • Fine-tuning a face mask detection model with Faster R-CNN
      • Object detection batch inference on test dataset and metrics calculation
      • Video processing with object detection using batch inference
      • Host an object detection model as a service
    • Distributed RAG pipeline
      • Build a Regular RAG Document Ingestion Pipeline (No Ray required)
      • Scalable RAG Data Ingestion and Pagination with Ray Data
      • Deploy LLM with Ray Serve LLM
      • Build Basic RAG App
      • Improve RAG with Prompt Engineering
      • Evaluate RAG with Online Inference
      • Evaluate RAG using Batch Inference with Ray Data LLM
    • Deploy MCP servers
      • Deploying a custom MCP in Streamable HTTP mode with Ray Serve
      • Deploy an MCP Gateway with existing Ray Serve apps
      • Deploying an MCP STDIO Server as a scalable HTTP service with Ray Serve
      • Deploying multiple MCP services with Ray Serve
      • Build a Docker image for an MCP server
    • Build a tool-using agent
    • Build a multi-agent system with the A2A protocol
  • Ecosystem
  • Ray Core
    • Key Concepts
    • User Guides
      • Tasks
        • Nested Remote Functions
      • Actors
        • Named Actors
        • Terminating Actors
        • AsyncIO / Concurrency for Actors
        • Limiting Concurrency Per-Method with Concurrency Groups
        • Utility Classes
        • Out-of-band Communication
        • Actor Task Execution Order
      • Objects
        • Serialization
        • Object Spilling
      • Environment Dependencies
      • Scheduling
        • Use labels to control scheduling
        • Resources
        • Accelerator Support
        • Placement Groups
        • Memory Management
        • Out-Of-Memory Prevention
      • Fault tolerance
        • Task Fault Tolerance
        • Actor Fault Tolerance
        • Object Fault Tolerance
        • Node Fault Tolerance
        • GCS Fault Tolerance
      • Design Patterns & Anti-patterns
        • Pattern: Using nested tasks to achieve nested parallelism
        • Pattern: Using generators to reduce heap memory usage
        • Pattern: Using ray.wait to limit the number of pending tasks
        • Pattern: Using resources to limit the number of concurrently running tasks
        • Pattern: Using asyncio to run actor methods concurrently
        • Pattern: Using an actor to synchronize other tasks and actors
        • Pattern: Using a supervisor actor to manage a tree of actors
        • Pattern: Using pipelining to increase throughput
        • Anti-pattern: Returning ray.put() ObjectRefs from a task harms performance and fault tolerance
        • Anti-pattern: Calling ray.get on task arguments harms performance
        • Anti-pattern: Calling ray.get in a loop harms parallelism
        • Anti-pattern: Calling ray.get unnecessarily harms performance
        • Anti-pattern: Processing results in submission order using ray.get increases runtime
        • Anti-pattern: Fetching too many objects at once with ray.get causes failure
        • Anti-pattern: Over-parallelizing with too fine-grained tasks harms speedup
        • Anti-pattern: Redefining the same remote function or class harms performance
        • Anti-pattern: Passing the same large argument by value repeatedly harms performance
        • Anti-pattern: Closure capturing large objects harms performance
        • Anti-pattern: Using global variables to share state between tasks and actors
        • Anti-pattern: Serialize ray.ObjectRef out of band
        • Anti-pattern: Forking new processes in application code
      • Ray Direct Transport (RDT)
        • Implementing a custom tensor transport (Advanced)
      • Ray Compiled Graph (beta)
        • Quickstart
        • Profiling
        • Experimental: Overlapping communication and computation
        • Troubleshooting
        • Compiled Graph API
      • Resource Isolation With Cgroup v2
      • Advanced topics
        • Tips for first-time users
        • Type hints in Ray
        • Starting Ray
        • Ray Generators
        • Using Namespaces
        • Cross-language programming
        • Working with Jupyter Notebooks & JupyterLab
        • Lazy Computation Graphs with the Ray DAG API
        • Miscellaneous Topics
        • Authenticating Remote URIs in runtime_env
        • Lifetimes of a User-Spawn Process
        • Head Node Memory Management
    • Examples
      • Batch Prediction with Ray Core
      • A Gentle Introduction to Ray Core by Example
      • Using Ray for Highly Parallelizable Tasks
      • A Simple MapReduce Example with Ray Core
      • Monte Carlo Estimation of π
      • Simple Parallel Model Selection
      • Parameter Server
      • Learning to Play Pong
      • Speed up your web crawler by parallelizing it with Ray
    • Ray Core API
      • Core API
      • Scheduling API
      • Runtime Env API
      • Utility
      • Exceptions
      • Ray Core CLI
      • State CLI
      • State API
      • Ray Direct Transport (RDT) API
    • Internals
      • Task Lifecycle
      • Streaming Generator
      • Autoscaler v2
      • RPC Fault Tolerance
      • Token Authentication
      • Metric Exporter Infrastructure
      • Ray Event Exporter Infrastructure
      • Port Service Discovery
      • Object Spilling
  • Ray Data
    • Ray Data Quickstart
    • Key Concepts
    • User Guides
      • Loading Data
      • Inspecting Data
      • Transforming Data
      • Aggregating Data
      • Iterating over Data
      • Joining Data
      • Shuffling Data
      • Weighted Dataset Mixing
      • Saving Data
      • Working with Images
      • Working with Text
      • Working with Tensors / NumPy
      • Working with Zarr
      • Working with PyTorch
      • Working with LLMs
      • How to avoid out-of-memory errors (OOMs)
      • Monitoring Your Workload
      • Execution Configurations
      • Run multiple Datasets in one cluster
      • End-to-end: Offline Batch Inference
      • Advanced: Performance Tips and Tuning
      • Advanced: Scaling out expensive collate functions
      • Advanced: Read and Write Custom File Types
    • Examples
    • Ray Data API
      • Loading Data API
      • Saving Data API
      • Dataset API
      • DataIterator API
      • ExecutionOptions API
      • Checkpoint API
      • Aggregation API
      • GroupedData API
      • Expressions API
      • Data types
      • Global configuration
      • Preprocessor
      • Large Language Model (LLM) API
      • API Guide for Users from Other Data Libraries
    • Contributing to Ray Data
      • Contributing Guide
      • How to write tests
    • Comparing Ray Data to other systems
    • Ray Data Benchmarks
    • Ray Data Internals
  • Ray Train
    • Overview
    • PyTorch Guide
    • PyTorch Lightning Guide
    • Hugging Face Transformers Guide
    • XGBoost Guide
    • JAX Guide
    • More Frameworks
      • Hugging Face Accelerate Guide
      • DeepSpeed Guide
      • TensorFlow and Keras Guide
      • LightGBM Guide
      • Horovod Guide
    • User Guides
      • Data Loading and Preprocessing
      • Configuring Scale and Accelerators
      • Configuring Persistent Storage
      • Monitoring and Logging Metrics
      • Saving and Loading Checkpoints
      • Validating checkpoints asynchronously
      • Experiment Tracking
      • Inspecting Training Results
      • Handling Failures and Node Preemption
      • Elastic training
      • Ray Train Metrics
      • Local Mode
      • Reproducibility
      • Hyperparameter Optimization
    • Tutorials
      • Introduction to Ray Train workloads
      • Computer vision pattern
      • Tabular workload pattern
      • Time series workload pattern
      • Generative computer vision pattern
      • Diffusion policy pattern
      • Recommendation system pattern
    • Examples
    • Benchmarks
    • Ray Train API
  • Ray Tune
    • Getting Started
    • Key Concepts
    • User Guides
      • Running Basic Experiments
      • Logging and Outputs in Tune
      • Setting Trial Resources
      • Using Search Spaces
      • How to Define Stopping Criteria for a Ray Tune Experiment
      • How to Save and Load Trial Checkpoints
      • How to Configure Persistent Storage in Ray Tune
      • How to Enable Fault Tolerance in Ray Tune
      • Using Callbacks and Metrics
      • Getting Data in and out of Tune
      • Analyzing Tune Experiment Results
      • A Guide to Population Based Training with Tune
        • Visualizing and Understanding PBT
      • Deploying Tune in the Cloud
      • Tune Architecture
      • Scalability Benchmarks
    • Ray Tune Examples
      • PyTorch Example
      • PyTorch Lightning Example
      • XGBoost Example
      • LightGBM Example
      • Hugging Face Transformers Example
      • Ray RLlib Example
      • Keras Example
      • PyTorch with ASHA
      • Weights & Biases Example
      • MLflow Example
      • Aim Example
      • Comet Example
      • Ax Example
      • HyperOpt Example
      • Bayesopt Example
      • BOHB Example
      • Nevergrad Example
      • Optuna Example
    • Ray Tune FAQ
    • Ray Tune API
      • Tune Execution (tune.Tuner)
      • Tune Experiment Results (tune.ResultGrid)
      • Training in Tune (tune.Trainable, tune.report)
      • Tune Search Space API
      • Tune Search Algorithms (tune.search)
      • Tune Trial Schedulers (tune.schedulers)
      • Tune Stopping Mechanisms (tune.stopper)
      • Tune Console Output (Reporters)
      • Syncing in Tune
      • Tune Loggers (tune.logger)
      • Tune Callbacks (tune.Callback)
      • Environment variables used by Ray Tune
      • External library integrations for Ray Tune
      • Tune Internals
      • Tune CLI (Experimental)
  • Ray Serve
    • Getting Started
    • Key Concepts
    • Develop and Deploy an ML Application
    • Deploy Compositions of Models
    • Deploy Multiple Applications
    • Model Multiplexing
    • Model Registry Integration
    • Configure Ray Serve deployments
    • Set Up FastAPI and HTTP
    • Serving LLMs
      • Quickstart
      • Examples
        • Deploy a small-sized LLM
        • Deploy a medium-sized LLM
        • Deploy a large-sized LLM
        • Deploy a vision LLM
        • Deploy a reasoning LLM
        • Deploy a hybrid reasoning LLM
        • Deploy gpt-oss
      • User Guides
        • Configuration reference
        • Deployment initialization
        • Multi-LoRA deployment
        • Cross-node parallelism
        • Data parallel attention
        • Fractional GPU serving
        • Prefill/decode disaggregation
        • KV cache offloading
        • Prefix-aware routing
        • Direct streaming
        • vLLM compatibility
        • SGLang integration
        • Observability and monitoring
      • Architecture
        • Architecture overview
        • Core components
        • Serving patterns
        • Request routing
      • Benchmarks
      • Troubleshooting
    • Production Guide
      • Serve Config Files
      • Deploy on Kubernetes
      • Custom Docker Images
      • Add End-to-End Fault Tolerance
      • Handle Dependencies
      • Best practices in production
    • Monitor Your Application
    • Resource Allocation
    • Ray Serve Autoscaling
    • Asynchronous Inference
    • Advanced Guides
      • Pass Arguments to Applications
      • Advanced Ray Serve Autoscaling
      • Asyncio and concurrency best practices in Ray Serve
      • Performance Tuning
      • Dynamic Request Batching
      • Updating Applications In-Place
      • Development Workflow
      • Set Up a gRPC Service
      • Replica ranks
      • Replica scheduling
      • Gang scheduling
      • Experimental Java API
      • Deploy on VM
      • Run Multiple Applications in Different Containers
      • Use Custom Algorithm for Request Routing
      • Use deployment-scoped actors
      • Troubleshoot multi-node GPU serving on KubeRay
    • Architecture
    • Examples
    • Ray Serve API
  • Ray RLlib
    • Getting Started
    • Key concepts
    • Environments
      • Multi-Agent Environments
      • Hierarchical Environments
      • External Environments and Applications
    • AlgorithmConfig API
    • Algorithms
    • User Guides
      • Advanced Python APIs
      • Callbacks
      • Checkpointing
      • MetricsLogger API
      • Episodes
      • ConnectorV2 and ConnectorV2 pipelines
        • Env-to-module pipelines
        • Learner connector pipelines
      • Replay Buffers
      • Working with offline data
      • RL Modules
      • Learner (Alpha)
      • Fault Tolerance And Elastic Training
      • Install RLlib for Development
      • RLlib scaling guide
    • Examples
    • New API stack migration guide
    • Ray RLlib API
      • Algorithm Configuration API
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.build_algo
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.build_learner_group
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.build_learner
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.is_multi_agent
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.is_offline
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.learner_class
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.model_config
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.rl_module_spec
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.total_train_batch_size
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_default_learner_class
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_default_rl_module_spec
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_evaluation_config_object
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_multi_rl_module_spec
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_multi_agent_setup
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_rollout_fragment_length
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.copy
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.validate
        • ray.rllib.algorithms.algorithm_config.AlgorithmConfig.freeze
      • Algorithms
        • ray.rllib.algorithms.algorithm.Algorithm
        • ray.rllib.algorithms.algorithm.Algorithm.setup
        • ray.rllib.algorithms.algorithm.Algorithm.get_default_config
        • ray.rllib.algorithms.algorithm.Algorithm.env_runner
        • ray.rllib.algorithms.algorithm.Algorithm.eval_env_runner
        • ray.rllib.algorithms.algorithm.Algorithm.train
        • ray.rllib.algorithms.algorithm.Algorithm.training_step
        • ray.rllib.algorithms.algorithm.Algorithm.save_to_path
        • ray.rllib.algorithms.algorithm.Algorithm.restore_from_path
        • ray.rllib.algorithms.algorithm.Algorithm.from_checkpoint
        • ray.rllib.algorithms.algorithm.Algorithm.get_state
        • ray.rllib.algorithms.algorithm.Algorithm.set_state
        • ray.rllib.algorithms.algorithm.Algorithm.evaluate
        • ray.rllib.algorithms.algorithm.Algorithm.get_module
        • ray.rllib.algorithms.algorithm.Algorithm.add_policy
        • ray.rllib.algorithms.algorithm.Algorithm.remove_policy
      • Callback APIs
        • ray.rllib.callbacks.callbacks.RLlibCallback
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_algorithm_init
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_sample_end
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_train_result
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_evaluate_start
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_evaluate_end
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_env_runners_recreated
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_checkpoint_loaded
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_environment_created
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_episode_created
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_episode_start
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_episode_step
        • ray.rllib.callbacks.callbacks.RLlibCallback.on_episode_end
      • Environments
        • EnvRunner API
        • SingleAgentEnvRunner API
        • SingleAgentEpisode API
        • MultiAgentEnv API
        • MultiAgentEnvRunner API
        • MultiAgentEpisode API
        • External Envs
        • Env Utils
      • RLModule APIs
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec.build
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec.module_class
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec.observation_space
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec.action_space
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec.inference_only
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec.learner_only
        • ray.rllib.core.rl_module.rl_module.RLModuleSpec.model_config
        • ray.rllib.core.rl_module.multi_rl_module.MultiRLModuleSpec
        • ray.rllib.core.rl_module.multi_rl_module.MultiRLModuleSpec.build
        • ray.rllib.core.rl_module.default_model_config.DefaultModelConfig
        • ray.rllib.core.rl_module.rl_module.RLModule
        • ray.rllib.core.rl_module.rl_module.RLModule.observation_space
        • ray.rllib.core.rl_module.rl_module.RLModule.action_space
        • ray.rllib.core.rl_module.rl_module.RLModule.inference_only
        • ray.rllib.core.rl_module.rl_module.RLModule.model_config
        • ray.rllib.core.rl_module.rl_module.RLModule.setup
        • ray.rllib.core.rl_module.rl_module.RLModule.as_multi_rl_module
        • ray.rllib.core.rl_module.rl_module.RLModule.forward_exploration
        • ray.rllib.core.rl_module.rl_module.RLModule.forward_inference