Site Navigation

  • Get Started

  • Use Cases

  • Example Gallery

  • Library

    • Ray Core: Scale general Python applications

    • Ray Data: Scale data ingest and preprocessing

    • Ray Train: Scale machine learning training

    • Ray Tune: Scale hyperparameter tuning

    • Ray Serve: Scale model serving

    • Ray RLlib: Scale reinforcement learning

  • Docs

  • Resources

    • Discussion Forum: Get your Ray questions answered

    • Training: Hands-on learning

    • Blog: Updates, best practices, user stories

    • Events: Webinars, meetups, office hours

    • Success Stories: Real-world workload examples

    • Ecosystem: Libraries integrated with Ray

    • Community: Connect with us


  • Overview
  • Getting Started
  • Installation
  • Use Cases
    • Ray for ML Infrastructure
  • Examples
    • Multi-modal AI pipeline
      • Batch inference
      • Distributed training
      • Online serving
    • LLM training and inference
    • Audio batch inference
    • Distributed XGBoost pipeline
      • Distributed training of an XGBoost model
      • Model validation using offline batch inference
      • Scalable online XGBoost inference with Ray Serve
    • Time-series forecasting
      • Distributed training of a DLinear time-series model
      • DLinear model validation using offline batch inference
      • Online serving for DLinear model using Ray Serve
    • Scalable video processing
      • Fine-tuning a face mask detection model with Faster R-CNN
      • Object detection batch inference on test dataset and metrics calculation
      • Video processing with object detection using batch inference
      • Host an object detection model as a service
    • Distributed RAG pipeline
      • Build a Regular RAG Document Ingestion Pipeline (No Ray required)
      • Scalable RAG Data Ingestion and Pagination with Ray Data
      • Deploy LLM with Ray Serve LLM
      • Build Basic RAG App
      • Improve RAG with Prompt Engineering
      • Evaluate RAG with Online Inference
      • Evaluate RAG using Batch Inference with Ray Data LLM
    • Deploy MCP servers
      • Deploying a custom MCP in Streamable HTTP mode with Ray Serve
      • Deploy an MCP Gateway with existing Ray Serve apps
      • Deploying an MCP STDIO Server as a scalable HTTP service with Ray Serve
      • Deploying multiple MCP services with Ray Serve
      • Build a Docker image for an MCP server
    • Build a tool-using agent
    • Build a multi-agent system with the A2A protocol
  • Ecosystem
  • Ray Core
    • Key Concepts
    • User Guides
      • Tasks
        • Nested Remote Functions
      • Actors
        • Named Actors
        • Terminating Actors
        • AsyncIO / Concurrency for Actors
        • Limiting Concurrency Per-Method with Concurrency Groups
        • Utility Classes
        • Out-of-band Communication
        • Actor Task Execution Order
      • Objects
        • Serialization
        • Object Spilling
      • Environment Dependencies
      • Scheduling
        • Use labels to control scheduling
        • Resources
        • Accelerator Support
        • Placement Groups
        • Memory Management
        • Out-Of-Memory Prevention
      • Fault tolerance
        • Task Fault Tolerance
        • Actor Fault Tolerance
        • Object Fault Tolerance
        • Node Fault Tolerance
        • GCS Fault Tolerance
      • Design Patterns & Anti-patterns
        • Pattern: Using nested tasks to achieve nested parallelism
        • Pattern: Using generators to reduce heap memory usage
        • Pattern: Using ray.wait to limit the number of pending tasks
        • Pattern: Using resources to limit the number of concurrently running tasks
        • Pattern: Using asyncio to run actor methods concurrently
        • Pattern: Using an actor to synchronize other tasks and actors
        • Pattern: Using a supervisor actor to manage a tree of actors
        • Pattern: Using pipelining to increase throughput
        • Anti-pattern: Returning ray.put() ObjectRefs from a task harms performance and fault tolerance
        • Anti-pattern: Calling ray.get on task arguments harms performance
        • Anti-pattern: Calling ray.get in a loop harms parallelism
        • Anti-pattern: Calling ray.get unnecessarily harms performance
        • Anti-pattern: Processing results in submission order using ray.get increases runtime
        • Anti-pattern: Fetching too many objects at once with ray.get causes failure
        • Anti-pattern: Over-parallelizing with too fine-grained tasks harms speedup
        • Anti-pattern: Redefining the same remote function or class harms performance
        • Anti-pattern: Passing the same large argument by value repeatedly harms performance
        • Anti-pattern: Closure capturing large objects harms performance
        • Anti-pattern: Using global variables to share state between tasks and actors
        • Anti-pattern: Serialize ray.ObjectRef out of band
        • Anti-pattern: Forking new processes in application code
      • Ray Direct Transport (RDT)
        • Implementing a custom tensor transport (Advanced)
      • Ray Compiled Graph (beta)
        • Quickstart
        • Profiling
        • Experimental: Overlapping communication and computation
        • Troubleshooting
        • Compiled Graph API
      • Resource Isolation With Cgroup v2
      • Advanced topics
        • Tips for first-time users
        • Type hints in Ray
        • Starting Ray
        • Ray Generators
        • Using Namespaces
        • Cross-language programming
        • Working with Jupyter Notebooks & JupyterLab
        • Lazy Computation Graphs with the Ray DAG API
        • Miscellaneous Topics
        • Authenticating Remote URIs in runtime_env
        • Lifetimes of a User-Spawn Process
        • Head Node Memory Management
    • Examples
      • Batch Prediction with Ray Core
      • A Gentle Introduction to Ray Core by Example
      • Using Ray for Highly Parallelizable Tasks
      • A Simple MapReduce Example with Ray Core
      • Monte Carlo Estimation of π
      • Simple Parallel Model Selection
      • Parameter Server
      • Learning to Play Pong
      • Speed up your web crawler by parallelizing it with Ray
    • Ray Core API
      • Core API
      • Scheduling API
      • Runtime Env API
      • Utility
      • Exceptions
      • Ray Core CLI
      • State CLI
      • State API
      • Ray Direct Transport (RDT) API
    • Internals
      • Task Lifecycle
      • Streaming Generator
      • Autoscaler v2
      • RPC Fault Tolerance
      • Token Authentication
      • Metric Exporter Infrastructure
      • Ray Event Exporter Infrastructure
      • Port Service Discovery
      • Object Spilling
  • Ray Data
    • Ray Data Quickstart
    • Key Concepts
    • User Guides
      • Loading Data
      • Inspecting Data
      • Transforming Data
      • Aggregating Data
      • Iterating over Data
      • Joining Data
      • Shuffling Data
      • Weighted Dataset Mixing
      • Saving Data
      • Working with Images
      • Working with Text
      • Working with Tensors / NumPy
      • Working with PyTorch
      • Working with LLMs
      • How to avoid out-of-memory errors (OOMs)
      • Monitoring Your Workload
      • Execution Configurations
      • End-to-end: Offline Batch Inference
      • Advanced: Performance Tips and Tuning
      • Advanced: Scaling out expensive collate functions
      • Advanced: Read and Write Custom File Types
    • Examples
    • Ray Data API
      • Loading Data API
      • Saving Data API
      • Dataset API
      • DataIterator API
      • ExecutionOptions API
      • Checkpoint API
      • Aggregation API
      • GroupedData API
      • Expressions API
      • Data types
      • Global configuration
      • Preprocessor
      • Large Language Model (LLM) API
      • API Guide for Users from Other Data Libraries
    • Contributing to Ray Data
      • Contributing Guide
      • How to write tests
    • Comparing Ray Data to other systems
    • Ray Data Benchmarks
    • Ray Data Internals
  • Ray Train
    • Overview
    • PyTorch Guide
    • PyTorch Lightning Guide
    • Hugging Face Transformers Guide
    • XGBoost Guide
    • JAX Guide
    • More Frameworks
      • Hugging Face Accelerate Guide
      • DeepSpeed Guide
      • TensorFlow and Keras Guide
      • LightGBM Guide
      • Horovod Guide
    • User Guides
      • Data Loading and Preprocessing
      • Configuring Scale and Accelerators
      • Configuring Persistent Storage
      • Monitoring and Logging Metrics
      • Saving and Loading Checkpoints
      • Validating checkpoints asynchronously
      • Experiment Tracking
      • Inspecting Training Results
      • Handling Failures and Node Preemption
      • Elastic training
      • Ray Train Metrics
      • Local Mode
      • Reproducibility
      • Hyperparameter Optimization
    • Tutorials
      • Introduction to Ray Train workloads
      • Computer vision pattern
      • Tabular workload pattern
      • Time series workload pattern
      • Generative computer vision pattern
      • Diffusion policy pattern
      • Recommendation system pattern
    • Examples
    • Benchmarks
    • Ray Train API
  • Ray Tune