GenAI-native serving and modeling, built for performance.
Build once, deploy anywhere with a single programmable stack for high-performance GenAI on any hardware.
Your entire AI infrastructure in a single dependency
GPU agnostic
The same code runs on NVIDIA, AMD, and Apple Silicon. When a new generation of hardware enters the datacenter, MAX is the fastest to take advantage of it and deliver top performance. Hardware will only get more exciting - be ready for it with MAX.
Open source & extensible
The entire MAX Python API, all of the model pipelines, and all of the GPU kernels (for NVIDIA, AMD, and Apple) are open-sourced for you to learn from and contribute to.
Measurable performance
See the numbers for yourself. MAX includes max benchmark, an open-source benchmarking tool adapted from vLLM. Run it against your endpoint with datasets like ShareGPT or arxiv-summarization, or bring your own. Export shareable YAML configs for reproducible results.
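Serving benchmarks like this typically boil down to a handful of standard metrics: request latency percentiles and overall token throughput. As a minimal sketch (not MAX code - the function name and metric selection here are illustrative assumptions), here is how such numbers are computed from raw per-request timings:

```python
import statistics

def summarize(latencies_s: list[float], total_tokens: int, wall_time_s: float) -> dict:
    """Summarize a serving benchmark run: latency percentiles over the
    per-request latency samples, plus overall token throughput.
    Illustrative only - not the actual `max benchmark` implementation."""
    ordered = sorted(latencies_s)

    def pct(p: float) -> float:
        # Nearest-rank percentile over the sorted latency samples.
        idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
        return ordered[idx]

    return {
        "requests": len(ordered),
        "p50_s": pct(50),
        "p99_s": pct(99),
        "mean_s": statistics.fmean(ordered),
        # Tokens generated per second across the whole run.
        "tokens_per_s": total_tokens / wall_time_s,
    }

# Example: 4 requests, 2000 generated tokens, 10 s of wall-clock time.
print(summarize([0.8, 1.1, 1.3, 2.0], total_tokens=2000, wall_time_s=10.0))
```

Because the metrics are just arithmetic over recorded timings, exporting the run configuration (endpoint, dataset, request mix) as YAML is what makes a result reproducible: anyone can replay the same config against their own endpoint and compare numbers directly.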