Modular acquires BentoML to deliver production AI in the cloud!  - Read more

Enterprise innovation, supercharged by Modular

Modular delivers high-speed inference, cross-architecture flexibility, and SLA-backed reliability—so your teams can innovate faster and scale without surprises.

  • +80%

    Faster

    vs vLLM (0.13)

  • +70%

    Cost reduction

    vs vLLM (0.13)

  • 2-5x

    Faster from research to production

    vs writing traditional kernels

~70% faster compared to vanilla vLLM

"Our collaboration with Modular is a glimpse into the future of accessible AI infrastructure. Our API now returns the first 2 seconds of synthesized audio on average ~70% faster compared to vanilla vLLM based implementation, at just 200ms for 2 second chunks. This allowed us to serve more QPS with lower latency and eventually offer the API at a ~60% lower price than would have been possible without using Modular’s stack."

Igor Poletaev

Chief Science Officer - Inworld

Slashed our inference costs by 80%

"Modular’s team is world class. Their stack slashed our inference costs by 80%, letting our customer dramatically scale up. They’re fast, reliable, and real engineers who take things seriously. We’re excited to partner with them to bring down prices for everyone, to let AI bring about wide prosperity."

Evan Conrad

CEO - San Francisco Compute

Confidently deploy our solution across NVIDIA and AMD

"Modular allows Qwerky to write our optimized code and confidently deploy our solution across NVIDIA and AMD solutions without the massive overhead of re-writing native code for each system."

Evan Owen

CTO, Qwerky AI

MAX Platform supercharges this mission

"At AWS we are focused on powering the future of AI by providing the largest enterprises and fastest-growing startups with services that lower their costs and enable them to move faster. The MAX Platform supercharges this mission for our millions of AWS customers, helping them bring the newest GenAI innovations and traditional AI use cases to market faster."

Bratin Saha

VP of Machine Learning & AI services

Supercharging and scaling

"Developers everywhere are helping their companies adopt and implement generative AI applications that are customized with the knowledge and needs of their business. Adding full-stack NVIDIA accelerated computing support to the MAX platform brings the world’s leading AI infrastructure to Modular’s broad developer ecosystem, supercharging and scaling the work that is fundamental to companies’ business transformation."

Dave Salvator

Director, AI and Cloud

Build, optimize, and scale AI systems on AMD

"We're truly in a golden age of AI, and at AMD we're proud to deliver world-class compute for the next generation of large-scale inference and training workloads… We also know that great hardware alone is not enough. We've invested deeply in open software with ROCm, empowering developers and researchers with the tools they need to build, optimize, and scale AI systems on AMD. This is why we are excited to partner with Modular… and we’re thrilled that we can empower developers and researchers to build the future of AI."

Vamsi Boppana

SVP of AI, AMD

Inworld
San Francisco Compute
Qwerky AI
AWS
NVIDIA
AMD

Case Studies

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Modular partners with AWS to democratize AI Infrastructure

Modular partnered with AWS to bring MAX to AWS Marketplace, offering SOTA performance for GenAI workloads across GPUs types.

2x cost savings with the fastest text-to-speech model ever

We made state-of-the-art speech synthesis scalable, and achieved a truly remarkable improvement both for the latency and throughput.

Unlocking fast AMD compute for all

AI inference has a cost problem. Hardware alone isn't enough - customers need software that can extract every ounce of performance from these chips. TensorWave and Modular team up to shatter the cost-performance ceiling for AI inference.

AI batch processing is now cheaper than anyone thought possible

When selling GPUs as a commodity meets the fastest inference engine - cost savings can skyrocket.

Modular partners with NVIDIA to accelerate AI compute everywhere

Modular’s Platform provides state-of-the-art support for NVIDIA Blackwell, Hopper, Ampere, Ada Lovelace and NVIDIA Grace Superchips.

Unleashing AI performance on AMD GPUs with Modular's Platform

Modular partners with AMD to bring the AI ecosystem more choice with state-of-the-art performance on AMD Instinct GPUs.

Revolutionizing your own research to production

Modular allows Qwerky AI to do advanced AI research, to write optimized code and deploy across NVIDIA, AMD, and other types of silicon.

Scales for enterprises