Increase Model Flexibility and ROI for GenAI App Delivery With Kubernetes

Kubernetes enables a multiple model operating approach for GenAI app delivery that can increase innovation while reducing costs.

By Camille Crowell-Lee and Myles Gray · Oct. 29, 24 · Analysis

As with past technology adoption journeys, initial experimentation costs eventually give way to a focus on ROI. In a recent post on X, Andrew Ng discussed GenAI model pricing reductions at length. This is great news, since GenAI models are crucial for powering the latest generation of AI applications. However, model swapping is also emerging as both an innovation enabler and a cost-saving strategy for deploying these applications. Even if you've already standardized on a specific model for your applications at a reasonable cost, you might want to explore the added benefits of a multiple model approach facilitated by Kubernetes.

A Multiple Model Approach to GenAI

A multiple model operating approach enables developers to use the most up-to-date GenAI models throughout the lifecycle of an application. By operating in a continuous upgrade approach for GenAI models, developers can harness the specific strengths of each model as those strengths shift over time. In addition, the introduction of specialized, purpose-built models enables applications to be tested and refined for optimal accuracy, performance, and cost.

Kubernetes, with its declarative orchestration API, is perfectly suited for rapid iteration in GenAI applications. With Kubernetes, organizations can start small and implement governance to conduct initial experiments safely and cost-effectively. Kubernetes’ seamless scaling and orchestration capabilities facilitate model swapping and infrastructure optimization while ensuring high performance of applications.
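To make the declarative model-swapping idea concrete, here is a minimal sketch of a model-serving Deployment. All names, the image, and the `MODEL_NAME` variable are hypothetical; the point is that changing models becomes a one-line declarative edit followed by `kubectl apply`, rather than an application change:

```yaml
# Hypothetical model-serving Deployment. Swapping models is a declarative
# change to MODEL_NAME (or the image tag), then `kubectl apply -f`.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-model-server
  labels:
    app: genai-model-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: genai-model-server
  template:
    metadata:
      labels:
        app: genai-model-server
    spec:
      containers:
        - name: server
          image: registry.example.com/model-server:1.4.0  # illustrative image
          env:
            - name: MODEL_NAME
              value: "mistral-7b-instruct"  # swap to a newer or smaller model here
          ports:
            - containerPort: 8080
```

If application traffic flows through a stable Service in front of this Deployment, clients never notice the model change; Kubernetes performs a rolling update behind the same endpoint.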

Expect the Unexpected When Utilizing Models

While GenAI is an extremely powerful tool for driving enhanced user experiences, it's not without its challenges. Content anomalies and hallucinations are well-known concerns for GenAI models. Without proper governance, raw models (those used without an app platform to codify governance) are more likely to be led astray, or even manipulated into jailbreak scenarios by malicious actors. Such vulnerabilities can result in financial losses amounting to millions in token usage and severely damage brand reputation. The financial implications of security failures are massive: a report by Cybercrime Magazine earlier this year suggests that cybercrime will cost upwards of $10 trillion annually by next year. Implementing effective governance and mitigations, such as brokering models through a middleware layer, will be critical to delivering GenAI applications safely, consistently, and at scale.

Kubernetes can provide strong model isolation through separate clusters, with a model proxy layer brokering the models to the application. Kubernetes' resource tagging adds another layer of value by allowing you to run a diverse range of model types or sizes, requiring different accelerators, within the same infrastructure. This flexibility also helps with budget optimization, as it prevents defaulting to the largest, most expensive accelerators. Instead, you can choose a model and accelerator combination that strikes a balance between excellent performance and cost-effectiveness, ensuring the application remains efficient while adhering to budget constraints.
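As a sketch of the accelerator-tagging idea, a pod spec can steer a model server onto nodes labeled with a specific accelerator class and request exactly the hardware it needs. The node label below is illustrative (it would be set by the cluster admin), while `nvidia.com/gpu` is the extended resource name advertised by the NVIDIA device plugin:

```yaml
# Illustrative pod spec fragment: pin a model server to a node pool with a
# particular accelerator class and request a single GPU from it.
spec:
  nodeSelector:
    accelerator: nvidia-l4        # hypothetical node label for the GPU class
  tolerations:
    - key: nvidia.com/gpu         # tolerate the taint commonly placed on GPU nodes
      operator: Exists
      effect: NoSchedule
  containers:
    - name: server
      image: registry.example.com/model-server:1.4.0  # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1       # extended resource exposed by the device plugin
```

A smaller, CPU-optimized model would simply omit the GPU request and land on cheaper general-purpose nodes, which is where the budget optimization comes from.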


Example 1: Model curation for additional app platform governance and flexibility


Moreover, role-based access control (RBAC) in Kubernetes ensures that only authorized individuals or apps can initiate requests to certain models in an individual cluster. This not only prevents unnecessary expenses from unauthorized usage, but also enhances security across the board. Additionally, with the capacity to configure specific roles and permissions, organizations can better manage and allocate resources, minimize risks, and optimize operational efficiency. Rapidly evolving GenAI models benefit from these governance mechanisms while maximizing potential benefits.
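A minimal sketch of such an RBAC policy might look like the following, with all namespace and account names hypothetical. Note that Kubernetes RBAC governs access to cluster resources; per-request authorization to the model endpoint itself would live in the proxy or middleware layer described above:

```yaml
# Hypothetical RBAC: only the "chat-app" ServiceAccount may discover the
# model-serving Services in the "models-prod" namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: models-prod
  name: model-consumer
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: models-prod
  name: chat-app-model-consumer
subjects:
  - kind: ServiceAccount
    name: chat-app
    namespace: apps            # the application's own namespace
roleRef:
  kind: Role
  name: model-consumer
  apiGroup: rbac.authorization.k8s.io
```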

Scaling and Abstraction for GenAI! Oh My!

The scale of the model you choose for your GenAI application can vary significantly depending on the application's requirements. An application might work perfectly well with a simple, compact, purpose-built model rather than a large, complex model that demands more resources. To ensure the optimal performance of your GenAI application, automating deployment and operations is crucial. Kubernetes can facilitate this automation across multiple clusters and hosts using GitOps or other methodologies, enabling platform engineers to expedite GenAI app operations.
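As one illustration of the GitOps approach, a tool such as Argo CD (one common choice; the article does not prescribe a specific tool, and the repository and paths below are hypothetical) can keep model deployment manifests continuously synced from Git, so a model swap becomes a reviewed pull request rather than a manual cluster change:

```yaml
# Hypothetical Argo CD Application: the cluster continuously reconciles the
# model-serving manifests stored in Git against the target namespace.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: genai-model-stack
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/genai-platform.git  # illustrative repo
    targetRevision: main
    path: models/prod
  destination:
    server: https://kubernetes.default.svc
    namespace: models-prod
  syncPolicy:
    automated:
      prune: true      # remove resources deleted from Git
      selfHeal: true   # revert manual drift in the cluster
```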

One of the critical advantages of using Kubernetes for delivering GenAI apps is its ability to handle GPU- and TPU-accelerated workloads. Accelerators are essential for training and inferencing complex models quickly and efficiently. With Kubernetes, you can easily deploy and manage clusters with hardware accelerators, allowing you to scale your GenAI projects as needed without worrying about performance being limited by hardware. The same can be said for models optimized for modern CPU instruction sets, which helps avoid scheduling for the scarcer GPU and TPU resources.

In addition to handling GPU-accelerated workloads, Kubernetes also has features that make it well-suited for inferencing tasks. By utilizing capabilities like Horizontal Pod Autoscaling, Kubernetes can dynamically adjust resources based on the demand for your inferencing applications. This ensures that your applications are always running smoothly and can handle sudden spikes in traffic. On top of all this, the ML tooling ecosystem for Kubernetes is quite robust and allows for keeping data closer to the workloads. For example, JupyterHub can be used to deploy Jupyter notebooks right next to the data, with GPUs auto-attached, improving latency and performance during the model experimentation phase.
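The autoscaling behavior described above can be sketched with an `autoscaling/v2` HorizontalPodAutoscaler. The target Deployment name and thresholds are hypothetical; in practice, inference services often scale on custom metrics such as request queue depth or tokens per second rather than CPU:

```yaml
# Hypothetical HPA: keep between 2 and 10 inference replicas, scaling on
# average CPU utilization across the pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: genai-model-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: genai-model-server   # illustrative Deployment name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```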

Getting Started With GenAI Apps With Kubernetes

Platform engineering teams can be key enablers for GenAI application delivery. By simplifying and abstracting away complexity from developers, platform engineering can facilitate ongoing innovation with GenAI by curating models based on application needs. Developers don't need to acquire new skills in model evaluation and management; they can simply utilize the resources available in their Kubernetes-based application platform. Platform engineering can also improve the accuracy and cost-effectiveness of GenAI apps by continuously assessing model accuracy and optimizing costs through model swapping. With frequent advancements and the introduction of smaller GenAI models, applications can undergo refinement over time.

Example 2: How VMware Cloud Foundation + VMware Tanzu leverage Kubernetes

Kubernetes is pivotal in this continuous GenAI model upgrade approach, offering flexibility to accommodate model changes while adding access governance to the models. Kubernetes also facilitates seamless scaling and optimization of infrastructure while maintaining high-performance applications. Consequently, developers have the freedom to explore various models, and platform engineering can curate and optimize placement for those innovations.

This article was shared as part of DZone's media partnership with KubeCon + CloudNativeCon.


Opinions expressed by DZone contributors are their own.
