Exploring WebAssembly AI Services on Cloudflare Workers

This is a guest post by Videet Parekh, Abelardo Lopez-Lagunas, Sek Chai at Latent AI.

Edge networks present a significant opportunity for Artificial Intelligence (AI) performance and applicability. AI technologies already make it possible to run compelling applications like object and voice recognition, navigation, and recommendations.

AI at the edge presents a host of benefits. One is scalability—it is simply impractical to send all data to a centralized cloud. In fact, one study has predicted a global scope of 90 zettabytes generated by billions of IoT devices by 2025. Another is privacy—many users are reluctant to move their personal data to the cloud, whereas data processed at the edge are more ephemeral.

When AI services are distributed away from centralized data centers and closer to the service edge, it becomes possible to enhance the overall application speed without moving data unnecessarily. However, there are still challenges to make AI from the deep-cloud run efficiently on edge hardware. Here, we use the term deep-cloud to refer to highly centralized, massively-sized data centers. Deploying edge AI services can be hard because AI is both computational and memory bandwidth intensive. We need to tune the AI models so the computational latency and bandwidth can be radically reduced for the edge.

The Case for Distributed AI Services

Edge network infrastructure for distributed AI is already widely available. Edge networks like Cloudflare serve a significant proportion of today’s Internet traffic, and can serve as the bridge between devices and the centralized cloud. Highly-performant AI services are possible because of the distributed processing that has excellent spatial proximity to the edge data.

We at Latent AI are exploring ways to deploy AI at the edge, with technology that transforms and compresses AI models for the edge. The size of our edge AI model is many orders of magnitudes smaller than the sensor data (e.g., kilobytes or megabytes for the edge AI model, compared to petabytes of edge data). We are exploring using WebAssembly (WASM) within the Cloudflare Workers environment. We want to identify possible operating points for the distributed AI services by exploring achievable performance on the available edge infrastructure.

Architectural Approach for Exploration

WebAssembly (WASM) is a new open-standard format for programs that run on the Web. It is a popular way to enable high-performance web-based applications. WASM is closer to machine code, and thus faster than JavaScript (JS) or JIT. Compiler optimizations, already done ahead of time, reduce the overhead in fetching and parsing application code. Today, WASM offers the flexibility and portability of JS at the near-optimum performance of compiled machine code.

AI models have notoriously large memory usage demands because configuring them requires high parameter counts. Cloudflare already extends support for WASM using their Wrangler CLI, and we chose to use it for our exploration.