One language, any hardware. Systems-level performance. Pythonic syntax.

Mojo unifies high-level AI development with low-level systems programming. Write once, deploy everywhere - from CPUs to GPUs - without vendor lock-in.

Mojo highlights

  fn add(output: LayoutTensor, a: LayoutTensor, b: LayoutTensor):
      i = global_idx.x
      if i < output.size():
          output[i] = a[i] + b[i]
  def mojo_square_array(array_obj: PythonObject):
      alias simd_width = simdwidthof[DType.int64]()
      ptr = array_obj.ctypes.data.unsafe_get_as_pointer[DType.int64]()
      @parameter
      fn pow[width: Int](i: Int):
          elem = ptr.load[width=width](i)
          ptr.store[width=width](i, elem * elem)
      # Apply `pow` across the whole array in SIMD chunks
      vectorize[pow, simd_width](Int(array_obj.size))
  struct VectorAddition:
      @staticmethod
      def execute[target: StaticString](
          out: OutputTensor[rank=1],
          lhs: InputTensor[dtype = out.dtype, rank = out.rank],
          rhs: InputTensor[dtype = out.dtype, rank = out.rank]
      ):
          @parameter
          if target == "cpu":
              vector_addition_cpu(out, lhs, rhs)
          elif target == "gpu":
              vector_addition_gpu(out, lhs, rhs)
          else:
              raise Error("No known target:", target)

Why we built Mojo

  • Vendor lock-in is expensive

    You're forced to choose: NVIDIA's CUDA, AMD's ROCm, or Intel's oneAPI. Rewrite everything when you switch vendors. Your code becomes a hostage to hardware politics.

  • The two-language tax

    Prototype in Python. Rewrite in C++ for production. Debug across language boundaries. Your team splits into 'researchers' and 'engineers' - neither can work on the full stack.

  • Python hits a wall

    Pure Python can be orders of magnitude too slow for production AI. The GIL blocks true parallelism. GPUs are out of reach without native extensions. Every optimization means dropping into C. Simplicity becomes a liability at scale.

  • Toolchain chaos

    PyTorch for training. TensorRT for inference. vLLM for serving. Each tool has its own bugs, limitations, and learning curve. Integration nightmares multiply with every component.

  • Memory bugs in production

    C++ gives you footguns by default. Race conditions in parallel code. Memory leaks that OOM your servers. Segfaults in production at 3 AM.

  • Developer experience ignored

    30-minute build times. Cryptic template errors. Debuggers that can't inspect GPU state. Profilers that lie about performance. Modern developers deserve tools that accelerate, not frustrate.

Why should I use Mojo?

  • Easier

    GPU Programming Made Easy

    Traditionally, writing custom GPU code means diving into CUDA, managing memory, and compiling separate device code. Mojo simplifies the whole experience while unlocking top-tier performance on NVIDIA and AMD GPUs.

      # GPU-specific coordinates for MMA tile processing
      @parameter
      for n_mma in range(num_n_mmas):
          alias mma_id = n_mma * num_m_mmas + m_mma
          var mask_frag_row = mask_warp_row + m_mma * MMA_M
          var mask_frag_col = mask_warp_col + n_mma * MMA_N
          @parameter
          if is_nvidia_gpu():
              mask_frag_row += lane // (MMA_N // p_frag_simdwidth)
              mask_frag_col += (lane * p_frag_simdwidth) % MMA_N
          elif is_amd_gpu():
              mask_frag_row += (lane // MMA_N) * p_frag_simdwidth
              mask_frag_col += lane % MMA_N
    
  • Performant

    Bare metal performance on any GPU

    Get raw GPU performance without complex toolchains. Mojo makes it easy to write high-performance kernels with intuitive syntax, zero boilerplate, and native support for NVIDIA, AMD, and more.

      # Using low-level warp GPU instructions ergonomically
      
      @parameter
      for i in range(K):
          var reduced = top_k_sram[tid]
          alias limit = log2_floor(WARP_SIZE)
      
          @parameter
          for j in reversed(range(limit)):
              alias offset = 1 << j
              var shuffled = TopKElement(
                  warp.shuffle_down(reduced.idx, offset),
                  warp.shuffle_down(reduced.val, offset),
              )
              reduced = max(reduced, shuffled)
      
          barrier()
    
  • Interoperable

    Use Mojo to extend Python

    Mojo interoperates natively with Python, so you can speed up bottlenecks without rewriting everything. Start with one function and scale as needed: Mojo fits into your existing codebase.

      if __name__ == "__main__":
          # Calling into a Mojo `passthrough` function from Python:
          result = hello_mojo.passthrough("Hello")
          print(result)
    
      fn passthrough(value: PythonObject) raises -> PythonObject:
          """A very basic function illustrating passing values to and from Mojo."""
          return value + " world from Mojo"
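
    For Python to import `passthrough`, the Mojo side also needs to expose a module initializer. A minimal sketch, assuming the current `python.bindings.PythonModuleBuilder` API (the module name `hello_mojo` mirrors the Python snippet above and is illustrative):

    ```mojo
    from os import abort
    from python import PythonObject
    from python.bindings import PythonModuleBuilder


    fn passthrough(value: PythonObject) raises -> PythonObject:
        return value + " world from Mojo"


    @export
    fn PyInit_hello_mojo() -> PythonObject:
        try:
            # Register Mojo functions under the module name Python will import
            var module = PythonModuleBuilder("hello_mojo")
            module.def_function[passthrough]("passthrough")
            return module.finalize()
        except e:
            return abort[PythonObject](String("error creating module: ", e))
    ```

    Once compiled, the module imports like any other Python extension.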
    
  • Community

    Build with us in the open to create the future of AI

    Mojo has 750K+ lines of open-source code and an active community of 50K+ members. We're actively working to open even more, building a transparent, developer-first foundation for the future of AI infrastructure.


  • MOJO + MAX

    Write GPU Kernels with MAX

    With MAX, you register a custom op written in Mojo once and run it anywhere MAX runs: the same kernel dispatches to CPU or GPU at compile time, with no separate device code to maintain.

      # Define a custom GPU subtraction kernel
      
      @compiler.register("mo.sub")
      struct Sub:
          @staticmethod
          fn execute[target: StaticString, _trace_name: StaticString](
              z: FusedOutputTensor,
              x: FusedInputTensor,
              y: FusedInputTensor,
              ctx: DeviceContextPtr,
          ) capturing raises:
              @parameter
              @always_inline
              fn func[width: Int](idx: IndexList[z.rank]) -> SIMD[z.dtype, width]:
                  var lhs = rebind[SIMD[z.dtype, width]](x._fused_load[width](idx))
                  var rhs = rebind[SIMD[z.dtype, width]](y._fused_load[width](idx))
                  return lhs - rhs
      
              foreach[
                  func,
                  target=target,
                  _trace_name=_trace_name,
              ](z, ctx)
    
  • Interoperable

    Powering Breakthroughs in Production AI

    Top AI teams use Mojo to turn ideas into optimized, low-level GPU code. From Inworld’s custom logic to Qwerky’s memory-efficient Mamba, Mojo delivers where performance meets creativity.

  • Performant

    World-Class Tools, Out of the Box

    Mojo ships with a full-featured VS Code debugger and works with dev tools like Cursor and Claude. Mojo makes modern dev workflows feel seamless.

Mojo learns from

    • What Mojo keeps from C++

      • Zero cost abstractions

      • Metaprogramming power

        Turing complete: can build a compiler in templates

      • Low level hardware control

        Inline asm, intrinsics, zero dependencies

      • Unified host/device language

    • What Mojo improves about C++

      • Slow compile times

      • Template error messages

      • Limited metaprogramming

        ...and that templates != normal code

      • Not MLIR-native

    • What Mojo keeps from Python

      • Minimal boilerplate

      • Easy-to-read syntax

      • Interoperability with the massive Python ecosystem

    • What Mojo improves about Python

      • Performance

      • Memory usage

      • Device portability

    • What Mojo keeps from Rust

      • Memory safety through a borrow checker

      • Systems language performance

    • What Mojo improves about Rust

      • More flexible ownership semantics

      • Easier to learn

      • More readable syntax

    • What Mojo keeps from Zig

      • Compile-time metaprogramming

      • Systems language performance

    • What Mojo improves about Zig

      • Memory safety

      • More readable syntax
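
    The metaprogramming points above can be made concrete: Mojo's compile-time parameters use the same syntax as runtime code, so a specialized function is just a function rather than a separate template language. A small sketch (names are illustrative):

    ```mojo
    fn repeat[count: Int](msg: String):
        # `count` is a compile-time parameter; the loop is unrolled by the compiler
        @parameter
        for i in range(count):
            print(msg)


    def main():
        # Instantiated at compile time, like a C++ template, but written as normal code
        repeat[3]("hello")
    ```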

Get started with Mojo

  • Start using Mojo

    (Free)

    Install Mojo and get up and running in minutes. A simple install, familiar tooling, and clear docs make it easy to start writing code immediately.
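
    As a sketch of that install flow (assuming the current pip-based distribution; check the Modular docs for your platform):

    ```shell
    # Create a virtual environment and install the Modular package, which includes Mojo
    python3 -m venv .venv && source .venv/bin/activate
    pip install modular

    # Verify the toolchain is on your path
    mojo --version
    ```
    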

  • Easy ways to get started

    Not sure where to start? The examples below give you a few simple entry points into Mojo.

    • Mojo Manual

      Write a simple GPU program and learn the basics.

    • GPU Puzzles

      Practice GPU programming with guided puzzles.

    • Python Interoperability

      Read and write Mojo using familiar Python syntax.

Popular Mojo Tech Talks