CubeCL: High-Performance GPU Compute Language for Rust

April 24, 2025
Rust · GPU · CUDA · ROCm · WGPU · Compute Kernels · High-Performance Computing · SIMD · Autotuning
CubeCL is a multi-platform Rust extension that enables developers to write efficient GPU kernels using Rust, supporting CUDA, ROCm/HIP, and WGPU for cross-platform GPU compute.

Video: Something new: CubeCL, Writing Pure Rust GPU Kernels.

CubeCL is a multi-platform high-performance compute language extension for Rust that enables you to write GPU kernels using Rust. It supports various GPU runtimes, including CUDA for NVIDIA GPUs, ROCm/HIP for AMD GPUs (work in progress), and WGPU for cross-platform GPU support (Vulkan, Metal, DirectX, WebGPU).

With CubeCL, you can leverage Rust's zero-cost abstractions to develop maintainable, flexible, and efficient compute kernels. It currently supports functions, generics, and structs, with partial support for traits, methods, and type inference. The project aims to provide broader support for Rust language primitives while maintaining optimal performance.

CubeCL introduces several key features:

  • Automatic Vectorization: Automatically uses SIMD instructions when possible for improved performance.
  • Comptime: Allows compile-time optimizations, instruction specialization, loop unrolling, and shape specialization.
  • Autotuning: Simplifies kernel selection by running benchmarks at runtime to determine the best configurations for the current hardware.

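The core idea behind autotuning can be illustrated in plain Rust. This is a hypothetical sketch of the concept only, not CubeCL's actual autotune API: time each candidate implementation once at runtime and remember the fastest one for subsequent calls.

```rust
use std::time::Instant;

// Hypothetical sketch of the autotuning idea (not CubeCL's API):
// benchmark each candidate implementation on representative input
// and return the name of the fastest one.
fn pick_fastest(input: &[u64], candidates: &[(&'static str, fn(&[u64]) -> u64)]) -> &'static str {
    candidates
        .iter()
        .min_by_key(|(_, f)| {
            let start = Instant::now();
            // black_box keeps the compiler from optimizing the call away.
            std::hint::black_box(f(input));
            start.elapsed()
        })
        .map(|(name, _)| *name)
        .unwrap()
}
```

In CubeCL the same principle applies to whole kernel configurations (tile sizes, vectorization widths), and the winning configuration is cached per hardware.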
To use CubeCL, you annotate functions with the #[cube] attribute to indicate they should run on the GPU. For example:

use cubecl::prelude::*;

#[cube(launch_unchecked)]
fn gelu_array<F: Float>(input: &Array<Line<F>>, output: &mut Array<Line<F>>) {
    if ABSOLUTE_POS < input.len() {
        output[ABSOLUTE_POS] = gelu_scalar::<F>(input[ABSOLUTE_POS]);
    }
}

#[cube]
fn gelu_scalar<F: Float>(x: Line<F>) -> Line<F> {
    let sqrt2 = F::new(comptime!(2.0f32.sqrt()));
    let tmp = x / Line::new(sqrt2);
    x * (Line::erf(tmp) + 1.0) / 2.0
}

You can then launch the kernel using the autogenerated gelu_array::launch_unchecked function. CubeCL also provides a memory management strategy optimized for throughput with heavy buffer reuse to avoid allocations.
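For intuition about what the kernel computes, here is a plain-Rust CPU reference of the same GELU formula, x · (erf(x/√2) + 1) / 2. This helper is not part of CubeCL; since Rust's standard library has no `erf`, it is approximated here with the Abramowitz–Stegun formula 7.1.26 (max error around 1.5e-7).

```rust
// CPU reference of the GELU formula used in the kernel above; handy for
// sanity-checking GPU output. Not part of CubeCL.
fn erf(x: f32) -> f32 {
    // Abramowitz–Stegun 7.1.26 rational approximation; erf is odd,
    // so compute for |x| and restore the sign.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + 0.3275911 * x);
    let poly = t * (0.254829592
        + t * (-0.284496736 + t * (1.421413741 + t * (-1.453152027 + t * 1.061405429))));
    sign * (1.0 - poly * (-x * x).exp())
}

fn gelu(x: f32) -> f32 {
    x * (erf(x / 2.0f32.sqrt()) + 1.0) / 2.0
}
```

For example, gelu(-1.0) ≈ -0.1587, gelu(0.0) = 0.0, and gelu(1.0) ≈ 0.8413, which is what the GPU kernel should produce element-wise.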

CubeCL is designed to ease the pain of writing highly optimized compute kernels that remain portable across hardware. The project aims to foster an ecosystem of high-performance and scientific computing in Rust, including linear algebra components and other essential algorithms such as convolutions, random number generation, and fast Fourier transforms.

For more details, you can explore the CubeCL GitHub repository or check out the CubeCL Architecture Overview.

Sources

  • CubeCL: GPU Kernels in Rust for CUDA, ROCm, and WGPU (Hacker News discussion)
  • tracel-ai/cubecl: Multi-platform high-performance compute language extension for Rust (GitHub repository)
  • CubeCL Architecture Overview: Running Rust on your GPU (GitHub Gist)