Video: Something new: CubeCL, Writing Pure Rust GPU Kernels.
CubeCL is a multi-platform high-performance compute language extension for Rust that enables you to write GPU kernels using Rust. It supports various GPU runtimes, including CUDA for NVIDIA GPUs, ROCm/HIP for AMD GPUs (work in progress), and WGPU for cross-platform GPU support (Vulkan, Metal, DirectX, WebGPU).
With CubeCL, you can leverage Rust's zero-cost abstractions to develop maintainable, flexible, and efficient compute kernels. It currently supports functions, generics, and structs, with partial support for traits, methods, and type inference. The project aims to provide broader support for Rust language primitives while maintaining optimal performance.
CubeCL introduces several key features, including automatic vectorization, compile-time evaluation with comptime, and autotuning.
To use CubeCL, you annotate functions with the #[cube] attribute to indicate they should run on the GPU. For example:
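    use cubecl::prelude::*;

    // A Line represents a contiguous series of elements where SIMD operations may be available.
    #[cube(launch_unchecked)]
    fn gelu_array<F: Float>(input: &Array<Line<F>>, output: &mut Array<Line<F>>) {
        // Skip units whose position falls past the end of the input.
        if ABSOLUTE_POS < input.len() {
            output[ABSOLUTE_POS] = gelu_scalar(input[ABSOLUTE_POS]);
        }
    }

    #[cube]
    fn gelu_scalar<F: Float>(x: Line<F>) -> Line<F> {
        // The square root is evaluated at compile time via comptime!.
        let sqrt2 = F::new(comptime!(2.0f32.sqrt()));
        let tmp = x / Line::new(sqrt2);

        x * (Line::erf(tmp) + 1.0) / 2.0
    }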
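As a point of comparison, the formula in gelu_scalar, x * (erf(x / sqrt(2)) + 1) / 2, can be written as a plain CPU function for checking kernel output. This is a minimal sketch and not part of CubeCL: the names erf_approx, gelu_reference, and max_abs_diff are hypothetical, and because stable Rust's standard library has no erf, the sketch swaps in the Abramowitz-Stegun 7.1.26 approximation (absolute error around 1.5e-7).

```rust
/// Approximate erf(x) with the Abramowitz-Stegun formula 7.1.26.
/// This is an assumption of the sketch, not what CubeCL uses on the GPU.
fn erf_approx(x: f64) -> f64 {
    const P: f64 = 0.3275911;
    const A: [f64; 5] = [
        0.254829592,
        -0.284496736,
        1.421413741,
        -1.453152027,
        1.061405429,
    ];
    // erf is odd, so evaluate on |x| and restore the sign afterwards.
    let sign = if x < 0.0 { -1.0 } else { 1.0 };
    let x = x.abs();
    let t = 1.0 / (1.0 + P * x);
    // Horner evaluation of a1*t + a2*t^2 + ... + a5*t^5.
    let poly = A.iter().rev().fold(0.0, |acc, &a| (acc + a) * t);
    sign * (1.0 - poly * (-x * x).exp())
}

/// CPU reference for the same formula as gelu_scalar:
/// x * (erf(x / sqrt(2)) + 1) / 2.
fn gelu_reference(x: f32) -> f32 {
    let x = x as f64;
    let tmp = x / 2.0f64.sqrt();
    (x * (erf_approx(tmp) + 1.0) / 2.0) as f32
}

/// Largest element-wise absolute difference between two equally long slices.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "slices must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f32::max)
}
```

A typical use would be to map gelu_reference over the input on the host and compare against the values read back from the GPU with max_abs_diff, accepting a small tolerance rather than exact equality.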
You can then launch the kernel using the autogenerated gelu_array::launch_unchecked function. CubeCL also provides a memory management strategy optimized for throughput with heavy buffer reuse to avoid allocations.
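The idea of reusing buffers to avoid allocations can be illustrated with a small host-side pool. This is only a sketch of the general technique, assuming buffers are recycled by exact size; BufferPool and its methods are hypothetical names and do not reflect CubeCL's actual memory manager, which works with GPU memory rather than Vec<u8>.

```rust
use std::collections::HashMap;

/// Hypothetical pool that keeps released buffers around, keyed by size,
/// so later requests of the same size do not allocate again.
struct BufferPool {
    free: HashMap<usize, Vec<Vec<u8>>>,
    /// Number of fresh allocations performed, tracked to show the reuse.
    allocations: usize,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool {
            free: HashMap::new(),
            allocations: 0,
        }
    }

    /// Hand out a buffer of `size` bytes, reusing a released one if available.
    /// A reused buffer keeps its previous contents; it is not zeroed again.
    fn acquire(&mut self, size: usize) -> Vec<u8> {
        if let Some(buf) = self.free.get_mut(&size).and_then(|list| list.pop()) {
            return buf;
        }
        self.allocations += 1;
        vec![0u8; size]
    }

    /// Return a buffer to the pool instead of dropping it.
    fn release(&mut self, buf: Vec<u8>) {
        self.free.entry(buf.len()).or_default().push(buf);
    }
}
```

In a loop that repeatedly acquires and releases buffers of the same sizes, the allocation count stops growing after the first iteration, which is the throughput benefit the reuse strategy is after.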
CubeCL is designed to ease the pain of writing highly optimized compute kernels that are portable across hardware. It aims to develop an ecosystem of high-performance and scientific computing in Rust, including linear algebra components and other essential algorithms like convolutions, random number generation, and fast Fourier transforms.
For more details, you can explore the CubeCL GitHub repository or check out the CubeCL Architecture Overview.