DeepGEMM

by DeepSeek

DeepGEMM is an open-source library designed for efficient and concise FP8 matrix multiplication, optimized for NVIDIA Hopper Tensor Cores.

What is DeepGEMM?

DeepGEMM is an open-source library by DeepSeek for efficient FP8 (8-bit floating point) general matrix multiplication (GEMM). It is optimized for NVIDIA Hopper Tensor Cores and supports both regular GEMM and the grouped GEMM used by Mixture of Experts (MoE) models. Kernels are compiled Just-In-Time (JIT) at runtime for the shapes actually used, so no compilation step is needed at installation time.

Key Features

Efficient FP8 Matrix Multiplication: Designed for FP8 GEMM, it supports fine-grained scaling to improve precision and performance.
Regular and Grouped GEMM: Handles standard matrix multiplication and grouped GEMM for MoE models.
JIT Compilation: Kernels are dynamically compiled at runtime based on matrix shape and parameters.
Hopper Architecture Optimization: Leverages Tensor Memory Accelerator (TMA) for enhanced data transfer efficiency.
Fine-Grained Scaling and Dual-Level Accumulation: Compensates for imprecise FP8 Tensor Core accumulation by promoting partial sums to higher-precision accumulators.
Lightweight Design: The core kernel is concise (about 300 lines), making it easy to read, learn from, and optimize.
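To make the fine-grained scaling feature more concrete, the sketch below compares one scale for a whole vector against one scale per small block. It is a simplified model under stated assumptions: fake_fp8 is a hypothetical pure-Python approximation of E4M3 rounding (no NaN handling), and the block size of 4 is chosen for readability rather than taken from DeepGEMM.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the E4M3 format


def fake_fp8(x: float) -> float:
    """Approximate E4M3 rounding: clamp to range, keep 3 mantissa bits."""
    if x == 0.0:
        return 0.0
    x = max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))
    _, e = math.frexp(abs(x))        # abs(x) = m * 2**e with 0.5 <= m < 1
    exponent = max(e - 1, -6)        # below 2**-6 the format is subnormal
    step = 2.0 ** (exponent - 3)     # spacing between representable values
    return round(x / step) * step


def quantize_blocks(values, block_size):
    """Quantize with one scale per block; returns (quantized, scales)."""
    quantized, scales = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        amax = max(abs(v) for v in block) or 1.0
        scale = amax / FP8_E4M3_MAX  # map the block's largest value to 448
        scales.append(scale)
        quantized.extend(fake_fp8(v / scale) for v in block)
    return quantized, scales


def dequantize(quantized, scales, block_size):
    return [q * scales[i // block_size] for i, q in enumerate(quantized)]


def max_relative_error(values, block_size):
    """Worst relative round-trip error (assumes no zero inputs)."""
    quantized, scales = quantize_blocks(values, block_size)
    restored = dequantize(quantized, scales, block_size)
    return max(abs(a - b) / abs(a) for a, b in zip(values, restored))


values = [0.001, 0.002, 0.003, 0.004, 100.0, 200.0, 300.0, 400.0]
coarse = max_relative_error(values, block_size=8)  # one scale for everything
fine = max_relative_error(values, block_size=4)    # one scale per block
```

With a single scale, the large values push the small ones into the subnormal range and most of their precision is lost; with per-block scales, each block uses the full FP8 range.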

Performance

Regular GEMM: Up to 2.7x speedup for certain matrix shapes, with throughput above 1000 TFLOPS on large shapes. The project reports speedups relative to DeepSeek's internally optimized CUTLASS 3.6 implementation.
Grouped GEMM: 1.1-1.2x speedup for MoE models, with better memory bandwidth utilization.
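For reference, GEMM throughput is conventionally derived from the 2 x M x N x K floating-point operations of an (M x K) by (K x N) product. The helper below sketches that arithmetic; the function names are illustrative and the timing in the example is made up, not a measured DeepGEMM result.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """Each output element needs k multiplies and k adds."""
    return 2 * m * n * k


def tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for one GEMM that took `seconds`."""
    return gemm_flops(m, n, k) / seconds / 1e12


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new kernel is than the baseline."""
    return baseline_seconds / new_seconds


# Hypothetical timing: a 4096 x 4096 x 4096 GEMM finishing in about
# 137.4 microseconds corresponds to roughly 1000 TFLOPS.
example = tflops(4096, 4096, 4096, 137.438953472e-6)
```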

System Requirements

Hardware: NVIDIA Hopper architecture GPUs (e.g., H800, H100).
Software: CUDA 12.3+, Python 3.8+, PyTorch 2.1+, CUTLASS 3.6+, Linux OS.
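A quick way to check a machine against these requirements is sketched below. It is an illustrative helper, not part of DeepGEMM: check_environment is a hypothetical name, the CUDA version it reads is the one PyTorch was built with (not necessarily the installed toolkit), and CUTLASS is not checked.

```python
import platform
import sys


def check_environment():
    """Return requirement -> True/False, or None when it could not be checked."""
    report = {
        "python>=3.8": sys.version_info >= (3, 8),
        "linux": platform.system() == "Linux",
        "torch>=2.1": None,
        "cuda>=12.3": None,
        "hopper_gpu": None,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch missing: leave the GPU-related checks as None

    major, minor = (int(part) for part in torch.__version__.split(".")[:2])
    report["torch>=2.1"] = (major, minor) >= (2, 1)

    if torch.version.cuda is not None:
        # CUDA version PyTorch was built against, e.g. "12.4"
        cuda = tuple(int(part) for part in torch.version.cuda.split(".")[:2])
        report["cuda>=12.3"] = cuda >= (12, 3)

    if torch.cuda.is_available():
        # Hopper GPUs such as H100 and H800 report compute capability 9.0
        report["hopper_gpu"] = torch.cuda.get_device_capability(0) == (9, 0)
    return report


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {ok}")
```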

Use Cases

Large-Scale AI Model Inference: Accelerates high-dimensional matrix multiplication.
MoE Models: Optimizes grouped matrix multiplication for efficient training and inference.
Low-Precision Computation: Mitigates FP8 accuracy loss through fine-grained scaling and higher-precision accumulation, so outputs stay accurate.
High-Performance Computing: Enhances matrix operation efficiency on Hopper architecture.
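The grouped GEMM used in the MoE case can be pictured as follows: token rows are sorted by expert, and each contiguous group is multiplied by that expert's weight matrix. This is a plain-Python sketch of the semantics only, with hypothetical names, and it assumes every expert's weight matrix has the same shape so that only the row dimension is grouped.

```python
def matmul(a, b):
    """Naive matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def grouped_gemm(tokens, group_sizes, expert_weights):
    """tokens: rows sorted by expert; group_sizes[i] rows go to expert i."""
    assert sum(group_sizes) == len(tokens)
    assert len(group_sizes) == len(expert_weights)
    out, start = [], 0
    for size, weight in zip(group_sizes, expert_weights):
        # Each expert only sees its own contiguous slice of rows.
        out.extend(matmul(tokens[start:start + size], weight))
        start += size
    return out


tokens = [[1, 0], [0, 1], [1, 1]]   # first two rows -> expert 0, last -> expert 1
weights = [
    [[1, 2], [3, 4]],               # expert 0
    [[10, 0], [0, 10]],             # expert 1
]
result = grouped_gemm(tokens, [2, 1], weights)
```

A fused grouped kernel performs these per-expert products in one launch instead of one launch per expert, which is where the efficiency gain comes from.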

Getting Started

Visit the GitHub repository for installation instructions and documentation.

Framework Features

Supported Tasks

Matrix Multiplication
Mixture of Experts (MoE) Operations
Low-Precision Computation

Getting Started

Pricing

Free

Requirements

NVIDIA Hopper Architecture GPU (e.g., H800, H100)
CUDA 12.3+
Python 3.8+
PyTorch 2.1+
CUTLASS 3.6+
Linux OS (e.g., Ubuntu, CentOS)

Stats

5112 GitHub Stars

Community & Support

GitHub Repository

Similar Frameworks

TPO

Phantom by ByteDance

AgentSociety by Tsinghua University
