DeepEP

by DeepSeek

DeepEP is an open-source Expert Parallel (EP) communication library designed for training and inference of Mixture of Experts (MoE) models, offering high throughput and low latency GPU kernels.

What is DeepEP?

DeepEP is an open-source Expert Parallel (EP) communication library developed by DeepSeek, specifically designed for training and inference of Mixture of Experts (MoE) models. It provides high-throughput, low-latency all-to-all GPU kernels and supports both intra-node communication over NVLink and inter-node communication over RDMA.
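To make the all-to-all pattern concrete, here is a minimal single-process sketch of what an MoE layer's dispatch and combine steps compute. It is plain Python, not DeepEP's API: the function names, the contiguous expert-to-rank mapping, and the scalar "tokens" are all assumptions chosen for illustration, and no real communication takes place.

```python
from collections import defaultdict

def dispatch(tokens, topk_experts, experts_per_rank):
    """Group (token_id, expert_id, value) triples by the rank that owns each expert.

    Assumes experts are laid out contiguously: rank r owns experts
    [r * experts_per_rank, (r + 1) * experts_per_rank).
    """
    per_rank = defaultdict(list)
    for token_id, (value, experts) in enumerate(zip(tokens, topk_experts)):
        for expert_id in experts:
            rank = expert_id // experts_per_rank
            per_rank[rank].append((token_id, expert_id, value))
    return dict(per_rank)

def run_experts(per_rank, expert_fn):
    """Each rank applies its local experts to the tokens it received."""
    return {
        rank: [(token_id, expert_id, expert_fn(expert_id, value))
               for token_id, expert_id, value in items]
        for rank, items in per_rank.items()
    }

def combine(outputs, topk_weights, topk_experts, num_tokens):
    """Return expert outputs to their source tokens and sum them with gate weights."""
    result = [0.0] * num_tokens
    for items in outputs.values():
        for token_id, expert_id, out in items:
            slot = topk_experts[token_id].index(expert_id)
            result[token_id] += topk_weights[token_id][slot] * out
    return result

# Hypothetical toy setup: 3 scalar tokens, 4 experts spread over 2 ranks, top-2 routing.
tokens = [1.0, 2.0, 3.0]
topk_experts = [[0, 3], [1, 2], [2, 3]]
topk_weights = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]

def toy_expert(expert_id, value):
    # Stand-in for an expert network: expert e multiplies its input by (e + 1).
    return (expert_id + 1) * value

received = dispatch(tokens, topk_experts, experts_per_rank=2)
processed = run_experts(received, toy_expert)
combined = combine(processed, topk_weights, topk_experts, num_tokens=len(tokens))
print(combined)  # [2.5, 5.5, 9.0]
```

In a real deployment every rank holds its own tokens and the two grouping steps become all-to-all exchanges between GPUs, which is the traffic a library like DeepEP is built to carry.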

Key Features of DeepEP

Efficient Communication Kernels: High-throughput and low-latency all-to-all GPU kernels for MoE's dispatch and combine operations.
Low-Precision Computing Support: Supports FP8 and BF16 data formats, improving computational efficiency and reducing memory requirements.
Optimized Communication Mechanism: Optimized kernels for the group-restricted gating algorithm, supporting asymmetric-domain bandwidth forwarding, for example forwarding data from the NVLink domain to the RDMA domain.
Low-Latency Inference Decoding: Pure RDMA low-latency kernel, with latency as low as 163 microseconds.
Communication-Computation Overlap: Hook-based method that does not occupy the GPU's streaming multiprocessor (SM) resources.
Flexible Resource Management: Supports flexible GPU resource management, adapting to different workloads.
Network Configuration Optimization: Tested on InfiniBand networks, supporting traffic isolation through virtual lanes (VLs).
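The group-restricted gating mentioned in the list above can be sketched as follows. This is a simplified illustration rather than DeepEP or DeepSeek code: the function name is hypothetical, and ranking each group by its single best expert score is an assumption (other group-scoring rules are possible). The idea is that each token may only pick experts from a limited number of expert groups, which bounds how many destinations its data has to reach when groups map to nodes.

```python
def group_limited_topk(scores, num_groups, topk_groups, topk):
    """Pick the top-k experts for one token, restricted to the best expert groups.

    scores: one routing score per expert, with experts laid out group by group.
    Returns the selected expert indices in ascending order.
    """
    group_size = len(scores) // num_groups
    # Assumption: a group's score is the score of its best expert.
    group_scores = [
        max(scores[g * group_size:(g + 1) * group_size])
        for g in range(num_groups)
    ]
    best_groups = sorted(
        range(num_groups), key=lambda g: group_scores[g], reverse=True
    )[:topk_groups]
    # Only experts inside the selected groups remain candidates.
    candidates = [
        e for g in best_groups
        for e in range(g * group_size, (g + 1) * group_size)
    ]
    chosen = sorted(candidates, key=lambda e: scores[e], reverse=True)[:topk]
    return sorted(chosen)

# Hypothetical scores for 8 experts in 4 groups of 2.
scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.6, 0.05, 0.3]
restricted = group_limited_topk(scores, num_groups=4, topk_groups=2, topk=3)
unrestricted = group_limited_topk(scores, num_groups=4, topk_groups=4, topk=3)
print(restricted)    # [1, 2, 3]
print(unrestricted)  # [1, 2, 4]
```

With the restriction, expert 4 is skipped even though it scores higher than expert 3, because its group was not among the two selected groups.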

Performance of DeepEP

High-Throughput Kernels: Benchmarked on H800 GPUs connected to CX7 InfiniBand 400 Gb/s RDMA network cards.
Low-Latency Kernels: Designed for inference decoding, using pure RDMA technology, significantly reducing latency.
System Compatibility: Tested on InfiniBand networks; RDMA over Converged Ethernet (RoCE) is expected to be compatible in principle.
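As a rough way to reason about the figures above, the following sketch converts the 400 Gb/s link rate into bytes per second and estimates a lower bound on transfer time for a batch of token activations in BF16 versus FP8. The batch size of 128 tokens and the hidden size of 7168 are hypothetical values chosen for illustration, and the estimate ignores FP8 scaling factors, protocol overhead, and per-message latency, so measured numbers will differ.

```python
def gbit_to_gbyte_per_s(gbit_per_s):
    """Convert a link rate from gigabits per second to gigabytes per second."""
    return gbit_per_s / 8

def payload_bytes(num_tokens, hidden_size, bytes_per_element):
    """Size of a batch of token activations, ignoring any metadata."""
    return num_tokens * hidden_size * bytes_per_element

def min_transfer_time_us(num_bytes, gbyte_per_s):
    """Ideal transfer time in microseconds at full link bandwidth."""
    return num_bytes / (gbyte_per_s * 1e9) * 1e6

link_gbyte_per_s = gbit_to_gbyte_per_s(400)  # 50.0 GB/s

# Hypothetical workload: 128 tokens with hidden size 7168.
bf16_bytes = payload_bytes(128, 7168, bytes_per_element=2)
fp8_bytes = payload_bytes(128, 7168, bytes_per_element=1)

print(link_gbyte_per_s)
print(bf16_bytes, round(min_transfer_time_us(bf16_bytes, link_gbyte_per_s), 2))
print(fp8_bytes, round(min_transfer_time_us(fp8_bytes, link_gbyte_per_s), 2))
```

Halving the bytes per element halves the ideal wire time, which is the main reason low-precision dispatch helps communication-bound workloads.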

System Requirements for DeepEP

Hardware Requirements: Hopper architecture GPUs (e.g., H100, H800), GPUDirect RDMA-capable devices, NVLink for intra-node communication, and RDMA networks for inter-node communication.
Software Requirements: Python 3.8+, CUDA 12.3+, PyTorch 2.1+, and a modified version of NVSHMEM.
Network Requirements: InfiniBand networks (tested); RDMA over Converged Ethernet (RoCE) is expected to be compatible in principle.

Application Scenarios of DeepEP

Large-Scale Model Training: Efficient parallel communication support for training MoE models.
Inference Tasks: Suitable for latency-sensitive inference decoding scenarios.
High-Performance Computing: Optimizes communication performance on NVLink and RDMA networks.
Intelligent Customer Service: Optimizes the inference process for quick response to user inquiries.
Financial Sector: Used for risk assessment and automated report generation.

Framework Features

Supported Tasks

MoE Model Training
MoE Model Inference
High-Performance Computing

Getting Started

Pricing

Free

Requirements

Python 3.8+
CUDA 12.3+
PyTorch 2.1+
Hopper GPU architecture
NVLink for intra-node communication
RDMA for inter-node communication
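Before installing, it can help to check a machine against the list above. The sketch below is a hypothetical helper, not part of DeepEP: it compares version tuples against the stated minimums and treats compute capability 9.0 as the marker for Hopper GPUs. It does not check NVLink, RDMA, or NVSHMEM, which need system-level tools.

```python
MIN_PYTHON = (3, 8)
MIN_CUDA = (12, 3)
MIN_TORCH = (2, 1)
HOPPER_COMPUTE_CAPABILITY = (9, 0)  # Hopper GPUs such as H100/H800 report 9.0

def parse_version(text):
    """Turn a version string such as '2.1.0+cu121' into a (major, minor) tuple."""
    parts = text.split("+")[0].split(".")
    return tuple(int(p) for p in parts[:2])

def unmet_requirements(python, cuda, torch, compute_capability):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    if python < MIN_PYTHON:
        problems.append("Python 3.8+ required")
    if cuda < MIN_CUDA:
        problems.append("CUDA 12.3+ required")
    if torch < MIN_TORCH:
        problems.append("PyTorch 2.1+ required")
    if compute_capability != HOPPER_COMPUTE_CAPABILITY:
        problems.append("Hopper GPU (compute capability 9.0) required")
    return problems

# Example with hard-coded values. On a real machine these could come from
# sys.version_info[:2], torch.version.cuda, torch.__version__, and
# torch.cuda.get_device_capability().
ok = unmet_requirements(
    python=(3, 10),
    cuda=parse_version("12.4"),
    torch=parse_version("2.1.0+cu121"),
    compute_capability=(9, 0),
)
too_old = unmet_requirements(
    python=(3, 7),
    cuda=parse_version("11.8"),
    torch=parse_version("2.0.1"),
    compute_capability=(8, 0),
)
print(ok)       # []
print(too_old)  # four problems, one per failed check
```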


Stats


7338 GitHub Stars


