KTransformers

by Tsinghua University

KTransformers is an open-source project by Tsinghua University's KVCache.AI team, designed to optimize the inference performance of large language models and reduce hardware requirements.

What is KTransformers?

KTransformers is an open-source framework developed by Tsinghua University's KVCache.AI team in collaboration with Qujing Technology. It is designed to optimize the inference performance of large language models (LLMs) and reduce hardware requirements, making it possible to run models with hundreds of billions of parameters on consumer-grade hardware.

Key Features of KTransformers

Supports Local Inference of Large Models: Run full versions of large models like DeepSeek-R1 with 671B parameters on a single GPU with only 24GB of VRAM.
Improves Inference Speed: Achieve preprocessing speeds of up to 286 tokens/s and inference generation speeds of up to 14 tokens/s.
Compatible with Various Models and Operators: Supports DeepSeek series and other MoE architecture models, with a flexible template injection framework for customization.
Reduces Hardware Requirements: Enables "home-based" deployment of large models, making it accessible to individual users and small teams.
Supports Long Sequence Tasks: Integrates Intel AMX instruction set for CPU pre-fill speeds up to 286 tokens/s, reducing processing time from minutes to seconds.

Technical Principles

MoE Architecture: Offloads sparse MoE matrices to CPU/DRAM while keeping dense parts on the GPU, reducing VRAM requirements.
Offload Strategy: Distributes tasks based on computational intensity, prioritizing high-intensity tasks on the GPU.
High-Performance Operator Optimization: Uses llamafile for CPU optimization and Marlin operators for GPU, achieving significant speedups.
CUDA Graph Optimization: Minimizes CPU/GPU communication breaks, improving inference performance.
Quantization and Storage Optimization: Uses 4bit quantization to compress model storage and optimize KV cache size.

Application Scenarios

Individual Developers and Small Teams: Run large models on consumer-grade hardware for text generation, Q&A systems, and more.
Long Sequence Tasks: Efficiently process long texts, code analysis, and other tasks.
Enterprise Applications: Deploy large models locally for intelligent customer service and content recommendation.
Academic Research: Explore and optimize MoE architecture models on ordinary hardware.
Education and Training: Use as a teaching tool for large model applications and optimization techniques.

Getting Started

Visit the GitHub repository to download and start using KTransformers.

Framework Features

Supported Tasks

Text Generation Q&a Systems Long Sequence Processing Code Analysis Intelligent Customer Service Content Recommendation

Getting Started

Pricing

free

Requirements

GPU with 24GB VRAM
Python 3.8+
CUDA Toolkit

Screenshots & Images

Primary Screenshot

Additional Images

View Repository Documentation

Stats

0 Views

0 Favorites

13211 GitHub Stars

Community & Support

GitHub Repository

Similar Frameworks

TPO

Phantom by ByteDance

AgentSociety by Tsinghua University

KTransformers

What is KTransformers?

Key Features of KTransformers

Technical Principles

Application Scenarios

Getting Started

Framework Features

Getting Started

Screenshots & Images

Stats

Community & Support

Similar Frameworks

Recently Viewed

Company

Categories

Stay Updated

What’s in Startup Plan?

What’s in Startup Plan?

What’s in Startup Plan?

What’s in Startup Plan?

Details

Frameworks

Database

Billing

Completed

Project Type

Project Settings

Drop files here or click to upload.

Budget

Build a Team

Set First Target

Upload Files

Drop files here or click to upload.

Project Created!

No result found

Advanced Search

Search Preferences

KTransformers

What is KTransformers?

Key Features of KTransformers

Technical Principles

Application Scenarios

Getting Started

Framework Features

Getting Started

Screenshots & Images

Stats

Community & Support

Similar Frameworks

Recently Viewed

Company

Categories

Stay Updated

Drop files here or click to upload.

Drop files here or click to upload.