Qwerky-72B Model Trained Efficiently on 8 AMD MI300X GPUs

April 01, 2025
Tags: Qwerky-72B, AMD MI300X GPU, AI training, large-scale models, attention-free models
The Qwerky-72B model, a large attention-free model, was successfully trained using only 8 AMD MI300X GPUs, showcasing the efficiency and scalability of AMD's MI300X accelerators for large-scale model training.

Video: AMD MI300X server review 8x GPUs | Llama 405b model tested

The Qwerky-72B model, a large attention-free model, was trained using only 8 AMD MI300X GPUs, demonstrating that AMD's MI300X accelerators can handle large-scale model training efficiently. Each MI300X pairs 192 GB of high-bandwidth memory (HBM3) with a large number of compute units (CUs), giving a single 8-GPU node enough memory capacity and compute throughput for a 72B-parameter workload.
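
The Hacker News discussion listed under Sources sketches the training recipe at a high level: take an existing transformer, freeze all of its weights, delete the attention layers, replace them with RWKV layers, and train the result. Below is a minimal PyTorch sketch of that idea; the RWKVTimeMix block is a simplified stand-in for a real RWKV time-mixing layer, and the `model.layers` / `self_attn` attribute names are illustrative assumptions, not the actual Qwerky training code.

```python
import torch
import torch.nn as nn

class RWKVTimeMix(nn.Module):
    """Simplified stand-in for an RWKV-style, attention-free time-mixing
    block. A real RWKV layer adds token-shift and other details; the key
    property shown here is the recurrent state, linear in sequence length."""

    def __init__(self, hidden_size: int):
        super().__init__()
        self.receptance = nn.Linear(hidden_size, hidden_size, bias=False)
        self.key = nn.Linear(hidden_size, hidden_size, bias=False)
        self.value = nn.Linear(hidden_size, hidden_size, bias=False)
        self.output = nn.Linear(hidden_size, hidden_size, bias=False)
        self.decay = nn.Parameter(torch.zeros(hidden_size))  # per-channel decay

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden). One recurrent step per token,
        # so there is no seq_len x seq_len attention matrix.
        r = torch.sigmoid(self.receptance(x))
        k, v = self.key(x), self.value(x)
        w = torch.exp(-torch.exp(self.decay))  # decay factor in (0, 1)
        state = torch.zeros_like(x[:, 0])      # (batch, hidden) running state
        outs = []
        for t in range(x.size(1)):
            state = w * state + k[:, t] * v[:, t]  # exponentially decayed memory
            outs.append(r[:, t] * state)           # gated readout
        return self.output(torch.stack(outs, dim=1))

def convert_to_attention_free(model: nn.Module, hidden_size: int) -> nn.Module:
    # 1) Freeze every weight of the pretrained transformer.
    for p in model.parameters():
        p.requires_grad = False
    # 2) Replace each attention block with a fresh RWKV-style mixer.
    #    `model.layers` and `.self_attn` are hypothetical attribute names;
    #    modules added after the freeze remain trainable by default.
    for layer in model.layers:
        layer.self_attn = RWKVTimeMix(hidden_size)
    return model
```

Because the pretrained weights are frozen before the swap, only the newly inserted mixers receive gradient updates, which is what makes this style of conversion comparatively cheap to train.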

For more detailed insights into the training process and the specific optimizations applied to leverage the MI300X hardware, you can refer to the following resources:

Sources

Qwerky-72B trained using only 8 AMD MI300X GPUs | Hacker News: "At a high level, you take an existing transformer model, freeze all the weights, delete the attention layer, replace it with RWKV, and train it ..."
Qwerky-72B and 32B: Training large attention free models, with only 8 GPU's: "‼️ Attention is NOT all you need ‼️"
Training Transformers and Hybrid models on AMD Instinct MI300X ...: "We explain how Zyphra harnessed the hardware advantages of the MI300X for training both dense transformers and Zyphra's hybrid models."