Running Qwen 2.5-Omni-3B on Consumer PCs: Key Requirements and Installation Guide

April 30, 2025
Qwen 2.5-Omni-3B, a powerful multimodal model capable of processing text, images, audio, and video, can be run on consumer PCs with specific hardware and software requirements, including a modern GPU with sufficient VRAM and optimized precision settings.

Video: Qwen Produces Virtual Human - Qwen2.5 Omni - Install Thinker-Talker Locally

Qwen 2.5-Omni-3B is a powerful multimodal model capable of processing text, images, audio, and video, and generating text and natural speech responses. While it is designed for high-performance environments, it is possible to run it on a consumer PC or laptop with certain considerations.

Key Requirements:

  • GPU Memory: The model requires a significant amount of GPU memory. For Qwen 2.5-Omni-3B, the minimum memory requirements are:
    • FP32 Precision: 89.10 GB (not recommended for most consumer PCs)
    • BF16 Precision: 18.38 GB (recommended for better performance and memory efficiency)
  • Hardware: A modern GPU with sufficient VRAM (e.g., NVIDIA RTX 3090 or higher) is recommended. FlashAttention 2 can be used to speed up generation, but it requires compatible hardware.
  • Software: Ensure you have the latest version of PyTorch, Hugging Face Transformers, and FlashAttention 2 installed. You can install these using pip:
    pip install -U flash-attn --no-build-isolation
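
The VRAM figures above scale with the bytes needed per parameter (4 for FP32, 2 for BF16). A rough back-of-envelope estimate for the weights alone is sketched below; note the model-card figures quoted above are much larger because they also cover activations and caches for long multimodal inputs, not just weights.

```python
# Rough VRAM estimate for model weights alone, by precision.
# The 89.10 GB / 18.38 GB figures above additionally include
# activations and cache for long multimodal inputs, so weights
# are only part of the total.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}

def weight_footprint_gb(num_params: float, precision: str) -> float:
    """Approximate GPU memory needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[precision] / 1024**3

params_3b = 3e9  # ~3 billion parameters
print(f"FP32 weights: {weight_footprint_gb(params_3b, 'fp32'):.1f} GB")  # ~11.2 GB
print(f"BF16 weights: {weight_footprint_gb(params_3b, 'bf16'):.1f} GB")  # ~5.6 GB
```

The gap between these weight-only numbers and the model-card figures shows why long audio/video inputs, not the weights, dominate the memory requirement.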

Installation Steps:

  1. Install the necessary libraries:
    pip uninstall transformers
    pip install git+https://github.com/huggingface/[email protected]
    pip install accelerate
  2. Install the Qwen Omni utilities for handling multimodal inputs:
pip install "qwen-omni-utils[decord]" -U
  3. Load the model using the following code snippet:
    from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
    # process_mm_info is used later to prepare image/audio/video inputs
    from qwen_omni_utils import process_mm_info
    
    # torch_dtype="auto" selects BF16 on GPUs that support it;
    # device_map="auto" places the model on available devices
    model = Qwen2_5OmniForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-Omni-3B", torch_dtype="auto", device_map="auto")
    processor = Qwen2_5OmniProcessor.from_pretrained("Qwen/Qwen2.5-Omni-3B")
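
Once the model and processor are loaded, inference follows the pattern on the Hugging Face model card: build a chat-style conversation, expand the multimodal entries with process_mm_info, and call generate. The sketch below follows that pattern but is untested here; argument names such as use_audio_in_video and return_audio are taken from the model card and may change between Transformers preview releases, so treat the exact signatures as assumptions.

```python
# Hedged sketch of the inference flow; requires the model and
# processor objects loaded above, plus qwen_omni_utils.
def run_inference(model, processor, conversation, use_audio_in_video=True):
    from qwen_omni_utils import process_mm_info

    # Render the conversation into the model's chat template.
    text = processor.apply_chat_template(
        conversation, add_generation_prompt=True, tokenize=False
    )
    # Separate the audio/image/video entries for the processor.
    audios, images, videos = process_mm_info(
        conversation, use_audio_in_video=use_audio_in_video
    )
    inputs = processor(
        text=text, audio=audios, images=images, videos=videos,
        return_tensors="pt", padding=True,
    ).to(model.device)

    # generate() returns token ids plus a speech waveform when
    # return_audio=True (per the model card).
    text_ids, audio = model.generate(
        **inputs, use_audio_in_video=use_audio_in_video, return_audio=True
    )
    return processor.batch_decode(text_ids, skip_special_tokens=True), audio

# Example conversation mixing text and a video (hypothetical URL):
conversation = [
    {"role": "system",
     "content": [{"type": "text",
                  "text": "You are Qwen, a virtual human capable of "
                          "perceiving auditory and visual inputs and "
                          "generating text and speech."}]},
    {"role": "user",
     "content": [{"type": "video", "video": "https://example.com/clip.mp4"},
                 {"type": "text", "text": "Describe this video."}]},
]
```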

Usage Tips:

  • Batch Inference: The model supports batch processing of mixed media inputs (text, images, audio, and video).
  • Audio Output: To enable audio output, ensure the system prompt is set correctly. You can also change the voice type of the output audio using the speaker parameter.
  • Memory Optimization: Use BF16 precision and FlashAttention 2 to reduce memory usage and speed up generation.
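
As a concrete illustration of the audio-output tips, the sketch below shows where the system prompt and the speaker argument fit. The voice names "Chelsie" and "Ethan" and the system-prompt wording are taken from the model card, but the exact call signature is an assumption and should be checked against the current release.

```python
# System prompt required for speech output, per the model card.
AUDIO_SYSTEM_PROMPT = (
    "You are Qwen, a virtual human developed by the Qwen Team, Alibaba "
    "Group, capable of perceiving auditory and visual inputs, as well as "
    "generating text and speech."
)

def generate_with_voice(model, inputs, speaker="Chelsie"):
    """Select the output voice via the `speaker` parameter (assumed
    signature; "Chelsie" and "Ethan" are the voices the model card
    lists). Passing return_audio=False instead skips speech synthesis
    entirely and reduces memory usage."""
    return model.generate(**inputs, speaker=speaker, return_audio=True)
```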

For more detailed instructions and examples, refer to the Hugging Face model page.

Sources

Qwen/Qwen2.5-Omni-3B - Hugging Face
Run Qwen 2.5 Local & Private on your laptop - YouTube
How to Install Qwen2.5-Omni 7B Locally - NodeShift