Pixtral 12B is a multimodal AI model developed by Mistral AI, designed to handle both text and image data. With 12 billion parameters and a size of approximately 24GB, it excels in tasks such as image captioning, object counting, and answering questions based on image content. Built on the Nemo 12B text model, it incorporates a 400-million-parameter vision adapter, enabling high-resolution image processing up to 1024x1024 pixels. The model is open-sourced under the Apache 2.0 license, allowing users to download, fine-tune, and deploy it for various applications. Pixtral 12B is optimized for inference using the TensorRT-LLM engine and supports dynamic batching and quantization on NVIDIA GPUs.
Pixtral 12B - Mistral AI's First Multimodal AI Model
Introduction
Pixtral 12B is a groundbreaking multimodal AI model developed by Mistral AI, capable of processing both text and image data. With 12 billion parameters, it is designed to handle complex tasks such as image captioning, object counting, and visual question answering.
Key Features
- Multimodal Capabilities: Processes both text and image data seamlessly.
- High Parameter Count: 12 billion parameters for enhanced performance.
- Vision Encoder: Supports high-resolution images up to 1024x1024 pixels.
- Open Source: Available under the Apache 2.0 license for customization and deployment.
- Optimized Inference: Utilizes TensorRT-LLM for efficient performance on NVIDIA GPUs.
Technical Details
- Architecture: 40 layers, 14,336 hidden dimensions, 32 attention heads.
- Vision Adapter: 400 million parameters with GeLU activation.
- Inference Optimization: Supports dynamic batching, KV caching, and quantization.
Use Cases
- Image and Text Understanding: Ideal for tasks requiring simultaneous parsing of visual and language information.
- Content Creation: Assists in generating descriptive text for images and creating article illustrations.
- Customer Support: Helps in understanding and responding to image-related queries.
- Medical Image Analysis: Provides diagnostic support by analyzing medical images.
Getting Started
Pixtral 12B is available for download and fine-tuning on HuggingFace. For more details, visit the project website.