Pixtral12B

Pixtral12B

by Mistral AI
Pixtral 12B is Mistral AI's first multimodal AI model, capable of processing both images and text, with 12 billion parameters.

Pixtral 12B - Mistral AI's First Multimodal AI Model

Introduction

Pixtral 12B is a groundbreaking multimodal AI model developed by Mistral AI, capable of processing both text and image data. With 12 billion parameters, it is designed to handle complex tasks such as image captioning, object counting, and visual question answering.

Key Features

  • Multimodal Capabilities: Processes both text and image data seamlessly.
  • High Parameter Count: 12 billion parameters for enhanced performance.
  • Vision Encoder: Supports high-resolution images up to 1024x1024 pixels.
  • Open Source: Available under the Apache 2.0 license for customization and deployment.
  • Optimized Inference: Utilizes TensorRT-LLM for efficient performance on NVIDIA GPUs.

Technical Details

  • Architecture: 40 layers, 14,336 hidden dimensions, 32 attention heads.
  • Vision Adapter: 400 million parameters with GeLU activation.
  • Inference Optimization: Supports dynamic batching, KV caching, and quantization.

Use Cases

  • Image and Text Understanding: Ideal for tasks requiring simultaneous parsing of visual and language information.
  • Content Creation: Assists in generating descriptive text for images and creating article illustrations.
  • Customer Support: Helps in understanding and responding to image-related queries.
  • Medical Image Analysis: Provides diagnostic support by analyzing medical images.

Getting Started

Pixtral 12B is available for download and fine-tuning on HuggingFace. For more details, visit the project website.

Model Capabilities

Model Type
multimodal
Supported Tasks
Image Captioning Object Counting Visual Question Answering Image Classification Content Creation Medical Image Analysis
Tags
Multimodal AI Image Processing Text Processing Open Source High Performance Natural Language Processing Computer Vision AI Models Machine Learning Deep Learning

Usage & Integration

Pricing
free
API Access
Available
License
Open Source Apache 2.0
Requirements
  • NVIDIA GPU
  • TensorRT-LLM
  • Python 3.8+

Screenshots & Images

Primary Screenshot
Additional Images

Stats

68 Views
0 Favorites

Community & Support

Similar Models

Ola by Tsinghua University, Tencent Hunyuan Research Team, NUS S-Lab
296
Zonos by Zyphra
275
Step-Video-T2V by Leapfrogging Star
294