Pixtral12B

Pixtral12B

by Mistral AI
Pixtral 12B is a multimodal AI model developed by Mistral AI, designed to handle both text and image data. With 12 billion parameters and a size of approximately 24GB, it excels in tasks such as image captioning, object counting, and answering questions based on image content. Built on the Nemo 12B text model, it incorporates a 400-million-parameter vision adapter, enabling high-resolution image processing up to 1024x1024 pixels. The model is open-sourced under the Apache 2.0 license, allowing users to download, fine-tune, and deploy it for various applications. Pixtral 12B is optimized for inference using the TensorRT-LLM engine and supports dynamic batching and quantization on NVIDIA GPUs.

Pixtral 12B - Mistral AI's First Multimodal AI Model

Introduction

Pixtral 12B is a groundbreaking multimodal AI model developed by Mistral AI, capable of processing both text and image data. With 12 billion parameters, it is designed to handle complex tasks such as image captioning, object counting, and visual question answering.

Key Features

  • Multimodal Capabilities: Processes both text and image data seamlessly.
  • High Parameter Count: 12 billion parameters for enhanced performance.
  • Vision Encoder: Supports high-resolution images up to 1024x1024 pixels.
  • Open Source: Available under the Apache 2.0 license for customization and deployment.
  • Optimized Inference: Utilizes TensorRT-LLM for efficient performance on NVIDIA GPUs.

Technical Details

  • Architecture: 40 layers, 14,336 hidden dimensions, 32 attention heads.
  • Vision Adapter: 400 million parameters with GeLU activation.
  • Inference Optimization: Supports dynamic batching, KV caching, and quantization.

Use Cases

  • Image and Text Understanding: Ideal for tasks requiring simultaneous parsing of visual and language information.
  • Content Creation: Assists in generating descriptive text for images and creating article illustrations.
  • Customer Support: Helps in understanding and responding to image-related queries.
  • Medical Image Analysis: Provides diagnostic support by analyzing medical images.

Getting Started

Pixtral 12B is available for download and fine-tuning on HuggingFace. For more details, visit the project website.

Model Capabilities

Model Type
multimodal
Supported Tasks
Image Captioning Object Counting Visual Question Answering Image Classification Content Creation Medical Image Analysis
Tags
Multimodal AI Image Processing Text Processing Open Source High Performance Natural Language Processing Computer Vision AI Models Machine Learning Deep Learning

Usage & Integration

Pricing
free
API Access
Available
License
Open Source Apache 2.0
Requirements
  • NVIDIA GPU
  • TensorRT-LLM
  • Python 3.8+

Screenshots & Images

Primary Screenshot
Additional Images

Stats

0 Views
0 Likes

Community & Support

Similar Models

LongWriter by Tsinghua University and Zhipu AI
0
LongCite by Tsinghua University
0