SmolDocling is a compact, multimodal document processing model designed for efficient conversion of document images into structured text. It supports a variety of elements including text, formulas, charts, and tables, making it suitable for academic papers, technical reports, and other document types. With only 256M parameters, it ensures fast inference speeds, processing each page in just 0.35 seconds on an A100 GPU. The model is fully compatible with Docling and supports multiple export formats.
What is SmolDocling?
SmolDocling is a lightweight, multimodal document processing model designed for efficient conversion of document images into structured text. It supports various elements including text, formulas, charts, and tables, making it ideal for academic papers, technical reports, and other document types.
Key Features
- Multimodal Document Conversion: Converts image documents into structured text, supporting both scientific and non-scientific documents.
- Fast Inference: Processes a page in just 0.35 seconds on an A100 GPU.
- OCR and Layout Recognition: Accurately extracts text while preserving document structure and element bounding boxes.
- Complex Element Recognition: Recognizes and processes code blocks, mathematical formulas, charts, and tables.
- Seamless Integration with Docling: Supports multiple export formats and is compatible with Docling.
Technical Details
- Lightweight Design: With only 256M parameters, SmolDocling is optimized for fast processing on consumer-grade GPUs.
- Visual Backbone Network: Uses SigLIP base patch-16/512 for efficient image processing.
- Text Encoder: Employs SmolLM-2 for text processing and multimodal fusion.
- Optimized Training: Trained on a diverse dataset with a higher pixel token rate for improved efficiency.
Getting Started
To use SmolDocling, install the necessary dependencies and follow the example code provided in the documentation. The model supports inference using Transformers, VLLM, or ONNX, and results can be exported in multiple formats using Docling.
Application Scenarios
- Document Conversion and Digitization: Efficiently converts image-based documents into structured text formats.
- Scientific and Non-Scientific Document Processing: Recognizes and extracts key information from various document types.
- Mobile and Low-Resource Device Support: Runs on mobile devices or resource-constrained environments.