SmolDocling

by ds4sd

SmolDocling is an efficient, lightweight multimodal document processing model that converts document images into structured text.

What is SmolDocling?

SmolDocling is a lightweight, multimodal document processing model designed for efficient conversion of document images into structured text. It supports various elements including text, formulas, charts, and tables, making it ideal for academic papers, technical reports, and other document types.

Key Features

Multimodal Document Conversion: Converts image documents into structured text, supporting both scientific and non-scientific documents.
Fast Inference: Processes a page in just 0.35 seconds on an A100 GPU.
OCR and Layout Recognition: Accurately extracts text while preserving document structure and element bounding boxes.
Complex Element Recognition: Recognizes and processes code blocks, mathematical formulas, charts, and tables.
Seamless Integration with Docling: Supports multiple export formats and is compatible with Docling.

Technical Details

Lightweight Design: With only 256M parameters, SmolDocling is optimized for fast processing on consumer-grade GPUs.
Visual Backbone Network: Uses SigLIP base patch-16/512 for efficient image processing.
Text Encoder: Employs SmolLM-2 for text processing and multimodal fusion.
Optimized Training: Trained on a diverse dataset with a higher pixel token rate for improved efficiency.

Getting Started

To use SmolDocling, install the necessary dependencies and follow the example code provided in the documentation. The model supports inference using Transformers, VLLM, or ONNX, and results can be exported in multiple formats using Docling.

Application Scenarios

Document Conversion and Digitization: Efficiently converts image-based documents into structured text formats.
Scientific and Non-Scientific Document Processing: Recognizes and extracts key information from various document types.
Mobile and Low-Resource Device Support: Runs on mobile devices or resource-constrained environments.

Model Capabilities

Model Type

multimodal

Supported Tasks

Ocr Text Extraction Formula Recognition Chart Recognition Table Recognition Document Conversion

Usage & Integration

Pricing

free

API Access

Available

License

Open Source

Requirements

Python 3.8+
GPU

Screenshots & Images

Primary Screenshot

Additional Images

Try Now Documentation

Stats

165 Views

0 Favorites

Similar Models

Ola by Tsinghua University, Tencent Hunyuan Research Team, NUS S-Lab

627

Zonos by Zyphra

516

Step-Video-T2V by Leapfrogging Star

639

SmolDocling

What is SmolDocling?

Key Features

Technical Details

Getting Started

Application Scenarios

Model Capabilities

Usage & Integration

Screenshots & Images

Stats

Similar Models

Recently Viewed

What’s in Startup Plan?

What’s in Startup Plan?

What’s in Startup Plan?

What’s in Startup Plan?

Details

Frameworks

Database

Billing

Completed

Project Type

Project Settings

Drop files here or click to upload.

Budget

Build a Team

Set First Target

Upload Files

Drop files here or click to upload.

Project Created!

No result found

Advanced Search

Search Preferences

SmolDocling

What is SmolDocling?

Key Features

Technical Details

Getting Started

Application Scenarios

Model Capabilities

Usage & Integration

Screenshots & Images

Stats

Similar Models

Recently Viewed

Drop files here or click to upload.

Drop files here or click to upload.