ViDoRAG

by Alibaba Tongyi Lab, USTC, SJTU

ViDoRAG is a framework for enhancing retrieval and reasoning in complex visual documents through multi-agent collaboration and dynamic iterative reasoning.

What is ViDoRAG?

ViDoRAG is a visual document retrieval-augmented generation framework developed by Alibaba Tongyi Lab in collaboration with the University of Science and Technology of China (USTC) and Shanghai Jiao Tong University (SJTU). It addresses the limitations of traditional methods in handling complex visual documents through multi-agent collaboration and dynamic iterative reasoning.

Key Features of ViDoRAG

Multimodal Retrieval: Combines visual and textual signals to retrieve the document pages most relevant to a query.
Dynamic Iterative Reasoning: Multiple agents collaborate over several rounds, refining the answer step by step to improve reasoning depth and accuracy.
Complex Document Understanding: Supports both single-hop and multi-hop reasoning over complex visual document content.
Generation Consistency Assurance: An Answer Agent produces the final answer and checks it for accuracy and consistency.
Efficient Generation: Dynamically adjusts the number of retrieved results, reducing computational overhead.
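
The iterative, multi-agent refinement described above can be pictured as a small control loop. The following is a minimal sketch only, not the ViDoRAG implementation: the Seeker and Inspector role names come from the framework's description in the Technical Principles section, while `iterative_answer`, `Review`, the callable signatures, and the toy agents are hypothetical and stand in for model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Review:
    """Inspector output: a draft answer if evidence suffices, else feedback."""
    draft: Optional[str]
    feedback: str = ""


def iterative_answer(
    query: str,
    pages: List[str],
    seeker: Callable[[str, List[str], str], List[str]],
    inspector: Callable[[str, List[str]], Review],
    answerer: Callable[[str, List[str], Optional[str]], str],
    max_rounds: int = 3,
) -> str:
    """Hypothetical loop: screen pages, review them, then finalize."""
    candidates = list(pages)
    selected: List[str] = []
    feedback = ""
    for _ in range(max_rounds):
        # Seeker: quick screening of the remaining candidates, guided by feedback.
        for page in seeker(query, candidates, feedback):
            if page not in selected:
                selected.append(page)
        candidates = [p for p in candidates if p not in selected]
        # Inspector: detailed review; either drafts an answer or asks for more.
        review = inspector(query, selected)
        if review.draft is not None:
            return answerer(query, selected, review.draft)
        feedback = review.feedback
        if not candidates:
            break
    return answerer(query, selected, None)


# Toy stand-ins for the three agents (assumptions for illustration only).
def toy_seeker(query, candidates, feedback):
    hits = [p for p in candidates if "revenue" in p]
    return hits[:1] or candidates[:1]


def toy_inspector(query, selected):
    if sum("revenue" in p for p in selected) >= 2:
        return Review(draft="growth = 5")
    return Review(draft=None, feedback="need another revenue page")


def toy_answerer(query, selected, draft):
    return draft if draft is not None else "insufficient evidence"


pages = ["p1: revenue 2022 = 10", "p2: logo", "p3: revenue 2023 = 15"]
result = iterative_answer("How much did revenue grow?", pages,
                          toy_seeker, toy_inspector, toy_answerer)
print(result)
```

In this toy run the first round selects only one revenue page, the Inspector asks for more evidence, and the second round supplies the missing page before the answer is finalized. Real agents would be vision-language model calls rather than string checks.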

Technical Principles of ViDoRAG

Multimodal Hybrid Retrieval: Merges text-based and visual retrieval results, using a Gaussian Mixture Model (GMM) to decide how many results to keep for each query.
Dynamic Iterative Reasoning Framework: Uses three agents: a Seeker for rapid screening, an Inspector for detailed review, and an Answer Agent for final answer generation.
Coarse-to-Fine Generation Strategy: Starts from a global view of the retrieved content and progressively narrows to local details, improving generation efficiency and accuracy.
Reasoning Ability Activation: The iterative agent workflow draws out the model's reasoning ability, improving performance on multi-hop reasoning and complex document understanding tasks.
Dynamic Retrieval Length Adjustment: Replaces a fixed number of results with a GMM-determined count, improving retrieval efficiency and generation quality.
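
To make the GMM-based result count concrete, here is a minimal sketch under stated assumptions: each candidate page has one query similarity score, a two-component 1-D Gaussian mixture is fitted to those scores with EM, and the pages assigned to the higher-mean component are kept. The source only says the count is adjusted "based on GMM", so the two-component choice, the function names (`fit_two_gaussians`, `dynamic_top_k`), and the `min_k`/`max_k` bounds are all hypothetical, not the ViDoRAG code.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(scores, iters=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    s = sorted(scores)
    half = len(s) // 2
    # Initialize the two means from the lower and upper halves of the scores.
    means = [sum(s[:half]) / half, sum(s[half:]) / (len(s) - half)]
    overall = sum(s) / len(s)
    var0 = max(min_var, sum((x - overall) ** 2 for x in s) / len(s))
    variances, weights = [var0, var0], [0.5, 0.5]
    for _ in range(iters):
        # E-step: responsibility of each component for each score.
        resp = []
        for x in scores:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(scores)
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            variances[k] = max(
                min_var,
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk,
            )
    return weights, means, variances


def dynamic_top_k(scores, min_k=1, max_k=10):
    """Count the scores that the higher-mean component explains best."""
    if len(scores) < 2:
        return len(scores)
    weights, means, variances = fit_two_gaussians(scores)
    hi = 0 if means[0] > means[1] else 1
    lo = 1 - hi
    k = 0
    for x in scores:
        p_hi = weights[hi] * normal_pdf(x, means[hi], variances[hi])
        p_lo = weights[lo] * normal_pdf(x, means[lo], variances[lo])
        if p_hi > p_lo:
            k += 1
    # Clamp to the configured bounds and to the number of candidates.
    return min(len(scores), max(min_k, min(max_k, k)))


scores = [0.91, 0.88, 0.86, 0.31, 0.29, 0.27, 0.25, 0.22, 0.20, 0.18]
k = dynamic_top_k(scores)
print(k)
```

With these example scores, three pages stand clearly apart from the rest, so a cut-off learned from the score distribution keeps far fewer pages than a fixed top-10 would. In practice a library implementation of Gaussian mixtures would likely replace the hand-written EM loop.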

Application Scenarios of ViDoRAG

Education: Helps students and teachers quickly retrieve charts, data, and text content from textbooks.
Finance: Extracts key data and charts from financial reports and market research documents.
Healthcare: Quickly locates charts and data in medical literature.
Legal: Retrieves relevant clauses and case charts from legal documents.
Enterprise Knowledge Management: Extracts key information from internal documents, quickly answering employee queries.

Project Links for ViDoRAG

GitHub Repository: https://github.com/Alibaba-NLP/ViDoRAG
arXiv Technical Paper: https://arxiv.org/pdf/2502.18017

Framework Features

Supported Tasks

Visual Document Retrieval, Complex Document Understanding, Dynamic Iterative Reasoning, Multimodal Retrieval

Pricing

Free

Stats

442 GitHub Stars

Similar Frameworks

TPO

Phantom by ByteDance

AgentSociety by Tsinghua University
