futureunreal_mcp_pdf2md

by FutureUnreal
An MCP-based service for converting PDF files to structured Markdown format, supporting batch processing and integration with LLM clients.

PDF to Markdown Conversion with MCP Integration

Overview

The MCP-PDF2MD service converts PDF files into structured Markdown. It uses the MinerU API and supports batch processing of both local files and URL links, which makes it useful when the document structure has to survive the conversion.

Key Features

  • Format Conversion: Convert PDF files to structured Markdown format.
  • Multi-source Support: Process both local PDF files and URL links.
  • Batch Processing: Efficiently handle large volumes of PDF files.
  • MCP Integration: Seamless integration with LLM clients like Claude Desktop.
  • Structure Preservation: Maintain original document structure, including headings, paragraphs, lists, etc.
  • Formula Conversion: Automatically recognize and convert formulas to LaTeX format.
  • Table Extraction: Recognize and convert tables to structured format.
  • Cleanup Optimization: Remove headers, footers, footnotes, and page numbers for semantic coherence.

System Requirements

  • Software: Python 3.10+

Quick Start

  1. Clone the repository and enter the directory:
    git clone https://github.com/FutureUnreal/mcp-pdf2md.git
    cd mcp-pdf2md
  2. Create a virtual environment and install dependencies:
    • Linux/macOS:
      uv venv
      source .venv/bin/activate
      uv pip install -e .
    • Windows:
      uv venv
      .venv\Scripts\activate
      uv pip install -e .
  3. Configure environment variables:
    Create a .env file in the project root directory and set the following variables:
    MINERU_API_BASE=https://mineru.net/api/v4/extract/task
    MINERU_BATCH_API=https://mineru.net/api/v4/extract/task/batch
    MINERU_BATCH_RESULTS_API=https://mineru.net/api/v4/extract-results/batch
    MINERU_API_KEY=your_api_key_here
  4. Start the service:
    uv run pdf2md
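
Before starting the service, it can help to confirm that the .env file defines every variable listed above. The following is a minimal sketch and not part of the project: the helper names `parse_env_text` and `missing_vars` are hypothetical, and the parser handles only plain KEY=VALUE lines, which is an assumption about how the file is written.

```python
from pathlib import Path

# Variable names taken from the configuration step above.
REQUIRED_VARS = (
    "MINERU_API_BASE",
    "MINERU_BATCH_API",
    "MINERU_BATCH_RESULTS_API",
    "MINERU_API_KEY",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=" themselves.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values: dict[str, str]) -> list[str]:
    """Return required names that are absent, empty, or still the placeholder."""
    return [
        name
        for name in REQUIRED_VARS
        if not values.get(name) or values[name] == "your_api_key_here"
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    problems = missing_vars(parse_env_text(text))
    if problems:
        print("Missing or placeholder values: " + ", ".join(problems))
    else:
        print("All required variables are set.")
```

Run it from the project root. It only reads the file and reports; it does not contact the MinerU API or check that the key is valid.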

Claude Desktop Configuration

Add the following configuration in Claude Desktop:

  • Windows:
    {
      "mcpServers": {
        "pdf2md": {
          "command": "uv",
          "args": [
            "--directory",
            "C:\\path\\to\\mcp-pdf2md",
            "run",
            "pdf2md",
            "--output-dir",
            "C:\\path\\to\\output"
          ],
          "env": {
            "MINERU_API_KEY": "your_api_key_here"
          }
        }
      }
    }

  • Linux/macOS:
    {
      "mcpServers": {
        "pdf2md": {
          "command": "uv",
          "args": [
            "--directory",
            "/path/to/mcp-pdf2md",
            "run",
            "pdf2md",
            "--output-dir",
            "/path/to/output"
          ],
          "env": {
            "MINERU_API_KEY": "your_api_key_here"
          }
        }
      }
    }
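
Writing this JSON by hand is error-prone on Windows, where every backslash in a path must be doubled. As a rough sketch, the entry can be generated with Python's `json` module, which handles the escaping. The function names `build_pdf2md_entry` and `merge_into_config` are hypothetical helpers for illustration; the keys and arguments mirror the configuration shown above.

```python
import json


def build_pdf2md_entry(project_dir: str, output_dir: str, api_key: str) -> dict:
    """Build the mcpServers entry with the same shape as the configuration above."""
    return {
        "mcpServers": {
            "pdf2md": {
                "command": "uv",
                "args": [
                    "--directory", project_dir,
                    "run", "pdf2md",
                    "--output-dir", output_dir,
                ],
                "env": {"MINERU_API_KEY": api_key},
            }
        }
    }


def merge_into_config(existing: dict, entry: dict) -> dict:
    """Return a copy of an existing config with pdf2md added, keeping other servers."""
    merged = dict(existing)
    servers = dict(merged.get("mcpServers", {}))
    servers.update(entry["mcpServers"])
    merged["mcpServers"] = servers
    return merged


if __name__ == "__main__":
    # Raw strings hold single backslashes; json.dumps doubles them on output.
    entry = build_pdf2md_entry(
        r"C:\path\to\mcp-pdf2md", r"C:\path\to\output", "your_api_key_here"
    )
    print(json.dumps(entry, indent=2))
```

`merge_into_config` is there for the case where the Claude Desktop configuration already lists other MCP servers, so adding pdf2md does not overwrite them.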

MCP Tools

The server provides the following MCP tools:
  • convert_pdf_url: Convert a PDF URL to Markdown
  • convert_pdf_file: Convert a local PDF file to Markdown
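
An MCP client invokes these tools with a JSON-RPC `tools/call` request. The sketch below builds such a request and picks the tool based on whether the source looks like a URL. The tool names come from the list above, but the argument names `url` and `file_path` are assumptions, since this document does not list the tools' parameters; check the schema the server reports via `tools/list` before relying on them.

```python
import json
from itertools import count

# Simple increasing request ids for JSON-RPC messages.
_request_ids = count(1)


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def request_for_source(source: str) -> dict:
    """Choose the tool by whether the source is an http(s) URL or a local path."""
    if source.lower().startswith(("http://", "https://")):
        # "url" is an assumed argument name.
        return tool_call_request("convert_pdf_url", {"url": source})
    # "file_path" is an assumed argument name.
    return tool_call_request("convert_pdf_file", {"file_path": source})


if __name__ == "__main__":
    for src in ("https://example.com/paper.pdf", "/path/to/local.pdf"):
        print(json.dumps(request_for_source(src), indent=2))
```

In practice an LLM client such as Claude Desktop constructs these requests itself; the sketch only shows what travels over the wire.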

Getting MinerU API Key

  1. Visit the official MinerU website and register for an account.
  2. Apply for API testing qualification.
  3. Once approved, open the API Management page and generate your API key.

Demo

Input PDF

Output Markdown

License

MIT License - see the LICENSE file for details.

Credits

This project is based on the API from MinerU.
