donphi_mcp_server

by donphi
An MCP server that enables AI assistants to search and access private documents, codebases, and tech info by processing Markdown, text, and PDFs into a searchable database.

AI Assistant Document Integration Server

Overview

The AI Assistant Document Integration Server extends AI assistants by letting them access and search private documents, codebases, and current technical information. It processes Markdown, text, and PDF files into a searchable vector database, so AI models can retrieve information beyond their training data. Built on Docker, it supports both free local and paid OpenAI embedding models, keeping assistants current with your data.

Key Features

  • Document Processing: Converts Markdown, text, and PDF files into a searchable database.
  • Model Context Protocol (MCP): Implements the MCP standard to allow AI assistants to query external data sources.
  • Up-to-Date Knowledge: Overcomes LLM knowledge cutoffs by integrating the latest framework documentation, private codebases, and technical specifications.
  • Flexible Embedding Models: Supports both free local embeddings and paid OpenAI embeddings.
  • Docker Integration: Easy setup and deployment using Docker containers.

Architecture

The system consists of two main components:
1. Processing Pipeline: Reads, chunks, and generates embeddings for documents, storing them in a vector database.
2. MCP Server: Exposes processed content through MCP tools, enabling AI assistants to search and retrieve information.
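The two-stage flow can be sketched as follows. This is purely illustrative: the function names are hypothetical, and a toy character-frequency vector stands in for a real embedding model such as sentence-transformers/all-MiniLM-L6-v2.

```python
# Illustrative sketch of the processing pipeline: read -> chunk -> embed -> store.
from typing import List

def chunk_text(text: str, chunk_size: int = 800, overlap: int = 120) -> List[str]:
    """Split text into overlapping character windows (defaults match the .env example)."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(chunk: str) -> List[float]:
    """Toy embedding: a 26-dim letter-frequency vector, a placeholder for a real model."""
    vec = [0.0] * 26
    for ch in chunk.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def process(document: str) -> List[dict]:
    """Produce records shaped for insertion into a vector database."""
    return [{"text": c, "embedding": embed(c)} for c in chunk_text(document)]

records = process("lorem ipsum " * 200)
print(len(records), len(records[0]["embedding"]))  # -> 4 26
```

A real pipeline would replace `embed` with a model call and write the records to the vector store instead of returning them.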

Use Cases

  • Latest Framework Documentation: Keep AI assistants updated with the latest React, Angular, or Vue documentation.
  • Private Codebase Integration: Allow AI assistants to understand and debug proprietary code.
  • Technical Specifications: Provide AI assistants with up-to-date API and protocol documentation.

Prerequisites

  • Docker: Docker Desktop for Windows/Mac or Docker Engine for Linux.
  • OpenAI API Key (Optional): Required for paid embedding models.
  • MCP-Compatible AI Assistant: Such as Roo or other compatible assistants.

Setup

  1. Clone the repository:
    git clone https://github.com/donphi/mcp-server.git
    cd mcp-server
  2. Create a .env file from the example:
    cp .env.example .env
    nano .env
  3. Place your Markdown, text, and PDF files in the data/ directory.

Configuration

Configure the server using environment variables in the .env file:

OPENAI_API_KEY=your_openai_api_key_here
CHUNK_SIZE=800
CHUNK_OVERLAP=120
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
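As a sketch of how the pipeline might consume these settings, the snippet below reads them from the environment with the defaults shown above. The variable names come from the .env example; the loader function itself is an illustration, not the project's actual code.

```python
# Illustrative loader for the .env settings above; names match the example,
# the function itself is a hypothetical sketch.
import os

def load_config() -> dict:
    return {
        "openai_api_key": os.getenv("OPENAI_API_KEY", ""),  # optional: paid models only
        "chunk_size": int(os.getenv("CHUNK_SIZE", "800")),
        "chunk_overlap": int(os.getenv("CHUNK_OVERLAP", "120")),
        "embedding_model": os.getenv(
            "EMBEDDING_MODEL", "sentence-transformers/all-MiniLM-L6-v2"
        ),
    }

config = load_config()
print(config["chunk_size"], config["embedding_model"])
```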

Embedding Models

Free Models (No API Key Required)

  • sentence-transformers/all-MiniLM-L6-v2: Compact model for sentence and paragraph encoding.
  • BAAI/bge-m3: Supports multiple retrieval functionalities and over 100 languages.
  • Snowflake/snowflake-arctic-embed-m: Optimized for high-quality retrieval.

Paid Models (Require OpenAI API Key)

  • text-embedding-3-small: Cost-effective with good quality.
  • text-embedding-3-large: Highest quality embeddings.
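One practical consequence of this split: only the OpenAI model names require an API key. A minimal sketch of dispatching on the model name (this is not the project's actual logic, just an illustration of the free/paid distinction):

```python
# Hypothetical dispatch: OpenAI embedding models need an API key, local ones do not.
def uses_openai(model_name: str) -> bool:
    """True for paid OpenAI models like text-embedding-3-small/-large."""
    return model_name.startswith("text-embedding-")

print(uses_openai("text-embedding-3-small"))                  # paid -> True
print(uses_openai("sentence-transformers/all-MiniLM-L6-v2"))  # free -> False
```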

Usage

Processing Files

Run the pipeline to process files and generate embeddings:

docker-compose build pipeline
docker-compose run pipeline

Building the MCP Server

After processing documents, build the server:

docker-compose build server

Connecting to an AI Assistant

Generate the configuration file using the provided scripts:
- macOS/Linux:
    chmod +x setup-mcpServer-json.sh
    ./setup-mcpServer-json.sh
- Windows:
Double-click setup-mcpServer-json.bat.
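The generated file registers the server with your MCP-compatible assistant. As a rough illustration, an entry might look like the following; the server name, command, and arguments here are assumptions — use the file the script actually produces:

```json
{
  "mcpServers": {
    "donphi-docs": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp-server"]
    }
  }
}
```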

MCP Tools

The server exposes the following tools:
- read_md_files: Process and retrieve files.
- search_content: Search across processed content.
- get_context: Retrieve contextual information.
- project_structure: Provide project structure information.
- suggest_implementation: Generate implementation suggestions.
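Under the MCP standard, an assistant invokes these tools via JSON-RPC `tools/call` requests. A request to the search tool might look roughly like this; the argument name is illustrative, as the actual schema comes from the server's tool listing:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_content",
    "arguments": { "query": "how to configure chunk overlap" }
  }
}
```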

Supported File Types

  • Markdown (.md)
  • Text (.txt)
  • PDF (.pdf)
  • Word documents (.docx, .doc)

Troubleshooting

  • Docker not found: Ensure Docker is installed and running.
  • "Invalid reference format" error: Build the server image before running it.
  • API key issues: Use free local embedding models without API keys.
  • Chroma database not found: Run the pipeline to process documents first.

Advanced Configuration

Customize the pipeline and server for advanced use cases:
- Custom Embedding Functions: Modify embedding logic.
- Chunking Behavior: Adjust chunking parameters.
- Chunk Analysis: Compare standard and enhanced chunking methods.
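When tuning chunking, the key intuition is that CHUNK_OVERLAP makes adjacent chunks share text at their boundary, so context spanning a split is not lost. A toy demonstration (not the project's implementation) with small numbers:

```python
# Toy illustration of CHUNK_SIZE / CHUNK_OVERLAP: neighboring chunks share
# `overlap` characters at the boundary.
def chunk(text: str, size: int, overlap: int) -> list:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

pieces = chunk("abcdefghij", size=6, overlap=2)
# pieces -> ['abcdef', 'efghij', 'ij']
```

Note that `pieces[0]` ends with the same two characters `pieces[1]` starts with; larger overlaps improve boundary continuity at the cost of more chunks and more embedding work.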

License

This project is licensed under the MIT License.


Created with ❤️ by donphi
