# AI Assistant Document Integration Server

An MCP server that enables AI assistants to search and access private documents, codebases, and up-to-date technical information by processing Markdown, text, and PDF files into a searchable database.

## Overview

The AI Assistant Document Integration Server extends the capabilities of AI assistants by enabling them to access and search private documents, codebases, and up-to-date technical information. It processes Markdown, text, and PDF files into a searchable vector database, allowing AI models to retrieve information beyond their training data. Built with Docker, it supports both free local and paid OpenAI embedding models, keeping AI assistants current with your data.
## Key Features

- **Document Processing**: Converts Markdown, text, and PDF files into a searchable vector database.
- **Model Context Protocol (MCP)**: Implements the MCP standard so AI assistants can query external data sources.
- **Up-to-Date Knowledge**: Works around LLM knowledge cutoffs by integrating the latest framework documentation, private codebases, and technical specifications.
- **Flexible Embedding Models**: Supports both free local embeddings and paid OpenAI embeddings.
- **Docker Integration**: Simple setup and deployment using Docker containers.
## Architecture

The system consists of two main components (sketched below):

1. **Processing Pipeline**: Reads, chunks, and generates embeddings for documents, storing them in a vector database.
2. **MCP Server**: Exposes the processed content through MCP tools, enabling AI assistants to search and retrieve information.
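As a rough illustration of how the two components fit together, the sketch below chunks text, embeds it, and stores it in a Chroma collection that the server side can later query. It assumes `sentence-transformers` and `chromadb` as building blocks (consistent with the model names and the Chroma database mentioned elsewhere in this README); the project's actual code may be organized quite differently.

```python
# Illustrative sketch only: names, paths, and wiring are assumptions,
# not the project's actual implementation.
import chromadb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="chroma_db")  # hypothetical path
collection = client.get_or_create_collection("documents")

# 1. Processing pipeline: chunk, embed, store.
chunks = ["First chunk of a Markdown file...", "Second, overlapping chunk..."]
collection.add(
    ids=[f"chunk-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=model.encode(chunks).tolist(),
)

# 2. MCP server: answer a search via nearest-neighbor lookup over the embeddings.
query_vec = model.encode(["how do I configure the server?"]).tolist()
results = collection.query(query_embeddings=query_vec, n_results=3)
print(results["documents"])
```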
## Use Cases

- **Latest Framework Documentation**: Keep AI assistants updated with the latest React, Angular, or Vue documentation.
- **Private Codebase Integration**: Allow AI assistants to understand and debug proprietary code.
- **Technical Specifications**: Provide AI assistants with up-to-date API and protocol documentation.
## Prerequisites

- **Docker**: Docker Desktop for Windows/Mac or Docker Engine for Linux.
- **OpenAI API Key (Optional)**: Required only for the paid embedding models.
- **MCP-Compatible AI Assistant**: Such as Roo or other compatible assistants.
## Setup

1. Clone the repository:

   ```shell
   git clone https://github.com/donphi/mcp-server.git
   cd mcp-server
   ```

2. Create a `.env` file from the example:

   ```shell
   cp .env.example .env
   nano .env
   ```

3. Place your Markdown and text files in the `data/` directory.
## Configuration

Configure the server using environment variables in the `.env` file:

```shell
OPENAI_API_KEY=your_openai_api_key_here
CHUNK_SIZE=800
CHUNK_OVERLAP=120
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```
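With the values shown, each new chunk starts 680 characters after the previous one, so consecutive chunks share 120 characters and a sentence that straddles a boundary survives intact in at least one chunk. A minimal sketch of that sliding window, assuming character-based splitting (the pipeline's actual unit and splitter may differ):

```python
def chunk_text(text: str, size: int = 800, overlap: int = 120) -> list[str]:
    """Split text into overlapping windows (character-based, for illustration)."""
    step = size - overlap  # 680: how far each chunk's start advances
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

# A 1,000-character document yields two chunks: [0:800] and [680:1000].
print([len(c) for c in chunk_text("x" * 1000)])  # [800, 320]
```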
## Embedding Models

### Free Models (No API Key Required)

- `sentence-transformers/all-MiniLM-L6-v2`: Compact model for sentence and paragraph encoding.
- `BAAI/bge-m3`: Supports multiple retrieval modes and more than 100 languages.
- `Snowflake/snowflake-arctic-embed-m`: Optimized for high-quality retrieval.

### Paid Models (Require an OpenAI API Key)

- `text-embedding-3-small`: Cost-effective with good quality.
- `text-embedding-3-large`: Highest-quality embeddings.
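Both kinds of model turn a piece of text into a fixed-length vector; the difference is where the computation runs and what it costs. A hedged sketch of the two paths (the pipeline's actual wiring may differ, and the OpenAI call requires `OPENAI_API_KEY` to be set):

```python
from sentence_transformers import SentenceTransformer
from openai import OpenAI

text = "How does the processing pipeline chunk documents?"

# Free path: the model downloads once, then runs entirely on your machine.
local_vec = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").encode(text)

# Paid path: one API call; reads OPENAI_API_KEY from the environment.
remote = OpenAI().embeddings.create(model="text-embedding-3-small", input=text)
remote_vec = remote.data[0].embedding

print(len(local_vec), len(remote_vec))  # 384 vs. 1536 dimensions
```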
## Usage

### Processing Files

Run the pipeline to process files and generate embeddings:

```shell
docker-compose build pipeline
docker-compose run pipeline
```

### Building the MCP Server

After processing documents, build the server image:

```shell
docker-compose build server
```
### Connecting to an AI Assistant

Generate the client configuration file using the provided scripts:

- macOS/Linux:

  ```shell
  chmod +x setup-mcpServer-json.sh
  ./setup-mcpServer-json.sh
  ```

- Windows: Double-click `setup-mcpServer-json.bat`.
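The scripts emit an MCP client configuration file; prefer their output, but for orientation, an entry registering a Docker-based server typically looks something like the following (the command and arguments here are illustrative assumptions, not the exact values the scripts write):

```json
{
  "mcpServers": {
    "mcp-server": {
      "command": "docker",
      "args": ["run", "-i", "--rm", "mcp-server"]
    }
  }
}
```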
## MCP Tools

The server exposes the following tools:

- `read_md_files`: Process and retrieve files.
- `search_content`: Search across processed content.
- `get_context`: Retrieve contextual information.
- `project_structure`: Provide project structure information.
- `suggest_implementation`: Generate implementation suggestions.
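MCP clients invoke these tools over JSON-RPC with a `tools/call` request. As an illustration, a `search_content` call might look like the following (the `query` argument name is an assumption; the server's actual input schema is reported by its `tools/list` response):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "search_content",
    "arguments": { "query": "authentication flow" }
  }
}
```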
## Supported File Types
- Markdown (.md)
- Text (.txt)
- PDF (.pdf)
- Word documents (.docx, .doc)
## Troubleshooting

- **Docker not found**: Ensure Docker is installed and running.
- **"Invalid reference format" error**: Build the server image before running it.
- **API key issues**: Switch to a free local embedding model, which requires no API key.
- **Chroma database not found**: Run the pipeline to process documents first.
## Advanced Configuration

Customize the pipeline and server for advanced use cases:

- **Custom Embedding Functions**: Modify the embedding logic (see the sketch below).
- **Chunking Behavior**: Adjust the chunking parameters.
- **Chunk Analysis**: Compare standard and enhanced chunking methods.
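For example, swapping in a different embedding model amounts to providing a custom embedding function. A hypothetical sketch using chromadb's `EmbeddingFunction` interface (assuming the pipeline stores vectors in Chroma, as the Troubleshooting section suggests; adapt to the actual code):

```python
from chromadb import Documents, EmbeddingFunction, Embeddings
from sentence_transformers import SentenceTransformer

class LocalEmbeddingFunction(EmbeddingFunction):
    """Hypothetical custom embedding function; the model choice is illustrative."""

    def __init__(self, model_name: str = "BAAI/bge-m3"):
        self._model = SentenceTransformer(model_name)

    def __call__(self, input: Documents) -> Embeddings:
        # Normalized vectors pair well with cosine distance.
        return self._model.encode(list(input), normalize_embeddings=True).tolist()
```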
## License

This project is licensed under the MIT License.
Created with ❤️ by donphi