jonverrier / McpDoc Public
AI-Powered Documentation Generator for Legacy Codebases
Latest commit: Jon Verrier, "Moved to Selenium to parse for valid mermaid, works better. regenerat…", Mar 24, 2025 (8738c8f). History: 6 commits.

| Name | Last commit message | Last commit date |
| --- | --- | --- |
| src | Moved to Selenium to parse for valid mermaid, works better. regenerat… | Mar 24, 2025 |
| test | Moved to Selenium to parse for valid mermaid, works better. regenerat… | Mar 24, 2025 |
| .gitignore | Clean initial commit | Mar 21, 2025 |
| .mocharc.json | Clean initial commit | Mar 21, 2025 |
| C4Container.McpDoc.md | Clean initial commit | Mar 21, 2025 |
| C4Context.McpDoc.md | Clean initial commit | Mar 21, 2025 |
| README.md | Moved to Selenium to parse for valid mermaid, works better. regenerat… | Mar 24, 2025 |
| Show HN - Using MCP to generate high level documentation from legacy code bases | Clean initial commit | Mar 21, 2025 |
| package-lock.json | Moved to Selenium to parse for valid mermaid, works better. regenerat… | Mar 24, 2025 |
| package.json | Moved to Selenium to parse for valid mermaid, works better. regenerat… | Mar 24, 2025 |
| tsconfig.json | Clean initial commit | Mar 21, 2025 |
McpDoc is a Model Context Protocol (MCP) server implementation designed to generate documentation for existing systems. It provides a set of MCP prompts and tools for generating code summaries and C4 architecture diagrams using Mermaid.js.
Learn more about MCP. Learn more about C4.
The prompts direct the model to walk the directory tree of a system, creating summary documentation as it goes, and then rolling this up to the top level.
For each directory containing 'source code' (you can decide what this is by tailoring the prompt), generate a README.McpDoc.md file. The concept is that any repo in need of automatic documentation generation is likely too large to fit in the context window, so you need to 'pre-store' summaries with a denser level of information than the source. The prompts direct the model to check file timestamps so we only re-generate summaries when we need to.
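The timestamp check described above can be sketched as a small pure function. This is a minimal illustration, not McpDoc's actual implementation: the name `shouldRegenerateReadme` and its signature are assumptions, and the caller is assumed to supply modification times in milliseconds (for example from `fs.statSync(path).mtimeMs`).

```typescript
// Hypothetical helper (not taken from the McpDoc source): decide whether a
// directory summary is stale. Timestamps are milliseconds since the epoch.
function shouldRegenerateReadme(
  readmeMtimeMs: number | undefined,
  sourceMtimesMs: readonly number[]
): boolean {
  // No source files in the directory: nothing to summarise.
  if (sourceMtimesMs.length === 0) return false;
  // No summary has been written yet: always generate one.
  if (readmeMtimeMs === undefined) return true;
  // Regenerate only when at least one source file is newer than the summary.
  return sourceMtimesMs.some((mtime) => mtime > readmeMtimeMs);
}
```

Keeping the decision separate from file system access makes it easy to test, and the model only pays the cost of re-reading a directory when this returns `true`.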
Alongside each README.McpDoc.md, we generate a C4Component diagram to show the structure of the source modules in the directory.
Finally, we roll up all the README.McpDoc.md files into a C4Context and a C4Container diagram in the root directory to serve as an overview. In principle, you can then navigate all the way from overview diagrams in the root directory through intermediate diagrams in each sub-directory containing source code.
We use the C4 model, as it aims to align with modern Agile practices by providing "just enough" documentation. The C4 approach emphasizes lightweight, living documentation that evolves alongside the codebase, avoiding the common problem of documentation becoming outdated or irrelevant over time ("documentation rot"). By focusing on essential architectural views at different levels of detail, C4 helps teams maintain useful documentation without creating burdensome maintenance overhead that often plagues more traditional documentation approaches.
The idea targets a widely acknowledged problem: legacy codebases are complex, onboarding new developers is time-consuming, developers have a hard time working out where to make changes, and people outside the team have no clue what's going on. If you can auto-generate documentation that runs from the top to the bottom of your system, you have a much better chance of onboarding people quickly and helping everyone navigate around the system.
Going forward, you run the tools from within your IDE, check and tune the output, and then bingo you have done your job of providing a fighting chance for those who come after you.
The C4 model is a hierarchical approach to software architecture documentation, consisting of four levels of diagrams:
Context Diagrams - The highest level view showing how your software system interacts with users and other systems. This diagram helps stakeholders and non-technical audiences understand the big picture.
Container Diagrams - Zooms in to show the high-level technical building blocks of your software system. Containers represent applications, data stores, microservices etc. that work together to deliver functionality.
Component Diagrams - A detailed view inside individual containers showing the key logical components and their interactions. This helps developers understand how the container is structured internally.
Deployment Diagrams - Shows how your software system is deployed across infrastructure. This includes details about technologies, hardware, and deployment environments.
Each level progressively adds more detail while following consistent notation. The C4 approach helps maintain clarity by showing the right level of detail for different audiences - from high-level stakeholders to developers working on specific components.
Here is an example from running the prompts over the MCP TypeScript SDK:
C4Context
title Model Context Protocol (MCP) System Architecture
Person(developer, "Developer", "Uses MCP tools and services")
Person(end_user, "End User", "Interacts with applications built using MCP")
System_Boundary(mcp, "Model Context Protocol (MCP)") {
System(sdk, "MCP SDK", "Development kit for building applications using MCP")
System(everything, "Everything Server", "Comprehensive server with various MCP features")
System_Boundary(data_storage, "Data Storage") {
System(filesystem, "Filesystem Server", "File operations with security measures")
System(memory, "Memory Server", "Knowledge Graph-based memory system")
System(postgres, "PostgreSQL Server", "Database access and querying")
System(redis, "Redis Server", "Key-value store operations")
System(sqlite, "SQLite Server", "Database operations and business insights")
}
System_Boundary(external_integrations, "External Service Integrations") {
System(aws_kb, "AWS KB Retrieval", "AWS Bedrock Knowledge Base retrieval")
System(brave_search, "Brave Search", "Web and local search capabilities")
System(everart, "EverArt", "AI image generation")
System(gdrive, "Google Drive", "Google Drive file access")
System(github, "GitHub", "Repository and issue management")
System(gitlab, "GitLab", "Project and merge request management")
System(google_maps, "Google Maps", "Location-based services")
System(puppeteer, "Puppeteer", "Browser automation")
System(sentry, "Sentry", "Error tracking and analysis")
System(slack, "Slack", "Workspace communication")
System(time, "Time", "Timezone operations and conversions")
}
System(sequential_thinking, "Sequential Thinking", "Structured problem-solving framework")
}
System_Ext(aws_system, "AWS Bedrock", "AI models and knowledge base")
System_Ext(brave_system, "Brave Search API", "Web search engine")
System_Ext(everart_system, "EverArt API", "Image generation service")
System_Ext(gdrive_system, "Google Drive API", "Cloud storage service")
System_Ext(github_system, "GitHub API", "Code repository hosting")
System_Ext(gitlab_system, "GitLab API", "Code repository hosting")
System_Ext(gmaps_system, "Google Maps API", "Mapping and location services")
System_Ext(sentry_system, "Sentry API", "Error tracking platform")
System_Ext(slack_system, "Slack API", "Team communication platform")
Rel(developer, sdk, "Builds apps with")
Rel(end_user, developer, "Interacts with apps created by")
Rel(aws_kb, aws_system, "Retrieves data from")
Rel(brave_search, brave_system, "Searches using")
Rel(everart, everart_system, "Generates images using")
Rel(gdrive, gdrive_system, "Accesses files on")
Rel(github, github_system, "Manages repositories on")
Rel(gitlab, gitlab_system, "Manages projects on")
Rel(google_maps, gmaps_system, "Gets location data from")
Rel(sentry, sentry_system, "Tracks errors using")
Rel(slack, slack_system, "Communicates through")
Rel(everything, sequential_thinking, "Uses for structured problem solving")
Rel(everything, filesystem, "Uses for file operations")
Rel(everything, memory, "Uses for knowledge storage")
Rel(everything, postgres, "Uses for SQL database operations")
Rel(everything, redis, "Uses for key-value storage")
Rel(everything, sqlite, "Uses for embedded database operations")
Rel(everything, aws_kb, "Uses for knowledge retrieval")
Rel(everything, brave_search, "Uses for web search")
Rel(everything, everart, "Uses for image generation")
Rel(everything, gdrive, "Uses for cloud storage")
Rel(everything, github, "Uses for code management")
Rel(everything, gitlab, "Uses for code management")
Rel(everything, google_maps, "Uses for location services")
Rel(everything, puppeteer, "Uses for web automation")
Rel(everything, sentry, "Uses for error tracking")
Rel(everything, slack, "Uses for team communication")
Rel(everything, time, "Uses for time operations")
It's not bad. It has picked up all the major components and linked them correctly. IMHO you would take this kind of thing if you had a million lines of VB and didn't know where to start with it.
Here is another, this time one McpDoc drew of itself:
C4Container
title Container diagram for McpDoc Documentation Generator
Person(developer, "Software Developer", "Uses McpDoc to generate documentation")
Person(maintainer, "Project Maintainer", "Maintains and extends McpDoc")
System_Boundary(mcpdoc, "McpDoc Documentation Generator") {
Container(mcp_server, "MCP Server", "TypeScript, Node.js", "Model Context Protocol server handling client requests")
Container(doc_generator, "Documentation Generator", "TypeScript", "Generates README and C4 documentation from source code")
Container(mermaid_engine, "Mermaid Engine", "TypeScript", "Processes and validates Mermaid diagrams")
Container(file_handler, "File Handler", "TypeScript, Node.js", "Manages file system operations and timestamps")
Container(prompt_manager, "Prompt Manager", "TypeScript", "Manages and expands documentation generation prompts")
Container(type_system, "Type System", "TypeScript", "Core types and interfaces for the application")
}
System_Ext(mermaidjs, "Mermaid.js", "External diagram rendering library")
System_Ext(vscode, "Visual Studio Code", "IDE integration")
System_Ext(browser, "Web Browser", "Diagram preview")
System_Ext(filesystem, "File System", "Source code and documentation storage")
Rel(developer, mcp_server, "Sends requests", "HTTP/WebSocket")
Rel(maintainer, mcp_server, "Maintains", "Development and testing")
Rel(mcp_server, doc_generator, "Delegates", "Documentation tasks")
Rel(mcp_server, prompt_manager, "Uses", "Get prompts")
Rel(doc_generator, mermaid_engine, "Uses", "Generate diagrams")
Rel(doc_generator, file_handler, "Uses", "Read/Write files")
Rel(mermaid_engine, mermaidjs, "Uses", "Render diagrams")
Rel(file_handler, filesystem, "Reads/Writes", "Files")
Rel(mermaid_engine, browser, "Previews via", "HTML")
Rel(mcp_server, vscode, "Integrates with", "Extension API")
Rel_Back(type_system, mcp_server, "Provides types", "TypeScript interfaces")
Rel_Back(type_system, doc_generator, "Provides types", "TypeScript interfaces")
Rel_Back(type_system, mermaid_engine, "Provides types", "TypeScript interfaces")
Rel_Back(type_system, file_handler, "Provides types", "TypeScript interfaces")
Rel_Back(type_system, prompt_manager, "Provides types", "TypeScript interfaces")
Documentation Tools
Mermaid Support
Various tools to help validate & improve quality of generated Mermaid.js diagrams
The MCP Documenter follows a modular architecture designed for extensibility and maintainability. The system is composed of several key components as shown in the C4 Container diagram above:
Server Core: The main entry point and request router that implements the Model Context Protocol. It initializes the server, registers capabilities, and manages communication via the MCP SDK.
Prompt Manager: Handles the registration and processing of documentation generation prompts. This includes prompts for README files, component-level C4 diagrams, and system-level C4 diagrams.
Function Handler: Manages tool registration and processes function calls for Mermaid diagram operations. It coordinates between different tools and validates inputs/outputs.
Mermaid Parser: Validates and parses Mermaid.js diagram syntax. This ensures generated diagrams are syntactically correct before being saved or previewed.
Preview Generator: Creates HTML-based previews of Mermaid diagrams that can be viewed in a web browser.
Diagram Validator: Specifically validates C4 diagram types and formats, ensuring they follow C4 model conventions.
Developers interact with the Server Core through the Model Context Protocol.
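To make the Preview Generator idea concrete, here is a minimal sketch of wrapping a diagram in an HTML page that Mermaid renders on load. The function names, the page layout, and the choice of loading Mermaid from the jsDelivr CDN are assumptions for illustration; McpDoc's own preview code may differ.

```typescript
// Hypothetical preview builder (names and CDN URL are assumptions).
// Escape characters that would otherwise be interpreted as HTML markup.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function buildPreviewHtml(diagram: string, title = "Mermaid preview"): string {
  return [
    "<!DOCTYPE html>",
    "<html>",
    `<head><meta charset="utf-8"><title>${escapeHtml(title)}</title></head>`,
    "<body>",
    // Mermaid reads the diagram source from the element's text content,
    // so the browser decodes the escaped entities before Mermaid sees them.
    `<pre class="mermaid">${escapeHtml(diagram)}</pre>`,
    '<script type="module">',
    '  import mermaid from "https://cdn.jsdelivr.net/npm/mermaid@11/dist/mermaid.esm.min.mjs";',
    "  mermaid.initialize({ startOnLoad: true });",
    "</script>",
    "</body>",
    "</html>",
  ].join("\n");
}
```

Writing the returned string to a `.html` file and opening it in a browser should show the rendered diagram, or Mermaid's error graphic if the syntax is invalid.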
Please note that these prompts have been quite extensively tested with Claude Sonnet 3.5 (from Cursor), and Claude Opus 3.7 (from Claude Desktop). In general, both models can produce pretty good diagrams. In testing prior versions of this software with OpenAI GPT-4o and Gemini 1.5, the error rate was much higher, hence the more detailed prompts. Your mileage may vary.
Personally I would only do this with the Claude family of models, which do seem to be the state of the art for code generation, and Mermaid.js markdown is a niche sub-flavour of code generation.
To generate documentation for each directory containing source code:
Use the filesystem tool to list all subdirectories of {RootDirectory}. Ignore any 'node_modules' subdirectories. Then recursively list the contents of each other subdirectory (apart from any 'node_modules' subdirectories) for typescript files. If the subdirectory contains one or more typescript files, call the mcp_documenter tool 'should_regenerate_readme' to see if the README file should be regenerated. If the README file should be regenerated, then read every typescript file in the subdirectory, and create a 50 word summary of the file in markdown format intended to brief new developers on its content. Accumulate all the summaries and write a concatenated summary into a file named README.McpDoc.md in the same subdirectory, giving an absolute path to the tool.
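The directory selection that this prompt asks the model to perform can be sketched in code. This is only an illustration of the rule (skip `node_modules`, keep directories that hold at least one TypeScript file); the function name and the use of relative POSIX-style paths are assumptions, and in McpDoc the walk is driven by the model through the filesystem tool rather than by a helper like this.

```typescript
// Hypothetical helper: given relative POSIX-style file paths, return the
// directories that contain at least one TypeScript file, ignoring anything
// that sits under a node_modules directory.
function directoriesToSummarise(filePaths: readonly string[]): string[] {
  const dirs = new Set<string>();
  for (const filePath of filePaths) {
    const segments = filePath.split("/");
    // Skip dependencies wherever node_modules appears in the path.
    if (segments.indexOf("node_modules") !== -1) continue;
    const fileName = segments[segments.length - 1];
    if (!fileName.endsWith(".ts")) continue;
    // Files at the top level belong to the root directory ".".
    dirs.add(segments.slice(0, -1).join("/") || ".");
  }
  return Array.from(dirs).sort();
}
```

Each returned directory is a candidate for a README.McpDoc.md, subject to the timestamp check performed by `should_regenerate_readme`.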
To generate a C4Component diagram in each directory containing source code:
Use the filesystem tool to list all subdirectories of ${RootDirectory}. Ignore the node_modules subdirectory. Then recursively search each other subdirectory. If the subdirectory contains a file README.McpDoc.md, then read the contents of the file and generate a C4Component Mermaid.js diagram from the contents. Use the provided tools to parse and validate the generated diagram, and if it is valid, generate a preview, and write the markdown to a file named C4Component.McpDoc.md in the same subdirectory, giving an absolute path to the tool.
Your chain of thought:
1) Use C4Component for the diagram type (avoid C4_Component, PlantUML syntax, or any unrecognized element)
2) Identify the primary users and the main system elements
3) If you see any non-standard C4 elements, convert them to valid Mermaid C4 elements like Person, Container, or System
4) Group related nodes in System_Boundary blocks if appropriate
5) Use System_Ext for external systems or services
6) Only create relationships ('Rel()') between valid elements — refer to components by ID (not just strings). Only use 'Rel', not 'Rel_Neighbor'. Link to nodes directly, not to System_Boundary() groups.
7) Output only valid Mermaid code — no extra commentary or text — which supports built-in rendering in markdown environments
8) Verify there are no lexical or syntax errors. If the markdown is not valid mermaid.js, try to diagnose the error using the parse tool and try again
In my experience, the 'Chain of Thought' is not really needed by Claude. It seems harmless though, and at the time of writing (March 2025), is definitely needed by Gemini or OpenAI GPT-4o to get them to generate syntactically correct models.
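Two of the rules in the chain of thought above (rule 1 on the diagram type, rule 6 on relationship targets) are mechanical enough to check in code. The sketch below is a rough pre-check under stated assumptions: the name `lintC4Diagram` is hypothetical, element IDs are assumed to be bare identifiers, and it is not a substitute for parsing the diagram with Mermaid itself.

```typescript
// Hypothetical lint for a Mermaid C4 diagram; not part of the McpDoc source.
const DIAGRAM_TYPES = ["C4Context", "C4Container", "C4Component"];

function lintC4Diagram(source: string): string[] {
  const problems: string[] = [];
  const lines = source
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // Rule 1: the first line must name a supported diagram type.
  if (lines.length === 0 || DIAGRAM_TYPES.indexOf(lines[0]) === -1) {
    problems.push(`First line must be one of: ${DIAGRAM_TYPES.join(", ")}`);
  }

  const nodeIds = new Set<string>();
  const boundaryIds = new Set<string>();
  const relations: Array<{ from: string; to: string }> = [];

  for (const line of lines.slice(1)) {
    // Relationship lines: Rel(from, to, ...), Rel_Back(from, to, ...), etc.
    const rel = /^Rel\w*\(\s*(\w+)\s*,\s*(\w+)\s*[,)]/.exec(line);
    if (rel) {
      relations.push({ from: rel[1], to: rel[2] });
      continue;
    }
    // Declarations: Keyword(id, ...). Boundaries are tracked separately.
    const decl = /^(\w+)\(\s*(\w+)\s*[,)]/.exec(line);
    if (decl) {
      (decl[1].endsWith("Boundary") ? boundaryIds : nodeIds).add(decl[2]);
    }
  }

  // Rule 6: relationships must link declared nodes, never boundaries.
  for (const { from, to } of relations) {
    for (const id of [from, to]) {
      if (boundaryIds.has(id)) {
        problems.push(`Relationship links to boundary "${id}"`);
      } else if (!nodeIds.has(id)) {
        problems.push(`Relationship references undeclared ID "${id}"`);
      }
    }
  }
  return problems;
}
```

An empty result means only that these two rules passed. The messages could be fed back to the model in the same way as output from the parse tool.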
To generate the roll-up C4Context and C4Container diagrams in the root directory:
Use the filesystem tool to list all subdirectories of ${RootDirectory}. Ignore the node_modules subdirectory. Then recursively search each other subdirectory for a file named README.McpDoc.md. Concatenate the contents of all these files, and generate a ${C4Type} Mermaid.js diagram from the contexts. Use the provided tools to parse and validate the generated diagram, and if it is valid, generate a preview, and write the markdown to a file named ${C4Type}.McpDoc.md in the directory ${RootDirectory}.
Your chain of thought:
1) Use ${C4Type} for the diagram type (avoid C4_Component, PlantUML syntax, or any unrecognized element).
2) Identify the primary user(s) and the main system element(s).
3) If you see any non-standard C4 elements, convert them to valid Mermaid C4 elements like Person(), Container(), or System().
4) Group related nodes in System_Boundary() blocks if appropriate.
5) Use System_Ext() for external systems or services.
6) Only create relationships ('Rel()') between valid elements — refer to components by ID (not just strings). Only use 'Rel', not 'Rel_Neighbor'. Link to nodes directly, not to System_Boundary() groups.
7) Output only valid Mermaid code — no extra commentary or text — which supports built-in rendering in markdown environments.
8) Verify there are no lexical or syntax errors. If the markdown is not valid mermaid.js, try to diagnose the error using the parse tool and try again.
The same qualifier applies to 'Chain of Thought'.
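The prompts use `${RootDirectory}` and `${C4Type}` as placeholders. A minimal sketch of how such placeholders might be expanded before a prompt is returned to the host is shown below; the function name `expandPrompt` and its fail-fast behaviour on missing values are assumptions, not a description of McpDoc's Prompt Manager.

```typescript
// Hypothetical template expander for placeholders written as ${Name}.
function expandPrompt(template: string, values: Record<string, string>): string {
  return template.replace(/\$\{(\w+)\}/g, (match: string, name: string) => {
    // Fail loudly rather than send a half-expanded prompt to the model.
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`No value supplied for placeholder ${match}`);
    }
    return values[name];
  });
}

// Example usage with made-up values. Note the plain double quotes, so that
// TypeScript does not try to interpolate the placeholders itself.
const expanded = expandPrompt(
  "Search ${RootDirectory} and generate a ${C4Type} Mermaid.js diagram.",
  { RootDirectory: "/code/my-system", C4Type: "C4Context" }
);
console.log(expanded);
```

Using the same template for both C4Context and C4Container keeps the two roll-up prompts identical apart from the diagram type.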
Clone the repository:
shell
git clone https://github.com/jonverrier/McpDoc.git
Install dependencies:
shell
npm install
Build the project:
shell
npm run build
Install the Anthropic filesystem MCP server.
The project includes unit tests written with Mocha.
npm run test
To use the MCP server from a host, you need to update your AI development environment. Common configuration settings are shown below:
{
"mcpServers": {
"mcp-documenter": {
"command": "node",
"args": ["YourCodeRoot/McpDoc/dist/src/index.js"]
},
"mcp-filesystem": {
"command": "node",
"args": ["YourCodeRoot/McpFS/dist/index.js", "YourCodeRoot"]
}
}
}
For specific IDE setup instructions, refer to:
Generated by 'dogfooding' - McpDoc has generated a README.McpDoc.md and a C4Component.McpDoc.md in each sub-directory, plus a master C4Context and C4Container in the root directory.
- ./C4Context.McpDoc.md - Overview C4Context diagram
- ./C4Container.McpDoc.md - Overview C4Container diagram
- src/README.McpDoc.md - Source code documentation
- src/C4Component.McpDoc.md - Source code component diagram
- test/README.McpDoc.md - Test suite documentation
- test/C4Component.McpDoc.md - Test suite component diagram

The main area that needs improvement is parsing and validating the diagrams to give feedback to the model in case it makes syntax errors. It turns out that Mermaid.js is tricksy to run outside a browser. So actually the MCP server needs to spin up a headless browser / Selenium-driven client, and then look for parse failures in the HTML. Too much for me at the present.
The prompts to date have only been run over TypeScript and Python.
As mentioned in the prompts, this approach has only really been tested using Claude Sonnet 3.5 (Cursor) and Claude Opus 3.7 (Claude Desktop). Other models were less successful at generating usable diagrams. All can generate usable code summaries.
Contributions are welcome! Please feel free to submit a Pull Request.
1) Create a feature branch (`git checkout -b feature/AmazingFeature`)
2) Commit your changes (`git commit -m 'Add some AmazingFeature'`)
3) Push to the branch (`git push origin feature/AmazingFeature`)

License: MIT