jae_jae_fetch_mcp

by jae-jae
An MCP server that fetches web page content using Playwright headless browser with intelligent content extraction and parallel processing.

Fetcher MCP for Web Content Extraction

Overview

Fetcher MCP is an MCP server that fetches web page content using the Playwright headless browser. Because pages are loaded in a browser that executes JavaScript, it handles dynamic content and modern web applications, which makes it well suited to web scraping and content extraction.

Advantages

  • JavaScript Support: Executes JavaScript to handle dynamic web content.
  • Intelligent Content Extraction: Uses a Readability algorithm to extract main content.
  • Flexible Output Format: Supports HTML and Markdown output formats.
  • Parallel Processing: Enables concurrent fetching of multiple URLs.
  • Resource Optimization: Blocks unnecessary resources to reduce bandwidth usage.
  • Robust Error Handling: Comprehensive error handling and logging.
  • Configurable Parameters: Fine-grained control over timeouts and content extraction.
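The parallel processing and error handling points above can be illustrated with a minimal TypeScript sketch. This is not the server's actual implementation: `fetchAll`, `FetchResult`, and the stub `fakeFetch` are hypothetical names, and a real fetcher would drive a Playwright page where the stub simply returns a string.

```typescript
// Hypothetical result type: one entry per URL, success or failure.
type FetchResult =
  | { url: string; ok: true; content: string }
  | { url: string; ok: false; error: string };

// A fetcher is any async function that maps a URL to page content.
type Fetcher = (url: string) => Promise<string>;

// Start every fetch at once and wait for all of them to finish.
// Each failure is captured per URL, so one bad page does not
// reject the whole batch.
async function fetchAll(urls: string[], fetcher: Fetcher): Promise<FetchResult[]> {
  const tasks = urls.map(async (url): Promise<FetchResult> => {
    try {
      const content = await fetcher(url);
      return { url, ok: true, content };
    } catch (err) {
      const error = err instanceof Error ? err.message : String(err);
      return { url, ok: false, error };
    }
  });
  return Promise.all(tasks);
}

// Stub fetcher for illustration only; it never touches the network.
const fakeFetch: Fetcher = async (url) => {
  if (url.includes("bad")) {
    throw new Error("navigation failed");
  }
  return `content of ${url}`;
};
```

Calling `fetchAll(["https://example.com/a", "https://example.com/bad"], fakeFetch)` resolves to two entries in input order: the first succeeds, and the second carries the error message instead of rejecting the batch.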

Quick Start

Run directly with npx:

npx -y fetcher-mcp

On first use, install the required browser:

npx playwright install chromium

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

MCP Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
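If you want the browser window visible whenever Claude Desktop launches the server, one option is to append the `--debug` flag from the Debug Mode section to `args`. The TypeScript sketch below builds such a config object. `buildConfig` and the two interfaces are hypothetical helpers; only the `mcpServers` shape, the `fetcher` key, and the `npx -y fetcher-mcp` command come from this README.

```typescript
interface ServerEntry {
  command: string;
  args: string[];
}

interface DesktopConfig {
  mcpServers: Record<string, ServerEntry>;
}

// Hypothetical helper: returns the config shown above, optionally
// appending --debug so the browser window is visible.
function buildConfig(debug: boolean): DesktopConfig {
  const args = ["-y", "fetcher-mcp"];
  if (debug) {
    args.push("--debug");
  }
  return { mcpServers: { fetcher: { command: "npx", args } } };
}

// Serialize with two-space indentation, ready to paste into
// claude_desktop_config.json.
const configJson = JSON.stringify(buildConfig(true), null, 2);
```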

Features

  • fetch_url: Retrieve web page content from a specified URL.
    • Parameters:
      • url: The URL of the web page to fetch.
      • timeout: Page loading timeout in milliseconds.
      • waitUntil: Specifies when navigation is considered complete.
      • extractContent: Whether to intelligently extract the main content.
      • maxLength: Maximum length of returned content.
      • returnHtml: Whether to return HTML content instead of Markdown.
      • waitForNavigation: Whether to wait for additional navigation.
      • navigationTimeout: Maximum time to wait for additional navigation.
      • disableMedia: Whether to disable media resources.
      • debug: Whether to enable debug mode.
  • fetch_urls: Batch retrieve web page content from multiple URLs in parallel.
    • Parameters:
      • urls: Array of URLs to fetch.
      • Other parameters are the same as fetch_url.
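To make the parameter list concrete, here is a sketch of how the tool arguments could be typed in TypeScript. The field names come from the list above; the types are inferred from the descriptions, and the `waitUntil` values assume the parameter mirrors Playwright's own `waitUntil` option. Defaults are not stated here, so every field except `url` and `urls` is marked optional as an assumption.

```typescript
// Assumed to mirror the waitUntil values accepted by Playwright's page.goto.
type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

// Field names are taken from the parameter list; types are inferred.
interface FetchUrlArgs {
  url: string;
  timeout?: number;            // milliseconds
  waitUntil?: WaitUntil;
  extractContent?: boolean;
  maxLength?: number;
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;  // assumed to be milliseconds, like timeout
  disableMedia?: boolean;
  debug?: boolean;
}

// fetch_urls takes an array of URLs plus the same options.
type FetchUrlsArgs = Omit<FetchUrlArgs, "url"> & { urls: string[] };

// Example argument objects (all values are illustrative).
const singleArgs: FetchUrlArgs = {
  url: "https://example.com/article",
  timeout: 30000,
  waitUntil: "networkidle",
};

const batchArgs: FetchUrlsArgs = {
  urls: ["https://example.com/a", "https://example.com/b"],
  returnHtml: true,
};
```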

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

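  • Wait for Complete Loading: For websites that use CAPTCHA, redirects, or other verification mechanisms, enable waitForNavigation so the fetch waits for the additional navigation to finish.
  • Increase Timeout Duration: For websites that load slowly, raise timeout and, if needed, navigationTimeout.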

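As an argument object, the two tips above might look like the following sketch. The parameter names come from the Features section; pairing them with these tips is inferred from the parameter descriptions, and the millisecond values are illustrative, not documented defaults.

```typescript
// Arguments for a site that redirects through a verification page.
// waitForNavigation asks the server to wait for the extra navigation;
// the two timeouts are raised to give slow pages time to finish.
const slowSiteArgs = {
  url: "https://example.com/protected",
  waitForNavigation: true,
  timeout: 60000,            // illustrative: 60 s page-load timeout
  navigationTimeout: 20000,  // illustrative: 20 s for the follow-up navigation
};
```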
Content Retrieval Adjustments

  • Preserve Original HTML Structure: When content extraction might fail, disable extractContent and enable returnHtml.
  • Fetch Complete Page Content: When extracted content is too limited, disable extractContent.
  • Return Content as HTML: When HTML is needed instead of the default Markdown, enable returnHtml.
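The three adjustments above can be sketched as option presets. The parameter names come from the Features section, the pairing of tip to parameter is inferred from the parameter descriptions, and `contentPresets` is a hypothetical name.

```typescript
// Hypothetical presets; each one is spread into a fetch_url argument object.
const contentPresets = {
  // Skip extraction and keep the raw markup when extraction might fail.
  preserveHtml: { extractContent: false, returnHtml: true },
  // Skip extraction when the extracted content is too limited.
  fullPage: { extractContent: false },
  // Keep extraction, but return HTML rather than the default Markdown.
  asHtml: { returnHtml: true },
};

// Combine a URL with one preset to form the final arguments.
const fetchArgs = { url: "https://example.com/docs", ...contentPresets.preserveHtml };
```

Keeping the presets separate from the URL makes it easy to retry the same page with a different preset when the first result is not usable.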

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: To display the browser window during a specific fetch operation, set debug for that request.

Using Custom Cookies for Authentication

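  • Manual Login: To log in using your own credentials, run the fetch in debug mode.
  • Interacting with Debug Browser: When debug mode is enabled, the browser window is shown, so you can log in to the website yourself.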
  • Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request.
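Since `debug` is one of the `fetch_url` parameters, per-request debugging amounts to one extra field in the arguments. A minimal sketch follows; `withDebug` is a hypothetical helper and the timeout value is illustrative.

```typescript
// Hypothetical helper: returns a copy of the arguments with debug enabled,
// leaving the original object untouched.
function withDebug<T extends { url: string }>(args: T): T & { debug: boolean } {
  return { ...args, debug: true };
}

// A login page fetched with the browser window visible and a longer
// timeout, leaving time to type credentials by hand.
const loginArgs = withDebug({ url: "https://example.com/login", timeout: 60000 });
```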

Development

Install Dependencies

npm install

Install Playwright Browser

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

  • g-search-mcp: A powerful MCP server for Google search that enables parallel searching with multiple keywords simultaneously.

License

Licensed under the MIT License
