jae_jae_fetch_mcp

by jae-jae

An MCP server that fetches web page content using the Playwright headless browser, with intelligent content extraction and parallel processing.

Fetcher MCP for Web Content Extraction

Overview

Fetcher MCP is an MCP server that fetches web page content using the Playwright headless browser. Because it renders pages in a real browser and executes their JavaScript, it handles dynamic web content and modern web applications well, which makes it suitable for web scraping and content extraction tasks.

Advantages

JavaScript Support: Executes JavaScript to handle dynamic web content.
Intelligent Content Extraction: Uses a Readability algorithm to extract main content.
Flexible Output Format: Supports HTML and Markdown output formats.
Parallel Processing: Enables concurrent fetching of multiple URLs.
Resource Optimization: Blocks unnecessary resources to reduce bandwidth usage.
Robust Error Handling: Comprehensive error handling and logging.
Configurable Parameters: Fine-grained control over timeouts and content extraction.
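The Resource Optimization point can be illustrated with Playwright request interception. This is a minimal sketch under stated assumptions, not Fetcher MCP's actual code:

- `BLOCKED_TYPES` and `shouldBlock` are hypothetical names.
- Tying the filter to the `disableMedia` parameter is an assumption.
- The exact set of resource types the server blocks is not documented here.
- The type strings themselves are values that Playwright's `request.resourceType()` reports.

```typescript
// Hypothetical sketch of resource blocking during page rendering.
// The type strings match Playwright's request.resourceType() values; which
// types Fetcher MCP actually blocks is an assumption.
const BLOCKED_TYPES: ReadonlySet<string> = new Set(["image", "media", "font"]);

// Returns true when a request of this resource type should be aborted.
function shouldBlock(resourceType: string, disableMedia: boolean): boolean {
  // With media blocking switched off, every request is allowed through.
  if (!disableMedia) return false;
  return BLOCKED_TYPES.has(resourceType);
}

// Wiring it into Playwright would look roughly like this (not run here):
//   await page.route("**/*", (route) =>
//     shouldBlock(route.request().resourceType(), true)
//       ? route.abort()
//       : route.continue()
//   );
// Documents, scripts, and XHR/fetch requests are never blocked, so
// JavaScript-driven content can still render.
for (const resourceType of ["document", "script", "image", "font"]) {
  console.log(resourceType, shouldBlock(resourceType, true) ? "blocked" : "allowed");
}
```

Keeping the decision in a pure function makes it testable without launching a browser. Only the commented `page.route` call needs Playwright.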

Quick Start

Run directly with npx:

npx -y fetcher-mcp

On first use, install the required browser:

npx playwright install chromium

Debug Mode

Run with the --debug option to show the browser window for debugging:

npx -y fetcher-mcp --debug

MCP Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
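If the config file already lists other servers, the `fetcher` entry should be merged in rather than replacing the whole file. This is a minimal sketch that assumes the file has already been parsed into an object. `withFetcher` and the interface names are hypothetical. Passing `--debug` through `args` assumes Claude Desktop launches the command the same way as the `npx -y fetcher-mcp --debug` invocation in Debug Mode.

```typescript
// Hypothetical helper that merges a "fetcher" entry into a parsed
// claude_desktop_config.json object without touching other servers.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: Record<string, McpServerEntry>;
  [key: string]: unknown; // keep any other top-level settings
}

function withFetcher(config: ClaudeDesktopConfig, debug = false): ClaudeDesktopConfig {
  // Same command and args as the JSON above; "--debug" shows the browser window.
  const args = ["-y", "fetcher-mcp"];
  if (debug) args.push("--debug");
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), fetcher: { command: "npx", args } },
  };
}

// Example: a config that already lists another server.
const existing: ClaudeDesktopConfig = {
  mcpServers: { other: { command: "node", args: ["other-server.js"] } },
};
console.log(JSON.stringify(withFetcher(existing, true), null, 2));
```

Reading and writing the file is left to the caller. In Node.js, for example, you could use `fs.readFileSync`, `JSON.parse`, `JSON.stringify` and `fs.writeFileSync`.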

Features

fetch_url: Retrieve web page content from a specified URL.
  Parameters:
  - url: The URL of the web page to fetch.
  - timeout: Page loading timeout in milliseconds.
  - waitUntil: When navigation is considered complete.
  - extractContent: Whether to intelligently extract the main content.
  - maxLength: Maximum length of the returned content.
  - returnHtml: Whether to return HTML instead of Markdown.
  - waitForNavigation: Whether to wait for additional navigation after the initial page load.
  - navigationTimeout: Maximum time to wait for that additional navigation.
  - disableMedia: Whether to disable loading of media resources.
  - debug: Whether to enable debug mode (show the browser window) for this request.

fetch_urls: Batch-retrieve web page content from multiple URLs in parallel.
  Parameters:
  - urls: Array of URLs to fetch.
  - All other parameters are the same as for fetch_url.
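MCP clients invoke tools through a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool name and its arguments. This sketch writes the argument shapes as TypeScript types. The parameter names come from the list above. The types are assumptions, including the `waitUntil` values, which are Playwright's `page.goto` options. `toolCallRequest` is a hypothetical helper.

```typescript
// Argument shapes for fetch_url and fetch_urls. Parameter names follow the
// list above; the types are assumptions inferred from the descriptions.
interface FetchOptions {
  timeout?: number; // page loading timeout, in milliseconds
  waitUntil?: "load" | "domcontentloaded" | "networkidle" | "commit"; // Playwright's values (assumed)
  extractContent?: boolean; // extract the main content
  maxLength?: number; // maximum length of returned content
  returnHtml?: boolean; // HTML instead of Markdown
  waitForNavigation?: boolean; // wait for additional navigation
  navigationTimeout?: number; // limit for that additional navigation
  disableMedia?: boolean; // skip media resources
  debug?: boolean; // show the browser window
}

interface FetchUrlArgs extends FetchOptions {
  url: string;
}

interface FetchUrlsArgs extends FetchOptions {
  urls: string[];
}

type ToolCall =
  | { name: "fetch_url"; arguments: FetchUrlArgs }
  | { name: "fetch_urls"; arguments: FetchUrlsArgs };

// Hypothetical helper: wrap a tool call in an MCP JSON-RPC 2.0 request.
function toolCallRequest(id: number, call: ToolCall) {
  return { jsonrpc: "2.0", id, method: "tools/call", params: call };
}

const single = toolCallRequest(1, {
  name: "fetch_url",
  arguments: { url: "https://example.com", waitUntil: "networkidle", returnHtml: false },
});
const batch = toolCallRequest(2, {
  name: "fetch_urls",
  arguments: { urls: ["https://example.com", "https://example.org"], extractContent: true },
});
console.log(JSON.stringify(single));
console.log(JSON.stringify(batch));
```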

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

Wait for Complete Loading: For websites using CAPTCHA, redirects, or other verification mechanisms.
Increase Timeout Duration: For websites that load slowly.
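These two tips plausibly map onto the documented parameters:

- Waiting for complete loading corresponds to `waitForNavigation` plus `navigationTimeout`.
- A longer timeout corresponds to `timeout`.

That mapping, the helper name `forProtectedSite` and the numbers below are assumptions for illustration, not documented defaults or recommendations.

```typescript
// Hypothetical fetch_url arguments for a slow site that redirects through a
// verification page. The parameter choices and numbers are illustrative only.
interface ProtectedSiteArgs {
  url: string;
  timeout: number; // initial page load, in milliseconds
  waitForNavigation: boolean; // keep waiting for the post-verification redirect
  navigationTimeout: number; // limit for that extra navigation (assumed milliseconds)
}

function forProtectedSite(url: string, slowFactor = 2): ProtectedSiteArgs {
  const baseTimeoutMs = 30_000; // hypothetical baseline, not a documented default
  return {
    url,
    timeout: baseTimeoutMs * slowFactor, // "Increase Timeout Duration"
    waitForNavigation: true, // "Wait for Complete Loading"
    navigationTimeout: baseTimeoutMs,
  };
}

console.log(forProtectedSite("https://example.com/protected"));
```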

Content Retrieval Adjustments

Preserve Original HTML Structure: When content extraction might fail.
Fetch Complete Page Content: When extracted content is too limited.
Return Content as HTML: When HTML format is needed instead of default Markdown.
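The three adjustments appear to correspond to combinations of `extractContent` and `returnHtml`. This sketch encodes that reading. The mode names and the `contentArgs` helper are hypothetical, and the mapping is an inference from the parameter descriptions rather than something the project states.

```typescript
// Hypothetical mapping from the three content tips to fetch_url parameters.
// The mapping is an assumption inferred from the parameter descriptions.
type ContentMode = "preserveHtml" | "fullPage" | "html";

interface ContentArgs {
  extractContent: boolean; // false skips main-content extraction
  returnHtml: boolean; // true returns HTML instead of the default Markdown
}

const CONTENT_MODES: Record<ContentMode, ContentArgs> = {
  preserveHtml: { extractContent: false, returnHtml: true }, // extraction might fail
  fullPage: { extractContent: false, returnHtml: false }, // extracted content too limited
  html: { extractContent: true, returnHtml: true }, // HTML output, extraction left on
};

function contentArgs(url: string, mode: ContentMode): { url: string } & ContentArgs {
  return { url, ...CONTENT_MODES[mode] };
}

console.log(contentArgs("https://example.com/article", "preserveHtml"));
```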

Debugging and Authentication

Enabling Debug Mode

Dynamic Debug Activation: To display the browser window during a specific fetch operation.

Using Custom Cookies for Authentication

Manual Login: To log in using your own credentials.
Interacting with the Debug Browser: When debug mode is enabled, the browser window is visible, so you can interact with it directly.
Enable Debug for Specific Requests: Even if the server is already running, you can enable debug mode for a specific request.
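The per-request `debug` flag can be combined with manual login. Fetch the protected page once with a visible browser so you can log in, and keep other requests headless. This is a minimal sketch with hypothetical names (`FetchRequest`, `withDebug`). Two points are assumptions the text above does not confirm:

- An omitted `debug` falls back to the server's `--debug` setting.
- A manual login carries over to later requests.

```typescript
// Hypothetical request shape: only the fields needed to show per-request debug.
interface FetchRequest {
  url: string;
  debug?: boolean; // per-request debug; omitted is assumed to mean "server setting"
}

// Returns a copy of the request with debug mode enabled, so only this one
// fetch opens a visible browser window. The input object is not modified.
function withDebug(request: FetchRequest): FetchRequest {
  return { ...request, debug: true };
}

// Fetch a login-protected page with a visible browser for manual login,
// while another request stays headless.
const loginPage = withDebug({ url: "https://example.com/account" });
const otherPage: FetchRequest = { url: "https://example.com/public" };
console.log(loginPage, otherPage);
```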

Development

Install Dependencies

npm install

Install Playwright Browser

npm run install-browser

Build the Server

npm run build

Debugging

Use the MCP Inspector for debugging:

npm run inspector

You can also enable visible browser mode for debugging:

node build/index.js --debug

Related Projects

g-search-mcp: An MCP server for Google search that can search multiple keywords in parallel.

License

Licensed under the MIT License
