jae_jae_fetcher_mcp

by jae-jae

An MCP server that fetches web page content using Playwright's headless browser, supporting JavaScript execution and intelligent content extraction.

Fetcher MCP for Web Content Extraction

Overview

Fetcher MCP is a server that fetches web page content through the Playwright headless browser. Because each page is rendered in a real browser, it can handle dynamically generated content and modern JavaScript-heavy web applications.

Key Features

JavaScript Support: Executes JavaScript to handle dynamic content.
Intelligent Content Extraction: Uses a Readability algorithm to extract main content.
Flexible Output Formats: Supports HTML and Markdown outputs.
Parallel Processing: Fetches multiple URLs concurrently.
Resource Optimization: Blocks unnecessary resources to improve performance.
Robust Error Handling: Comprehensive error handling for reliable operation.
Configurable Parameters: Fine-grained control over various extraction parameters.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

First-time setup: install the required Chromium browser:

npx playwright install chromium

Debug Mode

Run with the --debug option to show the browser window:

npx -y fetcher-mcp --debug

Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
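
As a rough illustration of the configuration above, the sketch below resolves the config file location per platform and builds the `fetcher` entry. The helper names `claudeConfigPath` and `fetcherEntry` are hypothetical and not part of Fetcher MCP. Passing `--debug` through `args` is assumed to behave like the command-line form shown under Debug Mode.

```typescript
// Hypothetical helpers for illustration only; not part of Fetcher MCP.
type Platform = "darwin" | "win32";

// Returns the Claude Desktop config path listed above for the given platform.
// homeDir and appData are passed in explicitly so the function stays pure.
function claudeConfigPath(platform: Platform, homeDir: string, appData: string): string {
  if (platform === "darwin") {
    return homeDir + "/Library/Application Support/Claude/claude_desktop_config.json";
  }
  return appData + "/Claude/claude_desktop_config.json";
}

// Builds the "fetcher" server entry, optionally appending the --debug flag.
function fetcherEntry(debug: boolean): { command: string; args: string[] } {
  const args: string[] = ["-y", "fetcher-mcp"];
  if (debug) {
    args.push("--debug");
  }
  return { command: "npx", args: args };
}

// The object that would be serialized into claude_desktop_config.json.
const desktopConfig = { mcpServers: { fetcher: fetcherEntry(false) } };
```

Serializing `desktopConfig` with `JSON.stringify(desktopConfig, null, 2)` should reproduce the JSON shown above.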

Usage

`fetch_url`

Retrieve web page content from a specified URL.

Parameters:
url: The URL of the web page to fetch.
timeout: Page loading timeout in milliseconds.
waitUntil: Specifies when navigation is considered complete.
extractContent: Whether to intelligently extract the main content.
maxLength: Maximum length of returned content.
returnHtml: Whether to return HTML content instead of Markdown.
waitForNavigation: Whether to wait for additional navigation.
navigationTimeout: Maximum time to wait for additional navigation.
disableMedia: Whether to disable media resources.
debug: Whether to enable debug mode.
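
To make the parameter list more concrete, here is a minimal sketch of how a client might type the `fetch_url` arguments and wrap them in an MCP `tools/call` request. The field types are inferred from the descriptions above and are assumptions, as are the example URL and values. The `toolCall` helper is hypothetical.

```typescript
// Argument shape inferred from the parameter list above; types are assumed,
// not taken from the server's source code.
interface FetchUrlArgs {
  url: string;
  timeout?: number;            // page loading timeout, in milliseconds
  waitUntil?: string;          // navigation-complete condition
  extractContent?: boolean;
  maxLength?: number;
  returnHtml?: boolean;        // HTML instead of Markdown
  waitForNavigation?: boolean;
  navigationTimeout?: number;  // in milliseconds
  disableMedia?: boolean;
  debug?: boolean;
}

interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: object };
}

// Hypothetical helper: wraps tool arguments in a JSON-RPC "tools/call" request.
function toolCall(id: number, toolName: string, args: object): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: id,
    method: "tools/call",
    params: { name: toolName, arguments: args },
  };
}

// Illustrative values only; none of these are documented defaults.
const exampleFetchArgs: FetchUrlArgs = {
  url: "https://example.com/article",
  timeout: 30000,
  extractContent: true,
  returnHtml: false,
};

const exampleRequest = toolCall(1, "fetch_url", exampleFetchArgs);
```

In practice an MCP client library usually builds this envelope for you, and only the `arguments` object needs to be supplied.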

`fetch_urls`

Batch retrieve web page content from multiple URLs in parallel.

Parameters:
urls: Array of URLs to fetch.
Other parameters are the same as fetch_url.
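
A batch call differs from a single fetch only in taking `urls` instead of `url`. The sketch below shows one way to assemble the arguments. The `buildFetchUrlsArgs` helper is hypothetical, the option types are assumed, and trimming and de-duplicating the list is a client-side convenience rather than anything the server is documented to require.

```typescript
// Options shared with fetch_url (a subset, for brevity); types are assumed.
interface SharedFetchOptions {
  timeout?: number;
  extractContent?: boolean;
  returnHtml?: boolean;
  maxLength?: number;
}

interface FetchUrlsArgs extends SharedFetchOptions {
  urls: string[];
}

// Hypothetical helper: trims entries, drops blanks and duplicates,
// then attaches the shared options to the batch.
function buildFetchUrlsArgs(rawUrls: string[], options: SharedFetchOptions): FetchUrlsArgs {
  const urls: string[] = [];
  for (const raw of rawUrls) {
    const candidate = raw.trim();
    if (candidate.length > 0 && urls.indexOf(candidate) === -1) {
      urls.push(candidate);
    }
  }
  return { ...options, urls: urls };
}

// Example: two distinct pages survive the clean-up step.
const batchArgs = buildFetchUrlsArgs(
  [" https://example.com/a ", "", "https://example.com/a", "https://example.com/b"],
  { maxLength: 5000 }
);
```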

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

Wait for Complete Loading: Use waitForNavigation: true.
Increase Timeout Duration: Adjust timeout and navigationTimeout.
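
The two tips above combine naturally into one set of arguments. This is a minimal sketch: `slowSiteOptions` is a hypothetical helper, and the timeout values are illustrative, not recommended settings.

```typescript
// Navigation-related subset of the fetch_url parameters; types are assumed.
interface NavigationOptions {
  waitForNavigation: boolean;
  timeout: number;            // milliseconds
  navigationTimeout: number;  // milliseconds
}

// Hypothetical preset for sites that redirect or run checks before showing content.
function slowSiteOptions(timeoutMs: number, navigationTimeoutMs: number): NavigationOptions {
  return {
    waitForNavigation: true,
    timeout: timeoutMs,
    navigationTimeout: navigationTimeoutMs,
  };
}

// Illustrative values: allow 60 s for the page load and 20 s for follow-up navigation.
const slowSiteArgs = {
  url: "https://example.com/protected",
  ...slowSiteOptions(60000, 20000),
};
```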

Content Retrieval Adjustments

Preserve Original HTML Structure: Use extractContent: false and returnHtml: true.
Fetch Complete Page Content: Use extractContent: false.
Return Content as HTML: Use returnHtml: true.
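
The adjustments above come down to two boolean flags, so the four combinations can be spelled out in a small lookup. The mode names below are invented for this sketch. Only `extractContent` and `returnHtml` come from the tool's parameter list.

```typescript
// Hypothetical mode names; only the two flags are real fetch_url parameters.
type ContentMode = "main-markdown" | "full-markdown" | "main-html" | "full-html";

interface ContentFlags {
  extractContent: boolean;
  returnHtml: boolean;
}

const CONTENT_FLAGS: Record<ContentMode, ContentFlags> = {
  // Extracted main content, returned as Markdown.
  "main-markdown": { extractContent: true, returnHtml: false },
  // Complete page content, returned as Markdown.
  "full-markdown": { extractContent: false, returnHtml: false },
  // Extracted main content, returned as HTML.
  "main-html": { extractContent: true, returnHtml: true },
  // Original HTML structure preserved: no extraction, HTML output.
  "full-html": { extractContent: false, returnHtml: true },
};

// Returns the flag pair for a mode, ready to merge into the tool arguments.
function contentFlags(mode: ContentMode): ContentFlags {
  return CONTENT_FLAGS[mode];
}

const preserveHtmlArgs = { url: "https://example.com/page", ...contentFlags("full-html") };
```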

Debugging and Authentication

Enabling Debug Mode

Dynamic Debug Activation: Set debug: true on an individual request to show the browser window for that request.

Using Custom Cookies for Authentication

Manual Login: Set debug: true so the browser window is visible, then log in to the site by hand.
Interacting with Debug Browser: Keep the debug browser window open while you complete the login.
Enable Debug for Specific Requests: Set debug: true only on the requests that need a logged-in session.
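
One way to apply debug mode selectively is to switch it on only for URLs known to require a login. This is a sketch under stated assumptions: `withManualLogin` and the prefix list are hypothetical, and prefix matching is just one simple rule a client could use.

```typescript
// Minimal argument shape for this example; only url and debug are used.
interface DebuggableArgs {
  url: string;
  debug?: boolean;
}

// Hypothetical helper: sets debug: true when the URL starts with one of the
// given prefixes, so the browser window opens for a manual login.
function withManualLogin(args: DebuggableArgs, loginPrefixes: string[]): DebuggableArgs {
  const needsLogin = loginPrefixes.some(function (prefix) {
    return args.url.indexOf(prefix) === 0;
  });
  return needsLogin ? { ...args, debug: true } : args;
}

// Illustrative prefix list; replace with the sites that need a session.
const loginPrefixes = ["https://example.com/account/"];

const privatePage = withManualLogin({ url: "https://example.com/account/orders" }, loginPrefixes);
const publicPage = withManualLogin({ url: "https://example.com/blog" }, loginPrefixes);
```

Here `privatePage` carries `debug: true`, while `publicPage` is returned unchanged.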

Development

Install Dependencies

npm install

Install Playwright Browser

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

Enable visible browser mode:

node build/index.js --debug

Related Projects

g-search-mcp: A powerful MCP server for Google search.

License

Licensed under the MIT License
