jae_jae_fetcher_mcp

by jae-jae
An MCP server that fetches web page content using Playwright's headless browser, supporting JavaScript execution and intelligent content extraction.

Fetcher MCP for Web Content Extraction

Overview

Fetcher MCP is a server that fetches web page content using the Playwright headless browser. Because it executes JavaScript before reading the page, it can handle dynamic content and modern web applications that a plain HTTP fetch would miss.

Key Features

  • JavaScript Support: Executes JavaScript to handle dynamic content.
  • Intelligent Content Extraction: Uses a Readability algorithm to extract main content.
  • Flexible Output Formats: Supports HTML and Markdown outputs.
  • Parallel Processing: Fetches multiple URLs concurrently.
  • Resource Optimization: Blocks unnecessary resources to improve performance.
  • Robust Error Handling: Comprehensive error handling for reliable operation.
  • Configurable Parameters: Fine-grained control over various extraction parameters.

Quick Start

Run directly with npx:

npx -y fetcher-mcp

First-time setup: install the Chromium browser that Playwright needs:

npx playwright install chromium

Debug Mode

Run with the --debug option to show the browser window:

npx -y fetcher-mcp --debug

Configuration

Configure this MCP server in Claude Desktop:

On macOS: ~/Library/Application Support/Claude/claude_desktop_config.json

On Windows: %APPDATA%/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "fetcher": {
      "command": "npx",
      "args": ["-y", "fetcher-mcp"]
    }
  }
}
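If the config file already lists other servers, the fetcher entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, assuming the file has been parsed into a plain object. `addFetcherServer` and both interfaces are hypothetical names for illustration and are not part of fetcher-mcp or Claude Desktop.

```typescript
// Hypothetical helper, not part of fetcher-mcp or Claude Desktop.
interface McpServerEntry {
  command: string;
  args: string[];
}

interface ClaudeDesktopConfig {
  mcpServers?: { [name: string]: McpServerEntry };
  [key: string]: unknown;
}

// Returns a copy of the config with a "fetcher" entry added.
// showBrowser appends the --debug flag described under Debug Mode.
function addFetcherServer(
  config: ClaudeDesktopConfig,
  showBrowser: boolean = false
): ClaudeDesktopConfig {
  const args: string[] = ["-y", "fetcher-mcp"];
  if (showBrowser) {
    args.push("--debug");
  }
  return {
    ...config,
    mcpServers: {
      ...(config.mcpServers || {}),
      fetcher: { command: "npx", args: args },
    },
  };
}
```

Reading the file, calling the helper, and writing the result back as JSON is left to the caller, since the file location differs per platform.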

Usage

fetch_url

Retrieve web page content from a specified URL.

Parameters:
  • url: The URL of the web page to fetch.
  • timeout: Page loading timeout in milliseconds.
  • waitUntil: Specifies when navigation is considered complete.
  • extractContent: Whether to intelligently extract the main content.
  • maxLength: Maximum length of returned content.
  • returnHtml: Whether to return HTML content instead of Markdown.
  • waitForNavigation: Whether to wait for additional navigation.
  • navigationTimeout: Maximum time to wait for additional navigation.
  • disableMedia: Whether to disable media resources.
  • debug: Whether to enable debug mode.
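To make the parameter list concrete, here is a sketch of the argument object as a TypeScript type with a small pre-flight check. The field names come from the list above. Which fields are optional, the `waitUntil` values (taken from Playwright's navigation options), and the validation rules are assumptions, not the server's published schema. `validateFetchUrlArgs` is a hypothetical helper.

```typescript
// Shape inferred from the parameter list; optionality is an assumption.
type WaitUntil = "load" | "domcontentloaded" | "networkidle" | "commit";

interface FetchUrlArgs {
  url: string;
  timeout?: number;            // milliseconds
  waitUntil?: WaitUntil;
  extractContent?: boolean;
  maxLength?: number;
  returnHtml?: boolean;
  waitForNavigation?: boolean;
  navigationTimeout?: number;  // milliseconds
  disableMedia?: boolean;
  debug?: boolean;
}

// Hypothetical client-side check; returns a list of problems (empty if none).
function validateFetchUrlArgs(args: FetchUrlArgs): string[] {
  const problems: string[] = [];
  if (!/^https?:\/\//i.test(args.url)) {
    problems.push("url must start with http:// or https://");
  }
  const numeric: Array<[string, number | undefined]> = [
    ["timeout", args.timeout],
    ["maxLength", args.maxLength],
    ["navigationTimeout", args.navigationTimeout],
  ];
  for (const [name, value] of numeric) {
    if (value !== undefined && !(value > 0)) {
      problems.push(name + " must be a positive number");
    }
  }
  return problems;
}

// Illustrative values only.
const exampleArgs: FetchUrlArgs = {
  url: "https://example.com/article",
  timeout: 30000,
  waitUntil: "networkidle",
  extractContent: true,
  returnHtml: false,
};
```

An MCP client would pass an object like `exampleArgs` as the arguments of a `fetch_url` tool call.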

fetch_urls

Batch retrieve web page content from multiple URLs in parallel.

Parameters:
  • urls: Array of URLs to fetch.
  • Other parameters are the same as fetch_url.
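A batch request is the same set of options applied to a list of URLs. The sketch below builds such an argument object and drops blank or repeated URLs first. `buildFetchUrlsArgs` is a hypothetical helper, only a few shared options are shown for brevity, and the cleanup is an assumption about what a caller might want, not something the server requires.

```typescript
// Subset of the shared fetch_url options, for illustration.
interface SharedFetchOptions {
  timeout?: number;
  extractContent?: boolean;
  returnHtml?: boolean;
}

interface FetchUrlsArgs extends SharedFetchOptions {
  urls: string[];
}

// Hypothetical helper: trims, removes blanks and duplicates, then merges
// the shared options into one fetch_urls argument object.
function buildFetchUrlsArgs(
  urls: string[],
  shared: SharedFetchOptions = {}
): FetchUrlsArgs {
  const cleaned: string[] = [];
  for (const raw of urls) {
    const url = raw.trim();
    if (url.length > 0 && cleaned.indexOf(url) === -1) {
      cleaned.push(url);
    }
  }
  if (cleaned.length === 0) {
    throw new Error("fetch_urls needs at least one URL");
  }
  return { ...shared, urls: cleaned };
}
```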

Tips

Handling Special Website Scenarios

Dealing with Anti-Crawler Mechanisms

  • Wait for Complete Loading: Use waitForNavigation: true.
  • Increase Timeout Duration: Adjust timeout and navigationTimeout.
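Both tips can be applied together to an existing request. This is a minimal sketch, assuming the caller chooses the timeout values. `withPatientNavigation` is a hypothetical helper, and the numbers in the example are illustrative, not recommended settings.

```typescript
interface NavigationArgs {
  url: string;
  timeout?: number;
  waitForNavigation?: boolean;
  navigationTimeout?: number;
}

// Hypothetical helper: turns on waitForNavigation and sets both timeouts
// (in milliseconds) for sites that redirect or load slowly.
function withPatientNavigation(
  args: NavigationArgs,
  timeoutMs: number,
  navigationTimeoutMs: number
): NavigationArgs {
  if (!(timeoutMs > 0) || !(navigationTimeoutMs > 0)) {
    throw new Error("timeouts must be positive millisecond values");
  }
  return {
    ...args,
    waitForNavigation: true,
    timeout: timeoutMs,
    navigationTimeout: navigationTimeoutMs,
  };
}

// Illustrative values only.
const patientArgs = withPatientNavigation(
  { url: "https://example.com/protected" },
  60000,
  20000
);
```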

Content Retrieval Adjustments

  • Preserve Original HTML Structure: Use extractContent: false and returnHtml: true.
  • Fetch Complete Page Content: Use extractContent: false.
  • Return Content as HTML: Use returnHtml: true.
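The two flags combine into four output modes, which the sketch below spells out as a lookup table. The mode names and `contentFlags` are hypothetical and exist only to label the combinations described in the tips above.

```typescript
// Hypothetical mode names for the four flag combinations.
type ContentMode = "main-markdown" | "main-html" | "full-markdown" | "full-html";

interface ContentFlags {
  extractContent: boolean;
  returnHtml: boolean;
}

const CONTENT_FLAGS: { [K in ContentMode]: ContentFlags } = {
  // Extracted main content, as Markdown.
  "main-markdown": { extractContent: true, returnHtml: false },
  // Extracted main content, as HTML.
  "main-html": { extractContent: true, returnHtml: true },
  // Complete page content, as Markdown.
  "full-markdown": { extractContent: false, returnHtml: false },
  // Complete page with its original HTML structure preserved.
  "full-html": { extractContent: false, returnHtml: true },
};

// Returns a fresh copy so callers can merge it into request arguments.
function contentFlags(mode: ContentMode): ContentFlags {
  return { ...CONTENT_FLAGS[mode] };
}
```

For example, spreading `contentFlags("full-html")` into a `fetch_url` argument object requests the page with its original HTML structure.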

Debugging and Authentication

Enabling Debug Mode

  • Dynamic Debug Activation: Use debug: true.

Using Custom Cookies for Authentication

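  • Manual Login: Set debug: true to open a visible browser window and log in by hand.
  • Interacting with Debug Browser: Keep that browser window open while you complete the login.
  • Enable Debug for Specific Requests: Pass debug: true only on the requests that need a login, instead of starting the server with --debug.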
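Per-request debugging can be automated on the client side. The sketch below sets `debug: true` only for URLs whose host is on a caller-supplied list of sites that need a manual login. `withDebugForLogin`, `hostOf`, and the host list are hypothetical. The host extraction is deliberately simple and does not handle URLs with embedded credentials.

```typescript
interface DebuggableArgs {
  url: string;
  debug?: boolean;
}

// Extracts the lowercase host from an http(s) URL, or "" if none is found.
function hostOf(url: string): string {
  const match = /^https?:\/\/([^\/:?#]+)/i.exec(url);
  return match ? (match[1] || "").toLowerCase() : "";
}

// Hypothetical helper: enables debug only for hosts that need a login.
function withDebugForLogin(
  args: DebuggableArgs,
  loginHosts: string[]
): DebuggableArgs {
  const needsLogin = loginHosts.indexOf(hostOf(args.url)) !== -1;
  return needsLogin ? { ...args, debug: true } : args;
}
```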

Development

Install Dependencies

npm install

Install Playwright Browser

npm run install-browser

Build the Server

npm run build

Debugging

Use MCP Inspector for debugging:

npm run inspector

Enable visible browser mode:

node build/index.js --debug

License

Licensed under the MIT License
