saishridhar_webscraper

by saishridhar
An MCP server that transcribes webpages, YouTube videos, and PDFs for LLMs like Claude to process.

Web Scraping Integration for Claude

Overview

The Web Scraping Integration for Claude is an MCP server that lets Claude scrape and transcribe content from webpages, YouTube videos, and PDFs. Given a URL, Claude can extract the text content and use it to answer questions or perform tasks based on the linked material.

Tools Available

get_pdf

Converts the PDF file at a given URL into Markdown text.

Args:
- input_url (str): URL of the PDF file to convert.

Returns:
- str: The content of the PDF as Markdown text.

get_webpage_content

Returns the text content of the webpage at the given URL. Use this tool to extract text from general webpages.

Args:
- url: The URL of the webpage to extract text from.
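The source does not say how get_webpage_content turns a page into text. As a rough illustration only, and not the server's actual implementation, the sketch below strips tags from an HTML string using Python's standard-library parser. The names TextExtractor and html_to_text are hypothetical, and fetching the page over the network is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def html_to_text(html: str) -> str:
    """Hypothetical helper: return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = "<body><h1>Title</h1><script>var x = 1;</script><p>Hello</p></body>"
print(html_to_text(sample))
```

A real scraper would also need to handle encodings, redirects, and pages rendered by JavaScript, none of which this sketch attempts.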

get_youtube_transcript

Extracts the transcript from a YouTube video. Use this tool when a user provides a YouTube link and asks questions about the video's content.

Args:
- url: The URL of the YouTube video whose transcript you want to extract.
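Transcript lookups are usually keyed by video ID rather than by full URL. The sketch below shows one way a URL might be reduced to its ID before a lookup. It assumes common YouTube URL shapes (watch, youtu.be, embed, shorts) and is not taken from the server's code; extract_video_id is a hypothetical name.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Hypothetical helper: return the video ID in a YouTube URL, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the ID as the first path segment.
        video_id = parsed.path.lstrip("/").split("/")[0]
        return video_id or None

    if host in {"youtube.com", "www.youtube.com", "m.youtube.com"}:
        if parsed.path == "/watch":
            # Standard links carry the ID in the "v" query parameter.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        for prefix in ("/embed/", "/shorts/"):
            if parsed.path.startswith(prefix):
                video_id = parsed.path[len(prefix):].split("/")[0]
                return video_id or None

    return None


print(extract_video_id("https://www.youtube.com/watch?v=example"))
```

Returning None for unrecognized URLs lets the caller report a clear error instead of requesting a transcript for a malformed ID.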

Configuration

To set up the Web Scraping Integration for Claude, follow these steps:

  1. Clone the Repository:
    git clone https://github.com/saishridhar/webscraper.git

  2. Install Dependencies:
    pip install -r requirements.txt

  3. Run the Server:
    ./run_webscraper.sh
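MCP clients such as Claude Desktop typically read server definitions from a JSON file (claude_desktop_config.json) under an mcpServers key. The source does not document this step, so the following is only a sketch of what such an entry might look like. The clone location ~/webscraper and the label "webscraper" are assumptions; adjust both to match your setup.

```python
import json
from pathlib import Path

# Assumption: the repository was cloned into the home directory.
repo_dir = Path.home() / "webscraper"

config = {
    "mcpServers": {
        # "webscraper" is an arbitrary label chosen for this example.
        "webscraper": {
            "command": "bash",
            "args": [str(repo_dir / "run_webscraper.sh")],
        }
    }
}

# Print the JSON so it can be merged into the client's configuration file.
print(json.dumps(config, indent=2))
```

The script prints the entry rather than writing it, so an existing configuration file is never overwritten by accident.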

Usage

Once the server is running, you can integrate it with Claude to enable web scraping capabilities. Simply provide the URL of the webpage, YouTube video, or PDF, and Claude will be able to extract and utilize the text content.
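In practice Claude decides which tool to call based on the tool descriptions. The sketch below only illustrates how the three kinds of URL map onto the three tools listed above. The function pick_tool and its matching rules (host name for YouTube, a .pdf path suffix for PDFs) are assumptions for illustration, not part of the server.

```python
from urllib.parse import urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def pick_tool(url: str) -> str:
    """Hypothetical helper: name the tool that fits a given URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    if host in YOUTUBE_HOSTS:
        return "get_youtube_transcript"
    if parsed.path.lower().endswith(".pdf"):
        return "get_pdf"
    # Anything else is treated as a general webpage.
    return "get_webpage_content"


for link in (
    "https://example.com",
    "https://www.youtube.com/watch?v=example",
    "https://example.com/document.pdf",
):
    print(link, "->", pick_tool(link))
```

A suffix check misses PDFs served from URLs without a .pdf extension; inspecting the response's Content-Type header would be more reliable but requires a network request.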

Example Usage

  • Extracting Webpage Content:
    from webscraper import get_webpage_content
    content = get_webpage_content("https://example.com")
    print(content)

  • Extracting YouTube Transcript:
    from webscraper import get_youtube_transcript
    transcript = get_youtube_transcript("https://www.youtube.com/watch?v=example")
    print(transcript)

  • Converting PDF to Markdown:
    from webscraper import get_pdf
    markdown_text = get_pdf("https://example.com/document.pdf")
    print(markdown_text)

About

This MCP server transcribes webpages, YouTube videos, and PDFs for LLMs like Claude, so that they can read content from a URL and answer user queries based on it.

Categories
mcp_server model_context_protocol python web_scraping claude api_integration
