The Web Scraping Integration for Claude is an MCP server that extends Claude's capabilities by allowing it to scrape and transcribe content from webpages, YouTube videos, and PDFs. Given a URL, Claude can extract and use the text content, enabling it to answer questions or perform tasks based on the linked material.
### get_pdf

Converts a URL that points to a PDF file into markdown text.

Args:
- `input_url` (str): URL of the PDF file to convert.

Returns:
- `str`: The PDF content as markdown text.
### get_webpage_content

Returns the text content of the webpage at the provided link. This tool is useful for accessing and extracting text from general webpages.

Args:
- `url` (str): The URL from which to extract the text.
### get_youtube_transcript

Extracts the transcript from a YouTube video. This tool is particularly useful when users provide YouTube links and ask questions based on the video content.

Args:
- `url` (str): The URL of the YouTube video whose transcript should be extracted.
To set up the Web Scraping Integration for Claude, follow these steps:
1. Clone the Repository:

   ```bash
   git clone https://github.com/saishridhar/webscraper.git
   cd webscraper
   ```
2. Install Dependencies:

   ```bash
   pip install -r requirements.txt
   ```
3. Run the Server:

   ```bash
   ./run_webscraper.sh
   ```
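With the server script in place, you can optionally check that it starts and exposes the expected tools before wiring it into Claude. The sketch below is a minimal check using the official `mcp` Python SDK (installable with `pip install mcp`); it assumes the server speaks MCP over stdio and is launched with `run_webscraper.sh`, so adjust the command if your setup differs.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch command, taken from step 3 above.
server_params = StdioServerParameters(command="./run_webscraper.sh")


async def list_tools() -> None:
    # Start the server as a subprocess and talk to it over stdio.
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            # Should include get_pdf, get_webpage_content, and get_youtube_transcript.
            print([tool.name for tool in tools.tools])


asyncio.run(list_tools())
```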
Once the server is running, you can integrate it with Claude to enable web scraping capabilities. Simply provide the URL of the webpage, YouTube video, or PDF, and Claude will be able to extract and utilize the text content.
**Extracting Webpage Content:**

```python
from webscraper import get_webpage_content

content = get_webpage_content("https://example.com")
print(content)
```
**Extracting YouTube Transcript:**

```python
from webscraper import get_youtube_transcript

transcript = get_youtube_transcript("https://www.youtube.com/watch?v=example")
print(transcript)
```
**Converting PDF to Markdown:**

```python
from webscraper import get_pdf

markdown_text = get_pdf("https://example.com/document.pdf")
print(markdown_text)
```
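The same tools can also be called through any MCP client rather than imported directly. The sketch below shows one way to do this with the official `mcp` Python SDK; it assumes the server communicates over stdio and is started with `run_webscraper.sh`, and it uses the tool and argument names from the reference above.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumed launch command from the setup steps; adjust the path if needed.
server_params = StdioServerParameters(command="./run_webscraper.sh")


async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Invoke a tool by name, the same way Claude does once the server is connected.
            result = await session.call_tool(
                "get_webpage_content",
                arguments={"url": "https://example.com"},
            )
            print(result.content)


asyncio.run(main())
```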
This MCP server transcribes webpages, YouTube videos, and PDFs for LLMs like Claude, enabling them to access and use content from a variety of sources simply by being given a URL. This integration enhances Claude's ability to interact with and respond to user queries based on external content.