WebsiteScraper
A node that extracts structured content from websites
Node Input
url (string or string[]): The URL of the website to scrape, given either as a single URL string or as a list of URLs.
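Because url accepts either shape, code that prepares this input often normalizes it to a list first. The following is a minimal sketch of that idea, not the node's actual implementation; the helper name normalize_urls is hypothetical.

```python
from typing import List, Union


def normalize_urls(url: Union[str, List[str]]) -> List[str]:
    """Return a clean list of URLs from a single string or a list.

    Hypothetical helper for illustration only; it is not part of
    the WebsiteScraper node.
    """
    # Wrap a single string so both input shapes are handled the same way.
    candidates = [url] if isinstance(url, str) else list(url)
    # Trim surrounding whitespace and drop empty entries, keeping order.
    return [u.strip() for u in candidates if u.strip()]
```

With this helper, normalize_urls("https://example.com/a") and normalize_urls(["https://example.com/a"]) produce the same one-element list.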
Node Output
content (string): The scraped content of the website at the provided URL, in Markdown format.
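To make the output format concrete, the sketch below turns a small HTML fragment into Markdown using only the Python standard library. It is an illustration under stated assumptions: the node's real conversion is not documented here, the class MarkdownExtractor and function html_to_markdown are hypothetical names, and only headings, paragraphs, and list items are handled.

```python
from html.parser import HTMLParser


class MarkdownExtractor(HTMLParser):
    """Convert a small subset of HTML (h1-h3, p, li) to Markdown lines."""

    HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
    SKIP = {"script", "style"}  # content that should never reach the output

    def __init__(self):
        super().__init__()
        self.lines = []      # finished Markdown blocks
        self.prefix = None   # Markdown prefix of the block being read
        self.buffer = []     # text pieces of the block being read
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.HEADINGS:
            self.prefix = self.HEADINGS[tag]
        elif tag == "li":
            self.prefix = "- "
        elif tag == "p":
            self.prefix = ""

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADINGS or tag in ("li", "p"):
            # Collapse runs of whitespace before emitting the block.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.prefix + text)
            self.prefix, self.buffer = None, []

    def handle_data(self, data):
        # Keep text only when inside a supported block and outside skipped tags.
        if self.skip_depth == 0 and self.prefix is not None:
            self.buffer.append(data)


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.lines)
```

Text inside inline tags such as b or a is kept as plain text; a production converter would also handle links, emphasis, tables, and nested lists.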
Function
The WebsiteScraper node extracts content from specified websites. It navigates to the provided URL(s), executes JavaScript when the page requires it, and retrieves the site's content as Markdown. The node handles both general websites and specific formats, such as LinkedIn profiles, by applying a scraping strategy suited to each.
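One way to picture the strategy selection described above is a small dispatch on the URL's host and path. This is a sketch only: the strategy names "linkedin_profile" and "general", the function pick_strategy, and the /in/ path rule are assumptions made for illustration, not the node's documented behavior.

```python
from urllib.parse import urlparse


def pick_strategy(url: str) -> str:
    """Choose a (hypothetical) scraping strategy name for a URL."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()

    # Match linkedin.com and its subdomains, but not look-alike hosts.
    is_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")

    # Assumption: profile pages live under the /in/ path prefix.
    if is_linkedin and parts.path.startswith("/in/"):
        return "linkedin_profile"
    return "general"
```

Keeping the decision in one function makes it straightforward to add further site-specific strategies later without touching the general path.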
When to Use It?
The WebsiteScraper node is especially useful in cases such as:
- Extracting structured content from profiles, news articles, or product pages
- Gathering text-based data for further analysis
- Automating web content extraction within workflows
For best results, ensure the url provided is specific to the content you want to extract. The node can handle both single and multiple URLs for flexible usage.
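A quick pre-check can catch URLs that are malformed or too general before they are passed to the node. The sketch below is a rough heuristic under stated assumptions: check_url is a hypothetical helper, and treating a bare site root as "not specific" is a simplification that will not fit every site.

```python
from typing import List
from urllib.parse import urlparse


def check_url(url: str) -> List[str]:
    """Return a list of problems found with a URL (empty if none)."""
    parts = urlparse(url)
    problems = []

    if parts.scheme not in ("http", "https"):
        problems.append("missing or unsupported scheme")
    if not parts.netloc:
        problems.append("missing host")
    # Heuristic: a site root with no query rarely targets one piece of content.
    if parts.path in ("", "/") and not parts.query:
        problems.append("points at a site root rather than a specific page")

    return problems
```

For a list of URLs, the same check can be applied to each entry, for example with [u for u in urls if not check_url(u)] to keep only the entries that pass.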