Purpose: Traverse a website and collect content from all accessible subpages.
Overview
The Crawl function automatically navigates through a website, discovering and retrieving content from all connected pages. It's ideal for comprehensive data collection across entire websites without requiring a sitemap.
When to Use
- Gathering data from an entire website
- Building a complete content inventory
- Collecting information across multiple related pages
- Understanding website structure and content distribution
- Archiving website content
Key Features
- Automatic Discovery: Finds and crawls all accessible subpages
- No Sitemap Required: Works without needing a pre-built sitemap
- Broad Coverage: Collects content across the site's linked structure, up to the page limit
- Configurable Format: Returns content in your specified format
- Respects Site Structure: Follows the website's internal linking
Limitations
- Page Limit: At most 10 pages per crawl operation
- Scope: Limited to accessible, linked pages from the starting URL
- Rate Limiting: Respects website server limits and robots.txt rules
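The discovery behavior described above can be sketched as a breadth-first traversal from the starting URL that stops at the page limit. This is an illustrative sketch, not the tool's actual implementation: the `pages` dict stands in for live HTTP fetches and link extraction, and a real crawler would also honor robots.txt and server rate limits.

```python
from collections import deque

MAX_PAGES = 10  # the 10-page cap noted in Limitations

def crawl(start_url, pages):
    """Return the list of URLs visited, in discovery order."""
    visited = []
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(visited) < MAX_PAGES:
        url = queue.popleft()
        visited.append(url)
        for link in pages.get(url, []):  # links found on this page
            if link not in seen:         # skip already-discovered pages
                seen.add(link)
                queue.append(link)
    return visited

# Mock site: each URL maps to the links found on that page.
site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/blog": ["/blog/post-1"],
}
print(crawl("/", site))  # ['/', '/docs', '/blog', '/docs/api', '/blog/post-1']
```

Because traversal is breadth-first, pages closest to the starting URL are collected first, which matters when the 10-page cap truncates a larger site.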
Input Requirements
- Starting URL: The root or entry point of the website to crawl
- Format Preference: Desired output format (markdown, JSON, etc.)
Output
- Page Content: Organized content from each crawled page, up to the 10-page limit
- URL Mapping: List of all pages discovered and crawled
- Structured Format: Content organized according to your specified format
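The exact response schema is not specified here; as an illustration only, a result combining the URL mapping with per-page content (here in markdown format) might look like:

```json
{
  "urls_crawled": [
    "https://example.com/",
    "https://example.com/docs"
  ],
  "pages": [
    { "url": "https://example.com/", "content": "# Home\n..." },
    { "url": "https://example.com/docs", "content": "# Docs\n..." }
  ]
}
```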
Example Use Cases
- Collecting all product listings from an e-commerce site
- Gathering documentation from a knowledge base
- Archiving a small website's complete content
- Analyzing content across multiple related pages
- Building a content database from a website
