This is a simple web crawler implemented in Go. It crawls a given website up to a specified number of pages using concurrent goroutines.
crawler <baseURL> <maxConcurrency> <maxPages>
- `baseURL`: The starting URL to begin crawling from.
- `maxConcurrency`: The maximum number of concurrent goroutines to use for crawling.
- `maxPages`: The maximum number of pages to crawl.
- Concurrent crawling using goroutines
- Configurable maximum concurrency and page limit
- Prints a report of crawled pages and their links
- The program takes command-line arguments for the base URL, maximum concurrency, and maximum pages to crawl.
- It configures the crawler with the provided parameters.
- The crawling process starts from the base URL.
- The program uses goroutines to crawl pages concurrently, respecting the maximum concurrency limit (see the sketch after this list).
- It continues crawling until it reaches the maximum number of pages or exhausts all links.
- Finally, it prints a report of the crawled pages and their links.
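The flow above can be illustrated with a minimal, hedged sketch. The `config` struct, `crawlPage`, and `fetchLinks` names below are illustrative assumptions rather than the project's actual identifiers: a buffered channel acts as a semaphore to cap concurrency, a mutex guards the shared page map, and a `sync.WaitGroup` waits for all goroutines to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// config mirrors the crawler state described above; the field and
// function names here are illustrative, not the project's actual API.
type config struct {
	pages              map[string]int // normalized URL -> times seen
	maxPages           int
	mu                 *sync.Mutex
	concurrencyControl chan struct{} // buffered channel used as a semaphore
	wg                 *sync.WaitGroup
}

// crawlPage records one page and spawns goroutines for the links it
// finds, blocking on the semaphore to respect the concurrency limit.
func (cfg *config) crawlPage(rawCurrentURL string) {
	cfg.concurrencyControl <- struct{}{} // acquire a slot
	defer func() {
		<-cfg.concurrencyControl // release the slot
		cfg.wg.Done()
	}()

	cfg.mu.Lock()
	if len(cfg.pages) >= cfg.maxPages {
		cfg.mu.Unlock()
		return
	}
	cfg.pages[rawCurrentURL]++
	firstVisit := cfg.pages[rawCurrentURL] == 1
	cfg.mu.Unlock()
	if !firstVisit {
		return
	}

	// fetchLinks stands in for the real fetching/parsing step.
	for _, link := range fetchLinks(rawCurrentURL) {
		cfg.wg.Add(1)
		go cfg.crawlPage(link)
	}
}

// fetchLinks is a stub so the sketch compiles; the real crawler would
// perform an HTTP GET and extract links from the returned HTML.
func fetchLinks(url string) []string {
	return nil
}

func main() {
	cfg := &config{
		pages:              map[string]int{},
		maxPages:           100,
		mu:                 &sync.Mutex{},
		concurrencyControl: make(chan struct{}, 5),
		wg:                 &sync.WaitGroup{},
	}
	cfg.wg.Add(1)
	go cfg.crawlPage("https://example.com")
	cfg.wg.Wait()

	// Report; the real crawler also reports the links found per page.
	for page, count := range cfg.pages {
		fmt.Printf("%s: seen %d times\n", page, count)
	}
}
```

Using a buffered channel as a semaphore keeps the limit enforcement local to each goroutine, so no separate worker pool is needed.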
go run main.go https://example.com 5 100
This command will start crawling https://example.com using a maximum of 5 concurrent goroutines and will stop after crawling 100 pages or when there are no more links to crawl.
The program includes basic error handling (sketched below) for:
- Insufficient or excessive command-line arguments
- Invalid input for `maxConcurrency` and `maxPages` (non-integer values)
- Configuration errors
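As a rough illustration of this validation, here is a sketch that assumes the arguments arrive through `os.Args`; the real project's messages and exit codes may differ:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// Expect exactly three arguments after the program name.
	args := os.Args[1:]
	if len(args) < 3 {
		fmt.Println("not enough arguments provided")
		fmt.Println("usage: crawler <baseURL> <maxConcurrency> <maxPages>")
		os.Exit(1)
	}
	if len(args) > 3 {
		fmt.Println("too many arguments provided")
		os.Exit(1)
	}

	baseURL := args[0]

	// Both numeric arguments must parse as integers.
	maxConcurrency, err := strconv.Atoi(args[1])
	if err != nil {
		fmt.Printf("invalid maxConcurrency: %v\n", err)
		os.Exit(1)
	}
	maxPages, err := strconv.Atoi(args[2])
	if err != nil {
		fmt.Printf("invalid maxPages: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("starting crawl of %s (concurrency=%d, maxPages=%d)\n",
		baseURL, maxConcurrency, maxPages)
}
```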
This project uses only the Go standard library and does not require any external dependencies.
Contributions are welcome! Please feel free to submit a Pull Request.