This is a simple web crawler implemented in Go. It crawls a given website up to a specified number of pages using concurrent goroutines.
crawler <baseURL> <maxConcurrency> <maxPages>
- `baseURL`: The starting URL to begin crawling from.
- `maxConcurrency`: The maximum number of concurrent goroutines to use for crawling.
- `maxPages`: The maximum number of pages to crawl.
- Concurrent crawling using goroutines
- Configurable maximum concurrency and page limit
- Prints a report of crawled pages and their links
- The program takes command-line arguments for the base URL, maximum concurrency, and maximum pages to crawl.
- It configures the crawler with the provided parameters.
- The crawling process starts from the base URL.
- The program uses goroutines to crawl pages concurrently, respecting the maximum concurrency limit (see the sketch after this list).
- It continues crawling until it reaches the maximum number of pages or exhausts all links.
- Finally, it prints a report of the crawled pages and their links.
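The flow above can be illustrated with a minimal, hedged sketch. The `config` struct, `crawlPage`, and `fetchLinks` names below are illustrative assumptions rather than the project's actual identifiers: a buffered channel acts as a semaphore to cap concurrency, a mutex guards the shared page map, and a `sync.WaitGroup` waits for all goroutines to finish.

```go
package main

import (
	"fmt"
	"sync"
)

// config mirrors the crawler state described above; the field and
// function names here are illustrative, not the project's actual API.
type config struct {
	pages              map[string]int // normalized URL -> times seen
	maxPages           int
	mu                 *sync.Mutex
	concurrencyControl chan struct{} // buffered channel used as a semaphore
	wg                 *sync.WaitGroup
}

// crawlPage records one page and spawns goroutines for the links it
// finds, blocking on the semaphore to respect the concurrency limit.
func (cfg *config) crawlPage(rawCurrentURL string) {
	cfg.concurrencyControl <- struct{}{} // acquire a slot
	defer func() {
		<-cfg.concurrencyControl // release the slot
		cfg.wg.Done()
	}()

	cfg.mu.Lock()
	if len(cfg.pages) >= cfg.maxPages {
		cfg.mu.Unlock()
		return
	}
	cfg.pages[rawCurrentURL]++
	firstVisit := cfg.pages[rawCurrentURL] == 1
	cfg.mu.Unlock()
	if !firstVisit {
		return
	}

	// fetchLinks stands in for the real fetching/parsing step.
	for _, link := range fetchLinks(rawCurrentURL) {
		cfg.wg.Add(1)
		go cfg.crawlPage(link)
	}
}

// fetchLinks is a stub so the sketch compiles; the real crawler would
// perform an HTTP GET and extract links from the returned HTML.
func fetchLinks(url string) []string {
	return nil
}

func main() {
	cfg := &config{
		pages:              map[string]int{},
		maxPages:           100,
		mu:                 &sync.Mutex{},
		concurrencyControl: make(chan struct{}, 5),
		wg:                 &sync.WaitGroup{},
	}
	cfg.wg.Add(1)
	go cfg.crawlPage("https://example.com")
	cfg.wg.Wait()

	// Report; the real crawler also reports the links found per page.
	for page, count := range cfg.pages {
		fmt.Printf("%s: seen %d times\n", page, count)
	}
}
```

Using a buffered channel as a semaphore keeps the limit enforcement local to each goroutine, so no separate worker pool is needed.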
go run main.go https://example.com 5 100
This command will start crawling https://example.com using a maximum of 5 concurrent goroutines and will stop after crawling 100 pages or when there are no more links to crawl.
The program includes basic error handling (sketched below) for:
- Insufficient or excessive command-line arguments
- Invalid input for `maxConcurrency` and `maxPages` (non-integer values)
- Configuration errors
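As a rough illustration of this validation, here is a sketch that assumes the arguments arrive through `os.Args`; the real project's messages and exit codes may differ:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// Expect exactly three arguments after the program name.
	args := os.Args[1:]
	if len(args) < 3 {
		fmt.Println("not enough arguments provided")
		fmt.Println("usage: crawler <baseURL> <maxConcurrency> <maxPages>")
		os.Exit(1)
	}
	if len(args) > 3 {
		fmt.Println("too many arguments provided")
		os.Exit(1)
	}

	baseURL := args[0]

	// Both numeric arguments must parse as integers.
	maxConcurrency, err := strconv.Atoi(args[1])
	if err != nil {
		fmt.Printf("invalid maxConcurrency: %v\n", err)
		os.Exit(1)
	}
	maxPages, err := strconv.Atoi(args[2])
	if err != nil {
		fmt.Printf("invalid maxPages: %v\n", err)
		os.Exit(1)
	}

	fmt.Printf("starting crawl of %s (concurrency=%d, maxPages=%d)\n",
		baseURL, maxConcurrency, maxPages)
}
```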
This project uses only the Go standard library and does not require any external dependencies.
Contributions are welcome! Please feel free to submit a Pull Request.