This is a simple web crawler written in PHP.
Note: The minimum requirement is PHP 5.3+.
Run `index.php` from the command line or in a browser.
The application lists 20 URLs by default.
To list more, pass a higher value to `$crawler->setLinksAllowed(Your Value)`.
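A link limit like the one set through `setLinksAllowed()` can be pictured as a breadth-first crawl that stops once enough URLs have been collected. The sketch below is not the project's `WebCrawler` code. The class `LinkLimitedCrawler` and its `crawl()` method are hypothetical, and page fetching is injected as a closure so the loop can run offline against an in-memory link graph. It sticks to PHP 5.3 syntax (`array()` rather than `[]`, no scalar type hints).

```php
<?php
// Hypothetical sketch: LinkLimitedCrawler is not the project's WebCrawler class.
class LinkLimitedCrawler
{
    private $linksAllowed = 20; // same default as the README: 20 URLs

    public function setLinksAllowed($value)
    {
        $this->linksAllowed = (int) $value;
    }

    /**
     * Breadth-first crawl from $startUrl.
     * $getLinks is a closure: function ($url) returning an array of URLs found on that page.
     * Returns at most $linksAllowed URLs (start URL included) in discovery order.
     */
    public function crawl($startUrl, $getLinks)
    {
        $found = array($startUrl);
        $seen  = array($startUrl => true);
        $queue = array($startUrl);

        while (!empty($queue) && count($found) < $this->linksAllowed) {
            $url = array_shift($queue);
            foreach (call_user_func($getLinks, $url) as $link) {
                if (isset($seen[$link])) {
                    continue; // already recorded
                }
                $seen[$link] = true;
                $found[] = $link;
                $queue[] = $link;
                if (count($found) >= $this->linksAllowed) {
                    break; // limit reached
                }
            }
        }
        // Trim in case the limit is smaller than 1.
        return array_slice($found, 0, max(0, $this->linksAllowed));
    }
}

// Usage with an in-memory link graph instead of real HTTP requests.
$graph = array(
    'http://a.test/'  => array('http://a.test/1', 'http://a.test/2'),
    'http://a.test/1' => array('http://a.test/3', 'http://a.test/'),
    'http://a.test/2' => array('http://a.test/4'),
);
$crawler = new LinkLimitedCrawler();
$crawler->setLinksAllowed(4);
$urls = $crawler->crawl('http://a.test/', function ($url) use ($graph) {
    return isset($graph[$url]) ? $graph[$url] : array();
});
echo implode(PHP_EOL, $urls), PHP_EOL;
```

With a limit of 4 the crawl stops after `http://a.test/3`, so `http://a.test/4` is never recorded; calling `setLinksAllowed(5)` instead would include it.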
- The crawler uses a main controller class, `Crawler/WebCrawler.php`.
- The class `Crawler/Request/RequestAbstract.php` is responsible for handling each request and parsing the returned page. The class is `abstract` so that other ways of sending requests can be added in future, e.g. cURL or Simple HTML DOM.
- The class `Crawler/Request/RequestRegular.php` inherits from `Crawler/Request/RequestAbstract.php` and uses the default `file_get_contents()` function and `DOMElement` objects to fetch and parse the links, respectively.
- The object `Crawler/WebPage/Webpage.php` holds all the properties of each page (or URL). New properties, such as the title and other details, can be added to it.
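The split between an abstract request class, a regular subclass built on `file_get_contents()` and the DOM extension, and a page object could look roughly like the sketch below. All class names (`RequestAbstractSketch`, `RequestRegularSketch`, `WebpageSketch`) and methods (`fetch()`, `parseLinks()`, `getLinks()`) are hypothetical and only illustrate the described design; they are not the project's actual code. A `data://` URL stands in for a real page so the example runs without network access.

```php
<?php
// Hypothetical class names: a sketch of the design described above,
// not the project's actual RequestAbstract / RequestRegular / Webpage code.

abstract class RequestAbstractSketch
{
    /** Return the raw HTML for $url, or false on failure (transport chosen by the subclass). */
    abstract protected function fetch($url);

    /** Return the unique href values of the <a> tags in $html. */
    abstract public function parseLinks($html);

    /** Fetch a page, then parse its links; shared by every subclass. */
    public function getLinks($url)
    {
        $html = $this->fetch($url);
        if (!is_string($html) || $html === '') {
            return array(); // failed or empty fetch: no links
        }
        return $this->parseLinks($html);
    }
}

class RequestRegularSketch extends RequestAbstractSketch
{
    protected function fetch($url)
    {
        // file_get_contents() returns false on failure; @ hides the warning.
        return @file_get_contents($url);
    }

    public function parseLinks($html)
    {
        if ($html === '') {
            return array(); // DOMDocument::loadHTML() rejects empty input
        }
        $doc = new DOMDocument();
        // Real pages are often malformed HTML; collect libxml errors instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $doc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $links = array();
        foreach ($doc->getElementsByTagName('a') as $anchor) { // each $anchor is a DOMElement
            $href = trim($anchor->getAttribute('href'));
            if ($href !== '') {
                $links[] = $href;
            }
        }
        return array_values(array_unique($links));
    }
}

class WebpageSketch
{
    public $url;
    public $links = array();
    // Further properties (for example $title) could be added here.

    public function __construct($url)
    {
        $this->url = $url;
    }
}

// Usage: a data:// URL stands in for a real page so the sketch runs offline.
$html = '<html><body><a href="/a">A</a> <a href="/b">B</a> <a href="/a">A again</a></body></html>';
$page = new WebpageSketch('data://text/html;base64,' . base64_encode($html));
$request = new RequestRegularSketch();
$page->links = $request->getLinks($page->url);
echo implode(', ', $page->links), PHP_EOL; // prints "/a, /b"
```

Because `getLinks()` lives in the abstract class, a cURL-based subclass would only need its own `fetch()` (and optionally its own `parseLinks()`), which matches the README's reason for making the request class abstract.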
Note: For longer crawls, raise the value in `set_time_limit(220);`, which sets the maximum execution time in seconds.