dotnepal/WebInfoScraper

This branch is 1 commit ahead of, 1 commit behind bikal-basnet/WebInfoScraper:master.

WebInfoScraper

Only the implementation code is included. The core code is proprietary to the organisation and is therefore excluded.

  1. SiteCrawler : Crawls a site and collects all the links in it. Handles pagination, login, and multiple threads, with skip patterns that can be customised on each call.

  2. WebInfoRetrriever : Retrieves product information from product URLs.

       a. Configuration driven information extractor.
    
       b. Multiple thread support
    
       c. Can retrieve product information from websites that require login.
    
       d. The login field type and name can be supplied explicitly, to uniquely identify the login form.
    
  3. BulkImageRetriver : Given a list of image links, downloads all the images to the destination folder.
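
Since the core code is not part of this repository, the following is only a rough sketch of how link collection with customisable skip patterns, as described for SiteCrawler, might look. The names `LinkCollector`, `extract_links`, and `skip_patterns` are hypothetical and are not taken from the project; the sketch uses only the Python standard library and handles a single page of HTML, without pagination, login, or threading.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect absolute link targets from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(html, base_url, skip_patterns=()):
    """Return de-duplicated links from html, minus those matching a skip pattern.

    skip_patterns is a hypothetical parameter: an iterable of regular
    expressions; a link is dropped if any of them matches part of it.
    """
    compiled = [re.compile(pattern) for pattern in skip_patterns]
    collector = LinkCollector(base_url)
    collector.feed(html)

    seen = set()
    kept = []
    for link in collector.links:
        if link in seen:
            continue
        seen.add(link)
        if any(pattern.search(link) for pattern in compiled):
            continue
        kept.append(link)
    return kept
```

A caller might pass patterns such as `[r"/logout", r"\.pdf$"]` on each call to keep the crawl away from sign-out links and documents. A full crawler would feed the returned links back into a queue of pages still to visit.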

Languages

  • Python 100.0%