WebInfoScraper

Only the implementation code is included. The core code is proprietary to the organization and is therefore excluded.

SiteCrawler : Crawls a site and collects all the links on it. Handles pagination, login, and multiple threads, with skip patterns that can be customised for each call.
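To illustrate the idea of crawling with skip patterns, here is a minimal sketch. It is not the proprietary core code: `crawl`, `LinkParser`, `fetch_html`, `skip_patterns`, and `max_pages` are hypothetical names chosen for this example, and pagination, login, and threading are left out.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    """Default fetcher (assumption): download a page and decode it as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, skip_patterns=(), fetch=fetch_html, max_pages=100):
    """Breadth-first crawl of one site; returns the set of URLs found."""
    skips = [re.compile(p) for p in skip_patterns]
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        parser = LinkParser()
        parser.feed(fetch(url))
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc != host:
                continue  # stay on the same site
            if link in seen or any(s.search(link) for s in skips):
                continue  # already queued, or excluded by a skip pattern
            seen.add(link)
            queue.append(link)
    return seen
```

A call such as `crawl("https://example.com/", skip_patterns=[r"/logout", r"\.pdf$"])` would collect same-site links while ignoring any URL that matches one of the patterns.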

WebInfoRetriever : Retrieves product information from product URLs.

   a. Configuration-driven information extractor.

   b. Multi-threading support.

   c. Can retrieve product information from websites that require login.

   d. The login field type and name can be supplied explicitly to uniquely identify the login form.
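A configuration-driven extractor with thread support could look roughly like the sketch below. The configuration format (field name mapped to a regular expression with one capture group) and the names `PRODUCT_CONFIG`, `extract_fields`, and `extract_many` are assumptions made for this example; the actual configuration schema lives in the excluded core code. Login handling is not shown.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical configuration: field name -> regex with one capture group.
PRODUCT_CONFIG = {
    "name": r"<h1[^>]*>(.*?)</h1>",
    "price": r'class="price"[^>]*>\s*([^<]+?)\s*<',
    "sku": r'data-sku="([^"]+)"',
}


def extract_fields(html, config):
    """Apply each configured pattern to the page; missing fields map to None."""
    result = {}
    for field, pattern in config.items():
        match = re.search(pattern, html, flags=re.DOTALL | re.IGNORECASE)
        result[field] = match.group(1).strip() if match else None
    return result


def extract_many(urls, config, fetch, workers=4):
    """Fetch and extract several product pages using a pool of threads.

    `fetch` is a caller-supplied function that takes a URL and returns HTML.
    """
    def one(url):
        return url, extract_fields(fetch(url), config)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(one, urls))
```

Keeping the patterns in a dictionary means a new site can be supported by changing the configuration rather than the extraction code.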

BulkImageRetriever (BulkImageDownloader.py) : Given a list of image links, downloads all the images to the destination folder.
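The bulk download step can be sketched as follows. This is only an illustration under stated assumptions: `download_images`, `read_bytes`, and the `workers` parameter are hypothetical names, and the use of a thread pool is a choice made for this example, not something the repository confirms.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
from urllib.request import urlopen


def read_bytes(url):
    """Default fetcher (assumption): return the raw bytes behind a URL."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def download_images(urls, dest_dir, fetch=read_bytes, workers=8):
    """Save every URL into dest_dir.

    Returns a dict mapping each URL to its saved path, or None on failure.
    """
    os.makedirs(dest_dir, exist_ok=True)

    def save(indexed):
        index, url = indexed
        # Use the last path segment as the file name, with a fallback.
        name = os.path.basename(urlparse(url).path) or f"image_{index}"
        path = os.path.join(dest_dir, name)
        try:
            data = fetch(url)
        except OSError:  # urllib's URLError is a subclass of OSError
            return url, None
        with open(path, "wb") as fh:
            fh.write(data)
        return url, path

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(save, enumerate(urls)))
```

Note that in this sketch two URLs ending in the same file name overwrite each other; a real downloader would need a naming scheme that avoids such collisions.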

Latest commit: samundra, "remove SG API key", d62e400, Aug 2, 2017 (6 commits)

Name	Last commit message	Last commit date
framework	remove SG API key	Aug 2, 2017
BulkImageDownloader.py	Added WEb infor retriver code implementation	Jul 17, 2016
README.md	Create README.md	Jul 17, 2016
SiteCrawler.py	Added WEb infor retriver code implementation	Jul 17, 2016
WebInfoRetriever.py	Added WEb infor retriver code implementation	Jul 17, 2016

dotnepal/WebInfoScraper
