An Adidas Database

A typical data processing pipeline has three phases: collect, prepare, and access.
This repo follows that pipeline to build an Adidas shoe database. The database is then queried, and statistical insights derived from it are presented on a website.

Workflow

Collect: scrape shoe product information from adidas.de
Prepare: set up a MongoDB database and insert the scraped information into it
Access: query the database, write the results to an XML file, and write XSLT templates to transform the XML into HTML
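
The first half of the Access step (query results written as XML) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in prepare_access.py: the element names (`shoes`, `shoe`), the field names (`name`, `price`), and the sample values are all made up, and a plain list of dictionaries stands in for the documents a MongoDB query would return.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records):
    """Build an XML tree from query results given as a list of dicts."""
    root = ET.Element("shoes")  # hypothetical root element name
    for record in records:
        shoe = ET.SubElement(root, "shoe")
        for key, value in record.items():
            # Each key becomes a child element, so keys must be valid XML names.
            child = ET.SubElement(shoe, key)
            child.text = str(value)
    return root


# Stand-in for documents returned by a database query (placeholder values).
records = [
    {"name": "Example Runner", "price": 89.95},
    {"name": "Example Trainer", "price": 120.0},
]

root = records_to_xml(records)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

To write the result to disk instead of printing it, `ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)` can be used.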

File Structure

├── README.md
├── Scraper
│   ├── adidas_men_scraper.py
│   ├── adidas_woman_scraper.py
│   └── check_robots_adidas.py
├── html
│   ├── histogram.html
│   ├── scatter.html
│   ├── statistics.html
│   └── style.css
├── json_files
│   ├── shoe_men_dict_adidas_without_review.json
│   └── shoe_women_dict_adidas_without_review.json
├── prepare_access.py
├── requirement_scraper.txt
├── requirement_textTech.txt
├── workflow.png
├── xml
│   └── out.xml
└── xslt
    ├── xlst_template_hist.xml
    ├── xlst_template_scatter.xml
    └── xlst_template_statistics.xml

Notes on running files

Scraper

To run the scraper, install all packages listed in requirement_scraper.txt. In addition, make sure the following are downloaded and set up:

Chrome
ChromeDriver matching the installed Chrome version
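
Before scraping, it is good practice to check which paths the site's robots.txt allows. The file check_robots_adidas.py presumably serves that purpose; the sketch below is not that script, only an illustration of the idea with the standard library. The robots.txt content and the URL paths are hypothetical, and the real rules at adidas.de may differ.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The paths below are made up for the example.
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://www.adidas.de/shoes"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://www.adidas.de/private/data"))
```

To check the live file instead of a string, `RobotFileParser` also offers `set_url(...)` followed by `read()`, which requires network access.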

Database

Before running the functions in prepare_access.py, make sure all packages in requirement_textTech.txt are installed. In addition, make sure to:

install and set up a local MongoDB database
start the MongoDB server before running prepare_access.py
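
The Prepare step, loading scraped JSON into a local MongoDB collection, might look roughly like the sketch below. Several things here are assumptions rather than facts about this repo: that the JSON files hold a dictionary mapping a product key to a dictionary of attributes, that pymongo is the driver in use, and the database and collection names (`adidas`, `shoes`) and the `product_key` field, which are hypothetical.

```python
import json


def load_records(json_text):
    """Turn a JSON object keyed by product into a list of documents."""
    data = json.loads(json_text)
    # Assumption: the top level is a dict of {product_key: {attribute: value}}.
    return [{"product_key": key, **attrs} for key, attrs in data.items()]


def insert_records(collection, records):
    """Insert documents into a collection and return how many were inserted."""
    if not records:
        # pymongo's insert_many rejects an empty list, so return early.
        return 0
    result = collection.insert_many(records)
    return len(result.inserted_ids)


def main():
    # Imported here so the helpers above can be used without pymongo installed.
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # default local server
    collection = client["adidas"]["shoes"]  # hypothetical names
    path = "json_files/shoe_men_dict_adidas_without_review.json"
    with open(path, encoding="utf-8") as f:
        records = load_records(f.read())
    print(insert_records(collection, records), "documents inserted")
```

Calling `main()` requires a running local MongoDB server; the two helper functions can be exercised without one.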

Running prepare_access.py creates out.xml in the xml folder and the HTML files in the html folder.
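
The XML-to-HTML transformation can be illustrated as follows. Python's standard library has no XSLT processor, so this sketch assumes lxml is installed; whether prepare_access.py uses lxml is not confirmed here. The sample XML and the stylesheet are hypothetical and much simpler than the templates in the xslt folder.

```python
def transform_xml(xml_text, xslt_text):
    """Apply an XSLT stylesheet to an XML string and return the output as text."""
    # Assumption: lxml is available. It is imported here to keep the dependency local.
    from lxml import etree

    xml_doc = etree.fromstring(xml_text.encode("utf-8"))
    xslt_doc = etree.fromstring(xslt_text.encode("utf-8"))
    transform = etree.XSLT(xslt_doc)
    return str(transform(xml_doc))


# Hypothetical input document with a placeholder product name.
SAMPLE_XML = "<shoes><shoe><name>Example Runner</name></shoe></shoes>"

# Hypothetical stylesheet that renders each shoe name as a list item.
SAMPLE_XSLT = """\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/shoes">
    <html><body><ul>
      <xsl:for-each select="shoe">
        <li><xsl:value-of select="name"/></li>
      </xsl:for-each>
    </ul></body></html>
  </xsl:template>
</xsl:stylesheet>
"""
```

With lxml installed, `transform_xml(SAMPLE_XML, SAMPLE_XSLT)` returns an HTML string. The same function could be pointed at the contents of xml/out.xml and one of the templates in the xslt folder, with the result written into the html folder.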

The style of the website (fonts, colors, spacing, and so on) can be changed with CSS in the style.css file.

Yen444/adidasDatabase
