Yen444/adidasDatabase

An Adidas Database

  • A typical data processing pipeline has three phases: collect, prepare, and access.
  • This project builds a database of Adidas shoes, queries it, and presents statistics derived from the data on a website.

Workflow

  • Collect: scrape shoe product information from adidas.de
  • Prepare: set up a MongoDB database and insert the scraped data into it
  • Access: query the database, write the results to an XML file, and use XSLT templates to transform the XML into HTML
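
Before the Collect step, a scraper would normally confirm that the pages it targets are permitted by the site's robots.txt. The repository includes check_robots_adidas.py for this purpose, but its contents are not shown here, so the following is only a minimal sketch using Python's standard urllib.robotparser. The sample rules, the URL paths, and the helper name is_allowed are hypothetical and chosen so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (NOT the real adidas.de rules),
# embedded as a string so the example needs no network access.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Allow: /
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts an iterable of lines, so no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs, for illustration only.
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://www.adidas.de/example-shoe"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "https://www.adidas.de/checkout/cart"))
```

For a live check, RobotFileParser can also load the rules from the site itself via set_url() followed by read().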

File Structure

├── README.md
├── Scraper
│   ├── adidas_men_scraper.py
│   ├── adidas_woman_scraper.py
│   └── check_robots_adidas.py
├── html
│   ├── histogram.html
│   ├── scatter.html
│   ├── statistics.html
│   └── style.css
├── json_files
│   ├── shoe_men_dict_adidas_without_review.json
│   └── shoe_women_dict_adidas_without_review.json
├── prepare_access.py
├── requirement_scraper.txt
├── requirement_textTech.txt
├── workflow.png
├── xml
│   └── out.xml
└── xslt
    ├── xlst_template_hist.xml
    ├── xlst_template_scatter.xml
    └── xlst_template_statistics.xml

Notes on running files

Scraper

To run the scraper, install all packages listed in requirement_scraper.txt. In addition, make sure the following are downloaded and set up:

Database

Before running the functions in prepare_access.py, install all packages listed in requirement_textTech.txt. In addition, make sure to:

Running prepare_access.py creates out.xml in the xml folder and the HTML files in the html folder.
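
To illustrate the XML-writing part of the Access step, here is a minimal sketch of turning query results into an XML string with the standard library. It is not the code in prepare_access.py: the function name records_to_xml, the element names shoes and shoe, and the name and price fields are all assumptions made for illustration. The records are plain dictionaries standing in for documents returned by a database query.

```python
import xml.etree.ElementTree as ET


def records_to_xml(records, root_tag="shoes", item_tag="shoe"):
    """Serialize a list of flat dicts to an XML string.

    Each dict becomes one <item_tag> element, and each key becomes a child
    element. Keys must therefore be valid XML element names.
    """
    root = ET.Element(root_tag)
    for record in records:
        item = ET.SubElement(root, item_tag)
        for key, value in record.items():
            child = ET.SubElement(item, key)
            # ElementTree escapes special characters such as & and < on output.
            child.text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical records; real field names depend on the scraped data.
sample = [
    {"name": "Example Runner", "price": 99.95},
    {"name": "Example Trainer", "price": 59.95},
]
print(records_to_xml(sample))
```

An XML document of this shape is what an XSLT template would then transform into an HTML page.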

The style of the website (fonts, colors, spacing, and so on) can be changed by editing the CSS in the style.css file.
