Scrap-a-zon

Author: Tarek El-Hajjaoui

Description of Program

Background on the site choice

A Webscraper for https://books.toscrape.com, a book e-commerce website that is intended for users to webscrape from. Originally I wanted to webscrape Amazon.com, but Amazon is very difficult to webscrape because each search result can give very different html pages and Amazon purposely designs their frontend to make it difficult their website. Using https://books.toscrape.com is also better because it is a stable website and so this project can serve to be a portfolio project for a longer time than an actual company e-commerce site.

What the program does

Upon opening the application, the user is presented with menu items and a search bar with a submit button. The user can search any of the tags (menu item options) in the search bar which will invokes the WebScraper to scrape the book items of the user inputted category from https://books.toscrape.com. The results are presented in a csv (comma separated value) format. The columns are: book title, book price, and availability. Each category's results can be verified by visiting and looking through https://books.toscrape.com. After the results for a category appears, a save button appears to the left of the results. If clicked, the results are written in a csv file called "output.csv" within the same directory; the absolute path is shown to the user for thoroughness. The user can then search for another category.

Edge cases and error handling

If the Webscraper is not able to connect to https://books.toscrape.com, an error message appears on the frontend and console.
If the user inputted category is invalid, an error message appears on the frontend and console.
If the user adds extra or missing spaces, has different capitalizations, but still spells the category correctly, the results of the category shall still appear.

Pre-requisites

Before running the program:

Have Java version 8 installed on computer.
Have Apache Ant(TM) version 1.10.11 or above installed on computer.
Have Netbeans 12.4 or above installed

Important

Instructions to run the program (if using .jar file)

In order for Jsoup to work from the .jar file, must follow these instructions.

Using the command line, navigate to the root directory of Scrap-A-zon.
cd (change directory) into the dist folder:

cd dist

To execute this command to start the application:

java -jar Scrap-a-zon.jar

Use the program.

Instructions to run the program (if downloading src code from GitHub):

Open the project with Netbeans.
Click on the play button.
Use the program.

Source Files (under src directory):

Controller directory

AppController.java
Test.java
WebScraper.java

Model directory

FileHelper.java
GridMatrix.java
Helper.java
Product.java
UserInput.java

View directory

App.java
App.fxml
app.css
assets

Sources referred to:

JavaFx documentation: https://docs.oracle.com/javase/8/javafx/api/toc.htm
JavaFx & FXML tutorial: https://www.youtube.com/watch?v=9XJicRt_FaI
Jsoup documentation: https://jsoup.org/apidocs/
Java create, write to, and delete file: https://www.w3schools.com/java/java_files_create.asp
Java ArrayList: https://docs.oracle.com/javase/8/docs/api/java/util/ArrayList.html

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
build		build
dist		dist
lib		lib
nbproject		nbproject
src		src
store		store
test/Controller		test/Controller
.classpath		.classpath
.gitignore		.gitignore
README.md		README.md
build.xml		build.xml
manifest.mf		manifest.mf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Scrap-a-zon

Author: Tarek El-Hajjaoui

Description of Program

Background on the site choice

What the program does

Edge cases and error handling

Pre-requisites

Before running the program:

Important

Instructions to run the program (if using .jar file)

Instructions to run the program (if downloading src code from GitHub):

Source Files (under src directory):

Controller directory

Model directory

View directory

Sources referred to:

About

Releases

Packages

Languages

tarekel96/Scrape-a-zon

Folders and files

Latest commit

History

Repository files navigation

Scrap-a-zon

Author: Tarek El-Hajjaoui

Description of Program

Background on the site choice

What the program does

Edge cases and error handling

Pre-requisites

Before running the program:

Important

Instructions to run the program (if using .jar file)

Instructions to run the program (if downloading src code from GitHub):

Source Files (under src directory):

Controller directory

Model directory

View directory

Sources referred to:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages