This repository was made as part of an assignment for the course 'Databases Advanced'. The goal of the assignment is to get familiar with services like MongoDB, Docker, virtual machines, etc. To keep things clear and easy to understand, the work is split into multiple smaller tasks.
The first task is to scrape the Blockchain website for all current Bitcoin (BTC) transactions worldwide. The output of this part of the assignment is the highest USD value at the moment of scraping. Running a web scraper permanently can become quite heavy for your computer, so I recommend running it in a cloud-based environment or a virtual machine and executing the script once every minute. (For me it is running on an Ubuntu virtual machine.)
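As a rough illustration of the idea (a minimal sketch, not the actual contents of scraper.py; fetch_transactions is a stand-in for the real scraping code):

import time
from datetime import datetime

def fetch_transactions():
    # Stand-in for the real scraping logic in scraper.py; assumed to return
    # a list of dicts shaped like the JSON entries shown further below:
    # {"Hash": ..., "Time": ..., "Amount (BTC)": ..., "Amount (USD)": ...}
    return []

def highest_usd(transactions):
    # Pick the transaction with the largest USD value.
    return max(transactions, key=lambda tx: float(tx["Amount (USD)"]))

while True:
    transactions = fetch_transactions()
    if transactions:
        top = highest_usd(transactions)
        print(f"{datetime.now():%Y-%m-%d %H:%M} highest transaction: {top['Amount (USD)']} USD")
    time.sleep(60)  # run once every minute, as recommended above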
Step 1: Clone my repository
git clone https://github.com/yorickcleerbout/Databases-Advanced.git
Step 2: Install required python packages
pip3 install -r requirements.txt
Step 3: Make the python script executable (Linux)
chmod +x scraper.py
Step 4: Run the Script
python3 scraper.py
At this point the highest amount in USD is printed to the terminal. I also added a feature that saves the highest amount to a results.json file, where the entries are grouped per date. With this file you can look up the highest trades on a specific day if you would like (see the sketch after the format below).
JSON Output Format:
{
    "yyyy-mm-dd": [
        {
            "Hash": "hash is here",
            "Time": "Time of transaction",
            "Amount (BTC)": "Amount of BTC",
            "Amount (USD)": "Amount in USD"
        }
    ]
}
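Because the file is grouped per date, picking out the highest trade for a given day is straightforward; a small sketch (assuming results.json is in the working directory and the USD amounts can be parsed as numbers):

import json

def highest_trade_on(day, path="results.json"):
    # 'day' uses the same "yyyy-mm-dd" keys as the file above.
    with open(path) as f:
        results = json.load(f)
    trades = results.get(day, [])
    if not trades:
        return None
    return max(trades, key=lambda tx: float(str(tx["Amount (USD)"]).replace(",", "")))

print(highest_trade_on("2021-01-01"))  # example date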
The next objective is to save the highest BTC transaction to a MongoDB collection. To accomplish this you need to download and install MongoDB. Since we are using an Ubuntu virtual machine, this is quite easy to do from the terminal (a small pymongo sketch follows the steps below).
Step 1: Install MongoDB
wget -qO - https://www.mongodb.org/static/pgp/server-4.4.asc | sudo apt-key add -
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/4.4 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-4.4.list
sudo apt-get update
sudo apt-get install -y mongodb-org
Step 2: Import python package
If you didn't install all the Python packages required for this project in Task 1 (via the requirements.txt file), you need to install the Python package for MongoDB manually.
pip3 install pymongo
Step 3: Start MongoDB Service
sudo systemctl start mongod
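To give an idea of what the MongoDB part boils down to, here is a minimal pymongo sketch; the database and collection names are placeholders of mine, not necessarily the ones used in this project:

from pymongo import MongoClient

# Connect to the local MongoDB service started above.
client = MongoClient("mongodb://localhost:27017/")
collection = client["btc"]["highest_transactions"]  # placeholder names

# 'highest' stands for the transaction selected by the scraper,
# shaped like the JSON entries from Task 1.
highest = {
    "Hash": "hash is here",
    "Time": "Time of transaction",
    "Amount (BTC)": "Amount of BTC",
    "Amount (USD)": "Amount in USD",
}
collection.insert_one(highest)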
For Task 1 you had to run the scraper.py file to execute the scraper; in this part of the assignment I changed it up a little bit. I created a file called main.py. From now on, the only file you need to run to use this project is main.py. As mentioned before, you need to make this file executable in order to use it.
(Reminder)
Step 1: Clone my repository
git clone https://github.com/yorickcleerbout/Databases-Advanced.git
Step 2: Install required python packages
pip3 install -r requirements.txt
Step 3: Make the python script executable (Linux)
chmod +x main.py
Step 4: Run the Script
python3 main.py
This task is all about the availability of the data during execution. Redis is a key-value database that I'm using to cache my scraped data temporarily. The way I implemented Redis: right after scraping, I immediately "save" the data in a Redis database that holds the information for about 1 minute; while the data is in Redis, my parser.py file reads it back out, filters for the highest value, and saves that to MongoDB.
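Sketched with the redis package (the key name and payload shape are illustrative; the real logic lives in scraper.py and parser.py):

import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Scraper side: cache the freshly scraped batch for about 1 minute.
def cache_transactions(transactions):
    r.setex("transactions", 60, json.dumps(transactions))  # 60 second TTL

# Parser side: read the cached batch and keep only the highest USD value,
# which can then be written to MongoDB as shown in Task 2.
def read_highest():
    raw = r.get("transactions")
    if raw is None:
        return None  # cache expired or nothing scraped yet
    transactions = json.loads(raw)
    return max(transactions, key=lambda tx: float(tx["Amount (USD)"]))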
Step 1: Install Redis
sudo apt install redis-server
Step 2: Import python package
If you didn't install all the Python packages required for this project in Task 1 (via the requirements.txt file), you need to install the Python package for Redis manually.
pip3 install redis
Step 3: Start Redis Service
sudo systemctl start redis
As always, just run main.py to use the full project.
For the last part of this assignment we had to split the project into Docker containers so that every single component runs in its own container. This makes it possible to run the project anywhere you want; the only thing you have to do is install Docker (Windows, Linux or macOS) and start the program.
Docker Desktop (Windows / macOS): https://www.docker.com/products/docker-desktop
Ubuntu: sudo apt install docker.io
You can pull my pre-built images from my Docker Hub profile, or you can create your own images using the Dockerfiles.
Scraper: https://hub.docker.com/repository/docker/yorickcleerbout/scraper
Parser: https://hub.docker.com/repository/docker/yorickcleerbout/parser
Put the following code in a file named Dockerfile (no extension), or download my Dockerfiles from this repository.
Scraper
FROM ubuntu:latest AS scraper
MAINTAINER yorickcleerbout
COPY . .
RUN apt-get update && apt-get install -y git
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN git clone https://github.com/yorickcleerbout/Databases-Advanced.git
RUN cd Databases-Advanced
RUN pip3 install requests
RUN pip3 install beautifulsoup4
RUN pip3 install pandas
RUN pip3 install pymongo
RUN pip3 install redis
RUN cp "Databases-Advanced/DockerVersion/scraper.py" .
CMD ["python3", "scraper.py"]
Parser
FROM ubuntu:latest AS parser
MAINTAINER yorickcleerbout
COPY . .
RUN apt-get update && apt-get install -y git
RUN apt-get install -y python3
RUN apt-get install -y python3-pip
RUN git clone https://github.com/yorickcleerbout/Databases-Advanced.git
RUN cd Databases-Advanced
RUN pip3 install requests
RUN pip3 install beautifulsoup4
RUN pip3 install pandas
RUN pip3 install pymongo
RUN pip3 install redis
RUN cp "Databases-Advanced/DockerVersion/parser.py" .
CMD ["python3", "parser.py"]
Mongo & Redis
docker pull mongo
docker pull redis
docker run --name scraper {imageID}
docker run --name parser {imageID}
docker run -p 27017:27017 --name mongo mongo
docker run --name redis redis
We need to create a network to connect these containers with each other.
docker network create {networkName}
docker network connect {networkName} {containerName}
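For example, with a (hypothetical) network called btc-net, connecting all four containers lets the scraper and parser reach the databases by container name (mongo and redis) instead of by IP address:
docker network create btc-net
docker network connect btc-net scraper
docker network connect btc-net parser
docker network connect btc-net mongo
docker network connect btc-net redis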
This assignment was an overall fun and educational experience. Python was of course nothing new to me, but working with MongoDB, Redis and Docker was totally new for me. I hope I can do more of these kinds of assignments in the future to further expand my knowledge and skills as part of a learning experience.