This repository was archived on 16/12/2021 and has not been maintained since May 2020.
A programme written by Sailesh Patel (160034811) to scrape information from course programme specification PDFs, as part of the FYP "A Chatbot for Assisting University Admission Process", supervised by Dr Sylvia Wong at Aston University.
- Clone the repository
- Install the required technologies listed above (the links are to their respective installation instructions)
Note: pip is not required, but it makes installing Tabula-Py, BeautifulSoup, and Requests easier
- Please ensure that all the software requirements have been met before executing the program
- To execute the program, run the command `python3 programme-scraper.py`
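
The requirement check in the steps above can be sketched in Python. This is a minimal sketch, not part of the repository. It assumes the usual import names (`tabula` for Tabula-Py, `bs4` for BeautifulSoup, `requests` for Requests) and that a Java runtime is needed because Tabula-Py wraps tabula-java. The function name `missing_requirements` is hypothetical.

```python
import importlib.util
import shutil

# Assumed import names: Tabula-Py -> tabula, BeautifulSoup -> bs4, Requests -> requests.
REQUIRED_MODULES = ("tabula", "bs4", "requests")
# Assumption: Tabula-Py wraps tabula-java, so `java` should be on PATH.
REQUIRED_EXECUTABLES = ("java",)


def missing_requirements(modules=REQUIRED_MODULES, executables=REQUIRED_EXECUTABLES):
    """Return the names of modules and executables that cannot be found."""
    missing = [name for name in modules if importlib.util.find_spec(name) is None]
    missing += [name for name in executables if shutil.which(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing requirements:", ", ".join(problems))
    else:
        print("All requirements found; run: python3 programme-scraper.py")
```

Running this before `programme-scraper.py` lists anything that still needs installing, without importing the libraries themselves.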
- To run the PDF scraper
  - Type `P` and press Enter
  - Type the PDF file name without the `.pdf` extension and press Enter
    - `BScComputerScience` shows the PDF scraper working
    - `BScDigitalDegreeApprenticeship` shows the PDF scraper not working
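
For illustration, the PDF step could look roughly like the sketch below. It is not the code in `programme-scraper.py`: the helper names `pdf_path_from_name` and `extract_tables` are hypothetical, and it assumes the PDF sits in the current directory and that the tables are read with Tabula-Py's `read_pdf`.

```python
from pathlib import Path


def pdf_path_from_name(name, directory="."):
    """Build the PDF path from a name typed without the .pdf extension."""
    cleaned = name.strip()
    # Tolerate a name typed with the extension by removing it first.
    if cleaned.lower().endswith(".pdf"):
        cleaned = cleaned[:-4]
    return Path(directory) / f"{cleaned}.pdf"


def extract_tables(name, directory="."):
    """Read every table in the named PDF (assumed approach, via Tabula-Py)."""
    import tabula  # imported here so the path helper works without Tabula-Py

    pdf_path = pdf_path_from_name(name, directory)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No such PDF: {pdf_path}")
    # pages="all" asks Tabula-Py to scan every page; it returns a list of DataFrames.
    return tabula.read_pdf(str(pdf_path), pages="all")
```

With `BScComputerScience.pdf` in the working directory, `extract_tables("BScComputerScience")` would return the tables Tabula-Py detects. How well that works depends on the PDF's layout, which may be why some specifications fail.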
- To run the web scraper
  - Type `W` and press Enter
  - Type `EAS` for the school and press Enter
  - Type the website you would like to scrape
    - Type `https://www2.aston.ac.uk/study/courses/computer-science-bsc` to show the web scraper working
    - Type `https://www2.aston.ac.uk/study/courses/chemistry-bsc` to show the web scraper fail to format the text inside the Entry Requirements & Fees for 2020
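
The web scraping step might be approached along the lines of the sketch below. It is an assumption-laden illustration, not the repository's implementation: the function names are hypothetical, the check that URLs live under `https://www2.aston.ac.uk/study/courses/` is inferred only from the two example URLs, and the page is fetched with Requests and parsed with BeautifulSoup's built-in `html.parser`.

```python
import re
from urllib.parse import urlparse


def is_course_url(url):
    """Check for an https URL under /study/courses/ on www2.aston.ac.uk (assumed pattern)."""
    parts = urlparse(url.strip())
    return (
        parts.scheme == "https"
        and parts.netloc == "www2.aston.ac.uk"
        and parts.path.startswith("/study/courses/")
    )


def collapse_whitespace(text):
    """Replace runs of spaces, tabs, and newlines with single spaces."""
    return re.sub(r"\s+", " ", text).strip()


def fetch_page_text(url, timeout=10):
    """Download a course page and return its visible text as one string."""
    import requests  # imported here so the helpers above work without these libraries
    from bs4 import BeautifulSoup

    if not is_course_url(url):
        raise ValueError(f"Not a recognised course URL: {url}")
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, "html.parser")
    return collapse_whitespace(soup.get_text(separator=" "))
```

Flattening the whole page to text is the simplest possible choice. Formatting a section such as Entry Requirements & Fees would need selectors specific to each page's markup, which this sketch does not attempt.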
All Rights Reserved