This repository was archived by the owner on Dec 17, 2021. It is now read-only.

SaileshPatel/programme-specification-scraper

course-programme-specification-scraper

This repository was archived on 16/12/2021 and has not been maintained since May 2020.

A program written by Sailesh Patel (160034811) to scrape information from course programme specification PDFs, as part of the final year project (FYP) "A Chatbot for Assisting University Admission Process", supervised by Dr Sylvia Wong at Aston University.

Tech Used

Installation

  1. Clone the repository
  2. Install the required technologies listed above (the links are to their respective installation instructions)

Note: pip is not required, but it is a convenient way to install Tabula-Py, BeautifulSoup, and Requests.
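
Before running the scraper, it can help to confirm that the three Python packages named above are importable. The following is a minimal sketch and is not part of this repository. It assumes the usual import names (`tabula` for Tabula-Py, `bs4` for BeautifulSoup, `requests` for Requests). The helper name `missing_packages` is hypothetical.

```python
from importlib.util import find_spec

# Assumption: these are the usual import names and pip package names.
# Tabula-Py is imported as "tabula", BeautifulSoup as "bs4".
REQUIRED_MODULES = {
    "tabula": "tabula-py",
    "bs4": "beautifulsoup4",
    "requests": "requests",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip package names whose import module cannot be found."""
    return [
        package
        for module, package in modules.items()
        if find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        # Suggest a pip command covering only the packages that are absent.
        print("Missing packages; try: python3 -m pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

The check only looks the modules up and does not import them, so it will not detect a broken installation or any non-Python requirement.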

Usage

  • Please ensure that all the software requirements have been met before executing the program
  • To execute the program, run the command python3 programme-scraper.py
  • To run the PDF scraper
    • Type P and press Enter
    • Type the name of the PDF file without the .pdf extension and press Enter
      • BScComputerScience shows the PDF scraper working
      • BScDigitalDegreeApprenticeship shows the PDF scraper not working
  • To run the web scraper
    • Type W and press Enter
    • Type EAS for the school and press Enter
    • Type the URL of the course page you would like to scrape and press Enter
      • https://www2.aston.ac.uk/study/courses/computer-science-bsc shows the web scraper working
      • https://www2.aston.ac.uk/study/courses/chemistry-bsc shows the web scraper failing to format the text inside the Entry Requirements & Fees for 2020 section
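
The interactive steps above could also be scripted. The sketch below is hypothetical and not part of this repository. It assumes that programme-scraper.py reads its answers from standard input in the order described (mode, then file name, or mode, school, then URL), which has not been verified against the source. The function names `pdf_session_input`, `web_session_input`, and `run_scraper` are illustrative.

```python
import subprocess


def pdf_session_input(pdf_name):
    """Build the answers for a PDF run: 'P', then the name without '.pdf'."""
    if pdf_name.lower().endswith(".pdf"):
        pdf_name = pdf_name[:-4]  # the prompt expects the name without the extension
    return f"P\n{pdf_name}\n"


def web_session_input(url, school="EAS"):
    """Build the answers for a web run: 'W', the school code, then the URL."""
    return f"W\n{school}\n{url}\n"


def run_scraper(answers, script="programme-scraper.py"):
    """Run the scraper and feed it the prepared answers on standard input.

    Assumption: the script reads its prompts from stdin; if it does not,
    this call will not drive it correctly.
    """
    return subprocess.run(
        ["python3", script],
        input=answers,
        text=True,
        capture_output=True,
    )
```

For example, `run_scraper(pdf_session_input("BScComputerScience"))` would attempt the working PDF case from the list above, and the returned object's `stdout` attribute would hold whatever the program printed.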

License

All Rights Reserved
