przona/notebooks

Notebooks used in the project Przona

Contact person: Erik Tjong Kim Sang [email protected]

Notebooks for scraping websites with medical guidelines and performing text analysis

Website richtlijnendatabase.nl

  1. Run scrape_website.ipynb to retrieve the HTML files. They are stored in the directory ../data/richtlijnendatabase.nl
  2. Run get_paragraphs.ipynb to extract the text paragraphs from the downloaded files. They are stored in the file csv/paragraphs_20210712.csv
  3. Run steps 1 and 4 of text_ranking.ipynb to find the paragraphs that contain relevant medical terms related to eHealth. The results are stored in the files paragraphs.json and index.html
  4. Run json_diff.ipynb to compare the JSON file from step 3 with a previous version and to classify the HTML pages by treatment step. The results are stored in the file index.html
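
The idea behind steps 2 and 3 can be illustrated with a minimal sketch. It is not the code from the notebooks: the class `ParagraphExtractor`, the functions `extract_paragraphs` and `find_relevant`, and the plain substring matching are all assumptions made for illustration, using only the Python standard library.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside = False  # True while we are inside a <p> element
        self._chunks = []     # text fragments of the current paragraph

    def _flush(self):
        # Join the fragments and normalise whitespace before storing.
        text = " ".join("".join(self._chunks).split())
        if text:
            self.paragraphs.append(text)
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # also closes a paragraph with a missing </p>
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)


def extract_paragraphs(html):
    """Return the paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser._flush()  # keep a trailing paragraph that was never closed
    return parser.paragraphs


def find_relevant(paragraphs, terms):
    """Keep paragraphs containing at least one term (case-insensitive).

    Assumption: simple substring matching; the notebooks may rank or
    match terms differently.
    """
    lowered = [term.lower() for term in terms]
    return [p for p in paragraphs if any(t in p.lower() for t in lowered)]
```

For example, `find_relevant(extract_paragraphs(page), ["ehealth"])` would return only the paragraphs of `page` that mention eHealth, where `page` is the content of one downloaded HTML file.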
