Anirudh edited this page Jan 30, 2021 · 3 revisions

Welcome to the paperviz wiki!

Flow:

  1. Scrape a conference's papers and save the following fields for each paper to a JSON file:

    • 'id'
    • 'link'
    • 'title'
    • 'authors'
    • 'abstract'
  2. Download the PDFs for each conference using the links in the JSON.

  3. Run image extraction on the downloaded PDFs to generate images for each paper in the conference.

  4. Generate compressed, resized versions of the images to use as thumbnails.

  5. Upload the extracted large images to Drive and set each paper's 'img_large' value in the conference JSON to the 'file_id' of its uploaded image.

  6. Upload the small resized images to the GitHub repo and update each paper's 'img_small' value in the conference JSON.

  7. Generate 2D embeddings from each paper's abstract to create the visualizations, using the following methods:

    • Specter
    • SciBert
    • SentBert
  8. Push the updated conference JSON containing the 2D embeddings to the GitHub repo and add the conference to the conference list on the website.
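
To make the JSON bookkeeping in the flow concrete, here is a minimal sketch of how one paper record might change from scraping through the embedding step. It assumes the conference JSON is a list of paper objects. The helper names `update_images` and `attach_embeddings`, the `specter` key used for the 2D coordinates, and all sample values are hypothetical and are not taken from the paperviz code. Only the field names 'id', 'link', 'title', 'authors', 'abstract', 'img_large' and 'img_small' come from this page.

```python
import json


def update_images(papers, drive_ids, thumb_paths):
    """Set 'img_large' to the Drive file_id and 'img_small' to the thumbnail
    location for each paper whose id appears in the lookup dicts."""
    for paper in papers:
        pid = paper["id"]
        if pid in drive_ids:
            paper["img_large"] = drive_ids[pid]
        if pid in thumb_paths:
            paper["img_small"] = thumb_paths[pid]
    return papers


def attach_embeddings(papers, coords, method):
    """Store each paper's 2D point under a per-method key (hypothetical layout)."""
    for paper in papers:
        x, y = coords[paper["id"]]
        paper[method] = [x, y]
    return papers


# Hypothetical record as produced by the scraping step.
papers = [
    {
        "id": "p1",
        "link": "https://example.org/p1.pdf",
        "title": "An Example Paper",
        "authors": ["A. Author"],
        "abstract": "A short abstract.",
    }
]

# Uploading the large images and thumbnails: record where each image ended up.
update_images(papers, {"p1": "DRIVE_FILE_ID"}, {"p1": "thumbs/p1.jpg"})
# Embedding step: attach 2D coordinates for one method.
attach_embeddings(papers, {"p1": (0.12, -0.5)}, "specter")

# Serialize the updated conference JSON before pushing it to the repo.
saved = json.dumps(papers, indent=2)
print(saved)
```

Keeping every per-paper value keyed by 'id' lets each step update the same JSON file independently, which matches how the flow fills in 'img_large' and 'img_small' in separate upload steps.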

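For the thumbnail step, the core calculation is picking a smaller size that keeps each image's aspect ratio. The sketch below assumes a hypothetical 256 px bounding box, and `thumbnail_size` is an illustrative helper rather than part of paperviz. With Pillow, `Image.thumbnail((256, 256))` performs a similar in-place, aspect-preserving downscale, and saving as JPEG with a lower `quality` value handles the compression.

```python
def thumbnail_size(width, height, max_side=256):
    """Return the largest (w, h) that fits in a max_side x max_side box
    while keeping the aspect ratio; images already small enough are not upscaled."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Scale by the tighter of the two constraints, capped at 1.0 (no enlarging).
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(thumbnail_size(1200, 800))  # landscape figure
print(thumbnail_size(300, 3000))  # tall figure
print(thumbnail_size(100, 50))    # already small, left unchanged
```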