
Image Extraction

Anirudh edited this page Oct 21, 2020 · 3 revisions

We tested two different methods for extracting images from PDFs.

TODO: List the pros and cons of both methods.

Image Extractor

  1. Based on computer vision and object detection.
  2. Requires GPUs.
  3. Sufficiently accurate.

Second method:

  1. Takes a long time to run.
  2. No GPUs required.
  3. Extracts and saves multiple images, and can also save tables.

TODO: Add runs.

python run.py ../data/paper_pdfs/ICML_2019 ../data/paper_pdfs/ICML_2019_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2018 ../data/paper_pdfs/ICML_2018_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2017 ../data/paper_pdfs/ICML_2017_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2016 ../data/paper_pdfs/ICML_2016_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2015 ../data/paper_pdfs/ICML_2015_Pics --no_rename

python run.py ../data/paper_pdfs/ACL_2020 ../data/paper_pdfs/ACL_2020_Pics --no_rename
python run.py ../data/paper_pdfs/ACL_2019 ../data/paper_pdfs/ACL_2019_Pics --no_rename
python run.py ../data/paper_pdfs/ACL_2018 ../data/paper_pdfs/ACL_2018_Pics --no_rename
python run.py ../data/paper_pdfs/ACL_2017 ../data/paper_pdfs/ACL_2017_Pics --no_rename

python run.py ../data/paper_pdfs/NAACL_2019 ../data/paper_pdfs/NAACL_2019_Pics --no_rename
python run.py ../data/paper_pdfs/NAACL_2017 ../data/paper_pdfs/NAACL_2017_Pics --no_rename
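The commands above repeat one pattern: for each venue and year, `run.py` reads `../data/paper_pdfs/<VENUE>_<YEAR>` and writes to a matching `<VENUE>_<YEAR>_Pics` folder with `--no_rename`. The page does not say which of the two methods `run.py` belongs to. The sketch below is a hypothetical batch driver that generates the same invocations from a single table. It assumes it is launched from the directory containing `run.py` and that `run.py` accepts exactly the arguments shown above. `RUNS`, `PDF_ROOT`, `build_commands`, `run_all` and the `--execute` flag belong only to this illustrative script and are not part of the project.

```python
import shlex
import subprocess
import sys

# Hypothetical table of the venue/year folders processed in the commands above.
RUNS = {
    "ICML": [2019, 2018, 2017, 2016, 2015],
    "ACL": [2020, 2019, 2018, 2017],
    "NAACL": [2019, 2017],
}

# Root folder holding the per-venue PDF directories (taken from the commands above).
PDF_ROOT = "../data/paper_pdfs"


def build_commands(runs=RUNS, pdf_root=PDF_ROOT):
    """Return one argument list per venue/year, mirroring the manual run.py calls."""
    commands = []
    for venue, years in runs.items():
        for year in years:
            src = f"{pdf_root}/{venue}_{year}"
            dst = f"{pdf_root}/{venue}_{year}_Pics"
            # sys.executable reuses the current interpreter instead of relying on "python" being on PATH.
            commands.append([sys.executable, "run.py", src, dst, "--no_rename"])
    return commands


def run_all(dry_run=True):
    """Print each command (dry run) or execute them one after another."""
    for cmd in build_commands():
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops at the first failing run instead of silently continuing.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default; pass --execute (a flag of this sketch only) to actually run.
    run_all(dry_run="--execute" not in sys.argv)
```

Running the script without arguments only prints the eleven commands, so the folder list can be checked before starting a long extraction job. Adding a new venue or year then means editing `RUNS` instead of copying another command line.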