Image Extraction
Anirudh edited this page Oct 21, 2020 · 3 revisions
We tested two different methods for extracting images from PDFs.
TODO: Pros/Cons for both
- Based on Computer Vision and Object detection.
- Requires GPUs.
- Sufficiently accurate.
Allenai pdffigures2 (https://github.com/allenai/pdffigures2)
- Takes a lot of time to run.
- No GPUs required.
- Extracts and saves multiple images, and can also save tables.
TODO: #Add Runs
python run.py ../data/paper_pdfs/ICML_2019 ../data/paper_pdfs/ICML_2019_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2018 ../data/paper_pdfs/ICML_2018_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2017 ../data/paper_pdfs/ICML_2017_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2016 ../data/paper_pdfs/ICML_2016_Pics --no_rename
python run.py ../data/paper_pdfs/ICML_2015 ../data/paper_pdfs/ICML_2015_Pics --no_rename
python run.py ../data/paper_pdfs/ACL_2020 ../data/paper_pdfs/ACL_2020_Pics --no_rename
python run.py ../data/paper_pdfs/ACL_2019 ../data/paper_pdfs/ACL_2019_Pics --no_rename
python run.py ../data/paper_pdfs/ACL_2018 ../data/paper_pdfs/ACL_2018_Pics --no_rename
python run.py ../data/paper_pdfs/ACL_2017 ../data/paper_pdfs/ACL_2017_Pics --no_rename
python run.py ../data/paper_pdfs/NAACL_2019 ../data/paper_pdfs/NAACL_2019_Pics --no_rename
python run.py ../data/paper_pdfs/NAACL_2017 ../data/paper_pdfs/NAACL_2017_Pics --no_rename
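The runs above all follow one pattern: an input directory named after the venue and year, an output directory with a `_Pics` suffix, and the `--no_rename` flag. As a minimal sketch, the same command list could be generated from a table of venues and years instead of being typed by hand. The names `RUNS`, `build_commands`, and `run_all` are hypothetical helpers introduced here for illustration; only `run.py`, the paths, and `--no_rename` come from the commands above. The sketch defaults to a dry run that only prints the commands.

```python
import subprocess

# Venue -> years, copied from the command list above.
# RUNS, build_commands and run_all are illustrative names, not part of the project.
RUNS = {
    "ICML": [2019, 2018, 2017, 2016, 2015],
    "ACL": [2020, 2019, 2018, 2017],
    "NAACL": [2019, 2017],
}


def build_commands(base="../data/paper_pdfs", script="run.py"):
    """Return one argument list per venue/year, mirroring the commands above."""
    commands = []
    for venue, years in RUNS.items():
        for year in years:
            name = f"{venue}_{year}"
            commands.append([
                "python", script,
                f"{base}/{name}",        # input directory with the PDFs
                f"{base}/{name}_Pics",   # output directory for extracted images
                "--no_rename",
            ])
    return commands


def run_all(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Calling `run_all(dry_run=False)` from the directory that contains `run.py` would execute the runs one after another. Whether sequential execution is appropriate depends on how long each run takes, which is not recorded here.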