Scraping Conferences

Anirudh edited this page Oct 20, 2020 · 2 revisions

paperviz scraper!

The runs for each conference are summarized below. Each run downloads the PDFs listed in that conference's extracted JSON file, so that the results can be reproduced.

The ACL Anthology BibTeX file with abstracts is parsed for the required conferences. Note: conferences before 2017 do not have abstracts.

  1. ACL (2017,18,19,20)
  2. EMNLP (2017,18,19)
  3. NAACL (2018,19,20) [Was not held in 2017]

#ACL Runs
python Anthology_PDF_Downloader.py --conf_json scrape/data/ACL/ACL_2020.json --save_dir ../data/paper_pdfs/ACL_2020 --parallel
python Anthology_PDF_Downloader.py --conf_json scrape/data/ACL/ACL_2019.json --save_dir ../data/paper_pdfs/ACL_2019 --parallel
python Anthology_PDF_Downloader.py --conf_json scrape/data/ACL/ACL_2018.json --save_dir ../data/paper_pdfs/ACL_2018 --parallel
python Anthology_PDF_Downloader.py --conf_json scrape/data/ACL/ACL_2017.json --save_dir ../data/paper_pdfs/ACL_2017 --parallel

#NAACL Runs
python Anthology_PDF_Downloader.py --conf_json scrape/data/NAACL/NAACL_2019.json --save_dir ../data/paper_pdfs/NAACL_2019 --parallel
python Anthology_PDF_Downloader.py --conf_json scrape/data/NAACL/NAACL_2018.json --save_dir ../data/paper_pdfs/NAACL_2018 --parallel

#EMNLP Runs
python Anthology_PDF_Downloader.py --conf_json scrape/data/EMNLP/EMNLP_2019.json --save_dir ../data/paper_pdfs/EMNLP_2019 --parallel
python Anthology_PDF_Downloader.py --conf_json scrape/data/EMNLP/EMNLP_2018.json --save_dir ../data/paper_pdfs/EMNLP_2018 --parallel
python Anthology_PDF_Downloader.py --conf_json scrape/data/EMNLP/EMNLP_2017.json --save_dir ../data/paper_pdfs/EMNLP_2017 --parallel
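
The Anthology runs above differ only in conference name and year, so they can be generated rather than typed out. The following is a minimal sketch, assuming the path layout shown in those commands. `build_command` and `build_all` are hypothetical helpers for illustration and are not part of the scraper. The script prints the commands rather than executing them.

```python
import shlex

# Conference -> years, copied from the Anthology runs listed above.
ANTHOLOGY_RUNS = {
    "ACL": [2020, 2019, 2018, 2017],
    "NAACL": [2019, 2018],
    "EMNLP": [2019, 2018, 2017],
}


def build_command(script, conf, year):
    """Return the argument list for one downloader run (hypothetical helper)."""
    name = f"{conf}_{year}"
    return [
        "python", script,
        "--conf_json", f"scrape/data/{conf}/{name}.json",
        "--save_dir", f"../data/paper_pdfs/{name}",
        "--parallel",
    ]


def build_all(script, runs):
    """Return one argument list per (conference, year) pair, in listing order."""
    return [
        build_command(script, conf, year)
        for conf, years in runs.items()
        for year in years
    ]


if __name__ == "__main__":
    # Print each command; to execute one instead, pass the list to
    # subprocess.run(cmd, check=True).
    for cmd in build_all("Anthology_PDF_Downloader.py", ANTHOLOGY_RUNS):
        print(shlex.join(cmd))
```

Passing `"PMLR_PDF_Downloader.py"` with a different conference-to-years mapping would produce commands of the same shape for other runs.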

  1. NeurIPS (2015,16,17,18,19)

  1. ICML (2015,16,17,18,19)
  2. AISTATS (2015,16,17,18,19,20)

#ICML Runs
python PMLR_PDF_Downloader.py --conf_json scrape/data/ICML/ICML_2019.json --save_dir ../data/paper_pdfs/ICML_2019 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/ICML/ICML_2018.json --save_dir ../data/paper_pdfs/ICML_2018 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/ICML/ICML_2017.json --save_dir ../data/paper_pdfs/ICML_2017 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/ICML/ICML_2016.json --save_dir ../data/paper_pdfs/ICML_2016 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/ICML/ICML_2015.json --save_dir ../data/paper_pdfs/ICML_2015 --parallel

#AISTATS Runs
python PMLR_PDF_Downloader.py --conf_json scrape/data/AISTATS/AISTATS_2020.json --save_dir ../data/paper_pdfs/AISTATS_2020 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/AISTATS/AISTATS_2019.json --save_dir ../data/paper_pdfs/AISTATS_2019 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/AISTATS/AISTATS_2018.json --save_dir ../data/paper_pdfs/AISTATS_2018 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/AISTATS/AISTATS_2017.json --save_dir ../data/paper_pdfs/AISTATS_2017 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/AISTATS/AISTATS_2016.json --save_dir ../data/paper_pdfs/AISTATS_2016 --parallel
python PMLR_PDF_Downloader.py --conf_json scrape/data/AISTATS/AISTATS_2015.json --save_dir ../data/paper_pdfs/AISTATS_2015 --parallel
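
After the runs finish, a quick per-directory count can help confirm that each `--save_dir` was populated. This is only a sketch: it assumes the downloaders write files with a `.pdf` extension directly into the save directory, which the page does not state. `count_pdfs` and `summarize` are hypothetical helpers, not part of the scraper.

```python
from pathlib import Path


def count_pdfs(save_dir):
    """Count *.pdf files directly inside save_dir; return 0 if it is missing."""
    path = Path(save_dir)
    if not path.is_dir():
        return 0
    return sum(
        1 for p in path.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


def summarize(root, names):
    """Map each run name (e.g. 'ICML_2019') to its PDF count under root."""
    return {name: count_pdfs(Path(root) / name) for name in names}


if __name__ == "__main__":
    # Run names taken from the ICML and AISTATS commands above.
    names = [f"ICML_{year}" for year in range(2015, 2020)]
    names += [f"AISTATS_{year}" for year in range(2015, 2021)]
    for name, count in summarize("../data/paper_pdfs", names).items():
        print(f"{name}: {count} PDFs")
```

A count of 0 points to a run that has not been executed yet or that saved its files somewhere else.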

  1. CVPR (2015,16,17,18,19)
  2. ICCV (2015,16,17,18,19)
  3. WACV (2015,16,17,18,19)