Bulkrax imports
Make sure that the following values are populated in .env:
- AWS_ACCESS_KEY_ID
- AWS_SECRET_ACCESS_KEY
- AWS_REGION
- AWS_PROQUEST_ETD_BUCKET_NAME
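These settings can be checked before you start. The following is a minimal sketch and is not part of the gwss codebase. The name check_pq_env is hypothetical. The function reads .env only as text, so it cannot tell whether the credentials actually work, and it assumes simple KEY=value lines.

```shell
# Hypothetical helper (not part of the gwss codebase): report which of the
# ProQuest download settings are missing from a .env file.
check_pq_env() {
  env_file="${1:-.env}"
  if [ ! -f "$env_file" ]; then
    echo "No such file: $env_file" >&2
    return 2
  fi
  missing=0
  for key in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION AWS_PROQUEST_ETD_BUCKET_NAME; do
    # Accept "KEY=value" or "export KEY=value". The first character after
    # "=" must be non-blank, so an empty value is reported as missing.
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=[^[:space:]]" "$env_file"; then
      echo "Missing or empty: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_pq_env /path/to/.env prints any setting it could not find and returns a non-zero status if there was at least one.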
Make sure that /opt/scholarspace/scholarspace-ingest/etd_zips is empty. If it is not, remove all files from it.
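You can also script the emptiness check. In this sketch, etd_dir_is_empty and the ETD_ZIP_DIR override are hypothetical names. The function only reports what is in the directory and deliberately deletes nothing.

```shell
# Hypothetical pre-flight check: succeed only if the download directory exists
# and has no entries (hidden files included); otherwise list what is there.
ETD_ZIP_DIR="${ETD_ZIP_DIR:-/opt/scholarspace/scholarspace-ingest/etd_zips}"

etd_dir_is_empty() {
  dir="${1:-$ETD_ZIP_DIR}"
  if [ ! -d "$dir" ]; then
    echo "Not a directory: $dir" >&2
    return 2
  fi
  # ls -A lists everything except . and .., so empty output means empty dir.
  contents="$(ls -A "$dir")"
  if [ -n "$contents" ]; then
    echo "$dir is not empty:" >&2
    printf '%s\n' "$contents" >&2
    return 1
  fi
  return 0
}
```

The function returns 0 for an empty directory, 1 if anything (including hidden files) is present, and 2 if the directory does not exist.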
Run the gwss:download_new_pq_zips task to download new ProQuest ETD zip files. The task requires a destination path argument. For example: bundle exec rails gwss:download_new_pq_zips['/opt/scholarspace/scholarspace-ingest/etd_zips']
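Square brackets are glob characters in most shells, so in zsh the unquoted form above can fail with a "no matches found" error. Quoting the whole task token avoids this. If you quote it that way, leave out the inner single quotes: rake does not strip them, so it would treat them as part of the path. The wrapper below is a hypothetical helper that illustrates this quoting. By default it only prints the command.

```shell
# Hypothetical wrapper around the download task. The whole "task[arg]" token
# is quoted so the shell never treats the square brackets as a glob; the path
# goes in without inner quotes, because rake would keep them as part of it.
download_pq_zips() {
  dest="${1:-/opt/scholarspace/scholarspace-ingest/etd_zips}"
  task="gwss:download_new_pq_zips[$dest]"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: print the command so it can be checked before running.
    echo "bundle exec rails \"$task\""
  else
    bundle exec rails "$task"
  fi
}
```

To execute the task instead of printing it, run DRY_RUN=0 download_pq_zips from the Rails application directory.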
Run the gwss:ingest_pq_etds rake task, either inside the container or from outside it using docker exec. The task requires one argument: the path to the directory containing the ProQuest zips to include in the ingest. For example, if the ETD zips are in /opt/scholarspace/scholarspace-ingest/etd_zips, run: bundle exec rails gwss:ingest_pq_etds['/opt/scholarspace/scholarspace-ingest/etd_zips']
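To run the same task from outside the container, use docker exec. In this sketch, run_ingest_pq_etds is a hypothetical name, and you must supply the container name yourself (for example, from docker ps). With docker exec, the zip directory is resolved inside the container, so pass the path as the container sees it.

```shell
# Hypothetical wrapper: run gwss:ingest_pq_etds directly (inside the
# container) or, when a container name is given, from the host via
# docker exec. The container name is a placeholder supplied by the caller.
run_ingest_pq_etds() {
  zip_dir="${1:-/opt/scholarspace/scholarspace-ingest/etd_zips}"
  container="${2:-}"
  task="gwss:ingest_pq_etds[$zip_dir]"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Default: only print the command that would run.
    if [ -n "$container" ]; then
      echo "docker exec $container bundle exec rails \"$task\""
    else
      echo "bundle exec rails \"$task\""
    fi
  elif [ -n "$container" ]; then
    docker exec "$container" bundle exec rails "$task"
  else
    bundle exec rails "$task"
  fi
}
```

If the container's default working directory is not the Rails application root, use docker exec's --workdir (-w) option to point it there.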
The task writes its output to a bulkrax_zip directory inside the directory given by the TEMP_FILE_BASE environment variable (typically set in .env). That output contains:
- metadata.csv, a Bulkrax-compliant manifest file
- a files directory, containing one directory per ETD zip, each of which holds:
  - the ProQuest XML file
  - the main ETD PDF
  - optionally, a folder containing additional attachments for the ETD
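Checking that the generated output matches this layout can prevent a failed import. The sketch below assumes the structure just described: the XML and PDF sit directly inside each ETD directory and can be recognized by their .xml and .pdf extensions. The name check_bulkrax_output is hypothetical.

```shell
# Hypothetical sanity check of the generated output. Assumes metadata.csv and
# files/ at the top level, and one directory per ETD under files/ with the
# XML and PDF directly inside it.
check_bulkrax_output() {
  base="$1"   # e.g. "$TEMP_FILE_BASE/bulkrax_zip"
  status=0
  [ -f "$base/metadata.csv" ] || { echo "missing $base/metadata.csv"; status=1; }
  [ -d "$base/files" ] || { echo "missing $base/files/"; return 1; }
  for etd in "$base"/files/*/; do
    [ -d "$etd" ] || continue   # the glob matched no directories
    # -maxdepth and -iname are GNU/BSD find options; -iname also matches .PDF.
    [ -n "$(find "$etd" -maxdepth 1 -type f -iname '*.xml')" ] ||
      { echo "no XML file in $etd"; status=1; }
    [ -n "$(find "$etd" -maxdepth 1 -type f -iname '*.pdf')" ] ||
      { echo "no PDF file in $etd"; status=1; }
  done
  return "$status"
}
```

For example, check_bulkrax_output "$TEMP_FILE_BASE/bulkrax_zip" prints one line per problem and returns non-zero if it found any.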
Within the GW ScholarSpace web application, log in as an administrative user. On the Dashboard, click Importers, then create a New importer with the following values:
- Name = any name
- Administrative Set = ETDs
- Frequency = Once (on save)
- Limit = leave blank
- Parser = CSV - Comma Separated Values
- Visibility = Public
- Rights Statement = leave blank
- Add CSV File to Import: Specify a Path on the Server, then set Import file path = {TEMP_FILE_BASE}/bulkrax_zip/metadata.csv (replacing {TEMP_FILE_BASE} with that variable's actual value)
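Because {TEMP_FILE_BASE} in the import file path stands for the variable's value, it can help to compute the exact path to paste. This hypothetical helper reads the value from .env. It assumes a plain TEMP_FILE_BASE=... line, and it assumes that this value is the path as the application sees it (inside the container, if the app runs in one).

```shell
# Hypothetical helper: print the value to paste into "Import file path",
# reading TEMP_FILE_BASE from a .env file.
bulkrax_import_path() {
  env_file="${1:-.env}"
  # Keep the last TEMP_FILE_BASE= line if there are several.
  base="$(sed -n 's/^TEMP_FILE_BASE=//p' "$env_file" | tail -n 1)"
  # Strip a leading and a trailing quote character, if present.
  dq='"'; sq="'"
  base="${base#"$dq"}"; base="${base%"$dq"}"
  base="${base#"$sq"}"; base="${base%"$sq"}"
  if [ -z "$base" ]; then
    echo "TEMP_FILE_BASE not found in $env_file" >&2
    return 1
  fi
  # Drop one trailing slash so the result does not contain "//".
  echo "${base%/}/bulkrax_zip/metadata.csv"
}
```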
Before starting the import, open the Sidekiq administrator page (at /sidekiq) in another tab so that you can watch the progress of the queues and spot any problems. Then click Create and Import.
Note: if you need to re-run the gwss:ingest_pq_etds task to regenerate the Bulkrax-ready metadata and files, first clear out the results of the previous run: rm -r {TEMP_FILE_BASE}/bulkrax_zip
After the import has finished:
- Remove all files downloaded to /opt/scholarspace/scholarspace-ingest/etd_zips, so that they are not loaded again next time.
- Remove the Bulkrax metadata.csv and files directory: rm -r {TEMP_FILE_BASE}/bulkrax_zip
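You can combine both cleanup steps into one function. This hypothetical sketch assumes that TEMP_FILE_BASE is set in the shell that runs it. A value that exists only in .env is not automatically visible to the shell. The sketch also assumes the downloads went to the etd_zips directory used earlier.

```shell
# Hypothetical cleanup sketch. Assumes TEMP_FILE_BASE is set in this shell and
# that the downloads went to the etd_zips directory used above.
post_import_cleanup() {
  etd_dir="${1:-/opt/scholarspace/scholarspace-ingest/etd_zips}"
  # ${VAR:?} stops with an error rather than expanding to "/bulkrax_zip".
  bulkrax_dir="${TEMP_FILE_BASE:?TEMP_FILE_BASE is not set}/bulkrax_zip"

  # Delete everything inside the download directory but keep the directory.
  # -mindepth/-maxdepth are GNU/BSD find options.
  if [ -d "$etd_dir" ]; then
    find "$etd_dir" -mindepth 1 -maxdepth 1 -exec rm -rf {} +
  fi

  # metadata.csv and files/ both live in bulkrax_zip, so remove it whole.
  if [ -d "$bulkrax_dir" ]; then
    rm -r "$bulkrax_dir"
  fi
}
```

Run it only after the importer has finished. The import reads metadata.csv and the files directory from bulkrax_zip, so removing them mid-import would likely break it.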
You will need to create a zip file containing:
- metadata.csv (TODO: provide example metadata.csv). Column names should be: "model", "title", "creator", "contributor", "language", "description", "keyword", "degree", "resource_type", "advisor", "gw_affiliation", "date_created", "committee_member", "rights_statement", "license", "proquest_zipfile", "bulkrax_identifier", "file", "parents", "visibility", "visibility_during_embargo", "visibility_after_embargo", "embargo_release_date"
- a files directory containing the attachments referenced in metadata.csv
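Keeping the column list in one place makes it easy to start a file with the expected header and to check an existing one. This is a minimal sketch with hypothetical function names. It looks only at the header row, ignores column order, and does not validate the values in each row.

```shell
# Column names as listed above, in the same order.
BULKRAX_COLUMNS="model,title,creator,contributor,language,description,keyword,degree,resource_type,advisor,gw_affiliation,date_created,committee_member,rights_statement,license,proquest_zipfile,bulkrax_identifier,file,parents,visibility,visibility_during_embargo,visibility_after_embargo,embargo_release_date"

# Hypothetical helper: write a metadata.csv containing only the header row.
write_metadata_template() {
  out="${1:-metadata.csv}"
  printf '%s\n' "$BULKRAX_COLUMNS" > "$out"
}

# Hypothetical helper: report each expected column missing from a file's header.
check_metadata_header() {
  # First line only; drop CR (Windows line endings) and double quotes.
  header="$(head -n 1 "$1" | tr -d '\r"')"
  status=0
  old_ifs="$IFS"; IFS=','
  for col in $BULKRAX_COLUMNS; do
    # Wrapping both sides in commas makes "file" not match "proquest_zipfile".
    case ",$header," in
      *",$col,"*) ;;
      *) echo "missing column: $col"; status=1 ;;
    esac
  done
  IFS="$old_ifs"
  return "$status"
}
```

Once metadata.csv and the files directory are ready, you can build the archive from the directory that contains them. For example: zip -r import.zip metadata.csv files (the archive name is arbitrary).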
Q: When I create an importer, the administrative set that I wish to import to isn't showing up in the dropdown list.
A: This can occur when your user has the admin role, and can therefore access /importers, but does not have the contentadmin role; users with the contentadmin role can import to any admin set. Try adding the contentadmin role to your administrative user.