diff --git a/digitization/filenames.md b/digitization/filenames.md
index 1fac507..24a13b5 100644
--- a/digitization/filenames.md
+++ b/digitization/filenames.md
@@ -1,82 +1,47 @@
 ---
 layout: page
-title: "Digitization - File Names"
+title: "Naming and Organizing Digital Surrogate Files"
 permalink: /digitization/filenames/
 parent: Digitization
+nav_order: 1
 ---
-# File Naming Conventions
+# Naming and Organizing Digital Surrogate Files
 
-## General Rules
-- Be consistent in your file naming for a project. Look at past related projects to help determine a structure for file names. Consistency within and across projects will allow for easier management and manipulation of files.
-- Use leading zeroes when necessary. Never use a single digit without a leading zero.
-  - "1" = bad
-  - "01" = good
+Prior to actually starting any digitization project you should decide on a structure and scheme for organizing and naming the files you produce.
 
-## Suggested and Example File Name Structures
-
-These are suggested file names. Depending on your project there may be valid reasons to deviate from these structures.
-
-### Manuscript/Archival Material
-- ms####_s##_c##_f##_i##_p###
-- Breakdown of elements:
-  - ms = collection identifier
-  - s = series
-  - c = container (ignore the container type indicator in ArchivesSpace, use "c" regardless of type)
-  - f = folder
-  - i = item
-  - p = page
-- Example = ms2123_s11_ss01_c31_f01.pdf
-
-
-### File names for materials from Corcoran Archives
-For materials from the Corcoran Archivces collections following structure should be used.
-
-#### Collection IDs
-The collection identifier should be simplified as follows:
-  - COR0001.0-RG -> cor1-0
-  - COR0003.1-RG -> cor3-1
-  - COR0013-MS -> cor13
-
-#### Containers
-
-Container numbers should be simplified as follows:
-- Box RG2-2008.018 -> rg2-2008-018
-- Box RG5.0-2008.029 -> rg5-0-2008-029
-
-#### Full file name examaples:
-cor2-0_s01_ss01_rg2-2008-001_f11_i01.tiff
-cor5-0_s06_ss01_rg5-2008-020_f01.pdf
-
-### Cataloged Books & Pamphlets (rarebooks)
-- CallNumber_PageNumber
-  - Example = spec_ps3544_h56_page34.tif
-  - 'spec' may be replaced with other collection areas depending on call number in catalog (ex. mei, kiev)
-
-### Serials (cataloged and from manuscript collections)
-In general, it is best to create file names for serials that reflect the items' volume/sequential designation (ex. vol. 1, no. 1). For serials that exist within an archival collection, this information can be easily confused with the top container type "volume." Therefore, the volume designation of the actual work should be used over the instance record volume.
-
-Example:
-- RG0044_s39_vol12_no03
-- GWNews_vol12_no03
-
-It is also appropriate to use date information to form file names. This may be relevant if the volume/sequential designations are not present or irregular.
-- FBNews_1965_10
-
-Example:
-- GWNews_199712 (Title_YYYYMM)
-- GWTimes_199712-199801 (Title_YYYMM-YYYYMM)
-
-### Audiovisual Material
-In certain cases, an audiovisual work may have multiple parts. Maintaining consistency within the project and including as much collection identifying information as possible is essential.
-
-In addition, audiovisual material is often minimally described and processed in an archival collection. Often a single archival object may represent many audiovisual items.
-
-Examples of file names:
-- collectionID_s#_c#_f#_i#
-- collectionID_s#_c#_title_of_video
-- collectionID_c#_title_of_video
-- collectionID_s#_c#_f#_i#_part1
-- CollectionID_c#_title_of_video_part2
-
-### Born Digital Material
-While this section is specifically for filenaming conventions used for digitized content, it should be mentioned that it is often inappropriate to change file names of born-digital records. The original file names, as given by the record creator/s, should be respected when possible. Normalization of born-digital file names can be done (removing bad characters, spaces, ect), but it is not recommended to try to make them fit any of the above schemes.
\ No newline at end of file
+## Using RefIDs for File Names
+
+For material that is represented in ArchivesSpace by a corresponding archival object record, the **refid** of the record should be used as the basis of the file name and organization.
+
+Start by creating a directory (folder) with the refid as the name. Any files that you produce should use the refid as the basis of the file name.
+
+### Example: 2-sided cassette
+
+```
+root_folder/
+├── ref9916/
+    ├── ref9916_001.wav
+    ├── ref9916_002.wav
+    └── derivatives/
+        ├── ref9916_001.mp3
+        ├── ref9916_001_caption_eng.vtt
+        ├── ref9916_002.mp3
+        └── ref9916_002_caption_eng.vtt
+```
+
+### Example: text-based document
+```
+root_folder/
+├── e203d9a24f90f062871a72fe359c7900/
+    ├── e203d9a24f90f062871a72fe359c7900_001.tif
+    ├── e203d9a24f90f062871a72fe359c7900_002.tif
+    ├── e203d9a24f90f062871a72fe359c7900_003.tif
+    ├── e203d9a24f90f062871a72fe359c7900_004.tif
+    ├── e203d9a24f90f062871a72fe359c7900_005.tif
+    ├── ...
+    └── derivatives/
+        └── e203d9a24f90f062871a72fe359c7900.pdf
+    ```
+
+## Born Digital Material
+While this section is specifically for conventions used for digital surrogates, it should be mentioned that it is often inappropriate to change file names of born-digital records.
The original file names, as given by the record creator/s, should be respected when possible. Normalization of born-digital file names can be done (removing bad characters, spaces, etc.), but it is not recommended to try to make them fit any of the above schemes.
\ No newline at end of file
diff --git a/digitization/imaging/imaging.md b/digitization/imaging/imaging.md
index fc976ee..1c42da5 100644
--- a/digitization/imaging/imaging.md
+++ b/digitization/imaging/imaging.md
@@ -4,5 +4,5 @@ title: "Digitization: Imaging Text and Graphics"
 parent: Digitization
 has_children: true
 has_toc: true
-nav_order: 1
+nav_order: 2
 ---
\ No newline at end of file
diff --git a/managing/audittool.md b/managing/audittool.md
deleted file mode 100644
index 01c1eac..0000000
--- a/managing/audittool.md
+++ /dev/null
@@ -1,36 +0,0 @@
----
-layout: page
-title: "Audit Tool"
-permalink: /managing/audittool/
-nav_exclude: true
----
-# **This tool has been depreciated!**
-
-# Audit Tool
-[Github Repo](https://github.com/gwu-libraries/audit-tool)
-
-The Simple Audit Tool is a command-line tool designed to inventory and monitor file changes (additions, alterations, and deletions) on a storage filesystem.
-
-## Functionality
-- Analyzes for and keeps a log of all changes detected (additions, alterations, deletions)
-- Change reports can be automatically emailed to staff.
-- Requests humans manually review that all changes were deliberate, non-malicious, and complete. If approved, inventory is updated with changes.
-- In combination with cron jobs, can run scheduled audits.
-
-*What the tool does not do:*
-
-- Audit files before and after data exchange (i.e., when adding files to storage). That must happen using other tools (e.g., rsync, bagit).
-
-## Approving Weekly Change Logs - gwspec-digcol1
-This is assuming you've reviewed the weekly logs and found no issues.
-- Log into gwspec-digcol1
-- Change directory to audit-tool directory and activate environment
-```
-    cd /opt/audit-tool
-    source ENV/bin/activate
-```
-- Add changes to the canonical inventory by linking to the JSON report
-````
-python audit_tool.py update /path to json report
-````
-
diff --git a/managing/bagit_profile.md b/managing/bagit_profile.md
new file mode 100644
index 0000000..ed290ad
--- /dev/null
+++ b/managing/bagit_profile.md
@@ -0,0 +1,24 @@
+---
+layout: page
+title: "Bag Profiles"
+permalink: /bag/
+parent: SCRC Digital Collection Storage
+grand_parent: Managing Digital Collections - Access and Preservation
+---
+BagIt profiles used to package digital content for preservation storage should attempt to conform with single-level requirements for descriptions prescribed by DACS.
+
+# Digitized Content
+
+```
+ArchivesSpace-URI:
+Bag-Software-Agent:
+BagIt-Profile-Identifier: scrc-digitization-profile.json
+Bagging-Date:
+Collection-ID:
+End-Date:
+Origin: digitization
+Payload-Oxum:
+Rights-ID:
+Start-Date:
+Title:
+```
\ No newline at end of file
diff --git a/managing/creatingdaos.md b/managing/creatingdaos.md
index 4a48491..72171ef 100644
--- a/managing/creatingdaos.md
+++ b/managing/creatingdaos.md
@@ -9,6 +9,11 @@ nav_order: 1
 These instructions are a work in progress. Efforts are underway to integrate the creation of digital object records within broader workflows. These new workflows would generate digital object records upon ingest into the preservation environment or access systems.
 
+# Using package_and_ship
+[package_and_ship](https://github.com/gwu-libraries/package_and_ship)
+
+DAO records are automatically created when using the package_and_ship tool to ingest content into SCRC's digital collection storage. A DAO record created by this tool will hold `File URI` values that point to the content in the storage area.
+
 # Using Digital Object Creator (Google Colab Notebook)
 [Aspace Digital Object Creator](https://drive.google.com/drive/folders/1br8rcrGZlsoAOBGiLDVIG12c8szJwXuQ?usp=drive_link)