daos_file-names #8

Merged: 2 commits, merged Feb 11, 2025
115 changes: 40 additions & 75 deletions digitization/filenames.md
@@ -1,82 +1,47 @@
---
layout: page
title: "Digitization - File Names"
title: "Naming and Organizing Digital Surrogate Files"
permalink: /digitization/filenames/
parent: Digitization
nav_order: 1
---
# File Naming Conventions
# Naming and Organizing Digital Surrogate Files

## General Rules
- Be consistent in your file naming for a project. Look at past related projects to help determine a structure for file names. Consistency within and across projects will allow for easier management and manipulation of files.
- Use leading zeroes so that numbered files sort in order. Never use a single digit without a leading zero:
    - "1" = bad
    - "01" = good
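
The leading-zero rule matters because file names sort as text, not as numbers. A minimal Python sketch, assuming a default width of 2 (matching the "01" example); the helper name `pad` is hypothetical:

```python
def pad(number: int, width: int = 2) -> str:
    """Zero-pad a non-negative sequence number, e.g. 1 -> "01"."""
    if number < 0:
        raise ValueError("sequence numbers must be non-negative")
    return f"{number:0{width}d}"

# Unpadded numbers sort as text, so "f10" lands before "f2"
print(sorted(["f1", "f2", "f10"]))               # ['f1', 'f10', 'f2']
print(sorted(f"f{pad(n)}" for n in (1, 2, 10)))  # ['f01', 'f02', 'f10']
```

Pick the width from the largest number the project will reach (e.g. 3 for up to 999 pages); numbers wider than the width are not truncated.
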
Before starting any digitization project, decide on a structure and scheme for organizing and naming the files you will produce.

## Suggested and Example File Name Structures

These are suggested file names. Depending on your project, there may be valid reasons to deviate from these structures.

### Manuscript/Archival Material
- ms####_s##_c##_f##_i##_p###
- Breakdown of elements:
    - ms = collection identifier
    - s = series
    - ss = subseries (used in the example below; omit when there is no subseries)
    - c = container (ignore the container type recorded in ArchivesSpace and use "c" regardless of type)
    - f = folder
    - i = item
    - p = page
- Example = ms2123_s11_ss01_c31_f01.pdf
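
A sketch of assembling the manuscript pattern in Python, assuming integer inputs, the padding widths implied by `ms####_s##_c##_f##_i##_p###`, an optional subseries segment taken from the example, and a default `tif` extension; `manuscript_filename` is a hypothetical helper, not an existing tool:

```python
from typing import Optional

def manuscript_filename(collection: int, series: int, container: int, *,
                        subseries: Optional[int] = None,
                        folder: Optional[int] = None,
                        item: Optional[int] = None,
                        page: Optional[int] = None,
                        extension: str = "tif") -> str:
    """Build ms####_s##[_ss##]_c##[_f##][_i##][_p###].ext from numeric parts."""
    parts = [f"ms{collection:04d}", f"s{series:02d}"]
    if subseries is not None:
        parts.append(f"ss{subseries:02d}")  # assumed from the example file name
    parts.append(f"c{container:02d}")
    # Optional trailing levels, in pattern order, with their padding widths
    for prefix, value, width in (("f", folder, 2), ("i", item, 2), ("p", page, 3)):
        if value is not None:
            parts.append(f"{prefix}{value:0{width}d}")
    return "_".join(parts) + "." + extension

print(manuscript_filename(2123, 11, 31, subseries=1, folder=1, extension="pdf"))
# ms2123_s11_ss01_c31_f01.pdf
```
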


### File names for materials from Corcoran Archives
For materials from the Corcoran Archives collections, the following structure should be used.

#### Collection IDs
The collection identifier should be simplified as follows:
- COR0001.0-RG -> cor1-0
- COR0003.1-RG -> cor3-1
- COR0013-MS -> cor13

#### Containers

Container numbers should be simplified as follows:
- Box RG2-2008.018 -> rg2-2008-018
- Box RG5.0-2008.029 -> rg5-0-2008-029

#### Full file name examples
- cor2-0_s01_ss01_rg2-2008-001_f11_i01.tiff
- cor5-0_s06_ss01_rg5-2008-020_f01.pdf
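
The two simplification rules can be expressed as small string transformations. This is a sketch that assumes Corcoran IDs always look like `COR<digits>[.<digits>]-<letters>` and container labels like `Box <label>`; the function names and the source values `COR0002.0-RG` and `Box RG2-2008.001` used to rebuild the first example are hypothetical:

```python
import re

def simplify_collection_id(collection_id: str) -> str:
    """COR0001.0-RG -> cor1-0, COR0013-MS -> cor13."""
    match = re.fullmatch(r"COR(\d+)(?:\.(\d+))?-[A-Z]+", collection_id.strip())
    if match is None:
        raise ValueError(f"unrecognized Corcoran collection ID: {collection_id!r}")
    number = int(match.group(1))  # int() drops the leading zeros
    suffix = match.group(2)
    return f"cor{number}-{suffix}" if suffix is not None else f"cor{number}"

def simplify_container(container: str) -> str:
    """Box RG2-2008.018 -> rg2-2008-018, Box RG5.0-2008.029 -> rg5-0-2008-029."""
    label = re.sub(r"^box\s+", "", container.strip(), flags=re.IGNORECASE)
    return label.lower().replace(".", "-")

# Hypothetical source values chosen to reproduce the first full example
name = "_".join([simplify_collection_id("COR0002.0-RG"), "s01", "ss01",
                 simplify_container("Box RG2-2008.001"), "f11", "i01"]) + ".tiff"
print(name)  # cor2-0_s01_ss01_rg2-2008-001_f11_i01.tiff
```
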

### Cataloged Books & Pamphlets (rarebooks)
- CallNumber_PageNumber
- Example = spec_ps3544_h56_page34.tif
- 'spec' may be replaced with another collection area, depending on the call number in the catalog (e.g., mei, kiev)
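
A sketch of deriving the rare-book file name from a catalog call number, assuming the catalog stores the call number in a form like `PS3544 .H56` (the exact format is an assumption) and that spaces and periods simply become underscores; `call_number_filename` is hypothetical. Page numbers are left unpadded to match the example:

```python
import re

def call_number_filename(area: str, call_number: str, page: int,
                         extension: str = "tif") -> str:
    """Build area_callnumber_pageN.ext, e.g. spec_ps3544_h56_page34.tif."""
    # Split the call number on runs of whitespace and periods, dropping empty pieces
    pieces = [p.lower() for p in re.split(r"[\s.]+", call_number) if p]
    return "_".join([area.lower(), *pieces, f"page{page}"]) + "." + extension

print(call_number_filename("spec", "PS3544 .H56", 34))  # spec_ps3544_h56_page34.tif
```
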

### Serials (cataloged and from manuscript collections)
In general, file names for serials should reflect the items' volume/sequential designation (e.g., vol. 1, no. 1). For serials that exist within an archival collection, this information can easily be confused with the top container type "volume." Therefore, use the volume designation of the actual work rather than the volume recorded in the instance record.

Examples:
- RG0044_s39_vol12_no03
- GWNews_vol12_no03

It is also appropriate to use date information to form file names, particularly when volume/sequential designations are absent or irregular.

Examples:
- FBNews_1965_10 (Title_YYYY_MM)
- GWNews_199712 (Title_YYYYMM)
- GWTimes_199712-199801 (Title_YYYYMM-YYYYMM)
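
Both serial forms can be generated mechanically. A sketch, assuming integer volume/issue numbers padded to two digits (the examples do not show whether single-digit volumes are padded, so that width is an assumption) and a month range passed as a `(year, month)` tuple; both function names are hypothetical:

```python
from typing import Optional, Tuple

def serial_by_designation(prefix: str, volume: int, number: int) -> str:
    """<prefix>_volNN_noNN, using the work's own volume/issue numbers."""
    return f"{prefix}_vol{volume:02d}_no{number:02d}"

def serial_by_date(title: str, year: int, month: int,
                   end: Optional[Tuple[int, int]] = None) -> str:
    """Title_YYYYMM, or Title_YYYYMM-YYYYMM when an issue spans months."""
    name = f"{title}_{year:04d}{month:02d}"
    if end is not None:
        end_year, end_month = end
        name += f"-{end_year:04d}{end_month:02d}"
    return name

print(serial_by_designation("GWNews", 12, 3))              # GWNews_vol12_no03
print(serial_by_designation("RG0044_s39", 12, 3))          # RG0044_s39_vol12_no03
print(serial_by_date("GWNews", 1997, 12))                  # GWNews_199712
print(serial_by_date("GWTimes", 1997, 12, end=(1998, 1)))  # GWTimes_199712-199801
```
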

### Audiovisual Material
In certain cases, an audiovisual work may have multiple parts. Maintaining consistency within the project and including as much collection-identifying information as possible is essential.

In addition, audiovisual material is often minimally described and processed in an archival collection. Often a single archival object may represent many audiovisual items.

Examples of file names:
- collectionID_s#_c#_f#_i#
- collectionID_s#_c#_title_of_video
- collectionID_c#_title_of_video
- collectionID_s#_c#_f#_i#_part1
- collectionID_c#_title_of_video_part2
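
A sketch of composing these audiovisual names, assuming identifier segments are already formatted (e.g. `s02`, `c05`) and that a title is converted to lowercase words joined by underscores; the helper `av_filename` and the collection ID `ms2123` are hypothetical:

```python
import re
from typing import Optional

def av_filename(collection_id: str, *segments: str, title: Optional[str] = None,
                part: Optional[int] = None) -> str:
    """Join identifier segments, an optional title slug, and an optional part number."""
    parts = [collection_id, *segments]
    if title:
        # Lowercase the title and collapse runs of non-alphanumerics into one underscore
        parts.append(re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_"))
    if part is not None:
        parts.append(f"part{part}")
    return "_".join(parts)

print(av_filename("ms2123", "s02", "c05", title="Title of Video", part=2))
# ms2123_s02_c05_title_of_video_part2
```

Whatever form a project chooses, using the same helper for every part keeps multi-part works named consistently.
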

### Born Digital Material
While this section is specifically about file-naming conventions for digitized content, it is often inappropriate to change the file names of born-digital records. The original file names given by the record creator(s) should be respected when possible. Born-digital file names can be normalized (e.g., by removing problematic characters or spaces), but they should not be forced to fit any of the above schemes.
## Using RefIDs for File Names

For material that is represented in ArchivesSpace by a corresponding archival object record, the **refid** of the record should be used as the basis of the file name and organization.

Start by creating a directory (folder) with the refid as the name. Any files that you produce should use the refid as the basis of the file name.

### Example: 2-sided cassette

```
root_folder/
└── ref9916/
    ├── ref9916_001.wav
    ├── ref9916_002.wav
    └── derivatives/
        ├── ref9916_001.mp3
        ├── ref9916_001_caption_eng.vtt
        ├── ref9916_002.mp3
        └── ref9916_002_caption_eng.vtt
```

### Example: text-based document
```
root_folder/
└── e203d9a24f90f062871a72fe359c7900/
    ├── e203d9a24f90f062871a72fe359c7900_001.tif
    ├── e203d9a24f90f062871a72fe359c7900_002.tif
    ├── e203d9a24f90f062871a72fe359c7900_003.tif
    ├── e203d9a24f90f062871a72fe359c7900_004.tif
    ├── e203d9a24f90f062871a72fe359c7900_005.tif
    ├── ...
    └── derivatives/
        └── e203d9a24f90f062871a72fe359c7900.pdf
```
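
The refid layout in these examples can be generated rather than typed by hand. A sketch using `pathlib`, assuming three-digit sequence numbers (as in `_001`) and a `derivatives/` folder inside each refid folder, as the trees suggest; the function names are hypothetical:

```python
from pathlib import Path
from typing import List

def master_paths(root: Path, refid: str, extension: str, count: int) -> List[Path]:
    """Paths for sequentially numbered masters inside a folder named for the refid."""
    folder = root / refid
    return [folder / f"{refid}_{n:03d}.{extension}" for n in range(1, count + 1)]

def derivative_path(root: Path, refid: str, filename: str) -> Path:
    """Derivatives live in a derivatives/ subfolder of the refid folder."""
    return root / refid / "derivatives" / filename

def create_layout(root: Path, refid: str) -> Path:
    """Create root/<refid>/derivatives/ on disk (no error if it already exists)."""
    derivatives = root / refid / "derivatives"
    derivatives.mkdir(parents=True, exist_ok=True)
    return derivatives

# Plan the paths for the 2-sided cassette example without touching the disk
root = Path("root_folder")
for path in master_paths(root, "ref9916", "wav", 2):
    print(path.as_posix())
print(derivative_path(root, "ref9916", "ref9916_001.mp3").as_posix())
```
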

## Born Digital Material
While this section is specifically about conventions for digital surrogates, it is often inappropriate to change the file names of born-digital records. The original file names given by the record creator(s) should be respected when possible. Born-digital file names can be normalized (e.g., by removing problematic characters or spaces), but they should not be forced to fit any of the above schemes.
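
A light-touch normalization sketch that keeps the creator's wording and case, removes only characters reserved on Windows (`< > : " / \ | ? *`) and control characters, and turns whitespace into underscores. The exact character set to remove is an assumption to adjust per project; `normalize_filename` is hypothetical:

```python
import re

# Characters reserved in Windows file names, plus ASCII control characters
RESERVED = re.compile(r'[<>:"/\\|?*\x00-\x1f]+')

def normalize_filename(name: str) -> str:
    """Clean the stem of a born-digital file name; the extension is kept as-is."""
    stem, dot, ext = name.rpartition(".")
    if not dot or not stem:  # no extension, or a dotfile such as ".bashrc"
        stem, dot, ext = name, "", ""
    stem = RESERVED.sub("", stem)
    stem = re.sub(r"\s+", "_", stem.strip())  # runs of whitespace -> one underscore
    return f"{stem}{dot}{ext}"

print(normalize_filename("Meeting notes: 3/4?.docx"))  # Meeting_notes_34.docx
```
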
2 changes: 1 addition & 1 deletion digitization/imaging/imaging.md
@@ -4,5 +4,5 @@ title: "Digitization: Imaging Text and Graphics"
parent: Digitization
has_children: true
has_toc: true
nav_order: 1
nav_order: 2
---
36 changes: 0 additions & 36 deletions managing/audittool.md

This file was deleted.

24 changes: 24 additions & 0 deletions managing/bagit_profile.md
@@ -0,0 +1,24 @@
---
layout: page
title: "Bag Profiles"
permalink: /bag/
parent: SCRC Digital Collection Storage
grand_parent: Managing Digital Collections - Access and Preservation
---
BagIt profiles used to package digital content for preservation storage should attempt to conform to the single-level description requirements prescribed by DACS (Describing Archives: A Content Standard).

# Digitized Content

```
ArchivesSpace-URI:
Bag-Software-Agent:
BagIt-Profile-Identifier: scrc-digitization-profile.json
Bagging-Date:
Collection-ID:
End-Date:
Origin: digitization
Payload-Oxum:
Rights-ID:
Start-Date:
Title:
```
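
A sketch of producing a `bag-info.txt` for this profile. `Payload-Oxum` follows RFC 8493 (total octets, a period, then the number of payload files); the profile-specific values, the bag location `my_bag/data`, and the helper names are assumptions for illustration. Libraries such as the Library of Congress's bagit-python may fill in the standard fields for you:

```python
import os
from datetime import date
from pathlib import Path
from typing import Dict

def payload_oxum(data_dir: Path) -> str:
    """Payload-Oxum per RFC 8493: "<total octets>.<number of payload files>"."""
    total_bytes = 0
    file_count = 0
    for dirpath, _dirnames, filenames in os.walk(data_dir):
        for filename in filenames:
            total_bytes += os.path.getsize(os.path.join(dirpath, filename))
            file_count += 1
    return f"{total_bytes}.{file_count}"

def bag_info_text(fields: Dict[str, str]) -> str:
    """Render "Label: value" lines, sorted into the same order as the profile listing."""
    return "".join(f"{label}: {value}\n" for label, value in sorted(fields.items()))

if __name__ == "__main__":
    # Hypothetical bag location; remaining profile fields are left for the operator
    fields = {
        "BagIt-Profile-Identifier": "scrc-digitization-profile.json",
        "Origin": "digitization",
        "Bagging-Date": date.today().isoformat(),
        "Payload-Oxum": payload_oxum(Path("my_bag") / "data"),
    }
    print(bag_info_text(fields), end="")
```
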
5 changes: 5 additions & 0 deletions managing/creatingdaos.md
@@ -9,6 +9,11 @@ nav_order: 1

These instructions are a work in progress. Efforts are underway to integrate the creation of digital object records within broader workflows. These new workflows would generate digital object records upon ingest into the preservation environment or access systems.

# Using package_and_ship
[package_and_ship](https://github.com/gwu-libraries/package_and_ship)

DAO records are automatically created when using the package_and_ship tool to ingest content into SCRC's digital collection storage. A DAO record created by this tool will hold `File URI` values that point to the content in the storage area.

# Using Digital Object Creator (Google Colab Notebook)
[Aspace Digital Object Creator](https://drive.google.com/drive/folders/1br8rcrGZlsoAOBGiLDVIG12c8szJwXuQ?usp=drive_link)
