From 750b655b292ee60563adbff7ff6bb92d4d9e81a3 Mon Sep 17 00:00:00 2001
From: DaltonAlves <110255670+DaltonAlves@users.noreply.github.com>
Date: Wed, 6 Nov 2024 15:46:29 -0500
Subject: [PATCH] web archives metadata + digcol storage

---
 managing/audittool.md      |   3 +-
 managing/storage.md        |  22 ++++--
 webarchives/webmetadata.md | 138 +++++++++++++++++++++++++++++++++++++
 3 files changed, 158 insertions(+), 5 deletions(-)
 create mode 100644 webarchives/webmetadata.md

diff --git a/managing/audittool.md b/managing/audittool.md
index 3a74788..3a38611 100644
--- a/managing/audittool.md
+++ b/managing/audittool.md
@@ -2,8 +2,9 @@
 layout: page
 title: "Audit Tool"
 permalink: /managing/audittool/
-parent: Managing Digital Collections - Access and Preservation
 ---
+# **This tool has been deprecated!**
+
 # Audit Tool
 
 [Github Repo](https://github.com/gwu-libraries/audit-tool)
diff --git a/managing/storage.md b/managing/storage.md
index ab8f9c5..ec2a5d2 100644
--- a/managing/storage.md
+++ b/managing/storage.md
@@ -1,7 +1,21 @@
 ---
 layout: page
-title: "Digital Collections Storage"
-permalink: /storage/
-parent: Managing Digital Collections - Access, Storage, & Preservation
-has_children: true
+title: "SCRC Digital Collection Storage"
+permalink: /managing/storage/
+parent: Managing Digital Collections - Access and Preservation
 ---
+
+# SCRC Digital Collection Storage
+
+SCRC digital collections are presently stored and managed within Amazon Web Services' "Simple Storage Service (Amazon S3)." This cloud-based storage environment is used to maintain Archival Information Packages (AIPs).
+
+## DigCol CloudFront Application
+[Digcol-cloudfront-app Repo](https://github.com/gwu-libraries/digcol-cloudfront-app)
+
+For internal access to SCRC Digital Collections Storage and the AIPs that it contains, SCRC staff may browse an inventory of the storage environment via the DigCol CloudFront application.<br>This tool provides direct access to AIPs and facilitates the retrieval of digital content. [Digital Archival Object records](/managing/daos.md) are leveraged to maintain URIs that link Archival Objects with their corresponding digital content in the SCRC Digital Collections Storage, accessible through the DigCol CloudFront application.
+
+Access to the CloudFront application requires authentication via GW Single Sign-on.
+
+## Technical Debt
+
+Historically, SCRC has not packaged digital collection objects with metadata. This absence of descriptive, administrative, and technical metadata presents significant challenges for managing and providing access to digital collection materials. Without this critical information, it becomes more difficult to ensure the proper organization, preservation, and discoverability of digital objects, potentially hindering long-term access and curation efforts.
\ No newline at end of file
diff --git a/webarchives/webmetadata.md b/webarchives/webmetadata.md
new file mode 100644
index 0000000..3c88d73
--- /dev/null
+++ b/webarchives/webmetadata.md
@@ -0,0 +1,138 @@
+---
+layout: page
+title: "Metadata for Web Archives"
+permalink: /webarchives/metadata/
+parent: GW Web Archives
+---
+
+# Metadata for Web Archives
+
+Metadata for web archives exists in two primary places: within Archive-It, the system used by GWLAI to collect and manage its web archives collection, and within ArchivesSpace, the system that GWLAI uses to describe its archives collections.
+
+Archive-It uses 15 Dublin Core elements. Additional custom metadata fields may also be applied. These metadata fields may be applied to any level of a web archive: collection, seed, and document. SCRC primarily applies metadata to the collection and seed level.<br>See the [Add, edit, and manage your metadata article](https://support.archive-it.org/hc/en-us/articles/208332603-Add-edit-and-manage-your-metadata) in the Archive-It Help Center for more information about metadata and Archive-It.
+
+This page synthesizes local application recommendations based upon the *Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group.* [^1]
+
+## Metadata Profile for Web Archives
+
+| Element | **Required/Optional** |
+| ---------------------- | --------------------- |
+| [Title](#title) | **Required** |
+| [Description](#description) | **Required** |
+| [Collector](#collector) | **Required** |
+| [Date](#date) | **Required** |
+| [Language](#language) | **Required** |
+| [Creator](#creator) | **Strongly Encouraged** |
+| [Identifier_Collection](#identifier_collection) | **Strongly Encouraged** |
+| [Subject](#subject) | **Strongly Encouraged** |
+| [Source of Description](#source-of-description) | **Optional** |
+| [Genre/Form](#genreform) | **Optional** |
+| [Contributor](#contributor) | **Optional** |
+
+This metadata profile is primarily designed for individual seeds within web archive collections, rather than entire collections of web archive content. While it shares similarities with metadata used in finding aids/EAD, it focuses specifically on metadata creation for individual seeds.
+
+Metadata should be applied to seeds in Archive-It when they are created. It is *strongly encouraged* that all new seeds receive all *required* fields from this profile.
+
+### Title
+
+| **Required** | Explanation |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
+| Definition | The name by which an archived website or collection is known.<br> |
+| Standard Usage for SCRC | Office of the President website |
+| Standards | DACS 2.3 |
+| Guidance | Transcribe directly from the head of the homepage (website) or inspect the homepage for a relevant metatag or other related element. |
+
+### Description
+
+| **Required** | Explanation |
+| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Definition | One or more notes explaining the content, context, and other aspects of an archived website or collection. |
+| Example Usage for SCRC | Website of the Office of the President, George Washington University. Contains information about the President’s office and community messages published by the President. |
+| Standards | DACS 3.1 |
+| Guidance | Briefly describe the scope and content of the archived website. Describe what the website is about, its purpose, and describe who created it. |
+
+### Collector
+
+| **Required** | Explanation |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Definition | Organization responsible for collecting the archived content. |
+| Example Usage for SCRC | - Special Collections Research Center, The George Washington University<br>- George Washington University Libraries |
+| Standards | DACS 2.2 |
+| Guidance | Identify the institution responsible for selecting websites for archiving, crawling the websites, and creating and maintaining the metadata that describes the content |
+| Note | Use SCRC when content falls under SCRC collecting scope. Use GW Libraries when content falls outside of SCRC collecting scope. |
+
+### Date
+
+| **Required** | Explanation |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Definition | A single date or span of dates associated with the _capture_ of an archived website or collection. |
+| Example Usage for SCRC | * Date first crawled: 2024-11-01<br>* Captured 2024-ongoing<br>* Captured 2021-2024 |
+| Standards | DACS 2.4 |
+| Guidance | For non-scheduled crawls, use single dates. For seeds that are scheduled, use an ongoing date. If a scheduled seed becomes inactive, use an end date as part of a date range. |
+| Note | Do not include dates outside the range of the archived content. For example, if a website was first crawled in 2024, but the website was initially created in 2004, only use the 2024 date. |
+
+### Language
+
+| **Required** | Explanation |
+| ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------- |
+| Definition | The language(s) of the archived content.<br> |
+| Standard Usage for SCRC | English<br>Spanish<br>Chinese |
+| Standards | DACS 4.5 |
+| Guidance | This field may be repeated as many times as necessary to capture the languages used throughout the archived content.<br>Use the English name of the language from ISO 639-2 (not the ISO codes) |
+
+### Creator
+
+| **Strongly Encouraged** | Explanation |
+| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Definition | An organization or person principally responsible for creating the intellectual content of an archived website or collection. |
+| Standard Usage for SCRC | George Washington University. Office of the President |
+| Standards | DACS 2.6 |
+| Guidance | The creator of a single website, such as an institutional home page, blog or Twitter feed, usually is easily identified unless purposely anonymous, while a collection of websites focused on a current event or topic rarely has an overall creator. See also: [contributor](#contributor).|
+
+### Identifier_Collection
+
+| **Strongly Encouraged** | Explanation |
+| ---------------------------- | --------------------------------------------------------------------------------------------------------|
+| Definition | The collection Identifier associated with the archived website. Use the resource record that |
+| Example Usage for SCRC | - RG002<br>- MS2285<br>- NEA1011-RG |
+| Standards | DACS 2.1 |
+| Guidance | Identify the creator of the website first, then see if there is a collection for that creator. |
+| Note | May use multiple collection identifiers if the archived website is associated with multiple collections. |
+
+### Subject
+
+| **Strongly Encouraged** | Explanation |
+| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Definition | Primary topic(s) describing the content of an archived website or collection |
+| Example Usage for SCRC | \- George Washington University<br>\- Education, Higher -- United States<br>\- Cross-country running |
+| Guidance | Identify the creator of the website first, then see if there is a collection for that creator. |
+| Note | Identify topical subjects, geographic locations, and people and organizations relevant to the content of the collection.<br>Use subjects already present in ArchivesSpace. If not present, use FAST or LCSH. |
+
+### Source of Description
+
+| **Optional** | Explanation |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------ |
+| Definition | Information about the gathering or creation of the metadata itself, such as sources of data or the date on which source data was obtained. |
+| Example Usage for SCRC | Description based on archived webpage captured on November 6, 2024. |
+| Standards | DACS 7.1.8 |
+| Guidance | Added value. Use when the website has been crawled for a long period of time and undergone many changes. |
+
+### Genre/Form
+
+| **Optional** | Explanation |
+| ---------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| Definition | The type of content in an archived website or collection. |
+| Example Usage for SCRC | \- Website<br>\- News article<br>\- Social Media |
+| Note | At present, do not use for individual seeds/websites unless the format is uncommon. This is mostly relevant to collection-level metadata or description for web archives in ArchivesSpace (finding aids). |
+
+### Contributor
+
+| **Optional** | Explanation |
+| ---------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Definition | An organization or person secondarily responsible for the content of an archived website or collection |
+| Example Usage for SCRC | Knapp, Steven, 1951- |
+| Standards | DACS 2.6 |
+| Guidance | If two or more entities share principal responsibility, place them all in Creator field. Otherwise, place one in the Contributor element. *Use Contributor for all that have secondary responsibility* |
+| Note | Use agent records from ArchivesSpace. If agent record not present, please reach out to the archivist responsible for the relevant collection area to discuss adding an agent record. When no agent record is present in ArchivesSpace, use LCNAF.<br> |
+
+[^1]: Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.2