From 9b59dc8eab7a13c788e27fda17940cf2fd62f5be Mon Sep 17 00:00:00 2001
From: DaltonAlves <110255670+DaltonAlves@users.noreply.github.com>
Date: Thu, 23 May 2024 12:47:01 -0400
Subject: [PATCH 1/2] webvtt

---
 README.md                     |  2 +-
 digitization/av/av_records.md |  2 +-
 digitization/av/captions.md   | 38 +++++++++++++++++++++++++++--------
 3 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/README.md b/README.md
index ed8fa0f..d8bb2de 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,2 @@
 # Digital Services Documentation
-Documentation for [LAI Digital Services unit](https://library.gwu.edu/digital-collections-and-services). The documentation can be accessed at: (https://gwu-libraries.github.io/DigitalServicesDocs/)
+Documentation for [LAI Digital Services unit](https://library.gwu.edu/digital-collections-and-services). The documentation can be accessed at: https://gwu-libraries.github.io/DigitalServicesDocs/
diff --git a/digitization/av/av_records.md b/digitization/av/av_records.md
index f7299a0..6ee007d 100644
--- a/digitization/av/av_records.md
+++ b/digitization/av/av_records.md
@@ -4,7 +4,7 @@ title: "Enhancing AV records"
 permalink: /digitization/avrecords/
 parent: "Digitization: Audiovisual Carriers"
 grand_parent: Digitization
-nav_order: 1
+nav_order: 3
 ---
 
 When digitizing audiovisual carriers we should update finding aids to reflect new information gained from the digitization process. Use the following fields in ArchivesSpace to hold new information.
diff --git a/digitization/av/captions.md b/digitization/av/captions.md
index eff1979..a372101 100644
--- a/digitization/av/captions.md
+++ b/digitization/av/captions.md
@@ -1,24 +1,46 @@
 ---
 layout: page
-title: "Creating Captions and Transcripts for A/V material"
+title: "Captions and Transcripts for AV material"
 permalink: /digitization/captions/
 parent: "Digitization: Audiovisual Carriers"
 grand_parent: Digitization
-nav_order: 1
+nav_order: 2
 ---
-# Creating Captions and Transcripts for Audiovisual material
-Captions and transcripts for audiovisual material make collections material more accessible and enhance the over-all level of access. Captions should be created as VTT (WEBVTT) format. Captions can be embedded into the content, but only for access files.
+# Captions and Transcripts for Audiovisual Material
+Captions and transcripts for audiovisual material make collection materials more accessible and enhance the overall level of access.[^1] Captions should be created in the WebVTT format. Captions can be embedded or burned into access files, but they should not be embedded into master files.
 
-## Tools for Creating and Reviewing Captions
+# Tools for Creating and Reviewing Captions
+
+## WhisperAI - Jupyter Notebook Python Script
+
+The Digital Services unit uses [WhisperAI](https://github.com/openai/whisper) via a Python script run in a Jupyter Notebook environment to generate WebVTT caption files. This script sets WebVTT metadata values as described in the [Guidelines for Embedding Metadata in WebVTT Files (FADGI)](https://www.digitizationguidelines.gov/guidelines/accessibilty_WebVTT.html).
+
+The WebVTT files generated through this script can be edited in SubtitleEdit.
+
+## SubtitleEdit
+SubtitleEdit is an open source tool for creating and editing subtitles and captions. SubtitleEdit can also be used as a GUI for WhisperAI.
 
 ## Adobe Premiere
 The SCRC digitization lab provides access to Adobe Premiere. You may also request a license for your individual workstation via GWU IT.
 
 -[Guide: Creating Captions using Adobe Premiere](https://docs.google.com/document/d/1ZRbPUFNSYGfbZA5lTdnkvejmCeflvuGvMg0Ot4aBwyU/edit?usp=sharing)
 
-## SubtitleEdit 
-SubtitleEdit is an open source tool for creating and editing subtitles and captions. SubtitleEdit can also be used as a GUI for WhisperAI.
 
-[^1]: Dave Rodriguez, Bryan J. Brown, and Florida State University Libraries, “Comparative Analysis of Automated Speech Recognition Technologies for Enhanced Audiovisual Accessibility,” The Code4Lib Journal, no. 58 (December 4, 2023), https://journal.code4lib.org/articles/17820.
+# File Naming for Caption and Subtitle Files
+WebVTT files should inherent the file name of the AV source file. The file name should also include information about the language and WebVTT type to improve accessibility.
 
+Example:
+- *Source AV File*: ms0000_s01_ss02_c03_i01.mp4
+- *WebVTT Type*: caption
+- *Language*: eng
+- **WebVTT file**: ms0000_s01_ss02_c03_i01.caption.eng.vtt
 
+# Metadata for WebVTT 
+See [Guidelines for Embedding Metadata in WebVTT Files (FADGI)](https://www.digitizationguidelines.gov/guidelines/accessibilty_WebVTT.html). Required elements should be used when possible. The following elements are *always recommended*:
+ 
+- Type
+- Language
+- Originating File
+
+---
+[^1]: Dave Rodriguez, Bryan J. Brown, and Florida State University Libraries, “Comparative Analysis of Automated Speech Recognition Technologies for Enhanced Audiovisual Accessibility,” The Code4Lib Journal, no. 58 (December 4, 2023), https://journal.code4lib.org/articles/17820.
\ No newline at end of file

From b4e4756121e565ed5367808dd5bc3c2adf81f732 Mon Sep 17 00:00:00 2001
From: DaltonAlves <110255670+DaltonAlves@users.noreply.github.com>
Date: Fri, 24 May 2024 11:44:44 -0400
Subject: [PATCH 2/2] Update captions.md

---
 digitization/av/captions.md | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/digitization/av/captions.md b/digitization/av/captions.md
index a372101..84723f5 100644
--- a/digitization/av/captions.md
+++ b/digitization/av/captions.md
@@ -27,13 +27,12 @@ The SCRC digitization lab provides access to Adobe Premiere. You may also reques
 
 
 # File Naming for Caption and Subtitle Files
-WebVTT files should inherent the file name of the AV source file. The file name should also include information about the language and WebVTT type to improve accessibility.
+WebVTT files should inherit the file name of the AV source file. The file name should also include information about the language to improve accessibility.
 
 Example:
 - *Source AV File*: ms0000_s01_ss02_c03_i01.mp4
-- *WebVTT Type*: caption
 - *Language*: eng
-- **WebVTT file**: ms0000_s01_ss02_c03_i01.caption.eng.vtt
+- **WebVTT file**: ms0000_s01_ss02_c03_i01_eng.vtt
 
 # Metadata for WebVTT 
 See [Guidelines for Embedding Metadata in WebVTT Files (FADGI)](https://www.digitizationguidelines.gov/guidelines/accessibilty_WebVTT.html). Required elements should be used when possible. The following elements are *always recommended*:
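
Note on the file naming rule as revised in PATCH 2/2: the WebVTT file name is the source file name without its extension, followed by an underscore, the language code, and `.vtt`. A minimal Python sketch of that rule follows. The function name `webvtt_name` and the default language `eng` are hypothetical choices made for illustration; they do not come from the Digital Services script, and the sketch does not check that the language code is valid.

```python
from pathlib import PurePath


def webvtt_name(source_file: str, language: str = "eng") -> str:
    """Build a WebVTT file name from an AV source file name.

    Hypothetical helper: follows the "<source stem>_<language>.vtt"
    pattern shown in the documentation example.
    """
    if not language:
        raise ValueError("a language code is required")
    # stem drops only the final extension, e.g. ".mp4"
    stem = PurePath(source_file).stem
    return f"{stem}_{language.lower()}.vtt"


print(webvtt_name("ms0000_s01_ss02_c03_i01.mp4"))
# prints: ms0000_s01_ss02_c03_i01_eng.vtt
```

Because only the final extension is removed, a source name that contains other dots keeps them in the WebVTT file name.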