Skip to content

Latest commit

 

History

History
43 lines (34 loc) · 2.41 KB

README.md

File metadata and controls

43 lines (34 loc) · 2.41 KB

Preservica-Deduplication

Process to run on exported collections in preservica to sort through duplicated collections and match correct PreservicaRef's found in Gamera

deduplicate.sh

Bash script that takes in the exported.csv structure and pulls the SourceIDs from the inital csv. Then runs a drush command that pulls the rels-ext of each sourceID and stores the preservica Ref from the file. Both SourceID and preservicaRef gets stored in a new .csv file to be read by accessAPI.py

accessAPI.py

takes in a .csv file with the first column being the sourceID (pitt:##########) and the second column being the subsequent preservica ref pulled from Gamera

using the SourceID and preservica Ref the following pseudocode is implemented:

        Find islandora_PIDs with an RELS-EXT islandora:preservicaRef
        Foreach islandora_PID
                    Search Preservica's sourceID for the islandora_PID value
                    Foreach Preservica RefId
                        If the Preservica RefId has a root folder which is not "IslandoraIngests"
                                    If no RefId is flagged as authoritative
                                        Flag this RefId as authoritative
                                    Else
                                        Raise an error (mulitple are authoritative)
                                        Move to the next islandora_PID
                                    EndIf
                        EndIf
                    EndForeach
                    If No RefIf is flagged as authoritative
                        Flag the last seen RefId as authoritative
                    EndIf
                    Foreach Preservica RefId
                        If the RefId is flagged as authoritative
                                    If the islandora:preservicaRef differs
                                        Update the islandora:preservicaRef in the RELS-EXT
                                    EndIf
                        Else
                                    Mark the Preservica RefId to be moved to the ZZZ_DELETE folder
                        EndIf
                    EndForeach
        EndForeach

once the code executes the remaining file - update.csv will be used to run the move_to_trash function or the update_preservica_ref function depending on the key.