Digital Preservation at the Library of Birmingham: The journey of a digital record from transfer to ingest

In my previous post I discussed the various digital resources and collections held by the Library of Birmingham, focusing on the Archives and Collections Service.

With the service now reopened for a few months, we have recently seen increasing numbers of enquiries about potential deposits of digital records and have received a few new digital accessions. This has provided an opportunity to test various workflows and processes I have been developing around the transfer, processing and ingest of such material.

We recently received a deposit of additional material from the City of Birmingham Symphony Orchestra (Collection ref: MS 4657 (2021/072)), comprising CBSO Programmes dating from around the mid-1950s to the present. This included 53.6 megabytes of PDF programme files stored on an external hard drive.

USB hard drive containing the CBSO programme files

After receiving the material, I created a folder to temporarily store the assets on our servers. Within this directory I created a sub-folder to store any documentation and metadata generated during the transfer process – including a JPEG photograph of the storage media taken after deposit, shown above.

Before copying the files, I needed to perform a few virus and integrity checks on the drive. A process critical to digital preservation is characterisation, a series of activities undertaken in order to identify and describe what a file is and its defining technical characteristics. I used a free application called DROID (developed by the National Archives) to scan the drive and generate various technical and structural metadata, to identify file names, formats and sizes, and dates of creation/modification.
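To give a flavour of what a characterisation record captures, here is a minimal Python sketch that gathers the same kinds of technical metadata for a single file – name, size, last-modified date and a checksum. This is only an illustration: DROID itself does much more, notably identifying file formats against the PRONOM registry, which is not reproduced here.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path

def characterise(path: Path) -> dict:
    """Record basic technical metadata for one file, in the spirit of a
    characterisation tool: name, size, modification date and an MD5
    checksum. (A simplified sketch; format identification is omitted.)"""
    stat = path.stat()
    md5 = hashlib.md5()
    with path.open("rb") as f:
        # Hash in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            md5.update(chunk)
    return {
        "name": path.name,
        "size_bytes": stat.st_size,
        "last_modified": datetime.fromtimestamp(
            stat.st_mtime, tz=timezone.utc
        ).isoformat(),
        "md5": md5.hexdigest(),
    }
```

Running this over every file on a drive and writing the results to CSV would produce a manifest broadly comparable to a DROID profile export.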

CSV of DROID profile, with file checksums highlighted

The metadata in the DROID profile were saved in CSV spreadsheet format, and I created a verifiable manifest of the files on the drive. DROID creates checksums (or hashes) – alphanumeric sequences effectively unique to each file analysed. These serve as a ‘digital signature’ that can help verify the authenticity of the digital object over time. If the file is altered even minutely – for example by slightly cropping a JPEG image – running it through DROID again would reveal a completely different checksum.
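This sensitivity is easy to demonstrate. In the sketch below, changing a single character in the data produces an entirely different hash; MD5 is used for illustration (checksum tools commonly offer MD5, SHA-1 or SHA-256).

```python
import hashlib

original = b"CBSO programme, example content"
altered  = b"CBSO programme, example content."  # one byte appended

h1 = hashlib.md5(original).hexdigest()
h2 = hashlib.md5(altered).hexdigest()

# Even the smallest alteration yields a completely different checksum,
# so re-hashing a file later exposes any change since the manifest was made.
assert h1 != h2
```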

I then copied the files using a robust copy command at the command line – a more powerful and versatile copying process which reports back on the number of files copied and flags any that were missed or mismatched, guarding more effectively against the risk of data loss or corruption during the transfer process.

I ran DROID again after the files were copied to their new directory, comparing the new checksums and modification dates against those on the manifest. None of the checksums had changed during the transfer, and the characterisation process revealed no file duplication, broken file extensions, or unidentified or unreadable file formats.
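The post-copy verification step can be sketched as follows. The hypothetical `verify_transfer` helper below re-hashes every file in the source and destination directories and reports any copies that are missing or whose checksums disagree – the same comparison made between the two DROID profiles, reduced to its essentials.

```python
import hashlib
from pathlib import Path

def md5_of(path: Path) -> str:
    """MD5 checksum of a file, read in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_transfer(source: Path, destination: Path) -> list[str]:
    """Compare every file under source against its copy under destination,
    returning the relative paths of any missing or mismatched files."""
    problems = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = destination / src.relative_to(source)
        if not dst.exists() or md5_of(src) != md5_of(dst):
            problems.append(str(src.relative_to(source)))
    return problems
```

An empty result means every copy matched its original byte for byte.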

DROID profile in the application's user interface

The reports, manifests and other documentation gathered during the characterisation and validation process were saved to the documentation and metadata folder. I then set about generating richer descriptions of the actual contents of the programmes – contextual information such as the title and date of performance, description, identifier, creator name and copyright information. I will go into the subject of metadata in my next post, but to summarise: we currently adhere to the basic 15-element version of the Dublin Core metadata standard.

Using information gathered from the initial characterisation process and more detailed file analysis, I assembled metadata for each PDF file in a CSV document, with the Identifier for each file comprising the file name and extension. The CSV was saved in an encoded format that allowed me to convert it into XML sidecar files that can be moved into our Preservica digital preservation system alongside their assets, as seen in the screenshot below. 
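The conversion from a metadata spreadsheet row to an XML sidecar can be sketched as below. This is a hypothetical illustration, not the actual tool used: it wraps Dublin Core elements in the standard OAI-DC envelope, and the field names in the example row are invented for demonstration (the real spreadsheet followed the 15-element standard described above).

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

def row_to_sidecar(row: dict) -> bytes:
    """Turn one CSV row (Dublin Core element -> value) into a simple
    OAI-DC XML sidecar. Empty values are skipped."""
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, value in row.items():
        if value:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Illustrative row only -- not taken from the actual CBSO spreadsheet.
row = {
    "title": "CBSO Programme (example)",
    "identifier": "programme_example.pdf",
    "creator": "City of Birmingham Symphony Orchestra",
}
sidecar = row_to_sidecar(row)
# The sidecar would then be saved alongside its PDF, named according to
# whatever convention the ingest tool expects.
```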

CBSO programme PDF, corresponding metadata, and open XML metadata file

Once files and metadata were together, I worked through the ingest process, moving them into our digital archive in Preservica using the system's Preservation and Upload Tool. Again, I will look at Preservica in a future post, but the process was successful, the reports revealed no issues, and the assets now sit in the repository with the metadata from the XML files attached and displayed alongside each asset.

The digital files can now be searched in the system by staff and, in time, members of the public. This post provides only a very cursory glance at the journey of a digital asset from transfer to entry into a digital repository system – but it is satisfying to begin the process of getting our digital collections into a place where they can be better managed, monitored, preserved and made accessible in the future.

Michael Hunkin, Digital Preservation Officer