Data to last a lifetime: the Imperial War Museum's storage campaign
"When I started I was shown an office with a shelf with a load of 500 gigabyte hard drives"
CIOs everywhere are duty-bound to source long-term, secure storage for their organisations - but spare a thought in particular for Ian Crawford, CIO at the Imperial War Museum (IWM). His remit is not only to preserve and store five heritage sites' worth of multimedia containing extremely sensitive data, but to store it forever. With one of the oldest film archives in the world, containing around 24,000 hours of complex audio and video as well as newly recorded material, providing the infrastructure for a digitisation programme that can cope with assets this size (as well as the usual business storage requirements) is no small task.
"When I started [in 2007] I was shown an office with a shelf, which was about three metres long and bowing in the middle, with a load of 500 gigabyte hard drives on, and that was the collection. No backups… IT had just installed its first NAS box to start migrating the data onto it."
Though the museum had been digitising informally before his arrival, Crawford knew there was still a lot to be done. "We couldn't go out and buy a commercial product at the time that would meet all our needs, so we ended up with a sort of hybrid of two different companies' products: one [Axial] to do the cataloguing of the objects we've got in the collection; and then we use a commercial digital asset management (DAMs) company [Spectra Logic] to provide that side of the operation; and then we just use a simple API to link the two and got it integrated."
IWM deployed Spectra Logic's StorCycle software in March 2021, as part of a large-scale archive infrastructure project to preserve the digital information it already had, as well as the continuous additions to the collection through events, documentaries and shared projects. "We've had quite a lot of experience working with Spectra over the years: we've had the tape libraries - I think I ordered the first one about 10 years ago. We've still got it, and we've upgraded it several times. We've got a second one at Duxford as well, physically separate, and that's where the bulk of the digital assets we're generating [are stored], where we're making digital copies of the film."
Infrastructure that combines the ability to store huge quantities of data with 'old-school' cataloguing has not been easy to develop, but the structure that Crawford has implemented goes a long way to enabling exactly this. "We've got two T950 tape libraries, both using different tape technology. One is using an LTO-7, and the other's IBM TS1150. We follow an Open Archival Information System standard, which basically tells you that you need multiple copies and you need to do clever stuff like making sure they're not physically next to each other."
Both tape libraries are kept at Duxford, and any time a copy is made it can be saved to both archives simultaneously. "We use DPX which is non-compressed - basically like a string of TIFF files, all sequenced - and then we use ProRes to make a sort of mezzanine copy, which we can then generate other files from: DivX or MPEG-4, H.264, that sort of stuff. That is normally kept on spinning disks, so we replicate spinning disk as well; so, we've got a set of spinning disks - again, Spectra Logic - at Duxford, but this time it's replicated down to London over our Janet connection."
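The "multiple copies, not physically next to each other" rule Crawford describes can be expressed as a simple check over a list of copies. What follows is a minimal sketch, not IWM's or Spectra Logic's implementation: the `Copy` record, the location labels and the thresholds of two locations and two media types are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One stored copy of an asset (hypothetical record, for illustration)."""
    asset_id: str
    location: str  # e.g. a room or building label
    medium: str    # e.g. "LTO-7" or "TS1150"


def is_safely_replicated(copies, min_locations=2, min_media=2):
    """Return True if the copies span enough distinct locations and media.

    The default thresholds are assumptions chosen to mirror the setup
    described in the article (two libraries, two tape technologies).
    """
    locations = {c.location for c in copies}
    media = {c.medium for c in copies}
    return len(locations) >= min_locations and len(media) >= min_media


copies = [
    Copy("film-001", "library-A", "LTO-7"),
    Copy("film-001", "library-B", "TS1150"),
]
print(is_safely_replicated(copies))  # True
```

Using two different tape technologies means that a fault affecting one format or drive type does not put every copy at risk at once.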
The final frontier
Finding the space for similar storage in London is a challenge unlikely ever to be solved. Given unlimited budget, it could be possible, but the ideal world the heritage sector dreams of is a long way from reality. Funding and space are what every museum and archive is always seeking more of, but recent global events have also added their own issues.
"We've been doing a project over the last five or six years to put new galleries into our Lambeth Road building, and that's actually taken away quite a lot of office space. We've also given up another building, which is on the same campus, and that was where our main server room was, so we're trying to move all the kit out of that at the moment into Lambeth Road and up to Duxford, so Covid has done us no favours in delaying that project a bit."
Fortunately, foresight in planning for future work spaces has meant that Covid hasn't affected the IWM as harshly as it might have done. Shifting to support flexible working, and ensuring that those who most needed on-site access could get it, has meant that, despite the same loss of revenue most other heritage sites have seen during the pandemic, digitisation projects have been able to continue.
Of course, with projects still steaming ahead, and historic data in need of continued safe storage, the issue of space remains at the front of Crawford's mind. One of the greatest challenges has been finding a product that can not only store the archives already in existence, but also control the length of time data is stored.
"If we digitise a film, we might make multiple copies before we put it into the DAMs system, and they need to be stored somewhere while they're working on them… No-one ever says to me how long they want to keep it, it's just sort of assumed we can keep it forever… As a result, like a lot of organisations, we've ended up with quite a lot of data that just sits on spinning disk that's very, very rarely accessed, but we need to keep hold of it. This [StorCycle] product really is aimed at that. It gives us a platform where I'm confident that we can store [data] cheaply, because it's tape, but they can still access it when they need to."
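The idea of moving rarely accessed data off spinning disk starts with finding that data. The sketch below is not StorCycle's API or logic; it is a generic illustration that walks a directory tree and lists files untouched for a given number of days. The function name and the idle threshold are assumptions, and access times can be unreliable on filesystems mounted with options such as `noatime`, so the later of access and modification time is used.

```python
import time
from pathlib import Path


def find_cold_files(root, max_idle_days, now=None):
    """List (path, size_in_bytes) for files not used in max_idle_days.

    'Used' is approximated by the later of the file's access and
    modification timestamps, which is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # seconds in a day
    cold = []
    for path in Path(root).rglob("*"):
        if path.is_file():
            st = path.stat()
            last_used = max(st.st_atime, st.st_mtime)
            if last_used < cutoff:
                cold.append((path, st.st_size))
    return cold
```

Calling `find_cold_files("/some/share", 365)` on a hypothetical share would return the candidates for migration to a cheaper tier, along with their sizes, so the potential disk saving can be totalled before anything is moved.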
Tape in 2021?
Choosing tape over cloud storage may seem less than ideal to anyone outside the heritage sector, but there have always been specific criteria the IWM isn't willing to - or can't - compromise on. With volatile material like nitrate film requiring cold storage (currently with the BFI), and large quantities of digitised material requiring backups, the necessity of having more than one storage type is obvious. Plus, the sheer size of the files creates an issue with online storage.
"It's the cost, you know, the volumes we're talking about, and there's also sensitivity about some of the data we've got. A lot of it is Crown Copyright and we're the custodians of it, so we feel a bit safer looking after it ourselves as well… We've just invested in two new 4K scanners; and you can imagine the size of the files we're generating now - at 4K an hour-long propaganda documentary in our collection is over a terabyte in size… We did some trials of putting some large files into the cloud a couple of years ago and it just took forever to get them back."
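A rough calculation shows why an hour of uncompressed 4K film exceeds a terabyte, and why retrieving it over a network is slow. The figures below are assumptions for illustration, not IWM's actual scanner settings: a 4096 x 2160 frame, 4 bytes per pixel (10-bit RGB packed into 32 bits, a common DPX layout), 24 frames per second, and a hypothetical 1 Gbit/s link. File headers and protocol overhead are ignored.

```python
def uncompressed_size_bytes(width, height, bytes_per_pixel, fps, seconds):
    """Size of an uncompressed image sequence, ignoring per-file headers."""
    return width * height * bytes_per_pixel * fps * seconds


def transfer_hours(size_bytes, link_bits_per_second):
    """Ideal transfer time in hours, ignoring protocol overhead."""
    return size_bytes * 8 / link_bits_per_second / 3600


# Assumed parameters: 4096x2160, 4 bytes per pixel, 24 fps, one hour.
size = uncompressed_size_bytes(4096, 2160, 4, 24, 3600)
print(f"{size / 1e12:.2f} TB")                    # 3.06 TB
print(f"{transfer_hours(size, 1e9):.1f} hours")   # 6.8 hours
```

Under these assumptions a single hour-long film comes to roughly three terabytes and would take the best part of a working day to pull back over a fully available gigabit link, which is consistent with the experience Crawford describes.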
A British munitions factory during World War I. Credit: Imperial War Museums
While going full cloud may not be the way forward for the time being - and some items will always need to remain in their original, historical format - Crawford is aware that products will continue to improve over time, and has plans to keep abreast of developments.
"StorCycle is clearly aimed at our more production and business-type storage, where we've got all the normal stuff people have, like Word documents and Excel, but then we've got a lot of quite exotic stuff to do with exhibitions, so a lot of film stills, but stuff that's not in the DAMs… We know we're going to migrate about 70 per cent of the data that's on our existing platform into StorCycle eventually."
With new galleries for World War II and the Holocaust set to open in September/October, and the hopeful start of financial recovery along with them, it's more important than ever that the IWM's exponential data growth is well managed and that the data is fully accessible at all times. Ultimately, Crawford points out, "The more we digitise, the more we can do with that data."