For centuries, information was stored on paper or other analog media. Now that more and more content is stored digitally, the question is how to preserve it better. By 2020, the volume of digital data is expected to reach roughly 44 zettabytes, or 44 trillion GB, and it more than doubles every two years. Think of the huge volumes of photos, videos, emails, documents, social media files, and so on.
Digital preservation faces several challenges:
- Data volumes: While data storage is becoming cheaper, the hard part is deciding which files to keep and which to delete. Not all information needs to be stored, and the decision gets more complex as the volume of data grows. The storage system may not be able to cope with large files or multiple versions of files. In addition, many files are duplicated in different formats and stored in multiple locations.
- Hardware and software: Digital materials are more at risk and have a much shorter lifespan than analog materials, which means they need to be maintained and copied to newer storage whenever possible. New hardware may not be able to run old software, and old software may not work on new hardware.
- Cloud storage: While the cloud is becoming more popular for file storage, it is not designed for archiving. Because cloud data is hosted on third-party servers, things can go wrong outside your control and data may be lost, so it's recommended to also keep archive copies on personal external drives. Security and privacy are further concerns with cloud storage.
- File formats: Proprietary file formats are more challenging to preserve than open file formats, so whenever possible it's recommended to convert files to an open format. Even so, not all open formats are suitable for long-term preservation.
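One practical first step in deciding what to keep is finding files that are stored more than once. Below is a minimal sketch, assuming the files sit under a single local folder; the function names `sha256_of` and `find_duplicates` are hypothetical, chosen for illustration. It groups files by a SHA-256 hash of their contents, so it only catches byte-identical copies: the same photo saved as both JPEG and PNG will not be flagged.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by content hash, keeping only groups of 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            groups[sha256_of(path)].append(path)
    # Hashes seen only once are unique files, not duplicates.
    return {h: sorted(paths) for h, paths in groups.items() if len(paths) > 1}
```

For example, `find_duplicates("Photos")` returns one entry per set of identical files; reviewing those sets before deleting anything is safer than removing copies automatically.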
Bernard Marr describes Big Data in terms of five “V”s: Volume, Velocity, Variety, Veracity, and Value.
As Helen Shenton, head of collection care at the British Library, said: “A book left on the shelf for a hundred years might be fine, but digital data must be read and checked constantly to ensure their integrity.” Imagine libraries trying to manage such volumes of digital data.
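In practice, "checked constantly" usually means fixity checking: record a checksum for every file once, then recompute and compare at regular intervals. The sketch below is our own illustration, not the British Library's method. It assumes SHA-256 checksums and an archive kept in one local folder, and the names `build_manifest` and `verify` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> dict[str, str]:
    """Record a checksum for every file under `root`, keyed by relative path."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_of(p)
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def verify(root: str, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the current contents of `root` with a previously built manifest."""
    current = build_manifest(root)
    return {
        # Files that were recorded but have since disappeared.
        "missing": sorted(k for k in manifest if k not in current),
        # Files whose bytes no longer match the recorded checksum (possible corruption).
        "changed": sorted(
            k for k in manifest if k in current and current[k] != manifest[k]
        ),
        # Files added since the manifest was built.
        "new": sorted(k for k in current if k not in manifest),
    }
```

The manifest itself can be saved (for example as JSON) outside the archived folder and on a separate drive, so that a failure affecting the data does not also destroy the record used to detect it.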
These issues were also discussed by Vint Cerf, Vice President and Chief Internet Evangelist at Google, at a Digital Preservation event held at Google in NYC last month. Cerf is often called one of the fathers of the Internet. He said that we need to find a way to store data so that it remains accessible 100 years from now. He proposed a “digital vellum” that would preserve every piece of software and hardware so that none of it ever becomes obsolete: like a museum, but in a digital space in the cloud. If the idea succeeds, it may keep us from entering “digital Dark Ages.” We need to preserve content not only for ourselves but also for future generations.