In my last two blog posts (here and here), I have covered what the phrase “Digital Dark Age” means, and how it might affect your scientific data. Here, I will provide some basic tips on how to avoid losing your data in the event that the Digital Dark Age becomes a reality. A lot of what I cover in this blog post can also be found in the DataONE primer on data management or my previous DCXL post, Data Management 101.
1. Use non-proprietary software. Keep in mind that even the most ubiquitous software programs are ephemeral. Keep archival copies of your data and other files in open file formats that multiple programs can read. This means using formats like .txt for documents, .csv for spreadsheets, and .mp3 for audio. Check out Virginia Tech’s complete list of recommended file formats for more information.
2. Back up often and in multiple locations. This makes good sense for both short-term and long-term data preservation. It’s a similar concept to storing your old-school photo negatives in a different location from your actual printed photos, just in case of fire/flood/apocalypse. Read more about backing up in my blog post on the subject.
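If you are comfortable with a little scripting, the "multiple locations" habit can be automated. Here is a minimal Python sketch, not a full backup system: it copies one file into several folders and checks each copy against a SHA-256 checksum of the original. The function names `sha256_of` and `back_up` are my own, and the folders you pass in stand in for wherever your external drive or synced folder is mounted.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy `source` into every destination folder and verify each copy.

    The destination folders are assumptions for illustration: point them
    at an external drive, a network share, or a synced folder.
    """
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup at {target} does not match the original")
        copies.append(target)
    return copies
```

Calling `back_up(Path("results.csv"), [Path("/path/to/drive_a"), Path("/path/to/drive_b")])` would leave a verified copy in each folder. Both paths are placeholders.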
3. Transfer your backups to new media types every 3-4 years. All of the data from my undergraduate research project are safely stored on zip disks, despite the fact that I have no zip drive to read these disks. I am not alone: there are myriad stories about lost data stored on outdated media types. Here is an example from the Council on Library and Information Resources:
10-20% of data from the Viking Mars mission that was recorded on magnetic tapes have significant errors, because, as Jet Propulsion Laboratory technicians now realize, the magnetic tape on which they are stored is “a disaster for an archival storage medium.”
To avoid becoming a statistic, create backups on the latest and greatest media type every few years.
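When you move files to a new media type, it helps to confirm that nothing was lost or corrupted along the way. One possible approach, sketched below in Python, is to record a checksum for every file before the move and compare against that record afterwards. The names `write_manifest` and `find_problems` and the JSON manifest layout are my own choices for illustration, not a standard.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file. Reads the whole file at once, so best for modest sizes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_manifest(data_dir: Path, manifest_path: Path) -> dict[str, str]:
    """Record a checksum for every file under `data_dir`.

    Keep `manifest_path` outside `data_dir` so the manifest does not list itself.
    """
    manifest = {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def find_problems(new_dir: Path, manifest_path: Path) -> list[str]:
    """Compare the files on the new media against the saved manifest."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    for relative_name, expected in manifest.items():
        candidate = new_dir / relative_name
        if not candidate.is_file():
            problems.append(f"missing: {relative_name}")
        elif file_digest(candidate) != expected:
            problems.append(f"changed: {relative_name}")
    return problems
```

Run `write_manifest` on the old media before copying, then run `find_problems` on the new media. An empty list means every recorded file arrived intact.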
4. Document, document, document. Let’s imagine that you save your data in a non-proprietary format, make plenty of backups, and transfer to new media types frequently. These activities are only useful if others can effectively understand and use the data that you archived. Create quality metadata, take notes on your workflow, and generally document how you generate your data and what you do with it.
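Documentation is easiest to keep with the data if it lives in a plain file right next to it. As a rough sketch, the Python below writes a small JSON "sidecar" file beside a data file. The field names (`description`, `creator`, `columns`, `methods`) are assumptions I picked for illustration and do not follow any particular metadata standard, so swap in whatever your discipline or repository expects.

```python
import json
from datetime import date
from pathlib import Path


def write_metadata(data_file: Path, description: str, creator: str,
                   columns: dict[str, str], methods: str) -> Path:
    """Write a JSON sidecar describing `data_file` and return its path.

    The fields here are illustrative, not a formal metadata schema.
    """
    record = {
        "data_file": data_file.name,
        "description": description,
        "creator": creator,
        "date_documented": date.today().isoformat(),
        "columns": columns,   # column name -> meaning and units
        "methods": methods,   # how the data were generated or processed
    }
    # e.g. results.csv -> results.csv.metadata.json in the same folder
    sidecar = data_file.with_name(data_file.name + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

Because the sidecar is plain text in an open format, it follows the same advice as the data itself: it can be opened in any text editor, with or without the script that wrote it.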
5. For the really important stuff, create hard copies. I encourage scientists to move as much of their process into digital formats as possible: eliminate paper lab notebooks and stop filling out “data sheets” with a pencil. I discourage these methods because they imply future manual data entry, and because they make it much more difficult to keep data documentation with the data it describes. This advice does not, however, preclude creating paper copies of your digital data. In fact, printing off the most important information and storing those hard copies in an offsite location is generally a good idea. Paper is capable of lasting much longer than digital formats, so you can be certain that your most important work will be available irrespective of the next amazing media type that emerges.