Discussions about data seem to be everywhere. For evidence of this, look at recent discussions of big data, calls for increasing cyber-infrastructure for data, data management requirements by funders, and data sharing requirements by journals. Given all of this discussion, researchers are (or should be) considering how to handle their own data for both the long term and the short term.
The popularity of discussions about data is good and bad for the average researcher.
Let’s start with the bad first: it means researchers are now, more than ever, responsible for being good data stewards (before commenting that this “isn’t a bad thing!!”, read on). Gone are the days when you could manage your data in-house, with no worries that others might notice your terrible file naming schemes or scoff at the color coding system in your spreadsheets. With increasing requirements for managing and sharing data, researchers should be careful to construct their datasets and perform their analyses knowing that they will have to share those files eventually. This means that researchers need to learn a bit about best practices for data management and invest some time in creating data management plans that go beyond simply meeting funder requirements (which are NOT adequate for actually properly managing your data – see next week’s blog post for more).
Arguably, the “bad” I mention above is not actually bad at all. Speaking from the point of view of a researcher, however, anything that places more demands on your time can be taxing. Moving on to the good: all of this attention being given to data stewardship means that there are lots of places to go for help and guidance. You aren’t in this alone, researchers. In previous posts I’ve written about the stubbornness of scientists and our inherent inability to believe that someone might be able to help us. In the case of data management and related topics, it will pay off in the long run to put aside your ego and ask for help. Who should you ask? Here are a few ideas:
- Librarians. I’ve blogged about how great and under-used academic libraries and librarians tend to be, but it is worth mentioning again. Librarians are very knowledgeable about information. Yes, your information is special. No, no one can possibly understand how great/complex/important/nuanced your data set is. But I promise you will learn something if you go hang out with a librarian. Since my entry into the libraries community, I have found that librarians are great listeners. They will actively listen while you babble on endlessly about your awesome data and project, and then provide you with insight that only someone from the outside might provide. Bonus: many librarians are active in the digital data landscape, and therefore are likely to be able to guide you towards helpful resources for scientific data management.
- Data Centers/repositories. If you have never submitted data to a data center for archiving, you will soon. Calls for sharing data publicly will only get louder in the next few years, from funders, journals, and institutions interested in maximizing their investment and increasing credibility. Although you might be just hearing of data centers’ existence, they have been around for a long time and have been thinking about how to organize and manage data all along. How do you pick a data center? A wonderful searchable database of repositories is available at www.databib.org. Once you zero in on a data center that’s appropriate for your particular data set, contact them. They will have advice on all kinds of useful stuff, including metadata, file formats, and getting persistent identifiers for your data.
- Publishers and Funders. Although they wouldn’t be my first resource for topics related to data, many publishers and funders are increasingly providing guidance, help text, and links to resources that might help you in your quest for improved data stewardship.
My final takeaway is this: researchers, you aren’t in this alone. There is lots of support available for those humble enough to accept it.