Greetings, all. I’m a new postdoc at the CDL and I’m very excited to be spending the next couple of years thinking about data publication. Carly has discussed data publication several times before, but briefly, the goal is to improve dataset reproduction and reuse by publishing datasets as “first class” scholarly objects akin to journal articles, with the attendant opportunities for preservation, citation, and award of credit.
I spent most of grad school tickling worms with an eyebrow hair glued to a toothpick (this is true), but now I’m moving from lab to library as a CLIR/DLF Postdoctoral Fellow in Data Curation for the Sciences and Social Sciences. The Sloan Foundation funds these fellowships to, as Josh Greenberg puts it, train “professionals with one foot in research and one foot in data curation”.
Partly for my own edification, I’m starting with a thorough survey of the data publication landscape. I’ll be looking at current practices and proposals for data publication, citation, and peer-review. I’m interested in questions like: How can the quality of a dataset be evaluated? How does the creator of a dataset get credit for it? How do datasets remain findable, accessible, and usable in the future? Does it even make sense to apply the terms “publication” or “peer-review” to data at all?
Where things go from there depends on how the survey turns out, so that’s much more up in the air. One possibility is to put a workflow for data publication together from existing tools. Another is to identify a need not met by existing tools that the CDL could address.
If you have ideas you’d like to share, please comment here or email me.
Shameless Plug: Applications for 2014 CLIR/DLF Fellowships are opening soon!