The concept of data publication is simple in theory: rather than relying on journal articles alone for scholarly communication, let’s publish data sets as “first class citizens” (hat tip to the DataCite group).  Data sets have inherent value that makes them standalone scholarly objects. They are more likely to be discovered by researchers in other domains, working on other questions, if they are not tied to a specific journal and all of the baggage that entails.

Consider this example (taken from personal experience).  If you are a biologist interested in studying clam population connectivity, how likely are you to find the (extremely relevant) data related to clam shell chemistry that are associated with paleo-oceanography journals?  It took me several months before I discovered them during research for my graduate work.  If those datasets had been published in a repository, however, with a few well-chosen keywords and a quick web search, I would have located those datasets much more quickly.

Who would be against this idea, you ask?  It turns out data publication is similar to data management: no one is against the concept per se, but they are against all of the work, angst, and effort involved in making it a reality.  There is also considerable debate about how we should proceed to make data publication the norm in scientific communication.  In fact, there is debate about whether we should call it “data publication”.

A few months back, Mark Parsons of the National Snow and Ice Data Center and Peter Fox of Rensselaer Polytechnic Institute wrote a paper titled “Is data publication the right metaphor?”, with plans to publish in Data Science Journal. Before publication, however, they opened the paper up for comments on the web. This move sparked a lively debate among folks in the information, data, and libraries community, which I will leave you to explore on the Parsons blog, the Open Citations and Semantic Publishing blog post about this, and Bryan Lawrence’s comments on his wiki.

The basic argument is that the word “publication” implies that we are beholden to the current broken system of journal publication.  The word itself has too much baggage.  The opposing argument is that bureaucrats, funders, and institutions are familiar with the word “publication”, and that this familiarity will ensure the success of the data publication goals, regardless of whether we break the mold in the process.

Do you have thoughts on the subject? Email us, comment on this post below, or comment on the Parsons and Fox paper.


Wouldn't it be great if data were as easy to find, read, and store as books? Image: "Faculty Wives Book Fair", courtesy of San Joaquin Valley Library System, from Calisphere.


