The Skinny on Data Publication

The concept of data publication is simple in theory: instead of relying on journal articles alone for scholarly communication, let’s publish data sets as “first class citizens” (hat tip to the DataCite group). Data sets have inherent value that makes them standalone scholarly objects. They are more likely to be discovered by researchers in other domains and working on other questions if they are not associated with a specific journal and all of the baggage that entails.

Consider this example (taken from personal experience). If you are a biologist interested in studying clam population connectivity, how likely are you to find the (extremely relevant) data related to clam shell chemistry that are associated with paleo-oceanography journals? It took me several months before I discovered them during my PhD. If those datasets had been published in a repository, however, I would have located them much more quickly with a few well-chosen keywords and a quick web search.

Who would be against this idea, you ask?  It turns out data publication is similar to data management: no one is against the concept per se, but they are against all of the work, angst, and effort involved in making it a reality.  There is also considerable debate about how we should proceed to make data publication the norm in scientific communication.

A summary of what's wrong with the current system, from a PhD Comics cartoon.

I had a lovely dinner last week with some colleagues in town for the AGU meeting, where a passionate debate ensued about data publication. One of the scientists made the (quite valid) argument that data publication is a terrible phrase because the word “publication” implies that we are beholden to the current broken system of journal publication. The word itself has too much baggage. The opposing counsel suggested that bureaucrats, funders, and institutions are familiar with the word publication, and that this familiarity will ensure the success of the data publication goals, regardless of whether we break the mold in the process. We agreed to brainstorm potential metaphors for the concept of data publication that might result in a better phrase to describe the idea. Any suggestions?

This is relevant to the DCXL project since we consider this Excel add-in to be a stepping stone towards data publication (whatever we end up calling it). By allowing scientists to directly link with archives and upload their data, we are promoting data as a unique scholarly object. Through services like EZID, you can even get a DOI for your dataset. These are all good advances towards promoting data as a first-class object.

For more on the current debate that is raging about scholarly communication via journal publication, check out these two recent excellent pieces:

And for a giggle, watch the awesome cartoon called Scientist Meets Publisher from the blog Ceptional.

One thought on “The Skinny on Data Publication”

  1. Mark Parsons says:

    And here is a link to an essay on the topic by the scientist arguing against the data publication metaphor:

    The essay is out for open review, so I would welcome any critique.

