Closed Data… Excuses, Excuses

If you are a fan of data sharing, open data, open science, and generally openness in research, you’ve heard them all: excuses for keeping data out of the public domain. If you are NOT a fan of openness, you should be. For both groups (the fans and the haters), I’ve decided to construct a “Frankenstein monster” blog post composed of other people’s suggestions for how to deal with the excuses.

Yes, I know. Frankenstein was the doctor, not the monster. From Flickr by Chop Shop Garage.

I have drawn some comebacks from Christopher Gutteridge, University of Southampton, and Alexander Dutton, University of Oxford. They created an open Google Doc of excuses for closing off data and appropriate responses, and generously provided access to the document under a CC-BY license. I also reference the UK Data Archive’s list of barriers and solutions to data sharing, available via the Digital Curation Centre’s PDF, “Research Data Management for Librarians” (pages 14-15).

People will contact me to ask about stuff

Christopher and Alex (C&A) say: “This is usually an objection of people who feel overworked and that [data sharing] isn’t part of their job…” I would add to this that science is all about learning from each other – if a researcher is opposed to the idea of discussing their datasets, collaborating with others, and generally being a good science citizen, then they should be outed by their community as a poor participant.

People will misinterpret the data

C&A suggest this: “Document how it should be interpreted. Be prepared to help and correct such people; those that misinterpret it by accident will be grateful for the help.” From the UK Data Archive: “Producing good documentation and providing contextual information for your research project should enable other researchers to correctly use and understand your data.”

It’s worth mentioning, however, a second point C&A make: “Publishing may actually be useful to counter willful misrepresentation (e.g. of data acquired through Freedom of Information legislation), as one can quickly point to the real data on the web to refute the wrong interpretation.”

My data is not very interesting

C&A: “Let others judge how interesting or useful it is — even niche datasets have people that care about them.” I’d also add that it’s impossible to decide whether your dataset has value to future research. Consider the many datasets collected before “climate change” was a research topic which have now become invaluable to documenting and understanding the phenomenon. From the UK Data Archive: “Who would have thought that amateur gardener’s diaries would one day provide essential data for climate change research?”

I might want to use it in a research paper

Anyone who’s discussed data sharing with a researcher is familiar with this excuse. The operative word here is might. How many papers have we all considered writing, only to have them shift to the back burner due to other obligations? That said, this is a real concern.

C&A suggest the embargo route: “One option is to have an automatic or optional embargo; require people to archive their data at the time of creation but it becomes public after X months. You could even give the option to renew the embargo so only things that are no longer cared about become published, but nothing is lost and eventually everything can become open.” Researchers like to have a say in the use of their datasets, but I would caution that any restrictions should default to sharing. That is, after X months the data are automatically made open by the repository.

I would also add that, as the original collector of the data, you are at a huge advantage compared to others that might want to use your dataset. You have knowledge about your system, the conditions during collection, the nuances of your methods, et cetera that could never be fully described in the best metadata.

I’m not sure I own the data

No doubt, there are a lot of stakeholders involved in data collection: the collector, the PI (if different), the funder, the institution, the publisher, … C&A have the following suggestions:

  • Sometimes it’s as easy as just finding out who does own the data
  • Sometimes nobody knows who owns the data. This often seems to occur when someone has moved into a post and isn’t aware that they are now the data owner.
  • Going up the management chain can help. If you can find someone who clearly has management over the area the dataset belongs to they can either assign an owner or give permission.
  • Get someone very senior to appoint someone who can make decisions about apparently “orphaned” data.

My data is too complicated

C&A: “Don’t be too smug. If it turns out it’s not that complicated, it could harm your professional [standing].” I would add that if it’s too complicated to share, then it’s too complicated to reproduce, which means it’s arguably not real scientific progress. This can be solved by more documentation.

My data is embarrassingly bad

C&A: “Many eyes will help you improve your data (e.g. spot inaccuracies)… people will accept your data for what it is.” I agree. All researchers have been on the back end of making the sausage. We know it’s not pretty most of the time, and we can accept that. Plus, it helps you strive to be better at managing and organizing data during your next collection phase.

It’s not a priority and I’m busy

Good news! Funders are making it your priority! New sharing mandates in the OSTP memorandum state that any research conducted with federal funds must be accessible. You can expect these sharing mandates to drift down to you, the researcher, in the very near future (6-12 months).

8 thoughts on “Closed Data… Excuses, Excuses”

  1. Useful advocacy.

    But why, when you argue for CC-BY in the post, is your own post CC-NC? This means I can’t repost it on my own blog, for example.

    • Carly Strasser says:

      Ah – fair point, Peter! I’m actually supposed to mark all of my posts with “All rights reserved – Regents of University California”. I had to lobby hard for CC-NC, but it was better than the former! If it were my own domain and a non-UC blog, I would certainly do things differently.

  2. Carly Strasser says:

    Comment from Matt Jones via Google+:

    you make some good points about how to rebut the reasons that people give for not sharing data. Thanks for the nice overview. I think the issue of data ownership, however, should really be addressed differently, at least in the US. When someone says “I’m not sure I own the data”, in the US the correct answer is: you’re right, you don’t own the data. Even if you collected it. Even if funded only by private interests. To own something implies a property interest in that thing, and in the case of data, there is no property interest. Data are facts, and facts can not be copyrighted. In addition, for presentations of information that can be copyrighted, people often confuse the regulatory monopolies granted by copyright and patent law with ownership, but this is a spurious association — see Cory Doctorow’s excellent article on the dangers of using a property metaphor for knowledge (

    The reason that common science metadata standards like FGDC use terms like ‘Originator’ is to explicitly avoid the use of the word owner. Nobody owns data in the US. In some scientific cultures, a scientist might have an ethical right to be cited when their data are used, but that shouldn’t be confused with owning the data. I think these issues along with the problems associated with attribution stacking (see section 5.3 of and the nice illustration of the problem in are what really are pushing people towards releasing data under CC0 to be perfectly clear about the legal standing of data across jurisdictions.

  3. A useful resource, but I think the response to “My data is not very interesting” is a bit condescending and unhelpful. Hitting data producers with the doctrine “you must publish all your data, no matter how useless and uninteresting” does nothing to improve the relationship. Instead: “start with your most interesting data, and draw a line when the cost/benefit ratio doesn’t make sense” .

    • I see where you’re coming from. At least on the institutional (i.e., not research) side, if the people pushing for opening up the data can’t come up with some examples — even if they don’t quite convince the data owner/originator — then the demonstrable benefit is low. However, our approach is to target stuff we can see would be useful for internal purposes, so the less interesting/useful stuff wouldn’t be asked for. Conversely, if I’m asking for it, it’s because at least one person — me — sees utility in it.

      Research data isn’t my area, but I suspect supporting serendipity and unforeseen re-use is far more important.

  4. […] Closed Data… Excuses, Excuses ::: Data Pub […]

  5. […] interesting resource is a blog post in the Data Pub blog titled “Closed Data…Excuses, Excuses“, which provides an alternative approach for opening up data; it lists a number of excuses […]

  6. Αgro-Κnow says:

    Carly thank you for this really interesting blog post! It shows exactly how easy it is to respond to the naive excuses posed by open data-ignorant people; if everyone could only understand the usefulness and potential of open data, they could have taken some time and effort to work towards opening up their data, which in turn could prove extremely useful to other stakeholders!

    We have just published a short blog post mentioning yours; we hope that you like it!
