
Embargoing the Term “Embargoes” Indefinitely

I’m two months into a position that devotes part of its time to overseeing Dash, a data publication platform for the University of California. On my first day I was told that a big priority for Dash was to build out an embargo feature. Coming to the California Digital Library (CDL) from PLOS, an OA publisher with an OA data policy, I couldn’t understand why I would be leading an effort to embargo data rather than open it up, so I met this embargo directive with apprehension.

I began acquainting myself with the campuses, and a couple of weeks ago at UCSF I presented a prototype of this “embargo” feature and asked why researchers wanted to close data on an open data platform. This is where it gets fun.

“Our researchers really just want a feature to keep their data private while their associated paper is under peer review. We see this frequently when people submit to PLOS”.

Yes, I had contributed to my own conflict.

While I laughed about how I had previously been the person at PLOS convincing UC researchers to make their data public, I recognized that this would be an easy issue to clarify. And here we are.

The term “embargo” carries a negative connotation in the open community, and I ask that, moving forward, we not use it to describe keeping data private until an associated manuscript has been accepted. Let us instead call this “Private for Peer Review” or “Timed Release,” with a “Peer Review URL” available for sharing data during the peer review process, as Dryad does.

  • Embargoes imply that data are being held private for reasons other than the peer review process.
  • Embargoes are not appropriate if you have a funder, publisher, or other mandate to open up your data.
  • Embargoes are not appropriate for sensitive data; such data should not be held in a public repository at all, embargoed or not, unless access is mediated by a data access committee and the repository has proper security.
  • Embargoes are not appropriate for open Data Publications.

To embargo your data beyond the peer review process (or for other reasons) is to shield your data from being used, built upon, or validated. This is contrary to “Open” as a strategy for furthering scientific findings and scholarly communication.

Dash is implementing features that will allow researchers to choose a publication date up to six months after submission, in line with what we believe is reasonable for peer review and revisions. Researchers who use this feature will be given a Peer Review URL that can be shared to download the data until the data are public. It is important to note that while the data may be private during this time, the DOI and associated metadata will be public and should be used for citation. These features are intended for peer review; we do not believe that data should be held private on an open data publication platform for other reasons.
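
For the curious, here is a minimal sketch of how such a timed-release gate could work. It is illustrative only: the class and field names (TimedReleaseDataset, peer_review_token, and so on) are our own invention, not Dash internals.

```python
import secrets
from datetime import date, timedelta

MAX_DELAY = timedelta(days=182)  # roughly six months after submission

class TimedReleaseDataset:
    """Hypothetical sketch: DOI and metadata public now, files public later."""

    def __init__(self, doi, metadata, files, submitted, requested_release):
        if requested_release - submitted > MAX_DELAY:
            raise ValueError("release date must fall within six months of submission")
        self.doi = doi                # public immediately, usable for citation
        self.metadata = metadata      # public immediately
        self._files = files           # private until the release date
        self.release_date = requested_release
        # shareable secret link for reviewers during peer review
        self.peer_review_token = secrets.token_urlsafe(16)

    def download(self, token=None, today=None):
        today = today or date.today()
        if today >= self.release_date or token == self.peer_review_token:
            return self._files
        raise PermissionError("dataset is private until %s" % self.release_date)
```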

Opening up data, publishing data, and giving credit to data are all important in emphasizing that data are a credible and necessary piece of scholarly work. Dash and other repositories will allow data to be private through peer review (with the intent that the data become public and accessible in the near future). However, my hope is that as the data revolution evolves, incentives to open up data sooner will become apparent. The first step is to check our vocab and limit the use of the term “embargo” to cases where data are being held private without an open data intention.


There’s a new Dash!

Dash: an open source, community approach to data publication

We have great news! Last week we refreshed our Dash data publication service.  For those of you who don’t know, Dash is an open source, community driven project that takes a unique approach to data publication and digital preservation.

Dash focuses on search, presentation, and discovery and delegates the responsibility for the data preservation function to the underlying repository with which it is integrated. It is a project based at the University of California Curation Center (UC3), a program at California Digital Library (CDL) that aims to develop interdisciplinary research data infrastructure.

Dash employs a multi-tenant user interface, providing partners with extensive opportunities for local branding and customization, use of existing campus login credentials, and, importantly, the Dash service under a tenant-specific URL, a consideration that helps drive adoption. We welcome collaborations with other organizations wishing to provide a simple, intuitive data publication service on top of more cumbersome legacy systems.
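
To make the multi-tenant idea concrete, here is a small sketch of tenant resolution by hostname. The configuration shape is our own illustration, not Dash’s actual configuration format; only the hostnames, which belong to live instances, are real.

```python
# Hypothetical tenant registry -- config keys are illustrative.
TENANTS = {
    "dash.ucmerced.edu": {"name": "UC Merced", "logo": "ucm.png", "idp": "shibboleth"},
    "oneshare.cdlib.org": {"name": "ONEshare", "logo": "oneshare.png", "idp": "google"},
}

def resolve_tenant(hostname):
    """Pick branding and login settings based on the request's hostname."""
    try:
        return TENANTS[hostname]
    except KeyError:
        raise LookupError("no Dash tenant configured for %s" % hostname)
```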

There are currently eight live instances of Dash:

  • UC Berkeley
  • UC Irvine
  • UC Merced
  • UC Office of the President
  • UC Riverside
  • UC Santa Cruz
  • UC San Francisco
  • ONEshare (in partnership with DataONE)

Architecture and Implementation

Dash is completely open source. Our code is made publicly available on GitHub (http://cdluc3.github.io/dash/). Dash is based on an underlying Ruby-on-Rails data publication platform called Stash. Stash encompasses three main functional components: Store, Harvest, and Share.

  • Store: The Store component is responsible for the selection of datasets; their description in terms of configurable metadata schemas, including specification of ORCID and FundRef identifiers for researcher and funder disambiguation; the assignment of DOIs for stable citation and retrieval; designation of an optional limited-time embargo; and packaging and submission to the integrated repository.
  • Harvest: The Harvest component is responsible for retrieval of descriptive metadata from that repository for inclusion in a Solr search index (a sketch of this flow appears after this list).
  • Share: The Share component, based on GeoBlacklight, is responsible for the faceted search and browse interface.
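
For a flavor of what the Harvest component does, here is a rough sketch of a single harvest pass: pull Dublin Core records over OAI-PMH and post them to a Solr index. The endpoint URLs are placeholders, and the real Stash code is Ruby rather than Python; this only shows the shape of the flow.

```python
import requests
import xml.etree.ElementTree as ET

OAI = "https://repository.example.org/oai"        # placeholder repository endpoint
SOLR = "http://localhost:8983/solr/dash/update"   # placeholder Solr core
DC = "{http://purl.org/dc/elements/1.1/}"

# Fetch one page of Dublin Core records (ignoring resumptionToken paging).
resp = requests.get(OAI, params={"verb": "ListRecords", "metadataPrefix": "oai_dc"})
root = ET.fromstring(resp.content)

docs = []
for dc in root.iter("{http://www.openarchives.org/OAI/2.0/oai_dc/}dc"):
    docs.append({
        "title": dc.findtext(DC + "title", default=""),
        "creator": [c.text for c in dc.findall(DC + "creator")],
        "identifier": dc.findtext(DC + "identifier", default=""),
    })

# Solr accepts JSON documents on its update handler.
requests.post(SOLR, json=docs, params={"commit": "true"}).raise_for_status()
```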

Dash Architecture Diagram

Individual dataset landing pages are formatted as an online version of a data paper, presenting all appropriate descriptive and administrative metadata in a form that can be downloaded as an individual PDF file or as part of the complete dataset download package, which incorporates all data files for all versions.

To facilitate flexible configuration and future enhancement, all support for the various external service providers and repository protocols is fully encapsulated in pluggable modules. Metadata modules are available for the DataCite and Dublin Core metadata schemas. Protocol modules are available for the SWORD 2.0 deposit protocol and the OAI-PMH and ResourceSync harvesting protocols. Authentication modules are available for InCommon/Shibboleth and Google/OAuth2 identity providers (IdPs). We welcome collaborations to develop modules for additional metadata schemas and repository protocols. Please email UC3 (uc3 at ucop dot edu) or visit GitHub (http://cdluc3.github.io/dash/) for more information.
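
What “pluggable” might look like in miniature: a registry of modules keyed by schema name, where each module knows how to serialize a dataset description. The interface below is our own illustration, not Stash’s actual module API.

```python
from abc import ABC, abstractmethod

class MetadataModule(ABC):
    """Anything that can serialize a dataset description for a target schema."""

    schema = "base"

    @abstractmethod
    def serialize(self, dataset: dict) -> str:
        ...

class DataCiteModule(MetadataModule):
    schema = "datacite"

    def serialize(self, dataset: dict) -> str:
        # Real DataCite serialization produces a full XML record;
        # a stub keeps the plugin shape visible.
        return (
            "<resource><identifier identifierType='DOI'>%s</identifier></resource>"
            % dataset["doi"]
        )

# Registry of installed plugins, keyed by the schema each one supports.
MODULES = {cls.schema: cls() for cls in (DataCiteModule,)}
```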

Features of the newly refreshed Dash service

What’s new in the refreshed Dash service? Take a look.

| Feature | Tech-focused | User-focused | Description |
| --- | --- | --- | --- |
| Open Source | X |  | All components open source, MIT-licensed code (http://cdluc3.github.io/dash/) |
| Standards compliant | X |  | Dash integrates with any SWORD/OAI-PMH-compliant repository |
| Pluggable Framework | X |  | Inherent extensibility for supporting additional protocols and metadata schemas |
| Flexible metadata schemas | X |  | Supports the DataCite metadata schema out of the box, but can be configured to support any schema |
| Innovation | X |  | Our modular framework will make new feature development easier and quicker |
| Mobile/responsive design | X | X | Built mobile-first, from the ground up, for a better user experience |
| Geolocation – Metadata | X | X | For applicable research outputs, an easy-to-use way to capture the location of your datasets |
| Persistent Identifiers – ORCID | X | X | Dash allows researchers to attach their ORCID, allowing them to track and get credit for their work |
| Persistent Identifiers – DOIs | X | X | Dash issues DOIs for all datasets, allowing researchers to track and get credit for their work |
| Persistent Identifiers – FundRef | X | X | Dash tracks funder information using FundRef, allowing researchers and funders to track their research outputs |
| Login – Shibboleth/OAuth2 | X | X | Easy single sign-on with your campus credentials or Google account |
| Versioning | X | X | Datasets change. Dash offers a quick way to upload new versions of your datasets and a simple process for tracking updates |
| Accessibility | X | X | The technology, design, and user workflows have all been built with accessibility in mind |
| Better user experience |  | X | Self-deposit made easy: simple workflow, drag-and-drop upload, simple navigation, clean data publication pages, user dashboards |
| Geolocation – Search |  | X | With GeoBlacklight, we can offer search by location |
| Robust Search |  | X | Search by subject, file type, keywords, campus, location, etc. |
| Discoverability |  | X | Indexing by search engines such as Google and Bing |
| Build Relationships |  | X | Many datasets are related to publications or other data. Dash offers a quick way to describe these relationships |
| Supports Best Practices |  | X | Data publication can be confusing, but you can trust that Dash follows best practices |
| Data Metrics |  | X | See the reach of your datasets through usage and download metrics |
| Data Citations |  | X | Quick access to a well-formed citation (with DOI) for every data publication, easy for your peers to grab |
| Open License |  | X | Dash supports open Creative Commons licensing for all data deposits; can be configured for other licenses |
| Lower Barrier to Entry |  | X | For those in a hurry, Dash offers a quick self-deposit interface: only three steps and few required fields |
| Supports Data Reuse |  | X | Focuses researchers on describing methods and explaining ways to reuse their datasets |
| Satisfies Data Availability Requirements |  | X | Many publishers and funders require researchers to make their data available. Dash is a readily accepted and easy way to comply |

A little Dash history

The Dash project began as DataShare, a collaboration among UC3, the University of California San Francisco Library and Center for Knowledge Management, and the UCSF Clinical and Translational Science Institute (CTSI). CTSI is part of the Clinical and Translational Science Award program funded by the National Center for Advancing Translational Sciences at the National Institutes of Health. Dash version 2 was developed by UC3 and partners with funding from the Alfred P. Sloan Foundation (our funded proposal). Read more about the code, the project, and contributing to development on the Dash GitHub site.

A little Dash future

We will continue the development of the new Dash platform and will keep you posted. Next up: support for timed deposits and embargoes.  Stay tuned!


Make Data Rain

Last October, UC3, PLOS, and DataONE launched Making Data Count, a collaboration to develop data-level metrics (DLMs). This 12-month National Science Foundation-funded project will pilot a suite of metrics to track and measure data use that can be shared with funders, tenure & promotion committees, and other stakeholders.

[image from Freepik]

To understand how DLMs might work best for researchers, we conducted an online survey and held a number of focus groups, which culminated on a very (very) rainy night last December in a discussion at the PLOS offices with researchers in town for the 2014 American Geophysical Union Fall Meeting.

Six eminent researchers participated:

Much of the conversation concerned how to motivate researchers to share data. Sources of external pressure that came up included publishers, funders, and peers. Publishers can require (as PLOS does) that, at a minimum, the data underlying every figure be available. Funders might refuse to ‘count’ publications based on unavailable data, and refuse to renew funding for projects that don’t release data promptly. Finally, other researchers (in some communities, at least) are already disinclined to work with colleagues who won’t share data.

However, Making Data Count is particularly concerned with the inverse: not punishing researchers who don’t share, but rewarding those who do. For a researcher, metrics demonstrating data use serve not only to prove to others that their data is valuable, but also to affirm for themselves that taking the time to share their data is worthwhile. The researchers present regarded altmetrics with suspicion and overwhelmingly affirmed that citations are the preferred currency of scholarly prestige.

Many of the technical difficulties with data citation (e.g., citing dynamic data or a particular subset) came up in the course of the conversation. One interesting point was raised by many: when citing a data subset, the needs of reproducibility and credit diverge. For reproducibility, you need to know exactly what data were used, at a maximum level of granularity. But credit is about resolving to a single product that the researcher gets credit for, regardless of how much of the dataset or which version was used, so less granular is better.
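
One way to reconcile the two needs is a citation record that carries both a coarse, credit-bearing identifier and a fine-grained description of exactly what was used. A hypothetical sketch (the DOI, field names, and subset syntax are all invented; 10.5072 is a reserved test prefix):

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    doi: str                # coarse-grained: the one product that earns credit
    version: str = "1"      # fine-grained: exactly which release was used ...
    subset_query: str = ""  # ... and which slice, e.g. "station='RENO' and year=2014"

    def for_credit(self) -> str:
        return "https://doi.org/" + self.doi

    def for_reproducibility(self) -> str:
        return "%s (version %s; subset: %s)" % (
            self.doi, self.version, self.subset_query or "entire dataset")

c = DataCitation("10.5072/example.123", version="2", subset_query="year=2014")
print(c.for_credit())            # what the metrics aggregate
print(c.for_reproducibility())   # what a replicator needs
```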

We would like to thank everyone who attended any of the focus groups. If you have ideas about how to measure data use, please let us know in the comments!


Announcing The Dash Tool: Data Sharing Made Easy

We are pleased to announce the launch of Dash – a new self-service tool from the UC Curation Center (UC3) and partners that allows researchers to describe, upload, and share their research data. Dash helps researchers perform the following tasks:

  • Prepare data for curation by reviewing best practice guidance for the creation or acquisition of digital research data.
  • Select data for curation through local file browse or drag-and-drop operation.
  • Describe data in terms of the DataCite metadata schema (a minimal example follows this list).
  • Identify data with a persistent digital object identifier (DOI) for permanent citation and discovery.
  • Preserve, manage, and share data by uploading to a public Merritt repository collection.
  • Discover and retrieve data through faceted search and browse.
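
As a concrete illustration of the describe step, here is a minimal dataset description in the spirit of the DataCite schema. The values are invented (10.5072 is DataCite’s reserved test prefix), and a real record would be serialized to XML for submission.

```python
# Loosely modeled on required DataCite properties: identifier, creators,
# titles, publisher, publicationYear, and resourceType. Values are made up.
record = {
    "identifier": {"identifierType": "DOI", "identifier": "10.5072/example-full"},
    "creators": [{"creatorName": "Researcher, A.", "nameIdentifier": "0000-0000-0000-0000"}],
    "titles": ["Hypothetical stream temperature measurements, 2013-2014"],
    "publisher": "UC3 Merritt",
    "publicationYear": 2014,
    "resourceType": {"resourceTypeGeneral": "Dataset", "resourceType": "CSV files"},
}
```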

Who can use Dash?

There are multiple instances of the Dash tool that all have similar functions, look, and feel.  We took this approach because our UC campus partners were interested in their Dash tool having local branding (read more). It also allows us to create new Dash instances for projects or partnerships outside of the UC (e.g., DataONE Dash and our Site Descriptors project).

Researchers at UC Merced, UCLA, UC Irvine, UC Berkeley, or UCOP can use their campus-specific Dash instance:

Other researchers can use DataONE Dash (oneshare.cdlib.org). This instance is available to anyone, free of charge. Use your Google credentials to deposit data.

Note: Data deposited into any Dash instance is visible throughout all of Dash. For example, if you are a UC Merced researcher and use dash.ucmerced.edu to deposit data, your dataset will appear in search results for individuals looking for data via any of the Dash instances, regardless of campus affiliation.

See the Users Guide to get started using Dash.

Stay connected to the Dash project:

Dash Origins

The Dash project began as DataShare, a collaboration among UC3, the University of California San Francisco Library and Center for Knowledge Management, and the UCSF Clinical and Translational Science Institute (CTSI). CTSI is part of the Clinical and Translational Science Award program funded by the National Center for Advancing Translational Sciences at the National Institutes of Health (Grant Number UL1 TR000004).

Sound the horns! Dash is live! “Fontana del Nettuno” by Sorin P. from Flickr.


Data: Do You Care? The DLM Survey

We all know that data is important for research. So how can we quantify that? How can you get credit for the data you produce? What do you want to know about how your data is used?

If you are a researcher or data manager, we want to hear from you. Take this 5-10 minute survey and help us craft data-level metrics:

surveymonkey.com/s/makedatacount

Please share widely! The survey will be open until December 1st.

Read more about the project at mdc.plos.org or check out our previous post. Thanks to John Kratz for creating the survey and jumping through IRB hoops!


What do you think of data metrics? We’re listening.
From gizmodo.com. Click for more pics of dogs + radios.


Dash Project Receives Funding!

We are happy to announce the Alfred P. Sloan Foundation has funded our project to improve the user interface and functionality of our Dash tool! You can read the full grant text at http://escholarship.org/uc/item/2mw6v93b.

More about Dash

Dash is a University of California project to create a platform that allows researchers to easily describe, deposit, and share their research data publicly. Currently the Dash platform is connected to the UC3 Merritt Digital Repository; during our Sloan-funded work, we plan to make the platform compatible with other repositories via community protocols. The Dash project is open source; read more on our GitHub site. We encourage community discussion and contribution via GitHub Issues.

Currently there are five instances of the Dash tool available:

We plan to launch the new DataONE Dash instance in two weeks; this tool will replace the existing DataUp tool and allow anyone to deposit data into the DataONE infrastructure via the ONEshare repository using their Google credentials. Along with the release of DataONE Dash, we will release Dash 1.1 for the live sites listed above, with improvements to the user interface and experience.

The Newly Funded Sloan Project

Problem Statement

Researchers are not archiving and sharing their data in sustainable ways. Often data sharing involves using commercially owned solutions, posting data on personal websites, or submitting data alongside articles as supplemental material. A better option for data archiving is community repositories, which are owned and operated by trusted organizations (i.e., institutional or disciplinary repositories). Although disciplinary repositories are often known and used by researchers in the relevant field, institutional repositories are less well known as a place to archive and share data.

Why aren’t researchers using institutional repositories?

First, the repositories are often not set up for self-service operation by individual researchers who wish to deposit a single dataset without assistance. Second, many (or perhaps most) institutional repositories were created with publications in mind, rather than datasets, which may in part account for their less-than-ideal functionality. Third, user interfaces for the repositories are often poorly designed and do not take into account the user’s experience (or inexperience) and expectations. Because more and more of our activities are conducted on the Internet, we are exposed to many high-quality, commercial-grade user interfaces in the course of a workday. Correspondingly, researchers have come to expect clean, simple interfaces that can be learned quickly, with minimal need for contacting repository administrators.

Our Solution

We propose to address the three issues above with Dash, a well-designed, user-friendly data curation platform that can be layered on top of existing community repositories. Rather than creating a new repository or rebuilding community repositories from the ground up, Dash will provide a way for organizations to allow self-service deposit of datasets via a simple, intuitive interface that is designed with individual researchers in mind. Researchers will be able to document, preserve, and publicly share their own data with minimal support required from repository staff, as well as find, retrieve, and reuse data made available by others.

Three Phases of Work

  1. Requirements gathering: Before the design process begins, we will gather requirements from researchers via interviews and surveys.
  2. Design work: Based on the surveys and interviews from Phase 1, we will design a researcher-focused user interface that is visually appealing and easy to use.
  3. Technical work: Dash will be an added-value data sharing platform that integrates with any repository supporting community protocols such as SWORD (Simple Web-service Offering Repository Deposit); a sketch of such a deposit appears below.
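
For readers unfamiliar with SWORD, here is a rough sketch of what a SWORD 2.0 binary deposit looks like on the wire. The collection URL, credentials, and filename are placeholders, not a real endpoint.

```python
import requests

# POST a zipped dataset to a SWORD 2.0 collection URI.
with open("dataset.zip", "rb") as f:
    resp = requests.post(
        "https://repository.example.org/sword/collection",  # placeholder
        data=f,
        headers={
            "Content-Type": "application/zip",
            "Content-Disposition": "attachment; filename=dataset.zip",
            "Packaging": "http://purl.org/net/sword/package/SimpleZip",
        },
        auth=("depositor", "secret"),
    )

resp.raise_for_status()
# A successful deposit returns a receipt; its Location header points at the new item.
print("Deposit receipt at:", resp.headers.get("Location"))
```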

The dash is a critical component of any good ASCII art. By Reddit user Haleljacob


The 10 Things Every New Grad Student Should Do

It’s now mid-October, and I’m guessing that first-year graduate students are knee-deep in courses, barely considering their potential thesis projects. But for those who can multitask, I have compiled this list of 10 things to undertake in your first year as a grad student. These aren’t just any 10 things… they are 10 steps you can take to make sure you contribute to a culture shift towards open science. Some are big steps, and others are small, but they will all get you (and the rest of your field) one step closer to reproducible, transparent research.

1. Learn to code in some language. Any language.

Here’s the deal: it’s easier to use black-box applications to run your analyses than to create scripts. Everyone knows this. You put in some numbers and out pop your results; you’re ready to write up your paper and get that h-index headed upwards. But this approach will not cut the mustard for much longer in the research world. Researchers need to know how to code. Growing amounts and diversity of data, more interdisciplinary collaborators, and the increasing complexity of analyses mean that black-box models, software, and applications no longer suffice in research. The truth is, if you want your research to be reproducible and transparent, you must code. In his 2013 article “The Big Data Brain Drain: Why Science is in Trouble“, Jake Vanderplas argues that

In short, the new breed of scientist must be a broadly-trained expert in statistics, in computing, in algorithm-building, in software design, and (perhaps as an afterthought) in domain knowledge as well.

I learned MATLAB in graduate school, and experimented with R during a postdoc. I wish I’d delved into this world earlier, and had more skills and knowledge about best practices for scientific software. Basically, I wish I had attended a Software Carpentry bootcamp.

The growing number of Software Carpentry (SWC) bootcamps is further evidence that researchers are increasingly aware of the importance of coding and reproducibility. These bootcamps teach researchers the basics of coding, version control, and similar topics, with the potential for customizing the course’s content to the primary discipline of the audience. I’m a big fan of SWC; read more in my blog post on the organization. Check out SWC founder Greg Wilson’s article on some insights from his years of teaching bootcamps: Software Carpentry: Lessons Learned.

2. Stop using Excel. Or at least stop ONLY using Excel.

Most seasoned researchers know that Microsoft Excel can be potentially problematic for data management: there are loads of ways to manipulate, edit, reorder, and change your data without really knowing exactly what you did. In nerd terms, the trail of dataset changes is known as provenance; generally Excel is terrible at documenting provenance. I wrote about this a few years ago on the blog, and we mentioned a few of the more egregious ways people abuse Excel in our F1000Research publication on the DataUp tool. More recently guest blogger Kara Woo wrote a great post about struggles with dates in Excel.

Of course, everyone uses Excel. In our surveys for the DataUp project, about 88% of the researchers we interviewed used Excel at some point in their research. And we can’t expect folks to stop using it: it’s a great tool! It should, however, be used carefully. For instance, don’t manipulate the sole copy of your raw data in Excel; keep your raw data raw. Use Excel to explore your data, but use other tools to clean and analyze it, such as R, Python, or MATLAB (see #1 above on learning to code). For more help with spreadsheets, see our list of resources and tools: UC3 Spreadsheet Help.
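
To make “keep your raw data raw” concrete, here is one way to do it in Python with pandas: read the raw file, clean a copy, and write the cleaned version to a new file, leaving the original untouched. The filenames and columns are invented for the example.

```python
import pandas as pd

# Read the raw export; never overwrite this file.
raw = pd.read_csv("field_measurements_raw.csv")

# Clean a copy: drop rows missing a reading, parse dates as real dates.
clean = raw.dropna(subset=["temperature"]).copy()
clean["date"] = pd.to_datetime(clean["date"])

# Write the cleaned data to a *new*, versioned file.
clean.to_csv("field_measurements_clean_v1.csv", index=False)
```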

3. Learn about how to properly care for your data.

You might know more about your data than anyone else, but you aren’t necessarily an expert in the stewardship of your data. There are some great guidelines for how best to document, manage, and generally care for your data; I’ve collected some of my favorites here on CiteULike with the tag best_practices. Pick one (or all of them) to read, and make sure your data don’t get short shrift.

4. Write a data management plan.

I know, it sounds like the ultimate boring activity for a Friday night. But these three words (data management plan) can make a HUGE difference in the time and energy spent dealing with data during your thesis. Basically, if you spend some time thinking about file organization, sample naming schemes, backup plans, and quality control measures, you can save many hours of heartache later. Creating a data management plan also forces you to better understand best practices related to data (#3 above). Don’t know how to start? Head over to the DMPTool to write a data management plan. It’s free to use, and you can get an idea for the types of things you should consider when embarking on a new project. Most funders require data management plans alongside proposal submissions, so you might as well get the experience now.

5. Read Reinventing Discovery by Michael Nielsen.

Reinventing Discovery: The New Era of Networked Science by Michael Nielsen was published in 2011, and I’ve since heard it referred to as the Bible for Open Science, and the must-read book for anyone interested in engaging in the new era of 4th paradigm research. I’ve only just recently read the book, and wow. I was fist-bumping quite a bit while reading it, which must have made fellow airline passengers wonder what the fuss was about. If they had asked, I would have told them about Nielsen’s stellar explanation of the necessity for and value of openness and transparency in research, the problems with current incentive structures in science, and the steps we should all take towards shifting the culture of research to enable more connectivity and faster progress. Just writing this blog post makes me want to re-read the book.

6. Learn version control.

My blog post, Git/GitHub: a Primer for Researchers covers much of the importance of version control. Here’s an excerpt:

From git-scm.com, “Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later.” We all deal with version control issues. I would guess that anyone reading this has at least one file on their computer with “v2” in the title. Collaborating on a manuscript is a special kind of version control hell, especially if those writing are in disagreement about systems to use (e.g., LaTeX versus Microsoft Word). And figuring out the differences between two versions of an Excel spreadsheet? Good luck to you. The Wikipedia entry on version control makes a statement that brings versioning into focus:

The need for a logical way to organize and control revisions has existed for almost as long as writing has existed, but revision control became much more important, and complicated, when the era of computing began.

Ah, yes. The era of collaborative research, scripting languages, and big data does make this issue more important and complicated. Version control systems can make this much easier, but they are not necessarily intuitive for the fledgling coder. It might take a little time (plus attending a Software Carpentry bootcamp) to understand version control, but it will be well worth your time. As an added bonus, your work can be more reproducible and transparent by using version control. Read Karthik Ram’s great article, Git can facilitate greater reproducibility and increased transparency in science.
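
To see why plain-text files and scripts play so well with versioning, compare two versions of a small file using Python’s standard-library difflib; this line-by-line comparison is the same kind of diff that tools like Git show you. The file contents are invented:

```python
import difflib

# Two hypothetical versions of the same small CSV file.
v1 = ["site,temp\n", "A,12.1\n", "B,14.9\n"]
v2 = ["site,temp\n", "A,12.1\n", "B,15.2\n", "C,11.0\n"]

# Print a unified diff: changed and added lines are flagged with -/+.
for line in difflib.unified_diff(v1, v2, fromfile="data_v1.csv", tofile="data_v2.csv"):
    print(line, end="")
```

Try the same comparison with two versions of a binary .xlsx file and you get nothing useful, which is exactly the spreadsheet problem described above.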

7. Pick a way to communicate your science to the public. Then do it.

You don’t have to have a black belt in Twitter or run a weekly stellar blog to communicate your work. But you should communicate somehow. I have plenty of researcher friends who feel exasperated by the idea that they need to talk to the public about their work. But the truth is, in the US this communication is critical to our research future. My local NPR station recently ran a great piece called Why Scientists are seen as untrustworthy and why it matters. It points out that many (most?) scientists aren’t keen to spend a lot of time engaging with the broader public about their work. However:

…This head-in-the-sand approach would be a big mistake for lots of reasons. One is that public mistrust may eventually translate into less funding and so less science. But the biggest reason is that a mistrust of scientists and science will have profound effects on our future.

Basically, we are avoiding the public at our own peril. Science funding is on the decline, we are facing increasing scrutiny, and it wouldn’t be hyperbole to say that we are at war without even knowing it. Don’t believe me? Read this recent piece in Science (paywall warning): Battle between NSF and House science committee escalates: How did it get this bad?

So start talking. Participate in public lecture series, write a guest blog post, talk about your research to a crotchety relative at Thanksgiving, or write your congressman about the governmental attack on science.

8. Let everyone watch.

Consider going open. That is, do all of your science out in the public eye, so that others can see what you’re up to. One way to do this is by keeping an open notebook. This concept throws out the idea that you should be a hoarder, not telling others of your results until the Big Reveal in the form of a publication. Instead, you keep your lab notebook (you do have one, right?) out in a public place, for anyone to peruse. Most often an open notebook takes the form of a blog or a wiki, and the researcher updates their notebook daily, weekly, or whatever is most appropriate. There are links to data, code, relevant publications, or other content that helps readers, and the researchers themselves, understand the research workflow. Read more in these two blog posts: Open Up and Open Science: What the Fuss is About.

9. Get your ORCID.

ORCID stands for “Open Researcher & Contributor ID”. The ORCID Organization is an open, non-profit group working to provide a registry of unique researcher identifiers and a transparent method of linking research activities and outputs to these identifiers. The endgame is to support the creation of a permanent, clear and unambiguous record of scholarly communication by enabling reliable attribution of authors and contributors. Basically, researcher identifiers are like social security numbers for scientists. They unambiguously identify you throughout your research life.

Lots of funders, tools, publishers, and universities are buying into the ORCID system. It’s going to make identifying researchers and their outputs much easier. If you have a generic, complicated, compound, or foreign name, you will especially benefit from claiming your ORCID and “stamping” your work with it. It allows you to claim what you’ve done and keep you from getting mixed up with that weird biochemist who does studies on the effects of bubble gum on pet hamsters. Still not convinced? I wrote a blog post a while back that might help.
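
Because ORCID iDs are machine-readable, anyone can pull a researcher’s public record programmatically. Here is a sketch against ORCID’s public API; the iD below is ORCID’s own documentation example, and the response handling is abbreviated to the name fields:

```python
import requests

orcid = "0000-0002-1825-0097"  # ORCID's documented example iD
resp = requests.get(
    "https://pub.orcid.org/v3.0/%s/record" % orcid,
    headers={"Accept": "application/json"},  # ask for JSON rather than XML
)
resp.raise_for_status()

record = resp.json()
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```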

10. Publish in OA journals, or make your work OA afterward.

A wonderful post by Michael White, Why I don’t care about open access to research: and why you should, captures this issue well:

It’s hard for me to see why I should care about open access…. My university library can pay for access to all of the scientific journals I could wish for, but that’s not true of many corporate R&D departments, municipal governments, and colleges and schools that are less well-endowed than mine. Scientific knowledge is not just for academic scientists at big research universities.

It’s easy to forget that you are (likely) among the privileged academics. Not all researchers have access to publications, and this is even more true for the general public. Why are we locking our work in the Ivory Tower, allowing for-profit publishers to determine who gets to read our hard-won findings? The Open Access movement is going full throttle these days, as evidenced by increasing media coverage (see “Steal this research paper: you already paid for it” from MotherJones, or The Guardian’s blog post “University research: if you believe in openness, stand up for it“). So what can you do?

Consider publishing only in open access journals (see the Directory of Open Access Journals). Does this scare you? Are you tied to a disciplinary favorite journal with a high impact factor? Then make your work open access after publishing in a standard journal. Follow my instructions here: Researchers! Make Your Previous Work #OA.

Openness is one of the pillars of a stellar academic career. From Flickr by David Pilbrow.


Sharing is caring, but should it count?

The following is a guest post by Shea Swauger, Data Management Librarian at Colorado State University. Shea and I both participated in a meeting for the Colorado Alliance of Research Libraries on 11 July 2014, where he presented survey results described below.


 


Vanilla Ice has a timely message for the data community. From Flickr by wiredforlego.

It shouldn’t be a surprise that many of the people who collect and generate research data are academic faculty members. One of the gauntlets these individuals must face is the tenure and promotion process, an evaluation system that measures and rewards professional excellence and scholarly impact, and that can greatly affect the career arc of an aspiring scholar. As a result, tenure and promotion metrics naturally influence the kind and quantity of scholarly products that faculty produce.

Some advocates of data sharing have suggested using the tenure and promotion process as a way to incentivize data sharing. I thought this was a brilliant idea and had designs to advocate its implementation to members of the executive administration at my university, but first I wanted to gather some evidence to support my argument. My colleagues Beth Oehlerts, Daniel Draper, and Don Zimmerman and I sent a survey to all faculty members asking how they felt about incorporating shared research data as an assessment measure in the tenure and promotion process. Only about 10% (202) responded, so while generalizations about the larger population can’t be made, their answers are still interesting.

This is how I expected the survey to work:

Me: “If sharing your research data counted, in some way, towards you achieving tenure and promotion, would you be more likely to do it?”

Faculty: “Yes, of course!”

I’d bring this evidence to the university, sweeping changes would be made, data sharing would proliferate and all would be well.

I was wrong.

Speaking broadly, only about half of the faculty members surveyed said that changing the tenure and promotion process would make them more likely to share their data.

While 76% of the faculty were interested in sharing data in the future and 84% said that data generation or collection is important to their research, half of the faculty said that shared research data has little to no impact on their scholarly community, and almost a quarter said they are unable to judge the impact.

Okay, let’s back up.

The tenure system is supposed to measure, among other things (teaching, service, etc.), someone’s impact on their scholarly community. According to this idea, there should be a correlation between the things that impact your scholarly community and the things that impact your achieving tenure. Now, back to the survey.

I asked faculty to rate the impact of several research products on their scholarly community as well as on their tenure and promotion. 94% of faculty rated ‘peer-reviewed journal articles’ at ‘high impact’ (the top of the scale) for impact upon their scholarly community, and 96% of faculty rated ‘peer-reviewed journal articles’ at ‘high impact’ upon their tenure and promotion. This supports the idea that because peer-reviewed journal articles have a high impact on the scholarly community, they have a high impact on the tenure and promotion process.

Shared research data had a similar impact correlation, though on the opposite end of the impact spectrum. Little impact on the scholarly community means little impact on the tenure and promotion process. Bad news for data sharing. Reductively speaking, I believe this to be the essence of the argument: contributions that are valuable to a research community should be rewarded in the tenure and promotion process; shared research data isn’t valuable to the research community; therefore, data sharing should not be rewarded.

Also, I received several responses from faculty saying that they were obligated not to share their data because of the kind of research they were doing, whether in defense, in the private sector, or with personally identifiable or sensitive data. They felt that if the university started rewarding data sharing, they would be unfairly punished because of the nature of their research. Some suggested that a more local implementation of a data sharing policy, perhaps on a departmental basis or through an individual opt-in system, might be fairer to researchers who can’t share their data for one reason or another.

So what does this mean?

Firstly, it means that there’s a big perception gap between the importance of ‘my data to my research’ and the importance of ‘my data to someone else’s research’. Closing this gap could go a long way toward increasing data sharing. Secondly, it means that the tenure and promotion system is a complicated, political mechanism, and trying to leverage it as a way to incentivize data sharing is not easy or straightforward. For now, I’ve decided not to pursue amending the local tenure system; however, I have hope that as interest in data sharing grows, we can find meaningful ways to reward people who choose to share their data.

Note: the work described above is being prepared for publication in 2015.


Feedback Wanted: Publishers & Data Access

This post is co-authored with Jennifer Lin, PLOS

Short Version: We need your help!

We have generated a set of recommendations for publishers to help increase access to data in partnership with libraries, funders, information technologists, and other stakeholders. Please read and comment on the report (Google Doc), and help us to identify concrete action items for each of the recommendations here (EtherPad).

Background and Impetus

The recent governmental policies addressing access to research data from publicly funded research across the US, UK, and EU reflect the growing need for us to revisit the way that research outputs are handled. These recent policies have implications for many different stakeholders (institutions, funders, researchers) who will need to consider the best mechanisms for preserving and providing access to the outputs of government-funded research.

The infrastructure for providing access to data is largely still being architected and built. In this context, PLOS and the UC Curation Center hosted a set of leaders in data stewardship issues for an evening of brainstorming to re-envision data access and academic publishing. A diverse group of individuals from institutions, repositories, and infrastructure development collectively explored the question:

What should publishers do to promote the work of libraries and IRs in advancing data access and availability?

We collected the themes and suggestions from that evening in a report: The Role of Publishers in Access to Data. The report contains a collective call to action from this group for publishers to participate as informed stakeholders in building the new data ecosystem. It also enumerates a list of high-level recommendations for how to effect social and technical change as critical actors in the research ecosystem.

We welcome the community to comment on this report. Furthermore, the high-level recommendations need concrete details for implementation. How will they be realized? What specific policies and technologies are required for this? We have created an open forum for the community to contribute their ideas. We will then incorporate the catalog of listings into a final report for publication. Please participate in this collective discussion with your thoughts and feedback by April 24, 2014.

We need suggestions! Feedback! Comments! From Flickr by Hash Milhan


Mountain Observatories in Reno

A few months ago, I blogged about my experiences at the NSF Large Facilities Workshop. “Large Facilities” encompass things like NEON (National Ecological Observatory Network), IRIS PASSCAL Instrument Center (Incorporated Research Institutions for Seismology Program for Array Seismic Studies of the Continental Lithosphere), and the NRAO (National Radio Astronomy Observatory). I found the event itself to be an eye-opening experience: much to my surprise, there was some resistance to data sharing in this community. I had always assumed that large, government-funded projects had strict data sharing requirements, but this is not the case. I had stimulating arguments with Large Facilities managers who considered their data too big and complex to share and, more worrisome, felt that their researchers would be very resistant to opening up the data they generated at these facilities.

Why all this talk about large facilities? Because I’m getting the chance to make my arguments again, to a group whose interests overlap with those of the Large Facilities community. I’m very excited to be speaking at Mountain Observatories: A Global Fair and Workshop this July in Reno, Nevada. Here’s a description from the organizers:

The event is focused on observation sites, networks, and systems that provide data on mountain regions as coupled human-natural systems. So the meeting is expected to bring together biophysical as well as socio-economic researchers to discuss how we can create a more comprehensive and quantitative mountain observing network using the sites, initiatives, and systems already established in various regions of the world.

I must admit, I’m ridiculously excited to geek out with this community. I’ll get to hear about the GLORIA Project (GLObal Robotic-telescopes Intelligent Array), something called “Mountain Ethnobotany”, and “Climate Change Adaptation Governance”. See a full list of the proposed sessions here. The conference is geared towards researchers and managers, which means I’ll have the opportunity to hear about data sharing proclivities straight from their mouths. The roster of speakers joining me includes a hydroclimatologist (Mike Dettinger, USGS) and a researcher focused on socio-cultural systems (Courtney Flint, Utah State University), plus representatives from the NSF, a sensor networks company, and others. The conference should be a great one; the abstract submission deadline was just extended, so there’s still time to join me and nerd out about science!

Reno! From Flickr by Ravensmagiclantern
