Dash Enables ORCiD Login

The Dash team has added a second way to log in and submit. In addition to using Single Sign-On, you can now log in with ORCiD. Not only can you authenticate with ORCiD, but once you have logged in this way, your ORCiD ID will be connected to your Dash account. The next time you submit to Dash, your ORCiD ID will auto-populate in your submission form.

To back up a little: ORCiD is a persistent identifier used to distinguish researchers from one another and to connect researchers with their research. If you are a researcher and do not yet have an ORCiD, sign up!

To connect your ORCiD:

  1. Log in using the button on the far right of the Dash homepage.
  2. Here you will see two options. Clicking the top ORCiD button will send you to the ORCiD authentication page and, once you have entered your ORCiD credentials, return you to Dash (the sketch after these steps shows roughly what happens behind the scenes).
  3. Although you have now successfully authenticated with ORCiD, to ensure you are connected to the correct submitting instance (a campus, a department, DataONE, etc.) you will be asked to choose your Single Sign-On. This is the only time you will be asked to log in twice.
  4. After successfully logging in with Single Sign-On, your account will be connected to your ORCiD. In the future you will not need to repeat this process; you can either save your login in your browser or choose one of the two options for logging in. If you have already submitted to Dash before, you may log out and go through the same steps above. This process will tie your ORCiD to your existing account and allow you to use either ORCiD or Single Sign-On going forward.
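
For the technically curious: ORCiD authentication is built on OAuth 2.0. Below is a minimal sketch of the authorization-code flow in Python; the client credentials and redirect URI are hypothetical placeholders, and Dash's actual integration may differ in its details.

    import requests
    from urllib.parse import urlencode

    # Hypothetical client credentials -- Dash's real values are private.
    CLIENT_ID = "APP-XXXXXXXXXXXXXXXX"
    CLIENT_SECRET = "client-secret-placeholder"
    REDIRECT_URI = "https://dash.example.org/auth/orcid/callback"

    # Step 1: send the user to ORCiD's authorization page.
    auth_url = "https://orcid.org/oauth/authorize?" + urlencode({
        "client_id": CLIENT_ID,
        "response_type": "code",
        "scope": "/authenticate",
        "redirect_uri": REDIRECT_URI,
    })

    # Step 2: ORCiD redirects back with ?code=...; exchange the code
    # for an access token that includes the user's ORCiD ID.
    def exchange_code(code):
        resp = requests.post("https://orcid.org/oauth/token", data={
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "grant_type": "authorization_code",
            "code": code,
            "redirect_uri": REDIRECT_URI,
        }, headers={"Accept": "application/json"})
        resp.raise_for_status()
        return resp.json()["orcid"]  # e.g. "0000-0002-1825-0097"

The returned ORCiD ID is what gets tied to your Dash account, which is why it can auto-populate on later submissions.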

Dash: The Data Publication Tool for Researchers

This post has been crossposted on Medium

We all know that research data should be archived and shared. That's why Dash was created: a data publishing platform free to UC researchers. Dash complies with journal and funder requirements, follows best practices, and is easy to use. In addition, new features are continuously being developed to better integrate with your research workflow.

Why Dash is the best solution for UC researchers:

  • Data are archived indefinitely. You can use Dash to ensure all of your research data will be available even after you get a new computer or switch institutions. Beyond that, your data will carry all of the important associated documentation: the funding sources for the research, the research methods and equipment used, and readme files on how your data were processed, so that future researchers, whether from your own lab or around the globe, can use your work.
  • Data can be published at any time. While we do have features that assist with affiliated article publication, like keeping your data private during the review process, data publications do not need to be associated with an article. Publish your data at any point in time.
  • Data can be versioned. As you update and optimize protocols, or do further analysis on your data, you may update your data files or documentation. Your DOI will always resolve to a landing page listing all versions of the dataset.
  • Data can be uploaded to Dash directly from your computer or through a “manifest”. “Manifest” means you may enter up to 1,000 URLs where your data live on servers, Box, Dropbox, or Google Drive, and the data will be transferred to Dash without you waiting several hours or dealing with timeouts.
  • You can upload up to 100 GB of data per submission.
  • Dash does not limit file type. So long as the data are within the size limits listed above, publications can include image data, tabular data, qualitative data, and more.
  • Related works can be linked. Code, articles, other datasets, and protocols can be linked to your data for a more comprehensive package of your research.
  • Data deposited to Dash receive a DOI. This means that not only can your data be located, but they can be cited just as you would cite an article (see the example citation after this list). The landing page for each dataset includes an author list for your citation as well, so each author who contributed to the data collection and analysis can receive credit for their work.
  • Data are assigned an open license. Deposited data are publicly available for re-use by anyone under a Creative Commons license. You put many hours and coffees into producing these data; public release will give your research a broader reach. A light reminder that your name is still associated with the data, and making your data public does not mean you are “giving away” your work.
  • Dash is a UC project. Dash can be customized per campus, many campus libraries are subsidizing the cost of storage, and it is developed by the University of California Curation Center (UC3), meaning this service is set up to serve your needs.
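
For example, a citation for a Dash dataset might look like the following. This is a DataCite-style sketch: the author names and title are placeholders, and 10.5072 is the DOI test prefix, not a real identifier.

    Researcher, A., & Collaborator, B. (2017). Example survey dataset
    [Dataset]. UC Dash. https://doi.org/10.5072/example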

We hear a lot about the cost of storage being a barrier. But on many campuses, the storage costs associated with Dash are subsidized by academic libraries or departments. The cost of storage can also be written into grants (funders do require data to be archived).

We are always looking for feedback on what features would be the most useful, so that we can make data publishing a part of your normal workflows. Get in touch with us or start using Dash to archive and share your data.

From Brain Blobs to Research Data Management

If you spend some time browsing the science section of a publication like the New York Times, you'll likely run across an image that looks something like the one below: a cross section of a brain covered in colored blobs. These images are often used to visualize the results of studies using a technique called functional magnetic resonance imaging (fMRI), a non-invasive method for measuring brain activity (or, more accurately, a correlate of brain activity) over time. Researchers who use fMRI are often interested in measuring the activity associated with a particular mental process or clinical condition.


A visualization of the results of an fMRI study. These images are neat to look at, but not particularly useful without information about the underlying data and analysis.

Because of the size and complexity of the datasets involved, research data management (RDM) is incredibly important in fMRI research. In addition to the brain images, a typical fMRI study involves the collection of questionnaire data, behavioral measures, and sensitive medical information. Analyzing all this data often requires the development of custom code or scripts. This analysis is also iterative and cumulative, meaning that a researcher’s decisions at each step along the way can have significant effects on both the subsequent steps and what is ultimately reported in a presentation, poster, or journal article. Those blobby brain images may look cool, but they aren’t particularly useful in the absence of information about the underlying data and analyses.

In terms of both the financial investment and the researcher hours involved, fMRI research is quite expensive. Throughout fMRI's relatively short history, data sharing has been proposed multiple times as a method for maximizing the value of individual datasets and for overcoming the field's ongoing methodological issues. Unfortunately, a very practical issue has hampered efforts to foster the open sharing of fMRI data: researchers have historically organized, documented, and saved their data (and code) in very different ways.

What we are doing and why

Recently, following concerns about sub-optimal statistical practices and long-standing software errors, fMRI researchers have begun to cohere around a set of standards regarding how data should be collected, analyzed, and reported. From a research data management perspective, it is also very exciting to see an emerging standard for how data should be organized and described. But even with these emerging standards, our understanding of the data-related practices fMRI researchers actually employ in the lab, and how those practices relate to data sharing and other open science-related activities, remains mostly anecdotal.

To help fill this knowledge gap, and hopefully advance some best practices related to data management and sharing, Dr. Ana Van Gulick and I are conducting a survey of fMRI researchers. Developed in consultation with members of the open and reproducible neuroscience communities, our survey asks researchers about their own data-related practices, how they view the field as a whole, their interactions with RDM service providers, and the degree to which they have embraced developments like pre-registrations and preprints. Our hope is that our results will be useful both for the community of researchers who use fMRI and for data service providers looking to engage with researchers on their own terms.

If you are a researcher who uses fMRI and would like to complete our survey, please follow this link. We estimate that the survey should take between 10 and 20 minutes.

If you are a data service provider and would like to chat with us about what we’re doing and why, please feel free to either leave a comment or contact me directly.


Building a Community: Three Months of Library Carpentry

Back in May, almost 30 librarians, researchers, and faculty members got together in Portland, Oregon, to learn how to teach lessons from Software, Data, and Library Carpentry. After spending two days learning the ins and outs of Carpentry pedagogy and live coding, we all returned to our home institutions as part of the burgeoning Library Carpentry community.


Library Carpentry didn’t begin in Portland, of course. It began in 2014 when the community began developing a group of lessons at the British Library. Since then, dozens of Library Carpentry workshops have been held across four continents. But the Portland event, hosted by California Digital Library, was the first Library Carpentry-themed instructor training session. Attendees not only joined the Library Carpentry community, but took their first step in getting certified as Software and Data Carpentry instructors. If Library Carpentry was born in London, it went through a massive growth spurt in Portland.

Together, the Carpentries are a global movement focused on teaching people computing skills like navigating the Unix shell, doing version control with Git, and programming in Python. While Software and Data Carpentry are focused on researchers, Library Carpentry is by and for librarians. Library Carpentry lessons include an introduction to data for librarians, OpenRefine, and many more. Many attendees of the Portland instructor training contributed to these lessons during the Mozilla Global Sprint in June. After more than 850 GitHub events (pull requests, forks, issues, etc.), Library Carpentry ended up as far and away the most active part of the global sprint. We even had a five-month-old get in on the act!

Since the instructor training and the subsequent sprint, a number of Portland attendees have completed their instructor certification. We are on track to have 10 certified instructors in the UC system alone. Congratulations, everyone!


Building an RDM Guide for Researchers – An (Overdue) Update

It has been a little while since I last wrote about the work we’re doing to develop a research data management (RDM) guide for researchers. Since then, we’ve thought a lot about the goals of this project and settled on a concrete plan for building out our materials. Because we will soon be proactively seeking feedback on the different elements of this project, I wanted to provide an update on what we’re doing and why.


A section of the Rosetta Stone. Though it won't help decipher Egyptian hieroglyphs, we hope our RDM guide will help researchers and data service providers speak the same language. Image from the British Museum.

Communication Barriers and Research Data Management

Several weeks ago I wrote about addressing Research Data Management (RDM) as a “wicked problem”, a problem that is difficult to solve because different stakeholders define and address it in different ways. My own experience as a researcher and library postdoc bears this out. Researchers and librarians often think and talk about data in very different ways! But as researchers face changing expectations from funding agencies, academic publishers, their own peers, and other RDM stakeholders about how they should manage and share their data, overcoming such communication barriers becomes increasingly important.

From visualizations like the ubiquitous research data lifecycle to instruments like the Data Curation Profiles, there is a wide variety of excellent tools that can be used to facilitate communication between different RDM stakeholders. Likewise, there are discipline-specific best practice guidelines and tools, like the Research Infrastructure Self Evaluation Framework (RISE), that allow researchers and organizations to assess and advance their RDM activities. What's missing is a tool that combines these two elements: one that gives researchers the means to easily self-assess where they are with regard to RDM, and that allows data service providers to offer easily customizable guidance on how to advance researchers' data-related practices.

Enter our RDM guide for researchers.

Our RDM Guide for Researchers

What I want to emphasize most about our RDM guide is that it is, first and foremost, designed to be a communication tool. The research and library communities both have a tremendous amount of knowledge and expertise related to data management. Our guide is not intended to supplant tools developed by either community, but to assist in overcoming communication barriers in a way that removes confusion, grows confidence, and helps people in both communities find direction.

While the shape of the RDM guide has not changed significantly since my last post, we have refined its basic structure and begun filling in the details.

The latest iteration of our guide consists of two main elements:

  1. An RDM rubric that allows researchers to self-assess their data-related practices using language and terminology with which they are familiar.
  2. A series of one page guides that provide information about how to advance data-related practices as necessary, appropriate, or desired.

The two components of our RDM Guide for Researchers. The rubric is intended to help researchers orient themselves in the ever-changing landscape of RDM, while the guides are intended to help them move forward.

The rubric is similar to the “maturity model” described in my earlier blog posts. In this iteration, it consists of a grid containing three columns and a number of rows. The leftmost column contains descriptions of different phases of the research process. At present, the rubric contains four such phases: Planning, Collection, Analysis, and Sharing. These research data lifecycle-esque terms are in place to provide a framing familiar to data service providers in the library and elsewhere.

The next column includes phrases that describe specific research activities using language and terminology familiar to researchers. The language in this column is, in part, derived from the unofficial survey we conducted to understand how researchers describe the research process. By placing these activities beside those drawn from the research data lifecycle, we hope to ground our model in terms that both researchers and RDM service providers can relate to.

The rightmost column then contains a series of declarative statements which a researcher can use to identify their individual practices in terms of the degree to which they are defined, communicated, and forward thinking.
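
To make this structure concrete, here is a minimal sketch of how a single rubric row might be represented in Python. The phase, activity, and statements shown are illustrative placeholders, not final rubric language.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class RubricRow:
        lifecycle_phase: str    # column 1: framing familiar to data service providers
        research_activity: str  # column 2: language familiar to researchers
        statements: List[str]   # column 3: declarative self-assessment statements

    # A hypothetical row -- the real rubric language is still being refined.
    example_row = RubricRow(
        lifecycle_phase="Planning",
        research_activity="Deciding what data to collect and how to record it",
        statements=[
            "My approach is ad hoc and undocumented.",
            "My approach is defined but not written down or shared.",
            "My approach is documented, shared, and forward thinking.",
        ],
    )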

Each element of the rubric is designed to be customizable. We understand that RDM service providers at different institutions may wish to emphasize different services tied to different parts of the data lifecycle, and that researchers in different disciplines may have different ways of describing their data-related activities. For example, while we are working on refining the language of the declarative statements, I have left them out of the diagram above because they are likely the part of the rubric that will remain most open to customization.

Each row within the rubric will be complemented by a one page guide that provides researchers with concrete information about data-related best practices. If the purpose of the rubric is to allow researchers to orient themselves in the RDM landscape, the purpose of these guides is to help them move forward.

Generating Outputs

Now that we’ve refined the basic structure of our model, it’s time to start creating some outputs. Throughout the remainder of the summer and into the autumn, members of the UC3 team will be meeting regularly to review the content of the first set of one page guides. This process will inform our continual refinement of the RDM rubric which will, in turn, shape the writing of a formal paper.

Moving forward, I hope to workshop this project with as many interested parties as I can, both to receive feedback on what we’ve done so far and to potentially crowdsource some of the content. Over the next few weeks I’ll be soliciting feedback on various aspects of the RDM rubric. If you’d like to provide feedback, please either click through the links below (more to be added in the coming weeks) or contact me directly.


Provide feedback on our guide!

Planning for Data

Saving Data

More coming soon!

Disambiguating Dash and Merritt

What’s Dash? What’s Merritt? What’s the difference? After numerous questions about where things should go and what the differences are between our UC3 services, we got the hint that we are not communicating clearly.

Clearing things up

A group of us sat down, talked through different use cases and the wording that was causing such confusion, and came up with what we hope is a disambiguation of Dash versus Merritt.


Different intentions, different target users

While Dash and Merritt interact with each other at a technical level, they have different intentions, and users should not be looking at these two services as a comparison. Dash is built for researchers: its user interface, user experience, and metadata schema are optimized for use by individual researchers. Merritt is designed for use by institutional librarians, archivists, and curators.

Because of the different intended purposes, features, and users, UC3 does not recommend that Merritt be advertised to researchers on Research Data Management (RDM) sites or researcher-facing Library Guides.

Below are quick descriptions of each service that should clarify intentions and target users:

  • Dash is an open data publication platform for researchers. Self-service deposit of research data through Dash fulfills publisher, funder, and data management plan requirements regarding data sharing and preservation. When researchers publish their datasets through Dash, the datasets are issued a DOI to optimize citability, are publicly available for download and re-use under a CC BY 4.0 or CC0 license, and are preserved in Merritt, the California Digital Library's preservation repository. Dash is available to researchers at participating UC campuses, as well as to researchers in the environmental and earth sciences through the DataONE network.
  • Merritt is a preservation repository for mediated deposits by UC organizations. We work with staff at UC libraries, archives, and departments to preserve digital assets and collections. Merritt offers bit-level preservation and replication with either public or private access. Merritt is also the preservation repository for Dash-deposited data.

The cost of service vs. the cost of storage

California Digital Library does not charge individual users for the Dash or Merritt services. However, we do recharge your institution annually for the amount of storage used in Merritt (remember, Dash preserves data in Merritt). On most campuses the library fully subsidizes Dash storage costs, so there is no extra financial obligation for individual researchers depositing data into Dash.

Follow-up

If you have any questions about edge cases or would like to know any more details about the architecture of the Dash platform or Merritt repository, please get in touch at uc3@ucop.edu.

And while you’re here: check out Dash’s new features for uploading large data sets, and uploading directly from the cloud.

Talking About Data: Lessons from Science Communication

As a person who worked for years in psychology and neuroscience laboratories before coming to work in academic libraries, I have particularly strong feelings about ambiguous definitions. One of my favorite anecdotes about my first year of graduate school involves watching two researchers argue about the definition of “attention” for several hours, multiple times a week, for an entire semester. One of the researchers was a clinical psychologist, the other a cognitive psychologist. Though they both devised research projects and wrote papers on the topic of attention, their theories and methods could not have been more different. The communication gap between them was so wide that they were never able to move forward productively. The punchline is that, after sitting through hours of their increasingly abstract and contentious arguments, I would go on to study attention using yet another set of theories and methods as a cognitive neuroscientist. Funny story aside, this anecdote illustrates the degree to which people with different perspectives and levels of expertise can define the same problem in strikingly different ways.


A facsimile of a visual search array used by cognitive psychologists to study attention. Spot the horizontal red rectangle.

In the decade that has elapsed since those arguments, I have undergone my own change in perspective: from a person who primarily collects and analyzes their own research data to a person who primarily thinks about ways to help other researchers manage and share their data. While my day-to-day activities look rather different, one aspect of my work as a library postdoc is similar to my work as a neuroscientist: many of my colleagues, ostensibly working on the same things, often have strikingly different definitions, methods, and areas of expertise. Fortunately, I have been able to draw on a body of work that addresses this very thing: science communication.

Wicked Problems

A “wicked problem” is a problem that is extremely difficult to solve because different stakeholders define and address it in different ways. In my anecdote about argumentative professors, understanding attention can be considered a wicked problem. Without getting too far into the weeds: the clinical psychologist understood attention mostly in the context of diagnoses like Attention Deficit Disorder, while the cognitive psychologist understood it in the context of scanning visual environments for particular elements or features. As a cognitive neuroscientist, I came to understand it mostly in terms of its effects within neural networks, as measured by brain imaging methods like fMRI.

Research data management (RDM) has been described as a wicked problem. A data service provider in an academic library may define RDM as “the documentation, curation, and preservation of research data”, while a researcher may define RDM as either simply part of their daily work or, in the case of something like a data management plan written for a grant proposal, as an extra burden placed upon such work. Other RDM stakeholders, including those affiliated with IT, research support, and university administration, may define it in yet other ways.

Science communication is chock full of wicked problems, including concepts like climate change and stem cell research. Actually, given the significant amount of scholarship devoted to defining terms like “scientific literacy” and the multitude of things that the term describes, science communication may itself be a wicked problem.

What is Science Communication?

Like attention and RDM, it is difficult to give a comprehensive definition of science communication. Documentaries like “Cosmos” are probably the most visible examples, but science communication actually comes in a wide variety of forms including science journalism, initiatives aimed at science outreach and advocacy, and science art. What these activities have in common is that they all generally aim to help people make informed decisions in a world dominated by science and technology. In parallel, there is also a burgeoning body of scholarship devoted to the science of science communication which, among other things, examines how effective different communication strategies are for changing people’s perceptions and behaviors around scientific topics.

For decades, the prevailing theory in science communication was the “Deficit Model”, which posits that scientific illiteracy is due to a simple lack of information. In the deficit model, skepticism about topics such as climate change is assumed to be due to a lack of comprehension of the science behind them. Thus, at least according to the deficit model, the “solution” to the problem of science communication is as straightforward as providing people with all the facts. In this conception, the audience is generally assumed to be homogeneous, and communication is assumed to be one way (from scientists to the general public).

Though the deficit model persists, study after study (after meta-analysis) has shown that merely providing people with facts about a scientific topic does not cause them to change their perceptions or behaviors related to that topic. Instead, it turns out that presenting facts that conflict with a person's worldview can actually cause them to double down on that worldview. Also, audiences are not homogeneous. Putting aside differences in political and social worldviews, people have very different levels of scientific knowledge and relate to that knowledge in very different ways. For this reason, more modern models of science communication focus not on one-way transmission of information but on fostering active engagement, re-framing debates, and meeting people where they are. For example, one of the more effective strategies for getting people to pay attention to climate change is not to present them with a litany of (dramatic and terrifying) facts, but to link it to their everyday emotions and concerns.


Find the same rectangle as before. It takes a little longer now that the other objects have a wider variety of features, right? Read more about visual search tasks here.

Communicating About Data

If we adapt John Durant's nicely succinct definition of science literacy, “What the general public ought to know about science,” to an RDM context, the result is something like “What researchers ought to know about handling data.” Thus, data services in academic libraries can be said to be a form of science communication. As with “traditional” science communicators, data service providers interact with audiences possessing perspectives and levels of knowledge different from their own. The major difference, of course, is that the audience for data service providers is specifically the research community.

There is converging evidence that many of the current efforts to foster better RDM have had mixed results. Recent studies of NSF data management plans have revealed a significant amount of variability in the degree to which researchers address data management-related concepts like metadata, data sharing, and long-term preservation. The audience of data service providers is, like those of more “traditional” science communicators, quite heterogeneous, so perhaps adopting methods from the repertoire of science communication could help foster more active engagement and the adoption of better practices. Many libraries and data service providers have already adopted some of these methods, perhaps without realizing their application in other domains. I don't mean to criticize any existing efforts to engage researchers on the topic of RDM: if I've learned one thing from doing different forms of science communication over the years, it is that outreach is difficult and change is slow.

In a series of upcoming blog posts, I'll write about some of my current projects that incorporate what I've written here. First up, I'll provide an update on the RDM Maturity Model project that I previously described here and here. Coming soon!


Cirrus-ly Convenient Uploading

That was a cloud pun! Following our release two weeks ago, the Dash team is thrilled to present our newest functionality: you may now upload files directly from Box, Dropbox, and Google Drive!

Let’s get you publishing (and citing and getting credit for your data):

  • Using the “upload from server” option, you may enter up to 1,000 URLs (and up to 100 GB per submission) by pasting in the sharing link from Box, Dropbox, or Google Drive (the sketch after this list shows how such links can map to direct downloads).


  • Validate the files, and your URLs will appear along with each file's name and size.


  • Submit & download.
    • Files uploaded from Box, Dropbox, and Google Drive will download exactly as they were uploaded to the cloud
    • Google Docs, Sheets, and Slides files will download as Microsoft Word documents, Excel spreadsheets, or PowerPoint presentations.
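
Under the hood, cloud services expose public sharing links that can often be rewritten into direct-download URLs. The Python sketch below shows the commonly documented patterns for Dropbox and Google Drive; it is illustrative only, Dash's actual handling may differ, and Box shared links generally require the Box API rather than a simple rewrite.

    import re

    def direct_download_url(share_url):
        """Rough sketch: turn a public sharing link into a direct-download link."""
        # Dropbox: switching dl=0 to dl=1 forces a file download.
        if "dropbox.com" in share_url:
            return share_url.replace("dl=0", "dl=1")
        # Google Drive: rewrite .../file/d/<id>/... into the uc?export=download form.
        match = re.search(r"drive\.google\.com/file/d/([^/]+)", share_url)
        if match:
            return "https://drive.google.com/uc?export=download&id=" + match.group(1)
        # Box and anything else: assume the link already resolves to the file.
        return share_url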

We will be updating our help and FAQ pages this week to reflect our new features, but in the meantime please let us know if you have any questions or feedback. 

Manifesting Large and Bulk File Data Publications: Now a Reality!

The Dash team is excited to announce our June feature release: large and bulk file upload. Taking into consideration datasets with large files and large numbers of files, as well as the practicality of server timeouts, we have developed a new feature that allows up to 1,000 files or 100 GB* of data to be published per DOI.

To accomplish this, we are using a “manifest” workflow, which means that instead of uploading data directly from your computer, you may enter URLs for where your data are located (on a server or public site) and Dash will fetch them for upload. Once uploaded, Dash will display the data in the same manner as a direct upload. To reflect this new option, we have updated the upload page so you can choose between uploading locally (from your computer) or via a server. Information about file size limits (2 GB per file and 10 GB total for local uploads, or up to 1,000 files of any size up to 100 GB* via manifest) is listed on that landing page.

Step 1: Enter URLs where data are located


Step 2: Validated files will appear in the Uploaded Files table, along with any other data files associated with current or former versions


The benefit of this workflow is that you do not have to watch your screen for many hours while the data upload; instead, your data are uploaded in the back-end, without the involvement of your computer. This upload mechanism is also not limited to large files: it can be an easy way to transfer your data directly from a server, regardless of size.
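
As a rough illustration of what validating a manifest can involve, the Python sketch below checks each URL with an HTTP HEAD request and reports the filename and size, assuming the server supplies a Content-Length header. This is a sketch of the general idea, not Dash's actual implementation, and the URL shown is hypothetical.

    import requests
    from urllib.parse import urlparse, unquote

    def validate_manifest(urls):
        """Check each URL and report filename and size, as in Dash's file table."""
        for url in urls:
            resp = requests.head(url, allow_redirects=True, timeout=30)
            resp.raise_for_status()  # an unreachable URL fails validation
            filename = unquote(urlparse(url).path.rsplit("/", 1)[-1])
            size = int(resp.headers.get("Content-Length", 0))
            yield filename, size

    for name, size in validate_manifest([
        "https://data.example.edu/trials/session01.csv",  # hypothetical URL
    ]):
        print(name, size, "bytes")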

One complication with this process is that you cannot upload local data and server-hosted data in the same version. Though this seems tricky, remember that Dash supports versioning: after successful publication of the server-uploaded data, you can go back in and add local files (or vice versa).

While at the moment we do not allow uploads from Google Drive, Box, or Dropbox, we are investigating the sharing links necessary for integrating uploads from the cloud. If you have any feedback on how to make this feature, or any of our features, more accessible or valuable for researchers, please do get in touch. Happy Data Publishing!

Note: To utilize this feature and publish your datasets, your data will need to be hosted on a server. Many institutions, departments, and labs have servers used to host data and information (there are good examples across the UC campuses, MIT, the University of Iowa, etc.). If you have any questions about servers on your campus or external resources, please reach out to your campus librarians.

*Size limits vary per institutional tenant. Please check with your UC data librarians if you have any questions.

Make Data Count: Building a System to Support Recognition of Data as a First Class Research Output

The Alfred P. Sloan Foundation has made a two-year, $747K award to the California Digital Library, DataCite, and DataONE to support the collection of usage and citation metrics for data objects. Building on pilot work, this award will result in the launch of a new service that will collate and expose data-level metrics.

The impact of research has traditionally been measured by citations to journal publications: journal articles are the currency of scholarly research.  However, scholarly research is made up of a much larger and richer set of outputs beyond traditional publications, including research data. In order to track and report the reach of research data, methods for collecting metrics on complex research data are needed.  In this way, data can receive the same credit and recognition that is assigned to journal articles.

“Recognition of data as valuable output from the research process is increasing and this project will greatly enhance awareness around the value of data and enable researchers to gain credit for the creation and publication of data.” – Ed Pentz, Crossref

This project will work with the community to create a clear set of guidelines on how to define data usage. In addition, the project will develop a central hub for the collection of data-level metrics. These metrics will include data views, downloads, citations, saves, and social media mentions, and they will be exposed through customized user interfaces deployed at partner organizations. Working in an open source environment, and including extensive user experience testing and community engagement, the products of this project will be available for data repositories, libraries, and other organizations to deploy within their own environments, serving their communities of data authors.
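
As a purely hypothetical illustration of the kind of record such a hub might collate for a single dataset (all field names and numbers below are invented for this sketch, and 10.5072 is the DOI test prefix):

    # Hypothetical data-level metrics record; not an actual API payload.
    example_metrics = {
        "doi": "10.5072/example",
        "views": 1482,
        "downloads": 320,
        "citations": 12,
        "saves": 45,
        "social_media_mentions": 7,
        "period": "2017-01-01/2017-06-30",
    }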

Are you working in the data metrics space? Let’s collaborate.

Find out more and follow us at: www.makedatacount.org, @makedatacount

About the Partners

California Digital Library was founded by the University of California in 1997 to take advantage of emerging technologies that were transforming the way digital information was being published and accessed. The University of California Curation Center (UC3), one of four main programs within the CDL, helps researchers and the UC libraries manage, preserve, and provide access to their important digital assets; it also develops tools and services that serve the community throughout the research and data life cycles.

DataCite is a leading global non-profit organization that provides persistent identifiers (DOIs) for research data. Its goal is to help the research community locate, identify, and cite research data with confidence. Through collaboration, DataCite supports researchers by helping them find, identify, and cite research data; data centres by providing persistent identifiers, workflows, and standards; and journal publishers by enabling research articles to be linked to the underlying data and objects.

DataONE (Data Observation Network for Earth) is an NSF DataNet project that is developing a distributed framework and sustainable cyberinfrastructure to meet the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.