Category Archives: Dash

We’re hiring a new Product Manager!

CDL is recruiting for a new Product Manager.  This position will oversee the product management and outreach activities for the Dash project and service, as well as offer research data management and digital preservation consulting for the UC community.

We are looking for an experienced professional with a full understanding of product/service development and production practices.  This position (officially titled “UC3 Service Manager, Dash”) will focus on the successful development, outreach, and adoption of the Dash service.  A complete revamp of the UI and technical architecture of Dash is nearing completion.  More detail about Dash is available here. A recent presentation on the project is also available here. Because this position will focus on continuous development of Dash, it requires an enthusiastic advocate for research data management best practices, open source community building, and digital curation skills development.

A successful candidate will advocate for the needs of our constituents and translate those needs into detailed enhancements of diverse scope, size, impact, and budget.  This Dash Product Manager will have a large support network: the UC3 Director, other UC3 product managers, the UC3 development team, other California Digital Library departments, and the library/IT teams across the 10 UC campuses.

Learn more and apply here.

What is Dash?

Dash is an open source, online data publication service that makes research data sharing easy.  While Dash gives the appearance of being a full-fledged data repository, it is actually a lightweight overlay layer that sits on top of, and freely interoperates with, standards-compliant repositories supporting common protocols for submission and harvesting.  UC3 has integrated Dash with its Merritt curation repository. The Dash system provides intuitive, easy-to-use interfaces for dataset submission, description, publication, and discovery.  Dash imposes minimal prescriptive eligibility and submission requirements, and automates and hides the mechanical details of DOI assignment, data packaging, and repository deposit from the user.  It features a streamlined, self-service user experience that can be integrated easily and unobtrusively into multifarious scholarly workflows.  

What is UC3?

This position is within the University of California Curation Center (UC3) at the California Digital Library (CDL), an administrative unit of the University of California Office of the President (UCOP).  UC3 works within CDL and across the 10 UC campuses to deliver leading-edge digital curation services.  We plan, create, maintain, enhance, and operate robust services responsive to the evolving needs of UC stakeholders.  UC3’s current initiatives include digital preservation, research data management, data publication, alternative metrics for usage and impact, and web archiving. Reporting to the UC3 Director, this position is responsible for managing the development and maintenance of the Dash service, including playing a key role in promoting and setting the strategic direction for Dash. As a member of this dynamic team, a successful candidate will be asked to contribute to furthering our work advancing digital curation concepts across the UC community.  More information about UC3 can be found at http://www.cdlib.org/uc3.

More information about this position can be found here.

Announcing The Dash Tool: Data Sharing Made Easy

We are pleased to announce the launch of Dash – a new self-service tool from the UC Curation Center (UC3) and partners that allows researchers to describe, upload, and share their research data. Dash helps researchers perform the following tasks:

  • Prepare data for curation by reviewing best practice guidance for the creation or acquisition of digital research data.
  • Select data for curation through local file browse or drag-and-drop operation.
  • Describe data in terms of the DataCite metadata schema.
  • Identify data with a persistent digital object identifier (DOI) for permanent citation and discovery.
  • Preserve, manage, and share data by uploading to a public Merritt repository collection.
  • Discover and retrieve data through faceted search and browse.
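As a rough illustration of the "describe" step, the sketch below checks a dataset description for the five properties that the DataCite schema treats as mandatory (identifier, creator, title, publisher, and publication year). The dictionary keys, the function name, and the sample DOI are assumptions made up for this example; they are not Dash's actual data model.

```python
# Hypothetical sketch: the keys mirror DataCite's mandatory properties,
# but this is not how Dash stores metadata internally.
MANDATORY_FIELDS = ("identifier", "creators", "title", "publisher", "publication_year")


def missing_datacite_fields(record):
    """Return the mandatory fields that are absent or empty in `record`."""
    return [field for field in MANDATORY_FIELDS if not record.get(field)]


example = {
    "identifier": "10.5072/FK2EXAMPLE",  # made-up DOI, for illustration only
    "creators": ["Researcher, Example"],
    "title": "Example field measurements",
    "publisher": "",  # left blank on purpose to show the check
    "publication_year": 2014,
}

print(missing_datacite_fields(example))
```

A deposit form could run a check like this before requesting a DOI, so that the researcher is prompted for anything still missing.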

Who can use Dash?

There are multiple instances of the Dash tool that all have similar functions, look, and feel.  We took this approach because our UC campus partners were interested in their Dash tool having local branding (read more). It also allows us to create new Dash instances for projects or partnerships outside of the UC (e.g., DataONE Dash and our Site Descriptors project).

Researchers at UC Merced, UCLA, UC Irvine, UC Berkeley, or UCOP can use their campus-specific Dash instance:

Other researchers can use DataONE Dash (oneshare.cdlib.org). This instance is available to anyone, free of charge. Use your Google credentials to deposit data.

Note: Data deposited into any Dash instance is visible throughout all of Dash. For example, if you are a UC Merced researcher and use dash.ucmerced.edu to deposit data, your dataset will appear in search results for individuals looking for data via any of the Dash instances, regardless of campus affiliation.

See the Users Guide to get started using Dash.

Stay connected to the Dash project:

Dash Origins

The Dash project began as DataShare, a collaboration among UC3, the University of California San Francisco Library and Center for Knowledge Management, and the UCSF Clinical and Translational Science Institute (CTSI). CTSI is part of the Clinical and Translational Science Award program funded by the National Center for Advancing Translational Sciences at the National Institutes of Health (Grant Number UL1 TR000004).

Sound the horns! Dash is live! “Fontana del Nettuno” by Sorin P. from Flickr.

Dash Project Receives Funding!

We are happy to announce the Alfred P. Sloan Foundation has funded our project to improve the user interface and functionality of our Dash tool! You can read the full grant text at http://escholarship.org/uc/item/2mw6v93b.

More about Dash

Dash is a University of California project to create a platform that allows researchers to easily describe, deposit and share their research data publicly. Currently the Dash platform is connected to the UC3 Merritt Digital Repository; however, we have plans to make the platform compatible with other repositories using protocols such as SWORD and OAI-PMH during our Sloan-funded work. The Dash project is open-source; read more on our GitHub site. We encourage community discussion and contribution via GitHub Issues.

Currently there are five instances of the Dash tool available:

We plan to launch the new DataONE Dash instance in two weeks; this tool will replace the existing DataUp tool and allow anyone to deposit data into the DataONE infrastructure via the ONEShare repository using their Google credentials. Along with the release of DataONE Dash, we will release Dash 1.1 for the live sites listed above. There will be improvements to the user interface and experience.

The Newly Funded Sloan Project

Problem Statement

Researchers are not archiving and sharing their data in sustainable ways. Often data sharing involves using commercially owned solutions, posting data on personal websites, or submitting data alongside articles as supplemental material. A better option for data archiving is community repositories, which are owned and operated by trusted organizations (i.e., institutional or disciplinary repositories). Although disciplinary repositories are often known and used by researchers in the relevant field, institutional repositories are less well known as a place to archive and share data.

Why aren’t researchers using institutional repositories?

First, the repositories are often not set up for self-service operation by individual researchers who wish to deposit a single dataset without assistance. Second, many (or perhaps most) institutional repositories were created with publications in mind, rather than datasets, which may in part account for their less-than-ideal functionality. Third, user interfaces for the repositories are often poorly designed and do not take into account the user’s experience (or inexperience) and expectations. Because more of our activities are conducted on the Internet, we are exposed to many high-quality, commercial-grade user interfaces in the course of a workday. Correspondingly, researchers have expectations for clean, simple interfaces that can be learned quickly, with minimal need for contacting repository administrators.

Our Solution

We propose to address the three issues above with Dash, a well-designed, user-friendly data curation platform that can be layered on top of existing community repositories. Rather than creating a new repository or rebuilding community repositories from the ground up, Dash will provide a way for organizations to allow self-service deposit of datasets via a simple, intuitive interface that is designed with individual researchers in mind. Researchers will be able to document, preserve, and publicly share their own data with minimal support required from repository staff, as well as be able to find, retrieve, and reuse data made available by others.

Three Phases of Work

  1. Requirements gathering: Before the design process begins, we will gather researcher requirements via interviews and surveys.
  2. Design work: Based on surveys and interviews with researchers (Phase 1), we will develop requirements for a researcher-focused user interface that is visually appealing and easy to use.
  3. Technical work: Dash will be an added-value data sharing platform that integrates with any repository that supports community protocols (e.g., SWORD, the Simple Web-service Offering Repository Deposit protocol).
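To give a feel for what integrating over a community protocol involves, here is a rough sketch of the HTTP headers a client might send with a SWORD v2 style binary deposit of a zipped dataset. The function name, file name, and payload are hypothetical, and the header set reflects our reading of the SWORD v2 profile rather than anything Dash has implemented; no request is actually sent.

```python
import hashlib


def sword_deposit_headers(filename, payload, in_progress=False):
    """Build headers for a SWORD v2 style binary deposit (illustrative only)."""
    return {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        # Hex-encoded MD5 of the package, so the server can verify the upload.
        "Content-MD5": hashlib.md5(payload).hexdigest(),
        # Declares the package as a plain zip of files.
        "Packaging": "http://purl.org/net/sword/package/SimpleZip",
        # "true" tells the server that more content will follow before completion.
        "In-Progress": "true" if in_progress else "false",
    }


headers = sword_deposit_headers("dataset.zip", b"example bytes")
for name, value in headers.items():
    print(f"{name}: {value}")
```

Because the repository only needs to understand these headers and the package format, the same deposit client could in principle sit on top of any repository that speaks the protocol.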

The dash is a critical component of any good ascii art. By reddit user Haleljacob

New Project: Citing Physical Spaces

A few months ago, the UC3 group was contacted by some individuals interested in solving a problem: how should we reference field stations? Rob Plowes from University of Texas/Brackenridge Field Lab emailed us:

I am on a [National Academy of Sciences] panel reviewing aspects of field stations, and we have been discussing a need for data archiving. One idea proposed is for each field station to generate a simple document with a DOI reference to enable use in publications that make reference to the field station. Having this DOI document would enable a standardized citation that could be tracked by an online data aggregator.

We thought this was a great idea and started having a few conversations with other groups (LTER, NEON, etc.) about its feasibility. Fast forward to two weeks ago, when Plowes and Becca Fenwick of UC Merced presented our more fleshed out idea to the OBFS/NAML Joint Meeting in Woods Hole, MA. (OBFS: Organization of Biological Field Stations, and NAML: National Association of Marine Laboratories). The response was overwhelmingly positive, so we are proceeding with the idea in earnest here at the CDL.

The intent of this blog post is to gather feedback from the broader community about our idea, including our proposed metadata fields, our plans for implementation, and whether there are existing initiatives or groups that we should be aware of and/or partner with moving forward.

In a Nutshell

Problem: Tracking publications associated with a field station or site is difficult. There is no clear or standard way to cite field station descriptions.

Proposal: Create individual, citable “publications” with associated persistent identifiers for each field station (more generically called a “site”). Collect these Site Descriptors in the general use DataONE repository, ONEShare. The user interface will be a new instance of the existing UC3 Dash service (under development) with some modifications for Site Descriptors.

What we need from you: 

Moving forward: We plan on gathering community feedback for the next few months, with an eye towards completing a pilot version of the interface by February 2015. We will be ramping up Dash development over the next 12 months thanks to recent funding from the Alfred P. Sloan Foundation, and this development work will include creating a more robust version of the Site Descriptors database.

Project Partners:

  • Rob Plowes, UT Austin/Brackenridge Field Lab
  • Mark Stromberg, UC Berkeley/UC Natural Reserve System
  • Kevin Browne, UC Natural Reserve System Information Manager
  • Becca Fenwick, UC Merced
  • UC3 group
  • DataONE organization

Lovers Point Laboratory (1930), which was later renamed Hopkins Marine Laboratory. From Calisphere, contributed by Monterey County Free Libraries.

The Dash Partners Meeting

This past Thursday, 30 University of California system librarians, developers, and colleagues from nine of the ten campuses assembled at UCLA’s Charles E. Young Library for a discussion of the Dash service. If you weren’t aware, Dash is a University of California project to create a platform that allows researchers to easily describe, deposit and share their research data publicly. The group assembled to talk about the project’s progress and future plans. See the full agenda.

Introductions & Expectations

UC Curation Center (UC3) Director Trisha Cruse kicked off the meeting by asking attendees to introduce themselves and describe what they wanted to learn during the meeting. Responses had the following themes:

  • Better understanding of what the Dash service is, how it works, and what it offers for researchers.
  • Participation ideas: how the campuses can work together as a group, and what that work looks like.
  • How we will prioritize development and work together as a cohesive group to determine the trajectory of the service.
  • An understanding of how the campuses are implementing Dash: how they plan to reach out to faculty, how the service should be talked about on the campus, what outreach might look like, how this service can fit into the overall research infrastructure, and campus rollout/adoption plans.
  • Future plans for the Dash service.

Overview of the Dash Service

The team then provided an overview of the Dash service, demonstrating how to log in, describe, and upload a dataset to Dash. Four campus instances of Dash went live (beta) on Tuesday 23 September, and campuses were provided with instructions on how to help test the new system. Stephen Abrams covered the technical infrastructure of the Dash service, describing the relationship between the Merritt repository, the EZID identifier service, the DataONE network, and each of the campus Dash instances (slides).

Yours truly followed with a description of DataONE Dash, a unique instance of the service that will replace the existing DataUp Tool (slides). This instance will be available to anyone with a Google login, and all data submitted to DataONE Dash will be in the ONEShare repository (a DataONE Member Node) and therefore discoverable in the DataONE system. Emily Lin of UC Merced pointed out that some UC Dash contributors might also want their datasets discoverable in DataONE; an enhancement was suggested that would allow UC Dash users to check a box, indicating they would like their work indexed by DataONE.

Stephen then discussed the cost model that is pending approval for Dash (slides). This model is based on recovering the cost for storage only; there is no service fee for UC users. The model indicates that UC could provide researchers, staff, and graduate students 10 GB of storage in Dash for a total of $290,000/year for the entire system. Sharon Farb of UCLA suggested that we determine what storage solutions are already in place on the various campuses, and coordinate our efforts with those extant solutions. Colleagues from UCSF pointed out that budgets are tight for research labs, and charging for storage may be a significant hurdle for them to participate. They need a concrete answer regarding costs now; options may be for each campus to pay up front, or for the UC Office of the President to pay for the system. Individual researcher charges would be the responsibility of each campus; CDL has no plans to take on that responsibility.

I followed Stephen with an overview of data governance in Dash (slides). Dash will offer only CC-BY for UC researchers; DataONE Dash will offer only CC-0. The existing DataShare system at UCSF (on which Dash is based) uses a contract (i.e., data use agreement); however, this option will not be available moving forward since it inhibits data reuse and complicates Dash implementation. The decision to use CC-BY for Dash is based on conversations with UC General Counsel, which is currently evaluating the UC Data Policy. The UC Regents technically own data produced by UC researchers, which complicates how licenses can be used in Dash.

Development Contributions

Marisa Strong then described how campuses can get involved in the development process. She identified the different components of the Dash service, which include three code lines (all in GitHub under an MIT license):

  1. dash-xtf, which houses the search and browse functionality;
  2. dash-ingest, the rails client for ingest to Merritt; and
  3. dash-harvester, a python script for harvesting metadata.

Instructions on how to contribute code are available on the Dash wiki, including how to set up a local test environment.

Matthew McKinley from UC Irvine then described their group’s development efforts in working on the Dash code lines to implement geospatial metadata fields. He described the process for forking the code, implementing the new feature in a local branch, then merging that branch back into the main code line via a pull request.

Plans for Development with Requested Funding

UC3 has submitted a proposal to the Alfred P. Sloan Foundation, requesting funds to continue development of the Dash service. If approved, the grant would fund one year of development focused on the following:

  • Streamlined and improved user interface / user experience
  • Development of embedded widgets for deposit and search functionality in Dash
  • Generalization of Dash protocols so that it can be layered on top of any repository
  • Expanded functions, including parsing spreadsheets for cleaning and best practices (similar to previous DataUp functionality)
  • Support for more metadata schemas, e.g., EML, FGDC

This work would happen in parallel with the existing Dash application, allowing continuous service while development is ongoing. Declan Fleming of UCSD asked whether UC3 efforts would be better spent using existing infrastructures and tools, such as Fedora. The UC3 team said that they would like to talk further about better possible approaches to the Dash system, and encouraged attendees to share ideas prior to the start of development efforts (if funded).

Dash Enhancements: Identification & Prioritization

The group went through the existing enhancements suggested for Dash, available on GitHub Issues. There were 18 existing enhancements, and the group then suggested an additional 51. Attendees then broke into three groups to prioritize the 69 enhancements for future development. Enhancements that floated to the top included:

  • embargoes (restricted access) for datasets
  • metrics/feedback for data depositors and users (e.g., dataset-level metrics)
  • integration with tools and software such as GitHub, ResearchGate, R, and eScholarship
  • improvements to metadata, including ORCID and Fundref integration

This exercise is only the beginning of the process; the UC3 group plans to tidy up the list and re-share with the group after the meeting for continued discussion. This process will be documented on GitHub and via the Dash listserv. Stay tuned!

Next Steps & Wrap-up

The meeting ended with a discussion about how the campuses would stay informed, what contributions each campus might make to Dash, and how the cross-campus partnership should take shape moving forward. Communication lines will include the Dash Facebook page, Twitter account (@UC3Dash), and the GitHub page. Trisha facilitated a final around-the-room, where attendees could share final thoughts. Common thoughts included excitement for the Dash service, meeting campus partners and hearing about development plans moving forward.

The UCLA campus as it appeared in 1929. Enrollment was 6,175. Contributed to Calisphere by UC Berkeley.

DataUp is Merging with Dash!

Exciting news! We are merging the DataUp tool with our new data sharing platform, Dash.

About Dash

Dash is a University of California project to create a platform that allows researchers to easily describe, deposit and share their research data publicly. Currently the Dash platform is connected to the UC3 Merritt Digital Repository; however, we have plans to make the platform compatible with other repositories using protocols such as SWORD and OAI-PMH. The Dash project is open-source and we encourage community discussion and contribution to our GitHub site.
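For readers unfamiliar with OAI-PMH, harvesting boils down to issuing HTTP GET requests whose query string names a verb and a metadata format. The sketch below only builds such a request URL; the base URL and function name are placeholders we made up for illustration, not an actual Dash or Merritt endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        # Optional: restrict the harvest to one named set of records.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Placeholder endpoint, used only to show the shape of the URL.
print(list_records_url("https://repository.example.org/oai"))
```

Any repository that answers requests of this shape can be harvested by the same client, which is what makes protocol support a practical route to compatibility with other repositories.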

About the Merge

There is significant overlap in functionality for Dash and DataUp (see below), so we will merge these two projects to enable better support for our users. This merge is funded by an NSF grant (available on eScholarship) supplemental to the DataONE project.

The new service will be an instance of our Dash platform (to be available in late September), connected to the DataONE repository ONEShare. Previously the only way to deposit datasets into ONEShare was via the DataUp interface, thereby limiting deposits to spreadsheets. With the Dash platform, this restriction is removed and any dataset type can be deposited. Users will be able to log in with their Google ID (other options being explored). There are no restrictions on who can use the service, and therefore no restrictions on who can deposit datasets into ONEShare, and the service will remain free. The ONEShare repository will continue to be supported by the University of New Mexico in partnership with CDL/UC3. 

The NSF grant will continue to fund a developer to work with the UC3 team on implementing the DataONE-Dash service, including enabling login via Google and other identity providers, ensuring that metadata produced by Dash will meet the conditions of harvest by DataONE, and exploring the potential for implementing spreadsheet-specific functionality that existed in DataUp (e.g., the best practices check). 

Benefits of the Merge

  • We will be leveraging work that UC3 has already completed on Dash, which has fully-implemented functionality similar to DataUp (upload, describe, get identifier, and share data).
  • ONEShare will continue to exist and be a repository for long tail/orphan datasets.
  • Because Dash is an existing UC3 service, the project will move much more quickly than if we were to start from “scratch” on a new version of DataUp in a language that we can support.
  • Datasets will get DataCite digital object identifiers (DOIs) via EZID.
  • All data deposited via Dash into ONEShare will be discoverable via DataONE.

FAQ about the change

What will happen to DataUp as it currently exists?

The current version of DataUp will continue to exist until November 1, 2014, at which point we will discontinue the service and the dataup.org website will be redirected to the new service. The DataUp codebase will still be available via the project’s GitHub repository.

Why are you no longer supporting the current DataUp tool?

We have limited resources and can’t properly support DataUp as a service due to a lack of local experience with the C#/.NET framework and the Windows Azure platform.  Although DataUp and Dash were originally started as independent projects, over time their functionality converged significantly.  It is more efficient to continue forward with a single platform, and we chose to use Dash as a more sustainable basis for this consolidated service.  Dash is implemented in the Ruby on Rails framework that is used extensively by other CDL/UC3 service offerings.

What happens to data already submitted to ONEShare via DataUp?

All datasets now in ONEShare will be automatically available in the new Dash discovery environment alongside all newly contributed data.  All datasets also continue to be accessible directly via the Merritt interface at https://merritt.cdlib.org/m/oneshare_dataup.

Will the same functionality exist in Dash as in DataUp?

Users will be able to describe their datasets, get an identifier and citation for them, and share them publicly using the Dash tool. The initial implementation of DataONE-Dash will not have capabilities for parsing spreadsheets and reporting on best practices compliance. Also, the user will not be able to describe column-level (i.e., attribute) metadata via the web interface. Our intention, however, is to develop these functions and other enhancements in the future. Stay tuned!

Still want help specifically with spreadsheets?

  • We have pulled together some best practices resources: Spreadsheet Help 
  • Check out the Morpho Tool from the KNB – free, open-source data management software you can download to create/edit/share spreadsheet metadata (both file- and column-level). Bonus – The KNB is part of the DataONE Network.

 

It’s the dawn of a new day for DataUp! From Flickr by David Yu.
