This past Thursday, 30 University of California system librarians, developers, and colleagues from nine of the ten campuses assembled at UCLA’s Charles E. Young Library for a discussion of the Dash service. If you weren’t aware, Dash is a University of California project to create a platform that allows researchers to easily describe, deposit and share their research data publicly. The group assembled to talk about the project’s progress and future plans. See the full agenda.
Introductions & Expectations
UC Curation Center (UC3) Director Trisha Cruse kicked off the meeting by asking attendees to introduce themselves and describe what they want to learn during the meeting. Responses had the following themes:
- Better understanding of what the Dash service is, how it works, and what it offers for researchers.
- Participation ideas: how the campuses can work together as a group, and what that work looks like.
- How we will prioritize development and work together as a cohesive group to determine the trajectory of the service.
- An understanding of how the campuses are implementing Dash: how they plan to reach out to faculty, how the service should be talked about on the campus, what outreach might look like, how this service can fit into the overall research infrastructure, and campus rollout/adoption plans.
- Future plans for the Dash service.
Overview of the Dash Service
The team then provided an overview of the Dash service, demonstrating how to log in, describe, and upload a dataset to Dash. Four campus instances of Dash went live (beta) on Tuesday 23 September, and campuses were provided with instructions on how to help test the new system. Stephen Abrams covered the technical infrastructure of the Dash service, describing the relationship between the Merritt repository, the EZID identifier service, the DataONE network, and each of the campus Dash instances (slides).
Yours truly followed with a description of DataONE Dash, a unique instance of the service that will replace the existing DataUp Tool (slides). This instance will be available to anyone with a Google login, and all data submitted to DataONE Dash will be in the ONEShare repository (a DataONE Member Node) and therefore discoverable in the DataONE system. Emily Lin of UC Merced pointed out that some UC Dash contributors might also want their datasets discoverable in DataONE; an enhancement was suggested that would allow UC Dash users to check a box, indicating they would like their work indexed by DataONE.
Stephen then discussed the cost model that is pending approval for Dash (slides). This model is based on recovering the cost for storage only; there is no service fee for UC users. The model indicates that UC could provide researchers, staff, and graduate students 10 GB of storage in Dash for a total of $290,000/year for the entire system. Sharon Farb of UCLA suggested that we determine what storage solutions are already in place on the various campuses, and coordinate our efforts with those extant solutions. Colleagues from UCSF pointed out that budgets are tight for research labs, and charging for storage may be a significant hurdle to their participation. They need a concrete answer regarding costs now; options may be for each campus to pay up front, or for the UC Office of the President to pay for the system. Individual researcher charges would be the responsibility of each campus; CDL has no plans to take on that responsibility.
I followed Stephen with an overview of data governance in Dash (slides). Dash will offer only CC-BY for UC researchers; DataONE Dash will offer only CC-0. The existing DataShare system at UCSF (on which Dash is based) uses a contract (i.e., a data use agreement); however, this option will not be available moving forward since it inhibits data reuse and complicates Dash implementation. The decision to use CC-BY for Dash is based on conversations with UC General Counsel, which is currently evaluating the UC Data Policy. The UC Regents technically own data produced by UC researchers, which complicates how licenses can be used in Dash.
Marisa Strong then described how campuses can get involved in the development process. She identified the different components of the Dash service, which include three code lines (all in GitHub under an MIT license):
- dash-xtf, which houses the search and browse functionality;
- dash-ingest, the Rails client for ingest to Merritt; and
- dash-harvester, a Python script for harvesting metadata.
Instructions on how to contribute code are available on the Dash wiki, including how to set up a local test environment.
Matthew McKinley from UC Irvine then described their group’s development efforts in working on the Dash code lines to implement geospatial metadata fields. He described the process for forking the code, implementing the new feature in a local branch, then merging that branch back into the main code line via a pull request.
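For readers less familiar with that fork-and-branch workflow, here is a minimal sketch in Python that simply assembles the git commands involved and prints them. The fork URL, branch name, and commit message are hypothetical placeholders, not details from Matthew's talk or from the actual Dash repositories, and the pull request itself would still be opened through GitHub's web interface.

```python
import shlex


def feature_branch_commands(fork_url, branch, message):
    """Return the git commands for a fork-and-branch contribution.

    All arguments are illustrative placeholders; nothing here is taken
    from the actual Dash repositories.
    """
    # Derive the local directory name that `git clone` would create.
    repo_dir = fork_url.rstrip("/").split("/")[-1]
    if repo_dir.endswith(".git"):
        repo_dir = repo_dir[:-4]

    return [
        # 1. Clone your personal fork of the code line.
        ["git", "clone", fork_url],
        # 2. Create a local branch for the new feature.
        ["git", "-C", repo_dir, "checkout", "-b", branch],
        # 3. Stage and commit the changes made on that branch.
        ["git", "-C", repo_dir, "add", "--all"],
        ["git", "-C", repo_dir, "commit", "-m", message],
        # 4. Publish the branch to the fork; the pull request is then
        #    opened from this branch on GitHub.
        ["git", "-C", repo_dir, "push", "--set-upstream", "origin", branch],
    ]


if __name__ == "__main__":
    commands = feature_branch_commands(
        "https://github.com/example-user/dash-ingest.git",  # hypothetical fork
        "geospatial-metadata",  # hypothetical branch name
        "Add geospatial metadata fields",  # hypothetical commit message
    )
    for command in commands:
        print(shlex.join(command))
```

The script only prints the commands rather than running them, so it is safe to try anywhere; once a fork actually exists, the printed lines can be pasted into a terminal.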
Plans for Development with Requested Funding
UC3 has submitted a proposal to the Alfred P. Sloan Foundation, requesting funds to continue development of the Dash service. If approved, the grant would fund one year of development focused on the following:
- Streamlined and improved user interface / user experience
- Development of embedded widgets for deposit and search functionality in Dash
- Generalization of Dash protocols so they can be layered on top of any repository
- Expanded functions, including parsing spreadsheets for cleaning and best practices (similar to previous DataUp functionality)
- Support for more metadata schemas, e.g., EML, FGDC
This work would happen in parallel with the existing Dash application, allowing continuous service while development is ongoing. Declan Fleming of UCSD asked whether UC3 efforts would be better spent using existing infrastructures and tools, such as Fedora. The UC3 team said that they would like to talk further about better possible approaches to the Dash system, and encouraged attendees to share ideas prior to the start of development efforts (if funded).
Dash Enhancements: Identification & Prioritization
The group went through the existing enhancements suggested for Dash, available on GitHub Issues. There were 18 existing enhancements, and the group then suggested an additional 51. Attendees then broke into three groups to prioritize the 69 enhancements for future development. Enhancements that floated to the top included:
- embargoes (restricted access) for datasets
- metrics/feedback for data depositors and users (e.g., dataset-level metrics)
- integration with tools and software such as GitHub, ResearchGate, R, and eScholarship
- improvements to metadata, including ORCID and FundRef integration
This exercise is only the beginning of the process; the UC3 group plans to tidy up the list and re-share with the group after the meeting for continued discussion. This process will be documented on GitHub and via the Dash listserv. Stay tuned!
Next Steps & Wrap-up
The meeting ended with a discussion about how the campuses would stay informed, what contributions each campus might make to Dash, and how the cross-campus partnership should take shape moving forward. Communication lines will include the Dash Facebook page, Twitter account (@UC3Dash), and the GitHub page. Trisha facilitated a final around-the-room, where attendees could share final thoughts. Common thoughts included excitement for the Dash service, meeting campus partners and hearing about development plans moving forward.
The UCLA campus as it appeared in 1929. Enrollment was 6,175. Contributed to Calisphere by UC Berkeley.