Tag Archives: requirements

An RDM Model for Researchers: What we’ve learned

Thanks to everyone who gave feedback on our previous blog post describing our data management tool for researchers. We received a great deal of input about our guide’s use of the term “data sharing” and its position in relation to other RDM tools, as well as quite a few questions about what the guide will include as we develop it further.

As stated in our initial post, we’re building a tool to enable individual researchers to assess the maturity of their data management practices within an institutional or organizational context. To do this, we’ve taken the concept of RDM maturity from existing tools like the Five Organizational Stages of Digital Preservation, the Scientific Data Management Capability Model, and the Capability Maturity Guide and placed it within a framework familiar to researchers: the research data lifecycle.

A visualization of our guide as presented in our last blog post. An updated version, including changes made in response to reader feedback, is presented later in this post.

Data Sharing

The most immediate feedback we received was about the term “Data Sharing”. Several commenters pointed out the ambiguity of this term in the context of the research data life cycle. In the last iteration of our guide, we intended “Data Sharing” as a shorthand to describe activities related to the communication of data. Such activities may range from describing data in a traditional scholarly publication to depositing a dataset in a public repository or publishing a data paper. Because existing data sharing policies (e.g. PLOS, The Gates Foundation, and The Moore Foundation) refer specifically to the latter over the former, the term is clearly too imprecise for our guide.

Like “Data Sharing”, “Data Publication” is a popular term for describing activities surrounding the communication of data. Even more than “Sharing”, “Publication” conveys our desire to advance practices that treat data as a first-class research product. Unfortunately, the term is simultaneously too precise and too ambiguous to be useful in our guide. On one hand, “Data Publication” can refer specifically to a peer-reviewed document that presents a dataset without offering any analysis or conclusion. While data papers may be a straightforward way of inserting datasets into the existing scholarly communication ecosystem, they represent a single point on the continuum of data management maturity. On the other hand, there is currently no clear consensus among researchers about what it means to “publish” data.

For now, we’ve given that portion of our guide the preliminary label of “Data Output”. As the development process proceeds, this row will include a full range of activities, from the description of data in traditional scholarly publications (which may or may not include a data availability statement) to the deposit of data in public repositories and the publication of data papers.

Other Models and Guides

While we correctly identified that there is a range of rubrics, tools, and capability models with aims similar to our guide’s, we overstated that ours uniquely allows researchers to assess where they are and where they want to be in regards to data management. Several of the tools we cited in our initial post can be applied by researchers to measure the maturity of data management practices within a project or institutional context.

Below we’ve profiled four such tools and indicated how we believe our guide differs from each. In differentiating our guide, we do not mean to position it strictly as an alternative. Rather, we believe that our guide could be used in concert with these other tools.

Collaborative Assessment of Research Data Infrastructure and Objectives (CARDIO)

CARDIO is a benchmarking tool designed to be used by researchers, service providers, and coordinators for collaborative data management strategy development. Designed to be applied at a variety of levels, from entire institutions down to individual research projects, CARDIO enables its users to collaboratively assess data management requirements, activities, and capacities using an online interface. Users of CARDIO rate their data management infrastructure relative to a series of statements concerning their organization, technology, and resources. After completing CARDIO, users are given a comprehensive set of quantitative capability ratings as well as a series of practical recommendations for improvement.

Unlike CARDIO, our guide does not necessarily assume its users are in contact with data-related service providers at their institution. As we stated in our initial blog post, we intend to guide researchers to specialist knowledge without necessarily turning them into specialists. Therefore, we would consider a researcher making contact with their local data management, research IT, or library service providers for the first time as a positive application of our guide.

Community Capability Model Framework (CCMF)

The Community Capability Model Framework is designed to evaluate a community’s readiness to perform data-intensive research. Intended to be used by researchers, institutions, and funders to assess current capabilities, identify areas requiring investment, and develop roadmaps for achieving a target state of readiness, the CCMF encompasses eight “capability factors”, including openness, skills and training, research culture, and technical infrastructure. When used alongside the Capability Profile Template, the CCMF provides its users with a scorecard containing multiple quantitative scores related to each capability factor.

Unlike the CCMF, our guide does not necessarily assume that its users should all be striving towards the same level of data management maturity. We recognize that data management practices may vary significantly between institutions or research areas and that what works for one researcher may not necessarily work for another. Therefore, we would consider researchers understanding the maturity of their data management practices within their local contexts to be a positive application of our guide.

Data Curation Profiles (DCP) and DMVitals

The Data Curation Profile toolkit is intended to address the needs of an individual researcher or research group with regards to the “primary” data used for a particular project. Taking the form of a structured interview between an information professional and a researcher, a DCP can allow an individual research group to consider their long-term data needs, enable an institution to coordinate their data management services, or facilitate research into broader topics in digital curation and preservation.

DMVitals is a tool designed to take information from a source like a Data Curation Profile and use it to systematically assess a researcher’s data management practices in direct comparison to institutional and domain standards. Using the DMVitals, a consultant matches a list of evaluated data management practices with responses from an interview and ranks the researcher’s current practices by their level of data management “sustainability.” The tool then generates customized and actionable recommendations, which a consultant then provides to the researcher as guidance to improve his or her data management practices.  

Unlike DMVitals, our guide does not calculate a quantitative rating to describe the maturity of data management practices. From a measurement perspective, the range of practice maturity may differ between the four stages of our guide (e.g. the “Project Planning” stage could have more or fewer steps than the “Data Collection” stage), which would significantly complicate the interpretation of any quantitative ratings derived from our guide. We also recognize that data management practices are constantly evolving and likely dependent on disciplinary and institutional context. On the other hand, we recognize the utility of quantitative ratings for benchmarking. Therefore, if, after assessing the maturity of their data management practices with our guide, a researcher chooses to apply a tool like DMVitals, we would consider that a positive application of our guide.
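
To make the measurement point concrete, here is a small, purely hypothetical sketch (in Python; the stages, practice counts, and ratings are invented and are not drawn from our guide or from DMVitals). It shows how a single averaged score can hide very different maturity profiles when stages contain different numbers of practices:

```python
# Hypothetical illustration only -- these stages, practices, and ratings
# are invented and are not part of the UC3 guide or of DMVitals.

# Each researcher rates a set of practices on a 1-4 maturity scale,
# grouped by lifecycle stage. Note that the stages contain different
# numbers of practices.
researcher_a = {
    "Project Planning": [4, 4, 4],
    "Data Collection":  [1, 1, 1, 1, 1, 1],
}
researcher_b = {
    "Project Planning": [1, 1, 1],
    "Data Collection":  [3, 3, 2, 2, 2, 3],
}

def overall_score(profile):
    """Average every rating, ignoring stage boundaries."""
    ratings = [r for stage in profile.values() for r in stage]
    return sum(ratings) / len(ratings)

def per_stage_view(profile):
    """Average within each stage, preserving where strengths and gaps sit."""
    return {stage: sum(r) / len(r) for stage, r in profile.items()}

for name, profile in [("Researcher A", researcher_a), ("Researcher B", researcher_b)]:
    print(name, overall_score(profile), per_stage_view(profile))
# Both overall scores come out to 2.0, yet the profiles are very different:
# A plans well but collects poorly, while B is weak at planning and
# stronger at collection. The stage-level view is the informative one.
```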

Our Model (Redux)

Perhaps the biggest takeaway from the response to our last blog post is that it is very difficult to give detailed feedback on a guide that is mostly whitespace. Below is an updated mock-up, which describes a set of RDM practices along the continuum of data management maturity. At present, we are not aiming to illustrate the full range of data management practices. More simply, this mock-up is intended to show the types of practices our guide could describe once it is complete.

An updated visualization of our guide based on reader feedback. At this stage, the example RDM practices are intended to be representative, not comprehensive.

Project Planning

The “Project Planning” stage describes practices that occur prior to the start of data collection. Our examples all center on data management plans (DMPs), but other considerations at this stage could include training in data literacy, engagement with local RDM services, inclusion of “sharing” in project documentation (e.g. consent forms), and project pre-registration.

Data Collection

The “Data Collection” stage describes practices related to the acquisition, accumulation, measurement, or simulation of data. Our examples relate mostly to standards around file naming and structuring, but other considerations at this stage could include the protection of sensitive or restricted data, validation of data integrity, and specification of linked data.

Data Analysis

The “Data Analysis” stage describes practices that involve the inspection, modeling, cleaning, or transformation of data. Our examples mostly relate to documenting the analysis workflow, but other considerations at this stage could include the generation and annotation of code and the packaging of data within sharable files or formats.

Data Output

The “Data Output” stage describes practices that involve the communication of either the data itself or conclusions drawn from the data. Our examples are mostly related to the communication of data linked to scholarly publications, but other considerations at this stage could include journal and funder mandates around data sharing, the publication of data papers, and the long-term preservation of data.

Next Steps

Now that we’ve solicited a round of feedback from the community that works on issues around research support, data management, and digital curation, our next step is to broaden our scope to include researchers.

Specifically we are looking for help with the following:

  • Do you find the divisions within our model useful? We’ve used the research data lifecycle as a framework because we believe it makes our tool user-friendly for researchers. At the same time, we also acknowledge that the lines separating planning, collection, analysis, and output can be quite blurry. We would be grateful to know whether researchers or data management service providers find these divisions useful or overly constraining.
  • Should there be more discrete “steps” within our framework? Because we view data management maturity as a continuum, we have shied away from creating discrete steps within each division. We would be grateful to know how researchers or data management service providers view this approach, especially when compared to the more quantitative approach employed by CARDIO, the Capability Profile Template, and DMVitals.
  • What else should we put into our model? Researchers are faced with changing expectations and obligations in regards to data management. We want our model to reflect that. We also want our model to reflect the relationship between research data management and broader issues like openness and reproducibility. With that in mind, what other practices and considerations should our model include?

Building a user-friendly RDM maturity model

UC3 is developing a guide to help researchers assess and progress the maturity of their data management practices.

What are we doing?

Researchers are increasingly faced with new expectations and obligations in regards to data management. To help researchers navigate this changing landscape and to complement existing instruments that enable librarians and other data managers to assess the maturity of data management practices at an institutional or organizational level, we’re developing a guide that will enable researchers to assess the maturity of their individual practices within an institutional or organizational context.

Our aim is to be descriptive rather than prescriptive. We do not assume every researcher will want or need to achieve the same level of maturity for all their data management practices. Rather, we aim to provide researchers with a guide to specialist knowledge without necessarily turning researchers into specialists. We want to help researchers understand where they are and, where appropriate, how to get to where they want or need to be.

Existing Models

As a first step in building our own guide, we’ve researched the range of related tools, rubrics, and capability models. Many, including the Five Organizational Stages of Digital Preservation, the Scientific Data Management Capability Model, and the Capability Maturity Guide developed by the Australian National Data Service, draw heavily from the SEI Capability Maturity Model and are intended to assist librarians, repository managers, and other data management service providers in benchmarking the policies, infrastructure, and services of their organization or institution. Others, including the Collaborative Assessment of Research Data Infrastructure and Objectives (CARDIO), DMVitals, and the Community Capability Model Framework, incorporate feedback from researchers and are intended to assist in benchmarking a broad set of data management-related topics for a broad set of stakeholders, from organizations and institutions down to individual research groups.

We intend for our guide to build on these tools but to have a different, and we think novel, focus. While we believe it could be a useful tool for data management service providers, the intended audience of our guide is research practitioners. While integration with service providers in the library, research IT, and elsewhere will be included where appropriate, the focus will be on equipping researchers to assess and refine their own data management activities. While technical infrastructure will be included where appropriate, the focus will be on behaviors, “soft skills”, and training.

Our Guide

Below is a preliminary mockup of our guide. Akin to the “How Open Is It?” guide developed by SPARC, PLOS, and the OASPA, it aims to be comprehensive and user-friendly and to provide tangible recommendations.

Obviously we still have a significant amount of work to do to refine the language and fill in the details. At the moment, we are using elements of the research data lifecycle to broadly describe research activities and very general terms to describe the continuum of practice maturity. Our next step is to fill in the blanks: to describe research activities more precisely and to delineate the stages of practice maturity more clearly. From there, we will work to outline the behaviors, skills, and expertise present for each research activity at each stage.

Next Steps

Now that we’ve researched existing tools for assessing data management services and sketched out a preliminary framework for our guide, our next step is to elicit feedback from the broader community that works on issues around research support, data management, and digital curation and preservation.

Specifically we are looking for help on the following:

  • Have we missed anything? There is a range of data management-related rubrics, tools, and capability models – from the community-focused frameworks described above to frameworks focused on the preservation and curation of digital assets (e.g. the Digital Asset Framework, DRAMBORA). As far as we’re aware, there isn’t a complementary tool that allows researchers to assess where they are and where they want to be in regards to data management. Are there efforts that have already met this need? We’d be grateful for any input about the existence of frameworks with similar goals.
  • What would be the most useful divisions and steps within our framework? The “three-legged stool” developed by the Digital Preservation Management workshop has been highly influential for community- and data management provider-focused tools. Though examining policies, resources, and infrastructure is also important for researchers when self-assessing their data management practices, we believe it would be more useful for our guide to reflect how data is generated, managed, and disseminated in a research context. We’d be grateful for any insight into how we could incorporate related models – such as those depicting the research data lifecycle – into our framework.

UC3, PLOS, and DataONE join forces to build incentives for data sharing

We are excited to announce that UC3, in partnership with PLOS and DataONE, is launching a new project to develop data-level metrics (DLMs). This 12-month project is funded by an Early Concept Grants for Exploratory Research (EAGER) grant from the National Science Foundation and will result in a suite of metrics that track and measure data use. The proposal is available via CDL’s eScholarship repository: http://escholarship.org/uc/item/9kf081vf. More information is also available on the NSF website.

Why DLMs? Sharing data is time-consuming, and researchers need incentives for undertaking the extra work. Metrics for data will provide feedback on data usage, views, and impact that will help encourage researchers to share their data. This project will explore and test the metrics needed to capture activity surrounding research data.

The DLM pilot will build on the successful open-source Article-Level Metrics community project, Lagotto, originally started by PLOS in 2009. ALMs provide a view into the activity surrounding an article after publication, across a broad spectrum of ways in which research is disseminated and used (e.g., viewed, shared, discussed, cited, and recommended).
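
For readers curious about what aggregating activity around research data might involve, here is a minimal, hypothetical sketch in Python. It is not Lagotto code and does not use its actual API; the event records and source names are invented purely to illustrate the kind of per-dataset tallying a data-level metrics service performs:

```python
from collections import Counter, defaultdict

# Invented event stream for two datasets, keyed by DOI. In a real DLM
# service these events would be harvested from repositories, citation
# indexes, social media, and so on; here they are hard-coded examples.
events = [
    {"doi": "10.5061/dryad.example1", "source": "downloads"},
    {"doi": "10.5061/dryad.example1", "source": "views"},
    {"doi": "10.5061/dryad.example1", "source": "citations"},
    {"doi": "10.5061/dryad.example2", "source": "views"},
    {"doi": "10.5061/dryad.example2", "source": "tweets"},
]

# Tally events per dataset and per source -- analogous to the
# per-article tallies that ALMs report.
metrics = defaultdict(Counter)
for event in events:
    metrics[event["doi"]][event["source"]] += 1

for doi, counts in sorted(metrics.items()):
    print(doi, dict(counts))
```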

About the project partners

PLOS (Public Library of Science) is a nonprofit publisher and advocacy organization founded to accelerate progress in science and medicine by leading a transformation in research communication.

Data Observation Network for Earth (DataONE) is an NSF DataNet project which is developing a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.

The University of California Curation Center (UC3) at the California Digital Library is a creative partnership bringing together the expertise and resources of the University of California. Together with the UC libraries, we provide high quality and cost-effective solutions that enable campus constituencies – museums, libraries, archives, academic departments, research units and individual researchers – to have direct control over the management, curation and preservation of the information resources underpinning their scholarly activities.

The official mascot for our new project: Count von Count. From muppet.wikia.com


UC Open Access: How to Comply

Free access to UC research is almost as good as free hugs! From Flickr by mhauri

My last two blog posts have been about the new open access policy that applies to the entire University of California system. For big open science nerds like myself, this is exciting progress and deserves much ado. For the on-the-ground researcher at a UC, knee-deep in grants and lecture preparation, the ado could probably be skipped in favor of a straightforward explanation of how to comply. So here goes.

Who & When:

  • 1 November 2013: Faculty at UC Irvine, UCLA, and UCSF
  • 1 November 2014: Faculty at UC Berkeley, UC Merced, UC Santa Cruz, UC Santa Barbara, UC Davis, UC San Diego, UC Riverside

Note: The policy applies only to ladder-rank faculty members. Of course, graduate students and postdocs should strongly consider participating as well.

To comply, faculty members have two options:

Option 1: Out-of-the-box open access

There are two ways to do this:

  1. Publishing in an open access-only journal (see examples here). Some have fees and others do not.
  2. Publishing with a more traditional publisher, but paying a fee to ensure the manuscript is publicly available. These fees, called article-processing charges (APCs), vary widely depending on the journal. For example, Elsevier’s Ecological Informatics charges $2,500, while Nature charges $5,200.

Learn more about different journals’ fees and policies at the Directory of Open Access Journals: www.doaj.org

Option 2: Deposit your final manuscript in an open access repository.

In this scenario, you can publish in whatever journal you prefer – regardless of its openness. Once the manuscript is published, you take action to make a version of the article freely and openly available.

As UC faculty (or any UC researcher, including grad students and postdocs), you can comply via Option 2 above by depositing your publications in UC’s eScholarship open access repository. The CDL Access & Publishing Group is currently perfecting a user-friendly, efficient workflow for managing article deposits into eScholarship. The new workflow will be available as of November 1st. Learn more.

Does this still sound like too much work? Good news! The Publishing Group is also working on a harvesting tool that will automate deposit into eScholarship. Stay tuned – the estimated release of this tool is June 2014.

An Addendum: Are you not a UC affiliate? Don’t fret! You can find your own version of eScholarship (i.e., an open access repository) by going to OpenDOAR. Also see my full blog post about making your publications open access.

Why?

Academic libraries must pay exorbitant fees to provide their patrons (researchers) with access to scholarly publications. The very patrons who need these publications are the ones who provide the content in the form of research articles. Essentially, researchers are paying for access to their own work, by proxy, via their institution’s library.

What if you don’t have access? Individuals without institutional affiliations (e.g., between jobs), or who are affiliated with institutions that have no library or a poorly funded one (e.g., in 2nd or 3rd world countries), depend on open access articles to keep up with the scholarly literature. The need for OA isn’t limited to jobless or international folks, though. For proof, one only has to notice that the Twitter community has developed a hashtag around this: #Icanhazpdf (hat tip to the Lolcats phenomenon). Basically, you tweet the name of the article you can’t access and add the hashtag in hopes that someone out in the Twittersphere can help you out and send it to you.

Special thanks to Catherine Mitchell from the CDL Publishing & Access Group for help on this post.


It’s Time for Better Project Metrics

I’m involved in lots of projects, based at many institutions, with multiple funders and oodles of people involved. Each of these projects has requirements for reporting metrics that are used to prove the project is successful. Here, I want to argue that many of these metrics are arbitrary, and in some cases misleading. I’m not sure what the solution is – but I am anxious for a discussion to start about reporting requirements for funders and institutions, metrics for success, and how we measure a project’s impact.

What are the current requirements for projects to assess success? The most common request is for text-based reports – which are reminiscent of junior high book reports. My colleague here at the CDL, John Kunze, has been working for the UC in some capacity for a long time. If anyone is familiar with the bureaucratic frustrations of metrics, it’s John. Recently he brought me a sticky-note with an acronym he’s hoping will catch on:

SNωωRF: Stuff nobody wants to write, read, or fund

The two lower-case omegas stand in for the “w”s in the acronym but are read as the letter “o” to facilitate pronunciation, i.e. “snorf”. He was prompted to invent this catchy acronym after writing up a report for a collaborative, Europe-based project we work on. After writing the report, he was told it “needed to be longer by two or three pages”. The necessary content was there in the short version – but it wasn’t long enough to look thorough. Clearly, brevity is not something that’s rewarded in project reporting.

Which orange dot is bigger? Overall impressions differ from what the measurements say. Project metrics don’t always reflect success. From donomic10.edublogs.org

Outside of text-based reports, there are other metrics that higher-ups like: number of website hits, number of collaborations, number of conferences attended, number of partners and institutions involved, et cetera. A really successful project can look weak on all of these measures. Similarly, a crap project can look quite successful based on the metrics listed. So if there is no clear correlation between the metrics used to gauge project success and actual project success, why do we measure them?

So what’s the alternative? The simplest alternative – not measuring/reporting metrics – is probably not going to fly with funders, institutions, or organizations. In fact, metrics play an important role. They allow for comparisons among projects, provide targets to strive for, and allow project members to assess progress. Perhaps rather than defaulting to the standard reporting requirements, funders and institutions could instead take some time to consider what success means for a particular project, and customize the metrics based on that.

In the space I operate in (data sharing, data management, open science, scholarly publishing, etc.), project success is best assessed by whether the project has (1) resulted in new conversations, debates, and dialogue, and/or (2) changed the way science is done. Examples of successful projects based on this definition: figshare, ImpactStory, PeerJ, IPython Notebook, and basically anything funded by the Alfred P. Sloan Foundation. Many of these would also pass the success test based on more traditional metrics, but not necessarily. I will avoid making enemies by listing projects that I deem unsuccessful despite their passing the test based on traditional metrics.

The altmetrics movement is focused on reviewing researcher and research impact in new, interesting ways (see my blog posts on the topic here and here). What would this altmetrics movement look like in terms of projects? I’m not sure, but I know that its time has come.


Collecting Journal Data Policies: JoRD

My last two posts have related to IDCC 2013; that makes this post three in a row. Apparently IDCC is a gift that just keeps giving (albeit a rather short post in this case).

Today the topic is the JoRD project, funded by JISC. JoRD stands for Journal Research Data; the JoRD Policy Bank is basically a project to collect and summarize data policies for a range of academic journals.

From the JISC project website, this project aims to

provide researchers, managers of research data and other stakeholders with an easy source of reference to understand and comply with Research Data policies.

How to go about this? The project’s objectives (cribbed and edited from the project site):

  1. Identify and consult with stakeholders; develop stakeholder requirements
  2. Investigate the current state of data sharing policies within journals
  3. Deliver recommendations on a central service to summarize journal research data policies and provide a reference for guidance and information on journal policies.

I’m most interested in #2: what are journals saying about data sharing? To tackle this, project members are collecting information about the data sharing policies of the top 100 and bottom 100 science journals and the top 100 and bottom 100 social science journals. Based on the stated journal policies about data sharing, they fill out an extensive spreadsheet. I’m anxious to see the final outcome of this data collection – my hunch is that most journals “encourage” or “recommend” data sharing, but do not mandate it.
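
As a purely hypothetical sketch of the kind of sorting such a spreadsheet might support (this is not JoRD’s actual methodology, and the sample policy snippets are invented), one could classify policy language by strength roughly like this:

```python
# Hypothetical keyword-based classification of journal data policy
# language -- not JoRD's actual methodology, just an illustration of the
# "require" vs. "encourage" distinction discussed above.

POLICY_STRENGTH = [  # checked in order, strongest category first
    ("mandatory", ["require", "must", "condition for publication"]),
    ("encouraged", ["strongly encourage", "encourage", "recommend"]),
    ("on request", ["on request", "upon request", "provided on demand"]),
]

def classify(policy_text: str) -> str:
    """Return a rough strength label for a policy statement."""
    text = policy_text.lower()
    for label, phrases in POLICY_STRENGTH:
        if any(phrase in text for phrase in phrases):
            return label
    return "no stated policy"

# Invented example snippets, not quotations from actual journal policies.
samples = [
    "Authors must archive supporting data as a condition for publication.",
    "We strongly encourage authors to make all data publicly available.",
    "Data will be provided on demand by the corresponding author.",
]
for snippet in samples:
    print(f"{classify(snippet):>12}: {snippet}")
```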

I think of the JoRD Policy Bank as having two major benefits:

Educating Researchers. As you may be aware, many researchers are a bit slow to jump on the data sharing bandwagon. This is the case despite the fact that all signs point to future requirements for sharing at the time of publication (see my post about it, Thanks in Advance for Sharing Your Data). Once researchers come to terms with the fact that data sharing will soon not be optional, they will need to know how to comply. Enter the JoRD Policy Bank!

Encouraging Publishers. The focus on stakeholder needs and requirements suggests that the outcomes of this project will provide guidance to publishers about how to proceed in their requirements surrounding data sharing. There might be a bit of peer pressure, as well: Journals don’t want to seem behind the times when it comes to data sharing, lest their credibility be threatened.

In general, the JoRD website is chock full of information about data sharing policies, open data, and data citation. Check it out!

C’mon researchers! Jump on the data sharing bandwagon! From purlem.com


Thanks in Advance For Sharing Your Data

Barbara Bates says to be sure to dress your turkey properly this season! Then invite him to eat some tofurky with you. From Flickr by carbonated

It’s American Thanksgiving this week, which means that hall traffic at your local university is likely to dwindle down to zero by Wednesday afternoon.  Because it’s a short week, this is a short post.  I wanted to briefly touch on data sharing policies in journals.

Will you be required to share your data next time you publish? If you are looking for a short answer, it’s probably not. Depending on the field you are in, the requirements for data sharing are not very… forceful. They often involve phrases like “strongly encourage” or “provided on demand”, rather than requiring researchers to archive their data, obtain an identifier, and submit that information alongside the journal article. The journal Nature just beefed up its wording a bit; still no requirement for archiving, though. Read the Nature policy on availability of data and materials.

Despite the slow progress towards data sharing mandates, there is a growing list of journals that have signed on to the Joint Data Archiving Policy (JDAP), the brainchild of folks over at the Dryad Repository. The JDAP verbiage, which journals can use in their instructions for authors, states that supporting data must be publicly available:

<< Journal >> requires, as a condition for publication, that data supporting the results in the paper should be archived in an appropriate public archive, such as << list of approved archives here >>. Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future. Authors may elect to have the data publicly available at time of publication, or, if the technology of the archive allows, may opt to embargo access to the data for a period up to a year after publication. Exceptions may be granted at the discretion of the editor, especially for sensitive information such as human subject data or the location of endangered species.

The boldface emphasis is mine, because it’s important: the journal requires, as a condition for publication, that you share your data. Now we’re cooking with gas!

The JDAP was adopted in a joint and coordinated fashion by many leading journals in the field of evolution in 2011, and it has since been adopted by other journals across various disciplines. A list of journals that require data sharing via the JDAP verbiage is below.


List of Journals that require data sharing:

  • The American Naturalist
  • Biological Journal of the Linnean Society
  • BMC Ecology
  • BMC Evolutionary Biology
  • BMJ
  • BMJ Open
  • Ecological Applications
  • Ecological Monographs
  • Ecology
  • Ecosphere
  • Evolution
  • Evolutionary Applications
  • Frontiers in Ecology and the Environment
  • Functional Ecology
  • Genetics
  • Heredity
  • Journal of Applied Ecology
  • Journal of Ecology
  • Journal of Evolutionary Biology
  • Journal of Fish and Wildlife Management
  • Journal of Heredity
  • Journal of Paleontology
  • Molecular Biology and Evolution
  • Molecular Ecology and Molecular Ecology Resources
  • Nature
  • Nucleic Acids Research
  • Paleobiology
  • PLOS
  • Science
  • Systematic Biology
  • ZooKeys

Progress & Plans for DataUp Release

Unlike Mick Jagger, our beta testers are satisfied. From wikipedia.org

It was one year ago today that I moved up to the Bay Area to work on DataUp (then DCXL) in earnest. It seems fitting that this milestone be marked by some significant progress on the project. No, we haven’t released DataUp to the public yet, but we have a release date slated for this September. This is very exciting news, especially since the project got off to a bit of a slow start. We have been cooking with gas since March, however, and the DataUp tool promises to do much of what I had envisioned on my drive from Santa Barbara last year.

If you are wondering what DataUp looks like, you will need to be patient.  You can, however, see some preliminary responses from our very gracious beta testers.  The good news is this: most folks seem pretty happy with the tool as-is, and many offered some really great feedback that will improve the tool as we move into the community involvement phase of the development effort.

We asked 21 beta testers what they thought of DataUp features, and here are the results:

We expect that the DataUp tool will only improve from here on out, so stay tuned for our big debut in less than two months!


Survey says…

A few weeks ago we reached out to the scientific community for help on the direction of the DCXL project. The major issue at hand was whether we should develop a web-based application or an add-in for Microsoft Excel. Last week, I reported that rather than choose, we decided to develop both. This might seem like a risky proposition: the DCXL project has a one-year timeline, meaning this all needs to be developed before August (!). As someone in a DCXL meeting recently put it, aren’t we settling for “twice the product and half the features”? We discussed which features might need to be dropped from our list of desirables based on the change in trajectory; however, we are confident that both of the DCXL products we develop will be feature-rich and meet the needs of the target scientific community. Of course, this is made easier by the fact that the features in the two products will be nearly identical.

What would Richard Dawson want? Add-in or web app? From Wikipedia. Source: J Graham (1988). Come on Down!!!: the TV Game Show Book. Abbeville Press

How did we arrive at developing both an add-in and a web app? By talking to scientists. From the feedback we collected, it became obvious that aspects of both products appeal to our user communities. Here’s a summary of what we heard:

Show of hands: I ran a workshop on Data Management for Scientists at the Ocean Sciences 2012 Meeting in February. At the close of the workshop, I described the DCXL project and went over the pros and cons of the add-in option and the web app option. By show of hands, folks in the audience voted about 80% for the web app (n~150).

Conversations: here’s a sampling of some of the things folks told me about the two options:

  • “I don’t want to go to the web. It’s much easier if it’s incorporated into Excel.” (add-in)
  • “As long as I can create metadata offline, I don’t mind it being a web app. It seems like all of the other things it would do require you to be online anyway” (either)
  • “If there’s a link in the spreadsheet, that seems sufficient. (either) It would be better to have something that stays on the menu bar no matter what file is open.” (add-in)
  • “The updates are the biggest issue for me. If I have to update software a lot, I get frustrated. It seems like Microsoft is always making me update something. I would rather go to the web and know it’s the most recent version.” (web app)
  • Workshop attendee: “Can it work like Zotero, where there’s ways to use it both offline and online?” (both)

Survey: I created a very brief survey using the website SurveyMonkey. I then sent the link to the survey out via social media and listservs.  Within about a week, I received over 200 responses.

Education level of respondents:

Survey questions & answers:

 

So with those results, there was a resounding “both!” emanating from the scientific community. First, we will develop the add-in, since it best fits the needs of our target users (those who use Excel heavily and need assistance with good data management skills). We will then develop the web application, with the hope that the community at large will adopt and improve on the web app over time. The internet is a great place for building a community with shared needs and goals, and we can only hope that DCXL will be adopted as wholeheartedly as other internet resources offering help and information.
