
Understanding researcher needs and values related to software

Software is as important as data when it comes to building upon existing scholarship. However, while there has been a small amount of research into how researchers find, adopt, and credit software, there is a comparative lack of empirical data on how they use, share, and value it.

The UC Berkeley Library and the California Digital Library are investigating researchers’ perceptions, values, and behaviors regarding software generated as part of the research process. If you are a researcher, we would greatly appreciate it if you could spare 10-15 minutes to complete the following survey:

Take the survey now!

The results of this survey will help us better understand researcher needs and values related to software and may also inform the development of library services related to software best practices, code sharing, and the reproducibility of scholarly activity.

If you have questions about our study or any problems accessing the survey, please contact yasminal@berkeley.edu or John.Borghi@ucop.edu.


Software for Reproducibility

The ultimate replication machine: DNA. Sculpture at Lawrence Berkeley School of Science, Berkeley CA. From Flickr by D.H. Parks.

Last week I thought a lot about one of the foundational tenets of science: reproducibility. I attended the Workshop on Software Infrastructure for Reproducibility in Science, held in Brooklyn at the new Center for Urban Science and Progress, NYU. This workshop was made possible by the Alfred P. Sloan Foundation and brought together heavy-hitters from the reproducibility world who work on software for workflows.

New to workflows? Read more about workflows in old blog posts on the topic, here and here. Basically, a workflow is a formalization of “process metadata”.  Process metadata is information about the process used to get to your final figures, tables, and other representations of your results. Think of it as a precise description of the scientific procedures you follow.
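
To make “process metadata” a bit more concrete, here is a minimal, hand-rolled sketch in Python of what capturing it can look like. This is not any particular tool from the workshop; the step scripts (clean.py, model.py, figure.py) and file names are hypothetical stand-ins for whatever your analysis actually runs.

    # A hand-rolled sketch of capturing process metadata, not a real tool.
    # Each step records what was run, on which inputs, producing which outputs.
    import json
    import subprocess
    import time

    WORKFLOW = [
        {"name": "clean",  "cmd": ["python", "clean.py"],  "inputs": ["raw.csv"],   "outputs": ["clean.csv"]},
        {"name": "model",  "cmd": ["python", "model.py"],  "inputs": ["clean.csv"], "outputs": ["fit.json"]},
        {"name": "figure", "cmd": ["python", "figure.py"], "inputs": ["fit.json"],  "outputs": ["figure1.png"]},
    ]

    def run_workflow(steps):
        """Run each step and log the process metadata alongside the results."""
        log = []
        for step in steps:
            started = time.strftime("%Y-%m-%dT%H:%M:%S")
            subprocess.run(step["cmd"], check=True)  # assumes these scripts exist
            log.append({**step, "started": started})
        # The log *is* the process metadata: what ran, in what order, on what.
        with open("workflow_log.json", "w") as f:
            json.dump(log, f, indent=2)

    if __name__ == "__main__":
        run_workflow(WORKFLOW)

Workflow tools do this kind of bookkeeping for you (and much more), but the log above is the essence of what they capture.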

After sitting through demos and presentations on the different tools folks have created, my head was spinning, in a good way. A few of my takeaways are below. For my next Data Pub post, I will provide a list of the tools we discussed.

Takeaway #1: Reuse is different from reproducibility.

The end-goal of documenting and archiving a workflow may be different for different people/systems. Reuse of a workflow, for instance, is potentially much easier than exactly reproducing the results. Any researcher will tell you: exact reproducibility is virtually impossible. Of course, this differs a bit depending on discipline: anything involving a living thing (i.e., biology) is much more unpredictable, while engineering experiments are more likely to be spot-on when reproduced. The level of detail needed to reproduce results is likely to dwarf the information needed simply to reuse a workflow.

Takeaway #2: Think of reproducibility as archiving.

This was something Josh Greenberg said, and it struck a chord with me. It was said in the context of considering exactly how much stuff should be captured for reproducibility. Josh pointed out that there is a whole body of work out there addressing this very question: archival science.

Example: an archivist at a library gets boxes of stuff from a famous author who recently passed away. How does s/he decide what is important? What should be kept, and what should be thrown out? How should the items be arranged to ensure that they are useful? What metadata, context, or other information (like a finding aid) should be provided?

The situation with archiving workflows is similar: how much information is needed? What are the likely uses for the workflow? How much detail is too much? Too little? I like considering the issues around capturing the scientific process as similar to archival science scenarios: it makes the problem seem a bit more manageable.

Takeaway #3: High-quality APIs are critical for any tool developed.

We talked about MANY different tools. The one thing we could all agree on was that they should play nice with other tools. In the software world, this means having a nice, user-friendly Application Programming Interface (API) that basically tells two pieces of software how to talk to one another.
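
As a rough illustration (a sketch in Python, not any of the tools we discussed), imagine a workflow-capture tool that exposes a small, documented interface. The class and method names here are made up; the point is that any other piece of software only needs to know this contract to talk to it.

    # Hypothetical API sketch: the contract other tools program against.
    from typing import List

    class WorkflowRecorder:
        """A made-up workflow-capture tool with a small, documented interface."""

        def __init__(self):
            self.steps = []

        def record_step(self, name: str, inputs: List[str], outputs: List[str]) -> None:
            """Register one analysis step; callers don't care how it is stored."""
            self.steps.append({"name": name, "inputs": inputs, "outputs": outputs})

        def export(self) -> list:
            """Hand the captured workflow to some other tool in a plain format."""
            return list(self.steps)

    # Another piece of software only needs the method names and arguments:
    recorder = WorkflowRecorder()
    recorder.record_step("fit_model", inputs=["clean.csv"], outputs=["fit.json"])
    print(recorder.export())

The contract is what matters: as long as record_step keeps its name and arguments, the tools on either side can evolve independently.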

Takeaway #4: We’ve got the tech-savvy researchers covered. Others? Not so much.

The software we discussed is very nifty. That said, many of these tools are geared towards researchers with some impressive tech chops. The tools focus on helping capture code-based work and integrate with things like LaTeX, Git/GitHub, and the command line. Did I lose you there? You aren’t alone… many of the researchers I interact with are not familiar with these tools, and would therefore not be able to effectively use the software we discussed.

Takeaway #5: Closing the gap between the tools and the researchers that should use them is hard. But not impossible.

There are three basic approaches that we can take:

  1. Focus on better user experience design
  2. Emphasize researcher training via workshops, one-on-one help from experts, et cetera
  3. Force researchers to close the gap on their own. (i.e., Wo/man up).

The reality is that it’s likely to be some combination of these three. Those at the workshop recognized the need for better user interfaces, and some projects here at the CDL are focusing on extensive usability testing prior to release. Funders are beginning to see the value of funding new positions for “human bridges” to help sync up researcher skill sets with available tools. And finally, researchers are slowly recognizing the need to learn basic coding: note the massive uptake of R in the ecology community as an example.
