Category Archives: Meetings & Conferences

csv,conf is back in 2017!

csv,conf,v3 is happening!

This time the community-run conference will be in Portland, Oregon, USA on the 2nd and 3rd of May 2017. It will feature stories about data sharing and data analysis from science, journalism, government, and open source. We want to bring together data makers/doers/hackers from backgrounds like science, journalism, open government, and the wider software industry to share knowledge and stories.

csv,conf is a non-profit community conference run by people who love data and sharing knowledge. This isn’t just a conference about spreadsheets: csv,conf is a conference about data sharing and data tools. We are curating content about advancing the art of data collaboration, from putting your data on GitHub to producing meaningful insight by running large-scale distributed processing on a cluster.

Submit a Talk! Talk proposals for csv,conf close February 15, so don’t delay: submit today! The deadline is fast approaching, and we want to hear from a diverse range of voices in the data community.

Talks are 20 minutes long and can be about any data-related concept that you think is interesting. There are no rules for our talks; we just want you to propose a topic you are passionate about and think a room full of data nerds will also find interesting. You can check out some of the past talks from csv,conf,v1 and csv,conf,v2 to get an idea of what has been pitched before.

If you are passionate about data and the many applications it has in society, then join us in Portland!


Speaker perks:

  • Free pass to the conference
  • Limited number of travel awards available for those unable to pay
  • Did we mention it’s in Portland in the Spring????

Submit a talk proposal today at csvconf.com

Early bird tickets are now on sale here.

If you have colleagues or friends who you think would be a great addition to the conference, please forward this invitation along to them! csv,conf,v3 is committed to bringing a diverse group together to discuss data topics. 

– UC3 and the entire csv,conf,v3 team

For questions, please email csv-conf-coord@googlegroups.com, DM @csvconference or join the csv,conf public slack channel.

This was cross-posted from the Open Knowledge International Blog: http://blog.okfn.org/2017/01/12/csvconf-is-back-in-2017-submit-talk-proposals-on-the-art-of-data-analysis-and-collaboration/

Software Carpentry / Data Carpentry Instructor Training for Librarians

We are pleased to announce that we are partnering with Software Carpentry (http://software-carpentry.org) and Data Carpentry (http://datacarpentry.org) to offer an open instructor training course on May 4-5, 2017, geared specifically toward the Library Carpentry movement.

Open call for Instructor Training

This course will take place in Portland, OR, in conjunction with csv,conf,v3, a community conference for data makers everywhere. It’s open to anyone, but the two-day event will focus on preparing members of the library community as Software and Data Carpentry instructors. The sessions will be led by Library Carpentry community members, Belinda Weaver and Tim Dennis.

If you’d like to participate, please apply by filling in the form at https://amy.software-carpentry.org/forms/request_training/ (applications have now closed).

What is Library Carpentry?

For those who don’t know, Library Carpentry is a global community of library professionals that is customizing Software Carpentry and Data Carpentry modules for training the library community in software and data skills. You can follow us on Twitter @LibCarpentry.

Library Carpentry is actively creating training modules for librarians and holding workshops around the world. It’s a relatively new movement that has already been a huge success. You can learn more by reading the recently published article: Library Carpentry: software skills training for library professionals.

Why should I get certified?

Library Carpentry is a movement tightly coupled with the Software Carpentry and Data Carpentry organizations. Since all are based on a train-the-trainer model, one of our challenges has been gaining enough experience as instructors. Within Software and Data Carpentry, this issue is handled by requiring instructor certification.

Although certification is not a requirement to be involved in Library Carpentry, we know that certifying more instructors will help us refine workshops and teaching modules, and grow the movement. Also, by getting certified, you can start hosting your own Library Carpentry, Software Carpentry, or Data Carpentry events on your campus. It’s a great way to engage with your campus and library community!

Prerequisites

Applicants will learn how to teach people the skills and perspectives required to work more effectively with data and software. The focus will be on evidence-based education techniques and hands-on practice; as a condition of taking part, applicants must agree to:

  1. Abide by our code of conduct, which can be found at http://software-carpentry.org/conduct/ and http://datacarpentry.org/code-of-conduct/,
  2. Teach at a Library Carpentry, Software Carpentry, or Data Carpentry workshop within 12 months of the course, and
  3. Complete three short tasks after the course in order to complete the certification. The tasks take a total of approximately 8-10 hours: see http://swcarpentry.github.io/instructor-training/checkout/ for details.

Costs

This course will be held in Portland, OR, in conjunction with csv,conf,v3 and is sponsored by csv,conf,v3 and the California Digital Library. To help offset the costs of this event, we will ask attendees to contribute an optional fee (tiered prices will be recommended based on your or your employer’s ability to pay). No one will be turned down based on inability to pay and a small number of travel awards will be made available (more information coming soon).  

Application

Hope to see you there! To apply for this Software Carpentry / Data Carpentry Instructor Training course, please submit the application by Jan 31, 2017:

  https://amy.software-carpentry.org/forms/request_training/ (applications have now closed)

Under Group Name, use “CSV (joint)” if you wish to attend both the training and the conference, or “CSV (training only)” if you only wish to attend the training course.

More information

If you have any questions about this Instructor Training course, please contact admin@software-carpentry.org. And if you have any questions about the Library Carpentry movement, please contact us via email at uc3@ucop.edu or Twitter (@LibCarpentry), or join the Gitter chatroom.

Science Boot Camp West

Last week Stanford Libraries hosted the third annual Science Boot Camp West (SBCW 2015),

“… building on the great Science Boot Camp events held at the University of Colorado, Boulder in 2013 and at the University of Washington, Seattle in 2014. Started in Massachusetts and spreading throughout the USA, science boot camps for librarians are 2.5 day events featuring workshops and educational presentations delivered by scientists with time for discussion and information sharing among all the participants. Most of the attendees are librarians involved in supporting research in the sciences, engineering, medicine or technology although anybody with an interest in science research is welcome.”

As a former researcher and newcomer to the library and research data management (RDM) scenes, I was already familiar with many of the considerable challenges on both sides of the equation (Jake Carlson recently summarized the plight of data librarians). What made SBCW 2015 such an excellent event was that it brought researchers and librarians together to identify immediate opportunities for collaboration. It also showcased examples of Stanford libraries and librarians directly facilitating the research process, from the full-service Stanford Geospatial Center to organizing Software and Data Carpentry workshops (more on this below, and in an earlier post).

Collaboration: Not just a fancy buzzword

The mostly Stanford-based researchers were generous with their time, introducing us to high-level concerns (e.g., why electrons do what they do in condensed matter) as well as more practical matters (e.g., shopping for alternatives to Evernote—yikes—for electronic lab notebooks [ELNs]). They revealed the intimate details of their workflows and data practices (Dr. Audrey Ellerbee admitted that it felt like letting guests into her home to find dirty laundry strewn everywhere, a common anxiety among researchers that in her case was unwarranted), flagged the roadblocks, and presented a constant stream of ideas for building relationships across disciplines and between librarians and researchers.

From the myriad opportunities for library involvement, here are some of the highlights:

  • Facilitate community discussions of best practices, especially for RDM issues such as programming, digital archiving, and data sharing
  • Consult with researchers about available software solutions (e.g., ELNs such as Labguru and LabArchives; note: representatives from both of these companies gave presentations and demonstrations at SBCW 2015), connect them with other users on campus, and provide help with licensing
  • Provide local/basic IT support for students and researchers using commercial products such as ELNs (e.g., maintain FAQ lists to field common questions)
  • Leverage experience with searching databases to improve delivery of informatics content to researchers (e.g., chemical safety data)
  • Provide training in and access to GIS and other data visualization tools

A winning model

The final half-day was dedicated to computer science-y issues. Following a trio of presentations involving computational workflows and accompanying challenges (the most common: members of the same research group writing the same pieces of code over and over with scant documentation and zero version control), Tracy Teal (Executive Director of Data Carpentry) and Amy Hodge (Science Data Librarian at Stanford) introduced a winning model for improving everyone’s research lives.

Software Carpentry and Data Carpentry are extremely affordable two-day workshops that present basic concepts and tools for more effective programming and data handling, respectively. Training materials are openly licensed (CC-BY), and workshops are led by practitioners for practitioners, allowing them to be tailored to specific domains (genomics, geosciences, etc.). At present, the demand for these (international) workshops exceeds the capacity to meet it … except at Stanford. With local, library-based coordination, Amy has brokered (and in some cases taught) five workshops for individual departments or research groups (who covered the costs themselves). This is the very thing I wished for as a graduate student—muddling through databases and programming in R on my own—and I think it should be replicated at every research institution. Better yet, workshops aren’t restricted to the sciences; Data Carpentry is developing training materials for techniques used in the digital humanities, such as text mining.

Learning to live outside of the academic bubble

Another, subtler theme that ran throughout the program was the need/desire to strengthen connections between the academy and industry. Efforts along these lines stand to improve the science underlying matters of public policy (e.g., water management in California) and public health (e.g., new drug development). They also address the mounting pressure placed on researchers to turn knowledge into products. Mark Smith addressed this topic directly during his presentation on ChEM-H: a new Stanford initiative for supporting research across Chemistry, Engineering, and Medicine to understand and advance Human Health. I appreciated that Mark—a medicinal chemist with extensive experience in both sectors—and others emphasized the responsibility to prepare students for jobs in a rapidly shifting landscape with increasing demand for technical skills.

Over the course of SBCW 2015 I met engaged librarians, data managers, researchers, and product managers, including some repeat attendees who raved about the previous two SBCW events; the consensus seemed to be that the third was another smashing success. Helen Josephine (Head of the Engineering Library at Stanford who chaired the organizing committee) is already busy gathering feedback for next year.

SBCW 2015 at Stanford included researchers from:

  • Gladstone Institutes in San Francisco
  • ChEM-H, Stanford’s lab for Chemistry, Engineering & Medicine for Human Health
  • Water in the West Institute at Stanford
  • NSF Engineering Research Center for Re-inventing the Nation’s Urban Water Infrastructure (ReNUWIt)
  • DeepDive
  • Special project topics on Software and Data Carpentry with Physics and Biophysics faculty and Tracy Teal from Data Carpentry

Many thanks to:

Helen Josephine, Suzanne Rose Bennett, and the rest of the Local Organizing Committee at Stanford. Sponsored by the National Network of Libraries of Medicine – Pacific Southwest Region, Greater Western Library Alliance, Stanford University Libraries, SPIE, IEEE, Springer Science+Business Media, Annual Reviews, Elsevier.

From Flickr by Paula Fisher (It was just like this, but indoors, with coffee, and powerpoints.)


The Dash Partners Meeting

This past Thursday, 30 University of California system librarians, developers, and colleagues from nine of the ten campuses assembled at UCLA’s Charles E. Young Library for a discussion of the Dash service. If you weren’t aware, Dash is a University of California project to create a platform that allows researchers to easily describe, deposit and share their research data publicly. The group assembled to talk about the project’s progress and future plans. See the full agenda.

Introductions & Expectations

UC Curation Center (UC3) Director Trisha Cruse kicked off the meeting by asking attendees to introduce themselves and describe what they wanted to learn during the meeting. Responses had the following themes:

  • Better understanding of what the Dash service is, how it works, and what it offers for researchers.
  • Participation ideas: how the campuses can work together as a group, and what that work looks like.
  • How we will prioritize development and work together as a cohesive group to determine the trajectory of the service.
  • An understanding of how the campuses are implementing Dash: how they plan to reach out to faculty, how the service should be talked about on the campus, what outreach might look like, how this service can fit into the overall research infrastructure, and campus rollout/adoption plans.
  • Future plans for the Dash service.

Overview of the Dash Service

The team then provided an overview of the Dash service, demonstrating how to log in, describe, and upload a dataset to Dash. Four campus instances of Dash went live (beta) on Tuesday 23 September, and campuses were provided with instructions on how to help test the new system. Stephen Abrams covered the technical infrastructure of the Dash service, describing the relationship between the Merritt repository, the EZID identifier service, the DataONE network, and each of the campus Dash instances (slides).

Yours truly followed with a description of DataONE Dash, a unique instance of the service that will replace the existing DataUp Tool (slides). This instance will be available to anyone with a Google login, and all data submitted to DataONE Dash will be in the ONEShare repository (a DataONE Member Node) and therefore discoverable in the DataONE system. Emily Lin of UC Merced pointed out that some UC Dash contributors might also want their datasets discoverable in DataONE; an enhancement was suggested that would allow UC Dash users to check a box, indicating they would like their work indexed by DataONE.

Stephen then discussed the cost model that is pending approval for Dash (slides). This model is based on recovering the cost of storage only; there is no service fee for UC users. The model indicates that UC could provide researchers, staff, and graduate students 10 GB of storage in Dash for a total of $290,000/year for the entire system. Sharon Farb of UCLA suggested that we determine what storage solutions are already in place on the various campuses, and coordinate our efforts with those extant solutions. Colleagues from UCSF pointed out that budgets are tight for research labs, and charging for storage may be a significant hurdle to their participation. They need a concrete answer regarding costs now; options may be for each campus to pay up front, or for the UC Office of the President to pay for the system. Individual researcher charges would be the responsibility of each campus; CDL has no plans to take on that responsibility.

I followed Stephen with an overview of data governance in Dash (slides). Dash will offer only CC-BY for UC researchers; DataONE Dash will offer only CC0. The existing DataShare system at UCSF (on which Dash is based) uses a contract (i.e., a data use agreement); however, this option will not be available moving forward since it inhibits data reuse and complicates Dash implementation. The decision to use CC-BY for Dash is based on conversations with UC General Counsel, which is currently evaluating the UC Data Policy. The UC Regents technically own data produced by UC researchers, which complicates how licenses can be used in Dash.

Development Contributions

Marisa Strong then described how campuses can get involved in the development process. She identified the different components of the Dash service, which include three code lines (all in GitHub under an MIT license):

  1. dash-xtf, which houses the search and browse functionality;
  2. dash-ingest, the Rails client for ingest to Merritt; and
  3. dash-harvester, a Python script for harvesting metadata.

Instructions on how to contribute code are available on the Dash wiki, including how to set up a local test environment.

Matthew McKinley from UC Irvine then described their group’s efforts to implement geospatial metadata fields in the Dash code lines: forking the code, implementing the new feature in a local branch, then merging that branch back into the main code line via a pull request.

Plans for Development with Requested Funding

UC3 has submitted a proposal to the Alfred P. Sloan Foundation, requesting funds to continue development of the Dash service. If approved, the grant would fund one year of development focused on the following:

  • Streamlined and improved user interface / user experience
  • Development of embedded widgets for deposit and search functionality in Dash
  • Generalization of Dash protocols so they can be layered on top of any repository
  • Expanded functions, including parsing spreadsheets for cleaning and best practices (similar to previous DataUp functionality)
  • Support for more metadata schemas, e.g., EML, FGDC

This work would happen in parallel with the existing Dash application, allowing continuous service while development is ongoing. Declan Fleming of UCSD asked whether UC3 efforts would be better spent using existing infrastructures and tools, such as Fedora. The UC3 team said that they would like to talk further about possible alternative approaches to the Dash system, and encouraged attendees to share ideas prior to the start of development efforts (if funded).

Dash Enhancements: Identification & Prioritization

The group went through the existing enhancements suggested for Dash, available on GitHub Issues. There were 18 existing enhancements, and the group then suggested an additional 51. Attendees then broke into three groups to prioritize the 69 enhancements for future development. Enhancements that floated to the top included:

  • embargoes (restricted access) for datasets
  • metrics/feedback for data depositors and users (e.g., dataset-level metrics)
  • integration with tools and software such as GitHub, ResearchGate, R, and eScholarship
  • improvements to metadata, including ORCID and FundRef integration

This exercise is only the beginning of the process; the UC3 group plans to tidy up the list and re-share it with the group after the meeting for continued discussion. This process will be documented on GitHub and via the Dash listserv. Stay tuned!

Next Steps & Wrap-up

The meeting ended with a discussion about how the campuses would stay informed, what contributions each campus might make to Dash, and how the cross-campus partnership should take shape moving forward. Communication lines will include the Dash Facebook page, Twitter account (@UC3Dash), and the GitHub page. Trisha facilitated a final around-the-room in which attendees could share final thoughts; common themes included excitement about the Dash service, meeting campus partners, and hearing about development plans moving forward.

The UCLA campus as it appeared in 1929. Enrollment was 6,175. Contributed to Calisphere by UC Berkeley.

The First UC Libraries Code Camp

This post was co-authored by Stephen Abrams.

Military camp on Coronado Island, California. Contributed to Calisphere by the San Diego History Center. Click on the image for more information.

So 30 coders walk into a conference center in Oakland… No, it’s not a bad joke in need of a punch line; it describes the start of the first UC Libraries Code Camp, which took place in downtown Oakland last week. These coders were all from the University of California system (8 out of 10 campuses were represented!) and work with or for the UC libraries. CDL sponsored the event and was well represented among the attendees.

The event consisted of two days of lively collaborative brainstorming on ways to provide better, more sustainable library services to the UC community. Camp participants represented a variety of library roles (curatorial, development, and IT), providing a useful synergistic approach to common problems and solutions. The camp was organized according to the participatory unconference format, in which topics of discussion were arrived at through group consensus. The final schedule included 10 breakout sessions on topics as diverse as the UC Libraries Digital Collection (UCLDC), data visualization, agile methodology, cloud computing, and use of APIs. There was also a plenary session of “dork shorts” in which campus representatives gave summary presentations on selected services and initiatives of common interest.

The conference agenda, with notes from the various breakouts, is available on the event website. For those of us who work in the very large and expansive UC system, get-togethers like this one are crucial for ensuring we are efficiently and effectively supporting the UC community.

Of Note

  • We established a GitHub organization: UCLT. Join by emailing your GitHub username to uc3@ucop.edu.
  • We are establishing a Listserv: uclibrarytech-l@ucop.edu
  • Next code camp to take place in the south, in January or February 2015. (We need a southern campus to volunteer!)

Next Steps

  1. Establish a new Common Knowledge Group for Libraries Information Technologists. We need to draft a charter and establish the initial principles of the group. Status: in progress, being led by Rosalie Lack, CDL.
  2. Help articulate the need for more resources (staff, knowledge, skills, funding) that would allow libraries to better support data and the researchers creating/managing data. Status: the skills table is being filled out; it will help guide discussions about library resources across the UC.
  3. Build up a database of UC libraries technologists; help share expertise and skills. Status: table being filled out. Will be moved to GitHub wiki once completed.
  4. Establish a collaborative space for us to share war stories, questions, concerns, approaches to problems, etc. Status: GitHub Organization created. Those interested should join by emailing us at uc3@ucop.edu with their GitHub username.
  5. Have more Code Camp style events, and rotate locations between campuses and regions (e.g., North versus South). Status: can plan these via GitHub organization + listserv
  6. Keep UC Code Camp conversations going, drilling down into some specific topics via virtual conferencing. Status: can plan these via GitHub organization + listserv. Can create specific “teams” within the GitHub organization to help organize more specific groups within the organization.
  7. Develop teams of IT + librarians to help facilitate outreach and education on campuses.
  8. Have CDL visit campuses more often to run informational sessions.
  9. Have space for sharing outreach and education materials around data management, tools and services available, etc. Status: can use GitHub organization or …?

The DataCite Meeting in Nancy, France

Last week I took a lovely train ride through the cow-dotted French countryside to attend the 2014 DataCite Annual Conference. The event was held at the Institut de l’information Scientifique et Technique (INIST) in Nancy, France, which is about 1.5 hours by train outside of Paris. INIST is the French DataCite member (more on DataCite later). I was invited to the meeting to represent the CDL, which has been an active participant in DataCite since its inception (see my slides). But before I can provide an overview of the DataCite meeting, we need to back up and make sure everyone understands the concept of identifiers, plus a few other bits of key background information.

Background

Identifiers

An identifier is a string of characters that uniquely identifies an object. The object might be a dataset, software, or other research product. Most researchers are familiar with a particular type of identifier, the digital object identifier (DOI). These have been used by the academic publishing industry for uniquely identifying digital versions of journal articles for the last 15 years or so, and their use recently has expanded to other types of digital objects (posters, datasets, code, etc.). Although the DOI is the most widely known type of identifier, there are many, many other identifier schemes. Researchers do not necessarily need to understand the nuances of identifiers, however, since the data repository often chooses the scheme. The most important thing for researchers to understand is that their data needs an identifier to be easy to find, and to facilitate getting credit for that data.
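
To make this concrete, here’s a minimal Python sketch (using the requests library) of what a DOI does in practice: the doi.org resolver redirects to the object’s landing page, and standard DataCite content negotiation returns machine-readable metadata for the same identifier. The DOI below is a placeholder for illustration, not a real dataset.

    # Resolving a DOI with the public doi.org resolver.
    # The DOI below is a placeholder for illustration only.
    import requests

    doi = "10.1234/example-dataset"  # placeholder DOI

    # Following the redirect lands on the object's landing page.
    landing = requests.get(f"https://doi.org/{doi}", allow_redirects=True)
    print(landing.url)

    # Asking for DataCite JSON (via content negotiation) returns the
    # citation metadata for the same identifier instead of the web page.
    meta = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.datacite.datacite+json"},
    )
    print(meta.json().get("titles"))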

The DataCite Organization

For those unfamiliar with DataCite, it’s a nonprofit organization founded in 2009. According to their website, their aims are to:

  • establish easier access to research data on the Internet
  • increase acceptance of research data as legitimate, citable contributions to the scholarly record
  • support data archiving that will permit results to be verified and re-purposed for future study.

In this capacity, DataCite has working groups, participates in large initiatives, and partners with national and international groups. Arguably they are most known for their work in helping organizations issue DOIs. CDL was a founding member of DataCite, and has representation on the advisory board and in the working groups.

EZID: Identifiers made easy

The CDL has a service called EZID that provides DataCite DOIs to researchers and those who support them. The EZID service allows its users to create and manage long-term identifiers (it does more than just DOIs). Note, however, that individuals currently cannot go to the EZID website and obtain an identifier; they must instead work with one of EZID’s many clients, which include academic groups, private industry, government organizations, and publishers. Figshare, Dryad, many UC libraries, and the Fred Hutchinson Cancer Research Center are among those who obtain their DataCite DOIs from EZID.
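
As a rough sketch of what EZID clients actually do, the snippet below mints a test DOI through EZID’s ANVL-over-HTTP API in Python. The credentials and metadata here are placeholders; a real client would use the account and identifier “shoulder” that EZID assigns them.

    # Minting a test DOI via EZID's HTTP API. Metadata travels as ANVL
    # text: one "key: value" pair per line. Credentials are placeholders.
    import requests

    anvl = "\n".join([
        "datacite.title: Example dataset",
        "datacite.creator: Doe, Jane",
        "datacite.publisher: Example University",
        "datacite.publicationyear: 2014",
    ])

    # POSTing to a shoulder mints a new identifier under that shoulder;
    # doi:10.5072/FK2 is EZID's test shoulder for DOIs.
    resp = requests.post(
        "https://ezid.cdlib.org/shoulder/doi:10.5072/FK2",
        data=anvl.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8"},
        auth=("username", "password"),  # placeholder credentials
    )
    print(resp.text)  # e.g. "success: doi:10.5072/FK2..." on success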

Highlights from the meeting

#1: Enabling culture shifts

Andrew Treloar from the Australian National Data Service (ANDS) presented a great way to think about how we can enable the shift to a world where research data is valued, documented, and shared. The new paradigm first needs to be possible: this means supporting infrastructure at the institutional and national levels, giving institutions and researchers the tools to properly manage research data outputs, and providing ways to count data citations and help incentivize data stewardship. Second, the paradigm needs to be encouraged/required. We are making slow but steady headway on this front, with new initiatives for open data from government-funded research and requirements for data management plans. Third, the new paradigm needs to be adopted/embraced. That is, researchers should be asking for DOIs for their data, citing the data they use, and understanding the benefits of managing and sharing their data. This is perhaps the most difficult of the three. These three aspects of a new paradigm can help frame tool development, strategies for large initiatives, and arguments for institutional support.

#2: ZENODO’s approach to meeting research data needs

Lars Holm Nielsen from the European Organization for Nuclear Research (CERN) provided a great overview of the ZENODO repository. If you are familiar with figshare, this repository has similar aspects: anyone can deposit their information, regardless of country, institution, etc. This is a repository created to meet the needs of researchers interested in sharing research products. One of the interesting features of ZENODO is its openness to multiple types of licenses, including those that do not result in fully open data. Although I feel strongly about ensuring data are shared with open, machine-readable waivers/licenses, Nielsen made an interesting point: step one is actually getting the data into a repository. If this is accomplished, then opening the data up with an appropriate license can be discussed at a later date with the researcher. While I’m not sure I agree with this strategy (I envision repositories full of data no one can actually search or use), it’s an interesting take.

Full disclosure: I might have a small crush on CERN due to the recent release of Particle Fever, a documentary on the discovery of the Higgs boson.

#3: The re3data-databib merger

Maxi Kindling from Humboldt University Berlin (representing re3data) and Michael Witt from Purdue University Libraries (representing databib) co-presented on plans for merging their two services, both searchable databases of data repositories. Both re3data and databib have extensive metadata on repositories available for depositing research data, covering a wide range of data types and disciplines. The merger makes sense: the two services emerged within X months of one another, and there is no need to run them separately, with separate support, personnel, and databases. Kindling and Witt described the five principles of agreement for the merger: openness, optimal quality assurance, innovative functionality development, shared leadership (i.e., the two are equal partners), and sustainability. Regarding this last principle, the merged service has been “adopted” by DataCite, which will support it for the long term. It will be called re3data, with an advisory board called databib.

Attendees of the DataCite meeting had interesting lunchtime conversations around future integrations and tools development in conjunction with the new re3data. What about a repository “match-making” service, which could help researchers select the perfect repository for their data? Or integration with tools like the DMPTool? The re3data-databib group is likely coming up with all kinds of great ideas as a result of their new partnership, which will surely benefit the community as a whole.

#4: Lots of other great stuff

There were many other interesting presentations at the meeting: Amye Kenall from BioMed Central (BMC) talking about their GigaScience data journal; Mustapha Mokrane from the ICSU World Data System on data publishing efforts; and Nigel Robinson from Thomson Reuters on the Data Citation Index, to name a few. DataCite plans on making all of the presentations available on the conference website, so be sure to check that out in the next few weeks.

My favorite non-data part? The light show at the central square of Nancy, Place Stanislas. 20 minutes well-spent.


Sharing is caring, but should it count?

The following is a guest post by Shea Swauger, Data Management Librarian at Colorado State University. Shea and I both participated in a meeting for the Colorado Alliance of Research Libraries on 11 July 2014, where he presented survey results described below.


 

Vanilla Ice has a timely message for the data community. From Flickr by wiredforlego.

It shouldn’t be a surprise that many of the people who collect and generate research data are academic faculty members. One of the gauntlets these individuals must face is the tenure and promotion process, an evaluation system that measures and rewards professional excellence and scholarly impact, and can greatly affect the career arc of an aspiring scholar. As a result, tenure and promotion metrics naturally influence the kind and quantity of scholarly products that faculty produce.

Some advocates of data sharing have suggested using the tenure and promotion process as a way to incentivize data sharing. I thought this was a brilliant idea and had designs to advocate its implementation to members of the executive administration at my university, but first I wanted to gather some evidence to support my argument. My colleagues Beth Oehlerts, Daniel Draper, and Don Zimmerman and I sent a survey to all faculty members asking how they felt about incorporating shared research data as an assessment measure in the tenure and promotion process. Only about 10% (202) responded, so while generalizations about the larger population can’t be made, their answers are still interesting.

This is how I expected the survey to work:

Me: “If sharing your research data counted, in some way, towards you achieving tenure and promotion, would you be more likely to do it?”

Faculty: “Yes, of course!”

I’d bring this evidence to the university, sweeping changes would be made, data sharing would proliferate and all would be well.

I was wrong.

Speaking broadly, only about half of the faculty members surveyed said that changing the tenure and promotion process would make them more likely to share their data.

While 76% of the faculty were interested in sharing data in the future, and 84% said that data generation or collection is important to their research, half of faculty said that shared research data has little to no impact on their scholarly community and almost a quarter of faculty said they are unable to judge the impact.

Okay, let’s back up.

The tenure system is supposed to measure, among several things (teaching, service, etc.), someone’s impact on their scholarly community. According to this idea, there should be a correlation between the things that impact your scholarly community and the things that impact your achieving tenure. Now, back to the survey.

I asked faculty to rate the impact of several research products on their scholarly community as well as on their tenure and promotion. 94% of faculty rated ‘peer-reviewed journal articles’ at ‘high impact’ (the top of the scale) for impact upon their scholarly community, and 96% of faculty rated ‘peer-reviewed journal articles’ at ‘high impact’ upon their tenure and promotion. This supports the idea that because peer-reviewed journal articles have a high impact on the scholarly community, they have a high impact on the tenure and promotion process.

Shared research data had a similar impact correlation, though on the opposite end of the impact spectrum. Little impact on the scholarly community means little impact on the tenure and promotion process. Bad news for data sharing. Reductively speaking, I believe this to be the essence of the argument: contributions that are valuable to a research community should be rewarded in the tenure and promotion process; shared research data isn’t valuable to the research community; therefore, data sharing should not be rewarded.

Also, I received several responses from faculty saying that they were obligated not to share their data because of the kind of research they were doing, be it in defense or the private sector, or because they work with personally identifiable or sensitive data. They felt that if the university started rewarding data sharing, they would be unfairly punished because of the nature of their research. Some suggested that a more local implementation of a data sharing policy, perhaps on a departmental basis or through an individual opt-in system, might be fairer to researchers who can’t share their data for one reason or another.

So what does this mean?

Firstly, it means that there’s a big perception gap between the importance of ‘my data to my research’ and the importance of ‘my data to someone else’s research’. Closing this gap could go a long way toward increasing data sharing. Secondly, it means that the tenure and promotion system is a complicated, political mechanism, and trying to leverage it as a way to incentivize data sharing is not easy or straightforward. For now, I’ve decided not to pursue amending the local tenure system; however, I have hope that as interest in data sharing grows, we can find meaningful ways to reward people who choose to share their data.

Note: the work described above is being prepared for publication in 2015.


It takes a data management village

A couple of weeks ago, information scientists, librarians, social scientists, and their compatriots gathered in Toronto for the 2014 IASSIST meeting. IASSIST is, of course, an acronym I always have to look up: International Association for Social Science Information Service & Technology. Despite its forgettable name, this conference is one of the better meetings I’ve attended. The conference leadership manages to put together a great couple of days, chock full of wonderful plenaries and interesting presentations, and even arranged a hockey game for the opening reception.

Yonge Street crowds celebrating the end of the Boer War, Toronto, Canada. This image is available from the City of Toronto Archives, and is in the public domain.

Although there were many interesting talks, and I’m still processing the great discussions I had in Toronto, a couple really rang true for me. I’m now going to shamelessly paraphrase one of these talks (with permission, of course) about building a “village” of data management experts at institutions to best serve researchers’ needs. All credit goes to Alicia Hofelich Mohr and Thomas Lindsay, both from the University of Minnesota. Their presentation was called “It takes a village: Strengthening data management through collaboration with diverse institutional offices.” I’m sure IASSIST will make the slides available online in the near future, but I think this information is too important not to share asap.

Mohr and Lindsay first described the data life cycle, and emphasized the importance of supporting data throughout its life – especially early on, when small things can make a big difference down the road. They asserted that in order to provide support for data management, librarians need to connect with other service providers at their institutions. They then described who these providers are, and where they fit into the broader picture. Below I’ve summarized Mohr and Lindsay’s presentation.

Grants coordinators

Faculty writing grants are constantly interacting with these individuals. They are on the “front lines” of data management planning, in particular, since they can point researchers to other service providers who can help over the course of the project. Bonus: grants offices often have a deep knowledge of agency requirements for data management.

Sponsored projects

The sponsored projects office is another service provider that often has early interactions with researchers during project planning. Researchers are often required to submit grants directly to this office, which ensures compliance and focuses on the requirements needed for proposals to be complete.

College research deans

Although this might be an intimidating group to connect with, they are likely to be the most aware of the current research climate and can help you target your services to the needs of their researchers. They can also help advocate for your services, especially via things like new faculty orientation. Generally, this group is an important ally in facilitating data sharing and reuse.

IT system administrators

This group is often underused by researchers, despite their ability to potentially provide researchers with server space, storage, collaboration solutions, and software licenses. They are also useful allies in ensuring security for sensitive data.

Research support services & statistical consulting offices

Some universities have support for researchers in the designing, collecting, and analyzing of their data. These groups are sometimes housed within specific departments, and therefore might have discipline-specific knowledge about repositories, metadata standards, and cultural norms for that discipline. They are often formally trained as researchers and can therefore better relate to your target audience. In addition, these groups have the opportunity to promote replicable workflows and help researchers integrate best practices for data management into their everyday processes.

Data security offices, copyright/legal offices, & commercialization offices

Groups such as these are often overlooked by librarians looking to build a community of support around data management. However, individuals in these offices can provide invaluable expertise for your network. These groups contribute to and implement university security, data, and governance policies, and are knowledgeable about the legal implications of data sharing, especially for sensitive data. Intellectual property rights, commercialization, and copyright are all complex topics that require expertise not often found among other data stewardship stakeholders. Partnering with these experts can help reduce the potential for future problems and ensure data are shared to the fullest extent possible.

Library & institutional repository

The library is, of course, distinct from an institutional repository; however, the institution’s library often plays a key role in supporting, promoting, and implementing the repository. I often remind researchers that librarians are experts in information, and data is one of many types of information. Researchers often underuse librarians and their specialized skills in metadata, curation, and preservation. The researchers’ need for a data repository, and the strong link between repositories and librarians, will change this in the coming years, however. Mohr and Lindsay ended with this simple statement, which nicely sums up their stellar presentation:

The data support village exists across levels and boundaries of the institution as well as across the lifecycle of data management.


Mountain Observatories in Reno

A few months ago, I blogged about my experiences at the NSF Large Facilities Workshop. “Large Facilities” encompass things like NEON (National Ecological Observatory Network), the IRIS PASSCAL Instrument Center (Incorporated Research Institutions for Seismology Program for Array Seismic Studies of the Continental Lithosphere), and the NRAO (National Radio Astronomy Observatory). I found the event itself to be an eye-opening experience: much to my surprise, there was some resistance to data sharing in this community. I had always assumed that large, government-funded projects had strict data sharing requirements, but this is not the case. I had stimulating arguments with Large Facilities managers who considered their data too big and complex to share, and (more worrisome) who believed their researchers would be very resistant to opening up the data they generated at these large facilities.

Why all this talk about large facilities? Because I’m getting the chance to make my arguments again, to a group with interests overlapping those of the Large Facilities community. I’m very excited to be speaking at Mountain Observatories: A Global Fair and Workshop this July in Reno, Nevada. Here’s a description from the organizers:

The event is focused on observation sites, networks, and systems that provide data on mountain regions as coupled human-natural systems. So the meeting is expected to bring together biophysical as well as socio-economic researchers to discuss how we can create a more comprehensive and quantitative mountain observing network using the sites, initiatives, and systems already established in various regions of the world.

I must admit, I’m ridiculously excited to geek out with this community. I’ll get to hear about the GLORIA Project (GLObal Robotic-telescopes Intelligent Array), something called “Mountain Ethnobotany”, and “Climate Change Adaptation Governance”. See a full list of the proposed sessions here. The conference is geared toward researchers and managers, which means I’ll have the opportunity to hear about data sharing proclivities straight from their mouths. The roster of speakers joining me includes a hydroclimatologist (Mike Dettinger, USGS) and a researcher focused on socio-cultural systems (Courtney Flint, Utah State University), plus representatives from the NSF, a sensor networks company, and others. The conference should be a great one; the abstract submission deadline was just extended, so there’s still time to join me and nerd out about science!

Reno! From Flickr by Ravensmagiclantern


My picks for #AGU13

Nerds come in many flavors at the AGU meeting. From Flickr by Westfield, Ma

Next week, the city of San Francisco will be overrun with nerds. More specifically, more than 22,000 geophysicists, oceanographers, geologists, seismologists, meteorologists, and volcanologists will be descending upon the Bay Area to attend the 2013 American Geophysical Union Fall Meeting.

If you are among the thousands of attendees, you are probably (like me) overwhelmed by the plethora of sessions, speakers, posters, and mixers. In an effort to force myself to look at the schedule well in advance of the actual meeting, I’m sharing my picks for must-sees at the AGU meeting below.

Note! I’m co-chairing “Managing Ecological Data for Effective Use and Reuse” along with Amber Budden of DataONE and Karthik Ram of rOpenSci. Prepare for a great set of talks about DMPTool, rOpenSci, DataONE, and others.

| Session Title | Abbr | Type | Day | Time |
|---|---|---|---|---|
| Translating Science into Action: Innovative Services for the Geo- and Environmental Sciences in the Era of Big Data I | GC11F | Oral | Mon | 8:00 AM |
| Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science I | IN11D | Oral | Mon | 8:00 AM |
| Data Curation, Credibility, Preservation Implementation, and Data Rescue to Enable Multi-source Science II | IN12A | Oral | Mon | 10:20 AM |
| Enabling Better Science Through Improving Science Software Development Culture I | IN22A | Oral | Tue | 10:20 AM |
| Collaborative Frameworks and Experiences in Earth and Space Science Posters | IN23B | Poster | Tue | 1:40 PM |
| Enabling Better Science Through Improving Science Software Development Culture II Posters | IN23C | Poster | Tue | 1:40 PM |
| Managing Ecological Data for Effective Use and Reuse I | ED43E | Oral | Thu | 1:40 PM |
| Open-Source Programming, Scripting, and Tools for the Hydrological Sciences II | H51R | Oral | Fri | 8:00 AM |
| Data Stewardship in Theory and in Practice I | IN51D | Oral | Fri | 8:00 AM |
| Managing Ecological Data for Effective Use and Reuse II Posters | ED53B | Poster | Fri | 1:40 PM |

Download the full program as a PDF

Previous Data Pub blog post about AGU: Scientific Data at AGU 2011
