Building an RDM Guide for Researchers – An (Overdue) Update

It has been a little while since I last wrote about the work we’re doing to develop a research data management (RDM) guide for researchers. Since then, we’ve thought a lot about the goals of this project and settled on a concrete plan for building out our materials. Because we will soon be proactively seeking feedback on the different elements of this project, I wanted to provide an update on what we’re doing and why.

A section of the Rosetta Stone. Though it won’t help decipher Egyptian hieroglyphs, we hope our RDM guide will help researchers and data service providers speak the same language. Image from the British Museum.

Communication Barriers and Research Data Management

Several weeks ago I wrote about addressing Research Data Management (RDM) as a “wicked problem”, a problem that is difficult to solve because different stakeholders define and address it in different ways. My own experience as a researcher and library postdoc bears this out. Researchers and librarians often think and talk about data in very different ways! But as researchers face changing expectations from funding agencies, academic publishers, their own peers, and other RDM stakeholders about how they should manage and share their data, overcoming such communication barriers becomes increasingly important.

From visualizations like the ubiquitous research data lifecycle to instruments like the Data Curation Profiles, there is a wide variety of excellent tools that can be used to facilitate communication between different RDM stakeholders. Likewise, there are discipline-specific best practice guidelines and tools like the Research Infrastructure Self Evaluation Framework (RISE) that allow researchers and organizations to assess and advance their RDM activities. What’s missing is a tool that combines these two elements: one that enables researchers to easily self-assess where they are with regard to RDM and allows data service providers to offer easily customizable guidance about how to advance data-related practices.

Enter our RDM guide for researchers.

Our RDM Guide for Researchers

What I want to emphasize most about our RDM guide is that it is, first and foremost, designed to be a communication tool. The research and library communities both have a tremendous amount of knowledge and expertise related to data management. Our guide is not intended to supplant tools developed by either, but to assist in overcoming communication barriers in a way that removes confusion, grows confidence, and helps people in both communities find direction.

While the shape of the RDM guide has not changed significantly since my last post, we have refined its basic structure and begun filling in the details.

The latest iteration of our guide consists of two main elements:

  1. An RDM rubric that allows researchers to self-assess their data-related practices using language and terminology with which they are familiar.
  2. A series of one page guides that provide information about how to advance data-related practices as necessary, appropriate, or desired.

The two components of our RDM Guide for Researchers. The rubric is intended to help researchers orient themselves in the ever-changing landscape of RDM, while the guides are intended to help them move forward.

The rubric is similar to the “maturity model” described in my earlier blog posts. In this iteration, it consists of a grid containing three columns and a number of rows. The leftmost column contains descriptions of different phases of the research process. At present, the rubric contains four such phases: Planning, Collection, Analysis, and Sharing. These research data lifecycle-esque terms provide a framing familiar to data service providers in the library and elsewhere.

The next column includes phrases that describe specific research activities using language and terminology familiar to researchers. The language in this column is, in part, derived from the unofficial survey we conducted to understand how researchers describe the research process. By placing these activities beside those drawn from the research data lifecycle, we hope to ground our model in terms that both researchers and RDM service providers can relate to.

The rightmost column then contains a series of declarative statements which a researcher can use to identify their individual practices in terms of the degree to which they are defined, communicated, and forward thinking.

Each element of the rubric is designed to be customizable. We understand that RDM service providers at different institutions may wish to emphasize different services tied to different parts of the data lifecycle and that researchers in different disciplines may have different ways of describing their data-related activities. For example, while we are working on refining the language of the declarative statements, I have left them out of the diagram above because they are likely the element of the rubric that will remain most open to customization.
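To make this concrete, here is a minimal sketch of how the rubric’s three columns might be represented as data. The four phase names come from the rubric itself; the activities and declarative statements shown are placeholder wording of my own, since the real language (especially for the statements) is still being refined.

    # A hypothetical encoding of the rubric. Phases map to researcher-language
    # activities and to declarative self-assessment statements; everything
    # except the four phase names is placeholder text.
    rubric = {
        "Planning": {
            "activities": ["writing a grant proposal", "designing a study"],
            "statements": [
                "I handle decisions about my data as they come up.",
                "My plans for handling data are written down and shared with my group.",
            ],
        },
        "Collection": {
            "activities": ["running experiments", "gathering observations"],
            "statements": ["(placeholder)"],
        },
        "Analysis": {
            "activities": ["processing and analyzing results"],
            "statements": ["(placeholder)"],
        },
        "Sharing": {
            "activities": ["publishing papers", "depositing data"],
            "statements": ["(placeholder)"],
        },
    }

    # A researcher self-assesses by picking the statement that best matches
    # their practice; each row will also link to a one page guide.
    for phase, row in rubric.items():
        print(phase, "->", ", ".join(row["activities"]))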

Each row within the rubric will be complemented by a one page guide that provides researchers with concrete information about data-related best practices. If the purpose of the rubric is to allow researchers to orient themselves in the RDM landscape, the purpose of these guides is to help them move forward.

Generating Outputs

Now that we’ve refined the basic structure of our model, it’s time to start creating some outputs. Throughout the remainder of the summer and into the autumn, members of the UC3 team will be meeting regularly to review the content of the first set of one page guides. This process will inform our continual refinement of the RDM rubric which will, in turn, shape the writing of a formal paper.

Moving forward, I hope to workshop this project with as many interested parties as I can, both to receive feedback on what we’ve done so far and to potentially crowdsource some of the content. Over the next few weeks I’ll be soliciting feedback on various aspects of the RDM rubric. If you’d like to provide feedback, please either click through the links below (more to be added in the coming weeks) or contact me directly.

 

Provide feedback on our guide!

Planning for Data

More coming soon!

Disambiguating Dash and Merritt

What’s Dash? What’s Merritt? What’s the difference? After numerous questions about where things should go and what the differences are between our UC3 services, we got the hint that we were not communicating clearly.

Clearing things up

A group of us sat down, talked through the different use cases and the wording that was causing such confusion, and came up with what we hope is a clear disambiguation of Dash versus Merritt.


Different intentions, different target users

While Dash and Merritt interact with each other at a technical level, they have different intentions, and users should not treat the two services as directly comparable. Dash is optimized for researchers: its user interface, user experience, and metadata schema are all designed for use by individual researchers. Merritt is designed for use by institutional librarians, archivists, and curators.

Because of the different intended purposes, features, and users, UC3 does not recommend that Merritt be advertised to researchers on Research Data Management (RDM) sites or researcher-facing Library Guides.

Below are quick descriptions of each service that should clarify intentions and target users:

  • Dash is an open data publication platform for researchers. Self-service deposit of research data through Dash fulfills publisher, funder, and data management plan requirements regarding data sharing and preservation. When researchers publish their datasets through Dash, their datasets are issued a DOI to optimize citability, are publicly available for download and re-use under a CC BY 4.0 or CC0 license, and are preserved in Merritt, California Digital Library’s preservation repository. Dash is available to researchers at participating UC campuses, as well as researchers in Environmental and Earth Sciences through the DataONE network.
  • Merritt is a preservation repository for mediated deposits by UC organizations. We work with staff at UC libraries, archives, and departments to preserve digital assets and collections. Merritt offers bit-level preservation and replication with either public or private access. Merritt is also the preservation repository that preserves Dash-deposited data.

The cost of service vs. the cost of storage

California Digital Library does not charge individual users for the Dash or Merritt services. However, we do recharge your institution for the amount of storage used in Merritt (remember, Dash preserves data in Merritt) on an annual basis. On most campuses, the Library fully subsidizes Dash storage costs, so there is no extra financial obligation for individual researchers depositing data into Dash.

Follow-up

If you have any questions about edge cases or would like to know any more details about the architecture of the Dash platform or Merritt repository, please get in touch at uc3@ucop.edu.

And while you’re here: check out Dash’s new features for uploading large data sets, and uploading directly from the cloud.

Talking About Data: Lessons from Science Communication

As a person who worked for years in psychology and neuroscience laboratories before coming to work in academic libraries, I have particularly strong feelings about ambiguous definitions. One of my favorite anecdotes about my first year of graduate school involves watching two researchers argue about the definition of “attention” for several hours, multiple times a week, for an entire semester. One of the researchers was a clinical psychologist, the other a cognitive psychologist. Though they both devised research projects and wrote papers on the topic of attention, their theories and methods could not have been more different. The communication gap between them was so wide that they were never able to move forward productively. The punchline is that, after sitting through hours of their increasingly abstract and contentious arguments, I would go on to study attention using yet another set of theories and methods as a cognitive neuroscientist. Funny story aside, this anecdote illustrates the degree to which people with different perspectives and levels of expertise can define the same problem in strikingly different ways.

A facsimile of a visual search array used by cognitive psychologists to study attention. Spot the horizontal red rectangle.

In the decade that has elapsed since those arguments, I have undergone my own change in perspective: from a person who primarily collects and analyzes their own research data to a person who primarily thinks about ways to help other researchers manage and share their data. While my day-to-day activities look rather different, there is one aspect of my work as a library postdoc that is similar to my work as a neuroscientist: many of my colleagues ostensibly working on the same things often have strikingly different definitions, methods, and areas of expertise. Fortunately, I have been able to draw on a body of work that addresses this very thing: science communication.

Wicked Problems

A “wicked problem” is a problem that is extremely difficult to solve because different stakeholders define and address it in different ways. In my anecdote about argumentative professors, understanding attention can be considered a wicked problem. Without getting too much into the weeds, the clinical psychologist understood attention mostly in the context of diagnoses like Attention Deficit Disorder, while the cognitive psychologist understood it in the context of scanning visual environments for particular elements or features. As a cognitive neuroscientist, I came to understand it mostly in terms of its effects within neural networks as measured by brain imaging methods like fMRI.

Research data management (RDM) has been described as a wicked problem. A data service provider in an academic library may define RDM as “the documentation, curation, and preservation of research data”, while a researcher may define RDM as either simply part of their daily work or, in the case of something like a data management plan written for a grant proposal, as an extra burden placed upon such work. Other RDM stakeholders, including those affiliated with IT, research support, and university administration, may define it in yet other ways.

Science communication is chock full of wicked problems, including concepts like climate change and stem cell research. Actually, given the significant amount of scholarship devoted to defining terms like “scientific literacy” and the multitude of things that the term describes, science communication may itself be a wicked problem.

What is Science Communication?

Like attention and RDM, it is difficult to give a comprehensive definition of science communication. Documentaries like “Cosmos” are probably the most visible examples, but science communication actually comes in a wide variety of forms including science journalism, initiatives aimed at science outreach and advocacy, and science art. What these activities have in common is that they all generally aim to help people make informed decisions in a world dominated by science and technology. In parallel, there is also a burgeoning body of scholarship devoted to the science of science communication which, among other things, examines how effective different communication strategies are for changing people’s perceptions and behaviors around scientific topics.

For decades, the prevailing theory in science communication was the “Deficit Model”, which posits that scientific illiteracy is due to a simple lack of information. In the deficit model, skepticism about topics such as climate change is assumed to be due to a lack of comprehension of the science behind them. Thus, at least according to the deficit model, the “solution” to the problem of science communication is as straightforward as providing people with all the facts. In this conception, the audience is generally assumed to be homogeneous and communication is assumed to be one way (from scientists to the general public).

Though the deficit model persists, study after study (after meta-analysis) has shown that merely providing people with facts about a scientific topic does not cause them to change their perceptions or behaviors related to that topic. Instead, it turns out that presenting facts that conflict with a person’s worldview can actually cause them to double down on that worldview. Also, audiences are not homogeneous. Putting aside differences in political and social worldviews, people have very different levels of scientific knowledge and relate to that knowledge in very different ways. For this reason, more modern models of science communication focus not on one-way transmissions of information but on fostering active engagement, re-framing debates, and meeting people where they are. For example, one of the more effective strategies for getting people to pay attention to climate change is not to present them with a litany of (dramatic and terrifying) facts, but to link it to their everyday emotions and concerns.

Find the same rectangle as before. It takes a little longer now that the other objects have a wider variety of features, right? Read more about visual search tasks here.

Communicating About Data

If we adapt John Durant’s nicely succinct definition of science literacy, “What the general public ought to know about science,” to an RDM context, the result is something like “What researchers ought to know about handling data.” Thus, data services in academic libraries can be said to be a form of science communication. As with “traditional” science communicators, data service providers interact with audiences possessing perspectives and levels of knowledge different from their own. The major difference, of course, is that the audience for data service providers is specifically the research community.

There is converging evidence that many of the current approaches to fostering better RDM have led to mixed results. Recent studies of NSF data management plans have revealed a significant amount of variability in the degree to which researchers address data management-related concepts like metadata, data sharing, and long-term preservation. The audience of data service providers is, like those of more “traditional” science communicators, quite heterogeneous, so perhaps adopting methods from the repertoire of science communication could help foster more active engagement and the adoption of better practices. Many libraries and data service providers have already adopted some of these methods, perhaps without realizing their application in other domains. But I also don’t mean to criticize any existing efforts to engage researchers on the topic of RDM. If I’ve learned one thing from doing different forms of science communication over the years, it is that outreach is difficult and change is slow.

In a series of upcoming blog posts, I’ll write about some of my current projects that incorporate what I’ve written here. First up: I’ll provide an update of the RDM Maturity Model project that I previously described here and here. Coming soon!


Cirrus-ly Convenient Uploading

That was a cloud pun! Following our release two weeks ago, the Dash team is thrilled to present our newest functionality: you may now upload files directly from Box, Dropbox, and Google Drive!

Let’s get you publishing (and citing and getting credit for your data):

  • Using the “upload from server” option, you may enter up to 1,000 URLs (and up to 100 GB per submission) by pasting in the sharing link from Box, Dropbox, or Google Drive. (A sketch of how such sharing links can resolve to direct downloads appears after these steps.)


  • Validate the files, and your URLs will appear along with the filename and size.


  • Submit & download.
    • Files uploaded from Box, Dropbox, and Google Drive will download exactly as they were uploaded to the cloud.
    • Google Docs, Sheets, and Slides will download as Microsoft Word documents, Excel spreadsheets, and PowerPoint presentations.
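For the curious, here is a rough sketch of how sharing links can be resolved to direct downloads. The URL patterns below are the commonly documented ones for Dropbox and Google Drive; they are illustrative assumptions, not a description of how Dash actually processes the links.

    def direct_download_url(sharing_link):
        """Best-effort conversion of a cloud sharing link to a direct-download URL."""
        # Dropbox: changing dl=0 to dl=1 serves the file rather than a preview page.
        if "dropbox.com" in sharing_link:
            return sharing_link.replace("dl=0", "dl=1")
        # Google Drive: pull the file ID out of .../file/d/<id>/... and use the
        # export-download endpoint.
        if "drive.google.com" in sharing_link and "/file/d/" in sharing_link:
            file_id = sharing_link.split("/file/d/")[1].split("/")[0]
            return "https://drive.google.com/uc?export=download&id=" + file_id
        # Box and anything else: assume the link already resolves to the file.
        return sharing_link

    print(direct_download_url("https://www.dropbox.com/s/abc123/data.csv?dl=0"))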

We will be updating our help and FAQ pages this week to reflect our new features, but in the meantime please let us know if you have any questions or feedback. 

Manifesting Large and Bulk File Data Publications – Now a Reality!

The Dash team is excited to announce our June feature release: large and bulk file upload. Taking into consideration the need for datasets that are large in both size and number of files, as well as the practicality of server timeouts, we have developed a new feature that allows up to 1,000 files or 100 GB* of data to be published per DOI.

To accomplish this we are using a “manifest” workflow, which means that instead of uploading data directly from your computer, you may enter URLs for where your data are located (on a server or public site). Once uploaded, Dash will display the data in the same manner as a direct upload. To reflect this new option, we have updated the Upload page so you can choose between uploading locally (from your computer) or via a server. Information about file size limits (2 GB/file and 10 GB total for local uploads, or 1,000 files of any size up to 100 GB* via a server) is listed on this landing page.

Step 1: Enter URLs where data are located


Step 2: Validated files will appear in the Uploaded Files table, along with any other data files associated with current or former versions


The benefit of using this workflow is that you do not have to watch your screen for many hours as the data upload; instead, your data will be uploaded in the back-end, without the involvement of your computer. This upload mechanism is also not limited to large files: it can be an easy way to transfer your data directly from a server regardless of size.
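To illustrate what validation in a manifest workflow can look like (a sketch under the published limits, not Dash’s actual validator), a client can pre-check a list of URLs with HEAD requests before submitting:

    import requests

    MAX_FILES = 1000
    MAX_TOTAL_BYTES = 100 * 1024**3  # 100 GB; actual limits vary per tenant

    def precheck_manifest(urls):
        """Verify each URL is reachable and the manifest stays within the limits."""
        if len(urls) > MAX_FILES:
            raise ValueError("manifest exceeds %d files" % MAX_FILES)
        total = 0
        for url in urls:
            # HEAD fetches headers (including file size) without downloading the body.
            response = requests.head(url, allow_redirects=True, timeout=30)
            response.raise_for_status()
            size = int(response.headers.get("Content-Length", 0))
            total += size
            print(url, "->", size, "bytes")
        if total > MAX_TOTAL_BYTES:
            raise ValueError("manifest exceeds the total size limit")
        return total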

One complication with this process is that you cannot upload local data and server-hosted data in the same version. Though this seems tricky, remember that Dash supports versioning: after successful publication of the server-uploaded data, you can go back in and add local files (or vice versa).

While at the moment we do not allow upload from Google Drive, Box, or Dropbox, we are investigating the sharing links necessary for integrating uploads from the cloud. If you have any feedback on how to make this feature, or any other features, more accessible or valuable for researchers, please do get in touch. Happy Data Publishing!

Note: To utilize this feature and publish your datasets, your data will need to be hosted on a server. Many institutions, departments, and labs have servers used to host data and information (there are good examples across the UC campuses, MIT, the University of Iowa, etc.). If you have any questions about servers on your campus or external resources, please consult your campus librarians.

*Size limits vary per institutional tenant; please check in with your UC data librarians if you have any questions.

Make Data Count: Building a System to Support Recognition of Data as a First Class Research Output

The Alfred P. Sloan Foundation has made a 2-year, $747K award to the California Digital Library, DataCite and DataONE to support collection of usage and citation metrics for data objects. Building on pilot work, this award will result in the launch of a new service that will collate and expose data-level metrics.

The impact of research has traditionally been measured by citations to journal publications: journal articles are the currency of scholarly research.  However, scholarly research is made up of a much larger and richer set of outputs beyond traditional publications, including research data. In order to track and report the reach of research data, methods for collecting metrics on complex research data are needed.  In this way, data can receive the same credit and recognition that is assigned to journal articles.

“Recognition of data as valuable output from the research process is increasing and this project will greatly enhance awareness around the value of data and enable researchers to gain credit for the creation and publication of data” – Ed Pentz, Crossref.

This project will work with the community to create a clear set of guidelines on how to define data usage. In addition, the project will develop a central hub for the collection of data-level metrics. These metrics will include data views, downloads, citations, saves, and social media mentions, and will be exposed through customized user interfaces deployed at partner organizations. Working in an open source environment, and including extensive user experience testing and community engagement, the products of this project will be available to data repositories, libraries, and other organizations to deploy within their own environments, serving their communities of data authors.
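To give a sense of what a collated data-level metrics record might contain, here is a hypothetical sketch. The field names are mine; defining the actual guidelines and schema is precisely the work this project will do.

    from dataclasses import dataclass

    @dataclass
    class DataLevelMetrics:
        """One hypothetical metrics record for a published dataset."""
        dataset_doi: str
        views: int = 0
        downloads: int = 0
        citations: int = 0
        saves: int = 0
        social_media_mentions: int = 0

    # Example record for a made-up DOI under the DataCite test prefix.
    record = DataLevelMetrics("10.5072/example", views=1200, downloads=340, citations=3)
    print(record)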

Are you working in the data metrics space? Let’s collaborate.

Find out more and follow us at: www.makedatacount.org, @makedatacount

About the Partners

California Digital Library was founded by the University of California in 1997 to take advantage of emerging technologies that were transforming the way digital information was being published and accessed. University of California Curation Center (UC3), one of four main programs within the CDL, helps researchers and the UC libraries manage, preserve, and provide access to their important digital assets as well as developing tools and services that serve the community throughout the research and data life cycles.

DataCite is a leading global non-profit organization that provides persistent identifiers (DOIs) for research data. Our goal is to help the research community locate, identify, and cite research data with confidence. Through collaboration, DataCite supports researchers by helping them to find, identify, and cite research data; data centres by providing persistent identifiers, workflows and standards; and journal publishers by enabling research articles to be linked to the underlying data/objects.

DataONE (Data Observation Network for Earth) is an NSF DataNet project which is developing a distributed framework and sustainable cyberinfrastructure that meets the needs of science and society for open, persistent, robust, and secure access to well-described and easily discovered Earth observational data.

Announcing New Dash Features- April 2017

The Dash team is pleased to announce the release of our newest features. Taking in requests from users as well as standards in the field, we have now adapted the platform with the following releases: Private for Peer Review (Timed-Release of Data), ORCID integration, email capture for corresponding authors, user-friendly downloads, and a variety of search and view enhancements.

Private for Peer Review (Timed-Release of Data)

As mentioned in a previous post, this was formerly referred to as embargoing data, but we are releasing this feature in the context of keeping data private for the length of peer review. We have now implemented a feature that allows researchers to keep data private, for the purposes of peer review, for up to six months. If a researcher decides to use this option, they will be given a private Reviewer URL that can be used by an external party to download the data.

This URL will redirect to the landing page with available data for download as soon as the data are public. If external parties have any questions or would like to request a download they will also now have the ability to reach the corresponding author.

Corresponding Author Email Capture & ORCID Integration

Corresponding authors (and contributing authors) will now have the ability to enter their email address and ORCID iD, both of which will appear on the landing page beneath the author name. Just as article publications have corresponding authors, we believe data publications should have a corresponding author contact who can be reached with questions about the dataset.

User Friendly Downloads & Interface Improvements

What one uploads is what another may download: when choosing to download the data files, users receive exactly the files uploaded by the author.

Some other fixes and features include:

  • improved wording for our search filters and browse option
  • a checkbox at the file upload stage to ensure researchers are not uploading sensitive or identifying information 
  • explanatory information within the metadata submission for usage notes and related work
  • a preview of how large the dataset is on the download button

What’s up next?

  • Next Feature: large file upload and bulk file upload
  • Future Feature: a curation layer that will allow for administration capabilities

For more information or if you have any questions please check for updates on the @uc3cdl twitter feed, or get in touch at uc3@ucop.edu.

 

Embargoing the Term “Embargoes” Indefinitely

I’m two months into a position that lends part of its time to overseeing Dash, a Data Publication platform for the University of California. On my first day I was told that a big priority for Dash was to build out an embargo feature. Coming to the California Digital Library (CDL) from PLOS, an OA publisher with an OA data policy, I couldn’t understand why I would be leading endeavors to embargo data rather than open it up, so I met this embargo directive with apprehension.

I began to acquaint myself with the campuses, and a couple of weeks ago, while at UCSF, I presented the prototype for what this “embargo” feature would look like and questioned why researchers wanted to close data on an open data platform. This is where it gets fun.

“Our researchers really just want a feature to keep their data private while their associated paper is under peer review. We see this frequently when people submit to PLOS.”

Yes, I had contributed to my own conflict.

While I laughed about how I was previously the person at PLOS convincing UC researchers to make their data public, I recognized that this would be an easy issue to clarify. And here we are.

Embargoes carry a negative connotation in the open community, and I ask that moving forward we not use this term to describe keeping data private until an associated manuscript has been accepted. Let us call this “Private for Peer Review” or “Timed Release”, with a “Peer Review URL” that is available for sharing data during the peer review process, as Dryad does.

  • Embargoes imply that data are being held private for reasons other than the peer review process.
  • Embargoes are not appropriate if you have a funder, publisher, or other mandate to open up your data.
  • Embargoes are not appropriate for sensitive data; such data should not be held in a public repository at all (embargoed or not) unless access is mediated by a data access committee and the repository has proper security.
  • Embargoes are not appropriate for open Data Publications.

To embargo your data for longer than the peer review process (or for other reasons) is to shield your data from being used, built off of, or validated. This is contrary to “Open” as a strategy to further scientific findings and scholarly communications.

Dash is implementing features that will allow researchers to choose, in line with what we believe is reasonable for peer review and revisions, a publication date up to six months after submission. If researchers choose to use this feature, they will be given a Peer Review URL that can be shared to download the data until the data are public. It is important to note though that while the data may be private during this time, the DOI for the data and associated metadata will be public and should be used for citation. These features will be for the use of Peer Review; we do not believe that data should be held private for a period of time on an open data publication platform for other reasons.
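A minimal sketch of that logic (names and structure are illustrative, not Dash’s code): the DOI and metadata are public from the start, while file downloads are gated behind the Peer Review URL until the chosen publication date.

    import secrets
    from datetime import date, timedelta

    MAX_PRIVATE_PERIOD = timedelta(days=183)  # roughly six months

    class DataPublication:
        def __init__(self, doi, release_date):
            if release_date > date.today() + MAX_PRIVATE_PERIOD:
                raise ValueError("private period is capped at about six months")
            self.doi = doi                                 # public and citable immediately
            self.release_date = release_date
            self.review_token = secrets.token_urlsafe(16)  # basis of the Peer Review URL

        def metadata_is_public(self):
            return True  # the DOI and metadata are always public

        def can_download(self, token=None):
            # Files are served once the release date passes, or to anyone
            # holding the Peer Review URL during review.
            return date.today() >= self.release_date or token == self.review_token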

Opening up data, publishing data, and giving credit to data are all important in emphasizing that data are a credible and necessary piece of scholarly work. Dash and other repositories will allow for data to be private through peer review (with the intent that data become public and accessible in the near future). However, my hope is that as the data revolution evolves, incentives to open up data sooner will become apparent. The first step is to check our vocab and limit the use of the term “embargo” to cases where data are being held private without an open data intention.


California Digital Library Supports the Initiative for Open Citations

California Digital Library (CDL) is proud to announce our formal endorsement for the Initiative for Open Citations (I4OC). CDL has long supported free and reusable scholarly work, as well as organizations and initiatives supporting citations in publication. With a growing database of literature and research data citations, there is a need for an open global network of citation data.

The Initiative for Open Citations will work with Crossref and their Cited-by service to open up all references indexed in Crossref. Many publishers and stakeholders have opted in to participate in opening up their citation data, and we hope that each year this list will grow to encompass all fields of publication. Furthermore, we are looking forward to seeing how research data citations will be a part of this discussion.

CDL is a firm believer in and advocate for data citations and persistent identifiers in scholarly work. However, if research publications are cited but those citations are not freely accessible and searchable, our goal is not accomplished. We are proud to support the Initiative for Open Citations and invite you to get in touch with any questions you may have about the need for open citations or ways to be an advocate for this necessary change.

Below are some frequently asked questions about the need for open citations, ways to get involved, and misconceptions regarding citations. The answers are provided by the board and founders of I4OC:

I am a scholarly publisher not enrolled in the Cited-by service. How do I enable it?

If not already a participant in Cited-by, a Crossref member can register for this service free of charge. Having done so, there is nothing further the publisher needs to do to ‘open’ its reference data other than to give its consent to Crossref, since participation in Cited-by alone does not automatically make these references available via Crossref’s standard APIs.

I am a scholarly publisher already depositing references to Crossref. How do I publicly release them?

We encourage all publishers to make their reference metadata publicly available. If you are already submitting article metadata to Crossref as a participant in their Cited-by service, opening your references can be achieved in a matter of days. Publishers can easily and freely achieve this:

  • either by contacting Crossref support directly by e-mail, asking them to turn on reference distribution for all of the relevant DOI prefixes;
  • or by setting the <reference_distribution_opt> metadata element to “any” for each DOI deposit for which they want to make references openly available.

How do I access open citation data?

Once made open, the references for individual scholarly publications may be accessed immediately through the Crossref REST API.

Open citations are also available from the OpenCitations Corpus, a database created to house scholarly citations that progressively and systematically harvests citation data from Crossref and other sources. An advantage of accessing citation data from the OpenCitations Corpus is that they are available in a standards-compliant, machine-readable RDF format, and include information about both the incoming and outgoing citations of bibliographic resources (published articles and books).
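For example, a few lines of Python against the Crossref REST API will return the reference list for any DOI whose publisher has opened its references (the DOI below is a placeholder under Crossref’s test prefix):

    import requests

    def get_open_references(doi):
        """Fetch the reference list for one DOI from the Crossref works route."""
        response = requests.get("https://api.crossref.org/works/" + doi, timeout=30)
        response.raise_for_status()
        # The "reference" field is present only when the publisher has opened
        # its references for distribution.
        return response.json()["message"].get("reference", [])

    for ref in get_open_references("10.5555/12345678")[:5]:
        print(ref.get("DOI") or ref.get("unstructured", "(no structured data)"))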

Does this initiative cover future citations only or also historical data?

Both. All DOIs under a prefix set for open reference distribution will have open references through Crossref, for past, present, and future publications.

Past and present publications that lack DOIs are not dealt with by Crossref, and gaining access to their citation data will require separate initiatives by their publishers or others to extract and openly publish those references.

Under what licensing terms is citation data being made available?

Crossref exposes article and reference metadata without a license, since it regards these as raw facts that cannot be licensed.

The structured citation metadata within the OpenCitations Corpus are published under a Creative Commons CC0 public domain dedication, to make it explicitly clear that these data are open.

My journal is open access. Aren’t its articles’ citations automatically available?

No. Although Open Access articles may be open and freely available to read on the publisher’s website, their references are not separate, and are not necessarily structured or accessible programmatically. Additionally, although their reference metadata may be submitted to Crossref, Crossref historically set the default for references to “closed,” with a manual opt-in being required for public references. Many publisher members have not been aware that they could simply instruct Crossref to make references open, and, as a neutral party, Crossref has not promoted the public reference option. All publishers therefore have to opt in to open distribution of references via Crossref.

Is there a programmatic way to check whether a publisher’s or journal’s citation data is free to reuse?

For Crossref metadata, their REST API reveals how many and which publishers have opened references. Any system or tool (or a JSON viewer) can be pointed to this query: http://api.crossref.org/members?filter=has-public-references:true&rows=1000 to show the count and the list of publishers with “public-references”: true.

To query a specific publisher’s status, use, for example:

http://api.crossref.org/members?filter=has-public-references:true&rows=1000&query=springer and then find the public-references tag. In some cases it will be set to false.
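The same checks can be scripted. This short example runs the two queries above and prints the results, assuming only Crossref’s usual message/items JSON envelope:

    import requests

    BASE = "http://api.crossref.org/members"

    # Count all members that have opened their references.
    response = requests.get(
        BASE, params={"filter": "has-public-references:true", "rows": 0}, timeout=30
    )
    print("members with public references:", response.json()["message"]["total-results"])

    # Look up a specific publisher by name; every member returned by this
    # filtered query has public references enabled.
    response = requests.get(
        BASE,
        params={"filter": "has-public-references:true", "rows": 1000, "query": "springer"},
        timeout=30,
    )
    for member in response.json()["message"]["items"]:
        print(member["primary-name"])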

Contact

You can contact the founding group by e-mail at info@i4oc.org.

Describing the Research Process

We at UC3 are constantly developing new tools and resources to help researchers manage their data. However, while working on projects like our RDM guide for researchers, we’ve noticed that researchers, librarians, and people working in the broader digital curation space often talk about the research process in very different ways.

To help bridge this gap, we are conducting an informal survey to understand the terms researchers use when talking about the various stages of a research project.

If you are a researcher and can spare about 5 minutes, we would greatly appreciate it if you would click the link below to participate in our survey.

http://survey.az1.qualtrics.com/jfe/form/SV_a97IJAEMwR7ifRP

Thank you.