
Make Data Rain

Last October, UC3, PLOS, and DataONE launched Making Data Count, a collaboration to develop data-level metrics (DLMs). This 12-month National Science Foundation-funded project will pilot a suite of metrics to track and measure data use that can be shared with funders, tenure & promotion committees, and other stakeholders.


[image from Freepik]

To understand how DLMs might work best for researchers, we conducted an online survey and held a number of focus groups, which culminated on a very (very) rainy night last December in a discussion at the PLOS offices with researchers in town for the 2014 American Geophysical Union Fall Meeting.

Six eminent researchers participated:

Much of the conversation concerned how to motivate researchers to share data. Sources of external pressure that came up included publishers, funders, and peers. Publishers can require (as PLOS does) that, at a minimum, the data underlying every figure be available. Funders might refuse to ‘count’ publications based on unavailable data, and refuse to renew funding for projects that don’t release data promptly. Finally, other researchers – in some communities, at least – are already disinclined to work with colleagues who won’t share data.

However, Making Data Count is particularly concerned with the inverse: not punishing researchers who don’t share, but rewarding those who do. For a researcher, metrics demonstrating data use serve not only to prove to others that their data is valuable, but also to affirm for themselves that taking the time to share their data is worthwhile. The researchers present regarded altmetrics with suspicion and overwhelmingly affirmed that citations are the preferred currency of scholarly prestige.

Many of the technical difficulties with data citation (e.g., citing dynamic data or a particular subset) came up in the course of the conversation. One interesting point was raised by many: when citing a data subset, the needs of reproducibility and credit diverge. For reproducibility, you need to know exactly what data was used, at a maximum level of granularity. But credit is about resolving to a single product that the researcher gets credit for, regardless of how much of the dataset or which version of it was used – so less granular is better.

We would like to thank everyone who attended any of the focus groups. If you have ideas about how to measure data use, please let us know in the comments!


The DataCite Meeting in Nancy, France

Last week I took a lovely train ride through the cow-dotted French countryside to attend the 2014 DataCite Annual Conference. The event was held at the Institut de l’information Scientifique et Technique (INIST) in Nancy, France, which is about 1.5 hours by train outside of Paris. INIST is the French DataCite member (more on DataCite later). I was invited to the meeting to represent the CDL, which has been an active participant in DataCite since its inception (see my slides). But before I can provide an overview of the DataCite meeting, we need to back up and make sure everyone understands the concept of identifiers, plus a few other bits of key background information.



An identifier is a string of characters that uniquely identifies an object. The object might be a dataset, software, or other research product. Most researchers are familiar with a particular type of identifier, the digital object identifier (DOI). These have been used by the academic publishing industry for uniquely identifying digital versions of journal articles for the last 15 years or so, and their use recently has expanded to other types of digital objects (posters, datasets, code, etc.). Although the DOI is the most widely known type of identifier, there are many, many other identifier schemes. Researchers do not necessarily need to understand the nuances of identifiers, however, since the data repository often chooses the scheme. The most important thing for researchers to understand is that their data needs an identifier to be easy to find, and to facilitate getting credit for that data.
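To make the structure of a DOI concrete, here is a minimal sketch (in Python) of how a DOI string breaks down into a registrant prefix and a suffix. The example DOI is hypothetical, and real-world DOI handling has more edge cases than this.

```python
def parse_doi(doi: str):
    """Split a DOI into its registrant prefix and suffix.

    A DOI has the form "10.<registrant>/<suffix>", e.g. "10.5060/D2RN35SD".
    Raises ValueError if the string does not look like a DOI.
    """
    # Strip common presentation forms (URL or "doi:" scheme).
    for lead in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(lead):
            doi = doi[len(lead):]
            break
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a valid DOI: {doi!r}")
    return prefix, suffix

print(parse_doi("doi:10.5060/D2RN35SD"))  # ('10.5060', 'D2RN35SD')
```

The prefix identifies who registered the object; the suffix is whatever string the registrant chose. This opacity is deliberate: nothing about the object needs to be guessable from the identifier itself.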

The DataCite Organization

For those unfamiliar with DataCite, it’s a nonprofit organization founded in 2009. According to its website, its aims are to:

  • establish easier access to research data on the Internet
  • increase acceptance of research data as legitimate, citable contributions to the scholarly record
  • support data archiving that will permit results to be verified and re-purposed for future study.

In this capacity, DataCite has working groups, participates in large initiatives, and partners with national and international groups. It is arguably best known for its work in helping organizations issue DOIs. CDL was a founding member of DataCite and has representation on the advisory board and in the working groups.

EZID: Identifiers made easy

The CDL has a service, called EZID, that provides DataCite DOIs to researchers and those who support them. The EZID service allows its users to create and manage long-term identifiers (it does more than just DOIs). Note that individuals currently cannot obtain an identifier directly from the EZID website; they must instead work with one of the EZID clients, of which there are many, including academic groups, private industry, government organizations, and publishers. Figshare, Dryad, many UC libraries, and the Fred Hutchinson Cancer Research Center are among those who obtain their DataCite DOIs from EZID.
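For a rough sense of what talking to an identifier API looks like, here is a sketch that serializes metadata into ANVL, the simple line-oriented "key: value" text format the EZID API accepts. The element names and values below are illustrative stand-ins; consult the EZID API documentation for the authoritative element names and the request details (authentication, endpoints, and so on).

```python
def to_anvl(metadata: dict) -> str:
    """Serialize a metadata dict to ANVL, a line-oriented
    "key: value" format.

    Percent-encodes characters that would break the format
    (%, newlines) in keys and values.
    """
    def esc(s: str) -> str:
        return s.replace("%", "%25").replace("\n", "%0A").replace("\r", "%0D")
    return "\n".join(f"{esc(k)}: {esc(v)}" for k, v in metadata.items())

# Hypothetical metadata for a new identifier; "_target" is where
# the identifier should resolve.
body = to_anvl({
    "_target": "https://example.org/my-dataset",
    "datacite.title": "My example dataset",
    "datacite.creator": "Strasser, Carly",
    "datacite.publisher": "Example Repository",
    "datacite.publicationyear": "2014",
})
print(body)
```

An EZID client would send a body like this over HTTPS with its account credentials; the point of the sketch is just that the metadata payload is deliberately small and human-readable.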

Highlights from the meeting

#1: Enabling culture shifts

Andrew Treloar from the Australian National Data Service (ANDS) presented a great way to think about how we can enable the shift to a world where research data is valued, documented, and shared. The new paradigm first needs to be possible: this means supporting infrastructure at the institutional and national levels, giving institutions and researchers the tools to properly manage research data outputs, and providing ways to count data citations and help incentivize data stewardship. Second, the paradigm needs to be encouraged/required. We are making slow but steady headway on this front, with new initiatives for open data from government-funded research and requirements for data management plans. Third, the new paradigm needs to be adopted/embraced. That is, researchers should be asking for DOIs for their data, citing the data they use, and understanding the benefits of managing and sharing their data. This is perhaps the most difficult of the three. These three aspects of a new paradigm can help frame tool development, strategies for large initiatives, and arguments for institutional support.

#2: ZENODO’s approach to meeting research data needs

Lars Holm Nielsen from the European Organization for Nuclear Research (CERN) provided a great overview of the repository ZENODO. If you are familiar with figshare, this repository has similar aspects: anyone can deposit their information, regardless of country, institution, etc. It was created to meet the needs of researchers interested in sharing research products. One of the interesting features of ZENODO is its openness to multiple types of licenses, including those that do not result in fully open data. Although I feel strongly about ensuring data are shared with open, machine-readable waivers/licenses, Nielsen made an interesting point: step one is actually getting the data into a repository. If this is accomplished, then opening the data up with an appropriate license can be discussed with the researcher at a later date. Although I’m not sure I agree with this strategy (I envision repositories full of data no one can actually search or use), it’s an interesting take.

Full disclosure: I might have a small crush on CERN due to the recent release of Particle Fever, a documentary on the discovery of the Higgs boson.

#3: the re3data-databib merger

Maxi Kindling from Humboldt University Berlin (representing re3data) and Michael Witt from Purdue University Libraries (representing databib) co-presented on plans for merging their two services, both searchable databases of repositories. Both re3data and databib have extensive metadata on data repositories available for depositing research data, covering a wide range of data types and disciplines. The merger makes sense: the two services emerged within X months of one another, and there is no need to run them separately, with separate support, personnel, and databases. Kindling and Witt described the five principles of agreement for the merger: openness, optimal quality assurance, innovative functionality development, shared leadership (i.e., the two are equal partners), and sustainability. Regarding this last principle, the merged service has been “adopted” by DataCite, which will support it for the long term. It will be called re3data, with an advisory board called databib.

Attendees of the DataCite meeting had interesting lunchtime conversations around future integrations and tools development in conjunction with the new re3data. What about a repository “match-making” service, which could help researchers select the perfect repository for their data? Or integration with tools like the DMPTool? The re3data-databib group is likely coming up with all kinds of great ideas as a result of their new partnership, which will surely benefit the community as a whole.

#4: Lots of other great stuff

There were many other interesting presentations at the meeting: Amye Kenall from BioMed Central (BMC) talking about their GigaScience data journal; Mustapha Mokrane from the ICSU-World Data System on data publishing efforts; and Nigel Robinson from Thomson-Reuters on the Data Citation Index, to name a few. DataCite plans on making all of the presentations available on the conference website, so be sure to check that out in the next few weeks.

My favorite non-data part? The light show at the central square of Nancy, Place Stanislas. 20 minutes well-spent.

Related on Data Pub:


Impact Factors: A Broken System


How big is your impact? Sedan Plowshare Crater, 1962. From Flickr by The Official CTBTO Photostream

If you are a researcher, you are very familiar with the concept of a journal’s Impact Factor (IF). Basically, it’s a way to grade journal quality. From Wikipedia:

The impact factor (IF) of an academic journal is a measure reflecting the average number of citations to recent articles published in the journal. It is frequently used as a proxy for the relative importance of a journal within its field, with journals with higher impact factors deemed to be more important than those with lower ones.
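The arithmetic behind that definition, as typically computed over a two-year window, is trivial; a sketch with made-up numbers:

```python
def impact_factor(citations_this_year: int, items_prev_two_years: int) -> float:
    """Two-year impact factor: citations received this year to articles
    published in the previous two years, divided by the number of
    citable articles published in those two years."""
    return citations_this_year / items_prev_two_years

# A journal whose 2011-2012 articles drew 600 citations in 2013,
# having published 200 citable items in 2011-2012:
print(impact_factor(600, 200))  # 3.0
```

The simplicity is part of the problem: a single average over a skewed citation distribution says little about any individual paper in the journal, let alone any individual author.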

The IF was devised in the 1970s as a tool for research libraries to judge the relative merits of journals when allocating their subscription budgets. However, it is now being used to evaluate the merits of individual scientists – something for which it was never intended. As Björn Brembs puts it, “…scientific careers are made and broken by the editors at high-ranking journals.”

In his great post, “Sick of Impact Factors,” Stephen Curry says that the real problem started when impact factors began to be applied to papers and people.

I can’t trace the precise origin of the growth but it has become a cancer that can no longer be ignored. The malady seems to particularly afflict researchers in science, technology and medicine who, astonishingly for a group that prizes its intelligence, have acquired a dependency on a valuation system that is grounded in falsity. We spend our lives fretting about how high an impact factor we can attach to our published research because it has become such an important determinant in the award of the grants and promotions needed to advance a career. We submit to time-wasting and demoralising rounds of manuscript rejection, retarding the progress of science in the chase for a false measure of prestige.

Curry isn’t alone. Just last week Bruce Alberts, Editor-in-Chief of Science, wrote a compelling editorial about Impact Factor distortions. Alberts’ editorial was inspired by the recently released San Francisco Declaration on Research Assessment (DORA). I think this is one of the more important declarations/manifestoes peppering the internet right now, and it has the potential to really change the way researchers approach scholarly publishing.

DORA was created by a group of editors and publishers who met up at the Annual Meeting of the American Society for Cell Biology (ASCB) in 2012. Basically, it lays out all the problems with impact factors and provides a set of general recommendations for different stakeholders (funders, institutions, publishers, researchers, etc.). The goal of DORA is to improve “the way in which the quality of research output is evaluated”.  Read more on the DORA website and sign the declaration (I did!).

An alternative to IF?

If most of us can agree that impact factors are not a great way to assess researchers or their work, then what’s the alternative? Curry thinks the solution lies in Web 2.0 (quoted from this post):

…we need to find ways to attach to each piece of work the value that the scientific community places on it though use and citation. The rate of accrual of citations remains rather sluggish, even in today’s wired world, so attempts are being made to capture the internet buzz that greets each new publication…

That’s right, skeptical scientists: he’s talking about buzz on the internet as a way to assess impact. Read more about “alternative metrics” in my blog post on the subject: The Future of Metrics in Science. Also check out the list of altmetrics-related tools at altmetrics.org. The great thing about altmetrics is that they don’t rely solely on citation counts, plus they are capable of taking other research products into account (like blog posts and datasets).

Other good reads on this subject:


The New OSTP Policy & What it Means

Last week, the White House Office of Science and Technology Policy (OSTP) responded to calls for broader access to federally funded research. I was curious as to whether this policy had any teeth, so I actually read the official memorandum. Here I summarize and have a few thoughts.

The overall theme of the document is best represented by this phrase:

…wider availability of peer-reviewed publications and scientific data in digital formats will create innovative economic markets for services related to curation, preservation, analysis, and visualization.

OSTP must have fielded early concerns from journal publishers, because several times in the memo there were sentiments like this:

The Administration also recognizes that publishers provide valuable services, including the coordination of peer review, that are essential for ensuring the high quality and integrity of many scholarly publications. It is critical that these services continue to be made available.

And now we get to the big change:

Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products

Each of the agency plans is required to outline strategies to:

  • leverage existing archives and partnerships with journals
  • improve the public’s ability to locate and access data
  • provide optimized search, archival, and dissemination features that encourage accessibility and interoperability
  • notify researchers of their new obligations for increasing access to research products (e.g., guidance, conditions for funding)
  • measure and enforce researcher compliance

Draft plans for each agency are due within 6 months of the memo. This is all great news for open science advocates: agencies must require researchers to comply with open data mandates and help them do it.


Hopefully the teeth in this new OSTP memo won’t be slowed down by its tiny arms. From Flickr by Hammerhead27

The memo then outlines what agency plans should include, breaking the guidelines into those for scientific articles, and those for data.

Scientific Articles:

New agency plans must include provisions for open access to scientific articles reporting on research. The memo provides two main guidelines related to this:

  • public access to research articles (including the ability to read, download, and analyze digitally) should happen within about 12 months post-publication
  • there should be free, full public access to the research article’s metadata, in standard format

Scientific Data:

First, the memo defines data:

…digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications, but does not include laboratory notebooks, preliminary analyses, drafts of scientific papers, plans for future research, peer review reports, communications with colleagues, or physical objects, such as laboratory specimens.

It then sets the following guidelines. The agency plans should:

  1. Maximize free public access while keeping in mind privacy/confidentiality, proprietary interests, and that not all data should be kept forever
  2. Ensure researchers create data management plans
  3. Allow costs for data preservation and access in proposal budgets
  4. Ensure evaluation of data management plan merits
  5. Ensure researchers comply with their data management plans
  6. Promote data deposition into public repositories
  7. Encourage public/private partnerships to ensure interoperability
  8. Develop approaches for identification and attribution of datasets
  9. Educate folks about data stewardship
  10. Assess long-term needs for repositories and infrastructure

This list got me excited: there might actually be some teeth in #4 and #5 above. We all know that the NSF’s data management plan requirement has been rather weak up to now, but this memo implies that it will actually be enforced.

I’m also quite pleased to see #6: data should be deposited in public repositories. The icing on the cake is #8: datasets need identification and attribution. Overall, my feelings about this list can be summed up by one word – hooray!

Official versions of related documents:


Thoughts on Data Publication

If you read last week’s post on the IDCC meeting in Amsterdam, you may know that today’s post was inspired by a post-conference workshop on Data Publication, sponsored by the PREPARDE group. The workshop was “Data publishing, peer review and repository accreditation: everyone a winner?” (to access the workshop agenda, goals, and slides, go to the conference workshop website and scroll down to Workshop 6).

Basically the workshop focused on all things data publication, and incited lively discussion among those in attendance. Check out the workshop’s Twitter backchannel via this Storify by Sarah Callaghan of STFC.  My previous blog post about data publication sums it up like this:

The concept of data publication is rather simple in theory: rather than relying on journal articles alone for scholarly communication, let’s publish data sets as “first class citizens”.  Data sets have inherent value that makes them standalone scholarly objects— they are more likely to be discovered by researchers in other domains and working on other questions if they are not associated with a specific journal and all of the baggage that entails.

Stealing shamelessly from Sarah’s presentation, I’m providing a brief overview of issues surrounding data publication for those not well-versed:

First, the benefits of data publication:

  • Allows credit to data producers and curators (via data citation and emerging altmetrics)
  • Encourages reuse of datasets and discourages duplication of effort
  • Encourages proper curation and management of data (you don’t want to share messy data, right?)
  • Ensures completeness of the scientific record, as well as transparency and reproducibility of research (fundamental tenets of the scientific method!)
  • Improves discoverability of datasets (they will never be discovered on that old hard drive in your desk drawer)

We had an internal meeting here at CDL yesterday about data publication. After running through this list of benefits for those in attendance, one of my colleagues asked the question: “Does listing these benefits work? Do researchers want to publish their data?” I didn’t hesitate to answer “No”.

Why not? The biggest reason is a lack of time. Preparing data for sharing and publication is laborious, and overstretched researchers aren’t motivated by these benefits given the current incentive structures in research (papers, papers, papers. And citation of those papers.). Of course, I think this is changing in the very near future. Check out my post on data sharing mandates in the works. So let’s go with the assumption that researchers want to publish. How do they go about this?

Methods for “publishing” data:

  • A personal or lab webpage. This is a common choice for researchers who wish to share data, since they can maintain control of the datasets. However, data siloed on individual websites suffer from problems of stability, persistence, and discoverability. Plus, website maintenance often falls to the bottom of a researcher’s to-do list.
  • A disciplinary repository. This is a common solution for only a select few data types (e.g., genetic data). Most disciplines are still awaiting a culture change that will motivate researchers to share their data in this way.
  • An institutional repository. Of course, researchers have to know that this is an option (most don’t), and must then properly prepare their data for deposit.
  • Supplementary materials.  In this case, the data accompany a primary journal article as supporting information. I recently shared data this way, but recognized that the data should also be placed in a curated repository.  There are a few reasons for this apparent duplication:
    • Supplemental materials are sometimes not available many years after publication due to broken links.
    • Journals are not particularly excited about archiving lots of supplementary data, especially if it’s a large volume of data. This is not their area of expertise, after all.
  • Data article. This is a new-ish option: basically, you publish your data in a proper data journal (see this semi-complete list of data journals on the PREPARDE blog).

Wondering what a “data article” is? Let’s look to Sarah again:

A data article describes a dataset, giving details of its collection, processing, software, file formats, et cetera, without the requirement of novel analyses or ground-breaking conclusions.

That is, it’s a standalone product of research that can be cited as such. There is much debate surrounding such data articles. Among the issues are:

  • Is it really “publication”? How is this different from a landing page for the dataset that’s stored in a repository?
  • Traditional academic use of “publication” implies peer review. How do you review datasets?
  • How should publication differ depending on the discipline?

There are no easy answers to these questions, but I love hearing the debate. I’m optimistic that the forthcoming person we hire as a data publication postdoc will have some great ideas to contribute. Stay tuned!


Amsterdam! CC-BY license, C. Strasser



NSF now allows data in biosketch accomplishments


Hip hip hooray for data! Contributed to Calisphere by Sourisseau Academy for State and Local History (click for more information)

Back in October, the National Science Foundation announced changes to its Grant Proposal Guidelines (Full GPG for January 2013 here).  I blogged about this back when the announcement was made, but now that the changes are official, I figure it warrants another mention.

As of January 2013, you can now list products in your biographical sketches, not just publications. This is big (and very good) news for data advocates like myself.

The change is that the biosketch for senior personnel should contain a list of up to five products closely related to the project and up to five other significant products that may or may not be related to the project. But what counts as a product? “products are…including but not limited to publications, data sets, software, patents, and copyrights.”

To make it count, however, it needs to be both citable and accessible. How to do this?

  1. Archive your data in a repository (find help picking a repo here)
  2. Obtain a unique, persistent identifier for your dataset (e.g., a DOI or ARK)
  3. Start citing your product!
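Once a dataset has an identifier, citing it is straightforward. Here is a sketch of the general citation style recommended for datasets (creator, year, title, publisher, identifier); the example values are made up, and your repository or target journal may prescribe a different ordering.

```python
def format_data_citation(creators, year, title, publisher, identifier):
    """Format a dataset citation in the general style recommended
    for data: Creator (Year): Title. Publisher. Identifier."""
    return f"{'; '.join(creators)} ({year}): {title}. {publisher}. {identifier}"

print(format_data_citation(
    ["Strasser, C"], 2013, "Example field measurements",
    "Example Repository", "doi:10.5060/D2RN35SD"))
```

The key point for the biosketch is the last element: a persistent identifier is what makes the product both citable and accessible.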

For the librarians, data nerds, and information specialists in the group, the UC3 has put together a flyer you can use to promote listing data as a product; it’s available as a PDF. For the original PPT that you can customize for your institution and/or repository, send me an email.


Direct from the digital mouths of NSF:

Summary of changes: http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_sigchanges.jsp

Chapter II.C.2.f(i)(c), Biographical Sketch(es), has been revised to rename the “Publications” section to “Products” and amend terminology and instructions accordingly. This change makes clear that products may include, but are not limited to, publications, data sets, software, patents, and copyrights.

New wording: http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/gpg_2.jsp

(c) Products

A list of: (i) up to five products most closely related to the proposed project; and (ii) up to five other significant products, whether or not related to the proposed project. Acceptable products must be citable and accessible including but not limited to publications, data sets, software, patents, and copyrights. Unacceptable products are unpublished documents not yet submitted for publication, invited lectures, and additional lists of products. Only the list of 10 will be used in the review of the proposal.

Each product must include full citation information including (where applicable and practicable) names of all authors, date of publication or release, title, title of enclosing work such as journal or book, volume, issue, pages, website and Uniform Resource Locator (URL) or other Persistent Identifier.


Resources, and Versions, and Identifiers! Oh, my!

The only constant is change.  —Heraclitus

Data publication, management, and citation would all be so much easier if data never changed, or at least, if it never changed after publication. But as the Greeks observed so long ago, change is here to stay. We must accept that data will change, and given that fact, we are probably better off embracing change rather than avoiding it. Because the very essence of data citation is identifying what was referenced at the time it was referenced, we need to be able to put a name on that referenced quantity, which leads to the requirement of assigning named versions to data. With versions we are providing the x that enables somebody to say, “I used version x of dataset y.”

Since versions are ultimately names, the problem of defining versions is inextricably bound up with the general problem of identification. Key questions that must be asked when addressing data versioning and identification include:

  • What is being identified by a version? This can be a surprisingly subtle question. Is a particular set of bits being identified? A conceptual quantity (to use FRBR terms, an expression or manifestation)? A location? A conceptual quantity at a location? For a resource that changes rapidly or predictably, such as a data stream that accumulates over time, it will probably be necessary to address the structure of the stream separately from the content of the stream, and to support versions and/or citation mechanisms that allow the state of the stream to be characterized at the time of reference. In any case, the answer to the question of what is being identified will greatly impact both what constitutes change (and therefore what constitutes a version) and the appropriateness of different identifier technologies to identifying those versions.
  • When does a change constitute a new version? Always? Even when only a typographical error is being corrected? Or, in a hypertext document, when updating a broken hyperlink? (This is a particularly difficult case, since updating a hyperlink requires updating the document, of course, but a URL is really a property of the identifiee, not the identifier.) In the case of a science dataset, does changing the format of the data constitute a new version? Reorganizing the data within a format (e.g., changing from row-major to column-major order)? Re-computing the data on different floating-point hardware? Versions are often divided into “major” versions and “minor” versions to help characterize the magnitude and backward-compatibility of changes.
  • Is each version an independent resource? Or is there one resource that contains multiple versions? This may seem a purely semantic distinction, but the question has implications on how the resource is managed in practice. The W3C struggled with this question in identifying the HTML specification. It could have created one HTML resource with many versions (3.1, 4.2, 5, …), but for manageability it settled on calling HTML3 one resource (with versions 3.1, 3.2, etc.), HTML4 a separate resource (with analogous versions 4.1, 4.2, etc.), and continuing on to HTML5 as yet another resource.

So far we have only raised questions, and that’s the nature of dealing with versions: the answers tend to be very situation-specific. Fortunately, some broad guidelines have emerged:

  • Assign an identifier to each version to support identification and citation.
  • Assign an identifier to the resource as a whole, that is, to the resource without considering any particular version of the resource. There are many situations where it is desirable to be able to make a version-agnostic reference. Consider that, in the text above, we were able to refer to something called “HTML4” without having to name any particular version of that resource. What if that were not possible?
  • Provide linkages between the versions, and between the versions and the resource as a whole.

These guidelines still leave the question of how to actually assign identifiers to versions unanswered. One approach is to assign a different, unrelated identifier to each version. For example, doi:10.1234/FOO might refer to version 1 of a resource and doi:10.5678/BAR to version 2. Linkages, stored in the resource versions themselves or externally in a database, can record the relationships between these identifiers. This approach may be appropriate in many cases, but it should be recognized that it places a burden on both the resource maintainer (every link that must be maintained represents a breakage point) and user (there is no easily visible or otherwise obvious relationship between the identifiers). Another approach is to syntactically encode version information in the identifiers. With this approach, we might start with doi:10.1234/FOO as a base identifier for the resource, and then append version information in a visually apparent way. For example, doi:10.1234/FOO/v1 might refer to version 1, doi:10.1234/FOO/v2 to version 2, and so forth. And in a logical extension we could then treat the version-less identifier doi:10.1234/FOO as identifying the resource as a whole. This is exactly the approach used by the arXiv preprint service.

Resources, versions, identifiers, citations: the issues they present tend to get bound up in a Gordian knot.  Oh, my!

Further reading:

ESIP Interagency Data Stewardship/Citations/Provider Guidelines

DCC “Cite Datasets and Link to Publications” How-to Guide

Resources, Versions, and URIs


DataCite Metadata Schema update


This spring, work is underway on a new version of the DataCite metadata schema. DataCite is a worldwide consortium founded in 2009 dedicated to “helping you find, access, and reuse data.” The principal mechanism for doing so is the registration of digital object identifiers (DOIs) via the member organizations. To make sure dataset citations are easy to find, each registration for a DataCite DOI has to be accompanied by a small set of citation metadata. It is small on purpose: this is intended to be a “big tent” for all research disciplines. DataCite has specified these requirements with a metadata schema.
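For a sense of how small that “small set” is, here is a sketch that checks a record against the handful of mandatory properties (identifier, creator, title, publisher, publication year). The field names are simplified stand-ins for the schema’s actual XML elements, and the record values are hypothetical.

```python
# The mandatory citation properties; everything else in the schema is optional.
REQUIRED = ("identifier", "creator", "title", "publisher", "publicationYear")

def missing_required(record: dict) -> list:
    """Return the names of mandatory properties absent (or empty) in a record."""
    return [field for field in REQUIRED if not record.get(field)]

record = {
    "identifier": "10.5060/D2RN35SD",  # hypothetical DOI
    "creator": "Strasser, Carly",
    "title": "Example dataset",
    "publisher": "Example Repository",
    "publicationYear": "2014",
}
print(missing_required(record))  # []
```

Five required fields is enough to build a citation, which is the whole point of the “big tent” design.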

The team in charge of this task is the Metadata Working Group. This group responds to suggestions from DataCite clients and community members. I chair the group, and my colleagues on the group come from the British Library, GESIS, the TIB, CISTI, and TU Delft.

The new version of the schema, 2.3, will be the first to be paired with a corresponding version in the Dublin Core Application Profile format. This fulfills a commitment that the Working Group made with its first release in January 2011. The hope is that the application profile will promote interoperability with Dublin Core, a metadata format common in the library community. We intend to keep the schema and the profile synchronized in future versions.

Additional changes will include some new selections for the optional fields, including support for a new relationType (isIdenticalTo), and we’re considering a way to specify temporal collection characteristics of the resource being registered. This would mean optionally describing, in simple terms, a dataset collected between two dates. There are a few other changes under discussion as well, so stay tuned.

DataCite metadata is available in the Search interface to the DataCite Metadata Store. The metadata is also exposed for harvest via the OAI-PMH protocol. California Digital Library is a founding member of DataCite, and our DataCite implementation is the EZID service, which also offers ARKs, an alternative identifier scheme. Please let me know if you have any questions by contacting uc3 at ucop.edu.
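OAI-PMH requests are just HTTP GETs with a “verb” parameter, so building one is a one-liner. A sketch (the endpoint URL below is an assumption for illustration; check DataCite’s documentation for the current harvesting address):

```python
from urllib.parse import urlencode

def oai_url(base_url: str, verb: str, **params) -> str:
    """Build an OAI-PMH request URL. OAI-PMH is a plain HTTP protocol:
    every request is the base URL plus a 'verb' and verb-specific params."""
    return f"{base_url}?{urlencode({'verb': verb, **params})}"

# List records in Dublin Core format from a hypothetical endpoint:
print(oai_url("https://oai.datacite.org/oai", "ListRecords",
              metadataPrefix="oai_dc"))
```

Any generic OAI-PMH harvester can then page through the records using the resumption tokens the server returns.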


EZID: now even easier to manage identifiers

EZID, the easy long-term identifier service, just got a new look. EZID lets you create and maintain ARKs and DataCite Digital Object Identifiers (DOIs), and now it’s even easier to use:

  • One stop for EZID and all EZID information, including webinars, FAQs, and more.
    • A clean, bright new look.
    • No more hunting across two locations for the materials and information you need.
  • NEW Manage IDs functions:
    • View all identifiers created by logged-in account;
    • View the 10 most recent interactions (based on the account, not the session);
    • See the scope of your identifier work without any API programming.
  • NEW in the UI: Reserve an Identifier
    • Create identifiers early in the research cycle;
    • Choose whether or not you want to make your identifiers public (reserve them if you don’t);
    • On the Manage screen, view the identifier’s status (public, reserved, unavailable/just testing).

In the coming months, we will also be introducing these EZID user interface enhancements:

  • Enhanced support for DataCite metadata in the UI;
  • Reporting support for institution-level clients.

So, stay tuned: EZID just gets better and better!
