Author Archives: John Kratz

Data metrics survey results published

Today, we are pleased to announce the publication of Making Data Count in Scientific Data. John Kratz and Carly Strasser led the research effort to understand the needs and values of both the researchers who create and use data and of the data managers who preserve and publish it. The Making Data Count project is a collaboration between the CDL, PLOS, and DataONE to define and implement a practical suite of metrics for evaluating the impact of datasets, which is a necessary prerequisite to widespread recognition of datasets as first-class scholarly objects.

We started the project with research to understand what metrics would be meaningful to stakeholders and what metrics we can practically collect. We conducted a literature review, focus groups, and (the subject of today's paper) a pair of online surveys for researchers and data managers.

In November and December of 2014, 247 researchers and 73 data repository managers answered our questions about data sharing, use, and metrics. The survey and anonymized data are available in the Dash repository.

[Figure: interest in various metrics]

These responses told us, among other things, which existing Article Level Metrics (ALMs) might be profitably applied to data:

  • Social media: We should not worry excessively about capturing social media (Twitter, Facebook, etc.) activity around data yet, because there is not much to capture. Only 9% of researchers said they would “definitely” use social media to look for a dataset.
  • Page views: Page views are widely collected by repositories but neither researchers nor data managers consider them meaningful. (It stands to reason that, unlike a paper, you can’t have engaged very deeply with a dataset if all you’ve done is read about it.)
  • Downloads: Download counts, on the other hand, are both highly valuable and practical to collect. Downloads were a resounding second-choice metric for researchers and 85% of repositories already track them.
  • Citations: Citations are the coin of the academic realm. They were by far the most interesting metric to both researchers and data managers. Unfortunately, citations are much more difficult than download counts to work with, and relatively few repositories track them. Beyond technical complexity, the biggest challenge is cultural: data citation practices are inconsistent at best, and formal data citation is rare. Despite the difficulty, the value of citations is too high to ignore, even in the short term.

We have already begun to collect data on the sample project corpus– the entire DataONE collection of 100k+ datasets. Using this pilot corpus, we see preliminary indications of researcher engagement with data across a number of online channels not previously thought to be in use by scholars. The results of this pilot will complement the survey described in today’s paper with real measurement of data-related activities “in the wild.”

For more conclusions and in-depth discussion of the initial research, see the paper, which is open access and available here: http://dx.doi.org/10.1038/sdata.2015.39. Stay tuned for analysis and results of the DataONE data-level metrics data on the Making Data Count project page: http://lagotto.io/MDC/.

Make Data Rain

Last October, UC3,  PLOS, and DataONE launched Making Data Count, a collaboration to develop data-level metrics (DLMs). This 12-month National Science Foundation-funded project will pilot a suite of metrics to track and measure data use that can be shared with funders, tenure & promotion committees, and other stakeholders.

[Image from Freepik]

To understand how DLMs might work best for researchers, we conducted an online survey and held a number of focus groups, which culminated on a very (very) rainy night last December in a discussion at the PLOS offices with researchers in town for the 2014 American Geophysical Union Fall Meeting.

Six eminent researchers participated in the discussion.

Much of the conversation concerned how to motivate researchers to share data. Sources of external pressure that came up included publishers, funders, and peers. Publishers can require (as PLOS does) that, at a minimum, the data underlying every figure be available. Funders might refuse to ‘count’ publications based on unavailable data, and refuse to renew funding for projects that don’t release data promptly. Finally, other researchers– in some communities, at least– are already disinclined to work with colleagues who won’t share data.

However, Making Data Count is particularly concerned with the inverse– not punishing researchers who don’t share, but rewarding those who do. For a researcher, metrics demonstrating data use serve not only to prove to others that their data is valuable, but also to affirm for themselves that taking the time to share their data is worthwhile. The researchers present regarded altmetrics with suspicion and overwhelmingly affirmed that citations are the preferred currency of scholarly prestige.

Many of the technical difficulties with data citation (e.g., citing dynamic data or a particular subset) came up in the course of the conversation. One interesting point was raised by many: when citing a data subset, the needs of reproducibility and credit diverge. For reproducibility, you need to know exactly what data was used, at the maximum level of granularity. But credit is about resolving to a single product that the researcher gets credit for, regardless of how much of the dataset or which version of it was used, so less granular is better.

We would like to thank everyone who attended any of the focus groups. If you have ideas about how to measure data use, please let us know in the comments!


Fifteen ideas about data validation (and peer review)

[Image: phrenology diagram showing honest and dishonest head shapes]

It’s easy to evaluate a person by the shape of their head, but datasets are more complicated. From Vaught’s Practical Character Reader in the Internet Archive.

Many open issues drift around data publication, but validation is both the biggest and the haziest. Some form of validation at some stage in a data publication process is essential: data users need to know that they can trust the data they want to use, data creators need a stamp of approval to get credit for their work, and the publication process must avoid getting clogged with unusable junk. However, the scientific literature's validation mechanisms don't translate as directly to data as do its mechanisms for, say, citation.

This post is in part a very late response to a data publication workshop I attended last February at the International Digital Curation Conference (IDCC). In a breakout discussion of models for data peer review, there were far more ideas about data review than time to discuss them. Here, for reference purposes, is a longish list of non-parallel, sometimes-overlapping ideas about how data review, validation, or quality assessment could or should work. I’ve tried to stay away from deeper consideration of what data quality means (which I’ll discuss in a future post) and from the broader issues of peer review associated with the literature, but they inevitably pop up anyway.

  1. Data validation is like peer review of the literature: Peer review is an integral part of science; even when they resent the process, scientists understand and respect it. If we are to ask them to start reviewing data, it behooves us to slip data into existing structures. Data reviewed in conjunction with a paper fits this approach. Nature Publishing Group's Scientific Data publishes data papers through a traditional review process that considers the data as well as the paper. Peer review at F1000Research follows a literature-descended (although decidedly non-traditional) process that asks reviewers to examine underlying data together with the paper.
  2. Data validation is not like peer review of the literature: Data is fundamentally different from literature, and shouldn’t be treated as such. As Mark Parsons put it at the workshop, “literature is an argument; data is a fact.” The fundamental question in peer review of an article is “did the authors actually demonstrate what they claim?” This involves evaluation of the data, but in the context of a particular question and conclusion. Without a question, there is no context, and no way to meaningfully evaluate the data.
  3. Divide the concerns: Separate out aspects of data quality and consider them independently. For example, Sarah Callaghan divides data quality into technical and scientific quality. Technical quality demands complete data and metadata and appropriate file formats; scientific quality requires appropriate collection methods and high overall believability.
  4. Divvy up the roles: Separate concerns need not be evaluated by the same person or even the same organization. For instance, GigaScience assigns a separate data reviewer for technical review. Data paper publishers generally coordinate scientific review and leave at least some portion of the technical review to the repository that houses the data. Third party peer-review services like LIBRE or Rubriq could conceivably take up data review.
  5. Review data and metadata together: A reviewer must assess data in conjunction with its documentation and metadata. Assessing data quality without considering documentation is both impossible and pointless; it's impossible to know that data is "good" without knowing exactly what it is and, even if one could, it would be pointless because no one will ever be able to use it. This idea is at least implicit in any data review scheme. In particular, data paper journals explicitly raise evaluation of the documentation to the same level as evaluation of the data. Biodiversity Data Journal's peer review guidelines are not unusual in addressing not only the quality of the data and the quality of the documentation, but the consistency between them.
  6. Experts should review the data: Like a journal article, a dataset should pass review by experts in the field. Datasets are especially prone to cross-disciplinary use, in which case the user may not have the background to evaluate the data themselves. Sarah Callaghan illustrated how peer review might work– even without a data paper– by reviewing a pair of (already published) datasets.
  7. The community should review the data: As with a journal article, the real value of a dataset emerges over time as a result of community engagement. After a slow start, post-publication commenting on journal articles (e.g., through PubMed Commons) seems to be gaining momentum.
  8. Users should review the data: Data review can be a byproduct of use. A researcher using a dataset interrogates it more thoroughly than someone just reviewing it. And, because they were doing it anyway, the only “cost” is the effort of capturing their opinion. In a pilot study, the Dutch Data Archiving and Networked Services repository solicited feedback by emailing a link to an online form to researchers who had downloaded their data.
  9. Use is review: "Indeed, data use in its own right provides a form of review." Even without explicit feedback, evidence of successful use is itself evidence of quality. Such evidence could be presented by collecting a list of papers that cite the dataset.
  10. Forget quality, consider fitness for purpose: A dataset may be good enough for one purpose but not another. Trying to assess the general “quality” of a dataset is hopeless; consider instead whether the dataset is suited to a particular use. Extending the previous idea, documentation of how and in what contexts a dataset has been used may be more informative than an assessment of abstract quality.
  11. Rate data with multiple levels of quality: The binary accept/reject of traditional peer review (or, for that matter, fit/unfit for purpose) is overly reductive. A one-to-five (or one-to-ten) scale, familiar from pretty much the entire internet, affords a more nuanced view. The Public Library of Science (PLOS) Open Evaluation Tool applies a five-point scale to journal articles, and DANS users rated datasets on an Amazon-style five-star scale.
  12. Offer users multiple levels of assurance: Not all data, even in one place, needs to be reviewed to the same extent. It may be sensible to invest limited resources to most thoroughly validate those datasets which are most likely to be used. For example, Open Context offers five different levels of assurance, ranging from "demonstration, minimal editorial acceptance" to "peer-reviewed." This idea could also be framed as levels of service ranging (as Mark Parsons put it at the workshop) from "just thrown out there" to "someone answers the phone."
  13. Rate data along multiple facets: Data can be validated or rated along multiple facets or axes. DANS datasets are rated on quality, completeness, consistency, and structure; two additional facets address documentation quality and usefulness of file formats. This is arguably a different framing of divided concerns, with a difference in application: there, independent assessments are ultimately synthesized into a single verdict; here, the facets are presented separately.
  14. Dynamic datasets need ongoing review: Datasets can change over time, either through addition of new data or revision and correction of existing data. Additions and changes to datasets may necessitate a new (perhaps less extensive) review. Lawrence (2011) asserts that any change to a dataset should trigger a new review.
  15. Unknown users will put the data to unknown uses: Whereas the audience for, and findings of, a journal article are fairly well understood by the author, a dataset may be used by a researcher from a distant field for an unimaginable purpose. Such a person is both the most important to provide validation for– because they lack the expertise to evaluate the data themselves– and the most difficult– because no one can guess who they will be or what they will want to do.

Have an idea about data review that I left out? Let us know in the comments!

Finding Disciplinary Data Repositories with DataBib and re3data

This post is by Natsuko Nicholls and John Kratz.  Natsuko is a CLIR/DLF Postdoctoral Fellow in Data Curation for the Sciences and Social Sciences at the University of Michigan.

The problem: finding a repository

Everyone tells researchers not to abandon their data on a departmental server, hard drive, USB stick, CD-ROM, stack of Zip disks, or quipu: put it in a repository! But most researchers don't know which repository might be appropriate for their data. If your organization has an Institutional Repository (IR), that's one good home for the data. However, not everyone has access to an IR, and data in IRs can be difficult for others to discover, so it's important to consider the other major (and not mutually exclusive!) option: deposit in a Disciplinary Repository (DR).

Many disciplinary repositories exist to handle data from a particular field or of a particular type (e.g., WormBase cares about nematode biology, while GenBank takes only DNA sequences). Some may ask whether the co-existence of IRs and DRs means competition or mutual benefit for universities and research communities, and some may wonder just how many repositories are out there for archiving digital assets, but most librarians and researchers simply want to find an appropriate repository in a sea of choices.

For those involved in assisting researchers with data management, helping to find the right place to put data for sharing and preservation has become a crucial part of data services. This is certainly true at the University of Michigan: during a recent data management workshop, faculty expressed interest in receiving more guidance on disciplinary repositories from librarians.

The help: directories of data repositories

Fortunately, there is help to be found in the form of repository directories.  The Open Access Directory maintains a subdirectory of data repositories.  In the Life Sciences, BioSharing collects data policies, standards, and repositories.  Here, we’ll be looking at two large directories that list repositories from any discipline: DataBib and the REgistry of REsearch data REpositories (re3data.org).

DataBib originated in a partnership between Purdue and Penn State University, and it’s hosted by Purdue. The 600 repositories in DataBib are each placed in a single discipline-level category and tagged with more detailed descriptors of the contents.

re3data.org, which is sponsored by the German Research Foundation, started indexing relatively recently, in 2012, but it already lists 628 repositories.  Unlike DataBib, repositories aren’t assigned to a single category, but instead tagged with subjects, content types, and keywords.  Last November, re3data and BioSharing agreed to share records.  re3data is more completely described in this paper.

Given the similar number of repositories listed in DataBib and re3data, one might expect that their contents would be roughly similar and conclude that there are something around 600 operating DRs.  To test this possibility and get a better sense of the DR landscape, we examined the contents of both directories.

The question: how different are DataBib and re3data?

Contrary to expectation, there is little overlap between the databases. At least 1,037 disciplinary data repositories currently exist, and only 18% (191) are listed in both directories. That is a lot of repositories to sift through when looking for the one right place to put data, especially because, with few exceptions, IRs are not listed in re3data or DataBib (a long list of academic open access repositories is available elsewhere). Of the repositories in both databases, a majority (72%) are categorized into STEM fields. Below is a breakdown of the overlap by discipline (as assigned by DataBib).

[Figure: overlap between DataBib and re3data repositories by discipline]

Another way of characterizing the repository collections of re3data and DataBib is by the repository's host country. In re3data, the top three contributing countries (US 36%, Germany 15%, UK 12%) form the majority, whereas in DataBib 58% of repositories are hosted in the US, followed by the UK (12%) and Canada (7%). This finding may not be too surprising, since re3data is based in Germany and DataBib is in the US. If you are a researcher looking for the right disciplinary data repository, the host country may matter, depending on your (national or international, private or public) funding agencies and the scale of collaboration.
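For the curious, a comparison like this ultimately comes down to set arithmetic over the two directories' records. The sketch below is a minimal, hypothetical version in Python: it assumes each directory has been exported to a plain-text file with one normalized repository URL per line (the filenames are made up), whereas matching real records across directories is messier and generally requires fuzzy matching on names and URLs.

```python
# Toy comparison of two repository directory exports.
# Assumes hypothetical files with one normalized repository URL per line.
def load_repositories(path):
    """Read a directory export and return a set of normalized repository URLs."""
    with open(path) as f:
        return {line.strip().lower().rstrip("/") for line in f if line.strip()}

databib = load_repositories("databib_export.txt")   # hypothetical export file
re3data = load_repositories("re3data_export.txt")   # hypothetical export file

both = databib & re3data      # repositories listed in both directories
either = databib | re3data    # all distinct repositories across the two

print(f"DataBib: {len(databib)}, re3data: {len(re3data)}")
print(f"In both: {len(both)} ({len(both) / len(either):.0%} of {len(either)} total)")
```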

The full list of repositories is available here.

The conclusion: check both

Going forward, help with disciplinary repository selection will increasingly be a part of data management workflows; the Data Management Planning Tool (DMPTool) plans to incorporate repository recommendations through DataBib, and DataCite may integrate with re3data. Further simplifying matters, DataBib and re3data plan to merge their services in some as-yet-undefined way. But, for now, it's safe to say that anyone looking for a disciplinary repository should check both DataBib and re3data.


Data Publication Practices and Perceptions

[Image: surveyors working. Credit: Captain Harry Garber, C&GS. From the NOAA Photo Library.]

Today, we’re opening a survey of researcher perceptions and practices around data publication.

Why are you doing a survey?

The term “Data publication” applies language and ideas from traditional scholarly publishing to datasets, with the goal of situating data within the academic reward system and encouraging sharing and reuse. However, the best way to apply these ideas to data is not obvious. The library community has been productively discussing these issues for some time; we hope to contribute by asking researchers directly what they would expect and want from a data publication.

Who should take it?

We are interested in responses from anyone doing research in any branch of the Sciences or Social Sciences at any level (but especially PIs and postdocs).

What do you hope to learn?

  • What do researchers think it means to “publish” data? What do they expect from “peer review” of data?
  • As creators of data, how do they want to be credited? What do they think is adequate?
  • As users of published data, what would help them decide whether to work with a dataset?
  • In evaluating their colleagues, what dataset metrics are most useful? What would be most impressive to, for instance, tenure & promotion committees?

What will you do with the results?

The results will inform the CDL’s vision of data publication and influence our efforts. Additionally, the results will be made public for use by anyone.

What do you want from me?

If you are a researcher, please take 5-10 minutes to complete the survey and consider telling your colleagues about it.

If you are a librarian or other campus staff, please consider forwarding the link to any researchers, departments, or listservs that you feel are appropriate. The text of an email describing the survey can be found here.

The survey can be found at:

http://goo.gl/PuIVoC

 


If you have any questions or concerns, email me or comment on this post.

A forthcoming experiment in data publication

What we’re doing:

Like these dapper gentlemen, as small or as large as needed… From The Public Domain Review.

Sometime next year, the CDL will start an experiment in data publication. Our version of data publication will look like lightweight, non-peer-reviewed dataset descriptions. These publications are designed to be flexible in structure and size. At a minimum, each document must have six elements:

  • Title
  • Creator(s)
  • Publisher
  • Publication year
  • Identifier (e.g., DOI or ARK)
  • Citation to the dataset

This bare-bones document can expand to be richly descriptive, with optional items like subject keywords, version number, spatial or temporal range, collection methods, and as much description as the author cares to supply.
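As a rough illustration (not the actual CDL or EZID schema; every field name and value here is made up), a minimal record with the six required elements, and an expanded version with some optional items, might look something like this:

```python
# Illustrative only: hypothetical field names and values, not the real schema.
minimal_record = {
    "title": "Example sensor readings, 2013",
    "creators": ["Kratz, John"],
    "publisher": "Example Data Repository",
    "publication_year": 2013,
    "identifier": "doi:10.9999/FAKE.EXAMPLE",          # DOI or ARK
    "dataset_citation": "Kratz, J. (2013): Example sensor readings. "
                        "Example Data Repository. doi:10.9999/FAKE.EXAMPLE",
}

# The same document, expanded with optional descriptive items.
rich_record = dict(
    minimal_record,
    subjects=["environmental sensing", "time series"],
    version="1.1",
    temporal_range="2013-01-01/2013-12-31",
    methods="Readings collected every 10 minutes from three field stations.",
    description="As much free-text description as the author cares to supply.",
)
```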

Why we’re doing it:

The general agreement expressed in the recently released draft FORCE11 Declaration of Data Citation Principles, that datasets should be treated like "first class" research objects in how they are discovered, cited, and recognized, is still far from reality. Datasets are largely invisible to search engines, and authors rarely cite them formally.

A solution being implemented by a number of journals (e.g., Nature's Scientific Data and the Geoscience Data Journal) is to publish proxy objects for discovery and citation called "data descriptors" or, more commonly, "data papers". Data papers are formal scholarly publications that describe a dataset's rationale and collection methods, but don't analyze the data or draw any conclusions. Peer reviewers ensure that the paper contains all the information needed to use, re-use, or replicate the dataset.

The strength of the data paper approach– creators must write up rich and useful metadata to pass peer review– leads directly to the weakness: a data paper often takes more time and energy to produce than dataset creators are willing to invest. In a 2011 survey, researchers said that the biggest impediment to publishing data is lack of time. For researchers who manage to publish datasets but lack time to write and submit (and revise and resubmit) a data paper, we will provide some of the benefits of a data paper at none of the cost.

How we’re doing it:

We will publish these documents through EZID (easy-eye-dee), an identifier service that has supplied DataCite DOIs to over 167,000 datasets. All of the dataset metadata records have at least the five elements required by the DataCite metadata schema, more than 2,000 already have abstracts, and another 2,000 have other kinds of descriptive metadata. EZID will begin using dataset metadata to automatically generate publications that can be viewed as HTML in a web browser or as a dynamically generated PDF. The documents will be hosted by EZID in a format optimized for indexing by search engines like Google and Google Scholar.

Dataset creators won’t have to do anything to get a publication that they don’t already have to do to get a DOI. If the creator only fills in the required metadata, the document will function as a cover-sheet or landing page. If they submit an abstract and methods, the document expands to begin to look like a traditional journal article (while retaining the linking functionality of a landing page). It will capture as much effort as the researcher puts forth, whether that’s a lot or very little.

Do you have thoughts or comments on our idea? We would love to hear from you! Comment on this blog post or email us at uc3@ucop.edu.

Data Citation Developments

Citation is a defining feature of scholarly publication, and if we want to say that a dataset has been published, we have to be able to cite it. The purposes of traditional paper citations, to recognize the work of others and to allow readers to judge the basis of the author's assertions, align with the purposes of data citations. Check out previous posts on the topic here.

In the past, datasets and databases were usually mentioned haphazardly, if at all, in the body of a paper and left out of the list of references, but this no longer has to be the case.

Last month, there was quite a bit of activity on the data citation front, much of it centered on an emerging set of data citation principles:

  1. Importance: Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications.

  2. Credit and Attribution: Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data.

  3. Evidence: Where a specific claim rests upon data, the corresponding data citation should be provided.

  4. Unique Identifiers: A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community.

  5. Access: Data citations should facilitate access to the data themselves and to such associated metadata, documentation, and other materials, as are necessary for both humans and machines to make informed use of the referenced data.

  6. Persistence: Metadata describing the data, and unique identifiers should persist, even beyond the lifespan of the data they describe.

  7. Versioning and Granularity: Data citations should facilitate identification and access to different versions and/or subsets of data. Citations should include sufficient detail to verifiably link the citing work to the portion and version of data cited.

  8. Interoperability and Flexibility: Data citation methods should be sufficiently flexible to accommodate the variant practices among communities but should not differ so much that they compromise interoperability of data citation practices across communities.

In the simplest case, when a researcher wants to cite the entirety of a static dataset, there seems to be a consensus set of core elements among DataCite, CODATA, and others. There is less agreement with respect to more complicated cases, so let's tackle the easy stuff first; a sketch of how the elements below might be assembled into a citation string follows the two lists.

(Nearly) Universal Core Elements

  • Creator(s): Essential, of course, to publicly credit the researchers who did the work. One complication here is that datasets can have large (into the hundreds) numbers of authors, in which case an organizational name might be used.
  • Date: The year of publication or, occasionally, when the dataset was finalized.
  • Title: As is the case with articles, the title of a dataset should help the reader decide whether your dataset is potentially of interest. The title might contain the name of the organization responsible, or information such as the date range covered.
  • Publisher: Many standards split the publisher into separate producer and distributor fields. Sometimes the physical location (City, State) of the organization is included.
  • Identifier: A Digital Object Identifier (DOI), Archival Resource Key (ARK), or other unique and unambiguous label for the dataset.

Common Additional Elements

  • Location: A web address from which the dataset can be accessed. DOIs and ARKs can be used to locate the resource cited, so this field is often redundant.
  • Version: May be necessary for getting the correct dataset when revisions have been made.
  • Access Date: The date the data was accessed for this particular publication.
  • Feature Name: May be a formal feature from a controlled vocabulary, or some other description of the subset of the dataset used.
  • Verifier: Information that can be used to make sure you have the right dataset.
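As promised above, here is a sketch of how these elements might be assembled into a citation string. The ordering and punctuation loosely follow the common Creator (Year): Title. Version. Publisher. Identifier pattern, but styles vary between communities, so treat this as one plausible rendering rather than a standard; the example values are fabricated.

```python
# One plausible rendering of a dataset citation; styles differ on order and punctuation.
def format_data_citation(creators, year, title, publisher, identifier,
                         version=None, access_date=None):
    citation = f"{'; '.join(creators)} ({year}): {title}."
    if version:
        citation += f" Version {version}."
    citation += f" {publisher}. {identifier}"
    if access_date:
        citation += f" (accessed {access_date})"
    return citation

print(format_data_citation(
    creators=["Kratz, John", "Strasser, Carly"],
    year=2014,
    title="Example survey responses",
    publisher="Example Data Repository",
    identifier="http://doi.org/10.9999/FAKE.EXAMPLE",   # fabricated identifier
    version="2",
    access_date="2014-05-01",
))
```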

Complications

Datasets are different from journal articles in ways that can make them more difficult to cite. The first issue is deep citation or granularity, and the second is dynamic data.

Deep Citation

Traditional journal articles are cited as a whole, and it is left to the reader to sort through the article to find the relevant information. When citing a dataset, more precision is sometimes necessary. If an analysis was done on part of a dataset, it can only be repeated by extracting exactly that subset of the data. Consequently, there is a desire for mechanisms allowing precise citation of data subsets. A number of solutions have been put forward:

  • Most common and least useful is to describe how you extracted the subset in the text of the article.

  • For some applications, such as time series, you may be able to specify a date or geographic range, or a limited number of variables, within the citation.

  • Another approach is to mint a new identifier that refers to only the subset used, and refer back to the source dataset in the metadata of the subset. The DataCite DOI metadata scheme includes a flexible mechanism to specify relationships between objects, including that one is part of another.

  • The citation can include a Universal Numeric Fingerprint (UNF) as a verifier for the subset. A UNF can be used to test whether two datasets are identical, even if they are stored in different file formats. This won’t help you to find the subset you want, but it will tell you whether you’ve succeeded.
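The real UNF algorithm involves careful normalization and rounding rules (it is used by the Dataverse project, among others), so the sketch below is only a toy illustration of the underlying idea: fingerprint a canonical representation of the values rather than the file bytes, so that the same subset yields the same fingerprint no matter what format it is stored in.

```python
# Toy content fingerprint (NOT the real UNF algorithm): hash a canonical form of
# the values, not the file bytes, so format differences don't change the result.
import hashlib

def toy_fingerprint(rows):
    """rows: iterable of dict records; returns a short hex digest."""
    digest = hashlib.sha256()
    for row in rows:
        # Canonicalize each record: sorted keys, numeric values normalized to float.
        canonical = "|".join(
            f"{key}={float(value) if isinstance(value, (int, float)) else value}"
            for key, value in sorted(row.items())
        )
        digest.update(canonical.encode("utf-8") + b"\n")
    return digest.hexdigest()[:16]

# The same subset parsed from, say, CSV and JSON yields the same fingerprint.
subset_from_csv = [{"site": "A", "temp": 12.5}, {"site": "B", "temp": 13}]
subset_from_json = [{"temp": 12.5, "site": "A"}, {"temp": 13.0, "site": "B"}]
assert toy_fingerprint(subset_from_csv) == toy_fingerprint(subset_from_json)
```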

Dynamic Data

When a journal article is published, it's set in stone. Corrections and retractions are rare occurrences, and small errors like typos are allowed to stand. In contrast, some datasets can be expected to change over time. There is no consensus as to whether or how much change is permitted before an object must be issued a new identifier. DataCite recommends, but does not require, that DOIs point to a static object.

Broadly, dynamic datasets can be split into two categories:

  • Appendable datasets get new data over time, but the existing data is never changed. If timestamps are applied to each entry, inclusion of an access date or a date range in the citation may allow a user to confidently reconstruct the state of the dataset. The Federation of Earth Science Information Partners (ESIP), for instance, specifies that an appendable dataset be issued a DOI only once, with a time range specified in the citation. On the other hand, the Dataverse standard and DCC guidelines require new DOIs for any change. If the dataset is impractically large, the new DOI may cover a "time slice" containing only the new data. For instance, each year of data from a sensor could be issued its own DOI.

  • Data in revisable datasets may be inserted, altered, or deleted. Citations to revisable datasets are likely to include version numbers or access dates. In this case, ESIP specifies that a new DOI should be minted for each "major" but not "minor" version. If a new DOI is required for each version, a "snapshot" of the dataset can be frozen from time to time and issued its own DOI.
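To make the contrast concrete, here is a toy sketch of how a repository might encode an ESIP-like identifier policy in code. The function and the change categories are invented for illustration; a Dataverse- or DCC-style policy would simply return True for any change.

```python
# Invented policy sketch: when does a change to a dataset warrant a new DOI?
def needs_new_doi(change_type):
    """change_type: 'append', 'minor_revision', or 'major_revision'."""
    policy = {
        "append": False,           # same DOI; cite with an access date or time range
        "minor_revision": False,   # same DOI; note the version in the citation
        "major_revision": True,    # freeze a snapshot and mint a new DOI
    }
    if change_type not in policy:
        raise ValueError(f"unknown change type: {change_type}")
    return policy[change_type]

assert needs_new_doi("major_revision") and not needs_new_doi("append")
```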

Hello Data Publication World

[Image from foodbeast.com]

Greetings, all. I'm a new postdoc at the CDL and I'm very excited to be spending the next couple of years thinking about data publication. Carly has discussed data publication several times before, but briefly, the goal is to improve dataset reproduction and reuse by publishing datasets as "first class" scholarly objects akin to journal articles, with the attendant opportunities for preservation, citation, and award of credit.

I spent most of grad school tickling worms with an eyebrow hair glued to a toothpick (this is true), but now I’m moving from lab to library as a CLIR/DLF Postdoctoral Fellow in Data Curation for the Sciences and Social Sciences.  The Sloan Foundation funds these fellowships to, as Josh Greenberg puts it, train “professionals with one foot in research and one foot in data curation”.

Partly for my own edification, I'm starting with a thorough survey of the data publication landscape. I'll be looking at current practices and proposals for data publication, citation, and peer review. I'm interested in questions like: How can the quality of a dataset be evaluated? How does the creator of a dataset get credit for it? How do datasets remain findable, accessible, and usable in the future? Does it even make sense to apply the terms "publication" or "peer review" to data at all?

Where things go from there depends on how the survey turns out, so that’s much more up in the air.  One possibility is to put a workflow for data publication together from existing tools.  Another is to identify a need not met by existing tools that the CDL could address.

If you have ideas you’d like to share, please comment here or email me.

Shameless Plug: Applications for 2014 CLIR/DLF Fellowships are opening soon!
