The Digital Curation Centre, based in the UK, has a handy section of their website on Disciplinary Metadata Standards. I was pretty darn excited to see that they took on the onerous task of helping researchers navigate the dark and stormy waters of metadata. I tweeted about it earlier this week and had big plans for referring scientists and librarians to this site. This morning, I took some time to look over the site and was a bit… disappointed. Let me explain.

Navigating metadata is hard. We need better tools and advice. Original image (Rembrandt’s Christ in the Storm on the Lake of Galilee) from Wikipedia.
First and foremost, the DCC is awesome. They have been a seemingly bottomless source of information and resources for me since I ventured into this data curation world. Everyone I’ve met from the DCC has been both congenial and helpful.
The DCC website has a list of metadata standards, broken out by discipline. There is a handy tag cloud for sussing out those in the list that might be most applicable to you. But that's where the clarity and ease stop. Once you dig deeper into the metadata standards, there are links to websites that are completely indecipherable to the average Joe (e.g., me). Jargon is abundant, and navigating these sites depends, at best, on an intimate knowledge of the metadata standard; at worst, it is erratic and inexplicable.
To be fair, my use case for this website might not be the one the DCC had in mind. Here is my idea for how someone might use this site: I imagine that a researcher is writing their data management plan for the NSF. They get to the question asking about what metadata standards they will use. They turn to the internet for help and end up on the DCC site. They are a hydrologist, so they go to the “Earth Science” section of the site and select “Hydrology” from the tag cloud to narrow down the list. They are now faced with two links, one of which takes them to a second list of metadata standards. The links associated with the metadata standards take the researcher to external websites that are not always obviously helpful. At this point, the researcher is cranky and tired, picks one of the standards at random, and moves on to the next part of the data management plan.
In all of my use cases for this site, a researcher, research assistant, librarian, or grant writer is sifting through these standards, trying to make decisions without an easy way to compare and contrast the potential metadata standards. I admit that there are no easy answers or solutions when it comes to many parts of data management and curation; however, I don't think that needs to be reflected in the resources that we provide.
My frustration is actually much larger than the DCC metadata website: we have no easy way for researchers to start understanding and creating metadata. I touched on this a bit in a Data Pub post last year about the communication difficulties among Nerds, Geeks and Dweebs. Basically, the folks who create the metadata, and the websites housing information about it, are not communicating clearly with the researchers, who are, in theory, going to be the creators of this metadata. Furthermore, the tools available for assisting in metadata creation are generally buggy, poorly documented, and not user-tested.
I have high hopes for the libraries and curation communities to develop great tools for researchers to navigate the new-to-them world of data stewardship. I think this DCC site is a great step in the right direction: it's collecting disparate information into a single location and organizing that information in a sensible way. However, we have a long way to go before I can say with confidence that we have “good tools available for researchers to create metadata”.

This work, unless otherwise expressly stated, is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.


You mean you don't find it helpful to be linked to a standard with an intuitive name like ISO 19115? After all, if you want to know more about that standard you can simply buy the PDF for a mere CHF 224.00. That's roughly 237 US dollars, surely a bargain for something described as “Geographic information — Metadata”. (http://www.iso.org/iso/catalogue_detail.htm?csnumber=26020)
I agree with what you say, and I think the problem is even more complicated. Or maybe… it isn't? My point is that you don't usually have the luxury of picking the standard that seems to fit best (right discipline! fabulous tools!). Aside from creating your own personal metadata library, you pretty much have to use whatever is required by the repository where you're putting your data. If that's KNB, you use EML. If it's a GIS data repository, it's ISO 19115 (or FGDC-CSDGM). You could (and I've done this) create a metadata record according to any standard you like and deposit it as a supplementary file if the repository you choose supports that, but you still need to create metadata according to whatever the repository's requirements are. So, perhaps the *choices* are not so overwhelming after all? The tools issue, I'll concede, is harder for many standards if the repositories themselves don't have reasonable tools or interfaces.
I totally agree that the easiest way to figure out what metadata to use is to ask your favorite repo. However, those who are new to data management often don't know about that step… Ideally, they would be in touch with an expert librarian like yourself. So perhaps it's back in the outreach camp?
Great conversation starter, Carly. I agree with Gail's point, as you do, but I've noticed that in many (too many) cases, the “sharing” strategy of investigators is either posting data to their personal web page or stating that they will make data available by request. Sadly, both of these are perfectly acceptable options to proposal reviewers reading over a data management plan. In these cases, there are no guiding principles for the investigator in terms of selecting a metadata schema, and they are unlikely to share their data anyway (regardless of the “requirement” from NSF).
I think the choices are still somewhat overwhelming to investigators, but the actual process of creating metadata (in XML, god forbid!) is a far greater hurdle. We need more consensus within each disciplinary community on a standard schema for that group, and we certainly need better tools to help researchers create useful metadata. The reality is that creating metadata is a huge amount of work, and it's not a priority for many PIs. If we want any chance of a widespread commitment within the scientific community to share data in a productive way, we need to create and market tools that make the process easier.
Great point Gail, and I can see where you are coming from. But for accuracy’s sake, let me say that many repositories support multiple metadata standards, and some support arbitrary metadata standards. The KNB, for example, supports arbitrary metadata standards (anything expressed in XML), and can easily be used to house EML, FGDC, ISO19115, Dublin Core, etc. That’s how we house such diverse metadata as EML and Kepler’s MoML workflow specifications in the same repository.
Two thumbs up for KNB's standards-agnostic infrastructure. I think, though, that Carly's point (or at least mine) is that there is still a great gulf between what is *possible* and what is actually *easy* for researchers to do themselves.
Gail – you hit the nail on the head. This was the disconnect I wrote about in a previous blog post (referenced in this one). Although tools exist, and there is flexibility and help available, there is a complete lack of effective communication among the different groups. Researchers need easy, which requires good communication.
Great post, Carly.
I completely agree that more mediation is needed. Collating resources and doing some basic filtering is only one initial step. Hopefully researchers will be able to draw on local support and experts from data repositories to help them to understand and navigate the dark and stormy waters of metadata further.
There's definitely a need for more outreach so people know where they can turn for support.
Sarah, DCC
Great topic. I just wanted to second Sarah’s point here about local support. A model that allows for an “embedded” data person in the research environment would be tremendously helpful in many cases. Researchers can’t (really) be expected to do all this on their own, and the library or repository experts who can help them are not always on their radar.
-Limor
ISPS, Yale University