Thursday, October 13, 2016

Dataset metadata for search engine optimization

Last week I wrote a post on metadata. Google is experimenting with a new metadata schema, which it calls Science Datasets, that is intended to make public datasets easier to discover.

The mechanism is still under development, and Google is currently looking for interested parties who have the following kinds of public data:

  • A table or a CSV file with some data
  • A file in a proprietary format that contains data
  • A collection of files that together constitute some meaningful dataset
  • A structured object with data in some other format that you might want to load into a special tool for processing
  • Images capturing the data
  • Anything that looks like a dataset to you

Your metadata can use any of the schema.org Dataset properties, but it should contain at least the following basic properties: name, description, url, sameAs, version, keywords, variableMeasured, and creator.name. If your dataset is part of a corpus, you can reference that corpus in the includedInDataCatalog property.
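
To make the list of basic properties concrete, here is a minimal sketch in Python that assembles them into a schema.org Dataset record and serializes it as JSON-LD. The helper name build_dataset_metadata, the example.org URLs, and all sample values are hypothetical placeholders, and the choice of JSON-LD as the serialization is my assumption rather than something Google's announcement prescribes.

```python
import json


def build_dataset_metadata(name, description, url, same_as, version,
                           keywords, variables, creator_name,
                           catalog_name=None):
    """Assemble the basic schema.org Dataset properties into a dict.

    Hypothetical helper for illustration; not part of any Google tool.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "sameAs": same_as,
        "version": version,
        "keywords": list(keywords),
        "variableMeasured": list(variables),
        # creator.name in the text corresponds to a nested creator object
        "creator": {"@type": "Person", "name": creator_name},
    }
    # Only datasets that belong to a corpus need this property.
    if catalog_name is not None:
        record["includedInDataCatalog"] = {
            "@type": "DataCatalog",
            "name": catalog_name,
        }
    return record


# All values below are made-up placeholders.
metadata = build_dataset_metadata(
    name="Example rainfall measurements",
    description="Daily rainfall totals from a fictional weather station.",
    url="https://example.org/datasets/rainfall",
    same_as="https://example.org/mirror/rainfall",
    version="1.0",
    keywords=["rainfall", "weather"],
    variables=["daily rainfall (mm)"],
    creator_name="Jane Doe",
    catalog_name="Example Weather Corpus",
)

json_ld = json.dumps(metadata, indent=2)
print(json_ld)
```

The printed JSON can then be published alongside the dataset's landing page. Keeping the record as a plain dict makes it easy to add further schema.org properties later.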

There are also properties for download information, temporal coverage, spatial coverage, citations and publications, and provenance and license information.
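
As a rough illustration of these optional properties, the Python sketch below adds download, temporal coverage, spatial coverage, citation, and license information to a Dataset record. The property names used here (distribution, temporalCoverage, spatialCoverage, citation, license) are the ones I understand schema.org to define; they are not listed in the announcement, so treat the mapping as an assumption. The helper add_optional_properties and every sample value are hypothetical.

```python
import json


def add_optional_properties(record, download_url, encoding_format,
                            start_date, end_date, place_name,
                            citation, license_url):
    """Return a copy of a Dataset record with optional properties added.

    Hypothetical helper; property names are assumed from schema.org.
    """
    enriched = dict(record)  # shallow copy, so the input is left unchanged
    enriched["distribution"] = {
        "@type": "DataDownload",
        "contentUrl": download_url,
        "encodingFormat": encoding_format,
    }
    # Temporal coverage written as an ISO 8601 interval: start/end
    enriched["temporalCoverage"] = f"{start_date}/{end_date}"
    enriched["spatialCoverage"] = {"@type": "Place", "name": place_name}
    enriched["citation"] = citation
    enriched["license"] = license_url
    return enriched


# Placeholder record and values, for illustration only.
base = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example rainfall measurements",
}

full = add_optional_properties(
    base,
    download_url="https://example.org/datasets/rainfall.csv",
    encoding_format="text/csv",
    start_date="2015-01-01",
    end_date="2015-12-31",
    place_name="Example City",
    citation="Doe, J. (2016). Example rainfall measurements.",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)

print(json.dumps(full, indent=2))
```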

Adding this metadata is a worthwhile effort that makes your research and public datasets more useful to the community.
