Metadata and describing data

Metadata is documentation that describes data. It can be embedded and machine-readable, like the metadata used in web pages, databases, and library catalogs, or it can be presented in separate documents such as manifests and codebooks. Often it is both.
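As an illustration of embedded, machine-readable metadata, the sketch below reads the `<meta>` tags from a web page using Python's standard `html.parser` module. The sample page, its field values, and the `MetaTagCollector` class name are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags carry the embedded metadata we are after.
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.metadata[name] = content


# Hypothetical landing page for a dataset (all values are placeholders).
sample_page = """<html><head>
<title>Example dataset landing page</title>
<meta name="description" content="Hourly stream temperature readings">
<meta name="author" content="A. Researcher">
<meta name="keywords" content="hydrology, temperature">
</head><body></body></html>"""

collector = MetaTagCollector()
collector.feed(sample_page)
page_metadata = collector.metadata

for name, content in page_metadata.items():
    print(f"{name}: {content}")
```

Because the metadata travels inside the page itself, any program that can parse HTML can recover it without a separate document.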

Properly describing and documenting data allows users (including the researchers themselves) to understand and track important details of the work. Beyond describing the data, metadata also makes the data easier to search for and retrieve once it is deposited in a data repository.

In a lab setting, much of the content used to describe data is initially collected in a notebook; metadata is a more formal, sharable expression of this information. It can include content such as contact information, geographic locations, details about units of measure, abbreviations or codes used in the dataset, instrument and protocol information, survey tool details, provenance and version information, and much more. Where no appropriate formal metadata standard exists, writing “readme”-style metadata for internal use is an appropriate strategy.
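A “readme”-style record can be as simple as a plain-text file of labeled fields. The sketch below is one possible way to generate such a file in Python; the `build_readme` helper, the field labels, and every value shown are hypothetical placeholders, not a prescribed template.

```python
def build_readme(fields):
    """Render metadata fields as plain text, one aligned 'Label: value' per line."""
    width = max(len(label) for label in fields)
    lines = []
    for label, value in fields.items():
        key = label + ":"
        # Pad the label so the values line up in a column.
        lines.append(f"{key:<{width + 1}} {value}")
    return "\n".join(lines) + "\n"


# Hypothetical example values covering the kinds of content listed above.
readme_fields = {
    "Title": "Stream temperature survey",
    "Contact": "A. Researcher <researcher@example.edu>",
    "Location": "Example Creek, site 3",
    "Units": "temperature in degrees Celsius; depth in meters",
    "Codes": "NA = not measured; BD = below detection limit",
    "Instrument": "Handheld thermistor probe",
    "Version": "1.0",
}

readme_text = build_readme(readme_fields)
print(readme_text)

# To save the record next to the data, one could write it to a file:
# with open("README.txt", "w", encoding="utf-8") as handle:
#     handle.write(readme_text)
```

Keeping the fields in a dictionary makes it easy to reuse the same labels across datasets, which helps keep informal metadata consistent.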

Metadata formats and standards

Metadata can take many different forms, from free text to standardized, structured, machine-readable, extensible content. Specific disciplines, repositories, or data centers may guide or even dictate the content and format of metadata, possibly using a formal standard. Because creating standardized metadata can be difficult and time-consuming, the availability of tools that help generate it is also worth considering when selecting a standard (for example, Morpho for creating EML and Nesstar for DDI).

The Digital Curation Centre provides a catalog of common metadata standards, organized by discipline.

Some specific examples of metadata standards, both general and domain-specific, are:

Dublin Core - domain-agnostic, basic, and widely used metadata standard
DDI (Data Documentation Initiative) - common standard for social, behavioral and economic sciences, including survey data
EML (Ecological Metadata Language) - specific to ecological disciplines
ISO 19115 and FGDC-CSDGM (Federal Geographic Data Committee's Content Standard for Digital Geospatial Metadata) - for describing geospatial information
MINSEQE (Minimum Information about a high-throughput nucleotide SEQuencing Experiment) - genomics standard
FITS (Flexible Image Transport System) - Astronomy digital file standard that includes structured, embedded metadata
MIBBI - Minimum Information for Biological and Biomedical Investigations
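To make the idea of a structured standard concrete, the sketch below builds a small Dublin Core record as XML with Python's standard `xml.etree.ElementTree` module. The element names (`title`, `creator`, `date`, `description`, `format`) come from the Dublin Core element set and its `http://purl.org/dc/elements/1.1/` namespace; the `metadata` wrapper element, the `dublin_core_record` helper, and all values are assumptions made for illustration. A repository may require a different wrapper or additional elements.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML string holding one dc:* element per (name, value) pair."""
    # The wrapper element name is a hypothetical choice for this sketch.
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical dataset description (placeholder values).
record_xml = dublin_core_record([
    ("title", "Stream temperature survey"),
    ("creator", "A. Researcher"),
    ("date", "2020-01-15"),
    ("description", "Hourly stream temperature readings"),
    ("format", "text/csv"),
])
print(record_xml)
```

Because the record uses a shared vocabulary and namespace, software that understands Dublin Core can interpret the fields without knowing anything else about the dataset.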

Related information

Best Practices in Creating Metadata
Part of the ICPSR's Guide to Social Science Data Preparation and Archiving.
Minimum Information for Biological and Biomedical Investigations
MIBBI Project. Minimum Information guidelines from diverse bioscience communities.
DataONE Skillbuilding Hub - Description Best Practices

Data Services Librarian

Jim Kelly

He/him/his

Contact:

O’Shaughnessy-Frey Library | LIB 115

651-962-5012

Subjects: Accounting, Business, Business Analytics, Computer & Information Sciences, Data & Statistics, Economics, Engineering, Finance, Operations & Supply Chain Mgmt