The objective of data management within a data centre is to ensure that data can be reused with confidence decades after their collection, without any need to communicate with the scientists who collected them. The PDL shares this objective.
Consequently, a dataset must meet the following technical criteria, based on good practice adopted across the NERC Environmental Data Centres, to be accepted for publication in the PDL:
- The format must be well-documented and conformant with widely accepted standards, such as ASCII or NetCDF. Preferably data formats should conform to internationally agreed content standards, such as CF-compliant NetCDF or SeaDataNet ASCII spreadsheet format.
- The format must be readable by tools that are freely available now and are likely to remain freely available indefinitely.
- Data files should be named in a clear and consistent manner throughout the dataset, with filenames (rather than pathnames) that reflect the contents and uniquely identify the file. Filename extensions should be appropriate to the file type. Filenames should be constructed from lower-case letters, numbers, dashes and underscores and be no longer than 64 bytes.
- Parameters in data files should be labelled either using an internationally recognised standard vocabulary specifically designed for labelling parameters, such as the BODC Parameter Usage Vocabulary or CF Standard Names, or using local labels that are accompanied by clear, unambiguous plaintext descriptions.
- Units of measure must be included for all parameters and labelled following accepted standards such as UDUNITS or the SeaDataNet units vocabulary.
- Data must be accompanied by the following XML metadata documents:
  - A Dublin Core metadata record, including the dc:title, dc:creator, dc:subject, dc:period, dc:description, dc:contributor, dc:date, dc:language and dc:coverage elements.
  - A discovery metadata record conforming to a recognised standard. Examples of such standards include:
    - European Directory of Marine Environmental Data (EDMED).
    - Global Change Master Directory (GCMD) Directory Interchange Format (DIF).
    - Marine Environmental Data and Information Network (MEDIN) ISO19139 discovery metadata profile.
    - Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM).
- Data must be accompanied by sufficient usage metadata to enable their reliable reuse. Some of this (such as spatio-temporal co-ordinates, parameter labels and units of measure) may be embedded within the data files. The remainder should be supplied as standard XML documents (e.g. SensorML or ISO19156) or as descriptive documents formatted in HTML or PDF.
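
Two of the criteria above, the filename rules and the list of Dublin Core elements, are mechanical enough to check before submission. The following is a minimal sketch of such a check, not a PDL tool: the function names `filename_problems` and `missing_dc_elements` are hypothetical. It assumes that a dot is permitted only to introduce a filename extension, and that the `dc:` prefix is bound to the Dublin Core elements namespace `http://purl.org/dc/elements/1.1/`. Neither assumption is stated in the criteria themselves.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: the dc: prefix is bound to the Dublin Core elements 1.1 namespace.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Element names exactly as listed in the criteria above.
REQUIRED_DC = ["title", "creator", "subject", "period", "description",
               "contributor", "date", "language", "coverage"]

# Assumption: lower-case letters, digits, dashes and underscores, with a dot
# allowed only to introduce an extension (e.g. ".nc" or ".tar.gz").
FILENAME_RE = re.compile(r"[a-z0-9_-]+(\.[a-z0-9]+)*")


def filename_problems(name: str) -> list[str]:
    """Return a list of ways in which `name` breaks the filename rules."""
    problems = []
    if "/" in name or "\\" in name:
        problems.append("contains a path separator")
    if len(name.encode("utf-8")) > 64:
        problems.append("longer than 64 bytes")
    if not FILENAME_RE.fullmatch(name):
        problems.append("uses characters outside a-z, 0-9, '-' and '_'")
    return problems


def missing_dc_elements(xml_text: str) -> list[str]:
    """Return the required dc: element names absent from the XML record."""
    root = ET.fromstring(xml_text)
    present = {element.tag for element in root.iter()}
    return [n for n in REQUIRED_DC if f"{{{DC_NS}}}{n}" not in present]
```

For example, `filename_problems("ctd_profile-2011_01.nc")` returns an empty list, whereas `filename_problems("CTD Profile.nc")` reports the disallowed characters. The element check tests only for presence; it does not judge whether the content of each element is adequate.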