Dataset selection procedures
The PDL is based on a model in which multiple copies of datasets are stored in full indefinitely. Data storage, however, is not an infinite resource, so the PDL operational system is limited in the size and number of datasets that it can handle. Procedures are therefore needed to decide which datasets should be included. The PDL is designed for base datasets suitable for future re-use in other applications, rather than data reworked specifically for a single research publication (sometimes termed 'data behind the graph').
PDL datasets will be of two types:
- Datasets that have been ingested into the BODC system and subsequently exported. Candidate datasets of this type will be identified through negotiation between the scientists who supplied the data and the BODC data scientists responsible for their ingestion, in consultation with BODC management. The technical quality of these datasets is BODC's responsibility.
- Datasets that have not yet been ingested into the BODC system but are destined for future ingestion. Again, candidate datasets will be identified through negotiation between data originators and the BODC data scientists responsible for the data. The data originator is responsible for bringing the technical quality of these datasets (including metadata) up to a standard deemed acceptable by BODC. BODC will also judge the acceptability of candidate datasets in terms of their completeness, but not in terms of their scientific quality or value.
There is a risk that a service of this type could be swamped by demand, and potential contributors to the PDL should be prepared for possible disappointment, particularly for large-volume datasets or if deadlines are short.
Suppliers to the BODC PDL should be aware that it lies outside the full BODC access control system. Datasets are publicly accessible with no need for users to register or log in. Consequently, user download monitoring is limited to 'what' and 'when' access statistics.