Data policy

The following data policy was written and agreed by the MarProd steering committee. It provides the framework for short term and long term management of data and samples arising from the Marine Productivity thematic programme. A PDF copy of the data policy (214 KB) is also available for download.


  1. Introduction
  2. Role of BODC
  3. Links with GLOBEC
  4. Minimum standards of stewardship for NERC corporate data
  5. Data and sample acquisition
  6. Data formats and data media
  7. Data backup policy
  8. Protection of data originators' Intellectual Property Rights
  9. Long term sample curation
  10. Data and sample availability
  11. Identifying data and samples for management purposes

1. Introduction

NERC requires all thematic programmes to carefully look after the data they collect - for the mutual benefit of programme participants (while the programme is running), and also to provide wider access and exploitation of data sets of long term importance. Properly managed environmental data provides a key NERC resource, which will be used long after the formal end of individual projects and programmes. The scale of effort dedicated to data stewardship should reflect the anticipated long term value of the data.

In the context of NERC Data Policy, "data may be held in either analogue or digital form and be stored either on paper or a variety of computer-compatible media; physical specimens in curated collections are outside the usual sense of the word". Nevertheless, it is appropriate for sample management issues to be also considered here, making clear which aspects apply in different sections.

2. Role of BODC

The British Oceanographic Data Centre (BODC) is the NERC Designated Data Centre for digital information arising from the Marine Productivity programme. BODC has provided data management services for many other multi-laboratory and multidisciplinary marine programmes, NERC and non-NERC (e.g. BOFS, LOIS, PRIME and OMEX), and has delivered high quality, accessible data sets, primarily via CDs, for further scientific use. Costs have been allocated in the MarProd budget for BODC services, and Dr Gwen Moncoiffé has been appointed contact person for liaison with the Steering Committee and individual projects.

BODC effort will focus on the quality control, integration and long term stewardship of data sets obtained from MarProd-supported cruises, under Phase 2 of the programme. For Phase 1 data and laboratory-derived data and models, BODC will not necessarily have such close involvement. Thus for non-cruise data, BODC is not expected to undertake quality control, and other data stewardship arrangements may apply. For Phase 1 awards that involve joint work with non-NERC bodies, project-specific data storage and access procedures may need to be negotiated. Nevertheless, BODC needs to be aware of the totality of data arising from the MarProd programme, and the programme Steering Committee needs to be assured that similar quality standards for data stewardship (see paragraph 4 below) apply to all data collected through MarProd support.

The programme recognizes the importance of direct involvement of BODC staff in planning MarProd fieldwork, including the opportunity to attend cruise planning meetings and relevant agenda items of Steering Committee meetings. BODC cruise participation is also welcomed (subject to berth availability), particularly when novel and/or relatively complex sampling arrangements are involved.

3. Links with GLOBEC

Some additional data management considerations result from the Marine Productivity programme being the main UK contribution to the international GLOBEC project (Global Ocean Ecosystem Dynamics). Whilst it is not a condition of award, all MarProd PIs are encouraged to provide the following information to GLOBEC, for international access: i) a basic project description; ii) metadata (the coverage, scope and derivation of a data set); and iii) information on how actual data can (currently or in future) be obtained.

A DIF (Directory Interchange Format) entry should be used to provide this summary information to the GLOBEC International Project Office. Several projects have already done so, and their DIFs can be accessed via the GLOBEC web page (click on Data in the Menu Bar, then Metadata Portal etc with UK selected as Location). Hester Willson will be pleased to give further guidance on this matter.

4. Minimum standards of stewardship for NERC corporate data

The following minimum standards are expected to apply when (digital) data sets form part of NERC's enduring data resource:
  • The ownership and Intellectual Property Rights to the data set must be established, and NERC's policy towards exploiting and making it available to third parties agreed
  • The data set must be catalogued to the level of detail required by a NERC Designated Data Centre, so that it can be mentioned in web-based NERC data catalogues
  • Formal responsibility for the custody of the data set must be agreed
  • The data must be fully "worked up" (i.e. calibrated, quality-controlled etc) with sufficient associated documentation to be of use to third parties without reference to the original collector
  • The technical details of how the data are to be stored, managed and accessed must be agreed and suitably documented
  • The technological implications must be established (digital data stewardship implies the need for an underlying infrastructure of IT equipment and support)
  • The resources needed to carry out these intentions over the planned life of the data, in terms of staff (whether in project teams or the Data Centre) and IT equipment/infrastructure must be estimated and sources identified.
  • A review mechanism must exist to reconsider periodically the costs and benefits of continuing to maintain the data. The intention to destroy or put at risk data should be publicised in advance, allowing time for response by interested parties.

The above NERC-wide requirements will be looked after "automatically" for the MarProd data sets managed by BODC. Nevertheless, PIs need to be aware of this framework, particularly if alternative means of long term data stewardship are envisaged.

5. Data and sample acquisition

A well-structured and user-friendly identification system is essential for cruise-based data collection and sample labelling. Such arrangements are traditionally the responsibility of the cruise Principal Scientist. For Marine Productivity, an overall consistency in approach is necessary - with cruise identifiers linked to unique combinations of site and station (gear cast) numbers. MarProd protocols are being developed in the context of the first North Atlantic cruise (Discovery 258; Nov-Dec 2001) with direct BODC involvement. Further information is provided in the D258 Cruise Report.
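
As an illustration only, the idea of linking a cruise identifier to a unique combination of site and station (gear cast) numbers might be sketched as follows. The class name, field types and label layout are assumptions made for this example; the actual MarProd identification protocol is the one described in the D258 Cruise Report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampleId:
    """Hypothetical composite identifier: cruise + site + station (gear cast)."""
    cruise: str   # e.g. "D258" for Discovery 258
    site: int     # site number within the cruise (assumed to be an integer)
    station: int  # station / gear-cast number (assumed to be an integer)

    def label(self) -> str:
        # Assumed label layout, chosen only for this example.
        return f"{self.cruise}-S{self.site:03d}-C{self.station:04d}"


def register(registry: set, sample_id: SampleId) -> None:
    """Add an identifier, rejecting duplicates so each combination stays unique."""
    if sample_id in registry:
        raise ValueError(f"duplicate identifier: {sample_id.label()}")
    registry.add(sample_id)


registry: set = set()
register(registry, SampleId("D258", 1, 12))
register(registry, SampleId("D258", 1, 13))
```

Rejecting a repeated cruise/site/station combination at registration time is one simple way to keep sample labels and data records consistent across a cruise.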

Sorting procedures and sample preservation protocols are being developed for biological material collected on MarProd cruises. Whilst formaldehyde solution is used for bulk zooplankton samples (from ARIES and Dual Methot nets), other preservation techniques apply to sub-samples for specific purposes; for example, liquid nitrogen for lipid work, ethanol for molecular biology and genetics, and -20°C freezing for isotope ratio analyses. Investigators with other needs should discuss their proposed preservation methods with the cruise Principal Scientist. The long term curation of biological material is discussed under paragraph 9 below.

Station identifiers, navigational information and "basic" oceanographic data (for which BODC will have quality-control responsibilities) must be provided to BODC by the Principal Scientist immediately after a MarProd cruise. Normal practice will be for BODC to meet the ship when it docks and to take delivery of this material together with a copy of the logs, calibration data and sensor information. A copy of the Cruise Summary Report (ROSCOP form) should be provided to BODC by the Principal Scientist within one working week of the end of the cruise. A copy of the full cruise report should also be sent to BODC, preferably electronically, as soon as it is completed. BODC will then assist the MarProd programme in making this more widely available (e.g. via a link from the main programme web site).

Processed and project-specific cruise data must be provided to BODC by the Principal Scientist and project teams as they become available, not in the concluding few months or weeks of projects. However, great importance is given both by the programme and by BODC to protecting the interests of data originators, and restrictions on the wider availability of BODC-held data sets will therefore apply (see paragraphs 8 and 10 below).

6. Data formats and data media

Digital data should be collected and stored using standard, widely-available software products and their related data formats. Whilst BODC has experience in handling a very wide range of software, formats and media, Investigators should discuss with them the proposed use of any data-handling or storage protocols that might be regarded as "non-standard".

CD-ROMs are currently the preferred means for making integrated data products from marine thematic programmes available to the wider research community. The MarProd Steering Committee will advise on the number of CDs, and set target times for their release.

7. Data backup policy

Daily backup programmes apply at BODC (and other NERC Designated Data Centres) to safeguard major digital databases. Project PIs and Co-Is are responsible for providing appropriate back-up strategies for unique digital data stored locally and/or via other organisations.

As far as possible, analogue data (such as photographs) should be "disaster proofed" by transferring them into digital form, e.g. by scanning. Such duplication is not a waste of effort, even though the original, analogue version may have a longer lifetime than the format/media used for the digital transcription. Such data may then be included on a programme CD. Note that BODC has considerable experience in managing and publishing image data.

8. Protection of data originators' Intellectual Property Rights

The following arrangements have been developed to ensure an appropriate balance between the protection of data originators' intellectual property rights and the potential benefits that may arise via data use by the programme, the wider research community and other interested parties:

  • All data collected in the Marine Productivity programme through NERC funding and provided to BODC should be freely available to all programme participants (PIs and Co-Is) for MarProd purposes on the condition that the originator is kept informed about how the data are being used and is duly acknowledged in any exploitation of that data
  • Due acknowledgement is considered to be co-authorship, specific reference to the data source or a share of any financial reward. The form of this should be negotiated between the data originator and the data exploiter. If a dispute should arise, then the problem will be referred to the Steering Committee for resolution.
  • Until MarProd data enter the public domain, BODC will not transfer them to parties outside the programme without the explicit agreement of the originator. Steering Committee advice will also need to be sought if major data transfers are involved, to avoid compromising the interests of other programme participants.
  • The mechanism for entry into the public domain is expected to be the release of the MarProd CD-ROM at the conclusion of the programme.
  • A condition of CD-ROM usage is that it is regarded as a data publication and all usage of the data contained therein should acknowledge the data originator through citation.

9. Long term sample curation

Biological material obtained on MarProd cruises is owned collectively by the programme. However, during the programme lifetime, sample-originators have responsibility for the stewardship of material, recording any removals and (if shared with other research groups) keeping track of its movements and usage.

It is recognised that indefinite storage of all biological material is impractical, and that some identification and analytic procedures require sample destruction. Nevertheless, it is expected that nearly all net-collected zooplankton and representative sub-samples of micro-plankton will be stored for the duration of the programme by sample-originators. (Exceptions may be made for larger-size zooplankton, e.g. jellyfish, and organisms removed for biochemical analyses and experimental purposes.) Subsequently, long term archiving (of at least 5-10 years) will be arranged by the programme for as many samples as possible, to maximise the exploitation of the taxonomic information that they contain.

Before MarProd researchers dispose of biological material in their possession, an assessment should therefore be made as to whether it might be of value to other groups, not necessarily part of the MarProd programme.

10. Data and sample availability

It is NERC policy to ensure that "individual scientists, principal investigator teams and participants in programmes will be permitted a reasonable period to work exclusively on, and publish the results of, the data collected by such individuals and teams". Nevertheless, as the MarProd programme develops, there is necessarily a sequential widening of access to data and samples. This process has already been outlined with respect to data under paragraph 8 above. It can be generalised with reference to three access levels:

Level 1 (Project). Availability limited to the investigators responsible for data/sample collection (for MarProd cruises, data/sample collection is expected to be a shared responsibility; thus group ownership applies, under overall control of the Principal Scientist); any wider sharing is at the discretion of the investigators.

Level 2 (Programme). When data are transferred to BODC, their availability is automatically extended to other investigators within the MarProd programme. Nevertheless, their further use is still under the control of the data originator, and any wider sharing is at the discretion of the MarProd Steering Committee.

Level 3 (Public). Data publication, at or near the end of the programme. Availability extended to external users, either openly (for academic use) or at the discretion of BODC/NERC (for commercial exploitation, in consultation with data-originators). Post-programme availability of biological material to be controlled by the body responsible for its archiving, on the basis that 'ground-rules' will then have been established by the Steering Committee, and that sample-originators will be consulted wherever practicable.

It is to the benefit of the programme as a whole that inter-project collaborations are developed under Level 1, and that the transition between Levels 1 and 2 is made as rapidly as possible.
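
The sequential widening of access across the three levels in this section can be summarised in a short sketch. This is an illustration under stated assumptions, not a description of any BODC system: the names AccessLevel, Requester and has_default_access are hypothetical. The sketch covers only default access, and leaves out discretionary sharing by investigators or the Steering Committee and the separate treatment of commercial use at Level 3.

```python
from enum import IntEnum


class AccessLevel(IntEnum):
    PROJECT = 1    # Level 1: collecting investigators only
    PROGRAMME = 2  # Level 2: data held by BODC, open to MarProd participants
    PUBLIC = 3     # Level 3: published at or near the end of the programme


class Requester(IntEnum):
    ORIGINATOR = 1             # investigator responsible for collecting the data
    PROGRAMME_PARTICIPANT = 2  # other MarProd PI or Co-I
    EXTERNAL = 3               # user outside the programme


def has_default_access(level: AccessLevel, requester: Requester) -> bool:
    """Return True if the requester has access at this level without further agreement."""
    # Each step up in level opens the data to one wider group of requesters.
    return requester.value <= level.value
```

For example, has_default_access(AccessLevel.PROGRAMME, Requester.EXTERNAL) is False in this sketch, reflecting the rule that BODC does not transfer data outside the programme before publication without the originator's explicit agreement.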

11. Identifying data and samples for management purposes

It is important that the MarProd programme maintains an awareness of all data and samples collected through its support, including outputs from partnership arrangements (e.g. from use of non-NERC vessels, or through collaborations involving other funding agencies). Thus it is likely that a reporting system will be established to gain information on such data/samples, and their stewardship arrangements, if not via BODC. However, the Steering Committee is keen to minimise any duplication of reporting effort, and a version of the GLOBEC DIF system (see paragraph 3 above) may therefore be adopted for such purposes.