The following provides the framework for the short-term and long-term management of data and samples arising from the RAPID-WATCH Programme. A PDF copy of the RAPID-WATCH data management plan is also available (169 KB).
A PDF copy of the RAPID (2001-2007) data management plan is available (243 KB).
RAPID-WATCH data management plan
- Role of RAPID Data Centre (RDC)
- Minimum standards of stewardship for NERC data
- Data acquisition
- Data formats and data media
- Data back-up policy
- RAPID-WATCH data policy (approved by the Programme Advisory Group)
- RAPID metadata protocol
NERC requires all Directed Programmes to plan and implement a data management scheme. The planning must cover the practical arrangements while the programme is running and the subsequent maintenance and long-term curation of the data sets. The latter is increasingly important in view of the Environmental Information Regulations, which place a duty on Government-funded bodies to make all publicly funded data readily and easily available.
The NERC Data Policy requires that all data are lodged with the appropriate NERC Designated Data Centre. In the context of RAPID-WATCH these are the British Oceanographic Data Centre (BODC) and the British Atmospheric Data Centre (BADC), the respective Designated Data Centres for Marine and Atmospheric Sciences. The minimum required standards of stewardship are summarised in section 3.
NERC provides funding to the Data Centres for basic infrastructure support and the long-term maintenance and curation of NERC's data assets. Programme budgets include the funds necessary for project data management within the life of the project. An integral part of the Data Management Plan is an obligation upon RAPID-WATCH Programme Principal Investigators (PIs) to ensure that data management is undertaken in a suitable way, and that adequate consideration is given to the "data side" of their work.
The data management policy as defined by the RAPID-WATCH Programme Advisory Group (PAG) steering group is outlined in section 8.
This plan has been formulated following a review of the specified resource requirements and outputs set out in the RAPID-WATCH Work Plan and a series of discussions between BODC/BADC and the project PIs in order to assess the scale of data production.
2. The Role of the RAPID-WATCH Data Centre (RDC)
As was the case in RAPID, submission of and access to data will be through a common 'portal' and for the purposes of RAPID-WATCH the term RAPID-WATCH Data Centre (RDC) will refer to BADC and BODC. Data management costs have been allocated in the RAPID-WATCH budget for RDC services.
The RDC will be the focal point for PIs regarding data issues. The RDC web site will contain inventories providing comprehensive, up-to-date information about the status of all project data sets and model runs, so that all RAPID-WATCH participants can easily request available data. The RDC will service data requests by RAPID-WATCH participants, and it is expected that automatic data download for observation series will be available from autumn 2009.
Following the completion of RAPID-WATCH the RDC will ensure that data are passed to the appropriate International Data Centres, ensuring that NERC meets its international obligations.
3. Minimum standards of stewardship for NERC data
The following minimum standards are expected to apply when digital data sets form part of NERC's enduring data resource:
- NERC's policy towards exploiting and making data available to third parties must be agreed at the outset.
- The data set must be catalogued to the level of detail required by a NERC Designated Data Centre, so that it can be mentioned in web-based NERC data catalogues.
- Formal responsibility for the custody of the data set must be agreed.
- The data must be fully "worked up" (i.e. calibrated, quality-controlled etc.) with sufficient associated documentation to be of use to third parties without reference to the original collector.
- The technical details of how the data are to be stored, managed and accessed must be agreed and suitably documented.
- The technological implications must be established (digital data stewardship implies the need for an underlying infrastructure of IT equipment and support).
- The resources needed to carry out these intentions over the planned life of the data, in terms of staff (whether in project teams or the Data Centre) and IT equipment/infrastructure must be estimated and sources identified.
- A review mechanism must exist to reconsider periodically the costs and benefits of continuing to maintain the data. The intention to destroy or put at risk data should be publicised in advance, allowing time for response by interested parties.
The above NERC-wide requirements, set out in the NERC Data Policy, will be looked after "automatically" for the RAPID-WATCH data sets managed by BODC and BADC. Nevertheless, PIs need to be aware of this framework.
4. Data acquisition
RAPID-WATCH data cover oceanographic data and the generation of model output. It is not the intention of this document to specify in detail how these data are collected, described and delivered to the data centres; however, a number of generic principles need to be adhered to.
Processed and project-specific data must be provided to the RDC by the Principal Scientist and project teams as they become available, not in the concluding few months or weeks of projects.
A well-structured and user-friendly identification system is essential for cruise-based data collection and sample labelling. Such arrangements are the responsibility of the cruise Principal Scientist. Station identifiers, navigational information and "basic" oceanographic data must be provided to the RDC by the Principal Scientist as soon as possible after a cruise. A copy of the Cruise Summary Report (ROSCOP form) should be provided to the RDC by the Principal Scientist within one working week of the end of the cruise. A copy of the full cruise report should also be sent to the RDC, electronically, as soon as it is completed. The RDC will then assist in making this more widely available (e.g. via a link from the main programme web site).
In the case of model data, the details for submission and serving will be agreed with individual PIs. Broad principles are given in section 5. In general, information accompanying submitted model data should include the model name and version number and a brief description of the model's general aim. See the metadata protocol for more detail.
Metadata are a crucial part of any data archive since they ensure that the data can be understood at a later date. To guarantee the quality of the RAPID-WATCH data archive, full documentation on all validated raw and processed data, as well as on models and model results, must be provided to the RDC. It is therefore essential that metadata are submitted at the same time as the data sets to which they pertain. The responsibility for producing the metadata will lie with project PIs and the RDC. A metadata protocol is outlined in section 9.
In addition to the standard metadata, investigators are encouraged to archive at the RDC all relevant information electronically, including references, papers, reports, etc., unless agreed otherwise between the PIs and the RDC.
6. Data formats and data media
Digital data should be collected and stored using standard, widely available software products and their related data formats. Whilst the RDC has experience in handling a very wide range of software, formats and media, Investigators should discuss with them at an early stage the proposed use of any data-handling or storage protocols that might be regarded as "non-standard".
In general, model data should be offered to the RDC in the recommended CF-compliant NetCDF format, although exceptions are possible (in particular, PP and HDF will also be accepted in extreme situations). Documentation on formats and conventions is available from the RDC, which also provides links to downloadable free software packages to support NetCDF file handling.
Submission of field data will continue as in RAPID, mainly via the computer network. Investigators should discuss the options for model data submission with the RDC at an early stage.
7. Data back-up policy
The consequences of losing data, due to having made insufficient or inappropriate provision for their back-up, are potentially catastrophic in the case of large data collections, and cumulatively serious in the case of smaller data sets. Rigid daily back-up programmes operate at the RDC and safeguard major digital databases. Provision and support of appropriate back-up strategies for digital data stored locally and/or via other organisations is the responsibility of project PIs and Co-Is, or their delegates.
PIs should bear in mind that the timely deposit of data with the RDC will provide additional security for the project data.
8. RAPID-WATCH data policy
An important aim of RAPID-WATCH is to ensure that the data from the RAPID observing system are made available to the wider climate change science community as soon as possible after collection. To facilitate this, the following data policy has been recommended by the Programme Advisory Group (PAG) in discussion with the PIs of the RAPID-WATCH observing system and the PIs of projects funded under the data exploitation AO, and agreed by the Project Executive Board (PEB). It will apply to data [1] from all projects funded through RAPID-WATCH:
- Data from the RAPID-WATCH observing system should be lodged with the RAPID-WATCH Data Centre (RDC) as soon as possible and in general no later than 6 months after acquisition [2], together with such metadata as are defined under the RAPID-WATCH Data Management Plan (DMP).
- Model output and data-model syntheses deemed to be of wider interest should be lodged with the Data Centre according to agreed schedules [3], and no later than the end of the project.
- Free and open access [4] to all RAPID-WATCH data will be available to anyone accepting the terms and conditions for data use via the Data Centre web portal.
- Users of RAPID-WATCH data are required to acknowledge RAPID-WATCH in any published work making use of the data. Within 3 years of the data being collected users are also expected to acknowledge the PI and/or co-workers (as appropriate) in any resulting papers.
- The RAPID-WATCH data policy will apply to all projects funded by NERC [5] under RAPID-WATCH (2008-2014), and to all data from the observing system at 26 North and the WAVE arrays in the Deep Western Boundary Current (DWBC), including data from the period funded by RAPID (2001-2007). Data from other projects funded under RAPID will continue to be subject to the RAPID data policy.
- PIs and/or co-workers failing to comply with the RAPID-WATCH data policy will be subject to appropriate sanctions.
[1] RAPID-WATCH data include all observations from the RAPID-WATCH observing system at 26 North and in the DWBC (including data acquired during RAPID 2001-2007); model output from projects funded under the RAPID-WATCH Exploitation AO and deemed to be of wider interest; data syntheses and data-model syntheses carried out as part of projects funded by RAPID-WATCH.
[2] As soon as possible after acquisition: the date of acquisition is the date on which the data were downloaded from an instrument; in the case of instruments recovered during a cruise, the acquisition date will be the end-date of the cruise. The time-scale for lodging data with the RDC may vary between data types; some data (for example, real-time data) could go directly to a data centre but the overall aim is to keep the time-scale as short as possible.
All data from measurements are to be calibrated and banked within 6 months, but in exceptional circumstances (agreed in advance with the PEB) data may be submitted later than this. This is to ensure that data acquired are available to the RAPID-WATCH and wider climate change communities on a time scale that allows data use to be considered as part of the observing system review in 2011.
[3] Schedules for delivery of model output agreed between the Data Centre, the PIs and the PEB will be linked to project milestones to ensure that where possible model output and model-data syntheses may be taken into consideration as part of the observing system review in 2011.
[4] Free and open access: unrestricted access for any use, including academic, non-profit and commercial, free of any charge except the cost of data provision.
[5] The RAPID-WATCH data policy will be applied to projects funded by NERC: PIs of collaborating projects in the USA and Canada will not be expected to comply with the terms of this data policy, but will be encouraged to do so, and the RDC will provide the necessary support to make this possible.
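As an illustration of the 6-month lodging rule described in the notes above, the following is a minimal Python sketch of how a project team might work out the latest lodging date from an acquisition date. The function names `add_months` and `lodging_deadline` are hypothetical, and the policy does not define how "6 months" is counted; the sketch assumes calendar months, with the day clamped to the end of shorter months.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    Assumption: if the target month is shorter than the start day,
    the result is clamped to the last day of the target month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def lodging_deadline(acquired: date, months_allowed: int = 6) -> date:
    """Latest date for lodging data with the RDC under the general rule.

    `acquired` is the date the data were downloaded from the instrument,
    or the cruise end date for instruments recovered during a cruise.
    """
    return add_months(acquired, months_allowed)


# Instrument recovered on a cruise ending 31 August 2009 (invented date).
cruise_end = date(2009, 8, 31)
print(lodging_deadline(cruise_end))  # 2010-02-28
```

Later submission in exceptional circumstances, agreed in advance with the PEB, is outside the scope of this sketch.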
9. RAPID-WATCH metadata protocol
The term metadata encompasses all the information necessary to interpret, understand and use a given dataset. Discovery metadata more particularly apply to information (keywords) that can be used to identify and locate the data that meet the user's requirements (via a Web browser, a Web-based catalogue, etc.). Detailed metadata include the additional information necessary for a user to work with the data without reference back to the data provider. The metadata required by the RAPID-WATCH Programme include both discovery and detailed metadata.
Metadata pertaining to observational data, for example, include details about how (with which instrument or technique), when and where the data have been collected, by whom (including affiliation and contact address or telephone number) and in the framework of which research project.
In the case of all submitted data, the RDC needs to know how the values were arrived at. The derivation process must be stated: all processing and calibration steps should be described and calibration values supplied. The nature and units of the recorded variables are essential, as well as the grid or the reference system. The RDC requests that as much information as possible about fieldwork instrumentation be included, e.g. serial number, copies of manufacturer's calibration sheets, and recent calibrations, if applicable.
Metadata pertaining to model output should be as comprehensive as possible, and include information such as the name of the model, the conditions of the calculation, the nature of its output and the geographical domain over which the output is defined (when applicable). Specific conditions applying to the model or the experiment may be mentioned. Where non-self describing files are submitted, metadata may also include information on the format in which the data are stored, and the order of the variables, to allow potential users to access them. Metadata pertaining to software models include the key points of the theory on which the model is based, the techniques and computational language used, and references.
The following lists the minimum metadata required to accompany data files submitted to the RAPID-WATCH Data Centre (RDC).
Metadata for tables of numbers (observations or model output)
Ideally, each data file should include a header containing the metadata. If the file is not self-describing and there is a large amount of information (e.g. description of many processing steps, calibration techniques), then a separate text file can be used as an alternative.
Metadata include the following overall information. Some information in this list may be applicable in specific cases only.
- Information about the experiment
  - Date when fieldwork, experiment or model simulation started
  - Site or trajectory bounding box or domain limits
  - Platform (e.g. ship, cruise number)
  - Instrumentation (including instrument make, model and serial number)
  - Model name
- Information about the experimenter(s)
  - Names, affiliation, contact address including e-mail, telephone number
  - Programme name, research project number
- Information about the independent variables (spatio-temporal grid)
  - Names, units, domain of definition of independent variables
  - Interval values when appropriate
- Information about the data
  - Version number
  - Date of last revision
  - Processing level (nature of raw data, derivation method: processing steps, calibrations applied)
  - Nature, name, units, scaling factors of dependent variables
- Information about data storage
  - Number of files of the entire data set
  - File number of current file
- Information about data format
  - Type of format e.g. ASCII, Excel, MATLAB, netCDF
- Additional information
  - May include particular conditions of experiment or model run, model boundary conditions, article reference, and sources of further information
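The list above could be turned into a file header along the following lines. This is only a sketch: the RDC does not prescribe this layout, the `format_header` function and the `section.key` naming are hypothetical, and every value shown is an invented placeholder rather than a real RAPID-WATCH record.

```python
def format_header(metadata: dict, comment: str = "#") -> str:
    """Render nested metadata as commented 'section.key: value' lines."""
    lines = []
    for section, fields in metadata.items():
        for key, value in fields.items():
            lines.append(f"{comment} {section}.{key}: {value}")
    return "\n".join(lines)


# All values are invented placeholders for illustration only.
example = {
    "experiment": {
        "start_date": "2009-04-01",
        "platform": "EXAMPLE SHIP, cruise XX000",
        "instrumentation": "ExampleCo CTD-1, s/n 0000",
    },
    "experimenter": {
        "name": "A. N. Other",
        "affiliation": "Example Institute",
        "email": "a.n.other@example.org",
        "programme": "RAPID-WATCH",
    },
    "independent_variables": {"pressure": "dbar, 0 to 5000"},
    "data": {
        "version": "1.0",
        "last_revision": "2009-10-01",
        "processing_level": "calibrated; see processing notes",
        "dependent_variables": "temperature (degC)",
    },
    "storage": {"file_count": 3, "file_number": 1},
    "format": {"type": "ASCII"},
}

header = format_header(example)
rows = ["10.0 12.31", "20.0 12.05"]  # pressure, temperature
file_text = header + "\n" + "\n".join(rows) + "\n"
print(file_text)
```

Keeping the header in the same file as the numbers means the metadata travel with the data; where the processing description is long, the same text could instead go into the separate text file mentioned above.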
Metadata for software
Metadata relative to software can be included as comments in the top section of the source file or can alternatively be provided as a separate text file.
Metadata pertaining to a model should include the following:
- Information on the model
  - Brief description of model general aim
  - Model structure
  - Physical processes involved, including equation set
  - Algorithmic implementation techniques used
  - Spatio-temporal coverage where applicable
  - Boundary conditions, including reference(s)
  - Initial conditions, including reference(s)
  - Program language
  - Input nature and format
  - Output nature and format
  - Summary of model validation, or appropriate reference(s)
  - Summary of results from former studies conducted with the model, or appropriate reference(s)
- Information on the author(s)
  - Names, affiliation, contact address including e-mail, telephone number
  - Programme name, research project number
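Before submitting a model, a project team might check a draft metadata record against the list above. The following Python sketch shows one way to do this; the field names in `REQUIRED_MODEL_FIELDS` and the `missing_fields` function are hypothetical labels chosen for this example, not an RDC schema, and spatio-temporal coverage is left out of the required set because it applies only in some cases.

```python
# Hypothetical field names mirroring the model metadata list above.
REQUIRED_MODEL_FIELDS = (
    "description",
    "structure",
    "physical_processes",
    "algorithmic_techniques",
    "boundary_conditions",
    "initial_conditions",
    "language",
    "input_format",
    "output_format",
    "validation",
    "previous_results",
    "authors",
    "programme",
)


def missing_fields(record: dict) -> list:
    """Return the required field names that are absent or blank in `record`."""
    missing = []
    for name in REQUIRED_MODEL_FIELDS:
        value = record.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


# Invented draft record with only a few fields filled in.
draft = {
    "description": "Idealised box model of the overturning circulation",
    "language": "Fortran 90",
    "authors": "A. N. Other, Example Institute, a.n.other@example.org",
    "programme": "RAPID-WATCH",
    "validation": "   ",  # blank entries count as missing
}

for name in missing_fields(draft):
    print("missing:", name)
```

The same record could then be written out as comments at the top of the source file or as a separate text file, as described above.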
Any additional documentation on recorded data or images, whether pertaining to a single data file or a whole data set, that does not fit into the structures described above (because it does not fall into any described category or because it is too voluminous) may be submitted to the RDC in the form of a text file that will be stored in the RAPID-WATCH archive documentation directory. These documents may, for example, include descriptions of techniques, possible uses of the data, study conclusions, etc.