Information technology

Database design

Figure 1 Relationship between parameter and data tables

The design of BODC's databases has developed over the years to support the evolving nature of the data collected in oceanography.

Contents

  1. Oracle Relational Database Management System (RDBMS)
  2. National Oceanographic Database (NODB)
  3. Project Database
  4. Web Database


1. Oracle Relational Database Management System (RDBMS)

Our databases adhere to the relational model of database design and have been implemented using the Oracle Relational Database Management System (RDBMS).

The databases contain tables which are used to store information. Tables consist of columns and rows. Sometimes a column is called a field and a row is called a record. Each row should be unique and can be identified using a primary key. A primary key is made up of one or more fields.

Tables can have relationships or links with other tables. A value (e.g. a parameter code) stored in one table can have more information stored about it in another table. This approach helps to minimise duplication of information.

Figure 1 illustrates the relationship between a plankton net data table (NETDATA) and a parameter table (PARAMETER). The field PARA_CODE in NETDATA links to CODE in PARAMETER. PARAMETER contains more information about the parameter that was measured.

The relationship between PARAMETER and NETDATA is one-to-many. Each row in PARAMETER corresponds to one type of parameter. The rows in NETDATA can store the same parameter code many times.
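
The one-to-many relationship described above can be sketched with Python's built-in sqlite3 module standing in for Oracle. Only the table names NETDATA and PARAMETER and the fields PARA_CODE and CODE come from the text. The other columns (DESCRIPTION, HAUL_ID, DATA_VALUE), the parameter codes and the values are hypothetical and purely illustrative.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the Oracle RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request

conn.executescript("""
CREATE TABLE PARAMETER (
    CODE        TEXT PRIMARY KEY,      -- one row per type of parameter
    DESCRIPTION TEXT NOT NULL          -- hypothetical column
);
CREATE TABLE NETDATA (
    HAUL_ID    INTEGER NOT NULL,       -- hypothetical column
    PARA_CODE  TEXT NOT NULL REFERENCES PARAMETER (CODE),
    DATA_VALUE REAL,                   -- hypothetical column
    PRIMARY KEY (HAUL_ID, PARA_CODE)   -- primary key made up of two fields
);
""")

# Hypothetical parameter codes and measurements.
conn.executemany(
    "INSERT INTO PARAMETER VALUES (?, ?)",
    [("PARAM001", "hypothetical parameter one"),
     ("PARAM002", "hypothetical parameter two")],
)
conn.executemany(
    "INSERT INTO NETDATA VALUES (?, ?, ?)",
    [(1, "PARAM001", 12.5), (2, "PARAM001", 7.0), (2, "PARAM002", 3.25)],
)

# The same parameter code appears in many NETDATA rows; the join looks up
# its description once in PARAMETER instead of duplicating it in every row.
rows = conn.execute("""
    SELECT n.HAUL_ID, n.PARA_CODE, p.DESCRIPTION, n.DATA_VALUE
    FROM NETDATA n
    JOIN PARAMETER p ON p.CODE = n.PARA_CODE
    ORDER BY n.HAUL_ID, n.PARA_CODE
""").fetchall()

# A code that is missing from PARAMETER is rejected by the foreign key.
try:
    conn.execute("INSERT INTO NETDATA VALUES (3, 'UNKNOWN1', 1.0)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Here the description of each parameter is stored once, however many NETDATA rows refer to it, which is the reduction in duplication that the relational approach is meant to give.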


2. National Oceanographic Database (NODB)

The NODB stores metadata (data about data) in a relational database in the form of a data series. Metadata includes the data originator, position and collection start/end dates and times. The actual data measurements are stored outside the database in binary data files. These data are available by online delivery via our NODB data series facility.

The concept of a data series is fundamental to the design of the NODB. Each series consists of a number of data cycles. Each data cycle usually contains one value for each parameter sampled. Examples include

Each data series has one row in the master NODB table. The table links to other NODB tables containing metadata about parameters, restrictions, documents and storage.
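
As a rough illustration of the data series concept, the sketch below models a series as metadata plus a list of data cycles. The class, field and parameter names are hypothetical and do not reflect the actual NODB schema. The cycles are held in memory for simplicity, whereas the NODB keeps the measurements in binary files outside the database. The sketch also requires a value for every parameter in every cycle, which is stricter than the "usually" in the description above.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List


@dataclass
class DataSeries:
    """Hypothetical model of one data series: metadata plus data cycles."""
    originator: str
    latitude: float
    longitude: float
    start: datetime
    end: datetime
    parameters: List[str]
    cycles: List[Dict[str, float]] = field(default_factory=list)

    def add_cycle(self, **values: float) -> None:
        """Append a data cycle holding one value for each parameter."""
        if set(values) != set(self.parameters):
            raise ValueError("a data cycle needs one value per parameter")
        self.cycles.append(dict(values))


# Illustrative values only.
series = DataSeries(
    originator="Example Laboratory",
    latitude=53.4,
    longitude=-3.0,
    start=datetime(2000, 1, 1, 0, 0),
    end=datetime(2000, 1, 1, 1, 0),
    parameters=["TEMP", "SALINITY"],
)
series.add_cycle(TEMP=8.1, SALINITY=34.9)
series.add_cycle(TEMP=8.0, SALINITY=35.0)
```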

The origins of the NODB go back to the 1970s. It was developed by Meirion Jones and Trevor Sankey using a COnference on DAta SYstems Languages (CODASYL) system. Its aim was to improve the availability of oceanographic data for science and industry, and to protect it for future use, a function the NODB still performs today.


3. Project Database

The Project Database is used to manage data for large multi-disciplinary research projects. Both the data measurements and the metadata are stored in the Oracle RDBMS. Its evolution started in 1988, when BODC took on data management for the North Sea Project.

Advances in analytical technology used by oceanographers had dramatically increased the number of parameters that could be measured. The NODB was not designed to support such large numbers. Metadata collected for the NODB were paper-based and were not adequate for the hundreds of water bottle samples taken during the North Sea Project.

BODC decided to develop a new database structure. This was made possible by the replacement of the CODASYL system (running on a Honeywell mainframe computer) by the Oracle RDBMS (running on an IBM mainframe). The Oracle RDBMS enabled efficient collation, population and interrogation of both data and metadata.

The Project Database is organised around three main groups of tables:

i. Sampling metadata tables consist of sampling and event description tables. These link to the fieldwork description table and to the sampling instrument or gear code table.

ii. Data tables consist of a series of data storage tables for each main type of sampling or data collection technique.

iii. Parameter dictionary tables define the eight-byte parameter codes used to label the measurements in the data tables.

The data table structure has been adapted and modified for each main type of data collection technique. This enables storage of specific metadata information, such as bottle depth and type for water collection events, and plankton net depth range, mesh size and mouth area for plankton net hauls. All data tables have the same field structure consisting of a

This structure enables easy expansion to include new sampling instruments, methodologies and parameters. Figure 2 illustrates the flow of data into the Project Database.

Figure 2 Flow of data into Project Database

The Project Database has not replaced the NODB. Both are used in parallel.


4. Web Database

Currently, we are developing the Web Database to meet the specific needs of web applications. Issues such as security and performance require particular care when exposing data to the internet.

 

