The BODC series model

The following is a summary of the objects, concepts and formats used in the BODC series model. It assumes prior knowledge of programming.

The BODC series model is a data model developed in the late 1970s to describe oceanographic measurements. A data model is an abstract representation of real world objects in terms of computer information.

Contents

  1. Data series
  2. Data cycle
  3. Parameter set
  4. Channels
  5. Series attributes
  6. Rank
  7. QXF format
  8. PXF format
  9. Obsolete formats

 

1. Data series

The data series is the fundamental object in the data model. It consists of repeated sets of measurements taken at different points within a coordinate framework. These may be measurements taken in

The boundaries of data series map to both absolute (such as a CTD cast or current meter deployment) and arbitrary (such as a year of sea level data from a single station) real world objects.

Data series are fine grained entities that may be combined in different ways to produce data sets, such as all the CTD casts from a cruise or all the years of sea level data from a given station. The data sets represent different real world objects to their component data series. Data series may be included in more than one data set.

2. Data cycle

The data cycle is an individual measurement set from a data series. It therefore represents the state of the real world object represented by the series at a single point in space and time.

3. Parameter set

The parameter set is the collection of measurements taken on each data cycle, with each measurement known as a parameter.

4. Channels

Channel is the term used for all the occurrences of a given parameter within the data cycles of a series. The physical representation of a channel within BODC is a series of 32-bit integer or 32-bit floating point (single precision) data storage locations. Most, but not all, channels also include a 1-byte storage location for an alphanumeric flag carrying quality control information.
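The relationship between series, data cycles, parameter sets and channels can be sketched in a few lines of Python. This is only an illustration of the concepts above, not BODC code: the class names, attribute names, sample values and flag characters are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Channel:
    """All occurrences of one parameter across the data cycles of a series."""
    values: List[float]
    flags: Optional[List[str]] = None  # one 1-character flag per value, if flagged


@dataclass
class DataSeries:
    """Series attributes (metadata) plus one channel per parameter."""
    attributes: Dict[str, object]
    channels: Dict[str, Channel] = field(default_factory=dict)

    def parameter_set(self) -> List[str]:
        """The parameters measured on each data cycle."""
        return list(self.channels)

    def data_cycle(self, index: int) -> Dict[str, float]:
        """One data cycle: the value of every parameter at a single index."""
        return {name: ch.values[index] for name, ch in self.channels.items()}


# Illustrative values only; "x" and "y" are placeholder flags, not BODC codes.
series = DataSeries(attributes={"latitude": 53.4, "longitude": -3.0})
series.channels["temperature"] = Channel([10.1, 10.3, 10.2], flags=["x", "x", "y"])
series.channels["salinity"] = Channel([35.0, 35.1, 35.1])  # unflagged channel

print(series.parameter_set())
print(series.data_cycle(1))
```

Storing the data by channel and deriving a data cycle on request mirrors the description of a channel as a run of storage locations, with the data cycle as a slice across all channels at one index.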

Double precision is not used because most parameters do not require it. Higher precision is achieved by scaling data by an appropriate power of ten and using integer storage. Avoiding the processing overhead of double precision values has a performance advantage.
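As a rough illustration of the scaling approach, the sketch below stores a value as a 32-bit integer after multiplying by a power of ten, and compares the result with single precision storage. The function names, the choice of six decimal places and the little-endian byte order are assumptions made for the example, not details of any BODC format.

```python
import struct

INT32_MAX = 2**31 - 1


def encode_scaled(value: float, power: int) -> bytes:
    """Scale by 10**power, round, and pack as a 32-bit signed integer."""
    scaled = round(value * 10**power)
    if abs(scaled) > INT32_MAX:
        raise OverflowError("scaled value does not fit in 32 bits")
    return struct.pack("<i", scaled)  # "<" (little-endian) is an assumption


def decode_scaled(raw: bytes, power: int) -> float:
    """Unpack a 32-bit signed integer and undo the power-of-ten scaling."""
    (scaled,) = struct.unpack("<i", raw)
    return scaled / 10**power


def roundtrip_float32(value: float) -> float:
    """Store a value in single precision and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


latitude = 53.123456  # made-up value needing six decimal places
stored = encode_scaled(latitude, power=6)

print(len(stored))                                   # bytes used per value
print(decode_scaled(stored, power=6) == latitude)    # scaled integer round trip
print(abs(roundtrip_float32(latitude) - latitude))   # single precision error
```

The trade-off is range: with six decimal places the largest magnitude that fits in a 32-bit signed integer is a little over 2147, so the power of ten has to be chosen per parameter.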

5. Series attributes

The series attributes are items of information that describe the measurements included in the series, often termed the series metadata. In addition to discrete numbers, such as the latitude and longitude of a sea level station, the metadata includes complex documents describing such things as mooring configurations, analytical techniques and instrument calibration procedures.

6. Rank

If the real world object being modelled consists of measurements being taken in a limited number of fixed places but at different times, the most convenient representation of that object in the series is as a two dimensional array.

For example, an Acoustic Doppler Current Profiler (ADCP) measures water velocity at a fixed set of depths in the water column. ADCP data may be represented by one dimensional arrays holding the dates and times, a one dimensional array holding the bin depths and two dimensional arrays holding the current velocities.

The number of bin depths is fixed, so the size of the corresponding array is fixed and it constitutes header information rather than part of a data cycle. This is referred to as rank-0 data. There is a single date and time for each data cycle, and channels of this kind are referred to as rank-1 channels. The two dimensional arrays, which hold one value per bin for each data cycle, are termed rank-2.

A series having one or more rank-2 channels is referred to as a 2D series.
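The ADCP example can be laid out as follows. This is a minimal sketch using plain Python lists; the variable names, the ten minute sampling interval and all the numbers are invented for illustration.

```python
from datetime import datetime, timedelta

# Rank-0: fixed for the whole series, held as header information.
bin_depths = [5.0, 10.0, 15.0]  # metres, made-up values

# Rank-1: one value per data cycle.
start = datetime(2000, 1, 1)
times = [start + timedelta(minutes=10 * i) for i in range(4)]

# Rank-2: one value per data cycle per bin (rows are cycles, columns are bins).
velocity = [
    [0.10, 0.08, 0.05],
    [0.12, 0.09, 0.06],
    [0.11, 0.07, 0.04],
    [0.09, 0.06, 0.03],
]

# Rank of each item in this hypothetical series.
ranks = {"bin_depth": 0, "time": 1, "velocity": 2}


def is_2d_series(channel_ranks: dict) -> bool:
    """A series with one or more rank-2 channels is a 2D series."""
    return any(rank == 2 for rank in channel_ranks.values())


def profile(cycle_index: int) -> list:
    """Pair each bin depth with the velocity measured there in one data cycle."""
    return list(zip(bin_depths, velocity[cycle_index]))


print(is_2d_series(ranks))
print(times[2], profile(2))
```

Each row of `velocity` has the same length as `bin_depths`, and the number of rows equals the number of data cycles, which is why the bin depths only need to be stored once.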

7. QXF format

QXF is the primary format used for the storage of the data cycles of data series. It is a BODC-defined subset of netCDF first deployed in 1997. The name was chosen in succession to PXF (see below) and is not an acronym.

QXF is a random access format supporting rank-2 channels (see above). It is platform independent.

Channels are identified with netCDF variables. The 'primary' dimension is the record or unlimited dimension, sometimes known as the independent variable. The 'secondary' dimension allows rank-2 data.

Channel attributes include

If the extremes are undetermined they are set to the absent data value ('.' in the case of flag channels). Header information includes an internal 40 character identifier.

All new data processing activities in BODC now use QXF.

8. PXF format

PXF is a BODC defined, random access, binary format. It was designed in 1978 and was based on the Southampton Oceanographic Centre's P-STAR format.

PXF is not an acronym, but the name indicates its origins: the format is its precursor, P-STAR or PX, extended to handle flags. It has a limit of 64 unflagged channels and 51 flagged channels.

Up until the year 2000, PXF was the mainstay of BODC's series data management. It has a rigid limit of 51 flagged channels (now exceeded by some types of data e.g. spectral radiation), does not support rank-2 channels and is not platform independent.

Data held in PXF can be transferred to QXF with no loss of information, except for formatting information held for each channel where the two formats are not quite isomorphic.

PXF has twelve header records of 256 bytes followed by interleaved records of similar length recording 64 values from the separate channels. The header includes a 40 character internal identifier and a (Fortran) listing format.

For the purposes of storage, flag channels become packed flag channels. Up to four flag channels can occupy the space of one data channel.
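Flag packing can be sketched as below: four 1-byte flags share one 4-byte slot, the size of a single 32-bit data value. The function names, the space used as padding and the order of flags within a slot are assumptions, since the actual PXF layout is not described here. The slot arithmetic is offered only as an observation that it is consistent with the limits of 64 unflagged and 51 flagged channels quoted earlier in this section.

```python
import math

FLAGS_PER_SLOT = 4  # four 1-byte flags fit in one 32-bit storage location
MAX_SLOTS = 64      # values per PXF record, as described above


def pack_flags(flags: str) -> bytes:
    """Pack up to four 1-character flags into one 4-byte slot.

    Unused positions are padded with spaces (an assumption for this sketch).
    """
    if len(flags) > FLAGS_PER_SLOT:
        raise ValueError("at most four flags fit in one slot")
    return flags.ljust(FLAGS_PER_SLOT).encode("ascii")


def unpack_flags(slot: bytes, count: int) -> str:
    """Recover the first `count` flags from a packed slot."""
    return slot[:count].decode("ascii")


def slots_needed(flagged_channels: int) -> int:
    """Data slots plus packed flag slots for a number of flagged channels."""
    return flagged_channels + math.ceil(flagged_channels / FLAGS_PER_SLOT)


slot = pack_flags("abc")  # placeholder flag characters, not BODC codes
print(len(slot), unpack_flags(slot, 3))

print(slots_needed(51) <= MAX_SLOTS)  # 51 flagged channels
print(slots_needed(52) <= MAX_SLOTS)  # one more flagged channel
```

Under these assumptions, 51 flagged channels need 13 packed flag slots, which together fill 64 slots exactly, while 52 flagged channels would need 65.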

BODC has had three forms of PXF relating to a succession of different operating systems.

Floating point representation differs between all three.

A project is currently under way to reformat our PXF data holdings into QXF. The objective is to retire PXF completely by the end of 2005.

9. Obsolete formats

The following BODC formats are no longer used in BODC's data holdings

