QXF (a netCDF) format
The BODC National Oceanographic Database is designed around two elements: a metadata store in a relational database, and a data store. The data store consists of thousands of binary files containing the repeated measurements (termed datacycles) that make up the data.
QXF is BODC's primary datacycle storage format. It is based on netCDF, an internationally recognised, platform-independent exchange format, so it may also be a viable data delivery format for some of our customers. Previously, BODC stored most of its datacycles in an in-house binary format known as PXF. The name QXF is not an acronym; it was chosen because the format succeeded PXF. QXF offers more functionality than PXF, including the ability to handle multi-dimensional data, and is much easier to manipulate.
MATLAB versions R2008b and later support netCDF. If you have an earlier version of MATLAB, or need earlier code to remain compatible, a netCDF toolbox is available from SourceForge.
The information provided below assumes you have some prior knowledge of software programming.
QXF is 'a format for datacycles'. Consequently, it does not include the metadata, even the fields that are essential for the effective use of the data. If data are delivered in QXF then BODC practice is to deliver the metadata separately either as a simple ASCII text file or an Extensible Markup Language (XML) file.
In netCDF there are variable attributes and global attributes:
- Variable attributes are used for storing such items as absent data values, formats and limits
- File or global attributes can be used to store metadata relating to the file as a whole
NetCDF is organised around dimensions. The primary dimension is the netCDF record dimension. This is normally time and is always present. The secondary dimension (which can be absent) holds information such as observing depths.
Each QXF file contains a header (a series of global attributes) and a series of QXF channels. These channels generally consist of a data variable (integer or single-precision floating point) plus a 1-byte quality control flag channel. The flag channel will not be encountered in the following circumstances:
- Channels that are attached to the 'secondary' dimension alone (zero rank — see below)
- BODC standard date channel (the flag is carried by the separate time channel)
The data variables are named using the 8-byte alphanumeric codes defined in the BODC Parameter Dictionary. The relevant subset of the dictionary is included in the metadata report, or a full version is available from BODC. The associated flag variables are named using the data variable names prefixed by an 'F'.
The QXF header consists of netCDF global attributes. The complete list for QXF 1.0, including data types, is:
|Attribute||Data type||Description|
|QXFVER||Long||QXF version. Two element array set at (1,0)|
|SERIDN||Char*40||Internal file identifier. Series name|
|GOODFL||Char*1||Value of good flag (always set blank)|
|NULLFL||Char*1||Value of null flag (always set to 'N')|
|STOROP||Long||Storage option flag (always set zero)|
|NSCHAN||Long||Number of QXF channels|
|NSOFDA||Long||Number of storage locations used for the file (not bytes)|
|HEADSZ||Long||Header size expressed in locations|
|CYCLSZ||Long||Cycle size expressed in locations|
Most of these fields are either designed for use by BODC systems or are to allow non-standard usage of the format within the data centre. Consequently, they are of little interest to external users. However, it is worth checking that the format version hasn't changed and that the Internal File Identifier is as expected. Files that have been completely processed by the BODC system will have this set to BSRnnnnnnnn, where 'nnnnnnnn' is the BODC Series Reference for the data.
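The two checks suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the header is taken to have been read already into a plain dictionary (how it is read depends on your netCDF library), the Series Reference is assumed to consist of digits only, and SERIDN is assumed to be blank-padded to its 40-character width. The function name `check_qxf_header` and the sample identifier are hypothetical.

```python
import re


def check_qxf_header(attrs):
    """Return a list of problems found in a QXF header (empty if none).

    `attrs` is assumed to be a dict of global attributes that has
    already been read from the file by whatever netCDF library is in use.
    """
    problems = []

    # QXFVER is a two-element array; (1, 0) means QXF 1.0.
    version = tuple(int(v) for v in attrs.get("QXFVER", ()))
    if version != (1, 0):
        problems.append(f"unexpected QXF version: {version}")

    # SERIDN is Char*40; padding with blanks is an assumption here.
    seridn = str(attrs.get("SERIDN", "")).strip()
    # Assumption: the BODC Series Reference after 'BSR' is all digits.
    if not re.fullmatch(r"BSR\d+", seridn):
        problems.append(f"unexpected file identifier: {seridn!r}")

    return problems


# Made-up header values, for illustration only.
header = {"QXFVER": [1, 0], "SERIDN": "BSR00012345".ljust(40)}
print(check_qxf_header(header))  # []
```

An empty list means both checks passed; any other result lists what looked unexpected so that it can be compared against the metadata report.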
Rank is a mathematical term used to describe the number of dimensions possessed by an array. The term is used here in a comparable sense for QXF channels.
- Zero rank — Channels attached to ‘secondary’ dimension alone
- Rank 1 — Channels attached to ‘primary’ dimension alone
- Rank 2 — Channels attached to both dimensions
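As a small illustration of the three cases, the rank of a channel can be derived from the dimensions its variable is attached to. The sketch below is hypothetical: the dimension names "time" and "depth" are placeholders rather than names taken from a real QXF file, and `qxf_rank` is not part of any BODC software.

```python
def qxf_rank(var_dims, primary, secondary):
    """Classify a channel by the dimensions its variable is attached to."""
    dims = set(var_dims)
    if dims == {primary, secondary}:
        return 2  # attached to both dimensions
    if dims == {primary}:
        return 1  # attached to the primary (record) dimension alone
    if dims == {secondary}:
        return 0  # attached to the secondary dimension alone
    raise ValueError(f"unexpected dimensions: {var_dims!r}")


# Placeholder dimension names, for illustration only.
print(qxf_rank(("depth",), "time", "depth"))         # 0
print(qxf_rank(("time",), "time", "depth"))          # 1
print(qxf_rank(("time", "depth"), "time", "depth"))  # 2
```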
QXF channel types
Users need to be aware of three types of QXF channel:
- 'Rank Zero' channels — These are only found in data that include 2-dimensional netCDF data variables. The rank zero channels are associated with the NetCDF secondary dimension. Often there will only be one rank zero channel (e.g. bin depth for a moored ADCP), but more than one is a possibility (e.g. cell latitude and cell longitude for OSCR data). Rank zero channels do not carry a flag variable.
- Date channel — These consist of a 1-dimensional long integer data variable named 'AADYAA01', a 1-dimensional single precision floating point data variable named 'AAFDZZ01' and a 1 byte character flag channel named 'FAAFDZZ01'. Date channels may not always be included as some data types (e.g. CTD profiles) may have other parameters such as depth as the primary independent variable.
Variable 'AADYAA01' contains the date expressed as a 'Loch Daynumber'. This is the number of whole days elapsed since 00:00 on 01/01/1760. Open source conversion functions to change Loch Daynumbers into 'year, month and day' are available from BODC in Fortran and Pascal; alternatively, inbuilt MATLAB functions may be used. It may help users to know that the Loch Daynumber for 'C-Zero' (01/01/1970) is 76701 and the Loch Daynumber for 01/01/2000 is 87658.
Variable 'AAFDZZ01' contains the time expressed as a day fraction (e.g. 06:00 corresponds to 0.25, 18:00 corresponds to 0.75, etc.).
Variable 'FAAFDZZ01' is used to signify quality aspects of the whole datacycle. If it has been set non-blank, this does not mean that there is a problem with the time; rather, the quality information applies to all parameters in the datacycle. Users should not encounter non-blank datacycle flags in fully processed data, as they are only used to indicate datacycles that are to be deleted during BODC processing.
- Data channel — These contain a long-integer or single precision floating point data variable that may be either one or two-dimensional. The variable is named using the appropriate BODC parameter dictionary 8-byte code. Each data variable is accompanied by a single-byte flag variable of equal dimension. The full list of flag definitions is given below. In practice, many of these flags have only been used on rare occasions and are unlikely to be encountered. The flags L and M apply to data regarded as suspect.
|Flag||Description|
|<||Below detection limit|
|>||In excess of quoted value|
|A||Taxonomic flag for affinis (aff.)|
|B||Beginning of CTD Down/Up Cast|
|C||Taxonomic flag for confer (cf.)|
|E||End of CTD Down/Up Cast|
|I||Taxonomic flag for single species (sp.)|
|K||Improbable value — unknown quality control source|
|L||Improbable value — originator's quality control|
|M||Improbable value — BODC quality control|
|O||Improbable value — user quality control|
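For the date channel described above, the Loch Daynumber ('AADYAA01') and the day fraction ('AAFDZZ01') can be combined into a conventional timestamp. The following is a minimal sketch rather than BODC's own conversion routine. It takes the definition literally (day 0 is 01/01/1760, which agrees with the two reference values quoted for 1970 and 2000) and rounds the day fraction to the nearest second, which is an assumption made here because the fraction is stored in single precision. The function name `loch_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta

# Day 0 of the Loch Daynumber scale: 00:00 on 01/01/1760.
LOCH_EPOCH = datetime(1760, 1, 1)


def loch_to_datetime(daynumber, day_fraction=0.0):
    """Convert a Loch Daynumber plus day fraction to a datetime."""
    # Rounding to whole seconds is an assumption of this sketch.
    seconds = round(float(day_fraction) * 86400)
    return LOCH_EPOCH + timedelta(days=int(daynumber), seconds=seconds)


print(loch_to_datetime(76701, 0.25))  # 1970-01-01 06:00:00
print(loch_to_datetime(87658, 0.75))  # 2000-01-01 18:00:00
```

Checking the function against the two quoted reference days is a quick way to confirm that a conversion written in any other language uses the same origin.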
Each netCDF variable in a QXF file carries a set of variable attributes (sometimes termed channel attributes). These are named using the variable name suffixed by 'MIN', 'MAX', 'LFM', and 'ABS' (e.g. TEMPPR01.MAX). Data variables carry all four variable attributes. Flag variables only carry the first two.
The attributes are defined as follows:
MIN — Minimum value stored in the variable array. For data variables, it includes all values except for absent data values, whatever the status of the accompanying flag.
MAX — Maximum value stored in the variable array. For data variables, it includes all values except for absent data values, whatever the status of the accompanying flag.
LFM — Information on a suitable output format for the data held in the array. In the simple case the LFM (logical format element) specifies the number of places before (BEF) and after (AFT) the decimal point.
- LFM = 100*BEF + AFT
Thus 100 would equate to F2.0 in Fortran for a float or real variable and I1 for an integer. In C the equivalents would be %2.0f and %1d. Note that BEF must include space for a leading sign.
Scientific notation, that is the use of mantissa and exponent, is indicated by negating LFM and setting BEF to 1. The number of significant digits is then 1 + AFT and the form is
- sd.<m.. ..m>Esxx
where <m.. ..m> is the part of the mantissa following the decimal point, the 's' characters are the signs, and 'xx' is the exponent. LFM = -105 is equivalent to %12.5e (C) or 1pe12.5 (Fortran). The value -0.0825 in this format appears as
- -8.25000e-02
Note. The space for the mantissa's sign is automatically allocated and it is not possible to define the case of the letter 'e' introducing the exponent. The presentation is chosen to coincide with that used in C's standard library.
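The LFM rule can be turned into a C-style format specification mechanically. The sketch below is an illustration, not BODC code. The field widths are inferred from the examples in the text: BEF + 1 + AFT for fixed-point floats (the extra 1 is the decimal point), BEF for integers, and AFT + 7 for scientific notation (sign, leading digit, decimal point, 'e', exponent sign and two exponent digits). The function name `lfm_to_printf` and the `is_integer` parameter are hypothetical.

```python
def lfm_to_printf(lfm, is_integer=False):
    """Build a printf-style format string from a QXF LFM value."""
    bef, aft = divmod(abs(lfm), 100)
    if lfm < 0:
        # Scientific notation: the width is inferred as AFT + 7.
        return f"%{aft + 7}.{aft}e"
    if is_integer:
        # Integers use only the digits before the decimal point.
        return f"%{bef}d"
    # Fixed point: BEF digits, the decimal point, then AFT digits.
    return f"%{bef + 1 + aft}.{aft}f"


print(lfm_to_printf(100))                    # %2.0f
print(lfm_to_printf(100, is_integer=True))   # %1d
print(lfm_to_printf(-105))                   # %12.5e
print(lfm_to_printf(-105) % -0.0825)         # -8.25000e-02
```

Python's `%` operator follows the same conventions as C's printf for these specifications, so the resulting strings can be used directly for output.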
ABS — This is the absent data value used in the variable array. This value (and no other) will be accompanied by a flag variable element set to 'N'.
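When reading a data channel, the flag variable and the ABS attribute are typically used together to screen values. The sketch below shows one possible approach under stated assumptions: the flags have already been decoded to one-character strings, the absent data value of -999.0 is made up for the example (the real value comes from the variable's ABS attribute), and the default set of rejected flags ('N', 'L' and 'M') is a choice made here that callers may widen, for instance with 'K' and 'O'. The function name `usable_values` is hypothetical.

```python
def usable_values(values, flags, absent_value, reject=frozenset("NLM")):
    """Return the values with absent or rejected entries set to None."""
    cleaned = []
    for value, flag in zip(values, flags):
        if flag in reject or value == absent_value:
            cleaned.append(None)   # absent or suspect: mask it out
        else:
            cleaned.append(value)  # blank or informational flag: keep
    return cleaned


# Made-up data for illustration; -999.0 stands in for the ABS value.
temps = [10.5, -999.0, 11.2, 45.0]
flags = [" ", "N", "<", "M"]
print(usable_values(temps, flags, absent_value=-999.0))
# [10.5, None, 11.2, None]
```

Keeping the rejected flags as a parameter lets each user decide how strictly to treat informational flags such as '<' and '>'.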