Software engineering

Visualisation developments — a Java prototype

The following is a technical summary and analysis of a Java prototype which was developed by BODC between July 2003 and the end of January 2004.

It is targeted at readers who have a sound knowledge of the Object-Oriented methodology of software engineering.

The Java prototype, with its cross platform capabilities, demonstrated that Java provided the facilities and performance to support our visualisation operations. Consequently, we have adopted Java for future visualisation developments.

Contents

  1. Introduction to the prototype
  2. Design objectives
  3. Series, channels and selection
  4. ChannelSilo, IO and option handling
  5. Channel aliasing and derived channels
  6. Iterators
  7. ChannelSilo and page synchronisation
  8. Outlier chasing on scatter plots
  9. Selection entities
  10. Array handling and blocking
  11. The next step

 


1. Introduction

BODC data scientists use visualisation software (see Table 1) to screen and flag oceanographic data. Flagging is a mechanism by which suspect data are highlighted without changing the actual data values. It is an essential part of the quality control procedures carried out by BODC.

Table 1. BODC's visualisation programs

Name     Data type             Languages                     Year deployed
Serplo   Time/depth series     Fortran 77, Fortran 90        1989
EDTEVA   Tide gauge data       Fortran 77, Fortran 90, SQL   1993
Waview   Spectral wave data    C++, Fortran 77               1996
Xerplo   2D time series data   C++, Fortran 77               1997

 

The visualisation programs in the table above depend on Silicon Graphics hardware. Today, with the low prices of Intel-based hardware and associated graphics cards, it is prudent to move to Linux or Windows.

The Java prototype was called JSerplo. This name will be used for the Java development version. We will revert to Serplo once the current Silicon Graphics, Inc. (SGI) versions of the programs are no longer used.


2. Design objectives


3. Series, channels and selection

Each series is an instance of the Series class, which itself implements the BODC series model. Each series and each channel has a selection switch. A SeriesChannel object has two, one from each parent.

If a channel is deselected, all associated SeriesChannels are immediately deselected, since they hold a reference to the same selection object. Similarly, if a series is deselected, all its SeriesChannels are automatically deselected, since the same selection object is embedded in both.
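The shared-reference mechanism described above can be sketched as follows. This is a minimal illustration only: the class names Selection and SeriesChannelSketch and their methods are hypothetical and are not taken from the JSerplo source.

```java
// Hypothetical sketch: names are illustrative, not JSerplo's actual API.
class Selection {
    private boolean selected = true;

    boolean isSelected() { return selected; }

    void setSelected(boolean selected) { this.selected = selected; }
}

class SeriesChannelSketch {
    private final Selection seriesSelection;   // same object the parent series holds
    private final Selection channelSelection;  // same object the parent channel holds

    SeriesChannelSketch(Selection seriesSelection, Selection channelSelection) {
        this.seriesSelection = seriesSelection;
        this.channelSelection = channelSelection;
    }

    // Selected only while both parents are selected.
    boolean isSelected() {
        return seriesSelection.isSelected() && channelSelection.isSelected();
    }
}
```

Because every SeriesChannelSketch built from the same channel holds the same Selection instance, a single call to setSelected(false) on that instance deselects them all without any notification code.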

Each channel is identified uniquely by name, type and rank. Each series is uniquely identified by internal name, external name and ordinal within the set, so that repeated instances of the same series can be distinguished, as can similar data from the same port emanating from the same file (as happens with tide gauge data).
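One way to express identification by name, type and rank is a small value class whose equality covers exactly those three fields. The sketch below is an assumption about how this might look: the class name ChannelKey and the field types (String for name and type, int for rank) are hypothetical.

```java
import java.util.Objects;

// Hypothetical value class: the name and the field types are assumptions.
final class ChannelKey {
    private final String name;
    private final String type;
    private final int rank;

    ChannelKey(String name, String type, int rank) {
        this.name = Objects.requireNonNull(name);
        this.type = Objects.requireNonNull(type);
        this.rank = rank;
    }

    // Two keys are equal only if name, type and rank all match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ChannelKey)) return false;
        ChannelKey other = (ChannelKey) o;
        return rank == other.rank
                && name.equals(other.name)
                && type.equals(other.type);
    }

    // Consistent with equals, so keys can be used in HashMap and HashSet.
    @Override
    public int hashCode() {
        return Objects.hash(name, type, rank);
    }
}
```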


4. ChannelSilo, IO and option handling

JSerplo's data ingester is responsible for reading the options and files listed in the driver file. It does this by determining the file format and reading the data into series objects.

The current version of the DriverOptions object is cloned into each series as it is read. Global options should therefore appear at the beginning of the driver file.

These in turn are collected into a ChannelSilo object which holds all the series data and the iterators and other information peculiar to each of the pages (the pages request the item). The ChannelSilo object can be serialised and when operating in "EDTEVA" mode this takes the place of the dump file.

The ChannelSilo object is passed to the page constructors. Each page derives an iterator by pruning out those series which cannot be expressed through that page. Thus if the series does not have a time channel it cannot be displayed by TimPage. If the series does not have a second dimension it cannot be expressed through the TCADPage. If there is no (rank-1) latitude and longitude, there cannot be a series track map.
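The pruning step can be sketched as a simple filter: a page states which channels it needs, and only series containing all of them are kept. In this sketch channels are represented by plain strings, and the names SeriesStub and PagePruner are hypothetical stand-ins rather than JSerplo classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a series: just a name and its channel names.
class SeriesStub {
    final String name;
    final Set<String> channelNames;

    SeriesStub(String name, String... channelNames) {
        this.name = name;
        this.channelNames = new HashSet<>(Arrays.asList(channelNames));
    }
}

class PagePruner {
    // Keeps only the series that contain every channel the page requires.
    static List<SeriesStub> viewable(List<SeriesStub> all, String... required) {
        List<String> needed = Arrays.asList(required);
        List<SeriesStub> kept = new ArrayList<>();
        for (SeriesStub s : all) {
            if (s.channelNames.containsAll(needed)) {
                kept.add(s);
            }
        }
        return kept;
    }
}
```

Under these assumptions a time series page would call PagePruner.viewable(all, "time"), while a track map page would ask for "latitude" and "longitude".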

By contrast, Xerplo makes use of a cascaded invocation of constructors for the various pages. JSerplo's approach is simpler and more in keeping with Object-Oriented design. Xerplo's series classification is done centrally, whereas in JSerplo the responsibility lies with each of the page classes. Xerplo's Page1 is effectively the owner of the data, so the data are not distinct from the graphical aspects. ChannelSilo objects, while manipulated by GUI elements, are distinct from them.


5. Channel aliasing and derived channels

Once the data have been read in, the derived channels can be calculated and the aliased channels can be generated.

Examples of derived channels include Cartesian current vectors derived from speed and direction (required for velocity scatter plots) and residual currents or sea levels after the tidal signal has been removed.

These channels do not belong to the series but reside as separate objects within the ChannelSilo. They incorporate the SeriesId object appropriate to the series to which they relate. Aliased channels share arrays with their counterparts within the series objects.
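As an illustration of a derived channel, the conversion from speed and direction to Cartesian components might look like the sketch below. It assumes direction is given in degrees clockwise from north, so that east = speed × sin(direction) and north = speed × cos(direction). That convention, and the class name CurrentVectorSketch, are assumptions and are not stated in this document.

```java
// Hypothetical sketch: direction is ASSUMED to be degrees clockwise from north.
class CurrentVectorSketch {

    // East (X) components: speed * sin(direction).
    static double[] east(double[] speed, double[] directionDeg) {
        double[] u = new double[speed.length];
        for (int i = 0; i < speed.length; i++) {
            u[i] = speed[i] * Math.sin(Math.toRadians(directionDeg[i]));
        }
        return u;
    }

    // North (Y) components: speed * cos(direction).
    static double[] north(double[] speed, double[] directionDeg) {
        double[] v = new double[speed.length];
        for (int i = 0; i < speed.length; i++) {
            v[i] = speed[i] * Math.cos(Math.toRadians(directionDeg[i]));
        }
        return v;
    }
}
```

The two result arrays are new objects, which matches the idea of derived channels living in the ChannelSilo rather than in the series. An aliased channel, by contrast, would simply hold a reference to the existing array.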


6. Iterators

There are separate iterator classes, e.g. SeriesIterator, which is used in the example below.

These iterators are more complicated and richer in functionality than those of the Java Collections Framework.

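ChannelSilo cs = new ChannelSilo();
// Channel identifiers a series must contain to be viewable on this page
ChannelId[] criterion = new ChannelId[2];
  ...
SeriesIterator serit = cs.getSeriesIterator(page, criterion);
serit.start();
while (serit.next()) {
  // Inside a next() loop, get() returns the current selected series
  Series sr = serit.get();
  System.out.println("Series " + (SeriesId) sr);
}
// doneAny() reports whether the loop visited any series at all
if (!serit.doneAny()) {
  System.out.println("Nothing selected");
}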

Note that serit.get() can return null if it is not embedded in a next() loop. If the underlying item to which the iterator's internal pointer refers has been deselected, get() re-assigns the pointer by searching in the currently indicated direction for the next available selected item. If there is none, it returns null.

Also note that these classes are not synchronised, so in a multithreaded environment a null return could occur even within the loop illustrated above. To avoid this, one could synchronise on the ChannelSilo object.
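The behaviour of start(), next(), get() and doneAny() described above can be imitated with a small self-contained sketch. It is not JSerplo's implementation: the Item and SelectedIterator classes are hypothetical, the search direction is forward only, and the lock object stands in for the ChannelSilo.

```java
import java.util.List;

// Hypothetical item with a selection switch.
class Item {
    final String name;
    boolean selected = true;

    Item(String name) { this.name = name; }
}

// Forward-only sketch of an iterator that skips deselected items.
class SelectedIterator {
    private final List<Item> items;
    private int pos = -1;       // index of the current item; -1 before the first next()
    private boolean doneAny;    // true once next() has found at least one item

    SelectedIterator(List<Item> items) { this.items = items; }

    void start() { pos = -1; doneAny = false; }

    // Advances to the next selected item; returns false when none remain.
    boolean next() {
        for (int i = pos + 1; i < items.size(); i++) {
            if (items.get(i).selected) { pos = i; doneAny = true; return true; }
        }
        pos = items.size();
        return false;
    }

    // Returns the current item; if it has been deselected, searches forward
    // for the next selected one. Returns null if there is none.
    Item get() {
        for (int i = Math.max(pos, 0); i < items.size(); i++) {
            if (items.get(i).selected) { pos = i; return items.get(i); }
        }
        return null;
    }

    boolean doneAny() { return doneAny; }
}

class IteratorDemo {
    // Counts selected items while holding the lock, so that other threads
    // which also synchronise on the same lock cannot deselect items mid-loop.
    static int countSelected(Object siloLock, SelectedIterator it) {
        int n = 0;
        synchronized (siloLock) {
            it.start();
            while (it.next()) {
                if (it.get() != null) n++;
            }
        }
        return n;
    }
}
```

The lock only protects the loop if every thread that changes a selection switch synchronises on the same object.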


7. ChannelSilo and page synchronisation

The ChannelSilo object retains a HashMap of entities required for the smooth running of the program, including iterators and inter-page communication such as data cycle locking.

Conventions for the naming of these objects have yet to be established (e.g. SI:TimPage might be the SeriesIterator for the TimPage). When fielding a request for a given iterator for a given page, the ChannelSilo first looks in the HashMap. If it finds none, it creates one, stores it and passes it to the requesting page; otherwise it passes the existing object.

EDTEVA maintains synchrony of Port currency between Page1 and TimPage, whereas Serplo does not maintain currency between the corresponding pages. This is not a critical issue but it serves to illustrate the power of the mechanism. By asking for the iterator for another page and using it, synchronisation is automatically obtained (this ignores the problem that pages may be distinct in terms of the series that are viewable through them and whether this is acceptable or not).

Note that the criterion object for any given page is available via a static method for that page. A criterion object identifies the channel identifiers (ChannelId objects) which must be present for a series to be viewable. If no criterion argument is given (the alternative get() method) only the existing object can be retrieved and no object will be generated.
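The look-up-or-create behaviour, together with the retrieve-only variant, can be sketched with a HashMap as follows. The class SiloRegistrySketch and its method names are hypothetical, and the key "SI:TimPage" follows the tentative naming convention mentioned above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry sketch: not the actual ChannelSilo API.
class SiloRegistrySketch {
    private final Map<String, Object> entities = new HashMap<>();

    // Returns the entity stored under key, creating and storing it if absent.
    synchronized Object getOrCreate(String key, Supplier<Object> factory) {
        return entities.computeIfAbsent(key, k -> factory.get());
    }

    // Retrieve-only variant: returns the stored entity or null, never creates.
    synchronized Object get(String key) {
        return entities.get(key);
    }
}
```

Two pages that both call getOrCreate("SI:TimPage", ...) receive the same object, which is what gives the automatic synchronisation between them.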


8. Outlier chasing on scatter plots

An outlier is a point divorced from the main body of data. Typically you want to view its context in a time series plot or to flag it immediately. As BODC stores vectors in polar form (direction and magnitude), and the plot is generated via Cartesian (X-Y) coordinates, this is not straightforward.

An object similar to an iterator, called a CannedOutlier, is created within the ChannelSilo. It is initially empty but can store a list of points identified by cycle number and series. The chief problem is to decide when the list can be garbage collected, as it is expensive in terms of storage. The choice is between keeping a set for each series or a single set for the current series, and deciding under what conditions the selected set will outlive the current session.
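A store of this kind could be sketched as a map from series to cycle numbers, with an explicit method for discarding a series' points. This illustrates the per-series option only and makes no claim about which option JSerplo adopted. The class OutlierStoreSketch and the use of a string as the series key are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-series outlier store.
class OutlierStoreSketch {
    private final Map<String, List<Integer>> cyclesBySeries = new HashMap<>();

    // Records one outlier point by series and cycle number.
    void add(String seriesName, int cycle) {
        cyclesBySeries.computeIfAbsent(seriesName, k -> new ArrayList<>()).add(cycle);
    }

    // Returns a read-only view of the cycles stored for a series (empty if none).
    List<Integer> cyclesFor(String seriesName) {
        List<Integer> cycles = cyclesBySeries.get(seriesName);
        if (cycles == null) {
            return Collections.<Integer>emptyList();
        }
        return Collections.unmodifiableList(cycles);
    }

    // Drops the points for a series so their storage can be reclaimed.
    void discard(String seriesName) {
        cyclesBySeries.remove(seriesName);
    }

    boolean isEmpty() {
        return cyclesBySeries.isEmpty();
    }
}
```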

Waview demonstrates a similar (solved) requirement. Points associated with a particular histogram block can be inspected in turn on the TimPage display. In fact this functionality is the reason why Waview remained as a separate program rather than being incorporated as one or two extra display pages within Xerplo.


9. Selection entities

Selection entities of interest

EDTEVA is different in having a larger, more encompassing entity than the Series: EDTEVA's Page1 selects Ports and Channels. In JSerplo any series with a non-null PortId will qualify for the PortPage (which corresponds to EDTEVA's Page1) and will not appear, in the first instance, on the SeriesPage (which takes over from Serplo/Xerplo/Waview's Page1). A mechanism will be provided to allow the series related to a given port to be listed separately (via special iterators), as on Serplo's Page1.


10. Array handling and blocking

In C++ and Fortran you can pass an array simply by giving the start address. Contiguous subsetting can be done by incrementing the address to the start of the subset.

In Java the situation is not so simple as an array also has a defined length. This means that Java can be grossly inefficient by comparison because you have to make a copy of each subset requested. It also creates problems when it comes to updating because you're no longer updating the master array.

The alternative, which will be used, is to pass the beginning and endpoints along with the master array. The master array however is encapsulated within the Channel object. For rank-2 data the storage order is bin within data cycle. The solution is to equip the Channel class with appropriate subsetting methods.
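A Channel with subsetting methods might be sketched as below. The sketch reads "bin within data cycle" as meaning that the bins of one data cycle are stored contiguously, so element (cycle, bin) sits at index cycle × nBins + bin. That reading, the class name MasterChannel and the method names are all assumptions.

```java
// Hypothetical sketch: the master array stays inside the channel and
// subsets are addressed by cycle range rather than copied out.
class MasterChannel {
    private final double[] data;  // master array, ASSUMED layout: cycle * nBins + bin
    private final int nBins;      // bins per data cycle (1 for rank-1 data)

    MasterChannel(double[] data, int nBins) {
        this.data = data;
        this.nBins = nBins;
    }

    double get(int cycle, int bin) {
        return data[cycle * nBins + bin];
    }

    // Updates go straight to the master array, so there is no stale copy.
    void set(int cycle, int bin, double value) {
        data[cycle * nBins + bin] = value;
    }

    // Maximum over cycles beginCycle (inclusive) to endCycle (exclusive),
    // all bins, computed in place without copying the subset.
    double max(int beginCycle, int endCycle) {
        double best = Double.NEGATIVE_INFINITY;
        for (int i = beginCycle * nBins; i < endCycle * nBins; i++) {
            best = Math.max(best, data[i]);
        }
        return best;
    }
}
```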

Blocking is the mechanism introduced to improve plotting speed. The data are subsetted into blocks of, say, 200 cycles and only blocks which intersect with the plotting window are plotted. Plotting in the prototype showed no cost penalty for doing without blocking, but it is believed that the data set was probably too small to demonstrate the effect. We therefore intend to retain blocking for the time being.
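The block selection test can be sketched as an interval overlap check. The sketch assumes the time values are in ascending order, so each block spans from its first to its last value. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of blocking: only blocks overlapping the window are kept.
class BlockingSketch {

    // Returns the start indices of the blocks whose time span overlaps
    // the window [winStart, winEnd]. Assumes time[] is ascending.
    static List<Integer> visibleBlocks(double[] time, int blockSize,
                                       double winStart, double winEnd) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start < time.length; start += blockSize) {
            int last = Math.min(start + blockSize, time.length) - 1;
            // The block covers time[start]..time[last]; keep it if that
            // interval intersects the plotting window.
            if (time[last] >= winStart && time[start] <= winEnd) {
                starts.add(start);
            }
        }
        return starts;
    }
}
```

With 1000 cycles at unit spacing and blocks of 200, a window from 250 to 450 selects only the blocks starting at cycles 200 and 400, so three of the five blocks are never visited by the plotting code.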

One efficiency gain will be that the underlying files will be updated on a channel-wide basis, as each series channel will keep track of whether it has been modified. This will noticeably reduce the time taken to write data back to file, which currently takes several minutes when the file is accessed via Network File System (NFS).
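Modification tracking of this kind is commonly done with a "dirty" flag that is set on any change and cleared after a successful write. The sketch below illustrates the idea under that assumption. TrackedChannel, WriteBackSketch and the flag characters are hypothetical, and the file write itself is left as a comment.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical series channel that remembers whether its flags have changed.
class TrackedChannel {
    private final char[] flags;
    private boolean modified;

    TrackedChannel(int cycles) {
        flags = new char[cycles];
        Arrays.fill(flags, ' ');  // blank = unflagged (an assumption)
    }

    // Marks the channel as modified only if the flag actually changes.
    void setFlag(int cycle, char flag) {
        if (flags[cycle] != flag) {
            flags[cycle] = flag;
            modified = true;
        }
    }

    boolean isModified() { return modified; }

    void markSaved() { modified = false; }
}

class WriteBackSketch {
    // Visits only the modified channels and returns how many were written.
    static int writeModified(List<TrackedChannel> channels) {
        int written = 0;
        for (TrackedChannel c : channels) {
            if (c.isModified()) {
                // A real implementation would write this channel to its file here.
                c.markSaved();
                written++;
            }
        }
        return written;
    }
}
```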


11. The next step

The JSerplo prototype focused on the Graphical User Interface (GUI) aspects of the visualisation program suite and served its purpose by demonstrating that Java could deliver the plotting speed the application requires. The document has outlined the other aspects of the system required to produce the operational tool BODC's data scientists require. This involved making a clear separation between data manipulation and the GUI.

From January 2004, work focused on the development of the data manipulation aspects of the system. These were married to GUI elements derived from the prototype to produce the application.

The replacement software, Edserplo, became operational in 2005.

