Software engineering
Visualisation developments — a Java prototype
The following is a technical summary and analysis of a Java prototype which was developed by BODC between July 2003 and the end of January 2004.
It is targeted at readers who have a sound knowledge of the Object-Oriented methodology of software engineering.
The Java prototype, with its cross-platform capabilities, demonstrated that Java provided the facilities and performance to support our visualisation operations. Consequently, we have adopted Java for future visualisation developments.
Contents
- Introduction to the prototype
- Design objectives
- Series, channels and selection
- ChannelSilo, IO and option handling
- Channel aliasing and derived channels
- Iterators
- ChannelSilo and page synchronisation
- Outlier chasing on scatter plots
- Selection entities
- Array handling and blocking
- The next step
1. Introduction
BODC data scientists use visualisation software (see Table 1) to screen and flag oceanographic data. Flagging is a mechanism where suspect data are highlighted but the actual data values are not changed. This is an essential part of the quality control procedures carried out by BODC.
Table 1. BODC's visualisation programs
| Name | Data type | Languages | Year deployed |
|---|---|---|---|
| Serplo | Time/depth series | Fortran 77, Fortran 90 | 1989 |
| EDTEVA | Tide gauge data | Fortran 77, Fortran 90, SQL | 1993 |
| Waview | Spectral wave data | C++, Fortran 77 | 1996 |
| Xerplo | 2D time series data | C++, Fortran 77 | 1997 |
The visualisation programs in Table 1 depend on Silicon Graphics, Inc. (SGI) hardware. Today, with the low prices of Intel-based hardware and associated graphics cards, it is prudent to move to Linux or Windows.
The Java prototype was called JSerplo, and this name will be used for the Java development version. We will revert to the name Serplo once the current SGI versions of the programs are no longer used.
2. Design objectives
- Adopt an integrated approach allowing the simultaneous replacement of the four principal programs of BODC's visualisation suite (see Table 1) by one development program. Focusing on commonalities should reduce the overall effort of rewriting the software. If appropriate, we can split off the EDTEVA functionality into a separate program at the end of development.
- Provide a modular system for handling data file formats.
- New ability to capture the program state at the end of a session and re-open a session with a saved configuration. In the present programs all axes have to be re-established manually.
- Allow dynamic generation of data series and their subsequent output, with a new option to view them or not.
- New ability to allow the flagging of outliers on scatter plots. An outlier is a point divorced from the main body of data.
- Dynamic generation of derived channels, in particular tides and residuals.
- New ability to tidally analyse non-port data, e.g. current meter data, including 2-dimensional data, such as acoustic Doppler current profiler (ADCP) data.
- New ability to select subsets of bins. (We already have the ability to define subsets of cells on the Ocean Surface Current Radar (OSCR) page.)
- New ability to display ADCP scatter plots for separate bins.
- Introduce a standardised mechanism for dealing with selection. Selection ranges over:
  - series
  - channels
  - bins
  - ports
  - casts
  - wave histogram blocks
  - cells
- Implicit selection of data cycles via "blocking".
Selection emerges at three levels, each of which has to be supported efficiently:
- It is explicit on Page 1.
- It is implicit on any presentational page (since only a subset of series can be presented through it).
- It is performed by using the singleton toggle within the presentational page.
3. Series, channels and selection
Each series is an instance of the Series class, which itself is an implementation of the BODC series model. Each series and each channel has a selection switch. A SeriesChannel object has two switches, one from each parent (its series and its channel).
If the relevant channel is deselected, all associated SeriesChannels are immediately deselected, since they hold a reference to the same selection object. Similarly, if a series is deselected, all of its SeriesChannels are automatically deselected, since the same selection object is embedded in both.
Each channel is uniquely identified by name, type and rank. Each series is uniquely identified by its internal name, its external name and its ordinal within the set. The ordinal allows repeated instances of the same series to be distinguished, as well as similar data from the same port emanating from the same file, as happens with tide gauge data.
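The shared-switch arrangement can be sketched in Java as follows. The class names follow the text, but the fields and methods are assumptions for illustration only. The essential point is that a SeriesChannel holds references to its parents' selection objects rather than copies, so no notification code is needed when a parent is deselected.

```java
// Hypothetical sketch: one mutable selection switch, shared by reference.
class Selection {
    private boolean selected = true;
    boolean isSelected() { return selected; }
    void setSelected(boolean s) { selected = s; }
}

class Series {
    final String name;
    final Selection selection = new Selection(); // the series' own switch
    Series(String name) { this.name = name; }
}

class Channel {
    final String name;
    final Selection selection = new Selection(); // the channel's own switch
    Channel(String name) { this.name = name; }
}

// Embeds both parents' switches (the same objects, not copies), so
// deselecting either parent deselects this SeriesChannel immediately.
class SeriesChannel {
    private final Selection seriesSel;
    private final Selection channelSel;

    SeriesChannel(Series s, Channel c) {
        this.seriesSel = s.selection;
        this.channelSel = c.selection;
    }

    boolean isSelected() {
        return seriesSel.isSelected() && channelSel.isSelected();
    }
}
```

Deselecting a channel's switch makes every SeriesChannel built from that channel report itself as deselected, whichever series it belongs to.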
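Value-based identity of this kind maps naturally onto Java records, whose equals() and hashCode() are generated from their components. In the minimal sketch below the field names and types are assumptions (the text lists only the identifying attributes), and records require Java 16 or later.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical identity keys: equality is component-wise, so two ids built
// from the same attributes are interchangeable as HashMap/HashSet keys.
record ChannelId(String name, String type, int rank) {}

// The ordinal distinguishes repeated instances of the same series, e.g. two
// tide gauge series for the same port read from the same file.
record SeriesId(String internalName, String externalName, int ordinal) {}

class IdDemo {
    public static void main(String[] args) {
        Set<SeriesId> seen = new HashSet<>();
        seen.add(new SeriesId("S001", "port.dat", 0));
        seen.add(new SeriesId("S001", "port.dat", 1)); // second instance, kept
        seen.add(new SeriesId("S001", "port.dat", 0)); // duplicate, ignored
        System.out.println(seen.size()); // prints 2
    }
}
```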
4. ChannelSilo, IO and option handling
JSerplo's data ingester is responsible for reading the options and files listed in the driver file. It reads each file by determining the file format and reading the data into series objects, and it clones the current version of the DriverOptions object into each series. Because each series receives the options in force at the point where its file is listed, global options should appear at the beginning of the driver file.
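The reason global options must come first can be shown with a small ingest loop. This is a sketch under loudly stated assumptions: the driver file syntax (a `key=value` line sets an option, any other line names a data file), the IngestedSeries record and the copy() method are invented for illustration, and format detection is omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical option holder; copy() takes a snapshot of the current settings.
class DriverOptions {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
    DriverOptions copy() {
        DriverOptions c = new DriverOptions();
        c.values.putAll(values);
        return c;
    }
}

// Hypothetical series: remembers its file and the options in force when read.
record IngestedSeries(String file, DriverOptions options) {}

class Ingester {
    // Options only affect files listed after them, because each series
    // receives a clone of the options as they stand at that point.
    static List<IngestedSeries> ingest(List<String> driverLines) {
        DriverOptions current = new DriverOptions();
        List<IngestedSeries> out = new ArrayList<>();
        for (String raw : driverLines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            int eq = line.indexOf('=');
            if (eq > 0) {
                current.set(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
            } else {
                // Format detection and reading of the data would happen here.
                out.add(new IngestedSeries(line, current.copy()));
            }
        }
        return out;
    }
}
```

Each IngestedSeries therefore carries its own snapshot of the options, unaffected by options that appear later in the driver file.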
The series objects are in turn collected into a ChannelSilo object, which holds all the series data together with the iterators and other information specific to each page (each page requests the items it needs). The ChannelSilo object can be serialised; when operating in "EDTEVA" mode, the serialised object takes the place of the dump file.
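Standard Java object serialisation is one way to realise this. The sketch below uses a minimal, hypothetical stand-in for ChannelSilo and round-trips it through a byte array (a FileOutputStream would play the dump-file role). Marking page-specific state as transient is an assumption of this sketch, not a documented JSerplo rule.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Minimal, hypothetical stand-in: everything reachable from it must be
// Serializable, or transient, for writeObject to succeed.
class ChannelSilo implements Serializable {
    private static final long serialVersionUID = 1L;
    final List<double[]> seriesData = new ArrayList<>();
    // Skipped by serialisation; would be rebuilt after loading.
    transient Object pageCache;
}

class SiloStore {
    static byte[] save(ChannelSilo silo) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(silo);
        }
        return bytes.toByteArray();
    }

    static ChannelSilo load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (ChannelSilo) in.readObject();
        }
    }
}
```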
The ChannelSilo object is passed to the page constructors. Each page derives an iterator by pruning out those series which cannot be expressed through that page. For example, if a series does not have a time channel, it cannot be displayed by TimPage; if it does not have a second dimension, it cannot be expressed through the TCADPage; and if it has no (rank-1) latitude and longitude, there cannot be a series track map.
By contrast, Xerplo uses a cascaded invocation of constructors for the various pages. JSerplo's approach is simpler and more in keeping with object-oriented design. Xerplo classifies series centrally, whereas in JSerplo the responsibility lies with each of the page classes. Xerplo's Page1 is effectively the owner of the data, so the data are not separated from the graphical aspects. ChannelSilo objects, although manipulated by GUI elements, are distinct from them.
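The pruning step amounts to a filter: a page keeps only the series that carry every channel its criterion requires. In this hypothetical sketch a series is reduced to the set of its channel names, and the criterion is a set of names rather than ChannelId objects; channel names such as TIME are placeholders.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical: a series reduced to an id and the names of its channels.
record PageSeries(String id, Set<String> channels) {}

class PagePruner {
    // Keeps the series that carry every channel listed in the criterion,
    // e.g. a time channel for TimPage.
    static List<PageSeries> viewable(List<PageSeries> all, Set<String> criterion) {
        return all.stream()
                  .filter(s -> s.channels().containsAll(criterion))
                  .collect(Collectors.toList());
    }
}
```

Because each page applies its own criterion, classification stays with the page classes rather than in a central routine.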
5. Channel aliasing and derived channels
Once the data have been read in, the derived channels can be calculated and the aliased channels generated.
Examples of derived channels include Cartesian current vectors derived from speed and direction (required for velocity scatter plots) and residual currents or sea levels after the tidal signal has been removed.
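The two kinds of derived channel mentioned can be sketched as simple array computations. This assumes direction is stored in degrees clockwise from north as the direction the current flows towards (the usual oceanographic convention); the class and method names are hypothetical, and producing the predicted tide itself is outside the scope of the sketch.

```java
// Hypothetical derived-channel calculations, one output value per data cycle.
final class Derived {
    private Derived() {}

    // Eastward component u = speed * sin(direction), direction in degrees
    // clockwise from north (assumed convention).
    static double[] eastward(double[] speed, double[] dirDeg) {
        double[] u = new double[speed.length];
        for (int i = 0; i < speed.length; i++) {
            u[i] = speed[i] * Math.sin(Math.toRadians(dirDeg[i]));
        }
        return u;
    }

    // Northward component v = speed * cos(direction).
    static double[] northward(double[] speed, double[] dirDeg) {
        double[] v = new double[speed.length];
        for (int i = 0; i < speed.length; i++) {
            v[i] = speed[i] * Math.cos(Math.toRadians(dirDeg[i]));
        }
        return v;
    }

    // Residual = observed value minus the predicted tidal signal.
    static double[] residual(double[] observed, double[] tide) {
        double[] r = new double[observed.length];
        for (int i = 0; i < observed.length; i++) {
            r[i] = observed[i] - tide[i];
        }
        return r;
    }
}
```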
These channels do not belong to the series but reside as separate objects within the ChannelSilo. Each incorporates the SeriesId object of the series to which it relates. Aliased channels share arrays with their counterparts within the series objects.
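Array sharing is what makes an alias cheap: Java arrays are objects, so a second channel can hold a reference to the same array instead of a copy. The DataChannel class and aliasAs() method below are hypothetical names used only to illustrate this.

```java
// Hypothetical channel holding a reference to its data array.
class DataChannel {
    final String name;
    final double[] values; // shared with any alias, never copied

    DataChannel(String name, double[] values) {
        this.name = name;
        this.values = values;
    }

    // An alias is a second channel object over the same array, so an edit
    // made through either one is visible through both.
    DataChannel aliasAs(String aliasName) {
        return new DataChannel(aliasName, values);
    }
}
```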
6. Iterators
There are separate iterator classes, e.g.
- SeriesIterator
- ChannelIterator
- SeriesChannelIterator
- PortIterator
- BinIterator
- CTDCastIterator
These iterators are more complicated and richer in functionality than those of the Java Collections Framework.
- They can be traversed in either direction.
- They may or may not be in singleton mode.
- They are responsive to the selectability of the underlying component.
- They can be reset.
- Processing starts at the current item and, if not in singleton mode, proceeds in ring-buffer fashion, terminating with the item that precedes it.

Thus a fragment of code might look like the following:
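```java
ChannelSilo cs = new ChannelSilo();
// Channels a series must carry to be viewable on this page
ChannelId[] criterion = new ChannelId[2];
...
// Fetch (or create) this page's series iterator from the silo
SeriesIterator serit = cs.getSeriesIterator(page, criterion);
serit.start();                 // begin at the current series
while (serit.next()) {         // visits selected series only
    Series sr = serit.get();
    System.out.println("Series " + (SeriesId) sr);
}
if (!serit.doneAny()) {        // no selected series was visited
    System.out.println("Nothing selected");
}
```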
Note that serit.get() can return a null pointer if it is not called within a next() loop. If the underlying item to which the iterator's internal pointer refers has been deselected, get() will reassign the pointer by searching in the currently indicated direction for the next selected item; if there is none, it returns a null pointer.
Also note that these classes are not synchronised, so in a multithreaded environment get() could return a null pointer even inside the loop illustrated above, if another thread deselects items during the traversal. To avoid this, one could synchronise on the ChannelSilo object.
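The traversal rules listed above (either direction, optional singleton mode, responsiveness to selection, reset, ring-buffer order) can be sketched as a generic Java class. RingIterator and its setters are hypothetical; only start(), next(), get() and doneAny() mirror the fragment above, and their exact semantics in JSerplo may differ. Restoring the pointer to where the loop began, and resetting to the first item, are design choices of this sketch.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical ring-buffer iterator over selectable items.
class RingIterator<T> {
    private final List<T> items;
    private final Predicate<T> isSelected; // selectability of each item
    private int current = 0;               // internal pointer
    private boolean forward = true;        // traversal direction
    private boolean singleton = false;     // at most one item per loop
    private int origin, visited;           // loop state set by start()
    private boolean doneAny;

    RingIterator(List<T> items, Predicate<T> isSelected) {
        this.items = items;
        this.isSelected = isSelected;
    }

    void setForward(boolean forward) { this.forward = forward; }
    void setSingleton(boolean singleton) { this.singleton = singleton; }
    void setCurrent(int index) { current = index; }
    void reset() { current = 0; }
    boolean doneAny() { return doneAny; }

    // Index k places from i in the current direction, wrapping round.
    private int step(int i, int k) {
        return Math.floorMod(forward ? i + k : i - k, items.size());
    }

    void start() {
        origin = current;
        visited = 0;
        doneAny = false;
    }

    // Moves to the next selected item and returns true, or returns false once
    // the ring has been walked from the origin round to the item preceding it.
    boolean next() {
        if (singleton && doneAny) return false;
        while (visited < items.size()) {
            int i = step(origin, visited++);
            if (isSelected.test(items.get(i))) {
                current = i;
                doneAny = true;
                return true;
            }
        }
        current = origin; // leave the pointer where the loop began
        return false;
    }

    // Returns the current item; if it has been deselected, moves the pointer
    // in the current direction to the next selected item, or returns null.
    T get() {
        for (int k = 0; k < items.size(); k++) {
            int i = step(current, k);
            if (isSelected.test(items.get(i))) {
                current = i;
                return items.get(i);
            }
        }
        return null;
    }
}
```

For items A, B, C, D with B deselected and the pointer on C, a forward loop visits C, D, A and a backward loop visits C, A, D.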
7. ChannelSilo and page synchronisation
The ChannelSilo object holds a HashMap of the entities required for the smooth running of the program, including iterators and objects for inter-page communication such as data cycle locking.
Conventions for naming these objects have yet to be established (e.g. SI:TimPage might be the SeriesIterator for the TimPage). When the ChannelSilo receives a request for a given iterator for a given page, it first looks in the HashMap. If it finds none, it creates one, stores it and passes it to the requesting page; otherwise it passes the existing object.
EDTEVA keeps the current port synchronised between Page1 and TimPage, whereas Serplo does not keep the current item synchronised between the corresponding pages. This is not a critical issue, but it illustrates the power of the mechanism: by asking for another page's iterator and using it, a page obtains synchronisation automatically. (This ignores the problem that different pages may be able to view different sets of series, and whether that is acceptable.)
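The get-or-create lookup and the synchronisation it yields can be sketched with Map.computeIfAbsent. SiloRegistry, Cursor and the key "PI:Page1" are hypothetical (the key merely imitates the SI:TimPage example); Cursor stands in for an iterator's current position. Like the classes described above, the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical registry of named entities held by the silo.
class SiloRegistry {
    private final Map<String, Object> entities = new HashMap<>();

    // Returns the object stored under the key, creating and storing it first
    // if absent, so every caller using the same key shares one object.
    @SuppressWarnings("unchecked")
    <T> T getOrCreate(String key, Supplier<T> factory) {
        return (T) entities.computeIfAbsent(key, k -> factory.get());
    }
}

// Hypothetical stand-in for an iterator's current position.
class Cursor {
    int current;
}
```

If TimPage asks for Page1's entry instead of its own, both pages hold the same Cursor, so moving the current item on one page moves it on the other.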
Note that the criterion object for any given page is available via a static method of that page. A criterion object identifies the channel identifiers (ChannelId objects) that must be present for a series to be viewable. If no criterion argument is given (using the alternative get() method), only an existing object can be retrieved; no new object will be generated.
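The two lookup forms can be sketched as overloaded methods: one that can build an iterator because it is given a criterion, and one that can only return what already exists. TimPage here is a hypothetical stand-in exposing its criterion through a static method, and criteria are simplified to sets of channel names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical page exposing its criterion through a static method.
final class TimPage {
    private TimPage() {}
    static Set<String> criterion() { return Set.of("TIME"); }
}

// Hypothetical iterator stub that remembers the criterion it was built from.
record PageIterator(String page, Set<String> criterion) {}

class IteratorStore {
    private final Map<String, PageIterator> byPage = new HashMap<>();

    // With a criterion: return the stored iterator, or create and store one.
    PageIterator get(String page, Set<String> criterion) {
        return byPage.computeIfAbsent(page, p -> new PageIterator(p, criterion));
    }

    // Without a criterion nothing can be built: existing iterator or null.
    PageIterator get(String page) {
        return byPage.get(page);
    }
}
```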
8. Outlier chasing on scatter plots
An outlier is a point divorced from the main body of data. Typically you want either to view its context in a time series plot or to flag it immediately. As BODC stores vectors in polar form (direction and magnitude) but the plot is generated in Cartesian (X-Y) coordinates, tracing a plotted point back to the stored data is not straightforward.
An iterator-like object called a CannedOutlier is created within the ChannelSilo. It is initially empty but can store a list of points, each identified by series and cycle number. The chief problem is deciding under what conditions such a list should be garbage collected, since a list is expensive in terms of storage. The choice is between keeping a set for each series or one for the current series only, and deciding under what conditions the selected set will outlive the current session.
Waview has already solved a similar requirement: points associated with a particular histogram block can be inspected in turn on the TimPage display. This functionality is in fact the reason why Waview remained a separate program rather than being incorporated into Xerplo as one or two extra display pages.
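One way to trace a picked scatter-plot point back to its data cycle is to convert each stored (speed, direction) pair to Cartesian (u, v) and take the nearest one. This is a hypothetical sketch: the OutlierPoint record, the pick() method, the linear nearest-point search and the direction convention (degrees clockwise from north) are all assumptions, and clear() merely marks where a discard policy would apply.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical key for one picked point: which series, which data cycle.
record OutlierPoint(String seriesId, int cycle) {}

// Hypothetical CannedOutlier: an initially empty list of picked points.
class CannedOutlier {
    private final List<OutlierPoint> points = new ArrayList<>();

    List<OutlierPoint> points() { return points; }

    // Finds the cycle whose (u, v) lies nearest the picked position (x, y)
    // by linear search, records it and returns its index (-1 if no data).
    int pick(String seriesId, double[] speed, double[] dirDeg, double x, double y) {
        int best = -1;
        double bestD2 = Double.POSITIVE_INFINITY;
        for (int i = 0; i < speed.length; i++) {
            double rad = Math.toRadians(dirDeg[i]);
            double dx = speed[i] * Math.sin(rad) - x; // u - x
            double dy = speed[i] * Math.cos(rad) - y; // v - y
            double d2 = dx * dx + dy * dy;
            if (d2 < bestD2) {
                bestD2 = d2;
                best = i;
            }
        }
        if (best >= 0) points.add(new OutlierPoint(seriesId, best));
        return best;
    }

    // Discards the stored points, e.g. when a session ends.
    void clear() { points.clear(); }
}
```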
9. Selection entities
Selection entities of interest
- In EDTEVA — Ports, Series and Channels.
- In Xerplo — Series, Bins, Cells and Channels.
- In Serplo — Series, Casts and Channels.
- In Waview — Series, Histogram Blocks and Channels.
EDTEVA differs in having a larger, more encompassing entity than the series: EDTEVA's Page1 selects Ports and Channels. In JSerplo, any series with a non-null PortId will qualify for the PortPage (which corresponds to EDTEVA's Page1) and will not appear, in the first instance, on the SeriesPage (which takes over from the Page1 of Serplo, Xerplo and Waview). A mechanism will be provided to allow the series related to a given port to be listed separately (via special iterators), as on Serplo's Page1.
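The routing rule can be sketched as a partition on PortId. In this hypothetical sketch a series is reduced to an id and a nullable port id; the per-port lists are what the special iterators would walk over, and LinkedHashMap is chosen only to keep ports in the order they were read.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical series record; portId is null for non-port data.
record PortSeries(String id, String portId) {}

class PortRouting {
    // Series with a non-null PortId belong to the PortPage, grouped by port.
    static Map<String, List<PortSeries>> byPort(List<PortSeries> all) {
        Map<String, List<PortSeries>> ports = new LinkedHashMap<>();
        for (PortSeries s : all) {
            if (s.portId() != null) {
                ports.computeIfAbsent(s.portId(), k -> new ArrayList<>()).add(s);
            }
        }
        return ports;
    }

    // The remaining series appear on the SeriesPage.
    static List<PortSeries> seriesPage(List<PortSeries> all) {
        List<PortSeries> out = new ArrayList<>();
        for (PortSeries s : all) {
            if (s.portId() == null) out.add(s);
        }
        return out;
    }
}
```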
10. Array handling and blocking
In C++ and Fortran you can pass an array simply by giving the start address. Contiguous subsetting can be done by incrementing the address to the start of the subset.
In Java the situation is not so simple, because an array is an object with its own defined length and there is no way to point into the middle of it. Passing a subset as an array therefore means copying it, which can be grossly inefficient by comparison. Copying also creates problems when updating, because the copy rather than the master array is updated.
The alternative, which will be used, is to pass the start and end points along with the master array. However, the master array is encapsulated within the Channel object, and for rank-2 data the storage order is bin within data cycle (all the bins for one data cycle are stored together). The solution is to equip the Channel class with appropriate subsetting methods.
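A minimal sketch of such subsetting methods follows, assuming a flat master array in which element (cycle, bin) sits at index cycle * nBins + bin. Rank2Channel, View and cycles() are hypothetical names; a view records only a start and end index, so writes through it land in the master array.

```java
// Hypothetical rank-2 channel: one flat master array, bin within data cycle.
class Rank2Channel {
    private final double[] data;
    private final int nBins;

    Rank2Channel(int nCycles, int nBins) {
        this.data = new double[nCycles * nBins];
        this.nBins = nBins;
    }

    double get(int cycle, int bin) { return data[cycle * nBins + bin]; }
    void set(int cycle, int bin, double v) { data[cycle * nBins + bin] = v; }

    // Describes cycles [firstCycle, endCycle) by indices into the master
    // array instead of copying them.
    View cycles(int firstCycle, int endCycle) {
        return new View(firstCycle * nBins, endCycle * nBins);
    }

    final class View {
        final int start, end; // half-open range [start, end) in data
        private View(int start, int end) { this.start = start; this.end = end; }
        int length() { return end - start; }
        double get(int i) { return data[start + i]; }
        void set(int i, double v) { data[start + i] = v; } // updates master
    }
}
```

The standard library offers a comparable mechanism in java.nio.DoubleBuffer: `DoubleBuffer.wrap(array, offset, length).slice()` yields a buffer that shares the array's storage, although the Channel would then expose a buffer rather than its own subsetting methods.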
Blocking is the mechanism introduced to improve plotting speed. The data are divided into blocks of, say, 200 cycles, and only blocks that intersect the plotting window are plotted. Plotting in the prototype showed no speed penalty without blocking, but it is believed that the data set was too small to demonstrate the effect. We therefore intend to retain blocking for the time being.
One efficiency gain will be that the underlying files are updated on a channel-by-channel basis: each series channel will keep track of whether it has been modified, so unmodified channels need not be written. This will noticeably reduce the time taken to write data back to file, which currently takes several minutes when the file is accessed via the Network File System (NFS).
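Blocking can be sketched as a small index built once per channel: for each block of consecutive cycles, record the minimum and maximum value along the plotting axis, then draw only the blocks whose range overlaps the window. BlockIndex is a hypothetical name, and treating NaN as an absent value is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical blocking index over one plotting axis (e.g. time).
class BlockIndex {
    private final int blockSize;
    private final double[] min, max; // axis range covered by each block

    BlockIndex(double[] axis, int blockSize) {
        this.blockSize = blockSize;
        int nBlocks = (axis.length + blockSize - 1) / blockSize; // round up
        min = new double[nBlocks];
        max = new double[nBlocks];
        for (int b = 0; b < nBlocks; b++) {
            min[b] = Double.POSITIVE_INFINITY;
            max[b] = Double.NEGATIVE_INFINITY;
            int end = Math.min(axis.length, (b + 1) * blockSize);
            for (int i = b * blockSize; i < end; i++) {
                if (Double.isNaN(axis[i])) continue; // assumed absent value
                min[b] = Math.min(min[b], axis[i]);
                max[b] = Math.max(max[b], axis[i]);
            }
        }
    }

    // Blocks whose [min, max] overlaps the window [lo, hi]; only these are drawn.
    List<Integer> visible(double lo, double hi) {
        List<Integer> out = new ArrayList<>();
        for (int b = 0; b < min.length; b++) {
            if (max[b] >= lo && min[b] <= hi) out.add(b);
        }
        return out;
    }

    int firstCycle(int block) { return block * blockSize; }
}
```

With 500 cycles and blocks of 200, a window covering only cycles 250 to 260 touches a single block, so two thirds of the data are skipped without examining individual points.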
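The modified-channel bookkeeping can be sketched with a per-channel dirty flag. TrackedChannel and WriteBack are hypothetical names, the flags are held as one character per data cycle purely for illustration, and the actual file output, which would belong to the format-handling module, is left as a comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical series channel that remembers whether its flags were edited.
class TrackedChannel {
    final String name;
    private final char[] flags; // one quality flag per data cycle
    private boolean modified;   // set by any edit, cleared after saving

    TrackedChannel(String name, int nCycles) {
        this.name = name;
        this.flags = new char[nCycles];
    }

    void setFlag(int cycle, char flag) {
        flags[cycle] = flag;
        modified = true;
    }

    boolean isModified() { return modified; }
    void markSaved() { modified = false; }
}

class WriteBack {
    // Returns the names of the channels written; unmodified ones are skipped.
    static List<String> save(List<TrackedChannel> channels) {
        List<String> written = new ArrayList<>();
        for (TrackedChannel c : channels) {
            if (!c.isModified()) continue;
            // The format module would write c's flags to the file here.
            written.add(c.name);
            c.markSaved();
        }
        return written;
    }
}
```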
11. The next step
The JSerplo prototype focused on the Graphical User Interface (GUI) aspects of the visualisation program suite and served its purpose by demonstrating that Java could deliver the plotting speed the application requires. This document has outlined the other aspects of the system needed to produce an operational tool for BODC's data scientists. This involved making a clear separation between data manipulation and the GUI.
From January 2004, work focused on the development of the data manipulation aspects of the system. These were married to GUI elements derived from the prototype to produce the application.
The replacement software, Edserplo, became operational in 2005.