National Oceanic and
Atmospheric Administration
United States Department of Commerce


 

FY 2007

Compression of MOST Propagation Database

Tolkova, E.

NOAA Tech. Memo. OAR PMEL-134, NTIS: PB2007-108218, 9 pp (2007)


The MOST Propagation Database consists of approximately 1000 file triplets, representing time series for wave height, meridional currents, and zonal currents in a modeled tsunami caused by each of 804 unit earthquakes (tsunami sources) in the Pacific and 194 in the Atlantic Ocean. The data represent a 24-hour-long evolution of a tsunami with 1-minute time resolution and 16-arc-minute spatial resolution in both directions. These data comprise three 646 × 516 × 1441 blocks of individual floating-point values for each Pacific source file (the spatial grid size is different for the Atlantic). The size of each of those data blocks is 1832 Mbyte, while the whole database (tsunami data and accompanying information) for the Pacific region alone is 4.2 TB (tera = 10^12 ≈ 2^40). This volume of data presents problems with access, storage, and distribution, so a compression technique that reduces its size is desirable. Donald Denbo reduced the database size to about one half by rearranging the data into time series of variable length. His compression scheme retains data only from the moment the data values become greater than some threshold value; the data files are then supplemented by 2D arrays of starting indexes, ending indexes, and starting times (Venturato et al., 2005).
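The sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes 4-byte (single-precision) floating-point values and binary prefixes (Mbyte = 2^20 bytes, TB = 2^40 bytes); neither is stated explicitly in the text, but both reproduce the quoted figures. The total counts only the tsunami data for the 804 Pacific sources, not the accompanying information.

```python
# Grid dimensions of one Pacific data block, as quoted in the text.
NX, NY, NT = 646, 516, 1441
BYTES_PER_VALUE = 4          # assumption: single-precision floats
BLOCKS_PER_SOURCE = 3        # wave height, meridional and zonal currents
PACIFIC_SOURCES = 804

block_bytes = NX * NY * NT * BYTES_PER_VALUE
block_mbyte = block_bytes / 2**20                     # binary megabytes

total_bytes = block_bytes * BLOCKS_PER_SOURCE * PACIFIC_SOURCES
total_tb = total_bytes / 2**40                        # binary terabytes

print(f"one block: {block_mbyte:.0f} Mbyte")          # one block: 1832 Mbyte
print(f"Pacific tsunami data: {total_tb:.1f} TB")     # Pacific tsunami data: 4.2 TB
```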

To reduce the data volume further, individual time series have been quantized and compressed using Differential Pulse Code Modulation. Currently, data are kept with a precision of 0.001 cm for water height and 0.0001 cm/sec for velocities. The total size of the entire Pacific database has been reduced to 266 GB, or 6% of its original size, with no visible changes either in the database time series or in the results of MOST calculations that use the quantized time series as input. The paper describes the compression algorithm in the following sections:

  1. How the data are encoded
  2. How the data are stored
  3. How much the data can be compressed
  4. How the precision of quantization in MOST input affects MOST output
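As a rough illustration of the general idea behind quantization followed by Differential Pulse Code Modulation, the sketch below rounds each sample to an integer multiple of the precision step and then stores only the differences between consecutive quantized levels. It is not the memo's actual encoding or storage format, which this abstract does not specify; the function names `dpcm_encode` and `dpcm_decode` and the sample values are hypothetical.

```python
def dpcm_encode(samples, step):
    """Quantize a non-empty series to multiples of `step`, then difference it.

    Returns the first quantized level and the list of level-to-level deltas.
    """
    levels = [round(x / step) for x in samples]
    deltas = [b - a for a, b in zip(levels, levels[1:])]
    return levels[0], deltas


def dpcm_decode(first_level, deltas, step):
    """Rebuild the quantized series by accumulating the deltas."""
    levels = [first_level]
    for d in deltas:
        levels.append(levels[-1] + d)
    return [q * step for q in levels]


# Hypothetical wave heights in cm, quantized with a 0.001 cm step.
heights = [0.0, 0.0124, 0.0371, 0.0502, 0.0419]
STEP = 0.001

first, deltas = dpcm_encode(heights, STEP)
restored = dpcm_decode(first, deltas, STEP)
max_error = max(abs(a - b) for a, b in zip(heights, restored))

print(first, deltas)   # 0 [12, 25, 13, -8]
```

Because the series is quantized before differencing, the reconstruction error of each sample stays within half a step and does not accumulate. The deltas of a smooth time series are small integers, which is what allows them to be stored in fewer bits than the original floating-point values.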


