National Oceanic and Atmospheric Administration

FY 2007

Compression of MOST Propagation Database

Tolkova, E.

NOAA Tech. Memo. OAR PMEL-134, NTIS: PB2007-108218, 9 pp (2007)


The MOST Propagation Database consists of approximately 1000 file triplets, each holding time series of wave height, meridional currents, and zonal currents for a modeled tsunami generated by one of 804 unit earthquakes (tsunami sources) in the Pacific or 194 in the Atlantic Ocean. The data represent a 24-hour evolution of a tsunami with 1-minute time resolution and 16-arc-minute spatial resolution in both directions. For each Pacific source, these data comprise three 646 × 516 × 1441 blocks of individual floating-point values (the spatial grid size is different for the Atlantic). Each of those data blocks is 1832 Mbyte in size, while the whole database (tsunami data and accompanying information) for the Pacific region alone is 4.2 TB (tera = 10^12 ≈ 2^40). This volume of data presents problems with access, storage, and distribution, so a compression technique that reduces its size is desirable. Donald Denbo reduced the database size to about one half by rearranging the data into time series of variable length. His compression scheme retains data only from the moment the values become greater than some threshold value; the data files are then supplemented by 2D arrays of starting indexes, ending indexes, and starting times (Venturato et al., 2005).
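The sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes 4-byte floating-point values and binary units (1 Mbyte = 2^20 bytes, 1 TB = 2^40 bytes); neither is stated explicitly in the text, but both are consistent with the quoted figures. The variable names are illustrative only.

```python
# Grid dimensions quoted for a Pacific source: 646 x 516 grid points,
# 1441 time steps (24 hours at 1-minute resolution, both endpoints included).
NX, NY, NT = 646, 516, 1441
BYTES_PER_VALUE = 4        # assumption: single-precision floats
BLOCKS_PER_SOURCE = 3      # wave height, meridional current, zonal current
N_PACIFIC_SOURCES = 804

block_bytes = NX * NY * NT * BYTES_PER_VALUE
block_mbyte = block_bytes / 2**20                      # assumption: 1 Mbyte = 2^20 bytes
database_tb = block_bytes * BLOCKS_PER_SOURCE * N_PACIFIC_SOURCES / 2**40

print(f"one block:        {block_mbyte:.0f} Mbyte")
print(f"Pacific database: {database_tb:.1f} TB")
```

Under these assumptions the raw grid data alone reproduce both the 1832 Mbyte block size and the 4.2 TB total.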

To reduce data volume further, individual time series have been quantized and compressed using Differential Pulse Code Modulation. Currently, data are kept with a precision of 0.001 cm for water height and 0.0001 cm/sec for velocities. The total size of the entire Pacific database has been reduced to 266 GB, or 6% of its original size, with no visible changes in either the database time series or the results of MOST calculations that use the quantized time series as input. The present paper describes the compression algorithm in the following sections:

  1. How the data are encoded
  2. How the data are stored
  3. How much the data can be compressed
  4. How the precision of quantization in MOST input affects MOST output
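As a rough illustration of the quantization and Differential Pulse Code Modulation step mentioned above, here is a minimal generic sketch. It is not the encoding used in the memorandum, whose details (bit packing, storage layout) are given in the paper's sections and not in this summary. The function names, the previous-sample predictor, and the sample values are all assumptions made for illustration.

```python
def dpcm_encode(series, step):
    """Quantize values to integer multiples of `step`, then keep first differences."""
    if not series:
        return []
    codes = [round(x / step) for x in series]
    # The first entry is the initial quantized level; the rest are level-to-level
    # differences, which stay small for a smooth time series.
    return [codes[0]] + [b - a for a, b in zip(codes, codes[1:])]


def dpcm_decode(deltas, step):
    """Rebuild the quantized series by accumulating the differences."""
    out, level = [], 0
    for d in deltas:
        level += d
        out.append(level * step)
    return out


# Hypothetical wave-height samples in cm, quantized at the 0.001 cm precision
# quoted in the text.
STEP_CM = 0.001
height_cm = [0.0, 0.0123, 0.0251, 0.0249, 0.0102]

deltas = dpcm_encode(height_cm, STEP_CM)
restored = dpcm_decode(deltas, STEP_CM)
max_error = max(abs(a - b) for a, b in zip(height_cm, restored))
print(deltas, max_error)
```

Because the values are quantized before differencing, the decoder recovers the quantized levels exactly, so the reconstruction error stays within half a quantization step and does not accumulate along the time series. The small integer differences are what a subsequent entropy or bit-packing stage would exploit.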


