[Thread Prev][Thread Next][Index]

Re: [las_users] limit on storage


This topic has me immediately climbing onto my soapbox to talk about data management in general.  The following opinions are therefore my own and not necessarily shared by others in the LAS group.

THREDDS aggregation is a wonderful thing, but it can mask some fundamental problems in the way data is accessed.  Many data providers generate data in a time-sliced fashion, with each 1-3D file containing 1-N separate variables.  This is convenient for the data providers but inconvenient for those wishing to provide interactive access to the data.

LAS gives users the illusion that they are working with a single 4D file containing N variables.  Typical requests might be for an XY slice at a particular time and height/depth, or a time series or profile at a particular location.  More than one variable may be requested, but it would be atypical for a user to request hundreds of variables at once.

Thus, for LAS, the optimal data storage strategy would be to keep each variable in its own 4D file.  For very long time series you might break the file into yearly segments of ~1 GB each and then aggregate the segments; data requests would then force remote servers to open at most a few files.  In your case it seems you have a THREDDS aggregation with thousands of irregular timesteps, so any time series request will force the THREDDS aggregation server to open thousands of separate files -- an expensive amount of I/O that will most likely result in non-interactive performance.
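As a sketch of that yearly-segment layout, a THREDDS NcML joinExisting aggregation over per-variable yearly files might look like the following (the scan location and file naming are hypothetical; adjust to your catalog):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <!-- Join the yearly single-variable segments along their existing time dimension -->
  <aggregation dimName="time" type="joinExisting">
    <scan location="/data/sst_yearly/" suffix=".nc" />
  </aggregation>
</netcdf>
```

With only a handful of ~1 GB segments per variable, a time series request touches a few files rather than thousands.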

So, even though you can create THREDDS aggregations of many separate temporal snapshots, it's not necessarily a wise thing to do if you want to provide time series access.  (You could of course configure a special LAS UI behavior that allowed users to select a time but not provide access to 'views' with a time axis.  Check the bottom of this page for an example:  http://ferret.pmel.noaa.gov/LASdoc/serve/cache/50.html)

In the best of all possible worlds, data managers would take the data that is created by data providers and, where necessary, reformat it so as to provide optimal performance for data users.  After all, the work of reformatting only has to be done once, but the work of opening 10K separate snapshot files has to be done every single time a user makes a time series request.
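The reformat-once pattern can be sketched in a few lines of Python.  This is a toy illustration only: pickled dicts stand in for NetCDF files, and all file and variable names ("snap_*", "temp", "salt") are hypothetical.  In practice you would do the same reorganization with NetCDF tools, but the shape of the work is the same -- one pass over the snapshots produces one consolidated series file per variable:

```python
# Toy sketch of "reformat once": turn per-timestep snapshot files
# (each holding all variables for one time) into per-variable series
# files, so a time series request later opens one file, not N.
import os
import pickle
import tempfile

def write_snapshots(src_dir, n_steps):
    """Create toy snapshot files: one file per timestep, all variables inside."""
    for t in range(n_steps):
        snap = {"time": t, "temp": [20.0 + t], "salt": [35.0 - t]}
        with open(os.path.join(src_dir, f"snap_{t:04d}.pkl"), "wb") as f:
            pickle.dump(snap, f)

def consolidate(src_dir, dst_dir, variables):
    """Single pass over all snapshots; emit one series file per variable."""
    series = {v: {"time": [], "data": []} for v in variables}
    for name in sorted(os.listdir(src_dir)):
        with open(os.path.join(src_dir, name), "rb") as f:
            snap = pickle.load(f)
        for v in variables:
            series[v]["time"].append(snap["time"])
            series[v]["data"].extend(snap[v])
    for v in variables:
        with open(os.path.join(dst_dir, f"{v}_series.pkl"), "wb") as f:
            pickle.dump(series[v], f)

src, dst = tempfile.mkdtemp(), tempfile.mkdtemp()
write_snapshots(src, 5)
consolidate(src, dst, ["temp", "salt"])
with open(os.path.join(dst, "temp_series.pkl"), "rb") as f:
    temp = pickle.load(f)
print(temp["time"])  # [0, 1, 2, 3, 4]
print(temp["data"])  # [20.0, 21.0, 22.0, 23.0, 24.0]
```

The consolidation cost is paid once by the data manager; every subsequent time series request reads a single file.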

As it turns out, for irregular time axes Ferret will have to open all those files twice -- once to read the time axes and once to read the data.  Yes, caching inside of Ferret and OPeNDAP will improve performance, but the right way to solve the problem is to manage your data for the benefit of the end users, not the data providers.

-- Jon

Jerome King wrote:
Hi Ferreters and LASers,

I am not a Ferret expert, as I have rarely used it.  I have been dealing with it indirectly through the customization I do on the Live Access Server.

With new technologies such as THREDDS, which allow several datasets to be aggregated together, one can end up with really big datasets with a lot of time steps.

I have run into the infamous error message several times now:
**TMAP ERR: limit on storage for coordinates has been
reached MAX=750000

I am aggregating unevenly spaced data over time which probably makes things worse.

I was wondering if anyone has found a way around this problem, or if the Ferret developers plan on improving this.  There is a message in the archive about using dynamic storage, which may solve the issue.  Are there any updates on this?

Thanks a lot!
Jerome King.


Dept of Commerce / NOAA / OAR / PMEL / TMAP