
Re: [las_users] how to frequently update thredds dataset in las



Hi Robert,

You are struggling with issues that have been on our radar screen for a while. I appreciate having the details of your use case so we can make sure the solutions we're working on meet your needs as well as possible. Below are some questions, some ideas, and a few solutions.

Robert Fuller wrote:
Hi,

We're setting up a configuration where a THREDDS dataset is accessible through LAS. The dataset is a regularly updated THREDDS aggregation of NcML files, an oceanographic forecast generated with ROMS. The THREDDS dataset has the same URL before and after the update.

At the moment we are using addXML to regenerate the relevant section of las.xml, then using the Tomcat manager to reload the LAS servlet to pick up these changes (in normal operation only the time arange element will change in the LAS dataset).

This method of regenerating the LAS dataset is not ideal for a few reasons:

1. Ideally we would use a natural dataset ID rather than the one generated by addXML (the ID is based on a hash of the THREDDS URL, so it is persistent, which is good, but it is not humanly interesting).
Can you explain to me why the ID should be "humanly interesting"? I've gotten some pushback in the past from other LAS developers about using a hash for the ID, but I haven't been convinced there's a better solution. Internally, of course, LAS uses the ID to reference datasets, and if you were to create, save, or type a LAS URL by hand you'd have to know the ID, but user interaction via the standard LAS user interface should not directly require a human to know the ID. There's probably a use case for a human-readable ID, so I'd be interested to hear how it would help you.

If you're talking about the dataset name that shows up in the LAS UI, you can control that with a switch on the addXML command line, either to pass in the name you want or to tell addXML which global attribute to use for the name.
2. Ideally we would use natural variable names rather than those generated by addXML (persistent but not humanly interesting).
AddXML does the best job it can extracting the variable name from the actual metadata in the file, but it could do better. As I type this, I realize it probably doesn't look for the CF standard name. Right now it looks for a "long_name" attribute and uses that; if no long_name is available, it uses the actual netCDF variable name. Would it help if it looked at the CF standard name (standard names are pretty ugly, what with all the underscores and no capitalization), or if you could pass in the name of an attribute to use for the variable name?
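To make that lookup order concrete, here's a rough Python sketch (addXML itself is Java, so this is illustrative only, not its actual source; the file name is made up):

    # Sketch of how addXML might choose a display name for a variable.
    # Current behavior: long_name, else the raw netCDF variable name.
    # Proposed addition: try the CF standard_name in between.
    from netCDF4 import Dataset

    def display_name(var):
        name = getattr(var, "long_name", None)
        if name:
            return name
        name = getattr(var, "standard_name", None)
        if name:
            # e.g. "sea_water_potential_temperature" -> "Sea water potential temperature"
            return name.replace("_", " ").capitalize()
        return var.name

    ds = Dataset("roms_forecast.nc")  # hypothetical file name
    for v in ds.variables.values():
        print(v.name, "->", display_name(v))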

The key to success with this issue is to work on addXML until it can produce a LAS configuration you like, because addXML will be the basis of some automatic update facilities we're building right now.
3. After a number of reloads of the LAS servlet, the Tomcat process runs short on PermGen space, which requires restarting the Tomcat server to resolve.

I have looked into a couple of ways of improving the situation:

i. Use a script to generate the LAS dataset XML, setting the arange to the new values, then reload the LAS servlet (roughly the sketch shown after option ii below). This would address most of the issues noted above, but not number 3.

ii. Modify the las.xml dataset in such a way that it will still work after the THREDDS dataset has been updated. I've tried two options:

a. Remove the time arange from the LAS dataset. This breaks the LAS GUI (the date control vanishes) and also the WMS service (including GetCapabilities).

b. Set the time arange to an early start date with a large number of steps. This works in the LAS GUI provided the user knows the correct date to pick, but still breaks WMS.
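For option i, the script I have in mind is roughly the following (a sketch only; the element and attribute names match what I see in our las.xml and may differ elsewhere):

    # Sketch: rewrite the time arange in las.xml after the forecast updates.
    # Assumes the time axis is an <axis type="t"> with an <arange> child
    # whose "start" and "size" attributes describe the time steps.
    import xml.etree.ElementTree as ET

    def update_time_arange(las_xml, new_start, new_size):
        tree = ET.parse(las_xml)
        for axis in tree.getroot().iter("axis"):
            if axis.get("type") == "t":
                arange = axis.find("arange")
                if arange is not None:
                    arange.set("start", new_start)     # e.g. "2008-04-01"
                    arange.set("size", str(new_size))  # number of time steps
        tree.write(las_xml)

    update_time_arange("las.xml", "2008-04-01", 72)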

My suspicion is that if I do not reload the LAS servlet, we will also encounter problems with LAS caching older views.
I'm not sure what you mean by this. If the only thing that happens is that the time range of the data set extends, then a cache hit on a plot for Monday should still be valid on Tuesday, even though the data set only extended to Monday when the plot was made.


I welcome suggestions on other options for updating the LAS dataset after the THREDDS dataset has been updated.
We have some help for this problem coming out in the next release (which will be sometime in the next couple of weeks -- I promise :-}). The next release will include a beta version of the LAS manager's interface. One feature of the interface is a reinit process: using it, you can go to a particular URL (which is access-controlled by the same mechanisms that control access to the THREDDS Data Server installed with LAS) and ask LAS to reload its configuration. Since this is a process internal to the running LAS servlet, it should not have the same problems as having the container reload the servlet.

You probably want to hit the reinit URL from a process instead of a browser, and I'm not sure exactly how to accomplish that right now, but some other folks we're working with want to do this too, so we'll figure out a way.
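Off the top of my head, something like the following should do it once the real URL and credentials are known (the URL and login below are placeholders, not the actual ones):

    # Sketch: hit the LAS reinit URL from a script instead of a browser.
    import urllib.request

    url = "http://your.server/las/reinit"  # hypothetical reinit URL

    # If the URL is protected with HTTP basic auth:
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, "admin", "secret")  # placeholder credentials
    opener = urllib.request.build_opener(
        urllib.request.HTTPBasicAuthHandler(mgr))

    with opener.open(url) as response:
        print(response.status, response.read().decode())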

The manager's interface also includes a cache manager's interface, where you can empty the entire cache or clear only those files associated with a particular dataset.

Finally, in a future release -- not the one coming up this month, but relatively soon after -- we will have some new configuration options in LAS. These will allow you to configure LAS directly from a THREDDS URL and specify how often you want LAS to re-initialize its configuration from that catalog. The new config will look something like this:

<dataset src="http://pcmdi3.llnl.gov/thredds/esgcet/catalog.xml" src_type="THREDDS" update_time="23:00" update_interval="24 hours">
    <properties>
        <addXML>
            <esg>true</esg>
            <units_format>yyyy-M-d</units_format>
            <categories>true</categories>
            <global_title_attribute>experiment_id</global_title_attribute>
        </addXML>
    </properties>
</dataset>


In this case, LAS will add the data sets and variables it finds in the THREDDS catalog using the addXML parameters in the properties section. This means the dates will be parsed using "yyyy-M-d", categories matching the THREDDS hierarchy will be included, and the data sets will be named using the value of the "experiment_id" global attribute in the file. (The "esg" parameter means the catalog includes ESG metadata, and LAS will use that metadata when building its configuration.)

Using the update_time and update_interval, LAS will mark each dataset with an "expires" attribute. Once the update_interval has passed (24 hours in this case), LAS will re-initialize itself the next time 11pm rolls around. During that process, the configuration for this THREDDS catalog will be regenerated because its expires date will have passed.

If a catalog is marked with an expires date but that date has not yet passed, the configuration for that catalog will be read from the cache. If the original configuration did not include an update_time and update_interval, the configuration will always be read from the cache, and the catalog will never be re-read unless its configuration is missing from the cache for some reason.

Finally, LAS will compute the next time it should re-initialize as the minimum of all the "expires" times.
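In pseudocode, the bookkeeping described above looks roughly like this (a Python sketch with illustrative names, not the actual LAS implementation):

    # Sketch of the expires/reinit bookkeeping.
    from datetime import datetime, time, timedelta

    def next_expires(last_init, update_time, update_interval):
        # First occurrence of update_time that is at least
        # update_interval after the last initialization.
        earliest = last_init + update_interval
        candidate = datetime.combine(earliest.date(), update_time)
        if candidate < earliest:
            candidate += timedelta(days=1)
        return candidate

    # Each dataset read from a THREDDS catalog carries an expires stamp.
    expires = {
        "roms_forecast": next_expires(datetime(2008, 4, 1, 9, 30),
                                      time(23, 0), timedelta(hours=24)),
        "esg_catalog":   next_expires(datetime(2008, 4, 2, 14, 0),
                                      time(23, 0), timedelta(hours=24)),
    }

    # LAS re-initializes at the minimum of the expires times; at that point
    # any catalog whose expires date has passed is re-read from THREDDS and
    # the rest are served from the cache.
    print(min(expires.values()))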

All of this means the addXML logic will be internal to LAS, which is why it's critical for us to figure out how to get a configuration that meets your needs directly from addXML. Help me out with suggestions for how addXML could do a better job reading the catalogs you're interested in using.

Roland


Thanks,
Robert.



