Re: [las_users] how to frequently update thredds dataset in las
Hi Robert,
You are struggling with issues that have been on our radar screen for a
while. I appreciate having the details of your use case so we can make
sure the solutions we're working on meet your needs as well as possible.
Below are some questions, some ideas, and a few solutions.
Robert Fuller wrote:
> Hi,
> We're setting up a configuration where a Thredds dataset is accessible
> through LAS. The dataset is a regularly updated Thredds aggregation of
> ncml files, an oceanographic forecast generated with ROMS. The Thredds
> dataset has the same URL before and after the update.
> At the moment we are using addXML to regenerate the relevant section
> of las.xml, then using the Tomcat manager to reload the LAS servlet to
> pick up these changes (in normal operation only the time arange
> element will change in the las dataset).
> This method of regenerating the las dataset is not ideal for a couple
> of reasons:
> 1. Ideally we would use a natural dataset id rather than the one
> generated from addXML (the id is based on a hash of the thredds url,
> so it is persistent, which is good, but it is not humanly interesting)
Can you explain to me why the ID should be "humanly interesting". I've
gotten some push back in the past from other LAS developers about using
a hash for the ID but I haven't been convinced there's a better
solution. Of course, internally LAS uses the ID to reference datasets
and if you were to create, save or type a LAS URL by hand you'd have to
know the ID, but the user interaction via the standard LAS user
interface should not directly require a human to know the ID. There's
probably a use case for having a human readable ID so I'd be interested
to hear about how it would help you.
If you're talking about the data set name that shows up in the LAS UI,
you can control that with a switch on the addXML command line to either
pass in the name you want or to tell addXML which global attribute to
use for the name.
> 2. Ideally we would use natural variable names rather than those
> generated by addXML (persistent but not humanly interesting)
AddXML does the best job it can extracting the variable name from the
actual metadata in the file, but it could do a better job. As I type
this I realize it probably doesn't look for the CF standard name. Right
now it looks for a "long_name" attribute and uses that. If no long_name
is available, it uses the actual netCDF variable name. Would it help if
it looked at the CF standard name (which is pretty ugly, what with all
those underscores and no capitalization), or if you could pass in the
name of an attribute to use for the variable name?
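For what it's worth, the name-selection logic I'm describing amounts to a simple attribute fallback chain. Here is a sketch in Python of how that chain would behave -- this is not the actual addXML code, and the `pick_variable_name` helper and its `preferred_attribute` parameter are hypothetical names of my own:

```python
def pick_variable_name(var_name, attributes, preferred_attribute=None):
    """Choose a display name for a netCDF variable.

    Fallback order: a caller-supplied attribute (if present and
    non-empty), then "long_name", then "standard_name", then the
    raw netCDF variable name itself.
    """
    if preferred_attribute and attributes.get(preferred_attribute):
        return attributes[preferred_attribute]
    if attributes.get("long_name"):
        return attributes["long_name"]
    if attributes.get("standard_name"):
        return attributes["standard_name"]
    return var_name

# A ROMS-style variable that carries only a CF standard name:
print(pick_variable_name(
    "temp", {"standard_name": "sea_water_potential_temperature"}))
```

Passing `preferred_attribute` is how a command-line switch naming the attribute to use could plug into the same chain.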
The key to being successful with this issue is to work on addXML until
it can produce LAS configuration you like, because addXML will be the
basis of some automatic update facilities we're building right now.
> 3. After a number of reloads of the LAS servlet, the Tomcat process
> runs short on PermGen space, which requires restarting the Tomcat
> server to resolve.
> I have looked into a couple of ways of improving the situation:
> i. use a script to generate the LAS dataset XML, setting the arange to
> the new values, then reloading the LAS servlet. This would address
> most of the issues noted above, but not number 3.
> ii. modify the las.xml dataset in such a way that it will still work
> after the thredds dataset has been updated. I've tried two options:
> a. Remove the time arange from the las dataset. This breaks the LAS
> GUI (the date control vanishes) and also the WMS service (including
> GetCapabilities).
> b. Set the time arange to an early start date with a large number of
> steps. This works in the LAS GUI provided the user knows the correct
> date to pick, but still breaks WMS.
> My suspicion is that if I do not reload the LAS servlet we will also
> encounter problems with LAS caching older views.
Not sure what you mean by this. If the only thing that happens is that
the time range of the data set extends, then a cache hit on a plot for
Monday should still be valid on Tuesday, even though the data set only
extended to Monday when the plot was made.
> I welcome suggestions on other options for updating the las dataset
> after the thredds dataset has been updated.
We have some help for this problem coming out in the next release (which
will be sometime in a couple of weeks -- I promise :-}). The next
release will include a beta version of LAS manager's interface. One
feature of the interface is a reinit process. Using this interface you
can go to a particular URL (which is access-controlled by the same
mechanisms that control access to the THREDDS Data Server installed with
LAS) and ask LAS to reload its configuration. Since this is an
internal process to the running LAS servlet, it should not have the same
problems with the container reloading the servlet.
You probably want to hit the reinit URL from a process instead of a
browser and I'm not sure exactly how to accomplish that right now, but
some other folks we're working with want to do this also so we'll figure
out a way.
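One way to hit that URL from a process would be a small script run from cron. Here is a sketch in Python using only the standard library; note that the `/reinit` path, the server URL, and the credentials are placeholders of mine, not the real manager's-interface endpoint:

```python
import base64
import urllib.request

def build_reinit_request(base_url, user, password):
    """Build an authenticated GET request for the LAS reinit URL.

    The "/reinit" path is a placeholder -- substitute whatever
    URL the manager's interface actually exposes, and whatever
    auth scheme protects the TDS/LAS installation.
    """
    req = urllib.request.Request(base_url.rstrip("/") + "/reinit")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    return req

req = build_reinit_request("http://myserver/las", "admin", "secret")
# urllib.request.urlopen(req)  # uncomment to fire the reinit, e.g. from cron
print(req.full_url)
```

A one-line `curl -u user:password <url>` in a cron job would do the same thing if you'd rather not script it.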
The manager's interface also includes a cache manager's interface where
you can empty the entire cache or clear the cache of only those files
associated with a particular dataset.
Finally, in a future release -- not the one coming up this month, but
relatively soon after -- we will have some new configuration options in
LAS. These will allow you to configure a LAS directly from a THREDDS
URL and specify how often you want LAS to re-initialize its
configuration from that catalog. The new config will look something
like this:
<dataset
    src="http://pcmdi3.llnl.gov/thredds/esgcet/catalog.xml"
    src_type="THREDDS" update_time="23:00" update_interval="24 hours">
  <properties>
    <addXML>
      <esg>true</esg>
      <units_format>yyyy-M-d</units_format>
      <categories>true</categories>
      <global_title_attribute>experiment_id</global_title_attribute>
    </addXML>
  </properties>
</dataset>
In this case, LAS will add the data sets and variables it finds in the
THREDDS catalog using the addXML parameters in the properties section.
This means the dates will be parsed using "yyyy-M-d", the categories
matching the THREDDS hierarchy will be included and the data sets will
be named using the value of the "experiment_id" global attribute in the
file. (The "esg" parameter means the catalog includes ESG metadata and
LAS will use that metadata when building its configuration.)
Using update_time and update_interval, LAS will mark each dataset
with an "expires" attribute. Once the update_interval has passed (24
hours in this case), LAS will re-initialize itself the next time 11pm
rolls around. During that process, the configuration for this THREDDS
catalog will be regenerated because its expires date will have passed.
If a catalog is marked with an expires date that has not yet passed,
the configuration for that catalog will be read from the cache. If the
original configuration did not include an update_time and
update_interval, the configuration will always be read from the cache
and the catalog will never be re-read, unless the configuration for
that catalog is missing from the cache for some reason.
Finally, LAS will compute the next time it should reinitialize based on
the minimum time to the next "expires" time.
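The timing scheme above can be sketched as follows. This is my own reading of the behavior, not actual LAS code; the `next_reinit` helper and the shape of its inputs are hypothetical:

```python
from datetime import datetime, timedelta

def next_reinit(datasets, now):
    """Compute the next time LAS should re-initialize.

    Each dataset is an (update_time, update_interval) pair, where
    update_time is an "HH:MM" wall-clock string and update_interval
    is a timedelta. A dataset's "expires" instant is the first
    occurrence of update_time after the interval has elapsed;
    datasets without both settings never expire (always cached).
    The overall answer is the minimum of the expires instants.
    """
    expires = []
    for update_time, update_interval in datasets:
        if update_time is None or update_interval is None:
            continue  # no expiry: configuration always read from cache
        earliest = now + update_interval
        hour, minute = map(int, update_time.split(":"))
        candidate = earliest.replace(hour=hour, minute=minute,
                                     second=0, microsecond=0)
        if candidate < earliest:
            candidate += timedelta(days=1)  # next HH:MM after the interval
        expires.append(candidate)
    return min(expires) if expires else None

# With update_time="23:00" and a 24-hour interval, starting at noon:
now = datetime(2010, 3, 1, 12, 0)
print(next_reinit([("23:00", timedelta(hours=24))], now))
```

With the example config above (11pm, every 24 hours), a catalog read at noon expires at 11pm the following day.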
This means the addXML logic will be internal to LAS, which is why it's
critical for us to figure out how to get a configuration that meets
your needs directly from addXML. Help me out with suggestions for how
addXML could do a better job reading the catalogs you're interested in
using.
Roland
> Thanks,
> Robert.