Adam,
The script that converts GDS xml output into LAS xml input may well be
working, but we first need to do some data aggregation. As set up, the
GDS does not automatically aggregate the data files, so the GDS-->LAS
script is trying to create a separate configuration file for each entry.
The whole point of LAS is to allow users to slice and dice the data any
way they want. This will require you (or whoever is managing these
datasets) to create aggregations of individual files. This can be
done by creating large files or by using an aggregation tool like the
OPeNDAP Aggregation server or our own FDS. I expect that GDS can be
instructed to do these aggregations as well but I haven't done this
myself.
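For the NcML-based tools (the OPeNDAP Aggregation Server and its relatives), an aggregation along time can be described in a small XML file along these lines. This is just a sketch; the file names are placeholders for your per-year files:

```xml
<!-- Sketch of a "joinExisting" aggregation along the time axis.
     The location values below are placeholders, not real file names. -->
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <netcdf location="ghcn_1880.nc"/>
    <netcdf location="ghcn_1881.nc"/>
    <!-- ... one entry per yearly file ... -->
  </aggregation>
</netcdf>
```

The server then presents the whole collection as a single dataset with one continuous time axis, which is the form LAS wants.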
Looking at the GHCN directory
http://nomads.ncdc.noaa.gov:9090/dods/NCDC_CLIMATE_GHCN
There are 128 separate files that differ only in the year they
represent. At the bottom of this list are two composite files for
individual variables. These composite files are what you want to use
when generating xml files for LAS. (The composite files in this
directory aren't working at the moment.)
I'm afraid there will not be any magic bullet scripts that get you
around the need to aggregate the data. Once that job is done you
should have a small number of datasets (or dataset aggregations) that
you can point addXml.pl at to generate the LAS dataset configuration
files. You may need (or want) to modify these configuration files by
hand to create the best possible presentation of your data.
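Once the aggregations exist, the workflow is roughly the following. This is a sketch only: the addXml.pl argument order (output file, then dataset URL) and the composite dataset URLs are assumptions, so check the script's usage message in your LAS distribution first.

```shell
# Sketch: generate one LAS configuration file per aggregated dataset.
# The composite URLs below are placeholders for whatever aggregations
# you end up with. DRY_RUN=echo prints the commands instead of running
# them; unset it to actually invoke addXml.pl.
DRY_RUN=echo

for url in \
    http://nomads.ncdc.noaa.gov:9090/dods/NCDC_CLIMATE_GHCN/tmax_composite \
    http://nomads.ncdc.noaa.gov:9090/dods/NCDC_CLIMATE_GHCN/tmin_composite
do
    out=$(basename "$url").xml
    $DRY_RUN perl addXml.pl "$out" "$url"
done
```

With a handful of aggregated datasets this produces a handful of small configuration files, instead of one per yearly file.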
Sorry we can't automate everything for you but this is the state of the
art so far.
-- Jon
Adam Baxter wrote:
Jon,
The xmls were generated by the script that I was pointed to. I then
pointed that script at our first GDS server which resides at
http://nomads.ncdc.noaa.gov:9090/dods/.
The script read from http://nomads.ncdc.noaa.gov:9090/dods/xml and,
after close to a day of processing (I killed it this morning), the
resulting gds-datasets.xml weighs in at around 310 MB.
Let's see. Where to begin?
We have the GHCN - 128 entries, 12 months worth of monthly max, min,
and mean temperatures for each entry.
The GHCNP - 106 entries, 12 months worth of monthly precipitation
anomaly for each entry.
The Extended Reconstructed Global SST - 153 entries, 12 months worth
of temperature in Celsius for each entry.
The NOAAPort ETA - 22 months worth, every day has ~16 entries with a
variable count ranging from 11-53, possibly higher.
The NOAAPort GFS is similar. 22 months worth, every day has ~23 entries
with a variable count ranging from 18-60.
The NOAAPort RUC has 22 months worth, every day having ~26 entries with
variable counts starting at 30.
The SRRS has 20 months worth, every day having ~17 entries with one
variable each - pwatclm, rhprs, hgtprs (1000mb), hgtprs (500mb), and
cwdi.
3 - I'll have to get back to you on that.
4 - I'm going to try that now. Here's hoping.
Thanks,
Adam
Jonathan S Callahan wrote:
Adam,
I'm surprised that you have 300 Megabytes of .xml files. Our NVODS
server, the largest collection of data in an LAS that we are aware of,
has only 1.8 Megabytes of .xml files which represent several Terabytes
of data.
Can you point us to the data repository you are trying to set up in LAS
so that we have some idea what you're up against? It would also be
helpful to know the following:
1. What version of LAS are you using?
2. How many datasets and variables are you trying to set up in LAS?
3. Can you describe the grids these variables are on?
4. Have you gotten LAS set up with a subset of your data?
Thanks,
-- Jon
Adam Baxter wrote:
Oh fun,
When your set of xml files together is roughly 300 MB (and that's
still not all of it), what does one do? genLas dies after sucking up
around 3 GB of system memory, complaining about being out of memory.
This takes roughly six minutes.
Any ideas?
Adam