Re: [ferret_users] pyferret crash while storing large dataset - alternatives to prepare an ensemble?



Hi Ansley and list,

Exactly: DEFINE DATA, ncecat, and other methods to build the ensemble are only the next step and probably quite straightforward. My issue was with preparing the data for that step by unifying the grids; the files already have shorter time axes and a ragged array in E. As described below, I prepare the data to become an ensemble with Ferret in the following way:

1) read in a dataset
2) define long common axes for T and E that will be shared by all files
3) loop over all variables in the files using the file attribute ..varnames (variable names are the same in all files)
4) write out all variables on the new common T and E axes

I did not find a simpler way to "extend" grids, e.g., to add a certain number of undefined data points along the E axis of an existing file.
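
For illustration, a minimal sketch of what those four steps can look like in Ferret; the file names, axis limits, and regridding transforms below are placeholders, not the actual script, and would need adjusting to the real data:

! 1) read in a dataset (placeholder file name)
USE member_01.nc

! 2) define long common axes for T and E, shared by all files
!    (limits and units here are made up for this sketch)
DEFINE AXIS/T="1-JAN-2014":"31-DEC-2016":1/UNITS=days t_common
DEFINE AXIS/E=1:500:1 e_common

! 3)+4) for each name listed in ..varnames (shown here for a single
!    variable), put the data onto the common grid and append it;
!    points beyond the original axes come out as missing values
LET myvar_common = myvar[gt=t_common,ge=e_common@ASN]
SAVE/FILE=prepared_01.nc/APPEND/NCFORMAT=4/DEFLATE=1 myvar_common

Whether the default T regridding or an @ASN association is the right choice depends on how the time and ensemble coordinates of each file line up with the common axes.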

The error message I previously got was cryptic to me, but with the ideas below and 'set memory/size=10000' it worked out fine.


Thanks,
Hella



________________________________________
From: Ansley Manke - NOAA Federal <ansley.b.manke@xxxxxxxx>
Sent: Thursday, April 27, 2017 16:46
To: Riede, Hella
Cc: ferret users
Subject: Re: [ferret_users] pyferret crash while storing large dataset - alternatives to prepare an ensemble?

Hella,
You might try an experiment or two, perhaps making a much shorter common time axis and running the commands without the netCDF compression, to see whether either of those is related to the crash. I'm not aware of any problems with compression in the netCDF4 calls, but it might be worth checking.

The "DEFINE DATA" aggregations on the fly would want to create an ensemble by defining an E axis and treating the ensemble members as a list on that new axis.  Since your data have an E axis already, in order to use that you'd need to perhaps open your datasets with and force the data onto another grid, maybe with USE/ORDER=TF  - but as you say the variables are not coming in on the same grid so this probably doesn't quite fit.

Ansley

On Thu, Apr 27, 2017 at 8:24 AM, Riede, Hella <hella.riede@xxxxxxx> wrote:
Hi Karl,

Thanks for your idea. As to the aggregate command, it only works for "those variables from the datasets that have identical variable names and grids." I do have identical variable names in all files, but not yet identical grids.

That is exactly what I am trying to do: prepare the datasets to become an ensemble by giving them identical grids; at the moment they have a ragged array in E and only partial overlap in T.


Best wishes,
Hella

________________________________________
From: Karl Smith - NOAA Affiliate <karl.smith@xxxxxxxx>
Sent: Thursday, April 27, 2017 3:30:13 AM
To: Riede, Hella
Cc: ferret users
Subject: Re: [ferret_users] pyferret crash while storing large dataset - alternatives to prepare an ensemble?

Hi Hella,

The cryptic error message is showing the (last) line in the 'pyferret' shell script that makes the call to run PyFerret in Python.  So yes, it is not meaningful by itself; I wish there were more output indicating what the actual problem is and where it comes from.

As for aggregating your datasets, have you tried the define data /aggregate command?

http://ferret.pmel.noaa.gov/Ferret/documentation/users-guide/commands-reference/DEFINE#_define_data

I must admit I did not look closely enough at what you had to see if one of the options would work.
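
For reference, a minimal sketch of the ensemble form of that command, assuming the member files can simply be listed; the file and dataset names here are placeholders:

! Ensemble aggregation: the listed datasets become members along a new E axis
DEFINE DATA/AGGREGATE/E my_ens = file1.nc, file2.nc, file3.nc
SET DATA my_ens
SHOW DATA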

Karl


On Wed, Apr 26, 2017 at 8:25 PM, Riede, Hella <hella.riede@xxxxxxx> wrote:
Hello ferreters,

I am storing a large dataset, looping over the variables. The main line for that is:
save/quiet/file=($out)/NCFORMAT=4/DEFLATE=4/append ($var);\

After the file has grown to 713 MB, the script crashes with the following error message:

/afs/ipp/.cs/python_modules/amd64_generic/pyferret/anaconda/2/4.1.1/bin/ferret: line 17: 40827 Killed                  python ${python_flags} -c "import sys; import pyferret; (errval, errmsg) = pyferret.init(sys.argv[1:], True)" "$@"

The same happens when using DEFLATE=1 instead, as recommended.

This is quite cryptic to me - can anybody help?



*Background* (Maybe there's another more efficient way?)

I have about 80 large netCDF files with two dimensions, time and E. The time axes overlap partially and altogether cover about 3 years, and the E dimension has a different length in each file. I'd like to combine those datasets into an ensemble / one file, for instance along F.

To combine the data as an ensemble, I have to unify the existing dimensions first, so I
1) read in a dataset
2) define the unified axes for T and E
3) loop over all variables in the files (they are the same in all files)
4) write out all variables on the new T and E axes.

After 713 MB of the new file have been written, which corresponds to 31 of the 115 variables, the above error occurs. There are no special issues with the variable at which it happens, and no non-standard name.


Another method I have tried: enlarge the E dimension by producing an empty hyperslab and gluing it onto the original dataset via ncrcat. This is impossibly slow.


Is there another way to efficiently pad a file with undefined values so that it is ready to become part of an ensemble?


Thanks in advance!

Hella





--
Karl M. Smith, Ph.D.
JISAO Univ. Wash. and PMEL NOAA
"The contents of this message are mine personally and do
not necessarily reflect any position of the Government
or the National Oceanic and Atmospheric Administration."




