
Re: [ferret_users] memory strategies for handling large computational requests



Hi Patrick,
Ferret does not need to load the whole grid into memory to do these operations. It will try to break up the calculation itself, particularly with the memory-use enhancements in v7.2, and for the operations you're using it does break up the computation. Not all computations can be broken up, either because of the nature of the operations themselves or because we haven't implemented all combinations of operations. For instance, function calls are not broken up. If that is the case for your computations, then writing a file will likely be necessary. Doing the operations in pieces in the T direction and appending is a good option. You can also append in other directions, as long as you first save the entire grid, containing, say, missing data, and then use APPEND/K= /J= /I= to overwrite the file as the actual results are computed. See a few small examples of that in Chapter 10 of the Ferret Users Guide, examples 4, 4a, and 5.

I'm going to do an example with your operations in some detail, since few of us have explored this.


yes? use https://vesg.ipsl.upmc.fr/thredds/dodsC/IPSLFS/brocksce/tmp/CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc

yes? show data
     currently SET data sets:
    1> https://vesg.ipsl.upmc.fr/thredds/dodsC/IPSLFS/brocksce/tmp/CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc  (default)
 name     title                             I         J         K         L
 TIME_CENTERED
          Time axis                        ...       ...       ...       1:8280
 TIME_CENTERED_BOUNDS
                                           1:2       ...       ...       1:8280
 TRANSPIR Transpiration                    1:144     1:143     1:13      1:8280
 AREAS    Mesh areas                       1:144     1:143     ...       ...
 CONTFRAC Continental fraction             1:144     1:143     ...       ...
 

(I made a local dataset with the same size grid for testing, as reading lots of data from the thredds server is somewhat slow.)
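
For a sense of scale, here is a rough back-of-the-envelope calculation in Python (not Ferret) of how big the full TRANSPIR grid is. The 4 bytes per value is my assumption, chosen because it is consistent with the 8.3G file size; SET MEMORY/SIZE is given in megawords.

```python
# Size of the full TRANSPIR grid, taken from the SHOW DATA listing above.
ni, nj, nk, nl = 144, 143, 13, 8280

n_values = ni * nj * nk * nl
megawords = n_values / 1e6

# Assumption: 4-byte values on disk, which is consistent with the 8.3G file.
bytes_on_disk = n_values * 4
gib = bytes_on_disk / 2**30

print(f"{n_values} values = {megawords:.0f} megawords, about {gib:.1f} GiB at 4 bytes each")
```

That is far more than a memory setting of 200 or 400 megawords, so the variable can only be processed in pieces.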

Diagnostic mode lists information as Ferret runs, noting when it puts tasks on its operation stack, reads data, computes transformations, and does "gathering". The memory management in Ferret v7.2 breaks a computation into parts, executes those parts while saving partial results, and finally "finalizes" by putting the results together. Earlier versions of Ferret also break computations into pieces to reduce the amount of data that must be read in. In fact, they handle this particular case in exactly the same way, so the example below works with any older Ferret version. V7.2 would also break up the computation along the axes being compressed if that were the only way to shorten the computation.

I've put some comments in between the pieces of diagnostic output below.

First, a shorter example: compute the result on L=1:360.

yes? let var=TRANSPIR[k=@sum, x=@ave, y=@ave]

yes? set memory/size=200
yes? set mode diagnostic ! this is not needed, but lets us see Ferret memory management in action

yes? save/clobber/file=file1.nc/L=1:360 var[l=@sbx:120]

 getgrid EX#1     C:  5 dset:   1 I:      1      1  J:    1    1  K:    1    1  L:      1      1  M:    1    1  N:    1    1
 getgrid VAR      C:  7 dset:   1 I:      1      1  J:    1    1  K:    1    1  L:      1      1  M:    1    1  N:    1    1
 allocate dynamic grid GBC3            LON       LAT       VEGET1    TIME_COUNT
 allocate dynamic grid GBC3            LON       LAT       VEGET1    TIME_COUNT
 

Here Ferret is setting up to get data for L=1:420 so that it can correctly return the boxcar smoother var[L=@SBX:120] on L=1:360: the 120-point window reaches 60 points past L=360, and there is nothing before L=1. It also sets up "gathering" to return the averaging and sum requests.
strip limits reconciliation : EX#1
 eval    EX#1     C:  5 dset:   1 I:      1    144  J:    1   90  K:    1   13  L:      1    360
 strip --> VAR[L=1:360@SBX:120,D=1]
 eval    VAR      C:  8 dset:   1 I:      1    144  J:    1   90  K:    1   13  L:      1    420
 strip gathering TRANSPIR on T axis:        1      420 dset:   1           420=request     100000000=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
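
The L=1:420 request above can be reproduced with a little arithmetic. This is only a Python sketch of the idea, not Ferret's code; the function name and the assumption that the window reaches window/2 points to each side are mine, chosen to agree with the diagnostic output.

```python
def input_range(out_lo, out_hi, window, axis_len):
    """1-based, inclusive index range a centered boxcar needs to fill out_lo..out_hi."""
    half = window // 2  # assumed reach of the window on each side
    # Clip to the axis: nothing exists before index 1 or after axis_len.
    return max(1, out_lo - half), min(axis_len, out_hi + half)

# The request from the session: output on L=1:360, window 120, time axis 1:8280.
print(input_range(1, 360, 120, 8280))  # (1, 420)
```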

It will read and compute the XY averages and Z sum for each subset in L:
strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M: 17 dset:   1 I:      1    144  J:    1   90  K:    1   13  L:      1    138
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis:        1      138 dset:   1           138=request      99999724=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M: 13 dset:   1 I:      1    144  J:    1   90  K:    1   13  L:    139    276
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis:      139      276 dset:   1           138=request      99999304=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M: 10 dset:   1 I:      1    144  J:    1   90  K:    1   13  L:    277    414
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis:      277      414 dset:   1           138=request      99999304=availableMem
 strip --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 strip --> TRANSPIR[Y=90S:90N@AV4,D=1]
 reading TRANSPIR M:  7 dset:   1 I:      1    144  J:    1   90  K:    1   13  L:    415    420
 doing --> TRANSPIR[Y=90S:90N@AV4,D=1]
 final --> TRANSPIR[Y=90S:90N@AV4,D=1]
 doing --> TRANSPIR[Z=0.5:13.5@SUM,D=1]
 doing gathering TRANSPIR on T axis:      415      420 dset:   1             6=request      99999568=availableMem

And finally Ferret does the @SBX on the entire time series of XY-averaged, Z-summed data, returning VAR[L=1:360@SBX:120]:
 doing --> VAR[L=1:360@SBX:120,D=1]
 LISTing to file file1.nc

yes?
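
The pattern in the output above (read a strip in T, reduce it, keep only the reduced values, move to the next strip) can be sketched in Python. This is an illustration of the idea only, not how Ferret is implemented; read_step and reduce_step are hypothetical stand-ins, and the strip length of 138 is simply copied from the diagnostic output.

```python
def strips(lo, hi, size):
    """Split the 1-based inclusive range lo..hi into consecutive pieces of at most `size` points."""
    start = lo
    while start <= hi:
        end = min(start + size - 1, hi)
        yield start, end
        start = end + 1

def read_step(l):
    """Hypothetical stand-in for reading one time step: a tiny 2x3 grid of values."""
    return [[l + i + j for i in range(3)] for j in range(2)]

def reduce_step(grid):
    """Collapse one time step to a single number (here a plain mean over the grid)."""
    flat = [v for row in grid for v in row]
    return sum(flat) / len(flat)

series = []
for lo, hi in strips(1, 420, 138):
    # Only this strip's time steps are held at once; just the reduced values are kept.
    series.extend(reduce_step(read_step(l)) for l in range(lo, hi + 1))

print(list(strips(1, 420, 138)))  # [(1, 138), (139, 276), (277, 414), (415, 420)]
print(len(series))                # 420
```

Because the reduction collapses X, Y and Z, what is gathered is a short one-dimensional time series, and the smoother can then run on it in one pass.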

Now do the whole time axis, using a larger memory setting. Ferret does perhaps 12-15 separate reads.
yes? set mem/siz=400
yes? save/clobber/file=file2.nc   var[l=@sbx:120]
...
 doing --> VAR[T=01-JAN-232018:00:31-DEC-300918:00@SBX:120,D=1]

 LISTing to file file2.nc


Verify that the operations on a subset of the data, computed in different chunks, match what is computed for the entire dataset.


yes? cancel data/all
yes? cancel var/all

yes? use file1.nc, file2.nc
yes? list/l=60:70 var[d=2] - var[d=1]
             VARIABLE : VAR[D=file2] - VAR[D=file1]
             SUBSET   : 11 points (TIME)
             CALENDAR : NOLEAP
 16-DEC-2324 12 / 60:    ....
 16-JAN-2325 12 / 61:  0.0000
 15-FEB-2325 00 / 62:  0.0000
 16-MAR-2325 12 / 63:  0.0000
 16-APR-2325 00 / 64:  0.0000
 16-MAY-2325 12 / 65:  0.0000
 16-JUN-2325 00 / 66:  0.0000
 16-JUL-2325 12 / 67:  0.0000
 16-AUG-2325 12 / 68:  0.0000
 16-SEP-2325 00 / 69:  0.0000
 16-OCT-2325 12 / 70:  0.0000

etc.
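
The same check can be mimicked on a synthetic series in Python. This is a sketch under assumptions of mine, not Ferret's smoother: the running mean here covers 60 points on each side, and a point is treated as missing when the window runs off the end of the axis, which is consistent with L=60 being missing in the listing above.

```python
def boxcar(series, window):
    """Centered running mean; None where the window would run off either end (assumed edge handling)."""
    half = window // 2
    out = []
    for i in range(len(series)):
        if i - half < 0 or i + half >= len(series):
            out.append(None)
        else:
            out.append(sum(series[i - half:i + half + 1]) / (2 * half + 1))
    return out

full = [((7 * l) % 13) * 0.5 for l in range(1, 1001)]  # synthetic series, 1000 steps
short = boxcar(full[:420], 120)[:360]  # like file1: L=1:360 computed from L=1:420
whole = boxcar(full, 120)              # like file2: the whole axis

for l in range(60, 71):
    a, b = whole[l - 1], short[l - 1]
    print(l, "...." if a is None or b is None else a - b)
```

The differences are zero wherever both results are defined, because each smoothed point only ever sees the same window of input values, however the input was read.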


On 11/22/2017 2:57 AM, Patrick Brockmann wrote:
Hi ferreters,

I would like to plot a time series from a quite large file (8.3G) containing an XYZT variable (144x143x13x8280).
I have worked with the latest Ferret release, 7.2, and tried increasing the memory several times without success.
I always get  **ERROR: request exceeds memory setting

to break up my request into fragments.
Is it the best solution in my case?

My resource is available from

Typical code lines are:

yes? use CM6012.1-pi-ttop-02_23200101_30091231_1M_transpir.nc
yes? let var=TRANSPIR[k=@sum, x=@ave, y=@ave]
yes? plot var[l=@sbx:120]

! the following passes because I have limited the time range (1:1200)
yes? plot var[l=1:1200@sbx:120]


Any help welcome.
Regards
Patrick

--
Data Analysis and Visualization Engineer
LSCE/IPSL, CEA-CNRS-UVSQ laboratory
LSCE - Climate and Environment Sciences Laboratory
IPSL - Institut Pierre Simon Laplace
--


